0% found this document useful (0 votes)
55 views

A Comparison of Machine Learning Algorithms for Customer Churn Prediction

The paper compares various machine learning algorithms for predicting customer churn across different industries, emphasizing the importance of churn prediction for business sustainability. It reviews algorithms such as Decision Tree, Random Forest, AdaBoost, and XGBoost, finding them optimal for churn prediction, while also discussing data preprocessing techniques. The study highlights the performance of these algorithms on datasets from telecommunications, banking, and e-commerce, concluding that Random Forest and ANN yield the best accuracy but vary in training time.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
55 views

A Comparison of Machine Learning Algorithms for Customer Churn Prediction

The paper compares various machine learning algorithms for predicting customer churn across different industries, emphasizing the importance of churn prediction for business sustainability. It reviews algorithms such as Decision Tree, Random Forest, AdaBoost, and XGBoost, finding them optimal for churn prediction, while also discussing data preprocessing techniques. The study highlights the performance of these algorithms on datasets from telecommunications, banking, and e-commerce, concluding that Random Forest and ANN yield the best accuracy but vary in training time.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

2023 6th International Conference on Advances in Science and Technology (ICAST)

A Comparison of Machine Learning Algorithms for


Customer Churn Prediction
Parth Pulkundwar Krishna Rudani Omkar Rane
2023 6th International Conference on Advances in Science and Technology (ICAST) | 979-8-3503-5981-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICAST59062.2023.10455051

Department of Computer Engineering Department of Computer Engineering Department of Computer Engineering


K. J. Somaiya Institute of Technology K. J. Somaiya Institute of Technology K. J. Somaiya Institute of Technology
Mumbai, India Mumbai, India Mumbai, India
[email protected] [email protected] [email protected]

Chintan Shah Dr. Shyamal Virnodkar


Department of Computer Engineering Department of Computer Engineering
K. J. Somaiya Institute of Technology K. J. Somaiya Institute of Technology
Mumbai, India Mumbai, India
[email protected] [email protected]

Abstract— Today's fiercely competitive business customer’s future with respect to their company, the fact that
environment has given significant importance to customer they cease their involvement with any of the company’s
churn, a term used for the loss of customers, which possesses a products or services is called Churn.
significant challenge to organizations across various industries.
To mitigate revenue loss and sustain growth, companies are
increasingly turning to machine learning (ML) algorithms for Companies may create their own datasets to keep track of
customer churn prediction. This review paper provides a customers who “churned” or stopped using their products or
concise examination of ML algorithms' role in predicting services. There are traditional statistical methods which were
customer churn, a pivotal concern for businesses seeking to used for quite some time in the analysis of churn. However,
sustain growth and profitability. The review begins by today’s world is blessed with advancements in computing
underlining the significance of customer churn in today's
competitive landscape, highlighting the impact of data-driven
technology, as well as the rapid increase of ML algorithms
approaches in this context. The paper then explores various ML for churn prediction. These advanced algorithms enable
algorithms suitable for churn prediction and comparing the businesses to not only identify churn patterns but also to
results to find out the most optimal algorithm for a few real- harness the power of predictive analytics, allowing for more
world scenarios, namely telecommunication, banking and e- proactive and targeted retention efforts. Additionally, the
commerce. The review found that Decision Tree Classification, scalability and adaptability of ML models make them
Random Forest Classification, AdaBoost and XGBoost
Classification algorithms were optimal for churn prediction.
invaluable in handling vast and complex datasets, providing
Additionally, the review covers the implementation of the businesses with a competitive edge in customer retention
findings in a churn prediction application. strategies. There are also Deep Learning Models to take
advantage of. However, owing to their high computational
Index Terms— Machine Learning, Churn Prediction, Data- power requirements, as well as higher model training time, it
driven Approaches, Gradient Boosting Algorithms, Customer, was decided not to include them in this comparative study, as
churn
all other models were comparatively less demanding.
I. INTRODUCTION
This paper intends to discover the impact of various ML
Contemporary world businesses have loads of data to work algorithms on real-world scenarios. This paper compares
and grow from. Every move of a human in this age generates accuracy and time required for each of the nine algorithms to
data, straight from their smartwatches to their choice of classify a new data item. The analysis made by this study will
turning on the ceiling fan at their homes. However what be utilized in a churn predictor application.
matters is how these companies handle this data. Data being
the new oil, has numerous uses, which only need to be II. LITERATURE REVIEW
uncovered with innovative analytics, insightful While looking for ML algorithms to process the real-world
interpretation, and strategic application to unlock its full scenarios on, this study took care of two factors: first is
potential for driving business growth and societal accuracy of the model, for obvious reasons, and second is the
advancements. time taken by the model to train. The latter was of equal
It is widely recognized that maintaining a current customer importance as the ultimate intent was of the creation a client-
is more cost-effective than acquiring a new one. [2, 5]. One centered portal for predicting customer churn.
of the key metrics in this regard is Customer Churn. In simple
words, customer churn refers to the portion of your customer The recent years have witnessed the use of Decision Trees
base that stops to engage with your products or services (DT) based algorithms, as well as Ensemble Learning
within a specified timeframe. Thus, when predicting a methods for Churn prediction [1]. Decision Tree algorithms

979-8-3503-5981-7/23/$31.00 ©2023 IEEE 437


Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on March 05,2025 at 04:04:15 UTC from IEEE Xplore. Restrictions apply.
2023 6th International Conference on Advances in Science and Technology (ICAST)
have been in use for a long time, because they are simple and have values in them differing from each other in terms of
easy to comprehend [3]. datatypes and extremes, it is required to alter these data items
Logistic regression is another ML algorithm which is easy in order to make them workable.
to implement and train a dataset on, while also ensuring no This means that, preprocessing of the datasets is required.
assumptions are made about the distributions [4] of classes in Preprocessing involves Cleaning, Transformation and
the feature space which is good for customer churn Reduction of the data.
prediction.
Random Forest is one of the Ensemble Learning methods, 1. Data cleaning involves finding and rectifying errors or
which are deployed for use in regression as well as inconsistencies in the data [15].
classification used for customer churn prediction [7, 10]. It 2. Transformation: This process entails preparing data
avoids overfitting to a high extent, and also scales quite well for analysis by employing various techniques.
on data. Common methods for data transformation include
Support Vector Machines (SVM) are mainly used for data normalization, standardization, and discretization.
that has unknown distribution. It also doesn’t suffer from 3. Data Reduction involves reduction of volume of data,
overfitting. SVM models are parametric hence it maximizes but still keeping essential information in a dataset.
the effectiveness of churn prediction [16].
Gradient Boosting algorithms work on somewhat similar The datasets had a few missing values, columns with
lines as Random Forest. They involve an ensemble of weak categorical variables, as well as imbalanced classes. Hence,
prediction models, wherein the newer model learns on the before training the models on the datasets, it was important to
shortcomings of the previous model making more accurate deal with these issues first.
churn predictions [14]. 4.1 Missing Values: In the datasets, the records with
There are two very commonly used gradient boosting missing values were removed.
algorithms, which was in this review, namely AdaBoost and
XGBoost. AdaBoost (Adaptive Boosting) is one of the 4.2 Categorical Variables: As numeric inputs are a must for
earliest and most well-known gradient boosting algorithms. a majority of ML algorithms, categorical variables were
XGBoost (Extreme Gradient Boosting) is a more recent and encoded into numeric values using One Hot Encoding.
highly optimized gradient boosting algorithm [8].
K-Nearest Neighbors (KNN) classification offers several 4.3 Normalization: The features have different units of
advantages. It doesn't require a training period, making it measurements, therefore MinMax Scaling method was used
highly time-efficient and suitable for quick modeling on for uniform data normalization as shown in equation 1.
existing data for fast churn prediction [6].
Naive Bayes is an algorithm for probabilistic classification = (1)
that relies on the principles of Bayes' theorem. It assumes that
features are independent, simplifying calculations which where min(x) is the minimum value in x, and max(x) is the
makes the churn prediction as accurate as possible [17]. maximum value in x.

III. ML CLASSIFICATION MODELS 4.4 Class Imbalance: In some cases, some variables are
Based on the review of ML algorithms for classification, imbalanced. These variables will be balanced using the
the following models were chosen to classify the datasets OverSampling method, so the size of the minority values is
with: increased to a size similar to the majority before balancing.
1. Logistic Regression (LR)
2. Random Forest Classification (RF) 4.5 Feature Selection: In the final stage of data
3. Support Vector Machines (SVM) preprocessing, the task is to choose the most suitable features
4. AdaBoost (ADAB) that serve as indicators for churn.
5. XGBoost (XGB)
6. Decision Tree Classification (DT) V. DATASETS
7. Naïve Bayes Classification (NB)
8. K-Nearest Neighbors (KNN) Classification This review analysed the aforementioned ML algorithms on
9. A basic artificial neural network (ANN) the following datasets
A. Telecom Company Dataset
A basic artificial neural network was also created to
classify the datasets to compare the accuracy score as well as Telecom companies today have to keep up with huge
the time it takes to classify data on a deep learning model, both competition, as a lot of companies have sprung up, providing
of which were key factors in the choice of models for the end- services and programs at prices which aim to capture the
user application. price-sensitive consumer. They need to be aware of the
patterns of modern-day consumers and adapt their strategies
IV. DATA PREPROCESSING in order to stay afloat in this dynamic industry.
The Telecom company dataset [16] has 7043 rows and 21
The customer churn datasets have columns like Price, columns of customer data about their usage of the company’s
Geography, Tenure and so on. These columns are expected to

438
Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on March 05,2025 at 04:04:15 UTC from IEEE Xplore. Restrictions apply.
2023 6th International Conference on Advances in Science and Technology (ICAST)

phone and internet services. This dataset includes features The aforementioned selection of ML algorithms
(columns) like “PhoneService”, “InternetService”, performed as expected on this dataset.
“StreamingTV”, “StreamingMovies” etc., which gives the
description of the services a customer has subscribed for, 1. Naïve Bayes and KNN Classification took the least
from the company. The only irrelevant data column in this time to train but provided low accuracies on testing.
dataset was the “customerID” column, hence it was dropped.
2. A similar result was observed for Logistic
Regression, in terms of training time and accuracy.
B. Bank Customer Dataset
Banks benefit from understanding the factors that 3. Gradient Boosting algorithms (AdaBoost and
influence a client's decision to depart from the company. XGBoost) did take slightly more time to train (Fig.
Churn prevention allows banks to develop loyalty programs 2), however, they provided good accuracy scores
and retention campaigns to keep as many customers as (Fig. 1).
possible.
This dataset [17] has 10000 values and 18 columns, 4. From TABLE I, it is evident that the best performer
RowNumber, CustomerId, Surname, CreditScore, out of the ML algorithms is Random Forest.
Geography, Gender, Age, Tenure, Balance, NumOfProducts,
5. Even though the ANN provided the highest
HasCrCard, IsActiveMember, EstimatedSalary, Exited,
accuracy, it took a comparatively large amount of
Complain, Satisfaction, Score, Card Type, Point Earn. The time to train, which reinforced our pre-assumptions
variables, RowNumber, CustomerId, Surname will be about its performance.
dropped as these will not useful for model training.

Figure 1 shows the accuracy score of various


C. E Commerce Dataset
algorithms applied on telecom churn.
Customer churn in the E-Commerce industry is a big
problem for the organizations. It is beneficial for these
organizations to know what makes the customers exit or not
use their platform or services. This helps them to design
attractive discounts accordingly to retain as many customers
as possible.
This dataset [18] has 5630 values and 20 columns,
describing a customer’s profile on this e-commerce platform.
Some of these columns are “PreferredLoginDevice”,
“CityTier”, “PreferredOrderCat”, “DaySinceLastOrder” and
“HourSpendOnApp” to name a few. The variable
“CustomerID” is dropped from the dataset as it will not be
useful for model training. Fig. 1. Accuracy scores of various algorithms on Telecom Churn
dataset
VI. ANALYSIS AND DISCUSSION

Below mentioned are the accuracy and runtime Figure 2 shows the model training time of different
comparison of the 3 datasets, and their analysis. algorithms on telecom company dataset.

A. Telecom Company dataset Analysis

TABLE I. Results of different algorithms' performance on the Telecom


Company dataset

Algorithm Accuracy Run Time

Logistic 80.75% 0.961s


Regression

Random Forest 80.88% 3.3201s


SVM 82.02% 1.4811s
AdaBoost 81.59% 0.4151s
XGBoost 82.94% 0.8828s
DecisionTree 73.77% 0.0498s
Fig. 2. Model Training time of various algorithms on Telecom
Naive Bayes 69.79% 0.0055s Company dataset
KNN 74.84% 0.0312s
ANN (DL) 90.33% 29.0029s
439
Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on March 05,2025 at 04:04:15 UTC from IEEE Xplore. Restrictions apply.
2023 6th International Conference on Advances in Science and Technology (ICAST)

B. Bank Churn dataset Analysis

TABLE II. Results of different algorithms' performance on Bank Churn


dataset

Algorithm Accuracy Run Time

Logistic 65.76% 0.033s


Regression
Random Forest 83.63% 1.82s
SVM 70.40% 9.35s
AdaBoost 81.59% 0.65s
XGBoost 78.00% 1.64s
DecisionTree 76.63% 1.41s Fig. 4. Model Training time of various algorithms on Bank Churn
dataset
Naive Bayes 70.46% 0.009s
KNN 67.10% 0.004s C. E-Commerce dataset Analysis
ANN (DL) 95.33% 42.19s
TABLE III. Results of different algorithms' performance on E-Commerce
dataset
The results observed in TABLE II, were similar to that
observed in the Telecom dataset.
1. Random Forest Classification stands out as the best Algorithm Accuracy Run Time
ML algorithm, recording an accuracy score of
Logistic 86.00% 0.8570s
83.63%, while Decision Tree Classification Regression
provided an acceptable 76.63% accuracy.
2. Gradient Boosting Algorithms follow-up behind Random Forest 81.18% 2.8990s
Random Forest, taking up 0.65s (AdaBoost) and SVM 83.77% 1.5101s
1.64s (XGBoost) for training (Fig. 4), while AdaBoost 85.79% 0.2151s
providing accuracy scores of 81.59% and 78% XGBoost 83.12% 1.2828s
respectively (Fig. 3).
DecisionTree 93.00% 0.0384s
3. SVM Classification did not perform as expected, as
it returned a comparatively low accuracy, while Naive Bayes 83.77% 0.0189s
taking up 9.35s in training. KNN 79.00% 0.1275s
4. Logistic Regression had a relatively dismal accuracy ANN (DL) 87.54% 27.0029s
score of 65.76%.

When compared with the accuracy scores on the Telecom The performance of ML algorithms (TABLE III) on E-
Churn dataset (Fig.1), the selection of ML algorithms have commerce dataset was comparatively different than those on
shown a similar pattern of performance as observed in Fig. 3, Telecom dataset and bank dataset.
with the exception of Logistic Regression. This pattern isn’t 1. While Decision Tree Classification couldn’t
observed in the next dataset. provide a good accuracy score on the previous
datasets, it gave an impressive 93% accuracy
score on this dataset.
2. Random Forest Classification and Gradient
Boosting Algorithms (AdaBoost and
XGBoost) displayed consistent levels of
accuracies, when compared to their previous
performances.
3. Support Vector Machines (SVM) showed an
improved accuracy when compared to its
performance on the Bank dataset.
4. A tremendous improvement was recorded by
Naïve Bayes and KNN Classification (Fig. 5).
5. Logistic Regression, which wasn’t among the
top models for the earlier cases, showcased an
Fig. 3. Accuracy scores of various algorithms on Bank Churn
exceptional improvement in this scenario.
dataset.

440
Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on March 05,2025 at 04:04:15 UTC from IEEE Xplore. Restrictions apply.
2023 6th International Conference on Advances in Science and Technology (ICAST)

Overall, the ML algorithms performed at par with Once the data is entered, the application moves on to the final
the ANN (DL) model (Fig. 6). result, providing the churn status as shown in Fig. 8, as well
as the probability that churn occurs.

Fig. 8. Output/Prediction on the input features

Fig. 5: Accuracy scores of various algorithms on E Commerce dataset.


VIII. CONCLUSION
This paper involved the training of ML models discussed
earlier with respect to three real-world scenarios, for churn
prediction. It also reviewed a churn prediction application
which implemented the findings of this paper. As apparent, it
was found out that on an average, gradient boosting
algorithms i.e AdaBoost and XGBoost, Random Forest
Classification as well as Decision Tree Classification
performed the best, while other classification algorithms
couldn't provide as much accuracy, even though they took
much less time to train on the dataset.
In contrast, even though the basic ANN (Deep learning
model) took a lot more time to train in comparison to ML
models, it provided a much better accuracy score than the
Fig. .6. Model Training time of various algorithms on E-Commerce dataset former.
Therefore, it only came down to the availability of
resources mainly in terms of systems with better computing
VII. RESULTS power, and the situation at hand. The main concerns were the
accuracy of the result as well as the time taken for making the
Based on the results obtained on training the models, it result available to the client, when they use the churn
was decided to incorporate Decision Tree, Random Forest, prediction application. Hence, deep learning techniques
and XGBoost in the churn prediction application. The would not be suitable for this situation. This may not be the
availability of models with the best performance in churn case with other applications of customer churn prediction,
prediction would enable the users of this application to obtain however when a client-centered application is considered, it
the best possible insights about their data. Given below is an is important to provide the client, the desired output as
excerpt (Fig. 7) developed using Streamlit demonstrating quickly as possible. Therefore, as ML models provided us
how the models have been provided for the user's with satisfactory accuracy scores while taking up only a
consideration. fraction of time as compared to a DL model, it was decided
to move forward with ML models to be implemented in the
application.
IX. FUTURE SCOPE
Customer churn prediction, as mentioned earlier, has its
pre-defined importance in this era of dynamically changing
markets. With the advent of ML models, paired with the
increasing reliance on data-driven decision making, customer
churn prediction is only expected to grow in importance.
Today’s world is a witness to the staggering growth of
computing technology, with faster and more efficient
components making way into the consumer markets sooner
Fig. 7. Streamlit web application than ever. This indirectly means that computing power is
going to grow significantly, for a given amount of power
The application asks the user to select an ML model of supply and time.
their choice. Next step involves selecting the features of their What is inferred from this future is DL models will be
choice to provide data to the model for predicting churn. treated in the future just like how ML models are treated and

441
Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on March 05,2025 at 04:04:15 UTC from IEEE Xplore. Restrictions apply.
2023 6th International Conference on Advances in Science and Technology (ICAST)

used in today’s world. This would help in the realization of Identification in Telecom Sector," in IEEE Access, vol. 7, pp. 60134-
better and faster predictions, for Customer churn as well as 60149, 2019, doi: 10.1109/ACCESS.2019.2914999.
for many other purposes. DL models would even enable the [11] A. Alamsyah and N. Salma, "A Comparative Study of Employee
Churn Prediction Model," 2018 4th International Conference on
client to use many more parameters for calculation, making Science and Technology (ICST), Yogyakarta, Indonesia, 2018, pp. 1-
predictions as real as possible. 4, doi: 10.1109/ICSTC.2018.8528586.
In the meantime, there could be studies for optimization [12] K. Gupta, A. Hardikar, D. Gupta and S. Loonkar, "Forecasting
of DL models to meet the time limitations as well as Customer Churn in the Telecommunications Industry," 2022 IEEE
performance benchmarks. Bombay Section Signature Conference (IBSSC), Mumbai, India,
2022, pp. 1-5, doi: 10.1109/IBSSC56953.2022.10037334.
REFERENCES [13] A. Raj and D. Vetrithangam, "Machine Learning and Deep Learning
technique used in Customer Churn Prediction: - A Review," 2023
International Conference on Computational Intelligence and
[1] Wang, Xing & Nguyen, Khang & Nguyen, Binh. (2020). Churn Sustainable Engineering Solutions (CISES), Greater Noida, India,
Prediction using Ensemble Learning. 56-60. 2023, pp. 139-144, doi: 10.1109/CISES58720.2023.10183530.
10.1145/3380688.3380710. [14] Y. Y. Win and C. G. Vung, "Churn Prediction Models Using Gradient
[2] A. De Caigny, K. Coussement, and K.W. De Bock. 2018. A new Boosted Tree and Random Forest Classifiers," 2023 IEEE Conference
hybrid classification algorithm for customer churn prediction based on Computer Applications (ICCA), Yangon, Myanmar, 2023, pp.
on logistic regression and decision trees. 271-275, doi: 10.1109/ICCA51723.2023.10181933.
[3] European Journal of Operational Research 269, 2 (2018), 760–772Hu, [15] D. Dasari and P. S. Varma, "Employing Various Data Cleaning
Xin & Yang, Yanfei & Chen, Lanhua & Zhu, Siru. (2020). Research Techniques to Achieve Better Data Quality using Python," 2022 6th
on a Customer Churn Combination Prediction Model Based on International Conference on Electronics, Communication and
Decision Tree and Neural Network. 129-132. Aerospace Technology, Coimbatore, India, 2022, pp. 1379-1383, doi:
10.1109/ICCCBDA49378.2020.9095611. 10.1109/ICECA55336.2022.10009079.
[4] Martínez-García, M. et al. (2023). Learning Logistic Regression with [16] Rodan, Ali & Faris, Hossam & Al-sakran, Jamal & Al-Kadi, Omar.
Unknown Features. In IEEE CAI 2023, pp. 298-299. doi: (2014). A Support Vector Machine Approach for Churn Prediction in
10.1109/CAI54212.2023.00133. Telecom Industry. International journal on information.
[5] Rani, K. Sandhya and., Shaik Thaslima and., N.G.L. Prasanna and ., [17] D. T. Barus, R. Elfarizy, F. Masri and P. H. Gunawan, "Parallel
R.Vindhya and ., P. Srilakshmi, Analysis of Customer Churn Programming of Churn Prediction Using Gaussian Naïve Bayes,"
Prediction in Telecom Industry Using Logistic Regression (JUNE 10, 2020 8th International Conference on Information and
2021). International Journal of Innovative Research in Computer Communication Technology (ICoICT), Yogyakarta, Indonesia, 2020,
Science & Technology (IJIRCST) ISSN: 2347-5552, Volume-9, pp. 1-4, doi: 10.1109/ICoICT49345.2020.9166319.
Issue-4, July 2021. [18] https://www.kaggle.com/datasets/blastchar/telco-customer-churn
[6] Hassonah, M. A. et al. (2019). Churn Prediction: KNN vs. Decision [19] https://www.kaggle.com/datasets/radheshyamkollipara/bank-
Trees. In Sixth HCT ITT 2019, pp. 182-186. doi: customer-churn
10.1109/ITT48889.2019.9075077. [20] https://www.kaggle.com/datasets/ankitverma2010/ecommerce-
[7] Feng, L. (2022). Customer Churn Prediction: Borderline-SMOTE and customer-churn-analysis-and-prediction
Random Forest. In IEEE ICPICS 2022, pp. 803-807. doi:
10.1109/ICPICS55264.2022.9873702.
[8] Zhang, J., & Dong, Y. (2022). Customer Loss Identification and
Factor Analysis in Mobile Operators with XGBoost. In 2022 NetCIT,.
[9] Wu, X., & Meng, S. (2016). E-commerce Customer Churn Prediction
with Enhanced SMOTE and AdaBoost. In 2016 ICSSSM.
[10] I. Ullah, B. Raza, A. K. Malik, M. Imran, S. U. Islam and S. W. Kim,
"A Churn Prediction Model Using Random Forest: Analysis of
Machine Learning Techniques for Churn Prediction and Factor

442
Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on March 05,2025 at 04:04:15 UTC from IEEE Xplore. Restrictions apply.

You might also like