A Survey and Implementation of Machine Learning Algorithms For Customer Churn Prediction
Abstract— Predicting customer churn is an important task for businesses because it helps them identify customers who are most likely to leave and take preventative measures to retain them by improving customer satisfaction, thereby increasing revenue. In this article, we focus on developing a machine-learning model for predicting customer churn using historical customer data. We performed feature-engineering operations on the data, addressed the missing values, encoded the categorical variables, and preprocessed the data before evaluating the models using a variety of performance indicators, including accuracy, precision, recall, f1 score, and ROC AUC score. Our feature-importance analysis revealed that monthly charges, customer tenure, contract type, and payment method are the factors that have the most impact on forecasting customer churn. Finally, we conclude that the best-performing model is the Soft Voting Classifier, built on the four best-performing classifiers, with a good accuracy of 0.78 and a relatively better ROC AUC score of 0.82.

Keywords — Customer churn prediction, Machine learning, Feature importance analysis, Gradient boosting, Business revenue.
I. INTRODUCTION

Customer churn prediction is a critical problem for companies across various industries. It refers to the task of identifying customers who are likely to discontinue using a company's products or services. From a business perspective, customer churn poses significant challenges and can have a substantial impact on a company's profitability and growth. Retaining existing customers is the most important task for the survival of the business, which has become common sense in the business world [15]. Customer acquisition requires substantial marketing and promotional effort, while retaining loyal customers can lead to repeat business and increased customer lifetime value. Therefore, accurately predicting customer churn allows companies to proactively address the underlying issues and take appropriate measures to retain valuable customers.

Customer churn prediction relies on analysing historical customer data, such as demographic information, transactional records, service usage patterns, and customer interactions. Different data types have different analysis capabilities. It is necessary to determine the most appropriate data for the type of analysis performed. Different datasets provide better metrics for different problems and services [11]. In order to find trends and signs that assist in identifying high-risk customers, organizations employ sophisticated analytics as well as machine-learning approaches.

Some popular churn-prediction methods include:
1. Machine-Learning Algorithms: Gradient boosting, decision trees, random forests, logistic regression, etc. The likelihood of upcoming churn can frequently be predicted by training such machine-learning models on historical customer data.
2. Survival Analysis: Survival analysis is a statistical technique used to analyze time-to-event data, such as the time before losing a customer. It considers the varying time periods during which customers remain active and enables the prediction of the probability of churn over time.
3. Neural Networks: Deep learning techniques, specifically neural networks that can learn complex patterns and
relationships from large volumes of data, can be used for customer churn prediction [16],[17],[18].
4. Ensemble Methods: Ensemble systems combine multiple models using techniques such as bagging, boosting, and stacking to increase the accuracy of churn prediction, as sketched below.
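As an illustration of the ensemble idea above, the following minimal sketch (in Python with scikit-learn) builds a soft voting classifier from three common base models. The file name churn.csv and the Churn column are placeholder assumptions for illustration, not the exact pipeline used in this work.

# Minimal sketch of a soft-voting ensemble for churn prediction.
# "churn.csv" and the "Churn" column are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("churn.csv")
X = pd.get_dummies(df.drop(columns=["Churn"]))     # one-hot encode categoricals
y = (df["Churn"] == "Yes").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Soft voting averages the predicted class probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, ensemble.predict_proba(X_test)[:, 1]))

In practice, the base estimators would be the best-performing classifiers found during model comparison rather than this fixed trio.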
By applying these techniques, companies can gain valuable insights into customer behaviour, identify early warning signs of churn, and implement proactive strategies to mitigate churn risk.
II. MOTIVATION

With the help of historical customer data, our research aims to give a thorough and methodical assessment of numerous machine-learning models in order to predict customer attrition. Although there have been previous research works [1],[3],[5],[6] that involved a comparison of the performances of a few machine-learning models, we aimed at comparing as many models as possible that one can use for classification. We applied a wide array of traditional (logistic regression, ridge, decision tree, naïve bayes, knn, etc.) and ensemble (catboost, adaboost, xgboost, LGBM, bagging, etc.) machine-learning models. We wanted to truly explore the potential of ensemble learning, and we did so by employing a Voting Classifier which considered the predictions of the best-performing classifiers of our research and therefore provided better results. We were also interested in gaining important insights from the dataset by thorough visualization of every numerical and categorical column (via kdeplot, boxplot, and histograms), as well as in obtaining a quantitative value for each feature signifying its importance w.r.t. the target column (via the chi-square test). We also addressed the issue of an imbalanced target column using SMOTE. We laid greater emphasis on addressing these specific aspects, which drove us to do this research. We believe this paper will provide valuable insights and methodologies that can be applied in real-world scenarios as well as prove helpful for future researchers.
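To make the two preprocessing ideas above concrete, the following minimal sketch scores each feature against the target with the chi-square test and then rebalances the target column with SMOTE (from the imbalanced-learn package). The file name and the Churn column are assumptions for illustration, not the exact code used in this study.

# Minimal sketch: chi-square feature scoring and SMOTE oversampling.
# "churn.csv" and the "Churn" column are hypothetical placeholders.
import pandas as pd
from imblearn.over_sampling import SMOTE          # imbalanced-learn package
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("churn.csv")
X = pd.get_dummies(df.drop(columns=["Churn"]))
y = (df["Churn"] == "Yes").astype(int)

# chi2 requires non-negative features, so scale everything to [0, 1] first.
X_scaled = MinMaxScaler().fit_transform(X)
scores, p_values = chi2(X_scaled, y)
importance = pd.Series(scores, index=X.columns).sort_values(ascending=False)
print(importance.head(10))                        # most churn-relevant features

# Balance the target column by synthesizing new minority-class samples.
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X, y)
print(pd.Series(y_resampled).value_counts())      # classes are now balanced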
III. LITERATURE SURVEY

This is a concise outline of related research on churn prediction in the telecom business that has been proposed by notable scholars.

On their dataset, Dhangar et al. [1] discovered that SVM and Random Forest had the greatest accuracy rates, at 84 and 87 percent, respectively. SVM classifiers surpass others with an AUC score of 92.1 percent, while Random Forest earns the highest AUC score of 94.5 percent.

Working on the Customer DNA dataset, Saad et al. [2] highlighted the usage of a re-sampling strategy to address the issue of class imbalance. Their findings show that decision trees are the most accurate classification technique when it comes to detecting churners.

Adbelrahim et al. [3] predicted churners using decision trees, random forests, GBM, and XGBoost as tree-based algorithms. Comparative research shows that XGBoost outperforms its competitors in terms of AUC. However, feature selection techniques and optimization algorithms can improve accuracy further.

Tamaddoni et al. [4] compared simple predictive models with cost-sensitive learning models, built both on a CART model and on an ensemble of models, and found that cost-sensitive learning improved the churn model only in the CART case, not for the ensemble of models.

Praveen et al. [5] employed Logistic Regression, Support Vector Machine, Decision Tree, and Naive Bayes in their comparative study of machine-learning techniques for forecasting customer attrition, and looked into how magnification affects accurate classification.

Lalwani et al. [5] found that combining feature selection techniques such as randomization can improve classification accuracy.

For predicting customer attrition, Horia Beleiu et al. [6] used three types of machine-learning techniques: deep neural networks, support vector machines, and Bayesian networks. Principal component analysis (PCA) was applied during the feature selection process, which reduced redundant data. They used optimization techniques to improve the feature selection process and thereby improved classification accuracy.

K. Coussement et al. [7] used support vector machines (SVM), logistic regression (LR), and random forests (RF) to model the churn prediction problem. SVM initially performed about as well as LR and RF, but when the best parameters were chosen, it surpassed LR and RF with regard to PCC and AUC.

K. Dahiya et al. [8] applied decision tree and logistic regression models to a churn prediction dataset; the WEKA tool was employed during the trials.

Umman et al. [9] used decision tree and logistic regression models to analyze a large dataset, but the accuracy of the results was poor. They therefore proposed that further work, using additional machine-learning and feature selection techniques, was necessary.

J. Hadden et al. [10] showed that decision trees perform better than other techniques because of their rule-based structure. By using a suitable feature selection strategy, the prediction accuracy can be improved further.

J. Hadden et al. [11] reviewed all the machine-learning models considered and provided a thorough study of the methods currently used for feature selection. They discovered that decision trees outperformed the competition in the prediction models. Optimization techniques greatly aid the improvement of prediction algorithms during feature selection.

According to Y. Huang et al. [12], a variety of classifiers were applied to the churn prediction dataset, and the findings showed that random forest outperforms the competition with regard to AUC and AUC(PR) analysis. However, it is possible to raise accuracy even more by employing feature extraction and optimization approaches.

Genetic programming (GP) and the AdaBoost machine-learning model were combined by researchers working under the direction of A. Idris et al. [13] in order to compare their results with those of other classification algorithms. AdaBoost and GP's results were more accurate than those of the competition. However, accuracy can be increased even further by utilizing various optimization strategies, such as the gravitational search algorithm, biogeography-based optimization, and many others.
P. Kisioglu et al. [14] used Bayesian Belief Networks (BBN) to estimate customer attrition. Correlation analysis and multicollinearity tests were carried out during the experimental analysis. BBN proved to be a viable alternative for the prediction of churn. They offered suggestions for future research directions as well.

IV. METHODOLOGY

4.1 SYSTEM ARCHITECTURE:

2. The clients with higher monthly charges are also more likely to churn.
3. Both tenure and monthly charges are likely to be important features in predicting churn, as they show significant variations between churned and non-churned customers.

Furthermore, boxplots were generated for the same three numerical columns in Figure 4.3.
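The following minimal sketch shows how such kernel-density and box plots can be produced with seaborn. The column names (tenure, MonthlyCharges, TotalCharges, Churn) are assumed from a typical telecom churn dataset and may differ from the exact schema used here.

# Minimal sketch: kdeplots and boxplots of numeric columns split by churn.
# Column names and "churn.csv" are hypothetical assumptions.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("churn.csv")
# Ensure the columns are numeric (some schemas store charges as text).
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
numeric_cols = ["tenure", "MonthlyCharges", "TotalCharges"]

fig, axes = plt.subplots(2, len(numeric_cols), figsize=(15, 8))
for i, col in enumerate(numeric_cols):
    # Density of each numeric column, split by churn status.
    sns.kdeplot(data=df, x=col, hue="Churn", fill=True, ax=axes[0, i])
    # Boxplot of the same column against the churn label.
    sns.boxplot(data=df, x="Churn", y=col, ax=axes[1, i])
plt.tight_layout()
plt.show()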
5.3 Comparison of Every Model:
Figure 5.3 shows a detailed comparison of all the models implemented in this research work.

Fig. 5.3: Values of evaluation metrics of all the models

The graph in Figure 5.4 visualizes how accuracy, f1 score, and ROC AUC score vary with respect to every classifier.

To complete our assessment, we implemented the voting classifier (soft as well as hard). The Soft Voting Classifier showed the best performance of all: it achieved an accuracy of 0.78, which is on par with the highest accuracy achieved in our research so far (i.e. 0.79 of XGBoost), and a decent f1-score of 0.74. The difference-maker here is its ROC AUC score of 0.82, which is significantly higher than the highest ROC AUC score achieved so far (i.e. 0.76 of Random Forest and Ridge Classifier). Therefore, we infer that the Soft Voting Classifier is the best-performing classifier in our research.
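For reference, the sketch below outlines how such a comparison table of accuracy, precision, recall, f1, and ROC AUC scores can be computed for a collection of fitted classifiers. It is an illustrative helper under assumed inputs, not the exact evaluation code of this study.

# Minimal sketch: score a dict of already-fitted classifiers on a held-out set.
import pandas as pd
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def compare_models(models, X_test, y_test):
    """`models` is assumed to be a dict such as {"XGBoost": fitted_clf, ...}."""
    rows = []
    for name, model in models.items():
        y_pred = model.predict(X_test)
        # Use probabilities where available, otherwise a decision function
        # (e.g. RidgeClassifier exposes no predict_proba).
        if hasattr(model, "predict_proba"):
            y_score = model.predict_proba(X_test)[:, 1]
        else:
            y_score = model.decision_function(X_test)
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred),
            "recall": recall_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
            "roc_auc": roc_auc_score(y_test, y_score),
        })
    return pd.DataFrame(rows).set_index("model")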
VI. CONCLUSION

In this research, we did a comparative analysis of the effectiveness of various machine-learning models for customer churn prediction in the telecom industry. We used feature selection techniques, i.e., the chi-square test and correlation analysis, to select the best features. We also applied SMOTE to the data, generating synthetic samples of the minority class to balance the target column. We employed numerous machine-learning techniques, including Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Naive Bayes, Decision Tree, Random Forest, Gradient Boosting Classifier, AdaBoost Classifier, XGBoost Classifier, Light Gradient Boosting Machine (LightGBM) Classifier, Ridge Classifier, and Bagging Classifier. We evaluated each model using performance indicators such as accuracy, precision, and recall, along with f1 and ROC AUC scores.