
algorithms

Article
Prediction of Customer Churn Behavior in the
Telecommunication Industry Using Machine Learning Models
Victor Chang 1, * , Karl Hall 2 , Qianwen Ariel Xu 1 , Folakemi Ololade Amao 2 , Meghana Ashok Ganatra 1
and Vladlena Benson 1

1 Department of Operations and Information Management, Aston Business School, Aston University,
Birmingham B4 7ET, UK; [email protected] (M.A.G.); [email protected] (V.B.)
2 School of Computing, Engineering and Digital Technologies, Teesside University,
Middlesbrough TS1 3BX, UK; [email protected] (K.H.); [email protected] (F.O.A.)
* Correspondence: [email protected] or [email protected]

Abstract: Customer churn is a significant concern, and the telecommunications industry has the
largest annual churn rate of any major industry at over 30%. This study examines the use of ensem-
ble learning models to analyze and forecast customer churn in the telecommunications business.
Accurate churn forecasting is essential for successful client retention initiatives to combat regular
customer churn. We used innovative and improved machine learning methods, including Decision
Trees, Boosted Trees, and Random Forests, to enhance model interpretability and prediction accuracy.
The models were trained and evaluated systematically by using a large dataset. The Random Forest
model performed best, with 91.66% predictive accuracy, 82.2% precision, and 81.8% recall. Our results
highlight how well the model can identify possible churners with the help of explainable AI (XAI)
techniques, allowing for focused and timely intervention strategies. To improve the transparency of
the decisions made by the classifier, this study also employs explainable artificial intelligence methods
such as LIME and SHAP to illustrate the results of the customer churn prediction model. Our results
demonstrate how crucial it is for customer relationship managers to implement strong analytical
tools to reduce attrition and promote long-term economic viability in fiercely competitive market-
places. This study indicates that ensemble learning models have strategic implications for improving
Citation: Chang, V.; Hall, K.; Xu, Q.A.; consumer loyalty and organizational profitability in addition to confirming their performance.
Amao, F.O.; Ganatra, M.A.; Benson, V.
Prediction of Customer Churn Keywords: customer churn prediction; machine learning; explainable AI; ensemble learning; predictive
Behavior in the Telecommunication analytics
Industry Using Machine Learning
Models. Algorithms 2024, 17, 231.
https://doi.org/10.3390/a17060231

Academic Editors: Mateus Mendes and Balduíno Mateus
Received: 21 April 2024; Revised: 19 May 2024; Accepted: 22 May 2024; Published: 27 May 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Electronic commerce dramatically boosted the quantity of information accessible to customers when it first began. In the digital age, consumers have become more informed about the products they buy. Armed with access to information about a wide range of related products or services, the behavior of consumers has shifted towards being less impulsive. Instead, getting a range of information on a wide range of products to make a more calculated purchasing decision has become the norm. This change in attitude makes attracting and retaining new customers one of the most critical difficulties confronting businesses in marketing today. While new businesses focus on obtaining new consumers, established businesses are more focused on retaining current customers to increase cross-selling opportunities. Customer satisfaction is one of the keystone concepts in strategic marketing [1]. Efforts have been made to enhance customer value, such as the 7Cs framework proposed by [2]. In several businesses, customer churn rates are a significant problem. Many studies have demonstrated that even a minor shift in churn rates may have a big effect on profits. Understanding client behavior in advance can provide businesses with a competitive edge.




Even though mobile phones account for over 75% of all potential phone calls worldwide, the mobile telephone market is one of the most rapidly growing segments of the telecom sector. Like every other competitive market, the telecom sector has seen the focus of competition shift from customer acquisition to customer retention. In the telecom industry, churn refers to a company losing customers
to other service providers [3]. Like many businesses that work with long-term clientele,
telecom firms utilize customer churn research as one of their primary business indicators [4].
According to a survey presented by [5], 30–35% of clients leave their telecom company
annually post-COVID. This churn rate may continue to rise with market growth or the
emergence of new, big telecom players in the future. In addition, for cellular corporations,
client acquisition costs can be comparable to 5–10 times the amount spent on customer
retention or satisfaction costs [6].
Therefore, it is essential to learn the reasons why customers leave to reduce the harm
churn has on a business’s bottom line. Predictive marketing analytics and churn analysis
can help identify factors influencing consumers’ voluntary churn by using advanced
machine learning algorithms [7]. By investigating the existing studies, it was discovered
that ensemble learning, including classical machine learning algorithms like Random Forest,
Decision Trees, and Naïve Bayes [8–10], and deep learning methods, such as particle-
classification-optimization-based BP networks [11] and Deep-BP-ANN [12], have been
employed to improve the accuracy of churn prediction.
However, the models should not only focus on accuracy in predicting churning [9] but also be comprehensible, meaning they should provide reasons for churning so that experts can validate their results and check that they predict intuitively and correctly. If the
company had understandable and transparent models to work with, it would increase the
understanding of what was causing the churn and how to enhance customer happiness to
boost retention.
As a result, this research aims to construct an accurate, efficient, responsible, and
explainable prediction model for customer attrition in the telecom industry using ensemble
learning methods. Various data mining methods like Decision trees, Random Forests,
and Logistic Regression were utilized by experts to build the predictive model. The
performance of the models was evaluated using the accuracy measures, area under the
curve, and sensitivity and specificity measures. In addition, this study also aims to improve
the interpretability of customer churn predictive models through the use of LIME and
SHAP to provide decision-makers with an overall explanation of the factors affecting
the customer’s decision to churn, as well as a specific analysis for every single customer.
Following a brief introduction to customer churn amplification in the telecom business, the
existing literature is reviewed and summarized in detail.
The remainder of this paper is structured as follows: Section 2 focuses on reviewing
the contribution of predictive marketing to Customer Relationship Management, different
machine learning-based churn analysis models for the telecom industry, as well as the
explainable AI. In Section 3, the most acceptable attributes are determined, and the tech-
niques used in this study are introduced. Section 4 is concerned with the research analysis
and findings. Finally, the conclusion and several alternate interpretations are presented in
Section 5.

2. Literature Review
The current section is divided into three distinct components. The first section will
introduce Customer Relationship Management (CRM) with its core ideas. The next part
discusses the most well-known and significant ensemble learning models in CRM, which
were also employed in the model design phase of this examination. Finally, the third
segment discusses current research on the importance of explainability and transparency
of the methods used in this regard.

2.1. Customer Relationship Management and Predictive Marketing


Customer Relationship Management (CRM) started gaining popularity in the late 1990s
through work conducted by American tech companies such as IBM and Gartner. There are
four main types of CRM—strategic, operational, analytical, and collaborative—which all aim
to manage the company’s relationship with both potential and current customers:
• Strategic CRM, which is the use of customer data through systematic analysis as a
means of marketing management [13];
• Operational CRM, which supports company operations whereby information on
employees, customers, and leads is stored;
• Analytical CRM, which is a strategy whereby customer and market information is
analyzed to aid the decision-making process for business management [14];
• Collaborative CRM, which allows for multi-way communications between compa-
nies and their customers, with the goal of facilitating more profitable retention of
customers [15].
The primary goal of CRM is to maximize customer retention while attracting a steady
stream of new customers to maximize the company’s revenue streams [16]. By adopting
certain CRM methodologies, companies can improve their understanding of how to en-
gage and interact with their customers to form mutually beneficial relationships through
collaboration [17].
Utilizing information technology and information systems in CRM is becoming in-
creasingly important. More specifically, modern data analysis techniques combined with
the increased amount of available data have made this approach particularly attractive
to businesses. It is critical, therefore, for companies to adopt such practices to achieve
innovative CRM capabilities. Machine learning has emerged as one of the most influen-
tial and popular tools to increase the effectiveness of any CRM strategy. The main four
dimensions [18] across which ML is commonly applied in the context of CRM are customer
identification (target customer analysis and customer segmentation), customer attraction
(marketing), customer retention (loyalty program analysis, one-to-one marketing, com-
plaint, and conflict management), and customer development (customer lifetime cycle
analysis, upselling and cross-selling, the market basket analysis).
Big data analytics is transforming businesses by transferring the focus from products
and channels to the customer, with the aim of maintaining personal relationships and
moving from mass marketing to highly personalized marketing. Predictive analytics, used
to transform raw data into useful information, has great relevance for marketing purposes,
allowing the prediction of customer behavior and allocation to specific groups of customers.
By using innovative instruments, companies can adopt big-data-driven, micro-targeting
marketing practices, which permit the improved precision of segmentation and targeting.
Predictive marketing is a powerful tool that can greatly enhance a company’s CRM
capabilities. One of the predictive marketing methods is cluster analysis. Cluster analysis
algorithms can manage vast amounts of data to identify elements that characterize certain
consumers and find correlations that are difficult to identify manually. Conjoint analysis is
also a method adopted in predictive marketing for CRM. It is a research technique used
to identify customer preferences by identifying relevant attributes for consumers in the
selection and purchase processes. The outputs of conjoint analysis often guide business
decisions on new products and promotions. Moreover, online reviews have a strong impact
on consumer choices, and sentiment analysis or opinion mining can potentially predict
future sales and assist marketing strategy. Sentiment analysis is important for a better
understanding of consumer preferences, which leads to the identification of precise targets
and better advertising strategies. Eachempati et al. [19] conducted sentiment analysis to
analyze the effect of the emotions expressed on social media platforms by Indian customers
regarding automobile companies’ stock prices. They found that customer sentiment is a
strong factor contributing to the stock price as well as the corporate value. In addition
to the classification algorithms for cluster analysis, Lamrhari et al. [20] also employed
cluster analysis and k-means algorithms to group customers into different clusters for

the purpose of extracting insights from social media data. Random Forest outperformed
the other models used with 98.46% accuracy and K-means helped to realize the eWoM
communication balance.
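To make the clustering step described in this work concrete, here is a minimal one-dimensional k-means sketch. It is purely illustrative and not the implementation used in [20]: the spending values, k = 2, and the deterministic initialization are all invented for this example.

```python
def kmeans_1d(values, k, iters=25):
    """Minimal 1-D k-means: alternate between assigning each value to its
    nearest centroid and recomputing each centroid as its cluster mean.
    Assumes k >= 2 and len(values) >= k."""
    srt = sorted(values)
    # deterministic init: spread the starting centroids across the sorted range
    centroids = [srt[i * (len(srt) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # keep an old centroid if its cluster emptied out
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# hypothetical monthly charges with two natural spending segments
centroids, clusters = kmeans_1d([10, 12, 11, 80, 85, 82], k=2)
```

On this toy input the low-spending and high-spending customers separate into two clusters, which is the kind of segmentation the reviewed study used as a basis for analyzing eWoM communication.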
While gaining new clients is necessary for a company to develop, customer retention
should not be disregarded. Companies employing CRM strategies should take particular
care to analyze the factors contributing most to improving customer retention rates. Ac-
cording to [21], such key factors involve the interplay between relationship management
practices, such as customer trust, employee commitment, and conflict handling. However,
it is also noted that more research needs to be conducted in these areas. Predictive analytics
not only enhances segmentation techniques but also enables the implementation of churn
analysis, which predicts the likelihood of customers leaving the company for competitors.
This information helps the company take initiative-taking measures to prevent customer
churn. Predictive analytics algorithms can estimate the likelihood of customers switching
to competitors and identify the factors that contribute to that probability.

2.2. Customer Churn Prediction Review


Customer churn prediction (CCP) plays a crucial role in this research, since customer retention directly affects business profitability and growth. Well-known papers have been studied and summarized as follows.
Researchers [16] have investigated the understanding of nuanced behaviors leading
to customer churn in the telecommunication industry. Their combined approaches, using
temporal centrality metrics and data analytics, have provided insights into early churn
indicators and the role of customer lifecycle stages in influencing churn tendencies. Their
emphasis on leveraging advanced predictive analytics for proactive churn management
underscores the importance of timely interventions and personalized customer outreach.
De Caigny et al. [22] focused on developing their convolution neural network (CNN)
techniques in CCP and compared their approach with current practices of text data analysis
via text mining. They found that CNNs outperform the current practices, whose accuracy is limited by the unstructured nature of text data. Their approach
can extract the most valuable features from the unstructured textual data, thus achieving
more reliable prediction with higher accuracy. Additionally, they [23] have focused on
business-to-business (B2B) customer retention and analyzed 6432 customers. They devel-
oped the uplift Logit Leaf model (LLM), which can achieve better performance in CCP than
competing models. By integrating behavioral economics principles with their LLM model,
they have paved the way for more human-centric churn prediction strategies.
Seymen et al. [24] propose a deep learning model for predicting whether retail
customers will churn in the future. The results showed that the deep learning model
achieved better classification and prediction results than the Logistic Regression model
and an artificial neural network model. Deep learning models have been proved to be
effective in a variety of domains, including agriculture [25] and stock price forecasting [26].
Researchers [27] blended deep learning and natural language processing (NLP) to analyze
25,943 customer survey data. Their blended model can identify patterns, which can lead to
improved decision-making and improve accuracy in their CCP analysis. Therefore, the use
of ML algorithms can be effective in prediction robustness.

2.3. Customer Churn Prediction Model for Telecom Industry


Ensemble learning involves the use of multiple ML models to achieve improved
performance and accuracy [28,29] and also provides more independent views, so that
decision-making can be better and more accurate [30]. Ensemble-based ML classifiers
have recently emerged as a new way of building ML models. Since models have different
strengths and weaknesses, researchers have started building hybrid models consisting of
two or more models. Mishra and Reddy [31] took this approach and applied ensemble
models, such as Random Forest, Decision Trees, bagging algorithms, and boosting algo-
rithms, to predict customer churn in the telecom industry. They compared the results of

these ensemble models with more classical models such as Naïve Bayes. From their results,
they found ensemble classifiers performed favorably, with Random Forest returning the
highest accuracy of 91.66% and all other ensemble classifiers returning results above 90%.
Researchers [11] used a particle classification optimization-based BP network for tele-
com customer churn prediction. The PBCCP algorithm is based on particle classification
optimization and particle fitness calculation. In this case, particles refer to vectors contain-
ing the thresholds and weights within a BP neural network. The particles are classified
into categories using their fitness values and are updated using distinct equations. They
found that increasing the number of layers can improve the performance of the algorithm
at the cost of training time. As a result of this, they opted to use one hidden layer in the
neural network. They used a balanced dataset made up of 50% churn customers and 50%
non-churn customers, resulting in the PBCCP network returning an overall accuracy of
73.3%. By comparison, the PSO-BP network returned an overall accuracy of 69.6% and the
BP model had 63.6%.
Similarly, researchers [9] devised the Logit Leaf model, an ensemble model using
aspects of Logistic Regression and Decision Trees, and compared their results with standard
Decision Tree, Logistic Model Tree, Logistic Regression, and Random Forest models. The
AUC performance criteria were one of the metrics used to evaluate performance—a metric
commonly used for evaluating the performance of binary classification systems, such as
customer churn prediction. In this regard, the Logit Leaf model performed the best, slightly
better than Random Forest. The same was true when the models were evaluated using the
TDL (10%) performance criteria. The Logit Leaf model was also more efficient at making
predictions, taking less time than the other models used. When real-time predictions are
required, this can be a crucial factor when deciding which model to use for a given problem.
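Since the AUC criterion recurs throughout these comparisons, it may help to recall what the number means: it is the probability that a randomly chosen positive case (a churner) is scored above a randomly chosen negative case. A minimal sketch with made-up scores (real evaluations would use a library routine over thousands of predictions):

```python
def auc_score(labels, scores):
    """AUC by direct pairwise comparison: the fraction of (positive,
    negative) pairs in which the positive case gets the higher score.
    Ties count as half a win. O(n^2), fine for an illustration."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# hypothetical churn scores: higher should mean more likely to churn
auc = auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 1.0 means the classifier ranks every churner above every non-churner; 0.5 is no better than chance, which is why the models above are compared against that scale.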
Finally, scientists [10] applied gravitational search algorithms to perform effective feature selection, resulting in a dimensionality reduction of the dataset used for customer
churn prediction. After this, they applied a selection of ML models for comparison: Logistic
Regression, Naïve Bayes, Support Vector Machine, Decision Trees, Random Forest, and
Extra Tree Classifiers and boosting algorithms such as Adaboost, XGBoost, and CatBoost.
They found that, when comparing the models used, the boosting algorithms performed the
best, with CatBoost achieving the highest accuracy, recall, and precision with scores of 81.8%,
82.2%, and 81.2%, respectively. When comparing their AUC scores, boosting algorithms
performed the best again, with XGBoost and Adaboost scoring the highest at 84%.
One of the most important applications of ML with regard to CRM is to give companies
the ability to predict customer churn. This is particularly important given that retaining
existing customers is significantly more valuable than acquiring new customers, as more
resources are needed for new customer acquisition. It follows, therefore, that customer
retention is one of the key priorities of the CRM strategy [32]. The ability to harness ML to
significantly improve customer churn rates is of particular importance.
In this study, we will make use of predictive machine learning models to identify clients who are likely to churn after categorizing current customers using explainable machine learning.

2.4. Explainable Artificial Intelligence (AI) in Churn Analysis


While the application of machine learning algorithms has brought many benefits to
business, it has also revealed several drawbacks. For example, in Customer Relationship
Management, it is essential that every decision is meaningful. Advanced data analytics
should enable decision-makers to understand the reason underlying the model’s prediction
of customer behavior results so that they can tailor their decisions and adjust personalized
marketing strategies accordingly [4]. Decision-makers are often wary of or even reject AI
systems because the performance of the predictive model or system is overemphasized
while interpretability or transparency is ignored [33].
Many popular ML models are considered as “black boxes” [7,34], meaning that the
inner workings of the algorithms and associated decisions or classifications made are
secretive. For example, individual predictions often lack interpretability when predicting a

potential customer’s risk of discontinuing a telecommunications service. It is difficult to


relate the predicted probability of customer churn to customer characteristics, which creates
challenges in determining customer retention success and deciding on refinement strategies.
Additionally, from a legislative perspective, the interpretability of AI algorithm conclusions
is itself a right of consumers and regulators have a responsibility to ensure this right.
Studies have discussed data transparency and ethical issues from the perspective of users
or data subjects [35]. To address this problem, explainable AI (XAI) models are required to
provide details or interpretations that make the operations of AI understandable.
According to [36,37], explainable AI enables interested parties to comprehend the
primary influences on model-driven decisions as well as how AI algorithms carry out their
operations, forecast outcomes, and make decisions. The European General Data Protection
Regulation (GDPR) emphasizes that intelligent decisions should be accompanied by relevant
information about the logic involved and the significance and possible effects of their
processing for the data subject. Therefore, the GDPR grants data subjects the right to obtain
pertinent information regarding the basis for smart decision-making [38].
A few XAI approaches have been used for customer churn predictive models in the
existing literature, including global surrogate models, Partial Dependence Plots (PDPs),
Accumulated Local Effects (ALEs) [4], LIME [39], and Shapley values [40,41].
In the work of [4], a number of XAI methods were employed to explain the customer
churn predictive model by using a dataset of the telecom industry. The discussed global
explanation methods included Partial Dependence Plots (PDPs), Accumulated Local Effects
(ALEs), global surrogate models, and Shapley values, as well as local methods, including
Individual Conditional Expectation (ICE) and the local surrogate model (LIME). The results
show that a thorough understanding of the data, the inner workings of the model, and
the issue can be addressed by the combination of several interpretations and approaches
in a specific way. Scientists [42] proposed a novel XAI framework to identify the most
significant factors that influence a customer’s decisions on purchasing or abandoning non-
life insurance coverage. Their framework applied similarity clustering to the Shapley values
from a predictive model created by XGBoost. The results showed that the integration of
the clustering technique and Shapley values effectively grouped the customers, which was
used to forecast the churn decisions of customers. Researchers [7] provided an XAI solution
for predicting customer churn and outputting its explanations through visualization. This
system integrates two major methods, the Random Forest algorithm used to train a churn
classifier and Shapley values used to provide global and local explanations for the churn
classifier. Using LIME and Shapley techniques, Ref. [39] interpreted customer churn
predictive models constructed using Random Forest and light-gradient-boosting machines
from local and global perspectives, respectively.
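To make the Shapley-value idea behind these studies concrete, here is a brute-force sketch for a tiny scoring function. Everything below is invented for illustration (real tools such as the SHAP library use far more efficient approximations and work on trained models, not a hand-written linear score):

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Each feature's attribution is its average marginal contribution to
    the prediction over all feature orderings; features not yet 'revealed'
    are held at a baseline value. Exponential cost: illustration only."""
    n = len(instance)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = instance[i]   # reveal feature i
            now = model(current)
            phi[i] += now - prev       # marginal contribution of i
            prev = now
    return [v / len(orderings) for v in phi]

# hypothetical churn score: tenure (x[0]) lowers it, complaints (x[1]) raise it
def churn_score(x):
    return 0.5 - 0.1 * x[0] + 0.3 * x[1]

phi = shapley_values(churn_score, instance=[2.0, 1.0], baseline=[0.0, 0.0])
```

A useful property visible even in this toy: the attributions sum exactly to the prediction minus the baseline prediction, which is what lets managers read each feature's value as "how much it pushed this customer's churn score up or down".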
Personalized marketing initiatives, especially those related to managing customer
churn, are essential for companies looking to establish and uphold strong customer relation-
ships. The use of predictive analytics methods undoubtedly enhances the efficacy of these
endeavors [43]. For better customer churn prediction, various emerging ML algorithms
have been employed in the existing literature and have performed well in terms of accuracy.
Nevertheless, decisions produced via ML-based customer churn predictive models are not
easily understood by managers of customer relationships because of the black-box nature
of artificial intelligence. Further research in this area is nascent. Therefore, the aim of this
research is to contribute to the literature by improving the interpretability of customer
churn predictive models through the use of XAI interpretations to optimize their local and
global interpretability.

3. Research Methodology
The telecommunications sector has long struggled with churn. This research aims to create and deploy a cost-effective system for predicting client churn in the telecommunications sector. Addressing this issue is expected to yield a deeper comprehension of churning customers, enabling the identification of such customers and providing a foundation for future initiatives aimed at reducing the sector's churn rate. The methodology section discusses both the research approach selected and the machine learning models adopted.
3.1. The Crisp Model

The CRoss-Industry Standard Process for Data Mining (CRISP-DM) model is referred to as a standardized way of obtaining a good process via data mining across businesses and industries where data and modeling are a priority [44]. Researchers advise that, after twenty years of developing CRISP-DM, the emphasis is on data science and the methodologies should accommodate the need for data release, data architecting, data simulation, and data acquisition [45]. Business processes and demands can be centered based on the data-driven approach. In other words, we can check work progress, evaluate our outputs, and make decisions in real-time. By doing so, our efficiency and accuracy in our tasks can be significantly improved.

Thus, CRISP-DM models are an apparent methodological way of directing the research's procedure. The process diagram of the CRISP-DM model is depicted in Figure 1 below.

Figure 1. Crisp model cycle.

We first define a specific business problem, i.e., customer churn in the telecommunication industry, and set explicit goals to mitigate this problem through predictive modeling.
Following the CRISP-DM framework, we collected and preprocessed customer data from
the telecom industry, focusing on the characteristics that may indicate customer churn.
Throughout the modeling phase, we applied various data mining techniques such as Deci-
sion Trees, Random Forests, and Logistic Regression. After modeling, we evaluated the
performance of the models using rigorous metrics such as accuracy and area under the
curve to ensure that they meet the operational requirements of the business environment.
Finally, we translated the findings from the models into actionable strategies aimed at
customer retention.
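The evaluation step can be made concrete with a small sketch of the confusion-matrix metrics used in this study. The labels below are invented; churn is treated as the positive class, and accuracy, precision, and recall all derive from the same four counts:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall from confusion-matrix counts,
    treating churn (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # guard empty denom
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# hypothetical true labels vs. model predictions for five customers
m = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Precision answers "of the customers we flagged as churners, how many actually left?", while recall answers "of the customers who left, how many did we flag?" — the distinction matters when retention offers have a cost.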

3.2. Dataset
The dataset used in this study is a publicly available, large dataset. The aim was to
find a sufficiently large and recent dataset regarding churn within the telecommunication
industry. The dataset in question was selected based on meeting certain search criteria—it
could meet a representative size and was presumed to contain relevant explanatory vari-
ables such as demographics. Finally, a Telecom Customer Churn dataset from the Maven
Analytics website platform was selected. It consists of customer activity data (features),

along with a churn label specifying whether a customer canceled the subscription, which
we used to develop predictive models. The dataset consists of 7043 rows and 38 attributes.
The attributes consist of different pieces of information about the individual customer,
including the status of the customer’s subscription, which is categorized as churn and
not churn.
A high-quality dataset is required for further analysis; therefore, this study examined
each variable in the dataset for missing values. In order to select a predictive model with
optimal predictive accuracy, the research data was divided into two groups—the Train (75%) and Test (25%) datasets—taking into consideration the ratio of 2:1 (non-churner: churner) based on Wei and Chiu's [31] research. After testing data quality and
selecting variables, none of the observations or rows were deleted.
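A minimal sketch of this 75/25 split with scikit-learn is shown below. A small synthetic frame stands in for the Maven dataset, the column names are illustrative assumptions, and `stratify` is one plausible way to preserve the class ratio across the two partitions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the Maven Analytics Telecom Customer Churn table
# (7043 rows x 38 attributes in the real dataset); columns are illustrative.
df = pd.DataFrame({
    "Tenure in Months": range(100),
    "Monthly Charge": [20 + i * 0.5 for i in range(100)],
    "Churn": [1 if i % 3 == 0 else 0 for i in range(100)],  # roughly 2:1 non-churner:churner
})

X = df.drop(columns="Churn")
y = df["Churn"]

# 75/25 split; stratify=y keeps the non-churner:churner ratio
# the same in the Train and Test partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(len(X_train), len(X_test))  # 75 25
```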

3.3. Machine Learning Algorithms


Different machine learning techniques were adopted to create a customer churn
prediction system. These techniques were adopted to find the best possible way to predict
customer churn through a model with high predictive accuracy. The five techniques used
were the Logistic Regression model, the Random Forest classifier, the Naïve Bayes classifier,
the K-Nearest Neighbor classifier, and the Decision Tree classifier. All the models and
their functions are described below.
Logistic Regression: Logistic models attempt to create a regression model based on
data with a binary response variable. These models can be used, among other things,
to estimate population groups where a statement may be true or false, such as churn or
non-churn. In Logistic Regression, the logit function is used to determine the probability of
a binary outcome.
K-Nearest Neighbor (KNN) Classifier: KNN is a non-parametric algorithm and a
lazy-learner algorithm, meaning it does not make any assumptions about the underlying
data and does not immediately learn from the training set until new data is obtained. The
KNN algorithm assumes that the new data is similar to the existing data and assigns the
new data to the category that is most similar to the existing category based on the distance
between two points.
Naïve Bayes Classifier: This procedure is based on Bayes’ Theorem and the assump-
tion that the predictors are unrelated. In contrast to other models, the Naive Bayes model
is straightforward to create and is particularly successful when dealing with large datasets.
Additionally, it is easy to use.
Decision Tree: Decision Tree learning is a fundamental technique in decision theory.
These trees comprise a root at the top and knots that are interconnected by branches. Nodes
can be classified as either internal or terminal. At each internal node, a specific attribute
is tested, and the result guides the selection of different branches, eventually leading to a
terminal node. The terminal nodes, or “leaves”, correspond to a classification [46].
Random Forest Classifier: A Random Forest is a combination method that works
with Decision Trees as building blocks. The algorithm generates a predefined number of
trees and takes a cut of the total number of trees and uses it as its predictor [47]. In this
task, the Random Forest model can be adjusted to achieve the best possible performance by
adjusting the number of trees and the number of cuts used in the algorithm.
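The five techniques above map onto standard scikit-learn estimators. The sketch below fits all of them on synthetic data as one plausible setup; the hyperparameters (e.g., the number of trees) are illustrative defaults, not the paper's tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic churn-like data standing in for the real dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    # n_estimators controls the number of trees the forest aggregates.
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and record its test-set accuracy.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test) for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```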

3.4. Model Performance Evaluation


In classification problems, the following measures are used to assess the performance
of ML models.

3.4.1. Confusion Matrix


A Confusion Matrix is a well-known way of examining how well a classification
model predicts the observed classes. The Confusion Matrix is presented in Table 1
as follows.

Table 1. Confusion Matrix.

Observed 0 Observed 1
Estimated 0 TN FN
Estimated 1 FP TP

The four boxes in the classification table are all assigned a name: FN, FP, TP, and TN.
• TN stands for true negative. Here, the customers are observed as not being churners,
and the model has also classified the customers as non-churners.
• FP stands for false positive. Here, customers are observed as being non-churners, but
the model has classified the customers as churners.
• FN stands for false negative. Customers are observed as being churners, but the model
has classified the customers as non-churners.
• TP stands for true positive. Here, customers are both observed and classified as churners.
The Confusion Matrix is not only a visually advantageous way of measuring the
model’s ability to classify correctly. Several calculations can also be made based on the four
values. These calculations are all targets for more specific evaluations of the model and can,
therefore, be used to identify the model’s strengths and weaknesses.
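With scikit-learn, the four cells of Table 1 can be read off directly; note that, by that library's convention, `confusion_matrix(...).ravel()` returns them in the order TN, FP, FN, TP (toy labels below, with 1 = churner).

```python
from sklearn.metrics import confusion_matrix

# Toy observed vs. estimated labels (1 = churner, 0 = non-churner).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]

# ravel() flattens the 2x2 matrix into TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```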

3.4.2. Four Evaluation Metrics


This section describes four metrics used to assess the performance of churn prediction
models, including accuracy, error rate, specificity, and sensitivity.
Accuracy: Accuracy is the proportion of correctly classified observations out of all
customers classified. This is an overall score, which reflects the overall performance of
the model. Accuracy is determined as follows:

Accuracy = (TP + TN)/(TP + TN + FP + FN). (1)
Error rate: As opposed to accuracy, the error rate is the proportion of incorrectly
classified observations out of all customers classified. The error rate is calculated as follows:

Error rate = (FP + FN)/(TP + TN + FP + FN). (2)

Specificity: This is the true negative rate. The rate, thus, indicates how large a pro-
portion of the customers estimated as non-churners are correctly classified. Specificity is
determined as follows:
Specificity = TN/(TN + FP). (3)
Sensitivity: This is the true positive rate. The rate indicates what percentage of the
customers were estimated as churners and classified correctly. Sensitivity is calculated
as below:
Sensitivity = TP/(TP + FN). (4)
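As a concrete check, plugging the Random Forest test-set counts reported in Section 4.2 (TN = 641, FP = 109, FN = 88, TP = 670) into the four formulas above reproduces the 0.8694 accuracy figure quoted there; a minimal sketch:

```python
def accuracy(tp, tn, fp, fn):
    # Proportion of correctly classified observations.
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    # Proportion of incorrectly classified observations.
    return (fp + fn) / (tp + tn + fp + fn)

def specificity(tn, fp):
    # True negative rate.
    return tn / (tn + fp)

def sensitivity(tp, fn):
    # True positive rate.
    return tp / (tp + fn)

# Random Forest test-set counts from Section 4.2 of this paper.
tn, fp, fn, tp = 641, 109, 88, 670

print(round(accuracy(tp, tn, fp, fn), 4))    # 0.8694
print(round(error_rate(tp, tn, fp, fn), 4))  # 0.1306
print(round(specificity(tn, fp), 4))
print(round(sensitivity(tp, fn), 4))
```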

3.4.3. Other Evaluation Metrics


ROC stands for Receiver Operating Characteristic and is a visual representation of the
ability to classify correctly. The graph shows the relationship between the false positive
rate on the x-axis and the true positive rate on the y-axis. Three different ROC curves are
shown in Figure 2.

Figure 2. ROC curves.

The ROC curve in the graph to the left indicates a model that correctly predicts 50%
of the time and, therefore, also indicates random classification. The graph in the middle
indicates a model with improved predictive ability but that still leads to misclassification.
In the graph to the right, the ROC curve indicates that the model classifies close to perfect,
as the curve is very close to the upper left corner.
AUC stands for area under the curve and measures the area under the Receiver
Operating Characteristic (ROC) curve. It is one way to obtain a more accurate measurement
of the model’s performance. A perfect model will have an AUC of 1, after which the
model’s quality decreases as the AUC value decreases.

3.5. Explainable AI Techniques

Due to the black-box nature of machine learning algorithms, the operating principles of
the algorithms are difficult to understand and cannot be easily explained to decision-makers
in the telecom industry, especially if they do not have a computer science background.
As a result, although ML algorithms excel in terms of accuracy, they may not gain the
trust of decision-makers or users or be accepted in real-world enterprise management.
While aiming to address this issue, after evaluating the classifiers, this paper introduces the
concept of explainable AI in the study and illustrates the results of the best customer churn
prediction models through two visualization tools, including LIME and SHAP.
First, Local Interpretable Model-Agnostic Explanations (LIMEs) are employed to
optimize the local interpretability of the Decision Tree classifiers and Random Forest
classifiers. It emphasizes training locally interpretable models that may be used to explain
specific predictions and help decision-makers understand why a particular class was
predicted for a certain instance [46].
However, the purely local character of the LIME explanation, which only predicts
how the present prediction will change in response to minute changes in the input values,
is one of its limitations. As a result, rather than serving as an interpretation of why the
forecast was made in the first place, it acts more like a sensitivity study. Therefore, this
study also employs another explanation technique, SHAP, to show the overall importance
of the factors.
The SHapley Additive exPlanation (SHAP) is a method of interpreting the results of
any ML model by attributing an importance score to each feature in the data [48]. In this
study, summary plots are drawn to allow visualization of the importance of features and
their influence on prediction. The features are ranked according to the sum of the SHAP
values of all samples. The colors indicate the high and low SHAP values of the features,
i.e., blue indicates low importance and red indicates high importance. It makes use of the
SHAP values to show the impact distribution of each feature as well.

4. Analysis and Results


This section will present the results from the application of the machine learning model
described in Section 3 of this paper to the research dataset, as well as the model evaluation
procedure. The initial step of the analysis section involves the descriptive statistics, which
will help to build an understanding of the data for the subsequent analysis. The second
area consists of Sections 4.2 and 4.3, which show different but related results and analyses.
The section will conclude with a review of the variables and the machine learning technique
that provides the best algorithm to predict customer churn.

4.1. Descriptive Statistics


To develop a better understanding of the attributes, it is necessary to construct summary
statistics for all the features contained in the dataset used in this study. Table 2 shows
the outcomes.

Table 2. Table of descriptive statistics.

Count Mean Std Min 25% 50% 75% Max


Age 4601 47.89307 17.362 19 33 47 62 80
Number of Dependents 4601 0.380569 0.880541 0 0 0 0 8
Number of Referrals 4601 1.947403 2.957352 0 0 0 3 11
Tenure in Months 4601 34.63117 24.19849 1 12 32 58 72
Avg Monthly Long-Distance Charges 4601 25.5811 14.26574 1.01 13.02 25.84 37.97 49.99
Avg Monthly GB Download 4601 26.12889 19.53716 2 13 21 30 85
Monthly Charge 4601 81.20272 21.14967 −10 69.9 83.75 96.2 118.75
Total Charges 4601 3042.595 2391.057 42.9 847.3 2564.3 4968 8684.8
Total Refunds 4601 2.163306 8.286778 0 0 0 0 49.57
Total Extra Data Charges 4601 8.943708 28.62071 0 0 0 0 150
Total Long-Distance Charges 4601 888.9163 866.5069 1.13 178.89 582 1417.92 3536.64
Total Revenue 4601 3938.291 3054.189 46.92 1119.4 3378.79 6412.05 11979.34

From the 4601 customer samples used in this research study, the average number of
referrals is 1.95, with an SD of 2.96, which means that, on average, each customer referred
this telecom company to about two friends. The average download volume in gigabytes to the
end of the second quarter is 26.12 GB, and the customer’s total charges for additional data
downloads for the same quarter is 8.94, with an SD of 28.62. Table 2 also shows details
of the customers’ charge history, where the average values are 3042.59, 2.16, 8.94, and
888.91 for total charges, total refunds, total extra data charges, and total long-distance
charges, respectively.
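Summary statistics of this kind come directly from pandas' `describe`, which emits exactly the count/mean/std/min/quartile/max rows of Table 2. The values below are toy stand-ins, not the Table 2 data.

```python
import pandas as pd

# Stand-in for two of the numeric columns summarized in Table 2.
df = pd.DataFrame({
    "Tenure in Months": [1, 12, 32, 58, 72],
    "Monthly Charge": [25.0, 69.9, 83.75, 96.2, 118.75],
})

# describe() returns count, mean, std, min, 25%, 50%, 75%, max per column.
stats = df.describe()
print(stats.loc["mean", "Tenure in Months"])  # 35.0
```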
The bar chart presented below was computed to examine the distribution of the target
variable across the customer service call variable. It can be observed from the chart that
the dataset is unbalanced, with almost twice as many churning samples as non-churning
samples. See Figure 3.
To examine the inter-correlation among the features, a correlation matrix was com-
puted and displayed in the form of a heatmap to indicate the pairwise relationship among
the call activities and features of the telecom customers. See Figure 4.
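The distribution check and the correlation heatmap can be produced along these lines with pandas; the column names and values are illustrative stand-ins, and the seaborn call is commented out so the snippet runs headlessly.

```python
import pandas as pd

# Stand-in sample of the target and two numeric features.
df = pd.DataFrame({
    "Churn": [1, 1, 0, 1, 0, 1],
    "Tenure in Months": [2, 5, 60, 8, 55, 3],
    "Monthly Charge": [90, 85, 40, 95, 45, 88],
})

counts = df["Churn"].value_counts()     # data behind the Figure 3 bar plot
corr = df.drop(columns="Churn").corr()  # matrix behind the Figure 4 heatmap
print(counts.to_dict())  # {1: 4, 0: 2}
# import seaborn as sns; sns.heatmap(corr, annot=True)  # draws the heatmap
```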
Figure 3. Bar plot for customer churn distribution.

Figure 4. Heatmap of Correlation Matrix.

4.2. Results Based on Confusion Matrix


Customer churn is a critical issue for telecom companies, as it can significantly impact
their revenue and profitability. Therefore, accurate prediction of customer churn is an
Figure 4. Heatmap of Correlation Matrix.
important task in the industry. This study applied five different ML algorithms to predict
customer churn in the telecom industry, including Logistic Regression, KNN model, Naïve
Bayes, Decision Tree, and Random Forest. Their performance was first evaluated using a
Confusion Matrix, which consisted of values of true negatives (TN) and true positives (TP),
false negatives (FN) and false positives (FP). As stated in Section 3.4.1, TNs are cases where
the actual negative is also predicted to be negative, while TPs are cases where the actual
positive is also predicted to be positive. FNs are cases that are positive but predicted to be
negative, while FPs are cases that are actually negative but predicted to be positive. See
Figure 5.

Figure 5. Confusion Matrix values for the five algorithms.

A clustered bar chart was drawn to illustrate the performance of these algorithms in
terms of these four metrics. Random Forest appeared to have the highest number of TP
and TN and the lowest number of FN and FP, indicating that it may be the best-performing
model of the five. It can be observed that 641 customers from the test dataset are correctly
classified as non-churners, 109 of the customers who are non-churners are incorrectly
classified as churners, 88 of the customers that are churners are misclassified as
non-churners, and 670 of the customers are classified correctly as churners. The prediction
accuracy of the Random Forest algorithm was computed to be equal to 0.8694. This implies
that 86.94% of the customers are correctly classified by the Random Forest model. Random
Forest is effective in identifying both customers at risk of churn (TP) and those unlikely to
churn (TN), while avoiding misclassifying non-churning customers as churners, which can
lead to customer dissatisfaction, or churning customers as non-churning, which means
missed opportunities to take action to retain them, leading to loss of profit.
In terms of the TP metric, Decision Tree was the second-best choice after Random
Forest, indicating that it was excellent at identifying customers at risk of churn (TP: 614).
However, its relatively lower TN value of 593 compared to the other algorithms indicates
that it performs poorly in identifying non-churning customers. Furthermore, its relatively
low FN value (144) and high FP value (157) confirm that the Decision Tree has a tendency
to classify customers as those at risk of churn. Therefore, telecom companies may want
to carefully assess the accuracy of the algorithm’s positive predictions before taking any
retention action.
Naïve Bayes and Logistic Regression performed similarly with moderate levels of
performance, with high numbers of true negatives (TN: 617 for Naïve Bayes; 555 for Logistic
Regression) and true positives (TP: 587 for Naïve Bayes; 584 for Logistic Regression), as well
as high numbers of false negatives (FN: 171 for Naïve Bayes; 174 for Logistic Regression)
and false positives (FP: 133 for Naïve Bayes; 195 for Logistic Regression). This suggests
that these algorithms
can effectively identify customers who are likely to churn (TP) and those who are unlikely
to churn (TN), but it may also misclassify some customers as either those who will not
churn (FN) or those who will churn (FP).
The KNN model is the worst performer of all the algorithms. Although it performs
well in identifying non-churning customers, as evidenced by the high number of TNs
(635 customers), it performs rather poorly in identifying customers at risk of churning (TP:
440 customers). In addition, it has a high number of FN (318 customers) and a relatively
low number of FP (115 customers) compared to the other models, which could indicate
a tendency to misclassify positive instances (i.e., customers who will churn) as negative
(i.e., customers who will not churn). This could potentially lead to missed opportunities for
telecom companies to take retention action.
The area under the Receiver Operating Characteristic (ROC) curve is a metric widely
used to assess the overall performance of binary classification models. The results of
AUC-ROC for all algorithms tested in this study are shown in Figure 6. Random Forest
had the highest score of 0.95, followed closely by Naïve Bayes, with a score of 0.88. These
two models were the most effective at predicting customer churn in the telecom industry in
this study. Logistic Regression also performed well, with an AUC score of 0.84, indicating
that it is a reliable algorithm for predicting customer churn. KNN and Decision Tree had
lower AUC scores of 0.81 and 0.8, respectively, suggesting that they may be less accurate in
distinguishing between the customers at risk of churn and customers who will not churn.
See Figure 6.

(a) Logistic Regression Model. (b) KNN Model. (c) Naïve Bayes Model.

(d) Decision Tree Model. (e) Random Forest Model.

Figure 6. ROC curves for the five models.
In summary, from the analysis of the Confusion Matrix and AUC-ROC curve, Random
Forest was the most effective model for automatically detecting customer churn risk for
telecom companies. The performance of the five models will be further evaluated in the
next section.

4.3. Results of Analysis and Discussion

This study aims to identify the system that produces good predictions on customer
churn through machine learning models. Therefore, it is necessary to compare the models
selected in this study and identify the model with the best predictive ability. Presented in
Table 3 are the model evaluation criteria considered in this study.

Table 3. Model evaluation results.

Model Accuracy AUC Sensitivity Specificity
Logistic Regression 0.7553 0.84 0.7400 0.7704
KNN 0.7129 0.81 0.8467 0.5805
Naïve Bayes 0.7984 0.88 0.8227 0.7744
Decision Tree 0.8004 0.80 0.7907 0.8100
Random Forest 0.8694 0.95 0.8547 0.8839

The accuracy measure is the ability of a machine learning model to make a correct
prediction. As observed from Table 3 above, the machine learning models based on the
Naïve Bayes, Decision Tree, and Random Forest algorithms computed high prediction
accuracy, performing well with accuracy measures of around 80% or above. The Random
Forest classifier computed the highest accuracy (86.94%), around 7 percentage points better
than the Decision Tree model, which computed an 80.04% accuracy measure. Figure 7
depicts a graphical representation of the model evaluation metrics.

0.8

0.7

0.6
0.5

0.4

0.3

0.2

0.1

0
Logistic regression KNN Naïve bayes Decision tree Random forest

Accuracy AUC sensitivity Specificity

Figure 7. Model Evaluation.


Figure 7. Model Evaluation.

4.4. Model Interpretation


According to the model comparison results, the best customer churn prediction
model was created from the Random Forest algorithm, which had a prediction accuracy of
close to 90%. After that, to address the black-box nature of machine learning algorithms,
this paper illustrates the results of customer churn prediction models through two
explainable AI tools, including LIME and SHAP.
AI tools, including LIME and SHAP.

4.4.1. Local Interpretable Model-Agnostic Explanations (LIME)

Figure 8 shows the local explanations of the Random Forest classifier for the first
sample and the second sample using the LIME technique. According to Figure 8, the five
most significant factors in Random-Forest-based churn detection for the first customer are
the ‘Contract’, ‘Number of Dependents’, ‘Online Security’, ‘Premium Tech Support’, and
‘Number of Referrals’. Furthermore, an increase in the value of these five factors will drive
the prediction toward not churning, while an increase in the value of ‘Payment Method’
and ‘Monthly Charge’ will increase the likelihood of churning. For the second customer,
‘Monthly Charge’ was more important than ‘Premium Tech Support’ and was one of the
top five factors that influenced the customer’s subscription decision. In addition, contrary
to the first customer, LIME indicates that an increase in the value of ‘Contract’, ‘Number of
Dependents’, and ‘Number of Referrals’ can increase the likelihood of the second
customer’s churn. See Figure 8.

Figure 8. Local explanation for (a) the first sample (upper) and (b) the second sample (lower) of
the Random Forest classifier.

However, the LIME explanation only predicts how the present prediction will change
in response to minute changes in the input values. Therefore, this study also employs
another explanation technique, SHAP, to show the overall importance of the factors. The
results are shown in the following subsection.

4.4.2. SHapley Additive exPlanation (SHAP)

The summary plots for the Random Forest classifier computed using the SHAP
technique are shown in Figure 9. In terms of the overall importance of the features, ‘Contract’,
‘Number of Referrals’, ‘Tenure in Months’, ‘Monthly Charge’, and ‘Online Security’ are the
five factors that make the most contribution to the prediction outcomes of the Random
Forest classifier. To be specific, high values of ‘Contract’, ‘Number of Referrals’, ‘Tenure in
Months’, and ‘Online Security’ increase the possibility of customer churn predicted by the
classifiers, while the low value of ‘Monthly Charge’ increases the possibility.

Figure 9. Summary plots for Random Forest classifier.



In addition, to provide a global explanation, SHAP values can also be used to explore
the prediction made for an individual sample. Therefore, this paper computes the SHAP
explanation for the same two samples, as investigated using LIME in Section 4.4.1, and
conducts a comparison.
According to Figure 10, the most significant factors for the first customer sample that
push the prediction toward churning are ‘Payment Method’, ‘Age’, and ‘Monthly Charge’,
and the ones that push the prediction toward not churning are ‘Contract’, ‘Online Security’,
‘City’, and ‘Number of Referrals’. Although ‘Age’ is not included in the top 10 groups
of LIME, LIME and SHAP are able to corroborate each other’s judgment about the
significance and direction of the factors for the first sample. For the second customer sample, as
shown in Figure 10, SHAP shows that the ‘Number of Referrals’, ‘Contract’, ‘Number of
Dependents’, and ‘Monthly Charge’ are the most significant factors that push the
prediction toward churning, which is consistent with the LIME explanation as well.

Figure 10. SHAP explanation for (a) the first sample (upper) and (b) the second sample (lower) of
the Random Forest classifier.

In summary, the model explanations produced by the LIME and SHAP techniques are
consistent with each other. By showing decision-makers in the telecom industry how
customer-related factors affect customer retention, these interpretation figures let
decision-makers understand how the AI makes its predictions, even if they do not
understand the underlying algorithms.
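The Shapley attributions behind these figures can also be reproduced from first principles. The sketch below is illustrative only: the study itself applied the SHAP library to its fitted Random Forest, whereas here a tiny linear scoring function with invented weights stands in for the classifier, and the baseline and sample values are made up. It computes exact Shapley values by averaging each feature's marginal contribution over all coalitions of the other features:

```python
from itertools import combinations
from math import factorial

# Toy scoring function standing in for a churn classifier's output
# (hypothetical weights; not the paper's fitted model).
def model(x):
    return 0.5 * x[0] - 0.3 * x[1] + 0.2 * x[2]

baseline = [0.0, 0.0, 0.0]   # reference input ("feature absent")
sample = [1.0, 2.0, 3.0]     # the customer being explained
n = len(sample)

def value(coalition):
    # Evaluate the model with only the coalition's features "present";
    # absent features are replaced by their baseline values.
    x = [sample[i] if i in coalition else baseline[i] for i in range(n)]
    return model(x)

def shapley(i):
    # Classic Shapley formula: weight each coalition S (not containing i)
    # by |S|! (n - |S| - 1)! / n! and sum i's marginal contributions.
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(n):
        for coal in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            total += weight * (value(set(coal) | {i}) - value(set(coal)))
    return total

phi = [shapley(i) for i in range(n)]
# Efficiency property: attributions sum to prediction minus baseline output.
print(phi, sum(phi), model(sample) - model(baseline))
```

For a linear model this reduces to each weight times the feature's deviation from baseline, which is a useful sanity check on the implementation.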

5. Discussion and Recommendations


5.1. Relevance to This Emerging Area and Its Contributions
Due to the black-box nature of machine learning algorithms, decision-makers in any
industry may find an advanced customer churn system difficult to comprehend and may,
therefore, reject it. To address this issue, this paper introduces the concept of explainable
ensemble learning, built on ensemble-based machine learning, into business analytics and
illustrates the results of the best customer churn prediction model through two
visualization tools, LIME and SHAP.
We implemented multiple classifiers including Logistic Regression, Random Forests,
Naïve Bayes, Decision Trees, and KNN to compare ensemble techniques like bagging and
boosting against individual methods [49]. The ensemble approach of combining diverse
models clearly provided superior predictive performance over any individual learner for
this key business analytics challenge.
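A comparison of this kind can be sketched compactly with scikit-learn. The snippet below is a generic illustration on synthetic data, not the study's telecom dataset; the model settings and sample sizes are arbitrary, and gradient boosting stands in for the boosted trees used in the paper:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a churn dataset (binary target).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest (bagging)": RandomForestClassifier(n_estimators=200,
                                                      random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model and record held-out accuracy for a side-by-side comparison.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} accuracy = {acc:.3f}")
```

On data like this, the ensemble learners typically rank above the individual learners, mirroring the pattern reported in the study.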
Consequently, the developed predictive system is sufficient for distinguishing churn-
ers and non-churners and helps business developers in the telecommunication industry
to conduct a more efficient campaign for customer retention, which will help to reduce
marketing costs and churn rate effectively. The use of analytics and explainable ensemble
learning can deliver critical insights effectively to all customers, stakeholders, and employ-
ees. Therefore, it makes decision-making more streamlined and straightforward, and the
employees and other stakeholders can revise their work focus.

5.2. Implications and Relevance to This Special Issue


This research aimed to develop a predictive model for customer turnover in the
telecommunications industry that could distinguish between customers who are likely
to churn in the near future and those who are loyal to the company. The value of such a
model to a firm is that it avoids wasting money on ineffective mass marketing and allows
organizations to target true churners by singling out the customers who are likely to
churn. Furthermore, as explained in Section 1, obtaining a new customer costs eight
times as much as keeping an existing one; as a result, because the churn predictive model
can identify future churners, businesses seeking to keep their clientele can concentrate
on retention strategies, which are less expensive than acquisition strategies.
In addition, this study also improves the interpretability of customer churn predictive
models through the use of LIME and SHAP to optimize their local and global
interpretability. It provides decision-makers with an overall explanation of the factors
affecting the
customer’s decision to churn, as well as a specific analysis for every single customer. As
a result, decision-makers can more easily understand the results of advanced predictive
models and use these explanations to develop global strategies and customize strategies
for specific customer segments.
The conclusions of this study have significant consequences for telecommunication
companies. Apart from the notion of establishing a predictive system for predicting
customer turnover in the telecommunication industry, the findings of this study may be
applied to the banking industry in terms of developing a churn prediction model for debit
card customers.
To generalize the previous discussion, based on the research results, we advocate that
companies employ machine learning approaches to transform the existing consumer
information in their databases into meaningful information that can support their
marketing efforts. They would also benefit from using machine learning to create a
projected

churn model, which could act as an alert system for organizations and help them to spend
their retention money effectively.

5.3. Limitations of the Study


There are limitations to this study. Some of the constraints were that we could not access
some types of customer data, such as billing and credit information, due to telecom data
categorization and confidentiality restrictions. This was a significant research constraint.
Another constraint in conducting this study was the lack of demographic information
on the clients. Therefore, it was not possible to include such criteria in the classification
process, which would have improved the classification accuracy and interpretability.

6. Conclusions and Recommendations


The telecommunication industry is hit the hardest by churn, with an average annual
churn rate of 30%, resulting in a tremendous waste of money and effort. It is essential to
understand why customers leave in order to minimize the damage to the bottom line of
the business.
Since churn is certain to result in lost revenue, and the cost of customer acquisition
is equivalent to 5–10 times the cost spent on customer retention or satisfaction, churn
forecasting has become a major task for companies to remain competitive. Beyond helping
companies reduce customer churn so that they can survive or grow, churn forecasting can
be used in the long term to obtain continuously updated, real-time customer data.
Therefore, the principal aim of this study was to build a prediction system that can help to
identify customers who are likely to leave, specifically in the telecommunication industry.
Various data mining methods like Decision Trees, Random Forests, and Logistic Regression
were utilized by experts to build the predictive model. This study created a comparison
system, since comparing several machine learning algorithms side by side is a more
interpretable and intelligible approach. The performance of the models was evaluated
using accuracy, area under the curve, and sensitivity and specificity measures. The
Random Forest classifier is the best classifier in this study, with an accuracy of 91.66%,
and it also performed well on the other evaluation metrics.
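The evaluation measures named above follow directly from a confusion matrix. The snippet below shows the arithmetic with hypothetical counts (chosen for illustration, not the study's actual results):

```python
# Hypothetical confusion-matrix counts for a churn classifier.
tp, fn = 410, 90     # churners correctly caught / missed
tn, fp = 1420, 80    # loyal customers correctly kept / falsely flagged

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)      # recall on churners
specificity = tn / (tn + fp)      # recall on non-churners
precision = tp / (tp + fp)        # how many flagged customers really churn

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f} precision={precision:.3f}")
```

Sensitivity matters most for retention campaigns (missed churners are lost revenue), while precision controls how much retention budget is spent on customers who would have stayed anyway.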
Moreover, this paper contributes to the field by introducing the concept of explainable
ensemble learning. By integrating various tools, such as LIME and SHAP, this study
provides decision-makers with clear, actionable insights into the factors driving customer
churn. This advancement is particularly significant in addressing the opaque nature of
machine learning algorithms, making advanced analytics accessible and applicable for
strategic decision-making in business settings.

Author Contributions: Conceptualization, V.C. and F.O.A.; methodology, V.C. and F.O.A.; software,
V.C., F.O.A. and M.A.G.; validation, V.C. and Q.A.X.; formal analysis, V.C. and K.H.; investigation,
V.C.; resources, V.C.; data curation, F.O.A. and M.A.G.; writing—original draft preparation, V.C. and
F.O.A.; writing—review and editing, V.C., K.H., Q.A.X. and V.B.; visualization, K.H., Q.A.X., F.O.A.
and M.A.G.; supervision, V.C.; project administration, V.C.; funding acquisition, V.C. All authors
have read and agreed to the published version of the manuscript.
Funding: This research is partly supported by VC Research (VCR 0000183) for Prof. Chang.
Data Availability Statement: The authors do not own the data.
Acknowledgments: Thank you to the reviewers and the editor for their reviews.
Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Eklof, J.; Podkorytova, O.; Malova, A. Linking customer satisfaction with financial performance: An empirical study of Scandina-
vian banks. Total Qual. Manag. Bus. Excell. 2020, 31, 1684–1702. [CrossRef]
2. Madhani, P.M. Enhancing Customer Value Creation with Market Culture: Developing 7Cs Framework. IUP J. Manag. Res. 2018,
17, 46–64.

3. Chouiekh, A. Deep convolutional neural networks for customer churn prediction analysis. Int. J. Cogn. Inform. Nat. Intell. 2020,
14, 1–16. [CrossRef]
4. Duval, A. Explainable Artificial Intelligence (XAI); MA4K9 Scholarly Report; Mathematics Institute, The University of Warwick:
Coventry, UK, 2019; pp. 1–53.
5. Reilly, J. The Machine Learning Revolution: Telco Customer Churn Prediction. Technical Paper. 2023. Available online:
https://www.akkio.com/post/telecom-customer-churn (accessed on 13 April 2024).
6. Yulianti, Y.; Saifudin, A. Sequential feature selection in customer churn prediction based on Naive Bayes. IOP Conf. Ser. Mater. Sci.
Eng. 2020, 879, 012090. [CrossRef]
7. Leung, C.K.; Pazdor, A.G.; Souza, J. Explainable artificial intelligence for data science on customer churn. In Proceedings of the
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), Porto, Portugal, 6–9 October 2021;
pp. 1–10.
8. Mishra, A.; Reddy, U.S. A comparative study of customer churn prediction in telecom industry using ensemble based classifiers.
In Proceedings of the 2017 International Conference on Inventive Computing and Informatics (ICICI), Coimbatore, India, 23–24
November 2017; pp. 721–725.
9. De Caigny, A.; Coussement, K.; De Bock, K.W. A new hybrid classification algorithm for customer churn prediction based on
logistic regression and decision trees. Eur. J. Oper. Res. 2018, 269, 760–772. [CrossRef]
10. Lalwani, P.; Mishra, M.K.; Chadha, J.S.; Sethi, P. Customer churn prediction system: A machine learning approach. Computing
2022, 104, 271–294. [CrossRef]
11. Yu, R.; An, X.; Jin, B.; Shi, J.; Move, O.A.; Liu, Y. Particle classification optimization-based BP network for telecommunication
customer churn prediction. Neural Comput. Appl. 2018, 29, 707–720. [CrossRef]
12. Fujo, S.W.; Subramanian, S.; Khder, M.A. Customer churn prediction in telecommunication industry using deep learning. Inf. Sci.
Lett. 2022, 11, 24.
13. Kampani, N.; Jhamb, D. Analyzing the role of e-crm in managing customer relations: A critical review of the literature. J. Crit.
Rev. 2020, 7, 221–226.
14. Nelson, C.A.; Walsh, M.F.; Cui, A.P. The role of analytical CRM on salesperson use of competitive intelligence. J. Bus. Ind. Mark.
2020, 35, 2127–2137. [CrossRef]
15. Al-Homery, H.; Asharai, H.; Ahmad, A. The core components and types of CRM. Pak. J. Human. Soc. Sci. 2019, 7, 121–145.
[CrossRef]
16. Calzada-Infante, L.; Óskarsdóttir, M.; Baesens, B. Evaluation of customer behavior with temporal centrality metrics for churn
prediction of prepaid contracts. Expert Syst. Appl. 2020, 160, 113553. [CrossRef]
17. Hendriyani, C.; Auliana, L. Transformation from relationship marketing to electronic customer relationship management: A
literature study. Rev. Integr. Bus. Econ. Res. 2018, 7, 116–124.
18. Chagas, B.N.R.; Viana, J.A.N.; Reinhold, O.; Lobato, F.; Jacob, A.F.; Alt, R. Current applications of machine learning techniques in
CRM: A literature review and practical implications. In Proceedings of the 2018 IEEE/WIC/ACM International Conference on
Web Intelligence (WI), Santiago, Chile, 3–6 December 2018; pp. 452–458.
19. Eachempati, P.; Srivastava, P.R.; Kumar, A.; de Prat, J.M.; Delen, D. Can customer sentiment impact firm value? An integrated
text mining approach. Technol. Forecast. Soc. Chang. 2022, 174, 121265. [CrossRef]
20. Lamrhari, S.; El Ghazi, H.; Oubrich, M.; El Faker, A. A social CRM analytic framework for improving customer retention,
acquisition, and conversion. Technol. Forecast. Soc. Chang. 2022, 174, 121275. [CrossRef]
21. Mahmoud, M.A.; Hinson, R.E.; Adika, M.K. The effect of trust, commitment, and conflict handling on customer retention: The
mediating role of customer satisfaction. J. Relat. Mark. 2018, 17, 257–276. [CrossRef]
22. De Caigny, A.; Coussement, K.; De Bock, K.W.; Lessmann, S. Incorporating textual information in customer churn prediction
models based on a convolutional neural network. Int. J. Forecast. 2020, 36, 1563–1578. [CrossRef]
23. De Caigny, A.; Coussement, K.; Verbeke, W.; Idbenjra, K.; Phan, M. Uplift modeling and its implications for B2B customer churn
prediction: A segmentation-based modeling approach. Ind. Mark. Manag. 2021, 99, 28–39. [CrossRef]
24. Seymen, O.F.; Dogan, O.; Hiziroglu, A. Customer Churn Prediction Using Deep Learning. In Proceedings of the 12th International
Conference on Soft Computing and Pattern Recognition (SoCPaR 2020). SoCPaR 2020; Advances in Intelligent Systems and Computing;
Springer: Cham, Switzerland, 2021; Volume 1383. [CrossRef]
25. Guo, W.W.; Xue, H. Crop yield forecasting using artificial neural networks: A comparison between spatial and temporal models.
Math. Probl. Eng. 2014, 2014, 857865. [CrossRef]
26. Shahi, T.B.; Shrestha, A.; Neupane, A.; Guo, W. Stock price forecasting with deep learning: A comparative study. Mathematics
2020, 8, 1441. [CrossRef]
27. Aldunate, Á.; Maldonado, S.; Vairetti, C.; Armelini, G. Understanding customer satisfaction via deep learning and natural
language processing. Expert Syst. Appl. 2022, 209, 118309. [CrossRef]
28. Raza, K. Improving the prediction accuracy of heart disease with ensemble learning and majority voting rule. In U-Healthcare
Monitoring Systems; Academic Press: Devon, UK, 2019; pp. 179–196.
29. Ahmad, I.; Yousaf, M.; Yousaf, S.; Ahmad, M.O. Fake news detection using machine learning ensemble methods. Complexity 2020,
2020, 8885861. [CrossRef]

30. Sahoo, A.K.; Pradhan, C.; Das, H. Performance evaluation of different machine learning methods and deep-learning based
convolutional neural network for health decision making. Nat. Inspired Comput. Data Sci. 2020, 871, 201–212.
31. Wei, C.P.; Chiu, I.T. Turning telecommunications call details to churn prediction: A data mining approach. Expert Syst. Appl. 2002,
23, 103–112. [CrossRef]
32. Anees, R.T.; Nordin, N.A.; Anjum, T.; Cavaliere, L.P.L.; Heidler, P. Evaluating the Impact of Customer Relationship Management
(CRM) Strategies on Customer Retention. Bus. Manag. Strateg. 2020, 11, 117–133. [CrossRef]
33. Shrestha, Y.R.; Ben-Menahem, S.M.; Von Krogh, G. Organizational decision-making structures in the age of artificial intelligence.
Calif. Manag. Rev. 2019, 61, 66–83. [CrossRef]
34. Buchanan, B.G.; Wright, D. The impact of machine learning on UK financial services. Oxf. Rev. Econ. Pol. 2021, 37, 537–563.
[CrossRef] [PubMed]
35. Fernández-Rovira, C.; Valdés, J.Á.; Molleví, G.; Nicolas-Sans, R. The digital transformation of business. Towards the datafication
of the relationship with customers. Technol. Forecast. Soc. Chang. 2021, 162, 120339. [CrossRef]
36. Bauer, K.; von Zahn, M.; Hinz, O. Expl(AI)ned: The impact of explainable artificial intelligence on users’ information processing.
Inf. Syst. Res. 2023, 34, 1582–1602. [CrossRef]
37. Teng, Y.; Zhang, J.; Sun, T. Data-driven decision-making model based on artificial intelligence in higher education system of
colleges and universities. Expert Syst. 2023, 40, e12820. [CrossRef]
38. Djeffal, C. The Normative Potential of the European Rule on Automated Decisions: A New Reading for Art. 22 GDPR. Z.
Ausländisches Öffentl. Recht Völkerr. 2020, 81, 847–879.
39. Nkolele, R.; Wang, H. Explainable Machine Learning: A Manuscript on the Customer Churn in the Telecommunications Industry.
In Proceedings of the 2021 Ethics and Explainability for Responsible Data Science (EE-RDS), Johannesburg, South Africa, 27–28
October 2021; pp. 1–7.
40. Shapley, L.S. A value for n-person games. In Contributions to the Theory of Games; Princeton University Press: Princeton, NJ, USA,
1953; Volume 2, pp. 307–317.
41. Štrumbelj, E.; Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst.
2014, 41, 647–665. [CrossRef]
42. Gramegna, A.; Giudici, P. SHAP and LIME: An evaluation of discriminative power in credit risk. Front. Artif. Intell. 2021,
4, 752558. [CrossRef] [PubMed]
43. Kwon, O.; Lee, N.; Shin, B. Data quality management, data usage experience and acquisition intention of big data analytics. Int. J.
Inf. Manag. 2014, 34, 387–394. [CrossRef]
44. Provost, F.; Fawcett, T. Data science and its relationship to big data and data-driven decision making. Big Data 2013, 1, 51–59.
[CrossRef] [PubMed]
45. Martínez-Plumed, F.; Contreras-Ochando, L.; Ferri, C.; Hernández-Orallo, J.; Kull, M.; Lachiche, N.; Ramírez-Quintana, M.J.;
Flach, P. CRISP-DM twenty years later: From data mining processes to data science trajectories. IEEE Trans. Knowl. Data Eng.
2019, 33, 3048–3061. [CrossRef]
46. Hakkoum, H.; Idri, A.; Abnane, I. Artificial neural networks interpretation using LIME for breast cancer diagnosis. In Trends and
Innovations in Information Systems and Technologies; Springer: Cham, Switzerland, 2020; pp. 15–24.
47. Hutter, F.; Xu, L.; Hoos, H.H.; Leyton-Brown, K. Algorithm runtime prediction: Methods & evaluation. Artif. Intell. 2014, 206,
79–111.
48. Antwarg, L.; Miller, R.M.; Shapira, B.; Rokach, L. Explaining anomalies detected by autoencoders using Shapley Additive
Explanations. Expert Syst. Appl. 2021, 186, 115736. [CrossRef]
49. Lessmann, S.; Baesens, B.; Seow, H.V.; Thomas, L.C. Benchmarking state-of-the-art classification algorithms for credit scoring: An
update of research. Eur. J. Oper. Res. 2015, 247, 124–136. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
