
Clustering Comparison of Customer Attrition Dataset using Machine Learning Algorithms

This study compares various machine learning algorithms for predicting customer attrition, focusing on techniques like KMeans, Affinity, and Agglomerative Clustering. The research aims to enhance customer retention strategies by identifying patterns in customer behavior and predicting churn using historical data. The findings suggest that effective clustering and predictive analytics can significantly improve customer relationship management in competitive industries such as telecommunications.

Volume 9, Issue 4, April – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/24apr643

Clustering Comparison of Customer Attrition Dataset using Machine Learning Algorithms
Anitha R¹; Aameer Khan S²; Harini Murugan³; Nithisshkrishna KS⁴
¹⁻⁴ Department of Artificial Intelligence and Machine Learning,
Rajalakshmi Engineering College, Chennai, India

Publication Date: 2025/03/19

Abstract: In today's dynamic business environment, customer retention is a critical factor for sustainable growth and success. This project focuses on developing and comparing machine learning models for customer attrition and churn prediction using clustering algorithms such as Affinity Propagation, Birch, KMeans, and Agglomerative Clustering. The objective of this study is to evaluate the effectiveness of these clustering algorithms in identifying patterns and predicting customer churn. Using a dataset containing historical customer data, the project aims to create prediction models that can help firms proactively address possible churn and implement targeted retention efforts. The study is significant because it can give businesses predictive analytics capabilities that enhance their customer relationship management strategies by identifying which customers are likely to leave. In addition, the project intends to perform label selection by evaluating each feature individually according to its impurity score, and to perform cluster classification to choose the optimal cluster according to its metrics. The study concentrates on the key machine learning methods for estimating customer churn. Retention efforts informed by these predictions can include improving customer service, offering loyalty programs, or adjusting pricing strategies.

Keywords: Customer Attrition - Apache Spark - K-Means Clustering - Web Application - Customer Retention - Logistic Regression - Machine Learning Algorithms.

I. INTRODUCTION

Customers are the most valuable resource in any industry, since they are the primary engine of profit generation. Organizations today understand that they should invest in strategies that encourage customer retention and satisfaction. For many years, businesses have used customer churn analysis to increase revenue and build long-lasting relationships with their clients [16]. Churners are customers who move to other companies for any of several reasons. To reduce customer turnover, an organization must be able to forecast customer actions accurately and to identify the underlying causes that are under its control. Churn prediction is a binary classification task that distinguishes churners from non-churners. Customer attrition prediction has become a critical area of focus and has gained interest in recent times. Telecommunication companies, in particular, face increasing pressure to retain their customers in a saturated market, and the emergence of customer attrition prediction has led to a significant change in the telecom industry. Churn is one of the most important service aspects in the telecommunications industry [9]. Customer data collected from multiple, diverse contact points empowers companies to develop personalized products, nurture innovation, tailor products and services, and thereby enhance customer satisfaction and competitive advantage [2]. Because they cannot forecast when a consumer will leave, many businesses frequently experience customer loss.

The main goal of this research is to provide telecom businesses with rapid and precise techniques for identifying consumers who are likely to leave. In the telecommunication industry, customers churn quite often. Because the industry is highly competitive, telecom companies actively monitor customer behavior, predict churn through data-driven insights, and allocate resources strategically to retain customers [3]. Customer churn is the measure, or estimate, of the proportion of customers who switch to an alternative provider [12].

Reactive and proactive customer churn management are the two main approaches, as described by Van den Poel and Burez [1]. Under a reactive strategy, the business waits until the client asks to terminate the service relationship and then offers the client an incentive to stay. Under a proactive approach, the business tries to identify clients who are likely to churn before they actually do so, and then gives them special incentives to prevent them from leaving. Machine learning techniques that learn from data iteratively can support this: K-means clustering and logistic regression are two popular algorithms that can be used to forecast client attrition. Logistic regression is a supervised learning approach that can be used to model the likelihood of a binary outcome, such as a customer's likelihood of churning. To calculate the likelihood of churn, a linear model is built using several independent

IJISRT24APR643 www.ijisrt.com 3432


variables, including customer satisfaction, consumption trends, and demographics. K-means clustering is an unsupervised learning approach that groups data points according to how similar they are. It iteratively divides the data into a predefined number of clusters so as to minimize the distance between data points within a cluster and increase the distance between data points in distinct clusters. While there are numerous methods for predicting and assessing customer attrition, only a small number of them generate accurate forecasts and work well with large amounts of data.

II. METHODOLOGY

• Random Forest Algorithm (Xiancheng Xiahou):
The reviewed study uses the Random Forest algorithm, employed as a robust and efficient feature selection method. Random Forest is widely recognized for its high classification accuracy, its robustness to noise and outliers, and its ability to generalize well across domains including business management, economics, finance, and the biological sciences. Given that the dataset has 17 variables, the challenge was to determine the number of features (M) to include in the predictive model. To address this, the Out-of-Bag (OOB) error was used as the criterion for feature selection. To compute the OOB error, a different bootstrap sample of the training set was used to build each tree in the Random Forest. As the number of randomly selected features was varied, the differences in the OOB error rates were minimal. This suggested that the choice of the feature count (M) did not significantly affect the model's performance. Consequently, four features were selected in each iteration, which gave a relatively low OOB error. Four variables were identified as crucial for predicting customer churn: "Night Buy," "PM Buy," "Night PV," and "PM PV." These variables were considered key indicators of customer loss in the churn prediction model [15].

• Support Vector Machine (SVM):
In data analysis, the abundance of available data often necessitates classification, that is, grouping the data into categories or types such as audio, video, and text patterns. This categorization is fundamental to effective data mining, which encompasses a range of functionalities such as classification, discrimination, association, and clustering. Many comprehensive systems are designed to provide a suite of data mining functionalities within a single platform (Neha and Vikram, 2015). One notable classification technique is the Support Vector Machine (SVM), which excels in handling linear permutations of subsets within a training dataset. SVM aims to find a maximum-margin separating hyperplane in a high-dimensional feature space, which is particularly useful when dealing with nonlinearly separable data features (Nadeem, Umar, and Shahzad, 2018). This method effectively organizes data based on the most significant characteristics, even where the vectors are nonlinearly separable [14]. In the SVM formulation, a few key quantities play essential roles:

• M: the number of samples in the training set.
• Xi: a support vector, when the value of ai exceeds 0.
• X: an unknown sample vector.
• δ (delta): a threshold or margin.
• ai: a parameter obtained by solving a convex quadratic programming problem subject to linear constraints.

In practice, various kernel functions are employed, such as the polynomial kernel and the Gaussian radial basis function (RBF), to map data into higher-dimensional spaces and permit more effective class separation. The threshold (δ) is determined by selecting any i for which ai is greater than 0 and which satisfies the Karush–Kuhn–Tucker condition (Burges, 1998).

In summary, SVM is a powerful classification technique that maximizes the margin between classes of data points in a high-dimensional space. It is an important technique in data mining and classification tasks because it is especially helpful when working with complex and nonlinearly separable data.

• Stochastic Gradient Boosting (Prabadevi. B):
This kind of boosting is called stochastic gradient boosting. At every iteration, a subset of the training data is drawn at random (without replacement) from the full training dataset. The randomly chosen subsample, rather than the complete sample, is then used to fit the base learner. A few possible stochastic variants are as follows: subsample the columns before creating each tree, or subsample the segments before creating each tree.

• Rule of Training.

• The initial phase of training is initialization.
• Inputs
• Bias
• The learning rate should be set to a level appropriate for simple calculation, and neural network (NN) parameters such as biases and weights should be set to zero.
• Set each input unit as follows: Si (i = 1 to n) = xi
• Then obtain the net input of the network.
• Using the appropriate activation function, determine the final output from the net input obtained in the preceding step.

In this paper, stochastic gradient boosting is introduced so that the gradient boosting approach can be used for both continuous target variables (as a regressor) and categorical target variables (as a classifier). Gradient boosting reduces the model's bias error. Log

loss is the cost function used when it is applied as a classifier, and mean squared error (MSE) is the cost function used when it is applied as a regressor. Stochastic gradient boosting is highly versatile because it can optimize a wide range of loss functions and offers several hyperparameter tuning options that make the function fit very flexible. It requires little data preparation, since it handles both numerical and categorical values well [7].

• Churn Prediction Using Naive Bayes (Khulood Ebrah):
The Naive Bayes algorithm is a classification algorithm founded on Bayes' rule and a set of conditional independence assumptions [11]. To predict the class label of X, P(X|Ci)P(Ci) is evaluated for each class Ci. The classifier predicts that the class label of tuple X is the class Ci if and only if

$$P(X \mid C_i)\,P(C_i) > P(X \mid C_j)\,P(C_j) \quad \text{for } 1 \le j \le m,\ j \ne i$$

Stated differently, the predicted class label is the class Ci for which P(X|Ci)P(Ci) is highest [12]. The Naive Bayes classifier models the posterior probability using Bayes' rule. Specifically, for every k = 1, ..., K,

$$\hat{P}(Y=k \mid X_1,\dots,X_p) = \frac{\pi(Y=k)\prod_{j=1}^{p} P(X_j \mid Y=k)}{\sum_{k=1}^{K} \pi(Y=k)\prod_{j=1}^{p} P(X_j \mid Y=k)}$$

where Y is the random variable associated with the churn class index of an observation, X1, ..., Xp are the predictors of an observation, and π(Y=k) is the prior probability that the class index is k. The model characterizes the distribution of each predictor within each class by its mean and standard deviation.

In the training step of Naive Bayes classification, the method estimates the parameters of a probability distribution, assuming that the predictors are conditionally independent given the class [4]. In the prediction step, the method computes, for any unseen test data, the posterior probability of the sample belonging to each class. The test data is then classified according to the largest posterior probability.

• Customer Churn Analysis Using LSTM-RNN Model (Nagaraju Jajam):
Churn describes a consumer moving from one telephone service provider to another [17]. The LSTM-RNN model is applied in the churn classification process to estimate precisely the probability of customer churn from the provided dataset. To do this, a deep learning framework is used with an attention layer that improves churn classification accuracy. A few more processing steps are needed to fully execute the proposed LSTM-RNN model. First, a convolution is applied to the features, and then the input data is loaded into the LSTM-RNN architecture. This stage is dedicated to obtaining detailed semantic information from the word order. Furthermore, the LSTM-RNN architecture efficiently identifies and captures the temporal relationships between features, and in turn produces a feature vector that aids the overall churn classification process [5].

Capturing the semantic meaning of the input data also entails understanding the context and underlying information contained in the data, especially as it relates to customer behavior. Creating labels that indicate whether a customer has churned (assigned a value of 1) or not (assigned a value of 0) is an essential step in training such a model. It is recognized, nevertheless, that these churn label assignments may be somewhat arbitrary, given that churn is frequently determined by a variety of variables and interpretations.

A matrix layer is subsequently created from these labeled data points, and this layer is used as the input for a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). An LSTM cell is a particular kind of RNN cell designed to handle sequential data and recognize long-range dependencies. Although LSTM cells work very well for tasks such as time-series analysis and natural language processing, they have certain disadvantages. Because of their intricate architecture, with many parameters and operations, LSTM cells have a higher computational cost and require more memory and time to train and operate. LSTM-RNNs are nevertheless preferred because of their superior capacity to represent sequential data.

Simple RNN cells, on the other hand, require less computing power but have difficulty identifying long-term relationships in data. RNNs have benefits of their own within the larger neural network context. They are appropriate for a variety of tasks because of their architecture, which is based on deep neural networks and enables them to process information both sequentially and concurrently. Furthermore, thanks to the addition of memory cells, as seen in LSTM-RNNs, the network can replicate some aspects of the brain's processing capabilities, particularly in retaining and using information over lengthy sequences. As a result, even with their high processing requirements, LSTM cells are still a good option for problems such as churn prediction, where it is important to capture sequential data and long-term dependencies.

• Sampling-Based Stack Framework for Churn Prediction (Soumi De):
The proposed Sampling-based Stack Framework (SS-IL) provides a new method for churn prediction. The framework uses ensemble learning to improve classifier performance. Ensemble learning is a powerful technique that combines the outputs of several base classifiers to arrive at a final classification. Stacking is a particular type of ensemble learning that uses multiple base learners, also called level-0 learners, which are trained with the same training dataset.

The SS-IL framework is unique in that it uses different training datasets for the classifiers at level 0. By using sampling techniques, it aims to increase the variety of attributes taken into account and to make it easier for the ensemble to gather important information. The goal of this training data diversification is to raise the framework's overall predictive power.

Furthermore, a meta-learner, an additional component of the SS-IL framework, is trained with the

predictions produced by the level-0 learners. This meta-learner learns the combination weights for each of the decision probabilities supplied by the base-level classifiers, which makes effective instance classification possible. A stacked ensemble such as SS-IL relies on the level-0 base learners to convey the information gained from the features, which is then used in training the meta-learner. The framework rests on the idea that the combined knowledge of multiple classifiers and the variety of training data improve predictive robustness and accuracy.

Note that although the SS-IL framework is discussed here in the context of churn prediction, there are indications of possible medical applications as well, particularly for monitoring and diagnosing pneumothorax in clinical settings. This suggests that the framework is versatile and useful across different domains, potentially improving patient care and saving time.

III. PROPOSED IMPLEMENTATION

This paper aims to deploy an ensemble approach that gives businesses a holistic and proactive solution for predicting customer churn and optimizing customer retention strategies. The methodology involves preprocessing and enhancing the dataset to ensure optimal performance of the predictive models. The customer base is segmented using the Agglomerative Clustering, Affinity Propagation, Birch, and KMeans algorithms, enabling the identification of distinct customer groups with varying churn probabilities. Subsequently, predictive models are developed for each cluster, enhancing the granularity of churn predictions. The ensemble approach integrates the predictions from the individual models, capitalizing on the strengths of each algorithm. This is intended to give a more robust and accurate prediction by considering diverse customer behaviors and patterns.

IV. RESULTS & DISCUSSION

Business studies in the telecommunications sector aim to increase financial gains, and churn prediction is widely recognized as central to protecting the revenue of telecommunications companies. This paper examined methods for developing a big data application that predicts the rate of customer attrition. The telecom sector can benefit greatly from the combination of big data technology and machine learning to predict customer attrition. We recognized how crucial it is to preprocess and prepare the data for analysis. Our goal was to identify the key factors influencing customer churn by analyzing pertinent attributes and customer behavior. Additionally, by employing methods such as logistic regression and K-means clustering, significant attributes can be extracted from massive amounts of telecom data, and pertinent data can be fed into machine learning algorithms to anticipate and prevent customer attrition. All things considered, the approaches under review hold promise for developing a well-rounded machine learning model that will help the sector reduce customer attrition. The industry's bottom line and customer satisfaction levels could both benefit greatly from these strategies. Telecom companies can improve customer satisfaction, lower the revenue loss associated with churn, and ultimately bolster their competitive position in the market by precisely identifying customers who are at risk of leaving and customizing retention efforts to meet their specific needs. Our research aims to improve the understanding of customer attrition prediction and to stimulate further investigation and innovation in the big data analytics and telecommunications domains.

REFERENCES

[1]. Burez, J., & Van den Poel, D. "CRM at a pay-TV company: Using analytical models to reduce customer attrition by targeted marketing for subscription services", Expert Systems with Applications 32, 277–288.
[2]. Ledro, C., Nosella, A., & Vinelli, A. (2022). Artificial intelligence in customer relationship management: literature review and future research directions. Journal of Business & Industrial Marketing, 37(13), 48-63.
[3]. Jain, H., Khunteta, A., & Srivastava, S. (2020). Churn prediction in telecommunication using logistic regression and logit boost. Procedia Computer Science, 167, 101-112.
[4]. Khulood Ebrah, Selma Elnasir, "Churn Prediction Using Machine Learning and Recommendations Plans for Telecoms", Journal of Computer and Communications, Vol. 7 No. 11, November 2019.
[5]. Nagaraju Jajam, Nagendra Panini Challa, Kamepalli S.L. Prasanna, "Arithmetic Optimization With Ensemble Deep Learning SBLSTM-RNN-IGSA Model for Customer Churn Prediction", in IEEE vol 11.
[6]. Soumi De, Prabu P., "A Sampling-Based Stack Framework for Imbalanced Learning in Churn Prediction", in IEEE vol 10.
[7]. Prabadevi B., Shalini R., Kavitha B.R. (2023). Customer Churning analysis using machine learning algorithms. In International Journal of Intelligent Networks.
[8]. M. Alizadeh, D. S. Zadeh, B. Moshiri and A. Montazeri, "Development of a Customer Churn Model for Banking Industry Based on Hard and Soft Data Fusion," in IEEE Access, vol. 11, pp. 29759-29768, 2023, doi: 10.1109/ACCESS.2023.3257352
[9]. Anand, M., Shaukat, I., Kaler, H., Narula, J., & Rana, P. S. Hybrid Model for the Customer Churn Prediction.
[10]. Zadoo, A., Jagtap, T., Khule, N., Kedari, A., & Khedkar, S. (2022, May). A review on churn prediction and customer segmentation using machine learning. In 2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COM-IT-CON) (Vol. 1, pp. 174-178). IEEE.
[11]. Mitchell, T.M. (2015) Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression.
[12]. Han, J., Pei, J. and Kamber, M. (2011) Data

Mining: Concepts and Techniques. Elsevier, Amsterdam.
[13]. PM, U., & Balaji, N. V. (2019). Analyzing Employee attrition using machine learning. Karpagam Journal of Computer Science, 13, 277-282.
[14]. Abdulsalam Sulaiman Olaniyi, Arowolo Micheal Olaolu, Bilkisu Jimada-Ojuolape, Saheed Yakub Kayode, "Customer Churn Prediction in Banking Industry Using K-Means and Support Vector Machine Algorithm", International Journal of Multidisciplinary Sciences and Advanced Technology, Vol 1 No 1 (2020) 48–54.
[15]. Xiancheng Xiahou and Yoshio Harada, "B2C E-Commerce Customer Churn Prediction Based on K-Means and SVM".
[16]. Seymen, O. F., Dogan, O., & Hiziroglu, A. (2020, December). Customer churn prediction using deep learning. In International Conference on Soft Computing and Pattern Recognition (pp. 520-529). Cham: Springer International Publishing.
[17]. Fujo, S. W., Subramanian, S., & Khder, M. A. (2022). Customer churn prediction in the telecommunication industry using deep learning. Information Sciences Letters, 11(1), 24.
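
As an illustration of the K-means procedure described in the Introduction (iteratively dividing the data into a predefined number of clusters), the following is a minimal Python sketch of the standard assignment and update loop. The two-feature customer records, the function name kmeans, and the fixed initial centers are assumptions made for this example and are not taken from the study.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Alternate assignment and center-update steps until the centers stop moving."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical customers: (monthly usage hours, support calls).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = kmeans(customers, centers=[(0.0, 0.0), (10.0, 10.0)])
```

With these well-separated toy points the first three customers fall into one cluster and the last three into the other; on real data the result depends on the initial centers, which is why library implementations usually restart from several initializations.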
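
The SVM quantities listed in the Methodology section (M, Xi, X, δ, ai) are the ingredients of the usual kernel decision function. The paper does not print the formula, so the form below is a hedged reconstruction: the class labels y_i in {-1, +1}, the kernel K, and the sign with which δ enters are assumptions, not statements from the source.

```latex
f(X) = \operatorname{sgn}\!\left( \sum_{i=1}^{M} a_i \, y_i \, K(X_i, X) + \delta \right),
\qquad a_i > 0 \text{ only for support vectors } X_i
```

Under this reading, only the training samples with a_i > 0 contribute to the sum, which is why they are called support vectors.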
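
The two Naive Bayes steps described in the Methodology section (estimating per-class parameters, then classifying by the largest posterior) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a Gaussian likelihood per predictor (suggested by the text's use of the mean and standard deviation, but not confirmed by it), hypothetical features (monthly charge, support calls), and made-up toy records with label 1 for churned customers.

```python
import math
from statistics import mean, pstdev

def fit(rows, labels):
    """Training step: class prior plus (mean, std) of every predictor per class."""
    model = {}
    for k in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == k]
        stats = [(mean(col), pstdev(col)) for col in zip(*members)]
        model[k] = (len(members) / len(rows), stats)
    return model

def gaussian(x, mu, sigma):
    """Normal density, used here as the assumed per-predictor likelihood."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior(model, x):
    """Prediction step: prior times likelihoods, normalised over all classes."""
    scores = {}
    for k, (prior, stats) in model.items():
        score = prior
        for value, (mu, sigma) in zip(x, stats):
            score *= gaussian(value, mu, sigma)
        scores[k] = score
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}

# Hypothetical training records: (monthly charge, support calls); 1 = churned.
rows = [(20.0, 1.0), (25.0, 0.0), (30.0, 2.0), (70.0, 5.0), (80.0, 6.0), (75.0, 4.0)]
labels = [0, 0, 0, 1, 1, 1]
model = fit(rows, labels)
post = posterior(model, (72.0, 5.0))
predicted = max(post, key=post.get)  # class with the largest posterior
```

The sketch multiplies raw densities for readability; with many predictors one would normally sum log-probabilities instead to avoid numerical underflow.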
