Volume 9, Issue 1, January 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Machine Learning Based Telecom-Customer Churn Prediction
C. Subalakshmi; G. Bhanu Praveen; C. V. Saketh; N. Reddy Samba Siva Reddy
IV Year B.Tech CSE DS(AI) Students, Dept of Computer science and Engineering,
DR. M.G.R Educational and Research Institute, Maduravoyal, Chennai-95, Tamil Nadu, India

Abstract:- In the highly competitive telecom sector, maintaining client loyalty is a critical challenge for long-term profitability and expansion. This research uses the Random Forest and Logistic Regression algorithms to give a detailed investigation of customer attrition prediction specifically for the telecom industry. Building a strong predictive model to identify possible churners will enable telecom businesses to implement focused customer loyalty campaigns.

Our methodology incorporates a wide range of telecom-specific characteristics, such as call trends, usage information, and customer support exchanges. By utilizing the Random Forest and Logistic Regression methods, we aim to increase forecasting accuracy by exploring the complex patterns that indicate customer churn. Carefully considered feature engineering techniques are used to improve the model's capacity to capture subtleties specific to the telecom sector. Our approach is validated using a real-world telecom dataset that includes a range of customer categories. Performance metrics such as F1 score, recall, accuracy, and precision show how well our model forecasts customer attrition in the dynamic telecom market.

Keywords:- Customer Churn, Machine Learning, Telecom Sector, Performance Metrics

I. INTRODUCTION

The telecommunications industry is characterized by constant change and fierce competition, making customer retention essential to long-term survival, market share, and profitability. Telecom firms confront a daunting task as consumer options grow and technology advances: not only must they attract new consumers, but more importantly, they must hold onto their current clientele. One major barrier to reaching this retention target is customer churn, which occurs when users defect to rival service providers.

In the telecom sector, which is renowned for its quick innovation and changing consumer expectations, this study conducts a thorough investigation into the field of customer churn prediction. Our research is centered on the use of sophisticated machine learning methods, particularly the Random Forest and Logistic Regression algorithms. Our goal is to create a strong predictive model that can accurately detect probable churners by utilizing these advanced approaches.

Our approach's telecom-specific character is highlighted by the incorporation of a wide range of industry-specific elements. These include call trends, usage information, and contacts with customer service representatives, among other aspects of consumer behavior. Taking into account the complex nature of customer attrition, our approach attempts to analyze and comprehend the complex dynamics that lead up to the loss of subscribers.

The Random Forest algorithm was selected because of its capacity to manage intricate, non-linear interactions within the data, whereas Logistic Regression offers interpretability and insights into the importance of specific features. By carefully crafting the features, we improve the model's ability to identify nuanced patterns that are exclusive to the telecom sector, hence enhancing its predictive power.

We perform a thorough analysis on a real-world telecom dataset that covers a variety of client groups in order to validate the effectiveness of our strategy. To provide a thorough evaluation, important metrics including recall, accuracy, precision, and the F1 score are used to assess the model's success in predicting customer attrition in the dynamic telecom industry.

In the fiercely competitive and technologically advancing telecom industry, keeping customers loyal poses a significant issue. The importance of this issue is highlighted by recent industry studies, which show that telecom businesses face an annual turnover rate of 10% to 15%. These figures underscore the necessity of taking proactive steps to reduce customer attrition in addition to highlighting the financial impact of the problem. The strategic importance of anticipating and reducing customer attrition is becoming more and more evident as telecom carriers come to terms with the harsh fact that recruiting new customers can cost up to five times more than keeping existing ones.

This study uses sophisticated machine learning methods, including the Random Forest and Logistic Regression algorithms, to explore the complex field of customer churn prediction in the telecom sector. By doing this, we hope to give telecom companies a powerful prediction model that does more than just identify possible churners: it also gives them useful information that they can use to launch targeted, successful customer loyalty efforts.

IJISRT24JAN1349 www.ijisrt.com 1466


Our technique takes into account a wide range of telecom-specific characteristics, such as call statistics, usage data, and customer support interactions. This strategy is based on the understanding that complex datasets specific to the telecom sector require advanced algorithms that can identify minute trends that indicate customer attrition.

Utilizing advanced machine learning techniques, such as the Random Forest and Logistic Regression algorithms, this study delves into the intricate domain of telecom customer churn prediction. By doing this, we intend to provide telecom firms with a robust prediction model that helps them with more than just identifying potential churners; it also provides them with actionable insights that they can utilize to initiate focused, fruitful customer loyalty initiatives. Numerous telecom-specific factors, including call records, usage data, and customer support interactions, are taken into consideration by our method. This approach is predicated on the knowledge that sophisticated algorithms are needed to detect minute trends that signify customer attrition in complicated datasets unique to the telecom industry.

In conclusion, this study not only fills a critical gap in the telecom industry's understanding of customer churn prediction, but it also advances the conversation about the strategic business insights that can be obtained from advanced machine learning techniques. By giving telecom companies a customized approach and useful data, we hope to enable them to continue their unwavering focus on customer retention and long-term profitability.

II. EXISTING WORK

In the ever-changing telecom industry, the need to combat customer churn has sparked a slew of research projects targeted at applying advanced analytics and machine learning approaches. Previous research in the realm of customer churn prediction has laid the groundwork for our current study, shedding light on approaches, obstacles, and outcomes that have shaped our understanding of this essential subject.

A thorough analysis of the available literature indicates a plethora of approaches to predicting customer attrition in the telecommunications industry. Researchers have investigated the effectiveness of various machine learning techniques, such as decision trees, neural networks, and ensemble methods. Notably, studies have highlighted the potential of the Random Forest and Logistic Regression algorithms, which are relevant to our current research. Several significant publications have delved into the complexities of telecom datasets, emphasizing the value of feature engineering and the nuanced interpretation of customer behavior patterns. For example, prior work has highlighted the efficacy of call detail records and customer contact data in predicting churn, revealing temporal and behavioral elements impacting subscriber attrition. Furthermore, the introduction of big data analytics has sparked a paradigm shift in the forecasting of client attrition. Recent research has underlined the importance of utilizing huge datasets to identify hidden patterns and gain previously unreachable insights. Our study contributes to the continuing discussion on the practical application of machine learning in tackling industry-specific difficulties, since it coincides with the current trend of leveraging real-world telecom information.

Despite the advances made in the field, problems remain. The intrinsic complexity of telecom data, privacy concerns, and the necessity for interpretability in predictive models remain hot topics in academic and corporate circles. By expanding on previous studies, we hope to improve our understanding of customer churn prediction in the telecom business. We hope to add nuanced insights and practical approaches geared to the unique issues presented by the telecom sector by focusing primarily on the Random Forest and Logistic Regression algorithms.

We will elaborate on our methodology in the following sections of this paper, detailing the application of these algorithms to a real-world telecom dataset. By connecting our work with and expanding on existing studies, we hope to provide a thorough and forward-thinking contribution to the subject of customer attrition prediction in telecoms.

III. ALGORITHMS

Using cutting-edge machine learning algorithms is essential when it comes to predictive modeling of customer attrition in the telecom sector. This study uses two well-known algorithms, Random Forest and Logistic Regression, which each bring unique advantages to the difficult issue of subscriber attrition prediction.

A. Logistic Regression
Logistic regression is a statistical technique for modeling the likelihood of a binary outcome. The binary result in customer churn prediction usually indicates whether a client will churn (1) or not (0). The logistic function (sigmoid function) is used to convert a linear combination of input data into a probability value between 0 and 1. It is used to predict a categorical dependent variable from a given set of independent variables.
• With logistic regression, the result of a categorical dependent variable is predicted. As a result, the outcome must be a discrete or categorical value.
• Instead of providing the exact values 0 and 1, it provides probabilistic values that fall between 0 and 1. The underlying outcome can be Yes or No, 0 or 1, True or False, etc.
• With the exception of how they are applied, logistic regression and linear regression are very similar. While logistic regression is used to solve classification problems, linear regression is used to solve regression problems.
• In logistic regression, we fit an "S"-shaped logistic function, whose output is bounded by two limiting values (0 and 1), rather than a regression line.


Fig. 1: Linear Regression Model for Predicting
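
As a minimal sketch of the sigmoid mapping described in this section, the snippet below turns a linear combination of one input into a churn probability. The function names, the coefficient values, and the 0.5 cut-off are assumptions made for illustration; they are not taken from the paper's model.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def churn_probability(x: float, intercept: float, slope: float) -> float:
    """Probability of churn (y = 1) given a single input feature x."""
    return sigmoid(intercept + slope * x)


# Hypothetical coefficients, chosen only for illustration.
p = churn_probability(x=2.0, intercept=-1.0, slope=0.5)

# Turning the probability into a class label needs a cut-off; 0.5 is assumed.
label = 1 if p >= 0.5 else 0
```

With these made-up values the linear term is exactly zero, so the probability sits on the decision boundary; a fitted model would estimate the intercept and slope from data instead.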

Mathematics:- The logistic regression formula is:

P(y = 1 | x) = σ(b0 + b1·x)

Where:
• P(y = 1 | x) is the probability of the event y = 1, given x.
• σ(z) = 1 / (1 + e^(-z)) is the standard logistic function, also known as the sigmoid function, which maps any input to a value between 0 and 1.
• b0 and b1 are the parameters to be estimated, where b0 is the intercept and b1 is the slope.

B. Random forest
A random forest is an ensemble learning technique that generates a large number of decision trees during the training phase and outputs a forecast that combines the predictions made by each individual tree: the mean for regression, or the majority vote for classification. Each tree in the forest is built by selecting a random subset of features at each node and finding the best split among those features to minimize the impurity measure. Random forest is a supervised machine learning algorithm, and the individual decision trees that make up a random forest are built independently of one another. The main steps in the operation of random forest are:
• From a training set of N samples, N samples are drawn at random with replacement, one sample at a time, to form the training set of one tree;
• When there are M attributes per sample and a node of the decision tree needs to be partitioned, m attributes are chosen at random (meeting the criterion m ≤ M). Then, one of these m attributes is chosen as the node's splitting attribute using some method;
• During the decision tree-building process, each node is divided in accordance with the previous step;

Follow the above procedures to construct a large number of decision trees. Random forest can be applied to high-dimensional data, judge the significance of features and the interaction between different features, balance the error for unbalanced data sets, and maintain accuracy when features are missing. Compared to a single decision tree, it is less prone to overfitting. However, it has been shown to overfit on certain noisy classification or regression problems.

IV. TOOLS AND LIBRARIES

• Python: Python is a popular high-level, interpreted, dynamically typed programming language that is easy to understand and use. It is versatile and widely used for applications such as web development, data analysis, artificial intelligence, machine learning, and automation. Python emphasizes code readability with a clean and straightforward syntax, reducing the cost of program maintenance.
• Pandas: Pandas is a Python-based open-source data analysis and manipulation package. The DataFrame, a two-dimensional table for holding structured data, is at the heart of Pandas. It provides key data structures like Series and DataFrame, enabling efficient data manipulation and analysis. Pandas facilitates reading data from various file formats, such as CSV and Excel, making it a versatile tool for data loading.
• NumPy: NumPy is an open-source numerical computation package for Python. It includes fundamental tools for working with huge, multidimensional arrays and matrices, as well as mathematical functions that may be applied to them. NumPy is a core Python module for scientific computing and data analysis that prioritises efficiency and performance.
• Streamlit: Streamlit is an open-source Python library designed for creating web applications for data science and machine learning with minimal effort. It simplifies the process of turning data scripts into interactive web applications. With a straightforward and intuitive API, Streamlit enables users to build web apps quickly and efficiently. Key features include automatic widget generation, real-time updates, and the ability to integrate charts, tables, and interactive components seamlessly.
• Plotly: Plotly is an open-source Python package used to create engaging and eye-catching data visualizations. Because of its adaptability to different chart types, it can be used for a wide range of data visualization applications. Plotly is renowned for its capacity to create dynamic dashboards that let people engage with and examine data in real time.
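
The random forest steps in Section III-B (sampling with replacement, a random choice of m out of M attributes at a split, and combining the trees' outputs) can be sketched with the Python standard library alone. This is an illustrative sketch and not the authors' implementation: the helper names and the toy rows are hypothetical, and the tree-growing step itself is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, one at a time (step 1)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, m, rng):
    """Pick m distinct feature indices out of M = n_features (step 2)."""
    if m > n_features:
        raise ValueError("m must not exceed the number of features M")
    return sorted(rng.sample(range(n_features), m))


def majority_vote(tree_predictions):
    """Combine per-tree class labels into one forest prediction."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Made-up rows: (monthly_charge, tenure_months, churned)
rows = [(20.0, 1, 0), (75.5, 12, 1), (99.9, 3, 1), (45.0, 40, 0)]

sample = bootstrap_sample(rows, rng)                          # one tree's data
features = random_feature_subset(n_features=2, m=1, rng=rng)  # one split
vote = majority_vote([1, 0, 1, 1, 0])                         # five toy trees
```

In practice a library implementation would grow each tree on its bootstrap sample and repeat the feature draw at every node; the sketch only isolates the sources of randomness and the voting step.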

V. FLOW CHART

Fig. 2: Flow chart for churn process

Financial institutions are heavily dependent on client satisfaction for their operations, thus predicting customer attrition has a significant impact on them. These establishments operate in a very competitive market and keep clients by meeting their needs within the limitations of their resources. By fitting the model to the existing historical data, data mining techniques are used to find interesting patterns and relationships in the data and to anticipate the behavior of the consumers, that is, whether or not they will churn.

A. Data Collection
The dataset was obtained online from Kaggle, one of the leading data science sites on the internet. It contained data on about 1000 customers with 30 attributes.

B. Data Preparation
Attributes with 30% null values were removed from the dataset with the aid of Python programming language libraries. Missing numerical data was replaced with the 'mean' of the variable, while the 'mode' was used for the categorical data. To achieve better performance, the categorical data was transformed to numerical format using the Label Encoder function in Python.

C. Data Pre-Processing
Data preprocessing is an essential step in the data analysis and machine learning process, and it involves a number of crucial tasks that prepare raw data for analysis and model training. One of the main tasks is to deal with missing values, either by removing them or by imputing them using statistical measures or more sophisticated imputation algorithms. To keep outliers from distorting the study or impairing model performance, data points that deviate considerably from the norm are identified and handled.

D. Feature engineering
Feature engineering, which involves creating or modifying features to improve the model's predictive performance, is an essential phase in the process of preparing data for machine learning models. This approach seeks to extract meaningful information or relationships from the available data, going beyond simple feature selection. Creating new features, which could entail combining or altering current ones, is a popular tactic. For example, by adding higher-order terms, polynomial features enable the model to capture non-linear relationships. When two or more features are combined, interaction terms are created that can reveal intricate dependencies that could go unnoticed when looking at the individual features.
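
The preparation steps of Section V (mean and mode imputation, label encoding, and an interaction term) can be illustrated with a dependency-free sketch. The column names and values below are made up, and the helpers are hypothetical; with pandas and scikit-learn the same steps are typically written with `fillna` and `LabelEncoder`.

```python
from statistics import mean, mode


def null_fraction(values):
    """Share of missing entries (None) in a column."""
    return sum(v is None for v in values) / len(values)


def impute(values, numeric):
    """Fill None with the column mean (numeric) or mode (categorical)."""
    present = [v for v in values if v is not None]
    fill = mean(present) if numeric else mode(present)
    return [fill if v is None else v for v in values]


def label_encode(values):
    """Map each category to an integer code, in sorted category order."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


# Made-up columns for four customers.
monthly_charges = [70.0, None, 90.0, 50.0]
contract = ["Monthly", "Yearly", None, "Monthly"]
tenure = [1, 24, 6, 12]

charges_filled = impute(monthly_charges, numeric=True)
contract_codes = label_encode(impute(contract, numeric=False))

# Interaction term: product of two existing features.
charge_x_tenure = [c * t for c, t in zip(charges_filled, tenure)]
```

A column's `null_fraction` could be compared against the 30% figure mentioned in the text before deciding to drop it; whether that comparison should be strict is not stated in the paper.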

VI. IMPLEMENTATION

Fig. 3: Streamlit Interface for Predicting Churn of Customers

Customer Churn Prediction is a tool used to forecast whether or not a customer will leave a business. The software employs a pre-trained machine learning model to assess the probability of client churn from several kinds of input data. The user of the app must enter payment and demographic information, such as the customer's payment method and how long they have been with the firm. Payment information includes the number of months the consumer has been with the company, the type of contract, and whether or not they have paperless billing. Following data submission, the app uses the input data to create a prediction. If there is a high probability of churn, the customer is labeled as "EXIT" and the prediction probability is shown in the app.

For instance, based on the input data shown in the screenshot, the client is a senior citizen who pays with an electronic check and is dependent. According to the software, there is a 92% chance that the user will leave.
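
The labelling rule described here can be sketched as a small function. The 0.5 threshold and the "STAY" label are assumptions for illustration; the text names only the "EXIT" label and does not state which cut-off counts as a high probability.

```python
def label_customer(churn_probability: float, threshold: float = 0.5) -> str:
    """Return "EXIT" when the predicted churn probability reaches the threshold."""
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must lie between 0 and 1")
    # "STAY" is a hypothetical label for the non-churn case.
    return "EXIT" if churn_probability >= threshold else "STAY"


# The 92% example from the text.
result = label_customer(0.92)
```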

Fig. 4: App Interface

A. Data visualization
This Python script creates a Streamlit web application for data visualization. With the help of the Plotly library, the program enables users to upload CSV or Excel files, examine the dataset, and create a variety of interactive plots.
• In the sidebar, the script offers relevant customization choices based on the type of chart that has been selected.
• Users can select X and Y axes for Scatterplot in addition to a color column for categorical coloring.
• Line Plot offers comparable customisation, including choices for color and the X and Y axes.
• With the use of a histogram, users can choose a numerical feature to plot and alter the color and number of bins.
• Boxplot offers choices for color and the X and Y axes.
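
One way to organise the per-chart sidebar choices listed above is a plain lookup table. The sketch below is an assumption about structure only: the option names and the `sidebar_controls` helper are hypothetical, and the actual script would render each option as a Streamlit sidebar widget.

```python
# Customization options per chart type, following the list above.
CHART_OPTIONS = {
    "Scatterplot": ("x_axis", "y_axis", "color_column"),
    "Line Plot": ("x_axis", "y_axis", "color"),
    "Histogram": ("numeric_column", "color", "bins"),
    "Boxplot": ("x_axis", "y_axis", "color"),
}


def sidebar_controls(chart_type: str) -> tuple:
    """Return the option names to show for the selected chart type."""
    try:
        return CHART_OPTIONS[chart_type]
    except KeyError:
        raise ValueError(f"unsupported chart type: {chart_type!r}") from None


controls = sidebar_controls("Histogram")
```

Keeping the options in one table means a new chart type needs one new entry rather than another branch in the interface code.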


Fig. 5: Visualization of Customer Churn Prediction Based on Monthly Charges

The x-axis in this bar chart indicates the monthly charges of clients. The number of clients with a specific monthly charge is represented on the y-axis. On the x-axis, there are ten separate categories, each reflecting a different range of monthly payments. Bars represent the number of clients in each category. There are 1400 consumers in the first group (range from 0 to 100).

This bar chart is useful for visualizing how monthly charges are distributed among consumers. The "Churn" legend in the graphic indicates whether or not the customer has churned (left the company).

In the range of 0 to 100 monthly charges, for example, 40 clients churned (marked as "Churn=Yes") whereas the remaining 1400 did not (marked as "Churn=No"). The bar chart not only shows the total number of customers, but it also shows the trend of customer churn based on monthly charges.

By looking at this chart, we may learn which consumers are most likely to churn based on their monthly charges. This data can be utilized to strengthen the company's marketing tactics and keep clients who are on the verge of leaving.
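
The grouping behind a chart of this kind can be sketched as counting customers per monthly-charge range and churn status. The helper names and the sample values are made up, and the overall range of 0 to 1000 is inferred from the ten categories and the first range of 0 to 100; the paper does not state the upper limit.

```python
from collections import Counter


def bin_index(charge, low, high, n_bins):
    """Index of the equal-width range that contains the charge."""
    width = (high - low) / n_bins
    idx = int((charge - low) // width)
    return min(max(idx, 0), n_bins - 1)  # clamp values on or past the edges


def churn_counts_by_bin(charges, churned, low=0.0, high=1000.0, n_bins=10):
    """Count customers per (range index, churn label) pair."""
    counts = Counter()
    for charge, has_churned in zip(charges, churned):
        key = (bin_index(charge, low, high, n_bins), "Yes" if has_churned else "No")
        counts[key] += 1
    return counts


# Made-up customers: monthly charge and whether they churned.
charges = [20.0, 95.0, 150.0, 100.0, 999.0, 1000.0]
churned = [True, False, False, True, False, False]

counts = churn_counts_by_bin(charges, churned)
```

Each `(range index, label)` pair corresponds to one bar segment, so the first range with label "Yes" holds the churned customers whose charges fall between 0 and 100.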

Fig. 6: Histograms of Numerical Features for Customer Churn Prediction

The frequency of the majority of consumers (Churn = No) is around 400. Customers that pay a greater monthly fee tend to be more frequent. Customers with Monthly Charges between 200 and 400, for example, have a higher frequency than those with Monthly Charges less than 200. Customers with Frequency of 600 or higher have a larger number of churns (Churn = Yes). The Total Charges histogram shows that the majority of consumers have total charges between 0 and 2000.

Customers with greater Frequency and Monthly Charges are more likely to churn, according to the histograms. This could point to areas in need of improvement or action.

VII. CONCLUSION

Using Python, Streamlit, and Plotly, a robust and adaptable Data Visualization App has been created as the result of this extensive project. With the help of this program, users may upload datasets with ease, examine their details, and create perceptive visualizations that are customized for their analytical requirements. Data exploration is made simple and interesting by the combination of Plotly's interactive charting features and Streamlit's straightforward design.

The robustness of the script is attributed to its modular structure and careful error handling, which guarantee a seamless user experience even when faced with possible difficulties during file reading or chart generation. Because of the application's flexibility, users can tailor their visualizations to particular dataset features, which promotes a deeper comprehension of underlying patterns. Users can customize the program to meet their unique dataset and analysis requirements because of its adaptability. Users may personalize visualizations, dynamically choose features, and learn more about the underlying data patterns thanks to the user-friendly interface.

Although the script offers a strong basis for data exploration, it may be improved further with functionality to save or export created visualizations, add more chart kinds, and offer more extensive customization possibilities. Because of its modular design, the script is a great place for users to start when creating custom data visualization apps. The functions offered, which include Histograms, Boxplots, Scatterplots, and Line Plots, accommodate a wide variety of data exploration situations. The sidebar's clear division of visualization settings improves user experience by offering a convenient and well-organized area for adjustment. Making educated decisions is made easier and the selection process is made simpler by the categorization of columns into numeric and non-numeric.

Although the Data Visualization App is a reliable tool
for interactive exploration, it can still be improved in the
future. Its capabilities could be further enhanced by adding
more chart types, sophisticated customization options, and
mechanisms for exporting or saving visualizations. Because
of the script's modular nature, users may easily modify or
customize it to meet their own needs. This makes it a great
starting point for developers. In conclusion, this project
provides a useful tool that can be used by non-technical
users as well as data analysts, providing a smooth and easy
way to extract meaningful information from large and
complicated datasets. The cooperation between Plotly and
Streamlit demonstrates the possibility of developing
efficient and approachable Python data visualization
applications. The Data Visualization App is a testament to
the importance of easily available and interactive tools in the
field of data exploration and analysis, particularly as data-
driven decision-making becomes more and more important.

