
A Minor Project Report (CS755PC)

on
Recognition of Handwritten Digit using CNN

Submitted
in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology
in
Computer Science and Engineering
(Artificial Intelligence and Machine Learning)
By
Varshini Manuka
(21261A6641)

Under the guidance of


Mr. M. Srikanth Sagar
(Assistant Professor)

Department of Emerging Technologies


Mahatma Gandhi Institute of Technology
(Affiliated to Jawaharlal Nehru Technological University Hyderabad)
Kokapet (V), Gandipet (M), Hyderabad,
Telangana - 500 075.

2024 - 2025

MAHATMA GANDHI INSTITUTE OF TECHNOLOGY
(Affiliated to Jawaharlal Nehru Technological University Hyderabad)
GANDIPET, HYDERABAD – 500075, Telangana

CERTIFICATE

This is to certify that the project entitled "RECOGNITION OF HANDWRITTEN
DIGIT USING CNN" is being submitted by VARSHINI MANUKA bearing Roll No.
21261A6641 in partial fulfillment of the requirements for the Minor Project (CS755PC) in
COMPUTER SCIENCE AND ENGINEERING (ARTIFICIAL INTELLIGENCE AND
MACHINE LEARNING), and is a record of bonafide work carried out by her.
The results of the investigations enclosed in this report have been verified and found
satisfactory.

Supervisor Head of the Department Principal


Mr. M. Srikanth Sagar Dr. M. Rama Bai Dr. G. Chandra Mohan Reddy
Assistant Professor Professor Professor

External Examiner

DECLARATION

This is to certify that the work reported in this project titled "RECOGNITION OF
HANDWRITTEN DIGIT USING CNN" is a record of work done by me in the Department of
Emerging Technologies, Mahatma Gandhi Institute of Technology, Hyderabad.

No part of the work is copied from books/journals/internet and wherever the portion is
taken, the same has been duly referred in the text. The report is based on the work done
entirely by me and not copied from any other source.

VARSHINI MANUKA (21261A6641)

ACKNOWLEDGEMENTS

I would like to express my sincere thanks to Dr. G. Chandra Mohan Reddy, Principal, MGIT,
for providing the working facilities in college.

I wish to express my sincere thanks and gratitude to Dr. M. Rama Bai, Professor and HOD,
Department of ET, MGIT, for all the timely support and valuable suggestions during the period
of the project.

I am extremely thankful to Dr. M. Rama Bai, Professor and HOD, and Dr. M Srikanth Sagar,
Associate Professor, Department of Emerging Technology, MGIT, minor project coordinators
for their encouragement and support throughout the project.

I am extremely thankful and indebted to my internal guide Ms. Najini Mohd, Assistant
Professor, Department of Emerging Technology, for her constant guidance, encouragement and
moral support throughout the project.

Finally, I would also like to thank all the faculty and staff of the Emerging Technology
Department who helped me directly or indirectly, for completing this project.

VARSHINI MANUKA (21261A6641)

TABLE OF CONTENTS
Certificate i

Declaration ii

Acknowledgement iii

List of Figures vi

List of Tables vii

Abstract viii
1. Introduction 1

1.1 Motivation 2

1.2 Problem Definition 2

1.3 Existing System 2

1.4 Proposed System 3

1.5 Requirements Specification 3

1.5.1 Software Requirements 3

1.5.2 Hardware Requirements 3

2. Literature Survey 12

3. Customer Churn Prediction Methodology 16

3.1 Logistic Regression 21

3.2 Decision Trees 22

3.3 K-Nearest Neighbor Algorithm 23

3.4 Support Vector Machines 23

3.5 UML Diagrams 26

3.5.1 Data Flow Diagram 26
3.5.2 Sequence Diagram 26
3.5.3 Activity Diagram 27
3.5.4 Collaboration Diagram 27
4. Testing and Results 28
4.1 Model Performances 32
4.2 Comparison of Models 38
5. Conclusion and Future Work 39
5.1 Conclusion 39
5.2 Future Work 39
Bibliography 40
Appendix 41

LIST OF FIGURES

Figure 1.1   Anaconda Navigator window on Windows 10                        8
Figure 1.2   Jupyter Notebook Interface                                    10
Figure 3.1   Snapshot of the dataset being used                            17
Figure 3.2   How different features correlate with churn                   20
Figure 3.3   The logistic regression model                                 21
Figure 3.4   Logistic Function                                             21
Figure 3.5   Basic structure of a decision tree                            22
Figure 3.6   Visualization of KNN                                          23
Figure 3.7   Support vectors separated by a hyperplane                     24
Figure 3.8   Data Flow Diagram                                             26
Figure 3.9   Sequence Diagram                                              26
Figure 3.10  Activity Diagram                                              27
Figure 3.11  Collaboration Diagram                                         27
Figure 4.1   Accuracy graph                                                28
Figure 4.2   Precision graph                                               29
Figure 4.3   Recall graph                                                  30
Figure 4.4   F-1 score graph                                               31
Figure 4.5   Kappa metric graph                                            32
Figure 4.6   Classification Report and ROC of LR                           32
Figure 4.7   Feature Importance of LR                                      33
Figure 4.8   Classification Report and ROC of DT                           34
Figure 4.9   Feature Importance of DT                                      35
Figure 4.10  Classification Report and ROC of KNN                          36
Figure 4.11  Classification Report and ROC of SVM                          36
Figure 4.12  Feature Importance of SVM                                     37
Figure 4.13  Graphical summarization of model performances                 38

LIST OF TABLES

Table 2.1   Comparison of Literature survey                                14
Table 3.1   Description of attributes used in the dataset                  18
Table 4.1   Comparison of Results                                          38

ABSTRACT

Handwritten digit recognition is a key problem in the field of computer vision, with significant
applications in areas such as postal services, banking, and document processing. This study
proposes a deep learning-based approach using Convolutional Neural Networks (CNN) for
classifying digits drawn by users through an interactive drawing interface. The system allows
users to directly sketch digits on a digital canvas, which are then preprocessed and fed into the
CNN for classification. The CNN model consists of multiple convolutional and pooling layers for
automatic feature extraction, followed by fully connected layers to classify the digits from 0 to 9.
A key component of this project is the design of an intuitive interface that allows users to easily
input handwritten digits, providing a real-time, interactive experience. Techniques such as data
augmentation, dropout, and batch normalization are used to enhance model performance and
ensure robust classification. The results demonstrate the ability of the CNN to accurately
recognize drawn digits, highlighting the potential of deep learning and interactive interfaces for
real-time digit recognition applications.

Keywords: Handwritten recognition, digit recognition, epochs, hidden layers, machine learning,
neural network, CNN, Random Forest Classifier, Decision Tree Classifier, Gini Index, Entropy,
K-Nearest Neighbor, Support Vector Machine, Gaussian Naive Bayes, Bagging Classifier.

CHAPTER- 01

INTRODUCTION
With the continuous development of the economy and society, digital applications in daily life
are becoming more extensive and their use scenarios more varied, and the corresponding demand
for handwritten digit recognition has increased significantly. Handwritten digit recognition is an
important technology in computer vision, with a very wide range of applications in postal codes,
financial statements, and grade judgments. At present, handwritten digit recognition technology
is used in recognition systems such as postal code recognition, automatic entry of bank documents,
and automatic entry of financial statements.

Handwritten digit recognition technology refers to the automatic machine recognition of
handwritten Arabic numerals, which is a significant research focus in machine learning and
pattern recognition. In recent years, with the continuous development of the information society,
people not only require a higher recognition rate for handwritten digits, but also place higher
demands on the training time of the model.

At present, many algorithms for handwritten digit recognition have been researched at home and
abroad. Commonly used algorithms include the support vector machine algorithm, Bayesian
classification algorithms, neural networks, and the K-nearest neighbor algorithm. However, the
recognition accuracy of these algorithms is not high enough, because they have limited
mathematical function expression ability and network generalization ability for complex
classification problems. The emergence of convolutional neural networks provides a way to
address this limitation.

Using the fundamental principles of convolutional neural networks, this project develops a
PyTorch-based model and applies dropout regularization to prevent overfitting and enhance the
recognition performance. The model is trained using the MNIST dataset and evaluated in the VS
Code environment. Experimental results demonstrate a recognition accuracy of approximately 98%,
indicating a high level of accuracy in recognizing handwritten digits.
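
As a rough illustration of the kind of network described above, the sketch below shows a small PyTorch CNN with convolutional, pooling, dropout and fully connected layers for the ten digit classes. The exact layer sizes and kernel choices here are illustrative assumptions, not the project's final architecture.

# A minimal PyTorch CNN sketch for 28x28 MNIST digits (layer sizes are illustrative assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DigitCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)   # 28x28 -> 28x28
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)  # 14x14 -> 14x14
        self.dropout = nn.Dropout(0.25)                           # dropout regularization
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)                             # 10 digit classes

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # 28x28 -> 14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # 14x14 -> 7x7
        x = self.dropout(torch.flatten(x, 1))
        x = F.relu(self.fc1(x))
        return self.fc2(x)                           # raw scores; train with CrossEntropyLoss

model = DigitCNN()
print(model(torch.randn(1, 1, 28, 28)).shape)        # torch.Size([1, 10])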

1.1 Motivation

The ability to recognize handwritten digits accurately has become a cornerstone in various fields,
including banking, postal services, and document digitization. Traditional methods often rely on
rule-based algorithms or manual effort, which are prone to errors and inefficiencies when faced
with diverse handwriting styles. The advent of deep learning, particularly Convolutional Neural
Networks (CNNs), has revolutionized how we approach image recognition tasks by enabling
automated and highly accurate solutions.

This project was inspired by the need to create an accessible and interactive tool that
demonstrates the power of deep learning in real-time applications. By integrating a CNN model
with an intuitive drawing interface, the project seeks to bridge the gap between advanced
machine learning techniques and user-friendly applications. The motivation lies in showcasing
how AI can simplify complex tasks, such as digit recognition, while providing a hands-on,
engaging experience for users. Additionally, this project aims to highlight the practical
implications of AI in solving everyday problems, encouraging further innovation and exploration
in this domain.

1.2 Problem Definition


Handwritten digit recognition is a challenging problem in the field of computer vision due to the
vast variability in handwriting styles, orientations, and sizes. Traditional recognition systems
struggle to generalize across different handwriting patterns, often leading to errors and
inefficiencies. The need for a robust, real-time, and interactive system to accurately classify
handwritten digits is crucial in various applications, such as automated form processing, postal
sorting, and banking systems.

The goal of this project is to develop a reliable handwritten digit recognition system that allows
users to draw digits interactively on a digital canvas. The system should preprocess the input,
classify the digit using a trained Convolutional Neural Network (CNN), and display the result in
real time. The primary challenges include handling variations in handwriting, ensuring accurate
predictions, and creating an intuitive interface for seamless user interaction. This project aims to
address these challenges and demonstrate the potential of deep learning in solving real-world
classification tasks effectively.

1.3 Existing System

Several churn prediction systems have been proposed in earlier work, including a fuzzy-classifier-based
model validated on real data from a telecom company in south Asia. Another model utilized a CRM
framework using neural network and data mining for
the prediction of customer behavior in banking. An algorithm was also developed based on
clickstream data of a website to extract information and tested the predictive power of the model
based on data such as number of clicks, repeated visits, repetitive purchases, etc. Nonetheless,
these models raised a few concerns which are to be addressed. The main drawbacks of existing
systems include:
● Most of them were suited only for applying suitable models and taking inference from
predictions.
● None of them focused on the attributes crucial towards customer churn.

● Focus was more towards comparison rather than attribute determination.

1.4 Proposed System


The basis for predicting future customer churn is data from the past. We look at data from
customers that have already churned (response) and their characteristics / behaviour (predictors)
before the churn happened. The dataset contains demographic details of customers, their total
charges and the type of service they receive from the company. It comprises churn data of over
7000 customers spread over 21 attributes obtained from Kaggle [2]. By fitting statistical models
that relate the predictors to the response, we try to predict the response for existing customers.

1.5 Requirements Specification


1.5.1 Software Requirements
Language : Python 3.6
Operating system : Windows / Linux / macOS
Tools:
● Anaconda Navigator
● Jupyter Notebook
● Numpy
● Pandas
● Matplotlib
● Plotly

1.5.2 Hardware Requirements
● RAM – 4GB minimum

1.5.1.1 Python
Python is a high-level, interpreted, interactive and object-oriented scripting language created by
Guido van Rossum in 1989. Python is designed to be highly readable. Its language constructs and
object-oriented approach aim to help programmers write clear, logical code for small and large-
scale projects. It is ideally designed for rapid prototyping of complex applications.

It has interfaces to many OS system calls and libraries and can be extended with C or C++. Python is
dynamically typed and garbage-collected. It supports multiple programming paradigms,
including procedural, object-oriented, and functional programming. Python programming is
widely used in Artificial Intelligence, Natural Language Generation, Neural Networks and other
advanced fields of Computer Science.

History of Python
Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde &
Informatica (CWI) in the Netherlands as a successor to the ABC language, capable of exception
handling and interfacing with the Amoeba operating system. Language developer Guido van
Rossum shouldered sole responsibility for the project until July 2018 but now shares his
leadership as a member of a five-person steering council.

Python 2.0 was released on 16 October 2000 with many major new features, including a cycle-
detecting garbage collector and support for Unicode.

Python 3.0 was released on 3 December 2008. It was a major revision of the language that is not
completely backward-compatible. Many of its major features were backported to Python 2.6.x
and 2.7.x version series. Releases of Python 3 include the 2to3 utility, which automates (at least
partially) the translation of Python 2 code to Python 3.

Features of Python
Python's features include:
• Easy-to-learn: Python has few keywords, simple structure, and a clearly defined syntax. This
allows the student to pick up the language quickly.
• Easy-to-read: Python code is more clearly defined and visible to the eyes.
• Easy-to-maintain: Python's source code is fairly easy-to-maintain.
• A broad standard library: The bulk of Python's library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh.
• Interactive Mode: Python has support for an interactive mode which allows interactive testing
and debugging of snippets of code.
• Portable: Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
• Extendable: You can add low-level modules to the Python interpreter. These modules enable
programmers to add to or customize their tools to be more efficient.
• Databases: Python provides interfaces to all major commercial databases.
• GUI Programming: Python supports GUI applications that can be created and ported to many
system calls, libraries, and windows systems, such as Windows MFC, Macintosh, and the X
Window system of Unix.
• Scalable: Python provides a better structure and support for large programs than shell scripting.

Apart from the above-mentioned features, Python has a big list of good features, few are listed
below:
• It supports functional and structured programming methods as well as OOP.
• It can be used as a scripting language or can be compiled to byte-code for building large
applications.
• It provides very high-level dynamic data types and supports dynamic type checking.
• It supports automatic garbage collection.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.

Python Modules

NumPy
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays.

It is the fundamental package for scientific computing with Python. It contains various features
including these important ones:
● A powerful N-dimensional array object
● Sophisticated (broadcasting) functions
● Tools for integrating C/C++ and Fortran code
● Useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data-types can be defined using NumPy, which allows NumPy
to seamlessly and speedily integrate with a wide variety of databases.
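
A minimal illustration of the array object and broadcasting described above (not taken from the project code):

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # a 2 x 3 N-dimensional array
print(a.shape)                          # (2, 3)
print(a * 2)                            # broadcasting multiplies every element
print(a.mean(axis=0))                   # column-wise mean: [2.5 3.5 4.5]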

Pandas
Pandas is a Python package providing fast, flexible, and expressive data structures designed to
make working with structured (tabular, multidimensional, potentially heterogeneous) and time
series data both easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python. Additionally, it has the broader goal of
becoming the most powerful and flexible open source data analysis / manipulation tool available
in any language.

Pandas is well suited for many different kinds of data:


● Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
● Ordered and unordered (not necessarily fixed-frequency) time series data.
● Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
● Any other form of observational / statistical data sets. The data actually need not be labeled
at all to be placed into a pandas data structure.

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-
dimensional), handle the vast majority of typical use cases in finance, statistics, social science,
and many areas of engineering. For R users, DataFrame provides everything that R’s data.frame
provides and much more. pandas is built on top of NumPy and is intended to integrate well
within a scientific computing environment with many other 3rd party libraries.
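
A small illustrative DataFrame in the spirit of the churn data (the column names and values here are assumptions for the example):

import pandas as pd

df_example = pd.DataFrame({
    "tenure": [1, 34, 2],                      # numeric column
    "MonthlyCharges": [29.85, 56.95, 53.85],   # numeric column
    "Churn": ["No", "No", "Yes"],              # categorical column
})
print(df_example.dtypes)                        # heterogeneously-typed columns
print(df_example[df_example["Churn"] == "Yes"]) # label-based filtering of rows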

Matplotlib
Matplotlib is a widely used visualization library in Python for 2D plots of arrays. Matplotlib is a
multi-platform data visualization library built on NumPy arrays and designed to work with the
broader SciPy stack. It was introduced by John Hunter in the year 2002.

One of the greatest benefits of visualization is that it allows us visual access to huge amounts of
data in easily digestible visuals. Matplotlib consists of several plots like line, bar, scatter,
histogram etc. For simple plotting the pyplot module provides a MATLAB-like interface,
particularly when combined with IPython. For the power user, you have full control of line styles,
font properties, axes properties, etc., via an object-oriented interface or via a set of functions
familiar to MATLAB users.
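
A minimal pyplot sketch of the MATLAB-like interface mentioned above:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y, marker="o")          # line plot via the pyplot interface
plt.xlabel("x")
plt.ylabel("y")
plt.title("Simple line plot")
plt.show()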

Plotly
Plotly is an interactive, open-source, and browser-based graphing library for Python. Built on top
of plotly.js, plotly.py is a high-level, declarative charting library. plotly.js ships with over 30
chart types, including scientific charts, 3D graphs, statistical charts, SVG maps, financial charts,
and more.

Plotly has got some amazing features that make it better than other graphing libraries:
● It is interactive by default
● Charts are not saved as images but serialized as JSON, making them open to be read with R,
MATLAB, Julia and others easily
● Exports vector for print/publication
● Easy to manipulate/embed on web
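
A short illustrative Plotly Express snippet (the counts plotted are placeholders, not values from the project dataset):

import plotly.express as px

fig = px.bar(x=["No", "Yes"], y=[5000, 2000],          # illustrative counts only
             labels={"x": "Churn", "y": "Customers"},
             title="Churn distribution")
fig.show()   # renders an interactive chart in the browser or notebook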

1.5.1.2 Anaconda Navigator
Anaconda Navigator is a desktop graphical user interface (GUI) included in Anaconda®
distribution that allows you to launch applications and easily manage conda packages,
environments, and channels without using command-line commands. Navigator can search for
packages on Anaconda Cloud or in a local Anaconda Repository. It is available for Windows,
macOS, and Linux.

Figure 1.1 Anaconda Navigator window on Windows operating system

Usage of Navigator
In order to run, many scientific packages depend on specific versions of other packages. Data
scientists often use multiple versions of many packages and use multiple environments to separate
these different versions.

The command-line program conda is both a package manager and an environment manager. This
helps data scientists ensure that each version of each package has all the dependencies it requires
and works correctly.

Navigator is an easy, point-and-click way to work with packages and environments without
needing to type conda commands in a terminal window. You can use it to find the packages you
want, install them in an environment, run the packages, and update them – all inside Navigator.

The following applications are available by default in Navigator:


● JupyterLab
● Jupyter Notebook
● Spyder
● VSCode
● Glueviz
● Orange 3 App
● RStudio

Advanced conda users can also build their own Navigator applications.

Executing the code with Navigator


The simplest way is with Spyder. From the Navigator Home tab, click Spyder, and write and
execute your code. You can also use Jupyter Notebooks the same way. Jupyter Notebooks are
increasingly popular systems that combine your code, descriptive text, output, images, and
interactive interfaces into a single notebook file that is edited, viewed, and used in a web
browser.

1.5.1.3 The Jupyter Notebook


The Jupyter Notebook is an open-source web application that allows you to create and share
documents that contain live code, equations, visualizations and narrative text. Uses include: data
cleaning and transformation, numerical simulation, statistical modeling, data visualization,
machine learning, and much more.

The Jupyter notebook combines two components:


A web application: A browser-based tool for interactive authoring of documents which combine
explanatory text, mathematics, computations and their rich media output.

Notebook documents: A representation of all content visible in the web application, including
inputs and outputs of the computations, explanatory text, mathematics, images, and rich media
representations of objects.

Main features of the web application

● In-browser editing for code, with automatic syntax highlighting, indentation, and tab
completion/introspection.

● The ability to execute code from the browser, with the results of computations attached to the
code which generated them.

● Displaying the result of computation using rich media representations, such as HTML, LaTeX,
PNG, SVG, etc. For example, publication-quality figures rendered by the matplotlib library, can
be included inline.

● In-browser editing for rich text using the Markdown markup language, which can provide
commentary for the code and is not limited to plain text.

● The ability to easily include mathematical notation within markdown cells using LaTeX, and
rendered natively by MathJax.

Figure 1.2 Jupyter Notebook Interface

Notebook name: The name displayed at the top of the page, next to the Jupyter logo, reflects the
name of the .ipynb file. Clicking on the notebook name brings up a dialog which allows you to
rename it. Thus, renaming a notebook from "Untitled0" to "My first notebook" in the browser
renames the Untitled0.ipynb file to My first notebook.ipynb.
Menu bar: The menu bar presents different options that may be used to manipulate the way the
notebook functions.
Toolbar: The tool bar gives a quick way of performing the most-used operations within the
notebook, by clicking on an icon.
Code cell: The default type of cell; read on for an explanation of cells.

2. LITERATURE SURVEY
Potential research work carried out on various techniques for churn prediction in different
areas such as telecom, e-commerce, and banking has been discussed in the following
paragraphs. Various researchers have employed different mechanisms for predicting customer
churn and to find out the most useful features used in the prediction.

[3] Muhammad Azeem, Muhammad Usman and A. C. M. Fong published a paper titled "A
churn prediction model for prepaid customers in telecom using fuzzy classifiers". In this paper, a
fuzzy based churn prediction model has been proposed and validated using real data from a
telecom company in South Asia. A number of predominant classifiers, namely Neural Networks,
Linear Regression, C4.5, Support Vector Machines, AdaBoost, Gradient Boosting and Random
Forest, have been compared with fuzzy classifiers to highlight the superiority of fuzzy classifiers
in predicting the accurate set of churners. Parameters such as TP rate and AUC were considered
and enhanced using the model.

[4] J. Vijaya and E. Sivasankar published a paper titled "An efficient system for customer
churn prediction through particle swarm optimization based feature selection model with
simulated annealing". It employs particle swarm optimization (PSO) and proposes three variants
of PSO for churn prediction, one of which incorporates feature selection as its pre-processing
mechanism. The proposed classifiers were compared with a decision tree, naive bayes, K-nearest
neighbor, support vector machine, random forest and three hybrid models to analyze their
predictability levels and performance aspects. Experiments reveal that the performance of meta-
heuristics was more efficient and they also exhibited better predictability levels.

[5] Martin Fridrich published a paper titled "Hyperparameter Optimization of Artificial
Neural Network in Customer Churn Prediction" using Genetic Algorithms in e-commerce. The
prediction model is developed to identify customers at risk of defection. The proposed model
leads to improved customer churn prediction ability on the basis of parameters such as TP rate, FP
rate, and accuracy. The analysis is carried out on a labeled e-commerce retail dataset describing the
10,000 most valuable customers with the highest CLV (Customer Lifetime Value). To obtain the
best performing ANN (Artificial Neural Network) classification model, the proposed
hyperparameter search space is explored with genetic algorithms to find suitable parameter
settings. The explored part of the hyperparameter search space is then analyzed with a conditional
inference tree structure, addressing the underlying context of the optimization, which results in the
identification of critical factors leading to a well performing ANN classification model.

[6] Gordini and Veglio published a paper titled "Customers churn prediction and
marketing retention strategies. An application of support vector machines based on the AUC
parameter-selection technique in B2B e-commerce industry". Parameters such as recency,
frequency, length, product category, failure, monetary value, age, profession, gender, request status
etc. were taken for performance comparison. The prediction power of the proposed method was
found to be better as compared to Linear Regression, Neural Networks and SVM, especially for
noisy, imbalanced and nonlinear data. Thus, their findings confirm that the data-driven approach to
churn prediction and the development of retention strategies outperforms commonly used
managerial heuristics in the B2B e-commerce industry.

[7] Femina Bahari and Sudheep Elayidom published a paper titled "An Efficient CRM-
Data Mining Framework for the Prediction of Customer Behaviour" in banking. The UCI dataset
containing direct bank marketing campaigns of a Portuguese bank was taken. The model is used to
predict the behaviour of customers to enhance the decision-making processes for retaining valued
customers. Two classification models, Naïve Bayes and Neural Networks, are studied and it was
concluded that Neural Network was better than the Naïve Bayes algorithm for accuracy and
specificity, while Naïve Bayes was better than the Neural Network algorithm for sensitivity, TPR,
FPR, and ROC area. Neural network classified 4007/514 and Naive Bayes classified 3977/544
instances correctly/incorrectly.

[8] Anil Kumar D. and Ravi V published a paper titled "Predicting credit card customer
churn in banks using data mining". An ensemble system was developed incorporating majority
voting and involving Multilayer Perceptron (MLP), Logistic Regression (LR), decision trees
(J48), Random Forest (RF), Radial Basis Function (RBF) network and Support Vector Machine
(SVM) as the constituents. Since the dataset is highly unbalanced, with 93% loyal and 7%
churned customers, (1) undersampling, (2) oversampling, (3) a combination of undersampling and
oversampling, and (4) the Synthetic Minority Oversampling Technique (SMOTE) were employed
for balancing it. The results indicated that SMOTE achieved good overall accuracy. Also, SMOTE
and a combination of undersampling and oversampling improved the sensitivity and overall
accuracy in majority voting. Moreover, the rules generated by decision tree J48 can act as an early
warning expert system.

Table 2.1 Comparison of Literature survey

1. (2017) Muhammad Azeem, Muhammad Usman and A. C. M. Fong, "A churn prediction model for prepaid customers in telecom using fuzzy classifiers"
   Techniques: Neural Networks, Linear Regression, C4.5, Support Vector Machines, AdaBoost, Gradient Boosting and Random Forest algorithms
   Advantages: Works on different models to determine the most efficient one; parameters such as TP rate and Area Under the Curve (AUC) were also considered
   Disadvantages: Churn prediction in other areas can also be done; prediction accuracy can be improved with large data

2. (2017) J. Vijaya and E. Sivasankar, "An efficient system for customer churn prediction through particle swarm optimization based feature selection model with simulated annealing"
   Techniques: Particle Swarm Optimization (PSO), Decision Tree, Naive Bayes, K-Nearest Neighbor, Support Vector Machine, Random Forest algorithms
   Advantages: Compares PSO in three variants while taking other parameters into consideration
   Disadvantages: The enhanced algorithm can be used for other applications; the latest tools can be used

3. (2017) Martin Fridrich, "Hyperparameter Optimization of Artificial Neural Network in Customer Churn Prediction"
   Techniques: Artificial Neural Network, Genetic Algorithms
   Advantages: Provides a hyperparametric approach to optimize the customer churn prediction model
   Disadvantages: Improved LR, decision tree and fuzzy methods can be used; e-commerce datasets and more parameters can be used for better prediction accuracy

4. (2016) Gordini and Veglio, "Customers churn prediction and marketing retention strategies"
   Techniques: Logistic Regression, Neural Networks and Support Vector Machines
   Advantages: The parameter optimization procedure plays a key role in predictive performance; SVMauc shows good generalization performance when applied to noisy data
   Disadvantages: Staying power of the model is not predicted; selection of the SVM kernel function can be done more accurately and more prediction variables can be included

5. (2015) Femina Bahari and Sudheep Elayidom, "An Efficient CRM-Data Mining Framework for the Prediction of Customer Behaviour"
   Techniques: Naïve Bayes and Neural Networks
   Advantages: Compares the two models with respect to different parameters individually to enhance application-based use
   Disadvantages: The improved algorithm can be used for banking and e-commerce; time for model building needs to be reduced

6. (2008) Anil Kumar D. and Ravi V, "Predicting credit card customer churn in banks using data mining"
   Techniques: Multilayer Perceptron, Logistic Regression, decision trees (J48), Random Forest, Radial Basis Function network and Support Vector Machine
   Advantages: The system gives the best result when the unbalanced original data is smoted; multilayer perceptron produced better results
   Disadvantages: A warning system for churn prediction can be designed; improved tools and data mining can be used for accurate churn prediction in other applications
3. CUSTOMER CHURN PREDICTION METHODOLOGY
The basis for predicting future customer churn is data from the past. We look at data from
customers that have already churned (response) and their characteristics / behaviour (predictors)
before the churn happened. By fitting a statistical model that relates the predictors to the response,
the response for existing customers is predicted. The overall scope of work to forecast customer
attrition may look like the following:

● Understanding a problem and final goal

● Data collection

● Data preparation and preprocessing

● Modeling and testing

● Model deployment and monitoring

Understanding a problem and a final goal


It’s important to understand what insights one needs to get from the analysis. In short, you must
decide what question to ask and consequently what type of machine learning problem to solve:
classification or regression.

● Classification
The goal of classification is to determine to which class or category a data point (that is, a
customer) belongs. For classification problems, historical data is used with predefined target
variables, that is, labels (churner/non-churner) – answers that need to be predicted – to train an
algorithm. With classification, businesses can answer the following questions:
● Will this customer churn or not?

● Will a customer renew their subscription?

● Will a user downgrade a pricing plan?

● Are there any signs of unusual customer behavior?

● Regression
Customer churn prediction can also be formulated as a regression task. Regression analysis is a
statistical technique to estimate the relationship between a target variable and other data values
that influence the target variable, expressed in continuous values. The result of regression is always
some number, while classification always suggests a category. In addition, regression analysis
allows for estimating how many different variables in data influence a target variable. With
regression, businesses can forecast in what period of time a specific customer is likely to churn, or
receive some probability estimate of churn per customer.

Data collection
Once the kinds of insights to look for are identified, the data sources necessary for further predictive
modeling can be decided. The dataset used for this project contains demographic details of
customers, their total charges and the type of service they receive from the company. It comprises
churn data of over 7000 customers spread over 21 attributes (described in Table 3.1) obtained
from the Kaggle website (as shown in Figure 3.1). It can be used to analyze all relevant customer
data and develop focused customer retention programs.

Figure 3.1 A snapshot of the dataset being used.

In the given Figure 3.1, each row represents a customer, and each column contains a customer
attribute, as described in the column metadata.
The data set includes information about:

● Customers who left within the last month – the column is called Churn

● Services that each customer has signed up for – phone, multiple lines, internet, online
security, online backup, device protection, tech support, and streaming TV and movies
● Customer account information – how long they’ve been a customer, contract, payment
method, paperless billing, monthly charges, and total charges
● Demographic info about customers – gender, age range, and if they have partners and dependents

Table 3.1 Description of attributes used in the dataset
ATTRIBUTE          DESCRIPTION
customerID         Customer ID
gender             Whether the customer is a male or a female
SeniorCitizen      Whether the customer is a senior citizen or not (1, 0)
Partner            Whether the customer has a partner or not (Yes, No)
Dependents         Whether the customer has dependents or not (Yes, No)
tenure             Number of months the customer has stayed with the company
PhoneService       Whether the customer has a phone service or not (Yes, No)
MultipleLines      Whether the customer has multiple lines or not (Yes, No, No phone service)
InternetService    Customer’s internet service provider (DSL, Fiber optic, No)
OnlineSecurity     Whether the customer has online security or not (Yes, No, No internet service)
OnlineBackup       Whether the customer has online backup or not (Yes, No, No internet service)
DeviceProtection   Whether the customer has device protection or not (Yes, No, No internet service)
TechSupport        Whether the customer has tech support or not (Yes, No, No internet service)
StreamingTV        Whether the customer has streaming TV or not (Yes, No, No internet service)
StreamingMovies    Whether the customer has streaming movies or not (Yes, No, No internet service)
Contract           The contract term of the customer (Month-to-month, One year, Two year)
PaperlessBilling   Whether the customer has paperless billing or not (Yes, No)
PaymentMethod      The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
MonthlyCharges     The amount charged to the customer monthly
TotalCharges       The total amount charged to the customer
Churn              Whether the customer churned or not (Yes or No)
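
As an illustration of how such a dataset can be pulled into the workflow with pandas, the sketch below loads and inspects the file; the file name is an assumption and should be adjusted to the local copy of the Kaggle churn CSV.

import pandas as pd

# File name is an assumption; point it at the locally stored Kaggle churn CSV.
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

print(df.shape)                      # roughly 7000 customers x 21 attributes
print(df["Churn"].value_counts())    # distribution of churners vs non-churners
print(df.head())                     # first few rows, as in Figure 3.1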

Data preparation and preprocessing
Historical data that was selected for solving the problem must be transformed into a format
suitable for machine learning. Since model performance and therefore the quality of received
insights depend on the quality of data, the primary aim is to make sure all data points are
presented using the same logic, and the overall dataset is free of inconsistencies.
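
A minimal cleaning sketch along these lines, assuming the Telco-style churn DataFrame loaded earlier; the handling of blank TotalCharges values is a common quirk of that file and is an assumption here:

import pandas as pd

# df is the churn DataFrame loaded earlier (assumption).
# TotalCharges is read as text when it contains blanks; coerce it to numeric.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df = df.dropna(subset=["TotalCharges"])             # drop the few inconsistent rows

df = df.drop(columns=["customerID"])                # identifier carries no predictive signal
df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})  # encode the response as 0/1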

Feature engineering, extraction, and selection


Feature engineering is a very important part of dataset preparation. During the process, a set of
attributes (input features) are created that represent various behavior patterns related to customer
engagement level with a service or product. In a broad sense, features are measurable
characteristics of observations that an ML model takes into account to predict outcomes (in this
case, the decision related to churn probability).

Features can be classified into four groups:


● Customer Demographic Features that contain basic information about a customer (e.g., age,
education level, location, income)
● User Behavior Features describing how a person uses a service or product (e.g., lifecycle stage,
number of times they log in into their accounts, active session length, time of the day when a
product is used actively, features or modules used, actions, monetary value)
● Support Features that characterize interactions with customer support (e.g., queries sent,
number of interactions, history of customer satisfaction scores)
● Contextual Features representing other contextual information about a customer.

Figure 3.2. How different user behavior, subscription, and demographic features correlate with
churn [9].

Feature extraction aims at reducing the number of variables (attributes) by leaving the ones that
represent the most discriminative information. Feature extraction helps to reduce the data
dimensionality (dimensions are columns with attributes in a dataset) and exclude irrelevant
information [10]. During this process, specialists revise previously extracted features and define a
subgroup of them that’s most correlated with customer churn. As a result of feature selection,
specialists have a dataset with only relevant features.
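
A sketch of one-hot encoding and a simple correlation-based relevance check in this spirit; the 0.05 threshold and the column handling are assumptions for illustration:

import pandas as pd

# One-hot encode the categorical service/contract attributes of the cleaned frame.
features = pd.get_dummies(df.drop(columns=["Churn"]), drop_first=True).astype(float)

# Rank features by absolute correlation with churn as a rough relevance check (cf. Figure 3.2).
correlations = features.corrwith(df["Churn"]).abs().sort_values(ascending=False)
print(correlations.head(10))

# Keep only the more informative columns for modeling (illustrative cutoff).
selected = features[correlations[correlations > 0.05].index]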

Modeling and Testing


The main goal of this project stage is to develop a churn prediction model. Specialists usually
train numerous models, tune, evaluate, and test them (as shown in the figure below) to define the
one that detects potential churners with the desired level of accuracy on training data. The
following supervised machine learning models have been used for predicting customer churn:

3.1 Logistic Regression
Logistic regression is a statistical model that in its basic form uses a logistic function to model a
binary dependent variable. Mathematically, a binary logistic model has a dependent variable with
two possible values, such as pass/fail which is represented by an indicator variable, where the
two values are labeled "0" and "1".

Figure 3.3 The binary logistic regression model basically gives you two possible values – 0/1,
happy/sad and churn/not churn.

Logistic Function
Logistic regression is named for the function used at the core of the method, the logistic function.
The logistic function, also called the sigmoid function, is an S-shaped curve (as shown in Figure
3.4) that can take any real-valued number and map it into a value between 0 and 1, but never
exactly at those limits:

f(t) = 1 / (1 + e^(-t))

where e is the base of the natural logarithms and t is the actual numerical value that you want to
transform.

Figure 3.4 Logistic Function
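
The S-shaped mapping can be written directly, and a logistic regression churn classifier can then be fitted on the prepared features. The snippet below is a sketch that reuses the `selected` features and `df` frame assumed in the earlier examples:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def sigmoid(t):
    """Logistic (sigmoid) function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))    # approximately [0.018, 0.5, 0.982]

# Fit a churn classifier on the prepared features (sketch; `selected` and `df` assumed above).
X_train, X_test, y_train, y_test = train_test_split(
    selected, df["Churn"], test_size=0.25, random_state=42)
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(lr.score(X_test, y_test))               # accuracy on held-out customers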

3.2 Decision Trees
Decision tree learning is one of the predictive modeling approaches that uses a decision tree (as a
predictive model) to go from observations about an item i.e. attribute (represented in the
branches) to conclusions about the item's target value i.e. churn or not (represented in the leaves).
Tree models where the target variable can take a discrete set of values are called classification
trees; in these tree structures, leaves represent class labels and branches represent conjunctions of
features that lead to those class labels. Decision trees where the target variable can take
continuous values (typically real numbers) are called regression trees. This algorithm splits a data
sample into two or more homogeneous sets based on the most significant differentiator in input
variables to make a prediction. With each split, a part of the tree is generated. As a result, a tree
with decision nodes and leaf nodes (which are decisions or classifications) is developed. A tree
starts from a root node, the best predictor.

Figure 3.5 Basic structure of a decision tree

Prediction results of decision trees can be easily interpreted and visualized. Even people without
an analytical or data science background can understand how a certain output appeared.
Compared to other algorithms, decision trees require less data preparation, which is also an
advantage. However, they may be unstable if any small changes were made in data. In other
words, variations in data may lead to radically different trees being generated.
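
A comparable sketch with scikit-learn's decision tree, reusing the train/test split assumed in the previous section; the depth limit is an illustrative choice:

from sklearn.tree import DecisionTreeClassifier

# Limiting depth keeps the tree stable, so small data variations do not produce wildly different trees.
dt = DecisionTreeClassifier(max_depth=5, random_state=42)
dt.fit(X_train, y_train)

print(dt.score(X_test, y_test))                               # held-out accuracy
print(dict(zip(X_train.columns, dt.feature_importances_)))    # which attributes drive the splits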

3.3 K-Nearest Neighbors Algorithm
The k-nearest neighbors algorithm (k-NN) is a method used for classification and regression. In
both cases, the input consists of the k closest training examples in the feature space.
In k-NN classification, the output is a class membership (churn or not). A customer is classified
by a plurality vote of its neighbors, with the customer being assigned to the class most common
among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the
customer is simply assigned to the class of that single nearest neighbor.

Figure 3.6 Visualization of KNN classifier


The figure shown above visualizes how KNN works when trying to classify a data point based on a
given data set. It is compared to its nearest points and classified based on which points it is
closest and most similar to. Here you can see the point Xj will be classified as either W1 (red)
or W3 (green) based on its distance from each group of points.

To determine which of the K instances in the training dataset are most similar to a new
input, a distance measure is used. For real-valued input variables, the most popular distance
measure is Euclidean distance.
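
A short sketch of the distance measure and a k-NN churn classifier under the same assumed split; scaling features first (e.g. with StandardScaler) usually helps distance-based models:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Euclidean distance between two customers in feature space (the default k-NN metric).
def euclidean(a, b):
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

knn = KNeighborsClassifier(n_neighbors=5)   # plurality vote of the 5 nearest customers
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))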

3.4 Support Vector Machine


Support Vector Machine (SVM) is a supervised machine learning algorithm which can be
used for both classification and regression challenges. However, it is mostly used in classification
problems. In this algorithm, we plot each data item, i.e. a customer, as a point in n-dimensional
space (where n is the number of features you have), with the value of each feature being the value
of a particular coordinate. Then, we perform classification by finding the hyper-plane that
differentiates the two classes very well (as shown in the figure below).

Figure 3.7 Support vectors used to classify data items by separating with a hyperplane

Advantages:
● It works really well with clear margin of separation
● It is effective in high dimensional spaces.
● It is effective in cases where number of dimensions is greater than the number of samples.
● It uses a subset of training points in the decision function (called support vectors), so it is also
memory efficient.
Disadvantages:
● It doesn’t perform well when we have a large data set, because the required training time is higher
● It also doesn’t perform very well when the data set has more noise, i.e. target classes are
overlapping
● SVM doesn’t directly provide probability estimates; these are calculated using an expensive
five-fold cross-validation (this relates to the SVC method of the Python scikit-learn library).
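
A brief SVC sketch under the same assumed split; the RBF kernel is an illustrative choice, and probability=True enables the internal cross-validated probability estimates mentioned above:

from sklearn.svm import SVC

# probability=True turns on the (expensive) internal cross-validation for probability estimates.
svm = SVC(kernel="rbf", probability=True, random_state=42)
svm.fit(X_train, y_train)

print(svm.score(X_test, y_test))
print(svm.predict_proba(X_test[:5]))   # churn probability estimates for five customers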
Deployment and monitoring
The selected models need to be put into production. Predicting customer churn with machine
learning is an iterative process that never ends. We monitor model performance and adjust
features as necessary to improve accuracy when customer-facing teams give us feedback or new
data becomes available. At the point of any human interaction – a support call, a CSM QBR
[quarterly business review], a Sales discovery call – we monitor and log the human interpretation
of customer health, which augments the machine learning models and increases the accuracy of our
health prediction for each customer.

Insights and Actions
Lastly, we have to evaluate and interpret the outcomes. Predicting customer churn is only half of
the task; even with accurate predictions, customers can still leave, and in our case we actually want
to make them stay. Selection of the most significant features for a model would influence its
predictive performance: the higher the quality of the dataset, the more precise the forecasts are.

The ability to identify customers that aren’t happy with provided solutions allows businesses to
learn about product or pricing plan weak points, operation issues, as well as customer preferences
and expectations to proactively reduce reasons for churn.

3.5 UML Diagrams
3.5.1 Data Flow Diagram

Figure 3.8 Data Flow Diagram for Churn Prediction


The data flow diagram shown in Figure 3.8 shows the flow of data right from acquiring the
dataset to model building, extracting feature importance and comparison of model performances.

3.5.2 Sequence Diagram

Figure 3.9 Sequence Diagram for Customer Churn Prediction


The sequence diagram shown in Figure 3.9 shows the sequence of execution of various processes, viz.
acquiring the dataset, preprocessing, feature extraction, predicting results and accuracy determination.

3.5.3 Activity Diagram

Figure 3.10 Activity diagram for implementation of Churn Prediction algorithm


The activity diagram shown in Figure 3.10 displays the flow of execution right from dataset
gathering to training the algorithm to feature evaluation and analysis of results.

3.5.4 Collaboration Diagram

Figure 3.11 Collaboration diagram for Customer Churn Prediction


The collaboration diagram shown in Figure 3.11 illustrates the relationships and interactions among the
data and the algorithm of the prediction model.

4. TESTING AND RESULTS

Testing is a crucial phase that determines the quality of the models used as well as the importance
of all the features under consideration. The algorithms used in this project have been rigorously
tested based on various factors including accuracy, recall, precision, F1 score and kappa statistic.
Accuracy - It measures how many observations, both positive and negative, were correctly
classified.
Accuracy = (tp + tn) / (tp + fp + fn + tn)  … (1)

For Logistic Regression,
Accuracy = (259 + 1152) / (259 + 231 + 116 + 1152) = 0.802 = 80.2%

For the KNN Classifier,
Accuracy = (351 + 878) / (351 + 139 + 390 + 878) = 0.699 = 69.9%

Figure 4.1. From the above figure, it is clear that Logistic Regression has the highest accuracy of
0.802 while the KNN classifier performed the worst with an accuracy of 0.699.

Recall - It measures how many observations out of all positive observations, have we classified
as positive. Taking our customer churn example, it tells us how many churned customers we
recalled from all the churned customers.
Recall = tp / (tp + fn)  … (2)

While optimizing recall, you want to make sure you have identified ALL the customers who
could churn.
For the SVM Classifier,
Recall = 391 / (391 + 99) = 0.798 = 79.8%

For the Decision Tree,
Recall = 223 / (223 + 267) = 0.455 = 45.5%

Figure 4.2. From the above figure, it is clear that the SVM classifier has the highest recall score of
0.798 while the Decision Tree has the least recall score of 0.455.

Precision - It measures how many observations predicted as positive are in fact positive. Taking
our fraud detection example, it tells us what the ratio of customers correctly classified as churned
is.
Precision = tp / (tp + fp)  … (3)

While optimizing precision, you want to make sure that the customers that you classify as
churned ARE ACTUALLY CHURNED.

For Logistic Regression,
Precision = 0.688 = 68.8%

For the KNN Classifier,
Precision = 351 / (351 + 390) = 0.474 = 47.4%
Figure 4.3. From the above figure, it is clear that Logistic Regression is most precise at 0.688
while the KNN classifier performed the worst with a precision of 0.474

F-1 score - Simply put, it combines precision and recall into one metric. It’s the harmonic mean
between precision and recall. A perfect F1-score is 1.0 or 100%. The closer it is to 1.0, the better
the model. You can calculate it in the following way:
F_beta = (1 + β²) × (precision × recall) / (β² × precision + recall), where β = 1  … (4)

For the SVM Classifier,
F1 score = 2 × (0.5312 × 0.798) / (0.5312 + 0.798) = 0.638

For the Decision Tree,
F1 score = 2 × (0.5947 × 0.4551) / (0.5947 + 0.4551) = 0.515

Figure 4.4. From the above figure, it is clear that SVM Classifier has the highest F1-score of
0.638 while the Decision Tree has the least score of 0.515
In the formulae of the above metrics, tp is the number of true positives, fp the number of false
positives, tn the number of true negatives and fn the number of false negatives.

Kappa Metric - It is a measure of agreement between the predictions and the actual labels. It
can also be interpreted as a comparison of the overall accuracy to the expected random chance
accuracy. The higher the Kappa metric is, the better your classifier is compared to a random
chance classifier. Kappa is defined as the difference between the overall accuracy and the
expected accuracy divided by 1 minus the expected accuracy.
K = (po - pe) / (1 - pe)  … (5)

where po is the observed agreement w.r.t. our classifier and pe is the expected agreement w.r.t. a
random classifier.

For Logistic Regression, K = 0.470; for the KNN Classifier, K = 0.353.

Figure 4.5. From the above figure, it is clear that Logistic Regression has the highest Kappa
metric of 0.470 while the KNN classifier has the least value of 0.353.
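
All five metrics above can be computed directly with scikit-learn, for example in the sketch below, which assumes a fitted classifier such as the `lr` model from the earlier examples:

from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, cohen_kappa_score)

y_pred = lr.predict(X_test)   # predictions from any fitted classifier (assumed from earlier sketches)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("Kappa    :", cohen_kappa_score(y_test, y_pred))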

4.1 Model Performances


4.1.1 Logistic Regression

Figure 4.6. From the above classification report and ROC, the following information can be
concluded:
● Accuracy of 0.80 indicates that 80% of the customers were correctly classified.
● Precision of 0.83 indicates that 83% of the churned customers predicted by the model have
actually churned.
● Recall score of 0.91 indicates that the model was able to predict 91% of the actual churned
customers as churned.
● F1 score of 0.87 out of maximum of 1 indicates that the model performs really well.

● The Area under Curve (AUC) of the Receiver Operating Characteristic (ROC) is 0.71 out of a
maximum of 1 which again indicates that the model has a good performance.

Figure 4.7. From the Logistic Regression algorithm, the above graph highlights the following
points:
● Attributes such as Contract_Two_year, Tenure_group_12-24 and InternetService_No contribute
the most towards churn. This implies that customers having a two-year contract with the
company, or who have stayed with the company for 12 to 24 months or have no internet service
are more likely to leave the company.
● Attributes such as Contract_Month-to-month, Tenure_group_48-60, Tenure_group_gt_60 and
PaperlessBilling contribute the least towards churn. This implies that customers having a monthly
contract with the company, or who have stayed with the company for more than 48 months or
have enrolled for a paperless billing service are more likely to stay with the company.
● Attributes such as Partner, DeviceProtection and MultipleLines_Yes have negligible contribution
in deciding customer churn. This implies that having a partner or not, or the device protection
service or not, or multiple phone lines plays an insignificant role in estimating the likelihood of a
customer to leave the company.
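
Reports, ROC curves and coefficient-based importances like those in Figures 4.6 and 4.7 can be reproduced along these lines; this is a sketch whose variable names follow the earlier snippets:

from sklearn.metrics import classification_report, roc_curve, roc_auc_score
import matplotlib.pyplot as plt
import pandas as pd

proba = lr.predict_proba(X_test)[:, 1]                   # churn probabilities
print(classification_report(y_test, lr.predict(X_test)))
print("AUC:", roc_auc_score(y_test, proba))

fpr, tpr, _ = roc_curve(y_test, proba)                   # ROC curve points
plt.plot(fpr, tpr)
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.show()

# Coefficient magnitudes act as a rough feature-importance ranking for logistic regression.
importance = pd.Series(lr.coef_[0], index=X_train.columns).sort_values()
print(importance.tail(5))   # attributes pushing predictions towards churn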

4.1.2 Decision Tree Classifier

Figure 4.8. From the above classification report and ROC, the following information can be
concluded:
● Accuracy of 0.72 indicates that 72% of the customers were correctly classified.
● Precision of 0.85 indicates that 85% of the churned customers predicted by the model have
actually churned.
● Recall score of 0.74 indicates that the model was able to predict 74% of the actual churned
customers as churned.
● F1 score of 0.79 out of maximum of 1 indicates that the model performs well.
● The Area under Curve (AUC) of the Receiver Operating Characteristic (ROC) is 0.70 out of a
maximum of 1 which again indicates that the model has a good performance.

Figure 4.9. From the Decision Tree Classifier, the above graph highlights the following points:
● Attributes such as Tenure_group_gt_60, TotalCharges and MonthlyCharges contribute the most
towards churn. This implies that customers who have stayed with the company for more than 60
months are more likely to leave the company. High charges levied upon the customer for the
services provided also contribute to churn.
● Attribute such as Contract_Month-to-month contributes the least towards churn. This implies that
customers having a monthly contract are given the flexibility to choose among different plans and
hence are more likely to stay with the company.
● Attributes such as Gender, PhoneService and PaymentMethod_Bank_transfer have negligible
contribution in deciding customer churn. This implies that the customer’s gender, or having a
phone service or not, or bank transfer as a payment method or not plays an insignificant role in
estimating the likelihood of a customer to leave the company.

4.1.3 KNN Classifier

Figure 4.10. From the above classification report and ROC, the following information can be
concluded:
● Accuracy of 0.69 indicates that 69% of the customers were correctly classified.
● Precision of 0.86 indicates that 86% of the churned customers predicted by the model have
actually churned.
● Recall score of 0.69 indicates that the model was able to predict 69% of the actual churned
customers as churned.
● F1 score of 0.77 out of maximum of 1 indicates that the model performs well.
● The Area under Curve (AUC) of the Receiver Operating Characteristic (ROC) is 0.70 out of a
maximum of 1 which again indicates that the model has a good performance.

4.1.4 SVM Classifier

Figure 4.11. From the above classification report and ROC, the following information can be
concluded:

● Accuracy of 0.75 indicates that 75% of the customers were correctly classified.
● Precision of 0.90 indicates that 90% of the churned customers predicted by the model have
actually churned.
● Recall score of 0.73 indicates that the model was able to predict 73% of the actual churned
customers as churned.
● F1 score of 0.81 out of maximum of 1 indicates that the model performs well.
● The Area under Curve (AUC) of the Receiver Operating Characteristic (ROC) is 0.76 out of a
maximum of 1 which again indicates that the model has a good performance.
Figure 4.12 shows the feature contributions obtained from the SVM classifier. The graph highlights the
following points (a sketch for extracting these weights from a linear SVM follows the list):
● Attributes such as tenure, InternetService_No and Contract_Two_year contribute the most
towards churn. This implies that customers having a two-year contract with the company, or no
internet service, are more likely to leave the company. The number of months the customer has
stayed with the company, i.e. the tenure, also decides the churn potential.
● Attributes such as TotalCharges and Contract_Month-to-month contribute the least towards
churn. This implies that customers on a monthly contract are given the flexibility to choose
among different plans and hence are more likely to stay with the company.
● Attributes such as DeviceProtection, Partner and PaymentMethod_Bank_transfer have a negligible
contribution in deciding customer churn. This implies that having a partner, subscribing to the
device protection service, or paying by bank transfer plays an insignificant role in estimating the
likelihood of a customer leaving the company.
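
For the linear-kernel SVM, the per-feature contributions shown in Figure 4.12 come from the model's
coef_ vector. A minimal sketch for reading and ranking them is given below; svc_lin and cols are
assumptions that mirror the Appendix code.

#Hedged sketch: ranking features by the magnitude of their linear-SVM weight.
import pandas as pd

def svm_feature_weights(fitted_linear_svc, feature_names):
    #Return features ordered by the absolute size of their coefficient.
    weights = pd.DataFrame({"feature": feature_names,
                            "coefficient": fitted_linear_svc.coef_.ravel()})
    order = weights["coefficient"].abs().sort_values(ascending = False).index
    return weights.reindex(order)

#Example usage (assumed names following the Appendix):
#print(svm_feature_weights(svc_lin, cols).head(10))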

4.2 Comparison of Models
A thorough comparison of algorithms based on the metrics mentioned above gives a
comprehensive insight into the performance and efficiency of each of them. Their performances
can be summarized as follows:

Figure 4.13. Graphical summarization of the performances of all the algorithms used

Table 4.1. Comparison of Results

Model                Accuracy_score  Recall_score  Precision  f1_score  Kappa_metric
Logistic Regression  0.8020          0.5286        0.6888     0.5982    0.4698
Decision Tree        0.7218          0.4551        0.5947     0.5156    0.3612
KNN Classifier       0.6991          0.7163        0.4737     0.5703    0.3532
SVM Classifier       0.7474          0.7980        0.5312     0.6378    0.4557

From the above table, we observe that the results produced by the Logistic Regression algorithm are the
most efficient, as is evident from its high accuracy, precision, kappa metric and F1 score. A sketch of how
such a comparison table can be assembled is given below.
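
The sketch below mirrors the model_report helper in the Appendix; the estimators and train/test splits in
the commented example are placeholders to be supplied by the reader.

#Hedged sketch: building one row of the model-comparison table per classifier.
import pandas as pd
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, cohen_kappa_score)

def model_row(name, model, X_train, y_train, X_test, y_test):
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {"Model": name,
            "Accuracy_score": accuracy_score(y_test, pred),
            "Recall_score": recall_score(y_test, pred),
            "Precision": precision_score(y_test, pred),
            "f1_score": f1_score(y_test, pred),
            "Kappa_metric": cohen_kappa_score(y_test, pred)}

#Example usage (placeholder names):
#comparison = pd.DataFrame([model_row(n, m, X_tr, y_tr, X_te, y_te)
#                           for n, m in models.items()])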

5. CONCLUSION AND FUTURE WORK
5.1 Conclusion
Churn prediction is one of the most effective strategies used in the telecom sector to retain
existing customers. It leads directly to better cost allocation in customer relationship
management activities, retaining revenue and profits in the future. It also has several positive
indirect impacts, such as increasing customer loyalty, lowering customers' sensitivity to
competitors' marketing activities, and helping to build a positive image through satisfied customers.

The results predicted by the Logistic Regression algorithm were the most efficient, with an
accuracy of 80.2%. Therefore, companies that want to prevent customer churn should utilize this
algorithm, and should replace long-term contracts with monthly or short-term contracts, thereby
giving customers more flexibility. Providing additional services such as device protection and
multiple phone lines proves to be of little value in reducing customer attrition. Lastly, focusing on
enhancing the experience of loyal customers who have stayed with the company for a long time
will prove worthwhile in ensuring their retention. The ability to identify customers who are not
happy with the services provided allows businesses to learn about weak points in products or
pricing plans, operational issues, and customer preferences and expectations, and to proactively
reduce the reasons for churn.

5.2 Future Work


An important area for future research is to use a customer profiling methodology to develop
a real-time monitoring system for churn prediction. Research dedicated to the development of an
exhaustive customer loyalty value would have significant benefits for the industry. It is anticipated
that the profiling methodology could provide insight into customer behaviour, spending
patterns, and cross-selling and up-selling opportunities. Seasonal trends could become apparent if the
same data were studied over a period of several years.

A comparative analysis of model building time across the different classifiers could also be carried
out, to help telecom analysts pick a classifier that not only gives accurate results in terms of TP rate,
AUC and lift curve, but also scales well to high-dimensional, large-volume call-record data; a minimal
timing sketch is given below. As the concrete findings relate to this telecom dataset, datasets from
other domains could be explored and tested in further work. In addition, a greater number of
performance metrics, chosen with the business context and interpretability in mind, might be
explored in the future.
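
The timing helper sketched below could serve as a starting point; the estimator names in the commented
example are assumptions that follow the Appendix code.

#Hedged sketch: measuring wall-clock training time for each classifier.
import time

def training_time_seconds(model, X_train, y_train):
    #Fit the model once and return the elapsed training time in seconds.
    start = time.perf_counter()
    model.fit(X_train, y_train)
    return time.perf_counter() - start

#Example usage (assumed names following the Appendix):
#for name, model in {"Logistic Regression": logit, "Decision Tree": decision_tree}.items():
#    print(name, round(training_time_seconds(model, train_X, train_Y), 3), "seconds")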


APPENDIX
#Importing libraries
import numpy as np
import pandas as pd  #data processing
import os
import matplotlib.pyplot as plt  #visualization
from PIL import Image
%matplotlib inline
import seaborn as sns  #visualization
import itertools
import warnings
warnings.filterwarnings("ignore")
import io
import plotly.offline as py  #visualization
py.init_notebook_mode(connected=True)  #visualization
import plotly.graph_objs as go  #visualization
import plotly.tools as tls  #visualization
import plotly.figure_factory as ff  #visualization

#Loading the dataset
telcom = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

#first few rows
telcom.head()

#Data overview
print("Rows    : ", telcom.shape[0])
print("Columns : ", telcom.shape[1])
print("\nFeatures : \n", telcom.columns.tolist())
print("\nMissing values : ", telcom.isnull().sum().values.sum())
print("\nUnique values : \n", telcom.nunique())

#Data Manipulation
#Replacing spaces with null values in the TotalCharges column
telcom['TotalCharges'] = telcom["TotalCharges"].replace(" ", np.nan)

#Dropping null values from the TotalCharges column (about 0.15% of the rows)
telcom = telcom[telcom["TotalCharges"].notnull()]
telcom = telcom.reset_index()[telcom.columns]

#Convert TotalCharges to float type
telcom["TotalCharges"] = telcom["TotalCharges"].astype(float)

#Replace 'No internet service' with 'No' for the following columns
replace_cols = ['OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                'TechSupport', 'StreamingTV', 'StreamingMovies']
for i in replace_cols:
    telcom[i] = telcom[i].replace({'No internet service': 'No'})

#Replace the numeric SeniorCitizen flag with Yes/No labels
telcom["SeniorCitizen"] = telcom["SeniorCitizen"].replace({1: "Yes", 0: "No"})

#Bin tenure into a categorical column
def tenure_lab(telcom):
    if telcom["tenure"] <= 12:
        return "Tenure_0-12"
    elif (telcom["tenure"] > 12) & (telcom["tenure"] <= 24):
        return "Tenure_12-24"
    elif (telcom["tenure"] > 24) & (telcom["tenure"] <= 48):
        return "Tenure_24-48"
    elif (telcom["tenure"] > 48) & (telcom["tenure"] <= 60):
        return "Tenure_48-60"
    elif telcom["tenure"] > 60:
        return "Tenure_gt_60"

telcom["tenure_group"] = telcom.apply(lambda telcom: tenure_lab(telcom), axis = 1)

#Separating churn and non-churn customers
churn = telcom[telcom["Churn"] == "Yes"]
not_churn = telcom[telcom["Churn"] == "No"]

#Separating categorical and numerical columns
Id_col = ['customerID']
target_col = ["Churn"]
cat_cols = telcom.nunique()[telcom.nunique() < 6].keys().tolist()
cat_cols = [x for x in cat_cols if x not in target_col]
num_cols = [x for x in telcom.columns if x not in cat_cols + target_col + Id_col]

#Exploratory Data Analysis
#Customer attrition in data
#labels
lab = telcom["Churn"].value_counts().keys().tolist()
#values
val = telcom["Churn"].value_counts().values.tolist()

#Pie chart of overall customer attrition
trace = go.Pie(labels = lab,
               values = val,
               marker = dict(colors = ['royalblue', 'lime'],
                             line = dict(color = "white", width = 1.3)),
               rotation = 90,
               hoverinfo = "label+value+text",
               hole = .5)
layout = go.Layout(dict(title = "Customer attrition in data",
                        plot_bgcolor = "rgb(243,243,243)",
                        paper_bgcolor = "rgb(243,243,243)"))

data = [trace]
fig = go.Figure(data = data, layout = layout)
py.iplot(fig)

#Variables distribution in customer attrition
#function for pie plot for customer attrition types
def plot_pie(column):
    trace1 = go.Pie(values = churn[column].value_counts().values.tolist(),
                    labels = churn[column].value_counts().keys().tolist(),
                    hoverinfo = "label+percent+name",
                    domain = dict(x = [0, .48]),
                    name = "Churn Customers",
                    marker = dict(line = dict(width = 2, color = "rgb(243,243,243)")),
                    hole = .6)
    trace2 = go.Pie(values = not_churn[column].value_counts().values.tolist(),
                    labels = not_churn[column].value_counts().keys().tolist(),
                    hoverinfo = "label+percent+name",
                    marker = dict(line = dict(width = 2, color = "rgb(243,243,243)")),
                    domain = dict(x = [.52, 1]),
                    hole = .6,
                    name = "Non churn customers")

    layout = go.Layout(dict(title = column + " distribution in customer attrition ",
                            plot_bgcolor = "rgb(243,243,243)",
                            paper_bgcolor = "rgb(243,243,243)",
                            annotations = [dict(text = "churn customers",
                                                font = dict(size = 13),
                                                showarrow = False,
                                                x = .15, y = .5),
                                           dict(text = "Non churn customers",
                                                font = dict(size = 13),
                                                showarrow = False,
                                                x = .88, y = .5)]))
    data = [trace1, trace2]
    fig = go.Figure(data = data, layout = layout)
    py.iplot(fig)

#function for histogram for customer attrition types
def histogram(column):
    trace1 = go.Histogram(x = churn[column],
                          histnorm = "percent",
                          name = "Churn Customers",
                          marker = dict(line = dict(width = .5, color = "black")),
                          opacity = .9)

    trace2 = go.Histogram(x = not_churn[column],
                          histnorm = "percent",
                          name = "Non churn customers",
                          marker = dict(line = dict(width = .5, color = "black")),
                          opacity = .9)

    data = [trace1, trace2]
    layout = go.Layout(dict(title = column + " distribution in customer attrition ",
                            plot_bgcolor = "rgb(243,243,243)",
                            paper_bgcolor = "rgb(243,243,243)",
                            xaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                         title = column,
                                         zerolinewidth = 1, ticklen = 5, gridwidth = 2),
                            yaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                         title = "percent",
                                         zerolinewidth = 1, ticklen = 5, gridwidth = 2)))
    fig = go.Figure(data = data, layout = layout)
    py.iplot(fig)

#function for scatter plot matrix for numerical columns in data
def scatter_matrix(df):
    df = df.sort_values(by = "Churn", ascending = True)
    classes = df["Churn"].unique().tolist()
    class_code = {classes[k]: k for k in range(2)}
    color_vals = [class_code[cl] for cl in df["Churn"]]
    pl_colorscale = "Portland"
    text = [df.loc[k, "Churn"] for k in range(len(df))]

    trace = go.Splom(dimensions = [dict(label = "tenure", values = df["tenure"]),
                                   dict(label = 'MonthlyCharges', values = df['MonthlyCharges']),
                                   dict(label = 'TotalCharges', values = df['TotalCharges'])],
                     text = text,
                     marker = dict(color = color_vals,
                                   colorscale = pl_colorscale,
                                   size = 3,
                                   showscale = False,
                                   line = dict(width = .1, color = 'rgb(230,230,230)')))
    axis = dict(showline = True, zeroline = False, gridcolor = "#fff", ticklen = 4)

    layout = go.Layout(dict(title = "Scatter plot matrix for Numerical columns for customer attrition",
                            autosize = False,
                            height = 800,
                            width = 800,
                            dragmode = "select",
                            hovermode = "closest",
                            plot_bgcolor = 'rgba(240,240,240, 0.95)',
                            xaxis1 = dict(axis), yaxis1 = dict(axis),
                            xaxis2 = dict(axis), yaxis2 = dict(axis),
                            xaxis3 = dict(axis), yaxis3 = dict(axis)))
    data = [trace]
    fig = go.Figure(data = data, layout = layout)
    py.iplot(fig)

#for all categorical columns plot pie
for i in cat_cols:
    plot_pie(i)

#for all numerical columns plot histogram
for i in num_cols:
    histogram(i)

#scatter plot matrix
scatter_matrix(telcom)

#Customer attrition in tenure groups
tg_ch = churn["tenure_group"].value_counts().reset_index()
tg_ch.columns = ["tenure_group", "count"]
tg_nch = not_churn["tenure_group"].value_counts().reset_index()
tg_nch.columns = ["tenure_group", "count"]

#bar - churn
trace1 = go.Bar(x = tg_ch["tenure_group"], y = tg_ch["count"],
                name = "Churn Customers",
                marker = dict(line = dict(width = .5, color = "black")),
                opacity = .9)

#bar - not churn
trace2 = go.Bar(x = tg_nch["tenure_group"], y = tg_nch["count"],
                name = "Non Churn Customers",
                marker = dict(line = dict(width = .5, color = "black")),
                opacity = .9)

layout = go.Layout(dict(title = "Customer attrition in tenure groups",
                        plot_bgcolor = "rgb(243,243,243)",
                        paper_bgcolor = "rgb(243,243,243)",
                        xaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                     title = "tenure group",
                                     zerolinewidth = 1, ticklen = 5, gridwidth = 2),
                        yaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                     title = "count",
                                     zerolinewidth = 1, ticklen = 5, gridwidth = 2)))
data = [trace1, trace2]
fig = go.Figure(data = data, layout = layout)
py.iplot(fig)

#Data preprocessing
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

#customer id col
Id_col = ['customerID']
#Target column
target_col = ["Churn"]
#categorical columns
cat_cols = telcom.nunique()[telcom.nunique() < 6].keys().tolist()
cat_cols = [x for x in cat_cols if x not in target_col]
#numerical columns
num_cols = [x for x in telcom.columns if x not in cat_cols + target_col + Id_col]
#Binary columns with 2 values
bin_cols = telcom.nunique()[telcom.nunique() == 2].keys().tolist()
#Columns with more than 2 values
multi_cols = [i for i in cat_cols if i not in bin_cols]

#Label encoding binary columns
le = LabelEncoder()
for i in bin_cols:
    telcom[i] = le.fit_transform(telcom[i])

#Duplicating (one-hot encoding) columns for multi-value columns
telcom = pd.get_dummies(data = telcom, columns = multi_cols)

#Scaling numerical columns
std = StandardScaler()
scaled = std.fit_transform(telcom[num_cols])
scaled = pd.DataFrame(scaled, columns = num_cols)

#dropping original values and merging scaled values for numerical columns
df_telcom_og = telcom.copy()
telcom = telcom.drop(columns = num_cols, axis = 1)
telcom = telcom.merge(scaled, left_index = True, right_index = True, how = "left")

#Variable Summary
summary = (df_telcom_og[[i for i in df_telcom_og.columns if i not in Id_col]].
           describe().transpose().reset_index())

summary = summary.rename(columns = {"index": "feature"})
summary = np.around(summary, 3)

val_lst = [summary['feature'], summary['count'],
           summary['mean'], summary['std'],
           summary['min'], summary['25%'],
           summary['50%'], summary['75%'], summary['max']]

trace = go.Table(header = dict(values = summary.columns.tolist(),
                               line = dict(color = ['#506784']),
                               fill = dict(color = ['#119DFF'])),
                 cells = dict(values = val_lst,
                              line = dict(color = ['#506784']),
                              fill = dict(color = ["lightgrey", '#F5F8FF'])),
                 columnwidth = [200, 60, 100, 100, 60, 60, 80, 80, 80])
layout = go.Layout(dict(title = "Variable Summary"))
figure = go.Figure(data = [trace], layout = layout)
py.iplot(figure)

#Correlation Matrix
#correlation
correlation = telcom.corr()
#tick labels
matrix_cols = correlation.columns.tolist()
#convert to array
corr_array = np.array(correlation)

#Plotting
trace = go.Heatmap(z = corr_array,
                   x = matrix_cols,
                   y = matrix_cols,
                   colorscale = "Viridis",
                   colorbar = dict(title = "Pearson Correlation coefficient",
                                   titleside = "right"))

layout = go.Layout(dict(title = "Correlation Matrix for variables",
                        autosize = False,
                        height = 720,
                        width = 800,
                        margin = dict(r = 0, l = 210, t = 25, b = 210),
                        yaxis = dict(tickfont = dict(size = 9)),
                        xaxis = dict(tickfont = dict(size = 9))))

data = [trace]
fig = go.Figure(data = data, layout = layout)
py.iplot(fig)

#Logistic Regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.metrics import f1_score
import statsmodels.api as sm
from sklearn.metrics import precision_score, recall_score
from yellowbrick.classifier import DiscriminationThreshold

#splitting train and test data
train, test = train_test_split(telcom, test_size = .25, random_state = 111)

#separating dependent and independent variables
cols = [i for i in telcom.columns if i not in Id_col + target_col]
train_X = train[cols]
train_Y = train[target_col]
test_X = test[cols]
test_Y = test[target_col]

#Function attributes
#dataframe   - processed dataframe
#algorithm   - algorithm used
#training_x  - predictor variables dataframe (training)
#testing_x   - predictor variables dataframe (testing)
#training_y  - target variable (training)
#testing_y   - target variable (testing)

#cf - ["coefficients","features"](cooefficients for logistic
#regression,features for tree based models)#threshold_plot - if True returns

threshold plot for model

telecom_churn_prediction(algorithm,training_x,testing_x, training_y,testing_y,cols,cf,threshold_plot) :

#model algorithm.fit(training_x,training_y) predictions = algorithm.predict(testing_x)


probabilities = algorithm.predict_proba(testing_x)#coeffs
if cf == "coefficients" :
coefficients = pd.DataFrame(algorithm.coef_.ravel())elif cf == "features" :
coefficients = pd.DataFrame(algorithm.feature_importances_)

column_df = pd.DataFrame(cols)
oef_sumry = (pd.merge(coefficients,column_df,left_index= True,right_index= True, how = "left"))
coef_sumry.columns = ["coefficients","features"]
coef_sumry = coef_sumry.sort_values(by = "coefficients",ascending = False)

print (algorithm)
print ("\n Classification report : \n",classification_report(testing_y,predictions))print ("Accuracy
Score : ",accuracy_score(testing_y,predictions))

#confusion matrix
conf_matrix = confusion_matrix(testing_y,predictions)#roc_auc_score
model_roc_auc = roc_auc_score(testing_y,predictions) print ("Area under curve :
",model_roc_auc,"\n") fpr,tpr,thresholds = roc_curve(testing_y,probabilities[:,1])

#plot confusion matrix


trace1 = go.Heatmap(z = conf_matrix ,
x = ["Not churn","Churn"],
y = ["Not churn","Churn"],
showscale = False,colorscale = "Picnic",name = "matrix")

67
#plot roc curve
trace2 = go.Scatter(x = fpr,y = tpr,
name = "Roc : " + str(model_roc_auc),
line = dict(color = ('rgb(22, 96, 167)'),width = 2))trace3 = go.Scatter(x =
[0,1],y=[0,1],
line = dict(color = ('rgb(205, 12, 24)'),width = 2,dash = 'dot'))
#plot coeffs
trace4 = go.Bar(x = coef_sumry["features"],y = coef_sumry["coefficients"],name = "coefficients",
marker = dict(color = coef_sumry["coefficients"],colorscale = "Picnic",
line = dict(width = .6,color = "black")))

#subplots
fig = tls.make_subplots(rows=2, cols=2, specs=[[{}, {}], [{'colspan': 2}, None]],subplot_titles=('Confusion
Matrix',
'Receiver operating characteristic','Feature Importances'))

fig.append_trace(trace1,1,1) fig.append_trace(trace2,1,2) fig.append_trace(trace3,1,2)


fig.append_trace(trace4,2,1)

fig['layout'].update(showlegend=False, title="Model performance" ,autosize = False,height = 900,width = 800,


plot_bgcolor = 'rgba(240,240,240, 0.95)', paper_bgcolor = 'rgba(240,240,240, 0.95)',
margin = dict(b = 195)) fig["layout"]["xaxis2"].update(dict(title = "false positive
rate"))fig["layout"]["yaxis2"].update(dict(title = "true positive rate"))
out"]["xaxis3"].update(dict(showgrid = True,tickfont = dict(size = 10),tickangle = 90))
py.iplot(fig)

if threshold_plot == True :
visualizer = DiscriminationThreshold(algorithm)visualizer.fit(training_x,training_y)
visualizer.poof()

logit = LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                           intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
                           penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
                           verbose=0, warm_start=False)

telecom_churn_prediction(logit, train_X, test_X, train_Y, test_Y,
                         cols, "coefficients", threshold_plot = True)

#Synthetic Minority Oversampling TEchnique (SMOTE)
#Randomly pick a point from the minority class.
#Compute the k-nearest neighbors (for some pre-specified k) for this point.
#Add k new points somewhere between the chosen point and each of its neighbors.
!pip uninstall imbalanced-learn
!pip install imbalanced-learn==0.4.0
from imblearn import under_sampling, over_sampling
from imblearn.over_sampling import SMOTE

cols = [i for i in telcom.columns if i not in Id_col + target_col]
smote_X = telcom[cols]
smote_Y = telcom[target_col]

#Split train and test data
smote_train_X, smote_test_X, smote_train_Y, smote_test_Y = train_test_split(smote_X, smote_Y,
                                                                             test_size = .25,
                                                                             random_state = 111)

#oversampling minority class using smote
os = SMOTE(random_state = 0)
os_smote_X, os_smote_Y = os.fit_sample(smote_train_X, smote_train_Y)
os_smote_X = pd.DataFrame(data = os_smote_X, columns = cols)
os_smote_Y = pd.DataFrame(data = os_smote_Y, columns = target_col)
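
#Illustrative sketch only (not part of the original pipeline): the interpolation step that
#SMOTE performs internally for a single synthetic point, matching the comments above.
#imblearn's SMOTE (used above) repeats this for many minority points and neighbours.
import numpy as np
rng = np.random.RandomState(0)
x_i = np.array([2.0, 5.0])          #a minority-class point (illustrative values)
x_nn = np.array([3.0, 4.0])         #one of its k nearest minority-class neighbours
lam = rng.uniform(0, 1)             #random interpolation factor in [0, 1]
x_new = x_i + lam * (x_nn - x_i)    #synthetic point on the segment between the two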

logit_smote = LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                                 intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
                                 penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
                                 verbose=0, warm_start=False)

telecom_churn_prediction(logit_smote, os_smote_X, test_X, os_smote_Y, test_Y,
                         cols, "coefficients", threshold_plot = True)

#Recursive Feature Elimination
from sklearn.feature_selection import RFE

logit = LogisticRegression()
rfe = RFE(logit, 10)                      #select the 10 best features
rfe = rfe.fit(os_smote_X, os_smote_Y.values.ravel())

rfe.support_
rfe.ranking_

#identified columns from Recursive Feature Elimination
idc_rfe = pd.DataFrame({"rfe_support": rfe.support_,
                        "columns": [i for i in telcom.columns if i not in Id_col + target_col],
                        "ranking": rfe.ranking_})
cols = idc_rfe[idc_rfe["rfe_support"] == True]["columns"].tolist()

#separating train and test data
train_rf_X = os_smote_X[cols]
train_rf_Y = os_smote_Y
test_rf_X = test[cols]
test_rf_Y = test[target_col]

logit_rfe = LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                               intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
                               penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
                               verbose=0, warm_start=False)
#applying model
telecom_churn_prediction(logit_rfe, train_rf_X, test_rf_X, train_rf_Y, test_rf_Y,
                         cols, "coefficients", threshold_plot = True)

tab_rk = ff.create_table(idc_rfe)
py.iplot(tab_rk)

#Univariate Selection
#Feature Extraction with Univariate Statistical Tests (Chi-squared for classification)
#uses the chi-squared (chi^2) statistical test for non-negative features to select the best features
from sklearn.feature_selection import chi2
from sklearn.feature_selection import SelectKBest

#select columns
cols = [i for i in telcom.columns if i not in Id_col + target_col]
#dataframe with non-negative values
df_x = df_telcom_og[cols]
df_y = df_telcom_og[target_col]

#fit model with k = 3
select = SelectKBest(score_func = chi2, k = 3)
fit = select.fit(df_x, df_y)

#Summarize scores
print("scores")
print(fit.scores_)
print("P - Values")
print(fit.pvalues_)

#create dataframe of scores and p-values
score = pd.DataFrame({"features": cols, "scores": fit.scores_, "p_values": fit.pvalues_})
score = score.sort_values(by = "scores", ascending = False)

#creating a new label for categorical and numerical columns
score["feature_type"] = np.where(score["features"].isin(num_cols), "Numerical", "Categorical")

#plot
trace = go.Scatter(x = score[score["feature_type"] == "Categorical"]["features"],
                   y = score[score["feature_type"] == "Categorical"]["scores"],
                   name = "Categorical",
                   mode = "lines+markers",
                   marker = dict(color = "red", line = dict(width = 1)))

trace1 = go.Bar(x = score[score["feature_type"] == "Numerical"]["features"],
                y = score[score["feature_type"] == "Numerical"]["scores"],
                name = "Numerical",
                marker = dict(color = "royalblue", line = dict(width = 1)),
                xaxis = "x2", yaxis = "y2")

layout = go.Layout(dict(title = "Scores for Categorical & Numerical features",
                        plot_bgcolor = "rgb(243,243,243)",
                        paper_bgcolor = "rgb(243,243,243)",
                        xaxis = dict(gridcolor = 'rgb(255, 255, 255)', tickfont = dict(size = 10),
                                     domain = [0, 0.7], tickangle = 90,
                                     zerolinewidth = 1, ticklen = 5, gridwidth = 2),
                        yaxis = dict(gridcolor = 'rgb(255, 255, 255)', title = "scores",
                                     zerolinewidth = 1, ticklen = 5, gridwidth = 2),
                        margin = dict(b = 200),
                        xaxis2 = dict(domain = [0.8, 1], tickangle = 90,
                                      gridcolor = 'rgb(255, 255, 255)'),
                        yaxis2 = dict(anchor = 'x2', gridcolor = 'rgb(255, 255, 255)')))

data = [trace, trace1]
fig = go.Figure(data = data, layout = layout)
py.iplot(fig)

#Decision Tree
#Using top three numerical features
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
from sklearn import tree
from graphviz import Source
from IPython.display import SVG, display

#top 3 categorical features
features_cat = score[score["feature_type"] == "Categorical"]["features"][:3].tolist()

#top 3 numerical features
features_num = score[score["feature_type"] == "Numerical"]["features"][:3].tolist()

#Function attributes
#columns           - selected columns
#maximum_depth     - depth of tree
#criterion_type    - ["gini" or "entropy"]
#split_type        - ["best" or "random"]
#model_performance - True (gives model output)

def plot_decision_tree(columns, maximum_depth, criterion_type, split_type, model_performance = None):
    #separating dependent and independent variables
    dtc_x = df_x[columns]
    dtc_y = df_y[target_col]

    #model
    dt_classifier = DecisionTreeClassifier(max_depth = maximum_depth,
                                           splitter = split_type,
                                           criterion = criterion_type)
    dt_classifier.fit(dtc_x, dtc_y)

    #plot decision tree
    graph = Source(tree.export_graphviz(dt_classifier, out_file = None,
                                        rounded = True, proportion = False,
                                        feature_names = columns, precision = 2,
                                        class_names = ["Not churn", "Churn"],
                                        filled = True))

    #model performance
    if model_performance == True:
        telecom_churn_prediction(dt_classifier,
                                 dtc_x, test_X[columns], dtc_y, test_Y,
                                 columns, "features", threshold_plot = True)
    display(graph)

plot_decision_tree(features_num, 3, "gini", "best")

#using contract, tenure and paperless billing variables
columns = ['tenure', 'Contract_Month-to-month', 'PaperlessBilling',
           'Contract_One year', 'Contract_Two year']

plot_decision_tree(columns, 3, "gini", "best", model_performance = True)

#KNN Classifier


#Applying knn algorithm to smote oversampled data.
def telecom_churn_prediction_alg(algorithm, training_x, testing_x,
                                 training_y, testing_y, threshold_plot = True):
    #model
    algorithm.fit(training_x, training_y)
    predictions = algorithm.predict(testing_x)
    probabilities = algorithm.predict_proba(testing_x)

    print(algorithm)
    print("\n Classification report : \n", classification_report(testing_y, predictions))
    print("Accuracy Score : ", accuracy_score(testing_y, predictions))
    #confusion matrix
    conf_matrix = confusion_matrix(testing_y, predictions)
    #roc_auc_score
    model_roc_auc = roc_auc_score(testing_y, predictions)
    print("Area under curve : ", model_roc_auc)
    fpr, tpr, thresholds = roc_curve(testing_y, probabilities[:, 1])

    #plot roc curve
    trace1 = go.Scatter(x = fpr, y = tpr,
                        name = "Roc : " + str(model_roc_auc),
                        line = dict(color = ('rgb(22, 96, 167)'), width = 2))
    trace2 = go.Scatter(x = [0, 1], y = [0, 1],
                        line = dict(color = ('rgb(205, 12, 24)'), width = 2, dash = 'dot'))

    #plot confusion matrix
    trace3 = go.Heatmap(z = conf_matrix,
                        x = ["Not churn", "Churn"], y = ["Not churn", "Churn"],
                        showscale = False, colorscale = "Blues", name = "matrix",
                        xaxis = "x2", yaxis = "y2")

    layout = go.Layout(dict(title = "Model performance", autosize = False,
                            height = 500, width = 800, showlegend = False,
                            plot_bgcolor = "rgb(243,243,243)",
                            paper_bgcolor = "rgb(243,243,243)",
                            xaxis = dict(title = "false positive rate",
                                         gridcolor = 'rgb(255, 255, 255)',
                                         domain = [0, 0.6], ticklen = 5, gridwidth = 2),
                            yaxis = dict(title = "true positive rate",
                                         gridcolor = 'rgb(255, 255, 255)',
                                         zerolinewidth = 1, ticklen = 5, gridwidth = 2),
                            margin = dict(b = 200),
                            xaxis2 = dict(domain = [0.7, 1], tickangle = 90,
                                          gridcolor = 'rgb(255, 255, 255)'),
                            yaxis2 = dict(anchor = 'x2', gridcolor = 'rgb(255, 255, 255)')))
    data = [trace1, trace2, trace3]
    fig = go.Figure(data = data, layout = layout)
    py.iplot(fig)

    if threshold_plot == True:
        visualizer = DiscriminationThreshold(algorithm)
        visualizer.fit(training_x, training_y)
        visualizer.poof()

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
                           weights='uniform')
telecom_churn_prediction_alg(knn, os_smote_X, test_X,
                             os_smote_Y, test_Y, threshold_plot = True)

#Support Vector Machine
from sklearn.svm import SVC

#Support vector classifier
#using linear hyper plane
svc_lin = SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
              decision_function_shape='ovr', degree=3, gamma=1.0, kernel='linear',
              max_iter=-1, probability=True, random_state=None,
              shrinking=True, tol=0.001, verbose=False)

cols = [i for i in telcom.columns if i not in Id_col + target_col]

telecom_churn_prediction(svc_lin, os_smote_X, test_X, os_smote_Y, test_Y,
                         cols, "coefficients", threshold_plot = False)

#Model Performance Metrics
from sklearn.metrics import f1_score
from sklearn.metrics import cohen_kappa_score

#gives model report in dataframe
def model_report(model, training_x, testing_x, training_y, testing_y, name):
    model.fit(training_x, training_y)
    predictions = model.predict(testing_x)
    accuracy = accuracy_score(testing_y, predictions)
    recallscore = recall_score(testing_y, predictions)
    precision = precision_score(testing_y, predictions)
    roc_auc = roc_auc_score(testing_y, predictions)
    f1score = f1_score(testing_y, predictions)
    kappa_metric = cohen_kappa_score(testing_y, predictions)

    df = pd.DataFrame({"Model": [name],
                       "Accuracy_score": [accuracy],
                       "Recall_score": [recallscore],
                       "Precision": [precision],
                       "f1_score": [f1score],
                       "Area_under_curve": [roc_auc],
                       "Kappa_metric": [kappa_metric]})
    return df

#outputs for every model
model1 = model_report(logit, train_X, test_X, train_Y, test_Y, "Logistic Regression")
decision_tree = DecisionTreeClassifier(max_depth = 9, random_state = 123,
                                       splitter = "best", criterion = "gini")
model2 = model_report(decision_tree, train_X, test_X, train_Y, test_Y, "Decision Tree")
model3 = model_report(knn, os_smote_X, test_X, os_smote_Y, test_Y, "KNN Classifier")
model4 = model_report(svc_lin, os_smote_X, test_X, os_smote_Y, test_Y, "SVM Classifier")

#concat all models
model_performances = pd.concat([model1, model2, model3, model4], axis = 0).reset_index()
model_performances = model_performances.drop(columns = "index", axis = 1)

table = ff.create_table(np.round(model_performances, 4))
py.iplot(table)

#Compare Model Metrics
model_performances

def output_tracer(metric, color):
    tracer = go.Bar(y = model_performances["Model"],
                    x = model_performances[metric],
                    orientation = "h", name = metric,
                    marker = dict(line = dict(width = .7), color = color))
    return tracer

layout = go.Layout(dict(title = "Model Performances",
                        plot_bgcolor = "rgb(243,243,243)",
                        paper_bgcolor = "rgb(243,243,243)",
                        xaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                     title = "Metric",
                                     zerolinewidth = 1, ticklen = 5, gridwidth = 2),
                        yaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                     zerolinewidth = 1, ticklen = 5, gridwidth = 2),
                        margin = dict(l = 250),
                        height = 780))

trace1 = output_tracer("Accuracy_score", "#6699FF")
trace2 = output_tracer('Recall_score', "red")
trace3 = output_tracer('Precision', "#33CC99")
trace4 = output_tracer('f1_score', "lightgrey")
trace5 = output_tracer('Kappa_metric', "#FFCC99")

data = [trace1, trace2, trace3, trace4, trace5]
fig = go.Figure(data = data, layout = layout)
py.iplot(fig)
