Customer Churn Analysis - Jupyter Notebook

This document contains a summary of steps taken to analyze customer churn from a telecom dataset:
1. Various libraries are imported and the dataset is read into a Pandas dataframe with over 7,000 rows and 21 columns of customer data.
2. Unneeded columns like customer ID are dropped, object columns are converted to numeric where possible, and null values are dropped to clean the data.
3. Exploratory data analysis is begun with a bar plot showing the distribution of customers who have churned versus those still with the provider.

Uploaded by

akash.050501

07/12/2023, 15:50 Customer Churn Analysis - Jupyter Notebook

Import necessary libraries


In [1]: import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        import seaborn as sns

Read dataset
In [2]: df = pd.read_csv('Tel_Customer_Churn_Dataset.csv')
        df.head()

Out[2]:
   customerID  gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService  MultipleLines     InternetService  Onlin
0  7590-VHVEG  Female  0              Yes      No          1       No            No phone service  DSL
1  5575-GNVDE  Male    0              No       No          34      Yes           No                DSL
2  3668-QPYBK  Male    0              No       No          2       Yes           No                DSL
3  7795-CFOCW  Male    0              No       No          45      No            No phone service  DSL
4  9237-HQITU  Female  0              No       No          2       Yes           No                Fiber optic

5 rows × 21 columns

In [3]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null object
1 gender 7043 non-null object
2 SeniorCitizen 7043 non-null int64
3 Partner 7043 non-null object
4 Dependents 7043 non-null object
5 tenure 7043 non-null int64
6 PhoneService 7043 non-null object
7 MultipleLines 7043 non-null object
8 InternetService 7043 non-null object
9 OnlineSecurity 7043 non-null object
10 OnlineBackup 7043 non-null object
11 DeviceProtection 7043 non-null object
12 TechSupport 7043 non-null object
13 StreamingTV 7043 non-null object
14 StreamingMovies 7043 non-null object
15 Contract 7043 non-null object
16 PaperlessBilling 7043 non-null object
17 PaymentMethod 7043 non-null object
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null object
20 Churn 7043 non-null object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB

Dropping unwanted columns

localhost:8888/notebooks/KINGS LABS/4. Customer churn analysis/Customer Churn Analysis.ipynb 1/10



In [4]: df = df.drop(["customerID"], axis=1)
        df.head()

Out[4]:
   gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService  MultipleLines     InternetService  OnlineSecurity  On
0  Female  0              Yes      No          1       No            No phone service  DSL              No
1  Male    0              No       No          34      Yes           No                DSL              Yes
2  Male    0              No       No          2       Yes           No                DSL              Yes
3  Male    0              No       No          45      No            No phone service  DSL              Yes
4  Female  0              No       No          2       Yes           No                Fiber optic      No

Converting the 'TotalCharges' column to numeric values

In [5]: df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
        df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gender 7043 non-null object
1 SeniorCitizen 7043 non-null int64
2 Partner 7043 non-null object
3 Dependents 7043 non-null object
4 tenure 7043 non-null int64
5 PhoneService 7043 non-null object
6 MultipleLines 7043 non-null object
7 InternetService 7043 non-null object
8 OnlineSecurity 7043 non-null object
9 OnlineBackup 7043 non-null object
10 DeviceProtection 7043 non-null object
11 TechSupport 7043 non-null object
12 StreamingTV 7043 non-null object
13 StreamingMovies 7043 non-null object
14 Contract 7043 non-null object
15 PaperlessBilling 7043 non-null object
16 PaymentMethod 7043 non-null object
17 MonthlyCharges 7043 non-null float64
18 TotalCharges 7032 non-null float64
19 Churn 7043 non-null object
dtypes: float64(2), int64(2), object(16)
memory usage: 1.1+ MB
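The drop from 7043 to 7032 non-null values in TotalCharges is what errors='coerce' is expected to produce when some entries cannot be parsed as numbers. A minimal sketch of that behaviour on a toy Series (the values are made up, and the blank string is only an assumed example of an unparseable entry):

```python
import pandas as pd

# Hypothetical values; the blank string stands in for an unparseable entry.
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors="coerce" replaces anything that cannot be parsed with NaN
# instead of raising a ValueError.
converted = pd.to_numeric(raw, errors="coerce")

print(converted.dtype)         # float64
print(converted.isna().sum())  # 1
```

Because the failures become NaN silently, it is worth counting nulls right after the conversion, which is what the next step does.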

Checking for null values


In [6]: df.isnull().sum()

Out[6]: gender 0
SeniorCitizen 0
Partner 0
Dependents 0
tenure 0
PhoneService 0
MultipleLines 0
InternetService 0
OnlineSecurity 0
OnlineBackup 0
DeviceProtection 0
TechSupport 0
StreamingTV 0
StreamingMovies 0
Contract 0
PaperlessBilling 0
PaymentMethod 0
MonthlyCharges 0
TotalCharges 11
Churn 0
dtype: int64

Treating null values


In [7]: df = df.dropna()

In [8]: df.isnull().sum().sum()

Out[8]: 0
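Dropping rows is one way to treat the nulls; filling them is another. The sketch below contrasts the two on a toy frame with made-up values. The fill value of 0 is only an assumption for illustration and is not something the notebook does.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; one TotalCharges entry is missing.
toy = pd.DataFrame({
    "tenure": [1, 0, 34],
    "MonthlyCharges": [29.85, 52.55, 56.95],
    "TotalCharges": [29.85, np.nan, 1889.50],
})

# Look at the affected rows before deciding how to treat them.
missing_rows = toy[toy["TotalCharges"].isna()]

# Option 1 (as in the notebook): drop every row that contains a null.
dropped = toy.dropna()

# Option 2 (an alternative, not used in the notebook): keep the rows
# and fill the gap with a chosen value such as 0.
filled = toy.fillna({"TotalCharges": 0.0})

print(len(missing_rows), len(dropped), len(filled))  # 1 2 3
```

With only 11 of 7043 rows affected, dropping loses very little data, which makes it a reasonable default here.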

EDA


In [9]: # churn distribution

        # Take labels and heights from the same value_counts() result so they
        # stay aligned; unique() and value_counts() can order values differently.
        churn_counts = df['Churn'].value_counts()
        plt.figure(figsize=(8, 6))
        plt.bar(churn_counts.index, churn_counts.values, color=['green', 'yellow'])
        plt.title('Churn Distribution', fontsize=16, fontweight='bold')
        plt.xlabel('Churn')
        plt.ylabel('Count')
        plt.grid(axis='y', linestyle='--', alpha=0.7)
        plt.show()
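The bar plot shows absolute counts. The same information can be read as proportions with value_counts(normalize=True). A small sketch on made-up labels (not the dataset's actual values):

```python
import pandas as pd

# Made-up labels for illustration only.
churn = pd.Series(["No", "No", "Yes", "No", "Yes", "No", "No", "No"])

counts = churn.value_counts()                # absolute counts per class
shares = churn.value_counts(normalize=True)  # fraction of rows per class

print(counts["No"], counts["Yes"])  # 6 2
print(shares["Yes"])                # 0.25
```

Knowing the class shares up front helps when reading the accuracy figures later, since a model that always predicts the larger class already scores that class's share.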


In [10]: # churn vs gender

         sns.countplot(x='gender', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor = 'Bl
         sns.despine()
         plt.grid(axis='y', linestyle='--', alpha=0.7)

In [11]: # churn vs SeniorCitizen

         sns.countplot(x='SeniorCitizen', hue="Churn", data=df, palette=['green', 'yellow'], edgecolo
         sns.despine()
         plt.grid(axis='y', linestyle='--', alpha=0.7)


In [12]: # churn vs partner

         sns.countplot(x='Partner', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor = 'B
         sns.despine()
         plt.grid(axis='y', linestyle='--', alpha=0.7)

In [13]: # churn vs dependents

         sns.countplot(x='Dependents', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor =
         sns.despine()
         plt.grid(axis='y', linestyle='--', alpha=0.7)


In [14]: # churn vs contract

         sns.countplot(x='Contract', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor =
         sns.despine()
         plt.grid(axis='y', linestyle='--', alpha=0.7)
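Count plots make it easy to compare bar heights, but groups of different sizes are hard to compare by eye. One way to get a churn rate per category is pd.crosstab with normalize="index". The sketch below uses a made-up toy frame; the category labels and outcomes are assumptions for illustration, not results from the dataset.

```python
import pandas as pd

# Made-up rows; labels and outcomes are for illustration only.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Month-to-month",
                 "Month-to-month", "One year", "One year",
                 "Two year", "Two year"],
    "Churn": ["Yes", "Yes", "Yes", "No", "No", "Yes", "No", "No"],
})

# normalize="index" turns each row of the table into proportions that
# sum to 1, so the "Yes" column is the churn rate for that category.
rates = pd.crosstab(toy["Contract"], toy["Churn"], normalize="index")

print(rates["Yes"])
```

The same call works for gender, SeniorCitizen, Partner, and Dependents by swapping the first argument.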

In [15]: # churn vs MonthlyCharges

         ax = sns.kdeplot(df['MonthlyCharges'][df["Churn"] == 'No'], fill = True,color='green')
         ax = sns.kdeplot(df['MonthlyCharges'][df["Churn"] == 'Yes'],ax =ax, fill= True,color='skyblu
         ax.legend(["Not Churn","Churn"],loc='upper right')
         ax.set_ylabel('Density')
         ax.set_xlabel('Monthly Charges')
         ax.set_title('Distribution of monthly charges by churn')

Out[15]: Text(0.5, 1.0, 'Distribution of monthly charges by churn')


In [16]: # churn vs TotalCharges

         ax = sns.kdeplot(df['TotalCharges'][df["Churn"] == 'No'], fill = True,color='green')
         ax = sns.kdeplot(df['TotalCharges'][df["Churn"] == 'Yes'],ax =ax,fill= True,color='skyblue'
         ax.legend(["Not Churn","Churn"],loc='upper right')
         ax.set_ylabel('Density')
         ax.set_xlabel('TotalCharges')
         ax.set_title('Distribution of Total Charges by churn')

Out[16]: Text(0.5, 1.0, 'Distribution of Total Charges by churn')

Label encoding
In [17]: df["gender"] = df["gender"].map({"Female": 0, "Male": 1})
         df["Partner"] = df["Partner"].map({"No": 0, "Yes": 1})
         df["Dependents"] = df["Dependents"].map({"No": 0, "Yes": 1})
         df["PhoneService"] = df["PhoneService"].map({"No": 0, "Yes": 1})
         df["PaperlessBilling"] = df["PaperlessBilling"].map({"No": 0, "Yes": 1})
         df["Churn"] = df["Churn"].map({"No": 0, "Yes": 1})

In [18]: df = pd.get_dummies(df, drop_first=True)
         df.head()

Out[18]:
   gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService  PaperlessBilling  MonthlyCharges  TotalCharges
0  0       0              1        0           1       0             1                 29.85           29.85
1  1       0              0        0           34      1             0                 56.95           1889.50
2  1       0              0        0           2       1             1                 53.85           108.15
3  1       0              0        0           45      0             0                 42.30           1840.75
4  0       0              0        0           2       1             1                 70.70           151.65

5 rows × 31 columns
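The two encoding steps do different jobs: map() handles two-valued columns, while get_dummies() expands the remaining text columns into indicator columns, which is why the frame grows to 31 columns. A minimal sketch on a toy frame (the rows are made up; the column names are borrowed from the dataset):

```python
import pandas as pd

# Toy rows for illustration only.
toy = pd.DataFrame({
    "Partner": ["Yes", "No", "Yes"],
    "InternetService": ["DSL", "Fiber optic", "No"],
})

# Binary column: explicit 0/1 mapping, as in the notebook.
# Any value missing from the dictionary would become NaN.
toy["Partner"] = toy["Partner"].map({"No": 0, "Yes": 1})

# Multi-class column: one indicator per category, minus the first
# category in sorted order when drop_first=True.
encoded = pd.get_dummies(toy, drop_first=True)

print(list(encoded.columns))
# ['Partner', 'InternetService_Fiber optic', 'InternetService_No']
```

Dropping the first level avoids a redundant column, since a row with all remaining indicators at zero must belong to the dropped category.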

Assigning dependent and independent variable



In [19]: X = df.drop(columns="Churn")
         y = df["Churn"]

Machine Learning classification model libraries


In [20]: from sklearn.linear_model import LogisticRegression
         from sklearn.tree import DecisionTreeClassifier
         from sklearn.ensemble import RandomForestClassifier
         from sklearn import metrics
         from sklearn.metrics import classification_report
         from sklearn.model_selection import train_test_split

Splitting the dataset into training and testing set


In [21]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state =12
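With test_size = 0.3, about 30% of the 7032 cleaned rows go to the test set, which matches the 2110 test rows in the reports that follow. When the classes are imbalanced, train_test_split also accepts a stratify argument that keeps the class ratio similar in both parts. The notebook's visible call does not show it, so the sketch below is an optional variation on synthetic data with made-up proportions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 25% positive class (made-up proportions).
X = np.arange(200).reshape(100, 2)
y = np.array([1] * 25 + [0] * 75)

# stratify=y keeps the class ratio close to the original in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)    # (70, 2) (30, 2)
print(y_train.mean(), y_test.mean())  # both close to 0.25
```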

Logistic regression
In [22]: logmodel = LogisticRegression(random_state=50)
         logmodel.fit(X_train, y_train)
         pred = logmodel.predict(X_test)

         print(classification_report(y_test, pred))

              precision    recall  f1-score   support

           0       0.85      0.88      0.87      1567
           1       0.63      0.57      0.60       543

    accuracy                           0.80      2110
   macro avg       0.74      0.73      0.73      2110
weighted avg       0.80      0.80      0.80      2110

C:\Users\msi\anaconda3\Lib\site-packages\sklearn\linear_model\_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
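The ConvergenceWarning names two remedies: raise max_iter or scale the features. A minimal sketch that applies both inside a pipeline is shown here. It runs on synthetic data from make_classification, not the churn data, and max_iter=1000 is an arbitrary assumed value rather than a tuned one.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features (not the notebook's data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X[:, 0] *= 1000.0  # give one feature a much larger scale than the rest

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Scaling inside a pipeline means the scaler is fitted on the training
# split only; max_iter is raised from its default of 100 as a margin.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=50),
)
model.fit(X_train, y_train)

score = model.score(X_test, y_test)  # accuracy on the held-out split
print(score)
```

In the notebook this would amount to replacing logmodel with such a pipeline; whether the reported scores change is not something this sketch can tell.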

Decision Tree


In [23]: dtmodel = DecisionTreeClassifier(criterion = "gini", random_state = 50)
         dtmodel.fit(X_train, y_train)
         dt_pred = dtmodel.predict(X_test)

         print(classification_report(y_test, dt_pred))

              precision    recall  f1-score   support

           0       0.83      0.79      0.81      1567
           1       0.47      0.52      0.49       543

    accuracy                           0.72      2110
   macro avg       0.65      0.66      0.65      2110
weighted avg       0.73      0.72      0.73      2110

Random Forest
In [24]: rfmodel = RandomForestClassifier(n_estimators = 100, criterion = 'entropy', random_state = 0
         rfmodel.fit(X_train, y_train)
         rf_pred = rfmodel.predict(X_test)

         print(classification_report(y_test, rf_pred))

              precision    recall  f1-score   support

           0       0.84      0.89      0.86      1567
           1       0.62      0.50      0.55       543

    accuracy                           0.79      2110
   macro avg       0.73      0.69      0.71      2110
weighted avg       0.78      0.79      0.78      2110
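To compare the three models side by side, the reported figures can be collected in one table. The sketch below only re-enters numbers from the classification reports above and does not retrain anything. The majority-class baseline is derived from the support counts (1567 of 2110 test rows are class 0).

```python
import pandas as pd

# Figures copied from the three classification reports
# (class 1 = churned customers, 2110 test rows).
results = pd.DataFrame(
    {
        "precision_1": [0.63, 0.47, 0.62],
        "recall_1": [0.57, 0.52, 0.50],
        "f1_1": [0.60, 0.49, 0.55],
        "accuracy": [0.80, 0.72, 0.79],
    },
    index=["Logistic regression", "Decision tree", "Random forest"],
)

# Accuracy of a model that always predicts class 0 on this test split.
majority_baseline = 1567 / 2110

# Rank by F1 on the churn class, since accuracy alone is dominated
# by the larger non-churn class.
best_by_f1 = results["f1_1"].idxmax()

print(results.sort_values("f1_1", ascending=False))
print(round(majority_baseline, 3))  # 0.743
print(best_by_f1)                   # Logistic regression
```

Read this way, logistic regression has the best churn-class F1 and recall of the three, and the decision tree's 0.72 accuracy falls below the 0.743 baseline.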

In [25]: # Factors contributing to customer attrition:
         # 1. Contract - if it is 'Month to month', churn rate is high
         # 2. Monthly charges - if they are between 65 and 110, churn rate is high
         # 3. Total charges - churn rate is high when they are less than 2000
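The charge ranges in these notes are read off the density plots. One way to check such a range numerically is to bin the column with pd.cut and take the mean of the 0/1 Churn column per bin. The sketch below uses made-up rows, so its output illustrates the method only and says nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration; Churn is already encoded as 0/1.
toy = pd.DataFrame({
    "MonthlyCharges": [20.0, 45.0, 70.0, 85.0, 95.0, 105.0, 30.0, 75.0],
    "Churn": [0, 0, 1, 1, 0, 1, 0, 1],
})

# Bin edges chosen to isolate the 65-110 range mentioned in the notes.
bands = pd.cut(toy["MonthlyCharges"], bins=[0, 65, 110])

# The mean of a 0/1 column is the share of churned customers per band.
rate_by_band = toy.groupby(bands, observed=True)["Churn"].mean()

print(rate_by_band)
```

The same pattern with bins=[0, 2000, ...] on TotalCharges would test the third note.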
