unsupervised learning

AllLife Bank aims to enhance its credit card customer base through targeted marketing and improved service delivery. The Data Science team utilized clustering algorithms, specifically K-means and Hierarchical clustering, to segment customers based on financial attributes and interaction patterns. The analysis identified distinct customer clusters, leading to tailored marketing strategies and recommendations for service improvements.

Uploaded by

murali.dhiviya96
Copyright
© © All Rights Reserved

UNSUPERVISED LEARNING

K-MEANS, HIERARCHICAL CLUSTERING

ALL LIFE BANK


BUSINESS REPORT

PGPDSBA.O. AUG24.A
DHIVIYA MURALIDHARAN
Problem Statement
AllLife Bank wants to focus on its credit card customer base in the next financial year. They have been
advised by their marketing research team that market penetration can be improved.
Based on this input, the Marketing team proposes to run personalized campaigns to target new
customers as well as upsell to existing customers. Another insight from the market research was that
customers perceive the support services of the bank poorly. Based on this, the Operations team
wants to upgrade the service delivery model to ensure that customer queries are resolved faster.
The Head of Marketing and the Head of Delivery both decide to reach out to the Data Science team for
help.

Objective

To identify different segments in the existing customers, based on their spending patterns as well as
past interaction with the bank, using clustering algorithms, and provide recommendations to the
bank on how to better market to and service these customers.

Data Description

The data provided is of various customers of a bank and their financial attributes like credit limit, the
total number of credit cards the customer has, and different channels through which customers have
contacted the bank for any queries (including visiting the bank, online, and through a call center).

Data Dictionary

• Sl_No: Primary key of the records
• Customer Key: Customer identification number
• Average Credit Limit: Average credit limit of each customer for all credit cards
• Total credit cards: Total number of credit cards possessed by the customer
• Total visits bank: Total number of visits that the customer made (yearly) personally to the
bank
• Total visits online: Total number of visits or online logins made by the customer (yearly)
• Total calls made: Total number of calls made by the customer to the bank or its customer
service department (yearly)

DATA OVERVIEW
STATISTICAL SUMMARY

OBSERVATIONS
Mean:

• Avg_Credit_Limit: 34574.24
• Total_Credit_Cards: 4.70
• Total_visits_bank: 2.40
• Total_visits_online: 2.60
• Total_calls_made: 3.58

Rows with a value of zero:

• Total_visits_bank: 100 rows
• Total_visits_online: 144 rows
• Total_calls_made: 97 rows

There are no missing or duplicate values.
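The checks above (means, zero counts, missing and duplicate values) can be reproduced along the following lines. This is a minimal sketch that assumes pandas is used; the report does not name its tooling, and the six rows below are hypothetical stand-ins for the bank's data, not the actual records.

```python
import pandas as pd

# Hypothetical toy rows standing in for the customer file (NOT the real data).
df = pd.DataFrame({
    "Avg_Credit_Limit":    [100000, 50000, 50000, 30000, 8000, 8000],
    "Total_Credit_Cards":  [2, 3, 7, 5, 6, 2],
    "Total_visits_bank":   [1, 0, 1, 1, 0, 2],
    "Total_visits_online": [1, 10, 3, 1, 12, 0],
    "Total_calls_made":    [0, 9, 4, 4, 3, 8],
})

summary = df.describe().T                  # count, mean, std, min, quartiles, max per feature
zero_counts = (df == 0).sum()              # number of rows holding a zero, per column
missing = int(df.isnull().sum().sum())     # total number of missing cells
duplicates = int(df.duplicated().sum())    # number of fully duplicated rows

print(summary[["mean", "50%"]])
print(zero_counts)
print(f"missing={missing}, duplicates={duplicates}")
```

Zeros in the visit and call columns are legitimate values (a customer who never used that channel), so they are counted here rather than treated as missing.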

EXPLORATORY DATA ANALYSIS

• Skew: Right
• Normal distribution: No
• Outliers: Right

• Skew: Right
• Normal distribution: No
• Outliers: No

• Skew: Right
• Normal distribution: No
• Outliers: Right

• Skew: Right
• Normal distribution: No
• Outliers: No

• Skew: Left
• Normal distribution: No
• Outliers: No
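The skew direction and the side on which outliers fall can be checked numerically as well as by eye. The sketch below assumes the usual box-plot rule (points beyond 1.5 times the interquartile range count as outliers); the helper name `describe_shape` and the sample values are hypothetical.

```python
import pandas as pd

def describe_shape(s: pd.Series) -> dict:
    """Report skew direction and how many IQR-rule outliers lie on each side."""
    skew = s.skew()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # box-plot whisker limits
    return {
        "skew": "right" if skew > 0 else "left" if skew < 0 else "none",
        "outliers_low": int((s < low).sum()),
        "outliers_high": int((s > high).sum()),
    }

# Hypothetical right-skewed sample with one extreme value on the right.
sample = pd.Series([5, 6, 7, 8, 9, 10, 11, 12, 13, 200])
shape = describe_shape(sample)
print(shape)
```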
UNIVARIATE (NUMERICAL AND DISCRETE)

• Feature: Avg_Credit_Limit, Mean: 34574.24, Median: 18000.0, Mode: 100000
• Feature: Total_Credit_Cards, Mean: 4.71, Median: 5.0, Mode: 2
• Feature: Total_visits_bank, Mean: 2.40, Median: 2.0, Mode: 1
• Feature: Total_visits_online, Mean: 2.61, Median: 2.0, Mode: 1
• Feature: Total_calls_made, Mean: 3.58, Median: 3.0, Mode: 0

BIVARIATE ANALYSIS

• 'Avg_Credit_Limit' vs. 'Total_calls_made': negatively correlated
• 'Avg_Credit_Limit' vs. 'Total_visits_online': positively correlated
• 'Avg_Credit_Limit' vs. 'Total_Credit_Cards': positively correlated
• 'Total_calls_made' vs. 'Total_Credit_Cards': negatively correlated
• 'Total_visits_bank' vs. 'Total_Credit_Cards': positively correlated
• 'Total_visits_bank' vs. 'Total_visits_online': negatively correlated
• 'Total_visits_bank' vs. 'Total_calls_made': negatively correlated
• No pairwise correlation exceeds the +/- 0.75 threshold
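A threshold check like the one in the last bullet can be sketched as follows, assuming Pearson correlation computed with pandas. The values are hypothetical and chosen only so that the signs resemble the list above; they are not the bank's data.

```python
from itertools import combinations

import pandas as pd

# Hypothetical values (NOT the real data), picked to give moderate correlations.
df = pd.DataFrame({
    "Avg_Credit_Limit":   [10, 20, 30, 40, 50, 60],
    "Total_Credit_Cards": [3, 1, 2, 6, 4, 5],
    "Total_calls_made":   [6, 4, 5, 1, 2, 3],
})

corr = df.corr()  # Pearson correlation matrix

# One entry per unordered feature pair, then keep only the strong ones.
pairs = {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
strong = {pair: r for pair, r in pairs.items() if abs(r) > 0.75}

for (a, b), r in pairs.items():
    print(f"{a} vs. {b}: {r:+.2f}")
print("pairs above threshold:", strong)
```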

Data pre-processing
• Prepare the data for analysis
• Feature engineering
• Missing value treatment
• Outlier treatment
• Duplicate observations check and removal if found

After outlier treatment

Outliers were treated with flooring and capping; there are no missing or duplicate values. Scaling
was also applied to prepare the data for clustering.
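Flooring and capping followed by scaling might look like the sketch below. The report does not state which bounds or which scaler were used, so the 1.5 x IQR whisker limits and z-score standardization here are assumptions, and the helper name `cap_outliers` and the data are hypothetical.

```python
import pandas as pd

def cap_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Floor and cap a column at the IQR whisker limits (assumed rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Hypothetical columns, each with one extreme value on the right.
df = pd.DataFrame({
    "Avg_Credit_Limit":    [5, 6, 7, 8, 9, 10, 11, 12, 13, 200],
    "Total_visits_online": [0, 1, 1, 2, 2, 2, 3, 3, 4, 15],
}).astype(float)

capped = df.apply(cap_outliers)                          # column-wise flooring and capping
scaled = (capped - capped.mean()) / capped.std(ddof=0)   # z-score: mean 0, unit variance

print(capped.max())
print(scaled.round(2))
```

Scaling matters for distance-based clustering because Avg_Credit_Limit is several orders of magnitude larger than the visit and call counts and would otherwise dominate the distances.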
Applying K-means clustering with the elbow curve
The elbow curve suggests that either 2 or 3 clusters is appropriate.

Silhouette scores

The silhouette score is highest for K=3.
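The elbow curve and the silhouette comparison can be produced with a loop over candidate values of K. This is a sketch assuming scikit-learn; the three synthetic blobs are hypothetical data used only to make the example self-contained.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated 2-D blobs (NOT the bank's customers).
rng = np.random.default_rng(42)
centers = np.array([[0, 0], [6, 6], [0, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

inertias, sil = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                    # within-cluster sum of squares (elbow curve)
    sil[k] = silhouette_score(X, km.labels_)     # mean silhouette coefficient

best_k = max(sil, key=sil.get)
print("inertia by K:", {k: round(v, 1) for k, v in inertias.items()})
print("silhouette by K:", {k: round(v, 3) for k, v in sil.items()})
print("K with the highest silhouette score:", best_k)
```

Plotting `inertias` against K gives the elbow curve; the silhouette score is undefined for K=1, which is why the loop starts at 2.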


Finding the optimal number of clusters with silhouette coefficients
When K=3
Cluster 2 appears too thin in comparison to the others.

Cluster 1 appears too large in comparison to the others.

When K=2

FINAL MODEL
We take 3 as the appropriate number of clusters, since the silhouette score is high enough
and there is a kink at 3 in the elbow curve.

CLUSTER PROFILING
Insights K-means
• Cluster 0:
o Avg_Credit_Limit: Mid-range.
o Total_Credit_Cards: Mid-range.
o Total_visits_bank: Visits the bank the most.
o Total_visits_online: Rarely uses online banking.
o Total_calls_made: Calls less than expected.

• Cluster 1:
o Avg_Credit_Limit: Lowest.
o Total_Credit_Cards: Lowest.
o Total_visits_bank: Rarely visits the bank.
o Total_visits_online: Average use of online banking.
o Total_calls_made: Highest.

• Cluster 2:
o Avg_Credit_Limit: Highest.
o Total_Credit_Cards: Highest.
o Total_visits_bank: Lowest.
o Total_visits_online: Highest.
o Total_calls_made: Lowest.
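A profile of this kind is typically built by attaching the cluster label to each customer and averaging every feature per cluster. The sketch below assumes scikit-learn and pandas; the nine rows are hypothetical, so the cluster numbering and values will not match the report.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customers (NOT the real data): three visibly different groups.
df = pd.DataFrame({
    "Avg_Credit_Limit":    [9000, 12000, 10000, 30000, 35000, 33000, 140000, 150000, 145000],
    "Total_visits_bank":   [1, 0, 1, 4, 5, 3, 0, 1, 0],
    "Total_visits_online": [3, 4, 3, 1, 0, 1, 10, 12, 11],
    "Total_calls_made":    [8, 9, 7, 2, 1, 2, 1, 0, 1],
})

X = StandardScaler().fit_transform(df)   # cluster on scaled features
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Profile on the original (unscaled) values so the means are interpretable.
profile = df.groupby("cluster").mean()
profile["count"] = df["cluster"].value_counts().sort_index()
print(profile)
```

K-means assigns cluster numbers arbitrarily, so clusters should be identified by their profile (for example, "highest credit limit, mostly online") rather than by their label.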

HIERARCHICAL CLUSTERING
Apply hierarchical clustering with different linkage methods and plot a
dendrogram for each linkage method.
The dendrograms with weighted, centroid and average linkage show distinct,
well-separated clusters. These linkages also have the highest cophenetic
correlation scores, which indicates that the dendrogram preserves the
original pairwise distances well.
Calculating cophenetic correlation
The highest cophenetic correlation is 0.8927, which is obtained with the
Euclidean distance metric and the average linkage method.
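The comparison of linkage methods by cophenetic correlation can be sketched with SciPy as below. SciPy is an assumption (the report does not name its library), the data are hypothetical, and only the Euclidean metric is shown because centroid and Ward linkage require it.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Hypothetical scaled data: three blobs in 3-D (NOT the bank's customers).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 3)) for c in (0, 5, 10)])

dists = pdist(X, metric="euclidean")   # condensed matrix of original pairwise distances

scores = {}
for method in ["single", "complete", "average", "weighted", "centroid", "ward"]:
    Z = linkage(X, method=method, metric="euclidean")
    c, _ = cophenet(Z, dists)          # correlation between dendrogram and original distances
    scores[method] = c

best = max(scores, key=scores.get)
for method, c in scores.items():
    print(f"{method:>8}: {c:.4f}")
print("highest cophenetic correlation:", best)
```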
CLUSTER PROFILING
We create 3 clusters.

For the hierarchical approach, 2 clusters appears to be a better choice, since
cluster 1 contains only 6 observations.
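Cutting the dendrogram into a fixed number of clusters and checking their sizes is how a very small cluster like this is usually spotted. The sketch assumes SciPy with average linkage; the synthetic data deliberately include a 6-point group and are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical data: two large groups and one deliberately tiny group of 6 points.
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(loc=0, scale=0.5, size=(40, 2)),
    rng.normal(loc=5, scale=0.5, size=(40, 2)),
    rng.normal(loc=20, scale=0.5, size=(6, 2)),
])

Z = linkage(X, method="average", metric="euclidean")
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into at most 3 clusters

sizes = np.bincount(labels)[1:]                   # fcluster labels start at 1
print("cluster sizes:", sizes)
```

A cluster with only a handful of members may be a genuine niche segment or simply a set of extreme points, so its profile should be inspected before deciding whether to merge it.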

Insights hierarchical clustering

• Cluster 0:
o Avg_Credit_Limit: Lowest.
o Total_Credit_Cards: Lowest.
o Total_visits_bank: Lowest.
o Total_visits_online: Mid-range.
o Total_calls_made: Mid-range.

• Cluster 1:
o Avg_Credit_Limit: Highest.
o Total_Credit_Cards: Mid-range.
o Total_visits_bank: Lowest.
o Total_visits_online: Highest.
o Total_calls_made: Highest.

• Cluster 2:
o Avg_Credit_Limit: Mid-range.
o Total_Credit_Cards: Highest.
o Total_visits_bank: Mid-range.
o Total_visits_online: Lowest.
o Total_calls_made: Lowest.

Comparing K-means clusters and hierarchical clusters

Conclusions K-means
• Cluster 0: Clients with a mid-range credit limit who are the most willing to
visit the bank.
• Cluster 1: Clients with the lowest credit limit, who make the most calls to the bank.
• Cluster 2: Clients with the highest credit limit, who are more willing to
use the online banking system.
Conclusions Hierarchical clusters
• Cluster 0: Clients with the lowest credit limit.
• Cluster 1: Clients with the highest credit limit, who
demand online and phone contact.
• Cluster 2: Clients with a mid-range credit limit, who
neither visit the bank often nor use online banking.
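Because cluster numbers are assigned arbitrarily by each algorithm, "Cluster 1" in K-means need not correspond to "Cluster 1" in the hierarchical result. One way to line the two solutions up is a cross-tabulation of the labels, optionally summarized by the adjusted Rand index. The sketch assumes pandas and scikit-learn, and the label vectors are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical labels for ten customers from the two methods (NOT the real results).
kmeans_labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
hc_labels     = np.array([2, 2, 2, 0, 0, 0, 1, 1, 1, 0])

# Rows: K-means cluster, columns: hierarchical cluster, cells: customers in both.
table = pd.crosstab(kmeans_labels, hc_labels,
                    rownames=["kmeans"], colnames=["hierarchical"])

# Agreement score that ignores how the clusters are numbered (1.0 = identical partitions).
ari = adjusted_rand_score(kmeans_labels, hc_labels)

print(table)
print(f"adjusted Rand index: {ari:.3f}")
```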

Recommendations
Targeting customers with a higher credit limit appears to be the best strategy based on the
analysis of the clusters.
• K-means -> Cluster 2: Explore online marketing campaigns for this type of client. They have high
financial potential in comparison with the other clusters and prefer to access the
bank online.
• Hierarchical -> Cluster 1: Explore online marketing campaigns for this type of client,
and also develop a better call center approach. This type of client is willing to
access the bank online but needs a better call center service.

Dimensionality Reduction using PCA for visualization

To retain 90% of the variance, the number of components should be close to 3.5,
i.e. between 3 and 4.

The Ward linkage method shows 4 as the apt number of clusters after PCA.
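Reading the number of components off the cumulative explained-variance curve can be done programmatically. Since a component count must be a whole number, the sketch below picks the smallest count whose cumulative variance reaches 90%. It assumes scikit-learn, and the five correlated features are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: five features, two pairs of which are correlated.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
X[:, 1] += 0.8 * X[:, 0]
X[:, 3] -= 0.6 * X[:, 2]

X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)                              # keep all components

cum_var = np.cumsum(pca.explained_variance_ratio_)    # cumulative share of variance
# Index of the first component at which the cumulative share reaches 0.90, plus one.
n_components = int(np.searchsorted(cum_var, 0.90) + 1)

print("cumulative explained variance:", cum_var.round(3))
print("components needed for 90% variance:", n_components)
```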
CLUSTER PROFILING

Insights

• Cluster 0
o Avg_Credit_Limit: second lowest, with a higher variance.
o Total_Credit_Cards: second highest.
o Total_visits_bank: the highest.
o Total_visits_online: the lowest.
o Total_calls_made: average of 2.
o Clients visit in person.
• Cluster 1
o Avg_Credit_Limit: the lowest, with a smaller variance.
o Total_Credit_Cards: the lowest.
o Total_visits_bank: second lowest.
o Total_visits_online: second highest.
o Total_calls_made: the highest.
o Clients would rather call.
• Cluster 2
o Avg_Credit_Limit: the highest, with the smallest variance.
o Total_Credit_Cards: the highest.
o Total_visits_bank: the lowest.
o Total_visits_online: the highest.
o Total_calls_made: the lowest.
o Clients prefer to visit online.
• Cluster 3
o Avg_Credit_Limit: second highest, with a bigger variance.
o Total_Credit_Cards: second highest.
o Total_visits_bank: second lowest.
o Total_visits_online: the lowest.
o Total_calls_made: second lowest.
o Clients visit in person.

BUSINESS RECOMMENDATIONS

Cluster 0
• This type of customer has a good Avg_Credit_Limit and likes to visit the bank in
person. It is important to identify visiting patterns and improve their experience.
Cluster 1
• This type of customer has a low Avg_Credit_Limit and likes to call the bank. It is
important to decide whether this is the type of customer the bank wants to
invest in, mainly because developing a better call center experience can be
expensive and customers in this cluster prefer to contact the bank by phone.
Cluster 2
• This type of customer has a good Avg_Credit_Limit and likes to use online banking. It
is important to identify patterns of online visits and improve their experience by
tracking their online activity and showing them new products and services.
Cluster 3
• This type of customer has a good Avg_Credit_Limit and likes to visit the bank in
person. It is important to identify visiting patterns and improve their experience.
