unsupervised learning
PGPDSBA.O. AUG24.A
DHIVIYA MURALIDHARAN
Problem Statement
AllLife Bank wants to focus on its credit card customer base in the next financial year. Its marketing
research team has advised that market penetration can be improved.
Based on this input, the Marketing team proposes to run personalized campaigns to target new
customers as well as upsell to existing customers. Another insight from the market research was that
customers perceive the bank's support services poorly. Based on this, the Operations team
wants to upgrade the service delivery model to ensure that customer queries are resolved faster.
The Head of Marketing and the Head of Delivery both decide to reach out to the Data Science team for
help.
Objective
To identify different segments in the existing customers, based on their spending patterns as well as
past interaction with the bank, using clustering algorithms, and provide recommendations to the
bank on how to better market to and service these customers.
Data Description
The data provided is of various customers of a bank and their financial attributes like credit limit, the
total number of credit cards the customer has, and different channels through which customers have
contacted the bank for any queries (including visiting the bank, online, and through a call center).
Data Dictionary
Average Credit Limit: Average credit limit of each customer for all credit cards
Total credit cards: Total number of credit cards possessed by the customer
Total visits bank: Total number of visits that the customer made (yearly) personally to the
bank
Total visits online: Total number of visits or online logins made by the customer (yearly)
Total calls made: Total number of calls made by the customer to the bank or its customer
service department (yearly)
DATA OVERVIEW
STATISTICAL SUMMARY
OBSERVATIONS
Mean:
Avg_Credit_Limit: 34574.24
Total_Credit_Cards: 4.70
Total_visits_bank: 2.40
Total_visits_online: 2.60
Total_calls_made: 3.58
Normal distribution: No
Outliers: Right
Skew: Right
Normal distribution: No
Outliers: No
Skew: Right
Normal distribution: No
Outliers: Right
Skew: Right
Normal distribution: No
Outliers: No
Skew: Left
Normal distribution: No
Outliers: No
BIVARIATE ANALYSIS
OBSERVATIONS
UNIVARIATE (NUMERICAL AND DISCRETE)
BIVARIATE ANALYSIS
Data pre-processing
Prepare the data for analysis
Feature Engineering
Outlier Treatment
Outliers were treated with flooring and capping. There are no missing or duplicate values. The data
was also scaled to prepare it for clustering.
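The pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for the report: the data is synthetic, the column names are taken from the mean summary, and the 1.5 x IQR bounds and the use of StandardScaler are assumptions, since the report does not state which flooring/capping rule or scaler was applied.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the bank data (illustrative values, not the real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Avg_Credit_Limit": rng.lognormal(mean=10, sigma=0.8, size=200),
    "Total_Credit_Cards": rng.integers(1, 11, size=200),
    "Total_visits_bank": rng.integers(0, 6, size=200),
    "Total_visits_online": rng.poisson(2.5, size=200),
    "Total_calls_made": rng.integers(0, 11, size=200),
}).astype(float)

# Sanity checks for missing and duplicate rows.
print("missing values:", int(df.isnull().sum().sum()))
print("duplicate rows:", int(df.duplicated().sum()))

def floor_and_cap(frame: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Clip every column to [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    return frame.clip(lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1)

treated = floor_and_cap(df)

# Standardize so that every feature contributes on a comparable scale.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(treated), columns=treated.columns
)
print(scaled.describe().round(2))
```

Scaling matters here because Avg_Credit_Limit is in the tens of thousands while the other features are single digits; without it, distance-based clustering would be driven almost entirely by the credit limit.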
Applying K-means Clustering with the Elbow Curve
Either 2 or 3 clusters seems appropriate.
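One common way to build an elbow curve is sketched below. It is only an illustration on synthetic blobs: it uses the within-cluster sum of squares (scikit-learn's inertia_) as the y-axis quantity, which is an assumption, since the report does not say whether inertia or average distortion was plotted.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic 5-feature data standing in for the scaled customer table.
X_raw, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=1)
X = StandardScaler().fit_transform(X_raw)

# Fit K-means for a range of k and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    inertias[k] = model.inertia_

# The "elbow" is the k after which the drop in inertia flattens out.
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
```

Plotting inertias against k (for example with matplotlib) gives the curve itself; the printed values are enough to see where the improvement levels off.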
Silhouette scores
When K=2
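Silhouette scores for several candidate values of k could be computed along these lines. The data is synthetic and the range of k is an assumption, so the printed scores are not the ones from the report.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic 5-feature data standing in for the scaled customer table.
X_raw, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=1)
X = StandardScaler().fit_transform(X_raw)

# The silhouette score needs at least 2 clusters, so k starts at 2.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
```

Scores closer to 1 indicate compact, well-separated clusters; values near 0 indicate overlapping clusters.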
FINAL MODEL
We take 3 as the appropriate number of clusters, since the silhouette score is high enough
and there is a kink at 3 in the elbow curve.
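Fitting the final model with 3 clusters and summarizing each segment might look like the sketch below. The data is synthetic, and the column name KM_segment and the random_state are hypothetical choices made for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the bank data (illustrative values, not the real dataset).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Avg_Credit_Limit": rng.lognormal(mean=10, sigma=0.8, size=200),
    "Total_Credit_Cards": rng.integers(1, 11, size=200),
    "Total_visits_bank": rng.integers(0, 6, size=200),
    "Total_visits_online": rng.poisson(2.5, size=200),
    "Total_calls_made": rng.integers(0, 11, size=200),
}).astype(float)
features = list(df.columns)

scaled = StandardScaler().fit_transform(df[features])

# Final model: K-means with 3 clusters, fitted on the scaled features.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1)
df["KM_segment"] = kmeans.fit_predict(scaled)

# Profile each segment on the original (unscaled) features.
profile = df.groupby("KM_segment")[features].mean()
profile["count_in_segment"] = df.groupby("KM_segment").size()
print(profile.round(2))
```

Profiling on the unscaled columns keeps the per-segment means in their original units, which makes them easier to read as credit limits, card counts and visit counts.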
CLUSTER PROFILING
Insights K-means
Cluster 0 :
Cluster 1 :
HIERARCHICAL CLUSTERING
Apply hierarchical clustering with different linkage methods and plot
a dendrogram for each linkage method.
Dendrograms with weighted, centroid and average linkage show the most distinct
and well-separated clusters. This agrees with their high cophenetic correlation
scores, which indicate that the dendrogram preserves the pairwise distances between customers well.
Calculating cophenetic correlation
The highest cophenetic correlation is 0.8926672966587861, which is obtained with
the Euclidean distance metric and the average linkage method.
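The comparison of linkage methods by cophenetic correlation can be sketched with SciPy as below. This runs on synthetic data and only uses the Euclidean metric, so the printed values will differ from the 0.89 reported above; the list of linkage methods is an assumption.

```python
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic 5-feature data standing in for the scaled customer table.
X_raw, _ = make_blobs(n_samples=150, centers=3, n_features=5, random_state=3)
X = StandardScaler().fit_transform(X_raw)

# Condensed matrix of original pairwise Euclidean distances.
distances = pdist(X, metric="euclidean")

results = {}
for method in ["single", "complete", "average", "weighted", "centroid", "ward"]:
    Z = linkage(X, method=method, metric="euclidean")
    # cophenet returns the correlation and the cophenetic distances.
    coph_corr, _ = cophenet(Z, distances)
    results[method] = coph_corr
    print(f"{method:>8}: cophenetic correlation = {coph_corr:.4f}")

best_method = max(results, key=results.get)
print("best linkage:", best_method)
```

The linkage matrix Z from any of these fits can be passed to scipy.cluster.hierarchy.dendrogram to draw the corresponding dendrogram.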
CLUSTER PROFILING
We are creating 3 different clusters
Cluster 1 :
o Avg_Credit_Limit: The highest end type of client.
Cluster 2 :
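Creating the 3 hierarchical clusters and profiling them could be done roughly as follows. This sketch assumes scikit-learn's AgglomerativeClustering with average linkage and its default Euclidean distance, matching the best cophenetic result; the data is synthetic and the column name HC_segment is hypothetical.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

cols = [
    "Avg_Credit_Limit", "Total_Credit_Cards", "Total_visits_bank",
    "Total_visits_online", "Total_calls_made",
]

# Synthetic data standing in for the customer table (not the real dataset).
X_raw, _ = make_blobs(n_samples=150, centers=3, n_features=5, random_state=7)
df = pd.DataFrame(X_raw, columns=cols)
scaled = StandardScaler().fit_transform(df[cols])

# Agglomerative clustering with average linkage, cut at 3 clusters.
hc = AgglomerativeClustering(n_clusters=3, linkage="average")
df["HC_segment"] = hc.fit_predict(scaled)

# Per-cluster means and sizes for profiling.
profile = df.groupby("HC_segment")[cols].mean()
sizes = df["HC_segment"].value_counts().sort_index()
print(profile.round(2))
print(sizes)
```

Cluster numbers are arbitrary labels, so a cluster numbered 1 here need not correspond to cluster 1 from K-means; segments should be matched by their profiles, not by their numbers.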
Conclusions K-means
Cluster 0 : Seems to be the type of client with the lowest credit limit, more willing to
visit the bank.
Cluster 1 : Mid-range type of client, a mix between cluster 2 and cluster 0.
Cluster 2 : Seems to be the type of client with the highest credit limit, more willing to
use the online banking system.
Conclusions Hierarchical clusters
Cluster 0 : Seems to be the type of client with the lowest credit limit.
Cluster 1 : Seems to be the type of client with the highest credit limit. A client that
demands online and mobile contact.
Cluster 2 : Seems to be the type of client with a mid-range credit limit, a type of
client that neither visits the bank nor uses online banking.
Recommendations
Targeting customers with a higher credit limit seems to be the best strategy based on the
analysis of the clusters.
K-means -> Cluster 2 : Explore online marketing campaigns for this type of client. They have high
financial potential in comparison with the other clusters and prefer to access the
bank online.
Hierarchical -> Cluster 1 : Explore online marketing campaigns for this type of client,
and also develop a better approach in the call center. This type of client is willing to
access the bank online but needs better call center service.
Insights
Cluster 0
o Second-lowest Avg_Credit_Limit, with a higher variance.
o Second-highest Total_Credit_Cards.
o Highest Total_visits_bank.
o Lowest Total_visits_online.
o Total_calls_made averages 2.
o Clients visit in person.
Cluster 1
o Lowest Avg_Credit_Limit, with a smaller variance.
o Lowest Total_Credit_Cards.
o Second-lowest Total_visits_bank.
o Second-highest Total_visits_online.
o Highest Total_calls_made: these clients make the most phone calls.
o Clients would rather call.
Cluster 2
o Highest Avg_Credit_Limit, with the smallest variance.
o Highest Total_Credit_Cards.
o Lowest Total_visits_bank.
o Highest Total_visits_online.
o Lowest Total_calls_made.
o Clients prefer to visit online.
Cluster 3
o Second-highest Avg_Credit_Limit, with a larger variance.
o Second-highest Total_Credit_Cards.
o Second-lowest Total_visits_bank.
o Lowest Total_visits_online.
o Second-lowest Total_calls_made.
o Clients visit in person.
BUSINESS RECOMMENDATIONS
Cluster 0
This type of customer has a good Avg_Credit_Limit and likes to visit the bank in
person. It is important to identify their visiting patterns and improve their experience.
Cluster 1
This type of customer has a low Avg_Credit_Limit and likes to call the bank. It is
important to identify whether they are the type of customer the bank wants to
invest in, mainly because developing a better call center experience can be
expensive and customers in this cluster prefer contacting the bank by phone.
Cluster 2
This type of customer has a good Avg_Credit_Limit and likes to use online banking. It
is important to identify their patterns of online visits and improve their experience by
tracking their online activity and showing them new products and services.
Cluster 3
This type of customer has a good Avg_Credit_Limit and likes to visit the bank in
person. It is important to identify their visiting patterns and improve their experience.