Low Code AIML USL Project CreditCardCustomerSegmentation Vijay Borade Aug23

The document discusses segmenting customers of AllLife Bank using machine learning clustering algorithms. It explores preprocessing a customer dataset, finding the optimal number of clusters, and visualizing the cluster results. Both k-means and hierarchical clustering are implemented in Python to segment customers based on features like credit limits, number of credit cards, bank visits, and calls. The goal is to help AllLife Bank improve its credit card business and customer support services through personalized campaigns targeted at different customer segments.

Uploaded by

borade.vijay
AllLife Bank Customer Segmentation

BUSINESS REPORT

Vijay Arjun Borade, Batch: Aug 23

Contents

Context
What is customer segmentation
Machine Learning for Customer Segmentation
Advantages of Customer Segmentation
Exploring Customer dataset and its features
Pre-processing Dataset
Implementing K-means Clustering in Python
Pre-processing dataset
Loading Dataset
Creating copy of data
Checking missing values
Checking duplicate values
Exploratory Data Analysis
Univariate Analysis
Bivariate Analysis
Data Processing
Finding the optimal number of clusters
Optimal Value of K = 5
Cluster Visualization
Checking the Silhouette Scores
Visualizing the silhouette scores for different numbers of clusters
Hierarchical Clustering
Checking Dendrograms
Cluster Profiling: Hierarchical Clustering
K-means vs Hierarchical Clustering
Understanding customer distribution among the clusters
Visualizing Customer Segments
Creating Copy of the Original Data
Actionable Insights and Recommendations

Context

Executive Summary
AllLife Bank wants to focus on its credit card customer base in the next financial year.
The marketing research team has advised that market penetration can be improved.
Based on this input, the Marketing team proposes to run personalized campaigns to
target new customers as well as upsell to existing customers. Another insight from the
market research was that customers perceive the bank's support services poorly.
Based on this, the Operations team wants to upgrade the service delivery model to
ensure that customer queries are resolved faster. The Head of Marketing and the Head
of Delivery both decide to reach out to the Data Science team for help.

What is customer segmentation

Customer segmentation simply means grouping your customers according to various
characteristics (for example, grouping customers by age).

The opportunities to segment are endless and depend mainly on how much customer
data you have at your disposal. Starting from basic criteria like gender, hobby, or age,
it goes all the way to things like “time spent on website X” or “time since user opened
our app”.

It’s a way for organizations to understand their customers. Knowing the differences
between customer groups makes it easier to make strategic decisions regarding
product growth and marketing.

There are different methodologies for customer segmentation, and they depend on
four types of parameters:

• geographic,
• demographic,
• behavioral,
• psychological.

Machine Learning for customer segmentation


Machine learning methodologies are a great tool for analysing customer data and finding
insights and patterns. Artificially intelligent models are powerful tools for decision-makers.
They can precisely identify customer segments, which is much harder to do manually or with
conventional analytical methods.

There are many machine learning algorithms, each suitable for a specific type of problem. One
very common machine learning algorithm that’s suitable for customer segmentation
problems is the k-means clustering algorithm. There are other clustering algorithms as well,
such as DBSCAN, Agglomerative Clustering, and BIRCH.

Why would you implement machine learning for customer segmentation?



Advantages of customer segmentation

Implementing customer segmentation leads to plenty of new business opportunities.
You can do a lot of optimization in:

• budgeting,
• product design,
• promotion,
• marketing,
• customer satisfaction.

Let’s discuss these benefits in more depth.

Exploring Customer dataset and its features

Let’s analyze a customer dataset. Our dataset has 24,000 data points and the six
columns listed below (one identifier and five features):

• Customer Key – The ID of a customer for a particular business.
• Average Credit Limit – The average credit limit of the customer.
• Total Number of Credit Cards – The number of credit cards held by the customer.
• Total Visit Bank – The number of visits the customer made to the bank.
• Total Visit Online – The number of visits the customer made to the bank's website.
• Total call made – The number of calls the customer made to the bank.

Pre-Processing Dataset
Before feeding the data to the k-means clustering algorithm, we need to pre-process
the dataset. Let’s implement the necessary pre-processing for the customer dataset.

Moving on, we’ll implement our k-means clustering algorithm in Python.



Implementing K-means clustering in Python

K-Means clustering is an efficient machine learning algorithm to solve data clustering
problems. It’s an unsupervised algorithm that’s quite suitable for solving customer
segmentation problems. Before we move on, let’s quickly explore two key concepts.

Unsupervised Learning

Unsupervised machine learning is quite different from supervised machine learning.
It’s a kind of machine learning that discovers patterns in unlabelled data.

Unsupervised machine learning algorithms can group data points based on similar
attributes in the dataset. One of the main types of unsupervised models is clustering
models.

Note that supervised learning, by contrast, learns from labelled examples (previous
experience) to produce an output.

Clustering algorithms

A clustering machine learning algorithm is an unsupervised machine learning
algorithm. It’s used for discovering natural groupings or patterns in the dataset. It’s
worth noting that clustering algorithms just interpret the input data and find natural
clusters in it.

Some of the most popular clustering algorithms are:

• K-Means Clustering
• Agglomerative Hierarchical Clustering
• Expectation-Maximization (EM) Clustering
• Density-Based Spatial Clustering (DBSCAN)
• Mean-Shift Clustering

In the following section, we’re going to analyze the customer segmentation problem
using the k-means clustering algorithm and machine learning. However, before that,
let’s quickly discuss why we’re using the k-means clustering algorithm.

Why use K-means clustering for customer segmentation?

Unlike supervised learning algorithms, K-means clustering is an unsupervised machine
learning algorithm. This algorithm is used when we have unlabelled data. Unlabelled
data means input data without categories or groups provided. Our customer
segmentation data is of this kind.

The algorithm discovers groups (clusters) in the data, where the number of clusters is
represented by the K value. The algorithm acts iteratively to assign each input data to
one of the K clusters, as per the features provided. All of this makes k-means quite
suitable for the customer segmentation problem.

Given a set of data points, the algorithm groups them by feature similarity. The output
of the K-means clustering algorithm is:

• The centroid values for the K clusters,
• Labels for each input data point.

At the end of the implementation, we get a set of clusters along with the cluster each
customer belongs to.
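
As a minimal sketch of these two outputs, the snippet below runs scikit-learn's KMeans on a tiny synthetic array. The toy points, the choice of K = 2, and the variable names are assumptions made for illustration; they are not the report's dataset or its final model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (assumed): two well-separated groups of 2-D points.
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])

# K is chosen by the analyst; n_init and random_state make the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

centroids = kmeans.cluster_centers_  # one row per cluster, one column per feature
labels = kmeans.labels_              # cluster index assigned to each input row

print(centroids)
print(labels)
```

The cluster numbers themselves are arbitrary: what matters is which rows share a label, not whether a group is called 0 or 1.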

K Means Clustering

Checking Elbow plot



Pre-processing dataset
First, we need to import the required Python libraries.

We’ve imported the pandas, NumPy, sklearn, plotly and matplotlib libraries. Pandas
and NumPy are used for data wrangling and manipulation, sklearn is used for
modelling, and plotly along with matplotlib will be used to plot graphs and images.

After importing the libraries, our next step is to load the data into a pandas DataFrame.
For this, we’re going to use a pandas read function.
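
A minimal sketch of the loading step is shown here. The report does not state the file name or format, so the example reads a small in-memory CSV instead; the rows are made up, and every column name other than Customer_Key is an assumption.

```python
import io
import pandas as pd

# Stand-in for the real file (assumed contents). With a file on disk,
# the path would be passed to pd.read_csv instead of a StringIO object.
csv_text = """Customer_Key,Avg_Credit_Limit,Total_Credit_Cards,Total_visits_bank,Total_visits_online,Total_calls_made
1001,100000,2,1,1,0
1002,50000,3,0,10,9
1003,30000,7,1,3,4
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (rows, columns)
```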

Overview of Dataset
The initial steps to get an overview of any dataset are to:

• observe the first few rows of the dataset, to check whether the dataset has been
loaded properly or not
• get information about the number of rows and columns in the dataset
• find out the data types of the columns to ensure that data is stored in the preferred
format and the value of each property is as expected.
• check the statistical summary of the dataset to get an overview of the numerical
columns of the data
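
The four checks above map onto standard pandas calls. The sketch below applies them to a small made-up frame; the values and column names are assumptions, not the bank's data.

```python
import pandas as pd

# Small made-up frame standing in for the customer data (column names are assumptions).
df = pd.DataFrame({
    "Customer_Key": [1001, 1002, 1003, 1004],
    "Avg_Credit_Limit": [100000, 50000, 30000, 20000],
    "Total_Credit_Cards": [2, 3, 7, 5],
})

print(df.head())      # first few rows: was the data loaded properly?
print(df.shape)       # (number of rows, number of columns)
print(df.dtypes)      # data type of each column
print(df.describe())  # statistical summary of the numerical columns
```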

Checking the shape of the dataset

Displaying the first few rows of the dataset

Creating a copy of the data

Checking the data type of the columns for the dataset



Checking the missing values



Checking for the duplicate values

Let's look at the duplicate values in the Customer_Key column closely.
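
A minimal sketch of the missing-value and duplicate checks follows, using a made-up frame in which one Customer_Key appears twice. The values and the Avg_Credit_Limit column name are assumptions.

```python
import pandas as pd

# Made-up rows: customer 1002 appears twice with different credit limits.
df = pd.DataFrame({
    "Customer_Key": [1001, 1002, 1002, 1003],
    "Avg_Credit_Limit": [100000, 50000, 52000, 30000],
})

missing_per_column = df.isnull().sum()       # count of missing values in each column
full_row_duplicates = df.duplicated().sum()  # rows identical in every column

# keep=False flags every row whose key occurs more than once, not only the repeats.
dup_keys = df[df["Customer_Key"].duplicated(keep=False)]
print(dup_keys)
```

Here no row is a full duplicate, yet the key is repeated, which is why the key column is worth checking on its own.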

Statistical summary of the dataset



Exploratory Data Analysis

The functions below need to be defined to carry out the Exploratory Data Analysis.

Univariate analysis

Data Processing

Outlier Detection

Scaling
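
The report does not say which outlier rule or scaler was used. As one common option, the sketch below flags outliers with the 1.5 × IQR rule and then applies z-score scaling. The data and column names are made up.

```python
import pandas as pd

# Made-up values; the column names are assumptions.
df = pd.DataFrame({
    "Avg_Credit_Limit": [10000, 12000, 15000, 18000, 20000, 200000],
    "Total_Credit_Cards": [2, 3, 4, 5, 6, 7],
})

# IQR rule: flag values more than 1.5 * IQR outside the quartiles.
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1
is_outlier = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)
print(is_outlier.sum())  # number of flagged values per column

# Z-score scaling: subtract the column mean, divide by the column standard deviation.
# ddof=0 is the population standard deviation, which scikit-learn's StandardScaler also uses.
scaled = (df - df.mean()) / df.std(ddof=0)
print(scaled.round(2))
```

Scaling matters for distance-based clustering because a column measured in tens of thousands would otherwise dominate a column measured in single digits.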

Finding the optimal number of clusters
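
One common way to look for the optimal K is the elbow method: fit K-means for a range of K values and record the within-cluster sum of squares (inertia). The sketch below does this on synthetic blobs. The data, the range of K, and the seeds are assumptions, so the bend it shows is unrelated to the K = 5 found in this report.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumed): three Gaussian blobs in two dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit K-means for each candidate K and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting these values against K gives the elbow plot; the K after which the curve flattens is the usual candidate.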



Optimal Value of K = 5

Cluster Visualization

Let’s check the silhouette scores

Let's visualize the silhouette scores for different numbers of clusters
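
Before plotting, the scores themselves can be computed with scikit-learn's silhouette_score. The sketch below does so on synthetic blobs; the data, the range of K, and the seeds are assumptions and do not reproduce the report's figures.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data (assumed): three Gaussian blobs in two dimensions.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# The silhouette score needs at least two clusters, so K starts at 2.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

Scores lie between -1 and 1; values closer to 1 indicate tighter, better separated clusters.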



Creating final Model



Hierarchical Clustering

Computing Cophenetic Correlation



Checking Dendrograms
We see that the cophenetic correlation is maximum with Euclidean distance and average
linkage.
Let's view the dendrograms for the different linkage methods.
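
A minimal sketch of how the cophenetic correlation can be compared across linkage methods with SciPy is given here. Only Euclidean distance is shown, and the synthetic data and the list of methods are assumptions, so the winning method may differ from the one reported above.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Synthetic data (assumed): two Gaussian blobs in two dimensions.
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 2)),
])

# Pairwise Euclidean distances in condensed form, as expected by cophenet.
distances = pdist(X, metric="euclidean")

results = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method, metric="euclidean")
    # Given the original distances, cophenet returns (correlation, cophenetic distances).
    corr, _ = cophenet(Z, distances)
    results[method] = corr

best_method = max(results, key=results.get)
print(results)
print(best_method)
```

A correlation close to 1 means the dendrogram preserves the original pairwise distances well.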

Cluster Profiling and Comparison

Cluster Profiling: K-means Clustering



Cluster Profiling: Hierarchical Clustering

1. Dendrogram Visualization:
Plot the Dendrogram: Visualize the hierarchical structure using a dendrogram to
understand how clusters merge at different linkage distances.
2. Cutting the Dendrogram:
Select the Number of Clusters: Decide on the number of clusters by cutting the
dendrogram at an appropriate height or distance.
3. Assigning Cluster Labels:
Cut and Assign Labels: Use the chosen number of clusters to cut the dendrogram and
assign cluster labels to your data points.
4. Cluster Profiling:
Compute Cluster Profiles: Calculate statistics (such as means, medians, standard
deviations) for each feature within each identified cluster.
Visualize Profiles: Create visualizations (like box plots, bar plots) to compare attributes
across different clusters.
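
The steps above can be sketched with SciPy and pandas as follows. The data, the column names, the average linkage, and the choice of two clusters are assumptions for illustration; plotting the dendrogram and the box plots is left out to keep the example short.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Made-up features on comparable scales; the column names are assumptions.
df = pd.DataFrame({
    "Total_Credit_Cards": [2, 3, 2, 8, 9, 8],
    "Total_visits_online": [10, 12, 11, 1, 0, 2],
})

# Steps 1-2: build the merge tree, then decide how many clusters to cut it into.
Z = linkage(df.to_numpy(dtype=float), method="average", metric="euclidean")
n_clusters = 2  # assumed; in practice read from the dendrogram

# Step 3: fcluster with criterion="maxclust" returns labels 1..n_clusters.
df["HC_Cluster"] = fcluster(Z, t=n_clusters, criterion="maxclust")

# Step 4: per-cluster profile (mean of each feature, plus cluster size).
profile = df.groupby("HC_Cluster").mean()
profile["count"] = df.groupby("HC_Cluster").size()
print(profile)
```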

K-means vs Hierarchical Clustering


You can compare several things, such as:

• Which clustering technique took less time for execution?
• Which clustering technique gave you more distinct clusters, or are they the same?
• How many observations are there in the similar clusters of both algorithms?
• How many clusters are obtained as the appropriate number of clusters from both
algorithms?

You can also mention any differences or similarities you obtained in the cluster profiles from
both the clustering techniques.
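
Two of these questions, execution time and how many observations fall in matching clusters, can be checked with a timer and a cross-tabulation of the two label sets. The sketch below is one way to do it; the synthetic data, the two-cluster setting, and the average linkage are assumptions.

```python
import time
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

# Synthetic data (assumed): two well-separated Gaussian blobs.
rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(25, 2)),
    rng.normal(loc=6.0, scale=0.5, size=(25, 2)),
])

start = time.perf_counter()
km_labels = KMeans(n_clusters=2, n_init=10, random_state=3).fit_predict(X)
km_seconds = time.perf_counter() - start

start = time.perf_counter()
hc_labels = fcluster(linkage(X, method="average"), t=2, criterion="maxclust")
hc_seconds = time.perf_counter() - start

# Rows: K-means labels; columns: hierarchical labels. Label numbers are arbitrary,
# so look at which cells are populated rather than whether the numbers match.
overlap = pd.crosstab(km_labels, hc_labels, rownames=["kmeans"], colnames=["hierarchical"])
print(overlap)
print(f"K-means: {km_seconds:.4f}s, hierarchical: {hc_seconds:.4f}s")
```

Timings on a dataset this small are noisy, so they only become informative on data of realistic size.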

K-means Clustering

• Algorithm: Iterative algorithm that partitions data into K clusters based on the mean
(centroid) of data points.
• Number of Clusters (K): Requires specifying the number of clusters (K) beforehand.
• Scalability: Scales well for large datasets and is computationally efficient.
• Cluster Shape: Assumes clusters are spherical and evenly sized, which might not fit well
for complex or non-spherical clusters.
• Result Sensitivity: Sensitive to initial centroid selection, often converging to local optima,
which can lead to different results on multiple runs.
• Interpretability: Offers simpler interpretability due to straightforward cluster assignment.

Hierarchical Clustering

• Agglomerative vs Divisive: Builds a hierarchy of clusters either by merging (agglomerative)
until all data points belong to one cluster, or by dividing (divisive) until each data point
forms its own cluster.
• Hierarchy Visualization: Produces dendrograms that display the merging/dividing process
and allow choosing the number of clusters post hoc.
• Number of Clusters: Doesn’t require specifying the number of clusters beforehand but needs
a method to determine the stopping point.
• Cluster Shape: Depending on the linkage method, can handle clusters of various shapes
and sizes.
• Computation: Can be more computationally expensive, especially for large datasets.
• Interpretability: Offers more insights into the hierarchical relationships between clusters
due to its tree-like structure.

Choosing Between Them

• Data Structure: Consider the nature of your data; for example, K-means might perform
better on well-separated spherical clusters, while hierarchical clustering might handle more
complex relationships.
• Number of Clusters: If you have a specific number of clusters in mind, K-means might be
more suitable. Otherwise, hierarchical clustering offers flexibility in choosing the number
post hoc.
• Interpretability vs Performance: If interpretability and visual representation of hierarchy
are crucial, hierarchical clustering might be more beneficial. For computational efficiency
and simplicity in assignment, K-means might be preferred.

In practice, it's often beneficial to try both methods and evaluate them based on clustering
quality, domain knowledge, and the specific objectives of your analysis.

Let’s create some plots on the original data to understand the customer distribution
among the clusters

Visualizing Customer Segments



Creating a copy of original data



Actionable Insights and Recommendations

1. Interpret Cluster Characteristics:
• Feature Analysis: Examine the attributes defining each cluster (e.g., means, standard
deviations) to understand their distinct characteristics.
• Visualizations: Use visualizations (like scatter plots, box plots) to explore differences
among clusters visually.
2. Identify Patterns and Trends:
• Identify Key Features: Determine which attributes contribute most to the differences
between clusters.
• Cluster Comparisons: Analyze how clusters differ or resemble each other, focusing on
their unique and common traits.
3. Segmentation Insights:
• Customer Segmentation: If it's customer data, understand behavior patterns or
preferences within clusters to tailor marketing strategies or services.
• Product Segmentation: For product data, identify distinct product groups for targeted
marketing or product improvement.
4. Predictive Insights:
• Predictive Modeling: Utilize the clusters as features for predictive models to anticipate
future behavior or outcomes for each segment.
5. Actionable Recommendations:
• Customized Strategies: Develop tailored strategies, services, or products for each cluster
based on their unique characteristics.
• Marketing and Campaigns: Design targeted marketing campaigns suited to the
preferences of each cluster.
6. Validation and Feedback Loop:
• Feedback Incorporation: Incorporate feedback from implemented strategies to refine
clusters or improve recommendations.
• Model Iteration: Iterate and refine clustering models based on real-world outcomes and
feedback.

Example Insights and Recommendations (Customer Segmentation):

• Insight: Clusters exhibit distinct spending patterns.
Recommendation: Customize marketing strategies, for example by offering discounts or
loyalty programs tailored to high-spending clusters.
• Insight: Some clusters prefer certain product categories.
Recommendation: Create targeted advertisements or product bundles based on these
preferences.
• Insight: Clusters show different engagement levels.
Recommendation: Adjust communication channels or content to suit each cluster's preferred
engagement method.

Actionable insights and recommendations are derived by combining data-driven findings with
domain expertise and business objectives. It's important to regularly assess and refine
strategies based on the outcomes observed in real-world implementations.
