Mall Customer Segmentation
By Anuj Vyas
1. Introduction
1.1 Problem Statement
Segment mall customers using K-means clustering. Visualize the gender and age distributions, then
analyze the customers' annual incomes and spending scores.
1.2 Introduction to Problem Statement
Finding clusters of potential customers of a mall, and using them to choose appropriate measures to
increase the mall's revenue, is a common application of unsupervised learning.
For example, a group of customers may have a high income but a low spending score (amount spent in
the mall). Once the analysis identifies such customers, the mall can try to convert them into
high-spending customers through strategies such as better advertising, collecting feedback, and
improving the quality of products.
To identify such customers, this project analyzes the data and forms clusters based on different
criteria, which are discussed in the following sections.
2. Dataset
Overview of Dataset
The dataset, Mall_Customers.csv, consists of 5 columns: CustomerID, Gender, Age, Annual Income (k$),
and Spending Score (1-100). Gender is a categorical feature; all other features are numeric.
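The column layout described above can be checked with a short pandas sketch. The helper names `load_customers` and `summarize` are hypothetical and not part of the original project, and the file path is assumed to be the working directory. The summary also reports the shape and the missing-value count, so the dataset facts stated in this section can be confirmed directly.

```python
import pandas as pd

# Column names as listed in the dataset overview.
EXPECTED_COLUMNS = [
    "CustomerID",
    "Gender",
    "Age",
    "Annual Income (k$)",
    "Spending Score (1-100)",
]


def load_customers(path: str = "Mall_Customers.csv") -> pd.DataFrame:
    """Read the CSV and fail early if an expected column is absent."""
    df = pd.read_csv(path)
    absent = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if absent:
        raise ValueError(f"Missing expected columns: {absent}")
    return df


def summarize(df: pd.DataFrame) -> dict:
    """Return the shape, total missing-value count, and non-numeric columns."""
    return {
        "shape": df.shape,
        "missing_values": int(df.isna().sum().sum()),
        "categorical_columns": df.select_dtypes(exclude="number").columns.tolist(),
    }


# Usage (assumes Mall_Customers.csv is in the working directory):
# df = load_customers()
# print(summarize(df))
```

On the dataset described here, the summary should agree with the reported shape of (200, 5), no missing values, and Gender as the only non-numeric column.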
Data: The shape of the dataset is (200, 5), that is, 200 rows and 5 columns. The dataset does not contain
any NULL or NaN values.
Algorithms: The K-means algorithm is used in this project to analyze and form clusters of customers based on
their income and spending score features.
Model: A K-means model is used with tuned hyperparameters: n_clusters=5, chosen with the elbow method to
find the optimal number of clusters, and init='k-means++', to avoid the random initialization trap.
Environment (Libraries and Technologies): Numpy, Pandas, Matplotlib, Seaborn, Jupyter Notebook, Google
Colab.
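The model setup listed above (K-means with init='k-means++' and n_clusters chosen by the elbow method) can be sketched as follows. This is a minimal illustration and not the project's actual notebook code: it assumes scikit-learn's `KMeans`, which the report does not name explicitly, and the helpers `elbow_inertias` and `fit_segments`, the `n_init=10` setting, and the `random_state` value are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X, k_values, random_state=42):
    """Fit K-means for each k and collect the within-cluster sum of squares."""
    inertias = []
    for k in k_values:
        model = KMeans(
            n_clusters=k,
            init="k-means++",  # spread-out initial centroids
            n_init=10,         # assumed value; keeps the best of 10 runs
            random_state=random_state,
        )
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias


def fit_segments(X, n_clusters=5, random_state=42):
    """Fit the final model and return it with one cluster label per row."""
    model = KMeans(
        n_clusters=n_clusters,
        init="k-means++",
        n_init=10,
        random_state=random_state,
    )
    labels = model.fit_predict(X)
    return model, labels


# Usage (assumes df holds Mall_Customers.csv as a pandas DataFrame):
# X = df[["Annual Income (k$)", "Spending Score (1-100)"]].to_numpy()
# inertias = elbow_inertias(X, range(1, 11))  # plot against k, look for the bend
# model, labels = fit_segments(X, n_clusters=5)
```

Plotting the inertias against k and picking the point where the curve stops dropping sharply is the elbow heuristic; the report states that this led to n_clusters=5.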
4. Methodology