WORK BOOK 8 - Segmentation
Techtronics Ltd. is seeking new market opportunities. The company is focusing on the voice recognition
market and has narrowed it down to three segments: the fearful typists, the power users and the professional
specialists. The fearful typists don’t know much about computers; they just want a fast way to create
e-mail messages, letters and reports without errors. The power users know a lot about computers, use them
often and want a voice recognition program with lots of features. The professional specialists have jobs that
require a lot of writing. They don’t know much about computers but are willing to learn.
The marketing manager prepared a table summarizing the importance of each of three key needs in the
three segments.
Spreadsheet 1
Techtronics sales staff conducted interviews with seven potential customers, who were asked to rate
how important each of these three needs was in their work. The manager prepared a spreadsheet to
help him cluster each person into one of the segments. The manager can then aggregate potential
customers into the segment to which they are most similar.
Spreadsheet 2
                        Importance of need:
Customer   Computer     Need 1   Need 2   Need 3
A          Dell         8        1        2
B          IBM          6        6        5
C          Apple        4        9        8
D          Apple        2        6        7
E          IBM          5        6        5
F          Dell         8        3        1
G          Apple        4        6        8
a. The ratings for a potential customer appear on the first spreadsheet. Into which segment would
you aggregate this person?
b. The responses for the seven potential customers who were interviewed are listed in Spreadsheet 2.
Write down the similarity score for each segment. Repeat the process for each customer.
Based on your analysis, indicate the segment into which you would aggregate each customer.
c. In the interview, each potential customer was also asked what type of computer he or she would
be using. The responses are shown in the table along with the ratings. Group the responses
based on the customer’s segment. If you were targeting the fearful typist segment, what
d. Based on your analysis, which customer would you say is least like any of the segments?
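The similarity scoring asked for in questions b and d can be sketched in R. This is only an illustration under stated assumptions: the customer ratings are taken from Spreadsheet 2, but the segment profiles below are HYPOTHETICAL placeholders (the Spreadsheet 1 values are not reproduced here), the names need1 to need3 and segment1 to segment3 are invented labels, and the score used (sum of squared differences, lower means more similar) is one common choice rather than necessarily the workbook's own rule.

```r
# Customer ratings from Spreadsheet 2 (three need-importance ratings each).
customers <- matrix(
  c(8, 1, 2,
    6, 6, 5,
    4, 9, 8,
    2, 6, 7,
    5, 6, 5,
    8, 3, 1,
    4, 6, 8),
  ncol = 3, byrow = TRUE,
  dimnames = list(LETTERS[1:7], c("need1", "need2", "need3"))
)

# HYPOTHETICAL segment profiles: placeholders, replace with Spreadsheet 1.
segments <- matrix(
  c(9, 2, 2,
    5, 6, 5,
    3, 8, 8),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("segment1", "segment2", "segment3"), colnames(customers))
)

# Assumed score: sum of squared rating differences (0 = identical).
similarity <- function(ratings, profile) sum((ratings - profile)^2)

# One row per customer, one column per segment.
scores <- t(apply(customers, 1, function(r) {
  apply(segments, 1, function(p) similarity(r, p))
}))

# Each customer goes to the segment with the lowest score.
assigned <- colnames(scores)[apply(scores, 1, which.min)]

# The customer whose best score is worst is least like any segment.
least_typical <- names(which.max(apply(scores, 1, min)))

print(scores)
print(data.frame(customer = rownames(scores), segment = assigned))
print(least_typical)
```

With the real Spreadsheet 1 profiles substituted in, the same three steps (score, pick the minimum, find the worst best-match) answer questions b and d.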
B. Aham Retail conducts market research among a sample of 100
consumers to develop appropriate segments. Loyalty, ad exposure and
usage were considered the important variables. The data were scaled from
1 to 9, with 1 being very low and 9 being very high. Aham wants to find
the best segments using cluster analysis.
Consumer LOYALTY USAGE AD EXPOSURE
1 9.05 9.31 8.15
2 9.34 8.88 6
3 9.61 8.80 8
4 9.08 9.03 6
5 9.60 8.72 8
6 8.02 8.33 7
7 8.87 8.37 10
8 8.34 7.87 10
9 8.25 7.65 6
10 8.28 8.74 6
11 8.31 8.74 9
12 9.78 9.31 8
13 9.96 6.71 9
14 9.68 8.59 9
15 9.84 7.96 8
16 9.13 7.79 8
17 7.09 4.92 9
18 7.75 6.75 8
19 7.25 5.92 10
20 7.27 7.27 9
21 6.31 6.94 7
22 6.58 6.80 6
23 6.26 4.83 7
24 6.43 6.04 5
25 8.83 2.91 6
26 8.14 4.38 5
27 8.52 1.78 7
28 8.02 2.15 5
29 8.68 1.40 7
30 8.55 1.17 4
31 8.21 2.14 4
32 8.50 1.86 5
33 8.91 3.48 6
34 8.22 1.75 6
35 8.36 1.23 6
36 8.03 5.42 5
37 8.17 5.48 4
38 8.23 2.12 7
39 2.19 2.74 4
40 5.97 1.97 4
41 5.96 2.27 1
42 5.78 1.71 5
43 5.35 0.96 1
44 5.74 1.79 3
45 4.59 2.89 3
46 4.69 2.80 4
47 4.02 1.95 4
48 4.22 0.69 2
49 4.07 3.73 4
50 4.13 3.16 5
51 4.73 7.66 3
52 4.19 6.08 5
53 4.47 7.76 2
54 4.72 6.44 2
55 4.58 3.65 1
56 3.50 2.93 4
57 3.94 1.98 3
58 3.65 3.21 2
59 3.98 4.06 4
60 5.64 2.77 3
61 2.89 6.24 3
62 2.55 6.46 3
63 2.25 5.70 2
64 2.47 5.89 5
65 2.23 5.93 4
66 2.41 5.58 3
67 2.66 9.16 5
68 2.45 8.56 7
69 5.06 2.69 3
70 5.25 3.86 3
71 5.78 3.72 5
72 5.46 3.24 7
73 5.78 4.36 3
74 5.49 5.06 4
75 5.51 4.42 6
76 5.13 3.03 7
77 5.67 3.79 5
78 2.84 4.56 4
79 2.35 3.53 3
80 2.15 5.25 2
81 2.45 5.88 1
82 2.07 4.85 4
83 2.59 3.85 3
84 2.96 9.41 3
85 2.51 0.59 2
86 3.61 1.18 2
87 3.32 1.26 1
88 2.04 1.29 1
89 2.96 0.94 5
90 3.00 1.24 3
91 2.65 1.24 2
92 2.32 1.00 2
93 1.32 1.37 2
94 1.10 0.85 3
95 1.85 0.81 3
96 1.56 1.05 2
97 1.06 1.36 1
98 1.84 1.30 2
99 1.43 0.67 1
100 1.81 0.50 2
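A minimal sketch of how the cluster analysis might be run in R, using only nine rows copied from the table above (consumers 1 to 3, 27 to 29 and 97 to 99) so that the example stays short. The choice of k-means, of three clusters and of these particular rows are assumptions made for illustration; the full exercise would use all 100 consumers and compare several values of k.

```r
# Nine rows copied from the Aham table (not the full sample).
sample_rows <- data.frame(
  consumer = c(1, 2, 3, 27, 28, 29, 97, 98, 99),
  loyalty  = c(9.05, 9.34, 9.61, 8.52, 8.02, 8.68, 1.06, 1.84, 1.43),
  usage    = c(9.31, 8.88, 8.80, 1.78, 2.15, 1.40, 1.36, 1.30, 0.67),
  exposure = c(8.15, 6, 8, 7, 5, 7, 1, 2, 1)
)

# k-means with an assumed k = 3; nstart repeats the random start
# several times and keeps the best solution.
set.seed(1)
km <- kmeans(sample_rows[, c("loyalty", "usage", "exposure")],
             centers = 3, nstart = 25)

# Attach the cluster number to each consumer and inspect the centres.
sample_rows$cluster <- km$cluster
print(sample_rows)
print(round(km$centers, 2))
```

The cluster centres are what give each segment its interpretation, for example high loyalty with low usage.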
Demographic Details :
TASK :
C. R STUDIO
Activity 1 :
Data set : Age – Spend
Install packages : ggplot2
Activity 2:
• Top 5 rows, dim, summary, head, tail, str
• What is the pattern visible at this point?
• Whom to target? To improve advertising, the marketing team wants to
send more targeted emails to their customers.
Activity 3:
Create clusters: Plot the total spend and the age of the customers
Activity 4:
IMPORT DATA - iris (in-built data set)
Activity 5:
Packages : cluster, factoextra
Activity 6:
Boxplot for Iris data
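One possible way to draw the boxplot with base R graphics, shown as a sketch; the title, axis label and colour are arbitrary choices.

```r
# iris ships with R; columns 1 to 4 are the numeric measurements.
data(iris)

# One box per measurement; the returned object holds the box statistics.
bp <- boxplot(iris[, 1:4],
              main = "Iris measurements",
              ylab = "Centimetres",
              col = "lightgray")
print(bp$stats)
```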
Activity 7:
• Clustering with k-means
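A short sketch of k-means on the iris measurements with base R. The value k = 3 and the seed are assumptions made for the example; the species column is left out of the clustering and used only to check the result afterwards.

```r
data(iris)

# Cluster on the four numeric columns only.
set.seed(123)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

# Compare the clusters found with the known species.
print(table(cluster = km$cluster, species = iris$Species))
print(km$size)
```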
Activity 8:
Principal Component Analysis : PCA finds a new set of dimensions (or a set of
basis of views) such that all the dimensions are orthogonal (and hence linearly
independent) and ranked according to the variance of data along them.
Use PCA for display
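The two properties stated above (orthogonal dimensions, ranked by variance) can be checked directly. The sketch below uses base R's prcomp on the iris measurements; standardizing the variables first is an assumption, not something the activity specifies.

```r
data(iris)

# PCA on the standardized measurements.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Standard deviations come out ordered from largest to smallest.
print(summary(pca))

# The loading vectors are orthonormal, so this is the identity matrix.
orthogonality <- crossprod(pca$rotation)
print(round(orthogonality, 10))

# Display the data on the first two components, coloured by species.
plot(pca$x[, 1:2], col = as.integer(iris$Species), pch = 19,
     main = "Iris on the first two principal components")
```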
Activity 9:
PLOT CLUSTER MAP
Activity 10:
Hierarchical Clustering : Creating clusters that have a predetermined ordering from
top to bottom
Activity 11:
• PLOT DENDROGRAM
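A minimal sketch of hierarchical clustering and its dendrogram in base R, using iris as the data. Euclidean distance on standardized variables and complete linkage are assumptions; other distances and linkage methods are equally valid starting points.

```r
data(iris)

# Distance matrix on standardized measurements.
d <- dist(scale(iris[, 1:4]), method = "euclidean")

# Agglomerative clustering: merge the closest clusters step by step.
hc <- hclust(d, method = "complete")

# The dendrogram shows the merge order from bottom (single
# observations) to top (one cluster).
plot(hc, labels = FALSE, hang = -1,
     main = "Cluster dendrogram", xlab = "", sub = "")
```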
Activity 12:
SEGMENT LABELS
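One way to turn cluster numbers into readable segment labels, sketched with base R. The three-cluster cut and the label text "Segment 1" to "Segment 3" are placeholders; in practice the labels would describe each segment.

```r
data(iris)

# Hierarchical clustering, then cut the tree into three groups.
hc <- hclust(dist(scale(iris[, 1:4])), method = "complete")
segment_id <- cutree(hc, k = 3)

# Store the groups as a labelled factor next to the original data.
iris_segmented <- data.frame(
  iris,
  segment = factor(segment_id, labels = paste("Segment", 1:3))
)
print(table(iris_segmented$segment))
```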
Activity 13:
Download dataset : mydata
Packages : a. cluster b. factoextra
Activity 14:
Data Preparation : Prior to clustering data, you may want to remove or estimate
missing data and rescale variables for comparability
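The preparation step can be sketched as follows. Because the workbook's mydata file is not reproduced here, the built-in airquality data set (which contains missing values) stands in for it; that substitution is an assumption made only so the example runs.

```r
# Stand-in for mydata: four numeric columns, some with NA values.
mydata <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# Remove rows that contain any missing value.
mydata <- na.omit(mydata)

# Rescale every variable to mean 0 and standard deviation 1 so that
# no variable dominates the distance calculation.
mydata <- scale(mydata)

print(nrow(mydata))
print(round(colMeans(mydata), 10))
print(apply(mydata, 2, sd))
```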
Activity 15:
Determine number of clusters
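A common heuristic for this step is the elbow plot of within-cluster sum of squares against the number of clusters. The sketch below assumes that approach and uses standardized iris measurements in place of mydata; the upper limit of 10 clusters is arbitrary.

```r
# Stand-in data: standardized iris measurements.
mydata <- scale(iris[, 1:4])

max_k <- 10
wss <- numeric(max_k)

# With one cluster, the within-cluster sum of squares is the total
# sum of squares.
wss[1] <- (nrow(mydata) - 1) * sum(apply(mydata, 2, var))

# For k >= 2, take the value reported by k-means.
set.seed(123)
for (k in 2:max_k) {
  wss[k] <- kmeans(mydata, centers = k, nstart = 25)$tot.withinss
}

# Look for the bend ("elbow") where adding clusters stops helping much.
plot(1:max_k, wss, type = "b",
     xlab = "Number of clusters",
     ylab = "Within-cluster sum of squares")
```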
Activity 16:
K-Means Cluster Analysis :
5 cluster solution
Get cluster means
Append cluster assignment
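The three steps listed above can be sketched in base R as follows, again with standardized iris measurements standing in for mydata (an assumption for the sake of a runnable example).

```r
# Stand-in data as a data frame.
mydata <- as.data.frame(scale(iris[, 1:4]))

# 5 cluster solution.
set.seed(123)
fit <- kmeans(mydata, centers = 5, nstart = 25)

# Get cluster means: average of every variable within each cluster.
cluster_means <- aggregate(mydata, by = list(cluster = fit$cluster),
                           FUN = mean)
print(cluster_means)

# Append cluster assignment as a new column.
mydata_clustered <- data.frame(mydata, cluster = fit$cluster)
print(head(mydata_clustered))
```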
Activity 17:
Wide range of hierarchical clustering approaches.
Ward Hierarchical Clustering
Activity 18:
• cut tree into 5 clusters
• draw dendrogram with red borders around the 5 clusters
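A sketch of Ward clustering, the five-cluster cut and the bordered dendrogram in base R. Standardized iris measurements stand in for mydata, and "ward.D2" is used as the Ward variant; both are assumptions.

```r
# Stand-in data and Euclidean distance matrix.
mydata <- scale(iris[, 1:4])
d <- dist(mydata, method = "euclidean")

# Ward hierarchical clustering.
fit <- hclust(d, method = "ward.D2")

# Draw the dendrogram first, because rect.hclust adds to an
# existing plot.
plot(fit, labels = FALSE, main = "Ward dendrogram", xlab = "", sub = "")

# Cut tree into 5 clusters.
groups <- cutree(fit, k = 5)

# Red borders around the 5 clusters.
rect.hclust(fit, k = 5, border = "red")

print(table(groups))
```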
Activity 19:
Plotting Cluster Solutions :
a. K-Means Clustering with 5 clusters
b. Cluster Plot against 1st 2 principal components
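A cluster plot against the first two principal components can be approximated with base R alone, as sketched below. Standardized iris measurements stand in for the activity's data. The cluster package's clusplot function produces a similar picture with ellipses; this version only colours the points and marks the centroids.

```r
# Stand-in data.
mydata <- scale(iris[, 1:4])

# K-means with 5 clusters.
set.seed(123)
fit <- kmeans(mydata, centers = 5, nstart = 25)

# Project the observations and the cluster centres onto the
# principal components.
pca <- prcomp(mydata)
scores <- pca$x[, 1:2]
centres <- predict(pca, newdata = fit$centers)[, 1:2]

# Points coloured by cluster, centroids drawn as crosses.
plot(scores, col = fit$cluster, pch = 19,
     main = "K-means clusters on PC1 and PC2")
points(centres, pch = 4, cex = 2, lwd = 3)
```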
Activity 20:
Centroid Plot against 1st 2 discriminant functions
Activity 21:
Data set : Wholesale.customers.data
Activity 22 :
Assign & call with customer
Activity 23 :
Prepare the data for analysis. Remove the missing values and remove the “Channel” and
“Region” columns because they are not useful for clustering
Activity 24 :
Standardize the variables
Activity 25 :
Determine the number of clusters
Activity 26 :
Fit the model and print out the cluster means
Activity 27 :
Plot the results
Activity 28 :
Outlier detection with K-Means
Activity 29:
Calculate the distance between each object and its cluster center, then pick those
with largest distances as outliers and print out outliers’ IDs
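The distance calculation described above can be sketched as follows. Standardized iris measurements stand in for the wholesale data, and both k = 3 and the choice of the five most distant objects are assumptions for illustration.

```r
# Stand-in data.
x <- scale(iris[, 1:4])

set.seed(123)
km <- kmeans(x, centers = 3, nstart = 25)

# Repeat each object's own cluster centre so it lines up row by row
# with the data.
assigned_centres <- km$centers[km$cluster, ]

# Euclidean distance between each object and its cluster centre.
distances <- sqrt(rowSums((x - assigned_centres)^2))

# Pick the objects with the largest distances as outliers.
outlier_ids <- order(distances, decreasing = TRUE)[1:5]
print(outlier_ids)
print(iris[outlier_ids, ])
```

The row numbers printed are the outliers' IDs; in the wholesale data they would identify the customers to inspect by hand.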
Activity 30:
Make it more meaningful
Activity 31:
Apply Hierarchical Clustering to the data
Activity 32:
Ward’s method
Activity 33
Customer Segmentation using RFM Analysis
Activity 34
• INSTALL PACKAGES
• library(data.table)
  library(dplyr)
  library(ggplot2)
  library(tidyr)
  library(knitr)
  library(rmarkdown)
Activity 35
• Load Dataset : E commerce data
Activity 36
• Assign & Call : df_data
Activity 37
DATA SENSE : Top 5
• glimpse(df_data)
Activity 38
Data Cleaning : Delete all rows with a negative Quantity or Price. Delete rows with an NA customer ID
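A minimal sketch of the cleaning rule on a tiny made-up data frame. The rows are HYPOTHETICAL, and the column names (InvoiceNo, Quantity, UnitPrice, CustomerID) are assumptions about how the e-commerce file is laid out; base R's subset is used here, though dplyr's filter would express the same condition.

```r
# HYPOTHETICAL toy rows standing in for df_data.
df_data <- data.frame(
  InvoiceNo  = c("A1", "A2", "A3", "A4", "A5"),
  Quantity   = c(6, -2, 3, 1, 4),
  UnitPrice  = c(2.5, 1.0, -1.0, 3.0, 5.0),
  CustomerID = c(101, 102, 103, NA, 104),
  stringsAsFactors = FALSE
)

# Keep only rows with non-negative quantity and price and a known
# customer ID.
clean <- subset(df_data,
                Quantity >= 0 & UnitPrice >= 0 & !is.na(CustomerID))
print(clean)
```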
Activity 39
Recode variables : To convert character variables to factors
Activity 40
Calculate RFM
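The RFM calculation can be sketched with base R on a few made-up transactions. All values are HYPOTHETICAL, the column names are assumptions, and measuring recency from the day after the last transaction in the data is one convention among several.

```r
# HYPOTHETICAL transactions: one row per invoice.
tx <- data.frame(
  CustomerID  = c(101, 101, 102, 103, 103, 103),
  InvoiceNo   = c("A1", "A2", "A3", "A4", "A5", "A6"),
  InvoiceDate = as.Date(c("2011-11-01", "2011-12-01", "2011-10-15",
                          "2011-09-01", "2011-11-20", "2011-12-05")),
  Amount      = c(20, 35, 50, 10, 15, 25)
)

# Assumed reference point: the day after the latest invoice.
reference_date <- max(tx$InvoiceDate) + 1
days_since <- as.numeric(reference_date - tx$InvoiceDate)

# Recency: days since the most recent purchase.
# Frequency: number of distinct invoices.
# Monetary: total amount spent.
rfm <- data.frame(
  CustomerID = sort(unique(tx$CustomerID)),
  Recency    = as.vector(tapply(days_since, tx$CustomerID, min)),
  Frequency  = as.vector(tapply(tx$InvoiceNo, tx$CustomerID,
                                function(x) length(unique(x)))),
  Monetary   = as.vector(tapply(tx$Amount, tx$CustomerID, sum))
)
print(rfm)
```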
Activity 41
• HISTOGRAM : Recency – How recently did the customer purchase?
• Frequency – How often do they purchase?
• Monetary Value – How much do they spend?
Activity 42
The data are skewed; use a log scale to normalize
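The effect of the log transformation can be illustrated on a handful of made-up monetary values. The numbers are HYPOTHETICAL, and the skewness function below is a simple moment-based helper written for this example, not part of base R.

```r
# HYPOTHETICAL spend values with a long right tail.
monetary <- c(12, 30, 45, 80, 150, 400, 2500, 12000)

# Natural log; use log1p() instead if zeros can occur.
log_monetary <- log(monetary)

# Simple moment-based skewness (0 = symmetric).
skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / (mean(centred^2))^1.5
}

print(skewness(monetary))
print(skewness(log_monetary))

# Histograms before and after the transformation.
old_par <- par(mfrow = c(1, 2))
hist(monetary, main = "Raw", xlab = "Monetary value")
hist(log_monetary, main = "Log scale", xlab = "log(Monetary value)")
par(old_par)
```

The same transformation would be applied to Recency and Frequency before clustering, so that a few extreme customers do not dominate the distances.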
Activity 43
Clustering
Activity 44
CLUSTER DENDROGRAM
Activity 45
Cut