Handout PS 1 - Customer Analytics in Product Management
Customer (Research) Analytics Concepts
Analysts are a different animal!
TENACIOUS, REFLECTIVE, AMBITIOUS
What Customer Analytics Is in Product Management
The systematic process of collecting and analyzing customer information (data) in order to increase our understanding of the phenomena that concern or interest us about customers, employees and competitors.

What Is Customer Analytics in Product Management?
Customer analytics is the use of:
- customer data,
- statistical analysis,
- quantitative methods, and
- mathematical or computer-based models
to help product managers gain improved insight about their customers, build and maintain relationships in their customer operations, and make better, fact-based decisions.
Scope of Customer Analytics in Chic-Chicken
What are the other customer decisions Kiran can take using Customer Analytics?

Data for Customer Analytics
DATA
- collected facts and figures
DATABASE
- collection of computer files containing data
INFORMATION
- comes from analyzing data
Metrics are used to quantify performance.
Measures are numerical values of metrics.
Discrete metrics involve counting
- on time or not on time
- number or proportion of on time deliveries
Continuous metrics are measured on a continuum
- customer rating
- package weight
- purchase price

Four Types of Data Based on Measurement Scale:
- Categorical (nominal) data
- Ordinal data (ordered categorical data)
- Interval data
- Ratio data
Data for Customer Analytics
Decision Models – Kiran’s Dilemma
Correlation
Variance vs Covariance

Variance ~ DX * DX:
$$S_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

Covariance ~ DX * DY:
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$

Covariance
• When X increases and Y increases: cov(x,y) = pos.
• When X increases and Y decreases: cov(x,y) = neg.
• When no constant relationship: cov(x,y) = 0
x    y    x_i - x̄    y_i - ȳ    (x_i - x̄)(y_i - ȳ)
0    3      -3          0            0
2    2      -1         -1            1
3    4       0          1            0
4    0       1         -3           -3
6    6       3          3            9
x̄ = 3, ȳ = 3, Σ = 7

[Figure: scatter plot of y against x]

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} = \frac{7}{5} = 1.4$$
What does this tell us?

• Covariance does not really tell us much about the strength of association
– Solution: standardise this measure
• Pearson’s R: standardise by adding the s.d. to the equation:
$$r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y}$$
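The covariance and Pearson calculation above can be checked with a short script. This is a minimal sketch, assuming the divide-by-n (population) convention, which is what the handout's value of 1.4 for five observations implies; the function names are illustrative and not part of the handout.

```python
import math

# The five (x, y) observations from the worked table.
x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(a, b):
    """Sum of cross-deviations divided by n (population convention)."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)


def std_dev(values):
    """Standard deviation as the square root of cov(values, values)."""
    return math.sqrt(covariance(values, values))


cov_xy = covariance(x, y)
# Pearson's r: covariance standardised by the two standard deviations.
r_xy = cov_xy / (std_dev(x) * std_dev(y))

print(round(cov_xy, 2))  # 1.4
print(round(r_xy, 2))    # 0.35
```

Both standard deviations equal 2 for this data, so r = 1.4 / 4 = 0.35. Unlike the covariance, this value is unit-free and always lies between -1 and 1.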
What is a Model?
What is a Math/Stats Model?

Types of Regression Models
Regression Models
- 1 Explanatory Variable: Simple
  - Linear
  - Non-Linear
- 2+ Explanatory Variables: Multiple
  - Linear
  - Non-Linear

Estimating effect of product strategy/tactics and predicting consumer behavior
Two variable (Simple) Linear Regression Model
[Figure: the total deviation of Y from its average, split into the deviation explained by regression and the deviation not explained by regression]
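The split of total deviation into an explained and an unexplained part can be verified numerically. The sketch below is only an illustration: the data are hypothetical (not from the handout), and the variable names are chosen for readability. It fits a simple least-squares line and checks that the total sum of squared deviations equals the explained part plus the unexplained part.

```python
# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for the regression of y on x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

# Total deviation: each observed y against the average of y.
total = sum((yi - y_bar) ** 2 for yi in y)
# Explained deviation: each fitted value against the average of y.
explained = sum((fi - y_bar) ** 2 for fi in y_hat)
# Unexplained deviation: each observed y against its fitted value.
unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

print(round(total, 2), round(explained, 2), round(unexplained, 2))  # 6.0 3.6 2.4
print(round(explained / total, 2))  # 0.6, the share of variability explained
```

The ratio of explained to total deviation is the familiar R-squared of the fitted line.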
Linear Regression Assumptions
• Y can be predicted from X
• A graph of X & Y is a straight line
• The line extends infinitely in both directions
• Model explains only the variability in Y
• Each XY pair was randomly sampled
• Each XY pair was selected independently

Discussion Questions
• Will the regression line be the same if you exchange X and Y?
– In nearly all instances, the regression line will be different
• Will the correlation coefficient be the same if you exchange X and Y?
– The correlation will be the same, as it makes no difference which is called X and which is called Y
• Do the X and Y axes have to have the same units to perform linear regression?
– No
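The first two answers can be seen directly in a small experiment. This is a minimal sketch on hypothetical data (not from the handout), with illustrative function names: it fits the least-squares line in both directions and computes the correlation in both directions.

```python
import math


def fit_line(u, v):
    """Least-squares line predicting v from u; returns (intercept, slope)."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope


def correlation(u, v):
    """Pearson correlation coefficient of u and v."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_uv / math.sqrt(s_uu * s_vv)


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept_yx, slope_yx = fit_line(x, y)  # predict Y from X
intercept_xy, slope_xy = fit_line(y, x)  # predict X from Y

print(round(intercept_yx, 2), round(slope_yx, 2))  # 2.2 0.6
print(round(intercept_xy, 2), round(slope_xy, 2))  # -1.0 1.0
print(round(correlation(x, y), 4), round(correlation(y, x), 4))  # 0.7746 0.7746
```

The two fitted lines differ because each one minimises errors in a different variable, while the correlation formula treats the two variables symmetrically.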
Multiple Regression Model
The equation that describes how the dependent variable y is related to the independent variables x1, x2, . . . xp and an error term is called the multiple regression model.
The multiple regression model is:
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
The equation that describes how the mean value of y is related to x1, x2, . . . xp is called the multiple regression equation.
The multiple regression equation is:
E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
A simple random sample is used to compute sample statistics b0, b1, b2, . . . , bp that are used as the point estimators of the parameters β0, β1, β2, . . . , βp.
The estimated multiple regression equation is:
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
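One way to obtain b0, b1, . . . , bp from a sample is the least squares method, computed here by solving the normal equations. The sketch below is an illustration under stated assumptions, not the handout's own procedure: the data are hypothetical and generated without noise from y = 1 + 2x1 + 3x2, and the helper names are invented for this example.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return [b0, b1, ..., bp] minimising the sum of squared errors."""
    # A leading 1 in every row gives the intercept b0.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical sample: y = 1 + 2*x1 + 3*x2 exactly, with no error term.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]

b = fit_multiple_regression(rows, y)
print([round(v, 4) for v in b])
```

Because the sample has no noise, the estimates should recover the generating coefficients up to floating-point rounding. With real data the estimates would differ from the true parameters by sampling error.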
Least Squares Method
Cluster Analysis
Applications:
Distance measures for individual observations
• To measure similarity between two observations a distance measure is needed
• With a single variable, similarity is straightforward
  • Example: income – two individuals are similar if their income level is similar and the level of dissimilarity increases as the income gap increases
• Multiple variables require an aggregate distance measure
  • With many characteristics (e.g. income, age, consumption habits, brand loyalty, purchase frequency, family composition, education level, ...), it becomes more difficult to define similarity with a single value
• The best-known measure of distance is the Euclidean distance, which is the concept we use in everyday life for spatial coordinates.

Model:
Data: each object is characterized by a set of numbers (measurements);
e.g., object 1: (x11, x12, … , x1n)
      object 2: (x21, x22, … , x2n)
      :
      object p: (xp1, xp2, … , xpn)
Distance: Euclidean distance, dij,
$$d_{ij} = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{in} - x_{jn})^2}$$
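The Euclidean distance between two objects can be computed in a few lines. This is a minimal sketch; the function name and the example points are illustrative and not taken from the handout.

```python
import math


def euclidean_distance(obj_i, obj_j):
    """Euclidean distance between two objects measured on the same n variables."""
    if len(obj_i) != len(obj_j):
        raise ValueError("both objects need the same number of measurements")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(obj_i, obj_j)))


# Two objects described by two measurements each (a 3-4-5 right triangle).
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

When the variables are on very different scales (for example income in currency units and age in years), the variable with the largest scale dominates the distance, which is why standardising the variables first is often considered.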
[Figure: two plots with axes running from Low to High; horizontal axis: Frequency of going to fast food restaurants]
Clustering procedures
• Hierarchical procedures
  • Agglomerative (start from n clusters to get to 1 cluster)
  • Divisive (start from 1 cluster to get to n clusters)
• Non hierarchical procedures
  • K-means clustering

Hierarchical clustering
• Agglomerative:
  • Each of the n observations constitutes a separate cluster
  • The two clusters that are most similar according to some distance rule are aggregated, so that in step 1 there are n-1 clusters
  • In the second step another cluster is formed (n-2 clusters), by nesting the two clusters that are most similar, and so on
  • There is a merging in each step until all observations end up in a single cluster in the final step.
• Divisive
  • All observations are initially assumed to belong to a single cluster
  • The most dissimilar observation(s) is extracted to form a separate cluster
  • In step 1 there will be 2 clusters, in the second step three clusters and so on, until the final step will produce as many clusters as the number of observations. This technique is used in medical research and is not in the scope of our course.
• The number of clusters determines the stopping rule for the algorithms
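The agglomerative steps described above can be sketched in plain Python. The handout leaves the "distance rule" open, so this example assumes single linkage (the distance between two clusters is the distance between their closest members) with Euclidean distance; the points and function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def single_linkage(cluster_a, cluster_b, points):
    """Distance between two clusters: the closest pair of members."""
    return min(euclidean(points[i], points[j]) for i in cluster_a for j in cluster_b)


def agglomerate(points, n_clusters=1):
    """Merge the two most similar clusters until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Start: each of the n observations is its own cluster (stored as indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # One merge per step, so the number of clusters drops by one.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical observations: two tight pairs and one isolated point.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.5), (9.0, 1.0)]

print(agglomerate(points, n_clusters=3))  # [[0, 1], [2, 3], [4]]
print(agglomerate(points, n_clusters=1))  # [[0, 1, 2, 3, 4]]
```

The `n_clusters` argument plays the role of the stopping rule: the loop ends as soon as the requested number of clusters is reached. Other distance rules, such as Ward's method, change only how the pair to merge is chosen.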
Using SPSS to Identify Clusters
Kiran’s Dilemma
• Group employees based on their commitment to Chic-Chicken

Variable   Description                                                          Type
Work Environment Measures
X1         I am paid fairly for the work I do.                                  Metric
X2         I am doing the kind of work I want.                                  Metric
X3         My supervisor gives credit and praise for work well done.            Metric
X4         There is a lot of cooperation among the members of my work group.    Metric
X5         My job allows me to learn new skills.                                Metric
For this example we are looking for subgroups among all the 63 employees of Chic-Chicken restaurant using the “organizational commitment” variables. The SPSS click-through sequence is: Analyze → Classify → Hierarchical Cluster. This will take you to a dialog box where you select and move variables X13, X14 and X15 into the “Variables” box. Next, go to the Statistics box, where Agglomeration schedule is selected as the default option and Cluster Membership is set to “None” by default. We shall continue with the default options here. Next, click on the Plots box. Check Dendrogram and, in the Icicle window, click the None button. Then click Continue. Next, click on the Method box and select Ward’s under Cluster Method (it is the last option). Squared Euclidean Distance is the default under Measure and we will use it; we do not need to standardize this data. We will not select anything under the Save option now. Now click on “OK” to run the program.

• Thank you