Handout PS 1 - Customer Analytics

Uploaded by

mysticriverlabs

Customer Analytics

in Product Management

Customer (Research)
Analytics Concepts

Analysts are not retrievers!
What Customer Analytics Is Not


• Customer analytics isn’t information gathering:
– Gathering information from resources such as
books or magazines isn’t BA.
– No contribution to new knowledge.
• BA isn’t the transportation of facts:
– Merely transporting facts from one resource to
another doesn’t constitute BA.
– No contribution to new knowledge although this
might make existing knowledge more accessible.

Analysts are a different animal!

TENACIOUS, REFLECTIVE, AMBITIOUS

What Customer Analytics Is in Product Management

The systematic process of collecting and analyzing customer information (data) in order to increase our understanding of the phenomenon we are concerned with or interested in: customers, employees and competitors.

What is Customer Analytics in product management?

Customer analytics is the use of:
• customer data,
• statistical analysis,
• quantitative methods, and
• mathematical or computer-based models
to help product managers gain improved insight about their customers, build and maintain relationships in their customer operations, and make better, fact-based decisions.

What is Customer Analytics in Product Strategy?

Customer Analytics Applications in Product Strategy
• Analysis of customer perception
• Predicting customer future behavior
• Estimating effect of strategic and tactical actions
• Finding customer segments, size and profile
• Analysis of factors that affect consumer decision making
• Analysis of customer satisfaction, repeat purchase behavior
• Pricing decisions
• Estimating market share based on customer data
• Many other applications

Importance of Customer Analytics in Product Strategy
• There is a strong relationship of Customer Analytics with:
  - customer satisfaction, repeat purchase
  - market share analysis
  - revenue of customers
  - shareholder return
• CA enhances understanding of data
• CA is vital for companies to remain competitive
• CA enables creation of informative reports

Evolution of Customer Analytics
• Marketing research
• Operations research
• Customer intelligence
• Decision support systems
• Personal computer software

Scope of Customer Analytics in Product Strategy
Example: Chic-Chicken Case
• The question is: To what extent should customer perceptions be improved?
• Descriptive analytics: examine consumer data (food quality, service quality, ambiance, price, ...)
• Predictive analytics (subset of descriptive analysis): predict future return based on customer perception
• Prescriptive analytics: What are the work environment perceptual dimensions of Chic-Chicken employees?

Scope of Customer Analytics in Chic-Chicken
What other customer decisions can Kiran take using Customer Analytics?

Data for Customer Analytics
• DATA: collected facts and figures
• DATABASE: collection of computer files containing data
• INFORMATION: comes from analyzing data

Data for Customer Analytics
• Metrics are used to quantify performance.
• Measures are numerical values of metrics.
• Discrete metrics involve counting
  - on time or not on time
  - number or proportion of on time deliveries
• Continuous metrics are measured on a continuum
  - customer rating
  - package weight
  - purchase price

Data for Customer Analytics
Four Types of Data Based on Measurement Scale:
• Categorical (nominal) data
• Ordinal data (ordered categorical data)
• Interval data
• Ratio data

Data for Customer Analytics
• Order
• Distance
• Origin

Categorical (nominal) Data
• Data placed in categories according to a specified characteristic
• Categories bear no quantitative relationship to one another
• Examples:
  - customer’s location (Chic-Chicken outlet)
  - employee classification (store manager, supervisor, associate)

Data for Customer Analytics

Ordinal Data
• Data that is ranked or ordered according to some relationship with one another
• No fixed units of measurement
• Examples:
  - Food quality ranking
  - Service quality ranking

Interval Data
• Ordinal data but with constant differences between observations
• No true zero point
• Ratios are not meaningful
• Examples:
  - Customer perception rating
  - Sales employee rating

Data for Customer Analytics

Ratio Data
• Continuous values that have a natural zero point
• Ratios are meaningful
• Examples:
  - monthly sales
  - delivery times

Decision Models

Model:
• An abstraction or representation of a real system, idea, or object
• Captures the most important features

Decision Models

Nature of Decision Models
In industry, managers typically need to know how best to use price, product quality and advertising strategies to influence sales.

Example: Chic-Chicken case
Using Customer Analytics, Kiran can develop a model that predicts customers’ willingness to purchase using food taste, quantity served, pricing, ambiance and advertising.

Decision Models – Kiran’s Dilemma
Willingness to return in future = .

Finding associations between strategic variables and consumer decisions.
Correlation and Simple Regression

Correlation
• Correlation: the simultaneous change in value of two numerically valued random variables.
• The correlation coefficient computed from the sample data measures the strength and direction of a relationship between two variables.
• The sample correlation coefficient is denoted by $r$. We do hypothesis testing to infer about the population coefficient.
• The population correlation coefficient is denoted by $\rho$.

Correlation does NOT necessarily imply causation

Correlation & Causation
• Causation means a cause-and-effect relation.
• Correlation denotes the interdependency among variables. For two phenomena to be correlated, they must be related, but the relationship need not be one of cause and effect.
• If two variables vary in such a way that a change in one (the cause) is accompanied by a change in the other (the effect), with all other factors that could move the effect held constant, then the two variables are said to have a cause-and-effect relationship.
• In other words, causation implies correlation, but correlation does not always imply causation.

Why?
• Height determines weight
• Weight determines height
• Sales of Mercedes-Benz and sales of diamond jewelry: both are determined by disposable income.
• The two variables don’t correlate in the population at all, and the observed correlation in our sample was a coincidence
  - If p < 0.05 then you can reject this

Variance vs Covariance
• Do two variables change together?

Variance ~ DX * DX:
$$S_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

Covariance ~ DX * DY:
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$

Covariance
• When X increases and Y increases: cov(x, y) = pos.
• When X increases and Y decreases: cov(x, y) = neg.
• When no constant relationship: cov(x, y) = 0

Example Covariance

  x     y     x_i - x̄    y_i - ȳ    (x_i - x̄)(y_i - ȳ)
  0     3       -3          0              0
  2     2       -1         -1              1
  3     4        0          1              0
  4     0        1         -3             -3
  6     6        3          3              9
  x̄ = 3, ȳ = 3                        Σ = 7

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} = \frac{7}{5} = 1.4$$

What does this number tell us?

[Figure: scatter plot of the five (x, y) points]

Pearson’s R
• Covariance does not really tell us much about the strength of association
  - Solution: standardise this measure
• Pearson’s R: standardise by dividing the covariance by the standard deviations (s.d.):
$$r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y}$$
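The covariance example and Pearson's R can be checked with a few lines of Python. This is a minimal sketch that assumes the divide-by-n (population) form used on the slides; the function names are illustrative and not part of the handout.

```python
import math

# The five (x, y) pairs from the covariance example.
x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Covariance with divisor n, matching the slide's formula."""
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)

def pearson_r(xs, ys):
    """Covariance standardised by the two standard deviations."""
    sx = math.sqrt(covariance(xs, xs))  # s.d. of x (divisor n)
    sy = math.sqrt(covariance(ys, ys))  # s.d. of y (divisor n)
    return covariance(xs, ys) / (sx * sy)

cov_xy = covariance(x, y)
r_xy = pearson_r(x, y)
print(round(cov_xy, 4), round(r_xy, 4))  # 1.4 0.35
```

Unlike the covariance, the standardised value always lies between -1 and 1, so 0.35 can be read directly as a fairly weak positive association.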

To start with, out of curiosity, Kiran wanted to check the following:
• Is perceived food quality associated with customers’ willingness to repurchase from Chic-Chicken?
• Is how fun a place a fast food joint is associated with customers’ satisfaction?
• Do it in Excel / SPSS

Predict customer behavior and decisions
Know the effect of different product strategies on consumer decisions
Regression Model

What is a Model?
• Representation of some phenomenon
• Non-math/stats model
  - Not of our interest at present

What is a Math/Stats Model?
1. Describes the relationship between variables
2. Types
  - Deterministic models (no randomness)
  - Probabilistic models (with randomness)

Regression Models
• Relationship between one dependent variable and explanatory variable(s)
• Use an equation to set up the relationship
• Continuous (or discrete) dependent (response) variable
• 1 or more continuous or categorical independent (explanatory) variables
• Used mainly for prediction & estimation

Specifying the deterministic component
1. Define the dependent variable and the independent (loosely speaking) variable
2. Hypothesize the nature of the relationship
  - Expected effects (i.e., coefficients’ signs)
  - Functional form (linear or non-linear)
  - Interactions

Types of Regression Models
• Simple (1 explanatory variable): linear or non-linear
• Multiple (2+ explanatory variables): linear or non-linear

Estimating effect of product strategy/tactics and predicting consumer behavior
Two variable (Simple) Linear Regression Model

[Figure: total deviation of Y from its average, split into the deviation explained by the regression and the deviation not explained by the regression]

Explained variance = R2 (coefficient of determination).
Unexplained variance = residuals (error).

Regression Analysis Terms

                                      Population Parameter              Sample Statistic
y-intercept of regression equation    $\beta_0$                         $b_0$
Slope of regression equation          $\beta_1$                         $b_1$
Equation of the regression line       $E(y) = \beta_0 + \beta_1 x$      $\hat{y} = b_0 + b_1 x$

Which line to take
  - The sum of squared errors (SSE) is the sum of squared differences between the points and the regression line.
  - It can serve as a measure of how well the line fits the data. SSE is defined by
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Coefficient of Determination
• Coefficient of Determination
• Adjusted Coefficient of Determination
• If R2 is significantly greater than 0, then we reject the null hypothesis of no relationship.

Linear Regression Assumptions
• Y can be predicted from X
• A graph of X & Y is a straight line
• The line extends infinitely in both directions
• Model explains only the variability in Y
• Each XY pair was randomly sampled
• Each XY pair was selected independently

Discussion Questions
• Will the regression line be the same if you exchange X and Y?
  - In nearly all instances, the regression line will be different
• Will the correlation coefficient be the same if you exchange X and Y?
  - The correlation will be the same, as it makes no difference which is called X and which is called Y
• Do the X and Y axes have to have the same units to perform linear regression?
  - No
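The first two discussion answers can be verified numerically. The sketch below uses a small made-up data set (an assumption for illustration, not survey data) and hand-written helpers to fit y on x and then x on y.

```python
import math

# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

intercept_yx, slope_yx = fit_line(x, y)  # y regressed on x
intercept_xy, slope_xy = fit_line(y, x)  # x regressed on y

print(round(slope_yx, 4), round(slope_xy, 4))
print(round(correlation(x, y), 4), round(correlation(y, x), 4))
```

Here y on x gives y = 2.2 + 0.6x, while x on y gives x = -1 + 1.0y, which is the line y = x + 1 when drawn on the same axes: a different line. The correlation is identical in both directions because its formula is symmetric in the two variables.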

Kiran to check
• Does perceived food quality have a significant positive effect on future return? (Suhani Sharma’s claim)
• Does perceived attractiveness of the restaurant interior have a significant positive effect on future return? (Rohit Sen’s claim)
• Does portion served have a significant positive effect on future return? (Rohit Sen’s claim)
• Do it in software

Description of Customer Survey Variables

Variable   Description                     Variable Type
Restaurant Perceptions
X1         Excellent Food Quality          Metric
X2         Attractive Interior             Metric
X3         Generous Portions               Metric
X4         Excellent Food Taste            Metric
X5         Good Value for the Money        Metric
X6         Friendly Employees              Metric
X7         Appears Clean & Neat            Metric
X8         Fun Place to Go                 Metric
X9         Wide Variety of Menu Items      Metric
X10        Reasonable Prices               Metric
X11        Courteous Employees             Metric
X12        Competent Employees             Metric
Selection Factor Rankings
X13        Food Quality                    Nonmetric
X14        Atmosphere                      Nonmetric
X15        Prices                          Nonmetric
X16        Employees                       Nonmetric
Relationship & Classification Variables
X17        Satisfaction                    Metric
X18        Likely to Return in Future      Metric
X19        Recommend to Friend             Metric
X20        Frequency of Patronage          Nonmetric
X21        Who Saw Ad                      Nonmetric
X22        Which Ad Viewed                 Nonmetric
X23        Ad Rating                       Metric
X24        Length of Time a Customer       Metric
X25        Gender                          Nonmetric
X26        Age                             Metric
X27        Income                          Metric
X28        Competitor                      Nonmetric

Using SPSS to Compute a Multiple Regression Model

We want to first see the effect of food quality on customers’ future return. The SPSS click through sequence is ANALYZE → REGRESSION → LINEAR. Highlight X18 and move it to the dependent variables box. Highlight X1 and move it to the independent variables box. Use the default “Enter” in the Methods box. Click on the Statistics button and use the defaults for “Estimates” and “Model Fit”. Next click on “Descriptives” and then Continue. There are several other options you could select at the bottom of this dialog box but for now we will use the program defaults. Click on “OK” at the top right of the dialog box to run the regression.

Description of Employee Survey Variables

Variable   Description                                                                      Variable Type
Work Environment Measures
X1         I am paid fairly for the work I do.                                              Metric
X2         I am doing the kind of work I want.                                              Metric
X3         My supervisor gives credit and praise for work well done.                        Metric
X4         There is a lot of cooperation among the members of my work group.                Metric
X5         My job allows me to learn new skills.                                            Metric
X6         My supervisor recognizes my potential.                                           Metric
X7         My work gives me a sense of accomplishment.                                      Metric
X8         My immediate work group functions as a team.                                     Metric
X9         My pay reflects the effort I put into doing my work.                             Metric
X10        My supervisor is friendly and helpful.                                           Metric
X11        The members of my work group have the skills and/or training to do their job well.   Metric
X12        The benefits I receive are reasonable.                                           Metric
Relationship Measures
X13        Loyalty – I have a sense of loyalty to Chic-Chicken restaurant.                  Metric
X14        Effort – I am willing to put in a great deal of effort beyond that expected to help Chic-Chicken restaurant to be successful.   Metric
X15        Proud – I am proud to tell others that I work for Chic-Chicken restaurant.       Metric
Classification Variables
X16        Intention to Search                                                              Metric
X17        Length of Time an Employee                                                       Nonmetric
X18        Work Type = Part-Time vs. Full-Time                                              Nonmetric
X19        Gender                                                                           Nonmetric
X20        Age                                                                              Metric
X21        Performance                                                                      Metric

Multiple Regression Model
• The equation that describes how the dependent variable y is related to the independent variables x1, x2, . . . , xp and an error term is called the multiple regression model.
• The multiple regression model is:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$$
• $\beta_0$ is the intercept and $\beta_1, \beta_2, \dots, \beta_p$ are the effects of the independent variables.
• $\varepsilon$ is a random variable called the error term.

In simple linear regression (SLR), the conditional mean of Y depends on X. The multiple regression model extends this idea to include more than one independent variable.

[Figure: predicted value Y’ shown against the independent variables X1, X2 and X3]

Multiple Regression Equation
• The equation that describes how the mean value of y is related to x1, x2, . . . , xp is called the multiple regression equation.
• The multiple regression equation is:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$

Estimated Multiple Regression Equation
• A simple random sample is used to compute sample statistics $b_0, b_1, b_2, \dots, b_p$ that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \dots, \beta_p$.
• The estimated multiple regression equation is:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$$

Estimation Process
1. Multiple regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$
   Multiple regression equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$
   Unknown parameters are $\beta_0, \beta_1, \beta_2, \dots, \beta_p$
2. Sample data: observations on x1, x2, . . . , xp and y
3. Estimated multiple regression equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$
   $b_0, b_1, b_2, \dots, b_p$ are sample statistics
4. $b_0, b_1, b_2, \dots, b_p$ provide estimates of $\beta_0, \beta_1, \beta_2, \dots, \beta_p$

Least Squares Method
• Least Squares Criterion
$$\min \sum (y_i - \hat{y}_i)^2$$
• Computation of Coefficient Values
  The formulas for the regression coefficients $b_0, b_1, b_2, \dots, b_p$ involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
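As a rough illustration of what the software does behind the scenes, the sketch below solves the least-squares problem with NumPy's `numpy.linalg.lstsq`. The data are fabricated for the demonstration (an assumption, not the Chic-Chicken survey): y is built exactly as 1 + 2·x1 + 3·x2 with no error term, so the estimates should recover those three numbers.

```python
import numpy as np

# Hypothetical, noise-free data so the true coefficients are known.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for the intercept b0, then x1 and x2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares criterion: choose b to minimise sum((y - X b)^2).
coeffs, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs

y_hat = X @ coeffs
sse = float(np.sum((y - y_hat) ** 2))

print(np.round(coeffs, 4), round(sse, 6))
```

With real survey data the fit would not be exact, and SSE would be positive; the same call still returns the coefficients that minimise it.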

Least Squares Method

• A Note on Interpretation of Coefficients
  $b_i$ represents an estimate of the change in y corresponding to a one-unit change in $x_i$ when all other independent variables are held constant.

• Carry out the same process taking X1 and X3 as independent variables, and then X1, X2 and X3 as independent variables.

Segmenting the market/grouping sales employees, estimating size of the market/group and profiling the segments/groups
Cluster Analysis

Cluster Analysis (classification analysis, numerical taxonomy):
a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters based on the set of variables considered. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters.
• objects: either variables or observations;
• likeness: calculated from the measurements for each object.

Applications:

1. market segmentation: e.g., benefit segmentation: clustering consumers on the basis of benefits sought from the purchase of a product,
2. understanding buyer behaviors: e.g., clustering consumers to identify homogeneous groups, a firm can examine the buying behavior or information seeking behavior of each group,
3. identifying new product opportunities: e.g., clustering brands and products to identify competitive sets within the market, a firm can examine its current offerings compared to those of its competitors to identify potential new product opportunities,
4. selecting test markets: e.g., clustering cities into homogeneous clusters, a firm can select comparable cities to test various marketing strategies.

Distance measures for individual observations
• To measure similarity between two observations a distance measure is needed
• With a single variable, similarity is straightforward
  - Example: income – two individuals are similar if their income level is similar and the level of dissimilarity increases as the income gap increases
• Multiple variables require an aggregate distance measure
  - With many characteristics (e.g. income, age, consumption habits, brand loyalty, purchase frequency, family composition, education level, ...), it becomes more difficult to define similarity with a single value
• The best-known measure of distance is the Euclidean distance, which is the concept we use in everyday life for spatial coordinates.

Model:
Data: each object is characterized by a set of numbers (measurements);
e.g., object 1: (x11, x12, … , x1n)
      object 2: (x21, x22, … , x2n)
      :
      object p: (xp1, xp2, … , xpn)

Distance: Euclidean distance, $d_{ij}$,
$$d_{ij} = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \dots + (x_{in} - x_{jn})^2}$$

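Three Cluster Diagram Showing Between-Cluster and Within-Cluster Variation
• Between-Cluster Variation = Maximize
• Within-Cluster Variation = Minimize

Example

Household   Income   Household Size
A           50K      5
B           50K      4
C           20K      2
D           20K      1

[Figure: the four households plotted with income (unit: 10K) on the horizontal axis and household size on the vertical axis]
Distances marked on the plot: 1 (A to B), 1 (C to D), $4.24 = \sqrt{3^2 + 3^2}$ (e.g. A to C) and $3.61 = \sqrt{2^2 + 3^2}$ (B to C).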
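The distances in the household example can be reproduced with a short sketch. It assumes, as the plot's axis note suggests, that income is expressed in units of 10K (so 50K becomes 5); the dictionary and function names are illustrative.

```python
import math

# (income in units of 10K, household size) for each household.
households = {"A": (5, 5), "B": (5, 4), "C": (2, 2), "D": (2, 1)}

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

names = sorted(households)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = euclidean(households[first], households[second])
        print(first, second, round(d, 2))
```

The result depends on the units chosen: measuring income in single currency units instead of 10K would let income dominate the distance completely, which is why variables on very different scales are often standardised before clustering.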


[Figure: Scatter Diagram for Cluster Observations. Horizontal axis: frequency of going to fast food restaurants (low to high); vertical axis: frequency of eating out (low to high)]

Clustering procedures
• Hierarchical procedures
  - Agglomerative (start from n clusters to get to 1 cluster)
  - Divisive (start from 1 cluster to get to n clusters)
• Non hierarchical procedures
  - K-means clustering

Hierarchical clustering
• Agglomerative:
  - Each of the n observations constitutes a separate cluster
  - The two clusters that are more similar according to some distance rule are aggregated, so that in step 1 there are n-1 clusters
  - In the second step another cluster is formed (n-2 clusters), by nesting the two clusters that are more similar, and so on
  - There is a merging in each step until all observations end up in a single cluster in the final step.
• Divisive:
  - All observations are initially assumed to belong to a single cluster
  - The most dissimilar observation(s) is extracted to form a separate cluster
  - In step 1 there will be 2 clusters, in the second step three clusters and so on, until the final step will produce as many clusters as the number of observations. This technique is used in medical research and is not in the scope of our course.
• The number of clusters determines the stopping rule for the algorithms
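The agglomerative steps above can be sketched in a few lines. This toy version uses single linkage (distance between the closest members of two clusters) as its distance rule, which is an assumption made for brevity; it is not Ward's method, and statistical packages offer several other linkage rules. The four points are the households from the distance example, with income in units of 10K.

```python
import math

points = {"A": (5, 5), "B": (5, 4), "C": (2, 2), "D": (2, 1)}

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_link(c1, c2):
    """Distance between two clusters = distance of their closest members."""
    return min(euclidean(points[a], points[b]) for a in c1 for b in c2)

def agglomerate(names):
    """Merge the two closest clusters repeatedly until one remains."""
    clusters = [frozenset([name]) for name in names]  # n singleton clusters
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] | clusters[j]
        history.append((sorted(merged), round(d, 2)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history

history = agglomerate(["A", "B", "C", "D"])
for members, distance in history:
    print(members, distance)
```

The recorded merge distances play the role of an agglomeration schedule: a large jump in distance (here from 1.0 to 3.61) suggests stopping before that merge, which would leave two clusters.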

Non-hierarchical clustering
• These algorithms do not follow a hierarchy and produce a single partition
• Knowledge of the number of clusters (c) is required
• In the first step, initial cluster centres (the seeds) are determined for each of the c clusters.
• Each iteration allocates observations to each of the c clusters, based on their distance from the cluster centres
• Cluster centres are computed again and observations may be reallocated to the nearest cluster in the next iteration
• When no observations can be reallocated or a stopping rule is met, the process stops

Outliers
• An outlier will affect your cluster solution if you don’t remove it!
• Removing it will also affect your cluster solution! (small sample size)
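The iteration described above is essentially the k-means procedure. Below is a minimal sketch under stated assumptions: the seeds are supplied by hand, the stopping rule is "no observation changes cluster" (with a hypothetical `max_iter` safety cap), and the data are the four households from the distance example with income in units of 10K.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(points, seeds, max_iter=100):
    """Allocate points to the nearest centre, recompute centres, repeat."""
    centres = list(seeds)
    labels = None
    for _ in range(max_iter):
        # Allocation step: index of the nearest centre for every point.
        new_labels = [
            min(range(len(centres)), key=lambda k: euclidean(p, centres[k]))
            for p in points
        ]
        if new_labels == labels:  # nothing was reallocated: stop
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres

points = [(5, 5), (5, 4), (2, 2), (2, 1)]  # households A, B, C, D
labels, centres = k_means(points, seeds=[(5, 5), (2, 2)])
print(labels, centres)
```

Different seeds can lead to a different final partition, which is one reason the choice of initial seeds matters for non-hierarchical methods.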

How many clusters?
There are no hard and fast rules; consider:
a. theoretical, conceptual, or practical considerations;
b. the distances at which clusters are combined in a hierarchical clustering;
c. the relative size of the clusters should be meaningful, etc.

Cluster Analysis – Variable Selection
• Variables are typically measured metrically, but the technique can be applied to non-metric variables with caution.
• Variables are logically related to a single underlying concept or construct.

Using SPSS to Identify Clusters

For this example we are looking for subgroups among all the 63 employees of Chic-Chicken restaurant using the “organizational commitment” variables. The SPSS click through sequence is: Analyze → Classify → Hierarchical Cluster. This will take you to a dialog box where you select and move variables X13, X14 and X15 into the “Variables” box. Next you go to the Statistics box, where the agglomeration schedule is selected as the default option. Cluster membership ‘none’ is selected as default. We shall continue with the default option here. Next click on the ‘Plots’ box. Check Dendrogram and, in the Icicle window, click on the None button. Then click Continue. Next click on the Method box and select Ward’s under Cluster Method (it is the last option). Squared Euclidean Distance is the default under Measure and we will use it, and we do not need to standardize this data. We will not select anything on the Save option now. Now click on “OK” to run the program.

Hierarchical vs. non-hierarchical methods

Hierarchical methods
• No knowledge about the number of clusters required
• Outliers can be easily identified
• Can be very slow
• At each step they require computation of the full proximity matrix

Non-hierarchical methods
• Faster, more reliable, works with large data sets
• Need to specify the number of clusters
• Need to set the initial seeds
• Only cluster distances to seeds need to be computed in each iteration

Kiran’s Dilemma
• Group employees based on their commitment to Chic-Chicken
• Identify the profile of employees in each group
• Recommend a suitable intervention to the HR head

• Thank you
