
MKT3600 - L09 - Correlation and Regression

This document covers key concepts in marketing research, specifically focusing on hypothesis testing, correlation, and regression analysis. It outlines the steps for hypothesis testing, including formulating hypotheses, selecting significance levels, calculating test statistics, and making decisions based on critical values. Additionally, it introduces linear regression models and their components, emphasizing their application in analyzing relationships between variables.

Uploaded by

sweetpotatokw

MKT 3600: Marketing Research

Lecture 9: Correlation and Regression

Zhuping Liu
Announcements
• Article 7 discussion
• Assignment 5
– available on Blackboard
– due on May 17th
• Final Exam Review on May 10th
• Final Project Presentation on May 17th
• Final Exam:
– EMA MAY 19 W on Blackboard
– FMA MAY 24 M on Blackboard
What We Have Learned
Hypothesis testing

• Chi-Square Test

• Hypothesis testing about single mean (t-test)
Hypothesis Testing: Steps
1. Formulate Hypotheses
2. Select significance level (usually
0.05)
3. Select appropriate formula and
calculate test statistic
– Compare what we observe from data
with what we expect under H0
4. Calculate degrees of freedom
5. Obtain critical value from table
6. Make decision regarding H0
Example
Question: Is the Sample Representative of the Population in age?

Age Groups   Observed (N=210)   Population
10-20        35                 12%
21-30        125                18%
31-40        27                 28%
41+          23                 42%
Hypothesis Testing: Steps
• Step 1: Formulate Hypotheses
H0: no difference between sample distribution and population distribution
Ha: there is a difference
• Step 2: Select significance level (0.05)
• Step 3: Select appropriate formula and calculate test statistic
\chi^2 = \sum_{g=1}^{G} \frac{(Obs_g - Exp_g)^2}{Exp_g}
Hypothesis Testing: Step 3
\chi^2 = \sum_{g=1}^{G} \frac{(Obs_g - Exp_g)^2}{Exp_g}
G = Total number of groups

Age Groups   Observed (N=210)   Population   Expected
10-20        35                 12%          210*12/100
21-30        125                18%          210*18/100
31-40        27                 28%          210*28/100
41+          23                 42%          210*42/100


Step 3
\chi^2 = \sum_{g=1}^{G} \frac{(Obs_g - Exp_g)^2}{Exp_g}
G = Total number of groups

Age Groups   Observed (N=210)   Expected   Test Statistic
10-20        35                 25.2       (35 - 25.2)^2 / 25.2 ≈ 4
21-30        125                37.8       (125 - 37.8)^2 / 37.8 ≈ 201
31-40        27                 58.8       (27 - 58.8)^2 / 58.8 ≈ 17
41+          23                 88.2       (23 - 88.2)^2 / 88.2 ≈ 48

\chi^2 = 4 + 201 + 17 + 48 = 270
Hypothesis Testing: Steps
• Step 4: Calculate degrees of freedom
df = degrees of freedom = number of groups - 1 = 4 - 1 = 3
• Step 5: Obtain critical value from table
\chi^2_{Critical}(0.05, 3) = 7.81
• Step 6: Make decision regarding H0
\chi^2 = 270 > 7.81 = \chi^2_{Critical}

We reject H0 that the sample is representative of the population in age
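The six steps above can be sketched in a few lines of Python. This is only an illustrative sketch using the age-group counts from the example; the function name chi_square_gof is made up for this illustration, and the critical value 7.81 is copied from the table rather than computed. Without rounding each term, the statistic comes out at about 270.4, in line with the 270 on the slide.

```python
# Chi-square test: is the sample representative of the population in age?
observed = [35, 125, 27, 23]                 # sample counts per age group (N = 210)
population_share = [0.12, 0.18, 0.28, 0.42]  # population proportions per age group


def chi_square_gof(observed_counts, shares):
    """Return expected counts and the chi-square statistic (hypothetical helper)."""
    n = sum(observed_counts)
    expected_counts = [n * share for share in shares]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs, exp in zip(observed_counts, expected_counts)
    )
    return expected_counts, statistic


expected, chi2 = chi_square_gof(observed, population_share)
df = len(observed) - 1      # number of groups - 1
critical = 7.81             # chi-square critical value for alpha = 0.05, df = 3 (from the table)
reject_h0 = chi2 > critical

print("expected counts:", [round(e, 1) for e in expected])
print("chi-square:", round(chi2, 2), "df:", df, "reject H0:", reject_h0)
```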


Significance Level
df 0.1 0.05 0.025 0.01 0.005
1 2.7 3.8 5.0 6.6 7.9
2 4.6 6.0 7.4 9.2 10.6
3 6.3 7.8 9.3 11.3 12.8
4 7.8 9.5 11.1 13.3 14.9
5 9.2 11.1 12.8 15.1 16.8
6 10.6 12.6 14.4 16.8 18.5
7 12.0 14.1 16.0 18.5 20.3
8 13.4 15.5 17.5 20.1 22.0
9 14.7 16.9 19.0 21.7 23.6
10 16.0 18.3 20.5 23.2 25.2
11 17.3 19.7 21.9 24.7 26.8
12 18.5 21.0 23.3 26.2 28.3
13 19.8 22.4 24.7 27.7 29.8
14 21.1 23.7 26.1 29.1 31.3
15 22.3 25.0 27.5 30.6 32.8
16 23.5 26.3 28.8 32.0 34.3
17 24.8 27.6 30.2 33.4 35.7
18 26.0 28.9 31.5 34.8 37.2
19 27.2 30.1 32.9 36.2 38.6
Testing Relationships in Cross Tabs
• Question: is there an association between income and journal choice?

                      Low Income   High Income   Total
Wall Street Journal   83           180           263
USA Today             276          41            317
Total                 359          221           580


Hypothesis Testing: Steps
• Step 1: Formulate Hypotheses
H0: no association between income and journal choice
Ha: there is an association
• Step 2: Select significance level (0.05)
• Step 3: Select appropriate formula and calculate test statistic
\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}
Expected Outcome if No Association (assuming H0 is true)

e.g., Wall Street Journal, High Income: 580 * (263/580) * (221/580) ≈ 100

                      Low Income            High Income           Total
Wall Street Journal   263*359/580 ≈ 163     221*263/580 ≈ 100     263
USA Today             317*359/580 ≈ 196     221*317/580 ≈ 121     317
Total                 359                   221                   580
\chi^2-test for Association
Observed
                      Low Income   High Income   Total
Wall Street Journal   83           180           263
USA Today             276          41            317
Total                 359          221           580

Expected if no association
                      Low Income   High Income   Total
Wall Street Journal   163          100           263
USA Today             196          121           317
Total                 359          221           580

• \chi^2 = (83-163)^2/163 + (180-100)^2/100 + (276-196)^2/196 + (41-121)^2/121 = 188.8
Hypothesis Testing: Steps
• Step 4: Calculate degrees of freedom
df = degrees of freedom = (r-1)*(c-1) = (2-1)(2-1) = 1
• Step 5: Obtain critical value from table
\chi^2_{Critical}(0.05, df = 1) = 3.84
• Step 6: Make decision regarding H0
\chi^2 = 188.8 > 3.84 = \chi^2_{Critical}
We reject H0 that there is no association
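The cross-tab test can be sketched the same way. The snippet below is an illustration, not the lecture's own calculation: it builds the expected counts from the row and column totals without rounding, so the statistic comes out at about 187.8 instead of the 188.8 obtained with expected counts rounded to whole numbers. The conclusion is unchanged. The critical value 3.84 is copied from the table.

```python
# Chi-square test of association between income and journal choice
observed = [
    [83, 180],   # Wall Street Journal: low income, high income
    [276, 41],   # USA Today: low income, high income
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under H0 (no association): row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical = 3.84             # chi-square critical value for alpha = 0.05, df = 1 (from the table)
reject_h0 = chi2 > critical

print("expected:", [[round(e, 1) for e in row] for row in expected])
print("chi-square:", round(chi2, 1), "df:", df, "reject H0:", reject_h0)
```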


Testing Hypothesis about a
Single Mean
Question: Do people think that the quality of
food at our restaurant is above average (5)?
Data: Average rating from 100 respondents
is 6.5
• Step 1: Formulate Hypotheses
– H0: Ratingfoodquality = 5
– Ha: There is a difference.
• Two-sided
– The average rating of food quality is not 5.
Ha: ratingfoodquality ≠ 5
• One-sided
– The average rating of food quality is higher than
5.
Ha: ratingfoodquality > 5
• Step 2: Select significance level (0.05)
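Step 3: Computing t-statistic
H0: \mu_x = k
Ha: \mu_x \neq k  or  Ha: \mu_x > k

t-statistic: t = \frac{\bar{x} - k}{s_{\bar{x}}}
s_{\bar{x}}: standard error, s_{\bar{x}} = \sqrt{s_x^2 / n}
s_x^2: sample variance
\bar{x}: sample mean

\bar{x} = 6.5,  s_x^2 = 4,  n = 100

t = \frac{6.5 - 5}{\sqrt{4/100}} = \frac{1.5}{0.2} = 7.5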
t-critical
• Step 4: calculate the degrees of freedom
df = degrees of freedom = total sample size - 1 = n-1 = 100-1 = 99

• Step 5: Obtain critical value from table
– For two-sided test: Ha: rating_foodquality ≠ 5
• t_critical = t_{α/2, n-1}
• For large n, t_{.025} = 1.96, so t-critical = 1.96
– For one-sided test: Ha: rating_foodquality > 5
• t_critical = t_{α, n-1}
• For large n, t_{.05} = 1.65, so t-critical = 1.65
Hypothesis Testing: Step 6

• Step 6: Make decision regarding H0

For two-sided test: Ha: rating_foodquality ≠ 5
- Reject null if |t| > t critical
- Fail to reject null if |t| < t critical
|t| = 7.5 > t-critical = 1.96, so Reject H0

For one-sided test: Ha: rating_foodquality > 5
- Reject null if t > t critical
- Fail to reject null if t < t critical
t = 7.5 > t-critical = 1.65, so Reject H0
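The single-mean test can be reproduced with a short sketch. The inputs (mean 6.5, variance 4, n = 100, benchmark 5) and the large-sample critical values 1.96 and 1.65 are taken from the slides; the function name one_sample_t is hypothetical.

```python
import math


def one_sample_t(sample_mean, sample_variance, n, k):
    """t = (x_bar - k) / s_xbar, with s_xbar = sqrt(s_x^2 / n) (hypothetical helper)."""
    standard_error = math.sqrt(sample_variance / n)
    return (sample_mean - k) / standard_error


t = one_sample_t(sample_mean=6.5, sample_variance=4.0, n=100, k=5.0)
df = 100 - 1

# Large-sample critical values quoted on the slides
reject_two_sided = abs(t) > 1.96   # Ha: rating != 5
reject_one_sided = t > 1.65        # Ha: rating > 5

print("t:", round(t, 2), "df:", df)
print("two-sided reject H0:", reject_two_sided)
print("one-sided reject H0:", reject_one_sided)
```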
Type I and Type II errors
wasted resources & missed opportunities

                    True State of the Null Hypothesis
Decision            Ad has no effect                      Ad has an effect
Select ad           false positive (wasted resources)     Correct
Do not select ad    Correct                               missed opportunity

                    True State of the Null Hypothesis
Decision            H0 True         H0 False
Reject H0           Type I error    Correct
Do not Reject H0    Correct         Type II error
Today
• Regression Analysis
– Regression Analysis in Excel: http://www.excel-easy.com/examples/regression.html
– Instructions on how to install the data analysis tool are available on assignment 5
Linear Regression Model

Elements of a linear model

y = a + bx + \varepsilon

y: Dependent Variable
a: Intercept
b: Slope
x: Independent Variable
\varepsilon: Random Error
Linear regression
y = a + bx + \varepsilon

observed
y: dependent variable
variable related to various other variables, e.g., sales, preference
x: independent variable
variables that influence the value of the dependent variable, e.g., prices, promotions, etc.

unobserved
b: regression coefficient (slope)
The effect. Measures the change of y as x increases by one unit (holding other factors constant). Also referred to as the marginal effect
a: intercept
value of y when x = 0
\varepsilon: random error
unobserved errors, e.g., measurement error, missing variables
Linear Regression Model

[Figure: the line of means E(y) = a + bx, plotted with y against x]
b = Slope = Change in y / Change in x
a = y-intercept
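Linear Regression Model

[Figure: observed values scattered around the fitted line, y against x]
Observed value: y_i = \hat{a} + \hat{b} x_i + \varepsilon_i
\varepsilon_i = Random error
Fitted line: \hat{y}_i = \hat{a} + \hat{b} x_i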
What is the “Best”
Regression Model?
• How would you draw a line through the points?
• How do you determine which line ‘fits best’?

[Figure: scatter plot of y against x, both axes running from 0 to 60]

• ‘Best fit’ means the differences between actual y values and predicted y values are a minimum (least squares)
• So minimize SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \varepsilon_i^2
• SSE: sum of squared errors
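Least Squares Illustration
Least squares minimizes \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon_1^2 + \varepsilon_2^2 + \varepsilon_3^2 + \varepsilon_4^2

[Figure: four observed points with errors \varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4 measured from the fitted line \hat{y}_i = \hat{a} + \hat{b} x_i; for example, y_2 = \hat{a} + \hat{b} x_2 + \varepsilon_2]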
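To make the least squares idea concrete, here is a minimal sketch for one independent variable. The four data points are hypothetical, and the helper names fit_line and sse are invented for this illustration. It uses the standard closed-form solution (slope = covariance of x and y divided by variance of x) and then checks that shifting the fitted line only increases the SSE.

```python
def fit_line(x, y):
    """Least-squares estimates (a_hat, b_hat) for y = a + b*x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def sse(x, y, a, b):
    """Sum of squared errors between observed y and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, not from the lecture
x = [10, 20, 30, 40]
y = [15, 22, 38, 41]

a_hat, b_hat = fit_line(x, y)
sse_best = sse(x, y, a_hat, b_hat)
sse_shifted = sse(x, y, a_hat + 1, b_hat)   # any other line has a larger SSE

print("a_hat:", round(a_hat, 2), "b_hat:", round(b_hat, 2))
print("SSE at the fitted line:", round(sse_best, 2))
print("SSE after shifting the intercept by 1:", round(sse_shifted, 2))
```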
Interpretation of
Regression Coefficients
Impact of Advertising on Yogurt Sales:

• Slope (b)
– Yogurt sales are expected to increase by 0.1 units for each $1 increase in advertising (x)
• Intercept (a)
– Average yogurt sales are expected to be 100 units when there is no advertising (x = 0)
Linear Regression: Assessing Fit
Assess fit: how well does the regression line fit the data points?
R^2: amount of variance of Y explained through the regression
0 ≤ R^2 ≤ 1

[Figure: two scatter plots of Y against X, one with low R^2 and one with high R^2]
Linear regression: Prediction
Once we know a and b’s, we can predict Y for any value of X’s

How?
\hat{Y} = \hat{a} + \hat{b}_1 X_1 + \hat{b}_2 X_2 + … + \hat{b}_K X_K

‘what if’ analyses
e.g., what will be sales (Y) when we set prices to $X
Hypotheses Testing
Y_i = a + b_1 X_{i1} + b_2 X_{i2} + … + b_K X_{iK} + e_i

Is there a statistically significant effect of an independent variable, X_k (say price), on the dependent variable Y (say sales)?

Null hypothesis: H0: b_k = 0

use t statistic: t_{n-K-1} = b_k / S_{b_k}, where S_{b_k}^2 is the variance of b_k

with n-K-1 degrees of freedom

compute p-value based on t statistic

if p small, say < 0.05: reject H0, significant effect
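As a sketch of the coefficient test, the snippet below computes the t statistic for the PriceTrop coefficient, using the coefficient and standard error reported in this lecture's Tropicana regression output (n = 116 observations, K = 5 independent variables). The function name is hypothetical. Instead of a p-value it uses the large-sample two-sided critical value 1.96 from the earlier slides as an approximation; the value for 110 degrees of freedom is slightly larger (about 1.98), which does not change the decision here.

```python
def coefficient_t_stat(coefficient, standard_error):
    """t = b_k / S_bk, where S_bk is the standard error of b_k (hypothetical helper)."""
    return coefficient / standard_error


n_observations = 116
k_variables = 5
df = n_observations - k_variables - 1   # n - K - 1

# PriceTrop coefficient and standard error from the lecture's regression output
t_price_trop = coefficient_t_stat(-21274.83138, 2606.694783)

# Approximate two-sided decision at alpha = 0.05 (large-sample critical value)
significant = abs(t_price_trop) > 1.96

print("t:", round(t_price_trop, 3), "df:", df, "significant:", significant)
```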


Application of Regression:
Tropicana Orange Juice Pricing
Data
• Weekly sales data of Tropicana orange juice in
Dominick’s stores
• Data Description:
– WEEK Week number
– SalesTrop Units Sales of Tropicana Orange Juice
(cartons)
– PriceTrop Price of Tropicana Orange Juice
– PriceMM Price of Minute Maid Orange Juice
– PriceDom Price of Dominick’s Orange Juice
– Feature Dummy variable indicating that Tropicana was
featured in weekly brochure
– Display Dummy Variable indicating that Tropicana had
an In-store display, bonus-tags
Step 1: Model
• Estimate the following model

SalesTrop = Intercept + a*PriceTrop + b*PriceMM + c*PriceDom + d*Feature + e*Display + \varepsilon
Step 2 : “What Ifs”
• If price of Tropicana were to increase
by $1 what would happen to the unit
sales of Tropicana?
• If the price of Minute Maid were to
increase by $1 what would happen to
the unit sales of Tropicana?
• If the price of store brand were to
increase by $1 what would happen to
the unit sales of Tropicana?
Step 2 : “What Ifs”
• When there is a Feature for
Tropicana, what is the impact on unit
sales of Tropicana?

• When there is a Display for Tropicana, what is the impact on unit sales of Tropicana?
Interpreting Regression Output
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.818424409
R Square            0.669818514
Adjusted R Square   0.654810265
Standard Error      11811.44376
Observations        116

ANOVA
             df    SS            MS            F
Regression   5     31131717954   6226343591    44.63002297
Residual     110   15346122395   139510203.6
Total        115   46477840349

            Coefficients    Standard Error   t Stat         P-value
Intercept   54434.01686     10151.90119      5.36195298     4.59178E-07
PriceTrop   -21274.83138    2606.694783      -8.161611987   5.97451E-13
PriceMM     8796.880907     2830.093242      3.108336071    0.002395097
PriceDom    898.2071683     3047.525898      0.294733236    0.768753189
Feature     938.3844498     2644.852137      0.354796564    0.723421321
Display     19576.16684     3195.870786      6.125456299    1.43361E-08

• How good is the fit?
• What is an intercept?
• What does a negative coefficient imply?
• What does the positive coefficient of “Display” mean?
• Is it significant? NO!
• What kind of variable is “Feature”?
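The fit statistics in this output can be reproduced from the ANOVA sums of squares. The sketch below uses the standard definitions (R Square = SS regression / SS total, with the usual adjusted R Square, F, and standard error formulas). The variable names are chosen for this illustration; the numbers are copied from the output.

```python
import math

# Values copied from the ANOVA block of the regression output
ss_regression = 31131717954
ss_residual = 15346122395
ss_total = 46477840349
n = 116          # observations
k = 5            # independent variables

r_square = ss_regression / ss_total
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

ms_regression = ss_regression / k            # mean square = SS / df
ms_residual = ss_residual / (n - k - 1)
f_statistic = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)      # "Standard Error" of the regression

print("R Square:", round(r_square, 6))
print("Adjusted R Square:", round(adjusted_r_square, 6))
print("F:", round(f_statistic, 3))
print("Standard Error:", round(standard_error, 2))
```

About 67% of the variance in weekly Tropicana sales is explained by the five variables, which is one way to answer "How good is the fit?".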


Use of Dummy Variables
• To capture the effect of categorical
variables
– Brands, In-store displays, Gender
• Dummy variable estimates indicate the impact of the category on the dependent variable
• Dummy variable has a value of 0 or 1
– 1 indicates presence of characteristic
– 0 indicates absence of characteristic
Coding Dummy Variables
• If a category can either be present or
absent, then code:
– Presence as 1
– Absence as 0
– Example: Presence of “In Store Display”
• If a category can be of two “types”:
– Code one of the categories as 1
– Code the other as 0
– Example: Male/ Female; Cash/ Credit
Example
Effect of presence of an in-store display (X) on brand sales

dummy coding: use one or more 0/1 variables as independent variables

D_i = 1 : if brand is on display
D_i = 0 : if brand is not on display

Y_i = a + b D_i = a + b : if brand is on display
Y_i = a + b D_i = a     : otherwise

so b is the effect of display in this example
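A small sketch shows why b is the display effect. With a single 0/1 dummy, the least-squares intercept equals the average sales without a display, and the slope equals the difference between the two group averages. The six sales figures are hypothetical, and fit_line is a helper written for this illustration.

```python
def fit_line(x, y):
    """Least-squares estimates (a_hat, b_hat) for y = a + b*x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = s_xy / s_xx
    return y_bar - b_hat * x_bar, b_hat


# Hypothetical weekly data: D = 1 if the brand is on display, 0 otherwise
display = [0, 0, 0, 1, 1, 1]
sales = [100, 110, 90, 150, 160, 140]

a_hat, b_hat = fit_line(display, sales)

mean_no_display = sum(s for d, s in zip(display, sales) if d == 0) / display.count(0)
mean_display = sum(s for d, s in zip(display, sales) if d == 1) / display.count(1)

print("a_hat (sales without display):", a_hat)
print("b_hat (effect of display):", b_hat)
print("difference in group means:", mean_display - mean_no_display)
```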

Non-Linear Effects
Likelihood of Purchasing Candy Bar = 1.1 + 3 * Sweetness
So should we keep adding sugar?
What if more is not better?
Y_i = a + b_1 X_i + b_2 X_i^2 + e_i

[Figure: purchase likelihood of candy bar (Y) against sweetness (X), showing two curves: one with b_1 > 0, b_2 < 0 and one with b_1 < 0, b_2 > 0]
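With a squared term, the curve turns where the slope b_1 + 2*b_2*X equals zero, that is at X* = -b_1 / (2*b_2). The sketch below illustrates the b_1 > 0, b_2 < 0 case. The intercept 1.1 and b_1 = 3 echo the candy bar example, but b_2 = -0.25 is a made-up value for illustration only, and the function names are hypothetical.

```python
def quadratic_response(x, a, b1, b2):
    """Y = a + b1*X + b2*X^2 (error term omitted)."""
    return a + b1 * x + b2 * x ** 2


def turning_point(b1, b2):
    """X where the slope b1 + 2*b2*X equals zero."""
    return -b1 / (2 * b2)


a, b1 = 1.1, 3.0     # echo the linear candy bar example
b2 = -0.25           # hypothetical: more sweetness eventually hurts

best_sweetness = turning_point(b1, b2)
peak_likelihood = quadratic_response(best_sweetness, a, b1, b2)

print("sweetness at the peak:", best_sweetness)
print("purchase likelihood at the peak:", round(peak_likelihood, 2))
print("one unit beyond the peak:", round(quadratic_response(best_sweetness + 1, a, b1, b2), 2))
```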
The log-log sales
Response Model
• The log-log sales response model is the single
most useful tool in analyzing the competitive
structure of retail markets
log(sales in period t) = β0 + β1*log(own price in period t) + β2*log(competitor price in period t) + εt

• This model typically fits the data much better than the linear model
• Coefficients on log(prices) may be interpreted as price elasticities
Log-Log Sales Model
Log(Sales in period t) = a + b_own * Log(Own Price in period t) + b_cross * Log(Other Good Price in period t) + b_advert * Log(Advertising) + b_display * Display

• Interpretation of Coefficients:
– Coefficient on ln x = % change in Y,
when x increases by 1%
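The elasticity reading can be checked numerically. In a log-log model, a p% change in x multiplies Y by (1 + p/100)^b, which for small changes is close to b*p percent. The sketch below uses the Tropicana and Minute Maid price coefficients reported in this lecture's log-log output; the function name is hypothetical.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact % change in Y implied by a log-log coefficient (hypothetical helper)."""
    return ((1 + pct_change_in_x / 100) ** elasticity - 1) * 100


# Coefficients on Ln(PriceTrop) and Ln(PriceMM) from the lecture's log-log output
own_price_effect = pct_change_in_y(-2.51154, 1.0)
cross_price_effect = pct_change_in_y(0.553096, 1.0)

print("1% rise in Tropicana price -> % change in sales:", round(own_price_effect, 2))
print("1% rise in Minute Maid price -> % change in sales:", round(cross_price_effect, 2))
```

The exact figures (about -2.47% and +0.55%) are close to the coefficients themselves, which is why the coefficients are read directly as elasticities.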
Running the Model

Log(SalesTrop) = Intercept + a*Log(PriceTrop) + b*Log(PriceMM) + c*Log(PriceDom) + d*Feature + e*Display
Output of the Log-Log Model
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.89144
R Square            0.794666
Adjusted R Square   0.785333
Standard Error      0.348969
Observations        116

ANOVA
             df    SS         MS         F
Regression   5     51.84306   10.36861   85.1425
Residual     110   13.39575   0.12178
Total        115   65.23881

                Coefficients   Standard Error   t Stat     P-value
Intercept       11.56145       0.273605         42.25597   7.83E-70
Ln(PriceTrop)   -2.51154       0.203726         -12.328    1.85E-22
Ln(PriceMM)     0.553096       0.182128         3.036851   0.002986
Ln(PriceDom)    -0.04492       0.196747         -0.2283    0.819838
Feature         0.065482       0.078316         0.836129   0.404895
Display         0.632155       0.094394         6.697001   9.37E-10

Is it a better model compared to a simple linear model? Check R-Square
Linear Regression Output
SUMMARY OUTPUT
How good is the fit?

Regression Statistics
Multiple R          0.8184244
R Square            0.6698185
Adjusted R Square   0.6548103
Standard Error      11811.444
Observations        116

ANOVA
             df    SS            MS          F
Regression   5     31131717954   6.226E+09   44.630023
Residual     110   15346122395   139510204
Total        115   46477840349

            Coefficients   Standard Error   t Stat      P-value
Intercept   54434.017      10151.90119      5.361953    4.592E-07
PriceTrop   -21274.831     2606.694783      -8.161612   5.975E-13
PriceMM     8796.8809      2830.093242      3.1083361   0.0023951
PriceDom    898.20717      3047.525898      0.2947332   0.7687532
Feature     938.38445      2644.852137      0.3547966   0.7234213
Display     19576.167      3195.870786      6.1254563   1.434E-08
Interpreting the Ln(Price)
Coefficients
• When the price of Tropicana
increases by 1%, what is the impact
on sales for Tropicana?
– DECREASE 2.5%
• Is this effect statistically significant?
– YES
Interpreting the Ln(Price)
Coefficients
• When the price of Minute Maid
increases by 1%, what is the impact
on sales for Tropicana?
– INCREASE .55%
• Is this effect statistically significant?
– YES

• When the price of Dominick’s increases by 1%, what is the impact on sales for Tropicana?
– DECREASE 0.05%
• Is this effect statistically significant?
– NO
Prediction
• Compute the predicted sales when:
– PriceTrop = 3.99
– PriceMM= 2.85
– PriceDom= 2.99
– Feature = 0
– Display=0
• Answer: 5519
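The slide does not say which model produces this answer. Plugging the scenario into the log-log coefficients from the output above and converting back from logs gives about 5519.7 cartons, which matches, so the sketch below assumes the log-log model. For comparison, the same scenario in the linear model gives a negative sales figure. Variable names are chosen for this illustration.

```python
import math

# Scenario from the slide
price_trop, price_mm, price_dom = 3.99, 2.85, 2.99
feature, display = 0, 0

# Log-log coefficients from the lecture's output (assumed to be the model behind the answer)
log_sales = (
    11.56145
    - 2.51154 * math.log(price_trop)
    + 0.553096 * math.log(price_mm)
    - 0.04492 * math.log(price_dom)
    + 0.065482 * feature
    + 0.632155 * display
)
predicted_sales = math.exp(log_sales)   # convert Log(SalesTrop) back to cartons

# Same scenario with the linear-model coefficients, for comparison
linear_prediction = (
    54434.01686
    - 21274.83138 * price_trop
    + 8796.880907 * price_mm
    + 898.2071683 * price_dom
    + 938.3844498 * feature
    + 19576.16684 * display
)

print("log-log prediction (cartons):", round(predicted_sales, 1))
print("linear prediction (cartons):", round(linear_prediction, 1))
```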
Preparations

• OL:
– Work on final project presentation

• May 10th
– Final Exam Review
