MKT3600 - L09 - Correlation and Regression
Research
Zhuping Liu
Announcements
• Article 7 discussion
• Assignment 5
– available on Blackboard
– due on May 17th
• Final Exam Review on May 10th
• Final Project Presentation on May 17th
• Final Exam:
– EMA MAY 19 W on Blackboard
– FMA MAY 24 M on Blackboard
What We Have Learned
Hypothesis testing
• Chi-Square Test
Age Groups   Observed (N=210)   Population
10-20        35                 12%
21-30        125                18%
31-40        27                 28%
41+          23                 42%
Hypothesis Testing:
Steps
• Step 1: Formulate Hypotheses
H0: no difference between sample distribution and
population distribution
Ha: there is difference
• Step 2: Select significance level (0.05)
• Step 3: Select appropriate formula and
calculate test statistic
\[ \chi^2 = \sum_{g=1}^{G} \frac{(Obs_g - Exp_g)^2}{Exp_g} \]
Hypothesis Testing: Step 3
\[ \chi^2 = \sum_{g=1}^{G} \frac{(Obs_g - Exp_g)^2}{Exp_g} \]
G=Total number of groups
Age Groups   Observed (N=210)   Population   Expected
10-20        35                 12%          210*12/100
Step 3
\[ \chi^2 = \sum_{g=1}^{G} \frac{(Obs_g - Exp_g)^2}{Exp_g} \]
G=Total number of groups
\[ \chi^2 = 4 + 201 + 17 + 48 = 270 \]
Hypothesis Testing:
Steps
• Step 4: Calculate degrees of freedom
\( \chi^2_{Critical}(.05, 3) = 7.81 \)
• Step 6: Make decision regarding H0
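The chi-square steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the variable names are made up, and the 21-30 row (125 respondents, 18%) is inferred from the N=210 total and from the population shares summing to 100%. It reproduces the 201 term in the slide's sum.

```python
# Observed counts per age group (the 21-30 row is inferred, see the note above).
observed = {"10-20": 35, "21-30": 125, "31-40": 27, "41+": 23}
# Population shares per age group.
population_share = {"10-20": 0.12, "21-30": 0.18, "31-40": 0.28, "41+": 0.42}

n = sum(observed.values())  # total sample size

# Step 3: expected count = N * population share, then sum (Obs - Exp)^2 / Exp.
expected = {g: n * share for g, share in population_share.items()}
terms = {g: (observed[g] - expected[g]) ** 2 / expected[g] for g in observed}
chi_square = sum(terms.values())

# Step 4: degrees of freedom = number of groups - 1.
df = len(observed) - 1

# Critical value for alpha = 0.05 and df = 3, taken from the slide.
critical_value = 7.81

# Step 6: reject H0 when the test statistic exceeds the critical value.
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

The per-group terms round to the 4, 201, 17 and 48 shown on the slide, and the statistic is far above 7.81, so H0 is rejected.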
Wall Street Journal   83   180   263
\( H_a: \bar{x} \ne k \) or \( H_a: \bar{x} > k \)
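t-statistic: \( t = \frac{\bar{x} - k}{s_{\bar{x}}} \)
\( s^2 \): sample variance
\( s_{\bar{x}} \): standard error, \( s_{\bar{x}} = \frac{s}{\sqrt{n}} \)
\( \bar{x} \): sample mean
\( \bar{x} = 6.5 \), \( s^2 = 4 \), \( n = 100 \)
\[ t = \frac{6.5 - 5}{\sqrt{4/100}} = \frac{1.5}{0.2} = 7.5 \]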
t-critical
• Step 4: calculate the degrees of freedom
df = degrees of freedom = total sample size-1
= n-1 = 100-1 = 99
For two-sided test: Ha: ratingfoodquality ≠ 5
- Reject null if |t| > t-critical
- Fail to reject null if |t| < t-critical
|t| = 7.5 > t-critical = 1.96, so reject H0
For one-sided test: Ha: ratingfoodquality > 5
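The t-test above can be checked with a short Python sketch. The numbers (mean 6.5, variance 4, n = 100, benchmark 5, two-sided critical value 1.96) come from the slides. The one-sided critical value of 1.645 is an assumption added here (the usual large-sample 5% value) and does not appear in the slides.

```python
import math

sample_mean = 6.5        # x-bar
hypothesized_mean = 5    # k in H0
sample_variance = 4      # s^2
n = 100                  # sample size

# Standard error of the mean: sqrt(s^2 / n).
standard_error = math.sqrt(sample_variance / n)

# t-statistic: (x-bar - k) / standard error.
t_stat = (sample_mean - hypothesized_mean) / standard_error

# Step 4: degrees of freedom.
df = n - 1

T_CRIT_TWO_SIDED = 1.96   # value used on the slide
T_CRIT_ONE_SIDED = 1.645  # assumption: large-sample 5% one-sided value

# Two-sided test (Ha: rating != 5): compare |t| with the critical value.
reject_two_sided = abs(t_stat) > T_CRIT_TWO_SIDED
# One-sided test (Ha: rating > 5): compare t itself with the critical value.
reject_one_sided = t_stat > T_CRIT_ONE_SIDED

print(f"t = {t_stat:.2f}, df = {df}")
print(f"reject H0 (two-sided): {reject_two_sided}, (one-sided): {reject_one_sided}")
```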
Today
• Regression Analysis
– Regression Analysis in Excel:
http://www.excel-easy.com/examples/regression.html
\( y = a + bx + \varepsilon \)
(dependent variable = intercept + slope * independent variable + random error)
Linear regression
\( y = a + bx + \varepsilon \)
• y: dependent variable (observed)
• a: intercept
– value of y when x = 0
• b: slope
• x: independent variable (observed)
– variables that influence the value of the dependent variable
– e.g., prices, promotions, etc.
• ε: random error (unobserved)
– unobserved errors, e.g., measurement error, missing variables
Linear Regression Model
[Figure: line of means \( E(y) = a + bx \) plotted with y against x; a = y-intercept; b = slope = (change in y) / (change in x)]
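Linear Regression Model
[Figure: observed values scattered around the fitted line \( \hat{y}_i = \hat{a} + \hat{b} x_i \), with y against x; each observed value is \( y_i = \hat{a} + \hat{b} x_i + \varepsilon_i \), where \( \varepsilon_i \) is the random error]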
What is the “Best”
Regression Model?
• How would you draw a line through the points?
• How do you determine which line ‘fits best’?
[Figure: scatter plot of y against x, both axes running from 0 to 60]
• ‘Best fit’ means difference between actual y values and
predicted y values are a minimum (least squares)
• So minimize \( SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \varepsilon_i^2 \)
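[Figure: four observed points around the fitted line \( \hat{y}_i = \hat{a} + \hat{b} x_i \), with vertical distances \( \varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4 \); for example, \( y_2 = \hat{a} + \hat{b} x_2 + \varepsilon_2 \)]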
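To make the least-squares idea concrete, here is a minimal Python sketch that fits a line by minimizing SSE. It uses the standard closed-form solution for one independent variable. The data points and the function names fit_line and sse are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) that minimize SSE for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared differences between actual and predicted y."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, roughly in the 0-60 range of the scatter plot.
xs = [5, 15, 25, 35, 45, 55]
ys = [8, 22, 24, 39, 41, 58]

a_hat, b_hat = fit_line(xs, ys)
best_sse = sse(a_hat, b_hat, xs, ys)

print(f"a = {a_hat:.2f}, b = {b_hat:.2f}, SSE = {best_sse:.2f}")
# Any other line gives a larger SSE, for example one with a steeper slope:
print(f"SSE with slope + 0.1: {sse(a_hat, b_hat + 0.1, xs, ys):.2f}")
```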
Interpretation of Regression Coefficients
Impact of Advertising on Yogurt Sales:
• Slope (\( \hat{b} \))
– Yogurt sales are expected to increase
by 0.1 units for each $1 increase in
advertising (x)
• Intercept (\( \hat{a} \))
– Average yogurt sales are expected to
be 100 units when there is no
advertising (x = 0)
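The two interpretations can be read directly off a prediction function. The sketch below assumes the fitted line implied by the slide, with an intercept of 100 units and a slope of 0.1 units per advertising dollar. The function name predicted_sales is made up for illustration.

```python
INTERCEPT = 100.0  # expected yogurt sales (units) with no advertising
SLOPE = 0.1        # expected change in sales (units) per extra $1 of advertising


def predicted_sales(advertising_dollars):
    """Predicted yogurt sales for a given advertising spend."""
    return INTERCEPT + SLOPE * advertising_dollars


# Intercept: sales when advertising is zero.
print(predicted_sales(0))
# Slope: the difference in predicted sales for a $1 increase in advertising.
print(predicted_sales(501) - predicted_sales(500))
```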
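Linear Regression: Assessing Fit
Assess fit: how well does the regression line fit the data points?
\( R^2 \): proportion of the variance of Y explained through the regression
\( 0 \le R^2 \le 1 \)
[Figure: two scatter plots of Y against X, one with low \( R^2 \) and one with high \( R^2 \)]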
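A small Python sketch of how R-squared is computed, using the common definition 1 - SSE/SST. The y values and the two sets of predictions are made up for illustration and are not fitted by least squares; they only show that predictions close to the data give a high value and predictions far from the data give a low one.

```python
def r_squared(ys, predicted):
    """Share of the variance of y explained: 1 - SSE / SST."""
    y_bar = sum(ys) / len(ys)
    # SSE: squared differences between actual and predicted values.
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    # SST: squared differences between actual values and their mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


ys = [10, 20, 30, 40]             # hypothetical observed values
close_fit = [11, 19, 31, 39]      # predictions that track the data closely
poor_fit = [24, 26, 24, 26]       # predictions that barely move with the data

high_r2 = r_squared(ys, close_fit)
low_r2 = r_squared(ys, poor_fit)

print(f"high R^2 = {high_r2:.3f}, low R^2 = {low_r2:.3f}")
```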
Linear regression: Prediction
Once we know a and the b's, we can predict Y for any values of the X's.
How?
\( \hat{Y} = \hat{a} + \hat{b} X \)
\( \hat{Y} = a + b_1 X_1 + b_2 X_2 + \dots + b_K X_K \)
Null hypothesis: \( H_0: b_k = 0 \)
use t statistic: \( t_{n-K-1} = b_k / S_{b_k} \), where \( S_{b_k}^2 \) is the variance of \( b_k \)
SalesTrop = Intercept + a*PriceTrop + b*PriceMM +
c*PriceDom + d*Feature + e*Display + ε
Step 2 : “What Ifs”
• If price of Tropicana were to increase
by $1 what would happen to the unit
sales of Tropicana?
• If the price of Minute Maid were to
increase by $1 what would happen to
the unit sales of Tropicana?
• If the price of store brand were to
increase by $1 what would happen to
the unit sales of Tropicana?
Step 2 : “What Ifs”
• When there is a Feature for
Tropicana, what is the impact on unit
sales of Tropicana?
ANOVA
             df    SS            MS            F
Regression   5     31131717954   6226343591    44.63002297
Residual     110   15346122395   139510203.6
Total        115   46477840349
What does a negative coefficient imply?
Is it significant? NO!
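\[ D_i = \begin{cases} 1 & \text{if brand is on display} \\ 0 & \text{if brand is not on display} \end{cases} \]
\[ Y_i = a + b D_i = \begin{cases} a + b & \text{if brand is on display} \\ a & \text{otherwise} \end{cases} \]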
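The dummy-variable model can be sketched as a tiny Python function. The values a = 200 and b = 50 are hypothetical and serve only to show that the coefficient b is the difference in expected Y between the display and no-display cases.

```python
def expected_y(a, b, on_display):
    """Expected Y from Y = a + b*D, where D = 1 on display and 0 otherwise."""
    d = 1 if on_display else 0
    return a + b * d


a, b = 200, 50  # hypothetical intercept and display effect

with_display = expected_y(a, b, on_display=True)      # a + b
without_display = expected_y(a, b, on_display=False)  # a

print(with_display, without_display, with_display - without_display)
```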
Non-Linear Effects
Likelihood of Purchasing Candy Bar = 1.1 + 3 * Sweetness
So should we keep adding sugar?
What if more is not better? \( Y_i = a + b_1 X_i + b_2 X_i^2 + e_i \)
[Figure: purchase likelihood of candy bar (Y) against sweetness (X); one curve with \( b_1 > 0, b_2 < 0 \) and one curve with \( b_1 < 0, b_2 > 0 \)]
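With a squared term, the effect of sweetness can change direction. The sketch below is illustrative only: a = 1.1 and b1 = 3 are borrowed from the linear candy bar example, while b2 = -0.5 is a hypothetical value chosen for the b1 > 0, b2 < 0 case. For a quadratic, the curve turns at X = -b1 / (2 * b2).

```python
def purchase_likelihood(x, a, b1, b2):
    """Quadratic response: Y = a + b1*X + b2*X^2 (error term omitted)."""
    return a + b1 * x + b2 * x ** 2


def turning_point(b1, b2):
    """Value of X where the quadratic stops rising (or falling)."""
    return -b1 / (2 * b2)


a, b1, b2 = 1.1, 3.0, -0.5  # b2 is hypothetical

peak = turning_point(b1, b2)
print(f"likelihood peaks at sweetness = {peak}")
for sweetness in (2, 3, 4):
    print(sweetness, purchase_likelihood(sweetness, a, b1, b2))
```

Below the turning point more sugar raises the predicted likelihood, and beyond it more sugar lowers it.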
The log-log sales Response Model
• The log-log sales response model is the single
most useful tool in analyzing the competitive
structure of retail markets
log(sales in period t) = β0 + β1*log(own price in period t) + β2*log(competitor price in period t) + εt
• Interpretation of Coefficients:
– Coefficient on ln x = % change in Y,
when x increases by 1%
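The elasticity reading of a log-log coefficient can be checked numerically. The sketch below uses the own-price coefficient of about -2.51 reported for Ln(PriceTrop) in the log-log output in these slides. The function name is made up, and the calculation holds all other variables fixed. Under the log-log model the sales ratio equals the price ratio raised to the power of the coefficient.

```python
def pct_change_in_sales(elasticity, pct_change_in_price):
    """Exact % change in sales implied by a log-log coefficient."""
    price_ratio = 1 + pct_change_in_price / 100
    return (price_ratio ** elasticity - 1) * 100


own_price_elasticity = -2.51154  # coefficient on Ln(PriceTrop)

one_pct = pct_change_in_sales(own_price_elasticity, 1)
ten_pct = pct_change_in_sales(own_price_elasticity, 10)

# For a 1% price increase the result is close to the coefficient itself.
print(f"1% price increase  -> {one_pct:.2f}% change in sales")
# For larger changes the coefficient is only an approximation.
print(f"10% price increase -> {ten_pct:.2f}% change in sales")
```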
Running the Model
Log(SalesTrop)= Intercept +
a*Log(PriceTrop) + b*Log(PriceMM) +
c*Log(PriceDom) + d*Feature +
e*Display
Output of the Log-Log Model
SUMMARY OUTPUT
ANOVA
             df    SS         MS         F
Regression   5     51.84306   10.36861   85.1425
Residual     110   13.39575   0.12178
Total        115   65.23881
Check R-Square
               Coefficients   Standard Error   t Stat     P-value
Intercept      11.56145       0.273605         42.25597   7.83E-70
Ln(PriceTrop)  -2.51154       0.203726         -12.328    1.85E-22
Ln(PriceMM)    0.553096       0.182128         3.036851   0.002986
Ln(PriceDom)   -0.04492       0.196747         -0.2283    0.819838
Feature        0.065482       0.078316         0.836129   0.404895
Display        0.632155       0.094394         6.697001   9.37E-10
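The t Stat column can be reproduced from the other two columns, since each t-statistic is the coefficient divided by its standard error. The sketch below uses the values from the output above and flags a coefficient as significant when |t| exceeds 1.96, the two-sided 5% cutoff used earlier in these slides. The dictionary layout is an illustrative choice.

```python
# (coefficient, standard error) pairs copied from the log-log output.
estimates = {
    "Intercept": (11.56145, 0.273605),
    "Ln(PriceTrop)": (-2.51154, 0.203726),
    "Ln(PriceMM)": (0.553096, 0.182128),
    "Ln(PriceDom)": (-0.04492, 0.196747),
    "Feature": (0.065482, 0.078316),
    "Display": (0.632155, 0.094394),
}

T_CRITICAL = 1.96  # two-sided 5% cutoff used earlier in the slides

# t-statistic for H0: b_k = 0 is b_k divided by its standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    label = "significant" if significant[name] else "not significant"
    print(f"{name:<14} t = {t:8.3f}  {label}")
```

By this rule the own price, the Minute Maid price and Display are significant, while the store brand price and Feature are not, in line with the P-values in the table.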
Linear Regression Output
SUMMARY OUTPUT
How good is the fit?
Regression Statistics
Multiple R          0.8184244
R Square            0.6698185
Adjusted R Square   0.6548103
Standard Error      11811.444
Observations        116
ANOVA
             df    SS            MS          F
Regression   5     31131717954   6.226E+09   44.630023
Residual     110   15346122395   139510204
Total        115   46477840349
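The fit statistics in this output all follow from the ANOVA sums of squares. The sketch below recomputes them with the standard formulas, using the sums of squares, the 116 observations and the 5 independent variables from the output above. Variable names are illustrative.

```python
import math

n_obs = 116                   # observations
k = 5                         # number of independent variables
ss_regression = 31131717954   # explained sum of squares
ss_residual = 15346122395     # unexplained sum of squares (SSE)
ss_total = ss_regression + ss_residual

df_regression = k
df_residual = n_obs - k - 1

# R Square: share of total variation explained by the regression.
r_square = ss_regression / ss_total
# Adjusted R Square: penalizes for the number of independent variables.
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual
# F: mean square of the regression over mean square of the residual.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
# Standard Error of the regression: square root of the residual mean square.
standard_error = math.sqrt(ms_residual)

print(f"R Square = {r_square:.7f}")
print(f"Adjusted R Square = {adj_r_square:.7f}")
print(f"F = {f_stat:.4f}")
print(f"Standard Error = {standard_error:.3f}")
```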
• OL:
– Work on final project presentation
• May 10th
– Final Exam Review