CHAP13. 1997-2003
Chapter 13
The Simple Linear Regression
Model and Correlation
© 1999 Prentice-Hall, Inc. Chap. 13 - 1
Chapter Topics
• Types of Regression Models
• Determining the Simple Linear Regression
Equation
• Measures of Variation in Regression and
Correlation
• Assumptions of Regression and Correlation
• Residual Analysis and the Durbin-Watson Statistic
• Estimation of Predicted Values
• Correlation - Measuring the Strength of the
Association
Purpose of Regression
and Correlation Analysis
• Regression Analysis is Used Primarily for
Prediction
A statistical model used to predict the values of a
dependent or response variable based on values of
at least one independent or explanatory variable
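$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where $Y_i$ is the dependent (response) variable, $\beta_0$ is the Y intercept, $\beta_1$ is the slope, $X_i$ is the independent (explanatory) variable, and $\varepsilon_i$ is the random error.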
Population
Linear Regression Model
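$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (observed value)
$\mu_{YX} = \beta_0 + \beta_1 X_i$ (population regression line)
$\varepsilon_i$ = random error
[Figure: the line $\mu_{YX}$ plotted with Y against X; an observed value $Y_i$ lies a vertical distance $\varepsilon_i$ from the line.]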
Sample Linear
Regression Model
$\hat{Y}_i = b_0 + b_1 X_i$
$\hat{Y}_i$ = predicted value of Y for observation i
[Figure: scatter plot with horizontal axis "Square Feet" (0 to 6000) and vertical axis scaled 0 to 10000.]
Excel Output
[Figure: the same scatter plot with the fitted line $\hat{Y}_i = 1636.415 + 1.487 X_i$.]
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
[Figure: Y plotted against X, marking $\bar{Y}$ and $X_i$.]
Measures of Variation
The Sum of Squares: Example
[Figure: two panels, each showing the fitted line $\hat{Y}_i = b_0 + b_1 X_i$ plotted against X.]
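As a rough illustration of the sums of squares, the sketch below computes the total, regression, and error sums of squares for the seven-store Produce Stores data used in this chapter. The function name `sums_of_squares` is an assumption made for this example, and the definitions of SST and SSE are the standard ones rather than formulas taken from the slide.

```python
# Seven-store data from the chapter's Produce Stores example.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # $000


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least-squares line Y-hat = b0 + b1*X.

    Hypothetical helper written for this illustration.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least-squares slope and intercept.
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left unexplained
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares(square_feet, annual_sales)
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(f"r^2 = SSR / SST = {ssr / sst:.4f}")
```

For a least-squares line the three quantities satisfy SST = SSR + SSE, so the ratio SSR / SST is the share of the variation in Y explained by X.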
Standard Error of
Estimate
$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
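A minimal sketch of the standard error of the estimate, applied to the seven-store Produce Stores data from this chapter. The function name `standard_error_of_estimate` is an assumption for this example, not something defined in the slides.

```python
import math

# Seven-store data from the chapter's Produce Stores example.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # $000


def standard_error_of_estimate(x, y):
    """S_YX = sqrt(SSE / (n - 2)) for the least-squares line (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Two degrees of freedom are used up by estimating b0 and b1.
    return math.sqrt(sse / (n - 2))


s_yx = standard_error_of_estimate(square_feet, annual_sales)
print(f"S_YX = {s_yx:.2f} ($000)")
```

The result is in the units of Y, so it can be read as a typical distance, in thousands of dollars, between observed and predicted sales.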
[Figure: Y plotted against X with the regression line, marking $X_1$ and $X_2$.]
Residual Analysis
• Purposes
Examine Linearity
Evaluate violations of assumptions
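[Figure: residual plots of e against X, one panel labeled "Not Linear" and one labeled "Linear".]
[Figure: plots of SR against X, one panel labeled "Heteroscedasticity" and one labeled "Homoscedasticity".]
Excel Output
[Figure: residual plot with horizontal axis "Square Feet" (0 to 6000).]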
Durbin-Watson statistic:
$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
D should be close to 2. If not, examine the model for autocorrelation.
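The statistic D can be sketched in a few lines. The function name `durbin_watson` and the residual values below are hypothetical, chosen only to show the calculation; the residuals are assumed to be listed in the order in which the observations were collected.

```python
def durbin_watson(residuals):
    """D = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, invented for illustration only.
hypothetical_residuals = [0.5, 0.8, 0.3, -0.2, -0.6, -0.4, 0.1, 0.4]
print(f"D = {durbin_watson(hypothetical_residuals):.3f}")
```

Residuals that drift slowly from one observation to the next make the numerator small and push D toward 0, while residuals that alternate in sign push D toward 4.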
[Figure: plots of SR against X, one panel labeled "Not Independent" and one labeled "Independent".]
• Test Statistic:
$t = \frac{b_1 - \beta_1}{S_{b_1}}$, where $S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
and df = n - 2
Example: Produce Stores
Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Regression Model Obtained:
$\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of this model is 1.487.
Is there a linear relationship between the square footage of a store and its annual sales?
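The fitted model can be reproduced from the seven data points with the usual least-squares formulas. This is a minimal sketch in plain Python; the function name `fit_line` is an assumption for this example, and the slides themselves obtain the coefficients from Excel.

```python
# Seven-store data from the table above.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # $000


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for Y-hat = b0 + b1*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx          # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = fit_line(square_feet, annual_sales)
print(f"Y-hat = {b0:.3f} + {b1:.3f} X")
```

Here the slope is in thousands of dollars of annual sales per additional square foot.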
Inferences about the
Slope: t Test Example
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
$\alpha = .05$
df = 7 - 2 = 5
Critical Value(s): ±2.5706
Test Statistic (from Excel printout):

               t Stat      P-value
Intercept      3.6244333   0.0151488
X Variable 1   9.009944    0.0002812

Decision: Reject H0, since t = 9.009944 exceeds 2.5706.
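The t statistic for the slope can be checked by hand from the data. The sketch below assumes the two-tailed critical value 2.5706 for 5 degrees of freedom at the .05 level, as quoted elsewhere in this chapter, and hard-codes it rather than looking it up; the function name `slope_t_statistic` is hypothetical.

```python
import math

# Seven-store data from the chapter's Produce Stores example.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # $000

T_CRITICAL = 2.5706  # assumed: two-tailed critical t, alpha = .05, df = 5


def slope_t_statistic(x, y, beta1_null=0.0):
    """t = (b1 - beta1) / S_b1 with S_b1 = S_YX / sqrt(sum (Xi - X-bar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))  # standard error of the estimate
    s_b1 = s_yx / math.sqrt(ssx)     # standard error of the slope
    return (b1 - beta1_null) / s_b1


t_stat = slope_t_statistic(square_feet, annual_sales)
reject = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.4f}, reject H0: {reject}")
```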
[Figure: the fitted line $\hat{Y}_i = b_0 + b_1 X_i$ plotted against X, marking $\bar{X}$ and a given X.]
Example: Produce Stores
Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Predict the annual sales for a store with 2,000 square feet.
Regression Model Obtained:
$\hat{Y}_i = 1636.415 + 1.487 X_i$
Estimation of Predicted
Values: Example
Confidence Interval Estimate for $\mu_{YX}$
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.
Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 612.66$
Confidence interval for mean Y
Estimation of Predicted
Values: Example
Prediction Interval Estimate for an Individual Y
Find the 95% prediction interval for the annual sales of one particular store of 2,000 square feet.
Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 1687.69$
Prediction interval for individual Y
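The interval for the mean and the interval for an individual value differ only by the extra 1 under the square root. The sketch below computes both half-widths for a 2,000 square foot store, assuming the critical value 2.5706 quoted in the example. The function name `interval_half_widths` is hypothetical, and because the code keeps unrounded coefficients, its predicted value can differ slightly from a figure computed with the rounded model.

```python
import math

# Seven-store data from the chapter's Produce Stores example.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # $000

T_CRITICAL = 2.5706  # assumed: two-tailed critical t, 95% level, df = 5


def interval_half_widths(x, y, x_new, t_crit):
    """Return (prediction, half-width for mean Y, half-width for individual Y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    # Term shared by both intervals: 1/n + (X - X-bar)^2 / sum (Xi - X-bar)^2.
    h = 1 / n + (x_new - x_bar) ** 2 / ssx
    y_hat = b0 + b1 * x_new
    mean_half = t_crit * s_yx * math.sqrt(h)       # interval for mean Y
    indiv_half = t_crit * s_yx * math.sqrt(1 + h)  # interval for one store
    return y_hat, mean_half, indiv_half


y_hat, mean_half, indiv_half = interval_half_widths(
    square_feet, annual_sales, 2000, T_CRITICAL
)
print(f"mean Y:       {y_hat:.2f} +/- {mean_half:.2f}")
print(f"individual Y: {y_hat:.2f} +/- {indiv_half:.2f}")
```

The interval for an individual store is always the wider of the two, because it adds the scatter of a single observation around the line to the uncertainty in the line itself.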
Correlation: Measuring the
Strength of Association
• Answer ‘How Strong Is the Linear
Relationship Between 2 Variables?’
• Coefficient of Correlation Used
Population correlation coefficient denoted $\rho$ (‘Rho’)
Values range from -1 to +1
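As an illustration, the sample counterpart of the correlation coefficient (commonly written r) can be computed for the seven-store Produce Stores data from this chapter. The function name `sample_correlation` is an assumption for this sketch, and the formula is the standard Pearson product-moment one rather than one shown on the slide.

```python
import math

# Seven-store data from the chapter's Produce Stores example.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # $000


def sample_correlation(x, y):
    """Pearson r = SSXY / sqrt(SSX * SSY) (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssy = sum((yi - y_bar) ** 2 for yi in y)
    return ssxy / math.sqrt(ssx * ssy)


r = sample_correlation(square_feet, annual_sales)
print(f"r = {r:.4f}")
```

The sign of r matches the sign of the fitted slope, and a value near +1 or -1 indicates a strong linear association.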