
Statistics for Managers

Using Microsoft Excel

Chapter 13
The Simple Linear Regression
Model and Correlation
© 1999 Prentice-Hall, Inc. Chap. 13 - 1
Chapter Topics
• Types of Regression Models
• Determining the Simple Linear Regression
Equation
• Measures of Variation in Regression and
Correlation
• Assumptions of Regression and Correlation
• Residual Analysis and the Durbin-Watson Statistic
• Estimation of Predicted Values
• Correlation - Measuring the Strength of the
Association
© 1999 Prentice-Hall, Inc. Chap. 13 - 2
Purpose of Regression
and Correlation Analysis
• Regression Analysis is Used Primarily for
Prediction
A statistical model used to predict the values of a
dependent or response variable based on values of
at least one independent or explanatory variable

• Correlation Analysis is Used to Measure the
  Strength of the Association Between
  Numerical Variables

© 1999 Prentice-Hall, Inc. Chap. 13 - 3


The Scatter Diagram

Plot of all (Xi , Yi) pairs


[Scatter diagram: Y plotted against X for the sample data]

© 1999 Prentice-Hall, Inc. Chap. 13 - 4


Types of Regression Models
[Four scatter plots illustrating:
   Positive Linear Relationship      Relationship NOT Linear
   Negative Linear Relationship      No Relationship]

© 1999 Prentice-Hall, Inc. Chap. 13 - 5


Simple Linear
Regression Model
• Relationship Between Variables Is a Linear Function

• The Straight Line that Best Fits the Data

Yi = β0 + β1Xi + εi

where
  Yi = dependent (response) variable
  Xi = independent (explanatory) variable
  β0 = Y intercept
  β1 = slope
  εi = random error
© 1999 Prentice-Hall, Inc. Chap. 13 - 6
Population
Linear Regression Model
Observed value:  Yi = β0 + β1Xi + εi
εi = random error
Population regression line:  μY|X = β0 + β1Xi

[Graph: observed values (Xi, Yi) scattered about the population regression line]
© 1999 Prentice-Hall, Inc. Chap. 13 - 7
Sample Linear
Regression Model

Ŷi = b0 + b1Xi

Ŷi = predicted value of Y for observation i
Xi = value of X for observation i
b0 = sample Y intercept, used as an estimate of the population β0
b1 = sample slope, used as an estimate of the population β1
© 1999 Prentice-Hall, Inc. Chap. 13 - 8
Simple Linear Regression
Equation: Example
You wish to examine the relationship between the
square footage of produce stores and their annual
sales. Sample data for 7 stores were obtained. Find
the equation of the straight line that best fits the data.

Store   Square Feet   Annual Sales ($000)
  1        1,726            3,681
  2        1,542            3,395
  3        2,816            6,653
  4        5,555            9,543
  5        1,292            3,318
  6        2,208            5,563
  7        1,313            3,760

© 1999 Prentice-Hall, Inc. Chap. 13 - 9


Scatter Diagram
Example
[Excel scatter diagram: Annual Sales ($000) vs. Square Feet for the 7 stores]

© 1999 Prentice-Hall, Inc. Chap. 13 - 10


Equation for the Best
Straight Line

Ŷi = b0 + b1Xi = 1636.415 + 1.487Xi

From the Excel printout:

                 Coefficients
Intercept        1636.414726
X Variable 1     1.486633657

© 1999 Prentice-Hall, Inc. Chap. 13 - 11
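These coefficients can be cross-checked outside Excel. Below is a minimal Python/NumPy sketch (not part of the original slides; the tool and variable names are illustrative) that computes the least-squares estimates from the seven produce-store observations:

import numpy as np

# Produce-store data from the example: square footage and annual sales ($000)
square_feet = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
sales = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)

x_bar, y_bar = square_feet.mean(), sales.mean()

# Least-squares estimates:
#   b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²,   b0 = Ȳ - b1·X̄
b1 = np.sum((square_feet - x_bar) * (sales - y_bar)) / np.sum((square_feet - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

print(b0, b1)   # should agree with the Excel output: intercept ≈ 1636.415, slope ≈ 1.487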


Graph of the Best
Straight Line
[Graph: scatter of Annual Sales ($000) vs. Square Feet with the fitted line
 Ŷi = 1636.4 + 1.487Xi drawn through the points]

© 1999 Prentice-Hall, Inc. Chap. 13 - 12


Interpreting the Results

Ŷi = 1636.415 + 1.487Xi

• The slope of 1.487 means that for each increase of one
  unit in X, Y is estimated to increase by 1.487 units.

• For each increase of 1 square foot in the size of the
  store, the model predicts that expected annual sales
  increase by $1,487.

© 1999 Prentice-Hall, Inc. Chap. 13 - 13


Measures of Variation:
The Sum of Squares
SST = Total Sum of Squares
•measures the variation of the Yi values around their
mean Ȳ
SSR = Regression Sum of Squares
•explained variation attributable to the relationship
between X and Y
SSE = Error Sum of Squares
•variation attributable to factors other than the
relationship between X and Y

© 1999 Prentice-Hall, Inc. Chap. 13 - 14


Measures of Variation:
The Sum of Squares
For the fitted line Ŷi = b0 + b1Xi:

SST = Σ(Yi - Ȳ)²     total variation of the Yi around Ȳ
SSR = Σ(Ŷi - Ȳ)²     explained variation
SSE = Σ(Yi - Ŷi)²    unexplained (error) variation

[Graph: for a point (Xi, Yi), the deviations from Ȳ and from the fitted line
 illustrate how SST splits into SSR and SSE]
© 1999 Prentice-Hall, Inc. Chap. 13 - 15
Measures of Variation
The Sum of Squares: Example

Excel Output for Produce Stores


              df    SS
Regression     1    30380456.12    (SSR)
Residual       5     1871199.595   (SSE)
Total          6    32251655.71    (SST)

© 1999 Prentice-Hall, Inc. Chap. 13 - 16
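The same decomposition can be reproduced from the data. Continuing the NumPy sketch above (square_feet, sales, b0 and b1 already defined; names are illustrative):

# Continues the earlier sketch: square_feet, sales, b0, b1 are already defined.
y_hat = b0 + b1 * square_feet               # predicted sales for each store

sst = np.sum((sales - sales.mean()) ** 2)   # total sum of squares
sse = np.sum((sales - y_hat) ** 2)          # error (residual) sum of squares
ssr = np.sum((y_hat - sales.mean()) ** 2)   # regression sum of squares

print(ssr, sse, sst)   # ≈ 30,380,456 / 1,871,200 / 32,251,656, and SSR + SSE = SST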


The Coefficient of
Determination

r² = SSR / SST = regression sum of squares / total sum of squares

• Measures the proportion of variation that is
  explained by the independent variable X in
  the regression model

© 1999 Prentice-Hall, Inc. Chap. 13 - 17
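Continuing the sketch (ssr and sst from the previous snippet), r² is then a one-line check:

# Continues the earlier sketch: ssr and sst are already defined.
r_squared = ssr / sst
print(r_squared)   # ≈ 0.942: about 94% of the variation in sales is explained by square footage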


Coefficients of Determination
(r2) and Correlation (r)
[Four scatter plots, each with a fitted line Ŷi = b0 + b1Xi:
   r² = 1,  r = +1        r² = 1,  r = -1
   r² = .8, r = +0.9      r² = 0,  r = 0]
© 1999 Prentice-Hall, Inc. Chap. 13 - 18
Standard Error of
Estimate

Syx = sqrt( SSE / (n - 2) ) = sqrt( Σ(Yi - Ŷi)² / (n - 2) )

• The standard deviation of the observations
  around the regression line

© 1999 Prentice-Hall, Inc. Chap. 13 - 19
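Continuing the sketch (sse and sales already defined), the standard error of the estimate follows directly from this formula:

# Continues the earlier sketch: sse and sales are already defined.
n = len(sales)
s_yx = np.sqrt(sse / (n - 2))
print(s_yx)   # ≈ 611.75, the "Standard Error" reported in the Excel regression statistics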


Measures of Variation:
Example
Excel Output for Produce Stores
Regression Statistics
Multiple R 0.9705572
R Square 0.94198129
Adjusted R Square 0.93037754
Standard Error 611.751517
Observations 7
r² = .94                          Syx = 611.75
94% of the variation in annual sales can be
explained by the variability in the size of the
store as measured by square footage
© 1999 Prentice-Hall, Inc. Chap. 13 - 20
Linear Regression
Assumptions
For Linear Models
1. Normality
 Y Values Are Normally Distributed For Each
X
 Probability Distribution of Error is Normal

2. Homoscedasticity (Constant Variance)


3. Independence of Errors

© 1999 Prentice-Hall, Inc. Chap. 13 - 21


Variation of Errors Around
the Regression Line
The Y values are normally distributed around the regression line.
For each X value, the “spread” or variance around the regression
line is the same.

[Graph: normal error densities f(e) with equal spread, centered on the
 regression line at X1 and X2]
© 1999 Prentice-Hall, Inc. Chap. 13 - 22
Residual Analysis
• Purposes
 Examine Linearity
 Evaluate violations of assumptions

• Graphical Analysis of Residuals


 Plot residuals vs. Xi values
 Residual = difference between actual Yi and predicted Ŷi
 Studentized residuals:
 Allow consideration of the magnitude of the residuals

© 1999 Prentice-Hall, Inc. Chap. 13 - 23


Residual Analysis for Linearity

[Two residual (e) vs. X plots: a pattern in the residuals indicates the
 relationship is not linear; a random scatter indicates a linear fit]

© 1999 Prentice-Hall, Inc. Chap. 13 - 24


Residual Analysis for
Homoscedasticity

Using standardized residuals (SR):

[Two SR vs. X plots: a changing spread indicates heteroscedasticity;
 a constant spread indicates homoscedasticity]

© 1999 Prentice-Hall, Inc. Chap. 13 - 25


Residual Analysis:
Computer Output Example
Excel output (produce stores):

Observation   Predicted Y     Residuals
     1        4202.344417    -521.3444173
     2        3928.803824    -533.8038245
     3        5822.775103     830.2248971
     4        9894.664688    -351.6646882
     5        3557.14541     -239.1454103
     6        4918.90184      644.0981603
     7        3588.364717     171.6352829

[Residual plot: residuals vs. Square Feet]

© 1999 Prentice-Hall, Inc. Chap. 13 - 26
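The residual column above can be reproduced from the earlier sketch (sales, y_hat and square_feet already defined); plotting these residuals against square footage gives the residual plot:

# Continues the earlier sketch: sales, y_hat and square_feet are already defined.
residuals = sales - y_hat
for x, e in zip(square_feet, residuals):
    print(x, round(e, 1))   # should match the Residuals column of the Excel output

# A plot of residuals vs. square_feet (e.g. with matplotlib) should show no
# systematic pattern if the linearity and constant-variance assumptions hold.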


The Durbin-Watson
Statistic
•Used when data is collected over time to detect
autocorrelation (Residuals in one time period
are related to residuals in another period)
•Measures Violation of independence assumption

D = Σ(ei - e(i-1))² / Σei²
    (numerator summed over i = 2,…,n; denominator over i = 1,…,n)

D should be close to 2. If not, examine the model for autocorrelation.

© 1999 Prentice-Hall, Inc. Chap. 13 - 27
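Continuing the sketch (residuals already defined), the statistic is a short computation; it is only meaningful when the observations are ordered in time:

# Continues the earlier sketch: residuals is already defined.
# Durbin-Watson: D = Σ(ei - e(i-1))² for i = 2..n, divided by Σei² for i = 1..n
d = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(d)   # values near 2 suggest no autocorrelation in the residuals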


Residual Analysis for
Independence

[Two standardized-residual (SR) plots: a systematic pattern indicates the
 residuals are not independent; a random scatter indicates independence]

© 1999 Prentice-Hall, Inc. Chap. 13 - 28


Inferences about the
Slope: t Test
• t Test for a Population Slope:
  Is There a Linear Relationship Between X & Y?
•Null and Alternative Hypotheses
H0: β1 = 0 (No Linear Relationship)
H1: β1 ≠ 0 (Linear Relationship)

•Test Statistic:

t = (b1 - β1) / Sb1     where  Sb1 = SYX / sqrt( Σ(Xi - X̄)² )

and df = n - 2
© 1999 Prentice-Hall, Inc. Chap. 13 - 29
Example: Produce Stores
Data for 7 stores:

Store   Square Feet   Annual Sales ($000)
  1        1,726            3,681
  2        1,542            3,395
  3        2,816            6,653
  4        5,555            9,543
  5        1,292            3,318
  6        2,208            5,563
  7        1,313            3,760

Regression model obtained:  Ŷi = 1636.415 + 1.487Xi

The slope of this model is 1.487. Is there a linear relationship
between the square footage of a store and its annual sales?
© 1999 Prentice-Hall, Inc. Chap. 13 - 30
Inferences about the
Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
α = .05
df = 7 - 2 = 5

Test statistic (from Excel printout):
                 t Stat       P-value
Intercept        3.6244333    0.0151488
X Variable 1     9.009944     0.0002812

Critical values: t = ±2.5706 (.025 in each tail)
Decision: Reject H0
Conclusion: There is evidence of a linear relationship.
© 1999 Prentice-Hall, Inc. Chap. 13 - 31
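Continuing the sketch (b1, s_yx, square_feet and x_bar already defined), the test statistic can be reproduced directly:

# Continues the earlier sketch: b1, s_yx, square_feet, x_bar are already defined.
s_b1 = s_yx / np.sqrt(np.sum((square_feet - x_bar) ** 2))   # standard error of the slope
t_stat = (b1 - 0.0) / s_b1                                  # test of H0: β1 = 0
print(t_stat)   # ≈ 9.01, matching the Excel "t Stat"; |t| > 2.5706, so reject H0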
Inferences about the Slope:
Confidence Interval Example
Confidence Interval Estimate of the Slope:   b1 ± t(n-2) Sb1

Excel printout for produce stores:
                 Lower 95%     Upper 95%
Intercept        475.810926    2797.01853
X Variable 1     1.06249037    1.91077694

At the 95% level of confidence, the confidence interval for the
slope is (1.062, 1.911), which does not include 0.
Conclusion: There is a significant linear relationship
between annual sales and the size of the store.
© 1999 Prentice-Hall, Inc. Chap. 13 - 32
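The same interval can be checked from the sketch (b1 and s_b1 already defined), using the critical value t5 = 2.5706 given earlier:

# Continues the earlier sketch: b1 and s_b1 are already defined.
t_crit = 2.5706                                  # t value for df = 5, α = .05 (two-tailed)
print(b1 - t_crit * s_b1, b1 + t_crit * s_b1)    # ≈ (1.062, 1.911), which excludes 0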
Estimation of
Predicted Values
Confidence Interval Estimate for μY|X,
the Mean of Y Given a Particular Xi

The size of the interval varies with the distance of Xi from the mean X̄.

Ŷi ± t(n-2) · Syx · sqrt( 1/n + (Xi - X̄)² / Σ(Xi - X̄)² )

where Syx is the standard error of the estimate and t(n-2) is the
t value from the table with df = n - 2.
© 1999 Prentice-Hall, Inc. Chap. 13 - 33
Estimation of
Predicted Values
Confidence Interval Estimate for
Individual Response Yi at a Particular Xi
The addition of 1 under the square root widens the interval
relative to the interval for the mean of Y.

Ŷi ± t(n-2) · Syx · sqrt( 1 + 1/n + (Xi - X̄)² / Σ(Xi - X̄)² )

© 1999 Prentice-Hall, Inc. Chap. 13 - 34


Interval Estimates for
Different Values of X
[Graph: around the fitted line Ŷi = b0 + b1Xi, the confidence interval for an
 individual Yi is wider than the confidence interval for the mean of Y, and
 both widen as a given X moves farther from X̄]
© 1999 Prentice-Hall, Inc. Chap. 13 - 35
Example: Produce Stores
Data for 7 stores:

Store   Square Feet   Annual Sales ($000)
  1        1,726            3,681
  2        1,542            3,395
  3        2,816            6,653
  4        5,555            9,543
  5        1,292            3,318
  6        2,208            5,563
  7        1,313            3,760

Predict the annual sales for a store with 2,000 square feet.

Regression model obtained:  Ŷi = 1636.415 + 1.487Xi
© 1999 Prentice-Hall, Inc. Chap. 13 - 36
Estimation of Predicted
Values: Example
Confidence Interval Estimate for the Mean of Y (μY|X)

Find the 95% confidence interval for the average annual sales
for stores of 2,000 square feet.

Predicted sales:  Ŷi = 1636.415 + 1.487Xi = 4610.45 ($000)
X̄ = 2350.29    SYX = 611.75    t(n-2) = t5 = 2.5706

Ŷi ± t(n-2) · Syx · sqrt( 1/n + (Xi - X̄)² / Σ(Xi - X̄)² )  =  4610.45 ± 980.97

Confidence interval for the mean of Y
© 1999 Prentice-Hall, Inc. Chap. 13 - 37
Estimation of Predicted
Values: Example
Confidence Interval Estimate for an Individual Y

Find the 95% confidence interval for the annual sales of one
particular store of 2,000 square feet.

Predicted sales:  Ŷi = 1636.415 + 1.487Xi = 4610.45 ($000)
X̄ = 2350.29    SYX = 611.75    t(n-2) = t5 = 2.5706

Ŷi ± t(n-2) · Syx · sqrt( 1 + 1/n + (Xi - X̄)² / Σ(Xi - X̄)² )  =  4610.45 ± 1853.45

Confidence interval for an individual Y
© 1999 Prentice-Hall, Inc. Chap. 13 - 38
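As a rough cross-check, both interval formulas can be evaluated with the earlier sketch (b0, b1, s_yx, n, square_feet and x_bar already defined); the value of 2,000 square feet is taken from the example:

# Continues the earlier sketch: b0, b1, s_yx, n, square_feet, x_bar are already defined.
x_new = 2000.0
y_pred = b0 + b1 * x_new                        # point prediction, ≈ 4610 ($000)
t_crit = 2.5706                                 # t value for df = 5 from the slides
ssx = np.sum((square_feet - x_bar) ** 2)
h = 1.0 / n + (x_new - x_bar) ** 2 / ssx        # term shared by both formulas

half_mean = t_crit * s_yx * np.sqrt(h)          # half-width of the interval for the mean of Y
half_indiv = t_crit * s_yx * np.sqrt(1.0 + h)   # half-width of the interval for an individual Y

print(y_pred, half_mean, half_indiv)            # the individual interval is the wider of the two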
Correlation: Measuring the
Strength of Association
• Answer ‘How Strong Is the Linear
Relationship Between 2 Variables?’
• Coefficient of Correlation Used
 Population correlation coefficient denoted ρ (rho)
 Values range from -1 to +1
 Measures degree of association
• The Sample Coefficient of Correlation r Is the Square
  Root of the Coefficient of Determination, with the
  sign of the slope b1
© 1999 Prentice-Hall, Inc. Chap. 13 - 39
Test of
Coefficient of Correlation
• Tests If There Is a Linear Relationship
Between 2 Numerical Variables
• Same Conclusion as Testing Population
Slope β1
• Hypotheses
 H0: ρ = 0 (No Correlation)
 H1: ρ ≠ 0 (Correlation)

© 1999 Prentice-Hall, Inc. Chap. 13 - 40
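Continuing the sketch (r_squared, b1 and n already defined), the sample correlation and the standard t test for H0: ρ = 0 can be computed as follows; as the slide notes, the conclusion matches the slope test:

# Continues the earlier sketch: r_squared, b1 and n are already defined.
r = np.sign(b1) * np.sqrt(r_squared)                 # sample correlation, ≈ +0.97
t_corr = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)    # standard t test statistic for H0: ρ = 0
print(r, t_corr)   # t ≈ 9.01, the same value as the t test for the slope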


Chapter Summary
• Described Types of Regression Models
• Determined the Simple Linear Regression
Equation
• Provided Measures of Variation in Regression and
Correlation
• Stated Assumptions of Regression and Correlation
• Described Residual Analysis and the Durbin-
Watson Statistic
• Provided Estimation of Predicted Values
• Discussed Correlation - Measuring the Strength of
the Association
© 1999 Prentice-Hall, Inc. Chap. 13 - 41
