Ch6 Multiple Regression
Introduction
In this chapter we extend the simple linear
regression model, and allow for any number of
independent variables.
We expect to build a model that fits the data better
than the simple linear regression model.
It is widely believed that weight is affected by the number of
calories consumed. Yet the actual effect differs from
one individual to another.
• Therefore, a simple linear relationship leaves much
of the variation unexplained.
[Figure: scatter plots of Weight against Calories consumed]
Model Assumptions
– Required conditions for the error variable ε:
• The error is normally distributed.
• The mean is equal to zero.
• The standard deviation is constant (σ_ε) for
all values of y.
• The errors are independent.
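For reference, the model these conditions refer to can be written as below. This uses the standard notation for multiple regression; the symbols k (number of independent variables), β_i and σ_ε are assumed here to match the notation used later in the chapter.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma_\varepsilon^2)\ \text{independently for each observation}
```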
17.2 Estimating the Coefficients and
Assessing the Model
The procedure used to perform regression analysis:
– Obtain the model coefficients and statistics
using statistical software.
– Diagnose violations of required conditions. Try
to remedy problems when identified.
– Assess the model fit using statistics obtained
from the sample.
– If the model assessment indicates good fit to the
data, use it to interpret the coefficients and
generate predictions.
Example 1 Where to locate a new motor inn?
La Quinta Motor Inns is planning an expansion.
Management wishes to predict which sites are likely
to be profitable.
Several areas where predictors of profitability can be
identified are:
– Competition
– Market awareness
– Demand generators
– Demographics
– Physical quality
Profitability is measured by the operating margin. The predictors, grouped by area:
Area | Variable | Description
Competition | x1 Rooms | Number of hotel/motel rooms within 3 miles of the site
Market awareness | x2 Nearest | Distance to the nearest La Quinta inn
Customers | x3 Office space | Office space
Customers | x4 Enrollment | College enrollment
Community | x5 Income | Median household income
Physical | x6 Distance | Distance to downtown
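Data were collected from 100 randomly selected
inns that belong to La Quinta, and a regression was run for the
following suggested model:
$$\text{Margin} = \beta_0 + \beta_1\,\text{Rooms} + \beta_2\,\text{Nearest} + \beta_3\,\text{Office} + \beta_4\,\text{College} + \beta_5\,\text{Income} + \beta_6\,\text{Disttwn} + \varepsilon$$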
INN MARGIN ROOMS NEAREST OFFICE COLLEGE INCOME DISTTWN
1 55.5 3203 4.2 549 8 37 2.7
2 33.8 2810 2.8 496 17.5 35 14.4
3 49 2890 2.4 254 20 35 2.6
4 31.9 3422 3.3 434 15.5 38 12.1
5 57.4 2687 0.9 678 15.5 42 6.9
6 49 3759 2.9 635 19 33 10.8
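The procedure described earlier (estimate, inspect residuals, assess fit) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the six sample inns listed here rather than all 100, keeps just two predictors (ROOMS and OFFICE) so the fit is well defined, and solves the normal equations with hypothetical helper functions `solve` and `fit_ols` written for this sketch. Its numbers will not match the full La Quinta output.

```python
# Six sample inns: (ROOMS, OFFICE) and the observed MARGIN.
# Illustration only: two predictors and six rows, not the 100-inn fit.
predictors = [(3203, 549), (2810, 496), (2890, 254),
              (3422, 434), (2687, 678), (3759, 635)]
margin = [55.5, 33.8, 49.0, 31.9, 57.4, 49.0]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Step 1: estimate the coefficients.
coefficients = fit_ols(predictors, margin)

# Step 2: compute residuals, the raw material for diagnosing violations.
fitted = [coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))
          for row in predictors]
residuals = [obs - fit for obs, fit in zip(margin, fitted)]

# Step 3: assess the fit with the coefficient of determination.
mean_y = sum(margin) / len(margin)
sse = sum(e * e for e in residuals)
sst = sum((obs - mean_y) ** 2 for obs in margin)
r_squared = 1 - sse / sst

print("coefficients:", coefficients)
print("R squared:", r_squared)
```

In practice statistical software does all of this in one call; the sketch only makes the steps visible.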
La Quinta
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.724611
R Square 0.525062
Adjusted R Square 0.49442
Standard Error 5.512084
Observations 100
ANOVA
df SS MS F Significance F
Regression 6 3123.832 520.6387 17.13581 3.03E-13
Residual 93 2825.626 30.38307
Total 99 5949.458
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 38.13858 6.992948 5.453862 4.04E-07 24.25197 52.02518
Number -0.00762 0.001255 -6.06871 2.77E-08 -0.01011 -0.00513
Nearest 1.646237 0.632837 2.601361 0.010803 0.389548 2.902926
Office Space 0.019766 0.00341 5.795594 9.24E-08 0.012993 0.026538
Enrollment 0.211783 0.133428 1.587246 0.115851 -0.05318 0.476744
Income 0.413122 0.139552 2.960337 0.003899 0.135999 0.690246
Distance -0.22526 0.178709 -1.26048 0.210651 -0.58014 0.129622
This is the sample regression equation
(sometimes called the prediction equation):
$$\text{MARGIN} = 38.14 - 0.0076\,\text{ROOMS} + 1.65\,\text{NEAREST} + 0.02\,\text{OFFICE} + 0.21\,\text{COLLEGE} + 0.41\,\text{INCOME} - 0.23\,\text{DISTTWN}$$
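To make the prediction equation concrete, the sketch below evaluates it for inn 1 of the sample data. It assumes that the output rows Number, Office Space, Enrollment and Distance correspond to ROOMS, OFFICE, COLLEGE and DISTTWN, and it takes the ROOMS coefficient at the precision printed in the testing table (-0.007618). The function name `predict_margin` is hypothetical.

```python
# Coefficients copied from the Coefficients column of the La Quinta output.
COEFFICIENTS = {
    "INTERCEPT": 38.13858,
    "ROOMS": -0.007618,
    "NEAREST": 1.646237,
    "OFFICE": 0.019766,
    "COLLEGE": 0.211783,
    "INCOME": 0.413122,
    "DISTTWN": -0.225258,
}


def predict_margin(site):
    """Return the predicted operating margin for one site (dict of predictor values)."""
    total = COEFFICIENTS["INTERCEPT"]
    for name, value in site.items():
        total += COEFFICIENTS[name] * value
    return total


# Inn 1 as read from the sample data; its observed margin is 55.5.
inn_1 = {"ROOMS": 3203, "NEAREST": 4.2, "OFFICE": 549,
         "COLLEGE": 8, "INCOME": 37, "DISTTWN": 2.7}

predicted = predict_margin(inn_1)
residual = 55.5 - predicted  # observed minus predicted

print(f"predicted margin: {predicted:.2f}, residual: {residual:.2f}")
```

The residual is the part of inn 1's margin that the fitted equation does not account for.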
Model Assessment – Standard Error of Estimate
Q: How can we determine whether the standard deviation of
the error is small or large?
A: The magnitude of s_ε is judged by comparing it
to ȳ, the sample mean of the dependent variable.
SUMMARY OUTPUT
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 38.13858 6.992948 5.453862 4.04E-07 24.25197 52.02518
Number -0.00762 0.001255 -6.06871 2.77E-08 -0.01011 -0.00513
Nearest 1.646237 0.632837 2.601361 0.010803 0.389548 2.902926
Office Space 0.019766 0.00341 5.795594 9.24E-08 0.012993 0.026538
Enrollment 0.211783 0.133428 1.587246 0.115851 -0.05318 0.476744
Income 0.413122 0.139552 2.960337 0.003899 0.135999 0.690246
Distance -0.22526 0.178709 -1.26048 0.210651 -0.58014 0.129622
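The standard error of estimate reported in the output can be reproduced from the ANOVA table, using the usual definition s_ε = sqrt(SSE / (n − k − 1)). A minimal check, with the variable names chosen here for illustration:

```python
import math

sse = 2825.626  # residual sum of squares (ANOVA, Residual row)
n = 100         # number of observations
k = 6           # number of independent variables

# Standard error of estimate: square root of SSE over its degrees of freedom.
s_e = math.sqrt(sse / (n - k - 1))

print(round(s_e, 4))  # agrees with the "Standard Error" line, 5.512084
```

Whether about 5.51 counts as small or large then depends on how it compares with the mean margin, which is not shown in this excerpt.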
Model Assessment – Coefficient of
Determination
Regression Statistics
Multiple R 0.724611
R Square 0.525062
Adjusted R Square 0.49442
Standard Error 5.512084
Observations 100
From the printout, R² = 0.5251;
that is, 52.51% of the variability
in the margin values is explained
by this model.
ANOVA
df SS MS F Significance F
Regression 6 3123.832 520.6387 17.13581 3.03E-13
Residual 93 2825.626 30.38307
Total 99 5949.458
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 72.45461 7.893104 9.179483 1.11E-14 56.78049 88.12874
ROOMS -0.00762 0.001255 -6.06871 2.77E-08 -0.01011 -0.00513
NEAREST -1.64624 0.632837 -2.60136 0.010803 -2.90292 -0.38955
OFFICE 0.019766 0.00341 5.795594 9.24E-08 0.012993 0.026538
COLLEGE 0.211783 0.133428 1.587246 0.115851 -0.05318 0.476744
INCOME -0.41312 0.139552 -2.96034 0.003899 -0.69025 -0.136
DISTTWN 0.225258 0.178709 1.260475 0.210651 -0.12962 0.580138
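The fit statistics in the printout follow directly from the ANOVA sums of squares. The sketch below recomputes R², adjusted R² and the F statistic using the standard formulas; the variable names are chosen here for illustration.

```python
ssr = 3123.832  # regression sum of squares
sse = 2825.626  # residual (error) sum of squares
sst = 5949.458  # total sum of squares
n, k = 100, 6   # observations, independent variables

# Coefficient of determination: share of total variation explained by the model.
r_squared = 1 - sse / sst

# Adjusted R squared penalises each added variable through the degrees of freedom.
adj_r_squared = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

# F statistic: mean square for regression over mean square for error.
f_stat = (ssr / k) / (sse / (n - k - 1))

print(round(r_squared, 4), round(adj_r_squared, 4), round(f_stat, 2))
```

These agree with the R Square, Adjusted R Square and F entries in the output.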
Interpreting b0
Interpreting the coefficients b1 through bk
y = b0 + b1x1 + b2x2 +…+ bkxk
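As a worked step under the usual reading of this equation (the hat notation for a predicted value is introduced here for illustration): b0 is the predicted value when every x_i equals zero, and each b_i is the change in the predicted value when x_i rises by one unit while the other variables are held fixed.

```latex
\hat{y}(0, \dots, 0) = b_0,
\qquad
\hat{y}(x_1, \dots, x_i + 1, \dots, x_k) - \hat{y}(x_1, \dots, x_i, \dots, x_k) = b_i
```

For example, with the La Quinta coefficient of -0.0076 on ROOMS, each additional competing room within 3 miles lowers the predicted margin by 0.0076, other variables unchanged.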
Testing the Coefficients
The hypotheses for each β_i are
$$H_0: \beta_i = 0 \qquad H_1: \beta_i \neq 0$$
Test statistic:
$$t = \frac{b_i - \beta_i}{s_{b_i}}, \qquad \text{d.f.} = n - k - 1$$
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 38.13858 6.992948 5.453862 4.04E-07 24.25196697 52.02518
Number -0.007618 0.00125527 -6.06871 2.77E-08 -0.010110585 -0.00513
Nearest 1.646237 0.63283691 2.601361 0.010803 0.389548431 2.902926
Office Space 0.019766 0.00341044 5.795594 9.24E-08 0.012993078 0.026538
Enrollment 0.211783 0.13342794 1.587246 0.115851 -0.053178488 0.476744
Income 0.413122 0.1395524 2.960337 0.003899 0.135998719 0.690246
Distance -0.225258 0.17870889 -1.26048 0.210651 -0.580138524 0.129622
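The t statistics in the table can be reproduced from the coefficients and their standard errors. The sketch below does this for the six slopes. Rather than assume a table value, it recovers the two-sided 5% critical value from the printed 95% interval for Nearest, which is an assumption about how that interval was built (b ± t × s_b with 93 degrees of freedom).

```python
# (coefficient b_i, standard error s_bi) copied from the printed output.
rows = {
    "Number": (-0.007618, 0.00125527),
    "Nearest": (1.646237, 0.63283691),
    "Office Space": (0.019766, 0.00341044),
    "Enrollment": (0.211783, 0.13342794),
    "Income": (0.413122, 0.1395524),
    "Distance": (-0.225258, 0.17870889),
}

# Critical value for d.f. = 100 - 6 - 1 = 93, recovered from the interval
# for Nearest: upper limit = b + t_crit * s_b.
t_crit = (2.902926 - 1.646237) / 0.63283691

# Under H0: beta_i = 0 the test statistic reduces to b_i / s_bi.
t_stats = {name: b / s_b for name, (b, s_b) in rows.items()}
significant = [name for name, t in t_stats.items() if abs(t) > t_crit]

for name, t in t_stats.items():
    verdict = "reject H0" if abs(t) > t_crit else "do not reject H0"
    print(f"{name:12s} t = {t:7.3f}  {verdict}")
```

The verdicts line up with the P-value column: only Enrollment (0.115851) and Distance (0.210651) have P-values above 0.05.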
Prediction Interval
Lower limit = 25.39527
Upper limit = 48.78771
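Assuming this is the usual t-based interval, which is symmetric about the point prediction, the two limits are enough to recover the predicted margin and the margin of error. A small sketch:

```python
lower = 25.39527  # lower limit of the prediction interval
upper = 48.78771  # upper limit of the prediction interval

# A symmetric interval is centred on the point prediction.
point_prediction = (lower + upper) / 2
half_width = (upper - lower) / 2

print(round(point_prediction, 5), round(half_width, 5))
```

The half-width shows how far an individual inn's margin may plausibly fall from the predicted value at the stated confidence level.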