Ch6 Multiple Regression

The multiple regression model predicts profit margin (dependent variable) using 6 independent variables: number of rooms, distance to nearest La Quinta inn, office space, college enrollment, median household income, and distance to downtown. The standard error of the estimate is 5.512, which is relatively small compared to the mean profit margin of 45.739. Overall, the multiple regression model fits the data well in predicting profit margin.

Uploaded by

quanhle2005

Chapter 6

Multiple Regression

Introduction
In this chapter we extend the simple linear
regression model, and allow for any number of
independent variables.
We expect to build a model that fits the data better
than the simple linear regression model.

We all believe that weight is affected by the amount of
calories consumed. Yet, the actual effect is different from
one individual to another.
•Therefore, a simple linear relationship leaves much
unexplained error.

[Figure: scatter plot of weight against calories consumed]

In an attempt to reduce the unexplained errors, we'll add a second explanatory (independent) variable.

If we believe a person's height explains his/her weight too, we can add this variable to our model.
The resulting Multiple regression model is shown:

$$\text{Weight} = \beta_0 + \beta_1\,\text{Calories} + \beta_2\,\text{Height} + \varepsilon$$


17.1 Model and Required Conditions
We allow k independent variables to potentially
explain the dependent variable

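$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$$

where y is the dependent variable, $x_1, \dots, x_k$ are the independent variables, $\beta_0, \dots, \beta_k$ are the coefficients, and $\varepsilon$ is the random error variable.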

Model Assumptions
– Required conditions for $\varepsilon$
• The error is normally distributed.
• The mean is equal to zero.
• The standard deviation is constant ($\sigma_\varepsilon$) for all values of y.
• The errors are independent.

17.2 Estimating the Coefficients and
Assessing the Model
The procedure used to perform regression analysis:
– Obtain the model coefficients and statistics
using statistical software.
– Diagnose violations of required conditions. Try
to remedy problems when identified.
– Assess the model fit using statistics obtained
from the sample.
– If the model assessment indicates good fit to the
data, use it to interpret the coefficients and
generate predictions.
Example 1 Where to locate a new motor inn?
La Quinta Motor Inns is planning an expansion.
Management wishes to predict which sites are likely
to be profitable.
Several areas where predictors of profitability can be
identified are:
– Competition
– Market awareness
– Demand generators
– Demographics
– Physical quality
Profitability is measured by the operating margin. The predictors, grouped by area:

Variable  Name          Area              Description
x1        Rooms         Competition       Number of hotel/motel rooms within 3 miles from the site.
x2        Nearest       Market awareness  Distance to the nearest La Quinta inn.
x3        Office space  Customers         Office space.
x4        Enrollment    Customers         College enrollment.
x5        Income        Community         Median household income.
x6        Distance      Physical          Distance to downtown.
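Data were collected from 100 randomly selected inns that belong to La Quinta, and the following suggested model was run:

$$\text{Margin} = \beta_0 + \beta_1\,\text{Rooms} + \beta_2\,\text{Nearest} + \beta_3\,\text{Office} + \beta_4\,\text{College} + \beta_5\,\text{Income} + \beta_6\,\text{Disttwn} + \varepsilon$$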

INN  MARGIN  ROOMS  NEAREST  OFFICE  COLLEGE  INCOME  DISTTWN
1    55.5    3203   4.2      549     8        37      2.7
2    33.8    2810   2.8      496     17.5     35      14.4
3    49      2890   2.4      254     20       35      2.6
4    31.9    3422   3.3      434     15.5     38      12.1
5    57.4    2687   0.9      678     15.5     42      6.9
6    49      3759   2.9      635     19       33      10.8
La Quinta
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.724611
R Square 0.525062
Adjusted R Square 0.49442
Standard Error 5.512084
Observations 100

ANOVA
df SS MS F Significance F
Regression 6 3123.832 520.6387 17.13581 3.03E-13
Residual 93 2825.626 30.38307
Total 99 5949.458

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 38.13858 6.992948 5.453862 4.04E-07 24.25197 52.02518
Number -0.00762 0.001255 -6.06871 2.77E-08 -0.01011 -0.00513
Nearest 1.646237 0.632837 2.601361 0.010803 0.389548 2.902926
Office Space 0.019766 0.00341 5.795594 9.24E-08 0.012993 0.026538
Enrollment 0.211783 0.133428 1.587246 0.115851 -0.05318 0.476744
Income 0.413122 0.139552 2.960337 0.003899 0.135999 0.690246
Distance -0.22526 0.178709 -1.26048 0.210651 -0.58014 0.129622

This is the sample regression equation (sometimes called the prediction equation):

MARGIN = 38.14 - 0.0076 ROOMS + 1.65 NEAREST + 0.02 OFFICE + 0.21 COLLEGE + 0.41 INCOME - 0.23 DISTTWN
Model Assessment -Standard Error of Estimate

The standard deviation of the error $\sigma_\varepsilon$ is estimated by the Standard Error of Estimate:

$$s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}}$$

So we would prefer a model with a small standard deviation of the error rather than a large one.

Q: How can we determine whether the standard deviation of the error is small/large?
A: The magnitude of $s_\varepsilon$ is judged by comparing it to $\bar{y}$.

From the printout, $s_\varepsilon$ = 5.5121. Calculating the mean value of y, we have $\bar{y}$ = 45.739.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.724611
R Square 0.525062
Adjusted R Square 0.49442
Standard Error 5.512084
Observations 100

ANOVA
df SS MS F Significance F
Regression 6 3123.832 520.6387 17.13581 3.03E-13
Residual 93 2825.626 30.38307
Total 99 5949.458

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 38.13858 6.992948 5.453862 4.04E-07 24.25197 52.02518
Number -0.00762 0.001255 -6.06871 2.77E-08 -0.01011 -0.00513
Nearest 1.646237 0.632837 2.601361 0.010803 0.389548 2.902926
Office Space 0.019766 0.00341 5.795594 9.24E-08 0.012993 0.026538
Enrollment 0.211783 0.133428 1.587246 0.115851 -0.05318 0.476744
Income 0.413122 0.139552 2.960337 0.003899 0.135999 0.690246
Distance -0.22526 0.178709 -1.26048 0.210651 -0.58014 0.129622
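
The standard error figure on the printout can be re-derived from the ANOVA table. The following is a minimal sketch in Python, using only numbers quoted on the slides (SSE = 2825.626, n = 100, k = 6, mean margin 45.739); the variable names are illustrative and not part of the original material.

```python
import math

# Values read from the Excel printout for the La Quinta example
sse = 2825.626   # residual sum of squares (SSE)
n = 100          # number of observations
k = 6            # number of independent variables
y_bar = 45.739   # sample mean of the operating margin

# Standard error of estimate: s = sqrt(SSE / (n - k - 1))
s = math.sqrt(sse / (n - k - 1))

# Size of s relative to the mean of y
ratio = s / y_bar

print(f"s = {s:.4f}")
print(f"s / y_bar = {ratio:.3f}")
```

The ratio is one simple way to express "small compared to the mean"; the slides do not prescribe a specific cutoff.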
Model Assessment – Coefficient of
Determination

The usefulness of the model is evaluated by the amount of variability in the 'y' values explained by the model. This is done by the coefficient of determination.
The coefficient of determination is calculated by

$$R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST}$$

From the printout, $R^2$ = 0.5251; that is, 52.51% of the variability in the margin values is explained by this model.

Regression Statistics
Multiple R 0.724611
R Square 0.525062
Adjusted R Square 0.49442
Standard Error 5.512084
Observations 100

ANOVA
df SS MS F Significance F
Regression 6 3123.832 520.6387 17.13581 3.03E-13
Residual 93 2825.626 30.38307
Total 99 5949.458

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 38.13858 6.992948 5.453862 4.04E-07 24.25197 52.02518
ROOMS -0.00762 0.001255 -6.06871 2.77E-08 -0.01011 -0.00513
NEAREST 1.646237 0.632837 2.601361 0.010803 0.389548 2.902926
OFFICE 0.019766 0.00341 5.795594 9.24E-08 0.012993 0.026538
COLLEGE 0.211783 0.133428 1.587246 0.115851 -0.05318 0.476744
INCOME 0.413122 0.139552 2.960337 0.003899 0.135999 0.690246
DISTTWN -0.22526 0.178709 -1.26048 0.210651 -0.58014 0.129622
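
The R Square, Adjusted R Square and F entries can all be recomputed from the three sums of squares in the ANOVA table. The sketch below does this in Python. The adjusted R-squared and F formulas are the usual textbook definitions rather than something stated on these slides, and the variable names are illustrative.

```python
# Sums of squares and sample sizes from the ANOVA table
ssr = 3123.832   # regression sum of squares
sse = 2825.626   # residual (error) sum of squares
sst = 5949.458   # total sum of squares
n, k = 100, 6    # observations, independent variables

# Coefficient of determination, computed both ways
r2 = ssr / sst
r2_alt = (sst - sse) / sst

# Adjusted R^2 (usual definition): penalizes the number of predictors
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

# F statistic: mean square regression over mean square error
f_stat = (ssr / k) / (sse / (n - k - 1))

print(f"R^2 = {r2:.6f} (alt: {r2_alt:.6f})")
print(f"Adjusted R^2 = {adj_r2:.5f}")
print(f"F = {f_stat:.4f}")
```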
Interpreting b0

b0 = 38.14. This is the y-intercept, the value of y when all the variables take the value zero. Since the data ranges of the independent variables do not cover the value zero, do not interpret the intercept (that is, do not extrapolate).

Interpreting the coefficients b1 through bk
y = b0 + b1x1 + b2x2 +…+ bkxk

• b1 = -0.0076. For each additional room within 3 miles of the La Quinta inn, the operating margin decreases on average by .0076% (assuming the other variables are held constant).
• b2 = 1.65. For each additional mile that the nearest competitor is to a La Quinta inn, the average operating margin increases by 1.65% when the other variables are held constant.
• b3 = 0.02. For each additional 1000 sq-ft of office space, the average increase in operating margin will be .02%.
• b4 = 0.21. For each additional thousand students, the average operating margin increases by .21% when the other variables remain constant.
• b5 = 0.41. For each additional $1000 of median household income, the average operating margin increases by .41% when the other variables remain constant.
• b6 = -0.23. For each additional mile to the downtown center, the average operating margin decreases by .23%.

Testing the Coefficients
The hypotheses for each $\beta_i$ are $H_0: \beta_i = 0$; $H_1: \beta_i \neq 0$

Test statistic: $t = \dfrac{b_i - \beta_i}{s_{b_i}}$, with d.f. = n - k - 1
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 38.13858 6.992948 5.453862 4.04E-07 24.25196697 52.02518
Number -0.007618 0.00125527 -6.06871 2.77E-08 -0.010110585 -0.00513
Nearest 1.646237 0.63283691 2.601361 0.010803 0.389548431 2.902926
Office Space 0.019766 0.00341044 5.795594 9.24E-08 0.012993078 0.026538
Enrollment 0.211783 0.13342794 1.587246 0.115851 -0.053178488 0.476744
Income 0.413122 0.1395524 2.960337 0.003899 0.135998719 0.690246
Distance -0.225258 0.17870889 -1.26048 0.210651 -0.580138524 0.129622

For $\beta_1$: t = (-0.007618 - 0)/0.0012553 = -6.069

Suppose $\alpha$ = .01. The critical value is $t_{.005,\,100-6-1} \approx 2.632$.
Since |t| = 6.069 > 2.632, there is sufficient evidence to reject $H_0$ at $\alpha$ = 1%.
The number of rooms is linearly related to the margin.
Interpretation of the regression results for this
model
– The number of hotel and motel rooms, distance
to the nearest motel, the amount of office
space, and the median household income are
linearly related to the operating margin.
– Student enrollment and distance from
downtown are not linearly related to the
margin.
– Preferable locations have few other motels
nearby, much office space, and affluent
surrounding households.
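
The conclusions above can be reproduced from the printout by recomputing each t statistic as b/s_b and comparing the reported p-value to a significance level. The sketch below assumes a 5% level, which is an assumption on my part: the slides only work through the rooms coefficient at 1%, but 5% is the level that yields the four variables listed as linearly related. The dictionary layout and names are illustrative.

```python
ALPHA = 0.05  # assumed significance level (not stated on the slides)

# (coefficient, standard error, p-value) copied from the Excel printout
printout = {
    "Rooms":   (-0.007618, 0.00125527, 2.77e-08),
    "Nearest": (1.646237, 0.63283691, 0.010803),
    "Office":  (0.019766, 0.00341044, 9.24e-08),
    "College": (0.211783, 0.13342794, 0.115851),
    "Income":  (0.413122, 0.1395524, 0.003899),
    "Disttwn": (-0.225258, 0.17870889, 0.210651),
}

# t statistic for H0: beta_i = 0 is the coefficient over its standard error
t_stats = {name: b / se for name, (b, se, _) in printout.items()}

# A variable is flagged when its two-tailed p-value is below ALPHA
significant = [name for name, (_, _, p) in printout.items() if p < ALPHA]

for name, t in t_stats.items():
    flag = "linearly related" if name in significant else "no evidence"
    print(f"{name:8s} t = {t:8.4f}  {flag}")
```

At the 1% level used in the worked example for rooms, the distance to the nearest inn (p = 0.0108) would no longer be flagged, so the choice of level matters for that variable.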
Using the Regression Equation

The model can be used for making predictions by
– Producing a prediction interval estimate of a particular value of y, for given values of xi.
– Producing a confidence interval estimate for the expected value of y, for given values of xi.
The model can be used to learn about relationships between the independent variables xi and the dependent variable y, by interpreting the coefficients $\beta_i$.
Predict the average operating margin of an inn at a
site with the following characteristics:
– 3815 rooms within 3 miles,
– Closest competitor 0.9 miles away,
– 476,000 sq-ft of office space,
– 24,500 college students,
– $35,000 median household income,
– 11.2 miles distance to downtown center.

MARGIN = 38.14 - 0.0076(3815) + 1.646(0.9) + 0.02(476) + 0.212(24.5) + 0.413(35) - 0.225(11.2) ≈ 37.1% (using the unrounded coefficients from the printout)
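
The arithmetic of this prediction can be checked with a short Python sketch. It assumes the coefficients of the sample regression equation with intercept 38.14 (taken at printout precision) and the site values substituted into the equation: 3815 rooms, 0.9 miles, 476 thousand sq-ft, 24.5 thousand students, income of 35 thousand dollars, and 11.2 miles to downtown. The dictionary names are illustrative.

```python
# Coefficients at the precision shown on the Excel printout
coef = {
    "INTERCEPT": 38.13858,
    "ROOMS": -0.007618,
    "NEAREST": 1.646237,
    "OFFICE": 0.019766,
    "COLLEGE": 0.211783,
    "INCOME": 0.413122,
    "DISTTWN": -0.225258,
}

# Site characteristics, in the units used by the data set
site = {
    "ROOMS": 3815,     # rooms within 3 miles
    "NEAREST": 0.9,    # miles
    "OFFICE": 476,     # thousands of sq-ft
    "COLLEGE": 24.5,   # thousands of students
    "INCOME": 35,      # thousands of dollars
    "DISTTWN": 11.2,   # miles
}

# Point prediction: intercept plus the sum of coefficient * value
margin = coef["INTERCEPT"] + sum(coef[v] * x for v, x in site.items())

print(f"Predicted operating margin: {margin:.2f}%")
```

With coefficients rounded to two or three digits the sum drifts by a few tenths of a percentage point, which is why the printout-precision values are used here.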
Interval estimates by Excel (Data analysis plus, not
available) for 95% confidence
Prediction Interval
Margin
Predicted value = 37.09149

Prediction Interval
Lower limit = 25.39527
Upper limit = 48.78771

Interval Estimate of Expected Value


Lower limit = 32.96972
Upper limit = 41.21326
The upper limit of the interval estimate of the expected value (41.21%) is below 50%, so the average inn at such a site would not be profitable.
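
As a consistency check on these limits, the sketch below assumes that Excel used the standard formulas, in which the prediction interval half-width is t·sqrt(s² + se_fit²) and the expected-value half-width is t·se_fit, where se_fit (a name introduced here, not on the slides) is the standard error of the fitted mean. Under that assumption the two half-widths and the standard error of estimate together imply the t critical value that was used.

```python
import math

# Figures quoted from the Excel output
predicted = 37.09149
pi_low, pi_high = 25.39527, 48.78771   # prediction interval
ci_low, ci_high = 32.96972, 41.21326   # interval for the expected value
s = 5.512084                           # standard error of estimate

# Both intervals should be centered on the point prediction
pi_mid = (pi_low + pi_high) / 2
ci_mid = (ci_low + ci_high) / 2

pi_half = (pi_high - pi_low) / 2
ci_half = (ci_high - ci_low) / 2

# pi_half^2 - ci_half^2 = t^2 * s^2 under the assumed formulas
t_implied = math.sqrt(pi_half**2 - ci_half**2) / s

# Implied standard error of the fitted mean at this site
se_fit = ci_half / t_implied

print(f"midpoints: {pi_mid:.5f}, {ci_mid:.5f}")
print(f"implied t critical value: {t_implied:.4f}")
print(f"implied se of fitted mean: {se_fit:.4f}")
```

The implied value can be compared against a t table for 93 degrees of freedom at the 95% confidence level.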
TT = 0.331*PU + 0.124*PE + 0.117*WOM + 0.321*PT

Note: PU: ease of use, PE: perceived usefulness, WOM: electronic word-of-mouth communication, PT: perceived trust
