CH - 03 - Multiple Regression Analysis: Estimation
Chapter 3
© 2016 Cengage Learning ®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or
service or otherwise on a password-protected website or school-approved learning management system for classroom use. © kentoh/Shutterstock.
Multiple Regression
Analysis: Estimation
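● Definition of the multiple linear regression model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
• $y$: dependent variable, explained variable, response variable, …
• $x_1, \dots, x_k$: independent variables, explanatory variables, regressors, …
• $u$: error term, disturbance, unobservables, …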
● Motivation for multiple regression
• Incorporate more explanatory factors into the model
• Explicitly hold fixed other factors that otherwise would be in the error term $u$
• Allow for more flexible functional forms
● Example: Average test scores and per student spending
Other factors
● Example: Family income and family consumption
Other factors
● Example: CEO salary, sales, and CEO tenure
• Dependent variable: log of CEO salary; regressors: log sales and a quadratic function of CEO tenure with the firm
● OLS Estimation of the multiple regression model
● Random sample
● Regression residuals
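As a minimal sketch of OLS estimation, the Python/NumPy code below draws a hypothetical random sample from a model with made-up coefficients. It then computes the estimates that minimize the sum of squared residuals and forms the residuals $\hat u_i = y_i - \hat y_i$. The helper name `ols` and all data-generating values are illustrative assumptions, not part of the slides.

```python
import numpy as np

def ols(X, y):
    """OLS estimates for y = b0 + b1*x1 + ... + bk*xk + u.

    X is an (n, k) array of regressors without a constant column;
    an intercept column is prepended here. Returns (beta_hat, residuals).
    """
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])               # design matrix with intercept
    beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)   # minimizes the sum of squared residuals
    residuals = y - Z @ beta_hat                       # u_hat_i = y_i - y_hat_i
    return beta_hat, residuals

# Hypothetical random sample; the true coefficients are chosen for illustration only.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
u = rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + u

beta_hat, resid = ols(X, y)
print(beta_hat)   # should be close to the true values [1.0, 2.0, -0.5]
```

`lstsq` is used instead of explicitly inverting $X'X$ because it is numerically more stable. It gives the same estimates whenever the regressors are not perfectly collinear.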
● Interpretation of the multiple regression model
● Example: Determinants of college GPA
• Dependent variable: grade point average at college (colGPA); regressors: high school grade point average (hsGPA) and achievement test score (ACT)
● Interpretation
• Holding ACT fixed, one more point of high school grade point average
is associated with .453 more points of college grade point average
• Or: if we compare two students with the same ACT, but the hsGPA
of student A is one point higher, we predict student A to have a
colGPA that is .453 higher than that of student B
• Holding high school grade point average fixed, another 10 points on
ACT are associated with less than one point of college GPA
● Properties of OLS on any sample of data
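The following sketch checks three algebraic properties numerically. They hold for OLS with an intercept on any sample, because they follow from the first-order conditions:
• the residuals average to zero;
• the residuals have zero sample covariance with every regressor;
• the point of means lies on the regression line.
The simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = rng.normal(size=(n, k))                 # hypothetical regressors
y = 0.3 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X])        # intercept + regressors
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ beta_hat

# 1. Residuals average to zero (first-order condition for the intercept).
print(np.isclose(resid.mean(), 0.0))                                      # True
# 2. Residuals are orthogonal to every regressor, so their sample covariance is zero.
print(np.allclose(X.T @ resid, 0.0))                                      # True
# 3. The point of means (x1_bar, ..., xk_bar, y_bar) lies on the OLS regression line.
print(np.isclose(y.mean(), beta_hat[0] + X.mean(axis=0) @ beta_hat[1:]))  # True
```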
● Goodness-of-Fit
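A minimal sketch of the usual goodness-of-fit measure. It assumes the decomposition SST = SSE + SSR, where SSE is the explained sum of squares and SSR the residual sum of squares, so that $R^2 = SSE/SST = 1 - SSR/SST$. The function names `r_squared` and `fitted` and the simulated data are hypothetical. The last line illustrates that adding a regressor, even an irrelevant one, never lowers $R^2$.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SSR/SST for an OLS fit that includes an intercept."""
    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
    ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares
    return 1.0 - ssr / sst

def fitted(y, X):
    """Fitted values from OLS of y on X plus an intercept."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Z @ coef

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x_irrelevant = rng.normal(size=n)          # has no effect on y (hypothetical)
y = 1.0 + 0.8 * x1 + rng.normal(size=n)

y_hat_short = fitted(y, x1)
y_hat_long = fitted(y, np.column_stack([x1, x_irrelevant]))
r2_short, r2_long = r_squared(y, y_hat_short), r_squared(y, y_hat_long)

# With an intercept, SST = SSE + SSR, and R^2 equals the squared correlation of y and y_hat.
sse = np.sum((y_hat_long - y.mean()) ** 2)
print(np.isclose(np.sum((y - y.mean()) ** 2), sse + np.sum((y - y_hat_long) ** 2)))  # True
print(np.isclose(r2_long, np.corrcoef(y, y_hat_long)[0, 1] ** 2))                   # True
print(r2_long >= r2_short)                                                          # True
```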
● Example: Explaining arrest records
• Dependent variable: number of times arrested in 1986; regressors: proportion of prior arrests that led to conviction, months in prison in 1986, quarters employed in 1986
● Interpretation:
• If the proportion of prior arrests that led to conviction increases by 0.5,
the predicted fall in arrests is 7.5 arrests per 100 men
• If the months in prison increase from 0 to 12, the predicted fall in
arrests is 0.408 arrests for a particular man
• If the quarters employed increase by 1, the predicted fall in arrests
is 10.4 arrests per 100 men
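All three interpretations in the arrest example use the rule $\Delta\hat y = \hat\beta_j \Delta x_j$, with the other regressors held fixed. The sketch below reproduces them. The slope values −0.150, −0.034 and −0.104 are backed out from the stated predicted changes (0.075/0.5, 0.408/12 and 0.104); they are not quoted from the fitted equation. The dictionary keys are hypothetical names.

```python
def predicted_change(slopes, deltas):
    """Change in predicted arrests for one man: sum over j of slope_j * delta_x_j.

    Regressors not listed in `deltas` are held fixed (change of zero).
    """
    return sum(slopes[name] * dx for name, dx in deltas.items())

# Hypothetical names; slopes backed out from the interpretation bullets.
slopes = {"prop_conv": -0.150, "months_prison": -0.034, "qtrs_employed": -0.104}

print(round(100 * predicted_change(slopes, {"prop_conv": 0.5}), 1))    # -7.5 (per 100 men)
print(round(predicted_change(slopes, {"months_prison": 12}), 3))       # -0.408 (one man)
print(round(100 * predicted_change(slopes, {"qtrs_employed": 1}), 1))  # -10.4 (per 100 men)
```

Because the model is linear, the changes simply add up when several regressors move at once. For example, passing both `prop_conv` and `qtrs_employed` to `predicted_change` sums the two separate effects.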
● Example: Explaining arrest records (cont.)
• An additional explanatory variable is added:
● Standard assumptions for the multiple regression model
● Standard assumptions for the multiple regression model (cont.)
● Remarks on MLR.3
• The assumption only rules out perfect collinearity/correlation between
explanatory variables; imperfect correlation is allowed
• If an explanatory variable is a perfect linear combination of other
explanatory variables, it is superfluous and may be eliminated
• Constant variables are also ruled out (collinear with intercept)
● Example for perfect collinearity: small sample
In a small sample, avginc may accidentally be an exact multiple of expend; it will not
be possible to disentangle their separate effects because there is exact covariation
Either shareA or shareB will have to be dropped from the regression because there
is an exact linear relationship between them: shareA + shareB = 1
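Perfect collinearity shows up as a design matrix without full column rank, so the OLS normal equations have no unique solution. The sketch below simulates hypothetical shares that satisfy shareA + shareB = 1 exactly, plus an unrelated extra regressor. The names `share_a`, `share_b`, `x_other` and all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
share_a = rng.uniform(size=n)       # hypothetical share for candidate A
share_b = 1.0 - share_a             # shareA + shareB = 1 in every observation
x_other = rng.normal(size=n)        # hypothetical unrelated regressor

# Intercept column = shareA + shareB, so these four columns are linearly dependent.
Z_bad = np.column_stack([np.ones(n), share_a, share_b, x_other])
print(Z_bad.shape[1], np.linalg.matrix_rank(Z_bad))   # 4 3 -> MLR.3 fails

# Dropping shareB restores full column rank; OLS is uniquely defined again.
Z_ok = np.column_stack([np.ones(n), share_a, x_other])
print(Z_ok.shape[1], np.linalg.matrix_rank(Z_ok))     # 3 3
```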
● Standard assumptions for the multiple regression model (cont.)
If avginc were not included in the regression, it would end up in the error term;
it would then be hard to defend that expend is uncorrelated with the error
● Discussion of the zero conditional mean assumption
• Explanatory variables that are correlated with the error term are
called endogenous; endogeneity is a violation of assumption MLR.4
• Explanatory variables that are uncorrelated with the error term are
called exogenous; MLR.4 holds if all explanatory variables are exogenous
• Exogeneity is the key assumption for a causal interpretation of the
regression, and for unbiasedness of the OLS estimators
● Including irrelevant variables in a regression model
● Omitted variable bias
If x1 and x2 are correlated, assume a linear
regression relationship between them
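A minimal simulation of omitted variable bias. It assumes the made-up values $\beta_1 = 2$ and $\beta_2 = 1.5$, and a linear relationship $x_2 = 0.6\,x_1 + \text{noise}$. The code checks the in-sample identity $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$:
• $\tilde\beta_1$ is the slope from the short regression that omits $x_2$;
• $\hat\beta_1, \hat\beta_2$ come from the long regression;
• $\tilde\delta_1$ is the slope from regressing $x_2$ on $x_1$.
This identity is why the short slope is centered near $\beta_1 + \beta_2\delta_1$ rather than $\beta_1$. The helper `ols_coef` is hypothetical.

```python
import numpy as np

def ols_coef(y, *regressors):
    """OLS coefficients (intercept first) of y on the given regressors."""
    Z = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef

rng = np.random.default_rng(4)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                    # hypothetical delta1 = 0.6
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)    # hypothetical beta1 = 2, beta2 = 1.5

b_long = ols_coef(y, x1, x2)    # beta_hat: y on x1 and x2
b_short = ols_coef(y, x1)       # beta_tilde: y on x1 only (x2 omitted)
delta = ols_coef(x2, x1)        # delta_tilde: x2 on x1

# Exact in-sample identity: beta_tilde1 = beta_hat1 + beta_hat2 * delta_tilde1
print(np.isclose(b_short[1], b_long[1] + b_long[2] * delta[1]))   # True
# The short slope is pulled toward beta1 + beta2 * delta1 = 2 + 1.5 * 0.6 = 2.9
print(round(b_short[1], 2), round(b_long[1], 2))
```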
● Example: Omitting ability in a wage equation
If exper is approximately uncorrelated with educ and abil, then the direction
of the omitted variable bias can be analyzed as in the simple two-variable case.
● Standard assumptions for the multiple regression model (cont.)
with
● Theorem 3.2 (Sampling variances of the OLS slope estimators)
Under assumptions MLR.1 – MLR.5:
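For reference, this is the sampling-variance formula conventionally stated in this setting, in its standard textbook form. The notation is assumed here: $SST_j$ is the total sample variation in $x_j$, and $R_j^2$ is the R-squared from regressing $x_j$ on all other independent variables, including an intercept.

```latex
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}, \qquad j = 1, \dots, k,
\qquad \text{where } SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2
```

A larger error variance $\sigma^2$ raises the variance. More sample variation in $x_j$ lowers it, and so does weaker linear dependence between $x_j$ and the other regressors (a smaller $R_j^2$).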
● Components of OLS Variances:
● An example for multicollinearity
The different expenditure categories will be strongly correlated because if a school has a lot
of resources it will spend a lot on everything.
It will be hard to estimate the differential effects of different expenditure categories because
all expenditures are either high or low. For precise estimates of the differential effects, one
would need information about situations where expenditure categories change differentially.
● Discussion of the multicollinearity problem
• In the above example, it would probably be better to lump all
expenditure categories together because their effects cannot be
disentangled
• In other cases, dropping some independent variables may reduce
multicollinearity (but this may lead to omitted variable bias)
• Only the sampling variances of the coefficients on the variables involved
in multicollinearity will be inflated; the estimates of other effects may
be very precise
• Note that multicollinearity is not a violation of MLR.3 in the strict
sense
• Multicollinearity may be detected through “variance inflation factors”
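A minimal sketch of variance inflation factors, using the standard definition $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from an auxiliary regression of $x_j$ on the other regressors. The simulated spending categories `spend_a` and `spend_b` are driven by a common "resources" factor, loosely mimicking the expenditure example. They and the unrelated regressor `other` are hypothetical. Only the correlated pair should show large VIFs.

```python
import numpy as np

def vif(X):
    """Variance inflation factor VIF_j = 1 / (1 - R_j^2) for each column of X.

    R_j^2 comes from regressing column j on all other columns plus an intercept.
    """
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        r2_j = 1.0 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out

rng = np.random.default_rng(5)
n = 400
common = rng.normal(size=n)                   # hypothetical "school resources" factor
spend_a = common + 0.1 * rng.normal(size=n)   # two strongly correlated spending categories
spend_b = common + 0.1 * rng.normal(size=n)
other = rng.normal(size=n)                    # unrelated regressor

v = vif(np.column_stack([spend_a, spend_b, other]))
print(np.round(v, 1))   # large for the two spending columns, near 1 for `other`
```

The VIF of `other` stays near 1, which matches the point above that only the variances of the coefficients on the multicollinear variables are inflated.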
● Variances in misspecified models
• The choice of whether to include a particular variable in a regression
can be made by analyzing the tradeoff between bias and variance
Estimated model 1
Estimated model 2
• It might be the case that the likely omitted variable bias in the
misspecified model 2 is more than offset by its smaller variance
● Variances in misspecified models (cont.)
● Case 2: Trade off bias against variance. Caution: the bias will not vanish even in large samples
● Estimating the error variance
An unbiased estimate of the error variance is obtained by dividing the sum of squared residuals
by the number of observations minus the number of estimated regression coefficients,
n − (k + 1) = n − k − 1. The number of observations minus the number of estimated parameters
is also called the degrees of freedom.
The n estimated squared residuals in the sum are not completely independent but related
through the k + 1 equations that define the first-order conditions of the minimization
problem.
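A small Monte Carlo sketch of why the degrees-of-freedom divisor matters, using hypothetical values n = 20, k = 4 and σ² = 2. Averaged over many samples, SSR/(n − k − 1) is close to σ². SSR/n, by contrast, is close to σ²(n − k − 1)/n = 1.5.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, sigma2 = 20, 4, 2.0             # small n makes the df correction visible
reps = 10_000
X = rng.normal(size=(n, k))           # regressors held fixed across replications
Z = np.column_stack([np.ones(n), X])  # n x (k + 1) design matrix
beta = np.array([1.0, 0.5, -0.5, 2.0, 0.0])

est_df = np.empty(reps)
est_n = np.empty(reps)
for r in range(reps):
    y = Z @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    ssr = np.sum((y - Z @ coef) ** 2)
    est_df[r] = ssr / (n - k - 1)     # divide by degrees of freedom: unbiased
    est_n[r] = ssr / n                # divide by n: biased downward

print(est_df.mean())   # should be close to sigma2 = 2.0
print(est_n.mean())    # should be close to 2.0 * 15 / 20 = 1.5
```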
● Estimation of the sampling variances of the OLS estimators
The true sampling variation of the estimated
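The sketch below computes $se(\hat\beta_j) = \hat\sigma / \sqrt{SST_j (1 - R_j^2)}$, that is, the variance formula with $\hat\sigma^2 = SSR/(n-k-1)$ plugged in for the unknown $\sigma^2$. It relies on the fact that $SST_j(1-R_j^2)$ equals the residual sum of squares from regressing $x_j$ on the other regressors. As a cross-check, it compares the result with the equivalent matrix form $\hat\sigma^2 (X'X)^{-1}$. That matrix form is standard but is an assumption of this sketch, not something shown on the slides. The data are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 120
X = rng.normal(size=(n, 3))
X[:, 1] += 0.8 * X[:, 0]                 # make two regressors correlated
y = 0.5 + X @ np.array([1.0, -1.0, 0.3]) + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X])
k = X.shape[1]
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ coef
sigma_hat = np.sqrt(resid @ resid / (n - k - 1))   # sigma_hat^2 = SSR / (n - k - 1)

def se_slope(j):
    """se(beta_hat_j) = sigma_hat / sqrt(SST_j * (1 - R_j^2)) for slope j = 1..k."""
    xj = X[:, j - 1]
    others = np.column_stack([np.ones(n), np.delete(X, j - 1, axis=1)])
    g, *_ = np.linalg.lstsq(others, xj, rcond=None)
    ssr_j = np.sum((xj - others @ g) ** 2)         # equals SST_j * (1 - R_j^2)
    return sigma_hat / np.sqrt(ssr_j)

# Same numbers from the matrix form sigma_hat^2 * (Z'Z)^{-1} (intercept entry skipped)
se_matrix = np.sqrt(np.diag(sigma_hat**2 * np.linalg.inv(Z.T @ Z)))
print(np.allclose([se_slope(j) for j in range(1, k + 1)], se_matrix[1:]))   # True
```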
● Theorem 3.4 (Gauss-Markov Theorem)
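• Under assumptions MLR.1 – MLR.5, the OLS estimators are the best
linear unbiased estimators (BLUEs) of the regression coefficients,
i.e., for each $j = 0, 1, \dots, k$, $\operatorname{Var}(\hat\beta_j) \le \operatorname{Var}(\tilde\beta_j)$ for every other linear unbiased estimator $\tilde\beta_j$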