Week_8_Multicollinearity
Multiple Regression:
Multicollinearity &
Testing Multiple Linear Restrictions
Collinearity Assumption
• There are no exact linear relationships among
the independent variables
Yi = β0 + β1X1i + β2X2i + β3X3i + ui
e.g., violated if X3i = X1i + 2X2i (an exact linear relationship)
Var(bj) = σ²bj = σ²u / [ Σi (Xji – X̄j)² (1 – Rj²) ]
where Rj² is the proportion of the total variation in Xj
that can be explained by the other independent variables
in the model
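The VIF rule of thumb used under "Detecting Multicollinearity" below drops straight out of this variance expression; a short restatement in standard notation (the derivation itself is not on the original slide):

```latex
% Variance of the OLS estimator b_j, with SST_j = \sum_i (X_{ji} - \bar{X}_j)^2:
\[
  \operatorname{Var}(b_j)
  = \frac{\sigma_u^2}{\mathrm{SST}_j\,(1 - R_j^2)}
  = \frac{\sigma_u^2}{\mathrm{SST}_j}\cdot \mathrm{VIF}_j,
  \qquad
  \mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\]
% Example: R_j^2 = 0.9 gives VIF_j = 10, the usual rule-of-thumb threshold for
% "high" collinearity cited later in these slides.
```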
Consequences
• High correlation among explanatory variables
does not necessarily lead to poor estimates:
– Will still get good estimates if the number of
observations and the variation in the explanatory
variables are high and the variance of the
disturbance term is small
• However, with high collinearity the estimates will be very
sensitive to the model specification and to outliers in the data
Detecting Multicollinearity
• A classic sign of the presence of a high degree
of collinearity is when you have a high R2 but
none of the variables show significant effects
– But it is possible to have high multicollinearity even when
R2 is low
Detecting Multicollinearity
• “High” pairwise correlations among the
independent variables
– High pairwise correlation is a sufficient, but not a
necessary, condition for near-perfect collinearity
• Might be more complex linear relationship among the Xs
• Regress each independent variable against the
other independent variables to see if there are
strong linear dependencies
• Calculate the Variance Inflation Factor (VIF)
– VIF>10 indicative of high collinearity
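Both mechanical checks above (auxiliary regressions and VIFs) can be illustrated with a small statsmodels sketch; the data below are simulated purely for illustration and are not the slides' example:

```python
# Illustrative sketch (simulated data): auxiliary regressions and VIFs with statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 2 * x2 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1 and x2
X = sm.add_constant(np.column_stack([x1, x2, x3])) # columns: const, x1, x2, x3

# Auxiliary regression: x3 on the other regressors; a high R^2 signals near-collinearity
aux = sm.OLS(X[:, 3], X[:, :3]).fit()
print("auxiliary R^2 for x3:", aux.rsquared)

# VIF_j = 1 / (1 - R_j^2); values above roughly 10 are taken as high collinearity
for j, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, variance_inflation_factor(X, j))
```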
Dealing with Multicollinearity
• Worry about it when it affects your results
• Deal with the other factors that affect the
variance of your estimators
– Increase the sample size, increase the sample
variation of Xj, etc.
• It is not a good idea to simply drop one of the variables
– Omitting a relevant variable will bias the remaining estimates
• Test for robustness
Dealing with Multicollinearity
• Bottom line: Multicollinearity is bad, but you will
almost always have some of it
– The standard errors are still correct, just larger
– All else being equal, for estimating βj it is better to
have less correlation between Xj and the other
independent variables
Testing Multiple Linear Restrictions
• The t statistic associated with any OLS coefficient can be used to
test whether the corresponding unknown parameter in the
population is equal to any given constant (usually zero)
Yi = β0 + β1X1i + β2X2i + β3X3i + ui
– This is a test of a hypothesis involving a single linear
restriction
HA0: β1 = 0 HB0: β2 = 0
HA1: β1 ≠ 0 HB1: β2 ≠ 0
• But we might be interested in testing the joint
explanatory power of a set of independent variables
H0: β1 = 0 and β2 = 0
H1: β1 ≠ 0 or β2 ≠ 0
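As a small illustration of the difference, the sketch below runs the two single-restriction t tests and the joint F test with statsmodels on simulated data (the names y, x1, x2, x3 are illustrative only, not from the slides):

```python
# Illustrative sketch: single t tests vs. a joint F test in statsmodels (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "x3": rng.normal(size=n)})
df["y"] = 1.0 + 0.2 * df.x1 + 0.1 * df.x2 + 0.5 * df.x3 + rng.normal(size=n)

res = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print(res.t_test("x1 = 0"))           # single linear restriction: H0: beta1 = 0
print(res.t_test("x2 = 0"))           # single linear restriction: H0: beta2 = 0
print(res.f_test("x1 = 0, x2 = 0"))   # joint restriction: H0: beta1 = 0 and beta2 = 0
```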
Testing Multiple Linear Restrictions
• Testing multiple hypotheses about the
underlying parameters
– Determining whether or not the joint marginal
contribution of a group of variables is significant
– Whether a set of independent variables has no
partial effect on a dependent variable
• The F-test
– Different role in multiple regression analysis
Joint Hypothesis Test
• The null hypothesis is that a set of variables has
no partial effect on a dependent variable
• For example, the model:
Yi = β0 + β1X1i + β2X2i + β3X3i + β4X4i + β5X5i + ui
(this is called the unrestricted model)
Fq, n-k = [(RSSr - RSSur) / q] / [RSSur / (n - k)]
– Where RSSr is the residual sum of squares from the restricted model
– RSSur is the residual sum of squares from the unrestricted model
– q is the number of restrictions
– (n – k) is the denominator degrees of freedom (the degrees of
freedom in the unrestricted model)
Joint Hypothesis Test
Yi = β0 + β1X1i + β2X2i + β3X3i + β4X4i + β5X5i + ui
(unrestricted model)
Yi = β0 + β1X1i + β2X2i + ui (restricted model)
F3, n-6 = [(RSSr - RSSur) / 3] / [RSSur / (n - 6)]
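A minimal sketch of this calculation on simulated data, with a restricted model that imposes β3 = β4 = β5 = 0 (the names x1–x5 are illustrative only):

```python
# Minimal sketch: the F(3, n-6) calculation above, done "by hand" on simulated data.
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame(rng.normal(size=(n, 5)), columns=["x1", "x2", "x3", "x4", "x5"])
df["y"] = 1 + 0.5 * df.x1 + 0.3 * df.x2 + rng.normal(size=n)

unrestricted = smf.ols("y ~ x1 + x2 + x3 + x4 + x5", data=df).fit()   # k = 6 parameters
restricted = smf.ols("y ~ x1 + x2", data=df).fit()                    # imposes b3 = b4 = b5 = 0

q = 3                                            # number of restrictions
df_denom = n - 6                                 # equals unrestricted.df_resid
F = ((restricted.ssr - unrestricted.ssr) / q) / (unrestricted.ssr / df_denom)
print(F, st.f.sf(F, q, df_denom))                # same F as unrestricted.compare_f_test(restricted)
```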
Joint Hypothesis Test
• General model:
Y = β0 + β1X1 + β2X2 + … + βk-1Xk-1 + u
• Suppose we have q exclusion restrictions to
test (null is that q of the variables have zero
coefficients):
H0: βk-q = 0, … βk-1 = 0
• Restricted model:
Y = β0 + β1X1 + … + βk-q-1Xk-q-1 + u
Joint Hypothesis Test
• Reject H0 when F statistic is “sufficiently” large
– depends on the significance level
– F statistic > Fcrit
– Fcrit depends on 10%, 5%, 1% critical values
– If H0 is rejected, we say that Xk-q, … , Xk-1 are jointly
statistically significant at the appropriate
significance level
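One way to look up Fcrit is scipy's F distribution; the degrees of freedom below are illustrative:

```python
# Sketch: looking up F critical values with scipy (degrees of freedom are illustrative).
from scipy.stats import f

q, df_denom = 3, 347                             # q restrictions, n - k denominator df
for alpha in (0.10, 0.05, 0.01):
    print(alpha, f.ppf(1 - alpha, q, df_denom))  # reject H0 at level alpha if F > this value
```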
Example: Testing Multiple Linear
Restrictions
log(salary)i = β0 + β1yearsi + β2gamesyri + β3bavgi +
β4hrunsyri + β5rbisyri + ui , where
• salary = total salary
• years = total years in the major league
• gamesyr = average games played per year
• bavg = career batting average
• hrunsyr = homeruns per year
• rbisyr = runs batted in per year
Example
      Source |       SS       df       MS            Number of obs =     353
-------------+------------------------------         F(  5,   347) =  117.06
       Model |  308.989208     5  61.7978416         Prob > F      =  0.0000
    Residual |  183.186327   347  .527914487         R-squared     =  0.6278
-------------+------------------------------         Adj R-squared =  0.6224
       Total |  492.175535   352  1.39822595         Root MSE      =  .72658
(pairwise correlations among the explanatory variables)

             |    years    games    hruns     rbis     bavg
-------------+---------------------------------------------
       years |   1.0000
       games |   0.9413   1.0000
       hruns |   0.6744   0.7711   1.0000
        rbis |   0.8223   0.9243   0.9320   1.0000
        bavg |   0.1973   0.2674   0.1990   0.2787   1.0000
Example
log(salary)i = β0 + β1yearsi + β2gamesyri + β3bavgi +
β4hrunsyri + β5rbisyri + ui
– H0: β3 = 0, β4 = 0, and β5 = 0
– H1: H0 is not true
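If the underlying data were loaded as a pandas DataFrame (here called df_mlb, a hypothetical name, with the columns listed earlier), the joint test could be run directly with statsmodels:

```python
# Hypothetical sketch: assumes a DataFrame named df_mlb with columns salary, years,
# gamesyr, bavg, hrunsyr, rbisyr (the slides' data are not bundled here).
import numpy as np
import statsmodels.formula.api as smf

res = smf.ols("np.log(salary) ~ years + gamesyr + bavg + hrunsyr + rbisyr",
              data=df_mlb).fit()
print(res.f_test("bavg = 0, hrunsyr = 0, rbisyr = 0"))   # H0: beta3 = beta4 = beta5 = 0
```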
Example
(output from the restricted model, which drops bavg, hrunsyr and rbisyr)

      Source |       SS       df       MS            Number of obs =     353
-------------+------------------------------         F(  2,   350) =  259.32
       Model |  293.864058     2  146.932029         Prob > F      =  0.0000
    Residual |  198.311477   350  .566604221         R-squared     =  0.5971
-------------+------------------------------         Adj R-squared =  0.5948
       Total |  492.175535   352  1.39822595         Root MSE      =  .75273
F3, 347 = [(198.311477 - 183.186327) / 3] / [183.186327 / 347] ≈ 9.55
Fstat > Fcrit(3, 347) = 3.78 at the 1% significance level, so we
reject the null hypothesis that bavg, hrunsyr and rbisyr
have no effect on salary
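As a quick check, the same F statistic can be recomputed from the residual sums of squares reported in the two Stata outputs above:

```python
# Quick check: recompute the example's F statistic from the reported RSS values.
from scipy.stats import f

rss_r, rss_ur = 198.311477, 183.186327    # restricted / unrestricted RSS (Stata output above)
q, df_denom = 3, 347                      # 3 exclusion restrictions; n - k = 353 - 6
F = ((rss_r - rss_ur) / q) / (rss_ur / df_denom)
print(F, f.sf(F, q, df_denom))            # F is roughly 9.55, well above the 1% critical value
```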
F-statistic
• Can also use the F-statistic to test the “overall
significance of the regression”
– The null would be H0: β1 = β2 = … = βk-1 = 0 (the model
has no explanatory power)
– H1: at least one coefficient is different from zero
• In this case, you will have k-1 restrictions
F(k-1, n-k) = [ESS / (k-1)] / [RSS / (n-k)]
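Because ESS/TSS = R², this statistic can also be written in terms of R² alone; plugging in the unrestricted output from the baseball example (R² = 0.6278, k - 1 = 5, n - k = 347) reproduces the reported F (a standard identity, not shown on the original slide):

```latex
\[
  F_{k-1,\,n-k}
  = \frac{\mathrm{ESS}/(k-1)}{\mathrm{RSS}/(n-k)}
  = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}
  \approx \frac{0.6278/5}{0.3722/347}
  \approx 117.06
\]
```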