
Quant II

Multiple Regression:
Multicollinearity &
Testing Multiple Linear Restrictions
Collinearity Assumption
• There are no exact linear relationships among
the independent variables
Yi = β0 + β1X1i + β2X2i + β3X3i + ui
X3 = X1 + 2X2

• Would only be able to estimate 3 coefficients when trying to
estimate 4:
Yi = β0* + β1*X1i + β2*X2i + ui
where β0* = β0, β1* = β1 + β3, and β2* = β2 + 2β3
(The model is not identified; the standard errors are infinite)
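As an illustrative sketch (simulated data, not from the slides), the non-identification shows up directly as rank deficiency of the design matrix:

```python
import numpy as np

# Simulated example of perfect collinearity: X3 = X1 + 2*X2 exactly
rng = np.random.default_rng(0)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = X1 + 2 * X2                       # exact linear dependence

X = np.column_stack([np.ones(n), X1, X2, X3])

# Four columns, but only three are linearly independent, so X'X is
# singular and the four coefficients cannot be separately estimated
print(np.linalg.matrix_rank(X))        # 3, not 4
print(np.linalg.cond(X.T @ X))         # astronomically large (effectively singular)
```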
Collinearity Assumption
• The Gauss-Markov assumptions only require
that there is no perfect collinearity
– In practice, we hardly ever get perfect collinearity
– But we often encounter near-perfect collinearity
• Multicollinearity
Multicollinearity
• Multicollinearity: not perfect but “high”
collinearity between variables
– In a model with two explanatory variables, high
correlation between the two
– In a model with more than two explanatory variables,
it could also be that one variable is nearly a linear
combination of the rest
• Will be harder to get good estimates because it
is harder to discriminate between the individual
effects of the explanatory variables
Different Degrees of Collinearity: None
[scatterplot figure not reproduced]
Multicollinearity
• Multicollinearity does not violate the Gauss-
Markov assumptions
– OLS estimators are still BLUE
– But won’t get as precise estimates as one would
in the absence of multicollinearity
• Harder to produce significant coefficients
Consequences
• Standard errors are valid, but they are larger
than they otherwise would be
• For a given σu2 and sample variation in the Xs,
the smallest variance Var(bj) is obtained when
Xj has zero sample correlation with every other
independent variable
Variance of OLS Estimators
• Two explanatory variables:
Y = β0 + β1X1 + β2X2 + u
• If the G-M assumptions are satisfied, then the
variance of the OLS estimator b1 is:
Var(b1) = σ2b1 = σ2 / [ Σi (X1i − X̄1)2 (1 − r2X1X2) ]
where σ2 is the population variance of u and rX1X2
is the correlation between X1 and X2
Variance of OLS Estimators
• The general linear regression model:
Yi = β0 + β1X1i + β2X2i + β3X3i + … + βk-1Xk-1i + ui

Var(bj) = σ2bj = σ2 / [ Σi (Xji − X̄j)2 (1 − Rj2) ]
where Rj2 is the proportion of the total variation
in Xj that can be explained by the other
independent variables in the model
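A small simulation sketch (my own illustration, not part of the slides) confirms that this formula reproduces the corresponding diagonal element of σ2(X'X)−1:

```python
import numpy as np

# Check that Var(b1) = sigma^2 / [ SST_1 * (1 - R_1^2) ] matches the
# (1,1) element of sigma^2 * (X'X)^{-1} for simulated, correlated regressors
rng = np.random.default_rng(1)
n, sigma2 = 500, 2.0
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + rng.normal(size=n)     # correlated with X1
X = np.column_stack([np.ones(n), X1, X2])

var_b1_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

# R_1^2 from regressing X1 on the other regressors (here: constant and X2)
Z = np.column_stack([np.ones(n), X2])
coef = np.linalg.lstsq(Z, X1, rcond=None)[0]
resid = X1 - Z @ coef
SST1 = np.sum((X1 - X1.mean()) ** 2)
R1_sq = 1 - resid @ resid / SST1
var_b1_formula = sigma2 / (SST1 * (1 - R1_sq))

print(np.isclose(var_b1_matrix, var_b1_formula))   # True
```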
Consequences
• High correlation among explanatory variables
does not necessarily lead to poor estimates:
– Will still get good estimates if the number of
observations and the variation in the explanatory
variables are high and the variance of the
disturbance term is small
• The estimates will be very sensitive to the model
specification and to outliers in the data
Detecting Multicollinearity
• A classic sign of the presence of a high degree
of collinearity is when you have a high R2 but
none of the variables show significant effects
– But it is possible to have high multicollinearity even if
R2 is low
Detecting Multicollinearity
• “High” pairwise correlations among the
independent variables
– High pairwise correlations are a sufficient, but not a
necessary, condition for near-perfect collinearity
• Might be more complex linear relationship among the Xs
• Regress each independent variable against the
other independent variables to see if there are
strong linear dependencies
• Calculate the Variance Inflation Factor (VIF)
– VIF>10 indicative of high collinearity
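A minimal sketch of the VIF calculation, assuming statsmodels is available; the variable names (x1, x2, x3) are placeholders, with x1 and x2 deliberately made nearly collinear:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

# Hypothetical regressors; x2 is constructed to be nearly collinear with x1
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=200), "x3": rng.normal(size=200)})
X["x2"] = 0.9 * X["x1"] + 0.1 * rng.normal(size=200)
X = X[["x1", "x2", "x3"]]

Xc = add_constant(X)                    # include an intercept, as in the regression
vifs = {col: variance_inflation_factor(Xc.values, i)
        for i, col in enumerate(Xc.columns) if col != "const"}
print(vifs)                             # x1 and x2 should show VIF well above 10
```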
Dealing with Multicollinearity
• Worry about it when it affects your results
• Deal with the other factors that affect the
variance of your estimators
– Increase sample size, increase the sample
variation of Xi, etc…
• It is not a good idea to drop one of the variables
– Omitting a relevant variable will generally bias the remaining estimates
• Test for robustness
Dealing with Multicollinearity
• Bottom line: Multicollinearity is bad, but we
almost always have it to some degree
– Standard errors are still correct, just larger
– All else being equal, for estimating βj it is better to
have less correlation between Xj and the other
independent variables
Testing Multiple Linear Restrictions
• The t statistic associated with any OLS coefficient can be used to
test whether the corresponding unknown parameter in the
population is equal to any given constant (usually zero)
Yi = β0 + β1X1i + β2X2i + β3X3i + ui
– This is a test of a hypothesis involving a single linear
restriction
HA0: β1 = 0 HB0: β2 = 0
HA1: β1 ≠ 0 HB1: β2 ≠ 0
• But we might be interested in testing the joint
explanatory power of a set of independent variables
H0: β1 = 0 and β2 = 0
H1: β1 ≠ 0 or β2 ≠ 0
Testing Multiple Linear Restrictions
• Testing multiple hypotheses about the
underlying parameters
– Determining whether or not the joint marginal
contribution of a group of variables is significant
– Whether a set of independent variables has no
partial effect on a dependent variable
• The F-test
– Different role in multiple regression analysis
Joint Hypothesis Test
• The null hypothesis is that a set of variables has
no partial effect on a dependent variable
• For example, the model:
Yi = β0 + β1X1i + β2X2i + β3X3i + β4X4i + β5X5i + ui
(this is called the unrestricted model)

– We are interested in whether X3, X4 and X5 have a
joint effect on Y

Yi = β0 + β1X1i + β2X2i + ui (restricted model)


Joint Hypothesis Test
• Null and research hypothesis:
– H0: β3 = 0, β4 = 0, and β5 = 0
– H1: β3 ≠ 0, β4 ≠ 0, or β5 ≠ 0 [H0 is not true ]
• How should we proceed in testing the null
against the alternative hypothesis?
– Separate t statistics would be misleading
– Need a way to test the exclusion restrictions jointly
Joint Hypothesis Test
• The residual sum of squares can be the basis
for testing multiple hypotheses
– How much the residual sum of squares increases
when we drop X3 , X4 , and X5 tells us something
• RSS never decreases (and in practice almost always
increases) when variables are dropped from the model
– The question is whether this increase is large
enough relative to the RSS in the model with all
the variables to warrant rejecting the null
Joint Hypothesis Test
F statistic = (improvement in fit per extra degree of freedom used up) /
(residual sum of squares remaining per degree of freedom remaining)

Fq, n-k = [ (RSSr − RSSur) / q ] / [ RSSur / (n − k) ]

– Where RSSr is the residual sum of squares from the restricted model
– RSSur is the residual sum of squares from the unrestricted model
– q is the number of restrictions
– (n – k) is the denominator degrees of freedom (the degrees of
freedom in the unrestricted model)
Joint Hypothesis Test
Yi = β0 + β1X1i + β2X2i + β3X3i + β4X4i + β5X5i + ui
(unrestricted model)
Yi = β0 + β1X1i + β2X2i + ui (restricted model)

F3, n-6 = [ (RSSr − RSSur) / 3 ] / [ RSSur / (n − 6) ]
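A sketch of this test in Python, assuming statsmodels and scipy are available; the data are simulated and the regressor names (x1, …, x5) are placeholders, not the slides' example:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Simulated data in which X3, X4, X5 truly have zero coefficients
rng = np.random.default_rng(2)
n = 353
X = rng.normal(size=(n, 5))
y = 1 + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

fit_ur = sm.OLS(y, sm.add_constant(X)).fit()          # unrestricted model
fit_r = sm.OLS(y, sm.add_constant(X[:, :2])).fit()    # restricted: drop X3, X4, X5

q, k = 3, 6                                           # 3 restrictions, 6 parameters
F = ((fit_r.ssr - fit_ur.ssr) / q) / (fit_ur.ssr / (n - k))
p_value = stats.f.sf(F, q, n - k)
print(F, p_value)                 # should not reject here, since the true betas are 0

# The same exclusion test via statsmodels' built-in F test
print(fit_ur.f_test("(x3 = 0), (x4 = 0), (x5 = 0)"))
```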
Joint Hypothesis Test
• General model:
Y = β0 + β1X1 + β2X2 + … + βk-1Xk-1 + u
• Suppose we have q exclusion restrictions to
test (null is that q of the variables have zero
coefficients):
H0: βk-q = 0, … βk-1 = 0
• Restricted model:
Y = β0 + β1X1 + … + βk-q-1Xk-q-1 + u
Joint Hypothesis Test
• Reject H0 when F statistic is “sufficiently” large
– depends on the significance level
– F statistic > Fcrit
– Fcrit is the 10%, 5%, or 1% critical value of the F(q, n-k) distribution, depending on the chosen level
– If H0 is rejected, we say that Xk-q, … , Xk-1 are jointly
statistically significant at the appropriate
significance level
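As a sketch, the critical values can be looked up from the F distribution (here with scipy, for q = 3 restrictions and 347 denominator degrees of freedom, the values relevant to the example that follows):

```python
from scipy import stats

# Critical values F_crit for an F(q, n-k) test at common significance levels
q, df_denom = 3, 347
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(stats.f.ppf(1 - alpha, q, df_denom), 2))
# Reject H0 when the computed F statistic exceeds F_crit
# (roughly 3.8 at the 1% level for these degrees of freedom)
```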
Example: Testing Multiple Linear
Restrictions
log(salary)i = β0 + β1yearsi + β2gamesyri + β3bavgi +
β4hrunsyri + β5rbisyri + ui , where
• salary = total salary
• years = total years in the major league
• gamesyr = average games played per year
• bavg = career batting average
• hrunsyr = homeruns per year
• rbisyr = runs batted in per year
Example
Source SS df MS Number of obs = 353
F( 5, 347) = 117.06
Model 308.989208 5 61.7978416 Prob > F = 0.0000
Residual 183.186327 347 .527914487 R-squared = 0.6278
Adj R-squared = 0.6224
Total 492.175535 352 1.39822595 Root MSE = .72658

logsalary Coef. Std. Err. t P>|t| [95% Conf. Interval]

years .0688626 .0121145 5.68 0.000 .0450355 .0926898
gamesyr .0125521 .0026468 4.74 0.000 .0073464 .0177578
bavg .0009786 .0011035 0.89 0.376 -.0011918 .003149
hrunsyr .0144295 .016057 0.90 0.369 -.0171518 .0460107
rbisyr .0107657 .007175 1.50 0.134 -.0033462 .0248776
_cons 11.19242 .2888229 38.75 0.000 10.62435 11.76048
Example

years games hruns rbis bavg

years 1.0000
games 0.9413 1.0000
hruns 0.6744 0.7711 1.0000
rbis 0.8223 0.9243 0.9320 1.0000
bavg 0.1973 0.2674 0.1990 0.2787 1.0000
Example
log(salary)i = β0 + β1yearsi + β2gamesyri + β3bavgi +
β4hrunsyri + β5rbisyri + ui

– Suppose that we want to test the null hypothesis that once
years in the league and games per year are controlled for,
the statistics measuring performance (batting average,
homeruns and RBIs) have no effect on salary

– H0: β3 = 0, β4 = 0, and β5 = 0
– H1: H0 is not true
Example
Source SS df MS Number of obs = 353
F( 2, 350) = 259.32
Model 293.864058 2 146.932029 Prob > F = 0.0000
Residual 198.311477 350 .566604221 R-squared = 0.5971
Adj R-squared = 0.5948
Total 492.175535 352 1.39822595 Root MSE = .75273

logsalary Coef. Std. Err. t P>|t| [95% Conf. Interval]

years .071318 .012505 5.70 0.000 .0467236 .0959124
gamesyr .0201745 .0013429 15.02 0.000 .0175334 .0228156
_cons 11.2238 .108312 103.62 0.000 11.01078 11.43683
Example
Fq, n-k = [ (RSSr − RSSur) / q ] / [ RSSur / (n − k) ]

= [ (198.311 − 183.186) / 3 ] / [ 183.186 / 347 ] ≈ 9.55
Fstat = 9.55 > Fcrit(3, 347) at the 1% significance level = 3.78, so we
reject the null hypothesis that bavg, hrunsyr and rbisyr
have no effect on salary
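The arithmetic, using the residual sums of squares reported in the two regression tables above:

```python
# RSS from the restricted and unrestricted regressions reported above
RSS_r, RSS_ur = 198.311477, 183.186327
q, n, k = 3, 353, 6

F = ((RSS_r - RSS_ur) / q) / (RSS_ur / (n - k))
print(round(F, 2))        # about 9.55, well above the 1% critical value of 3.78
```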
F-statistic
• Can also use the F-statistic to test the “overall
significance of the regression”
– Null would be: H0: β1 = β2 = … = βk-1 = 0 (the model has
no explanatory power)
– H1: at least one is different from zero
• In this case, you will have k-1 restrictions
F(k-1, n-k) = [ ESS / (k-1) ] / [ RSS / (n-k) ]
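As a check (my own arithmetic), plugging the Model (explained) and Residual sums of squares from the unrestricted output above into this formula reproduces the F statistic reported there:

```python
# Overall-significance F from the explained (Model) and residual SS
# reported in the unrestricted regression output above
ESS, RSS = 308.989208, 183.186327
k_minus_1, n_minus_k = 5, 347

F = (ESS / k_minus_1) / (RSS / n_minus_k)
print(round(F, 2))        # about 117.06, matching "F(5, 347) = 117.06" in the output
```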
