
Econ 3334: Introduction to Econometrics

Omitted variable bias and control variables

1
Review the causal analysis framework in topic 4
• 𝑋𝑖 : individual i’s received treatment (such as years of education).
• 𝑌𝑖 : individual i’s outcome measure (such as earnings).
• 𝑌𝑥𝑖 : potential outcome if individual i receives treatment 𝑥.
• Assume that the potential outcome is linear in 𝑥, where the treatment 𝑥 can take the values 𝑥1 , 𝑥2 , … , 𝑥𝑀 .
𝑌𝑥𝑖 = 𝛽0 + 𝛽1 𝑥 + 𝑒𝑖 , 𝐸 (𝑒𝑖 ) = 0.
• The causal effect of the treatment on the outcome is defined as 𝛽1 = Δ𝑌𝑥𝑖 / Δ𝑥.
One more year of education changes earnings by 𝛽1 units.
• Note that when 𝑥 = 𝑋𝑖 , 𝑌𝑥𝑖 = 𝑌𝑖 : 𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝑒𝑖
• In general, 𝐸 (𝑒𝑖 |𝑋𝑖 ) ≠ 0.
➢ Think of 𝑒𝑖 as the innate ability of person i, which determines
person i’s potential outcomes.
➢ On average, those with more education tend to have higher ability:
𝐸 (𝑒𝑖 |𝑋𝑖 = 16) > 𝐸 (𝑒𝑖 |𝑋𝑖 = 9).
➢ The causal effect is 𝛽1 × 7. But the measured difference in means is
𝐸(𝑌𝑖 |𝑋𝑖 = 16) − 𝐸(𝑌𝑖 |𝑋𝑖 = 9) = 7𝛽1 + 𝐸(𝑒𝑖 |𝑋𝑖 = 16) − 𝐸(𝑒𝑖 |𝑋𝑖 = 9) > 7𝛽1
2
Omitted variable bias, the theory
𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝑢
• The error u includes all omitted variables, other than X, that
influence Y. ( u might also reflect heterogeneous causal effect.)
• There are always omitted variables.
• If there are omitted variables that are correlated with X, then LSA#1
is violated and the OLS estimators converge in probability to the
causal parameter plus a bias. The bias is called “omitted variable
bias.”
• Recall that 𝑐𝑜𝑣(𝑌, 𝑋) = 𝑐𝑜𝑣(𝛽0 + 𝛽1 𝑋 + 𝑢, 𝑋) = 𝛽1 𝑣𝑎𝑟(𝑋) + 𝑐𝑜𝑣(𝑢, 𝑋)
• Then
𝛽̂1 = 𝑠𝑌𝑋 / 𝑠𝑋² →ᵖ 𝑐𝑜𝑣(𝑌, 𝑋)/𝑣𝑎𝑟(𝑋) = [𝛽1 𝑣𝑎𝑟(𝑋) + 𝑐𝑜𝑣(𝑢, 𝑋)]/𝑣𝑎𝑟(𝑋) = 𝛽1 + 𝑐𝑜𝑣(𝑢, 𝑋)/𝑣𝑎𝑟(𝑋)
• Equivalently,
𝛽̂1 − 𝛽1 →ᵖ 𝑐𝑜𝑣(𝑢, 𝑋)/𝑣𝑎𝑟(𝑋) ≡ “omitted variable bias”
• There is a downward bias if cov(u,X)<0, and upward bias if cov(u,X)>0.
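As a sanity check on the result above, here is a minimal simulation sketch (not part of the original slides; all numbers and variable names are illustrative) in which the error is positively correlated with the regressor, so the OLS slope converges to 𝛽1 plus an upward bias:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1 = 1.0, 2.0

z = rng.normal(size=n)                    # omitted factor
x = z + rng.normal(size=n)                # X is correlated with the omitted factor
u = 0.5 * z + rng.normal(size=n)          # so cov(u, X) = 0.5 > 0
y = beta0 + beta1 * x + u

beta1_hat = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)   # OLS slope = s_YX / s_X^2
bias = np.cov(u, x, ddof=1)[0, 1] / np.var(x, ddof=1)        # cov(u, X)/var(X), about 0.25
print(beta1_hat, beta1 + bias)            # both about 2.25: upward omitted variable bias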

3
The TestScore-STR example
𝑇𝑒𝑠𝑡𝑆𝑐𝑜𝑟𝑒 = 𝛽0 + 𝛽1 𝑆𝑇𝑅 + 𝑢
• PctEL has a negative effect on TestScore, and thus enters u with a
negative sign
𝑢 = 𝛾𝑃𝑐𝑡𝐸𝐿 + 𝑣, 𝛾<0
• PctEL is positively correlated with STR
𝑐𝑜𝑣(𝑃𝑐𝑡𝐸𝐿, 𝑆𝑇𝑅) > 0
So
cov(u, STR) < 0
• Then there is a downward bias.
• The OLS estimators: 𝛽̂1 − 𝛽1 < 0 in large samples.

4
Omitted variables that satisfy LSA#1

𝐼𝑛𝑓𝑒𝑐𝑡𝑖𝑜𝑛𝑖 = 𝛽0 + 𝛽1 𝑉𝑎𝑐𝑐𝑖𝑛𝑒𝑖 + 𝑢𝑖

𝐼𝑛𝑓𝑒𝑐𝑡𝑖𝑜𝑛𝑖 = 1 if person 𝑖 got infected, 0 otherwise.
𝑉𝑎𝑐𝑐𝑖𝑛𝑒𝑖 = 1 if person 𝑖 receives the vaccine, 0 if person 𝑖 receives the placebo.

𝑢𝑖 might include person 𝑖’s age, original health condition, individual treatment effects, etc.
The vaccine status is unknown to the subjects, so it will not generate systematic differences in behavior between the treatment and control groups.
If the vaccine and placebo are randomly assigned,
E(ui |𝑉𝑎𝑐𝑐𝑖𝑛𝑒𝑖 ) = 0
So LSA#1 holds.
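A minimal simulation sketch of this point (illustrative only; the infection probabilities and the age effect below are made up):

import numpy as np

rng = np.random.default_rng(1)
n = 200_000
age = rng.uniform(20, 70, size=n)                  # an omitted determinant inside u_i
vaccine = rng.integers(0, 2, size=n)               # randomly assigned treatment dummy

# Infection risk depends on the vaccine AND on age, but age is independent of
# the (randomized) vaccine status, so E(u_i | Vaccine_i) = 0.
p_infect = 0.30 - 0.20 * vaccine + 0.002 * (age - 45)
infection = (rng.random(n) < p_infect).astype(float)

# With a single dummy regressor, the OLS slope is the difference in means.
beta1_hat = infection[vaccine == 1].mean() - infection[vaccine == 0].mean()
print(beta1_hat)                                   # close to the causal effect -0.20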

5
Three ways to overcome omitted variable bias
1. Run a randomized controlled experiment in which treatment (STR) is
randomly assigned: then PctEL is still a determinant of TestScore, but
PctEL is uncorrelated with STR. (This solution is not feasible.)

2. Adopt a “cross tabulation” approach (matching) – within each group, all classes have about the same PctEL, so we control for PctEL. (An intuitive method, but it becomes increasingly complicated when other determinants like family income and parental education are involved.)

3. Use a regression in which the omitted variable (PctEL) is no longer omitted: include PctEL as an additional regressor in a multiple regression.

6
Difference in means: holding constant omitted factors

• Among districts with comparable PctEL, the effect of class size is smaller than the overall “test score gap” of 7.4.

7
The conditional independence assumption and control variables
• Conditional Independence Assumption: the treatment Xi is
independent of the potential outcomes Yxi conditional on Zi .
• Given that 𝑌𝑥𝑖 = 𝛽0 + 𝛽1 𝑥 + 𝑒𝑖 , the above assumption implies Xi is
independent of ei conditional on Zi . This implies
𝐸 (𝑒𝑖 |Xi , 𝑍𝑖 ) = 𝐸(𝑒𝑖 |𝑍𝑖 )
• We can always decompose a random variable as follows
ei = 𝐸 (𝑒𝑖 |Xi , 𝑍𝑖 ) + (𝒆𝒊 − 𝑬(𝒆𝒊 |𝐗 𝐢 , 𝒁𝒊 )) = 𝐸 (𝑒𝑖 |Xi , 𝑍𝑖 ) + 𝒖𝒊
where ui = 𝑒𝑖 − 𝐸 (𝑒𝑖 |Xi , 𝑍𝑖 ) and E(ui |𝑋𝑖 , 𝑍𝑖 ) = 0 by definition.
• Assume 𝐸 (𝑒𝑖 |𝑍𝑖 ) = 𝛾0 + 𝛾1 𝑍𝑖 , then
𝑬(𝒆𝒊 |𝐗 𝐢 , 𝒁𝒊 ) = 𝐸 (𝑒𝑖 |𝑍𝑖 ) = 𝜸𝟎 + 𝜸𝟏 𝒁𝒊 .
• Then ei = 𝛾0 + 𝛾1 𝑍𝑖 + 𝑢𝑖 , 𝑤𝑖𝑡ℎ 𝐸 (𝑢𝑖 |𝑋𝑖 , 𝑍𝑖 ) = 0.
• The original linear causal model becomes a linear regression model
with an additional regressor:
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝑒𝑖 = (𝛽0 + 𝛾0 ) + 𝛽1 𝑋𝑖 + 𝛾1 𝑍𝑖 + 𝑢𝑖 , 𝐸 (𝑢𝑖 |𝑋𝑖 , 𝑍𝑖 ) = 0
• In the new model, Zi is called the “control variable”.
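The algebra above can be checked with a short simulation (a sketch, not part of the slides; the coefficient values are illustrative): the short regression of Y on X alone is biased, while adding the control Z recovers 𝛽1.

import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta0, beta1, gamma0, gamma1 = 1.0, 2.0, 0.5, 1.5

z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)                  # X depends on Z
e = gamma0 + gamma1 * z + rng.normal(size=n)      # E(e | Z) = gamma0 + gamma1*Z
y = beta0 + beta1 * x + e

ones = np.ones(n)
short = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)[0]
long = np.linalg.lstsq(np.column_stack([ones, x, z]), y, rcond=None)[0]

print(short[1])   # about 2.73: beta1 plus the bias gamma1*cov(Z,X)/var(X)
print(long[1])    # about 2.00: the causal beta1, once Z is included as a control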
8
Example of Conditional Independence: Project STAR
• Project STAR (Student-Teacher Achievement Ratio)
• 11,600 kindergartners in 1985-86. Study ran for 4 years until the
original cohort was in 3rd grade.
– Cost $12 million.
• Upon entering the school system, a student was randomly assigned
to one of three groups within the school:
– regular class (22 – 25 students, no aide)
– regular class + aide (with a full-time aide)
– small class (13 – 17 students)
• Y = Stanford Achievement Test scores
• Teachers were also randomly assigned within a school.

9
The Data Structure of STAR
• A random sample, or an i.i.d. sample. Use the data structure of STAR as an example:
{(𝑌𝑖 , 𝑋𝑖 , 𝑆𝑖 )}, 𝑖 = 1, … , 300
• The observed variables are:
– 𝑌𝑖 : student 𝑖’s test score.
– 𝑋𝑖 : a dummy indicating whether student 𝑖 was assigned to a small class (the treatment).
– 𝑆𝑖 : a factor that indicates which school student 𝑖 belongs to.
Assume that there are three schools: 𝑆𝑖 = 1,2,3.
– We may think of 𝑆𝑖 in terms of 3 dummies: 𝑆1𝑖 , 𝑆2𝑖 , 𝑆3𝑖 .
• The causal model
𝑌𝑖 = 𝑐 + 𝛽𝑋𝑖 + 𝑢𝑖
where 𝑢𝑖 is the causal error, which includes other determinants of scores
and also possibly reflects heterogeneous treatment effects.

10
The Implied Linear Regression
• The treatment of class sizes is randomly assigned within a school but
not between schools:
– 𝑋𝑖 is independent of the potential scores (𝑌1𝑖 , 𝑌0𝑖 ), conditional
on (𝑆1𝑖 , 𝑆2𝑖 )
– Conditional independence implies conditional mean
independence between 𝑋𝑖 and the causal error 𝑢𝑖 :
𝐸 (𝑢𝑖 |𝑋𝑖 , 𝑆1𝑖 , 𝑆2𝑖 ) = 𝐸 (𝑢𝑖 |𝑆1𝑖 , 𝑆2𝑖 )
– We have two dummies that span all three school effects. This
is the so called “saturated model”: the conditional mean must be
a linear function of the set of dummies:
𝑢𝑖 = 𝑎 + 𝛾1 𝑆1𝑖 + 𝛾2 𝑆2𝑖 + 𝑒𝑖 , 𝐸 (𝑒𝑖 |𝑋𝑖 , 𝑆1𝑖 , 𝑆2𝑖 ) = 0.
• This implies a linear regression model with school fixed effects as
controls:
𝑌𝑖 = 𝛽0 + 𝛽𝑋𝑖 + 𝛾1 𝑆1𝑖 + 𝛾2 𝑆2𝑖 + 𝑒𝑖 , 𝐸 (𝑒𝑖 |𝑋𝑖 , 𝑆1𝑖 , 𝑆2𝑖 ) = 0
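A sketch of how such a regression could be run in Python with statsmodels (an assumption about the software, not something the slides prescribe; the data below are simulated and the school effects are made up):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 300
school = rng.integers(1, 4, size=n)                       # S_i in {1, 2, 3}
# Small classes are assigned randomly WITHIN a school, but the assignment
# probability (and average achievement) differs ACROSS schools.
p_small = np.select([school == 1, school == 2, school == 3], [0.2, 0.5, 0.8])
small = (rng.random(n) < p_small).astype(float)           # X_i: small-class dummy
school_effect = np.select([school == 1, school == 2, school == 3], [0.0, 5.0, 10.0])
score = 650 + 8.0 * small + school_effect + rng.normal(scale=10, size=n)

df = pd.DataFrame({"score": score, "small": small, "school": school})
# C(school) creates the school dummies; one category is dropped automatically,
# which avoids the dummy variable trap discussed later in these slides.
fit = smf.ols("score ~ small + C(school)", data=df).fit()
print(fit.params)                                         # coefficient on small is close to 8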

11
The Population Multiple Regression Model
Consider the case of two regressors:
Yi = 0 + 1X1i + 2X2i + ui, i = 1,…,n

• Y is the dependent variable


• X1, X2 are the two regressors
• (Yi, X1i, X2i) denote the ith observation on Y, X1, and X2.
• β0 = unknown population intercept
• β1 = partial effect of a change in X1 on Y, holding X2 constant
• ui = the regression error (omitted variables)

12
Interpretation of coefficients in multiple regression
Yi = 0 + 1X1i + 2X2i + ui, i = 1,…,n, E(ui |𝑋1𝑖 , 𝑋2𝑖 ) = 0
Consider changing X1 by X1 while holding X2 constant:

Population regression line before the change:


E(Y|X1 , 𝑋2 ) = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2
Population regression line, after the change:
E(Y|X1 + Δ𝑋1 , 𝑋2 ) = 𝛽0 + 𝛽1 (𝑋1 + Δ𝑋1 ) + 𝛽2 𝑋2

Difference: ΔY = E(Y|X1 + Δ𝑋1 , 𝑋2 ) − E(Y|X1 , 𝑋2 ) = 𝛽1 Δ𝑋1

Y
1 = , holding X2 constant
X 1

13
The OLS Estimator in Multiple Regression
Regression of TestScore against STR:

TestScore = 698.9 – 2.28STR

Now include percent English Learners in the district (PctEL):

TestScore= 686.0 – 1.10STR – 0.65PctEL

• What happens to the coefficient on STR?


• Why? (Note: corr(STR, PctEL) = 0.19) Does the regression match your
intuition of the direction of bias? (what is u in the first regression?)
Interpretation of “-1.10”?
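For reference, these regressions could be reproduced in Python roughly as follows (a sketch only: the file name caschool.csv is an assumption, while the column names testscr, str and el_pct match the gretl output shown later in these slides):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("caschool.csv")          # hypothetical file containing testscr, str, el_pct

short = smf.ols("testscr ~ str", data=df).fit()
long = smf.ols("testscr ~ str + el_pct", data=df).fit()

print(short.params)   # roughly 698.9 and -2.28, as in the fitted lines above
print(long.params)    # roughly 686.0, -1.10 and -0.65 once el_pct is included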

14
What’s special about multiple regression?

• R² and R̄² (adjusted R²)
• One more least square assumption: LSA #4 (no perfect
multicollinearity)
• F-test (Wald test): testing the joint significance of two or more regression coefficients; testing restrictions on regression coefficients. (Linking the F-statistic and the t-statistic when there is only one restriction in H0.)
Motivation: imperfect multicollinearity…

Remark: other than the above three, the theory of multiple regression
is the same as that of simple regression with a single regressor.

15
Measures of Fit for Multiple Regression
Actual = predicted + residual: 𝑌𝑖 = 𝑌̂𝑖 + 𝑢̂𝑖

SER = √[ ∑ᵢ₌₁ⁿ 𝑢̂𝑖² / (n − k − 1) ],   RMSE = √[ ∑ᵢ₌₁ⁿ 𝑢̂𝑖² / n ]

where k is the number of regressors.

• RMSE = std. deviation of 𝑢̂𝑖 (without a degrees-of-freedom adjustment)

• R² = fraction of the variance of Y explained by X

• R̄² = “adjusted R²” = R² with a degrees-of-freedom adjustment
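A minimal sketch of these formulas in code (illustrative; it assumes the residuals û and the dependent variable Y are already available as NumPy arrays):

import numpy as np

def fit_measures(y, uhat, k):
    """SER, RMSE, R2 and adjusted R2 for a regression with k regressors."""
    n = len(y)
    ssr = np.sum(uhat ** 2)                     # sum of squared residuals
    tss = np.sum((y - y.mean()) ** 2)           # total sum of squares
    ser = np.sqrt(ssr / (n - k - 1))            # with degrees-of-freedom adjustment
    rmse = np.sqrt(ssr / n)                     # without the adjustment
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss
    return ser, rmse, r2, adj_r2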

16
R²
R² always increases after adding additional regressors.
By adding additional regressors, the SSR becomes smaller:
∑ᵢ₌₁ⁿ (𝑌𝑖 − 𝑏̂0 − 𝑏̂1 𝑋1𝑖 − 𝑏̂2 𝑋2𝑖)² ≤ ∑ᵢ₌₁ⁿ (𝑌𝑖 − 𝛽̂0 − 𝛽̂1 𝑋1𝑖)²

By the definition of OLS,
∑ᵢ₌₁ⁿ (𝑌𝑖 − 𝑏̂0 − 𝑏̂1 𝑋1𝑖 − 𝑏̂2 𝑋2𝑖)² ≡ min over {𝑏0 , 𝑏1 , 𝑏2 } of ∑ᵢ₌₁ⁿ (𝑌𝑖 − 𝑏0 − 𝑏1 𝑋1𝑖 − 𝑏2 𝑋2𝑖)² ≤ ∑ᵢ₌₁ⁿ (𝑌𝑖 − 𝛽̂0 − 𝛽̂1 𝑋1𝑖 + 0 ⋅ 𝑋2𝑖)²

R² = 1 − SSR/TSS becomes larger when adding more regressors. (TSS is the same for all regression models.)

17
R̄²
R̄² (“adjusted R²”) makes some adjustment by “penalizing” the regression with more regressors – the R̄² does not necessarily increase when adding additional regressors.

Adjusted R²:   R̄² = 1 − [(n − 1)/(n − k − 1)] ⋅ SSR/TSS

• When 𝑘 ≥ 1, for the same regression,
R̄² < R²
• For a regression with only the intercept,
R̄² = R² = 0 (in this case k = 0).
• If n is large,
𝑅̄² ≈ 𝑅².

18
An example of R² and R̄²

(1) TestScore = 698.9 – 2.28 STR,
R² = .051, R̄² = .049, SER = 18.58, n = 420

(2) TestScore = 686.0 – 1.10 STR – 0.65 PctEL,
R² = .426, R̄² = .424, SER = 14.46, n = 420

(3) TestScore = 654.2,
R² = 0, R̄² = 0, SER = 19.05, n = 420

19
The Least Squares Assumptions for Multiple Regression

Yi = 0 + 1X1i + 2X2i + … + kXki + ui, i = 1,…,n

1. E(ui|X1i = x1,…, Xki = xk) = 0, for all x1,…,xk


2. (X1i,…,Xki,Yi), i =1,…,n, are i.i.d.
3. Large outliers are unlikely: X1,…, Xk, and Y have finite fourth moments: E(𝑋1𝑖⁴) < ∞, …, E(𝑋𝑘𝑖⁴) < ∞, E(𝑌𝑖⁴) < ∞.
4. There is no perfect multicollinearity.

20
Perfect multicollinearity

Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors.
• Perfect multicollinearity is a modeling error.
• If 𝑋2 = 𝑎 + 𝑏𝑋1 , then there is a logical error when interpreting 𝛽1
𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝑢
= 𝛽0 + 𝛽1 𝑋1 + 𝛽2 (𝑎 + 𝑏𝑋1 ) + 𝑢
= (𝛽0 + 𝛽2 𝑎) + (𝛽1 + 𝛽2 𝑏)𝑋1 + 0 ⋅ 𝑋2 + 𝑢
• The logical error: is 𝛽1 𝑜𝑟 𝛽1 + 𝛽2 𝑏 the true effect of 𝑋1 on Y, holding
constant 𝑋2 ?

• Solution: remove the redundant regressor from the regression; the regression software will automatically do this.
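One way to see the problem numerically (a sketch, not the slides' method): with 𝑋2 = 𝑎 + 𝑏𝑋1 the design matrix loses a column of rank, which is what regression software detects before dropping the redundant regressor.

import numpy as np

rng = np.random.default_rng(4)
n = 420
x1 = rng.normal(size=n)
x2 = 1 + 2 * x1                                 # exact linear function of x1

X = np.column_stack([np.ones(n), x1, x2])       # constant, x1, x2
print(np.linalg.matrix_rank(X))                 # 2, not 3: perfect multicollinearity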

21
An example of perfect multicollinearity
• Generate a new variable
STR2=2*STR+1
• Regress TestScore on STR and STR2 and PctEL

Model: OLS, using observations 1-420


Dependent variable: testscr
Omitted due to exact collinearity: str2

coefficient std. error t-ratio p-value


------------------------------------------------------------------------
const 686.032 7.41131 92.57 3.87e-280 ***
str −1.10130 0.380278 −2.896 0.0040 ***
el_pct −0.649777 0.0393425 −16.52 1.66e-047 ***

22
The Sampling Distribution of the OLS Estimator
Under the four Least Squares Assumptions,
• The sampling distribution of ˆ1 has mean 1
𝐸(𝛽̂1 ) = 𝛽1
• var( ˆ1 ) is inversely proportional to n.
• For large n
p
o ˆ1 is consistent: ˆ1 → 1 (law of large numbers)
̂1 −𝛽1
𝛽
o ̂1 ) is approximately N(0,1) (CLT)
𝑆𝐸(𝛽
o These statements hold for all 𝛽̂𝑗 , 𝑗 = 0,1, … , 𝑘

Conceptually, there is nothing new here!

23
The dummy variable trap
• Dummy variable trap is a special example of perfect multicollinearity.
• Consider a set of dummy variables, which are mutually exclusive and
exhaustive: there are multiple categories and every observation falls
in one and only one category.
Consider the four dummies for a college student:
Freshmen+Sophomores+Juniors+Seniors =1
• If your regression includes all these dummy variables and a constant,
you will have perfect multicollinearity – this is called the dummy
variable trap. Solutions:
1. Omit one of the groups (e.g. Seniors), or
2. Omit the intercept

24
An example of dummy variable trap
𝑀𝑖 = 1 if 𝑖 is a man, 0 else;   𝑊𝑖 = 1 if 𝑖 is a woman, 0 else.
• 𝑀𝑖 and 𝑊𝑖 are mutually exclusive: an individual cannot be both a man
and a woman.
• 𝑀𝑖 and 𝑊𝑖 are exhaustive: an individual must be either a man or a
woman
• In sum: 𝑀𝑖 + 𝑊𝑖 = 1 for all i.
• Consider the regression model
𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1 𝑀𝑖 + 𝛽2 𝑊𝑖 + 𝑢𝑖
𝐸 (𝑊𝑎𝑔𝑒𝑖 |𝑖 𝑖𝑠 𝑎 𝑚𝑎𝑛) = 𝛽0 + 𝛽1
𝐸 (𝑊𝑎𝑔𝑒𝑖 |𝑖 𝑖𝑠 𝑎 𝑤𝑜𝑚𝑎𝑛) = 𝛽0 + 𝛽2
• Cannot separately interpret 𝛽0 , 𝛽1 , 𝛽2 .

25
Two correct models:
(1) 𝑊𝑎𝑔𝑒𝑖 = 𝛽0 + 𝛽1 𝑀𝑖 + 𝑢𝑖
(2) 𝑊𝑎𝑔𝑒𝑖 = 𝑏1 𝑀𝑖 + 𝑏2 𝑊𝑖 + 𝑢𝑖

The coefficients in (1) and (2) have different interpretations.


In (1), the wage difference between men and women is 𝛽1 :
𝐸 (𝑤𝑎𝑔𝑒|𝑀 = 0) = 𝛽0 , 𝐸 (𝑤𝑎𝑔𝑒|𝑀 = 1) = 𝛽0 + 𝛽1 .
In (2), the wage difference between men and women is 𝑏1 − 𝑏2 :
𝐸 (𝑤𝑎𝑔𝑒|𝑀 = 0) = 𝑏2 , 𝐸 (𝑤𝑎𝑔𝑒|𝑀 = 1) = 𝑏1 .

In sum, we have
𝛽0 = 𝑏2 , 𝛽1 = 𝑏1 − 𝑏2 .
The OLS estimators should follow the same relations
𝛽̂0 = 𝑏̂2 , 𝛽̂1 = 𝑏̂1 − 𝑏̂2 .
The corresponding standard errors are also the same:
𝑆𝐸(𝛽̂0 ) = 𝑆𝐸(𝑏̂2 ), 𝑆𝐸(𝛽̂1 ) = 𝑆𝐸(𝑏̂1 − 𝑏̂2 ).
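These relations can be verified with a quick simulation (a sketch with made-up numbers; statsmodels is an assumed choice of software, not the slides' tool):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 5_000
male = rng.integers(0, 2, size=n).astype(float)
wage = 15.0 + 2.5 * male + rng.normal(scale=8, size=n)
df = pd.DataFrame({"wage": wage, "male": male, "female": 1.0 - male})

m1 = smf.ols("wage ~ male", data=df).fit()                # model (1): intercept + M
m2 = smf.ols("wage ~ male + female - 1", data=df).fit()   # model (2): M and W, no intercept

print(m1.params["Intercept"], m2.params["female"])        # equal: beta0_hat = b2_hat
print(m1.params["male"], m2.params["male"] - m2.params["female"])  # equal: beta1_hat = b1_hat - b2_hat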

26
Model 1: OLS, using observations 1-7986
Dependent variable: ahe
Heteroskedasticity-robust standard errors, variant HC1
Coefficient Std. Error t-ratio p-value
const 15.3586 0.133946 114.7 <0.0001 ***
male 2.41405 0.190958 12.64 <0.0001 ***

Mean dependent var 16.77115 S.D. dependent var 8.758696


Sum squared resid 601269.8 S.E. of regression 8.678095
R-squared 0.018443 Adjusted R-squared 0.018320
F(1, 7984) 159.8151 P-value(F) 2.76e-36
Log-likelihood −28586.81 Akaike criterion 57177.62
Schwarz criterion 57191.59 Hannan-Quinn 57182.40

Model 2: OLS, using observations 1-7986


Dependent variable: ahe
Heteroskedasticity-robust standard errors, variant HC1

Coefficient Std. Error t-ratio p-value


male 17.7726 0.136101 130.6 <0.0001 ***
female 15.3586 0.133946 114.7 <0.0001 ***

Mean dependent var 16.77115 S.D. dependent var 8.758696


Sum squared resid 601269.8 S.E. of regression 8.678095
R-squared 0.018443 Adjusted R-squared 0.018320
F(1, 7984) 15099.85 P-value(F) 0.000000
Log-likelihood −28586.81 Akaike criterion 57177.62
Schwarz criterion 57191.59 Hannan-Quinn 57182.40
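The two outputs above line up with these relations: in Model 2 the coefficients are the two group means, 17.7726 for men and 15.3586 for women, so 𝑏̂1 − 𝑏̂2 = 17.7726 − 15.3586 = 2.4140, which matches the male coefficient 2.41405 in Model 1 up to rounding; likewise the Model 1 intercept equals the female coefficient in Model 2 (15.3586), with the same standard error (0.133946).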

27
Imperfect multicollinearity

Imperfect and perfect multicollinearity are quite different despite the similarity of the names.

Imperfect multicollinearity occurs when two or more regressors are very highly correlated.
• Why the term “multicollinearity”? If two regressors are very
highly correlated, then their scatterplot will pretty much look like
a straight line – they are “co-linear” – but unless the correlation is
exactly 1, that collinearity is imperfect.

28
Implications of imperfect multicollinearity
Theoretically, imperfect multicollinearity is not a problem.
Practically, imperfect multicollinearity implies that one or more of the
regression coefficients will be imprecisely estimated.
• 𝛽1 is the effect of X1 holding X2 constant; but if X1 and X2 are highly
correlated, there is very little variation in X1 once X2 is held constant
– so the data don’t contain much information about what happens
when X1 changes but X2 doesn’t.
• If so, the standard error of the OLS estimator of the coefficient on X1
will be large.
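A minimal simulation sketch of this point (illustrative numbers): the same model is estimated twice, once with moderately correlated regressors and once with nearly collinear ones, and only the standard error on X1 changes appreciably.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 420

for rho in (0.50, 0.99):
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)   # corr(x1, x2) is about rho
    y = 5.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(rho, fit.bse[1])   # the SE on x1 is several times larger when rho = 0.99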

29
An example of imperfect multicollinearity
• Generate a new variable
STR2=2*STR+1
• Generate another new variable
STR3=STR2+rnorm()
Here rnorm() denotes an i.i.d. N(0,1) random variable.
• Perfect multicollinearity between STR and STR2
• Imperfect multicollinearity between STR and STR3
Corr(STR,STR3)=0.966
Model: OLS, using observations 1-420
Dependent variable: testscr
Omitted due to exact collinearity: str2

coefficient std. error t-ratio p-value


----------------------------------------------------------------------------
const 685.536 7.48948 91.53 9.55e-278 ***
str −1.77862 1.45801 −1.220 0.2232
el_pct −0.648585 0.0394567 −16.44 3.82e-047 ***
str3 0.341734 0.710105 0.4812 0.6306

30
Stronger correlation implies more imprecise estimates
• Generate
STR4=STR2+0.001*rnorm()
Corr(STR,STR4)=0.9999

Model: OLS, using observations 1-420


Dependent variable: testscr
Omitted due to exact collinearity: str2

coefficient std. error t-ratio p-value


------------------------------------------------------------
const 273.296 698.332 0.3914 0.6957
str −826.915 1397.16 −0.5919 0.5543
el_pct −0.649383 0.0393789 −16.49 2.25e-047 ***
str4 412.902 698.573 0.5911 0.5548

31
What to do if there is imperfect multicollinearity
• Suppose X is the variable of interest, such as STR.
• Assume another variable W in the regression function is highly
correlated with X.

• If the SE of X’s coefficient is small, then there is no need to worry about W.
• If the SE of X’s coefficient is large, then drop W and find another control variable.

• In sum, whether imperfect multicollinearity matters or not depends on whether the estimate of X’s coefficient is precise or not.
• In theory, if the sample size is large enough, imperfect multicollinearity will not pose any practical problem.

32
Summary

• Omitted variable bias


• Interpretation of coefficients in multiple regression
• Measures of fit (R² and R̄²)
• OLS formula in matrix form
• Perfect and imperfect multicollinearity, implications of imperfect
multicollinearity in practice
• Dummy variable trap

33
