Lecture set 3
The omitted variable bias formula: (1 of 2)

$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\frac{n-1}{n}\, s_X^2}, \qquad v_i = (X_i - \bar{X})u_i, \quad s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

Let $\rho_{Xu} = \operatorname{corr}(X, u) = \dfrac{\sigma_{Xu}}{\sigma_X \sigma_u}$. Then, as $n \to \infty$,

$$\hat{\beta}_1 \xrightarrow{\,p\,} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$$
• If an omitted variable Z is both:
1. a determinant of Y (that is, it is contained in u); and
2. correlated with X, then $\rho_{Xu} \neq 0$ and the OLS estimator $\hat{\beta}_1$ is biased
and is not consistent.
• For example, districts with few ESL students (1) do better on
standardized tests and (2) have smaller classes (bigger budgets),
so ignoring the ESL-student factor would result in overstating the
class size effect. Is this actually going on in the CA data?
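To see the formula at work, here is a minimal simulation sketch (our own illustration, not from the text; all parameter values are assumptions). It draws $(X, u)$ with a chosen correlation $\rho_{Xu}$ and checks that the OLS slope approaches $\beta_1 + \rho_{Xu}(\sigma_u/\sigma_X)$:

import numpy as np

# Sketch: check beta1_hat -> beta1 + rho_Xu * (sigma_u / sigma_X).
# All parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
n = 500_000
beta0, beta1 = 2.0, -1.0
rho_Xu, sigma_X, sigma_u = 0.5, 2.0, 3.0

# Draw (X, u) jointly normal with corr(X, u) = rho_Xu
X = sigma_X * rng.standard_normal(n)
u = sigma_u * (rho_Xu * X / sigma_X
               + np.sqrt(1 - rho_Xu**2) * rng.standard_normal(n))
Y = beta0 + beta1 * X + u

# OLS slope = sample covariance / sample variance
beta1_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
print(f"beta1_hat          = {beta1_hat:.3f}")
print(f"beta1 + rho*su/sX  = {beta1 + rho_Xu * sigma_u / sigma_X:.3f}")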
The omitted variable bias formula: (2 of 2)
TABLE 6.1 Differences in Test Scores for California School Districts with Low and High
Student–Teacher Ratios, by the Percentage of English Learners in the District

                                  Student–Teacher      Student–Teacher     Difference in Test Scores,
                                    Ratio < 20           Ratio ≥ 20           Low vs. High STR
                                 Average              Average
                                 Test Score     n     Test Score     n     Difference   t-statistic
All districts                      657.4      238       650.0      182        7.4          4.04
Percentage of English learners:
  < 1.9%                           664.5       76       665.4       27       −0.9         −0.30
  1.9–8.8%                         665.2       64       661.8       44        3.3          1.13
  8.8–23.0%                        654.9       54       649.7       50        5.2          1.72
  > 23.0%                          636.7       44       634.8       61        1.9          0.68
– This is one way to “control” for the effect of PctEL when estimating the
effect of STR.
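A sketch of how such a cross tabulation could be computed with pandas (the file and column names — caschool.csv, testscr, str, el_pct — are our assumptions about the California data set, not the book's code):

import pandas as pd

# Sketch of the cross tabulation behind Table 6.1. File and column
# names are assumptions.
ca = pd.read_csv("caschool.csv")

small = ca["str"] < 20              # low vs. high student-teacher ratio
el_bin = pd.qcut(ca["el_pct"], 4)   # quartiles of percent English learners

# Mean test score and cell count in each (EL quartile, STR group) cell
tab = ca.groupby([el_bin, small], observed=True)["testscr"].agg(["mean", "size"])
print(tab.unstack(level=1))

# Low-minus-high STR difference in means within each EL quartile
means = tab["mean"].unstack(level=1)
print(means[True] - means[False])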
Return to omitted variable bias
Three ways to overcome omitted variable bias
1. Run a randomized controlled experiment in which treatment (STR) is
randomly assigned: then PctEL is still a determinant of TestScore, but
PctEL is uncorrelated with STR. (This solution to OV bias is rarely
feasible.)
2. Adopt the “cross tabulation” approach, with finer gradations of STR
and PctEL – within each group, all classes have the same PctEL, so
we control for PctEL. (But you will soon run out of data, and what
about other determinants like family income and parental education?)
3. Use a regression in which the omitted variable (PctEL) is no longer
omitted: include PctEL as an additional regressor in a multiple
regression.
The Population Multiple Regression Model
(SW Section 6.2)
• Consider the case of two regressors:
Yi = β0 + β1X1i + β2X2i + ui, i = 1,…,n
• Y is the dependent variable
• X1, X2 are the two independent variables (regressors)
• (Yi, X1i, X2i) denote the ith observation on Y, X1, and X2.
• β0 = unknown population intercept
• β1 = effect on Y of a change in X1, holding X2 constant
• β2 = effect on Y of a change in X2, holding X1 constant
• ui = the regression error (omitted factors)
Interpretation of coefficients in multiple
regression (1 of 2)
Yi = β0 + β1X1i + β2X2i + ui, i = 1,…,n
Consider the difference in the expected value of Y for two values
of X1 holding X2 constant:
Population regression line when X1 = X1,0:
Y = β0 + β1X1,0 + β2X2
Population regression line when X1 = X1,0 + ΔX1:
Y = β0 + β1(X1,0 + ΔX1) + β2X2
Subtracting the first line from the second gives ΔY = β1ΔX1, so β1 = ΔY/ΔX1:
the effect on Y of a unit change in X1, holding X2 constant.

In the test score application, the multiple regression of TestScore on STR and
PctEL (with heteroskedasticity-robust standard errors) gives:
-----------------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+--------------------------------------------------------------------------
str | −1.101296 .4328472 −2.54 0.011 −1.95213 −.2504616
pctel | −.6497768 .0310318 −20.94 0.000 −.710775 −.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
-----------------------------------------------------------------------------------------
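This regression can be reproduced in Python with statsmodels; a sketch, assuming the file and variable names used above (HC1 reproduces the standard errors from Stata's , robust option):

import pandas as pd
import statsmodels.formula.api as smf

# Sketch: replicate the Stata output above. The file name is an
# assumption; cov_type="HC1" matches Stata's ", robust" errors.
ca = pd.read_csv("caschool.csv")

# In the formula, "str" resolves to the DataFrame column, not the
# Python builtin, because patsy looks in the data first.
results = smf.ols("testscr ~ str + pctel", data=ca).fit(cov_type="HC1")
print(results.summary())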
The standard error of the regression (SER) and the RMSE, now with a
degrees-of-freedom adjustment for the k regressors:

$$SER = \sqrt{\frac{1}{n-k-1}\sum_{i=1}^{n}\hat{u}_i^2}, \qquad RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2}$$
R² and R̄² (adjusted R²) (1 of 2)

The R² is the fraction of the variance explained – same definition
as in regression with a single regressor:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS},$$

where $ESS = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$, $SSR = \sum_{i=1}^{n}\hat{u}_i^2$, and $TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$.

Adjusted R²:

$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\cdot\frac{SSR}{TSS}$$
• What – precisely – does this tell you about the fit of regression
(2) compared with regression (1)?
• Why are the R² and the R̄² so close in (2)?
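To make the formulas concrete, here is a minimal sketch (numpy only; the toy data are our own) that computes R², R̄², the SER, and the RMSE directly from the residuals of an OLS fit:

import numpy as np

def fit_measures(y, y_hat, k):
    """Return R^2, adjusted R^2, SER, and RMSE for a fit with k regressors."""
    n = len(y)
    u_hat = y - y_hat                    # residuals
    SSR = np.sum(u_hat ** 2)             # sum of squared residuals
    TSS = np.sum((y - y.mean()) ** 2)    # total sum of squares
    R2 = 1 - SSR / TSS
    R2_adj = 1 - (n - 1) / (n - k - 1) * SSR / TSS
    SER = np.sqrt(SSR / (n - k - 1))     # degrees-of-freedom adjusted
    RMSE = np.sqrt(SSR / n)              # no adjustment
    return R2, R2_adj, SER, RMSE

# Toy example with two regressors plus an intercept (illustrative data)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.standard_normal((100, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(100)
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(fit_measures(y, X @ beta_hat, k=2))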
The Least Squares Assumptions for Causal
Inference in Multiple Regression (SW Section
6.5)
Let β1, β2,…, βk be causal effects.
Yi = β0 + β1X1i + β2X2i + … + βkXki + ui, i = 1,…,n
1. The conditional distribution of u given the X’s has mean zero,
that is, E(ui|X1i = x1,…, Xki = xk) = 0.
2. (X1i,…,Xki,Yi), i = 1,…,n, are i.i.d.
3. Large outliers are unlikely: $X_1, \ldots, X_k$, and $Y$ have finite fourth
moments: $E(X_{1i}^4) < \infty, \ldots, E(X_{ki}^4) < \infty$, $E(Y_i^4) < \infty$.
Suppose W is a control variable for which conditional mean independence
holds, and E(u|X, W) = E(u|W) is linear in W, so that u = γ0 + γ2W + v
with E(v|X, W) = 0 (***). Then
Y = β0 + β1X + β2W + u (+)
= β0 + β1X + β2W + γ0 + γ2W + v from (***)
= (β0 + γ0) + β1X + (β2 + γ2)W + v
= δ0 + β1X + δ2W + v, where δ0 = β0 + γ0 and δ2 = β2 + γ2 (++)
and
E(β̂2) = δ2 = β2 + γ2 ≠ β2
In summary, if W is such that conditional mean independence is
satisfied, then:
• The OLS estimator of the effect of interest, β̂1, is unbiased.
• The OLS estimator of the coefficient on the control variable, β̂2, does not
have a causal interpretation. The reason is that the control variable is
correlated with omitted variables in the error term, so that β̂2 is subject
to omitted variable bias.
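A small simulation sketch of this point (our own illustration; all names and parameter values are assumptions): W satisfies conditional mean independence, the OLS coefficient on X recovers the causal effect β1, and the coefficient on W estimates β2 + γ2 rather than β2.

import numpy as np

# Sketch: with a control variable W satisfying conditional mean
# independence, OLS recovers beta1, but the coefficient on W estimates
# beta2 + gamma2. Parameter values are illustrative assumptions.
rng = np.random.default_rng(2)
n = 500_000
beta0, beta1, beta2 = 1.0, 2.0, 0.5
gamma0, gamma2 = 3.0, 4.0

W = rng.standard_normal(n)
X = rng.standard_normal(n) + 0.8 * W   # X is correlated with W
v = rng.standard_normal(n)             # E(v|X, W) = 0
u = gamma0 + gamma2 * W + v            # E(u|X, W) = E(u|W): cond. mean indep.
Y = beta0 + beta1 * X + beta2 * W + u

Z = np.column_stack([np.ones(n), X, W])
coef = np.linalg.lstsq(Z, Y, rcond=None)[0]
print(f"coef on X: {coef[1]:.3f}  (beta1 = {beta1})")
print(f"coef on W: {coef[2]:.3f}  (beta2 + gamma2 = {beta2 + gamma2})")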
Next topic: hypothesis tests and confidence intervals…