Lecture set 3
The omitted variable bias formula: (1 of 2)

$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\frac{n-1}{n}\, s_X^2}, \qquad v_i = (X_i - \bar{X})u_i, \quad s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

Let $\rho_{Xu} = \operatorname{corr}(X, u) = \dfrac{\sigma_{Xu}}{\sigma_X \sigma_u}$. Then, as $n \to \infty$,

$$\hat{\beta}_1 \xrightarrow{\,p\,} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}$$
• If an omitted variable Z is both:
1. a determinant of Y (that is, it is contained in u); and
2. correlated with X, then $\rho_{Xu} \neq 0$ and the OLS estimator $\hat{\beta}_1$ is biased
and is not consistent.
• For example, districts with few ESL students (1) do better on
standardized tests and (2) have smaller classes (bigger budgets),
so ignoring the ESL-student factor would result in overstating the
class size effect. Is this actually going on in the CA data?
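To see the formula at work, here is a minimal simulation sketch (our own illustration, not from the text; all parameter values are assumptions). It draws $(X, u)$ with a chosen correlation $\rho_{Xu}$ and checks that the OLS slope approaches $\beta_1 + \rho_{Xu}(\sigma_u/\sigma_X)$:

import numpy as np

# Sketch: check beta1_hat -> beta1 + rho_Xu * (sigma_u / sigma_X).
# All parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
n = 500_000
beta0, beta1 = 2.0, -1.0
rho_Xu, sigma_X, sigma_u = 0.5, 2.0, 3.0

# Draw (X, u) jointly normal with corr(X, u) = rho_Xu
X = sigma_X * rng.standard_normal(n)
u = sigma_u * (rho_Xu * X / sigma_X
               + np.sqrt(1 - rho_Xu**2) * rng.standard_normal(n))
Y = beta0 + beta1 * X + u

# OLS slope = sample covariance / sample variance
beta1_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
print(f"beta1_hat          = {beta1_hat:.3f}")
print(f"beta1 + rho*su/sX  = {beta1 + rho_Xu * sigma_u / sigma_X:.3f}")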
The omitted variable bias formula: (2 of 2)
TABLE 6.1 Differences in Test Scores for California School Districts with Low and High
Student–Teacher Ratios, by the Percentage of English Learners in the District

                                  Student–Teacher      Student–Teacher     Difference in Test Scores,
                                    Ratio < 20           Ratio ≥ 20           Low vs. High STR
                                 Average              Average
                                 Test Score     n     Test Score     n     Difference   t-statistic
All districts                      657.4      238       650.0      182        7.4          4.04
Percentage of English learners:
  < 1.9%                           664.5       76       665.4       27       −0.9         −0.30
  1.9–8.8%                         665.2       64       661.8       44        3.3          1.13
  8.8–23.0%                        654.9       54       649.7       50        5.2          1.72
  > 23.0%                          636.7       44       634.8       61        1.9          0.68
– This is one way to “control” for the effect of PctEL when estimating the
effect of STR.
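A sketch of how such a cross tabulation could be computed with pandas (the file and column names — caschool.csv, testscr, str, el_pct — are our assumptions about the California data set, not the book's code):

import pandas as pd

# Sketch of the cross tabulation behind Table 6.1. File and column
# names are assumptions.
ca = pd.read_csv("caschool.csv")

small = ca["str"] < 20              # low vs. high student-teacher ratio
el_bin = pd.qcut(ca["el_pct"], 4)   # quartiles of percent English learners

# Mean test score and cell count in each (EL quartile, STR group) cell
tab = ca.groupby([el_bin, small], observed=True)["testscr"].agg(["mean", "size"])
print(tab.unstack(level=1))

# Low-minus-high STR difference in means within each EL quartile
means = tab["mean"].unstack(level=1)
print(means[True] - means[False])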
Return to omitted variable bias
Three ways to overcome omitted variable bias
1. Run a randomized controlled experiment in which treatment (STR) is
randomly assigned: then PctEL is still a determinant of TestScore, but
PctEL is uncorrelated with STR. (This solution to OV bias is rarely
feasible.)
2. Adopt the “cross tabulation” approach, with finer gradations of STR
and PctEL – within each group, all classes have the same PctEL, so
we control for PctEL. (But you will soon run out of data, and what
about other determinants like family income and parental education?)
3. Use a regression in which the omitted variable (PctEL) is no longer
omitted: include PctEL as an additional regressor in a multiple
regression.
The Population Multiple Regression Model
(SW Section 6.2)
• Consider the case of two regressors:
Yi = β0 + β1X1i + β2X2i + ui, i = 1,…,n
• Y is the dependent variable
• X1, X2 are the two independent variables (regressors)
• (Yi, X1i, X2i) denote the ith observation on Y, X1, and X2.
• β0 = unknown population intercept
• β1 = effect on Y of a change in X1, holding X2 constant
• β2 = effect on Y of a change in X2, holding X1 constant
• ui = the regression error (omitted factors)
Interpretation of coefficients in multiple
regression (1 of 2)
Yi = β0 + β1X1i + β2X2i + ui, i = 1,…,n
Consider the difference in the expected value of Y for two values
of X1 holding X2 constant:
Population regression line when X1 = X1,0:
Y = β0 + β1X1,0 + β2X2
Population regression line when X1 = X1,0 + ΔX1:
Y = β0 + β1(X1,0 + ΔX1) + β2X2
Subtracting the first line from the second gives ΔY = β1ΔX1, so β1 = ΔY/ΔX1:
the effect on Y of a unit change in X1, holding X2 constant.

In the test score application, the multiple regression of TestScore on STR and
PctEL (with heteroskedasticity-robust standard errors) gives:
-----------------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+--------------------------------------------------------------------------
str | −1.101296 .4328472 −2.54 0.011 −1.95213 −.2504616
pctel | −.6497768 .0310318 −20.94 0.000 −.710775 −.5887786
_cons | 686.0322 8.728224 78.60 0.000 668.8754 703.189
-----------------------------------------------------------------------------------------
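This regression can be reproduced in Python with statsmodels; a sketch, assuming the file and variable names used above (HC1 reproduces the standard errors from Stata's , robust option):

import pandas as pd
import statsmodels.formula.api as smf

# Sketch: replicate the Stata output above. The file name is an
# assumption; cov_type="HC1" matches Stata's ", robust" errors.
ca = pd.read_csv("caschool.csv")

# In the formula, "str" resolves to the DataFrame column, not the
# Python builtin, because patsy looks in the data first.
results = smf.ols("testscr ~ str + pctel", data=ca).fit(cov_type="HC1")
print(results.summary())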
The standard error of the regression (SER) and the RMSE, now with a
degrees-of-freedom adjustment for the k regressors:

$$SER = \sqrt{\frac{1}{n-k-1}\sum_{i=1}^{n}\hat{u}_i^2}, \qquad RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2}$$
R² and R̄² (adjusted R²) (1 of 2)

The R² is the fraction of the variance explained – same definition
as in regression with a single regressor:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS},$$

where $ESS = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$, $SSR = \sum_{i=1}^{n}\hat{u}_i^2$, and $TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$.

Adjusted R²:

$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\cdot\frac{SSR}{TSS}$$
• What – precisely – does this tell you about the fit of regression
(2) compared with regression (1)?
• Why are the R² and the R̄² so close in (2)?
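To make the formulas concrete, here is a minimal sketch (numpy only; the toy data are our own) that computes R², R̄², the SER, and the RMSE directly from the residuals of an OLS fit:

import numpy as np

def fit_measures(y, y_hat, k):
    """Return R^2, adjusted R^2, SER, and RMSE for a fit with k regressors."""
    n = len(y)
    u_hat = y - y_hat                    # residuals
    SSR = np.sum(u_hat ** 2)             # sum of squared residuals
    TSS = np.sum((y - y.mean()) ** 2)    # total sum of squares
    R2 = 1 - SSR / TSS
    R2_adj = 1 - (n - 1) / (n - k - 1) * SSR / TSS
    SER = np.sqrt(SSR / (n - k - 1))     # degrees-of-freedom adjusted
    RMSE = np.sqrt(SSR / n)              # no adjustment
    return R2, R2_adj, SER, RMSE

# Toy example with two regressors plus an intercept (illustrative data)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.standard_normal((100, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(100)
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(fit_measures(y, X @ beta_hat, k=2))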
The Least Squares Assumptions for Causal
Inference in Multiple Regression (SW Section
6.5)
Let β1, β2,…, βk be causal effects.
Yi = β0 + β1X1i + β2X2i + … + βkXki + ui, i = 1,…,n
1. The conditional distribution of u given the X’s has mean zero,
that is, E(ui|X1i = x1,…, Xki = xk) = 0.
2. (X1i,…,Xki,Yi), i = 1,…,n, are i.i.d.
3. Large outliers are unlikely: $X_1, \ldots, X_k$, and $Y$ have finite fourth
moments: $E(X_{1i}^4) < \infty, \ldots, E(X_{ki}^4) < \infty$, $E(Y_i^4) < \infty$.
Suppose W is a control variable for which conditional mean independence
holds, and E(u|X, W) = E(u|W) is linear in W, so that u = γ0 + γ2W + v
with E(v|X, W) = 0 (***). Then
Y = β0 + β1X + β2W + u (+)
= β0 + β1X + β2W + γ0 + γ2W + v from (***)
= (β0 + γ0) + β1X + (β2 + γ2)W + v
= δ0 + β1X + δ2W + v, where δ0 = β0 + γ0 and δ2 = β2 + γ2 (++)
and
E(β̂2) = δ2 = β2 + γ2 ≠ β2
In summary, if W is such that conditional mean independence is
satisfied, then:
• The OLS estimator of the effect of interest, β̂1, is unbiased.
• The OLS estimator of the coefficient on the control variable, β̂2, does not
have a causal interpretation. The reason is that the control variable is
correlated with omitted variables in the error term, so that β̂2 is subject
to omitted variable bias.
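A small simulation sketch of this point (our own illustration; all names and parameter values are assumptions): W satisfies conditional mean independence, the OLS coefficient on X recovers the causal effect β1, and the coefficient on W estimates β2 + γ2 rather than β2.

import numpy as np

# Sketch: with a control variable W satisfying conditional mean
# independence, OLS recovers beta1, but the coefficient on W estimates
# beta2 + gamma2. Parameter values are illustrative assumptions.
rng = np.random.default_rng(2)
n = 500_000
beta0, beta1, beta2 = 1.0, 2.0, 0.5
gamma0, gamma2 = 3.0, 4.0

W = rng.standard_normal(n)
X = rng.standard_normal(n) + 0.8 * W   # X is correlated with W
v = rng.standard_normal(n)             # E(v|X, W) = 0
u = gamma0 + gamma2 * W + v            # E(u|X, W) = E(u|W): cond. mean indep.
Y = beta0 + beta1 * X + beta2 * W + u

Z = np.column_stack([np.ones(n), X, W])
coef = np.linalg.lstsq(Z, Y, rcond=None)[0]
print(f"coef on X: {coef[1]:.3f}  (beta1 = {beta1})")
print(f"coef on W: {coef[2]:.3f}  (beta2 + gamma2 = {beta2 + gamma2})")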
Next topic: hypothesis tests and confidence intervals…