Week 10

The document discusses endogeneity and how it can arise through measurement error, omitted variables, simultaneity, and dynamic models with lagged dependent variables. It explains that endogeneity violates the Gauss-Markov assumption that regressors are uncorrelated with the error term, which can bias OLS estimates. Several examples are provided to illustrate how endogeneity occurs in economic models and how it impacts coefficient estimates.

Uploaded by

Jerry ma

Eco 401 Econometrics

SI 2021, Week 10-11, 16-23 November 2021

Dr Syed Kanwar Abbas

Office Location: BS304
Email: [email protected]
Agenda
Last week, we looked at autocorrelation, its consequences and its solutions. We now start with
instrumental variable (IV)/2SLS estimation. At the end of this section, you should be able to answer:

What is the endogeneity problem?

How does endogeneity affect the OLS estimates?
How do we correct/solve endogeneity problem?

This lecture is based on Chapter 5 of your textbook by Verbeek (2017).

Endogeneity
Gauss-Markov assumptions
What does BLUE mean?
• Best – minimum variance of the estimator
• Linear – within the class of linear estimators
• Unbiased – the expected value = 'truth': $E(\mathbf{b}) = \boldsymbol{\beta}$
• Estimator
When is OLS BLUE? Under the Gauss-Markov assumptions:
• (A1) mean zero; error terms have mean zero: $E(\boldsymbol{\varepsilon}) = \mathbf{0}$
• (A2) independent; error terms independent of exogenous variables
• (A3) homoskedasticity; error terms have same variance: $V(\varepsilon_i) = \sigma^2$
• (A4) no autocorrelation; error terms mutually uncorrelated: $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
Violation 3: $E(\mathbf{x}_t \varepsilon_t) \neq \mathbf{0}$; endogeneity

Consider a model $y_t = \mathbf{x}_t' \boldsymbol{\beta} + \varepsilon_t$

So far, we assumed that the error term $\varepsilon_t$ and the explanatory variables $\mathbf{x}_t$ are
contemporaneously uncorrelated: $E(\mathbf{x}_t \varepsilon_t) = \mathbf{0}$
o This condition simply says that the error term (mean zero) is uncorrelated with
any of the explanatory variables
• We have also concluded that the OLS estimate $\mathbf{b}$ is consistent for $\boldsymbol{\beta}$ even if the
Gauss-Markov conditions (A3) and (A4) are not fulfilled

o the assumption that $\varepsilon_t$ is independent of $\mathbf{x}_t$ may be too strong

Violation 3: $E(\mathbf{x}_t \varepsilon_t) \neq \mathbf{0}$; endogeneity

We label variables in $\mathbf{x}_t$ that are correlated with the error terms
endogenous; variables that are not are called exogenous
Endogeneity is thus said to occur in a multiple regression model if
$E(\mathbf{x}_t \varepsilon_t) \neq \mathbf{0}$; endogeneity implies that an explanatory variable included
in the model is correlated with unobservables relegated to the error term
Endogeneity, $E(\mathbf{x}_t \varepsilon_t) \neq \mathbf{0}$, can arise as a result of
• measurement error
• dynamic model (lagged dependent)
• omitted variable bias
• simultaneity
Measurement Error
Endogeneity – measurement error

Data is often measured with error

o reporting errors
o coding errors
Measurement error in the dependent variable $y$ does not cause endogeneity
Measurement error in the explanatory variable $x$ does cause endogeneity problems
Endogeneity – measurement error
• Suppose that we do not get a perfect measure of one of our explanatory
variables
• Instead of observing $x_i$ we observe $x_i^* = x_i + \upsilon_i$, where $\upsilon_i$ is the measurement
"noise"
• When we try to estimate the regression $y_i = \alpha + \beta x_i + \varepsilon_i$
• We actually end up estimating
$y_i = \alpha + \beta (x_i^* - \upsilon_i) + \varepsilon_i$
$y_i = \alpha + \beta x_i^* + (\varepsilon_i - \beta \upsilon_i)$
$y_i = \alpha + \beta x_i^* + u_i$, with $u_i = \varepsilon_i - \beta \upsilon_i$

• Since both $x_i^*$ and $u_i$ depend on $\upsilon_i$, they are correlated
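The effect of this correlation can be illustrated with a small simulation. The sketch below is not part of the lecture: it assumes numpy, hypothetical parameter values ($\alpha = 1$, $\beta = 2$) and mutually independent standard normal draws for the true regressor, the error term and the measurement noise. Under these assumptions the OLS slope on the mismeasured regressor should settle near $\beta \sigma_x^2 / (\sigma_x^2 + \sigma_\upsilon^2) = 1$ instead of 2, i.e. it is biased towards zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
alpha, beta = 1.0, 2.0           # hypothetical true parameters

x_true = rng.normal(size=n)      # x_i, the correctly measured regressor
eps = rng.normal(size=n)         # error term of the true model
noise = rng.normal(size=n)       # measurement noise
y = alpha + beta * x_true + eps
x_obs = x_true + noise           # x_i^*, what we actually observe


def ols_slope(x, y):
    """Slope of a simple regression of y on a constant and x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


slope_true = ols_slope(x_true, y)   # regression on the true regressor
slope_obs = ols_slope(x_obs, y)     # regression on the mismeasured regressor
print(f"slope using x:  {slope_true:.3f}")
print(f"slope using x*: {slope_obs:.3f}")
```

Increasing the variance of `noise` relative to `x_true` in this sketch makes the attenuation stronger.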
Endogeneity – dynamic model (lagged dependent + autocorrelation)
Endogeneity problems may arise in a dynamic model that includes a lagged dependent
variable
$y_t = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + \varepsilon_t$
As long as we assume that $E(x_t \varepsilon_t) = 0$ and $E(y_{t-1} \varepsilon_t) = 0$ for all $t$, the OLS estimator
for $\boldsymbol{\beta}$ is consistent
However, suppose that $\varepsilon_t$ is subject to first order autocorrelation: $\varepsilon_t = \rho \varepsilon_{t-1} + v_t$
This gives: $y_t = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + \rho \varepsilon_{t-1} + v_t$
Obviously, we also have: $y_{t-1} = \beta_1 + \beta_2 x_{t-1} + \beta_3 y_{t-2} + \varepsilon_{t-1}$
Both $y_{t-1}$ and $\varepsilon_t$ depend on $\varepsilon_{t-1}$, which implies that the error term $\varepsilon_t$ is correlated with $y_{t-1}$
Thus if $\rho \neq 0$, OLS is no longer consistent for the parameters
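A simulation makes the inconsistency visible. The following is a minimal sketch under assumed values that do not come from the lecture ($\beta_1 = 1$, $\beta_2 = 1$, $\beta_3 = 0.5$, standard normal $x_t$ and $v_t$); the function name `simulate_b3` is hypothetical. With $\rho = 0$ the OLS estimate of $\beta_3$ should be close to 0.5, while with $\rho = 0.5$ it should end up noticeably above 0.5 even in a very long sample.

```python
import numpy as np


def simulate_b3(rho, T=50_000, seed=1):
    """Simulate y_t = b1 + b2*x_t + b3*y_{t-1} + eps_t with
    eps_t = rho*eps_{t-1} + v_t and return the OLS estimate of b3."""
    rng = np.random.default_rng(seed)
    b1, b2, b3 = 1.0, 1.0, 0.5      # hypothetical true parameters
    x = rng.normal(size=T)
    v = rng.normal(size=T)
    y = np.zeros(T)
    eps_prev = 0.0
    for t in range(1, T):
        eps = rho * eps_prev + v[t]
        y[t] = b1 + b2 * x[t] + b3 * y[t - 1] + eps
        eps_prev = eps
    # Regress y_t on a constant, x_t and y_{t-1}
    X = np.column_stack([np.ones(T - 1), x[1:], y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return float(coef[2])


b3_no_autocorr = simulate_b3(rho=0.0)
b3_autocorr = simulate_b3(rho=0.5)
print(f"rho = 0.0: b3 = {b3_no_autocorr:.3f}")
print(f"rho = 0.5: b3 = {b3_autocorr:.3f}")
```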
Endogeneity – omitted variable bias

Some unobservable (or omitted) variable affects both 𝑦𝑦 and 𝑥𝑥

If a relevant variable is omitted that is correlated with the included ones, OLS becomes
biased; problem for causal interpretations.
Example: wage equation with unobserved ability related to schooling

Consider a wage equation $y_i = \mathbf{x}_{1i}' \boldsymbol{\beta}_1 + x_{2i} \beta_2 + u_i \gamma + \upsilon_i$

where $x_{2i}$ denotes years of schooling, and $u_i$ is an unobserved variable reflecting “ability”. Persons with
higher levels of ability tend to have higher wages but are also more likely to have a higher schooling
level
Thus: $\gamma > 0$ and $\mathrm{cov}(x_{2i}, u_i) > 0$
Since $u_i$ is unobserved, we end up estimating $y_i = \mathbf{x}_i' \boldsymbol{\beta} + \varepsilon_i$, where
$\mathbf{x}_i' = (\mathbf{x}_{1i}', x_{2i})$
$\boldsymbol{\beta}' = (\boldsymbol{\beta}_1', \beta_2)$
$\varepsilon_i = u_i \gamma + \upsilon_i$
Endogeneity – omitted variable bias
Estimating $\boldsymbol{\beta}$ by OLS yields (see omitted variable bias discussion,
Lecture 4): $\mathbf{b} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1} \sum_{i=1}^{N} \mathbf{x}_i u_i \gamma + (\mathbf{X}'\mathbf{X})^{-1} \sum_{i=1}^{N} \mathbf{x}_i \upsilon_i$

The second term has plim 0 only if $E(\mathbf{x}_i u_i) = \mathbf{0}$, i.e. if $\mathbf{x}_i$ and $u_i$ are orthogonal;
the third term is no problem: its plim is 0

Assuming $E(\mathbf{x}_i \upsilon_i) = \mathbf{0}$, when $\gamma \neq 0$ consistency of the OLS estimator
requires $E(\mathbf{x}_i u_i) = \mathbf{0}$
That is, the unobserved “ability” should be uncorrelated with
schooling and the other explanatory variables in the model
Assuming $E(\mathbf{x}_i u_i) > 0$, we thus expect that OLS overestimates the
returns to schooling
Therefore, OLS is biased if $u_i$ and $\mathbf{x}_i$ are correlated
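The direction of this bias can be checked on simulated data. The sketch below is illustrative only: all numbers are made up (a true return to schooling of 0.07, an ability effect $\gamma = 0.10$, and schooling that rises with ability), and the variable names are hypothetical. Under these assumptions the regression that omits ability should give a schooling coefficient of roughly $0.07 + 0.10 \cdot 2/5 = 0.11$, whereas the (infeasible) regression that includes ability should recover about 0.07.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta2, gamma = 0.07, 0.10     # hypothetical return to schooling and ability effect

ability = rng.normal(size=n)                              # u_i, unobserved in practice
schooling = 12.0 + 2.0 * ability + rng.normal(size=n)     # x_2i, cov(x_2i, u_i) > 0
log_wage = (1.0 + beta2 * schooling + gamma * ability
            + rng.normal(scale=0.3, size=n))


def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


const = np.ones(n)
# Short regression: ability is relegated to the error term
b_short = float(ols(np.column_stack([const, schooling]), log_wage)[1])
# Long regression: only possible here because ability was simulated
b_long = float(ols(np.column_stack([const, schooling, ability]), log_wage)[1])
print(f"schooling coefficient, ability omitted:  {b_short:.3f}")
print(f"schooling coefficient, ability included: {b_long:.3f}")
```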
Endogeneity – simultaneity
Simultaneity occurs when $x_t$ not only has an impact on $y_t$, but at the same time $y_t$ has an
impact on $x_t$ (reverse causality)
This situation arises in many economic contexts, such as
o quantity & price determined by the intersection of demand & supply
o investment & productivity
o sales & advertising
Consider a Keynesian consumption function $y_t = \beta_1 + \beta_2 x_{2t} + \varepsilon_t$
where $y_t$ is consumption, $x_{2t}$ is income, $t = 1, \dots, T$ are periods (years), and
$\beta_2 \in (0, 1)$ denotes the marginal propensity to consume
A situation of reverse causality naturally arises when $y_t$ and $x_{2t}$ are determined
simultaneously
The above consumption equation has a causal interpretation describing the impact of
income upon consumption: how much more will people consume if their income
increases by one dollar?
Endogeneity – simultaneity

However, aggregate income is not exogenous; in a closed economy without a
government, income is defined by: $x_{2t} = y_t + z_{2t}$, where $z_{2t}$ denotes the
investment level
It says that total income is the sum of total consumption and total investment
These two equations are structural equations since they have a ceteris paribus, causal
interpretation
We assume that $z_{2t}$ and $\varepsilon_t$ are uncorrelated: $E(z_{2t} \varepsilon_t) = 0$
This means $z_{2t}$ is exogenous whereas $y_t$ and $x_{2t}$ are endogenous; jointly and
simultaneously determined within the model
Since $y_t$ influences $x_{2t}$, this implies that income $x_{2t}$ and error term $\varepsilon_t$ are correlated;
the OLS estimate for $\beta_2$ will thus be biased and inconsistent
Endogeneity – simultaneity

Structural equations:
$y_t = \beta_1 + \beta_2 x_{2t} + \varepsilon_t$
$x_{2t} = y_t + z_{2t}$
Solve to get reduced form equations:
$x_{2t} = \frac{\beta_1}{1 - \beta_2} + \frac{1}{1 - \beta_2} z_{2t} + \frac{\varepsilon_t}{1 - \beta_2}$
$y_t = \frac{\beta_1}{1 - \beta_2} + \frac{\beta_2}{1 - \beta_2} z_{2t} + \frac{\varepsilon_t}{1 - \beta_2}$
The reduced form equations can be estimated with OLS since $E(z_{2t} \varepsilon_t) = 0$;
the structural equation cannot be estimated by OLS
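To see the two statements above at work, the sketch below simulates the Keynesian model directly from its reduced form. It is an illustration under assumed values that are not in the lecture ($\beta_1 = 1$, $\beta_2 = 0.6$, standard normal investment and error term). With these values OLS on the structural consumption function should give a slope near 0.8 rather than 0.6, while the reduced form slopes should be near $1/(1-\beta_2) = 2.5$ and $\beta_2/(1-\beta_2) = 1.5$. Taking their ratio to back out $\beta_2$ (often called indirect least squares) is an addition for illustration, not a step from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100_000
beta1, beta2 = 1.0, 0.6          # hypothetical structural parameters

z = rng.normal(size=T)           # investment z_2t, exogenous
eps = rng.normal(size=T)         # structural error term

# Reduced form equations from the slide
x = (beta1 + z + eps) / (1.0 - beta2)            # income x_2t
y = (beta1 + beta2 * z + eps) / (1.0 - beta2)    # consumption y_t


def slope(regressor, outcome):
    """Slope of a simple regression of outcome on a constant and regressor."""
    rc = regressor - regressor.mean()
    return float(rc @ (outcome - outcome.mean()) / (rc @ rc))


b2_ols = slope(x, y)     # structural equation by OLS: inconsistent
pi_x = slope(z, x)       # reduced form slope for income
pi_y = slope(z, y)       # reduced form slope for consumption
b2_indirect = pi_y / pi_x
print(f"OLS on structural equation: {b2_ols:.3f}")
print(f"ratio of reduced form slopes: {b2_indirect:.3f}")
```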
Instrumental variables

A possible solution to the endogeneity problem is using Instrumental Variable (IV)
techniques
Let us, for exposition purposes, consider the simple model
$y_t = \beta_1 + \beta_2 x_t + \varepsilon_t$, where $E(x_t \varepsilon_t) \neq 0$, so OLS is inconsistent
Now, suppose we can find a variable $z_t$ (which we will call an instrument) that
satisfies the following two conditions
o Exogeneity: $E(z_t \varepsilon_t) = 0$ (instrument uncorrelated with error)
o Relevance: $\mathrm{cov}(x_t, z_t) \neq 0$ (instrument correlated with endogenous regressor)
$z_t$ is called an instrumental variable / instrument
Instrumental variables
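Let us now take the covariance with $z_t$ on both sides of
$y_t = \beta_1 + \beta_2 x_t + \varepsilon_t$
to get
$\mathrm{cov}(y_t, z_t) = \beta_2 \, \mathrm{cov}(x_t, z_t) + \mathrm{cov}(z_t, \varepsilon_t)$

where $\mathrm{cov}(x_t, z_t) \neq 0$ (relevance) and $\mathrm{cov}(z_t, \varepsilon_t) = 0$ (exogeneity)

So we can write
$\beta_2 = \frac{\mathrm{cov}(y_t, z_t)}{\mathrm{cov}(x_t, z_t)}$
This (theoretically) determines $\beta_2$; how to estimate it?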
Instrumental variables
• Simply replace the population covariances by the sample
covariances: $\hat{\beta}_{2,IV} = \frac{\sum_t (z_t - \bar{z})(y_t - \bar{y})}{\sum_t (z_t - \bar{z})(x_t - \bar{x})}$
• This is called an instrumental variable (IV) estimator
• Recall that the OLS estimator in this case is (see chapter 2):
$\hat{\beta}_{2,OLS} = b_2 = \frac{\sum_t (x_t - \bar{x})(y_t - \bar{y})}{\sum_t (x_t - \bar{x})(x_t - \bar{x})}$
• The instrument $z_t$ thus replaces $x_t$ twice in the IV formula;
alternatively, this means that IV reduces to OLS if $z_t = x_t$
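The ratio of sample covariances can be coded in a few lines. The sketch below is an illustration on simulated data, not the lecture's own example: the data generating process (a true slope of 2, an unobserved component `u` that enters both $x_t$ and the error term) and the function name `iv_slope` are assumptions. Calling the same function with the regressor as its own instrument reproduces OLS, which in this setup should be pushed up to about $2 + 1/3$, while the IV estimate should be close to 2.

```python
import numpy as np


def iv_slope(z, x, y):
    """Simple IV estimator: sample cov(z, y) divided by sample cov(z, x)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))


rng = np.random.default_rng(4)
n = 100_000
z = rng.normal(size=n)             # instrument: relevant and exogenous by construction
u = rng.normal(size=n)             # unobserved component shared by x and the error
x = z + u + rng.normal(size=n)     # endogenous regressor
eps = u + rng.normal(size=n)       # error term, correlated with x through u
y = 1.0 + 2.0 * x + eps            # hypothetical true slope beta_2 = 2

b2_iv = iv_slope(z, x, y)
b2_ols = iv_slope(x, x, y)         # IV with z_t = x_t is just OLS
print(f"IV estimate:  {b2_iv:.3f}")
print(f"OLS estimate: {b2_ols:.3f}")
```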
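More generally
Consider the model $y_t = \mathbf{x}_t' \boldsymbol{\beta} + \varepsilon_t$, where $E(\mathbf{x}_t \varepsilon_t) \neq \mathbf{0}$ for some elements of $\mathbf{x}_t$.

Suppose we can find a vector of instruments $\mathbf{z}_t$, having the same dimensions as $\mathbf{x}_t$,
such that $E(\mathbf{z}_t \varepsilon_t) = \mathbf{0}$.

Then the IV estimator based on these instruments is given by
$\hat{\boldsymbol{\beta}}_{IV} = \left( \sum_t \mathbf{z}_t \mathbf{x}_t' \right)^{-1} \sum_t \mathbf{z}_t y_t$

In the simple model with a constant, one regressor and one instrument, its slope element is
$\hat{\beta}_{2,IV} = \frac{\sum_t (z_t - \bar{z})(y_t - \bar{y})}{\sum_t (z_t - \bar{z})(x_t - \bar{x})}$

Its (asymptotic) covariance matrix can be estimated fairly easily (to get standard errors etc.)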
Instrumental variables

Consider the general model $y_t = \mathbf{x}_t' \boldsymbol{\beta} + \varepsilon_t$, where $E(\mathbf{x}_t \varepsilon_t) \neq \mathbf{0}$
for some elements of $\mathbf{x}_t$
Suppose we can find a vector of instruments $\mathbf{z}_t$ with the same dimensions as $\mathbf{x}_t$ such that $E(\mathbf{z}_t \varepsilon_t) = \mathbf{0}$;
note: we only have to find instruments for the endogenous explanatory variables, the exogenous
explanatory variables can serve as their own instruments
Using matrix notation, this can be written as $\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where
$\mathbf{y}$ is the $N \times 1$ column of observations for $y_i$,
$\mathbf{X}$ is the $N \times K$ matrix collecting the vectors $\mathbf{x}_i'$, and
$\boldsymbol{\varepsilon}$ is the $N \times 1$ column of observations for $\varepsilon_i$
The OLS estimate for $\boldsymbol{\beta}$ is $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{y}$, which is inconsistent
Let $\mathbf{Z}$ be the $N \times K$ matrix of instruments; then as in the simple case before, the IV estimator is given by
partially replacing $\mathbf{X}$ with $\mathbf{Z}$ as follows: $\hat{\boldsymbol{\beta}}_{IV} = (\mathbf{Z}'\mathbf{X})^{-1} \mathbf{Z}'\mathbf{y}$
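In matrix form the estimator is a single linear solve. The following sketch assumes simulated data with one endogenous regressor, one exogenous regressor `w` that serves as its own instrument, and hypothetical true coefficients $(1, 2, -1)$; none of these values come from the lecture. Passing $\mathbf{X}$ as the instrument matrix gives OLS for comparison.

```python
import numpy as np


def iv_estimator(Z, X, y):
    """Exactly identified IV estimator (Z'X)^{-1} Z'y, for R = K."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)


rng = np.random.default_rng(5)
n = 100_000
w = rng.normal(size=n)     # exogenous regressor, its own instrument
z = rng.normal(size=n)     # instrument for the endogenous regressor
u = rng.normal(size=n)     # unobserved component causing endogeneity
x_endog = z + 0.5 * w + u + rng.normal(size=n)
eps = u + rng.normal(size=n)

beta = np.array([1.0, 2.0, -1.0])                    # hypothetical true coefficients
X = np.column_stack([np.ones(n), x_endog, w])        # N x K
Z = np.column_stack([np.ones(n), z, w])              # N x K, z replaces x_endog
y = X @ beta + eps

b_iv = iv_estimator(Z, X, y)
b_ols = iv_estimator(X, X, y)    # with Z = X the formula is OLS
print("IV: ", np.round(b_iv, 3))
print("OLS:", np.round(b_ols, 3))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical design choice; the estimator is the same.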
Instrumental variables
• In the above discussion the number of instruments $R$, say, is the
same as the number $K$ of explanatory variables (with exogenous
explanatory variables as their own instruments): $R = K$; identified
• What if $R < K$? We do not have enough information: no solution,
under-identified
• What if $R > K$? We have more information (more equations) than
needed: over-identified
• Rather than ignoring relevant information (if $R > K$) we minimize a
quadratic in the sample moments; leads to Generalized Instrumental
Variables Estimator (GIVE) or Two-Stage Least Squares (2SLS):
$\hat{\boldsymbol{\beta}}_{GIVE} = \left( \mathbf{X}'\mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}'\mathbf{X} \right)^{-1} \mathbf{X}'\mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}'\mathbf{y}$

$\mathbf{X}'\mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}'$ replaces $\mathbf{X}'$ in the OLS formula

Instrumental variables

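Why the name 2SLS? Consider a structural model and reduced form
Let the reduced form of the $k$th explanatory variable be $\mathbf{x}_k = \mathbf{Z} \boldsymbol{\pi}_k + \boldsymbol{\upsilon}_k$
First step; the OLS estimate of $\boldsymbol{\pi}_k$ is $\mathbf{b}_k = (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}'\mathbf{x}_k$
The predicted value $\hat{\mathbf{x}}_k$ of $\mathbf{x}_k$ is: $\hat{\mathbf{x}}_k = \mathbf{Z}\mathbf{b}_k = \mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}'\mathbf{x}_k$

Second step; estimate the original structural equations by OLS, while replacing all
endogenous variables on the right-hand-side with their predicted values from the
reduced form
Let $\hat{\mathbf{X}}$ be the matrix with predicted values with columns $\hat{\mathbf{x}}_k$; it is thus equal to
$\hat{\mathbf{X}} = \mathbf{Z} (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}'\mathbf{X}$
The OLS estimator in the second step is given by $\mathbf{b} = (\hat{\mathbf{X}}'\hat{\mathbf{X}})^{-1} \hat{\mathbf{X}}'\mathbf{y}$, which is actually
equal to GIVE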
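The equality between the two-step procedure and the GIVE formula can be verified numerically. The sketch below uses simulated, over-identified data ($R = 4$ instruments including the constant and the exogenous regressor, $K = 3$ regressors); the data generating process and all variable names are assumptions made for illustration. It computes the closed-form GIVE expression and then runs the two OLS stages explicitly.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
z1, z2, w, u = (rng.normal(size=n) for _ in range(4))
x_endog = z1 + z2 + 0.5 * w + u + rng.normal(size=n)
eps = u + rng.normal(size=n)

beta = np.array([1.0, 2.0, -1.0])                    # hypothetical true coefficients
X = np.column_stack([np.ones(n), x_endog, w])        # K = 3
Z = np.column_stack([np.ones(n), z1, z2, w])         # R = 4 > K
y = X @ beta + eps

# Closed-form GIVE: (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'y
ZtZ_inv_ZtX = np.linalg.solve(Z.T @ Z, Z.T @ X)
ZtZ_inv_Zty = np.linalg.solve(Z.T @ Z, Z.T @ y)
b_give = np.linalg.solve(X.T @ Z @ ZtZ_inv_ZtX, X.T @ Z @ ZtZ_inv_Zty)

# Two stages: first predict X from Z, then regress y on the predictions
X_hat = Z @ ZtZ_inv_ZtX
b_2sls, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

print("GIVE:", np.round(b_give, 4))
print("2SLS:", np.round(b_2sls, 4))
```

Note that the second-stage OLS residuals are not the right ones for standard errors; the IV residuals $\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$ use the original $\mathbf{X}$, not $\hat{\mathbf{X}}$.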
Finding instruments

To use IV as a consistent estimator for $\boldsymbol{\beta}$ we need valid instruments

Two conditions are required for instrument $z_t$ to be valid:
o Exogeneity: $E(z_t \varepsilon_t) = 0$ (instrument uncorrelated with error)
o Relevance: $\mathrm{cov}(x_t, z_t) \neq 0$ (instrument correlated with endogenous regressor)

Exogeneity is based on economic assumptions or gut feeling; it can only be tested if we
have over-identification ($R > K$); Hausman test
Relevance can be tested by regressing $x_t = \gamma_1 + \gamma_2 z_t + v_t$, where an instrument is not
relevant if $\gamma_2 = 0$
Finding instruments is hard.
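The relevance check described above amounts to a t-test on $\gamma_2$ in the first-stage regression. The sketch below is a minimal illustration with simulated data; the helper name `first_stage_t`, the coefficient 0.5 and the sample size are hypothetical, and the standard error uses the usual homoskedastic OLS formula. A relevant instrument should produce a very large t-value, an irrelevant one a t-value that behaves like a draw from a standard normal.

```python
import numpy as np


def first_stage_t(x, z):
    """t-statistic for gamma_2 in the regression x = gamma_1 + gamma_2*z + v."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    sigma2 = resid @ resid / (n - 2)             # residual variance estimate
    cov = sigma2 * np.linalg.inv(Z.T @ Z)        # homoskedastic OLS covariance matrix
    return float(coef[1] / np.sqrt(cov[1, 1]))


rng = np.random.default_rng(7)
n = 5_000
z_relevant = rng.normal(size=n)
z_irrelevant = rng.normal(size=n)                # unrelated to x by construction
x = 0.5 * z_relevant + rng.normal(size=n)

t_relevant = first_stage_t(x, z_relevant)
t_irrelevant = first_stage_t(x, z_irrelevant)
print(f"t-value, relevant instrument:   {t_relevant:.2f}")
print(f"t-value, irrelevant instrument: {t_irrelevant:.2f}")
```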
Estimating the returns to schooling
Estimating the causal effect of schooling upon earnings has attracted substantive
attention in the literature
Causal: What is the effect on earnings of an exogenous increase in schooling?
OLS estimates tend to be biased, because they reflect differences in unobserved
characteristics of individuals that have attained different levels of schooling (such
as: intelligence level, specific skill). This is referred to as “ability bias”.
Another cause of biased OLS estimates could be measurement error in schooling
(downward bias)
Suppose we want to estimate a wage equation explaining earnings from schooling
and other variables
$y_i = \mathbf{x}_{1i}' \boldsymbol{\beta}_1 + x_{2i} \beta_2 + u_i \gamma + \upsilon_i$, where $x_{2i}$ denotes years of schooling, and $u_i$ is an
unobserved variable reflecting “ability”
Estimating the returns to schooling

So, which factors affect schooling but not earnings directly? (not related to unobserved
ability/intelligence that is determining wages?) Parents’ education? Distance to school?
Once we identify the instruments, we are ready to estimate the parameters using IV regression
We do this in two stages, so-called 2SLS (Two-Stage Least Squares).
In order to run IV, we need at least one instrument for each endogenous explanatory variable
IV regression (2SLS) runs as follows:
In the first stage regression, endogenous explanatory variables are regressed upon instruments and
exogenous explanatory variables; fitted values are saved
This is done for all the endogenous regressors included
In the second stage regression, the original dependent variable is regressed upon predicted values of
endogenous regressors and exogenous variables
Wage example

Wage data taken from Card (1995), based on the National Longitudinal Survey of Young
Men
3010 men, wages in 1976
We observe individual characteristics, including experience, race, region, family
background, and so on
We choose a fairly simple specification
First step: always do (and report) OLS; provides a benchmark for what follows
In the equation, if schooling is endogenous, then experience and experience squared
are by construction endogenous
Therefore, at least 3 instruments required for 3 endogenous regressors
Wage example

The estimated average returns to schooling is 7.4%

For 3 endogenous variables, we need 3 instruments: age and age$^2$
for experience and experience squared; in this case college proximity is used for schooling
We estimate the reduced form equation for schooling on age,
age$^2$ and proximity
Wage example Reduced form or first stage regression

Students who live near a college have on average 0.35 years more schooling
The requirement of relevance can be tested (look at the t-values of the instruments)
Instrument exogeneity is not fully testable; it can be tested only if there are over-identifying
restrictions. We need to argue ‘plausibility’ – in cases like this, economic arguments
are more valid than statistical ones
Next, we estimate the regression using the instruments
Wage example
IV estimate

Estimated return to schooling is over 13%; large standard error, but significant
Estimate higher than OLS, but might just be due to sampling error
(estimate fairly robust to alternative specifications)
The larger standard error is due to low correlation between instruments and
endogenous regressors; note the low $R^2$ in the reduced form
Important Issues
Issues

Any IV estimate requires a choice of instruments that should be motivated; always
mention this choice
Reduced form equation, explaining endogenous regressors from exogenous regressors
and instruments, should show significant effect of the instruments
If weak: weak instruments problem (or weak identification) arises
o weak instruments problem: properties of the IV estimator can be very poor and
can be severely biased if the instruments exhibit only weak correlation with the
endogenous regressors
o If so, the normal distribution provides a poor approximation to the true
distribution of the IV estimator, even for large samples
o As a result, the standard IV estimator is biased, its standard errors are misleading
and hypothesis tests are unreliable
Issues

IV estimates are (much) less accurate than OLS (how much depends upon the instruments'
correlation with the endogenous regressors)
No $R^2$ reporting in IV; our goal is to produce a consistent estimator for the causal effect,
which is what IV tries to do
There is no unique definition of an $R^2$ or adjusted $R^2$ if the model is not estimated by
ordinary least squares (OLS)
This reflects that the $R^2$ plays no role at all in comparing alternative estimators
It is possible to use more instruments than required (overidentification)
Sargan test- overall validity of instruments

• The Sargan test can be used to test the overall validity of instruments, provided the
number of instruments exceeds the number of endogenous variables.

• The test statistic is $N R^2$ of an auxiliary regression of the IV residuals upon the full set of
instruments; under the null hypothesis it has an asymptotic chi-squared distribution with $R - K$ degrees of freedom.

• The joint null hypothesis is that the instruments are valid instruments, i.e.,
uncorrelated with the error term, and that the excluded instruments are correctly
excluded from the estimated equation.
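The $N R^2$ statistic can be computed by hand. The sketch below is an illustration on simulated data and rests on several assumptions that are not in the lecture: one endogenous regressor, two excluded instruments (so one over-identifying restriction), homoskedastic errors, and hypothetical helper names `two_sls` and `sargan_statistic`. In the second data set the instrument `z2` is made invalid by giving it a direct effect on $y$, which should produce a very large statistic. For one degree of freedom the 5% chi-squared critical value is about 3.84.

```python
import numpy as np


def two_sls(Z, X, y):
    """2SLS / GIVE coefficients using instrument matrix Z."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)


def sargan_statistic(Z, X, y):
    """N * R^2 from regressing the IV residuals on the full instrument set."""
    resid = y - X @ two_sls(Z, X, y)             # IV residuals use the original X
    fitted = Z @ np.linalg.solve(Z.T @ Z, Z.T @ resid)
    centered = resid - resid.mean()
    unexplained = resid - fitted
    r2 = 1.0 - (unexplained @ unexplained) / (centered @ centered)
    return float(len(y) * r2)


rng = np.random.default_rng(8)
n = 5_000
z1, z2, u = (rng.normal(size=n) for _ in range(3))
x = z1 + z2 + u + rng.normal(size=n)             # endogenous through u
eps = u + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])             # K = 2
Z = np.column_stack([np.ones(n), z1, z2])        # R = 3, so R - K = 1

y_valid = 1.0 + 2.0 * x + eps                    # both instruments valid
y_invalid = y_valid + 0.5 * z2                   # z2 wrongly excluded from the equation

stat_valid = sargan_statistic(Z, X, y_valid)
stat_invalid = sargan_statistic(Z, X, y_invalid)
print(f"Sargan statistic, valid instruments: {stat_valid:.2f}")
print(f"Sargan statistic, invalid z2:        {stat_invalid:.2f}")
```

A rejection only says that at least one instrument is suspect; it does not say which one.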
Next Week
Test of Endogeneity
