
Instrumental variables

Class Notes

Manuel Arellano
Revised: March 1, 2021
Introduction

So far we have studied regression models; that is, models for a conditional
expectation or a linear approximation to it.
Now we wish to study relations between random variables that are not regressions.
An example is the relationship between y_t and y_{t−1} in an ARMA(1,1) model.
A linear regression is a linear relationship between observable and unobservable
variables with the property that the regressors are orthogonal to the unobservable term.
For example, given two variables (y_i, x_i), the regression of y on x is

y_i = α + β x_i + u_i (1)

where β = Cov(y_i, x_i)/Var(x_i), so that Cov(x_i, u_i) = 0.
Similarly, the regression of x on y is:

x_i = γ + δ y_i + ε_i

where δ = Cov(y_i, x_i)/Var(y_i), and Cov(y_i, ε_i) = 0. Solving the latter for y_i also gives:

y_i = α† + β† x_i + u†_i (2)

with α† = −γ/δ, β† = 1/δ, u†_i = −ε_i/δ.

Introduction (continued)

Both (1) and (2) are statistical relationships between y and x. If we are interested in
some economic relation between y and x, how should we choose?

If the goal is to describe means, we would opt for (1) if interested in the mean of y
for given values of x, and for (2) if interested in the mean of x for given values of y.

In equation (2), Cov(x_i, u†_i) ≠ 0 but Cov(y_i, u†_i) = 0, whereas in equation (1) the
opposite is true.

However, in the ARMA(1,1) model both the left-hand side and the right-hand side
variables are correlated with the error term.

To answer a question of this kind we need a prior idea about the nature of the
unobservables in the relationship.

We first illustrate this situation by considering measurement error models.

Measurement error

Measurement error in an exact relationship

Consider an exact relationship between the variables y*_i and x_i:

y*_i = α + β x_i

Suppose we observe x_i without error but we observe an error-ridden measure of y*_i:

y_i = y*_i + v_i

where v_i is a zero-mean measurement error independent of x_i. Therefore,

y_i = α + β x_i + v_i.

In this case β coincides with the slope coefficient in the regression of y_i on x_i:

β = Cov(x_i, y_i)/Var(x_i).

Measurement error in an exact relationship (continued)

Now suppose that we observe y_i without error but x*_i is measured with an error ε_i
independent of (y_i, x*_i):

x_i = x*_i + ε_i.

The relation between the observed variables is

y_i = α + β x_i + ζ_i (3)

where ζ_i = −β ε_i.
In this case the error is independent of y_i but is correlated with x_i.
Thus, β coincides with the inverse slope coefficient in the regression of x_i on y_i:

β = Var(y_i)/Cov(x_i, y_i). (4)

In general, inverse regression makes sense if one suspects that the error term in the
relationship between y and x is essentially driven by measurement error in x.
As will become clear later, (4) can be interpreted as an instrumental-variable
parameter in the sense that y_i is used as an instrument for x_i in (3).
Next, we consider measurement error in regression models.

Regression model with measurement error

Measurement error may be due to definitional differences between a variable of
interest and the one we observe, but also to rounding error or misreporting.

Let us consider the regression model

y*_i = α + β x*_i + u_i

where u_i is independent of x*_i.

We distinguish two cases: one in which there is measurement error in y*_i and another
in which there is measurement error in x*_i.

Measurement error in y*_i

We observe y_i = y*_i + v_i such that v_i ⊥ (x_i, u_i), while x_i = x*_i is observed
without error. In this case,

y_i = α + β x_i + (u_i + v_i),

so that

β = Cov(x_i, y_i)/Var(x_i) = Cov(x_i, y*_i)/Var(x_i).
The only difference with the original regression is that the variance of the error term
is larger due to the measurement error, which means that the R² will be smaller.
Denoting by R*² the fit of the error-free regression:

R*² = β² Var(x_i)/(β² Var(x_i) + σ²_u),   R² = β² Var(x_i)/(β² Var(x_i) + σ²_u + σ²_v),

so that the larger σ²_v, the smaller R² will be relative to R*²:

R² = R*²/(1 + σ²_v/(β² Var(x_i) + σ²_u)).

Measurement error in x*_i

Now x_i = x*_i + ε_i such that ε_i ⊥ (x*_i, u_i), while y_i is observed without error.
In this case,

y_i = α + β x_i + (u_i − β ε_i).

Then, since Cov(x_i, y_i) = β Var(x*_i) and Var(x_i) = Var(x*_i) + σ²_ε,

Cov(x_i, y_i)/Var(x_i) = β/(1 + σ²_ε/Var(x*_i)) = β/(1 + λ) = β − β λ/(1 + λ)

where λ = σ²_ε/Var(x*_i).

Thus, OLS estimates will be biased for β, with a bias that depends on the noise-to-signal
ratio λ.

For example, if λ = 1 the regression coefficient will be half the size of the effect of
interest.

An example: y_i = consumption, x*_i = permanent income, u_i = transitory
consumption, ε_i = transitory income.
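
As a quick numerical check, here is a minimal simulation of the attenuation result
(illustrative numpy code with assumed parameter values, not part of the original notes):
with λ = 1 the OLS slope is roughly half of β.

import numpy as np

# Simulated attenuation bias from measurement error in x (all values assumed).
rng = np.random.default_rng(0)
n, beta = 100_000, 2.0
x_star = rng.normal(0, 1, n)      # true regressor, Var(x*) = 1
u = rng.normal(0, 1, n)           # regression error
eps = rng.normal(0, 1, n)         # measurement error, so lambda = sigma_eps^2 / Var(x*) = 1
y = 1.0 + beta * x_star + u
x = x_star + eps                  # observed, error-ridden regressor

b_ols = np.cov(x, y)[0, 1] / np.var(x)
print(b_ols)                      # approximately beta / (1 + lambda) = 1.0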

Identification using λ

If we can consistently estimate λ or σ²_ε, then consistent estimation may be based on
the following expressions:

β = (1 + λ) Cov(x_i, y_i)/Var(x_i) = Cov(x_i, y_i)/(Var(x_i) − σ²_ε). (5)

More generally, if x*_i is a vector of variables measured with error, so that

y_i = x_i'β + (u_i − ε_i'β)

x_i = x*_i + ε_i,   E(ε_i ε_i') = Ω,

a vector-valued generalization of (5) takes the form:

β = [E(x_i x_i') − Ω]^{-1} E(x_i y_i).
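
As an illustration, a minimal sketch of the corrected estimator when Ω is known
(simulated data and parameter values are assumptions, not from the notes):

import numpy as np

# Measurement-error-corrected estimator: beta = (E(xx') - Omega)^{-1} E(xy).
rng = np.random.default_rng(1)
n, beta = 100_000, np.array([2.0, -1.0])
x_star = rng.normal(size=(n, 2))                 # true regressors (no intercept, for brevity)
y = x_star @ beta + rng.normal(size=n)
omega = np.diag([0.5, 0.25])                     # assumed known error covariance
x = x_star + rng.normal(size=(n, 2)) @ np.sqrt(omega)

exx = x.T @ x / n                                # sample analogue of E(x x')
exy = x.T @ y / n                                # sample analogue of E(x y)
print(np.linalg.solve(exx - omega, exy))         # close to [2, -1]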

Instrumental-variable model

Identification

The set-up is as follows. We observe {y_i, x_i, z_i}, i = 1, ..., n, with dim(x_i) = k and
dim(z_i) = r, such that

y_i = x_i'β + u_i,   E(z_i u_i) = 0.

Typically there will be overlap between variables contained in x_i and z_i, for example a
constant term ("control" variables).
Variables in x_i that are absent from z_i are endogenous explanatory variables.
Variables in z_i that are absent from x_i are external instruments.
The assumption E(z_i u_i) = 0 implies that β solves the system of r equations:

E[z_i (y_i − x_i'β)] = 0

or

E(z_i x_i') β = E(z_i y_i). (6)

If r < k, system (6) has a multiplicity of solutions, so that β is not point identified.
If r ≥ k and rank E(z_i x_i') = k, then β is identified.
In estimation we will distinguish between the just-identified case (r = k) and the
over-identified case (r > k).
If r = k and the rank condition holds we have

β = [E(z_i x_i')]^{-1} E(z_i y_i). (7)
Identification (continued)

In the simple case where x_i = (1, x_oi)', z_i = (1, z_oi)' and β = (β₁, β₂)' we get

β₂ = Cov(z_oi, y_i)/Cov(z_oi, x_oi)

and

β₁ = E(y_i) − β₂ E(x_oi).

In general, the OLS parameters will differ from the parameters in the IV model.

In the previous simple example we have:

Cov(x_oi, y_i)/Var(x_oi) = β₂ + Cov(x_oi, u_i)/Var(x_oi). (8)

Sometimes the orthogonality between instruments and error term is expressed in the
form of a stronger mean independence assumption instead of lack of correlation:

E(u_i | z_i) = 0.

Examples

Demand equation

In this example the units are markets across space or over time, y_i is quantity, the
endogenous explanatory variable is price, and the external instrument is a supply
shifter, such as weather variation in the case of an agricultural product.

This is the classic example from the simultaneous equations literature.

Evaluation of a training program

Here the units are workers, the endogenous explanatory variable is an indicator of
participation in a training program, and y_i is some subsequent labor market outcome,
such as wages or employment status.

The external instrument is an indicator of random assignment to the program.

In this example we would expect the coefficient in the instrumental-variable line to be
positive, whereas the coefficient in the OLS line could be negative.

Examples (continued)

Measurement error
Consider the measurement error regression model:

y_i = β₁ + β₂ x*_i + v_i

where we observe two measurements of x*_i with independent errors:

x_1i = x*_i + ε_1i
x_2i = x*_i + ε_2i.

All unobservables x*_i, v_i, ε_1i, ε_2i are mutually independent.
In this example, we could have x_i = (1, x_1i)', z_i = (1, x_2i)' and u_i = v_i − β₂ ε_1i; or
alternatively x_i = (1, x_2i)', z_i = (1, x_1i)' and u_i = v_i − β₂ ε_2i.

Time series regression with dynamics and serial correlation

A simple example is the ARMA(1,1) model:

y_t = β₁ + β₂ y_{t−1} + u_t
u_t = ε_t + θ ε_{t−1}

where ε_t is a white noise error term. Here x_t = (1, y_{t−1})' and z_t = (1, y_{t−2})'.
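
A small simulation (assumed parameter values) makes the point: OLS of y_t on y_{t−1}
is inconsistent because Cov(y_{t−1}, u_t) = θ σ²_ε ≠ 0, while IV with y_{t−2} as
instrument recovers β₂.

import numpy as np

# ARMA(1,1): OLS of y_t on y_{t-1} is biased; IV using y_{t-2} is consistent.
rng = np.random.default_rng(2)
T, b2, theta = 200_000, 0.5, 0.4
eps = rng.normal(size=T + 1)
u = eps[1:] + theta * eps[:-1]            # MA(1) error u_t = eps_t + theta * eps_{t-1}
y = np.zeros(T)
for t in range(1, T):
    y[t] = b2 * y[t - 1] + u[t]           # beta_1 = 0 for simplicity

y0, y1, y2 = y[2:], y[1:-1], y[:-2]       # y_t, y_{t-1}, y_{t-2}
b_ols = np.cov(y1, y0)[0, 1] / np.var(y1)
b_iv = np.cov(y2, y0)[0, 1] / np.cov(y2, y1)[0, 1]
print(b_ols, b_iv)                        # OLS off target; IV close to 0.5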

Estimation

Simple IV estimator

When r = k a simple IV estimator is the sample counterpart of (7):

β̂ = (∑_{i=1}^n z_i x_i')^{-1} ∑_{i=1}^n z_i y_i.

The estimation error is given by

β̂ − β = (n^{-1} ∑_{i=1}^n z_i x_i')^{-1} (n^{-1} ∑_{i=1}^n z_i u_i).

Thus, plim_{n→∞} β̂ = β if plim n^{-1} ∑_{i=1}^n z_i x_i' = E(z_i x_i') = H, rank H = k, and
plim n^{-1} ∑_{i=1}^n z_i u_i = E(z_i u_i) = 0.

Also,

√n (β̂ − β) →d N(0, H^{-1} W H'^{-1})

if n^{-1/2} ∑_{i=1}^n z_i u_i →d N(0, W).
When {y_i, x_i, z_i}, i = 1, ..., n, is a random sample, W = E(u_i² z_i z_i').
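
A minimal sample-analogue sketch (simulated data; names and parameter values are
assumptions) of the just-identified estimator together with its variance H^{-1} W H'^{-1}:

import numpy as np

# Just-identified IV: beta_hat = (Z'X)^{-1} Z'y, with heteroskedasticity-robust variance.
rng = np.random.default_rng(3)
n = 50_000
z0 = rng.normal(size=n)                        # external instrument
v = rng.normal(size=n)
x0 = 0.8 * z0 + v                              # endogenous regressor
u = 0.5 * v + rng.normal(size=n)               # error correlated with x0, not with z0
y = 1.0 + 2.0 * x0 + u

X = np.column_stack([np.ones(n), x0])          # x_i = (1, x_oi)'
Z = np.column_stack([np.ones(n), z0])          # z_i = (1, z_oi)'
beta_hat = np.linalg.solve(Z.T @ X, Z.T @ y)

u_hat = y - X @ beta_hat
H = Z.T @ X / n                                # sample analogue of E(z x')
W = (Z * (u_hat ** 2)[:, None]).T @ Z / n      # sample analogue of E(u^2 z z')
Hinv = np.linalg.inv(H)
V = Hinv @ W @ Hinv.T                          # H^{-1} W H'^{-1}
print(beta_hat, np.sqrt(np.diag(V) / n))       # estimates and standard errors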

Overidentified IV

If r > k the system (6) contains more equations than unknowns.

To determine the population value of β we could solve any rank-preserving k linear
combinations, for some k × r matrix G:

G E(z_i x_i') β = G E(z_i y_i)

so that

β = [G E(z_i x_i')]^{-1} G E(z_i y_i), (9)

leading to consistent estimators of the form

β̂_G = (∑_{i=1}^n G z_i x_i')^{-1} ∑_{i=1}^n G z_i y_i. (10)

Note that while (9) should be invariant to the choice of G if the model is correctly
specified, the estimated quantity (10) will differ across choices of G due to sampling error.
For example, if x_i = (1, x_oi)' and z_i = (1, z_1i, z_2i)' we will have

Cov(z_1i, y_i)/Cov(z_1i, x_oi) = Cov(z_2i, y_i)/Cov(z_2i, x_oi)

but

Ĉov(z_1i, y_i)/Ĉov(z_1i, x_oi) ≠ Ĉov(z_2i, y_i)/Ĉov(z_2i, x_oi).

Asymptotic normality

Turning to large sample properties, repeating the previous asymptotic normality
argument for (10), under iid sampling we get:

√n (β̂_G − β) →d N(0, V_G)

with

V_G = [G E(z_i x_i')]^{-1} G E(u_i² z_i z_i') G' [E(x_i z_i') G']^{-1}. (11)

Thus, the large sample variance depends on the choice of G.

Optimality
For G = E(x_i z_i') [E(u_i² z_i z_i')]^{-1} the matrix V_G equals

V₀ = [E(x_i z_i') [E(u_i² z_i z_i')]^{-1} E(z_i x_i')]^{-1}.

Moreover, it can be shown that for any other choice of G we have:

V_G − V₀ ≥ 0.

Therefore, estimators of the form

β̂_{G_n} = (∑_{i=1}^n G_n z_i x_i')^{-1} ∑_{i=1}^n G_n z_i y_i (12)

with a possibly stochastic G_n such that plim G_n = E(x_i z_i') [E(u_i² z_i z_i')]^{-1} are
optimal in the sense of being minimum asymptotic variance within the class of linear IV
estimators that use z_i as instruments.

Under homoskedasticity E(u_i² z_i z_i') = σ² E(z_i z_i'), therefore a choice of G_n such that

plim G_n = E(x_i z_i') [E(z_i z_i')]^{-1} = Π

is optimal.

Π is the matrix of OLS population coefficients in regressions of the x_i variables on z_i.

Two-stage least squares

Letting Π̂ = (∑_{i=1}^n x_i z_i')(∑_{i=1}^n z_i z_i')^{-1} be the sample counterpart of Π, the two-stage
least squares estimator is

β̂_2SLS = (∑_{i=1}^n Π̂ z_i x_i')^{-1} ∑_{i=1}^n Π̂ z_i y_i (13)

or in short

β̂_2SLS = (∑_{i=1}^n x̂_i x_i')^{-1} ∑_{i=1}^n x̂_i y_i (14)

where x̂_i = Π̂ z_i is the vector of fitted values in the ("first-stage") regressions of the
x_i variables on z_i:

x_i = Π z_i + v_i. (15)

If a variable in x_i is also contained in z_i, its fitted value will coincide with the variable
itself and the corresponding element of v_i will be equal to zero.
Sometimes it is convenient to use matrix notation as follows:

Π̂ = X'Z (Z'Z)^{-1}

so that

β̂_2SLS = [X'Z (Z'Z)^{-1} Z'X]^{-1} X'Z (Z'Z)^{-1} Z'y

and, letting X̂ = Z (Z'Z)^{-1} (Z'X), also:

β̂_2SLS = (X̂'X)^{-1} X̂'y.
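
A compact matrix implementation of (13)-(14) on assumed simulated data (two external
instruments, one endogenous regressor), checking that (14) agrees with the OLS-on-fitted-values
formula of the next slide:

import numpy as np

# 2SLS via the matrix formulas (simulated, over-identified design: r = 3 > k = 2).
rng = np.random.default_rng(4)
n = 50_000
z1, z2, v = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
x0 = 0.6 * z1 + 0.3 * z2 + v                             # first stage
y = 1.0 + 2.0 * x0 + 0.5 * v + rng.normal(size=n)

X = np.column_stack([np.ones(n), x0])
Z = np.column_stack([np.ones(n), z1, z2])

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)            # fitted values Z(Z'Z)^{-1}Z'X
b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)       # (X_hat'X)^{-1} X_hat'y, eq. (14)
b_check = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)  # OLS of y on X_hat
print(b_2sls, b_check)                                   # identical, approx. [1, 2]

The two coincide because X̂'X̂ = X̂'X when X̂ is the least squares projection of X on Z.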
Two-stage least squares (continued)

β̂_2SLS is also the OLS regression of y on X̂:

β̂_2SLS = (X̂'X̂)^{-1} X̂'y.

This interpretation of the 2SLS estimator is the one that originated its name.

2SLS relies on a powerful intuition: we use as instrument the linear combination of
the instrumental variables that best predicts the endogenous explanatory variables in
the linear projection sense.

Consistency of β̂_2SLS relies on n → ∞ for fixed r.

If r = n then X̂ = X, so that 2SLS and OLS coincide. If r is less than n but close to
it, one would expect 2SLS to be close to OLS.

Robust standard errors

Although its optimality requires homoskedasticity, 2SLS (like OLS) remains a popular
estimator under more general conditions.

Particularizing expression (11) to G = Π we obtain the asymptotic variance of the
2SLS estimator

V_Π = [Π E(z_i z_i') Π']^{-1} Π E(u_i² z_i z_i') Π' [Π E(z_i z_i') Π']^{-1}. (16)

Heteroskedasticity-robust standard errors and confidence intervals can be obtained
from the estimated variance:

V̂_Π = [Π̂ Ê(z_i z_i') Π̂']^{-1} Π̂ Ê(û_i² z_i z_i') Π̂' [Π̂ Ê(z_i z_i') Π̂']^{-1}
    = n (X̂'X̂)^{-1} (∑_{i=1}^n û_i² x̂_i x̂_i') (X̂'X̂)^{-1}

where the û_i are the 2SLS residuals û_i = y_i − x_i' β̂_2SLS.
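
A hypothetical continuation in code, reusing the simulated design from the 2SLS sketch
above (all values assumed):

import numpy as np

# Robust 2SLS variance: n (X_hat'X_hat)^{-1} (sum u_i^2 xhat_i xhat_i') (X_hat'X_hat)^{-1}.
rng = np.random.default_rng(4)
n = 50_000
z1, z2, v = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
x0 = 0.6 * z1 + 0.3 * z2 + v
y = 1.0 + 2.0 * x0 + 0.5 * v + rng.normal(size=n)
X = np.column_stack([np.ones(n), x0])
Z = np.column_stack([np.ones(n), z1, z2])

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
u_hat = y - X @ b_2sls                           # residuals use X, not X_hat

A = np.linalg.inv(X_hat.T @ X_hat)
meat = (X_hat * (u_hat ** 2)[:, None]).T @ X_hat # sum of u_i^2 xhat_i xhat_i'
V_hat = n * A @ meat @ A                         # estimated asymptotic variance
print(b_2sls, np.sqrt(np.diag(V_hat) / n))       # robust standard errors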

Robust standard errors (continued)

With homoskedastic errors, (16) boils down to

V_Π = σ² [Π E(z_i z_i') Π']^{-1} (17)

where σ² = E(u_i²). In this case a consistent estimator of V_Π is simply

Ṽ_Π = n σ̂² (X̂'X̂)^{-1} (18)

where σ̂² = n^{-1} ∑_{i=1}^n û_i².

Note that if the residual variance is calculated from the fitted-value residuals
y − X̂ β̂_2SLS instead of û = y − X β̂_2SLS, we would get an inconsistent estimate of σ²
and therefore also of V_Π in (17).
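
The pitfall is easy to verify numerically; a minimal sketch under the same assumed design:

import numpy as np

# sigma^2 must come from structural residuals y - X b, not fitted-value residuals y - X_hat b.
rng = np.random.default_rng(4)
n = 50_000
z1, z2, v = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
x0 = 0.6 * z1 + 0.3 * z2 + v
y = 1.0 + 2.0 * x0 + 0.5 * v + rng.normal(size=n)
X = np.column_stack([np.ones(n), x0])
Z = np.column_stack([np.ones(n), z1, z2])
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

print(np.mean((y - X @ b) ** 2))        # consistent: approx. Var(u) = 1.25 in this design
print(np.mean((y - X_hat @ b) ** 2))    # inconsistent: much larger (adds first-stage noise)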

Testing overidentifying restrictions

When r > k an IV estimator sets to zero k linear combinations of the r moments:

G E(z_i x_i') β = G E(z_i y_i).

Thus, there remain r − k linearly independent combinations that are not set to zero
in estimation but should be close to zero under correct specification.

A test of overidentifying restrictions, or Sargan test, is a test of the null hypothesis
that the remaining r − k linear combinations are equal to zero.

Under classical errors the form of the statistic is given by

S = û'Z (Z'Z)^{-1} Z'û / σ̂² →d χ²_{r−k}. (19)

It is easy to see that S = n R², where R² is the R-squared in a regression of û on Z.
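
In code, under the same assumed over-identified design (r − k = 1), the statistic can be
computed either way:

import numpy as np

# Sargan statistic: S = u'Z(Z'Z)^{-1}Z'u / sigma_hat^2 = n * R^2 of u_hat on Z.
rng = np.random.default_rng(4)
n = 50_000
z1, z2, v = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
x0 = 0.6 * z1 + 0.3 * z2 + v
y = 1.0 + 2.0 * x0 + 0.5 * v + rng.normal(size=n)
X = np.column_stack([np.ones(n), x0])
Z = np.column_stack([np.ones(n), z1, z2])

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
u_hat = y - X @ np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

Pu = Z @ np.linalg.solve(Z.T @ Z, Z.T @ u_hat)    # projection of u_hat on Z
S = n * (u_hat @ Pu) / (u_hat @ u_hat)            # equals n R^2 (u_hat has mean zero here)
print(S)                                          # valid instruments: S ~ chi2(1), 95% cv 3.84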

Testing overidentifying restrictions (continued)

A sketch of the result in (19) is as follows. With classical errors

n^{-1/2} ∑_{i=1}^n z_i u_i →d N(0, σ² E(z_i z_i'))

and therefore also

(1/(√n σ̂)) C'Z'u →d N(0, I_r)

where we are using the factorization (Z'Z/n)^{-1} = C C'.
Next, using

û = y − X β̂_2SLS = u − X (β̂_2SLS − β)

and

β̂_2SLS − β = [X'Z (Z'Z)^{-1} Z'X]^{-1} X'Z (Z'Z)^{-1} Z'u,

we get

h = (1/(√n σ̂)) C'Z'û = [I_r − B (B'B)^{-1} B'] (1/(√n σ̂)) C'Z'u

where B = C'(Z'X/n).
Since the probability limit of I_r − B (B'B)^{-1} B' is idempotent with rank r − k, it
follows that

h'h = n û'Z (Z'Z)^{-1} Z'û / (û'û) →d χ²_{r−k}.
Testing overidentifying restrictions (continued)

In the presence of heteroskedasticity, the statistic S in (19) is not asymptotically
chi-square, not even under correct specification.

An alternative robust Sargan statistic is:

S_R = ũ'Z W̃^{-1} Z'ũ →d χ²_{r−k} (20)

where W̃ = ∑_{i=1}^n û_i² z_i z_i' and ũ = y − X β̂_{G†_n} with G†_n = (X'Z) W̃^{-1}.

In contrast to β̂_2SLS, the IV estimator β̂_{G†_n} given by

β̂_{G†_n} = [X'Z W̃^{-1} Z'X]^{-1} X'Z W̃^{-1} Z'y (21)

uses an optimal choice of G_n under heteroskedasticity.

This improved IV estimator was studied by Halbert White in 1982 under the name
two-stage instrumental variables (2SIV) estimator.
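
A sketch of (20)-(21) under the same assumed simulated design; the weight matrix W̃ is
built from first-step 2SLS residuals:

import numpy as np

# White's 2SIV estimator (eq. 21) and the robust Sargan statistic (eq. 20).
rng = np.random.default_rng(4)
n = 50_000
z1, z2, v = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
x0 = 0.6 * z1 + 0.3 * z2 + v
y = 1.0 + 2.0 * x0 + 0.5 * v + rng.normal(size=n)
X = np.column_stack([np.ones(n), x0])
Z = np.column_stack([np.ones(n), z1, z2])

# Step 1: 2SLS residuals to form W_tilde = sum u_i^2 z_i z_i'.
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
u_2sls = y - X @ np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
W = (Z * (u_2sls ** 2)[:, None]).T @ Z

# Step 2: b_2siv = [X'Z W^{-1} Z'X]^{-1} X'Z W^{-1} Z'y.
XZ = X.T @ Z
b_2siv = np.linalg.solve(XZ @ np.linalg.solve(W, Z.T @ X),
                         XZ @ np.linalg.solve(W, Z.T @ y))

u_tilde = y - X @ b_2siv
S_R = u_tilde @ Z @ np.linalg.solve(W, Z.T @ u_tilde)   # robust Sargan, chi2(r-k)
print(b_2siv, S_R)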
