Applied Economics: Instrumental Variables

Philipp Ager
University of Mannheim and CEPR

IV Lecture Notes
September 10, 16, 23 and 24, 2024

1 / 64
Non-technical Introduction to IV-Estimation

In many applications of the linear regression model, we suspect that some regressors are endogenous
How can we solve the problem of endogeneity of one or more explanatory variables?
Sometimes we can find exogenous variables that are correlated with the endogenous regressor but uncorrelated with the error term
Such variables are called instrumental variables, or instruments

2 / 64
Inconsistency of OLS

$$y = \beta x + u$$
If the regressor is uncorrelated with the error term, the OLS estimator is consistent (and unbiased if, in addition, $E[u \mid x] = 0$)
In many circumstances the regressor is correlated with the error term, and the OLS estimator is biased and inconsistent
Ex.: if wages are regressed only on years of schooling, unobserved ability ends up in the error term and is correlated with schooling
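A minimal simulation of the schooling example, written in Python with only the standard library. The data-generating process is hypothetical: unobserved ability $A$ raises both schooling and log wages, and all parameter values are made up. Regressing log wages on schooling alone picks up part of the ability effect, so the OLS slope matches the omitted-variable-bias formula $\rho + \gamma\,\text{cov}(S, A)/\text{var}(S)$ rather than the true $\rho$.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate_ovb(n=100_000, rho=0.08, gamma=0.05, seed=1):
    """Hypothetical DGP: ln Y = 1 + rho*S + gamma*A + e, with S correlated with A."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Schooling rises with ability, so S is correlated with the error u = gamma*A + e
    school = [12 + 2 * a + rng.gauss(0, 2) for a in ability]
    log_wage = [1 + rho * s + gamma * a + rng.gauss(0, 0.5)
                for s, a in zip(school, ability)]
    beta_ols = cov(log_wage, school) / cov(school, school)  # short regression slope
    ovb_formula = rho + gamma * cov(school, ability) / cov(school, school)
    return beta_ols, ovb_formula


beta_ols, ovb_formula = simulate_ovb()
print(f"OLS slope: {beta_ols:.4f}, OVB formula: {ovb_formula:.4f}, true rho: 0.08")
```

With these made-up values the population OLS slope is $0.08 + 0.05 \cdot 2/8 = 0.0925$, so OLS overstates the return to schooling.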

3 / 64
Threats to Identification

Applied researchers are often confronted with so-called “endogeneity problems”
The most common threats to identification are:
1 Omitted variables
2 Reverse causality
3 Measurement error
One potential solution to these problems is the instrumental variable approach

4 / 64
Definition of an Instrument

Instruments are typically derived from natural or randomized experiments
The aim is to find a variable $z$ that generates only exogenous variation in $x$ and is not correlated with the error term
$z$ can be used as an instrumental variable for the regressor $x$ in the scalar regression model $y = \beta x + u$ ONLY IF
(1) $z$ is uncorrelated with the error $u$ (exogeneity)
(2) $z$ is correlated with the regressor $x$ (relevance)

5 / 64
Instrument Validity

1 Instrument is as good as randomly assigned
2 Instrument exogeneity:
The instrument affects the dependent variable only through the endogenous regressor, not directly and not through omitted variables (exclusion restriction)
The dependent variable has no reverse effect on the instruments
Researchers must convincingly argue why the instruments influence only the endogenous regressors
3 Instrument relevance (can be tested):
Valid instruments are strongly correlated with the endogenous regressors, even after controlling for the exogenous regressors
Weak instruments can be detected with first-stage diagnostics

6 / 64
Example

$$Y_i = \alpha + \rho S_i + \gamma A_i + e_i$$
The aim is to estimate the returns to schooling, $\rho$
Problem: omitted variable bias, since ability $A_i$ is unobserved (is there a good proxy for ability?)
IV solution: use variation in $S_i$ that is unrelated to $A_i$ and other unobserved variables

7 / 64
Three important causal effects

Call the instrumental variable $Z_i$
There are three causal effects we can think about:
1 The causal effect of $Z_i$ on $S_i$
2 The causal effect of $Z_i$ on $Y_i$
3 The causal effect of $S_i$ on $Y_i$
The last effect is the one we are ultimately interested in: the returns to schooling, $\rho$

8 / 64
Three important equations
Some “IV language” relating to the three causal effects on the previous slide:
1 First stage: regression of schooling on the instrument (causal effect #1)
$$S_i = \pi_{10} + \pi_{11} Z_i + \epsilon_{1i}$$
2 Reduced form: regression of earnings on the instrument (causal effect #2)
$$Y_i = \pi_{20} + \pi_{21} Z_i + \epsilon_{2i}$$
3 Structural equation: regression of earnings on schooling (causal effect #3)
$$Y_i = \alpha + \rho S_i + \eta_i$$
$\eta_i = \gamma A_i + e_i$ is the structural error term


9 / 64
Important to know (1)

Conditions 1 and 3 of instrument validity are enough to give equations #1 and #2 a causal interpretation, i.e., the first stage and the reduced form capture causal effects
Note: the reduced form coefficient can be interesting in itself. For example, the instrument might be a policy variable (e.g., compulsory schooling laws), in which case the reduced form coefficient is the policy effect

10 / 64
Important to know (2)

To get the causal effect in equation #3, i.e., the structural parameter $\rho$, we also need condition 2, the exclusion restriction
This is often the most difficult requirement for an instrument to meet. It is distinct from random assignment, so having experimental variation does not guarantee a valid interpretation of the IV estimates

11 / 64
Linking the three equations

The coefficients of the three regressions are indeed linked:

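$$
\begin{aligned}
Y_i &= \alpha + \rho S_i + \eta_i \\
&= \alpha + \rho\,[\pi_{10} + \pi_{11} Z_i + \epsilon_{1i}] + \eta_i \\
&= (\alpha + \rho\pi_{10}) + \rho\pi_{11} Z_i + (\rho\epsilon_{1i} + \eta_i) \\
&= \pi_{20} + \pi_{21} Z_i + \epsilon_{2i}
\end{aligned}
$$
The reduced form coefficients are:
$$\pi_{20} = \alpha + \rho\pi_{10}, \qquad \pi_{21} = \rho\pi_{11}$$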

12 / 64
Indirect Least Squares (ILS)

The IV estimate equals the ratio of the reduced form coefficient on the instrument to the first stage coefficient
This is called indirect least squares (ILS):
$$\rho = \frac{\pi_{21}}{\pi_{11}}$$
This only works with one endogenous regressor and one instrument; in this case the model is said to be just identified
If there are multiple instruments for a single endogenous regressor, the model is over-identified
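A sketch of indirect least squares on simulated data, assuming a hypothetical binary instrument $Z$ that shifts schooling by one year and is independent of ability (all parameter values are made up). The names `pi11` and `pi21` follow the first stage and reduced form above; their ratio recovers $\rho = 0.08$, while OLS of wages on schooling does not.

```python
import random


def slope(y, x):
    """OLS slope from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ils_demo(n=100_000, rho=0.08, seed=2):
    """Hypothetical DGP with a binary instrument Z that is independent of ability."""
    rng = random.Random(seed)
    z = [float(rng.random() < 0.5) for _ in range(n)]
    ability = [rng.gauss(0, 1) for _ in range(n)]
    school = [12 + 1.0 * zi + 2 * a + rng.gauss(0, 2) for zi, a in zip(z, ability)]
    log_wage = [1 + rho * s + 0.05 * a + rng.gauss(0, 0.5)
                for s, a in zip(school, ability)]
    pi11 = slope(school, z)    # first stage coefficient
    pi21 = slope(log_wage, z)  # reduced form coefficient
    return {"pi11": pi11, "pi21": pi21, "rho_ils": pi21 / pi11,
            "rho_ols": slope(log_wage, school)}


print(ils_demo())
```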

13 / 64
Two-stage Least Squares Estimation

A standard estimation technique used by applied researchers working with instrumental variables is two-stage least squares (2SLS)
The 2SLS estimator gets its name from the fact that it can be obtained by two consecutive OLS regressions:
OLS regression of $x$ on the $z$’s to get the fitted values $\hat{x}$
followed by OLS of $y$ on $\hat{x}$, which gives $\hat{\beta}_{2SLS}$
Standard statistical programs, such as Stata, have a routine to estimate 2SLS
One can show that with a single endogenous variable and a single instrument, the 2SLS estimator is identical to the corresponding ILS estimator
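A sketch of 2SLS run by hand as two OLS regressions, on hypothetical simulated data with a confounder `u` and a true slope of 0.5. With one endogenous regressor and one instrument, the second-stage slope coincides with the ILS ratio up to floating-point error. In practice a packaged 2SLS routine is preferable: the standard errors from a hand-run second stage use residuals $y - \hat{\beta}\hat{x}$ instead of $y - \hat{\beta}x$ and are therefore wrong.

```python
import random


def ols(y, x):
    """Intercept and slope from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b


def two_stage(y, x, z):
    """2SLS by hand: regress x on z, then y on the fitted values x_hat."""
    a1, b1 = ols(x, z)                  # first stage
    x_hat = [a1 + b1 * zi for zi in z]  # fitted values
    _, beta_2sls = ols(y, x_hat)        # second stage
    return beta_2sls


def indirect(y, x, z):
    """ILS: reduced form slope divided by first stage slope."""
    return ols(y, z)[1] / ols(x, z)[1]


rng = random.Random(3)
n = 50_000
z = [float(rng.random() < 0.5) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]                        # unobserved confounder
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]        # endogenous regressor
y = [0.5 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # true beta = 0.5
print(two_stage(y, x, z), indirect(y, x, z))
```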

14 / 64
Example: Angrist and Krueger (1991, QJE)

Quarter-of-birth as instruments

Most U.S. states require students to enter school in the calendar year in which they turn 6
School start age is therefore a function of date of birth: children born in the 1st quarter enter school at an older age than those born in the 4th quarter
Compulsory schooling laws typically require students to remain in school until their 16th birthday
The combination of school-entry rules and compulsory schooling laws creates a natural experiment: how long students must stay in school before they can drop out depends on their birthday

15 / 64
Example: Angrist and Krueger (1991, QJE)

Data are from the 1980 US Census

The sample includes 329,509 men born 1930 to 1939 (i.e., they are in their 40s when observed)
For these men, the data contain year of birth, quarter of birth, years of schooling, and earnings in 1979

16 / 64
Example: Angrist and Krueger (1991, QJE)
Men born earlier in the calendar year tend to have lower average
schooling levels

Source: Angrist and Pischke (2009)


17 / 64
Example: Angrist and Krueger (1991, QJE)
Men born in early quarters almost always earned less, on average,
than those born later in the year

Source: Angrist and Pischke (2009)


18 / 64
Example: Angrist and Krueger (1991, QJE)

The reduced form and the first stage show related patterns
Key assumption: an individual’s date of birth should be unrelated to innate ability, family connections, motivation . . .
⇒ The only reason for the up-and-down quarter-of-birth pattern in earnings is the quarter-of-birth pattern in schooling

19 / 64
Example: Angrist and Krueger (1991, QJE)

One way to look at IV with a binary instrument and no covariates (e.g., Angrist and Krueger use a dummy for being born in the first quarter, Q1) is the following:
$$\beta_{IV} = \frac{\text{cov}(\ln Y_i, Q1_i)}{\text{cov}(S_i, Q1_i)} = \frac{E[\ln Y_i \mid Q1_i = 1] - E[\ln Y_i \mid Q1_i = 0]}{E[S_i \mid Q1_i = 1] - E[S_i \mid Q1_i = 0]}$$
This rescales the reduced-form difference in means by the corresponding first-stage difference in means (= Wald estimator)
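A small check that, with a binary instrument and no covariates, the covariance ratio and the Wald ratio of mean differences are the same number. The seven observations are hypothetical toy values, not Angrist and Krueger’s data.

```python
def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Covariance dividing by n; the ratio below does not depend on the divisor."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def wald(y, s, q1):
    """Difference in means of y across q1 groups divided by the difference in means of s."""
    y1 = [v for v, q in zip(y, q1) if q == 1]
    y0 = [v for v, q in zip(y, q1) if q == 0]
    s1 = [v for v, q in zip(s, q1) if q == 1]
    s0 = [v for v, q in zip(s, q1) if q == 0]
    return (mean(y1) - mean(y0)) / (mean(s1) - mean(s0))


# Hypothetical toy data: Q1 dummy, years of schooling, log weekly wage
q1 = [1, 1, 1, 0, 0, 0, 0]
s = [11, 12, 12, 12, 13, 12, 13]
ln_y = [5.8, 5.9, 6.0, 5.9, 6.1, 6.0, 6.0]

beta_cov = cov(ln_y, q1) / cov(s, q1)
beta_wald = wald(ln_y, s, q1)
print(beta_cov, beta_wald)
```

Both estimates equal 0.12 up to floating-point rounding: in this toy data a first-quarter birth lowers log earnings by 0.1 and schooling by 5/6 of a year.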

20 / 64
Example: Angrist and Krueger (1991, QJE)

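The first stage and reduced form are:
$$S_i = \pi_{11} + \pi_{12} Q1_i + \epsilon_{1i}, \qquad \ln Y_i = \pi_{21} + \pi_{22} Q1_i + \epsilon_{2i}$$
Taking expectations conditional on $Q1_i$, we obtain:
$$E[\ln Y_i \mid Q1_i = 1] = \pi_{21} + \pi_{22}, \qquad E[\ln Y_i \mid Q1_i = 0] = \pi_{21}$$
The same holds for the first stage. The differences in group means with the instrument switched on/off are therefore $\pi_{12}$ (first stage) and $\pi_{22}$ (reduced form)
The ratio between the two, $\pi_{22}/\pi_{12}$, is the IV estimator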

21 / 64
Example: Angrist and Krueger (1991, QJE)
Wald estimator: the return to education is the ratio of the difference in earnings between men born in the 1st quarter and those born in other quarters of the year to the corresponding difference in schooling

Source: Angrist and Krueger (1991)

22 / 64
Example: Angrist and Krueger (1991, QJE)

Source: Angrist and Pischke (2009)

23 / 64
Example: Angrist and Krueger (1991, QJE)

Is quarter-of-birth a good instrument?
1 Random assignment: births are almost uniformly spaced over the year ... but
2 Exclusion restriction: maternal characteristics vary with season of birth. Women giving birth in winter are more likely to be teenagers and less likely to be married or to have a high school diploma (Buckles and Hungerman, 2013)
3 Relevance: quarter of birth may be a weak instrument (see Bound et al., 1995). This can be detected by looking at first-stage statistics
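A sketch of a first-stage diagnostic on simulated data: with a single instrument, the first-stage F-statistic is the squared t-statistic on the instrument. The schooling equation, the effect sizes, and the 25% share with $Z = 1$ are hypothetical, and the standard error assumes homoskedasticity. A commonly used rule of thumb treats a first-stage F below 10 as a warning sign of a weak instrument.

```python
import math
import random


def first_stage_f(s, z):
    """First-stage slope and F-statistic (= t^2 with one instrument, homoskedastic SE)."""
    n = len(z)
    mz, ms = sum(z) / n, sum(s) / n
    szz = sum((a - mz) ** 2 for a in z)
    b = sum((a - mz) * (c - ms) for a, c in zip(z, s)) / szz
    a0 = ms - b * mz
    rss = sum((c - a0 - b * a) ** 2 for a, c in zip(z, s))
    se = math.sqrt(rss / (n - 2) / szz)
    return b, (b / se) ** 2


def simulate_first_stage(effect, n=10_000, seed=5):
    """Hypothetical first stage: schooling = 12 + effect * Z + noise, Z a binary dummy."""
    rng = random.Random(seed)
    z = [float(rng.random() < 0.25) for _ in range(n)]
    s = [12 + effect * zi + rng.gauss(0, 3) for zi in z]
    return first_stage_f(s, z)


for effect in (0.02, 0.5):
    b, f = simulate_first_stage(effect)
    print(f"true effect {effect}: estimate {b:.3f}, first-stage F {f:.1f}")
```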

24 / 64
Buckles and Hungerman (2013, RESTAT)

Source: Buckles and Hungerman (2013)


25 / 64
IV with Heterogeneous Treatment Effects

26 / 64
Recap: IV with a constant causal effect

Assume one binary endogenous regressor and one binary instrument
Let $Y_i$ be the outcome of interest for unit $i$, $D_i$ the endogenous regressor, and $Z_i$ the instrument
Linear model:
$$Y_i = \alpha + \rho D_i + \eta_i$$
If the instrument $Z_i$ is both uncorrelated with $\eta_i$ and correlated with the endogenous regressor $D_i$
⇒ use 2SLS (or ILS) to obtain $\hat{\rho}_{IV}$

27 / 64
Heterogeneous Treatment Effects

Let’s allow for treatment effect heterogeneity (i.e., a distribution of causal effects across individuals)
The main questions are:
1 What is IV estimating when we have heterogeneous treatment effects?
2 Under what assumptions will IV identify a causal effect with heterogeneous treatment effects?

28 / 64
Heterogeneous Treatment Effects

Why is treatment heterogeneity important?

With heterogeneous treatment effects, we introduce a distinction between the internal validity of a study and its external validity
Internal validity means our strategy identified a causal effect for the population we studied (e.g., a randomized clinical trial has a strong claim to internal validity)
External validity is the predictive value of the study’s findings in a different context
Under homogeneous treatment effects, there is no tension between external and internal validity because everyone has the same treatment effect, $\rho$

29 / 64
Local average treatment effects (LATE) – (1)

Let’s adopt a generalized potential outcomes concept, indexed by both the instrument and treatment status
Let $Y_i(d, z)$ denote the potential outcome of individual $i$ were this person to have treatment status $D_i = d$ and instrument value $Z_i = z$
This tells us what the outcome of $i$ would be under alternative combinations of $D_i$ and $Z_i$
We can think of instrumental variables as initiating a causal chain in which the instrument $Z_i$ affects the variable of interest, $D_i$, which in turn affects the outcome, $Y_i$

30 / 64
Local average treatment effects (LATE) – (2)

Let $D_{1i}$ be $i$’s treatment status when $Z_i = 1$ and $D_{0i}$ the treatment status when $Z_i = 0$
The sub-populations
The LATE framework partitions any population with an instrument into three instrument-dependent subgroups:
Compliers: the subpopulation with $D_{1i} = 1$ and $D_{0i} = 0$
Always-takers: the subpopulation with $D_{1i} = D_{0i} = 1$
Never-takers: the subpopulation with $D_{1i} = D_{0i} = 0$
There is also a group called defiers, who do exactly the opposite of what the assignment (instrument) tells them to do (we will rule them out later)

31 / 64
Local average treatment effects (LATE) – (3)

Let’s think about the compliance behavior of the different units, that is, how they respond to different values of the instrument in terms of the treatment received
There are four possible pairs of values $(D_i(0), D_i(1))$, given the binary nature of the treatment and the instrument
Problem: we only observe the pair $(Z_i, D_i)$, not the pair $(D_i(0), D_i(1))$

32 / 64
Local average treatment effects (LATE) – (4)

Only one of the potential treatment assignments, $D_{1i}$ and $D_{0i}$, is ever observed for any one person
The observed treatment status is therefore:
$$D_i = D_{0i} + (D_{1i} - D_{0i}) Z_i = \pi_0 + \pi_{1i} Z_i + \eta_i$$
$\pi_0 \equiv E[D_{0i}]$ and $\eta_i \equiv D_{0i} - E[D_{0i}]$
$\pi_{1i} \equiv (D_{1i} - D_{0i})$ is the heterogeneous causal effect of the instrument on $D_i$
The average causal effect of $Z_i$ on $D_i$ is $E[\pi_{1i}]$

33 / 64
Local average treatment effects (LATE) – (5)

Table 2 summarizes the information about compliance behavior contained in the observed treatment status and instrument
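As a sketch of the logic behind such a table (not a reproduction of Table 2 itself), the code below enumerates which compliance types $(D_i(0), D_i(1))$ are consistent with each observed pair $(Z_i, D_i)$. The helper `consistent_types` is hypothetical; setting `allow_defiers=False` mimics ruling out defiers (monotonicity).

```python
# Compliance types indexed by the pair of potential treatments (D_i(0), D_i(1))
TYPES = {
    (0, 0): "never-taker",
    (1, 1): "always-taker",
    (0, 1): "complier",
    (1, 0): "defier",
}


def consistent_types(z, d, allow_defiers=True):
    """Compliance types that could have generated the observed pair (Z_i, D_i) = (z, d)."""
    names = []
    for (d0, d1), name in TYPES.items():
        if name == "defier" and not allow_defiers:
            continue  # monotonicity rules defiers out
        observed_d = d1 if z == 1 else d0  # only the potential treatment for the realised z is seen
        if observed_d == d:
            names.append(name)
    return sorted(names)


for z in (0, 1):
    for d in (0, 1):
        print((z, d), consistent_types(z, d), consistent_types(z, d, allow_defiers=False))
```

Each observed cell is consistent with two types; only once defiers are ruled out do the cells $(Z_i, D_i) = (0, 1)$ and $(1, 0)$ identify always-takers and never-takers, respectively.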

34 / 64
Assumptions for Identification (1)

1 Independence assumption: the instrument is as good as randomly assigned, i.e., it is independent of the vector of potential outcomes and potential treatment assignments
This is sufficient for a causal interpretation of the reduced form (i.e., the effect of the instrument on Y)
The first stage then captures the causal effect of $Z_i$ on $D_i$
2 Exclusion restriction: $Y_i(d, 0) = Y_i(d, 1)$ for $d = 0, 1$, i.e., the instrument affects the outcome only through $D_i$. This is distinct from the claim that the instrument is (as good as) randomly assigned

35 / 64
Assumptions for Identification (2)

Example: the draft lottery used to estimate the causal effect of military service on earnings (Angrist, 1990)
The exclusion restriction would be violated if low lottery numbers affected schooling, e.g., because people stayed in school to avoid the draft
In that case, the lottery number would be correlated with earnings through at least two channels:
one through the instrument’s effect on military service, and two through the instrument’s effect on schooling
The implication is that a randomly assigned lottery number (independence) does not by itself imply that the exclusion restriction is satisfied
The exclusion restriction is a claim about a unique channel for the causal effects of the instrument
36 / 64
Assumptions for Identification (3)

3 Existence of a first stage
$E[D_{1i} - D_{0i}] \neq 0$
$Z$ needs to have a statistically significant effect on the average probability of treatment
Example: having a low lottery number. Does it increase the average probability of military service? If so, it satisfies the first stage requirement
Note: this can be tested

37 / 64
Assumptions for Identification (4)
4 Monotonicity assumption
The instrument may have no effect on some people, but all those who are affected are affected in the same direction, e.g., $D_{1i} \geq D_{0i}$ for all $i$
It is not the case that the instrument pushes some people into treatment while pushing others out ⇒ no defiers

38 / 64
Local average treatment effects (LATE) – (6)

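Given these four assumptions, one can interpret the coefficient of interest, $\rho$, as the local average treatment effect (LATE):
$$
\begin{aligned}
\rho_{IV,LATE} &= \frac{\text{Effect of } Z \text{ on } Y}{\text{Effect of } Z \text{ on } D}
= \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]} \\
&= \frac{E[Y_i(D_{1i}, 1) - Y_i(D_{0i}, 0)]}{E[D_{1i} - D_{0i}]} \\
&= E[Y_{1i} - Y_{0i} \mid D_{1i} - D_{0i} = 1]
\end{aligned}
$$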
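A minimal simulation of the LATE result, assuming a hypothetical population of 20% always-takers, 50% compliers, and 30% never-takers, with made-up type-specific baselines and treatment effects (4 for always-takers, 1 for compliers, 0 for never-takers) and no defiers. The Wald ratio recovers the complier effect of 1, not the population average effect.

```python
import random


def mean(v):
    return sum(v) / len(v)


def simulate_late(n=100_000, seed=6):
    """Hypothetical population: 20% always-takers, 50% compliers, 30% never-takers."""
    rng = random.Random(seed)
    baseline = {"a": 2.0, "c": 1.0, "n": 0.0}  # E[Y0 | type], made up
    effect = {"a": 4.0, "c": 1.0, "n": 0.0}    # Y1 - Y0 by type, made up
    y, d, z, effects = [], [], [], []
    for _ in range(n):
        u = rng.random()
        kind = "a" if u < 0.2 else ("c" if u < 0.7 else "n")
        zi = int(rng.random() < 0.5)                           # randomly assigned instrument
        di = 1 if kind == "a" else (zi if kind == "c" else 0)  # monotonicity: no defiers
        y0 = baseline[kind] + rng.gauss(0, 1)
        y1 = y0 + effect[kind]
        y.append(y1 if di else y0)                             # only one outcome is observed
        d.append(di)
        z.append(zi)
        effects.append(y1 - y0)
    reduced_form = (mean([yi for yi, zi in zip(y, z) if zi == 1])
                    - mean([yi for yi, zi in zip(y, z) if zi == 0]))
    first_stage = (mean([di for di, zi in zip(d, z) if zi == 1])
                   - mean([di for di, zi in zip(d, z) if zi == 0]))
    return reduced_form / first_stage, mean(effects)


late, ate = simulate_late()
print(f"Wald/IV estimate: {late:.3f}  (complier effect = 1.0)")
print(f"Population ATE:   {ate:.3f}  (0.2*4 + 0.5*1 + 0.3*0 = 1.3)")
```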

39 / 64
Local average treatment effects (LATE) – (6)

$\rho_{IV,LATE}$ is the average causal effect of $D$ on $Y$ for those whose treatment status was changed by the instrument $Z$
We know this from the conditioning in the last line, $D_{1i} - D_{0i} = 1$
We compute the difference in potential outcomes only for people for whom $D_{1i} - D_{0i} = 1$, which means we are averaging treatment effects over compliers only

This is why the parameter we are estimating is “local”

40 / 64
Local average treatment effects (LATE) – (7)

Some more formalities . . .

Consider the least squares regression of $Y$ on a constant and $Z$ (reduced form). The slope coefficient from this regression is $E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]$
Consider the first term:
$$
\begin{aligned}
E[Y_i \mid Z_i = 1] &= E[Y_i \mid Z_i = 1, c]\, P(c \mid Z_i = 1) + E[Y_i \mid Z_i = 1, n]\, P(n \mid Z_i = 1) \\
&\quad + E[Y_i \mid Z_i = 1, a]\, P(a \mid Z_i = 1) \\
&= E[Y_{1i} \mid c]\, \pi_c + E[Y_{0i} \mid n]\, \pi_n + E[Y_{1i} \mid a]\, \pi_a
\end{aligned}
$$
where $\pi_c$, $\pi_n$, and $\pi_a$ are the shares of compliers, never-takers, and always-takers in the population(*); the second equality uses the definitions of the compliance types and independence, so that $P(c \mid Z_i = 1) = \pi_c$, etc.

41 / 64
Local average treatment effects (LATE) – (8)

Consider the second term:
$$
\begin{aligned}
E[Y_i \mid Z_i = 0] &= E[Y_i \mid Z_i = 0, c]\, P(c \mid Z_i = 0) + E[Y_i \mid Z_i = 0, n]\, P(n \mid Z_i = 0) \\
&\quad + E[Y_i \mid Z_i = 0, a]\, P(a \mid Z_i = 0) \\
&= E[Y_{0i} \mid c]\, \pi_c + E[Y_{0i} \mid n]\, \pi_n + E[Y_{1i} \mid a]\, \pi_a
\end{aligned}
$$
Hence the difference is:
$$E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] = E[Y_{1i} - Y_{0i} \mid \text{compliers}]\, \pi_c$$
The same argument shows that the slope coefficient in the regression of $D$ on $Z$ is:
$$E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0] = \pi_c$$

42 / 64
Local average treatment effects (LATE) – (9)

Hence, the instrumental variables estimand, the ratio of these two reduced form estimands, equals the local average treatment effect:
$$\rho_{IV,LATE} = \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]} = E[Y_{1i} - Y_{0i} \mid \text{compliers}]$$
The key insight is that the data are informative only about the average effect for compliers
The data are not informative about the average effect for never-takers, because they are never observed receiving treatment
The data are also not informative about the average effect for always-takers, because they are never observed without treatment

43 / 64
Distribution of Compliance Types* (p.41)

Under random assignment and monotonicity, we can estimate the distribution of compliance types:
$$
\begin{aligned}
\pi_a &= P(D_{1i} = D_{0i} = 1) = E[D_i \mid Z_i = 0] \\
\pi_c &= P(D_{1i} = 1, D_{0i} = 0) = E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0] \\
\pi_n &= P(D_{1i} = D_{0i} = 0) = 1 - E[D_i \mid Z_i = 1]
\end{aligned}
$$
We can then consider average outcomes by instrument and treatment
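A sketch of these formulas on hypothetical data (20 made-up units): the always-taker share is the treatment rate when $Z_i = 0$, the complier share is the first stage, and the never-taker share is the non-treatment rate when $Z_i = 1$. This relies on random assignment and monotonicity.

```python
def mean(v):
    return sum(v) / len(v)


def compliance_shares(d, z):
    """Shares of always-takers, compliers and never-takers implied by E[D | Z]."""
    p1 = mean([di for di, zi in zip(d, z) if zi == 1])  # E[D | Z = 1]
    p0 = mean([di for di, zi in zip(d, z) if zi == 0])  # E[D | Z = 0]
    return {"always": p0, "complier": p1 - p0, "never": 1 - p1}


# Hypothetical data: 10 units with Z = 1 (7 treated) and 10 units with Z = 0 (2 treated)
z = [1] * 10 + [0] * 10
d = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8
print(compliance_shares(d, z))
```

The shares come out as 0.2, 0.5, and 0.3, up to floating-point rounding.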

44 / 64
Average Outcomes by Instrument and Treatment

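$$
\begin{aligned}
E[Y_i \mid D_i = 0, Z_i = 1] &= E[Y_{0i} \mid n] \\
E[Y_i \mid D_i = 1, Z_i = 0] &= E[Y_{1i} \mid a] \\
E[Y_i \mid D_i = 0, Z_i = 0] &= \frac{\pi_c}{\pi_c + \pi_n} E[Y_{0i} \mid c] + \frac{\pi_n}{\pi_c + \pi_n} E[Y_{0i} \mid n] \\
E[Y_i \mid D_i = 1, Z_i = 1] &= \frac{\pi_c}{\pi_c + \pi_a} E[Y_{1i} \mid c] + \frac{\pi_a}{\pi_c + \pi_a} E[Y_{1i} \mid a]
\end{aligned}
$$
From this we can infer the average outcomes for compliers, $E[Y_{0i} \mid c]$ and $E[Y_{1i} \mid c]$, and thus the average effect for compliers, $E[Y_{1i} - Y_{0i} \mid \text{compliers}]$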
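A sketch of how the four cell means can be inverted to recover the complier means, on hypothetical simulated data with made-up type-specific outcomes ($E[Y_{0i} \mid c] = 1$ and $E[Y_{1i} \mid c] = 2$ by construction). Solving the last two cell-mean equations for the complier terms gives $E[Y_{0i} \mid c] = ((\pi_c + \pi_n) E[Y_i \mid D_i = 0, Z_i = 0] - \pi_n E[Y_{0i} \mid n]) / \pi_c$ and the analogous expression for $E[Y_{1i} \mid c]$.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cell_mean(y, d, z, dv, zv):
    """Average outcome in the cell D_i = dv, Z_i = zv."""
    return mean([yi for yi, di, zi in zip(y, d, z) if di == dv and zi == zv])


def complier_means(y, d, z):
    """Back out E[Y0 | c] and E[Y1 | c] by inverting the mixture formulas for the cell means."""
    p1 = mean([di for di, zi in zip(d, z) if zi == 1])
    p0 = mean([di for di, zi in zip(d, z) if zi == 0])
    pi_a, pi_c, pi_n = p0, p1 - p0, 1 - p1
    y0_n = cell_mean(y, d, z, 0, 1)  # E[Y | D=0, Z=1] = E[Y0 | n]
    y1_a = cell_mean(y, d, z, 1, 0)  # E[Y | D=1, Z=0] = E[Y1 | a]
    y0_c = ((pi_c + pi_n) * cell_mean(y, d, z, 0, 0) - pi_n * y0_n) / pi_c
    y1_c = ((pi_c + pi_a) * cell_mean(y, d, z, 1, 1) - pi_a * y1_a) / pi_c
    return y0_c, y1_c


def simulate(n=100_000, seed=7):
    """Hypothetical population: 20% always-takers, 50% compliers, 30% never-takers."""
    rng = random.Random(seed)
    baseline = {"a": 2.0, "c": 1.0, "n": 0.0}  # E[Y0 | type], made up
    effect = {"a": 4.0, "c": 1.0, "n": 0.0}    # Y1 - Y0 by type, made up
    y, d, z = [], [], []
    for _ in range(n):
        u = rng.random()
        kind = "a" if u < 0.2 else ("c" if u < 0.7 else "n")
        zi = int(rng.random() < 0.5)
        di = 1 if kind == "a" else (zi if kind == "c" else 0)
        y0 = baseline[kind] + rng.gauss(0, 1)
        y.append(y0 + effect[kind] * di)  # observed outcome
        d.append(di)
        z.append(zi)
    return y, d, z


y, d, z = simulate()
y0_c, y1_c = complier_means(y, d, z)
print(f"E[Y0|c] ~ {y0_c:.3f} (true 1.0), E[Y1|c] ~ {y1_c:.3f} (true 2.0)")
```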

45 / 64
Proportion of treated who are compliers

We can also tell what proportion of the treated are compliers:

$$\pi_{c \mid D_i = 1} = \frac{P(Z_i = 1)\,(E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0])}{P(D_i = 1)}$$
that is, the first stage times the probability that the instrument is switched on, divided by the proportion treated
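A short sketch of this formula on hypothetical data (20 made-up units, half with $Z_i = 1$; 7 of 10 treated when $Z_i = 1$ and 2 of 10 when $Z_i = 0$). The first stage is 0.5, $P(Z_i = 1) = 0.5$, and $P(D_i = 1) = 0.45$, so the formula attributes $0.25 / 0.45 = 5/9$ of the treated to compliers.

```python
def mean(v):
    return sum(v) / len(v)


def complier_share_of_treated(d, z):
    """P(complier | D = 1) = P(Z = 1) * first stage / P(D = 1)."""
    first_stage = (mean([di for di, zi in zip(d, z) if zi == 1])
                   - mean([di for di, zi in zip(d, z) if zi == 0]))
    return mean(z) * first_stage / mean(d)


# Hypothetical data: 7 of 10 treated when Z = 1, 2 of 10 treated when Z = 0
z = [1] * 10 + [0] * 10
d = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8
print(complier_share_of_treated(d, z))
```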

46 / 64
Example: Probability of Compliance

47 / 64
General Remarks (1)

$E[Y_{1i} - Y_{0i} \mid D_i = 1]$: the effect of treatment on the treated is a weighted average of the effects on always-takers and compliers (see Angrist and Pischke (2009, p. 159))
$E[Y_{1i} - Y_{0i} \mid D_i = 0]$: the average effect of treatment on the non-treated is a weighted average of the effects on never-takers and compliers
$E[Y_{1i} - Y_{0i}]$: the unconditional average treatment effect is a weighted average of the effects on compliers, always-takers, and never-takers
Because IV is not directly informative about effects on always-takers and never-takers, instruments do not usually capture the average causal effect on all of the treated or on all of the non-treated
Exceptions: instruments that allow no always-takers or no never-takers
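A worked version of the first claim, assuming the instrument is independent of compliance type: the treated consist of all always-takers plus the compliers who drew $Z_i = 1$, so the effect on the treated weights the always-taker effect by $\pi_a$ and the complier effect by $\pi_c P(Z_i = 1)$. The shares and effects below are hypothetical.

```python
def att_from_types(pi_a, pi_c, p_z1, effect_a, effect_c):
    """Effect of treatment on the treated as a weighted average of always-takers and compliers.

    The treated are all always-takers plus the compliers who drew Z = 1, so
    (with Z independent of type) the weights are pi_a and pi_c * P(Z = 1).
    """
    w_a = pi_a
    w_c = pi_c * p_z1
    return (w_a * effect_a + w_c * effect_c) / (w_a + w_c)


# Hypothetical numbers: 20% always-takers with effect 4, 50% compliers with effect 1
att = att_from_types(pi_a=0.2, pi_c=0.5, p_z1=0.5, effect_a=4.0, effect_c=1.0)
late = 1.0  # the IV estimand is the complier effect only
print(att, late)
```

With these made-up numbers the effect on the treated is $1.05 / 0.45 \approx 2.33$, well above the complier effect of 1 that IV recovers.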

48 / 64
General Remarks (2)

Although we cannot consistently estimate the average effect of the treatment for always-takers and never-takers, we do have some information: we can estimate $E[Y_{0i} \mid n]$ and $E[Y_{1i} \mid a]$
Is there evidence of heterogeneity in outcomes by compliance status? Compare the average $Y_{0i}$ of never-takers with that of compliers, and the average $Y_{1i}$ of always-takers with that of compliers
If these outcomes differ substantially in levels ⇒ it is less plausible that the average effect for compliers is indicative of the average effects for other compliance types
If these outcomes are similar for never-takers and compliers, and for always-takers and compliers ⇒ it is more plausible that the average treatment effects for these groups are also comparable

49 / 64
General Remarks (3)

Generalizations:
The LATE result generalizes in three important ways:
Multiple instruments (e.g., a set of quarter-of-birth dummies)
Models with covariates (e.g., controls for year of birth)
Models with variable and continuous treatment intensity (e.g., years of schooling)
In all three cases, the IV estimand is a weighted average of causal effects for instrument-specific compliers
The econometric tool remains 2SLS, and the interpretation remains fundamentally similar to the basic LATE result, with a few bells and whistles

50 / 64
IV — More Applications

51 / 64
Example: Devereux and Hart (2010)

Do students benefit from compulsory schooling?

Previous studies provide mixed evidence (high returns in the US but low returns in Europe)
Devereux and Hart find modest effects for men and no positive returns for women
Strategy: exploit a major change in compulsory schooling laws (CSL) in Britain in 1947
⇒ the minimum school-leaving age increased from 14 to 15

52 / 64
Example: Devereux and Hart (2010)

Persons born before April 1933 faced a minimum school-leaving age of 14, while persons born after April 1933 faced a minimum school-leaving age of 15
The fraction leaving school before age 15 fell from over 60% for the 1932 cohort to about 10% for the 1934 cohort

53 / 64
Example: Devereux and Hart (2010)

Aim: estimate the relationships between the law change and schooling, wages, and earnings
Estimation approach: two-stage least squares (2SLS) using the CSL change as an IV for schooling
Data: General Household Survey (GHS) for 1979-98, a national survey of people living in private households
Sample: individuals born between 1921 and 1951 (aged 28-64 in the survey)
Variables of interest
Schooling: age at which the person left school
Wages: weekly earnings and hourly earnings

54 / 64
Example: Devereux and Hart (2010)

55 / 64
Example: Devereux and Hart (2010)
Reduced form: regresses log weekly earnings ($Y$) on a quartic function of year of birth (YOB) and the LAW variable
$$\ln(Y_i) = \gamma_0 + \gamma_1 LAW_i + f(YOB_i) + e_i$$
First stage: regresses age left school (SCH) on a quartic function of year of birth (YOB) and the LAW variable
$$SCH_i = \alpha_0 + \alpha_1 LAW_i + f(YOB_i) + \epsilon_i$$
Structural equation: regresses log weekly earnings on age left school and a quartic function of year of birth
$$\ln(Y_i) = \beta_0 + \beta_1 SCH_i + f(YOB_i) + \mu_i$$

56 / 64
Example: Devereux and Hart (2010)

57 / 64
Example: Devereux and Hart (2010)

Col. 1-3 (first row): the law increased the average school-leaving age by about half a year
Col. 4-6 (first row): the law has a positive but insignificant effect on weekly earnings
Col. 7-9 (first row): the 2SLS coefficients imply that one extra year of schooling increases earnings by about 2%, but the coefficients are always statistically insignificant
Homework: check whether the 2SLS coefficients equal the reduced form coefficient divided by the first stage coefficient
Other rows are robustness checks that compare the results to Oreopoulos (2006)
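A sketch of the homework check on simulated data (not Devereux and Hart’s data). The cohort structure and all coefficients are hypothetical, and a single linear year-of-birth control stands in for the quartic $f(YOB_i)$. By the Frisch-Waugh-Lovell theorem, partialling the control out of LAW gives the reduced-form and first-stage coefficients on LAW, and with one instrument and one endogenous regressor the 2SLS coefficient equals their ratio exactly.

```python
import random


def residualize(v, w):
    """Residuals from regressing v on a constant and a single control w."""
    n = len(v)
    mv, mw = sum(v) / n, sum(w) / n
    b = (sum((a - mw) * (c - mv) for a, c in zip(w, v))
         / sum((a - mw) ** 2 for a in w))
    return [c - mv - b * (a - mw) for a, c in zip(w, v)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def simulate(n=100_000, beta=0.02, seed=8):
    """Hypothetical cohorts: LAW = 1 for birth dates from April 1933 onward."""
    rng = random.Random(seed)
    yob = [rng.uniform(1921, 1952) for _ in range(n)]  # continuous birth date
    law = [1.0 if t >= 1933.25 else 0.0 for t in yob]
    sch = [15 + 0.5 * lw + 0.02 * (t - 1936) + rng.gauss(0, 1)
           for lw, t in zip(law, yob)]
    ln_y = [5 + beta * s + 0.01 * (t - 1936) + rng.gauss(0, 0.3)
            for s, t in zip(sch, yob)]
    return ln_y, sch, law, yob


ln_y, sch, law, yob = simulate()
z_tilde = residualize(law, yob)                      # LAW net of the linear cohort trend
gamma1 = dot(ln_y, z_tilde) / dot(z_tilde, z_tilde)  # reduced-form coefficient on LAW
alpha1 = dot(sch, z_tilde) / dot(z_tilde, z_tilde)   # first-stage coefficient on LAW
beta1 = dot(ln_y, z_tilde) / dot(sch, z_tilde)       # 2SLS coefficient with the control
print(f"gamma1={gamma1:.4f} alpha1={alpha1:.3f} beta1={beta1:.4f} ratio={gamma1 / alpha1:.4f}")
```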

58 / 64
Example: Devereux and Hart (2010)

59 / 64
Example: Devereux and Hart (2010)

60 / 64
Example: Devereux and Hart (2010)

61 / 64
Example: Devereux and Hart (2010)

Summary

Evaluates how the 1947 increase in the school-leaving age in Britain, from 14 to 15, affected earnings
Modest effect of an extra year of schooling for men and no extra return for women
Indicates rather low returns to schooling for those who leave school early

62 / 64
Pros and Cons of IV (Becker 2016)

63 / 64
Homework

Read Becker’s article “Using instrumental variables to establish causality” available on Ilias
Questions
1 What conditions does a credible instrument need to satisfy?
2 Which of the conditions are easier to satisfy and why?
3 Can you think of examples that would invalidate the IV used
in Devereux and Hart (2010)?
4 What is a local average treatment effect (LATE)?
5 When can the IV estimate be interpreted as LATE?

64 / 64
