Causal Inference using Difference-in-Differences

Lecture 7: Leveraging repeated cross-sectional data

Pedro H. C. Sant’Anna
Emory University

January 2025
Introduction
DiD procedures with Covariates

■ We can incorporate covariates into DiD to allow for covariate-specific trends.


▶ Regression adjustments;

▶ Inverse probability weighting;

▶ Doubly Robust (augmented inverse probability weighting);

■ All of these are implemented in the DRDID and did R packages and in the drdid and csdid Stata packages.

■ We can use them with panel data or repeated cross-sectional data.
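
As a rough illustration of the first two approaches in a 2x2 panel setting, the sketch below computes a regression-adjustment estimate and a normalized (Hajek-type) inverse probability weighting estimate on simulated data. Everything here is an assumption made for illustration: the data-generating process, the linear outcome model, the logit propensity score fitted by a hand-rolled Newton-Raphson routine, and the function names `fit_logit`, `ra_did`, and `ipw_did`. It does not reproduce the interface of the DRDID or did packages.

```python
import numpy as np

def fit_logit(X, d, n_iter=50):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (d - p))
    return beta

def ra_did(dy, d, X):
    """Regression adjustment: impute the untreated trend for treated units."""
    gamma, *_ = np.linalg.lstsq(X[d == 0], dy[d == 0], rcond=None)
    return np.mean(dy[d == 1] - X[d == 1] @ gamma)

def ipw_did(dy, d, X):
    """Normalized IPW: reweight untreated units by the odds p(X) / (1 - p(X))."""
    p = 1.0 / (1.0 + np.exp(-X @ fit_logit(X, d)))
    odds = p * (1.0 - d) / (1.0 - p)
    return np.mean(d * dy) / np.mean(d) - np.mean(odds * dy) / np.mean(odds)

# Simulated 2x2 panel (hypothetical DGP): the untreated trend depends on x,
# and treatment take-up also depends on x. The true ATT is 2.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.5 * x))).astype(float)
y1 = x + rng.normal(size=n)
y2 = y1 + 1.0 + 0.5 * x + 2.0 * d + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
dy = y2 - y1
naive = dy[d == 1].mean() - dy[d == 0].mean()  # ignores covariate-specific trends
att_ra = ra_did(dy, d, X)
att_ipw = ipw_did(dy, d, X)
print(f"naive={naive:.3f}  RA={att_ra:.3f}  IPW={att_ipw:.3f}")
```

In this simulated design the unconditional comparison is biased upward because treated units have larger x on average and trends increase with x, while both covariate-adjusted estimates should land close to 2.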

Are there differences between these two cases?

For a given sample size, how much efficiency do we lose by not having balanced panel data?

We will focus on the 2x2 case

Let’s review our assumptions
Assumptions in 2x2 setup

Assumption (Conditional Parallel Trends Assumption)


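$$
\mathbb{E}[Y_{t=2}(\infty) \mid G = 2, X] - \mathbb{E}[Y_{t=1}(\infty) \mid G = 2, X] = \mathbb{E}[Y_{t=2}(\infty) \mid G = \infty, X] - \mathbb{E}[Y_{t=1}(\infty) \mid G = \infty, X] \quad \text{a.s.}
$$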

Assumption (No-Anticipation)
For all units $i$, $Y_{i,t}(g) = Y_{i,t}(\infty)$ for all groups in their pre-treatment periods, i.e., for all $t < g$.

Assumption (Strong Overlap Assumption)


The conditional probability of belonging to the treatment group, given observed
characteristics X, is uniformly bounded away from 1. That is, for some ϵ > 0,
P [G = 2|X] < 1 − ϵ almost surely.

Different Sampling Schemes
Panel data sampling scheme

Assumption (Panel Data Sampling Scheme)


The data $\{Y_{i,t=1}, Y_{i,t=2}, G_i, X_i\}_{i=1}^{n}$ is a random sample of the population of interest.

■ Assumption states that we observe the same units in all time periods:
No need to worry about compositional changes!

■ Assumption does not restrict dependence between realized outcomes across periods.

■ We observe covariates for all individuals.

Stationary Repeated cross-section data sampling scheme

Assumption (Stationary Repeated Cross-Section Data Sampling Scheme)


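The pooled repeated cross-section data $\{Y_i, G_i, T_i, X_i\}_{i=1}^{n}$ consist of iid draws from the mixture distribution

$$
\begin{aligned}
P(Y \le y, X \le x, G = g, T = t) ={}& 1\{t = 2\} \cdot \lambda \cdot P(Y_{t=2} \le y, X \le x, G = g \mid T = 2) \\
&+ 1\{t = 1\} \cdot (1 - \lambda) \cdot P(Y_{t=1} \le y, X \le x, G = g \mid T = 1),
\end{aligned}
$$

where $(y, x, g, t) \in \mathbb{R} \times \mathbb{R}^k \times \{2, \infty\} \times \{1, 2\}$ and $\lambda = P(T = 2) \in (0, 1)$.

Furthermore, $(G, X) \mid T = 1 \overset{d}{\sim} (G, X) \mid T = 2$.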

Stationary Repeated Cross-Section Data Sampling Schemes

■ It accommodates the binomial sampling scheme, where an observation $i$ is randomly drawn from either $(Y_{t=2}, G, X)$ or $(Y_{t=1}, G, X)$ with fixed probability $\lambda$ (here, $T$ is a non-degenerate random variable).

■ It also accommodates the “conditional” sampling scheme, where $n_{t=2}$ observations are sampled from $(Y_{t=2}, G, X)$, $n_{t=1}$ observations are sampled from $(Y_{t=1}, G, X)$, and $\lambda = n_{t=2}/(n_{t=1} + n_{t=2})$ (here, $T$ is treated as fixed).

■ However, this assumption rules out compositional changes across periods: we are sampling from the same population of interest in both periods.

■ The RCS results of Abadie (2005) and Sant’Anna and Zhao (2020) really depend on this!
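
The two sampling schemes above differ only in how the period indicator T is generated. A minimal sketch, with hypothetical function names and arbitrary sample sizes chosen purely for illustration:

```python
import numpy as np

def binomial_scheme(n, lam, rng):
    """Each observation's period is random: T = 2 with probability lam, else T = 1."""
    return np.where(rng.random(n) < lam, 2, 1)

def conditional_scheme(n_pre, n_post):
    """Period sample sizes are fixed in advance; lam is the implied post-period share."""
    t = np.concatenate([np.full(n_pre, 1), np.full(n_post, 2)])
    lam = n_post / (n_pre + n_post)
    return t, lam

rng = np.random.default_rng(0)
t_binom = binomial_scheme(1000, 0.5, rng)      # realized share of T = 2 varies by draw
t_cond, lam_cond = conditional_scheme(400, 600)  # share of T = 2 is exactly 0.6
print((t_binom == 2).mean(), lam_cond)
```

Under either scheme, stationarity additionally requires that the distribution of (G, X) be the same in both periods, which is a property of the population being sampled rather than of how T is drawn.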

What if I want to allow for compositional changes?

Repeated cross-section data sampling scheme

Assumption (Repeated Cross-Section Data Sampling Scheme)


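The pooled data $\{Y_i, G_i, T_i, X_i\}_{i=1}^{n}$ consist of iid draws from

$$
\begin{aligned}
P(Y \le y, X \le x, G = g, T = t) ={}& 1\{t = 2\} \cdot \lambda \cdot P(Y_{t=2} \le y, X \le x, G = g \mid T = 2) \\
&+ 1\{t = 1\} \cdot (1 - \lambda) \cdot P(Y_{t=1} \le y, X \le x, G = g \mid T = 1),
\end{aligned}
$$

where $(y, x, g, t) \in \mathbb{R} \times \mathbb{R}^k \times \{2, \infty\} \times \{1, 2\}$ and $\lambda = P(T = 2) \in (0, 1)$.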

■ Not many results are available for this case in the literature.

■ In Sant’Anna and Xu (2023), we have worked out the details on how to allow
compositional changes while doing DiD.

▶ Derive the semiparametric efficiency bound for this case;

▶ Propose nonparametric, data-driven estimators that achieve the semiparametric efficiency bound;
▶ Propose DML estimators that can leverage modern ML methods;

▶ Propose Hausman-type tests for compositional changes.

▶ Derive the semiparametric efficiency bound for cases where part of the data is a
balanced panel and another part is repeated cross-sectional (like in CPS).
■ We won’t have time to dig into these details today, as these results are very recent—I
may update these slides in the future.
Difference 1:
Most DiD estimators with RCS data impose a no-compositional
changes assumption.
This is not the case when panel data is available.

Doubly Robust estimators
Doubly Robust DiD procedure with Panel

■ Sant’Anna and Zhao (2020) considered the following doubly robust estimand when panel data are available:

$$
ATT^{dr,p} = \mathbb{E}\left[\left(\frac{D}{\mathbb{E}[D]} - \frac{\dfrac{p(X)(1-D)}{1-p(X)}}{\mathbb{E}\left[\dfrac{p(X)(1-D)}{1-p(X)}\right]}\right)\left(\Delta Y - m_{\Delta}^{G=\infty}(X)\right)\right],
$$

where $D = 1\{G = 2\}$.

■ Note that we only need to model the evolution of Y given X for the untreated units.

■ No need to model the behavior of the treated units! Pretty neat!

■ Estimator can achieve the semiparametric efficiency bound (even without modelling the outcome evolution among treated units).
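
A plug-in version of the panel DR estimand can be sketched in a few lines. The example below is only an illustration under stated assumptions: simulated data with a true ATT of 2, a logit working model for p(X) fitted by a hand-rolled Newton-Raphson routine, a linear working model for the outcome evolution among untreated units, and hypothetical function names (`fit_logit`, `dr_did_panel`). It is not the DRDID package implementation.

```python
import numpy as np

def fit_logit(X, d, n_iter=50):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (d - p))
    return beta

def dr_did_panel(y1, y2, d, x):
    """Sample analogue of the panel DR DiD estimand with parametric working models."""
    n = len(d)
    X = np.column_stack([np.ones(n), x])
    dy = y2 - y1
    # Propensity score p(X) = P(G = 2 | X).
    p = 1.0 / (1.0 + np.exp(-X @ fit_logit(X, d)))
    # Outcome-evolution model fitted on untreated units only.
    gamma, *_ = np.linalg.lstsq(X[d == 0], dy[d == 0], rcond=None)
    m_delta = X @ gamma
    # Treated weights D / E[D] minus normalized odds weights for untreated units.
    w_treated = d / d.mean()
    odds = p * (1.0 - d) / (1.0 - p)
    w_untreated = odds / odds.mean()
    return np.mean((w_treated - w_untreated) * (dy - m_delta))

# Hypothetical DGP: covariate-specific trends and covariate-dependent take-up.
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.5 * x))).astype(float)
y1 = x + rng.normal(size=n)
y2 = y1 + 1.0 + 0.5 * x + 2.0 * d + rng.normal(size=n)

att_hat = dr_did_panel(y1, y2, d, x)
print(f"DR DiD estimate: {att_hat:.3f}")
```

Note that the only outcome model fitted is the one for untreated units; treated units enter through their observed outcome changes and the weight D / E[D].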
Doubly Robust DiD procedure with stationary repeated cross-section

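■ Sant’Anna and Zhao (2020) proposed two different DR estimands with RCS data.
■ The first one mimics the panel data one:

$$
\begin{aligned}
ATT_1^{dr,rc} ={}& \mathbb{E}\left[\left(\frac{D \cdot 1\{T=2\}}{\mathbb{E}[D \cdot 1\{T=2\}]} - \frac{D \cdot 1\{T=1\}}{\mathbb{E}[D \cdot 1\{T=1\}]}\right) \cdot \left(Y - m^{rc}_{G=\infty,t=2}(X) - m^{rc}_{G=\infty,t=1}(X)\right)\right] \\
&- \mathbb{E}\left[\left(\frac{\dfrac{p(X)(1-D) \cdot 1\{T=2\}}{1-p(X)}}{\mathbb{E}\left[\dfrac{p(X)(1-D) \cdot 1\{T=2\}}{1-p(X)}\right]} - \frac{\dfrac{p(X)(1-D) \cdot 1\{T=1\}}{1-p(X)}}{\mathbb{E}\left[\dfrac{p(X)(1-D) \cdot 1\{T=1\}}{1-p(X)}\right]}\right) \cdot \left(Y - m^{rc}_{G=\infty,t=2}(X) - m^{rc}_{G=\infty,t=1}(X)\right)\right]
\end{aligned}
$$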

■ Model the evolution of Y given X for the untreated units: need two models because we do not observe the same units over time.
However, this DR estimator is not efficient!
We can do better!

Doubly Robust DiD procedure with repeated cross-section

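Sant’Anna and Zhao’s (2020) second DR DiD estimand also relies on outcome regression models for the treated units:

$$
\begin{aligned}
ATT_2^{dr,rc} ={}& ATT_1^{dr,rc} \\
&+ \left(\mathbb{E}\left[m^{rc}_{G=2,t=2}(X) - m^{rc}_{G=\infty,t=2}(X) \mid D = 1\right] - \mathbb{E}\left[m^{rc}_{G=2,t=2}(X) - m^{rc}_{G=\infty,t=2}(X) \mid D = 1, T = 2\right]\right) \\
&- \left(\mathbb{E}\left[m^{rc}_{G=2,t=1}(X) - m^{rc}_{G=\infty,t=1}(X) \mid D = 1\right] - \mathbb{E}\left[m^{rc}_{G=2,t=1}(X) - m^{rc}_{G=\infty,t=1}(X) \mid D = 1, T = 1\right]\right).
\end{aligned}
$$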

■ Need to model the behavior of the treated units.

■ Estimator can now achieve the semiparametric efficiency bound!


Doubly Robust DiD procedure with repeated cross-section

■ Both DR DiD estimators for RCS data are consistent for the ATT under the exact same conditions:

■ Even if the regression model for the outcome evolution among the treated group is misspecified, $ATT_2^{dr,rc}$ is consistent for the ATT (provided that either the propensity score or the regression models for the outcome evolution among untreated units are correctly specified).

■ However, in general, $ATT_2^{dr,rc}$ is more efficient than $ATT_1^{dr,rc}$.

■ In fact, Sant’Anna and Zhao (2020) show that $ATT_2^{dr,rc}$ is (locally) semiparametrically efficient.

Difference 2:
We need to model the outcome evolution among treated units in RCS if
we want to achieve the semiparametric efficiency bound!

Comparing Semiparametric efficiency bounds
Semiparametric efficiency: Panel vs. Repeated cross-section data

Corollary
Assume that T is independent of (Y1 , Y0 , D, X), and the other regularity conditions stated
in Sant’Anna and Zhao (2020) hold. Then,

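$$
\begin{aligned}
V_{eff}^{rc,2} - V_{eff}^{p,2} = \frac{1}{\mathbb{E}[D]^2}\, \mathbb{E}\Bigg[ & D \left( \sqrt{\frac{1-\lambda}{\lambda}} \left(Y_{t=2} - m_{G=2,t=2}(X)\right) + \sqrt{\frac{\lambda}{1-\lambda}} \left(Y_{t=1} - m_{G=2,t=1}(X)\right) \right)^2 \\
& + \frac{(1-D)\, p(X)^2}{(1-p(X))^2} \left( \sqrt{\frac{1-\lambda}{\lambda}} \left(Y_{t=2} - m_{G=\infty,t=2}(X)\right) + \sqrt{\frac{\lambda}{1-\lambda}} \left(Y_{t=1} - m_{G=\infty,t=1}(X)\right) \right)^2 \Bigg] \ge 0,
\end{aligned}
$$

where $\lambda = P(T = 2)$.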

Semiparametric efficiency: Panel vs. Repeated cross-section data

■ Efficiency loss is convex in λ: loss of efficiency is bigger when the pre- and post-treatment sample sizes are more imbalanced.

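■ Optimal λ depends on the data: $\lambda = \tilde{\sigma}_2 / (\tilde{\sigma}_1 + \tilde{\sigma}_2)$, where, for $t = 1, 2$,

$$
\tilde{\sigma}_t^2 = \mathbb{E}\left[ D \left(Y_t - m_{G=2,t}(X)\right)^2 + \frac{(1-D)\, p(X)^2}{(1-p(X))^2} \left(Y_t - m_{G=\infty,t}(X)\right)^2 \right].
$$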
■ In principle, one may benefit from “oversampling” from either the pre or
post-treatment period.

■ However, it is, in general, not feasible to know the optimal λ during the design stage: $\tilde{\sigma}_2^2$ depends on post-treatment data!
■ λ = 0.5 is a “reasonable” choice, in practice.
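
To see where the optimal λ comes from, expand the squares in the efficiency gap: the part that depends on λ is, up to the factor 1/E[D]² and terms that do not involve λ, equal to σ̃₂²/λ + σ̃₁²/(1 − λ). The sketch below checks numerically that λ = σ̃₂/(σ̃₁ + σ̃₂) minimizes this expression. The values of σ̃₁ and σ̃₂ are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def lambda_dependent_term(lam, sigma1, sigma2):
    """Lambda-dependent part of the RCS efficiency loss (up to scale and constants)."""
    return sigma2**2 / lam + sigma1**2 / (1.0 - lam)

def optimal_lambda(sigma1, sigma2):
    """Closed-form minimizer: share of observations to draw from the post period."""
    return sigma2 / (sigma1 + sigma2)

sigma1, sigma2 = 1.0, 3.0              # hypothetical pre- and post-period values
lam_star = optimal_lambda(sigma1, sigma2)

# Brute-force check on a fine grid over (0, 1).
grid = np.linspace(0.01, 0.99, 9801)
lam_grid = grid[np.argmin(lambda_dependent_term(grid, sigma1, sigma2))]

loss_at_star = lambda_dependent_term(lam_star, sigma1, sigma2)
loss_at_half = lambda_dependent_term(0.5, sigma1, sigma2)
print(lam_star, lam_grid, loss_at_star, loss_at_half)
```

With these made-up values the post period is noisier, so the minimizer oversamples it (λ = 0.75), and the balanced design λ = 0.5 gives a larger value of the λ-dependent term (20 versus 16).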
Difference 3:
The best RCS DiD estimator is always less efficient than the best
Panel DiD estimator for a given sample size.

Take-away messages

■ When dealing with RCS data, we must consider compositional changes.

▶ The did R package and the csdid Stata package assume stationarity.

▶ Check Sant’Anna and Xu (2023) for how to make your model more flexible and test for
compositional changes.

■ With RCS, there are important benefits of modeling the outcome evolution of treated
units when doing DiD.

■ Overall, for a given sample size, RCS is less efficient than Panel data.

Loss of efficiency is bigger when the pre and post-treatment sample sizes are more
imbalanced.

■ See Sant’Anna and Zhao (2020) and Sant’Anna and Xu (2023) for more details!

References
Abadie, Alberto, “Semiparametric Difference-in-Differences Estimators,” The Review of
Economic Studies, 2005, 72 (1), 1–19.
Sant’Anna, Pedro H. C. and Qi Xu, “Difference-in-Differences with Compositional Changes,”
arXiv:2304.13925, 2023.
Sant’Anna, Pedro H. C. and Jun Zhao, “Doubly robust difference-in-differences estimators,”
Journal of Econometrics, November 2020, 219 (1), 101–122.
