
Causal Inference using Difference-in-Differences

Lecture 7: Leveraging repeated cross-sectional data

Pedro H. C. Sant’Anna
Emory University

January 2025
Introduction
DiD procedures with Covariates

■ We can include covariates into DiD to allow for covariate-specific trends.

▶ Regression adjustments;

▶ Inverse probability weighting;

▶ Doubly Robust (augmented inverse probability weighting);

■ All of these are implemented in the DRDID and did R packages, and in the drdid and csdid Stata packages.

■ We can use them with panel data or repeated cross-sectional data.
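To make the first two approaches concrete, here is a minimal Python sketch on simulated 2x2 panel data. Everything in it is an assumption made for illustration: the data-generating process, the binary covariate `x`, and the function names `simulate_panel`, `naive_did`, `ra_did` and `ipw_did`. Nuisance functions are estimated with simple cell means, which is not necessarily what the DRDID or did packages do.

```python
import random

def simulate_panel(n, seed=0):
    """Hypothetical 2x2 panel with a binary covariate x; the true ATT is 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        d = 1 if rng.random() < 0.3 + 0.4 * x else 0    # treatment depends on x
        u = rng.gauss(0, 1)                              # unit effect, differenced out
        y1 = x + u + rng.gauss(0, 1)
        y2 = x + u + 1 + 2 * x + 1.0 * d + rng.gauss(0, 1)  # trend depends on x
        rows.append((y1, y2, d, x))
    return rows

def mean(values):
    return sum(values) / len(values)

def naive_did(rows):
    """Unconditional DiD: ignores the covariate-specific trend."""
    treated = mean([y2 - y1 for y1, y2, d, x in rows if d == 1])
    control = mean([y2 - y1 for y1, y2, d, x in rows if d == 0])
    return treated - control

def ra_did(rows):
    """Regression adjustment: subtract the untreated trend m(x) from treated units."""
    m = {c: mean([y2 - y1 for y1, y2, d, x in rows if d == 0 and x == c])
         for c in (0, 1)}
    return mean([y2 - y1 - m[x] for y1, y2, d, x in rows if d == 1])

def ipw_did(rows):
    """Normalized IPW: reweight untreated units by the odds p(x) / (1 - p(x))."""
    p = {c: mean([d for y1, y2, d, x in rows if x == c]) for c in (0, 1)}
    treated = mean([y2 - y1 for y1, y2, d, x in rows if d == 1])
    num = sum(p[x] / (1 - p[x]) * (y2 - y1) for y1, y2, d, x in rows if d == 0)
    den = sum(p[x] / (1 - p[x]) for y1, y2, d, x in rows if d == 0)
    return treated - num / den

rows = simulate_panel(20000, seed=1)
print("naive DiD:", round(naive_did(rows), 3))
print("regression adjustment:", round(ra_did(rows), 3))
print("IPW:", round(ipw_did(rows), 3))
```

In this simulated design the naive DiD is biased upward because treated units are more likely to have `x = 1` and the untreated trend is steeper for those units, while both covariate-adjusted estimates should land close to the true ATT of 1.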

Are there differences between these two cases?

For a given sample size, how much efficiency do we lose by not having balanced panel data?

We will focus on the 2x2 case

Let’s review our assumptions
Assumptions in 2x2 setup

Assumption (Conditional Parallel Trends Assumption)

$$E[Y_{t=2}(\infty) \mid G = 2, X] - E[Y_{t=1}(\infty) \mid G = 2, X] = E[Y_{t=2}(\infty) \mid G = \infty, X] - E[Y_{t=1}(\infty) \mid G = \infty, X] \quad \text{a.s.}$$

Assumption (No-Anticipation)

For all units $i$, $Y_{i,t}(g) = Y_{i,t}(\infty)$ for all groups in their pre-treatment periods, i.e., for all $t < g$.

Assumption (Strong Overlap Assumption)

The conditional probability of belonging to the treatment group, given observed
characteristics X, is uniformly bounded away from 1. That is, for some ϵ > 0,
P [G = 2|X] < 1 − ϵ almost surely.
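As a reminder of how these three assumptions work together, the following worked derivation sketches the standard identification argument for the ATT in the 2x2 case. The notation $ATT$ and the step-by-step layout are added here for illustration; the argument itself only uses the assumptions stated above.

```latex
\begin{align*}
ATT &\equiv E[Y_{t=2}(2) - Y_{t=2}(\infty) \mid G = 2] \\
% Conditional parallel trends, applied to the unobserved counterfactual:
E[Y_{t=2}(\infty) \mid G = 2, X]
  &= E[Y_{t=1}(\infty) \mid G = 2, X]
   + E[Y_{t=2}(\infty) - Y_{t=1}(\infty) \mid G = \infty, X] \\
% No anticipation: Y_{t=1}(\infty) = Y_{t=1} for the treated group;
% untreated units always reveal Y_t(\infty):
  &= E[Y_{t=1} \mid G = 2, X] + E[Y_{t=2} - Y_{t=1} \mid G = \infty, X] \\
% Plug in and average over the covariates of the treated group:
ATT &= E\Big[\, E[Y_{t=2} - Y_{t=1} \mid G = 2, X]
        - E[Y_{t=2} - Y_{t=1} \mid G = \infty, X] \,\Big|\, G = 2 \Big]
\end{align*}
```

Strong overlap is what makes the inner conditional expectation for the untreated group well defined at the covariate values observed among treated units.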

Different Sampling Schemes
Panel data sampling scheme

Assumption (Panel Data Sampling Scheme)

The data $\{Y_{i,t=1}, Y_{i,t=2}, G_i, X_i\}_{i=1}^{n}$ is a random sample of the population of interest.

■ The assumption states that we observe the same units in all time periods: no need to worry about compositional changes!

■ The assumption does not restrict dependence between realized outcomes across periods.

■ We observe covariates for all individuals.

Stationary Repeated cross-section data sampling scheme

Assumption (Stationary Repeated Cross-Section Data Sampling Scheme)

The pooled repeated cross-section data $\{Y_i, G_i, T_i, X_i\}_{i=1}^{n}$ consist of iid draws from the mixture distribution

$$P(Y \le y, X \le x, G = g, T = t) = 1\{t = 2\} \cdot \lambda \cdot P(Y_{t=2} \le y, X \le x, G = g \mid T = 2) + 1\{t = 1\} \cdot (1 - \lambda) \cdot P(Y_{t=1} \le y, X \le x, G = g \mid T = 1),$$

where $(y, x, g, t) \in \mathbb{R} \times \mathbb{R}^k \times \{2, \infty\} \times \{1, 2\}$ and $\lambda = P(T = 2) \in (0, 1)$.

Furthermore, $(G, X) \mid T = 1 \overset{d}{\sim} (G, X) \mid T = 2$.

Stationary Repeated Cross-Section Data Sampling Schemes

■ It accommodates the binomial sampling scheme, where an observation $i$ is randomly drawn from either $(Y_{t=2}, G, X)$ or $(Y_{t=1}, G, X)$ with fixed probability $\lambda$ (here, $T$ is a non-degenerate random variable).

■ It also accommodates the “conditional” sampling scheme, where $n_{t=2}$ observations are sampled from $(Y_{t=2}, G, X)$, $n_{t=1}$ observations are sampled from $(Y_{t=1}, G, X)$, and $\lambda = n_{t=2}/(n_{t=1} + n_{t=2})$ (here, $T$ is treated as fixed).

■ However, this assumption rules out compositional changes across periods: we are sampling from the same population of interest in both periods.

■ The RCS results of Abadie (2005) and Sant’Anna and Zhao (2020) really depend on this!
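The binomial sampling scheme, and what a violation of stationarity looks like, can be sketched with a small simulation. The data-generating process, the `shift` parameter, and the function names `draw_rcs` and `max_cell_gap` below are hypothetical and chosen only for illustration. The comparison of cell frequencies is a purely descriptive check, not the Hausman-type test of Sant’Anna and Xu (2023).

```python
import random
from collections import Counter

def draw_rcs(n, lam, shift=0.0, seed=0):
    """Pool n iid draws of (Y, G, T, X); T = 2 with probability lam.

    shift > 0 raises P(X = 1) in period 2 only, i.e. a compositional change.
    shift = 0 corresponds to the stationary sampling scheme.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        t = 2 if rng.random() < lam else 1
        px = 0.5 + (shift if t == 2 else 0.0)
        x = 1 if rng.random() < px else 0
        g = 2 if rng.random() < 0.3 + 0.4 * x else float("inf")
        # Only the outcome of the sampled period is observed.
        y = x + rng.gauss(0, 1) + (1 + 2 * x + (g == 2)) * (t == 2)
        sample.append((y, g, t, x))
    return sample

def max_cell_gap(sample):
    """Largest gap between periods in the empirical frequencies of (G, X)."""
    freq = {}
    for period in (1, 2):
        cells = [(g, x) for _, g, t, x in sample if t == period]
        counts = Counter(cells)
        freq[period] = {c: k / len(cells) for c, k in counts.items()}
    all_cells = set(freq[1]) | set(freq[2])
    return max(abs(freq[1].get(c, 0.0) - freq[2].get(c, 0.0)) for c in all_cells)

stationary = draw_rcs(20000, lam=0.5, shift=0.0, seed=3)
shifted = draw_rcs(20000, lam=0.5, shift=0.3, seed=3)
print("max gap, stationary:", round(max_cell_gap(stationary), 3))
print("max gap, compositional change:", round(max_cell_gap(shifted), 3))
```

Under stationarity the joint frequencies of (G, X) in the two periods differ only by sampling noise; with the shifted covariate distribution the gap is large, which is exactly the situation the stationarity condition rules out.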

What if I want to allow for compositional changes?

Repeated cross-section data sampling scheme

Assumption (Repeated Cross-Section Data Sampling Scheme)

The pooled data $\{Y_i, G_i, T_i, X_i\}_{i=1}^{n}$ consist of iid draws from

$$P(Y \le y, X \le x, G = g, T = t) = 1\{t = 2\} \cdot \lambda \cdot P(Y_{t=2} \le y, X \le x, G = g \mid T = 2) + 1\{t = 1\} \cdot (1 - \lambda) \cdot P(Y_{t=1} \le y, X \le x, G = g \mid T = 1),$$

where $(y, x, g, t) \in \mathbb{R} \times \mathbb{R}^k \times \{2, \infty\} \times \{1, 2\}$ and $\lambda = P(T = 2) \in (0, 1)$.

■ Not many results are available for this case in the literature.

■ In Sant’Anna and Xu (2023), we have worked out the details on how to allow
compositional changes while doing DiD.

▶ Derive the semiparametric efficiency bound for this case;

▶ Propose nonparametric, data-driven estimators that achieve the semiparametric efficiency bound;

▶ Propose DML estimators that can leverage modern ML methods;

▶ Propose Hausman-type tests for compositional changes;

▶ Derive the semiparametric efficiency bound for cases where part of the data is a balanced panel and another part is repeated cross-sectional (as in the CPS).

■ We won’t have time to dig into these details today, as these results are very recent; I may update these slides in the future.
Difference 1:
Most DiD estimators with RCS data impose a no-compositional-changes assumption.
This assumption is not needed when panel data are available.

Doubly Robust estimators
Doubly Robust DiD procedure with Panel

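■ Sant’Anna and Zhao (2020) considered the following doubly robust estimand when panel data are available:

$$ATT^{dr,p} = E\left[\left(\frac{D}{E[D]} - \frac{\dfrac{p(X)(1 - D)}{1 - p(X)}}{E\left[\dfrac{p(X)(1 - D)}{1 - p(X)}\right]}\right)\left(\Delta Y - m_{\Delta}^{G=\infty}(X)\right)\right],$$

where $D = 1\{G = 2\}$, $\Delta Y = Y_{t=2} - Y_{t=1}$, $p(X)$ is a model for the propensity score $P(D = 1 \mid X)$, and $m_{\Delta}^{G=\infty}(X)$ is a model for $E[\Delta Y \mid G = \infty, X]$.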

■ Note that we only need to model the evolution of Y given X for the untreated units.

■ No need to model the behavior of the treated units! Pretty neat!

■ The estimator can achieve the semiparametric efficiency bound (even without modeling the outcome evolution among treated units).
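The doubly robust property of the panel estimand can be illustrated with a short simulation. The sketch below is only illustrative: the data-generating process, the binary covariate `x`, and the names `simulate_panel` and `dr_did_panel` are assumptions, and the nuisance functions are plugged in as dictionaries of cell means so that either one can be deliberately misspecified.

```python
import random

def simulate_panel(n, seed=0):
    """Hypothetical 2x2 panel with a binary covariate x; the true ATT is 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        d = 1 if rng.random() < 0.3 + 0.4 * x else 0
        u = rng.gauss(0, 1)
        y1 = x + u + rng.gauss(0, 1)
        y2 = x + u + 1 + 2 * x + 1.0 * d + rng.gauss(0, 1)
        rows.append((y1, y2, d, x))
    return rows

def mean(values):
    return sum(values) / len(values)

def dr_did_panel(rows, p, m):
    """Sample analogue of E[(D/E[D] - w0/E[w0]) (dY - m(X))],
    with w0 = p(X)(1 - D)/(1 - p(X)); p and m map covariate cells to values."""
    n = len(rows)
    mean_d = sum(d for _, _, d, _ in rows) / n
    mean_w0 = sum(p[x] * (1 - d) / (1 - p[x]) for _, _, d, x in rows) / n
    total = 0.0
    for y1, y2, d, x in rows:
        w0 = p[x] * (1 - d) / (1 - p[x])
        total += (d / mean_d - w0 / mean_w0) * ((y2 - y1) - m[x])
    return total / n

rows = simulate_panel(20000, seed=2)
cells = (0, 1)
# Correctly specified (saturated) nuisance models.
p_hat = {c: mean([d for _, _, d, x in rows if x == c]) for c in cells}
m_hat = {c: mean([y2 - y1 for y1, y2, d, x in rows if d == 0 and x == c])
         for c in cells}
# Deliberately misspecified models that ignore x.
p_bad = {c: mean([d for _, _, d, _ in rows]) for c in cells}
m_bad = {c: mean([y2 - y1 for y1, y2, d, _ in rows if d == 0]) for c in cells}

print("both correct:   ", round(dr_did_panel(rows, p_hat, m_hat), 3))
print("pscore wrong:   ", round(dr_did_panel(rows, p_bad, m_hat), 3))
print("outcome wrong:  ", round(dr_did_panel(rows, p_hat, m_bad), 3))
print("both wrong:     ", round(dr_did_panel(rows, p_bad, m_bad), 3))
```

In this design the estimate should stay close to the true ATT of 1 as long as at least one of the two working models is correct, and it collapses to the biased unconditional DiD only when both ignore the covariate. Note that no model for the treated units appears anywhere in the function.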
Doubly Robust DiD procedure with stationary repeated cross-section

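■ Sant’Anna and Zhao (2020) propose two different DR estimands with RCS data.
■ The first one mimics the panel data one:

$$ATT_1^{dr,rc} = E\left[\left(\frac{D \cdot 1\{T = 2\}}{E[D \cdot 1\{T = 2\}]} - \frac{D \cdot 1\{T = 1\}}{E[D \cdot 1\{T = 1\}]}\right)\left(Y - m^{rc}_{G=\infty,T}(X)\right)\right] - E\left[\left(\frac{\dfrac{p(X)(1-D) \cdot 1\{T=2\}}{1-p(X)}}{E\left[\dfrac{p(X)(1-D) \cdot 1\{T=2\}}{1-p(X)}\right]} - \frac{\dfrac{p(X)(1-D) \cdot 1\{T=1\}}{1-p(X)}}{E\left[\dfrac{p(X)(1-D) \cdot 1\{T=1\}}{1-p(X)}\right]}\right)\left(Y - m^{rc}_{G=\infty,T}(X)\right)\right],$$

where $m^{rc}_{G=\infty,T}(X) = 1\{T = 2\}\, m^{rc}_{G=\infty,t=2}(X) + 1\{T = 1\}\, m^{rc}_{G=\infty,t=1}(X)$, so each observation is compared with the untreated outcome model for its own period.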

■ We model the evolution of Y given X for the untreated units: we need two models because we do not observe the same units over time.
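A minimal sketch of this first RCS estimand, under stated assumptions: the simulated stationary RCS design, the binary covariate `x`, and the names `simulate_rcs` and `dr_did_rc1` are hypothetical, and both the propensity score and the two untreated outcome models (one per period) are estimated by cell means. Each observation is residualized with the untreated model for its own period.

```python
import random

def simulate_rcs(n, lam=0.5, seed=0):
    """Hypothetical stationary RCS draws (y, d, post, x); the true ATT is 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        post = 1 if rng.random() < lam else 0           # post = 1 means T = 2
        x = 1 if rng.random() < 0.5 else 0
        d = 1 if rng.random() < 0.3 + 0.4 * x else 0
        y = x + rng.gauss(0, 1) + post * (1 + 2 * x + 1.0 * d)
        rows.append((y, d, post, x))
    return rows

def mean(values):
    return sum(values) / len(values)

def dr_did_rc1(rows):
    """First DR DiD estimand for stationary RCS data with cell-mean nuisances."""
    n = len(rows)
    cells = sorted({x for _, _, _, x in rows})
    # Propensity score p(x), pooled over periods (valid under stationarity).
    p = {c: mean([d for _, d, _, x in rows if x == c]) for c in cells}
    # Two untreated outcome models: m[(x, period)].
    m = {(c, s): mean([y for y, d, post, x in rows
                       if d == 0 and post == s and x == c])
         for c in cells for s in (0, 1)}
    att = 0.0
    for s, sign in ((1, 1.0), (0, -1.0)):               # post minus pre
        w1 = [d * (post == s) for _, d, post, _ in rows]
        w0 = [p[x] * (1 - d) / (1 - p[x]) * (post == s)
              for _, d, post, x in rows]
        mean_w1, mean_w0 = sum(w1) / n, sum(w0) / n
        att += sign * sum(
            (a / mean_w1 - b / mean_w0) * (y - m[(x, post)])
            for a, b, (y, _, post, x) in zip(w1, w0, rows)
        ) / n
    return att

rows = simulate_rcs(40000, seed=5)
print("DR DiD (RCS, first estimand):", round(dr_did_rc1(rows), 3))
```

The estimate should be close to the true ATT of 1 in this design, even though no unit is observed twice.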
However, this DR estimator is not efficient!
We can do better!

Doubly Robust DiD procedure with repeated cross-section

Sant’Anna and Zhao’s (2020) second DR DiD estimand also relies on outcome regression models for the treated units:

$$ATT_2^{dr,rc} = ATT_1^{dr,rc} + \Big(E\big[m^{rc}_{G=2,t=2}(X) - m^{rc}_{G=\infty,t=2}(X) \mid D = 1\big] - E\big[m^{rc}_{G=2,t=2}(X) - m^{rc}_{G=\infty,t=2}(X) \mid D = 1, T = 2\big]\Big) - \Big(E\big[m^{rc}_{G=2,t=1}(X) - m^{rc}_{G=\infty,t=1}(X) \mid D = 1\big] - E\big[m^{rc}_{G=2,t=1}(X) - m^{rc}_{G=\infty,t=1}(X) \mid D = 1, T = 1\big]\Big).$$

■ Need to model the behavior of the treated units.

■ Estimator can now achieve the semiparametric efficiency bound!
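The second estimand can be sketched as the first one plus the two correction terms built from the treated-group outcome models. As before, this is only an illustration under assumptions: the simulated design (with a treatment effect of 1 + x, so the ATT is 1.7), the binary covariate, and the names `simulate_rcs` and `dr_did_rc` are hypothetical, and all nuisance functions are cell means.

```python
import random

def simulate_rcs(n, lam=0.5, seed=0):
    """Hypothetical stationary RCS draws (y, d, post, x); effect is 1 + x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        post = 1 if rng.random() < lam else 0           # post = 1 means T = 2
        x = 1 if rng.random() < 0.5 else 0
        d = 1 if rng.random() < 0.3 + 0.4 * x else 0    # P(X = 1 | D = 1) = 0.7
        y = x + rng.gauss(0, 1) + post * (1 + 2 * x + (1 + x) * d)
        rows.append((y, d, post, x))
    return rows

def mean(values):
    return sum(values) / len(values)

def dr_did_rc(rows):
    """Return (ATT1, ATT2): the first DR estimand and the version that adds
    corrections based on outcome models for the treated group."""
    n = len(rows)
    cells = sorted({x for _, _, _, x in rows})
    p = {c: mean([d for _, d, _, x in rows if x == c]) for c in cells}
    # m[(g, s, c)]: mean outcome for group g (1 treated, 0 untreated),
    # period s (1 post, 0 pre) and covariate cell c.
    m = {(g, s, c): mean([y for y, d, post, x in rows
                          if d == g and post == s and x == c])
         for g in (0, 1) for s in (0, 1) for c in cells}
    att1, correction = 0.0, 0.0
    for s, sign in ((1, 1.0), (0, -1.0)):
        w1 = [d * (post == s) for _, d, post, _ in rows]
        w0 = [p[x] * (1 - d) / (1 - p[x]) * (post == s)
              for _, d, post, x in rows]
        mean_w1, mean_w0 = sum(w1) / n, sum(w0) / n
        att1 += sign * sum(
            (a / mean_w1 - b / mean_w0) * (y - m[(0, post, x)])
            for a, b, (y, _, post, x) in zip(w1, w0, rows)
        ) / n
        # E[m_treated - m_untreated | D = 1] minus the same mean within period s.
        gap_all = mean([m[(1, s, x)] - m[(0, s, x)]
                        for _, d, _, x in rows if d == 1])
        gap_s = mean([m[(1, s, x)] - m[(0, s, x)]
                      for _, d, post, x in rows if d == 1 and post == s])
        correction += sign * (gap_all - gap_s)
    return att1, att1 + correction

att1, att2 = dr_did_rc(simulate_rcs(40000, seed=4))
print("ATT1:", round(att1, 3), "ATT2:", round(att2, 3))
```

Both numbers should be close to 1.7 here. The correction averages the modeled treated-minus-untreated gap over all treated units rather than only those sampled in a given period, which is where the efficiency gain comes from; a single simulated draw is of course not evidence of that gain.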

Doubly Robust DiD procedure with repeated cross-section

■ Both DR DiD estimators for RCS data are consistent for the ATT under the exact same conditions.

■ Even if the regression model for the outcome evolution of the treated group is misspecified, $ATT_2^{dr,rc}$ is consistent for the ATT (provided that either the propensity score model or the regression models for the outcome evolution among untreated units are correctly specified).

■ However, in general, $ATT_2^{dr,rc}$ is more efficient than $ATT_1^{dr,rc}$.

■ In fact, Sant’Anna and Zhao (2020) show that $ATT_2^{dr,rc}$ is (locally) semiparametrically efficient.

Difference 2:
We need to model the outcome evolution among treated units in RCS if we want to achieve the semiparametric efficiency bound!

Comparing Semiparametric efficiency bounds
Semiparametric efficiency: Panel vs. Repeated cross-section data

Corollary
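Assume that $T$ is independent of $(Y_{t=2}, Y_{t=1}, D, X)$, and that the other regularity conditions stated in Sant’Anna and Zhao (2020) hold. Then,

$$V_{eff}^{rc,2} - V_{eff}^{p,2} = \frac{1}{E[D]^2}\, E\Bigg[ D \left(\sqrt{\frac{1-\lambda}{\lambda}}\,\big(Y_{t=2} - m_{G=2,t=2}(X)\big) + \sqrt{\frac{\lambda}{1-\lambda}}\,\big(Y_{t=1} - m_{G=2,t=1}(X)\big)\right)^2 + \frac{(1 - D)\, p(X)^2}{(1 - p(X))^2} \left(\sqrt{\frac{1-\lambda}{\lambda}}\,\big(Y_{t=2} - m_{G=\infty,t=2}(X)\big) + \sqrt{\frac{\lambda}{1-\lambda}}\,\big(Y_{t=1} - m_{G=\infty,t=1}(X)\big)\right)^2 \Bigg] \ge 0,$$

where $\lambda = P(T = 2)$.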

Semiparametric efficiency: Panel vs. Repeated cross-section data

■ Efficiency loss is convex in λ: the loss of efficiency is bigger when the pre- and post-treatment sample sizes are more imbalanced.

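■ Optimal λ depends on the data: $\lambda = \tilde{\sigma}_2 / (\tilde{\sigma}_1 + \tilde{\sigma}_2)$, where, for $t = 1, 2$,

$$\tilde{\sigma}_t^2 = E\left[ D\, \big(Y_t - m_{G=2,t}(X)\big)^2 + \frac{(1 - D)\, p(X)^2}{(1 - p(X))^2}\, \big(Y_t - m_{G=\infty,t}(X)\big)^2 \right].$$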
■ In principle, one may benefit from “oversampling” from either the pre or
post-treatment period.

■ However, it is, in general, not feasible to know the optimal λ during the design stage: $\tilde{\sigma}_2^2$ depends on post-treatment data!
■ λ = 0.5 is a “reasonable” choice, in practice.
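The role of λ can be checked numerically. Expanding the squares in the efficiency-loss expression, the two square-root factors multiply to one, so the cross term does not depend on λ; what remains is proportional to ((1 − λ)/λ)·σ̃₂² + (λ/(1 − λ))·σ̃₁². The sketch below evaluates only that λ-dependent part. The function names and the values of `sigma1` and `sigma2` are hypothetical and chosen purely for illustration.

```python
import math

def lambda_dependent_loss(lam, sigma1, sigma2):
    """Lambda-dependent part of the efficiency loss (cross term and the
    1 / E[D]^2 scaling dropped, since neither depends on lam)."""
    return (1 - lam) / lam * sigma2 ** 2 + lam / (1 - lam) * sigma1 ** 2

def optimal_lambda(sigma1, sigma2):
    """Closed-form minimizer: sigma2 / (sigma1 + sigma2)."""
    return sigma2 / (sigma1 + sigma2)

sigma1, sigma2 = 1.0, 2.0                       # hypothetical values
grid = [k / 1000 for k in range(1, 1000)]       # lam in (0, 1)
best_on_grid = min(grid, key=lambda lam: lambda_dependent_loss(lam, sigma1, sigma2))

print("grid minimizer:", best_on_grid)
print("closed form:   ", optimal_lambda(sigma1, sigma2))
print("loss at 0.5:   ", lambda_dependent_loss(0.5, sigma1, sigma2))
print("loss at optimum:", lambda_dependent_loss(optimal_lambda(sigma1, sigma2), sigma1, sigma2))
```

With noisier post-treatment residuals (larger σ̃₂) the minimizer moves above 0.5, i.e. toward oversampling the post-treatment period, and the loss rises steeply as λ approaches 0 or 1.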
Difference 3:
The best RCS DiD estimator is always less efficient than the best panel DiD estimator for a given sample size.

Take-away messages
Take-away message

■ When dealing with RCS data, we must consider compositional changes.

▶ The did R package and the csdid Stata package assume stationarity.

▶ Check Sant’Anna and Xu (2023) for how to make your model more flexible and test for
compositional changes.

■ With RCS, there are important benefits of modeling the outcome evolution of treated
units when doing DiD.

■ Overall, for a given sample size, RCS is less efficient than panel data.

The loss of efficiency is bigger when the pre- and post-treatment sample sizes are more imbalanced.

■ See Sant’Anna and Zhao (2020) and Sant’Anna and Xu (2023) for more details!

References
Abadie, Alberto, “Semiparametric Difference-in-Differences Estimators,” The Review of Economic Studies, 2005, 72 (1), 1–19.
Sant’Anna, Pedro H. C. and Qi Xu, “Difference-in-Differences with Compositional Changes,” arXiv:2304.13925, 2023.
Sant’Anna, Pedro H. C. and Jun Zhao, “Doubly Robust Difference-in-Differences Estimators,” Journal of Econometrics, November 2020, 219 (1), 101–122.
