Generalized Synthetic Control Method
ABSTRACT
Difference-in-differences (DID) is commonly used for causal inference in time-
series cross-sectional data. It requires the assumption that the average outcomes
of treated and control units would have followed parallel paths in the absence
of treatment. In this paper, we propose a method that not only relaxes this
often-violated assumption, but also unifies the synthetic control method (Abadie,
Diamond and Hainmueller 2010) with linear fixed effects models under a simple
framework, of which DID is a special case. It imputes counterfactuals for each
treated unit using control group information based on a linear interactive fixed ef-
fects model that incorporates unit-specific intercepts interacted with time-varying
coefficients. This method has several advantages. First, it allows the treatment
to be correlated with unobserved unit and time heterogeneities under reasonable
modelling assumptions. Second, it generalizes the synthetic control method to
the case of multiple treated units and variable treatment periods, and improves
efficiency and interpretability. Third, with a built-in cross-validation procedure,
it avoids specification searches and thus is easy to implement. An empirical ex-
ample of Election Day Registration and voter turnout in the United States is
provided.
∗ Department of Political Science, University of California, San Diego. Social Science Building 377, 9500
Gilman Drive #0521, La Jolla, CA 92093. Email: [email protected]. The author is indebted to Matt Black-
well, Devin Caughey, Justin Grimmer, Jens Hainmueller, Danny Hidalgo, Simon Jackman, Jonathan Katz,
Luke Keele, Eric Min, Molly Roberts, Jim Snyder, Brandon Stewart, Teppei Yamamoto, as well as seminar
participants at the 2015 MPSA Annual Meeting and 2015 APSA Annual Meeting for helpful comments and
suggestions. I thank the editor, Mike Alvarez, and two anonymous reviewers for their extremely helpful
suggestions. I thank Jushan Bai for generously sharing the Matlab codes used in Bai (2009) and Melanie
Springer for kindly providing the state-level voter turnout data (1920-2000). The source code and data used
in the paper can be downloaded from the Political Analysis Dataverse at dx.doi.org/10.7910/DVN/8AKACJ
(Xu 2016) as well as the author’s website. Supplementary Materials for this article are available on the
journal’s website.
Difference-in-differences (DID) is one of the most commonly used empirical designs in today’s
social sciences. The identifying assumptions for DID include the “parallel trends” assump-
tion, which states that in the absence of the treatment the average outcomes of treated and
control units would have followed parallel paths. This assumption is not directly testable,
but researchers have more confidence in its validity when they find that the average out-
comes of the treated and control units follow parallel paths in pre-treatment periods. In
many cases, however, parallel pre-treatment trends are not supported by data, a clear sign
that the “parallel trends” assumption is likely to fail in the post-treatment period as well.
This paper attempts to deal with this problem systematically. It proposes a method that es-
timates the average treatment effect on the treated using time-series cross-sectional (TSCS)
data when the “parallel trends” assumption is not likely to hold.
The presence of unobserved time-varying confounders causes the failure of this assump-
tion. There are broadly two approaches in the literature to deal with this problem. The first
one is to condition on pre-treatment observables using matching methods, which may help
balance the influence of potential time-varying confounders between treatment and control
groups. For example, Abadie (2005) proposes matching before DID estimation. Although
this method is easy to implement, it does not guarantee parallel pre-treatment trends. The
synthetic control method proposed by Abadie, Diamond and Hainmueller (2010, 2015) goes
one step further. It matches both pre-treatment covariates and outcomes between a treated
unit and a set of control units and uses pre-treatment periods as criteria for good matches.1
Specifically, it constructs a “synthetic control unit” as the counterfactual for the treated unit
by reweighting the control units. It provides explicit weights for the control units, thus mak-
ing the comparison between the treated and synthetic control units transparent. However,
it only applies to the case of one treated unit, and the uncertainty estimates it offers are not
easily interpretable.1
1. See Hsiao, Ching and Wan (2012) and Angrist, Jordà and Kuersteiner (2013) for alternative matching
methods along this line of thought.
2. FRAMEWORK
Suppose Yit is the outcome of interest of unit i at time t. Let T and C denote the sets of units
in treatment and control groups, respectively. The total number of units is N = Ntr + Nco ,
where Ntr and Nco are the numbers of treated and control units, respectively. All units are
observed for T periods (from time 1 to time T ). Let T0,i be the number of pre-treatment
periods for unit i, which is first exposed to the treatment at time (T0,i + 1) and subsequently
observed for qi = T − T0,i periods. Units in the control group are never exposed to the
treatment in the observed time span. For notational convenience, we assume that all treated
units are first exposed to the treatment at the same time, i.e., T0,i = T0 and qi = q; variable
treatment periods can be easily accommodated. First, we assume that Yit is given by a linear
factor model:

$$Y_{it} = \delta_{it} D_{it} + x_{it}'\beta + \lambda_i' f_t + \varepsilon_{it}, \qquad (1)$$
where the treatment indicator Dit equals 1 if unit i has been exposed to the treatment
prior to time t and equals 0 otherwise (i.e., Dit = 1 when i ∈ T and t > T0 and Dit = 0
otherwise).7 δit is the heterogeneous treatment effect on unit i at time t; xit is a (k × 1)
vector of observed covariates; β = [β1, · · · , βk]′ is a (k × 1) vector of unknown parameters;8
ft = [f1t, · · · , frt]′ is an (r × 1) vector of unobserved common factors; λi = [λi1, · · · , λir]′ is
an (r × 1) vector of unknown factor loadings; and εit is the idiosyncratic error term.
7. Cases in which the treatment switches on and off (or "multiple-treatment-time") can be easily incorporated
in this framework as long as we impose assumptions on how the treatment affects current and future outcomes.
For example, one can assume that the treatment only affects the current outcome but not future outcomes
(no carryover effect), as fixed effects models often do. In this paper, we do not impose such assumptions.
See Imai and Kim (2016) for a thorough discussion.
8. β is assumed to be constant across space and time mainly for the purpose of fast computation in the
frequentist framework. It is a limitation compared with more flexible and increasingly popular random
coefficient models in Bayesian multi-level analysis.
Stacking all T periods together, the model for unit i can be written in vector form as

$$Y_i = D_i \circ \delta_i + X_i\beta + F\lambda_i + \varepsilon_i, \qquad i = 1, 2, \cdots, N_{co}, N_{co}+1, \cdots, N,$$

where Yi = [Yi1, Yi2, · · · , YiT]′, Di = [Di1, Di2, · · · , DiT]′, and δi = [δi1, δi2, · · · , δiT]′ (the
symbol "◦" stands for point-wise product), as well as εi = [εi1, εi2, · · · , εiT]′, are (T × 1)
vectors; Xi = [xi1, xi2, · · · , xiT]′ is a (T × k) matrix.
9. For this reason, additive unit and time fixed effects are not explicitly assumed in the model. An extended
model that directly imposes additive two-way fixed effects is discussed in the next section.
10. In the former case, we can set f1t = t and f2t = t²; in the latter case, for example, we can rewrite
Yit = ρYi,t−1 + x′itβ + εit as Yit = Yi0 · ρᵗ + x′itβ + νit, in which νit is an AR(1) process and ρᵗ and Yi0 are
the unknown factor and factor loading, respectively. See Gobillon and Magnac (2016) for more examples.
Stacking the control units together, we have

$$Y_{co} = X_{co}\beta + F\Lambda_{co}' + \varepsilon_{co},$$

in which Yco = [Y1, Y2, · · · , YNco] and εco = [ε1, ε2, · · · , εNco] are (T × Nco) matrices; Xco
is a three-dimensional (T × Nco × p) matrix; and Λco = [λ1, λ2, · · · , λNco]′ is an (Nco × r)
matrix; hence, the products Xcoβ and FΛ′co are also (T × Nco) matrices. To identify β, F,
and Λco in Equation (1), more constraints are needed. Following Bai (2003, 2009), I add
two sets of constraints on the factors and factor loadings: (1) all factors are normalized, and
(2) they are orthogonal to each other, i.e., F′F/T = Ir and Λ′coΛco = diagonal.11 For the
moment, the number of factors r is assumed to be known. In the next section, we propose
a cross-validation procedure that automates the choice of r.
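The normalization can be made concrete with a short numerical sketch. The following Python/NumPy snippet (all names are illustrative, not from the paper) constructs, for an arbitrary pair (F, Λco), a rotation A such that the rotated factors satisfy F̃′F̃/T = Ir and the rotated loadings have a diagonal cross-product, while the common component Fλi is left unchanged (cf. footnote 11 below):

```python
import numpy as np

def normalize_factors(F, Lam):
    """Rotate (F, Lam) so that F'F/T = I_r and Lam'Lam is diagonal while
    leaving the common component F @ Lam.T unchanged (cf. footnote 11).
    F: (T, r) factors; Lam: (N, r) factor loadings."""
    T = F.shape[0]
    S = F.T @ F / T
    w, V = np.linalg.eigh(S)                      # S = V diag(w) V'
    S_half = V @ np.diag(np.sqrt(w)) @ V.T        # symmetric square root
    S_half_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    # F @ S^{-1/2} already satisfies the first constraint; a further
    # orthogonal rotation Q diagonalizes the loadings' cross-product.
    M = S_half @ Lam.T @ Lam @ S_half
    _, Q = np.linalg.eigh(M)
    A = S_half_inv @ Q                            # the (r x r) rotation matrix
    return F @ A, Lam @ S_half @ Q                # = F A and Lam (A^{-1})'

# The common component is unchanged by the rotation:
rng = np.random.default_rng(0)
F, Lam = rng.standard_normal((30, 2)), rng.standard_normal((40, 2))
F_t, Lam_t = normalize_factors(F, Lam)
assert np.allclose(F @ Lam.T, F_t @ Lam_t.T)
```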
The main quantity of interest of this paper is the average treatment effect on the treated
(ATT) at time t (when t > T0):12

$$ATT_{t,\,t>T_0} = \frac{1}{N_{tr}}\sum_{i\in\mathcal{T}}\big[Y_{it}(1) - Y_{it}(0)\big] = \frac{1}{N_{tr}}\sum_{i\in\mathcal{T}}\delta_{it}.$$
Note that in this paper, as in Abadie, Diamond and Hainmueller (2010), we treat the treat-
ment effects δit as given once the sample is drawn.13 Because Yit(1) is observed for treated
units, we only need to impute the counterfactual Yit(0) for each treated unit in post-treatment
periods.
11. These constraints do not lead to loss of generality because for an arbitrary pair of matrices F and Λco,
we can find an (r × r) invertible matrix A such that the rotated factors F̃ = FA and loadings λ̃i = A−1λi
satisfy F̃′F̃/T = Ir and Λ̃′coΛ̃co diagonal; we can then rewrite Fλi as F̃λ̃i for units in both the treatment
and control groups. The total number of constraints is r², the dimension of the matrix space to which A
belongs. It is worth noting that although the original factors F may not be identifiable, the space spanned
by F, an r-dimensional subspace of the T-dimensional space, is identified under the above constraints,
because any vector in the subspace spanned by F̃ is also in the subspace spanned by the original factors F.
12. For a clear and detailed explanation of quantities of interest in TSCS analysis, see Blackwell and Glynn
(2015). Using their terminology, this paper intends to estimate the Average Treatment History Effect on the
Treated given two specific treatment histories: $E[Y_{it}(a^1_t) - Y_{it}(a^0_t) \mid D_{i,t-1} = a^1_{t-1}]$, in which $a^0_t = (0, \cdots, 0)$
and $a^1_t = (0, \cdots, 0, 1, \cdots, 1)$, with T0 zeros and (t − T0) ones, indicate the histories of treatment statuses.
We keep the current notation for simplicity.
13. We attempt to make inference about the ATT in the sample we draw, not the ATT of the population. In
other words, we do not incorporate uncertainty of the treatment effects δit.
Assumption 2 means that the error term of any unit at any time period is independent of
treatment assignment, observed covariates, and unobserved cross-sectional and temporal
heterogeneities of all units (including itself) at all periods. We call it a strict exogen-
eity assumption, which implies conditional mean independence, i.e., E[εit |Dit , xit , λi , ft ] =
E[εit |xit , λi , ft ] = 0.15
Assumption 2 is arguably weaker than the strict exogeneity assumption required by
fixed effects models when decomposable time-varying confounders are present. These
confounders are decomposable if they can take the form of heterogeneous impacts of a common
trend or a series of common shocks. For instance, suppose a law is passed in a state because
the public opinion in that state becomes more liberal. Because changing ideologies are often
cross-sectionally correlated across states, a latent factor may be able to capture shifting
ideology at the national level; the national shifts may have a larger impact on a state that
has a tradition of mass liberalism or has a higher proportion of manufacturing workers than a
14. The idea of predicting treated counterfactuals in a DID setup is also explored by Brodersen et al. (2014)
using a structural Bayesian time-series approach.
15. Note that because εit is independent of Dis and xis for all (t, s), Assumption 2 rules out the possibility that
past outcomes may affect future treatments, which is allowed by the so-called "sequential exogeneity" as-
sumption. A directed acyclic graph (DAG) representation is provided in the Online Appendix. See Blackwell
and Glynn (2015) and Imai and Kim (2016) for discussions of the difference between the strict ignorability
and sequential ignorability assumptions. What is unique here is that we condition on unobserved factors
and factor loadings.
Assumptions 3 and 4 (see the Online Appendix for details) are needed for the consistent
estimation of β and the space spanned by F (or F′F/T). Similar, though slightly weaker,
assumptions are made in Bai (2009) and Moon and Weidner (2015). Assumption 3 allows
weak serial correlations but rules out strong serial dependence, such as unit root processes;
errors of different units are uncorrelated. A sufficient condition for Assumption 3 to hold is
that the error terms are not only independent of covariates, factors and factor loadings, but
also independent both across units and over time, which is assumed in Abadie, Diamond and
Hainmueller (2010). Assumption 4 specifies moment conditions that ensure the convergence
of the estimator.
For valid inference based on a block bootstrap procedure discussed in the next section,
we also need Assumption 5 (see the Online Appendix for details). Heteroskedasticity across
time, however, is allowed.
3. ESTIMATION STRATEGY
In this section, we first propose a generalized synthetic control (GSC) estimator for the
treatment effect on each treated unit. It is essentially an out-of-sample prediction method
based on the factor-augmented model of Bai (2009).
The GSC estimator for the treatment effect on treated unit i at time t is given by the
difference between the actual outcome and its estimated counterfactual: δ̂it = Yit (1) − Ŷit (0),
in which Ŷit(0) is imputed in three steps. In the first step, we estimate an interactive fixed
effects (IFE) model using only the control group data and obtain β̂, F̂, and Λ̂co:

$$\text{Step 1.}\quad (\hat{\beta}, \hat{F}, \hat{\Lambda}_{co}) = \underset{\tilde{\beta}, \tilde{F}, \tilde{\Lambda}_{co}}{\arg\min} \;\sum_{i\in\mathcal{C}} (Y_i - X_i\tilde{\beta} - \tilde{F}\tilde{\lambda}_i)'(Y_i - X_i\tilde{\beta} - \tilde{F}\tilde{\lambda}_i)$$
We explain in detail how to estimate this model in the Online Appendix. The second step
estimates factor loadings for each treated unit by minimizing the mean squared error of the
predicted treated outcome in pre-treatment periods:
$$\text{Step 2.}\quad \hat{\lambda}_i = \underset{\tilde{\lambda}_i}{\arg\min} \;(Y_i^0 - X_i^0\hat{\beta} - \hat{F}^0\tilde{\lambda}_i)'(Y_i^0 - X_i^0\hat{\beta} - \hat{F}^0\tilde{\lambda}_i), \qquad i \in \mathcal{T},$$
in which β̂ and F̂⁰ are from the first-step estimation and the superscript "0" denotes the
pre-treatment periods. In the third step, we calculate treated counterfactuals based on β̂,
F̂, and λ̂i:

$$\text{Step 3.}\quad \hat{Y}_{it}(0) = x_{it}'\hat{\beta} + \hat{\lambda}_i'\hat{f}_t, \qquad i \in \mathcal{T},\ t > T_0.$$

An estimator for ATTt therefore is:

$$\widehat{ATT}_t = \frac{1}{N_{tr}}\sum_{i\in\mathcal{T}}\big[Y_{it}(1) - \hat{Y}_{it}(0)\big] \quad \text{for } t > T_0.$$
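To fix ideas, here is a compact Python/NumPy sketch of the three steps. The IFE model in Step 1 is fit by the standard iteration between least squares for β and principal components for (F, Λ), in the spirit of Bai (2009); names and implementation details (starting values, convergence control) are illustrative simplifications, not the paper's code:

```python
import numpy as np

def est_ife(Y, X, r, max_iter=500, tol=1e-8):
    """Step 1 (sketch): fit Y = X beta + F Lam' + e on control-group data by
    iterating between OLS for beta and principal components for (F, Lam),
    in the spirit of Bai (2009). Y: (T, Nco); X: (T, Nco, k)."""
    T, N, k = X.shape
    Xf = X.reshape(T * N, k)                     # rows ordered as (t, i) pairs
    beta = np.linalg.lstsq(Xf, Y.reshape(T * N), rcond=None)[0]
    for _ in range(max_iter):
        W = Y - np.einsum('tik,k->ti', X, beta)  # outcomes net of covariates
        U, _, _ = np.linalg.svd(W, full_matrices=False)
        F = np.sqrt(T) * U[:, :r]                # normalization: F'F/T = I_r
        Lam = W.T @ F / T                        # least-squares loadings
        beta_new = np.linalg.lstsq(
            Xf, (Y - F @ Lam.T).reshape(T * N), rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, F, Lam

def gsc_att(Y, X, treated, T0, r):
    """Steps 1-3 (sketch): impute Y(0) for each treated unit and average
    the gaps. `treated` is a boolean mask over the N columns of Y."""
    beta, F, _ = est_ife(Y[:, ~treated], X[:, ~treated, :], r)    # Step 1
    Y_tr, X_tr = Y[:, treated], X[:, treated, :]
    gaps = np.empty_like(Y_tr)
    for j in range(Y_tr.shape[1]):
        resid0 = Y_tr[:T0, j] - X_tr[:T0, j] @ beta
        lam = np.linalg.lstsq(F[:T0], resid0, rcond=None)[0]      # Step 2
        y0_hat = X_tr[:, j] @ beta + F @ lam                      # Step 3
        gaps[:, j] = Y_tr[:, j] - y0_hat
    return gaps.mean(axis=1)       # estimated ATT_t; relevant for t > T0
```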
Remark 2: In the Online Appendix, we show that, under Assumptions 1-4, the bias of
the GSC estimator shrinks to zero as the sample size grows, i.e., $E_\varepsilon(\widehat{ATT}_t \mid D, X, \Lambda, F) \to ATT_t$ as
Nco, T0 → ∞ (Ntr is taken as given), in which D = [D1, D2, · · · , DN] is a (T × N) matrix, X
is a three-dimensional (T × N × p) matrix, and Λ = [λ1, λ2, · · · , λN]′ is an (N × r) matrix.
Intuitively, both large Nco and large T0 are necessary for the convergence of β̂ and the
estimated factor space. When T0 is small, imprecise estimation of the factor loadings, or the
“incidental parameters” problem, will lead to bias in the estimated treatment effects. This
is a crucial difference from the conventional linear fixed-effect models.
Model selection. In practice, researchers may have limited knowledge of the exact number
of factors to be included in the model. Therefore, we develop a cross-validation procedure to
select models before estimating the causal effect. It relies on the control group information
as well as information from the treatment group in pre-treatment periods. Algorithm 1
describes the details of this procedure.
Step 1. Start with a given number of factors r, estimate an IFE model using the control
group data {Yi , Xi }i∈C , obtaining β̂ and F̂ ;
Step 2. Start a cross-validation loop that goes through all T0 pre-treatment periods:

(a) In round s ∈ {1, · · · , T0}, hold back data of all treated units at time s. Run
an OLS regression using the rest of the pre-treatment data, obtaining factor
loadings for each treated unit i:

$$\hat{\lambda}_{i,-s} = (F^{0\prime}_{-s}F^{0}_{-s})^{-1}F^{0\prime}_{-s}(Y^{0}_{i,-s} - X^{0}_{i,-s}\hat{\beta}), \qquad \forall i \in \mathcal{T},$$

in which the superscript "0" denotes pre-treatment periods and the subscript
"−s" indicates that period s is left out;

(b) Predict the held-back outcome of each treated unit i at time s, Ŷis(0) = x′isβ̂ + λ̂′i,−sf̂s,
and save the prediction error eis = Yis − Ŷis(0).

Step 3. Compute the mean squared prediction error (MSPE) by averaging e²is over all
treated units and all T0 rounds.

Step 4. Repeat Steps 1-3 with different r's, obtain the corresponding MSPEs, and choose
the r that minimizes the MSPE.
The basic idea of the above procedure is to hold back a small amount of data (e.g. one
pre-treatment period of the treatment group) and use the rest of data to predict the held-
back information. The algorithm then chooses the model that on average makes the most
accurate predictions. A TSCS dataset with a DID data structure allows us to do so because
(1) there exists a set of control units that are never exposed to the treatment and therefore
can serve as the basis for estimating time-varying factors and (2) the pre-treatment periods
of treated units constitute a natural validation set for candidate models. This procedure
is computationally inexpensive because with each r, the IFE model is estimated only once
(Step 1); other steps involve merely simple calculations. In the Online Appendix, we
conduct Monte Carlo exercises and show that the above procedure performs well in terms of
choosing the correct number of factors, even with relatively small datasets.
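A minimal sketch of Algorithm 1, reusing the est_ife function from the earlier sketch (names illustrative):

```python
def cross_validate_r(Y, X, treated, T0, r_max=5):
    """Algorithm 1 (sketch): choose the number of factors r by leave-one-
    period-out cross-validation on the treated units' pre-treatment data."""
    Y_tr, X_tr = Y[:, treated], X[:, treated, :]
    mspe = {}
    for r in range(1, r_max + 1):
        beta, F, _ = est_ife(Y[:, ~treated], X[:, ~treated, :], r)  # Step 1
        errs = []
        for s in range(T0):                       # Step 2: hold back time s
            keep = [t for t in range(T0) if t != s]
            for j in range(Y_tr.shape[1]):
                resid = Y_tr[keep, j] - X_tr[keep, j] @ beta
                lam = np.linalg.lstsq(F[keep], resid, rcond=None)[0]
                pred = X_tr[s, j] @ beta + F[s] @ lam   # predict held-back Y
                errs.append((Y_tr[s, j] - pred) ** 2)
        mspe[r] = np.mean(errs)                   # Step 3: MSPE for this r
    return min(mspe, key=mspe.get)                # Step 4: r with lowest MSPE
```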
Remark 3: Our framework can also accommodate DGPs that directly incorporate additive
fixed effects, known time trends, and exogenous time-invariant covariates, such as:
in which lt is a (q × 1) vector of known time trends that may affect each unit differently; γi
is a (q × 1) vector of unit-specific unknown parameters; and zi is an (m × 1) vector of observed
time-invariant covariates.
Algorithm 2 (Inference) A parametric bootstrap procedure that gives the uncertainty es-
timates of the ATT is described as follows:
17. The treated outcome for unit i thus can be drawn from Ỹi(1) = Ỹi(0) + δi. We do not directly observe δi,
but since it is taken as given, its presence will not affect the uncertainty estimates of $\widehat{ATT}_t$. Hence, in the
bootstrap procedure, we use Ỹi(0) for both the treatment and control groups to form bootstrapped samples
(setting δi = 0 for all i ∈ T). We add back $\widehat{ATT}_t$ when constructing confidence intervals.
Step 2. Apply the GSC method to the original data, obtaining: (1) $\widehat{ATT}_t$ for all t > T0; (2)
estimated coefficients β̂, F̂, Λ̂co, and λ̂j, j ∈ T; and (3) the fitted values and residuals
of the control units: Ŷco = {Ŷ1(0), Ŷ2(0), · · · , ŶNco(0)} and ê = {ε̂1, ε̂2, · · · , ε̂Nco}.
Step 3. Repeat the following two sub-steps in each bootstrap round k:

(a) Construct a bootstrapped sample S(k) in which all outcomes are simulated
under no treatment:

$$\tilde{Y}^{(k)}_i(0) = \hat{Y}_i(0) + \tilde{\varepsilon}_i, \qquad i \in \mathcal{C};$$
$$\tilde{Y}^{(k)}_i(0) = \hat{Y}_i(0) + \tilde{\varepsilon}^{\,p}_i, \qquad i \in \mathcal{T},$$

in which the vectors ε̃i and ε̃pi are randomly selected from the sets ê and êp,
respectively, and Ŷi(0) = Xiβ̂ + F̂λ̂i. Note that the simulated treated coun-
terfactuals do not contain the treatment effect.

(b) Apply the GSC method to S(k) and obtain a new ATT estimate; add $\widehat{ATT}_{t,t>T_0}$
to it, obtaining the bootstrapped estimate $\widehat{ATT}^{(k)}_{t,t>T_0}$.

Step 4. Compute the variance of $\widehat{ATT}_t$ using the bootstrapped estimates and construct
its confidence interval using the conventional percentile method (Efron and
Tibshirani 1993).
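A sketch of Algorithm 2, reusing gsc_att and est_ife from the earlier sketches, is given below. One simplification to flag: the prediction errors added to the simulated treated outcomes are drawn here from the pool of control residuals, a stand-in for the êp set referenced in Step 3(a):

```python
def bootstrap_att(Y, X, treated, T0, r, n_boot=500, alpha=0.05, seed=0):
    """Algorithm 2 (sketch): parametric bootstrap for ATT_t. Simulated
    outcomes contain no treatment effect; the estimated ATT is added back,
    and percentile confidence intervals are formed (cf. footnote 17)."""
    rng = np.random.default_rng(seed)
    T, N = Y.shape
    att_hat = gsc_att(Y, X, treated, T0, r)                  # Step 2
    beta, F, Lam = est_ife(Y[:, ~treated], X[:, ~treated, :], r)
    Y0_hat = np.empty_like(Y)                                # fitted Y(0)
    Y0_hat[:, ~treated] = np.einsum(
        'tik,k->ti', X[:, ~treated, :], beta) + F @ Lam.T
    for j in np.where(treated)[0]:                           # impute treated
        resid0 = Y[:T0, j] - X[:T0, j] @ beta
        lam = np.linalg.lstsq(F[:T0], resid0, rcond=None)[0]
        Y0_hat[:, j] = X[:, j] @ beta + F @ lam
    resid_co = Y[:, ~treated] - Y0_hat[:, ~treated]          # residual pool
    boot = np.empty((n_boot, T))
    for b in range(n_boot):                                  # Step 3
        cols = rng.integers(0, resid_co.shape[1], size=N)
        Y_tilde = Y0_hat + resid_co[:, cols]                 # delta_i = 0
        boot[b] = att_hat + gsc_att(Y_tilde, X, treated, T0, r)
    lo = np.quantile(boot, alpha / 2, axis=0)                # Step 4
    hi = np.quantile(boot, 1 - alpha / 2, axis=0)
    return att_hat, lo, hi
```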
4. MONTE CARLO EVIDENCE

In this section, we conduct Monte Carlo exercises to explore the finite sample properties of the
GSC estimator and compare it with several existing methods, including the DID estimator,
the IFE estimator, and the original synthetic matching method. We also investigate the
extent to which the proposed cross-validation scheme can choose the number of factors
correctly in relatively small samples.
We start with the following data generating process (DGP) that includes two observed
time-varying covariates, two unobserved factors, and additive two-way fixed effects:

$$Y_{it} = \delta_{it}D_{it} + x_{it,1}\beta_1 + x_{it,2}\beta_2 + \lambda_i'f_t + \alpha_i + \xi_t + \varepsilon_{it},$$
where ft = (f1t , f2t )0 and λi = (λi1 , λi2 )0 are time-varying factors and unit-specific factor
loadings. The covariates are (positively) correlated with both the factors and factor loadings:
xit,k = 1 + λ′ift + λi1 + λi2 + f1t + f2t + ηit,k, k = 1, 2. The error term εit and disturbances
in covariates ηit,1 and ηit,2 are i.i.d. N (0, 1). Factors f1t and f2t , as well as time fixed effects
ξt , are also i.i.d. N (0, 1). The treatment and control groups consist of Ntr and Nco units.
The treatment starts to affect the treated units at time T0 + 1 and since then 10 periods are
observed (q = 10). The treatment indicator is defined as in Section 2, i.e., Dit = 1 when
i ∈ T and t > T0 and Dit = 0 otherwise. The heterogeneous treatment effect is generated
by δit,t>T0 = δ̄t + eit, in which eit is i.i.d. N(0,1), and δ̄t is given by [δ̄T0+1, δ̄T0+2, · · · , δ̄T0+10] =
[1, 2, · · · , 10].
Factor loadings λi1 and λi2 , as well as unit fixed effects αi , are drawn from uniform
distributions U[−√3, √3] for control units and U[√3 − 2w√3, 3√3 − 2w√3] for treated units
(w ∈ [0, 1]). This means that when 0 ≤ w < 1, (1) the random variables have variance 1;
(2) the supports of factor loadings of treated and control units are not perfectly overlapped;
and (3) the treatment indicator and factor loadings are positively correlated.18
18. The DGP specified here is modified based on Bai (2009) and Gobillon and Magnac (2016).
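This DGP is straightforward to simulate. In the NumPy sketch below, the coefficients on the two covariates are placeholders (set to β = (1, 3) purely for illustration, since their values are not stated in the text); everything else follows the description above:

```python
import numpy as np

def simulate_dgp(Ntr=5, Nco=45, T0=20, q=10, w=0.8, seed=0):
    """Simulate the DGP described above. The covariate coefficients are
    placeholders, set to beta = (1, 3) purely for illustration."""
    rng = np.random.default_rng(seed)
    N, T = Ntr + Nco, T0 + q
    f = rng.standard_normal((T, 2))              # factors f1t, f2t ~ N(0, 1)
    xi = rng.standard_normal(T)                  # time fixed effects
    s3 = np.sqrt(3.0)
    lo, hi = s3 - 2 * w * s3, 3 * s3 - 2 * w * s3
    lam = np.vstack([rng.uniform(lo, hi, (Ntr, 2)),      # treated loadings
                     rng.uniform(-s3, s3, (Nco, 2))])    # control loadings
    alpha = np.concatenate([rng.uniform(lo, hi, Ntr),    # unit fixed effects
                            rng.uniform(-s3, s3, Nco)])
    lf = f @ lam.T                               # (T, N): lambda_i' f_t
    base = 1 + lf + lam.sum(1)[None, :] + f.sum(1)[:, None]
    X = base[:, :, None] + rng.standard_normal((T, N, 2))  # x_it1, x_it2
    treated = np.arange(N) < Ntr
    D = np.zeros((T, N))
    D[T0:, treated] = 1.0                        # treatment on after T0
    delta = np.zeros((T, N))                     # delta_bar_t + e_it
    delta[T0:, :] = np.arange(1, q + 1)[:, None] + rng.standard_normal((q, N))
    beta = np.array([1.0, 3.0])                  # placeholder coefficients
    Y = (D * delta + np.einsum('tik,k->ti', X, beta) + lf
         + alpha[None, :] + xi[:, None] + rng.standard_normal((T, N)))
    return Y, X, treated
```

A full dry run of the pipeline is then, e.g., Y, X, treated = simulate_dgp() followed by cross_validate_r and gsc_att from the earlier sketches.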
[Figure 1. An illustrative example with two factors. Upper panel: raw data over 30 periods, with legend entries "Treated", "Control", "Treated Average", and "Estimated Y(0) Average for the Treated". Lower panel: "Estimated ATT", "True ATT", and "95% Confidence Intervals".]

The upper panel of Figure 1 plots the raw outcomes of the treated and control units. The
bold solid line is the average outcome of the five treated units while the bold dashed line is
the average predicted outcome of the five units in the absence of the treatment. The latter
is imputed using the proposed method.
The lower panel of Figure 1 shows the estimated ATT (solid line) and the true ATT
(dashed line). The 95 percent confidence intervals for the ATT are based on 2,000 bootstrap
runs. The figure shows that the estimated average treated outcome fits the data well in pre-
treatment periods and that the estimated ATT is very close to the actual ATT.
Finite sample properties. We present the Monte Carlo evidence on the finite sample
properties of the GSC estimator in Table 1 (additional results are shown in the Online
Appendix). As in the previous example, the treatment group is set to have five units. The
estimand is the ATT at time T0 + 5, whose expected value equals 5. Observables, factors,
and factor loadings are drawn only once while the error term is drawn repeatedly; w is
set to be 0.8 such that treatment assignment is positively correlated with factor loadings.
Table 1 reports the bias, standard deviation (SD), and root mean squared error (RMSE) of
$\widehat{ATT}_{T_0+5}$ from 5,000 simulations for each pair of T0 and Nco.19 It shows that the GSC
estimator has limited bias even when T0 and Nco are relatively small, and the bias goes away
as T0 and Nco grow. As expected, both the SD and RMSE shrink when T0 and Nco become
larger. Table 1 also reports the coverage probabilities of 95 percent confidence intervals for
$\widehat{ATT}_{T_0+5}$ constructed by the parametric bootstrap procedure (Algorithm 2). For each pair
of T0 and Nco, the coverage probability is calculated based on 5,000 simulated samples, each
of which is bootstrapped 1,000 times. These numbers show that the proposed procedure
achieves the correct coverage rate even when the sample size is relatively small (e.g.,
T0 = 15, Ntr = 5, Nco = 80).
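The summary statistics reported in Table 1 can be computed from simulation output with a small helper like the following (the SD and RMSE definitions follow footnote 19 below; names illustrative):

```python
def summarize_sims(att_hats, att_true, ci_lo, ci_hi):
    """Bias, SD, RMSE, and coverage across simulated samples, using the
    SD and RMSE definitions given in footnote 19."""
    est = np.asarray(att_hats, dtype=float)
    bias = est.mean() - att_true
    sd = np.sqrt(np.mean((est - est.mean()) ** 2))
    rmse = np.sqrt(np.mean((est - att_true) ** 2))
    cover = np.mean((np.asarray(ci_lo) <= att_true)
                    & (att_true <= np.asarray(ci_hi)))
    return {"bias": bias, "sd": sd, "rmse": rmse, "coverage": cover}
```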
In the Online Appendix, we run additional simulations and compare the proposed method
with several existing methods, including the DID estimator, the IFE estimator, and the
synthetic matching method. We find that (1) the GSC estimator has less bias than the
DID estimator in the presence of unobserved, decomposable time-varying confounders; (2)
it has less bias than the IFE estimator when the treatment effect is heterogeneous; and
(3) it is usually more efficient than the original synthetic matching estimator. It is worth
emphasizing that these results hold under the premise of correct model specifications. To
address the concern that the GSC method relies on correct model specifications, we conduct
additional tests and show that the cross-validation scheme described in Algorithm 1 is able
to choose the number of factors correctly most of the time when the sample is large enough.

19. Standard deviation is defined as $SD(\widehat{ATT}_t) = \sqrt{E[\widehat{ATT}^{(k)}_t - E(\widehat{ATT}^{(k)}_t)]^2}$, while root mean squared
error is defined as $RMSE(\widehat{ATT}_t) = \sqrt{E(\widehat{ATT}^{(k)}_t - ATT_t)^2}$. The superscript (k) denotes the k-th sample.
The two are very close because the bias of the GSC estimator shrinks to zero as the sample size grows.
5. EMPIRICAL EXAMPLE
In this section, we illustrate the GSC method with an empirical example that investigates
the effect of Election Day Registration (EDR) laws on voter turnout in the United States.
Voting in the United States usually takes two steps. Except in North Dakota, where no regis-
tration is needed, eligible voters throughout the country must register prior to casting their
ballots. Registration, which often requires a separate trip from voting, is widely regarded
as a substantial cost of voting and a culprit of low turnout rates before the 1993 National
Voter Registration Act (NVRA) was enacted (e.g. Highton 2004). Against this backdrop,
EDR is a reform that allows eligible voters to register on Election Day when they arrive at
polling stations. In the mid-1970s, Maine, Minnesota, and Wisconsin were the first adopters
of EDR.
The two-way fixed effects model presented in Table 2 assumes a constant treatment effect
both across states and over time. Next, we relax this assumption by literally employing a DID
approach. In other words, we estimate the effect of EDR laws on voter turnout in the post-
treatment periods.25

25. As is shown in the figure and has been pointed out by many, turnout rates are in general higher in states
that have EDR laws than in states that do not, but this does not necessarily imply a causal relationship
between EDR laws and voter turnout.
[Figure 2. The effect of EDR on voter turnout, estimated by (a) difference-in-differences and (b) the generalized synthetic control method. Each panel shows the treated average and the estimated Y(0) average for the treated (turnout %), together with the estimated ATT and its 95% confidence intervals.]

The figure shows that the gaps between the two lines are virtually flat in pre-treatment
periods and that the effect takes off right after the adoption of EDR.28

28. Although it is not guaranteed, this is not surprising, since the GSC method uses information on all past
outcomes and minimizes gaps between actual and predicted turnout rates in pre-treatment periods.
Figure 3 presents the estimated factors and factor loadings produced by the GSC method.29
Figure 3(a) depicts the two estimated factors. The x-axis is year and the y-axis is the mag-
nitude of the factors (re-scaled by the square roots of their corresponding eigenvalues to
demonstrate their relative importance). Figure 3(b) shows the estimated factor loadings of
each treated (black, bold) and control (gray) unit, with the x- and y-axes indicating the
magnitude of the loadings on the first and second factors, respectively.

29. The results are essentially the same with or without controlling for the other two registration reforms.
[Figure 3. (a) The two estimated factors plotted over time; (b) estimated factor loadings of treated (black, bold) and control (gray) units on the two factors.]

Bearing in mind the caveat
that estimated factors may not be directly interpretable because they are, at best, linear
transformations of the true factors, we find that the estimated factors shown in this figure
are meaningful. The first factor captures the sharp increase in turnout in the southern states
after the 1965 Voting Rights Act removed Jim Crow laws, such as poll taxes and
literacy tests, that had suppressed turnout. As shown in Figure 3(b), the top 11 states that
have the largest loadings on the first factor are exactly the 11 southern states (which were
previously in the Confederacy).30 The labels of these states are underlined in Figure 3(b).
The second factor, which is set to be orthogonal to the first one, is less interpretable. How-
ever, its non-negligible magnitude indicates a strong downward trend in voter turnout in
many states in recent years. Another reassuring finding shown in Figure 3(b) is that the
estimated factor loadings of the 9 treated units mostly lie in the convex hull of those of the
control units, which indicates that the treated counterfactuals are produced mostly by more
reliable interpolations instead of extrapolations.
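This convex-hull diagnostic can be verified numerically: a loading vector lies in the convex hull of the control units' loadings if and only if it can be written as a convex combination of them, which is a linear-programming feasibility problem. A sketch using SciPy (illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(lam_i, Lam_co):
    """Return True if lam_i (r,) is a convex combination of the rows of
    Lam_co (Nco, r), i.e., lies in their convex hull."""
    Nco = Lam_co.shape[0]
    A_eq = np.vstack([Lam_co.T, np.ones((1, Nco))])  # match point; sum w = 1
    b_eq = np.concatenate([lam_i, [1.0]])
    res = linprog(c=np.zeros(Nco), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * Nco, method="highs")
    return res.success                               # feasible <=> in hull
```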
Finally, we investigate the heterogeneous treatment effects of EDR laws.

30. Although we can control for indicators of Jim Crow laws in the model, such indicators may not be able to
capture the heterogeneous impacts of these laws on voter turnout in each state.
The estimation of heterogeneous treatment effects is embedded in the GSC method since
it gives individual treatment effects for all treated units in a single run. Table 3 summarizes
the ATTs of EDR on voter turnout among the three waves of EDR adopters. Again, additive
state and year fixed effects, as well as indicators of two other registration systems, are
controlled for. Table 3 shows that EDR laws have a large and positive effect on the early
adopters (the estimate is about 7 percent with a standard error of 3 percent), while EDR
laws were found to have no statistically significant impact on the other six states.31

31. In the Online Appendix, we show that the treatment effects are positive (and relatively large) for all three
early-adopting states, Maine, Minnesota, and Wisconsin. Using a fuzzy regression discontinuity design, Keele
and Minozzi (2013) show that EDR has almost no effect on turnout in Wisconsin. The discrepancy with
this paper could be mainly due to the difference in estimands: the two biggest cities in Wisconsin, Milwaukee
and Madison, constitute a major part of Wisconsin's constituency but have negligible influence on their
local estimates. One advantage of the approach of Keele and Minozzi (2013) over ours is the use of
fine-grained municipal-level data.

32. Glynn and Quinn (2011) argue that traditional cross-sectional methods in general over-estimate the effect
of EDR laws on voter turnout and suggest that EDR laws are likely to have a minimal effect on turnout in
non-EDR states (the ATC). In this paper, we focus on the effect of EDR in EDR states (the ATT) instead.

6. CONCLUSION

In this paper, we propose the generalized synthetic control (GSC) method for causal inference
with TSCS data. It attempts to address the challenge that the "parallel trends" assumption
often fails when researchers apply fixed effects models to estimate the causal effect of a
treatment.
Campbell, John Y., Andrew W. Lo and A. Craig MacKinlay. 1997. The Econometrics of
Financial Markets. Princeton, NJ: Princeton University Press.
Dube, Arindrajit and Ben Zipperer. 2015. “Pooling Multiple Case Studies Using Synthetic
Controls: An Application to Minimum Wage Policies.” IZA Discussion Paper No. 8944.
Efron, Bradley. 2004. “The Estimation of Prediction Error: Covariance Penalties and Cross-
Validation.” Journal of the American Statistical Association 99(467):619–632.
Efron, Bradley and Robert J. Tibshirani. 1993. An Introduction to the Bootstrap. New York,
NY: Chapman & Hall.
Fenster, Mark J. 1994. “The Impact of Allowing Day of Registration Voting on Turnout in
US Elections from 1960 to 1992: A Research Note.” American Politics Research 22(1):74–
87.
Gaibulloev, Khusrav, Todd Sandler and Donggyu Sul. 2014. “Dynamic Panel Analysis under
Cross-Sectional Dependence.” Political Analysis 22(2):258–273.
Glynn, Adam N. and Kevin M. Quinn. 2011. “Why Process Matters for Causal Inference.”
Political Analysis 19(3):273–286.
Gobillon, Laurent and Thierry Magnac. 2016. “Regional Policy Evaluation: Interactive Fixed
Effects and Synthetic Controls.” The Review of Economics and Statistics 98(3):535–551.
Hanmer, Michael J. 2009. Discount Voting: Voter Registration Reforms and their Effects.
Cambridge University Press.
Highton, Benjamin. 1997. “Easy Registration and Voter Turnout.” The Journal of Politics
59(2):565–575.
Highton, Benjamin. 2004. “Voter Registration and Turnout in the United States.” Perspectives
on Politics 2(3):507–515.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical
Association 81(396):945–960.
Hsiao, Cheng, Steve H. Ching and Shui Ki Wan. 2012. “A Panel Data Approach for Program
Evaluation: Measuring the Benefits of Political and Economic Integration of Hong Kong
with Mainland China.” Journal of Applied Econometrics 27(5):705–740.
Huang, Chi and Todd G. Shields. 2000. “Interpretation of Interaction Effects in Logit and
Probit Analyses: Reconsidering the Relationship Between Registration Laws, Education,
and Voter Turnout.” American Politics Research 28(1):80–95.
Imai, Kosuke and In Song Kim. 2016. “When Should We Use Linear Fixed Effects Regression
Models for Causal Inference with Panel Data.” Mimeo, Princeton University.
Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Non-
randomized Studies.” Journal of Educational Psychology 66(5):688–701.
Springer, Melanie Jean. 2014. How the States Shaped the Nation: American Electoral Insti-
tutions and Voter Turnout, 1920-2000. University of Chicago Press.
Stewart, Brandon. 2014. “Latent Factor Regressions for the Social Sciences.” Mimeo,
Princeton University.
Teixeira, Ruy A. 2011. The Disappearing American Voter. Brookings Institution Press.
Timpone, Richard J. 1998. “Structure, Behavior, and Voter Turnout in the United States.”
The American Political Science Review 92(1):145–158.
Timpone, Richard J. 2002. “Estimating Aggregate Policy Reform Effects: New Baselines for
Registration, Participation, and Representation.” Political Analysis 10(2):154–177.
Wolfinger, Raymond E. and Steven J. Rosenstone. 1980. Who Votes? New Haven, CT: Yale
University Press.
Xu, Yiqing. 2016. “Replication Data for: Generalized Synthetic Control Method: Causal
Inference with Interactive Fixed Effects Models.” doi:10.7910/DVN/8AKACJ, Harvard
Dataverse.