Control Function Methods in Applied Econometrics

Author(s): Jeffrey M. Wooldridge


Source: The Journal of Human Resources, Spring 2015, Vol. 50, No. 2, pp. 420-445
Published by: University of Wisconsin Press

Stable URL: https://www.jstor.org/stable/24735991

Control Function Methods in Applied
Econometrics

Jeffrey M. Wooldridge

ABSTRACT

This paper provides an overview of control function (CF) methods for solving the problem of endogenous explanatory variables (EEVs) in linear and nonlinear models. CF methods often can be justified in situations where "plug-in" approaches are known to produce inconsistent estimators of parameters and partial effects. Usually, CF approaches require fewer assumptions than maximum likelihood, and CF methods are computationally simpler. The recent focus on estimating average partial effects, along with theoretical results on nonparametric identification, suggests some simple, flexible parametric CF strategies. The CF approach for handling discrete EEVs in nonlinear models is more controversial but approximate solutions are available.

I. Introduction

The term "control function" has been part of the econometrics lexicon for several decades, but it has been used inconsistently, and its usage has evolved. In early work, notably Barnow, Cain, and Goldberger (1981) (hereafter, BCG), a control function is a variable that, when added to a regression, renders a policy variable appropriately exogenous. From the BCG perspective, multiple regression that includes the policy variable and one or more control functions provides consistent estimation of the causal effect of a policy intervention. Cameron and Trivedi (2005, p. 37) endorses this definition of a control function (CF), and, based on the usage in BCG, what Wooldridge (2010, Section 4.3.2) defines as a proxy variable would be considered a CF. As one example, a standardized intelligence test score, such as IQ score, can be considered a CF if conditioning on it appropriately controls for unobserved cognitive ability, thereby enabling consistent estimation of the causal effect of schooling in a standard wage equation.

Jeffrey M. Wooldridge is a University Distinguished Professor of economics at Michigan State University. He thanks four anonymous referees and the editors for very helpful comments on two earlier drafts. Data used in this article are available from the author from November 2015 through October 2018.
[Submitted April 2013; accepted February 2014]
ISSN 0022-166X E-ISSN 1548-8004 © 2015 by the Board of Regents of the University of Wisconsin System


In the application motivating BCG, variables measuring socioeconomic status (SES) are control functions if participation in a program, such as Head Start, is essentially determined by the SES variables. Goldberger (1972, reprinted 2008) was an important contribution studying the problem of whether controlling for observables could solve self-selection into program participation, although it did not use the phrase "control function." Therefore, in early usage, the notion of a control function was closely tied to assumptions of "ignorable" or "unconfounded" treatment assignment that are prevalent today: Conditional on observed covariates, the key policy variables are appropriately exogenous. (For an overview, see Imbens and Wooldridge 2009.)
Heckman and Robb (1985), in the context of program evaluation with longitudinal data, also describes a control function as a variable that, when conditioned on, makes an intervention exogenous in a regression equation. It explicitly recognizes that CFs might depend on unknown parameters and that to operationalize a CF procedure the parameters must be estimated in a first stage. One example is a lagged residual in a program evaluation equation using longitudinal data.
For the most part, the modern usage of "control function" maintains the spirit of the earlier definitions but with an important defining feature: Constructing a valid CF relies on the availability of one or more instrumental variables. I take this perspective in the current paper: The control function approach to estimation is inherently an instrumental variables method. More precisely, the equation of interest (for brevity called the "structural equation") contains at least one explanatory variable that is endogenous, or suspected of being so, in the sense that it is correlated with unobservables in the equation. Further, I have excluded exogenous variables from the structural equation that explain variation in the endogenous explanatory variables (EEVs for short). The exogenous variation induced by excluded instrumental variables provides separate variation in the residuals (or generalized residuals) obtained from a reduced form, and these residuals serve as the control functions. By adding appropriate control functions, which are usually estimated in a first stage, the EEVs become appropriately exogenous in a second-stage estimating equation. The purpose of this review is to show how this general description of the CF approach can be applied to various linear and nonlinear models.
In evaluating the scope of an estimation method, it is important to understand how it works in familiar settings, including cases when it is not necessarily needed. Consequently, in Section II, I discuss the control function approach applied to linear models with constant coefficients. Such models are still the workhorse in applied econometrics, and simple IV methods, such as two-stage least squares (2SLS), are usually sufficient for estimation. Nevertheless, even when the CF approach is identical to a 2SLS estimator, the CF perspective has a couple of attractive features. First, the CF approach produces a simple Hausman (1978) test that compares OLS and 2SLS, and the test is easily made robust to heteroskedasticity and cluster correlation (including serial correlation in a panel data setting) of unknown form. Second, the CF approach parsimoniously handles fairly complicated models that are nonlinear in endogenous explanatory variables.
In Section III, I turn to random coefficient models, where the partial effects of the endogenous explanatory variables can vary across individual units in unobserved ways. Estimating such models has fallen somewhat out of favor in empirical research, yet they (implicitly) play a role in the recent program evaluation literature, where treatment effects are assumed to be heterogeneous. In the last decade or so, the focus in the treatment effects literature has been on quantities that are identified using standard IV methods under general assumptions, with the "local average treatment effect" (LATE) introduced in Imbens and Angrist (1994) being the most popular.
I argue in Section III that the control function approach is a useful complement to standard IV methods for a couple of reasons. First, we might hope to estimate treatment effects for identifiable populations or subpopulations, and the CF approach allows us to do that under certain assumptions. Second, the CF approach allows us to study the nature of self-selection. In particular, if we think units self-select into treatment when the treatment is likely to be beneficial, then we should be able to test that proposition. A classic example is the endogenous switching regression model, as in Heckman (1976), which is often applied to earnings equations under two different regimes (such as belonging to a union or not).
I discuss estimation of nonlinear models in Section IV, where the CF approach is particularly appealing compared with other approaches such as "plug-in" methods or joint maximum likelihood. Important contributions are Rivers and Vuong (1988), which developed a two-step CF method for estimation of a probit model with a continuous EEV, and Smith and Blundell (1986), which essentially did the same for the Tobit model. In these early applications of CF methods to nonlinear models, the focus was on parameter estimation. Many of the recent advances in CF methods demonstrate that average partial effects, or average causal effects, are identified quite generally. Wooldridge (2010) uses these results extensively and, in pioneering work, Blundell and Powell (2003) shows that the concept of a control function can be applied in nonparametric and semiparametric contexts. For discrete EEVs, I also summarize the CF methods recently proposed by Terza, Basu, and Rathouz (2008) and Wooldridge (2014). These methods are more controversial because they rely on nonstandard parametric assumptions.
To illustrate several of the CF methods, I present applications to three data sets. One data set allows estimation of a log wage equation allowing for education to be endogenous. Such equations can be estimated assuming a constant return to education, as in Section II, or the return to education can be individual-specific, as in Section III. A second application is to a math test score equation, where the EEV of interest is a binary indicator of attending a Catholic high school. Again, one can assume constant coefficients or allow the effect of attending a Catholic high school to depend on unobserved characteristics. As we will see in Section III, there is strong evidence for individual-specific heterogeneity in the effects of attending a Catholic high school.
To illustrate nonlinear models in Section IV, I use a data set on married women's labor force participation, where the variable measuring other sources of income is treated as endogenous. I show how to estimate the simple Rivers-Vuong model and also show how the CF approach can be made much more flexible with almost no additional computation. Finally, I briefly consider a binary response model for graduating from high school with attending a Catholic high school as the endogenous explanatory variable. The data sets and Stata® code used for all models estimated in the paper are available on request from the author.

II. Models Linear in Constant Coefficients

I begin with a standard linear model with constant coefficients for several reasons. The first is to show that a very common estimation method, two-stage least squares (2SLS), can be derived using the control function approach. Second, the control function (CF) approach leads to robust, regression-based Hausman tests of whether the suspected EEVs are actually endogenous. Third, the basic 2SLS approach can be contrasted with CF approaches that put structure on the reduced forms of the endogenous explanatory variables. CF approaches that use more information can improve precision of the estimates but are generally less robust.
I consider a setting where $y_1$ is a scalar response variable, $y_2$ is the endogenous explanatory variable (also a scalar for simplicity), and $z$ is the $1 \times L$ vector of exogenous variables, which I assume contains unity to allow for a nonzero intercept. The "structural" equation in the population is

(1) $y_1 = z_1\delta_1 + \gamma_1 y_2 + u_1,$

where $z_1$ includes unity and is a $1 \times L_1$ subvector of $z = (z_1, z_2)$. The sense in which $z$ is exogenous is given by the $L$ orthogonality (zero covariance) conditions

(2) $E(z_j u_1) = 0, \quad j = 1, 2, \ldots, L.$

The assumptions in Equation 2 hold if I make the stronger assumption $E(u_1 \mid z) = 0$, which is sometimes preferred if Equation 1 is supposed to be a structural equation, but I first derive the CF approach under the same assumption employed by 2SLS, which is Equation 2.
I make the standard assumption that the elements of $z$ are not perfectly collinear. In addition, I assume the rank condition for identification holds. In the context of Model 1, the rank condition is most easily stated in terms of the linear reduced form for $y_2$. If I write

(3) $y_2 = z\pi_2 + v_2 = z_1\pi_{21} + z_2\pi_{22} + v_2$

(4) $E(z'v_2) = 0,$

then the rank condition holds if and only if $\pi_{22} \neq 0$. This is just the usual requirement that there be at least one exogenous variable that is omitted from Equation 1 that is partially correlated with $y_2$. As is now widely appreciated, given a random sample, one should estimate the reduced form in Equation 3 and be able to reject the null $H_0: \pi_{22} = 0$ at a suitably small significance level.
A leading example of the above setup is when $y_1$ is the logarithm of hourly earnings, $y_2$ is a measure of schooling, and $z_1$ includes other determinants of wages that are assumed to be exogenous (such as workforce experience). Many instrumental variables have been proposed for schooling in the literature, ranging from parents' education to quarter of birth; Card (2001) includes a survey of some of the more convincing efforts.
As a second example, suppose $y_1$ is performance on a standardized test and $y_2$ is a binary indicator of attending a Catholic high school, a problem studied by Altonji, Elder, and Taber (2005), among others. In thinking about the scope of the model in Equation 1, it is important to understand that it allows for $y_2$ to be continuous or discrete (or some mixture). The linear reduced form in Equation 3 under Condition 4 can always be specified regardless of the nature of $y_2$. I need provide no structural interpretation of this equation. I simply need $z_2$ to be correlated with $y_2$ after partialling out $z_1$.
The CF approach based on Equations 1 through 4 proceeds by noting that correlation between the structural error, $u_1$, and the reduced form error, $v_2$, can be captured using a linear relationship:

(5) $u_1 = \rho_1 v_2 + e_1$

(6) $E(v_2 e_1) = 0,$

where $\rho_1 = E(v_2 u_1)/E(v_2^2)$ is the population regression coefficient. Because $u_1$ and $v_2$ are uncorrelated with $z$, it follows that $e_1$ is also uncorrelated with $z$, and then $e_1$ must also be uncorrelated with $y_2$. Therefore, I obtain a valid estimating equation by plugging Equation 5 into the structural equation to get

(7) $y_1 = z_1\delta_1 + \gamma_1 y_2 + \rho_1 v_2 + e_1.$

In the CF approach, one views $v_2$ as an explanatory variable in Equation 7. By including it, one obtains a new error term, $e_1$, that is uncorrelated with all other right-hand-side variables, including $y_2$. In effect, including $v_2$ in the equation "controls for" the endogeneity of $y_2$. One can think of $v_2$ as proxying for the factors in $u_1$ that are correlated with $y_2$.
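
The step from Equations 5 and 6 to the claim that $e_1$ is uncorrelated with $y_2$ can be spelled out in two lines. The sketch below uses only the symbols already defined in Equations 2 through 6 and adds no new assumptions.

```latex
\begin{align*}
E(z'e_1) &= E(z'u_1) - \rho_1 E(z'v_2) = 0 - \rho_1 \cdot 0 = 0, \\
E(y_2 e_1) &= E[(z\pi_2 + v_2)e_1] = \pi_2' E(z'e_1) + E(v_2 e_1) = 0 + 0 = 0.
\end{align*}
```

The first line uses Equations 2 and 4; the second substitutes the reduced form in Equation 3 and then applies the first line together with Equation 6.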

A. Control Function Procedure: Linear Reduced Form

If one could observe $v_2$ along with the other variables, Equation 7 immediately suggests a way to estimate $\delta_1$, $\gamma_1$, and $\rho_1$: Run the OLS regression of $y_{i1}$ on $z_{i1}$, $y_{i2}$, and $v_{i2}$ using a random sample of size $N$. The only problem is that one does not observe $v_2$. Nevertheless, from Equation 3, one can write $v_2 = y_2 - z\pi_2$ and, because data is collected on $y_2$ and $z$, one can consistently estimate $\pi_2$ by OLS. This leads to the following two-step control function procedure:
1. Run the OLS regression of the EEV, $y_{i2}$, on all exogenous variables, $z_i$,

(8) $y_{i2}$ on $z_i$, $i = 1, \ldots, N$

and obtain the OLS residuals, $\hat{v}_{i2}$.
2. Run the OLS regression

(9) $y_{i1}$ on $z_{i1}$, $y_{i2}$, $\hat{v}_{i2}$, $i = 1, \ldots, N$

to obtain $\hat{\delta}_1$, $\hat{\gamma}_1$, and $\hat{\rho}_1$.
It has been known since at least Hausman (1978) that this CF method produces coefficients on $z_{i1}$ and $y_{i2}$ that are numerically identical to the 2SLS estimates. Therefore, one might wonder what all the fuss is about. It is true that, in this particular setting, the CF approach does not lead to a new estimator. In fact, obtaining proper standard errors from the regression in Equation 9 is made difficult by the first-stage estimation of $\pi_2$. Nevertheless, compared with the 2SLS approach, the inclusion of $\hat{v}_{i2}$ serves a valuable purpose: It produces a heteroskedasticity-robust Hausman test of the null hypothesis $H_0: \rho_1 = 0$, which means $y_2$ is actually exogenous. The traditional form of the Hausman test that directly compares OLS and 2SLS is substantially harder to make robust to heteroskedasticity.
The importance of the identification requirement that $\pi_{22} \neq 0$ can be seen by studying Equations 3 and 7. If $\pi_{22} = 0$, then $v_2$ is a linear function of $y_2$ and $z_1$, which means $v_2$ is collinear in Equation 7. The presence of $z_2$ that is partially correlated with $y_2$ ensures $v_2$ has variation separate from $(z_1, y_2)$. If there are no variables $z_2$ then the CF regression in Equation 9 suffers from perfect multicollinearity in the sample, and estimates of all parameters cannot be produced. These same observations apply to more complicated CF procedures covered later.
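
The two-step procedure in Equations 8 and 9 can be sketched on simulated data. The following is a minimal illustration, not the author's code: the data-generating values, the seed, and the helper names `ols` and `robust_se` are all assumptions made for the example. It checks numerically that the CF coefficients on $z_{i1}$ and $y_{i2}$ match 2SLS and computes a heteroskedasticity-robust (HC0) t statistic on the first-stage residual.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and residuals from regressing y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def robust_se(X, resid):
    """Heteroskedasticity-robust (HC0) standard errors for OLS."""
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (resid ** 2)[:, None])
    return np.sqrt(np.diag(bread @ meat @ bread))

# Simulated data; every numeric value here is an illustrative assumption.
rng = np.random.default_rng(0)
N = 5000
z1 = np.column_stack([np.ones(N), rng.normal(size=N)])  # included exogenous, with intercept
z2 = rng.normal(size=N)                                  # excluded instrument
z = np.column_stack([z1, z2])
v2 = rng.normal(size=N)
u1 = 0.5 * v2 + rng.normal(size=N)               # rho_1 = 0.5, so y2 is endogenous
y2 = z @ np.array([1.0, 0.5, 1.0]) + v2          # reduced form with pi_22 = 1
y1 = z1 @ np.array([1.0, -1.0]) + 2.0 * y2 + u1  # structural equation with gamma_1 = 2

# Step 1 (Equation 8): regress y2 on all exogenous variables, keep residuals.
_, v2_hat = ols(z, y2)

# Step 2 (Equation 9): regress y1 on z1, y2, and the first-stage residuals.
X_cf = np.column_stack([z1, y2, v2_hat])
b_cf, e_cf = ols(X_cf, y1)
t_rho = b_cf[-1] / robust_se(X_cf, e_cf)[-1]  # robust test of H0: rho_1 = 0

# 2SLS for comparison: replace y2 with its first-stage fitted values.
b_2sls, _ = ols(np.column_stack([z1, y2 - v2_hat]), y1)

print("CF coefficients on (z1, y2):", b_cf[:3])
print("2SLS coefficients          :", b_2sls)
print("robust t statistic on v2_hat:", t_rho)
```

The HC0 standard errors in this sketch ignore the first-stage estimation of $\pi_2$, which is appropriate for the test of $\rho_1 = 0$ but not for inference on the other coefficients when $\rho_1 \neq 0$.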
I illustrate the CF approach using two data sets, one a wage data set used to estimate the return to schooling in Card (1995) and the other a subset of the data on student performance and Catholic school attendance from Altonji, Elder, and Taber (2005) (hereafter, AET). In both cases, the authors provide detailed discussions about the exogeneity of the instruments, and AET casts doubt on the exogeneity of a commonly used distance instrument. Nevertheless, I proceed as if the instruments are exogenous.

Table 1
Estimates of the log(wage) Equation

                                      1         2         3         4         5
Explanatory Variable                  OLS       2SLS      2SLS      CF        CF

educ                                  0.0747    0.157     0.161     0.153     0.151
                                      (0.0036)  (0.052)   (0.054)   (0.048)   (0.048)
exper                                 0.0848    0.119     0.120     0.116     0.115
                                      (0.0068)  (0.023)   (0.024)   (0.021)   (0.021)
exper^2                               -0.0023   -0.0024   -0.0024   -0.0022   -0.0022
                                      (0.0003)  (0.0004)  (0.0005)  (0.0003)  (0.0003)
black                                 -0.119    -0.123    -0.121    -0.107    -0.105
                                      (0.018)   (0.051)   (0.062)   (0.048)   (0.048)
black · (educ - $\overline{educ}$)                        -0.0008   0.018     0.019
                                                          (0.0408)  (0.006)   (0.006)
$\hat{v}_2$                                     -0.082              -0.082    -0.106
                                                (0.048)             (0.048)   (0.050)
$\hat{v}_2$ · educ                                                            0.0019
                                                                              (0.0010)
intercept                             4.62      3.24      3.17      3.31      3.33
                                      (0.07)    (0.88)    (0.91)    (0.81)    (0.81)
Observations                          3,010     3,010     3,010     3,010     3,010

Notes: (i) Each equation contains dummy variables for living in an SMSA and living in the South. In addition, they include regional dummies for where the man was living in 1966 and an indicator of whether the man lived in an SMSA in 1966.
(ii) Standard errors for OLS and 2SLS are robust to heteroskedasticity.
(iii) In Column 2, the 2SLS estimates are equivalent to the CF estimates.
(iv) The standard errors for the CF estimates in Columns 4 and 5 are based on 1,000 bootstrap replications.

I begin with a standard wage equation

$lwage = z_1\delta_1 + \gamma_1 educ + u_1,$

where $lwage$ is the log of wage and $z_1$ contains exogenous variables and a constant. Years of schooling, educ, can be correlated with $u_1$ for many reasons, such as omitted ability and measurement error. Rather than estimate $\gamma_1$ by OLS, I can try to find one or more instrumental variables for educ. Card (1995) uses two dummy variables indicating whether there is a two-year college (nearc2) or four-year college (nearc4) in the local labor market at age 16. Details of the data are described in Card (1995).
Table 1 reports several estimates. The first column contains the OLS estimates with the controls used in Card (1995). The return to a year of schooling is estimated to be 0.075 (t = 20.48). Column 2 contains the 2SLS estimates reported in control function form with the reduced form residual included. The heteroskedasticity-robust t statistic on $\hat{v}_2$ is -1.72, which is a marginal rejection of the null that education is exogenous, even though the 2SLS point estimate for the return to education (0.157, t = 3.00) is much higher than the OLS estimate.

Table 2
Estimates of the math12 Equation

                                              1         2         3         4         5         6
Explanatory Variable                          OLS       2SLS      CF        2SLS      CF        CF

cathhs                                        1.49      2.36      1.59      2.06      2.30      -0.95
                                              (0.39)    (1.25)    (1.07)    (1.63)    (1.19)    (1.75)
motheduc                                      0.714     0.713     0.714     0.620     0.714     0.709
                                              (0.062)   (0.062)   (0.062)   (0.077)   (0.064)   (0.062)
fatheduc                                      0.893     0.887     0.893     0.908     0.886     0.876
                                              (0.056)   (0.057)   (0.057)   (0.071)   (0.058)   (0.058)
lfaminc                                       1.84      1.82      1.84      1.87      1.90      1.86
                                              (0.14)    (0.14)    (0.14)    (0.18)    (0.15)    (0.15)
cathhs · (motheduc - $\overline{motheduc}$)                                 1.61      -0.077    -0.085
                                                                            (0.73)    (0.262)   (0.262)
cathhs · (fatheduc - $\overline{fatheduc}$)                                 -0.198    0.089     0.184
                                                                            (0.684)   (0.235)   (0.238)
cathhs · (lfaminc - $\overline{lfaminc}$)                                   -0.688    -1.10     -0.691
                                                                            (2.082)   (0.61)    (0.634)
$\hat{r}_2$                                                       -0.061              -0.290    -1.52
                                                                  (0.594)             (0.632)   (0.80)
$\hat{r}_2$ · cathhs                                                                            3.31
                                                                                                (1.31)
intercept                                     11.20     11.45     11.23     11.87     10.72     11.18
                                              (1.25)    (1.29)    (1.28)    (1.62)    (1.37)    (1.38)
Observations                                  7,444     7,444     7,444     7,444     7,444     7,444

Notes: (i) Standard errors for OLS and 2SLS are robust to heteroskedasticity.
(ii) The standard errors for the CF estimates are based on 1,000 bootstrap replications.
(iii) Using the estimates from Column 6, the average treatment effect on the treated is 3.99 (t = 2.96) and the average treatment effect on the untreated is -1.27 (t = -0.73).

For the AET application, I estimate the model

$math12 = z_1\delta_1 + \gamma_1 cathhs + u_1,$

where $z_1$ includes an intercept, mother's education, father's education, and the log of family income. The instrument for cathhs, which is a binary indicator for attending a Catholic high school, is distance from the nearest Catholic high school divided into five bins. Thus, four distance dummies are used as IVs for cathhs. The OLS and 2SLS estimates are given in Columns 1 and 2 of Table 2. The OLS estimate on cathhs is about 1.49 (t = 3.84), or about 0.16 standard deviations in the test score. The 2SLS estimate is 2.36 (t = 1.90). However, the heteroskedasticity-robust test statistic on the control function (not reported in the table) is only t = -0.75, so the OLS and 2SLS estimates are not statistically different.

B. Exploiting a Binary EEV

The test score example raises an interesting question because the EEV, cathhs, is binary. The standard IV approach treats all EEVs the same: The structural equation is supplemented with the linear reduced form given in Equations 3 and 4. An alternative is to recognize the binary nature of $y_2$ and replace its linear reduced form with a binary response model. The two equations are then

(10) $y_1 = z_1\delta_1 + \gamma_1 y_2 + u_1$

(11) $y_2 = 1[z\delta_2 + e_2 > 0],$

where $1[\cdot]$ is the binary indicator function. With these equations, one would assume that $(u_1, e_2)$ is independent of $z$, which is already much stronger than the zero correlation assumptions used by the previous CF (2SLS) estimator. If it is assumed that $u_1$ is linearly related to $e_2$ and that

(12) $e_2 \sim \mathrm{Normal}(0, 1),$

then one can derive an alternative control function method. An implication of Equations 11 and 12 is that $y_2$ follows a probit model:

(13) $P(y_2 = 1 \mid z) = \Phi(z\delta_2),$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function. No assumptions of this sort were needed to apply 2SLS to Equation 1.
A thorough understanding of the pros and cons of different CF approaches requires
one to understand that the model for y2 in Equations 11 and 12 is entirely compatible
with the linear reduced form defined by Equations 3 and 4. The usual 2SLS approach
assumes nothing about the distribution of y2 given z. By contrast, Equations 11 and 12
completely characterize the distribution of y2 given z.

C. Control Function Procedure: Probit Reduced Form

When one specifies a full distribution for $y_2$, the CF approach is based on the conditional expectation $E(y_1 \mid z, y_2)$. This is a subtle difference from the 2SLS approach, which is based on zero correlation assumptions only. It is well known (see, for example, Wooldridge 2010, Section 21.4.2) that under the previous assumptions,

(14) $E(y_1 \mid z, y_2) = z_1\delta_1 + \gamma_1 y_2 + \rho_1[y_2\lambda(z\delta_2) - (1 - y_2)\lambda(-z\delta_2)],$

where $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$ is the well-known inverse Mills ratio. The function

(15) $r(y_2, z\delta_2) = y_2\lambda(z\delta_2) - (1 - y_2)\lambda(-z\delta_2)$

is sometimes called a "generalized error" because it has a mean of zero conditional on $z$.
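
The zero conditional mean of the generalized error in Equation 15 can be verified directly from the probit model in Equation 13. The short derivation below is a sketch that writes $a = z\delta_2$ for brevity (a shorthand introduced only for this display) and uses the symmetry of the standard normal distribution, $\phi(-a) = \phi(a)$ and $\Phi(-a) = 1 - \Phi(a)$.

```latex
\begin{align*}
E[r(y_2, z\delta_2) \mid z]
  &= P(y_2 = 1 \mid z)\,\lambda(a) - P(y_2 = 0 \mid z)\,\lambda(-a) \\
  &= \Phi(a)\,\frac{\phi(a)}{\Phi(a)} - [1 - \Phi(a)]\,\frac{\phi(-a)}{\Phi(-a)} \\
  &= \phi(a) - \phi(-a) = 0.
\end{align*}
```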

1. Estimate the probit model in Equation 13. Obtain the "generalized residuals"

(16) $\hat{r}_{i2} = y_{i2}\lambda(z_i\hat{\delta}_2) - (1 - y_{i2})\lambda(-z_i\hat{\delta}_2), \quad i = 1, \ldots, N.$

2. Run the OLS regression

(17) $y_{i1}$ on $z_{i1}$, $y_{i2}$, $\hat{r}_{i2}$, $i = 1, \ldots, N$

to consistently estimate $\delta_1$, $\gamma_1$, and $\rho_1$.
As with the first CF approach, the one that produces 2SLS, a simple test of the null hypothesis that $y_2$ is exogenous is obtained as the (heteroskedasticity-robust) t statistic on $\hat{r}_{i2}$.
The CF approach from Regression 17 is the same one computed by the "treatreg" command in Stata® using its two-step option. It exploits the binary nature of $y_2$ but not without cost. For one, it is generally inconsistent if the probit model for $y_2$ is misspecified. This is in contrast to the usual 2SLS estimator (equivalently, the CF estimator from Equation 9). The robustness of the 2SLS estimator compared with the estimator from Equation 17 is perhaps counterintuitive and has generated some confusion among empirical researchers. The key is that 2SLS does not use any distributional assumptions in the reduced form whereas the expression in Equation 14 does. If the probit model for $y_2$ is correctly specified, then the CF procedure in Equation 17 and 2SLS should give estimates that differ only due to sampling error.
Column 3 in Table 2 contains the CF estimates obtained from Equation 17 for the math12 equation. The cathhs coefficient is 1.59, which is close to the OLS estimate. This is expected because the t statistic on $\hat{r}_{i2}$ is only -0.10. If the coefficient on the generalized residual were statistically significant, one should adjust the standard errors for the two-step estimation. The bootstrap can be used if analytical methods are not readily available. Given the three sets of estimates so far, namely OLS, 2SLS (the CF estimates from Equation 9), and the CF estimates from Equation 17, there is no reason to reject the OLS estimate. This is not the case when one turns to a richer set of models.
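
The probit-based procedure in Equations 16 and 17 can be sketched as follows. This is an illustrative simulation under the assumptions of Equations 10 through 12, not the paper's Stata code: the parameter values, the seed, and the helper names (`norm_cdf`, `inv_mills`, `gen_resid`, `probit_fit`) are assumptions of the example, and the probit is fit by a hand-written Fisher scoring loop rather than a library routine.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(a):
    """Standard normal CDF, Phi(a)."""
    return 0.5 * (1.0 + _erf(a / math.sqrt(2.0)))

def inv_mills(a):
    """Inverse Mills ratio, lambda(a) = phi(a) / Phi(a)."""
    return np.exp(-0.5 * a ** 2) / math.sqrt(2.0 * math.pi) / norm_cdf(a)

def gen_resid(y2, index):
    """Generalized residual of Equation 16 evaluated at a given index z*delta_2."""
    return y2 * inv_mills(index) - (1.0 - y2) * inv_mills(-index)

def probit_fit(Z, y2, max_iter=50, tol=1e-10):
    """Probit MLE by Fisher scoring; the score equals Z' r with r the generalized residual."""
    d = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        index = Z @ d
        score = Z.T @ gen_resid(y2, index)
        weights = inv_mills(index) * inv_mills(-index)  # phi^2 / (Phi * (1 - Phi))
        step = np.linalg.solve(Z.T @ (Z * weights[:, None]), score)
        d = d + step
        if np.max(np.abs(step)) < tol:
            break
    return d

# Simulated data; every numeric value here is an illustrative assumption.
rng = np.random.default_rng(1)
N = 20000
x = rng.normal(size=N)
Z = np.column_stack([np.ones(N), x, rng.normal(size=N)])  # last column is the excluded instrument
e2 = rng.normal(size=N)
y2 = (Z @ np.array([0.0, 0.5, 1.0]) + e2 > 0).astype(float)  # Equation 11
u1 = 0.5 * e2 + rng.normal(size=N)                            # u1 linear in e2, rho_1 = 0.5
y1 = 1.0 - 1.0 * x + 2.0 * y2 + u1                            # Equation 10, gamma_1 = 2

# Step 1: probit of y2 on z, then generalized residuals (Equation 16).
d2_hat = probit_fit(Z, y2)
r_hat = gen_resid(y2, Z @ d2_hat)

# Step 2: OLS of y1 on z1, y2, and the generalized residual (Equation 17).
X = np.column_stack([np.ones(N), x, y2, r_hat])
b, *_ = np.linalg.lstsq(X, y1, rcond=None)

print("probit coefficients:", d2_hat)
print("CF estimates (intercept, x, y2, r_hat):", b)
```

Because the probit first-order conditions are exactly $\sum_i z_i'\hat{r}_{i2} = 0$, the generalized residuals are orthogonal to the exogenous variables in the sample, which mirrors the zero conditional mean property of the generalized error.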

D. Models Nonlinear in the EEV

One benefit of the CF approaches in Equations 9 and 17 is that they are easily adapted to handle more complicated models. As one important example, consider a model where $y_2$ interacts with the exogenous variables (and appears on its own because $z_1$ includes an intercept):

(18) $y_1 = z_1\delta_1 + y_2 z_1\gamma_1 + u_1.$

If $y_2$ is continuous, then one can use the regression in Equation 9 where $y_{i2}z_{i1}$ replaces $y_{i2}$, which means $y_{i2}$ appears on its own and interacted with exogenous variables. If $y_2$ is binary, then one uses Regression 17, where $y_{i2}$ appears by itself and interacted with exogenous variables. The t statistic on either $\hat{v}_{i2}$ or $\hat{r}_{i2}$, perhaps made robust to heteroskedasticity, is still a valid test of the null that $y_2$ is exogenous. Because Equation 18 contains only a single EEV, a one degree-of-freedom test is appealing.
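
For a continuous $y_2$, the adaptation of Equation 9 to the interaction model in Equation 18 can be sketched as below. The simulation is an assumption-laden illustration: the parameter values and seed are made up, and the errors are drawn independently of $z$, which is stronger than the zero correlation conditions in Equations 2 and 4. A single first-stage residual is added even though $y_2$ enters through two terms.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients from regressing y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated data; every numeric value here is an illustrative assumption.
rng = np.random.default_rng(2)
N = 5000
x = rng.normal(size=N)
z1 = np.column_stack([np.ones(N), x])   # included exogenous variables, with intercept
z2 = rng.normal(size=N)                 # excluded instrument
z = np.column_stack([z1, z2])
v2 = rng.normal(size=N)
u1 = 0.5 * v2 + rng.normal(size=N)      # errors independent of z (stronger than Equation 2)
y2 = z @ np.array([1.0, 0.5, 1.0]) + v2
gamma1 = np.array([1.0, 0.5])           # effect of y2 is 1.0 + 0.5 * x
y1 = z1 @ np.array([1.0, -1.0]) + y2 * (z1 @ gamma1) + u1  # Equation 18

# First stage: one reduced form for the single EEV.
v2_hat = y2 - z @ ols(z, y2)

# Second stage: y2 enters on its own and interacted with x; one control function is added.
X_ols = np.column_stack([z1, y2[:, None] * z1])
X_cf = np.column_stack([X_ols, v2_hat])
b_ols = ols(X_ols, y1)
b_cf = ols(X_cf, y1)

print("OLS estimates of gamma_1:", b_ols[2:4])
print("CF estimates of gamma_1 :", b_cf[2:4])
print("coefficient on v2_hat   :", b_cf[4])
```

A standard IV treatment of the same model would need at least two instruments, one for each term involving $y_2$, whereas the CF regression adds only the one residual.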
A standard IV approach to estimating Equation 18 would require choosing IVs for the $L_1$ terms in $y_2 z_1$. For example, I can add interactions of elements in the excluded exogenous variables, $z_2$, with $z_1$. When $y_2$ is binary, there are other natural choices for IVs, as discussed in Wooldridge (2010, Section 21.3): Use the interactions $\Phi(z_i\hat{\delta}_2)z_{i1}$, where $\Phi(z_i\hat{\delta}_2)$ are the probit fitted values. This IV estimator has a theoretical advantage over the CF estimator, at least if one assumes the linear model with constant coefficients is the correct specification: The IV estimator is generally consistent even if the probit model is misspecified. Thus, one can exploit the binary nature of $y_2$ but still obtain an estimator that does not require a correctly specified model for $D(y_2 \mid z)$, the distribution of $y_2$ given $z$. However, the CF approach offers a parsimonious way to account for endogeneity of $y_2$ even if it interacts with many exogenous variables. It seems likely that it is more efficient quite generally, but this possibility seems not to have been systematically investigated.
In Columns 3 and 4 of Table 1, I include an interaction between the race indicator, black, and educ, where I first center educ about its mean (roughly 13.3) before creating the interaction. Column 3 contains the 2SLS estimates where nearc2, nearc4, black · nearc2, and black · nearc4 are used as instruments for educ and black · (educ - $\overline{educ}$). The coefficient on the latter is -0.0008, which is small and has a very wide 95 percent confidence interval. Column 4 contains the CF approach, and now the coefficient on the interaction term is positive and practically large, 0.018, and statistically significant with t = 2.84. The return to education is estimated to be about 1.8 percentage points higher for black men. Further, the earnings gap between black and nonblack men shrinks at high levels of education. The picture given by the CF estimates is different from the much less precise 2SLS estimates.
In the test score equation, I interact cathhs with motheduc, fatheduc, and lincome.
Column 4 in Table 2 contains the estimates where the fitted probit probabilities and
interactions are used as IVs, while Column 5 contains the CF estimates from adding
the generalized residual. The estimates are notably different. The CF estimate of
the average effect of cathhs is 2.30 (t = 1.94) and the interaction terms are all small
and insignificant (although the interaction with lincome has t = -1.79). By contrast,
the average effect estimated by 2SLS is insignificant but there appears to be a large,
statistically significant interaction with mother's education. I cannot reconcile the
difference in these estimates without allowing the treatment effect of cathhs to depend
on unobservables, which I do in the next section.
Before ending this section, it is useful to summarize the key points of how the CF
approach compares with other common approaches.
1. In the basic linear model with constant coefficients, where the EEV appears linearly,
and where I use linear reduced forms, the CF approach is the same as 2SLS. The
CF approach provides a simple, robust test of the null hypothesis that $y_2$ is exogenous.
2. When I exploit special features of the EEV $y_2$ (for example, recognize that it is
a binary variable), the CF approach uses generalized residuals. The CF approach is
likely more efficient than 2SLS because it exploits the binary nature of $y_2$ but, in terms
of consistency, the CF approach is usually less robust than IV approaches.
3. In models with multiple, nonlinear functions of EEVs, the CF approach parsimoniously
handles endogeneity and provides simple exogeneity tests. For general
nonlinearities, inserting fitted values for EEVs is generally inconsistent, even under
strong assumptions. The IV approach, where nonlinear functions of exogenous variables
are specified as instruments, is the most robust in terms of consistency, but in a
model such as Equation 18 it treats any function of the EEVs as a separate endogenous
variable; therefore, it can be quite inefficient relative to the more parsimonious CF
approach.
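
Point 1 can be checked numerically. The following is a minimal sketch on simulated data, not the paper's application: the variable names, sample size, and parameter values are all assumptions made for illustration. It compares 2SLS (inserting first-stage fitted values) with the CF regression (keeping $y_2$ and adding the first-stage residual).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data; every name and number below is illustrative only.
z1 = rng.normal(size=n)                 # included exogenous variable
z2 = rng.normal(size=n)                 # excluded instrument
v2 = rng.normal(size=n)                 # reduced-form error
u1 = 0.8 * v2 + rng.normal(size=n)      # structural error, correlated with v2
y2 = 1.0 + 0.5 * z1 + 1.0 * z2 + v2     # endogenous explanatory variable
y1 = 2.0 + 1.0 * z1 + 0.5 * y2 + u1     # structural equation


def ols(X, y):
    """Return the OLS coefficients from regressing y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)
Z = np.column_stack([const, z1, z2])    # all exogenous variables
X = np.column_stack([const, z1, y2])    # structural regressors

# First stage: regress y2 on z; keep fitted values and residuals.
y2_fit = Z @ ols(Z, y2)
v2_hat = y2 - y2_fit

# 2SLS: replace y2 with its first-stage fitted values.
b_2sls = ols(np.column_stack([const, z1, y2_fit]), y1)

# CF: keep y2 and add the first-stage residual as an extra regressor.
b_cf = ols(np.column_stack([X, v2_hat]), y1)

print("2SLS (const, z1, y2):", b_2sls)
print("CF   (const, z1, y2):", b_cf[:3])
print("CF coefficient on v2_hat:", b_cf[3])
```

The two sets of coefficients on the structural regressors agree up to floating-point error. The coefficient on `v2_hat` is the quantity whose t statistic gives the exogeneity test; valid standard errors are not computed in this sketch.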


III. Correlated Random Coefficient Models

The setup of the previous section allows the endogenous explanatory
variable or variables to appear linearly or nonlinearly and to interact with observed
covariates. This may be sufficient for some applications, but one may also want to
allow the effect of $y_2$ to depend on unobservables. One might think, for example, that
the return to schooling or the causal effect of attending a Catholic high school varies
across individuals in ways that cannot be observed fully. When one allows random
coefficients to be correlated with some explanatory variables, such as amount of
school or choice of school, one obtains a "correlated random coefficient" (CRC)
model, a label adopted by Heckman and Vytlacil (1998) and discussed in the context
of the return to schooling by Card (2001). In the treatment effects literature, CRC
models allow for heterogeneous treatment effects combined with self-selection into
treatment, provided that there are suitable instrumental variables for treatment
assignment.
Consider the problem of estimating a wage equation with an individual-specific
return to schooling. For a random draw i,

(19) $lwage_i = z_{i1}\delta_1 + g_{i1}\,educ_i + u_{i1}$,

where $g_{i1}$ is the individual-specific return to schooling. Now there are two sources of
unobserved heterogeneity and both $g_{i1}$ and $u_{i1}$ might be correlated with $educ_i$. In fact,
due to self-selection, one might expect the amount of education, $educ_i$, to be positively
correlated with $g_{i1}$: people for whom the return to schooling is higher will choose, on
average, to obtain more education.
Certainly, one cannot expect to estimate $g_{i1}$ for each $i$. Instead, I focus on the average
return to schooling in the population, $\gamma_1 = E(g_{i1})$. Then I can write $g_{i1} = \gamma_1 + v_{i1}$
where $E(v_{i1}) = 0$. Plugging into Equation 19 gives

(20) $lwage_i = z_{i1}\delta_1 + \gamma_1\,educ_i + v_{i1}\,educ_i + u_{i1}$.

If I apply the usual 2SLS estimator to Equation 20, then the error term is implicitly
$e_{i1} = v_{i1}\,educ_i + u_{i1}$. As discussed in Wooldridge (2003), the 2SLS estimator is generally
inconsistent for $\gamma_1$, although it is consistent if one assumes, in addition to the standard
exogeneity requirements

(21) $E(u_{i1}|z_i) = 0$, $E(v_{i1}|z_i) = 0$,

a constant conditional covariance assumption:

(22) $Cov(educ_i, v_{i1}|z_i) = Cov(educ_i, v_{i1})$.

Notice that Condition 22 allows arbitrary correlation between $educ_i$ and the random
return to education, $g_{i1}$. But the conditional covariance cannot depend on the exogenous
variables. Card (2001) discusses situations where this assumption is likely to fail in
simple models of schooling decisions.
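
To see why Conditions 21 and 22 suffice for 2SLS, it may help to write out the conditional mean of the composite error in Equation 20. The short derivation below is a sketch; the symbol $\kappa_1$ is introduced here only as shorthand and does not appear in the original text, and the argument assumes $z_{i1}$ contains a constant.

```latex
\begin{align*}
E(e_{i1} \mid z_i)
  &= E(v_{i1}\, educ_i \mid z_i) + E(u_{i1} \mid z_i) \\
  &= \mathrm{Cov}(educ_i, v_{i1} \mid z_i)
     + E(v_{i1} \mid z_i)\, E(educ_i \mid z_i) + 0 \\
  &= \mathrm{Cov}(educ_i, v_{i1}) \equiv \kappa_1 .
\end{align*}
```

Because $\kappa_1$ does not depend on $z_i$, the recentered error $e_{i1} - \kappa_1$ has zero mean conditional on $z_i$. The constant $\kappa_1$ is absorbed into the intercept, so the instruments remain valid for the slope coefficients, including $\gamma_1$.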
A control function approach is based on similar assumptions but has the added
advantage of allowing estimation of a relationship between the level of education and
the return to education. The method is due to Garen (1984), although one can relax the
normality assumptions it uses. To describe the approach generally, let $y_{i1}$ and $y_{i2}$ be the
endogenous variables, as before, and assume that


(23) $y_{i2} = z_i\pi_2 + v_{i2}$, $E(v_{i2}|z_i) = 0$.

Assume also that both sources of unobservables, $u_{i1}$ and $v_{i1}$, are linearly related to $v_{i2}$:

(24) $E(u_{i1}|v_{i2}) = \eta_1 v_{i2}$, $E(v_{i1}|v_{i2}) = \psi_1 v_{i2}$,

and that all unobservables are independent of $z_i$. The estimating equation is

(25) $E(y_{i1}|z_i, y_{i2}) = E(y_{i1}|z_i, y_{i2}, v_{i2}) = z_{i1}\delta_1 + \gamma_1 y_{i2} + \eta_1 v_{i2} + \psi_1 v_{i2} y_{i2}$.

Equation 25 leads to the following simple CF approach. As before, estimate Equation
23 by OLS and obtain the residuals, $\hat v_{i2}$. Second, run the OLS regression

(26) $y_{i1}$ on $z_{i1}, y_{i2}, \hat v_{i2}, \hat v_{i2} y_{i2}$, $i = 1, \ldots, N$.

Compared with the constant-coefficient case, I have added the interaction term
$\hat v_{i2} y_{i2}$. Without the interaction, I know that Regression 26 produces the 2SLS estimates
of $\delta_1$ and $\gamma_1$. The interaction term accounts for the random coefficient on $y_{i2}$. It is of
interest to test for statistical significance of the interaction term, but one must be careful:
If the coefficient on $\hat v_{i2}$ is different from zero, the usual t statistic on $\hat v_{i2} y_{i2}$ is not
valid because of the first-stage estimation. It is simple to bootstrap the two-step procedure
to obtain valid standard errors for all of the coefficients. Conveniently, a test of
joint significance of ($\hat v_{i2}$, $\hat v_{i2} y_{i2}$) is valid without adjusting the standard errors. The joint
test is a test of the null hypothesis that $y_2$ is exogenous.
Given the results on 2SLS by Wooldridge (2003) described earlier, it is possible that
the coefficient on $\hat v_{i2} y_{i2}$, $\hat\psi_1$, is large and statistically significant but the estimate of $\gamma_1$ is
similar to the 2SLS estimate. Even if the two procedures give similar estimates of the
average effect, $\psi_1$ is of some interest because one can write

(27) $E(g_{i1}|v_{i2}) = \gamma_1 + \psi_1 v_{i2}$.

Even though I cannot estimate $g_{i1}$, I can estimate its expected value given the reduced
form error, $v_{i2}$, which necessarily has a zero mean. In the return-to-schooling example,
I might expect $\psi_1 > 0$ because, as $v_{i2}$ increases, the person has more education than is
predicted by the exogenous variables, $z_i$. A positive $\psi_1$ is consistent with a selection
story: conditional on $z_i$, people obtain more education if their return to schooling is
higher. One can estimate the righthand side of Equation 27 as $\hat\gamma_1 + \hat\psi_1 \hat v_{i2}$ for each $i$ and,
if desired, study how these estimates vary across $i$. The average of the individual partial
effects in the sample is, mechanically, $\hat\gamma_1$.
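
The two-step procedure in Regression 26 and the estimate of Equation 27 can be sketched as follows. This is a simulation under assumed parameter values, not a replication of the Card (1995) results: the data-generating process, the names, and the "true" values of $\gamma_1$, $\eta_1$, and $\psi_1$ are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
gamma1, psi1, eta1 = 0.5, 0.3, 0.4          # illustrative "true" values

# Simulated data; every name and number here is hypothetical.
z1 = rng.normal(size=n)                      # included exogenous variable
z2 = rng.normal(size=n)                      # excluded instrument
v2 = rng.normal(size=n)                      # reduced-form error
y2 = 1.0 + 0.5 * z1 + 1.0 * z2 + v2          # endogenous variable (think: schooling)
v1 = psi1 * v2 + 0.5 * rng.normal(size=n)    # random part of the slope on y2
u1 = eta1 * v2 + rng.normal(size=n)          # additive structural error
y1 = 1.0 + 1.0 * z1 + (gamma1 + v1) * y2 + u1


def ols(X, y):
    """Return the OLS coefficients from regressing y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)

# Step 1: reduced form for y2 by OLS; keep the residuals.
Z = np.column_stack([const, z1, z2])
v2_hat = y2 - Z @ ols(Z, y2)

# Step 2: OLS of y1 on (1, z1, y2, v2_hat, v2_hat * y2), as in Regression 26.
X = np.column_stack([const, z1, y2, v2_hat, v2_hat * y2])
b = ols(X, y1)
gamma1_hat, eta1_hat, psi1_hat = b[2], b[3], b[4]

# Equation 27: estimated expected slope for each observation.
g_hat = gamma1_hat + psi1_hat * v2_hat

print("gamma1_hat:", gamma1_hat, " eta1_hat:", eta1_hat, " psi1_hat:", psi1_hat)
print("mean of g_hat:", g_hat.mean())
```

Because the first-stage residuals average to zero when the reduced form includes a constant, the sample mean of `g_hat` equals `gamma1_hat` up to rounding. The sketch reports point estimates only; inference would require the bootstrap or an analytical correction for the first stage.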
As with the simpler CF method from Section II, Regression 26 easily extends to allow
any nonlinear functions of $(z_{i1}, y_{i2})$, including quadratics and interactions. I estimate the
wage equation using the Card (1995) data by including the interaction
$black_i \cdot (educ_i - \overline{educ})$ along with $\hat v_{i2}$ and $\hat v_{i2} \cdot educ_i$; the results are in Column 5
of Table 1. The estimates on $educ_i$, $black_i$, and $black_i \cdot (educ_i - \overline{educ})$ are similar to
the CF estimates without the interaction term $\hat v_{i2} \cdot educ_i$, even though the latter is
marginally significant (t = 1.84), revealing a certain robustness of the simpler CF approach.
(Jointly, $\hat v_{i2}$ and $\hat v_{i2} \cdot educ_i$ are significant with p-value = 0.042.) From Equation 27,
the positive coefficient on $\hat v_{i2} \cdot educ_i$ implies that those with higher-than-predicted
education have, on average, higher returns to schooling, thereby providing some evidence
for self-selection into schooling.
Using the CF approach, I do not have to stop at interaction terms between observed
variables or even just one random coefficient. A very general correlated random effects
analysis is obtained by choosing a $1 \times K_1$ set of regressors, $x_{i1}$, to be any function
of $(z_{i1}, y_{i2})$, say $g_1(z_{i1}, y_{i2})$. This can include, in addition to $z_{i1}$ and $y_{i2}$, terms such as
$y_{i2}^2$ and $z_{i1} y_{i2}$, or even higher order polynomials and interactions. If one separates out an
intercept and allows all $K_1$ elements of $x_{i1}$ to have random slopes $b_{i1} = \beta_1 + v_{i1}$, one can
write

(28) $y_{i1} = \alpha_1 + x_{i1}\beta_1 + u_{i1} + x_{i1} v_{i1}$,

where both $u_{i1}$ and $x_{i1} v_{i1}$ are unobserved. After obtaining the reduced form residuals $\hat v_{i2}$
from OLS of $y_{i2}$ on $z_i$, the CF regression is simply

(29) $y_{i1}$ on $1, x_{i1}, \hat v_{i2}, x_{i1}\hat v_{i2}$.

So, I add $\hat v_{i2}$ and also interact all or some elements of $x_{i1}$ with $\hat v_{i2}$. As before, it is simple
to use the nonparametric bootstrap, where both estimation steps are included, to obtain
valid inference. If $\hat\psi_1$ is the $K_1$ vector of OLS coefficients on $x_{i1}\hat v_{i2}$, one can estimate
$E(b_{i1}|v_{i2})$ as $\hat\beta_1 + \hat\psi_1 \hat v_{i2}$ and possibly provide economic interpretations for the signs and
magnitudes of the elements of $\hat\psi_1$.
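
The bootstrap mentioned above resamples observations and repeats both estimation steps in every replication. The sketch below illustrates the idea on simulated data with $x_{i1} = (z_{i1}, y_{i2})$; the data-generating process, the helper name `two_step_cf`, and the choice of 200 replications are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_boot = 2_000, 200

# Hypothetical simulated data with a random coefficient on y2.
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)                      # excluded instrument
v2 = rng.normal(size=n)
y2 = 1.0 + 0.5 * z1 + z2 + v2
v1 = 0.3 * v2 + 0.5 * rng.normal(size=n)     # random part of the slope on y2
u1 = 0.4 * v2 + rng.normal(size=n)
y1 = 1.0 + z1 + (0.5 + v1) * y2 + u1


def two_step_cf(y1, y2, z1, z2):
    """Run both CF steps and return the second-stage OLS coefficients."""
    const = np.ones(len(y1))
    Z = np.column_stack([const, z1, z2])
    v2_hat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]      # step 1
    x = np.column_stack([z1, y2])                                 # x_{i1}
    X = np.column_stack([const, x, v2_hat, x * v2_hat[:, None]])  # Regression 29
    return np.linalg.lstsq(X, y1, rcond=None)[0]                  # step 2


b_hat = two_step_cf(y1, y2, z1, z2)

# Nonparametric (pairs) bootstrap: resample rows, redo BOTH steps each time.
draws = np.empty((n_boot, b_hat.size))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    draws[b] = two_step_cf(y1[idx], y2[idx], z1[idx], z2[idx])
boot_se = draws.std(axis=0, ddof=1)

names = ["const", "z1", "y2", "v2_hat", "z1*v2_hat", "y2*v2_hat"]
for name, est, se in zip(names, b_hat, boot_se):
    print(f"{name:>10s}: {est: .3f}  (bootstrap se {se:.3f})")
```

Re-estimating the reduced form inside each replication is what lets the bootstrap standard errors reflect the sampling variation in $\hat v_{i2}$. In practice one would typically use more replications than this sketch does.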
Even more flexibility is obtained by allowing $E(v_{i1}|v_{i2})$ to be a nonlinear function in
$v_{i2}$, such as $E(v_{i1}|v_{i2}) = v_{i2}\psi_1 + (v_{i2}^2 - \tau_2^2)\omega_1$, where $\tau_2^2 = E(v_{i2}^2)$. Then the terms $\hat v_{i2}^2$ and
$x_{i1} \cdot (\hat v_{i2}^2 - \hat\tau_2^2)$, where $\hat\tau_2^2$ is the usual OLS variance estimate from the first stage, get
added to Equation 29. It is evident that these extensions of Garen's (1984) CF approach
allow significant flexibility in correlated random coefficient models.
The CF approach can also be used to estimate the random coefficient model when
y2 is binary. The typical endogenous switching model is

(30) $y_{i1} = \alpha_1 + z_{i1}\delta_1 + \gamma_1 y_{i2} + y_{i2} z_{i1}\lambda_1 + u_{i1} + y_{i2} v_{i1}$

and I combine this with the probit model for $y_{i2}$ given in Equations 11 and 12, with all
unobservables independent of $z_i$. After obtaining the generalized residuals in Equation
16, the CF regression is

(31) $y_{i1}$ on $1, z_{i1}, y_{i2}, y_{i2} \cdot (z_{i1} - \bar z_1), \hat r_{i2}, y_{i2}\hat r_{i2}$,

where, again, centering $z_{i1}$ about the sample averages ensures that the coefficient on
$y_{i2}$ is the average effect.
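
Regression 31 can be sketched on simulated data as follows. Equations 11, 12, and 16 lie outside this excerpt, so the code assumes the standard probit generalized residual, $\hat r_{i2} = y_{i2}\lambda(z_i\hat\delta_2) - (1 - y_{i2})\lambda(-z_i\hat\delta_2)$ with $\lambda = \phi/\Phi$. The bare-bones Fisher-scoring probit routine, the data-generating process, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(3)
n = 20_000
_erf = np.vectorize(erf)


def norm_cdf(x):
    return 0.5 * (1.0 + _erf(x / sqrt(2.0)))


def norm_pdf(x):
    return np.exp(-0.5 * x**2) / sqrt(2.0 * pi)


def generalized_residual(Z, y, d):
    """Probit generalized residual: y*lambda(Zd) - (1-y)*lambda(-Zd)."""
    xb = Z @ d
    P = np.clip(norm_cdf(xb), 1e-10, 1 - 1e-10)
    return norm_pdf(xb) * (y - P) / (P * (1 - P))


def probit_fit(Z, y, n_iter=25):
    """Probit MLE by Fisher scoring (a stand-in for a library routine)."""
    d = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        xb = Z @ d
        P = np.clip(norm_cdf(xb), 1e-10, 1 - 1e-10)
        w = norm_pdf(xb) ** 2 / (P * (1 - P))
        score = Z.T @ generalized_residual(Z, y, d)
        d = d + np.linalg.solve((Z * w[:, None]).T @ Z, score)
    return d


# Hypothetical switching model with selection on unobserved gains.
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)                          # excluded instrument
e2 = rng.normal(size=n)                          # probit error
y2 = (0.5 * z1 + 1.0 * z2 + e2 > 0).astype(float)
u1 = 0.5 * e2 + 0.5 * rng.normal(size=n)
v1 = 1.0 * e2 + 0.5 * rng.normal(size=n)         # individual gain rises with e2
y1 = 1.0 + z1 + 1.0 * y2 + 0.5 * y2 * z1 + u1 + y2 * v1   # true ATE = 1

const = np.ones(n)
Z = np.column_stack([const, z1, z2])
d2_hat = probit_fit(Z, y2)
r2_hat = generalized_residual(Z, y2, d2_hat)

# Regression 31, with z1 centered at its sample mean in the interaction.
z1c = z1 - z1.mean()
X = np.column_stack([const, z1, y2, y2 * z1c, r2_hat, y2 * r2_hat])
b = np.linalg.lstsq(X, y1, rcond=None)[0]
ate_hat = b[2]

print("estimated ATE (coefficient on y2):", ate_hat)
print("coefficients on r2_hat and y2*r2_hat:", b[4], b[5])
```

In this design the coefficient on `y2 * r2_hat` is positive because individuals with larger unobserved gains are more likely to select into treatment, which is the pattern the interaction term is meant to detect.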
The estimates of the switching regression model for the test score data are given in
Column 6 of Table 2. These estimates provide a very different picture than either the
2SLS estimates or the CF estimates that ignore the random coefficient on $cathhs_i$.
First, the two terms $\hat r_{i2}$ and $cathhs_i \cdot \hat r_{i2}$ are jointly significant using a
heteroskedasticity-robust test with p-value = 0.022. By contrast, when $\hat r_{i2}$ is included by
itself, its t statistic is only -0.46. The coefficient on $cathhs_i \cdot \hat r_{i2}$ is very large, 3.31 with
t = 2.53, providing evidence that the treatment effect of attending a Catholic high school
depends strongly on unobserved heterogeneity. Even more importantly, the average treatment
effect in the population is now negative and not statistically different from zero:
$\hat\gamma_1 = -0.95$ ($t = -0.58$).
How can one reconcile the estimated average treatment effect in Column 6 of Table
2 with the 2SLS estimate, which, in the model with interactions, is 2.37 (t = 1.90)? As
is now well known from the work of Imbens and Angrist (1994), the 2SLS estimator
can be given a LATE interpretation. Because the instruments are functions of distance
to the nearest school, the interpretation is (somewhat loosely) as follows: The 2SLS
estimate is the average treatment effect for those who are induced to attend a Catholic
high school because they live near a Catholic high school. This subpopulation can be
very different from the overall population, where the effect estimated by the CF approach
is not statistically different from zero.
One can shed further light on the difference between the 2SLS and CF estimates by
computing the average treatment effect on the treated (ATT) and the average treatment
effect on the untreated (ATU); see Imbens and Wooldridge (2009). The simplest way of
obtaining these quantities is to estimate separate equations for the control ($y_{i2} = 0$) and
treated ($y_{i2} = 1$) groups, in each case by regressing $y_{i1}$ on $1, z_{i1}, \hat r_{i2}$. Then, fitted values
from each regression are obtained for all observations $i$, say $\hat y_{i1}^{(0)}$ and $\hat y_{i1}^{(1)}$, respectively.
Then

(32) $\widehat{ATT} = N_1^{-1} \sum_{i=1}^{N} y_{i2}\,(\hat y_{i1}^{(1)} - \hat y_{i1}^{(0)})$,

which is simply the average of the difference in fitted values over the $N_1$ observations
with $y_{i2} = 1$. (See, for example, Heckman, Tobias, and Vytlacil 2003.) Similarly, ATU is
the average of $\hat y_{i1}^{(1)} - \hat y_{i1}^{(0)}$ over the $y_{i2} = 0$ observations. Using the full endogenous
switching specification, the estimated ATT (based on 452 students) is about 3.99 (t = 2[…]).
By contrast, the estimated ATU (based on 6,992 students) is about -1.27 (t = -0[…]).
The large difference is another way to illustrate the self-selection into attending a
Catholic school: Those who would benefit based on factors unobserved to us are much
more likely to attend a Catholic high school. The usual 2SLS estimation of a linear
model is necessarily silent on such selection issues because it only estimates the
LATE.
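
The ATT and ATU calculations described above can be sketched on simulated data, where the individual treatment effects are known and the estimates can be compared with their sample targets. As in any simulation of this kind, the data-generating process, the Fisher-scoring probit helper, and all names are assumptions for illustration; the generalized residual is the standard probit form.

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(5)
n = 20_000
_erf = np.vectorize(erf)


def norm_cdf(x):
    return 0.5 * (1.0 + _erf(x / sqrt(2.0)))


def norm_pdf(x):
    return np.exp(-0.5 * x**2) / sqrt(2.0 * pi)


def generalized_residual(Z, y, d):
    """Probit generalized residual: y*lambda(Zd) - (1-y)*lambda(-Zd)."""
    xb = Z @ d
    P = np.clip(norm_cdf(xb), 1e-10, 1 - 1e-10)
    return norm_pdf(xb) * (y - P) / (P * (1 - P))


def probit_fit(Z, y, n_iter=25):
    """Probit MLE by Fisher scoring (a stand-in for a library routine)."""
    d = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        xb = Z @ d
        P = np.clip(norm_cdf(xb), 1e-10, 1 - 1e-10)
        w = norm_pdf(xb) ** 2 / (P * (1 - P))
        d = d + np.linalg.solve((Z * w[:, None]).T @ Z,
                                Z.T @ generalized_residual(Z, y, d))
    return d


# Hypothetical switching model; te holds the individual treatment effects,
# which are observable only because the data are simulated.
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)                          # excluded instrument
e2 = rng.normal(size=n)
y2 = (0.5 * z1 + 1.0 * z2 + e2 > 0).astype(float)
u1 = 0.5 * e2 + 0.5 * rng.normal(size=n)
v1 = 1.0 * e2 + 0.5 * rng.normal(size=n)
te = 1.0 + 0.5 * z1 + v1
y1 = 1.0 + z1 + u1 + y2 * te

const = np.ones(n)
Z = np.column_stack([const, z1, z2])
r2_hat = generalized_residual(Z, y2, probit_fit(Z, y2))
W = np.column_stack([const, z1, r2_hat])


def fitted_for_all(mask):
    """Fit y1 on (1, z1, r2_hat) within one group; predict for everyone."""
    coef = np.linalg.lstsq(W[mask], y1[mask], rcond=None)[0]
    return W @ coef


diff = fitted_for_all(y2 == 1) - fitted_for_all(y2 == 0)
att_hat, atu_hat = diff[y2 == 1].mean(), diff[y2 == 0].mean()
att_true, atu_true = te[y2 == 1].mean(), te[y2 == 0].mean()

print("ATT: estimate", att_hat, " sample target", att_true)
print("ATU: estimate", atu_hat, " sample target", atu_true)
```

With positive selection on gains built into the simulation, the ATT exceeds the ATU, mirroring the qualitative pattern reported for the Catholic high school example.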

The CF regression in Equation 31 can be made even more general to allow random
coefficients on some or all of the exogenous variables as well as on the interaction
terms. If one takes the vector of explanatory variables to be $x_{i1} = (z_{i1}, y_{i2}, z_{i1} y_{i2})$ and
allows randomness in all coefficients $b_{i1}$, then the CF regression (across all observations)
becomes

(33) $y_{i1}$ on $1, z_{i1}, y_{i2}, y_{i2} \cdot (z_{i1} - \bar z_1), \hat r_{i2}, y_{i2}\hat r_{i2}, \hat r_{i2} \cdot (z_{i1} - \bar z_1), y_{i2} \cdot \hat r_{i2} \cdot (z_{i1} - \bar z_1)$.

The coefficient on $y_{i2}$ in this regression is consistent for the average treatment effect.
Alternatively, one can run separate regressions for the control and treated groups,
where the regressions have the form $y_{i1}$ on $1, z_{i1}, \hat r_{i2}$, and $\hat r_{i2} z_{i1}$. The estimated ATT is still
obtained as in Equation 32, but the fitted values are obtained by adding the terms $\hat r_{i2} z_{i1}$
to the separate regressions. As usual, bootstrapping is an attractive way to obtain valid
standard errors. In the Catholic high school example, the expanded regression gives
the following estimates (not reported in Table 2): ATT = 3.59 (t = 2.62), ATU = 0.063
(t = 0.03), and ATE = 0.28 (t = 0.14). Thus, the picture is similar to the switching
regression model with constant coefficients: The average treatment effect in the entire
population is essentially zero, with a large average treatment effect for the relatively
small treated subpopulation.
The CF estimator obtained from Regression 33 allows substantial heterogeneity
across individuals, much more than is allowed in typical applications. Its main limitation
is that it is based on a linear model for $y_1$ under zero conditional mean assumptions
for the unobservables. Such models are likely to be good approximations when
$y_1$ is the log of wage or a test score, but linearity is harder to justify if $y_1$ is discrete or
its range is otherwise restricted. I now turn to nonlinear models for $y_1$.

IV. Nonlinear Models

Control function methods have long been employed for particular nonlinear
models, especially probit and Tobit, when the endogenous explanatory variables
are continuous. Thanks largely to the work of Blundell and Powell (2003, 2004), the
scope of such applications is now much broader. Wooldridge (2005) and Petrin and
Train (2010) give several examples of where CF methods can be applied with continuous
EEVs. Here, I cover some simple examples that illustrate the flexibility of the CF
approach.

A. Continuous EEVs

Probably the leading example of a nonlinear model with continuous EEVs is the probit
model, as analyzed in Rivers and Vuong (1988). With a single EEV y2, the model can
be written as

(34) $y_1 = 1[z_1\delta_1 + \gamma_1 y_2 + u_1 > 0]$

(35) $y_2 = z\delta_2 + v_2$,

where $(u_1, v_2)$ is bivariate normal with mean zero, $Var(u_1) = 1$, and independent of $z$.
Here, both $z$ and $z_1$ include constants with $z_1$ a strict subset of $z$. In most cases, the
parameters of interest are constant insofar as they index partial effects. As discussed in
Wooldridge (2010, Section 15.7.2), the average partial effects are obtained by taking
derivatives or changes of

(36) $E_{u_{i1}}\{1[z_1\delta_1 + \gamma_1 y_2 + u_{i1} > 0]\} = \Phi(z_1\delta_1 + \gamma_1 y_2)$,

where the notation $E_{u_{i1}}\{\cdot\}$ indicates averaging out the unobservables and treating
$(z_1, y_2)$ as fixed arguments. Equation 36 is an example of what Blundell and Powell (2003)
call an "average structural function," or ASF. In defining the ASF, the observables are
taken as fixed arguments and the unobservables are averaged out. Under the assumptions
given, the parameters in Equations 34 and 35 and those in the bivariate normal
distribution can be estimated using joint MLE, and so the ASF can be estimated as
$\Phi(z_1\hat\delta_1 + \hat\gamma_1 y_2)$.
For the purposes of the current paper, a control function approach is attractive. The
CF approach is based on the following conditional probability; see Wooldridge (2010,
Section 15.7.2):

(37) $P(y_1 = 1|z, y_2) = P(y_1 = 1|z, y_2, v_2) = \Phi(z_1\delta_{\eta 1} + \gamma_{\eta 1} y_2 + \rho_{\eta 1} v_2)$,

where $E(u_1|v_2) = \rho_1 v_2$, the $\eta$ subscript denotes division by $(1 - \rho_1^2\tau_2^2)^{1/2}$, and $\tau_2^2 = Var(v_2)$.
The expression in Equation 37 leads to a simple two-step CF estimator for estimating
the scaled coefficients. First, the residuals, $\hat v_{i2}$, are obtained from the OLS regression
of $y_{i2}$ on $z_i$. Then, the scaled coefficients are consistently estimated from a probit of $y_{i1}$
on $z_{i1}, y_{i2}, \hat v_{i2}$. The null hypothesis that $y_2$ is exogenous is easily tested using the usual
t statistic on $\hat v_{i2}$.
The CF approach appears to have the drawback that it does not estimate the parameters
$\delta_1$ and $\gamma_1$ appearing in Equation 36. Fortunately, it turns out that the ASF is easily
estimated using the scaled parameters identified by Equation 37. As discussed in
Wooldridge (2010, Section 15.7.2), the ASF can be obtained as

(38) $ASF(z_1, y_2) = E_{v_{i2}}[\Phi(z_1\delta_{\eta 1} + \gamma_{\eta 1} y_2 + \rho_{\eta 1} v_{i2})]$;

that is, one averages the control function, $v_{i2}$, out of the conditional probability
$P(y_1 = 1|z_1, y_2, v_2)$. It follows that a consistent estimator of the ASF is

(39) $\widehat{ASF}(z_1, y_2) = N^{-1} \sum_{i=1}^{N} \Phi(z_1\hat\delta_{\eta 1} + \hat\gamma_{\eta 1} y_2 + \hat\rho_{\eta 1}\hat v_{i2})$,

and then I use derivatives or changes with respect to the elements of $(z_1, y_2)$. After
partial effects have been obtained, further averaging can be used, or one can average
the partial effects across $(z_{i1}, y_{i2}, \hat v_{i2})$ to obtain a single average partial effect (as is done
by the "margins" command in Stata®).
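
The two-step estimator and the ASF estimator in Equation 39 can be sketched on simulated data. Under the bivariate normality assumed in the simulation, averaging $\Phi(\cdot)$ over $v_2$ with the scaled coefficients reproduces $\Phi(z_1\delta_1 + \gamma_1 y_2)$, so the estimate can be compared with a known target. The Fisher-scoring probit helper, the data-generating process, and all names are illustrative assumptions rather than the paper's code.

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(4)
n = 20_000
_erf = np.vectorize(erf)


def norm_cdf(x):
    return 0.5 * (1.0 + _erf(x / sqrt(2.0)))


def norm_pdf(x):
    return np.exp(-0.5 * x**2) / sqrt(2.0 * pi)


def probit_fit(X, y, n_iter=25):
    """Probit MLE by Fisher scoring (a stand-in for a library routine)."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        xb = X @ b
        P = np.clip(norm_cdf(xb), 1e-10, 1 - 1e-10)
        p = norm_pdf(xb)
        score = X.T @ (p * (y - P) / (P * (1 - P)))
        info = (X * (p**2 / (P * (1 - P)))[:, None]).T @ X
        b = b + np.linalg.solve(info, score)
    return b


# Hypothetical design: Var(v2) = 1, rho1 = 0.6, Var(u1) = 1.
rho1 = 0.6
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)                          # excluded instrument
v2 = rng.normal(size=n)
u1 = rho1 * v2 + sqrt(1 - rho1**2) * rng.normal(size=n)
y2 = 0.5 * z1 + z2 + v2
y1 = (0.2 + 0.5 * z1 - 0.5 * y2 + u1 > 0).astype(float)

# Step 1: OLS reduced form; step 2: probit including the residual.
const = np.ones(n)
Z = np.column_stack([const, z1, z2])
v2_hat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
X = np.column_stack([const, z1, y2, v2_hat])
theta = probit_fit(X, y1)                        # scaled coefficients


def asf_hat(z1_val, y2_val):
    """Equation 39: average the control function out at fixed (z1, y2)."""
    idx = theta[0] + theta[1] * z1_val + theta[2] * y2_val + theta[3] * v2_hat
    return norm_cdf(idx).mean()


# Single APE of y2: average the derivative over (z_i1, y_i2, v2_hat_i).
ape_y2 = (theta[2] * norm_pdf(X @ theta)).mean()

print("scaled coefficients:", theta)
print("ASF estimate at (z1, y2) = (0, 0):", asf_hat(0.0, 0.0))
print("APE of y2:", ape_y2)
```

In this design the unscaled coefficients are divided by $(1 - 0.36)^{1/2} = 0.8$, so the probit coefficient on `y2` should be near $-0.625$ rather than $-0.5$, while the ASF estimate at $(0, 0)$ targets $\Phi(0.2)$.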
Flexible extensions of the Rivers-Vuong approach can be obtained using the general
results of Blundell and Powell (2003, 2004, hereafter, BP), which at its most general
level is fully nonparametric. BP assumes a structural model of the form

(40) $y_1 = g_1(z_1, y_2, u_1)$

for a vector of unobservables $u_1$ where, for simplicity, $y_2$ is a scalar. The object of
interest in BP is the ASF, defined generally as

(41) $ASF(z_1, y_2) = E_{u_{i1}}[g_1(z_1, y_2, u_{i1})]$;

again, the notation means that the unobservables $u_{i1}$ are averaged out in the population
and $z_1$ and $y_2$ are fixed values. The ASF can be differentiated with respect to $(z_1, y_2)$, or
discrete differences can be calculated, to obtain average partial effects. Therefore, if
one can consistently estimate the ASF, then one can get not only directions of effects
but also magnitudes. As is now well known, parameters in nonlinear models often do
not deliver magnitudes of partial effects.
A key representation assumed by BP is

(42) $y_2 = g_2(z) + v_2$,

where $(u_1, v_2)$ is independent of $z$ [and $E(v_2) = 0$ so that $E(y_2|z) = g_2(z)$]. It is important
to understand that independence between $v_2$ and $z$ effectively limits the scope of
the BP approach to continuous EEVs. If $y_2$ is discrete, or its range is restricted in some
substantive way, $v_2$ in Equation 42 cannot be independent of $z$. Together, Equations 40
and 42 are said to form a "triangular system" because the equation for $y_2$ does not have
$y_1$ as an explanatory variable. Therefore, if $y_1$ and $y_2$ are simultaneously determined,
then assuming Equation 42 can be restrictive.
When Equation 42 holds and $(u_1, v_2)$ is independent of $z$, the conditional distribution
of the unobservables $u_1$ in the structural function depends on $(z, y_2)$ only through the
reduced-form error, $v_2$:


(43) $D(u_1|z, y_2) = D(u_1|z, v_2) = D(u_1|v_2)$.

As shown by BP, the ASF can be obtained by using $v_2$ as a proxy for $u_1$, in the following
sense. First, define the conditional expectation

(44) $h_1(z_1, y_2, v_2) = E(y_1|z_1, y_2, v_2)$.

Then the key result is

(45) $ASF(z_1, y_2) = E_{v_{i2}}[h_1(z_1, y_2, v_{i2})]$.

The result in Equation 45 is critical to the CF approach, and it generalizes the probit
case in Expression 38. It means that, for obtaining the ASF, it suffices to obtain
$E(y_1|z_1, y_2, v_2)$ and then average out across the population distribution of $v_2$. For
identification purposes, I effectively observe the $v_{i2}$ because $v_{i2} = y_{i2} - g_2(z_i)$, and $g_2(\cdot)$ is
generally identified by $E(y_2|z) = g_2(z)$.
Let $\hat g_2(\cdot)$ be a consistent estimator of $g_2(\cdot)$ and define the reduced-form residuals as

(46) $\hat v_{i2} = y_{i2} - \hat g_2(z_i)$.

A consistent estimator of the ASF, under weak regularity conditions, is

(47) $\widehat{ASF}(z_1, y_2) = N^{-1} \sum_{i=1}^{N} \hat h_1(z_1, y_2, \hat v_{i2})$.

Consistent estimates of partial effects are obtained by taking derivatives or changes
with respect to the elements in $(z_1, y_2)$.
Wooldridge (2005) showed that the same analysis goes through if the deterministic
equation in Equation 40 is replaced with a conditional mean specification,

(48) $E(y_1|z, y_2, u_1) = E(y_1|z_1, y_2, u_1) = g_1(z_1, y_2, u_1)$.

Stating the structural model as in Equation 48 allows for some cases that fall outside
the BP framework, such as when $y_1$ is a fractional response or a count response.
A powerful implication of the BP work is that, provided one is interested in the average
structural function for $y_1$ and one can specify a reduced form for $y_2$ with an additive,
independent error, one need not start with a structural model at all. For example, when
$y_1$ is binary, the parameters in the structural Equation 34 are interesting insofar as
they provide directions of effects and enter into the average partial effects. But the
scaled coefficients in Equation 37 do just as nicely for getting directions of effects,
ratios of coefficients, and average partial effects. In other words, one could start with the
probit model in Equation 37 and learn everything desired, including magnitudes of the
effects. The insight obtained from the probit model carries over to general situations.
By focusing on $E(y_1|z_1, y_2, v_2)$, I can achieve considerable flexibility even within a
parametric framework. Of course, I need at least one exogenous variable that causes
variation in $y_2$ not explained by $z_1$, and I need to get suitable estimates of $v_2$.
As an example of how liberating the focus on the APEs can be, consider again the
binary response model. Let $x_1$ be any function of the exogenous and endogenous variables
and let $v_2$ be the error in a reduced form for $y_2$, probably linear in parameters.
Then one can jump directly to specifying flexible models for $P(y_1 = 1|z_1, y_2, v_2)$, such as

(49) $P(y_1 = 1|z_1, y_2, v_2) = \Phi(x_1\beta_1 + \rho_1 v_2 + \eta_1 v_2^2 + x_1 v_2\psi_1)$.

This content downloaded from


148.88.247.20 on Fri, 25 Oct 2024 10:56:43 UTC
All use subject to https://about.jstor.org/terms
Wooldridge 437

It would be difficult, if not impossible, to derive Equation 49 from an underlying
structural equation of the form $y_1 = g_1(z_1, y_2, u_1)$. Instead, I am skipping the step of
specifying a structural model and proceeding directly to estimating Equation 49. A
two-step CF method is straightforward. First, obtain the reduced form residuals $\hat v_{i2}$
from an initial (flexible) OLS regression. Then, estimate the parameters in Equation
49 using probit of $y_{i1}$ on $x_{i1}, \hat v_{i2}, \hat v_{i2}^2, x_{i1}\hat v_{i2}$. Testing the null hypothesis of exogeneity is
the same as testing that the last three terms are jointly insignificant. Importantly, there
is no need to worry that the coefficients might be scaled versions of underlying structural
parameters because the parameters estimated are precisely those that can be used
to estimate the ASF:

(50) $\widehat{ASF}(z_1, y_2) = N^{-1} \sum_{i=1}^{N} \Phi(x_1\hat\beta_1 + \hat\rho_1\hat v_{i2} + \hat\eta_1\hat v_{i2}^2 + x_1\hat v_{i2}\hat\psi_1)$.

As before, $x_1$ is a fixed argument and the averaging out is over the control function, $\hat v_{i2}$.
With large sample sizes, one can be even more flexible, including higher order polynomials
or other transformations in $\hat v_{i2}$.
If $x_1$ includes nonlinear functions of $(z_1, y_2)$, such as $y_2^2$ or interactions $z_1 y_2$, methods
where first-stage fitted values are inserted for $y_2$ do not consistently estimate anything
interesting, either parameters or average partial effects. The CF approach has a distinct
advantage: If one thinks Equation 49 provides a good approximation to $P(y_1 = 1|z_1, y_2, v_2)$,
then Equation 50 will deliver reliable estimates of the average partial effects.
As an application, consider estimating a binary response model of married women's
labor force participation ($y_1 = inlf$). The data, on 5,634 married women, come from
the May 1991 Current Population Survey. The EEV is other sources of income, $y_2 =
nwifeinc$. I use husband's education (huseduc) as an instrument for nwifeinc. Other
controls are education, experience (as a quadratic), and a dummy variable for having
a child under the age of six. The first-stage t statistic on huseduc is 18.39; not surprisingly,
husband's education is a good predictor of other sources of income.
Table 3 contains estimates of various models, starting with linear probability models
estimated by OLS and 2SLS. The OLS coefficient on nwifeinc is about -0.0033 (t =
-14.14), which implies that another \$10,000 in other sources of income reduces the
labor force participation probability by 0.033. The IV estimate is substantially smaller
in magnitude, -0.0014, and not statistically different from zero (t = -1.42). Columns
3 and 4 contain the estimates for a probit model and the Rivers-Vuong control function
approach, respectively. The average partial effect when nwifeinc is treated as
exogenous is about -0.0033 (t = -14.21), the same as the OLS estimate of the linear
probability model to four decimal places. The APE from the CF approach is -0.0015
(t = -1.60), which is very similar to the linear IV estimate. In the probit CF method,
the first-stage residual has t = -1.93 and so there is marginal evidence of endogeneity.
Column 5 allows more flexibility by including a squared term in nwifeinc and an
interaction between having a young child, kidlt6, and nwifeinc. Like Column 4, the
estimates in Column 5 employ only a linear function in $\hat v_{i2}$. The square and interaction
are both statistically significant, and the CF is now slightly more significant. The
coefficients are especially difficult to interpret because of the nonlinearity in the model
(including the probit functional form). Using the derivative, the APE of nwifeinc is
estimated to be -0.00097 (t = -1.00).


Table 3
Estimates of the inlf Equation
[Table body not recoverable from the source scan.]

Table 4
Average Partial Effects of nwifeinc at Different Quartiles

                                 1            2            3
                             Probit CF    Probit CF    Probit CF

No young children
  25th percentile            -0.00143      0.00026      0.00163
                            (0.00087)    (0.00105)    (0.00129)
  50th percentile            -0.00146     -0.00014     -0.00068
                            (0.00091)    (0.00098)    (0.00095)
  75th percentile            -0.00149     -0.00065     -0.00367
                            (0.00095)    (0.00096)    (0.00158)
At least one young child
  25th percentile            -0.00157     -0.00197     -0.00067
                            (0.00099)    (0.00120)    (0.00129)
  50th percentile            -0.00156     -0.00240     -0.00295
                            (0.00099)    (0.00115)    (0.00098)
  75th percentile            -0.00155     -0.00291     -0.00535
                            (0.00097)    (0.00109)    (0.00124)

Notes: (i) Column 1 is for the probit estimates reported in Column 4 of Table 3, Column 2 corresponds to
Column 5 in Table 3, and Column 3 corresponds to Column 6 in Table 3.
(ii) All standard errors are obtained from 1,000 bootstrap replications.

One of the benefits of using a nonlinear model is that it allows the effects of the explanatory
variables to change in a parsimonious way. Table 4 provides estimates of average
partial effects for nwifeinc, evaluated at the median as well as the first and third quartiles.
I also consider the APEs with and without a young child. All of the other variables are
averaged out. The picture is now different than that for the simple model; those APEs are
reported in Column 1 of the table. When nwifeinc appears linearly in the probit model,
its APE is essentially flat across the six combinations of (kidlt6, nwifeinc). By contrast,
in Column 2, the APEs vary substantially across different settings of the two covariates.
The effect of nwifeinc is essentially zero at the three income settings for women without a
young child although the point estimates show the effect increases in magnitude as income
increases. For women with a young child, the effect is marginally significant at the lowest
quartile, -0.0020 (t = -1.65), and is largest at the 75th percentile, -0.0029 (t = -2.67).
Finally, Column 6 in Table 3 contains estimated parameters of a model that adds a
quadratic in the CF, vi2, along with an interaction between vi2 and nwifeincr Now th
three terms that depend on the CF are jointly very significant, with value equal t
zero to four decimal places. Plus, each term is individually very significant, suggestin
that the earlier models suffer from functional form misspecification. As often happen
in comparing a variety of models, the estimated APE across all observations is ver
similar to the simpler models, including the linear model estimated by IV: -0.0015
(f = -1.50). But the pattern of APEs at different (kidlt6, nwifeinc) pairs differs. Column
3 in Table 4 contains the APEs. Now nwifecinc has a negative, statistically significant

effect at the highest quartile among women without a small child: -0.0037 (t = -2.32).
Among women with a child, there is no income effect at the lowest quartile but a fairly
large effect, -0.0054 (t = -4.31), at the highest quartile.
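
The t statistics quoted in the text can be checked, at least approximately, against the rounded entries of Table 4 by dividing each APE by its bootstrap standard error. The short sketch below does this for four of the quoted cells; the dictionary keys are labels invented here for illustration. The first ratio comes out near -1.64 rather than the reported -1.65, presumably because the text was computed from unrounded figures.

```python
# (APE estimate, bootstrap standard error) pairs copied from Table 4.
# The keys are hypothetical labels, not names used in the paper.
table4_cells = {
    "col2_child_p25": (-0.00197, 0.00120),
    "col2_child_p75": (-0.00291, 0.00109),
    "col3_nochild_p75": (-0.00367, 0.00158),
    "col3_child_p75": (-0.00535, 0.00124),
}

# t statistic = estimate / standard error
t_stats = {name: est / se for name, (est, se) in table4_cells.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
```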
There is no guarantee that even the last model captures all of the important
nonlinearities, but the example shows that accounting for the nonlinearities is potentially
important. With large sample sizes, one can try interactions among all variables, including
the control function, and quadratics in the continuous variables (including
the control function). Two-step estimation is simple and the bootstrap efficiently
computes standard errors of the coefficients and the average partial effects.
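
To make the bootstrap remark concrete, here is a minimal sketch of resampling a two-step CF estimator so that both steps are re-estimated in every replication. It is a stand-in, not the paper's application: the second step is linear rather than probit, the data are simulated, and every name (`cf_two_step`, `bootstrap_se`, the true coefficient of 1.0) is an assumption made for illustration. With a probit second step the same loop would wrap the probit fit and the APE calculation.

```python
import random
import statistics

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def cf_two_step(data):
    """data holds (z, y2, y1) triples; returns the coefficient on y2."""
    y2 = [row[1] for row in data]
    y1 = [row[2] for row in data]
    # Step 1: reduced form of y2 on (1, z); keep the residuals as the CF.
    p = ols([[1.0, row[0]] for row in data], y2)
    v_hat = [row[1] - (p[0] + p[1] * row[0]) for row in data]
    # Step 2: regress y1 on (1, y2, v_hat).
    return ols([[1.0, a, v] for a, v in zip(y2, v_hat)], y1)[1]

def bootstrap_se(data, reps, rng):
    """Resample observations with replacement and redo BOTH steps each time."""
    draws = [cf_two_step([rng.choice(data) for _ in data]) for _ in range(reps)]
    return statistics.stdev(draws)

# Simulated data: y2 is endogenous because u shares the component v.
rng = random.Random(0)
data = []
for _ in range(500):
    z, v = rng.gauss(0, 1), rng.gauss(0, 1)
    u = 0.5 * v + rng.gauss(0, 1)
    y2 = z + v
    data.append((z, y2, 1.0 * y2 + u))

gamma_hat = cf_two_step(data)
se = bootstrap_se(data, 200, rng)
print(f"coefficient on y2: {gamma_hat:.3f}, bootstrap SE: {se:.3f}")
```

Resampling whole observations and repeating the first stage inside the loop is what lets the standard error reflect the sampling error in the estimated control function.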
The BP setup, and therefore convenient parametric approximations, extends easily
to the case of a vector of continuous EEVs, say $y_2$, provided there are sufficient
instruments. An example is Petrin and Train (2010), which studies multinomial consumer
choice models with a vector of endogenous price variables. Rather than start with, say,
a multinomial or nested logit model that depends on unobserved taste heterogeneity
that can be correlated with price, Petrin and Train proposes estimating such models for
$D(y_1 \mid z_1, y_2, v_2)$, where $v_2$ is the vector of reduced form errors in
$y_2 = \Pi_2 z + v_2$. When $v_2$ is replaced with reduced-form residuals $\hat{v}_{i2}$,
obtained from OLS regressions using prices or log prices, the CF methods are computationally
simple even for many choice alternatives. The standard approach, where the distribution of
the heterogeneity is modeled and then integrated out, is much more complicated. Petrin and
Train provides evidence that the CF approach works well.

B. Discrete EEVs

The major impediment to extending the BP framework to allow discrete EEVs is that
the average structural function is nonparametrically unidentified even under fairly
strong independence assumptions; see Chesher (2003). Consequently, parametric CF
approaches when $y_2$ is discrete generally require the parametric assumptions to hold
in order to achieve identification. By contrast, the parametric models discussed in the
previous subsection are offered as flexible approximations to an analysis that, in
principle, could be fully nonparametric.
The traditional approach to estimating nonlinear models with discrete $y_2$ is not
a CF approach. Instead, maximum likelihood (or, in some cases, quasi-MLE; see
Wooldridge 2014 for some recent examples) is by far the leading method. One
occasionally sees plug-in methods used but these are generally inconsistent. In this
subsection, I discuss how two-step CF methods can be used in place of MLE approaches
under a different set of parametric assumptions. The CF approach is somewhat
controversial in this case because the assumptions under which it produces consistent partial
effects are nonstandard.

To illustrate the issues, suppose that $y_1$ is binary and is generated by Equation 34.
Now, $y_2$ is also binary and follows a linear index model:

(51) $y_2 = 1[z\delta_2 + v_2 > 0]$.

As an example, I could model a binary outcome, such as graduating from high school
($y_1$), as a function of attending a Catholic high school ($y_2$). Usually the parameters in
Equations 34 and 51 are estimated jointly by MLE under the assumption that $(u_1, v_2)$
is independent of $z$ with a bivariate normal distribution, where $u_1$ and $v_2$ are both


148.88.247.20 on Fri, 25 Oct 2024 10:56:43 UTC
All use subject to https://about.jstor.org/terms
442 The Journal of Human Resources

standard normal. This model is sometimes called a "bivariate probit" model, where $y_2$
appears in the equation for $y_1$ but Equation 51 is taken to be a reduced form probit
equation. The ASF, $\Phi(z_1\delta_1 + \gamma_1 y_2)$, is easily estimated given the MLEs of
$\delta_1$ and $\gamma_1$. A plug-in approach that replaces $y_{i2}$ with probit fitted values,
$\Phi(z_i\hat{\delta}_2)$, in the second-stage probit inconsistently estimates both the
parameters and the average partial effects.
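
A small simulation can illustrate why endogeneity of a binary $y_2$ matters in this setup. The sketch below draws data from a stripped-down version of the two-equation model, with one standard normal instrument, no exogenous covariates in the outcome equation, and a true $\gamma_1$ of zero; all of these choices, and the function names, are assumptions made for illustration. With correlated errors, the naive difference in outcome rates between $y_2 = 1$ and $y_2 = 0$ is far from the true effect of zero.

```python
import math
import random

def simulate(n, gamma1, rho, rng):
    """Draw (y1, y2) pairs from a simplified bivariate probit design.

    y2 = 1[z + v2 > 0] mimics Equation 51 with a single instrument z.
    y1 = 1[gamma1 * y2 + u1 > 0] is the outcome equation without covariates.
    (u1, v2) are standard normal with correlation rho.
    """
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        v2 = rng.gauss(0.0, 1.0)
        u1 = rho * v2 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y2 = 1 if z + v2 > 0 else 0
        y1 = 1 if gamma1 * y2 + u1 > 0 else 0
        rows.append((y1, y2))
    return rows

def naive_difference(rows):
    """Difference in the mean of y1 between the y2 = 1 and y2 = 0 groups."""
    treated = [y1 for y1, y2 in rows if y2 == 1]
    control = [y1 for y1, y2 in rows if y2 == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

rng = random.Random(12345)
# True effect of y2 is zero in both designs because gamma1 = 0.
endogenous = naive_difference(simulate(20000, 0.0, 0.5, rng))
exogenous = naive_difference(simulate(20000, 0.0, 0.0, rng))
print(f"naive difference, rho = 0.5: {endogenous:.3f}")
print(f"naive difference, rho = 0.0: {exogenous:.3f}")
```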
Under the standard bivariate probit assumptions, there is no known CF method that
consistently estimates the parameters. Nevertheless, as shown by Wooldridge (2014),
an optimal test of the null hypothesis that $y_2$ is exogenous is obtained as the usual MLE
t statistic on the generalized residual
$r_{i2} = y_{i2}\lambda(z_i\delta_2) - (1 - y_{i2})\lambda(-z_i\delta_2)$. Therefore, if
one knew $\delta_2$, rather than having to estimate it, one would estimate the probit model

(52) $P(y_{i1} = 1 \mid z_i, y_{i2}, r_{i2}) = \Phi(z_{i1}\delta_1 + \gamma_1 y_{i2} + \rho_1 r_{i2})$

and test $H_0: \rho_1 = 0$. To operationalize the test, replace $r_{i2}$ with $\hat{r}_{i2}$.
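
The generalized residual is easy to compute once the first-stage probit index is available. The sketch below assumes that $\lambda(\cdot)$ is the inverse Mills ratio $\phi(\cdot)/\Phi(\cdot)$, as is standard for probit generalized residuals; the function names are invented here, and the index values passed in are arbitrary rather than estimates from any data set.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal cdf, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def inv_mills(x):
    """Inverse Mills ratio phi(x) / Phi(x)."""
    return norm_pdf(x) / norm_cdf(x)

def generalized_residual(y2, index):
    """r = y2 * lambda(index) - (1 - y2) * lambda(-index) for binary y2."""
    return y2 * inv_mills(index) - (1 - y2) * inv_mills(-index)

# At a given index, the residual is positive when y2 = 1 and negative when y2 = 0.
index = 0.7
r_one = generalized_residual(1, index)
r_zero = generalized_residual(0, index)

# Weighting by the probit probabilities shows the residual has mean zero given z.
p = norm_cdf(index)
conditional_mean = p * r_one + (1.0 - p) * r_zero
print(r_one, r_zero, conditional_mean)
```

In practice `index` would be the fitted first-stage index $z_i\hat{\delta}_2$ for each observation.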


An intriguing possibility is that including $\hat{r}_{i2}$ in the second-stage probit along with
$(z_{i1}, y_{i2})$ might provide an accurate correction for "small" amounts of endogeneity,
where smallness is measured by the size of $\rho_1$. Terza, Basu, and Rathouz (2008) (TBR)
was the first to propose adding residuals to standard models, such as probit, to solve
the endogeneity problem for discrete $y_2$. Rather than the generalized residual $\hat{r}_{i2}$, TBR
uses the residual $\hat{e}_{i2} = y_{i2} - \Phi(z_i\hat{\delta}_2)$, but the motivation is the same. As noted by
Wooldridge (2014), in order to use Equation 52 to consistently estimate the average
partial effects, one needs to add the assumption that $r_2$ acts as a kind of sufficient
statistic for capturing the endogeneity of $y_2$. One can state the condition by recalling that
$y_1 = 1[z_1\delta_1 + \gamma_1 y_2 + u_1 > 0]$. Then, assume that $u_1$ depends on $(z, y_2)$ only through $r_2$ in
the conditional distribution sense:

(53) $D(u_1 \mid z, y_2) = D(u_1 \mid r_2)$.
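
The link between the two residuals can be made explicit with a line of algebra. Assuming $\lambda(a) = \phi(a)/\Phi(a)$ (so that $\lambda(-a) = \phi(a)/[1 - \Phi(a)]$ by symmetry of the normal density), the generalized residual is the TBR residual rescaled by an observation-specific positive weight:

```latex
\hat{r}_{i2}
  = y_{i2}\,\frac{\phi(z_i\hat{\delta}_2)}{\Phi(z_i\hat{\delta}_2)}
    - (1 - y_{i2})\,\frac{\phi(z_i\hat{\delta}_2)}{1 - \Phi(z_i\hat{\delta}_2)}
  = \frac{\phi(z_i\hat{\delta}_2)}
         {\Phi(z_i\hat{\delta}_2)\,[1 - \Phi(z_i\hat{\delta}_2)]}\,
    \bigl[y_{i2} - \Phi(z_i\hat{\delta}_2)\bigr]
  = \frac{\phi(z_i\hat{\delta}_2)}
         {\Phi(z_i\hat{\delta}_2)\,[1 - \Phi(z_i\hat{\delta}_2)]}\,\hat{e}_{i2}.
```

The two residuals therefore always share the same sign and differ only in how heavily observations with extreme first-stage indices are weighted.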


When Equations 52 and 53 are combined, the average structural function can be
consistently estimated, just as in the BP case, by averaging out the generalized residuals:

(54) $\widehat{ASF}(z_1, y_2) = N^{-1}\sum_{i=1}^{N} \Phi(z_1\hat{\delta}_1 + \hat{\gamma}_1 y_2 + \hat{\rho}_1 \hat{r}_{i2})$.
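
Equation 54 amounts to fixing $(z_1, y_2)$ and averaging the probit response over the estimated generalized residuals. A minimal sketch follows; the coefficient values and residuals are made-up numbers, the function name `asf_hat` is invented here, and the discrete effect of $y_2$ is computed as the difference in the estimated ASF at $y_2 = 1$ and $y_2 = 0$, which is the usual definition for a binary regressor rather than something stated in this passage.

```python
import math

def norm_cdf(x):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def asf_hat(z1, y2, delta1, gamma1, rho1, r_hat):
    """Average Phi(z1*delta1 + gamma1*y2 + rho1*r_i) over the residuals r_i."""
    index = sum(z * d for z, d in zip(z1, delta1)) + gamma1 * y2
    return sum(norm_cdf(index + rho1 * r) for r in r_hat) / len(r_hat)

# Hypothetical second-stage estimates and first-stage generalized residuals.
delta1_hat = [0.2, -0.4]          # coefficients on z1 = (1, covariate)
gamma1_hat = 0.3
rho1_hat = 0.6
r_hat = [0.9, -0.5, 1.2, -0.8, -0.3, 0.4]

z1 = [1.0, 0.5]                   # evaluation point for the exogenous variables
asf_1 = asf_hat(z1, 1, delta1_hat, gamma1_hat, rho1_hat, r_hat)
asf_0 = asf_hat(z1, 0, delta1_hat, gamma1_hat, rho1_hat, r_hat)
effect_of_y2 = asf_1 - asf_0      # effect at this z1; average over z_i1 for an APE
print(asf_1, asf_0, effect_of_y2)
```

Note that the residuals are averaged out while $y_2$ is held fixed at the chosen value; the residual of observation $i$ is not recomputed when $y_2$ is switched from 0 to 1.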

In using Equation 52 as an estimating equation, I still require that $z_i$ has at least one
element with nonzero coefficient in $\delta_2$ that is excluded from $z_{i1}$. This ensures that $r_{i2}$
has variation that is not determined entirely by $(z_{i1}, y_{i2})$. As with any CF method, it is
better to have more independent variation in $r_{i2}$. Because $r_{i2}$ depends on $z_i$ in a
nonlinear way, technically I could get by with $z_i = z_{i1}$. However, as in other contexts, I should
not achieve identification off of nonlinearities. That is, if a linear version of the model
is not identified, then I should not proceed with a nonlinear model. For further
discussion, see Wooldridge (2010, Section 9.5).
It is important to understand that the CF approach and the bivariate probit approach
use the same probit reduced form for $y_2$ but use different assumptions about the
conditional distribution $D(y_1 \mid z, y_2)$. The bivariate probit approach requires an extra
integration that leads to a fairly complicated log likelihood function; see, for example,
Wooldridge (2010, Section 15.7.3). By contrast, Assumption 52 leads to a
straightforward two-step method. While the CF assumptions are nonstandard, they are no more
or less general than the bivariate probit assumptions. Because Equation 52 is a valid
approximation for $\rho_1$ "near" zero, the simple CF method might provide good estimates
of the ASF fairly generally.
Using the data in AET, but with only 5,979 students due to missing data on the
binary response $y_1 = hsgrad$, the probit model in Equation 52 can be estimated by
inserting the same generalized residuals used for the linear math12 equation. The exogenous
variables are exactly as before, with the distance dummies playing the role of
instruments. The coefficient from the second stage probit on $\hat{r}_{i2}$ is 0.626 (t = 3.15),
suggesting a strong form of self-selection into attending a Catholic high school. The average
partial effect of cathhs using the two-step CF approach is actually negative, -0.082,
with p-value above 0.25. Thus, in this simple model, there is no evidence that
attending a Catholic high school has a positive causal effect on graduating from high school.
When the generalized residuals are dropped so that cathhs is treated as exogenous, the
APE is 0.047 (t = 4.92), suggesting a nontrivial positive and very statistically
significant effect. A complete set of estimates is available on request.
As in the case with a continuous EEV, I can use flexible parametric models to allow
general interactive effects inside the probit function. For example,

(55) $P(y_{i1} = 1 \mid z_i, y_{i2}, r_{i2}) = \Phi(x_{i1}\beta_1 + \rho_1 r_{i2} + x_{i1} r_{i2}\psi_1)$,

where $x_{i1}$ is a general function of $(z_{i1}, y_{i2})$ and includes an intercept. I can use a standard
Wald test of $H_0: \rho_1 = 0, \psi_1 = 0$ after replacing $r_{i2}$ with the generalized residuals from
the first-stage probit. The average structural function is estimated as in Equation 50
with $\hat{r}_{i2}$ replacing $\hat{v}_{i2}$.
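
For estimation, Equation 55 just means augmenting each observation's regressor row with the generalized residual and its interactions with the covariates. The sketch below builds such a row; the function name and the example values are hypothetical. Because $x_{i1}$ includes an intercept, the interaction $x_{i1} r_{i2}$ already contains $r_{i2}$ itself, so the sketch leaves the constant out of the interaction block to avoid entering $r_{i2}$ twice. That handling is a reading adopted here for the illustration, not something the text spells out.

```python
def second_stage_row(x1, r_hat):
    """Regressors for the flexible probit: x1, r_hat, and r_hat times non-constant x1.

    x1 is assumed to start with the intercept (1.0), followed by functions of
    (z1, y2). The constant is excluded from the interaction block so that
    r_hat is not duplicated.
    """
    if x1[0] != 1.0:
        raise ValueError("x1 is expected to start with an intercept of 1.0")
    interactions = [r_hat * xj for xj in x1[1:]]
    return list(x1) + [r_hat] + interactions

# Example: intercept, one exogenous covariate, and the binary EEV y2 = 1.
row = second_stage_row([1.0, 2.0, 1.0], 0.5)
print(row)
```

A joint Wald test of exogenity then tests that the coefficient on the residual and all interaction coefficients are zero.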
If one embraces the flexibility of the control function approach when combined
with sensible parametric functional forms, problems that can be computationally
demanding using traditional approaches become much easier. For example, in the binary
response model, there might be a continuous EEV, say $y_2$, and a binary EEV, say $y_3$.
One can include functions of the OLS residuals from the reduced form for $y_2$ and the
generalized residuals from the reduced form probit model for $y_3$ in a second-stage
probit model for $y_1$. These functions might include quadratics, cubics, and various
interactions among the OLS residuals, generalized residuals, and observed covariates.
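
As a rough sketch of what such a set of control function terms could look like, the helper below takes one observation's OLS residual, generalized residual, and covariates and returns levels, squares, cubes, the cross product, and residual-by-covariate interactions. The particular selection and ordering of terms, and the name `cf_terms`, are assumptions made for illustration; which terms to keep is an empirical choice.

```python
def cf_terms(v_hat, r_hat, covariates):
    """Control function terms for one observation.

    v_hat: OLS residual from the reduced form of the continuous EEV.
    r_hat: generalized residual from the reduced-form probit of the binary EEV.
    covariates: non-constant observed covariates to interact with the residuals.
    """
    terms = [
        v_hat, r_hat,                 # levels
        v_hat ** 2, r_hat ** 2,       # quadratics
        v_hat * r_hat,                # interaction between the two residuals
        v_hat ** 3, r_hat ** 3,       # cubics
    ]
    for x in covariates:
        terms.append(v_hat * x)       # OLS residual times covariate
        terms.append(r_hat * x)       # generalized residual times covariate
    return terms

example = cf_terms(2.0, -1.0, [3.0])
print(example)
```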

V. Concluding Remarks

This survey of control function methods has focused on cross-sectional
applications where the average partial effects on a mean response function are of
primary interest; hence my focus on the average structural function. But one need not
focus on the mean. For example, Imbens and Newey (2009) defines the notion of a
"quantile structural function" and derives control function methods under
monotonicity. It is important to understand that such a change of focus often restricts the amount
of heterogeneity that one may have in a model, especially when one approaches the
problem from a nonparametric perspective.
Control function methods are also very useful in panel data applications where
one must account for unobserved heterogeneity as well as endogeneity. Papke
and Wooldridge (2008) shows how the CF approach can be combined with the
Chamberlain-Mundlak device for handling time-constant heterogeneity. Two-step
estimation methods, where the first stage is a linear reduced form, are computationally


148.88.247.20 on Fri, 25 Oct 2024 10:56:43 UTC
All use subject to https://about.jstor.org/terms
444 The Journal of Human Resources

simple and are consistent and asymptotically normal in the presence of serial
correlation of unknown form. Altonji and Matzkin (2005) considers nonparametric
identification of panel data models and endogeneity in a very general setting.

References

Altonji, Joseph, Todd Elder, and Christopher Taber. 2005. "An Evaluation of Instrumental
Variable Strategies for Estimating the Effects of Catholic Schooling." Journal of Human
Resources 40(4):791-821.
Altonji, Joseph, and Rosa Matzkin. 2005. "Cross Section and Panel Data Estimators for
Nonseparable Models with Endogenous Regressors." Econometrica 73(4):1053-102.
Barnow, Burt, Glen Cain, and Arthur Goldberger. 1981. "Selection on Observables."
Evaluation Studies Review Annual 5(1):43-59.
Blundell, Richard, and James Powell. 2003. "Endogeneity in Nonparametric and
Semiparametric Regression Models." In Advances in Economics and Econometrics: Theory and
Applications, Eighth World Congress, Volume 2, ed. Mathias Dewatripont, Lars Hansen, and
Stephen Turnovsky, 312-57. Cambridge: Cambridge University Press.
Blundell, Richard, and James Powell. 2004. "Endogeneity in Semiparametric Binary Response
Models." Review of Economic Studies 71(3):655-79.
Cameron, Colin, and Pravin Trivedi. 2005. Microeconometrics: Methods and Applications.
New York: Cambridge University Press.
Card, David. 2001. "Estimating the Return to Schooling: Progress on Some Persistent
Econometric Problems." Econometrica 69(5):1127-60.
Chesher, Andrew. 2003. "Identification in Nonseparable Models." Econometrica 71(5):1405-41.
Garen, John. 1984. "The Returns to Schooling: A Selectivity Bias Approach with a Continuous
Choice Variable." Econometrica 52(5):1199-218.
Goldberger, Arthur. 2008. "Selection Bias in Evaluating Treatment Effects: Some Formal
Illustrations." In Advances in Econometrics, Volume 21, ed. Daniel Millimet, Jeffrey Smith,
and Edward Vytlacil, 1-31. Amsterdam: Elsevier.
Hausman, Jerry. 1978. "Specification Tests in Econometrics." Econometrica 46(6):1251-71.
Heckman, James. 1976. "The Common Structure of Statistical Models of Truncation, Sample
Selection and Limited Dependent Variables and a Simple Estimator for Such Models."
Annals of Economic and Social Measurement 5(4):475-92.
Heckman, James, and Richard Robb. 1985. "Alternative Methods for Evaluating the Impact of
Interventions: An Overview." Journal of Econometrics 30(1-2):239-67.
Heckman, James, and Edward Vytlacil. 1998. "Instrumental Variables Methods for the
Correlated Random Coefficient Model: Estimating the Average Rate of Return to Schooling When
the Return Is Correlated with Schooling." Journal of Human Resources 33(4):974-87.
Heckman, James, Justin Tobias, and Edward Vytlacil. 2003. "Simple Estimators for Treatment
Parameters in a Latent-Variable Framework." Review of Economics and Statistics 85(3):748-55.
Imbens, Guido, and Joshua Angrist. 1994. "Identification and Estimation of Local Average
Treatment Effects." Econometrica 62(2):467-75.
Imbens, Guido, and Whitney Newey. 2009. "Identification and Estimation of Triangular
Simultaneous Equations Models Without Additivity." Econometrica 77(5):1481-512.
Imbens, Guido, and Jeffrey Wooldridge. 2009. "Recent Developments in the Econometrics of
Program Evaluation." Journal of Economic Literature 47(1):5-86.
Papke, Leslie, and Jeffrey Wooldridge. 2008. "Panel Data Methods for Fractional Response
Variables with an Application to Test Pass Rates." Journal of Econometrics 145(1-2):121-33.
Petrin, Amil, and Kenneth Train. 2010. "A Control Function Approach to Endogeneity in
Consumer Choice Models." Journal of Marketing Research 47(1):3-13.
Rivers, Douglas, and Quang Vuong. 1988. "Limited Information Estimators and Exogeneity
Tests for Simultaneous Probit Models." Journal of Econometrics 39(3):347-66.
Smith, Richard, and Richard Blundell. 1986. "An Exogeneity Test for a Simultaneous Equation
Tobit Model with an Application to Labor Supply." Econometrica 54(3):679-85.
Terza, Joseph, Anirban Basu, and Paul Rathouz. 2008. "Two-Stage Residual Inclusion
Estimation: Addressing Endogeneity in Health Econometric Modeling." Journal of Health
Economics 27(3):531-43.
Wooldridge, Jeffrey. 2003. "Further Results on Instrumental Variables Estimation of
Average Treatment Effects in the Correlated Random Coefficient Model." Economics Letters
79(2):185-91.
Wooldridge, Jeffrey. 2005. "Unobserved Heterogeneity and Estimation of Average Partial
Effects." In Identification and Inference for Econometric Models: Essays in Honor of Thomas
Rothenberg, ed. Donald Andrews and James Stock, 27-55. Cambridge: Cambridge University Press.
Wooldridge, Jeffrey. 2010. Econometric Analysis of Cross Section and Panel Data, 2nd edition.
Cambridge: MIT Press.
Wooldridge, Jeffrey. 2014. "Quasi-Maximum Likelihood Estimation and Testing for Nonlinear
Models with Endogenous Explanatory Variables." Journal of Econometrics 182(1):226-34.
