JMP_JHLee
Regression Model
September 1, 2024
Abstract
We consider a unifying framework to test for direct and indirect treatment
effects in nonlinear models. Specifically, we extend a generalized linear-index
model to incorporate endogenous treatments and endogenous mediators. We
propose kernel-weighted Kendall’s tau statistics to test the significance of the di-
rect and indirect effects of endogenous treatments on the outcome variable medi-
ated by endogenous mediators. The proposed semiparametric model allows for
treatments and mediators to be discrete, continuous, and/or censored/truncated.
For the indirect effect, we construct two distinct kernel-weighted Kendall’s tau
statistics that capture the effect of (i) the treatment on the mediator and (ii) the
mediator on the outcome. Applying the testing approach of van Garderen and
van Giersbergen [2020] avoids the problem of under-sized testing of the joint
null hypothesis associated with the indirect effect. Monte Carlo simulations
investigate the performance of the semiparametric testing approach.
1 Introduction
Causal mediation analysis decomposes a treatment effect into two components: the
indirect effect (or causal mediation effect), which occurs through one or more me-
diator variables, and the direct effect, which captures the impact of the treatment
through other causal mechanisms not involving the mediator variable(s) of interest.
The term ’mediator’ refers to a variable that is affected by the treatment and, in turn,
affects the outcome variable of interest. If policymakers find that a treatment has an
indirect effect on the outcome variable through a specific mediator, they may focus
their efforts on that causal mechanism to achieve their desired goal. Starting with Judd and
Kenny [1981] and Baron and Kenny [1986], causal mediation analysis was largely
developed by psychologists and sociologists. However, this framework has gained
popularity among economists as a tool to identify direct and indirect effects in eco-
nomic models (e.g., Heckman et al. [2013] and Heckman and Pinto [2015], among
many others).
There are two primary approaches to identifying causal mediation effects: the
parametric model approach and the potential outcome framework 1 . Sobel [1982]
and Baron and Kenny [1986] introduced linear models to identify indirect effects.
This method gives an easy and intuitive interpretation of indirect effects. How-
ever, it is restrictive since it does not fully incorporate interactions between variables
and assumes independence between outcome variables. The potential outcome ap-
proach is more flexible, as it typically does not assume parametric specification of
variables (e.g., Robins and Greenland [1992], Pearl [2022], among many others). A
recent contribution that shows the advantage of this approach is the identification of
ACME (average causal mediation effect) by Imai et al. [2010a]. However, the poten-
tial outcome framework typically hinges on strong statistical assumptions to achieve
nonparametric identification of indirect effects. For example, the identification strat-
egy of Imai et al. [2010a] requires sequential ignorability, which is untestable like the
unconfoundedness assumption.
At the same time, most research in causal mediation analysis considers (condi-
tionally) random treatment, which is more relevant to experimental settings. Even
if the treatment is randomized, it is often the case that the mediator is not random-
ized. Consequently, the endogeneity of the mediator can arise due to post-treatment
confounders. Although there are models that utilize instrumental variables to han-
dle endogenous treatment/mediator (e.g., Powdthavee et al. [2013], Jun et al. [2016],
Frölich and Huber [2017], Dippel et al. [2020], Chen et al. [2020], etc), there have been
few studies appropriate for quasi-experimental designs in economics.
This paper introduces a new semiparametric model that offers several advan-
tages over previous models. Using this model, one can identify and estimate the
existence and direction of direct and indirect effects in the presence of not only an en-
dogenous treatment but also an endogenous mediator. Unlike previous models that
focus on specific types of treatment, mediator, and outcome variables, our model
accommodates general outcome types (continuous, discrete, truncated, and/or cen-
sored). The model also allows for heterogeneous direct and indirect effects. The
model includes an endogenous mediator that is affected by the treatment and
subsequently affects the outcome variable of interest. Specifically, this extends the
generalized regression framework by including an endogenous treatment and an
endogenous mediator. The model has a triangular structure, with three equations that
correspond to the treatment, the mediator, and the outcome.
1 See Huber [2020] and Celli [2022] for a general review of causal mediation models.
Semiparametric test statistics, called conditional Kendall’s tau statistics, are pro-
posed to test the existence of direct and indirect effects in the model. Estimating
the direct effect requires just one conditional Kendall’s tau because it only takes into
account the treatment and the outcome variable of interest. On the other hand, the
indirect effect consists of two quantities: (i) the effect of the treatment on the medi-
ator and (ii) the effect of the mediator on the outcome. There exists a causal medi-
ation effect of the treatment if and only if these two effects are nonzero. Therefore,
we can conduct a joint hypothesis test based upon two distinct conditional Kendall’s
tau statistics that capture (i) and (ii), respectively. These two Kendall’s tau statistics
depend on estimated linear-index parameters in the treatment, mediator, and
outcome. We first estimate linear index parameters using certain matching conditions
that compare a pair of observations of lower-level outcome variables and their linear
indices. We show that the estimated linear indices are √n-consistent and
asymptotically normal. Then, we derive conditional Kendall’s tau statistics using the
plug-in method. Finally, we prove that given √n-consistent and asymptotically normal
estimators for linear indices, our new estimator that captures (ii) based on the plug-in
method is also √n-consistent and asymptotically normal.
Using these two test statistics and a new method introduced by van Garderen
and van Giersbergen [2020], we make simultaneous inference to test the existence
of the indirect effect. It has been noted that classic test methods (e.g., Sobel’s
test and the joint significance test) are severely under-sized when the true values of (i) and (ii)
are very close to or equal to zero. Although the test method in van Garderen and
are very close to or equal to zero. Although the test method in van Garderen and
van Giersbergen [2020] was originally developed for the linear causal mediation
model without treatment/mediator endogeneity, we show that it can be applied to
the semiparametric model with endogeneity.
This paper also provides Monte Carlo evidence to verify that the theoretical
results are valid. The data generating processes are based on different types
of treatment variables and mediator variables. The simulation results show that the
testing method (i) gives correct size of the test and (ii) can detect non-zero indirect
effects because its statistical power increases as the sample size increases and true
indirect effects assumed in the DGPs deviate from zero.
between variables.
Some exceptions that introduce two instrumental variables for the treatment and
mediator are Powdthavee et al. [2013], Burgess et al. [2015], Jhun [2015], and Chen
et al. [2020]. They assume different parametric structures to identify indirect ef-
fects. Frölich and Huber [2017] discuss nonparametric identification of direct and
indirect effects among treatment compliers in the presence of the endogenous treat-
ment/mediator. They assume binary treatment variable and consider three different
scenarios based on a distribution of the mediator and the instrument for the me-
diator. Specifically, the mediator and instrument can be either discrete or continu-
ous, but they cannot both be discrete. In contrast, this paper permits various types
of treatment and mediator, such as continuous, censored, or truncated variables.
Also, the work proposed here focuses on the statistical significance of the indirect
effect rather than its magnitude. Additional studies addressing the issue of treat-
ment/mediator endogeneity include Dippel et al. [2020] and Dippel et al. [2022],
which consider parametric and nonparametric identification of indirect effects in the
presence of two sources of endogeneity. However, their approach focuses on a spe-
cific scenario where a single IV can address both the endogeneity of the treatment
and the mediator.
Although our model is semiparametric, the new test statistics that capture the
indirect effect have properties similar to those of classic estimators for the linear
causal mediation model (e.g., asymptotic normality). To be specific, Sobel [1982] and
Baron and Kenny [1986] consider the system of equations for the causal mediation
and least square estimators for the indirect effect. This paper shows that Kendall’s
tau statistics derived from our model that capture indirect effects can be used to
make simultaneous inference like those estimators. However, our test statistics are
different from classic estimators in that these statistics themselves do not represent
the magnitude of the indirect effect. Rather than magnitude, we focus on the sign
and existence of indirect effects and the statistical significance of the indirect effect
in terms of z−statistics. This is due to the flexibility of the model (see Abrevaya et al.
[2010] for a discussion).
This paper also adds to the literature on the identification and inference on the
sign of causal effects in the presence of endogeneity (e.g., Shaikh and Vytlacil [2005],
Bhattacharya et al. [2008], Chiburis [2010], Shaikh and Vytlacil [2011], and Machado
et al. [2019]). Specifically, Abrevaya et al. [2010] extend the generalized regression
framework and propose a kernel-weighted version of Kendall’s tau statistics that
captures the existence and direction of a causal effect of an endogenous regressor.
Also, Kline [2016] identifies the existence and direction of a causal effect at a partic-
ular value of exogenous variables, instead of overall causal effects, using the gen-
eralized regression framework. In contrast, this paper focuses on the existence and
direction of the causal mediation effect and the associated inference.
1.2 Outline
The rest of this paper is organized as follows. Section 2 introduces the generalized
regression model, which is an extension of Han [1987]. The system of equations in-
cludes two possibly endogenous regressors to allow for the treatment and mediator
endogeneity. Section 3 introduces a three-step procedure for testing the significance of
the indirect effect of the endogenous regressor mediated by the endogenous mediator.
A kernel-weighted Kendall’s tau coefficient computed in the third stage, which
is √n-consistent and asymptotically normal, captures the effect of the mediator on
the outcome. To test the significance of the indirect effect, this statistic is combined
with another kernel-weighted Kendall’s tau statistic that captures the effect of the
treatment on the mediator. We review some issues that classic tests for indirect ef-
fects have and introduce a testing method that overcomes these problems. Com-
plete proofs of the theorems in this section are relegated to the Appendix. Section
4 provides Monte Carlo simulation results to evaluate the performance of the third-
stage test statistics for the indirect effects. The pattern of results shows that the new
method performs well in terms of the size of the test and statistical power.
2 The Model
We consider a system of three equations that constitute the generalized regression
model for the outcome variable Y1 , the mediator variable Y2 , and the treatment vari-
able Y3 . Each variable has an associated latent variable, determined by an unknown
function F . For each variable, the function D describes the form of the observable
variable.
where X2 = (X1 , Z2 ) and X3 = (X1 , Z3 ) 2 . The model for the latent dependent
variables (Y1∗ , Y2∗ , Y3∗ ) has a general linear-index form, where (ϵ1 , ϵ2 , ϵ3 ) ∈ R3 rep-
resents the vector of error disturbances. (X1 , Z2 , Z3 ) are assumed to be independent
of (ϵ1 , ϵ2 , ϵ3 ).
There are no restrictions on the correlation structure among the error distur-
bances (ϵ1 , ϵ2 , ϵ3 ), which allows for endogeneity of the treatment and the mediator.
To handle endogeneity issues, the exclusion restrictions for Y2 and Y3 are provided
by the two subcomponents Z2 and Z3 in X2 and X3 , respectively.
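To make the triangular structure concrete, the following minimal sketch simulates one data-generating process that fits the description above. Every functional form and coefficient in it is an illustrative assumption rather than the specification used in this paper: a binary treatment, a continuous mediator, an outcome censored at zero, and a common shock that correlates the three error disturbances. The function name `simulate_triangular` is hypothetical.

```python
import random

def simulate_triangular(n, seed=0):
    """Draw n observations from an illustrative triangular model.

    All functional forms and coefficients are hypothetical choices made
    for this sketch; they are not taken from the paper.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, z2, z3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # A common shock u makes (eps1, eps2, eps3) mutually correlated,
        # so both the treatment and the mediator are endogenous.
        u = rng.gauss(0, 1)
        eps1, eps2, eps3 = (u + rng.gauss(0, 1) for _ in range(3))
        # Treatment: binary; Z3 enters only this equation.
        y3 = 1 if 0.5 * x1 + 1.0 * z3 + eps3 > 0 else 0
        # Mediator: continuous; Z2 is excluded from the outcome equation.
        y2 = 0.5 * x1 + 1.0 * z2 + 0.8 * y3 + eps2
        # Outcome: latent variable censored at zero, i.e. D1 is max(0, .).
        y1 = max(0.0, 1.0 * x1 + 0.7 * y2 + 0.3 * y3 + eps1)
        data.append({"y1": y1, "y2": y2, "y3": y3,
                     "x1": x1, "z2": z2, "z3": z3})
    return data

sample = simulate_triangular(1000)
```

The regressors (X1 , Z2 , Z3 ) are drawn independently of the errors, while the shared shock leaves the correlation among the errors unrestricted in spirit; other outcome types (discrete or truncated) would only change the last transformation in each equation.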
Powell et al. [1989], Ichimura [1993], Klein and Spady [1993], and Abrevaya
[2000] previously considered nonadditive index models. Chesher [2003], Chesher
[2005], and Imbens and Newey [2009] analyzed triangular models without linear in-
dex. Contrary to those models, each equation in our model is allowed to depend on
not only linear index but also lower-level outcome variables.
2 One possible extension is X2 = (X1 , Z2 ) and X3 = (X1 , Z2 , Z3 ). Because Z2 for the mediator Y2
is allowed to enter both X2 and X3 , it may also affect the treatment Y3 .
The functions F1 and F2 are assumed to be strictly monotone in their first
arguments (linear indices) and the last arguments (error disturbances). Furthermore,
these two functions are assumed to be weakly monotone in the lower-level outcomes
included as additional indices. Also, we assume that the functions D1 and D2 are
weakly increasing and nondegenerate. These assumptions are formally stated as
follows:
Assumption F.
(i) The function F1 (·, y2 , y3 , e1 ) is strictly monotone for all y2 , y3 , and e1 . Also, the func-
tion F2 (·, y3 , e2 ) is strictly monotone for all y3 and e2 .
(ii) The function F1 (x′1 β0 , y2 , y3 , ·) is strictly monotone for every x′1 β0 , y2 , and y3 . Also,
the function F2 (x′2 γ0 , y3 , ·) is strictly monotone for all x′2 γ0 and y3 .
(iii) The function F1 (x′1 β0 , ·, y3 , e1 ) is weakly monotone for all x′1 β0 , y3 , and e1 . Also,
F1 (x′1 β0 , y2 , ·, e1 ) is weakly monotone for all x′1 β0 , y2 , and e1 . The function F2 (x′2 γ0 , ·, e2 )
is weakly increasing for all x′2 γ0 and e2 .
Assumption D. The functions D1 and D2 are nondegenerate and weakly increasing in Y1∗
and Y2∗ , respectively.
y2′ > y2′′ =⇒ F1 (ν, y2′ , y3 , e) ≥ F1 (ν, y2′′ , y3 , e) for all ν, y3 , and e or
y2′ > y2′′ =⇒ F1 (ν, y2′ , y3 , e) ≤ F1 (ν, y2′′ , y3 , e) for all ν, y3 , and e, (5)
with strict inequality on some region of the support of Y2 . Note that the variable Y3
can be a moderator in the relationship between Y2 and Y1 , which is possible since
Y3 and Y2 are allowed to interact in the equation for the outcome Y1 . Thus, Y3 can
influence the strength and direction of the effect of Y2 on Y1 . In this case, the indirect
effect can be described as "moderated mediation" (e.g., Model 1 in Preacher et al.
[2007]).
3 Note that without (1) for the outcome variable of interest, the system of equations that comprises
(2) and (3) for the mediator and the treatment is the same as the one considered in Abrevaya et al.
[2010] in the context of causal effects without endogenous mediator. Therefore, we can focus on the
functions F1 and D1 in this paper to see how Y2 affects Y1 .
At this point, we discuss our approach for handling the direct and indirect effects
of Y3 on Y1 . For given values y2 , x′1 β0 , and y3++ ≠ y3+ , the individual direct effect can be
defined as
We will introduce a parameter τ31 , a type of rank correlation that measures the
association between Y3 and Y1 . The null hypothesis for testing the direct effect is
simply
H0 : τ31 = 0. (7)
In addition, for given values y3 , x′1 β0 , x′2 γ0 , and y3++ ≠ y3+ , the individual indirect effect
(or mediation effect) can be defined as
where
y2++ = D2 (F2 (x′2 γ0 , y3++ , ϵ2 )), and y2+ = D2 (F2 (x′2 γ0 , y3+ , ϵ2 )). (9)
Per (8) and (9), a nonzero causal mediation effect requires two conditions: (i) the
variation in Y3 effectively shifts Y2 , and (ii) the variation in Y2 effectively shifts Y1 . Put
differently, there is no causal mediation effect if either of these conditions is not sat-
isfied. In Section 3, we introduce two parameters τ32 and τ21 that capture the effect
of Y3 on Y2 and the effect of Y2 on Y1 , respectively. To test the indirect effect of Y3 on
Y1 in the model, we can construct two equivalent null hypotheses:
H0 : τ32 = 0 or τ21 = 0 ⇐⇒ H0 : τ32 τ21 = 0. (10)
conditional independence of the mediator) to handle endogeneity issues (Imai et al.
[2010a] and Imai et al. [2010b]). However, this assumption is arguably strong and not
testable 4 . One may also consider using a parametric or nonparametric IV approach
(e.g., Sobel [2008], Frölich and Huber [2017], Dippel et al. [2020], and Chen et al.
[2020]). However, these approaches typically restrict the type of variables and/or
the type of endogeneity. Each method has its own advantage if assumptions for the
identification fit into the specific circumstances or context being considered. At the
same time, instead of attempting to identify and estimate an exact magnitude of the
effect (for a given level of treatment Y3 in Y2 and Y1 ), verifying the existence of the
indirect effect under weaker assumptions may have advantages over other methods
if assumptions required for the identification are too restrictive or less plausible.
Y1 = β1 X1 + β2 Y2 + β3 Y3 + ϵ1 , (11)
Y2 = γ1 X1 + γ2 Y3 + ϵ2 , (12)
Y3 = δ1 X1 + ϵ3 . (13)
The model assumes that the treatment Y3 comes before the mediator Y2 and the me-
diator comes before the outcome Y1 , and there is no reverse causality. Additionally,
the model assumes that there is no interaction between the treatment and the media-
tor when considering their effect on the outcome. Substituting the equation (12) into
(11):
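For reference, the substitution can be carried out term by term using only (11) and (12). The grouping below is a worked sketch of that algebra, not a restatement of the paper's own display:

```latex
\begin{aligned}
Y_1 &= \beta_1 X_1 + \beta_2\,(\gamma_1 X_1 + \gamma_2 Y_3 + \epsilon_2) + \beta_3 Y_3 + \epsilon_1 \\
    &= (\beta_1 + \beta_2\gamma_1)\, X_1 + (\beta_3 + \gamma_2\beta_2)\, Y_3 + (\epsilon_1 + \beta_2\epsilon_2).
\end{aligned}
```

The coefficient on Y3 therefore splits into β3 , the direct effect, and the product γ2 β2 , the part that operates through Y2 . That product is the quantity that appears in the null hypothesis (14).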
Figure 1: The causal DAG for the parametric IV model and the generalized regression
model.
[Figure: nodes Z3 , Z2 , X1 , Y3 , Y2 , Y1 ; path labels τ32 (γ3 ) from Y3 to Y2 , τ21 (β2 ) from Y2 to Y1 , and τ31 (β3 ) from Y3 to Y1 .]
Note: This DAG describes the causal relationships between variables in the parametric IV model, (15)
- (17), and the generalized regression model, (1) - (3). The parameters associated with the indirect
effect (γ3 and β2 , and τ32 and τ21 ) lie on the same path.
H0 : γ2 = 0 or β2 = 0 ⇐⇒ H0 : γ2 β2 = 0. (14)
Various hypothesis testing methods for these two equivalent null hypotheses can
be found in MacKinnon et al. [2002]. One of them is testing the significance of the
product of two parameters, γ2 β2 . For example, Sobel [1982] suggested a method of
testing the significance of a mediation effect using the estimator γ̂2 β̂2 and the delta
method.
The parametric IV model is one possible extension that incorporates the treat-
ment and mediator endogeneity (Huber [2020]). The following system of linear
equations has two instruments (Z2 and Z3 ) that provide exclusion restrictions:
Y1 = β1 X1 + β2 Y2 + β3 Y3 + ϵ1 , (15)
Y2 = γ1 X1 + γ2 Z2 + γ3 Y3 + ϵ2 , (16)
Y3 = δ1 X1 + δ2 Z3 + ϵ3 . (17)
The system of equations (1) − (3) nests the above linear model. In this model, the
direct effect of Y3 on Y1 is β3 , and the product γ3 β2 is the indirect effect of Y3 on Y1 .
Similar to the typical 2SLS with two equations for the treatment and the outcome,
one can identify γ3 , β2 , and β3 by replacing Y3 in (15) and (16) with Ŷ3 ≡ E[Y3 |X1 , Z3 ]
and Y2 in (15) with Ŷ2 ≡ E[Y2 |X1 , Z2 , Ŷ3 ] in the presence of treatment and mediator
endogeneity if Z2 and Z3 are independent of or uncorrelated with (ϵ1 , ϵ2 , ϵ3 ).
γ3 and β2 play analogous roles to the parameters τ32 and τ21 described above.
Figure 1 is the causal directed acyclic graph (DAG) for the parametric IV model, (15)
- (17), and the generalized regression model, (1) - (3). These two models share the
same causal relationships between variables. Also, the parameters associated with
the indirect effect (γ3 and β2 , and τ32 and τ21 ) lie on the same paths.
1. Estimate δ0 in Y3 .
3. Estimate τ̂32 given (δ̂, γ̂), and τ̂21 given (δ̂, γ̂, β̂).
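\[
\Pr(\epsilon_1 \le e \mid Y_2, Y_3, X_2, X_3) =
\begin{cases}
\Pr\big(\epsilon_1 \le e \mid \epsilon_3 \le -X_3'\delta_0,\; Y_2 = D_2(F_2(X_2'\gamma_0, 0, \epsilon_2))\big) & \text{if } Y_3 = 0, \\
\Pr\big(\epsilon_1 \le e \mid \epsilon_3 > -X_3'\delta_0,\; Y_2 = D_2(F_2(X_2'\gamma_0, 1, \epsilon_2))\big) & \text{if } Y_3 = 1.
\end{cases}
\]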
Hence, it follows that, for two observations indexed by i ≠ j, ϵ1i |Y2i , Y3i , X2i , X3i and
ϵ1j |Y2j , Y3j , X2j , X3j have the same distribution if
\[
Y_{3i} = Y_{3j}, \qquad X_{3i}'\delta_0 = X_{3j}'\delta_0, \qquad Y_{2i} = Y_{2j}, \qquad X_{2i}'\gamma_0 = X_{2j}'\gamma_0.
\]
We consider pairs of observations that satisfy these four matching conditions to es-
timate β0 . Then, by the two properties (4) and (5) implied by Assumptions F and D,
it holds that
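\[
\begin{aligned}
X_{1i}'\beta_0 \ge X_{1j}'\beta_0 \iff{} & \Pr(Y_{1i} > Y_{1j} \mid X_{1i}, X_{1j}, Y_{3i} = Y_{3j}, Y_{2i} = Y_{2j}, X_{3i}'\delta_0 = X_{3j}'\delta_0, X_{2i}'\gamma_0 = X_{2j}'\gamma_0) \\
& \ge \Pr(Y_{1i} < Y_{1j} \mid X_{1i}, X_{1j}, Y_{3i} = Y_{3j}, Y_{2i} = Y_{2j}, X_{3i}'\delta_0 = X_{3j}'\delta_0, X_{2i}'\gamma_0 = X_{2j}'\gamma_0).
\end{aligned}
\tag{18}
\]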
Equation (18) says that, conditional on the matching conditions, it is more likely that
Y1i > Y1j than Y1i < Y1j given an inequality X1i′ β0 ≥ X1j′ β0 . Based on (18), we can
construct a population version of the objective function G(θ) where parameter vector
θ0 is a solution. The conditional rank correlation G(θ) between Y1 and X1′ θ is
One may construct a sample objective function based on (19) using the analogy prin-
ciple:
\[
\frac{1}{n(n-1)} \sum_{i \ne j} 1\{Y_{1i} > Y_{1j}\}\, 1\{X_{1i}'\theta > X_{1j}'\theta\}
\times 1\{Y_{3i} = Y_{3j}\}\, 1\{Y_{2i} = Y_{2j}\}\, 1\{X_{3i}'\delta_0 = X_{3j}'\delta_0\}\, 1\{X_{2i}'\gamma_0 = X_{2j}'\gamma_0\}.
\]
However, we face two technical challenges. First, the true parameters in the linear
indices X3′ δ0 and X2′ γ0 are unknown. Second, the events Y2i = Y2j , X3i′ δ0 = X3j′ δ0 ,
and X2i′ γ0 = X2j′ γ0 may have measure zero if the corresponding regressors have
continuous components. In this case, the value of the objective function is zero.
To address these issues we use estimates of δ0 and γ0 , and kernel weights for the
matching conditions. We introduce the conditional MRC estimator for θ0 , which is
obtained by maximizing the following objective function Ĝn (θ):
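\[
\hat{G}_n(\theta) = \frac{1}{n(n-1)} \sum_{i \ne j} 1\{Y_{1i} > Y_{1j}\}\, 1\{X_{1i}'\theta > X_{1j}'\theta\}
\times 1\{Y_{3i} = Y_{3j}\}\, k_h(Y_{2i} - Y_{2j})\, k_h(X_{3i}'\hat\delta - X_{3j}'\hat\delta)\, k_h(X_{2i}'\hat\gamma - X_{2j}'\hat\gamma),
\tag{20}
\]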
where $k_h(u) \equiv h_n^{-1} K(u/h_n)$ for a kernel function K(·) and a bandwidth hn . The
kernel weighting places more weight on pairs of observations where Y2i ≃ Y2j ,
X3i′ δ̂ ≃ X3j′ δ̂, and X2i′ γ̂ ≃ X2j′ γ̂. Note that if Y3 is not binary, the indicator function
that is used to match Y3i with Y3j would also be replaced with a kernel function.
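The kernel-weighted objective described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a Gaussian kernel, a single bandwidth `h` for all three kernels, a binary treatment matched exactly, and takes the estimated first- and second-stage indices as given. The function and argument names are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def conditional_mrc_objective(theta, y1, x1, y2, y3, idx3, idx2, h):
    """Kernel-weighted rank objective evaluated at a candidate theta.

    y1, y2, y3 : lists of outcomes, mediators, and binary treatments
    x1         : list of covariate vectors entering the outcome index
    idx3, idx2 : estimated indices X3'delta_hat and X2'gamma_hat
    h          : bandwidth shared by all kernels (a simplification)
    """
    n = len(y1)

    def kh(u):
        return gaussian_kernel(u / h) / h

    index1 = [sum(t * v for t, v in zip(theta, row)) for row in x1]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j or y3[i] != y3[j]:
                continue  # exact match on the binary treatment
            if y1[i] > y1[j] and index1[i] > index1[j]:
                total += (kh(y2[i] - y2[j])
                          * kh(idx3[i] - idx3[j])
                          * kh(idx2[i] - idx2[j]))
    return total / (n * (n - 1))
```

Because the objective is a step function of θ and is unchanged when θ is multiplied by a positive constant, maximizing it in practice calls for a scale normalization and a derivative-free search; both choices are left outside this sketch.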
The objective function (20) is a second-order U -process and a discontinuous function
of θ 7 . To prove the √n-consistency and asymptotic normality of θ̂ for this type
of objective function, we follow similar arguments used in Han [1987], Pakes and
Pollard [1989], Sherman [1993], Sherman [1994a], Sherman [1994b], Abrevaya [1999],
and Khan et al. [2021]. Theorem 1 establishes the asymptotic properties of the esti-
mator of θ0 under appropriate assumptions and regularity conditions introduced in
the Appendix. The asymptotic variance of θ̂ is affected by the estimators of δ0 and γ0
from the first and second stages.
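Theorem 1. If Assumptions 1 - 9 hold, then (i) $\hat\theta \xrightarrow{p} \theta_0$ and (ii) θ̂ is asymptotically normal,
with
\[
\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, V^{-1} \Lambda V^{-1}),
\]
where $V = 2^{-1} E[\nabla_2 \tau_k(\theta_0)]$, $\Lambda = E[\Delta_i \Delta_i']$, and
\[
\Delta_i = \nabla_1 \tau_k(\theta_0)
+ E[\nabla_\theta \mu_{2,\xi_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)] - E[\nabla_\theta \mu_{2,\xi_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)]\, \psi_{\gamma k}
\]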
Note that because directly using the terms in Theorem 1 to estimate the standard
error of θ̂ is challenging, it is recommended to use bootstrapping to estimate se(θ̂).
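A nonparametric bootstrap for a scalar estimator can be sketched as follows. The helper `bootstrap_se` and its arguments are hypothetical names for illustration; the sketch assumes i.i.d. observations and an `estimator` callable that re-runs every estimation stage on each resample, so that first- and second-stage estimation error is carried through.

```python
import random
import statistics

def bootstrap_se(estimator, data, n_boot=500, seed=0):
    """Nonparametric bootstrap standard error of a scalar estimator.

    estimator : function mapping a list of observations to a float; for a
                multi-stage procedure it should repeat all stages
    data      : list of observations, one entry per unit i
    n_boot    : number of bootstrap replications (an arbitrary default)
    """
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(n_boot):
        # Resample units with replacement, keeping each unit's variables together.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(resample))
    return statistics.stdev(draws)
```

For a vector-valued estimator the same loop can be applied component by component. Given the O(n²) cost of each rank objective evaluation, the number of replications is the main driver of run time.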
\[
X_{1i}'\beta_0 = X_{1j}'\beta_0, \qquad Y_{3i} = Y_{3j}.
\]
If there exists a positive effect of Y2 on Y1 given X1′ β0 and Y3 , then the rank correlation
between Y1 and X2′ γ0 is more likely to be positive:
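\[
\begin{aligned}
X_{2i}'\gamma_0 \ge X_{2j}'\gamma_0 \iff{} & \Pr(Y_{1i} > Y_{1j} \mid X_{2i}, X_{2j}, X_{1i}'\beta_0 = X_{1j}'\beta_0, Y_{3i} = Y_{3j}) \\
& \ge \Pr(Y_{1i} < Y_{1j} \mid X_{2i}, X_{2j}, X_{1i}'\beta_0 = X_{1j}'\beta_0, Y_{3i} = Y_{3j}).
\end{aligned}
\]
Similarly, if the effect of Y2 on Y1 is negative, the inequality between the two probabilities is reversed:
\[
\begin{aligned}
X_{2i}'\gamma_0 \ge X_{2j}'\gamma_0 \iff{} & \Pr(Y_{1i} > Y_{1j} \mid X_{2i}, X_{2j}, X_{1i}'\beta_0 = X_{1j}'\beta_0, Y_{3i} = Y_{3j}) \\
& \le \Pr(Y_{1i} < Y_{1j} \mid X_{2i}, X_{2j}, X_{1i}'\beta_0 = X_{1j}'\beta_0, Y_{3i} = Y_{3j}).
\end{aligned}
\]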
The event X1i′ β0 = X1j′ β0 can be measure-zero when X1 has a continuous component.
Also, we do not know β0 and γ0 , so we construct a kernel-weighted version of
Kendall’s tau statistics based upon estimates β̂ and γ̂:
\[
\hat\tau_{21} \equiv \frac{\sum_{i \ne j} \tilde\omega_{ij}\, \mathrm{sgn}(Y_{1i} - Y_{1j})\, \mathrm{sgn}(X_{2i}'\hat\gamma - X_{2j}'\hat\gamma)}{\sum_{i \ne j} \tilde\omega_{ij}},
\]
where ω̃ij ≡ kh (X1i′ β̂ − X1j′ β̂) · 1{Y3i = Y3j }. Kernel weighting is used for continuous
random variables, while the indicator function is used for discrete random variables.
Note that if X1 has a single element, there is no need to estimate the parameter
β0 . Then, the weight ω̃ij simplifies to either kh (X1i − X1j ) · 1{Y3i = Y3j } if X1i is
continuous or 1{X1i = X1j } · 1{Y3i = Y3j } if X1i is discrete. If X1 has no elements, it
follows that ω̃ij = 1{Y3i = Y3j }.
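The kernel-weighted Kendall's tau for the effect of Y2 on Y1 can be computed directly from its definition. The sketch below is an illustration under assumptions: a Gaussian kernel, one continuous matching index, one discrete matching variable matched exactly, and estimated indices supplied by the caller. The names `kernel_weighted_tau` and `sign` are hypothetical.

```python
import math

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def kernel_weighted_tau(y, index, match_index, match_discrete, h):
    """Kernel-weighted Kendall's tau between y and index.

    y              : outcome values (Y1 when computing tau_21)
    index          : estimated index whose ranks are compared (X2'gamma_hat)
    match_index    : continuous matching variable (X1'beta_hat)
    match_discrete : discrete matching variable (Y3), matched exactly
    h              : bandwidth of the Gaussian kernel (an assumed choice)
    """
    n = len(y)
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(n):
            if i == j or match_discrete[i] != match_discrete[j]:
                continue  # indicator weight is zero for unmatched pairs
            u = (match_index[i] - match_index[j]) / h
            w = math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))
            num += w * sign(y[i] - y[j]) * sign(index[i] - index[j])
            den += w
    return num / den if den > 0 else float("nan")
```

When X1 has no elements, passing a constant `match_index` gives every matched pair the same kernel weight, which cancels in the ratio and reproduces the weight 1{Y3i = Y3j }.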
Given √n-consistent and asymptotically normal estimators δ̂, β̂, and γ̂, it can be
shown that as n → ∞ and the kernel bandwidth hn goes to zero, $\hat\tau_{21} \xrightarrow{p} \tau_{21}$, where
(21) identifies the existence of the effect of Y2 on Y1 for the two given matching
conditions. In this sense, τ21 plays a similar role to the parameter β2 of (15) in the
parametric IV model.
Theorem 2 shows that τ̂21 is √n-consistent and asymptotically normal.
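Theorem 2. If Assumptions 1 - 9 hold, then τ̂21 has the following asymptotically linear
representation:
\[
\begin{aligned}
\hat\tau_{21} - \tau_{21}
={} & \frac{\Pr[Y_{3i} = Y_{3j}]}{E[f(\chi_i)]}\, \frac{1}{n} \sum_{i=1}^{n} U(Y_{1i}, \xi_i) + \sum_{j=1}^{n} U(Y_{1j}, \xi_j) \\
& - \sum_{i=1}^{n} E\big[D_1(Y_{3i} = Y_{3j}, \chi_j, \chi_j) f(\chi_j) + D(Y_{3i} = Y_{3j}, \chi_j, \chi_j) f'(\chi_j)\big]\, \psi_{\beta i} \\
& + 2 \sum_{i=1}^{n} E\big[T(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\big]\, \psi_{\gamma i} + o_p(n^{-1/2}),
\end{aligned}
\]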
\[
\hat\tau_{32} \equiv \frac{\sum_{i \ne j} \hat\omega_{ij}\, \mathrm{sgn}(Y_{2i} - Y_{2j})\, \mathrm{sgn}(X_{3i}'\hat\delta - X_{3j}'\hat\delta)}{\sum_{i \ne j} \hat\omega_{ij}},
\]
where ω̂ij ≡ kh (X2i′ γ̂ − X2j′ γ̂). As n → ∞ and the bandwidth hn goes to zero, we can
show that $\hat\tau_{32} \xrightarrow{p} \tau_{32}$, where
(22) identifies the existence of the effect of Y3 on Y2 . Abrevaya et al. [2010] also
established the √n-consistency and asymptotic normality of τ̂32 .
Corollary 1. The z-statistics for τ32 and τ21 , T32 ≡ τ̂32 /se(τ̂32 ) and T21 ≡ τ̂21 /se(τ̂21 ), are
asymptotically independent and normally distributed:
\[
T \xrightarrow{d} N(\mu, I_2), \quad \text{where } T = (T_{32}, T_{21})' \text{ and } \mu = \left( \frac{\tau_{32}}{\sigma(\tau_{32})},\ \frac{\tau_{21}}{\sigma(\tau_{21})} \right)'.
\]
Using the asymptotic properties of τ̂32 and τ̂21 , we can test (10) to determine whether
an indirect effect exists.
Remark. This paper focuses more on simultaneous inference using τ̂32 and τ̂21 to test
the existence of indirect effects. However, we can also construct τ̂31 that captures the
direct effect of Y3 on Y1 . Using τ̂31 , we can test the existence of a direct effect in the
model based on its asymptotic normality. After substituting the equation of Y3 in (3)
into Y1 in (1), we see that
For a fixed value of X1′ β0 and Y2 , the effect of Y3 on Y1 determines the sign of the rank
correlation between Y1 and X3′ δ0 , and vice versa. Therefore, we obtain the following
matching conditions:
\[
X_{1i}'\beta_0 = X_{1j}'\beta_0, \qquad Y_{2i} = Y_{2j}.
\]
Based on these matching conditions, we can construct Kendall’s tau statistics τ̂31
given (δ̂, β̂):
The advantage of the rank-based approach in this section is that it requires fewer
choices of bandwidth parameters for estimation compared to other semiparametric
methods. However, the objective function for this type of rank estimation involves a
double summation over the n(n − 1) observation pairs. It therefore requires O(n²)
calculations, which slows estimation. To enhance computational efficiency, previous
studies such as Cavanagh and Sherman [1998] and Abrevaya [1999] proposed methods
that reduce the number of operations to O(n log n). Additionally, recent improvements
in CPU performance and computational techniques, including parallel computing, have
made the rank-based approach more feasible 9 .
|T |(1) ≡ min{|T32 |, |T21 |},
|T |(2) ≡ max{|T32 |, |T21 |}.
Then we consider critical regions that are bounded by a certain function g and reject
H0 : τ32 τ21 = 0 if |T |(1) > g(|T |(2) ). Specifically, we consider g in $D(\mathbb{R}_0^+, \mathbb{R}_0^+)$ which is
region
Critical Region: $CR_g = \{(T_{32}, T_{21}) \in \mathbb{R}^2 \mid |T|_{(1)} > g(|T|_{(2)})\}$,
Acceptance Region: $\overline{CR}_g = \{(T_{32}, T_{21}) \in \mathbb{R}^2 \mid |T|_{(1)} \le g(|T|_{(2)})\}$.
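The decision rule defined by a boundary function can be written as a short function. In this sketch the boundary `g` is supplied by the caller, and the example boundary is the constant 1.96, which corresponds to the joint significance test at the 5% level; it is not the similar or nearly similar boundary of van Garderen and van Giersbergen [2020], which is not reproduced here. The function names are hypothetical.

```python
def rejects_indirect_null(t32, t21, g):
    """Return True if (t32, t21) falls in the critical region defined by g.

    g maps the larger absolute z-statistic to the threshold that the
    smaller absolute z-statistic must exceed.
    """
    t_min = min(abs(t32), abs(t21))
    t_max = max(abs(t32), abs(t21))
    return t_min > g(t_max)

def joint_significance_boundary(x):
    """Constant boundary: reject only if both |T32| and |T21| exceed 1.96.

    Used purely as an example of a boundary function; the argument x is
    ignored because the threshold does not depend on the larger statistic.
    """
    return 1.96
```

For instance, `rejects_indirect_null(2.5, -3.0, joint_significance_boundary)` evaluates to `True`, while `rejects_indirect_null(2.5, 1.0, joint_significance_boundary)` evaluates to `False`. Any other boundary can be passed in the same way once its values are available.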
Following van Garderen and van Giersbergen [2020], we can show the existence and
uniqueness of a function g that is said to be a similar boundary function. We want to
construct a test that does not depend on the value of the parameter under the null
hypothesis H0 : τ32 τ21 = 0.
Definition 2. g(·) is a similar boundary function if the probability of the critical region CRg
defined by g is constant under H0 : τ32 τ21 = 0,
Proposition 1. (Theorem 2 in van Garderen and van Giersbergen [2020]) A similar
boundary function g(·) exists for testing H0 : τ32 τ21 = 0 if and only if 1/α is an integer 10 (or
trivially α = 0). If it exists, the boundary is unique in $D(\mathbb{R}_0^+, \mathbb{R}_0^+)$.
This test method is called an exact similar test. However, it turns out that the
function g has some undesirable statistical properties. To tackle this issue, we vary the
g-function systematically by minimizing a certain criterion function Q(g) to derive a
test called a nearly similar test. This new g-function that minimizes Q(g) gives a size
of the test that is sufficiently close to a desired significance level α. Furthermore,
this function eliminates undesirable critical regions, such as cases where both T32
and T21 are smaller than 0.1, which can result in inappropriate and counter-intuitive
statistical inferences 11 .
10 Specifically, a similar boundary function g exists and is unique if for a given significance level α,
1/α is an integer. Therefore, we can find g for common significance levels (e.g., 1%, 5%, and 10%).
Therefore, we can obtain valid critical values and size of the test for H0 : τ32 τ21 = 0
using the method suggested by van Garderen and van Giersbergen [2020]. In the
Monte Carlo simulations of Section 4, we use this method to obtain critical values for
a given significance level α.
Bootstrapping
One may consider using the asymptotic distribution of τ̂32 τ̂21 to test the null
hypothesis H0 : τ32 τ21 = 0. This approach would compute the product τ̂32 τ̂21 and
its standard error se(τ̂32 τ̂21 ), and then use these estimates to construct a t-statistic.
However, while the asymptotic normality of τ̂32 and τ̂21 is established, the behavior
of their product τ̂32 τ̂21 is complex. We can show that the product estimator τ̂32 τ̂21 is
asymptotically not pivotal 12 . Note that the null hypothesis H0 : τ32 τ21 = 0 can be
divided into three possible cases:
Case 1: τ32 = 0 and τ21 ≠ 0,
Case 2: τ32 ≠ 0 and τ21 = 0,
Case 3: τ32 = 0 and τ21 = 0.
Recent work by Nadarajah and Pogány [2016] tackled the exact distribution of the
product of two possibly correlated normal random variables. They demonstrated
that the product follows a distribution that is not normal. Based on their findings,
we can show that τ̂32 τ̂21 is asymptotically normal in null Cases 1 and 2, but follows
a non-normal distribution in null Case 3 (i.e., it is not pivotal). Therefore, the bootstrap
method is not valid, since the statistic is not asymptotically pivotal.
11 See Perlman and Wu [1999] for a detailed discussion of exact tests and appropriate statistical
inference.
12 For detailed proof, see the Appendix.
Given the joint asymptotic normality of (τ̂32 , τ̂21 ), one potential approach is to
use testing methods proposed by Sobel [1982]. This method has been widely used
to test the existence of causal mediation effects in linear equations models. To be
specific, we can construct the Wald statistic Tn based on τ̂32 τ̂21 using the first-order
delta method:
\[
T_n = \frac{\hat\tau_{32}\, \hat\tau_{21}}{\sqrt{\hat\tau_{32}^2\, se(\hat\tau_{21})^2 + \hat\tau_{21}^2\, se(\hat\tau_{32})^2}}.
\]
It relies on the asymptotic normality of the estimators used for the transformation
g(x, y) = xy, where (x, y) ∈ R2 . Similar to the bootstrap method, however, the
asymptotic distribution of Sobel’s test statistic depends on the three null cases and is not
pivotal. Using the result of Glonek [1993], we can show that Tn has a discontinuity
in the parameter space. Specifically, the Wald statistic requires the condition Dg(x, y) =
(y, x) ≠ (0, 0), which is not satisfied at (0, 0) 13 . According to the finding of Glonek
[1993], the asymptotic distribution of Tn is dependent on the true parameter (τ32 , τ21 ):
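\[
T_n \xrightarrow{d}
\begin{cases}
N(0, 1) & \text{if } (\tau_{32}, \tau_{21}) \neq (0, 0) \text{ (Cases 1 and 2)} \\
N(0, \tfrac{1}{4}) & \text{if } (\tau_{32}, \tau_{21}) = (0, 0) \text{ (Case 3).}
\end{cases}
\]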
Therefore, Tn is not pivotal. Although we know that Tn is √n-consistent and
asymptotically normal, we cannot rely on the standard normal table to obtain a
critical value that covers all three cases. Recent studies by van Garderen and
van Giersbergen [2020] and Liu et al. [2022] formally demonstrated that (i) in null
Cases 1 and 2, Tn follows N (0, 1) asymptotically but under-rejects in finite samples,
and (ii) in null Case 3, Tn severely under-rejects the null hypothesis H0 : τ32 τ21 = 0
when the true parameters (τ32 , τ21 ) are close to or equal to (0, 0).
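One way to see where the variance 1/4 in Case 3 comes from is the short derivation below. It is a sketch under an added assumption that is not stated in this section: the two z-statistics T32 = τ̂32 /se(τ̂32 ) and T21 = τ̂21 /se(τ̂21 ) are asymptotically independent. The symbols Z1, Z2, R, and Θ are introduced here only for illustration.

```latex
% Divide numerator and denominator of T_n by se(\hat\tau_{32})\,se(\hat\tau_{21}):
T_n = \frac{T_{32}\,T_{21}}{\sqrt{T_{32}^{2} + T_{21}^{2}}},
\qquad
T_{32} = \frac{\hat\tau_{32}}{se(\hat\tau_{32})}, \quad
T_{21} = \frac{\hat\tau_{21}}{se(\hat\tau_{21})}.

% In null Case 3, suppose (T_{32}, T_{21}) \xrightarrow{d} (Z_1, Z_2) with Z_1, Z_2 independent N(0,1).
% Write Z_1 = R\cos\Theta and Z_2 = R\sin\Theta, where \Theta is uniform on [0, 2\pi) and independent of R.
\frac{Z_1 Z_2}{\sqrt{Z_1^{2} + Z_2^{2}}}
  = R\cos\Theta\,\sin\Theta
  = \tfrac{1}{2}\, R\,\sin(2\Theta).

% 2\Theta modulo 2\pi is again uniform and independent of R, so R\sin(2\Theta) \sim N(0,1), and therefore
T_n \xrightarrow{d} \tfrac{1}{2}\, N(0, 1) = N\!\left(0, \tfrac{1}{4}\right).
```

If the two statistics are correlated in the limit, the polar-coordinate step no longer applies directly and the limit has to be worked out separately.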
Figure 2 illustrates the issue that occurs when the true values of (τ32 , τ21 ) are both
equal to zero. The blue and orange lines in the figure represent the asymptotic distri-
butions of the test statistics T32 = τ̂32 /se(τ̂32 ) and T21 = τ̂21 /se(τ̂21 ), respectively. We
see that both test statistics are asymptotically standard normal. However, the green
line represents the asymptotic distribution of the product Tprod = τ̂32 τ̂21 /se(τ̂32 τ̂21 ),
and it follows N (0, 1/4). Therefore, using Tprod and the critical value from the stan-
dard normal distribution leads to severe under-rejection of the null hypothesis in
this case.
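The under-rejection in Case 3 can be checked numerically. The following is a minimal sketch, not the paper's simulation design: it assumes the two estimators are independent, exactly standard normal, and have unit standard errors, and the helper name `sobel_statistic` is hypothetical.

```python
import numpy as np

def sobel_statistic(tau32_hat, tau21_hat, se32, se21):
    """First-order delta-method (Sobel) statistic for the product tau32 * tau21."""
    denom = np.sqrt(tau32_hat**2 * se21**2 + tau21_hat**2 * se32**2)
    return tau32_hat * tau21_hat / denom

rng = np.random.default_rng(0)
n_draws = 200_000

# Null Case 3 (assumed setup): both estimators centred at zero, unit standard errors.
tau32_hat = rng.standard_normal(n_draws)
tau21_hat = rng.standard_normal(n_draws)
t_prod = sobel_statistic(tau32_hat, tau21_hat, 1.0, 1.0)

variance = t_prod.var()
rejection_rate = np.mean(np.abs(t_prod) > 1.96)  # two-sided test at nominal 5%

print(f"variance of T_prod: {variance:.3f}")
print(f"rejection rate at nominal 5%: {rejection_rate:.5f}")
```

Under these assumptions the simulated variance should sit near 1/4 and the rejection rate far below 5%, which is the pattern Figure 2 describes.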
Figure 2: Asymptotic distributions of T32 , T21 , and Tprod
Note: Both T32 (blue line) and T21 (orange line) follow the standard normal distribution
asymptotically. However, Tprod follows N (0, 1/4) asymptotically.

13 This is the situation in which singularity arises. A comprehensive and general
explanation can be found in Dufour et al. [2013] and Drton and Xiao [2016].

One may try to test the equivalent null hypothesis H0 : τ32 = 0 or τ21 = 0, based
on the fact that a nonzero mediation effect requires two conditions: (i) variation
in Y3 affects Y2 , and (ii) variation in Y2 affects Y1 . A test that takes these two
conditions into account is called the joint significance test (MacKinnon et al. [2002])
or the causal steps test (Biesanz et al. [2010]). This test uses the minimum of the
absolute values of two test statistics.¹⁴ Specifically, in our case we can use
|T |(1) ≡ min{|T32 |, |T21 |} for testing the null hypothesis. van Garderen and
van Giersbergen [2020] and Liu et al. [2022] showed that the joint significance test is
always slightly more powerful than Sobel's test. However, they also demonstrated that
(i) in null Cases 1 and 2, the joint significance test has correct size α as the sample
size goes to infinity, and (ii) in null Case 3, the rejection rate is always smaller than
the desired level α. Therefore, the joint significance test has a similar under-rejection
problem.
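The size behavior of the joint significance rule can be illustrated with a small simulation. This is a sketch under assumptions that are not from the paper: the two z-statistics are independent and exactly normal, the noncentrality value 6 is an arbitrary stand-in for "far from zero", and `joint_significance_rejects` is a hypothetical helper.

```python
import numpy as np

def joint_significance_rejects(t32, t21, crit=1.96):
    """Reject H0: tau32 * tau21 = 0 when min(|T32|, |T21|) exceeds the critical value."""
    return np.minimum(np.abs(t32), np.abs(t21)) > crit

rng = np.random.default_rng(1)
n_draws = 400_000
z32 = rng.standard_normal(n_draws)
z21 = rng.standard_normal(n_draws)

# Null Case 3: both z-statistics are centred at zero.
rate_both_zero = joint_significance_rejects(z32, z21).mean()

# One effect is zero and the other is far from zero (shift of 6 chosen for illustration).
rate_one_zero = joint_significance_rejects(z32 + 6.0, z21).mean()

print(f"both effects zero: {rate_both_zero:.4f}")
print(f"one effect zero:   {rate_one_zero:.4f}")
```

With independent statistics the Case 3 rejection probability is the product of the two marginal rejection probabilities, so it falls well below α, while the rate approaches α when exactly one effect is zero and the other is large.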
14
van Giersbergen et al. [2014] and Liu et al. [2022] showed that this test is equivalent to the likeli-
hood ratio test if one uses the linear causal mediation model.
DGP 1: Binary Y1 and Y2
    Y1 = 1{β2 Y2 + β3 Y3 + ϵ1 ≥ 0}
    Y2 = 1{X2 + γY3 + ϵ2 ≥ 0}
    Y3 = 1{X3 + ϵ3 ≥ 0}

DGP 2: Continuous Y1 and Y2
    Y1 = β2 Y2 + β3 Y3 + ϵ1
    Y2 = X2 + γY3 + ϵ2
    Y3 = 1{X3 + ϵ3 ≥ 0}

DGP 3: Binary Y1 and continuous Y2
    Y1 = 1{β2 Y2 + β3 Y3 + ϵ1 ≥ 0}
    Y2 = X2 + γY3 + ϵ2
    Y3 = 1{X3 + ϵ3 ≥ 0}

DGP 4: Continuous Y1 and binary Y2
    Y1 = β2 Y2 + β3 Y3 + ϵ1
    Y2 = 1{X2 + γY3 + ϵ2 ≥ 0}
    Y3 = 1{X3 + ϵ3 ≥ 0}
where ρ ∈ {0, 0.3, 0.5, 0.7}. Note that we do not estimate the linear indices in this
setup and focus on the pure performance of the new estimator τ̂21 .
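To make the design concrete, here is a minimal sketch of how one sample from DGP 1 could be drawn. Several ingredients are assumptions made only for illustration, because this excerpt does not specify them: the errors (ϵ1, ϵ2, ϵ3) are taken to be jointly normal with unit variances and a common pairwise correlation ρ, and X2 and X3 are taken to be independent standard normals. The function name `simulate_dgp1` is hypothetical.

```python
import numpy as np

def simulate_dgp1(n, gamma, beta2, beta3, rho, rng):
    """Draw one sample from DGP 1 (binary Y1 and Y2, binary treatment Y3)."""
    # Assumed error law: trivariate normal, unit variances, common correlation rho.
    cov = np.full((3, 3), rho)
    np.fill_diagonal(cov, 1.0)
    eps = rng.multivariate_normal(np.zeros(3), cov, size=n)
    # Assumed regressors: independent standard normals.
    x2 = rng.standard_normal(n)
    x3 = rng.standard_normal(n)
    y3 = (x3 + eps[:, 2] >= 0).astype(int)                      # treatment
    y2 = (x2 + gamma * y3 + eps[:, 1] >= 0).astype(int)         # mediator
    y1 = (beta2 * y2 + beta3 * y3 + eps[:, 0] >= 0).astype(int)  # outcome
    return y1, y2, y3, x2, x3

rng = np.random.default_rng(2024)
y1, y2, y3, x2, x3 = simulate_dgp1(n=500, gamma=1.0, beta2=0.0, beta3=1.0, rho=0.7, rng=rng)
print(y1.mean(), y2.mean(), y3.mean())
```

The other three designs follow by dropping the indicator function on Y1 and/or Y2.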
For each of the four DGP designs, we consider two cases: (γ, β2 ) = (0, 0) and
(γ, β2 ) = (1, 0). Furthermore, we set β3 = 1 to assume that there is a direct effect of Y3
on Y1 . In both cases, however, there is no indirect effect of Y3 on Y1 by construction.
Therefore, τ32 τ21 = 0 in both cases, but τ32 could be either zero or nonzero. Using the
null hypothesis, this situation can be described as
By the asymptotic normality of τ̂32 and τ̂21 established in Section 3, one of the two
z-statistics, or both, is asymptotically standard normal under the null hypothesis:
\[
T_{32} \equiv \frac{\hat\tau_{32}}{\hat s_{32}} \overset{a}{\sim} N(0, 1)
\quad \text{or} \quad
T_{21} \equiv \frac{\hat\tau_{21}}{\hat s_{21}} \overset{a}{\sim} N(0, 1).
\]
Figure 3 shows that T32 and T21 behave as the asymptotic theory predicts under
two different scenarios. Therefore, we may leverage the testing method suggested
by van Giersbergen et al. [2014], which hinges on the asymptotic normality of the test
statistics. We see that T32 and T21 follow the standard normal distribution in the
scenario (γ, β2 ) = (0, 0). However, T32 is not standard normal in the scenario
(γ, β2 ) = (1, 0) since there exists a positive effect of Y3 on Y2 .¹⁵

Figure 3: Estimated probability densities of T32 and T21
Note: The left and right figures are generated under the scenarios (γ, β2 ) = (0, 0) and
(γ, β2 ) = (1, 0) using DGP 1, respectively. The bootstrap standard error is used to
calculate T32 and T21 .
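The z-statistics above rely on bootstrap standard errors. The sketch below shows a generic nonparametric bootstrap of that kind. It is not the paper's implementation: the sample mean stands in for the Kendall's tau estimator, and the helper name `bootstrap_se` is hypothetical.

```python
import numpy as np

def bootstrap_se(statistic, data, n_boot=200, rng=None):
    """Nonparametric bootstrap standard error of statistic(data), resampling rows."""
    rng = np.random.default_rng() if rng is None else rng
    n = data.shape[0]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample observations with replacement
        draws[b] = statistic(data[idx])
    return draws.std(ddof=1)

# Toy illustration: the sample mean plays the role of tau-hat.
rng = np.random.default_rng(0)
data = rng.standard_normal((500, 1))
estimate = data.mean()
se = bootstrap_se(lambda d: d.mean(), data, n_boot=200, rng=rng)
z_stat = estimate / se
print(f"estimate = {estimate:.4f}, bootstrap s.e. = {se:.4f}, z = {z_stat:.2f}")
```

Resampling whole observations keeps the dependence between outcome, mediator, treatment, and regressors intact within each bootstrap sample.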
We conduct 1,000 simulations for each of the designs with a sample size n = 500.
We use the bootstrap to estimate the standard errors s32 and s21 of τ̂32 and τ̂21 ,
which enter T32 and T21 . Table 1 reports rejection rates at the 5% level for the four
DGP scenarios. The first four and the remaining rows of the table correspond to
the cases (γ, β2 ) = (0, 0) and (γ, β2 ) = (1, 0), respectively. Overall, the rejection rates
reported in the four designs align closely with the significance level of 5%, even
at higher levels of endogeneity (ρ = 0.5 and ρ = 0.7).
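When judging how close a simulated rejection rate is to 5%, it helps to know the Monte Carlo noise implied by 1,000 replications. The short calculation below uses the binomial standard error; the function name `mc_standard_error` is hypothetical and the 95% band is a normal approximation.

```python
import math

def mc_standard_error(p, n_sim):
    """Binomial standard error of a rejection rate estimated from n_sim replications."""
    return math.sqrt(p * (1.0 - p) / n_sim)

nominal = 0.05
se = mc_standard_error(nominal, 1000)
lower, upper = nominal - 1.96 * se, nominal + 1.96 * se
print(f"Monte Carlo s.e.: {se:.4f}; approximate 95% band: [{lower:.3f}, {upper:.3f}]")
```

With 1,000 replications a correctly sized 5% test can therefore produce simulated rejection rates of roughly 3.6% to 6.4% from simulation noise alone.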
We also evaluate the statistical power of the new testing method using the
same DGP designs. Table 2 reports power for three sizes of the indirect effect
(small, medium, and large) at three sample sizes (n = 200, 500, and 1000). Again,
we conduct 1,000 simulations for each of the designs. The level of endogeneity is
fixed (ρ = 0.7). As expected, the power of the new testing method increases with
both the sample size and the size of the indirect effect.
15 Note that an exact value of nonzero τ32 cannot be suggested since the true parameter
τ32 exists as a conditional expectation with respect to given matching conditions.

Table 1: Type I error rates for the hypothesis testing
Note: Rejection rates over 1,000 simulations for tests at the 5% level are reported. The
number of bootstrap replications used for estimating the standard error is 200.

Note: For all analyses, γ = β2 , β3 = 1, and ρ = 0.7. Small effect size = 0.3, medium
effect size = 0.6, and large effect size = 0.9. The number of bootstrap replications used
for estimating the standard error is 200.

5 Conclusion
In this paper, we proposed a new generalized regression model with mediator
variables and test statistics for the direct and indirect effects of a treatment variable.
The proposed methodology addresses the limitations of previous models by
incorporating endogenous treatments and mediators and accommodating various types
of variables (e.g., binary, continuous, censored, or truncated). The indirect effect can
be tested by constructing two estimators that capture (i) the effect of the treatment on
the mediator, and (ii) the effect of the mediator on the outcome of interest. To perform
simultaneous inference on the indirect effect, we use two estimators that capture (i)
and (ii). The estimators of these effects converge at the √n-rate and are asymptotically
normal. However, it turns out that popular classic test methods for indirect effects are
severely under-sized. Therefore, we leverage the method proposed by van Garderen
and van Giersbergen [2020] to obtain the correct rejection probability.
There are several interesting issues that may be addressed in future work. One is
the challenge of computational speed when dealing with high-dimensional regres-
sors. We can enhance estimation speed through computational techniques like par-
allel processing. However, optimizing a discontinuous objective function remains
challenging, particularly when dealing with high-dimensional regressors and kernel
densities for matching conditions. Some research aims to address the computational
burden associated with high-dimensional covariates. For instance, Khan et al. [2023]
demonstrate a new approach using iterative convex optimization to estimate
high-dimensional monotone index models and reduce the computational burden.
Whether it is possible to apply this type of estimation strategy to the class of models
within the proposed generalized regression model framework is an open question.
Another intriguing question is about simultaneous
inference for indirect effects and its critical region. The inference in this paper is
based on the critical region suggested by van Garderen and van Giersbergen [2020]
that has appropriate size properties. It would be interesting to explore whether al-
ternative approaches can be developed.
A Asymptotic distribution of the product τ̂32τ̂21
We first define two random variables W32 and W21 :
\[
W_{32} \equiv \frac{\hat\tau_{32} - \tau_{32}}{s_{32}}, \qquad
W_{21} \equiv \frac{\hat\tau_{21} - \tau_{21}}{s_{21}},
\]
where s32 and s21 are the standard deviations of τ̂32 and τ̂21 characterized by the
asymptotic linear representations for τ̂32 and τ̂21 in Theorem 1. Then
\( \sqrt{n}\,W_{32} \xrightarrow{d} N(0, 1) \) and \( \sqrt{n}\,W_{21} \xrightarrow{d} N(0, 1) \).
Therefore, nW32 W21 converges to some random variable Wprod , which is the product
of two standard normal random variables.
According to the findings of Nadarajah and Pogány [2016], it turns out that when
considering two standard normal random variables X and Y , the product Z = XY
follows a non-normal distribution with E[Z] = ρ and V ar(Z) = 1 + ρ2 , where ρ =
corr(X, Y ).
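The two moment formulas, and the non-normality of the product, can be checked by simulation. This is only an illustrative sketch: the value ρ = 0.5 and the number of draws are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.5            # illustrative correlation, chosen arbitrarily
n_draws = 500_000

cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal(np.zeros(2), cov, size=n_draws)
z = xy[:, 0] * xy[:, 1]  # product of two correlated standard normals

sample_mean = z.mean()
sample_var = z.var()
# Excess kurtosis is 0 for a normal law; a clearly positive value signals non-normality.
excess_kurtosis = np.mean((z - sample_mean) ** 4) / sample_var**2 - 3.0

print(f"mean = {sample_mean:.3f} (theory: {rho})")
print(f"variance = {sample_var:.3f} (theory: {1 + rho**2})")
print(f"excess kurtosis = {excess_kurtosis:.2f}")
```

The sample mean and variance should be close to ρ and 1 + ρ², while the large positive excess kurtosis shows that the product is far from normal.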
If τ32 = τ21 = 0, for example, by the continuous mapping theorem,
\[
n\,\hat\tau_{32}\hat\tau_{21} = s_{32}s_{21} \cdot \sqrt{n}\,W_{32}\,\sqrt{n}\,W_{21}
\xrightarrow{d} s_{32}s_{21} \cdot W_{prod}.
\]
Therefore, τ̂32 τ̂21 is n-consistent and the t-statistic based on its standard error does
not follow the standard normal distribution asymptotically.
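Without loss of generality, suppose that τ21 = 0 and τ32 ≠ 0. Then
\[
\sqrt{n}\,\hat\tau_{32}\hat\tau_{21}
= \tau_{32}s_{21} \cdot \sqrt{n}\,W_{21}
+ \frac{s_{32}s_{21}}{\sqrt{n}} \cdot \sqrt{n}\,W_{32}\,\sqrt{n}\,W_{21}
\xrightarrow{d} \tau_{32}s_{21} \cdot N(0, 1) + 0 \cdot W_{prod}.
\]
If exactly one of τ32 and τ21 is zero, τ̂32 τ̂21 is √n-consistent and the z-statistic based
on its standard error converges to the standard normal distribution. Because the
limiting distribution differs across the null cases, we conclude that the z-statistic for
τ̂32 τ̂21 is not pivotal, and therefore the bootstrap method is not valid.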
B Proofs of Asymptotic Results
B.1 Assumptions
Assumption 2. The data {(Xi′ , Yi′ )′ }ni=1 are i.i.d distributed where Xi = (X1i , Z2i , Z3i )′ and
Yi = (Y1i , Y2i , Y3i )′ .
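For future use, define the following terms that represent the linear indices and the
differences of variables:
\[
(\zeta_i, \xi_i, \chi_i) \equiv (X_{3i}'\delta_0,\; X_{2i}'\gamma_0,\; X_{1i}'\beta_0),
\]
\[
(\zeta_{ij}, \xi_{ij}, \chi_{ij}, Y_{2ij}, X_{1ij}^{(1)}) \equiv
(\zeta_i - \zeta_j,\; \xi_i - \xi_j,\; \chi_i - \chi_j,\; Y_{2i} - Y_{2j},\; X_{1i}^{(1)} - X_{1j}^{(1)}).
\]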
Assumption 5 (First regressor and index properties). Let $X_1^{(1)}$ be the first component
of X1 , and X̃1 be the other components of X1 , and similarly for X2 .
(i) The conditional density of $X_{1ij}^{(1)}$ on R is almost everywhere positive given X̃1ij and
(ii) The conditional density of $X_{2ij}^{(1)}$ on R is almost everywhere positive given X̃2i and X̃2j .
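Assumption 7. Let N denote a neighborhood of β0 and W denote the support of Wi . Write
∇m for the mth partial derivative operator with respect to θ, and
\[
|\nabla_m|\sigma(\theta) \equiv \sum_{i_1,\ldots,i_m}
\left| \frac{\partial^m}{\partial\theta_{i_1} \cdots \partial\theta_{i_m}}\, \sigma(\theta) \right|.
\]
The symbol ∥·∥ denotes the matrix norm: \( \|(a_{ij})\| = \big(\sum_{i,j} a_{ij}^2\big)^{1/2} \).
\[
\int \big| 1\{X_{1i}'\beta(\theta) > X_{1j}'\beta(\theta)\} - 1\{X_{1i}'\beta_0 > X_{1j}'\beta_0\} \big|\,
f(X_{1i}, Y_{2j}, \zeta_j, \xi_j)\, dX_{1i} \le \phi_s(X_{1j})\, \|\theta - \theta_0\|.
\]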
(ii) For each w in W, all mixed second partial derivatives of τ (w, ·) exist on N .
(iii) There is an integrable function M (w) such that for all w in W and θ in N
(iv) E[|∇1 τ (·, θ0 )|2 ] < ∞, E[|∇2 |τ (·, θ0 )] < ∞, and the matrix E[∇2 τ (·, θ0 )] is negative
definite.
B.2 Proof of Theorem 1
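\[
G_n(\theta) = \frac{1}{n(n-1)} \sum_{i \neq j} 1\{Y_{1i} > Y_{1j}\}\, 1\{X_{1i}'\theta > X_{1j}'\theta\}
\]
\[
\hat G_n(\theta) = \frac{1}{n(n-1)} \sum_{i \neq j} 1\{Y_{1i} > Y_{1j}\}\, 1\{X_{1i}'\theta > X_{1j}'\theta\}
\]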
We start by showing that Ĝn (θ) converges uniformly to G(θ), where
\[
G(\theta) \equiv E\Big[ \int S_{ij}(\theta)\, F(X_{1i}, X_{1j}, Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j)\,
H(\zeta_j, \zeta_j)\, f(X_{1i}, Y_{2j}, \zeta_j, \xi_j)\, dX_{1i} \Big]. \tag{26}
\]
Lemma 1. If Assumptions 1 - 6, 8, and 9 hold, then supθ∈Θ |Ĝn (θ) − G(θ)| = op (1).
16 In (23), we subtract the term $1\{X_{1i}'\beta_0 > X_{1j}'\beta_0\}$. However, doing this does not
change the value of the estimator.
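Proof. By the triangle inequality,
\[
\sup_{\theta\in\Theta} |\hat G_n(\theta) - G(\theta)|
\le \sup_{\theta\in\Theta} |\hat G_n(\theta) - G_n(\theta)|
+ \sup_{\theta\in\Theta} |G_n(\theta) - E[G_n(\theta)]|
+ \sup_{\theta\in\Theta} |E[G_n(\theta)] - G(\theta)|.
\]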
Note that Ĝn (θ) converges uniformly to Gn (θ) due to the consistency of δ̂ and γ̂ estab-
lished by Sherman [1993] and Abrevaya et al. [2010], and the dominated convergence
theorem. To prove the desired result for the second term on the right-hand side, we
need to show that
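\[
f_n = 1\{Y_{1i} > Y_{1j}\}\, S_{ij}(\theta)\, 1\{Y_{3i} = Y_{3j}\}\, k_h(Y_{2ij})\, k_h(\zeta_{ij})\, k_h(\xi_{ij}).
\]
\[
F_n \equiv \{ 1\{Y_{1i} > Y_{1j}\}\, 1\{Y_{3i} = Y_{3j}\}\, S_{ij}(\theta)\,
K(\Delta Y_{2ij}/h_n)\, K(\Delta\zeta_{ij}/h_n)\, K(\Delta\xi_{ij}/h_n) \mid \theta \in \Theta \}
= \{ h_n^3 f_n \mid \theta \in \Theta \}
\]
\[
F \equiv \{ 1\{Y_{1i} > Y_{1j}\}\, 1\{Y_{3i} = Y_{3j}\}\, S_{ij}(\theta)\,
K(\Delta Y_{2ij}/h)\, K(\Delta\zeta_{ij}/h)\, K(\Delta\xi_{ij}/h) \mid \theta \in \Theta,\ h > 0 \}
= F_\beta\, F_{h1}\, F_{h2}\, F_{h3}
\]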
with
Notice that 1{Y1i > Y1j } 1{Y3i = Y3j } Sij (θ) is uniformly bounded by 1. Example
2.11 and Lemma 2.15 in Nolan and Pollard [1987] imply that Fβ is Euclidean for the
constant envelope 1. Also, Fh1 , Fh2 , and Fh3 are Euclidean for the constant envelope
$\sup_{u\in R} |K(u)|$ by Lemma 22 in Nolan and Pollard [1987]. Putting all these results
together, F is Euclidean for the constant envelope $(\sup_{v\in R} |K(v)|)^3$. Let
$\tilde G_n(\theta) = h_n^3 G_n(\theta)$. Then $\tilde G_n(\theta) - E[\tilde G_n(\theta)]$ is a second-order
U-statistic with zero mean and is Euclidean with the constant envelope
$(\sup_{v\in R} |K(v)|)^3$. Consequently, Corollary 4 in Sherman [1994a] and Assumption 9
imply that
To complete the argument, we need to demonstrate for the third term that
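\[
\begin{aligned}
E[G_n(\theta)]
&= E\big[ 1\{Y_{1i} > Y_{1j}\}\, 1\{Y_{3i} = Y_{3j}\}\, S_{ij}(\theta)\, k_h(Y_{2ij})\, k_h(\zeta_{ij})\, k_h(\xi_{ij}) \big] \\
&= E\big[ k_h(Y_{2ij})\, k_h(\zeta_{ij})\, k_h(\xi_{ij})\, S_{ij}(\theta) \cdot
E[ 1\{Y_{1i} > Y_{1j}\}\, 1\{Y_{3i} = Y_{3j}\} \mid X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j ] \big].
\end{aligned}
\tag{27}
\]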
Using (24) and (25), the conditional expectation term in (27) can be written as
E[Gn (θ)]
Z
= K(u) K(v) K(w) Sij (θ) F(X1i , X1j , Y2j + uhn , Y2j , ζj + vhn , ζj , ξj + whn , ξj ) H(ζj + vhn , ζj )
Note that the first-order term in the Taylor expansion above is 0 since $\int uK(u)\,du = 0$
by Assumption 8 and the second-order term is $O(\|\theta-\theta_0\| h_n^2) = O(h_n^2)$ by Assumption
7. Thus by Assumption 9,
This establishes the uniform convergence of Ĝn (θ) to its limiting expected value G(θ).
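The role of the zero first moment of the kernel in the bias argument can be seen numerically. The sketch below is illustrative only: it assumes a Gaussian kernel and the test function g = sin, neither of which comes from the paper, and the helper `smoothing_bias` is hypothetical. It checks that halving the bandwidth divides the smoothing bias by about four, which is what an O(h²) bias implies.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes/weights for the weight function exp(-u^2 / 2).
nodes, weights = hermegauss(40)
weights = weights / weights.sum()  # normalise so the weights integrate a N(0, 1) density

def smoothing_bias(g, x, h):
    """Return  int K(u) g(x + u h) du - g(x)  for a Gaussian kernel K."""
    return np.sum(weights * g(x + nodes * h)) - g(x)

x0 = 1.0
bias_h = smoothing_bias(np.sin, x0, 0.2)
bias_half_h = smoothing_bias(np.sin, x0, 0.1)
ratio = bias_h / bias_half_h  # close to 4 when the bias is O(h^2)

print(f"bias(h=0.2) = {bias_h:.6f}, bias(h=0.1) = {bias_half_h:.6f}, ratio = {ratio:.3f}")
```

A kernel with a non-zero first moment would instead leave a bias of order h, and the ratio would be close to 2.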
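\[
\Pr\big[ (\tilde X_{1ij}'\tilde\theta_0 < -X_{1ij}^{(1)} < \tilde X_{1ij}'\tilde\theta) \cup
(\tilde X_{1ij}'\tilde\theta < -X_{1ij}^{(1)} < \tilde X_{1ij}'\tilde\theta_0)
\mid X_{1i}, X_{1j}, \zeta_i = \zeta_j, \xi_i = \xi_j \big] > 0.
\]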
The probability that θ0 and θ yield distinct values of Sij (·) in G(·) is non-zero. There-
fore, G(θ0 ) > G(θ). This inequality implies that if G(θ0 ) = G(θ), then it must hold
that
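\[
\Pr\big[ (\tilde X_{1ij}'\tilde\theta_0 < -X_{1ij}^{(1)} < \tilde X_{1ij}'\tilde\theta) \cup
(\tilde X_{1ij}'\tilde\theta < -X_{1ij}^{(1)} < \tilde X_{1ij}'\tilde\theta_0)
\mid X_{1i}, X_{1j}, \zeta_i = \zeta_j, \xi_i = \xi_j \big] = 0.
\]
\[
\Pr\big[ \tilde X_{1ij}'\tilde\theta_0 = \tilde X_{1ij}'\tilde\theta \mid X_{1i}, X_{1j}, \zeta_i = \zeta_j, \xi_i = \xi_j \big] = 1.
\]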
Lemma 3. If Assumptions 1 - 6, 8, and 9 hold, then $\hat\theta \xrightarrow{p} \theta_0$.
Proof. The four sufficient conditions for Theorem 2.1 in Newey and McFadden [1994]
must be verified to prove the result: (C1) Θ is a compact set, (C2) supθ∈Θ |Ĝn (θ) −
G(θ)| = op (1), (C3) G(θ) is continuous in θ, and (C4) G(θ) is uniquely maximized at
θ0 .
Assumption 3 satisfies condition (C1), while condition (C2) is established by
Lemma 1. Also, (C3) is satisfied because G(θ) is continuous by Assumption 5. The
proof of Lemma 2 verifies the identification condition (C4).
After proving the consistency of θ̂, the next step is to establish its √n-consistency
and asymptotic normality. We first introduce Theorem 1 of Sherman [1994b], which
gives sufficient conditions for rates of convergence of θ̂.
Lemma 4. (Theorem 1 of Sherman [1994b]) Suppose θ̂ maximizes Ĝn (θ) and θ0 maximizes
G(θ). Let {δn } and {ϵn } be sequences of non-negative real numbers converging to zero as n
tends to infinity. If
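1. $\hat\theta - \theta_0 = O_p(\delta_n)$
2. there exists a neighborhood N of θ0 and a constant κ > 0 for which
$G(\theta) - G(\theta_0) \le -\kappa \|\theta - \theta_0\|^2$
\[
\hat G_n(\theta) = G(\theta) + O_p(\|\theta - \theta_0\| / \sqrt{n}) + o_p(\|\theta - \theta_0\|^2) + O_p(\epsilon_n),
\]
then $\|\hat\theta - \theta_0\| = O_p(\max[\epsilon_n^{1/2}, 1/\sqrt{n}])$.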
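To establish √n-consistency and asymptotic normality of θ̂, we examine
\[
\hat G_n(\theta) = \frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0)
+ \frac{1}{\sqrt{n}}(\theta - \theta_0)' Z_n + o_p(n^{-1}), \tag{31}
\]
\[
g_n(W_i, W_j, \theta) \equiv k_h(Y_{2ij})\, k_h(\zeta_{ij})\, k_h(\xi_{ij})\, S_{ij}(\theta)\,
F(X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j)\, H(\zeta_i, \zeta_j). \tag{32}
\]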
\[
U_n^1(w, \theta) = E[g_n(W_i, W_j, \theta) \mid W_j = w] + E[g_n(W_i, W_j, \theta) \mid W_i = w] - 2E[G_n(\theta)],
\]
\[
G_n(\theta) = E[G_n(\theta)] + \frac{1}{n}\sum_{k=1}^{n} U_n^1(W_k, \theta)
+ \frac{1}{n(n-1)} \sum_{i \neq j} U_n^2(W_i, W_j, \theta). \tag{33}
\]
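\[
\tau_l(W_i, \theta) \equiv \int S_{ij}(\theta)\, F(X_{1i}, X_{1j}, Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i)\,
H(\zeta_i, \zeta_i)\, f(X_{1j}, Y_{2i}, \zeta_i, \xi_i)\, dX_{1j} \tag{34}
\]
\[
\tau_r(W_j, \theta) \equiv \int S_{ij}(\theta)\, F(X_{1i}, X_{1j}, Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j)\,
H(\zeta_j, \zeta_j)\, f(X_{1i}, Y_{2j}, \zeta_j, \xi_j)\, dX_{1i} \tag{35}
\]
\[
\tau_k(\theta) \equiv \tau_l(W_k, \theta) + \tau_r(W_k, \theta) \quad \text{for } 1 \le k \le n. \tag{36}
\]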
We first need to revisit the term G(θ) defined in (26), which was derived to prove
Lemma 1, to address the lead term E[Gn (θ)] of (33). Note that E[τl (Wi , θ)] = E[τr (Wj , θ)] =
G(θ) and E[τk (θ)] = 2G(θ).
Lemma 5. If Assumptions 7, 8, and 9 hold, then uniformly over Θn , we have
\[
E[G_n(\theta)] = \frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0) + o_p(\|\theta - \theta_0\|^2),
\]
where $V \equiv 2^{-1} E[\nabla_2 \tau_k(\theta_0)]$.
Proof. To obtain the term (29), we take the second-order Taylor expansion inside
around uhn = 0, vhn = 0, and whn = 0. The lead term is G(θ) and all remaining terms
are zero except the last term, which is of order O(h2n δn ) = o(n−1/2 δn ) = o(O(δn2 )) =
op (||θ − θ0 ||2 ) in the shrinking neighborhood Θn since h2n = o(n−1/2 ) by Assumption
9. Therefore,
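Now, we consider G(θ). By taking a second-order Taylor expansion of τk (θ) around
θ0 ,
\[
\begin{aligned}
\tau_k(\theta) - \tau_k(\theta_0)
&= (\theta - \theta_0)' \nabla_1 \tau_k(\theta_0) + \frac{1}{2}(\theta - \theta_0)' \nabla_2 \tau_k(\bar\theta)(\theta - \theta_0) \\
&= (\theta - \theta_0)' \nabla_1 \tau_k(\theta_0) + \frac{1}{2}(\theta - \theta_0)' \nabla_2 \tau_k(\theta_0)(\theta - \theta_0) \\
&\quad + \frac{1}{2}(\theta - \theta_0)' [\nabla_2 \tau_k(\bar\theta) - \nabla_2 \tau_k(\theta_0)](\theta - \theta_0).
\end{aligned}
\]
\[
E[\tau_k(\theta)] = 2G(\theta) = \frac{1}{2}(\theta - \theta_0)' E[\nabla_2 \tau_k(\theta_0)](\theta - \theta_0) + o(\|\theta - \theta_0\|^2).
\]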
In the next step, we consider the second term of the decomposition (33).
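\[
\frac{1}{n} \sum_{1 \le k \le n} U_n^1(W_k, \theta) = \frac{1}{\sqrt{n}}(\theta - \theta_0)' Z_{n1} + o_p(\|\theta - \theta_0\|^2),
\]
\[
\begin{aligned}
E[g_n(W_i, W_j, \theta) \mid W_j = W_k]
&= \int k_h(Y_{2ik})\, k_h(\zeta_{ik})\, k_h(\xi_{ik})\, S_{ik}(\theta)\,
F(X_{1i}, X_{1k}, Y_{2i}, Y_{2k}, \zeta_i, \zeta_k, \xi_i, \xi_k)\, H(\zeta_i, \zeta_k)\, dF(X_{1i}, Y_{2i}, \zeta_i, \xi_i) \\
&= \int K(u) K(v) K(w)\, S_{ik}(\theta)\,
F(X_{1i}, X_{1k}, Y_{2k} + uh_n, Y_{2k}, \zeta_k + vh_n, \zeta_k, \xi_k + wh_n, \xi_k)\, H(\zeta_k + vh_n, \zeta_k) \\
&= \tau_r(W_k, \theta) + O(h_n^2 \delta_n).
\end{aligned}
\]
\[
\begin{aligned}
E[g_n(W_i, W_j, \theta) \mid W_i = W_k]
&= \int S_{kj}(\theta)\, F(X_{1k}, X_{1j}, Y_{2k}, Y_{2k}, \zeta_k, \zeta_k, \xi_k, \xi_k)\,
H(\zeta_k, \zeta_k)\, f(X_{1j}, Y_{2k}, \zeta_k, \xi_k)\, dX_{1j} + O(h_n^2 \delta_n) \\
&= \tau_l(W_k, \theta) + O(h_n^2 \delta_n).
\end{aligned}
\]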
Therefore, Un1 (Wk , θ) = τk (θ) − 2 E[Gn (θ)] + O(h2n δn ). Also, O(h2n δn ) = o(||θ − θ0 ||2 )
within a shrinking neighborhood Θn .
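\[
\begin{aligned}
\frac{1}{n}\sum_{k=1}^{n} U_n^1(W_k, \theta)
&= \frac{1}{n}\sum_{k=1}^{n} \tau_k(\theta) - (\theta - \theta_0)' V (\theta - \theta_0) + o_p(\|\theta - \theta_0\|^2) \\
&= \frac{1}{n}\sum_{k=1}^{n} (\theta - \theta_0)' \nabla_1 \tau_k(\theta_0)
+ \frac{1}{n}\sum_{k=1}^{n} (\theta - \theta_0)' [\nabla_2 \tau_k(\theta_0) - V](\theta - \theta_0) + o_p(\|\theta - \theta_0\|^2)
\end{aligned}
\]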
\[
\frac{1}{n(n-1)} \sum_{i \neq j} U_n^2(W_i, W_j, \theta) = O_p(n^{-1} h_n^{-3}).
\]
Deduce from similar arguments for proving Lemma 1 and Corollaries 17 and 21
in Nolan and Pollard [1987] that Fn∗ is Euclidean for a constant envelope. Using
Corollary 4 in Sherman [1994a], we conclude that
\[
\frac{1}{n(n-1)} \sum_{i \neq j} U_n^2(\cdot, \cdot, \theta) = M h_n^{-3}\, O_p(n^{-1}) = O_p(n^{-1} h_n^{-3}).
\]
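By combining Lemmas 6 and 7, we obtain the following representation for Gn (θ):
\[
G_n(\theta) = \frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0) + \frac{1}{\sqrt{n}}(\theta - \theta_0)' Z_{n1}
+ o_p(\|\theta - \theta_0\|^2) + O_p(n^{-1} h_n^{-3}).
\]
\[
R_n = \frac{1}{\sqrt{n}}(\theta - \theta_0)'(Z_{n2} + Z_{n3}) + o_p(n^{-1/2}),
\]
where
\[
Z_{n2} \equiv \frac{1}{\sqrt{n}} \sum_{k=1}^{n}
\big( E[\nabla_\theta \mu_{2,\xi_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)]
- E[\nabla_\theta \mu_{2,\xi_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)] \big)\, \psi_{\gamma k}
\]
and
\[
Z_{n3} = \frac{1}{\sqrt{n}} \sum_{k=1}^{n}
\big( E[\nabla_\theta \mu_{3,\zeta_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)]
- E[\nabla_\theta \mu_{3,\zeta_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)] \big)\, \psi_{\delta k}.
\]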
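Proof. By the mean value theorem, there exist $\Delta\zeta_{ij}^*$ and $\Delta\xi_{ij}^*$ for Rn :
\[
\begin{aligned}
R_n = \frac{h_n^{-1}}{n(n-1)} \sum_{i \neq j} & 1\{y_{1i} > y_{1j}\}\, S_{ij}(\theta)\, 1\{y_{3i} = y_{3j}\} \\
& \times k_h(\Delta y_{2ij})\, \big[ k_h'(\Delta\zeta_{ij}^*)\, k_h(\Delta\xi_{ij}^*)\, (\hat\zeta_i - \zeta_i - \hat\zeta_j + \zeta_j)
+ k_h(\Delta\zeta_{ij}^*)\, k_h'(\Delta\xi_{ij}^*)\, (\hat\xi_i - \xi_i - \hat\xi_j + \xi_j) \big].
\end{aligned}
\]
We first define Rn,ζ to derive the linear representation that involves the term ζ̂i − ζi :
\[
R_{n,\zeta} = \frac{h_n^{-1}}{n(n-1)} \sum_{i \neq j} 1\{Y_{1i} > Y_{1j}\}\, S_{ij}(\theta)\, 1\{Y_{3i} = Y_{3j}\}\,
k_h(\Delta Y_{2ij})\, k_h'(\Delta\zeta_{ij}^*)\, k_h(\Delta\xi_{ij}^*)\, X_{3ij}'\, (\hat\delta - \delta_0). \tag{37}
\]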
Note that the term that involves ζ̂j − ζj can be dealt with in a similar way. Using the
linear representation of δ̂, (37) can be written as
\[
\frac{h_n^{-1}}{n(n-1)(n-2)} \sum_{i \neq j \neq k} 1\{Y_{1i} > Y_{1j}\}\, S_{ij}(\theta)\, 1\{Y_{3i} = Y_{3j}\}\,
k_h(\Delta Y_{2ij})\, k_h'(\Delta\zeta_{ij}^*)\, k_h(\Delta\xi_{ij}^*)\, X_{3ij}'\, \psi_{\delta k} + o_p(n^{-1/2}). \tag{38}
\]
The unconditional expectation of this third-order U −process is zero. Also, its condi-
tional expectation with respect to each of the first and second arguments, i and j, is
zero due to the property of ψδk . Therefore, we only derive a linear representation for
its conditional expectation with respect to the third argument k.
By Assumption 2 and the consistency of δ̂, the conditional expectation of the
summand of (38) except ψδk is equal to its unconditional expectation:
E[1{Y1i > Y1j } Sij (θ) 1{Y3i = Y3j }kh (∆Y2ij )kh′ (∆ζij )kh (∆ξij )X3ij
′
]
= E{E[1{Y1i > Y1j } Sij (θ) 1{Y3i = Y3j }kh (∆Y2ij )kh′ (∆ζij )kh (∆ξij )X3ij
′
where
The first and the third equalities hold due to the law of iterated expectation. The
second equality holds since
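\[
\begin{aligned}
& E[ 1\{Y_{1i} > Y_{1j}\}\, 1\{Y_{3i} = Y_{3j}\} \mid X_{1i}, X_{1j}, X_{3i}, X_{3j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j ] \\
&= \Pr(Y_{1i} > Y_{1j} \mid Y_{3i} = Y_{3j}, X_{1i}, X_{1j}, X_{3i}, X_{3j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \\
&\quad \times \Pr(Y_{3i} = Y_{3j} \mid X_{1i}, X_{1j}, X_{3i}, X_{3j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \\
&= \Pr(Y_{1i} > Y_{1j} \mid Y_{3i} = Y_{3j}, X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \times \Pr(Y_{3i} = Y_{3j} \mid \zeta_i, \zeta_j) \\
&= F(X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j)\, H(\zeta_i, \zeta_j).
\end{aligned}
\]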
By using the change of variable and because $\int u^p K'(u)\,du = 0$ and $\int u^{p+1} K'(u)\,du =$
Z
kh (∆Y2ij )kh′ (∆ζij )kh (∆ξij ) H(ζi , ζj )M3 (Y2i , Y2j , ζi , ζj , ξi , ξj , θ) dF (Y2i , ζi , ξi ) dF (Y2j , ζj , ξj )
Z
= K(u)K ′ (v)K(w) H(ζj + vhn , ζj ) M3 (Y2j + uhn , Y2j , ζj + vhn , ζj , ξj + whn , ξj , θ)
where
and µ3,ζi is its first-order partial derivative with respect to ζi . By a Taylor series
expansion of (40) around θ0 and Assumption 9, the conditional expectation of (38)
with respect to the third argument k can be written as
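\[
\begin{aligned}
& -\frac{1}{n}\sum_{k=1}^{n} E[\mu_3(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta)]\, \psi_{\delta k} + o_p(n^{-1/2}) \\
&= \frac{1}{\sqrt{n}}(\theta - \theta_0)' \Big( -\frac{1}{\sqrt{n}} \sum_{k=1}^{n}
E[\nabla_\theta \mu_{3,\zeta_i}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)]\, \psi_{\delta k} \Big) + o_p(n^{-1/2}).
\end{aligned}
\]
We can take similar steps for ζ̂j − ζj and obtain the following term:
\[
\frac{1}{\sqrt{n}}(\theta - \theta_0)' \frac{1}{\sqrt{n}} \sum_{k=1}^{n}
E[\nabla_\theta \mu_{3,\zeta_j}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)]\, \psi_{\delta k} + o_p(n^{-1/2}),
\]
where µ3,ζj is the first-order partial derivative of µ3 with respect to ζj . Finally, we define
Zn3 , which is asymptotically normal:
\[
Z_{n3} \equiv \frac{1}{\sqrt{n}} \sum_{k=1}^{n}
\big( E[\nabla_\theta \mu_{3,\zeta_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)]
- E[\nabla_\theta \mu_{3,\zeta_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)] \big)\, \psi_{\delta k}.
\]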
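Also, proceeding similarly for ξ̂i − ξi and ξ̂j − ξj , we have Zn2 , where
\[
Z_{n2} \equiv \frac{1}{\sqrt{n}} \sum_{k=1}^{n}
\big( E[\nabla_\theta \mu_{2,\xi_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)]
- E[\nabla_\theta \mu_{2,\xi_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)] \big)\, \psi_{\gamma k}.
\]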
with
\[
\hat G_n(\theta) = \frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0) + \frac{1}{\sqrt{n}}(\theta - \theta_0)' Z_n
+ o_p(\|\theta - \theta_0\|^2) + O_p(n^{-1} h_n^{-3}) + O_p(n^{-1/2}), \tag{41}
\]
where Zn = Zn1 + Zn2 + Zn3 .
Lemma 9. If Assumptions 1 - 9 hold, then θ̂ is √n-consistent.
\[
\frac{1}{n(n-1)} \sum_{i \neq j} U_n^2(\cdot, \cdot, \theta)
= M h_n^{-3}\, O_p(\delta_n^{\alpha} \gamma_n^{\alpha} n^{-1})
= O_p(n^{-1})\, O_p(n^{-\alpha/2} h_n^{-3}) = o_p(n^{-1})
\]
by choosing α sufficiently close to 1 and Assumption 9. This implies that the
$O_p(n^{-1} h_n^{-3})$ term is $o_p(n^{-1})$ over an $O_p(\delta_n)$ neighborhood of θ0 . Finally, we apply
Lemma 4 again to conclude that $\|\hat\theta - \theta_0\| = O_p(1/\sqrt{n})$.
Now, we define ∆i :
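\[
\Delta_i = \nabla_1 \tau_k(\theta_0)
+ \big( E[\nabla_\theta \mu_{2,\xi_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)]
- E[\nabla_\theta \mu_{2,\xi_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)] \big)\, \psi_{\gamma k}
\]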
Because E[∇1 τk (θ0 )] = 0 and E[ψγk ] = E[ψδk ] = 0, it holds that E[∆i ] = 0. Based on
Assumption 7 and the Lindeberg-Lévy CLT, we conclude that $Z_n \xrightarrow{d} N(0, \Lambda)$, where
Λ = E[∆i ∆′i ]. The asymptotic normality of θ̂ follows from Theorem 2 of Sherman
[1993], i.e.
\[
\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, V^{-1} \Lambda V^{-1}).
\]
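Given estimates of V and Λ, the sandwich form translates into standard errors as sketched below. The matrices are made-up numbers for illustration only (with V negative definite, in line with Assumption 7), and the helper name `sandwich_standard_errors` is hypothetical; the paper itself reports bootstrap standard errors in its simulations.

```python
import numpy as np

def sandwich_standard_errors(V, Lam, n):
    """Standard errors implied by sqrt(n)(theta_hat - theta_0) -> N(0, V^-1 Lam V^-1)."""
    V_inv = np.linalg.inv(V)
    avar = V_inv @ Lam @ V_inv          # asymptotic variance of sqrt(n)(theta_hat - theta_0)
    return np.sqrt(np.diag(avar) / n)   # divide by n to get the variance of theta_hat

# Hypothetical plug-in values, for illustration only.
V = np.array([[-2.0, 0.0], [0.0, -4.0]])
Lam = np.array([[4.0, 0.0], [0.0, 16.0]])
se = sandwich_standard_errors(V, Lam, n=400)
print(se)
```

In this toy case V⁻¹ΛV⁻¹ is the identity matrix, so both standard errors equal 1/√400 = 0.05.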
B.3 Proof of Theorem 2
The conditional Kendall’s tau coefficient that captures the effect of Y2 on Y1 is given
as
\[
\hat\tau_{21} \equiv \frac{\sum_{i \neq j} \tilde\omega_{ij}\, \operatorname{sgn}(Y_{1i} - Y_{1j})\,
\operatorname{sgn}(X_{2i}'\hat\gamma - X_{2j}'\hat\gamma)}{\sum_{i \neq j} \tilde\omega_{ij}}, \tag{42}
\]
For notational convenience, we again introduce the terms
$(\zeta_i, \xi_i, \chi_i) = (X_{3i}'\delta_0, X_{2i}'\gamma_0, X_{1i}'\beta_0)$
and (ζij , ξij , χij , Y2ij ) = (ζi − ζj , ξi − ξj , χi − χj , Y2i − Y2j ).
The difference between (42) and (43) is
We first evaluate the probability limit of the denominator term in (44). Note that
$\sum_{i \neq j} \tilde\omega_{ij} = E[\tilde\omega_{ij}] + o_p(1)$ by the ULLN for the U-process. By the change of variable
and Assumption 9, E[ω̃ij ] → E[f (χi )] as n → ∞ and hn → 0. Then we consider the
following decomposition to deal with the numerator term in (44):
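\[
\begin{aligned}
& \frac{1}{n(n-1)} \sum_{i \neq j} \tilde\omega_{ij}\, [\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\hat\xi_{ij}) - \tau_{21}] \\
&= \frac{1}{n(n-1)} \sum_{i \neq j} \omega_{ij}\, [\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21}] && (45) \\
&\quad + \frac{1}{n(n-1)} \sum_{i \neq j} (\tilde\omega_{ij} - \omega_{ij})\, [\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21}] && (46) \\
&\quad + \frac{1}{n(n-1)} \sum_{i \neq j} \tilde\omega_{ij}\, \operatorname{sgn}(Y_{1ij})\, [\operatorname{sgn}(\hat\xi_{ij}) - \operatorname{sgn}(\xi_{ij})], && (47)
\end{aligned}
\]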
where ωij = kh (χij ) · 1{Y3i = Y3j }. By combining a result for (45) with the asymptotic
properties of the denominator term in (44), we get the limiting distribution of the
infeasible test statistic that depends on true parameters γ0 and β0 . Also, results for
(46) and (47) show how the influence function is adjusted by using estimated values
γ̂ and β̂. We deal with (45), (46), and (47) consecutively and collect those results to
obtain the asymptotic linear representation for τ̂21 .
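The infeasible statistic built from ωij = kh (χij ) · 1{Y3i = Y3j } can be sketched in a few lines. This is an illustration, not the paper's code: the Gaussian kernel, the bandwidth value, the simulated inputs, and the function name `kernel_weighted_tau` are all assumptions, and the indices ξ and χ are passed in directly rather than estimated.

```python
import numpy as np

def kernel_weighted_tau(y1, xi, chi, y3, h):
    """Kernel-weighted Kendall's tau between y1 and the index xi.

    Pairs (i, j) are weighted by k_h(chi_i - chi_j) * 1{y3_i == y3_j}.
    """
    d_chi = (chi[:, None] - chi[None, :]) / h
    # Gaussian kernel: an illustrative choice, not taken from the paper.
    k_h = np.exp(-0.5 * d_chi**2) / (np.sqrt(2.0 * np.pi) * h)
    w = k_h * (y3[:, None] == y3[None, :])
    np.fill_diagonal(w, 0.0)  # the sums run over i != j
    concordance = np.sign(y1[:, None] - y1[None, :]) * np.sign(xi[:, None] - xi[None, :])
    return np.sum(w * concordance) / np.sum(w)

rng = np.random.default_rng(0)
n = 200
xi = rng.standard_normal(n)
chi = rng.standard_normal(n)
y3 = rng.integers(0, 2, size=n)

tau_up = kernel_weighted_tau(2.0 * xi + 1.0, xi, chi, y3, h=0.5)  # y1 increasing in xi
tau_down = kernel_weighted_tau(-xi, xi, chi, y3, h=0.5)           # y1 decreasing in xi
print(tau_up, tau_down)
```

When Y1 is a strictly increasing (decreasing) function of the index, every weighted pair is concordant (discordant), so the statistic equals 1 (−1) regardless of the weights. The pairwise matrices need O(n²) memory, which is fine at the sample sizes used in the simulations.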
Let us consider (45). We apply the Hoeffding decomposition to (45) to obtain the
representation in the limit. As n → ∞ and hn → 0, the expectation of the term inside
the double summation becomes
\[
\begin{aligned}
& \int K(u)\, E[\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21} \mid \chi_j + uh_n, \chi_j]\, f(\chi_j + uh_n)\, du\, dF(\chi_j) \\
&= \int K(u)\, E[\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21} \mid \chi_j, \chi_j]\, f(\chi_j)\, du\, dF(\chi_j) + O(h_n^2) \\
&= O(h_n^2),
\end{aligned}
\]
which is $o_p(n^{-1/2})$ by the higher order properties of the kernel function K(·). Also,
we obtain the conditional expectations of the term inside the double summation
conditional on its first and second arguments, i and j. Consider the arguments w.r.t. i,
\[
\begin{aligned}
& \int K(u)\, [\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21}]\, f(\chi_i + uh_n)\, du\, dF(Y_{1j}, \xi_j) \\
&= \int K(u)\, [\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21}]\, f(\chi_i)\, du\, dF(Y_{1j}, \xi_j) + O(h_n^2)
\end{aligned}
\]
Following a similar procedure, we obtain an analogous representation for the
arguments with respect to j. Applying similar arguments used to prove Lemma 3.1 in
Powell et al. [1989], the degenerate U-statistic of order 2 in the Hoeffding
decomposition is $o_p(n^{-1/2})$.
Therefore, (45) has the following representation in the limit:
\[
\frac{1}{n(n-1)} \sum_{i \neq j} \omega_{ij}\, [\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21}]
= \frac{1}{n} \Big[ \sum_{i=1}^{n} U(Y_{1i}, \xi_i) + \sum_{j=1}^{n} U(Y_{1j}, \xi_j) \Big] \times \Pr[Y_{3i} = Y_{3j}] + o_p(n^{-1/2}). \tag{50}
\]
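Let us consider (46). By a mean-value expansion, the consistency of β̂, and the
linear representation for β̂, it holds that
\[
\begin{aligned}
& \frac{1}{n(n-1)} \sum_{i \neq j} h_n^{-1} k_h'(\chi_{ij}^*)\, X_{1ij}'\, 1\{Y_{3i} = Y_{3j}\}\, (\hat\beta - \beta_0)\,
[\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21}] \\
&= \frac{1}{n(n-1)(n-2)} \sum_{i \neq j \neq k} h_n^{-1} k_h'(\chi_{ij})\, 1\{Y_{3i} = Y_{3j}\}\,
[\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21}]\, X_{1ij}'\, \psi_{\beta k} + o_p(n^{-1/2}).
\end{aligned}
\tag{51}
\]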
Now, we consider the Hoeffding decomposition of (51). Note that its unconditional
expectation and conditional expectation with respect to each of the first and the
second argument, i and j, are zero. Therefore, we only need the following conditional
expectation with respect to the third argument k:
\[
\frac{h_n^{-1}}{n} \sum_{k=1}^{n} E\big[ k_h'(\chi_{ij})\, 1\{Y_{3i} = Y_{3j}\}\,
[\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21}]\, X_{1ij}' \big]\, \psi_{\beta k} + o_p(n^{-1/2}). \tag{52}
\]
where
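\[
\begin{aligned}
& \int K'(u)\, D(Y_{3i} = Y_{3j}, \chi_j + uh_n, \chi_j)\, f(\chi_j + uh_n)\, du\, dF(\chi_j) \\
&= h_n \int K'(u)\, [D_1(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f(\chi_j) + D(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f'(\chi_j)]\, u\, du\, dF(\chi_j) + O(h_n),
\end{aligned}
\tag{54}
\]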
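\[
\begin{aligned}
& \frac{1}{n(n-1)} \sum_{i \neq j} (\hat\omega_{ij} - \omega_{ij})\, [\operatorname{sgn}(Y_{1ij}) \operatorname{sgn}(\xi_{ij}) - \tau_{21}] \\
&= -\frac{1}{n} \sum_{k=1}^{n} E[D_1(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f(\chi_j) + D(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f'(\chi_j)]\,
\Pr[Y_{3i} = Y_{3j}]\, \psi_{\beta k} + o_p(n^{-1/2}).
\end{aligned}
\tag{55}
\]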
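\[
\frac{1}{n(n-1)} \sum_{i \neq j} \hat\omega_{ij}\, \operatorname{sgn}(Y_{1ij})\, [\operatorname{sgn}(\hat\xi_{ij}) - \operatorname{sgn}(\xi_{ij})]
- \frac{1}{n(n-1)} \sum_{i \neq j} \hat\omega_{ij}\, \operatorname{sgn}(Y_{1ij})\,
E[\operatorname{sgn}(\hat\xi_{ij}) - \operatorname{sgn}(\xi_{ij}) \mid X_{2i}, X_{2j}]. \tag{56}
\]
\[
\frac{1}{n(n-1)} \sum_{i \neq j} \hat\omega_{ij}\, \operatorname{sgn}(Y_{1ij})\,
E[\operatorname{sgn}(\hat\xi_{ij}) - \operatorname{sgn}(\xi_{ij}) \mid X_{2i}, X_{2j}] + o_p(n^{-1}).
\]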
Recall that we set the first component of γ to 1 as a normalization. Then γ can be
written as γ = (1, γ̃), where 1 is the first component of γ and γ̃ is the other compo-
nents of γ. We have
where Fξ|X̃2 is the conditional CDF of ξij given X̃2i and X̃2j . Because the linear index
ξi = X2i γ is smooth in a neighborhood of γ0 given X̃2i and X̃2j by Assumption 5, the
conditional CDF of ξij is sufficiently smooth for a mean-value expansion:
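\[
F_{\xi|\tilde X_2}(\hat\xi_{ij}) = F_{\xi|\tilde X_2}(\tilde X_{2ij}\tilde\gamma_0)
+ f_{\xi|\tilde X_2}(\tilde X_{2ij}\tilde\gamma_0)(\hat\gamma - \gamma_0) + O_p(\|\hat\gamma - \gamma_0\|^2).
\]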
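\[
\begin{aligned}
& \frac{2}{n(n-1)} \sum_{i \neq j} \hat\omega_{ij}\, \operatorname{sgn}(Y_{1ij})\, f_{\xi|\tilde X_2}(\xi_{ij})(\hat\gamma - \gamma_0)
+ O_p(\|\hat\gamma - \gamma_0\|^2) + o_p(n^{-1}) \\
&= \frac{2}{n(n-1)(n-2)} \sum_{i \neq j \neq k} \hat\omega_{ij}\, \operatorname{sgn}(Y_{1ij})\, f_{\xi|\tilde X_2}(\xi_{ij})\, \tilde X_{2ij}'\, \psi_{\gamma k} + o_p(n^{-1}).
\end{aligned}
\tag{57}
\]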
Now, we consider the Hoeffding decomposition of (57). Again, notice that its
unconditional expectation and conditional expectation on the first and second
argument, i and j, are zero. Therefore, we only need the following conditional
expectation on the third argument k:
\[
\frac{2}{n} \sum_{k=1}^{n} E[\hat\omega_{ij}\, \operatorname{sgn}(Y_{1ij})\, f_{\xi|\tilde X_2}(\xi_{ij})\, \tilde X_{2ij}']\, \psi_{\gamma k} + o_p(n^{-1}). \tag{58}
\]
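By the consistency of γ̂, we can work with the true ωij instead of ω̂ij . Then the
expectation term in (58) is
\[
\begin{aligned}
& \int K(u)\, E[\operatorname{sgn}(Y_{1ij})\, f_{\xi|\tilde X_2}(\xi_{ij})\, \tilde X_{2ij}' \mid Y_{3i} = Y_{3j}, \chi_j + uh_n, \chi_j]\, f(\chi_j + uh_n)\, du\, dF(\chi_j) \\
&= \int K(u)\, E[\operatorname{sgn}(Y_{1ij})\, f_{\xi|\tilde X_2}(\xi_{ij})\, \tilde X_{2ij}' \mid Y_{3i} = Y_{3j}, \chi_j, \chi_j]\, f(\chi_j)\, du\, dF(\chi_j) + O(h_n^2)
\end{aligned}
\]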
where
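\[
\frac{1}{n(n-1)} \sum_{i \neq j} \hat\omega_{ij}\, \operatorname{sgn}(Y_{1ij})\, [\operatorname{sgn}(\hat\xi_{ij}) - \operatorname{sgn}(\xi_{ij})]
= \frac{2}{n} \sum_{k=1}^{n} E[T(Y_{3i} = Y_{3j}, \chi_j, \chi_j)]\, \Pr[Y_{3i} = Y_{3j}]\, \psi_{\gamma k} + o_p(n^{-1/2}). \tag{60}
\]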
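Taking all of our results (50), (55), and (60) into account, we obtain the linear
representation for τ̂21 − τ21 :
\[
\begin{aligned}
\hat\tau_{21} - \tau_{21}
= \frac{\Pr[Y_{3i} = Y_{3j}]}{E[f(\chi_i)]}\, \frac{1}{n} \Big[
& \sum_{i=1}^{n} U(Y_{1i}, \xi_i) + \sum_{j=1}^{n} U(Y_{1j}, \xi_j) \\
& - \sum_{i=1}^{n} E[D_1(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f(\chi_j) + D(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f'(\chi_j)]\, \psi_{\beta i} \\
& + 2 \sum_{i=1}^{n} E[T(Y_{3i} = Y_{3j}, \chi_j, \chi_j)]\, \psi_{\gamma i} \Big] + o_p(n^{-1/2}).
\end{aligned}
\]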
References
Kees Jan van Garderen and Noud van Giersbergen. A nearly similar powerful test
for mediation. arXiv preprint arXiv:2012.11342, 2020.
James Heckman, Rodrigo Pinto, and Peter Savelyev. Understanding the mechanisms
through which an influential early childhood program boosted adult outcomes.
American Economic Review, 103(6):2052–2086, 2013.
Martin Huber. Mediation analysis. Handbook of Labor, Human Resources and Population
Economics, pages 1–38, 2020.
James M Robins and Sander Greenland. Identifiability and exchangeability for direct
and indirect effects. Epidemiology, 3(2):143–155, 1992.
Judea Pearl. Direct and indirect effects. In Probabilistic and causal inference: the works
of Judea Pearl, pages 373–392. 2022.
Kosuke Imai, Luke Keele, and Dustin Tingley. A general approach to causal media-
tion analysis. Psychological methods, 15(4):309, 2010a.
Sung Jae Jun, Joris Pinkse, Haiqing Xu, and Neşe Yıldız. Multiple discrete endoge-
nous variables in weakly-separable triangular models. Econometrics, 4(1):7, 2016.
Markus Frölich and Martin Huber. Direct and indirect treatment effects–causal
chains and mediation analysis with instrumental variables. Journal of the Royal
Statistical Society, 79(5):1645–1666, 2017.
Christian Dippel, Robert Gold, Stephan Heblich, and Rodrigo Pinto. Mediation anal-
ysis in iv settings with a single instrument. Working Paper, 2020.
Yi-Ting Chen, Yu-Chin Hsu, and Hung-Jen Wang. A stochastic frontier model with
endogenous treatment status and mediator. Journal of business & economic statistics,
38(2):243–256, 2020.
Marshall M Joffe, Dylan Small, Thomas Ten Have, Steve Brunelli, and Harold I Feld-
man. Extended instrumental variables estimation for overall effects. The interna-
tional journal of biostatistics, 4(1), 2008.
Stacey H Chen, Yen-Chien Chen, and Jin-Tan Liu. The impact of family composition
on educational achievement. Journal of Human Resources, 54(1):122–170, 2019.
Christian Dippel, Robert Gold, Stephan Heblich, and Rodrigo Pinto. The effect of
trade on workers and voters. The Economic Journal, 132(641):199–217, 2022.
Jason Abrevaya, Jerry A Hausman, and Shakeeb Khan. Testing for causal effects in
a generalized regression model with endogenous regressors. Econometrica, 78(6):
2043–2061, 2010.
Azeem Shaikh and Edward J Vytlacil. Threshold crossing models and bounds on
treatment effects: a nonparametric analysis, 2005.
Jay Bhattacharya, Azeem M Shaikh, and Edward Vytlacil. Treatment effect bounds
under monotonicity assumptions: an application to Swan-Ganz catheterization.
American Economic Review, 98(2):351–356, 2008.
Cecilia Machado, Azeem M Shaikh, and Edward J Vytlacil. Instrumental variables
and the sign of the average treatment effect. Journal of Econometrics, 212(2):522–555,
2019.
Hidehiko Ichimura. Semiparametric least squares (SLS) and weighted SLS estimation
of single-index models. Journal of Econometrics, 58(1-2):71–120, 1993.
Kosuke Imai, Luke Keele, and Teppei Yamamoto. Identification, inference and sen-
sitivity analysis for causal mediation effects. Statistical Science, 25(1):51–71, 2010b.
Christopher Cavanagh and Robert P Sherman. Rank estimators for monotonic index
models. Journal of Econometrics, 84(2):351–381, 1998.
Robert P Sherman. The limiting distribution of the maximum rank correlation esti-
mator. Econometrica: Journal of the Econometric Society, pages 123–137, 1993.
Ariel Pakes and David Pollard. Simulation and the asymptotics of optimization es-
timators. Econometrica: Journal of the Econometric Society, pages 1027–1057, 1989.
Michael D Perlman and Lang Wu. The emperor’s new tests. Statistical Science, 14(4):
355–369, 1999.
Jeremy C Biesanz, Carl F Falk, and Victoria Savalei. Assessing mediational models:
Testing and interval estimation for indirect effects. Multivariate Behavioral Research,
45(4):661–701, 2010.
Saralees Nadarajah and Tibor K Pogány. On the distribution of the product of cor-
related normal random variables. Comptes Rendus Mathematique, 354(2):201–204,
2016.
GFV Glonek. On the behaviour of Wald statistics for the disjunction of two regular
hypotheses. Journal of the Royal Statistical Society: Series B (Methodological), 55(3):
749–755, 1993.
Jean-Marie Dufour, Eric Renault, and Victoria Zinde-Walsh. Wald tests when restric-
tions are locally singular. arXiv preprint arXiv:1312.0569, 2013.
Mathias Drton and Han Xiao. Wald tests of singular hypotheses. Bernoulli, pages
38–59, 2016.
Zhonghua Liu, Jincheng Shen, Richard Barfield, Joel Schwartz, Andrea A Baccarelli,
and Xihong Lin. Large-scale hypothesis testing for causal mediation effects with
applications in genome-wide epigenetic studies. Journal of the American Statistical
Association, 117(537):67–81, 2022.
Noud PA van Giersbergen et al. Inference about the indirect effect: a likelihood
approach. UvA-Econometrics Discussion Papers, 2014.
Shakeeb Khan, Xiaoying Lan, Elie Tamer, and Qingsong Yao. Estimating high dimen-
sional monotone index models by iterative convex optimization. arXiv preprint
arXiv:2110.04388, 2023.
Deborah Nolan and David Pollard. U-processes: rates of convergence. The Annals of
Statistics, pages 780–799, 1987.
Whitney K Newey and Daniel McFadden. Large sample estimation and hypothesis
testing. Handbook of Econometrics, 4:2111–2245, 1994.