JMP_JHLee

This paper presents a semiparametric framework for causal mediation analysis in generalized regression models, addressing the identification of direct and indirect treatment effects with endogenous treatments and mediators. It introduces kernel-weighted Kendall’s tau statistics for testing the significance of these effects and demonstrates the model's flexibility in accommodating various types of variables. Monte Carlo simulations validate the proposed testing approach, showing its effectiveness in detecting non-zero indirect effects and maintaining appropriate test sizes.


Causal Mediation Analysis in a Generalized Regression Model

Jung Hyub Lee *

September 1, 2024


Abstract
We consider a unifying framework to test for direct and indirect treatment
effects in nonlinear models. Specifically, we extend a generalized linear-index
model to incorporate endogenous treatments and endogenous mediators. We
propose kernel-weighted Kendall’s tau statistics to test the significance of the di-
rect and indirect effects of endogenous treatments on the outcome variable medi-
ated by endogenous mediators. The proposed semiparametric model allows for
treatments and mediators to be discrete, continuous, and/or censored/truncated.
For the indirect effect, we construct two distinct kernel-weighted Kendall’s tau
statistics that capture the effect of (i) the treatment on the mediator and (ii) the
mediator on the outcome. Applying the testing approach of van Garderen and
van Giersbergen [2020] avoids the problem of under-sized testing of the joint
null hypothesis associated with the indirect effect. Monte Carlo simulations
investigate the performance of the semiparametric testing approach.

JEL Numbers: C12, C13, C14, C31, C36


Keywords: Causal inference; Treatment effects; Mediation; Endogeneity; Instru-
mental variables; Semiparametric estimation.

* Department of Economics, University of Texas at Austin. Email: [email protected]. I am deeply
grateful to Jason Abrevaya for his guidance and support. Also, I thank Brendan Kline, Haiqing Xu,
Shakeeb Khan, Stephen Donald, Yu-Chin Hsu, Daniel Ackerberg, and participants at the 2023 Texas
Econometrics Camp and the 2023 Midwest Econometrics Group Conference for their helpful com-
ments and suggestions.

1 Introduction
Causal mediation analysis decomposes a treatment effect into two components: the
indirect effect (or causal mediation effect), which occurs through one or more me-
diator variables, and the direct effect, which captures the impact of the treatment
through other causal mechanisms not involving the mediator variable(s) of interest.
The term ’mediator’ refers to a variable that is affected by the treatment and, in turn,
affects the outcome variable of interest. If policymakers find that a treatment has an
indirect effect on the outcome variable through a specific mediator, they may focus their
efforts on that causal mechanism to achieve their desired goal. Starting with Judd and
Kenny [1981] and Baron and Kenny [1986], causal mediation analysis was largely
developed by psychologists and sociologists. However, this framework has gained
popularity among economists as a tool to identify direct and indirect effects in eco-
nomic models (e.g., Heckman et al. [2013] and Heckman and Pinto [2015], among
many others).
There are two primary approaches to identifying causal mediation effects: the
parametric model approach and the potential outcome framework 1 . Sobel [1982]
and Baron and Kenny [1986] introduced linear models to identify indirect effects.
This method gives an easy and intuitive interpretation of indirect effects. How-
ever, it is restrictive since it does not fully incorporate interactions between variables
and assumes independence between outcome variables. The potential outcome ap-
proach is more flexible, as it typically does not assume parametric specification of
variables (e.g., Robins and Greenland [1992], Pearl [2022], among many others). A
recent contribution that shows the advantage of this approach is the identification of
ACME (average causal mediation effect) by Imai et al. [2010a]. However, the poten-
tial outcome framework typically hinges on strong statistical assumptions to achieve
nonparametric identification of indirect effects. For example, the identification strat-
egy of Imai et al. [2010a] requires sequential ignorability, which is untestable like the
unconfoundedness assumption.
At the same time, most research in causal mediation analysis considers (condi-
tionally) random treatment, which is more relevant to experimental settings. Even
if the treatment is randomized, it is often the case that the mediator is not random-
ized. Consequently, the endogeneity of the mediator can arise due to post-treatment
confounders. Although there are models that use instrumental variables to handle
endogenous treatment/mediator (e.g., Powdthavee et al. [2013], Jun et al. [2016],
Frölich and Huber [2017], Dippel et al. [2020], and Chen et al. [2020]), there have been
few studies appropriate for quasi-experimental designs in economics.
This paper introduces a new semiparametric model that offers several advan-
tages over previous models. Using this model, one can identify and estimate the
existence and direction of direct and indirect effects in the presence of not only an en-
dogenous treatment but also an endogenous mediator. Unlike previous models that
focus on specific types of treatment, mediator, and outcome variables, our model
accommodates general outcome types (continuous, discrete, truncated, and/or cen-
sored). The model also allows for heterogeneous direct and indirect effects. The
model includes an endogenous mediator that is affected by the treatment and
subsequently affects the outcome variable of interest. Specifically, this extends the
generalized regression framework by including an endogenous treatment and an
endogenous mediator. The model has a triangular structure, with three equations that
correspond to the treatment, the mediator, and the outcome.
1
See Huber [2020] and Celli [2022] for a general review of causal mediation models.

Semiparametric test statistics, called conditional Kendall’s tau statistics, are pro-
posed to test the existence of direct and indirect effects in the model. Estimating
the direct effect requires just one conditional Kendall’s tau because it only takes into
account the treatment and the outcome variable of interest. On the other hand, the
indirect effect consists of two quantities: (i) the effect of the treatment on the medi-
ator and (ii) the effect of the mediator on the outcome. There exists a causal medi-
ation effect of the treatment if and only if these two effects are nonzero. Therefore,
we can conduct a joint hypothesis test based upon two distinct conditional Kendall’s
tau statistics that capture (i) and (ii), respectively. These two Kendall’s tau statistics
depend on estimated linear-index parameters in the treatment, mediator, and outcome
equations. We first estimate the linear-index parameters using certain matching conditions
that compare a pair of observations of lower-level outcome variables and their linear
indices. We show that the estimated linear indices are √n-consistent and asymptotically
normal. Then, we derive conditional Kendall’s tau statistics using the plug-in
method. Finally, we prove that, given √n-consistent and asymptotically normal
estimators for the linear indices, our new estimator that captures (ii) based on the plug-in
method is also √n-consistent and asymptotically normal.
Using these two test statistics and a new method introduced by van Garderen
and van Giersbergen [2020], we make simultaneous inference to test the existence
of the indirect effect. It has been noted that classic test methods (e.g., Sobel’s
test and the joint significance test) are severely under-sized when the true values of (i) and (ii)
are very close to or equal to zero. Although the test method in van Garderen and
van Giersbergen [2020] was originally developed for the linear causal mediation
model without treatment/mediator endogeneity, we show that it can be applied to
the semiparametric model with endogeneity.
This paper also provides Monte Carlo evidence to verify that the theoretical
results are valid. The data generating processes (DGPs) are based on different types
of treatment variables and mediator variables. The simulation results show that the
testing method (i) has the correct size and (ii) can detect non-zero indirect
effects, because its statistical power increases as the sample size increases and as the true
indirect effects assumed in the DGPs deviate from zero.

1.1 Literature Review


This paper contributes to the literature on causal mediation analysis, focusing on
a quasi-experimental design with two instrumental variables for the endogenous
mediator and the treatment. Many causal mediation models with noncompliance
and endogeneity problem hinge on a single instrument (e.g. Robins and Greenland
[1992], Geneletti [2007], Sobel [2008], Joffe et al. [2008], Yamamoto [2013], Chen et al.
[2019], Dippel et al. [2020], among many others). Broadly speaking, these models
impose at least one of the following restrictions to disentangle direct and indirect
effects: (i) a parametric structure, (ii) a perfect single IV that affects both treatment
and mediator, and/or (iii) a restriction on the (conditional) dependence structure
between variables.
Some exceptions that introduce two instrumental variables for the treatment and
mediator are Powdthavee et al. [2013], Burgess et al. [2015], Jhun [2015], and Chen
et al. [2020]. They assume different parametric structures to identify indirect ef-
fects. Frölich and Huber [2017] discuss nonparametric identification of direct and
indirect effects among treatment compliers in the presence of an endogenous
treatment/mediator. They assume a binary treatment variable and consider three different
scenarios based on a distribution of the mediator and the instrument for the me-
diator. Specifically, the mediator and instrument can be either discrete or continu-
ous, but they cannot both be discrete. In contrast, this paper permits various types
of treatment and mediator, such as continuous, censored, or truncated variables.
Also, the work proposed here focuses on the statistical significance of the indirect
effect rather than its magnitude. Additional studies addressing the issue of treat-
ment/mediator endogeneity include Dippel et al. [2020] and Dippel et al. [2022],
which consider parametric and nonparametric identification of indirect effects in the
presence of two sources of endogeneity. However, their approach focuses on a spe-
cific scenario where a single IV can address both the endogeneity of the treatment
and the mediator.
Although our model is semiparametric, the new test statistics that capture the
indirect effect have properties similar to those of classic estimators for the linear
causal mediation model (e.g., asymptotic normality). To be specific, Sobel [1982] and
Baron and Kenny [1986] consider a system of equations for causal mediation
and least squares estimators for the indirect effect. This paper shows that Kendall’s
tau statistics derived from our model that capture indirect effects can be used to
make simultaneous inference like those estimators. However, our test statistics are
different from classic estimators in that these statistics themselves do not represent
the magnitude of the indirect effect. Rather than magnitude, we focus on the sign
and existence of indirect effects and the statistical significance of the indirect effect
in terms of z−statistics. This is due to the flexibility of the model (see Abrevaya et al.
[2010] for a discussion).
This paper also adds to the literature on the identification of and inference on the
sign of causal effects in the presence of endogeneity (e.g., Shaikh and Vytlacil [2005],
Bhattacharya et al. [2008], Chiburis [2010], Shaikh and Vytlacil [2011], and Machado
et al. [2019]). Specifically, Abrevaya et al. [2010] extend the generalized regression
framework and propose a kernel-weighted version of Kendall’s tau statistics that
captures the existence and direction of a causal effect of an endogenous regressor.
Also, Kline [2016] identifies the existence and direction of a causal effect at a partic-
ular value of exogenous variables, instead of overall causal effects, using the gen-
eralized regression framework. In contrast, this paper focuses on the existence and
direction of the causal mediation effect and the associated inference.

1.2 Outline
The rest of this paper is organized as follows. Section 2 introduces the generalized
regression model, which is an extension of Han [1987]. The system of equations in-
cludes two possibly endogenous regressors to allow for the treatment and mediator
endogeneity. Section 3 introduces a three-step procedure for testing the significance of
the indirect effect of the endogenous regressor mediated by the endogenous mediator.
A kernel-weighted Kendall’s tau coefficient computed in the third stage, which
is √n-consistent and asymptotically normal, captures the effect of the mediator on
the outcome. To test the significance of the indirect effect, this statistic is combined
with another kernel-weighted Kendall’s tau statistic that captures the effect of the
treatment on the mediator. We review some issues that classic tests for indirect ef-
fects have and introduce a testing method that overcomes these problems. Com-
plete proofs of the theorems in this section are relegated to the Appendix. Section
4 provides Monte Carlo simulation results to evaluate the performance of the third-
stage test statistics for the indirect effects. The pattern of results shows that the new
method performs well in terms of the size of the test and statistical power.

2 The Model
We consider a system of three equations that constitute the generalized regression
model for the outcome variable Y1 , the mediator variable Y2 , and the treatment vari-
able Y3 . Each variable has an associated latent variable, determined by an unknown
function F . For each variable, the function D describes the form of the observable
variable.

Y1 = D1 (Y1∗ ) and Y1∗ = F1 (X1′ β0 , Y2 , Y3 , ϵ1 ), (1)


Y2 = D2 (Y2∗ ) and Y2∗ = F2 (X2′ γ0 , Y3 , ϵ2 ), (2)
Y3 = D3 (Y3∗ ) and Y3∗ = F3 (X3′ δ0 , ϵ3 ), (3)

where X2 = (X1 , Z2 ) and X3 = (X1 , Z3 ) 2 . The model for the latent dependent
variables (Y1∗ , Y2∗ , Y3∗ ) has a general linear-index form, where (ϵ1 , ϵ2 , ϵ3 ) ∈ ℝ³
represents the vector of error disturbances. (X1 , Z2 , Z3 ) are assumed to be independent
of (ϵ1 , ϵ2 , ϵ3 ).
There are no restrictions on the correlation structure among the error distur-
bances (ϵ1 , ϵ2 , ϵ3 ), which allows for endogeneity of the treatment and the mediator.
To handle endogeneity issues, the exclusion restrictions for Y2 and Y3 are provided
by the two subcomponents Z2 and Z3 in X2 and X3 , respectively.
Powell et al. [1989], Ichimura [1993], Klein and Spady [1993], and Abrevaya
[2000] previously considered nonadditive index models. Chesher [2003], Chesher
[2005], and Imbens and Newey [2009] analyzed triangular models without a linear
index. In contrast to those models, each equation in our model is allowed to depend on
not only a linear index but also lower-level outcome variables.
2
One possible extension is X2 = (X1 , Z2 ) and X3 = (X1 , Z2 , Z3 ). Because Z2 for the mediator Y2
is allowed to enter both X2 and X3 , it may also affect the treatment Y3 .

The functions F1 and F2 are assumed to be strictly monotone in their first
arguments (linear indices) and their last arguments (error disturbances). Furthermore,
these two functions are assumed to be weakly monotone in the lower-level outcomes
included as additional indices. Also, we assume that the functions D1 and D2 are
weakly increasing and nondegenerate. These assumptions are formally stated as
follows:
Assumption F.

(i) The function F1 (·, y2 , y3 , e1 ) is strictly monotone for all y2 , y3 , and e1 . Also, the func-
tion F2 (·, y3 , e2 ) is strictly monotone for all y3 and e2 .

(ii) The function F1 (x′1 β0 , y2 , y3 , ·) is strictly monotone for every x′1 β0 , y2 , and y3 . Also,
the function F2 (x′2 γ0 , y3 , ·) is strictly monotone for all x′2 γ0 and y3 .

(iii) The function F1 (x′1 β0 , ·, y3 , e1 ) is weakly monotone for all x′1 β0 , y3 , and e1 . Also,
F1 (x′1 β0 , y2 , ·, e1 ) is weakly monotone for all x′1 β0 , y2 , and e1 . The function F2 (x′2 γ0 , ·, e2 )
is weakly increasing for all x′2 γ0 and e2 .

Assumption D. The functions D1 and D2 are nondegenerate and weakly increasing in Y1∗
and Y2∗ , respectively.

Assumption D allows the model to incorporate various nonlinear models by
introducing weak monotonicity in the latent term. Thus we can handle various types of
mediator/outcome variables by Assumption D (e.g., binary choice models, ordered-
choice models, censored-regression models, transformation models, and propor-
tional hazard rate models).
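
As a minimal illustration of how Assumption D nests several familiar observation rules, the following Python sketch simulates one instance of the system (1) - (3). The additive forms chosen for F1, F2, and F3, the coefficient values, the equicorrelated normal errors, and all function names are assumptions made only for illustration; they are not taken from the paper.

```python
import numpy as np

# Three weakly increasing, nondegenerate choices of D (illustrative only).
def d_binary(y_star):
    """Binary choice: observe 1{y* > 0}."""
    return (y_star > 0).astype(float)

def d_censored(y_star, c=0.0):
    """Censored regression: observe max(y*, c)."""
    return np.maximum(y_star, c)

def d_ordered(y_star, cuts=(-0.5, 0.5)):
    """Ordered choice: observe the index of the interval containing y*."""
    return np.digitize(y_star, cuts).astype(float)

def simulate(n, beta2=1.0, gamma3=1.0, rho=0.5, seed=0):
    """Draw (Y1, Y2, Y3, X1, Z2, Z3) from a hypothetical triangular design."""
    rng = np.random.default_rng(seed)
    x1, z2, z3 = rng.normal(size=(3, n))
    # Correlated errors make both the treatment and the mediator endogenous.
    cov = np.full((3, 3), rho)
    np.fill_diagonal(cov, 1.0)
    e1, e2, e3 = rng.multivariate_normal(np.zeros(3), cov, size=n).T
    y3 = d_binary(x1 + z3 + e3)                      # treatment, eq. (3)
    y2 = d_censored(x1 + z2 + gamma3 * y3 + e2)      # mediator, eq. (2)
    y1 = d_ordered(x1 + beta2 * y2 + 0.5 * y3 + e1)  # outcome, eq. (1)
    return {"y1": y1, "y2": y2, "y3": y3, "x1": x1, "z2": z2, "z3": z3}

data = simulate(1000)
```

Setting `gamma3=0` or `beta2=0` in this toy design switches off one of the two links of the indirect path while leaving the endogeneity of Y2 and Y3 intact.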
To determine whether the endogenous Y3 affects Y1 through Y2 , we need to con-
sider two effects: (i) the effect of the treatment on the mediator, and (ii) the effect
of the mediator on the outcome 3 . Based on Assumptions F and D, we can derive
some properties required to derive matching conditions that will be used for the
estimation in Section 3. For instance, if F1 is strictly increasing in X1′ β0 and ϵ1 ,

ν ′ > ν ′′ =⇒ F1 (ν ′ , y2 , y3 , e) > F1 (ν ′′ , y2 , y3 , e) for all y2 , y3 , and e,


e′ > e′′ =⇒ F1 (ν, y2 , y3 , e′ ) > F1 (ν, y2 , y3 , e′′ ) for all ν, y2 , and y3 . (4)

Furthermore, the direction of monotonicity with respect to Y2 and Y3 is invariant to
the values of the other outcome variable, the linear index, and the error term. For
instance, let us consider the weak monotonicity in Y2 . Given x′1 β0 , y3 , and e, it holds
that either

y2′ > y2′′ =⇒ F1 (ν, y2′ , y3 , e) ≥ F1 (ν, y2′′ , y3 , e) for all ν, y3 , and e or
y2′ > y2′′ =⇒ F1 (ν, y2′ , y3 , e) ≤ F1 (ν, y2′′ , y3 , e) for all ν, y3 , and e, (5)

with strict inequality on some region of the support of Y2 . Note that the variable Y3
can be a moderator in the relationship between Y2 and Y1 , which is possible since
Y3 and Y2 are allowed to interact in the equation for the outcome Y1 . Thus, Y3 can
influence the strength and direction of the effect of Y2 on Y1 . In this case, the indirect
effect can be described as "moderated mediation" (e.g., Model 1 in Preacher et al.
[2007]).
3
Note that without (1) for the outcome variable of interest, the system of equations that comprises
(2) and (3) for the mediator and the treatment is the same as the one considered in Abrevaya et al.
[2010] in the context of causal effects without endogenous mediator. Therefore, we can focus on the
functions F1 and D1 in this paper to see how Y2 affects Y1 .

At this point, we discuss our approach for handling the direct and indirect effects
of Y3 on Y1 . For given values y2 , x′1 β0 , and y3⁺⁺ ≠ y3⁺ , the individual direct effect can be
defined as

D1 (F1 (x′1 β0 , y2 , y3⁺⁺ , ϵ1 )) − D1 (F1 (x′1 β0 , y2 , y3⁺ , ϵ1 )). (6)

We will introduce a parameter τ31 , a type of rank correlation that measures the
association between Y3 and Y1 . The null hypothesis for testing the direct effect is
simply

H0 : τ31 = 0. (7)

In addition, for given values y3 , x′1 β0 , x′2 γ0 , and y3⁺⁺ ≠ y3⁺ , the individual indirect effect
(or mediation effect) can be defined as

D1 (F1 (x′1 β0 , y2⁺⁺ , y3 , ϵ1 )) − D1 (F1 (x′1 β0 , y2⁺ , y3 , ϵ1 )), (8)

where

y2⁺⁺ = D2 (F2 (x′2 γ0 , y3⁺⁺ , ϵ2 )), and y2⁺ = D2 (F2 (x′2 γ0 , y3⁺ , ϵ2 )). (9)

Per (8) and (9), a nonzero causal mediation effect requires two conditions: (i)
variation in Y3 effectively shifts Y2 , and (ii) variation in Y2 effectively shifts Y1 . Put
differently, there is no causal mediation effect if either of these conditions is not sat-
isfied. In Section 3, we introduce two parameters τ32 and τ21 that capture the effect
of Y3 on Y2 and the effect of Y2 on Y1 , respectively. To test the indirect effect of Y3 on
Y1 in the model, we can construct two equivalent null hypotheses:

H0 : τ32 = 0 or τ21 = 0 ⇐⇒ H0 : τ32 τ21 = 0. (10)
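
To make definitions (6), (8), and (9) concrete, the sketch below evaluates the individual direct and indirect effects for fixed error draws. The additive F1 and F2, the identity D1, the censoring D2, and the coefficients 0.5, 0.3, and 1.0 are hypothetical choices for illustration only.

```python
def F1(index1, y2, y3, e1):
    # Hypothetical outcome equation: additive in its arguments.
    return index1 + 0.5 * y2 + 0.3 * y3 + e1

def F2(index2, y3, e2):
    # Hypothetical mediator equation.
    return index2 + 1.0 * y3 + e2

def D1(y_star):
    # Outcome observed without transformation.
    return y_star

def D2(y_star):
    # Mediator censored from below at zero.
    return max(y_star, 0.0)

def direct_effect(index1, y2, y3_hi, y3_lo, e1):
    """Expression (6): shift Y3 while holding the mediator fixed at y2."""
    return D1(F1(index1, y2, y3_hi, e1)) - D1(F1(index1, y2, y3_lo, e1))

def indirect_effect(index1, index2, y3, y3_hi, y3_lo, e1, e2):
    """Expressions (8)-(9): shift Y3 only inside the mediator equation."""
    y2_hi = D2(F2(index2, y3_hi, e2))
    y2_lo = D2(F2(index2, y3_lo, e2))
    return D1(F1(index1, y2_hi, y3, e1)) - D1(F1(index1, y2_lo, y3, e1))

effect_uncensored = indirect_effect(0.0, 0.5, 0.0, 1.0, 0.0, 0.0, 0.0)
effect_censored = indirect_effect(0.0, -2.0, 0.0, 1.0, 0.0, 0.0, 0.0)
```

In the second call the mediator index is low enough that censoring pins Y2 at zero for both treatment values, so condition (i) fails and the individual indirect effect is zero even though Y2 would move Y1.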

Note that it is possible to achieve nonparametric identification of the average
causal mediation effect if one is willing to drop the linear index assumption and
assume sequential ignorability (i.e., (i) conditional independence of the treatment, and (ii)
conditional independence of the mediator) to handle endogeneity issues (Imai et al.
[2010a] and Imai et al. [2010b]). However, this assumption is arguably strong and not
testable 4 . One may also consider using a parametric or nonparametric IV approach
(e.g., Sobel [2008], Frölich and Huber [2017], Dippel et al. [2020], and Chen et al.
[2020]). However, these approaches typically restrict the type of variables and/or
the type of endogeneity. Each method has its own advantage if assumptions for the
identification fit into the specific circumstances or context being considered. At the
same time, instead of attempting to identify and estimate an exact magnitude of the
effect (for a given level of treatment Y3 in Y2 and Y1 ), verifying the existence of the
indirect effect under weaker assumptions may have advantages over other methods
if assumptions required for the identification are too restrictive or less plausible.

2.1 The Linear Causal Mediation Model


To provide readers with a better understanding of the indirect effect, we compare our
model with the linear causal mediation model. This model consists of three linear
equations without endogeneity issues. Sobel [1982] and Baron and Kenny [1986] first
considered this type of model:

Y1 = β1 X1 + β2 Y2 + β3 Y3 + ϵ1 , (11)
Y2 = γ1 X1 + γ2 Y3 + ϵ2 , (12)
Y3 = δ1 X1 + ϵ3 . (13)

The model assumes that the treatment Y3 comes before the mediator Y2 and the me-
diator comes before the outcome Y1 , and there is no reverse causality. Additionally,
the model assumes that there is no interaction between the treatment and the
mediator when considering their effect on the outcome. Substituting equation (12) into
(11) yields

Y1 = (β1 + γ1 β2 )X1 + (β3 + γ2 β2 )Y3 + (ϵ1 + β2 ϵ2 ).

The direct effect of Y3 on Y1 is captured by β3 , while the indirect effect mediated by
Y2 is given by γ2 β2 (i.e., Y3 → Y2 → Y1 ) 5 . The total effect is the sum of the direct
and indirect effects, equal to β3 + γ2 β2 . Thus, there is no indirect effect if γ2 β2 = 0,
which is satisfied when Y3 has no effect on Y2 (γ2 = 0) or when Y2 has no effect on Y1
(β2 = 0). In this case, the total effect is equivalent to the direct effect.
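
The decomposition of the total effect into β3 + γ2 β2 can be checked numerically. The sketch below simulates (11) - (13) with hypothetical coefficient values and independent normal errors, fits each equation by OLS, and compares the product-of-coefficients estimate with the total effect from the reduced-form regression of Y1 on (X1, Y3). The helper `ols` and all numbers are assumptions for illustration.

```python
import numpy as np

def ols(y, *columns):
    """OLS coefficients of y on an intercept and the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
y3 = 0.5 * x1 + rng.normal(size=n)                        # eq. (13)
y2 = 0.4 * x1 + 0.8 * y3 + rng.normal(size=n)             # eq. (12), gamma2 = 0.8
y1 = 1.0 * x1 + 0.6 * y2 + 0.3 * y3 + rng.normal(size=n)  # eq. (11), beta2 = 0.6

_, _, beta2_hat, beta3_hat = ols(y1, x1, y2, y3)  # direct effect: beta3_hat
_, _, gamma2_hat = ols(y2, x1, y3)
_, _, total_hat = ols(y1, x1, y3)                 # reduced form in Y3

indirect_hat = gamma2_hat * beta2_hat
```

With OLS and the same exogenous regressors in every equation, `total_hat` equals `beta3_hat + indirect_hat` up to floating-point error, which mirrors the substitution argument above.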
4
Huber [2020] pointed out that in many empirical applications, (i) pre-treatment covariates are
not sufficient to control for the endogeneity of the mediator, and (ii) if there is a substantial time gap
between measuring the treatment and the mediator, it is more likely that there are post-treatment
confounders. These variables invalidate the conditional independence of the mediator and the outcome.
5
More precisely, the direct effect β3 is a mixture of all other effects of Y3 on Y1 not explained by the
indirect effect γ2 β2 .

Figure 1: The causal DAG for the parametric IV model and the generalized regression
model.

[Figure 1 graphic: nodes Z3 , Z2 , X1 , Y3 , Y2 , Y1 ; path labels τ32 (γ3 ), τ21 (β2 ), and τ31 (γ2 ).]

Note: This DAG describes the causal relationships between variables in the parametric IV model, (15)
- (17), and the generalized regression model, (1) - (3). The parameters associated with the indirect
effect (γ3 and β2 , and τ32 and τ21 ) lie on the same path.

There are two equivalent null hypotheses for testing the existence of the indirect
effect:

H0 : γ2 = 0 or β2 = 0 ⇐⇒ H0 : γ2 β2 = 0. (14)

Various hypothesis testing methods for these two equivalent null hypotheses can
be found in MacKinnon et al. [2002]. One of them is testing the significance of the
product of two parameters, γ2 β2 . For example, Sobel [1982] suggested a method of
testing the significance of a mediation effect using the estimator γ̂2 β̂2 and the delta
method.
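
For reference, the Sobel statistic and the joint significance test can be sketched as follows. The sketch assumes that the two path estimators are independent with known standard errors, which is a simplification, and the function names are hypothetical. The short simulation draws both estimates at the point γ2 = β2 = 0, where both tests reject far less often than the nominal 5% level.

```python
import numpy as np

def sobel_z(a, se_a, b, se_b):
    """Sobel z-statistic for the product a*b (first-order delta method)."""
    return a * b / np.sqrt(a**2 * se_b**2 + b**2 * se_a**2)

def joint_significance_reject(z_a, z_b, crit=1.96):
    """Reject only if both path coefficients are individually significant."""
    return (np.abs(z_a) > crit) & (np.abs(z_b) > crit)

# Both true path coefficients are zero; each estimate is N(0, 1) with se = 1.
rng = np.random.default_rng(0)
reps = 200_000
a_hat = rng.normal(size=reps)
b_hat = rng.normal(size=reps)

sobel_rate = np.mean(np.abs(sobel_z(a_hat, 1.0, b_hat, 1.0)) > 1.96)
joint_rate = np.mean(joint_significance_reject(a_hat, b_hat))
```

Under independence the joint significance test rejects with probability 0.05 × 0.05 at this point of the null, and the Sobel test can reject only when the joint test does, since |ab| / sqrt(a² + b²) never exceeds min(|a|, |b|).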
The parametric IV model is one possible extension that incorporates treatment
and mediator endogeneity (Huber [2020]). The following system of linear
equations has two instruments (Z2 and Z3 ) that provide exclusion restrictions:

Y1 = β1 X1 + β2 Y2 + β3 Y3 + ϵ1 , (15)
Y2 = γ1 X1 + γ2 Z2 + γ3 Y3 + ϵ2 , (16)
Y3 = δ1 X1 + δ2 Z3 + ϵ3 . (17)

The system of equations (1) − (3) nests the above linear model. In this model, the
direct effect of Y3 on Y1 is β3 , and the product γ3 β2 is the indirect effect of Y3 on Y1 .
Similar to the typical 2SLS with two equations for the treatment and the outcome,
one can identify γ3 , β2 , and β3 by replacing Y3 in (15) and (16) with Ŷ3 ≡ E[Y3 |X1 , Z3 ]
and Y2 in (15) with Ŷ2 ≡ E[Y2 |X1 , Z2 , Ŷ3 ] in the presence of treatment and mediator
endogeneity if Z2 and Z3 are independent of or uncorrelated with (ϵ1 , ϵ2 , ϵ3 ).
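
A minimal sketch of this idea is given below. It uses the textbook projection form of 2SLS, in which the endogenous regressors are projected on all exogenous variables (1, X1, Z2, Z3); this is a slight variant of the sequential plug-in described above, not a reproduction of it. The coefficient values, the error correlation, and the helper name `tsls` are assumptions for illustration.

```python
import numpy as np

def tsls(y, regressors, instruments):
    """2SLS: project the regressors on the instruments, then run OLS."""
    fitted, *_ = np.linalg.lstsq(instruments, regressors, rcond=None)
    x_hat = instruments @ fitted
    coef, *_ = np.linalg.lstsq(x_hat, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 20_000
x1, z2, z3 = rng.normal(size=(3, n))
cov = np.full((3, 3), 0.5)          # correlated errors: Y2 and Y3 are endogenous
np.fill_diagonal(cov, 1.0)
e1, e2, e3 = rng.multivariate_normal(np.zeros(3), cov, size=n).T

beta2, beta3, gamma3 = 0.6, 0.3, 0.8
y3 = 0.5 * x1 + 1.0 * z3 + e3                  # eq. (17)
y2 = 0.4 * x1 + 1.0 * z2 + gamma3 * y3 + e2    # eq. (16)
y1 = 1.0 * x1 + beta2 * y2 + beta3 * y3 + e1   # eq. (15)

const = np.ones(n)
W = np.column_stack([const, x1, z2, z3])       # all exogenous variables
b15 = tsls(y1, np.column_stack([const, x1, y2, y3]), W)
b16 = tsls(y2, np.column_stack([const, x1, z2, y3]), W)
beta2_hat, beta3_hat, gamma3_hat = b15[2], b15[3], b16[3]
indirect_hat = gamma3_hat * beta2_hat
```

In this design Z2 is excluded from (15) and Z3 from both (15) and (16), so each equation is exactly identified.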
γ3 and β2 play analogous roles to the parameters τ32 and τ21 described above.
Figure 1 is the causal directed acyclic graph (DAG) for the parametric IV model, (15)
- (17), and the generalized regression model, (1) - (3). These two models share the
same causal relationships between variables. Also, the parameters associated with
the indirect effect (γ3 and β2 , and τ32 and τ21 ) lie on the same paths.

3 Estimation and Testing for Indirect Effects


We need multiple steps to construct a test statistic that captures the effect of Y2 on Y1 .
In the first stage, we estimate the linear-index parameter in the Y3 equation. In the
second stage, we use a conditional maximum rank correlation (MRC) estimator to
estimate the linear-index parameters in the Y2 and Y1 equations. In the third stage, two
conditional Kendall’s tau statistics τ̂32 and τ̂21 are computed to determine the effect
of Y3 on Y2 and of Y2 on Y1 , respectively. These estimation steps can be summarized as follows:

1. Estimate δ0 in Y3 .

2.(a) Estimate γ0 in Y2 given δ̂.

2.(b) Estimate β0 in Y1 given (δ̂, γ̂).

3. Compute τ̂32 given (δ̂, γ̂), and τ̂21 given (δ̂, γ̂, β̂).

The first stage: Estimation of δ0

To simplify our analysis, we assume Y3 = 1{X3′ δ0 + ϵ3 > 0}, which satisfies the key
model assumptions 6 . Note that we need to normalize δ0 since we can only identify
δ0 up to scale. We can use a semiparametric (or parametric) √n-consistent estimator
to estimate the linear index parameter δ0 (e.g., the MRC estimator of Han [1987], the
monotone rank estimator of Cavanagh and Sherman [1998], the semiparametric in-
dex estimator of Powell et al. [1989], or the estimator of Klein and Spady [1993] for
binary response models). In this paper, we focus on the use of rank-based estimators
(specifically, the MRC estimator), which are not influenced by bandwidth selection.
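
A minimal sketch of a Han [1987]-type MRC estimator for the first stage is shown below. It assumes two regressors with the first coefficient normalized to 1, so that a single parameter remains, and it maximizes the rank-correlation objective by grid search because the objective is a step function. The probit-type design, the grid, and the function names are illustrative assumptions.

```python
import numpy as np

def mrc_objective(theta, y, xa, xb):
    """Share of ordered pairs (i, j) with y_i > y_j and index_i > index_j."""
    index = xa + theta * xb                      # first coefficient fixed at 1
    y_gt = y[:, None] > y[None, :]
    idx_gt = index[:, None] > index[None, :]
    n = len(y)
    return np.sum(y_gt & idx_gt) / (n * (n - 1))

def mrc_grid(y, xa, xb, grid):
    """Derivative-free maximization of the MRC objective over a grid."""
    values = [mrc_objective(t, y, xa, xb) for t in grid]
    return grid[int(np.argmax(values))]

rng = np.random.default_rng(0)
n = 1000
xa, xb = rng.normal(size=(2, n))
theta0 = 1.0
y3 = (xa + theta0 * xb + rng.normal(size=n) > 0).astype(float)

grid = np.linspace(-3.0, 3.0, 121)
theta_hat = mrc_grid(y3, xa, xb, grid)
```

No bandwidth enters this step, which is the property of rank-based estimators noted above.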

The second stage: Estimation of γ0 and β0

We need to normalize γ0 in Y2 and β0 in Y1 since they can only be identified up
to scale. For example, we can normalize their first components to be 1 or -1.
Alternatively, we may consider the normalizations γ0 / ∥γ0 ∥ and β0 / ∥β0 ∥ where β0 ≠ 0
and γ0 ≠ 0. Following Sherman [1993] and Abrevaya et al. [2010], we normalize the
first components of γ0 and β0 to be 1. Note that we can estimate γ0 following the
steps outlined in Abrevaya et al. [2010]. Once we have obtained the estimate γ̂, we
can proceed to the estimation of β0 = (1, θ0 ).
6
For more general cases, the MRC estimator in Han [1987] or other semiparametric methods can
be used.

Because (ϵ1 , ϵ2 , ϵ3 ) is independent of (X1 , Z2 , Z3 ), we get the following conditional
distributions of ϵ1 :

\[
\Pr(\epsilon_1 \le e \mid Y_2, Y_3, X_2, X_3) =
\begin{cases}
\Pr\big(\epsilon_1 \le e \mid \epsilon_3 \le -X_3'\delta_0,\; Y_2 = D_2(F_2(X_2'\gamma_0, 0, \epsilon_2))\big) & \text{if } Y_3 = 0, \\
\Pr\big(\epsilon_1 \le e \mid \epsilon_3 > -X_3'\delta_0,\; Y_2 = D_2(F_2(X_2'\gamma_0, 1, \epsilon_2))\big) & \text{if } Y_3 = 1.
\end{cases}
\]

Hence, it follows that, for two observations indexed by i ̸= j, ϵ1i |Y2i , Y3i , X2i , X3i and
ϵ1j |Y2j , Y3j , X2j , X3j have the same distribution if

\[
Y_{3i} = Y_{3j}, \quad X_{3i}'\delta_0 = X_{3j}'\delta_0, \quad Y_{2i} = Y_{2j}, \quad X_{2i}'\gamma_0 = X_{2j}'\gamma_0.
\]

We consider pairs of observations that satisfy these four matching conditions to es-
timate β0 . Then, by the two properties (4) and (5) implied by Assumptions F and D,
it holds that

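\[
\begin{aligned}
X_{1i}'\beta_0 \ge X_{1j}'\beta_0 \iff{}
& \Pr(Y_{1i} > Y_{1j} \mid X_{1i}, X_{1j}, Y_{3i} = Y_{3j}, Y_{2i} = Y_{2j}, X_{3i}'\delta_0 = X_{3j}'\delta_0, X_{2i}'\gamma_0 = X_{2j}'\gamma_0) \\
& \ge \Pr(Y_{1i} < Y_{1j} \mid X_{1i}, X_{1j}, Y_{3i} = Y_{3j}, Y_{2i} = Y_{2j}, X_{3i}'\delta_0 = X_{3j}'\delta_0, X_{2i}'\gamma_0 = X_{2j}'\gamma_0).
\end{aligned}
\tag{18}
\]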

Equation (18) says that, conditional on the matching conditions, it is more likely that
Y1i > Y1j than Y1i < Y1j when X1i′ β0 ≥ X1j′ β0 . Based on (18), we can
construct a population version of the objective function G(θ) that is maximized at the
parameter vector θ0 . The conditional rank correlation G(θ) between Y1 and X1′ β, where
β = (1, θ), is

\[
\begin{aligned}
G(\theta) = E\big[\, & 1\{Y_{1i} > Y_{1j}\}\, 1\{X_{1i}'\beta > X_{1j}'\beta\} + 1\{Y_{1i} < Y_{1j}\}\, 1\{X_{1i}'\beta < X_{1j}'\beta\} \\
& \mid Y_{3i} = Y_{3j},\; Y_{2i} = Y_{2j},\; X_{3i}'\delta_0 = X_{3j}'\delta_0,\; X_{2i}'\gamma_0 = X_{2j}'\gamma_0 \,\big].
\end{aligned}
\tag{19}
\]

One may construct a sample objective function based on (19) using the analogy prin-
ciple:

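\[
\begin{aligned}
\frac{1}{n(n-1)} \sum_{i \ne j} \; & 1\{Y_{1i} > Y_{1j}\}\, 1\{X_{1i}'\theta > X_{1j}'\theta\} \\
& \times 1\{Y_{3i} = Y_{3j}\}\, 1\{Y_{2i} = Y_{2j}\}\, 1\{X_{3i}'\delta_0 = X_{3j}'\delta_0\}\, 1\{X_{2i}'\gamma_0 = X_{2j}'\gamma_0\}.
\end{aligned}
\]
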
However, we face two technical challenges. First, the true parameters in the linear
indices X3′ δ0 and X2′ γ0 are unknown. Second, the events Y2i = Y2j , X3i′ δ0 = X3j′ δ0 ,
and X2i′ γ0 = X2j′ γ0 may have measure zero if the corresponding regressors have
continuous components. In this case, the value of the sample objective function is zero.
To address these issues we use estimates of δ0 and γ0 , and kernel weights for the
matching conditions. We introduce the conditional MRC estimator for θ0 , which is
obtained by maximizing the following objective function Ĝn (θ):

\[
\hat\theta \equiv \arg\max_{\theta \in \Theta} \hat G_n(\theta), \quad \text{where}
\]
\[
\begin{aligned}
\hat G_n(\theta) = \frac{1}{n(n-1)} \sum_{i \ne j} \; & 1\{Y_{1i} > Y_{1j}\}\, 1\{X_{1i}'\theta > X_{1j}'\theta\} \\
& \times 1\{Y_{3i} = Y_{3j}\}\, k_h(Y_{2i} - Y_{2j})\, k_h(X_{3i}'\hat\delta - X_{3j}'\hat\delta)\, k_h(X_{2i}'\hat\gamma - X_{2j}'\hat\gamma),
\end{aligned}
\tag{20}
\]

where $k_h(u) \equiv h_n^{-1} K(u/h_n)$ for a kernel function K(·) and a bandwidth $h_n$. The
kernel weighting places more weight on pairs of observations where Y2i ≃ Y2j ,
X3i′ δ̂ ≃ X3j′ δ̂, and X2i′ γ̂ ≃ X2j′ γ̂. Note that if Y3 is not binary, the indicator function
that is used to match Y3i with Y3j would also be replaced with a kernel function.
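
The sample objective (20) can be sketched in a few lines of NumPy. The Gaussian kernel, the single common bandwidth `h`, and the function names are assumptions for illustration; the paper's regularity conditions on K(·) and h_n are stated in its Appendix and may call for different choices. The first column of `x1` carries the coefficient normalized to 1, so the index is built from β = (1, θ).

```python
import numpy as np

def gaussian_kh(u, h):
    """k_h(u) = K(u / h) / h with a standard normal K (illustrative choice)."""
    return np.exp(-0.5 * (u / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)

def pairwise_diff(v):
    """Matrix of differences v_i - v_j."""
    return v[:, None] - v[None, :]

def cond_mrc_objective(theta, y1, x1, y2, y3, idx2_hat, idx3_hat, h):
    """Kernel-weighted rank objective in the spirit of (20).

    idx2_hat and idx3_hat are the estimated indices X2'gamma_hat and
    X3'delta_hat from the earlier stages.
    """
    index1 = x1[:, 0] + x1[:, 1:] @ np.atleast_1d(theta)
    concordant = (pairwise_diff(y1) > 0) & (pairwise_diff(index1) > 0)
    weights = (
        (pairwise_diff(y3) == 0)                      # exact match on binary Y3
        * gaussian_kh(pairwise_diff(y2), h)
        * gaussian_kh(pairwise_diff(idx3_hat), h)
        * gaussian_kh(pairwise_diff(idx2_hat), h)
    )
    n = len(y1)
    # Pairs with i = j contribute nothing because concordant is False there.
    return np.sum(concordant * weights) / (n * (n - 1))
```

Because the objective is a step function of θ, it would be passed to a derivative-free optimizer or evaluated on a grid rather than to a gradient-based routine.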
The objective function (20) is a second-order U-process and a discontinuous function
of θ 7 . To prove the √n-consistency and asymptotic normality of θ̂ for this type
of objective function, we follow similar arguments used in Han [1987], Pakes and
Pollard [1989], Sherman [1993], Sherman [1994a], Sherman [1994b], Abrevaya [1999],
and Khan et al. [2021]. Theorem 1 establishes the asymptotic properties of the esti-
mator of θ0 under appropriate assumptions and regularity conditions introduced in
the Appendix. The asymptotic variance of θ̂ is affected by the estimators of δ0 and γ0
from the first and second stages.
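The kernel-weighted rank objective and its derivative-free maximization can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes a Gaussian kernel, an $X_1$ with exactly two regressors whose last coefficient is normalized to one, and a plain grid search; the function names (`gaussian_kernel`, `mrc_objective`, `grid_search`) are hypothetical.

```python
import math


def gaussian_kernel(u, h):
    """k_h(u) = K(u / h) / h with a standard normal K (an illustrative choice)."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def mrc_objective(theta, y1, x1, y2, y3, zeta_hat, xi_hat, h):
    """Kernel-weighted rank objective in the spirit of (20).

    Assumptions of this sketch: each row of x1 has two regressors and the
    index is x1[0] * theta + x1[1] (last coefficient normalized to 1);
    zeta_hat and xi_hat are first-stage index estimates X3'delta_hat and
    X2'gamma_hat supplied by the caller.
    """
    n = len(y1)
    index = [row[0] * theta + row[1] for row in x1]
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Exact matching on the discrete Y3; skip the diagonal.
            if i == j or y3[i] != y3[j]:
                continue
            if y1[i] > y1[j] and index[i] > index[j]:
                total += (gaussian_kernel(y2[i] - y2[j], h)
                          * gaussian_kernel(zeta_hat[i] - zeta_hat[j], h)
                          * gaussian_kernel(xi_hat[i] - xi_hat[j], h))
    return total / (n * (n - 1))


def grid_search(objective, grid):
    """Derivative-free maximization: evaluate the step function on a grid."""
    return max(grid, key=objective)
```

Because the objective is a step function of `theta`, several grid points can attain the same maximum; a grid search simply returns the first of them.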

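Theorem 1. If Assumptions 1-9 hold, then (i) $\hat\theta \xrightarrow{p} \theta_0$ and (ii) $\hat\theta$ is asymptotically normal, with $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, V^{-1}\Lambda V^{-1})$, where $V = 2^{-1} E[\nabla_2 \tau_k(\theta_0)]$, $\Lambda = E[\Delta_i \Delta_i']$, and

$$
\begin{aligned}
\Delta_i = \nabla_1 \tau_k(\theta_0)
&+ \big( E[\nabla_\theta \mu_{2,\xi_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)] - E[\nabla_\theta \mu_{2,\xi_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)] \big)\, \psi_{\gamma k} \\
&+ \big( E[\nabla_\theta \mu_{3,\zeta_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)] - E[\nabla_\theta \mu_{3,\zeta_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)] \big)\, \psi_{\delta k}.
\end{aligned}
$$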




Proof. See the Appendix 8 .


7
In practice, one needs to use derivative-free optimization techniques (e.g. grid search, the Nelder-
Mead method, simulated annealing, etc.) to estimate θ0 .
8
The functions used in this theorem are formally introduced in the Appendix.

Because directly using the terms in Theorem 1 to estimate the standard error of $\hat\theta$ is challenging, we recommend using the bootstrap to estimate $se(\hat\theta)$.

The third stage: Testing for an Indirect Effect

Substituting the Y2 equation in (2) into the Y1 equation in (1) yields

$$
Y_1 = D_1(F_1(X_1'\beta_0,\ D_2(F_2(X_2'\gamma_0, Y_3, \epsilon_2)),\ Y_3,\ \epsilon_1)).
$$

$Y_2$ is weakly monotone in $X_2'\gamma_0$ and $Y_1$ is weakly monotone in $Y_2$ by assumption. Therefore, for fixed values of $X_1'\beta_0$ and $Y_3$, the sign of the rank correlation between $Y_1$ and $X_2'\gamma_0$ is determined by the effect of $Y_2$ on $Y_1$, and vice versa. We employ the following two matching conditions:

$$
X_{1i}'\beta_0 = X_{1j}'\beta_0, \qquad Y_{3i} = Y_{3j}.
$$

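If there exists a positive effect of $Y_2$ on $Y_1$ given $X_1'\beta_0$ and $Y_3$, then the rank correlation between $Y_1$ and $X_2'\gamma_0$ is more likely to be positive:

$$
\begin{aligned}
X_{2i}'\gamma_0 \geq X_{2j}'\gamma_0 \iff{} & \Pr(Y_{1i} > Y_{1j} \mid X_{2i}, X_{2j}, X_{1i}'\beta_0 = X_{1j}'\beta_0, Y_{3i} = Y_{3j}) \\
& \geq \Pr(Y_{1i} < Y_{1j} \mid X_{2i}, X_{2j}, X_{1i}'\beta_0 = X_{1j}'\beta_0, Y_{3i} = Y_{3j}).
\end{aligned}
$$

On the contrary, if there is a negative effect of $Y_2$ on $Y_1$, then

$$
\begin{aligned}
X_{2i}'\gamma_0 \geq X_{2j}'\gamma_0 \iff{} & \Pr(Y_{1i} > Y_{1j} \mid X_{2i}, X_{2j}, X_{1i}'\beta_0 = X_{1j}'\beta_0, Y_{3i} = Y_{3j}) \\
& \leq \Pr(Y_{1i} < Y_{1j} \mid X_{2i}, X_{2j}, X_{1i}'\beta_0 = X_{1j}'\beta_0, Y_{3i} = Y_{3j}).
\end{aligned}
$$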

The event $X_{1i}'\beta_0 = X_{1j}'\beta_0$ can have measure zero when $X_1$ has a continuous component. Also, we do not know $\beta_0$ and $\gamma_0$, so we construct a kernel-weighted version of Kendall's tau statistic based upon estimates $\hat\beta$ and $\hat\gamma$:

$$
\hat\tau_{21} \equiv \frac{\sum_{i \neq j} \tilde\omega_{ij}\, \mathrm{sgn}(Y_{1i} - Y_{1j})\, \mathrm{sgn}(X_{2i}'\hat\gamma - X_{2j}'\hat\gamma)}{\sum_{i \neq j} \tilde\omega_{ij}},
$$

where $\tilde\omega_{ij} \equiv k_h(X_{1i}'\hat\beta - X_{1j}'\hat\beta) \cdot 1\{Y_{3i} = Y_{3j}\}$. Kernel weighting is used for continuous random variables, while the indicator function is used for discrete random variables.
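The statistic $\hat\tau_{21}$ can be sketched in a few lines. The following is an illustrative sketch only: it assumes a Gaussian kernel and takes the estimated indices $X_1'\hat\beta$ and $X_2'\hat\gamma$ as given inputs; the names `weighted_kendall_tau`, `chi_hat`, and `xi_hat` are hypothetical.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def gaussian_kernel(u, h):
    """k_h(u) = K(u / h) / h with a standard normal K (an illustrative choice)."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def weighted_kendall_tau(y1, xi_hat, chi_hat, y3, h):
    """Kernel-weighted Kendall's tau between y1 and the index xi_hat.

    Pairs are weighted by k_h(chi_hat_i - chi_hat_j) * 1{y3_i == y3_j},
    where chi_hat stands for X1'beta_hat and xi_hat for X2'gamma_hat.
    """
    n = len(y1)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            if i == j or y3[i] != y3[j]:
                continue  # indicator weight is zero for unmatched Y3
            w = gaussian_kernel(chi_hat[i] - chi_hat[j], h)
            numerator += w * sign(y1[i] - y1[j]) * sign(xi_hat[i] - xi_hat[j])
            denominator += w
    # With no matched pairs the statistic is undefined.
    return numerator / denominator if denominator > 0 else float("nan")
```

When $X_1$ is empty, passing a constant `chi_hat` makes all kernel weights equal, so the weight reduces to the indicator on $Y_3$ alone.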

Note that if X1 has a single element, there is no need to estimate the parameter
β0 . Then, the weight ω̃ij simplifies to either kh (X1i − X1j ) · 1{Y3i = Y3j } if X1i is
continuous or 1{X1i = X1j } · 1{Y3i = Y3j } if X1i is discrete. If X1 has no elements, it
follows that ω̃ij = 1{Y3i = Y3j }.

Given $\sqrt{n}$-consistent and asymptotically normal estimators $\hat\delta$, $\hat\beta$, and $\hat\gamma$, it can be shown that as $n \to \infty$ and the kernel bandwidth $h_n$ goes to zero, $\hat\tau_{21} \xrightarrow{p} \tau_{21}$, where

$$
\tau_{21} = E[\mathrm{sgn}(Y_{1i} - Y_{1j})\, \mathrm{sgn}(X_{2i}'\gamma_0 - X_{2j}'\gamma_0) \mid X_{1i}'\beta_0 = X_{1j}'\beta_0,\ Y_{3i} = Y_{3j}].
\tag{21}
$$

Equation (21) identifies the existence of the effect of $Y_2$ on $Y_1$ given the two matching conditions. In this sense, $\tau_{21}$ plays a similar role to the parameter $\beta_2$ of (15) in the parametric IV model.
Theorem 2 shows that $\hat\tau_{21}$ is $\sqrt{n}$-consistent and asymptotically normal.

Theorem 2. If Assumptions 1 - 9 hold, then τ̂21 has the following asymptotically linear
representation:

τ̂21 − τ21
n
X n
Pr[Y3i = Y3j ] 1 X
= U(Y1i , ξi ) + U(Y1j , ξj )
E[f (χi )] n i=1 j=1
n
E[D1 (Y3i = Y3j , χj , χj )f (χj ) + D(Y3i = Y3j , χj , χj )f ′ (χj )] ψβi
X

i=1
n i
E[T (Y3i = Y3j , χj , χj )] ψγi
X
+2 + op (n−1/2 ),
i=1

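where $(\chi_i, \xi_i, \zeta_i) \equiv (X_{1i}'\beta_0, X_{2i}'\gamma_0, X_{3i}'\delta_0)$ with

$$
\begin{aligned}
U(Y_{1i}, \xi_i) &\equiv E\big[\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, f(\chi_i) \mid Y_{1i}, \xi_i \big], \\
D(Y_{3i} = Y_{3j}, \chi_i, \chi_j) &\equiv E\big[\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\zeta_{ij}) - \tau_{21}]\, X_{1ij} \mid Y_{3i} = Y_{3j}, \chi_i, \chi_j \big], \\
T(Y_{3i} = Y_{3j}, \chi_i, \chi_j) &\equiv E\big[\, \mathrm{sgn}(Y_{1ij})\, f_{\xi \mid \tilde X_2}(\zeta_{ij})\, \tilde X_{2ij} \mid Y_{3i} = Y_{3j}, \chi_i, \chi_j \big].
\end{aligned}
$$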

Proof. See the Appendix.


As in the case of Theorem 1, we recommend using the bootstrap to estimate the standard error of $\hat\tau_{21}$.
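A bootstrap standard error of this kind can be sketched as below. The paper does not spell out the resampling scheme, so the sketch assumes a nonparametric bootstrap that resamples whole observations with replacement; `bootstrap_se` and its arguments are hypothetical names, and `statistic` is any function mapping a list of observations to a number (for example, a Kendall's tau estimate). The default of 200 replications mirrors the number reported in the table notes.

```python
import random
import statistics


def bootstrap_se(statistic, data, n_boot=200, seed=0):
    """Standard deviation of `statistic` over nonparametric bootstrap samples.

    Assumption of this sketch: observations are i.i.d., so each bootstrap
    sample draws len(data) observations with replacement.
    """
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(statistic(sample))
    return statistics.stdev(draws)


def z_statistic(estimate, standard_error):
    """z-statistic of the form tau_hat / se(tau_hat)."""
    return estimate / standard_error
```

Whether the first-stage index estimates should be recomputed inside each bootstrap draw is a design choice this sketch leaves to the caller's `statistic` function.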
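Abrevaya et al. [2010] showed that it is possible to construct a kernel-weighted version of Kendall's tau statistic, $\hat\tau_{32}$, to test the effect of $Y_3$ on $Y_2$:

$$
\hat\tau_{32} \equiv \frac{\sum_{i \neq j} \hat\omega_{ij}\, \mathrm{sgn}(Y_{2i} - Y_{2j})\, \mathrm{sgn}(X_{3i}'\hat\delta - X_{3j}'\hat\delta)}{\sum_{i \neq j} \hat\omega_{ij}},
$$

where $\hat\omega_{ij} \equiv k_h(X_{2i}'\hat\gamma - X_{2j}'\hat\gamma)$. As $n \to \infty$ and the bandwidth $h_n$ goes to zero, we can show that $\hat\tau_{32} \xrightarrow{p} \tau_{32}$, where

$$
\tau_{32} = E[\mathrm{sgn}(Y_{2i} - Y_{2j})\, \mathrm{sgn}(X_{3i}'\delta_0 - X_{3j}'\delta_0) \mid X_{2i}'\gamma_0 = X_{2j}'\gamma_0].
\tag{22}
$$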

Equation (22) identifies the existence of the effect of $Y_3$ on $Y_2$. Abrevaya et al. [2010] also established the $\sqrt{n}$-consistency and asymptotic normality of $\hat\tau_{32}$.

In the appendix, we introduce Assumption 5 to achieve point identification of $\beta_0$. This assumption requires the existence of at least one continuous regressor in $X_1$ (cf. Manski [1985]). We then obtain the following corollary, which is required for simultaneous inference in Section 3.1.

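Corollary 1. The z-statistics for $\tau_{32}$ and $\tau_{21}$, $T_{32} \equiv \hat\tau_{32}/se(\hat\tau_{32})$ and $T_{21} \equiv \hat\tau_{21}/se(\hat\tau_{21})$, are asymptotically independent and normally distributed:

$$
T \xrightarrow{d} N(\mu, I_2), \quad \text{where } T = (T_{32}, T_{21})' \text{ and } \mu = \left( \frac{\tau_{32}}{\sigma(\tau_{32})},\ \frac{\tau_{21}}{\sigma(\tau_{21})} \right)'.
$$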

Proof. See the Appendix.

Using the asymptotic properties of τ̂32 and τ̂21 , we can test (10) to determine whether
an indirect effect exists.

Remark. This paper focuses more on simultaneous inference using τ̂32 and τ̂21 to test
the existence of indirect effects. However, we can also construct τ̂31 that captures the
direct effect of Y3 on Y1 . Using τ̂31 , we can test the existence of a direct effect in the
model based on its asymptotic normality. After substituting the equation of Y3 in (3)
into Y1 in (1), we see that

$$
Y_1 = D_1(F_1(X_1'\beta_0,\ Y_2,\ D_3(F_3(X_3'\delta_0, \epsilon_3)),\ \epsilon_1)).
$$

For fixed values of $X_1'\beta_0$ and $Y_2$, the effect of $Y_3$ on $Y_1$ determines the sign of the rank correlation between $Y_1$ and $X_3'\delta_0$, and vice versa. Therefore, we obtain the following matching conditions:

$$
X_{1i}'\beta_0 = X_{1j}'\beta_0, \qquad Y_{2i} = Y_{2j}.
$$

Based on these matching conditions, we can construct the Kendall's tau statistic $\hat\tau_{31}$ given $(\hat\delta, \hat\beta)$:

$$
\hat\tau_{31} \equiv \frac{\sum_{i \neq j} \omega_{ij}^{*}\, \mathrm{sgn}(Y_{1i} - Y_{1j})\, \mathrm{sgn}(X_{3i}'\hat\delta - X_{3j}'\hat\delta)}{\sum_{i \neq j} \omega_{ij}^{*}},
$$

where $\omega_{ij}^{*} \equiv k_h(X_{1i}'\hat\beta - X_{1j}'\hat\beta) \cdot k_h(Y_{2i} - Y_{2j})$. By taking steps similar to those used for the asymptotic properties of $\hat\tau_{32}$ in the appendix, we can show that $\hat\tau_{31}$ is $\sqrt{n}$-consistent and asymptotically normal.

The advantage of the rank-based approach in this section is that it requires fewer choices of bandwidth parameters for estimation compared to other semiparametric methods. However, the objective function for this type of rank estimation involves a double summation over the $n(n-1)$ observation pairs. It therefore requires $O(n^2)$ calculations, which slows down estimation. To enhance computational efficiency, previous studies such as Cavanagh and Sherman [1998] and Abrevaya [1999] proposed methods to reduce the number of operations to $O(n \log n)$. Additionally, recent improvements in CPU performance and computational techniques, including parallel computing, have made the rank-based approach more feasible.⁹
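To illustrate how sorting can replace explicit pair enumeration, the sketch below computes an unweighted Kendall's tau in $O(n \log n)$ by counting inversions with a merge sort and compares it with the $O(n^2)$ double loop. It is not the algorithm of Cavanagh and Sherman [1998] or Abrevaya [1999], and it assumes no ties and no kernel weights; it only shows the general idea behind such speed-ups. All function names are hypothetical.

```python
def count_inversions(values):
    """Return (sorted copy, number of pairs i < j with values[i] > values[j])."""
    n = len(values)
    if n <= 1:
        return list(values), 0
    mid = n // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau_fast(x, y):
    """O(n log n) Kendall's tau, assuming no ties in x or y."""
    n = len(x)
    order = sorted(range(n), key=lambda k: x[k])
    _, discordant = count_inversions([y[k] for k in order])
    pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * discordant / pairs


def kendall_tau_naive(x, y):
    """O(n^2) reference implementation over all unordered pairs."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += ((x[i] > x[j]) - (x[i] < x[j])) * ((y[i] > y[j]) - (y[i] < y[j]))
    return total / (n * (n - 1) / 2)
```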

3.1 Simultaneous Inference: A Nearly Similar Powerful Test


We leverage the testing method of van Garderen and van Giersbergen [2020]. This method is called a 'nearly similar powerful test', and we conduct simultaneous inference using $\hat\tau_{32}$ and $\hat\tau_{21}$ to test (10). They developed this method to address the undersized test issue of classic test methods that will be introduced in Section 3.2. In particular, their method was originally developed for the linear causal mediation model without endogenous treatment/mediator.
Instead of focusing on a specific class of test statistics and their asymptotic distributions, van Garderen and van Giersbergen [2020] deal with the critical region for the null hypothesis. As noted in their work, this approach requires only the asymptotic normality and mutual independence of two t-statistics (in our case, the two z-statistics $T_{32} = \hat\tau_{32}/se(\hat\tau_{32})$ and $T_{21} = \hat\tau_{21}/se(\hat\tau_{21})$). Because $T_{32}$ and $T_{21}$ are asymptotically normal and independent by Corollary 1, we can apply their method in the presence of treatment/mediator endogeneity.

⁹ For instance, the author used the Python multiprocessing module to take advantage of multiple CPU cores in the Monte Carlo simulations in Section 4.

To test the null hypothesis, we define two statistics using the two z-statistics $T_{32}$ and $T_{21}$:

$$
|T|_{(1)} \equiv \min\{|T_{32}|, |T_{21}|\}, \qquad |T|_{(2)} \equiv \max\{|T_{32}|, |T_{21}|\}.
$$

Then we consider critical regions that are bounded by a certain function $g$ and reject $H_0: \tau_{32}\tau_{21} = 0$ if $|T|_{(1)} > g(|T|_{(2)})$. Specifically, we consider $g$ in $D(\mathbb{R}_0^{+}, \mathbb{R}_0^{+})$, which is the set of weakly increasing càdlàg functions (e.g., step functions).

Definition 1. A function $g \in D(\mathbb{R}_0^{+}, \mathbb{R}_0^{+})$ is called the boundary function of the critical region, where

Critical Region: $CR_g = \{(T_{32}, T_{21}) \in \mathbb{R}^2 \mid |T|_{(1)} > g(|T|_{(2)})\}$,
Acceptance Region: $\overline{CR}_g = \{(T_{32}, T_{21}) \in \mathbb{R}^2 \mid |T|_{(1)} \leq g(|T|_{(2)})\}$.

Following van Garderen and van Giersbergen [2020], we can show the existence and
uniqueness of a function g that is said to be a similar boundary function. We want to
construct a test that does not depend on the value of the parameter under the null
hypothesis H0 : τ32 τ21 = 0.

Definition 2. $g(\cdot)$ is a similar boundary function if the probability of the critical region $CR_g$ defined by $g$ is constant under $H_0: \tau_{32}\tau_{21} = 0$,

$$
\Pr[(T_{32}, T_{21}) \in CR_g \mid \forall (\tau_{32}, \tau_{21}) \in \mathbb{R}^2 \text{ with } \tau_{32}\tau_{21} = 0] = \text{constant}.
$$

Proposition 1. (Theorem 2 in van Garderen and van Giersbergen [2020]) A similar boundary function $g(\cdot)$ exists for testing $H_0: \tau_{32}\tau_{21} = 0$ if and only if $1/\alpha$ is an integer¹⁰ (or trivially $\alpha = 0$). If it exists, the boundary is unique in $D(\mathbb{R}_0^{+}, \mathbb{R}_0^{+})$.

Given that we have such g-function, we can reject H0 if

|T |(1) > g(|T |(2) ).

This test method is called an exact similar test. However, it turns out that the function $g$ has some undesirable statistical properties. To tackle this issue, we vary the $g$-function systematically by minimizing a certain criterion function $Q(g)$ to derive a test called a nearly similar test. The $g$-function that minimizes $Q(g)$ gives a size of the test that is sufficiently close to the desired significance level $\alpha$. Furthermore, this function eliminates undesirable critical regions, such as cases where both $T_{32}$ and $T_{21}$ are smaller than 0.1, which can result in inappropriate and counter-intuitive statistical inferences.¹¹
Therefore, we can obtain valid critical values and size of the test for $H_0: \tau_{32}\tau_{21} = 0$ using the method suggested by van Garderen and van Giersbergen [2020]. In the Monte Carlo simulations of Section 4, we use this method to obtain critical values for a given significance level $\alpha$.

¹⁰ Specifically, a similar boundary function $g$ exists and is unique if, for a given significance level $\alpha$, $1/\alpha$ is an integer. Therefore, we can find $g$ for common significance levels (e.g., 1%, 5%, and 10%).
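The rejection rule $|T|_{(1)} > g(|T|_{(2)})$ can be sketched with a pluggable boundary function. The nearly similar boundary of van Garderen and van Giersbergen [2020] is not reproduced here; as a stand-in, the sketch uses a constant boundary equal to the two-sided normal critical value, which corresponds to rejecting only when both z-statistics are individually significant. The names `reject_h0` and `constant_boundary` are hypothetical.

```python
from statistics import NormalDist


def reject_h0(t32, t21, g):
    """Reject H0: tau32 * tau21 = 0 when min|T| exceeds g(max|T|)."""
    t_min = min(abs(t32), abs(t21))
    t_max = max(abs(t32), abs(t21))
    return t_min > g(t_max)


def constant_boundary(alpha=0.05):
    """Constant boundary g(t) = z_{1 - alpha/2}.

    This is an illustrative stand-in, not the nearly similar boundary;
    it makes the rule reject only if both |T32| and |T21| exceed the
    usual two-sided critical value.
    """
    critical_value = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return lambda t_max: critical_value
```

Any weakly increasing step function could be passed as `g` in the same way, so a tabulated boundary would drop in without changing `reject_h0`.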

3.2 Theoretical Invalidity of Classic Testing Methods


In this subsection, we review three commonly used testing methods for causal mediation effects and discuss why it is invalid to apply these methods to our new estimator. When the true parameter values $(\tau_{32}, \tau_{21})$ are close to or equal to $(0, 0)$, these tests are severely undersized. Note that the same problem has been reported when these methods are applied to the linear model case (see MacKinnon et al. [2002] and Biesanz et al. [2010]).

Bootstrapping

One may consider using the asymptotic distribution of τ̂32 τ̂21 to test the null hy-
pothesis H0 : τ32 τ21 = 0. This approach could be to directly estimate τ̂32 τ̂21 and
its standard error se(τ̂32 τ̂21 ), and then use these estimates to construct a t-statistic.
However, while the asymptotic normality of τ̂32 and τ̂21 is established, the behavior
of their product τ̂32 τ̂21 is complex. We can show that the product estimator τ̂32 τ̂21 is
asymptotically not pivotal 12 . Note that the null hypothesis H0 : τ32 τ21 = 0 can be
divided into three possible cases:

Case 1 : τ32 = 0 and τ21 ̸= 0,


Case 2 : τ32 ̸= 0 and τ21 = 0,
Case 3 : τ32 = τ21 = 0.

Recent work by Nadarajah and Pogány [2016] derived the exact distribution of the product of two possibly correlated normal random variables. They demonstrated that the product follows a distribution that is not normal. Based on their findings, we can show that $\hat\tau_{32}\hat\tau_{21}$ is asymptotically normal in null Cases 1 and 2, but follows a non-normal distribution in null Case 3 (i.e., it is not pivotal). Therefore, the bootstrap method is not valid because the statistic is not pivotal.

¹¹ See Perlman and Wu [1999] for a detailed discussion of exact tests and appropriate statistical inference.
¹² For detailed proof, see the Appendix.

Sobel's test

Given the joint asymptotic normality of $(\hat\tau_{32}, \hat\tau_{21})$, one potential approach is to use the testing method proposed by Sobel [1982]. This method has been widely used to test the existence of causal mediation effects in linear equation models. To be specific, we can construct the Wald statistic $T_n$ based on $\hat\tau_{32}\hat\tau_{21}$ using the first-order delta method:

$$
T_n = \frac{\hat\tau_{32}\hat\tau_{21}}{\sqrt{\hat\tau_{32}^2\, se(\hat\tau_{21})^2 + \hat\tau_{21}^2\, se(\hat\tau_{32})^2}}.
$$

It relies on the asymptotic normality of the estimators used for the transformation $g(x, y) = xy$, where $(x, y) \in \mathbb{R}^2$. Similar to the bootstrap method, however, the asymptotic distribution of Sobel's test statistic depends on which of the three null cases holds and is not pivotal. Using the result of Glonek [1993], we can show that the asymptotic distribution of $T_n$ has a discontinuity in the parameter space. Specifically, the Wald statistic requires the condition $Dg(x, y) = (y, x) \neq (0, 0)$, which is not satisfied at $(0, 0)$.¹³ According to the finding of Glonek [1993], the asymptotic distribution of $T_n$ depends on the true parameter $(\tau_{32}, \tau_{21})$:

$$
T_n \xrightarrow{d}
\begin{cases}
N(0, 1) & \text{if } (\tau_{32}, \tau_{21}) \neq (0, 0) \text{ (Cases 1 and 2)}, \\
N(0, \tfrac{1}{4}) & \text{if } (\tau_{32}, \tau_{21}) = (0, 0) \text{ (Case 3)}.
\end{cases}
$$

Therefore, $T_n$ is not pivotal. Although we know that $T_n$ is $\sqrt{n}$-consistent and asymptotically normal, we cannot rely on the standard normal distribution table to obtain a critical value that covers all three cases. Recent studies by van Garderen and van Giersbergen [2020] and Liu et al. [2022] formally demonstrated that (i) in null Cases 1 and 2, $T_n$ follows $N(0, 1)$ asymptotically but under-rejects if the sample size is finite, and (ii) in null Case 3, $T_n$ severely under-rejects the null hypothesis $H_0: \tau_{32}\tau_{21} = 0$ when the true parameters $(\tau_{32}, \tau_{21})$ are close to or equal to $(0, 0)$.
Figure 2 illustrates the issue that occurs when the true values of (τ32 , τ21 ) are both
equal to zero. The blue and orange lines in the figure represent the asymptotic distri-
butions of the test statistics T32 = τ̂32 /se(τ̂32 ) and T21 = τ̂21 /se(τ̂21 ), respectively. We
see that both test statistics are asymptotically standard normal. However, the green
line represents the asymptotic distribution of the product Tprod = τ̂32 τ̂21 /se(τ̂32 τ̂21 ),
and it follows N (0, 1/4). Therefore, using Tprod and the critical value from the stan-
dard normal distribution leads to severe under-rejection of the null hypothesis in
this case.
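The variance of 1/4 in Case 3 can be checked numerically. The sketch below is an idealized limit experiment, not a simulation of the paper's estimators: it assumes the two estimates are exactly independent standard normal draws with unit standard errors, in which case Sobel's statistic equals $T_{32}T_{21}/\sqrt{T_{32}^2 + T_{21}^2}$. The function names are hypothetical.

```python
import math
import random


def sobel_statistic(tau32, se32, tau21, se21):
    """First-order delta-method (Sobel) statistic for the product tau32 * tau21."""
    return (tau32 * tau21) / math.sqrt(tau32 ** 2 * se21 ** 2 + tau21 ** 2 * se32 ** 2)


def simulate_case3_variance(n_draws=100_000, seed=0):
    """Sample variance of the Sobel statistic when both true effects are zero.

    Assumption of this sketch: the estimates are independent N(0, 1) draws
    with standard errors fixed at 1, i.e. the asymptotic experiment.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_draws):
        t32 = rng.gauss(0.0, 1.0)
        t21 = rng.gauss(0.0, 1.0)
        values.append(sobel_statistic(t32, 1.0, t21, 1.0))
    mean = sum(values) / n_draws
    return sum((v - mean) ** 2 for v in values) / (n_draws - 1)
```

Under these assumptions the simulated variance should come out close to 0.25, so comparing the statistic against standard normal critical values rejects far too rarely.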

Figure 2: Asymptotic distributions of $T_{32}$, $T_{21}$, and $T_{prod}$

Note: Both $T_{32}$ (blue line) and $T_{21}$ (orange line) follow the standard normal distribution asymptotically. However, $T_{prod}$ follows $N(0, 1/4)$ asymptotically.

¹³ This is the situation in which singularity arises. A comprehensive and general explanation can be found in Dufour et al. [2013] and Drton and Xiao [2016].

The joint significance test

One may try to test the equivalent null hypothesis $H_0: \tau_{32} = 0$ or $\tau_{21} = 0$, based on the fact that nonzero mediation effects require two conditions: (i) the variation in $Y_3$ affects $Y_2$, and (ii) the variation in $Y_2$ affects $Y_1$. A test that takes these two conditions into account is called the joint significance test (MacKinnon et al. [2002]) or the causal steps test (Biesanz et al. [2010]). This test uses the minimum of the absolute values of two test statistics.¹⁴ Specifically, in our case, we can use $|T|_{(1)} \equiv \min\{|T_{32}|, |T_{21}|\}$ for testing the null hypothesis. van Garderen and van Giersbergen [2020] and Liu et al. [2022] showed that the joint significance test is always slightly more powerful than Sobel's test. However, they also demonstrated that (i) in null Cases 1 and 2, the joint significance test has correct size $\alpha$ as the sample size goes to infinity, and (ii) in null Case 3, the rejection rate is always smaller than the desired level $\alpha$. Therefore, the joint significance test has a similar under-rejection problem.
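The under-rejection in Case 3 can be made concrete with a short calculation. The sketch below assumes the limit experiment in which the two z-statistics are independent normals with unit variance and means `mu32` and `mu21`, and it assumes the joint significance test rejects when both $|T_{32}|$ and $|T_{21}|$ exceed the two-sided critical value. The function name is hypothetical.

```python
from statistics import NormalDist


def joint_test_rejection_prob(mu32, mu21, alpha=0.05):
    """Rejection probability of the joint significance test.

    Assumes T32 ~ N(mu32, 1) and T21 ~ N(mu21, 1), independent, and a test
    that rejects when both |T32| and |T21| exceed z_{1 - alpha/2}.
    """
    z = NormalDist()
    c = z.inv_cdf(1.0 - alpha / 2.0)

    def two_sided(mu):
        # Pr(|Z + mu| > c) for standard normal Z.
        return (1.0 - z.cdf(c - mu)) + z.cdf(-c - mu)

    return two_sided(mu32) * two_sided(mu21)
```

With both means equal to zero the rejection probability is $\alpha^2$ (0.0025 at the 5% level), whereas with one mean at zero and the other far from zero it approaches $\alpha$.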

4 Monte Carlo Simulations


Based on the results from Monte Carlo simulations, we can assess the performance of
the third-stage test statistic introduced in Section 3 in finite samples. One advantage
of the new model proposed in this paper is that it can accommodate various types
of mediator/outcome variables. Therefore, we examine DGP designs with differ-
ent types of variables to evaluate the performance of the new estimator and testing
method. The simulation results demonstrate that the size of a test converges to the
desired level.
¹⁴ van Giersbergen et al. [2014] and Liu et al. [2022] showed that this test is equivalent to the likelihood ratio test if one uses the linear causal mediation model.

We consider the following DGP designs with binary treatment, where X1 has no elements and Z2 and Z3 each have only one element (i.e., X2 = Z2 and X3 = Z3):

DGP 1: Binary Y1 and Y2
    Y1 = 1{β2 Y2 + β3 Y3 + ϵ1 ≥ 0}
    Y2 = 1{X2 + γ Y3 + ϵ2 ≥ 0}
    Y3 = 1{X3 + ϵ3 ≥ 0}

DGP 2: Continuous Y1 and Y2
    Y1 = β2 Y2 + β3 Y3 + ϵ1
    Y2 = X2 + γ Y3 + ϵ2
    Y3 = 1{X3 + ϵ3 ≥ 0}

DGP 3: Binary Y1 and continuous Y2
    Y1 = 1{β2 Y2 + β3 Y3 + ϵ1 ≥ 0}
    Y2 = X2 + γ Y3 + ϵ2
    Y3 = 1{X3 + ϵ3 ≥ 0}

DGP 4: Continuous Y1 and binary Y2
    Y1 = β2 Y2 + β3 Y3 + ϵ1
    Y2 = 1{X2 + γ Y3 + ϵ2 ≥ 0}
    Y3 = 1{X3 + ϵ3 ≥ 0}

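We assume that $X_2$ and $X_3$ follow the standard normal distribution independently. Also, the error disturbances $(\epsilon_1, \epsilon_2, \epsilon_3)$ follow a multivariate normal distribution:

$$
\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{pmatrix}
\sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix} \right),
$$

where $\rho \in \{0, 0.3, 0.5, 0.7\}$. Note that we do not estimate the linear indices in this setup and focus on the pure performance of the new estimator $\hat\tau_{21}$.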
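One way to generate data from these designs is sketched below. It is an illustration under stated assumptions rather than the author's simulation code: the regressors $X_2$ and $X_3$ are drawn as independent standard normals (an assumed reading, since $X_1$ is empty in these designs), and the equicorrelated errors are built from a common factor, which requires $0 \le \rho \le 1$. The function name `simulate_dgp` and its arguments are hypothetical.

```python
import math
import random


def simulate_dgp(n, gamma, beta2, beta3=1.0, rho=0.5,
                 binary_y1=True, binary_y2=True, seed=0):
    """Draw n observations (y1, y2, y3, x2, x3) from one of the four designs.

    binary_y1 / binary_y2 select the design: (True, True) is DGP 1,
    (False, False) is DGP 2, (True, False) is DGP 3, (False, True) is DGP 4.
    Errors have unit variance and pairwise correlation rho via a one-factor
    construction: e_k = sqrt(rho) * common + sqrt(1 - rho) * idiosyncratic_k.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x2 = rng.gauss(0.0, 1.0)
        x3 = rng.gauss(0.0, 1.0)
        common = rng.gauss(0.0, 1.0)
        e1, e2, e3 = (math.sqrt(rho) * common
                      + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                      for _ in range(3))
        y3 = 1 if x3 + e3 >= 0 else 0
        y2_latent = x2 + gamma * y3 + e2
        y2 = (1 if y2_latent >= 0 else 0) if binary_y2 else y2_latent
        y1_latent = beta2 * y2 + beta3 * y3 + e1
        y1 = (1 if y1_latent >= 0 else 0) if binary_y1 else y1_latent
        data.append((y1, y2, y3, x2, x3))
    return data
```

For example, `simulate_dgp(500, gamma=1.0, beta2=0.0, rho=0.7)` draws a DGP 1 sample in which the treatment affects the mediator but the mediator does not affect the outcome.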
For each of the four DGP designs, we consider two cases: $(\gamma, \beta_2) = (0, 0)$ and $(\gamma, \beta_2) = (1, 0)$. Furthermore, we set $\beta_3 = 1$ so that there is a direct effect of $Y_3$ on $Y_1$. In both cases, however, there is no indirect effect of $Y_3$ on $Y_1$ by construction. Therefore, $\tau_{32}\tau_{21} = 0$ in both cases, but $\tau_{32}$ could be either zero or nonzero. In terms of the null hypothesis, this situation can be described as

$$
H_0: \underbrace{\tau_{32} = 0}_{\text{False or True}} \ \text{ or } \ \underbrace{\tau_{21} = 0}_{\text{True}} \iff H_0: \tau_{32}\tau_{21} = 0.
$$

By the asymptotic normality of $\hat\tau_{32}$ and $\hat\tau_{21}$ established in Section 3, either one of the two z-statistics, or both, can be asymptotically standard normal under the null hypothesis:

$$
T_{32} \equiv \frac{\hat\tau_{32}}{\hat s_{32}} \overset{a}{\sim} N(0, 1) \quad \text{or} \quad T_{21} \equiv \frac{\hat\tau_{21}}{\hat s_{21}} \overset{a}{\sim} N(0, 1).
$$

Figure 3 shows that $T_{32}$ and $T_{21}$ behave as expected under the two scenarios. Therefore, we may leverage the testing method suggested by van Giersbergen et al. [2014] that hinges on the asymptotic normality of test statistics. We see that $T_{32}$ and $T_{21}$ follow the standard normal distribution in the scenario $(\gamma, \beta_2) = (0, 0)$. However, $T_{32}$ is not standard normal in the scenario $(\gamma, \beta_2) = (1, 0)$ since there exists a positive effect of $Y_3$ on $Y_2$.¹⁵

Figure 3: Estimated probability densities of $T_{32}$ and $T_{21}$

Note: The left and right panels are generated under the scenarios $(\gamma, \beta_2) = (0, 0)$ and $(\gamma, \beta_2) = (1, 0)$ using DGP 1, respectively. The bootstrap standard error is used to calculate $T_{32}$ and $T_{21}$.

We conduct 1,000 simulations for each of the designs with a sample size $n = 500$. Also, we use the bootstrap method to estimate the standard errors $s_{32}$ and $s_{21}$ of $\hat\tau_{32}$ and $\hat\tau_{21}$, which are then used to compute $T_{32}$ and $T_{21}$. Table 1 reports rejection rates at the 5% level for the four DGP designs. The first four rows and the remaining rows of the table correspond to the cases $(\gamma, \beta_2) = (0, 0)$ and $(\gamma, \beta_2) = (1, 0)$, respectively. Overall, the rejection rates reported in the four designs align closely with the significance level of 5%, even when considering higher levels of endogeneity ($\rho = 0.5$ and $\rho = 0.7$).
We also evaluate the statistical power of the new testing method using the same DGP designs. Table 2 reports the power for three sizes of the indirect effect (small, medium, and large) at three sample sizes ($n = 200$, 500, and 1000). Again, we conduct 1,000 simulations for each of the designs. The level of endogeneity is fixed ($\rho = 0.7$). The results show that the power of the new testing method increases with the sample size and the size of the indirect effect, as expected.

¹⁵ Note that an exact value of nonzero $\tau_{32}$ cannot be suggested since the true parameter $\tau_{32}$ exists as a conditional expectation with respect to given matching conditions.

Table 1: Type I error rates for the hypothesis testing

(γ, β2)   ρ      DGP1    DGP2    DGP3    DGP4
(0, 0)    0.0    0.052   0.046   0.051   0.051
          0.3    0.051   0.052   0.047   0.060
          0.5    0.052   0.051   0.055   0.054
          0.7    0.054   0.053   0.055   0.043
(1, 0)    0.0    0.060   0.062   0.063   0.056
          0.3    0.050   0.051   0.048   0.050
          0.5    0.059   0.068   0.043   0.057
          0.7    0.047   0.047   0.058   0.059

Note: Rejection rates over 1,000 simulations for tests at 5% level are reported. The number of bootstraps used for estimating standard error is 200.

Table 2: Statistical power for the hypothesis testing

         n       Small   Medium  Large
DGP 1    200     0.076   0.168   0.488
         500     0.132   0.626   0.960
         1000    0.264   0.940   1
DGP 2    200     0.204   0.558   0.840
         500     0.337   0.874   0.992
         1000    0.592   0.991   1
DGP 3    200     0.044   0.256   0.530
         500     0.124   0.668   0.944
         1000    0.356   0.948   1
DGP 4    200     0.164   0.492   0.803
         500     0.360   0.908   1
         1000    0.632   0.996   1

Note: For all analyses, γ = β2, β3 = 1, and ρ = 0.7. Small effect size = 0.3, medium effect size = 0.6, and large effect size = 0.9. The number of bootstraps used for estimating standard error is 200.

5 Conclusion
In this paper, we proposed a new generalized regression model with mediator variables and test statistics for the direct and indirect effects of a treatment variable. The proposed methodology addresses the limitations of previous models by incorporating endogenous treatment/mediator and accommodating various types of variables (e.g., binary, continuous, censored, or truncated). The indirect effect can be tested by constructing two estimators that capture (i) the effect of the treatment on the mediator, and (ii) the effect of the mediator on the outcome of interest. To perform simultaneous inference on the indirect effect, we use two estimators that capture (i) and (ii). The estimators of these effects converge at the $\sqrt{n}$-rate and are asymptotically normal. However, it turns out that popular classic test methods for indirect effects are severely under-sized. Therefore, we leverage the method proposed by van Garderen and van Giersbergen [2020] to obtain correct rejection probability.
There are several interesting issues that may be addressed in future work. One is the challenge of computational speed when dealing with high-dimensional regressors. We can enhance estimation speed through computational techniques like parallel processing. However, optimizing a discontinuous objective function remains challenging, particularly when dealing with high-dimensional regressors and kernel densities for matching conditions. There has been research that aims to address the computational burden associated with high-dimensional covariates. For instance, Khan et al. [2023] have demonstrated a new approach using iterative convex optimization to estimate high-dimensional monotone index models and reduce the computational burden. Whether it is possible to apply this type of estimation strategy to the class of models within the proposed generalized regression model framework is an open question. Another intriguing question concerns simultaneous inference for indirect effects and its critical region. The inference in this paper is based on the critical region suggested by van Garderen and van Giersbergen [2020], which has appropriate size properties. It would be interesting to explore whether alternative approaches can be developed.

A Asymptotic distribution of the product τ̂32τ̂21
We first define two random variables $W_{32}$ and $W_{21}$:

$$
W_{32} \equiv \frac{\hat\tau_{32} - \tau_{32}}{s_{32}} \quad \text{and} \quad W_{21} \equiv \frac{\hat\tau_{21} - \tau_{21}}{s_{21}},
$$

where $s_{32}$ and $s_{21}$ are the standard deviations of $\hat\tau_{32}$ and $\hat\tau_{21}$ characterized by the asymptotic linear representations for $\hat\tau_{32}$ and $\hat\tau_{21}$ in Theorem 1. Then $\sqrt{n} W_{32} \xrightarrow{d} N(0, 1)$ and $\sqrt{n} W_{21} \xrightarrow{d} N(0, 1)$. Therefore, $n W_{32} W_{21}$ converges in distribution to some random variable $W_{prod}$, which is the product of two standard normal random variables.
According to the findings of Nadarajah and Pogány [2016], for two standard normal random variables $X$ and $Y$, the product $Z = XY$ follows a non-normal distribution with $E[Z] = \rho$ and $Var(Z) = 1 + \rho^2$, where $\rho = corr(X, Y)$.
If $\tau_{32} = \tau_{21} = 0$, for example, by the continuous mapping theorem,

$$
n \hat\tau_{32} \hat\tau_{21} = s_{32} s_{21} \cdot \sqrt{n} W_{32} \sqrt{n} W_{21} \xrightarrow{d} s_{32} s_{21} \cdot W_{prod}.
$$

Therefore, $\hat\tau_{32}\hat\tau_{21}$ is $n$-consistent and the t-statistic based on its standard error does not follow the standard normal distribution asymptotically.
Now suppose, without loss of generality, that only $\tau_{21} = 0$. Then

$$
\sqrt{n} \hat\tau_{32} \hat\tau_{21} = \tau_{32} s_{21} \cdot \sqrt{n} W_{21} + \frac{s_{32} s_{21}}{\sqrt{n}} \cdot \sqrt{n} W_{32} \sqrt{n} W_{21} \xrightarrow{d} \tau_{32} s_{21} \cdot N(0, 1) + 0 \cdot W_{prod}.
$$

If exactly one of $\tau_{32}$ and $\tau_{21}$ is zero, $\hat\tau_{32}\hat\tau_{21}$ is $\sqrt{n}$-consistent and the z-statistic based on its standard error converges to the standard normal distribution. We conclude that the z-statistic for $\hat\tau_{32}\hat\tau_{21}$ is not pivotal, and therefore the bootstrap method is not valid.
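The moments of the product of two correlated standard normal variables quoted above, $E[Z] = \rho$ and $Var(Z) = 1 + \rho^2$, can be checked by simulation. This is a small numerical sketch, not part of the paper's proof; the function name `product_moments` is hypothetical, and the correlated pair is built as $Y = \rho X + \sqrt{1 - \rho^2}\,E$ with $E$ an independent standard normal.

```python
import math
import random


def product_moments(rho, n_draws=200_000, seed=0):
    """Sample mean and variance of Z = X * Y for standard normals with corr rho."""
    rng = random.Random(seed)
    products = []
    for _ in range(n_draws):
        x = rng.gauss(0.0, 1.0)
        # Construct Y with unit variance and correlation rho with X.
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        products.append(x * y)
    mean = sum(products) / n_draws
    variance = sum((p - mean) ** 2 for p in products) / (n_draws - 1)
    return mean, variance
```

Setting `rho = 0` gives the independent case relevant to Case 3, where the product has mean zero and unit variance but is not normally distributed.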

B Proofs of Asymptotic Results

B.1 Assumptions

Assumption 1. The MRC estimator developed by Han [1987] is utilized to estimate δ0 in


Y3 , and the conditional MRC estimator presented in Abrevaya et al. [2010] is employed to
estimate γ0 in Y2 . We normalize the first coefficient of δ0 and γ0 to 1, and assume that the
relevant regressors have a continuous distribution on their respective supports.

Assumption 2. The data $\{(X_i', Y_i')'\}_{i=1}^{n}$ are i.i.d., where $X_i = (X_{1i}, Z_{2i}, Z_{3i})'$ and $Y_i = (Y_{1i}, Y_{2i}, Y_{3i})'$.

Assumption 3 (Parameter space). $\mathcal{B}$ is the restricted parameter space, which is a compact subset of $\{\beta \in \mathbb{R}^d : \beta_d = 1\}$. Each $\beta$ in $\mathcal{B}$ is $\beta(\theta) = (\theta, 1)$, where $\theta$ is an element of $\Theta$, which is a compact subset of $\mathbb{R}^{d-1}$. Also, write $\beta_0 = \beta(\theta_0)$, where $\theta_0$ consists of the first $d - 1$ components of $\beta_0$.

Assumption 4 (Error distribution). $(\epsilon_1, \epsilon_2, \epsilon_3)$ is independent of $(X_1, Z_2, Z_3)$. The joint distribution of $(\epsilon_1, \epsilon_2, \epsilon_3)$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^3$.

For future use, define the following terms that represent the linear indices and the differences of variables:

$$
\begin{aligned}
(\zeta_i, \xi_i, \chi_i) &\equiv (X_{3i}'\delta_0,\ X_{2i}'\gamma_0,\ X_{1i}'\beta_0), \\
(\zeta_{ij}, \xi_{ij}, \chi_{ij}, Y_{2ij}, X_{1ij}) &\equiv (\zeta_i - \zeta_j,\ \xi_i - \xi_j,\ \chi_i - \chi_j,\ Y_{2i} - Y_{2j},\ X_{1i} - X_{1j}).
\end{aligned}
$$

Assumption 5 (First regressor and index properties). Let $X_1^{(1)}$ be the first component of $X_1$ and $\tilde X_1$ be the other components of $X_1$, and similarly for $X_2$.

(i) The conditional density of $X_{1ij}^{(1)}$ on $\mathbb{R}$ is almost everywhere positive given $\tilde X_{1ij}$ and given a neighborhood of $Y_{2ij}$, $\zeta_{ij}$, and $\xi_{ij}$ close to zero.

(ii) The conditional density of $X_{2ij}^{(1)}$ on $\mathbb{R}$ is almost everywhere positive given $\tilde X_{2i}$ and $\tilde X_{2j}$.

Assumption 6 (Full rank condition). The support of $X_{1ij}$ conditional on $(\zeta_i, \zeta_j, \xi_i, \xi_j)$ is not contained in any proper linear subspace of $\mathbb{R}^k$.

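Assumption 7. Let $\mathcal{N}$ denote a neighborhood of $\beta_0$ and $\mathcal{W}$ denote the support of $W_i$. Write $\nabla_m$ for the $m$th partial derivative operator with respect to $\theta$, and

$$
|\nabla_m| \sigma(\theta) \equiv \sum_{i_1, \ldots, i_m} \left| \frac{\partial^m}{\partial\theta_{i_1} \cdots \partial\theta_{i_m}} \sigma(\theta) \right|.
$$

The symbol $\|\cdot\|$ denotes the matrix norm: $\|(a_{ij})\| = \big(\sum_{i,j} a_{ij}^2\big)^{1/2}$.

(i) There exists an integrable function $\phi_s(\cdot)$ such that

$$
\int \big| 1\{X_{1i}'\beta(\theta) > X_{1j}'\beta(\theta)\} - 1\{X_{1i}'\beta_0 > X_{1j}'\beta_0\} \big|\, f(X_{1i}, Y_{2j}, \zeta_j, \xi_j)\, dX_{1i} \leq \phi_s(X_{1j})\, \|\theta - \theta_0\|.
$$

(ii) For each $w$ in $\mathcal{W}$, all mixed second partial derivatives of $\tau(w, \cdot)$ exist on $\mathcal{N}$.
(iii) There is an integrable function $M(w)$ such that for all $w$ in $\mathcal{W}$ and $\theta$ in $\mathcal{N}$,

$$
\|\nabla_2 \tau(w, \theta) - \nabla_2 \tau(w, \theta_0)\| \leq M(w)\, |\theta - \theta_0|.
$$

(iv) $E[|\nabla_1 \tau(\cdot, \theta_0)|^2] < \infty$, $E[|\nabla_2| \tau(\cdot, \theta_0)] < \infty$, and the matrix $E[\nabla_2 \tau(\cdot, \theta_0)]$ is negative definite.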

Assumption 8 (Matching stages kernel function). The kernel function $K: \mathbb{R}^k \to \mathbb{R}$, which is employed in the second and third stages, is twice continuously differentiable and is assumed to possess the following properties:
(i) $\sup_{u \in \mathbb{R}^k} |K(u)| < \infty$.
(ii) $\int K(u)\, du = 1$.
(iii) $\int |u|_1 K(u)\, du < \infty$, where $|\cdot|_1$ denotes the $l_1$ norm.
(iv) $K(\cdot)$ is symmetric about 0.
(v) $K(\cdot)$ is a $p$th-order kernel, where $p$ is an even integer:

$$
\int u^l K(u)\, du = 0 \ \text{ for } l = 1, 2, \ldots, p - 1, \qquad \int u^p K(u)\, du \neq 0.
$$

Assumption 9 (Matching stages bandwidth sequence). The bandwidth sequence $h_n$ used in the second stage and the third stage is a sequence of positive numbers such that as $n \to \infty$, (i) $h_n \to 0$, (ii) $\sqrt{n} h_n^{p} \to 0$, and (iii) $\sqrt{n} h_n^{3} \to \infty$.
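As a worked illustration of what these rate conditions allow, suppose (as an assumption of this example, not something stated in the paper) that the bandwidth is polynomial, $h_n = c\, n^{-a}$ with $c > 0$ and $a > 0$, and that conditions (ii) and (iii) read $\sqrt{n} h_n^{p} \to 0$ and $\sqrt{n} h_n^{3} \to \infty$, with $p$ the kernel order from Assumption 8(v). Then the admissible exponents can be derived as follows:

```latex
\begin{align*}
\sqrt{n}\, h_n^{p} &\propto n^{1/2 - ap} \to 0
  &&\Longleftrightarrow\quad a > \frac{1}{2p}, \\
\sqrt{n}\, h_n^{3} &\propto n^{1/2 - 3a} \to \infty
  &&\Longleftrightarrow\quad a < \frac{1}{6}, \\
\text{so}\quad \frac{1}{2p} < a < \frac{1}{6},
  &\quad \text{which is non-empty only if } p > 3 .
\end{align*}
% Example: p = 4 and a = 1/7 give
%   sqrt(n) h_n^4 ~ n^{1/2 - 4/7} = n^{-1/14} -> 0,
%   sqrt(n) h_n^3 ~ n^{1/2 - 3/7} = n^{ 1/14} -> infinity.
```

Under this reading, because $p$ must be even, the two conditions can hold together only with a kernel of order four or higher.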

B.2 Proof of Theorem 1

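Let $G_n(\theta)$ and $\hat G_n(\theta)$ be defined as

$$
\begin{aligned}
G_n(\theta) &= \frac{1}{n(n-1)} \sum_{i \neq j} 1\{Y_{1i} > Y_{1j}\}\, 1\{X_{1i}'\theta > X_{1j}'\theta\} \\
&\quad \times 1\{Y_{3i} = Y_{3j}\}\, k_h(Y_{2i} - Y_{2j})\, k_h(X_{3i}'\delta_0 - X_{3j}'\delta_0)\, k_h(X_{2i}'\gamma_0 - X_{2j}'\gamma_0), \\
\hat G_n(\theta) &= \frac{1}{n(n-1)} \sum_{i \neq j} 1\{Y_{1i} > Y_{1j}\}\, 1\{X_{1i}'\theta > X_{1j}'\theta\} \\
&\quad \times 1\{Y_{3i} = Y_{3j}\}\, k_h(Y_{2i} - Y_{2j})\, k_h(X_{3i}'\hat\delta - X_{3j}'\hat\delta)\, k_h(X_{2i}'\hat\gamma - X_{2j}'\hat\gamma).
\end{aligned}
$$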

We outline the proof strategy in the following steps:

(i) Establish consistency of the estimator.

(ii) Show the estimator converges at the $\sqrt{n}$-rate.

(iii) Establish asymptotic normality of the estimator.

To prove Lemma 1, we define the following terms:¹⁶

$$
S_{ij}(\theta) \equiv 1\{X_{1i}'\beta(\theta) > X_{1j}'\beta(\theta)\} - 1\{X_{1i}'\beta_0 > X_{1j}'\beta_0\},
\tag{23}
$$

$$
\mathcal{H}(\zeta_i, \zeta_j) \equiv \Pr(Y_{3i} = Y_{3j} \mid \zeta_i, \zeta_j),
\tag{24}
$$

$$
\mathcal{F}(X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \equiv \Pr(Y_{1i} > Y_{1j} \mid Y_{3i} = Y_{3j}, X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j).
\tag{25}
$$

We start by showing that $\hat G_n(\theta)$ converges uniformly to $G(\theta)$, where

$$
G(\theta) \equiv E\left[ \int S_{ij}(\theta)\, \mathcal{F}(X_{1i}, X_{1j}, Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j)\, \mathcal{H}(\zeta_j, \zeta_j)\, f(X_{1i}, Y_{2j}, \zeta_j, \xi_j)\, dX_{1i} \right].
\tag{26}
$$

Lemma 1. If Assumptions 1 - 6, 8, and 9 hold, then supθ∈Θ |Ĝn (θ) − G(θ)| = op (1).

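¹⁶ In (23), we subtract the term $1\{X_{1i}'\beta_0 > X_{1j}'\beta_0\}$. However, doing this does not change the value of the estimator.

Proof. By the triangle inequality,

$$
\sup_{\theta \in \Theta} |\hat G_n(\theta) - G(\theta)| \leq \sup_{\theta \in \Theta} |\hat G_n(\theta) - G_n(\theta)| + \sup_{\theta \in \Theta} |G_n(\theta) - E[G_n(\theta)]| + \sup_{\theta \in \Theta} |E[G_n(\theta)] - G(\theta)|.
$$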

Note that Ĝn (θ) converges uniformly to Gn (θ) due to the consistency of δ̂ and γ̂ estab-
lished by Sherman [1993] and Abrevaya et al. [2010], and the dominated convergence
theorem. To prove the desired result for the second term on the right-hand side, we
need to show that

$$
\sup_{\theta \in \Theta} |G_n(\theta) - E[G_n(\theta)]| = o_p(1).
$$

Define the following function, which is the summand in Gn (θ):

fn = 1{Y1i > Y1j }Sij (θ) 1{Y3i = Y3j } kh (Y2ij ) kh (ζij ) kh (ξij ).

Then we consider the following class of functions:

Fn ≡ {1{Y1i > Y1j } 1{Y3i = Y3j }Sij (θ) K(∆Y2ij /hn ) K(∆ζij /hn ) K(∆ξij /hn ) | θ ∈ Θ}
= {h3n fn | θ ∈ Θ}

with hn > 0 and hn → 0. Then Fn is a subclass of the fixed class F where

F ≡ {1{Y1i > Y1j } 1{Y3i = Y3j } Sij (θ) K(∆Y2ij /h) K(∆ζij /h) K(∆ξij /h) | θ ∈ Θ, h > 0}
= Fβ Fh1 Fh2 Fh3

with

Fβ ≡ {1{Y1i > Y1j } 1{Y3i = Y3j } Sij (θ) | θ ∈ Θ}


Fh1 ≡ {K(∆Y2ij /h) | h > 0}
Fh2 ≡ {K(∆ζij /h) | h > 0}
Fh3 ≡ {K(∆ξij /h) | h > 0}.

Notice that 1{Y1i > Y1j } 1{Y3i = Y3j } Sij (θ) is uniformly bounded by 1. Example
2.11 and Lemma 2.15 in Nolan and Pollard [1987] imply that Fβ is Euclidean for the
constant envelope 1. Also, Fh1 , Fh2 , and Fh3 are Euclidean for the constant envelope

29
supu∈R |K(u)| by Lemma 22 in Nolan and Pollard [1987]. Putting all these results
together, F is Euclidean for the constant envelope (supv∈R |K(v)|)3 . Let G̃n (θ) =
h3n Gn (θ). Then G̃n (θ) − E[G̃n (θ)] is the second-order U −statistic with zero mean and
Euclidean with the constant envelope (supv∈R |K(v)|)3 . Consequently, Corollary 4 in
Sherman [1994a] and Assumption 9 imply that

\[
\sup_{\theta\in\Theta} |G_n(\theta) - E[G_n(\theta)]| = h_n^{-3}\, O_p(n^{-1}) = o_p(1).
\]

To complete the argument, we need to demonstrate for the third term that

\[
\sup_{\theta\in\Theta} |E[G_n(\theta)] - G(\theta)| = o(1).
\]

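By the law of iterated expectations,

\[
\begin{aligned}
E[G_n(\theta)]
&= E\big[\mathbf{1}\{Y_{1i} > Y_{1j}\}\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, S_{ij}(\theta)\, k_h(Y_{2ij})\, k_h(\zeta_{ij})\, k_h(\xi_{ij})\big] \\
&= E\big[k_h(Y_{2ij})\, k_h(\zeta_{ij})\, k_h(\xi_{ij})\, S_{ij}(\theta) \cdot E[\mathbf{1}\{Y_{1i} > Y_{1j}\}\, \mathbf{1}\{Y_{3i} = Y_{3j}\} \mid X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j]\big].
\end{aligned} \tag{27}
\]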

Using (24) and (25), the conditional expectation term in (27) can be written as

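\[
\begin{aligned}
&E[\mathbf{1}\{Y_{1i} > Y_{1j}\}\, \mathbf{1}\{Y_{3i} = Y_{3j}\} \mid X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j] \\
&\quad= \Pr(Y_{1i} > Y_{1j} \mid Y_{3i} = Y_{3j}, X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \times \Pr(Y_{3i} = Y_{3j} \mid X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \\
&\quad= \Pr(Y_{1i} > Y_{1j} \mid Y_{3i} = Y_{3j}, X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \times \Pr(Y_{3i} = Y_{3j} \mid \zeta_i, \zeta_j) \\
&\quad= F(X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j)\, H(\zeta_i, \zeta_j).
\end{aligned} \tag{28}
\]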

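Therefore, we can represent E[Gn (θ)] in the following integral form:

\[
\begin{aligned}
E[G_n(\theta)]
&= \int K(u)\, K(v)\, K(w)\, S_{ij}(\theta)\, F(X_{1i}, X_{1j}, Y_{2j} + uh_n, Y_{2j}, \zeta_j + vh_n, \zeta_j, \xi_j + wh_n, \xi_j)\, H(\zeta_j + vh_n, \zeta_j) \\
&\qquad \times f(X_{1i}, Y_{2j} + uh_n, \zeta_j + vh_n, \xi_j + wh_n)\, dX_{1i}\, du\, dv\, dw\, dF(X_{1j}, Y_{2j}, \zeta_j, \xi_j) \\
&= E\Big[\int S_{ij}(\theta)\, F(X_{1i}, X_{1j}, Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j)\, H(\zeta_j, \zeta_j)\, f(X_{1i}, Y_{2j}, \zeta_j, \xi_j)\, dX_{1i}\Big] + O(h_n^2) \\
&= G(\theta) + O(h_n^2).
\end{aligned} \tag{29}
\]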

Note that the first-order term in the Taylor expansion above is 0 since ∫ uK(u) du = 0
by Assumption 8, and the second-order term is O(||θ − θ0 || h_n^2 ) = O(h_n^2 ) by Assumption
7. Thus, by Assumption 9,

\[
\sup_{\theta\in\Theta} |E[G_n(\theta)] - G(\theta)| = O(h_n^2) = o(1).
\]

This establishes the uniform convergence of Ĝn (θ) to its limiting expected value G(θ).
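
The O(h_n^2) smoothing-bias step in (29) can be checked numerically. The sketch below is only an illustration under assumptions that are not in the paper: it uses a Gaussian kernel (Assumption 8 may require a different, possibly higher-order, kernel), a one-dimensional smooth function m(x) = exp(x), and a hypothetical helper named `smoothing_bias`. It confirms that the first kernel moment vanishes and that halving the bandwidth divides the bias by roughly four.

```python
import numpy as np

def smoothing_bias(m, x0, h, half_width=8.0, n_grid=16001):
    """Return (integral of K(u) m(x0 + u h) du - m(x0), integral of u K(u) du)
    for the standard Gaussian kernel K, using a Riemann sum on a symmetric grid."""
    u = np.linspace(-half_width, half_width, n_grid)
    du = u[1] - u[0]
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    first_moment = np.sum(u * K) * du          # vanishes by symmetry of K
    smoothed = np.sum(K * m(x0 + u * h)) * du  # kernel-smoothed value of m at x0
    return smoothed - m(x0), first_moment

bias_h, first_moment = smoothing_bias(np.exp, 0.0, 0.2)
bias_half, _ = smoothing_bias(np.exp, 0.0, 0.1)
ratio = bias_h / bias_half  # close to 4 when the bias is of order h^2
print(first_moment, ratio)
```

Because the first-order term drops out, the leading bias is proportional to h^2, which is what makes the third term in the triangle-inequality bound vanish as h_n → 0.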

Lemma 2. If Assumptions 2 - 6 hold, then θ0 is identified in the parameter space Θ.

Proof. By Assumptions 2 and 4, the monotonic relationship implies that θ0 maximizes
G(θ) for each pair (i, j). We aim to prove that G(θ) is uniquely maximized
at θ0 . To do so, suppose that

\[
\Pr\big[(\tilde X_{1ij}'\tilde\theta_0 < -X_{1ij}^{(1)} < \tilde X_{1ij}'\tilde\theta) \cup (\tilde X_{1ij}'\tilde\theta < -X_{1ij}^{(1)} < \tilde X_{1ij}'\tilde\theta_0) \mid X_{1i}, X_{1j}, \zeta_i = \zeta_j, \xi_i = \xi_j\big] > 0.
\]

Then the probability that θ0 and θ yield distinct values of Sij (·) in G(·) is non-zero. Therefore,
G(θ0 ) > G(θ). This inequality implies that if G(θ0 ) = G(θ), then it must hold
that

\[
\Pr\big[(\tilde X_{1ij}'\tilde\theta_0 < -X_{1ij}^{(1)} < \tilde X_{1ij}'\tilde\theta) \cup (\tilde X_{1ij}'\tilde\theta < -X_{1ij}^{(1)} < \tilde X_{1ij}'\tilde\theta_0) \mid X_{1i}, X_{1j}, \zeta_i = \zeta_j, \xi_i = \xi_j\big] = 0.
\]

By Assumption 5, this is equivalent to

\[
\Pr\big[\tilde X_{1ij}'\tilde\theta_0 = \tilde X_{1ij}'\tilde\theta \mid X_{1i}, X_{1j}, \zeta_i = \zeta_j, \xi_i = \xi_j\big] = 1.
\]

By Assumption 6, which precludes an exact linear relationship, this is a contradiction.
Therefore, the condition cannot hold, which yields the desired result.

Lemma 3. If Assumptions 1 - 6, 8, and 9 hold, then θ̂ →p θ0 .

Proof. The four sufficient conditions for Theorem 2.1 in Newey and McFadden [1994]
must be verified to prove the result: (C1) Θ is a compact set, (C2) supθ∈Θ |Ĝn (θ) −
G(θ)| = op (1), (C3) G(θ) is continuous in θ, and (C4) G(θ) is uniquely maximized at
θ0 .
Assumption 3 satisfies condition (C1), while condition (C2) is established by
Lemma 1. Also, (C3) is satisfied because G(θ) is continuous by Assumption 5. The
proof of Lemma 2 verifies the identification condition (C4).
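
To illustrate what consistency of a rank-based argmax looks like in practice, the following sketch maximizes a simplified Han [1987]-type rank correlation objective by grid search. It is not the paper's Ĝn(θ): the kernel weights, the control indices ζ and ξ, and the function Sij(θ) are all omitted, and the data-generating process, sample size, and the name `rank_objective` are assumptions made purely for illustration.

```python
import numpy as np

def rank_objective(b, y, x1, x2):
    """Share of ordered pairs (i, j) with y_i > y_j and index_i > index_j,
    where index = x1 + b * x2 (first coefficient normalized to 1)."""
    index = x1 + b * x2
    y_gt = y[:, None] > y[None, :]
    idx_gt = index[:, None] > index[None, :]
    n = len(y)
    return np.sum(y_gt & idx_gt) / (n * (n - 1))

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# The outcome is an unknown monotone transform of the linear index; true b = 2.
y = np.exp(x1 + 2.0 * x2 + 0.1 * rng.normal(size=n))

grid = np.linspace(0.0, 4.0, 81)  # compact parameter space, as in (C1)
values = np.array([rank_objective(b, y, x1, x2) for b in grid])
b_hat = grid[np.argmax(values)]
print(b_hat)
```

The objective is a step function of b, so a grid search (rather than a gradient method) is used here; the compact grid plays the role of condition (C1) and the unique population maximizer plays the role of (C4).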


After proving the consistency of θ̂, the next step is to establish its √n-consistency
and asymptotic normality. We first introduce Theorem 1 of Sherman [1994b], which
gives sufficient conditions for the rate of convergence of θ̂.

Lemma 4. (Theorem 1 of Sherman [1994b]) Suppose θ̂ maximizes Ĝn (θ) and θ0 maximizes
G(θ). Let {δn } and {ϵn } be sequences of non-negative real numbers converging to zero as n
tends to infinity. If

1. θ̂ − θ0 = Op (δn ),

2. there exists a neighborhood N of θ0 and a constant κ > 0 for which G(θ) − G(θ0 ) ≤
−κ ∥θ − θ0 ∥^2 for all θ in N , and

3. uniformly over Op (δn ) neighborhoods of θ0 ,

\[
\hat G_n(\theta) = G(\theta) + O_p(\|\theta - \theta_0\| / \sqrt{n}) + o_p(\|\theta - \theta_0\|^2) + O_p(\epsilon_n),
\]

then ∥θ̂ − θ0 ∥ = Op (max[ϵn^{1/2} , 1/√n]).


To establish √n-consistency and asymptotic normality of θ̂, we examine

Ĝn (θ) = Gn (θ) + Rn (30)

where Rn = Ĝn (θ) − Gn (θ).


We consider a shrinking neighborhood of θ0 , Θn ≡ {θ ∈ Θ : ||θ − θ0 || ≤ δn },
where δn = O(n^{−δ} ) for some 0 < δ ≤ 1/2. To apply Theorem 2 of Sherman [1994a],
which establishes the asymptotic normality of θ̂, we want to show that the
following quadratic approximation to Ĝn (θ) holds within Op (n^{−1/2} ) neighborhoods of θ0 :

\[
\hat G_n(\theta) = \frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0) + \frac{1}{\sqrt{n}}(\theta - \theta_0)' Z_n + o_p(n^{-1}), \tag{31}
\]

where V is a negative definite matrix and Zn is an asymptotically normal random
vector with mean zero and variance Λ.
To apply the Hoeffding decomposition to Gn (θ) of (30), we define the function
below using (28):

gn (Wi , Wj , θ) ≡ kh (Y2ij ) kh (ζij ) kh (ξij ) Sij (θ) F(X1i , X1j , Y2i , Y2j , ζi , ζj , ξi , ξj ) H(ζi , ζj ).
(32)

Then we define two functions using (32):

\[
\begin{aligned}
U_n^1(w, \theta) &= E[g_n(W_i, W_j, \theta) \mid W_j = w] + E[g_n(W_i, W_j, \theta) \mid W_i = w] - 2E[G_n(\theta)], \\
U_n^2(w_i, w_j, \theta) &= g_n(w_i, w_j, \theta) - E[g_n(W_i, W_j, \theta) \mid W_j = w_j] - E[g_n(W_i, W_j, \theta) \mid W_i = w_i] + E[G_n(\theta)],
\end{aligned}
\]

where W ≡ (X1 , Y2 , ζ, ξ). Then Gn (θ) can be written as

\[
G_n(\theta) = E[G_n(\theta)] + \frac{1}{n}\sum_{k=1}^{n} U_n^1(W_k, \theta) + \frac{1}{n(n-1)}\sum_{i \ne j} U_n^2(W_i, W_j, \theta). \tag{33}
\]
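
The algebra behind a Hoeffding decomposition of this form can be verified numerically on a toy example. The sketch below is a hedged illustration, not the paper's gn: it assumes a hypothetical asymmetric kernel g(wi, wj) = wi · wj^2 with W uniform on (0, 1), for which the two conditional expectations and the mean are known in closed form, and checks that the mean, the first-order projection terms, and the degenerate second-order terms add back up to the original second-order U-statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
w = rng.uniform(size=n)

def g(wi, wj):
    """Toy asymmetric kernel (an assumption for illustration only)."""
    return wi * wj**2

theta = 0.5 * (1.0 / 3.0)       # E[g(W_i, W_j)] = E[W] * E[W^2]

def g_given_j(wj):
    return 0.5 * wj**2          # E[g(W_i, W_j) | W_j = wj]

def g_given_i(wi):
    return wi / 3.0             # E[g(W_i, W_j) | W_i = wi]

G = g(w[:, None], w[None, :])   # G[i, j] = g(w_i, w_j)
off_diag = ~np.eye(n, dtype=bool)
u_stat = G[off_diag].sum() / (n * (n - 1))

# First-order (projection) and second-order (degenerate) components.
u1 = g_given_j(w) + g_given_i(w) - 2.0 * theta
U2 = G - g_given_j(w)[None, :] - g_given_i(w)[:, None] + theta
recomposed = theta + u1.mean() + U2[off_diag].sum() / (n * (n - 1))
print(u_stat, recomposed)
```

The identity holds exactly in finite samples; the asymptotic work in the lemmas that follow lies in showing that the projection term drives the limit distribution while the degenerate term is of smaller order.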

To prove Lemma 5 and Lemma 6, we define the following terms:

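\[
\tau_l(W_i, \theta) \equiv \int S_{ij}(\theta)\, F(X_{1i}, X_{1j}, Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i)\, H(\zeta_i, \zeta_i)\, f(X_{1j}, Y_{2i}, \zeta_i, \xi_i)\, dX_{1j}, \tag{34}
\]
\[
\tau_r(W_j, \theta) \equiv \int S_{ij}(\theta)\, F(X_{1i}, X_{1j}, Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j)\, H(\zeta_j, \zeta_j)\, f(X_{1i}, Y_{2j}, \zeta_j, \xi_j)\, dX_{1i}, \tag{35}
\]
\[
\tau_k(\theta) \equiv \tau_l(W_k, \theta) + \tau_r(W_k, \theta) \quad \text{for } 1 \le k \le n. \tag{36}
\]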

We first need to revisit the term G(θ) defined in (26), which was derived to prove
Lemma 1, to address the lead term E[Gn (θ)] of (33). Note that E[τl (Wi , θ)] = E[τr (Wj , θ)] =
G(θ) and E[τk (θ)] = 2G(θ).
Lemma 5. If Assumptions 7, 8, and 9 hold, then uniformly over Θn , we have

\[
E[G_n(\theta)] = \frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0) + o_p(\|\theta - \theta_0\|^2),
\]

where V ≡ 2^{−1} E[∇2 τk (θ0 )].

Proof. To obtain (29), we take a second-order Taylor expansion of the integrand
around uhn = 0, vhn = 0, and whn = 0. The lead term is G(θ) and all remaining terms
are zero except the last term, which is of order O(h_n^2 δn ) = o(n^{−1/2} δn ) = o(O(δn^2 )) =
op (||θ − θ0 ||^2 ) in the shrinking neighborhood Θn since h_n^2 = o(n^{−1/2} ) by Assumption
9. Therefore,

\[
E[G_n(\theta)] = G(\theta) + o(\|\theta - \theta_0\|^2).
\]

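Now, we consider G(θ). By taking a second-order Taylor expansion of τk (θ) around
θ0 ,

\[
\begin{aligned}
\tau_k(\theta) - \tau_k(\theta_0)
&= (\theta - \theta_0)' \nabla_1 \tau_k(\theta_0) + \frac{1}{2}(\theta - \theta_0)' \nabla_2 \tau_k(\bar\theta)(\theta - \theta_0) \\
&= (\theta - \theta_0)' \nabla_1 \tau_k(\theta_0) + \frac{1}{2}(\theta - \theta_0)' \nabla_2 \tau_k(\theta_0)(\theta - \theta_0) \\
&\quad + \frac{1}{2}(\theta - \theta_0)' [\nabla_2 \tau_k(\bar\theta) - \nabla_2 \tau_k(\theta_0)](\theta - \theta_0).
\end{aligned}
\]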

Note that τk (θ0 ) = 0 and E[τk (θ)] is maximized at θ0 . Therefore, by Assumption 7,

\[
E[\tau_k(\theta)] = 2G(\theta) = \frac{1}{2}(\theta - \theta_0)' E[\nabla_2 \tau_k(\theta_0)](\theta - \theta_0) + o(\|\theta - \theta_0\|^2).
\]

In the next step, we consider the second term of the decomposition (33).

Lemma 6. If Assumptions 1 - 9 hold, then uniformly over Θn , we have

\[
\frac{1}{n}\sum_{1 \le k \le n} U_n^1(W_k, \theta) = \frac{1}{\sqrt{n}}(\theta - \theta_0)' Z_{n1} + o_p(\|\theta - \theta_0\|^2),
\]

where Z_{n1} ≡ n^{−1/2} Σ_{1≤k≤n} ∇1 τk (θ0 ).

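Proof. By a change of variable and Assumption 8,

\[
\begin{aligned}
&E[g_n(W_i, W_j, \theta) \mid W_j = W_k] \\
&\quad= \int k_h(Y_{2ik})\, k_h(\zeta_{ik})\, k_h(\xi_{ik})\, S_{ik}(\theta)\, F(X_{1i}, X_{1k}, Y_{2i}, Y_{2k}, \zeta_i, \zeta_k, \xi_i, \xi_k)\, H(\zeta_i, \zeta_k)\, dF(X_{1i}, Y_{2i}, \zeta_i, \xi_i) \\
&\quad= \int K(u) K(v) K(w)\, S_{ik}(\theta)\, F(X_{1i}, X_{1k}, Y_{2k} + uh_n, Y_{2k}, \zeta_k + vh_n, \zeta_k, \xi_k + wh_n, \xi_k)\, H(\zeta_k + vh_n, \zeta_k) \\
&\qquad\quad \times f(X_{1i}, Y_{2k} + uh_n, \zeta_k + vh_n, \xi_k + wh_n)\, dX_{1i}\, du\, dv\, dw \\
&\quad= \int S_{ik}(\theta)\, F(X_{1i}, X_{1k}, Y_{2k}, Y_{2k}, \zeta_k, \zeta_k, \xi_k, \xi_k)\, H(\zeta_k, \zeta_k)\, f(X_{1i}, Y_{2k}, \zeta_k, \xi_k)\, dX_{1i} + O(h_n^2 \delta_n) \\
&\quad= \tau_r(W_k, \theta) + O(h_n^2 \delta_n).
\end{aligned}
\]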

Similarly, we can show that

\[
\begin{aligned}
E[g_n(W_i, W_j, \theta) \mid W_i = W_k]
&= \int S_{kj}(\theta)\, F(X_{1k}, X_{1j}, Y_{2k}, Y_{2k}, \zeta_k, \zeta_k, \xi_k, \xi_k)\, H(\zeta_k, \zeta_k)\, f(X_{1j}, Y_{2k}, \zeta_k, \xi_k)\, dX_{1j} + O(h_n^2 \delta_n) \\
&= \tau_l(W_k, \theta) + O(h_n^2 \delta_n).
\end{aligned}
\]

Therefore, U_n^1 (Wk , θ) = τk (θ) − 2 E[Gn (θ)] + O(h_n^2 δn ). Also, O(h_n^2 δn ) = o(||θ − θ0 ||^2 )
within a shrinking neighborhood Θn .

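\[
\begin{aligned}
\frac{1}{n}\sum_{k=1}^{n} U_n^1(W_k, \theta)
&= \frac{1}{n}\sum_{k=1}^{n} \tau_k(\theta) - (\theta - \theta_0)' V (\theta - \theta_0) + o_p(\|\theta - \theta_0\|^2) \\
&= \frac{1}{n}\sum_{k=1}^{n} (\theta - \theta_0)' \nabla_1 \tau_k(\theta_0) + \frac{1}{n}\sum_{k=1}^{n} (\theta - \theta_0)' [\nabla_2 \tau_k(\theta_0) - V](\theta - \theta_0) + o_p(\|\theta - \theta_0\|^2)
\end{aligned}
\]

and n^{−1} Σ_{k=1}^{n} ∇2 τk (θ0 ) = V + op (1), where V = E[∇2 τk (θ0 )] by the SLLN.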

Now, we consider the third term of the decomposition (33).

Lemma 7. If Assumptions 1 - 9 hold, then uniformly over Θn , we have

\[
\frac{1}{n(n-1)}\sum_{i \ne j} U_n^2(W_i, W_j, \theta) = O_p(n^{-1} h_n^{-3}).
\]

Proof. Let Θn = {θ ∈ Θ | ||θ − θ0 || ≤ δn }, where δn = o(1), and take γn = 1. Note that
U_n^2 (·, ·, θ0 ) = 0 and |U_n^2 (·, ·, θ)| is bounded by a multiple of M h_n^{−3} , where M is some
positive constant. Define F_n^* = {Ũ_n^2 (·, ·, θ) | θ ∈ Θ}, where Ũ_n^2 (·, ·, θ) = M^{−1} h_n^3 U_n^2 (·, ·, θ).
It follows from arguments similar to those used to prove Lemma 1, together with
Corollaries 17 and 21 in Nolan and Pollard [1987], that F_n^* is Euclidean for a
constant envelope. Using Corollary 4 in Sherman [1994a], we conclude that

\[
\frac{1}{n(n-1)}\sum_{i \ne j} U_n^2(\cdot, \cdot, \theta) = M h_n^{-3}\, O_p(n^{-1}) = O_p(n^{-1} h_n^{-3}).
\]

By combining Lemmas 5, 6, and 7, we obtain the following representation for Gn (θ):

\[
G_n(\theta) = \frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0) + \frac{1}{\sqrt{n}}(\theta - \theta_0)' Z_{n1} + o_p(\|\theta - \theta_0\|^2) + O_p(n^{-1} h_n^{-3}).
\]

Finally, we consider the residual term Rn of (30).

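Lemma 8. If Assumptions 1 - 9 hold, then uniformly over Θn , we have

\[
R_n = \frac{1}{\sqrt{n}}(\theta - \theta_0)' (Z_{n2} + Z_{n3}) + o_p(n^{-1/2}),
\]

where

\[
Z_{n2} \equiv \frac{1}{\sqrt{n}}\sum_{k=1}^{n} \big(E[\nabla_\theta \mu_{2,\xi_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)] - E[\nabla_\theta \mu_{2,\xi_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)]\big)\, \psi_{\gamma k}
\]

and

\[
Z_{n3} \equiv \frac{1}{\sqrt{n}}\sum_{k=1}^{n} \big(E[\nabla_\theta \mu_{3,\zeta_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)] - E[\nabla_\theta \mu_{3,\zeta_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)]\big)\, \psi_{\delta k}.
\]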

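Proof. By the mean value theorem, there exist ∆ζij∗ and ∆ξij∗ such that

\[
\begin{aligned}
R_n &= \frac{h_n^{-1}}{n(n-1)}\sum_{i \ne j} \mathbf{1}\{y_{1i} > y_{1j}\}\, S_{ij}(\theta)\, \mathbf{1}\{y_{3i} = y_{3j}\} \\
&\quad \times k_h(\Delta y_{2ij})\, \big[k_h'(\Delta\zeta_{ij}^*)\, k_h(\Delta\xi_{ij}^*)(\hat\zeta_i - \zeta_i - \hat\zeta_j + \zeta_j) + k_h(\Delta\zeta_{ij}^*)\, k_h'(\Delta\xi_{ij}^*)(\hat\xi_i - \xi_i - \hat\xi_j + \xi_j)\big].
\end{aligned}
\]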

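We first define Rn,ζ to derive the linear representation that involves the term ζ̂i − ζi :

\[
R_{n,\zeta} = \frac{h_n^{-1}}{n(n-1)}\sum_{i \ne j} \mathbf{1}\{Y_{1i} > Y_{1j}\}\, S_{ij}(\theta)\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, k_h(\Delta Y_{2ij})\, k_h'(\Delta\zeta_{ij}^*)\, k_h(\Delta\xi_{ij}^*)\, X_{3ij}'(\hat\delta - \delta_0). \tag{37}
\]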

Note that the term that involves ζ̂j − ζj can be dealt with in a similar way. Using the
linear representation of δ̂, (37) can be written as

\[
\frac{h_n^{-1}}{n(n-1)(n-2)}\sum_{i \ne j \ne k} \mathbf{1}\{Y_{1i} > Y_{1j}\}\, S_{ij}(\theta)\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, k_h(\Delta Y_{2ij})\, k_h'(\Delta\zeta_{ij}^*)\, k_h(\Delta\xi_{ij}^*)\, X_{3ij}'\, \psi_{\delta k} + o_p(n^{-1/2}). \tag{38}
\]

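The unconditional expectation of this third-order U-process is zero. Also, its conditional
expectation with respect to each of the first and second arguments, i and j, is
zero due to the property of ψδk . Therefore, we only derive a linear representation for
its conditional expectation with respect to the third argument k.
By Assumption 2 and the consistency of δ̂, the conditional expectation of the
summand of (38) except ψδk is equal to its unconditional expectation:

\[
\begin{aligned}
&E\big[\mathbf{1}\{Y_{1i} > Y_{1j}\}\, S_{ij}(\theta)\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, k_h(\Delta Y_{2ij})\, k_h'(\Delta\zeta_{ij})\, k_h(\Delta\xi_{ij})\, X_{3ij}'\big] \\
&\quad= E\big\{E\big[\mathbf{1}\{Y_{1i} > Y_{1j}\}\, S_{ij}(\theta)\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, k_h(\Delta Y_{2ij})\, k_h'(\Delta\zeta_{ij})\, k_h(\Delta\xi_{ij})\, X_{3ij}' \mid X_{1i}, X_{1j}, X_{3i}, X_{3j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j\big]\big\} \\
&\quad= E\big[k_h(\Delta Y_{2ij})\, k_h'(\Delta\zeta_{ij})\, k_h(\Delta\xi_{ij})\, X_{3ij}'\, S_{ij}(\theta)\, F(X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j)\, H(\zeta_i, \zeta_j)\big] \\
&\quad= E\big[k_h(\Delta Y_{2ij})\, k_h'(\Delta\zeta_{ij})\, k_h(\Delta\xi_{ij})\, H(\zeta_i, \zeta_j)\, M_3(Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j)\big],
\end{aligned} \tag{39}
\]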

where

\[
M_3(Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \equiv E\big[X_{3ij}'\, S_{ij}(\theta)\, F(X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \mid Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j\big].
\]

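The first and the third equalities hold due to the law of iterated expectations. The
second equality holds since

\[
\begin{aligned}
&E[\mathbf{1}\{Y_{1i} > Y_{1j}\}\, \mathbf{1}\{Y_{3i} = Y_{3j}\} \mid X_{1i}, X_{1j}, X_{3i}, X_{3j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j] \\
&\quad= \Pr(Y_{1i} > Y_{1j} \mid Y_{3i} = Y_{3j}, X_{1i}, X_{1j}, X_{3i}, X_{3j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \\
&\qquad \times \Pr(Y_{3i} = Y_{3j} \mid X_{1i}, X_{1j}, X_{3i}, X_{3j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \\
&\quad= \Pr(Y_{1i} > Y_{1j} \mid Y_{3i} = Y_{3j}, X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \times \Pr(Y_{3i} = Y_{3j} \mid \zeta_i, \zeta_j) \\
&\quad= F(X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j)\, H(\zeta_i, \zeta_j).
\end{aligned}
\]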

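By a change of variable, and because ∫ u^p K′(u) du = 0 and ∫ u^{p+1} K′(u) du = −1
by Assumption 8, the integral form of (39) can be written as

\[
\begin{aligned}
&\int k_h(\Delta Y_{2ij})\, k_h'(\Delta\zeta_{ij})\, k_h(\Delta\xi_{ij})\, H(\zeta_i, \zeta_j)\, M_3(Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j, \theta)\, dF(Y_{2i}, \zeta_i, \xi_i)\, dF(Y_{2j}, \zeta_j, \xi_j) \\
&\quad= \int K(u) K'(v) K(w)\, H(\zeta_j + vh_n, \zeta_j)\, M_3(Y_{2j} + uh_n, Y_{2j}, \zeta_j + vh_n, \zeta_j, \xi_j + wh_n, \xi_j, \theta) \\
&\qquad\quad \times f(Y_{2j} + uh_n)\, f(\zeta_j + vh_n)\, f(\xi_j + wh_n)\, du\, dv\, dw\, dF(Y_{2j}, \zeta_j, \xi_j) \\
&\quad= \int K(u) K'(v) K(w)\, \mu_3(Y_{2j} + uh_n, Y_{2j}, \zeta_j + vh_n, \zeta_j, \xi_j + wh_n, \xi_j, \theta)\, du\, dv\, dw\, dF(Y_{2j}, \zeta_j, \xi_j) \\
&\quad= -h_n\, E[\mu_{3,\zeta_i}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta)] + O(h_n^2),
\end{aligned} \tag{40}
\]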

where

µ3 (Y2i , Y2j , ζi , ζj , ξi , ξj , θ) ≡ H(ζi , ζj ) M3 (Y2i , Y2j , ζi , ζj , ξi , ξj , θ) f (Y2i ) f (ζi ) f (ξi ).

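and µ3,ζi is its first-order partial derivative with respect to ζi . By a Taylor series
expansion of (40) around θ0 and Assumption 9, the conditional expectation of (38)
with respect to the third argument k can be written as

\[
\begin{aligned}
&-\frac{1}{n}\sum_{k=1}^{n} E[\mu_{3,\zeta_i}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta)]\, \psi_{\delta k} + o_p(n^{-1/2}) \\
&\quad= \frac{1}{\sqrt{n}}(\theta - \theta_0)' \Big(-\frac{1}{\sqrt{n}}\sum_{k=1}^{n} E[\nabla_\theta \mu_{3,\zeta_i}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)]\, \psi_{\delta k}\Big) + o_p(n^{-1/2}).
\end{aligned}
\]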

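We can take similar steps for ζ̂j − ζj and obtain the following term:

\[
\frac{1}{\sqrt{n}}(\theta - \theta_0)' \Big(\frac{1}{\sqrt{n}}\sum_{k=1}^{n} E[\nabla_\theta \mu_{3,\zeta_j}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)]\, \psi_{\delta k}\Big) + o_p(n^{-1/2}),
\]

where µ3,ζj is the first-order partial derivative of µ3 with respect to ζj . Finally, we define
Zn3 , which is asymptotically normal:

\[
Z_{n3} \equiv \frac{1}{\sqrt{n}}\sum_{k=1}^{n} \big(E[\nabla_\theta \mu_{3,\zeta_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)] - E[\nabla_\theta \mu_{3,\zeta_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)]\big)\, \psi_{\delta k}.
\]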

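Similarly, for ξ̂i − ξi and ξ̂j − ξj , we have Zn2 , where

\[
Z_{n2} \equiv \frac{1}{\sqrt{n}}\sum_{k=1}^{n} \big(E[\nabla_\theta \mu_{2,\xi_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)] - E[\nabla_\theta \mu_{2,\xi_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)]\big)\, \psi_{\gamma k},
\]

with

\[
\begin{aligned}
M_2(Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j, \theta) &= E\big[X_{2ij}'\, S_{ij}(\theta)\, F(X_{1i}, X_{1j}, Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j) \mid Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j\big], \\
\mu_2(Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j, \theta) &= H(\zeta_i, \zeta_j)\, M_2(Y_{2i}, Y_{2j}, \zeta_i, \zeta_j, \xi_i, \xi_j, \theta)\, f(Y_{2i})\, f(\zeta_i)\, f(\xi_i).
\end{aligned}
\]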

By combining Lemmas 6, 7, and 8, we obtain the following quadratic approximation
of Ĝn (θ):

\[
\hat G_n(\theta) = \frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0) + \frac{1}{\sqrt{n}}(\theta - \theta_0)' Z_n + o_p(\|\theta - \theta_0\|^2) + O_p(n^{-1} h_n^{-3}) + O_p(n^{-1/2}), \tag{41}
\]

where Zn = Zn1 + Zn2 + Zn3 .


Lemma 9. If Assumptions 1 - 9 hold, then θ̂ is √n-consistent.

Proof. Note that Op (n^{−1} h_n^{−3} ) + Op (n^{−1/2} ) = Op (n^{−1} h_n^{−3} ). We apply Lemma 4 to (41)
and deduce that ||θ̂ − θ0 || = Op (n^{−1} h_n^{−3} ).
Now, let Θn = {θ ∈ Θ : ||θ − θ0 || ≤ δn } and δn = O(ϵn^{1/2} ), where ϵn = n^{−1} h_n^{−3} . Then
we apply a Taylor expansion around θ = θ0 to Ũ_n^2 (·, ·, θ)^2 = M^{−2} h_n^6 U_n^2 (·, ·, θ)^2 to show
E[sup_{θ∈Θn} Ũ_n^2 (·, ·, θ)^2 ] = O(δn^2 h_n^3 ). Take γn = h_n^{3/2} . We apply Theorem 3 of Sherman
[1994b] to see that uniformly over Θn , it holds that

\[
\frac{1}{n(n-1)}\sum_{i \ne j} U_n^2(\cdot, \cdot, \theta) = M h_n^{-3}\, O_p(\delta_n^{\alpha} \gamma_n^{\alpha} n^{-1}) = O_p(n^{-1})\, O_p(n^{-\alpha/2} h_n^{-3}) = o_p(n^{-1})
\]

by choosing α sufficiently close to 1 and by Assumption 9. This implies that the Op (n^{−1} h_n^{−3} )
term is op (n^{−1} ) over Op (δn ) neighborhoods of θ0 . Finally, we apply Lemma 4 again to
conclude that ||θ̂ − θ0 || = Op (1/√n).

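Now, we define ∆i :

\[
\begin{aligned}
\Delta_i = \nabla_1 \tau_k(\theta_0)
&+ \big(E[\nabla_\theta \mu_{2,\xi_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)] - E[\nabla_\theta \mu_{2,\xi_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)]\big)\, \psi_{\gamma k} \\
&+ \big(E[\nabla_\theta \mu_{3,\zeta_j}(Y_{2j}, Y_{2j}, \zeta_j, \zeta_j, \xi_j, \xi_j, \theta_0)] - E[\nabla_\theta \mu_{3,\zeta_i}(Y_{2i}, Y_{2i}, \zeta_i, \zeta_i, \xi_i, \xi_i, \theta_0)]\big)\, \psi_{\delta k}.
\end{aligned}
\]
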
Because E[∇1 τk (θ0 )] = 0 and E[ψγk ] = E[ψδk ] = 0, it holds that E[∆i ] = 0. Based on
Assumption 7 and the Lindeberg-Lévy CLT, we conclude that Zn →d N (0, Λ), where
Λ = E[∆i ∆i′ ]. The asymptotic normality of θ̂ follows from Theorem 2 of Sherman
[1993], i.e.,

\[
\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, V^{-1} \Lambda V^{-1}).
\]
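
The sandwich form V^{-1} Λ V^{-1} can be computed mechanically once estimates of V and of the influence terms ∆i are available. The sketch below is illustrative only: the paper does not specify this procedure here, the function name `sandwich_covariance` is hypothetical, and the toy inputs are made-up numbers chosen so the result can be checked by hand.

```python
import numpy as np

def sandwich_covariance(V, influence):
    """Return V^{-1} Lambda V^{-1}, where Lambda is the sample average of
    outer products of the rows of `influence` (assumed to have mean zero)."""
    V = np.asarray(V, dtype=float)
    influence = np.asarray(influence, dtype=float)
    Lam = influence.T @ influence / influence.shape[0]
    V_inv = np.linalg.inv(V)
    return V_inv @ Lam @ V_inv

# Toy inputs: a negative definite V and four mean-zero influence vectors.
V = -2.0 * np.eye(2)
delta = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
cov = sandwich_covariance(V, delta)
print(cov)
```

Here Λ equals diag(0.5, 2) and V^{-1} equals −0.5 times the identity, so the result is diag(0.125, 0.5). The sign of V cancels because V^{-1} appears on both sides.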

B.3 Proof of Theorem 2
The conditional Kendall’s tau coefficient that captures the effect of Y2 on Y1 is given
by

\[
\hat\tau_{21} \equiv \frac{\sum_{i \ne j} \tilde\omega_{ij}\, \mathrm{sgn}(Y_{1i} - Y_{1j})\, \mathrm{sgn}(X_{2i}'\hat\gamma - X_{2j}'\hat\gamma)}{\sum_{i \ne j} \tilde\omega_{ij}}, \tag{42}
\]

where ω̃ij ≡ kh (X1i′β̂ − X1j′β̂) · 1{Y3i = Y3j }. We want to derive a linear representation
for τ̂21 − τ21 , where

\[
\tau_{21} = E[\mathrm{sgn}(Y_{1i} - Y_{1j})\, \mathrm{sgn}(X_{2i}'\gamma_0 - X_{2j}'\gamma_0) \mid X_{1i}'\beta_0 = X_{1j}'\beta_0,\ Y_{3i} = Y_{3j}]. \tag{43}
\]

For notational convenience, we again write (ζi , ξi , χi ) = (X3i′δ0 , X2i′γ0 , X1i′β0 )
and (ζij , ξij , χij , Y2ij ) = (ζi − ζj , ξi − ξj , χi − χj , Y2i − Y2j ).
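
As a concrete reading of (42), the statistic is a weighted average of concordance signs over pairs, with weights that are large when the two estimated indices X1′β̂ are close and zero when Y3 differs. The following is a minimal sketch under stated assumptions: it uses a Gaussian kernel with k_h(u) = K(u/h)/h (the paper's kernel may differ), takes the estimated indices as given inputs, and the function name `kernel_weighted_tau` and the toy data are hypothetical.

```python
import numpy as np

def kernel_weighted_tau(y1, y3, chi_hat, xi_hat, h):
    """Kernel-weighted Kendall's tau between y1 and the index xi_hat,
    localized to pairs with similar chi_hat and equal y3."""
    y1, y3, chi_hat, xi_hat = (np.asarray(a, dtype=float)
                               for a in (y1, y3, chi_hat, xi_hat))
    d_chi = chi_hat[:, None] - chi_hat[None, :]
    weights = np.exp(-0.5 * (d_chi / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)
    weights = weights * (y3[:, None] == y3[None, :])  # keep pairs with Y3i = Y3j
    np.fill_diagonal(weights, 0.0)                    # sum over i != j only
    signs = (np.sign(y1[:, None] - y1[None, :])
             * np.sign(xi_hat[:, None] - xi_hat[None, :]))
    return np.sum(weights * signs) / np.sum(weights)

# Toy data: y1 is perfectly concordant with the index, then perfectly discordant.
xi_hat = np.array([0.1, 0.2, 0.3, 0.4])
chi_hat = np.zeros(4)
y3 = np.zeros(4)
tau_pos = kernel_weighted_tau([1.0, 2.0, 3.0, 4.0], y3, chi_hat, xi_hat, h=0.5)
tau_neg = kernel_weighted_tau([4.0, 3.0, 2.0, 1.0], y3, chi_hat, xi_hat, h=0.5)
print(tau_pos, tau_neg)
```

The pairwise matrices make the cost O(n^2) in time and memory, which is fine for a sketch but may need chunking for large samples.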
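The difference between (42) and (43) is

\[
\hat\tau_{21} - \tau_{21} = \frac{\sum_{i \ne j} \tilde\omega_{ij}\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\hat\xi_{ij}) - \tau_{21}]}{\sum_{i \ne j} \tilde\omega_{ij}}. \tag{44}
\]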

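We first evaluate the probability limit of the denominator term in (44). Note that
(n(n − 1))^{−1} Σ_{i≠j} ω̃ij = E[ω̃ij ] + op (1) by the ULLN for U-processes. By a change of variable
and Assumption 9, E[ω̃ij ] → E[f (χi )] as n → ∞ and hn → 0. Then we consider the
following decomposition to deal with the numerator term in (44):

\[
\begin{aligned}
&\frac{1}{n(n-1)}\sum_{i \ne j} \tilde\omega_{ij}\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\hat\xi_{ij}) - \tau_{21}] \\
&\quad= \frac{1}{n(n-1)}\sum_{i \ne j} \omega_{ij}\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}] && (45) \\
&\qquad+ \frac{1}{n(n-1)}\sum_{i \ne j} (\tilde\omega_{ij} - \omega_{ij})\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}] && (46) \\
&\qquad+ \frac{1}{n(n-1)}\sum_{i \ne j} \tilde\omega_{ij}\, \mathrm{sgn}(Y_{1ij})\, \big[\mathrm{sgn}(\hat\xi_{ij}) - \mathrm{sgn}(\xi_{ij})\big], && (47)
\end{aligned}
\]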
where ωij = kh (χij ) · 1{Y3i = Y3j }. By combining a result for (45) with the asymptotic
properties of the denominator term in (44), we get the limiting distribution of the
infeasible test statistic that depends on the true parameters γ0 and β0 . Also, results for
(46) and (47) show how the influence function is adjusted by using the estimated values
γ̂ and β̂. We deal with (45), (46), and (47) consecutively and collect those results to
obtain the asymptotic linear representation for τ̂21 .

1. Asymptotic representation for (45)

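Let us consider (45). We apply the Hoeffding decomposition to (45) to obtain the
representation in the limit. As n → ∞ and hn → 0, the expectation of the term inside
the double summation becomes

\[
\begin{aligned}
&E\big[\omega_{ij}\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\big] \\
&\quad= \int \frac{1}{h_n} K\Big(\frac{\chi_{ij}}{h_n}\Big)\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, E[\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21} \mid \chi_i, \chi_j]\, dF(\chi_i)\, dF(\chi_j) \times \Pr[Y_{3i} = Y_{3j}].
\end{aligned} \tag{48}
\]

Then the integral term in (48) is

\[
\begin{aligned}
&\int K(u)\, E[\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21} \mid \chi_j + uh_n, \chi_j]\, f(\chi_j + uh_n)\, du\, dF(\chi_j) \\
&\quad= \int K(u)\, E[\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21} \mid \chi_j, \chi_j]\, f(\chi_j)\, du\, dF(\chi_j) + O(h_n^2) \\
&\quad= O(h_n^2),
\end{aligned}
\]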

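which is op (n^{−1/2} ) by the higher-order properties of the kernel function K(·). Also,
we obtain the conditional expectations of the term inside the double summation
conditional on its first and second arguments, i and j. Conditioning on the argument i,

\[
\begin{aligned}
&E\big[\omega_{ij}\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}] \mid Y_{1i}, \xi_i, \chi_i\big] \\
&\quad= \int \frac{1}{h_n} K\Big(\frac{\chi_{ij}}{h_n}\Big)\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, dF(Y_{1j}, \xi_j, \chi_j) \times \Pr[Y_{3i} = Y_{3j}].
\end{aligned} \tag{49}
\]

Then the integral term in (49) is

\[
\begin{aligned}
&\int K(u)\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, f(\chi_i + uh_n)\, du\, dF(Y_{1j}, \xi_j) \\
&\quad= \int K(u)\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, f(\chi_i)\, du\, dF(Y_{1j}, \xi_j) + O(h_n^2) \\
&\quad= E\big[[\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, f(\chi_i) \mid Y_{1i}, \xi_i\big] + O(h_n^2) \\
&\quad= U(Y_{1i}, \xi_i) + O(h_n^2),
\end{aligned}
\]

where

\[
U(Y_{1i}, \xi_i) \equiv E\big[[\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, f(\chi_i) \mid Y_{1i}, \xi_i\big].
\]
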


Following a similar procedure, we obtain an analogous representation for the argument
j. Applying arguments similar to those used to prove Lemma 3.1 in
Powell et al. [1989], the degenerate U-statistic of order 2 in the Hoeffding decomposition
is op (n^{−1/2} ).
Therefore, (45) has the following representation in the limit:

\[
\frac{1}{n(n-1)}\sum_{i \ne j} \omega_{ij}\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]
= \frac{1}{n}\Big[\sum_{i=1}^{n} U(Y_{1i}, \xi_i) + \sum_{j=1}^{n} U(Y_{1j}, \xi_j)\Big] \times \Pr[Y_{3i} = Y_{3j}] + o_p(n^{-1/2}). \tag{50}
\]

2. Asymptotic representation for (46)

Let us consider (46). By a mean-value expansion, the consistency of β̂, and the
linear representation for β̂, it holds that

\[
\begin{aligned}
&\frac{1}{n(n-1)}\sum_{i \ne j} h_n^{-1} k_h'(\chi_{ij}^*)\, X_{1ij}'\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, (\hat\beta - \beta_0)\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}] \\
&\quad= \frac{1}{n(n-1)(n-2)}\sum_{i \ne j \ne k} h_n^{-1} k_h'(\chi_{ij})\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, X_{1ij}'\, \psi_{\beta k} + o_p(n^{-1/2}).
\end{aligned} \tag{51}
\]

Now, we consider the Hoeffding decomposition of (51). Note that its unconditional
expectation and its conditional expectation with respect to each of the first and the
second arguments, i and j, are zero. Therefore, we only need the following conditional
expectation with respect to the third argument k:

\[
\frac{h_n^{-1}}{n}\sum_{k=1}^{n} E\big[k_h'(\chi_{ij})\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, X_{1ij}'\big]\, \psi_{\beta k} + o_p(n^{-1/2}). \tag{52}
\]

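By applying a Taylor expansion to the expectation term in the summation in (52) and the
kernel properties ∫ u^p K′(u) du = 0 and ∫ u^{p+1} K′(u) du = −1 by Assumption 8, where
p is an even integer, we have

\[
\begin{aligned}
&E\big[k_h'(\chi_{ij})\, \mathbf{1}\{Y_{3i} = Y_{3j}\}\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, X_{1ij}'\big] \\
&\quad= E\big[k_h'(\chi_{ij})\, E\{[\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, X_{1ij}' \mid Y_{3i} = Y_{3j}, \chi_i, \chi_j\}\big] \\
&\quad= \int k_h'(\chi_{ij})\, D(Y_{3i} = Y_{3j}, \chi_i, \chi_j)\, dF(\chi_i)\, dF(\chi_j) \times \Pr[Y_{3i} = Y_{3j}],
\end{aligned} \tag{53}
\]

where

\[
D(Y_{3i} = Y_{3j}, \chi_i, \chi_j) = E\big[[\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}]\, X_{1ij}' \mid Y_{3i} = Y_{3j}, \chi_i, \chi_j\big].
\]

Then the integral term in (53) is

\[
\begin{aligned}
&\int K'(u)\, D(Y_{3i} = Y_{3j}, \chi_j + uh_n, \chi_j)\, f(\chi_j + uh_n)\, du\, dF(\chi_j) \\
&\quad= h_n \int K'(u)\, [D_1(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f(\chi_j) + D(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f'(\chi_j)]\, u\, du\, dF(\chi_j) + O(h_n),
\end{aligned} \tag{54}
\]

where D1 (Y3i = Y3j , χi , χj ) is the partial derivative of D with respect to χi .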


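Therefore, (46) has the following representation in the limit:

\[
\begin{aligned}
&\frac{1}{n(n-1)}\sum_{i \ne j} (\tilde\omega_{ij} - \omega_{ij})\, [\mathrm{sgn}(Y_{1ij})\, \mathrm{sgn}(\xi_{ij}) - \tau_{21}] \\
&\quad= -\frac{1}{n}\sum_{k=1}^{n} E[D_1(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f(\chi_j) + D(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f'(\chi_j)]\, \Pr[Y_{3i} = Y_{3j}]\, \psi_{\beta k} + o_p(n^{-1/2}).
\end{aligned} \tag{55}
\]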

3. Asymptotic representation for (47)

Let us consider (47). Consider the following P-degenerate U-process:

\[
\frac{1}{n(n-1)}\sum_{i \ne j} \tilde\omega_{ij}\, \mathrm{sgn}(Y_{1ij})\, \big[\mathrm{sgn}(\hat\xi_{ij}) - \mathrm{sgn}(\xi_{ij})\big]
- \frac{1}{n(n-1)}\sum_{i \ne j} \tilde\omega_{ij}\, \mathrm{sgn}(Y_{1ij})\, E\big[\mathrm{sgn}(\hat\xi_{ij}) - \mathrm{sgn}(\xi_{ij}) \mid X_{2i}, X_{2j}\big]. \tag{56}
\]

Observe that (56) evaluated at γ̂ = γ0 is zero. As a result, (56) is op (n^{−1} ) by
Theorem 3 in Sherman [1993]. Now (47) can be written as

\[
\frac{1}{n(n-1)}\sum_{i \ne j} \tilde\omega_{ij}\, \mathrm{sgn}(Y_{1ij})\, E\big[\mathrm{sgn}(\hat\xi_{ij}) - \mathrm{sgn}(\xi_{ij}) \mid X_{2i}, X_{2j}\big] + o_p(n^{-1}).
\]

Recall that we set the first component of γ to 1 as a normalization. Then γ can be
written as γ = (1, γ̃), where 1 is the first component of γ and γ̃ collects the other
components of γ. We have

\[
E\big[\mathrm{sgn}(\xi_{ij}) \mid \tilde X_{2i}, \tilde X_{2j}\big] = 2 F_{\xi | \tilde X_2}(\tilde X_{2ij}'\tilde\gamma) - 1,
\]

where Fξ|X̃2 is the conditional CDF of ξij given X̃2i and X̃2j . Because the linear index
ξi = X2i′γ is smooth in a neighborhood of γ0 given X̃2i and X̃2j by Assumption 5, the
conditional CDF of ξij is sufficiently smooth for a mean-value expansion:

\[
F_{\xi | \tilde X_2}(\hat\xi_{ij}) = F_{\xi | \tilde X_2}(\tilde X_{2ij}'\tilde\gamma_0) + f_{\xi | \tilde X_2}(\tilde X_{2ij}'\tilde\gamma_0)(\hat\gamma - \gamma_0) + O_p(\|\hat\gamma - \gamma_0\|^2).
\]

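Therefore, (47) can be represented as

\[
\begin{aligned}
&\frac{2}{n(n-1)}\sum_{i \ne j} \tilde\omega_{ij}\, \mathrm{sgn}(Y_{1ij})\, f_{\xi | \tilde X_2}(\xi_{ij})(\hat\gamma - \gamma_0) + O_p(\|\hat\gamma - \gamma_0\|^2) + o_p(n^{-1}) \\
&\quad= \frac{2}{n(n-1)(n-2)}\sum_{i \ne j \ne k} \tilde\omega_{ij}\, \mathrm{sgn}(Y_{1ij})\, f_{\xi | \tilde X_2}(\xi_{ij})\, \tilde X_{2ij}'\, \psi_{\gamma k} + o_p(n^{-1}).
\end{aligned} \tag{57}
\]

Now, we consider the Hoeffding decomposition of (57). Again, notice that its unconditional
expectation and its conditional expectation on the first and second arguments,
i and j, are zero. Therefore, we only need the following conditional expectation on
the third argument k:

\[
\frac{2}{n}\sum_{k=1}^{n} E\big[\tilde\omega_{ij}\, \mathrm{sgn}(Y_{1ij})\, f_{\xi | \tilde X_2}(\xi_{ij})\, \tilde X_{2ij}'\big]\, \psi_{\gamma k} + o_p(n^{-1}). \tag{58}
\]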

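By the consistency of γ̂, we can work with the true ωij instead of ω̃ij . Then the expectation
term in (58) is

\[
\begin{aligned}
&E\big[\omega_{ij}\, \mathrm{sgn}(Y_{1ij})\, f_{\xi | \tilde X_2}(\xi_{ij})\, \tilde X_{2ij}'\big] \\
&\quad= \int \frac{1}{h_n} K\Big(\frac{\chi_{ij}}{h_n}\Big)\, E\big[\mathrm{sgn}(Y_{1ij})\, f_{\xi | \tilde X_2}(\xi_{ij})\, \tilde X_{2ij}' \mid Y_{3i} = Y_{3j}, \chi_i, \chi_j\big]\, dF(\chi_i)\, dF(\chi_j) \times \Pr[Y_{3i} = Y_{3j}].
\end{aligned} \tag{59}
\]

The integral term in (59) is

\[
\begin{aligned}
&\int K(u)\, E\big[\mathrm{sgn}(Y_{1ij})\, f_{\xi | \tilde X_2}(\xi_{ij})\, \tilde X_{2ij}' \mid Y_{3i} = Y_{3j}, \chi_j + uh_n, \chi_j\big]\, f(\chi_j + uh_n)\, du\, dF(\chi_j) \\
&\quad= \int K(u)\, E\big[\mathrm{sgn}(Y_{1ij})\, f_{\xi | \tilde X_2}(\xi_{ij})\, \tilde X_{2ij}' \mid Y_{3i} = Y_{3j}, \chi_j, \chi_j\big]\, f(\chi_j)\, du\, dF(\chi_j) + O(h_n^2) \\
&\quad= E[T(Y_{3i} = Y_{3j}, \chi_j, \chi_j)] + O(h_n^2),
\end{aligned}
\]

where

\[
T(Y_{3i} = Y_{3j}, \chi_i, \chi_j) \equiv E\big[\mathrm{sgn}(Y_{1ij})\, f_{\xi | \tilde X_2}(\xi_{ij})\, \tilde X_{2ij}' \mid Y_{3i} = Y_{3j}, \chi_i, \chi_j\big].
\]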

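Therefore, (47) has the following representation in the limit:

\[
\frac{1}{n(n-1)}\sum_{i \ne j} \tilde\omega_{ij}\, \mathrm{sgn}(Y_{1ij})\, \big[\mathrm{sgn}(\hat\xi_{ij}) - \mathrm{sgn}(\xi_{ij})\big]
= \frac{2}{n}\sum_{k=1}^{n} E[T(Y_{3i} = Y_{3j}, \chi_j, \chi_j)]\, \Pr[Y_{3i} = Y_{3j}]\, \psi_{\gamma k} + o_p(n^{-1/2}). \tag{60}
\]

Taking all of our results (50), (55), and (60) into account, we obtain the linear
representation for τ̂21 − τ21 :

\[
\begin{aligned}
\hat\tau_{21} - \tau_{21}
= \frac{\Pr[Y_{3i} = Y_{3j}]}{E[f(\chi_i)]}\, \frac{1}{n}\Big[&\sum_{i=1}^{n} U(Y_{1i}, \xi_i) + \sum_{j=1}^{n} U(Y_{1j}, \xi_j) \\
&- \sum_{i=1}^{n} E[D_1(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f(\chi_j) + D(Y_{3i} = Y_{3j}, \chi_j, \chi_j)\, f'(\chi_j)]\, \psi_{\beta i} \\
&+ 2\sum_{i=1}^{n} E[T(Y_{3i} = Y_{3j}, \chi_j, \chi_j)]\, \psi_{\gamma i}\Big] + o_p(n^{-1/2}).
\end{aligned}
\]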

B.4 Proof of Corollary 1


τ32 is a function of the event E32 = {X2i′γ0 = X2j′γ0 }. Also, τ21 is a function of the
event E21 = {X1i′β0 = X1j′β0 , Y3i = Y3j }. By Assumption 5, the two events E32 and
E21 are measure-zero. Therefore, E32 and E21 are independent since Pr(E32 ∩ E21 ) =
0 and Pr(E32 ) Pr(E21 ) = 0.

References
Kees Jan van Garderen and Noud van Giersbergen. A nearly similar powerful test
for mediation. arXiv preprint arXiv:2012.11342, 2020.

Charles M Judd and David A Kenny. Process analysis: Estimating mediation in
treatment evaluations. Evaluation review, 5(5):602–619, 1981.

Reuben M Baron and David A Kenny. The moderator–mediator variable distinction
in social psychological research: Conceptual, strategic, and statistical considera-
tions. Journal of personality and social psychology, 51(6):1173, 1986.

James Heckman, Rodrigo Pinto, and Peter Savelyev. Understanding the mechanisms
through which an influential early childhood program boosted adult outcomes.
American Economic Review, 103(6):2052–2086, 2013.

James J Heckman and Rodrigo Pinto. Econometric mediation analyses: Identifying
the sources of treatment effects from experimentally estimated production tech-
nologies with unmeasured and mismeasured inputs. Econometric reviews, 34(1-2):
6–31, 2015.

Martin Huber. Mediation analysis. Handbook of Labor, Human Resources and Population
Economics, pages 1–38, 2020.

Viviana Celli. Causal mediation analysis in economics: Objectives, assumptions,
models. Journal of Economic Surveys, 36(1):214–234, 2022.

Michael E Sobel. Asymptotic confidence intervals for indirect effects in structural
equation models. Sociological methodology, 13:290–312, 1982.

James M Robins and Sander Greenland. Identifiability and exchangeability for direct
and indirect effects. Epidemiology, 3(2):143–155, 1992.

Judea Pearl. Direct and indirect effects. In Probabilistic and causal inference: the works
of Judea Pearl, pages 373–392. 2022.

Kosuke Imai, Luke Keele, and Dustin Tingley. A general approach to causal media-
tion analysis. Psychological methods, 15(4):309, 2010a.

Nattavudh Powdthavee, Warn N Lekfuangfu, and Mark Wooden. The marginal
income effect of education on happiness: Estimating the direct and indirect effects
of compulsory schooling on well-being in Australia. Working Paper, 2013.

Sung Jae Jun, Joris Pinkse, Haiqing Xu, and Neşe Yıldız. Multiple discrete endoge-
nous variables in weakly-separable triangular models. Econometrics, 4(1):7, 2016.

Markus Frölich and Martin Huber. Direct and indirect treatment effects–causal
chains and mediation analysis with instrumental variables. Journal of the Royal
Statistical Society, 79(5):1645–1666, 2017.

Christian Dippel, Robert Gold, Stephan Heblich, and Rodrigo Pinto. Mediation anal-
ysis in IV settings with a single instrument. Working Paper, 2020.

Yi-Ting Chen, Yu-Chin Hsu, and Hung-Jen Wang. A stochastic frontier model with
endogenous treatment status and mediator. Journal of business & economic statistics,
38(2):243–256, 2020.

Sara Geneletti. Identifying direct and indirect effects in a non-counterfactual frame-
work. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(2):
199–215, 2007.

Michael E Sobel. Identification of causal parameters in randomized studies with
mediating variables. Journal of Educational and Behavioral Statistics, 33(2):230–251,
2008.

Marshall M Joffe, Dylan Small, Thomas Ten Have, Steve Brunelli, and Harold I Feld-
man. Extended instrumental variables estimation for overall effects. The interna-
tional journal of biostatistics, 4(1), 2008.

Teppei Yamamoto. Identification and estimation of causal mediation effects with
treatment noncompliance. Unpublished manuscript, 2013.

Stacey H Chen, Yen-Chien Chen, and Jin-Tan Liu. The impact of family composition
on educational achievement. Journal of Human Resources, 54(1):122–170, 2019.

Stephen Burgess, Rhian M Daniel, Adam S Butterworth, Simon G Thompson, and
EPIC-InterAct Consortium. Network Mendelian randomization: using genetic
variants as instrumental variables to investigate mediation in causal pathways.
International journal of epidemiology, 44(2):484–495, 2015.

Min A Jhun. Epidemiologic approaches to understanding mechanisms of cardiovas-
cular diseases: Genes, environment, and DNA methylation. Dissertation, 2015.

Christian Dippel, Robert Gold, Stephan Heblich, and Rodrigo Pinto. The effect of
trade on workers and voters. The Economic Journal, 132(641):199–217, 2022.

Jason Abrevaya, Jerry A Hausman, and Shakeeb Khan. Testing for causal effects in
a generalized regression model with endogenous regressors. Econometrica, 78(6):
2043–2061, 2010.

Azeem Shaikh and Edward J Vytlacil. Threshold crossing models and bounds on
treatment effects: a nonparametric analysis, 2005.

Jay Bhattacharya, Azeem M Shaikh, and Edward Vytlacil. Treatment effect bounds
under monotonicity assumptions: an application to swan-ganz catheterization.
American Economic Review, 98(2):351–356, 2008.

Richard C Chiburis. Semiparametric bounds on treatment effects. Journal of Econo-
metrics, 159(2):267–275, 2010.

Azeem M Shaikh and Edward J Vytlacil. Partial identification in triangular systems
of equations with binary dependent variables. Econometrica, 79(3):949–955, 2011.

Cecilia Machado, Azeem M Shaikh, and Edward J Vytlacil. Instrumental variables
and the sign of the average treatment effect. Journal of Econometrics, 212(2):522–555,
2019.

Brendan Kline. Identification of the direction of a causal effect by instrumental vari-
ables. Journal of Business & Economic Statistics, 34(2):176–184, 2016.

Aaron K Han. Non-parametric analysis of a generalized regression model: the max-
imum rank correlation estimator. Journal of Econometrics, 35(2-3):303–316, 1987.

James L Powell, James H Stock, and Thomas M Stoker. Semiparametric estimation of
index coefficients. Econometrica: Journal of the Econometric Society, pages 1403–1430,
1989.

Hidehiko Ichimura. Semiparametric least squares (sls) and weighted sls estimation
of single-index models. Journal of econometrics, 58(1-2):71–120, 1993.

Roger W Klein and Richard H Spady. An efficient semiparametric estimator for
binary response models. Econometrica: Journal of the Econometric Society, pages
387–421, 1993.

Jason Abrevaya. Rank estimation of a generalized fixed-effects regression model.
Journal of Econometrics, 95(1):1–23, 2000.

Andrew Chesher. Identification in nonseparable models. Econometrica, 71(5):1405–
1441, 2003.

Andrew Chesher. Nonparametric identification under discrete variation. Economet-
rica, 73(5):1525–1550, 2005.

Guido W Imbens and Whitney K Newey. Identification and estimation of triangular
simultaneous equations models without additivity. Econometrica, 77(5):1481–1512,
2009.

Kristopher J Preacher, Derek D Rucker, and Andrew F Hayes. Addressing moderated
mediation hypotheses: Theory, methods, and prescriptions. Multivariate behavioral
research, 42(1):185–227, 2007.

Kosuke Imai, Luke Keele, and Teppei Yamamoto. Identification, inference and sen-
sitivity analysis for causal mediation effects. Statistical Science, 25(1):51–71, 2010b.

David P MacKinnon, Chondra M Lockwood, Jeanne M Hoffman, Stephen G West,
and Virgil Sheets. A comparison of methods to test mediation and other interven-
ing variable effects. Psychological methods, 7(1):83, 2002.

Christopher Cavanagh and Robert P Sherman. Rank estimators for monotonic index
models. Journal of Econometrics, 84(2):351–381, 1998.

Robert P Sherman. The limiting distribution of the maximum rank correlation esti-
mator. Econometrica: Journal of the Econometric Society, pages 123–137, 1993.

Ariel Pakes and David Pollard. Simulation and the asymptotics of optimization es-
timators. Econometrica: Journal of the Econometric Society, pages 1027–1057, 1989.

Robert P Sherman. Maximal inequalities for degenerate u-processes with applica-
tions to optimization estimators. The Annals of Statistics, 22(1):439–459, 1994a.

Robert P Sherman. U-processes in the analysis of a generalized semiparametric
regression estimator. Econometric Theory, 10(2):372–395, 1994b.

Jason Abrevaya. Computation of the maximum rank correlation estimator. Economics
Letters, 62(3):279–285, 1999.

Shakeeb Khan, Fu Ouyang, and Elie Tamer. Inference on semiparametric
multinomial response models. Quantitative Economics, 12(3):743–777, 2021.

Charles F Manski. Semiparametric analysis of discrete response: Asymptotic
properties of the maximum score estimator. Journal of Econometrics, 27(3):313–333, 1985.

Michael D Perlman and Lang Wu. The emperor’s new tests. Statistical Science, 14(4):
355–369, 1999.

Jeremy C Biesanz, Carl F Falk, and Victoria Savalei. Assessing mediational models:
Testing and interval estimation for indirect effects. Multivariate Behavioral Research,
45(4):661–701, 2010.

Saralees Nadarajah and Tibor K Pogány. On the distribution of the product of
correlated normal random variables. Comptes Rendus Mathematique, 354(2):201–204,
2016.

GFV Glonek. On the behaviour of Wald statistics for the disjunction of two regular
hypotheses. Journal of the Royal Statistical Society: Series B (Methodological), 55(3):
749–755, 1993.

Jean-Marie Dufour, Eric Renault, and Victoria Zinde-Walsh. Wald tests when
restrictions are locally singular. arXiv preprint arXiv:1312.0569, 2013.

Mathias Drton and Han Xiao. Wald tests of singular hypotheses. Bernoulli, pages
38–59, 2016.

Zhonghua Liu, Jincheng Shen, Richard Barfield, Joel Schwartz, Andrea A Baccarelli,
and Xihong Lin. Large-scale hypothesis testing for causal mediation effects with
applications in genome-wide epigenetic studies. Journal of the American Statistical
Association, 117(537):67–81, 2022.

Noud PA van Giersbergen et al. Inference about the indirect effect: a likelihood
approach. UvA-Econometrics Discussion Papers, 2014.

Shakeeb Khan, Xiaoying Lan, Elie Tamer, and Qingsong Yao. Estimating high
dimensional monotone index models by iterative convex optimization. arXiv preprint
arXiv:2110.04388, 2023.

Deborah Nolan and David Pollard. U-processes: rates of convergence. The Annals of
Statistics, pages 780–799, 1987.

Whitney K Newey and Daniel McFadden. Large sample estimation and hypothesis
testing. Handbook of Econometrics, 4:2111–2245, 1994.
