


Original Research Article

Statistical Methods in Medical Research, 2024, Vol. 0(0) 1–23
© The Author(s) 2024
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/09622802241281035
journals.sagepub.com/home/smm

LASSO-type instrumental variable selection methods with an application to Mendelian randomization

Muhammad Qasim1, Kristofer Månsson1 and Narayanaswamy Balakrishnan2

Abstract
Valid instrumental variables (IVs) must not directly impact the outcome variable and must also be uncorrelated with
nonmeasured variables. However, in practice, IVs are likely to be invalid. The existing methods can lead to large bias
relative to standard errors in situations with many weak and invalid instruments. In this paper, we derive a LASSO
procedure for the k-class IV estimation methods in the linear IV model. In addition, we propose the jackknife IV method
by using LASSO to address the problem of many weak invalid instruments in the case of heteroscedastic data. The
proposed methods are robust for estimating causal effects in the presence of many invalid and valid instruments, and we
provide theoretical guarantees of their performance. In addition, two-step numerical algorithms are developed for the estimation of
causal effects. The performance of the proposed estimators is demonstrated via Monte Carlo simulations as well as an
empirical application. We use Mendelian randomization as an application, wherein we estimate the causal effect of body
mass index on the health-related quality of life index using single nucleotide polymorphisms as instruments for body mass
index.

Keywords
Causal inference, instrumental variable, model selection, LASSO, jackknife, heteroscedasticity
JEL Classification: C13, C26, C36

1 Introduction
The instrumental variable (IV) technique is one of the most commonly used causal inference methods for analyzing obser-
vational and experimental studies with unmeasured confounders. This technique is based on three important assumptions.1
The first assumption is relevance, which requires that the exposure not be independent of the instrument. The second
assumption is exclusion, which requires the instrument’s impact on the outcome to be completely mediated by the expo-
sure. The final assumption is the independence of confounding factors (unmeasured variables). An example of IV analysis
in medical statistics is Mendelian randomization (MR), wherein genetic data are used as instruments to distinguish causation from correlation while analyzing the effects of modifiable risk factors (e.g. body mass index, blood pressure, and
alcohol intake) on health, social and economic outcomes. However, a difficult task in MR is identifying IVs that fulfill the
above-stated assumptions.2
One challenge regarding the relevance assumption is when instruments (e.g. genetic markers) are only weakly associated
with the outcome variable. Staiger and Stock3 derived the effects of weak instruments on the linear IV model, which led
to the development of a simple F-test for weak instruments introduced by Stock and Yogo.4 Seng and Li5 proposed a
model averaging method to address the issue of high-dimensional and weak instruments. Qasim et al.6 suggested weighted

1 Jönköping International Business School, Jönköping University, Jönköping, Sweden


2 Department of Mathematics and Statistics, McMaster University, Hamilton, ON, Canada

Corresponding author:
Kristofer Månsson, Jönköping International Business School, Jönköping University, Jönköping, Sweden.
Email: [email protected]

average K-class IV methods to address the issue of many weak instruments. However, these methods are developed under
the assumption that all the instruments are valid. A second challenge is potential heteroscedasticity, which can bias the
classical two-stage least squares (TSLS) estimator, as demonstrated by Angrist et al.7 A third challenge arises when some
available instruments are invalid, as they may directly affect the outcome of interest. If IVs are uncorrelated, this issue can
be addressed via methods from the meta-analysis literature. When all instruments are valid, the inverse-variance weighted
method can be employed, and if a majority of the instruments are valid, then the median estimator, as suggested by Bowden
et al.,8 can be used. Further enhancements to these estimators are described in Burgess et al.9 In recent work, Seng et al.10
used model averaging in the linear IV model to address the challenge of high dimensionality. This model averaging approach
uses different subsets of single nucleotide polymorphisms (SNPs) as instruments to predict exposure, followed by weighting
the submodel predictions via penalization methods.
With potentially correlated instruments and if no prior knowledge exists regarding the validity of the instruments, this
problem can instead be treated as a model selection problem. This approach is more informative since it also shows which
instruments are in fact invalid and have a direct effect on the outcome variable. Andrews11 introduced the moment selec-
tion criterion (MSC) for the IV model, which is estimated via the generalized method of moments. However, this method
becomes computationally infeasible when the number of instruments is large. For this reason, Kang et al.12 proposed a
LASSO-type procedure for TSLS, which is as computationally fast as ordinary least squares (OLS). Even without prior
knowledge of the instrument’s validity, this method can identify valid instruments and estimate the causal effect under the
weak condition that the proportion of invalid instruments is strictly less than 50% of the total instruments. Windmeijer
et al.13 further developed this method and introduced the adaptive LASSO (ALASSO) approach, which can be used when
invalid instruments are relatively strong. Lin et al.14 introduced a robust IV estimation method to overcome the issue of
many weak and invalid instruments via a surrogate sparsest penalty. Moreover, accurate causal inference without selecting
instruments, especially in the context of Mendelian randomization methods from the meta-analysis literature, has been con-
sidered. Notable examples are the median8 and mode15 estimators. Using the flexible variable selection approach that allows
for correlated instruments, we show that one can find robust estimators for both weak instruments and heteroscedasticity.
The first contribution of this paper is that it adds to this growing research field by addressing the issue of invalid instru-
ments under many weak instruments. According to Hernan and Robins16 and Davies et al.,2 in the presence of weak
instruments, even minor deviations from the exclusion assumption cause large bias in the estimated causal effect. Therefore,
this is a particularly important empirical situation to examine. By following Kang et al.,12 we derive a LASSO procedure
for the limited information maximum likelihood (LIML) estimator and FUL17 estimator. We primarily consider situations
with a single outcome and a single risk factor. Burgess et al.18 stated that the methods do not significantly differ in this
situation; the main difference is that LIML estimates parameters only from a single equation, whereas FUL uses a three-
stage least squares approach and estimates the model simultaneously as a system of equations. When LIML is used, not all
moments are defined, but FUL does not suffer from this, as mentioned by Hahn et al.19 A significant advantage of LIML
and FUL over TSLS is that the median of the distribution of the LIML estimator is close to being unbiased in the presence
of many weak instruments.18
The second contribution of the paper is the use of the jackknife technique to derive heteroskedasticity-robust versions of
the LASSO type of estimators for TSLS, LIML and FUL. Angrist et al.7 showed that the TSLS is biased in both situations
and suggested a jackknife approach that performs better. Furthermore, Hausman et al.20 showed that the LIML estimator
is biased and presented some conditions under which it is even inconsistent in the presence of many instruments and
heteroscedasticity. These authors then derived heteroskedasticity-robust versions of the LIML and FUL estimators (denoted
as HLIML and HFUL, respectively). In this paper, we derive the jackknife version of the sisVIVE12 estimator in the presence
of many invalid instruments; this estimator is robust to heteroscedasticity. We also derive jackknife versions of the LIML
and FUL estimators, which provide comparatively easy solutions to the problem of many invalid and valid instruments in
the case of heteroscedastic data. Additionally, for convenience, we created an R package for implementing the proposed
methods.1
We show in the Monte Carlo simulation study that the LIML and FUL estimators yield substantial improvements in
high-dimensional instrumental variable studies. These improvements are especially pronounced for many weak instru-
ments. Our simulation results also reveal substantial improvements in the bias and median square error (MSE) when the
jackknife approach is used for both heteroscedastic and homoscedastic data. Therefore, we recommend that researchers
and practitioners use the jackknife technique, especially in the presence of heteroscedasticity. As a real-life application, we
use all of the suggested estimators in an MR study in which we estimate the causal effect of body mass index (BMI) on the
health-related quality of life index (HRQLI) via SNPs as instruments for BMI. Owing to the presence of heteroscedasticity
and weak instruments, the jackknife IV method performs the best in this case and yields quite reasonable results.
The remainder of this paper is organized as follows. In Section 2, the model construction and notations used are dis-
cussed, and the valid and invalid instruments in the linear IV model are defined. The LASSO-type robust estimation method

is introduced, and its properties and theoretical performance are then discussed in Section 3. The simulation study and
empirical application are detailed in Sections 4 and 5, respectively. Finally, some concluding remarks are provided in
Section 6. All mathematical proofs are provided in Appendix Sections A–C of the supplementary materials.

2 Model construction
We define the causal model by following the lines of Kang et al.12 and Small.21 Suppose we have n observations (Yi , Xi , Z i. :
i = 1, … , n) that are independently and identically distributed, where Yi ∈ ℝ1 and Xi ∈ ℝ1 represent the observed outcome
and the exposure (endogenous) variable, respectively, and the variables Z i ∈ ℝL are the IVs. The model for the random
sample is given by
Yi = Xi 𝛽0 + Z Ti. 𝜹0 + ei , 𝔼(ei |Z i. ) = 0, (2.1)
where 𝛽0 and 𝜹0 are the true parameters, ei is an error term and 𝛽0 is the causal parameter of interest. We further assume
that 𝔼[e2i |Z i. ] = 𝜎e2 and let 𝜹0 = 𝜸 0 + 𝚪0 , where 𝜸 0 represents the direct effect of the IVs on the outcome and where 𝚪0
represents the association between the IVs and the confounders. By defining ψ̂ = (ZᵀZ)⁻¹ZᵀX such that X̂ = ℙ_Z X, with the ith element of X̂ being X̂ᵢ = Zᵢ.ᵀψ̂, we define

Xᵢ = Zᵢ.ᵀψ₀ + μᵢ, (2.2)

where ψ₀ = (𝔼[Zᵢ.Zᵢ.ᵀ])⁻¹𝔼[Zᵢ.Xᵢ] and μᵢ is an error term; therefore, 𝔼[Zᵢ.μᵢ] = 0. Both eᵢ and μᵢ are random errors; let ξᵢ = (eᵢ, μᵢ)ᵀ. The mean is 𝔼[ξᵢ] = 0, and the variance–covariance matrix is

𝔼[ξᵢξᵢᵀ] = [σe², σeμ; σμe, σμ²].

In addition, the assumption on the error terms under the settings of homoscedasticity and heteroscedasticity is discussed in Assumption 1.3.
Kang et al.12 emphasized the uniqueness of the solutions for parameters 𝛽0 and 𝜹0 and discussed necessary and sufficient
conditions for identifying 𝛽0 and 𝜹0 . If 𝜸 0 = 0, then there is no direct effect of instruments on the outcome, and similarly,
if 𝚪0 = 0, then there are no confounders because 𝜹0 = 0. The value of 𝜹0 encompasses the concept of valid and invalid
instruments. Therefore, the definition of valid and invalid instruments states that instrument j (j = 1, …, L) is valid when δ₀,ⱼ = 0 and invalid when δ₀,ⱼ ≠ 0. Assume that Z_IN is the set of invalid instruments, where IN = {j = 1, …, L : δ₀,ⱼ ≠ 0}, and 𝜹_IN ∈ ℝʳ is the coefficient vector of the invalid instruments. The definition of valid instruments corresponds to the formal definition of Holland22 and to a special case of the valid-instrument definition of Angrist et al.23 when L = 1. The theory of valid IVs can be perceived as a simplification of Holland's22 model when L > 1. Let r = 0, 1, …, L − 1 denote the number of invalid instruments, which lies below the upper bound U = r + 1, i.e. r < U. For any full-rank matrix Z ∈ ℝⁿˣᴸ, 𝕄_Z = I_n − ℙ_Z is the residual-forming matrix, where ℙ_Z = Z(ZᵀZ)⁻¹Zᵀ is the
projection matrix onto the column space of Z and I_n is the n × n identity matrix. The l_p-norm is denoted by ‖·‖_p, so that the l₀-norm ‖·‖₀ yields the number of nonzero components of a vector and the l_∞-norm ‖·‖_∞ yields the maximum absolute element of a vector; for example, ‖𝜹₀‖₀ represents the number of nonzero components in 𝜹₀. The vector 𝜹 is known as r-sparse if it contains r ≤ L nonzero elements. Let S ⊆ {1, 2, …, L} be any set and let Sᶜ denote the complement of set S. Furthermore, let supp(𝜹) = {j : δⱼ ≠ 0} denote the support of 𝜹. If 𝔸 ∈ ℝᵐˣⁿ and 𝔹 ∈ ℝᵐˣⁿ are two matrices, their inner product is defined as ⟨𝔸, 𝔹⟩ = tr(𝔸ᵀ𝔹) = Σᵢ₌₁ᵐ Σₖ₌₁ⁿ aᵢₖbᵢₖ.
The basic definitions of the restricted isometry (RI) property and restricted orthogonality constant (ROC) are given by
Khosravy et al.,24 Cai and Zhang25 and Cai et al.26 We use Definitions 2.1 and 2.2 below to analyze the performance of the
l1 -penalized k-class IV method. The RI property and ROC determine what subsets of cardinality q of columns of matrix
𝔸 are in an orthonormal structure. These conditions are common in the high-dimensional setting of the linear model.

Definition 2.1. A matrix 𝔸 has the RI property of order q if (1 − Δ_q)‖𝜹‖₂² ≤ ‖𝔸𝜹‖₂² ≤ (1 + Δ_q)‖𝜹‖₂² for all q-sparse vectors 𝜹, where Δ_q ∈ (0, 1). To simplify the notation, we define

Δ_q⁻(𝔸)‖𝜹‖₂² ≤ ‖𝔸𝜹‖₂² ≤ Δ_q⁺(𝔸)‖𝜹‖₂², ∀ ‖𝜹‖₀ ≤ q, (2.3)

where Δ_q⁺(𝔸) and Δ_q⁻(𝔸) are the upper and lower RI property constants of order q.

Definition 2.2. If q + q′ ≤ p, then the (q, q′)-ROC θ_{q,q′}(𝔸) is the smallest nonnegative number such that

|⟨𝔸𝜹, 𝔸𝜹′⟩| ≤ θ_{q,q′}(𝔸)‖𝜹‖₂‖𝜹′‖₂



for all 𝜹 and 𝜹′ , where 𝜹 and 𝜹′ are q-sparse and q′ -sparse vectors, respectively, and have nonoverlapping support.

3 l1 -Penalized instrumental variables estimation


It is important to first state the conditions on which the l1 -penalized IV estimation methods are based.

Assumption 1.

1. (Yᵢ, Xᵢ, Zᵢ. : i = 1, …, n) are independently and identically distributed;

2. 𝔼[Zᵢ.Zᵢ.ᵀ] is of full rank and positive definite;

3. (eᵢ, μᵢ)ᵀ | Zᵢ. ∼ N(0, Σ), where Σ = [σe², σeμ; σμe, σμ²];

4. ψ₀ = (𝔼[Zᵢ.Zᵢ.ᵀ])⁻¹𝔼(Zᵢ.Xᵢ), with all elements of ψ₀ nonzero, i.e. ψ₀,ⱼ ≠ 0 for all j = 1, …, L.

Assumption 1.1 is a basic assumption stating that the observations are i.i.d. Assumption 1.2 requires the usual identification assumption to be satisfied and the matrix Z to be of full rank. In Assumption 1.3, we first make a conditional homoscedasticity assumption on the errors given the instruments, and we assume that the elements of Σ are finite.27 We relax Assumption 1.3 in Section 3.4, where, following Hausman et al.,20 we propose methods that are robust when the errors are heteroscedastic, which is more common in practical applications. Assumption 1.4 indicates that the matrix Z is associated with the exposure variable X.
The oracle class of IV estimators is obtained when the invalid instrumental variables (Z_IN) are known; we then set ℚ_IN = [X  Z_IN]. Specifically, we consider estimators of the form

Θ̂_k = (β̂, 𝜹̂_IN)ᵀ = (ℚ_INᵀ(I_n − k𝕄_Z)ℚ_IN)⁻¹ℚ_INᵀ(I_n − k𝕄_Z)Y (3.1)

with different methods of estimating k. Eq. (3.1) encompasses all of the well-known k-class estimators. For example, the OLS and TSLS estimators are special cases when k = 0 and k = 1, respectively. In addition, Eq. (3.1) corresponds to the LIML estimator when k = κ̂_liml, where κ̂_liml is the smallest eigenvalue of the matrix [𝕎ᵀ𝕄_Z𝕎]⁻¹ᐟ²𝕎ᵀ𝕄_{Z_IN}𝕎[𝕎ᵀ𝕄_Z𝕎]⁻¹ᐟ², with 𝕎 = [Y X], and therefore depends only on observable data and not on unknown parameters.28 The modification of the LIML method known as FUL17 is also a k-class estimator with k = κ̂_ful = [κ̂_liml − C₀(1 − κ̂_liml)/n]/[1 − C₀(1 − κ̂_liml)/n] for a constant C₀. Note that κ̂_liml ≥ 1 since span(Z_IN) ⊂ span(Z), so 𝕎ᵀ𝕄_{Z_IN}𝕎 cannot be smaller than 𝕎ᵀ𝕄_Z𝕎 when the set of invalid instruments is known. The FUL estimator was developed because the LIML estimator does not have moments, since its distribution has heavy tails, leading to high dispersion in finite samples;19 the FUL modification ensures the existence of moments. LIML and FUL were developed as alternatives to the TSLS estimator since they are capable of handling weak instruments, many instruments and misspecification of the model.
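For illustration, the oracle k-class estimator in (3.1) can be computed in a few lines. This is a minimal sketch under our own naming (not code from the authors' R package), with a single exposure and the invalid instruments assumed known:

```python
import numpy as np

def k_class_estimate(Y, X, Z_in, Z, k):
    """Oracle k-class estimator (3.1): Q = [X, Z_in] and
    Theta_k = (Q'(I - k M_Z) Q)^{-1} Q'(I - k M_Z) Y, where
    M_Z = I - Z (Z'Z)^{-1} Z'. k = 0 gives OLS and k = 1 gives TSLS."""
    n = len(Y)
    Q = np.column_stack([X, Z_in])
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection onto col(Z)
    A = np.eye(n) - k * (np.eye(n) - P_Z)     # I - k * M_Z
    return np.linalg.solve(Q.T @ A @ Q, Q.T @ A @ Y)
```

Plugging κ̂_liml or κ̂_ful in for k yields the LIML and FUL members of the same family.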

3.1 Penalized k-class estimators


Here, we introduce the equivalent Lagrangian structure as an estimator of the causal effect, called the penalized k-class IV
(PKCIV) estimation method, as follows:

(β̂^(λ), 𝜹̂^(λ)) ∈ argmin_{β,𝜹} (1/2)‖(I_n − k𝕄_Z)(Y − Xβ − Z𝜹)‖₂² + λ‖𝜹‖₁ (3.2)

for λ ∈ ℝ_{>0}. The class of estimators in (3.2) is a modification of the popular LASSO29 method, wherein we consider Model (2.1) and apply l₁-penalization to the parameter 𝜹 with many valid and invalid instruments. The PKCIV method does not penalize β₀ because it is the main parameter of interest, and we do not wish to bias the estimation of the causal effect. The proposed estimator in (3.2) is a k-class invalid and valid IV estimator and can be seen as a generalization of Kang et al.'s12 estimator: if k = 1, (3.2) is the penalized TSLS (PTSLS) estimator. Similarly, (3.2) corresponds to the penalized LIML (PLIML) and penalized FUL (PFUL) estimators when k = κ̂_liml and k = κ̂_ful, respectively.
The choice of the tuning parameter 𝜆 affects the performance of the PKCIV estimator and affects the intensity of
the sparsity of the solution. Figure 1 shows the LASSO regularization path using the IV method to illustrate how the coefficient estimates of 𝜹 decrease to zero as λ increases.

Figure 1. LASSO instrumental variable regularization path.

Each curve corresponds to a variable. The axis above the plot indicates the number of instruments at the current value of λ. For λ → 0, few elements of 𝜹̂^(λ) will be zero, indicating that most instruments are estimated to be invalid instruments. On the other hand, for large values of λ, the penalty function λ‖𝜹‖₁ surpasses the sum of squares, which strongly penalizes the parameter 𝜹, and most instruments are estimated as valid instruments. Intermediate tuning parameter values yield a balance between these two extremes. An important aspect of the PKCIV estimator is choosing the tuning parameter λ.
Several different methods for selecting 𝜆 have been discussed in the literature. Selecting 𝜆 through cross-validation is
a very common data-driven approach that aims for optimal prediction accuracy. Various types of cross-validation exist,
such as K-fold and leave-one-out cross-validation. In this paper, we use 10-fold cross-validation, which is frequently used in
practice. We minimize the predictive error ‖Y − X𝛽 − Z𝜹‖2 while using 10-fold cross-validation, and the parameter of
interest is 𝛽0 .

3.2 Estimating the causal effect


We introduce a numerical optimization algorithm for estimating parameters 𝛽 and 𝜹. The solution of the numerical
algorithm is equivalent to the PKCIV estimator in (3.2). First, we rewrite (3.2) as

β̂^(λ), 𝜹̂^(λ) = argmin_{β,𝜹} (1/2)‖(ℙ_Z + (1 − k)𝕄_Z)(Y − Xβ − Z𝜹)‖₂² + λ‖𝜹‖₁.

Step I: We then obtain the estimator 𝜹̂^(λ) for a given λ > 0 as

𝜹̂^(λ) = argmin_𝜹 (1/2)‖Ỹ − Z̃𝜹‖₂² + λ‖𝜹‖₁,

where Ỹ = 𝕄_X̂ℙ_Z Y and Z̃ = 𝕄_X̂Z, and λ is estimated through cross-validation.

Step II: Given the estimator 𝜹̂^(λ), we obtain an estimator for β as

β̂^(λ) = (X̃ᵀY − X̂ᵀZ𝜹̂^(λ)) / (X̂ᵀX̂ + d(XᵀX − X̂ᵀX̂)),

where X̃ = X̂ + d(X − X̂) and d = (1 − k)². Note that in the selection stage, we use the LASSO procedure with a k-class estimator-based objective function. The tuning parameter λ is chosen through 10-fold cross-validation, wherein we minimize the predictive error for the PTSLS, PLIML and PFUL estimators. Each method in PKCIV provides both the estimated causal effect of exposure on the outcome and the set of invalid instruments for a specific λ. Finally, the algorithm gives a list of estimated results, which contains the estimates of 𝜹 and β and the set of invalid instruments for the best λ. This numerical algorithm is thus simple and as easy to compute as least squares. The theoretical properties of this two-step algorithm are discussed in Appendix A. The PLIML estimator can be computed by finding κ̂_liml and then using d = (1 − κ̂_liml)² in the estimation of the causal effect of exposure on the outcome. Let C₀ = 1, following Fuller,17 and κ̂_ful = [κ̂_liml − C₀(1 − κ̂_liml)/n]/[1 − C₀(1 − κ̂_liml)/n]. Then, substituting d = (1 − κ̂_ful)² in Step II yields the PFUL estimator of the causal parameter.
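The two-step algorithm above can be sketched as follows. This is a minimal NumPy-only illustration with our own function names and with λ held fixed for brevity (the paper selects λ by 10-fold cross-validation); it is not the authors' R implementation:

```python
import numpy as np

def soft_threshold_lasso(A, b, lam, iters=300):
    """Coordinate-descent LASSO for (1/2)||b - A t||_2^2 + lam * ||t||_1."""
    t = np.zeros(A.shape[1])
    col_ss = (A ** 2).sum(axis=0)             # column sums of squares
    for _ in range(iters):
        for j in range(A.shape[1]):
            r = b - A @ t + A[:, j] * t[j]    # partial residual excluding j
            z = A[:, j] @ r
            t[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_ss[j]
    return t

def pkciv(Y, X, Z, k, lam):
    """Two-step PKCIV sketch for a single exposure X.
    Step I: LASSO of Y_tilde on Z_tilde estimates delta (direct effects
    of the instruments); Step II: closed-form k-class estimate of beta."""
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    X_hat = P_Z @ X                                       # first-stage fit
    M_Xhat = np.eye(len(Y)) - np.outer(X_hat, X_hat) / (X_hat @ X_hat)
    delta_hat = soft_threshold_lasso(M_Xhat @ Z, M_Xhat @ (P_Z @ Y), lam)
    d = (1.0 - k) ** 2                                    # k = 1 -> PTSLS
    X_tilde = X_hat + d * (X - X_hat)
    beta_hat = (X_tilde @ Y - X_hat @ (Z @ delta_hat)) / (
        X_hat @ X_hat + d * (X @ X - X_hat @ X_hat))
    return beta_hat, delta_hat
```

With k = 1 this reproduces the PTSLS (sisVIVE-type) estimate; instruments whose entries of delta_hat are set to zero by the penalty are the ones estimated as valid.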

3.3 Theoretical performance of the PKCIV estimator


The minimization problem of the PKCIV method in (3.2) might have multiple minimizers, particularly for the causal parameter β₀, because ‖𝜹‖₁ is not strictly convex. In this case, the tuning parameter may need to be chosen carefully to ensure that the algorithm converges to the global minimum. The estimation error over all minimizers of (3.2), that is, |β̂^(λ) − β₀|, is analyzed in this section. Through the RI property and ROC, we characterize the performance of the PKCIV estimator in finite samples. Let X̂ = ℙ_Z X be the predicted value of X given Z and 𝕄_Z be the residual-forming matrix. The solution of (3.2) is unique when the elements of the matrix 𝕄_Z Z are drawn from a continuous distribution.30 The following theorem generalizes the result for PTSLS (β̂^(λ)) provided by Kang et al.,12 wherein we consider the general estimator that includes the k-class IV methods.

Theorem 3.1. Consider model (2.1) with X̂ = ℙ_Z X under Assumptions 1.1–1.4. Let 𝜹̂^(λ) and β̂^(λ) be the minimizers of (3.2) on the event {e ∈ ℝⁿ : ‖Zᵀ𝕄_X̂ℙ_Z e‖_∞ ≤ λ/3} for λ > 0. Then:

i. The estimator β̂^(λ) can be expressed as

β̂^(λ) = β₀ + [X̂ᵀℙ_Zℙ_X̂Z(𝜹₀ − 𝜹̂^(λ)) + (X̂ + d(X − X̂))ᵀe] / [‖X̂‖₂² + d(‖X‖₂² − ‖X̂‖₂²)]; (3.3)

ii. Suppose that the condition Δ₂ᵣ⁺(Z) < 2(Δ₂ᵣ⁻(Z) − Δ₂ᵣ⁺(ℙ_X̂Z)) holds by the definition of the RI constants. Then, β̂^(λ) satisfies

‖β̂^(λ) − β₀‖₂ ≤ [4λ(5Δ₂ᵣ⁺(ℙ_X̂Z))^{1/2} / (6(Δ₂ᵣ⁻(Z) − Δ₂ᵣ⁺(ℙ_X̂Z)) − 3Δ₂ᵣ⁺(Z))] · 1/(‖X̂‖₂² + d(‖X‖₂² − ‖X̂‖₂²)) + |(X̂ + d(X − X̂))ᵀe| / (‖X̂‖₂² + d(‖X‖₂² − ‖X̂‖₂²)). (3.4)

Proof. The first part of the theorem follows directly from the two-step algorithm for estimating the causal effect. To guarantee the performance of the proposed method, however, the second part of the theorem must be proven; its proof is presented in the Appendix.

Remark 1. The assumption Δ₂ᵣ⁺(Z) < 2(Δ₂ᵣ⁻(Z) − Δ₂ᵣ⁺(ℙ_X̂Z)) in part (ii) of Theorem 3.1 involves the RI property constants, which are difficult to estimate. In addition to the RI property, the mutual incoherence property (MIP) is a commonly used condition in the sparse recovery literature. The MIP condition is defined as

η = max_{i≠j} |⟨Z.ᵢ, Z.ⱼ⟩|, (3.5)

which establishes the maximum pairwise correlation of the columns of the instrument matrix Z, and the maximum strength of the individual instruments is measured as

ρ = max_j |X̂ᵀZ.ⱼ| / (‖X̂‖₂² + d(‖X‖₂² − ‖X̂‖₂²)). (3.6)

The performance of the PKCIV estimator is analyzed in terms of the MIP conditions in (3.5) and (3.6). We modify the bounds in (3.4) by following Corollary 2 in Kang et al.,12 wherein the number of invalid instruments r satisfies r < min(1/(12η), 1/(10ρ)). In addition, by rewriting the assumption 2(Δ₂ᵣ⁻(Z) − Δ₂ᵣ⁺(ℙ_X̂Z)) − Δ₂ᵣ⁺(Z) > 0 in terms of the two MIP constants η and ρ, under the condition r < min(1/(12η), 1/(10ρ)), the bound in (3.4) can be restated as

‖β̂^(λ) − β₀‖₂ ≤ [4λρ(10(r + 2r²η))^{1/2} / (3 − 3r(6η + 5ρ²))] · 1/(‖X̂‖₂² + d(‖X‖₂² − ‖X̂‖₂²)) + |(X̂ + d(X − X̂))ᵀe| / (‖X̂‖₂² + d(‖X‖₂² − ‖X̂‖₂²)), (3.7)

where 2(Δ₂ᵣ⁻(Z) − Δ₂ᵣ⁺(ℙ_X̂Z)) − Δ₂ᵣ⁺(Z) ≥ 1 − r(6η + 5ρ²) > 0 due to the upper and lower bounds of the RI property constants in terms of the MIP conditions, namely Δ₂ᵣ⁻(Z) ≥ 1 − η(2r − 1), Δ₂ᵣ⁺(Z) ≤ 1 + η(2r − 1), Δ₂ᵣ⁺(ℙ_X̂Z) ≤ 2rρ²Δ₂ᵣ⁺(Z), and Δ₂ᵣ⁻(ℙ_X̂Z) ≤ 2rρ²Δ₂ᵣ⁻(Z).
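Both MIP constants are directly computable from the data, which makes the condition r < min(1/(12η), 1/(10ρ)) easy to check in practice. A small sketch (helper name ours; we scale the columns of Z to unit norm before computing η, as the inner product in (3.5) implicitly assumes):

```python
import numpy as np

def mip_constants(X, Z, d=0.0):
    """eta of (3.5): maximum pairwise coherence of the unit-normalized
    columns of Z; rho of (3.6): maximum individual instrument strength
    relative to the k-class denominator (d = (1 - k)^2)."""
    Zn = Z / np.linalg.norm(Z, axis=0)            # unit-norm columns
    G = np.abs(Zn.T @ Zn)
    np.fill_diagonal(G, 0.0)
    eta = G.max()
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    X_hat = P_Z @ X
    den = X_hat @ X_hat + d * (X @ X - X_hat @ X_hat)
    rho = np.max(np.abs(X_hat @ Z)) / den
    return eta, rho
```

For nearly orthogonal instruments η is close to zero, so the bound (3.7) tolerates more invalid instruments.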

3.4 LASSO-type jackknife instrumental variable estimation


The LASSO procedure for IV estimation for some valid and invalid instruments was proposed by Kang et al.12 It is known
as the PTSLS estimator, which is a special form of the PKCIV estimator when k = 1. The PTSLS estimators of 𝜹 and 𝛽
can be computed in two parts. The PTSLS estimator of 𝜹, for a given 𝜆 > 0, from (3.2) is defined as

𝜹̂^(λ) = argmin_𝜹 (1/2)‖Ỹ − Z̃𝜹‖₂² + λ‖𝜹‖₁. (3.8)

The matrix Z̃ in (3.8) depends on X̂, which is estimated from the first-stage regression; thus, the bias of TSLS depends on 𝔼[X̂ᵀe]. For observation i,

𝔼[X̂ᵢeᵢ] = 𝔼[ψ̂ᵀZᵢ.eᵢ] = 𝔼[𝔼{(ψ₀ᵀZᵢ.eᵢ + μᵢZᵢ.ᵀ(ZᵀZ)⁻¹Zᵢ.eᵢ)|Z}] = (L/n)σeμ,

where σeμ measures the degree of endogeneity. The term (L/n)σeμ arises from the correlation of X̂ᵢ with eᵢ for observation i. In addition, this bias persists even if all the valid instruments are uncorrelated with eᵢ. This becomes a more serious problem in the presence of many or weak instruments, which increases the bias of the PTSLS estimator.7 Another issue with the
TSLS, as shown by Hausman et al.20 and Bekker,31 is that with many (weak) instruments, the TSLS is not consistent, even
under homoscedasticity. The LIML and FUL estimators are efficient with many weak instruments and under homoscedas-
ticity. However, these k-class IV methods are not robust when the data are heteroscedastic. This prompts us to introduce
a new class of LASSO-type jackknife IV estimators (LJIVE) that are robust to heteroscedasticity and many instruments, following Hausman et al.20 The leave-one-out procedure in IV regression can reduce bias by systematically excluding each observation, performing the estimation, and then aggregating the results. The penalized jackknife TSLS (PJTSLS), penalized jackknife LIML (PJLIML), and penalized jackknife FUL (PJFUL) estimators are all members of the LJIVE class.

Lemma 3.3.7 Let X₍₋ᵢ₎ be the (n − 1) × 1 vector given by X with the ith row removed and, similarly, Z₍₋ᵢ₎ the (n − 1) × L matrix given by Z with the ith row removed. Removing the ith row eliminates the dependence of the constructed instrument on the exposure variable for observation i, so that

𝔼[ψ̂₍₋ᵢ₎ᵀZᵢ.eᵢ] = 0.

The proof of Lemma 3.3 is provided in the appendix. We estimate the fitted value of the exposure via Lemma 3.3 such that X̂_jiv is the n × 1 vector whose ith element is Zᵢ.ᵀψ̂₍₋ᵢ₎, where ψ̂₍₋ᵢ₎ is defined in the proof of Lemma 3.3 in Appendix C.
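For computation, the n leave-one-out first-stage fits Zᵢ.ᵀψ̂₍₋ᵢ₎ need not be obtained by n separate regressions: the standard hat-matrix identity gives them all in one pass. A minimal sketch (the helper name is ours, not the paper's):

```python
import numpy as np

def jackknife_first_stage(X, Z):
    """Leave-one-out fitted exposure X_hat_jiv: the ith element equals
    Z_i' psi_hat_(-i), computed via the leverage identity
    Z_i' psi_hat_(-i) = (X_hat_i - P_ii * X_i) / (1 - P_ii)."""
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)    # hat (projection) matrix
    X_hat = P @ X
    h = np.diag(P)                           # leverages P_ii
    return (X_hat - h * X) / (1.0 - h)
```

This vector is what enters Ỹ and Z̃_jiv in the LJIVE objective below, in place of the ordinary first-stage fit X̂.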
Formally, the LJIVE for 𝜹 is obtained for a given λ > 0 as

𝜹̂_jiv^(λ) = argmin_𝜹 (1/2)‖Ỹ − Z̃_jiv𝜹‖₂² + λ‖𝜹‖₁, (3.9)

where Ỹ = 𝕄_{X̂_jiv}ℙ_Z Y and Z̃_jiv = 𝕄_{X̂_jiv}Z. The LJIVE for β using 𝜹̂_jiv^(λ) in (3.9) is defined as

β̂_jiv^(λ) = (X̃_jivᵀY − X̂_jivᵀZ𝜹̂_jiv^(λ)) / (X̂_jivᵀX̂_jiv + d_jiv(XᵀX − X̂_jivᵀX̂_jiv)), (3.10)

where X̃_jiv = X̂_jiv + d_jiv(X − X̂_jiv) and d_jiv = (1 − k_jiv)². PJTSLS (β̂_jiv^(λ)) is obtained with k_jiv = 1, PJLIML uses k_jiv = κ̌_liml, and PJFUL arises with k_jiv = κ̌_ful; β̂_jiv^(λ) with k_jiv = 0 can also be viewed as another estimator. For PJLIML, k_jiv = κ̌_liml is estimated,20 where κ̌_liml is the smallest eigenvalue of the matrix (𝕎ᵀ𝕎)⁻¹(𝕎ᵀℙ_Z𝕎 − Σᵢ₌₁ⁿ ℙᵢᵢ𝕎ᵢᵀ𝕎ᵢ), with 𝕎 = [Y X], and, for PJFUL, k_jiv = κ̌_ful = [κ̌_liml − (1 − κ̌_liml)/n]/[1 − (1 − κ̌_liml)/n]. The tuning parameter λ is chosen through 10-fold cross-validation, wherein we minimize the predictive error for the PJTSLS, PJLIML and PJFUL estimators. We display the solution path of the LASSO-based jackknife IV method in Figure 2 to visualize the impact of the penalty parameter λ on the estimated 𝜹̂_jiv^(λ). Tibshirani29 proposed the LASSO estimator for classical linear regression. The LASSO estimates are nonlinear and nondifferentiable functions of the outcome values, making accurate estimation of their standard errors difficult. As an alternative, Tibshirani29 suggested the use of bootstrapping to calculate the standard error. Bootstrap methods are commonly used in statistics and econometrics, as well as in Mendelian randomization (see, e.g. Refs. 32,33). Therefore, the standard errors and confidence intervals of the proposed methods and PTSLS can be estimated by bootstrapping.
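A pairs bootstrap for these standard errors can be sketched as follows. This is a generic illustration (function name ours), where `estimator` is any callable mapping (Y, X, Z) to a scalar β̂, such as one of the penalized estimators above:

```python
import numpy as np

def bootstrap_se(Y, X, Z, estimator, B=200, seed=0):
    """Pairs-bootstrap standard error: resample rows (Y_i, X_i, Z_i.)
    with replacement, re-estimate beta B times, report the SD."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    betas = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        betas[b] = estimator(Y[idx], X[idx], Z[idx])
    return betas.std(ddof=1)
```

Percentile confidence intervals follow from the empirical quantiles of the same bootstrap draws.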

Remark 2. The theoretical performance of the LJIVE can be generalized on the basis of Theorem 3.1 via the estimator β̂_jiv^(λ). When we remove the dependence of the constructed instruments on the exposure variable for observation i, we use ψ̂₍₋ᵢ₎ = (Z₍₋ᵢ₎ᵀZ₍₋ᵢ₎)⁻¹Z₍₋ᵢ₎ᵀX₍₋ᵢ₎ instead of ψ̂ = (ZᵀZ)⁻¹ZᵀX. This implies that X̃_jiv = X̂_jiv + d(X − X̂_jiv). We then replace X̃ with X̃_jiv in (3.7) to obtain the estimation error bound for the LJIVE, β̂_jiv^(λ), as

‖β̂_jiv^(λ) − β₀‖₂ ≤ [4λρ_jiv(10(r + 2r²η))^{1/2} / (3 − 3r(6η + 5ρ_jiv²))] · 1/(‖X̂_jiv‖₂² + d(‖X‖₂² − ‖X̂_jiv‖₂²)) + |(X̂_jiv + d(X − X̂_jiv))ᵀe| / (‖X̂_jiv‖₂² + d(‖X‖₂² − ‖X̂_jiv‖₂²)),

under Δ₂ᵣ⁺(Z) < 2(Δ₂ᵣ⁻(Z) − Δ₂ᵣ⁺(ℙ_{X̂_jiv}Z)), where ρ_jiv = max_j |X̃_jivᵀZ.ⱼ| / (‖X̂_jiv‖₂² + d(‖X‖₂² − ‖X̂_jiv‖₂²)).

4 Simulation study
We consider two experimental designs to examine the finite-sample behavior of the proposed estimators through Monte Carlo simulations. The objective of the Model I design is to assess the performance of the PLIML and PFUL estimators in the presence of numerous weak instruments and, subsequently, to compare their performance with that of PTSLS. The objective of the Model II design is to evaluate the performance of all estimators in the presence of heteroscedastic errors.

Model I: We begin with a model in which the first-stage regression model is linear, and the errors are homoscedastic in the
form:

Yi = Xi 𝛽0 + Z Ti. 𝜹0 + ei ,
Xi = Z Ti. 𝝍 0 + 𝜇i , (4.1)

where
$$
\begin{pmatrix} e_i \\ \mu_i \end{pmatrix} \overset{\text{i.i.d.}}{\sim} N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_e^2 & \sigma_{e\mu} \\ \sigma_{\mu e} & \sigma_\mu^2 \end{bmatrix} \right)
$$
Qasim et al. 9

Figure 2. LASSO jackknife instrumental variable regularization path.

with 𝜎e2 = 𝜎𝜇2 = 1, and instrumental variables Z i. drawn i.i.d. from the multivariate normal distribution, i.e. Z i. ∼ N(0, 𝚺z ),
where 𝚺z has unit diagonal elements and all off-diagonal elements equal to 𝜂, the
pairwise correlation between instruments. Three values of 𝜂, namely 𝜂 = 0.30, 0.60 and 0.75, are set to represent weak,
moderate and strong correlations between instruments. We set the parameters 𝛽0 = 1, 𝜓0j = 0.10, and 𝜹0 = (1_{0.3L} , 0_{0.7L} )T ,
so that the first 0.3L entries of 𝜹0 equal one. We change r by increasing the number of instruments (L) in ∥𝜹0 ∥0 = r, and the causal parameter 𝛽0 is the quantity of
interest. The degree of endogeneity is measured by 𝜎e𝜇 , and we set the values of 𝜎e𝜇 from 0.30 to 0.90, while 𝜎e𝜇 = 0
represents no endogeneity. We set the sample sizes to 200, 500 and 1000. We consider cases with different numbers of
instruments to assess the performance of the proposed estimators with many weak and invalid instruments. The total
number of instruments (L) is varied from 10% to 70% of the sample size in steps of 10%; for example, L ranges
from 20 to 140 when the sample size is n = 200. Increasing L from 0.5n to 0.7n corresponds to the high-dimensional
setting.
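The Model-I data-generating process above can be sketched in NumPy as follows. This is a minimal illustration under the stated choices; the helper name and the default argument values are ours.

```python
import numpy as np

def simulate_model1(n=200, L=20, eta=0.30, sigma_em=0.60, beta0=1.0, psi0=0.10, seed=0):
    """One Model-I sample: equicorrelated instruments (pairwise correlation eta),
    the first 0.3L instruments invalid (delta_j = 1), and endogenous errors."""
    rng = np.random.default_rng(seed)
    # Sigma_z: unit diagonal, eta on every off-diagonal element
    Sigma_z = np.full((L, L), eta) + (1.0 - eta) * np.eye(L)
    Z = rng.multivariate_normal(np.zeros(L), Sigma_z, size=n)
    # (e_i, mu_i) jointly normal with unit variances and covariance sigma_em
    Sigma_e = np.array([[1.0, sigma_em], [sigma_em, 1.0]])
    e, mu = rng.multivariate_normal(np.zeros(2), Sigma_e, size=n).T
    r = int(0.3 * L)                                  # number of invalid instruments
    delta0 = np.r_[np.ones(r), np.zeros(L - r)]       # direct effects of invalid IVs
    psi0_vec = np.full(L, psi0)
    X = Z @ psi0_vec + mu                             # first-stage equation
    Y = beta0 * X + Z @ delta0 + e                    # outcome with invalid-IV effects
    return Y, X, Z, delta0

Y, X, Z, delta0 = simulate_model1()
```

Varying `L`, `eta`, and `sigma_em` over the grids in the text reproduces the low- to high-dimensional designs of Model I.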

Model II: The data generation process of the second model is given by Yi = Xi 𝛽0 + Z Ti. 𝜹0 + ei and Xi = Z Ti. 𝝍 0 + 𝜇i , where
the true parameter values (𝛽0 , 𝜹0 ) remain the same as those in Model (4.1) and Z i. are drawn i.i.d. from N(0, IL ), where L ∈ {15, 30, 60} and the number of
invalid instruments r is set to 30% of L, rounded to the nearest whole number. We set 𝜗2 = 𝜎𝜇−2 (Z𝝍 0 )T (Z𝝍 0 ),
where 𝜗2 is intimately related to the concentration parameter (CP). We consider 𝜗2 = 8 and 𝜗2 = 64 to vary the strength of
the instruments.34 Both values of the CP represent weak instruments; the lower the CP, the weaker the
instruments. The value of 𝜓0j is selected on the basis of the parameter 𝜗2 .2 The CP measures the strength of the instru-
ments, and it equals the first-stage F statistic when all the instruments are valid.35 The parameter 𝜗2 grows at the same
rate as the sample size n, i.e. 𝜗2 approaches n𝜗20 for some 𝜗20 > 0. We set n to 200, 500, 1000 and 5000. For Model-II we
included 5000 observations to reflect the larger sample sizes usually available in modern MR analysis. Due to the high com-
putational cost, we used only sample sizes of 200 to 1000 for Model-I. The second model is similar to the first model, but the
errors are not homoscedastic. The errors are allowed to be heteroscedastic by following the design of Matsushita and Otsu.36
The disturbance terms ei and 𝜇i are generated as
$$
(e_i, \mu_i) = \left\{ \left( 1 + \phi \sum_{j=r+1}^{L} Z_{ij}^2 \right)^{1/2} \varepsilon_{1i},\; \sigma_{e\mu}\, e_i + \left(1 - \sigma_{e\mu}^2\right)^{1/2} \varepsilon_{2i} \right\},
$$
where 𝜀1i and 𝜀2i are drawn independently from the standard normal distribution, i.e. 𝜀1i , 𝜀2i i.i.d. ∼ N(0, 1), 𝜎e𝜇 ∈ {0.3, 0.6}, and 𝜙 = 0 and 𝜙 = 0.30
correspond to the homoscedastic and heteroscedastic error cases, respectively.36,37 We consider both heteroscedastic and homoscedastic errors to gain a broader view of the performance of the estimators. A total of 1000 Monte Carlo
replications are used for each experiment.
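A sketch of the Model-II generator, under our reading of the design above: 𝜓0j is scaled so that 𝜎𝜇−2 (Z𝝍0)T (Z𝝍0) is approximately the target 𝜗2 (one simple way to hit the target, ignoring the error-variance inflation under heteroscedasticity), and treating the first-stage error as 𝜎e𝜇 e_i plus independent noise is our assumption.

```python
import numpy as np

def simulate_model2(n=500, L=15, cp=8.0, sigma_em=0.30, phi=0.30, beta0=1.0, seed=0):
    """One Model-II sample with weak instruments (target concentration cp)
    and heteroscedastic structural errors when phi > 0."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, L))                       # Z_i. ~ N(0, I_L)
    r = int(np.ceil(0.3 * L))                         # 30% invalid: r = 5, 9, 18 for L = 15, 30, 60
    delta0 = np.r_[np.ones(r), np.zeros(L - r)]
    # E[(Z psi)'(Z psi)] = n * L * psi_j^2, so this choice targets the concentration cp
    psi0 = np.full(L, np.sqrt(cp / (n * L)))
    eps1, eps2 = rng.normal(size=(2, n))
    # structural error: variance grows with the squared valid instruments when phi > 0
    e = np.sqrt(1.0 + phi * (Z[:, r:] ** 2).sum(axis=1)) * eps1
    # first-stage error correlated with e (degree of endogeneity sigma_em)
    mu = sigma_em * e + np.sqrt(1.0 - sigma_em ** 2) * eps2
    X = Z @ psi0 + mu
    Y = beta0 * X + Z @ delta0 + e
    return Y, X, Z

Y, X, Z = simulate_model2()
```

Setting `phi=0` recovers the homoscedastic case, and `cp=64` the stronger-instrument design.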
10 Statistical Methods in Medical Research 0(0)

Figure 3. Relative median squared errors of PTSLS, PLIML and PFUL vs. percent of instruments × n when the sample size is 200
and (a) low endogeneity and low correlation exist between instruments, (b) low endogeneity and high correlation exist between
instruments, (c) high endogeneity and low correlation exist between instruments, and (d) high endogeneity and high correlation exist
between instruments.

4.1 Simulation results


Model I: We examine the PTSLS, PLIML and PFUL estimators for the first model in (4.1). We replicate the simulation
study of Kang et al.12 and propose robust estimators (PLIML and PFUL) to overcome the large bias relative to standard
errors when many weak valid and invalid instruments are present. The mean squared error is not a suitable comparison criterion in
this situation because LIML suffers from the "moments problem," and its high dispersion reflects the lack of finite moments; as a
result, we instead report the median squared error (MSE). Figures 3–5 depict the estimated results of the PKCIV estimators
(PTSLS, PLIML and PFUL) of 𝛽0 in terms of the relative median squared error2 and number of instruments for sample
sizes of n = 200, n = 500 and n = 1000. In each figure, we fix the sample size and increase the number of instruments to
observe the performances of the proposed estimators (PLIML and PFUL) and the PTSLS12 estimator with many weak and
invalid IVs. In addition, the numbers of invalid instruments (r) and valid instruments (L − r) increase with the total number
of instruments. This is true from low- to high-dimensional settings, where L = 0.1n to L = 0.7n, respectively. The PLIML
and PFUL estimators perform better as the number of valid and invalid weak instruments increases. The performances of
the PLIML and PFUL estimators are almost equivalent for many instruments; these results align with those of Hahn et al.19
However, neither FUL nor LIML dominates the other in practice. Figures 3–5(b) show that the median squared errors of
the PLIML and PFUL estimators are slightly greater than those of the PTSLS estimator when the number of instruments
is 10% of the sample size. Table 1 indicates the results of the rate of decrease (%) to examine the relative decrease in
median squared error due to sample size. As the sample size increases, the rate of decrease increases, and the performance
of the proposed estimators improves. Overall, these simulation results demonstrate that the proposed PLIML and PFUL
estimators perform better than PTSLS in the case of many instruments in terms of median squared errors.
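The comparison criteria used here and in the tables can be computed as follows; this is a small helper of our own for the median bias and median squared error over Monte Carlo draws, which remain finite even when an estimator, such as LIML, lacks moments.

```python
import numpy as np

def mc_summary(estimates, beta0=1.0):
    """Median bias and median squared error (MSE, in the paper's sense)
    across Monte Carlo replications; robust to occasional wild draws."""
    dev = np.asarray(estimates, dtype=float) - beta0
    return np.median(dev), np.median(dev ** 2)

# one wild replication (as LIML can produce) barely moves either summary
bias, mse = mc_summary([1.2, 0.9, 1.1, 5.0, 1.0])   # bias ~ 0.1, mse ~ 0.01
```

The relative median squared error plotted in Figures 3–5 is then a ratio of two such MSE values across estimators (the precise normalization follows the paper's definition).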
Model II: Tables 2a, 2b, 2c, 3a, 3b, and 3c present the simulation results in terms of median bias, MSE and average standard
errors for oracle-LIML (OLIML),3 naive-LIML (NLIML),4 oracle-FUL (OFUL), naive-FUL (NFUL), penalized k-class
IV estimators (PTSLS, PLIML, PFUL) and LASSO-type jackknife IV estimators (PJTSLS, PJLIML, PJFUL) for a range
of numbers of instruments L, the degree of endogeneity 𝜎e𝜇 , the sample size n, and the strength of the instruments 𝜗2 .
The standard errors for the penalized methods are calculated by bootstrapping with 500 resamples. The average standard

Figure 4. Relative median squared errors of PTSLS, PLIML and PFUL vs. percent of instruments × n when the sample size is 500
and (a) low endogeneity and low correlation exist between instruments, (b) low endogeneity and high correlation exist between
instruments, (c) high endogeneity and low correlation exist between instruments, and (d) high endogeneity and high correlation exist
between instruments.

Table 1. Rate of decrease (%) for sample size using the relative median squared error.

L (%) PTSLS PLIML PFUL PTSLS PLIML PFUL PTSLS PLIML PFUL PTSLS PLIML PFUL
𝜎e𝜇 = 0.30 and 𝜂 = 0.30 𝜎e𝜇 = 0.30 and 𝜂 = 0.60 𝜎e𝜇 = 0.60 and 𝜂 = 0.30 𝜎e𝜇 = 0.60 and 𝜂 = 0.60

Sample size 200 to 500


10 9.00 8.23 6.34 12.31 14.1 12.50 −3.54 −5.68 −4.01 5.95 5.67 5.04
20 12.85 9.97 9.29 17.76 16.28 16.39 2.17 0.18 0.73 7.41 8.98 7.71
30 16.14 14.03 13.92 13.49 16.47 14.50 6.02 3.52 4.24 6.99 5.98 5.39
40 17.96 14.33 14.11 18.83 16.67 14.38 6.47 5.50 4.94 5.94 6.18 5.62
50 16.66 13.04 13.07 13.83 14.1 11.57 8.61 4.66 4.52 6.91 5.33 5.01
60 17.32 9.54 13.68 17.83 11.86 14.67 6.80 4.11 5.00 7.91 6.24 7.87
70 15.73 9.30 11.22 16.25 11.57 13.83 6.94 3.74 4.11 8.46 4.68 4.97
Sample size 200 to 1000
10 20.15 17.99 17.24 24.78 25.40 24.32 1.17 −0.70 0.27 11.32 10.82 9.00
20 24.28 21.08 20.50 28.48 28.26 26.63 8.72 6.23 6.22 14.28 14.91 14.09
30 27.52 24.38 25.89 25.22 26.30 24.81 11.82 9.82 10.56 13.09 11.59 10.68
40 26.46 21.45 21.17 28.69 24.56 24.07 11.85 9.54 8.57 12.31 11.84 11.48
50 27.87 22.98 23.09 26.17 23.74 21.67 14.25 9.66 10.06 13.86 11.48 10.95
60 28.49 18.27 20.12 27.76 20.48 22.68 13.39 8.46 8.87 14.15 12.32 13.06
70 26.06 16.99 19.03 25.32 20.88 21.93 12.46 7.21 7.53 14.04 12.97 14.01

Note: PTSLS = “Penalized two-stage least square”12 ; proposed estimators: PLIML = “Penalized limited information maximum likelihood”; PFUL =
“Penalized FUL.”17

Figure 5. Relative median squared errors of PTSLS, PLIML and PFUL vs. percent of instruments × n when the sample size is 1000
and (a) low endogeneity and low correlation exist between instruments, (b) low endogeneity and high correlation exist between
instruments, (c) high endogeneity and low correlation exist between instruments, and (d) high endogeneity and high correlation exist
between instruments.

error performance criterion has been widely used in previous MR simulation studies, such as those by Burgess et al.38
Tables 2a, 2b, 2c, 3a, 3b, and 3c present the results when the errors are heteroscedastic and homoscedastic, respectively.
We estimate the causal effect for each experiment and the penalization parameter 𝜆 in the LASSO procedures selected by
10-fold cross-validation. The results of the OLIML and OFUL estimators are based on knowing which instruments are
invalid with supp(𝜹0 ), and the results of the NLIML and NFUL estimators are based on not knowing which instruments are
invalid. We expect NLIML and NFUL to perform poorly in the presence of invalid instruments.39 The PTSLS estimator
is taken from the sisVIVE routine in the literature.12 As discussed earlier, the PLIML and PFUL estimators are robust
and viable alternatives to PTSLS (sisVIVE) when there are many weak instruments. However, PLIML and PFUL can be
inconsistent under many instruments and heteroskedasticity. Therefore, we present the results of PJTSLS, PJLIML
and PJFUL proposed for reducing the bias caused by the endogeneity, weak instruments and heteroscedastic errors in the
IV model with invalid instruments.
The results in Table 2a when L = 15 and r = 5 show some interesting patterns. The PJTSLS estimator outperforms the
other LASSO procedures (PTSLS, PLIML, PFUL, PJLIML and PJFUL) in terms of bias and MSE. However, the PJLIML
and PJFUL estimators are more efficient, with estimates having lower mean standard errors than those of the other methods.
The performance of the estimators improves when the sample size is increased, excluding the NLIML and NFUL estimators,
because of the number of invalid instruments. In the presence of heteroscedasticity, the MSE of the estimators is greater
than that in the homoscedastic scenario. The bias, MSE and mean standard error values of the estimators decrease when
the parameter 𝜗2 is changed from 8 to 64. 𝜗2 = 8 represents the case in which the instruments are very weak, and the
proposed estimators are more robust in this situation. Note that the OLIML and OFUL methods do not perform well in the
presence of weak instruments and heteroscedasticity. This might be because the LIML and FUL methods are not consistent
in handling this situation.20 The PJLIML and PJFUL methods exhibit greater bias and MSE than PTSLS when 𝜎e𝜇 = 0.60
and 𝜗2 = 64. This is the case when the instruments are slightly strong; however, in this situation, the alternative choice
is PJTSLS, which is efficient. When L increases from 15 to 30 (Table 2b), PJLIML and PJFUL outperform in certain
cases, such as when n = 200, 𝜎e𝜇 = 0.60 and 𝜗2 = 64. Tables 2b and 2c present the estimation results for L = 30 and

L = 60, respectively. The bias, MSE and mean standard error increase for all IV methods when the number of instruments
is 30 or greater. However, in these situations, the use of LASSO-type jackknife IV estimators improves the estimation of
the causal effect in the MR. In addition, we observe that the PJTSLS outperforms all other estimators where the LASSO
procedure is used for the estimation of IVs when the errors are heteroscedastic.
In Tables 3a–3c, the values of bias, MSE and mean standard errors are lower than those in the heteroscedastic case.
Tables 3a–3c provide interesting findings for different cases. For example, when 𝜎e𝜇 = 0.30 and 𝜗2 = 8, the causal effect
estimates of PJLIML and PJFUL perform efficiently and have substantially lower bias, MSE and standard errors than those
of the other methods do. This is the benefit of the PJLIML and PJFUL methods under many (weak) instruments. On the
other hand, when the instruments are not very weak (𝜗2 = 64) and 𝜎e𝜇 = 0.30, PJTSLS seems to perform better than the
other methods do. When 𝜗2 = 8 and 𝜎e𝜇 = 0.30, OLIML and OFUL have higher MSEs. This is because both the LIML
and FUL estimators are inconsistent and exhibit greater dispersion, particularly for LIML, due to the “moments problem”
under conditions of many (weak) instruments and heteroskedasticity. However, even under homoscedasticity, the issue of
many weak instruments remains. With many (weak) instruments, the diagonal elements (ℙZ )ii of the projection matrix do not shrink to zero, causing inconsistency. When
𝜗2 = 64, the OLIML and OFUL estimators perform better than the other methods do, as expected. The performance of
PTSLS and PJTSLS is superior to that of other penalized methods when the instruments are slightly strong and the degree
of endogeneity is high (Tables 3a and 3b); when L = 60 (Table 3c), the bias, MSE and mean standard error of PJLIML
and PJFUL are lower than those of PTSLS. The median bias, MSE, and mean standard error values generally decrease as
n increases, but this is not the case for all estimators, and the pattern is not consistent. The parameter 𝜓0j varies with the
sample size and number of instruments and is not constant, as shown in Tables 2 and 3. However, in Model I, we fix the
value of 𝜓0j , and it can be seen in Table 1 that the MSE decreases when the sample size increases, and the performance of
the estimators improves.
The results of OLIML and OFUL achieve better performances than the naive estimators because the oracle estimators
accurately identify which instruments are valid and invalid. However, the naive estimators (NLIML and NFUL) assume that
all the instruments are valid, and consequently, they have higher bias, MSE and mean standard error values than the other
estimators do. Note that the proposed estimators do not use the information that one knows accurately which instruments are
valid, whereas the TSLS, LIML and FUL estimators do. Examining the FUL- and LIML-type estimators reveals that FUL
is less dispersed than LIML. The proposed estimators perform similarly to the oracle estimators and sometimes perform even
better. The LASSO-type jackknife IV estimators outperform the PTSLS estimator. In summary, these simulation results
indicate that the PTSLS performs worse when the instruments are weak and the errors are heteroscedastic, so PJLIML and
PJFUL may be helpful methods when many instruments are used. Moreover, PJTSLS performs well relative to all other
estimators.

5 Analysis of body mass index, health-related quality of life and genetic markers
We conducted an MR study in which we estimated the causal effect of BMI on the HRQLI using
SNPs as instruments for BMI. The HRQLI is estimated via the Health Utility Index Mark 3 developed by Horsman et al.,40
which is a summary measure of several health attributes, such as vision, hearing and cognitive skills. A health utility score
of 1 indicates “perfect health,” and a value of 0 represents a “dead” state. The health utility score can be negative, which
represents a state “worse than death.”41,42 We use data from the Wisconsin Longitudinal Study (WLS),5 which includes
American high school graduates from Wisconsin who have been tracked since 1957. According to the information provided
by the WLS, genetic variants can explain different dimensions of the HRQLI (e.g. cognitive skills). Our analysis is limited to
1816 individuals who were genotyped in 2004. We remove individuals with more than 10% missing genotype data. We use
10 genetic variants (SNPs) as potential IVs that have been used in previous research either to explain various dimensions of
HRQLI or as instruments explaining BMI. The SNPs used as potential instruments (APOE, CHRM2, GABBR2, 5-HTR2A,
ADIPOQ, DISC1, CYP11A1, BDNF, HFE and DRD2), along with the respective references for each SNP, are summarized
in Table 4. In addition, the diseases/behavior associated with them as identified by WLS are also presented in Table 4. IVs
may be invalid for various reasons, such as linkage disequilibrium, population stratification, and horizontal pleiotropy.13,53
The R code for the analysis of BMI, HRQLI and genetic variants is provided in the supplementary material.6
The parameter of interest for estimating the causal effect of BMI on the HRQLI is 𝛽0 in Model (2.1). The results
of the estimated causal effect (𝛽̂ ), standard errors, 95% confidence intervals and number of invalid IVs from the causal
regression model using SNPs are given in Table 5. If we treat all instruments as valid, then the causal effects for the TSLS
(0.006769 ± 0.020022), LIML (1.041803 ± 4.260779), and FUL (0.052532 ± 0.069872) estimators are positive, which
is not expected. This is because these methods are not robust in the presence of invalid instruments. LIML has a higher
standard error than other methods because it suffers from a “moments problem,” as noted by Hahn et al.19 MR analysis
assumes homoscedasticity. In practice, this assumption is often not fulfilled, leading to heteroscedasticity. Additionally,

Table 2a. Estimation results of the estimators for L = 15 and r = 5 with heteroscedastic errors.

n = 200 n = 500 n = 1000 n = 5000


Estimators Bias MSE SE Bias MSE SE Bias MSE SE Bias MSE SE

𝜎𝜇e = 0.30
𝜗2 =8
OLIML 0.7340 0.5388 28.908 0.7405 0.5483 20.033 0.6888 0.4745 5.1437 0.8116 0.6587 58.142
NLIML 20.611 424.81 420.72 31.456 989.46 962.48 44.306 1963.0 1511.7 101.65 10332 4629.0
OFUL 0.5688 0.3236 0.6942 0.5985 0.3582 0.7159 0.5590 0.3125 0.7127 0.6198 0.3841 0.9412
NFUL 12.756 162.73 7.2819 20.095 403.79 10.919 28.273 799.35 15.620 65.525 4293.5 29.454
PTSLS 0.8313 0.6910 0.6360 0.7855 0.6170 0.8328 0.7478 0.5592 1.0125 0.7797 0.6079 3.6787
PLIML 0.4367 0.1907 0.1634 0.4012 0.1609 0.0913 0.3754 0.1410 0.0604 0.3687 0.1359 0.0251
PFUL 0.4366 0.1907 0.1625 0.4012 0.1610 0.0913 0.3743 0.1401 0.0603 0.3687 0.1360 0.0255
PJTSLS 0.3967 0.1573 0.4436 0.3868 0.1496 0.4483 0.3742 0.1400 0.4556 0.3240 0.1050 0.6356
PJLIML 0.4056 0.1646 0.1195 0.3925 0.1540 0.0709 0.3704 0.1372 0.0478 0.3682 0.1356 0.0215
PJFUL 0.4059 0.1648 0.1191 0.3911 0.1529 0.0708 0.3709 0.1375 0.0477 0.3681 0.1355 0.0215
𝜗2 = 64
OLIML 0.2136 0.0456 0.2315 0.2124 0.0451 0.2297 0.2115 0.0447 0.2351 0.2155 0.0465 0.3251
NLIML 7.1232 50.740 2.6997 11.334 128.45 3.5642 15.984 255.50 4.2924 34.666 1201.7 10.402
OFUL 0.2106 0.0443 0.2268 0.2079 0.0432 0.2250 0.2085 0.0435 0.2300 0.2097 0.0440 0.3119
NFUL 6.7773 45.932 2.1058 10.783 116.28 3.0155 15.162 229.90 3.7798 33.002 1089.1 7.8571
PTSLS 0.5964 0.3557 0.3906 0.5810 0.3376 0.4541 0.5693 0.3241 0.6262 0.5242 0.2748 1.1634
PLIML 0.4669 0.2180 0.1930 0.4176 0.1744 0.1014 0.3850 0.1482 0.0734 0.3704 0.1372 0.0264
PFUL 0.4683 0.2193 0.1916 0.4171 0.1739 0.1012 0.3867 0.1496 0.0733 0.3706 0.1374 0.0263
PJTSLS 0.3682 0.1355 0.2318 0.3652 0.1333 0.2396 0.3590 0.1289 0.2476 0.3026 0.0916 0.2298
PJLIML 0.4126 0.1702 0.1312 0.4036 0.1629 0.0738 0.3808 0.1450 0.0496 0.3699 0.1369 0.0221
PJFUL 0.4178 0.1746 0.1309 0.4040 0.1632 0.0736 0.3801 0.1445 0.0495 0.3698 0.1368 0.0221

𝜎𝜇e = 0.60
𝜗2 =8
OLIML 0.6836 0.4673 15.864 0.6761 0.4572 17.651 0.6360 0.4045 10.821 0.6391 0.4085 12.112
NLIML 20.282 411.36 693.37 31.322 981.08 428.43 45.221 2045.0 921.29 91.826 8432.1 1456.8
OFUL 0.4786 0.2291 0.5509 0.4757 0.2263 0.5939 0.4871 0.2372 0.5424 0.4926 0.2426 0.6636
NFUL 12.025 144.61 6.7822 18.742 351.27 10.349 26.360 694.84 14.310 57.545 3311.5 29.625
PTSLS 0.9737 0.9481 0.4810 0.9451 0.8932 0.6228 0.9519 0.9061 0.7168 0.9769 0.9544 2.8269
PLIML 0.8076 0.6523 0.1221 0.7891 0.6226 0.0703 0.8636 0.7457 0.0455 0.8620 0.7430 0.0204
PFUL 0.8071 0.6514 0.1215 0.7887 0.6221 0.0702 0.8626 0.7441 0.0454 0.8618 0.7428 0.0203
PJTSLS 0.5242 0.2748 0.4399 0.5105 0.2606 0.4547 0.5132 0.2634 0.4467 0.4641 0.2154 0.7887
PJLIML 0.7806 0.6093 0.0888 0.7801 0.6085 0.0549 0.8612 0.7416 0.0379 0.8614 0.7420 0.0168
PJFUL 0.7825 0.6123 0.0886 0.7795 0.6077 0.0548 0.8611 0.7416 0.0379 0.8613 0.7418 0.0169
𝜗2 = 64
OLIML 0.2073 0.0430 0.2538 0.2020 0.0408 0.2464 0.1843 0.0340 0.2773 0.2086 0.0435 0.4208
NLIML 8.7991 77.423 50.054 12.303 151.37 5.4473 16.880 284.92 7.5591 36.068 1300.9 240.52
OFUL 0.2004 0.0401 0.2374 0.1983 0.0393 0.2296 0.1824 0.0333 0.2541 0.2038 0.0415 0.3585
NFUL 7.9965 63.944 3.4328 11.525 132.82 3.9451 15.7973 249.55 5.0733 33.817 1143.6 8.7772
PTSLS 0.6690 0.4476 0.3561 0.6541 0.4279 0.4717 0.6672 0.4452 0.7024 0.6319 0.3993 2.1266
PLIML 0.7147 0.5108 0.1601 0.7503 0.5629 0.0896 0.8447 0.7136 0.0634 0.8584 0.7369 0.0252
PFUL 0.7144 0.5103 0.1597 0.7520 0.5655 0.0894 0.8458 0.7154 0.0633 0.8585 0.7370 0.0254
PJTSLS 0.4722 0.2229 0.2134 0.4612 0.2127 0.2042 0.4784 0.2289 0.2287 0.4286 0.1837 0.2421
PJLIML 0.6746 0.4551 0.0958 0.7393 0.5465 0.0559 0.8420 0.7089 0.0393 0.8572 0.7348 0.0172
PJFUL 0.6771 0.4585 0.0955 0.7406 0.5486 0.0559 0.8415 0.7082 0.0393 0.8572 0.7347 0.0172

Note: OLIML = “oracle-limited information maximum likelihood (LIML)”; NLIML = “naive-LIML”; OFUL = “oracle-FUL17 ”; NFUL = “naive-FUL”;
PTSLS = “Penalized two-stage least square”12 ; proposed estimators: PLIML = “Penalized LIML”; PFUL = “Penalized FUL”; PJTSLS = “Penalized jackknife
two-stage least square”; PJLIML = “Penalized jackknife-LIML”; PJFUL = “Penalized jackknife-FUL.” We report the median bias, median squared error
(MSE) and average standard error (SE). The SEs of PTSLS, PLIML, PFUL, PJTSLS, PJLIML, and PJFUL are obtained by bootstrapping.

Table 2b. Estimation results of the estimators for L = 30 and r = 9 with heteroscedastic errors.

n = 200 n = 500 n = 1000 n = 5000


Estimators Bias MSE SE Bias MSE SE Bias MSE SE Bias MSE SE

𝜎𝜇e = 0.30
𝜗2 =8
OLIML 1.0657 1.1357 29.378 1.0373 1.0760 5.5995 1.1804 1.3940 102.12 1.0134 1.0269 21.336
NLIML 26.932 725.32 1313.2 45.216 2044.5 1023.7 59.681 3561.8 1483.5 134.10 17982 2562.8
OFUL 0.8484 0.7198 0.8153 0.7933 0.6293 0.9186 0.9289 0.8629 0.9093 0.7950 0.6320 1.3244
NFUL 16.807 282.46 10.038 27.524 757.57 14.813 37.084 1375.2 22.314 84.658 7166.9 43.856
PTSLS 0.9382 0.8803 0.4073 0.9293 0.8637 0.4410 0.9573 0.9164 0.4322 0.9165 0.8401 0.5669
PLIML 0.6432 0.4137 0.2081 0.5735 0.3289 0.1044 0.5559 0.3090 0.0656 0.5401 0.2917 0.0262
PFUL 0.6462 0.4176 0.2072 0.5752 0.3308 0.1041 0.5560 0.3092 0.0655 0.5402 0.2918
PJTSLS 0.3902 0.1523 0.3749 0.3595 0.1292 0.3971 0.3926 0.1541 0.3948 0.3825 0.1463 0.6438
PJLIML 0.5801 0.3365 0.1553 0.5611 0.3149 0.0930 0.5518 0.3044 0.0624 0.5399 0.2915 0.0261
PJFUL 0.5805 0.3370 0.1550 0.5607 0.3144 0.0931 0.5518 0.3045 0.0624 0.5399 0.2915 0.0261
𝜗2 = 64
OLIML 0.2971 0.0882 0.3000 0.2711 0.0735 0.2860 0.2590 0.0671 0.2908 0.2714 0.0737 0.4922
NLIML 10.296 106.00 4.8698 16.099 259.17 6.2305 22.444 503.74 7.0819 49.915 2491.5 16.011
OFUL 0.2898 0.0840 0.2903 0.2686 0.0721 0.2780 0.2499 0.0624 0.2837 0.2702 0.0730 0.4608
NFUL 9.6628 93.370 3.2726 15.185 230.58 4.7904 21.185 448.82 6.0105 47.083 2216.8 11.782
PTSLS 0.7797 0.6080 0.3264 0.7809 0.6099 0.3256 0.7821 0.6117 0.3637 0.7355 0.5409 0.5350
PLIML 0.6633 0.4400 0.2250 0.5874 0.3450 0.1130 0.5679 0.3225 0.0726 0.5403 0.2919 0.0277
PFUL 0.6659 0.4434 0.2246 0.5895 0.3475 0.1127 0.5671 0.3216 0.0725 0.5400 0.2916 0.0277
PJTSLS 0.3208 0.1029 0.2489 0.4256 0.1811 0.2456 0.4104 0.1684 0.2568 0.3484 0.1214 0.2595
PJLIML 0.5414 0.2932 0.1585 0.5614 0.3152 0.0944 0.5610 0.3147 0.0644 0.5397 0.2913 0.0277
PJFUL 0.5450 0.2970 0.1577 0.5595 0.3130 0.0943 0.5607 0.3144 0.0644 0.5396 0.2911 0.0276

𝜎𝜇e = 0.60
𝜗2 =8
OLIML 0.9229 0.8517 21.410 0.9494 0.9014 21.932 0.9962 0.9924 21.205 0.8760 0.7674 32.850
NLIML 29.745 884.77 328.80 47.804 2285.3 949.75 61.985 3842.1 1059.8 137.6 18921 2771.1
OFUL 0.6827 0.4661 0.5818 0.7156 0.5120 0.9403 0.7115 0.5062 0.8807 0.6242 0.3896 0.8332
NFUL 14.940 223.22 8.0492 23.498 552.17 14.429 30.905 955.13 17.739 71.287 5081.8 39.091
PTSLS 1.1169 1.2474 0.2376 1.1229 1.2610 0.3670 1.1413 1.3027 0.3904 1.1228 1.2608 0.2941
PLIML 1.0147 1.0297 0.1204 0.9931 0.9862 0.0727 1.0341 1.0694 0.0418 1.0328 1.0666 0.0183
PFUL 1.0132 1.0267 0.1201 0.9923 0.9847 0.0727 1.0340 1.0692 0.0420 1.0328 1.0666 0.0183
PJTSLS 0.4644 0.2156 0.3619 0.5384 0.2899 0.7924 0.5649 0.3191 0.7927 0.4526 0.2048 0.7395
PJLIML 0.9693 0.9396 0.0981 0.9806 0.9616 0.0713 1.0304 1.0618 0.0411 1.0327 1.0665 0.0182
PJFUL 0.9754 0.9514 0.0981 0.9818 0.9638 0.0711 1.0312 1.0634 0.0411 1.0327 1.0664 0.0182
𝜗2 = 64
OLIML 0.2583 0.0667 0.3674 0.2760 0.0762 0.6012 0.2830 0.0801 0.7474 0.2437 0.0594 0.7249
NLIML 12.715 161.67 40.977 18.000 324.01 16.607 24.471 598.83 49.728 49.801 2480.1 24.572
OFUL 0.2590 0.0671 0.3213 0.2702 0.0730 0.4851 0.2612 0.0682 0.4400 0.2348 0.0552 0.4709
NFUL 11.086 122.91 5.0926 16.323 266.46 4.2491 22.053 486.32 5.5862 45.612 2080.5 11.663
PTSLS 0.8463 0.7162 0.2453 0.8767 0.7686 0.3353 0.8692 0.7555 0.3076 0.8604 0.7403 0.3339
PLIML 0.9084 0.8252 0.1519 0.9571 0.9160 0.0746 1.0128 1.0257 0.0412 1.0285 1.0577 0.0162
PFUL 0.9068 0.8223 0.1508 0.9576 0.9170 0.0756 1.0125 1.0251 0.0415 1.0284 1.0576 0.0162
PJTSLS 0.3469 0.1203 0.2283 0.5366 0.2879 0.2844 0.5294 0.2803 0.2832 0.4685 0.2195 0.2366
PJLIML 0.8288 0.6868 0.1013 0.9415 0.8865 0.0685 1.0085 1.0172 0.0379 1.0281 1.0569 0.0161
PJFUL 0.8323 0.6927 0.1014 0.9425 0.8884 0.0682 1.0089 1.0179 0.0380 1.0280 1.0568 0.0161

Note: OLIML = “oracle-limited information maximum likelihood (LIML)”; NLIML = “naive-LIML”; OFUL = “oracle-FUL17 ”; NFUL = “naive-FUL”;
PTSLS = “Penalized two-stage least square”12 ; proposed estimators: PLIML = “Penalized LIML”; PFUL = “Penalized FUL”; PJTSLS = “Penalized jackknife
two-stage least square”; PJLIML = “Penalized jackknife-LIML”; PJFUL = “Penalized jackknife-FUL”. We report the median bias, median squared error
(MSE) and average standard error (SE). The SEs of PTSLS, PLIML, PFUL, PJTSLS, PJLIML, and PJFUL are obtained by bootstrapping.

Table 2c. Estimation results of the estimators for L = 60 and r = 18 with heteroscedastic errors.

n = 200 n = 500 n = 1000 n = 5000


Estimators Bias MSE SE Bias MSE SE Bias MSE SE Bias MSE SE

𝜎𝜇e = 0.30
𝜗2 =8
OLIML 1.6016 2.5650 33.191 1.5904 2.5293 56.544 1.6537 2.7349 30.563 1.5317 2.3462 86.343
NLIML 40.025 1602.0 7483.1 60.407 3649.0 905.24 94.479 8926.3 15105 204.53 41831 8865.1
OFUL 1.3027 1.6971 1.1471 1.2720 1.6181 1.8644 1.2709 1.6152 1.8996 1.2555 1.5763 1.9022
NFUL 23.481 551.36 13.090 36.699 1346.8 17.491 53.874 2902.4 28.263 119.60 14303 60.579
PTSLS 1.1421 1.3043 0.3571 1.2088 1.4612 0.4339 1.1716 1.3727 0.4248 1.1980 1.4352 0.4407
PLIML 0.9762 0.9530 0.2991 0.8969 0.8044 0.1304 0.8613 0.7419 0.0836 0.8338 0.6952 0.0359
PFUL 0.9752 0.9511 0.2989 0.8972 0.8050 0.1305 0.8612 0.7417 0.0836 0.8339 0.6953 0.0359
PJTSLS 0.7287 0.5311 0.3083 0.4104 0.1684 0.5862 0.3954 0.1563 0.6243 0.3938 0.1551 0.6229
PJLIML 0.7982 0.6371 0.2007 0.8564 0.7334 0.1214 0.8513 0.7247 0.0816 0.8331 0.6941 0.0358
PJFUL 0.8045 0.6472 0.2004 0.8582 0.7365 0.1217 0.8520 0.7258 0.0817 0.8330 0.6939 0.0358
𝜗2 = 64
OLIML 0.4873 0.2374 0.3910 0.4340 0.1883 0.8118 0.4319 0.1866 1.8343 0.4221 0.1782 0.9424
NLIML 15.229 231.91 16.446 22.927 525.64 11.967 32.974 1087.3 11.663 69.909 4887.3 28.533
OFUL 0.4769 0.2275 0.3748 0.4177 0.1745 0.6988 0.4179 0.1746 0.7342 0.4059 0.1648 0.7547
NFUL 14.138 199.89 5.1201 21.502 462.35 5.5510 30.875 953.27 7.3416 65.820 4332.3 17.888
PTSLS 1.1797 1.3916 0.3259 1.1765 1.3842 0.3187 1.1285 1.2735 0.3048 1.1331 1.2839 0.3169
PLIML 1.0752 1.1561 0.2953 0.9354 0.8749 0.1408 0.8774 0.7698 0.0877 0.8359 0.6987 0.0353
PFUL 1.0792 1.1647 0.2955 0.9359 0.8760 0.1413 0.8770 0.7691 0.0874 0.8361 0.6991 0.0353
PJTSLS 0.2135 0.0456 0.2530 0.4401 0.1937 0.3000 0.5047 0.2547 0.2808 0.4451 0.1981 0.3079
PJLIML 0.7295 0.5322 0.1900 0.8569 0.7343 0.1238 0.8577 0.7357 0.0828 0.8348 0.6969 0.0352
PJFUL 0.7394 0.5467 0.1900 0.8588 0.7375 0.1234 0.8608 0.7410 0.0833 0.8352 0.6975 0.0352

𝜎𝜇e = 0.60
𝜗2 =8
OLIML 1.3394 1.7940 100.59 1.0993 1.2084 24.999 1.2648 1.5998 20.176 1.1392 1.2978 20.851
NLIML 41.294 1705.2 2334.6 61.263 3753.1 6260.8 87.847 7717.3 883.76 185.07 34251 3615.1
OFUL 0.9649 0.9311 0.5840 0.8645 0.7473 0.8525 0.8870 0.7867 0.9652 0.8842 0.7819 0.9004
NFUL 18.724 350.60 10.043 26.328 693.16 17.553 36.237 1313.2 25.606 84.375 7119.1 59.068
PTSLS 1.3352 1.7827 0.1456 1.2760 1.6283 0.1819 1.2794 1.6368 0.1924 1.2817 1.6428 0.1737
PLIML 1.2824 1.6445 0.1210 1.2236 1.4973 0.0532 1.2208 1.4903 0.0372 1.2157 1.4780 0.0142
PFUL 1.2802 1.6388 0.1210 1.2229 1.4954 0.0529 1.2212 1.4914 0.0372 1.2157 1.4780 0.0142
PJTSLS 0.8560 0.7328 0.2903 0.5758 0.3315 0.6479 0.5809 0.3375 0.7322 0.4955 0.2456 0.7128
PJLIML 1.1412 1.3023 0.1325 1.2011 1.4426 0.0526 1.2153 1.4770 0.0363 1.2155 1.4773 0.0142
PJFUL 1.1495 1.3214 0.1328 1.2022 1.4452 0.0527 1.2149 1.4760 0.0362 1.2155 1.4774 0.0142
𝜗2 = 64
OLIML 0.4232 0.1791 3.7552 0.4018 0.1615 8.2432 0.3717 0.1382 2.1096 0.3776 0.1425 3.5872
NLIML 17.365 301.54 569.79 24.832 616.65 482.16 33.854 1146.1 39.008 74.350 5528.1 444.66
OFUL 0.3933 0.1547 0.4937 0.3811 0.1453 0.6661 0.3592 0.1290 0.6280 0.3473 0.1206 0.6036
NFUL 14.500 210.26 6.5774 21.169 448.13 6.6786 29.408 864.85 8.1396 64.119 4111.2 17.025
PTSLS 1.1870 1.4089 0.1608 1.1306 1.2782 0.2104 1.1304 1.2779 0.2018 1.1229 1.2608 0.1732
PLIML 1.1965 1.4316 0.1406 1.1926 1.4224 0.0653 1.2012 1.4429 0.0399 1.2124 1.4699 0.0152
PFUL 1.1952 1.4286 0.1403 1.1931 1.4234 0.0649 1.2006 1.4414 0.0400 1.2123 1.4697 0.0152
PJTSLS 0.2970 0.0882 0.2344 0.3406 0.1160 0.4128 0.4662 0.2174 0.3854 0.4458 0.1987 0.3836
PJLIML 1.0031 1.0062 0.1201 1.1597 1.3448 0.0596 1.1902 1.4165 0.0375 1.2118 1.4684 0.0151
PJFUL 1.0109 1.0219 0.1199 1.1616 1.3493 0.0597 1.1901 1.4163 0.0377 1.2121 1.4693 0.0151

Note: OLIML = “oracle-limited information maximum likelihood (LIML)”; NLIML = “naive-LIML”; OFUL = “oracle-FUL17 ”; NFUL = “naive-FUL”;
PTSLS = “Penalized two-stage least square”12 ; proposed estimators: PLIML = “Penalized LIML”; PFUL = “Penalized FUL”; PJTSLS = “Penalized jackknife
two-stage least square”; PJLIML = “Penalized jackknife-LIML”; PJFUL = “Penalized jackknife-FUL”. We report the median bias, median squared error
(MSE) and average standard error (SE). The SEs of PTSLS, PLIML, PFUL, PJTSLS, PJLIML, and PJFUL are obtained by bootstrapping.

Table 3a. Estimation results of the estimators for L = 15 and r = 5 with homoscedastic errors.

n = 200 n = 500 n = 1000 n = 5000


Estimators Bias MSE SE Bias MSE SE Bias MSE SE Bias MSE SE

𝜎𝜇e = 0.30
𝜗2 =8
OLIML 0.4436 0.1968 5.8441 0.4509 0.2033 69.011 0.4449 0.1979 4.7034 0.4007 0.1605 4.3224
NLIML 20.316 412.74 144.41 30.738 944.83 950.67 44.039 1939.5 781.57 96.750 9360.7 901.99
OFUL 0.3634 0.1320 0.4829 0.3766 0.1418 0.5064 0.3589 0.1288 0.5807 0.3375 0.1139 0.6042
NFUL 13.202 174.29 6.8089 20.268 410.80 11.210 28.916 836.16 13.273 64.967 4220.7 24.550
PTSLS 0.5113 0.2615 0.5526 0.5278 0.2786 0.6827 0.5114 0.2616 1.6302 0.4876 0.2378 3.5702
PLIML 0.2469 0.0610 0.1292 0.2202 0.0485 0.0683 0.2083 0.0434 0.0478 0.2027 0.0411 0.0191
PFUL 0.2471 0.0611 0.1281 0.2197 0.0483 0.0681 0.2081 0.0433 0.0465 0.2027 0.0411 0.0192
PJTSLS 0.2289 0.0524 0.3156 0.2317 0.0537 0.3303 0.2086 0.0435 0.4501 0.2360 0.0557 0.4323
PJLIML 0.2228 0.0497 0.0808 0.2145 0.0460 0.0475 0.2044 0.0418 0.0318 0.2020 0.0408 0.0132
PJFUL 0.2225 0.0495 0.0802 0.2141 0.0458 0.0474 0.2045 0.0418 0.0317 0.2020 0.0408 0.0131
𝜗2 = 64
OLIML 0.1141 0.0130 0.1661 0.1086 0.0118 0.1877 0.1134 0.0128 0.1811 0.1179 0.0139 0.1840
NLIML 7.0139 49.194 2.3171 11.161 124.56 4.3601 15.532 241.25 5.0710 33.885 1148.2 8.4775
OFUL 0.1112 0.0124 0.1622 0.1062 0.0113 0.1778 0.1081 0.0117 0.1755 0.1146 0.0131 0.1776
NFUL 6.6533 44.266 1.9543 10.634 113.08 2.7209 14.827 219.85 3.7154 32.410 1050.41 6.8950
PTSLS 0.3778 0.1427 0.3073 0.3764 0.1417 0.2946 0.3711 0.1377 0.2710 0.3749 0.1405 0.3400
PLIML 0.2853 0.0814 0.1473 0.2341 0.0548 0.0589 0.2158 0.0466 0.0348 0.2020 0.0408 0.0142
PFUL 0.2849 0.0812 0.1459 0.2340 0.0547 0.0608 0.2158 0.0466 0.0345 0.2019 0.0408 0.0149
PJTSLS 0.1762 0.0311 0.1526 0.1820 0.0331 0.1481 0.1738 0.0302 0.1500 0.1768 0.0313 0.1578
PJLIML 0.2351 0.0553 0.0844 0.2220 0.0493 0.0462 0.2119 0.0449 0.0327 0.2019 0.0408 0.0139
PJFUL 0.2359 0.0556 0.0840 0.2229 0.0497 0.0458 0.2108 0.0444 0.0328 0.2016 0.0406 0.0138

𝜎𝜇e = 0.60
𝜗2 = 8
OLIML 0.4194 0.1759 8.5442 0.4270 0.1824 3.8392 0.4117 0.1695 4.2641 0.4384 0.1922 40.637
NLIML 20.191 407.67 952.88 33.076 1094.0 283.59 45.079 2032.1 41204 100.78 10156 2500.5
OFUL 0.3196 0.1021 0.5376 0.3263 0.1065 0.4605 0.3065 0.0939 0.4999 0.3483 0.1213 0.5414
NFUL 13.132 172.46 5.5014 20.791 432.27 11.057 29.647 878.97 12.308 65.035 4229.5 30.933
PTSLS 0.7291 0.5316 0.8992 0.7091 0.5029 0.6572 0.7127 0.5079 1.3109 0.7258 0.5268 3.8978
PLIML 0.6237 0.3890 0.1149 0.6099 0.3719 0.0595 0.6038 0.3645 0.0378 0.6020 0.3625 0.0188
PFUL 0.6221 0.3870 0.1132 0.6104 0.3726 0.0590 0.6036 0.3643 0.0378 0.6020 0.3624 0.0178
PJTSLS 0.3963 0.1570 0.6033 0.3662 0.1341 0.3480 0.3194 0.1020 0.5168 0.2878 0.0828 0.5566
PJLIML 0.6056 0.3668 0.0636 0.6043 0.3652 0.0388 0.6018 0.3622 0.0255 0.6010 0.3612 0.0116
PJFUL 0.6053 0.3664 0.0635 0.6045 0.3654 0.0387 0.6019 0.3623 0.0256 0.6008 0.3610 0.0116
𝜗2 = 64
OLIML 0.1128 0.0127 0.1696 0.1075 0.0116 0.1797 0.1111 0.0124 0.1677 0.1095 0.0120 0.1728
NLIML 8.3114 69.080 3.7003 12.204 148.94 5.0255 16.219 263.05 7.0103 35.218 1240.3 10.481
OFUL 0.1078 0.0116 0.1643 0.1044 0.0109 0.1697 0.1088 0.0118 0.1591 0.1058 0.0112 0.1635
NFUL 7.7859 60.620 2.6061 11.555 133.53 2.8426 15.422 237.85 3.5974 33.635 1131.3 7.8079
PTSLS 0.4535 0.2057 0.2926 0.4595 0.2111 0.3092 0.4599 0.2115 0.4923 0.4363 0.1904 0.5521
PLIML 0.5259 0.2766 0.1423 0.5708 0.3258 0.0551 0.5881 0.3458 0.0432 0.5972 0.3566 0.0163
PFUL 0.5303 0.2812 0.1412 0.5709 0.3260 0.0554 0.5874 0.3450 0.0453 0.5971 0.3566 0.0162
PJTSLS 0.2961 0.0877 0.1514 0.2994 0.0896 0.1350 0.2849 0.0812 0.1262 0.2475 0.0613 0.1420
PJLIML 0.4935 0.2435 0.0766 0.5638 0.3179 0.0388 0.5864 0.3439 0.0262 0.5969 0.3562 0.0118
PJFUL 0.4960 0.2460 0.0764 0.5639 0.3180 0.0388 0.5853 0.3426 0.0264 0.5970 0.3564 0.0118

Note: OLIML = “oracle-limited information maximum likelihood (LIML)”; NLIML = “naive-LIML”; OFUL = “oracle-FUL17 ”; NFUL = “naive-FUL”;
PTSLS = “Penalized two-stage least square”12 ; proposed estimators: PLIML = “Penalized LIML”; PFUL = “Penalized FUL”; PJTSLS = “Penalized jackknife
two-stage least square”; PJLIML = “Penalized jackknife-LIML”; PJFUL = “Penalized jackknife-FUL”. We report the median bias, median squared error
(MSE) and average standard error (SE). The SEs of PTSLS, PLIML, PFUL, PJTSLS, PJLIML, and PJFUL are obtained by bootstrapping.
18 Statistical Methods in Medical Research 0(0)

Table 3b. Estimation results of the estimators for L = 30 and r = 9 with homoscedastic errors.

n = 200 n = 500 n = 1000 n = 5000


Estimators Bias MSE SE Bias MSE SE Bias MSE SE Bias MSE SE

𝜎𝜇e = 0.30
𝜗2 = 8
OLIML 0.5471 0.2994 21.549 0.4703 0.2211 42.657 0.5558 0.3089 9.8970 0.5340 0.2852 152.65
NLIML 25.830 667.21 13907 42.400 1797.7 366.75 61.821 3821.9 1791.4 136.54 18642 3532.7
OFUL 0.4401 0.1937 0.7109 0.3925 0.1541 0.5172 0.4612 0.2127 0.7620 0.4316 0.1863 0.7321
NFUL 17.238 297.14 8.8823 27.841 775.10 15.164 39.229 1538.9 18.265 85.414 7295.6 43.086
PTSLS 0.4392 0.1929 0.3451 0.4551 0.2071 0.2727 0.4400 0.1936 0.3315 0.4472 0.2000 0.9151
PLIML 0.2664 0.0710 0.1032 0.2242 0.0503 0.0581 0.2101 0.0441 0.0344 0.2035 0.0414 0.0151
PFUL 0.2643 0.0698 0.1024 0.2237 0.0501 0.0581 0.2100 0.0441 0.0344 0.2036 0.0414 0.0154
PJTSLS 0.3820 0.1459 0.3949 0.2533 0.0641 0.2463 0.2735 0.0748 0.4134 0.2993 0.0896 0.4104
PJLIML 0.2209 0.0488 0.0813 0.2137 0.0457 0.0497 0.2068 0.0428 0.0332 0.2032 0.0413 0.0136
PJFUL 0.2211 0.0489 0.0812 0.2146 0.0460 0.0497 0.2070 0.0428 0.0332 0.2033 0.0413 0.0136
𝜗2 = 64
OLIML 0.1233 0.0152 0.1733 0.1274 0.0162 0.1597 0.1164 0.0135 0.2028 0.1242 0.0154 0.2245
NLIML 9.9734 99.469 5.2307 15.963 254.83 4.5340 21.762 473.58 6.8518 48.991 2400.1 15.711
OFUL 0.1212 0.0147 0.1670 0.1250 0.0156 0.1569 0.1123 0.0126 0.1927 0.1178 0.0139 0.2052
NFUL 9.4271 88.871 3.0074 15.119 228.58 4.0042 20.697 428.37 5.0801 46.527 2164.8 10.885
PTSLS 0.3997 0.1597 0.2007 0.3874 0.1501 0.2013 0.4029 0.1623 0.2462 0.3909 0.1528 0.1774
PLIML 0.3128 0.0978 0.1345 0.2440 0.0595 0.0661 0.2227 0.0496 0.0330 0.2043 0.0417 0.0136
PFUL 0.3127 0.0978 0.1334 0.2432 0.0591 0.0661 0.2226 0.0496 0.0333 0.2042 0.0417 0.0136
PJTSLS 0.0975 0.0095 0.1397 0.1104 0.0122 0.1392 0.1144 0.0131 0.1382 0.1147 0.0132 0.1576
PJLIML 0.2099 0.0441 0.0849 0.2214 0.0490 0.0508 0.2166 0.0469 0.0315 0.2040 0.0416 0.0136
PJFUL 0.2112 0.0446 0.0846 0.2234 0.0499 0.0507 0.2168 0.0470 0.0313 0.2039 0.0416 0.0136

𝜎𝜇e = 0.60
𝜗2 = 8
OLIML 0.5054 0.2554 13.595 0.4778 0.2283 6.8080 0.4441 0.1973 41.860 0.4429 0.1962 4.0026
NLIML 30.265 915.95 258.86 42.381 1796.2 553.95 61.359 3764.9 1967.5 127.46 16247 861.31
OFUL 0.3960 0.1568 0.6276 0.3826 0.1464 0.4761 0.3591 0.1290 0.6304 0.3540 0.1253 0.6028
NFUL 17.565 308.53 8.7924 27.788 772.17 15.538 39.767 1581.4 18.695 85.616 7330.1 39.838
PTSLS 0.7063 0.4989 0.3245 0.7283 0.5304 0.2200 0.7177 0.5151 0.2609 0.7190 0.5170 0.2651
PLIML 0.6290 0.3957 0.0794 0.6105 0.3727 0.0466 0.6054 0.3665 0.0263 0.6008 0.3610 0.0119
PFUL 0.6301 0.3970 0.0794 0.6104 0.3726 0.0465 0.6054 0.3666 0.0264 0.6010 0.3612 0.0119
PJTSLS 0.3523 0.1241 0.4747 0.3077 0.0947 0.2607 0.2928 0.0858 0.4905 0.2780 0.0773 0.5198
PJLIML 0.5906 0.3488 0.0679 0.6024 0.3629 0.0403 0.6034 0.3641 0.0258 0.6007 0.3608 0.0119
PJFUL 0.5933 0.3519 0.0679 0.6030 0.3636 0.0402 0.6034 0.3641 0.0258 0.6006 0.3607 0.0119
𝜗2 = 64
OLIML 0.1227 0.0151 0.1715 0.1236 0.0153 0.1413 0.1070 0.0114 0.1804 0.1061 0.0113 0.1886
NLIML 11.490 132.01 9.4369 17.348 300.94 6.0294 23.708 562.05 8.0123 50.220 2522.1 16.159
OFUL 0.1201 0.0144 0.1650 0.1201 0.0144 0.1337 0.1049 0.0110 0.1709 0.0999 0.0100 0.1742
NFUL 10.669 113.83 3.3918 16.262 264.46 3.7447 22.359 499.91 5.5266 47.644 2270.0 11.591
PTSLS 0.5005 0.2505 0.1701 0.5087 0.2588 0.1712 0.5064 0.2564 0.1109 0.5046 0.2546 0.1183
PLIML 0.5440 0.2959 0.1169 0.5764 0.3323 0.0509 0.5901 0.3483 0.0259 0.5975 0.3570 0.0110
PFUL 0.5455 0.2976 0.1165 0.5748 0.3304 0.0508 0.5900 0.3481 0.0260 0.5975 0.3570 0.0110
PJTSLS 0.1591 0.0253 0.1419 0.2473 0.0612 0.1299 0.2417 0.0584 0.1426 0.2064 0.0426 0.1590
PJLIML 0.4621 0.2136 0.0781 0.5593 0.3128 0.0461 0.5849 0.3421 0.0257 0.5973 0.3568 0.0110
PJFUL 0.4663 0.2174 0.0777 0.5605 0.3142 0.0461 0.5860 0.3434 0.0257 0.5974 0.3569 0.0110

Note: OLIML = “oracle-limited information maximum likelihood (LIML)”; NLIML = “naive-LIML”; OFUL = “oracle-FUL17 ”; NFUL = “naive-FUL”;
PTSLS = “Penalized two-stage least square”12 ; proposed estimators: PLIML = “Penalized LIML”; PFUL = “Penalized FUL”; PJTSLS = “Penalized jackknife
two-stage least square”; PJLIML = “Penalized jackknife-LIML”; PJFUL = “Penalized jackknife-FUL”. We report the median bias, median squared error
(MSE) and average standard error (SE). The SEs of PTSLS, PLIML, PFUL, PJTSLS, PJLIML, and PJFUL are obtained by bootstrapping.

Table 3c. Estimation results of the estimators for L = 60 and r = 18 with homoscedastic errors.

n = 200 n = 500 n = 1000 n = 5000


Estimators Bias MSE SE Bias MSE SE Bias MSE SE Bias MSE SE

𝜎𝜇e = 0.30
𝜗2 = 8
OLIML 0.6319 0.3993 28.18 0.6951 0.4832 34.688 0.5969 0.3563 53.133 0.6515 0.4245 4.1784
NLIML 40.163 1613.0 1250.5 62.666 3927.1 6285.9 85.420 7296.7 1573.3 201.00 40403 4200.7
OFUL 0.5311 0.2820 0.5342 0.5365 0.2878 0.9070 0.5135 0.2637 0.8943 0.5622 0.3161 0.8724
NFUL 24.533 601.88 13.192 38.753 1501.8 19.718 55.412 3070.5 23.418 125.53 15758 61.155
PTSLS 0.3828 0.1465 0.1833 0.3866 0.1495 0.2255 0.4128 0.1704 0.2115 0.3904 0.1524 0.2196
PLIML 0.3001 0.0901 0.1508 0.2332 0.0544 0.0562 0.2176 0.0474 0.0335 0.2025 0.0410 0.0132
PFUL 0.2981 0.0889 0.1504 0.2325 0.0541 0.0562 0.2176 0.0474 0.0334 0.2025 0.0410 0.0132
PJTSLS 0.8082 0.6532 0.1713 0.5515 0.3042 0.3493 0.4562 0.2081 0.3339 0.4811 0.2314 0.3373
PJLIML 0.1768 0.0313 0.1021 0.2105 0.0443 0.0496 0.2098 0.0440 0.0320 0.2021 0.0409 0.0132
PJFUL 0.1817 0.0330 0.1022 0.2115 0.0448 0.0498 0.2107 0.0444 0.0318 0.2021 0.0409 0.0132
𝜗2 = 64
OLIML 0.1679 0.0282 0.1949 0.1609 0.0259 0.8578 0.1369 0.0188 0.2990 0.1539 0.0237 0.2949
NLIML 14.157 200.41 7.8704 21.546 464.24 6.1868 30.964 958.77 11.220 69.339 4807.9 18.756
OFUL 0.1656 0.0274 0.1805 0.1556 0.0242 0.2692 0.1352 0.0183 0.2536 0.1483 0.0220 0.2394
NFUL 13.395 179.43 4.1468 20.486 419.69 4.7906 29.357 861.81 7.6277 65.819 4332.2 14.654
PTSLS 0.4573 0.2091 0.1622 0.4589 0.2106 0.1626 0.4382 0.1920 0.1560 0.4426 0.1959 0.1566
PLIML 0.3898 0.1519 0.1457 0.2814 0.0792 0.0603 0.2381 0.0567 0.0343 0.2068 0.0428 0.0140
PFUL 0.3886 0.1510 0.1458 0.2826 0.0799 0.0603 0.2377 0.0565 0.0344 0.2067 0.0427 0.0140
PJTSLS 0.3102 0.0962 0.1306 0.1050 0.0110 0.1436 0.0909 0.0083 0.1430 0.0946 0.0090 0.1441
PJLIML 0.1473 0.0217 0.0912 0.2254 0.0508 0.0484 0.2214 0.0490 0.0314 0.2060 0.0424 0.0139
PJFUL 0.1507 0.0227 0.0908 0.2270 0.0515 0.0476 0.2211 0.0489 0.0315 0.2060 0.0424 0.0139

𝜎𝜇e = 0.60
𝜗2 = 8
OLIML 0.6402 0.4099 6.7102 0.6392 0.4086 3.9629 0.5901 0.3482 44.322 0.5362 0.2875 18.216
NLIML 41.121 1690.9 718.78 61.199 3745.3 1228.4 84.972 7220.3 2471.9 193.66 37504 7640.6
OFUL 0.5035 0.2536 0.5007 0.5216 0.2721 0.7583 0.4657 0.2169 0.7513 0.4348 0.1891 0.7327
NFUL 24.840 617.05 14.087 40.036 1602.9 17.955 55.410 3070.3 25.020 124.36 15466 52.479
PTSLS 0.7009 0.4913 0.1480 0.7297 0.5324 0.1628 0.7250 0.5256 0.1838 0.7155 0.5119 0.1740
PLIML 0.6542 0.4280 0.1223 0.6238 0.3891 0.0440 0.6118 0.3744 0.0284 0.6025 0.3630 0.0115
PFUL 0.6523 0.4255 0.1221 0.6227 0.3878 0.0440 0.6128 0.3755 0.0283 0.6025 0.3630 0.0116
PJTSLS 0.8082 0.6531 0.1969 0.4472 0.2000 0.3772 0.3281 0.1077 0.4369 0.3393 0.1151 0.4227
PJLIML 0.5395 0.2911 0.1045 0.6007 0.3608 0.0412 0.6053 0.3664 0.0275 0.6023 0.3628 0.0115
PJFUL 0.5488 0.3012 0.1045 0.6016 0.3619 0.0411 0.6057 0.3669 0.0275 0.6022 0.3627 0.0115
𝜗2 = 64
OLIML 0.1449 0.0210 0.2657 0.1272 0.0162 0.2634 0.1436 0.0206 0.2292 0.1358 0.0184 0.2270
NLIML 14.864 220.95 7.2045 22.758 517.90 9.6079 32.531 1058.3 10.686 70.598 4984.1 20.710
OFUL 0.1401 0.0196 0.2385 0.1194 0.0143 0.2277 0.1453 0.0211 0.2131 0.1304 0.0170 0.2126
NFUL 14.007 196.21 4.1371 21.569 465.21 5.3666 30.715 943.41 7.5753 66.835 4466.9 15.559
PTSLS 0.6098 0.3718 0.1415 0.5981 0.3577 0.1344 0.6034 0.3641 0.1216 0.5933 0.3520 0.1158
PLIML 0.6078 0.3694 0.1047 0.6020 0.3624 0.0503 0.6002 0.3603 0.0288 0.5987 0.3584 0.0119
PFUL 0.6069 0.3684 0.1041 0.6021 0.3625 0.0500 0.6002 0.3603 0.0290 0.5987 0.3584 0.0119
PJTSLS 0.2468 0.0609 0.1886 0.1439 0.0207 0.1755 0.1893 0.0359 0.1738 0.1800 0.0324 0.1763
PJLIML 0.4010 0.1608 0.0870 0.5552 0.3082 0.0438 0.5854 0.3427 0.0272 0.5982 0.3579 0.0118
PJFUL 0.4061 0.1649 0.0873 0.5574 0.3107 0.0437 0.5859 0.3433 0.0273 0.5983 0.3580 0.0118

Note: OLIML = “oracle-limited information maximum likelihood (LIML)”; NLIML = “naive-LIML”; OFUL = “oracle-FUL17 ”; NFUL = “naive-FUL”;
PTSLS = “Penalized two-stage least square”12 ; proposed estimators: PLIML = “Penalized LIML”; PFUL = “Penalized FUL”; PJTSLS = “Penalized jackknife
two-stage least square”; PJLIML = “Penalized jackknife-LIML”; PJFUL = “Penalized jackknife-FUL”. We report the median bias, median squared error
(MSE) and average standard error (SE). The SEs of PTSLS, PLIML, PFUL, PJTSLS, PJLIML, and PJFUL are obtained by bootstrapping.

Table 4. Summary of the genetic instruments.

Instruments† SNP ID* Disease/Behavior Authors

APOE rs429358 Alzheimer’s 43

CHRM2 rs2061174 Cognition 44

GABBR2 rs1435252 Nicotine Addiction 45

HTR2A rs6314 Memory Performance 46,47

ADIPOQ rs2241766 Diabetes II, Obesity 48

DISC1 rs821616 Cognitive Aging, Schizophrenia Bischof and Park49


CYP11A1 rs8039957 Cognitive Aging Bischof and Park49
BDNF rs6265 Cognitive Aging, Memory, IQ 44,50

HFE rs1799945 Alzheimer’s, Obesity, Liver Disease Määttä et al.51


DRD2 rs1800497 Nicotine/Alcohol Addiction 44,52

Note: † APOE = “apolipoprotein E”; CHRM2 = “cholinergic muscarinic receptor 2”; GABBR2 = “gamma-aminobutyric acid type B receptor subunit 2 gene”; HTR2A = “5-hydroxytryptamine (serotonin) receptor
2A”; ADIPOQ = “adiponectin”; DISC1= “disrupted-in-schizophrenia 1”; CYP11A1= “cholesterol side chain
cleavage enzyme that catalyzes the initial and rate-limiting step of steroidogenesis”; BDNF = “brain-derived
neurotrophic factor”; HFE = “human homeostatic iron regulator protein”; DRD2= “dopamine receptor D2
gene”. *“rsID” is a unique label used to identify a specific single nucleotide polymorphism (SNP).

Table 5. Estimation results of the causal model with SNPs as Instruments for BMI.

Estimators   𝛽̂            SE(𝛽̂)†     95% CI                  # Invalid IVs

TSLS         0.006769     0.020022   [−0.03250, 0.04604]     –
LIML         1.041803     4.260779   [−7.31474, 9.39835]     –
FUL          0.052532     0.069872   [−0.08451, 0.18957]     –
PTSLS        −0.008288    0.02150    [−0.05045, 0.03387]     rs1435252, rs6314, rs2241766, rs821616, rs8039957, rs1799945
PLIML        −0.007377    0.00108    [−0.00950, −0.00525]    rs1435252, rs6314, rs2241766, rs8039957
PFUL         −0.007375    0.00107    [−0.00948, −0.00527]    rs1435252, rs6314, rs2241766, rs8039957
PJTSLS       −0.007369    0.01214    [−0.03117, 0.01644]     rs6314, rs2241766, rs8039957
PJLIML       −0.007373    0.00108    [−0.00950, −0.00524]    rs6314, rs2241766, rs8039957
PJFUL        −0.007358    0.00106    [−0.00948, −0.00523]    rs2241766, rs8039957

Note: TSLS = “two-stage least square”; LIML = “limited information maximum likelihood”; FUL = “FUL17 ”; PTSLS = “Penalized TSLS”12 ; proposed
estimators: PLIML = “Penalized LIML”; PFUL = “Penalized FUL”; PJTSLS = “Penalized jackknife TSLS”; PJLIML = “Penalized jackknife-LIML”; PJFUL =
“Penalized jackknife-FUL”. 𝛽̂ is the estimated coefficient. † Standard error (SE) and confidence interval (CI) for PTSLS, PLIML, PFUL, PJTSLS, PJLIML,
and PJFUL are obtained by bootstrapping. SNP = “single nucleotide polymorphism” (IVs). “–” means that the TSLS, LIML, and FUL methods do not
have the ability to identify any instruments as invalid. These methods are performed under the assumption that all the instruments are valid.

the association between SNPs and the exposure variable is often weak. Therefore, we need to address the issues of many
weak instruments and heteroscedasticity. The Sargan test rejects the hypothesis that all the IVs (SNPs) are valid (p-value <
0.001). We use the studentized Breusch–Pagan (BP) test to detect heteroscedasticity in the MR analysis. The results of the
BP test show that there is strong evidence of heteroscedasticity (p-value < 0.01). The F-test statistic of 0.4489 indicates that the SNPs are only weakly associated with the exposure variable, according to the criteria of Staiger and Stock3 and Burgess et al.18
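These diagnostics can be computed directly from the first-stage regression. As a minimal numpy sketch (function names are ours, and we use Koenker's studentized form of the BP statistic, i.e. n·R² from regressing squared residuals on the instruments):

```python
import numpy as np

def ols_fit(y, X):
    """OLS fit of y on X; returns (R-squared, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - resid @ resid / tss, resid

def first_stage_F(D, Z):
    """Joint F-statistic for the L instruments in the first-stage
    regression of the exposure D on Z (intercept added)."""
    n, L = Z.shape
    r2, _ = ols_fit(D, np.column_stack([np.ones(n), Z]))
    return (r2 / L) / ((1.0 - r2) / (n - L - 1))

def breusch_pagan(resid, Z):
    """Koenker's studentized Breusch-Pagan LM statistic: n * R^2 from
    regressing squared first-stage residuals on the instruments.
    Under homoscedasticity it is asymptotically chi-squared(L)."""
    n = len(resid)
    r2, _ = ols_fit(resid ** 2, np.column_stack([np.ones(n), Z]))
    return n * r2
```

A first-stage F below the conventional threshold of 10 signals weak instruments3; the value of 0.4489 reported here is far below it.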
All of the regression coefficients for the LJIVE and PKCIV estimation methods are negative, as expected, since these methods are robust to invalid instruments, unlike the naive k-class IV methods. When we use the PKCIV methods, certain instruments are identified as invalid and possibly have direct impacts on HRQLI. In particular, PTSLS (−0.008288 ± 0.02150) identified many instruments as invalid, aligning with the findings of Windmeijer et al.13 Furthermore, PLIML (−0.007377 ± 0.00108) and PFUL (−0.007375 ± 0.00107) select the rs1435252, rs6314, rs2241766, and rs8039957 instruments as invalid, all of which could be related to HRQLI. In addition, PJTSLS (−0.007369 ± 0.01214) and PJLIML (−0.007373 ± 0.00108) select three instruments as invalid, while PJFUL (−0.007358 ± 0.00106) selects two instruments as invalid.
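For intuition only — this is not the paper's PKCIV implementation, whose penalized estimators are developed earlier — the identification idea behind LASSO-type invalid-instrument selection can be sketched with the related median-of-Wald-ratios construction13: under majority validity, the causal effect is the median of the per-instrument ratios, and instruments with a large estimated direct effect are flagged. The function name and threshold rule below are ours:

```python
import numpy as np

def flag_invalid_ivs(Y, D, Z, tol=0.2):
    """Toy invalid-IV detection (illustrative, not the paper's PKCIV):
    beta is taken as the median of the per-instrument Wald ratios
    Gamma_j / gamma_j; instruments whose direct-effect residual
    alpha_j = Gamma_j - beta * gamma_j exceeds `tol` are flagged."""
    gamma = np.linalg.lstsq(Z, D, rcond=None)[0]   # first stage
    Gamma = np.linalg.lstsq(Z, Y, rcond=None)[0]   # reduced form
    beta_hat = np.median(Gamma / gamma)
    alpha_hat = Gamma - beta_hat * gamma
    return beta_hat, np.flatnonzero(np.abs(alpha_hat) > tol)
```

With a majority of valid instruments, the flagged set recovers those with nonzero direct effects on the outcome; the LASSO-based methods replace the hard threshold with a data-driven penalty.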
The BP test and the F-test thus indicate heteroscedasticity and weak instruments. In this situation, the jackknife-based methods are superior according to the simulation results, particularly the PJLIML and PJFUL methods. These methods yield a lower standard error than the naive methods and the PTSLS method proposed by Kang et al.12 Further, in contrast to the naive methods, BMI has a negative effect on HRQLI, which is the expected sign. One limitation of this analysis is the distribution of the outcome variable. The value of HRQLI ranges from −0.13 to 1.00, and a negative HRQLI value represents states that are considered worse than death.41 HRQLI is therefore unlikely to be normally distributed. When the data are skewed, one can use a generalized linear model, and if most of the observations are zero, zero-inflated models can be used. If the outcome is constrained to the open interval (0, 1), beta regression can be used; if observations also fall on the boundaries of the closed unit interval [0, 1], zero/one-inflated beta regression could be employed to estimate the causal effects. This approach can extend MR analysis within the generalized linear model framework.
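The leave-one-out first-stage fitted values that drive the jackknife-based estimators need not be computed by refitting n regressions; they follow from the hat matrix. A numpy sketch (our own notation, for the basic jackknife IV construction7):

```python
import numpy as np

def jive_fitted(D, Z):
    """Leave-one-out (jackknife) first-stage fitted values:
    D~_i = (Z_i' gamma_hat - h_ii D_i) / (1 - h_ii),
    which equals the prediction for unit i from a first stage fitted
    WITHOUT unit i, removing the own-observation bias that makes
    naive fitted values correlate with unit i's structural error."""
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)  # projection (hat) matrix
    h = np.diag(P)                         # leverages h_ii
    return (P @ D - h * D) / (1.0 - h)

def jive_beta(Y, D, Z):
    """Basic jackknife IV estimate: use D~ as the instrument for D."""
    Dt = jive_fitted(D, Z)
    return (Dt @ Y) / (Dt @ D)
```

The penalized jackknife estimators in the paper combine such fitted values with the LASSO selection step.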

6 Concluding remarks
In this paper, a causal model with many weak instruments is examined, where some instruments may directly impact the
response variable. We also consider a scenario that includes many instruments with heteroscedastic data. In both of these
situations, classic estimators such as NTSLS, NLIML, and NFUL are found to be inconsistent. While the PTSLS estimator
is a robust alternative to TSLS in the presence of potentially invalid instruments, its performance may be inadequate when
facing many weak instruments, as TSLS estimates are biased toward the probability limit of least square estimates. This
bias increases as the degree of overidentification increases.7 In this paper, five new methods, PLIML, PFUL, PJTSLS, PJLIML, and PJFUL, are proposed as alternatives to PTSLS for estimating causal effects. The first two estimators, PLIML and PFUL, are extensions of the PTSLS framework. The other three estimators are proposed by using a “leave-one-unit”
and PFUL, are extensions of the PTSLS framework. The other three estimators are proposed by using a “leave-one-unit”
jackknife-type fitted value in place of the typical first-stage equation. Our empirical findings show that in the presence of
weak instruments and heteroscedastic data, both PJLIML and PJFUL outperform PTSLS. When the instruments are not
weak, PJTSLS outperforms all the other estimators. Both the simulation results and real-life application results demonstrate
that the proposed estimators are robust in estimating IV models with potentially invalid instruments.
The inconsistency of PTSLS, as discussed by Windmeijer et al.,13 is that PTSLS may not consistently select invalid
instruments if they are relatively strong. This is one of the limitations of the PKCIV methodology. A possible extension of
the PKCIV methods is to use the ALASSO procedure and derive the oracle properties. It is a common assumption in IV
methods that the instruments are not linearly correlated. However, in practice, genetic variables can be highly correlated,
causing the matrix ZᵀZ to be ill-conditioned, a problem known as multicollinearity. One solution is to use the methods of Burgess et al.38 with principal component analysis to address the issue of correlated variants. Another potential solution could be the application of Tikhonov regularization techniques. Future work could also focus on generalizing the model explored in this paper; in particular, binary exposure variables and nonlinear outcome models are direct extensions of this study. Burgess et al.54 introduced an averaging estimator that provides consistent estimates.
Furthermore, it would be important to derive the asymptotic distribution and establish the statistical properties for testing
hypotheses of the K-class and jackknife IVs via the LASSO procedure. Chao et al.55 developed the asymptotic distribution
of jackknife IV estimators for the classical linear IV model, which could serve as a basis for such extensions.
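The Tikhonov suggestion above can be made concrete with a ridge-regularized first stage; the following is a minimal sketch under our own naming (it is not part of the paper's pive package):

```python
import numpy as np

def ridge_first_stage(D, Z, lam=1.0):
    """Tikhonov (ridge) regularized first stage for highly correlated
    instruments: gamma = (Z'Z + lam I)^{-1} Z'D, which keeps the
    normal-equations matrix well-conditioned even when columns of Z
    (e.g. genetic variants in LD) are nearly collinear."""
    L = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(L), Z.T @ D)
```

Setting lam = 0 recovers the ordinary first stage, while lam > 0 shrinks the coefficient vector, trading a small bias for stability of the fitted values used in the second stage.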

Acknowledgment
This research uses data from the Wisconsin Longitudinal Study, funded by the National Institute on Aging (R01 AG009775; R01
AG033285; R01 AG060737; R01 AG041868). The authors are grateful for the opportunity to access this valuable dataset for this
study. We would also like to express our gratitude to the anonymous referees for their very valuable comments and suggestions, which
certainly improved the quality and presentation of the paper.

Declaration of conflicting interests


The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD
Kristofer Månsson https://orcid.org/0000-0002-4535-3630

Supplementary material
Additional results from Section 3, along with the proofs for Theorem 3.1 and Lemma 3.3, are included in Appendix Sections A–C of the supplementary materials. Additionally, the supplementary material includes guidelines and R code for implementing instrumental variable methods in practice. Our R package, pive, is available at https://github.com/Qasim-stat/pive.

Notes
1. The R package is called “pive” and it is available at https://github.com/Qasim-stat/pive.

2. e(𝛽̂, 𝛽̂PTSLS) = median[(𝛽̂ − 𝛽0)ᵀ(𝛽̂ − 𝛽0)] ∕ median[(𝛽̂PTSLS − 𝛽0)ᵀ(𝛽̂PTSLS − 𝛽0)].
3. “The OLIML is the oracle-LIML estimator that accurately incorporates the invalid instruments (r) as independent variables in the
model of interest.”
4. “The NFUL results for the naive-FUL estimator that considers all the instruments as valid when a few of instruments are actually
invalid.”
5. Wisconsin Longitudinal Study (WLS): 1957-2020 Version 14.01. Hauser, Robert M., William H. Sewell, and Herd, Pamela. Madison,
WI: University of Wisconsin-Madison, WLS; https://researchers.wls.wisc.edu/documentation/
6. We have also analyzed a relatively larger set of IVs (following the exclusion of missing observations, our analysis then included 47
SNPs). The results of the causal effect using a relatively larger set of potential instruments is qualitatively the same to that with 10
SNPs. Results are available from the authors upon request.

References
1. Hartford JS, Veitch V, Sridhar D, et al. Valid causal inference with (some) invalid instruments. In: International Conference on
Machine Learning. PMLR. 2021, July, pp. 4096–4106.
2. Davies NM, von Hinke Kessler Scholder S, Farbmacher H, et al. The many weak instruments problem and Mendelian randomization.
Stat Med 2015; 34: 454–468.
3. Staiger D and Stock J. Instrumental variables regression with weak instruments. Econometrica 1997; 65: 557–586.
4. Stock J and Yogo M. Asymptotic distributions of instrumental variables statistics with many instruments. In: Identification and
inference for econometric models: essays in honor of Thomas Rothenberg, vol. 6. Rochester, NY: Social Science Research Network
(SSRN), 2005, pp.109–120.
5. Seng L and Li J. Structural equation model averaging: methodology and application. J Bus Econ Stat 2022; 40: 815–828.
6. Qasim M. A weighted average limited information maximum likelihood estimator. Statistical Papers 2023; 65: 1–26.
7. Angrist JD, Imbens GW and Krueger AB. Jackknife instrumental variables estimation. J Appl Econom 1999; 14: 57–67.
8. Bowden J, Davey Smith G, Haycock PC, et al. Consistent estimation in Mendelian randomization with some invalid instruments
using a weighted median estimator. Genet Epidemiol 2016; 40: 304–314.
9. Burgess S, Smith GD, Davies NM, et al. Guidelines for performing Mendelian randomization investigations: update for summer
2023. Wellcome Open Res 2019; 4: 1–34.
10. Seng LL, Liu CT, Wang J, et al. Instrumental variable model average with applications in Mendelian randomization. Stat Med 2023;
42: 3547–3567.
11. Andrews DW. Consistent moment selection procedures for generalized method of moments estimation. Econometrica 1999;
67: 543–563.
12. Kang H, Zhang A, Cai TT, et al. Instrumental variables estimation with some invalid instruments and its application to Mendelian
randomization. J Am Stat Assoc 2016; 111: 132–144.
13. Windmeijer F, Farbmacher H, Davies N, et al. On the use of the lasso for instrumental variables estimation with some invalid
instruments. J Am Stat Assoc 2019; 114: 1339–1350.
14. Lin Y, Windmeijer F, Song X, et al. On the instrumental variable estimation with many weak and invalid instruments. J R Stat Soc
Ser B: Stat Methodol 2024; 86: 1068–1088.
15. Hartwig FP, Davey Smith G and Bowden J. Robust inference in summary data Mendelian randomization via the zero modal
pleiotropy assumption. Int J Epidemiol 2017; 46: 1985–1998.
16. Hernán MA and Robins JM. Instruments for causal inference: an epidemiologist’s dream? Epidemiology 2006; 17: 360–372.
17. Fuller WA. Some properties of a modification of the limited information estimator. Econometrica 1977; 45: 939–953.
18. Burgess S, Small DS and Thompson SG. A review of instrumental variable estimators for Mendelian randomization. Stat Methods
Med Res 2017a; 26: 2333–2355.
19. Hahn J, Hausman J and Kuersteiner G. Estimation with weak instruments: accuracy of higher-order bias and MSE approximations.
Econom J 2004; 7: 272–306.
20. Hausman JA, Newey WK, Woutersen T, et al. Instrumental variable estimation with heteroskedasticity and many instruments. Quant
Econom 2012; 3: 211–255.
21. Small DS. Sensitivity analysis for instrumental variables regression with overidentifying restrictions. J Am Stat Assoc 2007;
102: 1049–1058.
22. Holland PW. Causal inference, path analysis and recursive structural equations models. ETS Res Report Series 1988; 1988: 1–50.
23. Angrist JD, Imbens GW and Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc 1996;
91: 444–455.
24. Khosravy M, Gupta N, Patel N, et al. Recovery in compressive sensing: a review. In: Compressive Sensing in Healthcare. Amsterdam,
The Netherlands: Elsevier, 2020, pp. 25–42.
25. Cai TT and Zhang A. Compressed sensing and affine rank minimization under restricted isometry. IEEE Trans Signal Process 2013;
61: 3279–3290.
26. Cai TT, Wang L and Xu G. Shifting inequality and recovery of sparse signals. IEEE Trans Signal Process 2010; 58: 1300–1308.
27. Guo Z, Kang H, Tony Cai T, et al. Confidence intervals for causal effects with invalid instruments by using two-stage hard
thresholding with voting. J R Stat Soc: Ser B (Stat Methodol) 2018; 80: 793–815.

28. Davidson R and MacKinnon JG. Econometric Theory and Methods, vol. 5. New York: Oxford University Press, 2004.
29. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc: Ser B (Stat Methodol) 1996; 58: 267–288.
30. Tibshirani RJ. The lasso problem and uniqueness. Electron J Stat 2013; 7: 1456–1490.
31. Bekker PA. Alternative approximations to the distributions of instrumental variable estimators. Econometrica 1994; 64: 657–681.
32. Aikens RC, Zhao W, Saleheen D, et al. Systolic blood pressure and risk of type 2 diabetes: a Mendelian randomization study.
Diabetes 2017; 66: 543–550.
33. Burgess S, Daniel RM, Butterworth AS, et al. and EPIC-InterAct Consortium. Network Mendelian randomization: using genetic
variants as instrumental variables to investigate mediation in causal pathways. Int J Epidemiol 2015; 44: 484–495.
34. Hansen C, Hausman J and Newey W. Estimation with many instrumental variables. J Bus Econ Stat 2008; 26: 398–422.
35. Stock JH, Wright JH and Yogo M. A survey of weak instruments and weak identification in generalized method of moments. J Bus
Econ Stat 2002; 20: 518–529.
36. Matsushita Y and Otsu T. A jackknife Lagrange multiplier test with many weak instruments. Econ Theory 2024; 40: 447–470.
37. Adkins LC, Campbell RC, Chmelarova V, et al. The Hausman test, and some alternatives, with heteroskedastic data. In: Essays in
honor of Jerry Hausman. Leeds, UK: Emerald Group Publishing Limited, 2012, pp.515–546.
38. Burgess S, Zuber V, Valdes-Marquez E, et al. Mendelian randomization with fine-mapped genetic data: choosing from many
correlated instrumental variables. Genet Epidemiol 2017b; 41: 714–725.
39. Fan Q and Wu Y. Endogenous treatment effect estimation with some invalid and irrelevant instruments. arXiv preprint
arXiv:2006.14998 2020: 1–36.
40. Horsman J, Furlong W, Feeny D, et al. The Health Utilities Index (HUI®): concepts, measurement properties and applications.
Health Qual Life Outcomes 2003; 1: 1–13.
41. Furlong WJ, Feeny DH, Torrance GW, et al. The Health Utilities Index (HUI®) system for assessing health-related quality of life
in clinical studies. Ann Med 2001; 33: 375–384.
42. Molina M, Humphries B, Guertin JR, et al. Health Utilities Index Mark 3 scores for children and youth: population norms for Canada
based on cycles 5 (2016 and 2017) and 6 (2018 and 2019) of the Canadian Health Measures Survey. Health Rep 2023; 34: 29–39.
43. Long JR, Liu PY, Liu YJ, et al. APOE and TGF-𝛽1 genes are associated with obesity phenotypes. J Med Genet 2003; 40: 918–924.
44. Lichenstein SD, Jones BL, O’Brien JW, et al. Familial risk for alcohol dependence and developmental changes in BMI: the
moderating influence of addiction and obesity genes. Pharmacogenomics 2014; 15: 1311–1321.
45. Wehby GL, Jugessur A, Murray JC, et al. Genes as instruments for studying risk behavior effects: an application to maternal smoking
and orofacial clefts. Health Serv Outc Res Methodol 2011; 11: 54–78.
46. Lane HY, Liu YC, Huang CL, et al. Risperidone-related weight gain: genetic and nongenetic predictors. J Clin Psychopharmacol
2006; 26: 128–134.
47. Luo C, Liu J, Wang X, et al. Pharmacogenetic correlates of antipsychotic-induced weight gain in the Chinese population. Neurosci
Bull 2019; 35: 561–580.
48. Kroll C, Farias DR, Carrilho TRB, et al. Association of ADIPOQ-rs2241766 and FTO-rs9939609 genetic variants with body mass
index trajectory in women of reproductive age over 6 years of follow-up: the PREDI study. Eur J Clin Nutr 2022; 76: 159–172.
49. Bischof GN and Park DC. Obesity and aging: consequences for cognition, brain structure, and brain function. Psychosom Med 2015;
77: 697–709.
50. Akbarian SA, Salehi-Abargouei A, Pourmasoumi M, et al. Association of brain-derived neurotrophic factor gene polymorphisms
with body mass index: a systematic review and meta-analysis. Adv Med Sci 2018; 63: 43–56.
51. Määttä KM, Nikkari ST and Kunnas TA. Genetic variant coding for iron regulatory protein HFE contributes to hypertension, the
TAMRISK study. Medicine (Baltimore) 2015; 94: e464.
52. Cardel MI, Lemas DJ, Lee AM, et al. Taq1a polymorphism (rs1800497) is associated with obesity-related outcomes and dietary
intake in a multi-ethnic sample of children. Pediatr Obes 2019; 14: e12470.
53. Von Hinke S, Smith GD, Lawlor DA, et al. Genetic markers as instrumental variables. J Health Econ 2016; 45: 131–148.
54. Burgess S, Zuber V, Gkatzionis A, et al. Modal-based estimation via heterogeneity-penalized weighting: model averaging for con-
sistent and efficient estimation in Mendelian randomization when a plurality of candidate instruments are valid. Int J Epidemiol
2018; 47: 1242–1254.
55. Chao JC, Swanson NR, Hausman JA, et al. Asymptotic distribution of JIVE in a heteroskedastic IV regression with many
instruments. Econ Theory 2012; 28: 42–86.
