IPCW_tutorial_2
IPCW_tutorial_2
1 Principle
Inverse probability of censoring weighting (IPCW) is a method able to handle in-
formative drop-out. Intuitively, in presence of informative drop-out a complete case
analysis is a biased approach as individuals with complete data are not representa-
tive of the population. However with an appropriate re-weighting of the individuals
with complete data, we can "re-balance" our sample and make it representative of
the population. To do so, we divide the population into sub-populations and at-
tribute weights to individuals who did not drop-out inversely proportional to the
frequency of the drop-out in the sub-population. Thanks to the weights, individuals
who did not drop-out "represent" the individuals who dropped-out. Thus, overall,
the weighted sample is representative of the population.
1
2 Continuous outcome
2.1 Generative model
To illustrate the use of IPCW in the continuous outcome case, we will consider a
longitudinal study with 2 groups (G = 0 and G = 1) and 2 timepoints (t1 and t2 )
and no other covariate. The outcome Y is normally distributed, denoted Y1 at t1
and Y2 at t2 :
" # " # " #!
Y1 |G = 0 50 1 ρ
=N , 100
Y2 |G = 0 50 − dµ1 ρ 1
" # " # " #!
Y1 |G = 1 75 1 ρ
=N , 100
Y2 |G = 1 75 − dµ2 ρ 1
2
2.2 Illustrative example 1
Consider a study were we follow depressed individual over time. They have a baseline
measurement, then are given a treatment, and then have a follow-up measurement.
We would like to assess the treatment effect in term of depression score 1 . The
population of interest contain severely and moderately depressed individuals; the
treatment may work differently in each sub-population. Unfortunately, some study
participants dropped-out and it seems that they are more likely to drop-out when
their baseline score is high.
Without drop-out, we could use a simple linear model to carry-out the analysis:
dtW.Boracle <- dcast(dtL.B, formula = id ∼ time, value.var = "Y")
dtW.Boracle$diff <- dtW.Boracle$T2-dtW.Boracle$T1
e.Boracle <- lm(diff∼1, data = dtW.Boracle)
summary(e.Boracle)$coef
1
:To simplify, there is no control group - we assume that without treatment the depression score
would be constant.
3
leading to an estimate quite close to the true value:
(-25-50)/2
[1] -37.5
With drop-out, a complete case analysis would lead to a biased estimator. In this
example, we can "see" that the estimated value is far away from the true one (even
when accouting for the uncertainty):
dtW.B <- dcast(dtL.B, formula = id + mdd ∼ time, value.var = "Yobs")
dtW.B$diff <- dtW.B$T2-dtW.B$T1
dtW.BCC <- dtW.B[!is.na(diff)]
e.BCC <- lm(diff∼1, data = dtW.BCC)
summary(e.BCC)$coef
An alternative approach would be to use a linear mixed model (i.e. full informa-
tion):
require(nlme)
e.BFI <- lme(Yobs∼time, random = ∼1|id, data = dtL.B,
na.action = na.omit)
summary(e.BFI)$tTable
which appears better than the complete case analysis but still downward biased.
This can be a bit surprising at first, but can be explained when seeing the mixed
model as a way to "impute" missing values at follow-up. The current mixed model is
misspecified (missing interaction between time and group) and it therefore use the
wrong imputation model. This is illustrated in Figure 1 (see appendix A.5 for the R
code). The bias is of opposite direction between the two mdd subgroups and same
magnitude so it would cancel out under random censoring. However here because
the severe group is more likely to be censored the bias does not cancel out.
4
Misspecified outcome model (~time) valid outcome model (~time*mdd)
80 80
Yobs
Yobs
40 40
0 0
T1 T2 T1 T2
time time
MDD group moderate (fully observed) moderate (partially observed) severe (fully observed) severe (partially observed)
Figure 1: Distribution of the observed and imputed value when using the mixed
model.
With a correct model for the outcome (i.e. adding the interaction), the mixed
would be able to impute the observations in an unbiased way:
e.BFIoracle <- lme(Yobs∼time*mdd, random = ∼1|id, data = dtL.B,
na.action = na.omit)
summary(e.BFIoracle)$tTable
Linear Hypotheses:
Estimate
timeT2 + 0.5 * timeT2:mddsevere == 0 -37.43
5
An alternative approach that does not require to specify an outcome model is to
use IPCW. It instead requires to correctly specify a model for the probability of not
dropping out at follow-up:
dtW.B$observed <- !is.na(dtW.B$T2)
e.glmW.B <- glm(observed ∼ T1, data = dtW.B,
family = binomial(link = "logit"))
coef(e.glmW.B)
(Intercept) T1
6.6357425 -0.1047988
and then compute the weights for observations with full data:
dtW.B$weight.oracle <- 1/predict(e.glmW.B, newdata = dtW.B,
type = "response")
dtW.B[observed == TRUE, sum(weight.oracle)]
[1] 2045.06
Note that the weights almost sum to the total sample size. We then perform the
complete case analysis with these weights:
dtW.BCC <- dtW.B[!is.na(diff)]
e.BIPCW <- lm(diff∼1, data = dtW.BCC, weights = dtW.BCC$weight.oracle)
summary(e.BIPCW)$coef
which gives a result very close to the true value. Here the IPCW works very well
because we have specified the correct censoring model.
6
2.3 Illustrative example 2
Consider a similar study with a different cause of drop-out. This time drop-out is
not due to baseline value but due to the severity of the disease (i.e. group): two
patients severely depressed but with different baseline score will have exactly the
same probability of drop-out while two patients, one severely depressed and the
other moderately depressed, with same baseline score will have different probability
of drop-out.
Overall the expected treatment effect is the same as before and, without drop-out,
the linear model gives the same estimates:
dtW.Loracle <- dcast(dtL.L, formula = id ∼ time, value.var = "Y")
dtW.Loracle$diff <- dtW.Loracle$T2-dtW.Loracle$T1
e.Loracle <- lm(diff∼1, data = dtW.Loracle)
summary(e.Loracle)$coef
7
With drop-out, a complete case analysis would still lead to a downward biased
estimator:
dtW.L <- dcast(dtL.L, formula = id + mdd ∼ time, value.var = "Yobs")
dtW.L$diff <- dtW.L$T2-dtW.L$T1
dtW.LCC <- dtW.L[!is.na(diff)]
e.LCC <- lm(diff∼1, data = dtW.LCC)
summary(e.LCC)$coef
for a reason similar as before, as patients from the severely depressed group will
drop more often and they benefit more from the treatmet. We can use a linear mixed
model (i.e. full information):
require(nlme)
e.LFI <- lme(Yobs∼time, random = ∼1|id, data = dtL.L, na.action = na.omit
)
summary(e.LFI)$tTable
which is better than the complete case analysis still biased because once more the
outcome model is misspecified. With a correctly specified outcome model, we would
get a much better estimate:
e.LFIoracle <- lme(Yobs∼time*mdd, random = ∼1|id, data = dtL.L, na.action
= na.omit)
glht(e.LFIoracle, linfct = "timeT2+0.5*timeT2:mddsevere=0")
Linear Hypotheses:
Estimate
timeT2 + 0.5 * timeT2:mddsevere == 0 -37.3
When using IPCW, we should model the probability of not dropping out at
follow-up as a function of the latent group:
dtW.L$observed <- !is.na(dtW.L$T2)
e.glmW.Loracle <- glm(observed ∼ mdd, data = dtW.L,
family = binomial(link = "logit"))
8
and then compute the weights for observations with full data:
dtW.L$weight.oracle <- 1/predict(e.glmW.Loracle, newdata = dtW.L,type = "
response")
dtW.L[observed == TRUE, sum(weight.oracle)]
[1] 2000
Note that the weights sum to the total sample size. We then perform the complete
case analysis with these weights:
dtW.LCC <- dtW.L[!is.na(diff)]
e.LIPCWoracle <- lm(diff∼1, data = dtW.LCC, weights = dtW.LCC$weight.
oracle)
summary(e.LIPCWoracle)$coef
which gives a result very close to the true value. A more feasible IPCW would
use the baseline score to define the weights:
e.glmW.L <- glm(observed ∼ T1, data = dtW.L,
family = binomial(link = "logit"))
dtW.L$weight <- 1/predict(e.glmW.L, newdata = dtW.L, type = "response")
dtW.L[observed == TRUE, sum(weight)]
[1] 2038.825
We then perform the complete case analysis with these new weights:
dtW.LCC <- dtW.L[!is.na(diff)]
e.LIPCW <- lm(diff∼1, data = dtW.LCC, weights = dtW.LCC$weight)
summary(e.LIPCW)$coef
9
2.4 Simulation study
The quality of the previous estimators is compared using a simulation study (see
appendix A.2):
• first under the right generative model (i.e. multivariate normal distribution)
Replicating a 1000 times and also varying the correlation coefficient leads to:
dt.simGaussian <- warper3TrialC(n = 1000, n.rep = 1000, cl = 50,
rho = c(0,0.25,0.5,0.8),
dmu = c(25,50),
piC = list(0.5,1,c(0.2,0.7)),
short = FALSE, seed = 10)
10
gg.simGaussian <- ggSimRes(dt.simGaussian)
dropout: random
−30
−35
−40
−35
−40
−35
−40
11
2.4.2 Incorrect parametric assumptions - heavy tails
Data were simulated using a student distribution with 3 degrees of freedom. See
figure below for an example:
moderate severe
0.03
0.02
T1
0.01
density
0.00
0.03
0.02
T2
0.01
0.00
−100 0 100 200 −100 0 100 200
Y
Figure 3: Histogram of the simulated outcome for one study when using a multivari-
ate student distribution. The red line indicates the best fitting Gaussian distribution.
12
Replicating a 1000 times and also varying the correlation coefficient leads to:
dt.simStudent <- warper3TrialC(n = 1000, n.rep = 1000, cl = 50,
rho = c(0,0.25,0.5,0.8),
dmu = c(25,50), df = 3,
piC = list(0.5,1,c(0.2,0.7)),
short = FALSE, seed = 10)
dropout: random
−30
−35
−40
−45
−25
−35
−40
−45
−25
−35
−40
−45
13
2.4.3 Incorrect parametric assumptions - skewed
Data were simulated as in the correctly specific case but a a gamma noise added
with shape parameter 1 and scale parameter 20. See figure below for an example:
moderate severe
0.03
0.02
T1
0.01
density
0.00
0.03
0.02
T2
0.01
0.00
0 50 100 150 200 250 0 50 100 150 200 250
Y
Figure 5: Histogram of the simulated outcome for one study when using a multivari-
ate normal distribution with added gamma distributed noise. The red line indicates
the best fitting Gaussian distribution.
14
Replicating a 1000 times and also varying the correlation coefficient leads to:
gg.simGamma <- ggSimRes(dt.simGamma, ylim = c(-45, -25))
dropout: random
−30
−35
−40
−45
−25
−35
−40
−45
−25
−35
−40
−45
15
2.4.4 Incorrect parametric assumptions - uniform
Data were simulated as in the correctly specific case but with a uniform noise added
(min 0 and max 100). See figure below for an example:
moderate severe
0.015
0.010
T1
0.005
density
0.000
0.015
0.010
T2
0.005
0.000
0 50 100 150 200 0 50 100 150 200
Y
Figure 7: Histogram of the simulated outcome for one study when using a multivari-
ate normal distribution with added gamma distributed noise. The red line indicates
the best fitting Gaussian distribution.
16
Replicating a 1000 times and also varying the correlation coefficient leads to:
dt.simUnif <- warper3TrialC(n = 1000, n.rep = 1000, cl = 50,
rho = c(0,0.25,0.5,0.8),
dmu = c(25,50), unif = c(0,100),
piC = list(0.5,0.2,c(0.2,0.7)),
short = FALSE, seed = 10)
dropout: random
−30
−35
−40
−45
−25
−35
−40
−45
−25
−35
−40
−45
17
3 Binary outcome
3.1 Illustrative example
A somehow similar approach can be used for binary endpoints. Consider now a study
comparing the survival probability at 1 year of patients treated with a new drug vs.
standard care. The population is composed of two types of patients, say some with
hypertension and some without. Survival as well as the treatment effect may differ
depending of the hypertension status. Hypertension may also affect the drop-out
probability.
18
set.seed(11)
tau <- 1
19
In absence of drop-out, we can compare the survival probabilities at 1 year using
a logistic regression:
e.oracle <- glm(responseUncensored ∼ treatment,
data = dt, family = binomial(link="logit"))
summary(e.oracle)$coef
A first idea would be to re-use the IPCW approach, first fitting a logistic model
for the probability of being observed at 1-year and then computing the weights:
e.IPCmodel <- glm(observed ∼ group*treatment, data = dt, family = binomial
(link="logit"))
dt$IPCweights <- 1/predict(e.IPCmodel, newdata = dt, type = "response")
sum(dt$IPCweights)
[1] 6305.334
Advarselsbesked:
I eval(family$initialize) : non-integer #successes in a binomial glm!
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4515849 0.04586621 9.845700 7.153939e-23
treatmentT -0.3341242 0.06411408 -5.211402 1.874189e-07
20
as we disregarded the duration of observation among the censored individuals.
Intuitively, individuals censored early are more at risk of dying and therefore should
"transfer" more weight than those censored late, e.g. just before 1 year, who don’t
really need to transfer weights. This can be perform using a survival model (here a
Cox model) and using as weights the inverse of the probability of not being censored
at the earliest between when the event occured and 1 year:
library(survival)
library(riskRegression)
e.IPCmodel2 <- coxph(Surv(eventtime,status==0) ∼ group*treatment,
data = dt, x = TRUE, y = TRUE)
iPred <- predictCox(e.IPCmodel2, newdata = dt,
time = pmin(dt$eventtime,tau)-(1e-12), diag = TRUE)$
survival
dt$IPCweights2 <- dt$observed/iPred
sum(dt$IPCweights2)
[1] 3997.757
21
Note that this estimator is implemented in the riskRegression package:
e.wglm <- wglm(regressor.event = ∼treatment,
formula.censor = Surv(eventtime,status==0)∼group*treatment,
times = 1,
data = dt[,.(eventtime,status,group,treatment)])
summary(e.wglm)
n events
4000 1409
4000 clusters
coeffients:
Estimate Std.Err 2.5% 97.5% P-value
(Intercept) 0.041108 0.056878 -0.070371 0.152587 0.4698
treatmentT -0.264725 0.082562 -0.426543 -0.102906 0.0013
exp(coeffients):
Estimate 2.5% 97.5%
(Intercept) 1.04196 0.93205 1.1648
treatmentT 0.76742 0.65276 0.9022
2
the standard errors are slightly different though
22
3.2 Simulation study
The quality of the previous estimators is compared using a simulation study. The
results are summarized by Figure 9.
0.0
estimate
−0.5
−1.0
−0.4 −0.2 0.0 0.2 0.4
−0.4 −0.2 0.0 0.2 0.4
−0.4 −0.2 0.0 0.2 0.4
oracle
23
Appendix A: Rcode (continuous case)
A.1 Generative data model
simTrialC <- function(n, rho, dmu, causeC, piC,
df = Inf, gamma = NULL, unif = NULL){
## simulate data
sigma <- 10
Sigma <- sigma^2*matrix(c(1,rho,rho,1),2,2)
24
dtL[time=="T2", probaDO := ifelse(.SD$mdd=="moderate",piC[1],piC[2])]
}else if(causeC == "baseline"){
dtL$res <- 0
Ybar <- dtL[time=="T1",mean(Y)]
dtL[mdd=="moderate", res := c((Y[1]-Ybar)/sigma,NA), by = "id"]
dtL[mdd=="severe", res := c((Y[1]-Ybar)/sigma,NA), by = "id"]
dtL[mdd=="moderate", probaDO := c(0,plogis(piC[1]*res[1])), by = "id"]
dtL[mdd=="severe", probaDO := c(0,plogis(piC[1]*res[1])), by = "id"]
dtL$res <- NULL
}
## simulate dropout
dtL[,c("dropout","Yobs") := .(rbinom(.N,prob=probaDO,size=1),Y)]
dtL[dropout==1,Yobs:=NA]
## export
dtL$probaDO <- NULL
setkeyv(dtL,"id")
return(dtL)
}
require(pbapply)
## normalize arguments
if(!is.list(piC) || length(piC)!=3){
stop("Argument \’piC\’ should be a list of length 3. \n")
}
grid.seed <- expand.grid(rho = rho,
rep = 1:n.rep)
if(length(seed)==1 && NROW(grid.seed)>1){
set.seed(seed)
grid.seed$seed <- sample.int(1e6, size = NROW(grid.seed), replace =
FALSE)
}else{
grid.seed$seed <- seed
25
}
## internal warper
.warper3TrialC <- function(iSim){ ## iSim <- 1
iOut <- NULL
for(iR in 1:length(rho)){ ## iR <- 1
iSeed <- grid.seed[grid.seed$rho==rho[iR] & grid.seed$rep == iSim,"
seed"]
## iterate
if(trace==TRUE){
ls.res <- pblapply(1:n.rep, FUN = .warper3TrialC, cl = cl)
}else{
ls.res <- lapply(1:n.rep, FUN = .warper3TrialC)
}
out <- do.call(rbind,ls.res)
## export
if(!short){
26
out$estimator <- factor(out$model, c("complete case","FI","FI.oracle"
,"IPCW","IPCW.oracle","oracle"))
out$correlation <- paste0("correlation = ", out$rho)
out$cause <- factor(out$causeC,
levels = c("random","baseline","latent"),
labels = c("dropout: random", "dropout: baseline
score","dropout: latent group"))
}else{
test.col <- c("causeC","dropout","rho","n","dmu","seed")
test.duplicated <- duplicated(out[,test.col])
out[which(test.duplicated),test.col] <- ""
}
return(out)
#+ENDSRC
return(invisible(gg))
27
}
Identify patient with missing data and get the imputed value
pred.B <- predict(e.lmm, newdata = dtL.BNA, type = "dynamic",
keep.newdata = TRUE)
pred.B$mdd <- paste(pred.B$mdd," (imputed)")
predOracle.B <- predict(eOracle.lmm, newdata = dtL.BNA, type = "dynamic",
keep.newdata = TRUE)
predOracle.B$mdd <- paste(predOracle.B$mdd," (imputed)")
data: diff.lmm
t = -149.14, df = 1999, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-35.17796 -34.26483
sample estimates:
mean of x
-34.72139
28
timeT2
-34.72139
data: diff.lmm.oracle
t = -125.01, df = 1999, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-38.02137 -36.84682
sample estimates:
mean of x
-37.4341
Linear Hypotheses:
Estimate
timeT2 + 0.5 * timeT2:mddsevere == 0 -37.43
29
Advarselsbeskeder:
1: Removed 969 rows containing non-finite values (stat_boxplot).
2: Removed 969 rows containing non-finite values (stat_boxplot).
Advarselsbeskeder:
1: Removed 969 rows containing non-finite values (stat_boxplot).
2: Removed 969 rows containing non-finite values (stat_boxplot).
30