Fitting Additive Hazards Models For Case-Cohort Studies: A Multiple Imputation Approach
Jinhyouk Jung
Department of Statistics University of Connecticut
Aug 1, 2012
JSM 2012
08/01/12
1 / 31
Outline
- Case-cohort study
- Multiple Imputation
- Additive hazards models
- Simulation studies
- Data example: Zinc concentration study
Two steps
1) Select the subcohort randomly from the full cohort.
2) Add the remaining cases in the cohort to the subcohort.
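The two steps above can be sketched in a few lines of Python. This is a minimal illustration and not the sampling code used in the study: the function name `case_cohort_sample`, the toy cohort of 100 subjects, and the subcohort size of 20 are all assumptions made for the example.

```python
import random

def case_cohort_sample(events, subcohort_size, seed=0):
    """Return (subcohort, extra_cases) as sets of subject indices.

    events[i] is 1 if subject i is a case and 0 otherwise.
    subcohort_size and seed are hypothetical illustration parameters.
    """
    rng = random.Random(seed)
    n = len(events)
    # Step 1: simple random sample of the full cohort, ignoring case status.
    subcohort = set(rng.sample(range(n), subcohort_size))
    # Step 2: add the cases that were not drawn into the subcohort.
    extra_cases = {i for i in range(n) if events[i] and i not in subcohort}
    return subcohort, extra_cases

# Toy cohort: every tenth subject is a case (10 cases among 100 subjects).
events = [1 if i % 10 == 0 else 0 for i in range(100)]
subcohort, extra_cases = case_cohort_sample(events, subcohort_size=20, seed=2012)
sampled = subcohort | extra_cases  # subjects with fully measured covariates
```

Subjects outside `sampled` are the non-cases that were not selected, which is where the expensive covariate is missing by design.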
CASE-COHORT DESIGN
INTRODUCTION: ASSUMPTION
Model assumption
- Semiparametric additive hazards models
- Why not the Cox proportional hazards model? The critical proportional hazards assumption may be violated, or the quantity of interest may be risk differences rather than relative risks.
Missingness assumption
- Missing at random (MAR): the selection of the subcohort can depend only on some covariates.
- The case-cohort design can therefore be treated as a special case of the missing covariate problem under MAR.
The missing data mechanism is determined by the conditional distribution of $S$ given $(X, \Delta, Z^{obs})$.
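Additive hazards model:
$$\lambda(t \mid Z) = \lambda_0(t) + \beta^\top Z(t) \qquad (1)$$
Estimating function for $\beta$:
$$U(\beta) = \sum_{i=1}^{n} \int_0^\infty \{Z_i(t) - \bar{Z}(t)\}\{dN_i(t) - Y_i(t)\,\beta^\top Z_i(t)\,dt\} \qquad (2)$$
where $\bar{Z}(t) = \sum_{j=1}^{n} Y_j(t) Z_j(t) \big/ \sum_{j=1}^{n} Y_j(t)$, $N_i(t)$ is the counting process for the observed failure, and $Y_i(t)$ is the at-risk indicator.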
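There exists an explicit solution to the estimating equations $U(\beta) = 0$, taking the following form:
$$\hat{\beta} = \left[\sum_{i=1}^{n} \int_0^\infty Y_i(t)\{Z_i(t) - \bar{Z}(t)\}^{\otimes 2}\,dt\right]^{-1} \left[\sum_{i=1}^{n} \int_0^\infty \{Z_i(t) - \bar{Z}(t)\}\,dN_i(t)\right] \qquad (3)$$
where $a^{\otimes 2} = aa^\top$.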
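For time-fixed covariates and no tied observation times, the explicit solution can be evaluated interval by interval, because the at-risk covariate average is constant between consecutive observed times. The sketch below assumes the standard Lin and Ying closed form, $\hat{\beta} = A^{-1} b$ with $A = \sum_i \int Y_i(t)\{Z_i - \bar{Z}(t)\}^{\otimes 2} dt$ and $b = \sum_i \int \{Z_i - \bar{Z}(t)\}\,dN_i(t)$. The function names `solve` and `lin_ying_estimate` and the two-subject toy data are hypothetical, and this is not the implementation used in the talk.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * p
    for r in reversed(range(p)):
        x[r] = (M[r][p] - sum(M[r][c] * x[c] for c in range(r + 1, p))) / M[r][r]
    return x

def lin_ying_estimate(times, events, covariates):
    """Closed-form additive hazards estimate (assumes time-fixed covariates, no ties)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    p = len(covariates[0])
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    prev_t = 0.0
    for k, i in enumerate(order):
        risk = order[k:]  # subjects at risk on the interval (prev_t, times[i]]
        zbar = [sum(covariates[j][a] for j in risk) / len(risk) for a in range(p)]
        width = times[i] - prev_t
        for j in risk:
            d = [covariates[j][a] - zbar[a] for a in range(p)]
            for a in range(p):
                for c in range(p):
                    A[a][c] += width * d[a] * d[c]
        if events[i]:  # a failure at times[i] contributes Z_i - Zbar(times[i]) to b
            for a in range(p):
                b[a] += covariates[i][a] - zbar[a]
        prev_t = times[i]
    return solve(A, b)

# Two subjects, one covariate: the subject with Z = 0 fails first.
beta_hat = lin_ying_estimate([1.0, 2.0], [1, 1], [[0.0], [1.0]])
# Here A = 0.5 and b = -0.5, so beta_hat == [-1.0]
```

In a multiple imputation analysis, a routine of this kind would be applied once to each completed data set.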
Instead of filling in a single value for each missing value, two or more acceptable values representing a distribution of possibilities are used. Rubin (1987) suggested three steps for MI:
1) Imputing step: specify a reasonable imputation model that approximates the true distributional relationship between the unobserved data and the available information.
2) Analysis step: the complete-data analysis is performed M times, once on each completed data set.
3) Combining step: the M estimates are combined to obtain the so-called repeated-imputation inference.
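The $M$ completed-data estimates are combined as
$$\bar{\beta}_M = \frac{1}{M} \sum_{m=1}^{M} \hat{\beta}_m. \qquad (4)$$
With completed-data variance estimates $\hat{V}(\hat{\beta}_m)$, $m = 1, \ldots, M$, the within-imputation and between-imputation variances are
$$\bar{V}_M = \frac{1}{M} \sum_{m=1}^{M} \hat{V}(\hat{\beta}_m), \qquad B_M = \frac{1}{M-1} \sum_{m=1}^{M} (\hat{\beta}_m - \bar{\beta}_M)(\hat{\beta}_m - \bar{\beta}_M)^\top,$$
and the total variance is
$$T_M = \bar{V}_M + \left(1 + \frac{1}{M}\right) B_M. \qquad (5)$$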
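The combining step can be illustrated for a single coefficient with a short Python sketch of Rubin's rules. The function name `pool_rubin` and the three toy estimates are assumptions made for the example. The slides state the rules for a vector of coefficients, so this scalar version is a simplification.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool M completed-data estimates of one coefficient with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)        # average of the M point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance, divisor M - 1
    total = within + (1 + 1 / m) * between
    return pooled, total

# Toy input: M = 3 completed-data estimates and their variance estimates.
pooled, total = pool_rubin([0.48, 0.52, 0.50], [0.010, 0.012, 0.011])
```

The factor (1 + 1/M) inflates the between-imputation variance to account for using a finite number of imputations.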
where a, b, c and d were assigned specific values to generate a missing rate of about 50 percent in the full cohort. In our study, we used (a, b, c, d) = (0.68, 1, 1, 1), and the sample size was n = 2000.
- Generate 1000 data sets
- Full imputation model such as $Z^{mis} \sim \gamma_0 + \gamma_1 X + \gamma_2 \Delta + \gamma_3 Z^{obs}$
- MICE package in R
  - Bayesian linear regression imputation (Rubin, 1987): norm
  - Predictive mean matching (Little, 1988): pmm
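To make the difference between the two imputation methods concrete, the sketch below implements both in plain Python for a single predictor. It is written in the spirit of Bayesian linear regression imputation and predictive mean matching and is not a reproduction of the MICE routines. The reduction to one predictor, the flat prior, the function names, and the `donors=5` setting are all assumptions made for illustration.

```python
import math
import random

def fit_centered_ols(x, y):
    """Least squares of y on centered x; returns the quantities needed for draws."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - ybar - slope * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
    return n, xbar, ybar, sxx, slope, rss

def draw_parameters(fit, rng):
    """Draw (intercept at xbar, slope, sigma) from the posterior under a flat prior."""
    n, xbar, ybar, sxx, slope, rss = fit
    sigma = math.sqrt(rss / rng.gammavariate((n - 2) / 2, 2.0))  # rss / chi-square(n - 2)
    a = rng.gauss(ybar, sigma / math.sqrt(n))
    b = rng.gauss(slope, sigma / math.sqrt(sxx))
    return a, b, sigma

def impute_norm(x_obs, y_obs, x_mis, rng):
    """Impute from the drawn regression line plus residual noise."""
    fit = fit_centered_ols(x_obs, y_obs)
    a, b, sigma = draw_parameters(fit, rng)
    xbar = fit[1]
    return [a + b * (x - xbar) + rng.gauss(0.0, sigma) for x in x_mis]

def impute_pmm(x_obs, y_obs, x_mis, rng, donors=5):
    """Impute an observed value borrowed from one of the closest-predicted donors."""
    fit = fit_centered_ols(x_obs, y_obs)
    _, xbar, ybar, _, slope, _ = fit
    a, b, _ = draw_parameters(fit, rng)
    pred_obs = [ybar + slope * (x - xbar) for x in x_obs]  # fitted line for donors
    imputed = []
    for x in x_mis:
        target = a + b * (x - xbar)                        # drawn line for recipient
        nearest = sorted(range(len(y_obs)),
                         key=lambda i: abs(pred_obs[i] - target))[:donors]
        imputed.append(y_obs[rng.choice(nearest)])
    return imputed

# Toy data: 50 observed pairs and 10 subjects with the outcome missing.
rng = random.Random(2012)
x_obs = [rng.gauss(0, 1) for _ in range(50)]
y_obs = [1.0 + 0.5 * x + rng.gauss(0, 0.3) for x in x_obs]
x_mis = [rng.gauss(0, 1) for _ in range(10)]
norm_draws = impute_norm(x_obs, y_obs, x_mis, rng)
pmm_draws = impute_pmm(x_obs, y_obs, x_mis, rng)
```

The norm-style draws can fall anywhere on the real line, whereas the pmm-style draws are always values that were actually observed.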
[Figure: percentage bias (PB%) and average length of CI (AL) plotted against correlation for the Full Cohort, MI norm, and MI pmm methods.]
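Averaged Estimates

| Correlation | $\beta_{mis}$: Full | $\beta_{mis}$: NORM | $\beta_{mis}$: PMM | $\beta_{obs}$: Full | $\beta_{obs}$: NORM | $\beta_{obs}$: PMM |
|---|---|---|---|---|---|---|
| 0.0 | -0.101 | -0.098 | -0.097 | 0.099 | 0.099 | 0.096 |
| 0.1 | -0.102 | -0.101 | -0.101 | 0.099 | 0.098 | 0.098 |
| 0.2 | -0.100 | -0.098 | -0.097 | 0.099 | 0.098 | 0.098 |
| 0.3 | -0.096 | -0.093 | -0.093 | 0.098 | 0.096 | 0.096 |
| 0.4 | -0.098 | -0.091 | -0.090 | 0.096 | 0.093 | 0.092 |
| 0.5 | -0.099 | -0.099 | -0.098 | 0.102 | 0.101 | 0.100 |
| 0.6 | -0.098 | -0.098 | -0.096 | 0.096 | 0.095 | 0.093 |
| 0.7 | -0.101 | -0.099 | -0.099 | 0.100 | 0.098 | 0.097 |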
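Coverage Rate (%)

| Correlation | $\beta_{mis}$: Full | $\beta_{mis}$: NORM | $\beta_{mis}$: PMM | $\beta_{obs}$: Full | $\beta_{obs}$: NORM | $\beta_{obs}$: PMM |
|---|---|---|---|---|---|---|
| 0.0 | 93.19 | 95.20 | 97.79 | 94.09 | 95.19 | 95.89 |
| 0.1 | 95.00 | 94.30 | 95.90 | 95.09 | 95.09 | 94.70 |
| 0.2 | 95.69 | 94.49 | 95.19 | 96.29 | 96.09 | 96.19 |
| 0.3 | 93.00 | 94.69 | 94.00 | 94.99 | 94.19 | 95.09 |
| 0.4 | 95.40 | 94.29 | 94.59 | 95.60 | 95.60 | 95.10 |
| 0.5 | 95.39 | 96.19 | 95.69 | 95.49 | 95.99 | 95.59 |
| 0.6 | 94.50 | 94.59 | 93.19 | 95.19 | 94.79 | 94.29 |
| 0.7 | 96.90 | 95.87 | 94.49 | 96.59 | 94.99 | 94.29 |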
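The summary measures reported for the simulations (average estimate, percentage bias, coverage rate, and average length of the confidence interval) can be computed from the replicate results as sketched below. The definitions here are assumptions: percentage bias is taken relative to the true value, and intervals are 95% Wald intervals (estimate plus or minus 1.96 standard errors). The slides do not state which interval construction was used, and the function name and toy inputs are hypothetical.

```python
def simulation_summary(estimates, std_errors, true_value, z=1.96):
    """Summarize simulation replicates for one coefficient (assumed Wald intervals)."""
    intervals = [(e - z * s, e + z * s) for e, s in zip(estimates, std_errors)]
    n_rep = len(estimates)
    average = sum(estimates) / n_rep
    covered = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    return {
        "average_estimate": average,
        "PB%": 100 * (average - true_value) / true_value,   # percentage bias
        "CR%": 100 * covered / n_rep,                       # coverage rate
        "AL": sum(hi - lo for lo, hi in intervals) / n_rep, # average CI length
    }

# Toy input: four replicates with a common standard error and true value 0.5.
summary = simulation_summary([0.48, 0.52, 0.50, 0.55], [0.02] * 4, true_value=0.5)
```

In this toy input only the last interval misses the true value, so three of the four replicates count as covered.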
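Table: Estimates for $\beta_{mis}$ and $\beta_{obs}$ at different sizes and under simulation scenario II.

| Setting | Method | Parameter | Average Est. | AL | PB% | CR% |
|---|---|---|---|---|---|---|
| n = 500, = 0.5 | Full Cohort | mis | 0.500 | 0.471 | 0.116 | 94.8 |
| n = 500, = 0.5 | Full Cohort | obs | 0.402 | 0.464 | 0.747 | 96.0 |
| n = 500, = 0.5 | MI logreg | mis | 0.480 | 0.901 | -3.862 | 91.8 |
| n = 500, = 0.5 | MI logreg | obs | 0.409 | 0.493 | 2.368 | 96.1 |
| n = 500, = 0.8 | Full Cohort | mis | 0.498 | 0.575 | -1.777 | 96.4 |
| n = 500, = 0.8 | Full Cohort | obs | 0.398 | 0.596 | -0.425 | 95.1 |
| n = 500, = 0.8 | MI logreg | mis | 0.491 | 0.989 | -0.299 | 91.8 |
| n = 500, = 0.8 | MI logreg | obs | 0.397 | 0.651 | -0.671 | 96.0 |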
We assume $p_0 = \frac{1}{18}$ and $p_1 = \frac{1}{2}$. The corresponding probabilities of $Z^{mis} = 1$ are 0.5 and 0.26 when = 0.5 and 0.8, respectively.
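| Setting | Method | Parameter | Average Est. | AL | PB% | CR% |
|---|---|---|---|---|---|---|
| n = 500, = 0.5 | Full Cohort | mis | 0.507 | 0.819 | 1.502 | 95.3 |
| n = 500, = 0.5 | Full Cohort | obs | 0.409 | 1.640 | 2.345 | 94.2 |
| n = 500, = 0.5 | MI logreg | mis | 0.511 | 1.440 | 2.289 | 96.7 |
| n = 500, = 0.5 | MI logreg | obs | 0.383 | 1.681 | -4.180 | 94.3 |
| n = 500, = 0.8 | Full Cohort | mis | 0.507 | 1.006 | 1.460 | 94.5 |
| n = 500, = 0.8 | Full Cohort | obs | 0.410 | 1.533 | 2.719 | 90.7 |
| n = 500, = 0.8 | MI logreg | mis | 0.472 | 1.733 | -5.413 | 92.7 |
| n = 500, = 0.8 | MI logreg | obs | 0.391 | 1.562 | -2.150 | 90.7 |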
There are 81 cases and 350 controls among 431 subjects in total. Zinc was measured for 56 cases and 67 controls, so zinc data are available for 123 subjects (missing rate 71%). Because measuring metal concentrations in precious oesophageal biopsy tissue is expensive and difficult, only some subjects in the cohort were chosen to have the concentrations of zinc and other metals measured. This can be treated as a special case of a missing covariate problem. Software: ahaz package (Anders, 2011) in R.
Table: MI approach using norm for zncent and logreg for anyhist

| Covariate | Estimate | Std. Error | Z | Pr(>\|z\|) | lower 95% | upper 95% |
|---|---|---|---|---|---|---|
| sexMale | 0.120 | 0.100 | 1.208 | 0.256 | -0.075 | 0.317 |
| agepill | 0.008 | 0.006 | 1.297 | 0.194 | -0.004 | 0.021 |
| bahistE | 0.306 | 0.108 | 2.824 | 0.004 | 0.093 | 0.519 |
| basehistMD | 0.631 | 0.203 | 3.110 | 0.001 | 0.233 | 1.029 |
| basehistMoD | 1.006 | 0.504 | 2.101 | 0.035 | 0.071 | 2.052 |
| basehistSeD | 1.762 | 0.857 | 2.055 | 0.039 | 0.081 | 3.442 |
| basehistNOS | 0.747 | 1.003 | 0.744 | 0.456 | -1.219 | 2.714 |
| basehistCIS | 9.536 | 9.652 | 0.987 | 0.323 | -9.382 | 20.845 |
| anyhist | 0.170 | 0.122 | 1.375 | 0.168 | -0.072 | 0.412 |
| zncent | -0.066 | 0.031 | -2.128 | 0.033 | -0.127 | -0.005 |
Concluding Remarks
- A case-cohort study poses a missing data problem. Since case-cohort data follow a MAR mechanism, the multiple imputation method can be applied.
- To yield valid results when imputing missing data, it is crucial that the imputation model include all available variables.
- MI norm, MI pmm, and MI logreg as imputation methods yield reasonable results in terms of estimates, coverage rate (CR), average length of CI (AL), and percentage bias (PB) in the simulation study.
- Large bias can arise when the amount of missing data is greater than 75%, but this depends on the sample size and on the correlation among covariates related to the target missing variable. However, these methods might still produce useful results despite high missing rates.