Survival Analysis With Correlated Endpoints: Joint Frailty-Copula Models

SPRINGER BRIEFS IN STATISTICS

JSS RESEARCH SERIES IN STATISTICS

Takeshi Emura
Shigeyuki Matsui
Virginie Rondeau

Survival Analysis
with Correlated
Endpoints
Joint Frailty-
Copula Models
SpringerBriefs in Statistics

JSS Research Series in Statistics

Editors-in-Chief
Naoto Kunitomo, Graduate School of Economics, Meiji University, Bunkyo-ku,
Tokyo, Japan
Akimichi Takemura, The Center for Data Science Education and Research,
Shiga University, Bunkyo-ku, Tokyo, Japan

Series Editors
Genshiro Kitagawa, Meiji Institute for Advanced Study of Mathematical Sciences,
Nakano-ku, Tokyo, Japan
Tomoyuki Higuchi, The Institute of Statistical Mathematics, Tachikawa, Tokyo,
Japan
Toshimitsu Hamasaki, Office of Biostatistics and Data Mg, National Cerebral
and Cardiovascular Center, Suita, Osaka, Japan
Shigeyuki Matsui, Graduate School of Medicine, Nagoya University, Nagoya,
Aichi, Japan
Manabu Iwasaki, School of Data Science, Yokohama City University, Yokohama,
Tokyo, Japan
Yasuhiro Omori, Graduate School of Economics, The University of Tokyo,
Bunkyo-ku, Tokyo, Japan
Masafumi Akahira, Institute of Mathematics, University of Tsukuba, Tsukuba,
Ibaraki, Japan
Takahiro Hoshino, Department of Economics, Keio University, Tokyo, Japan
Masanobu Taniguchi, Department of Mathematical Sciences/School,
Waseda University/Science & Engineering, Shinjuku-ku, Japan
The current research of statistics in Japan has expanded in several directions in line
with recent trends in academic activities in the area of statistics and statistical
sciences over the globe. The core of these research activities in statistics in Japan
has been the Japan Statistical Society (JSS). This society, the oldest and largest
academic organization for statistics in Japan, was founded in 1931 by a handful of
pioneer statisticians and economists and now has a history of about 80 years. Many
distinguished scholars have been members, including the influential statistician
Hirotugu Akaike, who was a past president of JSS, and the notable mathematician
Kiyosi Itô, who was an earlier member of the Institute of Statistical Mathematics
(ISM), which has been a closely related organization since the establishment of
ISM. The society has two academic journals: the Journal of the Japan Statistical
Society (English Series) and the Journal of the Japan Statistical Society (Japanese
Series). The membership of JSS consists of researchers, teachers, and professional
statisticians in many different fields including mathematics, statistics, engineering,
medical sciences, government statistics, economics, business, psychology, educa-
tion, and many other natural, biological, and social sciences. The JSS Series of
Statistics aims to publish recent results of current research activities in the areas of
statistics and statistical sciences in Japan that otherwise would not be available in
English; they are complementary to the two JSS academic journals, both English
and Japanese. Because the scope of a research paper in academic journals inevitably
has become narrowly focused and condensed in recent years, this series is intended
to fill the gap between academic research activities and the form of a single
academic paper. The series will be of great interest to a wide audience of
researchers, teachers, professional statisticians, and graduate students in many
countries who are interested in statistics and statistical sciences, in statistical theory,
and in various areas of statistical applications.

More information about this series at http://www.springer.com/series/13497


Takeshi Emura · Shigeyuki Matsui · Virginie Rondeau

Survival Analysis
with Correlated Endpoints
Joint Frailty-Copula Models

Takeshi Emura
Graduate Institute of Statistics
National Central University
Taoyuan City, Taiwan

Shigeyuki Matsui
Department of Biostatistics
Graduate School of Medicine
Nagoya University
Nagoya, Aichi, Japan

Virginie Rondeau
INSERM CR1219 (Biostatistic)
University of Bordeaux
Bordeaux Cedex, France

ISSN 2191-544X          ISSN 2191-5458 (electronic)
SpringerBriefs in Statistics
ISSN 2364-0057          ISSN 2364-0065 (electronic)
JSS Research Series in Statistics
ISBN 978-981-13-3515-0  ISBN 978-981-13-3516-7 (eBook)
https://doi.org/10.1007/978-981-13-3516-7

Library of Congress Control Number: 2019931819

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2019
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

Typical cancer clinical trials evaluate at least two survival endpoints for patients.
For instance, a trial may adopt overall survival (OS) as the primary endpoint and
time-to-tumour progression (TTP) as the secondary endpoint.¹ Often, the major
goal of a cancer clinical trial is to estimate the effect of treatments or prognostic
factors on either one or both of the endpoints. Nowadays, many databases for
cancers offer individual patient data containing at least two endpoints and a number
of prognostic factors (gene expressions, age, residual tumour, cancer stage, etc.).
In many cancers, the two endpoints can be strongly correlated. Indeed, one
endpoint may be a surrogate endpoint of the other endpoint. This implies the need
for an appropriate statistical model for dependence between event times. However,
the standard tools, such as Cox regression, are not suitable to analyze two event
times simultaneously, especially due to dependent censoring. For instance, early
death can censor the occurrence of tumour progression. In this case, one cannot
regard death as an independent censoring event from a progression event.
Failing to account properly for dependent censoring may produce biased
results in Cox regression.
It is even more challenging to analyze survival data when patients are collected
from multiple studies (meta-analysis or multicenter analysis) and patients have a
large number of prognostic factors. Researchers may need advanced statistical
methods to characterize heterogeneity across multiple studies and to perform
feature selection. In the analysis of such complex survival data, it is insufficient
to apply the traditional Cox regression that can only deal with a single event, a
single study, independent censoring, and a small number of prognostic factors.
The book provides advanced statistical models that incorporate heterogeneity of
a population in terms of frailty and dependence between endpoints in terms of
copulas. Our aim is to analyze two endpoints simultaneously, where one event time
is called the terminal event time (e.g., OS) and the other is called the nonterminal
event time (e.g., TTP). Our main statistical tool is the joint frailty-copula
model, which is particularly useful for analyzing two event times simultaneously using
meta-analytic data.

¹ Disease-free survival (DFS) and progression-free survival (PFS) are other frequently used
endpoints in practice. DFS, PFS, and TTP are discussed in more detail in Chap. 2.

We focus on the pair (OS, TTP) because it is perhaps the most reasonable
example to explain why the joint model is useful for correlated endpoints. To understand
the natural history of cancer, it is informative to identify the prognostic
factors of TTP and OS, as well as the association between TTP and OS, through a
single joint model. Once a joint model for the pair (OS, TTP) is established, TTP
and other prognostic factors can be used to predict OS (Chap. 5).² From a methodological
perspective, the pair (OS, TTP) naturally illustrates the semi-competing
risks relationship between two endpoints (Chap. 3).
The book also discusses a feature selection method for incorporating
high-dimensional covariates, such as gene expressions, into the joint frailty-copula
model. The book also aims to contribute to the development of personalized
medicine by providing a dynamic survival prediction formula for a cancer patient,
where copulas can effectively formulate the influence of tumour progression on
survival.
To allow readers to apply the statistical methods of this book to their own data,
we include case studies that demonstrate the R package joint.Cox (freely available
from CRAN; https://cran.r-project.org/). With this package, readers can easily
reproduce the results of our case studies and analyze their own data.
Our emphasis is placed on survival data arising from cancer research. Such data
typically include survival endpoints, clinical covariates, and gene expressions
collected on cancer patients. Accordingly, we provide case studies using survival
data for cancer patients. Of course, statistical methods presented in this book can be
applied to many branches of medical research, such as research on AIDS,
cardiovascular disorders, and neurological disorders. We also have seen that the
statistical methods are useful outside medicine, especially in the field of reliability.

Use as a Textbook

This book may be used as a textbook for a one-semester course aimed at master's
students or a short course aimed at (bio)statisticians. Instructors (readers) may
begin with Chap. 2. Chapter 1 can be assigned for preview. After that, instructors
(readers) may proceed gradually to teach (learn) advanced statistical methods in
Chaps. 3–5. Chapters 2 and 3 contain exercises useful for homework/self-study.
Chapter 2 provides an introduction to multivariate survival analysis and reviews
many of the basic terms used throughout the book. Our review of the term endpoint
is unique and is not available in other textbooks on survival analysis. This
chapter also reviews frailty models and copula models, the core elements of multivariate
survival models. Studying these models helps readers understand the subsequent
material.

² However, it is not suitable to predict OS using endpoints such as DFS and PFS that include OS in
their definitions (Chap. 2). While it is technically possible to fit the joint frailty-copula model to
other pairs, such as (OS, PFS) and (PFS, TTP), we shall not discuss this approach in this book.

Chapter 3 introduces semi-competing risks data collected from multiple studies
(meta-analysis). This type of data is becoming easier to obtain through free open-source
software and public data repositories. However, the relevant statistical methods are
rarely discussed in standard textbooks on survival analysis, although a
number of journal articles on this theme have appeared in the last decade. The aim of Chap. 3 is to
provide the basis for fitting the joint frailty-copula model to analyze semi-competing
risks data.
Chapter 4 contains a feature selection method for high-dimensional gene
expressions and the compound covariate method to be applied to the joint
frailty-copula model. We detail the idea of compound covariate that was advocated
by John Wilder Tukey in 1993 and its application to the joint frailty-copula model.
Chapter 5 considers a dynamic method for predicting survival for a
cancer patient under the joint frailty-copula model. The prediction formulas
incorporate the genetic and clinical covariates collected at patient entry as well
as the tumour progression history evolving after entry.
Chapter 6 collects additional remarks on the previous chapters, and several open
problems for future research. This might help find research topics for students and
researchers.

Use as a Reference Book

This book is designed to allow readers to read each chapter independently. Each
chapter defines all terminologies and symbols with minimal references to other
chapters. Also, each chapter provides a case study that helps readers understand
how to apply the statistical methods and how to interpret the results. Readers who
wish to analyze gene expression data may read Chap. 4. Readers who wish to
develop a clinical prediction model may read Chap. 5. Readers who find Chaps. 4 and 5
difficult may wish to read Chaps. 2 and 3 first to build up basic skills.

Taoyuan City, Taiwan        Takeshi Emura
Nagoya, Japan               Shigeyuki Matsui
Bordeaux Cedex, France      Virginie Rondeau

Acknowledgements

We thank the series editor, Dr. Toshimitsu Hamasaki, for his invitation to write this
book and his valuable comments on this book. We also thank Ms. Sayaka
Shinohara for providing excellent figures for Chaps. 1, 3, and 5, and Mr. Jia-Han
Shih for his effort to check the solutions to exercises. The contents of our book have
been presented in a number of places, including the CM Statistics conferences (2015 in
London; 2016 in Seville; 2017 in London), the 2nd Pacific Rim Cancer Biostatistics
Workshop (2017 in Kanazawa), the South Taiwan Statistics Conference (in 2015,
2016, and 2017), and seminars arranged by Virginie Rondeau (2014 at the University
of Bordeaux), Hayashi Kenichi (2016 at Keio University), and Ha Il-Do (2018 at
Pukyong National University). We thank all the organizers of the conferences and
seminars. We also thank all those who attended our talks and gave us
valuable comments, including David Cox, Helene Jacqmin-Gadda, Isao Yokota,
Masataka Taguri, Mengjiao Peng, Roel Braekers, Motomi (Tomi) Mori, and Xiang
Liming.
Emura T. is financially supported by Ministry of Science and Technology,
Taiwan (MOST 107-2118-M-008-003-MY3). Matsui S. is financially supported by
a Grant-in-Aid for Scientific Research (16H06299) and CREST, JST
(JPMJCR1412) and from the Ministry of Education, Culture, Sports, Science and
Technology of Japan. Rondeau V. is financially supported by the Fondation ARC
pour la recherche sur le cancer, France.

Contents

1 Setting the Scene
  1.1 Endpoints and Censoring
  1.2 Motivations for Investigating Correlated Endpoints
    1.2.1 Understanding Disease Progression Mechanisms
    1.2.2 Dynamic Prediction of Death
    1.2.3 Validating Surrogate Endpoints
  1.3 Copulas and Bivariate Survival Models: A Brief History
  References
2 Introduction to Multivariate Survival Analysis
  2.1 Endpoints and Censoring
  2.2 Basic Terminologies
  2.3 Cox Regression
    2.3.1 R Survival Package
  2.4 Likelihood-Based Method
    2.4.1 Spline and Penalized Likelihood
  2.5 Clustered Survival Data
    2.5.1 Shared Frailty Model
    2.5.2 Likelihood Function
    2.5.3 Penalized Likelihood and Spline
  2.6 Copulas for Bivariate Event Times
    2.6.1 Measures of Dependence
    2.6.2 Residual Dependence
    2.6.3 Likelihood Function
  2.7 Exercises
  References
3 The Joint Frailty-Copula Model for Correlated Endpoints
  3.1 Introduction
  3.2 Semi-competing Risks Data
  3.3 Joint Frailty-Copula Model
  3.4 Penalized Likelihood with Splines
  3.5 Case Study: Ovarian Cancer Data
  3.6 Technical Note 1: Numerical Maximization
  3.7 Technical Note 2: LCV and Choice of κ1 and κ2
  3.8 Exercises
  References
4 High-Dimensional Covariates in the Joint Frailty-Copula Model
  4.1 Introduction
  4.2 Tukey’s Compound Covariate
  4.3 Univariate Feature Selection
  4.4 Meta-Analytic Data with High-Dimensional Covariates
  4.5 The Joint Model with Compound Covariates
  4.6 The Joint Model with Ridge or Lasso Predictor
  4.7 Prediction of Patient-Level Survival Function
  4.8 Simulations
    4.8.1 Simulation Designs
    4.8.2 Simulation Results
  4.9 Case Study: Ovarian Cancer Data
    4.9.1 Compound Covariate
    4.9.2 Fitting the Joint Frailty-Copula Model
    4.9.3 Patient-Level Survival Function
  4.10 Concluding Remarks
  References
5 Personalized Dynamic Prediction of Survival
  5.1 Accurate Prediction of Survival
  5.2 Framework of Dynamic Prediction
    5.2.1 Conditional Failure Function
    5.2.2 Conditional Hazard Function
  5.3 Prediction Formulas Under the Joint Frailty-Copula Model
  5.4 Estimating Prediction Formulas
  5.5 Case Study: Ovarian Cancer Data
  5.6 Discussions
  References
6 Future Developments
  6.1 Recurrent Events Data
  6.2 Kendall’s τ in Meta-Analysis
  6.3 Validation of Surrogate Endpoints
  6.4 Left Truncation
  6.5 Interactions
    6.5.1 (Gene × Gene) Interaction
    6.5.2 (Gene × Time) Interaction
  6.6 Parametric Failure Time Models
  6.7 Compound Covariate
  References
Appendix A: Spline Basis Functions
Appendix B: R Codes for the Ovarian Cancer Data Analysis
Appendix C: Derivation of Prediction Formulas
Index
Abbreviations

AIC          Akaike Information Criterion
CC           Compound Covariate
CI           Confidence Interval
DFS          Disease-Free Survival
FGM copula   Farlie-Gumbel-Morgenstern copula
GEO          Gene Expression Omnibus
IPD          Individual Patient Data
LCV          Likelihood Cross-Validation
OS           Overall Survival
PFS          Progression-Free Survival
RR           Relative Risk
SD           Standard Deviation
SE           Standard Error
TCGA         The Cancer Genome Atlas
TTP          Time-to-Tumour Progression

Notations

$a \in A$                              An element $a$ belonging to a set $A$
$\mathbf{a}'$                          The transpose of a column vector $\mathbf{a}$
$E[X \mid Y]$                          The conditional expectation of $X$ given $Y$
$f : A \mapsto B$                      A function from the domain $A$ to the range $B$
$\dot{f}(x) = df(x)/dx$                The first derivative of a function $f$
$\ddot{f}(x) = d^2 f(x)/dx^2$          The second derivative of a function $f$
$\arg\max_{\varphi} \ell(\varphi)$     The argument that maximizes a function $\ell$
$N(0, 1)$                              The standard normal distribution
$I(\cdot)$                             The indicator function: $I(A) = 1$ if $A$ is true, or $I(A) = 0$ if $A$ is false
$\Pr(A \mid B)$                        The conditional probability of $A$ given $B$
$\mathrm{tr}(X)$                       The trace of a square matrix $X$

Chapter 1
Setting the Scene

Abstract This chapter introduces the main theme of the book: statistical analysis of
correlated endpoints using a joint/bivariate survival model. We first review statistical
issues in the analysis of survival data involving correlated endpoints and censoring.
We then illustrate our motivations for investigating the interrelationship between endpoints
using joint/bivariate survival models. We finally describe how copulas and
bivariate survival models have developed through the literature.

Keywords Censoring · Competing risk · Cox regression · Dependent censoring ·
Endpoint · Informative dropout · Multivariate survival analysis · Overall survival ·
Time-to-tumour progression

1.1 Endpoints and Censoring

Survival analysis is a branch of statistics concerned with event times. In many examples
of survival analysis, the event time is time-to-death, as the name survival
suggests. Time-to-death from any cause is termed overall survival (OS), which is considered
the most objective measure of patient health in cancer research (Chap. 2).
More generally, event times can be time-to-tumour progression (TTP), progression-free
survival (PFS), disease-free survival (DFS), and so on, which are all important
measures of health status for cancer patients.
Multivariate survival analysis is a branch of survival analysis, which deals with
two or more events per subject. For instance, one may observe both TTP and OS for
a cancer patient. In analysis of such multivariate survival data, the key element is an
appropriate account for dependence between event times. Throughout this book, we
focus on frailty and copulas as main tools to model dependence between event times
and to develop estimation and prediction methods.
Nowadays, statisticians and medical researchers can easily obtain multivariate
survival data for patients through free open-source software (e.g., R Bioconductor
software) and public data repositories (e.g., GEO and TCGA). Many
databases offer individual patient data containing two or more endpoints (OS, TTP,
PFS, etc.) and a number of covariates (gene expressions, age, cancer stage, etc.).

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2019
T. Emura et al., Survival Analysis with Correlated Endpoints,
JSS Research Series in Statistics, https://doi.org/10.1007/978-981-13-3516-7_1

However, analyzing such multivariate survival data remains a challenging task
because it requires a model for the association between endpoints. For
instance, there exists a positive association between OS and TTP for cancer patients
(Burzykowski et al. 2008; Piedbois and Croswell 2008; Rondeau et al. 2015; Emura
et al. 2017, 2018). To build a prognostic model for OS, multivariate survival models
are required to stipulate the form of the joint distribution of OS and TTP. In addition,
adequate statistical methods are required to estimate the degree of association
between OS and TTP.
Analysis of survival data is further complicated by censoring. If patient follow-up
is terminated before the endpoints are observed, the endpoints are said to be censored.
Censoring is unavoidable in survival data: the study has a planned end of follow-up, or
patients may decide to withdraw from the study. If the censoring mechanism involves dropout
due to worsening symptoms, it may introduce bias into statistical inference.
Generally, if an endpoint of interest is censored by any mechanism related to the
endpoint, this phenomenon is referred to as dependent censoring (Emura and Chen
2018). If the endpoint of interest is TTP, one might regard death as a censoring event;
however, statistical inference on TTP may be biased due to dependence between
tumour progression and death. Most traditional survival analysis methods
give valid results only under the independent censoring assumption, that is, when
censoring mechanisms are unrelated to the endpoint of interest.
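
The bias caused by treating death as independent censoring can be illustrated with a small simulation. The sketch below is only an illustration under assumptions that are not taken from the book: progression and death times are exponential, they share a gamma-distributed frailty with mean 1 and variance theta, and the function names (simulate, kaplan_meier_at, true_survival) are hypothetical. Under these assumptions the marginal survival function of TTP has a closed form, so the Kaplan-Meier estimate that censors patients at death can be compared with the truth.

```python
import math
import random


def simulate(n, theta, r1, r2, seed=1):
    """Return (observed time, progression observed?) pairs; death censors progression."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Shared frailty with mean 1 and variance theta (assumed gamma distribution).
        u = max(rng.gammavariate(1.0 / theta, theta), 1e-12)
        x = rng.expovariate(u * r1)  # time to tumour progression
        d = rng.expovariate(u * r2)  # time to death
        data.append((min(x, d), x <= d))
    return data


def kaplan_meier_at(data, t):
    """Kaplan-Meier estimate of Pr(X > t), treating deaths as censored observations."""
    at_risk = len(data)
    surv = 1.0
    for time, progressed in sorted(data):
        if time > t:
            break
        if progressed:
            surv *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    return surv


def true_survival(t, theta, r1):
    """Marginal Pr(X > t) under the gamma frailty: (1 + theta * r1 * t) ** (-1 / theta)."""
    return (1.0 + theta * r1 * t) ** (-1.0 / theta)


data = simulate(20000, theta=2.0, r1=1.0, r2=1.0)
km = kaplan_meier_at(data, 1.0)
truth = true_survival(1.0, theta=2.0, r1=1.0)
print(f"Kaplan-Meier with death as censoring: {km:.3f}")
print(f"True marginal survival of TTP:        {truth:.3f}")
```

In this setting the Kaplan-Meier estimate lies above the true marginal survival of TTP: frail patients tend to die early and leave the risk set before progressing, so the patients who remain are not representative of the whole population.
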
The Cox proportional hazards regression model (Cox 1972) has been one of the
traditional survival analysis tools among statisticians and medical researchers. The
partial likelihood approach (Cox 1972) provides a statistical inference procedure
for the Cox model. However, the Cox model with the partial likelihood approach
is clearly insufficient to analyze multivariate survival data, such as the bivariate
survival data in which TTP and OS are two endpoints of interest. In addition, it
may not be valid to fit the Cox model for TTP by treating death as
independent censoring, because the independent censoring assumption underlying
the partial likelihood approach may fail. For a similar reason, it may not be
valid to fit the Cox model for OS if the censoring involves dropout due
to worsening symptoms. Furthermore, if one wishes to study the link between
TTP and OS, the Cox model for OS adjusted for TTP as a time-dependent covariate
is not appropriate, since the Cox model can only handle exogenous (external)
time-dependent covariates (that is, the covariate process develops independently of the
event process). This is why the alternative framework of joint/multivariate models
for two time-to-event endpoints has been developed (Rondeau et al. 2015).
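
For reference, the Cox model and its partial likelihood can be written in standard textbook notation as follows. The symbols are assumptions made here and may differ from the notation adopted in Chap. 2: $\lambda_0$ is an unspecified baseline hazard, $Z_i$ is the covariate vector of patient $i$, $t_i$ is the observed time, $\delta_i$ is the event indicator, $R(t)$ is the set of patients still under follow-up at time $t$, and event times are assumed to have no ties.

```latex
\lambda(t \mid Z_i) = \lambda_0(t) \exp(\beta' Z_i),
\qquad
L(\beta) = \prod_{i=1}^{n}
  \left[ \frac{\exp(\beta' Z_i)}{\sum_{j \in R(t_i)} \exp(\beta' Z_j)} \right]^{\delta_i}
```

Each factor compares a patient who has an event with the patients still at risk at that time. The comparison is fair only if the patients removed by censoring are representative of those who remain, which is the independent censoring assumption.
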
This book aims to provide statistical methods that appropriately account for the
issues mentioned above.

1.2 Motivations for Investigating Correlated Endpoints

Researchers may need a joint/bivariate survival model to specify the interrelationship
between endpoints. This book introduces frailty and copulas as main tools
for constructing a joint model. Listed below are specific motivations for adopting a
joint model to deal with correlated endpoints.

1.2.1 Understanding Disease Progression Mechanisms

In clinical trials, researchers often evaluate the treatment effect on selected endpoints
such as OS and TTP. Clearly, these endpoints are associated, and evaluating how
TTP relates to OS is important for understanding disease progression mechanisms and for
developing anti-cancer drugs (Sherrill et al. 2008; Michiels et al. 2009; Rondeau et al.
2015). In the medical literature, the treatment effect on endpoints is typically estimated by
fitting the Cox model separately for each endpoint, but the results do not allow one to
study the dependence among endpoints. With a joint/multivariate model that accounts
for dependence among endpoints, the results give more insight into the natural history
of the disease and may provide physicians with a useful patient management
strategy for dealing with multiple event risks. Chapter 3 introduces our recently developed
approach based on the joint frailty-copula model for TTP and OS (Emura
et al. 2017). Chapter 4 extends the joint frailty-copula model to incorporate high-dimensional
covariates based on Tukey’s compound covariate method (Matsui 2006;
Emura et al. 2012, 2019). Building such a prognostic model with high-dimensional
covariates is an urgent issue for promoting personalized or predictive medicine through
statistical methodologies (Matsui et al. 2015). Chapter 5 proposes a personalized
prediction formula to predict OS according to TTP and covariates.

1.2.2 Dynamic Prediction of Death

In cancer studies, predicting the risk of death is fundamental for improving patient care
and treatment strategies. There is great interest in dynamic prediction, which predicts
the risk of death at a certain moment by utilizing the record of intermediate events (van
Houwelingen and Putter 2011; Mauguen et al. 2013, 2015; Rondeau et al. 2017;
Sène et al. 2016; Emura et al. 2018). For instance, dynamic prediction can utilize
tumour progression histories (e.g., relapse) evolving over time to predict survival for
an individual patient. Clearly, tumour progression histories are related to survival, as
a patient often dies shortly after tumour progression. This implies
that the probability of death can increase substantially after tumour
progression (Fig. 1.1). Accordingly, a bivariate survival model (joint model) for
correlated endpoints is essential to build a prediction model. Such joint models for
dependent endpoints have been developed under frailty models (Mauguen et al. 2013,
2015) and are utilized for the development of personalized medicine (Sène et al.
2016). Chapter 5 introduces a different approach based on the joint frailty-copula
model, which allows one to utilize meta-analytic data.

Fig. 1.1 The predicted probability of death can substantially increase after experiencing tumour
progression. Prediction is made at time t according to tumour progression histories. If a patient
experiences tumour progression before time t, TTP information x can be used for prediction. The
details are discussed in Chap. 5

1.2.3 Validating Surrogate Endpoints

Measuring dependence between endpoints is an essential process to validate surrogate
endpoints to be adopted in clinical trials. The formal statistical validation is
possible by using meta-analysis (Burzykowski et al. 2005; Shi and Sargent 2009;
Rotolo et al. 2018). In meta-analytic studies, the process of validating surrogate
endpoints utilizes two different kinds of dependence: study-level dependence and
individual-level dependence (Buyse et al. 2008; Burzykowski et al. 2001, 2005).
A strong individual-level dependency between progression-free survival (PFS)
and OS was found in patients with colorectal cancer (Buyse et al. 2008), head and neck
cancer (Michiels et al. 2009), gastric cancer (Oba et al. 2013) and other cancers. A
strong individual-level dependence between time-to-recurrence and OS was observed
in advanced and early colon cancers (Alonso and Molenberghs 2008). These analyses
adopted copulas to measure dependence between two endpoints. We shall discuss
the topic of surrogate endpoints shortly as an important direction for future research
in Chap. 6.

1.3 Copulas and Bivariate Survival Models: A Brief History

A copula is a function that links the marginal distributions of two random variables to form
a joint distribution. The concept of copulas was introduced by a mathematician, Abe Sklar,
in his study of probabilistic metric space (Sklar 1959). From a modeling point of
view, copulas allow one to create a dependence structure between two variables by
specifying a copula function. Remarkably, a copula function does not restrict the
structure of the marginal distributions. Consequently, measures of dependence, such
as Kendall’s tau, can be derived from a copula without being influenced by the marginal
distributions. More about copulas can be found in the book of Nelsen (2006).
Apparently, the applications of copulas in multivariate survival analysis became
active after David George Clayton introduced his bivariate survival model (Clayton
1978). His work yielded one of the most important copulas for bivariate survival
analysis, later known as the Clayton copula. The Clayton copula is a special case of
Archimedean copulas (Genest and MacKay 1986) that contain several useful copulas,
such as the Gumbel and Frank copulas. On the other hand, Clayton’s model is also
regarded as the gamma frailty model (Oakes 1989). More details shall be discussed
in Chap. 2.
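
As a minimal sketch of these ideas in R, the code below writes the Clayton copula in its standard form, C(u, v) = (u^(-θ) + v^(-θ) − 1)^(-1/θ) with θ > 0, for which Kendall's tau equals θ/(θ + 2). The function names, the exponential marginal survival functions, and their rates are hypothetical choices made only for illustration; the notation used in the book is given in Chap. 2.

```r
# Clayton copula in its standard form (assumed here for illustration):
# C_theta(u, v) = (u^(-theta) + v^(-theta) - 1)^(-1/theta), theta > 0
clayton_copula <- function(u, v, theta) {
  (u^(-theta) + v^(-theta) - 1)^(-1 / theta)
}

# Kendall's tau under the Clayton copula depends on theta only
clayton_tau <- function(theta) theta / (theta + 2)

# Two arbitrary marginal survival functions
# (exponential, with hypothetical rates 0.5 and 0.2)
S_x <- function(x) exp(-0.5 * x)
S_y <- function(y) exp(-0.2 * y)

# Joint survival function obtained by plugging the margins into the copula
joint_survival <- function(x, y, theta) clayton_copula(S_x(x), S_y(y), theta)

theta   <- 2
tau     <- clayton_tau(theta)           # does not involve S_x or S_y
p_joint <- joint_survival(1, 2, theta)  # Pr(X > 1, Y > 2) under the copula
p_indep <- S_x(1) * S_y(2)              # same probability under independence
```

Changing the rates in S_x and S_y alters p_joint but leaves tau unchanged, which is the sense in which the dependence measure is free of the marginal distributions.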
One of the most successful papers on copula-based survival models is
Burzykowski et al. (2001), who developed the two-step method for analyzing dependence
between two correlated endpoints. In the design of cancer clinical trials with
surrogate endpoints, the current consensus is to base the validation of surrogate
endpoints on the copula-based two-step method (Burzykowski et al. 2005). An R package for
implementing the two-step method was recently developed by Rotolo et al. (2018).
While the two-step approach considers dependence between two event times via
copulas, its estimation algorithm relies on the assumption of independent censoring.
A difficulty arises if one event censors the other. For instance, early
death censors the occurrence of tumour progression, and hence, TTP is dependently
censored by death. It is therefore not valid to apply the two-step method by
treating TTP and OS as bivariate event times subject to independent censoring. This
problem is known as competing risks or dependent censoring. If one fits the Cox
model to the TTP endpoint by treating death as independent censoring, the estimates
of the effects of prognostic factors are systematically biased (Emura and Chen 2016,
2018; Moradian et al. 2017).
Fine et al. (2001) introduced the concept of semi-competing risks in which a ter-
minal event censors a nonterminal event, but not vice versa. The statistical approach
developed by Fine et al. (2001) provides a valid way to fit Clayton’s model to data
with TTP and OS. Their statistical approach was developed under Clayton’s model
and it was later extended to Archimedean copula models by Wang (2003). Chen
(2012) further extended the copula models to implement semiparametric regression
analysis on the transformation Cox model.
From a methodological point of view, copulas offer a unified strategy for modeling/analyzing
survival data. For instance, the goodness-of-fit test of Emura et al.
(2010) for Archimedean copulas is more general than that of Shih (1998) that is

tailored for Clayton’s model. The likelihood-based method of Chen (2012) for
copulas is more general than the moment-based method of Fine et al. (2001) for
Clayton’s model.
Indeed, copula-based methods are adaptable to survival data with complex dependence
structures, such as clustered survival data (Rotolo et al. 2013; Emura et al.
2017; Peng et al. 2018), dependent competing risks data (Lo and Wilke 2010; Chen
2010; de Uña-Álvarez and Veraverbeke 2013, 2017; Emura and Michimae 2017;
Shih and Emura 2018), dependently censored data with one covariate (Braekers and
Veraverbeke 2005) or high-dimensional covariates (Emura and Chen 2016), depen-
dently truncated data (Chaieb et al. 2006; Emura and Murotani 2015; Emura and
Pan 2017), multivariate survival data with complex association pattern (Barthel et al.
2018), and recurrent event data (Ling et al. 2016; Li et al. 2019).
In summary, copulas have provided flexible survival models and unified statistical
methods. Here, copulas stipulate a dependence structure between two endpoints while
they impose no restriction on their marginal distributions. Consequently, copulas
provide measures of dependence, such as Kendall’s tau, that are free from the model
specifications of the marginal survival distributions. One can choose any copula
from a large pool of existing copulas. One can also choose any
specific type of regression model for the marginal survival distributions, e.g., the Cox
model with a parametric or nonparametric baseline hazard function. This modeling
strategy, which is adopted in this book, provides considerable flexibility/adaptability
to different types of survival data. Copulas will likely continue to be at the heart of
modeling survival data with correlated endpoints.

References

Alonso A, Molenberghs G (2008) Evaluating time to cancer recurrence as a surrogate marker for
survival from an information theory perspective. Stat Methods Med Res 17(5):497–504
Barthel N, Geerdens C, Killiches M, Janssen P, Czado C (2018) Vine copula based likelihood
estimation of dependence patterns in multivariate event time data. Comput Stat Data Anal 117:109–127
Braekers R, Veraverbeke N (2005) A copula-graphic estimator for the conditional survival function
under dependent censoring. Can J Stat 33:429–447
Burzykowski T, Buyse M, Piccart-Gebhart MJ et al (2008) Evaluation of tumor response, disease
control, progression-free survival, and time to progression as potential surrogate end points in
metastatic breast cancer. J Clin Oncol 26(12):1987–1992
Burzykowski T, Molenberghs G, Buyse M (eds) (2005) The Evaluation of Surrogate Endpoints.
Springer, New York
Burzykowski T, Molenberghs G, Buyse M, Geys H, Renard D (2001) Validation of surrogate end
points in multiple randomized clinical trials with failure time end points. Appl Stat 50(4):405–422
Buyse M, Burzykowski T, Michiels S, Carroll K (2008) Individual- and trial-level surrogacy in
colorectal cancer. Stat Methods Med Res 17:467–475
Chaieb LL, Rivest LP, Abdous B (2006) Estimating survival under a dependent truncation.
Biometrika 93(3):655–669
Chen YH (2010) Semiparametric marginal regression analysis for dependent competing risks under
an assumed copula. J R Stat Soc Series B Stat Methodol 72:235–251

Chen YH (2012) Maximum likelihood analysis of semicompeting risks data with semiparametric
regression models. Lifetime Data Anal 18:36–57
Clayton DG (1978) A model for association in bivariate life tables and its application in epidemio-
logical studies of familial tendency in chronic disease incidence. Biometrika 65(1):141–151
Cox DR (1972) Regression models and life-tables (with discussion). J R Stat Soc Series B Stat
Methodol 34:187–220
de Uña-Álvarez J, Veraverbeke N (2013) Generalized copula-graphic estimator. TEST
22(2):343–360
de Uña-Álvarez J, Veraverbeke N (2017) Copula-graphic estimation with left-truncated and right-
censored data. Statistics 51(2):387–403
Emura T, Chen YH (2016) Gene selection for survival data under dependent censoring, a copula-
based approach. Stat Methods Med Res 25(6):2840–2857
Emura T, Chen YH (2018) Analysis of survival data with dependent censoring, copula-based
approaches, JSS Research Series in Statistics, Springer
Emura T, Chen YH, Chen HY (2012) Survival prediction based on compound covariate under
Cox proportional hazard models. PLoS ONE 7(10):e47627. https://doi.org/10.1371/journal.pone.0047627
Emura T, Lin CW, Wang W (2010) A goodness-of-fit test for Archimedean copula models in the
presence of right censoring. Comput Stat Data Anal 54:3033–3043
Emura T, Matsui S, Chen HY (2019) compound.Cox: univariate feature selection and compound
covariate for predicting survival. Comput Methods Programs Biomed 168:21–37
Emura T, Michimae H (2017) A copula-based inference to piecewise exponential models under
dependent censoring, with application to time to metamorphosis of salamander larvae. Environ
Ecol Stat 24(1):151–173
Emura T, Murotani K (2015) An algorithm for estimating survival under a copula-based dependent
truncation model. TEST 24(4):734–751
Emura T, Nakatochi M, Murotani K, Rondeau V (2017) A joint frailty-copula model between
tumour progression and death for meta-analysis. Stat Methods Med Res 26(6):2649–2666
Emura T, Nakatochi M, Matsui S, Michimae H, Rondeau V (2018) Personalized dynamic prediction
of death according to tumour progression and high-dimensional genetic factors: meta-analysis
with a joint model. Stat Methods Med Res 27(9):2842–2858
Emura T, Pan CH (2017) Parametric likelihood inference and goodness-of-fit for dependently left-
truncated data, a copula-based approach. Stat Pap. https://doi.org/10.1007/s00362-017-0947-z
Fine JP, Jiang H, Chappell R (2001) On semi-competing risks data. Biometrika 88:907–920
Genest C, MacKay J (1986) The joy of copulas: bivariate distributions with uniform marginals. The
Am Stat 40(4):280–283
Lo SM, Wilke RA (2010) A copula model for dependent competing risks. J Roy Stat Soc Ser C
(Appl Stat) 59(2):359–376
Li Z, Chinchilli VM, Wang M (2019) A Bayesian joint model of recurrent events and a terminal
event. Biometrical Journal 61(1):187–202
Ling M, Tao Hu, Sun J (2016) Cox regression analysis of dependent interval-censored failure time
data. Comput Stat Data Anal 103:79–90
Matsui S (2006) Predicting survival outcomes using subsets of significant genes in prognostic
marker studies with microarrays. BMC Bioinformatics 7:156
Matsui S, Buyse M, Simon R (eds) (2015) Design and analysis of clinical trials for predictive
medicine, vol 72. CRC Press, New York
Mauguen A, Rachet B, Mathoulin-Pélissier S et al (2013) Dynamic prediction of risk of death using
history of cancer recurrences in joint frailty models. Stat Med 32(30):5366–5380
Mauguen A, Rachet B, Mathoulin-Pélissier S et al (2015) Validation of death prediction after breast
cancer relapses using joint models. BMC Med Res Methodol 15(1):27
Michiels S, Le Maître A, Buyse M et al (2009) Surrogate endpoints for overall survival in
locally advanced head and neck cancer: meta-analyses of individual patient data. Lancet Oncol
10(4):341–350

Moradian H, Larocque D, Bellavance F (2017) Survival forests for data with dependent
censoring. Stat Methods Med Res. https://doi.org/10.1177/0962280217727314
Nelsen RB (2006) An Introduction to Copulas, 2nd edn. Springer, New York
Oakes D (1989) Bivariate survival models induced by frailties. J Am Stat Assoc 84:487–493
Oba K, Paoletti X, Alberts S et al (2013) Disease-free survival as a surrogate for overall survival in
adjuvant trials of gastric cancer: a meta-analysis. J Natl Cancer Inst 105(21):1600–1607
Peng M, Xiang L, Wang S (2018) Semiparametric regression analysis of clustered survival data
with semicompeting risks. Comput Stat Data Anal 124:53–70
Piedbois P, Croswell MJ (2008) Surrogate endpoints for overall survival in advanced colorectal
cancer: a clinician’s perspective. Stat Methods Med Res 17(5):519–527
Rondeau V, Mauguen A, Laurent A, Berr C, Helmer C (2017) Dynamic prediction models for
clustered and interval-censored outcomes: investigating the intra-couple correlation in the risk of
dementia. Stat Methods Med Res 26(5):2168–2183
Rondeau V, Pignon JP, Michiels S (2015) A joint model for dependence between clustered times to
tumour progression and deaths: a meta-analysis of chemotherapy in head and neck cancer. Stat
Methods Med Res 24(6):711–729
Rotolo F, Legrand C, Van Keilegom I (2013) A simulation procedure based on copulas to generate
clustered multi-state survival data. Comput Methods Programs Biomed 109(3):305–312
Rotolo F, Paoletti X, Michiels S (2018) surrosurv: an R package for the evaluation of failure time
surrogate endpoints in individual patient data meta-analyses of randomized clinical trials. Comput
Methods Programs Biomed 155:189–198
Sène M, Taylor JM, Dignam JJ et al (2016) Individualized dynamic prediction of prostate cancer
recurrence with and without the initiation of a second treatment: development and validation.
Stat Methods Med Res 25(6):2972–2991
Sherrill B, Amonker M, Wu Y et al (2008) Relationship between effects on time-to-disease pro-
gression and overall survival in studies of metastatic breast cancer. Br J Cancer 99:1542–1548
Shi Q, Sargent DJ (2009) Meta-analysis for the evaluation of surrogate endpoints in cancer clinical
trials. Int J Clin Oncol 14(2):102–111
Shih JH (1998) A goodness-of-fit test for association in a bivariate survival model. Biometrika
85(1):189–200
Shih JH, Emura T (2018) Likelihood-based inference for bivariate latent failure time models with
competing risks under the generalized FGM copula. Comput Stat 33(3):1293–1323
Sklar A (1959) Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de
Statistique de L’Université de Paris 8:229–231
van Houwelingen HC, Putter H (2011) Dynamic Prediction in Clinical Survival Analysis. CRC
Press, New York
Wang W (2003) Estimating the association parameter for copula models under dependent censoring.
J R Stat Soc Series B Stat Methodol 65(1):257–273
Chapter 2
Introduction to Multivariate Survival
Analysis

Abstract This chapter introduces a framework for multivariate survival analysis
that is used in later chapters. We first explain the concepts of endpoint and censoring
in medical follow-up studies. Next, we review the basic tools in survival analysis,
such as the survival/hazard function, Cox regression, and likelihood-based method.
Finally, we introduce two major procedures for describing dependence among event
times: (i) the shared frailty models for clustered survival data and (ii) the copula models
for bivariate survival data. We provide some exercises at the end of this chapter.

Keywords Censoring · Copula · Cox regression · Endpoint · Frailty ·
Independent censoring · Overall survival · Time-to-tumour progression

2.1 Endpoints and Censoring

In survival analysis, the term survival time refers to the time elapsed from an origin
to the occurrence of an event. In many medical studies, the origin may be the time at
study entry that can be the start of a medical treatment, the initiation of a randomized
experiment, or the operation date of surgery. In epidemiological studies, the origin
is often the date of birth. The event may be the occurrence of death, cancer relapse,
or tumour progression, which depends on the research design.
The term endpoint has the same definition as survival time, but is more specifically
used as a primary measure of evaluating medical treatments. Time-to-death is also
called overall survival. For instance, if one is interested in measuring the effect of
chemotherapy or radiotherapy in locally advanced head and neck cancer, the primary
endpoint is overall survival (Michiels et al. 2009; Le Tourneau et al. 2009).
Several different endpoints have been employed to measure a clinically convincing
effect of treatments or drugs for cancer patients (Pazdur 2008; Piedbois and Croswell
2008; Le Tourneau et al. 2009; Soria et al. 2010; Hamasaki et al. 2016). Endpoints
should be well-defined and unambiguous measures that objectively assess clinically
important aspects of a patient. The most popular endpoint is overall survival, which
is precisely defined as follows:

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2019
T. Emura et al., Survival Analysis with Correlated Endpoints,
JSS Research Series in Statistics, https://doi.org/10.1007/978-981-13-3516-7_2

Definition 1 Overall survival (OS) is defined as the time elapsed from study
entry to death from any cause.

Owing to the unambiguity of the definition of death, OS has been the gold standard
endpoint in many cancer studies (Pazdur 2008; Shi and Sargent 2009; Michiels et al.
2009; Oba et al. 2013).
Another endpoint of interest is the elapsed time from study entry to any increase
in tumour size, appearance of new tumour, or distant metastasis (collectively called
progression):

Definition 2 Time-to-tumour progression (TTP) is defined as the time elapsed
from study entry to the first evidence of tumour progression.

In cancer research, TTP may stand for time-to-progression rather than time-to-
tumour progression. However, the distinction between time-to-progression and time-
to-tumour progression is not always clear in the literature, and hence they may be
used interchangeably. In either case, a careful and precise definition of “progression
event” is necessary by following some guideline (e.g., RECIST guideline; Eisenhauer
et al. 2009).
The occurrence of death is not considered as a tumour progression event. Hence,
OS and TTP are distinct event times. The first occurring event between TTP and OS
is referred to as progression-free survival (PFS), i.e., PFS = min{OS, TTP}. Hence,
OS and PFS are not distinct event times as OS is part of PFS. Since TTP may be
shorter than OS in patients with cancer-related deaths, TTP and PFS generally allow
faster clinical trials, compared with those evaluating OS.
Disease-free survival (DFS) is another endpoint defined as the time from a medical
treatment until recurrence of disease or death from any cause. DFS is similar to PFS,
but is more specifically used for the adjuvant setting after surgery or radiotherapy,
where long survivors are expected.
The endpoints such as OS, TTP, PFS, and DFS are popular in applications of
survival analysis to cancer research. The adequate choice of the endpoints depends
on the type of diseases (or types of cancers), sample sizes (or powers), study period,
and the goal of the research. Some discussions about OS, TTP, PFS, and DFS can
be found in the medical literature (Pazdur 2008; Green et al. 2008; Soria et al. 2010;
Cheema and Burkes 2013) and in the biostatistical literature (Rondeau et al. 2015;
Emura et al. 2017, 2018; Matsui et al. 2015; Hamasaki et al. 2016; Sugimoto et al.
2017).
Unlike OS, the definitions of TTP, PFS, and DFS vary with the clinicians, i.e., the
tumour progression may be defined by their own timing and assessment criteria. This
often brings some ambiguity as a primary measure of evaluating a medical treatment.
Hence, the definition of tumour progression or disease recurrence should clearly
be described in the study protocol. In addition, tumour progression assessments
generally should be verified by central reviewers blinded to the treatments under
study, especially when the study is not blinded (e.g., in the absence of a placebo
control).
In medical follow-up studies, OS for a patient may be censored by some
mechanism that terminates follow-up. For example, a clinical trial typically has a
predetermined follow-up period. Clinicians may not obtain OS information for those
patients who are still alive after the study end. For another example, clinicians may
fail to obtain OS if a patient drops out of (or withdraws from) the trial before he/she
dies. In such circumstances, clinicians acquire partial information about OS as the
elapsed time until the censoring point.

Definition 3 If OS is the primary endpoint, censoring time is defined as the
time elapsed from study entry to the censoring event due to dropout (or with-
drawal) from a study, or the end of a predetermined follow-up period. If TTP
is the primary endpoint, the censoring event also includes death.

In the presence of censoring, clinicians can access either OS or censoring time,
whichever comes first for a patient. OS is available if the patient dies during the
follow-up. Alternatively, censoring time is available if the patient is alive at the end
of the study period or at the time of dropout.
Similarly, PFS and DFS are subject to censoring due to the termination of follow-
up or patients’ early withdrawals. Unlike PFS and DFS, TTP can also be censored by
death because TTP and OS are distinct event times as noted previously. Therefore,
the use of TTP endpoint for cancer patients would be more effective than PFS and
DFS in situations where the majority of deaths are unrelated to cancer.
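
The censoring rules for OS, TTP, and PFS described above can be sketched in R as follows. This is only an illustration under stated assumptions: the latent progression and death times are simulated from exponential distributions with arbitrary rates, the follow-up is fixed at 600 days, and all variable names are hypothetical.

```r
set.seed(1)
n <- 5
ttp_latent <- rexp(n, rate = 1 / 300)  # hypothetical times to progression (days)
os_latent  <- rexp(n, rate = 1 / 500)  # hypothetical times to death (days)
cens       <- rep(600, n)              # end of a predetermined follow-up period

# OS: observed if death occurs before the censoring time
os_time   <- pmin(os_latent, cens)
os_status <- as.integer(os_latent <= cens)

# TTP: censored by death as well as by the end of follow-up
ttp_time   <- pmin(ttp_latent, os_latent, cens)
ttp_status <- as.integer(ttp_latent <= pmin(os_latent, cens))

# PFS = min{OS, TTP}: censored only by the end of follow-up
pfs_time   <- pmin(ttp_latent, os_latent, cens)
pfs_status <- as.integer(pmin(ttp_latent, os_latent) <= cens)

data.frame(ttp_time, ttp_status, os_time, os_status, pfs_time, pfs_status)
```

In this sketch the TTP and PFS times coincide, but a death before progression counts as an event for PFS and as censoring for TTP.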
Multivariate survival analysis is a branch of survival analysis that deals with more
than one event time per subject. For instance, one may observe both TTP and OS
for a cancer patient. In analysis of such multivariate survival data, the key element
is an appropriate account for dependence between event times.

2.2 Basic Terminologies

This section summarizes basic terminologies and notations used in survival analysis.
Consider random variables, defined as
• X: event time
• C: censoring time
Due to censoring, either one of X or C is observed. One can observe X if an event
comes faster than censoring (X ≤ C). On the other hand, one cannot exactly observe
X if censoring comes faster than an event (X > C). Even if the exact value of X is

unknown for the censored case, X is known to be greater than C. What we observe
are the first occurring time (min{X, C}) and the censoring status (X ≤ C or X > C).
Survival data consist of $\{(T_i, \delta_i);\ i = 1, \ldots, n\}$, where
• $T_i$: event time or censoring time, whichever comes first,
• $\delta_i$: censoring indicator ($\delta_i = 1$ if $T_i$ is event time, or $\delta_i = 0$ if $T_i$ is censoring time).
One can write $T_i = \min\{X_i, C_i\}$ and $\delta_i = I(X_i \le C_i)$, where $I(\cdot)$ is the indicator
function.
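
The construction of the observed data from the event and censoring times can be sketched in R as below. The four event times and censoring times are hypothetical values chosen only to show one censored case and one tie.

```r
x      <- c(5.2, 3.1, 8.4, 2.7)  # hypothetical event times X_i
c_time <- c(6.0, 2.5, 9.0, 2.7)  # hypothetical censoring times C_i

t_obs <- pmin(x, c_time)          # T_i = min{X_i, C_i}
delta <- as.integer(x <= c_time)  # delta_i = I(X_i <= C_i)

cbind(t_obs, delta)
```

For the second subject, only the censoring time 2.5 is recorded, so the analyst knows that the event time exceeds 2.5 without knowing its exact value.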
Survival data often include covariates, such as gender, tumour size, and cancer
stage. In medical applications, covariates are usually prognostic factors that are
associated with event time. With covariates, the dataset consists of $\{(T_i, \delta_i, \mathbf{Z}_i);\ i = 1, \ldots, n\}$, where
• $\mathbf{Z}_i = (Z_{i1}, \ldots, Z_{ip})'$: p-dimensional covariates.
Throughout this chapter, we impose the following assumption:

Independent censoring assumption: X and C are conditionally independent
given Z.

This assumption is imposed on most statistical methods for analyzing survival data,
such as Cox regression (Sect. 2.3). If the independent censoring assumption does
not hold, X can be dependently censored by C. However, throughout this book, the
symbol C always represents censoring time that satisfies the independent censoring
assumption.
The survival function is defined as $S(t|\mathbf{Z}_i) \equiv \Pr(X_i > t|\mathbf{Z}_i)$, which is the probability
that the patient with covariates $\mathbf{Z}_i$ is event-free at time t. This is the patient-level
survival function as it is conditional on patient characteristics. The survival function
at $\mathbf{Z}_i = \mathbf{0}$ is called the baseline survival function and denoted as $S_0(t) = S(t|\mathbf{Z}_i = \mathbf{0})$.
Hereafter, we suppose that $S(t|\mathbf{Z}_i)$ is continuous and differentiable in t. The
instantaneous probability of experiencing an event between t and t + dt is
$S(t|\mathbf{Z}_i) - S(t + dt|\mathbf{Z}_i)$, where dt is an infinitely small number. Since this probability
tends to zero as dt → 0, we consider the rate by dividing by dt such that

$$f(t|\mathbf{Z}_i) = \frac{S(t|\mathbf{Z}_i) - S(t + dt|\mathbf{Z}_i)}{dt} = -\frac{dS(t|\mathbf{Z}_i)}{dt}.$$

This is the density function.
The density function represents the frequency of events at time t. While the density
is an important measure in epidemiology or demography, it is not frequently used
in prognostic analysis of a patient. This is because a patient or clinician is more
interested in the risk than the frequency.
We formulate the risk of a patient as the instantaneous event rate between t and
t + dt given that the patient is surviving at time t. The risk, expressed as a function
of t, is called the hazard function:

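Definition 4 The hazard function is defined as

$$\lambda(t|\mathbf{Z}_i) \equiv \frac{\Pr(t \le X_i < t + dt \mid X_i \ge t, \mathbf{Z}_i)}{dt} = -\frac{d}{dt}\log S(t|\mathbf{Z}_i).$$

The cumulative hazard function is defined as

$$\Lambda(t|\mathbf{Z}_i) = \int_0^t \lambda(u|\mathbf{Z}_i)\,du.$$

The hazard function at $\mathbf{Z}_i = \mathbf{0}$ is called the baseline hazard function and denoted
as $\lambda_0(t) = \lambda(t|\mathbf{Z}_i = \mathbf{0})$. Also, the baseline cumulative hazard function is defined as
$\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$.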
The survival function and the cumulative hazard function are related through
$S(t|\mathbf{Z}_i) = \exp\{-\Lambda(t|\mathbf{Z}_i)\}$. The hazard function is written as
$\lambda(t|\mathbf{Z}_i) = f(t|\mathbf{Z}_i)/S(t|\mathbf{Z}_i)$.
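
These two relationships follow from Definition 4. A short derivation, written as a sketch in the notation of this section and using the continuity of the survival function (so that $\Pr(X_i \ge t|\mathbf{Z}_i) = S(t|\mathbf{Z}_i)$), is:

$$\lambda(t|\mathbf{Z}_i) = \frac{\Pr(t \le X_i < t + dt \mid \mathbf{Z}_i)}{dt \cdot \Pr(X_i \ge t \mid \mathbf{Z}_i)} = \frac{f(t|\mathbf{Z}_i)}{S(t|\mathbf{Z}_i)} = -\frac{dS(t|\mathbf{Z}_i)/dt}{S(t|\mathbf{Z}_i)} = -\frac{d}{dt}\log S(t|\mathbf{Z}_i).$$

Integrating both sides from 0 to t and using $S(0|\mathbf{Z}_i) = 1$ gives

$$\Lambda(t|\mathbf{Z}_i) = \int_0^t \lambda(u|\mathbf{Z}_i)\,du = -\log S(t|\mathbf{Z}_i), \qquad \text{so that} \qquad S(t|\mathbf{Z}_i) = \exp\{-\Lambda(t|\mathbf{Z}_i)\}.$$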
A parametric model is defined by a survival function or hazard function that
has a specified distributional form such as the exponential, Weibull, and log-normal
distributions. In parametric models, the effects of covariates on survival also have
a specific form. An example is a Weibull model $S(t|\mathbf{Z}_i) = \exp(-\lambda t^{\nu} e^{\boldsymbol{\beta}'\mathbf{Z}_i})$, $t \ge 0$,
where $\lambda > 0$ is a scale parameter, $\nu > 0$ is a shape parameter, and $\boldsymbol{\beta}$ are regression
coefficients. It can be shown that $S(t|\mathbf{Z}_i) = S_0(t)^{\exp(\boldsymbol{\beta}'\mathbf{Z}_i)}$ for $t \ge 0$, where
$S_0(t) = S(t|\mathbf{Z}_i = \mathbf{0}) = \exp(-\lambda t^{\nu})$ is the baseline survival function. With this model, the
effects of the covariates on survival are captured by $\boldsymbol{\beta}$.
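
The Weibull model gives a convenient way to check the relationships among the survival, density, hazard, and cumulative hazard functions numerically. The R sketch below assumes a single covariate and hypothetical parameter values (λ = 0.01, ν = 1.5, β = 0.7); the function names are ours and are not part of any package.

```r
# Weibull model S(t|Z) = exp(-lambda * t^nu * exp(beta * Z));
# the parameter values are hypothetical
lambda <- 0.01; nu <- 1.5; beta <- 0.7

surv   <- function(t, z) exp(-lambda * t^nu * exp(beta * z))
hazard <- function(t, z) lambda * nu * t^(nu - 1) * exp(beta * z)
dens   <- function(t, z) hazard(t, z) * surv(t, z)
cumhaz <- function(t, z) lambda * t^nu * exp(beta * z)

t <- c(0.5, 1, 2, 5, 10)
z <- 1

# S(t|Z) = S0(t)^exp(beta * Z)
check_power <- all.equal(surv(t, z), surv(t, 0)^exp(beta * z))
# S(t|Z) = exp{-Lambda(t|Z)}
check_cumhaz <- all.equal(surv(t, z), exp(-cumhaz(t, z)))
# lambda(t|Z) = f(t|Z) / S(t|Z)
check_ratio <- all.equal(hazard(t, z), dens(t, z) / surv(t, z))
# Lambda(t|Z) as the numerical integral of the hazard over (0, 5)
check_integral <- all.equal(cumhaz(5, z),
                            integrate(hazard, 0, 5, z = z)$value,
                            tolerance = 1e-3)
```

Each check compares two ways of computing the same quantity, so a failed comparison would indicate a coding error rather than a property of the data.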
A semi-parametric model is defined by a survival function or hazard function that
has a specified form of covariate effects on survival without a specified distributional
form. Semi-parametric models are more flexible and usually fit better to data than
parametric models.
To make statistical inference on semi-parametric models, some mild assumptions
should still be made for the unspecified part. For instance, the baseline survival
function S0 (·) is assumed to be a decreasing step function with jumps at observed
times of death, or the baseline hazard function λ0 (·) is assumed to be a weighted sum
of spline basis functions. In either case, these assumptions do not overly restrict
the shape of the survival or hazard function because a large number of parameters
are determined by the data.
Unfortunately, parametric models, such as the exponential, Weibull and log-
normal models, may not adequately fit survival data from cancer patients. This
implies that the survival experience of cancer patients does not show a simple pattern,
probably because they may experience complex treatment regimens and disease
progression. This is why semi-parametric models are more useful and widely applied
in medical research. One may still accept the assumptions that the hazard function
is continuous, does not abruptly change over time, and is smooth (continuously
differentiable). Hazard models with cubic splines meet these assumptions without
overly restricting the shape of the hazard function (see Sect. 2.4.1 for more
details).

2.3 Cox Regression

The hazard function is a sensible measure for describing the risk of experiencing an
event, and hence can be used for prognostic analysis for a patient. It is then natural
to incorporate the effect of covariates into the hazard function.

Definition 5 The Cox proportional hazards model (Cox 1972) is defined as

$$\lambda(t|\mathbf{Z}_i) = \lambda_0(t)\exp(\boldsymbol{\beta}'\mathbf{Z}_i),$$

where $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)'$ are unknown coefficients and $\lambda_0(\cdot)$ is an unknown
baseline hazard function.

The Cox model states that all patients share the same time-trend function λ0 (t). An
important property of the Cox model is that the form of λ0 (·) is unspecified, meaning
that the model is semi-parametric. Hence, the Cox model offers greater flexibility
over parametric models that specify the form of $\lambda_0(\cdot)$. The Cox model is also specified
as $S(t|\mathbf{Z}_i) = S_0(t)^{\exp(\boldsymbol{\beta}'\mathbf{Z}_i)}$ for $t \ge 0$, where the form of $S_0(t)$ is unspecified.
Let $Z_i$ be a dichotomous covariate, such as gender with $Z_i = 1$ for male and
$Z_i = 0$ for female. Under the Cox model $\lambda(t|Z_i) = \lambda_0(t)\exp(\beta Z_i)$, the relative risk
(RR) is defined as

$$RR = \frac{\lambda(t|Z_i = 1)}{\lambda(t|Z_i = 0)} = \exp(\beta).$$

For instance, the value RR = 2 implies that the event rate under $Z_i = 1$ is twice the
event rate under $Z_i = 0$.
Let $Z_i$ be a continuous covariate, such as a gene expression. Under the Cox model
$\lambda(t|Z_i) = \lambda_0(t)\exp(\beta Z_i)$, if the scale of $Z_i$ is standardized (to be mean = 0 and SD
= 1), the value $\exp(\beta)$ is interpreted as the RR for a one SD increase in $Z_i$. That is,

$$RR = \frac{\lambda(t|Z_i + 1)}{\lambda(t|Z_i)} = \exp(\beta).$$

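Under the Cox model, one can use survival data $\{(T_i, \delta_i, \mathbf{Z}_i);\ i = 1, \ldots, n\}$ to
estimate $\boldsymbol{\beta}$. Let $R_i = \{\ell : T_\ell \ge T_i\}$ be the risk set that contains patients at-risk at
time $T_i$. The partial likelihood estimator $\hat{\boldsymbol{\beta}} = (\hat{\beta}_1, \ldots, \hat{\beta}_p)'$ is defined by maximizing
the partial likelihood function (Cox 1972)

$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \left\{ \frac{\exp(\boldsymbol{\beta}'\mathbf{Z}_i)}{\sum_{\ell \in R_i} \exp(\boldsymbol{\beta}'\mathbf{Z}_\ell)} \right\}^{\delta_i}.$$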

The estimator β̂ is consistent when the independent censoring assumption holds and
the model specification is correct (Fleming and Harrington 1991). If the independent
censoring assumption does not hold, β̂ is a biased estimate for the true regression
coefficients (Emura and Chen 2016, 2018).
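The log-partial likelihood is

$$\ell(\boldsymbol{\beta}) = \log L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \delta_i \left[ \boldsymbol{\beta}'\mathbf{Z}_i - \log\left\{ \sum_{\ell \in R_i} \exp(\boldsymbol{\beta}'\mathbf{Z}_\ell) \right\} \right].$$

The derivatives of $\ell(\boldsymbol{\beta})$ give the score function,

$$S(\boldsymbol{\beta}) = \frac{\partial \ell(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n} \delta_i \left[ \mathbf{Z}_i - \frac{\sum_{\ell \in R_i} \mathbf{Z}_\ell \exp(\boldsymbol{\beta}'\mathbf{Z}_\ell)}{\sum_{\ell \in R_i} \exp(\boldsymbol{\beta}'\mathbf{Z}_\ell)} \right].$$

The second-order derivatives of $\ell(\boldsymbol{\beta})$ constitute the Hessian matrix,

$$H(\boldsymbol{\beta}) = \frac{\partial^2 \ell(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} = -\sum_{i=1}^{n} \delta_i \left[ \frac{\sum_{\ell \in R_i} \mathbf{Z}_\ell \mathbf{Z}_\ell' \exp(\boldsymbol{\beta}'\mathbf{Z}_\ell)}{\sum_{\ell \in R_i} \exp(\boldsymbol{\beta}'\mathbf{Z}_\ell)} - \left\{ \frac{\sum_{\ell \in R_i} \mathbf{Z}_\ell \exp(\boldsymbol{\beta}'\mathbf{Z}_\ell)}{\sum_{\ell \in R_i} \exp(\boldsymbol{\beta}'\mathbf{Z}_\ell)} \right\} \left\{ \frac{\sum_{\ell \in R_i} \mathbf{Z}_\ell \exp(\boldsymbol{\beta}'\mathbf{Z}_\ell)}{\sum_{\ell \in R_i} \exp(\boldsymbol{\beta}'\mathbf{Z}_\ell)} \right\}' \right].$$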

Interval estimates for $\boldsymbol{\beta}$ are obtained by applying the asymptotic theory (Fleming
and Harrington 1991). The information matrix is defined as $i(\hat{\boldsymbol{\beta}}) = -H(\hat{\boldsymbol{\beta}})$. The
standard error (SE) of $\hat{\beta}_j$ is $SE(\hat{\beta}_j) = \sqrt{\{i^{-1}(\hat{\boldsymbol{\beta}})\}_{jj}}$, where $\{i^{-1}(\hat{\boldsymbol{\beta}})\}_{jj}$ is the j-th
diagonal element of the inverse information matrix. The 95% confidence interval
(CI) is $\hat{\beta}_j \pm 1.96 \times SE(\hat{\beta}_j)$. The Wald test for the null hypothesis $H_0: \beta_j = 0$ is
based on the Z-value $z = \hat{\beta}_j / SE(\hat{\beta}_j)$. The P-value is computed as $\Pr(|Z| > |z|)$,
where $Z \sim N(0, 1)$.
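
To make the steps above concrete, the following R sketch maximizes the partial likelihood for a single covariate by Newton-Raphson iterations, using the score and Hessian given above, and then computes the SE, the 95% CI, and the Wald test. It is a minimal illustration rather than the algorithm used by any package: the eight observations are hypothetical toy data without tied event times, and the function score_hessian is our own helper.

```r
# Toy data (hypothetical): observed time, event indicator, binary covariate
time   <- c(2, 3, 5, 7, 8, 11, 13, 14)
status <- c(1, 1, 0, 1, 1, 0, 1, 1)
z      <- c(1, 1, 0, 1, 0, 1, 0, 0)

# Score S(beta) and Hessian H(beta) of the log-partial likelihood (scalar beta)
score_hessian <- function(beta) {
  s <- 0
  h <- 0
  for (i in which(status == 1)) {
    risk <- time >= time[i]            # risk set R_i
    w    <- exp(beta * z[risk])
    m1   <- sum(z[risk] * w) / sum(w)    # weighted mean of Z over R_i
    m2   <- sum(z[risk]^2 * w) / sum(w)  # weighted mean of Z^2 over R_i
    s <- s + (z[i] - m1)
    h <- h - (m2 - m1^2)
  }
  c(score = s, hessian = h)
}

# Newton-Raphson iterations starting from beta = 0
beta_hat <- 0
for (iter in 1:25) {
  sh   <- score_hessian(beta_hat)
  step <- sh[["score"]] / sh[["hessian"]]
  beta_hat <- beta_hat - step
  if (abs(step) < 1e-10) break
}

info    <- -score_hessian(beta_hat)[["hessian"]]  # information i(beta_hat)
se      <- sqrt(1 / info)                         # SE of beta_hat
ci      <- beta_hat + c(-1, 1) * 1.96 * se        # 95% CI
z_value <- beta_hat / se                          # Wald Z-value
p_value <- 2 * pnorm(-abs(z_value))               # Pr(|Z| > |z|)
```

With real data, one would normally rely on an established routine such as coxph in the survival package, which also handles tied event times.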

2.3.1 R Survival Package

We shall briefly introduce the R package survival to perform Cox regression.
As a running example, we use a dataset consisting of n = 58 ovarian cancer
patients obtained from “Study 2” that shall be mentioned in Sect. 2.5. The event time
of interest is time-to-recurrence after surgery. In the data, 48 patients experience cancer
recurrence and the other 10 patients are censored. The covariate is a binary variable
($Z_i = 0$ vs. $Z_i = 1$) on the residual tumour size at surgery (≤1 cm vs. >1 cm).
After installing the package, we enter event time $t_i$, censoring indicator $\delta_i$, and
covariate $Z_i$ for n = 58 patients. Then, we run the code:

library(survival)
t.event=c(385, 2582, 175, 162, 860, 3025, 454, 89, 252, 30, 401, 505, 511, 494, 195,
31, 242, 2195, 2282, 309, 3315, 387, 287, 367, 542, 165, 31, 2246, 481, 1003,
380, 367, 342, 265, 480, 664, 4208, 321, 431, 929, 125, 328, 3644, 811, 872,
804, 580, 298, 12, 282, 218, 114, 566, 803, 265, 407, 208, 309) ## event times
event=c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1) ## censoring indicators
Z=c(1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1) ## covariates
coxph(Surv(t.event,event)~Z)

Below are the outputs:

> coxph(Surv(t.event,event)~Z)
Call:
coxph(formula = Surv(t.event, event) ~ Z)

   coef exp(coef) se(coef)    z      p
Z 0.849     2.338    0.307 2.76 0.0057

Likelihood ratio test=7.9 on 1 df, p=0.00495
n= 58, number of events= 48

The Cox regression results show $\hat{\beta} = 0.849$, $RR = \exp(\hat{\beta}) = 2.338$, $SE(\hat{\beta}) = 0.307$, and $z = \hat{\beta}/SE(\hat{\beta}) = 2.76$. The P-value of the Wald test is 0.0057, and hence the residual tumour size is significantly associated with time-to-recurrence. The estimated relative risk implies that, at any time after surgery, patients with a residual tumour >1 cm have a hazard of recurrence about 2.3 times that of patients with a residual tumour ≤1 cm. Hence, the residual tumour size would be an important prognostic factor for cancer recurrence.
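
The Wald quantities of Sect. 2.3 can be recomputed from the two reported numbers alone. The sketch below takes the estimate and its standard error from the output above and derives the Z-value, the two-sided P-value, and a 95% CI for the relative risk; the variable names are illustrative.

```r
beta_hat <- 0.849   # coef reported by coxph()
se       <- 0.307   # se(coef) reported by coxph()

z       <- beta_hat / se                      # Wald Z-value
p_value <- 2 * pnorm(-abs(z))                 # Pr(|Z| > |z|), Z ~ N(0, 1)
ci_beta <- beta_hat + c(-1, 1) * 1.96 * se    # 95% CI for beta
ci_rr   <- exp(ci_beta)                       # 95% CI for RR = exp(beta)

round(c(z = z, p = p_value), 4)
round(ci_rr, 2)
```

The interval for the relative risk runs from roughly 1.3 to 4.3; it excludes 1, in agreement with the significant Wald test.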

2.4 Likelihood-Based Method

This section considers likelihood-based methods for analyzing the dataset $\{(T_i, \delta_i, Z_i);\ i = 1, \ldots, n\}$. These methods are applicable to both parametric and semi-parametric models, and hence provide more general tools than the partial likelihood method.
The event time $X_i$ and censoring time $C_i$ are related to the observation $(T_i, \delta_i)$ through
• $X_i = T_i$ and $C_i > T_i$ if $\delta_i = 1$,
• $X_i > T_i$ and $C_i = T_i$ if $\delta_i = 0$.
Each patient experiences exactly one of the two cases. Hence, the probability of observing the data $(T_i, \delta_i, Z_i)$ for the $i$-th patient is

$$L_i = \Pr(X_i = T_i, C_i > T_i \mid Z_i)^{\delta_i} \, \Pr(X_i > T_i, C_i = T_i \mid Z_i)^{1-\delta_i}.$$

This is the likelihood of binary outcomes. Under the independent censoring assumption,

$$\begin{aligned}
L_i &= [\Pr(X_i = T_i \mid Z_i) \Pr(C_i > T_i \mid Z_i)]^{\delta_i} \, [\Pr(C_i = T_i \mid Z_i) \Pr(X_i > T_i \mid Z_i)]^{1-\delta_i} \\
&= [f_X(T_i \mid Z_i) S_C(T_i \mid Z_i)]^{\delta_i} \, [f_C(T_i \mid Z_i) S_X(T_i \mid Z_i)]^{1-\delta_i} \\
&= [f_X(T_i \mid Z_i)^{\delta_i} S_X(T_i \mid Z_i)^{1-\delta_i}] \, [f_C(T_i \mid Z_i)^{1-\delta_i} S_C(T_i \mid Z_i)^{\delta_i}],
\end{aligned}$$

where $S_X(t \mid Z_i) = \Pr(X_i > t \mid Z_i)$, $f_X(t \mid Z_i) = -dS_X(t \mid Z_i)/dt$, $S_C(t \mid Z_i) = \Pr(C_i > t \mid Z_i)$, and $f_C(t \mid Z_i) = -dS_C(t \mid Z_i)/dt$. In addition to the independent censoring assumption, we further impose the following assumption:

Non-informative censoring assumption: $S_C(t \mid Z_i)$ does not contain any parameters related to $S_X(t \mid Z_i)$.

Under the non-informative censoring assumption, the term $f_C(T_i \mid Z_i)^{1-\delta_i} S_C(T_i \mid Z_i)^{\delta_i}$ can be ignored. Therefore, the likelihood function is redefined as

$$L = \prod_{i=1}^{n} f_X(T_i \mid Z_i)^{\delta_i} S_X(T_i \mid Z_i)^{1-\delta_i} = \prod_{i=1}^{n} \lambda_X(T_i \mid Z_i)^{\delta_i} \exp[-\Lambda_X(T_i \mid Z_i)]. \tag{2.1}$$

The log-likelihood is

$$\ell = \log L = \sum_{i=1}^{n} [\delta_i \log \lambda_X(T_i \mid Z_i) - \Lambda_X(T_i \mid Z_i)].$$
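
The two forms of the likelihood in Eq. (2.1), one written with the density and survival function and the other with the hazard and cumulative hazard, can be compared numerically. The sketch below does so for a Weibull model without covariates; the parameter values and data are hypothetical, and R's dweibull()/pweibull() parameterization (survival function exp{-(t/scale)^shape}) is assumed.

```r
shape <- 1.5
scale <- 10                       # hypothetical Weibull parameters
time  <- c(5, 8, 12, 3)           # observed times T_i
delta <- c(1, 0, 1, 0)            # censoring indicators delta_i

# Hazard and cumulative hazard implied by S(t) = exp{-(t/scale)^shape}
haz    <- (shape / scale) * (time / scale)^(shape - 1)
cumhaz <- (time / scale)^shape

# Contributions f^delta * S^(1 - delta) versus lambda^delta * exp(-Lambda)
lik_density <- dweibull(time, shape, scale)^delta *
  pweibull(time, shape, scale, lower.tail = FALSE)^(1 - delta)
lik_hazard  <- haz^delta * exp(-cumhaz)

# Log-likelihood in the hazard form
loglik <- sum(delta * log(haz) - cumhaz)
all.equal(lik_density, lik_hazard)
```

The hazard form is usually the more convenient one to code, since proportional hazards models are specified directly through the hazard.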

Independent censoring and non-informative censoring are mathematically different concepts. However, independent censoring in real-world applications usually implies non-informative censoring. An artificial or unusual example may exist for informative but independent censoring (p. 150 of Andersen et al. 1993; p. 196 of Kalbfleisch and Prentice 2002). Independent censoring is a more crucial assumption than non-informative censoring since dependent censoring leads to bias in estimation (Emura and Chen 2018).
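Suppose that the log-likelihood is written as $\ell(\boldsymbol{\varphi})$, where $\boldsymbol{\varphi}$ is a vector of parameters. Then, the maximum likelihood estimator (MLE) is defined by $\hat{\boldsymbol{\varphi}} = \arg\max_{\boldsymbol{\varphi}} \ell(\boldsymbol{\varphi})$. The first derivatives of the log-likelihood give the score function, $S(\boldsymbol{\varphi}) = \partial \ell(\boldsymbol{\varphi})/\partial \boldsymbol{\varphi}$. The second derivatives of the log-likelihood give the Hessian matrix $H(\boldsymbol{\varphi}) = \partial^2 \ell(\boldsymbol{\varphi})/\partial \boldsymbol{\varphi} \, \partial \boldsymbol{\varphi}'$. The MLE $\hat{\boldsymbol{\varphi}}$ is obtained from the Newton–Raphson algorithm

$$\boldsymbol{\varphi}^{(k+1)} = \boldsymbol{\varphi}^{(k)} - H^{-1}(\boldsymbol{\varphi}^{(k)}) S(\boldsymbol{\varphi}^{(k)}), \quad k = 0, 1, \ldots.$$

Interval estimates for $\boldsymbol{\varphi}$ follow from the asymptotic theory of MLEs. The information matrix is defined as $i(\hat{\boldsymbol{\varphi}}) = -H(\hat{\boldsymbol{\varphi}})$. For the $j$-th component $\hat{\varphi}_j$ of $\hat{\boldsymbol{\varphi}}$, the standard error (SE) is $SE(\hat{\varphi}_j) = \sqrt{\{i^{-1}(\hat{\boldsymbol{\varphi}})\}_{jj}}$, where $\{i^{-1}(\hat{\boldsymbol{\varphi}})\}_{jj}$ is the $j$-th diagonal element of the inverse information matrix. The 95% CI is $\hat{\varphi}_j \pm 1.96 \times SE(\hat{\varphi}_j)$.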
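
As a small illustration of the Newton–Raphson update and the resulting SE, consider an exponential model with constant hazard $\lambda$ and no covariates, for which the log-likelihood is $\sum_i [\delta_i \log \lambda - \lambda T_i]$ and the MLE has the closed form $\sum_i \delta_i / \sum_i T_i$. The data and starting value below are hypothetical; the sketch simply checks the iteration against the closed form.

```r
time  <- c(5, 8, 12, 3, 9, 15, 7, 11)   # hypothetical observed times
delta <- c(1, 1, 0, 1, 1, 0, 1, 1)      # hypothetical censoring indicators

score   <- function(lam) sum(delta) / lam - sum(time)   # S(lambda)
hessian <- function(lam) -sum(delta) / lam^2            # H(lambda)

lam <- 0.05                              # starting value (an arbitrary choice)
for (k in 1:50) {
  step <- score(lam) / hessian(lam)
  lam  <- lam - step                     # Newton-Raphson update
  if (abs(step) < 1e-12) break
}

se <- sqrt(1 / (-hessian(lam)))          # information i = -H
ci <- lam + c(-1, 1) * 1.96 * se         # 95% CI
c(mle = lam, closed_form = sum(delta) / sum(time), se = se)
```

For this model the iteration converges only from starting values between 0 and twice the MLE, which is a reminder that Newton–Raphson can be sensitive to the initial value.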

2.4.1 Spline and Penalized Likelihood

We consider a proportional hazards model $\lambda(t \mid Z_i) = \lambda_0(t; h) \exp(\beta' Z_i)$, where the baseline hazard function is parametrically specified by a vector $h$. Letting $\boldsymbol{\varphi} = (h, \beta)$, the log-likelihood is written as

$$\ell(\boldsymbol{\varphi}) = \sum_{i=1}^{n} [\delta_i \{\log \lambda_0(T_i; h) + \beta' Z_i\} - \Lambda_0(T_i; h) \exp(\beta' Z_i)].$$

If the dimension of $h$ is high, the baseline hazard function is a complex function of $t$ and difficult to interpret. In this case, one may wish to constrain the complexity of the hazard function. A popular way to quantify the complexity of a function $f$ is through the roughness defined as $\int \ddot{f}(t)^2 \, dt$, where $\ddot{f}(t) = d^2 f(t)/dt^2$. We then maximize the likelihood while minimizing the roughness through a penalized likelihood

$$\ell(\boldsymbol{\varphi}) - \kappa \int \ddot{\lambda}_0(t; h)^2 \, dt,$$

where $\kappa > 0$ is a given value, called a smoothing parameter.
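
To see what the roughness penalty measures, the sketch below evaluates $\int \ddot{f}(t)^2 dt$ on $[0, 1]$ for two hypothetical hazard shapes: a linear hazard $1 + 0.5t$, whose second derivative is zero, and an oscillating hazard $1 + 0.5 \sin(4\pi t)$. The helper name `roughness` and the value of $\kappa$ are illustrative only.

```r
# Roughness of f on [lower, upper], given its second derivative d2f
roughness <- function(d2f, lower, upper) {
  integrate(function(t) d2f(t)^2, lower, upper)$value
}

d2_linear <- function(t) 0 * t                             # f(t) = 1 + 0.5 t
d2_wiggly <- function(t) -0.5 * (4 * pi)^2 * sin(4 * pi * t)  # f(t) = 1 + 0.5 sin(4 pi t)

r_lin <- roughness(d2_linear, 0, 1)
r_wig <- roughness(d2_wiggly, 0, 1)

kappa   <- 0.01                                            # hypothetical smoothing parameter
penalty <- kappa * c(linear = r_lin, wiggly = r_wig)
penalty
```

The linear hazard incurs no penalty, while the oscillating hazard is penalized heavily; a larger $\kappa$ therefore pushes the estimate toward smoother hazard functions.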


A penalized likelihood is particularly useful for the spline-based method. The spline method allows for a flexible hazard function that is difficult to achieve with parametric models such as the Weibull model. The spline method is also known for its computational effectiveness because the spline basis functions are easy to differentiate and integrate (Ramsay 1988).
We specify the baseline hazard function as $\lambda_0(t; h) = \sum_{\ell=1}^{L} h_\ell M_\ell(t)$, where the $h_\ell$'s are positive parameters and the $M_\ell(t)$'s are called the M-spline basis functions (Ramsay 1988). The number of bases $L$ also represents the number of free parameters. One has the baseline cumulative hazard function $\Lambda_0(t; h) = \sum_{\ell=1}^{L} h_\ell I_\ell(t)$ and the baseline survival function $S_0(t; h) = \exp\{-\sum_{\ell=1}^{L} h_\ell I_\ell(t)\}$, where the $I_\ell(t)$'s are the integrals of the $M_\ell(t)$'s, called the I-spline basis functions.
To compute the spline basis functions, one needs to specify the knots and the range of $t$. One of the simplest ways is to set the range $t \in [\xi_1, \xi_3]$ for the equally spaced knots $\xi_1 < \xi_2 < \xi_3$, where $\xi_1 = \min_j(T_j)$, $\xi_3 = \max_j(T_j)$, and $\xi_2 = (\xi_1 + \xi_3)/2$. Figure 2.1 displays the M- and I-spline basis functions with $L = 5$ and the knots $\xi_1 = 1$, $\xi_2 = 2$, and $\xi_3 = 3$. The joint.Cox package (Emura 2019) provides functions M.spline() for $M_\ell(t)$ and I.spline() for $I_\ell(t)$ for $L = 5$. The expressions of $M_\ell(t)$ and $I_\ell(t)$ are given in Appendix A.

Fig. 2.1 M-spline basis functions (left panel) and I-spline basis functions (right panel) with knots $\xi_1 = 1$, $\xi_2 = 2$, and $\xi_3 = 3$

The joint.Cox R package (Emura 2019) provides a function splineCox.reg() to compute penalized MLEs. The function automatically selects the optimal value of $\kappa$ among the user-specified grid points for $\kappa$. Since the choice of the grid points is not obvious, some graphical diagnostic tools are necessary, which shall be detailed in Chap. 3. Finally, the penalized likelihood estimator is defined as

$$(\hat{\beta}^{PL}, \hat{h}^{PL}) = \arg\max_{(\beta, h)} \left\{ \ell(\beta, h) - \hat{\kappa} \int \ddot{\lambda}_0(t; h)^2 \, dt \right\},$$

where $\hat{\kappa}$ is the optimal value. Usually, the value of $\hat{\beta}^{PL}$ is close to the partial likelihood estimator $\hat{\beta}$.

2.5 Clustered Survival Data

Shared frailty models are useful to incorporate unexplained heterogeneity in the risks of experiencing an event for patients. For instance, consider a multicenter analysis, where patients are collected from different hospitals. Some hospitals perform well while others perform poorly in terms of prolonging the survival of their patients. In shared frailty models, it is assumed that each hospital has its own unobserved factor (called the frailty term) influencing the risks of all patients in the hospital. Hence, the patients in the same hospital share the same frailty term. Consequently, the study population may be regarded as a mixture of frail patients (in a hospital with a high frailty term) and robust patients (in a hospital with a low frailty term).
Individual patient data (IPD) meta-analysis is based on patients collected from different studies (Fig. 2.2). By replacing the term “center” with “study”, IPD meta-analysis is essentially equivalent to multicenter analysis.
Figure 2.2 shows meta-analytic data collected from four different studies. The data provide time-to-recurrence for 912 surgically treated patients with ovarian cancer. One can observe the heterogeneity of relapse (recurrence) rates among the four studies, with the highest being Study 2 (83%) and the lowest being Study 4 (49%).

[Figure 2.2: Study 1: 84 patients, 59 relapses (70%); Study 2: 58 patients, 48 relapses (83%); Study 3: 260 patients, 185 relapses (71%); Study 4: 510 patients, 252 relapses (49%)]

Fig. 2.2 Meta-analytic data combining four different studies of ovarian cancer patients (Ganzfried et al. 2013; Emura et al. 2018). The data is available in the joint.Cox R package (Emura 2019)

In both IPD meta-analysis and multicenter analysis, it is customary to account for heterogeneity by means of random effects, called frailty. The gamma frailty distribution has been routinely applied to account for the heterogeneity, where the variance of the gamma distribution measures the degree of heterogeneity (Duchateau et al. 2002; Rondeau et al. 2015; Emura et al. 2017, 2018).
Instead of “study” or “center”, we shall use the more general term “cluster” to represent the study unit. This covers a broader range of applications encountered in medical studies, where a cluster is a well-defined collection of patients. In some family-based studies, a cluster may be a married couple (Rondeau et al. 2017), or a family containing more than one member (Rodríguez-Girondo et al. 2018).

2.5.1 Shared Frailty Model

We consider a dataset consisting of $G$ independent clusters with the $i$-th cluster containing $N_i$ patients. For $i = 1, 2, \ldots, G$ and $j = 1, 2, \ldots, N_i$, let $X_{ij}$ be the event time and $Z_{ij}$ be a vector of covariates. For instance, the meta-analytic data of ovarian cancer patients contain $G = 4$ clusters with $N_1 = 84$, $N_2 = 58$, $N_3 = 260$, and $N_4 = 510$ patients (912 patients in total).
In the shared frailty models, the heterogeneity of the event rates is specified by multiplying the hazard of the Cox model by a frailty term:

Definition 6 The shared frailty model is defined on the hazard function for the $j$-th patient in the $i$-th cluster:

$$\lambda_{ij}(t \mid u_i, Z_{ij}) = u_i \lambda_0(t) \exp(\beta' Z_{ij}),$$

where $\beta = (\beta_1, \ldots, \beta_p)'$ are unknown coefficients, $\lambda_0(\cdot)$ is an unknown baseline hazard function, and $u_i > 0$, $i = 1, 2, \ldots, G$, are unobserved frailty terms.

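The most popular choice for the frailty distribution is the gamma density

$$f_\eta(u) = \frac{1}{\Gamma(1/\eta) \, \eta^{1/\eta}} \, u^{1/\eta - 1} \exp\left( -\frac{u}{\eta} \right), \quad \eta > 0, \ u > 0.$$

The mean and variance are $E_\eta(u_i) = 1$ and $\mathrm{Var}_\eta(u_i) = \eta$.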


A high-risk cluster is expressed as $u_i > 1$, and a low-risk cluster is expressed as $0 < u_i < 1$. Hence, the variance parameter $\eta$ represents the amount of heterogeneity in the risk of the event. The limit $\eta \to 0$ corresponds to the absence of heterogeneity. Note that the value of $u_i$ is unobserved (i.e., $u_i$ is a latent variable).
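
The gamma density above corresponds to shape $1/\eta$ and scale $\eta$ in R's rgamma(). The short simulation below, with an arbitrary seed and a hypothetical value of $\eta$, checks that the frailty terms have mean close to 1 and variance close to $\eta$, and reports the share of high-risk clusters.

```r
set.seed(2024)
eta <- 0.5                                   # hypothetical heterogeneity parameter
u   <- rgamma(1e5, shape = 1 / eta, scale = eta)

c(mean = mean(u), variance = var(u))         # close to 1 and eta
mean(u > 1)                                  # share of high-risk clusters (u_i > 1)
```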
In the shared frailty models, it is assumed that $X_{ij}$, $j = 1, 2, \ldots, N_i$, are conditionally independent given $u_i$. Hence, the joint survival function given $u_i$ is

$$\Pr(X_{ij} > x_{ij}, \ j = 1, \ldots, N_i \mid u_i, Z_{ij}, \ j = 1, \ldots, N_i) = \prod_{j=1}^{N_i} \Pr(X_{ij} > x_{ij} \mid u_i, Z_{ij}) = \exp\left[ -u_i \sum_{j=1}^{N_i} \Lambda_0(x_{ij}) \exp(\beta' Z_{ij}) \right].$$

If we integrate out the unobserved $u_i$, we have the joint survival function

$$\Pr(X_{ij} > x_{ij}, \ j = 1, \ldots, N_i \mid Z_{ij}, \ j = 1, \ldots, N_i) = \int_0^\infty \exp\left[ -u \sum_{j=1}^{N_i} \Lambda_0(x_{ij}) \exp(\beta' Z_{ij}) \right] f_\eta(u) \, du = \varphi_\eta\left[ \sum_{j=1}^{N_i} \Lambda_0(x_{ij}) \exp(\beta' Z_{ij}) \right],$$

where $\varphi_\eta(s) = \int_0^\infty \exp(-su) f_\eta(u) \, du$ is the Laplace transform of $f_\eta(\cdot)$. Similarly, the marginal survival function is $\Pr(X_{ij} > x_{ij} \mid Z_{ij}) = \varphi_\eta[\Lambda_0(x_{ij}) \exp(\beta' Z_{ij})]$ for $j = 1, 2, \ldots, N_i$. Thus, the joint survival function is

$$\Pr(X_{ij} > x_{ij}, \ j = 1, \ldots, N_i \mid Z_{ij}, \ j = 1, \ldots, N_i) = \varphi_\eta\left[ \sum_{j=1}^{N_i} \varphi_\eta^{-1}\{\Pr(X_{ij} > x_{ij} \mid Z_{ij})\} \right].$$

Examples of the Laplace transform and its inverse function are given in Table 2.1.

Table 2.1 Examples of frailty distributions

Distribution    | Heterogeneity parameter | Laplace: ϕη(s)    | Inverse Laplace: ϕη^(−1)(y) | Kendall’s tau: τη
--------------- | ----------------------- | ----------------- | --------------------------- | -----------------
Gamma           | η > 0                   | (1 + ηs)^(−1/η)   | (y^(−η) − 1)/η              | η/(η + 2)
Positive stable | η ≥ 0                   | exp{−s^(1/(η+1))} | {−log(y)}^(η+1)             | η/(η + 1)
Log-normal      | η > 0                   | Not available     | Not available               | Not available
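
The gamma row of Table 2.1 can be checked numerically: integrating $\exp(-su)$ against the gamma frailty density should reproduce $(1 + \eta s)^{-1/\eta}$, and the inverse $(y^{-\eta} - 1)/\eta$ should map the result back to $s$. The value of $\eta$ and the evaluation points below are arbitrary choices for illustration.

```r
eta <- 0.5                                   # hypothetical heterogeneity parameter
s   <- c(0.2, 1.3, 4)                        # evaluation points

# Laplace transform by numerical integration against the gamma frailty density
laplace_numeric <- sapply(s, function(si) {
  integrate(function(u) exp(-si * u) * dgamma(u, shape = 1 / eta, scale = eta),
            0, Inf)$value
})

laplace_closed <- (1 + eta * s)^(-1 / eta)           # closed form from Table 2.1
inv_laplace    <- function(y) (y^(-eta) - 1) / eta   # inverse from Table 2.1

cbind(s, laplace_numeric, laplace_closed, back_to_s = inv_laplace(laplace_closed))
```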

For the gamma frailty model, one has $\varphi_\eta(s) = (1 + \eta s)^{-1/\eta}$ and $\varphi_\eta^{-1}(y) = (y^{-\eta} - 1)/\eta$. Thus,

$$\Pr(X_{ij} > x_{ij}, \ j = 1, \ldots, N_i \mid Z_{ij}, \ j = 1, \ldots, N_i) = \left[ \sum_{j=1}^{N_i} \Pr(X_{ij} > x_{ij} \mid Z_{ij})^{-\eta} - (N_i - 1) \right]^{-1/\eta},$$

where the marginal distribution is $\Pr(X_{ij} > x_{ij} \mid Z_{ij}) = [1 + \eta \Lambda_0(x_{ij}) \exp(\beta' Z_{ij})]^{-1/\eta}$.
To investigate the correlation between $X_{ij}$ and $X_{ik}$ for $j \ne k$, we consider the bivariate survival function by letting $x_{ih} = 0$ for $h \ne j$ and $h \ne k$:

$$\Pr(X_{ij} > x_{ij}, X_{ik} > x_{ik} \mid Z_{ij}, Z_{ik}) = \left[ \Pr(X_{ij} > x_{ij} \mid Z_{ij})^{-\eta} + \Pr(X_{ik} > x_{ik} \mid Z_{ik})^{-\eta} - 1 \right]^{-1/\eta}.$$

Kendall’s tau ($\tau$) is a measure of correlation between two random variables $X_{ij}$ and $X_{ik}$. As shown in Sect. 2.6.1, Kendall’s tau is simply written as $\tau_{jk} = \eta/(\eta + 2)$. Hence, a large value of $\eta$ corresponds to a higher association, and the limit $\eta \to 0$ corresponds to independence.
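
The relation between $\eta$ and Kendall’s tau can be illustrated by simulation. The sketch below draws one frailty term per cluster of size two, generates the two event times as conditionally independent exponentials (that is, $\Lambda_0(t) = t$ and no covariates, an assumption made only for simplicity), and compares the sample Kendall’s tau with $\eta/(\eta + 2)$.

```r
set.seed(123)
n   <- 2000                                  # number of clusters (pairs)
eta <- 1                                     # hypothetical frailty variance

u  <- rgamma(n, shape = 1 / eta, scale = eta)   # shared frailty per pair
x1 <- rexp(n, rate = u)                         # event time of member 1 given u
x2 <- rexp(n, rate = u)                         # event time of member 2 given u

tau_hat    <- cor(x1, x2, method = "kendall")
tau_theory <- eta / (eta + 2)
c(tau_hat = tau_hat, tau_theory = tau_theory)
```

The two values agree up to simulation error, even though the members of a pair are independent once the frailty is given.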
The gamma frailty is the conjugate distribution for the Weibull distribution (Molenbergh et al. 2015). The resultant model is written as $\Pr(X_{ij} > x_{ij} \mid Z_{ij}) = [1 + \eta \lambda x_{ij}^{\nu} \exp(\beta' Z_{ij})]^{-1/\eta}$ and called the Weibull–gamma distribution. The model is also known as the Burr-XII distribution (Burr 1942). The Weibull–gamma model permits explicit expressions of the mean and variance of $X_{ij}$.

2.5.2 Likelihood Function

We consider a dataset consisting of $G$ independent clusters with the $i$-th cluster containing $N_i$ patients. For $i = 1, 2, \ldots, G$ and $j = 1, 2, \ldots, N_i$, let
• $X_{ij}$: event time,
• $C_{ij}$: independent and non-informative censoring time.
The dataset consists of $(T_{ij}, \delta_{ij}, Z_{ij})$ for $i = 1, 2, \ldots, G$ and $j = 1, 2, \ldots, N_i$, where
• $T_{ij} = \min(X_{ij}, C_{ij})$: event time or censoring time, whichever comes first,
• $\delta_{ij} = I(T_{ij} = X_{ij})$: censoring status (censored = 0; event = 1), where $I(\cdot)$ is the indicator function,
• $Z_{ij}$: $p$-dimensional covariates.

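Proposition 1 Under the shared frailty model, the log-likelihood is

$$\ell = \log L = \sum_{i=1}^{G} \left[ \sum_{j=1}^{N_i} \delta_{ij} \log \lambda_{ij}(T_{ij}) + \log \left\{ \int_0^\infty u_i^{m_i} \exp\left\{ -u_i \sum_{j=1}^{N_i} \Lambda_{ij}(T_{ij}) \right\} f_\eta(u_i) \, du_i \right\} \right],$$

where $\lambda_{ij}(t) = \lambda_0(t) \exp(\beta' Z_{ij})$, $\Lambda_{ij}(t) = \Lambda_0(t) \exp(\beta' Z_{ij})$, and $m_i = \sum_{j=1}^{N_i} \delta_{ij}$. In particular, under the gamma frailty model, the log-likelihood is

$$\ell = \sum_{i=1}^{G} \left[ \sum_{j=1}^{N_i} \delta_{ij} \log \lambda_{ij}(T_{ij}) + m_i \log \eta + \log \Gamma\left( m_i + \frac{1}{\eta} \right) - \log \Gamma\left( \frac{1}{\eta} \right) - \left( m_i + \frac{1}{\eta} \right) \log \left\{ 1 + \eta \sum_{j=1}^{N_i} \Lambda_{ij}(T_{ij}) \right\} \right].$$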

Proof of Proposition 1: Define notations $T_i = (T_{i1}, \ldots, T_{iN_i})$ and $\delta_i = (\delta_{i1}, \ldots, \delta_{iN_i})$. Since the $G$ clusters are independent, the likelihood takes the form $L = \prod_{i=1}^{G} L_i$, where $L_i$ is the contribution from the $i$-th cluster. To compute $L_i$, we recall the assumptions:
• All patients in the $i$-th cluster share the common frailty term $u_i$,
• All patients in the $i$-th cluster are independent given the frailty term $u_i$.
Under these assumptions, $L_i$ is computed as in Eq. (2.1) given $u_i$. Accordingly,

$$L_i(T_i, \delta_i \mid u_i) = \prod_{j=1}^{N_i} \{u_i \lambda_{ij}(T_{ij})\}^{\delta_{ij}} \exp\{-u_i \Lambda_{ij}(T_{ij})\} = u_i^{m_i} \left\{ \prod_{j=1}^{N_i} \lambda_{ij}(T_{ij})^{\delta_{ij}} \right\} \exp\left\{ -u_i \sum_{j=1}^{N_i} \Lambda_{ij}(T_{ij}) \right\}.$$

Integrating out the unobserved frailty term,

$$L_i(T_i, \delta_i) = \int_0^\infty L_i(T_i, \delta_i \mid u_i) f_\eta(u_i) \, du_i = \left\{ \prod_{j=1}^{N_i} \lambda_{ij}(T_{ij})^{\delta_{ij}} \right\} \times \int_0^\infty u_i^{m_i} \exp\left\{ -u_i \sum_{j=1}^{N_i} \Lambda_{ij}(T_{ij}) \right\} f_\eta(u_i) \, du_i.$$

Combining the likelihoods for the $G$ independent clusters,

$$L = \prod_{i=1}^{G} L_i(T_i, \delta_i) = \prod_{i=1}^{G} \left[ \left\{ \prod_{j=1}^{N_i} \lambda_{ij}(T_{ij})^{\delta_{ij}} \right\} \times \int_0^\infty u_i^{m_i} \exp\left\{ -u_i \sum_{j=1}^{N_i} \Lambda_{ij}(T_{ij}) \right\} f_\eta(u_i) \, du_i \right].$$

The log-likelihood is obtained by taking the logarithm of the above expression. The log-likelihood under the gamma frailty model is obtained from the integral

$$\int_0^\infty u_i^{m_i} \exp\left\{ -u_i \sum_{j=1}^{N_i} \Lambda_{ij}(T_{ij}) \right\} f_\eta(u_i) \, du_i = \frac{\eta^{m_i} \, \Gamma(m_i + 1/\eta)}{\Gamma(1/\eta)} \left( 1 + \eta \sum_{j=1}^{N_i} \Lambda_{ij}(T_{ij}) \right)^{-m_i - 1/\eta}.$$
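
The closed form of the gamma integral used at the end of the proof can be verified numerically for a single cluster. In the sketch below, `A` stands for the cluster total $\sum_j \Lambda_{ij}(T_{ij})$ and `m` for the number of events $m_i$; all numerical values are hypothetical.

```r
eta <- 0.7    # hypothetical frailty variance
m   <- 3      # number of events in the cluster (m_i)
A   <- 2.4    # sum of cumulative hazards in the cluster

# Left-hand side: numerical integration over the gamma frailty density
lhs <- integrate(function(u) {
  u^m * exp(-u * A) * dgamma(u, shape = 1 / eta, scale = eta)
}, 0, Inf)$value

# Right-hand side: closed form
rhs <- eta^m * gamma(m + 1 / eta) / gamma(1 / eta) * (1 + eta * A)^(-m - 1 / eta)

c(numerical = lhs, closed_form = rhs)
```

Taking the logarithm of the closed form gives the four $\eta$-dependent terms in the gamma frailty log-likelihood of Proposition 1.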

Many computing techniques and statistical packages for maximizing the log-likelihood have been developed under a semi-parametric model, where the form of $\lambda_0(\cdot)$ is unspecified (Hirsch and Wienke 2012). Vu and Knuiman (2002), Duchateau et al. (2002), Klein and Moeschberger (2003), and Duchateau and Janssen (2007) developed EM algorithms. Ha et al. (2017) regard $u_i$ as parameters, leading to an iterative algorithm under a hierarchical likelihood. The algorithms in the R packages survival and frailtypack do not use EM algorithms. The majority of software packages use either the log-normal or gamma frailty distribution (Hirsch and Wienke 2012).
Newton–Raphson algorithms are useful under parametric models, where the form of $\lambda_0(\cdot)$ is parametrically specified. For instance, the R functions nlm() and optim() can maximize the log-likelihood by a Newton–Raphson type algorithm; since both functions minimize by default, one supplies the negative log-likelihood. Examples of parametric models include the Weibull, log-normal, Pareto, and gamma distributions. Among those, the Weibull model is the most common choice, which specifies the baseline hazard function as $\lambda_0(t; h) = \lambda \nu t^{\nu - 1}$, where $h = (\lambda, \nu)$, $\lambda > 0$ is a scale parameter, and $\nu > 0$ is a shape parameter. However, the Weibull model has only two parameters and is unlikely to capture the local changes of the hazard over the follow-up period.
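
As a rough sketch of the parametric approach, the code below simulates clustered data from a Weibull baseline with gamma frailty and maximizes the gamma frailty log-likelihood of Proposition 1 with optim(). Everything here is an illustrative assumption: the true parameter values, the absence of covariates, the fixed censoring time of 2, the log-parameterization, and the starting values.

```r
set.seed(1)
G <- 50; N <- 10                          # clusters and patients per cluster
eta <- 0.5; lambda <- 1; nu <- 1.5        # hypothetical true values
cluster <- rep(1:G, each = N)
u     <- rgamma(G, shape = 1 / eta, scale = eta)[cluster]   # shared frailty
x     <- (-log(runif(G * N)) / (u * lambda))^(1 / nu)       # event times
time  <- pmin(x, 2)                       # administrative censoring at t = 2
delta <- as.numeric(x <= 2)

# Negative log-likelihood; parameters on the log scale keep them positive
negloglik <- function(par) {
  lam <- exp(par[1]); nu <- exp(par[2]); eta <- exp(par[3])
  log_haz <- log(lam) + log(nu) + (nu - 1) * log(time)   # log lambda_ij(T_ij)
  cum_haz <- lam * time^nu                               # Lambda_ij(T_ij)
  m <- tapply(delta, cluster, sum)                       # m_i
  A <- tapply(cum_haz, cluster, sum)                     # sum_j Lambda_ij(T_ij)
  ll <- sum(delta * log_haz) +
    sum(m * log(eta) + lgamma(m + 1 / eta) - lgamma(1 / eta) -
          (m + 1 / eta) * log(1 + eta * A))
  -ll
}

fit <- optim(c(0, 0, 0), negloglik, method = "BFGS")
est <- exp(fit$par)
names(est) <- c("lambda", "nu", "eta")
est
```

With 500 patients the estimates should land in the neighbourhood of the values used for simulation, but a quasi-Newton search of this kind can depend on the starting values, so it is worth checking `fit$convergence` and trying more than one start.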

2.5.3 Penalized Likelihood and Spline

Rondeau et al. (2003) developed the spline method to obtain a smooth estimate of $\lambda_0(\cdot)$ for clustered survival data. They specify the baseline hazard function as $\lambda_0(t; h) = \sum_{\ell=1}^{L} h_\ell M_\ell(t)$, where the $h_\ell$'s are positive parameters and the $M_\ell(t)$'s are the M-spline basis functions. Rondeau et al. (2003) proposed estimates $(\hat{h}, \hat{\beta}, \hat{\eta})$ that maximize a penalized likelihood

$$\ell(h, \beta, \eta) - \kappa \int \ddot{\lambda}_0(t; h)^2 \, dt,$$

where $\ell(h, \beta, \eta)$ is the log-likelihood in Proposition 1, $\ddot{\lambda}_0(t; h) = d^2 \lambda_0(t; h)/dt^2$, and $\kappa > 0$ is a smoothing parameter. The smoothing parameter controls the degree of penalty on the roughness of the baseline hazard function. The estimates can be computed by applying the frailtypack R package (Rondeau and Gonzalez 2005).

2.6 Copulas for Bivariate Event Times

In medical studies, it is common to record two event times for each patient. For
instance, a patient’s medical records may have time-to-locoregional progression and
time-to-distant metastasis. In the analysis of such bivariate survival data, the key
element is an appropriate account for dependence between event times.

A copula can be used to link two event times by specifying their dependence structure¹. Let $X$ and $Y$ be two event times, and $Z_1$ and $Z_2$ be the associated covariates, respectively. Also let $S_X(x \mid Z_1) = \Pr(X > x \mid Z_1)$ and $S_Y(y \mid Z_2) = \Pr(Y > y \mid Z_2)$ be the marginal survival functions. Given $Z = (Z_1, Z_2)$, we consider a bivariate survival function

$$\Pr(X > x, Y > y \mid Z) = C_\theta\{S_X(x \mid Z_1), S_Y(y \mid Z_2)\}, \tag{2.2}$$

where the function $C_\theta$ is called a bivariate copula² (Sklar 1959; Nelsen 2006) and the parameter $\theta$ describes the degree of dependence between $X$ and $Y$. With this model, the dependence structure between $X$ and $Y$ is fully described by $C_\theta$. Examples of bivariate copulas are listed below:

The independence copula:

$$C(u, v) = uv,$$

The Clayton copula (Clayton 1978):

$$C_\theta(u, v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}, \quad \theta > 0,$$

The Gumbel copula (Gumbel 1960), also known as the Hougaard copula:

$$C_\theta(u, v) = \exp\left[ -\left\{ (-\log u)^{\theta + 1} + (-\log v)^{\theta + 1} \right\}^{\frac{1}{\theta + 1}} \right], \quad \theta \ge 0,$$

The Farlie–Gumbel–Morgenstern (FGM) copula (Morgenstern 1956):

$$C_\theta(u, v) = uv\{1 + \theta(1 - u)(1 - v)\}, \quad -1 \le \theta \le 1.$$

Any bivariate copula is a bivariate distribution function whose marginal distributions are the uniform distribution on [0, 1]. Hence, one can consider a pair of random variables $(U, V)$ such that $\Pr(U \le u, V \le v) = C_\theta(u, v)$. If one defines a pair of random variables $(X, Y)$ by the transformations $X = S_X^{-1}(U \mid Z_1)$ and $Y = S_Y^{-1}(V \mid Z_2)$, its distribution satisfies Eq. (2.2).
The Clayton and Gumbel copulas are derived from the gamma frailty and positive stable frailty models, respectively. However, the FGM copula cannot be derived as a frailty model.

¹ In general, a copula can be used to link more than two event times. We only consider a bivariate copula in this book.
² One may say “bivariate survival copula” or simply “survival copula” since the copula is applied to the survival functions in Eq. (2.2). See Nelsen (2006) for details.

Figure 2.3 gives the scatter plots for pairs $(U, V)$ under the Clayton copula. The plots exhibit positive dependence between $U$ and $V$, where the levels of dependence are different between $\theta = 2$ (Kendall’s tau = 0.5) and $\theta = 8$ (Kendall’s tau = 0.8).
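
Pairs $(U, V)$ such as those in the scatter plots can be generated by the conditional inversion method: draw $U$ and an independent uniform $W$, then solve $\partial C_\theta(u, v)/\partial u = w$ for $v$. The helper `rclayton` below is a hypothetical name written for this sketch, not a function of any package mentioned in this chapter; unit exponential margins are assumed for the transformation to event times.

```r
set.seed(1)

# Conditional inversion sampler for the Clayton copula (theta > 0)
rclayton <- function(n, theta) {
  u <- runif(n)
  w <- runif(n)
  v <- ((w^(-theta / (1 + theta)) - 1) * u^(-theta) + 1)^(-1 / theta)
  cbind(U = u, V = v)
}

uv      <- rclayton(2000, theta = 2)
tau_hat <- cor(uv[, "U"], uv[, "V"], method = "kendall")   # near 2/(2 + 2) = 0.5

# Event times with unit exponential margins: S^{-1}(u) = -log(u)
x <- -log(uv[, "U"])
y <- -log(uv[, "V"])
c(tau_uv = tau_hat, tau_xy = cor(x, y, method = "kendall"))
```

Kendall’s tau is the same for $(U, V)$ and $(X, Y)$ because it is unchanged when both variables are transformed by decreasing functions.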

Fig. 2.3 The scatter plot for 500 pairs of (U, V ) under the Clayton copula

The density function for $C_\theta$ is

$$C_\theta^{[1,1]}(u, v) = \frac{\partial^2}{\partial u \, \partial v} C_\theta(u, v), \quad 0 \le u \le 1, \ 0 \le v \le 1.$$

Figure 2.4 gives the contour plots under the Clayton copula density. We see that
the characteristic of the contour plots agrees with that for the scatter plots.

Fig. 2.4 The contour for the density Cθ[1,1] ( u, v ) under the Clayton copula

Table 2.2 Examples of copulas

Copula       | Parameter   | Generator: φθ(t) | Kendall’s tau: τθ | rθ(s) = −s φ̈θ(s)/φ̇θ(s)
------------ | ----------- | ---------------- | ----------------- | ------------------------
Independence | none        | −log(t)          | 0                 | 1
Clayton      | θ > 0       | (t^(−θ) − 1)/θ   | θ/(θ + 2)         | 1 + θ
Gumbel       | θ ≥ 0       | {−log(t)}^(θ+1)  | θ/(θ + 1)         | 1 − θ/log(s)
FGM          | −1 ≤ θ ≤ 1  | none             | 2θ/9              | none

An Archimedean copula is defined as

$$C_\theta(u, v) = \phi_\theta^{-1}\{\phi_\theta(u) + \phi_\theta(v)\},$$

where the function $\phi_\theta: [0, 1] \to [0, \infty]$ is called a generator function of the copula, which is continuous and strictly decreasing from $\phi_\theta(0) > 0$ to $\phi_\theta(1) = 0$. Table 2.2 summarizes examples of generator functions. Any bivariate shared frailty model can be expressed as an Archimedean copula model by setting $\varphi_\eta^{-1}(t) = \phi_\eta(t)$. Hence, the copula models provide a more general framework for constructing bivariate survival models.
The Clayton copula has the generator $\phi_\theta(t) = (t^{-\theta} - 1)/\theta$ for $\theta > 0$. The limit $\lim_{\theta \to 0} \phi_\theta(t) = -\log(t)$ is the generator of the independence copula. The FGM copula does not have a generator as it is not an Archimedean copula. In the Clayton, Gumbel, and FGM copulas, the value $\theta = 0$ reduces to the independence copula, namely, $\lim_{\theta \to 0} C_\theta(u, v) = uv$.
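
The Archimedean construction translates directly into code: given a generator and its inverse, the copula is obtained by composing them. The sketch below builds the Clayton and Gumbel copulas from the generators in Table 2.2 and compares them with their closed forms; the helper name `archimedean` and the evaluation point are illustrative choices.

```r
# Build C(u, v) = phi^{-1}{phi(u) + phi(v)} from a generator and its inverse
archimedean <- function(phi, phi_inv) {
  function(u, v) phi_inv(phi(u) + phi(v))
}

theta <- 2
clayton <- archimedean(function(t) (t^(-theta) - 1) / theta,
                       function(s) (1 + theta * s)^(-1 / theta))
gumbel  <- archimedean(function(t) (-log(t))^(theta + 1),
                       function(s) exp(-s^(1 / (theta + 1))))

u <- 0.3; v <- 0.7
clayton_closed <- (u^(-theta) + v^(-theta) - 1)^(-1 / theta)
gumbel_closed  <- exp(-((-log(u))^(theta + 1) + (-log(v))^(theta + 1))^(1 / (theta + 1)))

c(clayton = clayton(u, v), clayton_closed = clayton_closed,
  gumbel = gumbel(u, v), gumbel_closed = gumbel_closed)
clayton(u, 1)   # uniform margin: C(u, 1) = u
```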

2.6.1 Measures of Dependence

Kendall’s tau ($\tau$) is a measure to assess the dependence between $X$ and $Y$. It can be shown that Kendall’s tau is solely expressed as a function of $C_\theta$ through

$$\tau_\theta = \Pr\{(X_1 - X_2)(Y_1 - Y_2) > 0\} - \Pr\{(X_1 - X_2)(Y_1 - Y_2) < 0\} = 4 \int_0^1 \int_0^1 C_\theta(u, v) \, C_\theta^{[1,1]}(u, v) \, du \, dv - 1, \tag{2.3}$$

where $(X_1, Y_1)$ and $(X_2, Y_2)$ are independently drawn from the copula model (2.2). This expression implies that Kendall’s tau does not depend on how the marginal survival functions $S_X(x \mid Z_1) = \Pr(X > x \mid Z_1)$ and $S_Y(y \mid Z_2) = \Pr(Y > y \mid Z_2)$ are specified.

An Archimedean copula has a “shortcut” formula to compute Kendall’s tau

$$\tau_\theta = 1 + 4 \int_0^1 \frac{\phi_\theta(t)}{\dot{\phi}_\theta(t)} \, dt, \tag{2.4}$$

where $\dot{\phi}_\theta(t) = d\phi_\theta(t)/dt$. This formula gives $\tau_\theta = \theta/(\theta + 2)$ for the Clayton copula and $\tau_\theta = \theta/(\theta + 1)$ for the Gumbel copula, both taking values from $\tau_0 = 0$ to $\tau_\infty = 1$. Since the FGM copula is not an Archimedean copula, the shortcut formula does not apply. However, the FGM copula has a simple expression $\tau_\theta = 2\theta/9$ for $-1 \le \theta \le 1$, which can be derived from Eq. (2.3). In the FGM copula, the range of Kendall’s tau is restricted from $\tau_{-1} = -2/9$ to $\tau_1 = 2/9$.
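
The shortcut formula (2.4) is easy to evaluate by one-dimensional numerical integration. The sketch below applies it to the Clayton and Gumbel generators, with derivatives entered by hand, and compares the results with $\theta/(\theta + 2)$ and $\theta/(\theta + 1)$; the function name `kendall_tau` and the value of $\theta$ are arbitrary choices.

```r
# Kendall's tau from a generator phi and its derivative dphi, Eq. (2.4)
kendall_tau <- function(phi, dphi) {
  1 + 4 * integrate(function(t) phi(t) / dphi(t), 0, 1)$value
}

theta <- 2
tau_clayton <- kendall_tau(function(t) (t^(-theta) - 1) / theta,
                           function(t) -t^(-theta - 1))
tau_gumbel  <- kendall_tau(function(t) (-log(t))^(theta + 1),
                           function(t) -(theta + 1) * (-log(t))^theta / t)

c(clayton = tau_clayton, clayton_exact = theta / (theta + 2),
  gumbel  = tau_gumbel,  gumbel_exact  = theta / (theta + 1))
```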
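It is convenient to define the partial derivatives of a copula:

$$C_\theta^{[1,0]}(u, v) = \frac{\partial}{\partial u} C_\theta(u, v), \quad C_\theta^{[0,1]}(u, v) = \frac{\partial}{\partial v} C_\theta(u, v), \quad C_\theta^{[1,1]}(u, v) = \frac{\partial^2}{\partial u \, \partial v} C_\theta(u, v).$$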

Definition 7 The cross-ratio function (Oakes 1989) is defined as

$$R_\theta(u, v) = \frac{C_\theta^{[1,1]}(u, v) \, C_\theta(u, v)}{C_\theta^{[1,0]}(u, v) \, C_\theta^{[0,1]}(u, v)}.$$

The local dependence at a location $(u, v)$ is defined as
• $R_\theta(u, v) > 1$: positive local dependence,
• $0 < R_\theta(u, v) < 1$: negative local dependence,
• $R_\theta(u, v) = 1$: local independence.

Under the independence copula, $R_\theta(u, v) = 1$ for $0 \le u \le 1$ and $0 \le v \le 1$. Remarkably, the Clayton copula has the constant cross-ratio $R_\theta(u, v) = 1 + \theta$.
A simplified formula of the cross-ratio function is available for an Archimedean copula. Using basic derivative rules, it can be shown that

$$R_\theta(u, v) = r_\theta\{C_\theta(u, v)\},$$

where $r_\theta(s) = -s \ddot{\phi}_\theta(s)/\dot{\phi}_\theta(s)$ and $\ddot{\phi}_\theta(t) = d^2 \phi_\theta(t)/dt^2$. Table 2.2 shows the formulas for $r_\theta(\cdot)$ under selected copulas.
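
The identity $R_\theta(u, v) = r_\theta\{C_\theta(u, v)\}$ can be checked without any algebra by approximating the partial derivatives of the copula with central finite differences. The sketch below does this for the Clayton and Gumbel copulas at one arbitrary point; the function name `cross_ratio` and the step size `h` are choices made for this illustration.

```r
# Cross-ratio via central finite differences of a copula function C(u, v)
cross_ratio <- function(C, u, v, h = 1e-4) {
  c10 <- (C(u + h, v) - C(u - h, v)) / (2 * h)
  c01 <- (C(u, v + h) - C(u, v - h)) / (2 * h)
  c11 <- (C(u + h, v + h) - C(u + h, v - h) -
            C(u - h, v + h) + C(u - h, v - h)) / (4 * h^2)
  c11 * C(u, v) / (c10 * c01)
}

theta   <- 2
clayton <- function(u, v) (u^(-theta) + v^(-theta) - 1)^(-1 / theta)
gumbel  <- function(u, v) {
  exp(-((-log(u))^(theta + 1) + (-log(v))^(theta + 1))^(1 / (theta + 1)))
}

u <- 0.4; v <- 0.7
R_clayton <- cross_ratio(clayton, u, v)          # compare with 1 + theta
R_gumbel  <- cross_ratio(gumbel, u, v)           # compare with 1 - theta / log(C)
r_gumbel  <- 1 - theta / log(gumbel(u, v))
c(R_clayton = R_clayton, R_gumbel = R_gumbel, r_gumbel = r_gumbel)
```

Repeating the calculation at other points shows the contrast between the two copulas: the Clayton cross-ratio stays at $1 + \theta$, whereas the Gumbel cross-ratio changes with the value of $C_\theta(u, v)$.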
The cross-ratio function has a practical interpretation as the relative risk. Consider a medical follow-up in which the primary endpoint is OS, denoted as $Y$, and the secondary endpoint is TTP, denoted as $X$. We are interested in how tumour progression influences the risk of death. For this purpose, we consider the conditional hazard functions:
• $\lambda_Y(y \mid X = x, Z) = \Pr(y \le Y < y + dy \mid Y \ge y, X = x, Z)/dy$: the hazard function of death given that a patient has experienced tumour progression at time $x$,
• $\lambda_Y(y \mid X > x, Z) = \Pr(y \le Y < y + dy \mid Y \ge y, X > x, Z)/dy$: the hazard function of death given that a patient has not yet experienced tumour progression at time $x$.

Under a model $\Pr(X > x, Y > y \mid Z) = C_\theta\{S_X(x \mid Z), S_Y(y \mid Z)\}$, the relative risk is

$$\frac{\lambda_Y(y \mid X = x, Z)}{\lambda_Y(y \mid X > x, Z)} = R_\theta\{S_X(x \mid Z), S_Y(y \mid Z)\}.$$

If $R_\theta > 1$, patients who have experienced tumour progression have a higher risk of death than those who have not yet experienced it. The Clayton copula yields a constant relative risk, and hence is regarded as a type of proportional hazards model. In Chap. 5, we shall explore the role of the cross-ratio function on prognostic analysis under the joint frailty-copula model.
The cross-ratio function is also interpreted through the equation

$$\frac{\Pr(X = x, Y = y \mid Z) \, \Pr(X > x, Y > y \mid Z)}{\Pr(X = x, Y > y \mid Z) \, \Pr(X > x, Y = y \mid Z)} = R_\theta\{S_X(x \mid Z), S_Y(y \mid Z)\}.$$

This is the odds ratio in the following 2 × 2 table (Table 2.3):

Table 2.3 A 2 × 2 table with the odds ratio AD/(BC)

      | Y = y | Y > y
----- | ----- | -----
X = x | A     | B
X > x | C     | D

Clayton (1978) proposed to estimate $\theta$ by counting the number of events in each cell of the 2 × 2 tables, which is possible even when data are subject to right-censoring. Emura et al. (2010) generalized his idea to estimate $\theta$ under any member of Archimedean copulas by utilizing the formula $R_\theta(u, v) = r_\theta\{C_\theta(u, v)\}$. See also Wang (2003), Emura and Wang (2010), and Emura et al. (2011) for the 2 × 2 table methods under Archimedean copula models.
We have seen that the Clayton copula has nice properties for statistical modeling: (i) a simple copula function, (ii) a simple expression of Kendall’s tau, (iii) a constant cross-ratio function, and (iv) interpretability of the parameter $\theta + 1$ as the relative risk or odds ratio. These properties are useful for modeling bivariate survival data and interpreting the results of data analysis.

2.6.2 Residual Dependence

We shall introduce the concept of residual dependence between two endpoints. This type of dependence arises when covariates influencing two endpoints are ignored or missing.
Suppose that the primary endpoint is OS, denoted as $Y$, and the secondary endpoint is TTP, denoted as $X$. We impose the conditional independence between the two endpoints

$$\Pr(X > x, Y > y \mid Z) = S_X(x \mid Z) S_Y(y \mid Z), \tag{2.5}$$

where $S_X(x \mid Z) = \Pr(X > x \mid Z)$ and $S_Y(y \mid Z) = \Pr(Y > y \mid Z)$ are the marginal survival functions. If Eq. (2.5) holds, one can perform two separate Cox regression analyses for the two endpoints. If Eq. (2.5) does not hold, the separate analyses may lose some information on dependence between endpoints and even produce biased results due to dependent censoring (Emura and Chen 2016, 2018).
To simplify our discussions, we consider a case where only one covariate is measured. In this case, the conditional independence required for separate analyses is

$$\Pr(X > x, Y > y \mid Z_1) = S_X(x \mid Z_1) S_Y(y \mid Z_1),$$

where $Z_1$ is a covariate. However, the conditional independence typically does not hold for only one covariate $Z_1$. To see this, let $Z = (Z_1, Z_2)$ be a two-dimensional vector of covariates that influence the two endpoints. Suppose that $Z_2$ is ignored as it is difficult to measure or is inconsistently measured (e.g., tumour volume). Figure 2.5 explains how the conditional independence fails to hold by omitting $Z_2$. Since $Z_2$ relates to the two endpoints, the variation in $Z_2$ induces unobserved frailty. For instance, a high (low) value of $Z_2$ is linked to short (long) values of $X$ and $Y$. Consequently, $X$ and $Y$ exhibit positive association.

[Figure 2.5: diagram with the labels “Death”, “Tumour Progression”, “Dependence”, “Z1: Observed covariate”, and “Z2: Omitted covariate”]

Fig. 2.5 Residual dependence between death and tumour progression


The above discussions lead to a principle that the conditional independence is less
likely to hold if many important covariates are omitted or ignored from the model.
This mechanism of yielding dependence is termed residual dependence. Residual
dependence arises in a meta-analysis where important covariates are missing in some
studies (Chap. 3; Emura et al. 2017). In this case, the copula model (2.2) can help
relax the conditional independence.

2.6.3 Likelihood Function

We consider bivariate survival data containing $N$ patients. For $j = 1, 2, \ldots, N$, let
• $(X_j, Y_j)$: a pair of event times,
• $(C_j, C_j^*)$: a pair of censoring times for $(X_j, Y_j)$.
Bivariate survival data consist of $\{(T_j, T_j^*, \delta_j, \delta_j^*, Z_j);\ j = 1, 2, \ldots, N\}$, where $T_j = \min(X_j, C_j)$, $T_j^* = \min(Y_j, C_j^*)$, $\delta_j = I(T_j = X_j)$, $\delta_j^* = I(T_j^* = Y_j)$, and $Z_j = (Z_{1j}, Z_{2j})$ is a vector of covariates.

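Proposition 2 Under the copula model $\Pr(X > x, Y > y \mid Z) = C_\theta\{S_X(x \mid Z_1), S_Y(y \mid Z_2)\}$, the log-likelihood is

$$\begin{aligned}
\ell = \log L = \sum_{j=1}^{N} \Big[ & \delta_j \log \lambda_X(T_j \mid Z_{1j}) + \delta_j^* \log \lambda_Y(T_j^* \mid Z_{2j}) \\
& + \delta_j \delta_j^* \log R_\theta\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\} \\
& + \delta_j \log \eta_{1,\theta}\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\} \\
& + \delta_j^* \log \eta_{2,\theta}\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\} \\
& + \log C_\theta\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\} \Big],
\end{aligned}$$

where $\lambda_X(t \mid Z_{1j}) = -\partial \log S_X(t \mid Z_{1j})/\partial t$, $\lambda_Y(t \mid Z_{2j}) = -\partial \log S_Y(t \mid Z_{2j})/\partial t$,

$$\eta_{1,\theta}(u, v) = u \, \frac{C_\theta^{[1,0]}(u, v)}{C_\theta(u, v)} \quad \text{and} \quad \eta_{2,\theta}(u, v) = v \, \frac{C_\theta^{[0,1]}(u, v)}{C_\theta(u, v)}.$$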

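Proof of Proposition 2: Each patient experiences one of the four cases: (i) $\delta_j = \delta_j^* = 1$, (ii) $\delta_j = 1$ and $\delta_j^* = 0$, (iii) $\delta_j = 0$ and $\delta_j^* = 1$, and (iv) $\delta_j = \delta_j^* = 0$. Each case has its own likelihood. Combining the four cases, the likelihood for the $j$-th patient is

$$\begin{aligned}
L_j = {} & \Pr(X_j = T_j, Y_j = T_j^* \mid Z_j)^{\delta_j \delta_j^*} \, \Pr(X_j = T_j, Y_j > T_j^* \mid Z_j)^{\delta_j (1 - \delta_j^*)} \\
& \times \Pr(X_j > T_j, Y_j = T_j^* \mid Z_j)^{(1 - \delta_j) \delta_j^*} \, \Pr(X_j > T_j, Y_j > T_j^* \mid Z_j)^{(1 - \delta_j)(1 - \delta_j^*)} \\
= {} & \left[ \frac{\Pr(X_j = T_j, Y_j = T_j^* \mid Z_j) \, \Pr(X_j > T_j, Y_j > T_j^* \mid Z_j)}{\Pr(X_j = T_j, Y_j > T_j^* \mid Z_j) \, \Pr(X_j > T_j, Y_j = T_j^* \mid Z_j)} \right]^{\delta_j \delta_j^*} \\
& \times \left[ \frac{\Pr(X_j = T_j, Y_j > T_j^* \mid Z_j)}{\Pr(X_j > T_j, Y_j > T_j^* \mid Z_j)} \right]^{\delta_j} \left[ \frac{\Pr(X_j > T_j, Y_j = T_j^* \mid Z_j)}{\Pr(X_j > T_j, Y_j > T_j^* \mid Z_j)} \right]^{\delta_j^*} \\
& \times \Pr(X_j > T_j, Y_j > T_j^* \mid Z_j).
\end{aligned}$$

Under the copula model,

$$\begin{aligned}
L_j = {} & R_\theta\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\}^{\delta_j \delta_j^*} \\
& \times \left[ -\frac{\partial}{\partial T_j} S_X(T_j \mid Z_{1j}) \, \frac{C_\theta^{[1,0]}\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\}}{C_\theta\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\}} \right]^{\delta_j} \\
& \times \left[ -\frac{\partial}{\partial T_j^*} S_Y(T_j^* \mid Z_{2j}) \, \frac{C_\theta^{[0,1]}\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\}}{C_\theta\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\}} \right]^{\delta_j^*} \\
& \times C_\theta\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\} \\
= {} & \lambda_X(T_j \mid Z_{1j})^{\delta_j} \times \lambda_Y(T_j^* \mid Z_{2j})^{\delta_j^*} \, R_\theta\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\}^{\delta_j \delta_j^*} \\
& \times \eta_{1,\theta}\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\}^{\delta_j} \, \eta_{2,\theta}\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\}^{\delta_j^*} \\
& \times C_\theta\{S_X(T_j \mid Z_{1j}), S_Y(T_j^* \mid Z_{2j})\}.
\end{aligned}$$

The log-likelihood is obtained by taking the logarithm of the above expression. □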


The likelihood-based procedures developed in Sect. 2.4 are applicable to the likelihood function in Proposition 2. The following proposition follows since $R_\theta(u, v) = \eta_{1,\theta}(u, v) = \eta_{2,\theta}(u, v) = 1$ under $C_\theta(u, v) = uv$.

Proposition 3 Under the independence model $\Pr(X > x, Y > y \mid Z) = S_X(x|Z_1) \times S_Y(y|Z_2)$, the log-likelihood is $\ell = \ell_X + \ell_Y$, where

$$
\ell_X = \sum_{j=1}^{N} [\delta_j \log \lambda_X(T_j|Z_{1j}) - \Lambda_X(T_j|Z_{1j})],
\qquad
\ell_Y = \sum_{j=1}^{N} [\delta_j^* \log \lambda_Y(T_j^*|Z_{2j}) - \Lambda_Y(T_j^*|Z_{2j})],
$$

where $\Lambda_X(t|Z_{1j}) = -\log S_X(t|Z_{1j})$ and $\Lambda_Y(t|Z_{2j}) = -\log S_Y(t|Z_{2j})$.
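The step from Proposition 2 to Proposition 3 can be written out explicitly. The display below is only a worked restatement in the symbols already defined above: substituting $C_\theta(u, v) = uv$ sets the three ratio terms to $\log 1 = 0$ and splits the last term into the two cumulative hazards.

```latex
\begin{align*}
\log L
&= \sum_{j=1}^{N} \Big[ \delta_j \log \lambda_X(T_j|Z_{1j}) + \delta_j^* \log \lambda_Y(T_j^*|Z_{2j})
   + \underbrace{\delta_j \delta_j^* \log 1 + \delta_j \log 1 + \delta_j^* \log 1}_{=\,0} \\
&\qquad\qquad + \log \{ S_X(T_j|Z_{1j})\, S_Y(T_j^*|Z_{2j}) \} \Big] \\
&= \sum_{j=1}^{N} \big[ \delta_j \log \lambda_X(T_j|Z_{1j}) - \Lambda_X(T_j|Z_{1j}) \big]
 + \sum_{j=1}^{N} \big[ \delta_j^* \log \lambda_Y(T_j^*|Z_{2j}) - \Lambda_Y(T_j^*|Z_{2j}) \big]
 = \ell_X + \ell_Y .
\end{align*}
```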

Proposition 3 implies that, under the independence model, one can obtain the MLE by maximizing $\ell_X$ based on the data $\{(T_j, \delta_j, Z_{1j});\ j = 1, 2, \ldots, N\}$ and maximizing $\ell_Y$ based on the data $\{(T_j^*, \delta_j^*, Z_{2j});\ j = 1, 2, \ldots, N\}$, as discussed in Sect. 2.4. However, when the two event times are in fact dependent, the two separate analyses yield inefficient estimators because they ignore this dependence.

2.7 Exercises

1. Is TTP an adequate endpoint in advanced colorectal cancer? After reading Chapter 2 and Piedbois and Croswell (2008), please write a one-page report to summarize your answers.
2. Are the following statements correct? Please verify your answers.

(1) PFS ≤ OS holds for all patients in a clinical trial.
(2) TTP ≤ OS holds for all patients in a clinical trial.
(3) PFS ≤ TTP holds for all patients in a clinical trial.
(4) PFS < OS holds for all patients in a clinical trial.

3. Answer the following questions by performing Cox regression on the 63 training samples from the lung cancer data available in the compound.Cox R package (Emura et al. 2019).

(1) Is ZNF264 univariately associated with survival (P-value < 0.05)?
(2) Is NF1 univariately associated with survival (P-value < 0.05)?
(3) Are ZNF264 and NF1 associated with survival (P-value < 0.05)?
(4) Discuss the multicollinearity between ZNF264 and NF1.
(5) How many genes are univariately associated with survival (P-value < 0.05)?

4. We analyze the data $(T_i, \delta_i, Z_i)$, $i = 1, \ldots, n$, under the model $S(t|Z_i) = \exp\{-\lambda t \exp(\beta Z_i)\}$, where $\lambda > 0$, $-\infty < \beta < \infty$, and $Z_i = 0$ or $1$. Let $m = \sum_{i=1}^{n} \delta_i$, $n_1 = \sum_{i=1}^{n} Z_i$, and $n_0 = n - n_1$.

(1) Write down the log-likelihood function $\ell(\varphi)$, where $\varphi = (\lambda, \beta)$.
(2) Obtain the MLE by solving the score equation $S(\varphi) = 0$.
(3) Derive the Hessian matrix $H(\varphi) = \partial^2 \ell(\varphi)/\partial\varphi\,\partial\varphi'$.
(4) Derive the Newton–Raphson algorithm and apply it to the data of Sect. 2.3.1.
(5) Compare the estimate $\exp(\hat\beta)$ with the one obtained from the partial likelihood.

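5. Derive the mean $E[X|Z]$ and variance $Var(X|Z)$ for the Weibull–gamma distribution $\Pr(X > x|Z) = [1 + \eta\lambda x^{\nu} \exp(\beta Z)]^{-1/\eta}$.
6. Consider a Gamma($\alpha$, $\beta$) distribution with the density

$$
f_{\alpha,\beta}(u) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, u^{\alpha-1} \exp\left(-\frac{u}{\beta}\right), \quad \alpha > 0,\ \beta > 0,\ u > 0.
$$

(1) Show $E(\log u) = \psi(\alpha) + \log\beta$, where $\psi(\alpha) = d\{\log\Gamma(\alpha)\}/d\alpha$ is the digamma function.
(2) Under Gamma($\alpha = 1/\eta$, $\beta = \eta$), derive the conditional distribution

$$
(u_i \mid T_i, \delta_i) \sim \mathrm{Gamma}\left( \alpha = \frac{1}{\eta} + m_i,\ \beta = \left\{ \frac{1}{\eta} + \sum_{j=1}^{N_i} \Lambda_{ij}(T_{ij}) \right\}^{-1} \right).
$$

Hint: $f(u_i \mid T_i, \delta_i) \propto L_i(T_i, \delta_i \mid u_i) f(u_i)$, where $L_i(T_i, \delta_i \mid u_i)$ is in the proof of Proposition 1.
(3) Show

$$
E(u_i \mid T_i, \delta_i) = \frac{1/\eta + m_i}{1/\eta + \sum_{j=1}^{N_i} \Lambda_{ij}(T_{ij})},
\qquad
E(\log u_i \mid T_i, \delta_i) = \psi\left(\frac{1}{\eta} + m_i\right) - \log\left\{ \frac{1}{\eta} + \sum_{j=1}^{N_i} \Lambda_{ij}(T_{ij}) \right\}.
$$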

7. Under the FGM copula, derive the expression of Kendall’s tau $\tau_\theta = 2\theta/9$ for $-1 \le \theta \le 1$.
8. Under the Clayton copula and Gumbel copula, derive the expressions of Kendall’s tau $\tau_\theta = \theta/(\theta + 2)$ and $\tau_\theta = \theta/(\theta + 1)$, respectively.
9. We consider the log-likelihood of Proposition 2 under the Clayton copula.

(1) Derive the forms of $\eta_{1,\theta}(u, v)$, $\eta_{2,\theta}(u, v)$, and $R_\theta(u, v)$.
(2) Write down the log-likelihood.

References

Andersen PK, Borgan O, Gill RD, Keiding N (1993) Statistical models based on counting processes.
Springer-Verlag, New York
Burr IW (1942) Cumulative frequency functions. Ann Math Stat 13(2):215–232
Cheema PK, Burkes RL (2013). Overall survival should be the primary endpoint in clinical trials
for advanced non-small-cell lung cancer. Curr Oncol 20:e150–160
Clayton DG (1978) A model for association in bivariate life tables and its application in epidemio-
logical studies of familial tendency in chronic disease incidence. Biometrika 65(1):141–151
Cox DR (1972) Regression models and life-tables (with discussion). J R Stat Soc Series B Stat
Methodol 34:187–220
Duchateau L, Janssen P, Lindsey P, Legrand C, Nguti R, Sylvester R (2002) The shared frailty model
and the power for heterogeneity tests in multicenter trials. Comput Stat Data Anal 40(3):603–620
Duchateau L, Janssen P (2007) The frailty model. Springer, New York
Eisenhauer E, Therasse P, Bogaerts J, et al (2009) New response evaluation criteria in solid tumours:
revised RECIST guideline (version 1.1). Eur J Cancer 45(2):228–247
Emura T (2019) joint.Cox: joint frailty-copula models for tumour progression and death in meta-
analysis, CRAN
Emura T, Chen YH (2016) Gene selection for survival data under dependent censoring, a copula-
based approach. Stat Methods Med Res 25(6):2840–2857

Emura T, Chen YH (2018). Analysis of survival data with dependent censoring, copula-based
approaches, JSS Research Series in Statistics. Springer
Emura T, Matsui S, Chen HY (2019) compound.Cox: univariate feature selection and compound
covariate for predicting survival. Comput Methods Programs Biomed 168:21–37
Emura T, Lin CW, Wang W (2010) A goodness-of-fit test for Archimedean copula models in the
presence of right censoring. Comput Stat Data Anal 54:3033–3043
Emura T, Nakatochi M, Murotani K, Rondeau V (2017) A joint frailty-copula model between
tumour progression and death for meta-analysis. Stat Methods Med Res 26(6):2649–2666
Emura T, Nakatochi M, Matsui S, Michimae H, Rondeau V (2018) Personalized dynamic prediction
of death according to tumour progression and high-dimensional genetic factors: meta-analysis
with a joint model. Stat Methods Med Res 27(9):2842–2858
Emura T, Wang W (2010) Testing quasi-independence for truncation data. J Multivar Anal
101:223–239
Emura T, Wang W, Hung HN (2011) Semi-parametric inference for copula models for dependently
truncated data. Stat Sinica 21:349–367
Fleming TR, Harrington DP (1991). Counting processes and survival analysis. Wiley, USA
Ganzfried BF, Riester M, Haibe-Kains B, et al (2013) Curated ovarian data: clinically annotated
data for the ovarian cancer transcriptome, Database; Article ID bat013. https://doi.org/10.1093/database/bat013
Green EM, Yothers G, Sargent DJ (2008) Surrogate endpoint validation: statistical elegance versus
clinical relevance. Stat Methods Med Res 17(5):477–486
Gumbel EJ (1960) Distributions de valeurs extremes en plusieurs dimensions. Publ Inst Statist
Paris 9:171–173
Ha ID, Jeong JH, Lee Y (2017) Statistical modelling of survival data with random effects: h-
likelihood approach. Springer, Singapore
Hamasaki T, Asakura K, Evans SR, Ochiai T (2016) Group-sequential clinical trials with multiple
co-objectives. JSS Series in Statistics. Springer, New York
Hirsch K, Wienke A (2012) Software for semiparametric shared gamma and log-normal frailty
models: an overview. Comput Methods Programs Biomed 107(3):582–597
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley,
New York
Klein JP, Moeschberger ML (2003) Survival analysis techniques for censored and truncated data.
Springer, New York
Le Tourneau C, Michiels S, Gan HK, Siu LL (2009) Reporting of time-to-event end points and track-
ing of failures in randomized trials of radiotherapy with or without any concomitant anticancer
agent for locally advanced head and neck cancer. J Clin Oncol 27(35):5965–5971
Matsui S, Buyse M, Simon R (eds) (2015) Design and analysis of clinical trials for predictive
medicine, vol 72. CRC Press, New York
Michiels S, Le Maître A, Buyse M, Burzykowski T, Maillard E, Bogaerts J, Pignon JP (2009)
Surrogate endpoints for overall survival in locally advanced head and neck cancer: meta-analyses
of individual patient data. Lancet Oncol 10(4):341–350
Molenberghs G, Verbeke G, Efendi A, Braekers R, Demétrio CG (2015) A combined gamma frailty
and normal random-effects model for repeated, overdispersed time-to-event data. Stat Methods
Med Res 24(4):434–452
Morgenstern D (1956) Einfache Beispiele zweidimensionaler Verteilungen. Mitteilungsblatt für
Mathematische Statistik. 8:234–235
Nelsen RB (2006) An introduction to copulas, 2nd edn. Springer, New York
Oakes D (1989) Bivariate survival models induced by frailties. J Am Stat Assoc 84:487–493
Oba K, Paoletti X, Alberts S et al (2013) Disease-free survival as a surrogate for overall survival in
adjuvant trials of gastric cancer: a meta-analysis. J Natl Cancer Inst 105(21):1600–1607
Pazdur R (2008) Endpoints for assessing drug activity in clinical trials. Oncologist 13:19–21
Piedbois P, Croswell MJ (2008) Surrogate endpoints for overall survival in advanced colorectal
cancer: a clinician’s perspective. Stat Methods Med Res 17(5):519–527

Ramsay J (1988) Monotone regression splines in action. Stat Sci 3:425–461


Rodríguez-Girondo M, Deelen J, Slagboom EP, Houwing-Duistermaat JJ (2018) Survival analysis
with delayed entry in selected families with application to human longevity. Stat Methods Med
Res 27(3):933–954
Rondeau V, Commenges D, Joly P (2003) Maximum penalized likelihood estimation in a gamma-
frailty model. Lifetime Data Anal 9:139–153
Rondeau V, Gonzalez JR (2005) frailtypack: a computer program for the analysis of correlated
failure time data using penalized likelihood estimation. Comput Methods Programs Biomed
80(2):154–164
Rondeau V, Pignon JP, Michiels S (2015) A joint model for dependence between clustered times to
tumour progression and deaths: a meta-analysis of chemotherapy in head and neck cancer. Stat
Methods Med Res 24(6):711–729
Rondeau V, Mauguen A, Laurent A, Berr C, Helmer C (2017) Dynamic prediction models for
clustered and interval-censored outcomes: investigating the intra-couple correlation in the risk of
dementia. Stat Methods Med Res 26(5):2168–2183
Shi Q, Sargent DJ (2009) Meta-analysis for the evaluation of surrogate endpoints in cancer clinical
trials. Int J Clin Oncol 14(2):102–111
Sklar A (1959) Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut
de Statistique de L’Université de Paris. 8:229–231
Soria JC, Massard C, Le Chevalier T (2010) Should progression-free survival be the primary measure
of efficacy for advanced NSCLC therapy? Ann Oncol 21(12):2324–2332
Sugimoto T, Hamasaki T, Evans SR (2017) Sizing clinical trials when comparing bivariate time-to-
event outcomes. Stat Med 36(9):1363–1382
Vu HT, Knuiman MW (2002) A hybrid ML-EM algorithm for calculation of maximum likelihood
estimates in semiparametric shared frailty models. Comput Stat Data Anal 40(1):173–187
Wang W (2003) Estimating the association parameter for copula models under dependent censoring.
J R Stat Soc Series B Stat Methodol 65(1):257–273
Chapter 3
The Joint Frailty-Copula Model
for Correlated Endpoints

Abstract This chapter describes a meta-analysis (or multicenter analysis) of individual
patient data with two correlated survival endpoints. The endpoints of interest are
time-to-tumour progression (TTP) and overall survival (OS). We first define a semi-
competing risks setting for TTP and OS. We then introduce the joint frailty-copula
model that formulates the shared frailty model for heterogeneity in a meta-analysis,
and utilizes a copula for dependence between TTP and OS. To account for the effect
that TTP is dependently censored by death, a likelihood function is derived under
the semi-competing risks setting. We adopt spline-based models for baseline hazard
functions with the aid of a penalized likelihood procedure. We analyze the data on
ovarian cancer patients to illustrate statistical analyses using the joint.Cox R package.

Keywords Clayton’s copula · Cox regression · Individual-level dependence ·
Penalized likelihood · Residual dependence · Semi-competing risk · Spline ·
Surrogate endpoint

3.1 Introduction

We consider a meta-analysis to perform Cox regression for both time-to-tumour
progression (TTP) and overall survival (OS). In this respect, Burzykowski et al.
(2001) developed a bivariate Weibull model for jointly performing Cox regression
for two endpoints with meta-analytic data. See also Chap. 11 of Burzykowski et al.
(2005). They proposed a two-step method: the first step applies a copula to account
for the individual-level dependence between two endpoints, and the second step
applies random effects to account for heterogeneity in a meta-analysis. The two-step
method of Burzykowski et al. (2001, 2005) has been applied to a number of cancer
studies for evaluating the correlations of two endpoints and was recently implemented
in an R package (Rotolo et al. 2018).

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2019
T. Emura et al., Survival Analysis with Correlated Endpoints,
JSS Research Series in Statistics, https://doi.org/10.1007/978-981-13-3516-7_3

While the two-step method can account for dependence between two endpoints
through a copula, the estimation method cannot account for the effect of dependent
censoring. In other words, the two-step method is valid only when two endpoints are
subject to independent censoring. There would be a concern for bias when we assess
TTP through the two-step method since TTP is dependently censored by death. Since
death may be highly associated with tumour progression, censoring due to death is
unlikely to be independent.
Statistical methods for semi-competing risks data properly deal with the case
where death can dependently censor TTP (Fine et al. 2001). This setting regards death
as a competing risk for TTP rather than treating death as independent censoring
of TTP (Haneuse and Lee 2016). Under the semi-competing risks setup, Rondeau
et al. (2015) developed a one-step estimation method based on a joint frailty model,
where a frailty term accounts for heterogeneity in a meta-analysis. The joint frailty
model can induce the intra-study dependence between TTP and OS through unob-
served frailties. However, there exists some residual dependence (individual-level
dependence) in meta-analyses (Burzykowski et al. 2001, 2005). Emura et al. (2017)
extended the joint frailty model of Rondeau et al. (2015) to account for the residual
dependence via copulas, after having accounted for the intra-study dependency with
frailties. While their copula model is similar to that of Burzykowski et al. (2001),
the estimation procedure of Emura et al. (2017) incorporates the effect of dependent
censoring (semi-competing risk) into the likelihood. In addition, their approach
adopts cubic splines for the baseline hazard functions, providing more flexible
models over the Weibull model of Burzykowski et al. (2001) and Rotolo et al.
(2018). Under the copula models, Peng et al. (2018) developed an even more flexible
model, where the forms of the baseline hazard functions are completely unspecified.
In the sequel, we introduce the estimation procedure of Emura et al. (2017) and
its implementation via the joint.Cox R package (Emura 2019).

3.2 Semi-competing Risks Data

Meta-analysis using patient-level information is called individual patient data (IPD) meta-analysis. IPD meta-analysis is essentially different from meta-analysis on summary data or published data, where patient-level information is lost. We only consider IPD meta-analyses since patient-level information is required to assess dependence between endpoints.
We consider an IPD meta-analysis on data consisting of $G$ independent studies, with the $i$th study containing $N_i$ patients. For $i = 1, 2, \ldots, G$ and $j = 1, 2, \ldots, N_i$, let

• $X_{ij}$: time-to-tumour progression (TTP),
• $D_{ij}$: overall survival (OS), or equivalently time-to-death,
• $C_{ij}$: independent and noninformative censoring time.

Figure 3.1 provides observation patterns of data where each patient exhibits one of
the four mutually exclusive cases (Cases A–D). First, if a patient experiences tumour
progression and then dies before independent censoring time, both TTP and OS are
available (Case A). Second, if a patient experiences tumour progression but does
not die before independent censoring time, then TTP is available but OS is censored
(Case B). Third, if a patient dies without tumour progression, then OS is available,
but TTP is dependently censored by death (Case C). Fourth, if a patient experiences
neither tumour progression nor death before independent censoring time, both TTP
and OS are censored (Case D).
Consequently, what we actually observe can be written as $(T_{ij}, T_{ij}^*, \delta_{ij}, \delta_{ij}^*, Z_{1,ij}, Z_{2,ij})$ for $i = 1, 2, \ldots, G$ and $j = 1, 2, \ldots, N_i$, where

• $T_{ij} = \min(X_{ij}, D_{ij}, C_{ij})$: first-occurring event time,
• $\delta_{ij} = I(T_{ij} = X_{ij})$: status of tumour progression (no progression = 0; progression = 1), where $I(\cdot)$ is the indicator function,
• $T_{ij}^* = \min(D_{ij}, C_{ij})$: censored terminal event time,
• $\delta_{ij}^* = I(T_{ij}^* = D_{ij})$: vital status (alive = 0; dead = 1),
• $Z_{1,ij}$: $p_1$-dimensional covariates associated with TTP,
• $Z_{2,ij}$: $p_2$-dimensional covariates associated with OS.

The four cases (Cases A–D in Fig. 3.1) can be identified by a pair $(\delta_{ij}, \delta_{ij}^*)$. For instance, Case A corresponds to the pair $(\delta_{ij}, \delta_{ij}^*) = (1, 1)$, whereby $T_{ij} = X_{ij}$ and $T_{ij}^* = D_{ij}$. Table 3.1 summarizes the four possible pairs, $(\delta_{ij}, \delta_{ij}^*) = (1, 1)$, $(1, 0)$, $(0, 1)$, and $(0, 0)$.
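The mapping from latent times to observed data can be sketched in a few lines. This is only an illustration of the definitions above, assuming continuous times with no ties; the function name `observe` and the case labels as strings are hypothetical choices, not part of any package.

```python
def observe(x, d, c):
    """Map latent times to observed semi-competing risks data.

    x: time-to-tumour progression X_ij
    d: time-to-death D_ij
    c: independent censoring time C_ij
    Returns (T, T_star, delta, delta_star, case).
    """
    t = min(x, d, c)                # first-occurring event time T_ij
    t_star = min(d, c)              # censored terminal event time T*_ij
    delta = int(t == x)             # 1 if progression is observed
    delta_star = int(t_star == d)   # 1 if death is observed
    cases = {(1, 1): "A", (1, 0): "B", (0, 1): "C", (0, 0): "D"}
    return t, t_star, delta, delta_star, cases[(delta, delta_star)]


# One hypothetical patient per observation pattern
print(observe(2.0, 5.0, 8.0))  # progression, then death
print(observe(2.0, 9.0, 6.0))  # progression, then censoring
print(observe(7.0, 3.0, 8.0))  # death before progression
print(observe(7.0, 9.0, 4.0))  # censoring before any event
```

Note that in Case C the latent progression time plays no role beyond exceeding the death time, which is exactly why TTP is dependently censored there.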
The aforementioned observation patterns follow the semi-competing risks setting (Fine et al. 2001), in which the terminal event is a competing risk for the nonterminal event. In our setting, death (terminal event) can dependently censor TTP (nonterminal event); see Case C in Fig. 3.1. On the other hand, tumour progression cannot censor OS; see Case A in Fig. 3.1. Hence, death is a competing risk for tumour progression, but tumour progression is not a competing risk for death, which explains the term “semi-competing risks”. The censoring of TTP by death is termed dependent censoring, as distinguished from the usual independent censoring. What we defined as TTP can actually be any nonterminal event such as “time-to-recurrence”.
[Figure 3.1: four timeline panels, each starting at treatment. Case (A): both TTP and OS are available. Case (B): TTP is available, but OS is censored. Case (C): OS is available, but TTP is dependently censored by death. Case (D): both TTP and OS are censored.]

Fig. 3.1 Observation patterns of semi-competing risks data. Observed event times are denoted by
solid lines and unobserved event times are denoted by dotted lines

Table 3.1 Observation patterns of semi-competing risks data

           First event             Last event              T_ij   T*_ij   δ_ij   δ*_ij
Case (A)   Tumour progression      Death                   X_ij   D_ij    1      1
Case (B)   Tumour progression      Independent censoring   X_ij   C_ij    1      0
Case (C)   Death                   Death                   D_ij   D_ij    0      1
Case (D)   Independent censoring   Independent censoring   C_ij   C_ij    0      0



3.3 Joint Frailty-Copula Model

It is generally understood that meta-analyses should assess heterogeneity between studies. Hence, an appropriate model for meta-analysis may include a study-specific (random) effect to account for heterogeneity. To capture the heterogeneity of baseline risks, Rondeau et al. (2015) considered unobserved frailty terms $u_i$ ($i = 1, 2, \ldots, G$), which act on the hazard functions for TTP and OS. The frailty terms are assumed to follow a gamma distribution with a density

$$
f_\eta(u) = \frac{1}{\Gamma(1/\eta)\,\eta^{1/\eta}}\, u^{\frac{1}{\eta}-1} \exp\left(-\frac{u}{\eta}\right), \quad u > 0,\ \eta > 0.
$$

The distribution has mean 1 and variance $\eta$, which represents the degree of the between-study heterogeneity. Conditional on $u_i$, $Z_{1,ij}$ and $Z_{2,ij}$, we define the hazard functions

$$
\begin{aligned}
r_{ij}(t|u_i) &= \Pr(t \le X_{ij} < t + dt \mid X_{ij} \ge t, u_i, Z_{1,ij})/dt, \\
\lambda_{ij}(t|u_i) &= \Pr(t \le D_{ij} < t + dt \mid D_{ij} \ge t, u_i, Z_{2,ij})/dt,
\end{aligned}
$$

where $Z_{1,ij}$ and $Z_{2,ij}$ are suppressed on the left-hand sides. Rondeau et al. (2015) proposed a joint frailty model for meta-analysis:

Definition 1 The joint frailty model is defined as

$$
\begin{cases}
r_{ij}(t|u_i) = u_i\, r_0(t) \exp(\beta_1' Z_{1,ij}), \\
\lambda_{ij}(t|u_i) = u_i^{\alpha}\, \lambda_0(t) \exp(\beta_2' Z_{2,ij}).
\end{cases}
\tag{3.1}
$$

In this model, $X_{ij}$ and $D_{ij}$ are assumed to be conditionally independent given $u_i$, $Z_{1,ij}$, and $Z_{2,ij}$.

The parameters β1 (or β2 ) are the effects of Z1,ij (or Z2, ij ), which are the target
population parameters. The forms of the baseline hazard functions r0 (·) and λ0 (·)
are flexibly modeled. Since the frailty term ui is shared by the two hazard functions,
it induces the intra-study dependence between Xij and Dij . The parameter α can
differentiate the effect of heterogeneity between the two endpoints.
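A small simulation can make the role of the frailty concrete. The sketch below, which uses only the Python standard library, draws study-level frailties from a gamma distribution with shape $1/\eta$ and scale $\eta$ (so that the mean is 1 and the variance is $\eta$) and evaluates the two conditional hazards of the joint frailty model at a single time point. The function names, the seed, and the numeric inputs are illustrative assumptions.

```python
import math
import random
import statistics


def simulate_frailties(eta, n_studies, seed=0):
    """Draw u_i ~ Gamma(shape = 1/eta, scale = eta): mean 1, variance eta."""
    rng = random.Random(seed)
    return [rng.gammavariate(1.0 / eta, eta) for _ in range(n_studies)]


def conditional_hazards(u, r0, lam0, alpha, lp1=0.0, lp2=0.0):
    """Hazards of the joint frailty model at one time point, given frailty u.

    r0, lam0: baseline hazard values r_0(t) and lambda_0(t)
    lp1, lp2: linear predictors beta_1'Z_1 and beta_2'Z_2
    """
    r = u * r0 * math.exp(lp1)               # hazard for progression
    lam = (u ** alpha) * lam0 * math.exp(lp2)  # hazard for death
    return r, lam


u = simulate_frailties(eta=0.5, n_studies=100_000)
print(statistics.fmean(u), statistics.pvariance(u))  # close to 1 and 0.5

# alpha = 0 removes the frailty from the death hazard; alpha = 1 shares it fully
print(conditional_hazards(2.0, r0=0.1, lam0=0.05, alpha=0.0))
print(conditional_hazards(2.0, r0=0.1, lam0=0.05, alpha=1.0))
```

With $\alpha = 0$ a high-risk study (large $u_i$) has elevated progression risk only, whereas $\alpha = 1$ elevates both risks equally.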
Residual dependence arises if a patient-level characteristic affecting both $X_{ij}$ and $D_{ij}$ is ignored in the model (Sect. 2.6.2). In meta-analysis, residual dependence is a legitimate concern since researchers often have limited access to covariates. Emura et al. (2017) proposed a joint frailty-copula model that extends the joint frailty model by introducing the intra-subject dependence using a copula (Nelsen 2006):

Definition 2 The joint frailty-copula model is defined as

$$
\begin{cases}
r_{ij}(t|u_i) = u_i\, r_0(t) \exp(\beta_1' Z_{1,ij}), \\
\lambda_{ij}(t|u_i) = u_i^{\alpha}\, \lambda_0(t) \exp(\beta_2' Z_{2,ij}), \\
\Pr(X_{ij} > x, D_{ij} > y \mid u_i) = C_\theta[S_{X_{ij}}(x|u_i), S_{D_{ij}}(y|u_i)],
\end{cases}
\tag{3.2}
$$

where $C_\theta$ is a copula with an unknown parameter $\theta$.

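In Definition 2, the survival functions and hazard functions are related through

$$
\begin{cases}
S_{X_{ij}}(x|u_i) = \exp\{-u_i R_0(x) \exp(\beta_1' Z_{1,ij})\}, & R_0(x) = \int_0^x r_0(t)\,dt, \\
S_{D_{ij}}(y|u_i) = \exp\{-u_i^{\alpha} \Lambda_0(y) \exp(\beta_2' Z_{2,ij})\}, & \Lambda_0(y) = \int_0^y \lambda_0(t)\,dt.
\end{cases}
$$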

The copula describes the intra-subject (individual-level) dependence between $X_{ij}$ and $D_{ij}$. We mainly focus on modeling positive dependence between $X_{ij}$ and $D_{ij}$ using the Clayton or Gumbel copula¹:

The Clayton copula

$$
C_\theta(v, w) = (v^{-\theta} + w^{-\theta} - 1)^{-1/\theta}, \quad \theta > 0,
$$

The Gumbel copula

$$
C_\theta(v, w) = \exp\left[ -\{(-\log v)^{\theta+1} + (-\log w)^{\theta+1}\}^{\frac{1}{\theta+1}} \right], \quad \theta \ge 0.
$$

Copulas provide a simple way to compute measures of correlation between $X_{ij}$ and $D_{ij}$. The most popular measure is Kendall’s tau, though copulas can also provide other measures such as Spearman’s rho.
Under the Clayton copula, Kendall’s tau is $\tau_\theta = \theta/(\theta + 2)$.
Under the Gumbel copula, Kendall’s tau is $\tau_\theta = \theta/(\theta + 1)$.
For more details about copulas and Kendall’s tau, see Sect. 2.6.
As in Burzykowski et al. (2001), one can use Kendall’s tau as a measure of the individual-level dependence between two endpoints in meta-analysis.
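The two copulas and their Kendall’s tau formulas are easy to evaluate directly. The following standard-library sketch implements them in the parameterization used in this section; the function names are illustrative. The last line converts the Clayton estimate of $\theta$ reported in the case study of Sect. 3.5 into Kendall’s tau.

```python
import math


def clayton(v, w, theta):
    """Clayton copula C_theta(v, w), theta > 0."""
    return (v ** -theta + w ** -theta - 1.0) ** (-1.0 / theta)


def gumbel(v, w, theta):
    """Gumbel copula C_theta(v, w) in this section's parameterization, theta >= 0."""
    k = theta + 1.0
    s = (-math.log(v)) ** k + (-math.log(w)) ** k
    return math.exp(-(s ** (1.0 / k)))


def tau_clayton(theta):
    """Kendall's tau under the Clayton copula."""
    return theta / (theta + 2.0)


def tau_gumbel(theta):
    """Kendall's tau under the Gumbel copula."""
    return theta / (theta + 1.0)


# Joint survival probability when both marginal survival probabilities are 0.5
print(clayton(0.5, 0.5, 2.0), gumbel(0.5, 0.5, 2.0))

# Kendall's tau implied by the case-study estimate of theta
print(tau_clayton(2.3468206))
```

Both joint probabilities exceed the value 0.25 obtained under independence, reflecting positive dependence between the two endpoints.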

1 The Clayton copula can be extended to allow for negative dependence by setting θ < 0. However,
we do not consider such an extension since it produces a singular distribution (Nelsen 2006). The
Gumbel copula cannot be defined for θ < 0.
Under the independence copula $C_\theta(v, w) = vw$, Definition 2 reduces to the joint frailty model (Definition 1). Note that the Clayton and Gumbel copulas reduce to the independence copula as $\theta \to 0$.
One might also consider some copulas that allow negative dependence, such as the FGM copula.

The Farlie–Gumbel–Morgenstern (FGM) copula:

$$
C_\theta(v, w) = vw\{1 + \theta(1 - v)(1 - w)\}, \quad -1 \le \theta \le 1.
$$

Under the FGM copula, Kendall’s tau is $\tau_\theta = 2\theta/9$. However, the range of Kendall’s tau is restricted to the interval from $\tau_{-1} = -2/9$ to $\tau_1 = 2/9$.

3.4 Penalized Likelihood with Splines

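A likelihood function can be constructed given observed data $(T_{ij}, T_{ij}^*, \delta_{ij}, \delta_{ij}^*, Z_{1,ij}, Z_{2,ij})$ for $i = 1, 2, \ldots, G$ and $j = 1, 2, \ldots, N_i$. Define notations

$$
\begin{aligned}
R_{ij}(t) &= R_0(t) \exp(\beta_1' Z_{1,ij}), & r_{ij}(t) &= dR_{ij}(t)/dt = r_0(t) \exp(\beta_1' Z_{1,ij}), \\
\Lambda_{ij}(t) &= \Lambda_0(t) \exp(\beta_2' Z_{2,ij}), & \lambda_{ij}(t) &= d\Lambda_{ij}(t)/dt = \lambda_0(t) \exp(\beta_2' Z_{2,ij}).
\end{aligned}
$$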

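Proposition 1 Under the joint frailty-copula model, the log-likelihood is

$$
\begin{aligned}
\ell = \sum_{i=1}^{G} \Bigg[ & \sum_{j=1}^{N_i} \{ \delta_{ij} \log r_{ij}(T_{ij}) + \delta_{ij}^* \log \lambda_{ij}(T_{ij}^*) \} \\
& + \log \int_0^\infty u^{m_i + \alpha m_i^*} \prod_{j=1}^{N_i} \Big\{ \psi_\theta[u R_{ij}(T_{ij}), u^{\alpha} \Lambda_{ij}(T_{ij}^*)]^{\delta_{ij}}\, \psi_\theta^*[u R_{ij}(T_{ij}), u^{\alpha} \Lambda_{ij}(T_{ij}^*)]^{\delta_{ij}^*} \\
& \qquad \times \Theta_\theta[u R_{ij}(T_{ij}), u^{\alpha} \Lambda_{ij}(T_{ij}^*)]^{\delta_{ij} \delta_{ij}^*}\, D_\theta[u R_{ij}(T_{ij}), u^{\alpha} \Lambda_{ij}(T_{ij}^*)] \Big\} f_\eta(u)\,du \Bigg],
\end{aligned}
\tag{3.3}
$$

where $m_i = \sum_{j=1}^{N_i} \delta_{ij}$, $m_i^* = \sum_{j=1}^{N_i} \delta_{ij}^*$, $D_\theta[s, t] = C_\theta[\exp(-s), \exp(-t)]$,

$$
\psi_\theta[s, t] = \frac{D_\theta^{[1,0]}[s, t]}{D_\theta[s, t]}, \quad
\psi_\theta^*[s, t] = \frac{D_\theta^{[0,1]}[s, t]}{D_\theta[s, t]}, \quad
\Theta_\theta[s, t] = \frac{D_\theta^{[1,1]}[s, t]\, D_\theta[s, t]}{D_\theta^{[1,0]}[s, t]\, D_\theta^{[0,1]}[s, t]},
$$

$D_\theta^{[1,0]}[s, t] = -\partial D_\theta[s, t]/\partial s$, $D_\theta^{[0,1]}[s, t] = -\partial D_\theta[s, t]/\partial t$, and $D_\theta^{[1,1]}[s, t] = \partial^2 D_\theta[s, t]/\partial s\,\partial t$.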

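Proof of Proposition 1: Let $T_i = (T_{i1}, \ldots, T_{iN_i})$, $T_i^* = (T_{i1}^*, \ldots, T_{iN_i}^*)$, $\delta_i = (\delta_{i1}, \ldots, \delta_{iN_i})$, and $\delta_i^* = (\delta_{i1}^*, \ldots, \delta_{iN_i}^*)$ be the data in the $i$th cluster. Given $u_i$, the likelihood for the $i$th study is

$$
\begin{aligned}
L(T_i, T_i^*, \delta_i, \delta_i^* \mid u_i)
={}& \prod_{j=1}^{N_i} \Pr(X_{ij} = T_{ij}, D_{ij} = T_{ij}^* \mid u_i)^{\delta_{ij}\delta_{ij}^*}\, \Pr(X_{ij} = T_{ij}, D_{ij} > T_{ij}^* \mid u_i)^{\delta_{ij}(1-\delta_{ij}^*)} \\
& \times \Pr(X_{ij} > T_{ij}, D_{ij} = T_{ij}^* \mid u_i)^{(1-\delta_{ij})\delta_{ij}^*}\, \Pr(X_{ij} > T_{ij}, D_{ij} > T_{ij}^* \mid u_i)^{(1-\delta_{ij})(1-\delta_{ij}^*)} \\
={}& \prod_{j=1}^{N_i} \big\{ u_i r_{ij}(T_{ij})\, u_i^{\alpha} \lambda_{ij}(T_{ij}^*)\, D_\theta^{[1,1]}[u_i R_{ij}(T_{ij}), u_i^{\alpha} \Lambda_{ij}(T_{ij}^*)] \big\}^{\delta_{ij}\delta_{ij}^*} \\
& \times \big\{ u_i r_{ij}(T_{ij})\, D_\theta^{[1,0]}[u_i R_{ij}(T_{ij}), u_i^{\alpha} \Lambda_{ij}(T_{ij}^*)] \big\}^{\delta_{ij} - \delta_{ij}\delta_{ij}^*} \\
& \times \big\{ u_i^{\alpha} \lambda_{ij}(T_{ij}^*)\, D_\theta^{[0,1]}[u_i R_{ij}(T_{ij}), u_i^{\alpha} \Lambda_{ij}(T_{ij}^*)] \big\}^{\delta_{ij}^* - \delta_{ij}\delta_{ij}^*} \\
& \times D_\theta[u_i R_{ij}(T_{ij}), u_i^{\alpha} \Lambda_{ij}(T_{ij}^*)]^{1 - \delta_{ij} - \delta_{ij}^* + \delta_{ij}\delta_{ij}^*} \\
={}& \Bigg\{ \prod_{j=1}^{N_i} r_{ij}(T_{ij})^{\delta_{ij}}\, \lambda_{ij}(T_{ij}^*)^{\delta_{ij}^*} \Bigg\} \\
& \times u_i^{m_i + \alpha m_i^*} \prod_{j=1}^{N_i} \Big\{ \psi_\theta[u_i R_{ij}(T_{ij}), u_i^{\alpha} \Lambda_{ij}(T_{ij}^*)]^{\delta_{ij}}\, \psi_\theta^*[u_i R_{ij}(T_{ij}), u_i^{\alpha} \Lambda_{ij}(T_{ij}^*)]^{\delta_{ij}^*} \\
& \qquad \times \Theta_\theta[u_i R_{ij}(T_{ij}), u_i^{\alpha} \Lambda_{ij}(T_{ij}^*)]^{\delta_{ij}\delta_{ij}^*}\, D_\theta[u_i R_{ij}(T_{ij}), u_i^{\alpha} \Lambda_{ij}(T_{ij}^*)] \Big\}.
\end{aligned}
$$

Integrating out the unobserved frailty, the likelihood for the $i$th study is

$$
\begin{aligned}
L(T_i, T_i^*, \delta_i, \delta_i^*)
={}& \int_0^\infty L(T_i, T_i^*, \delta_i, \delta_i^* \mid u)\, f_\eta(u)\,du \\
={}& \prod_{j=1}^{N_i} r_{ij}(T_{ij})^{\delta_{ij}}\, \lambda_{ij}(T_{ij}^*)^{\delta_{ij}^*} \\
& \times \int_0^\infty u^{m_i + \alpha m_i^*} \prod_{j=1}^{N_i} \Big\{ \psi_\theta[u R_{ij}(T_{ij}), u^{\alpha} \Lambda_{ij}(T_{ij}^*)]^{\delta_{ij}}\, \psi_\theta^*[u R_{ij}(T_{ij}), u^{\alpha} \Lambda_{ij}(T_{ij}^*)]^{\delta_{ij}^*} \\
& \qquad \times \Theta_\theta[u R_{ij}(T_{ij}), u^{\alpha} \Lambda_{ij}(T_{ij}^*)]^{\delta_{ij}\delta_{ij}^*}\, D_\theta[u R_{ij}(T_{ij}), u^{\alpha} \Lambda_{ij}(T_{ij}^*)] \Big\} f_\eta(u)\,du.
\end{aligned}
$$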

Equation (3.3) follows by taking the logarithm and summing over $i = 1, 2, \ldots, G$. □

The log-likelihood function in Eq. (3.3) has a simple form under the Clayton copula. One can easily obtain $D_\theta[s, t] = A_\theta(s, t)^{-1/\theta}$, $\psi_\theta[s, t] = \exp(\theta s)/A_\theta(s, t)$, $\psi_\theta^*[s, t] = \exp(\theta t)/A_\theta(s, t)$, and $\Theta_\theta[s, t] = 1 + \theta$, where $A_\theta(s, t) = \exp(\theta s) + \exp(\theta t) - 1$. By substituting these forms into Eq. (3.3), the log-likelihood is readily computable.
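The Clayton closed forms can be checked numerically against the general definitions in Proposition 1. The sketch below is such a check under stated assumptions: it uses central finite differences with an arbitrarily chosen step size, reads the cross-ratio term in Eq. (3.3) as $\Theta_\theta = D_\theta^{[1,1]} D_\theta / (D_\theta^{[1,0]} D_\theta^{[0,1]})$, and all function names are illustrative.

```python
import math


def D(s, t, theta):
    """D_theta[s, t] = C_theta[exp(-s), exp(-t)] under the Clayton copula."""
    return (math.exp(theta * s) + math.exp(theta * t) - 1.0) ** (-1.0 / theta)


def closed_forms(s, t, theta):
    """Closed-form psi, psi*, and the cross-ratio term under the Clayton copula."""
    a = math.exp(theta * s) + math.exp(theta * t) - 1.0  # A_theta(s, t)
    return math.exp(theta * s) / a, math.exp(theta * t) / a, 1.0 + theta


def numeric_forms(s, t, theta, h=1e-4):
    """The same three quantities from the definitions, via central differences."""
    d = D(s, t, theta)
    d10 = -(D(s + h, t, theta) - D(s - h, t, theta)) / (2 * h)   # -dD/ds
    d01 = -(D(s, t + h, theta) - D(s, t - h, theta)) / (2 * h)   # -dD/dt
    d11 = (D(s + h, t + h, theta) - D(s + h, t - h, theta)
           - D(s - h, t + h, theta) + D(s - h, t - h, theta)) / (4 * h * h)
    return d10 / d, d01 / d, d11 * d / (d10 * d01)


exact = closed_forms(0.4, 0.9, 2.0)
approx = numeric_forms(0.4, 0.9, 2.0)
for name, e, a in zip(("psi", "psi*", "cross-ratio"), exact, approx):
    print(name, e, a)
```

The agreement between the two columns illustrates why the Clayton copula is computationally convenient: no derivatives of the copula need to be evaluated inside the frailty integral.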
Following Rondeau et al. (2015), the forms of $r_0(\cdot)$ and $\lambda_0(\cdot)$ are modeled via the cubic M-spline (Ramsay 1988). The spline method aims to obtain a smooth estimate for $r_0(\cdot)$ or $\lambda_0(\cdot)$ as a weighted sum of cubic polynomial functions, called basis functions. To define basis functions, one needs to determine knots that divide the range of observed event times (see Fig. 3.2 for example).

Fig. 3.2 A baseline hazard function expressed by the cubic M-spline. The knots are set by the smallest event time $\xi_1 = \min_{ij}(T_{ij})$, the largest follow-up time $\xi_3 = \max_{ij}(T_{ij}^*)$, and their intermediate value $\xi_2 = (\xi_1 + \xi_3)/2$

For instance, we define $r_0(t)$ and $\lambda_0(t)$ on $t \in [\xi_1, \xi_3]$, where $\xi_1 < \xi_2 < \xi_3$ are the knots. One may set the smallest event time $\xi_1 = \min_{ij}(T_{ij})$, the largest follow-up time $\xi_3 = \max_{ij}(T_{ij}^*)$, and $\xi_2 = (\xi_1 + \xi_3)/2$. We then obtain the five basis functions such that

$$
r_0(t) = \sum_{\ell=1}^{5} g_\ell M_\ell(t) = g' M(t), \qquad \lambda_0(t) = \sum_{\ell=1}^{5} h_\ell M_\ell(t) = h' M(t),
$$

where $M(t) = (M_1(t), \ldots, M_5(t))'$ are the M-spline basis functions and are cubic polynomial functions of $t$. The concrete formulas of the basis functions are given in Appendix A. Here, $g = (g_1, \ldots, g_5)'$ and $h = (h_1, \ldots, h_5)'$ are unknown positive parameters. The five-parameter model gives good flexibility for real applications (Ramsay 1988) and is a reasonable choice (Commenges and Jacqmin-Gadda 2015). Since the spline bases are easy to integrate, the baseline cumulative hazard functions are computed as

$$
R_0(t) = \sum_{\ell=1}^{5} g_\ell I_\ell(t), \qquad \Lambda_0(t) = \sum_{\ell=1}^{5} h_\ell I_\ell(t),
$$

where $I_\ell(t)$ is the integral of $M_\ell(t)$, called the I-spline basis (Ramsay 1988). The M-spline and I-spline bases are displayed in Fig. 2.1 of Chap. 2, and their expressions are given in Appendix A. The joint.Cox package offers functions M.spline() for computing $M_\ell(t)$ and I.spline() for $I_\ell(t)$.
With the spline-based model, we consider a penalized log-likelihood

$$
\ell(\alpha, \eta, \theta, \beta_1, \beta_2, g, h) - \kappa_1 \int \ddot{r}_0(t)^2\,dt - \kappa_2 \int \ddot{\lambda}_0(t)^2\,dt,
\tag{3.4}
$$

where $\ddot{f}(t) = d^2 f(t)/dt^2$, and $(\kappa_1, \kappa_2)$ are given nonnegative values. The parameters $(\kappa_1, \kappa_2)$ are called smoothing parameters, which control the degrees of penalties on the roughness of the two baseline hazard functions. Under the five-parameter splines, it can be shown (Appendix A) that

$$
\int \ddot{r}_0(t)^2\,dt = g' \Omega g, \quad \int \ddot{\lambda}_0(t)^2\,dt = h' \Omega h, \quad
\Omega = \begin{bmatrix}
192 & -132 & 24 & 12 & 0 \\
-132 & 96 & -24 & -12 & 12 \\
24 & -24 & 24 & -24 & 24 \\
12 & -12 & -24 & 96 & -132 \\
0 & 12 & 24 & -132 & 192
\end{bmatrix}.
$$

Hence, the penalized log-likelihood is written as

$$
\ell_{PL}(\alpha, \eta, \theta, \beta_1, \beta_2, g, h) = \ell(\alpha, \eta, \theta, \beta_1, \beta_2, g, h) - \kappa_1 g' \Omega g - \kappa_2 h' \Omega h
\tag{3.5}
$$

for a given pair of $(\kappa_1, \kappa_2)$. We suggest choosing $\kappa_1$ and $\kappa_2$ by maximizing $LCV_1(\kappa_1)$ and $LCV_2(\kappa_2)$, which shall be defined in Sect. 3.7.
If Cθ (v, w)  vw is fitted (or if θ ≈ 0 is assumed in the Clayton copula), then the
penalized log-likelihood in Eq. (3.5) is equivalent to that for the joint frailty model
of Rondeau et al. (2015). See Exercise 6 for more details.
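The penalty terms in Eq. (3.5) are plain quadratic forms and can be evaluated without any matrix library. The sketch below copies the matrix entries exactly as printed in this section; whether an additional knot-dependent scale factor from Appendix A applies is not checked here, so treat the scale of the result as an assumption. The function names and the numeric inputs are illustrative.

```python
# Penalty matrix entries as printed in this section (scale factor, if any, not included)
OMEGA = [
    [192, -132, 24, 12, 0],
    [-132, 96, -24, -12, 12],
    [24, -24, 24, -24, 24],
    [12, -12, -24, 96, -132],
    [0, 12, 24, -132, 192],
]


def roughness(coef):
    """Quadratic form coef' OMEGA coef for five spline coefficients."""
    return sum(coef[a] * OMEGA[a][b] * coef[b]
               for a in range(5) for b in range(5))


def penalized_loglik(loglik, g, h, kappa1, kappa2):
    """Eq. (3.5): log-likelihood minus the two roughness penalties."""
    return loglik - kappa1 * roughness(g) - kappa2 * roughness(h)


g = [1.0, 1.0, 1.0, 1.0, 1.0]   # hypothetical coefficients for r_0
h = [1.0, 0.0, 0.0, 0.0, 0.0]   # hypothetical coefficients for lambda_0
print(roughness(g), roughness(h))
print(penalized_loglik(-100.0, g, h, kappa1=0.5, kappa2=0.25))
```

Larger values of the smoothing parameters pull the fitted coefficients towards configurations with a small quadratic form, that is, towards smoother baseline hazards.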
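The standard error (SE) and confidence interval (CI) are calculated from the converged Hessian matrix defined as $\hat{H}_{PL} \equiv H_{PL}(\hat\varphi)$, where $H_{PL}(\varphi) = \partial^2 \ell_{PL}(\varphi)/\partial\varphi^2$ and $\hat\varphi = (\hat\eta, \hat\theta, \hat\beta_1, \hat\beta_2, \hat{g}, \hat{h}) = \arg\max_\varphi \ell_{PL}(\varphi)$. For instance, the 95% CI for $\beta_1$ is

$$
\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1) = \hat\beta_1 \pm 1.96 \times \sqrt{\{(-\hat{H}_{PL})^{-1}\}_{\beta_1}}.
$$

Similarly, the 95% CI for the baseline hazard function $r_0(t)$ is

$$
\hat{r}_0(t) \pm 1.96 \times SE\{\hat{r}_0(t)\} = M(t)'\hat{g} \pm 1.96 \times \sqrt{M(t)' \{(-\hat{H}_{PL})^{-1}\}_{g}\, M(t)}.
$$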

One can use the joint.Cox R package (Emura 2019) for computing κ1 , κ2 , ϕ̂, the
SEs and 95%CIs.
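The two interval formulas above amount to a few arithmetic steps once the relevant block of $(-\hat{H}_{PL})^{-1}$ is available. The sketch below assumes that block is supplied as a nested list; the function names and the small two-basis example are hypothetical. The first call uses the estimate and SE reported for beta1 in the case-study output of Sect. 3.5.

```python
import math


def wald_ci(estimate, se, z=1.96):
    """Interval estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se


def curve_ci(basis, coef, cov, z=1.96):
    """Pointwise interval for a spline curve M(t)'g.

    basis: values M_l(t) at one time point
    coef:  estimated coefficients g
    cov:   block of the inverse negative Hessian for g (list of lists)
    """
    k = len(basis)
    fit = sum(b * c for b, c in zip(basis, coef))
    var = sum(basis[a] * cov[a][b] * basis[b] for a in range(k) for b in range(k))
    half = z * math.sqrt(var)
    return fit - half, fit + half


# beta1 from the case study: estimate 0.19946579 with SE 0.03819308
print(wald_ci(0.19946579, 0.03819308))

# Toy two-basis example with made-up numbers
print(curve_ci([1.0, 0.0], [2.0, 5.0], [[0.25, 0.0], [0.0, 1.0]]))
```

The first interval agrees with the Lower and Upper columns printed for beta1 in the case study.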

3.5 Case Study: Ovarian Cancer Data

To demonstrate the statistical methods introduced in this chapter, we analyze the subset of the ovarian cancer data of Ganzfried et al. (2013). Ganzfried et al. (2013) performed the IPD meta-analysis on their data to conclude that the gene expression of CXCL12 is significantly associated with OS. In our analysis, we examine the effect of the CXCL12 gene expression on time-to-relapse and OS using the joint frailty-copula model.
To this end, we chose the subset consisting of four studies that recorded the two endpoints, as previously considered by Emura et al. (2017). Table 3.2 shows the subset containing 1003 ovarian cancer patients from the four studies ($N_1 = 110$, $N_2 = 58$, $N_3 = 278$, and $N_4 = 557$), which is available in the joint.Cox package. All patients are surgically treated and then followed up for cancer relapse until death or censoring. We regard TTP as time-to-relapse, measured from the time of surgery. We consider the CXCL12 gene expression as a covariate for TTP and OS.

Table 3.2 Data on ovarian cancer patients (Ganzfried et al. 2013; Emura et al. 2017)

| Dataset (a) | Sample size | Relapse (\(\delta_{ij} = 1\)) | Death (\(\delta^*_{ij} = 1\)) | Censoring (\(\delta^*_{ij} = 0\)) |
|---|---|---|---|---|
| GSE17260 | \(N_1 = 110\) | 76 (69%) | 46 (42%) | 64 (58%) |
| GSE30161 | \(N_2 = 58\) | 48 (83%) | 36 (62%) | 22 (38%) |
| GSE9891 | \(N_3 = 278\) | 185 (67%) | 113 (41%) | 165 (59%) |
| TCGA | \(N_4 = 557\) | 266 (48%) | 290 (52%) | 267 (48%) |
| Total | \(\sum_{i=1}^{4} N_i = 1003\) | 575 (57%) | 485 (48%) | 518 (52%) |

The last three columns give the number of observed events (event rates %).
Notes (a) Each dataset is identified by its GEO (Gene Expression Omnibus) accession number. Event rates (%)
are the percentages of patients experiencing a particular event (Relapse, Death or Censoring) within a study

We fitted the joint frailty-copula model to the data by using the R code given in
B1 of Appendix B. After running the code, we obtained the plots for searching the
optimal values of the smoothing parameters κ1 and κ2 (Fig. 3.3). One can see that
\(\kappa_1 = 2.76 \times 10^{16}\) and \(\kappa_2 = 3.45 \times 10^{16}\) are chosen as the maximizers of LCV1(κ1)
and LCV2(κ2), respectively.

Fig. 3.3 Plots for choosing the optimal values for κ1 and κ2. They are chosen by maximizing LCV
= logL − DF, where logL is the log-likelihood and DF is the degrees of freedom. Panel (1): the optimal value \(\kappa_1 = 2.76 \times 10^{16}\) is shown in the rightmost panel. Panel (2): the optimal value \(\kappa_2 = 3.45 \times 10^{16}\) is shown in the rightmost panel.

The output of the R code is shown below:

> res
$count
No.of samples No.of events No.of deaths No.of censors
4 110 76 46 64
8 58 48 36 22
11 278 185 113 165
14 557 266 290 267

$beta1
estimate SE Lower Upper
0.19946579 0.03819308 0.12460735 0.27432422

$beta2
estimate SE Lower Upper
0.16550013 0.04371864 0.07981159 0.25118867

$eta
estimate SE Lower Upper
0.033423894 0.029324063 0.005987583 0.186578891

$theta
estimate SE Lower Upper
2.3468206 0.2500292 1.9045466 2.8917996

$tau
estimate tau_se Lower Upper
0.53989360 0.02646533 0.48777664 0.59115251

$LCV1
K1 LCV1
2.758621e+16 -4.591564e+03

$LCV2
K2 LCV2
3.448276e+16 -4.159635e+03

$g
[1] 9.065934e-01 1.711413e+00 6.733528e-06 3.947032e-02 2.790394e-07

$h
[1] 2.108053e-01 1.083808e+00 1.001098e+00 1.796956e-01 6.951190e-07

$g_var
[,1] [,2] [,3] [,4] [,5]
[1,] 4.576008e-03 -3.369090e-03 1.552692e-07 -3.758267e-04 7.350478e-06
[2,] -3.369090e-03 2.656668e-02 2.655560e-08 6.202427e-04 1.600087e-05
[3,] 1.552692e-07 2.655560e-08 -2.475294e-07 -9.974754e-06 -1.157045e-08
[4,] -3.758267e-04 6.202427e-04 -9.974754e-06 5.863771e-02 8.841581e-06
[5,] 7.350478e-06 1.600087e-05 -1.157045e-08 8.841581e-06 -1.720272e-06

$h_var
[,1] [,2] [,3] [,4] [,5]
[1,] 9.424816e-04 -1.368119e-03 -1.412362e-05 -7.147868e-04 4.058112e-10
[2,] -1.368119e-03 9.976173e-03 -1.180192e-02 -6.841087e-04 -4.529052e-10
[3,] -1.412362e-05 -1.180192e-02 5.718015e-02 2.243583e-03 -4.938438e-09
[4,] -7.147868e-04 -6.841087e-04 2.243583e-03 1.078262e-03 -2.439095e-10
[5,] 4.058112e-10 -4.529052e-10 -4.938438e-09 -2.439095e-10 7.476648e-15

$convergence
MPL DF LCV code No.of.iterations No.of.randomizations
-8604.09320 11.69913 -8610.04633 1.00000 98.00000 10.00000

$convergence.parameters
NULL

Now we interpret the above outputs.
In “$count” we see the sample size (“No. of samples” = \(N_i\)) and the number of
events (“No. of events” = \(m_i\), “No. of deaths” = \(m_i^*\), “No. of censors” = \(N_i - m_i^*\))
in each study. The numbers in “$count” are the same as those in Table 3.2.
The numbers “4, 8, 11, 14” in the first column represent the study IDs, which do not
have any particular meaning (see footnote 2).
From “$beta1” to “$tau”, we see the estimate, the SE, and the 95% CI (lower and
upper limits). The correspondences are “$beta1” = \(\hat\beta_1\), “$beta2” = \(\hat\beta_2\), “$eta” = \(\hat\eta\),
“$theta” = \(\hat\theta\), and “$tau” = \(\hat\tau = \hat\theta/(\hat\theta + 2)\). For instance, we had \(\hat\beta_1 = 0.199\)
(95% CI: 0.125–0.274). These values are converted to RR = \(\exp(\hat\beta_1) = 1.22\) (95% CI:
1.13–1.32). We set α = 0 in this analysis.
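The conversion from the printed estimates to the reported intervals and relative risks can be checked by hand. The following is a minimal Python sketch (not part of the joint.Cox package; the helper name `wald_ci` is hypothetical) that recomputes the Wald limits and the RRs from the “$beta1” and “$beta2” lines of the output.

```python
import math

def wald_ci(estimate, se, z=1.96):
    """Return the Wald interval (estimate - z*SE, estimate + z*SE)."""
    half_width = z * se
    return estimate - half_width, estimate + half_width

# Estimates and SEs copied from the "$beta1" and "$beta2" outputs.
fits = {
    "beta1 (time-to-relapse)": (0.19946579, 0.03819308),
    "beta2 (OS)": (0.16550013, 0.04371864),
}

for name, (beta_hat, se) in fits.items():
    lower, upper = wald_ci(beta_hat, se)
    # The RR and its limits are the exponentials of the log-scale quantities.
    rr, rr_lower, rr_upper = (math.exp(v) for v in (beta_hat, lower, upper))
    print(f"{name}: beta = {beta_hat:.3f} ({lower:.3f}, {upper:.3f}); "
          f"RR = {rr:.2f} ({rr_lower:.2f}, {rr_upper:.2f})")
```

The limits obtained this way agree with the “Lower” and “Upper” columns of the output to the printed precision.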
“$LCV1” and “$LCV2” show the results of the grid searches for maximizing
LCV1(κ1) and LCV2(κ2), respectively. We see the maximizers \(\kappa_1 = 2.76 \times 10^{16}\)
and \(\kappa_2 = 3.45 \times 10^{16}\) along with their maximized LCV values (see also Fig. 3.3).
“$g” and “$h” show the coefficients used in the splines, \(\hat g\) and \(\hat h\), respectively. The
resultant baseline hazard functions are

\[ \hat r_0(t) = 0.907 \times M_1(t) + 1.711 \times M_2(t) + 0.000 \times M_3(t) + 0.040 \times M_4(t) + 0.000 \times M_5(t), \]

\[ \hat\lambda_0(t) = 0.211 \times M_1(t) + 1.084 \times M_2(t) + 1.001 \times M_3(t) + 0.180 \times M_4(t) + 0.000 \times M_5(t). \]

“$g_var” and “$h_var” are the covariance matrices of \(\hat g\) and \(\hat h\), which are equivalent
to \((-\hat H_{PL}^{-1})_g\) and \((-\hat H_{PL}^{-1})_h\). They are used to compute the SEs and CIs of \(\hat r_0(t)\)
and \(\hat\lambda_0(t)\).
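As an illustration of how a covariance matrix of spline coefficients turns into a pointwise interval, the sketch below evaluates the quadratic form M(t)'(cov)M(t) in plain Python. The coefficient vector and matrix are copied from “$g” and “$g_var”; the basis values in `basis_at_t` are hypothetical numbers chosen for illustration and are not evaluations of the actual M-spline basis, and the function name `hazard_ci` is likewise an assumption.

```python
import math

# Copied from the "$g" and "$g_var" outputs.
g_hat = [9.065934e-01, 1.711413e+00, 6.733528e-06, 3.947032e-02, 2.790394e-07]
g_var = [
    [4.576008e-03, -3.369090e-03, 1.552692e-07, -3.758267e-04, 7.350478e-06],
    [-3.369090e-03, 2.656668e-02, 2.655560e-08, 6.202427e-04, 1.600087e-05],
    [1.552692e-07, 2.655560e-08, -2.475294e-07, -9.974754e-06, -1.157045e-08],
    [-3.758267e-04, 6.202427e-04, -9.974754e-06, 5.863771e-02, 8.841581e-06],
    [7.350478e-06, 1.600087e-05, -1.157045e-08, 8.841581e-06, -1.720272e-06],
]

def hazard_ci(basis, coef, cov, z=1.96):
    """Point estimate basis'coef and limits estimate -/+ z*sqrt(basis' cov basis)."""
    k = len(basis)
    estimate = sum(m * c for m, c in zip(basis, coef))
    variance = sum(basis[a] * cov[a][b] * basis[b]
                   for a in range(k) for b in range(k))
    if variance < 0:
        # The printed matrix has slightly negative diagonal entries for the
        # near-zero coefficients, so the quadratic form can be negative.
        raise ValueError("negative variance estimate for this basis vector")
    se = math.sqrt(variance)
    return estimate, estimate - z * se, estimate + z * se

# Hypothetical basis values at some time t (NOT the real M-spline values).
basis_at_t = [0.5, 0.5, 0.0, 0.0, 0.0]
estimate, lower, upper = hazard_ci(basis_at_t, g_hat, g_var)
print(f"hazard = {estimate:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

In practice the basis vector would come from evaluating M1(t), ..., M5(t) at the time of interest, and a lower limit below zero would need to be truncated at zero since a hazard is nonnegative.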
“$convergence” shows several aspects of the likelihood maximization.
“MPL” gives the maximized penalized log-likelihood in Eq. (3.5). “DF” gives the
degrees of freedom that shall be defined in Sect. 3.7. The result “DF = 11.69913”
implies that there are about 12 free parameters in the model. This number is smaller
than the total number of parameters, 14 = 1 + 1 + 1 + 1 + 5 + 5 (for $beta1, $beta2,
$eta, $theta, $g and $h), owing to constrained (penalized) likelihood optimization.
The value of “LCV” represents the likelihood cross-validation (LCV) criterion, which
can be interpreted like the negative of the AIC. A larger LCV value corresponds to a better
model. Since the LCV value accounts for the number of parameters in the model, it
can be used for variable selection. “No.of.randomizations = 10” implies that the default
initial value did not converge, and so the package tried 10 different initial values
by adding random noise to the default initial values. The algorithm converged to
the proper solution as indicated by “code = 1”. This implies that the gradient of
Eq. (3.5) is close to zero at the solution (see the help of the nlm() function in R).

Footnote 2: These IDs are remnants of the study IDs previously used in an old version of the curatedOvarianData
package. Since the IDs may change in newer versions, the ID numbers have lost their meaning.
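The three numbers “MPL”, “DF” and “LCV” are linked through Eq. (3.5) and the definition LCV = (log-likelihood) − DF given in Sect. 3.7. As a rough arithmetic check, written in Python with hypothetical variable names, one can back out the un-penalized log-likelihood and the size of the roughness penalty from the printed values.

```python
# Values copied from the "$convergence" output.
mpl = -8604.09320   # maximized penalized log-likelihood, Eq. (3.5)
df = 11.69913       # degrees of freedom
lcv = -8610.04633   # likelihood cross-validation criterion

# LCV = log-likelihood - DF, so the un-penalized log-likelihood is LCV + DF.
log_lik = lcv + df

# Eq. (3.5): MPL = log-likelihood - (kappa1 * g'Omega g + kappa2 * h'Omega h),
# so the total penalty is the gap between the two.
penalty = log_lik - mpl

print(f"log-likelihood = {log_lik:.4f}")
print(f"total penalty  = {penalty:.4f}")
```

Under these definitions the log-likelihood comes out at about −8598.35 and the total penalty at about 5.75, which is nonnegative as it should be.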
Table 3.3 summarizes the outputs. The relative risk (RR) of CXCL12 on OS is
significantly greater than the null value (RR = 1.18, 95% CI: 1.08–1.29). The RR of
CXCL12 on time-to-relapse is even higher (RR = 1.22, 95% CI: 1.13–1.32) than that
on OS. These RRs are relative to a one standard deviation increase in the expression of
CXCL12. Our result suggests that the expression of CXCL12 is a potential biomarker
predictive of cancer relapse in surgically treated ovarian cancer patients. The estimate
of the copula parameter (\(\hat\theta = 2.35\), 95% CI: 1.90–2.90) shows a moderate amount of
dependence between relapse and death (\(\hat\tau = 0.54\), 95% CI: 0.49–0.59). This suggests
that cancer relapse may predict death in ovarian cancer patients.
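The link between the copula parameter and Kendall's tau, τ = θ/(θ + 2), can be applied to the interval limits as well as to the point estimate. The Python sketch below also illustrates one way the printed limits for θ can be reproduced: an interval built on the log scale, exp{log θ̂ ± 1.96 × SE/θ̂}. That construction is an assumption made here because it matches the printed numbers; it has not been checked against the package source, and the function names are hypothetical.

```python
import math

def log_scale_ci(estimate, se, z=1.96):
    """Interval for a positive parameter built on the log scale (delta method)."""
    factor = math.exp(z * se / estimate)
    return estimate / factor, estimate * factor

def clayton_tau(theta):
    """Kendall's tau under the Clayton copula: tau = theta / (theta + 2)."""
    return theta / (theta + 2.0)

# Estimate and SE copied from the "$theta" output.
theta_hat, theta_se = 2.3468206, 0.2500292
theta_lower, theta_upper = log_scale_ci(theta_hat, theta_se)

# tau is increasing in theta, so the limits can be transformed directly.
tau_hat = clayton_tau(theta_hat)
tau_lower, tau_upper = clayton_tau(theta_lower), clayton_tau(theta_upper)

print(f"theta = {theta_hat:.2f} ({theta_lower:.2f}, {theta_upper:.2f})")
print(f"tau   = {tau_hat:.2f} ({tau_lower:.2f}, {tau_upper:.2f})")
```

The same log-scale construction applied to the “$eta” line gives limits close to the printed 0.006 and 0.187.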

Table 3.3 The joint analysis of time-to-relapse and OS using the meta-analytic data (four studies,
1003 patients) for ovarian cancer patients

| | The Clayton copula: Estimate (95% CI) | The independence copula: Estimate (95% CI) |
|---|---|---|
| RR (a) for time-to-relapse: exp(β1) | 1.22 (1.13–1.32) | 1.24 (1.14–1.35) |
| RR (a) for OS: exp(β2) | 1.18 (1.08–1.29) | 1.17 (1.07–1.29) |
| Heterogeneity: η | 0.033 (0.006–0.187) | 0.028 (0.004–0.180) |
| Copula parameter: θ | 2.35 (1.90–2.90) | 0.00 (fixed) |
| RR for death after relapse: θ + 1 | 3.35 (2.90–3.90) | 1.00 (fixed) |
| Kendall’s tau: τ = θ/(θ + 2) | 0.54 (0.49–0.59) | – |
| Maximum penalized log-likelihood | −8604.09 | −8744.02 |
| Degrees of freedom | 11.70 | 9.23 |
| LCV (b) | −8610.05 | −8745.93 |

Notes (a) The RR (Relative Risk) of the CXCL12 expression is examined (RR > 1 indicates that
patients with high CXCL12 expression have poor survival outcomes)
(b) The LCV (likelihood cross-validation) assesses model adequacy (a larger LCV corresponds to a better
model)

Table 3.3 also includes the results under the independence copula (i.e., with a
fixed copula parameter, θ ≈ 0). Due to the failure to account for residual dependence
between time-to-relapse and OS, the LCV value under the independence model is
smaller than that under the Clayton copula. Nevertheless, the estimates of RR are
fairly comparable to those under the Clayton copula. This implies the robustness for
marginal inference under copula misspecification. The simulation studies of Emura
et al. (2017) also demonstrated some robustness for marginal inference against copula
misspecification.
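The comparison between the two copulas in Table 3.3 amounts to picking the model with the larger LCV. A small Python sketch of that comparison, using only the numbers printed in the table (the dictionary layout is an arbitrary choice made for illustration):

```python
# Values copied from Table 3.3.
models = {
    "Clayton copula": {"mpl": -8604.09, "df": 11.70, "lcv": -8610.05},
    "Independence copula": {"mpl": -8744.02, "df": 9.23, "lcv": -8745.93},
}

# A larger LCV corresponds to a better model.
best = max(models, key=lambda name: models[name]["lcv"])
gain = models["Clayton copula"]["lcv"] - models["Independence copula"]["lcv"]

print(f"preferred model: {best}")
print(f"LCV gain of the Clayton copula: {gain:.2f}")
```

The Clayton copula is preferred by a margin of about 136 LCV units, even though it uses roughly 2.5 more degrees of freedom.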
Once the fitted parameter values are obtained, one can display the estimated
baseline hazard functions using the R code given in B1 of Appendix B. Figure 3.4
plots the estimated baseline hazard functions \(\hat r_0(t)\) and \(\hat\lambda_0(t)\) and their 95% CIs.
The baseline hazard rate for relapse (\(\hat r_0(t)\)) is high at the early stage and gradually
decreases as time passes. On the other hand, the hazard rate for death (\(\hat\lambda_0(t)\)) is
initially low and reaches a peak at around 2000 days. Thereafter, the hazard rate of
death is consistently higher than that of relapse. From these plots, one may suggest
that physicians monitor patients carefully for cancer relapse before 2000 days and,
after 2000 days, shift more attention to other life-threatening symptoms. The
possibility of jointly assessing the two hazard functions is one of the crucial
advantages of adopting splines for estimating baseline hazard functions.

Fig. 3.4 Estimated baseline hazard functions for time-to-relapse (TTP) and overall survival (OS)
in ovarian cancer patients

3.6 Technical Note 1: Numerical Maximization

This section explains how the penalized log-likelihood in Eq. (3.5) is maximized in
the joint.Cox R package.
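To avoid constraints on parameters (e.g., θ > 0), we consider log-transformed
parameters \(\tilde\eta = \log(\eta)\), \(\tilde\theta = \log(\theta)\), \(\tilde g = \log(g)\) and \(\tilde h = \log(h)\). Given a value of α,
one can write

\[ \ell_{PL}(\eta, \theta, \beta_1, \beta_2, g, h) = \ell_{PL}(\exp(\tilde\eta), \exp(\tilde\theta), \beta_1, \beta_2, \exp(\tilde g), \exp(\tilde h)) = \tilde\ell_{PL}(\tilde\eta, \tilde\theta, \beta_1, \beta_2, \tilde g, \tilde h). \]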
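To see the log-transformation at work on a problem small enough to verify by hand, consider a one-parameter toy likelihood ℓ(θ) = n log θ − θS with θ > 0 (an exponential rate with n events and total time S), which is not the model of this chapter. The Python sketch below maximizes it over t = log θ and then converts the Hessian back to the θ scale by the chain rule. Newton's method is swapped in for nlm() here, and all names are hypothetical.

```python
import math

def fit_rate_on_log_scale(n_events, total_time, start=0.0, tol=1e-10, max_iter=100):
    """Maximize l(theta) = n*log(theta) - theta*S over theta > 0 via t = log(theta)."""
    t = start
    for _ in range(max_iter):
        score_t = n_events - math.exp(t) * total_time   # first derivative in t
        hess_t = -math.exp(t) * total_time              # second derivative in t
        step = score_t / hess_t
        t -= step                                       # Newton update, unconstrained
        if abs(step) < tol:
            break
    theta = math.exp(t)
    hess_t = -theta * total_time
    # Chain rule: H~(t) = H(theta)*theta^2 + S(theta)*theta. At the maximizer the
    # score S(theta) is zero, so H(theta) = H~(t) / theta^2.
    hess_theta = hess_t / theta ** 2
    se_theta = math.sqrt(-1.0 / hess_theta)
    return theta, hess_theta, se_theta

theta_hat, hess, se = fit_rate_on_log_scale(n_events=10, total_time=5.0)
print(f"theta = {theta_hat:.4f}, Hessian = {hess:.4f}, SE = {se:.4f}")
```

For this toy problem the closed-form answers are θ̂ = n/S and SE = θ̂/√n, which gives a direct check on the back-transformation.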

A minimization function nlm() is applied to \(-\tilde\ell_{PL}\) with the initial values
\((\tilde\eta, \tilde\theta, \beta_1, \beta_2, \tilde g, \tilde h) = 0\). The initial values are equivalent to \(\eta = \theta = 1\), \(\beta_1 = \beta_2 = 0\),
and \(g = h = (1, 1, 1, 1, 1)'\). The converged Hessian matrix is defined
as \(\hat H_{PL} \equiv H_{PL}(\hat\phi)\), where \(H_{PL}(\phi) = \partial^2 \ell_{PL}(\phi)/\partial\phi^2\) and \(\hat\phi = (\hat\eta, \hat\theta, \hat\beta_1, \hat\beta_2, \hat g, \hat h) = \arg\max_\phi \ell_{PL}(\phi)\).
We obtain \(\hat H_{PL}\) by multiplying the output of nlm(, hessian = TRUE) by an appropriate
transformation factor. Note that \(\hat H_{PL}\) is useful for calculating
the SEs and LCV. If either nlm() does not converge or \(\hat H_{PL}\) is not negative definite,
the package tries different initial values by adding uniform random noise between
−1 and 1 to \((\tilde\eta, \tilde\theta, \beta_1, \beta_2, \tilde g, \tilde h) = 0\), and then reapplies nlm(). This idea of random
initial values has been adopted in many different contexts (e.g., Hu and Emura 2015;
Emura and Pan 2017). To calculate the integrals in Eq. (3.3), a numerical integration
function integrate() is applied to the range 0 ≤ u ≤ 10. To determine the value of α,
one may use a profile likelihood, or try a few plausible values of α (e.g., α = 0 or
α = 1).
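Whether the truncated range 0 ≤ u ≤ 10 loses much probability mass can be examined numerically. The Python sketch below assumes the gamma frailty density with mean 1 and variance η (shape 1/η, scale η), which is the parameterization consistent with Exercise 1 of this chapter, and integrates it over [0, 10]. A fixed-step composite Simpson rule is swapped in for R's adaptive integrate(), and the function names are hypothetical.

```python
import math

def gamma_frailty_density(u, eta):
    """Gamma density with mean 1 and variance eta (shape 1/eta, scale eta)."""
    if eta > 1.0:
        raise ValueError("this sketch only handles eta <= 1 (bounded density at 0)")
    shape = 1.0 / eta
    if u == 0.0:
        return 1.0 / eta if shape == 1.0 else 0.0
    log_f = ((shape - 1.0) * math.log(u) - u / eta
             - math.lgamma(shape) - shape * math.log(eta))
    return math.exp(log_f)

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def frailty_mass(eta, upper=10.0):
    """Probability that the frailty falls in [0, upper]."""
    return simpson(lambda u: gamma_frailty_density(u, eta), 0.0, upper)

for eta in (0.033, 0.5, 1.0):
    print(f"eta = {eta}: mass on [0, 10] = {frailty_mass(eta):.8f}")
```

For η = 1 the frailty is standard exponential, so the exact mass on [0, 10] is 1 − exp(−10), which gives a reference value for the numerical result; a fuller experiment would apply the same truncation to the integrand of Eq. (3.3) itself.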

3.7 Technical Note 2: LCV and Choice of κ1 and κ2

We define a likelihood cross-validation (LCV) criterion

\[ LCV = \hat\ell - \mathrm{tr}\{\hat H_{PL}^{-1} \hat H\}, \]

where \(\hat\ell\) is the log-likelihood value in Eq. (3.3) evaluated at \(\hat\phi\), \(\hat H_{PL}\) is the converged
Hessian matrix of the penalized log-likelihood, and \(\hat H\) is the converged Hessian
matrix of the un-penalized log-likelihood. Specifically, \(\hat\ell = \ell(\hat\phi)\), \(\hat H_{PL} = H_{PL}(\hat\phi)\),
and \(\hat H = H(\hat\phi)\), where \(H(\phi) = \partial^2 \ell(\phi)/\partial\phi^2\). The term \(\mathrm{tr}\{\hat H_{PL}^{-1} \hat H\}\) is the degrees of
freedom, which decreases as κ1 and κ2 increase. The LCV is a criterion
capable of choosing the best values of κ1 and κ2, as well as selecting the best subset
of covariates. The LCV plays a role similar to that of the AIC for model selection. However,
the calculation of the LCV is computationally expensive.
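The behaviour of the degrees of freedom as the smoothing parameter grows can be seen on a toy example. The Python sketch below uses a hypothetical 2 × 2 Hessian and a hypothetical rank-one penalty matrix (neither comes from the ovarian cancer fit), with every parameter penalized, so that the penalized Hessian is H − 2κΩ.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def degrees_of_freedom(hessian, omega, kappa):
    """tr{H_PL^{-1} H} with H_PL = H - 2*kappa*Omega (toy case, all parameters penalized)."""
    h_pl = [[hessian[r][c] - 2.0 * kappa * omega[r][c] for c in range(2)]
            for r in range(2)]
    h_pl_inv = inv2(h_pl)
    # trace(AB) = sum over r, c of A[r][c] * B[c][r]
    return sum(h_pl_inv[r][c] * hessian[c][r] for r in range(2) for c in range(2))

# Hypothetical negative definite Hessian and positive semi-definite penalty matrix.
H = [[-4.0, 1.0], [1.0, -3.0]]
OMEGA = [[1.0, -1.0], [-1.0, 1.0]]

for kappa in (0.0, 1.0, 100.0):
    print(f"kappa = {kappa:6.1f}: DF = {degrees_of_freedom(H, OMEGA, kappa):.4f}")
```

With no penalty the degrees of freedom equal the number of parameters (two here), and they shrink toward the dimension of the null space of Ω (one here) as κ grows.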
To alleviate the computational cost, we consider an approximation to the LCV in
the following way. Under the working assumptions of the absence of heterogeneity
and of the independence copula, the log-likelihood in Eq. (3.3) is

\[ \ell(\beta_1, \beta_2, g, h) = \ell_1(\beta_1, g) + \ell_2(\beta_2, h), \]

where

\[ \ell_1(\beta_1, g) = \sum_{i=1}^{G} \sum_{j=1}^{N_i} \left[ \delta_{ij} \log r_{ij}(T_{ij}) - R_{ij}(T_{ij}) \right], \quad \ell_2(\beta_2, h) = \sum_{i=1}^{G} \sum_{j=1}^{N_i} \left[ \delta^*_{ij} \log \lambda_{ij}(T^*_{ij}) - \Lambda_{ij}(T^*_{ij}) \right]. \]

This suggests choosing κ1 and κ2 based on two separate Cox models. As detailed in
Sect. 2.4.1, one can obtain the penalized likelihood estimates \((\hat\beta_1, \hat g)\) and \((\hat\beta_2, \hat h)\) given
κ1 and κ2 by using splineCox.reg(). Then, we define two LCVs

\[ LCV_1 = \hat\ell_1 - \mathrm{tr}\{\hat H_{PL1}^{-1} \hat H_1\}, \quad LCV_2 = \hat\ell_2 - \mathrm{tr}\{\hat H_{PL2}^{-1} \hat H_2\}, \]

where \(\hat\ell_1\) and \(\hat\ell_2\) are the log-likelihood values evaluated at their penalized likelihood
estimates, \(\hat H_{PL1}\) and \(\hat H_{PL2}\) are the converged Hessian matrices for the penalized
likelihood estimations, and \(\hat H_1\) and \(\hat H_2\) are the converged Hessian matrices for the log-likelihoods
such that

\[ \hat H_1 = \hat H_{PL1} + 2\kappa_1 \begin{bmatrix} O_{p_1 \times p_1} & O_{p_1 \times 5} \\ O_{5 \times p_1} & \Omega \end{bmatrix}, \quad \hat H_2 = \hat H_{PL2} + 2\kappa_2 \begin{bmatrix} O_{p_2 \times p_2} & O_{p_2 \times 5} \\ O_{5 \times p_2} & \Omega \end{bmatrix}, \]

where O is a zero matrix. We then expect that the following approximation holds:

\[ LCV \approx LCV_1 + LCV_2. \]

Consequently, maximizing LCV is roughly equivalent to maximizing LCV1 for κ1 and
LCV2 for κ2, separately.
The joint.Cox package provides the plots of LCV1 and LCV2 on a given grid along
with the optimized values for κ1 and κ2. When looking at the outputs, the following
properties must be checked: (i) \(\hat\ell_1\) and \(\hat\ell_2\) are smoothly decreasing in κ1 and κ2,
respectively; (ii) the degrees of freedom \(\mathrm{tr}\{\hat H_{PL1}^{-1} \hat H_1\}\) decreases from \(p_1 + 5\) to \(p_1 + 2\), and the
degrees of freedom \(\mathrm{tr}\{\hat H_{PL2}^{-1} \hat H_2\}\) decreases from \(p_2 + 5\) to \(p_2 + 2\). If these two properties
are not met, the grid is inappropriate. The degrees of freedom can occasionally be
quite large if the Hessian matrix is singular. The values of κ1 and κ2 producing such
results are ignored.
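The selection rule described above can be sketched as a filter followed by an argmax. In the Python sketch below, the grid values are hypothetical (they are not output of the joint.Cox package), and the admissible range for the degrees of freedom, from p + 2 to p + 5 with a small tolerance, is an assumed reading of property (ii).

```python
def choose_kappa(grid, n_covariates, n_spline_coefs=5, tol=0.05):
    """Pick the kappa maximizing LCV = logL - DF among grid points with plausible DF.

    grid: list of (kappa, log_likelihood, degrees_of_freedom) tuples.
    """
    upper = n_covariates + n_spline_coefs   # DF without any smoothing
    lower = n_covariates + 2                # DF under very heavy smoothing
    admissible = [row for row in grid
                  if lower - tol <= row[2] <= upper + tol]
    if not admissible:
        raise ValueError("no grid point has a plausible DF; the grid is inappropriate")
    return max(admissible, key=lambda row: row[1] - row[2])

# Hypothetical grid for a model with one covariate (p = 1).
grid = [
    (1e4, -4585.0, 5.9),
    (1e10, -4585.6, 4.9),
    (1e13, -4585.8, 120.0),   # abnormal DF, e.g., from a singular Hessian
    (1e16, -4586.1, 4.1),
    (1e22, -4588.9, 3.1),
]

kappa, log_lik, df = choose_kappa(grid, n_covariates=1)
print(f"chosen kappa = {kappa:.0e}, LCV = {log_lik - df:.1f}")
```

In this made-up grid the log-likelihood decreases and the degrees of freedom shrink as κ grows, and the LCV peaks at an interior grid point.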
Given the chosen values for κ1 and κ2, we fit the joint frailty-copula model and
calculate \(LCV = \hat\ell - \mathrm{tr}\{\hat H_{PL}^{-1} \hat H\}\). The LCV represents the trade-off between goodness-of-fit
(\(\hat\ell\)) and the degrees of freedom (\(\mathrm{tr}\{\hat H_{PL}^{-1} \hat H\}\)). Hence, the LCV is similar to the
AIC. Consequently, the LCV can be used for covariate selection. That is, the best
subset of covariates is the one that maximizes the LCV. While one has LCV ≈
LCV1 + LCV2, the values of LCV1 and LCV2 are used only to choose κ1 and
κ2.

3.8 Exercises

1. Consider the joint frailty model with α = 1, that is,

\[ \Pr(X_{ij} > x, D_{ij} > y \mid u_i) = \exp[-u_i\{R_{ij}(x) + \Lambda_{ij}(y)\}]. \]

Show \(\Pr(X_{ij} > x, D_{ij} > y) = [1 + \eta\{R_{ij}(x) + \Lambda_{ij}(y)\}]^{-1/\eta}\) under the gamma frailty
model.
2. Consider the joint frailty-copula model in Eq. (3.2) with α = 1 and the Gumbel
copula,

\[ \Pr(X_{ij} > x, D_{ij} > y \mid u_i) = \exp[-u_i\{R_{ij}(x)^{\theta+1} + \Lambda_{ij}(y)^{\theta+1}\}^{1/(1+\theta)}]. \]

Derive the expression of \(\Pr(X_{ij} > x, D_{ij} > y)\).
3. Consider the joint frailty-copula model in Eq. (3.2) with α = 1 and the Clayton
copula,

\[ \Pr(X_{ij} > x, D_{ij} > y \mid u_i) = [\exp\{\theta u_i R_{ij}(x)\} + \exp\{\theta u_i \Lambda_{ij}(y)\} - 1]^{-1/\theta}. \]

(1) Under η = 1, derive the expression of \(\Pr(X_{ij} > x, D_{ij} > y)\) by using
\(H_\theta(a, b) = \int_0^1 (t^{-a\theta} + t^{-b\theta} - 1)^{-1/\theta}\, dt\).
(2) Derive the expression of \(\Pr(X_{ij} > x, D_{ij} > y)\) by using some function
\(H_{\eta,\theta}(a, b)\).

4. Consider the joint frailty-copula model in Eq. (3.2) with α = 1 and the Pareto
baseline hazard functions,

\[ r_0(x) = \gamma_1/x, \quad \gamma_1 > 0, \quad x \ge \xi_1 > 0, \]

\[ \lambda_0(y) = \gamma_2/y, \quad \gamma_2 > 0, \quad y \ge \xi_2 > 0. \]

Derive the expression of \(\Pr(X_{ij} > x, D_{ij} > y \mid u_i)\) under the Clayton and Gumbel
copulas.
5. Show the relationship between θ [s, t] and Rθ [u, v] that is defined in Sect. 2.6.
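6. Under the independence copula \(C_\theta(v, w) = vw\), show that Eq. (3.3) reduces to
the log-likelihood of Rondeau et al. (2015) as follows:

\[ \ell(\alpha, \eta, \beta_1, \beta_2, r_0, \lambda_0) = \sum_{i=1}^{G} \left[ \sum_{j=1}^{N_i} \left\{ \delta_{ij} \log r_{ij}(T_{ij}) + \delta^*_{ij} \log \lambda_{ij}(T^*_{ij}) \right\} + \log \int_0^\infty u^{m_i + \alpha m^*_i} \exp\left( -u \sum_{j=1}^{N_i} R_{ij}(T_{ij}) - u^{\alpha} \sum_{j=1}^{N_i} \Lambda_{ij}(T^*_{ij}) \right) f_\eta(u)\, du \right]. \]

Simplify the above expression when α = 0 and α = 1, respectively.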


7. Derive Dθ [s, t], ψθ [s, t], ψθ∗ [s, t], and θ [s, t] under the Gumbel copula.
8. Do the above exercise under the Farlie–Gumbel–Morgenstern (FGM) copula.
9. When performing numerical integrations in Eq. (3.3), we used the truncated range
0 ≤ u ≤ 10 rather than 0 ≤ u < ∞. This avoids some instability occurring for
u > 10.

(1) Draw a figure to show the gamma density for 0 ≤ u ≤ 10 under sev-
eral parameter values. Compare your figure with the figure given by Aalen
(1994).
(2) Conduct a numerical experiment to demonstrate if the range 0 ≤ u ≤ 10 is
enough to evaluate the integrations in the log-likelihood.

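10. Let \(\ell(\phi)\) be a function of \(\phi\), and \(\tilde\ell(\tilde\phi)\) be defined as \(\tilde\ell(\tilde\phi) = \ell(e^{\tilde\phi})\). Also, let
\(\tilde S(\tilde\phi) = \partial\tilde\ell(\tilde\phi)/\partial\tilde\phi\) and \(\tilde H(\tilde\phi) = \partial^2\tilde\ell(\tilde\phi)/\partial\tilde\phi^2\). Write \(S(\tilde\phi) = \partial\ell(\tilde\phi)/\partial\tilde\phi\) and
\(H(\tilde\phi) = \partial^2\ell(\tilde\phi)/\partial\tilde\phi^2\) in terms of \(\tilde S(\cdot)\) and \(\tilde H(\cdot)\). Write down the “transformation
factor” mentioned in Sect. 3.6.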

References

Aalen OO (1994) Effects of frailty in survival analysis. Stat Methods Med Res 3(3):227–243
Burzykowski T, Molenberghs G, Buyse M, Geys H, Renard D (2001) Validation of surrogate end
points in multiple randomized clinical trials with failure time end points. Appl Stat 50(4):405–422
Burzykowski T, Molenberghs G, Buyse M (eds) (2005) The evaluation of surrogate endpoints.
Springer, New York
Commenges D, Jacqmin-Gadda H (2015) Dynamical biostatistical models. CRC Press, London
Emura T, Nakatochi M, Murotani K, Rondeau V (2017) A joint frailty-copula model between
tumour progression and death for meta-analysis. Stat Methods Med Res 26(6):2649–2666
Emura T (2019). joint.Cox: joint frailty-copula models for tumour progression and death in meta-
analysis, CRAN
Emura T, Pan CH (2017) Parametric likelihood inference and goodness-of-fit for dependently left-
truncated data, a copula-based approach. Stat Pap. https://doi.org/10.1007/s00362-017-0947-z
Fine JP, Jiang H, Chappell R (2001) On semi-competing risks data. Biometrika 88:907–920
Ganzfried BF, Riester M, Haibe-Kains B et al (2013) Curated ovarian data: clinically annotated
data for the ovarian cancer transcriptome. Database, Article ID bat013. https://doi.org/10.1093/database/bat013
Haneuse S, Lee KH (2016) Semi-competing risks data analysis, accounting for death as a competing
risk when the outcome of interest is nonterminal. Circ Cardiovasc Qual Outcomes 9:322–331
Hu YH, Emura T (2015) Maximum likelihood estimation for a special exponential family under
random double-truncation. Computation Stat 30(4):1199–1229
Nelsen RB (2006) An introduction to copulas, 2nd edn. Springer, New York
Peng M, Xiang L, Wang S (2018) Semiparametric regression analysis of clustered survival data
with semi-competing risks. Comput Stat Data Anal 124:53–70
Ramsay J (1988) Monotone regression spline in action. Statis Sci 3:425–461
Rondeau V, Pignon JP, Michiels S (2015) A joint model for dependence between clustered times to
tumour progression and deaths: a meta-analysis of chemotherapy in head and neck cancer. Stat
Methods Med Res 24(6):711–729
Rotolo F, Paoletti X, Michiels S (2018) surrosurv: an R package for the evaluation of failure time
surrogate endpoints in individual patient data meta-analyses of randomized clinical trials. Comput
Methods Programs Biomed 155:189–198
Chapter 4
High-Dimensional Covariates
in the Joint Frailty-Copula Model

Abstract The concerns for over-fitting, high computational cost, and large estima-
tion error arise when the number of covariates is large in a model. We introduce a
simple and effective strategy to handle high-dimensional covariates based on Tukey’s
compound covariate method. We then demonstrate how the compound covariate
method is applied to the joint frailty-copula model, and how patient-level survival
is predicted. Using simulations, we compare the compound covariate method with
ridge- and Lasso-based methods in a prediction setting. We analyze the ovarian can-
cer data for illustration.

Keywords Compound covariate · Cox regression · Feature selection · Gene
expression · Univariate selection · Lasso · Meta-analysis · Ridge regression ·
Survival prediction

4.1 Introduction

In the presence of high-dimensional covariates, the traditional Cox regression analysis
(Cox 1972) fails to provide a satisfactory result. Many techniques to overcome
the problem for the traditional Cox model are now available (Witten and Tibshirani
2010). In particular, shrinkage techniques, such as ridge regression and Lasso,
are commonly used to incorporate high-dimensional covariates into the Cox model
(Bøvelstad et al. 2007).
These techniques developed for the Cox model employ the partial likelihood
function, and hence they are not directly applicable to the joint frailty-copula model
for correlated endpoints (Emura et al. 2017; Chap. 3). What we present in this chapter
is the idea of compound covariate as advocated by Tukey (1993), a simple and
effective strategy to handle high-dimensional covariates using the univariate Cox
model.
Unlike shrinkage methods, the compound covariate method applies a univariate
feature selection method using multiple tests. This method involves computation of
the significance levels of features (in terms of P-values) and is suited to the
objective of gaining biological insights, where exhaustive screening of prognostic features
may be a relevant task, even if some selected features are highly correlated.
In some shrinkage-based methods, such as Lasso, a feature subset may be
identified by taking account of the correlations among features. One has to recognize
that such a subset is one haphazardly selected (due to random errors) from among many
“solutions” for the predictor with comparable predictive capability in high-dimensional
situations (Schumacher et al. 2012).

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2019
T. Emura et al., Survival Analysis with Correlated Endpoints,
JSS Research Series in Statistics, https://doi.org/10.1007/978-981-13-3516-7_4
This chapter is organized as follows. Sections 4.2 and 4.3 review Tukey’s com-
pound covariate method. Section 4.4 introduces the data structure. Section 4.5 demon-
strates how the compound covariate method is applied to the joint frailty-copula
model. Section 4.6 introduces the ridge regression and Lasso methods. Section 4.7
constructs the patient-level survival function for prediction. Section 4.8 conducts
simulation studies and Sect. 4.9 analyzes real data. Section 4.10 concludes with
discussions.

4.2 Tukey’s Compound Covariate

The term compound covariate was first employed by Tukey (1993), who is also known
as the founder of the jackknife method and exploratory data analysis. According to
Tukey, compound covariate refers to a composite score calculated as a weighted sum
of individual covariates, where the weight assigned to each covariate is determined
by its univariate association with the outcome of interest. The compound covariate
method is a general method applicable to many different settings, including linear
regression, binary classification, logistic regression, and Cox regression.
We introduce Tukey’s compound covariate as a tool for predicting survival.
Consider a future patient with a covariate vector \((Z_1, \ldots, Z_q)\). To predict clinical
outcomes of the patient, one can consider a compound covariate, defined
as \(w_1 Z_1 + \cdots + w_q Z_q\), where \((w_1, \ldots, w_q)\) is a vector of weights. In the compound
covariate, the weight \(w_j\) is computed by fitting survival data to univariate
models, e.g., the partial likelihood estimate \(w_j = \hat\beta_j\) under the univariate Cox
model \(h_j(t \mid Z_j) = h_{0j}(t) \exp(\beta_j Z_j)\) for \(j = 1, \ldots, q\). Some researchers apply
\(w_j = \hat\beta_j / SE(\hat\beta_j)\) (Wang et al. 2005) or the Z-value of the score tests for \(w_j\) (Matsui
2006; Emura et al. 2019). In all cases, the compound covariate predictor is an
ensemble of univariate analyses, which does not employ a multivariate analysis.
If \(w_j = \hat\beta_j\) is employed, a high (low) value of the compound covariate is associated
with poor (good) prognosis for survival. This prediction method is called compound
covariate prediction. For instance, Chen et al. (2007) employed the weights \(w_j = \hat\beta_j\)
attached to the q = 16 gene expressions to construct a compound covariate

CC = (−1.09 × ANXA5) + (1.32 × DLG2) + (0.55 × ZNF264) + (0.75 × DUSP6)
+ (0.59 × CPEB4) + (−0.84 × LCK) + (−0.58 × STAT1) + (0.65 × RNF4)
+ (0.52 × IRF4) + (0.58 × STAT2) + (0.51 × HGF) + (0.55 × ERBB3)
+ (0.47 × NF1) + (−0.77 × FRAP1) + (0.92 × MMD) + (0.52 × HMMR),

where the covariates are expressed in the gene symbols. This compound covariate
predicts survival prognosis for lung cancer patients.
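Evaluating a compound covariate for a given patient is a single weighted sum. The Python sketch below uses the 16 weights quoted above from Chen et al. (2007); the expression values of the two patients are hypothetical and serve only to show the calculation.

```python
# Weights copied from the compound covariate of Chen et al. (2007) quoted above.
weights = {
    "ANXA5": -1.09, "DLG2": 1.32, "ZNF264": 0.55, "DUSP6": 0.75,
    "CPEB4": 0.59, "LCK": -0.84, "STAT1": -0.58, "RNF4": 0.65,
    "IRF4": 0.52, "STAT2": 0.58, "HGF": 0.51, "ERBB3": 0.55,
    "NF1": 0.47, "FRAP1": -0.77, "MMD": 0.92, "HMMR": 0.52,
}

def compound_covariate(expression, weights):
    """Weighted sum of gene expressions, one weight per gene."""
    return sum(w * expression[gene] for gene, w in weights.items())

# Hypothetical patients: all expressions equal to 1, and all equal to 0.
patient_high = {gene: 1.0 for gene in weights}
patient_zero = {gene: 0.0 for gene in weights}

print(f"CC (all expressions = 1): {compound_covariate(patient_high, weights):.2f}")
print(f"CC (all expressions = 0): {compound_covariate(patient_zero, weights):.2f}")
```

Since the weights are univariate Cox estimates, a larger CC corresponds to a poorer predicted prognosis.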
Compound covariate prediction has been shown to be useful in medical studies
with gene expressions as a simple and powerful tool for survival prediction (Beer
et al. 2002; Wang et al. 2005; Matsui 2006; Chen et al. 2007; Matsui et al. 2012;
Emura et al. 2012, 2018, 2019; Zhao et al. 2014). The compound covariate method
performs competitively against more sophisticated multivariate techniques,
such as ridge and Lasso methods; see the numerical studies of Emura et al. (2012, 2018,
2019) and Zhao et al. (2014).

4.3 Univariate Feature Selection

Since the compound covariate method utilizes univariate regression, it is closely
related to univariate feature selection (Witten and Tibshirani 2010; Emura et al.
2019).
Suppose we wish to select a small fraction of genes from a large number of
genes. Let p be the number of all available genes, where p can be large, such as
p ≈ 5000. Univariate feature selection proceeds as follows: For each \(j = 1, \ldots, p\),
the null hypothesis \(H_0: \beta_j = 0\) is examined by the Wald test (or score test) under
the univariate Cox model treating the j-th gene as a covariate. The parameter \(\beta_j\)
represents the univariate association between survival and the gene, where all other
genes are ignored. Then, one picks out a subset of genes that have low P-values from
the tests. The top q (< p) genes with the lowest P-values are then selected.
Simon (2003) recommended the P-value threshold of 0.001 in microarray analyses.
This is more stringent than the traditional 0.01 or 0.05 criterion, but less stringent
than the genome-wide significance level \(5 \times 10^{-8}\). The P-value < 0.001 condition is
designed to allow some, but not too many, false positives. For p = 5000, one would
have 5000 × 0.001 = 5 falsely identified genes. See also Matsui et al. (2012) and
Emura et al. (2018), who used the P-value threshold of 0.001 in the analysis of survival
data. For simplicity, we shall adopt the P-value threshold of 0.001 (see footnote 1).
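Once the univariate Cox models have been fitted, the screening step reduces to computing a two-sided Wald P-value per gene and comparing it with the threshold. The Python sketch below takes the estimates and SEs as given (fitting the Cox models is outside its scope); the gene names and numbers are hypothetical.

```python
import math

def wald_p_value(beta_hat, se):
    """Two-sided P-value of the Wald test of H0: beta = 0 (normal approximation)."""
    z = beta_hat / se
    return math.erfc(abs(z) / math.sqrt(2.0))

def select_features(estimates, threshold=0.001):
    """Keep the genes whose Wald P-value is below the threshold."""
    selected = {}
    for gene, (beta_hat, se) in estimates.items():
        if wald_p_value(beta_hat, se) < threshold:
            selected[gene] = beta_hat   # the estimate doubles as the CC weight
    return selected

# Hypothetical univariate Cox estimates: gene -> (beta_hat, SE).
estimates = {
    "gene_A": (0.40, 0.10),
    "gene_B": (0.20, 0.10),
    "gene_C": (-0.35, 0.09),
    "gene_D": (0.05, 0.12),
}

selected = select_features(estimates)
print("selected genes:", sorted(selected))

# Expected number of false positives if none of p = 5000 genes were associated.
print("expected false positives:", 5000 * 0.001)
```

The retained estimates can then be used directly as the weights of the compound covariate predictor.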
After selecting the q genes having P-values lower than the threshold, they are
used to compute a compound covariate predictor \(\sum_j \hat\beta_j Z_j\).
Remarkably, the prediction performance of the compound covariate predictor is
robust against a small change in the P-value threshold. First, a small change of the
P-value threshold may not change the selected genes at all. This point shall be illustrated by
a simulation. Second, even if the selected genes are changed, the majority of the genes
with lower P-values are still kept in the predictor. This property is due to the additive
structure of the compound covariate predictor, which does not incorporate the correlations
among genes. Many shrinkage methods, in particular the Lasso, do not possess
this property since a small change in the shrinkage parameter can alter the whole
structure of the predictor. The gain in predictive accuracy by ignoring correlations
among high-dimensional gene expressions has been known in linear discriminant
analysis (Dudoit et al. 2002; Bickel and Levina 2004).

Footnote 1: Obviously, the adequacy of 0.001 depends on many factors, such as the total number of genes
and sample sizes. If one uses the compound.Cox R package, one can obtain a data-driven P-value
threshold through cross-validation (Emura et al. 2019).
An important caution is that one should not refit the multivariate Cox model
after univariate selection (e.g., Lossos et al. 2004). If one refits the multivariate Cox
model, some regression coefficients of the selected genes become nonsignificant,
especially those among correlated genes. While many biomedical researchers prefer
a multivariate prognostic model to avoid correlated genes in the model, the refitted
model may lose predictive power (Bøvelstad et al. 2007; van Wieringen et al. 2009)
and the additive property of the compound covariate predictor. Compound covariate
prediction should be made by an aggregation of the univariate models without going
through any multivariate model.

4.4 Meta-Analytic Data with High-Dimensional Covariates

The compound covariate method adapts to different types of analyses. Here, we
consider an individual-patient data (IPD) meta-analysis of semi-competing risks
data (Chap. 3). Semi-competing risks mean that a terminal event (death) censors
a nonterminal event (tumour progression), but not vice versa. We also consider high-dimensional
covariates that may be associated with both the terminal and nonterminal
events.
Meta-analytic data consist of G independent studies with the i-th study containing
\(N_i\) patients. For \(i = 1, 2, \ldots, G\) and \(j = 1, 2, \ldots, N_i\), let
• \(X_{ij}\): time-to-tumour progression (TTP),
• \(D_{ij}\): overall survival (OS), or equivalently, time-to-death,
• \(C_{ij}\): independent and non-informative censoring time.
As explained in Chap. 3, what we actually observe are the semi-competing risks
data \((T_{ij}, T^*_{ij}, \delta_{ij}, \delta^*_{ij}, Z_{1,ij}, Z_{2,ij}, U_{ij})\) for \(i = 1, 2, \ldots, G\) and \(j = 1, 2, \ldots, N_i\),
where
• \(T_{ij} = \min(X_{ij}, D_{ij}, C_{ij})\): first-occurring event time,
• \(\delta_{ij} = I(T_{ij} = X_{ij})\): status of tumour progression (no progression = 0; progression
= 1), where I(·) is the indicator function,
• \(T^*_{ij} = \min(D_{ij}, C_{ij})\): censored terminal event time,
• \(\delta^*_{ij} = I(T^*_{ij} = D_{ij})\): status for death (alive = 0; dead = 1),
• \(Z_{1,ij}\): \(p_1\)-dimensional clinical covariates associated with TTP,
• \(Z_{2,ij}\): \(p_2\)-dimensional clinical covariates associated with OS,
• \(U_{ij} = (U_{ij,1}, \ldots, U_{ij,p})\): p-dimensional gene expressions that are standardized to
have mean = 0 and SD = 1 across all patients (or across patients within
each study).
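The mapping from the latent times to the observed semi-competing risks data can be written in a few lines. The Python sketch below implements the four definitions above for a single patient; the function name `observe` and the example times are hypothetical, and ties between latent times are not handled since the times are treated as continuous.

```python
def observe(x, d, c):
    """Map latent times (X, D, C) to the observed tuple (T, delta, T*, delta*)."""
    t = min(x, d, c)                       # first-occurring event time
    delta = 1 if t == x else 0             # 1 if progression is observed
    t_star = min(d, c)                     # censored terminal event time
    delta_star = 1 if t_star == d else 0   # 1 if death is observed
    return t, delta, t_star, delta_star

# Hypothetical patients (times in days): (X, D, C)
examples = {
    "progression, then death": (5.0, 8.0, 10.0),
    "death before progression": (9.0, 8.0, 10.0),
    "censored before any event": (5.0, 8.0, 3.0),
    "progression, then censored": (5.0, 8.0, 6.0),
}

for label, (x, d, c) in examples.items():
    print(f"{label}: (T, delta, T*, delta*) = {observe(x, d, c)}")
```

The second case shows the asymmetry of semi-competing risks: death censors progression, whereas progression does not prevent death from being observed later.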

We assume that the numbers p1 and p2 are small and the number p is large.
Table 4.1 shows an example of the data used in Emura et al. (2018) consisting of
912 ovarian cancer patients from four independent studies. There are 11,756 gene
expressions that are commonly available across the four studies. It is of interest
to examine how the gene expressions can be incorporated into a joint model for the
terminal event (death) and the nonterminal event (relapse).

Table 4.1 Meta-analytic data from four independent studies of ovarian cancer patients

| Dataset (a) | Median follow-up (days) | Sample size | Relapse (\(\delta_{ij} = 1\)) | Death (\(\delta^*_{ij} = 1\)) | Censoring (\(\delta^*_{ij} = 0\)) | The number of genes |
|---|---|---|---|---|---|---|
| GSE17260 | 1410 | \(N_1 = 84\) | 59 (70%) | 38 (45%) | 46 (55%) | 18,548 |
| GSE30161 | 2513 | \(N_2 = 58\) | 48 (83%) | 36 (62%) | 22 (38%) | 18,524 |
| GSE9891 | 1140 | \(N_3 = 260\) | 185 (71%) | 113 (43%) | 147 (57%) | 18,524 |
| TCGA | 1721 | \(N_4 = 510\) | 252 (49%) | 278 (55%) | 232 (45%) | 12,211 |
| Total | | \(\sum_{i=1}^{4} N_i = 912\) | 544 (60%) | 465 (51%) | 447 (49%) | Common 11,756 |

The Relapse, Death and Censoring columns give the number of observed events (event rates).
Note The data are extracted from the curatedOvarianData package of Ganzfried et al. (2013)
(a) Each dataset is identified by its GEO accession number, which can be used to search the public genomics data in
the GEO (Gene Expression Omnibus) repository. Extracted studies are the subset having documented values of
“days-to-tumour-recurrence”, “days-to-death”, “recurrence status”, and “vital status” for all patients. The median
follow-up time is calculated from the Kaplan–Meier survival curve for time-to-censoring for each study. The
event rates are calculated separately for each study. Our data extraction yielded a slightly reduced list
of patients compared to Table 3.2. The reason may be the update of the “patientselection.config” file (from
the older version 1.0.3 to version 1.8.0) in the package to remove some duplicate samples (Waldron et al. 2014)

4.5 The Joint Model with Compound Covariates

This section considers a method for fitting the data to the joint frailty-copula
model for TTP and OS by screening the high-dimensional gene expressions
$\mathbf{U}_{ij} = (U_{ij,1}, \ldots, U_{ij,p})$.
In the initial step, we select $q_1$ ($< p$) genes univariately associated with TTP
based on the P-value < 0.001 criterion. More precisely, the data
$\{(T_{ij}, \delta_{ij}, U_{ij,k});\ i = 1, \ldots, G,\ j = 1, \ldots, N_i\}$ are fitted to the
univariate Cox model for TTP, say $r_{ij,k}(t) = r_{0,k}(t)\exp(b_k U_{ij,k})$, and the
P-value for testing the null hypothesis $H_0: b_k = 0$ is evaluated on the $k$-th gene
($k = 1, \ldots, p$). Similarly, we select $q_2$ ($< p$) genes univariately associated with OS
based on the data $\{(T^*_{ij}, \delta^*_{ij}, U_{ij,k});\ i = 1, \ldots, G,\ j = 1, \ldots, N_i\}$.
Thus, we obtain $\mathbf{V}_{ij} \subset \mathbf{U}_{ij}$ and $\mathbf{W}_{ij} \subset \mathbf{U}_{ij}$ such that

• $\mathbf{V}_{ij} = (V_{ij,1}, \ldots, V_{ij,q_1})$: $q_1$-dimensional genes associated with TTP (P-value <
0.001),
• $\mathbf{W}_{ij} = (W_{ij,1}, \ldots, W_{ij,q_2})$: $q_2$-dimensional genes associated with OS (P-value <
0.001),

where $\mathbf{V}_{ij}$ and $\mathbf{W}_{ij}$ may have common elements since some genes influence both
TTP and OS.
If the P-value < 0.001 cutoff is not suitable for the data, one can try to find a
data-driven cutoff value that optimizes a predictive measure proposed by Matsui (2006). The
methodologies and computer programs for obtaining the optimal cutoff in univariate
feature selection are available in Emura et al. (2019).
In the initial process of screening genes, we focus on the univariate effect of
each gene, ignoring the effects of all other genes and covariates. Neither dependent
censoring nor frailty is accounted for at this stage.
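The screening step can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation in the compound.Cox package: it assumes a Wald test (the source does not say which test produces the P-values), a Breslow-type partial likelihood, and plain Newton-Raphson without safeguards. The function names `univariate_cox_wald` and `screen_genes` are hypothetical.

```python
import math


def univariate_cox_wald(times, events, x, max_iter=50, tol=1e-8):
    """Fit r(t) = r0(t) * exp(b * x) by Newton-Raphson on the Cox partial
    likelihood and return (b_hat, se, two-sided Wald P-value).
    Assumes x is not constant, so that the information is positive."""
    n = len(times)
    b = 0.0
    info = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if events[i] != 1:
                continue
            # Risk set: subjects still under observation at times[i]
            s0 = s1 = s2 = 0.0
            for l in range(n):
                if times[l] >= times[i]:
                    w = math.exp(b * x[l])
                    s0 += w
                    s1 += w * x[l]
                    s2 += w * x[l] * x[l]
            mean = s1 / s0
            score += x[i] - mean              # derivative of the log partial likelihood
            info += s2 / s0 - mean * mean     # minus the second derivative
        step = score / info
        b += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    z = b / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return b, se, p_value


def screen_genes(times, events, genes, cutoff=0.001):
    """Return the indices k of the genes (one list of values per gene)
    whose univariate Wald P-value is below the cutoff."""
    selected = []
    for k, values in enumerate(genes):
        _, _, p = univariate_cox_wald(times, events, values)
        if p < cutoff:
            selected.append(k)
    return selected
```

Applying `screen_genes` once with the TTP data and once with the OS data would give the two index sets that define the selected genes for each endpoint. The inner loop over the risk set is quadratic in the sample size, which is acceptable for a sketch but not for 11,756 genes in practice.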
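Then, we construct compound covariates (CCs)

\[
CC_{1,ij} = \hat{b}_1 V_{ij,1} + \cdots + \hat{b}_{q_1} V_{ij,q_1} \quad \text{(associated with TTP)},
\]
\[
CC_{2,ij} = \hat{c}_1 W_{ij,1} + \cdots + \hat{c}_{q_2} W_{ij,q_2} \quad \text{(associated with OS)},
\]

where the weights $\hat{b}_k$ and $\hat{c}_k$ are estimates under the univariate Cox models on the
$k$-th gene, namely, $\hat{b}_k = \arg\max \ell_k(b_k)$, where

\[
\ell_k(b_k) = \sum_{i=1}^{G} \sum_{j=1}^{N_i} \delta_{ij}
\left[ b_k U_{ij,k} - \log\left( \sum_{\ell \in R_{ij}} \exp(b_k U_{\ell,k}) \right) \right],
\qquad R_{ij} = \{\ell ;\ T_\ell \ge T_{ij}\},
\]

and $\hat{c}_k = \arg\max \ell^*_k(c_k)$, where

\[
\ell^*_k(c_k) = \sum_{i=1}^{G} \sum_{j=1}^{N_i} \delta^*_{ij}
\left[ c_k U_{ij,k} - \log\left( \sum_{\ell \in R^*_{ij}} \exp(c_k U_{\ell,k}) \right) \right],
\qquad R^*_{ij} = \{\ell ;\ T^*_\ell \ge T^*_{ij}\}.
\]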

Let $\hat\mu_1$ (or $\hat\mu_2$) be the sample mean of $CC_{1,ij}$ (or $CC_{2,ij}$). Also, let $\hat\sigma_1$ (or $\hat\sigma_2$)
be the sample SD of $CC_{1,ij}$ (or $CC_{2,ij}$). The standardized values of the compound
covariates are fitted to the joint frailty-copula model (Emura et al. 2017; Chap. 3):

\[
\begin{cases}
r_{ij}(t|u_i) = u_i\, r_0(t) \exp\{\boldsymbol\beta_1' \mathbf{Z}_{1,ij} + \gamma_1 (CC_{1,ij} - \hat\mu_1)/\hat\sigma_1\} \\
\lambda_{ij}(t|u_i) = u_i^{\alpha}\, \lambda_0(t) \exp\{\boldsymbol\beta_2' \mathbf{Z}_{2,ij} + \gamma_2 (CC_{2,ij} - \hat\mu_2)/\hat\sigma_2\} \\
\Pr(X_{ij} > x, D_{ij} > y\,|\,u_i) = C_\theta[S_{X_{ij}}(x|u_i), S_{D_{ij}}(y|u_i)]
\end{cases}
\tag{4.1}
\]

where $u_i$ is a frailty term for the $i$-th study, $C_\theta$ is a copula with a parameter $\theta$,
$r_0(t) = \mathbf{g}'\mathbf{M}(t)$ and $\lambda_0(t) = \mathbf{h}'\mathbf{M}(t)$ are baseline hazard functions approximated by
splines, and $\mathbf{M}(t) = (M_1(t), \ldots, M_5(t))'$ are the M-spline bases (Appendix A). In
Eq. (4.1), the survival functions and hazard functions are related through

\[
\begin{aligned}
S_{X_{ij}}(x|u_i) &= \exp[-u_i\, R_0(x) \exp\{\boldsymbol\beta_1' \mathbf{Z}_{1,ij} + \gamma_1 (CC_{1,ij} - \hat\mu_1)/\hat\sigma_1\}], \\
S_{D_{ij}}(y|u_i) &= \exp[-u_i^{\alpha}\, \Lambda_0(y) \exp\{\boldsymbol\beta_2' \mathbf{Z}_{2,ij} + \gamma_2 (CC_{2,ij} - \hat\mu_2)/\hat\sigma_2\}],
\end{aligned}
\tag{4.2}
\]

where $R_0(x) = \int_0^x r_0(t)\,dt$ and $\Lambda_0(y) = \int_0^y \lambda_0(t)\,dt$. The parameter estimates
$(\hat\eta, \hat\theta, \hat{\boldsymbol\beta}_1, \hat{\boldsymbol\beta}_2, \hat\gamma_1, \hat\gamma_2, \hat{\mathbf{g}}, \hat{\mathbf{h}})$ are computed through jointCox.reg() in the joint.Cox R
package (Emura 2019), where $\hat\eta$ is an estimate of the heterogeneity parameter
$\eta = \mathrm{Var}(u_i)$.

Remarks We apply the standardized version of a compound covariate since the range
of CC1,i j (or CC2,i j ) can be very large if the number q1 (or q2 ) is large. In general,
fitting a large covariate value may yield computational difficulties in the joint frailty-
copula model.
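As a small numerical illustration of the construction and standardization of a compound covariate, the Python sketch below uses made-up weights and expression values for three selected genes and three patients. The function names and all numbers are hypothetical and serve only to show the arithmetic.

```python
import math


def compound_covariate(weights, expressions):
    """Weighted sum of the selected gene expressions for one patient."""
    return sum(w * v for w, v in zip(weights, expressions))


def standardize(values):
    """Centre by the sample mean and scale by the sample SD (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values], mean, sd


# Hypothetical univariate Cox estimates for q1 = 3 selected genes
b_hat = [0.25, 0.20, -0.15]
# Hypothetical standardized expressions of those genes for three patients
patients = [[1.0, 0.5, -1.0], [0.0, 0.0, 0.0], [-1.0, -0.5, 1.0]]

cc = [compound_covariate(b_hat, row) for row in patients]
cc_std, mu_hat, sigma_hat = standardize(cc)
```

With many selected genes the raw values in `cc` can spread over a wide range, whereas the values in `cc_std` always have sample mean 0 and sample SD 1, which is the scale entered into the joint model.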

4.6 The Joint Model with Ridge or Lasso Predictor

In addition to the compound covariate method, a variety of approaches are available
to deal with high-dimensional covariates (Bøvelstad et al. 2007). The ridge
approach does not involve the preliminary selection of genes, unlike the compound
covariate predictor that screens out genes with a P-value threshold. Instead, the
ridge approach requires selecting a shrinkage parameter that plays a role similar to
the P-value threshold. Note that using all genes for prediction
appears to be uncommon in medical practice, and hence, the practical feasibility of
this approach for clinicians should be considered. Nevertheless, as ridge
regression is an accurate and sophisticated statistical prediction tool for gene expression
data (Bøvelstad et al. 2007; van Wieringen et al. 2009), it is worth considering
this approach.
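The ridge approach uses the genes $\mathbf{U}_{ij} = (U_{ij,1}, \ldots, U_{ij,p})$ to construct predictors

\[
\begin{aligned}
\mathrm{Ridge}_{1,ij} &= \hat\xi_1 U_{ij,1} + \cdots + \hat\xi_p U_{ij,p} = \hat{\boldsymbol\xi}'\mathbf{U}_{ij} \quad \text{(associated with TTP)}, \\
\mathrm{Ridge}_{2,ij} &= \hat\varsigma_1 U_{ij,1} + \cdots + \hat\varsigma_p U_{ij,p} = \hat{\boldsymbol\varsigma}'\mathbf{U}_{ij} \quad \text{(associated with OS)},
\end{aligned}
\]

where the weights $\hat{\boldsymbol\xi}$ and $\hat{\boldsymbol\varsigma}$ are the ridge estimates (Bøvelstad et al. 2007). Specifically,
with the model $r_{ij}(t) = r_0(t)\exp(\boldsymbol\xi'\mathbf{U}_{ij})$ and the data
$\{(T_{ij}, \delta_{ij}, \mathbf{U}_{ij});\ i = 1, \ldots, G,\ j = 1, \ldots, N_i\}$, one can calculate $\hat{\boldsymbol\xi}$ by applying optL2(, fold = 5) in the
penalized R package (Goeman et al. 2016). Here, the shrinkage parameter is optimized
by using the 5-fold cross-validation as indicated in the option “fold = 5”. Similarly,
one can calculate $\hat{\boldsymbol\varsigma}$ by the data $\{(T^*_{ij}, \delta^*_{ij}, \mathbf{U}_{ij});\ i = 1, \ldots, G,\ j = 1, \ldots, N_i\}$.
Finally, the joint frailty-copula model is fitted as

\[
\begin{cases}
r_{ij}(t|u_i) = u_i\, r_0(t) \exp(\boldsymbol\beta_1' \mathbf{Z}_{1,ij} + \gamma_1 \mathrm{Ridge}_{1,ij}) \\
\lambda_{ij}(t|u_i) = u_i^{\alpha}\, \lambda_0(t) \exp(\boldsymbol\beta_2' \mathbf{Z}_{2,ij} + \gamma_2 \mathrm{Ridge}_{2,ij}) \\
\Pr(X_{ij} > x, D_{ij} > y\,|\,u_i) = C_\theta[S_{X_{ij}}(x|u_i), S_{D_{ij}}(y|u_i)]
\end{cases}
\]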

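The Lasso-based predictors are computed by the command optL1(, fold = 5),
which are denoted as

\[
\begin{aligned}
\mathrm{Lasso}_{1,ij} &= \hat\xi_1 U_{ij,1} + \cdots + \hat\xi_p U_{ij,p} = \hat{\boldsymbol\xi}'\mathbf{U}_{ij} \quad \text{(associated with TTP)}, \\
\mathrm{Lasso}_{2,ij} &= \hat\varsigma_1 U_{ij,1} + \cdots + \hat\varsigma_p U_{ij,p} = \hat{\boldsymbol\varsigma}'\mathbf{U}_{ij} \quad \text{(associated with OS)}.
\end{aligned}
\]

Note that some estimated regression coefficients are exactly zero in the Lasso method.
Thus, only those genes having nonzero estimates ($\hat\xi_k \neq 0$ or $\hat\varsigma_k \neq 0$) contribute to
the predictors.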
Both the ridge and Lasso predictors have strong shrinkage effects on their regression
coefficients for a large number $p$. Consequently, the ranges of the predictors are
reduced by the shrinkage effect, which makes it possible to fit them without
standardization.

4.7 Prediction of Patient-Level Survival Function

The patient-level survival function can be obtained for a new patient not in the
samples. We follow the general idea of Matsui et al. (2012), who developed a
patient-level survival function using compound covariates.
Let $D$ be OS of the new patient. We wish to predict the survival function of $D$
according to the clinical covariates $\mathbf{Z}_2$ and gene expressions $\mathbf{U} = (U_1, \ldots, U_p)$.
We define a compound covariate

\[
CC_2 = \hat{c}_1 W_1 + \cdots + \hat{c}_{q_2} W_{q_2},
\]

using the subset $\mathbf{W} = (W_1, \ldots, W_{q_2}) \subset (U_1, \ldots, U_p)$ obtained from the new patient
and the estimate $\hat{c}_k$ under the univariate Cox models on the $k$-th gene.
We assume that the new patient follows the same probability mechanism as the
patients in the samples. That is, the new patient has survival experience following
the model (4.2). Since the frailty term is usually unknown for the new patient, we
integrate the frailty out of Eq. (4.2) to estimate the patient-level survival function:

\[
\hat{S}(w|\mathbf{Z}_2, CC_2) = \int_0^\infty
\exp\left[ -u^{\alpha}\, \hat\Lambda_0(w) \exp\left\{ \hat{\boldsymbol\beta}_2' \mathbf{Z}_2 + \hat\gamma_2 \frac{CC_2 - \hat\mu_2}{\hat\sigma_2} \right\} \right]
f_{\hat\eta}(u)\, du.
\]
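The integral over the frailty can be approximated numerically. The Python sketch below assumes that the frailty density is a gamma density with mean 1 and variance η (consistent with the gamma frailty used in the simulations of this chapter) and applies a simple midpoint rule on a truncated range. The function names, the truncation point, and the grid size are illustrative choices, not part of the joint.Cox package. For α = 1 the gamma Laplace transform gives the closed form (1 + ηA)^(−1/η), where A is the cumulative hazard multiplied by the exponentiated linear predictor, which serves as a check.

```python
import math


def gamma_frailty_density(u, eta):
    """Gamma density with mean 1 and variance eta (shape 1/eta, rate 1/eta)."""
    k = 1.0 / eta
    return math.exp(k * math.log(k) + (k - 1.0) * math.log(u) - k * u - math.lgamma(k))


def patient_survival(cum_hazard, linear_predictor, eta, alpha,
                     upper=50.0, n_grid=20000):
    """Midpoint-rule approximation of
    int_0^inf exp(-u^alpha * Lambda0(w) * exp(lp)) f_eta(u) du,
    truncated at `upper` (assumed large enough for the chosen eta)."""
    a = cum_hazard * math.exp(linear_predictor)
    h = upper / n_grid
    total = 0.0
    for i in range(n_grid):
        u = (i + 0.5) * h
        total += math.exp(-(u ** alpha) * a) * gamma_frailty_density(u, eta)
    return total * h


def gamma_closed_form(cum_hazard, linear_predictor, eta):
    """Closed form for alpha = 1 under gamma frailty: (1 + eta * a)^(-1/eta)."""
    a = cum_hazard * math.exp(linear_predictor)
    return (1.0 + eta * a) ** (-1.0 / eta)


# Hypothetical inputs: Lambda0(w) = 1, linear predictor = 0, eta = 0.5
numeric = patient_survival(1.0, 0.0, eta=0.5, alpha=1.0)
exact = gamma_closed_form(1.0, 0.0, eta=0.5)
```

When α = 0 the frailty drops out of the integrand and the result reduces to exp(−A). For η larger than 1 the gamma density is unbounded near zero, and a finer grid or a dedicated quadrature rule would be advisable.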

The confidence interval (CI) for $\hat{S}(w|\mathbf{Z}_2, CC_2)$ is computed by simulating parameters

\[
(\hat\eta^{*(m)}, \hat\theta^{*(m)}, \hat{\boldsymbol\beta}_1^{*(m)}, \hat{\boldsymbol\beta}_2^{*(m)}, \hat\gamma_1^{*(m)}, \hat\gamma_2^{*(m)}, \hat{\mathbf{g}}^{*(m)}, \hat{\mathbf{h}}^{*(m)}), \quad m = 1, 2, \ldots, 500,
\]

from a multivariate normal distribution

\[
\begin{aligned}
&(\log(\hat\eta^*), \log(\hat\theta^*), \hat{\boldsymbol\beta}_1^*, \hat{\boldsymbol\beta}_2^*, \hat\gamma_1^*, \hat\gamma_2^*, \log(\hat{\mathbf{g}}^*), \log(\hat{\mathbf{h}}^*)) \\
&\quad \sim N\left( \text{Mean} = (\log(\hat\eta), \log(\hat\theta), \hat{\boldsymbol\beta}_1, \hat{\boldsymbol\beta}_2, \hat\gamma_1, \hat\gamma_2, \log(\hat{\mathbf{g}}), \log(\hat{\mathbf{h}})),\ \text{Covariance} = \Sigma \right),
\end{aligned}
\]

where $\Sigma$ is the log-scaled covariance matrix of the parameter estimates, which is
obtained from the outputs of jointCox.reg(, convergence.par = TRUE). Accordingly,
we have simulated patient-level survival functions

\[
\hat{S}^{*(m)}(w|\mathbf{Z}_2, CC_2), \quad m = 1, 2, \ldots, 500.
\]

Their 2.5 and 97.5% points give the pointwise 95% CI for the patient-level survival
function. Note that the delta method is difficult to use for computing the SE and 95%
CI due to a large number of parameters.
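The simulation-based interval can be sketched generically in Python: draw parameter vectors from a multivariate normal distribution (here through a Cholesky factor), evaluate the prediction for each draw, and read off the empirical percentiles. The two-parameter example, the covariance matrix, and the function names are hypothetical; a real application would use the full parameter vector and the covariance matrix returned by the fitting routine.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L L' = matrix (matrix must be positive definite)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low


def simulate_ci(mean, cov, predict, n_sim=500, level=0.95, seed=1):
    """Draw parameter vectors from N(mean, cov), apply `predict` to each draw,
    and return the empirical (lower, upper) percentile limits."""
    rng = random.Random(seed)
    low = cholesky(cov)
    draws = []
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        theta = [m + sum(low[i][k] * z[k] for k in range(i + 1))
                 for i, m in enumerate(mean)]
        draws.append(predict(theta))
    draws.sort()
    tail = (1.0 - level) / 2.0
    lo_idx = int(math.floor(tail * (n_sim - 1)))
    hi_idx = int(math.ceil((1.0 - tail) * (n_sim - 1)))
    return draws[lo_idx], draws[hi_idx]


# Hypothetical example: theta = (log cumulative hazard at w, gamma_2),
# evaluated for a patient with standardized compound covariate equal to 1.
def survival_at_w(theta):
    return math.exp(-math.exp(theta[0]) * math.exp(theta[1] * 1.0))


mean = [math.log(0.5), 0.4]
cov = [[0.01, 0.002], [0.002, 0.0025]]
lower, upper = simulate_ci(mean, cov, survival_at_w)
point = survival_at_w(mean)
```

Simulating positive parameters on the log scale, as in the text, keeps every draw inside the parameter space after exponentiation, which is why the first component above is a log cumulative hazard.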

4.8 Simulations

We conduct a simulation to compare the compound covariate predictor with three
other predictors: (i) the ridge-based predictor, (ii) the Lasso-based predictor, and (iii)
the null predictor. The evaluation criterion is the prediction error (also known as the
Brier score), defined as

\[
\mathrm{Err}(w) = E[\{\mathbf{I}(D > w) - \hat{S}(w|f(Z, \mathbf{U}))\}^2], \quad w > 0,
\]

where $Z$ is a clinical covariate, $\mathbf{U} = (U_1, \ldots, U_p)$ are gene expressions, and $f(\cdot)$
can be $f(Z, \mathbf{U}) = (Z, CC_2)$, $f(Z, \mathbf{U}) = (Z, \mathrm{Ridge}_2)$, $f(Z, \mathbf{U}) = (Z, \mathrm{Lasso}_2)$, or
$f(Z, \mathbf{U}) = (0, 0)$. The expectation $E[\cdot]$ is taken for the distribution of $(D, Z, \mathbf{U})$
given $\hat{S}(w|\cdot)$ (Gerds and Schumacher 2006). We calculate $\mathrm{Err}(w)$ according to the
following simulation designs.

4.8.1 Simulation Designs

Let $G = 5$ and $N_i = 200$ for $i = 1, 2, \ldots, 5$. A frailty value $u_i$ follows a gamma
distribution with $\eta = 0.5$, and a covariate $Z_{ij}$ follows $N(0, 1)$ truncated between
−3 and 3. Gene expressions $\mathbf{U}_{ij} = (U_{ij,1}, \ldots, U_{ij,p})$ with $p = 400$ follow
a uniform distribution with mean = 0 and SD = 1 whose correlation structure
is $\mathrm{Corr}(U_{ij,k}, U_{ij,\ell}) = 0.5$ for $1 \le k < \ell \le 25$ or $26 \le k < \ell \le 50$;
$\mathrm{Corr}(U_{ij,k}, U_{ij,\ell}) = 0$ otherwise. The corresponding coefficients are

\[
\boldsymbol\xi = (\underbrace{0.1, \ldots, 0.1}_{\times 25},\ \underbrace{-0.1, \ldots, -0.1}_{\times 25},\ \underbrace{0, \ldots, 0}_{\times 350}).
\]

We generated such gene expressions by the command X.pathway(n = 1, p = 400, q1 = 25,
q2 = 25) using the compound.Cox R package (Emura et al. 2019).
Given $u_i$, $Z_{ij}$, and $\mathbf{U}_{ij}$, the pair of $X_{ij}$ and $D_{ij}$ were generated from the model

\[
\begin{cases}
r_{ij}(t|u_i) = u_i\, r_0(t) \exp(1.5 \times Z_{ij} + \boldsymbol\xi'\mathbf{U}_{ij}) & \text{(for } X_{ij}\text{)} \\
\lambda_{ij}(t|u_i) = u_i\, \lambda_0(t) \exp(1.5 \times Z_{ij} + \boldsymbol\xi'\mathbf{U}_{ij}) & \text{(for } D_{ij}\text{)} \\
\Pr(X_{ij} > x, D_{ij} > y\,|\,u_i) = [S_{X_{ij}}(x|u_i)^{-\theta} + S_{D_{ij}}(y|u_i)^{-\theta} - 1]^{-1/\theta}
\end{cases}
\]

where $\lambda_0(t) = r_0(t) = 1$ and $\theta = 6$ (Kendall’s tau = 0.75). Censoring variables $C_{ij}$
were generated from a uniform distribution on (0, 5) that yielded about 30% censored
subjects. The training data consist of
$\{(T_{ij}, T^*_{ij}, \delta_{ij}, \delta^*_{ij}, Z_{ij}, \mathbf{U}_{ij});\ i = 1, \ldots, 5,\ j = 1, \ldots, 200\}$.
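One standard way to draw a dependent pair from this kind of model is conditional inversion of the Clayton copula. The Python sketch below generates uncensored pairs for a single study with constant baseline hazards equal to 1, and checks the dependence through the sample Kendall's tau. It is an illustration under stated assumptions rather than the authors' simulation code: the function names are hypothetical, the linear predictor is set to zero for all patients, and censoring is omitted.

```python
import math
import random


def simulate_pair(theta, rate_x, rate_d, rng):
    """Draw (X, D) with exponential margins (hazards rate_x and rate_d)
    whose survival functions are linked by a Clayton copula."""
    v1 = 1.0 - rng.random()   # S_X(X), uniform on (0, 1]
    v2 = 1.0 - rng.random()   # conditional probability, uniform on (0, 1]
    # Invert the conditional Clayton copula to obtain S_D(D) given S_X(X) = v1
    s_d = ((v2 ** (-theta / (theta + 1.0)) - 1.0) * v1 ** (-theta) + 1.0) ** (-1.0 / theta)
    return -math.log(v1) / rate_x, -math.log(s_d) / rate_d


def kendall_tau(xs, ys):
    """Sample Kendall's tau by counting concordant and discordant pairs."""
    conc = disc = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    return (conc - disc) / (n * (n - 1) / 2)


rng = random.Random(2019)
u_i = rng.gammavariate(2.0, 0.5)    # gamma frailty: mean 1, variance eta = 0.5
rate = u_i * math.exp(1.5 * 0.0)    # hypothetical patients with zero linear predictor
pairs = [simulate_pair(6.0, rate, rate, rng) for _ in range(400)]
tau = kendall_tau([p[0] for p in pairs], [p[1] for p in pairs])
```

With θ = 6 the sample value `tau` should be close to 6/(6 + 2) = 0.75. Observed times and event indicators would follow by comparing each simulated time with an independent uniform censoring time.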
After fitting the data to the joint frailty-copula model, we calculated the patient-level
survival function $\hat{S}(w|\cdot)$. To calculate the prediction error, we independently
generated the test data
$\{(X^{\mathrm{Test}}_{ij}, D^{\mathrm{Test}}_{ij}, Z^{\mathrm{Test}}_{ij}, \mathbf{U}^{\mathrm{Test}}_{ij});\ i = 1, \ldots, 5,\ j = 1, \ldots, 200\}$
using the same algorithms as the training data (with different random seeds). The
prediction error was then estimated as

\[
\mathrm{Err}(w) = \frac{1}{1000} \sum_{i=1}^{5} \sum_{j=1}^{200}
\left\{ \mathbf{I}(D^{\mathrm{Test}}_{ij} > w) - \hat{S}(w|f(Z^{\mathrm{Test}}_{ij}, \mathbf{U}^{\mathrm{Test}}_{ij})) \right\}^2.
\]

For the null predictor, we used the Kaplan–Meier estimator $\hat{S}(w|\cdot)$ computed by the
data $\{(T^*_{ij}, \delta^*_{ij}, U_{ij,k});\ i = 1, \ldots, 5,\ j = 1, \ldots, 200\}$. We report the average of the
prediction errors for 50 repetitions.
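The estimated prediction error is a plain average of squared differences, as the following Python sketch shows. It relies on the death times in the test data being fully observed, as in the simulation above, so no censoring weights are needed. The function name and the four-patient test set are hypothetical.

```python
def prediction_error(death_times, predicted_survival, w):
    """Average of {I(D > w) - S_hat(w | covariates)}^2 over the test patients."""
    total = 0.0
    for d, s in zip(death_times, predicted_survival):
        alive = 1.0 if d > w else 0.0
        total += (alive - s) ** 2
    return total / len(death_times)


# Hypothetical test set of four patients evaluated at w = 2
d_test = [0.5, 1.5, 2.5, 4.0]
s_hat = [0.2, 0.4, 0.7, 0.9]          # covariate-specific predictions
s_null = [0.5, 0.5, 0.5, 0.5]         # one common prediction for everyone

err_model = prediction_error(d_test, s_hat, w=2.0)
err_null = prediction_error(d_test, s_null, w=2.0)
```

In this toy example the covariate-specific predictions give an error of 0.075, compared with 0.25 for the common prediction of 0.5, which mirrors the kind of comparison made between the model-based predictors and the null predictor.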

4.8.2 Simulation Results

Figure 4.1 compares the prediction error curve $\mathrm{Err}(w)$, $0 \le w \le 3$, for the four
different predictors (null, compound covariate, ridge, and Lasso). The smallest
prediction error was achieved by the compound covariate predictor. The Lasso and
ridge predictors exhibited very similar prediction errors. The three predictors
(compound covariate, ridge, and Lasso) showed markedly smaller prediction errors
than the null predictor.
We explore the reason why the compound covariate is successful. On average,
the predictor $CC_2 = \hat{c}_1 W_1 + \cdots + \hat{c}_{q_2} W_{q_2}$ contains $q_2 \approx 50.34$ genes. Hence, the
average value of $q_2$ is close to the number of nonzero coefficients in the population
coefficients $\boldsymbol\xi$. Figure 4.2 displays P-values for the 400 genes from a single simulation
run. We see that 50 P-values are below 0.001, and they exactly correspond to the
50 informative genes. Hence, all the 50 informative genes are selected into
$CC_2 = \hat{c}_1 W_1 + \cdots + \hat{c}_{50} W_{50}$.
On average, the Lasso-based predictor had $\#\{j : \hat\varsigma_j \neq 0\} = 62.98$ genes with
nonzero coefficients. This implies that the Lasso predictor contains at least 12
non-informative genes (noise genes). In the ridge-based predictor, the coefficients of the
50 informative genes are $|\hat\varsigma_j| \approx 0.046$ while those of the 350 non-informative genes
are $|\hat\varsigma_j| \approx 0.001$. Although the coefficients of the non-informative genes are strongly
shrunk, they still add some noise to the ridge-based predictor.

Fig. 4.1 The plots of prediction error with four different prediction methods (averaged for 50 runs)

Fig. 4.2 P-values for the 400 genes from a single simulation run
Figure 4.2 also explains the robustness of the compound covariate predictors
against changes in the P-value cutoff. The 50 smallest P-values are so small that
the selection results are unchanged when the cutoff is decreased below 0.001. If one
increases the cutoff P-value from 0.001 to 0.005, two non-informative genes are
included in the compound covariate. Since all the 50 informative genes are still kept
in the compound covariate, the prediction performance is almost unchanged by the
two noise genes.

4.9 Case Study: Ovarian Cancer Data

We performed an IPD meta-analysis on the subset of the ovarian cancer data of
Ganzfried et al. (2013) to demonstrate how gene expressions are incorporated into
the joint frailty-copula model. Our subset consists of 912 ovarian cancer patients from
$G = 4$ different studies (Table 4.1). The detailed process of data extraction is described
in the note to Table 4.1. Across the four studies, 11,756 gene expressions are
available. All the expression values are standardized to have mean of 0 and SD of 1
across the patients. Among the 11,756 genes, we initially chose a subset consisting of 6056
genes whose coefficient of variation in expression values is greater than 3%.

4.9.1 Compound Covariate

We performed univariate feature selection on the 6056 genes. Figure 4.3 shows
the P-values for testing the univariate association between the 6056 genes and OS.
Apparently, the majority of genes are non-informative for predicting OS, as their
log(P-values) are around zero. We chose 128 genes whose P-values are below 0.001. In a
similar fashion, we chose 158 genes univariately associated with time-to-relapse.
Based on the selected genes, we obtained two compound covariates:

\[
CC_{1,ij} = (0.249 \times \mathrm{CXCL12}_{ij}) + (0.235 \times \mathrm{TIMP2}_{ij}) + (0.222 \times \mathrm{PDPN}_{ij})
+ \cdots + (-0.152 \times \mathrm{MMP12}_{ij}),
\]

involving 158 genes (P-value < 0.001 for time-to-relapse), and

\[
CC_{2,ij} = (0.237 \times \mathrm{NCOA3}_{ij}) + (0.223 \times \mathrm{TEAD1}_{ij}) + (0.263 \times \mathrm{YWHAB}_{ij})
+ \cdots + (-0.157 \times \mathrm{KCNH4}_{ij}),
\]

involving 128 genes (P-value < 0.001 for time-to-death).



Fig. 4.3 P-values for the univariate association between the 6056 genes and OS from the ovarian
cancer data. Among them, 128 genes satisfy the P-value < 0.001 criterion

In the above CCs, the gene symbols were ordered by their significance. For
instance, CXCL12 was the most strongly associated gene for time-to-relapse. The
supplementary material of Emura et al. (2018) describes how the genes TIMP2,
PDPN, NCOA3, TEAD1, and YWHAB are biologically associated with relapse and
death. The means and SDs of the compound covariates are

\[
\hat\mu_1 = \frac{1}{912} \sum_{i=1}^{4} \sum_{j=1}^{N_i} CC_{1,ij} = 0.338, \qquad
\hat\sigma_1 = \sqrt{\frac{1}{911} \sum_{i=1}^{4} \sum_{j=1}^{N_i} (CC_{1,ij} - \hat\mu_1)^2} = 10.468,
\]
\[
\hat\mu_2 = \frac{1}{912} \sum_{i=1}^{4} \sum_{j=1}^{N_i} CC_{2,ij} = 0.222, \qquad
\hat\sigma_2 = \sqrt{\frac{1}{911} \sum_{i=1}^{4} \sum_{j=1}^{N_i} (CC_{2,ij} - \hat\mu_2)^2} = 7.894.
\]

4.9.2 Fitting the Joint Frailty-Copula Model

In addition to the gene expressions, we have a clinical covariate ($Z_{2,ij} = 0$ vs. $1$)
on the residual tumour size at surgery (≤ 1 cm vs. > 1 cm). The joint.Cox R package
was applied to fit the data to the joint frailty-copula model as

\[
\Pr(X > x, D > y\,|\,u) = C_{\hat\theta}[\hat{S}_X(x|u), \hat{S}_D(y|u)]
= [\hat{S}_X(x|u)^{-\hat\theta} + \hat{S}_D(y|u)^{-\hat\theta} - 1]^{-1/\hat\theta},
\]

where $\hat\theta = 1.9$ (95%CI: 1.5–2.5), giving Kendall’s tau $\hat\tau = 0.49$ (95%CI: 0.43–0.55),
and

\[
\begin{aligned}
\hat{S}_X(x|u) &= \exp\left[ -u\, \hat{R}_0(x) \exp\left\{ \hat\gamma_1 \frac{CC_1 - \hat\mu_1}{\hat\sigma_1} \right\} \right], \\
\hat{S}_D(y|u) &= \exp\left[ -\hat\Lambda_0(y) \exp\left\{ \hat\beta_2 Z_2 + \hat\gamma_2 \frac{CC_2 - \hat\mu_2}{\hat\sigma_2} \right\} \right].
\end{aligned}
\]

Here, the frailty term $u$ appears only in the survival function of time-to-relapse
because $\alpha = 0$ was chosen as the best value. All the regression coefficients
in the model were significant (P-value < 0.05). Their relative risks were
$\exp(\hat\gamma_1) = 1.48$ (95%CI: 1.37–1.59), $\exp(\hat\beta_2) = 1.18$ (95%CI: 1.03–1.35),
and $\exp(\hat\gamma_2) = 1.56$ (95%CI: 1.43–1.70). The heterogeneity parameter was
$\hat\eta = 0.04$ (95%CI: 0.01–0.22). The baseline hazard functions are estimated as

\[
\begin{aligned}
\hat{r}_0(t) = d\hat{R}_0(t)/dt &= 0.85 \times M_1(t) + 2.14 \times M_2(t) + 0 \times M_3(t) + 0.07 \times M_4(t) + 0 \times M_5(t), \\
\hat\lambda_0(t) = d\hat\Lambda_0(t)/dt &= 0.17 \times M_1(t) + 1.05 \times M_2(t) + 1.24 \times M_3(t) + 0.27 \times M_4(t) + 0 \times M_5(t),
\end{aligned}
\]

for $t \in [0, 6420]$, where 6420 (days) is the maximum follow-up time.
Appendix B (B.2) provides our R codes to reproduce the fitted values of the above
analysis.
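The reported Kendall's tau can be checked against the standard relation between the Clayton copula parameter and Kendall's tau, τ = θ/(θ + 2). The worked computation below uses only the estimates quoted in this section. That the interval endpoints transform in the same way is an observation consistent with the reported numbers, not a statement of how the interval was actually computed.

```latex
\[
\hat\tau = \frac{\hat\theta}{\hat\theta + 2} = \frac{1.9}{3.9} \approx 0.49, \qquad
\frac{1.5}{1.5 + 2} \approx 0.43, \qquad
\frac{2.5}{2.5 + 2} \approx 0.56 .
\]
```

The upper endpoint 0.556 is reported as 0.55 in the text, a difference that is presumably due to rounding of the endpoints of the interval for θ. The same relation gives the value 6/(6 + 2) = 0.75 used in the simulation design of Sect. 4.8.1.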

4.9.3 Patient-Level Survival Function

To see how the patient-level survival function is predicted, we consider four
hypothetical patients (named Patients 1–4) with the following characteristics:

• Patient 1: risk genes ($CC_2 = 16$); the residual tumour > 1 cm ($Z_2 = 1$),
• Patient 2: protective genes ($CC_2 = -16$); the residual tumour ≤ 1 cm
($Z_2 = 0$),
• Patient 3: average genes ($CC_2 = 0$); $Z_2 = 1$,
• Patient 4: average genes ($CC_2 = 0$); $Z_2 = 0$.

Here, the values of $CC_2 = \pm 16$ are chosen based on a two SD change from the
mean such that $\hat\mu_2 + 2\hat\sigma_2 \approx 16$ and $\hat\mu_2 - 2\hat\sigma_2 \approx -16$.
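For reference, the two-SD values follow directly from the estimates of the mean and SD of the second compound covariate reported in Sect. 4.9.1:

```latex
\[
\hat\mu_2 + 2\hat\sigma_2 = 0.222 + 2 \times 7.894 = 16.010 \approx 16, \qquad
\hat\mu_2 - 2\hat\sigma_2 = 0.222 - 2 \times 7.894 = -15.566 \approx -16 .
\]
```

On the standardized scale used in the model, these two patients therefore have compound covariate values of about +2 and −2, whereas the "average genes" value of 0 corresponds to (0 − 0.222)/7.894 ≈ −0.03.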
Figure 4.4 displays the patient-level survival function for Patients 1–4. Patient 1
has the shortest survival, e.g., the predicted survival probability at 1500 days (about
4 years) is only 15%. In contrast, Patient 2 achieves the longest survival, with a
predicted survival probability of 76% at 1500 days. The tight confidence intervals for
their predicted survival probabilities confirm that the difference between Patient 1
and Patient 2 is not due to chance. However, the difference in survival probabilities
between Patient 3 and Patient 4 is less clear.

Fig. 4.4 The patient-level survival functions and their 95% CIs (dotted lines)

4.10 Concluding Remarks

To handle high-dimensional covariates, we adopted a simple approach based on
univariate feature selection followed by Tukey’s compound covariate. We applied
the compound covariate predictor to the joint frailty-copula model and developed
a patient-level prediction scheme for survival. In our simulations, the compound
covariate method showed better predictive ability than the ridge-based or Lasso-based
approach. In the analysis of ovarian cancer patients, we showed that the developed
patient-level survival function can classify patients into low, medium, and high-risk
groups. However, we recognize the need for validating our prediction formula with
an independent validation set of patients before it is widely applied by clinicians.
Our patient-level prediction of OS is based on covariates collected at the study
entry time. In the example of the ovarian cancer data, the study entry time refers
to the time of surgery, when a tumour is surgically removed and its gene expressions
and residual tumour size are measured. If necessary, the prediction of OS can be
updated at each scheduled monitoring date after surgery. Such a dynamic prediction
scheme needs a more elaborate formulation than the patient-level survival function
considered in this chapter. We shall discuss this topic in detail in Chap. 5.

A successful predictor based on gene expressions should not omit informative
genes. It is known that omitting an informative gene from a predictor has a greater
deleterious effect than including non-informative genes (Simon 2005). Thus, it may
be difficult to develop a successful predictor based on a small number of genes. For
instance, a large number of informative genes are encountered in the lymphoma data
reported in Matsui (2006), where the optimized number of genes is 75 or 85. In
our illustrative example of ovarian cancer patients, univariate selection yielded 128
genes associated with time-to-death and 158 genes associated with time-to-relapse
(P-value < 0.001).

References

Beer DG, Kardia SLR, Huang CC, Giordano TJ, Levin AM et al (2002) Gene-expression profiles
predict survival of patients with lung adenocarcinoma. Nat Med 8:816–824
Bickel PJ, Levina E (2004) Some theory for Fisher’s linear discriminant function, naive Bayes,
and some alternatives when there are many more variables than observations. Bernoulli
10(6):989–1010
Bøvelstad HM, Nygård S, Storvold HL, Aldrin M, Borgan Ø et al (2007) Predicting survival from
microarray data—a comparative study. Bioinformatics 23:2080–2087
Chen HY, Yu SL, Chen CH, Chang GC, Chen CY et al (2007) A five-gene signature and clinical
outcome in non-small-cell lung cancer. N Engl J Med 356:11–20
Cox DR (1972) Regression models and life-tables (with discussion). J R Stat Soc Series B Stat
Methodol 34:187–220
Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification
of tumors using gene expression data. J Am Stat Assoc 97(457):77–87
Emura T (2019) joint.Cox: joint frailty-copula models for tumour progression and death in meta-
analysis, CRAN
Emura T, Chen YH, Chen HY (2012) Survival prediction based on compound covariate under
Cox proportional hazard models. PLoS ONE 7(10):e47627. https://doi.org/10.1371/journal.pone.0047627
Emura T, Nakatochi M, Murotani K, Rondeau V (2017) A joint frailty-copula model between
tumour progression and death for meta-analysis. Stat Methods Med Res 26(6):2649–2666
Emura T, Nakatochi M, Matsui S, Michimae H, Rondeau V (2018) Personalized dynamic prediction
of death according to tumour progression and high-dimensional genetic factors: meta-analysis
with a joint model. Stat Methods Med Res 27(9):2842–2858
Emura T, Matsui S, Chen HY (2019) compound.Cox: univariate feature selection and compound
covariate for predicting survival. Comput Methods Programs Biomed 168:21–37
Ganzfried BF, Riester M, Haibe-Kains B, Risch T, Tyekucheva S et al (2013) Curated ovarian
data: clinically annotated data for the ovarian cancer transcriptome. Database; Article ID bat013.
https://doi.org/10.1093/database/bat013
Gerds TA, Schumacher M (2006) Consistent estimation of the expected Brier score in general
survival models with right-censored event times. Biometrical Journal 48(6):1029–1040
Goeman J, Meijer R, Chaturvedi N (2016) penalized: L1 (lasso and fused lasso) and L2 (ridge)
penalized estimation in GLMs and in the Cox model, CRAN; version 0.9-47
Lossos IS, Czerwinski DK, Alizadeh AA, Wechser MA, Tibshirani R, Botstein D, Levy R (2004)
Prediction of survival in diffuse large-B-cell lymphoma based on the expression of six genes. N
Engl J Med 350(18):1828–1837
Matsui S (2006) Predicting survival outcomes using subsets of significant genes in prognostic
marker studies with microarrays. BMC Bioinform 7:156

Matsui S, Simon RM, Qu P, Shaughnessy JD, Barlogie B, Crowley J (2012) Developing and vali-
dating continuous genomic signatures in randomized clinical trials for predictive medicine. Clin
Cancer Res 18(21):6065–6073
Schumacher M, Hollander N, Schwarzer G, Binder H, Sauerbrei W (2012) Prognostic factor studies.
In Crowley JJ, Hoering A (ed) Handbook of statistics in clinical oncology, 3rd edn. CRC Press,
Boca Raton, pp 415–469
Simon R (2003) Design and analysis of DNA microarray investigations. Springer Science & Busi-
ness Media, New-York
Simon R (2005) Roadmap for developing and validating therapeutically relevant genomic classifiers.
J Clin Oncol 23(29):7332–7341
Tukey JW (1993) Tightening the clinical trial. Control Clin Trials 14:266–285
van Wieringen WN, Kun D, Hampel R, Boulesteix AL (2009) Survival prediction using gene
expression data: a review and comparison. Comput Stat Data Anal 53(5):1590–1603
Waldron L, Haibe-Kains B, Culhane AC, Riester M, Ding J et al (2014) Comparative meta-analysis
of prognostic gene signatures for late-stage ovarian cancer. J Natl Cancer Inst 106(5):dju049
Wang Y, Klijn JG, Zhang Y, Sieuwerts AM et al (2005) Gene-expression profiles to predict distant
metastasis of lymph-node-negative primary breast cancer. The Lancet 365(9460):671–679
Witten DM, Tibshirani R (2010) Survival analysis with high-dimensional covariates. Stat Methods
Med Res 19:29–51
Zhao SD, Parmigiani G, Huttenhower C, Waldron L (2014) Más-o-menos: a simple sign averaging
method for discrimination in genomic data analysis. Bioinformatics 30(21):3062–3069
Chapter 5
Personalized Dynamic Prediction
of Survival

Abstract In the development of patient-tailored therapy, there is great interest in
the dynamic prediction of survival at a certain moment in time (e.g., at a follow-up
visit after surgery). This chapter considers dynamic prediction formulas for
predicting survival for a cancer patient. The prediction formulas incorporate the genetic and
clinical covariates collected at patient entry as well as the tumour progression
history evolving after the entry. We first review the framework of dynamic prediction
by introducing prediction formulas, such as the conditional failure function and
conditional hazard function. We then demonstrate how the parameters in the prediction
formulas are estimated by fitting meta-analytic data to the joint frailty-copula model.
For illustration, we apply the dynamic prediction formulas to predict survival for
ovarian cancer patients.

Keywords Conditional failure function · Conditional hazard function · Gene
expression · Meta-analysis · Ovarian cancer · Personalized medicine · Risk
prediction · Semi-competing risks · Tumour progression

5.1 Accurate Prediction of Survival

An important question in survival analysis is whether one can accurately predict
survival for a cancer patient according to the patient’s information. A number of
data-driven methods have been developed for predicting survival, e.g., for patients
with breast cancer (Gómez et al. 2016; Shukla et al. 2018), ovarian cancer (Yoshihara
et al. 2010; Enshaei et al. 2015; Emura et al. 2018), and prostate cancer (Guinney
et al. 2017). An accurate survival prediction method allows patients to consider their
future and physicians to choose an optimal therapy, constituting a core element of
personalized medicine. According to its definition, personalized medicine seeks to
improve health care by advancing the development of patient-tailored therapy based
on genetic information (Schleidgen et al. 2013; Hayes et al. 2014).
Waldron et al. (2014) performed a large-scale meta-analysis on late-stage
ovarian cancer patients to examine the prediction performances of 14 published
methods based on gene expressions. They concluded that 12 methods demonstrate
statistical significance for predicting overall survival (i.e., time-to-death) for
independent validation data (P-value < 0.05). However, they also noted the modest
gain in prediction accuracy (c-index of 0.56–0.60), suggesting the need for further
improvement to be of clinical value.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2019
T. Emura et al., Survival Analysis with Correlated Endpoints,
JSS Research Series in Statistics, https://doi.org/10.1007/978-981-13-3516-7_5
Several ideas for improving the accuracy of prediction are listed below.

Combine clinical and genetic information
A number of prognostic models have been developed by applying both clinical and
genetic information (Matsui 2006; Binder and Schumacher 2008; Bøvelstad et al.
2009; van Houwelingen and Putter 2011; Matsui et al. 2012; Sun et al. 2018). These
studies concluded that the model incorporating both clinical and genetic information
leads to better predictive ability than the model including one of them alone. They
also concluded that the clinical and genetic covariates are independent predictors for
survival. For ovarian cancer patients treated by surgery, the residual tumour size and
gene expressions are independent clinical and genetic predictors for overall survival
(Yoshihara et al. 2010, 2012; Emura et al. 2018).

Use intermediate events
At study entry, available covariates for a patient would be age, stage, grade, residual
tumour size, etc. In addition to these covariates, some intermediate events (e.g.,
tumour progression) may influence survival during the follow-up. The framework of
dynamic prediction (van Houwelingen and Putter 2011) offers prediction formulas
that utilize the record of intermediate events occurring after study entry. Throughout
this chapter, we shall discuss the theory and application of dynamic prediction.

Use larger training samples to build a prediction formula
This idea is critical when a prediction algorithm involves gene selection (feature
selection). The results of the selection are unstable for small training samples and
high-dimensional features, which could be alleviated by increasing the sample size
(Michiels et al. 2005). Meta-analysis of individual patient data (IPD) is one promising
way to stabilize the results. However, there are a few technical issues in performing
IPD meta-analyses. The first issue is the heterogeneity between studies,
which typically demands random-effects models or frailty models (Burzykowski
et al. 2001; Rondeau et al. 2015). The second one is the inconsistent definitions of
clinical covariates or inconsistently collected measurements between studies, resulting
in the scarcity of reliable covariates in prediction models. This creates the
need to account for residual dependence (Chap. 3). The joint frailty-copula model
is a tailored model to resolve these difficulties in IPD meta-analyses (Emura et al.
2017).

Apply robust statistical methods for selecting genes and calculating a predictor
Even when the sample size is large, the prediction results still depend on the choices
of statistical methods and some tuning parameters. Almost all statistical methods
for selecting genes and calculating a predictor are “tuned” versions of Cox’s partial
likelihood method (Bøvelstad et al. 2007; van Wieringen et al. 2009; Witten and
Tibshirani 2010; Emura et al. 2019). Users need to specify a tuning parameter to
avoid over-fitting of high-dimensional genetic covariates. The tuning parameter can
be the P-value threshold in univariate feature selection or the shrinkage parameter in
the penalized partial likelihood method (Bøvelstad et al. 2007). In the analysis of a
single endpoint, a tuning parameter is usually optimized for the cross-validated partial
likelihood (Matsui 2006; Bøvelstad et al. 2007). However, the partial likelihood is
no longer applicable to joint models that demand the full likelihood for estimation.
In the sequel, we shall apply the compound covariate method (Chap. 4) to attain a
good degree of robustness against the choice of tuning parameters.

5.2 Framework of Dynamic Prediction

Dynamic prediction is a methodology that can utilize the record of intermediate
events accumulated before making prediction at time $t$ (van Houwelingen and Putter
2011). For example, tumour progression of a patient may be strongly predictive of the
patient’s overall survival. However, the intermediate events are not available at the
study entry (at time $t = 0$) as they evolve with time. The study entry time ($t = 0$) can
be defined in a variety of ways, such as the date of surgery, the date of randomization,
and the starting date of chemotherapy. The prediction time $t > 0$ can be one of the
scheduled follow-up visits, where a clinician may carry out some examinations for
a patient.
Let D be time-to-death and X be time-to-tumour progression (TTP) measured
from the study entry. Let Z be a vector of covariates including both clinical and
genetic covariates recorded at the entry. In cancer research, D is more often called
overall survival (OS). See Chap. 2 for detailed discussions about OS and TTP. It
is assumed that Z is recorded at time t = 0 and does not change over time (time-dependent
covariates are not considered). Such covariates are often called baseline
covariates.
In the traditional survival analysis, prediction of OS is based on the survival
function S(w|Z) = Pr(D > w|Z), where w > 0 is a fixed time period (e.g., 5 years).
The survival function aims to predict the vital status (alive or dead) after time w. This
prediction scheme shall be called baseline prediction since the prediction formula
S(w|Z) is constructed with the covariates recorded at time t = 0. Since X is not
available at time t = 0, it is not included in the prediction formula.
The simplest form of dynamic prediction utilizes the conditional survival function,
defined as

\[ S(t, t + w \mid Z) = \Pr(D > t + w \mid D > t, Z) = \frac{S(t + w \mid Z)}{S(t \mid Z)}. \]

If a patient is surviving at time t, the prediction based on the above conditional
survival function is more informative than the baseline prediction.
In dynamic prediction, it is customary to use the conditional failure function

\[ F(t, t + w \mid Z) = \Pr(D \le t + w \mid D > t, Z) = 1 - \Pr(D > t + w \mid D > t, Z) = 1 - \frac{S(t + w \mid Z)}{S(t \mid Z)}, \]

rather than the conditional survival function. The conditioning event {D ≥ t} means
that prediction of survival is meaningful only when a patient is still alive at time
t > 0. In this sense, the conditional failure function is similar to the hazard function
that quantifies the instantaneous risks of death for those who are alive at time t > 0.
Indeed, the hazard function is related to the conditional failure function through

\[ \lambda(t \mid Z) = \frac{F(t, t + dw \mid Z)}{dw} = \left. \frac{dF(t, t + w \mid Z)}{dw} \right|_{w = 0}. \]

In dynamic prediction, a prediction formula is constructed at t > 0 so that progression
information about X may be available in addition to {D ≥ t} and Z. We
shall call t the prediction time, the time at which a clinician makes a prediction for a
patient.
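
The relations among the survival function, the conditional failure function, and the hazard function can be checked numerically. The following is a minimal sketch, assuming a Weibull survival function with arbitrary shape and scale values; the function names and all numbers are illustrative and are not taken from the text.

```python
import math

# Illustrative Weibull parameters (assumptions, not estimates from the book)
SHAPE, SCALE = 1.5, 1000.0

def weibull_survival(t):
    """S(t) = exp(-(t / SCALE)^SHAPE), an assumed survival function."""
    return math.exp(-((t / SCALE) ** SHAPE))

def conditional_failure(surv, t, w):
    """F(t, t + w) = 1 - S(t + w) / S(t): death in (t, t + w] given survival to t."""
    return 1.0 - surv(t + w) / surv(t)

def hazard_from_failure(surv, t, dw=1e-4):
    """Approximate the hazard by F(t, t + dw) / dw for a small dw."""
    return conditional_failure(surv, t, dw) / dw

t, w = 500.0, 365.0
baseline_prediction = conditional_failure(weibull_survival, 0.0, w)  # made at t = 0
dynamic_prediction = conditional_failure(weibull_survival, t, w)     # made at t = 500
exact_hazard = (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1.0)        # closed-form Weibull hazard

print("baseline F(0, w):", baseline_prediction)
print("dynamic  F(t, t + w):", dynamic_prediction)
print("hazard via F(t, t + dw)/dw:", hazard_from_failure(weibull_survival, t))
print("closed-form hazard:", exact_hazard)
```

With an increasing hazard (shape greater than 1), the dynamic prediction at t = 500 exceeds the baseline prediction over a window of the same width.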

5.2.1 Conditional Failure Function

First, suppose that a patient does not experience tumour progression at time t (i.e.,
X > t). Given that the patient is alive at time t, the conditional probability of death
between t and t + w is

\[ F(t, t + w \mid X > t, Z) = \Pr(D \le t + w \mid D > t, X > t, Z). \]

Second, suppose that a patient experiences tumour progression before time t and
that the time of the tumour progression (i.e., X = x) is available at time t. Given that
the patient is still alive at time t, the conditional probability of death between t and
t + w is

\[ F(t, t + w \mid X = x, Z) = \Pr(D \le t + w \mid D > t, X = x, Z), \quad x \le t. \]

Tumour progression occurring to a patient may increase his/her probability of
death. Thus, one may expect the inequality F(t, t + w|X = x, Z) > F(t, t + w|X >
t, Z) to hold for w > 0 (Fig. 5.1). This inequality is usually implied by positive
dependence between X and D. Clearly, the equality F(t, t + w|X = x, Z) = F(t, t +
w|X > t, Z) is implied by the conditional independence between X and D given Z.
Fig. 5.1 Tumour progression occurring to a patient may increase his/her probability of death. Thus,
the inequality F(t, t + w|X = x, Z) > F(t, t + w|X > t, Z) holds for w > 0

The prediction time t should be chosen prospectively. For instance, t can be chosen
according to a calendar schedule (e.g., every 6 weeks after treatment) prescribed in a
clinical trial. This schedule should not be influenced by the health status of a patient
or any other reasons that might be informative for survival.
Figure 5.2 demonstrates a dynamic prediction for four hypothetical patients
(named Patients 1–4). Patients 1 and 2 have died before time t, so they are excluded
from the target for prediction. Patients 3 and 4 are alive at time t, so they are the
target for prediction. Patient 3 experiences tumour progression before time t, so the
prediction formula incorporates TTP. Patient 4 does not experience tumour progres-
sion before time t, so the prediction formula incorporates the information that TTP
is greater than time t.
In dynamic prediction, the conditional failure function provides graphical tools
to demonstrate the time-varying risks of death. To facilitate the interpretation, one
may fix either t or w.
Given w, the conditional failure function is interpreted analogously with the conditional
hazard function. It represents how the amount of risk changes over time t.
An increasing (decreasing) hazard function typically corresponds to an increasing
(decreasing) conditional failure function. van Houwelingen and Putter (2011) provide
several examples to demonstrate this way of interpreting the conditional failure
function.
Given t, the conditional failure function is interpreted analogously with the usual
distribution function; it is an increasing function from zero (at w = 0) to one (at
w = ∞). The plot of the conditional failure function against w depicts how the risk
of death evolves over time (Mauguen et al. 2013; Emura et al. 2018). See Fig. 5.1.
[Figure 5.2: timelines of four patients from the study entry (t = 0) through the prediction time t to time t + w. Patient 1: progression followed by death before time t. Patient 2: death without progression before time t. Patient 3: progression at time x before time t; death by time t + w is predicted with a probability. Patient 4: no progression before time t; death by time t + w is predicted with a probability.]

Fig. 5.2 Dynamic prediction of death according to tumour progression. Prediction is not performed
for Patients 1 and 2 who have died before time t. Prediction is performed for Patients 3 and 4
according to their tumour progression status observed before time t

5.2.2 Conditional Hazard Function

We define the conditional hazard function by letting w be a small number dw in the
conditional failure function. Specifically, we define two conditional hazard functions
as

\[ \lambda(t \mid X = x, Z) = F(t, t + dw \mid X = x, Z)/dw, \]

\[ \lambda(t \mid X > x, Z) = F(t, t + dw \mid X > x, Z)/dw. \]

Dependence between X and D is assessed by the cross-ratio function (Oakes 1989)

\[ \frac{\lambda(t \mid X = x, Z)}{\lambda(t \mid X > x, Z)} = \frac{\Pr(X = x, D = t \mid Z)\,\Pr(X > x, D > t \mid Z)}{\Pr(X = x, D > t \mid Z)\,\Pr(X > x, D = t \mid Z)}, \quad t > 0,\ x > 0. \]

A ratio greater than 1 corresponds to positive dependence, while a ratio less
than 1 corresponds to negative dependence. See Chap. 2 for more details about the
cross-ratio function.
The cross-ratio function was originally employed by Clayton (1978) in order to
introduce a bivariate survival model that satisfies

\[ \lambda(t \mid X = x, Z) = (\theta + 1)\,\lambda(t \mid X > x, Z), \quad t > 0,\ x > 0, \tag{5.1} \]

for some constant θ > 0. This is a proportional hazards model for the effect of
{X = x} relative to the effect of {X > x}, where the parameter (θ + 1) represents the
relative risk. Clayton (1978) introduced a model

\[ \Pr(X > x, D > t \mid Z) = [S_X(x \mid Z)^{-\theta} + S_D(t \mid Z)^{-\theta} - 1]^{-1/\theta}, \quad t > 0,\ x > 0, \]

where $S_X(x \mid Z)$ and $S_D(t \mid Z)$ are arbitrary continuous survival functions. Equation (5.1)
holds true under Clayton’s model, irrespective of the forms of $S_X(\cdot \mid \cdot)$ and $S_D(\cdot \mid \cdot)$.
Clearly, Clayton’s model has a copula $C_\theta(v, w) = (v^{-\theta} + w^{-\theta} - 1)^{-1/\theta}$, which is the
Clayton copula.
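
Equation (5.1) can be checked numerically under Clayton’s model. The sketch below is an illustration only: it assumes exponential margins with arbitrary rates, and it approximates the two conditional hazards by central finite differences of the joint survival function. All names and values are hypothetical.

```python
import math

# Assumed exponential margins (arbitrary rates per day); not from the book
RATE_X, RATE_D = 0.002, 0.001

def joint_survival(x, t, theta):
    """Clayton's model: Pr(X > x, D > t) with exponential margins."""
    sx = math.exp(-RATE_X * x)
    sd = math.exp(-RATE_D * t)
    return (sx ** -theta + sd ** -theta - 1.0) ** (-1.0 / theta)

def hazard_no_progression(x, t, theta, h=1.0):
    """lambda(t | X > x) = -d/dt log Pr(X > x, D > t), by central differences."""
    upper = math.log(joint_survival(x, t + h, theta))
    lower = math.log(joint_survival(x, t - h, theta))
    return -(upper - lower) / (2.0 * h)

def hazard_progression_at(x, t, theta, h=1.0):
    """lambda(t | X = x) = -d/dt log(-d/dx Pr(X > x, D > t)), by central differences."""
    def log_density_in_x(s):
        slope = (joint_survival(x + h, s, theta) - joint_survival(x - h, s, theta)) / (2.0 * h)
        return math.log(-slope)
    return -(log_density_in_x(t + h) - log_density_in_x(t - h)) / (2.0 * h)

theta = 1.9
ratio = hazard_progression_at(600.0, 1000.0, theta) / hazard_no_progression(600.0, 1000.0, theta)
print("numerical cross-ratio:", ratio, "theta + 1:", theta + 1.0)
```

The numerical ratio should agree with θ + 1 up to finite-difference error, whatever rates are chosen for the margins.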
The cross-ratio function was also employed by Day et al. (1997) in the landmark
analysis of dynamic prediction. In the context of dynamic prediction, the interest lies
in the case of t ≥ x since one is concerned with the instantaneous risk of death at
time t according to the previous tumour progression status at time x. Especially by
letting t = x, one can consider the effect of the current tumour progression status on
the instantaneous risk of death at time t. The corresponding relative risk is

\[ \frac{\lambda(t \mid X = t, Z)}{\lambda(t \mid X > t, Z)}, \quad t > 0. \]

This function represents the current effect of tumour progression on survival at time
t. By Eq. (5.1), this function is constant under the Clayton model.
The actual formulas for the conditional failure function and conditional hazard
function depend on the types of models. van Houwelingen and Putter (2011) adopted
the landmark approach based on the conditional Cox model at each prediction time. In
recent years there has been a noticeable trend toward using joint models that account for the
dependence between survival and other responses via frailty. Different frailty models
have been developed to join different response types (Rizopoulos 2011; Mauguen
et al. 2013, 2015; Proust-Lima et al. 2014; Rondeau et al. 2017; Król et al. 2016; Sène
et al. 2016). However, these existing frailty models for dynamic prediction have not
been adapted to meta-analyses that combine heterogeneous studies. In the sequel, we
introduce the joint frailty-copula model (Chap. 3) to construct the prediction formula
based on IPD meta-analyses.
5.3 Prediction Formulas Under the Joint Frailty-Copula Model

We need an appropriate statistical model to derive prediction formulas, such as the
conditional failure/hazard function. Since we aim to use meta-analytic data collected
from heterogeneous studies, we consider a frailty model. Specifically, let
$S_X(x \mid u) = \Pr(X > x \mid u, Z)$ and $S_D(y \mid u) = \Pr(D > y \mid u, Z)$ be survival functions
given an unobserved frailty term u. Here, we use u to represent the unobserved heterogeneity
of patients, which is not explained by observed covariates Z. To simplify
the presentation, we have suppressed Z in the notations of $S_X(x \mid u)$ and $S_D(y \mid u)$. We
impose the assumption that u follows a gamma distribution with a density

\[ f_\eta(u) = \frac{1}{\Gamma(1/\eta)\,\eta^{1/\eta}}\, u^{\frac{1}{\eta} - 1} \exp\left(-\frac{u}{\eta}\right), \quad u > 0,\ \eta > 0. \]

The distribution has mean 1 and variance η that represents the degree of heterogeneity.
As in Chap. 3, we consider the joint frailty-copula model

\[ \Pr(X > x, D > y \mid u, Z) = C_\theta[S_X(x \mid u), S_D(y \mid u)], \]

where Cθ (v, w) is a copula, and the parameter θ represents the degree of association
between TTP and OS.
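
As a quick sanity check on the gamma frailty density above, the following sketch integrates it numerically to confirm that it has total mass 1, mean 1, and variance η. The midpoint rule, the truncation of the integral at u = 5, and the value η = 0.04 are choices made for this illustration only.

```python
import math

def gamma_frailty_density(u, eta):
    """Gamma density with shape 1/eta and scale eta (mean 1, variance eta)."""
    shape = 1.0 / eta
    log_f = ((shape - 1.0) * math.log(u) - u / eta
             - math.lgamma(shape) - shape * math.log(eta))
    return math.exp(log_f)

def integrate(func, lower, upper, n=20000):
    """Midpoint rule on [lower, upper] with n equal subintervals."""
    step = (upper - lower) / n
    return step * sum(func(lower + (i + 0.5) * step) for i in range(n))

eta = 0.04  # illustrative value
total = integrate(lambda u: gamma_frailty_density(u, eta), 0.0, 5.0)
mean = integrate(lambda u: u * gamma_frailty_density(u, eta), 0.0, 5.0)
variance = integrate(lambda u: (u - 1.0) ** 2 * gamma_frailty_density(u, eta), 0.0, 5.0)
print(total, mean, variance)
```

For larger values of η the upper integration limit would need to be increased, since the density then has a heavier right tail.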
Following Emura et al. (2018), we divide a prediction formula into two different
cases according to the tumour progression status:

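Case I If the patient does not experience tumour progression before time t
(i.e., X > t), the conditional failure function is

\[ F(t, t + w \mid X > t, Z) = \Pr(D \le t + w \mid D > t, X > t, Z) = \frac{\int_0^\infty \left( C_\theta[S_X(t \mid u), S_D(t \mid u)] - C_\theta[S_X(t \mid u), S_D(t + w \mid u)] \right) f_\eta(u)\,du}{\int_0^\infty C_\theta[S_X(t \mid u), S_D(t \mid u)]\, f_\eta(u)\,du}, \]

and the conditional hazard function is

\[ \lambda(t \mid X > t, Z, u) = \lambda_D(t \mid u)\, \frac{S_D(t \mid u)\, C_\theta^{[0,1]}[S_X(t \mid u), S_D(t \mid u)]}{C_\theta[S_X(t \mid u), S_D(t \mid u)]}, \]

where $\lambda_D(t \mid u) = -\partial \log S_D(t \mid u)/\partial t$ and $C_\theta^{[0,1]}(v, w) = \partial C_\theta(v, w)/\partial w$.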
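Case II If the patient experiences tumour progression before time t (i.e., X =
x, x ≤ t), the conditional failure function is

\[ F(t, t + w \mid X = x, Z) = \Pr(D \le t + w \mid D > t, X = x, Z) = \frac{\int_0^\infty \left( C_\theta^{[1,0]}[S_X(x \mid u), S_D(t \mid u)] - C_\theta^{[1,0]}[S_X(x \mid u), S_D(t + w \mid u)] \right) \lambda_X(x \mid u)\, S_X(x \mid u)\, f_\eta(u)\,du}{\int_0^\infty C_\theta^{[1,0]}[S_X(x \mid u), S_D(t \mid u)]\, \lambda_X(x \mid u)\, S_X(x \mid u)\, f_\eta(u)\,du}, \]

where $\lambda_X(x \mid u) = -\partial \log S_X(x \mid u)/\partial x$. The conditional hazard function is

\[ \lambda(t \mid X = x, Z, u) = \lambda_D(t \mid u)\, \frac{S_D(t \mid u)\, C_\theta^{[1,1]}[S_X(x \mid u), S_D(t \mid u)]}{C_\theta^{[1,0]}[S_X(x \mid u), S_D(t \mid u)]}, \]

where $C_\theta^{[1,0]}(v, w) = \partial C_\theta(v, w)/\partial v$ and $C_\theta^{[1,1]}(v, w) = \partial^2 C_\theta(v, w)/\partial v \partial w$.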

The derivations of these formulas are given in Appendix C. Appendix C also
gives the simplified expressions under the independence copula C(v, w) = vw that
corresponds to the joint frailty model of Rondeau et al. (2015).
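
The two conditional failure functions (Case I and Case II) can be evaluated by one-dimensional numerical integration over the frailty. The sketch below is a toy illustration rather than the implementation used in the joint.Cox package: it assumes the Clayton copula, constant (exponential) hazards with hypothetical rates, a frailty that multiplies both hazards, no covariates, and a midpoint rule truncated at u = 5.

```python
import math

THETA = 1.9     # Clayton association parameter (illustrative)
ETA = 0.04      # frailty variance (illustrative)
RATE_X = 0.002  # hypothetical constant hazard of progression (per day)
RATE_D = 0.001  # hypothetical constant hazard of death (per day)

def clayton(v, w):
    return (v ** -THETA + w ** -THETA - 1.0) ** (-1.0 / THETA)

def clayton_dv(v, w):
    """Partial derivative of the Clayton copula in its first argument."""
    return v ** (-THETA - 1.0) * (v ** -THETA + w ** -THETA - 1.0) ** (-1.0 / THETA - 1.0)

def surv_x(x, u):
    return math.exp(-u * RATE_X * x)

def surv_d(t, u):
    return math.exp(-u * RATE_D * t)

def frailty_density(u):
    shape = 1.0 / ETA
    return math.exp((shape - 1.0) * math.log(u) - u / ETA
                    - math.lgamma(shape) - shape * math.log(ETA))

def integrate(func, lower=0.0, upper=5.0, n=4000):
    """Midpoint rule; the upper limit is adequate for a small ETA only."""
    step = (upper - lower) / n
    return step * sum(func(lower + (i + 0.5) * step) for i in range(n))

def failure_without_progression(t, w):
    """Case I: F(t, t + w | X > t)."""
    num = integrate(lambda u: (clayton(surv_x(t, u), surv_d(t, u))
                               - clayton(surv_x(t, u), surv_d(t + w, u))) * frailty_density(u))
    den = integrate(lambda u: clayton(surv_x(t, u), surv_d(t, u)) * frailty_density(u))
    return num / den

def failure_with_progression(t, w, x):
    """Case II: F(t, t + w | X = x) for x <= t."""
    def weight(u):
        # lambda_X(x|u) * S_X(x|u) * f_eta(u) with a constant hazard u * RATE_X
        return u * RATE_X * surv_x(x, u) * frailty_density(u)
    num = integrate(lambda u: (clayton_dv(surv_x(x, u), surv_d(t, u))
                               - clayton_dv(surv_x(x, u), surv_d(t + w, u))) * weight(u))
    den = integrate(lambda u: clayton_dv(surv_x(x, u), surv_d(t, u)) * weight(u))
    return num / den

case_one = failure_without_progression(1000.0, 365.0)
case_two = failure_with_progression(1000.0, 365.0, 600.0)
print("no progression by t:", case_one)
print("progression at x = 600:", case_two)
```

In this toy setting the Case II probability exceeds the Case I probability, in line with the positive dependence induced by the Clayton copula.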
We have defined the conditional hazard functions given the frailty u rather than
integrating it out. The reason is to utilize a mathematical relationship

\[ \frac{\lambda(t \mid X = t, Z, u)}{\lambda(t \mid X > t, Z, u)} = R_\theta[S_X(t \mid u), S_D(t \mid u)], \]

where

\[ R_\theta(v, w) = \frac{C_\theta^{[1,1]}(v, w)\, C_\theta(v, w)}{C_\theta^{[1,0]}(v, w)\, C_\theta^{[0,1]}(v, w)} \]

is the cross-ratio function for the copula (Chap. 2). Under the Clayton copula
$C_\theta(v, w) = (v^{-\theta} + w^{-\theta} - 1)^{-1/\theta}$, the cross-ratio function becomes $R_\theta(v, w) = 1 + \theta$,
which is interpreted as the relative risk of {X = t} versus {X > t}. Some other copulas
also induce simple mathematical forms of $R_\theta(v, w)$ (Chap. 2).
To perform dynamic prediction on the basis of the conditional hazard functions,
clinicians need to specify the unobserved frailty u. They may choose u = 1, which
corresponds to the prediction for the average patient. Some sensitivity analysis on
the range of $1 - 2\sqrt{\eta} \le u \le 1 + 2\sqrt{\eta}$ might also be helpful.
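
The sensitivity range for the frailty is simple arithmetic. A minimal sketch follows; the function name is hypothetical, and truncating the lower limit at zero is an assumption added here because the frailty must be positive.

```python
import math

def frailty_sensitivity_range(eta):
    """Return (1 - 2*sqrt(eta), 1 + 2*sqrt(eta)), with the lower limit truncated at 0."""
    half_width = 2.0 * math.sqrt(eta)
    lower = max(0.0, 1.0 - half_width)  # truncation is an added assumption
    return lower, 1.0 + half_width

low, high = frailty_sensitivity_range(0.04)  # eta = 0.04 is an illustrative value
print(low, high)  # approximately 0.6 and 1.4
```

The conditional hazard functions could then be evaluated at u equal to the lower limit, 1, and the upper limit to see how much the predictions vary.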
5.4 Estimating Prediction Formulas

Before performing dynamic prediction for a new patient, all the unknown parameters
in the prediction formulas must be estimated from a training dataset. The new patient
refers to a hypothetical patient who is not included in the training dataset. We assume
that the survival outcome has not been observed for the new patient, but the baseline
covariates Z = (Z1, Z2, CC1, CC2) have been recorded:
• Z1: p1 clinical covariates associated with TTP,
• Z2: p2 clinical covariates associated with OS,
• CC1 = w1 V1 + · · · + wq1 Vq1: compound covariate predictor for TTP,
• CC2 = ω1 W1 + · · · + ωq2 Wq2: compound covariate predictor for OS,
where (V1, . . . , Vq1) are q1 gene expressions associated with TTP, (W1, . . . , Wq2)
are q2 gene expressions associated with OS, and the weights wk and ωk are determined
by the training dataset. Assume that the gene expressions are standardized to
have mean = 0 and SD = 1. A method of selecting q1 (or q2) genes and computing
wk (ωk) is detailed in Chap. 4.
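
To make the compound covariate concrete, the sketch below computes a weighted sum of standardized gene expressions and then standardizes the result by a training-set mean and SD. The weights, expressions, mean, and SD are all hypothetical numbers chosen for illustration; the function names are not from any package.

```python
def compound_covariate(weights, expressions):
    """Weighted sum w_1 V_1 + ... + w_q V_q of standardized gene expressions."""
    if len(weights) != len(expressions):
        raise ValueError("weights and expressions must have the same length")
    return sum(w * v for w, v in zip(weights, expressions))

def standardize(cc, mean, sd):
    """(CC - mean) / SD, using the mean and SD of CC in the training dataset."""
    return (cc - mean) / sd

weights = [0.25, -0.15, 0.20]    # hypothetical weights from a training dataset
expressions = [1.2, -0.4, 0.8]   # hypothetical standardized expressions of a new patient
cc = compound_covariate(weights, expressions)
z = standardize(cc, mean=0.1, sd=0.5)  # hypothetical training mean and SD
print(cc, z)
```

A real compound covariate in this chapter involves many more genes, but the computation has the same form.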
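By fitting the joint frailty-copula model to a training dataset, one can estimate
the survival functions for TTP and OS, respectively, as

\[ \hat{S}_X(t \mid u) = \exp\left[ -u \hat{R}_0(t) \exp\left( \hat{\beta}_1^{\top} Z_1 + \hat{\gamma}_1 \frac{CC_1 - \hat{\mu}_1}{\hat{\sigma}_1} \right) \right], \]

\[ \hat{S}_D(t \mid u) = \exp\left[ -u^{\alpha} \hat{\Lambda}_0(t) \exp\left( \hat{\beta}_2^{\top} Z_2 + \hat{\gamma}_2 \frac{CC_2 - \hat{\mu}_2}{\hat{\sigma}_2} \right) \right], \]

where $\hat{R}_0(t) = \int_0^t \hat{r}_0(x)\,dx$ and $\hat{\Lambda}_0(t) = \int_0^t \hat{\lambda}_0(x)\,dx$ are the estimated cumulative baseline hazard
functions, $(\hat{\theta}, \hat{\eta}, \hat{\beta}_1, \hat{\beta}_2, \hat{\gamma}_1, \hat{\gamma}_2, \hat{r}_0, \hat{\lambda}_0)$ are parameter estimates, $\hat{\mu}_1$ (or $\hat{\mu}_2$)
is the mean of CC1 (or CC2), and $\hat{\sigma}_1$ (or $\hat{\sigma}_2$) is the SD of CC1 (or CC2); see
Chap. 4 for details. The baseline hazard functions are estimated by $\hat{r}_0(t) = \hat{g}^{\top} M(t)$
and $\hat{\lambda}_0(t) = \hat{h}^{\top} M(t)$, where $M(t) = (M_1(t), \ldots, M_5(t))^{\top}$ are the cubic M-spline
basis functions (Chap. 3; Appendix A).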
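These estimates can be applied to compute the conditional failure/hazard functions.
For instance, we compute the conditional failure functions

\[ \hat{F}(t, t + w \mid X > t, Z) = \frac{\int_0^\infty \left( C_{\hat{\theta}}[\hat{S}_X(t \mid u), \hat{S}_D(t \mid u)] - C_{\hat{\theta}}[\hat{S}_X(t \mid u), \hat{S}_D(t + w \mid u)] \right) f_{\hat{\eta}}(u)\,du}{\int_0^\infty C_{\hat{\theta}}[\hat{S}_X(t \mid u), \hat{S}_D(t \mid u)]\, f_{\hat{\eta}}(u)\,du}, \]

\[ \hat{F}(t, t + w \mid X = x, Z) = \frac{\int_0^\infty \left( C_{\hat{\theta}}^{[1,0]}[\hat{S}_X(x \mid u), \hat{S}_D(t \mid u)] - C_{\hat{\theta}}^{[1,0]}[\hat{S}_X(x \mid u), \hat{S}_D(t + w \mid u)] \right) \hat{\lambda}_X(x \mid u)\, \hat{S}_X(x \mid u)\, f_{\hat{\eta}}(u)\,du}{\int_0^\infty C_{\hat{\theta}}^{[1,0]}[\hat{S}_X(x \mid u), \hat{S}_D(t \mid u)]\, \hat{\lambda}_X(x \mid u)\, \hat{S}_X(x \mid u)\, f_{\hat{\eta}}(u)\,du}. \]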
The confidence interval (CI) is computed by a simulation method of Sect. 4.7 in
Chap. 4.
Estimates for the conditional hazard λ(t|·) are obtained in a similar way.

Remarks The spline basis functions M(t) are defined on t ∈ [ξ1, ξ3], where ξ1 is the
smallest value of TTP and ξ3 is the largest value of OS in the training dataset. This
implies that the values of $\hat{S}_X(t \mid u)$ and $\hat{S}_D(t \mid u)$ are defined if t ∈ [ξ1, ξ3]. Accordingly,
the value of $\hat{F}(t, t + w \mid \cdot)$ is defined if both t ∈ [ξ1, ξ3] and t + w ∈ [ξ1, ξ3] hold. If
t < ξ1 or t + w > ξ3, the value of $\hat{F}(t, t + w \mid \cdot)$ is undefined.
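
The domain restriction in the Remarks can be expressed as a small guard to run before evaluating a prediction. The sketch below is illustrative; the function name and the knot values used in the example are assumptions.

```python
def prediction_is_defined(t, w, xi1, xi3):
    """True if both t and t + w lie in [xi1, xi3], the range covered by the splines."""
    return w >= 0 and xi1 <= t and t + w <= xi3

# Hypothetical range of the training data, in days
xi1, xi3 = 0.0, 6000.0
print(prediction_is_defined(1000.0, 365.0, xi1, xi3))   # inside the range
print(prediction_is_defined(5900.0, 365.0, xi1, xi3))   # t + w exceeds xi3
```

Checking the range first avoids reporting a prediction extrapolated beyond the follow-up of the training dataset.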

5.5 Case Study: Ovarian Cancer Data

We use the data of Ganzfried et al. (2013) to demonstrate the dynamic prediction
formulas under the joint frailty-copula model. The data consist of 912 ovarian cancer
patients (American, Australian, and Japanese patients) from G = 4 studies. The
endpoints of interest are time-to-relapse and time-to-death, referred to as TTP and
OS, respectively. A large number of gene expressions are available as prognostic
factors for TTP and OS. The data are summarized in Table 4.1 of Chap. 4.
As in Chap. 4, we construct compound covariates

CC1 = (0.249 × CXCL12) + (0.235 × TIMP2) + (0.222 × PDPN)
+ · · · + (−0.152 × MMP12),

involving 158 genes (P-value < 0.001 for TTP), and

CC2 = (0.237 × NCOA3) + (0.223 × TEAD1) + (0.263 × YWHAB)
+ · · · + (−0.157 × KCNH4),

involving 128 genes (P-value < 0.001 for OS).

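Here, gene expressions (e.g., CXCL12) are standardized to have mean = 0 and SD
= 1 in the 912 patients. The fitted joint frailty-copula model is

\[ \Pr(X > x, D > y \mid u) = C_{\hat{\theta}}[\hat{S}_X(x \mid u), \hat{S}_D(y \mid u)] = [\hat{S}_X(x \mid u)^{-\hat{\theta}} + \hat{S}_D(y \mid u)^{-\hat{\theta}} - 1]^{-1/\hat{\theta}}, \]

where $\hat{\theta} = 1.9$ (Kendall’s tau $\hat{\tau} = 0.49$),

\[ \hat{S}_X(x \mid u) = \exp\left[ -u \hat{R}_0(x) \exp\left( \hat{\gamma}_1 \frac{CC_1 - \hat{\mu}_1}{\hat{\sigma}_1} \right) \right], \]

\[ \hat{S}_D(y \mid u) = \exp\left[ -\hat{\Lambda}_0(y) \exp\left( \hat{\beta}_2 Z_2 + \hat{\gamma}_2 \frac{CC_2 - \hat{\mu}_2}{\hat{\sigma}_2} \right) \right], \]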
where Z2 is a clinical covariate (0 or 1) on the residual tumour size at surgery (≤ 1 cm
or > 1 cm). The estimates are $\hat{\gamma}_1 = 0.39$, $\hat{\beta}_2 = 0.16$, $\hat{\gamma}_2 = 0.44$, $\hat{\mu}_1 = 0.338$,
$\hat{\sigma}_1 = 10.468$, $\hat{\mu}_2 = 0.222$, $\hat{\sigma}_2 = 7.894$,

\[ \hat{r}_0(t) = d\hat{R}_0(t)/dt = 0.85 \times M_1(t) + 2.14 \times M_2(t) + 0 \times M_3(t) + 0.07 \times M_4(t) + 0 \times M_5(t), \]

\[ \hat{\lambda}_0(t) = d\hat{\Lambda}_0(t)/dt = 0.17 \times M_1(t) + 1.05 \times M_2(t) + 1.24 \times M_3(t) + 0.27 \times M_4(t) + 0 \times M_5(t), \]

for t ∈ [0, 6420], where 6420 (days) is the maximum follow-up time. The heterogeneity
parameter is $\mathrm{Var}(u_i) = \hat{\eta} = 0.04$. Although models including more
covariates could be considered, they do not improve the above model in terms of its
predictive ability. The R codes given in B2 of Appendix B produce the fitted values.
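
The reported Kendall’s tau can be reproduced from the estimate of θ through the standard relationship τ = θ/(θ + 2) for the Clayton copula; the relative risk θ + 1 in Eq. (5.1) follows in the same way. The short sketch below only redoes this arithmetic, and the function names are hypothetical.

```python
def clayton_kendall_tau(theta):
    """Kendall's tau of the Clayton copula: theta / (theta + 2)."""
    return theta / (theta + 2.0)

def clayton_relative_risk(theta):
    """Relative risk of {X = t} versus {X > t} under the Clayton copula: theta + 1."""
    return theta + 1.0

theta_hat = 1.9
tau_hat = clayton_kendall_tau(theta_hat)
print(round(tau_hat, 2))                 # 0.49
print(clayton_relative_risk(theta_hat))  # about 2.9
```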
We use the joint.Cox R package (Emura 2019) to perform dynamic prediction on
two hypothetical patients with the following baseline covariates

• Patient 1: risk genes (CC1 = 10, CC2 = 10)¹; residual tumour > 1 cm
(Z2 = 1).
• Patient 2: protective genes (CC1 = −10, CC2 = −10); residual tumour
≤ 1 cm (Z2 = 0).

For instance, one can compute the conditional failure function $\hat{F}(t, t + w \mid X = x, Z)$
for Patient 2 with t = 1000, t + w < 6420, and x = 600 using the following codes:

¹ To compute CC1 for a real patient, one needs to know his/her 158 gene expressions. We have
omitted this process to simplify the presentation. CC1 = 10 is about a one SD change from the
mean of CC1. The same remarks apply to CC2.
Below are the outputs


Figure 5.3 displays the conditional failure functions for Patient 1 and Patient 2.
At the prediction time t = 500 (days), Patient 1 has higher predicted probabilities of
death than Patient 2 due to the unfavourable baseline covariates. Here, we
have assumed that, at the prediction time t = 500 (days), both Patient 1 and Patient
2 are relapse-free. To see how these predictions change as time passes, we assume
that Patient 2 experiences relapse at x = 600 (days), but Patient 1 is still relapse-free
at the time t = 1000 (days). Then, the predicted probability of death for Patient 2
becomes higher than that for Patient 1. This risk inversion shows that the occurrence of
relapse is a stronger risk factor than the unfavourable influence of baseline covariates.

Fig. 5.3 The conditional failure functions computed for two patients

Figure 5.4 displays the conditional hazard function with relapse ($\hat{\lambda}(t \mid X = t, Z, u)$)
and that without relapse ($\hat{\lambda}(t \mid X > t, Z, u)$) at u = 1. Since we apply the
Clayton copula, the fitted model meets the following proportional hazards relationship

\[ \hat{\lambda}(t \mid X = t, Z, u) = (1 + \hat{\theta})\,\hat{\lambda}(t \mid X > t, Z, u) = 2.9\,\hat{\lambda}(t \mid X > t, Z, u). \]

Hence, the occurrence of relapse increases the risk of death almost threefold.
Figure 5.4 graphically exhibits this relationship. Patient 1 has higher hazard rates
of death than Patient 2 due to the unfavourable baseline covariates. The hazard rate
for Patient 2 is quite stable and slowly decreasing, irrespective of the relapse status.
Patient 2 has fairly good prognosis if relapse does not occur or tumour progression
is suppressed during the follow-up (blue line in Fig. 5.4).
Fig. 5.4 The conditional hazard functions computed for two patients

5.6 Discussions

This chapter has introduced the ideas of dynamic prediction and their implementation
under the joint frailty-copula model. We have used the publicly available data on 912
ovarian cancer patients to establish the prediction formulas for overall survival. In
addition to the clinical and genetic covariates, the prediction formulas can utilize the
record of tumour progression events occurring before the time of prediction, where
a copula is used to link the association between OS and TTP. The use of tumour
progression information has not been considered in most of the available prediction
formulas for ovarian cancer patients, which applied the traditional Cox regression
models with clinical covariates and genetic covariates (Tothill et al. 2008; Yoshihara
et al. 2010, 2012; Waldron et al. 2014).
Clinicians may use the R codes in Sect. 5.5 to perform prognostic analysis of a
surgically treated ovarian cancer patient. Clinicians first enter the following items:
• Prediction time (e.g., time = 1000 days after surgery),
• Time of tumour progression (e.g., X = 600 days),
• Residual tumour size (e.g., Z2 = 0 for residual tumour ≤ 1 cm),
• CCs (e.g., CC1 = −10 and CC2 = −10). They are set as CC1 = 0 and CC2 = 0
if no information is available for gene expressions.
Then, the codes automatically produce the predicted probability of death in the next
w years.
Emura et al. (2018) performed a leave-one-out cross-validation on the 912 patients
to estimate the prediction error of the dynamic prediction formulas. This validation
step follows the guideline given by Simon (2005), where both the selection of genes
and the estimation of dynamic prediction formulas should be performed without the
single left-out sample. Their cross-validation has demonstrated the benefit of using
the dynamic prediction formula. However, before the dynamic prediction formulas
are applied in clinical practice, an independent validation set may be employed to
further assess the prediction ability.

References

Binder H, Schumacher M (2008) Allowing for mandatory covariates in boosting estimation of
sparse high-dimensional survival models. BMC Bioinform 9(1):14
Bøvelstad HM, Nygård S, Borgan Ø (2009) Survival prediction from clinico-genomic models—a
comparative study. BMC Bioinform 10(1):1
Bøvelstad HM, Nygård S, Storvold HL, Aldrin M, Borgan Ø et al (2007) Predicting survival from
microarray data—a comparative study. Bioinformatics 23:2080–2087
Burzykowski T, Molenberghs G, Buyse M, Geys H, Renard D (2001) Validation of surrogate end
points in multiple randomized clinical trials with failure time end points. Appl Stat 50(4):405–422
Clayton DG (1978) A model for association in bivariate life tables and its application in epidemio-
logical studies of familial tendency in chronic disease incidence. Biometrika 65(1):141–151
Day R, Bryant J, Lefkopoulou M (1997) Adaptation of bivariate frailty models for prediction, with
application to biological markers as prognostic indicators. Biometrika 84(1):45–56
Emura T (2019) joint.Cox: joint frailty-copula models for tumour progression and death in
meta-analysis. CRAN
Emura T, Matsui S, Chen HY (2019) compound.Cox: univariate feature selection and compound
covariate for predicting survival. Comput Methods Programs Biomed 168:21–37
Emura T, Nakatochi M, Murotani K, Rondeau V (2017) A joint frailty-copula model between
tumour progression and death for meta-analysis. Stat Methods Med Res 26(6):2649–2666
Emura T, Nakatochi M, Matsui S, Michimae H, Rondeau V (2018) Personalized dynamic prediction
of death according to tumour progression and high-dimensional genetic factors: meta-analysis
with a joint model. Stat Methods Med Res 27(9):2842–2858
Enshaei A, Robson CN, Edmondson RJ (2015) Artificial intelligence systems as prognostic and
predictive tools in ovarian cancer. Ann Surg Oncol 22(12):3970–3975
Ganzfried BF, Riester M, Haibe-Kains B, Risch T, Tyekucheva S et al (2013) Curated ovarian
data: clinically annotated data for the ovarian cancer transcriptome. Database, Article ID bat013.
https://doi.org/10.1093/database/bat013
Guinney J, Wang T, Laajala TD, Winner KK et al (2017) Prediction of overall survival for patients
with metastatic castration-resistant prostate cancer: development of a prognostic model through
a crowdsourced challenge with open clinical trial data. Lancet Oncol 18(1):132–142
Gómez I, Ribelles N, Franco L, Alba E, Jerez JM (2016) Supervised discretization can discover
risk groups in cancer survival analysis. Comput Methods Programs Biomed 136:11–19
Hayes DF, Markus HS, Leslie RD, Topol EJ (2014) Personalized medicine: risk prediction, targeted
therapies and mobile health technology. BMC Med 12(1):37
Król A, Ferrer L, Pignon JP, Proust-Lima C, Ducreux M et al (2016) Joint model for left-censored
longitudinal data, recurrent events and terminal event: Predictive abilities of tumor burden for
cancer evolution with application to the FFCD 2000–05 trial. Biometrics 72(3):907–916
Matsui S (2006) Predicting survival outcomes using subsets of significant genes in prognostic
marker studies with microarrays. BMC Bioinform 7:156
Matsui S, Simon RM, Qu P, Shaughnessy JD, Barlogie B, Crowley J (2012) Developing and vali-
dating continuous genomic signatures in randomized clinical trials for predictive medicine. Clin
Cancer Res 18(21):6065–6073
Mauguen A, Rachet B, Mathoulin-Pélissier S, Lawrence GM, Siesling S et al (2015) Validation
of death prediction after breast cancer relapses using joint models. BMC Med Res Methodol
15(1):27
Mauguen A, Rachet B, Mathoulin-Pélissier S, MacGrogan G, Laurent A, Rondeau V (2013)
Dynamic prediction of risk of death using history of cancer recurrences in joint frailty mod-
els. Stat Med 32(30):5366–5380
Michiels S, Koscielny S, Hill C (2005) Prediction of cancer outcome with microarrays: a multiple
random validation strategy. The Lancet 365(9458):488–492
Oakes D (1989) Bivariate survival models induced by frailties. J Am Stat Assoc 84:487–493
Proust-Lima C, Séne M, Taylor JM, Jacqmin-Gadda H (2014) Joint latent class models for longi-
tudinal and time-to-event data: a review. Stat Methods Med Res 23(1):74–90
Rizopoulos D (2011) Dynamic predictions and prospective accuracy in joint models for longitudinal
and time-to-event data. Biometrics 67(3):819–829
Rondeau V, Mauguen A, Laurent A, Berr C, Helmer C (2017) Dynamic prediction models for
clustered and interval-censored outcomes: investigating the intra-couple correlation in the risk of
dementia. Stat Methods Med Res 26(5):2168–2183
Rondeau V, Pignon JP, Michiels S (2015) A joint model for dependence between clustered times to
tumour progression and deaths: a meta-analysis of chemotherapy in head and neck cancer. Stat
Methods Med Res 24(6):711–729
Schleidgen S, Klingler C, Bertram T, Rogowski WH, Marckmann G (2013) What is personalized
medicine: sharpening a vague term based on a systematic literature review. BMC Medical Ethics
14(1):55
Sène M, Taylor JM, Dignam JJ, Jacqmin-Gadda H, Proust-Lima C (2016) Individualized dynamic
prediction of prostate cancer recurrence with and without the initiation of a second treatment:
development and validation. Stat Methods Med Res 25(6):2972–2991
Shukla N, Hagenbuchner M, Win KT, Yang J (2018) Breast cancer data analysis for survivability
studies and prediction. Comput Method Program Biomed 155:199–208
Simon R (2005) Roadmap for developing and validating therapeutically relevant genomic classifiers.
J Clin Oncol 23(29):7332–7341
Sun D, Li A, Tang B, Wang M (2018) Integrating genomic data and pathological images to effectively
predict breast cancer clinical outcome. Comput Method Program Biomed 161:45–53
Tothill RW, Tinker AV, George J, Brown R et al (2008) Novel molecular subtypes of serous and
endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res 14(16):5198–5208
Yoshihara K, Tajima A, Yahata T, Kodama S, Fujiwara H et al (2010) Gene expression profile
for predicting survival in advanced-stage serous ovarian cancer across two independent datasets.
PLoS ONE 5(3):e9615
Yoshihara K, Tsunoda T, Shigemizu D, Fujiwara H, Hatae M et al (2012) High-risk ovarian cancer
based on 126-gene expression signature is uniquely characterized by downregulation of antigen
presentation pathway. Clin Cancer Res 18(5):1374–1385
van Houwelingen HC, Putter H (2011) Dynamic prediction in clinical survival analysis. CRC Press,
New York
van Wieringen WN, Kun D, Hampel R, Boulesteix AL (2009) Survival prediction using gene
expression data: a review and comparison. Comput Stat Data Anal 53(5):1590–1603
Waldron L, Haibe-Kains B, Culhane AC, Riester M, Ding J et al (2014) Comparative meta-analysis
of prognostic gene signatures for late-stage ovarian cancer. J Natl Cancer Inst 106(5):dju049
Witten DM, Tibshirani R (2010) Survival analysis with high-dimensional covariates. Stat Methods
Med Res 19:29–51
Chapter 6
Future Developments

Abstract This chapter collects additional remarks on the previous chapters and
several open problems for future research. This might help find research topics for
students and researchers.

Keywords Compound covariate · Dependent truncation · Interaction · Kendall’s
τ · Left truncation · Meta-analysis · Recurrent event · Surrogate endpoint

6.1 Recurrent Events Data

The joint frailty-copula model of Chap. 3 can be applied to analyze recurrent
events data if event times are measured in the gap timescale (Emura et al. 2017; Li
et al. 2019). Under the recurrent event setting, the interpretation of the joint frailty-
copula model is substantially different from the meta-analytic setting of Chap. 3.
First, the frailty term represents the effect of unmeasured covariates at patient-level.
Thus, this frailty introduces patient-level dependence among recurrences, as well as
patient-level dependence between recurrences and death. Second, copulas describe
the residual dependence due to unmeasured recurrence-specific covariates. That is,
even after covariates and a frailty term are given, a pair of gap time and death time is
still dependent. This dependence would be weakened if one could obtain a sufficient
amount of recurrence-specific covariates in each recurrence step j (Li et al. 2019).
For instance, Emura et al. (2017) analyzed G = 403 patients with colorectal
cancer who had operations in a hospital in Spain. The data were originally studied
by González et al. (2005) and were made available in the R frailtypack package
(Rondeau and Gonzalez 2005). The patients are followed up from the date of
surgery to either the study end or the time of death whichever comes first. During the
follow-up, patients may have several readmissions (recurrences) related to colorectal
cancer. The number of recurrences varies from 0 to 22. The results of fitting the
joint frailty-copula model under the Clayton copula show that there exists weak
residual dependence between readmission and death (Kendall’s τ = 0.22, 95% CI:
0.14−0.31). The reason for residual dependence may be the use of the same set of
covariates for all the recurrence steps. This residual dependence could be removed,

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2019 95
T. Emura et al., Survival Analysis with Correlated Endpoints,
JSS Research Series in Statistics, https://doi.org/10.1007/978-981-13-3516-7_6
96 6 Future Developments

for instance, by incorporating time-dependent covariates, which are updated at the


last discharge date. In the absence of such covariates, the joint frailty-copula model
can capture the residual dependence.
As pointed out by Li et al. (2019), the method of Emura et al. (2017) imposes some
memoryless or Markov assumption in order to transform the likelihood function from
the meta-analytic data to the recurrent event data. This makes it difficult to justify the
validity of the likelihood function used in Emura et al. (2017) for recurrent event data.
The likelihood derived by Li et al. (2019) may resolve these issues. Besides the proposal
of Li et al. (2019), there seem to be a few alternative ways to model dependence among
recurrent gap times and times to death. For instance, one can consider a copula-based
Markov chain for serial dependence between recurrent gap times and another copula
for dependence between the first recurrent gap time and time to death.

6.2 Kendall’s τ in Meta-Analysis

Kendall’s τ is a widely used measure of dependence between two endpoints. For
instance, medical researchers have used Kendall’s τ between time-to-tumour
progression (TTP) and overall survival (OS) to assess the quality of TTP as a surrogate
endpoint for OS at the individual level (Burzykowski et al. 2001; Rotolo et al. 2018).
See also the real data examples in Chaps. 3–5 that use Kendall’s τ between TTP and OS.
Let us reconsider the definition of Kendall’s τ in a meta-analysis that combines
several different studies. In Chaps. 3–5 and in Burzykowski et al. (2001), Kendall’s
τ should actually be regarded as an “individual-level Kendall’s τ” as it is defined given
a frailty value.
In the following, two versions of Kendall’s τ shall be defined with and without a
given frailty value. Let X be TTP, D be OS, and U be frailty. Suppose that (X, D, U)
has the joint density f(x, y, u) = f(x, y|u) f(u).

The individual-level Kendall τ is defined as

\[
\tau^{\mathrm{Ind}}(u) = \Pr\{(X_2 - X_1)(D_2 - D_1) > 0 \mid u\} - \Pr\{(X_2 - X_1)(D_2 - D_1) < 0 \mid u\},
\]

where (X_i, D_i) ∼ f(x, y|u), i = 1 and 2, are independent pairs.

The population-level Kendall τ is defined as

\[
\tau^{\mathrm{Pop}} = \Pr\{(X_2 - X_1)(D_2 - D_1) > 0\} - \Pr\{(X_2 - X_1)(D_2 - D_1) < 0\},
\]

where (X_i, D_i, U_i) ∼ f(x, y, u), i = 1 and 2, are independent triples.


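The population-level Kendall τ involves a double integration on two frailty variables
such that

\[
\begin{aligned}
\tau^{\mathrm{Pop}}
&= E_{U_1, U_2}\left[\Pr\{(X_2 - X_1)(D_2 - D_1) > 0 \mid U_1, U_2\} - \Pr\{(X_2 - X_1)(D_2 - D_1) < 0 \mid U_1, U_2\}\right] \\
&= 2E_{U_1, U_2}\left[\Pr\{(X_2 - X_1)(D_2 - D_1) > 0 \mid U_1, U_2\}\right] - 1 \\
&= 2E_{U_1, U_2}\left[\Pr\{X_2 < X_1, D_2 < D_1 \mid U_1, U_2\} + \Pr\{X_2 > X_1, D_2 > D_1 \mid U_1, U_2\}\right] - 1 \\
&= 2E_{U_1, U_2}\left[\iint S(x, y \mid U_1) f(x, y \mid U_2)\,dx\,dy + \iint S(x, y \mid U_2) f(x, y \mid U_1)\,dx\,dy\right] - 1,
\end{aligned}
\]

where S(x, y|u) = Pr(X > x, D > y | u) denotes the joint survival function of (X, D) given U = u.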

Rondeau et al. (2015) showed that the above expression depends only on the frailty
distribution under their joint frailty model. Under the joint frailty-copula models, the
expression may also depend on the copula. It is important to note that

\[
\tau^{\mathrm{Pop}} \neq E\left[\tau^{\mathrm{Ind}}(U)\right] = \int \tau^{\mathrm{Ind}}(u) f(u)\,du.
\]

Thus, even if the individual-level Kendall τ is zero for every u, the population-level
Kendall τ may be nonzero. Usually, the population-level Kendall τ has a higher
magnitude than the individual-level Kendall τ. However, theoretical analyses on the
population-level Kendall τ and its relationship with the individual-level Kendall τ
are less developed in the literature. A good starting point is to explore the expressions
of the population-level Kendall τ under the joint frailty-copula model.
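The gap between the two versions of Kendall’s τ can be illustrated by a small simulation. The sketch below rests on simplifying assumptions that are not part of the joint frailty-copula model: a gamma frailty with mean 1 and variance η = 1, unit-rate exponential baselines, and conditional independence of TTP and OS given the frailty, so that the individual-level Kendall τ is zero for every u. Under these assumptions the marginal dependence is that of a Clayton copula with parameter η, whose Kendall τ is η/(η + 2).

```r
set.seed(1)
n   <- 2000
eta <- 1                      ## frailty variance (assumed value)

## Population level: each pair (X_i, D_i) comes with its own frailty U_i
U <- rgamma(n, shape = 1 / eta, rate = 1 / eta)   ## mean 1, variance eta
X <- rexp(n, rate = U)        ## TTP, independent of D given U
D <- rexp(n, rate = U)        ## OS
tau_pop <- cor(X, D, method = "kendall")

## Individual level: all pairs are generated given one fixed frailty value u
u  <- 1
X0 <- rexp(n, rate = u)
D0 <- rexp(n, rate = u)
tau_ind <- cor(X0, D0, method = "kendall")

round(c(tau_ind = tau_ind, tau_pop = tau_pop, theory = eta / (eta + 2)), 3)
```

The sample version of the individual-level Kendall τ should be close to zero, while the population-level version should be close to 1/3, which shows that frailty alone can induce a nonzero population-level Kendall τ.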

6.3 Validation of Surrogate Endpoints

Although the validation criteria for surrogate endpoints are still a subject of intense
research, the current consensus is to base the validation on a “correlation” approach
using a two-step method (Burzykowski et al. 2005; Rotolo et al. 2018). In the first
step, the quality of the surrogate at the individual level is assessed by a measure of
association between the surrogate and the true endpoint under a copula model. In the
second step, a surrogate is termed “valid” at the trial level if the effect of treatment
on the true endpoint can be predicted from the observed effect of treatment on the
surrogate endpoint. In order to make a formal validation process for survival
endpoints, Burzykowski et al. (2001) proposed to use random effects models, where the
quality of the surrogate at the trial level was assessed with a coefficient of
determination (R²). They proposed an adjusted trial-level surrogacy measure, which
takes the estimation error of the treatment effects in the first step into account in
the second step. However, in each of their case studies, some convergence issues arise
unless common baseline hazards across trials are assumed. Numerous articles have been
published on the validation of surrogates, but the proposed methods are not yet
satisfactory, particularly in terms of identifiability or optimization
(Burzykowski et al. 2005; Li et al. 2011). Renfro et al. (2012) reported that
convergence problems were frequently encountered in the first step (at the individual
level). Moreover, even when the first step provides estimates, the second step (at the
trial level) does not always provide an estimate of the adjusted coefficient of
determination (R²). These numerical problems are influenced by the number and
size of the trials as well as by the assumptions made on the baseline hazards among trials.
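The second step can be sketched in R under strong simplifying assumptions. The trial-level treatment effects beta_S and beta_T below are simulated hypothetical values rather than estimates from a first-step copula model, and the R² is unadjusted: it ignores the estimation error that the adjusted measure is designed to account for.

```r
set.seed(1)
n_trials <- 20

## Hypothetical estimated log hazard ratios, one pair per trial
beta_S <- rnorm(n_trials, mean = -0.3, sd = 0.2)               ## effect on the surrogate
beta_T <- 0.8 * beta_S + rnorm(n_trials, mean = 0, sd = 0.05)  ## effect on the true endpoint

## Second step: regress the effect on the true endpoint on the effect on the surrogate
fit <- lm(beta_T ~ beta_S)
R2_trial <- summary(fit)$r.squared   ## unadjusted trial-level R^2
R2_trial
```

A value of R² close to one would suggest that the treatment effect on the true endpoint is well predicted by the effect on the surrogate; with few or small trials, the unadjusted value can be misleading.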
The FDA (the US Food and Drug Administration) adopts surrogate endpoints if
they predict clinical outcome for the true endpoint. In many cancer studies, disease-
free survival or progression-free survival is the surrogate endpoint for the true end-
point, namely OS. Rupp and Zuckerman (2017) reported that 18 anticancer drugs
approved by the FDA on the basis of surrogate endpoints actually did not improve
OS for patients. The reason behind this erroneous decision is difficult to identify.
However, we should explore whether the problem comes from statistical tools.
In this context, it seems necessary to improve the existing methods to evaluate
surrogate endpoints. It would be interesting to explore whether surrogate endpoints
can be better validated by new statistical methods such as that of Rotolo et al. (2017). Also,
the joint frailty-copula model (Chap. 3) is a tailored model to analyze the individual-
level dependence via copulas in meta-analytic settings. However, developing a formal
validation process of surrogacy requires further extensions of the joint frailty-copula
model to incorporate the trial-level dependence. We are currently working on this
topic.

6.4 Left Truncation

Left truncation often occurs if the timescale of endpoints is measured in terms of
age. If the endpoint of interest is age at death, left truncation time corresponds to age
at entry (entry age is not treated as a covariate). In other words, the endpoint is time
to death measured from birth.
All the examples discussed in this book concerned endpoints
measured from the study entry time, so the problem of left truncation did not occur.
Left truncation is particularly relevant for survival data arising from epidemiological
and observational studies, where researchers cannot specify the valid study entry
time. From a methodological point of view, there is a growing interest in the issue of
left truncation arising from clustered survival data (Rondeau et al. 2017; Rodríguez-
Girondo et al. 2018).
Left truncation yields biased sampling since patients are available only
when the age at event exceeds the age at entry. Patients who have experienced the
event before entry are not sampled. In the survival analysis of a single endpoint,
the bias due to left truncation can be adjusted by multiplying the likelihood function
by the inverse of the sampling probability (Klein and Moeschberger 2003). In the analysis
of two endpoints or clustered survival times, however, this adjustment is not always
trivial. So far, there are three different approaches, called the “naive”, “updated”, and
“weighted” approaches (Rodríguez-Girondo et al. 2018). A method for handling
left truncation has already been suggested for the joint frailty-copula model (Rondeau
et al. 2015; Emura et al. 2017), but without numerical studies. More thorough analyses,
like those of Rodríguez-Girondo et al. (2018), would be needed.
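For a single endpoint, the adjustment can be sketched as follows. The example assumes an exponential model for age at event, a uniform age at entry that is independent of age at event, and no censoring; all numerical values are illustrative. Each likelihood contribution f(x) is divided by the sampling probability S(a), where a is the age at entry.

```r
set.seed(1)
lambda <- 0.05                              ## true hazard rate (assumed value)
N <- 20000
age_event <- rexp(N, rate = lambda)         ## age at event
age_entry <- runif(N, min = 0, max = 40)    ## age at entry (left truncation time)

## Biased sampling: only subjects with age at event > age at entry are observed
keep <- age_event > age_entry
x <- age_event[keep]
a <- age_entry[keep]

## Negative log-likelihoods under the exponential model
negloglik_naive <- function(rate) -sum(dexp(x, rate, log = TRUE))
negloglik_trunc <- function(rate) {
  ## each contribution is f(x) / S(a)
  -sum(dexp(x, rate, log = TRUE) -
       pexp(a, rate, lower.tail = FALSE, log.p = TRUE))
}

est_naive <- optimize(negloglik_naive, interval = c(1e-4, 1))$minimum
est_trunc <- optimize(negloglik_trunc, interval = c(1e-4, 1))$minimum
round(c(true = lambda, naive = est_naive, adjusted = est_trunc), 4)
```

The naive estimate ignores the fact that early events are never sampled and therefore underestimates the hazard, whereas the adjusted estimate should be close to the true value.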
An interesting but challenging issue is to account for dependent left truncation.
Traditional analyses for left-truncated survival data rely on independent truncation
assumption (p. 126 of Klein and Moeschberger 2003). For instance, in the survival
analysis of elderly residents, the age at entry to a retirement center is assumed to
be independent of age at death (Hyde 1980). Several different tests for checking
the assumption of independent truncation were developed (Emura and Wang 2010;
Chiou et al. 2018). The effect of dependent truncation in competing risks analysis
was studied by Bakoyannis and Touloumi (2017). To fit survival data with dependent
left truncation, a copula model between event time and left truncation time has been
considered (Chaieb et al. 2006; Emura and Wang 2012; Emura and Murotani 2015;
Emura and Pan 2017). However, these methods cannot be directly applied to the
case where two event times are subject to dependent truncation. In this case, one
may consider two copulas, one for modeling dependent truncation and the other for
modeling dependence between two event times.

6.5 Interactions

The models discussed in this book do not consider interactions between covariates.
However, there are a few different cases, where interactions are of interest.

6.5.1 (Gene × Gene) Interaction

The (gene × gene) interaction may exist between those genes working in the same
pathway. A two-stage analysis may be a simple way to discover such interactions,
where the first stage considers main effects and the second stage considers their
interactions. In the first stage, a univariate feature selection method is performed to
select genes associated with survival (Chap. 4). In the second stage, for all pairs of
selected genes, one can perform an additional feature selection method as an attempt
to discover interactions between the genes. A variety of different methods could be
considered in the second stage.
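One possible form of the two-stage analysis is sketched below with the survival package. The data are simulated, the gene names g1, ..., g10 and the effect sizes are hypothetical, and the second stage uses a Wald test of the interaction term in a Cox model, which is only one of many possible choices.

```r
library(survival)   ## recommended package distributed with R

set.seed(1)
n <- 400; p <- 10
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("g", 1:p)))
## Hypothetical truth: main effects of g1, g2, g3 and a g1 x g2 interaction
lp <- 0.7 * X[, "g1"] + 0.7 * X[, "g2"] + 0.7 * X[, "g3"] +
      0.5 * X[, "g1"] * X[, "g2"]
time   <- rexp(n, rate = exp(lp))
status <- rep(1, n)                         ## no censoring, for simplicity

## Stage 1: univariate Cox regression for each gene (Wald P-value)
p_main <- apply(X, 2, function(x) {
  summary(coxph(Surv(time, status) ~ x))$coefficients[1, "Pr(>|z|)"]
})
selected <- names(p_main)[p_main < 0.001]

## Stage 2: test the interaction for every pair of selected genes
pairs <- t(combn(selected, 2))
p_int <- apply(pairs, 1, function(g) {
  a <- X[, g[1]]; b <- X[, g[2]]
  fit <- coxph(Surv(time, status) ~ a + b + a:b)
  summary(fit)$coefficients["a:b", "Pr(>|z|)"]
})
data.frame(gene1 = pairs[, 1], gene2 = pairs[, 2], P = signif(p_int, 2))
```

The number of candidate pairs grows quadratically with the number of selected genes, so some control of multiplicity would be needed in the second stage.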
However, in our experience of adding interaction terms into a compound covariate
predictor, there is only a modest improvement in prediction power. This might be
because the main effects carry the majority of predictive information of survival, or
the compound covariate implicitly incorporates some amount of interaction. There is
an opportunity to apply a more sophisticated method that systematically incorporates
the pathway or network information, such as those proposed by Kim et al. (2018),
Wang and Chen (2018) and Choi et al. (2018). Exploration of the (gene × gene)
interactions may not only improve prediction performance, but also lead to interesting
insight about the biological mechanisms on the genes such as pathway structures.

6.5.2 (Gene × Time) Interaction

Time-varying effects of genetic covariates are another interesting issue to be
investigated. If clinical follow-up of patients is long, the prognostic effect of genes may
vary over time.
One may introduce time-varying effects of genes in the model by adding (gene ×
time) interaction terms. Specifically, the interaction term can be defined as CC ×
f (t), where CC is a compound covariate (a linear combination of gene expressions)
and f (t) is a time function flexibly chosen by users. For instance, one can use
f(t) = log(t + 1) (van Houwelingen and Putter 2011). In this way, the compound
covariate accounts for “common” time-varying effects of genes. This approach would
be effective if the majority of genes in the compound covariate share a similar time-
varying effect on survival.
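For a single endpoint, the interaction term CC × f(t) with f(t) = log(t + 1) can be sketched with the time-transform facility of the survival package. This is an ordinary Cox model, not the joint frailty-copula model, and the data are simulated with a constant effect of a hypothetical standardized compound covariate, so the interaction coefficient is expected to be close to zero.

```r
library(survival)   ## recommended package distributed with R

set.seed(1)
n  <- 300
CC <- rnorm(n)                              ## hypothetical standardized compound covariate
T_event <- rexp(n, rate = 0.01 * exp(0.5 * CC))   ## constant effect of CC
T_cens  <- rexp(n, rate = 0.005)            ## independent censoring time
time    <- pmin(T_event, T_cens)
status  <- as.numeric(T_event <= T_cens)

## Cox model with the main effect of CC and the interaction CC x log(t + 1)
fit <- coxph(Surv(time, status) ~ CC + tt(CC),
             tt = function(x, t, ...) x * log(t + 1))
coef(fit)   ## first: main effect; second: coefficient of CC x log(t + 1)
```

The log hazard ratio for a one-unit increase of CC at time t is the first coefficient plus the second coefficient multiplied by log(t + 1).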
If the time-varying effects of individual genes are heterogeneous, one may cate-
gorize genes into subgroups. For instance, one can consider two subgroups, where
genes of short-term effects and genes of long-term effects are separated. In this way,
time-varying effects can be more homogeneous within the group. However, this
strategy requires a way of grouping genes in order to reduce the heterogeneity of
time-varying effects within a group.
Although one can straightforwardly define the joint frailty-copula model with
time-varying effects, one cannot exploit the computational advantage of the spline
models for the baseline hazard functions (i.e., the spline-based hazard functions have
explicit integral formulas). As a result, likelihood-based inference becomes compu-
tationally demanding since the likelihood may involve some numerical integrations.
One possible alternative is to impose piecewise exponential models for the baseline
hazard functions. For recent work on piecewise exponential models with copulas,
see Emura and Michimae (2017).

6.6 Parametric Failure Time Models

Throughout this book, we focus on the spline-based model to fit the distributions
of two endpoints. In this section, we shall briefly discuss the possibility of applying
parametric models for analyzing correlated endpoints. Parametric models are usually
simpler to use, interpret, and fit than semi-parametric models. However, as argued
in Chap. 2, semi-parametric models are more frequently used in cancer research.
One reason is that the hazard function for cancer patients rarely shows any simple
pattern (e.g., monotonically increasing or decreasing), due to the complex processes
of treatment regimens and their effects on patients. On the other hand, the hazard
function of machines or items may exhibit simpler patterns as long as they are used
in normal conditions. That is why parametric models are widely used in reliability
analysis of manufactured items.
The Weibull distribution is one of the most frequently used parametric models
in copula-based analysis for two endpoints. Escarela and Carrière (2003) proposed
to fit a copula-based parametric model to competing risks data, where they used
a copula for dependence between competing event times that follow the Weibull
model. Burzykowski et al. (2001, 2005) proposed a bivariate Weibull model for
jointly performing Cox regression for two endpoints with meta-analytic data; their
method is implemented in an R package (Rotolo et al. 2018).
It is possible to apply the Weibull model to the joint frailty-copula model
(Chap. 3; Emura et al. 2017) and to the dynamic prediction formulas (Chap. 5;
Emura et al. 2018). One advantage of the Weibull model is that one can compute
the mean survival time, the correlation coefficient between endpoints, the mean residual
lifetime, and other quantities. Notice that all these moment-based quantities are
difficult to compute under the spline-based model, which leads to improper distributions
for the endpoints. To simplify the computation of moments, it is interesting to apply
a conjugate distribution (gamma distribution) for the Weibull distribution (Molen-
berghs et al. 2015). We also notice that the accuracy of the feature selection and
compound covariate methods of Chap. 4 may be improved by employing parametric
models.
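As an illustration of these moment-based quantities for a single endpoint, the following sketch computes the mean survival time and the mean residual lifetime under a Weibull model. The shape and scale values are assumed for illustration, with time measured in years.

```r
shape <- 1.5      ## Weibull shape (assumed value)
scale <- 3        ## Weibull scale in years (assumed value)

## Survival function S(t) = exp{-(t / scale)^shape}
S <- function(t) pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)

## Mean survival time: closed form and numerical check
mean_closed  <- scale * gamma(1 + 1 / shape)
mean_numeric <- integrate(S, lower = 0, upper = Inf)$value

## Mean residual lifetime at time t: E[T - t | T > t]
mrl <- function(t) integrate(S, lower = t, upper = Inf)$value / S(t)

round(c(mean_closed = mean_closed, mean_numeric = mean_numeric,
        mrl_1yr = mrl(1), mrl_3yr = mrl(3)), 3)
```

Because the shape parameter exceeds one, the hazard is increasing and the mean residual lifetime decreases with t. These integrals converge because the Weibull distribution is proper.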

6.7 Compound Covariate

Tukey’s compound covariate method, as detailed in Chap. 4, is a simple method
to predict survival based on high-dimensional covariates. The compound covariate
predictor is an ensemble of univariate models of individual covariates, and hence, it
is simpler than most of the other prediction methods that use the penalized multivari-
ate Cox model, such as the ridge and Lasso methods. Nevertheless, the compound
covariate predictor may exhibit a competitive performance with these multivariate
methods (Emura et al. 2012, 2018, 2019; Zhao et al. 2014; Chap. 4). While there are
a number of real data analyses and simulation analyses on the compound covariate
method, the theoretical studies are very scarce in the literature.
The unique property of Tukey’s compound covariate is that it ignores the cor-
relations between genes. Suppose that two genes are strongly correlated and both
of them are univariately associated with survival (P-value < 0.001). Hence, the two
genes are included in a compound covariate predictor. If the two genes were refitted
into a multivariate Cox model, a usual strategy is to remove one of them to avoid
the multicollinearity issue. The same logic may apply to a large number of genes
univariately associated with survival. The compound covariate predictor includes all
of them by ignoring their correlations (i.e., without going through a multivariate Cox
model).
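The contrast between the two strategies can be sketched with two simulated genes. The gene names, the correlation structure, and the effect sizes are hypothetical; the univariate Cox fits are obtained from the survival package rather than from the functions used in Chap. 4.

```r
library(survival)   ## recommended package distributed with R

set.seed(1)
n  <- 300
g1 <- rnorm(n)
g2 <- g1 + rnorm(n, sd = 0.2)               ## strongly correlated with g1
time   <- rexp(n, rate = exp(0.5 * g1 + 0.5 * g2))
status <- rep(1, n)                         ## no censoring, for simplicity

## Univariate Cox fits: one coefficient per gene, correlations ignored
fit1 <- coxph(Surv(time, status) ~ g1)
fit2 <- coxph(Surv(time, status) ~ g2)
CC   <- coef(fit1) * g1 + coef(fit2) * g2   ## compound covariate

## Multivariate Cox fit: standard errors are inflated by collinearity
fit_multi <- coxph(Surv(time, status) ~ g1 + g2)
se_uni    <- sqrt(diag(vcov(fit1)))
se_multi  <- sqrt(diag(vcov(fit_multi)))
c(cor = cor(g1, g2), se_uni_g1 = unname(se_uni),
  se_multi_g1 = unname(se_multi[1]))
```

The compound covariate keeps both genes with stable univariate coefficients, whereas the coefficients of the multivariate fit have much larger standard errors.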
In linear discriminant analysis, it has been well recognized that a predictor
that ignores the correlations between individual covariates often performs better
than a predictor that tries to account for dependence. In particular, Dudoit et al.
(2002) reported a remarkable gain in predictive/classification accuracy by ignoring
correlations among high-dimensional gene expressions. Some theoretical accounts
for this phenomenon are available (Bickel and Levina 2004). However, the theory
behind compound covariate in survival models has not been explored in the literature.
Another unique property of compound covariate is its additive property. Com-
pound covariate aggregates univariate predictors to construct a multigene predic-
tor. This resembles the idea of naïve Bayes (Bickel and Levina 2004), jackknife
model-averaging (Hansen and Racine 2012) or boosting (Hastie et al. 2009). Consequently,
removing one univariate predictor from the multigene predictor does not change the
whole model much. This property explains the robustness of the compound
covariate predictor against the cutoff value for feature selection as discussed in
Chap. 4. It is worth exploring the robustness and accuracy of the compound covariate
method through the aforementioned machine learning approaches.

References

Bakoyannis G, Touloumi G (2017) Impact of dependent left truncation in semiparametric competing
risks methods: a simulation study. Commun Stat Simul Comput 46(3):2025–2042
Bickel PJ, Levina E (2004) Some theory for Fisher’s linear discriminant function, naive Bayes,
and some alternatives when there are many more variables than observations. Bernoulli
10(6):989–1010
Burzykowski T, Molenberghs G, Buyse M (eds) (2005) The evaluation of surrogate endpoints.
Springer, New York
Burzykowski T, Molenberghs G, Buyse M, Geys H, Renard D (2001) Validation of surrogate end
points in multiple randomized clinical trials with failure time end points. Appl Stat 50(4):405–422
Chaieb LL, Rivest LP, Abdous B (2006) Estimating survival under a dependent truncation.
Biometrika 93(3):655–669
Chiou SH, Qian J, Mormino E et al (2018) Permutation tests for general dependent truncation.
Comput Stat Data Anal 128:308–324
Choi J, Oh I, Seo S, Ahn J (2018) G2Vec: Distributed gene representations for identification of
cancer prognostic genes. Sci Rep 8(1):13729
Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification
of tumors using gene expression data. J Am Stat Assoc 97(457):77–87
Emura T, Chen YH, Chen HY (2012) Survival prediction based on compound covariate under
Cox proportional hazard models. PLoS ONE 7(10):e47627. https://doi.org/10.1371/journal.pone.0047627
Emura T, Matsui S, Chen HY (2019) compound.Cox: univariate feature selection and compound
covariate for predicting survival. Comput Methods Programs Biomed 168:21–37
Emura T, Michimae H (2017) A copula-based inference to piecewise exponential models under
dependent censoring, with application to time to metamorphosis of salamander larvae. Environ
Ecol Stat 24(1):151–173
Emura T, Murotani K (2015) An algorithm for estimating survival under a copula-based dependent
truncation model. TEST 24(4):734–751
Emura T, Nakatochi M, Murotani K, Rondeau V (2017) A joint frailty-copula model between
tumour progression and death for meta-analysis. Stat Methods Med Res 26(6):2649–2666
Emura T, Nakatochi M, Matsui S, Michimae H, Rondeau V (2018) Personalized dynamic prediction
of death according to tumour progression and high-dimensional genetic factors: meta-analysis
with a joint model. Stat Methods Med Res 27(9):2842–2858
Emura T, Pan CH (2017) Parametric likelihood inference and goodness-of-fit for dependently
left-truncated data, a copula-based approach. Stat Pap. https://doi.org/10.1007/s00362-017-0947-z
Emura T, Wang W (2010) Testing quasi-independence for truncation data. J Multivar Anal
101:223–239
Emura T, Wang W (2012) Nonparametric maximum likelihood estimation for dependent truncation
data based on copulas. J Multivar Anal 110:171–188
Escarela G, Carrière JF (2003) Fitting competing risks with an assumed copula. Stat Methods
Med Res 12(4):333–349
González JR, Fernandez E, Moreno V, Ribes J et al (2005) Sex differences in hospital readmission
among colorectal cancer patients. J Epidemiol Community Health 59(6):506–511
Hansen BE, Racine JS (2012) Jackknife model averaging. J Econometrics 167(1):38–46
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer, New York
Hyde J (1980) Survival analysis with incomplete observations. In: Miller RG, Efron B, Brown BW,
Moses LE (eds) Biostatistics casebook. Wiley, New York, pp 31–46
Kim M, Oh I, Ahn J (2018) An improved method for prediction of cancer prognosis by network
learning. Genes 9:478
Klein JP, Moeschberger ML (2003) Survival analysis techniques for censored and truncated data.
Springer, New York
Li Y, Taylor JM, Elliott MR, Sargent DJ (2011) Causal assessment of surrogacy in a meta-analysis
of colorectal cancer trials. Biostatistics 12(3):478–492
Li Z, Chinchilli VM, Wang M (2019) A Bayesian joint model of recurrent events and a terminal
event. Biometrical Journal 61(1):187–202
Molenberghs G, Verbeke G, Efendi A, Braekers R, Demétrio CG (2015) A combined gamma frailty
and normal random-effects model for repeated, over dispersed time-to-event data. Stat Methods
Med Res 24(4):434–452
Renfro LA, Shi Q, Sargent DJ, Carlin BP (2012) Bayesian adjusted R2 for the meta-analytic
evaluation of surrogate time-to-event endpoints in clinical trials. Stat Med 31(8):743–761
Rodríguez-Girondo M, Deelen J, Slagboom EP, Houwing-Duistermaat JJ (2018) Survival analysis
with delayed entry in selected families with application to human longevity. Stat Methods Med
Res 27(3):933–954
Rondeau V, Gonzalez JR (2005) frailtypack: a computer program for the analysis of correlated
failure time data using penalized likelihood estimation. Comput Methods Programs Biomed
80(2):154–164
Rondeau V, Mauguen A, Laurent A, Berr C, Helmer C (2017) Dynamic prediction models for
clustered and interval-censored outcomes: investigating the intra-couple correlation in the risk of
dementia. Stat Methods Med Res 26(5):2168–2183
Rondeau V, Pignon JP, Michiels S (2015) A joint model for dependence between clustered times to
tumour progression and deaths: a meta-analysis of chemotherapy in head and neck cancer. Stat
Methods Med Res 24(6):711–729
Rotolo F, Paoletti X, Burzykowski T, Buyse M, Michiels S (2017) Poisson approach to the validation
of failure time surrogate endpoints in individual patient data meta-analyses. Stat Methods Med
Res. https://doi.org/10.1177/0962280217718582
Rotolo F, Paoletti X, Michiels S (2018) surrosurv: an R package for the evaluation of failure
time surrogate endpoints in individual patient data meta-analyses of randomized clinical trials.
Comput Methods Programs Biomed 155:189–198
Rupp T, Zuckerman D (2017) Quality of life, overall survival, and costs of cancer drugs approved
based on surrogate endpoints. JAMA Internal Medicine 177(2):276–277
van Houwelingen HC, Putter H (2011) Dynamic prediction in clinical survival analysis. CRC Press,
New York
Wang JH, Chen YH (2018) Overlapping group screening for detection of gene-gene interactions:
application to gene expression profiles with survival trait. BMC Bioinformatics 19:335
Zhao SD, Parmigiani G, Huttenhower C, Waldron L (2014) Más-o-menos: a simple sign averaging
method for discrimination in genomic data analysis. Bioinformatics 30(21):3062–3069
Appendix A
Spline Basis Functions

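This appendix defines the spline basis functions used in
$\lambda_0(t) = \sum_{\ell=1}^{5} h_\ell M_\ell(t) = \mathbf{h}'\mathbf{M}(t)$. We also calculate the roughness $\int \ddot{\lambda}_0(t)^2\,dt$.
For a knot sequence $\xi_1 < \xi_2 < \xi_3$ with an equally spaced mesh
$\Delta = \xi_2 - \xi_1 = \xi_3 - \xi_2$, let $z_i(t) = (t - \xi_i)/\Delta$ for i = 1, 2, and 3. Define M-spline
basis functions as

\[
\begin{aligned}
M_1(t) &= -\frac{4 I(\xi_1 \le t < \xi_2)}{\Delta} z_2(t)^3, \qquad
M_5(t) = \frac{4 I(\xi_2 \le t < \xi_3)}{\Delta} z_2(t)^3, \\
M_2(t) &= \frac{I(\xi_1 \le t < \xi_2)}{2\Delta}\{7z_1(t)^3 - 18z_1(t)^2 + 12z_1(t)\} - \frac{I(\xi_2 \le t < \xi_3)}{2\Delta} z_3(t)^3, \\
M_3(t) &= \frac{I(\xi_1 \le t < \xi_2)}{\Delta}\{-2z_1(t)^3 + 3z_1(t)^2\} + \frac{I(\xi_2 \le t < \xi_3)}{\Delta}\{2z_2(t)^3 - 3z_2(t)^2 + 1\}, \\
M_4(t) &= \frac{I(\xi_1 \le t < \xi_2)}{2\Delta} z_1(t)^3 + \frac{I(\xi_2 \le t < \xi_3)}{2\Delta}\{-7z_2(t)^3 + 3z_2(t)^2 + 3z_2(t) + 1\}.
\end{aligned}
\]

Define the I-spline basis function, $I_\ell(t) = \int_{\xi_1}^{t} M_\ell(u)\,du$, which can be written as

\[
\begin{aligned}
I_1(t) &= 1 - z_2(t)^4 I(\xi_1 \le t < \xi_2), \qquad
I_5(t) = z_2(t)^4 I(\xi_2 \le t < \xi_3), \\
I_2(t) &= \left\{\tfrac{7}{8} z_1(t)^4 - 3z_1(t)^3 + 3z_1(t)^2\right\} I(\xi_1 \le t < \xi_2) + \left\{1 - \tfrac{1}{8} z_3(t)^4\right\} I(\xi_2 \le t < \xi_3), \\
I_3(t) &= \left\{-\tfrac{1}{2} z_1(t)^4 + z_1(t)^3\right\} I(\xi_1 \le t < \xi_2) + \left\{\tfrac{1}{2} + \tfrac{1}{2} z_2(t)^4 - z_2(t)^3 + z_2(t)\right\} I(\xi_2 \le t < \xi_3), \\
I_4(t) &= \tfrac{1}{8} z_1(t)^4 I(\xi_1 \le t < \xi_2) + \left\{\tfrac{1}{8} - \tfrac{7}{8} z_2(t)^4 + \tfrac{1}{2} z_2(t)^3 + \tfrac{3}{4} z_2(t)^2 + \tfrac{1}{2} z_2(t)\right\} I(\xi_2 \le t < \xi_3).
\end{aligned}
\]

The second derivatives of the M-spline basis functions are

\[
\begin{aligned}
\ddot{M}_1(t) &= -\frac{24}{\Delta^3} z_2(t) I(\xi_1 \le t < \xi_2), \qquad
\ddot{M}_5(t) = \frac{24}{\Delta^3} z_2(t) I(\xi_2 \le t < \xi_3), \\
\ddot{M}_2(t) &= \left\{\frac{21}{\Delta^3} z_1(t) - \frac{18}{\Delta^3}\right\} I(\xi_1 \le t < \xi_2) - \frac{3}{\Delta^3} z_3(t) I(\xi_2 \le t < \xi_3), \\
\ddot{M}_3(t) &= \left\{-\frac{12}{\Delta^3} z_1(t) + \frac{6}{\Delta^3}\right\} I(\xi_1 \le t < \xi_2) + \left\{\frac{12}{\Delta^3} z_2(t) - \frac{6}{\Delta^3}\right\} I(\xi_2 \le t < \xi_3), \\
\ddot{M}_4(t) &= \frac{3}{\Delta^3} z_1(t) I(\xi_1 \le t < \xi_2) + \left\{-\frac{21}{\Delta^3} z_2(t) + \frac{3}{\Delta^3}\right\} I(\xi_2 \le t < \xi_3).
\end{aligned}
\]

It follows that

\[
\begin{aligned}
&\int \ddot{M}_1(t)^2\,dt = \frac{192}{\Delta^5}, \quad
\int \ddot{M}_2(t)^2\,dt = \frac{96}{\Delta^5}, \quad
\int \ddot{M}_3(t)^2\,dt = \frac{24}{\Delta^5}, \quad
\int \ddot{M}_4(t)^2\,dt = \frac{96}{\Delta^5}, \quad
\int \ddot{M}_5(t)^2\,dt = \frac{192}{\Delta^5}, \\
&\int \ddot{M}_1(t)\ddot{M}_2(t)\,dt = -\frac{132}{\Delta^5}, \quad
\int \ddot{M}_1(t)\ddot{M}_3(t)\,dt = \frac{24}{\Delta^5}, \quad
\int \ddot{M}_1(t)\ddot{M}_4(t)\,dt = \frac{12}{\Delta^5}, \quad
\int \ddot{M}_1(t)\ddot{M}_5(t)\,dt = 0, \\
&\int \ddot{M}_2(t)\ddot{M}_3(t)\,dt = -\frac{24}{\Delta^5}, \quad
\int \ddot{M}_2(t)\ddot{M}_4(t)\,dt = -\frac{12}{\Delta^5}, \quad
\int \ddot{M}_2(t)\ddot{M}_5(t)\,dt = \frac{12}{\Delta^5}, \\
&\int \ddot{M}_3(t)\ddot{M}_4(t)\,dt = -\frac{24}{\Delta^5}, \quad
\int \ddot{M}_3(t)\ddot{M}_5(t)\,dt = \frac{24}{\Delta^5}, \quad
\int \ddot{M}_4(t)\ddot{M}_5(t)\,dt = -\frac{132}{\Delta^5},
\end{aligned}
\]

where the range of integral is $(\xi_1, \xi_3)$. Then, the penalization term is explicitly
computed as

\[
\int \ddot{\lambda}_0(t)^2\,dt
= \sum_{k=1}^{5}\sum_{\ell=1}^{5} h_k h_\ell \int \ddot{M}_k(t)\ddot{M}_\ell(t)\,dt
= \frac{1}{\Delta^5}\,\mathbf{h}'
\begin{bmatrix}
192 & -132 & 24 & 12 & 0 \\
-132 & 96 & -24 & -12 & 12 \\
24 & -24 & 24 & -24 & 24 \\
12 & -12 & -24 & 96 & -132 \\
0 & 12 & 24 & -132 & 192
\end{bmatrix}
\mathbf{h}
= \mathbf{h}'\Omega\mathbf{h}.
\]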

All the expressions mentioned above were derived in the supplementary material
of Emura et al. (2017). The computational programs of the M- and I-spline basis
functions are available in the joint.Cox R package (Emura 2019). These basis
functions were derived from the general definition of M-spline basis functions
given by Ramsay (1988). The derivations of these basis functions are detailed in
Appendix A of Emura and Chen (2018).
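The closed-form roughness matrix can be checked numerically. The following sketch is an independent verification written in base R; it does not call the joint.Cox package, and the knot range is an illustrative assumption. It codes the second derivatives of the five M-spline basis functions as given in this appendix, integrates their pairwise products over each knot interval, and compares the result with the closed-form matrix.

```r
xi1 <- 0; xi3 <- 4               ## illustrative knot range (assumed values)
D   <- (xi3 - xi1) / 2           ## mesh width (Delta)
xi2 <- xi1 + D

## Second derivatives of M_1, ..., M_5 at a scalar t (returns a 5-vector)
M_dd <- function(t) {
  z1 <- (t - xi1) / D; z2 <- (t - xi2) / D; z3 <- (t - xi3) / D
  lo <- as.numeric(t < xi2); up <- 1 - lo   ## indicators of the two intervals
  c(-24 * z2 * lo,
    (21 * z1 - 18) * lo - 3 * z3 * up,
    (-12 * z1 + 6) * lo + (12 * z2 - 6) * up,
    3 * z1 * lo + (-21 * z2 + 3) * up,
    24 * z2 * up) / D^3
}

## Numerical integrals of the pairwise products over (xi1, xi2) and (xi2, xi3)
Omega_num <- matrix(0, 5, 5)
for (k in 1:5) for (l in 1:5) {
  f <- function(t) sapply(t, function(s) M_dd(s)[k] * M_dd(s)[l])
  Omega_num[k, l] <- integrate(f, xi1, xi2)$value + integrate(f, xi2, xi3)$value
}

## Closed-form matrix from this appendix
Omega <- matrix(c( 192, -132,  24,   12,    0,
                  -132,   96, -24,  -12,   12,
                    24,  -24,  24,  -24,   24,
                    12,  -12, -24,   96, -132,
                     0,   12,  24, -132,  192), 5, 5, byrow = TRUE) / D^5

max(abs(Omega_num - Omega))      ## should be numerically negligible
```

The integrals are split at the middle knot because the second derivatives are piecewise linear with a break at that point.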

References

Emura T (2019) joint.Cox: joint frailty-copula models for tumour progression and death in
meta-analysis. CRAN.
Emura T, Chen YH (2018) Analysis of survival data with dependent censoring, copula-based
approaches. JSS Research Series in Statistics. Springer.
Emura T, Nakatochi M, Murotani K, Rondeau V (2017) A joint frailty-copula model between
tumour progression and death for meta-analysis. Stat Methods Med Res 26(6):2649–2666
Ramsay J (1988) Monotone regression splines in action. Stat Sci 3:425–461
Appendix B
R Codes for the Ovarian Cancer Data
Analysis

B1. Using the CXCL12 Gene as a Covariate

library(joint.Cox)
data(dataOvarian)
t.event=dataOvarian$t.event ## time-to-relapse (TTP) ##
event=dataOvarian$event ## indicator for relapse ##
t.death=dataOvarian$t.death ## time-to-death (OS) ##
death=dataOvarian$death ## indicator for death ##
Z1=dataOvarian$CXCL12 ## gene expression of CXCL12 ##
group=dataOvarian$group ## study indicator (4 studies) ##
alpha_given=0
grid=seq(10, 1e+17, length=30) ## grid for searching the best smoothing parameter ##
set.seed(1)
res=jointCox.reg(t.event=t.event, event=event, t.death=t.death, death=death,
Z1=Z1, Z2=Z1, group=group, alpha=alpha_given,
kappa1=grid, kappa2=grid, LCV.plot=TRUE, Adj=500)
res

#### Plot the baseline hazard function for TTP ####
par( mfrow=c(1, 1) )
t_min=min(t.event) ## lower bound for the baseline hazard function
t_max=max(t.death) ## upper bound for the baseline hazard function

r1_func=function(t){ as.numeric( M.spline (t, t_min, t_max)%*%(res$g) ) }

r1_Low_func=function(t){ ## lower confidence limit
  M_vec=M.spline (t, t_min, t_max)
  r1_V=M_vec%*%(res$g_var)%*%t(M_vec)
  as.numeric( M_vec%*%(res$g)-1.96*sqrt(diag(r1_V)) )
}

r1_Up_func=function(t){ ## upper confidence limit
  M_vec=M.spline (t, t_min, t_max)
  r1_V=M_vec%*%(res$g_var)%*%t(M_vec)
  as.numeric( M_vec%*%(res$g)+1.96*sqrt(diag(r1_V)) )
}

curve( r1_func, t_min, t_max, lwd=3, xlab="Days",
       ylab="Baseline hazard", ylim=c(0.00003, 0.0012), xlim=c(0, 5500) )
curve(r1_Low_func, t_min, t_max, lty="dotted", add=TRUE, col="blue")
curve(r1_Up_func, t_min, t_max, lty="dotted", add=TRUE, col="blue")
AA=c("Hazard function for TTP","95% CI")
BB=c("solid", "dotted")
CC=c("black", "blue")
legend(1800, 0.0011, AA, lwd=c(3, 1), merge = TRUE, lty=BB, col=CC)

#### Plot the baseline hazard function for OS ####
r2_func=function(t){as.numeric( M.spline (t, t_min, t_max)%*%(res$h) )}

r2_Low_func=function(t){ ## lower confidence limit
  r2_V=M.spline (t, t_min, t_max)%*%(res$h_var)%*%t(M.spline (t, t_min, t_max))
  as.numeric( M.spline (t,t_min, t_max)%*%(res$h)-1.96*sqrt(diag(r2_V)) )
}
r2_Up_func=function(t) { ## upper confidence limit
  r2_V=M.spline (t, t_min, t_max)%*%(res$h_var)%*%t(M.spline (t, t_min, t_max))
  as.numeric( M.spline (t, t_min, t_max)%*%(res$h)+1.96*sqrt(diag(r2_V)) )
}

curve(r2_func, t_min, t_max,lwd=3, lty="dotdash",xlab="Days",add=TRUE)
curve(r2_Low_func, t_min, t_max, lty="dotted",lwd=1,add=TRUE,col="red")
curve(r2_Up_func, t_min, t_max, lty="dotted",lwd=1,add=TRUE,col="red")

AA=c("Hazard function for OS", "95% CI")
BB=c("dotdash", "dotted")
CC=c("black", "red")
legend(3800, 0.0008, AA, lwd=c(3,1), merge = TRUE, lty=BB, col=CC)

########## Relative risk (RR) ##############


RR_TTP=c(RR=exp(res$beta1[1]), Low=exp(res$beta1[1]-1.96*res$beta1[2]),
Up=exp(res$beta1[1]+1.96*res$beta1[2]))
RR_OS=c(RR=exp(res$beta2[1]), Low=exp(res$beta2[1]-1.96*res$beta2[2]),
Up=exp(res$beta2[1]+1.96*res$beta2[2]))

list(alpha=alpha_given,
RR1=round(RR_TTP,2), RR2=round(RR_OS,2), eta=round(res$eta,4),
theta=round(res$theta,4), tau=round(res$tau,4)
)

B2. Using the Compound Covariates (CCs) and Residual Tumour as Covariates

library(joint.Cox)
library(compound.Cox)

data(dataOvarian1)
data(dataOvarian2)
t.event=dataOvarian1$t.event ## time-to-relapse (TTP) ##
event=dataOvarian1$event ## indicator for relapse ##
t.death=dataOvarian2$t.death ## time-to-death (OS) ##
death=dataOvarian2$death ## indicator for death ##
residual=dataOvarian1[,4] ## residual tumour size (>=1cm vs. <1cm)
group=dataOvarian1[,3] ## study indicator (4 studies) ##
X.mat1=dataOvarian1[,-c(1,2,3,4)] ## genes associated with TTP
X.mat2=dataOvarian2[,-c(1,2,3,4)] ## genes associated with OS
Symbol1=colnames(dataOvarian1)[-c(1,2,3,4)] ## gene symbols for TTP
Symbol2=colnames(dataOvarian2)[-c(1,2,3,4)] ## gene symbols for OS
X.mat1=as.matrix(X.mat1)
X.mat2=as.matrix(X.mat2)
q1=ncol(X.mat1) ## the number of genes associated with TTP ##
q2=ncol(X.mat2) ## the number of genes associated with OS ##

##### Compound covariate for TTP ####


res=uni.Wald(t.event,event,X.mat1)
coef1=res$beta
data.frame( gene=names(res$beta)[order(res$P)], P=res$P[order(res$P)],
coef=round(coef1[order(res$P)],3) )
CC1_train=X.mat1%*%coef1 ### Compound covariate for TTP ###
mu1=mean(CC1_train)
sigma1=sd(CC1_train)
round(c(mu1=mu1,sigma1=sigma1),3)

##### Compound covariate for OS ####


res=uni.Wald(t.death,death,X.mat2)
coef2=res$beta
data.frame( gene=names(res$beta)[order(res$P)], P=res$P[order(res$P)],
coef=round(coef2[order(res$P)],3) )
CC2_train=X.mat2%*%coef2 ### Compound covariate for OS ###
mu2=mean(CC2_train)
sigma2=sd(CC2_train)
round(c(mu2=mu2,sigma2=sigma2),3)

mu2+2*sigma2 ## high-risk
mu2-2*sigma2 ## low-risk

############## Fit the joint frailty-copula model ################


grid=c(seq(10,1e+17,length=100))
set.seed(1)
res=jointCox.reg(t.event=t.event, event=event, t.death=t.death, death=death,
Z1=(CC1_train-mu1)/sigma1, Z2=cbind( residual,(CC2_train-mu2)/sigma2 ),
group=group, alpha=0, convergence.par = TRUE,
kappa1=grid, kappa2=grid, LCV.plot=TRUE, Randomize_num=1)

########## Relative risk #########



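RR_gamma1=exp(res$beta1[c(1,3,4)]) ## CC for TTP: estimate, lower, upper
RR_beta2=exp(res$beta2[c(1,5,7)]) ## residual tumour for OS: estimate, lower, upper
RR_gamma2=exp(res$beta2[c(2,6,8)]) ## CC for OS: estimate, lower, upper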

########## Summarize estimates ##############


list(RR_gamma1=round(RR_gamma1,2),RR_beta2=round(RR_beta2,2),
RR_gamma2=round(RR_gamma2,2),eta=round(res$eta,2),
theta=round(res$theta,1),tau=round(res$tau,2))

list(gamma1=round(res$beta1,3),beta2=round(res$beta2[c(1,3,5,7)],3),
gamma2=round(res$beta2[c(2,4,6,8)],3),eta=round(res$eta,3),
theta=round(res$theta,2),tau=round(res$tau,2),
g=round(res$g,2),h=round(res$h,2)
)

The codes do not include the calculations for the patient-level survival curves
and their CIs.
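
The positional indices used on res$beta2 in B2 rely on R's column-major storage of matrices. The sketch below illustrates this with a hypothetical stand-in for res$beta2; the 2 x 4 layout (one row per covariate in Z2, columns for the estimate, SE, and lower and upper limits) is an assumption inferred from the indexing above, and the numbers are made up.

```r
## Hypothetical stand-in for res$beta2 from a fit with two covariates in Z2.
## Assumed layout: rows = covariates, columns = estimate, SE, lower, upper.
beta2 <- matrix(c(0.30, 0.10, 0.104, 0.496,
                  0.50, 0.20, 0.108, 0.892),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("residual", "CC2"),
                                c("estimate", "SE", "Lower", "Upper")))

## R stores a matrix column by column, so in a 2 x 4 matrix
## positions 1, 5, 7 are row 1 of columns 1, 3, 4,
## and positions 2, 6, 8 are row 2 of the same columns.
RR_by_position <- exp(beta2[c(1, 5, 7)])
RR_by_name <- exp(beta2["residual", c("estimate", "Lower", "Upper")])

RR_by_position
RR_by_name
```

Indexing by row and column name gives the same values and is less fragile if the number of covariates changes.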
Appendix C
Derivation of Prediction Formulas

Case I: Given that the patient does not experience tumour progression before time
t (i.e., X > t), the conditional failure function is

\[
\begin{aligned}
F(t, t+w \mid X > t, Z) &= \Pr(D \le t+w \mid D > t, X > t, Z) \\
&= \frac{\Pr(D > t, X > t \mid Z) - \Pr(D > t+w, X > t \mid Z)}{\Pr(D > t, X > t \mid Z)} \\
&= \frac{\int_0^\infty \{\Pr(D > t, X > t \mid u, Z) - \Pr(D > t+w, X > t \mid u, Z)\}\, f_\eta(u)\,du}{\int_0^\infty \Pr(D > t, X > t \mid u, Z)\, f_\eta(u)\,du} \\
&= \frac{\int_0^\infty \{C_\theta[S_X(t \mid u), S_D(t \mid u)] - C_\theta[S_X(t \mid u), S_D(t+w \mid u)]\}\, f_\eta(u)\,du}{\int_0^\infty C_\theta[S_X(t \mid u), S_D(t \mid u)]\, f_\eta(u)\,du},
\end{aligned}
\]

and the conditional hazard function is

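\[
\begin{aligned}
\lambda(t \mid X > t, Z, u) &= \Pr(t < D \le t+dt \mid X > t, D > t, Z, u)/dt \\
&= \frac{\Pr(X > t, t < D \le t+dt \mid Z, u)}{dt \times \Pr(X > t, D > t \mid Z, u)} \\
&= -\frac{\Pr(X > t, D > t+dt \mid Z, u) - \Pr(X > t, D > t \mid Z, u)}{dt \times \Pr(X > t, D > t \mid Z, u)} \\
&= \frac{-\partial \Pr(X > t, D > y \mid Z, u)/\partial y \,\big|_{y=t}}{\Pr(X > t, D > t \mid Z, u)} \\
&= \left\{ -\frac{\partial S_D(t \mid u)}{\partial t} \right\} \frac{C_\theta^{[0,1]}[S_X(t \mid u), S_D(t \mid u)]}{C_\theta[S_X(t \mid u), S_D(t \mid u)]} \\
&= \lambda_D(t \mid u)\, \frac{S_D(t \mid u)\, C_\theta^{[0,1]}[S_X(t \mid u), S_D(t \mid u)]}{C_\theta[S_X(t \mid u), S_D(t \mid u)]}.
\end{aligned}
\]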
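
The Case I failure function is a ratio of two one-dimensional integrals over the frailty, so it can be evaluated by numerical quadrature. The R sketch below is only an illustration under assumptions that are not taken from the text: a Clayton copula, a gamma frailty with mean 1 and variance eta, exponential conditional survival functions with made-up rates, and covariate effects absorbed into those rates. The function names are hypothetical and are not part of joint.Cox.

```r
## Clayton copula C_theta(v, w) for theta > 0 (assumed copula family)
clayton <- function(v, w, theta) (v^(-theta) + w^(-theta) - 1)^(-1 / theta)

## Conditional survival functions given the frailty u
## (assumed exponential; the rates are made up)
S_X <- function(t, u, rate = 0.0010) exp(-u * rate * t)  # tumour progression
S_D <- function(t, u, rate = 0.0005) exp(-u * rate * t)  # death

## Gamma frailty density with mean 1 and variance eta (assumed)
f_eta <- function(u, eta) dgamma(u, shape = 1 / eta, rate = 1 / eta)

## F(t, t + w | X > t): ratio of two integrals over the frailty u
F_case1 <- function(t, w, theta, eta) {
  ## Pr(X > t, D > s | u) expressed through the copula
  joint <- function(s, u) clayton(S_X(t, u), S_D(s, u), theta)
  num <- integrate(function(u) (joint(t, u) - joint(t + w, u)) * f_eta(u, eta),
                   lower = 0, upper = Inf)$value
  den <- integrate(function(u) joint(t, u) * f_eta(u, eta),
                   lower = 0, upper = Inf)$value
  num / den
}

## Probability of death within 500 days after day 1000, given no progression so far
F_case1(t = 1000, w = 500, theta = 2, eta = 0.5)
```

In a real analysis, the conditional survival functions would be built from the fitted baseline hazards and regression coefficients rather than from fixed rates.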

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2019
T. Emura et al., Survival Analysis with Correlated Endpoints,
JSS Research Series in Statistics, https://doi.org/10.1007/978-981-13-3516-7

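Case II: Given that the patient experiences tumour progression at time x ≤ t, the
conditional failure function is

\[
\begin{aligned}
F(t, t+w \mid X = x, Z) &= \Pr(D \le t+w \mid D > t, X = x, Z) \\
&= \frac{\Pr(D > t, X = x \mid Z) - \Pr(D > t+w, X = x \mid Z)}{\Pr(D > t, X = x \mid Z)} \\
&= \frac{\int_0^\infty \{\Pr(D > t, X = x \mid u, Z) - \Pr(D > t+w, X = x \mid u, Z)\}\, f_\eta(u)\,du}{\int_0^\infty \Pr(D > t, X = x \mid u, Z)\, f_\eta(u)\,du} \\
&= \frac{\int_0^\infty \left\{ -\frac{\partial}{\partial x} \Pr(D > t, X > x \mid u, Z) + \frac{\partial}{\partial x} \Pr(D > t+w, X > x \mid u, Z) \right\} f_\eta(u)\,du}{\int_0^\infty -\frac{\partial}{\partial x} \Pr(D > t, X > x \mid u, Z)\, f_\eta(u)\,du} \\
&= \frac{\int_0^\infty \left\{ -\frac{\partial}{\partial x} C_\theta[S_X(x \mid u), S_D(t \mid u)] + \frac{\partial}{\partial x} C_\theta[S_X(x \mid u), S_D(t+w \mid u)] \right\} f_\eta(u)\,du}{\int_0^\infty -\frac{\partial}{\partial x} C_\theta[S_X(x \mid u), S_D(t \mid u)]\, f_\eta(u)\,du} \\
&= \frac{\int_0^\infty \left\{ C_\theta^{[1,0]}[S_X(x \mid u), S_D(t \mid u)] - C_\theta^{[1,0]}[S_X(x \mid u), S_D(t+w \mid u)] \right\} \lambda_X(x \mid u)\, S_X(x \mid u)\, f_\eta(u)\,du}{\int_0^\infty C_\theta^{[1,0]}[S_X(x \mid u), S_D(t \mid u)]\, \lambda_X(x \mid u)\, S_X(x \mid u)\, f_\eta(u)\,du},
\end{aligned}
\]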

and the conditional hazard function is

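\[
\begin{aligned}
\lambda(t \mid X = x, Z, u) &= \Pr(t < D \le t+dt \mid X = x, D > t, Z, u)/dt \\
&= \frac{\Pr(X = x, t < D \le t+dt \mid Z, u)}{dt \times \Pr(X = x, D > t \mid Z, u)} \\
&= -\frac{\Pr(X = x, D > t+dt \mid Z, u) - \Pr(X = x, D > t \mid Z, u)}{dt \times \Pr(X = x, D > t \mid Z, u)} \\
&= -\frac{\partial \Pr(X = x, D > t \mid Z, u)/\partial t}{\Pr(X = x, D > t \mid Z, u)} \\
&= -\frac{\partial^2 \Pr(X > x, D > t \mid Z, u)/\partial x\, \partial t}{\partial \Pr(X > x, D > t \mid Z, u)/\partial x} \\
&= -\frac{\partial S_D(t \mid u)}{\partial t}\, \frac{C_\theta^{[1,1]}[S_X(x \mid u), S_D(t \mid u)]}{C_\theta^{[1,0]}[S_X(x \mid u), S_D(t \mid u)]} \\
&= \lambda_D(t \mid u)\, \frac{S_D(t \mid u)\, C_\theta^{[1,1]}[S_X(x \mid u), S_D(t \mid u)]}{C_\theta^{[1,0]}[S_X(x \mid u), S_D(t \mid u)]}.
\end{aligned}
\]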
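
The Case II failure function can be evaluated in the same way once the partial derivative of the copula with respect to its first argument is available. The R sketch below is an illustration under assumptions that are not taken from the text: a Clayton copula, a gamma frailty with mean 1 and variance eta, and exponential conditional survival functions with made-up rates, so that the conditional density of X at x given u is u * rate_X * exp(-u * rate_X * x). All names are hypothetical.

```r
## First partial derivative dC/dv of the Clayton copula (theta > 0), written as
## {1 + (v / w)^theta - v^theta}^(-(1 + theta) / theta) and evaluated from
## log(v) and log(w), which avoids 0 * Inf when the survivals underflow
clayton_10 <- function(log_v, log_w, theta) {
  (1 + exp(theta * (log_v - log_w)) - exp(theta * log_v))^(-(1 + theta) / theta)
}

## Made-up exponential rates for progression (X) and death (D) given the frailty u
rate_X <- 0.0010
rate_D <- 0.0005

## F(t, t + w | X = x) for progression observed at time x <= t
F_case2 <- function(x, t, w, theta, eta) {
  ## lambda_X(x | u) * S_X(x | u) * f_eta(u): density of X at x times frailty density
  weight <- function(u) {
    u * rate_X * exp(-u * rate_X * x) * dgamma(u, shape = 1 / eta, rate = 1 / eta)
  }
  ## dC/dv evaluated at (S_X(x | u), S_D(s | u))
  c10 <- function(s, u) clayton_10(-u * rate_X * x, -u * rate_D * s, theta)
  num <- integrate(function(u) (c10(t, u) - c10(t + w, u)) * weight(u),
                   lower = 0, upper = Inf)$value
  den <- integrate(function(u) c10(t, u) * weight(u),
                   lower = 0, upper = Inf)$value
  num / den
}

## Probability of death within 500 days after day 1000, given progression at day 500
F_case2(x = 500, t = 1000, w = 500, theta = 2, eta = 0.5)
```

Working with log-survival values is a design choice for numerical stability only; it does not change the formula.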

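If one applies the independence copula C(v, w) = vw to the previous formulas, one obtains

\[
\begin{aligned}
F(t, t+w \mid X > t, Z) &= \frac{\int_0^\infty \{S_D(t \mid u) - S_D(t+w \mid u)\}\, S_X(t \mid u)\, f_\eta(u)\,du}{\int_0^\infty S_D(t \mid u)\, S_X(t \mid u)\, f_\eta(u)\,du}, \\
F(t, t+w \mid X = x, Z) &= \frac{\int_0^\infty \{S_D(t \mid u) - S_D(t+w \mid u)\}\, \lambda_X(x \mid u)\, S_X(x \mid u)\, f_\eta(u)\,du}{\int_0^\infty S_D(t \mid u)\, \lambda_X(x \mid u)\, S_X(x \mid u)\, f_\eta(u)\,du}, \\
\lambda(t \mid X > t, Z, u) &= \lambda(t \mid X = x, Z, u) = \lambda_D(t \mid u).
\end{aligned}
\]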
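
The reduction can be checked in a few lines. The block below assumes the superscript notation means partial differentiation with respect to the first and second arguments, which is the reading implied by the derivations for Cases I and II.

```latex
% Partial derivatives of the independence copula C(v, w) = vw
\[
C^{[1,0]}(v, w) = \frac{\partial C}{\partial v} = w, \qquad
C^{[0,1]}(v, w) = \frac{\partial C}{\partial w} = v, \qquad
C^{[1,1]}(v, w) = \frac{\partial^2 C}{\partial v\, \partial w} = 1 .
\]
% Substituting v = S_X(\cdot \mid u) and w = S_D(t \mid u) into the two hazard formulas
\[
\frac{S_D(t \mid u)\, C^{[0,1]}[S_X(t \mid u), S_D(t \mid u)]}{C[S_X(t \mid u), S_D(t \mid u)]}
= \frac{S_D(t \mid u)\, S_X(t \mid u)}{S_X(t \mid u)\, S_D(t \mid u)} = 1,
\qquad
\frac{S_D(t \mid u)\, C^{[1,1]}[S_X(x \mid u), S_D(t \mid u)]}{C^{[1,0]}[S_X(x \mid u), S_D(t \mid u)]}
= \frac{S_D(t \mid u)}{S_D(t \mid u)} = 1 .
\]
```

Both multipliers of the marginal hazard of D equal one, so given the frailty u the hazard of death does not depend on the progression history.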
Index

A
Abe Sklar, 5
Archimedean copula, 5, 28–30

B
Bivariate copula, 26
Bivariate survival data, 32
Bivariate survival function, 26

C
Cancer research, 1, 10, 79, 100
Clayton copula, 5, 26, 28, 30, 44, 47, 56, 83, 85, 90
Clustered survival data, 20, 25
Colorectal cancer, 4, 34, 95
Competing risks, 5, 40, 41, 99, 101
Compound covariate, 60–64, 70, 87, 100, 101
Compound.Cox, 34, 68
Conditional failure function, 80, 81, 84, 86, 88
Conditional hazard function, 30, 81, 82, 84, 85, 90
Cox model, 14, 32
Cross-ratio function, 29, 30, 82, 85
Cross-validation, 52, 53, 55, 65
CXCL12, 49, 53, 71, 87

D
David George Clayton, 5
Dependent censoring, 5, 31, 32, 40, 41, 99
Dependent truncation, 99
Disease-free survival, 1, 10, 98
Dropout, 2, 11, 29
Dynamic prediction, 3, 78, 79, 82

F
FGM copula, 26, 28, 45, 57
Frailty, 4, 20, 40, 83–85, 96, 97

G
Gamma frailty model, 5, 22, 23
Gene expression, 49, 63, 70, 74, 77, 87
Gumbel copula, 26, 29, 35, 44, 57

H
Hazard function, 12, 53, 54
Head and neck cancer, 4, 9
Hessian matrix, 15, 34, 48, 54–56
Heterogeneity, 20, 21, 40, 43, 78, 84
High-dimensional covariates, 59, 62

I
Independence copula, 26, 28, 29, 53, 57, 85
Independent censoring assumption, 2, 12, 15, 31
Individual-level dependence, 4, 98
Individual patient data, 20, 40, 78
Information matrix, 15, 18
Interactions, 99
Intra-study dependence, 40, 43
Intra-subject dependence, 43, 44
IPD meta-analysis, 20, 40, 78

J
Joint.Cox, 19, 47–49, 54, 56, 65, 71, 86, 88
Joint frailty-copula model, 44, 49, 64, 65, 84, 97, 98, 100
Joint frailty model, 43, 85, 97


K
Kendall’s tau, 22, 28, 29, 44, 45, 72, 87, 96

L
Landmark analysis, 83
Lasso, 65–68
Likelihood Cross-Validation (LCV), 50, 52, 55
Lung cancer, 34

M
Machine learning, 102
Meta-analysis, 39, 40, 43, 44, 49, 62, 70, 77, 78, 96
Multicenter analysis, 20
Multicollinearity, 34, 101
Multivariate survival analysis, 1, 5, 11

N
Newton–Raphson algorithm, 18, 25, 34
Non-informative censoring, 17
Nonterminal event, 41, 62, 63

O
Ovarian cancer, 49, 53, 63, 70, 77, 78, 87
Overall survival, 1, 10, 39, 41, 62, 78, 79, 96

P
Partial likelihood, 15
Pathway, 68, 99
Patient-level survival function, 12, 66, 68, 72
Patient-tailored therapy, 77
Penalized likelihood, 18, 25, 45, 48, 55
Personalized medicine, 4, 77
Piecewise exponential, 100
Prognostic analysis, 12, 14, 30, 91
Progression-free survival, 1, 10, 98

R
Random effects, 31, 97
Recurrent event, 95
Relative risk, 14, 29, 53, 72, 83, 85
Residual dependence, 31, 32, 40, 43, 53, 95
Residual tumour, 16, 71, 72, 88
Ridge regression, 65
Roughness, 18, 25, 48

S
Score function, 15, 18, 34
Semi-competing risks, 5, 40, 41
Shared frailty model, 20, 21, 23
Smoothing parameter, 18, 25, 48, 49
Spline, 14, 19, 25, 47, 48, 54
Surrogate endpoint, 4, 96, 97
Survival function, 12

T
Terminal event, 41, 62, 63
Time-to-relapse, 53, 70, 87
Time-varying effects, 100
Tukey, 60, 101
Tumour progression, 1, 3, 10, 21, 23, 39–41, 62, 79–82, 84, 96
Two-step method, 5, 39, 97

U
Univariate feature selection, 61, 64, 70

V
Validation of surrogate, 97

W
Weibull, 13, 23, 25, 39, 101
