Survival Analysis and Interpretation Of.32
SPECIAL ARTICLE
Survival analysis, or more generally, time-to-event analysis, refers to a set of methods for analyzing the length of time until the occurrence of a well-defined end point of interest. A unique feature of survival data is that typically not all patients experience the event (eg, death) by the end of the observation period, so the actual survival times for some patients are unknown. This phenomenon, referred to as censoring, must be accounted for in the analysis to allow for valid inferences. Moreover, survival times are usually skewed, limiting the usefulness of analysis methods that assume a normal data distribution. As part of the ongoing series in Anesthesia & Analgesia, this tutorial reviews statistical methods for the appropriate analysis of time-to-event data, including nonparametric and semiparametric methods, specifically the Kaplan-Meier estimator, log-rank test, and Cox proportional hazards model. These methods are by far the most commonly used techniques for such data in medical literature. Illustrative examples from studies published in Anesthesia & Analgesia demonstrate how these techniques are used in practice. Full parametric models and models to deal with special circumstances, such as recurrent events models, competing risks models, and frailty models, are briefly discussed. (Anesth Analg 2018;127:792–8)
The surprising thing about young fools is how many survive to become old fools.
Doug Larson (1926–2017), American journalist, columnist, and editor

The occurrence of a well-defined event such as patient mortality is often a primary outcome in medical research. This is essentially a binary outcome (the event has occurred versus it has not occurred). In a previous tutorial in this series, we described how such binary outcome data can be analyzed with logistic regression.1 For example, one can estimate the relationship between one or more covariates, also referred to as independent variables or predictor variables (eg, treatments or prognostic factors), and the odds of experiencing the outcome within a specific time frame (eg, mortality within 30 days postoperatively).

However, logistic regression analysis is not appropriate when the research question involves the length of time until the end point occurs, such as when estimating median survival times, plotting survival over time after treatment, or estimating the probability of surviving beyond a prespecified time interval (eg, 5-year survival rate). Researchers are also often interested in whether survival times are related to covariates, and in estimating the effect size of a specific covariate (eg, magnitude of the treatment effect) when it is adjusted for potential confounders.

Furthermore, it may initially appear that such a research question about the length of a time interval, which is essentially a continuous outcome variable, can be addressed by linear regression or related techniques like a t test or analysis of variance.1,2 However, a key distinction between survival times and other continuous data is that the event of interest (eg, death) will usually have occurred only in some but not in all patients by the time the study ends.

For patients who survive until the end of the study period, or who are lost to follow-up before the end of the observation period, full survival times are unknown. Instead, all that is known is that the survival time is greater than the observation time. This unique feature of survival data is referred to as right censoring, which is described in more detail below.3 Ignoring censored patients in the analysis, or simply equating their observed survival time (follow-up time) with the unobserved total survival time, would bias the results. Even if there were no censoring in the data set, survival times usually have a heavily skewed distribution, limiting the usefulness of statistical tests that assume a normal data distribution.3

Analyzing survival data is unique in that the research interest is typically a combination of whether the event has occurred (binary outcome) and when it has occurred (continuous outcome). Appropriate analysis of survival data requires specific statistical methods that can deal with censored data. As the assessed outcome is frequently mortality, these techniques are subsumed under the term survival analysis.

More generally, however, these techniques can be used for the analysis of the time until any event of interest occurs (eg, recurrence of a disease; initial, breakthrough postoperative pain; or failure of an implanted medical device), and such data can thus also be called time-to-event or failure time data.4,5 In this tutorial, we use the terms survival time, time-to-event, and failure time synonymously.

As part of the ongoing series in Anesthesia & Analgesia, this basic tutorial reviews statistical methods that are appropriate for survival data. We focus on the most common techniques, which are the Kaplan-Meier estimator, log-rank test, and the Cox proportional hazards (PH) model. The Table provides a summary of key related terms. An extensive discussion of full parametric techniques and the special circumstances that call for other techniques is beyond the scope of this basic tutorial, and we thus intentionally provide only cursory coverage of these more advanced aspects.

From the *Department of Anesthesiology, VU University Medical Center, Amsterdam, the Netherlands; and †Department of Surgery and Perioperative Care, Dell Medical School at the University of Texas at Austin, Austin, Texas.
Accepted for publication June 8, 2018.
Funding: None.
The authors declare no conflicts of interest.
Reprints will not be available from the authors.
Address correspondence to Patrick Schober, MD, PhD, MMedStat, Department of Anesthesiology, VU University Medical Center, De Boelelaan 1117, 1081 HV, Amsterdam, the Netherlands. Address e-mail to [email protected].
Copyright © 2018 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the International Anesthesia Research Society. This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CCBY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
DOI: 10.1213/ANE.0000000000003653

CONCEPTS AND TERMINOLOGY IN SURVIVAL ANALYSIS
General Considerations
The event of interest should be clinically relevant, well defined, unambiguous, and preferably easily observable. While patient death seems to be such an unambiguous end point, misclassification is possible when specific-cause mortality, rather than all-cause mortality, is the outcome of interest.3 Some end points, such as recurrence of cancer, have the disadvantage that they do not occur instantaneously, making it difficult to specify the exact time point of occurrence. In such settings, the clearest description of the outcome is often “time-to-detection” rather than “time-to-event.” Whatever the event of interest, a clear and unambiguous definition is essential.

The total length of follow-up and follow-up intervals should be sensibly chosen to ensure that a sufficient number of events are observed (see Power and Sample Size Considerations section) and that the timing of occurrence can be determined, to avoid interval censoring (as detailed below). Conversely, long-term observational studies carry the risk that factors that influence survival time, other than the treatment or factor under investigation, may also change during the study period. Patients recruited to the study early should ideally have the same risk of event occurrence as patients recruited late.3

As the failure time is the time between some starting point (origin) and the event, not only the event but also the time of origin needs to be clearly specified. Ideally, the time of origin should also be sensibly chosen, so all individuals are as much as possible on a par.6 In a study comparing therapeutic interventions on a survival outcome, the starting point is typically the time when the intervention is administered.
In epidemiologic or screening studies, the origin is often when a condition or disease is diagnosed. However, this can lead to biased estimates of survival times, especially when the study intervention presumably affects not only the event but also the time of origin. This so-called lead-time bias is common in studying whether a screening program for a specific disease increases survival time. Observed prolonged survival in screened patients could be merely due to the disease being diagnosed earlier, and not necessarily reflect a benefit of screening on absolute survival time.7

Truncation
As in any clinical study, the target population, as well as inclusion and exclusion criteria, must be clearly defined. A unique feature of survival data is truncation, which results from selection bias and refers to subject selection depending on whether or not the event has occurred.8 Subjects may only be identified for observation at some time point after their respective time of origin.9

Patients who have already experienced the event before the time point of patient identification may not be identified; for example, because they have already died, these patients may not be known to exist.8 In this situation, only those patients who have not experienced the event will selectively enter the study, which is referred to as left truncation or delayed entry.

Alternatively, right truncation occurs when patients who experience the event are selectively included,6 for example, when patients are obtained from a death registry and survivors are hence not selected for the study. While the bias caused by truncation can be partially addressed during the analysis, it is often preferable to prevent it at the design phase of the study.

Censoring
Censoring refers to incompletely observed survival times and is inherent in most survival data. The situation described above, in which not all patients experience the event by completion of the study, is referred to as right censoring. Visualizing the timeline of a patient’s observed survival time, the unobserved event, if it were to occur, would lie beyond the right side of the time point at which the patient is censored. Right censoring is the most common type of censoring in survival studies, and the statistical methods described below are well suited to deal with this type of censoring. Basically, censored patients are: (1) included in estimates of survival probabilities at time points preceding their censoring time point; and (2) excluded from the analysis thereafter.10

Unbiased inferences require the censoring to be noninformative, with the time of censoring unrelated to the event time.3 Informative censoring would occur when patients are censored due to a medical condition that is related to the future risk of the event (eg, inability to show up for a clinic visit due to severe illness and thus loss to follow-up). Unfortunately, this problem is not easy to detect, and there is no ideal solution other than conducting the study in a way that promotes complete follow-up and avoids informative censoring.11 If possible, data on the reasons for loss to follow-up should be collected because such information can be used in sensitivity analyses to assess for potential bias.

Left censoring occurs when a subject is known to have had the event before the start of the observation, but the exact time of the event is unknown. This contrasts with left truncation, where the patient is often not even known to exist.12 Similarly, interval censoring occurs when it is only known that the event occurred between 2 time points, but again, the exact time is unknown. Left and interval censored data are less common and usually do not exist when death is the outcome of interest. Statistical techniques to deal with left and interval censored data are available; however, they are infrequently used and will not be covered in this basic tutorial.

Survival (Survivor) Function, Hazard Rate, Hazard Function, and Hazard Ratio
The survival (or survivor) function and the hazard function are fundamental to survival analysis. The survival function describes the probability of surviving past a specified time point, or more generally, the probability that the event of interest has not yet occurred by this time point (Figure 1).13

A hazard rate (or failure rate) is the rate of occurrence of the event during a given time interval.10 The hazard function describes the instantaneous rate of occurrence over time, which can conceptually be viewed as the hazard rate during an infinitesimally small time interval. The hazard and survival functions are closely related and can easily be converted to each other.3 When the hazard rate is high, survival declines rapidly and vice versa.

While it is not necessary to understand the hazard function in detail, it is the basis of PH models, which are extensively used to model survival data. Importantly, the exponentiated parameter estimates of these models can be interpreted as a hazard ratio (HR), which is an estimate of the ratio of the hazard rates between 2 groups (eg, treatment versus control).14

The HR is similar to the risk ratio (relative risk), with a value higher or lower than 1 indicating a higher or lower hazard rate, respectively, than the comparison group. While the HR is technically not the same as the risk ratio, it is often conveniently interpreted as such in the literature.15

General Overview of Methods to Analyze Survival Data
In analyzing survival data, 3 common classes of methods are broadly distinguished:

1. Nonparametric methods neither impose assumptions on the distribution of survival times (a specific shape of the survival function or hazard function) nor assume a specific relationship between covariates and the survival time. This class includes the Kaplan-Meier estimator and log-rank test.
2. Semiparametric methods also make no assumptions regarding the distribution of survival times but do assume a specific relationship between covariates and the hazard function, and hence the survival time. The widely used Cox PH model is a semiparametric method.
3. Parametric methods assume a distribution of the survival times and a functional form of the covariates.
794
www.anesthesia-analgesia.org ANESTHESIA & ANALGESIA
Survival Analysis
authors followed the common (albeit not statistically correct) practice of interpreting the HR as a relative risk: “The relative risk of achieving pain control at any collected time point in parturients receiving DPE was 1.7 times greater than in those receiving LE.”31 Of note, in the first example above, observational data were analyzed using the Cox PH model to adjust for confounding. Here, in this randomized controlled trial, the purpose of the Cox PH model was to obtain an estimate of the treatment effect.

Using data on patients who participated in 2 trials across 4 clinical sites for a follow-up analysis, Podolyak et al30 studied effects of supplemental perioperative oxygen on long-term mortality in patients undergoing colorectal surgery. In the 2 original trials, patients had randomly received either 30% or 80% inspired oxygen perioperatively. The authors present survival curves using Kaplan-Meier estimates and use a Cox PH model, stratified by study and site to allow for separate baseline hazards for each study and site. This approach was (presumably) chosen as it allows for the estimation of an overall HR and a significance test across all study sites. No effect of 80% vs 30% inspired oxygen was observed on mortality, with an overall estimated HR of 0.93 (95% CI, 0.72–1.20; P = .57).30

PARAMETRIC MODELS
Parametric models assume a specific distribution of the survival times. Advantages of a parametric model include a higher efficiency (ie, greater power),14 which can be particularly useful with smaller sample sizes. Furthermore, a variety of parametric techniques can model survival times when the PH assumption is not met.

However, it can be quite challenging to identify the most appropriate data distribution, and parametric models have the drawback of providing misleading inferences if the distributional assumptions are not met. In contrast, the semiparametric Cox model is a safe and proven method that does not require specifying a data distribution,36 which is why this model is most common in analyzing survival data. For a more detailed discussion on parametric models, we refer to previously published literature on the topic.14,36

RECURRENT EVENTS, COMPETING RISKS, AND FRAILTY MODELS
The previously described techniques are useful for studying time until occurrence of a specific event that occurs only once, terminates the observation of a patient, and occurs independently between the patients. While this situation is common in many time-to-event study designs, researchers may be interested in: (1) events that can occur more than once or in a series of events in which each event has its own failure time; (2) situations in which follow-up may be terminated by >1 event; or (3) clusters of patients for whom the event does not occur independently.

Recurrent event models are capable of modeling the sequential occurrence of events over time.37 This can be the same event transpiring several times (eg, occurrence of myocardial infarction) or a series of different ordered events (eg, different, progressive stages of a disease until death occurs).

Competing risk models can accommodate multiple (competing) types of failure events, each of which terminates the observation of an individual.38 For example, a researcher may not want to simply consider all-cause mortality as the failure event, but would like to study the relationship between covariates and specific-cause mortality. Or, commonly, researchers are interested in an event such as cancer recurrence, but death that occurs before the event of interest is a competing risk. In this setting, the researcher can either model the time to the earliest of death or cancer recurrence or use special methods to model both events.

Frailty models account for nonindependence of observations in clustered data (for correlated failure times) by incorporating random effects.39 Such data may arise when the survival times of individuals within a cluster (eg, family or hospital) tend to be more similar to each other than survival times of patients who belong to different clusters. These models are analogous to mixed effect models for uncensored longitudinal and correlated data, as described in a recent tutorial in this series.40

POWER AND SAMPLE SIZE CONSIDERATIONS
The power of a method to analyze survival time data depends on the number of events rather than the total sample size.21 Therefore, calculation of total sample size is a 2-step process.41

First, the number of events needed to detect a minimum clinically important effect size, like a prespecified HR, with a preselected power and alpha level is computed. Depending on the planned data analysis method, different approaches for estimating the number of events have been proposed, including the Schoenfeld method for log-rank tests or PH models.42

Second, to calculate the total sample size, the proportion of patients who are expected to experience the event needs to be estimated.41 Of note, for multivariable models like the Cox PH model, it has been suggested that at least 10 events need to be observed per covariate to be included in the model.43

CONCLUSIONS
Survival data are unique in that the research questions essentially involve a combination of whether the event has occurred in the observation period and when it has occurred. Censoring, or the incomplete observation of failure times, is common in these data, such that specific statistical methods are required for an appropriate analysis.

The Kaplan-Meier method estimates the unadjusted probability of surviving beyond a certain time point, and a Kaplan-Meier curve is a useful graphical tool to display the estimated survival function. The log-rank test is commonly used to compare survival curves between different groups, but can only be used for a crude, unadjusted comparison.

The Cox PH model is the most commonly used technique to assess the effect of factors, such as treatments, while simultaneously controlling for the effects of other covariates. The exponentiated regression coefficients can be interpreted in terms of an HR. This semiparametric technique makes no assumptions about the distribution of the survival times.

If the distribution can be appropriately identified and modeled, parametric techniques can alternatively be used. For special circumstances in which the standard techniques cannot be validly used, a variety of methods including recurrent events models, competing risks models, and frailty models are available.
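As a supplementary illustration of the 2-step calculation outlined under Power and Sample Size Considerations, the sketch below uses the commonly cited form of the Schoenfeld approximation for the required number of events, d = (z_{1-α/2} + z_{power})² / (p(1 − p)(ln HR)²), where p is the proportion of patients allocated to one group. The tutorial names the method but does not give this formula, so it should be read as an assumption of this sketch and checked against the cited reference before use. The function names, the two-sided alpha, and the example inputs are likewise hypothetical.

```python
import math
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.05, power=0.80, alloc=0.5):
    """Step 1: approximate number of events needed to detect a hazard ratio.

    hr    : minimum clinically important hazard ratio (must differ from 1)
    alpha : two-sided type I error rate (assumed two-sided here)
    power : desired power (1 - beta)
    alloc : proportion of patients allocated to one of the 2 groups
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    d = (z_alpha + z_beta) ** 2 / (alloc * (1 - alloc) * math.log(hr) ** 2)
    return math.ceil(d)

def total_sample_size(n_events, event_probability):
    """Step 2: inflate the event count by the expected proportion of
    patients who experience the event during follow-up."""
    return math.ceil(n_events / event_probability)

# Hypothetical planning example: HR of 0.7, 80% power, two-sided alpha of 0.05,
# equal allocation, and half of all patients expected to have the event.
events_needed = schoenfeld_events(hr=0.7)
print(events_needed, total_sample_size(events_needed, event_probability=0.5))
```

The second step makes explicit why studies with rare events need large samples: halving the expected event proportion doubles the total sample size for the same number of events.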
DISCLOSURES
Name: Patrick Schober, MD, PhD, MMedStat.
Contribution: This author helped write and revise the manuscript.
Name: Thomas R. Vetter, MD, MPH.
Contribution: This author helped write and revise the manuscript.
This manuscript was handled by: Jean-Francois Pittet, MD.