Lect On Estimation of The Survival Function
(1) Kaplan-Meier
(1) The Kaplan-Meier Estimator
What about Ŝ(8)? Is it 12/21 or 8/21?
A table of Ŝ(t):
Values of t      Ŝ(t)
t < 1            21/21 = 1.000
1 ≤ t < 2        19/21 = 0.905
2 ≤ t < 3        17/21 = 0.810
3 ≤ t < 4
4 ≤ t < 5
5 ≤ t < 8
8 ≤ t < 11
11 ≤ t < 12
12 ≤ t < 15
15 ≤ t < 17
17 ≤ t < 22
22 ≤ t < 23
[Figure: estimated survival curve; x-axis Time (0 to 25), y-axis Survival (0.0 to 1.0)]
Empirical Survival Function: Ŝ(t) = (number of individuals with T > t) / (total sample size)
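As a small illustration of the empirical survival function without censoring, the sketch below counts the individuals with T strictly greater than t. The control-group times are an assumption: they are the values commonly quoted for Cox and Oakes Table 1.1 and are not reproduced in these notes, although they agree with the 19/21 and 17/21 entries of the table of Ŝ(t). The function name `empirical_survival` is illustrative only.

```python
# Control-group remission times in weeks. Assumed transcription of
# Cox and Oakes Table 1.1; the data are not reproduced in these notes.
control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
           11, 11, 12, 12, 15, 17, 22, 23]

def empirical_survival(times, t):
    """Proportion of the sample with T strictly greater than t."""
    return sum(1 for x in times if x > t) / len(times)

for t in (0.5, 1, 2, 8):
    n_beyond = sum(1 for x in control if x > t)
    print(f"S_hat({t}) = {n_beyond}/{len(control)} = "
          f"{empirical_survival(control, t):.3f}")
```

Because S(t) = P(T > t) uses a strict inequality, the four observations tied at t = 8 are not counted as surviving beyond 8.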
What if there is censoring?
Consider the treated group from Table 1.1 of Cox and Oakes:
[Note: times marked with + are right censored]
[Reading:]
A quick review of conditional probability
Now, let’s apply these ideas to estimate S(t):
[Timeline figure: D C C D D D]
4 possibilities for each interval:
(1) No death or censoring - conditional probability of
surviving the interval is estimated to be 1;
As the intervals get finer and finer, the approximations
made in estimating the probability of getting through each
interval become more and more accurate, and in the limit the
estimator converges in probability to the true S(t) (proof
not shown here).
The Kaplan-Meier estimator of the survivorship
function (or survival probability) S(t) = P (T > t)
is:
\[ \hat S(t) = \prod_{j:\,\tau_j \le t} \frac{r_j - d_j}{r_j} = \prod_{j:\,\tau_j \le t} \left(1 - \frac{d_j}{r_j}\right) \]
where
• τ1, ...τK is the set of K distinct uncensored failure times
observed in the sample
• dj is the number of failures at τj
• rj is the number of individuals “at risk” right before
the j-th failure time (everyone who died or was censored
at or after that time).
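The product formula can be sketched in a few lines of Python. The treated-group times below are an assumption (the values commonly quoted for Cox and Oakes Table 1.1, with 1 for an observed failure and 0 for right censoring); the function name `kaplan_meier` is illustrative and this is a minimal sketch, not a reference implementation.

```python
from collections import Counter

# Treated-group remission times in weeks; 1 = failure observed,
# 0 = right censored. Assumed transcription of Cox and Oakes Table 1.1.
times = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16,
         17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
events = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1,
          0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

def kaplan_meier(times, events):
    """Return [(tau_j, r_j, d_j, S_hat(tau_j))] over distinct failure times."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    rows, surv = [], 1.0
    for tau in sorted(deaths):
        r = sum(1 for t in times if t >= tau)  # at risk just before tau
        d = deaths[tau]                        # failures at tau
        surv *= 1 - d / r
        rows.append((tau, r, d, surv))
    return rows

for tau, r, d, surv in kaplan_meier(times, events):
    print(f"tau={tau:2d}  r={r:2d}  d={d}  S_hat={surv:.4f}")
```

Under this assumed data set the estimate only changes at the nine distinct failure times, and censored observations enter only through the risk-set counts rj.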
Calculating the KM - leukemia treated group
Note that:
[Figure: estimated survival curve; x-axis Time (0 to 35), y-axis Survival (0.0 to 1.0)]
Time  At risk  Deaths  Censored  Ŝ(t)    Std. Error  [95% Conf. Int.]
17    10       0       1         0.6275  0.1141      0.3675  0.8049
19    9        0       1         0.6275  0.1141      0.3675  0.8049
20    8        0       1         0.6275  0.1141      0.3675  0.8049
22    7        1       0         0.5378  0.1282      0.2678  0.7468
23    6        1       0         0.4482  0.1346      0.1881  0.6801
25    5        0       1         0.4482  0.1346      0.1881  0.6801
32    4        0       2         0.4482  0.1346      0.1881  0.6801
34    2        0       1         0.4482  0.1346      0.1881  0.6801
35    1        0       1         0.4482  0.1346      0.1881  0.6801
[Reading:] Redistribution to the right algorithm
(Efron, 1967)
Algorithm:
• Step (1): Arrange the n observation times (failures or
censorings) in increasing order. If there are ties, put
censored observations after failures.
• Step (2): Assign weight (1/n) to each time.
• Step (3): Moving from left to right, each time you encounter
a censored observation, distribute its mass equally to all
times to its right.
• Step (4): Calculate Ŝj by subtracting the final weight
for time j from Ŝj−1.
Example of “redistribute to the right” algorithm
2, 2.5+, 3, 3, 4, 4.5+, 5, 6, 7
(Step 1)                                   (Step 4)
Times    Step 2      Step 3a    Step 3b    Ŝ(τj)
2        1/9=0.11                          0.889
2.5+     1/9=0.11    0                     0.889
3        2/9=0.22    0.25                  0.635
4        1/9=0.11    0.13                  0.508
4.5+     1/9=0.11    0.13       0          0.508
5        1/9=0.11    0.13       0.17       0.339
6        1/9=0.11    0.13       0.17       0.169
7        1/9=0.11    0.13       0.17       0.000
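The four steps can be sketched directly in Python on the nine observations of the example. The function name `redistribute_to_the_right` and the (time, is_failure) encoding are illustrative choices, and the handling of a censored final observation (its mass is simply left where it is) is an assumption the notes do not address.

```python
def redistribute_to_the_right(obs):
    """obs: list of (time, is_failure). Returns {failure time: S_hat}."""
    # Step 1: sort by time, failures before censorings at tied times.
    obs = sorted(obs, key=lambda o: (o[0], not o[1]))
    n = len(obs)
    # Step 2: weight 1/n on every observation.
    w = [1.0 / n] * n
    # Step 3: pass each censored mass equally to observations on its right.
    for i, (_, is_failure) in enumerate(obs):
        n_right = n - i - 1
        if not is_failure and n_right > 0:
            share = w[i] / n_right
            for k in range(i + 1, n):
                w[k] += share
            w[i] = 0.0
    # Step 4: subtract the final weights at the failure times.
    surv, out = 1.0, {}
    for (t, is_failure), wi in zip(obs, w):
        if is_failure:
            surv -= wi
            out[t] = surv
    return out

# 2, 2.5+, 3, 3, 4, 4.5+, 5, 6, 7 (True = failure, False = censored)
example = [(2, True), (2.5, False), (3, True), (3, True), (4, True),
           (4.5, False), (5, True), (6, True), (7, True)]
for t, s in redistribute_to_the_right(example).items():
    print(f"S_hat({t}) = {s:.3f}")
```

The resulting values agree with the Kaplan-Meier product, for example (8/9)(5/7) ≈ 0.635 at time 3.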
Properties of the KM estimator
As said before, when there is no censoring, Ŝ(t) is asymptotically normal:
\[ \hat S(t) \sim N\left(S(t),\ S(t)[1 - S(t)]/n\right) \]
The proofs can be done by the usual method (writing the
estimator as a sum of i.i.d. terms plus op(1)), which is
laborious; by counting processes, which is considered more
elegant; or by empirical process methods, which are very
powerful for semiparametric inference.
The KM estimator is also an MLE
Cox and Oakes book Section 4.2 (please read if you have
time) shows that the right-censored data likelihood for such
a discrete distribution can be written as
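\[ L(\lambda) = \prod_{j=1}^{g} \lambda_j^{d_j} (1 - \lambda_j)^{r_j - d_j} \]
Each factor is maximized at λ̂j = dj/rj, which gives
\[ \hat S(t) = \prod_{j:\, a_j \le t} (1 - \hat\lambda_j) = \prod_{j:\, a_j \le t} \left(1 - \frac{d_j}{r_j}\right) \]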
One can often show that an NPMLE behaves like a classic
MLE:
• consistent for the true parameter (function);
• asymptotically normal (converges in distribution to a
Gaussian process).
Greenwood’s formula for variance
Since the λ̂j's are just binomial proportions given the rj's,
\[ \mathrm{Var}(\hat\lambda_j) \approx \frac{\hat\lambda_j (1 - \hat\lambda_j)}{r_j} \]
Also, the λ̂j's are asymptotically independent.
Greenwood’s formula (continued)
Greenwood’s Formula:
\[ \widehat{\mathrm{Var}}(\hat S(t)) = [\hat S(t)]^2 \sum_{j:\,\tau_j \le t} \frac{d_j}{(r_j - d_j)\, r_j} \]
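A minimal sketch of Greenwood's formula in Python follows. The treated-group data are an assumption (the values commonly quoted for Cox and Oakes Table 1.1), and the function name `km_with_greenwood` is illustrative. The sketch assumes rj > dj at every failure time; if the last risk set fails completely, Ŝ(t) drops to 0 and the sum is undefined.

```python
import math

# Treated-group remission times in weeks; 1 = failure, 0 = censored.
# Assumed transcription of Cox and Oakes Table 1.1.
times = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16,
         17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
events = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1,
          0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

def km_with_greenwood(times, events):
    """Return [(tau_j, S_hat, se)] with se from Greenwood's formula."""
    rows, surv, gsum = [], 1.0, 0.0
    for tau in sorted({t for t, e in zip(times, events) if e}):
        r = sum(1 for t in times if t >= tau)
        d = sum(1 for t, e in zip(times, events) if e and t == tau)
        surv *= 1 - d / r
        gsum += d / ((r - d) * r)        # assumes r > d
        rows.append((tau, surv, surv * math.sqrt(gsum)))
    return rows

for tau, surv, se in km_with_greenwood(times, events):
    print(f"tau={tau:2d}  S_hat={surv:.4f}  se={se:.4f}")
```

With this assumed data set the standard errors at weeks 16, 22 and 23 come out close to the 0.1141, 0.1282 and 0.1346 listed in the KM output for the treated group.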
Confidence intervals
[Read] Log-log Approach for Confidence Intervals:
What is A?
• A is 1.96 · se(L̂(t))
• To calculate this, we need to calculate
\[ \mathrm{Var}(\hat L(t)) = \mathrm{Var}\left[\log(-\log \hat S(t))\right] \]
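One way to carry out this calculation is the delta method, which gives Var(L̂(t)) ≈ Σ dj/((rj − dj) rj) / [log Ŝ(t)]², with the sum over τj ≤ t, and the interval [Ŝ(t)^exp(A), Ŝ(t)^exp(−A)]. The sketch below assumes that standard result, since the derivation is not shown in these notes. The (rj, dj) pairs are assumed values for the treated group up to week 17, and `loglog_ci` is an illustrative name.

```python
import math

# (r_j, d_j) at the failure times up to week 17 in the treated group.
# Assumed values, derived from the commonly quoted Table 1.1 data.
risk_and_deaths = [(21, 3), (17, 1), (15, 1), (12, 1), (11, 1)]

def loglog_ci(risk_and_deaths, z=1.96):
    """Return (S_hat, lower, upper) for the log-log confidence interval."""
    surv = math.prod(1 - d / r for r, d in risk_and_deaths)
    gsum = sum(d / ((r - d) * r) for r, d in risk_and_deaths)
    # Delta-method standard error of L_hat = log(-log(S_hat)).
    se_L = math.sqrt(gsum) / abs(math.log(surv))
    A = z * se_L
    # exp(A) > 1 shrinks S_hat (lower limit); exp(-A) < 1 raises it.
    return surv, surv ** math.exp(A), surv ** math.exp(-A)

surv, lower, upper = loglog_ci(risk_and_deaths)
print(f"S_hat={surv:.4f}  95% CI=({lower:.4f}, {upper:.4f})")
```

The limits always stay inside (0, 1), which is the main attraction of the log-log scale, and for these inputs they are close to the 0.3675 and 0.8049 shown at week 17 in the KM listing.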
[Figure: survival curve; x-axis Time (0 to 35), y-axis Survival (0.0 to 1.0)]
R allows different types of CI (some are truncated to be
between 0 and 1):
Mean, Median, Quantiles based on the KM
[Figure legend: a plotting symbol marks a censored observation; E = event]
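As a sketch of how a quantile can be read off the KM curve, the code below uses one common convention (an assumption, not stated in these notes): the p-th quantile is the smallest time at which Ŝ(t) ≤ 1 − p. The rows are the (time, Ŝ) pairs from the KM listing for the treated group; earlier rows are not shown there, but Ŝ(t) is non-increasing, so it is above 0.6275 before week 17. The name `km_quantile` is illustrative.

```python
# (time, S_hat) pairs copied from the KM listing in these notes.
km_rows = [(17, 0.6275), (19, 0.6275), (20, 0.6275), (22, 0.5378),
           (23, 0.4482), (25, 0.4482), (32, 0.4482), (34, 0.4482),
           (35, 0.4482)]

def km_quantile(rows, p):
    """Smallest listed time with S_hat(t) <= 1 - p, or None if never reached."""
    for t, s in rows:
        if s <= 1 - p:
            return t
    return None

median = km_quantile(km_rows, 0.5)
upper_quartile = km_quantile(km_rows, 0.75)
print("median:", median)
print("75th percentile:", upper_quartile)
```

Under this convention the median is 23 weeks, while the 75th percentile is not estimable because the curve never falls to 0.25 within the follow-up period.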
When there is left truncation, the observed data is (Qi, Xi, δi),
i = 1, ..., n.
Now the ‘risk set’ at any time t consists of subjects who have
entered the study, and have not failed or been censored by
that time, i.e. {i : Qi < t ≤ Xi}.
So \( r_j = \sum_{i=1}^{n} I(Q_i < \tau_j \le X_i) \).
We still have
\[ \hat S(t) = \prod_{j:\,\tau_j \le t} \left(1 - \frac{d_j}{r_j}\right) \]
• When min_{1≤i≤n} Qi = t0 > 0, the KM estimates
P(T > t | T > t0).
• The left truncated KM is still an NPMLE (Wang, 1991).
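The modified risk set can be sketched as follows. The six (Qi, Xi, δi) triples are hypothetical values made up for illustration, and `km_left_truncated` is an illustrative name; the only change from the ordinary KM is that a subject counts as at risk at τj only if Qi < τj ≤ Xi.

```python
# Hypothetical left-truncated data: entry time Q, exit time X,
# event indicator delta (1 = failure, 0 = censored).
entry = [1, 1, 2, 3, 3, 5]
exit_ = [4, 6, 5, 7, 8, 9]
event = [1, 0, 1, 1, 0, 1]

def km_left_truncated(entry, exit_, event):
    """Return [(tau_j, r_j, d_j, S_hat)] using risk sets {i: Q_i < tau_j <= X_i}."""
    rows, surv = [], 1.0
    for tau in sorted({x for x, e in zip(exit_, event) if e}):
        r = sum(1 for q, x in zip(entry, exit_) if q < tau <= x)
        d = sum(1 for q, x, e in zip(entry, exit_, event)
                if e and x == tau and q < tau)
        surv *= 1 - d / r
        rows.append((tau, r, d, surv))
    return rows

for tau, r, d, surv in km_left_truncated(entry, exit_, event):
    print(f"tau={tau}  r={r}  d={d}  S_hat={surv:.3f}")
```

In this made-up sample the smallest entry time is t0 = 1, so the output is an estimate of P(T > t | T > 1). Note that the subject entering at time 5 is not yet at risk at the failure time τ = 5, because the risk set requires Qi < τj strictly.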
(2) Nelson-Aalen (Fleming-Harrington) estimator
– Estimating the cumulative hazard
If we can estimate \( \Lambda(t) = \int_0^t \lambda(u)\,du \), the cumulative hazard
at time t, then we can estimate \( S(t) = e^{-\Lambda(t)} \).
[Timeline figure: D C C D D D]
This gives the Nelson-Aalen estimator:
\[ \hat\Lambda_{NA}(t) = \sum_{j:\,\tau_j \le t} \frac{d_j}{r_j} \]
It follows that Λ̂(t), like the KM, changes only at the
observed death (event) times.
Example:
[Timeline: D C C D D D]
rj        n    n    n     n-1   n-1   n-2   n-2   n-3       n-4
dj        0    0    1     0     0     0     0     1         1
cj        0    0    0     0     1     0     1     0         0
λ̂(tj)∆    0    0    1/n   0     0     0     0     1/(n−3)   1/(n−4)
In general, the FH estimator of the survival function should
be close to the Kaplan-Meier estimator, ŜKM (t).
skm sfh
1. .91666667 .9200444
2. .83333333 .8400932
3. .75 .7601478
4. .66666667 .6802101
5. .58333333 .6002833
6. .5 .5203723
7. .41666667 .4404857
8. .33333333 .3606392
9. .25 .2808661
10. .16666667 .2012493
11. .08333333 .1220639
12. 0 .0449048
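The comparison above can be reproduced with a short sketch. The listed values appear consistent with 12 distinct uncensored failure times, so the code assumes that setting, with placeholder times 1, ..., 12; the function name `nelson_aalen_and_km` is illustrative.

```python
import math

def nelson_aalen_and_km(times, events):
    """Return [(tau_j, S_KM, S_FH)] where S_FH = exp(-Nelson-Aalen)."""
    rows, cumhaz, km = [], 0.0, 1.0
    for tau in sorted({t for t, e in zip(times, events) if e}):
        r = sum(1 for t in times if t >= tau)
        d = sum(1 for t, e in zip(times, events) if e and t == tau)
        cumhaz += d / r            # Nelson-Aalen increment d_j / r_j
        km *= 1 - d / r            # Kaplan-Meier factor 1 - d_j / r_j
        rows.append((tau, km, math.exp(-cumhaz)))
    return rows

# Assumed setting: 12 subjects, distinct failure times, no censoring.
times = list(range(1, 13))
events = [1] * 12
for j, (tau, skm, sfh) in enumerate(nelson_aalen_and_km(times, events), 1):
    print(f"{j:2d}.  skm={skm:.8f}  sfh={sfh:.7f}")
```

Since exp(−x) ≥ 1 − x, the FH estimate is never below the KM estimate, and the gap is largest in the right tail where the risk sets are small.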
[Reading:]
(3) The Lifetable or Actuarial Estimator
There are several types of lifetable methods according to the
data sources:
Notation
Year after
Diagnosis    j    dj     cj     rj      r'j = rj − cj/2
[0, 1)       1    456    0      2418    2418.0
[1, 2)       2    226    39     1962    1942.5  (1962 − 39/2)
[2, 3)       3    152    22     1697    1686.0
[3, 4)       4    171    23     1523    1511.5
[4, 5)       5    135    24     1329    1317.0
[5, 6)       6    125    107    1170    1116.5
[6, 7)       7    83     133    938     871.5
etc.
Estimating the survivorship function
Constructing the lifetable
Quantities estimated:
• Conditional probability of dying (event)
q̂j = dj / r'j
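The first seven rows of the notation table are enough to sketch the calculation. The adjusted risk set r'j = rj − cj/2 and q̂j = dj/r'j come from these notes; the survival column uses the usual actuarial product of (1 − q̂l) over intervals up to j, which is an assumption here because the notes do not show that formula. The variable names are illustrative.

```python
# (interval, d_j, c_j, r_j) copied from the notation table.
intervals = [
    ("[0, 1)", 456, 0, 2418),
    ("[1, 2)", 226, 39, 1962),
    ("[2, 3)", 152, 22, 1697),
    ("[3, 4)", 171, 23, 1523),
    ("[4, 5)", 135, 24, 1329),
    ("[5, 6)", 125, 107, 1170),
    ("[6, 7)", 83, 133, 938),
]

lifetable = []
surv = 1.0
for label, d, c, r in intervals:
    r_adj = r - c / 2      # censored subjects count as at risk for half the interval
    q = d / r_adj          # conditional probability of dying in the interval
    surv *= 1 - q          # assumed actuarial estimate of survival to interval end
    lifetable.append((label, r_adj, q, surv))

for label, r_adj, q, surv in lifetable:
    print(f"{label}  r'={r_adj:7.1f}  q_hat={q:.4f}  S_hat={surv:.4f}")
```

The half-count adjustment amounts to assuming that censoring times are spread evenly over each interval.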
Other quantities estimated at the
midpoint of the j-th interval:
[Figure: Estimated Survival (0.0 to 1.0) versus Lower Limit of Time Interval (0 to 1000)]
[Figure: Estimated hazard (0.000 to 0.010) versus Lower Limit of Time Interval (0 to 1000)]