Lect On Estimation of The Survival Function

1) The Kaplan-Meier estimator is a nonparametric method for estimating the survival function from life-time data that may be censored. It calculates the probability of surviving past certain time points by taking into account the number of individuals still at risk at each point. 2) The Kaplan-Meier estimator is calculated as the product of the conditional probabilities of surviving each interval between event times, where the probability is estimated based on the number of deaths and individuals still at risk in each interval. 3) The Kaplan-Meier estimator handles censoring by assuming censored individuals survive up until the time they were last known to be event-free. It provides a consistent estimate of the survival

Uploaded by

Hemangi Kulkarni


Lecture 2

ESTIMATING THE SURVIVAL FUNCTION

— One-sample nonparametric methods

Three methods are commonly used to estimate a survivorship function

S(t) = P(T > t)

without resorting to parametric models:

(1) Kaplan-Meier

(2) Nelson-Aalen or Fleming-Harrington (via estimating the cumulative hazard)

(3) Life-table (Actuarial Estimator)

We will mainly consider the first two.

(1) The Kaplan-Meier Estimator

The Kaplan-Meier (or KM) estimator is probably the most popular approach.

Motivation (no censoring):

Remission times (weeks) for 21 leukemia patients receiving control treatment (Table 1.1 of Cox & Oakes; Freireich et al., 1963):

1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23

We estimate S(10), the probability that an individual survives beyond week 10, by 8/21, since 8 of the 21 remission times exceed 10.

How would you calculate the standard error of the estimated survival?

$$\hat{S}(10) = \hat{P}(T > 10) = \frac{8}{21} = 0.381$$

(Answer: se[Ŝ(10)] = 0.106)

What about Ŝ(8)? Is it 12/21 or 8/21?
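The standard error quoted above is consistent with treating Ŝ(10) as a binomial proportion. The short Python sketch below illustrates that calculation; the function names are illustrative only and do not come from the lecture.

```python
import math

# Remission times (weeks) for the 21 control patients listed above.
control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
           11, 11, 12, 12, 15, 17, 22, 23]

def empirical_survival(times, t):
    """Proportion of observations strictly greater than t."""
    return sum(1 for x in times if x > t) / len(times)

def binomial_se(p, n):
    """Standard error of a binomial proportion p based on n observations."""
    return math.sqrt(p * (1 - p) / n)

s10 = empirical_survival(control, 10)
se10 = binomial_se(s10, len(control))
print(round(s10, 3), round(se10, 3))  # 0.381 0.106

# The two candidate answers for S(8) differ only in how the ties at t = 8 are counted.
n_gt_8 = sum(x > 8 for x in control)    # strictly after week 8
n_ge_8 = sum(x >= 8 for x in control)   # at or after week 8
print(n_gt_8, n_ge_8)  # 8 12
```

With the definition S(t) = P(T > t), the strict count (8 out of 21) is the one that matches Ŝ(8).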

A table of Ŝ(t):

Values of t      Ŝ(t)
t < 1            21/21 = 1.000
1 ≤ t < 2        19/21 = 0.905
2 ≤ t < 3        17/21 = 0.810
3 ≤ t < 4
4 ≤ t < 5
5 ≤ t < 8
8 ≤ t < 11
11 ≤ t < 12
12 ≤ t < 15
15 ≤ t < 17
17 ≤ t < 22
22 ≤ t < 23

In most software packages, the survival function is evaluated just after time t, i.e., at t+. In this case, we only count the individuals with T > t.
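As a rough check on the "just after t" convention, the following Python sketch counts, for each distinct remission time in the control arm, how many patients have T > t. It is only an illustration; the variable names are not from the lecture.

```python
# Remission times (weeks) for the 21 control patients.
control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
           11, 11, 12, 12, 15, 17, 22, 23]
n = len(control)

# Number of patients with T > t, evaluated at each distinct observed time t.
survivors_after = {t: sum(x > t for x in control) for t in sorted(set(control))}

for t, count in survivors_after.items():
    # The estimate printed on each line holds from t up to the next observed time.
    print(f"just after t = {t:>2}: {count:>2}/{n} = {count / n:.3f}")
```

Each printed value is the height of the step function on the interval that starts at that time.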

Figure 1: Example for leukemia data (control arm) [plot of Survival against Time]

Empirical Survival Function:

When there is no censoring, the general formula is:

$$S_n(t) = \frac{\#\ \text{individuals with } T > t}{\text{total sample size}} = \frac{\sum_{i=1}^{n} I(T_i > t)}{n}$$

Note that $F_n(t) = 1 - S_n(t)$ is the empirical CDF.

Also $I(T_i > t) \sim \text{Bernoulli}(S(t))$, so that

1. $S_n(t)$ converges in probability to $S(t)$ (consistency);

2. $\sqrt{n}\,\{S_n(t) - S(t)\} \to N(0,\ S(t)[1 - S(t)])$ in distribution.

[Make sure that you know these.]

What if there is censoring?

Consider the treated group from Table 1.1 of Cox and Oakes:

6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+

[Note: times with + are right censored]

We know S(5) = 21/21, because everyone survived past week 5. But we can’t say S(7) = 17/21, because we don’t know the status of the person who was censored at time 6.

In a 1958 paper in the Journal of the American Statistical Association, Kaplan and Meier proposed a way to estimate S(t) nonparametrically, even in the presence of censoring. The method is based on the ideas of conditional probability.

[Reading:]
A quick review of conditional probability

Conditional Probability: Suppose A and B are two events. Then,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Multiplication law of probability: can be obtained from the above relationship, by multiplying both sides by P(B):

$$P(A \cap B) = P(A \mid B)\, P(B)$$

Extension to more than 2 events:

Suppose $A_1, A_2, \ldots, A_k$ are k different events. Then, the probability of all k events happening together can be written as a product of conditional probabilities:

$$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_k \mid A_{k-1} \cap \cdots \cap A_1) \times P(A_{k-1} \mid A_{k-2} \cap \cdots \cap A_1) \times \cdots \times P(A_2 \mid A_1) \times P(A_1)$$

Now, let’s apply these ideas to estimate S(t):

– Intuition behind the Kaplan-Meier Estimator

Think of dividing the observed timespan of the study into a series of fine intervals so that there is a separate interval for each time of death or censoring:

D C C D D D   (D = death, C = censoring)

Using the law of conditional probability,

$$P(T > t) = \prod_j P(\text{survive } j\text{-th interval } I_j \mid \text{survived to start of } I_j) = \prod_j \lambda_j$$

where the product is taken over all the intervals preceding time t. (In this display, $\lambda_j$ denotes the conditional probability of surviving the j-th interval.)

4 possibilities for each interval:

(1) No death or censoring - conditional probability of surviving the interval is estimated to be 1;

(2) Censoring - assume they survive to the end of the interval (the intervals are very small), so that the conditional probability of surviving the interval is again estimated to be 1;

(3) Death, but no censoring - conditional probability of not surviving the interval is estimated by # deaths (d) divided by # ‘at risk’ (r) at the beginning of the interval. So the estimated conditional probability of surviving the interval is 1 − d/r;

(4) Tied deaths and censoring - assume censorings last to the end of the interval, so that the estimated conditional probability of surviving the interval is still 1 − d/r.

General Formula for jth interval:

It turns out we can write a general formula for the estimated conditional probability of surviving the j-th interval that holds for all 4 cases:

$$1 - \frac{d_j}{r_j}$$

As the intervals get finer and finer, the approximations made in estimating the probabilities of getting through each interval become more and more accurate, and in the limit the estimator converges in probability to the true S(t) (proof not shown here).

This intuition explains why an alternative name for the KM is the product-limit estimator.

The Kaplan-Meier estimator of the survivorship function (or survival probability) S(t) = P(T > t) is:

$$\hat{S}(t) = \prod_{j:\tau_j \le t} \frac{r_j - d_j}{r_j} = \prod_{j:\tau_j \le t} \left(1 - \frac{d_j}{r_j}\right)$$

where
• $\tau_1, \ldots, \tau_K$ is the set of K distinct uncensored failure times observed in the sample
• $d_j$ is the number of failures at $\tau_j$
• $r_j$ is the number of individuals “at risk” right before the j-th failure time (everyone who died or was censored at or after that time).

Furthermore, let $c_j$ be the number of censored observations between the j-th and (j + 1)-st failure times. Any censorings tied at $\tau_j$ are included in $c_j$, but not censorings tied at $\tau_{j+1}$.

Note: two useful formulas are:

(1) $r_j = r_{j-1} - d_{j-1} - c_{j-1}$

(2) $r_j = \sum_{l \ge j} (c_l + d_l)$

Calculating the KM - leukemia treated group

Make a table with a row for every death or censoring time:

τj    dj    cj    rj    1 − (dj/rj)       Ŝ(τj)
6     3     1     21    18/21 = 0.857
7     1     0     17
9     0     1     16
10
11
13
16
17
19
20
22
23

Note that:

• Ŝ(t) only changes at death (failure) times;
• Ŝ(t) is 1 up to the first death time;
• Ŝ(t) only goes to 0 if the last observation is uncensored;
• When there is no censoring, the KM estimator equals the empirical survival estimate.
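The table can be filled in mechanically from the product-limit formula. The following is a minimal Python sketch, not the lecture's own code; the function name `kaplan_meier` and the string encoding of the data (a trailing + for censored times) are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return (tau_j, d_j, r_j, S_hat) for each distinct uncensored failure time.

    times  : observed times X_i = min(T_i, C_i)
    events : 1 if the time is an observed failure, 0 if right censored
    """
    rows = []
    surv = 1.0
    for tau in sorted({t for t, e in zip(times, events) if e == 1}):
        r = sum(1 for t in times if t >= tau)  # at risk just before tau
        d = sum(1 for t, e in zip(times, events) if t == tau and e == 1)
        surv *= 1 - d / r                      # multiply in the factor 1 - d_j / r_j
        rows.append((tau, d, r, surv))
    return rows

# Treated group from Table 1.1 of Cox and Oakes; a trailing + marks a censored time.
raw = ("6 6 6 6+ 7 9+ 10 10+ 11+ 13 16 17+ "
       "19+ 20+ 22 23 25+ 32+ 32+ 34+ 35+").split()
times = [int(x.rstrip("+")) for x in raw]
events = [0 if x.endswith("+") else 1 for x in raw]

km = kaplan_meier(times, events)
for tau, d, r, s in km:
    print(f"{tau:>3} {d:>2} {r:>3} {s:.4f}")
```

Rows are produced only at failure times, so the censoring-only times (9, 11, 17, ...) do not appear; the estimate is unchanged there.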

Figure 2: KM plot for treated leukemia patients [plot of Survival against Time]

Output from a software KM Estimator:

failure time:   weeks
failure/censor: remiss

         Beg.          Net   Survivor    Std.
 Time   Total   Fail   Lost  Function    Error    [95% Conf. Int.]
-------------------------------------------------------------------
    6     21      3      1    0.8571    0.0764    0.6197   0.9516
    7     17      1      0    0.8067    0.0869    0.5631   0.9228
    9     16      0      1    0.8067    0.0869    0.5631   0.9228
   10     15      1      1    0.7529    0.0963    0.5032   0.8894
   11     13      0      1    0.7529    0.0963    0.5032   0.8894
   13     12      1      0    0.6902    0.1068    0.4316   0.8491
   16     11      1      0    0.6275    0.1141    0.3675   0.8049
   17     10      0      1    0.6275    0.1141    0.3675   0.8049
   19      9      0      1    0.6275    0.1141    0.3675   0.8049
   20      8      0      1    0.6275    0.1141    0.3675   0.8049
   22      7      1      0    0.5378    0.1282    0.2678   0.7468
   23      6      1      0    0.4482    0.1346    0.1881   0.6801
   25      5      0      1    0.4482    0.1346    0.1881   0.6801
   32      4      0      2    0.4482    0.1346    0.1881   0.6801
   34      2      0      1    0.4482    0.1346    0.1881   0.6801
   35      1      0      1    0.4482    0.1346    0.1881   0.6801

(Note: the above is from Stata, but most software produces similar KM tables.)

[Reading:] Redistribution to the right algorithm (Efron, 1967)

There is another way to compute the KM estimator.

In the absence of censoring, Ŝ(t) is just the proportion of individuals with T > t. The idea behind Efron’s approach is to spread the contributions of censored observations out over all the possible times to their right.

Algorithm:
• Step (1): Arrange the n observation times (failures or censorings) in increasing order. If there are ties, put censored after failures.
• Step (2): Assign weight (1/n) to each time.
• Step (3): Moving from left to right, each time you encounter a censored observation, distribute its mass equally to all times to its right.
• Step (4): Calculate $\hat{S}_j$ by subtracting the final weight for time j from $\hat{S}_{j-1}$.

Example of “redistribute to the right” algorithm

Consider the following event times:

2, 2.5+, 3, 3, 4, 4.5+, 5, 6, 7

The algorithm goes as follows:

(Step 1)                                     (Step 4)
Times    Step 2        Step 3a    Step 3b    Ŝ(τj)
2        1/9 = 0.11                          0.889
2.5+     1/9 = 0.11    0                     0.889
3        2/9 = 0.22    0.25                  0.635
4        1/9 = 0.11    0.13                  0.508
4.5+     1/9 = 0.11    0.13       0          0.508
5        1/9 = 0.11    0.13       0.17       0.339
6        1/9 = 0.11    0.13       0.17       0.169
7        1/9 = 0.11    0.13       0.17       0.000

This comes out the same as the product-limit approach.
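The four steps can be written out directly. The Python sketch below is an illustrative implementation under the assumption that the input is already sorted as in Step (1); the function name is not from Efron's paper or the lecture.

```python
def redistribute_to_right(events):
    """Final mass at each sorted observation after redistributing censored mass.

    events : 1 for a failure, 0 for a censored observation, in time order
             (censored observations placed after failures at tied times).
    """
    n = len(events)
    mass = [1.0 / n] * n                      # Step 2: equal starting weights
    for i in range(n):
        if events[i] == 0 and i < n - 1:      # Step 3: censored, with times to its right
            share = mass[i] / (n - 1 - i)
            for k in range(i + 1, n):
                mass[k] += share
            mass[i] = 0.0
    return mass

# Example data from above: 2, 2.5+, 3, 3, 4, 4.5+, 5, 6, 7
times = [2, 2.5, 3, 3, 4, 4.5, 5, 6, 7]
events = [1, 0, 1, 1, 1, 0, 1, 1, 1]

mass = redistribute_to_right(events)

# Step 4: subtract the final mass at each failure from the running survival estimate.
surv = 1.0
curve = {}
for t, e, m in zip(times, events, mass):
    if e == 1:
        surv -= m
        curve[t] = surv

for t, s in curve.items():
    print(t, f"{abs(s):.3f}")
```

If the largest observation is censored, its mass has nowhere to go and stays put in this sketch, which matches the earlier remark that Ŝ(t) reaches 0 only when the last observation is uncensored.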

Properties of the KM estimator

When there is no censoring, the KM estimator is the same as the empirical estimator:

$$\hat{S}(t) = \frac{\#\ \text{deaths after time } t}{n}$$

where n is the number of individuals in the study.

As said before,

$$\hat{S}(t) \overset{\text{asymp.}}{\sim} N\big(S(t),\ S(t)[1 - S(t)]/n\big)$$

How does censoring affect this?

• Ŝ(t) is still consistent for the true S(t);
• Ŝ(t) is still asymptotically normal;
• The variance is more complicated.

The proofs can be done using the usual method (by writing the estimator as a sum of i.i.d. terms plus $o_p(1)$), but this is laborious. They can also be done using counting processes, which is considered more elegant, or by empirical process methods, which are very powerful for semiparametric inference.

The KM estimator is also an MLE

See Section 4.2 of the Cox and Oakes book.

Here we need to think of the distribution function F(t) as an (infinite dimensional) parameter, and we try to find the F̂ (or Ŝ = 1 − F̂) that maximizes a nonparametric likelihood. Such an MLE is called an NPMLE.

As it turns out, such an F̂(t) has to be discrete in order for the likelihood to be bounded (otherwise the MLE does not exist), with masses only at the uncensored failure times $\{a_j\}_j$ ($a_j = \tau_j$ in our previous notation).

Ex. Show the following for an absolutely continuous distribution function F(t): a) for a random sample of size n from F(t), if we allow an estimator of F(t) to be absolutely continuous (i.e. having a positive density) on any open interval covering the observed data point(s), then the likelihood function can be made arbitrarily large; b) if we restrict to those estimates of F(t) such that the likelihood function is bounded, the maximum of the likelihood is achieved when we assign point mass of 1/n to each observed data point. This shows that the empirical CDF is the NPMLE.

Section 4.2 of the Cox and Oakes book (please read if you have time) shows that the right-censored data likelihood for such a discrete distribution can be written as

$$L(\lambda) = \prod_{j=1}^{g} \lambda_j^{d_j} (1 - \lambda_j)^{r_j - d_j}$$

where $\lambda_j$ is the discrete hazard (i.e. conditional probability) at $a_j$. Note that the likelihood is the same as that of g independent binomials.

Therefore, the maximum likelihood estimator of $\lambda_j$ is (why):

$$\hat{\lambda}_j = d_j / r_j$$

For a discrete survival distribution

$$S(t) = \prod_{j: a_j \le t} (1 - \lambda_j)$$

Now we plug in the MLE’s of λ to estimate S(t) (why):

$$\hat{S}(t) = \prod_{j: a_j \le t} (1 - \hat{\lambda}_j) = \prod_{j: a_j \le t} \left(1 - \frac{d_j}{r_j}\right)$$

This is the NPMLE of S.
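One way to answer the two "(why)" prompts is sketched below. It is a standard binomial argument written out for the likelihood above, offered as a hint rather than as the derivation in Cox and Oakes.

```latex
% Log-likelihood: the terms separate over j, one binomial piece per failure time.
\log L(\lambda) = \sum_{j=1}^{g} \left[ d_j \log \lambda_j + (r_j - d_j) \log(1 - \lambda_j) \right]

% Score equation for each \lambda_j:
\frac{\partial \log L}{\partial \lambda_j}
  = \frac{d_j}{\lambda_j} - \frac{r_j - d_j}{1 - \lambda_j} = 0
  \quad\Longrightarrow\quad \hat{\lambda}_j = \frac{d_j}{r_j}

% The second derivative is negative, so this is a maximum:
\frac{\partial^2 \log L}{\partial \lambda_j^2}
  = -\frac{d_j}{\lambda_j^2} - \frac{r_j - d_j}{(1 - \lambda_j)^2} < 0

% Invariance of the MLE: S(t) is a function of the \lambda_j, so plug in \hat{\lambda}_j.
\hat{S}(t) = \prod_{j: a_j \le t} (1 - \hat{\lambda}_j)
```

The plug-in step uses only the invariance property of maximum likelihood estimators under reparametrization.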

One can often show that an NPMLE behaves like a classic MLE:
• consistent for the true parameter (function);
• asymptotically normal (converges in distribution to a Gaussian process).

For a semiparametric model (that we’ll talk about later), it is often semiparametrically efficient.

Greenwood’s formula for variance

Note that the KM estimator is

$$\hat{S}(t) = \prod_{j:\tau_j \le t} (1 - \hat{\lambda}_j)$$

where $\hat{\lambda}_j = d_j / r_j$.

Since the $\hat{\lambda}_j$’s are just binomial proportions given the $r_j$’s,

$$\mathrm{Var}(\hat{\lambda}_j) \approx \frac{\hat{\lambda}_j (1 - \hat{\lambda}_j)}{r_j}$$

Also, the $\hat{\lambda}_j$’s are asymptotically independent.

Since Ŝ(t) is a function of the $\lambda_j$’s, we can estimate its variance using the Delta method.

Delta method: If $Y_n$ is (asymptotically) normal with mean $\mu$ and variance $\sigma^2$, g is differentiable and $g'(\mu) \ne 0$, then $g(Y_n)$ is approximately normally distributed with mean $g(\mu)$ and variance $[g'(\mu)]^2 \sigma^2$.

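Greenwood’s formula (continued)

Instead of dealing with Ŝ(t) directly, we will look at its log (why?):

$$\log[\hat{S}(t)] = \sum_{j:\tau_j \le t} \log(1 - \hat{\lambda}_j)$$

Thus, by approximate independence of the $\hat{\lambda}_j$’s,

$$
\begin{aligned}
\widehat{\mathrm{Var}}(\log[\hat{S}(t)])
 &= \sum_{j:\tau_j \le t} \widehat{\mathrm{Var}}[\log(1 - \hat{\lambda}_j)] \\
 &= \sum_{j:\tau_j \le t} \left(\frac{1}{1 - \hat{\lambda}_j}\right)^2 \widehat{\mathrm{Var}}(\hat{\lambda}_j) \\
 &= \sum_{j:\tau_j \le t} \left(\frac{1}{1 - \hat{\lambda}_j}\right)^2 \frac{\hat{\lambda}_j (1 - \hat{\lambda}_j)}{r_j} \\
 &= \sum_{j:\tau_j \le t} \frac{\hat{\lambda}_j}{(1 - \hat{\lambda}_j)\, r_j} \\
 &= \sum_{j:\tau_j \le t} \frac{d_j}{(r_j - d_j)\, r_j}
\end{aligned}
$$

Now, Ŝ(t) = exp[log[Ŝ(t)]]. Using the Delta method once again,

$$\widehat{\mathrm{Var}}(\hat{S}(t)) = [\hat{S}(t)]^2 \cdot \widehat{\mathrm{Var}}\big(\log[\hat{S}(t)]\big)$$

Greenwood’s Formula:

$$\widehat{\mathrm{Var}}(\hat{S}(t)) = [\hat{S}(t)]^2 \sum_{j:\tau_j \le t} \frac{d_j}{(r_j - d_j)\, r_j}$$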
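To see Greenwood's formula in action, the Python sketch below accumulates the sum alongside the KM product for the treated leukemia group. The function name and the handling of the r_j = d_j case are choices made here for illustration, not part of the lecture.

```python
import math

def km_with_greenwood(times, events):
    """Return (tau_j, S_hat, se) per failure time, with se from Greenwood's formula."""
    rows = []
    surv, gsum = 1.0, 0.0
    for tau in sorted({t for t, e in zip(times, events) if e == 1}):
        r = sum(1 for t in times if t >= tau)
        d = sum(1 for t, e in zip(times, events) if t == tau and e == 1)
        surv *= 1 - d / r
        if r > d:
            gsum += d / ((r - d) * r)   # running Greenwood sum
            se = surv * math.sqrt(gsum)
        else:
            se = float("nan")           # r == d: S_hat = 0 and the term is undefined
        rows.append((tau, surv, se))
    return rows

# Treated group from Table 1.1 of Cox and Oakes; a trailing + marks a censored time.
raw = ("6 6 6 6+ 7 9+ 10 10+ 11+ 13 16 17+ "
       "19+ 20+ 22 23 25+ 32+ 32+ 34+ 35+").split()
times = [int(x.rstrip("+")) for x in raw]
events = [0 if x.endswith("+") else 1 for x in raw]

rows = km_with_greenwood(times, events)
for tau, s, se in rows:
    print(f"{tau:>3} {s:.4f} {se:.4f}")
```

The standard errors agree with the Std. Error column of the Stata output shown earlier (0.0764 at week 6, 0.1346 at week 23).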

Confidence intervals

For a 95% confidence interval, we could use

$$\hat{S}(t) \pm z_{1-\alpha/2}\ \mathrm{se}[\hat{S}(t)]$$

where se[Ŝ(t)] is calculated using Greenwood’s formula. Here t is fixed, and this is referred to as a pointwise confidence interval.

Problem: This approach can yield values > 1 or < 0.

Better approach: Get a 95% confidence interval for

L(t) = log(− log(S(t)))

Since this quantity is unrestricted, the confidence interval will be in the right range when we transform back:

S(t) = exp(− exp(L(t))).

[ To see why this works:

0 ≤ Ŝ(t) ≤ 1
−∞ ≤ log[Ŝ(t)] ≤ 0
0 ≤ − log[Ŝ(t)] ≤ ∞
−∞ ≤ log(− log[Ŝ(t)]) ≤ ∞
]

[Read] Log-log Approach for Confidence Intervals:

(1) Define L(t) = log(− log(S(t)))

(2) Form a 95% confidence interval for L(t) based on L̂(t), yielding [L̂(t) − A, L̂(t) + A]

(3) Since S(t) = exp(− exp(L(t))), the confidence bounds for the 95% CI of S(t) are:

$$\left[\exp\{-e^{\hat{L}(t) + A}\},\ \exp\{-e^{\hat{L}(t) - A}\}\right]$$

(note that the upper and lower bounds switch)

(4) Substituting L̂(t) = log(− log(Ŝ(t))) back into the above bounds, we get confidence bounds of

$$\left[\hat{S}(t)^{\,e^{A}},\ \hat{S}(t)^{\,e^{-A}}\right]$$

What is A?

• A is 1.96 · se(L̂(t))
• To calculate this, we need to calculate

$$\mathrm{Var}(\hat{L}(t)) = \mathrm{Var}\big[\log(-\log(\hat{S}(t)))\big]$$

• From our previous calculations, we know

$$\widehat{\mathrm{Var}}(\log[\hat{S}(t)]) = \sum_{j:\tau_j \le t} \frac{d_j}{(r_j - d_j)\, r_j}$$

• Applying the delta method again, we get:

$$\widehat{\mathrm{Var}}(\hat{L}(t)) = \widehat{\mathrm{Var}}\big(\log(-\log[\hat{S}(t)])\big) = \frac{1}{[\log \hat{S}(t)]^2} \sum_{j:\tau_j \le t} \frac{d_j}{(r_j - d_j)\, r_j}$$

• We take the square root of the above to get se(L̂(t)).
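A small numerical check of the log-log interval, using only the first failure time of the treated group (r = 21, d = 3), might look like the Python sketch below. The function name is illustrative, and z = 1.96 is used as in the text; software that uses the exact normal quantile will differ slightly in the later decimals.

```python
import math

def loglog_ci(s_hat, greenwood_sum, z=1.96):
    """Pointwise CI for S(t) on the log(-log) scale; requires 0 < s_hat < 1."""
    log_s = math.log(s_hat)
    se_L = math.sqrt(greenwood_sum) / abs(log_s)   # se of L_hat(t) by the delta method
    a = z * se_L                                   # this is A in the text
    return s_hat ** math.exp(a), s_hat ** math.exp(-a)

# First failure time of the treated group: 3 failures among 21 at risk.
r, d = 21, 3
s_hat = 1 - d / r
greenwood_sum = d / ((r - d) * r)

lower, upper = loglog_ci(s_hat, greenwood_sum)
print(round(lower, 4), round(upper, 4))  # 0.6197 0.9516
```

These match the interval reported for week 6 in the Stata output, and both bounds stay inside [0, 1] by construction.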

Figure 3: KM Survival Estimate and Confidence intervals (type ‘log-log’) [plot of Survival against Time]

• Different software might use different approaches to calculate the CI.

Software for Kaplan-Meier Curves

• Stata - stset and sts commands
• SAS - proc lifetest
• R - survfit() in ‘survival’ package

R allows different types of CI (some are truncated to be
between 0 and 1):

95 percent confidence interval is of type "log"

time n.risk n.event survival std.dev lower 95% CI upper 95% CI
6 21 3 0.8571429 0.07636035 0.7198171 1.0000000
7 17 1 0.8067227 0.08693529 0.6531242 0.9964437
10 15 1 0.7529412 0.09634965 0.5859190 0.9675748
13 12 1 0.6901961 0.10681471 0.5096131 0.9347692
16 11 1 0.6274510 0.11405387 0.4393939 0.8959949
22 7 1 0.5378151 0.12823375 0.3370366 0.8582008
23 6 1 0.4481793 0.13459146 0.2487882 0.8073720

95 percent confidence interval is of type "log-log"

time n.risk n.event survival std.dev lower 95% CI upper 95% CI
6 21 3 0.8571429 0.07636035 0.6197180 0.9515517
7 17 1 0.8067227 0.08693529 0.5631466 0.9228090
10 15 1 0.7529412 0.09634965 0.5031995 0.8893618
13 12 1 0.6901961 0.10681471 0.4316102 0.8490660
16 11 1 0.6274510 0.11405387 0.3675109 0.8049122
22 7 1 0.5378151 0.12823375 0.2677789 0.7467907
23 6 1 0.4481793 0.13459146 0.1880520 0.6801426

95 percent confidence interval is of type "plain"

time n.risk n.event survival std.dev lower 95% CI upper 95% CI
6 21 3 0.8571429 0.07636035 0.7074793 1.0000000
7 17 1 0.8067227 0.08693529 0.6363327 0.9771127
10 15 1 0.7529412 0.09634965 0.5640993 0.9417830
13 12 1 0.6901961 0.10681471 0.4808431 0.8995491
16 11 1 0.6274510 0.11405387 0.4039095 0.8509924
22 7 1 0.5378151 0.12823375 0.2864816 0.7891487
23 6 1 0.4481793 0.13459146 0.1843849 0.7119737

Mean, Median, Quantiles based on the KM

• Mean is not well estimated with censored data, since we often don’t observe the right tail.

• Median - by definition, this is the time, τ, such that S(τ) = 0.5. In practice, it is often defined as the smallest time such that Ŝ(τ) ≤ 0.5. The median is more appropriate for censored survival data than the mean.

For the treated leukemia patients, we find:

Ŝ(22) = 0.5378
Ŝ(23) = 0.4482

The median is thus 23.

• Lower quartile (25th percentile):

the smallest time (LQ) such that Ŝ(LQ) ≤ 0.75

• Upper quartile (75th percentile):

the smallest time (UQ) such that Ŝ(UQ) ≤ 0.25
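These definitions translate into a simple search over the steps of the KM curve. The Python sketch below uses the survivor function values from the Stata output for the treated group; the function name `km_quantile` is made up for this illustration.

```python
# (failure time, S_hat) pairs for the treated group, taken from the KM output above.
km_steps = [(6, 0.8571), (7, 0.8067), (10, 0.7529), (13, 0.6902),
            (16, 0.6275), (22, 0.5378), (23, 0.4482)]

def km_quantile(steps, p):
    """Smallest failure time tau with S_hat(tau) <= 1 - p, or None if never reached."""
    for tau, s in steps:
        if s <= 1 - p:
            return tau
    return None

median = km_quantile(km_steps, 0.50)
lower_q = km_quantile(km_steps, 0.25)
upper_q = km_quantile(km_steps, 0.75)
print(median, lower_q, upper_q)  # 23 13 None
```

The upper quartile comes back as None because the curve never drops to 0.25: the largest observations are censored, so that quantile cannot be estimated from these data.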

From class discussion: 1) how do we bootstrap survival data? 2) how do we estimate the distribution of C? Recall that the observed data are $(X_i, \delta_i)$, where $X_i = \min(T_i, C_i)$, $\delta_i = I(T_i \le C_i)$, i = 1, ..., n.

Left truncated KM estimate

[Figure: follow-up diagram under left truncation; the legend distinguishes censored observations from events (E = event)]

When there is left truncation, the observed data are $(Q_i, X_i, \delta_i)$, i = 1, ..., n.

Now the ‘risk set’ at any time t consists of subjects who have entered the study, and have not failed or been censored by that time, i.e. $\{i : Q_i < t \le X_i\}$.

So $r_j = \sum_{i=1}^{n} I(Q_i < \tau_j \le X_i)$.

We still have

$$\hat{S}(t) = \prod_{j:\tau_j \le t} \left(1 - \frac{d_j}{r_j}\right)$$

where $\tau_1, \ldots, \tau_K$ is the set of K distinct uncensored failure times observed in the sample, $d_j$ is the number of failures at $\tau_j$, and $r_j$ is the number of individuals “at risk” right before the j-th failure time (everyone who had entered and who died or was censored at or after that time).
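The only change from the ordinary KM computation is the risk set. The Python sketch below makes that concrete on a tiny made-up data set; the five subjects, their entry times Q, and the function name are all hypothetical and chosen only so the arithmetic is easy to follow. It assumes Q_i < X_i for every subject.

```python
def left_truncated_km(entry, exit_, event):
    """KM with delayed entry: subject i is at risk at tau only if Q_i < tau <= X_i."""
    rows = []
    surv = 1.0
    for tau in sorted({x for x, e in zip(exit_, event) if e == 1}):
        r = sum(1 for q, x in zip(entry, exit_) if q < tau <= x)   # risk set size
        d = sum(1 for x, e in zip(exit_, event) if x == tau and e == 1)
        surv *= 1 - d / r
        rows.append((tau, r, d, surv))
    return rows

# Hypothetical data (not from the lecture): entry time Q, exit time X, event indicator.
Q = [0, 0, 2, 3, 5]
X = [4, 6, 7, 5, 9]
delta = [1, 0, 1, 1, 1]

rows = left_truncated_km(Q, X, delta)
for tau, r, d, s in rows:
    print(tau, r, d, round(s, 3))
```

Note that the fifth subject, who enters at time 5, is not counted in the risk sets at times 4 and 5, even though that subject's exit time is later than both.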

Read [required] Tsai, Jewell and Wang (1987).

• When $\min_{i=1}^{n} Q_i = t_0 > 0$, the KM estimates $P(T > t \mid T > t_0)$.
• The left truncated KM is still an NPMLE (Wang, 1991).
• Greenwood’s formula for the variance still applies.
• In R (and most other software) it is handled by something like ‘Surv(time=Q, time2=X, event=δ)’.
• This approach is referred to as conditional inference, i.e. conditional on the Q’s. This is implemented in most survival software.
• Another approach is unconditional inference, where assumptions are made about the distribution of the Q’s, e.g. uniform, in which case this is referred to as ‘length biased’.

(2) Nelson-Aalen (Fleming-Harrington) estimator
– Estimating the cumulative hazard

If we can estimate $\Lambda(t) = \int_0^t \lambda(u)\,du$, the cumulative hazard at time t, then we can estimate $S(t) = e^{-\Lambda(t)}$.

Just as we did for the KM, think of dividing the observed time span of the study into a series of fine intervals so that there is only one event per interval:

D C C D D D

Λ(t) can then be approximated by a sum:

$$\Lambda(t) \approx \sum_{j:\tau_j \le t} \tilde{\lambda}_j \cdot \Delta_j$$

where the sum is over intervals up to t, $\tilde{\lambda}_j$ is the value of the hazard in the j-th interval and $\Delta_j$ is the width of that interval.

Since $\tilde{\lambda}\Delta$ is approximately the conditional probability of dying in the interval, we can further estimate $\tilde{\lambda}_j \cdot \Delta_j$ by $d_j / r_j$.

This gives the Nelson-Aalen estimator:

$$\hat{\Lambda}_{NA}(t) = \sum_{j:\tau_j \le t} d_j / r_j .$$

It follows that Λ̂(t), like the KM, changes only at the observed death (event) times.

Example:

D C C D D D

rj          n    n    n      n-1    n-1    n-2    n-2    n-3        n-4
dj          0    0    1      0      0      0      0      1          1
cj          0    0    0      0      1      0      1      0          0
λ̂(tj)∆     0    0    1/n    0      0      0      0      1/(n−3)    1/(n−4)
Λ̂(tj)      0    0    1/n    1/n    1/n    1/n    1/n

Once we have $\hat{\Lambda}_{NA}(t)$, we can obtain the Fleming-Harrington estimator of S(t):

$$\hat{S}_{FH}(t) = \exp(-\hat{\Lambda}_{NA}(t)).$$
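For a concrete comparison, the Python sketch below computes the Nelson-Aalen cumulative hazard, the Fleming-Harrington survival estimate, and the KM estimate side by side for the treated leukemia group. It is an illustrative loop written for this note, not output from any package.

```python
import math

# Treated group from Table 1.1 of Cox and Oakes; a trailing + marks a censored time.
raw = ("6 6 6 6+ 7 9+ 10 10+ 11+ 13 16 17+ "
       "19+ 20+ 22 23 25+ 32+ 32+ 34+ 35+").split()
times = [int(x.rstrip("+")) for x in raw]
events = [0 if x.endswith("+") else 1 for x in raw]

cum_hazard, km = 0.0, 1.0
rows = []
for tau in sorted({t for t, e in zip(times, events) if e == 1}):
    r = sum(1 for t in times if t >= tau)                          # at risk just before tau
    d = sum(1 for t, e in zip(times, events) if t == tau and e == 1)
    cum_hazard += d / r            # Nelson-Aalen increment d_j / r_j
    km *= 1 - d / r                # Kaplan-Meier factor 1 - d_j / r_j
    rows.append((tau, cum_hazard, math.exp(-cum_hazard), km))

for tau, na, fh, s_km in rows:
    print(f"{tau:>3}  NA={na:.4f}  FH={fh:.4f}  KM={s_km:.4f}")
```

Because exp(−x) ≥ 1 − x for every x, each factor of the FH estimate is at least as large as the matching KM factor, so the FH curve never falls below the KM curve.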

In general, the FH estimator of the survival function should be close to the Kaplan-Meier estimator, $\hat{S}_{KM}(t)$.

We can compare the Fleming-Harrington survival estimate to the KM estimate using a subgroup of the nursing home data:

skm sfh
1. .91666667 .9200444
2. .83333333 .8400932
3. .75 .7601478
4. .66666667 .6802101
5. .58333333 .6002833
6. .5 .5203723
7. .41666667 .4404857
8. .33333333 .3606392
9. .25 .2808661
10. .16666667 .2012493
11. .08333333 .1220639
12. 0 .0449048

In this example, the Fleming-Harrington estimator is slightly higher than the KM at every time point (this is always the direction of the difference, because exp(−x) ≥ 1 − x), but with larger datasets the two will typically be much closer.
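The skm column drops by exactly 1/12 at each step, which is what one would see with 12 subjects failing at distinct times and no censoring. Under that assumption (inferred from the numbers, not stated in the lecture), the table can be reproduced with a few lines of Python:

```python
import math

n = 12            # assumed: 12 subjects, distinct failure times, no censoring
cum_hazard = 0.0
rows = []
for k in range(1, n + 1):
    at_risk = n - k + 1            # r_k just before the k-th failure
    cum_hazard += 1 / at_risk      # Nelson-Aalen increment with d_k = 1
    skm = 1 - k / n                # KM reduces to the empirical survival function
    sfh = math.exp(-cum_hazard)    # Fleming-Harrington estimate
    rows.append((skm, sfh))

for k, (skm, sfh) in enumerate(rows, start=1):
    print(f"{k:>2}. {skm:.8f} {sfh:.7f}")
```

The last row shows the clearest difference: the KM estimate reaches 0 at the final failure, while the FH estimate stays strictly positive.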

Question: do you think that the two estimators are asymptotically equivalent?
Note: We can also go the other way: we can take the
Kaplan-Meier estimate of S(t), and use it to calculate an
alternative estimate of the cumulative hazard function:
Λ̂KM (t) = − log ŜKM (t)

[Reading:]
(3) The Lifetable or Actuarial Estimator

• one of the oldest techniques around
• used by actuaries, demographers, etc.
• applies when the data are grouped

Our goal is still to estimate the survival function, hazard, and density function, but this is sometimes complicated by the fact that we don’t know exactly when during a time interval an event occurs.

There are several types of lifetable methods according to the data sources:

Population Life Tables

• cohort life table - describes the mortality experience from birth to death for a particular cohort of people born at about the same time. People at risk at the start of the interval are those who survived the previous interval.
• current life table - constructed from (1) census information on the number of individuals alive at each age, for a given year and (2) vital statistics on the number of deaths or failures in a given year, by age. This type of lifetable is often reported in terms of a hypothetical cohort of 100,000 people.

Generally, censoring is not an issue for Population Life Tables.

Clinical Life tables - apply to grouped survival data from studies in patients with specific diseases. Because patients can enter the study at different times, or be lost to follow-up, censoring must be allowed.

Notation

• the j-th time interval is $[t_{j-1}, t_j)$
• $c_j$ - the number of censorings in the j-th interval
• $d_j$ - the number of failures in the j-th interval
• $r_j$ is the number entering the interval

Example: 2418 Males with Angina Pectoris (chest pain, from book by Lee, p.91)

Year after
Diagnosis    j    dj     cj     rj      r′j = rj − cj/2
[0, 1)       1    456      0    2418    2418.0
[1, 2)       2    226     39    1962    1942.5  (1962 − 39/2)
[2, 3)       3    152     22    1697    1686.0
[3, 4)       4    171     23    1523    1511.5
[4, 5)       5    135     24    1329    1317.0
[5, 6)       6    125    107    1170    1116.5
[6, 7)       7     83    133     938     871.5
etc.

Estimating the survivorship function

If we apply the KM formula directly to the numbers in the table above, estimating S(t) as

$$\hat{S}(t) = \prod_{j:\tau_j < t} \left(1 - \frac{d_j}{r_j}\right),$$

this approach is unsatisfactory for grouped data because it treats the problem as though it were in discrete time, with events happening only at 1 yr, 2 yr, etc. In fact, we should try to calculate the conditional probability of dying within the interval, given survival to the beginning of it.

What should we do with the censored subjects?

Let $r'_j$ denote the ‘effective’ number of subjects at risk. If we assume that censorings occur:
• at the beginning of each interval: $r'_j = r_j - c_j$
• at the end of each interval: $r'_j = r_j$
• on average halfway through the interval: $r'_j = r_j - c_j/2$

The last assumption yields the Actuarial Estimator. It is appropriate if censorings occur uniformly throughout the interval.

Constructing the lifetable

First, some additional notation for the j-th interval, $[t_{j-1}, t_j)$:

• Midpoint ($t_{mj}$) - useful for plotting the density and the hazard function
• Width ($b_j = t_j - t_{j-1}$) - needed for calculating the hazard in the j-th interval

Quantities estimated:

• Conditional probability of dying (event)

$$\hat{q}_j = d_j / r'_j$$

• Conditional probability of surviving

$$\hat{p}_j = 1 - \hat{q}_j$$

• Cumulative probability of surviving at $t_j$:

$$\hat{S}(t_j) = \prod_{\ell \le j} \hat{p}_\ell = \prod_{\ell \le j} \left(1 - \frac{d_\ell}{r'_\ell}\right)$$

Other quantities estimated at the midpoint of the j-th interval:

• Hazard in the j-th interval (why)

$$\hat{\lambda}(t_{mj}) = \frac{d_j}{b_j (r'_j - d_j/2)} = \frac{\hat{q}_j}{b_j (1 - \hat{q}_j/2)}$$

• Density at the midpoint of the j-th interval (why)

$$\hat{f}(t_{mj}) = \frac{\hat{S}(t_{j-1}) - \hat{S}(t_j)}{b_j} = \frac{\hat{S}(t_{j-1})\, \hat{q}_j}{b_j}$$

Note: Another way to get this is:

$$\hat{f}(t_{mj}) = \hat{\lambda}(t_{mj})\, \hat{S}(t_{mj}) = \hat{\lambda}(t_{mj})\, [\hat{S}(t_j) + \hat{S}(t_{j-1})]/2$$
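The lifetable quantities can be computed interval by interval. The Python sketch below does this for the seven angina intervals tabulated earlier, using the actuarial choice r′j = rj − cj/2 and interval width 1 year; the dictionary keys are names chosen for this illustration.

```python
# (d_j, c_j, r_j) for the intervals [0,1), [1,2), ..., [6,7) of the angina example.
intervals = [
    (456, 0, 2418), (226, 39, 1962), (152, 22, 1697), (171, 23, 1523),
    (135, 24, 1329), (125, 107, 1170), (83, 133, 938),
]
width = 1.0        # b_j: every interval is one year wide

surv_prev = 1.0    # S_hat at the start of the first interval
rows = []
for d, c, r in intervals:
    r_eff = r - c / 2                          # effective number at risk (actuarial)
    q = d / r_eff                              # conditional probability of dying
    surv = surv_prev * (1 - q)                 # S_hat at the end of the interval
    hazard = d / (width * (r_eff - d / 2))     # hazard at the interval midpoint
    density = surv_prev * q / width            # density at the interval midpoint
    rows.append({"r_eff": r_eff, "q": q, "S_prev": surv_prev, "S": surv,
                 "hazard": hazard, "density": density})
    surv_prev = surv

for j, row in enumerate(rows, start=1):
    print(j, row["r_eff"], round(row["q"], 4), round(row["S"], 4),
          round(row["hazard"], 4), round(row["density"], 4))
```

A useful sanity check is the identity in the note above: in every interval, the density equals the hazard times the average of the survival estimates at the two endpoints.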

Figure 4: Life table estimate of survival [plot of Estimated Survival against Lower Limit of Time Interval]

Figure 5: Estimated discrete hazard [plot of Estimated hazard against Lower Limit of Time Interval]
