Bias in Censored Median Regression

This document evaluates the performance of censored median regression estimators. It finds that approaches that treat censored observations differently, based on information such as conditional probabilities, introduce less bias than methods that treat all censored data the same way. Specifically, an "inequality" loss model that handles censoring through inequality constraints is biased, while a weighted loss model, whose weights are the conditional probabilities that a censored event time lies below the median, introduces only a small bias. Simulation results indicate that the weighted approach also has better mean squared error properties.

Uploaded by Antonio Eleuteri
Copyright: © All Rights Reserved
Bias in censored median regression

A. Eleuteri

In this short report we evaluate the performance of a censored median regression estimator, as presented by two different groups of authors [1], [2]. In both cases the algorithm reduces to the same form, which we describe as based on an “inequality” loss (since censoring is handled by a set of inequality constraints). In the following we consider the case of right censoring.

Source of bias

Suppose we have a random sample of pairs $\{(T_i, C_i) : i = 1, \ldots, n\}$, with $T_i \sim F$ and $T_i$, $C_i$ conditionally independent (in the following, however, we restrict attention to the one-sample case). Under right censoring we observe $Y_i = \min\{T_i, C_i\}$ and $\delta_i = I(T_i < C_i)$, where $I(\cdot)$ is the set indicator function. The median loss function is
$$\rho(r) = r\{1/2 - I(r < 0)\}.$$
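The snippet below is a minimal sketch of this setup in Python: it draws right-censored observations and evaluates the median loss $\rho$. The lognormal event and exponential censoring distributions are borrowed from the simulation section further on, and the helper names (`rho`, `censor`, `draw_pairs`) are hypothetical.

```python
import random


def rho(r: float, tau: float = 0.5) -> float:
    """Check loss rho_tau(r) = r * (tau - I(r < 0)); tau = 1/2 gives the median loss."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def censor(t: float, c: float) -> tuple:
    """Right censoring: observe Y = min(T, C) and delta = I(T < C)."""
    return min(t, c), int(t < c)


def draw_pairs(n: int, seed: int = 0) -> list:
    """Draw n pairs (T_i, C_i), T ~ standard lognormal, C ~ exponential with rate 0.25.

    These distributions are an assumption borrowed from the simulation section.
    """
    rng = random.Random(seed)
    return [(rng.lognormvariate(0.0, 1.0), rng.expovariate(0.25)) for _ in range(n)]


pairs = draw_pairs(10)
observed = [censor(t, c) for t, c in pairs]  # list of (Y_i, delta_i)
for (t, c), (y, delta) in zip(pairs, observed):
    print(f"T={t:.3f} C={c:.3f} -> Y={y:.3f}, delta={delta}")
```

Only `observed` would be available in practice; `pairs` is kept to show what censoring hides.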
In ordinary quantile regression the contribution of each point to the subgradient condition depends only on the sign of its residual $r_i = T_i - \xi$ [3], where $\xi$ denotes the median being estimated.
For uncensored data we observe both $Y_i = T_i < C_i$ and $I(r_i < 0)$ (note that the residual can be either negative or positive).
Similarly, for censored data with $\xi < Y_i = C_i$ the sign is observed: in this case $T_i > C_i > \xi$, hence $I(r_i < 0) = 0$. However, if $\xi > Y_i = C_i$ we cannot observe the sign of the residual, since we can have either $\xi > T_i$ or $\xi \le T_i$ (i.e. the residual can be negative or positive).
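The case analysis above can be written as a small classifier. This is a sketch with a hypothetical function name: it returns the indicator $I(r_i < 0)$ when $(Y_i, \delta_i)$ identifies it, and `None` in the ambiguous case $C_i < \xi$.

```python
from typing import Optional


def residual_sign(y: float, delta: int, xi: float) -> Optional[int]:
    """Return I(r_i < 0), with r_i = T_i - xi, when it is identifiable from (Y_i, delta_i).

    Returns None for the ambiguous case: a censored point with C_i < xi.
    """
    if delta == 1:
        # Uncensored: Y_i = T_i, so the sign of the residual is observed directly.
        return int(y < xi)
    if y >= xi:
        # Censored with xi <= C_i < T_i: the residual is certainly positive.
        return 0
    # Censored with C_i < xi: T_i may fall on either side of xi.
    return None


xi = 1.0  # candidate median (here the true median of a standard lognormal)
for y, delta in [(0.4, 1), (1.7, 1), (2.5, 0), (0.6, 0)]:
    print(y, delta, residual_sign(y, delta, xi))
```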
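We can, however, evaluate the following conditional expectation (with respect to the measure $F$, treating $C_i$ as fixed):
$$E[I(r_i < 0) \mid T_i > C_i] = \frac{\Pr\{C_i < T_i < \xi\}}{\Pr\{C_i < T_i\}} = \frac{F(\xi) - F(C_i)}{1 - F(C_i)} = \frac{1/2 - F(C_i)}{1 - F(C_i)},$$
where the last step uses $F(\xi) = 1/2$.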
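As a sanity check on this formula, the sketch below compares the closed form with a Monte Carlo estimate of $\Pr\{T < \xi \mid T > C_i\}$. It assumes $F$ is the standard lognormal (median $\xi = 1$), as in the simulation section; the function names are hypothetical.

```python
import math
import random


def lognormal_cdf(x: float) -> float:
    """CDF of the standard lognormal distribution, used here as the (assumed known) F."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf(math.log(x) / math.sqrt(2.0)))


def ambiguity_weight(c: float, xi: float = 1.0) -> float:
    """Pr{T < xi | T > c} = (F(xi) - F(c)) / (1 - F(c)); zero when c >= xi."""
    if c >= xi:
        return 0.0
    fc = lognormal_cdf(c)
    return (lognormal_cdf(xi) - fc) / (1.0 - fc)


def monte_carlo_weight(c: float, xi: float = 1.0, draws: int = 100_000, seed: int = 1) -> float:
    """Estimate Pr{T < xi | T > c} by rejection sampling from the standard lognormal."""
    rng = random.Random(seed)
    kept = below = 0
    while kept < draws:
        t = rng.lognormvariate(0.0, 1.0)
        if t > c:  # keep only event times consistent with censoring at c
            kept += 1
            below += t < xi
    return below / kept


for c in (0.2, 0.5, 0.8):
    print(c, round(ambiguity_weight(c), 4), round(monte_carlo_weight(c), 4))
```

The weight tends to 1/2 as $C_i \to 0$ (almost no information) and to 0 as $C_i \to \xi$.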

The above quantity (defined for $F(C_i) < 1/2$, i.e. $C_i < \xi$) measures the “weight” to attach to ambiguous observations. This suggests a weighting scheme originally proposed by Efron [4] and adapted to quantile regression by Portnoy [5].
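In the spirit of this weighting, an ambiguous censored observation can be split into a point at $C_i$ carrying weight $w_i$ and a pseudo-observation far to the right carrying weight $1 - w_i$. The sketch below checks numerically that, for $\xi > C_i$, the slope of this split contribution is $w_i - 1/2$, which is the expected slope $E[I(T_i < \xi) - 1/2 \mid T_i > C_i]$ of the true loss. The large constant standing in for $+\infty$ and the weight value are assumptions for illustration.

```python
BIG = 1e6  # finite stand-in for the pseudo-observation at +infinity (assumption)


def rho(r: float) -> float:
    """Median check loss rho(r) = r * (1/2 - I(r < 0))."""
    return r * (0.5 - (1.0 if r < 0 else 0.0))


def split_contribution(c: float, w: float, xi: float) -> float:
    """Loss of a censored point split into mass w at C_i and mass 1 - w at +infinity."""
    return w * rho(c - xi) + (1.0 - w) * rho(BIG - xi)


def slope(f, xi: float, h: float = 1e-4) -> float:
    """Central finite-difference derivative of f at xi."""
    return (f(xi + h) - f(xi - h)) / (2.0 * h)


c, w = 0.5, 0.3  # hypothetical censoring time and weight
s_above = slope(lambda x: split_contribution(c, w, x), xi=1.0)  # xi > C_i: ambiguous case
s_below = slope(lambda x: split_contribution(c, w, x), xi=0.2)  # xi < C_i: sign known
print(s_above, w - 0.5)  # approximately equal: w - 1/2
print(s_below, -0.5)     # approximately equal: slope of a point known to lie above xi
```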

In contrast, the “inequality” loss model treats all censored observations in the same way, regardless of $F(C_i)$; this introduces a bias in the estimates.
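To make the contrast concrete, the sketch below assumes a simple one-sample reading of the “inequality” loss: a censored point is charged $(C_i - \xi)/2$ when $\xi < C_i$ and nothing otherwise. This reading is our assumption, not the exact formulation of [1] or [2]. Under it, every ambiguous censored point has slope 0 whatever $F(C_i)$ is, whereas its expected slope is $w_i - 1/2 = -F(C_i)/\{2(1 - F(C_i))\}$, which is zero only when $F(C_i) = 0$. As before, $F$ is assumed to be the standard lognormal.

```python
import math


def lognormal_cdf(x: float) -> float:
    """Standard lognormal CDF, the (assumed) event distribution F."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf(math.log(x) / math.sqrt(2.0)))


def inequality_slope(c: float, xi: float) -> float:
    """Right derivative in xi of the assumed inequality penalty max(C_i - xi, 0) / 2."""
    return -0.5 if xi < c else 0.0


def expected_slope(c: float, xi: float = 1.0) -> float:
    """E[I(T < xi) - 1/2 | T > c] under the standard lognormal (xi = 1 is its median)."""
    if c >= xi:
        return -0.5  # T > c >= xi, so the residual is positive
    fc = lognormal_cdf(c)
    return (lognormal_cdf(xi) - fc) / (1.0 - fc) - 0.5


for c in (0.1, 0.5, 0.9, 2.0):
    print(f"C={c}: inequality {inequality_slope(c, 1.0):+.3f}, expected {expected_slope(c):+.3f}")
```

The two slopes agree only for $C_i \ge \xi$; below the median the mismatch grows as $C_i$ approaches $\xi$.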

In the following graph we compare several forms of the empirical loss with the true (unfeasible) empirical loss. The “naïve” loss simply ignores the censoring information, introducing a large bias. The “inequality” loss also introduces some bias. The weighted loss introduces only a small bias, although it requires knowledge of the event-time distribution $F$, which is normally not available; it seems plausible that a nonparametric estimate, e.g. the Kaplan-Meier estimator, could be used instead. It is not clear how the presence of covariates would affect the estimate.
[Figure: empirical losses over the range 0.2 to 1.8, with curves for the naïve loss, the “inequality” loss, the weighted loss and the true (unfeasible) loss; vertical axis up to 100.]
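A rough numerical counterpart to the graph: instead of plotting the curves, the sketch below evaluates the four empirical losses on one simulated sample and reports where each is minimized. It assumes the lognormal/exponential design of the next section, a known $F$ for the weighted loss, a large constant standing in for $+\infty$, and our own one-sample reading of the “inequality” loss (a censored point costs $(C_i - \xi)/2$ only when $\xi < C_i$); all function names are hypothetical.

```python
import math
import random

BIG = 1e6  # finite stand-in for the pseudo-observation at +infinity (assumption)


def rho(r: float) -> float:
    """Median check loss rho(r) = r * (1/2 - I(r < 0))."""
    return r * (0.5 - (1.0 if r < 0 else 0.0))


def lognormal_cdf(x: float) -> float:
    """Standard lognormal CDF: the (assumed known) event distribution F."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf(math.log(x) / math.sqrt(2.0)))


def simulate(n: int, seed: int = 0):
    """Latent event times T, observed Y = min(T, C) and indicators delta = I(T < C)."""
    rng = random.Random(seed)
    t = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    c = [rng.expovariate(0.25) for _ in range(n)]
    y = [min(ti, ci) for ti, ci in zip(t, c)]
    d = [int(ti < ci) for ti, ci in zip(t, c)]
    return t, y, d


def true_loss(xi, t, y, d):
    # Unfeasible: uses the latent event times directly.
    return sum(rho(ti - xi) for ti in t)


def naive_loss(xi, t, y, d):
    # Ignores censoring: treats every Y_i as an event time.
    return sum(rho(yi - xi) for yi in y)


def inequality_loss(xi, t, y, d):
    # Assumed one-sample reading: a censored point costs (C_i - xi)/2 only when xi < C_i.
    return sum(rho(yi - xi) if di else max(yi - xi, 0.0) / 2.0 for yi, di in zip(y, d))


def weighted_loss(xi, t, y, d):
    # Efron/Portnoy-style split of ambiguous censored points, using the true F.
    total = 0.0
    for yi, di in zip(y, d):
        fc = lognormal_cdf(yi)
        if di or fc >= 0.5:
            total += rho(yi - xi)  # residual sign known at the true median
        else:
            w = (0.5 - fc) / (1.0 - fc)
            total += w * rho(yi - xi) + (1.0 - w) * rho(BIG - xi)
    return total


def minimize(loss, candidates, data):
    """The losses are convex and piecewise linear, so we search over their kinks.

    BIG is left out of the candidates: it carries too little weight to be optimal here.
    """
    return min(candidates, key=lambda xi: loss(xi, *data))


data = simulate(401)  # odd n, so the true and naive minimizers are unique
t, y, d = data
estimates = {
    "true": minimize(true_loss, t, data),
    "naive": minimize(naive_loss, y, data),
    "inequality": minimize(inequality_loss, y, data),
    "weighted": minimize(weighted_loss, y, data),
}
for name, value in estimates.items():
    print(f"{name:>10}: {value:.3f}")
```

Because $Y_i \le T_i$ for every $i$, the naïve minimizer can never exceed the true one, which is the downward bias the graph shows.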

One-sample asymptotics experiment

The following tables report the results of a simulation experiment comparing the finite-sample performance of several estimates of the median in a censored one-sample setting. We assume that event times follow a standard lognormal distribution and censoring times an exponential distribution with rate 0.25, following the experimental setup in [6]. For each problem instance the estimates were computed 1000 times and the results averaged.
Note that we also report the performance of the (unfeasible) sample median of the uncensored event times.
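The design can be sketched as below for two of the columns: the unfeasible sample median and a weighted-loss estimator. This is not the author's code. It assumes $F$ is known, represents $+\infty$ by a large constant, and computes the weighted-loss minimizer as a weighted median of the split points; the numbers it prints will not reproduce the table exactly.

```python
import math
import random
import statistics

BIG = 1e6  # finite stand-in for the pseudo-observation at +infinity (assumption)


def lognormal_cdf(x: float) -> float:
    """Standard lognormal CDF, treated as known (an assumption)."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf(math.log(x) / math.sqrt(2.0)))


def weighted_median(points):
    """Minimizer of sum_j a_j * rho(z_j - xi): first z_j whose cumulative weight reaches half."""
    points = sorted(points)
    half = sum(a for _, a in points) / 2.0
    cum = 0.0
    for z, a in points:
        cum += a
        if cum >= half:
            return z
    return points[-1][0]  # defensive fallback; not reached for positive weights


def weighted_estimate(y, d):
    """Median estimate from the weighted loss, splitting ambiguous censored points."""
    points = []
    for yi, di in zip(y, d):
        fc = lognormal_cdf(yi)
        if di or fc >= 0.5:
            points.append((yi, 1.0))
        else:
            w = (0.5 - fc) / (1.0 - fc)
            points.append((yi, w))
            points.append((BIG, 1.0 - w))
    return weighted_median(points)


def scaled_mse(n: int, reps: int = 1000, seed: int = 0):
    """N * MSE of the (unfeasible) sample median and of the weighted estimator."""
    rng = random.Random(seed)
    xi = 1.0  # true median of the standard lognormal
    sq_sample = sq_weighted = 0.0
    for _ in range(reps):
        t = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        c = [rng.expovariate(0.25) for _ in range(n)]
        y = [min(a, b) for a, b in zip(t, c)]
        d = [int(a < b) for a, b in zip(t, c)]
        sq_sample += (statistics.median(t) - xi) ** 2
        sq_weighted += (weighted_estimate(y, d) - xi) ** 2
    return n * sq_sample / reps, n * sq_weighted / reps


sample_mse, weighted_mse = scaled_mse(200)
print(f"N=200: sample median {sample_mse:.3f}, weighted loss {weighted_mse:.3f}")
```

Using a weighted median keeps each replication at $O(n \log n)$, which makes 1000 replications cheap.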

TABLE 2: Scaled MSE

N       Sample   Kaplan-Meier  Censored median regression   Censored median regression
        median                 (“inequality” loss)          (weighted loss)
50      1.674    1.756         1.528                        1.686
200     1.780    2.023         2.265                        1.826
500     1.565    1.902         3.694                        1.774
1000    1.445    1.716         5.613                        1.547
∞       1.571    1.839         ?                            ?

MSE of several estimators of the median, multiplied by the sample size N to conform to asymptotic variance calculations; the N = ∞ row gives the asymptotic values.
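As a consistency check on the N = ∞ row, the asymptotic variance of $\sqrt{N}$ times the sample median error is $1/\{4 f(\xi)^2\}$, where $f$ is the event density at the median $\xi$. For the standard lognormal, $\xi = 1$ and $f(1) = 1/\sqrt{2\pi}$, so the value is $\pi/2 \approx 1.571$, matching the first column. The snippet reproduces the arithmetic; `lognormal_pdf` is a hypothetical helper.

```python
import math


def lognormal_pdf(x: float) -> float:
    """Density of the standard lognormal distribution."""
    return math.exp(-math.log(x) ** 2 / 2.0) / (x * math.sqrt(2.0 * math.pi))


median = 1.0  # exp(0): median of the standard lognormal
avar = 1.0 / (4.0 * lognormal_pdf(median) ** 2)  # asymptotic variance of sqrt(N)(median_hat - median)
print(round(avar, 3), round(math.pi / 2, 3))
```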

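It is evident that the “inequality” loss approach produces biased results: its scaled MSE grows steadily with N instead of stabilizing, whereas the weighted loss estimator remains comparable to the Kaplan-Meier estimator.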
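The reasoning behind this reading of the table is the usual bias-variance decomposition of the scaled MSE, with $b_N$ the bias of the estimator $\hat{\xi}_N$. For a consistent, asymptotically normal estimator the first term typically settles near the asymptotic variance, and the second vanishes when the bias is $o(N^{-1/2})$; if instead the bias tends to a nonzero constant $b$, the second term grows roughly like $N b^2$, in line with the steadily increasing entries in the “inequality” loss column.

```latex
N\,\mathrm{MSE}(\hat{\xi}_N) = N\,\mathrm{Var}(\hat{\xi}_N) + N\,b_N^2,
\qquad b_N = \mathrm{E}[\hat{\xi}_N] - \xi .
```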

References

[1] K. Pelckmans, J. De Brabanter, J. A. K. Suykens, B. De Moor. Risk Scores, Empirical Z-estimators and its application to Censored Regression. Technical Report kp06-105 (2006).

[2] P. Shivaswamy, W. Chu, M. Jansche. A Support Vector Approach to Censored Targets. Proceedings of the 2007 Seventh IEEE International Conference on Data Mining (2007).

[3] R. Koenker. Quantile Regression. Cambridge University Press (2005).

[4] B. Efron. The Two Sample Problem with Censored Data. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Prentice-Hall, New York (1967).

[5] S. Portnoy. Censored Quantile Regression. Journal of the American Statistical Association, 98, 1001-1012 (2003).

[6] R. Koenker. Censored Quantile Regression Redux. Journal of Statistical Software, 27(6) (2008).
