An Introduction To The Bootstrap
Roger W. Johnson
INTRODUCTION
In the eighteenth-century stories of The Adventures of Baron Munchausen by Rudolph Erich Raspe (Raspe 1785), the Baron apparently falls to the bottom of a deep lake. Just when it looks like all is lost, he saves himself by picking himself up by his own bootstraps. Likewise, bootstrap methods in statistics seem to accomplish the impossible. These computationally intensive methods, brought to prominence through the pioneering work of Bradley Efron, are commonly used by statistics professionals and are beginning to work their way into elementary, even algebra-based statistics texts (e.g. Stout et al. 1999). In this article I present bootstrap methods for estimating standard errors and producing confidence intervals. Bootstrap methods are more flexible than classical methods, which may be analytically intractable or unusable because the appropriate assumptions are not satisfied. When classical methods may reasonably be used, however, we will typically see that bootstrap methods give quite similar results. The presentation that follows is based on details that appear in Efron and Tibshirani (1986, 1993) and Rice (1995), and includes short Minitab macros to compute the desired estimates. Related articles that have appeared in this journal are those of Ricketts and Berry (1994), Reeves (1995) and Taffe and Garnham (1996).
BOOTSTRAP ESTIMATES OF STANDARD ERROR

SE(X̄) = s/√n = 7.87/√40 ≈ 1.24 seconds
Teaching Statistics.
File: sedriver.txt
noecho
# c10 will hold the bootstrap estimates
erase c10
# k1 = sample size, k2 = number of bootstrap samples
let k1 = n(c1)
let k2 = 200
execute 'bootstrp.txt' k2
echo
# the bootstrap standard error is the standard deviation of the estimates
let k3 = stdev(c10)
print k3
end

File: bootstrp.txt
# draw a sample of size k1 from c1, with replacement, into c11
sample k1 c1 c11;
replace.
# compute the bootstrap estimate (here the mean) and append it to c10
let k20 = mean(c11)
stack c10 k20 c10
Use execute 'sedriver.txt' at the Minitab prompt to run this bootstrap procedure
Fig 2. Bootstrap standard error code
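For readers without Minitab, the same resampling scheme can be sketched in Python. The helper name `bootstrap_se` and the data values below are made-up stand-ins for the c1 column, not from the article; the 200 resamples mirror the k2 setting in the macro above.

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, b=200, seed=1):
    """Bootstrap standard error of `stat`: resample with replacement
    b times, compute the statistic each time, and take the standard
    deviation of those b estimates (as sedriver.txt/bootstrp.txt do)."""
    random.seed(seed)
    estimates = [stat(random.choices(data, k=len(data))) for _ in range(b)]
    return statistics.stdev(estimates)

# Illustrative (made-up) data standing in for the c1 column
data = [4.2, 7.9, 1.3, 9.5, 3.1, 6.8, 2.4, 8.8, 5.0, 7.7]
print(round(bootstrap_se(data), 3))
```

Passing `stat=statistics.median` instead bootstraps the standard error of the median with no other changes, which is where the method earns its keep: no classical formula is needed.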
As a final example to illustrate the above bootstrap method of estimating standard errors, consider male mortality rate averaged over the years 1958–1964 for towns in England and Wales versus calcium (from Hand et al. 1994, pp. 5–6), shown as a scatter diagram in figure 3. The calcium concentration may be thought of as a measure of water hardness; the higher the calcium concentration, the harder the water. The correlation coefficient between male mortality and calcium concentration for the 61 data points shown is −0.655.

BOOTSTRAP CONFIDENCE INTERVALS
Use execute 'cidriver.txt' at the Minitab prompt to run this bootstrap procedure
Fig 4. Bootstrap confidence interval code
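A percentile-type interval is one common way to turn the column of bootstrap estimates into a confidence interval, and the Python sketch below assumes that method; the function name, the choice of 1000 resamples, and the data are illustrative assumptions, not taken from the article's macro.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, b=1000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval: sort the b bootstrap
    estimates and read off the lower and upper percentiles so that
    `level` of the estimates lie between them."""
    random.seed(seed)
    estimates = sorted(stat(random.choices(data, k=len(data))) for _ in range(b))
    alpha = (1 - level) / 2
    lower = estimates[int(alpha * b)]
    upper = estimates[int((1 - alpha) * b) - 1]
    return lower, upper

# Illustrative (made-up) data
lo, hi = bootstrap_ci([4.2, 7.9, 1.3, 9.5, 3.1, 6.8, 2.4, 8.8, 5.0, 7.7])
print(round(lo, 2), round(hi, 2))
```

Because every bootstrap mean is an average of the original data values, the endpoints necessarily fall between the sample minimum and maximum.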
FURTHER DETAILS

When trying to assess the performance of an estimate θ̂ of θ we will, in general, be concerned with the bias, Bias(θ̂) = E(θ̂) − θ, of θ̂ as well as with SE(θ̂). In fact, a measure of the typical deviation of θ̂ from θ is the root mean square error, √MSE(θ̂), where

MSE(θ̂) = E(θ̂ − θ)² = [Bias(θ̂)]² + [SE(θ̂)]²
Consequently, if we can estimate the bias as well as the standard error of an estimate, we can determine an estimate of the root mean square error. Fortunately, the bias can also be estimated by a bootstrap procedure (before continuing, do you see how?). In particular, reasoning as we have before, we can use the approximation

Bias(θ̂) = E(θ̂) − θ ≈ (average of bootstrap estimates) − θ̂

The average of the bootstrap estimates can be obtained by inserting the lines let k9 = mean(c10) and print k9 just before the end statement in the sedriver.txt macro. In the given illustrative examples involving the sample mean, sample median and sample correlation as estimates of their corresponding population parameters, little evidence of bias was seen (indeed, we know the sample mean is unbiased for the population mean).
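The bias approximation translates directly into code: subtract the original estimate from the average of the bootstrap estimates, and combine with the bootstrap standard error to estimate the root mean square error. The sketch below is illustrative (the function name and data are made-up, not from the article).

```python
import math
import random
import statistics

def bootstrap_bias_and_rmse(data, stat=statistics.mean, b=200, seed=1):
    """Bootstrap bias estimate: (average of bootstrap estimates) minus
    the estimate from the original sample.  The RMSE estimate is then
    sqrt(bias**2 + se**2), using the bootstrap standard error."""
    random.seed(seed)
    estimates = [stat(random.choices(data, k=len(data))) for _ in range(b)]
    bias = statistics.mean(estimates) - stat(data)
    se = statistics.stdev(estimates)
    return bias, math.sqrt(bias**2 + se**2)

# Illustrative (made-up) data; for the sample mean the bias should be near 0
bias, rmse = bootstrap_bias_and_rmse([4.2, 7.9, 1.3, 9.5, 3.1, 6.8, 2.4, 8.8, 5.0, 7.7])
print(round(bias, 3), round(rmse, 3))
```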
The bootstrap procedures discussed here, one for
the standard error of an estimate y^ of y, and one
for producing condence intervals for y, are
nonparametric bootstrap procedures. In each case
the bootstrap samples are obtained by repeated
samples with replacement from the data. Alternatively, with parametric bootstrap procedures the
original data can be used to t a probability
model and our samples can be drawn from it. To
illustrate, the exponential density f x l elx
with l 1=7:80 ts the M1 motorway data well.
Consequently, bootstrap samples can be obtained
by taking random samples of size n from this
tted density and then computing the relevant
statistic (e.g. mean, median) as before. To
generate a particular random sample from an
exponential distribution the inverse cdf method
may be used. Specically, generate a uniform
random number U between 0 and 1. Then
lnU=l 7:80 lnU will have the desired
exponential distribution.
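The inverse-cdf step and the parametric bootstrap can be sketched as follows. The rate λ = 1/7.80 comes from the exponential fit mentioned above; the sample size 40, the 200 resamples, and the function names are illustrative assumptions.

```python
import math
import random
import statistics

def exp_sample(n, lam, rng):
    """Inverse-cdf method: if U is uniform on (0, 1], then -ln(U)/lam
    has the exponential distribution with rate lam (here -7.80 ln U
    when lam = 1/7.80).  1 - rng.random() avoids U = 0 exactly."""
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

def parametric_bootstrap_se(n, lam, stat=statistics.mean, b=200, seed=1):
    """Parametric bootstrap: draw each bootstrap sample from the fitted
    exponential density rather than from the data, then proceed as before."""
    rng = random.Random(seed)
    estimates = [stat(exp_sample(n, lam, rng)) for _ in range(b)]
    return statistics.stdev(estimates)

print(round(parametric_bootstrap_se(n=40, lam=1/7.80), 2))
```

For the sample mean this should land near the classical value 7.80/√40 ≈ 1.23, up to Monte Carlo variability; as with the nonparametric version, swapping in `statistics.median` bootstraps a statistic with no convenient classical formula.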
Acknowledgement
Thanks are due to the referee for comments that
led to an improved presentation.
References
Edgington, E. (1995). Randomization Tests
(3rd revised edn). New York: Marcel
Dekker.
Efron, B. and Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1(1), 54–77.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. London: Chapman
& Hall.
Good, P. (2000). Permutation Tests: A Practical Guide to Resampling Methods for
Testing Hypotheses (2nd edn). New York:
Springer.
Hand, D., Daly, F., Lunn, A., McConway, K. and Ostrowski, E. (eds) (1994). A Handbook of Small Data Sets. London: Chapman & Hall.