Lecture 22: Bootstrap
The instructor of this course owns the copyright of all the course materials. This lecture
material was distributed only to the students attending the course MTH511a: “Statistical
Simulation and Data Analysis” of IIT Kanpur, and should not be distributed in print or
through electronic media without the consent of the instructor. Students can make their own
copies of the course materials for their use.
1 Resampling Methods
1.1 Bootstrapping
Recall that, under regularity conditions, the maximum likelihood estimator is asymptotically normal,
$$\sqrt{n}\left(\hat{\theta}_{MLE} - \theta\right) \overset{d}{\to} N\left(0, \sigma^2_{MLE}\right),$$
where $\sigma^2_{MLE}$ is the inverse Fisher information. Then, if we can estimate $\sigma^2_{MLE}$, we can construct asymptotically normal confidence intervals:
$$\hat{\theta}_{MLE} \pm z_{1-\alpha/2}\,\sqrt{\frac{\hat{\sigma}^2_{MLE}}{n}}\,.$$
We can also conduct hypothesis tests, etc., and go on to do regular statistical analysis.
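As a quick illustration, such an interval is easy to compute in R with qnorm(); the numbers below are placeholders for illustration only, not quantities from the lecture.
# Hypothetical values, for illustration only
n <- 50
theta.hat <- 1.3     # some estimate, e.g. an MLE
sigma2.hat <- 0.8    # estimated asymptotic variance (inverse Fisher information)
alpha <- 0.05
# 100(1 - alpha)% asymptotically normal confidence interval
c(theta.hat - qnorm(1 - alpha/2) * sqrt(sigma2.hat / n),
  theta.hat + qnorm(1 - alpha/2) * sqrt(sigma2.hat / n))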
But sometimes we cannot use an asymptotic distribution:
1. when our estimates are not MLEs, like ridge and bridge regression
2. when the assumptions for asymptotic normality are not satisfied (I haven’t shared
these assumptions)
3. when n is not large enough for asymptotic normality to hold
We will try to approximate the distribution of $\hat{\theta}$ using the bootstrap, and from there we will obtain confidence intervals.
Suppose $\hat{\theta}$ is some estimator of $\theta$ from a sample $X_1, \dots, X_n \sim F$. Then, since $\hat{\theta}$ is random, it has a sampling distribution $G_n$ that is unknown. If asymptotic normality holds, then $G_n \approx N(\cdot, \cdot)$ for large enough $n$, but in general we may not know much about $G_n$. If we could obtain many similar datasets, we could obtain an estimate from each dataset:
$$\hat{\theta}_1, \dots, \hat{\theta}_B \overset{iid}{\sim} G_n\,.$$
Once we have $B$ realizations from $G_n$, we can easily estimate characteristics of $G_n$, like the overall mean, variance, quantiles, etc.
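To make this concrete, here is a small Monte Carlo illustration (not from the lecture) in which $F$ is pretended to be known (an Exponential(2) distribution) and $\hat{\theta}$ is the sample mean; with $F$ known we can draw as many datasets as we like and summarize $G_n$ directly.
# Hypothetical illustration: pretend F is known, here F = Exponential(rate = 2)
B <- 1e4
n <- 30
theta.hat.draws <- replicate(B, mean(rexp(n, rate = 2)))  # B realizations from G_n
mean(theta.hat.draws)                      # estimated mean of G_n
var(theta.hat.draws)                       # estimated variance of G_n
quantile(theta.hat.draws, c(.025, .975))   # estimated quantiles of G_n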
Thus, in order to learn things about the sampling distribution $G_n$, our goal is to draw more samples of such data. But this, of course, is not easy in real-data scenarios. We could obtain more Monte Carlo datasets from $F$, but we typically do not know the true $F$. Instead of obtaining typical Monte Carlo datasets, we will "resample" from our current dataset. This gives us an approximate sample from the distribution $G_n$, and we can then estimate characteristics of this distribution! This resampling using information from the current data is called bootstrapping. We will study two popular bootstrap methods: the nonparametric bootstrap and the parametric bootstrap.
Then, writing $\hat{\theta}^*_1 \le \hat{\theta}^*_2 \le \dots \le \hat{\theta}^*_B$ for the ordered bootstrap estimates,
$$\left(\hat{\theta}^*_{\lfloor \alpha/2 \cdot B \rfloor}\,,\; \hat{\theta}^*_{\lfloor (1 - \alpha/2) \cdot B \rfloor}\right)$$
is a $100(1-\alpha)\%$ bootstrap confidence interval.
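In R these endpoints are most easily obtained with quantile(); a minimal sketch, assuming theta.star is a vector holding the $B$ bootstrap estimates (an illustrative name, not from the lecture code):
# theta.star: vector of B bootstrap estimates (illustrative placeholder)
alpha <- 0.05
quantile(theta.star, probs = c(alpha/2, 1 - alpha/2))  # bootstrap CI endpoints
# Equivalently, via the ordered bootstrap estimates:
B <- length(theta.star)
sort(theta.star)[c(floor(alpha/2 * B), floor((1 - alpha/2) * B))]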
Example 1 (Mean of Gamma(a, 1)). Let $X_1, \dots, X_n \overset{iid}{\sim} \text{Gamma}(a, 1)$. The mean of this distribution is $\theta = a$. Consider estimating $\theta$ with the sample mean
$$\hat{\theta} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim G_n\,.$$
Although a central limit theorem holds, so that $G_n \approx N(\cdot, \cdot)$ for large $n$, we may not have enough samples available to rely on it for confidence intervals. Thus, instead, we implement the nonparametric bootstrap here: for each $b = 1, \dots, B$, draw a resample of size $n$ with replacement from the observed data and compute its mean,
$$\begin{aligned}
X^*_{11}, X^*_{21}, \dots, X^*_{n1} &\Rightarrow \bar{X}^*_1 \\
X^*_{12}, X^*_{22}, \dots, X^*_{n2} &\Rightarrow \bar{X}^*_2 \\
&\;\,\vdots \\
X^*_{1B}, X^*_{2B}, \dots, X^*_{nB} &\Rightarrow \bar{X}^*_B\,.
\end{aligned}$$
Then find the $\alpha/2$ and $1 - \alpha/2$ sample quantiles of $\bar{X}^*_i$, $i = 1, \dots, B$.
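A direct R transcription of this scheme might look like the following sketch, assuming the observed data are stored in a vector my.samp (as in the lecture code further below):
# Nonparametric bootstrap for the sample mean (sketch)
B <- 1e3
boot.np <- numeric(B)
for(b in 1:B)
{
boot.samp <- sample(my.samp, replace = TRUE)  # resample the data with replacement
boot.np[b] <- mean(boot.samp)                 # bootstrap estimate, the b-th bootstrap mean
}
quantile(boot.np, c(.025, .975))              # 95% nonparametric bootstrap CI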
Example 2 (Mean of Gamma(a, 1)). Let $X_1, \dots, X_n \overset{iid}{\sim} \text{Gamma}(a, 1)$, and let $\hat{\theta} = \bar{X}$ be the chosen estimator of $a$. The parametric bootstrap does:
$$\begin{aligned}
X^*_{11}, X^*_{21}, \dots, X^*_{n1} &\sim \text{Gamma}(\bar{X}, 1) \;\Rightarrow\; \bar{X}^*_1 \\
X^*_{12}, X^*_{22}, \dots, X^*_{n2} &\sim \text{Gamma}(\bar{X}, 1) \;\Rightarrow\; \bar{X}^*_2 \\
&\;\,\vdots \\
X^*_{1B}, X^*_{2B}, \dots, X^*_{nB} &\sim \text{Gamma}(\bar{X}, 1) \;\Rightarrow\; \bar{X}^*_B\,.
\end{aligned}$$
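In R, this scheme could be sketched as follows, again assuming the observed data vector my.samp and its length n are available (illustrative names):
# Parametric bootstrap for the sample mean (sketch)
B <- 1e3
barx <- mean(my.samp)                           # plug-in estimate of the shape a
boot.p <- numeric(B)
for(b in 1:B)
{
boot.samp <- rgamma(n, shape = barx, rate = 1)  # sample from the fitted Gamma(barx, 1)
boot.p[b] <- mean(boot.samp)                    # bootstrap estimate, the b-th bootstrap mean
}
quantile(boot.p, c(.025, .975))                 # 95% parametric bootstrap CI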
Let us implement this Gamma example in detail. Notice that a CLT holds here with asymptotic variance $a$, so that, as $n \to \infty$,
$$\sqrt{n}\,(\bar{X} - a) \overset{d}{\to} N(0, a)\,.$$
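The first chunk of the lecture's R implementation is not reproduced in these notes; the output below picks up at the estimated 95% intervals. A minimal sketch of the missing setup, essentially combining the two bootstrap sketches above and consistent with the variable names used in the remaining code (my.samp, barx, boot.np, boot.p, true.samp), might look like this; the particular values of n, a, and B are assumptions.
# Sketch of the (assumed) setup; only the output further below is from the lecture
set.seed(1)
n <- 10      # small sample size (assumed)
a <- .20     # true shape parameter (assumed, matching the later code)
B <- 1e3     # number of bootstrap replications (assumed)
my.samp <- rgamma(n, shape = a, rate = 1)
barx <- mean(my.samp)
boot.np <- numeric(B)   # nonparametric bootstrap estimates
boot.p <- numeric(B)    # parametric bootstrap estimates
for(b in 1:B)
{
boot.np[b] <- mean(sample(my.samp, replace = TRUE))
boot.p[b] <- mean(rgamma(n, shape = barx, rate = 1))
}
# "Truth": Monte Carlo draws from the actual sampling distribution G_n
true.samp <- replicate(1e4, mean(rgamma(n, shape = a, rate = 1)))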
# one of the estimated 95% intervals (quantile output from the lecture code)
#0.02262155 0.31736283
# 95% asymptotic CI
c( barx - qnorm(.975)*sqrt(barx/n), barx + qnorm(.975)*sqrt(barx/n) )
#[1] -0.03029098 0.27848545
Since the sample size is small, all methods give different estimates. Certainly the CI based on the CLT is quite off, since it includes negative values in the interval, which is invalid. The parametric bootstrap is the closest to the truth. We can also compare their empirical densities against the truth.
plot(density(boot.np), col = "purple", xlim = c(0,1.5),
main = "Comparing sampling densities")
lines(density(boot.p), col = "blue")
lines(density(rnorm(1e4, mean = barx, sd = sqrt(barx/n))), col = "red")
lines(density(true.samp))
legend("topright",lty = 1, legend = c("Truth", "Nonparametric",
"Parameteric", "Approximate normal"), col = c("black", "purple", "blue",
"red"))
[Figure: estimated densities of the sampling distribution: Truth, Nonparametric bootstrap, Parametric bootstrap, and Approximate normal.]
Clearly the nonparametric density estimate has lower variance than the true sampling distribution, a consequence of the very small sample size. The approximate normal interval assumes a symmetric sampling distribution, which is clearly not true here.
We can repeat the same thing for a larger n. Here, the asymptotic normal distribution
coincides with the sampling distributions obtained via both bootstrap methods.
n <- 1000    # larger sample size
a <- .20
my.samp <- rgamma(n, shape = a, rate = 1)
barx <- mean(my.samp)
for(b in 1:B)
{
boot.samp.np <- sample(my.samp, replace = TRUE) # NP bootstrap samples
boot.np[b] <- mean(boot.samp.np) # NP bootstrap estimator
boot.samp.p <- rgamma(n, shape = barx, rate = 1) # parametric bootstrap samples
boot.p[b] <- mean(boot.samp.p) # parametric bootstrap estimator
}
# Monte Carlo draws from the true sampling distribution for the larger n
true.samp <- replicate(1e4, mean(rgamma(n, shape = a, rate = 1)))

# 95% asymptotic CI
c( barx - qnorm(.975)*sqrt(barx/n), barx + qnorm(.975)*sqrt(barx/n) )
#[1] 0.1809254 0.2376330
# true 95% interval (sample quantiles of true.samp)
# 2.5% 97.5%
#0.1736390 0.2289925
All the estimated intervals are similar; however, they differ slightly from the true interval. This is natural, since the true interval is centered around the ground truth, whereas ours are centered around $\bar{X}$.
plot(density(boot.np), col = "purple", xlim = c(0,1.5),
main = "Comparing sampling densities")
lines(density(boot.p), col = "blue")
lines(density(rnorm(1e4, mean = barx, sd = sqrt(barx/n))), col = "red")
lines(density(true.samp))
legend("topright",lty = 1, legend = c("Truth", "Nonparametric",
"Parameteric", "Approximate normal"), col = c("black", "purple", "blue",
"red"))
[Figure: estimated densities of the sampling distribution for the larger $n$ (Truth, Nonparametric bootstrap, Parametric bootstrap, Approximate normal), now essentially coinciding.]