
MTH 511a - 2020: Lecture 22

Instructor: Dootika Vats

The instructor of this course owns the copyright of all the course materials. This lecture
material was distributed only to the students attending the course MTH511a: “Statistical
Simulation and Data Analysis” of IIT Kanpur, and should not be distributed in print or
through electronic media without the consent of the instructor. Students can make their own
copies of the course materials for their use.

1 Resampling Methods

1.1 Bootstrapping

We have discussed cross-validation, which we use to choose model tuning parameters.
However, once the final model is fit, we would like to make inference. That is, we want
to account for the variability of the final estimators obtained.
If our estimates are MLEs, then we know that under certain important conditions,
MLEs have asymptotic normality, that is,
\[
\sqrt{n}\,(\hat{\theta}_{MLE} - \theta) \overset{d}{\to} N\!\left(0, \sigma^2_{MLE}\right),
\]
where $\sigma^2_{MLE}$ is the inverse Fisher information. Then, if we can estimate $\sigma^2_{MLE}$, we can
construct asymptotically normal confidence intervals:
\[
\hat{\theta}_{MLE} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{\sigma}^2_{MLE}}{n}}.
\]
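For instance, here is a minimal R sketch of computing such an interval; the estimate
theta.hat and the estimated asymptotic variance sigma2.hat are purely hypothetical
numbers, not taken from any model in these notes.

# Asymptotic normal CI: a minimal sketch with hypothetical numbers
theta.hat <- 1.8                 # hypothetical estimate
sigma2.hat <- 2.5                # hypothetical estimated asymptotic variance
n <- 100                         # sample size
alpha <- .05
theta.hat + c(-1, 1) * qnorm(1 - alpha/2) * sqrt(sigma2.hat/n)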
We can also conduct hypothesis tests, etc., and go on to do regular statistical analysis.
But sometimes we cannot use an asymptotic distribution:
1. when our estimates are not MLEs, like in ridge and bridge regression;
2. when the assumptions for asymptotic normality are not satisfied (I haven't shared
these assumptions);
3. when n is not large enough for asymptotic normality to hold.

We will try to approximate the distribution of $\hat{\theta}$ using the bootstrap, and from there we
will obtain confidence intervals.
Suppose $\hat{\theta}$ is some estimator of $\theta$ from a sample $X_1, \ldots, X_n \sim F$. Then since $\hat{\theta}$ is random,
it has a sampling distribution $G_n$ that is unknown. If asymptotic normality holds, then
$G_n \approx N(\cdot, \cdot)$ for large enough $n$, but in general we may not know much about $G_n$. If
we could obtain many similar datasets, we could obtain an estimate from each dataset:
\[
\hat{\theta}_1, \ldots, \hat{\theta}_B \overset{iid}{\sim} G_n.
\]
Once we have $B$ realizations from $G_n$, we can easily estimate characteristics of $G_n$,
like the overall mean, variance, quantiles, etc.
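For instance, if theta.hats below were such a vector of $B$ realizations (here it is only a
hypothetical stand-in generated from a normal distribution), we could summarize it
directly in R:

# Summarizing B (hypothetical) realizations from Gn
theta.hats <- rnorm(1e3, mean = 2, sd = .3)  # stand-in for draws from Gn
mean(theta.hats)                             # estimate of the mean of Gn
var(theta.hats)                              # estimate of the variance of Gn
quantile(theta.hats, probs = c(.025, .975))  # estimates of quantiles of Gn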
Thus, in order to learn things about the sampling distribution $G_n$, our goal is to draw
more samples of such data. But this, of course, is not easy in real-data scenarios. We
could obtain more Monte Carlo datasets from $F$, but we typically do not know the
true $F$. Instead of obtaining typical Monte Carlo datasets, we will “resample” from
our current dataset. This would give us an approximate sample from our distribution
$G_n$, and we could estimate characteristics of this distribution! This resampling using
information from the current data is called bootstrapping. We will study two popular
bootstrap methods: the nonparametric bootstrap and the parametric bootstrap.

1.1.1 Nonparametric Bootstrap

In nonparametric bootstrap, we resample data of size $n$ from within the sample of $X$s
(with replacement) and obtain estimates of $\theta$ using these samples. That is,
\[
\begin{aligned}
\text{Bootstrap sample 1: } & X^*_{11}, X^*_{21}, \ldots, X^*_{n1} \;\Rightarrow\; \hat{\theta}^*_1 \\
\text{Bootstrap sample 2: } & X^*_{12}, X^*_{22}, \ldots, X^*_{n2} \;\Rightarrow\; \hat{\theta}^*_2 \\
& \qquad\vdots \\
\text{Bootstrap sample $B$: } & X^*_{1B}, X^*_{2B}, \ldots, X^*_{nB} \;\Rightarrow\; \hat{\theta}^*_B.
\end{aligned}
\]
Each sample is called a bootstrap sample, and there are $B$ bootstrap samples. Now,
the idea is that $\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B$ are $B$ approximate samples from $G_n$, the distribution of $\hat{\theta}$.
We want to construct a $100(1-\alpha)\%$ confidence interval for $\theta$. A confidence interval
$(L, U)$ is an interval such that
\[
\Pr\big((L, U) \text{ contains } \theta\big) = 1 - \alpha.
\]
Note that here $L$ and $U$ are random and $\theta$ is fixed. We can find the confidence interval
by looking at the quantiles of the obtained bootstrap estimates. So if we order the
bootstrap estimates
\[
\hat{\theta}_{(1)} < \hat{\theta}_{(2)} < \cdots < \hat{\theta}_{(B)},
\]
and set
\[
L = \hat{\theta}_{(\lfloor \alpha/2 \cdot B \rfloor)} \quad \text{and} \quad U = \hat{\theta}_{(\lfloor (1-\alpha/2) \cdot B \rfloor)},
\]
then $\left( \hat{\theta}_{(\lfloor \alpha/2 \cdot B \rfloor)},\, \hat{\theta}_{(\lfloor (1-\alpha/2) \cdot B \rfloor)} \right)$ is a $100(1-\alpha)\%$ bootstrap confidence interval.
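As a concrete sketch of this recipe (not from the original notes; the data x and the
choice of the sample median as the statistic are purely illustrative), the nonparametric
bootstrap percentile interval can be computed in R as:

# Nonparametric bootstrap percentile CI: a minimal sketch
set.seed(1)
x <- rexp(50, rate = 2)                 # hypothetical data
B <- 1e3
alpha <- .05
theta.star <- numeric(length = B)
for(b in 1:B)
{
  x.star <- sample(x, replace = TRUE)   # resample with replacement
  theta.star[b] <- median(x.star)       # illustrative statistic: the median
}
quantile(theta.star, probs = c(alpha/2, 1 - alpha/2))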

Example 1 (Mean of Gamma(a, 1)). Let $X_1, \ldots, X_n \overset{iid}{\sim} \text{Gamma}(a, 1)$. The mean of this
distribution is $\theta = a$. Consider estimating $\theta$ with the sample mean
\[
\hat{\theta} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \sim G_n.
\]
A central limit theorem holds, so that $G_n \approx N$ for large $n$. However, we may not
have enough samples available to rely on this approximation for confidence intervals.
Thus, instead we implement the nonparametric bootstrap here:
\[
\begin{aligned}
X^*_{11}, X^*_{21}, \ldots, X^*_{n1} &\;\Rightarrow\; \bar{X}^*_1 \\
X^*_{12}, X^*_{22}, \ldots, X^*_{n2} &\;\Rightarrow\; \bar{X}^*_2 \\
&\qquad\vdots \\
X^*_{1B}, X^*_{2B}, \ldots, X^*_{nB} &\;\Rightarrow\; \bar{X}^*_B,
\end{aligned}
\]
and find the $\alpha/2$ and $1 - \alpha/2$ sample quantiles from $\bar{X}^*_i$, $i = 1, \ldots, B$.

1.1.2 Parametric Bootstrap

Suppose $X_1, \ldots, X_n \sim F(\theta)$, where $\theta$ is a parameter we can estimate. Let $\hat{\theta}$ be a chosen
estimator of $\theta$. Instead of resampling within our data, in parametric bootstrap we use
our estimator of $\theta$ to obtain computer-generated samples from $F(\hat{\theta})$:
\[
\begin{aligned}
X^*_{11}, X^*_{21}, \ldots, X^*_{n1} &\sim F(\hat{\theta}) \;\Rightarrow\; \hat{\theta}^*_1 \\
X^*_{12}, X^*_{22}, \ldots, X^*_{n2} &\sim F(\hat{\theta}) \;\Rightarrow\; \hat{\theta}^*_2 \\
&\qquad\vdots \\
X^*_{1B}, X^*_{2B}, \ldots, X^*_{nB} &\sim F(\hat{\theta}) \;\Rightarrow\; \hat{\theta}^*_B.
\end{aligned}
\]
And again, we find the $\alpha/2$ and $1 - \alpha/2$ quantiles of the $\hat{\theta}^*_i$s, so that
$\left( \hat{\theta}_{(\lfloor \alpha/2 \cdot B \rfloor)},\, \hat{\theta}_{(\lfloor (1-\alpha/2) \cdot B \rfloor)} \right)$
is a $100(1-\alpha)\%$ bootstrap confidence interval.
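As a sketch of this general recipe (the Exponential model, the data x, and the estimator
rate.hat below are assumed purely for illustration and are not from the notes), a
parametric bootstrap interval for an exponential rate could be computed as:

# Parametric bootstrap percentile CI: a minimal sketch for an assumed
# Exponential(rate) model
set.seed(2)
x <- rexp(40, rate = 3)                        # hypothetical data
rate.hat <- 1/mean(x)                          # MLE of the rate
B <- 1e3
rate.star <- numeric(length = B)
for(b in 1:B)
{
  x.star <- rexp(length(x), rate = rate.hat)   # simulate from the fitted model
  rate.star[b] <- 1/mean(x.star)
}
quantile(rate.star, probs = c(.025, .975))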

Example 2 (Mean of Gamma(a, 1)). Let $X_1, \ldots, X_n \overset{iid}{\sim} \text{Gamma}(a, 1)$, and let $\hat{\theta} = \bar{X}$
be the chosen estimator of $a$. The parametric bootstrap does:
\[
\begin{aligned}
X^*_{11}, X^*_{21}, \ldots, X^*_{n1} &\sim \text{Gamma}(\bar{X}, 1) \;\Rightarrow\; \bar{X}^*_1 \\
X^*_{12}, X^*_{22}, \ldots, X^*_{n2} &\sim \text{Gamma}(\bar{X}, 1) \;\Rightarrow\; \bar{X}^*_2 \\
&\qquad\vdots \\
X^*_{1B}, X^*_{2B}, \ldots, X^*_{nB} &\sim \text{Gamma}(\bar{X}, 1) \;\Rightarrow\; \bar{X}^*_B,
\end{aligned}
\]
and find the $\alpha/2$ and $1 - \alpha/2$ sample quantiles from $\bar{X}^*_i$, $i = 1, \ldots, B$.

Let us implement this Gamma example in detail. Notice that a CLT holds here with
asymptotic variance $a$, so that as $n \to \infty$,
\[
\sqrt{n}\,(\bar{X} - a) \overset{d}{\to} N(0, a).
\]
So an asymptotic 95% confidence interval is
\[
\bar{X} \pm z_{.975} \sqrt{\frac{\bar{X}}{n}}.
\]
We will make four comparisons:
• The empirical distribution of nonparametric bootstrap estimates
• The empirical distribution of parametric bootstrap estimates
• The large sample normal distribution N (a, a/n)
• The true sampling distribution
###########################################
## Bootstrap for the Gamma distribution
###########################################
set.seed(10)
n <- 20
a <- .20
my.samp <- rgamma(n, shape = a, rate = 1)

barx <- mean(my.samp)

B <- 1e3 # number of bootstrap samples. 1e3 is standard.


boot.np <- numeric(length = B)
boot.p <- numeric(length = B)
for(b in 1:B)
{
  boot.samp.np <- sample(my.samp, replace = TRUE)   # NP bootstrap samples
  boot.np[b] <- mean(boot.samp.np)                  # NP bootstrap estimator

  boot.samp.p <- rgamma(n, shape = barx, rate = 1)  # parametric bootstrap samples
  boot.p[b] <- mean(boot.samp.p)                    # P bootstrap estimator
}

# 95% Bootstrap confidence intervals
quantile(boot.np, probs = c(.025, .975)) # nonparametric
#      2.5%      97.5%
#0.03058783 0.24917142

quantile(boot.p, probs = c(.025, .975)) # parametric
#      2.5%      97.5%
#0.02262155 0.31736283

# 95% asymptotic CI
c( barx - qnorm(.975)*sqrt(barx/n), barx + qnorm(.975)*sqrt(barx/n) )
#[1] -0.03029098 0.27848545

# Simulate repeated estimates to construct a 95% CI
true.samp <- numeric(length = 1e4)
for(i in 1:1e4)
{
  samp <- rgamma(n, shape = a, rate = 1)
  true.samp[i] <- mean(samp)
}
quantile(true.samp, probs = c(.025, .975))
# 2.5% 97.5%
#0.05446254 0.43878948

Since the sample size is low, all methods give different estimates. Certainly the CI
based on the CLT is quite off, since it includes negative values in the interval, which
is invalid. The parametric bootstrap is the closest to the truth. We can also compare
their empirical densities against the truth.
plot(density(boot.np), col = "purple", xlim = c(0,1.5),
     main = "Comparing sampling densities")
lines(density(boot.p), col = "blue")
lines(density(rnorm(1e4, mean = barx, sd = sqrt(barx/n))), col = "red")
lines(density(true.samp))
legend("topright", lty = 1, legend = c("Truth", "Nonparametric",
       "Parametric", "Approximate normal"), col = c("black", "purple", "blue",
       "red"))

[Figure: "Comparing sampling densities" — density estimates of the truth, nonparametric
bootstrap, parametric bootstrap, and approximate normal sampling distributions.]

Clearly the nonparametric estimator has lower variance than the true sampling distribution,
and this is a consequence of the very small sample size. The approximate normal interval
assumes a symmetric sampling distribution, which is clearly not true here.
We can repeat the same thing for a larger n. Here, the asymptotic normal distribution
coincides with the sampling distributions obtained via both bootstrap methods.
n <- 1000 # a larger sample size
a <- .20
my.samp <- rgamma(n, shape = a, rate = 1)

barx <- mean(my.samp)

B <- 1e3 # number of bootstrap samples


boot.np <- numeric(length = B)
boot.p <- numeric(length = B)

for(b in 1:B)
{
  boot.samp.np <- sample(my.samp, replace = TRUE)   # NP bootstrap samples
  boot.np[b] <- mean(boot.samp.np)                  # NP bootstrap estimator

  boot.samp.p <- rgamma(n, shape = barx, rate = 1)  # parametric bootstrap samples
  boot.p[b] <- mean(boot.samp.p)                    # P bootstrap estimator
}

# 95% Bootstrap confidence intervals
quantile(boot.np, probs = c(.025, .975)) # nonparametric
#     2.5%     97.5%
#0.1812819 0.2370416

quantile(boot.p, probs = c(.025, .975)) # parametric
#     2.5%     97.5%
#0.1832509 0.2374146

# 95% asymptotic CI
c( barx - qnorm(.975)*sqrt(barx/n), barx + qnorm(.975)*sqrt(barx/n) )
#[1] 0.1809254 0.2376330

# Simulate repeated estimates to construct a 95% CI
true.samp <- numeric(length = 1e4)
for(i in 1:1e4)
{
  samp <- rgamma(n, shape = a, rate = 1)
  true.samp[i] <- mean(samp)
}
quantile(true.samp, probs = c(.025, .975))
#     2.5%     97.5%
#0.1736390 0.2289925

All the estimated intervals are similar; however, they differ slightly from the true
interval, which is natural, since the true interval is centered around the ground truth
$a$, while we are centered around $\bar{X}$.
plot(density(boot.np), col = "purple", xlim = c(0,1.5),
     main = "Comparing sampling densities")
lines(density(boot.p), col = "blue")
lines(density(rnorm(1e4, mean = barx, sd = sqrt(barx/n))), col = "red")
lines(density(true.samp))
legend("topright", lty = 1, legend = c("Truth", "Nonparametric",
       "Parametric", "Approximate normal"), col = c("black", "purple", "blue",
       "red"))

[Figure: "Comparing sampling densities" — density estimates of the truth, nonparametric
bootstrap, parametric bootstrap, and approximate normal sampling distributions for the
larger sample size.]

2 Questions to think about


• We set B = 1000 in our experiments. What would happen if we increase or decrease B?
• How would you use bootstrapping to obtain confidence intervals for bridge regression coefficients?
• Do we need bootstrapping to obtain confidence intervals for ridge regression estimates?
