
Small-Sample Inference and Bootstrap


Leonid Kogan
MIT, Sloan

15.450, Fall 2010

Outline

Small-Sample Inference

Bootstrap

Overview

So far, our inference has been based on asymptotic results: the LLN and CLT.

Asymptotic inference is sometimes difficult to apply, or too complicated analytically.

In small samples, asymptotic inference may be unreliable:
- Estimators may be consistent but biased.
- Standard errors may be imprecise, leading to incorrect confidence intervals and incorrect statistical test size.

We can use simulation methods to deal with some of these issues:
- Bootstrap can be used instead of asymptotic inference to deal with analytically challenging problems.
- Bootstrap can be used to adjust for bias.
- Monte Carlo simulation can be used to gain insight into the properties of statistical procedures.
Example: Autocorrelation

We want to estimate the first-order autocorrelation of a time series $x_t$ (e.g., inflation): $\rho_1 = \mathrm{corr}(x_t, x_{t+1})$.

Estimate by OLS (GMM):
$$x_t = a_0 + \rho_1 x_{t-1} + \epsilon_t$$

We know that this estimator is consistent:
$$\operatorname{plim}_{T \to \infty} \hat{\rho}_1 = \rho_1$$

We want to know whether this estimator is biased in finite samples, i.e., we want to estimate
$$E(\hat{\rho}_1) - \rho_1$$

Example: Autocorrelation, Monte Carlo

Perform a Monte Carlo study to gain insight into the phenomenon.

Simulate independently $N$ random series of length $T$. Each series follows an AR(1) process with persistence $\rho_1$ and Gaussian errors:
$$x_t = \rho_1 x_{t-1} + \epsilon_t, \qquad \epsilon_t \sim N(0, 1)$$

Compute $\hat{\rho}_1(n)$, $n = 1, \dots, N$, for each simulated sample.

Estimate the bias:
$$\widehat{E}(\hat{\rho}_1) - \rho_1 = \frac{1}{N} \sum_{n=1}^{N} \hat{\rho}_1(n) - \rho_1$$

The standard error of our simulation-based estimate is $\hat{\sigma}/\sqrt{N}$, where
$$\hat{\sigma}^2 = \frac{1}{N} \sum_{n=1}^{N} \left( \hat{\rho}_1(n) - \widehat{E}(\hat{\rho}_1) \right)^2$$
Example: Autocorrelation, Monte Carlo

MATLAB Code
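The code image from the original slide is not reproduced here; below is a minimal MATLAB sketch of the study described above, with parameter values chosen to match the first row of the table on the next slide (variable names are my own):

    % Monte Carlo estimate of the small-sample bias of the OLS AR(1) estimator.
    N    = 100000;       % number of simulated samples
    T    = 50;           % length of each sample
    rho1 = 0.9;          % true persistence
    rho1_hat = zeros(N, 1);
    for n = 1:N
        shocks = randn(T, 1);                   % Gaussian errors
        x = zeros(T, 1);
        for t = 2:T
            x(t) = rho1 * x(t-1) + shocks(t);   % AR(1) recursion
        end
        % OLS regression of x_t on a constant and x_{t-1}
        b = [ones(T-1, 1) x(1:T-1)] \ x(2:T);
        rho1_hat(n) = b(2);
    end
    bias_hat = mean(rho1_hat) - rho1;           % estimated average bias
    se_hat   = std(rho1_hat) / sqrt(N);         % simulation standard error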
Example: Autocorrelation, Monte Carlo


We use 100,000 simulations to estimate the average bias (simulation standard errors in parentheses):

$\rho_1$    $T$     Average Bias
0.9         50      -0.0826 (0.0006)
0.0         50      -0.0203 (0.0009)
0.9         100     -0.0402 (0.0004)
0.0         100     -0.0100 (0.0006)

The bias appears to increase in magnitude with $\rho_1$, and to shrink with the sample size.

There is an analytical formula for the average bias due to Kendall:
$$E(\hat{\rho}_1) - \rho_1 \approx -\frac{1 + 3\rho_1}{T}$$
For example, with $\rho_1 = 0.9$ and $T = 50$ the formula gives $-(1 + 2.7)/50 = -0.074$, close to the simulated value.

When explicit formulas are not known, bootstrap can be used to estimate the bias.
Example: Predictive Regression

Consider a predictive regression (e.g., forecasting stock returns using the dividend yield):
$$r_{t+1} = \alpha + \beta x_t + u_{t+1}$$
$$x_{t+1} = \theta + \rho x_t + \epsilon_{t+1}, \qquad (u_t, \epsilon_t) \sim N(0, \Sigma)$$

Stambaugh bias:
$$E(\hat{\beta} - \beta) = \frac{\mathrm{Cov}(u_t, \epsilon_t)}{\mathrm{Var}(\epsilon_t)}\, E(\hat{\rho} - \rho) \approx -\frac{\mathrm{Cov}(u_t, \epsilon_t)}{\mathrm{Var}(\epsilon_t)}\, \frac{1 + 3\rho}{T}$$

In the case of the dividend yield forecasting stock returns, $\mathrm{Cov}(u_t, \epsilon_t)$ is negative (a positive return shock lowers the dividend yield), so the bias is positive and can be substantial compared to the standard error of $\hat{\beta}$.
Predictive Regression: Monte Carlo

Predictive regression of monthly S&P 500 excess returns on the log dividend yield:
$$r_{t+1} = \alpha + \beta x_t + u_{t+1}$$
$$x_{t+1} = \theta + \rho x_t + \epsilon_{t+1}$$

Data: CRSP, from 1/31/1934 to 12/31/2008.

Parameter estimates:
$$\hat{\beta} = 0.0089, \qquad \hat{\rho} = 0.9936, \qquad \mathrm{S.E.}(\hat{\beta}) = 0.005$$
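As a rough plug-in illustration of the persistence part of the bias formula (assuming roughly $T \approx 900$ monthly observations for 1934-2008; the slide does not state $T$):
$$E(\hat{\rho} - \rho) \approx -\frac{1 + 3\hat{\rho}}{T} = -\frac{1 + 3 \times 0.9936}{900} \approx -0.0044$$
With $\mathrm{Cov}(u_t, \epsilon_t) < 0$, this downward bias in $\hat{\rho}$ maps into an upward bias in $\hat{\beta}$, which is what the Monte Carlo results below show.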

Predictive Regression: Monte Carlo


Generate 1,000 samples with parameters equal to the empirical estimates. Use 200 periods as burn-in; retain samples of the same length as the historical sample.

Tabulate $\hat{\beta}$ and standard errors for each sample. Use Newey-West with 6 lags to compute the standard errors.

[Figure: sample distribution of the t-statistic.]

- Average of $\hat{\beta}$ is 0.013.
- Average bias in $\hat{\beta}$ is 0.004.
- Average standard error is 0.005.
- Average t-stat on $\hat{\beta}$ is 0.75.

Testing the Mean: Non-Gaussian Errors

We estimate the mean of a distribution by the sample mean. Tests are based on the asymptotic distribution
$$\sqrt{T}\,\frac{\hat{\mu} - \mu}{\hat{\sigma}} \sim N(0, 1)$$

How good is the normal approximation in finite samples if the sample comes from a non-Gaussian distribution?

Assume that the sample is generated by a lognormal distribution:
$$x_t = e^{-\frac{1}{2} + \epsilon_t}, \qquad \epsilon_t \sim N(0, 1)$$

Lognormal Example: Monte Carlo

Monte Carlo experiment: $N = 100{,}000$, $T = 50$. Document the distribution of the t-statistic
$$t = \sqrt{T}\,\frac{\hat{\mu} - 1}{\hat{\sigma}}$$

Asymptotic theory dictates that $\mathrm{Var}(t) = 1$. We estimate
$$\widehat{\mathrm{Var}}(t) = 1.2542^2$$

[Figure: histogram of $t$ across 100,000 simulations.]

Tails of the distribution of $t$ are far from the asymptotic values:
$$\mathrm{Prob}(t > 1.96) \approx 0.0042, \qquad \mathrm{Prob}(t < -1.96) \approx 0.1053$$
(Under the normal approximation, each tail probability would be 0.025.)
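A minimal MATLAB sketch of this experiment (variable names are my own):

    % Small-sample distribution of the t-statistic for a lognormal sample.
    N = 100000;   % number of simulations
    T = 50;       % sample size
    tstat = zeros(N, 1);
    for n = 1:N
        x = exp(-0.5 + randn(T, 1));    % lognormal sample with E(x_t) = 1
        tstat(n) = sqrt(T) * (mean(x) - 1) / std(x);
    end
    var_t   = var(tstat);           % compare with the asymptotic value of 1
    p_right = mean(tstat >  1.96);  % right-tail frequency (asymptotic: 0.025)
    p_left  = mean(tstat < -1.96);  % left-tail frequency (asymptotic: 0.025)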


Bootstrap: General Principle

Bootstrap is a re-sampling method which can be used to evaluate properties of statistical estimators.

Bootstrap is effectively a Monte Carlo study which uses the empirical distribution as if it were the true distribution.

Key applications of bootstrap methodology:
- Evaluate distributional properties of complicated estimators; perform bias adjustment.
- Improve the precision of asymptotic approximations in small samples (confidence intervals, test rejection regions, etc.).

Bootstrap for IID Observations

Suppose we are given a sample of IID observations $x_t$, $t = 1, \dots, T$.

We estimate the mean by the sample mean, $\hat{\mu} = \widehat{E}(x_t)$. What is the 95% confidence interval for this estimator?

Asymptotic theory suggests computing the confidence interval based on the normal approximation
$$\sqrt{T}\,\frac{\hat{\mu} - E(x_t)}{\hat{\sigma}} \sim N(0, 1), \qquad \hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} \left[ x_t - \widehat{E}(x_t) \right]^2$$

Under the empirical distribution, $x$ is equally likely to take one of the values $x_1, x_2, \dots, x_T$.
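Concretely, drawing a bootstrap sample from the empirical distribution amounts to sampling the observed values with replacement; a minimal MATLAB sketch (variable names are my own):

    % One bootstrap sample from the empirical distribution of a T-by-1 vector x:
    % each bootstrapped observation is equally likely to be any of x_1, ..., x_T.
    idx    = randi(T, T, 1);    % T indices drawn uniformly with replacement
    x_star = x(idx);            % bootstrap sample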

Key Idea of Bootstrap

[Figure: REAL WORLD vs. BOOTSTRAP WORLD. In the real world, an unknown probability model $P$ generates the observed data $x = (x_1, x_2, \dots, x_n)$; the parameter of interest is $\theta = t(P)$, estimated by $\hat{\theta} = s(x)$. In the bootstrap world, the estimated probability model $\hat{P}$, with estimated parameter $\hat{\theta} = t(\hat{P})$, generates bootstrap samples $x^* = (x_1^*, x_2^*, \dots, x_n^*)$, yielding bootstrap replicates $\hat{\theta}^* = s(x^*)$ of $\hat{\theta}$; the bias $\mathrm{Bias}_P(\hat{\theta}, \theta)$ is estimated by $\mathrm{Bias}_{\hat{P}}(\hat{\theta}^*, \hat{\theta})$.]

Source: Efron and Tibshirani, 1994, Figure 10.4. Image by MIT OpenCourseWare.


Bootstrap Confidence Intervals

Bootstrap confidence interval construction starts by drawing $R$ samples from the empirical distribution.

For each bootstrapped sample, compute $\hat{\theta}^*$ ($^*$ denotes statistics computed using bootstrapped samples).

Compute the 2.5% and 97.5% percentiles of the resulting distribution of $\hat{\theta}^*$: $\hat{\theta}^*_{2.5\%}$, $\hat{\theta}^*_{97.5\%}$.

Approximate the distribution of $\hat{\theta} - \theta$ with the simulated distribution of $\hat{\theta}^* - \hat{\theta}$. Estimate the confidence interval as
$$\left( \hat{\theta} - (\hat{\theta}^*_{97.5\%} - \hat{\theta}),\ \hat{\theta} - (\hat{\theta}^*_{2.5\%} - \hat{\theta}) \right)$$

Example: Lognormal Distribution

Fix a sample of 50 observations from a lognormal distribution, $\ln x_t \sim N(-1/2, 1)$, and compute the estimates
$$\hat{\mu} = 1.1784, \qquad \hat{\sigma} = 1.5340$$

Population mean:
$$\mu = E(x_t) = E(e^{-\frac{1}{2} + \epsilon_t}) = 1, \qquad \epsilon_t \sim N(0, 1)$$

The asymptotic approximation produces a confidence interval
$$\left( \hat{\mu} - 1.96\,\frac{\hat{\sigma}}{\sqrt{T}},\ \hat{\mu} + 1.96\,\frac{\hat{\sigma}}{\sqrt{T}} \right) = (0.7532, 1.6036)$$

Compare this to the bootstrapped distribution.

Lognormal Distribution
Use bootstrap instead of asymptotic inference.

MATLAB Code
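The code image from the original slide is not reproduced here; a minimal MATLAB sketch of the computation, using the basic bootstrap interval defined above (the sample construction, $R$, and variable names are assumptions):

    % Basic bootstrap confidence interval for the mean of a fixed sample x.
    T = 50;
    x = exp(-0.5 + randn(T, 1));    % one fixed sample with ln x_t ~ N(-1/2, 1)
    mu_hat = mean(x);

    R = 10000;                      % number of bootstrap samples
    mu_star = zeros(R, 1);
    for r = 1:R
        mu_star(r) = mean(x(randi(T, T, 1)));   % resample with replacement
    end

    % 2.5% and 97.5% percentiles of the bootstrapped distribution
    ms   = sort(mu_star);
    q_lo = ms(round(0.025 * R));
    q_hi = ms(round(0.975 * R));

    % Treat mu_star - mu_hat as the distribution of mu_hat - mu.
    ci = [mu_hat - (q_hi - mu_hat), mu_hat - (q_lo - mu_hat)];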
Bootstrap estimate of the confidence interval:
$$(0.7280, 1.5615)$$
Testing the Mean: Bootstrap


Lognormal Example

Consistent with the Monte Carlo results: the small-sample distribution of the t-statistic exhibits left-skewness.

The variance of the bootstrapped t-statistic is $1.1852^2$. Normal approximation: $\mathrm{Var}(t) = 1$. Monte Carlo estimate: $\widehat{\mathrm{Var}}(t) = 1.2542^2$.

[Figure: histograms of the t-statistic, bootstrap (10,000 samples) vs. Monte Carlo (100,000 samples).]

Bootstrap Confidence Intervals

The basic bootstrap confidence interval is valid, and can be used in situations when asymptotic inference is too difficult to perform.

The bootstrap confidence interval is asymptotically as accurate as the interval based on the normal approximation.

For the t-statistic, the bootstrapped distribution is more accurate than the large-sample normal approximation.

Many generalizations of the basic bootstrap have been developed for wider applicability and better inference quality.

Parametric Bootstrap

Parametric bootstrap can handle non-IID samples.

Example: a sample $x_t$, $t = 1, \dots, T$, from an AR(1) process:
$$x_t = a_0 + a_1 x_{t-1} + \epsilon_t$$
We want to estimate a confidence interval for $\hat{a}_1$.

1. Estimate the parameters $\hat{a}_0$, $\hat{a}_1$ and the residuals $\hat{\epsilon}_t$.
2. Generate $R$ bootstrap samples for $x_t$. For each sample: generate a long series according to the AR(1) dynamics with $\hat{a}_0$, $\hat{a}_1$, drawing shocks with replacement from the sample $\hat{\epsilon}_1, \dots, \hat{\epsilon}_T$; retain only the last $T$ observations (drop the burn-in sample).
3. Compute the confidence interval as we would with the basic nonparametric bootstrap, using the $R$ samples.

A MATLAB sketch of this procedure follows below.
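A minimal sketch of the three steps, assuming the observed series is stored in a column vector x (the choices of R and burn-in length are my own):

    % Parametric bootstrap confidence interval for the AR(1) slope a1.
    T = numel(x);
    % Step 1: OLS estimates and residuals.
    X = [ones(T-1, 1) x(1:T-1)];
    b = X \ x(2:T);                  % b = [a0_hat; a1_hat]
    eps_hat = x(2:T) - X * b;        % T-1 residuals
    % Step 2: R bootstrap samples, each with a burn-in period.
    R = 10000;  burn = 200;
    a1_star = zeros(R, 1);
    for r = 1:R
        e  = eps_hat(randi(T-1, T+burn, 1));   % shocks drawn with replacement
        xs = zeros(T+burn, 1);
        for t = 2:T+burn
            xs(t) = b(1) + b(2) * xs(t-1) + e(t);
        end
        xs = xs(end-T+1:end);                  % retain only the last T observations
        bs = [ones(T-1, 1) xs(1:T-1)] \ xs(2:T);
        a1_star(r) = bs(2);
    end
    % Step 3: basic bootstrap interval, as in the nonparametric case.
    as = sort(a1_star);
    ci = [2*b(2) - as(round(0.975*R)), 2*b(2) - as(round(0.025*R))];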

Bootstrap Bias Adjustment


We want to estimate the small-sample bias of a statistic $\hat{\theta}$:
$$E(\hat{\theta}) - \theta_0$$

[Figure: the real-world vs. bootstrap-world diagram shown earlier. Source: Efron and Tibshirani, 1994, Figure 10.4. Image by MIT OpenCourseWare.]

Bootstrap Bias Adjustment

Bootstrap provides an intuitive approach:
$$E(\hat{\theta}) - \theta_0 \approx \widehat{E}_R(\hat{\theta}^*) - \hat{\theta}$$
where $\widehat{E}_R$ denotes the average across the $R$ bootstrapped samples.

Intuition: treat the empirical distribution as exact and compute the average bias across the bootstrapped samples. The bias-adjusted estimate is then $\hat{\theta} - [\widehat{E}_R(\hat{\theta}^*) - \hat{\theta}] = 2\hat{\theta} - \widehat{E}_R(\hat{\theta}^*)$.

Caution: by estimating the bias, we may be adding sampling error. Correct for the bias only if it is large compared to the standard error of $\hat{\theta}$; a code sketch follows below.
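In code the adjustment is one line; continuing the AR(1) example from the parametric bootstrap slide, where a1_star holds the bootstrap replicates and b(2) the original estimate:

    % Bootstrap bias estimate and bias-adjusted estimator for the AR(1) slope.
    bias_hat = mean(a1_star) - b(2);    % E_R(a1*) - a1_hat
    a1_adj   = b(2) - bias_hat;         % equals 2*b(2) - mean(a1_star)
    % Apply the adjustment only if bias_hat is large relative to std(a1_star).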

Example: Predictive Regression

Use parametric bootstrap: 1,000 samples, 200 periods as burn-in; retain samples of the same length as the historical sample.

Tabulate $\hat{\beta}$ and standard errors for each sample. Use Newey-West with 6 lags to compute the standard errors.

[Figure: sample distribution of the t-statistic.]

- Average of $\hat{\beta}$ is 0.0125.
- Average bias in $\hat{\beta}$ is 0.0036.
- Average standard error is 0.005.
- Average t-stat on $\hat{\beta}$ is 0.67.

Discussion

Asymptotic theory is very convenient when available, but in small samples its results may be inaccurate.

Use Monte Carlo simulations to gain intuition.

Bootstrap is a powerful tool. Use it when asymptotic theory is unavailable or suspect.

Bootstrap is not a silver bullet:
- It does not work well if rare events are missing from the empirical sample;
- It does not account for more subtle biases, e.g., survivorship or sample selection;
- It does not cure model misspecification.

No substitute for common sense!

Readings

J. Campbell, A. Lo, and A. C. MacKinlay, 1997, The Econometrics of Financial Markets, Section 7.2, pp. 273-274.

B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Sections 4.2-4.3, 10.1-10.2, 12.1-12.5.

A. C. Davison and D. V. Hinkley, Bootstrap Methods and Their Application, Ch. 2, Cambridge University Press, 1997.

R. Stambaugh, 1999, "Predictive Regressions," Journal of Financial Economics 54, 375-421.

MIT OpenCourseWare
http://ocw.mit.edu

15.450 Analytics of Finance
Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
