AOD Lec9
OUTLINE
• Bootstrap
• Jackknife
5/1/2024
Good question.
Small sample size.
Non-normal distribution of the sample.
A test of means for two samples.
Not as sensitive to N.
BOOTSTRAP IDEA
We avoid the task of taking many samples from the population by instead
taking many resamples from a single sample. The values of $\bar{x}$ from these
resamples form the bootstrap distribution. We use the bootstrap distribution
rather than theory to learn about the sampling distribution.
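[Figure: bootstrap scheme. A population with unknown distribution F yields an i.i.d. sample, from which the quantity of interest is estimated by $\hat{\theta}$ (inference). The sample is then resampled, repeated B times (B ≥ 1000), giving resamples such as $X_{B1}, X_{B2}, \dots, X_{Bn}$, and the statistic is computed on each resample.]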
IT142IU - ANALYTICS FOR OBSERVATIONAL DATA 5/1/2024 9
Bootstrap replicates: $\hat{\theta}^{*}_{1}, \hat{\theta}^{*}_{2}, \dots, \hat{\theta}^{*}_{B}$
STEP 1: Sampling. Draw an i.i.d. sample from the population.
STEP 2: Resampling. Resample the data B times with replacement (i.i.d. draws
from the sample) to obtain B resampled data sets, each of size n, such as
$X_{B1}, X_{B2}, \dots, X_{Bn}$. Use these resampled data sets in place of
new samples from the population.
STEP 3: Statistics. Compute the statistic of interest on each resampled
data set, giving B values (one per resample).
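The three steps can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function name `bootstrap_replicates`, the seed, and the `data` values are made up for the example and are not taken from the slides.

```python
import random
import statistics

def bootstrap_replicates(sample, statistic, n_resamples=1000, seed=0):
    """Return the statistic computed on n_resamples bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        # Step 2: draw n values from the sample with replacement.
        resample = rng.choices(sample, k=n)
        # Step 3: compute the statistic on this resample.
        replicates.append(statistic(resample))
    return replicates

# Step 1: a made-up sample standing in for the observed data.
data = [4.1, 7.3, 5.6, 9.8, 6.2, 8.4, 5.0, 7.7]
boot_means = bootstrap_replicates(data, statistics.fmean, n_resamples=2000)

# The spread of the replicates estimates the standard error of the mean.
print(statistics.fmean(boot_means), statistics.stdev(boot_means))
```

Passing the statistic as a function means the same loop works for a median, a trimmed mean, or any other summary.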
b= 1,2, …,B
NONPARAMETRIC CONFIDENCE INTERVALS USING BOOTSTRAPPING
• Many methods exist
• The simplest: the percentile method
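The percentile $(1-\alpha)\cdot 100\%$ confidence interval for a population mean is:
$$\left(\bar{x}^{*}_{(\alpha/2)},\ \bar{x}^{*}_{(1-\alpha/2)}\right)$$
where $\bar{x}^{*}_{(q)}$ denotes the $q$ quantile of the bootstrap distribution of the resample means.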
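A minimal sketch of the percentile method in Python follows. The helper name `percentile_ci`, the data, and the seed are assumptions made for illustration, and the quantile rule used (the empirical inverse-CDF rule on the sorted replicates) is only one of several common conventions.

```python
import math
import random
import statistics

def percentile_ci(replicates, alpha):
    """Percentile (1 - alpha) interval from a list of bootstrap replicates."""
    ordered = sorted(replicates)
    b = len(ordered)
    # Empirical quantiles: position ceil(b * q) in the sorted list (1-based).
    lower_idx = max(0, math.ceil(b * alpha / 2) - 1)
    upper_idx = min(b - 1, math.ceil(b * (1 - alpha / 2)) - 1)
    return ordered[lower_idx], ordered[upper_idx]

# Made-up sample; not the data used on the slides.
data = [3.2, 8.5, 12.1, 15.7, 9.4, 21.3, 6.8, 13.9, 18.2, 11.0]
rng = random.Random(1)
boot_means = [
    statistics.fmean(rng.choices(data, k=len(data))) for _ in range(2000)
]

lower, upper = percentile_ci(boot_means, alpha=0.02)  # 98% interval
print(lower, upper)
```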
EXAMPLE
[Figure: table of resamples, continuing through Resample #N]
The 98% C.I. is 8.289 to 18.9365.
Running the code a second time gives a slightly different result:
          First run               Second run
MEAN      12.447             ≈    12.457
C. I.     (8.289, 18.9365)   ≈    (8.4825, 18.4995)
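The two runs differ because each run draws a new set of random resamples. The sketch below illustrates this with made-up data: the function name `bootstrap_mean_ci`, the seeds, and the sample are hypothetical, and it is not meant to reproduce the numbers above. Two different seeds give close but not identical results, while reusing a seed reproduces a run exactly.

```python
import math
import random
import statistics

def bootstrap_mean_ci(sample, n_resamples, alpha, seed):
    """Return (mean of bootstrap means, percentile CI) for one seeded run."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lower = means[max(0, math.ceil(n_resamples * alpha / 2) - 1)]
    upper = means[min(n_resamples - 1,
                      math.ceil(n_resamples * (1 - alpha / 2)) - 1)]
    return statistics.fmean(means), (lower, upper)

# Made-up sample; not the data used on the slides.
data = [3.2, 8.5, 12.1, 15.7, 9.4, 21.3, 6.8, 13.9, 18.2, 11.0]

run1 = bootstrap_mean_ci(data, 1000, 0.02, seed=1)
run2 = bootstrap_mean_ci(data, 1000, 0.02, seed=2)   # different resamples
rerun1 = bootstrap_mean_ci(data, 1000, 0.02, seed=1)  # same seed as run1

print(run1)
print(run2)
```

Increasing the number of resamples shrinks the run-to-run differences, and fixing the seed makes a reported interval reproducible.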
THE JACKKNIFE
Jackknife methods make use of systematic partitions
of a data set to estimate properties of an estimator
computed from the full sample.
For a data set $X = (x_1, x_2, x_3, x_4, x_5)$ the standard
deviation of the average is:
$$\mathrm{SE}(\bar{x}) = \sqrt{\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
For measurements other than the mean,
there is no easy way to assess the accuracy.
Jackknife Method
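Let $t_{(i)} = t(x_1, x_2, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$, the statistic computed with the $i$-th observation left out. Let $\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_{(i)}$. Then the
jackknife estimate of $\mathrm{SE}(t)$ is given by
$$\mathrm{JSE}(t) = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\left(t_{(i)} - \bar{t}\right)^2} = \frac{(n-1)\, s_{t^*}}{\sqrt{n}} \qquad (1)$$
where $s_{t^*}$ is the sample standard deviation of $t_{(1)}, t_{(2)}, \dots, t_{(n)}$.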
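Equation (1) translates directly into code. The following is a sketch under stated assumptions: the function name `jackknife_se` and the sample values are invented for illustration, and the sample standard deviation is used as the statistic only as an example of a measure other than the mean.

```python
import math
import statistics

def jackknife_se(sample, statistic):
    """Jackknife standard error of `statistic`, plus the leave-one-out values."""
    sample = list(sample)
    n = len(sample)
    # t_(i): the statistic with the i-th observation left out.
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    t_bar = sum(leave_one_out) / n
    se = math.sqrt((n - 1) / n * sum((t - t_bar) ** 2 for t in leave_one_out))
    return se, leave_one_out

# Made-up sample; not the data used on the slides.
data = [3.2, 8.5, 12.1, 15.7, 9.4, 21.3, 6.8, 13.9, 18.2, 11.0]
se_sd, replicates = jackknife_se(data, statistics.stdev)

# Second form of equation (1): (n - 1) * s_t* / sqrt(n).
n = len(data)
se_alt = (n - 1) * statistics.stdev(replicates) / math.sqrt(n)
print(se_sd, se_alt)
```

Both printed values agree up to floating-point rounding, which is a quick check that the two forms of equation (1) are the same quantity.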
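[Figure: jackknife scheme. An unknown distribution F yields the sample $x_1, x_2, \dots, x_n$, and the statistic t is estimated by $\hat{t} = t(x_1, x_2, \dots, x_n)$ (inference). The sample is resampled by leaving out one observation at a time, repeated n times (marked n≥1000 on the slide): $t(x_2, x_3, \dots, x_n)$, $t(x_1, x_3, \dots, x_n)$, ..., $t(x_1, x_2, \dots, x_{n-1})$, and the statistic is computed on each.]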
Jackknife replicates: $t_{(1)}, t_{(2)}, \dots, t_{(n)}$
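For the sample mean, $t = \bar{x}$, the jackknife estimate reduces to the usual standard error of the mean:
$$\mathrm{JSE}(t) = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\left(\bar{x}^{*}_{i} - \bar{x}^{*}\right)^2} = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n(n-1)}} = \mathrm{SE}(\bar{x}) \qquad (2)$$
where $\bar{x}^{*}_{i}$ is the mean with $x_i$ left out and $\bar{x}^{*}$ is the average of the $\bar{x}^{*}_{i}$.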
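Equation (2) can be checked numerically. The short sketch below uses five made-up values (any numeric sample would do) and compares the jackknife standard error of the mean with the textbook formula; the variable names are illustrative only.

```python
import math
import statistics

# Five made-up observations; not the data used on the slides.
data = [2.0, 4.5, 3.1, 7.8, 5.6]
n = len(data)
total = sum(data)

# Leave-one-out means: remove x_i from the total and divide by n - 1.
loo_means = [(total - x) / (n - 1) for x in data]
loo_bar = sum(loo_means) / n

# Left-hand side of equation (2): jackknife standard error.
jse = math.sqrt((n - 1) / n * sum((m - loo_bar) ** 2 for m in loo_means))

# Right-hand side of equation (2): usual standard error of the mean.
x_bar = total / n
se = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n * (n - 1)))

print(jse, se)
```

The agreement follows because each leave-one-out mean differs from $\bar{x}$ by $(\bar{x} - x_i)/(n-1)$, so the factor $(n-1)/n$ turns the sum of squares into the usual $n(n-1)$ denominator.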
EXAMPLE FOR
JACKKNIFE
AN EXAMPLE IN PYTHON
EXERCISE
……
Repeat for 10 times
[Table: Jackknife results. Column headings, as far as recoverable: Outcome for total sample | Bias | Corrected estimates of original outcomes | Standard error]