Drawing Random Samples From Statistical Distributions: Paul E. Johnson
Paul E. Johnson
2017
Descriptive
Where Do We Start?
Probability Density Function (PDF), f(x)
Note the small letter f.
[Figure: PDF f(x) plotted against x, for x from -3 to 3]
... variable of integration.
F(x): Note the CAPITAL letter F.
The CDF is always “S shaped”.
[Figure: CDF F(x) plotted against x, for x from -3 to 3]
Some people may be confused about the usage of x in f(x) and
F(x). Sometimes I write F(x_upper) or F(k) to clear that up.
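A minimal sketch of the f(x) versus F(k) distinction, assuming a standard normal distribution (an assumption made here for illustration). The density is integrated up to an upper limit k, and the result is compared with R's built-in CDF at k. The names k, area, and cdf_at_k are illustrative only.

```r
# Standard normal assumed: dnorm is the PDF f, pnorm is the CDF F.
k <- 1.0

# Integrate f from -Inf up to k. Inside the integral, x is only the
# variable of integration; the result depends on the upper limit k.
area <- integrate(dnorm, lower = -Inf, upper = k)$value

# R's CDF evaluated at the same upper limit.
cdf_at_k <- pnorm(k)

print(c(integral = area, pnorm = cdf_at_k))
```

The two numbers should agree up to numerical integration error, which is the sense in which F(k) accumulates f(x) up to k.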
Inversion method:
Works easily if we can calculate “quantiles” (meaning the CDF
is invertible).
If CDF can be closely approximated, an approximate “look-up
table” can be created (R’s Normal)
Rejection Sampling
Find some other similar PDF that is easier to calculate
Use algorithm to test “candidates” and keep ones that fit
Composition, MCMC, and other methods are not worked out
in these notes.
Inversion
Consider a CDF.
What does the left-hand side mean? Fraction of cases smaller than that point.
Think “backwards” to find the x that corresponds.
[Figure: CDF F(x) against x; heights 0.21, 0.58, and 0.82 on the vertical axis are marked at x = -0.8, 0.2, and 0.9]
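The “backwards” reading can be sketched in R with a quantile function. The heights and x values marked in the figure are consistent with a standard normal CDF, so this sketch assumes that distribution. The names heights and x_back are illustrative.

```r
# Heights read off the vertical axis of the CDF figure.
heights <- c(0.21, 0.58, 0.82)

# qnorm is the inverse of pnorm: it maps a height F(x) back to x.
# A standard normal is assumed here.
x_back <- qnorm(heights)
print(round(x_back, 2))

# Going forwards again recovers the heights.
print(pnorm(x_back))
```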
An “equally likely” draw from f(x) would have this property:
All points on the vertical axis in the interval [0, 1] are going to be equally likely. Right?
(Otherwise, a randomly drawn x wouldn’t really be random, eh?)
[Figure: the same CDF, F(x) against x, with 0.21, 0.58, and 0.82 marked at x = -0.8, 0.2, and 0.9]
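One way to check the “equally likely” claim is to draw many x values, push each through the CDF, and count how many land in each tenth of the vertical axis. This is a sketch assuming a standard normal and an arbitrary seed; the names x, u, bins, and counts are illustrative.

```r
set.seed(101)  # arbitrary seed, only for reproducibility

# Draw x from a standard normal (assumed) and compute F(x) for each draw.
x <- rnorm(10000)
u <- pnorm(x)

# Split the vertical axis [0, 1] into ten equal intervals and count the draws.
bins <- cut(u, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
counts <- table(bins)
print(counts)
```

Each interval should hold roughly 1000 of the 10000 values, which is what a uniform spread on the vertical axis looks like.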
Inversion Algorithm
Inversion method:
Draw a random u ∼ Uniform[0, 1]
“Think Backwards” to get the corresponding $x = F^{-1}(u)$
Collect a lot of x
[Figure: CDF F(x) against x; heights 0.21, 0.58, and 0.82 map back to x = -0.8, 0.2, and 0.9]
Draw u ∼ U[0, 1]
Calculate $x^{*} = \mu + \sigma \cdot \ln\left[\frac{u}{1-u}\right]$
And, as they say on Shampoo instructions, Repeat.
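The draw-calculate-repeat steps can be sketched in a few vectorized lines of R. The formula above has the form of the logistic quantile function with location µ and scale σ, so the sketch checks it against R's qlogis. The values of mu, sigma, and n, and the seed, are arbitrary assumptions.

```r
set.seed(1234)  # arbitrary seed

mu <- 2        # location (assumed value)
sigma <- 0.5   # scale (assumed value)
n <- 10000     # number of draws

# Step 1: draw u from Uniform[0, 1].
u <- runif(n)

# Step 2: invert the CDF with the closed-form expression.
x_star <- mu + sigma * log(u / (1 - u))

# Cross-check against R's logistic quantile function.
x_check <- qlogis(u, location = mu, scale = sigma)

print(all.equal(x_star, x_check))
print(mean(x_star))
```

Vectorizing over u takes the place of an explicit “repeat” loop.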
Limitations of Inversion
Rejection Sampling
Suppose r(x) is defined as follows:
When x < 0, r(x) = f(x).
When x ≥ 0, r(x) = 1.1 · f(x).
For now, don’t worry if such an r(x) exists, because it
doesn’t. But it really makes the point clear.
Draw a “candidate” random number x∗ from r. Should we
keep it?
Illustration
If x∗ < 0, accept it as a representation of f(x).
When x < 0, r and f coincide, so we can keep all of those draws.
[Figure: probability density against x, for x from -2 to 3, with the annotation “Throw away this fraction”]
Illustration, if x∗ ≥ 0
f and r coincide most of the time.
Where r and f “overlap”, we want to keep x∗.
That happens with probability f(x∗)/r(x∗) = 1/1.1 ≈ 0.91
[Figure: probability density against x, for x from -2 to 3, with the annotation “Throw away this fraction”]
Assume r(x) is always bigger than f(x) (by definition).
A draw from r(x) might be something like a draw from f(x).
[Figure: f(x) and r plotted against x, for x from 0 to 3]
The probability of drawing x∗ = 1.9 can be calculated from r(x) and f(x).
Keep x∗ with probability f(1.9)/r(1.9).
Amounts to “throwing away” a “gap sized fraction” of candidate draws equal to 1.9.
[Figure: f(x) against x, for x from 0 to 3, with r(1.9) and f(1.9) marked and the gap between them labeled 0.09]
This procedure wastes computational effort.
“Works” even if r is not like f at all, but is just more wasteful.
[Figure: probability density against x, for x from 0 to 1, showing r(x) = 1.53 and f(x) = β(1.2, 1.9)]
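A minimal sketch of rejection sampling for the case in the figure, with target f(x) = β(1.2, 1.9) and the flat r(x) = 1.53. It assumes candidates are drawn uniformly on [0, 1], so r is 1.53 times the uniform density. The seed, the candidate count, and all variable names are illustrative assumptions, not the author's code.

```r
set.seed(2017)  # arbitrary seed

M <- 1.53  # height of the flat r(x) taken from the figure
f <- function(x) dbeta(x, shape1 = 1.2, shape2 = 1.9)  # target density

# Sanity check on a grid: r(x) = M must sit above f(x) everywhere.
grid <- seq(0, 1, length.out = 1001)
stopifnot(all(f(grid) <= M))

# Draw candidates x* uniformly on [0, 1] (assumed candidate distribution).
n_cand <- 20000
x_cand <- runif(n_cand)

# Keep each candidate with probability f(x*) / r(x*).
u <- runif(n_cand)
keep <- u < f(x_cand) / M
x_acc <- x_cand[keep]

accept_rate <- mean(keep)
print(accept_rate)   # fraction of candidates kept
print(mean(x_acc))   # compare with the Beta mean, 1.2 / (1.2 + 1.9)
```

The fraction kept should be close to 1/1.53, so roughly a third of the candidates are thrown away. A taller or worse-fitting r would waste more.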
              v1          v2          v3          v4          v5          v6
[1,]         403         403         403         403         403         403
[2,]           2           4           7           9          11          16
[3,] -1346850345 -1346850345 -1346850345 -1346850345 -1346850345 -1346850345
[4,]   656028621   656028621   656028621   656028621   656028621   656028621
Repeat
# After each single gamma draw, save the first four elements of the RNG state
invisible(rgamma(1, shape = 1)); v1 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v2 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v3 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v4 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v5 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v6 <- .Random.seed[1:4]
cbind(v1, v2, v3, v4, v5, v6)
              v1          v2          v3          v4          v5          v6
[1,]         403         403         403         403         403         403
[2,]          19          21          24          26          28          31
[3,] -1346850345 -1346850345 -1346850345 -1346850345 -1346850345 -1346850345
[4,]   656028621   656028621   656028621   656028621   656028621   656028621
# Record the stream position (.Random.seed[2]) after each single gamma draw
v <- vector(mode = "integer", length = 10000)
for (i in 1:10000) {
    invisible(rgamma(1, shape = 1)); v[i] <- .Random.seed[2]
}
vd <- diff(v)  # change in position between consecutive gamma draws
table(vd)
vd
 -622  -621  -619  -617  -615  -614  -613  -611  -609  -607
   10    25     2     2     3     1     1     1     3     1

 -605  -603  -601  -599     2     3     5     7     8     9
    1     1     1     1  4937  3836   316   267    32   138

   10    11    12    13    14    15    16    17    18    19
   25   111    13    54    10    46    23    28     8    24

   20    21    22    23    24    25    26    27    29    30
   12    19     5    12     1     7     1     5     4     1

   32    33    34    35    36    38    39
    2     2     1     1     3     1     1
There’s some distracting wrap-around: the negative differences occur when
the counter hits 624 and goes back to 1. But the point is clear enough.
Most often, a gamma draw takes 2 or 3 draws from the stream, but sometimes
we see 20 or 30 draws.
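For comparison with the gamma counts, the same bookkeeping can be applied to runif, which (under R's default Mersenne-Twister generator) is expected to advance the counter by exactly one per draw. This is a sketch under that assumption; the seed and the names pos and step are illustrative.

```r
RNGkind("Mersenne-Twister")  # R's default generator, stated explicitly
set.seed(42)                 # arbitrary seed

# Record the stream position after each single uniform draw.
pos <- integer(6)
for (i in seq_along(pos)) {
    invisible(runif(1))
    pos[i] <- .Random.seed[2]
}

step <- diff(pos)  # positions consumed between consecutive draws
print(pos)
print(step)
```

A constant step here, next to the variable steps for rgamma, is what makes the gamma generator's varying appetite for uniform draws stand out.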