Descriptive
Drawing Random Samples From Statistical
Distributions
Paul E. Johnson1 2
1 Department of Political Science
2 Center for Research Methods and Data Analysis, University of Kansas
2017
Where do random samples come from?
Analytical solutions for a few distributions (ones that have
invertible CDF)
Approximate computational solutions for the rest
Where Do We Start?
Assume we have formulae for distributions from which we
want to draw samples
Assume we have a random generator that can give us random
integers on [0, MAX]
Assume that the random generator is a good one, either
MT19937 or one of L’Ecuyer’s parallel stream generators.
Of course, do whatever bookkeeping is necessary to ensure
perfect replication
PDF
Probability Density Function, f(x)
Note small letter used for PDF
[Figure: PDF curve; vertical axis f(x) from 0.0 to 0.4, horizontal axis x from −3 to 3]
F(x) Cumulative Distribution Function (CDF) is S-Shaped
CDF: Area on left of point x
$F(k) = \int_{-\infty}^{k} f(x)\,dx$ or $F(x) = \int_{-\infty}^{x} f(e)\,de$
Used e for dummy variable of integration.
Note CAPITAL letter used for CDF
CDF is always “S shaped”
[Figure: CDF curve; vertical axis F(x) from 0.0 to 1.0, horizontal axis x from −3 to 3]
Some people may be confused about usage of x in f(x) and
F(x). Sometimes I write $F(x_{upper})$ or F(k) to clear that up
The U[0,1] is Obtained Easily from Generator
A Uniform draw on [0, 1] is obtained:
$x \sim \text{random integer from } [0, MAX]$ (1)
$y = \frac{x}{MAX}$ (2)
From that, can get Bernoulli Heads, Tails sample. If y > 0.5,
then Heads
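Here is a minimal R sketch of equations (1) and (2). It is only an illustration: the value of MAX is a hypothetical choice, and sample.int() stands in for the raw integer generator, which these notes do not specify.

```r
set.seed(2017)

# Hypothetical MAX; a real generator defines its own upper limit.
MAX <- .Machine$integer.max - 1L

n <- 10
# sample.int() returns integers on 1..(MAX + 1); subtract 1 to land on [0, MAX]
x <- sample.int(MAX + 1L, size = n, replace = TRUE) - 1L

# Equation (2): rescale the integer to the unit interval
y <- x / MAX

# Bernoulli sample: Heads if y > 0.5, otherwise Tails
flips <- ifelse(y > 0.5, "Heads", "Tails")
print(data.frame(x, y, flips))
```

Every y falls between 0 and 1, and roughly half of the flips should come up Heads in a long run.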
Other Distributions Require More Thought
Inversion method:
Works easily if we can calculate “quantiles” (meaning the CDF
is invertible).
If CDF can be closely approximated, an approximate “look-up
table” can be created (R’s Normal)
Rejection Sampling
Find some other similar PDF that is easier to calculate
Use algorithm to test “candidates” and keep ones that fit
Composition, MCMC, and other methods are not worked out
in these notes.
Inversion
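Consider a CDF.
What does the left hand side mean?
Fraction of cases smaller than that point.
Think “backwards” to find x that corresponds.
[Figure: CDF with F(x) = 0.21, 0.58, 0.82 marked on the vertical axis and the corresponding x = −0.8, 0.2, 0.9 marked on the horizontal axis]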
Concept behind Inversion
An “equally likely” draw from f(x) would have this property:
All points on the vertical axis between [0,1] are going to be equally likely.
Right?
(Otherwise, a randomly drawn x wouldn’t really be random, eh?)
[Figure: CDF with F(x) = 0.21, 0.58, 0.82 on the vertical axis and x = −0.8, 0.2, 0.9 on the horizontal axis]
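The "equally likely on the vertical axis" claim can be checked numerically. This sketch assumes a standard Logistic as the example distribution (my choice, not something the slide specifies). It takes draws from R's rlogis(), pushes them through the CDF plogis(), and counts how many land in each tenth of [0, 1].

```r
set.seed(2017)

x <- rlogis(10000)   # draws from a standard Logistic (assumed example)
u <- plogis(x)       # height of the CDF at each draw

# If the heights are equally likely, each tenth of [0, 1] gets about 1000 draws
counts <- table(cut(u, breaks = seq(0, 1, by = 0.1)))
print(counts)
```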
Inversion Algorithm
Inversion method
draw a random u ∼ Uniform[0, 1]
“Think Backwards” to get corresponding $x = F^{-1}(u)$
Collect a lot of those, and you’ve got a sample from f(x)
[Figure: CDF with F(x) = 0.21, 0.58, 0.82 on the vertical axis and x = −0.8, 0.2, 0.9 on the horizontal axis]
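The algorithm fits in a few lines of R. In this sketch, inversion_sample() is a hypothetical helper name, and the Exponential distribution is used only because R supplies its inverse CDF as qexp(); any function that computes F^{-1}(u) could be passed in instead.

```r
set.seed(2017)

# Hypothetical helper: quantile_fn is any function that computes F^{-1}(u)
inversion_sample <- function(n, quantile_fn, ...) {
  u <- runif(n)          # step 1: u ~ Uniform[0, 1]
  quantile_fn(u, ...)    # step 2: "think backwards" to x = F^{-1}(u)
}

# Assumed example: Exponential with rate 2, whose mean is 1 / 2
x <- inversion_sample(10000, qexp, rate = 2)
print(mean(x))
```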
Logistic: Easy Inversion
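Recall the Logistic PDF, with “location” parameter µ and “scale”
parameter σ:
$f(x) = \frac{1}{\sigma} \cdot \frac{\exp\left(-\frac{x-\mu}{\sigma}\right)}{\left(1 + \exp\left(-\frac{x-\mu}{\sigma}\right)\right)^{2}}$ (3)
In the usual cases we discuss, µ = 0 and σ = 1.0, so the pdf in
question is simpler.
And CDF:
$F(x) = \frac{\exp\left(\frac{x-\mu}{\sigma}\right)}{1 + \exp\left(\frac{x-\mu}{\sigma}\right)} = \frac{1}{1 + e^{-(x-\mu)/\sigma}}$ (4)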
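As a sanity check, equations (3) and (4) can be typed in directly and compared with R's built-in dlogis() and plogis(). The function names logistic_pdf and logistic_cdf are made up for this sketch, and the parameter values are arbitrary.

```r
# Equation (3), written out by hand
logistic_pdf <- function(x, mu = 0, sigma = 1) {
  z <- exp(-(x - mu) / sigma)
  z / (sigma * (1 + z)^2)
}

# Equation (4), right-hand form
logistic_cdf <- function(x, mu = 0, sigma = 1) {
  1 / (1 + exp(-(x - mu) / sigma))
}

x <- seq(-5, 5, by = 0.5)
pdf_ok <- all.equal(logistic_pdf(x, mu = 1, sigma = 2),
                    dlogis(x, location = 1, scale = 2))
cdf_ok <- all.equal(logistic_cdf(x, mu = 1, sigma = 2),
                    plogis(x, location = 1, scale = 2))
print(c(pdf_ok, cdf_ok))
```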
Calculate Logistic Inverse CDF
For each cumulative probability value y, find the x
that corresponds to it via F:
$y = \frac{1}{1 + e^{-(x-\mu)/\sigma}}$
Through the simple algebra used in derivation of Logistic
Regression
$\ln\left(\frac{y}{1-y}\right) = \frac{x-\mu}{\sigma}$
So
$x^{*} = \mu + \sigma \cdot \ln\left(\frac{y}{1-y}\right)$
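For readers who want the "simple algebra" spelled out, the intermediate steps are sketched here: invert both sides, isolate the exponential, then take logs.

```latex
\begin{align*}
y &= \frac{1}{1 + e^{-(x-\mu)/\sigma}} \\
\frac{1}{y} &= 1 + e^{-(x-\mu)/\sigma} \\
\frac{1-y}{y} &= e^{-(x-\mu)/\sigma} \\
\ln\left(\frac{y}{1-y}\right) &= \frac{x-\mu}{\sigma} \\
x &= \mu + \sigma \ln\left(\frac{y}{1-y}\right)
\end{align*}
```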
Use That For Inversion Sampling
Draw u ∼ U[0, 1]
Calculate $x^{*} = \mu + \sigma \cdot \ln\left[\frac{u}{1-u}\right]$
And, as they say on Shampoo instructions, Repeat.
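The two steps, repeated n times, look like this in R. The function name r_logistic_inv and the parameter values are hypothetical. The check against qlogis() is there only to show that the sample quantiles land near the theoretical ones.

```r
set.seed(2017)

# Inversion sampler for the Logistic: x* = mu + sigma * ln(u / (1 - u))
r_logistic_inv <- function(n, mu = 0, sigma = 1) {
  u <- runif(n)                    # u ~ U[0, 1]
  mu + sigma * log(u / (1 - u))    # inverse CDF applied to each u
}

x <- r_logistic_inv(10000, mu = 1, sigma = 2)

# Compare sample quantiles with theoretical quantiles from qlogis()
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
print(rbind(sample = quantile(x, probs),
            theory = qlogis(probs, location = 1, scale = 2)))
```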
Limitations of Inversion
Inversion is only practical when we have a formula for $F^{-1}$
that is easy to calculate.
Logistic distribution, for example, has an easy formula.
Normal, and many others, DO NOT.
So we must either
Approximate $F^{-1}$ in order to conduct inversion
Find some other algorithm.
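The first option can be tried with tools R already provides. The Normal has no closed-form inverse CDF, but qnorm() computes a close numerical approximation of it, so inversion still goes through. This is only a sketch of the idea, not a description of how rnorm() is coded internally.

```r
set.seed(2017)

u <- runif(10000)   # u ~ U[0, 1]
x <- qnorm(u)       # approximate F^{-1}(u) for the standard Normal

# Mean should be near 0 and standard deviation near 1
print(c(mean = mean(x), sd = sd(x)))
```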
Rejection Sampling
f (x): The pdf from which we want to draw a sample
f (x) is some complicated formula; we can’t calculate its CDF
or inverse CDF. That means we have no obvious way to
sample from f (x)
But we can calculate the PDF at any given point, and that
turns out to be the magic bullet.
General Idea behind Rejection Sampling
r (x) is another PDF, one that is easy to draw from, and it is different from f (x) in some understandable way.
So, draw samples from r (x),
then keep the ones you need.
How do you know when to throw away a draw from r (x)?
That’s the trick!
Start with an Easy Case
Suppose
When x < 0, r (x) = f (x).
When x ≥ 0, r (x) = 1.1 · f (x)
For now, don’t worry if such an r (x) exists, because it
doesn’t. But it really makes the point clear.
Draw a “candidate” random number x∗ from r . Should we
keep it?
Illustration
If x∗ < 0, accept it as a representation of f (x)
When x < 0, r and f coincide, so we can keep all of those draws.
[Figure: r and f plotted as probability density against x from −2 to 3, with the annotation “Throw away this fraction”]
Illustration, if x∗ ≥ 0
Most of the time we want to keep that observation, since f
and r coincide most of the time.
Where r and f “overlap”, we want to keep x∗
That happens with probability
f (x∗)/r (x∗) = f (x∗)/(1.1 · f (x∗)) = 1/1.1
So, with probability 1/1.1 = 0.9090909 . . ., we keep x∗
Otherwise, throw it away and draw another.
[Figure: r and f plotted as probability density against x from −2 to 3, with the annotation “Throw away this fraction”]
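The keep-or-discard rule can be tried out in R under two assumptions of mine. First, f is taken to be the standard Normal. Second, r is rescaled so that it integrates to 1 (as written it integrates to 1.05), which makes it possible to draw from it. The rescaling does not change the keep probabilities: 1 when x∗ < 0 and 1/1.1 when x∗ ≥ 0.

```r
set.seed(2017)
n <- 100000

# Draw candidates from the rescaled r: the right half carries
# (1.1 * 0.5) / 1.05 of the probability, the left half the rest.
is_pos <- runif(n) < (1.1 * 0.5) / 1.05
cand <- abs(rnorm(n)) * ifelse(is_pos, 1, -1)

# Keep every negative candidate; keep a non-negative one with probability 1/1.1
keep_prob <- ifelse(cand < 0, 1, 1 / 1.1)
keep <- runif(n) < keep_prob
x <- cand[keep]

# If the rule works, the kept draws are balanced around 0 like a Normal sample
print(c(fraction_kept = mean(keep),
        share_nonnegative = mean(x >= 0),
        sd_kept = sd(x)))
```

About 1/1.05 of the candidates survive, and the kept draws split evenly around zero even though the candidates did not.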
More Realistic Case
Assume r (x) is always bigger than f (x) (by definition)
A draw from r (x) might be something like a draw from f (x).
[Figure: curves r and f plotted against x from 0.0 to 3.0; vertical axis f(x) from 0.0 to 1.0]
Check Out The Size of That Gap!
The probability of keeping a candidate x∗ = 1.9
can be calculated from r (x) and f (x)
Keep x∗ with probability f (1.9)/r (1.9).
Amounts to “throwing away” a “gap sized fraction”
of candidate draws equal to 1.9
[Figure: curves r and f plotted against x from 0.0 to 3.0, with r(1.9) and f(1.9) marked and the annotation gap = 0.09]
Realism and Rejection
This procedure wastes computational effort
“Works” even if r is not like f at all, but is just more wasteful
[Figure: flat line r(x) = 1.53 and the density f(x) = β(1.2, 1.9); vertical axis probability density, horizontal axis x from 0.0 to 1.0]
If we draw a candidate x∗ = 0.2, we are likely to keep it
If we draw a candidate x∗ = 0.9, we are almost always going
to throw it away because r (0.9)/f (0.9) is very large.
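A minimal R sketch of this example follows, using the flat envelope r(x) = 1.53 and the Beta(1.2, 1.9) target from the figure. The variable names and the number of candidates are arbitrary choices. Candidates are spread evenly over [0, 1], and each one is kept with probability f(x∗)/r(x∗).

```r
set.seed(2017)

f <- function(x) dbeta(x, shape1 = 1.2, shape2 = 1.9)   # target density
r_height <- 1.53                                        # flat envelope from the figure

n_cand <- 20000
cand <- runif(n_cand)                        # candidates under the flat envelope
keep <- runif(n_cand) < f(cand) / r_height   # keep with probability f / r
x <- cand[keep]

# Keep probabilities at the two candidates discussed above
print(f(c(0.2, 0.9)) / r_height)

# Fraction of candidates kept, and the sample mean against the Beta mean a / (a + b)
print(c(fraction_kept = mean(keep),
        sample_mean = mean(x),
        theory_mean = 1.2 / (1.2 + 1.9)))
```

The fraction kept should sit near 1/1.53, so about a third of the uniform draws are wasted.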
The rgamma distribution uses rejection sampling I
Rejection uses the random number stream unpredictably.
Sometimes, it takes just a few pulls on the stream to get a
gamma draw, sometimes can take a lot more.
Discussed in vignette with portableParallelSeeds
“[Link]”
Look at row 2, which is the position in the random stream at
which we are positioned after a draw.
RNGkind("Mersenne-Twister")
set.seed(12345)
invisible(rgamma(1, shape = 1)); v1 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v2 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v3 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v4 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v5 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v6 <- .Random.seed[1:4]
cbind(v1, v2, v3, v4, v5, v6)
The rgamma distribution uses rejection sampling II
              v1          v2          v3          v4          v5          v6
[1,]         403         403         403         403         403         403
[2,]           2           4           7           9          11          16
[3,] -1346850345 -1346850345 -1346850345 -1346850345 -1346850345 -1346850345
[4,]   656028621   656028621   656028621   656028621   656028621   656028621
Repeat
invisible(rgamma(1, shape = 1)); v1 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v2 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v3 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v4 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v5 <- .Random.seed[1:4]
invisible(rgamma(1, shape = 1)); v6 <- .Random.seed[1:4]
cbind(v1, v2, v3, v4, v5, v6)
The rgamma distribution uses rejection sampling III
              v1          v2          v3          v4          v5          v6
[1,]         403         403         403         403         403         403
[2,]          19          21          24          26          28          31
[3,] -1346850345 -1346850345 -1346850345 -1346850345 -1346850345 -1346850345
[4,]   656028621   656028621   656028621   656028621   656028621   656028621
v <- vector(mode = "integer", length = 10000)
for (i in 1:10000) {
    invisible(rgamma(1, shape = 1)); v[i] <- .Random.seed[2]
}
vd <- diff(v)
table(vd)
The rgamma distribution uses rejection sampling IV
vd
-622 -621 -619 -617 -615 -614 -613 -611 -609 -607 -605 -603 -601 -599    2    3    5    7    8    9
  10   25    2    2    3    1    1    1    3    1    1    1    1    1 4937 3836  316  267   32  138
  10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   29   30
  25  111   13   54   10   46   23   28    8   24   12   19    5   12    1    7    1    5    4    1
  32   33   34   35   36   38   39
   2    2    1    1    3    1    1
The negative differences are a distracting wraparound: the counter goes back
to 1 after it hits 624. But the point is clear enough. Often, a gamma draw
takes 2 or 3 draws from the stream, while we sometimes see 20 or 30 draws.
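One way to remove the wraparound is to take the differences modulo 624. This sketch assumes that no single rgamma call consumes 624 or more values from the stream. The name draws_used is made up here, and the seed is reset first so the block runs on its own.

```r
RNGkind("Mersenne-Twister")
set.seed(12345)

n <- 10000
v <- integer(n)
for (i in seq_len(n)) {
  invisible(rgamma(1, shape = 1))
  v[i] <- .Random.seed[2]   # stream position after each draw
}

# In R, %% returns a result with the sign of the divisor, so a wrapped
# difference such as -622 becomes 2 (assumes fewer than 624 pulls per draw).
draws_used <- diff(v) %% 624
print(table(draws_used))
```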