Brand Name Distributions
Charles J. Geyer
Contents
1 Discrete Uniform Distribution
2 General Discrete Uniform Distribution
3 Uniform Distribution
4 General Uniform Distribution
5 Bernoulli Distribution
6 Binomial Distribution
7 Hypergeometric Distribution
8 Poisson Distribution
9 Geometric Distribution
10 Negative Binomial Distribution
11 Normal Distribution
12 Exponential Distribution
13 Gamma Distribution
14 Beta Distribution
15 Multinomial Distribution
16 Bivariate Normal Distribution
17 Multivariate Normal Distribution
18 Chi-Square Distribution
19 Student's t Distribution
20 Snedecor's F Distribution
21 Cauchy Distribution
22 Laplace Distribution
1 Discrete Uniform Distribution

Type Discrete.
Moments

E(X) = \frac{n + 1}{2}

var(X) = \frac{n^2 − 1}{12}
2 General Discrete Uniform Distribution

Probability Mass Function

f(x) = \frac{1}{n},  x ∈ S,

where n is the number of elements of S.
3 Uniform Distribution
Abbreviation Unif(a, b).
Type Continuous.
Moments

E(X) = \frac{a + b}{2}

var(X) = \frac{(b − a)^2}{12}
4 General Uniform Distribution

Probability Density Function

f(x) = \frac{1}{c},  x ∈ S,

where c is the measure (length in one dimension, area in two, volume in three, etc.) of the set S.
5 Bernoulli Distribution
Abbreviation Ber(p).
Type Discrete.
Moments
E(X) = p
var(X) = p(1 − p)
6 Binomial Distribution
Abbreviation Bin(n, p).
Type Discrete.
Moments
E(X) = np
var(X) = np(1 − p)
Poisson Approximation If n is large and p is small, then

Bin(n, p) ≈ Poi(np)
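As a quick numerical illustration of this approximation (a minimal sketch using SciPy; the values n = 1000 and p = 0.003 are arbitrary choices, not from the text):

    from scipy.stats import binom, poisson

    n, p = 1000, 0.003          # arbitrary example values
    mu = n * p                  # Poisson mean under the approximation

    # compare the two mass functions near the mean
    for k in range(8):
        print(k, binom.pmf(k, n, p), poisson.pmf(k, mu))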
Theorem The fact that the probability mass function sums to one is
equivalent to the binomial theorem: for any real numbers a and b
\sum_{k=0}^{n} \binom{n}{k} a^k b^{n−k} = (a + b)^n.
Degeneracy If p = 0 the distribution is concentrated at 0. If p = 1 the
distribution is concentrated at n.
7 Hypergeometric Distribution
Abbreviation Hypergeometric(A, B, n).
Type Discrete.
Moments

E(X) = np

var(X) = np(1 − p) · \frac{N − n}{N − 1}

where

p = \frac{A}{A + B}  (7.1)

N = A + B
Binomial Approximation If n is small compared to both A and B, then

Hypergeometric(A, B, n) ≈ Bin(n, p)
6
Normal Approximation If n is large, but small compared to both A and B, then

Hypergeometric(A, B, n) ≈ N(np, np(1 − p))

where p is given by (7.1).
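A sketch of this approximation using SciPy; note that scipy.stats.hypergeom takes its parameters in a different order than the Hypergeometric(A, B, n) notation used here, and the values of A, B, and n below are arbitrary:

    from scipy.stats import hypergeom, norm

    A, B, n = 500, 700, 50       # arbitrary example values
    p = A / (A + B)              # as in (7.1)

    # scipy order: hypergeom(M, n, N) with M = population size,
    # n = successes in the population, N = number of draws
    exact = hypergeom(A + B, A, n)
    approx = norm(n * p, (n * p * (1 - p)) ** 0.5)

    for k in (15, 20, 25):
        print(k, exact.cdf(k), approx.cdf(k))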
Theorem The fact that the probability mass function sums to one is
equivalent to
\sum_{x=\max(0, n−B)}^{\min(A, n)} \binom{A}{x} \binom{B}{n − x} = \binom{A + B}{n}
8 Poisson Distribution
Abbreviation Poi(µ).
Type Discrete.
Moments
E(X) = µ
var(X) = µ
Normal Approximation If µ is large, then

Poi(µ) ≈ N(µ, µ)
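A minimal SciPy sketch of this approximation; the continuity correction of 0.5 and the value µ = 400 are illustrative choices, not from the text:

    from scipy.stats import poisson, norm

    mu = 400                       # arbitrary large mean
    exact = poisson(mu)
    approx = norm(mu, mu ** 0.5)   # sd = sqrt(mu)

    for k in (380, 400, 420):
        # continuity correction: P(X <= k) ~ Phi((k + 0.5 - mu)/sqrt(mu))
        print(k, exact.cdf(k), approx.cdf(k + 0.5))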
Theorem The fact that the probability mass function sums to one is
equivalent to the Maclaurin series for the exponential function: for any
real number x
\sum_{k=0}^{∞} \frac{x^k}{k!} = e^x.
9 Geometric Distribution
Abbreviation Geo(p).
Type Discrete.
Rationales
• Inverse sampling.
Moments

E(X) = \frac{1 − p}{p}

var(X) = \frac{1 − p}{p^2}
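These formulas can be checked against SciPy, with the caveat that scipy.stats.geom counts the trial on which the first success occurs (support 1, 2, ...) rather than the number of failures, so it must be shifted by loc = −1 to match the Geo(p) used here:

    from scipy.stats import geom

    p = 0.3
    rv = geom(p, loc=-1)               # shift support to 0, 1, 2, ...

    print(rv.mean(), (1 - p) / p)      # both 2.333...
    print(rv.var(), (1 - p) / p**2)    # both 7.777...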
Theorem The fact that the probability mass function sums to one is
equivalent to the geometric series: for any real number s such that |s| < 1
\sum_{k=0}^{∞} s^k = \frac{1}{1 − s}.
10 Negative Binomial Distribution

Abbreviation NegBin(r, p).

Type Discrete.

Rationales
• Sum of IID geometric random variables.
• Inverse sampling.
Moments
r(1 − p)
E(X) =
p
r(1 − p)
var(X) =
p2
Normal Approximation If r(1 − p) is large, then

NegBin(r, p) ≈ N\left( \frac{r(1 − p)}{p}, \frac{r(1 − p)}{p^2} \right)
Theorem The fact that the probability mass function sums to one is equivalent to the generalized binomial theorem: for any real number s such that −1 < s < 1 and any real number m

\sum_{k=0}^{∞} \binom{m}{k} s^k = (1 + s)^m.  (10.2)

Taking m = −r and s = −(1 − p) in (10.2) gives

\sum_{k=0}^{∞} \binom{r + k − 1}{k} (1 − p)^k = p^{−r},

which has a more obvious relationship to the negative binomial density summing to one.
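A SciPy sketch checking that the NegBin(r, p) mass function sums to one (the truncation at 500 terms and the values r = 5, p = 0.4 are arbitrary); scipy.stats.nbinom uses the same failures-before-the-r-th-success convention as this section:

    from scipy.stats import nbinom

    r, p = 5, 0.4
    total = sum(nbinom.pmf(k, r, p) for k in range(500))
    print(total)                                # ~1.0
    print(nbinom.mean(r, p), r * (1 - p) / p)   # both 7.5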
11 Normal Distribution
Abbreviation N (µ, σ 2 ).
Type Continuous.
Rationales
• Limiting distribution in the central limit theorem.
• Error distribution that turns the method of least squares into maximum likelihood estimation.
Moments
E(X) = µ
var(X) = σ^2

E{(X − µ)^3} = 0

E{(X − µ)^4} = 3σ^4
Theorem The fact that the probability density function integrates to one
is equivalent to the integral
\int_{−∞}^{∞} e^{−z^2/2}\, dz = \sqrt{2π}
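A one-line numerical confirmation using SciPy quadrature (a sketch; quad handles the infinite limits directly):

    import math
    from scipy.integrate import quad

    value, err = quad(lambda z: math.exp(-z**2 / 2), -math.inf, math.inf)
    print(value, math.sqrt(2 * math.pi))   # both ~2.50663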
12 Exponential Distribution
Abbreviation Exp(λ).
Type Continuous.
Rationales
Moments
E(X) = \frac{1}{λ}

var(X) = \frac{1}{λ^2}
13 Gamma Distribution
Abbreviation Gam(α, λ).
Type Continuous.
Rationales
• Sum of IID exponential random variables.
• Conjugate prior for exponential, Poisson, or normal precision family.
Moments
E(X) = \frac{α}{λ}

var(X) = \frac{α}{λ^2}
Theorem The fact that the probability density function integrates to one is equivalent to the integral

\int_{0}^{∞} x^{α−1} e^{−λx}\, dx = \frac{Γ(α)}{λ^α}

The case λ = 1 is the definition of the gamma function

Γ(α) = \int_{0}^{∞} x^{α−1} e^{−x}\, dx  (13.1)

and integration by parts gives the recursion

Γ(α + 1) = α Γ(α)  (13.2)
Relation to Other Distributions

Γ(1) = 1

and the relationship between the N(0, 1) and Gam(1/2, 1/2) distributions gives

Γ(1/2) = \sqrt{π}

Together with the recursion (13.2) these give, for any positive integer n,

Γ(n + 1) = n!

and

Γ(n + 1/2) = (n − 1/2)(n − 3/2) · · · (3/2)(1/2) \sqrt{π}
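These identities are easy to check against scipy.special.gamma; the choice n = 4 below is arbitrary:

    import math
    from scipy.special import gamma

    n = 4
    prod = math.sqrt(math.pi)
    for j in range(n):              # factors (1/2), (3/2), ..., (n - 1/2)
        prod *= j + 0.5
    print(gamma(n + 0.5), prod)             # both ~11.6317
    print(gamma(n + 1), math.factorial(n))  # both 24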
14 Beta Distribution
Abbreviation Beta(α1 , α2 ).
Type Continuous.
Rationales
Parameters Real numbers α1 > 0 and α2 > 0.

Probability Density Function

f(x) = \frac{Γ(α1 + α2)}{Γ(α1) Γ(α2)} x^{α1 − 1} (1 − x)^{α2 − 1},  0 < x < 1
Moments

E(X) = \frac{α1}{α1 + α2}

var(X) = \frac{α1 α2}{(α1 + α2)^2 (α1 + α2 + 1)}
Theorem The fact that the probability density function integrates to one
is equivalent to the integral
\int_{0}^{1} x^{α1 − 1} (1 − x)^{α2 − 1}\, dx = \frac{Γ(α1) Γ(α2)}{Γ(α1 + α2)}
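A numerical sketch of this identity using SciPy quadrature, with arbitrary illustrative values α1 = 2.5 and α2 = 4.0:

    from scipy.integrate import quad
    from scipy.special import gamma

    a1, a2 = 2.5, 4.0
    value, err = quad(lambda x: x**(a1 - 1) * (1 - x)**(a2 - 1), 0, 1)
    print(value, gamma(a1) * gamma(a2) / gamma(a1 + a2))   # equal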
15 Multinomial Distribution
Abbreviation Multi(n, p).
Type Discrete.
Parameters Real vector p in the parameter space

\left\{ p ∈ R^k : 0 ≤ p_i, i = 1, . . . , k, and \sum_{i=1}^{k} p_i = 1 \right\}  (15.1)

Probability Mass Function

f(x) = \binom{n}{x} \prod_{i=1}^{k} p_i^{x_i}

where

\binom{n}{x} = \frac{n!}{\prod_{i=1}^{k} x_i!}

is called a multinomial coefficient.
Moments

E(X_i) = np_i

var(X_i) = np_i(1 − p_i)

cov(X_i, X_j) = −np_i p_j,  i ≠ j

E(X) = np

var(X) = nM

where

M = P − pp^T

and P is the diagonal matrix whose diagonal is the vector p.
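A simulation sketch of these moment formulas using NumPy's Generator.multinomial; the values of n and p, the seed, and the sample size are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, np.array([0.2, 0.3, 0.5])

    X = rng.multinomial(n, p, size=200_000)   # 200000 x 3 sample
    M = np.diag(p) - np.outer(p, p)           # M = P - pp^T

    print(X.mean(axis=0), n * p)              # close to np
    print(np.cov(X.T))                        # close to nM below
    print(n * M)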
Addition Rule If X1 , . . ., Xk are independent random vectors, Xi being
Multi(ni , p) distributed, then X1 + · · · + Xk is a Multi(n1 + · · · + nk , p)
random variable.
Theorem The fact that the probability mass function sums to one is
equivalent to the multinomial theorem: for any vector a of real num-
bers
\sum_{x ∈ S} \binom{n}{x} \prod_{i=1}^{k} a_i^{x_i} = (a_1 + · · · + a_k)^n
Marginal Distributions Each component is binomial:

X_i ∼ Bin(n, p_i)
Conditional Distributions If {i1 , . . . , im } and {im+1 , . . . , ik } partition
the set {1, . . . , k}, then the conditional distribution of Xi1 , . . ., Xim given
Xim+1 , . . ., Xik is Multi(n − Xim+1 − · · · − Xik , q), where the parameter
vector q has components
q_j = \frac{p_{i_j}}{p_{i_1} + · · · + p_{i_m}},  j = 1, . . . , m
16 Bivariate Normal Distribution

Type Continuous.
Moments

E(X_i) = µ_i,  i = 1, 2

var(X_i) = σ_i^2,  i = 1, 2

cov(X_1, X_2) = ρσ_1σ_2

cor(X_1, X_2) = ρ

E(X) = µ

var(X) = M
17 Multivariate Normal Distribution

Type Continuous.
Rationales
Probability Density Function If M is (strictly) positive definite,

f(x) = (2π)^{−k/2} \det(M)^{−1/2} \exp\left( −\frac{1}{2} (x − µ)^T M^{−1} (x − µ) \right)

where k is the dimension of X.

Moments

E(X) = µ

var(X) = M
Conditional Distributions Every conditional of a multivariate normal
is normal (univariate or multivariate as the case may be). In partitioned
form, the conditional distribution of X1 given X2 is
N(µ_1 + M_{12} M_{22}^{−} [X_2 − µ_2],  M_{11} − M_{12} M_{22}^{−} M_{21})
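A NumPy sketch of this conditional formula for a two-dimensional case, using an ordinary matrix inverse (assuming M_{22} is nonsingular, in which case the generalized inverse M_{22}^{−} coincides with it); all numerical values are arbitrary:

    import numpy as np

    mu1, mu2 = np.array([0.0]), np.array([1.0])
    M11 = np.array([[2.0]])
    M12 = np.array([[0.8]])
    M21 = M12.T
    M22 = np.array([[1.5]])

    x2 = np.array([2.0])                  # observed value of X2
    K = M12 @ np.linalg.inv(M22)          # M12 M22^{-1}
    cond_mean = mu1 + K @ (x2 - mu2)
    cond_var = M11 - K @ M21
    print(cond_mean, cond_var)            # [0.533...] [[1.573...]]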
18 Chi-Square Distribution
Abbreviation chi^2(ν) or χ^2(ν).
Type Continuous.
Rationales
Moments
E(X) = ν
var(X) = 2ν
Normal Approximation If ν is large, then

chi^2(ν) ≈ N(ν, 2ν)
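A quick SciPy sketch of this approximation with the arbitrary choice ν = 200:

    from scipy.stats import chi2, norm

    nu = 200
    exact = chi2(nu)
    approx = norm(nu, (2 * nu) ** 0.5)

    for x in (180, 200, 220):
        print(x, exact.cdf(x), approx.cdf(x))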
19 Student’s t Distribution
Abbreviation t(ν).
Type Continuous.
Rationales
• Sampling distribution of pivotal quantity \sqrt{n}(\bar{X}_n − µ)/S_n when data are IID normal.
Probability Density Function

f(x) = \frac{1}{\sqrt{νπ}} · \frac{Γ(\frac{ν + 1}{2})}{Γ(\frac{ν}{2})} · \frac{1}{\left(1 + \frac{x^2}{ν}\right)^{(ν+1)/2}},  −∞ < x < +∞
Moments If ν > 1, then
E(X) = 0.
Otherwise the mean does not exist. If ν > 2, then

var(X) = \frac{ν}{ν − 2}.

Otherwise the variance does not exist.
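SciPy's t distribution agrees with these formulas (ν = 5 below is arbitrary):

    from scipy.stats import t

    nu = 5
    print(t(nu).mean())                 # 0.0, exists since nu > 1
    print(t(nu).var(), nu / (nu - 2))   # both 1.666...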
20 Snedecor’s F Distribution
Abbreviation F (µ, ν).
Type Continuous.
Rationale
• Ratio of sums of squares for normal data (test statistics in regression
and analysis of variance).
Moments If ν > 2, then

E(X) = \frac{ν}{ν − 2}.

Otherwise the mean does not exist.
Relation to Other Distributions

• If X and Y are independent and are chi^2(µ) and chi^2(ν) distributed, respectively, then (X/µ)/(Y/ν) is F(µ, ν) distributed.
21 Cauchy Distribution
Abbreviation Cauchy(µ, σ).
Type Continuous.
Rationales
Relation to Other Distributions
22 Laplace Distribution
Abbreviation Laplace(µ, σ).
Type Continuous.
Parameters Real numbers µ and σ > 0, called the mean and standard
deviation, respectively.
Moments
E(X) = µ
var(X) = σ^2
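One caveat when checking this against SciPy: scipy.stats.laplace is parameterized by a scale b with variance 2b^2, not by the standard deviation, so b = σ/\sqrt{2}:

    import math
    from scipy.stats import laplace

    mu, sigma = 1.0, 2.0
    rv = laplace(loc=mu, scale=sigma / math.sqrt(2))  # b = sigma/sqrt(2)
    print(rv.mean(), rv.var())   # 1.0 and 4.0 = sigma**2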