Lecture 2
Paolo Zacchia
General overview
1. Discrete distributions.
X ∼ Be (p)
E [X] = p
Var [X] = p (1 − p)
• The underlying sample space is the set S = {0, 1}n . All the
underlying Bernoulli trials are independent.
X ∼ Bn (p, n)
The binomial distribution (2/3)
• The binomial distribution owes its name to its extensive use
of the binomial coefficient and formula. In fact:
$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$
$$F_X(x; p, n) = \sum_{i=0}^{\lfloor x \rfloor} \binom{n}{i} p^i (1-p)^{n-i}$$
The binomial distribution (3/3)
E [X] = np
Var [X] = np (1 − p)
which is intuitive.
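• As a quick sanity check, the sketch below verifies the p.m.f. and moment formulas against scipy.stats.binom (a minimal sketch assuming SciPy is available; the values of n, p and x are arbitrary).
```python
# Check the binomial p.m.f. and moments against their closed forms.
from math import comb
from scipy.stats import binom

n, p, x = 10, 0.3, 4
X = binom(n, p)

# p.m.f.: C(n, x) p^x (1 - p)^(n - x)
assert abs(X.pmf(x) - comb(n, x) * p**x * (1 - p)**(n - x)) < 1e-12
# E[X] = np and Var[X] = np(1 - p)
assert abs(X.mean() - n * p) < 1e-12
assert abs(X.var() - n * p * (1 - p)) < 1e-12
```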
The geometric distribution (1/3)
• The geometric distribution is also based on a (possibly infinite) sequence of Bernoulli trials. Implicitly, the trials are ordered.
$$f_X(x; p) = p (1-p)^{x-1}$$
$$F_X(x; p) = \sum_{i=0}^{\lfloor x \rfloor - 1} p (1-p)^i = 1 - (1-p)^{\lfloor x \rfloor}$$
The geometric distribution (2/3)
• The geometric m.g.f. exists for t < − log (1 − p):
$$\begin{aligned}
M_X(t; p) &= \lim_{M \to \infty} \sum_{x=1}^{M} \exp(tx) \cdot p (1-p)^{x-1} \\
&= p \exp(t) \cdot \lim_{M \to \infty} \sum_{x=1}^{M} \left[(1-p) \exp(t)\right]^{x-1} \\
&= p \exp(t) \cdot \lim_{M \to \infty} \frac{1 - \left[(1-p) \exp(t)\right]^M}{1 - (1-p) \exp(t)} \\
&= \frac{p \exp(t)}{1 - (1-p) \exp(t)}
\end{aligned}$$
• . . . hence, the mean and variance are as follows.
$$\mathrm{E}[X] = \frac{1}{p} \qquad \mathrm{Var}[X] = \frac{1-p}{p^2}$$
The geometric distribution (3/3)
• The geometric distribution is memoryless: for integers s > t,
$$\begin{aligned}
P(X > s \mid X > t) &= \frac{P(X > s \cap X > t)}{P(X > t)} \\
&= \frac{P(X > s)}{P(X > t)} \\
&= (1-p)^{s-t} \\
&= P(X > s - t)
\end{aligned}$$
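• The memorylessness property can be verified numerically; a minimal sketch with scipy.stats.geom (which uses the same support {1, 2, . . .}; the values of p, s, t are arbitrary):
```python
# P(X > s | X > t) should equal P(X > s - t) for s > t.
from scipy.stats import geom

p, s, t = 0.2, 9, 4
X = geom(p)

lhs = X.sf(s) / X.sf(t)  # P(X > s) / P(X > t), since {X > s} implies {X > t}
rhs = X.sf(s - t)        # equals (1 - p)^(s - t)
assert abs(lhs - rhs) < 1e-12
```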
X ∼ NB (p, r)
Observation 1
The geometric distribution is a special case of the negative binomial distribution, with r = 1; thus it is denoted as X ∼ NB (p, 1).
The negative binomial distribution (2/2)
• One could also look at the number of failures Y = X − r:
$$f_Y(y; p, r) = \binom{r + y - 1}{y} p^r (1-p)^y = (-1)^y \binom{-r}{y} p^r (1-p)^y$$
whence the name “negative” binomial.
• The m.g.f. is defined for t < − log (1 − p):
$$M_X(t; p, r) = \left[\frac{p \exp(t)}{1 - (1-p) \exp(t)}\right]^r$$
• . . . and one can obtain the following mean and variance.
$$\mathrm{E}[X] = \frac{r}{p} \qquad \mathrm{Var}[X] = \frac{r(1-p)}{p^2}$$
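• Note that scipy.stats.nbinom is parametrized in terms of the failures Y = X − r rather than the trials X; a minimal check of the moment formulas under this convention (r and p arbitrary):
```python
# E[X] = E[Y] + r = r/p and Var[X] = Var[Y] = r(1 - p)/p^2.
from scipy.stats import nbinom

r, p = 5, 0.4
Y = nbinom(r, p)  # number of failures before the r-th success

assert abs((Y.mean() + r) - r / p) < 1e-12
assert abs(Y.var() - r * (1 - p) / p**2) < 1e-12
```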
The Poisson distribution (1/6)
• The Poisson distribution is another important distribution
also connected to Bernoulli trials, albeit indirectly.
X ∼ Pois (λ)
$$f_X(x; \lambda) = \frac{\exp(-\lambda) \cdot \lambda^x}{x!}$$
The Poisson distribution (2/6)
• The Poisson’s c.d.f. is:
$$F_X(x; \lambda) = \exp(-\lambda) \sum_{i=0}^{\lfloor x \rfloor} \frac{\lambda^i}{i!}$$
[Figure: Poisson p.m.f. f_X(x), plotted for x = 0, . . . , 10.]
E [X] = λ
Var [X] = λ
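• The indirect connection with Bernoulli trials is the Poisson limit: Bn(λ/n, n) approaches Pois(λ) as n → ∞. A rough numerical illustration (λ and the grid of n values are arbitrary):
```python
# The binomial p.m.f. with p = lam/n approaches the Poisson p.m.f. as n grows.
from scipy.stats import binom, poisson

lam = 3.0
for n in (10, 100, 10_000):
    gap = max(abs(binom(n, lam / n).pmf(x) - poisson(lam).pmf(x))
              for x in range(20))
    print(f"n = {n:>6}: max p.m.f. gap = {gap:.2e}")
```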
X ∼ U {a, b}
$$F_X(x; a, b) = \frac{\lfloor x \rfloor - a + 1}{b - a + 1} \cdot \mathbb{1}[a \le x \le b] + \mathbb{1}[b < x]$$
The uniform discrete distribution (2/2)
X ∼ H (N, K, n)
Hypergeometric distribution (2/3)
• The support of the hypergeometric distribution is:
$$\mathcal{X} = \{\max\{0, n + K - N\}, \ldots, \min\{n, K\}\}$$
Definition 1
Location and scale families. Let fZ (z) be a probability density function associated with some random variable Z. For any µ ∈ R and any σ ∈ R++ , the family of probability density functions of the form
$$f_X(x) = \frac{1}{\sigma} f_Z\!\left(\frac{x - \mu}{\sigma}\right)$$
is called a location-scale family with location parameter µ and scale parameter σ.
• Conversely, it is X = σZ + µ.
Standardization of densities
Theorem 1
Standardization of densities. Let f (·) be any probability density function, µ ∈ R and σ ∈ R++ . Then, a random variable X follows a probability distribution with density function
$$f_X(x) = \frac{1}{\sigma} f\!\left(\frac{x - \mu}{\sigma}\right)$$
if and only if the standardized random variable Z = (X − µ)/σ has density f (z).
$$f_X(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
[Figure: normal densities, standard and with µ = 2, σ² = 4.]
The normal distribution (3/5)
• To show that the density integrates to 1, one can focus on
the standard density, and specifically on half its support.
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right) dz = 1 \iff \int_{0}^{\infty} \exp\!\left(-\frac{z^2}{2}\right) dz = \sqrt{\frac{\pi}{2}}$$
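• A numerical counterpart of this identity, assuming SciPy's quadrature routine is available:
```python
# Both sides of the equivalence above, by numerical integration.
import numpy as np
from scipy.integrate import quad

total, _ = quad(lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi), -np.inf, np.inf)
half, _ = quad(lambda z: np.exp(-z**2 / 2), 0, np.inf)

assert abs(total - 1.0) < 1e-8
assert abs(half - np.sqrt(np.pi / 2)) < 1e-8
```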
$$\mathrm{E}[X] = \mu \qquad \mathrm{Var}[X] = \sigma^2 \qquad \mathrm{Skew}[X] = 0 \qquad \mathrm{Kurt}[X] = 3$$
• For the lognormal distribution, it is log (Y ) ∼ N (µ, σ²):
$$f_Y(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \frac{1}{y} \exp\!\left(-\frac{(\log y - \mu)^2}{2\sigma^2}\right)$$
[Figure: lognormal densities, standard and with µ = 2, σ² = 4.]
The lognormal distribution (3/3)
• The distribution lacks a m.g.f., but all its uncentered moments exist:
$$\mathrm{E}[Y^r] = \exp\!\left(r\mu + \frac{r^2 \sigma^2}{2}\right)$$
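• A minimal check of the moment formula with scipy.stats.lognorm, whose parametrization is s = σ and scale = exp(µ) (the values of µ and σ are arbitrary):
```python
# E[Y^r] = exp(r*mu + r^2 * sigma^2 / 2) for the lognormal.
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.5, 0.8
Y = lognorm(s=sigma, scale=np.exp(mu))

for r in (1, 2, 3):
    assert np.isclose(Y.moment(r), np.exp(r * mu + r**2 * sigma**2 / 2))
```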
X ∼ Logistic (µ, σ)
[Figure: logistic densities, standard and with µ = 2, σ = 2.]
The logistic distribution (3/4)
• The m.g.f. of the standard logistic is obtained as:
$$\begin{aligned}
M_Z(t) &= \int_{-\infty}^{\infty} \exp(tz) \, \frac{\exp(-z)}{(1 + \exp(-z))^2} \, dz \\
&= \int_0^1 u^t (1-u)^{-t} \, du \\
&= B(1 + t, 1 - t)
\end{aligned}$$
where $u = \frac{1}{1 + \exp(-z)}$; observe that here $\frac{du}{dz} = \frac{\exp(-z)}{(1 + \exp(-z))^2}$.
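• A numerical check of this result, integrating exp(tz) against the standard logistic density (the value of t is arbitrary, with |t| < 1):
```python
# M_Z(t) should equal B(1 + t, 1 - t) for |t| < 1.
import numpy as np
from scipy.integrate import quad
from scipy.special import beta
from scipy.stats import logistic

t = 0.4
mgf, _ = quad(lambda z: np.exp(t * z + logistic.logpdf(z)), -np.inf, np.inf)
assert abs(mgf - beta(1 + t, 1 - t)) < 1e-7
```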
$$\mathrm{E}[X] = \mu \qquad \mathrm{Var}[X] = \frac{\sigma^2 \pi^2}{3} \qquad \mathrm{Skew}[X] = 0 \qquad \mathrm{Kurt}[X] = \frac{21}{5}$$
observe the excess kurtosis!
• An obvious reparametrization of the logistic is σ∗ = (√3/π)σ, so that a Logistic (µ, σ∗) random variable has variance exactly σ².
X ∼ Cauchy (µ, σ)
[Figure: Cauchy densities, standard and with µ = 2, σ = 2.]
The Cauchy distribution (3/3)
• The Cauchy distribution is notorious for lacking defined
moments. Consider its standard version’s mean:
$$\mathrm{E}[Z] = \int_{-\infty}^{0} \frac{1}{\pi} \frac{z}{1 + z^2} \, dz + \int_{0}^{+\infty} \frac{1}{\pi} \frac{z}{1 + z^2} \, dz$$
Both integrals diverge, so the expectation is not defined.
• The Cauchy lacks a m.g.f. but, like all distributions, has a characteristic function; this is not differentiable at t = 0.
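• The lack of a mean shows up in simulation: sample averages of Cauchy draws never settle down. A minimal sketch (the seed and sample sizes are arbitrary):
```python
# Unlike under the law of large numbers, the running mean keeps jumping around.
import numpy as np
from scipy.stats import cauchy

rng = np.random.default_rng(0)
draws = cauchy.rvs(size=1_000_000, random_state=rng)
for n in (10**2, 10**4, 10**6):
    print(f"n = {n:>7}: sample mean = {draws[:n].mean():+.3f}")
```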
X ∼ Laplace (µ, σ)
$$f_X(x; \mu, \sigma) = \frac{1}{2\sigma} \exp\!\left(-\frac{|x - \mu|}{\sigma}\right)$$
[Figure: Laplace densities, standard and with µ = 2, σ = 2.]
The Laplace distribution (3/4)
• As usual, it is easier to calculate the standard m.g.f. first:
$$\begin{aligned}
M_Z(t) &= \int_{-\infty}^{+\infty} \frac{1}{2} \exp(tz - |z|) \, dz \\
&= \frac{1}{2} \int_{-\infty}^{0} \exp((1 + t) z) \, dz + \frac{1}{2} \int_{0}^{+\infty} \exp(-(1 - t) z) \, dz \\
&= \frac{1}{2} \left(\frac{1}{1 + t} + \frac{1}{1 - t}\right) \\
&= \frac{1}{1 - t^2}
\end{aligned}$$
valid for |t| < 1. Hence:
$$M_X(t; \mu, \sigma) = \frac{\exp(\mu t)}{1 - \sigma^2 t^2}$$
The Laplace distribution (4/4)
$$\mathrm{E}[X] = \mu \qquad \mathrm{Var}[X] = 2\sigma^2$$
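• A numerical check of the m.g.f. and the variance via scipy.stats.laplace (µ, σ and t are arbitrary, with |t| < 1/σ):
```python
# M_X(t) = exp(mu*t) / (1 - sigma^2 t^2) and Var[X] = 2*sigma^2.
import numpy as np
from scipy.integrate import quad
from scipy.stats import laplace

mu, sigma, t = 1.0, 2.0, 0.3  # note |t| < 1/sigma = 0.5
X = laplace(loc=mu, scale=sigma)

mgf, _ = quad(lambda x: np.exp(t * x) * X.pdf(x), -np.inf, np.inf)
assert abs(mgf - np.exp(mu * t) / (1 - sigma**2 * t**2)) < 1e-7
assert abs(X.var() - 2 * sigma**2) < 1e-12
```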
X ∼ U (a, b)
$$f_X(x; a, b) = \frac{1}{b - a} \cdot \mathbb{1}[x \in (a, b)]$$
• Here α > 0 and β > 0 are the parameters; the notation is:
X ∼ Beta (α, β)
[Figure: Beta densities for (α, β) = (2, 2), (.5, .5), (2, 5), (5, 2).]
Observation 2
X ∼ Beta (1, 1) is equivalent to X ∼ U (0, 1).
The Beta distribution (4/5)
• The Beta’s m.g.f. is difficult to obtain:
$$M_X(t; \alpha, \beta) = 1 + \sum_{q=1}^{\infty} \left(\prod_{k=0}^{q-1} \frac{\alpha + k}{\alpha + \beta + k}\right) \frac{t^q}{q!}$$
Γ (c) = (c − 1) · Γ (c − 1)
$$f_X(x; \alpha, \beta, a, b) = \frac{(x - a)^{\alpha - 1} (b - x)^{\beta - 1}}{B(\alpha, \beta) \cdot (b - a)^{\alpha + \beta - 1}}$$
X ∼ Exp (λ)
$$f_X(x; \lambda) = \frac{1}{\lambda} \exp\!\left(-\frac{x}{\lambda}\right) \qquad F_X(x; \lambda) = 1 - \exp\!\left(-\frac{x}{\lambda}\right)$$
The exponential distribution (2/4)
[Figure: exponential densities for λ = .5, 1, 2.]
The exponential distribution (3/4)
• It is easy to obtain the m.g.f. (recall the case with λ = 1); however, it only exists for t < λ⁻¹:
$$M_X(t; \lambda) = \frac{1}{1 - \lambda t}$$
$$\mathrm{E}[X] = \lambda \qquad \mathrm{Var}[X] = \lambda^2$$
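• A quick quadrature check of the m.g.f. under the scale parametrization used here, where λ is the mean (λ and t < λ⁻¹ are arbitrary):
```python
# M_X(t) = 1 / (1 - lam*t) for t < 1/lam.
import numpy as np
from scipy.integrate import quad

lam, t = 2.0, 0.25
mgf, _ = quad(lambda x: np.exp(t * x) * np.exp(-x / lam) / lam, 0, np.inf)
assert abs(mgf - 1 / (1 - lam * t)) < 1e-8
```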
Observation 4
If X ∼ Exp (λ) and Y = exp (−X), it is Y ∼ Beta (1/λ, 1).
Observation 5
If X ∼ Laplace (µ, σ) and Y = |X − µ| it is Y ∼ Exp (σ), whence the
name “double exponential” for the Laplace.
Observation 6
If X ∼ Exp (1) and
$$Y = \mu - \sigma \log\!\left(\frac{\exp(-X)}{1 - \exp(-X)}\right)$$
it is Y ∼ Logistic (µ, σ). The standard logistic models the odds ratio of exponential events.
The Gamma distribution (1/4)
• Gamma distributions have support upon the set of positive
real numbers X = R++ (X = 0 may be included at will).
• There are two parameters α > 0 and β > 0 (the latter can be reparametrized as θ = β⁻¹); two notations coexist.
$$f_X(x; \alpha, \beta) = \frac{1}{\Gamma(\alpha)} \beta^{\alpha} x^{\alpha - 1} \exp(-\beta x)$$
[Figure: Gamma densities for (α, β) = (2, 2), (4, 2), (2, 8).]
Observation 7
X ∼ Gamma (1, 1/λ) is equivalent to X ∼ Exp (λ), that is, exponential distributions are all special cases of the Gamma family.
The Gamma distribution (3/4)
• Uncentered moments are better calculated directly:
$$\mathrm{E}[X^r] = \frac{1}{\Gamma(\alpha)} \frac{1}{\beta^r} \int_0^{\infty} \beta^{r+\alpha} x^{r+\alpha-1} \exp(-\beta x) \, dx = \frac{\Gamma(r + \alpha)}{\Gamma(\alpha) \, \beta^r}$$
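• A minimal check of the uncentered moments with scipy.stats.gamma, which takes a = α and scale = 1/β (the values of α and β are arbitrary):
```python
# E[X^r] = Gamma(r + alpha) / (Gamma(alpha) * beta^r).
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import gamma

alpha, beta_ = 2.5, 4.0
X = gamma(a=alpha, scale=1 / beta_)

for r in (1, 2, 3):
    target = gamma_fn(r + alpha) / (gamma_fn(alpha) * beta_**r)
    assert np.isclose(X.moment(r), target)
```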
$X \sim \chi^2(\kappa)$ or $X \sim \chi^2_{\kappa}$
$$f_X(x; \kappa) = \frac{1}{\Gamma\!\left(\frac{\kappa}{2}\right) \cdot 2^{\frac{\kappa}{2}}} x^{\frac{\kappa}{2} - 1} \exp\!\left(-\frac{x}{2}\right)$$
[Figure: chi-squared densities for κ = 3, 5, 7.]
Observation 8
X ∼ Gamma (κ/2, 1/2) is equivalent to X ∼ χ²(κ), that is, chi-squared distributions are all special cases of the Gamma family.
The Chi-squared distribution (3/3)
E [X] = κ
Var [X] = 2κ
Observation 9
X ∼ χ²(2) is equivalent to X ∼ Exp (2).
Observation 10
If X ∼ N (0, 1) and Y = X 2 , it is Y ∼ χ2 (1).
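• Observation 10 lends itself to a quick simulation check (the seed and sample size are arbitrary); a Kolmogorov-Smirnov test should not reject the χ²(1) fit:
```python
# Squared standard normal draws against the chi-squared(1) c.d.f.
import numpy as np
from scipy.stats import norm, chi2, kstest

rng = np.random.default_rng(42)
y = norm.rvs(size=100_000, random_state=rng) ** 2
print(kstest(y, chi2(df=1).cdf))  # a large p-value is consistent with chi2(1)
```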
Snedecor’s F-distribution (1/3)
• Another family of distributions with support upon the set of positive real numbers X = R++ (X = 0 may be included).
[Figure: F densities for (ν₁, ν₂) = (2, 2), (2, 6), (12, 12).]
Observation 11
If X ∼ F (ν1 , ν2 ) and Y = X −1 , it is Y ∼ F (ν2 , ν1 ).
Snedecor’s F-distribution (3/3)
• The F-distribution lacks a m.g.f., and its characteristic function is involved. Key moments are better obtained via direct integration.
$$\mathrm{E}[X] = \frac{\nu_2}{\nu_2 - 2} \quad (\nu_2 > 2) \qquad \mathrm{Var}[X] = \frac{2\nu_2^2 (\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^2 (\nu_2 - 4)} \quad (\nu_2 > 4)$$
Observation 12
If X ∼ F (ν₁, ν₂) and Y ∼ Beta (ν₁/2, ν₂/2), the two random variables are linked as follows.
$$Y = \frac{\nu_1 X / \nu_2}{1 + \nu_1 X / \nu_2} \qquad X = \frac{\nu_2 Y}{\nu_1 (1 - Y)}$$
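• A simulation check of Observation 12 (the degrees of freedom, seed, and sample size are arbitrary):
```python
# Transforming F(nu1, nu2) draws should yield Beta(nu1/2, nu2/2) draws.
import numpy as np
from scipy.stats import f, beta, kstest

nu1, nu2 = 4, 7
rng = np.random.default_rng(1)
x = f(nu1, nu2).rvs(size=100_000, random_state=rng)
y = (nu1 * x / nu2) / (1 + nu1 * x / nu2)
print(kstest(y, beta(nu1 / 2, nu2 / 2).cdf))  # a large p-value is expected
```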
Student’s t-distribution (1/4)
• Back to a bell-shaped family with “full” support X = R!
• There is one parameter ν > 0; when ν ∈ N (integer), this is
known as degrees of freedom. The notation is as follows.
X ∼ T (ν) or X ∼ Tν
• The p.d.f. is normalized by the Beta function:
$$f_X(x; \nu) = \frac{1}{B\!\left(\frac{1}{2}, \frac{\nu}{2}\right) \sqrt{\nu}} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$$
or even by the Gamma function, since $B\!\left(\frac{1}{2}, \frac{\nu}{2}\right) = \Gamma\!\left(\frac{1}{2}\right) \Gamma\!\left(\frac{\nu}{2}\right) \big/ \Gamma\!\left(\frac{\nu + 1}{2}\right)$ with $\Gamma\!\left(\frac{1}{2}\right) = \sqrt{\pi}$.
• The t-distribution’s c.d.f. is perhaps best expressed through
the incomplete Beta function.
$$F_X(x; \nu) = \begin{cases} \dfrac{1}{2} B_{\frac{\nu}{x^2 + \nu}}\!\left(\dfrac{\nu}{2}, \dfrac{1}{2}\right) & \text{if } x \le 0 \\[8pt] 1 - \dfrac{1}{2} B_{\frac{\nu}{x^2 + \nu}}\!\left(\dfrac{\nu}{2}, \dfrac{1}{2}\right) & \text{if } x > 0 \end{cases}$$
where $B_z(a, b)$ denotes the regularized incomplete Beta function evaluated at $z$.
Student’s t-distribution (2/4)
[Figure: Student's t density (ν = 3) against the standard Cauchy and standard normal densities.]
Observation 13
X ∼ T (1) is equivalent to X ∼ Cauchy (0, 1).
Student’s t-distribution (3/4)
• The t-distribution lacks a m.g.f., its characteristic function
is involved, and moments of order r ≥ ν are not defined.
$$\mathrm{E}[X^r] = \begin{cases} \sqrt{\dfrac{\nu^r}{\pi}} \cdot \dfrac{\Gamma\!\left(\frac{r+1}{2}\right) \Gamma\!\left(\frac{\nu - r}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)} & \text{if } r \text{ is even, } 0 < r < \nu \\[8pt] 0 & \text{if } r \text{ is odd, } 0 < r < \nu \end{cases}$$
Observation 14
If X ∼ T (ν) and Y = X 2 , it is Y ∼ F (1, ν).
Observation 15
If X ∼ T (ν) and Y = X −2 , it is Y ∼ F (ν, 1).
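• Observations 14 and 15 can be checked by simulation in the same spirit (ν, the seed, and the sample size are arbitrary):
```python
# Squared (and inverse-squared) t draws against the corresponding F c.d.f.s.
import numpy as np
from scipy.stats import t, f, kstest

nu = 6
rng = np.random.default_rng(2)
x = t(nu).rvs(size=100_000, random_state=rng)
print(kstest(x**2, f(1, nu).cdf))    # X^2   ~ F(1, nu)
print(kstest(x**-2, f(nu, 1).cdf))   # X^-2  ~ F(nu, 1)
```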
X ∼ Pareto (α, β)
$$f_X(x; \alpha, \beta) = \frac{\beta \alpha^{\beta}}{x^{\beta + 1}} \quad \text{for } x \ge \alpha$$
[Figure: Pareto densities, all with α = 1, for β = 1, 2, 3.]
Observation 16
If X ∼ Pareto (α, β) and Y ∼ Exp (β⁻¹), the two random variables are related via Y = log (X/α).
• The Generalized Extreme Value (GEV) family gets its name from its connection with the Extreme Value Theorem (Lecture 6). These distributions are fat-tailed.
[Figure: GEV densities for ξ = .5, 0, −.5.]
• Type I Extreme Value: ξ = 0 (Gumbel)
• Type II Extreme Value: ξ > 0 (Fréchet)
• Type III Extreme Value: ξ < 0 (reverse Weibull)
Generalized Extreme Value distributions (4/4)
• The m.g.f. and characteristic functions are quite involved.
• Moments are better obtained via direct integration, but are
defined for some values of ξ only.
• The mean is given by:
$$\mathrm{E}[X] = \begin{cases} \mu + \dfrac{\sigma}{\xi} \left[\Gamma(1 - \xi) - 1\right] & \text{if } \xi \ne 0, \ \xi < 1 \\ \mu + \sigma \gamma & \text{if } \xi = 0 \\ \infty & \text{if } \xi \ge 1 \end{cases}$$
where γ is the Euler-Mascheroni constant.
• . . . while the variance is as follows.
$$\mathrm{Var}[X] = \begin{cases} \dfrac{\sigma^2}{\xi^2} \left[\Gamma(1 - 2\xi) - \left(\Gamma(1 - \xi)\right)^2\right] & \text{if } \xi \ne 0, \ \xi < \frac{1}{2} \\[6pt] \dfrac{\sigma^2 \pi^2}{6} & \text{if } \xi = 0 \\[6pt] \infty & \text{if } \xi \ge \frac{1}{2} \end{cases}$$
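• The ξ = 0 case can be checked against scipy.stats.gumbel_r, whose moments are exact (µ and σ are arbitrary; np.euler_gamma is the Euler-Mascheroni constant):
```python
# Gumbel mean mu + sigma*gamma and variance sigma^2 * pi^2 / 6.
import numpy as np
from scipy.stats import gumbel_r

mu, sigma = 1.0, 2.0
X = gumbel_r(loc=mu, scale=sigma)

assert np.isclose(X.mean(), mu + sigma * np.euler_gamma)
assert np.isclose(X.var(), sigma**2 * np.pi**2 / 6)
```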
The Gumbel (Type I GEV) distribution (1/2)
• The simplest GEV distributions (Type I) have ξ = 0 and
only a location and scale parameter.
[Figure: Gumbel densities.]
The Fréchet (Type II GEV) distribution (1/2)
• The Type II GEV distributions are usually rephrased via α ≡ ξ⁻¹ > 0 and the transformation Y = σ + µ (1 − ξ) + ξX.
• Two alternative pieces of notation are used for them.
[Figure: Fréchet densities, all with α = 2: standard, and with µ = 2, σ = 2.]
[Figure: reverse Weibull (Type III GEV) densities: standard, and with µ = 2, σ = 2.]
[Figure: Weibull densities: standard, and with µ = 2, σ = 2.]
Observation 18
If X ∼ Exp (√α) and W ∼ Weibull (1/2, 0, α), it is as follows.
$$X = \sqrt{W} \quad \& \quad W = X^2$$
Observation 19
If Y ∼ Frechet (α, µY, σ) and W = (Y − µY)⁻¹ + µW, it is as follows.
$$W \sim \mathrm{Weibull}\left(\alpha, \mu_W, \sigma^{-1}\right)$$
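• A simulation check of Observation 19, using scipy.stats.invweibull as the (standard-form) Fréchet and scipy.stats.weibull_min as the Weibull; the parameter values and seed are arbitrary:
```python
# W = 1/(Y - mu_Y) + mu_W should follow a Weibull with shape alpha, scale 1/sigma.
import numpy as np
from scipy.stats import invweibull, weibull_min, kstest

alpha, mu_y, mu_w, sigma = 3.0, 1.0, -2.0, 2.0
rng = np.random.default_rng(3)
y = invweibull(alpha, loc=mu_y, scale=sigma).rvs(size=100_000, random_state=rng)
w = 1 / (y - mu_y) + mu_w
print(kstest(w, weibull_min(alpha, loc=mu_w, scale=1 / sigma).cdf))
```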