P6-Random Variables and Distributions
6.1.2 A random variable is useful for quantifying experimental outcomes or describing events. Sometimes it may be convenient to simplify the sample space so that events of interest can be described by X.
6.1.3 Examples.
(iii) Toss 2 coins. Ω = {HH, HT, TH, TT}. Interested in how many more vertical strokes the first letter has than the second letter (H has 2 vertical strokes, T has 1). Define
X = (no. of vertical strokes in 1st letter) − (no. of vertical strokes in 2nd letter),
so that X(HH) = X(TT) = 0, X(HT) = 1 and X(TH) = −1.
(iv) n Bernoulli trials. Interested in no. of successes. Define X = no. of successes ∈ {0, 1, . . . , n}.
(v) Annual income Y has sample space [0, ∞). Income is taxable when it exceeds some level c, say. Only interested in the part of income which is taxable, so may define X = max{0, Y − c}.
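The mapping in example (v) is easy to express programmatically; below is a minimal Python sketch (the threshold c and the sample incomes are purely illustrative):

    # Example (v): X = max{0, Y - c} extracts the taxable part of income Y.
    def taxable(y, c=30000.0):               # c is a hypothetical tax-free threshold
        return max(0.0, y - c)

    for y in [12000.0, 30000.0, 55000.0]:    # hypothetical annual incomes in [0, infinity)
        print(y, "->", taxable(y))           # e.g. 12000.0 -> 0.0, 55000.0 -> 25000.0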
6.2.2 Example. Toss a coin twice, with Ω = {HH, HT, TH, TT}. Define
• X = no. of heads;
• Y = 1 if both tosses return the same side, and = −1 otherwise.
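Because the four outcomes are equally likely, the distributions of X and Y can be obtained by direct enumeration; a minimal Python sketch:

    from collections import Counter

    omega = ["HH", "HT", "TH", "TT"]                   # equally likely outcomes
    X = {w: w.count("H") for w in omega}               # X = no. of heads
    Y = {w: 1 if w[0] == w[1] else -1 for w in omega}  # Y = 1 if same side, else -1

    # each outcome carries probability 1/4
    pmf_X = {x: n / 4 for x, n in Counter(X.values()).items()}
    pmf_Y = {y: n / 4 for y, n in Counter(Y.values()).items()}
    print(pmf_X)   # {2: 0.25, 1: 0.5, 0: 0.25}
    print(pmf_Y)   # {1: 0.5, -1: 0.5}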
6.2.4 Random variables and distribution functions are useful for describing probability models. The probabilities attributed to events concerning a random variable X can be calculated from the distribution function of X: the cdf of X completely specifies the random behaviour of X.
Example. If X denotes an integer-valued random variable and its distribution function F is given, then we can calculate, for example,
\[
P(X \le x) = F(x), \qquad P(X = x) = F(x) - F(x-1), \qquad P(a < X \le b) = F(b) - F(a).
\]
6.3.2 Examples.
6.3.3 Definition. The mass function of a discrete random variable X is the function f : (−∞, ∞) →
[0, 1] such that
f (x) = P(X = x), x ∈ X(Ω).
Note: Alternative names → probability mass function or probability function.
Definition. The set {x ∈ X(Ω) : f (x) > 0} is known as the support of X.
Note: The support of X usually, but not necessarily, coincides with X(Ω).
6.3.4 Examples.
(iv) Let X be no. of failures before first success in a sequence of independent Bernoulli trials
with success probability p. Then X(Ω) = {0, 1, 2, . . .}, and X has mass function
\[
f(x) =
\begin{cases}
(1-p)^x\, p, & x = 0, 1, 2, \dots,\\[2pt]
0, & \text{otherwise.}
\end{cases}
\]
This is called a geometric distribution.
(v) Let X be no. of failures before kth success in a sequence of independent Bernoulli trials
with success probability p. Then X(Ω) = {0, 1, 2, . . .}, and X has mass function
\[
f(x) =
\begin{cases}
\dbinom{k-1+x}{x} (1-p)^x\, p^k, & x = 0, 1, 2, \dots,\\[4pt]
0, & \text{otherwise.}
\end{cases}
\]
This is called a negative binomial distribution.
(vi) Suppose a random sample of size m is drawn without replacement from a collection of
k objects of one kind and N − k of another kind. Let X be no. of objects of the first kind
found in the sample. Then X has mass function
\[
f(x) =
\begin{cases}
\dbinom{k}{x} \dbinom{N-k}{m-x} \bigg/ \dbinom{N}{m}, & x = \max\{0,\, m+k-N\}, \dots, \min\{k, m\},\\[4pt]
0, & \text{otherwise.}
\end{cases}
\]
This is called a hypergeometric distribution.
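All three mass functions above can be cross-checked numerically; a sketch using scipy.stats with illustrative parameter values (scipy's nbinom counts failures before the kth success, and its hypergeom takes the population size first):

    import numpy as np
    from math import comb
    from scipy import stats

    p, k, N, m = 0.3, 4, 20, 8                    # illustrative parameters
    x = np.arange(6)

    # (iv) geometric: f(x) = (1-p)^x p, the special case of nbinom with k = 1
    assert np.allclose((1 - p) ** x * p, stats.nbinom.pmf(x, 1, p))

    # (v) negative binomial: f(x) = C(k-1+x, x) (1-p)^x p^k
    f = [comb(k - 1 + xi, xi) * (1 - p) ** xi * p ** k for xi in x]
    assert np.allclose(f, stats.nbinom.pmf(x, k, p))

    # (vi) hypergeometric: f(x) = C(k, x) C(N-k, m-x) / C(N, m)
    xs = np.arange(max(0, m + k - N), min(k, m) + 1)
    f = [comb(k, xi) * comb(N - k, m - xi) / comb(N, m) for xi in xs]
    assert np.allclose(f, stats.hypergeom.pmf(xs, N, k, m))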
The following figures display the mass functions of examples of the above discrete random
variables.
[Figure: mass functions of Binomial(8, 0.2) and Bernoulli(p = 0.2).]
6.3.5 The cdf of a discrete random variable X is a step function with jumps at values in the support
of X.
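For instance, the Binomial(8, 0.2) cdf shown below is the cumulative sum of its mass function, flat between support points and jumping by f(x) at each x; a minimal sketch:

    import numpy as np
    from scipy import stats

    x = np.arange(9)                      # support {0, 1, ..., 8}
    pmf = stats.binom.pmf(x, 8, 0.2)
    cdf = np.cumsum(pmf)                  # F(x) = sum of f(y) over y <= x
    assert np.allclose(cdf, stats.binom.cdf(x, 8, 0.2))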
The following figures display the distribution functions of some discrete random variables.
[Figure: distribution functions of Binomial(8, 0.2), Bernoulli(p = 0.2) and other discrete distributions.]
6.4.3 If the cdf F is differentiable, we can obtain the pdf f by f(x) = F′(x) (≥ 0 since F is increasing).
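For example, for the exponential distribution exp(λ) introduced below, F(x) = 1 − e^{−λx} for x > 0, and differentiating recovers the pdf:
\[
f(x) = F'(x) = \lambda e^{-\lambda x} \ge 0, \qquad x > 0.
\]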
6.4.4 The pdf f plays a similar role to the mass function P(X = x) for discrete X. Results for discrete and continuous random variables can often be interchanged, with P(X = x) and the summation sign Σ replaced by f(x) and the integration sign ∫, respectively.
Example. For any subset A of real numbers,
\[
P(X \in A) = \sum_{x \in A} P(X = x) \quad \text{(discrete)}, \qquad
P(X \in A) = \int_{A} f(x)\, dx \quad \text{(continuous)}.
\]
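A numerical illustration of the continuous case, taking X ∼ exp(1) (see (ii) below) and A = (1, 2), both illustrative choices:

    import math
    from scipy.integrate import quad

    f = lambda x: math.exp(-x)                 # exp(1) density for x > 0
    prob, _ = quad(f, 1.0, 2.0)                # P(X in A) = integral of f over A
    assert abs(prob - (math.exp(-1) - math.exp(-2))) < 1e-10   # equals F(2) - F(1)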
(ii) Exponential distribution, exp(λ) (λ > 0):
\[
f(x) =
\begin{cases}
\lambda e^{-\lambda x}, & x > 0,\\
0, & x \le 0,
\end{cases}
\qquad
F(x) =
\begin{cases}
0, & x \le 0,\\
1 - e^{-\lambda x}, & x > 0.
\end{cases}
\]
Remarks:
– An exponential random variable describes the interarrival time, i.e. the random time
elapsing between unpredictable events (e.g. telephone calls, earthquakes, arrivals of buses
or customers etc.)
– The exponential distribution is memoryless, i.e. if X ∼ exp(λ), then
\[
P(X > s + t \mid X > s) = P(X > t), \qquad s, t \ge 0.
\]
Knowing that the event hasn't occurred in the past s units of time doesn't alter the distribution of the arrival time in the future, i.e. we may assume the process starts afresh at any point of observation.
– The parameter λ is also called the rate (the corresponding scale parameter is 1/λ). The greater λ is, the shorter the interarrival times (the more frequent the arrivals).
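The memoryless property follows directly from the survival function P(X > x) = e^{−λx}, x > 0: for s, t ≥ 0,
\[
P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t).
\]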
(iii) Gamma distribution, Gamma (α, β) (α, β > 0):
\[
f(x) =
\begin{cases}
\dfrac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}, & x > 0,\\[4pt]
0, & x \le 0,
\end{cases}
\]
where Γ(·) denotes the gamma function, $\Gamma(\alpha) = \int_0^{\infty} u^{\alpha-1} e^{-u}\, du$.
Remarks:
– α: shape parameter; β: rate parameter (the corresponding scale is 1/β).
– Gamma (1, β) ≡ exp(β).
(iv) Beta distribution, Beta (α, β) (α, β > 0):
\[
f(x) =
\begin{cases}
\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}, & 0 < x < 1,\\[4pt]
0, & \text{otherwise.}
\end{cases}
\]
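Both densities above can be checked against scipy.stats, bearing in mind that scipy parametrises the gamma family by shape a and scale = 1/β (parameter values below are illustrative):

    import math
    import numpy as np
    from scipy import stats

    beta = 2.0                                       # illustrative rate
    x = np.linspace(0.1, 5.0, 50)

    # Gamma(1, beta) has exactly the exp(beta) density
    assert np.allclose(stats.gamma.pdf(x, a=1.0, scale=1 / beta),
                       stats.expon.pdf(x, scale=1 / beta))

    # Beta(0.8, 1.2): the stated formula matches scipy's density
    a, b = 0.8, 1.2
    t = np.linspace(0.05, 0.95, 19)
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    assert np.allclose(const * t ** (a - 1) * (1 - t) ** (b - 1),
                       stats.beta.pdf(t, a, b))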
(vi) Normal (or Gaussian) distribution, N(µ, σ²):
\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty.
\]
Remarks:
– µ is the mean, and σ 2 is the variance (to be discussed later).
– The pdf f has a bell shape, with centre µ. The bigger σ² is, the more widely spread f is.
– The Central Limit Theorem (CLT) states that in many cases, the average or sum of a large number of independent random variables is approximately normally distributed.
– The Binomial (n, p) random variable is the sum of n independent Bernoulli(p) random variables. Thus we should expect, by the CLT, that Binomial (n, p) is approximately normal for large n. In fact,
\[
\text{Binomial}(n, p) \overset{\text{approx.}}{\sim} N\bigl(np,\, np(1-p)\bigr) \quad \text{for large } n.
\]
– N(0, 1) is known as the standard normal distribution, i.e. the special case of N(µ, σ²) with µ = 0 and σ = 1.
The pdf and cdf of N (0, 1) are usually denoted by φ and Φ, respectively:
\[
\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad \Phi(x) = \int_{-\infty}^{x} \phi(y)\, dy, \qquad -\infty < x < \infty.
\]
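The binomial remark can be checked numerically; a sketch with illustrative n and p (the 0.5 added below is the usual continuity correction):

    from scipy import stats

    n, p = 100, 0.3                       # illustrative values
    exact = stats.binom.cdf(35, n, p)     # P(X <= 35), X ~ Binomial(n, p)
    approx = stats.norm.cdf(35.5, loc=n * p, scale=(n * p * (1 - p)) ** 0.5)
    print(round(exact, 4), round(approx, 4))   # the two values nearly agree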
(viii) Student’s t-distribution with m degrees of freedom, t_m:
\[
t_m \text{ is the distribution of } \frac{Z}{\sqrt{X/m}}, \quad \text{for independent } Z \sim N(0,1) \text{ and } X \sim \chi^2_m.
\]
Remarks:
– t_m is a heavy-tailed version of N(0, 1): t_m approaches N(0, 1) as m → ∞.
– t1 ≡ Cauchy distribution with centre θ = 0.
(ix) F distribution with parameters (m, n), Fm,n :
\[
F_{m,n} \text{ is the distribution of } \frac{X/m}{Y/n}, \quad \text{for independent } X \sim \chi^2_m \text{ and } Y \sim \chi^2_n.
\]
Note: F1,n ≡ (tn )2 , and (Fm,n )−1 ≡ Fn,m .
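Both notes lend themselves to a numerical check; e.g., F_{1,n} ≡ (t_n)² means P(F ≤ x) = P(−√x ≤ T ≤ √x) for T ∼ t_n, as in the sketch below:

    import numpy as np
    from scipy import stats

    n = 10                                     # illustrative degrees of freedom
    x = np.linspace(0.1, 6.0, 30)
    lhs = stats.f.cdf(x, 1, n)                 # F_{1,n} cdf
    rhs = stats.t.cdf(np.sqrt(x), n) - stats.t.cdf(-np.sqrt(x), n)
    assert np.allclose(lhs, rhs)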
The following diagrams display the density and distribution functions of examples of the above
continuous random variables.
[Figure: density functions of U[0, 1], exp(1), Gamma(3, 1), Beta(0.8, 1.2), Cauchy (t₁), N(0, 1), χ² (4 d.f.) and F₅,₁₀.]
[Figure: distribution functions of the same distributions.]
6.4.9 The normal distribution and the normal-related distributions — χ2m , tm , Fm,n — are useful
for statistical inference such as hypothesis testing, confidence interval construction, regression,
analysis of variance (ANOVA), etc.
(b) Show that
\[
P(X < x) = \lim_{n \to \infty} F(x - 1/n).
\]
(c) Deduce from (b) that if F is continuous, then P(X = x) = 0 for all x ∈ R.
(d) Give an example of X for which P(X < x) ≠ F(x) for some x.