
2P7: Probability & Statistics

Continuous Random Variables

Thierry Savin

Lent 2024

The royal flush, the best possible hand in poker, has a probability of 0.000154%
Introduction
Course’s contents

1. Probability Fundamentals

2. Discrete Probability Distributions

3. Continuous Random Variables

4. Manipulating and Combining Distributions

5. Decision, Estimation and Hypothesis Testing

1/26
Introduction
This lecture’s contents

Introduction

Fundamentals of Continuous Random Variables

The Probability Density Function

The Exponential Density

The Gaussian Density

The Beta Density

2/26
Introduction
Continuous Random Variables

In the last lectures:


▶ We have seen how discrete random variables are defined and
described by their probability mass function
▶ We have given important examples of probability mass
functions:
• Bernoulli
• Geometric
• Binomial
• Poisson
▶ We have shown how to characterise probability mass functions
via expectation, variance and other moments.
In this lecture, we will consider random variables with a continuous
support, which are described by their probability density function,
and give a few important examples.

3/26
Fundamentals
Definition of a continuous random variable

▶ We have seen random variables assign a number to each


outcome of the sample space.
▶ Discrete random variables have a discrete set of possible
values.
▶ Continuous random variables will have a continuous set of
values.
▶ The support can be finite (for example: [0, 1], [a, b]) or
infinite (for example: [0, +∞) , (−∞, +∞)) in extent.
Example: spinner wheel
▶ The sample space is a continuous set of outcomes
(orientations of the arrow).
▶ The angle with the horizontal is a continuous
random variable X on a finite set X = [0, 2π).
▶ P[2.68 < X ≤ 2.69] = 0.01/(2π) ≈ 0.0016

▶ P[X = 2.68983285921430891716 . . . ] = 0
4/26
Fundamentals
The CDF of a continuous random variable

▶ In general, P[X = a] = 0 for continuous random variables.


▶ We can still consider events corresponding to intervals,
P[a < X ≤ b], and we have seen
P[a < X ≤ b] = FX (b) − FX (a)
where FX (x) = P[X ≤ x] is the cumulative distribution
function (CDF) of X.
▶ FX (x) is an “informative” probability, even for a continuous
random variable.
Example: spinner wheel

             ⎧ 0        if x ≤ 0,
    FX (x) = ⎨ x/(2π)   if 0 ≤ x < 2π,
             ⎩ 1        if 2π ≤ x.

[Plot: FX (x) rises linearly from 0 at x = 0 to 1 at x = 2π.]
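As a quick sanity check (not in the lecture itself), the spinner-wheel CDF above can be coded directly and used to evaluate the interval probability P[a < X ≤ b] = FX (b) − FX (a); the function name is illustrative only:

```python
import math

def spinner_cdf(x):
    """CDF of the uniform spinner angle X on [0, 2*pi)."""
    if x <= 0:
        return 0.0
    if x < 2 * math.pi:
        return x / (2 * math.pi)
    return 1.0

# P[a < X <= b] = F_X(b) - F_X(a) for an interval of length 0.01
p = spinner_cdf(2.69) - spinner_cdf(2.68)   # = 0.01 / (2*pi)
```

The result is the interval length divided by the full circumference, as expected for a uniform angle.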
5/26
The Probability Density Function
Definition

▶ Formally, we define the probability density function (PDF) as

    fX (x) = dFX (x)/dx

▶ Interpretation:

    fX (x) = lim_{dx→0} [FX (x + dx) − FX (x)]/dx
           = lim_{dx→0} P[x < X ≤ x + dx]/dx   ⇔   fX (x)dx ≈ P[x < X ≤ x + dx]

So fX (x)dx is the probability of X falling within the infinitesimal
interval (x, x + dx].

Example: spinner wheel

    fX (x) = 1/(2π) if x ∈ [0, 2π), 0 otherwise

gives a “good picture” of the uniform distribution of X.

[Plot: fX (x) is constant at 1/(2π) on [0, 2π) and 0 elsewhere.]

Note: we can extend the support to R by setting fX (x) = 0 for x ∉ X.
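The approximation fX (x)dx ≈ P[x < X ≤ x + dx] can be checked numerically for the spinner wheel (an illustrative sketch; the function names are ours, not the course's):

```python
import math

def f(x):
    """PDF of the uniform spinner angle: 1/(2*pi) on [0, 2*pi)."""
    return 1 / (2 * math.pi) if 0 <= x < 2 * math.pi else 0.0

def F(x):
    """Corresponding CDF, clipped to [0, 1]."""
    return min(max(x / (2 * math.pi), 0.0), 1.0)

x, dx = 1.0, 1e-6
approx = f(x) * dx          # f(x) dx
exact = F(x + dx) - F(x)    # P[x < X <= x + dx]
```

For this uniform density the two agree to machine precision, since FX is exactly linear on the support.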
6/26
The Probability Density Function
Properties

▶ Reminder on the properties of FX :
  (1) FX is non-decreasing: FX (a) ≤ FX (b) if a ≤ b
  (2) lim_{x→−∞} FX (x) = 0 and lim_{x→+∞} FX (x) = 1
▶ From (1), the probability density function is positive:
  fX (x) ≥ 0 for all x ∈ R
▶ From fX (x) = F′X (x):

    ∫_a^b fX (x)dx = FX (b) − FX (a) = P[a < X ≤ b]

▶ From (2), the probability density function is normalised:

    ∫_{−∞}^{+∞} fX (x)dx = 1

▶ In general, the sums Σ seen with discrete mass distributions
  become integrals ∫ with density functions.
▶ Note that fX is not a probability. It has the dimension of X⁻¹.
7/26
Joint Probability Density Function
Definitions

▶ For two continuous random variables X and Y, we define the
joint probability density function fXY (x, y) from the joint CDF
FXY (x, y) = P[X ≤ x ∩ Y ≤ y]:

    fXY (x, y) = ∂²FXY (x, y)/∂x∂y

▶ The sum rule becomes an integral rule and marginalisation is
stated as

    ∫_{−∞}^{+∞} fXY (x, y)dy = fX (x)

▶ Conditional probability density function¹ and product rule:

    fX|Y (x|y) = fXY (x, y)/fY (y)   ⇒   fXY (x, y) = fX|Y (x|y)fY (y)

¹ We define the conditional PDF fX|Y (x|y) = ∂FX|Y=y (x|Y = y)/∂x with:

    FX|Y=y (x|Y = y) = lim_{dy→0} P[(X ≤ x) ∩ (y < Y ≤ y + dy)] / P[y < Y ≤ y + dy]
                     = lim_{dy→0} [FXY (x, y + dy) − FXY (x, y)] / [FY (y + dy) − FY (y)]
                     = (1/fY (y)) ∂FXY (x, y)/∂y

⇒ fX|Y (x|y) = fXY (x, y)/fY (y). This is conditional to “Y = y” exactly.
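Marginalisation can be illustrated numerically. The joint density below is an assumed example (two independent exponential variables, so the marginal of X is known in closed form); the rate values and function names are ours:

```python
import math

lam, mu = 1.5, 2.0

def f_xy(x, y):
    """A hypothetical joint density: independent Exp(lam) and Exp(mu)."""
    if x < 0 or y < 0:
        return 0.0
    return lam * mu * math.exp(-lam * x - mu * y)

def marginal_x(x, y_max=20.0, n=20_000):
    """f_X(x) = integral over y of f_XY(x, y), by the trapezoid rule."""
    h = y_max / n
    s = 0.5 * (f_xy(x, 0.0) + f_xy(x, y_max))
    for i in range(1, n):
        s += f_xy(x, i * h)
    return s * h
```

For this joint density the numeric marginal approaches λe^{−λx}, the Exp(λ) density.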
8/26
Joint Probability Density Function
Definitions & Properties

▶ Law of total probability

    fX (x) = ∫_{−∞}^{+∞} fX|Y (x|y)fY (y)dy

▶ Bayes’ rule

    fY|X (y|x) = fX|Y (x|y)fY (y) / ∫_{−∞}^{+∞} fX|Y (x|y)fY (y)dy

▶ Independence

    X and Y independent ⇔ fXY (x, y) = fX (x)fY (y)
                        ⇔ fX|Y (x|y) = fX (x)   for all (x, y) ∈ R×R
                        ⇔ fY|X (y|x) = fY (y)
9/26
Probability Density Function
Expectation and moments of a PDF

▶ The probability density function can be used to compute
expectations:

    E[g(X)] = ∫_{−∞}^{+∞} g(x)fX (x)dx      E[g(X, Y)] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x, y)fXY (x, y)dxdy

▶ In particular, we call
  E[Xⁿ] the nth moment
  E[(X − E[X])ⁿ] the nth central moment
The following moments are important:
  • The mean (or first moment)

    E[X] = ∫_{−∞}^{+∞} x fX (x)dx

  • The variance (or second central moment)

    Var[X] = E[(X − E[X])²] = E[X²] − E[X]²
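These integrals can be evaluated numerically for the spinner wheel of the earlier slides, where the exact answers are E[X] = π and Var[X] = (2π)²/12 = π²/3 (a sketch with a hand-rolled trapezoid rule, not course code):

```python
import math

def f(x):
    """Uniform spinner PDF, 1/(2*pi) on [0, 2*pi)."""
    return 1 / (2 * math.pi)

def integrate(g, a, b, n=10_000):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        s += g(a + i * h)
    return s * h

mean = integrate(lambda x: x * f(x), 0.0, 2 * math.pi)       # -> pi
e_x2 = integrate(lambda x: x * x * f(x), 0.0, 2 * math.pi)   # E[X^2]
var = e_x2 - mean ** 2                                       # -> pi^2 / 3
```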
10/26
Probability Density Function
Other characteristics of a PDF

There are many ways to characterise the distribution of a random
variable X. For example:
▶ The standard deviation is σ = √Var[X]
▶ The mode is the value of x at which fX (x) is maximum
▶ The median is the value Q1/2 of x at which FX (x) = 1/2 (it
splits the area under the PDF into two equal parts):

    ∫_{−∞}^{median} fX (x)dx = 1/2 = ∫_{median}^{+∞} fX (x)dx

▶ The 1st and 3rd quartiles are the values Q1/4 and Q3/4 of x at
which FX (x) = 1/4 and 3/4, respectively
▶ The interquartile range: Q3/4 − Q1/4
▶ The skewness E[(X − E[X])³]/σ³. If the skewness is positive,
the distribution is skewed to the right (the “tail” of the
distribution is longer to the right)
11/26
Probability Density Function
Characteristics of a PDF

[Figure: a right-skewed PDF fX (x), annotated with its mode, mean,
median (2nd quartile), 1st and 3rd quartiles, interquartile range
and standard deviation; here the skewness E[(X − E[X])³]/σ³ > 0.]

12/26
The Exponential Density
Definition

What is the time/distance between two successive successes?

▶ Consider Xt ∼ Pois(λt), the number of successes (or arrivals)
over a time interval t with an average rate of arrivals λ.
▶ We wish to derive the density fT (t) of the time intervals T
between arrivals.
▶ The probability fT (t)dt = P[t < T ≤ t + dt].
▶ The event {t < T ≤ t + dt} means both:
  • No arrivals happen between [0, t]: {Xt = 0}
  • Exactly one arrival happens between [t, t + dt]: {Xdt = 1}
▶ So

    fT (t)dt = P[Xt = 0 ∩ Xdt = 1] = PXt (0) × PXdt (1)
             = (λt)⁰e^{−λt}/0! × (λdt)¹e^{−λdt}/1! = λe^{−λt}e^{−λdt}dt

after simplification and taking dt → 0, fT (t) = λe^{−λt}
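This derivation can be illustrated by simulation (a sketch, not course material): Python's `random.expovariate(lam)` draws directly from the density λe^{−λt}, so the sample mean of simulated inter-arrival times should approach 1/λ and the empirical CDF should approach 1 − e^{−λt}.

```python
import math
import random

random.seed(1)
lam = 2.0   # average arrival rate

# Draw 100 000 inter-arrival times from the density lam * exp(-lam * t)
gaps = [random.expovariate(lam) for _ in range(100_000)]

sample_mean = sum(gaps) / len(gaps)             # should approach 1/lam = 0.5
frac = sum(t <= 0.5 for t in gaps) / len(gaps)  # should approach 1 - exp(-1)
```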

13/26
The Exponential Density
Definition [DB p.28]

A random variable X is said to have an Exponential distribution
with parameter λ > 0 if:

    X ∼ Exp(λ) ⇔ fX (x) = λe^{−λx} if x ≥ 0, 0 otherwise.

The support of X, X = [0, ∞), is continuous infinite.

[Plots: fX (x) and FX (x) for λ = 2, λ = 1 and λ = 0.5.]

Verify that ∫_{−∞}^{+∞} fX (x)dx = 1.
14/26
The Exponential Density
Properties of X ∼ Exp(λ)

▶ Expectation E[X] = 1/λ [DB p.28]

    E[X] = ∫_0^∞ xλe^{−λx} dx = [−xe^{−λx}]_0^∞ + ∫_0^∞ e^{−λx} dx = [−(1/λ)e^{−λx}]_0^∞ = 1/λ

▶ Variance Var[X] = 1/λ² [DB p.28]

    E[X²] = ∫_0^∞ x²λe^{−λx} dx = [−x²e^{−λx}]_0^∞ + ∫_0^∞ 2xe^{−λx} dx = 2/λ²

▶ Mode xmax = 0
  Obvious from the curve. . .
▶ Median Q1/2 = ln 2/λ
  See next
▶ Quartile Qp = −ln(1 − p)/λ, Q1/4 = ln(4/3)/λ, Q3/4 = ln 4/λ

    FX (x) = ∫_0^x fX (ξ)dξ = 1 − e^{−λx}  so  FX (Qp ) = p ⇔ 1 − e^{−λQp} = p

▶ Skewness 2 > 0 (strongly right-tailed)
  Tedious but not difficult
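The quantile formula Qp = −ln(1 − p)/λ can be verified by substituting it back into the CDF (a sketch; λ = 1.3 is an arbitrary choice of ours):

```python
import math

lam = 1.3

def F(x):
    """CDF of Exp(lam): 1 - exp(-lam * x) for x >= 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def Q(p):
    """Quantile from inverting F: Q_p = -ln(1 - p) / lam."""
    return -math.log(1 - p) / lam
```

By construction F(Q(p)) = p for every p in (0, 1), and Q(1/2) recovers the median ln 2/λ.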
15/26
The Exponential Density
Examples

Here are a few instances where the exponential density occurs:


▶ Business engineering: amount of money spent in one trip to
the supermarket; amount of time a clerk spends with their
customer;
▶ Reliability engineering: amount of time a product lasts;
▶ Earth sciences: time between earthquakes, geyser eruptions,
. . . ; climatology;
▶ Physics: time for a radioactive particle to decay; barometric
formula (how the density of air changes with altitude);
The exponential distribution describes the time for a continuous
process to change state; the geometric distribution describes the
number of trials necessary for a discrete process to change state.

16/26
The Gaussian Density
Definition [DB p.28]

A random variable X is said to have a Gaussian² (or Normal)
distribution with mean µ and variance σ² if:

    X ∼ N (µ, σ²) ⇔ fX (x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}   for all x ∈ R

The support of X, X = R, is continuous infinite.

[Plots: fX (x) and FX (x) for (µ, σ²) = (−1, 1/2), (0, 1) and (2, 2).]

Verify that ∫_{−∞}^{+∞} fX (x)dx = 1; hint: calculate
(∫_{−∞}^{+∞} e^{−x²} dx)² = ∫_{−∞}^{+∞}∫_{−∞}^{+∞} e^{−(x²+y²)} dxdy in polar coordinates.

² named after the German mathematician Carl Friedrich Gauss (1777–1855)
17/26
The Gaussian Density
Cumulative distribution

▶ N (0, 1) is called the standard Gaussian distribution. We will
show in the next lecture that:

    Y ∼ N (0, 1) ⇔ X = µ + σY ∼ N (µ, σ²)   [DB p.29]

▶ The cumulative distribution function of Y ∼ N (0, 1) is:

    FY (y) = Φ(y) = (1/√(2π)) ∫_{−∞}^{y} e^{−ξ²/2} dξ   [DB p.29]

  Φ(y) is tabulated p.29 of the Maths Data Book. By
  symmetry, you can verify Φ(−y) = 1 − Φ(y) and Φ(0) = 1/2.
▶ The CDF of X ∼ N (µ, σ²) is Φ((x − µ)/σ).
▶ Most computing environments (Python, MATLAB. . . ) have
an “error function” called erf. Be cautious that
Φ(y) = (1/2)(1 + erf(y/√2)).
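In Python, `math.erf` gives exactly this error function, so Φ and the general Gaussian CDF take two lines (a sketch; the function names are ours):

```python
import math

def Phi(y):
    """Standard Gaussian CDF: Phi(y) = (1 + erf(y / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))

def gaussian_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the standardisation Phi((x - mu) / sigma)."""
    return Phi((x - mu) / sigma)
```

The symmetry Φ(−y) = 1 − Φ(y) and Φ(0) = 1/2 follow immediately from erf being odd.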
 

18/26
The Gaussian Density
Properties of X ∼ N (µ, σ 2 )

In the following, we write X = µ + σY with Y ∼ N (0, 1)

▶ Expectation E[X] = µ [DB p.28]

    E[Y] = (1/√(2π)) ∫_{−∞}^{+∞} y e^{−y²/2} dy = 0 (integrand is odd),  E[X] = σE[Y] + µ

▶ Variance Var[X] = σ² [DB p.28]

    E[Y²] = (1/√(2π)) ∫_{−∞}^{+∞} y² e^{−y²/2} dy = (1/√(2π)) ([−y e^{−y²/2}]_{−∞}^{+∞} + ∫_{−∞}^{+∞} e^{−y²/2} dy) = 1,
    E[X²] = σ²E[Y²] + 2σµE[Y] + µ²

▶ Mode xmax = µ
  Obvious from the curve. . .
▶ Median Q1/2 = µ
  See next
▶ Quartile Qp = µ + σΦ⁻¹(p)
  Quartile of Y is Φ⁻¹(p)
  Φ⁻¹(1/2) = 0 and Φ⁻¹(3/4) = −Φ⁻¹(1/4) ≈ 0.6745
▶ Skewness 0
  By symmetry
19/26
The Gaussian Density
Properties of X ∼ N (µ, σ 2 )

▶ Confidence interval: P[|X − µ| ≤ mσ] = 2Φ(m) − 1

P[|X − µ| ≤ mσ] = P[|Y| ≤ m]


= P[−m ≤ Y ≤ m]
= Φ(m) − Φ(−m)
= 2Φ(m) − 1
How likely is X within m standard deviations of the mean?
• We have 2Φ(1) − 1 ≈ 68% confidence that |X − µ| ≤ σ
• We have 2Φ(2) − 1 ≈ 95% confidence that |X − µ| ≤ 2σ
▶ Familiarise yourself with the lookup table for Φ in the Data
Book p. 29.
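Beyond the lookup table, the confidence levels quoted above can be computed directly from the erf-based Φ (a sketch reusing the standard identity Φ(y) = (1 + erf(y/√2))/2):

```python
import math

def Phi(y):
    """Standard Gaussian CDF via erf."""
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))

# P[|X - mu| <= m * sigma] = 2 * Phi(m) - 1, independent of mu and sigma
one_sigma = 2 * Phi(1) - 1    # ~ 0.6827
two_sigma = 2 * Phi(2) - 1    # ~ 0.9545
three_sigma = 2 * Phi(3) - 1  # ~ 0.9973
```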

20/26
The Gaussian Density
Examples

The Gaussian density is very common. It occurs in:


▶ Sciences: measurement errors in experiments are often
modelled by a normal distribution (we will see why!);
▶ Physics: position of a diffusing particle; ground state in a
quantum harmonic oscillator; thermal radiation;
▶ Education: standardised tests, exam results.
The Log-normal distribution (where the logarithm of a variable has
a Gaussian density) is commonly seen in:
▶ Biology: physiological measurement (blood pressure, size and
weight, length of hair. . . );
▶ Chemistry: particle size distribution, concentration of rare
elements in minerals. . .
▶ Computing: file sizes, length of emails. . .
▶ Finance: exchange rates, price indices, stock market indices. . .
▶ Social Sciences: demographics, scientometrics. . .
21/26
The Beta Density
Definition

▶ Suppose we observe that k out of n Bernoulli trials are
successes. What is the probability density of the Bernoulli
parameter p given this observation?
▶ From the Binomial distribution, Pk|p (k|p) = ⁿCk p^k (1 − p)^{n−k}
▶ Using Bayes’ rule:

    fp|k (p|k) = Pk|p (k|p)fp (p) / ∫_0^1 Pk|p (k|p)fp (p)dp

▶ We assume that prior to any observation, all values of
p ∈ [0, 1] are believed to be equally likely, fp (p) = 1
▶ After some calculations we find

    fp|k (p|k) = ((n + 1)!/(k!(n − k)!)) p^k (1 − p)^{n−k}
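That the normalising constant is (n + 1)!/(k!(n − k)!) can be checked numerically: for an assumed observation (ours, for illustration) of k = 3 successes in n = 10 trials, the posterior should integrate to 1 over [0, 1].

```python
import math

n, k = 10, 3   # assumed observation: 3 successes in 10 trials

def posterior(p):
    """f_{p|k}(p|k) = (n+1)! / (k! (n-k)!) * p^k * (1-p)^(n-k)."""
    c = math.factorial(n + 1) / (math.factorial(k) * math.factorial(n - k))
    return c * p ** k * (1 - p) ** (n - k)

# trapezoid-rule check that the posterior integrates to 1 over [0, 1]
m = 10_000
h = 1.0 / m
total = h * (0.5 * (posterior(0.0) + posterior(1.0))
             + sum(posterior(i * h) for i in range(1, m)))
```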
22/26
The Beta Density
Definition [DB p.28]

A random variable X is said to have a Beta distribution with
shape parameters α > 0 and β > 0 if:

    X ∼ Beta(α, β) ⇔ fX (x) = (Γ(α+β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1} if x ∈ [0, 1], 0 otherwise.

where the Gamma function is defined Γ(a) = ∫_0^∞ ξ^{a−1}e^{−ξ} dξ (with
a > 0).
The support of X, X = [0, 1], is continuous finite.
▶ The Gamma function is a generalisation of the factorial to
non-integers
▶ It has the property Γ(a) = (a − 1)! when a is an integer.³

³ From the previous slide, verify fp|k = Beta(α, β), the probability density of p
after the observation of k = α − 1 successes and n − k = β − 1 fails.
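Python exposes the Gamma function as `math.gamma`, which makes the factorial property easy to verify (a quick sketch, not course code):

```python
import math

# math.gamma implements the Gamma function; for integer a, Gamma(a) = (a - 1)!
for a in range(1, 8):
    assert abs(math.gamma(a) - math.factorial(a - 1)) < 1e-9

# it also interpolates between integers, e.g. Gamma(1/2) = sqrt(pi)
half = math.gamma(0.5)
```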
23/26
The Beta Density
Properties of X ∼ Beta(α, β)

[Plots: fX (x) and FX (x) for (α, β) = (1, 1), (2, 5), (2, 2) and (3, 2).]

▶ Expectation E[X] = α/(α+β) [DB p.28]

▶ Variance Var[X] = αβ/((α+β)²(α+β+1)) [DB p.28]

No need to know these (but here for completeness):

▶ Mode xmax = (α−1)/(α+β−2) for α, β > 1
▶ Median no closed-form expression. . .
▶ Quartile no closed-form expression. . .
▶ Skewness 2(β−α)√(α+β+1) / ((α+β+2)√(αβ)) (tail’s side depends on the sign of β − α)
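The expectation formula can be cross-checked by numerically integrating x fX (x) over [0, 1] for one of the plotted parameter pairs (a sketch using `math.gamma` for the normalising constant; (α, β) = (2, 5) is our arbitrary pick):

```python
import math

alpha, beta = 2.0, 5.0

def beta_pdf(x):
    """Beta(alpha, beta) density, normalised with math.gamma."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    c = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return c * x ** (alpha - 1) * (1 - x) ** (beta - 1)

# numeric mean (trapezoid rule; both endpoint terms vanish here)
m = 20_000
h = 1.0 / m
mean = h * sum((i * h) * beta_pdf(i * h) for i in range(1, m))
closed = alpha / (alpha + beta)   # 2/7
```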
24/26
The Beta Density
Examples

Here are some real-life examples of where the Beta Distribution


can be observed:
▶ Quality Control: proportion of defective items in a production
process;
▶ Sports Analytics: winning probability of sports teams;
▶ Environmental Modelling: probability distribution of
precipitation;
▶ Medicine: prevalence of a disease in a population, success rate
of a treatment;
▶ Finance: range of returns on investment portfolios or assets.

25/26
Additional Remarks

Two additional remarks:

▶ It is possible to define a probability density function for a
discrete random variable using the delta function!
Consider a discrete random variable X with probability mass
function PX and support X, then:

    fX (x) = Σ_{k∈X} PX (k)δ(x − k)

is the probability density function of X.

▶ It is possible to define conditional expectations:

    E[X|Y = y] = ∫_{−∞}^{+∞} x fX|Y (x|y)dx   (it is a function of y)

26/26
You can attempt all problems in Examples Paper 5
