SMSTC (2022/23)
Foundations of Probability
Chapter 3: Random variables and their laws I
The Probability Team [a]

www.smstc.ac.uk

Contents

3.1 Random variables and their laws
    3.1.1 Laws
    3.1.2 Law of a discrete random variable
    3.1.3 Uniform random variables and continuous random variables
3.2 Distribution functions
    3.2.1 Types of distribution function
3.3 Transformation rules and densities
3.4 Expectation
    3.4.1 Definition of expectation
    3.4.2 Properties of expectation

3.1 Random variables and their laws


3.1.1 Laws
Recall from Chapter 1 that a (real-valued) random variable is a measurable function X
from a probability space (Ω, F, P) into (R, B) (the set R of real numbers endowed with the Borel
σ-algebra B).
More generally, we can consider a random variable, or random element, X as a measurable
function from an “abstract” probability space (Ω, F, P) into a “concrete”, or “observation” mea-
surable space (S, S). Examples of random variables in this more general sense are real-valued
random variables ((S, S) = (R, B)), random vectors ((S, S) = (R^d, B(R^d))), random sequences,
and random functions.
The random variable X then induces a probability measure PX on (S, S), given by

PX (B) = P(X ∈ B), B ∈ S. (3.1)

The probability measure PX depends on both P and X. It describes the probabilities that X
takes different values, or ranges of values, in the observation space S, and is referred to as the
law, or distribution, of the random variable X. [b]

[a] [email protected]
[b] For (quite) good reasons the terms law and distribution are used interchangeably throughout these notes.


3.1.2 Law of a discrete random variable


A discrete random variable X : (Ω, F, P) → (S, S) is, by definition, one that takes countably
many values. In other words, the random variable X is discrete if and only if there is a countable
set D ∈ S such that X(ω) ∈ D for all ω ∈ Ω. We also assume that D and all its subsets are
members of S. Hence the law PX of X is a probability measure on D. We know that a probability
on a countable set D can be defined by defining its values on singletons (the individual elements).
These values form the so-called probability (mass) function. Thus, the probability function
is given by
p(x) = PX {x} = P(X = x), x ∈ D.
Clearly, if B ⊆ D then

PX(B) = PX(⋃_{x∈B} {x}) = Σ_{x∈B} p(x).

So p is sufficient for computing PX .
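
As a quick illustration (a minimal Python sketch, not part of the original notes; the truncation to 50 values is an arbitrary choice), the law of a discrete random variable can be computed directly from its probability function by summing over the set of interest:

```python
# Probability function p(n) = 2^-n on D = {1, ..., 50} (a truncation of the
# law appearing in Example 3.1 below); the law of any B ⊆ D is a sum of p.
p = {n: 2.0**-n for n in range(1, 51)}

def law(B):
    """P_X(B) = sum of p(x) over x in B, for B a subset of D."""
    return sum(p[x] for x in B if x in p)

print(law({n for n in p if n % 2 == 0}))   # P(X even), close to 1/3
print(law(set(p)))                          # close to 1 (error about 2^-50)
```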

Exercise 3.1. Suppose that the random variable X takes n distinct values (i.e. X(Ω) is a set
with n elements). Show that σ(X) has 2^n elements and describe (give a procedure for describing)
them.

A discrete random variable is called simple if it takes only finitely many values. Any random
variable can be approximated by (a sequence of) simple random variables. More precisely, the
following statement holds.

Lemma 3.1. Let X : (Ω, F, P) → (R, B). Then there exists a sequence X1 , X2 , . . . of simple
random variables such that limn→∞ Xn (ω) = X(ω) for all ω ∈ Ω. If X(ω) ≥ 0 for all ω ∈ Ω,
then we can choose the sequence so that 0 ≤ Xn (ω) ≤ Xn+1 (ω) for each n and ω.

Consider for simplicity a non-negative X. Then the result of the lemma is a consequence of
the following construction: for any n, divide the interval [0, n) into n·2^n disjoint intervals
[k·2^{-n}, (k+1)·2^{-n}), k = 0, ..., n·2^n − 1, of equal length 2^{-n}. Then let Xn = k·2^{-n} if X ∈
[k·2^{-n}, (k+1)·2^{-n}) and Xn = n if X ≥ n. Clearly, Xn is a simple random variable for any n.
Further, for any ω, if X(ω) = x, then Xn(ω) = xn is an increasing sequence and 0 ≤ x − xn ≤ 2^{-n}
for all n ≥ x.
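
This dyadic construction is easy to carry out numerically. A minimal sketch (not from the notes; the exponential sample is just for illustration):

```python
import math, random

def simple_approx(x, n):
    """n-th dyadic approximation of Lemma 3.1 for a value x >= 0:
    X_n = k*2^-n when x lies in [k*2^-n, (k+1)*2^-n), and X_n = n when x >= n."""
    if x >= n:
        return float(n)
    return math.floor(x * 2**n) / 2**n

x = -math.log(1.0 - random.random())   # a draw from an exponential law
for n in (1, 2, 4, 8, 16):
    xn = simple_approx(x, n)
    print(n, xn, x - xn)               # error lies in [0, 2^-n] once n >= x
```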
Another method of approximating an arbitrary random variable by discrete random variables, with
lattice, or arithmetic, distributions [c], is also of use. Divide the whole real line into a countable
number of disjoint intervals [k·2^{-n}, (k+1)·2^{-n}), k = ..., −2, −1, 0, 1, 2, ..., of equal length 2^{-n},
and, for any k, let Xn = k·2^{-n} if X ∈ [k·2^{-n}, (k+1)·2^{-n}). Then again Xn ↑ X and, for any n,

P(X − Xn ≤ 2^{-n}) = 1.

[c] A random variable X has a lattice, or arithmetic, distribution with span h > 0 if, first, X may take only values which are multiples of h, i.e. Σ_{n=−∞}^{∞} P(X = nh) = 1, and, secondly, h is the minimal positive number for which this property holds.

3.1.3 Uniform random variables and continuous random variables


Recall from Chapter 1 that a uniform random variable U taking values in the interval [0, 1]
(endowed with the Borel σ-algebra) is such that its law satisfies

PU ([a, b]) = b − a, 0 ≤ a ≤ b ≤ 1.

A standard approach to the construction of such a random variable is to take the probability
space (Ω, F, P) to be given by Ω = [0, 1], F = B (the Borel σ-algebra on [0, 1]), and P equal to
Lebesgue measure. Having shown that the latter exists (see Chapter 1), we simply define U to
be the identity function, given by U (x) = x for all x ∈ [0, 1], and observe that its law PU is also
Lebesgue measure, i.e. PU is the law of a uniform random variable as required. Notice that if D
is a countable set in [0, 1], then
P(U ∈ D) = 0.
For example, P(U is a rational number) = 0.
Generally, a random variable X is defined to be continuous if

P(X = x) = 0 for all x.

Then, by the axiomatic property P3 of Chapter 1, for any countable set D,

P(X ∈ D) = 0.

Hence, in particular, a uniform random variable is continuous.

3.2 Distribution functions


We now consider real-valued random variables defined on a probability space (Ω, F, P).
Recall that, for any such random variable X, the probability measure PX on the measurable
space (R, B) (where B is the Borel σ-algebra on R) is the law of X and is given by (3.1), which
here becomes
PX (B) = P(X ∈ B), B ∈ B. (3.2)
The distribution function of a real-valued random variable X is defined by

F (x) := PX (−∞, x] = P(X ≤ x), x ∈ R. (3.3)

Note that the distribution function F depends only on the law PX of X, i.e. if two random
variables have the same law, they have the same distribution function. Further, as we shall see
below, the distribution function determines the corresponding law uniquely.

Lemma 3.2. Any distribution function F has the following properties.


(i) x1 < x2 ⇒ F (x1 ) ≤ F (x2 ),
(ii) limx→−∞ F (x) = 0,
(iii) limx→+∞ F (x) = 1,
(iv) limn→∞ F (x + 1/n) = F (x).

Here (i) is the monotonicity property; the properties (ii), (iii) and (iv) are direct consequences
of the continuity property of probability (see the properties P9 and P10 of Chapter 1); further
(iv) is frequently called the right continuity property of distribution functions. To establish
these properties is an easy exercise. For example, to show (ii) take any sequence xn ↓ −∞ and
consider the events An := {X ≤ xn} = {X ∈ (−∞, xn]}. Then the events An are decreasing in
n and ∩_n An = ∅. Therefore P(An) = F(xn) → 0 as n → ∞, by the continuity property P10 of
Chapter 1.

Exercise 3.2. Verify the remaining properties given by Lemma 3.2.

Exercise 3.3. Let X be a random variable with law PX and distribution function F (x) =
PX (−∞, x] = P(X ≤ x) for all x ∈ R. Carefully justify the following important formulae.
(i) P(X ∈ (a, b]) = F (b) − F (a),

(ii) P(X ∈ (a, b)) = F (b−) − F (a),


(iii) P(X ∈ [a, b]) = F (b) − F (a−),
(iv) P(X = a) = F (a) − F (a−).

A consequence of the following lemma is that the properties (i)-(iv) of Lemma 3.2 characterise
the class of distribution functions of real-valued random variables.
Lemma 3.3. Suppose that a function F on R satisfies the properties (i)-(iv) of Lemma 3.2. Then
there exists a random variable X having F as its distribution function. Further, F determines
the law PX of X uniquely.

Proof The standard proof is as follows. Given a function F satisfying (i)-(iv) of Lemma 3.2,
we show first that there exists a unique probability measure, i.e. a unique law, P on (R, B) such
that
P(−∞, x] = F (x), x ∈ R. (3.4)
The above relation clearly defines P uniquely on general intervals as in Exercise 3.3, and so also
on the algebra each of whose sets is a finite union of disjoint intervals (where the probability
assigned by P to such a set is the sum of the probabilities assigned to the individual intervals).
It is now readily verified from the properties of F that P thus defined satisfies the conditions
of the Extension Theorem (Chapter 1), and so P may now be extended to a unique probability
measure on (R, B) such that (3.4) holds as required.
To show that there exists a random variable X which has P as its law, we take (Ω, F) = (R, B)
and take the random variable X to be the identity function X(ω) = ω for all ω ∈ R. This has law
PX = P as required, and so has distribution function F. □
The above construction of a probability space and random variable having a given law, or dis-
tribution, is sometimes referred to as the canonical construction: the “abstract” probability
space on which the random variable is defined and the “concrete”, or “observation”, space in
which it takes its values are the same, the random variable is the identity function, and the law
of the random variable coincides with the underlying probability measure. However, we remark
that in general much of the power of probability theory comes from our ability to be entirely
flexible in the choice of probability space.
The following alternative construction of a random variable having a given distribution function
is to some extent illustrative of the last remark above. Again given a function F satisfying
(i)-(iv) of Lemma 3.2, define its generalised inverse F^{-1} : [0, 1] → R by

F^{-1}(t) = inf{x : F(x) ≥ t}. (3.5)

The function F^{-1} is measurable with respect to the Borel σ-algebras on R and [0, 1] (this follows
since it is monotone non-decreasing—see the Exercises of Chapter 1). Now let a random variable
U have a uniform distribution on (0, 1) and define X = F^{-1}(U). It is a useful exercise (see below)
to show that X has distribution function F . This latter construction forms the basis of one of
the most useful algorithms for the simulation of random variables—see Chapter 15 for more
details.
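
As an illustration of the construction X = F^{-1}(U) (a sketch, not from the notes, taking F(x) = 1 − e^{-x}, for which the generalised inverse is the usual inverse and is available in closed form):

```python
import math, random

def F_inv(t):
    """Inverse of F(x) = 1 - exp(-x) on (0, 1)."""
    return -math.log(1.0 - t)

samples = [F_inv(random.random()) for _ in range(100_000)]
# The empirical distribution function should approximate F:
for x in (0.5, 1.0, 2.0):
    emp = sum(s <= x for s in samples) / len(samples)
    print(x, emp, 1.0 - math.exp(-x))
```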
Exercise 3.4. Suppose that a distribution function F is continuous and strictly increasing.
Then the generalised inverse F^{-1} defined by (3.5) above is just the usual inverse function. Show
that in this case, the random variable X = F^{-1}(U) does indeed have distribution function F.
Extend the result to the general case (hint: draw a picture).

3.2.1 Types of distribution function


We now discuss the various kinds of distribution function on R.

Discrete distribution functions

A distribution function F is discrete if it is the distribution function of a discrete random
variable X taking values in some countable subset S of R, i.e. such that P(X ∈ S) = 1. Then
we may define the associated probability function

p(s) := P(X = s) = F(s) − F(s−), s ∈ S. (3.6)
Thus, for s ∈ S (provided p(s) > 0), the distribution function F (which is right continuous)
jumps at s, the size of the jump being p(s); in particular F(s−) = Σ_{s′∈S, s′<s} p(s′), while
F(s) = Σ_{s′∈S, s′≤s} p(s′). Further, if (a, b) is any open interval containing no points of S, then
F is constant on (a, b). This follows since, if a < x < b, then F(x) − F(a) = P(a < X ≤ x) =
Σ_{s∈S, a<s≤x} P(X = s) = 0.
Example 3.1. Let X be a random variable such that P(X = n) = 2^{-n}, n ∈ N (where
N = {1, 2, 3, ...} is the set of natural numbers). Then the distribution function of X is as shown in
Figure 3.1, i.e. is a step function.

Figure 3.1: Distribution function of a discrete random variable
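
For concreteness, the step function of Example 3.1 can be written in closed form, since Σ_{n≤x} 2^{-n} = 1 − 2^{-⌊x⌋} for x ≥ 1. A small sketch (not part of the original notes):

```python
import math

def F(x):
    """Distribution function of Example 3.1 (P(X = n) = 2^-n, n = 1, 2, ...):
    F(x) = 0 for x < 1 and 1 - 2^-floor(x) for x >= 1, a right-continuous step."""
    return 0.0 if x < 1 else 1.0 - 2.0**(-math.floor(x))

print([F(x) for x in (0.5, 1.0, 1.5, 2.0, 3.7)])   # jump of size 2^-n at each n
```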

Example 3.2. Let X be a random variable such that for every rational number of the form m/n,
where m, n are positive integers with no common factors, we have P(X = m/n) = c·2^{-(m+n)},
where c is chosen so that P(X ∈ Q) = 1. Its distribution function is discrete because the set Q
of rational numbers is countable. Unfortunately, we can't draw it. (There are no intervals
(a, b) containing no rational points.)

Continuous distribution functions

A distribution function F is continuous if it is a continuous function, i.e. if F(x) − F(x−) = 0
for all x ∈ R. In other words, a continuous distribution function is the distribution function of
a continuous random variable.
Example 3.3. The distribution function of a uniform random variable in [0, 1] is given by

F(x) = 0 for x < 0,  F(x) = x for x ∈ [0, 1],  F(x) = 1 for x > 1,

and hence is continuous. It is illustrated in Figure 3.2.


Figure 3.2: An (absolutely) continuous distribution function

Except at the points 0, 1, the function F is also differentiable, with derivative f(u) = 1 if 0 <
u < 1 and 0 otherwise. If we arbitrarily define f(0) = f(1) = 0, we also have ∫_{−∞}^{u} f(t) dt = F(u)
for all u ∈ R.

Absolutely continuous distribution functions

A (necessarily continuous) distribution function F is absolutely continuous if there exists a
function f (referred to as the density of F) such that ∫_{−∞}^{∞} f(x) dx = 1 and

F(x) = ∫_{−∞}^{x} f(t) dt,  x ∈ R.

Then, clearly, if X has distribution function F, for any a ≤ b,

P(X ∈ (a, b]) = F(b) − F(a) = ∫_a^b f(t) dt.

The density f is not uniquely defined. For instance, it can be changed on a finite (or countable)
set and such a change will not affect the integral above. Usually, one [d] imposes additional
regularity conditions, such as continuity, resulting in uniqueness. Note that the uniform distribution
is absolutely continuous.

[d] unconsciously
Note also that integrals arising in probability theory of the form ∫_a^b h(x) dx (where we allow the
possibilities a = −∞ and b = ∞) are Lebesgue integrals. However, provided h is measurable
and bounded with a set of discontinuities of Lebesgue measure zero, such integrals coincide
with their standard Riemann counterparts of elementary calculus—see [3]. This situation is
easily adapted to cover all practical applications.

Singularly continuous distribution functions

Unfortunately, not all continuous distribution functions are absolutely continuous. There also
exist so-called singularly continuous distribution functions. We do not provide a formal
definition of such functions. Instead we simply remark that they have very strange properties
and give one example of such a function.
Example 3.4. Consider the space (Ω = {0, 1}^N, F, P), where F is the product σ-algebra, and
P is such that P({ω ∈ Ω : ω1 = i1, ..., ωn = in}) = 2^{-n}, i1, ..., in ∈ {0, 1}, n ∈ N. Let

V(ω) := Σ_{n=1}^{∞} 2ωn / 3^n.

One can show that the random variable V defined in Example 3.4 has a continuous but not
absolutely continuous distribution function. This is illustrated in Figure 3.3.

Figure 3.3: A continuous distribution function without density

In particular, the derivative of this function exists at almost all points (i.e. at all points except
for those in a set of Lebesgue measure zero) and—what is really surprising—at each such point
the value of the derivative is zero!
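
One may simulate V directly from its definition; a sketch (not from the notes, with an arbitrary truncation depth):

```python
import random

def sample_V(depth=40):
    """One draw of V = sum over n >= 1 of 2*w_n / 3^n, with w_n i.i.d. fair bits
    (series truncated after `depth` terms)."""
    return sum(2 * random.randint(0, 1) / 3**n for n in range(1, depth + 1))

draws = sorted(sample_V() for _ in range(10))
print(draws)   # no value lands in (1/3, 2/3): w_1 = 0 forces V <= 1/3, w_1 = 1 forces V >= 2/3
```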

General distribution functions

Suppose that F, G are distribution functions. Then, for any λ ∈ (0, 1), the function λF + (1−λ)G
is a distribution function. (Probabilistically, if X, Y are random variables with distribution
functions F, G, respectively, then we can define a new random variable Z which, independently
of X and Y, takes the same value as X with probability λ or the same value as Y with probability
1 − λ.) Thus, if F is discrete and G continuous, then λF + (1−λ)G is neither discrete nor
continuous: it is mixed. The question now is: can we describe all distribution functions as
mixtures of the three types (discrete, absolutely continuous, singularly continuous) mentioned
above? The answer, formalised in the following remark, is yes.
Remark 3.1. Any distribution function F has a unique decomposition

F = λd Fd + λac Fac + λsc Fsc

where Fd , Fac , Fsc are discrete, absolutely continuous, and singularly continuous distribution
functions, respectively, and where the coefficients λd , λac , λsc are nonnegative and such that
λd + λac + λsc = 1.
The last two terms of the above decomposition are known as the continuous part of F, and the
first term is the discrete part of F. We will not prove this result, but refer, e.g., to [2]. The
intuition behind this result is clear. First, if the distribution function F has jumps, it may have
at most a countable number of them, with a total jump size, say, λd. Consider first a random
variable which is discrete and takes values where these jumps occur, with the same probabilities
normalised by λd^{-1}; then its distribution function is Fd. From the remainder, we can further
subtract an absolutely continuous component—but this is a more complicated story.
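
The probabilistic description of a mixture given above translates directly into a sampling recipe. A minimal sketch (not from the notes; the atom at 0 is an arbitrary illustration):

```python
import random

def sample_mixture(sample_X, sample_Y, lam):
    """Draw from lam*F + (1 - lam)*G: with probability lam return a fresh copy
    of X, otherwise a fresh copy of Y, independently of everything else."""
    return sample_X() if random.random() < lam else sample_Y()

# A mixed law: discrete part (an atom at 0) with weight 0.3, absolutely
# continuous (uniform) part with weight 0.7.
draws = [sample_mixture(lambda: 0.0, random.random, 0.3) for _ in range(100_000)]
print(sum(d == 0.0 for d in draws) / len(draws))   # close to 0.3, the weight of the atom
```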

Differentiation: a word of caution. The subject of densities involves the concept of derivatives
of functions that are not necessarily everywhere differentiable. Recall that a distribution
function G has a density g if g is a non-negative function such that ∫_{−∞}^{∞} g(x) dx = 1 and
G(x) = ∫_{−∞}^{x} g(t) dt for all x. One can deduce from this that the function G is differentiable
almost everywhere (i.e. for all x except in a set of Lebesgue measure zero) with G′(x) = g(x).
Note that we may change the values of g(x) at a finite or countably infinite number of points—
then G is unchanged. In a more sophisticated manner, we may change the values of g(x) on a
set of Lebesgue measure zero—again, with no effect on G. If g is a continuous function, then
g(x) = G′(x) for all x.

3.3 Transformation rules and densities


Consider a random variable X on a probability space (Ω, F, P). Let ϕ be a real-valued mea-
surable function. Then, as we already know, Y = ϕ(X) is also a random variable, and we are
transforming X into Y using a function (or transformation) ϕ. One of the basic questions is:
assume that we know the law PX of X and the function ϕ, then how may we find the law PY
of Y ? This is not an easy question, and here we provide answers only in two particular cases,
(a) for a discrete random variable X and general function ϕ, and
(b) for an absolutely continuous random variable X and differentiable function ϕ with strictly
positive derivative.
In the case of a discrete random variable X, suppose that X takes each possible value an
with probability pn, where Σ_{n∈N} pn = 1 and N is either a finite or a countably infinite set.
Consider the set B = {ϕ(an), n ∈ N}. Clearly, the set B is also countable (since it cannot have
more elements than does N); denote its distinct elements by b1, b2, .... The random variable
Y = ϕ(X) takes values only in B, and, for any bm, we have Y = bm if and only if X = an for
some n such that ϕ(an) = bm. Now let qm = P(Y = bm). Then

qm = Σ_{n : ϕ(an) = bm} pn

and Σ_m qm = 1.
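
Computationally this is just a regrouping of the probability function; a minimal sketch (not from the notes):

```python
from collections import defaultdict

def pushforward(p, phi):
    """Law of Y = phi(X) for discrete X with probability function p (a dict):
    q(b) = sum of p(a) over all a with phi(a) = b."""
    q = defaultdict(float)
    for a, pa in p.items():
        q[phi(a)] += pa
    return dict(q)

p = {n: 1/6 for n in range(1, 7)}          # a fair die, for illustration
print(pushforward(p, lambda x: x % 2))     # law of the parity of the roll
```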

Exercise 3.5. Let X be the outcome of rolling a fair die, and let Y = 1 if X divides 3 and
Y = 2 otherwise. Find the law (distribution) of Y .

In the case of an absolutely continuous random variable, the story is different: a merely one-to-
one function ϕ can simultaneously change both the values of the transformed random variable
and its law in a quite general fashion.

Theorem 3.1. Let X be an absolutely continuous real-valued random variable with density f.
Let ϕ : R → R be a differentiable function with strictly positive derivative and let ψ be its inverse
function. Then ϕ(X) is a random variable with absolutely continuous distribution function and
density given by

f(ψ(s))ψ′(s) for all s ∈ ϕ(R),

and 0 elsewhere.

Proof Since ϕ has a strictly positive derivative, it is strictly increasing and its inverse function
exists with domain ϕ(R). Then, the distribution function of ϕ(X) is, for any t ∈ ϕ(R),

P(ϕ(X) ≤ t) = P(X ≤ ψ(t)) = ∫_{−∞}^{ψ(t)} f(x) dx.

By changing variable in the integral we have

∫_{−∞}^{ψ(t)} f(x) dx = ∫_{−∞}^{t} f(ψ(s))ψ′(s) ds,

where we set ψ′(s) = 0 for s ∉ ϕ(R). From the definition of an absolutely continuous distribution
function we see that, indeed, ϕ(X) has an absolutely continuous distribution function and that
its density is the function inside the last integral. □
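
A numerical sanity check of Theorem 3.1 (a sketch, not from the notes): take X uniform on (0, 1) and ϕ(x) = x^2, so ψ(s) = √s and the predicted density is f(ψ(s))ψ′(s) = 1/(2√s) on (0, 1):

```python
import random

samples = [random.random()**2 for _ in range(200_000)]   # samples of phi(X) = X^2

def empirical_density(samples, a, b):
    """Fraction of samples in (a, b], divided by the interval length."""
    return sum(a < s <= b for s in samples) / (len(samples) * (b - a))

for a, b in [(0.01, 0.02), (0.25, 0.26), (0.81, 0.82)]:
    mid = (a + b) / 2
    print(empirical_density(samples, a, b), 1 / (2 * mid**0.5))  # should roughly agree
```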

Exercise 3.6. Assume that, under the conditions of Theorem 3.1, the function ϕ is instead
differentiable with a strictly negative derivative, with inverse function ψ. Show that, in this case,
the density of ϕ(X) is equal to f(ψ(s))|ψ′(s)| for s ∈ ϕ(R) and to 0 otherwise.

Exercise 3.7. Let U be a uniform random variable taking values in the interval (0, 1). Show
that, for a > 0, the density function g of the random variable Y = U^{1/a} is given by g(y) = a·y^{a−1}
for y ∈ (0, 1) and g(y) = 0 otherwise. Similarly find the density function of Y in the case a < 0,
identifying carefully the region in which it is nonzero.

Exercise 3.8. Let U be a uniform random variable taking values in the interval (0, 1). Show
that the density function g of the random variable Y = e^U is given by g(y) = y^{−1} for y ∈ (1, e)
and g(y) = 0 otherwise. Find the density functions of each of the random variables e^{2U}, e^{−U},
and also e^{−cU} for c > 0.

Exercise 3.9. Let X be an exponential random variable with density function f given by f(x) =
e^{−x} for x ≥ 0 and f(x) = 0 otherwise. Show that, for a > 0, the density function g of the
random variable Y = e^{−X/a} is given by g(y) = a·y^{a−1} for y ∈ (0, 1) and g(y) = 0 otherwise.

Note that each of the above three exercises is equally easily solved from first principles arguing
along the lines of the proof of Theorem 3.1.
Consideration of more general transformations ϕ is possible and relatively easy for random
variables taking values in R (the story for random vectors with values in Rd is more complicated).
For instance, we may assume that ϕ is piecewise differentiable. The problem then becomes one
in differential calculus and the general theorem is omitted. However, some examples are in order.

Exercise 3.10. Let the random variable X1 have a uniform distribution (taking values) in
[−1, 1] and let the random variable X2 have a uniform distribution (taking values) in [−1, 2]. Let
ϕ(x) = x^2. Find the densities of ϕ(X1) and ϕ(X2).

Exercise 3.11. Let X be a random variable with density f(x) = c(1 + x^2)^{−1}, x ∈ R. Let
ϕ(x) = cosh(x). Find the density (and hence show that it exists) of ϕ(X).

3.4 Expectation
The expectation, or mean, EX of a real-valued random variable X : (Ω, F) → (R, B), if it
can be defined, is justified (for instance) by the Law of Large Numbers, which was discussed in
Chapter 2 for Bernoulli trials, and which will be proved more generally in a later chapter.

3.4.1 Definition of expectation


We shall show how EX is defined for successively more general classes of random variable X.

B Indicator random variables

The simplest possible random variable is the indicator random variable IA of an event A,
defined by

IA(ω) = 1 if ω ∈ A,  and  IA(ω) = 0 if ω ∉ A.
Since IA takes the value 1 with probability P(A) or 0 with probability 1 − P(A), we define

EIA = P(A).

B Simple random variables

Recall that a random variable is simple if and only if it takes a finite number of distinct values.
A random variable X is simple if and only if it has a representation of the form

X = Σ_{i=1}^{n} ai IAi (3.7)

for events A1, ..., An and constants a1, ..., an. We then define

EX = Σ_{i=1}^{n} ai P(Ai). (3.8)

It is necessary to check that EX as above is well-defined: a simple random variable X has


in general infinitely many representations of the form (3.7). However, we may define a unique
“canonical” representation by requiring that the events A1 , . . . , An are disjoint and the constants
a1 , . . . , an are distinct. It is then not difficult to see that, for any representation of X in the
form (3.7), EX as defined by (3.8) evaluates to the same quantity as in this “canonical” case.
[Exercise: check this!].
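
The well-definedness just asserted is easy to observe numerically; a toy sketch (not from the notes) evaluating (3.8) for two different representations of the same simple random variable:

```python
def expectation_simple(rep, P):
    """EX = sum of a_i * P(A_i) for a representation X = sum a_i I_{A_i},
    over a finite sample space with probability function P (a dict)."""
    return sum(a * sum(P[w] for w in A) for a, A in rep)

P = {w: 1/4 for w in "abcd"}                                     # toy 4-point space
rep1 = [(1.0, {"a", "b"}), (2.0, {"c"})]                         # one representation
rep2 = [(1.0, {"a"}), (1.0, {"b"}), (2.0, {"c"}), (0.0, {"d"})]  # another of the same X
print(expectation_simple(rep1, P), expectation_simple(rep2, P))  # both give 1.0
```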
Note the monotonicity property: if X and Y are simple random variables defined on the same
probability space and if X ≥ Y , then EX ≥ EY : to see this consider representations (3.7) for
both X and Y in which the events A1 , . . . , An are the same.
Further properties of expectation are given below.

B Nonnegative random variables

Suppose that X is a general nonnegative random variable. We allow also the possibility that X
may, sometimes, take the value +∞. We define
EX := sup{EY : Y is simple nonnegative, and Y ≤ X}. (3.9)
Note that if X is itself a simple random variable, then it follows from the monotonicity property
for simple random variables given above that the definitions (3.8) and (3.9) of EX agree.
Note further that, in the case where the (nonnegative) random variable X is discrete, but takes
a countably infinite set of values, it again has a representation of the form (3.7) and EX is again
as given by (3.8)—in each case with n replaced by ∞. (This extension to random variables
which take countably infinitely many values in general only works when they are nonnegative.)
Note that EX may be infinite, even when X is always finite.
Exercise 3.12. Let the probability space (Ω, F, P) be given by taking Ω = N (where N =
{1, 2, ...}), taking F to be the σ-algebra consisting of all subsets of Ω, and taking P to be such
that P({ω}) = 2^{−ω} for each ω ∈ N (observe that this does define a valid probability measure
P on (Ω, F)). Define the random variable X by X(ω) = a^ω for some a > 0. Then X has a
representation of the form (3.7) given by X = Σ_{ω∈N} a^ω I{ω}. Show that EX is finite if and only
if a < 2 and is then equal to a/(2 − a).
Exercise 3.13. Let the probability space (Ω, F, P) be given by taking (Ω, F) as in Exercise 3.12,
and P({ω}) = c·ω^{−2} for the appropriate normalising constant c (= 6/π^2). Define the random
variable X by X(ω) = ω^k for some k > 0. Show that EX is finite if and only if k < 1.

B General random variables

Suppose now that X is a general (real-valued) random variable. Represent X as the difference
of two nonnegative random variables:

X = X+ − X−,

where X+ = max(X, 0) and X− = max(−X, 0) (note that also X− = −min(X, 0)). Then both
the expectations EX+ and EX− exist, although either may be infinite. In the case where at
least one of these expectations is finite, we define

EX = EX+ − EX−. (3.10)
Thus, in this case, EX may be finite, or ∞, or −∞. If both EX+ and EX− are infinite, then
EX is not defined.
Note also that in the case where the random variable X is simple, the random variables X+
and X− are also simple. Thus it follows easily from our earlier observations (for nonnegative
random variables and for simple random variables) that for a simple random variable X the
definitions (3.8) and (3.10) agree.
We say that the random variable X is integrable (with respect to P) if E|X| < ∞. Since |X| =
X+ + X−, it follows from the linearity of expectation (see below) that E|X| = EX+ + EX−.
Thus we have the following four cases:
EX+      EX−      E|X|     EX
finite   finite   finite   finite (X integrable)
∞        finite   ∞        ∞
finite   ∞        ∞        −∞
∞        ∞        ∞        undefined

Only the first of these cases corresponds to X being integrable, which is thus seen to be equivalent
to the requirement that both EX+ < ∞ and EX− < ∞, or to the requirement that EX is both
well defined and finite.
The expectation of the random variable X is just the probabilistic term for the Lebesgue
integral of X on (Ω, F) with respect to the probability measure P—and indeed it is easy to see
that the expectation of X is just its average weighted with respect to P, i.e. is a sum or integral.
Thus, from Lebesgue integration theory, we have the following alternative notations:

EX := ∫_Ω X dP := ∫_Ω X(ω) dP(ω).

Further, for any event A ∈ F, we can define the expectation of X on A by:

E(X; A) := E(XIA) := ∫_A X(ω) dP(ω),

where, as usual, IA is the indicator random variable of the event A.


In the case where P(A) > 0 we can also define the expectation of X given A by:

E(X|A) := E(X; A) / P(A).

We remark that E(X|A) is the expectation of X with respect to the conditional probability measure
PA introduced in Chapter 2, i.e. with respect to the probability measure PA : F → R given by

PA(B) := P(A ∩ B) / P(A),  B ∈ F.

(In particular, for any B ∈ B, we have E(IB|A) = PA(B).) We may alternatively write

E(X|A) = E_{PA} X.
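
On a countable sample space these definitions are finite sums; a small sketch (not from the notes):

```python
def cond_expectation(X, P, A):
    """E(X | A) = E(X; A) / P(A) on a finite sample space with probability
    function P; X is a function on the sample points."""
    PA = sum(P[w] for w in A)
    return sum(X(w) * P[w] for w in A) / PA

P = {w: 1/6 for w in range(1, 7)}            # a fair die
A = {2, 4, 6}                                # the event "even outcome"
print(cond_expectation(lambda w: w, P, A))   # E(X | X even) = 4.0
```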

Exercise 3.14. Consider again the probability space and random variable X of Exercise 3.12.
Let A = {1, 3, 5, ...}, so that A^c = {2, 4, 6, ...}. Show that E(X|A) = 3a/(4 − a^2) and that
E(X|A^c) = 3a^2/(4 − a^2). Understand, if you can, why it should be immediately obvious that
E(X|A^c) = aE(X|A). Verify also that EX = P(A)E(X|A) + P(A^c)E(X|A^c) (this is an instance
of the partition rule as applied to expectation).

3.4.2 Properties of expectation


The following lemma gives some simple properties of expectation. Recalling the characterisation
of expectation as the Lebesgue integral, none of them is in the least surprising.

Lemma 3.4 (algebraic properties of expectation). Suppose X, Y are integrable random variables
on the same probability space (Ω, F, P). Then:
(i) If, for some c ∈ R, P(X = c) = 1, then EX = c.
(ii) If, for any event A, P(A) = 0, then EXIA = 0.
(iii) E(cX) = cE(X) for all c ∈ R. (Linearity 1)

(iv) E(X + Y ) = EX + EY . (Linearity 2)


(v) If P(X ≥ 0) = 1, then EX ≥ 0. (Positivity)
(vi) If P(X ≤ Y ) = 1, then EX ≤ EY . (Monotonicity)
(vii) |EX| ≤ E|X|. (Modulus inequality)

These basic algebraic properties of expectation are easily proved for simple random variables X
and Y using the definition (3.8). (For example, to prove (ii) observe that if X is simple with
the representation (3.7), then XIA is simple with the same representation except that each event
Ai is replaced by Ai ∩ A, the latter events all having probability 0.)
The extension of the proofs to general random variables X and Y necessarily involves limiting
arguments using simple random variables (as in the definition of the expectation of a general
random variable). This is part of the theory of Lebesgue integration.

Exercise 3.15. Prove the results (i)–(vi) above for simple random variables. Prove the result
(vii) for a general random variable X (assuming the earlier results).

Recall again that a real-valued random variable X on the probability space (Ω, F, P) induces a
probability measure PX on the space (R, B) given by (3.2). We now have the following important
result.

Theorem 3.2. For any measurable function ϕ : (R, B) → (R, B) such that the random variable
ϕ(X) is either nonnegative or integrable,

Eϕ(X) := ∫_Ω ϕ(X(ω)) P(dω) = ∫_R ϕ(x) PX(dx), (3.11)

i.e. Eϕ(X) is also the expectation of the function ϕ regarded as a random variable on the prob-
ability space (R, B, PX). In particular we have:
(i) if X has a discrete distribution, taking values in a set S, with probability function p (i.e.
p(x) = P(X = x), x ∈ S), then

Eϕ(X) = Σ_{x∈S} ϕ(x)p(x); (3.12)

(ii) if X has an absolutely continuous distribution with density function f, then

Eϕ(X) = ∫_R ϕ(x)f(x) dx. (3.13)

The expressions (3.12) and (3.13) are the well-known formulae for computing the expectation of
(a function of) a random variable in the discrete and absolutely continuous cases respectively.
A special case of the above theorem occurs when the random variable X is integrable and ϕ is
the identity function, and shows that EX depends only on the law, or distribution, of X.
There are also obvious extensions to the case where X is not necessarily integrable: for example,
if X has an absolutely continuous distribution function with density f, we have

EX = ∫_0^∞ x f(x) dx − ∫_{−∞}^0 (−x) f(x) dx,

provided that not both integrals are infinite.
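
Formula (3.13) is easily checked against a Monte Carlo estimate; a sketch (not from the notes) with X uniform on (0, 1) and ϕ(x) = x^2, where both sides should be near 1/3:

```python
import random

phi = lambda x: x * x

# E phi(X) as a Monte Carlo average over samples of X:
mc = sum(phi(random.random()) for _ in range(200_000)) / 200_000

# E phi(X) via (3.13), with f = 1 on (0, 1), by a midpoint rule:
grid = [(k + 0.5) / 10_000 for k in range(10_000)]
integral = sum(phi(x) * 1.0 for x in grid) / 10_000

print(mc, integral, 1 / 3)
```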


To prove Theorem 3.2 in the case where the distribution of X is simple is easy: it is necessary
to show that (3.12) holds, where S is the finite set of values that the random variable X may
take. Here X has the representation X = Σ_{x∈S} x I{X=x}, and so

ϕ(X) = Σ_{x∈S} ϕ(x) I{X=x}.

Thus

Eϕ(X) = Σ_{x∈S} ϕ(x)P(X = x) = Σ_{x∈S} ϕ(x)p(x)

as required. Again the extension to general random variables involves limiting arguments using
simple random variables.

Example 3.5. Suppose that the random variable U has a uniform distribution on (0, 1). Then,
for any a > 0, it follows from (3.13) that the expectation of the random variable U^a is given by

EU^a = ∫_0^1 u^a du = 1/(a + 1).

Define also the random variable Y = −λ log U. Then it again follows from (3.13) that the
expectation of the random variable Y is given by

EY = E(−λ log U) = ∫_0^1 (−λ log u) du = λ.

It also follows as in Section 3.3 that Y has an exponential distribution with mean λ, i.e. has
density function g given by g(y) = λ^{−1} e^{−y/λ} for y ≥ 0 and g(y) = 0 otherwise, so that the
expectation of Y may also be evaluated (again from (3.13), but using this time the density
function g of Y) as

EY = ∫_0^∞ λ^{−1} y e^{−y/λ} dy = λ.
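
Both computations in Example 3.5 are easy to confirm by simulation (a sketch, not from the notes):

```python
import math, random

N = 200_000
us = [1.0 - random.random() for _ in range(N)]   # uniform draws in (0, 1]

a = 3.0
print(sum(u**a for u in us) / N, 1 / (a + 1))    # E U^a = 1/(a+1)

lam = 2.0
ys = [-lam * math.log(u) for u in us]            # exponential with mean lam
print(sum(ys) / N, lam)                          # E(-lam log U) = lam
```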

Exercise 3.16. Suppose that the random variable N has a Poisson distribution with probability
function p given by

p(n) = P(N = n) = e^{−λ} λ^n / n!,  n = 0, 1, 2, ...,

for some λ > 0. Show that EN = λ and that EN^2 = λ^2 + λ.

Finally, we note the following famous results, which are directly related to the definition of
expectation, and which will be discussed and frequently used in subsequent chapters.

Theorem 3.3 (Monotone Convergence Theorem). Let Xn be any sequence of nonnegative
random variables such that Xn ↑ X (i.e. Xn(ω) ↑ X(ω) as n → ∞ for all ω ∈ Ω) for some
further random variable X. Then E|X − Xn| → 0 and, as a weak consequence, EXn ↑ EX.

Lemma 3.5 (Fatou’s Lemma). Let Xn be any sequence of nonnegative random variables. Then
E lim inf Xn ≤ lim inf EXn .

Theorem 3.4 (Dominated Convergence Theorem). Let Xn be any sequence of random variables
such that X(ω) := lim_{n→∞} Xn(ω) exists and such that, for some integrable random variable Y
(i.e. E|Y| < ∞), we have |Xn(ω)| ≤ Y(ω) for all n and ω. Then E|Xn − X| → 0 and, as a
weak consequence, EXn → EX.
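
A quick numerical illustration of monotone convergence (a sketch, not from the notes): the truncations Xn = min(X, n) are nonnegative and increase to X pointwise, so EXn increases to EX:

```python
import math, random

xs = [-math.log(1.0 - random.random()) for _ in range(200_000)]  # exponential, EX = 1
for n in (0.5, 1, 2, 4, 8):
    print(n, sum(min(x, n) for x in xs) / len(xs))  # increases towards 1.0
```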

References
[1] B. Fristedt & L. Gray, A Modern Approach to Probability Theory, Birkhäuser, 1997.

[2] E. Hewitt & K. Stromberg, Real and Abstract Analysis, Springer, 1965.

[3] D. Williams, Probability with Martingales, Cambridge, 1991.
