SMSTC (2022/23)
Foundations of Probability
Chapter 3: Random variables and their laws I
The Probability Team [a]

www.smstc.ac.uk

Contents

3.1 Random variables and their laws
    3.1.1 Laws
    3.1.2 Law of a discrete random variable
    3.1.3 Uniform random variables and continuous random variables
3.2 Distribution functions
    3.2.1 Types of distribution function
3.3 Transformation rules and densities
3.4 Expectation
    3.4.1 Definition of expectation
    3.4.2 Properties of expectation

3.1 Random variables and their laws


3.1.1 Laws
Recall from Chapter 1 that a (real-valued) random variable is a measurable function X
from a probability space (Ω, F, P) into (R, B) (the set R of real numbers endowed with the Borel
σ-algebra B).
More generally, we can consider a random variable, or random element, X as a measurable
function from an “abstract” probability space (Ω, F, P) into a “concrete”, or “observation” mea-
surable space (S, S). Examples of random variables in this more general sense are real-valued
random variables ((S, S) = (R, B)), random vectors ((S, S) = (R^d, B(R^d))), random sequences,
and random functions.
The random variable X then induces a probability measure PX on (S, S), given by

PX (B) = P(X ∈ B), B ∈ S. (3.1)

The probability measure PX depends on both P and X. It describes the probabilities that X
takes different values, or ranges of values, in the observation space S, and is referred to as the
law, or distribution, of the random variable X. [b]

[a] [email protected]
[b] For (quite) good reasons the terms law and distribution are used interchangeably throughout these notes.


3.1.2 Law of a discrete random variable


A discrete random variable X : (Ω, F, P) → (S, S) is, by definition, one that takes countably
many values. In other words, the random variable X is discrete if and only if there is a countable
set D ∈ S such that X(ω) ∈ D for all ω ∈ Ω. We also assume that D and all its subsets are
members of S. Hence the law PX of X is a probability measure on D. We know that a probability
on a countable set D can be defined by defining its values on singletons (the individual elements).
These values form the so-called probability (mass) function. Thus, the probability function
is given by
p(x) = PX {x} = P(X = x), x ∈ D.
Clearly, if B ⊆ D then

PX(B) = PX(⋃_{x∈B} {x}) = Σ_{x∈B} p(x).

So p is sufficient for computing PX .
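
As a quick illustration (a minimal Python sketch, not part of the original notes; the truncation to 50 values is an arbitrary choice), the law of a discrete random variable can be computed directly from its probability function by summing over the set of interest:

```python
# Probability function p(n) = 2^-n on D = {1, ..., 50} (a truncation of the
# law appearing in Example 3.1 below); the law of any B ⊆ D is a sum of p.
p = {n: 2.0**-n for n in range(1, 51)}

def law(B):
    """P_X(B) = sum of p(x) over x in B, for B a subset of D."""
    return sum(p[x] for x in B if x in p)

print(law({n for n in p if n % 2 == 0}))   # P(X even), close to 1/3
print(law(set(p)))                          # close to 1 (error about 2^-50)
```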

Exercise 3.1. Suppose that the random variable X takes n distinct values (i.e. X(Ω) is a set
with n elements). Show that σ(X) has 2^n elements and describe (give a procedure for describing)
them.

A discrete random variable is called simple if it takes only finitely many values. Any random
variable can be approximated by (a sequence of) simple random variables. More precisely, the
following statement holds.

Lemma 3.1. Let X : (Ω, F, P) → (R, B). Then there exists a sequence X1 , X2 , . . . of simple
random variables such that limn→∞ Xn (ω) = X(ω) for all ω ∈ Ω. If X(ω) ≥ 0 for all ω ∈ Ω,
then we can choose the sequence so that 0 ≤ Xn (ω) ≤ Xn+1 (ω) for each n and ω.

Consider for simplicity a non-negative X. Then the result of the lemma is a consequence of
the following construction: for any n, divide the interval [0, n) into n·2^n disjoint intervals
[k·2^{-n}, (k+1)·2^{-n}), k = 0, ..., n·2^n − 1, of equal length 2^{-n}. Then let Xn = k·2^{-n} if X ∈
[k·2^{-n}, (k+1)·2^{-n}) and Xn = n if X ≥ n. Clearly, Xn is a simple random variable for any n.
Further, for any ω, if X(ω) = x, then Xn(ω) = xn is an increasing sequence and 0 ≤ x − xn ≤ 2^{-n}
for all n ≥ x.
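
This dyadic construction is easy to carry out numerically. A minimal sketch (not from the notes; the exponential sample is just for illustration):

```python
import math, random

def simple_approx(x, n):
    """n-th dyadic approximation of Lemma 3.1 for a value x >= 0:
    X_n = k*2^-n when x lies in [k*2^-n, (k+1)*2^-n), and X_n = n when x >= n."""
    if x >= n:
        return float(n)
    return math.floor(x * 2**n) / 2**n

x = -math.log(1.0 - random.random())   # a draw from an exponential law
for n in (1, 2, 4, 8, 16):
    xn = simple_approx(x, n)
    print(n, xn, x - xn)               # error lies in [0, 2^-n] once n >= x
```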
Another method of approximating an arbitrary random variable by discrete random variables, with
lattice, or arithmetic, distributions [c], is also of use. Divide the whole real line into a countable
number of disjoint intervals [k·2^{-n}, (k+1)·2^{-n}), k = ..., −2, −1, 0, 1, 2, ..., of equal length 2^{-n},
and, for any k, let Xn = k·2^{-n} if X ∈ [k·2^{-n}, (k+1)·2^{-n}). Then again Xn ↑ X and, for any n,

P(X − Xn ≤ 2^{-n}) = 1.

[c] A random variable X has a lattice, or arithmetic, distribution with span h > 0 if, first, X may take only values which are multiples of h, i.e. Σ_{n=−∞}^{∞} P(X = nh) = 1, and, secondly, h is the minimal positive number for which this property holds.

3.1.3 Uniform random variables and continuous random variables


Recall from Chapter 1 that a uniform random variable U taking values in the interval [0, 1]
(endowed with the Borel σ-algebra) is such that its law satisfies

PU ([a, b]) = b − a, 0 ≤ a ≤ b ≤ 1.

A standard approach to the construction of such a random variable is to take the probability
space (Ω, F, P) to be given by Ω = [0, 1], F = B (the Borel σ-algebra on [0, 1]), and P equal to
Lebesgue measure. Having shown that the latter exists (see Chapter 1), we simply define U to
be the identity function, given by U (x) = x for all x ∈ [0, 1], and observe that its law PU is also
Lebesgue measure, i.e. PU is the law of a uniform random variable as required. Notice that if D
is a countable set in [0, 1], then
P(U ∈ D) = 0.
For example, P(U is a rational number) = 0.
Generally, a random variable X is defined to be continuous if

P(X = x) = 0 for all x.

Then, by the axiomatic property P3 of Chapter 1, for any countable set D,

P(X ∈ D) = 0.

Hence, in particular, a uniform random variable is continuous.

3.2 Distribution functions


We now consider real-valued random variables defined on a probability space (Ω, F, P).
Recall that, for any such random variable X, the probability measure PX on the measurable
space (R, B) (where B is the Borel σ-algebra on R) is the law of X and is given by (3.1), which
here becomes
PX (B) = P(X ∈ B), B ∈ B. (3.2)
The distribution function of a real-valued random variable X is defined by

F (x) := PX (−∞, x] = P(X ≤ x), x ∈ R. (3.3)

Note that the distribution function F depends only on the law PX of X, i.e. if two random
variables have the same law, they have the same distribution function. Further, as we shall see
below, the distribution function determines the corresponding law uniquely.

Lemma 3.2. Any distribution function F has the following properties.


(i) x1 < x2 ⇒ F (x1 ) ≤ F (x2 ),
(ii) limx→−∞ F (x) = 0,
(iii) limx→+∞ F (x) = 1,
(iv) limn→∞ F (x + 1/n) = F (x).

Here (i) is the monotonicity property; the properties (ii), (iii) and (iv) are direct consequences
of the continuity property of probability (see the properties P9 and P10 of Chapter 1); further
(iv) is frequently called the right continuity property of distribution functions. To establish
these properties is an easy exercise. For example, to show (ii) take any sequence xn ↓ −∞ and
consider the events An := {X ≤ xn} = {X ∈ (−∞, xn]}. Then the events An are decreasing in
n and ∩_n An = ∅. Therefore P(An) = F(xn) → 0 as n → ∞, by the continuity property P10 of
Chapter 1.

Exercise 3.2. Verify the remaining properties given by Lemma 3.2.

Exercise 3.3. Let X be a random variable with law PX and distribution function F (x) =
PX (−∞, x] = P(X ≤ x) for all x ∈ R. Carefully justify the following important formulae.
(i) P(X ∈ (a, b]) = F (b) − F (a),

(ii) P(X ∈ (a, b)) = F (b−) − F (a),


(iii) P(X ∈ [a, b]) = F (b) − F (a−),
(iv) P(X = a) = F (a) − F (a−).

A consequence of the following lemma is that the properties (i)-(iv) of Lemma 3.2 characterise
the class of distribution functions of real-valued random variables.
Lemma 3.3. Suppose that a function F on R satisfies the properties (i)-(iv) of Lemma 3.2. Then
there exists a random variable X having F as its distribution function. Further, F determines
the law PX of X uniquely.

Proof The standard proof is as follows. Given a function F satisfying (i)-(iv) of Lemma 3.2,
we show first that there exists a unique probability measure, i.e. a unique law, P on (R, B) such
that
P(−∞, x] = F (x), x ∈ R. (3.4)
The above relation clearly defines P uniquely on general intervals as in Exercise 3.3, and so also
on the algebra each of whose sets is a finite union of disjoint intervals (where the probability
assigned by P to such a set is the sum of the probabilities assigned to the individual intervals).
It is now readily verified from the properties of F that P thus defined satisfies the conditions
of the Extension Theorem (Chapter 1), and so P may now be extended to a unique probability
measure on (R, B) such that (3.4) holds as required.
To show that there exists a random variable X which has P as its law, we take (Ω, F) = (R, B)
and take the random variable X to be the identity function X(ω) = ω for all ω ∈ R. This has law
PX = P as required, and so has distribution function F. □
The above construction of a probability space and random variable having a given law, or dis-
tribution, is sometimes referred to as the canonical construction: the “abstract” probability
space on which the random variable is defined and the “concrete”, or “observation”, space in
which it takes its values are the same, the random variable is the identity function, and the law
of the random variable coincides with the underlying probability measure. However, we remark
that in general much of the power of probability theory comes from our ability to be entirely
flexible in the choice of probability space.
The following alternative construction of a random variable having a given distribution function
is to some extent illustrative of the last remark above. Again given a function F satisfying
(i)-(iv) of Lemma 3.2, define its generalised inverse F^{-1} : [0, 1] → R by

F^{-1}(t) = inf{x : F(x) ≥ t}. (3.5)

The function F^{-1} is measurable with respect to the Borel σ-algebras on R and [0, 1] (this follows
since it is monotone non-decreasing—see the Exercises of Chapter 1). Now let a random variable
U have a uniform distribution on (0, 1) and define X = F^{-1}(U). It is a useful exercise (see below)
to show that X has distribution function F . This latter construction forms the basis of one of
the most useful algorithms for the simulation of random variables—see Chapter 15 for more
details.
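
As an illustration of the construction X = F^{-1}(U) (a sketch, not from the notes, taking F(x) = 1 − e^{-x}, for which the generalised inverse is the usual inverse and is available in closed form):

```python
import math, random

def F_inv(t):
    """Inverse of F(x) = 1 - exp(-x) on (0, 1)."""
    return -math.log(1.0 - t)

samples = [F_inv(random.random()) for _ in range(100_000)]
# The empirical distribution function should approximate F:
for x in (0.5, 1.0, 2.0):
    emp = sum(s <= x for s in samples) / len(samples)
    print(x, emp, 1.0 - math.exp(-x))
```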
Exercise 3.4. Suppose that a distribution function F is continuous and strictly increasing.
Then the generalised inverse F^{-1} defined by (3.5) above is just the usual inverse function. Show
that in this case, the random variable X = F^{-1}(U) does indeed have distribution function F.
Extend the result to the general case (hint: draw a picture).

3.2.1 Types of distribution function


We now discuss the various kinds of distribution function on R.

Discrete distribution functions

A distribution function F is discrete if it is the distribution function of a discrete random
variable X taking values in some countable subset S of R, i.e. such that P(X ∈ S) = 1. Then
we may define the associated probability function

p(s) := P(X = s) = F(s) − F(s−), s ∈ S. (3.6)
Thus, for s ∈ S (provided p(s) > 0), the distribution function F (which is right continuous)
jumps at s, the size of the jump being p(s); in particular F(s−) = Σ_{s′∈S, s′<s} p(s′), while
F(s) = Σ_{s′∈S, s′≤s} p(s′). Further, if (a, b) is any open interval containing no points of S, then
F is constant on (a, b). This follows since, if a < x < b, then F(x) − F(a) = P(a < X ≤ x) =
Σ_{s∈S, a<s≤x} P(X = s) = 0.
Example 3.1. Let X be a random variable such that P(X = n) = 2^{-n}, n ∈ N (where
N = {1, 2, 3, ...} is the set of natural numbers). Then the distribution function of X is as shown in
Figure 3.1, i.e. is a step function.

Figure 3.1: Distribution function of a discrete random variable
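
For concreteness, the step function of Example 3.1 can be written in closed form, since Σ_{n≤x} 2^{-n} = 1 − 2^{-⌊x⌋} for x ≥ 1. A small sketch (not part of the original notes):

```python
import math

def F(x):
    """Distribution function of Example 3.1 (P(X = n) = 2^-n, n = 1, 2, ...):
    F(x) = 0 for x < 1 and 1 - 2^-floor(x) for x >= 1, a right-continuous step."""
    return 0.0 if x < 1 else 1.0 - 2.0**(-math.floor(x))

print([F(x) for x in (0.5, 1.0, 1.5, 2.0, 3.7)])   # jump of size 2^-n at each n
```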

Example 3.2. Let X be a random variable such that for every rational number of the form m/n,
where m, n are positive integers with no common factors, we have P(X = m/n) = c·2^{-(m+n)},
where c is chosen so that P(X ∈ Q) = 1. Its distribution function is discrete because the set Q
of rational numbers is countable. Unfortunately, we can't draw it. (There are no intervals
(a, b) containing no rational points.)

Continuous distribution functions

A distribution function F is continuous if it is a continuous function, i.e. if F(x) − F(x−) = 0
for all x ∈ R. In other words, a continuous distribution function is the distribution function of
a continuous random variable.
Example 3.3. The distribution function of a uniform random variable in [0, 1] is given by

F(x) = 0 for x < 0,  F(x) = x for x ∈ [0, 1],  F(x) = 1 for x > 1,

and hence is continuous. It is illustrated in Figure 3.2.


Figure 3.2: An (absolutely) continuous distribution function

Except at the points 0, 1, the function F is also differentiable, with derivative f(u) = 1 if 0 <
u < 1 and 0 otherwise. If we arbitrarily define f(0) = f(1) = 0, we also have ∫_{−∞}^{u} f(t) dt = F(u)
for all u ∈ R.

Absolutely continuous distribution functions

A (necessarily continuous) distribution function F is absolutely continuous if there exists a
function f (referred to as the density of F) such that ∫_{−∞}^{∞} f(x) dx = 1 and

F(x) = ∫_{−∞}^{x} f(t) dt,  x ∈ R.

Then, clearly, if X has distribution function F, for any a ≤ b,

P(X ∈ (a, b]) = F(b) − F(a) = ∫_a^b f(t) dt.

The density f is not uniquely defined. For instance, it can be changed on a finite (or countable)
set and such a change will not affect the integral above. Usually, one [d] imposes additional
regularity conditions, such as continuity, resulting in uniqueness. Note that the uniform distribution
is absolutely continuous.

[d] unconsciously
Note also that integrals arising in probability theory of the form ∫_a^b h(x) dx (where we allow the
possibilities a = −∞ and b = ∞) are Lebesgue integrals. However, provided h is measurable
and bounded with a set of discontinuities of Lebesgue measure zero, such integrals coincide
with their standard Riemann counterparts of elementary calculus—see [3]. This situation is
easily adapted to cover all practical applications.

Singularly continuous distribution functions

Unfortunately, not all continuous distribution functions are absolutely continuous. There also
exist so-called singularly continuous distribution functions. We do not provide a formal
definition of such functions. Instead we simply remark that they have very strange properties
and give one example of such a function.
Example 3.4. Consider the space (Ω = {0, 1}^N, F, P), where F is the product σ-algebra, and
P is such that P({ω ∈ Ω : ω1 = i1, ..., ωn = in}) = 2^{-n}, i1, ..., in ∈ {0, 1}, n ∈ N. Let

V(ω) := Σ_{n=1}^{∞} 2ωn / 3^n.

One can show that the random variable V defined in Example 3.4 has a continuous but not
absolutely continuous distribution function. This is illustrated in Figure 3.3.

Figure 3.3: A continuous distribution function without density

In particular, the derivative of this function exists at almost all points (i.e. at all points except
for those in a set of Lebesgue measure zero) and—what is really surprising—at each such point
the value of the derivative is zero!
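
One may simulate V directly from its definition; a sketch (not from the notes, with an arbitrary truncation depth):

```python
import random

def sample_V(depth=40):
    """One draw of V = sum over n >= 1 of 2*w_n / 3^n, with w_n i.i.d. fair bits
    (series truncated after `depth` terms)."""
    return sum(2 * random.randint(0, 1) / 3**n for n in range(1, depth + 1))

draws = sorted(sample_V() for _ in range(10))
print(draws)   # no value lands in (1/3, 2/3): w_1 = 0 forces V <= 1/3, w_1 = 1 forces V >= 2/3
```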

General distribution functions

Suppose that F, G are distribution functions. Then, for any λ ∈ (0, 1), the function λF + (1−λ)G
is a distribution function. (Probabilistically, if X, Y are random variables with distribution
functions F, G, respectively, then we can define a new random variable Z which, independently
of X and Y, takes the same value as X with probability λ or the same value as Y with probability
1 − λ.) Thus, if F is discrete and G continuous, then λF + (1−λ)G is neither discrete nor
continuous: it is mixed. The question now is: can we describe all distribution functions as
mixtures of the three types (discrete, absolutely continuous, singularly continuous) mentioned
above? The answer, formalised in the following remark, is yes.
Remark 3.1. Any distribution function F has a unique decomposition

F = λd Fd + λac Fac + λsc Fsc

where Fd , Fac , Fsc are discrete, absolutely continuous, and singularly continuous distribution
functions, respectively, and where the coefficients λd , λac , λsc are nonnegative and such that
λd + λac + λsc = 1.
The last two terms of the above decomposition are known as the continuous part of F, and the
first term is the discrete part of F. We will not prove this result, but refer, e.g., to [2]. The
intuition behind this result is clear. First, if the distribution function F has jumps, it may have
at most a countable number of them, with a total jump size, say, λd. Consider first a random
variable which is discrete and takes values where these jumps occur, with the same probabilities
normalised by λd^{-1}; then its distribution function is Fd. From the remainder, we can further
subtract an absolutely continuous component—but this is a more complicated story.
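
The probabilistic description of a mixture given above translates directly into a sampling recipe. A minimal sketch (not from the notes; the atom at 0 is an arbitrary illustration):

```python
import random

def sample_mixture(sample_X, sample_Y, lam):
    """Draw from lam*F + (1 - lam)*G: with probability lam return a fresh copy
    of X, otherwise a fresh copy of Y, independently of everything else."""
    return sample_X() if random.random() < lam else sample_Y()

# A mixed law: discrete part (an atom at 0) with weight 0.3, absolutely
# continuous (uniform) part with weight 0.7.
draws = [sample_mixture(lambda: 0.0, random.random, 0.3) for _ in range(100_000)]
print(sum(d == 0.0 for d in draws) / len(draws))   # close to 0.3, the weight of the atom
```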

Differentiation: a word of caution. The subject of densities involves the concept of derivatives
of functions that are not necessarily everywhere differentiable. Recall that a distribution
function G has a density g if g is a non-negative function such that ∫_{−∞}^{∞} g(x) dx = 1 and
G(x) = ∫_{−∞}^{x} g(t) dt for all x. One can deduce from this that the function G is differentiable
almost everywhere (i.e. for all x except in a set of Lebesgue measure zero) with G′(x) = g(x).
Note that we may change the values of g(x) at a finite or countably infinite number of points—
then G is unchanged. In a more sophisticated manner, we may change the values of g(x) on a
set of Lebesgue measure zero—again, with no effect on G. If g is a continuous function, then
g(x) = G′(x) for all x.

3.3 Transformation rules and densities


Consider a random variable X on a probability space (Ω, F, P). Let ϕ be a real-valued mea-
surable function. Then, as we already know, Y = ϕ(X) is also a random variable, and we are
transforming X into Y using a function (or transformation) ϕ. One of the basic questions is:
assume that we know the law PX of X and the function ϕ, then how may we find the law PY
of Y ? This is not an easy question, and here we provide answers only in two particular cases,
(a) for a discrete random variable X and general function ϕ, and
(b) for an absolutely continuous random variable X and differentiable function ϕ with strictly
positive derivative.
In the case of a discrete random variable X, suppose that X takes each possible value an
with probability pn, where Σ_{n∈N} pn = 1 and N is either a finite or a countably infinite set.
Consider the set B = {ϕ(an), n ∈ N}. Clearly, the set B is also countable (since it cannot have
more elements than does N); denote its distinct elements by b1, b2, .... The random variable
Y = ϕ(X) takes values only in B, and, for any bm, we have Y = bm if and only if X = an for
some n such that ϕ(an) = bm. Now let qm = P(Y = bm). Then

qm = Σ_{n : ϕ(an) = bm} pn

and Σ_m qm = 1.
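
Computationally this is just a regrouping of the probability function; a minimal sketch (not from the notes):

```python
from collections import defaultdict

def pushforward(p, phi):
    """Law of Y = phi(X) for discrete X with probability function p (a dict):
    q(b) = sum of p(a) over all a with phi(a) = b."""
    q = defaultdict(float)
    for a, pa in p.items():
        q[phi(a)] += pa
    return dict(q)

p = {n: 1/6 for n in range(1, 7)}          # a fair die, for illustration
print(pushforward(p, lambda x: x % 2))     # law of the parity of the roll
```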

Exercise 3.5. Let X be the outcome of rolling a fair die, and let Y = 1 if X divides 3 and
Y = 2 otherwise. Find the law (distribution) of Y .

In the case of an absolutely continuous random variable, the story is different: a merely one-to-
one function ϕ can simultaneously change both the values of the transformed random variable
and its law in a quite general fashion.

Theorem 3.1. Let X be an absolutely continuous real-valued random variable with density f.
Let ϕ : R → R be a differentiable function with strictly positive derivative and let ψ be its inverse
function. Then ϕ(X) is a random variable with absolutely continuous distribution function and
density given by

f(ψ(s))ψ′(s) for all s ∈ ϕ(R),

and 0 elsewhere.

Proof Since ϕ has a strictly positive derivative, it is strictly increasing and its inverse function
exists with domain ϕ(R). Then, the distribution function of ϕ(X) is, for any t ∈ ϕ(R),

P(ϕ(X) ≤ t) = P(X ≤ ψ(t)) = ∫_{−∞}^{ψ(t)} f(x) dx.

By changing variable in the integral we have

∫_{−∞}^{ψ(t)} f(x) dx = ∫_{−∞}^{t} f(ψ(s))ψ′(s) ds,

where we set ψ′(s) = 0 for s ∉ ϕ(R). From the definition of an absolutely continuous distribution
function we see that, indeed, ϕ(X) has an absolutely continuous distribution function and that
its density is the function inside the last integral. □
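
A numerical sanity check of Theorem 3.1 (a sketch, not from the notes): take X uniform on (0, 1) and ϕ(x) = x^2, so ψ(s) = √s and the predicted density is f(ψ(s))ψ′(s) = 1/(2√s) on (0, 1):

```python
import random

samples = [random.random()**2 for _ in range(200_000)]   # samples of phi(X) = X^2

def empirical_density(samples, a, b):
    """Fraction of samples in (a, b], divided by the interval length."""
    return sum(a < s <= b for s in samples) / (len(samples) * (b - a))

for a, b in [(0.01, 0.02), (0.25, 0.26), (0.81, 0.82)]:
    mid = (a + b) / 2
    print(empirical_density(samples, a, b), 1 / (2 * mid**0.5))  # should roughly agree
```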

Exercise 3.6. Assume that, under the conditions of Theorem 3.1, the function ϕ is instead
differentiable with a strictly negative derivative, with inverse function ψ. Show that, in this case,
the density of ϕ(X) is equal to f(ψ(s))|ψ′(s)| for s ∈ ϕ(R) and to 0 otherwise.

Exercise 3.7. Let U be a uniform random variable taking values in the interval (0, 1). Show
that, for a > 0, the density function g of the random variable Y = U^{1/a} is given by g(y) = a·y^{a−1}
for y ∈ (0, 1) and g(y) = 0 otherwise. Similarly find the density function of Y in the case a < 0,
identifying carefully the region in which it is nonzero.

Exercise 3.8. Let U be a uniform random variable taking values in the interval (0, 1). Show
that the density function g of the random variable Y = e^U is given by g(y) = y^{−1} for y ∈ (1, e)
and g(y) = 0 otherwise. Find the density functions of each of the random variables e^{2U}, e^{−U},
and also e^{−cU} for c > 0.

Exercise 3.9. Let X be an exponential random variable with density function f given by f(x) =
e^{−x} for x ≥ 0 and f(x) = 0 otherwise. Show that, for a > 0, the density function g of the
random variable Y = e^{−X/a} is given by g(y) = a·y^{a−1} for y ∈ (0, 1) and g(y) = 0 otherwise.

Note that each of the above three exercises is equally easily solved from first principles arguing
along the lines of the proof of Theorem 3.1.
Consideration of more general transformations ϕ is possible and relatively easy for random
variables taking values in R (the story for random vectors with values in Rd is more complicated).
For instance, we may assume that ϕ is piecewise differentiable. The problem then becomes one
in differential calculus and the general theorem is omitted. However, some examples are in order.

Exercise 3.10. Let the random variable X1 have a uniform distribution (taking values) in
[−1, 1] and let the random variable X2 have a uniform distribution (taking values) in [−1, 2]. Let
ϕ(x) = x^2. Find the densities of ϕ(X1) and ϕ(X2).

Exercise 3.11. Let X be a random variable with density f(x) = c(1 + x^2)^{−1}, x ∈ R. Let
ϕ(x) = cosh(x). Find the density (and hence show that it exists) of ϕ(X).

3.4 Expectation
The expectation, or mean, EX of a real-valued random variable X : (Ω, F) → (R, B), if it
can be defined, is justified (for instance) by the Law of Large Numbers, which was discussed in
Chapter 2 for Bernoulli trials, and which will be proved more generally in a later chapter.

3.4.1 Definition of expectation


We shall show how EX is defined for successively more general classes of random variable X.

B Indicator random variables

The simplest possible random variable is the indicator random variable IA of an event A,
defined by

IA(ω) = 1 if ω ∈ A,  and  IA(ω) = 0 if ω ∉ A.
Since IA takes the value 1 with probability P(A) or 0 with probability 1 − P(A), we define

EIA = P(A).

B Simple random variables

Recall that a random variable is simple if and only if it takes a finite number of distinct values.
A random variable X is simple if and only if it has a representation of the form

X = Σ_{i=1}^{n} ai IAi (3.7)

for events A1, ..., An and constants a1, ..., an. We then define

EX = Σ_{i=1}^{n} ai P(Ai). (3.8)

It is necessary to check that EX as above is well-defined: a simple random variable X has


in general infinitely many representations of the form (3.7). However, we may define a unique
“canonical” representation by requiring that the events A1 , . . . , An are disjoint and the constants
a1 , . . . , an are distinct. It is then not difficult to see that, for any representation of X in the
form (3.7), EX as defined by (3.8) evaluates to the same quantity as in this “canonical” case.
[Exercise: check this!].
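
The well-definedness just asserted is easy to observe numerically; a toy sketch (not from the notes) evaluating (3.8) for two different representations of the same simple random variable:

```python
def expectation_simple(rep, P):
    """EX = sum of a_i * P(A_i) for a representation X = sum a_i I_{A_i},
    over a finite sample space with probability function P (a dict)."""
    return sum(a * sum(P[w] for w in A) for a, A in rep)

P = {w: 1/4 for w in "abcd"}                                     # toy 4-point space
rep1 = [(1.0, {"a", "b"}), (2.0, {"c"})]                         # one representation
rep2 = [(1.0, {"a"}), (1.0, {"b"}), (2.0, {"c"}), (0.0, {"d"})]  # another of the same X
print(expectation_simple(rep1, P), expectation_simple(rep2, P))  # both give 1.0
```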
Note the monotonicity property: if X and Y are simple random variables defined on the same
probability space and if X ≥ Y , then EX ≥ EY : to see this consider representations (3.7) for
both X and Y in which the events A1 , . . . , An are the same.
Further properties of expectation are given below.

B Nonnegative random variables

Suppose that X is a general nonnegative random variable. We allow also the possibility that X
may, sometimes, take the value +∞. We define
EX := sup{EY : Y is simple nonnegative, and Y ≤ X}. (3.9)
Note that if X is itself a simple random variable, then it follows from the monotonicity property
for simple random variables given above that the definitions (3.8) and (3.9) of EX agree.
Note further that, in the case where the (nonnegative) random variable X is discrete, but takes
a countably infinite set of values, it again has a representation of the form (3.7) and EX is again
as given by (3.8)—in each case with n replaced by ∞. (This extension to random variables
which take countably infinitely many values in general only works when they are nonnegative.)
Note that EX may be infinite, even when X is always finite.
Exercise 3.12. Let the probability space (Ω, F, P) be given by taking Ω = N (where N =
{1, 2, ...}), taking F to be the σ-algebra consisting of all subsets of Ω, and taking P to be such
that P({ω}) = 2^{−ω} for each ω ∈ N (observe that this does define a valid probability measure
P on (Ω, F)). Define the random variable X by X(ω) = a^ω for some a > 0. Then X has a
representation of the form (3.7) given by X = Σ_{ω∈N} a^ω I{ω}. Show that EX is finite if and only
if a < 2 and is then equal to a/(2 − a).
Exercise 3.13. Let the probability space (Ω, F, P) be given by taking (Ω, F) as in Exercise 3.12,
and P({ω}) = c·ω^{−2} for the appropriate normalising constant c (= 6/π^2). Define the random
variable X by X(ω) = ω^k for some k > 0. Show that EX is finite if and only if k < 1.

B General random variables

Suppose now that X is a general (real-valued) random variable. Represent X as the difference
of two nonnegative random variables:

X = X+ − X−,

where X+ = max(X, 0) and X− = max(−X, 0) (note that also X− = −min(X, 0)). Then both
the expectations EX+ and EX− exist, although either may be infinite. In the case where at
least one of these expectations is finite, we define

EX = EX+ − EX−. (3.10)
Thus, in this case, EX may be finite, or ∞, or −∞. If both EX+ and EX− are infinite, then
EX is not defined.
Note also that in the case where the random variable X is simple, the random variables X+
and X− are also simple. Thus it follows easily from our earlier observations (for nonnegative
random variables and for simple random variables) that for a simple random variable X the
definitions (3.8) and (3.10) agree.
We say that the random variable X is integrable (with respect to P) if E|X| < ∞. Since |X| =
X+ + X−, it follows from the linearity of expectation (see below) that E|X| = EX+ + EX−.
Thus we have the following four cases:
EX+      EX−      E|X|     EX
finite   finite   finite   finite (X integrable)
∞        finite   ∞        ∞
finite   ∞        ∞        −∞
∞        ∞        ∞        undefined

Only the first of these cases corresponds to X being integrable, which is thus seen to be equivalent
to the requirement that both EX+ < ∞ and EX− < ∞, or to the requirement that EX is both
well defined and finite.
The expectation of the random variable X is just the probabilistic term for the Lebesgue
integral of X on (Ω, F) with respect to the probability measure P—and indeed it is easy to see
that the expectation of X is just its average weighted with respect to P, i.e. is a sum or integral.
Thus, from Lebesgue integration theory, we have the following alternative notations:

EX := ∫_Ω X dP := ∫_Ω X(ω) dP(ω).

Further, for any event A ∈ F, we can define the expectation of X on A by:

E(X; A) := E(XIA) := ∫_A X(ω) dP(ω),

where, as usual, IA is the indicator random variable of the event A.


In the case where P(A) > 0 we can also define the expectation of X given A by:

E(X|A) := E(X; A) / P(A).

We remark that E(X|A) is the expectation of X with respect to the conditional probability measure
PA introduced in Chapter 2, i.e. with respect to the probability measure PA : F → R given by

PA(B) := P(A ∩ B) / P(A),  B ∈ F.

(In particular, for any B ∈ B, we have E(IB|A) = PA(B).) We may alternatively write

E(X|A) = E_{PA} X.
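
On a countable sample space these definitions are finite sums; a small sketch (not from the notes):

```python
def cond_expectation(X, P, A):
    """E(X | A) = E(X; A) / P(A) on a finite sample space with probability
    function P; X is a function on the sample points."""
    PA = sum(P[w] for w in A)
    return sum(X(w) * P[w] for w in A) / PA

P = {w: 1/6 for w in range(1, 7)}            # a fair die
A = {2, 4, 6}                                # the event "even outcome"
print(cond_expectation(lambda w: w, P, A))   # E(X | X even) = 4.0
```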

Exercise 3.14. Consider again the probability space and random variable X of Exercise 3.12.
Let A = {1, 3, 5, ...}, so that A^c = {2, 4, 6, ...}. Show that E(X|A) = 3a/(4 − a^2) and that
E(X|A^c) = 3a^2/(4 − a^2). Understand, if you can, why it should be immediately obvious that
E(X|A^c) = aE(X|A). Verify also that EX = P(A)E(X|A) + P(A^c)E(X|A^c) (this is an instance
of the partition rule as applied to expectation).

3.4.2 Properties of expectation


The following lemma gives some simple properties of expectation. Recalling the characterisation
of expectation as the Lebesgue integral, none of them is in the least surprising.

Lemma 3.4 (algebraic properties of expectation). Suppose X, Y are integrable random variables
on the same probability space (Ω, F, P). Then:
(i) If, for some c ∈ R, P(X = c) = 1, then EX = c.
(ii) If, for any event A, P(A) = 0, then EXIA = 0.
(iii) E(cX) = cE(X) for all c ∈ R. (Linearity 1)

(iv) E(X + Y ) = EX + EY . (Linearity 2)


(v) If P(X ≥ 0) = 1, then EX ≥ 0. (Positivity)
(vi) If P(X ≤ Y ) = 1, then EX ≤ EY . (Monotonicity)
(vii) |EX| ≤ E|X|. (Modulus inequality)

These basic algebraic properties of expectation are easily proved for simple random variables X
and Y using the definition (3.8). (For example, to prove (ii) observe that if X is simple with
the representation (3.7), then XIA is simple with the same representation except that each event
Ai is replaced by Ai ∩ A, the latter events all having probability 0.)
The extension of the proofs to general random variables X and Y necessarily involves limiting
arguments using simple random variables (as in the definition of the expectation of a general
random variable). This is part of the theory of Lebesgue integration.

Exercise 3.15. Prove the results (i)–(vi) above for simple random variables. Prove the result
(vii) for a general random variable X (assuming the earlier results).

Recall again that a real-valued random variable X on the probability space (Ω, F, P) induces a
probability measure PX on the space (R, B) given by (3.2). We now have the following important
result.

Theorem 3.2. For any measurable function ϕ : (R, B) → (R, B) such that the random variable
ϕ(X) is either nonnegative or integrable,

Eϕ(X) := ∫_Ω ϕ(X(ω)) P(dω) = ∫_R ϕ(x) PX(dx), (3.11)

i.e. Eϕ(X) is also the expectation of the function ϕ regarded as a random variable on the prob-
ability space (R, B, PX). In particular we have:
(i) if X has a discrete distribution, taking values in a set S, with probability function p (i.e.
p(x) = P(X = x), x ∈ S), then

Eϕ(X) = Σ_{x∈S} ϕ(x)p(x); (3.12)

(ii) if X has an absolutely continuous distribution with density function f, then

Eϕ(X) = ∫_R ϕ(x)f(x) dx. (3.13)

The expressions (3.12) and (3.13) are the well-known formulae for computing the expectation of
(a function of) a random variable in the discrete and absolutely continuous cases respectively.
A special case of the above theorem occurs when the random variable X is integrable and ϕ is
the identity function, and shows that EX depends only on the law, or distribution, of X.
There are also obvious extensions to the case where X is not necessarily integrable: for example,
if X has an absolutely continuous distribution function with density f, we have

EX = ∫_0^∞ x f(x) dx − ∫_{−∞}^0 (−x) f(x) dx,

provided that not both integrals are infinite.
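
Formula (3.13) is easily checked against a Monte Carlo estimate; a sketch (not from the notes) with X uniform on (0, 1) and ϕ(x) = x^2, where both sides should be near 1/3:

```python
import random

phi = lambda x: x * x

# E phi(X) as a Monte Carlo average over samples of X:
mc = sum(phi(random.random()) for _ in range(200_000)) / 200_000

# E phi(X) via (3.13), with f = 1 on (0, 1), by a midpoint rule:
grid = [(k + 0.5) / 10_000 for k in range(10_000)]
integral = sum(phi(x) * 1.0 for x in grid) / 10_000

print(mc, integral, 1 / 3)
```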


To prove Theorem 3.2 in the case where the distribution of X is simple is easy: it is necessary
to show that (3.12) holds, where S is the finite set of values that the random variable X may
take. Here X has the representation X = Σ_{x∈S} x I{X=x}, and so

ϕ(X) = Σ_{x∈S} ϕ(x) I{X=x}.

Thus

Eϕ(X) = Σ_{x∈S} ϕ(x)P(X = x) = Σ_{x∈S} ϕ(x)p(x)

as required. Again the extension to general random variables involves limiting arguments using
simple random variables.

Example 3.5. Suppose that the random variable U has a uniform distribution on (0, 1). Then,
for any a > 0, it follows from (3.13) that the expectation of the random variable U^a is given by

EU^a = ∫_0^1 u^a du = 1/(a + 1).

Define also the random variable Y = −λ log U. Then it again follows from (3.13) that the
expectation of the random variable Y is given by

EY = E(−λ log U) = ∫_0^1 (−λ log u) du = λ.

It also follows as in Section 3.3 that Y has an exponential distribution with mean λ, i.e. has
density function g given by g(y) = λ^{−1} e^{−y/λ} for y ≥ 0 and g(y) = 0 otherwise, so that the
expectation of Y may also be evaluated (again from (3.13), but using this time the density
function g of Y) as

EY = ∫_0^∞ λ^{−1} y e^{−y/λ} dy = λ.
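
Both computations in Example 3.5 are easy to confirm by simulation (a sketch, not from the notes):

```python
import math, random

N = 200_000
us = [1.0 - random.random() for _ in range(N)]   # uniform draws in (0, 1]

a = 3.0
print(sum(u**a for u in us) / N, 1 / (a + 1))    # E U^a = 1/(a+1)

lam = 2.0
ys = [-lam * math.log(u) for u in us]            # exponential with mean lam
print(sum(ys) / N, lam)                          # E(-lam log U) = lam
```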

Exercise 3.16. Suppose that the random variable N has a Poisson distribution with probability
function p given by

p(n) = P(N = n) = e^{−λ} λ^n / n!,  n = 0, 1, 2, ...,

for some λ > 0. Show that EN = λ and that EN^2 = λ^2 + λ.

Finally, we note the following famous results, which are directly related to the definition of
expectation, and which will be discussed and frequently used in subsequent chapters.

Theorem 3.3 (Monotone Convergence Theorem). Let Xn be any sequence of nonnegative
random variables such that Xn ↑ X (i.e. Xn(ω) ↑ X(ω) as n → ∞ for all ω ∈ Ω) for some
further random variable X. Then E|X − Xn| → 0 and, as a weak consequence, EXn ↑ EX.

Lemma 3.5 (Fatou’s Lemma). Let Xn be any sequence of nonnegative random variables. Then
E lim inf Xn ≤ lim inf EXn .

Theorem 3.4 (Dominated Convergence Theorem). Let Xn be any sequence of random variables
such that X(ω) := lim_{n→∞} Xn(ω) exists and such that, for some integrable random variable Y
(i.e. E|Y| < ∞), we have |Xn(ω)| ≤ Y(ω) for all n and ω. Then E|Xn − X| → 0 and, as a
weak consequence, EXn → EX.
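
A quick numerical illustration of monotone convergence (a sketch, not from the notes): the truncations Xn = min(X, n) are nonnegative and increase to X pointwise, so EXn increases to EX:

```python
import math, random

xs = [-math.log(1.0 - random.random()) for _ in range(200_000)]  # exponential, EX = 1
for n in (0.5, 1, 2, 4, 8):
    print(n, sum(min(x, n) for x in xs) / len(xs))  # increases towards 1.0
```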

References
[1] B. Fristedt & L. Gray, A Modern Approach to Probability Theory, Birkhäuser, 1997.

[2] E. Hewitt & K. Stromberg, Real and Abstract Analysis, Springer, 1965.

[3] D. Williams, Probability with Martingales, Cambridge, 1991.
