Probability Theory: Probability Spaces and Events
• The sample space for three coin flips is the set $\{0, 1\}^3$, where 0 represents heads
and 1 represents tails.
• The sample space for a random number between 0 and 1 is the interval [0, 1].
• “The sum of the values shown by the dice is greater than or equal to 7.”
From a formal point of view, events are usually defined to be certain subsets of the
sample space. Thus the event “both dice show even numbers” refers to the subset
{2, 4, 6} × {2, 4, 6}. Despite this, a specific event is more commonly described by a
statement than by a subset.
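As an illustrative sketch, not taken from the notes, the two dice events mentioned above can be written out explicitly as subsets of the two-dice sample space. The names `omega`, `prob`, `both_even` and `sum_at_least_7` are hypothetical. The sketch also assumes fair dice, so every ordered pair has probability 1/36.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two dice: all ordered pairs (d1, d2).
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event & omega), len(omega))

# "Both dice show even numbers" is the subset {2, 4, 6} x {2, 4, 6}.
evens = {2, 4, 6}
both_even = set(product(evens, evens))

# "The sum of the values shown by the dice is >= 7", written as a subset.
sum_at_least_7 = {(d1, d2) for (d1, d2) in omega if d1 + d2 >= 7}

print(prob(both_even))       # 1/4
print(prob(sum_at_least_7))  # 7/12
```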
In the special case of an experiment with finitely many outcomes, we can define
the probability of any subset of the sample space, and therefore every subset is an
event. In the general case, however, probability is a measure on the sample space,
and only measurable subsets of the sample space are events.
Figure 1: (a) Each infinite sequence of coin flips corresponds to a path down an
infinite binary tree. In this case, the sequence begins with 010. (b) The leaves of an
infinite binary tree form a Cantor set.
experiments. Now, imagine that we perform both experiments, recording the outcome
for each. The combined outcome for this experiment is an ordered pair $(\omega_1, \omega_2)$, where
$\omega_1 \in \Omega_1$ and $\omega_2 \in \Omega_2$. In fact, the combined experiment corresponds to a probability
space $(\Omega, \mathcal{E}, P)$, where:
• $P\colon \mathcal{E} \to [0, 1]$ is the product of the measures $P_1$ and $P_2$. That is, $P$ is the unique
measure with domain $\mathcal{E}$ satisfying $P(E_1 \times E_2) = P_1(E_1)\,P_2(E_2)$ for all $E_1 \in \mathcal{E}_1$
and $E_2 \in \mathcal{E}_2$.
For example, if we pick two random numbers between 0 and 1, the corresponding
sample space is the square [0, 1] × [0, 1], with the probability measure being two-
dimensional Lebesgue measure.
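A Monte Carlo sketch can make the product measure concrete. Drawing the two coordinates independently and uniformly samples from two-dimensional Lebesgue measure on the square, so the fraction of hits estimates the area of an event. The function `estimate`, the trial count and the seed are illustrative choices, not anything from the notes.

```python
import random

def estimate(event, trials=200_000, seed=0):
    """Monte Carlo estimate of P(event) for a point drawn uniformly from [0, 1] x [0, 1].

    Drawing the two coordinates independently is what makes the joint
    distribution the product measure (two-dimensional Lebesgue measure).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if event(x, y):
            hits += 1
    return hits / trials

# A "rectangle" event E1 x E2: the product rule predicts (1/2) * (1/3) = 1/6.
rect = estimate(lambda x, y: x <= 0.5 and y <= 1 / 3)
# A non-rectangular event: the triangle {x + y <= 1} has area 1/2.
tri = estimate(lambda x, y: x + y <= 1)
print(rect, tri)
```

For the rectangle, the estimate should be close to the product $P_1(E_1)\,P_2(E_2) = 1/6$. For the triangle, it should be close to the area 1/2.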
the product topology, the sample space $\Omega = \{0, 1\}^{\mathbb{N}}$ is homeomorphic to the standard
middle-thirds Cantor set.
It is not too hard to put a measure on Ω. Given a finite sequence $b_1, \ldots, b_n$ of 0's
and 1's, let $B(b_1, \ldots, b_n)$ be the set of outcomes whose first $n$ flips are $b_1, \ldots, b_n$, and
define
$$P_0\big(B(b_1, \ldots, b_n)\big) = \frac{1}{2^n}.$$
Let $\mathcal{B}$ be the collection of all such sets, and let
$$P^*(E) = \inf\left\{ \sum_{n} P_0(B_n) \;\middle|\; B_1, B_2, \ldots \in \mathcal{B} \text{ and } E \subset \bigcup_{n} B_n \right\}$$
for every $E \subset \Omega$. Then $P^*$ is an outer measure on Ω, and the resulting measure $P$ is
a probability measure.
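The cylinder-set assignment $P_0$ can be checked at the finite level with a short sketch that assumes nothing beyond the definition above. `p0` returns $1/2^n$ for a prefix of length $n$. The assertion checks that splitting $B(b_1, \ldots, b_n)$ according to the next flip preserves $P_0$. The helper `empirical` is hypothetical and estimates the same probability by simulating fair coin flips. The sketch does not attempt the outer-measure construction itself.

```python
import random
from fractions import Fraction

def p0(prefix):
    """P0(B(b1, ..., bn)) = 1 / 2**n for a finite 0/1 prefix (b1, ..., bn)."""
    assert all(b in (0, 1) for b in prefix)
    return Fraction(1, 2 ** len(prefix))

# B(b1..bn) is the disjoint union of B(b1..bn, 0) and B(b1..bn, 1),
# and P0 is consistent with that splitting.
prefix = (0, 1, 0)
assert p0(prefix) == p0(prefix + (0,)) + p0(prefix + (1,))

def empirical(prefix, trials=100_000, seed=1):
    """Fraction of simulated flip sequences whose first n flips equal prefix.

    Only the first n flips are generated, since membership in B(prefix)
    depends on nothing else.
    """
    rng = random.Random(seed)
    n = len(prefix)
    hits = 0
    for _ in range(trials):
        flips = tuple(rng.randint(0, 1) for _ in range(n))
        if flips == tuple(prefix):
            hits += 1
    return hits / trials

print(p0(prefix), empirical(prefix))  # exact value 1/8, then a simulated estimate
```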
The mechanism described above for putting a measure on $\{0, 1\}^{\mathbb{N}}$ can be modified
to put a measure on $\Omega^{\mathbb{N}}$ for any probability space $(\Omega, \mathcal{E}, P)$. For example, it is possible
to talk about an experiment in which we roll an infinite sequence of dice, or pick an
infinite sequence of random numbers between 0 and 1, and for each of these there is
a corresponding probability space.
Random Variables
A random variable is a quantity whose value is determined by the results of a
random experiment. For example, if we roll two dice, then the sum of the values of
the dice is a random variable.
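To make the definition concrete, here is a small sketch in which the sum of two dice is literally a function `S` on the 36-element sample space. Its distribution is then obtained by counting outcomes. The sketch assumes fair dice, and the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of die values, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))

def S(w):
    """The random variable 'sum of the dice', a function on the sample space."""
    return w[0] + w[1]

# Distribution of S: P(S = s) = (number of outcomes with S(w) = s) / 36.
counts = Counter(S(w) for w in omega)
dist = {s: Fraction(c, len(omega)) for s, c in sorted(counts.items())}
print(dist[7])  # 1/6
```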
In general, a random variable may take values from any set S.
• If we pick a random number between 0 and 1, then the number X that we pick
is a random variable Ω → [0, 1].
• For an infinite number of coin flips, we can define a sequence $C_1, C_2, \ldots$ of
random variables $\Omega \to \{0, 1\}$ by
$$C_n = \begin{cases} 0 & \text{if the } n\text{th flip is heads,} \\ 1 & \text{if the } n\text{th flip is tails.} \end{cases}$$
We end with a useful formula for integrating with respect to a probability distribution.
This is essentially just a restatement of the formula for the Lebesgue integral
with respect to a pushforward measure.
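The proposition itself is not reproduced in this section. Assuming it has the usual change-of-variables form $E[g(X)] = \int_S g\,dP_X$, the following sketch checks that form exactly for a discrete example. It uses the sum $X$ of two fair dice and the arbitrarily chosen function $g(s) = s^2$. The left side sums over the sample space, and the right side sums over the distribution $P_X$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice, uniform measure
P = Fraction(1, len(omega))                    # probability of each outcome

def X(w):
    return w[0] + w[1]  # sum of the dice

def g(s):
    return s * s        # an arbitrary function of the value of X

# Left side: integrate g(X) over the sample space.
lhs = sum(g(X(w)) * P for w in omega)

# Right side: integrate g against the distribution P_X (the pushforward measure).
P_X = {s: Fraction(c, len(omega)) for s, c in Counter(X(w) for w in omega).items()}
rhs = sum(g(s) * p for s, p in P_X.items())

print(lhs == rhs, lhs)
```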
for every measurable set $T \subset \mathbb{R}$, where $m$ denotes Lebesgue measure. In this case,
the function $f_X$ is called a probability density function for $X$.
In this case, $X$ is said to have the standard normal distribution. A graph of the
function $f_X$ is shown in Figure 2a.
For such an $X$, the probability $P_X(T)$ that the value of $X$ lies in any set $T$ is given
by the formula
$$P_X(T) = \int_T f_X\,dm.$$
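The definition of $f_X$ is not reproduced in this section. Assuming the standard normal density $f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, the formula for $P_X(T)$ with $T$ an interval $(a, b)$ can be evaluated numerically. This sketch compares a midpoint-rule integral with the closed form obtained from `math.erf`. The helper names and the step count are illustrative.

```python
import math

def f_X(x):
    """Standard normal density (1 / sqrt(2*pi)) * exp(-x**2 / 2)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def prob_interval(a, b, n=10_000):
    """Approximate P_X((a, b)) = integral of f_X over (a, b) by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f_X(a + (k + 0.5) * h) for k in range(n))

def Phi(t):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def prob_interval_exact(a, b):
    """Same probability in closed form: Phi(b) - Phi(a)."""
    return Phi(b) - Phi(a)

print(prob_interval(-1, 2), prob_interval_exact(-1, 2))
```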
Figure 2: (a) The probability density $f_X$ for a standard normal distribution. (b) Area
under the graph of $f_X$ on the interval $(a, b)$, labelled $P(a < X < b)$.
PROOF This is related to the Lebesgue differentiation theorem, though it does not
follow from it immediately.
Figure 3: Probability density for $X^2$, where $X$ is chosen uniformly from [0, 1].
EXAMPLE 10 Density of $X^2$
Let $X\colon \Omega \to [0, 1]$ be uniformly distributed, and let $Y = X^2$. Then for any interval
$[a, b] \subset [0, 1]$, we have
$$Y \in [a, b] \iff X \in \left[\sqrt{a}, \sqrt{b}\right],$$
so
$$P_Y\big([a, b]\big) = P_X\big(\left[\sqrt{a}, \sqrt{b}\right]\big) = \sqrt{b} - \sqrt{a}.$$
Therefore,
$$f_Y(x) = \lim_{h \to 0^+} \frac{P_Y\big([x-h, x+h]\big)}{2h} = \lim_{h \to 0^+} \frac{\sqrt{x+h} - \sqrt{x-h}}{2h} = \frac{1}{2\sqrt{x}}.$$
A plot of this function is shown in Figure 3. Note that the total area under the graph
of the function is 1, which proves that $Y$ is continuous, and that $f_Y$ is a probability
density function for $Y$.
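Here is a quick numerical sanity check of Example 10, offered as an illustrative sketch. Simulating $Y = X^2$ with $X$ uniform should give empirical interval probabilities close to $\sqrt{b} - \sqrt{a}$. The symmetric difference quotient used in the limit should approach $1/(2\sqrt{x})$. The seed, sample size and test points are arbitrary choices.

```python
import math
import random

rng = random.Random(2)
samples = [rng.random() ** 2 for _ in range(200_000)]  # Y = X**2 with X uniform on [0, 1)

def empirical_prob(a, b):
    """Fraction of simulated Y values in [a, b]."""
    return sum(a <= y <= b for y in samples) / len(samples)

a, b = 0.1, 0.4
print(empirical_prob(a, b), math.sqrt(b) - math.sqrt(a))  # both should be near 0.316

# The symmetric difference quotient from the example approaches 1 / (2 sqrt(x)).
x, h = 0.25, 1e-6
quotient = (math.sqrt(x + h) - math.sqrt(x - h)) / (2 * h)
print(quotient, 1 / (2 * math.sqrt(x)))
```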
Expected Value
The expected value is sometimes called the average value or mean of X, and is
also denoted $\overline{X}$ or $\langle X \rangle$.
For example, if X is a random number from [0, 1] with uniform distribution, then
$$EX = \int_{[0,1]} x\,dm(x) = \frac{1}{2}.$$
For another example, consider the random variable $Y = X^2$. The probability density
function for $Y$ is $f_Y(x) = 1/(2\sqrt{x})$ (see Example 10), so
$$EY = \int_{[0,1]} x\,\frac{1}{2\sqrt{x}}\,dm(x) = \frac{1}{3}.$$
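Both expected values can be checked numerically by integrating against the densities. This sketch uses a simple midpoint rule through a hypothetical helper, `integrate`. The last integral computes $E[X^2]$ against the uniform density, and it should agree with $EY$ because $Y = X^2$.

```python
import math

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# EX for X uniform on [0, 1]: the density is 1, so EX is the integral of x.
EX = integrate(lambda x: x, 0.0, 1.0)
# EY for Y = X**2, using the density f_Y(x) = 1 / (2 sqrt(x)) from Example 10.
EY = integrate(lambda x: x / (2 * math.sqrt(x)), 0.0, 1.0)
# Cross-check: E[X**2] computed against the uniform density should match EY.
EX2 = integrate(lambda x: x * x, 0.0, 1.0)
print(EX, EY, EX2)
```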
Note that both of these examples involve positive random variables. For a general
variable X : Ω → R, it is also possible for EX to be entirely undefined.
The following proposition lists some basic properties of expected value. These
follow directly from the corresponding properties for the Lebesgue integral:
1. If C ∈ R is constant, then EC = C.
3. E[X + Y ] = EX + EY .
4. |EX| ≤ E|X|.
Though the variance has nicer theoretical properties, the standard deviation is easier
to interpret, since it is measured in the same units as X. For example, if X is a
length measured in feet, then the standard deviation of X is also measured
in feet, while the variance is measured in square feet. The standard deviation in the
length of an adult blue whale is about 10 feet, meaning that most whales are about
70–90 feet long.
To give you a feel for these quantities, Figure 4a shows several normal distributions
with different standard deviations. The standard normal distribution has standard
deviation 1, and normal distributions with standard deviations of 0.5 and 2 have also
been plotted. As a general rule, a normally distributed random variable has roughly
a 68.2% probability of being within one standard deviation of the mean, as shown
in Figure 4b. In particular,
$$\int_{-1}^{1} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}x^2\right) dx \approx 0.682689.$$

Figure 4: (a) Normal distributions with standard deviations of 0.5, 1, and 2. (b) A
normally distributed variable has a 68.2% chance of being within one standard
deviation of the mean.
For example, if we assume that the length of a random adult blue whale is normally
distributed with a mean of 80 feet and a standard deviation of 10 feet, then
approximately 68% of blue whales will be between 70 and 90 feet long.
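The 68.2% figure and the whale example both reduce to the normal cumulative distribution function, which can be written with `math.erf` from the Python standard library. This is a sketch: `normal_cdf` is a hypothetical helper, and the whale parameters are just the modelling assumption stated above.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Probability of lying within one standard deviation of the mean.
within_one_sd = normal_cdf(1) - normal_cdf(-1)
print(round(within_one_sd, 6))  # 0.682689

# Whale example: mean 80 ft, standard deviation 10 ft (the assumption in the text).
whales = normal_cdf(90, mu=80, sigma=10) - normal_cdf(70, mu=80, sigma=10)
print(whales)
```

Because the interval 70 to 90 feet is exactly one standard deviation on either side of the mean, the two probabilities agree.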
As with expected value, we can compute the variance of a random variable directly
from the distribution:
For example, if X is a random number from [0, 1] with uniform distribution, then
EX = 1/2, so
$$\operatorname{Var}(X) = \int_{[0,1]} \left(x - \frac{1}{2}\right)^2 dm(x) = \frac{1}{12}.$$
Taking the square root yields a standard deviation of approximately 0.29.
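As a sketch, the exact value 1/12 and its square root can be compared with a simulation. The seed and sample size are arbitrary, and the sample variance here divides by the number of samples.

```python
import math
import random

# Exact: Var(X) = integral over [0, 1] of (x - 1/2)**2 dx = 1/12.
var_exact = 1 / 12
sd_exact = math.sqrt(var_exact)  # about 0.2887

# Simulation check with uniformly distributed samples.
rng = random.Random(3)
xs = [rng.random() for _ in range(200_000)]
mean = sum(xs) / len(xs)
var_est = sum((x - mean) ** 2 for x in xs) / len(xs)
print(sd_exact, math.sqrt(var_est))
```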
Such a variable has mean µ and standard deviation σ, as can be seen from the
following integral formulas:
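$$\int_{\mathbb{R}} x\,\frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) dm(x) = \mu.$$
$$\int_{\mathbb{R}} (x-\mu)^2\,\frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) dm(x) = \sigma^2.$$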
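The two integral formulas can be checked numerically for particular values of µ and σ. The sketch below assumes µ = 1.5 and σ = 0.7, both arbitrary choices. It integrates over a window of 12 standard deviations on each side of the mean, since the mass outside that window is negligible at double precision.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density (1 / (sigma sqrt(2 pi))) exp(-((x - mu) / sigma)**2 / 2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

mu, sigma = 1.5, 0.7                       # arbitrary example parameters
lo, hi = mu - 12 * sigma, mu + 12 * sigma  # tails beyond this window are negligible
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
var = integrate(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), lo, hi)
print(mean, var, sigma ** 2)
```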
But
$$\operatorname{Var}(X) = \int_{\mathbb{R}} \frac{Cx^2}{1+|x|^3}\,dm(x) = \infty.$$
From a measure-theoretic point of view, this random variable is an $L^1$ function on
the probability space Ω, but it is not $L^2$.
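The density of this X is not stated here. The integrand suggests $f_X(x) = C/(1+|x|^3)$, and the sketch below assumes that form. It takes C to be the normalizing constant $3\sqrt{3}/(4\pi)$, which follows from $\int_0^\infty dx/(1+x^3) = 2\pi/(3\sqrt{3})$. The sketch computes the second moment truncated to $[-R, R]$. By the antiderivative $\frac{1}{3}\ln(1+x^3)$, that truncated moment equals $\frac{2C}{3}\ln(1+R^3)$, so it grows without bound as R increases.

```python
import math

# Assumed density f(x) = C / (1 + |x|**3); C makes the total integral equal to 1.
C = 3 * math.sqrt(3) / (4 * math.pi)

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

for R in (10, 100, 1000):
    # Truncated second moment over [-R, R]; by symmetry, twice the integral over [0, R].
    second = 2 * integrate(lambda x: C * x * x / (1 + x ** 3), 0, R)
    closed_form = (2 * C / 3) * math.log(1 + R ** 3)
    print(R, second, closed_form)
```

The truncated values grow roughly like $2C \ln R$, which is the numerical signature of $E[X^2] = \infty$.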
It follows from this formula that Var(X) is finite if and only if $E[X^2] < \infty$, i.e., if
and only if $X \in L^2(\Omega)$.
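Assuming the formula referred to is the standard identity $\operatorname{Var}(X) = E[X^2] - (EX)^2$, here is a quick exact check using `fractions.Fraction`. The example is a single fair die, chosen only for illustration.

```python
from fractions import Fraction

# One fair die: values 1..6, each with probability 1/6.
values = range(1, 7)
p = Fraction(1, 6)

EX = sum(x * p for x in values)
EX2 = sum(x * x * p for x in values)
var_direct = sum((x - EX) ** 2 * p for x in values)  # E[(X - EX)^2]
var_identity = EX2 - EX ** 2                          # E[X^2] - (EX)^2

print(var_direct, var_identity)  # 35/12 35/12
```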
Exercises
1. Let X : Ω → (0, 1] have the uniform distribution, and let Y = 1/X.
a) Find the probability density function fY : [1, ∞) → [0, ∞] for Y .
2. Let X : Ω → [0, 1] have the uniform distribution, and let Y = sin(8X). Use the
integration formula for distributions (Proposition 1) to compute EY .
3. Let X and Y be the values of two die rolls, and let Z = max(X, Y ).
a) Find PZ ({n}) for n ∈ {1, 2, 3, 4, 5, 6}.
b) Determine EZ, and find the standard deviation for Z.