STA347
Final Preparation
Yuchen Wang
Contents
1 Experiments, Events and Sample Spaces
4 Inclusion-Exclusion Formula
5 Conditional Probability
6 Independence
7 Bayes Theorem
8 Random Variables
  8.1 Examples of Random Variables
  8.2 Cumulative Distribution Function
  8.3 Multivariate Distributions
    8.3.1 Bivariate Distributions
    8.3.2 Marginal Distributions
    8.3.3 Conditional Distributions
    8.3.4 Multivariate Distributions
  8.4 Functions of Random Variables
  8.5 Expectation
  8.6 Moments
9 Inequalities
10 Conditional Expectation
12 Stochastic process
  12.1 Random Walk
  12.2 Poisson Process
  12.3 Reflection principle (Wiener process)
13 Mode of Convergence
  13.1 L1 Convergence
  13.2 Almost Sure Convergence
  13.3 Convergence in distribution
1 Experiments, Events and Sample Spaces

• Experiment: Any process, real or hypothetical, in which the possible outcomes can be identified ahead of time.
Definition 1.2 (countably infinite). A set is countably infinite if its elements can be put in one-to-one
correspondence with the set of natural numbers.
Definition 1.3 (At most countable sets). A set that is either finite or countably infinite is called an at most
countable set.
Theorem 1.1. Suppose E, E_1, E_2, . . . are events. The following are also events:
1. E^c
2. E_1 ∪ E_2 ∪ · · · ∪ E_n
3. ∪_{i=1}^∞ E_i
Definition 2.1 (σ-field). A collection F of subsets of a sample space S is called a σ-field if
1. S ∈ F
2. E ∈ F implies E^c ∈ F
3. E_1, E_2, . . . ∈ F implies ∪_{i=1}^∞ E_i ∈ F
Remark 2.1. A σ-field refers to the collection of subsets of a sample space that we should use in order to
establish a mathematically formal definition of probability. The sets in the σ-field constitute the events from
our sample space.
Axiom 2.1 (Axioms of Probability). Let S be a sample space, and let F be a σ-field of S. A function P on F satisfies the axioms of probability if
• Axiom 1: P(E) ≥ 0 for every event E ∈ F
• Axiom 2: P(S) = 1
• Axiom 3: For any sequence of disjoint events E_1, E_2, . . ., P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i)
Definition 2.2 (probability). Any function P on a sample space S satisfying Axioms 1-3 is called a probability.
Any probability P satisfies the following properties:
1. P(∅) = 0
3. P (Ac ) = 1 − P (A)
5. 0 ≤ P (A) ≤ 1
6. P (A − B) = P (A) − P (A ∩ B)
7. P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
3 Classical Equal Probability and Combinatorics

Theorem 3.2.
    C_{n,k} = \binom{n}{k} = n! / (k! (n − k)!) = P_{n,k} / k!
Theorem 3.3 (Binomial coefficients).
    (x + y)^n = Σ_{k=0}^{n} \binom{n}{k} x^k y^{n−k}
Theorem 3.4 (Newton Expansion). For |z| < 1, the term (1 + z)^r can be expanded as
    (1 + z)^r = Σ_{k=0}^{∞} \binom{r}{k} z^k
Theorem 3.5.
    \binom{r}{k} = r(r − 1) · · · (r − k + 1) / k! = Γ(r + 1) / (Γ(r − k + 1) Γ(k + 1))
with Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx.
Theorem 3.6. For any numbers x_1, . . . , x_k and non-negative integer n,
    (x_1 + · · · + x_k)^n = Σ_{n_1 + · · · + n_k = n} \binom{n}{n_1, . . . , n_k} x_1^{n_1} · · · x_k^{n_k}
It is easy to see that
    \binom{n}{n_1, . . . , n_k} = \binom{n}{n_1} \binom{n_2 + · · · + n_k}{n_2} \binom{n_3 + · · · + n_k}{n_3} · · · \binom{n_k}{n_k} = n! / (n_1! · · · n_k!)    (1)
Theorem 3.7 (Stirling’s formula).
    lim_{n→∞} { log(n!) − [ (1/2) log(2π) + (n + 1/2) log(n) − n ] } = 0
4 Inclusion-Exclusion Formula
For any n events A_1, . . . , A_n,
    P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) + · · · + (−1)^{n−1} P(A_1 ∩ · · · ∩ A_n)    (2)
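As a quick numerical check, here is a minimal Python sketch (standard library only; the three events are arbitrary illustrations) verifying that both sides of (2) agree on a small equally likely sample space:

    from itertools import combinations
    from fractions import Fraction

    # Equally likely sample space {1, ..., 12} and three arbitrary illustrative events.
    S = set(range(1, 13))
    A = [{n for n in S if n % 2 == 0},    # multiples of 2
         {n for n in S if n % 3 == 0},    # multiples of 3
         {n for n in S if n <= 4}]        # {1, 2, 3, 4}

    P = lambda E: Fraction(len(E), len(S))    # classical equal-probability measure

    # Right-hand side of the inclusion-exclusion formula (2).
    rhs = sum((-1) ** (r - 1) *
              sum(P(set.intersection(*c)) for c in combinations(A, r))
              for r in range(1, len(A) + 1))

    assert rhs == P(set.union(*A))    # both sides equal P(A1 ∪ A2 ∪ A3) = 3/4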
5 Conditional Probability
Definition 5.1 (conditional probability). When P (B) > 0, the conditional probability of an event A given
B is defined by
P (A|B) = P (A ∩ B)/P (B)
Theorem 5.1. If P (B) > 0, then P (A ∩ B) = P (A|B)P (B).
Theorem 5.2. Let A1 , . . . , An be events with P (A1 ∩ . . . ∩ An ) > 0. Then
P (A1 ∩ · · · ∩ An ) = P (A1 ) P (A2 |A1 ) P (A3 |A1 , A2 ) · · · P (An |A1 , . . . , An−1 ) (3)
6 Independence
Definition 6.1 (independence). Two events A and B are independent if and only if
    P(A ∩ B) = P(A)P(B).
A collection of events {A_i}_{i∈I} are said to be (mutually) independent if
    P(∩_{i∈J} A_i) = Π_{i∈J} P(A_i)
for any ∅ ≠ J ⊂ I.
A collection of events {Ai }i∈I are said to be pair-wise independent if
P (Ai ∩ Aj ) = P (Ai )P (Aj )
for i ≠ j ∈ I.
Theorem 6.1. Two events A and B are independent if and only if A and B c are independent.
Definition 6.2 (conditional independence). Two events A and B are conditionally independent given C
if
P (A ∩ B|C) = P (A|C)P (B|C)
7 Bayes Theorem
Definition 7.1. A collection of sets B1 , . . . , Bk is called a partition of A if and only if B1 , . . . , Bk are disjoint
and A = ∪ki=1 Bi .
Theorem 7.1 (Law of total probability). Let events B1 , . . . , Bk be a partition of S with P (Bj ) > 0 for all
j = 1, . . . , k. For any event A,
    P(A) = Σ_{j=1}^{k} P(B_j) P(A|B_j)
Combining this with the definition of conditional probability gives Bayes’ theorem: for events A and B with P(A) > 0 and 0 < P(B) < 1,
    P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)]
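For concreteness, a minimal Python sketch applying the formula above to a diagnostic-test setting; the prevalence, sensitivity and false-positive rate are made-up illustrative numbers:

    from fractions import Fraction

    # Hypothetical numbers, chosen only to illustrate the formula above.
    P_B    = Fraction(1, 100)    # prior P(B): prevalence of the condition
    P_A_B  = Fraction(95, 100)   # P(A|B): probability of a positive test given the condition
    P_A_Bc = Fraction(5, 100)    # P(A|B^c): false-positive rate

    # Bayes' theorem: P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)]
    posterior = (P_A_B * P_B) / (P_A_B * P_B + P_A_Bc * (1 - P_B))
    print(posterior)    # 19/118, roughly 0.161: a positive test still leaves P(B|A) fairly small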
8 Random Variables
Definition 8.1. A real-valued function X on the sample space S is called a random variable if the probability
of X is well-defined, that is, {s ∈ S : X(s) ≤ r} is an event for each r ∈ R.
Definition 8.2 (Borel sets in R). The collection of all Borel sets B in R is the smallest collection that contains all intervals and is closed under complements and countable unions.
Definition 8.3 (Probability of a random variable). For any Borel set B in R, an event X ∈ B is defined as {s ∈ S : X(s) ∈ B} and often denoted by {X ∈ B} or (X ∈ B). The corresponding probability is
    P(X ∈ B) = P({s ∈ S : X(s) ∈ B})
Lemma 8.1. If |X(S)| < ∞ and (X = r) is an event for any r ∈ X(S), then X is a random variable.
Definition 8.4 (distribution). The distribution of X is the collection of all probabilities of all events induced
by X, that is, (B, P (X ∈ B)). Two random variables X and Y are said to be identically distributed if they
have the same distribution.
Remark 8.1. To show that X and Y have the same distribution, we need to check that P(X ∈ B) = P(Y ∈ B) for every event B on R. Since all Borel sets on R are induced by intervals, it is enough to prove
    P(a < X ≤ b) = P(a < Y ≤ b)
for any a < b ∈ R. In fact, P(X ≤ a) = P(Y ≤ a) for every a ∈ R already guarantees that X and Y are identically distributed.
Theorem 8.3. Let X(S) = {x_1, x_2, . . .} be the set of possible values of a discrete random variable X. Then for any subset A of R,
    P(X ∈ A) = Σ_{x∈A} P(X = x) = Σ_{x∈A} pmf_X(x)
Definition 8.7 (absolute continuity and probability density function). A random variable X is said to be absolutely continuous if the probability of each interval (a, b] is of the form
    P(a < X ≤ b) = ∫_a^b f(x) dx
where a < b ∈ R and f is a non-negative function on R. Such a function f is called a probability density function (pdf) of X.
Theorem 8.4. Let X be a continuous random variable. Then
    pdf_X(x) = d/dx P(X ≤ x)
Definition 8.10 (binomial). A random variable X is called a binomial random variable if it has the same distribution as Z, the number of successes in n independent trials with success probability p, and is denoted by X ∼ binomial(n, p).
The probability mass function of X ∼ binomial(n, p) is
    pmf_X(x) = \binom{n}{x} p^x (1 − p)^{n−x}  if x = 0, 1, . . . , n,  and 0 otherwise.
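A minimal Python sketch (standard library only, illustrative parameters n = 10 and p = 0.3) evaluating this pmf with math.comb and checking that it sums to 1:

    from math import comb

    def binomial_pmf(x, n, p):
        # pmf of binomial(n, p) as above: C(n, x) p^x (1 - p)^(n - x) for x = 0, ..., n
        return comb(n, x) * p**x * (1 - p)**(n - x) if 0 <= x <= n else 0.0

    n, p = 10, 0.3
    pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
    print(sum(pmf))               # approximately 1.0
    print(binomial_pmf(3, n, p))  # P(X = 3) is about 0.2668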
Definition 8.11 (continuous uniform). A random variable X defined on (a, b) for finite real numbers a < b satisfying P(c < X ≤ d) = (d − c)/(b − a) for any c, d such that a ≤ c ≤ d ≤ b is called a uniform random variable on (a, b), denoted by X ∼ uniform(a, b). The probability density function of X ∼ uniform(a, b) is
    pdf_X(x) = 1/(b − a)  if a < x < b,  and 0 otherwise.
Definition 8.12 (geometric). Consider a sequence of independent Bernoulli trials with success probability p. The number of trials until the first success is said to follow a geometric distribution with parameter p, denoted by geometric(p). The geometric random variable X ∼ geometric(p) has probability mass function
    pmf_X(n) = (1 − p)^{n−1} p
for n ∈ N.
Definition 8.13 (negative binomial). Consider a sequence of independent Bernoulli trials with success probability p. The number of trials until the k-th success is said to follow a negative binomial distribution with parameters k and p, denoted by neg-bin(k, p). The negative binomial random variable X ∼ neg-bin(k, p) has probability mass function
    pmf_X(n) = \binom{n − 1}{k − 1} (1 − p)^{n−k} p^k
for n ∈ N such that n ≥ k.
Definition 8.14 (hypergeometric). Consider a jar containing n balls, of which r are black and the remaining n − r are white. The random variable X is the number of black balls when m balls are drawn without replacement. The probability that k black balls are drawn is
    pmf_X(k) = \binom{r}{k} \binom{n − r}{m − k} / \binom{n}{m}  if k = 0, . . . , min(r, m),  and 0 otherwise.
Definition 8.15 (zeta/zipf). A positive integer valued random variable X follows a Zeta or Zipf distribution
if
    pmf_X(n) = n^{−s} / ζ(s)
for n = 1, 2, . . . and s > 1, where ζ(s) = Σ_{n=1}^{∞} n^{−s}.
Definition 8.16 (Poisson). A Poisson distribution with parameter µ > 0 has the probability mass function
    pmf_X(n) = e^{−µ} µ^n / n!
for non-negative integer n.
Definition 8.17 (Exponential). A continuous random variable W having the probability density
    pdf_W(w) = λ e^{−λw}  for w > 0,  and 0 otherwise,
is distributed from an exponential distribution with parameter λ > 0, which is denoted by W ∼ exponential(λ).
Theorem 8.8. If a real function F satisfies (a)-(c) in the above properties, then it is a distribution function of
a random variable.
Definition 8.18 (p-quantile). The p-quantile of a random variable X is x such that P (X ≤ x) ≥ p and
P (X ≥ x) ≥ 1 − p.
Definition 8.19. The median, lower quartile and upper quartile are the 0.5-, 0.25- and 0.75-quantiles, respectively. The interquartile range (IQR) is the difference between the upper and lower quartiles.
Definition 8.21. Two random variables X and Y are jointly continuously distributed if and only if there exists
a non-negative function f such that for any Borel set B in R2
    P((X, Y) ∈ B) = ∬_B f(x, y) dx dy
Theorem 8.9 (Properties of joint density functions). A joint density function satisfies
1. pdf_{X,Y}(x, y) ≥ 0
2. ∬ pdf_{X,Y}(x, y) dx dy = 1
Definition 8.22 (joint cumulative distribution function). The joint cumulative distribution function of X and Y is
    cdf_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)
Definition 8.23. When X and Y are discrete, then the joint probability mass function of X and Y is
defined by
pmfX,Y (x, y) = P (X = x, Y = y)
The joint probability mass function satisfies
1. pmf_{X,Y}(x, y) ≥ 0
2. Σ_{x,y} pmf_{X,Y}(x, y) = 1
The marginal probability density function of X is obtained by integrating out y:
    pdf_X(x) = ∫ pdf_{X,Y}(x, y) dy
Definition 8.24. Two random variables X and Y are independent if and only if
    P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B)
for all Borel sets A and B in R.
Theorem 8.13. If two random variables X and Y are independent, then the following hold if the functions
exist.
1. cdfX,Y (x, y) = cdfX (x) × cdfY (y) for all x, y
Definition 8.27. Let X1 , . . . , Xn be random variables. Marginal cumulative distribution, probability mass,
probability density functions of X1 , . . . , Xi−1 , Xi+1 , . . . , Xn are
    cdf_{X_1,...,X_{i−1},X_{i+1},...,X_n}(x_1, . . . , x_{i−1}, x_{i+1}, . . . , x_n) = lim_{x_i→∞} cdf_{X_1,...,X_n}(x_1, . . . , x_n)    (4)
    pmf_{X_1,...,X_{i−1},X_{i+1},...,X_n}(x_1, . . . , x_{i−1}, x_{i+1}, . . . , x_n) = Σ_{x_i} pmf_{X_1,...,X_n}(x_1, . . . , x_n)    (5)
    pdf_{X_1,...,X_{i−1},X_{i+1},...,X_n}(x_1, . . . , x_{i−1}, x_{i+1}, . . . , x_n) = ∫ pdf_{X_1,...,X_n}(x_1, . . . , x_n) dx_i    (6)
Theorem 8.19. Let X be a continuous random variable and Y = g(X) be a transformed random variable, where g is an appropriate transformation (e.g., continuous and increasing). The cdf of Y is
    cdf_Y(y) = ∫_{x : g(x) ≤ y} pdf_X(x) dx
Theorem 8.21 (change of variable). Let X be a continuous random variable and g be a one-to-one and
differentiable function. Then the density of random variable Y = g(X) is
    pdf_Y(y) = pdf_X(g^{−1}(y)) |d/dy g^{−1}(y)|
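A minimal Monte Carlo sketch in Python (standard library only) checking Theorem 8.21 in one assumed example: X ∼ uniform(0, 1) and Y = e^X, for which the theorem gives pdf_Y(y) = 1/y on (1, e) and hence P(Y ≤ c) = log c:

    import math, random

    random.seed(0)
    n = 200_000
    y = [math.exp(random.random()) for _ in range(n)]   # Y = g(X) = e^X with X ~ uniform(0, 1)

    # Theorem 8.21 with g(x) = e^x: pdf_Y(y) = pdf_X(log y) * |d/dy log y| = 1/y on (1, e),
    # so P(Y <= c) = log(c) for 1 <= c <= e.
    for c in (1.5, 2.0, 2.5):
        empirical = sum(v <= c for v in y) / n
        print(c, round(empirical, 3), round(math.log(c), 3))   # the two columns should roughly agree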
Theorem 8.22. Consider discrete random variables X_1, . . . , X_n and let g_1, . . . , g_m be functions with Y_i = g_i(X_1, . . . , X_n). The joint probability mass function of Y = (Y_1, . . . , Y_m) is
    pmf_Y(y) = Σ_{x : g_i(x) = y_i, i=1,...,m} pmf_X(x)
Definition 8.29. Random variables X1 , . . . , Xn are said to be independent and identically distributed
(i.i.d) if all random variables have the same distribution and are independent.
Theorem 8.23. Let X and Y be jointly continuous random variables. The density of Z = X + Y is
    pdf_Z(z) = ∫ pdf_{X,Y}(x, z − x) dx
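A minimal simulation sketch (standard library Python; the independent uniform(0, 1) summands are an arbitrary illustration) comparing a histogram of Z = X + Y with the density obtained from this formula, which here is the triangular density z on (0, 1] and 2 − z on (1, 2):

    import random

    random.seed(1)
    n = 200_000
    z = [random.random() + random.random() for _ in range(n)]   # Z = X + Y, X, Y ~ uniform(0,1)

    def triangular_pdf(t):
        # pdf_Z(t) = ∫ pdf_X(x) pdf_Y(t - x) dx = t on (0, 1] and 2 - t on (1, 2)
        return t if 0 < t <= 1 else (2 - t if t < 2 else 0.0)

    width = 0.25
    for left in (0.0, 0.5, 1.0, 1.5):
        empirical = sum(left < v <= left + width for v in z) / (n * width)   # histogram height
        mid = left + width / 2
        print(mid, round(empirical, 3), triangular_pdf(mid))   # should roughly agree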
Theorem 8.24 (change of variable). Suppose X_1, . . . , X_n have a joint density function f(x_1, . . . , x_n) and Y_i = g_i(X_1, . . . , X_n) for a one-to-one and differentiable mapping y = g(x). The joint density of Y_1, . . . , Y_n is
    pdf_Y(y) = pdf_X(x) |det ∂(x_1, . . . , x_n)/∂(y_1, . . . , y_n)|
where x = (x_1, . . . , x_n) = g^{−1}(y).
8.5 Expectation
Definition 8.30 (expectation). The expectation (or expected value or mean value) of a discrete random variable is
    E[X] = Σ_x x · P(X = x) = Σ_x x · pmf_X(x)
Lemma 8.2. Let F be the cumulative distribution function of a random variable X. For an interval,
P (a < X ≤ b) = E[1(a < X ≤ b)]
In general, for each event A of X,
P (X ∈ A) = E[1(X ∈ A)]
Theorem 8.27. Let X be a random variable and g be a function on R. If the expectation of Y = g(X) is defined, then
    E[Y] = ∫ g(x) d cdf_X(x) = ∫_{−∞}^{∞} g(x) · pdf_X(x) dx
if X is continuous, or
    E[Y] = ∫ g(x) d cdf_X(x) = Σ_x g(x) · pmf_X(x)
if X is discrete.
8.6 Moments
Definition 8.32. For positive integer k, the k-th moment of X is E[X k ] and the k-th central moment is
E[(X − E[X])k ].
Theorem 8.30. If E[|X|t ] < ∞ for some t > 0, then E[|X|s ] < ∞ for any 0 ≤ s ≤ t.
Definition 8.33 (variance). The variance of a random variable X is
VAR X = E[(X − E[X])2 ]
The covariance and correlation between two random variables X and Y are
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]
and
    Cor(X, Y) = Cov(X, Y) / √(VAR X · VAR Y)
The variance satisfies the following properties:
1. VAR X ≥ 0
3. VAR(aX + b) = a^2 VAR X
Definition 8.34 (skewness and kurtosis). The standardized third and fourth moments are said to be skewness
and kurtosis, that is,
skewness = E[(X − µ)3 ]/σ 3 , kurtosis = E[(X − µ)4 ]/σ 4
where µ = E[X] and σ 2 = VAR X.
9 Inequalities
Theorem 9.1 (Chebychev’s inequality). Let X be a random variable with mean µ and variance σ^2. Then, for any α > 0,
    P(|X − µ| ≥ ασ) ≤ 1/α^2
Equivalently, for α > 0,
    P(|X − µ| > α) ≤ VAR X / α^2
Theorem 9.2 (Markov’s inequality). If X ≥ 0 with µ = E[X] < ∞, then for any α > 0,
P (X ≥ α) ≤ µ/α
Remark 9.1. Chebychev’s inequality is a special case of Markov’s inequality, obtained by applying it to Y = (X − µ)^2.
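A small numeric sketch (Python standard library; the exponential(1) example is an arbitrary illustration) comparing both bounds with the exact tails, using P(X ≥ a) = e^{−a} and µ = σ = 1:

    import math

    mu, sigma = 1.0, 1.0   # X ~ exponential(1)

    for alpha in (2.0, 3.0, 4.0):
        markov_lhs = math.exp(-alpha)        # exact P(X >= alpha)
        cheb_lhs = math.exp(-(1 + alpha))    # exact P(|X - mu| >= alpha*sigma) = P(X >= 1 + alpha) for alpha >= 1
        print(alpha,
              round(markov_lhs, 4), "<=", round(mu / alpha, 4), "|",     # Markov's bound
              round(cheb_lhs, 4), "<=", round(1 / alpha**2, 4))          # Chebychev's bound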
Theorem 9.3 (Cauchy-Schwarz inequality). Let X and Y be two random variables having finite second
moment. Then
[E[XY ]]2 ≤ E[X 2 ]E[Y 2 ]
where the equality holds if and only if P (aX = bY ) = 1 for some a, b ∈ R.
Theorem 9.4. Let X and Y be two random variables with finite second moment. Then Y = aX + b for some
a, b if and only if |Corr(X, Y )| = 1.
Lemma 9.1 (Young’s inequality). For p, q > 1 with 1/p + 1/q = 1 and two nonnegative real numbers x, y ≥ 0,
xy ≤ xp /p + y q /q
Theorem 9.5 (Hölder’s inequality). For p, q > 1 with 1/p + 1/q = 1,
    E[|XY|] ≤ ||X||_p ||Y||_q
when the expectations exist and are finite, where ||X||_r = (E[|X|^r])^{1/r} for r > 0.
Remark 9.2. The Cauchy-Schwarz inequality is a special case of Hölder’s inequality (p = q = 2).
Theorem 9.6 (Jensen’s inequality). For a convex function ϕ,
ϕ(E[X]) ≤ E[ϕ(X)]
Theorem 9.7 (Minkowski’s inequality). For p ≥ 1,
||X + Y ||p ≤ ||X||p + ||Y ||p
10 Conditional Expectation
Definition 10.1 (conditional expectation). The conditional expectation of Y given X = x is defined by
    E[Y|X = x] = ∫ y d cdf_{Y|X}(y|x)
Remark 10.1. The conditional expectation E[Y |X = x] is always a function of x, say h(x). Then denote
h(X) = E[Y |X] as a random variable.
Theorem 10.1. Assume E[|Y|] < ∞. Then
    E[Y|X = x] = ∫_0^∞ P(Y > z|X = x) dz − ∫_{−∞}^0 P(Y < z|X = x) dz
If Y is discrete, then
    E[Y|X = x] = Σ_y y × pmf_{Y|X}(y|x)
If Y is continuous, then
    E[Y|X = x] = ∫ y × pdf_{Y|X}(y|x) dy
Theorem 10.4.
    VAR Y = E[VAR(Y|X)] + VAR(E[Y|X])
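A minimal simulation sketch (standard library Python) of Theorem 10.4 under one assumed model, X ∼ uniform(0, 1) and Y | X = x ∼ N(x, x^2), for which E[VAR(Y|X)] = E[X^2] = 1/3 and VAR(E[Y|X]) = VAR X = 1/12:

    import random, statistics

    random.seed(2)
    n = 200_000

    # Illustrative model: X ~ uniform(0, 1); given X = x, Y is normal with mean x and sd x.
    xs = [random.random() for _ in range(n)]
    ys = [random.gauss(x, x) for x in xs]

    var_y = statistics.pvariance(ys)
    print(round(var_y, 3), "vs", round(1/3 + 1/12, 3))   # both close to 5/12, about 0.417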
11 Generating Functions

The moment generating, cumulant generating, probability generating and characteristic functions of a random variable X are
    mgf_X(t) = E[e^{tX}],  cgf_X(t) = log mgf_X(t),  pgf_X(z) = E[z^X],  chf_X(t) = E[e^{itX}]
where t ∈ R, z > 0 and i = √−1 is the unit imaginary number.
1. mgfX (0) = 1
2. E[X^k] = d^k/dt^k mgf_X(0) if it exists
    mgf_X(t) = 1 + µ_1 t + µ_2 t^2/2! + · · · + µ_k t^k/k! + o(|t|^k)
1. cgfX (0) = 0
1. pgfX (1) = 1
2. E[X(X − 1) · · · (X − k + 1)] = d^k/dz^k pgf_X(1) if it exists.
1. chfX (0) = 1
2. E[X^k] = i^{−k} d^k/dt^k chf_X(0) if it exists
    chf_X(t) = 1 + iµ_1 t − µ_2 t^2/2! + · · · + i^k µ_k t^k/k! + o(|t|^k)
Theorem 11.5. If two random variables X and Y have the same moment generating functions in an open
neighbourhood of 0, that is, (−a, b) for a, b > 0, then X and Y are identically distributed.
Theorem 11.6. If a function ϕ : R → C satisfies 5 - 8 in Theorem 11.4, then there exists a random variable
having ϕ as its characteristic function.
Definition 11.1. The joint probability/moment/cumulant generating and characteristic functions of X and Y are defined analogously; for example, the joint moment generating function is mgf_{X,Y}(s, t) = E[e^{sX + tY}].
Theorem 11.7 (Inversion Formula). Let ϕ be a characteristic function of a random variable X. Then for any a < b,
    P(a < X < b) + {P(X = a) + P(X = b)}/2 = lim_{T→∞} (1/2π) ∫_{−T}^{T} [(e^{−iat} − e^{−ibt})/(it)] ϕ(t) dt
Theorem 11.8 (Chernoff Bound). Let X be a random variable having moment generating function. For any
constant x,
    P(X ≥ x) ≤ inf_{t>0} e^{−xt} mgf_X(t)
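A small numeric sketch (Python standard library; X ∼ N(0, 1) is an arbitrary illustrative choice) comparing the Chernoff bound with the exact tail. Here mgf_X(t) = e^{t^2/2}, so the infimum over t > 0 is e^{−x^2/2}, attained at t = x:

    import math

    for x in (1.0, 2.0, 3.0):
        chernoff = math.exp(-x**2 / 2)                  # inf_{t>0} e^{-xt} mgf_X(t) for N(0, 1)
        true_tail = 0.5 * math.erfc(x / math.sqrt(2))   # exact P(X >= x) for the standard normal
        print(x, round(true_tail, 5), "<=", round(chernoff, 5))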
The survival function of a lifetime X is S_X(t) = P(X > t) for t > 0.
The residual (or future) lifetime given X > t is defined by
RX (t) = X − t
The mean residual lifetime is the conditional expectation of residual lifetime given X > t, that is,
    E[R_X(t)|X > t] = ∫_0^∞ P(R_X(t) > z|X > t) dz = ∫_t^∞ S_X(z)/S_X(t) dz    (7)
12 Stochastic process
Definition 12.1. A stochastic process is a collection of time indexed random variables
{Xt : t ∈ T }
A stochastic process {X_n} is a martingale if E[X_{n+1}|X_0, . . . , X_n] = X_n for all n, a supermartingale if
    E[X_{n+1}|X_0, . . . , X_n] ≤ X_n
for all n, and a submartingale if
    E[X_{n+1}|X_0, . . . , X_n] ≥ X_n
for all n.
Note: the conditioning on X_0, . . . , X_n is often replaced by a filtration F_n, that is,
    E[X_{n+1}|F_n] = X_n
Remark 12.3. If the path of a Wiener process f (t) reaches a value f (s) = a at time t = s, then the subsequent
path after time s has the same distribution as the reflection of the subsequent path about the value a.
13 Mode of Convergence
Definition 13.1 (Modes of convergence).
• A sequence of random variables X_n converges to X in distribution (X_n →d X) if
    P(X_n ≤ x) → P(X ≤ x)
as n → ∞ for every continuity point x of cdf_X.
• A sequence of random variables X_n converges to X in probability (X_n →p X) if, for every ε > 0,
    P(|X_n − X| > ε) → 0
as n → ∞.
• A sequence of random variables X_n converges to X almost surely (X_n →a.s. X) if
    P(lim_{n→∞} X_n = X) = 1.
• A sequence of random variables X_n converges to X in L^p (X_n →L^p X) for p > 0 if
    E[|X_n − X|^p] → 0
as n → ∞.
Theorem 13.1. Let Xn and X be discrete random variables with probability mass functions fn (x) and f (x)
satisfying fn (x) → f (x) for any x with f (x) > 0. Then
Xn −→ X
in distribution.
Theorem 13.2 (Relations between modes of convergence). The following implications hold:
(a) X_n →a.s. X =⇒ X_n →p X
(b) X_n →L^p X =⇒ X_n →p X
(c) X_n →p X =⇒ X_n →d X
13.1 L1 Convergence
Lemma 13.1 (L1 Convergence). If Y ≥ 0 and E[Y] < ∞, then for any ε > 0 there exists M > 0 such that E[Y 1{Y > M}] < ε.
Lemma 13.2. Suppose a random variable Y has a finite absolute expectation, that is, E[|Y|] < ∞. For any ε > 0, there exists δ > 0 such that |E[Y 1{A}]| < ε for any event A with P(A) < δ, where 1{A} is an indicator function of the event A.
Lemma 13.3. Suppose a random variable Y has a finite absolute expectation, that is, E[|Y|] < ∞, and a sequence A_n of events satisfies P(A_n) → 0. Then
E[Y 1{An }] → 0
Theorem 13.3 (Dominated Convergence Theorem). Suppose that Xn → X in probability, |Xn | ≤ Y and
E[Y ] < ∞. Then
E[Xn ] → E[X]
Theorem 13.4 (Generalized Dominated Convergence Theorem). If all X, Y, Xn , Yn have finite absolute expec-
tation, |Xn | ≤ Yn for all n, Xn → X in probability, Yn → Y , and E[Yn ] → E[Y ], then
E[Xn ] → E[X]
Theorem 13.5 (Monotone Convergence Theorem). Let Xn be non-negative non-decreasing random variables.
Suppose lim_{n→∞} X_n = X is finite a.s. Then
    lim_{n→∞} E[X_n] = E[X]
Theorem 13.6 (Fatou’s lemma). Let X_1, X_2, . . . be a sequence of non-negative random variables. Then
    E[lim inf_{n→∞} X_n] ≤ lim inf_{n→∞} E[X_n]
Theorem 13.9. If a sequence of random variables Xn converges to X in probability, then there exists a
subsequence nk such that Xnk converges to X almost surely.
Theorem 13.10. A sequence x_n of real numbers converges to x if and only if for any subsequence n_k there exists a further subsequence n_{k_l} such that x_{n_{k_l}} converges to x.
Theorem 13.11. A sequence of random variables X_n converges to X in probability if and only if for any subsequence n_k there exists a further subsequence n_{k_l} such that X_{n_{k_l}} converges to X a.s.
Theorem 13.13. Let X be a random variable with P(X = x) = 0 for all x and let F be the distribution function of X. Then F(X) ∼ uniform(0, 1), and F^{−1}(U) has the same distribution as X for any U ∼ uniform(0, 1).
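Theorem 13.13 is the basis of inverse transform sampling. A minimal Python sketch (standard library; exponential(2) is an arbitrary illustrative target) drawing samples as F^{−1}(U) and checking the mean and one cdf value:

    import math, random, statistics

    random.seed(3)
    lam, n = 2.0, 200_000

    # F(x) = 1 - exp(-lam*x) for exponential(lam), so F^{-1}(u) = -log(1 - u)/lam.
    samples = [-math.log(1 - random.random()) / lam for _ in range(n)]

    print(round(statistics.mean(samples), 3))          # close to 1/lam = 0.5
    print(round(sum(v <= 1.0 for v in samples) / n, 3),
          round(1 - math.exp(-lam * 1.0), 3))          # empirical vs exact P(X <= 1)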
Theorem 13.14 (Skorokhod’s representation theorem). If X_n →d X, then there exist random variables Y, Y_1, Y_2, . . . in a probability space such that
(a) X_n and Y_n have the same distribution for each n, and X and Y have the same distribution
(b) Y_n →a.s. Y
Theorem 13.17. X_n →d X if and only if
14 Law of Large Numbers

Theorem 14.2 (Strong Law of Large Numbers). Let X_1, . . . , X_n be i.i.d. r.v.s with E[|X_n|] < ∞. Then
    X̄_n →a.s. E[X_1]
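A minimal simulation sketch (standard library Python, Bernoulli(0.3) trials as an arbitrary illustration) showing the sample mean along one realized path settling near E[X_1] = 0.3:

    import random

    random.seed(4)
    p = 0.3
    running_sum, checkpoints = 0, {10, 100, 1_000, 10_000, 100_000}

    # One path of i.i.d. Bernoulli(p) trials; the running mean stabilizes near p = E[X_1].
    for n in range(1, 100_001):
        running_sum += (random.random() < p)
        if n in checkpoints:
            print(n, round(running_sum / n, 4))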
15 Central Limit Theorem

Theorem 15.1 (Levy’s Central Limit Theorem). Let X_1, . . . , X_n be i.i.d. r.v.s with µ = E[X_i] and σ^2 = VAR X_i. Then
    √n (X̄_n − µ)/σ →d N(0, 1)
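A minimal simulation sketch (standard library Python, uniform(0, 1) summands with n = 50 as an arbitrary illustration) of the standardized sample means √n(X̄_n − µ)/σ, whose empirical mean, standard deviation and central coverage should be close to those of N(0, 1):

    import random, statistics

    random.seed(5)
    n, reps = 50, 20_000
    mu, sigma = 0.5, (1 / 12) ** 0.5   # mean and sd of uniform(0, 1)

    z = [(statistics.mean(random.random() for _ in range(n)) - mu) * n**0.5 / sigma
         for _ in range(reps)]

    print(round(statistics.mean(z), 3), round(statistics.pstdev(z), 3))   # about 0 and 1
    print(round(sum(abs(v) <= 1.96 for v in z) / reps, 3))                # about 0.95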
Theorem 15.2 (Lindeberg-Feller Central Limit Theorem). Let X_1, . . . , X_n be independent r.v.s with E[X_i] = 0 and σ_i^2 = VAR X_i < ∞. Let s_n^2 = E[X_1^2] + . . . + E[X_n^2]. The Lindeberg condition
    (1/s_n^2) Σ_{k=1}^{n} E[X_k^2 1{|X_k| > ε s_n}] → 0
for any ε > 0 holds if and only if
    (X_1 + . . . + X_n)/s_n →d N(0, 1)
and
    max(σ_1^2, . . . , σ_n^2)/s_n^2 → 0
Theorem 15.3 (Lyapounov’s condition). Let X_1, . . . , X_n be independent r.v.s with E[X_i] = 0 and σ_i^2 = VAR X_i < ∞ satisfying Lyapounov’s condition
    lim_{n→∞} (1/s_n^{2+δ}) Σ_{k=1}^{n} E[|X_k|^{2+δ}] = 0
for some δ > 0. Then (X_1 + . . . + X_n)/s_n →d N(0, 1).
Theorem 15.4 (δ-method). Let X_n be a sequence of r.v.s and a_n be a sequence of positive real numbers diverging to infinity. If a_n(X_n − µ) →d Z for some r.v. Z and a constant µ, then for any continuously differentiable function g,
    a_n(g(X_n) − g(µ)) →d g′(µ) Z