This document is a study guide for Yuchen Wang's STA347 final preparation. It covers topics in probability theory, including experiments and sample spaces, properties of probability, classical equal probability and combinatorics, conditional probability, random variables, stochastic processes, modes of convergence, laws of large numbers, and the central limit theorem. Key concepts are defined, such as σ-fields, axioms of probability, disjoint and finite sets. Properties of probability like non-negativity, countable additivity, and continuity from below and above are also summarized.

STA347

Final Preparation
Yuchen Wang

March 26, 2020

Contents

1 Experiments, Events and Sample Spaces
2 Definition and Properties of Probability
   2.1 Finite Sample Spaces
3 Classical Equal Probability and Combinatorics
4 Inclusion-Exclusion Formula
5 Conditional Probability
6 Independence
7 Bayes Theorem
8 Random Variables
   8.1 Examples of Random Variables
   8.2 Cumulative Distribution Function
   8.3 Multivariate Distributions
      8.3.1 Bivariate Distributions
      8.3.2 Marginal Distributions
      8.3.3 Conditional Distributions
      8.3.4 Multivariate Distributions
   8.4 Functions of Random Variables
   8.5 Expectation
   8.6 Moments
9 Inequalities
10 Conditional Expectation
11 Probability Related Functions
   11.1 Survival Functions
12 Stochastic process
   12.1 Random Walk
   12.2 Poisson Process
   12.3 Reflection principle (Wiener process)
13 Mode of Convergence
   13.1 L1 Convergence
   13.2 Almost Sure Convergence
   13.3 Convergence in distribution
14 Law of Large Numbers
15 Central Limit Theorem


1 Experiments, Events and Sample Spaces


Definition 1.1. Experiment, Sample space and event

• Experiment: Any process, real or hypothetical, in which the possible outcomes can be identified ahead of
time;

• Sample space: The collection of all possible outcomes, denoted by S;

• Event: A well-defined subset of sample space

Definition 1.2 (countably infinity). A set is countably infinite if its elements can be put in one-to-one
correspondence with the set of natural numbers.

Definition 1.3 (At most countable sets). A set that is either finite or countably infinite is called an at most
countable set.

Theorem 1.1. Suppose E, E1, E2, . . . are events. The following are also events:

1. E^c

2. E1 ∪ E2 ∪ · · · ∪ En

3. ∪_{i=1}^∞ Ei

2 Definition and Properties of Probability


Definition 2.1 (σ-field). Let χ be a space. A collection F of subsets of χ is called a σ-field if

1. χ ∈ F

2. (closure under complement) if E ∈ F, then E c ∈ F

3. (closure under countable union) if E1, E2, . . . ∈ F, then ∪_{n=1}^∞ En ∈ F

Remark 2.1. A σ-field refers to the collection of subsets of a sample space that we should use in order to
establish a mathematically formal definition of probability. The sets in the σ-field constitute the events from
our sample space.

Axiom 2.1 (Axioms of Probability). Let S be a sample space, and let F be a σ-field of S.

• Axiom 1 (non-negativity) P (E) ≥ 0 for any event E ∈ F.

• Axiom 2 P (S) = 1

• Axiom 3 (countable additivity) For every sequence of disjoint events E1 , E2 , . . . ∈ F



P (∪_{i=1}^∞ Ei) = Σ_{i=1}^∞ P (Ei)

Definition 2.2 (probability). Any function P on a sample space S satisfying Axioms 1-3 is called a probability.

Definition 2.3 (disjoint sets). Sets A and B are disjoint if A ∩ B = ∅.

Theorem 2.1. Properties of Probability

1. P (∅) = 0

2. (finite additivity) For any disjoint events E1, . . . , En,

P (∪_{i=1}^n Ei) = Σ_{i=1}^n P (Ei)

3. P (A^c) = 1 − P (A)

4. For A ⊂ B, P (A) ≤ P (B)

5. 0 ≤ P (A) ≤ 1

6. P (A − B) = P (A) − P (A ∩ B)

7. P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

8. (subadditivity, Boole’s inequality) For any events E1, . . . , En,

P (∪_{i=1}^n Ei) ≤ Σ_{i=1}^n P (Ei)

Theorem 2.2 (Continuity from below and above). Let P be a probability.


(continuity from below) If An ↗ A (i.e. A1 ⊂ A2 ⊂ . . . and ∪n An = A), then P (An) ↗ P (A)
(continuity from above) If An ↘ A (i.e. A1 ⊃ A2 ⊃ . . . and ∩n An = A), then P (An) ↘ P (A)

2.1 Finite Sample Spaces


Suppose |S| = n, that is, S = {s1, . . . , sn}. Then each member si has a probability pi = P ({si}) such that

pi ≥ 0 and Σ_{i=1}^n pi = 1

3 Classical Equal Probability and Combinatorics


Definition 3.1 (permutation). When there are n elements, the number of ordered arrangements of k elements drawn from the n elements is called a permutation of n elements taken k at a time and denoted by Pn,k.
Theorem 3.1.
Pn,k = n(n − 1) · · · (n − k + 1) = n!/(n − k)!
Definition 3.2 (combination). The number of combinations of n elements taken k at a time is denoted by Cn,k or by the binomial coefficient (n choose k).

Theorem 3.2.
Cn,k = (n choose k) = n!/(k!(n − k)!) = Pn,k/k!
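The permutation and combination formulas above can be checked directly; a minimal Python sketch (the function names perm and comb are ours):

```python
import math

def perm(n, k):
    # P_{n,k} = n! / (n-k)! = n(n-1)...(n-k+1)
    return math.factorial(n) // math.factorial(n - k)

def comb(n, k):
    # C_{n,k} = P_{n,k} / k! = n! / (k!(n-k)!)
    return perm(n, k) // math.factorial(k)
```

For example, perm(5, 2) gives 20 and comb(10, 3) gives 120, matching the standard library’s math.comb.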
Theorem 3.3 (Binomial coefficients).

(x + y)^n = Σ_{k=0}^n (n choose k) x^k y^{n−k}

Theorem 3.4 (Newton Expansion). For |z| < 1, the term (1 + z)^r can be expanded as

(1 + z)^r = Σ_{k=0}^∞ (r choose k) z^k

Theorem 3.5.

(r choose k) = r(r − 1) · · · (r − k + 1)/k! = Γ(r + 1)/(Γ(r − k + 1)Γ(k + 1))

with Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx
Theorem 3.6. For any numbers x1, . . . , xk and non-negative integer n,

(x1 + · · · + xk)^n = Σ_{n1+···+nk=n} (n choose n1, . . . , nk) x1^{n1} · · · xk^{nk}

It is easy to see that

(n choose n1, . . . , nk) = (n choose n1)(n2 + · · · + nk choose n2)(n3 + · · · + nk choose n3) · · · (nk choose nk) = n!/(n1! · · · nk!)   (1)
Theorem 3.7 (Stirling’s formula).

lim_{n→∞} { log(n!) − [ (1/2) log(2π) + (n + 1/2) log(n) − n ] } = 0
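Stirling’s formula says the bracketed approximation to log(n!) becomes exact as n grows; a quick numerical check (comparing against the exact log(n!) via math.lgamma) shows the error shrinking:

```python
import math

def stirling_log_factorial(n):
    # (1/2) log(2*pi) + (n + 1/2) log(n) - n
    return 0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n

# Exact log(n!) is lgamma(n + 1); the approximation error decays roughly like 1/(12n)
err_10 = abs(math.lgamma(11) - stirling_log_factorial(10))
err_100 = abs(math.lgamma(101) - stirling_log_factorial(100))
```

Here err_100 is about an order of magnitude smaller than err_10, consistent with the 1/(12n) leading correction.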

4 Inclusion-Exclusion Formula
For any n events A1, . . . , An,

P (∪_{i=1}^n Ai) = Σ_{i=1}^n P (Ai) − Σ_{i<j} P (Ai ∩ Aj) + Σ_{i<j<k} P (Ai ∩ Aj ∩ Ak) − · · · + (−1)^{n−1} P (A1 ∩ · · · ∩ An)   (2)
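For a finite sample space with equal probability, the formula can be verified by brute force; a minimal Python sketch with three hypothetical events under the uniform probability on S = {0, . . . , 19}:

```python
from itertools import combinations

S = set(range(20))                       # uniform sample space (our toy example)
A = [set(range(0, 10)), set(range(5, 15)), set(range(8, 20))]

def prob(event):
    return len(event) / len(S)

# Inclusion-exclusion: alternating-sign sum over all non-empty index subsets
total = 0.0
for r in range(1, len(A) + 1):
    sign = (-1) ** (r - 1)
    for idx in combinations(range(len(A)), r):
        inter = set(S)
        for i in idx:
            inter &= A[i]
        total += sign * prob(inter)

union_prob = prob(A[0] | A[1] | A[2])    # direct P(A1 ∪ A2 ∪ A3)
```

The alternating sum total agrees with the directly computed union_prob.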

5 Conditional Probability
Definition 5.1 (conditional probability). When P (B) > 0, the conditional probability of an event A given
B is defined by
P (A|B) = P (A ∩ B)/P (B)
Theorem 5.1. If P (B) > 0, then P (A ∩ B) = P (A|B)P (B).
Theorem 5.2. Let A1 , . . . , An be events with P (A1 ∩ . . . ∩ An ) > 0. Then
P (A1 ∩ · · · ∩ An ) = P (A1 ) P (A2 |A1 ) P (A3 |A1 , A2 ) · · · P (An |A1 , . . . , An−1 ) (3)

6 Independence
Definition 6.1 (independence). Two events A and B are independent if and only if

P (A ∩ B) = P (A)P (B).

A collection of events {Ai}_{i∈I} are said to be (mutually) independent if

P (∩_{i∈J} Ai) = Π_{i∈J} P (Ai)

for any ∅ ≠ J ⊂ I.
A collection of events {Ai}_{i∈I} are said to be pair-wise independent if

P (Ai ∩ Aj) = P (Ai)P (Aj)

for i ≠ j ∈ I.

Theorem 6.1. Two events A and B are independent if and only if A and B^c are independent.

Definition 6.2 (conditional independence). Two events A and B are conditionally independent given C
if
P (A ∩ B|C) = P (A|C)P (B|C)

Remark 6.1. Conditional independence does not imply independence.

7 Bayes Theorem
Definition 7.1. A collection of sets B1, . . . , Bk is called a partition of A if and only if B1, . . . , Bk are disjoint
and A = ∪_{i=1}^k Bi.

Theorem 7.1 (Law of total probability). Let events B1, . . . , Bk be a partition of S with P (Bj) > 0 for all
j = 1, . . . , k. For any event A,

P (A) = Σ_{j=1}^k P (Bj)P (A|Bj)

Theorem 7.2 (Bayes’ Theorem). If 0 < P (A), P (B) < 1, then

P (B|A) = P (A|B)P (B) / [P (A|B)P (B) + P (A|B^c)P (B^c)]
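Bayes’ Theorem is easy to evaluate numerically; a minimal Python sketch with hypothetical numbers (a rare condition B and an imperfect test result A):

```python
p_B = 0.01           # P(B): prior probability of B (assumed for illustration)
p_A_given_B = 0.95   # P(A|B)
p_A_given_Bc = 0.05  # P(A|B^c)

# Bayes' Theorem: P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)]
numer = p_A_given_B * p_B
denom = numer + p_A_given_Bc * (1 - p_B)
p_B_given_A = numer / denom   # ≈ 0.161
```

Even with a sensitive test, the posterior P (B|A) stays modest here because the prior P (B) is small: the denominator is dominated by the false-positive term P (A|B^c)P (B^c).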

8 Random Variables
Definition 8.1. A real-valued function X on the sample space S is called a random variable if the probability
of X is well-defined, that is, {s ∈ S : X(s) ≤ r} is an event for each r ∈ R.

Definition 8.2 (Borel sets in R). The collection of all Borel sets B in R is the smallest collection satisfying the
following:

1. (a, b] ∈ B for any a < b ∈ R

2. (closure under complement) For any B ∈ B, B c ∈ B

3. (closure under countable union) For any B1, B2, . . . ∈ B, ∪_{j=1}^∞ Bj ∈ B

We call the collection B the Borel σ-field.

Definition 8.3 (Probability of a random variable). For any Borel set B in R, an event X ∈ B is defined as
{s ∈ S : X(s) ∈ B} and often denoted by {X ∈ B} or (X ∈ B). The corresponding probability is

P (X ∈ B) = P ({s ∈ S : X(s) ∈ B})

Lemma 8.1. If |X(S)| < ∞ and (X = r) is an event for any r ∈ X(S), then X is a random variable.

Definition 8.4 (distribution). The distribution of X is the collection of all probabilities of all events induced
by X, that is, (B, P (X ∈ B)). Two random variables X and Y are said to be identically distributed if they
have the same distribution.

Remark 8.1. To show that X and Y have the same distribution, we need to check that P (X ∈ B) = P (Y ∈ B)
for every Borel set B on R. Since all Borel sets on R are generated by intervals, it is enough to prove

P (a < X ≤ b) = P (a < Y ≤ b)

for any a < b ∈ R. Even P (X ≤ a) = P (Y ≤ a) for any a ∈ R guarantees that X and Y are identically
distributed.

Definition 8.5 (discrete random variable). A random variable X is said to be discrete if P (X ∈ χ0) = 1 where χ0 = {x ∈ R : P (X = x) > 0}.
Definition 8.6 (probability mass function). The probability mass function (pmf) of a discrete random
variable X is
pmfX (x) = P (X = x)
for any possible value of x ∈ X(S).
Theorem 8.1. Let X be a discrete random variable. Then the set of x having P (X = x) > 0 is at most countable.
Theorem 8.2. Let f be the pmf of a discrete random variable X with set of possible values X(S) = {x1, x2, . . .}. Then f (x) ≥ 0, f (x) = 0 for x ∉ X(S), and Σ_{i=1}^∞ f (xi) = 1.

Theorem 8.3. Let X(S) = {x1, x2, . . .} be the set of possible values of a discrete random variable X. Then
for any subset A of R,

P (X ∈ A) = Σ_{x∈A} P (X = x) = Σ_{x∈A} pmfX(x)

Definition 8.7 (absolutely continuity and probability density function). A random variable X is said to be
absolutely continuous if the probability of each interval [a, b] is of the form
P (a < X ≤ b) = ∫_a^b f (x) dx

where a < b ∈ R and f is a non-negative function on R. Such function f is called a probability density
function (pdf) of X.
Theorem 8.4. Let X be a continuous random variable. Then

pdfX(x) = (d/dx) P (X ≤ x)

8.1 Examples of Random Variables


Definition 8.8 (Bernoulli). A random variable X taking value 0 or 1 with P (X = 1) = p and P (X = 0) = 1−p
for some p ∈ [0, 1] is called a Bernoulli random variable with success probability p and often denoted by X ∼
Bernoulli(p).
Definition 8.9 (discrete uniform). Let χ be a non-empty finite set. A random variable X taking values in χ
with equal probability is called a uniform random variable on χ and denoted by X ∼ uniform(χ).
The probability mass function of X ∼ uniform(χ) is

pmfX(x) = 1/|χ| if x ∈ χ, and 0 otherwise

Definition 8.10 (binomial). A random variable X is called a binomial random variable if it has the same
distribution as Z, the number of successes in n independent trials with success probability p, denoted
by X ∼ binomial(n, p).
The probability mass function of X ∼ binomial(n, p) is

pmfX(x) = (n choose x) p^x (1 − p)^{n−x} if x = 0, 1, . . . , n, and 0 otherwise

Definition 8.11 (continuous uniform). A random variable X defined on (a, b) for finite real numbers a < b
satisfying P (c < X ≤ d) = (d − c)/(b − a) for any c, d such that a ≤ c ≤ d ≤ b is called a uniform random variable on
(a, b), denoted by X ∼ uniform(a, b). The probability density function of X ∼ uniform(a, b) is

pdfX(x) = 1/(b − a) if a < x < b, and 0 otherwise

Definition 8.12 (geometric). Consider a sequence of independent Bernoulli trials with success probability p. The number
of trials until the first success follows a geometric distribution with parameter p, denoted by geometric(p).
The geometric random variable X ∼ geometric(p) has probability mass function

pmfX(n) = (1 − p)^{n−1} p

for n ∈ N.
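The geometric pmf can be checked numerically: the probabilities sum to 1 and the mean works out to 1/p. A minimal Python sketch (truncating the infinite sum at a point where the tail is negligible):

```python
p = 0.3
# pmf(n) = (1-p)^(n-1) * p for n = 1, 2, ...
pmf = [(1 - p) ** (n - 1) * p for n in range(1, 2000)]

total = sum(pmf)                                   # should be ~1
mean = sum(n * q for n, q in enumerate(pmf, 1))    # should be ~1/p
```

With p = 0.3 the truncated tail beyond n = 1999 is astronomically small, so both checks hold to high precision.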

Definition 8.13 (negative binomial). Consider a sequence of independent Bernoulli trials with success probability p. The
number of trials until the k-th success follows a negative binomial distribution with parameters k and p, denoted
by neg-bin(k, p).
The negative binomial random variable X ∼ neg-bin(k, p) has probability mass function

pmfX(n) = (n − 1 choose k − 1) (1 − p)^{n−k} p^k

for n ∈ N s.t. n ≥ k.

Definition 8.14 (hypergeometric). Consider a jar containing n balls, of which r are black and the remaining n − r
are white. The random variable X is the number of black balls when m balls are drawn without replacement.
The probability that k black balls are drawn is

pmfX(k) = (r choose k)(n − r choose m − k) / (n choose m) if k = 0, . . . , min(r, m), and 0 otherwise.

Such a distribution is called a hypergeometric distribution.

Definition 8.15 (zeta/zipf). A positive integer valued random variable X follows a Zeta or Zipf distribution
if

pmfX(n) = n^{−s}/ζ(s)

for n = 1, 2, . . . and s > 1, where ζ(s) = Σ_{n=1}^∞ n^{−s}

Definition 8.16 (Poisson). A Poisson distribution with parameter µ > 0 has the probability mass function

pmfX(n) = e^{−µ} µ^n / n!

for non-negative integer n.

Theorem 8.5. If X ∼ Poisson(λ) and the distribution of Y, conditional on X = k, is a binomial distribution,
Y |(X = k) ∼ Binom(k, p), then the distribution of Y follows a Poisson distribution Y ∼ Poisson(λ · p)

Theorem 8.6 (Sums of Poisson-distributed random variables). If Xi ∼ Poisson(λi) for i = 1, . . . , n are
independent, and λ = Σ_{i=1}^n λi, then Y = Σ_{i=1}^n Xi ∼ Poisson(λ).
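Theorem 8.6 for n = 2 can be verified exactly: the pmf of X1 + X2 is the discrete convolution of the two Poisson pmfs, and that convolution matches the Poisson(λ1 + λ2) pmf term by term. A minimal Python sketch:

```python
import math

def poisson_pmf(n, mu):
    # e^{-mu} * mu^n / n!
    return math.exp(-mu) * mu ** n / math.factorial(n)

lam1, lam2 = 1.5, 2.5

def conv_pmf(n):
    # P(X1 + X2 = n) = sum_k P(X1 = k) P(X2 = n - k) by independence
    return sum(poisson_pmf(k, lam1) * poisson_pmf(n - k, lam2) for k in range(n + 1))
```

For each n, conv_pmf(n) agrees with poisson_pmf(n, lam1 + lam2) up to floating-point error.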

Definition 8.17 (Exponential). A continuous random variable W having the probability density

pdfW(w) = λ e^{−λw} 1(w > 0)

is distributed from an exponential distribution with parameter λ > 0, which is denoted by W ∼ exponential(λ).

8.2 Cumulative Distribution Function


The (cumulative) distribution function of a random variable X is the function

cdf X (x) = FX (x) = P (X ≤ x)

for −∞ < x < ∞.

Theorem 8.7 (properties of distribution functions). Let F be a distribution function. Then


(a) F is nondecreasing,
(b) lim_{x→∞} F (x) = 1 and lim_{x→−∞} F (x) = 0,
(c) F is right continuous, that is, lim_{y↘x} F (y) = F (x),
(d) F (x−) := lim_{y↗x} F (y) = P (X < x),
(e) P (X = x) = F (x) − F (x−)

Theorem 8.8. If a real function F satisfies (a)-(c) in the above properties, then it is a distribution function of
a random variable.

Definition 8.18 (p-quantile). The p-quantile of a random variable X is x such that P (X ≤ x) ≥ p and
P (X ≥ x) ≥ 1 − p.

Definition 8.19. The median, lower quartile, and upper quartile are the 0.5-, 0.25-, and 0.75-quantiles, respectively. The interquartile
range (IQR) is the difference between the upper and lower quartiles.

8.3 Multivariate Distributions


8.3.1 Bivariate Distributions
Definition 8.20. The joint/bivariate distribution of two random variables X and Y is the collection of all
possible probabilities, that is, P ((X, Y ) ∈ B) where B is a Borel set in R2 .

Definition 8.21. Two random variables X and Y are jointly continuously distributed if and only if there exists
a non-negative function f such that for any Borel set B in R2
P ((X, Y ) ∈ B) = ∬_B f (x, y) dx dy

Such function f is called a joint density function of (X, Y ).

Theorem 8.9 (Properties of joint density functions). Joint density functions satisfy

1. pdfX,Y (x, y) ≥ 0

2. ∬ pdfX,Y (x, y) dx dy = 1

Definition 8.22. The joint (cumulative) distribution function of X and Y is

cdfX,Y (x, y) = P (X ≤ x, Y ≤ y)

Definition 8.23. When X and Y are discrete, then the joint probability mass function of X and Y is
defined by
pmfX,Y (x, y) = P (X = x, Y = y)

Theorem 8.10 (Properties of joint probability mass functions). Joint probability mass functions satisfy

1. pmfX,Y (x, y) ≥ 0

2. Σ_{x,y} pmfX,Y (x, y) = 1

Theorem 8.11. Consider two random variables X and Y .

lim_{y→−∞} cdfX,Y (x, y) = 0
lim_{x→−∞} cdfX,Y (x, y) = 0
lim_{y→∞} cdfX,Y (x, y) = cdfX(x)
lim_{x→∞} cdfX,Y (x, y) = cdfY (y)

8.3.2 Marginal Distributions


Suppose X and Y are random variables. The cdf or pmf or pdf of X (or Y ) derived from the joint cdf or pmf
or pdf is called the marginal cdf or pmf or pdf of X (or Y ).
Theorem 8.12.

1. pmfX(x) = Σ_y pmfX,Y (x, y)

2. pdfX(x) = ∫ pdfX,Y (x, y) dy
Definition 8.24. Two random variables X and Y are independent if and only if

P (X ∈ A, Y ∈ B) = P (X ∈ A)P (Y ∈ B)

for all Borel sets A and B.

Theorem 8.13. If two random variables X and Y are independent, then the following hold if the functions
exist.
1. cdfX,Y (x, y) = cdfX (x) × cdfY (y) for all x, y

2. pmfX,Y (x, y) = pmfX (x) × pmfY (y) for all x, y

3. pdfX,Y (x, y) = pdfX (x) × pdfY (y) for all x, y


Theorem 8.14. If one of the following holds, then two random variables X and Y are independent.
1. cdfX,Y (x, y) = cdfX (x) × cdfY (y) for all x, y

2. pmfX,Y (x, y) = pmfX (x) × pmfY (y) for all x, y

3. pdfX,Y (x, y) = pdfX (x) × pdfY (y) for all x, y

8.3.3 Conditional Distributions


Definition 8.25. The conditional density of X given Y = y is

pdfX|Y (x|y) = pdfX,Y (x, y) / pdfY (y)

Theorem 8.15.
pdfX,Y (x, y) = pdfY (y) pdfX|Y (x|y)

8.3.4 Multivariate Distributions


Definition 8.26. The joint cumulative distribution function of n variables X1 , . . . , Xn is defined by

cdfX1 ,...,Xn (x1 , . . . , xn ) = P (X1 ≤ x1 , . . . , Xn ≤ xn )

The joint probability mass/density function of n discrete/continuous random variables X1 , . . . , Xn is

pmfX1 ,...,Xn (x1 , . . . , xn ) = P (X1 = x1 , . . . , Xn = xn )


P ((X1, . . . , Xn) ∈ B) = ∫ · · · ∫_B pdfX1,...,Xn(x1, . . . , xn) dxn · · · dx1

Definition 8.27. Let X1 , . . . , Xn be random variables. Marginal cumulative distribution, probability mass,
probability density functions of X1 , . . . , Xi−1 , Xi+1 , . . . , Xn are

cdfX1,...,Xi−1,Xi+1,...,Xn(x1, . . . , xi−1, xi+1, . . . , xn) = lim_{xi→∞} cdfX1,...,Xn(x1, . . . , xn)   (4)

pmfX1,...,Xi−1,Xi+1,...,Xn(x1, . . . , xi−1, xi+1, . . . , xn) = Σ_{xi} pmfX1,...,Xn(x1, . . . , xn)   (5)

pdfX1,...,Xi−1,Xi+1,...,Xn(x1, . . . , xi−1, xi+1, . . . , xn) = ∫ pdfX1,...,Xn(x1, . . . , xn) dxi   (6)

Theorem 8.16. Let X1, . . . , Xn be continuous random variables with joint cdf F. Then

pdfX1,...,Xn(x1, . . . , xn) = ∂^n/(∂x1 · · · ∂xn) F (x1, . . . , xn)
Definition 8.28. Random variables X1 , . . . , Xn are independent if and only if for any Borel sets B1 , . . . , Bn

P (X1 ∈ B1 , . . . , Xn ∈ Bn ) = P (X1 ∈ B1 ) . . . P (Xn ∈ Bn )

Theorem 8.17. Random variables X1 , . . . , Xn are independent if and only if

cdfX1 ,...,Xn (x1 , . . . , xn ) = cdfX1 (x1 ) . . . cdfXn (xn )

8.4 Functions of Random Variables


Theorem 8.18. Let X be a discrete random variable and Y = g(X) be a transformed random variable where
g : R → R is a function. The pmf of Y is
pmfY (y) = Σ_{x: g(x)=y} pmfX(x)

Theorem 8.19. Let X be a continuous random variable and Y = g(X) be a transformed random variable,
where g is an appropriate transformation such as a continuous increasing function. The cdf of Y is

cdfY (y) = ∫_{{x: g(x)≤y}} pdfX(x) dx

The probability density function of Y is

pdfY (y) = (d/dy) cdfY (y)
Theorem 8.20. Let X be a continuous random variable and F (x) = cdfX(x). Then the new random variable
Y = F (X) is uniformly distributed on (0, 1), that is, Y ∼ uniform(0, 1).
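Theorem 8.20 (the probability integral transform) is easy to check by simulation; a minimal Python sketch using the exponential distribution, whose cdf F (x) = 1 − e^{−λx} is available in closed form:

```python
import math
import random

random.seed(0)
lam = 2.0

# Draw X ~ exponential(lam) and apply its own cdf: Y = F(X) = 1 - exp(-lam * X)
samples = [random.expovariate(lam) for _ in range(100_000)]
u = [1 - math.exp(-lam * x) for x in samples]

# If Y ~ uniform(0, 1), then E[Y] = 1/2 and VAR Y = 1/12
mean_u = sum(u) / len(u)
var_u = sum((v - mean_u) ** 2 for v in u) / len(u)
```

The empirical mean and variance of the transformed sample match 1/2 and 1/12 up to Monte Carlo error.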
8 RANDOM VARIABLES 12

Theorem 8.21 (change of variable). Let X be a continuous random variable and g be a one-to-one and
differentiable function. Then the density of random variable Y = g(X) is

pdfY (y) = pdfX(g^{−1}(y)) · |(d/dy) g^{−1}(y)|

whenever y is in the range Y (S).

Theorem 8.22. Consider discrete random variables X1 , . . . , Xn . There exist m functions g1 , . . . , gm so that
Yi = gi (X1 , . . . , Xn ). The joint probability mass function of Y = (Y1 , . . . , Ym ) is
pmfY (y) = Σ_{x: gi(x)=yi, i=1,...,m} pmfX(x)

Definition 8.29. Random variables X1 , . . . , Xn are said to be independent and identically distributed
(i.i.d) if all random variables have the same distribution and are independent.

Theorem 8.23. Let X and Y be jointly continuous random variables. The density of Z = X + Y is

pdfZ(z) = ∫ pdfX,Y (x, z − x) dx

If X and Y are independent, then

pdfZ(z) = ∫ pdfX(x) pdfY (z − x) dx

Theorem 8.24 (change of variable). Suppose X1, . . . , Xn have a joint density function f (x1, . . . , xn) and
Yi = gi(X1, . . . , Xn) for one-to-one and differentiable functions gi, say y = g(x). The joint
density of Y1, . . . , Yn is

pdfY (y) = pdfX(x) · |det(∂(x1, . . . , xn)/∂(y1, . . . , yn))|

where x = (x1, . . . , xn) = g^{−1}(y)

8.5 Expectation
Definition 8.30 (expectation). The expectation (or expected value or mean value) of a discrete random variable
is

E[X] = Σ_x x · P (X = x) = Σ_x x · pmfX(x)

when the sum is absolutely convergent.

Definition 8.31. The expectation of a continuous random variable X is defined by

E[X] = ∫ x · pdfX(x) dx

Theorem 8.25. Assume a discrete random variable X is non-negative. Then

E[X] = ∫_0^∞ P (X > z) dz = ∫_0^∞ x dF (x)

Corollary 8.1. Let X be a non-negative integer valued random variable. Then

E[X] = Σ_{n=1}^∞ P (X ≥ n)
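The tail-sum formula in Corollary 8.1 can be checked on the geometric distribution, where P (X ≥ n) = (1 − p)^{n−1} and E[X] = 1/p. A minimal Python sketch (truncating the sum where the tail is negligible):

```python
p = 0.25
# For X ~ geometric(p): P(X >= n) = (1-p)^(n-1)
tail_sum = sum((1 - p) ** (n - 1) for n in range(1, 500))
# Corollary 8.1 says this equals E[X] = 1/p = 4
```

The truncated sum agrees with 1/p = 4 to floating-point accuracy, since the omitted tail is on the order of (0.75)^499.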

Lemma 8.2. Let F be the cumulative distribution function of a random variable X. For an interval,
P (a < X ≤ b) = E[1(a < X ≤ b)]
In general, for each event A of X,
P (X ∈ A) = E[1(X ∈ A)]

Theorem 8.26. For any random variable X with finite expectation,

E[X] = ∫_0^∞ P (X > z) dz − ∫_{−∞}^0 P (X < z) dz = ∫_{−∞}^∞ x dF (x)

Theorem 8.27. Let X be a random variable and g be a function on R. If the expectation of Y = g(X) is defined,
then

E[Y ] = ∫ g(x) d cdfX(x) = ∫_{−∞}^∞ g(x) · pdfX(x) dx

or

E[Y ] = ∫ g(x) d cdfX(x) = Σ_x g(x) · pmfX(x)

Lemma 8.3. Assume X, Y ≥ 0 with probability 1, that is, P (X ≥ 0, Y ≥ 0) = 1, then


E[X + Y ] = E[X] + E[Y ]
and
E[X − Y ] = E[X] − E[Y ]
Theorem 8.28 (Properties of Expectation).

1. (linearity) Let Y = aX + b, then E[Y ] = aE[X] + b

2. (monotonicity) If X ≥ 0, that is, P (X ≥ 0) = 1, then E[X] ≥ 0

3. (additivity) E[X + Y ] = E[X] + E[Y ]

4. For the constant random variable 1, E[1] = 1
Theorem 8.29. Let X and Y be two independent random variables and g and h be real functions satisfying
g(X) and h(Y ) are random variables with finite expectations. Then
E[g(X)h(Y )] = E[g(X)]E[h(Y )]

8.6 Moments
Definition 8.32. For positive integer k, the k-th moment of X is E[X k ] and the k-th central moment is
E[(X − E[X])k ].
Theorem 8.30. If E[|X|t ] < ∞ for some t > 0, then E[|X|s ] < ∞ for any 0 ≤ s ≤ t.
Definition 8.33 (variance). The variance of a random variable X is
VAR X = E[(X − E[X])2 ]
The covariance and correlation between two random variables X and Y are
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]
and
Cor(X, Y ) = Cov(X, Y ) / √(VAR X · VAR Y )

Theorem 8.31 (Properties of variance).

1. VAR X ≥ 0

2. VAR X = E[X^2] − (E[X])^2

3. VAR(aX + b) = a^2 VAR X

4. VAR(X + Y ) = VAR X + VAR Y + 2 Cov(X, Y )

5. VAR(X + Y ) = VAR X + VAR Y if and only if X and Y are uncorrelated.

6. If a random variable X is bounded, then it must have finite variance.

7. VAR X = 0 if and only if P (X = c) = 1 for some c ∈ R.

Theorem 8.32 (Properties of covariance).

Cov(X, Y ) = E[XY ] − E[X]E[Y ]

Definition 8.34 (skewness and kurtosis). The standardized third and fourth central moments are called the skewness
and kurtosis, that is,

skewness = E[(X − µ)^3]/σ^3,  kurtosis = E[(X − µ)^4]/σ^4
where µ = E[X] and σ 2 = VAR X.

9 Inequalities
Theorem 9.1 (Chebychev’s inequality). Let X be a random variable with mean µ and variance σ^2. Then, for
any α > 0,

P (|X − µ| ≥ ασ) ≤ 1/α^2

Equivalently, for α > 0,

P (|X − µ| > α) ≤ VAR X / α^2
Theorem 9.2 (Markov’s inequality). If X ≥ 0 with µ = E[X] < ∞, then for any α > 0,

P (X ≥ α) ≤ µ/α

Remark 9.1. Chebychev’s inequality is a special case of Markov’s inequality, obtained by considering

Y = (X − µ)^2

Note that A = {s ∈ Ω : |X(s) − E[X]| ≥ r} = {s ∈ Ω : (X(s) − E[X])^2 ≥ r^2}.
Now consider the random variable Y defined by Y (s) = (X(s) − E[X])^2.
Since Y is a non-negative random variable, Markov’s inequality applies and gives

P (A) = P (Y ≥ r^2) ≤ E[Y ]/r^2 = E[(X − E[X])^2]/r^2 = VAR X / r^2
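Both bounds can be sanity-checked by simulation; a minimal Python sketch using X ∼ exponential(1), for which µ = 1 and VAR X = 1:

```python
import random

random.seed(1)
xs = [random.expovariate(1.0) for _ in range(200_000)]  # mu = 1, variance = 1

alpha = 3.0
# Markov: P(X >= alpha) <= mu / alpha = 1/3; the true tail is e^{-3} ≈ 0.05
markov_lhs = sum(x >= alpha for x in xs) / len(xs)
# Chebychev: P(|X - mu| >= alpha) <= VAR X / alpha^2 = 1/9
chebyshev_lhs = sum(abs(x - 1.0) >= alpha for x in xs) / len(xs)
```

Both empirical frequencies fall well below their bounds, illustrating that Markov and Chebychev are valid but often loose.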

Theorem 9.3 (Cauchy–Schwarz inequality). Let X and Y be two random variables having finite second
moments. Then

(E[XY ])^2 ≤ E[X^2] E[Y^2]

where equality holds if and only if P (aX = bY ) = 1 for some a, b ∈ R.

Theorem 9.4. Let X and Y be two random variables with finite second moments. Then P (Y = aX + b) = 1 for some
a, b if and only if |Cor(X, Y )| = 1.

Lemma 9.1 (Young’s inequality). For p, q > 1 with 1/p + 1/q = 1 and two nonnegative real numbers x, y ≥ 0,
xy ≤ xp /p + y q /q
Theorem 9.5 (Hölder’s inequality). For p, q > 1 with 1/p + 1/q = 1,
E[|XY |] ≤ ||X||p ||Y ||q
when the expectations exist and are finite where ||X||r = E[|X|r ]1/r for r > 0.
Remark 9.2. The Cauchy–Schwarz inequality is a special case of Hölder’s inequality (p = q = 2).
Theorem 9.6 (Jensen’s inequality). For a convex function ϕ,
ϕ(E[X]) ≤ E[ϕ(X)]
Theorem 9.7 (Minkowski’s inequality). For p ≥ 1,
||X + Y ||p ≤ ||X||p + ||Y ||p

10 Conditional Expectation
Definition 10.1 (conditional expectation). The conditional expectation of Y given X = x is defined by

E[Y |X = x] = ∫ y d cdfY|X(y|x)

Remark 10.1. The conditional expectation E[Y |X = x] is always a function of x, say h(x). We then write
h(X) = E[Y |X], which is a random variable.
Theorem 10.1. Assume E[|Y |] < ∞. Then

E[Y |X = x] = ∫_0^∞ P (Y > z|X = x) dz − ∫_{−∞}^0 P (Y < z|X = x) dz

If Y is discrete, then

E[Y |X = x] = Σ_y y · pmfY|X(y|x)

If Y is continuous, then

E[Y |X = x] = ∫ y · pdfY|X(y|x) dy

Theorem 10.2 (Properties of conditional expectation).

1. E[aY + b|X] = aE[Y |X] + b
2. If P (Y ≥ 0|X) = 1, then E[Y |X] ≥ 0
3. E[Y + Z|X] = E[Y |X] + E[Z|X]
4. For the constant random variable 1, E[1|X] = 1
5. For a convex function ϕ, ϕ(E[Y |X]) ≤ E[ϕ(Y )|X]
Theorem 10.3 (Law of Total Expectation).

E[E[Y |X]] = E[Y ]

i.e., the expected value of the conditional expectation of Y given X is the same as the expected value of Y .
One special case states that if {Ai}i is a finite or countable partition of the sample space, then

E[X] = Σ_i E[X|Ai] P (Ai)
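The partition form of the law of total expectation can be checked by simulation; a minimal Python sketch with a hypothetical two-component mixture (the partition events A1 and A2 are "which component was drawn"):

```python
import random

random.seed(3)

# A1 with P(A1) = 0.3: exponential with E[X|A1] = 2
# A2 with P(A2) = 0.7: exponential with E[X|A2] = 5
def draw():
    if random.random() < 0.3:
        return random.expovariate(1 / 2)
    return random.expovariate(1 / 5)

xs = [draw() for _ in range(200_000)]
emp_mean = sum(xs) / len(xs)
# E[X] = E[X|A1] P(A1) + E[X|A2] P(A2) = 0.3*2 + 0.7*5 = 4.1
total_expectation = 0.3 * 2 + 0.7 * 5
```

The empirical mean matches the partition formula’s 4.1 up to Monte Carlo error.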

Definition 10.2 (conditional variance). The conditional variance is given by

VAR(Y |X = x) = E[(Y − E[Y |X = x])^2 |X = x]

Theorem 10.4.
VAR Y = E[VAR(Y |X)] + VAR(E[Y |X])

11 Probability Related Functions


Let X be a random variable.

1. moment generating function: mgfX(t) = E[e^{tX}]

2. cumulant generating function: cgfX(t) = log E[e^{tX}]

3. probability generating function: pgfX(z) = E[z^X]

4. characteristic function: chfX(t) = E[e^{itX}]

where t ∈ R, z > 0, and i = √−1 is the imaginary unit.

Theorem 11.1 (properties of mgf).

1. mgfX(0) = 1

2. E[X^k] = (d^k/dt^k) mgfX(t) |_{t=0} if it exists

3. If E[|X|^k] < ∞, then for µj = E[X^j] where j = 1, . . . , k,

   mgfX(t) = 1 + µ1 t + µ2 t^2/2! + · · · + µk t^k/k! + o(|t|^k)

4. mgf_{aX+b}(t) = e^{bt} mgfX(at)

5. If X and Y are independent, then

   mgfX,Y (s, t) = mgfX(s) mgfY (t)

Theorem 11.2 (properties of cgf).

1. cgfX(0) = 0

2. If X and Y are independent, then

   cgfX,Y (s, t) = cgfX(s) + cgfY (t)

Theorem 11.3 (properties of pgf).

1. pgfX(1) = 1

2. E[X(X − 1) · · · (X − k + 1)] = (d^k/dz^k) pgfX(z) |_{z=1} if it exists.

3. If X and Y are independent, then

   pgfX,Y (s, t) = pgfX(s) pgfY (t)

Theorem 11.4 (properties of chf).

1. chfX(0) = 1

2. E[X^k] = i^{−k} (d^k/dt^k) chfX(t) |_{t=0} if it exists

3. If E[|X|^k] < ∞, then for µj = E[X^j] where j = 1, . . . , k,

   chfX(t) = 1 + iµ1 t − µ2 t^2/2! + · · · + i^k µk t^k/k! + o(|t|^k)

4. chf_{aX+b}(t) = e^{ibt} chfX(at)

5. If X and Y are independent, then

   chfX,Y (s, t) = chfX(s) chfY (t)

6. |chfX(t)| ≤ 1 for all t

7. chfX is uniformly continuous

8. For any t1, . . . , tn ∈ R and z1, . . . , zn ∈ C,

   Σ_{j,k} chfX(tj − tk) zj z̄k ≥ 0

Theorem 11.5. If two random variables X and Y have the same moment generating functions in an open
neighbourhood of 0, that is, (−a, b) for a, b > 0, then X and Y are identically distributed.

Theorem 11.6. If a function ϕ : R → C satisfies 5 - 8 in Theorem 11.4, then there exists a random variable
having ϕ as its characteristic function.

Definition 11.1. The joint probability/moment/cumulant generating and characteristic functions of X and Y
are

1. mgfX,Y (s, t) = E[esX+tY ]

2. cgfX,Y (s, t) = log mgfX,Y (s, t)

3. pgfX,Y (s, t) = E[sX tY ]

4. chfX,Y (s, t) = E[eisX+itY ]

Theorem 11.7 (Inversion Formula). Let ϕ be the characteristic function of a random variable X. Then for any
a < b,

P (a < X < b) + {P (X = a) + P (X = b)}/2 = lim_{T→∞} (1/2π) ∫_{−T}^{T} [(e^{−iat} − e^{−ibt})/(it)] ϕ(t) dt
Theorem 11.8 (Chernoff Bound). Let X be a random variable having a moment generating function. For any
constant x,

P (X ≥ x) ≤ inf_{t>0} e^{−xt} mgfX(t)
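The Chernoff bound can be evaluated concretely for X ∼ exponential(1), whose mgf is 1/(1 − t) for t < 1 and whose exact tail is P (X ≥ x) = e^{−x}. A minimal Python sketch, minimizing over a grid of t (the optimum is t* = 1 − 1/x, giving the bound x e^{1−x}):

```python
import math

x = 5.0

def chernoff(t):
    # e^{-xt} * mgf(t), with mgf(t) = 1/(1-t) for exponential(1)
    return math.exp(-x * t) / (1 - t)

# Approximate inf over t in (0, 1) by a grid search
bound = min(chernoff(i / 1000) for i in range(1, 1000))
exact_tail = math.exp(-x)   # P(X >= 5) = e^{-5}
```

As the theorem guarantees, exact_tail ≤ bound; here the bound x e^{1−x} ≈ 0.092 versus the exact tail e^{−5} ≈ 0.0067.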

11.1 Survival Functions


Let X be a non-negative valued random variable.
The survival function of X is SX(t) = P (X > t) = 1 − FX(t), the probability of surviving longer than time t.
The hazard function

hX(t) = pdfX(t)/SX(t) = pdfX(t)/(1 − FX(t))

measures the risk of the event (or death) at time t. The cumulative hazard function is

HX(t) = ∫_0^t hX(z) dz

for t > 0.
The residual (or future) lifetime given X > t is defined by

RX(t) = X − t

The mean residual lifetime is the conditional expectation of the residual lifetime given X > t, that is,

E[RX(t)|X > t] = ∫_0^∞ P (RX(t) > z|X > t) dz = ∫_t^∞ SX(z)/SX(t) dz   (7)

In particular, for t = 0 (with SX(0) = 1),

E[RX(0)|X > 0] = ∫_0^∞ SX(z) dz = E[X]
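These functions take a particularly simple form for the exponential distribution: the hazard is constant, and integrating the survival function recovers the mean. A minimal Python sketch (the Riemann-sum step size and cutoff are our choices):

```python
import math

lam = 0.5
S = lambda t: math.exp(-lam * t)        # survival function of exponential(lam)
f = lambda t: lam * math.exp(-lam * t)  # its pdf
hazard = lambda t: f(t) / S(t)          # memoryless: constant hazard = lam

# E[X] = integral_0^inf S(z) dz, approximated by a left Riemann sum on [0, 100]
dt = 0.001
mean = sum(S(k * dt) * dt for k in range(100_000))
```

The hazard evaluates to λ at every t, and the Riemann sum approximates E[X] = 1/λ = 2.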

12 Stochastic process
Definition 12.1. A stochastic process is a collection of time indexed random variables

{Xt : t ∈ T }

A collection of σ-fields F = {Ft }t∈T is called a filtration if Fs ⊆ Ft for any 0 ≤ s ≤ t.


A stochastic process X = {Xt }t∈T is said to be adapted to the filtration F if Xt is Ft -measurable (or
{Xt ≤ r} ∈ Ft for any real number r).
Definition 12.2 (Martingales). A stochastic process Xn is said to be a (discrete-time) martingale if
1. E[|Xn |] < ∞

2. E[Xn+1 |X0 , . . . , Xn ] = Xn for all n

3. A stochastic process Xn is said to be a supermartingale if it satisfies (1) above and

   E[Xn+1 |X0 , . . . , Xn ] ≤ Xn

   for all n.

4. A stochastic process Xn is said to be a submartingale if it satisfies (1) above and

   E[Xn+1 |X0 , . . . , Xn ] ≥ Xn

   for all n.
Note: the conditioning on X0 , . . . , Xn is often replaced by the filtration Fn , that is,

E[Xn+1 |Fn ] = Xn

Remark 12.1. A martingale is both a supermartingale and a submartingale.
If Xn is a submartingale, then −Xn is a supermartingale.
Definition 12.3 (stopping time). A time-valued random variable T is said to be a stopping time if the event
{T ≤ n} can be expressed in terms of X0 , . . . , Xn .
Example 12.1. The first time T that the stochastic process Xn is bigger than or equal to a constant K is a
stopping time by considering
{T = n} = {X1 < K, . . . , Xn−1 < K, Xn ≥ K}
Theorem 12.1 (Optional Sampling Theorem). Let Xn be a submartingale and let T be a stopping time with
P (T ≤ k) = 1. Then
E[X0 ] ≤ E[XT ] ≤ E[Xk ]
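A simulation sketch (illustrative, not from the notes): the symmetric ±1 random walk Sn is a martingale, and T = min{first n with |Sn| ≥ K, k} is a bounded stopping time, so the theorem gives E[S0] = E[ST] = E[Sk] = 0 (with equalities, since a martingale is both a sub- and supermartingale). The constants K and k are arbitrary.

```python
import random

# Monte Carlo check that E[S_T] = 0 for a symmetric random walk stopped
# at T = min(first hit of |S_n| >= K, time cap k).
random.seed(0)
K, k = 3, 50
trials = 20000
total = 0
for _ in range(trials):
    s, n = 0, 0
    while n < k and abs(s) < K:
        s += random.choice((-1, 1))   # one +/-1 step of the walk
        n += 1
    total += s                        # S_T, the walk's value at the stopping time
est = total / trials
print(est)                            # should be near E[S_0] = 0
```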

12.1 Random Walk


Let X1 , X2 , . . . be a sequence of independent random variables having mean zero and variance 1. Define
Sn = X1 + . . . + Xn .
Theorem 12.2 (Kolmogorov's inequality). For any α > 0,

P ( max_{k=1,...,n} |Sk | ≥ α) ≤ Var(Sn )/α²
Theorem 12.3. If each Xn is symmetric, then

P ( max_{k=1,...,n} |Sk | ≥ α) ≤ 2P (|Sn | ≥ α)

12.2 Poisson Process


A Poisson process with intensity λ is a stochastic process N = {Nt : t ≥ 0} taking values in the non-negative
integers and satisfying
(a) N0 = 0 and Ns ≤ Nt if 0 ≤ s ≤ t

(b) P (Nt+h = n + m|Nt = n) = 1 − λh + o(h) if m = 0; λh + o(h) if m = 1; o(h) if m > 1

(c) For 0 ≤ s < t, the number of arrivals Nt − Ns in the interval (s, t] is independent of the number of arrivals
Ns in the interval (0, s].
Theorem 12.4. For any fixed time t > 0, Nt ∼ Poisson(λt).
Theorem 12.5. The interarrival times X1 , X2 , . . . are independent and identically distributed according to the
exponential distribution with rate λ.
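The two theorems suggest a direct construction: summing i.i.d. Exponential(λ) interarrival times until they exceed t yields Nt, whose mean should be λt. A quick sketch (parameters are arbitrary choices, not from the notes):

```python
import random

# Build N_t from Exponential(lam) interarrival times (Theorem 12.5) and
# check E[N_t] = lam * t, consistent with N_t ~ Poisson(lam*t) (Theorem 12.4).
random.seed(1)
lam, t, trials = 2.0, 5.0, 20000

def n_arrivals(lam, t):
    clock, count = 0.0, 0
    while True:
        clock += random.expovariate(lam)   # next interarrival time
        if clock > t:
            return count
        count += 1

mean_nt = sum(n_arrivals(lam, t) for _ in range(trials)) / trials
print(mean_nt)   # should be close to lam * t = 10
```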

12.3 Reflection principle (Wiener process)


Definition 12.4 (Wiener Process). A Wiener process is a continuous-time stochastic process W (t) for t ≥ 0
with W (0) = 0 such that the increment W (t) − W (s) is Gaussian with mean 0 and variance t − s for any
0 ≤ s < t, and increments over non-overlapping time intervals are independent.
Remark 12.2. The Wiener process is the standard mathematical model of Brownian motion and arises as the
scaling limit of a random walk.
Theorem 12.6 (Reflection principle). If (W (t) : t ≥ 0) is a Wiener process and a > 0 is a threshold, then

P ( sup_{0≤s≤t} W (s) ≥ a) = 2P (W (t) ≥ a)

Remark 12.3. If the path of a Wiener process f (t) reaches a value f (s) = a at time t = s, then the subsequent
path after time s has the same distribution as the reflection of the subsequent path about the value a.
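A Monte Carlo sketch of the reflection principle (illustrative, not from the notes): simulate a discretized Wiener path via independent Gaussian increments and compare the hitting probability with 2P(W(t) ≥ a). The discrete-time maximum slightly underestimates the true supremum, so only rough agreement is expected.

```python
import random, math

# Compare P(sup_{0<=s<=t} W(s) >= a) (simulated) with 2 P(W(t) >= a) (exact).
random.seed(2)
t, a, steps, trials = 1.0, 1.0, 500, 5000
dt = t / steps
hit = 0
for _ in range(trials):
    w, peak = 0.0, 0.0
    for _ in range(steps):
        w += random.gauss(0.0, math.sqrt(dt))   # independent N(0, dt) increment
        peak = max(peak, w)
    hit += peak >= a
p_sup = hit / trials
p_tail = 2 * (1 - 0.5 * (1 + math.erf(a / math.sqrt(2 * t))))   # 2 P(W(t) >= a)
print(p_sup, p_tail)
```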

13 Mode of Convergence
Definition 13.1. Modes of convergence
• A sequence of random variables Xn converges to X in distribution (Xn →d X) if

  P (Xn ≤ x) → P (X ≤ x)

  as n → ∞ for any x with P (X = x) = 0.

• A sequence of random variables Xn converges to X in probability (Xn →p X) if

  P (|Xn − X| > ε) → 0

  as n → ∞ for any ε > 0.

• A sequence of random variables Xn converges to X almost surely (Xn →a.s. X) if

  P (lim sup_{n→∞} |Xn − X| = 0) = 1

• A sequence of random variables Xn converges to X in Lp (Xn →Lp X) for p > 0 if

  E[|Xn − X|p ] → 0

  as n → ∞.
Theorem 13.1. Let Xn and X be discrete random variables with probability mass functions fn (x) and f (x)
satisfying fn (x) → f (x) for any x with f (x) > 0. Then Xn →d X.
Theorem 13.2 (Relations between modes of convergence). As follows:
(a) Xn →a.s. X =⇒ Xn →p X
(b) Xn →Lp X =⇒ Xn →p X
(c) Xn →p X =⇒ Xn →d X
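None of these implications reverses in general. A standard counterexample (not in the notes): Xn = n·1{U < 1/n} with U ∼ Uniform(0, 1) satisfies P(|Xn| > ε) = 1/n → 0, so Xn →p 0, yet E[|Xn − 0|] = 1 for every n, so Xn does not converge in L1. A quick simulation at a fixed n:

```python
import random

# X_n = n * 1{U < 1/n}: rarely nonzero (probability 1/n), but with a spike of
# size n, so the mean stays at 1 while P(X_n != 0) -> 0.
random.seed(6)
n, trials = 1000, 200000
count_nonzero, total = 0, 0.0
for _ in range(trials):
    u = random.random()
    x = n if u < 1 / n else 0
    count_nonzero += (x != 0)
    total += x
print(count_nonzero / trials)   # near P(X_n != 0) = 1/n = 0.001
print(total / trials)           # near E[X_n] = 1
```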

13.1 L1 Convergence
Lemma 13.1 (L1 Convergence). If Y ≥ 0 and E[Y ] < ∞, then for any ε > 0 there exists M > 0 such that

E[Y 1{Y > M }] < ε

Lemma 13.2. Suppose a random variable Y has a finite absolute expectation, that is, E[|Y |] < ∞. For any
ε > 0, there exists δ > 0 such that |E[Y 1{A}]| < ε for any event A with P (A) < δ, where 1{A} is the indicator
function of the event A.
Lemma 13.3. Suppose a random variable Y has a finite absolute expectation, that is, E[|Y |] < ∞ and a
sequence An of events satisfy P (An ) → 0. Then

E[Y 1{An }] → 0

Theorem 13.3 (Dominated Convergence Theorem). Suppose that Xn → X in probability, |Xn | ≤ Y and
E[Y ] < ∞. Then
E[Xn ] → E[X]

Theorem 13.4 (Generalized Dominated Convergence Theorem). If all X, Y, Xn , Yn have finite absolute
expectation, |Xn | ≤ Yn for all n, Xn → X in probability, Yn → Y in probability, and E[Yn ] → E[Y ], then

E[Xn ] → E[X]

Theorem 13.5 (Monotone Convergence Theorem). Let Xn be non-negative, non-decreasing random variables.
Suppose lim_{n→∞} Xn = X is finite a.s. Then

lim_{n→∞} E[Xn ] = E[X]

Theorem 13.6 (Fatou's lemma). Let X1 , X2 , . . . be a sequence of non-negative random variables. Then

E[lim inf_{n→∞} Xn ] ≤ lim inf_{n→∞} E[Xn ]

13.2 Almost Sure Convergence


Theorem 13.7 (Borel-Cantelli lemma). Let A = ∩_{m=1}^∞ ∪_{n=m}^∞ An be the event that infinitely many
An 's occur.
1. P (A) = 0 if Σn P (An ) < ∞
2. P (A) = 1 if Σn P (An ) = ∞ and A1 , A2 , . . . are independent.

Theorem 13.8. If for any ε > 0, Σ_{n=1}^∞ P (|Xn − X| > ε) < ∞, then Xn → X almost surely.

Theorem 13.9. If a sequence of random variables Xn converges to X in probability, then there exists a
subsequence nk such that Xnk converges to X almost surely.

Theorem 13.10. A sequence xn of real numbers converges to x if and only if for any subsequence nk there
exists a further subsequence nkl such that xnkl converges to x.

Theorem 13.11. A sequence of random variables Xn converges to X in probability if and only if for any
subsequence nk there exists a further subsequence nkl such that Xnkl converges to X a.s.

13.3 Convergence in distribution


Theorem 13.12. As follows
(a) If Xn →d c where c is a constant, then Xn →p c.
(b) If Xn →p c and P (|Xn | ≤ M ) = 1 for some M > 0, then Xn →Lp c for any p > 0.

Theorem 13.13. Let X be a random variable with P (X = x) = 0 for all x and let F be the distribution function
of X. Then F (X) ∼ Uniform(0, 1), and F −1 (U ) has the same distribution as X for any U ∼ Uniform(0, 1).
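This is the basis of inverse-transform sampling. A sketch (the exponential target is an assumption, not from the notes): for F the Exponential(λ) cdf, F⁻¹(u) = −log(1 − u)/λ, so F⁻¹(U) with U ∼ Uniform(0, 1) samples from Exponential(λ).

```python
import random, math

# Inverse-transform sampling from Exponential(lam) via F^{-1}(u) = -log(1-u)/lam.
random.seed(3)
lam, trials = 1.5, 50000
samples = [-math.log(1 - random.random()) / lam for _ in range(trials)]
mean = sum(samples) / trials
print(mean)   # should be close to E[X] = 1/lam
```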
Theorem 13.14 (Skorokhod's representation theorem). If Xn →d X, then there exist random variables
Y, Y1 , Y2 , . . . on a common probability space such that
(a) Xn and Yn have the same distribution, and X and Y have the same distribution
(b) Yn →a.s. Y

Theorem 13.15 (Continuous mapping theorem). Let g be a continuous function.
1. Xn →a.s. X =⇒ g(Xn ) →a.s. g(X)
2. Xn →p X =⇒ g(Xn ) →p g(X)
3. Xn →d X =⇒ g(Xn ) →d g(X)
Theorem 13.16. Xn →d X if and only if E[g(Xn )] → E[g(X)] for any bounded continuous function g.

Theorem 13.17. Xn →d X if and only if

chfXn (t) → chfX (t)

for all t ∈ R.
Theorem 13.18. If Xn →d X, then

aXn + b →d aX + b

for any a, b ∈ R.
Theorem 13.19 (Slutsky's lemma). Suppose Xn →d X and Yn →d c for a constant c.
1. Xn + Yn →d X + c
2. Xn Yn →d Xc
3. Xn /Yn →d X/c if c ≠ 0

14 Law of Large Numbers


Theorem 14.1 (Weak Law of Large Numbers). Let X1 , X2 , . . . be i.i.d. with E[|X1 |] < ∞. Then

X̄n →p E[X1 ]

Theorem 14.2 (Strong Law of Large Numbers). Let X1 , X2 , . . . be i.i.d. r.v.s with E[|X1 |] < ∞. Then

X̄n →a.s. E[X1 ]

Theorem 14.3. Let X1 , X2 , . . . be i.i.d. r.v.s with E[X1 ²] < ∞. Then

X̄n = (X1 + . . . + Xn )/n → E[X1 ]

almost surely and in L2 .
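A one-path illustration (not from the notes; the Bernoulli choice is an assumption): by the strong law, the sample mean of i.i.d. Bernoulli(p) draws settles near E[X1] = p along a single long realization.

```python
import random

# One long sample path of i.i.d. Bernoulli(p) draws; the running mean should
# settle near E[X_1] = p by the strong law of large numbers.
random.seed(4)
p, n = 0.3, 200000
total = 0
for _ in range(n):
    total += 1 if random.random() < p else 0
xbar = total / n
print(xbar)   # close to 0.3
```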

15 Central Limit Theorem


For k ≈ np, the binomial probability is approximated by

(n choose k) p^k (1 − p)^{n−k} ≈ [1/√(2πnp(1 − p))] exp(−(k − np)²/(2np(1 − p)))
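The local approximation above can be checked directly (sketch; the particular n, p, k are arbitrary choices, not from the notes):

```python
import math

# Compare the exact binomial pmf near k = np with its local normal approximation.
n, p = 1000, 0.4
k = 400                                   # k close to np = 400
exact = math.comb(n, k) * p**k * (1 - p)**(n - k)
approx = (1 / math.sqrt(2 * math.pi * n * p * (1 - p))
          * math.exp(-(k - n * p)**2 / (2 * n * p * (1 - p))))
print(exact, approx)
```

For n this large the two values agree to well under one percent.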

Theorem 15.1 (Levy's Central Limit Theorem). Let X1 , . . . , Xn be i.i.d. r.v.s with µ = E[Xi ] and σ² =
Var(Xi ) < ∞. Then

√n(X̄n − µ)/σ →d N (0, 1)
Theorem 15.2 (Lindeberg-Feller Central Limit Theorem). Let X1 , . . . , Xn be independent r.v.s with E[Xi ] = 0
and σi ² = Var(Xi ) < ∞. Let sn ² = σ1 ² + . . . + σn ². The Lindeberg condition

(1/sn ²) Σ_{k=1}^n E[Xk ² 1{|Xk | > ε sn }] → 0

for any ε > 0 holds if and only if

(X1 + . . . + Xn )/sn →d N (0, 1)   and   max(σ1 ², . . . , σn ²)/sn ² → 0

Theorem 15.3 (Lyapounov's condition). Let X1 , . . . , Xn be independent r.v.s with E[Xi ] = 0 and σi ² =
Var(Xi ) < ∞, satisfying Lyapounov's condition

lim_{n→∞} (1/sn ^{2+δ}) Σ_{k=1}^n E[|Xk |^{2+δ}] = 0

for some δ > 0. Then Lindeberg's condition holds. Hence

(X1 + . . . + Xn )/sn →d N (0, 1)

Theorem 15.4 (δ-method). Let X1 , . . . , Xn be i.i.d. r.v.s and let an be a sequence of positive real numbers
diverging to infinity. If an (Xn − µ) →d Z for some r.v. Z and a constant µ, then for any continuously
differentiable function g,

an (g(Xn ) − g(µ)) →d g′ (µ)Z
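A simulation sketch (illustrative, assumptions: g(x) = x², i.i.d. Uniform(0, 1) draws, so µ = 1/2, σ = √(1/12), g′(µ) = 2µ): with an = √n, the theorem predicts √n(X̄n² − µ²) →d 2µσ·N(0, 1), so its standard deviation should approach |g′(µ)|σ.

```python
import random, math

# Delta method with g(x) = x^2: the sd of sqrt(n)*(Xbar^2 - mu^2) over many
# replications should be near |g'(mu)| * sigma = 2 * 0.5 * sqrt(1/12).
random.seed(5)
n, reps, mu = 500, 4000, 0.5
vals = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    vals.append(math.sqrt(n) * (xbar ** 2 - mu ** 2))
m = sum(vals) / reps
sd = math.sqrt(sum((v - m) ** 2 for v in vals) / reps)
target = 2 * mu * math.sqrt(1 / 12)     # |g'(mu)| * sigma
print(sd, target)
```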
