
Lecture Notes 5

Convergence and Limit Theorems

• Motivation

• Convergence with Probability 1

• Convergence in Mean Square

• Convergence in Probability, WLLN

• Convergence in Distribution, CLT



Motivation

• One of the key questions in statistical signal processing is how to estimate the
statistics of a r.v., e.g., its mean, variance, distribution, etc.
To estimate such a statistic, we collect samples and use an estimator in the form
of a sample average
◦ How good is the estimator? Does it converge to the true statistic?
◦ How many samples do we need to ensure with some confidence that we are
within a certain range of the true value of the statistic?
• Another key question in statistical signal processing is how to estimate a signal
from noisy observations, e.g., using MSE or linear MSE
◦ Does the estimator converge to the true signal?
◦ How many observations do we need to achieve a desired estimation accuracy?
• The subject of convergence and limit theorems for r.v.s addresses such questions



Example: Estimating the Mean of a R.V.

• Let X be a r.v. with finite but unknown mean E(X)


• To estimate the mean we generate i.i.d. samples X1, X2, . . . , Xn drawn according to the same distribution as X and compute the sample mean
$$S_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$

• Does Sn converge to E(X) as we increase n? If so, how fast?


But what does it mean to say that a r.v. sequence Sn converges to E(X)?
• First we give an example: Let X1, X2, . . . , Xn, . . . be i.i.d. N(0, 1)
◦ We use Matlab to generate 6 sets of outcomes of X1, . . . , X10000
◦ We then plot sn for the 6 sets of outcomes as a function of n
◦ Note that each sn sequence appears to be converging to 0, the mean of the r.v., as n increases (a Python sketch of this simulation follows below)
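
The notes use Matlab for this experiment; the following is a minimal NumPy/Matplotlib sketch of the same simulation (the path count and horizon come from the bullets above, while the seed and styling are our own choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_max, n_paths = 10_000, 6

for _ in range(n_paths):
    x = rng.standard_normal(n_max)               # X_1, ..., X_10000 i.i.d. N(0, 1)
    s = np.cumsum(x) / np.arange(1, n_max + 1)   # s_n = (1/n) * sum_{i=1}^n x_i
    plt.plot(s)

plt.xlabel("n")
plt.ylabel("s_n")
plt.show()
```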



Plots of Sample Sequences of Sn

[Figure: sample paths of sn for the 6 simulation runs, shown over four horizons (n ≤ 10, 100, 1000, 10000) with shrinking vertical scales (±2, ±0.5, ±0.2, ±0.05); every path settles toward 0]



Convergence With Probability 1

• Recall that a sequence of numbers x1, x2, . . . , xn, . . . converges to x if for every ε > 0, there exists an m(ε) such that |xn − x| < ε for every n ≥ m(ε)
• Now consider a sequence of r.v.s X1, X2, . . . , Xn, . . . all defined on the same
probability space Ω. For every ω ∈ Ω we obtain a sample sequence (sequence of
numbers) X1(ω), X2(ω), . . . , Xn(ω), . . .
• A sequence X1, X2, X3, . . . of r.v.s is said to converge to a random variable X
with probability 1 (w.p.1, also called almost surely) if
$$P\big\{\omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\big\} = 1$$

• This means that the set of sample paths that converge to X(ω), in the sense of
a sequence converging to a limit, has probability 1
• Equivalently, X1, X2, . . . , Xn, . . . converges to X w.p.1 if for every ε > 0,
$$\lim_{m\to\infty} P\{|X_n - X| < \epsilon \text{ for every } n \ge m\} = 1$$



• Example 1: Let X1, X2, . . . , Xn, . . . be i.i.d. Bern(1/2), and define
$$Y_n = 2^n \prod_{i=1}^{n} X_i$$
Show that the sequence Yn converges to 0 w.p.1
Solution: Note that Yn = 2^n if X1 = · · · = Xn = 1 and Yn = 0 otherwise, so for 0 < ε < 2^m we have |Yn| < ε for all n ≥ m if and only if Ym = 0. Hence
$$\begin{aligned}
P\{|Y_n - 0| < \epsilon \text{ for all } n \ge m\} &= P\{X_n = 0 \text{ for some } n \le m\} \\
&= 1 - P\{X_n = 1 \text{ for all } n \le m\} \\
&= 1 - (1/2)^m \to 1 \text{ as } m \to \infty
\end{aligned}$$
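
A quick Monte Carlo check of the last step (a sketch, not from the notes): P{Xn = 1 for all n ≤ m} = (1/2)^m, so the probability above is 1 − (1/2)^m.

```python
import numpy as np

rng = np.random.default_rng(1)
m, trials = 5, 100_000

flips = rng.integers(0, 2, size=(trials, m))   # Bern(1/2) flips X_1, ..., X_m
all_ones = np.all(flips == 1, axis=1)          # the only event where Y_m != 0
print(all_ones.mean(), 0.5**m)                 # both close to 1/32 = 0.03125
```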

• An important example of convergence w.p.1: the Strong Law of Large Numbers (SLLN), which says that if X1, X2, . . . , Xn, . . . are i.i.d. with finite mean E(X), then the sequence of sample means Sn → E(X) w.p.1
◦ The previous Matlab example is a good demonstration of the SLLN — each
of the 6 sample paths appears to be converging to 0, which is E(X)
◦ The proof of the SLLN and other convergence w.p.1 results are beyond the
scope of this course. Take Stats 310 if you want to learn a lot more about this



Convergence in Mean Square

• A sequence of r.v.s X1, X2, . . . , Xn, . . . converges to a random variable X in mean square (m.s.) if
$$\lim_{n\to\infty} E\big[(X_n - X)^2\big] = 0$$

• Example: Estimating the mean.


Let X1, X2, . . . , Xn, . . . be i.i.d. with finite mean E(X) and variance Var(X).
Then Sn → E(X) in m.s.
• Proof: Here we need to show that
$$\lim_{n\to\infty} E\big[(S_n - E(X))^2\big] = 0$$
First note that
$$E(S_n) = E\bigg(\frac{1}{n}\sum_{i=1}^{n} X_i\bigg) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} E(X) = E(X)$$
So, Sn is an unbiased estimate of E(X)



Now to prove convergence in m.s., consider
$$\begin{aligned}
E\big[(S_n - E(X))^2\big] &= E\big[(S_n - E(S_n))^2\big] \\
&= E\bigg[\bigg(\frac{1}{n}\sum_{i=1}^{n} X_i - \frac{1}{n}\sum_{i=1}^{n} E(X)\bigg)^{\!2}\bigg] \\
&= \frac{1}{n^2}\, E\bigg[\bigg(\sum_{i=1}^{n} \big(X_i - E(X)\big)\bigg)^{\!2}\bigg] \\
&= \frac{1}{n^2}\, \mathrm{Var}\bigg(\sum_{i=1}^{n} X_i\bigg) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) \quad \text{since } \{X_i\} \text{ are independent} \\
&= \frac{1}{n^2}\big(n\,\mathrm{Var}(X)\big) \\
&= \frac{1}{n}\mathrm{Var}(X) \to 0 \text{ as } n \to \infty
\end{aligned}$$
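
The heart of the proof is the identity Var(Sn) = Var(X)/n. A small numerical check (a sketch assuming, for concreteness, X ∼ Exp(1), so Var(X) = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 50, 100_000

x = rng.exponential(scale=1.0, size=(trials, n))  # i.i.d. Exp(1), Var(X) = 1
s_n = x.mean(axis=1)                              # one sample mean per trial
print(s_n.var(), 1 / n)                           # both approximately 0.02
```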



• Note that the proof works even if the r.v.s are only pairwise independent or even
only uncorrelated
• Example: Consider the best linear MSE estimates found in the first estimation
example of Lecture Notes 4 as a sequence of r.v.s X̂1, X̂2, . . . , X̂n, . . ., where
X̂n is the best linear estimate of X given the first n observations. This
sequence converges in m.s. to X since MSE_n → 0
• Convergence in m.s. does not necessarily imply convergence w.p.1
• Example 2: Let X1, X2, . . . , Xn, . . . be a sequence of independent r.v.s such that
$$X_n = \begin{cases} 0 & \text{with probability } 1 - \frac{1}{n} \\[2pt] 1 & \text{with probability } \frac{1}{n} \end{cases}$$
This sequence converges to 0 in m.s., since E[(Xn − 0)²] = 1/n → 0, but does it converge w.p.1?



It actually does not, since for 0 < ε < 1 and any m
$$\begin{aligned}
P\{|X_n - 0| < \epsilon \text{ for all } n \ge m\} &= \lim_{n\to\infty} \prod_{i=m}^{n} \Big(1 - \frac{1}{i}\Big) \\
&= \lim_{n\to\infty} \prod_{i=m}^{n} \frac{i-1}{i} \\
&= \lim_{n\to\infty} \frac{m-1}{m}\cdot\frac{m}{m+1} \cdots \frac{n-1}{n} \\
&= \lim_{n\to\infty} \frac{m-1}{n} = 0 \ne 1
\end{aligned}$$
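
The telescoping product above equals (m − 1)/N when truncated at a finite horizon N. A Monte Carlo sketch of that truncated probability (the horizon, m, trial count, and seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(3)
m, N, trials = 10, 500, 10_000

ns = np.arange(m, N + 1)
ones = rng.random((trials, len(ns))) < 1.0 / ns  # X_n = 1 with probability 1/n
no_ones = ~ones.any(axis=1)                      # |X_n| < eps for all m <= n <= N
print(no_ones.mean(), (m - 1) / N)               # both approximately 0.018
```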
• Also convergence w.p.1 does not imply convergence in m.s.
Consider the sequence in Example 1. Since
$$E\big[(Y_n - 0)^2\big] = \Big(\frac{1}{2}\Big)^{\!n} 2^{2n} = 2^n \to \infty,$$
the sequence does not converge in m.s. even though it converges w.p.1



• Example: Convergence to a random variable:
Flip a coin with random bias P conditionally independently to obtain the
sequence X1, X2, . . . , Xn, . . ., where as usual Xi = 1 if the ith coin flip is heads
and Xi = 0 otherwise
As we already know, the r.v.s X1, X2, . . . , Xn are not independent, but given
P = p they are i.i.d. Bern(p)
It is easy to show using iterated expectation that E(Sn) = E(X1) = E(P)
In a homework exercise, you will show that Sn → P (not to E(P)) in m.s.
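
The homework claim can be checked in simulation; a minimal sketch (the bias distribution is not specified in the notes, so P ∼ Unif(0, 1) is our assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

p = rng.uniform()       # one realization of the random bias P (assumed Unif(0, 1))
x = rng.random(n) < p   # given P = p, the flips are i.i.d. Bern(p)
print(p, x.mean())      # S_n is close to the realized P, not to E(P) = 1/2
```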



Convergence in Probability

• A sequence of r.v.s X1, X2, . . . , Xn, . . . converges to a r.v. X in probability if for any ε > 0,
$$\lim_{n\to\infty} P\{|X_n - X| < \epsilon\} = 1$$

• Convergence w.p.1 implies convergence in probability. The converse is not necessarily true, so convergence w.p.1 is stronger than convergence in probability
• Example 3: Let X1, X2, . . . , Xn, . . . be independent such that
$$X_n = \begin{cases} 0 & \text{with probability } 1 - \frac{1}{n} \\[2pt] n & \text{with probability } \frac{1}{n} \end{cases}$$
Clearly, this sequence converges in probability to 0: for any fixed ε > 0 and all n > ε,
$$P\{|X_n - 0| > \epsilon\} = P\{X_n = n\} = \frac{1}{n} \to 0 \text{ as } n \to \infty$$
But does it converge w.p.1? The answer is no (see Example 2)



• Convergence in m.s. implies convergence in probability. To show this we use the Markov inequality. For any ε > 0,
$$P\{|X_n - X| > \epsilon\} = P\{(X_n - X)^2 > \epsilon^2\} \le \frac{E\big[(X_n - X)^2\big]}{\epsilon^2}$$
If Xn → X in m.s., then
$$\lim_{n\to\infty} E\big[(X_n - X)^2\big] = 0 \;\Rightarrow\; \lim_{n\to\infty} P\{|X_n - X| > \epsilon\} = 0,$$
i.e., Xn → X in probability
• The converse is not necessarily true. In Example 3, Xn converges in probability to 0. Now consider
$$E\big[(X_n - 0)^2\big] = 0^2\Big(1 - \frac{1}{n}\Big) + n^2\cdot\frac{1}{n} = n \to \infty \text{ as } n \to \infty$$
Thus Xn does not converge in m.s.
• So convergence in probability is weaker than both convergence w.p.1 and in m.s.



The Weak Law of Large Numbers

• The WLLN states that if X1, X2, . . . , Xn, . . . is a sequence of i.i.d. r.v.s with finite mean E(X) and variance Var(X), then
$$S_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to E(X) \text{ in probability}$$

• We already proved that Sn → E(X) in m.s., and since convergence in m.s. implies convergence in probability, Sn → E(X) in probability
So the WLLN requires only that the r.v.s be uncorrelated (the SLLN requires independence)



Confidence Intervals

• Given ε, δ > 0, how large should n, the number of samples, be so that
$$P\{|S_n - E(X)| \le \epsilon\} \ge 1 - \delta\,,$$
i.e., Sn is within ±ε of E(X) with probability ≥ 1 − δ?
• Let's use the Chebyshev inequality:
$$P\{|S_n - E(X)| \le \epsilon\} = P\{|S_n - E(S_n)| \le \epsilon\} \ge 1 - \frac{\mathrm{Var}(S_n)}{\epsilon^2} = 1 - \frac{\mathrm{Var}(X)}{n\epsilon^2}$$
So n should be large enough that Var(X)/(nε²) ≤ δ, i.e., n ≥ Var(X)/(δε²)
• Example: Let ε = 0.1σX and δ = 0.001. The number of samples should satisfy
$$n \ge \frac{\sigma_X^2}{0.001 \times 0.01\,\sigma_X^2} = 10^5,$$
i.e., 10^5 samples ensure that Sn is within ±0.1σX of E(X) with probability ≥ 0.999, independent of the distribution of X
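
The Chebyshev sample-size bound is easy to evaluate in code; a minimal sketch (the function name and parameterization are ours, not from the notes):

```python
import math

def chebyshev_samples(var_x: float, eps: float, delta: float) -> int:
    """n such that P{|S_n - E(X)| <= eps} >= 1 - delta by Chebyshev."""
    return math.ceil(var_x / (delta * eps**2))

# eps = 0.1 * sigma_X with sigma_X = 1, delta = 0.001
print(chebyshev_samples(1.0, 0.1, 0.001))  # 100000
```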



Convergence in Distribution

• A sequence of r.v.s X1, X2, . . . , Xn, . . . converges in distribution to a r.v. X if
$$\lim_{n\to\infty} F_{X_n}(x) = F_X(x) \quad \text{for every } x \text{ at which } F_X(x) \text{ is continuous}$$

• Convergence in probability implies convergence in distribution — so convergence in distribution is the weakest form of convergence we discuss
• The most important example of convergence in distribution is the Central Limit Theorem (CLT). Let X1, X2, . . . , Xn, . . . be i.i.d. r.v.s with finite mean E(X) and variance σX². Consider the normalized sum
$$Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \frac{X_i - E(X)}{\sigma_X}$$
The sum is called normalized because E(Zn) = 0 and Var(Zn) = 1
The Central Limit Theorem states that Zn → Z ∼ N(0, 1) in distribution, i.e.,
$$\lim_{n\to\infty} F_{Z_n}(z) = \Phi(z) = \begin{cases} 1 - Q(z) & z \ge 0 \\ Q(-z) & z < 0 \end{cases}$$



• Example: Let X1, X2, . . . be i.i.d. U[−1, 1] r.v.s. The normalized sum is
$$Z_n = \frac{\sum_{i=1}^{n} X_i}{\sqrt{n/3}}$$
The following plots show the pdf of Zn for n = 1, 2, 4, 16. Note how quickly the pdf of Zn approaches the Gaussian pdf

[Figure: pdfs of Z1, Z2, Z4, and Z16 plotted over −3 ≤ z ≤ 3; by n = 16 the pdf is visually indistinguishable from the N(0, 1) pdf]
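
A NumPy sketch of this experiment, with histograms standing in for the exact pdfs (the trial count, binning, and seed are our choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
trials = 100_000

for n in (1, 2, 4, 16):
    x = rng.uniform(-1, 1, size=(trials, n))
    z = x.sum(axis=1) / np.sqrt(n / 3)       # normalized sum, Var(Z_n) = 1
    plt.hist(z, bins=100, density=True, histtype="step", label=f"n = {n}")

zz = np.linspace(-3, 3, 200)
plt.plot(zz, np.exp(-zz**2 / 2) / np.sqrt(2 * np.pi), "k--", label="N(0, 1) pdf")
plt.legend()
plt.xlabel("z")
plt.show()
```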
• Example: Let X1, X2, . . . be i.i.d. Bern(1/2). The normalized sum is
$$Z_n = \frac{\sum_{i=1}^{n} (X_i - 0.5)}{\sqrt{n/4}}$$
The following plots show the cdf of Zn for n = 10, 20, 160. Zn is discrete and thus has no pdf, but its cdf converges to the Gaussian cdf
[Figure: cdfs of Z10, Z20, and Z160 plotted over −3 ≤ z ≤ 3; the staircase cdfs approach the N(0, 1) cdf as n grows]



Application: Confidence Intervals

• Let X1, X2, . . . , Xn be i.i.d. with finite mean E(X) and variance Var(X) and
let Sn be the sample mean
• Given ε, δ > 0, how large should n, the number of samples, be so that
$$P\{|S_n - E(X)| \le \epsilon\} \ge 1 - \delta\,?$$

• We can use the CLT to find an estimate of n as follows:
$$\begin{aligned}
P\{|S_n - E(S_n)| \le \epsilon\} &= P\bigg\{\bigg|\frac{1}{n}\sum_{i=1}^{n}\big(X_i - E(X)\big)\bigg| \le \epsilon\bigg\} \\
&= P\bigg\{\bigg|\frac{1}{\sigma_X\sqrt{n}}\sum_{i=1}^{n}\big(X_i - E(X)\big)\bigg| \le \frac{\epsilon\sqrt{n}}{\sigma_X}\bigg\} \\
&\approx 1 - 2Q\Big(\frac{\epsilon\sqrt{n}}{\sigma_X}\Big)
\end{aligned}$$
• Example: For ε = 0.1σX, δ = 0.001, set 2Q(0.1√n) = 0.001, so 0.1√n ≈ 3.3, or n ≈ 1089, much smaller than the n ≥ 10^5 obtained by the Chebyshev inequality
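
Since Q is the complementary cdf of N(0, 1), Q⁻¹ is available as `scipy.stats.norm.isf`. A sketch of the sample-size calculation (the function name is ours); the notes round Q⁻¹(0.0005) ≈ 3.29 up to 3.3, which gives 1089 instead of 1083:

```python
import math
from scipy.stats import norm

def clt_samples(eps_over_sigma: float, delta: float) -> int:
    """Approximate n such that P{|S_n - E(X)| <= eps} >= 1 - delta via the CLT."""
    z = norm.isf(delta / 2)                     # Q^{-1}(delta / 2)
    return math.ceil((z / eps_over_sigma) ** 2)

print(clt_samples(0.1, 0.001))  # 1083, versus 100000 from the Chebyshev bound
```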



CLT for Random Vectors

• The CLT applies to i.i.d. sequences of random vectors


• Let X1, X2, . . . , Xn, . . . be a sequence of i.i.d. k-dimensional random vectors with finite mean µ and nonsingular covariance matrix Σ. Define the sequence of random vectors Z1, Z2, . . . , Zn, . . . by
$$Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} (X_i - \mu)$$

• The Central Limit Theorem for random vectors states that as n → ∞


Zn → Z ∼ N (0, Σ) in distribution
• Example: Let X1, X2, . . . , Xn, . . . be a sequence of i.i.d. 2-dimensional random vectors with
$$f_{X_1}(x_{11}, x_{12}) = \begin{cases} x_{11} + x_{12} & 0 < x_{11} < 1,\ 0 < x_{12} < 1 \\ 0 & \text{otherwise} \end{cases}$$
The following plots show the joint pdf of $Y_n = \sum_{i=1}^{n} X_i$ for n = 1, 2, 3, 4. Note how quickly it looks Gaussian



[Figure: surface plots of the joint pdfs of Y1, Y2, Y3, and Y4; by n = 4 the joint pdf already looks close to a bivariate Gaussian]
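
A simulation sketch of the vector CLT for this density (the rejection sampler, sample sizes, and seed are our choices; f ≤ 2 on the unit square, and direct integration gives E(X) = (7/12, 7/12) and Σ = [[11/144, −1/144], [−1/144, 11/144]]):

```python
import numpy as np

rng = np.random.default_rng(6)

def sample_f(size):
    """Rejection-sample from f(x1, x2) = x1 + x2 on the unit square (f <= 2)."""
    out = np.empty((0, 2))
    while len(out) < size:
        cand = rng.random((2 * size, 2))
        accept = 2.0 * rng.random(len(cand)) <= cand.sum(axis=1)
        out = np.vstack([out, cand[accept]])
    return out[:size]

n, trials = 4, 50_000
mu = np.array([7.0, 7.0]) / 12                 # E(X), computed analytically
x = sample_f(n * trials).reshape(trials, n, 2)
z = (x - mu).sum(axis=1) / np.sqrt(n)          # Z_n = (1/sqrt(n)) sum (X_i - mu)
print(np.cov(z.T))                             # close to Sigma for every n;
                                               # the CLT adds that Z_n becomes Gaussian
```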



Relationships Between Types of Convergence

• The following figure summarizes the relationships between the different types of convergence we discussed
[Figure: implication diagram: convergence w.p.1 and convergence in m.s. each imply convergence in probability, which in turn implies convergence in distribution; w.p.1 and m.s. do not imply each other (see Examples 1 and 2)]
