Lecture 5: Convergence and Limit Theorems
• Motivation
• One of the key questions in statistical signal processing is how to estimate the
statistics of a r.v., e.g., its mean, variance, or distribution
To estimate such a statistic, we collect samples and use an estimator in the form
of a sample average
◦ How good is the estimator? Does it converge to the true statistic?
◦ How many samples do we need to ensure with some confidence that we are
within a certain range of the true value of the statistic?
• Another key question in statistical signal processing is how to estimate a signal
from noisy observations, e.g., using MSE or linear MSE
◦ Does the estimator converge to the true signal?
◦ How many observations do we need to achieve a desired estimation accuracy?
• The subject of convergence and limit theorems for r.v.s addresses such questions
[Figure: sample paths of sn versus n, plotted for n up to 10, 100, 1000, and 10000 on successively finer vertical scales]
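As a rough companion to plots like these, here is a minimal Python sketch; it assumes for illustration that sn is the running sample mean of i.i.d. zero-mean, unit-variance samples, which matches the shrinking vertical scales above:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)              # i.i.d. zero-mean, unit-variance samples
s = np.cumsum(x) / np.arange(1, x.size + 1)  # running sample mean s_n

# Plot the same sample path on successively finer scales, as in the figure above
fig, axes = plt.subplots(4, 1, figsize=(6, 8))
for ax, n_max in zip(axes, [10, 100, 1_000, 10_000]):
    ax.plot(np.arange(1, n_max + 1), s[:n_max])
    ax.set_ylabel("s_n")
axes[-1].set_xlabel("n")
plt.tight_layout()
plt.show()
```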
• Recall that a sequence of numbers x1, x2, . . . , xn, . . . converges to x if for every
ε > 0, there exists an m(ε) such that |xn − x| < ε for every n ≥ m(ε)
• Now consider a sequence of r.v.s X1, X2, . . . , Xn, . . . all defined on the same
probability space Ω. For every ω ∈ Ω we obtain a sample sequence (sequence of
numbers) X1(ω), X2(ω), . . . , Xn(ω), . . .
• A sequence X1, X2, X3, . . . of r.v.s is said to converge to a random variable X
with probability 1 (w.p.1, also called almost surely) if
P{ω : lim_{n→∞} Xn(ω) = X(ω)} = 1
• This means that the set of sample paths that converge to X(ω), in the sense of
a sequence converging to a limit, has probability 1
• Equivalently, X1, X2, . . . , Xn, . . . converges w.p.1 if for every ε > 0,
lim_{m→∞} P{|Xn − X| < ε for every n ≥ m} = 1
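The probability inside this limit can be estimated by simulation. A minimal sketch, assuming for concreteness that Xn is the running sample mean of i.i.d. ±1 steps (so the limit X = 0), with arbitrary choices ε = 0.1 and a finite horizon n_max standing in for "every n ≥ m":

```python
import numpy as np

rng = np.random.default_rng(0)
eps, n_max, trials = 0.1, 2_000, 500   # arbitrary illustration parameters

# Each row is one sample path X_1(w), ..., X_nmax(w): running means of i.i.d. +/-1 steps
steps = rng.choice([-1.0, 1.0], size=(trials, n_max))
paths = np.cumsum(steps, axis=1) / np.arange(1, n_max + 1)

for m in [10, 100, 1_000]:
    # Fraction of paths with |X_n - 0| < eps for every n >= m (checked up to n_max)
    inside = np.all(np.abs(paths[:, m - 1:]) < eps, axis=1)
    print(f"m = {m:5d}   estimated probability = {inside.mean():.3f}")
```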
• A sequence can converge w.p.1 and yet fail to converge in m.s.
• Convergence in m.s. implies convergence in probability: by the Markov inequality,
P{|Xn − X| ≥ ε} ≤ E[(Xn − X)²]/ε² → 0, i.e., Xn → X in probability
• The converse is not necessarily true. In Example 3, Xn converges in probability.
Now consider
E[(Xn − 0)²] = 0 · (1 − 1/n) + n² · (1/n) = n → ∞ as n → ∞
Thus Xn does not converge in m.s.
• So convergence in probability is weaker than both convergence w.p.1 and in m.s.
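A quick numerical check of Example 3, assuming (consistently with the second-moment computation above, since the example's original statement is not reproduced here) that Xn = n with probability 1/n and Xn = 0 otherwise:

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 200_000
for n in [10, 100, 1_000]:
    # Assumed form of Example 3: X_n = n w.p. 1/n, and 0 otherwise
    xn = np.where(rng.random(trials) < 1.0 / n, float(n), 0.0)
    # P{X_n != 0} -> 0 (convergence in probability to 0), while E[X_n^2] ~ n grows
    print(f"n = {n:5d}   P(Xn != 0) ~ {np.mean(xn != 0):.4f}   E[Xn^2] ~ {np.mean(xn**2):.1f}")
```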
• The WLLN states that if X1, X2, . . . , Xn, . . . is a sequence of i.i.d. r.v.s with
finite mean E(X) and variance Var(X), then
Sn = (1/n) ∑_{i=1}^n Xi → E(X) in probability
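A minimal simulation of the WLLN, with an arbitrary choice of distribution (Exp(1), so E(X) = 1 and Var(X) = 1); any distribution with finite mean and variance would behave similarly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)   # i.i.d. Exp(1) samples, E(X) = 1

for n in [10, 100, 1_000, 100_000]:
    s_n = x[:n].mean()                          # sample mean S_n
    print(f"n = {n:6d}   S_n = {s_n:.4f}   |S_n - E(X)| = {abs(s_n - 1.0):.4f}")
```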
[Figure: pdfs of Z1, Z2, Z4, and Z16, plotted over z ∈ [−3, 3]]
• Example: Let X1, X2, . . . be i.i.d. Bern(1/2). The normalized sum is
Zn = ∑_{i=1}^n (Xi − 0.5) / √(n/4)
The following plots show the cdf of Zn for n = 10, 20, 160. Zn is discrete and thus has no pdf, but its cdf converges to the
Gaussian cdf
[Figure: cdfs of Z10, Z20, and Z160, plotted over z ∈ [−3, 3]]
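The cdf convergence in this example is easy to check numerically. A minimal Monte Carlo sketch (number of trials and test points are arbitrary; the Gaussian cdf Φ is computed from the error function):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
trials = 50_000

def phi(z):
    # Standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for n in [10, 20, 160]:
    x = rng.integers(0, 2, size=(trials, n))           # i.i.d. Bern(1/2) samples
    z_n = (x - 0.5).sum(axis=1) / math.sqrt(n / 4.0)   # normalized sum Z_n
    for z in [-1.0, 0.0, 1.0]:
        print(f"n = {n:3d}  z = {z:+.1f}  F_Zn(z) ~ {np.mean(z_n <= z):.3f}  Phi(z) = {phi(z):.3f}")
```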
• Let X1, X2, . . . , Xn be i.i.d. with finite mean E(X) and variance Var(X) and
let Sn be the sample mean
• Given ε, δ > 0, how large should n, the number of samples, be so that
P{|Sn − E(X)| ≤ ε} ≥ 1 − δ ?
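One standard way to answer this (a sketch of the Chebyshev argument, not necessarily the only approach) is Chebyshev's inequality: P{|Sn − E(X)| ≥ ε} ≤ Var(Sn)/ε² = Var(X)/(nε²), so any n ≥ Var(X)/(ε²δ) suffices:

```python
import math

def samples_needed(var_x: float, eps: float, delta: float) -> int:
    # Chebyshev bound: P{|S_n - E(X)| >= eps} <= Var(X) / (n * eps^2)
    # Requiring this bound to be <= delta gives n >= Var(X) / (eps^2 * delta)
    return math.ceil(var_x / (eps ** 2 * delta))

# Example: Var(X) = 1, eps = 0.1, delta = 0.05  ->  n >= 2000 samples suffice
print(samples_needed(1.0, 0.1, 0.05))
```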
[Figure: pdfs of Y1, Y2, Y3, and Y4]
• The following figure summarizes the relationships between the different types of
convergence we discussed
[Figure: convergence with probability 1 and convergence in mean square each imply convergence in probability, which in turn implies convergence in distribution]