CHAPTER 7
SUM OF INDEPENDENT RANDOM VARIABLES
CHAPTER CONTENTS
Convolution
Reproductive Property
Law of Large Numbers
Central Limit Theorem
In this chapter, the behavior of the sum of independent random variables is first
investigated. Then the limiting behavior of the mean of independent and identically
distributed (i.i.d.) samples when the number of samples tends to infinity is discussed.
7.1 CONVOLUTION
Let x and y be independent discrete random variables, and let z be their sum:
z = x + y.
Since x + y = z is satisfied when y = z − x, the probability of z can be computed by summing the joint probability of x and z − x over all x. For example, let z be the sum of the outcomes of two 6-sided dice, x and y. When z = 7, the dice take
(x, y) = (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1),
and summing up the probabilities of these combinations gives the probability of z = 7.
The probability mass function of z, denoted by k(z), can be expressed as

k(z) = ∑_x g(x) h(z − x),

where g(x) and h(y) are the probability mass functions of x and y, respectively. This operation is called the convolution of x and y and is denoted by x ∗ y. When x and y are continuous, the probability density function of z = x + y, denoted by k(z), is given similarly as

k(z) = ∫ g(x) h(z − x) dx,

where g(x) and h(y) are the probability density functions of x and y, respectively.
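As a quick numerical illustration of this operation, the following sketch computes the distribution of the sum of two fair dice with NumPy's convolve; the variable names (die_pmf, sum_pmf) are only illustrative.

import numpy as np

# Probability mass function of one fair 6-sided die (outcomes 1, ..., 6).
die_pmf = np.full(6, 1 / 6)

# Convolving the pmf with itself gives the pmf of z = x + y;
# entry j of the result corresponds to z = j + 2 (outcomes 2, ..., 12).
sum_pmf = np.convolve(die_pmf, die_pmf)

for z, p in enumerate(sum_pmf, start=2):
    print(f"Pr(z = {z:2d}) = {p:.4f}")

# Pr(z = 7) = 6/36, matching the six combinations listed above.
print(np.isclose(sum_pmf[7 - 2], 6 / 36))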
7.3 LAW OF LARGE NUMBERS
The average x̄ = (1/n) ∑_{i=1}^n x_i of i.i.d. samples x_1, . . . , x_n with expectation μ and variance σ² satisfies

E[x̄] = (1/n) ∑_{i=1}^n E[x_i] = μ,

V[x̄] = (1/n²) ∑_{i=1}^n V[x_i] = σ²/n.
This means that the average of n samples has the same expectation as a single sample, while the variance is reduced by a factor of 1/n. Thus, as the number of samples tends to infinity, the variance vanishes and the sample average x̄ converges to the true expectation μ.
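This 1/n reduction of the variance is easy to confirm by simulation. The following sketch, with illustrative values μ = 1 and σ = 2, draws many independent sample averages from N(μ, σ²) and compares their empirical variance with σ²/n.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0      # expectation and standard deviation of a single sample

for n in (1, 10, 100, 1000):
    # 10,000 independent realizations of the average of n samples.
    averages = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
    # The empirical mean stays near mu while the variance shrinks like sigma^2 / n.
    print(n, averages.mean(), averages.var(), sigma**2 / n)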
The weak law of large numbers asserts this fact more precisely. When the original distribution has expectation μ, the characteristic function φ_x̄(t) of the average x̄ of n independent samples can be expressed by using the characteristic function φ_x(t) of a single sample x as

φ_x̄(t) = (φ_x(t/n))^n = (1 + iμt/n + · · ·)^n.
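The limiting behavior of this expansion, which is used below to derive the weak law, can be checked numerically. The following sketch keeps only the first-order term and uses illustrative values of μ and t; the computed values approach e^{iμt} as n grows.

import numpy as np

mu, t = 1.5, 0.7   # illustrative values for the expectation and the argument t

for n in (10, 1_000, 100_000):
    # Keep only the first-order term of the expansion: (1 + i*mu*t/n)^n.
    approx = (1 + 1j * mu * t / n) ** n
    print(n, approx)

# The values approach e^{i*mu*t}, the characteristic function of the constant mu.
print(np.exp(1j * mu * t))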
The mean of samples x_1, . . . , x_n usually refers to the arithmetic mean, but other means such as the geometric mean and the harmonic mean are also often used:

Arithmetic mean: (1/n) ∑_{i=1}^n x_i,

Geometric mean: (∏_{i=1}^n x_i)^{1/n},

Harmonic mean: 1 / ((1/n) ∑_{i=1}^n 1/x_i).
For example, suppose that a weight increased by 2%, 12%, and 4% in the last three years, respectively. Then the average increase rate is given not by the arithmetic mean (0.02 + 0.12 + 0.04)/3 = 0.06, but by the geometric mean of the growth factors, (1.02 × 1.12 × 1.04)^{1/3} ≈ 1.0591, i.e., an increase of about 5.91% per year. When climbing up a mountain at 2 kilometers per hour and going back down at 6 kilometers per hour, the mean velocity is given not by the arithmetic mean (2 + 6)/2 = 4 but by the harmonic mean 2d/(d/2 + d/6) = 3 for distance d, according to the formula “velocity = distance/time.” When x_1, . . . , x_n > 0, the arithmetic, geometric, and harmonic means satisfy

(1/n) ∑_{i=1}^n x_i ≥ (∏_{i=1}^n x_i)^{1/n} ≥ 1 / ((1/n) ∑_{i=1}^n 1/x_i),
and the equality holds if and only if x_1 = · · · = x_n. The generalized mean is defined for p ≠ 0 as

((1/n) ∑_{i=1}^n x_i^p)^{1/p}.

The generalized mean reduces to the arithmetic mean when p = 1, the geometric mean when p → 0, and the harmonic mean when p = −1. The maximum of x_1, . . . , x_n is obtained in the limit p → +∞, and the minimum of x_1, . . . , x_n in the limit p → −∞. When p = 2, the generalized mean is called the root mean square.
FIGURE 7.1
Arithmetic mean, geometric mean, and harmonic mean.
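For reference, the following sketch evaluates the generalized mean for several values of p, reproducing the geometric mean (p → 0) and the harmonic mean (p = −1) of the two examples above and checking the ordering of the three means; the function name generalized_mean is only illustrative.

import numpy as np

def generalized_mean(x, p):
    # ((1/n) * sum(x_i^p))^(1/p); the limit p -> 0 is the geometric mean.
    x = np.asarray(x, dtype=float)
    if p == 0:
        return float(np.exp(np.mean(np.log(x))))
    return float(np.mean(x ** p) ** (1 / p))

# Weight example: geometric mean of the yearly growth factors.
print(generalized_mean([1.02, 1.12, 1.04], 0))        # about 1.0591

# Mountain example: harmonic mean of the two speeds.
print(generalized_mean([2.0, 6.0], -1))               # 3.0

# Arithmetic >= geometric >= harmonic for positive samples.
x = [1.0, 2.0, 3.0, 4.0]
print(generalized_mean(x, 1), generalized_mean(x, 0), generalized_mean(x, -1))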
Then Eq. (3.5) shows that taking the limit n → ∞ of the above expansion of φ_x̄(t) yields

lim_{n→∞} φ_x̄(t) = e^{iμt},

which is the characteristic function of a random variable that always takes the value μ. Consequently,

lim_{n→∞} Pr(|x̄ − μ| ≥ ε) = 0

holds for any ε > 0. This is the weak law of large numbers, and x̄ is said to converge in probability to μ. If the original distribution has a finite variance, the proof is straightforward by considering the limit n → ∞ of Chebyshev’s inequality (8.4) (see Section 8.2.2).

FIGURE 7.2
Law of large numbers. (a) Standard normal distribution N(0, 1); (b) standard Cauchy distribution Ca(0, 1).
On the other hand, the strong law of large numbers asserts

Pr( lim_{n→∞} x̄ = μ ) = 1,

and x̄ is said to converge almost surely to μ. Almost sure convergence is a more direct and stronger notion than convergence in probability.
Fig. 7.2 exhibits the behavior of the sample average x̄ = (1/n) ∑_{i=1}^n x_i when x_1, . . . , x_n are i.i.d. with the standard normal distribution N(0, 1) or the standard Cauchy distribution Ca(0, 1). The graphs show that, for the normal distribution, which possesses an expectation, increasing n makes the sample average x̄ converge to the true expectation 0. On the other hand, for the Cauchy distribution, which does not have an expectation, the sample average x̄ does not converge no matter how large n becomes.
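The behavior shown in Fig. 7.2 can be reproduced with a short simulation. The following sketch tracks the running sample average of i.i.d. draws from N(0, 1) and Ca(0, 1); the random seed and sample size are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Running sample averages after 1, 2, ..., n samples.
normal_avg = np.cumsum(rng.standard_normal(n)) / np.arange(1, n + 1)
cauchy_avg = np.cumsum(rng.standard_cauchy(n)) / np.arange(1, n + 1)

for k in (10, 1_000, 100_000):
    # The normal averages approach 0; the Cauchy averages keep fluctuating.
    print(k, normal_avg[k - 1], cauchy_avg[k - 1])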
7.4 CENTRAL LIMIT THEOREM

FIGURE 7.3
Central limit theorem. The solid lines denote the normal densities.

The law of large numbers describes the limit n → ∞, but what distribution does the sample average follow for finite n? Fig. 7.3 exhibits the histograms of the sample averages
for the continuous uniform distribution U(0, 1), the exponential distribution Exp(1),
and the probability distribution used in Fig. 19.11, together with the normal densities
with the same expectation and variance. This shows that the histogram of the sample
average approaches the normal density as the number of samples, n, increases.
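A simulation in the spirit of Fig. 7.3 is sketched below: it draws sample averages of n = 30 uniform samples, standardizes them, and compares their histogram with the standard normal density; the number of trials and bins are arbitrary choices.

import numpy as np

rng = np.random.default_rng(2)
n, trials = 30, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)    # expectation and standard deviation of U(0, 1)

# Standardized sample averages z = (average - mu) / (sigma / sqrt(n)).
averages = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
z = (averages - mu) / (sigma / np.sqrt(n))

# Compare the empirical histogram with the standard normal density.
hist, edges = np.histogram(z, bins=16, range=(-4, 4), density=True)
centers = (edges[:-1] + edges[1:]) / 2
for c, h in zip(centers, hist):
    print(f"z = {c:5.2f}   empirical {h:.3f}   normal {np.exp(-c**2 / 2) / np.sqrt(2 * np.pi):.3f}")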
The central limit theorem asserts this fact more precisely: for the standardized random variable

z = (x̄ − μ) / (σ/√n),