CH 4
1. Geometric Series:
$$1 + r + r^2 + r^3 + \dots = \sum_{x=0}^{\infty} r^x = \frac{1}{1-r}, \quad \text{when } |r| < 1.$$
This formula proves that $\sum_{x=0}^{\infty} P(X = x) = 1$ when $X \sim \text{Geometric}(p)$:
$$
\begin{aligned}
\sum_{x=0}^{\infty} P(X = x) &= \sum_{x=0}^{\infty} p(1-p)^x \\
&= p \sum_{x=0}^{\infty} (1-p)^x \\
&= \frac{p}{1-(1-p)} \qquad (\text{because } |1-p| < 1) \\
&= 1.
\end{aligned}
$$
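Both identities are easy to check numerically. The sketch below (the values of $r$ and $p$ are arbitrary illustrative choices, not from the notes) truncates each series and compares it with the closed form:

```python
# Truncated-sum check of the geometric series identity
# sum_{x=0}^inf r^x = 1/(1-r) for |r| < 1 (r values chosen arbitrarily).
for r in (0.5, -0.7, 0.99):
    truncated = sum(r**x for x in range(2000))
    assert abs(truncated - 1 / (1 - r)) < 1e-4, r

# The same identity makes the Geometric(p) probabilities sum to 1.
p = 0.3
total = sum(p * (1 - p)**x for x in range(1000))
assert abs(total - 1.0) < 1e-12
print("geometric series checks pass")
```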
$$= e^{-\lambda}\, e^{\lambda}$$
$$= 1.$$
Note: Another useful identity is: $e^{\lambda} = \lim_{n \to \infty} \left(1 + \dfrac{\lambda}{n}\right)^n$ for $\lambda \in \mathbb{R}$.
The name probability generating function also gives us another clue to the role
of the PGF. The PGF can be used to generate all the probabilities of the
distribution. This is generally tedious and is not often an efficient way of
calculating probabilities. However, the fact that it can be done demonstrates
that the PGF tells us everything there is to know about the distribution.
$$G_X(s) = E\big(s^X\big) = \sum_{x=0}^{\infty} s^x\, P(X = x).$$
2. $G_X(1) = 1$: $\quad G_X(1) = \displaystyle\sum_{x=0}^{\infty} 1^x\, P(X = x) = \sum_{x=0}^{\infty} P(X = x) = 1.$
$X \sim \text{Bin}(n = 4,\ p = 0.2)$

[Figure: the PGF $G(s)$ of $X \sim \text{Bin}(4,\,0.2)$, plotted for $-20 \le s \le 10$.]

Check $G_X(0)$:
$$G_X(0) = (p \cdot 0 + q)^n = q^n = P(X = 0).$$
Check $G_X(1)$:
$$G_X(1) = (p \cdot 1 + q)^n = 1^n = 1.$$
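Both checks can be confirmed numerically by evaluating the PGF directly from its definition $G_X(s) = \sum_x s^x P(X=x)$ (an illustrative sketch using the example's values $n = 4$, $p = 0.2$):

```python
from math import comb

# Evaluate the PGF of X ~ Bin(n=4, p=0.2) directly from its definition,
# and confirm G_X(0) = q^n and G_X(1) = 1.
n, p = 4, 0.2
q = 1 - p

def pgf(s):
    return sum(s**x * comb(n, x) * p**x * q**(n - x) for x in range(n + 1))

assert abs(pgf(0) - q**n) < 1e-12                  # G_X(0) = q^n = P(X = 0)
assert abs(pgf(1) - 1.0) < 1e-12                   # G_X(1) = 1
assert abs(pgf(0.5) - (p * 0.5 + q)**n) < 1e-12    # matches (ps + q)^n
print("binomial PGF checks pass")
```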
$$G_X(s) = \sum_{x=0}^{\infty} s^x\, p(1-p)^x = p \sum_{x=0}^{\infty} (qs)^x = \frac{p}{1-qs} \quad \text{for all } s \text{ such that } |qs| < 1,$$
where $q = 1 - p$.

[Figure: the PGF $G(s) = p/(1-qs)$, plotted for $-5 \le s \le 5$.]

Thus
$$G_X(s) = \frac{p}{1-qs} \quad \text{for } |s| < \frac{1}{q}.$$
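A quick numerical comparison of the truncated power series against the closed form $p/(1-qs)$, at several $s$ inside the radius $1/q$ (the value $p = 0.4$ is an arbitrary illustrative choice):

```python
# Compare truncated sums of the Geometric(p) PGF with p/(1 - qs)
# at a few s values with |qs| < 1 (here 1/q = 5/3).
p = 0.4
q = 1 - p
for s in (-1.2, 0.0, 0.9, 1.5):
    series = sum(s**x * p * q**x for x in range(500))
    closed = p / (1 - q * s)
    assert abs(series - closed) < 1e-9, (s, series, closed)
print("Geometric PGF series matches p/(1-qs)")
```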
The probability generating function gets its name because the power series can
be expanded and differentiated to reveal the individual probabilities. Thus,
given only the PGF $G_X(s) = E(s^X)$, we can recover all probabilities $P(X = x)$.
For shorthand, write $p_x = P(X = x)$. Then
$$G_X(s) = E(s^X) = \sum_{x=0}^{\infty} p_x s^x = p_0 + p_1 s + p_2 s^2 + p_3 s^3 + p_4 s^4 + \dots$$
Thus $p_2 = P(X = 2) = \frac{1}{2} G_X''(0)$.
Third derivative: $G_X'''(s) = (3 \cdot 2 \cdot 1)\, p_3 + (4 \cdot 3 \cdot 2)\, p_4\, s + \dots$
Thus $p_3 = P(X = 3) = \frac{1}{3!} G_X'''(0)$.
In general:
$$p_n = P(X = n) = \frac{1}{n!} G_X^{(n)}(0) = \frac{1}{n!} \left[ \frac{d^n}{ds^n} G_X(s) \right]_{s=0}.$$
s
Example: Let X be a discrete random variable with PGF GX (s) = (2 + 3s2 ).
5
Find the distribution of X.
2 3 3
GX (s) = s + s : GX (0) = P(X = 0) = 0.
5 5
2 9 2 2
GX (s) = + s : GX (0) = P(X = 1) = .
5 5 5
18 1
GX (s) = s : G (0) = P(X = 2) = 0.
5 2 X
18 1 3
G
X (s) = : GX (0) = P(X = 3) = .
5 3! 5
(r) 1 (r)
GX (s) = 0 r 4 : G (s) = P(X = r) = 0 r 4.
r! X
Thus
1 with probability 2/5,
X=
3 with probability 3/5.
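As a sanity check, the recovered distribution can be compared against the original PGF numerically (an illustrative sketch, not part of the notes):

```python
# The distribution recovered above: P(X=1) = 2/5, P(X=3) = 3/5.
pmf = {1: 2/5, 3: 3/5}

def pgf(s):
    return sum(s**x * px for x, px in pmf.items())

# E(s^X) should reproduce G_X(s) = (s/5)(2 + 3s^2) at every s.
for s in (-1.0, 0.0, 0.3, 1.0, 2.0):
    assert abs(pgf(s) - (s / 5) * (2 + 3 * s**2)) < 1e-12, s
print("recovered distribution matches the PGF")
```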
Fact: If two power series agree on any interval containing 0, however small, then
all terms of the two series are equal.
Practical use: If we can show that two random variables have the same PGF in
some interval containing 0, then we have shown that the two random variables
have the same distribution.
Another way of expressing this is to say that the PGF of X tells us everything
there is to know about the distribution of X .
As well as calculating probabilities, we can also use the PGF to calculate the
moments of the distribution of X. The moments of a distribution are the mean,
variance, etc.
Theorem 4.4: Let X be a discrete random variable with PGF $G_X(s)$. Then:

1. $E(X) = G_X'(1)$.

2. $E\big\{X(X-1)(X-2)\dots(X-k+1)\big\} = G_X^{(k)}(1) = \left.\dfrac{d^k G_X(s)}{ds^k}\right|_{s=1}.$

(This is the $k$th factorial moment of $X$.)
1. $G_X(s) = \displaystyle\sum_{x=0}^{\infty} s^x p_x$, so $G_X'(s) = \displaystyle\sum_{x=1}^{\infty} x s^{x-1} p_x$, and hence $G_X'(1) = \displaystyle\sum_{x=1}^{\infty} x\, p_x = E(X)$.

2. $$G_X^{(k)}(s) = \frac{d^k G_X(s)}{ds^k} = \sum_{x=k}^{\infty} x(x-1)(x-2)\dots(x-k+1)\, s^{x-k}\, p_x,$$
so
$$G_X^{(k)}(1) = \sum_{x=k}^{\infty} x(x-1)(x-2)\dots(x-k+1)\, p_x = E\big\{X(X-1)(X-2)\dots(X-k+1)\big\}.$$
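Theorem 4.4(2) can be checked numerically, say for $k = 2$ and $X \sim \text{Bin}(6,\,0.3)$ (arbitrary illustrative values): a finite-difference approximation to $G_X''(1)$, using the closed-form PGF $(ps+q)^n$, should match $E\{X(X-1)\}$ computed directly from the pmf.

```python
from math import comb

# Check E{X(X-1)} = G''(1) for X ~ Bin(n=6, p=0.3).
n, p = 6, 0.3
q = 1 - p
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

def G(s):
    return (p * s + q)**n          # closed-form binomial PGF

h = 1e-4
second_deriv = (G(1 + h) - 2 * G(1) + G(1 - h)) / h**2   # central difference
fact_moment = sum(x * (x - 1) * pmf[x] for x in range(n + 1))

assert abs(second_deriv - fact_moment) < 1e-4
assert abs(fact_moment - n * (n - 1) * p**2) < 1e-12     # known closed form
print(round(second_deriv, 6), fact_moment)
```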
Solution:
$$G_X(s) = e^{\lambda(s-1)}.$$

[Figure: $G(s) = e^{\lambda(s-1)}$, plotted for $0 \le s \le 1.5$.]

$$E(X) = G_X'(1) = \lambda.$$
For the variance, consider
$$E\big\{X(X-1)\big\} = G_X''(1) = \lambda^2 e^{\lambda(s-1)} \Big|_{s=1} = \lambda^2.$$
So
$$
\begin{aligned}
\text{Var}(X) &= E(X^2) - (EX)^2 \\
&= E\big\{X(X-1)\big\} + EX - (EX)^2 \\
&= \lambda^2 + \lambda - \lambda^2 \\
&= \lambda.
\end{aligned}
$$
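These two results can be verified with finite-difference derivatives of $G_X(s) = e^{\lambda(s-1)}$ (the value $\lambda = 2.5$ is an arbitrary illustrative choice):

```python
from math import exp

# Numerically check E(X) = G'(1) = lam and Var(X) = lam for Poisson(lam).
lam = 2.5

def G(s):
    return exp(lam * (s - 1))

h = 1e-5
G1 = (G(1 + h) - G(1 - h)) / (2 * h)              # ~ G'(1)  = E(X)
G2 = (G(1 + h) - 2 * G(1) + G(1 - h)) / h**2      # ~ G''(1) = E{X(X-1)}
var = G2 + G1 - G1**2                             # Var(X) = G''(1) + EX - (EX)^2

assert abs(G1 - lam) < 1e-6
assert abs(var - lam) < 1e-4
print(round(G1, 6), round(var, 6))
```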
One of the PGF's greatest strengths is that it turns a sum into a product:
$$E\big(s^{X_1 + X_2}\big) = E\big(s^{X_1}\, s^{X_2}\big).$$
This makes the PGF useful for finding the probabilities and moments of a sum of independent random variables.
If $X \sim \text{Poisson}(\lambda)$ and $Y \sim \text{Poisson}(\mu)$ are independent, then
$$G_{X+Y}(s) = G_X(s)\, G_Y(s) = e^{\lambda(s-1)}\, e^{\mu(s-1)} = e^{(\lambda+\mu)(s-1)}.$$
But this is the PGF of the Poisson($\lambda + \mu$) distribution. So, by the uniqueness of PGFs, $X + Y \sim \text{Poisson}(\lambda + \mu)$.
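The same conclusion can be checked without PGFs by convolving the two pmfs directly (the values $\lambda = 1.5$, $\mu = 2.0$ are arbitrary illustrative choices):

```python
from math import exp, factorial

# If X ~ Poisson(lam) and Y ~ Poisson(mu) are independent, the pmf of
# X + Y is the convolution of the two pmfs; it should equal Poisson(lam+mu).
lam, mu = 1.5, 2.0

def pois(k, rate):
    return exp(-rate) * rate**k / factorial(k)

for t in range(10):
    conv = sum(pois(x, lam) * pois(t - x, mu) for x in range(t + 1))
    assert abs(conv - pois(t, lam + mu)) < 1e-12, t
print("Poisson convolution matches Poisson(lam+mu)")
```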
Proof:
$$
\begin{aligned}
G_{T_N}(s) = E\big(s^{X_1 + \dots + X_N}\big)
&= E_N\Big\{ E\big(s^{X_1 + \dots + X_N} \,\big|\, N\big) \Big\} \quad \text{(conditional expectation)} \\
&= E_N\Big\{ E\big(s^{X_1} \dots s^{X_N} \,\big|\, N\big) \Big\} \\
&= E_N\Big\{ E\big(s^{X_1} \dots s^{X_N}\big) \Big\} \quad (X_i\text{'s are indept of } N) \\
&= E_N\Big\{ E\big(s^{X_1}\big) \dots E\big(s^{X_N}\big) \Big\} \quad (X_i\text{'s are indept of each other}) \\
&= E_N\big\{ (G_X(s))^N \big\} \\
&= G_N\big(G_X(s)\big) \quad \text{(by definition of } G_N\text{)}.
\end{aligned}
$$
$$E(T_N) = G_{T_N}'(1) = \left.\frac{d}{ds}\, G_N\big(G_X(s)\big)\right|_{s=1} = \Big. G_N'\big(G_X(s)\big)\, G_X'(s) \Big|_{s=1} = G_N'(1)\, G_X'(1) = E(N)\, E(X_1),$$
using $G_X(1) = 1$.
Solution:
$$\text{Let } X_i = \begin{cases} 1 & \text{if the heron catches a fish on visit } i, \\ 0 & \text{otherwise.} \end{cases}$$
Then $T = X_1 + X_2 + \dots + X_N$ (randomly stopped sum), so
$$G_T(s) = G_N\big(G_X(s)\big).$$
Now
$$G_X(s) = E(s^X) = s^0\, P(X = 0) + s^1\, P(X = 1) = 1 - p + ps.$$
Also,
$$G_N(r) = \sum_{n=0}^{\infty} r^n\, P(N = n) = \sum_{n=0}^{\infty} r^n (1-\theta)\theta^n = (1-\theta) \sum_{n=0}^{\infty} (\theta r)^n = \frac{1-\theta}{1-\theta r} \qquad (r < 1/\theta).$$
So
$$G_T(s) = \frac{1-\theta}{1 - \theta\, G_X(s)} \qquad (\text{putting } r = G_X(s)),$$
giving:
$$G_T(s) = \frac{1-\theta}{1 - \theta(1 - p + ps)} = \frac{1-\theta}{1 - \theta + \theta p - \theta p s}$$

[could this be Geometric? $G_T(s) = \dfrac{1-\alpha}{1-\alpha s}$ for some $\alpha$?]

$$
\begin{aligned}
G_T(s) &= \frac{1-\theta}{(1 - \theta + \theta p) - \theta p s} \\[4pt]
&= \frac{\left(\dfrac{1-\theta}{1 - \theta + \theta p}\right)}{\dfrac{(1 - \theta + \theta p) - \theta p s}{1 - \theta + \theta p}} \\[4pt]
&= \frac{\left(\dfrac{1-\theta}{1 - \theta + \theta p}\right)}{1 - \left(\dfrac{\theta p}{1 - \theta + \theta p}\right) s}\,,
\end{aligned}
$$
using $\dfrac{(1 - \theta + \theta p) - \theta p}{1 - \theta + \theta p} = \dfrac{1-\theta}{1 - \theta + \theta p}$.

This is the PGF of the Geometric$\left(1 - \dfrac{\theta p}{1 - \theta + \theta p}\right)$ distribution, so by uniqueness of PGFs, we have:
$$T \sim \text{Geometric}\left(\frac{1-\theta}{1 - \theta + \theta p}\right).$$
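This result is easy to test by simulation. The sketch below (the values $\theta = 0.8$, $p = 0.4$ are arbitrary, and `sample_T` is a helper written for this check, not part of the notes) draws many randomly stopped sums and compares the empirical pmf of $T$ with the Geometric pmf $\pi(1-\pi)^t$ where $\pi = (1-\theta)/(1-\theta+\theta p)$:

```python
import random

random.seed(42)
theta, p = 0.8, 0.4
# Predicted parameter from the PGF derivation:
pi = (1 - theta) / (1 - theta + theta * p)

def sample_T():
    """One realisation of T: N ~ Geometric(1-theta) visits, Bernoulli(p) catches."""
    n = 0
    while random.random() < theta:   # P(N = n) = (1 - theta) * theta**n
        n += 1
    return sum(random.random() < p for _ in range(n))

trials = 100_000
counts = {}
for _ in range(trials):
    t = sample_T()
    counts[t] = counts.get(t, 0) + 1

# Compare the empirical pmf with the Geometric(pi) pmf pi*(1-pi)**t.
for t in range(4):
    observed = counts.get(t, 0) / trials
    expected = pi * (1 - pi)**t
    assert abs(observed - expected) < 0.01, (t, observed, expected)
print("T matches Geometric(pi) with pi =", round(pi, 4))
```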
Here are the first few steps of solving the heron problem without the PGF.
Recall the problem:
Let $N \sim \text{Geometric}(1-\theta)$, so $P(N = n) = (1-\theta)\theta^n$;
Let $X_1, X_2, \dots$ be independent of each other and of $N$, with $X_i \sim \text{Binomial}(1, p)$
(remember $X_i = 1$ with probability $p$, and $0$ otherwise);
Let T = X1 + . . . + XN be the randomly stopped sum;
Find the distribution of T .
Without using the PGF, we would tackle this by looking for an expression for
P(T = t) for any t. Once we have obtained that expression, we might be able
to see that T has a distribution we recognise (e.g. Geometric), or otherwise we
would just state that T is defined by the probability function we have obtained.
$$P(T = t) = \sum_{n=0}^{\infty} P(T = t \mid N = n)\, P(N = n). \qquad (\star)$$
Back to the heron problem: we are lucky in this case that we know the distribution of $(T \mid N = n)$ is Binomial$(n, p)$, so
$$P(T = t \mid N = n) = \binom{n}{t}\, p^t (1-p)^{n-t} \quad \text{for } t = 0, 1, \dots, n.$$
Thus
$$
\begin{aligned}
P(T = t) &= \sum_{n=t}^{\infty} \binom{n}{t}\, p^t (1-p)^{n-t}\, (1-\theta)\theta^n \\
&= (1-\theta) \left(\frac{p}{1-p}\right)^t\, \sum_{n=t}^{\infty} \binom{n}{t} \big[\theta(1-p)\big]^n \qquad (\star\star) \\
&= \dots ?
\end{aligned}
$$
As it happens, we can evaluate the sum in $(\star\star)$ using the fact that Negative Binomial probabilities sum to 1. You can try this if you like, but it is quite tricky. [Hint: use the Negative Binomial$\big(t+1,\ 1-\theta(1-p)\big)$ distribution.]
Overall, we obtain the same answer that $T \sim \text{Geometric}\left(\dfrac{1-\theta}{1-\theta+\theta p}\right)$, but hopefully you can see why the PGF is so useful.
We have been using PGFs throughout this chapter without paying much attention to their mathematical properties. For example, are we sure that the power series $G_X(s) = \sum_{x=0}^{\infty} s^x P(X = x)$ converges? Can we differentiate and integrate the infinite power series term by term as we did in Section 4.4? When we said in Section 4.4 that $E(X) = G_X'(1)$, can we be sure that $G_X(1)$ and its derivative $G_X'(1)$ even exist?
(No general statement is made about what happens when |s| = R.)
Note: This gives us the surprising result that the set of $s$ for which the PGF $G_X(s)$ converges is symmetric about 0: the PGF converges for all $s \in (-R, R)$, and for no $s < -R$ or $s > R$.
This is surprising because the PGF itself is not usually symmetric about 0: i.e. $G_X(-s) \neq G_X(s)$ in general.
As in Section 4.2,
$$G_X(s) = \sum_{x=0}^{\infty} s^x\, (0.8)(0.2)^x = 0.8 \sum_{x=0}^{\infty} (0.2s)^x = \frac{0.8}{1 - 0.2s} \quad \text{for all } s \text{ such that } |0.2s| < 1.$$
This is valid for all $s$ with $|0.2s| < 1$, so it is valid for all $s$ with $|s| < \frac{1}{0.2} = 5$ (i.e. $-5 < s < 5$).
The radius of convergence is $R = 5$.
The figure shows the PGF of the Geometric($p = 0.8$) distribution, with its radius of convergence $R = 5$. Note that although the convergence set $(-5, 5)$ is symmetric about 0, the function $G_X(s) = p/(1-qs) = 4/(5-s)$ is not.

[Figure: $G_X(s) = 4/(5-s)$, plotted for $-5 < s < 5$.]
Radius of Convergence
At the positive end, as $s \uparrow 5$, both $G_X(s)$ and $p/(1-qs)$ approach infinity. So the PGF is (left)-continuous at $+R$:
$$\lim_{s \uparrow 5} G_X(s) = G_X(5) = \infty.$$
As in Section 4.2,
$$G_X(s) = \sum_{x=0}^{n} s^x \binom{n}{x} p^x q^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (ps)^x q^{n-x} = (ps + q)^n,$$
by the Binomial Theorem.
Abel's theorem states that this sort of effect can never happen at $s = 1$ (or at $+R$). In particular, $G_X(s)$ is always left-continuous at $s = 1$:
$$\lim_{s \uparrow 1} G_X(s) = G_X(1) \quad \text{always, even if } G_X(1) = \infty.$$
Note: Remember that the radius of convergence $R \ge 1$ for any PGF, so Abel's Theorem means that even in the worst-case scenario when $R = 1$, we can still trust that the PGF will be continuous at $s = 1$. (By contrast, we cannot be sure that the PGF will be continuous at the lower limit $-R$.)
Abel's Theorem means that for any PGF, we can write $G_X(1)$ as shorthand for $\lim_{s \uparrow 1} G_X(s)$.
It also clarifies our proof that $E(X) = G_X'(1)$ from Section 4.4. If we assume that term-by-term differentiation is allowed for $G_X(s)$ (see below), then the proof on page 81 gives:
$$G_X(s) = \sum_{x=0}^{\infty} s^x p_x,$$
$$\text{so } G_X'(s) = \sum_{x=1}^{\infty} x s^{x-1} p_x \quad \text{(term-by-term differentiation: see below)}.$$
Abel's Theorem then guarantees that
$$E(X) = \sum_{x=1}^{\infty} x\, p_x = G_X'(1) = \lim_{s \uparrow 1} G_X'(s),$$
whether or not this limit is finite.
We have stated that the PGF converges for all $|s| < R$ for some $R$. In fact, the probability generating function converges absolutely if $|s| < R$. Absolute convergence is stronger than convergence alone: it means that the sum of absolute values, $\sum_{x=0}^{\infty} \big|s^x P(X = x)\big|$, also converges. When two series both converge absolutely, the product series also converges absolutely. This guarantees that $G_X(s) \cdot G_Y(s)$ is absolutely convergent for any two random variables $X$ and $Y$. This is useful because $G_X(s) \cdot G_Y(s) = G_{X+Y}(s)$ if $X$ and $Y$ are independent.
The PGF also converges uniformly on any set $\{s : |s| \le R'\}$ where $R' < R$. Intuitively, this means that the speed of convergence does not depend upon the value of $s$. Thus a value $n_0$ can be found such that for all values of $n \ge n_0$, the finite sum $\sum_{x=0}^{n} s^x P(X = x)$ is simultaneously close to the converged value $G_X(s)$, for all $s$ with $|s| \le R'$. In mathematical notation: $\forall \epsilon > 0$, $\exists n_0 \in \mathbb{Z}$ such that $\forall s$ with $|s| \le R'$, and $\forall n \ge n_0$,
$$\left| \sum_{x=0}^{n} s^x P(X = x) - G_X(s) \right| < \epsilon.$$
Fact: Let $G_X(s) = E(s^X) = \sum_{x=0}^{\infty} s^x P(X = x)$, and let $|s| < R$.

1. $$G_X'(s) = \frac{d}{ds} \sum_{x=0}^{\infty} s^x P(X = x) = \sum_{x=0}^{\infty} \frac{d}{ds}\big(s^x P(X = x)\big) = \sum_{x=1}^{\infty} x s^{x-1} P(X = x).$$
(term by term differentiation).

2. $$\int_a^b G_X(s)\, ds = \int_a^b \sum_{x=0}^{\infty} s^x P(X = x)\, ds = \sum_{x=0}^{\infty} \int_a^b s^x P(X = x)\, ds = \sum_{x=0}^{\infty} P(X = x) \left[\frac{s^{x+1}}{x+1}\right]_a^b \quad \text{for } -R < a < b < R.$$
(term by term integration).
The transition diagram below shows the symmetric random walk (all transitions have probability $p = 1/2$):

[Transition diagram: states $\dots, -2, -1, 0, 1, 2, 3, \dots$, with arrows of probability $1/2$ between each pair of neighbouring states.]
Question:
What is the key difference between the random walk and the gambler's ruin?
The random walk has an INFINITE state space: it never stops. The gambler's ruin stops at both ends.
However, there is a new and very useful piece of information that the PGF can
tell us quickly and easily:
what is the probability that we NEVER reach state j , starting from state i?
For example, imagine that the random walk represents the share value for an
investment. The current share price is i dollars, and we might decide to sell
when it reaches j dollars. Knowing how long this might take, and whether there
is a chance we will never succeed, is fundamental to managing our investment.
To tackle this problem, we define the random variable $T$ to be the time taken (number of steps) to reach state $j$, starting from state $i$. We find the PGF of $T$, and then use the PGF to discover $P(T = \infty)$. If $P(T = \infty) > 0$, there is a positive chance that we will NEVER reach state $j$, starting from state $i$.
We will see how to determine the probability of never reaching our goal in
Section 4.11. First we will see how to calculate the PGF of a reaching time T
in the random walk.
Solution:
Let $Y_n$ be the step taken at time $n$: up or down. For the symmetric random walk,
$$Y_n = \begin{cases} +1 & \text{with probability } 0.5, \\ -1 & \text{with probability } 0.5, \end{cases}$$
and $Y_1, Y_2, \dots$ are independent.
Recall $T_{ij}$ = number of steps to get from state $i$ to state $j$ for any $i, j$, and let $H(s) = E\big(s^{T_{01}}\big)$. Conditioning on the first step,
$$H(s) = \frac{1}{2}\Big\{ E\big(s^{T_{01}} \mid Y_1 = 1\big) + E\big(s^{T_{01}} \mid Y_1 = -1\big) \Big\}. \qquad (\star)$$
But $T_{-1,1} = T_{-1,0} + T_{01}$, because the process must pass through 0 to get from $-1$ to $1$.
Now $T_{-1,0}$ and $T_{01}$ are independent (Markov property). Also, they have the same distribution because the process is translation invariant (i.e. all states are the same):

[Transition diagram: states $\dots, -2, -1, 0, 1, 2, 3, \dots$, each transition with probability $1/2$.]
Thus
$$
\begin{aligned}
E\big(s^{T_{01}} \mid Y_1 = -1\big) &= E\big(s^{1 + T_{-1,1}}\big) \\
&= E\big(s^{1 + T_{-1,0} + T_{0,1}}\big) \\
&= s\, E\big(s^{T_{-1,0}}\big)\, E\big(s^{T_{01}}\big) \quad \text{by independence} \\
&= s\,(H(s))^2 \quad \text{because identically distributed.}
\end{aligned}
$$
Thus
$$H(s) = \frac{1}{2}\big(s + s(H(s))^2\big) \qquad \text{by } (\star),$$
so
$$s H(s)^2 - 2H(s) + s = 0.$$
Solving this quadratic in $H(s)$, and choosing the root for which $H(0) = P(T_{01} = 0) = 0$, gives
$$H(s) = \frac{1 - \sqrt{1 - s^2}}{s}.$$
$$H(s) = \tfrac{1}{2}\, s + \tfrac{1}{2}\, s\, H(s)^2.$$
Thus:
$$s H(s)^2 - 2H(s) + s = 0.$$
Solve the quadratic and select the correct root as before, to get
$$H(s) = \frac{1 - \sqrt{1 - s^2}}{s} \quad \text{for } |s| < 1.$$
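It is easy to verify numerically that this $H(s)$ satisfies the first-step equation $H(s) = \frac{1}{2}s + \frac{1}{2}sH(s)^2$ (an illustrative check, not part of the notes):

```python
from math import sqrt

# Check that H(s) = (1 - sqrt(1 - s^2))/s solves H = s/2 + s*H^2/2,
# the equation obtained by conditioning on the first step.
def H(s):
    return (1 - sqrt(1 - s**2)) / s

for s in (0.1, 0.5, 0.9, 0.99):
    assert abs(H(s) - (0.5 * s + 0.5 * s * H(s)**2)) < 1e-12, s

# H(s) approaches 1 as s approaches 1, consistent with T01 not being
# defective (H(1) = 1).
assert abs(H(0.999999) - 1) < 0.01
print("H(s) solves the first-step equation")
```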
In other cases, we will always reach state j eventually, starting from state i.
Thinking of $\sum_{t=0}^{\infty} P(T = t)$ as $1 - P(T = \infty)$: in other words,
$$\sum_{t=0}^{\infty} P(T = t) = 1 - P(T = \infty).$$
The term for $P(T = \infty)\, s^{\infty}$ is missed out. The PGF is defined as the generating function of the probabilities for finite values only.
$E(s^T) = H(s)$ for $|s| < 1$ because the missing term is zero: i.e. because $s^{\infty} = 0$ when $|s| < 1$.
$E(s^T)$ is NOT left-continuous at $s = 1$. There is a sudden leap (discontinuity) at $s = 1$, because $s^{\infty} = 0$ as $s \uparrow 1$, but $s^{\infty} = 1$ when $s = 1$.
We test whether $T$ is defective by testing whether or not $E(s^T)$ "jumps off the train": that is, we test whether or not $H(s)$ is equal to $E(s^T)$ when $s = 1$.
Let $H(s) = \sum_{t=0}^{\infty} s^t P(T = t)$ be the power series representing the PGF of $T$ for $|s| < 1$. Then $T$ is defective if and only if $H(1) < 1$.
1. We want to know the probability that we will NEVER reach state $j$, starting from state $i$.
2. Define $T$ to be the random variable giving the number of steps taken to get from state $i$ to state $j$.
3. The event that we never reach state $j$, starting from state $i$, is the same as the event that $T = \infty$. (If we wait an infinite length of time, we never get there.) So
$$P(\text{never reach state } j \mid \text{start at state } i) = P(T = \infty).$$
4. Find $H(s) = \sum_{t=0}^{\infty} s^t P(T = t)$, using a calculation like the one we did in Section 4.9. $H(s)$ is the PGF of $T$ for $|s| < 1$. We only need to find it for $|s| < 1$. The calculation in Section 4.9 only works for $|s| \le 1$, because the expectations are infinite or undefined when $|s| > 1$.
5. The random variable $T$ is defective if and only if $H(1) < 1$.
6. If $H(1) < 1$, then the probability that $T$ takes the value $\infty$ is the missing piece: $P(T = \infty) = 1 - H(1)$.
Overall:
$E(T)$ and $\text{Var}(T)$ cannot be found using the PGF when $T$ is defective: you will get the wrong answer.
When you are asked to find $E(T)$ in a context where $T$ might be defective, first check whether $T$ is defective; if it is, then $E(T) = \infty$ automatically, because $T$ takes the value $\infty$ with positive probability.
In the random walk in Section 4.9, we defined the first reaching time T01 as the
number of steps taken to get from state 0 to state 1.
Questions:
a) What is the probability that we never reach state 1, starting from state 0?
Solutions:
a) We need to know whether T01 is defective.
T01 is defective if and only if H(1) < 1.
Now
$$H(1) = \frac{1 - \sqrt{1 - 1^2}}{1} = 1.$$
So $T_{01}$ is not defective.
Thus
P(never reach state 1 | start from state 0) = 0.
We will DEFINITELY reach state 1 eventually, even if it takes a very long time.
b) Because $T_{01}$ is not defective, we can find $E(T_{01})$ by differentiating the PGF: $E(T_{01}) = H'(1)$.
$$H(s) = \frac{1 - \sqrt{1 - s^2}}{s} = s^{-1} - \big(s^{-2} - 1\big)^{1/2}.$$
So
$$H'(s) = -s^{-2} - \tfrac{1}{2}\big(s^{-2} - 1\big)^{-1/2}\big({-2s^{-3}}\big) = -\frac{1}{s^2} + \frac{1}{s^3 \sqrt{\dfrac{1}{s^2} - 1}}.$$
Thus
$$E(T_{01}) = \lim_{s \uparrow 1} H'(s) = \lim_{s \uparrow 1} \left( -\frac{1}{s^2} + \frac{1}{s^3 \sqrt{\dfrac{1}{s^2} - 1}} \right) = \infty.$$
So the expected number of steps to reach state 1 starting from state 0 is infinite:
$$E(T_{01}) = \infty.$$
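We can watch this limit diverge numerically by evaluating $H'(s)$ ever closer to $s = 1$ (an illustrative sketch):

```python
from math import sqrt

# H'(s) = -1/s^2 + 1/(s^3 * sqrt(1/s^2 - 1)) for the symmetric walk.
# As s increases toward 1, H'(s) grows without bound: E(T01) = infinity.
def Hprime(s):
    return -1 / s**2 + 1 / (s**3 * sqrt(1 / s**2 - 1))

values = [Hprime(s) for s in (0.9, 0.99, 0.999, 0.9999)]
assert values == sorted(values)   # strictly increasing toward infinity
assert values[-1] > 50
print([round(v, 2) for v in values])
```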
This result is striking. Even though we will definitely reach state 1, the expected time to do so is infinite! In general, we can prove the following results for random walks with up-probability $p$ and down-probability $q = 1 - p$, starting from state 0:

  Property        Reach state 1?    P(T01 = infinity)    E(T01)
  p > q           Guaranteed        0                    finite
  p = q = 1/2     Guaranteed        0                    infinite
  p < q           Not guaranteed    > 0                  infinite