
3 Markov Chains
3.1 Introduction and transition matrices
A stochastic process is a random phenomenon which evolves in time. More formally,

Definition 3.1. A stochastic process X is a family {Xt : t ∈ T } of random variables


indexed by a time-set T .

In this module, T will always be the set of non-negative integers, and so our stochastic
processes will all evolve in discrete time.

Example 3.2 (Gambler’s ruin). Suppose we play a game involving repeatedly tossing a
fair coin; every time the coin comes up Heads you pay me £1; every time there’s a Tail
I pay you £1. I start with £20 and you start with £50: we stop when either of us loses
all our money.
Here we could let Xn = x if you have £x after the nth coin toss. So X0 = 50, and
{X0 , X1 , X2 , . . . } is a stochastic process. Questions that we might be interested in asking
include:

• what is the chance that you win the game?

• what is the expected number of coin tosses until the game ends? ⊛
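Both questions are easy to explore by simulation before we develop the theory. Here is a minimal Monte Carlo sketch in Python (the function and its parameters are ours, purely for illustration):

```python
import random

def gamblers_ruin(start=50, total=70, p=0.5, rng=random.Random(0)):
    """Simulate one game: your fortune moves up or down by 1 each toss,
    stopping when it hits 0 (you lose) or 'total' (you win)."""
    x, tosses = start, 0
    while 0 < x < total:
        x += 1 if rng.random() < p else -1
        tosses += 1
    return x == total, tosses

games = [gamblers_ruin() for _ in range(10_000)]
print(sum(w for w, _ in games) / len(games))   # ~ 50/70 = 5/7 ≈ 0.714
print(sum(t for _, t in games) / len(games))   # ~ 50 * 20 = 1000 tosses
```

(For a fair coin the exact answers, 5/7 and 1000 tosses, are the standard gambler's-ruin values; the simulation should land close to them.)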

Example 3.3 (Random walk on Z, see Figure 3.1). Here X0 = 0 and we set Xn =
Xn−1 ±1, where we add one with probability p and subtract one with probability q = 1−p.
Questions that we might like to ask include:

• what is the probability that X ever returns to its starting place?

• if it will return to zero with probability one, then what is the expected time until
it does so?

• what is the probability that the process ever reaches 20?

Definition 3.4. The state-space of a stochastic process X is the set of possible values
taken by the random variables {Xt }. We denote the state-space by S.

In this module, the state-space will always be a discrete set.

Example 3.5. In the gambler’s ruin example, the state-space is given by


{0, 1, 2, . . . , 69, 70}. In the random walk example, the state-space is Z. ⊛

[Figure 3.1: Possible path of a random walk on ℤ (figure not reproduced).]

Suppose that we have observed a process X at times 0, 1, 2, . . . , n − 1, and that we


now wish to make a prediction about the value of Xn . For example, suppose that our
random walk on Z has followed the path {x0 , x1 , x2 , . . . , xn−1 }. Now, given that

$$X_n = \begin{cases} X_{n-1} + 1 & \text{with probability } p \\ X_{n-1} - 1 & \text{with probability } q, \end{cases}$$

it follows that

$$P(X_n = x_n \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \ldots, X_1 = x_1, X_0 = 0) = \begin{cases} p & \text{if } x_n = x_{n-1} + 1 \\ q & \text{if } x_n = x_{n-1} - 1 \\ 0 & \text{otherwise.} \end{cases}$$

In general, the state of the process at time n could depend upon the entire past of
the process, and upon the time n. Notice however that, in this example,

• the jump in X at time n only depends upon the state of X at time n − 1;

• the conditional probability above does not depend on time n.

A stochastic process such as this is known as a Markov chain:

Definition 3.6 (Markov chain). A random process X = {Xn : n = 0, 1, . . . } is said to


be a discrete state-space Markov chain if

(a) the random variables Xn take values in a discrete state-space S;

(b) the conditional probabilities of the future given the past depend only on the present:

P (Xn = xn | Xn−1 = xn−1 , Xn−2 = xn−2 , . . . , X0 = x0 ) = P (Xn = xn | Xn−1 = xn−1 )

whenever both sides are well-defined, and for all n, all x0 , . . . , xn ∈ S.



A stochastic process satisfying (b) above is said to have the Markov property. The
intuition behind (b) is:

P (future | past and present) = P (future | present)

Example 3.7. Let Xn denote the number of individuals alive in generation n of a


branching process. Since the distribution of Xn depends only on Xn−1 , it follows that
X = {Xn } is a Markov chain, taking values in S = {0, 1, 2, . . . }. ⊛

We describe the process as making a transition from state i to state j at time n if


Xn−1 = i and Xn = j: this happens with (one-step) transition probability

P (Xn = j | Xn−1 = i) .

Definition 3.8. X is said to have stationary transition probabilities if the condi-


tional probabilities
pij = p(i, j) = P (Xn = j | Xn−1 = i)

do not vary with time n. If this holds then X is called a time-homogeneous Markov
chain.

Unless stated otherwise, all stochastic processes that we will consider in this module
will be discrete time, discrete state-space, time-homogeneous Markov chains.

Given a chain X we can arrange its one-step transition probabilities into a transition
matrix P . For example, if S = {0, 1, 2, . . . } then we can write

$$P = \begin{pmatrix} p_{00} & p_{01} & p_{02} & \cdots \\ p_{10} & p_{11} & p_{12} & \cdots \\ p_{20} & p_{21} & p_{22} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

(rows and columns indexed by the states 0, 1, 2, . . . ).

Similarly, we can define n-step transition probabilities as

$$p_{ij}^{(n)} = p^{(n)}(i, j) = P(X_{m+n} = j \mid X_m = i)$$

(note that these don’t depend on m, by stationarity), and arrange these into an n-step

transition matrix $P^{(n)}$:

$$P^{(n)} = \begin{pmatrix} p_{00}^{(n)} & p_{01}^{(n)} & p_{02}^{(n)} & \cdots \\ p_{10}^{(n)} & p_{11}^{(n)} & p_{12}^{(n)} & \cdots \\ p_{20}^{(n)} & p_{21}^{(n)} & p_{22}^{(n)} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

You should read $p_{ij}^{(n)}$ as “the probability that X, if started from i, is at j after n steps”.

Remark 3.9. Note that $p_{ij}^{(0)} = P(X_m = j \mid X_m = i)$: this is equal to 1 if i = j, and zero otherwise. Thus $P^{(0)} = I$ (the identity matrix).

The matrix P is known as a stochastic matrix: all of its elements are non-negative
(they’re probabilities!), and each of the row sums equals one (for the same reason!). It
is often convenient (at least when S is small) to depict P by means of a state diagram:
each vertex in the diagram corresponds to a state in S, and each arrow to a non-zero
transition probability. For example, if S = {1, 2, 3, 4, 5} and

$$P = \begin{pmatrix} 0.8 & 0.2 & 0 & 0 & 0 \\ 0.4 & 0 & 0 & 0.5 & 0.1 \\ 1 & 0 & 0 & 0 & 0 \\ 0.3 & 0 & 0 & 0 & 0.7 \\ 0 & 0 & 0.5 & 0.5 & 0 \end{pmatrix}$$

(rows and columns indexed by states 1–5)

then we can represent this as


[State diagram: one vertex per state 1–5, with an arrow for each non-zero entry of P, labelled by its probability: a loop at 1 (0.8), 1 → 2 (0.2), 2 → 1 (0.4), 2 → 4 (0.5), 2 → 5 (0.1), 3 → 1 (1), 4 → 1 (0.3), 4 → 5 (0.7), 5 → 3 (0.5) and 5 → 4 (0.5).]

Example 3.10. Consider the following (very!) simple model of the weather: Xn = w if
it’s wet on day n, and Xn = d if dry. Suppose that

P (wet tomorrow | dry today) = 2/3


P (dry tomorrow | wet today) = 1/4 .

Then we have

$$P = \begin{pmatrix} 3/4 & 1/4 \\ 2/3 & 1/3 \end{pmatrix}$$

(rows and columns indexed by w, d).

Note that, given the starting state x0 and transition matrix P , we can calculate the
probability of the process following any path {x0 , x1 , . . . , xn } over the period {0, 1, . . . , n}
as follows:

$$\begin{aligned}
P(X \text{ follows } \{x_0, x_1, \ldots, x_n\}) &= P(X \text{ follows } \{x_0, \ldots, x_n\} \mid X \text{ follows } \{x_0, \ldots, x_{n-1}\}) \times P(X \text{ follows } \{x_0, \ldots, x_{n-1}\}) \\
&= P(X_n = x_n \mid X \text{ follows } \{x_0, \ldots, x_{n-1}\}) \times P(X \text{ follows } \{x_0, \ldots, x_{n-1}\}) \\
&= P(X_n = x_n \mid X_{n-1} = x_{n-1}) \times P(X \text{ follows } \{x_0, \ldots, x_{n-1}\}) \\
&= p(x_{n-1}, x_n) \times P(X \text{ follows } \{x_0, \ldots, x_{n-1}\}) \\
&= p(x_{n-1}, x_n) \times p(x_{n-2}, x_{n-1}) \times \cdots \times p(x_0, x_1) \times P(X_0 = x_0),
\end{aligned}$$

where in the last probability we allow for the possibility of the initial state also being
random.
In the above example, if X0 = w then the probability that we observe the sequence
{w, w, d, w} is given simply by

$$P(X_1 = w, X_2 = d, X_3 = w \mid X_0 = w) = p_{ww} \, p_{wd} \, p_{dw} = \frac{3}{4} \times \frac{1}{4} \times \frac{2}{3} = \frac{1}{8}.$$

The following theorem is fundamental in telling us how to calculate long-term transition probabilities:

Theorem 3.11 (Chapman-Kolmogorov equations). The n-step transition matrix $P^{(n)}$ satisfies

$$P^{(n+m)} = P^{(n)} P^{(m)}.$$

That is,

$$p_{ij}^{(n+m)} = \sum_{k \in S} p_{ik}^{(n)} p_{kj}^{(m)}.$$

Proof. We begin with the probability on the LHS, and consider all possible states that

the chain might be in at time n:

$$\begin{aligned}
P(X_{n+m} = j \mid X_0 = i) &= P\Big(\bigcup_{k \in S} \{X_{n+m} = j, X_n = k\} \,\Big|\, X_0 = i\Big) \\
&= \sum_{k \in S} P(X_{n+m} = j, X_n = k \mid X_0 = i) \\
&= \sum_{k \in S} P(X_{n+m} = j \mid X_n = k, X_0 = i) \, P(X_n = k \mid X_0 = i) \\
&= \sum_{k \in S} P(X_{n+m} = j \mid X_n = k) \, P(X_n = k \mid X_0 = i) \\
&= \sum_{k \in S} p_{ik}^{(n)} p_{kj}^{(m)}.
\end{aligned}$$

Exercise 3.12. Note that in this proof (fourth line) we have declared that

P (Xn+m = j | Xn = k, X0 = i) = P (Xn+m = j | Xn = k) .

The Markov property as stated in Definition 3.6 really only tells us this is true when
m = 1. Show (by induction) that it is true for all m ∈ N. ⊛

This theorem tells us that $P^{(2)} = P^{(1)} P^{(1)}$. But since $P^{(1)} = P$, we see that $P^{(2)} = P^2$ (i.e. the square of the one-step transition matrix P). Continuing this argument, we get

$$P^{(n)} = P^n, \qquad n = 0, 1, 2, \ldots$$

So we can obtain the n-step transition probabilities by calculating higher powers of the
one-step transition matrix.

Example 3.13. In the simple weather example, we can compute

$$P^{(2)} = P^2 = \begin{pmatrix} 35/48 & 13/48 \\ 13/18 & 5/18 \end{pmatrix}$$

(rows and columns again indexed by w, d).

Proposition 3.14. Suppose we define the row vector of probabilities $\nu^{(n)}$ by $\nu_i^{(n)} = P(X_n = i)$. (So $\nu^{(n)}$ is just the mass function of $X_n$, expressed as a vector.) Then

$$\nu^{(n)} = \nu^{(r)} P^{n-r} \quad \text{for all } r \in \{0, 1, \ldots, n-1\} \qquad (3.1)$$



and in particular,

$$\nu^{(n)} = \nu^{(0)} P^n. \qquad (3.2)$$

In other words, we can obtain the distribution of the chain at time n by starting with
its distribution at time r, and multiplying by the matrix P n−r .

Proof. Conditioning on the state at time n − 1 we have

$$P(X_n = j) = \sum_{i \in S} P(X_n = j \mid X_{n-1} = i) P(X_{n-1} = i) = \sum_{i \in S} p_{ij} P(X_{n-1} = i).$$

In matrix notation, this says that $\nu^{(n)} = \nu^{(n-1)} P$. Repeating this argument a total of n − r times yields

$$\nu^{(n)} = \nu^{(r)} P^{n-r} \quad \text{for all } r \in \{0, 1, \ldots, n-1\}.$$

Example 3.15. In the simple weather example, suppose that on day 0 it is wet with
probability 1/5. Then $\nu^{(0)} = (1/5, 4/5)$, and

$$\nu^{(1)} = \nu^{(0)} P = (1/5, 4/5) \begin{pmatrix} 3/4 & 1/4 \\ 2/3 & 1/3 \end{pmatrix} = (41/60, 19/60);$$

$$\nu^{(2)} = \nu^{(1)} P = \nu^{(0)} P^2 = (1/5, 4/5) \begin{pmatrix} 35/48 & 13/48 \\ 13/18 & 5/18 \end{pmatrix} = (0.724, 0.276).$$

Thus the probability that it is wet on day 2 in this scenario is 0.724. ⊛
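These calculations are easy to reproduce numerically. A quick numpy check of Examples 3.13 and 3.15 (a sketch; the variable names are ours):

```python
import numpy as np

# One-step transition matrix of the weather chain (state order: w, d)
P = np.array([[3/4, 1/4],
              [2/3, 1/3]])

P2 = np.linalg.matrix_power(P, 2)   # P^(2) = P^2, Example 3.13
nu0 = np.array([1/5, 4/5])          # nu^(0), Example 3.15

print(P2)         # [[35/48, 13/48], [13/18, 5/18]]
print(nu0 @ P2)   # [0.7236..., 0.2763...] = nu^(2)
```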

Exercise 3.16. What is $\nu^{(1)}$ if $\nu^{(0)} = (8/11, 3/11)$? What about $\nu^{(2)}$? ⊛

Example 3.17. Suppose that X0 , X1 , . . . is a sequence of independent random variables


taking values in a countable set S. Show that X = {Xn : n ≥ 0} is a Markov chain.
Under what condition is this chain homogeneous?
Solution: Due to the independence of the random variables we have

P (Xn = s | Xn−1 = xn−1 , . . . , X0 = x0 ) = P (Xn = s) = P (Xn = s | Xn−1 = xn−1 ) ,

and so X is (trivially!) a Markov chain. To be homogeneous, we require the transition


probabilities to be time-independent, i.e.

P (Xn = i | Xn−1 = j) = P (X1 = i | X0 = j) .



But this means that

P (Xn = i) = P (X1 = i) for all n ≥ 1, and all i ∈ S.

So we need {Xi } to be identically distributed in order for X to be homogeneous. ⊛

Example 3.18 (Bernoulli process). Let S = {0, 1, 2, . . . } and define a chain Y by Y0 = 0


and with
P (Yn+1 = s + 1 | Yn = s) = p = 1 − P (Yn+1 = s | Yn = s) .

Clearly Y is a Markov chain (its position at time n + 1 only depends on that at n).
We can think of Yn as counting the number of Heads in n tosses of a coin, where the
probability of obtaining a Head on any one toss is p.
The transition matrix is given by

$$P = \begin{pmatrix} 1-p & p & 0 & 0 & 0 & \cdots \\ 0 & 1-p & p & 0 & 0 & \cdots \\ 0 & 0 & 1-p & p & 0 & \cdots \\ \vdots & & \ddots & \ddots & & \end{pmatrix}$$

(rows and columns indexed by 0, 1, 2, . . . ).

Furthermore, we can easily calculate the n-step transition probabilities here:

$$P(Y_{m+n} = j \mid Y_m = i) = P(\text{there are exactly } j - i \text{ Heads in } n \text{ coin tosses}) = \begin{cases} \binom{n}{j-i} p^{j-i} (1-p)^{n-(j-i)} & \text{if } 0 \le j - i \le n \\ 0 & \text{if } j - i > n \text{ or } j < i, \end{cases}$$

since the number of Heads in n coin tosses is of course distributed as Bin(n, p). ⊛
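This is a handy test case for the identity $P^{(n)} = P^n$: a matrix power of a (truncated) transition matrix should match the binomial mass function. A sketch, with the truncation level N chosen by us so that no probability mass reaches the boundary for the small n used:

```python
import numpy as np
from math import comb

p, n, N = 0.3, 6, 20                  # N truncates the infinite state-space
P = np.zeros((N, N))
for s in range(N - 1):
    P[s, s], P[s, s + 1] = 1 - p, p
P[N - 1, N - 1] = 1.0                 # harmless: never reached for n << N

Pn = np.linalg.matrix_power(P, n)
j = 4
print(Pn[0, j])                               # P(Y_6 = 4 | Y_0 = 0)
print(comb(n, j) * p**j * (1 - p)**(n - j))   # Bin(6, 0.3) pmf at 4; equal
```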

3.2 State classification


In this section we look at the states in S and their various properties: this will help us to better understand the long-term behaviour of Markov chains.

3.2.1 Communicating classes

Suppose that it is possible for the chain X to make its way from state i to another
state j, and back again. Then we might expect these two states to share many common
properties, and this indeed turns out to be the case.

Definition 3.19. Suppose that i, j ∈ S. Then:


(a) we say that i leads to j if there is some n > 0 with $p_{ij}^{(n)} > 0$. In this case we write
i → j.

(b) we say that i and j intercommunicate if i → j and j → i. We write i ↔ j.

(c) i and j are in the same communicating class (written i ∼ j) if either

(i) i ↔ j, or
(ii) i = j.

Lemma 3.20. The relation ∼ is an equivalence relation (it is reflexive, symmetric, and
transitive).

Proof. The first two parts here are trivial:

• Reflexive: i ∼ i, since i = i;

• Symmetric: i ∼ j implies that i ↔ j, and so j ∼ i;

• Transitive: suppose that i ∼ j and j ∼ k – we must show that i ∼ k. If i = j or


j = k then i ∼ k is immediate. If not, then we know that i → j and j → k, and so there exist n, m > 0 such that $p_{ij}^{(n)} > 0$ and $p_{jk}^{(m)} > 0$. Then it follows from the Chapman-Kolmogorov equations (Theorem 3.11) that

$$p_{ik}^{(n+m)} \ge p_{ij}^{(n)} \times p_{jk}^{(m)} > 0,$$

and so i → k. A similar argument shows that k → i, and so i ∼ k.

Example 3.21. Consider the chain X with state-space diagram



[State diagram for Example 3.21 (not reproduced): states 1–6, with arrows such that states 1, 2, 3, 4 all intercommunicate (for instance 1 → 2 → 1 and 1 → 3 → 2 → 1 are possible paths), states 5 and 6 intercommunicate, and there are arrows from {5, 6} into state 4 but none back.]

Here there are two communicating classes: {1, 2, 3, 4} and {5, 6}. ⊛
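Communicating classes depend only on which transition probabilities are positive, so they can be computed mechanically from reachability. A sketch (the function is ours; it uses the fact that in an n-state chain, j is reachable from i if and only if it is reachable in at most n − 1 steps, which is captured by powers of I + P):

```python
import numpy as np

def communicating_classes(P):
    """Partition the states of a transition matrix into the classes of ~."""
    n = len(P)
    # R[i, j] is True iff j is reachable from i in at most n-1 steps
    R = np.linalg.matrix_power(np.eye(n) + np.asarray(P), n - 1) > 0
    classes = []
    for i in range(n):
        if not any(i in c for c in classes):
            classes.append({j for j in range(n) if R[i, j] and R[j, i]})
    return classes

# Hypothetical 0-indexed chain: 0 and 1 intercommunicate, 2 is absorbing
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.0, 1.0]]
print(communicating_classes(P))   # [{0, 1}, {2}]
```

(Including the identity in I + P allows 0 steps, which is consistent with ∼ being reflexive by definition.)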

Definition 3.22. For a chain X on S we say that

(a) the state i ∈ S is essential if for all j ∈ S with i → j, it’s also the case that j → i.
Otherwise i is inessential.

(b) the chain X (or equivalently its transition matrix P ) is irreducible if S is one
single communicating class (i.e. if all states intercommunicate).

Suppose that the Markov chain starts at state i. Then i is essential if, wherever X
goes, it is always possible to return to its starting state; i is inessential if it is possible
for X to leave i and reach a state from which it is impossible to return.

Example 3.23. In Example 3.21, states {1, 2, 3, 4} are essential; states {5, 6} are inessen-
tial. (E.g. from 5 and 6 it is possible to reach state 4, but then it is impossible to return.)

Example 3.24. Consider the following random walk on S = {0, 1, . . . , n}, which is
absorbed when it reaches state 0 or state n:

[State diagram (not reproduced): states 0, 1, . . . , n in a line; each interior state has arrows to its two neighbours, while states 0 and n have self-loops (absorbing).]

Here the states {1, 2, . . . , n − 1} form an inessential communicating class. (From any
of these states it is possible to reach state 0, from which it is impossible to return.)
States 0 and n are both essential, and each forms a communicating class by itself. ⊛

Lemma 3.25. If i → k and i is essential, then k is also essential, and i ∼ k.

Proof. If i → k and i is essential, then by Definition 3.22 it follows that k → i, and so


i ∼ k. Suppose now that k → j, for some j ∈ S. Then we know that i → k and k → j,
giving i → j. But since i is essential, this implies that j → i. Therefore j → i and
i → k, and so j → k. Therefore k is essential too.

Notice that this implies that all states in any given communicating class are either
essential or inessential.

3.2.2 Periodicity

Definition 3.26. The period of a state is the greatest common divisor of times at
which the chain might return to the state. Thus
(a) if i → i then i has period $\gcd\{n > 0 : p_{ii}^{(n)} > 0\}$.

(b) if i ↛ i then the period of i is not defined.

(c) if i has period 1 then it is said to be aperiodic.

Theorem 3.27. Periodicity is a class property.

Proof. Recall that a|b means that the positive integer a divides the integer b exactly.
Suppose that i ↔ j and that i has period d.
Since i ↔ j we can find n and m such that $p_{ij}^{(n)}$ and $p_{ji}^{(m)}$ are both positive. Since period(i) = d we know that $p_{ii}^{(k)} > 0$ implies d|k (by Definition 3.26). Now, if $p_{jj}^{(r)} > 0$ then $p_{ii}^{(n+r+m)} \ge p_{ij}^{(n)} p_{jj}^{(r)} p_{ji}^{(m)} > 0$, and so d|(n + r + m). But we also know that $p_{ii}^{(n+m)} \ge p_{ij}^{(n)} p_{ji}^{(m)} > 0$, and so d|(n + m): it follows that d|r, and so state j has period d as required.
Exercise 3.28. This ‘proof’ really only shows that if $p_{jj}^{(r)} > 0$ then d|r: we haven’t shown that d is the greatest common divisor of all such times though! Show that this must be the case, by assuming that there exists a d′ > d with d′|r for all times r at which it is possible for the chain to return to state j: show that this implies that d′|k for all k with $p_{ii}^{(k)} > 0$, contradicting the assumption that the period of i is d. ⊛

Example 3.29. In Example 3.21, the class {1, 2, 3, 4} has period 1: to see this, note that
it is possible for the chain to start at 1 and return at time 2 (via the path 1 → 2 → 1),
or to return at time 3 (1 → 3 → 2 → 1). Since gcd{2, 3} = 1, we see that this class is
aperiodic. The period of class {5, 6} is 2, since if the chain starts in state 5, it is only
possible for it to return at even times.
Similarly, in Example 3.24, the class {1, 2, . . . , n − 1} has period 2; classes {0} and
{n} are both aperiodic. ⊛
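The period can likewise be computed numerically as a gcd of observed return times; a sketch (nmax is our truncation, and must be taken large enough that the gcd of the return times up to nmax has stabilised):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, nmax=50):
    """gcd of the times n <= nmax with p_ii^(n) > 0 (None if i never returns)."""
    P = np.asarray(P)
    Pn, times = np.eye(len(P)), []
    for n in range(1, nmax + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            times.append(n)
    return reduce(gcd, times) if times else None

print(period([[0, 1], [1, 0]], 0))   # two states swapping deterministically: 2
```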

3.2.3 Recurrence and transience

So far the classification of states has taken no account of the actual probabilities in the
transition matrix P : the ideas of communicating classes, essential states and periodicity
depend only on whether the transition probabilities are positive or not, nothing more.
It’s now time to start using this extra information to establish further properties of the
chain...

Definition 3.30. For all i, j ∈ S, the distribution of $T_{i,j}$, the first-passage time from i to j, is defined by

$$P(T_{i,j} = n \mid X_0 = i) = f_{ij}^{(n)},$$

where $f_{ij}^{(n)} = P(X_n = j,\ X_m \ne j \text{ for } m = 1, \ldots, n-1 \mid X_0 = i)$ for n ≥ 1.
We also write

$$f_{ij}^* = \sum_{n=1}^{\infty} f_{ij}^{(n)},$$

and then $P(T_{i,j} = \infty \mid X_0 = i) = 1 - f_{ij}^*$.

Definition 3.31. The state i is

• recurrent (or persistent) if $f_{ii}^* = 1$;

• transient if $f_{ii}^* < 1$.

So a state i is recurrent if, when X starts at i, with probability 1 it will return to its
starting state in finite time; i is transient if there is a positive chance of the chain never
returning.
We are often particularly interested in finding $f_{ij}^*$, since this tells us the probability that the chain ever visits j if started at i. In order to calculate these first-passage probabilities, we often make use of a technique known as first-step decomposition.
Note that these probabilities satisfy:

$$f_{ij}^{(1)} = p_{ij}$$

$$\begin{aligned}
f_{ij}^{(n+1)} &= P(X \text{ first hits } j \text{ at time } n+1 \mid X_0 = i) \\
&= \sum_{k : k \to j,\ k \ne j} P(X \text{ first hits } j \text{ at time } n+1 \mid X_1 = k, X_0 = i) \, P(X_1 = k \mid X_0 = i) \\
&= \sum_{k : k \to j,\ k \ne j} P(X \text{ first hits } j \text{ at time } n+1 \mid X_1 = k) \, p_{ik} \\
&= \sum_{k : k \to j,\ k \ne j} p_{ik} f_{kj}^{(n)}.
\end{aligned}$$

If we now sum $f_{ij}^{(n)}$ over n, we obtain the first-step decomposition:

$$f_{ij}^* = p_{ij} + \sum_{k : k \to j,\ k \ne j} p_{ik} f_{kj}^* \qquad (3.3)$$

Thus we can calculate first-passage probabilities by solving a system of linear equations.


Later on we shall find a more efficient matrix-based approach to this, but for chains on
relatively few states, it is not too inefficient to simply solve the equations ‘longhand’.

Example 3.32. Two people play a game, whereby each chooses a pattern of three
symbols from {H, T }. They then repeatedly toss a fair coin, and the first person to
observe their pattern of symbols wins. For example, suppose that player A chooses the
sequence HHT , while B chooses HT H. We can represent the progress of the game as a
Markov chain, as follows. Let the state-space be S = {1, 2, 3, 4, 5, 6}, where

1 = start of game; 2 = H; 3 = HH; 4 = HT ; 5 = HHT ; 6 = HT H.

Then our chain has the following transition matrix (the state diagram is not reproduced here):

$$P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 & 0 \\ 1/2 & 0 & 0 & 0 & 0 & 1/2 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

(rows and columns indexed by states 1–6).

Note that we have made the chain stop as soon as it hits either state 5 or state 6, since

then one player has won and the game ends. We are now interested in $P(A \text{ wins}) = f_{15}^*$.
Using the first-step decomposition (3.3), we obtain

$$f_{15}^* = \tfrac{1}{2} f_{15}^* + \tfrac{1}{2} f_{25}^*,$$

and so $f_{15}^* = f_{25}^*$. Furthermore,

$$f_{25}^* = \tfrac{1}{2} f_{35}^* + \tfrac{1}{2} f_{45}^*, \qquad f_{35}^* = \tfrac{1}{2} f_{35}^* + \tfrac{1}{2}, \qquad f_{45}^* = \tfrac{1}{2} f_{15}^* + \tfrac{1}{2} f_{65}^*, \qquad f_{65}^* = 0.$$

Thus $f_{35}^* = 1$ and $f_{45}^* = \tfrac{1}{2} f_{15}^* = \tfrac{1}{2} f_{25}^*$. Substituting these values into the equation for $f_{25}^*$, we obtain $f_{25}^* = \tfrac{1}{2} + \tfrac{1}{4} f_{25}^*$, and so $f_{25}^* = 2/3$. So $f_{15}^* = 2/3$, and similarly we can check that $f_{16}^* = 1/3$; thus player A is twice as likely to win the game with his choice of symbols. ⊛
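The equations above are linear, so a linear solver does the work for us. Writing f for the vector $(f_{15}^*, f_{25}^*, f_{35}^*, f_{45}^*)$ and Q for P restricted to the transient states {1, 2, 3, 4}, the first-step decomposition (3.3) reads f = r + Qf, where r holds the one-step probabilities into state 5 (the term for state 6 vanishes since $f_{65}^* = 0$). A Python sketch, foreshadowing the matrix approach of Section 3.4:

```python
import numpy as np

Q = np.array([[1/2, 1/2, 0,   0  ],    # P restricted to T = {1, 2, 3, 4}
              [0,   0,   1/2, 1/2],
              [0,   0,   1/2, 0  ],
              [1/2, 0,   0,   0  ]])
r = np.array([0, 0, 1/2, 0])           # one-step probabilities into state 5

# (3.3) in matrix form: f = r + Q f  <=>  (I - Q) f = r
f = np.linalg.solve(np.eye(4) - Q, r)
print(f)   # [2/3, 2/3, 1, 1/3]; in particular f*_15 = 2/3 = P(A wins)
```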

Exercise 3.33 (Exercise sheet 4). Consider the following random walk:

[State diagram (not reproduced): a random walk on {0, 1, . . . , 6} in which states 0 and 6 are absorbing and each interior state jumps to either neighbour with probability 1/2.]

What is the probability that the chain ever reaches state 0 if it starts at state 4? ⊛

In a similar way to which we decomposed $f_{ij}^*$ according to the first step of the chain X, we can also decompose the transition probability $p_{ij}^{(n)}$ according to the first time that the chain X visits j if started at i – this gives the first-passage decomposition:

$$p_{ij}^{(n)} = \sum_{m=1}^{n} f_{ij}^{(m)} p_{jj}^{(n-m)} = \sum_{u=0}^{n-1} f_{ij}^{(n-u)} p_{jj}^{(u)}, \qquad n \ge 1 \qquad (3.4)$$

(where for the second equality we have simply changed variables, using u = n − m).
Notice that we have now expressed our n-step transition probability as a convolution
of two sequences: this suggests that we should use generating functions! So let
of two sequences: this suggests that we should use generating functions! So let

$$P_{ij}(z) = \sum_{n=0}^{\infty} p_{ij}^{(n)} z^n \qquad \text{and} \qquad F_{ij}(z) = E\big[z^{T_{i,j}}\big] = \sum_{n=1}^{\infty} f_{ij}^{(n)} z^n.$$

(Be careful with the lower limits of the two sums here, and note that $P_{ij}(z)$ is only guaranteed to converge for |z| < 1, since the series $\sum_n p_{ij}^{(n)}$ is not necessarily finite.)
Using (3.4), we obtain

$$\begin{aligned}
P_{ij}(z) &= p_{ij}^{(0)} + \sum_{n=1}^{\infty} p_{ij}^{(n)} z^n = p_{ij}^{(0)} + \sum_{n=1}^{\infty} \Big( \sum_{u=0}^{n-1} f_{ij}^{(n-u)} p_{jj}^{(u)} \Big) z^n \\
&= p_{ij}^{(0)} + \sum_{u=0}^{\infty} p_{jj}^{(u)} z^u \sum_{n=u+1}^{\infty} f_{ij}^{(n-u)} z^{n-u} \\
&= p_{ij}^{(0)} + P_{jj}(z) F_{ij}(z).
\end{aligned}$$

That is,

$$P_{ij}(z) = p_{ij}^{(0)} + P_{jj}(z) F_{ij}(z), \qquad \text{when } |z| < 1. \qquad (3.5)$$

This useful formula allows us to derive a simple criterion for determining whether or
not a state is recurrent.

Theorem 3.34. The state i is recurrent if and only if $\sum_n p_{ii}^{(n)} = \infty$. Furthermore, if i is transient then

$$\sum_{n=0}^{\infty} p_{ii}^{(n)} = \frac{1}{1 - f_{ii}^*} < \infty.$$

Note (Exercise sheet 4) that

$$E[\text{number of times } X \text{ visits state } i \mid X_0 = i] = \sum_{n=0}^{\infty} P(X_n = i \mid X_0 = i) = \sum_n p_{ii}^{(n)}.$$

Thus state i is transient if and only if we expect X to visit i only a finite number of times.

Proof. State i is recurrent iff $f_{ii}^* = 1$ (by Definition 3.31). Since $f_{ii}^* = \sum_n f_{ii}^{(n)}$, we see that, using Abel’s Theorem (Theorem 1.1), $f_{ii}^* = 1$ iff $\lim_{z \uparrow 1} F_{ii}(z) = 1$. Now, by equation (3.5) we can write

$$F_{ii}(z) = \frac{P_{ii}(z) - 1}{P_{ii}(z)}$$

(since $p_{ii}^{(0)} = 1$), and so $\lim_{z \uparrow 1} F_{ii}(z) = 1$ iff $\lim_{z \uparrow 1} P_{ii}(z) = \infty$. Again using Abel’s Theorem, this occurs if and only if $\sum_n p_{ii}^{(n)} = \infty$.
Exercise 3.35. Show that if i is transient then $p_{ji}^{(n)} \to 0$ as n → ∞, for all j ∈ S. ⊛

We can now prove the following:

Theorem 3.36. Recurrence and transience are class properties.


Proof. Suppose that i ∼ j. Then there exist m, n such that $p_{ij}^{(n)} > 0$ and $p_{ji}^{(m)} > 0$. Then

$$p_{ji}^{(m)} p_{ii}^{(r)} p_{ij}^{(n)} \le p_{jj}^{(m+r+n)}.$$

Summing over r we obtain

$$p_{ji}^{(m)} p_{ij}^{(n)} \sum_{r \ge 0} p_{ii}^{(r)} \le \sum_{r \ge 0} p_{jj}^{(m+r+n)} \le \sum_{s \ge 0} p_{jj}^{(s)}.$$

If i is recurrent, then the sum on the LHS of this equation is infinite (by Theorem 3.34), and so the sum on the RHS must also be infinite, showing j to be recurrent.

This means that we only need to check recurrence/transience for one state in a
communicating class in order to determine whether all states in that class are recurrent.
How do recurrence/transience relate to the labels essential/inessential?

Theorem 3.37. If i is recurrent and i → j then i ∼ j and $f_{ij}^* = f_{ji}^* = 1$. So if i is recurrent then it is essential.

Proof. Since i is recurrent and i → j, we know that $f_{ii}^* = 1$. Putting this into the first-step decomposition (3.3) we obtain

$$f_{ki}^* = p_{ki} + \sum_{l \ne i} p_{kl} f_{li}^* = \sum_{l} p_{kl} f_{li}^*.$$

Substituting this expression for $f_{li}^*$ into the RHS of this equation yields

$$f_{ki}^* = \sum_{l} p_{kl} \sum_{g} p_{lg} f_{gi}^* = \sum_{g} \Big( \sum_{l} p_{kl} p_{lg} \Big) f_{gi}^* = \sum_{g} p_{kg}^{(2)} f_{gi}^*,$$

and multiple applications of this trick lead us to

$$f_{ki}^* = \sum_{g} p_{kg}^{(n)} f_{gi}^*.$$

But since this holds for any state k ∈ S, it certainly holds when k = i. Thus

$$1 = f_{ii}^* = \sum_{g} p_{ig}^{(n)} f_{gi}^*, \qquad \text{for any } n \in \mathbb{N}.$$

Nearly there now! Since $\sum_g p_{ig}^{(n)} = 1$ (it’s the sum of row i of the transition matrix $P^{(n)}$), we get

$$0 = \sum_{g} p_{ig}^{(n)} \big(1 - f_{gi}^*\big), \qquad \text{for any } n \in \mathbb{N}.$$

But here we have a set of non-negative terms which adds up to zero, implying that every term in the sum must be zero. However, we know (from one of the assumptions of the theorem) that i → j, and so there exists n ∈ ℕ such that $p_{ij}^{(n)} > 0$. This means that $1 - f_{ji}^* = 0$, and so $f_{ji}^* = 1$, and so j → i.
Therefore j ∼ i, and it follows that j must also be recurrent (by Theorem 3.36).
Finally, to show that $f_{ij}^* = 1$ as well, simply reverse the roles of i and j in the above argument.

This theorem says that if X starts at the recurrent state i (i.e. a state to which it
will return in finite time with probability 1), then it is impossible for X to reach a state
from which it is impossible to return to i (i.e. i is essential). So once a chain enters a

recurrent class, it must stay there forever. Hopefully this makes good intuitive sense!
In general, the converse result does not hold. However:

Theorem 3.38. If C is a finite essential class then it is recurrent.

We end this section with a simple example.

Example 3.39. Let X be a Markov chain with transition matrix:

$$P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$

(rows and columns indexed by states 1, 2, 3; the state diagram is not reproduced here)

We can see here that state 1 is inessential: it is possible for the chain to leave 1
and never return. Since it is inessential, by Theorem 3.37 it must also be transient (i.e. $f_{11}^* < 1$). We can check this by calculating $f_{11}^*$ here easily:

$$f_{11}^* = P(X_1 = 1 \mid X_0 = 1) + P(X_2 = 1, X_1 \ne 1 \mid X_0 = 1) + \cdots = 1/3 + 0 + 0 + \cdots = 1/3.$$

Furthermore, by Theorem 3.34,

$$E[\text{number of times } X \text{ visits } 1 \mid X_0 = 1] = \sum_{n=0}^{\infty} p_{11}^{(n)} = \frac{1}{1 - f_{11}^*} = 3/2.$$

We can also check this calculation directly in this simple example:

$$p_{11}^{(0)} = 1; \quad p_{11}^{(1)} = 1/3; \quad p_{11}^{(2)} = 1/3^2; \quad \ldots \quad p_{11}^{(n)} = 1/3^n.$$

Thus

$$\sum_{n=0}^{\infty} p_{11}^{(n)} = \sum_{n=0}^{\infty} 1/3^n = 3/2.$$

Finally, note that, since $p_{22}^{(2)} = 1$, $f_{22}^* = 1$, and we see that state 2 is recurrent. (We could have inferred this directly from Theorem 3.38 in fact, since C = {2, 3} is a finite essential class, hence recurrent.) Furthermore, $p_{22}^{(n)} = 1$ if n is even, and zero if n is odd. Thus $\sum_{n=0}^{\infty} p_{22}^{(n)} = \infty$, as expected, since state 2 is recurrent. ⊛
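These sums are easy to check numerically by accumulating partial sums of a diagonal entry of $P^n$; a quick sketch for state 1:

```python
import numpy as np

P = np.array([[1/3, 1/3, 1/3],
              [0,   0,   1  ],
              [0,   1,   0  ]])

total, Pn = 0.0, np.eye(3)
for n in range(60):                # partial sum of p_11^(n), n = 0, ..., 59
    total += Pn[0, 0]
    Pn = Pn @ P
print(total)                       # 1.4999... -> 3/2, as computed above
```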

3.3 Interesting example: random walk on Z


Let us now revisit the random walk X on the integers ℤ, defined by

$$X_{n+1} = \begin{cases} X_n + 1 & \text{with probability } p \\ X_n - 1 & \text{with probability } q = 1 - p. \end{cases}$$

(We shall assume that p is not equal to zero or one, since otherwise this chain is pretty
boring!) This is symmetric if p = 1/2, otherwise asymmetric. Note that the whole state
space forms a single communicating class (so the random walk is irreducible), and that
every state has period 2.

Exercise 3.40 (Exercise sheet 4). Show that, for all n ≥ 0 and k ∈ ℤ,

$$P(X_n = k \mid X_0 = 0) = \begin{cases} \dbinom{n}{(n+k)/2} p^{(n+k)/2} q^{(n-k)/2} & \text{if } n + k \text{ is even} \\ 0 & \text{if } n + k \text{ is odd.} \end{cases}$$

Consider $F_{01}(z) = E\big[z^{T_{0,1}}\big]$, where (recall that) $T_{0,1}$ is the time that X first hits 1 if started at 0. Since $X_0 = 0$, we know that $X_1 \in \{-1, 1\}$. If $X_1 = 1$ then clearly $T_{0,1} = 1$; but if $X_1 = -1$, then we can argue that $T_{0,1} = 1 + T_{-1,1}$ (the time to get from 0 to 1 is equal to 1 (the first step) plus the time taken to get from −1 to 1). So

$$\begin{aligned}
F_{01}(z) = E\big[z^{T_{0,1}}\big] &= z^1 P(X_1 = 1 \mid X_0 = 0) + E\big[z^{1 + T_{-1,1}} \mid X_1 = -1\big] P(X_1 = -1 \mid X_0 = 0) \\
&= zp + zq \, E\big[z^{T_{-1,1}}\big].
\end{aligned}$$
 

Now consider $T_{-1,1}$. The time taken to get from −1 to 1 must be equal (in distribution) to the time taken to get from −1 to 0 for the first time plus the time taken to get from 0 to 1 for the first time. So $T_{-1,1}$ has the same distribution as $T_{-1,0} + T_{0,1}$. But these two times are independent random variables, and by symmetry (translation invariance) have the same distribution as each other; thus

$$E\big[z^{T_{-1,1}}\big] = E\big[z^{T_{-1,0} + T_{0,1}}\big] = E\big[z^{T_{-1,0}}\big] E\big[z^{T_{0,1}}\big] = E\big[z^{T_{0,1}}\big]^2 = F_{01}(z)^2.$$

Putting this into the above we obtain a quadratic equation for $F_{01}(z)$:

$$F_{01}(z) = pz + qz F_{01}(z)^2,$$

with solutions

$$F_{01}(z) = \frac{1 \pm \sqrt{1 - 4pqz^2}}{2qz}.$$

We now have to decide which root to take. We know that F01 (z) is a continuous
function of z (it’s a power series), and that F01 (z) → 0 as z → 0; if we take the positive
root above then the RHS tends to infinity as z → 0, and so we must take the negative
root. Thus

$$F_{01}(z) = \frac{1 - \sqrt{1 - 4pqz^2}}{2qz}.$$

Arguing as above, it follows that $F_{0k}(z) = F_{01}(z)^k$, for any k ≥ 1. We can use this to determine whether the random walk X is recurrent or transient: since the chain is irreducible, it follows that X is recurrent if and only if $f_{0k}^* = 1$ for all k ∈ ℤ.
Now, when k ≥ 1, we see that

$$f_{0k}^* = \lim_{z \uparrow 1} F_{0k}(z) = \left( \frac{1 - \sqrt{1 - 4pq}}{2q} \right)^k = \left( \frac{1 - |1 - 2p|}{2q} \right)^k = \begin{cases} (p/q)^k & \text{if } p < 1/2 \\ 1 & \text{if } p \ge 1/2 \end{cases}$$

(using $1 - 4pq = (1 - 2p)^2$).

So if p < 1/2, there is a positive chance that X never reaches state k ≥ 1; by symmetry,
if p > 1/2, the random walk will tend to drift upwards, and there will be a positive
chance that it will never visit any given negative state. We have shown that

Theorem 3.41. The simple random walk on Z is recurrent if and only if it is symmetric.

It can similarly be shown (without much further effort) that the symmetric random
walk in 2 dimensions is recurrent, but that in 3 (or more) dimensions it is transient!

3.4 The fundamental matrix


Suppose that S is finite, and that the chain X begins in a transient state. Since S is
finite, there must exist at least one essential, recurrent class (why?!), and we know that
eventually X must leave the set of transient states T (consider Exercise 3.35) and enter
a recurrent class where it will remain forever (Theorem 3.37). We would like to know
the probabilities of X ever visiting any particular recurrent class (if there is more than one), and also to calculate the expected amount of time X spends in T.

Definition 3.42. The square matrix Q is a substochastic matrix if all its entries are
non-negative and all of its row-sums are no greater than one. (Recall that transition
matrices are stochastic matrices – all row sums equal one.)

Definition 3.43. The fundamental matrix G of a substochastic matrix Q is given by

$$G = I + Q + Q^2 + Q^3 + \cdots.$$

Note that the entries of G take values in the range [0, ∞].
Remark 3.44. If P is a transition matrix for the chain X and $G = \sum_{n=0}^{\infty} P^n = \sum_{n=0}^{\infty} P^{(n)}$ is its fundamental matrix, then the (i, j)th entry of G satisfies

$$G_{ij} = \sum_{n=0}^{\infty} p_{ij}^{(n)} = E[\text{number of times } X \text{ visits } j \mid X_0 = i].$$

Example 3.45. Consider the chain with transition matrix

$$P = \begin{pmatrix} 2/3 & 1/3 \\ 0 & 1 \end{pmatrix}$$

(rows and columns indexed by states 1, 2). Then

$$P^{(n)} = \begin{pmatrix} (2/3)^n & 1 - (2/3)^n \\ 0 & 1 \end{pmatrix},$$

and so

$$G = \begin{pmatrix} 3 & \infty \\ 0 & \infty \end{pmatrix}.$$

Thus if X begins in the transient state 1 then it spends on average three units of time there, and infinitely long in state 2 (once it gets there it never leaves). If it begins in state 2 then it spends no time at all in state 1 (it can’t get there), and infinitely long in state 2. ⊛

We have seen in Theorem 3.34 that we can relate the first-passage probabilities to the expected number of times that X visits each state. It follows that

$$G_{ii} = \begin{cases} \infty & \text{if } i \text{ is recurrent} \\ \dfrac{1}{1 - f_{ii}^*} & \text{if } i \text{ is transient.} \end{cases}$$

Furthermore, if i ≠ j then

$$G_{ij} = \begin{cases} 0 & \text{if } i \nrightarrow j \\ \infty & \text{if } i \to j \text{ and } j \text{ is recurrent} \\ \dfrac{f_{ij}^*}{1 - f_{jj}^*} & \text{if } i \to j \text{ and } j \text{ is transient.} \end{cases}$$

To understand this, consider each case separately. If i ↛ j then if $X_0 = i$ it is impossible for the chain ever to visit j, hence the expected number of visits to j is zero. If i → j then there is a positive chance that the chain, if started at i, will reach j in finite time, i.e. $f_{ij}^* > 0$. If it does reach j, then if j is recurrent it will then re-visit j on average an infinite number of times (since $G_{jj} = \infty$); if j is transient however, it will re-visit j on average a finite number of times (equal to $G_{jj} = 1/(1 - f_{jj}^*)$). So in each case $G_{ij}$ is given by

$$\begin{aligned}
G_{ij} &= E[\text{number of visits to } j \mid X_0 = i] \\
&= P(X \text{ ever hits } j \mid X_0 = i) \, E[\text{number of visits to } j \mid X_0 = j] \\
&= f_{ij}^* G_{jj},
\end{aligned}$$

where we have implicitly used the Markov property in the second line to argue that the chain effectively ‘starts again with $X_0 = j$’ once it has reached j for the first time.

Example 3.46. Looking back to Example 3.45 we know that state 1 is transient, with $G_{11} = 3$. Thus $f_{11}^* = 2/3$. ⊛

Remark 3.47. Note that if j is recurrent then Gij equals 0 or ∞ for all i ∈ S; if j is
transient then Gij < ∞ for all i.

These observations make it very easy to calculate the entries of G corresponding to recurrent states j: $G_{ij} = 0$ if i ↛ j; otherwise $G_{ij} = \infty$. Furthermore, if j is transient and i is recurrent, then it must be impossible for the chain to move from i to j (Theorem 3.37): thus $G_{ij} = 0$ in this case.
So we are left with having to calculate the values of $G_{ij}$ where i, j ∈ T, i.e. where both i and j are transient states. A very useful fact is that we can calculate this part of G by restricting attention to the substochastic matrix Q corresponding to the states in T, and calculating the fundamental matrix for Q, $G_Q$. Then, arguing as for geometric series, we find that

$$G_Q = I + Q + Q^2 + \cdots = I + Q(I + Q + Q^2 + \cdots) = I + Q G_Q.$$

Furthermore, we know that all of the entries of $G_Q$ are finite (by the above discussion), and so we may subtract $Q G_Q$ from each side to obtain

$$I = (I - Q) G_Q.$$

Finally, since we are dealing with a finite state-space S here, all of these matrices are finite-dimensional and we can deduce that

$$G_Q = (I - Q)^{-1}. \qquad (3.6)$$

Example 3.48. Consider again the coin-tossing game of Example 3.32, with transition matrix

$$P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 & 0 \\ 1/2 & 0 & 0 & 0 & 0 & 1/2 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

(rows and columns indexed by states 1–6).

States T = {1, 2, 3, 4} are transient, and {5} and {6} are two recurrent communicating classes. We can immediately fill in the entries of G involving either of the recurrent states:

$$G = \begin{pmatrix} ? & ? & ? & ? & \infty & \infty \\ ? & ? & ? & ? & \infty & \infty \\ ? & ? & ? & ? & \infty & 0 \\ ? & ? & ? & ? & \infty & \infty \\ 0 & 0 & 0 & 0 & \infty & 0 \\ 0 & 0 & 0 & 0 & 0 & \infty \end{pmatrix}.$$

Now let Q be the substochastic matrix obtained by restricting P to T:

$$Q = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 1/2 & 0 \\ 1/2 & 0 & 0 & 0 \end{pmatrix}$$

We then calculate

$$G_Q = (I - Q)^{-1} = \begin{pmatrix} 1/2 & -1/2 & 0 & 0 \\ 0 & 1 & -1/2 & -1/2 \\ 0 & 0 & 1/2 & 0 \\ -1/2 & 0 & 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 8/3 & 4/3 & 4/3 & 2/3 \\ 2/3 & 4/3 & 4/3 & 2/3 \\ 0 & 0 & 2 & 0 \\ 4/3 & 2/3 & 2/3 & 4/3 \end{pmatrix}.$$

This then tells us the final entries of our matrix G:

$$G = \begin{pmatrix} 8/3 & 4/3 & 4/3 & 2/3 & \infty & \infty \\ 2/3 & 4/3 & 4/3 & 2/3 & \infty & \infty \\ 0 & 0 & 2 & 0 & \infty & 0 \\ 4/3 & 2/3 & 2/3 & 4/3 & \infty & \infty \\ 0 & 0 & 0 & 0 & \infty & 0 \\ 0 & 0 & 0 & 0 & 0 & \infty \end{pmatrix}.$$

So, for example, if X0 = 1 then on average the chain spends 4/3 units of time in state
2. If we started the chain in state 3 then we would expect it to spend a total of 2 units
of time in state 3 before being absorbed in state 5. ⊛
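Equation (3.6) turns this whole calculation into a single matrix inversion; a quick numpy check of the $G_Q$ above:

```python
import numpy as np

Q = np.array([[1/2, 1/2, 0,   0  ],
              [0,   0,   1/2, 1/2],
              [0,   0,   1/2, 0  ],
              [1/2, 0,   0,   0  ]])

GQ = np.linalg.inv(np.eye(4) - Q)   # fundamental matrix, equation (3.6)
print(GQ)   # [[2.667, 1.333, 1.333, 0.667], ..., [1.333, 0.667, 0.667, 1.333]]
```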

Notation: If C ⊆ S then we let $T_{i,C}$ be the first time that X hits C if started at i. We shall write

$$p_{iC} = \sum_{j \in C} p_{ij}$$

for the probability that X jumps from i into C in one step, and

$$f_{iC}^* = P(X_n \in C \text{ for some } n > 0 \mid X_0 = i) = P(T_{i,C} < \infty \mid X_0 = i).$$

That is, $f_{iC}^*$ is the probability that X ever hits the set of states C when started from i.

Theorem 3.49. Suppose T is the union of all transient classes and C is any union of recurrent classes such that $f_{iC}^* = 1$ for all i ∈ T. Then

$$\big( E[T_{i,C}] : i \in T \big) = G_Q \, e.$$

Here e is a column vector full of 1s, with as many entries as there are states in T.

Proof. Because X begins in T and must end up in C (since $f_{iC}^* = 1$ for all i ∈ T), $T_{i,C}$ is equal to the amount of time that X spends in T. (Here we are using the fact that recurrent states are essential – once the chain hits a recurrent state it is impossible for it to move to a transient state.) But the (i, j)th entry of $G_Q$ is equal to the mean number of times the chain visits the transient state j when started at i: if we add up all of these entries (summing over j) then we obtain the mean amount of time spent in all transient states. But this is exactly what we get when we multiply $G_Q$ by e.
Example 3.50. Consider Example 3.48 once again. Here T = {1, 2, 3, 4} is the set of all transient states, and C = {5, 6} is the union of all recurrent classes. Using Theorem 3.49 we calculate:

$$\begin{pmatrix} E[T_{1,C}] \\ E[T_{2,C}] \\ E[T_{3,C}] \\ E[T_{4,C}] \end{pmatrix} = G_Q \, e = \begin{pmatrix} 8/3 & 4/3 & 4/3 & 2/3 \\ 2/3 & 4/3 & 4/3 & 2/3 \\ 0 & 0 & 2 & 0 \\ 4/3 & 2/3 & 2/3 & 4/3 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 6 \\ 4 \\ 2 \\ 4 \end{pmatrix}.$$

So we see that if $X_0 = 1$, the average amount of time that the chain spends in T (i.e. the amount of time until somebody wins the game) is 6. ⊛
As well as using the matrix $G_Q$ to calculate the mean amount of time that X spends in transient states, we can also use it to calculate the probability that the chain will ever visit the recurrent class C. (If there is only one recurrent class then this probability is of course 1; but if there is more than one such class, the chain can only ever enter one of them, and so this problem becomes more interesting.)
Example 3.51. Consider Example 3.48 again: here there are two recurrent classes ($C_1 = \{5\}$ and $C_2 = \{6\}$), and we are interested in the probability that the chain ever visits $C_1$ (since this is the probability that Player A wins). Remember that we have already calculated this probability, $f_{15}^* = 2/3$, in Example 3.32 by solving a set of linear equations – we promised then that there is a more efficient way to do this using matrices ... ⊛
Theorem 3.52. Let C be a recurrent class, and T the union of all transient classes. Let $F_C^*$ be the vector of absorption probabilities

$$F_C^* = \big( f_{iC}^* : i \in T \big),$$

and define the matrix $R_C$ by

$$R_C = \big( p_{ij} : i \in T, \ j \in C \big).$$

Then

$$F_C^* = G_Q R_C \, e \qquad (3.7)$$

where e is again a column vector of 1s (with as many entries as the number of states in C).

Remark 3.53. If you have trouble remembering how many entries there should be in the
vector e (|T | in Theorem 3.49, and |C| in Theorem 3.52), then note that in both cases the
number of entries is exactly the right number to make the matrix multiplication work!

Proof of Theorem 3.52. Recall that $f_{iC}^*$ is the probability that X ever reaches C if started at i. In order to reach C, X has to spend some number n ≥ 0 of steps in the set of transient states T, and then jump (in one step) from some state k ∈ T to some state j ∈ C. Thus we can write

$$f_{iC}^* = \sum_{n \ge 0} \sum_{k \in T} \sum_{j \in C} p_{ik}^{(n)} p_{kj} = \sum_{k \in T} \sum_{j \in C} \Big( \sum_{n \ge 0} p_{ik}^{(n)} \Big) p_{kj} = \sum_{j \in C} \sum_{k \in T} G_{ik} p_{kj} = \sum_{j \in C} (G_Q R_C)_{ij} = \big( G_Q R_C \, e \big)_i.$$

Example 3.54. Returning to Example 3.51, we know from Example 3.50 that

$$G_Q = \begin{pmatrix} 8/3 & 4/3 & 4/3 & 2/3 \\ 2/3 & 4/3 & 4/3 & 2/3 \\ 0 & 0 & 2 & 0 \\ 4/3 & 2/3 & 2/3 & 4/3 \end{pmatrix}.$$

We also know that

$$R_{C_1} = \begin{pmatrix} 0 \\ 0 \\ 1/2 \\ 0 \end{pmatrix}$$

(rows indexed by states 1–4, the single column by state 5).
(Here we have simply taken the entries of P with rows corresponding to states i ∈ T =
{1, 2, 3, 4} and columns to states j ∈ C1 = {5}.) Since there is only one state in C1 , we

have e = (1). Then

$$F_{C_1}^* = G_Q R_{C_1} e = \begin{pmatrix} 8/3 & 4/3 & 4/3 & 2/3 \\ 2/3 & 4/3 & 4/3 & 2/3 \\ 0 & 0 & 2 & 0 \\ 4/3 & 2/3 & 2/3 & 4/3 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1/2 \\ 0 \end{pmatrix} (1) = \begin{pmatrix} 2/3 \\ 2/3 \\ 1 \\ 1/3 \end{pmatrix}.$$

This tells us that if we start in state 1 or 2, the probability of ever reaching $C_1$ (and Player A winning) is 2/3 (as we had already calculated!); if the chain starts in state 3 then Player A is certain to win (3 ↛ 6, and so this makes sense!); if the chain starts in state 4, then it is more likely that B wins – A only has a 1/3 chance of winning from this starting state. ⊛

Exercise 3.55. Since the game must end with either Player A or Player B winning, it follows that

$$F_{C_2}^* = \begin{pmatrix} 1/3 \\ 1/3 \\ 0 \\ 2/3 \end{pmatrix}.$$

Check that you can obtain this result by replacing $R_{C_1}$ with $R_{C_2}$ above. ⊛

Example 3.56. Consider the following biased random walk on S = {0, 1, 2, 3, 4}: states 0 and 4 are absorbing, and from each of the states 1, 2, 3 the chain jumps up by one with probability 1/4 and down by one with probability 3/4. [State diagram not reproduced.]

Here there are two essential classes ($C_1 = \{0\}$ and $C_2 = \{4\}$), and one transient class (T = {1, 2, 3}). We have

$$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 3/4 & 0 & 1/4 & 0 & 0 \\ 0 & 3/4 & 0 & 1/4 & 0 \\ 0 & 0 & 3/4 & 0 & 1/4 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}; \qquad Q = \begin{pmatrix} 0 & 1/4 & 0 \\ 3/4 & 0 & 1/4 \\ 0 & 3/4 & 0 \end{pmatrix}.$$

Using Q we calculate

$$G_Q = (I - Q)^{-1} = \frac{1}{10} \begin{pmatrix} 13 & 4 & 1 \\ 12 & 16 & 4 \\ 9 & 12 & 13 \end{pmatrix},$$

and from P we read off

$$R_{C_1} = \begin{pmatrix} 3/4 \\ 0 \\ 0 \end{pmatrix}; \qquad R_{C_2} = \begin{pmatrix} 0 \\ 0 \\ 1/4 \end{pmatrix}.$$

We can then calculate the mean time until the random walk hits one of the recurrent states belonging to $C = C_1 \cup C_2$:

$$\begin{pmatrix} E[T_{1,C}] \\ E[T_{2,C}] \\ E[T_{3,C}] \end{pmatrix} = G_Q \, e = \frac{1}{10} \begin{pmatrix} 13 & 4 & 1 \\ 12 & 16 & 4 \\ 9 & 12 & 13 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 18/10 \\ 32/10 \\ 34/10 \end{pmatrix},$$

and also the probabilities of ending up in state 0:

$$F_{C_1}^* = G_Q R_{C_1} e = \frac{1}{10} \begin{pmatrix} 13 & 4 & 1 \\ 12 & 16 & 4 \\ 9 & 12 & 13 \end{pmatrix} \begin{pmatrix} 3/4 \\ 0 \\ 0 \end{pmatrix} (1) = \begin{pmatrix} 39/40 \\ 9/10 \\ 27/40 \end{pmatrix}.$$

So, for example, if the chain begins in state 2, it will spend on average 32/10 units of
time in states {1, 2, 3} before ending up in one of the two absorbing states. Furthermore,
starting from state 2, the probability that it will ever visit state 0 (and remain there
forever) is 9/10. ⊛
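Once $G_Q$ is known, Theorems 3.49 and 3.52 are just matrix products. A numpy sketch reproducing the numbers of this example (the variable names are ours):

```python
import numpy as np

Q  = np.array([[0, 1/4, 0], [3/4, 0, 1/4], [0, 3/4, 0]])
R0 = np.array([[3/4], [0], [0]])    # R_{C1}: one-step probabilities into {0}
R4 = np.array([[0], [0], [1/4]])    # R_{C2}: one-step probabilities into {4}

GQ = np.linalg.inv(np.eye(3) - Q)
print(GQ @ np.ones(3))   # mean absorption times: [1.8, 3.2, 3.4]
print(GQ @ R0)           # P(ever hit 0): [[0.975], [0.9], [0.675]]
print(GQ @ R4)           # P(ever hit 4): [[0.025], [0.1], [0.325]]
```

(Here e = (1), so multiplying by e on the right changes nothing.)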

3.5 Stationarity
In this (final) section of the notes we look at the behaviour of Markov chains as n → ∞. In general we would not expect $X_n$ to converge to any given state, but we might hope that as n gets large the distribution of $X_n$ ($\nu^{(n)} = \nu^{(0)} P^n$) will converge.

We again assume in this section that S is finite.

Definition 3.57. The row vector π = (πi : i ∈ S) is called a stationary (or invariant,
or equilibrium) distribution of the Markov chain X if and only if π satisfies the following
conditions:

(a) $\pi_i \ge 0$ for all i ∈ S;

(b) $\sum_{i \in S} \pi_i = 1$;

(c) for all j ∈ S,

$$\pi_j = \sum_{i \in S} \pi_i p_{ij}.$$

Stationary distributions are also often called invariant or equilibrium distributions.


Note that, in matrix notation, part (c) of this definition states that

π = πP .

Exercise 3.58 (Exercise sheet 4). Show that part (c) of Definition 3.57 is equivalent to

$$\pi = \pi P^n \quad \text{for all } n \ge 0.$$

Remark 3.59. π is called a stationary distribution since, if X0 ∼ π then Xn ∼ π for all


n ≥ 0. (Recall Exercise 3.16).

In the previous section we saw that S can be decomposed into a set of transient states T and a number of recurrent classes $\{C_i\}$. We have also seen (in Exercise 3.35) that $p_{ij}^{(n)} \to 0$ as n → ∞ for all j ∈ T. Since S is finite, there is at least one such recurrent class and we know that the chain will eventually leave T, enter one of the sets $C_i$, and remain there forever. If we expect a stationary distribution to tell us about the long-term behaviour of X, we therefore expect the non-zero entries of π to relate to the recurrent states in S.

Theorem 3.60. Suppose that X is a Markov chain on a finite state-space S = T ⊎ C, where T is the set of all transient states and C is a closed irreducible set of recurrent states. (Note that there is only one class of recurrent states here.) Then there exists a unique stationary distribution, given by $\pi = (\pi_i)_{i \in S}$ where

$$\pi_i = \begin{cases} 1/\mu_i & \text{if } i \in C \\ 0 & \text{if } i \in T, \end{cases}$$

where

$$\mu_i = E[T_{i,i}] = \sum_{n=1}^{\infty} n f_{ii}^{(n)}$$

is the mean recurrence time of state i.

Definition 3.61. The recurrent state i is called

• positive-recurrent if µi < ∞;

• null-recurrent if µi = ∞.

Remark 3.62. If S is finite (as we are assuming throughout this section), then all
recurrent states are positive-recurrent.

So if there is only one recurrent class in S, we can find the mean recurrence time
of each state by finding the unique stationary distribution π.

Example 3.63. Let S = {1, 2, 3, 4} and

$$P = \begin{pmatrix} 1/3 & 2/3 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 1/2 & 1/2 \\ 0 & 3/4 & 1/4 & 0 \end{pmatrix}.$$

From inspection of P we see that {1, 2} is a class of recurrent states, and that states 3 and 4 are transient. Theorem 3.60 tells us that there is a unique stationary distribution π, with $\pi_3 = \pi_4 = 0$. In order to find $\pi_1$ and $\pi_2$, we can restrict attention to that part of P corresponding to these two states. We must therefore solve

$$(\pi_1, \pi_2) = (\pi_1, \pi_2) \begin{pmatrix} 1/3 & 2/3 \\ 1/2 & 1/2 \end{pmatrix},$$

subject to $\pi_1 + \pi_2 = 1$. Doing so yields $\pi_1 = 3/7$ and $\pi_2 = 4/7$. Thus π = (3/7, 4/7, 0, 0) is the unique stationary distribution. Furthermore, we see that the mean recurrence times for states 1 and 2 are $\mu_1 = 1/\pi_1 = 7/3$ and $\mu_2 = 1/\pi_2 = 7/4$. ⊛

Exercise 3.64. Check that πP = π in Example 3.63. ⊛


Example 3.65. Consider the random walk on S = {0, 1, 2, 3, 4} with transition matrix

$$P = \begin{pmatrix} 3/4 & 1/4 & 0 & 0 & 0 \\ 3/4 & 0 & 1/4 & 0 & 0 \\ 0 & 3/4 & 0 & 1/4 & 0 \\ 0 & 0 & 3/4 & 0 & 1/4 \\ 0 & 0 & 0 & 3/4 & 1/4 \end{pmatrix}.$$

Here C = {0, 1, 2, 3, 4} and T = ∅. To find the unique stationary distribution we solve (using simultaneous equations, or Gaussian elimination)

$$(\pi_0, \pi_1, \pi_2, \pi_3, \pi_4) P = (\pi_0, \pi_1, \pi_2, \pi_3, \pi_4).$$

This has solution (with $\sum_{i=0}^{4} \pi_i = 1$): $\pi = \frac{1}{121}(81, 27, 9, 3, 1)$. Thus, if $X_0 = 0$, we see that $\mu_0$, the mean time until X returns to 0, is given by $\mu_0 = 1/\pi_0 = 121/81$; whereas if $X_0 = 4$, the mean time until it returns to its starting state is 121. ⊛
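Solving $\pi P = \pi$ subject to $\sum_i \pi_i = 1$ is a small linear system; one standard trick (sketched below, with our variable names) is to replace one of the redundant balance equations by the normalisation constraint:

```python
import numpy as np

# Transition matrix of the random walk above (states 0, ..., 4)
P = np.zeros((5, 5))
P[0, 0], P[0, 1] = 3/4, 1/4
for i in (1, 2, 3):
    P[i, i - 1], P[i, i + 1] = 3/4, 1/4
P[4, 3], P[4, 4] = 3/4, 1/4

# pi P = pi  <=>  (P^T - I) pi^T = 0; the rows of P^T - I sum to zero,
# so drop one of them and append the constraint sum(pi) = 1 instead.
A = np.vstack([(P.T - np.eye(5))[:-1], np.ones(5)])
pi = np.linalg.solve(A, np.array([0, 0, 0, 0, 1]))
print(pi * 121)   # [81, 27, 9, 3, 1]
```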

3.5.1 Limiting behaviour

We have seen that a stationary distribution (if one exists) satisfies $\pi P^n = \pi$ for all n. We now study the relationship between π and the limiting behaviour of $p_{ij}^{(n)}$ as n → ∞.
We have already seen that if j is transient, then $p_{ij}^{(n)} \to 0$ for all i ∈ S. What can be said about the limit of these probabilities if j is recurrent? In considering limiting behaviour, we will restrict attention to the case where all states are aperiodic: in this case we call the chain itself aperiodic. (Similar, but messier, results can be derived for chains with period d > 1 by subsampling the chain at times kd, k ≥ 0.)

Theorem 3.66. Suppose that X is an aperiodic Markov chain with state-space S. Then

$$p_{ij}^{(n)} \to \frac{f_{ij}^*}{\mu_j} \quad \text{as } n \to \infty, \text{ for all } i \in S.$$

Definition 3.67. The chain X is called ergodic if it is irreducible, aperiodic and


positive-recurrent (i.e. all states intercommunicate, have period 1, and are positive-
recurrent).
Thus if |S| < ∞ and X is ergodic, then we have seen that there is a unique stationary distribution π for the chain. Theorem 3.66 tells us that in this case,

$$p_{ij}^{(n)} \to \frac{1}{\mu_j} = \pi_j \quad \text{as } n \to \infty, \text{ for all } i \in S.$$

Example 3.68. Consider the chain X with transition matrix

$$P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 0 & 3/4 & 1/4 \\ 1 & 0 & 0 \end{pmatrix}.$$

This chain is ergodic, and the unique stationary distribution is given by π = (2/7, 4/7, 1/7). Thus as n → ∞,

$$P^{(n)} \to \begin{pmatrix} 2/7 & 4/7 & 1/7 \\ 2/7 & 4/7 & 1/7 \\ 2/7 & 4/7 & 1/7 \end{pmatrix}.$$
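This convergence is easy to witness numerically; a quick sketch:

```python
import numpy as np

P = np.array([[1/2, 1/2, 0  ],
              [0,   3/4, 1/4],
              [1,   0,   0  ]])
print(np.linalg.matrix_power(P, 50))
# every row is (2/7, 4/7, 1/7) ~ (0.2857, 0.5714, 0.1429) to high accuracy
```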

Example 3.69. Consider the chain X with matrix P given by

$$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1/4 & 0 & 3/4 & 0 \\ 1/3 & 0 & 1/3 & 0 & 1/3 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 1/3 & 1/3 & 1/3 & 0 & 0 \end{pmatrix}.$$

Here there is one transient class {3, 5} and two positive-recurrent aperiodic classes, {1} and {2, 4}. We can therefore immediately work out many of the entries in the limit of $P^{(n)}$:

$$P^{(n)} \to \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & ? & 0 & ? & 0 \\ ? & ? & 0 & ? & 0 \\ 0 & ? & 0 & ? & 0 \\ ? & ? & 0 & ? & 0 \end{pmatrix}.$$

For the missing entries of this matrix, we see from Theorem 3.66 that we need to find the probabilities $f_{ij}^*$. By Theorem 3.37, $f_{24}^* = f_{42}^* = 1$, and so we just need to find $f_{ij}^*$ for i ∈ {3, 5} and j ∈ {2, 4}. Using the first-step decomposition (3.3), or our fundamental matrix equations (3.7), we obtain

$$f_{32}^* = \tfrac{1}{3} f_{32}^* + \tfrac{1}{3} f_{52}^* \quad \Rightarrow \quad f_{32}^* = \tfrac{1}{2} f_{52}^*;$$

$$f_{52}^* = \tfrac{1}{3} + \tfrac{1}{3} f_{32}^* = \tfrac{1}{3} + \tfrac{1}{6} f_{52}^* \quad \Rightarrow \quad f_{52}^* = \tfrac{2}{5} \ \text{ and } \ f_{32}^* = \tfrac{1}{5}.$$

This immediately tells us that $f_{34}^* = f_{32}^* = 1/5$ and $f_{54}^* = 2/5$; so $f_{31}^* = 4/5$ and $f_{51}^* = 3/5$.

Finally, we need to calculate the mean recurrence times $\mu_1$, $\mu_2$ and $\mu_4$. Hopefully it’s obvious that $\mu_1 = 1$. So now consider the chain started in class {2, 4}: the chain restricted to this class is ergodic, and so possesses a unique stationary distribution (by Theorem 3.60), which we find by solving

$$(\pi_2, \pi_4) = (\pi_2, \pi_4) \begin{pmatrix} 1/4 & 3/4 \\ 1/2 & 1/2 \end{pmatrix}.$$

This gives $(\pi_2, \pi_4) = (2/5, 3/5)$ (remembering that we always need $\pi_2 + \pi_4 = 1$). Thus $\mu_2 = 1/\pi_2 = 5/2$ and $\mu_4 = 1/\pi_4 = 5/3$.
We can now finish filling in the limiting matrix, using $p_{ij}^{(n)} \to f_{ij}^*/\mu_j$:

$$P^{(n)} \to \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 2/5 & 0 & 3/5 & 0 \\ 4/5 & 2/25 & 0 & 3/25 & 0 \\ 0 & 2/5 & 0 & 3/5 & 0 \\ 3/5 & 4/25 & 0 & 6/25 & 0 \end{pmatrix}.$$

THE END
