Hold Up Game and Many More Extensive Form
Contents

1 Nash Equilibrium in Extensive Form Games
  1.1 Selten's Game
  1.2 The Little Horsey
  1.3 Giving Gifts
3 Backward Induction
  3.1 Rollback Equilibrium
  3.2 Subgame-Perfect Equilibrium
More generally, a Nash equilibrium of an extensive-form game is a strategy profile (si∗, s−i∗) such that ui(si∗, s−i∗) ≥ ui(si, s−i∗) for each player i and all si ∈ Si. That is, the definition of Nash equilibrium is the same as for strategic games (but be careful how you specify the strategies here).
Finding the Nash equilibria of extensive form games thus boils down to finding Nash equilibria of their reduced normal form representations. We have already done this with Myerson's card game, reproduced in Fig. 1 (p. 2).
[Figure 1: Myerson's Card Game in Extensive Form. Nature deals a black or red card with probability 0.5 each; player 1 folds (F) or raises (R) at information set b after the black card, and folds (f) or raises (r) at information set c after the red card; player 2 then meets (m) or passes (p) at her information set, whose nodes are labeled qb and qc.]
Recall that the mixed strategy Nash equilibrium of this game is:

    (( 1/3[Rr], 2/3[Fr] ), ( 2/3[m], 1/3[p] )).
If we want to express this in terms of behavior strategies, we would need to specify the probability distributions for the information sets. Player 1 has two information sets, b following the black card, and c following the red card. The probability distributions are ( 2/3[F], 1/3[R]) at information set b, and (0[f], 1[r]) at information set c. In other words, if player 1 sees the black (losing) card, he folds with probability 2/3. If he sees the red (winning) card, he always raises. Player 2's behavior strategy is specified above (she has only one information set).
Because in games of perfect recall mixed and behavior strategies are equivalent (Kuhn's Theorem), we can conclude that a Nash equilibrium in behavior strategies must always exist in these games. This follows directly from Nash's Theorem. Hence, we have the following important result:

Theorem 1. For any extensive-form game with perfect recall, a Nash equilibrium in behavior strategies exists.
Generally, the first step to solving an extensive-form game is to find all of its Nash equilibria. The theorem tells us at least one such equilibrium will exist. We furthermore know that if we find the Nash equilibria of the reduced normal form representation, we would find all equilibria for the extensive form. Hence, the usual procedure is to convert the extensive-form game to strategic form, and find its equilibria.
1.1 Selten's Game
However, some of these equilibria would have important drawbacks because they ignore the dynamic nature of the extensive form. This should not be surprising: after all, we obtained the strategic form representation by removing the element of timing of moves completely. Reinhard Selten was the first to argue, in his 1965 article, that some Nash equilibria are more reasonable than others. He used the example in Fig. 2 (p. 3) to motivate the discussion, and so will we.
[Figure 2: Selten's Example. Player 1 chooses U or D; choosing U ends the game with payoffs (2, 2); after D, player 2 chooses L, giving payoffs (3, 1), or R, giving payoffs (0, 0).]
The strategic form representation has two pure-strategy Nash equilibria, ⟨D, L⟩ and ⟨U, R⟩.¹ Look closely at the Nash equilibrium ⟨U, R⟩ and what it implies for the extensive form. In the profile ⟨U, R⟩, player 2's information set is never reached, and she loses nothing by playing R there. But there is something wrong with this equilibrium: if player 2's information set is ever reached, then she would be strictly better off by choosing L instead of R. In effect, player 2 is threatening player 1 with an action that would not be in her own interest to carry out. Player 2 does this in order to induce player 1 to choose U at the initial node, thereby yielding her the highest payoff of 2. But this threat is not credible because, given the chance, player 2 will always play L, and therefore this is how player 1 would expect her to play if he chooses D. Consequently, player 1 would choose D and player 2 would choose L, which of course is the other Nash equilibrium ⟨D, L⟩.

The Nash equilibrium ⟨U, R⟩ is not plausible because it relies on an incredible threat (that is, it relies on an action which would not be in the interest of the player to carry out). In fact, none of the MSNE will be plausible for that very reason either. According to our motivation for studying extensive form games, we are interested in sequencing of moves presumably because players get to reassess their plans of actions in light of past moves by other players
¹What about mixed strategies? Suppose player 1 randomizes, in which case player 2's best response is L. But if this is the case, player 1 would be unwilling to randomize and would choose D instead. So it cannot be the case that player 1 mixes in equilibrium. What if player 2 mixes? Let q denote the probability of choosing L. Player 1's expected payoff from U is then 2q + 2(1 − q) = 2, and his expected payoff from D is 3q. He would choose U if 2 ≥ 3q, or 2/3 ≥ q; otherwise he would choose D. Player 2 cannot mix with 1 > q > 2/3 in equilibrium because she has a unique best response to D. Therefore, she must be mixing with 0 ≤ q ≤ 2/3. For any such q, player 1 would play U. So, there is a continuum of mixed-strategy Nash equilibria, where player 1 chooses U, and player 2 mixes with probability q ≤ 2/3. These have the same problem as ⟨U, R⟩.
(and themselves). That is, nonterminal histories represent points at which such reassessment may occur. The only acceptable solution should be the PSNE ⟨D, L⟩.

The following definition is very important for the discussion that follows. It helps distinguish between actions that would be taken if the equilibrium strategies are implemented and those that would not.
Definition 2. Given any behavior strategy profile σ, an information set is said to be on the path of play if, and only if, the information set is reached with positive probability according to σ. If σ is an equilibrium strategy profile, then we refer to the equilibrium path of play.
To anticipate a bit of what follows, the problem with the ⟨U, R⟩ solution is that it specifies the incredible action at an information set that is off the equilibrium path of play. Player 2's information set is never reached if player 1 chooses U. Consequently, Nash equilibrium cannot pin down the optimality of the action at that information set. The problem will not extend to strategy profiles which visit all information sets with positive probability. The reason for this is that if the Nash equilibrium profile reaches all information sets with positive probability, then it will also reach all outcomes with positive probability. But if it does so, the fact that no player can profit by deviating from his Nash strategy implies that there would exist no information set where he would want to deviate. In other words, his actions at all information sets are credible. If, on the other hand, the Nash strategies leave some information sets off the path of play, then the Nash requirement has no bite: whatever the player does at these information sets is irrelevant as it cannot affect his payoffs. It is under these circumstances that he may be picking an action that he would never choose if the information set is actually reached. Notice that unlike ⟨U, R⟩, the other PSNE ⟨D, L⟩ does reach all information sets with positive probability. In this case, Nash's requirement is sufficient to establish optimality of the strategies everywhere. As we shall see, our solutions will always be Nash equilibria. It's just that not all Nash equilibria will be reasonable.
We now look at two examples that demonstrate that this problem occurs not only in games
of certainty, complete and perfect information, but also in games of certainty with imperfect
information, and games of uncertainty with imperfect information.
1.2 The Little Horsey
Consider the following simple game in Fig. 3 (p. 4). Player 1 gets to choose between U , M,
or D. If he chooses D, the game ends. If he chooses either U or M, player 2 gets to choose
between L and R without knowing what action player 1 has taken except that it was not D.
[Figure 3: The Little Horsey. Player 1 chooses U, M, or D; D ends the game with payoffs (1, 3). After U or M, player 2, with belief [p] on the node following U and [1 − p] on the node following M, chooses L or R. Payoffs: (U, L) = (2, 1); (U, R) = (0, 0); (M, L) = (0, 2); (M, R) = (0, 1).]
However, there is something unsatisfying about the second one. In the ⟨D, R⟩ equilibrium, player 2 seems to behave irrationally. If her information set is ever reached, playing L strictly dominates R. So player 1 should not be induced to play D by the incredible threat to play R. However, since player 2's information set is off the equilibrium path, Nash equilibrium does not evaluate the optimality of play there.
The strategic form of this game is:

         L       R
    U    2, 1    0, 0
    M    0, 2    0, 1
    D    1, 3    1, 3
[Figure 5: The Gift Game. Nature chooses whether the book is GT (probability p) or ST (probability 1 − p); player 1, knowing the type, chooses Gift or No Gift; player 2, with beliefs [q] and [1 − q] at her information set, accepts (Y) or rejects (N). No Gift yields payoffs (0, 0). The strategic-form representation lists player 1's strategies GG, GN, NG, NN against player 2's strategies Y and N.]
Note that this requirement does not tell us anything about how these beliefs are formed (but you can already probably guess that they involve Bayes' rule). Before studying this, let's see how simply having these beliefs can help solve some of the problems we encountered. For this, we need to introduce a second requirement.
2.2 Sequential Rationality
A strategy is sequentially rational for player i at the information set h if player i would actually want to choose the action prescribed by the strategy if h is reached. Because we require players to have beliefs for each information set, we can evaluate optimal behavior at all of its nodes, including ones in information sets that are not reached along the equilibrium path. For now, we are not interested in how these beliefs are formed, just that players have them. For example, regardless of what player 1 does, player 2 will have some updated belief q at her information set.²
A continuation game refers to the information set and all nodes that follow from that information set. Using their beliefs, players can calculate the expected payoffs from continuation games. In general, the optimal action at an information set may depend on which node in that set the play has reached. To calculate the expected payoff of an action at information set h, we have to consider the continuation game.
More formally, take any two nodes x ≺ y (that is, y follows x in the game tree), and consider the mixed strategy profile σ. Let P(y|σ, x) denote the probability of reaching y starting from x and following the moves prescribed by σ. That is, P(y|σ, x) is the conditional probability that the path of play would go through y after x if all players chose according to σ and the game started at x. It is the multiplicative product of all chance probabilities and move probabilities given by σ for the branches connecting x and y. Naturally, if y does not follow x, then P(y|σ, x) = 0. Player i's expected utility in the continuation game starting at node x then is:

    Ui(σ|x) = Σ_{z∈Z} P(z|σ, x) · ui(z),

where Z is the set of terminal nodes in the game, and ui(·) is the Bernoulli payoff function specifying player i's utility from the outcome z ∈ Z. Note that this payoff only depends on the components of σ that are applied to nodes that follow x. Anything prior to x is ignored. These expected utilities are therefore conditional on node x having been reached by the path of play.
For example, consider Selten's game from Fig. 2 (p. 3) and suppose σ1 = ( 1/3[D], 2/3[U]) and σ2 = (0.5[L], 0.5[R]). The continuation game from player 1's information set includes the entire game tree. What is player 1's expected payoff? Let z1 denote the outcome (D, L), z2 denote the outcome (D, R), and z3 denote the outcome (U). The expected payoff then is:

    U1(σ|∅) = P(z1|σ, ∅)u1(z1) + P(z2|σ, ∅)u1(z2) + P(z3|σ, ∅)u1(z3)
            = σ1(D)σ2(L)(3) + σ1(D)σ2(R)(0) + σ1(U)(2)
            = 1/3 · 1/2 · 3 + 2/3 · 2 = 11/6.
²This is true even in the case where player 1's strategy is NN, in which case q would reflect the belief about player 1 given the unexpected, zero-probability event that a gift is offered.
What about his payoff in the continuation game starting at player 2's information set? It would be:

    U1(σ|D) = P(z1|σ, D)u1(z1) + P(z2|σ, D)u1(z2)
            = σ2(L)(3) + σ2(R)(0)
            = 1/2 · 3 = 3/2.
We can also calculate player 2's payoff in both continuation games in an analogous manner:

    U2(σ|∅) = P(z1|σ, ∅)u2(z1) + P(z2|σ, ∅)u2(z2) + P(z3|σ, ∅)u2(z3)
            = σ1(D)σ2(L)(1) + σ1(D)σ2(R)(0) + σ1(U)(2)
            = 1/3 · 1/2 · 1 + 2/3 · 2 = 3/2

and

    U2(σ|D) = P(z1|σ, D)u2(z1) + P(z2|σ, D)u2(z2)
            = σ2(L)(1) + σ2(R)(0)
            = 1/2 · 1 = 1/2.
How does player 2 evaluate the optimality of her behavior using her beliefs? She calculates expected utilities as usual and then chooses the strategy that yields the highest expected payoff in the continuation game. That is, if the information set h is a singleton, the behavior strategy profile σ is sequentially rational for i at h if, and only if, player i's strategy σi maximizes player i's expected payoff at h if all other players behave according to σ.

In Selten's example, player 2 has only one sequentially rational strategy at her information set (reached after D), which is trivial to verify. The expected payoff in the continuation game following this node is maximized if she chooses L with certainty. For player 1, the calculation involves player 2's strategy. Player 1 would choose D if:

    U1(D, σ2) ≥ U1(U, σ2) ⟺ σ2(L)(3) + σ2(R)(0) ≥ 2 ⟺ σ2(L) ≥ 2/3.

Thus, strategy D is sequentially rational for any σ2(L) > 2/3; strategy U is sequentially rational for any σ2(L) < 2/3; and both D and U are sequentially rational if σ2(L) = 2/3.
If information sets are singletons, sequential rationality is straightforward to define in the way we just did. What do we do for information sets that contain more than one node? This is where beliefs come in: we calculate the expected payoff from choosing a particular action conditional on these beliefs. That is, σi is sequentially rational for player i at the (non-singleton) information set h if, and only if, σi maximizes player i's expected payoff when node x ∈ h occurs, given the belief probabilities that player i assigns to the nodes x ∈ h conditional on h having occurred, and assuming that play continues according to σ.
The intuition is as follows. When a player finds himself at some information set, he must be able to assess the consequences of choosing some action. This is straightforward when the information set is a singleton: the player will know precisely what will happen given his choice. But what if the information set is not a singleton? In that case, the same action could have different consequences depending on which node in the information set the player is actually at. This is where beliefs come in. Since a belief is just a probability distribution over the nodes in this information set, the player can compute the expected payoff from choosing a particular action. To do this, he takes the payoff from taking that action at some node in the information set and multiplies it by the probability that he is at that node; he does that for all nodes in the information set and adds the results together. In other words, the usual expected payoff calculation!
For example, consider the game in Fig. 3 (p. 4), and let p denote player 2's belief that she is at the left node in her information set conditional on this set having been reached. Her expected payoff from choosing L is then p · (1) + (1 − p) · (2) = 2 − p; analogously, her expected payoff from choosing R is p · (0) + (1 − p) · (1) = 1 − p. She would choose L whenever 2 − p > 1 − p; that is, always (this inequality holds regardless of the value of p). Hence, the unique sequentially rational strategy for player 2 at her information set is to choose L with certainty.
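The belief-weighted comparison can be sketched in a few lines of Python, using player 2's payoffs from Fig. 3 as assumed in the text (1 and 2 from L at the two nodes, 0 and 1 from R); the helper name is mine:

```python
# Player 2's belief-based payoffs in the game of Fig. 3.
def expected_payoff(p, left, right):
    """Expected payoff of an action given belief p on the left (U) node."""
    return p * left + (1 - p) * right

# L yields 2 - p and R yields 1 - p, so L wins for every belief p.
for p in [i / 10 for i in range(11)]:
    assert expected_payoff(p, 1, 2) > expected_payoff(p, 0, 1)
```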
A similar thing happens with the gift game from Fig. 5 (p. 5). Player 2 can calculate the expected payoff from choosing Y, which is q(1) + (1 − q)(0) = q, and the expected payoff from choosing N, which is q(0) + (1 − q)(−1) = q − 1. Since q > q − 1 for all values of q, it is never optimal for player 2 to choose N regardless of the beliefs player 2 might hold. Therefore, the strategy N is not sequentially rational because there is no belief that player 2 can have that will make it optimal at her information set. In other words, the unique sequentially rational strategy is to choose Y with certainty.
We now require that in equilibrium players should only choose sequentially rational strategies (actually, we can prove that this must be the case given beliefs, but we need to learn how to derive these beliefs first).

Requirement 2 (Sequential Rationality). Given their beliefs, the players' strategies must be sequentially rational. That is, at each information set the actions by the players must form a Nash equilibrium in the continuation game.
This is now sufficient to rule out the implausible Nash equilibria in the three examples we have seen. Consider Selten's game. As we saw above, player 2's unique sequentially rational strategy is L (and it forms a Nash equilibrium in the trivial decision game at her information set). Therefore, σ2(L) = 1 in equilibrium. Since choosing D is sequentially rational for player 1 for any σ2(L) > 2/3, this implies that D is the unique sequentially rational strategy for player 1 in equilibrium. We conclude that the only Nash equilibrium that involves sequentially rational strategies is ⟨D, L⟩. In other words, we ruled out all implausible equilibria.
Let's apply this to the game in Fig. 3 (p. 4). As we have seen, the expected utility from playing L is 2 − p and the expected utility from playing R is 1 − p. Since 2 − p > 1 − p for all values of p, Requirement 2 prevents player 2 from choosing R. Simply requiring player 2 to have beliefs and act optimally given these beliefs eliminates the implausible equilibrium ⟨D, R⟩.
Finally, as we have seen in the game from Fig. 5 (p. 5), the only sequentially rational strategy for player 2 is Y. Hence, Requirement 2 eliminates the implausible equilibrium ⟨NN, N⟩ there as well.
Let's now look at a more interesting example in Fig. 7 (p. 10). This is the same game as the one in Fig. 5 (p. 5) but with a slight variation in the payoffs. This actually looks a lot more like a gift-giving game. In this situation, player 2 strictly prefers not to accept the Star Trek book, and is indifferent between rejecting gifts regardless of what they are. Accepting the preferred gift is the best outcome. Player 1, in turn, still wants his gift accepted, but he also prefers to please player 2. Even then, he'd rather have her accept even the gift she does not like than suffer the humiliation of rejection.
[Figure 7: The modified gift game. Nature chooses GT with probability p or ST with probability 1 − p; player 1 chooses Gift or No Gift; player 2, with beliefs [q] and [1 − q] at her information set, accepts or rejects. No Gift yields (0, 0); offering GT yields (2, 1) if accepted and (−1, 0) if rejected; offering ST yields (1, −1) if accepted and (−1, 0) if rejected.]
Consider the game in Fig. 7 (p. 10). Player 2's prior belief about the book being GT is p. Suppose player 1 plays a mixed strategy, where he gives a gift with probability σG if the book is GT, and gives a gift with probability σS if the book is ST. What must q be?

It is the probability of the book being GT given that the gift was offered. The probability that the gift is offered is the sum of the probabilities that it is offered depending on its type: pσG + (1 − p)σS. The probability that it is offered and that it is GT is pσG (that is, the probability that GT is selected by Nature times the probability that player 1 offers GT as a gift). Putting these things together gives us the posterior:

    q = pσG / (pσG + (1 − p)σS).
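The posterior formula is easy to check numerically. Below is a small sketch of Bayes' rule for this game; the function name and the sample inputs are mine:

```python
# Posterior q = p*g / (p*g + (1-p)*s), where g and s are the gift
# probabilities when the book is GT and ST respectively.
from fractions import Fraction as F

def posterior(p, g, s):
    """P(GT | gift offered); undefined if a gift is never offered."""
    offered = p * g + (1 - p) * s
    if offered == 0:
        raise ValueError("zero-probability event: Bayes' rule does not apply")
    return p * g / offered

print(posterior(F(1, 3), 1, F(1, 2)))  # 1/2: the semi-separating case below
print(posterior(F(2, 3), 1, 1))        # 2/3: pooling keeps q equal to the prior
```

Note that the guard clause mirrors the discussion of zero-probability events later in the text: when the information set is never reached, the formula divides by zero and pins down nothing.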
    (( 2/3[F], 1/3[R]), [r]) , ( 2/3[m], 1/3[p])

All information sets are reached with positive probability in this strategy profile, and so the theorem tells us that these strategies must be sequentially rational. Let's verify that this is the case and check that each player would actually want to choose the actions specified by the strategies if the moves were not planned in advance.
Consider information set c for player 1 (chance has chosen the winning red card). If he raises, he would get at least 1 (if player 2 passes) and will get at most 2 (if player 2 meets). There is no gain from folding because it yields −1. Therefore, choosing r at this node is sequentially rational. Consider now his information set b (chance has chosen the losing black card). If player 1 raises, he would get +1 if player 2 passes, and −2 if player 2 meets. Under the assumption that player 2 implements her equilibrium strategy, the expected payoff will be 1/3 · 1 + 2/3 · (−2) = −1. If the player folds, his expected payoff is −1 also. Hence, player 1 is indifferent between raising and folding at this information set and must be willing to randomize. In other words, mixing at this information set is sequentially rational, just like the equilibrium strategy specifies.
Consider now player 2's information set h. Let qb denote the left node (that is, the node that follows player 1 raising if the card is black), and let qc denote the right node (that is, the node that follows player 1 raising if the card is red). We first calculate the probabilities of reaching these nodes under σ given Nature's moves:

    P(qb|σ) = (0.5)σ1(R|b) = (0.5)( 1/3) = 1/6,
    P(qc|σ) = (0.5)σ1(r|c) = (0.5)(1) = 0.5.

We now apply Bayes' rule to derive the conditional probabilities; that is, the belief that the game has reached qb given that player 2's information set was reached, denoted by μ2(qb|h); and the belief that the game has reached qc given that player 2's information set was reached, denoted by μ2(qc|h):

    μ2(qb|h) = P(qb|σ) / (P(qb|σ) + P(qc|σ)) = (1/6) / (1/6 + 0.5) = 0.25,
    μ2(qc|h) = P(qc|σ) / (P(qb|σ) + P(qc|σ)) = 0.5 / (1/6 + 0.5) = 0.75.

Of course, as required, μ2(qb|h) + μ2(qc|h) = 1 because the belief must specify a valid probability distribution; that is, given that the information set was reached, it must be the case that exactly one node in it was reached. We conclude that given player 1's strategy and the chance moves by Nature, player 2's consistent belief given that her information set was reached must be that she is at node qb with probability 0.25, and at node qc with probability 0.75. With these consistent beliefs, we can now compute player 2's expected payoffs from her pure strategies. If she meets, she would get μ2(qb|h) · 2 + μ2(qc|h) · (−2) = 0.25 · 2 + 0.75 · (−2) = −1. If she passes, she would get −1 in either case. Hence, given her beliefs, player 2 is indifferent between meeting and passing, so both strategies are sequentially rational. She can therefore mix in equilibrium, as prescribed by her equilibrium strategy.
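The Bayes' rule computation above can be replayed in a few lines. This sketch uses the chance probabilities, raising probabilities, and player 2 payoffs assumed in the text (meeting wins 2 at qb, loses 2 at qc; passing loses 1 at either node):

```python
# Player 2's consistent beliefs in Myerson's card game under the profile
# above: raise w.p. 1/3 at b (black), raise for sure at c (red).
from fractions import Fraction as F

p_black, p_red = F(1, 2), F(1, 2)
raise_b, raise_c = F(1, 3), F(1)

P_qb = p_black * raise_b            # 1/6: reach the black-raise node
P_qc = p_red * raise_c              # 1/2: reach the red-raise node
reach = P_qb + P_qc                 # probability h is reached at all

mu_qb = P_qb / reach                # 1/4, i.e. 0.25
mu_qc = P_qc / reach                # 3/4, i.e. 0.75
assert mu_qb + mu_qc == 1           # a valid probability distribution

u_meet = mu_qb * 2 + mu_qc * (-2)   # -1
u_pass = F(-1)                      # passing loses 1 at either node
assert u_meet == u_pass             # indifferent: mixing is sequentially rational
```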
2.4 Beliefs After Zero-Probability Events
You may have detected some hand-waving in Requirement 3, in the "whenever possible" clause. You would be right. How do we update beliefs in the strategy profile ⟨NN, N⟩ in Fig. 5 (p. 5)? The probability of reaching player 2's information set is 0 if these strategies are followed. We cannot use Bayes' rule in such situations because it would involve division by 0. However, player 2's belief is still meaningful under these conditions: it is the belief when she is surprised by being offered a gift. This problem only arises off the equilibrium path, never on it (because along the equilibrium path there is never a zero probability of reaching an information set).
In the case when Bayes' rule does not pin down posterior beliefs, any beliefs are admissible. This means that every action can be chosen as long as it is sequentially rational for some belief. Notice that in the gift game, N is not sequentially rational for any possible belief, and so it still would not be chosen. This is because N is strictly dominated by Y.
To help illustrate these ideas, consider the following motivational example in Fig. 8 (p. 13), where three players play a game that can end if player 1 opts out. Let p denote player 3's belief that he is at the left node in his information set conditional on this set being reached by the path of play.
[Figure 8: Player 1 chooses O, ending the game with payoffs (2, 0, 0), or I. After I, player 2 chooses U or D, and player 3, with belief [p] on the left node of his information set and [1 − p] on the right, chooses L or R.]
each finite game will have at least one PBE. This, as you can imagine, is very important if we want to solve games.

So remember, a PBE is a Nash equilibrium where strategies are sequentially rational given the beliefs, and the beliefs are weakly consistent with these strategies (updated via Bayes' rule whenever possible).
Going back to our Gift-Giving Game in Fig. 7 (p. 10), recall that player 2's sequentially rational strategy is to accept whenever q > 1/2, reject whenever q < 1/2, and either one (including mixtures) otherwise. Let player 1's strategy be denoted by (r, s), where r is the probability of offering the Game Theory book, and s is the probability of offering the Star Trek manual. Bayes' rule then yields player 2's posterior belief:

    q = pr / (pr + (1 − p)s).
To find the PBE, we must find mixing probabilities for the two players that are sequentially rational and that are also such that q is consistent with player 1's equilibrium strategy. Suppose first that player 1's strategy is completely mixed; that is, he randomizes at both information sets, so r, s ∈ (0, 1). Take an arbitrary (possibly degenerate) mixed strategy for player 2 in which she accepts with probability α ∈ [0, 1]. Since player 1 is willing to mix, he must be indifferent between offering the gift and not offering it at both information sets. Not offering gives him a payoff of 0 in either case. Offering, on the other hand, yields a payoff U1(G|GT) = 2α + (1 − α)(−1) if the book is on Game Theory and U1(G|ST) = α + (1 − α)(−1) if it is on Star Trek. Observe now that because player 1 must be indifferent to mix, it follows that U1(G|ST) = U1(NG|ST), or

    α + (1 − α)(−1) = 0 ⟺ 2α − 1 = 0 ⟺ α = 1/2.

That is, if player 1 is indifferent between offering the Star Trek manual and not offering it, it must be the case that player 2 will accept the offer with probability exactly equal to 1/2. This now implies that

    U1(G|GT) = 2( 1/2) + (1 − 1/2)(−1) = 1 − 1/2 = 1/2 > 0 = U1(NG|GT).

In other words, α = 1/2, which must hold for player 1 to mix if he holds the Star Trek manual, also ensures that he cannot possibly mix if he holds the Game Theory book. This contradicts the supposition that player 1 mixes in both cases. We conclude that if player 1 mixes on the Star Trek manual in equilibrium, he must be offering the Game Theory book for sure.
Since we now know that s > 0 implies r = 1, let's see if there is a PBE with these properties. From the discussion above, we know that this equilibrium requires α = 1/2 or else player 1 would not randomize with the Star Trek manual. This now implies that q = 1/2 or else player 2 would not randomize in her acceptance decision. Putting everything together yields:

    q = p(1) / (p(1) + (1 − p)s) = 1/2 ⟺ s = p/(1 − p).

We need to ensure that s is a valid mixing probability; that is, we must make sure that s ∈ (0, 1). It is clearly positive because p ∈ (0, 1). To ensure that s < 1, we also need p < 1/2. Hence, we found a PBE. Writing it in behavioral strategies yields the following:

    ⟨ r = 1, s = p/(1 − p), α = 1/2 ⟩ provided p < 1/2.
This equilibrium is intuitive: since player 1 always offers the Game Theory book and only sometimes offers the Star Trek manual, player 2 is willing to risk accepting his offer. Of course, her estimate about this risk depends on her prior belief. Player 1's strategy is precisely calibrated to take this belief into account when he tries to bluff player 2 into accepting what he knows is a gift she would not want. Note that p < 1/2 is a necessary condition for this equilibrium. As we shall see shortly, if player 2's priors are too optimistic, she would accept the offer for sure, in which case we would expect player 1 to offer her even the Star Trek manual for sure.
Observe now that player 2 has learned something from player 1's strategy that she did not know before: her equilibrium posterior belief is q = 1/2 > p. That is, she started out with a prior which assigned less than 50% chance to the gift being a Game Theory book and then updated this belief to 50% upon seeing the gift being offered. However, she still is not sure just what type of gift she is being offered. Player 1's strategy is called semi-separating because his actions allow player 2 to learn something, but not everything, about the information he has: she can separate the Game Theory gift from the Star Trek manual only partially.³ The residual uncertainty player 2 has is a common feature of the other player choosing a semi-separating strategy.
We have exhausted the possibilities in which player 1 mixes when he has the Star Trek manual. Only two possible types of strategies remain: he either always offers it or never does. Suppose first that player 1 always offers the Star Trek manual in equilibrium, so s = 1. From our calculations above, we know that this means player 2 would have to accept his offer with α ≥ 1/2, which in turn implies that q ≥ 1/2 as well. If player 2 accepts with probability at least as high as 1/2, then player 1 will always offer the Game Theory book: U1(G|GT) = 3α − 1 > 0 for any α > 1/3. This now means that player 1 always offers the gift regardless of its type, which implies q = p. Since we require that q ≥ 1/2, it follows that this equilibrium only exists if p ≥ 1/2. Hence, we found a PBE for that range of priors:

    ⟨ r = 1, s = 1, α = 1 ⟩ provided p ≥ 1/2.
Note that if p = 1/2, then player 2 is indifferent between accepting and rejecting, so she can mix with any probability as long as α ≥ 1/2, and there is a continuum of PBE in this case. However, requiring that the prior equal a particular value is an extremely demanding condition and these solutions are extremely fragile: the smallest deviation from p = 1/2 would immediately produce one of the PBE we identified above. Normally, we would ignore solutions that depend on knife-edge conditions like that. It is important to note that whereas p = 1/2 is a knife-edge condition we can ignore, q = 1/2 in our semi-separating PBE is not. Unlike the prior, the posterior probability is strategically induced by the behavior of the player.
In this equilibrium, player 1 is playing a pooling strategy because he pools on the same action (offering the gift) no matter what he knows about the gift's type. Not surprisingly, whenever a player uses a pooling strategy, his opponent cannot learn anything from the behavior she observes. As we have seen, her posterior is exactly equal to her prior.
Suppose now that player 1 never offers the Star Trek manual, so s = 0, but offers the Game Theory book with positive probability, so r > 0. In this case, q = 1, and player 2's sequentially rational response is to accept for sure. However, if she accepts the offer with certainty, then player 1 would prefer to offer even the Star Trek manual: doing so would net him a payoff of 1 while not offering saddles him with 0. In other words, there can be no equilibrium of this type.

³For this reason, some authors call the strategies partially separating or partially pooling.
In this case, player 1's strategy is separating because it allows player 2 to infer what he knows with certainty. Not surprisingly, if he allowed her to do that, he would be at a disadvantage. It is not difficult to see that separating in a way that convinces player 2 that the offer contains the Game Theory book cannot be optimal: she would accept the offer given this belief, but then player 1 would have an incentive to offer the Star Trek manual as well, which means player 2 cannot sustain her belief that she is offered the Game Theory book for sure.
Conversely, player 1 could induce a posterior belief that the book is the Star Trek manual for sure by playing the separating strategy (r = 0, s = 1); that is, he never offers the Game Theory book and always offers the Star Trek manual. But in this case, player 2 will always reject the offer, in which case player 1 will be strictly better off not offering the Star Trek manual. So this cannot be an equilibrium either. This game has no separating equilibria.
The final scenario to examine is r = s = 0. This leaves player 2's information set off the
equilibrium path of play, so Bayes rule cannot pin down her posterior beliefs. Can we sustain
these strategies in equilibrium? Let α denote the probability with which player 2 accepts the
offer. To induce player 1 never to offer even the Game Theory book, it must be the case that
U1(G|GT) ≤ U1(NG|GT) ⇔ 3α − 1 ≤ 0 ⇔ α ≤ 1/3. That is,
player 2 must threaten that she will reject the offer with at least 2/3 probability. This threat
will be credible if she believed the gift was bad with probability at least 1/2. In other words,
if q ≤ 1/2, then player 2 can credibly reject the offer. Observe now that this is sufficient to
deter player 1 from offering the Star Trek manual as well: U1(G|ST) = 2α − 1 ≤ 2(1/3) − 1 =
−1/3 < 0 = U1(NG|ST). Therefore, we have multiple PBE that all take the same form:
⟨r = 0, s = 0, α = 0⟩ with q ≤ 1/2.
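To see these incentives numerically, here is a small Python check. The variable alpha (player 2's acceptance probability) is a label introduced here for illustration; the payoffs 3, 2, and 0 are the ones from the text.

```python
# Player 1's expected payoffs from offering, as a function of alpha,
# player 2's probability of accepting the offer.

def u1_offer_GT(alpha):      # offer the Game Theory book
    return 3 * alpha - 1

def u1_offer_ST(alpha):      # offer the Star Trek manual
    return 2 * alpha - 1

# Deterring the Game Theory offer requires 3*alpha - 1 <= 0, i.e. alpha <= 1/3,
# and any such alpha automatically deters the Star Trek offer too.
for a in (0, 0.1, 1/3):
    assert u1_offer_GT(a) <= 0
    assert u1_offer_ST(a) < 0
```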
3 Backward Induction
Before learning how to characterize PBE in arbitrary games of imperfect information, we
study how to use backward induction to compute PBE in games of complete information. In
games of complete and perfect information, we can apply this technique to find the rollback
equilibrium, which we then generalize to subgame-perfect equilibrium. All of these are PBE
but do not require the entire belief machinery.
[Figure: the escalation game in extensive form. Player 1 escalates (e) or stays out (e̅),
which ends the game at (0, 0); after e, player 2 resists (r) or concedes (r̅), which ends
the game at (10, −10); after r, player 1 attacks (a), ending at (−15, −15), or backs
down (a̅), ending at (−10, 10). Its strategic form is:]

                             Player 2
                         r            r̅
           (e, a)    −15, −15     10, −10
Player 1   (e, a̅)    −10, 10      10, −10
           (e̅, a)      0, 0         0, 0
           (e̅, a̅)      0, 0         0, 0
the path of play and causes the strategies to miss two problems: player 1's choice of a is not
sequentially rational at his second information set, and given that choice, player 2's choice
of r is not rational either! In other words, since the equilibrium path of play does not reach
some continuation games, Nash cannot pin down optimal behavior there.
Let's apply backward induction to the game in Fig. 13 (p. 20). There are three penultimate
information sets, all of them for player 2. For each of these sets we determine player 2's
sequentially rational strategy. Because all information sets are singletons, we do not need
to specify conditional beliefs (they are trivial and assign probability 1 to the single node in
the information set). So, if the (2, 0) set is ever reached, then player 2 is indifferent between
accepting and rejecting. Either action is plausible, so player 2's sequentially rational
strategy at (2, 0) includes both of them. In other words, player 2 will be willing to mix at
this information set. On the other hand, player 2's sequentially rational strategy at (1, 1) is
unique: it is always better to accept. And so it is in the case of (0, 2). Therefore, player 2
has only two credible pure strategies, yyy and nyy.
[Figure 13: player 1 proposes a division of two units, (2, 0), (1, 1), or (0, 2); player 2
then accepts (y), yielding the proposed division, or rejects (n), yielding (0, 0).]
we conclude that the
game has only two rollback equilibria in pure strategies: ⟨(2, 0), yyy⟩ and ⟨(1, 1), nyy⟩.4
Kuhn's theorem makes no claims about uniqueness of the equilibrium. Indeed, as we saw
in the example above, the game has two rollback equilibria. However, it should be clear that
if no player is indifferent between any two outcomes, then the equilibrium will be unique.
Note that all rollback equilibria are Nash equilibria. (The converse, of course, is not true. We
have just gone to great lengths to eliminate Nash equilibria that seem implausible.)
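The rollback computation above can be sketched in a few lines of Python. Strategy strings like yyy follow the text's convention of one action per offer, in the order (2, 0), (1, 1), (0, 2); the function names are mine.

```python
# Rollback in the two-unit demand game: player 1 proposes a split of two
# units, player 2 accepts (y) or rejects (n); rejection yields (0, 0).
from itertools import product

offers = [(2, 0), (1, 1), (0, 2)]

def player2_best(offer):
    """Player 2's sequentially rational action(s) at a given offer."""
    accept, reject = offer[1], 0
    if accept > reject:
        return ['y']
    if accept < reject:
        return ['n']
    return ['y', 'n']          # indifferent: both actions are credible

# Player 2's credible pure strategies: one action per offer.
credible = [''.join(acts) for acts in
            product(*(player2_best(o) for o in offers))]
assert credible == ['yyy', 'nyy']   # only (2, 0) leaves her indifferent

# For each credible strategy of player 2, player 1 proposes to maximize
# his own payoff; this yields the rollback equilibria.
equilibria = []
for s2 in credible:
    payoff1 = {o: (o[0] if a == 'y' else 0) for o, a in zip(offers, s2)}
    equilibria.append((max(payoff1, key=payoff1.get), s2))
assert equilibria == [((2, 0), 'yyy'), ((1, 1), 'nyy')]
```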
3.2 Subgame-Perfect Equilibrium
If you accept the logic of backward induction, then the following discussion should seem a
natural extension. Consider the game in Fig. 14 (p. 21). Here, neither of player 2's choices is
4
Player 2 is indifferent between accepting and rejecting after the (2, 0) offer, so she can mix after
receiving it. If she accepts with probability greater than 1/2, then player 1 is better off demanding
(2, 0), and we have a continuum of SPE there. If, however, she accepts with probability less than 1/2,
then player 1 is better off demanding (1, 1) instead (for another continuum of SPE). If she accepts
exactly with probability 1/2, then player 1 is indifferent between (2, 0) and (1, 1), so he can mix
between these two strategies as well (for another continuum of SPE).
dominated at her second information set: she is better off choosing D if player 1 plays A and
is better off choosing C if player 1 plays B. Hence, we cannot apply rollback.
However, we can reason in the following way. The game that begins with player 1's second
information set, the one following the history (D, R), is a zero-sum simultaneous-move
game. We have seen similar games, e.g., Matching Pennies. The expected payoffs from the
unique mixed strategy Nash equilibrium of this game are (0, 0). Therefore, player 2 should
only choose R if she believes that she will be able to outguess player 1 in the simultaneous-move
game. In particular, the probability of obtaining 2 should be high enough (in outweighing
the probability of obtaining −2) that the expected payoff from R is larger than 1 (the
payoff she would get if she played L). This can only happen if player 2 believes she can
outguess player 1 with probability at least 3/4, in which case the expected payoff from R
will be at least 3/4(2) + 1/4(−2) = 1. But since player 2 knows that player 1 is rational (and
therefore just as cunning as she is), it is unreasonable for her to assume that she can outguess
player 1 with such high probability. Therefore, player 2 should choose L, and so player
1 should go D. The equilibrium obtained by backward induction in the game in Fig. 14 (p. 21),
then, is ⟨(D, 1/2[A]), (L, 1/2[C])⟩.
[Figure 14: player 1 chooses U, ending the game at (2, 2), or D; after D, player 2 chooses
L, ending at (3, 1), or R; after (D, R), the players move simultaneously: player 1 picks A
or B and player 2 picks C or D, with payoffs (2, −2) at (A, C) and (B, D), and (−2, 2) at
(A, D) and (B, C).]
5
If the game has multiple Nash equilibria, then players must agree on which of them would occur. We
shall examine this weakness in the following section.
That is, x and x′ are in the same information set in the subgame if and only if they are in
the same information set in the original game. The payoffs in the subgame are the same as
the payoffs in the original game, only restricted to the terminal nodes of the subgame. Note
that the word "proper" does not mean strict inclusion, as in the term "proper subset." Any
game is always a proper subgame of itself.6
Proper subgames are quite easy to identify in a broad class of extensive form games. For
example, in games of complete and perfect information, every information set (a singleton)
begins a proper subgame (which then extends all the way to the end of the tree of the original
game). Each of these subgames represents a situation that can occur in the original game.
On the other hand, splitting information sets in games of imperfect information produces
subgames that are not proper because they represent situations that cannot occur in the
original game. Consider, for example, the game in Fig. 15 (p. 22) and two candidate subgames.
[Figure 15: a game of imperfect information in which player 2's information set contains
the two nodes x and x′ reached after player 1's initial moves; splitting this information
set at x and at x′ produces two candidate "subgames" that are not proper.]
equilibrium is in mixed strategies, which requires that players be able to mix at each
information set. (You should at this point go over the difference between mixed and behavior
strategies in extensive-form games.)
This now allows us to solve games like the one in Fig. 14 (p. 21). There are three proper
subgames: the entire game, the subgame beginning with player 2's information set, and
the subgame that includes the simultaneous-moves game. We shall work our way up the
tree, as we did with backward induction. The smallest proper subgame has a unique Nash
equilibrium in mixed strategies, where each player chooses one of the two available actions
with the same probability of .5. Given these strategies, each player's expected payoff from
the subgame is 0. This now means that player 2 will choose L at her information set because
doing so nets her a payoff strictly larger than choosing R and receiving the expected payoff
of the simultaneous-moves subgame. Given that player 2 chooses L at her information set,
player 1's optimal course of action is to go D at the initial node. So, the subgame perfect
equilibrium of this game is ⟨(D, 1/2), (L, 1/2)⟩.
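This bottom-up computation can be sketched in Python. The ±2 stakes in the simultaneous-move subgame follow the reconstruction used here, and the variable names are mine.

```python
# Bottom-up solution of the Fig. 14 game. The simultaneous-move subgame is
# zero-sum with stakes 2 (player 2 gets the negatives), so the unique MSNE
# is 50-50 for each player and the subgame's expected value is (0, 0).
from itertools import product

subgame1 = {('A', 'C'): 2, ('A', 'D'): -2, ('B', 'C'): -2, ('B', 'D'): 2}
p = q = 0.5
value1 = sum(pa * qc * subgame1[(a, c)]
             for (a, pa), (c, qc) in product([('A', p), ('B', 1 - p)],
                                             [('C', q), ('D', 1 - q)]))
assert value1 == 0            # expected payoffs (0, 0) in the subgame

# Player 2: L yields 1, R yields the subgame's value 0, so she plays L.
choice2 = 'L' if 1 > -value1 else 'R'
# Player 1: D followed by L yields 3, U yields 2, so he plays D.
choice1 = 'D' if 3 > 2 else 'U'
assert (choice1, choice2) == ('D', 'L')
```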
Let's compare this to the normal and reduced normal forms of this extensive-form game,
both of which are shown in Fig. 16 (p. 23).
                               Player 2
                  LC         LD         RC         RD
          UA     2, 2       2, 2       2, 2       2, 2
Player 1  UB     2, 2       2, 2       2, 2       2, 2
          DA     3, 1       3, 1      2, −2      −2, 2
          DB     3, 1       3, 1     −2, 2       2, −2

                          Player 2
                  L          RC         RD
          U      2, 2       2, 2       2, 2
Player 1  DA     3, 1      2, −2      −2, 2
          DB     3, 1     −2, 2       2, −2
Figure 16: The Normal and Reduced Normal Forms of the Game from Fig. 14 (p. 21).
The normal form game on the left has 4 pure strategy Nash equilibria: ⟨UA, RC⟩, ⟨UA, RD⟩,
⟨UB, RC⟩, and ⟨UB, RD⟩. The reduced normal form game has only two: ⟨U, RC⟩ and ⟨U, RD⟩.
None of these are subgame perfect. However, the reduced form also has a Nash equilibrium
in mixed strategies, ⟨σ1, σ2⟩, in which σ1(DA) = σ1(DB) = 1/2 and σ1(U) = 0, while
σ2(L) = 1 and σ2(RC) = σ2(RD) = 0. That is, the Nash equilibrium is
⟨(0, 1/2, 1/2), (1, 0, 0)⟩,
which is precisely the subgame-perfect equilibrium we already found.
At this point you should make sure you can find this mixed strategy Nash equilibrium.
Suppose player 2 chooses RC for sure; then DB is strictly dominated, so player 1 will not use
it. However, this now means RD strictly dominates RC for player 2, a contradiction. Suppose
now that she chooses RD for sure. Then DA is strictly dominated, so player 1 will not use
it. But now RC strictly dominates RD, a contradiction. Therefore, there is no equilibrium in
which player 2 chooses either RC or RD for sure. Suppose player 2 puts positive weight on
L and RC only. Then DA is strictly dominant for player 1. However, 2's best response to DA
is RD, a contradiction with the supposition that she does not play it. Hence, there is no MSNE
in which she plays only L and RC. Suppose now that she plays L and RD only. Then DB is
strictly dominant, but player 2's best response to this is RC, a contradiction. Hence, there is
no MSNE in which she plays only L and RD either. Suppose next that she plays RC and RD
only. Then U is strictly dominant, and since player 2's payoff is the same against U regardless
of the strategy she uses, we have a continuum of MSNE: ⟨U, σ2(RC) ∈ (0, 1), σ2(RD) = 1 − σ2(RC)⟩.
Suppose next she plays L for sure. Then player 1 is indifferent between DA and DB, each of
which strictly dominates U, so he can mix with σ1(DA) ∈ (0, 1) and σ1(DB) = 1 − σ1(DA).
Since player 2 must not be willing to use any of her other pure strategies, it follows that
U2(σ1(DA), RC) ≤ 1 ⇒ σ1(DA) ≥ 1/4, and U2(σ1(DA), RD) ≤ 1 ⇒ σ1(DA) ≤ 3/4. Therefore,
σ1(DA) ∈ [1/4, 3/4] are all admissible mixtures, and we have a continuum of MSNE. The
subgame-perfect MSNE is among these: the one with σ1(DA) = 1/2.7
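A quick numerical check of this interval, using the reduced normal form payoffs as reconstructed here (the function name is mine):

```python
# Player 2's payoffs in the reduced normal form when player 1 mixes DA/DB
# with s = sigma_1(DA); against L she always gets 1 (from the 3, 1 cells).

def u2(s, pure2):
    return {'L': 1.0,
            'RC': -2 * s + 2 * (1 - s),
            'RD': 2 * s - 2 * (1 - s)}[pure2]

# For s in [1/4, 3/4], L is weakly best, sustaining the continuum of MSNE.
for s in (0.25, 0.5, 0.75):
    assert u2(s, 'L') >= u2(s, 'RC') and u2(s, 'L') >= u2(s, 'RD')

# Outside the interval a deviation becomes strictly profitable.
assert u2(0.1, 'RC') > u2(0.1, 'L')
assert u2(0.9, 'RD') > u2(0.9, 'L')
```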
As you can see, we found a lot of MSNE but only one of them is subgame-perfect. This
reiterates the point that all SPE are Nash, while not all Nash equilibria are subgame-perfect.
Note the different way of specifying the equilibrium in the extensive form and in the reduced
normal form.
We can now state a very important result that guarantees that we can find subgame perfect
equilibria for a great many games.
Theorem 5. Every finite extensive game with perfect information has a subgame perfect
Nash equilibrium.
To prove this theorem, simply apply backward induction to define the optimal strategies
for each subgame in the game. The resulting strategy profile is subgame perfect.
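The proof idea can be sketched as a recursive procedure. The tree encoding below is an illustration of mine, with the escalation game's payoffs (as reconstructed in these notes) used as data:

```python
# Backward induction over a finite perfect-information tree, mirroring the
# proof: solve every subgame from the leaves up. Trees are nested tuples,
# ('leaf', payoffs) or (player_index, {action: subtree}).

def solve(node):
    """Return the backward-induction payoff profile of the subgame at node."""
    if node[0] == 'leaf':
        return node[1]
    player, branches = node
    # The mover picks the action whose subgame value is best for her.
    return max((solve(child) for child in branches.values()),
               key=lambda payoffs: payoffs[player])

subgame_iii = (0, {'attack': ('leaf', (-15, -15)),
                   'back down': ('leaf', (-10, 10))})
subgame_ii = (1, {'resist': subgame_iii, 'concede': ('leaf', (10, -10))})
game = (0, {'escalate': subgame_ii, 'stay out': ('leaf', (0, 0))})

assert solve(subgame_iii) == (-10, 10)   # player 1 backs down
assert solve(subgame_ii) == (-10, 10)    # so player 2 resists
assert solve(game) == (0, 0)             # so player 1 stays out
```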
Let's revisit our basic escalation game from Fig. 10 (p. 19). It has three subgames, shown
in Fig. 17 (p. 24) and labeled I, II, and III. What are the pure-strategy Nash equilibria in all
these subgames? We have already found the three equilibria of subgame I: ⟨(e, a), r̅⟩,
⟨(e̅, a), r⟩, and ⟨(e̅, a̅), r⟩. The Nash equilibrium of subgame III is trivial: a̅. You should
verify (e.g., by writing the normal form) that the Nash equilibria of subgame II are ⟨a, r̅⟩
and ⟨a̅, r⟩.
[Figure 17: the escalation game with its three subgames marked: I is the entire game,
II begins at player 2's node after e, and III begins at player 1's final node after (e, r).
The terminal payoffs are (0, 0) after e̅, (10, −10) after (e, r̅), (−15, −15) after
(e, r, a), and (−10, 10) after (e, r, a̅).]
7
I leave the final possibility as an exercise: what happens if player 2 puts positive weight on all three
of her strategies?
⟨(e̅, a̅), r⟩, and this is the unique subgame perfect equilibrium. Of course, it is the one we
got from our backward induction method as well.
Subgame perfection (and backward induction) eliminates equilibria based upon non-credible
threats and/or promises. This is accomplished by requiring that players be rational at every
point in the game where they must take action. That is, their strategies must be optimal at
every information set, which is a much stronger requirement than the one for Nash equilibrium,
which only demands rationality at the first information set. This property is, of course,
sequential rationality, just as we defined it above.
Note that nowhere in our definition of extensive form games did we restrict either the number
of actions available to players at their decision nodes or the number of decision nodes. For
example, a player may have a continuum of actions, or some terminal history may be infinitely
long. If a player has a continuum of actions at any decision node, then there is an infinite
number of terminal histories as well. We must distinguish between games that exhibit some
finiteness from those that are infinite.
If the length of the longest terminal history is finite, then the game has a finite horizon.
If the game has a finite horizon and finitely many terminal histories, then the game is finite.
Backward induction only works for games that are finite. Subgame perfection works fine for
infinite games.
Let's begin by finding all subgame perfect equilibria in the following game.
[Figure: player 1 chooses L, C, or R; player 2 then chooses a (3, 0) or b (1, 0) after L,
c (1, 1) or d (2, 1) after C, and e (2, 2) or f (1, 3) after R.]
Consider player 2's information set after history R. Playing f is strictly better than e, so any
equilibrium strategy for player 2 will involve choosing f at that information set. However,
in the information sets following histories L and C, player 2 is indifferent between her two
actions, so she can pick either one from each pair. This yields four strategies for player 2:
(acf), (adf), (bcf), and (bdf). Player 1's best response to any strategy for player 2 that
prescribes a as the first action is to play L. Therefore, ⟨L, acf⟩ and ⟨L, adf⟩ are subgame
perfect equilibria. Also, if player 2 is choosing b and d at the first two of her information
sets, then player 1's best response is to play C. Therefore, ⟨C, bdf⟩ is another SPE. If player
2's strategy is (bcf), then player 1 is indifferent among all three of his actions, so ⟨L, bcf⟩,
⟨C, bcf⟩, and ⟨R, bcf⟩ are SPE. The game thus has six SPE in pure strategies. It has more in
mixed strategies.
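The enumeration above can be replicated mechanically (payoffs as in the figure; the code is an illustrative sketch):

```python
# Enumerating the pure-strategy SPE of the L/C/R game.
from itertools import product

u2 = {'L': {'a': 0, 'b': 0}, 'C': {'c': 1, 'd': 1}, 'R': {'e': 2, 'f': 3}}
u1 = {('L', 'a'): 3, ('L', 'b'): 1, ('C', 'c'): 1, ('C', 'd'): 2,
      ('R', 'e'): 2, ('R', 'f'): 1}

# Sequential rationality: keep player 2's best action(s) at each history.
best2 = {h: [a for a, v in acts.items() if v == max(acts.values())]
         for h, acts in u2.items()}
assert best2['R'] == ['f']               # f strictly beats e

spe = []
for s2 in product(best2['L'], best2['C'], best2['R']):
    reply = dict(zip('LCR', s2))
    vals = {h: u1[(h, reply[h])] for h in 'LCR'}
    spe += [(h, ''.join(s2)) for h in 'LCR' if vals[h] == max(vals.values())]

assert len(spe) == 6                     # six SPE in pure strategies
assert ('C', 'bdf') in spe and ('R', 'bcf') in spe
```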
4.1 Burning a Bridge
The Red Army, having retreated to Stalingrad, is facing the advancing Wehrmacht troops. If
the Red Army stays, then it must decide whether to defend Stalingrad in the case of an attack
or retreat across the Volga using the single available bridge for the purpose. Each army prefers
to occupy Stalingrad, but fighting is the worst outcome for both. However, before the enemy
can attack, the Red Army can choose to blow up the bridge (at no cost to it), cutting off its
own retreat. Model this as an extensive form game and find all SPE.
[Figure: the Red Army (R) burns the bridge (B) or not (B̅); the Wehrmacht (W) then attacks
(A) or not (A̅). After B, not attacking yields (1, 0) and attacking forces a fight, yielding
(−1, −1). After B̅, not attacking yields (1, 0), while after an attack the Red Army fights
(F), yielding (−1, −1), or retreats (F̅), yielding (0, 1). Red Army payoffs are listed first.]
Since this is a finite game of complete and perfect information, let's solve it by backward
induction. Starting with the longest terminal history, (B̅, A), the Red Army's optimal action is
to retreat across the bridge, or F̅. Given this action, the Wehrmacht strategy following history
(B̅) would be to attack, or A. Its strategy after history (B) is not to attack, or A̅. Thus, the
Germans' optimal strategy is (A̅, A). Given this strategy, the Red Army strictly prefers to
burn the bridge. So, the unique SPE is ⟨(B, F̅), (A̅, A)⟩. The outcome is that the Red Army
burns the bridge and the Germans don't attack.
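A minimal bookkeeping check of the rolled-back values (Red Army payoffs listed first, as in the figure; the action labels are spelled out for readability):

```python
# Terminal payoffs of the bridge game, keyed by the history of actions.
outcomes = {
    ('burn', 'attack'): (-1, -1),              # no retreat: forced fight
    ('burn', 'no attack'): (1, 0),
    ('keep', 'attack', 'fight'): (-1, -1),
    ('keep', 'attack', 'retreat'): (0, 1),
    ('keep', 'no attack'): (1, 0),
}

# Bridge intact: the Red Army retreats (0 > -1), so the Wehrmacht attacks (1 > 0).
value_keep = outcomes[('keep', 'attack', 'retreat')]
# Bridge burned: attacking means fighting (-1 < 0), so the Wehrmacht holds back.
value_burn = outcomes[('burn', 'no attack')]

assert value_burn[0] > value_keep[0]           # burning is strictly better
```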
This is an easy example that demonstrates a rather profound result of strategic interaction:
if you limit your choices and do so in a way that is observable by the opponent, then you may
obtain better outcomes. This is because unless the Red Army burns the bridge, it cannot
credibly commit to fighting in order to induce the Germans not to attack. (Their threat to
fight if attacked is not credible, and so deterrence fails.) However, by burning the bridge,
they leave themselves no choice but to fight if attacked, even though they don't like it. This
makes the threat to fight credible, and so the Germans are deterred.
You will see credible commitment problems in a wide variety of contexts in political science.
Generally, a commitment is not credible if it will not be in the interest of the committing
party to carry out its promises should it have to do so. (In our language, its threats/promises
are not subgame perfect.) Limiting one's choices in an observable way may help in such
situations.8
4.2 The Dollar Auction
Let's now play the following game. I have $1 that I want to auction off using the following
procedure. Each student can bid at any time, in 10-cent increments. When no one wants
to bid further, the auction ends and the dollar goes to the highest bidder. Both the highest
bidder and the second highest bidder pay their bids to me. Each of you has $3.00 to bid with
and you cannot bid more than that.
[ What happened? ]
8
This is probably obvious by now, but sometimes people get it completely wrong. Take the Trojans, who
tried to burn the Greeks' ships! Had they succeeded in doing so, this would have only made the Greeks
fight so much harder. (They failed, and so the Greeks sailed in apparent defeat, enabling them to carry
out the wooden horse ruse.) William the Conqueror and Cortez got it right when the former burned and
the latter scuttled their own ships, forcing the soldiers to fight to the end and compelling some of the
opposition to surrender.
Let's analyze this situation by applying backward induction. We shall assume that it is not
worth spending a dollar in order to win a dollar. Suppose there are only two players, 1 and
2. If 2 ever bids $3.00, she will win the auction and lose $2.00. If she bids $2.90, then 1 has
to bid $3.00 in order to win. If 1's previous bid was $2.00 or less, then 1 will not outbid 2's
$2.90 because doing so (and winning) means losing $2.00 instead of whatever 1's last bid is,
which is at most that (this is where the assumption comes into action). But the same logic
applies for 2's bid of $2.80 because 1 cannot bid $2.90 and expect to win: in this case 2
will do strictly better by outbidding him with $3.00 and losing only $2.00 instead of $2.80.
Therefore, 1 can only win by bidding $3.00, but this means losing $2.00. Therefore, if 2 bids
$2.80, then 1 will pass if his last bid was $2.00 or less. In fact, any bid of $2.10 or more will
beat any last bid of $2.00 or less for that very reason. So, whoever bids $2.10 first establishes
a credible threat to go all the way up to $3.00 in order to win. This player has already lost
$2.10, but it is worth spending $.90 in order to win the dollar.
Therefore, a bid of $2.10 is the same as a bid of $3.00. This means that in order to beat
a $2.00 bid, it is sufficient to bid $2.10, nothing more. By a similar logic, $2.00 beats all bids
of $1.10 or less. Since beating $2.00 involves bidding $2.10, if the player's last bid was for
$1.10 or less, there is no reason to outbid a $2.00 bid because doing so yields at most a loss
of $1.10, which is at most equal to the player's current bid. In fact, even $1.20 is sufficient
to beat a bid of $1.10 or less. This is because once someone bids $1.20, it is worthwhile for
that player to continue up to $2.10 to guarantee a victory (in which case they will stand to
lose $1.10 instead of $1.20).
Therefore, a bid of $1.20 is the same as a bid of $2.10, which is the same as a bid of
$3.00. By this logic, a bid of $1.10 beats bids of 20 cents or less. Because beating $1.10
involves bidding $1.20, if the player's last bid was for 20 cents or less, there is no reason to
outbid it. Only a player who bids 30 cents has a credible threat to go up to $1.20. So, any
amount down to 30 cents beats a bid of 20 cents or less. If someone bids 30 cents, no one
with a bid of 20 cents or less has an incentive to challenge.
Therefore, the first player to bid should bid 30 cents and the auction should end. How
does that correspond to our outcome? (Probably not too well.)
Let's look at an extensive-form representation of a simpler variant, where two players have
$3.00 each but they can only bid in dollar increments in an auction for $2.00.
[Figure: the simplified auction. Player 1 opens by passing (0, 0) or bidding $1, $2, or $3.
A $3 opening bid wins outright: (−1, 0). After $2, player 2 passes (0, 0) or bids $3:
(−2, −1). After $1, player 2 passes (1, 0), bids $3 (−1, −1), or bids $2, after which
player 1 passes (−1, 0) or bids $3 (−1, −2).]
We begin with the longest terminal history, ($1, $2), and consider 1's decision there. Since
1 is indifferent between passing and bidding $3, both are optimal (we do not use the
indifference assumption as in the informal example). If he passes, then player 2 is indifferent
between passing and bidding $2 at the decision node following history ($1). If, on the other
hand, player 1 bids $3, then player 2's best course is to pass. So, the subgame beginning at the
information set ($1) has three SPE in pure strategies: (Pass, Pass), ($2, Pass), and (Pass, $3).
At the decision node following history ($2), player 2's unique optimal action is to pass,
and so the subgame perfect equilibrium there is (Pass). Therefore, player 2's strategy must
specify Pass for this decision node in any SPE.
Consider now player 1's initial decision. If the players' strategies are such that they play
the (Pass, Pass) SPE in the subgame after history ($1), then player 1 does best by bidding $1
at the outset. Therefore, ⟨($1, Pass), (Pass, Pass)⟩ is an SPE of the game.
If the strategies specify the SPE ($2, Pass), then player 1's best actions are to bid $2 or Pass
at the initial node. There are two SPE: ⟨($2, Pass), ($2, Pass)⟩ and ⟨(Pass, Pass), ($2, Pass)⟩.
If the strategies specify the SPE (Pass, $3), then player 1's best action is again to bid $1,
and so there is one other SPE: ⟨($1, $3), (Pass, Pass)⟩.
The game has four subgame perfect equilibria in pure strategies. It only has three outcomes:
(a) player 1 bids $1 and player 2 passes, yielding player 1 a gain of $1 and player 2
a payoff of $0; (b) player 1 passes and both get $0; (c) player 1 bids $2 and player 2 passes,
and both get $0.
Now suppose we introduce our assumption, which states that if a player is indifferent between
passing now and bidding something that will yield him the same payoff later given
the other player's strategy, then the player passes. This means that in the longest history
player 1 will pass instead of bidding $3, which further implies player 2 will pass at the
preceding node, and so this subgame will have only one SPE: (Pass, Pass). But now player 1 has
a unique best initial action, which is to bid $1. Therefore, the unique SPE under this
assumption is ⟨($1, Pass), (Pass, Pass)⟩. The outcome is that player 1 bids $1 and player 2 passes.
This corresponds closely to the outcome in our discussion above.
There is a general formula that you can use to calculate the optimal size of the first bid,
which depends on the amount of money available to each bidder, the size of the award, and
the size of the bid increments. Let the bidding increment be represented by one unit (so the
unit in our example is a dime). If each player has b units available for bidding, and the award
is v units, then the optimal bid is ((b − 1) mod (v − 1)) + 1. In our example, this translates
to ((30 − 1) mod (10 − 1)) + 1 = 3 units, which equals 30 cents, just as we found.
It is interesting to note that the size of the optimal bid is very sensitive to the amount each
player has available for bidding. If each player has $2.80 instead of $3.00, then the optimal
bid is ((28 − 1) mod (10 − 1)) + 1 = 1, or just 10 cents. If, however, each has $2.70, then the
optimal bid is ((27 − 1) mod (10 − 1)) + 1 = 9, or 90 cents.
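The formula is easy to check in code (the function name is mine):

```python
# Optimal first bid in the dollar auction, in bid-increment units:
# b = each bidder's budget, v = the prize.

def optimal_first_bid(b, v):
    return (b - 1) % (v - 1) + 1

assert optimal_first_bid(30, 10) == 3    # $3.00 budget, $1.00 prize: 30 cents
assert optimal_first_bid(28, 10) == 1    # $2.80 budget: 10 cents
assert optimal_first_bid(27, 10) == 9    # $2.70 budget: 90 cents
```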
The Dollar Auction was first described by Martin Shubik, who reported regular gains from
playing the game in large undergraduate classes. The game is a very useful thought experiment
about escalation. At the outset, both players are trying to win the prize by costly
escalation, but at some point the escalation acquires a momentum of its own and players
continue paying costs to avoid paying the larger costs of capitulating. The requirement that
both highest bidders pay the cost captures the idea of escalation. The general solution (the
formula for the size of the optimal bids) was published by Barry O'Neill.
9
Of course, there are also the PSNE in which two players vote sincerely and the third, whose vote is
irrelevant, also votes for the same alternative. For instance, in (x, y), there is a PSNE in which all
players vote for x. Since player 3's choice cannot affect the outcome, he might as well vote insincerely.
10
Here, as before, there are equilibria in which all three players vote for the same alternative and two of
them vote against their preferences. For instance, in (x, z) this would require them all to vote for x.
Coupling this with any PSNE in the 2nd round will yield an SPE, but the solution is implausible for the
same reasons we discussed already.
the other only if the winner can also beat the loser in a majority vote with sincere voting, or
there is a third alternative that can defeat the loser and itself be defeated by the winner in
a majority vote with sincere voting (this is due to Shepsle and Weingast). In our situation y
can beat z on its own with sincere voting, and y can beat x through z because z can defeat
x in sincere voting and y in turn defeats z. Hence, there is an agenda that ensures y is
reachable.11
Agenda-setting gives player 2 the ability to impose her most preferred outcome, and there is
nothing (in this instance) that the others can do. For instance, player 1 and player 3 cannot
collude to defeat her obvious intent. To see this, suppose player 3 proposed a deal to player
1: if player 1 would vote sincerely for x in the first round, then player 3 would reward him by
voting for x in the 2nd round. Since x will then beat y in the first round, player 3's insincere
vote in the second round would ensure that x will defeat z as well. This would benefit both
players: player 1 would get his most preferred outcome and player 3 would avoid the worst
outcome y and get his second-best. Unfortunately (for player 3), he cannot make a credible
promise to cast an insincere vote. If x defeats y in the first round, then player 3 can get
his most preferred outcome by voting sincerely in (x, z) in the second round. Therefore,
he would renege on his pledge, so player 1 has no incentive to believe him. But since this
reneging would saddle player 1 with his worst outcome, player 1 would strictly prefer to cast
his sophisticated vote in the first round even though he is perfectly aware of how player 2
has manipulated the agenda to her advantage. The inability to make credible promises, like
the inability to make credible threats, can seriously hurt players. In this instance, player 3
gets the worst of it.
4.3.2 The Ultimatum Game
Two players want to split a pie of size π > 0. Player 1 offers a division x ∈ [0, π] according
to which his share is x and player 2's share is π − x. If player 2 accepts this offer, the pie
is divided accordingly. If player 2 rejects this offer, neither player receives anything. The
extensive form of this game is represented in Fig. 18 (p. 31).
[Figure: player 1 chooses x from the continuum [0, π]; player 2 then accepts, yielding
(x, π − x), or rejects, yielding (0, 0).]
Figure 18: The Ultimatum Game.
In this game, player 1 has a continuum of actions available at the initial node, while player 2
has only two actions. (The continuum of actions ranging from offering 0 to offering the entire
pie is represented by the dotted curve connecting the two extremes.) When player 1 makes
some offer, player 2 can only accept or reject it. There is an infinite number of subgames
following a history of length 1 (i.e., following a proposal by player 1). Each history is uniquely
identified by the proposal, x. In all subgames with x < π, player 2's optimal action is to
11
In our game, each of the three outcomes is possible with an appropriate agenda. Sophisticated voting
does not reduce the chaos.
accept because doing so yields a strictly positive payoff, which is higher than the 0 she
would get by rejecting. In the subgame following the history x = π, however, player 2
is indifferent between accepting and rejecting. So in a subgame perfect equilibrium player
2's strategy either accepts all offers (including x = π) or accepts all offers x < π and rejects
x = π.
Given these strategies, consider player 1's optimal strategy. We have to find player 1's
optimal offer for every SPE strategy of player 2. If player 2 accepts all offers, then player 1's
optimal offer is x = π because this yields the highest payoff. If player 2 rejects x = π but
accepts all other offers, there is no optimal offer for player 1! To see this, suppose player 1
offered some x < π, which player 2 accepts. But because player 2 accepts all x < π, player
1 can improve his payoff by offering some x′ such that x < x′ < π, which player 2 will also
accept but which yields player 1 a strictly better payoff.
Therefore, the ultimatum game has a unique subgame perfect equilibrium, in which player
1 offers x = π and player 2 accepts all offers. The outcome is that player 1 gets to keep the
entire pie, while player 2's payoff is zero.
This one-sided result comes about for two reasons. First, player 2 is not allowed to make any
counteroffers. If we relaxed this assumption, the SPE would be different. (In fact, in the next
section we shall analyze a very general bargaining model.) Second, the reason player 1 does
not have an optimal proposal when player 2 rejects x = π has to do with him being able
to always do a little better by offering to keep slightly more. Because the pie is perfectly
divisible, there is nothing to pin down the offers. However, making the pie discrete (e.g., by
slicing it into n equal pieces and then bargaining over the number of pieces each player gets
to keep) will change this as well. (You will do this in your homework.)
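A sketch of the discrete version's logic, anticipating the homework variant (the slicing into n pieces and the acceptance rule below are assumptions made for illustration):

```python
# Discrete ultimatum game: the pie is cut into n equal slices and player 1
# demands k slices for himself. Unlike the divisible pie, the demand is
# pinned down: a best reply exists.

def best_demand(n, accepts_zero_share):
    """Player 1's optimal demand when player 2 accepts any offer leaving
    her a positive share (and the zero share iff accepts_zero_share)."""
    acceptable = range(n + 1) if accepts_zero_share else range(n)
    return max(acceptable)

assert best_demand(10, True) == 10    # she accepts everything: demand it all
assert best_demand(10, False) == 9    # she rejects the zero share: demand n - 1
```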
4.3.3 The Holdup Game
We now analyze a three-stage game. Before playing the Ultimatum Game from the previous
section, player 2 can determine the size of the pie by exerting either a small effort, eS > 0,
resulting in a small pie of size πS, or a large effort, eL > eS, resulting in a larger pie of size
πL > πS. Since player 2 hates exerting any effort, her payoff from obtaining a share of size
x is x − e, where e is the amount of effort expended. The extensive form of this game is
presented in Fig. 19 (p. 32).
[Figure 19: the Holdup Game. Player 2 exerts eS or eL; player 1 then proposes x, and
player 2 accepts, yielding (x, πS − x − eS) or (x, πL − x − eL), or rejects, yielding
(0, −eS) or (0, −eL).]
The subgame that follows each level of effort is an Ultimatum Game, which has a unique SPE
where player 1 proposes to keep the entire pie and player 2 accepts all offers (note
that the difference between this version and the one we saw above is that player 2 gets a
strictly negative payoff if she rejects an offer instead of 0). So, in the subgame following
eS, player 1 offers πS, and in the subgame following eL he offers πL. In both cases player 2
accepts these proposals, resulting in payoffs of −eS and −eL respectively. Given these SPE
strategies, player 2's optimal action at the initial node is to expend little effort, or eS, because
doing so yields a strictly better payoff.
We conclude that the SPE of the Holdup Game is as follows. Player 1's strategy is (πS, πL)
and player 2's strategy is (eS, Y, Y), where Y means accept all offers. The outcome of the
game is that player 2 invests little effort, eS, and player 1 obtains the entire small pie, πS.
Note that this equilibrium does not depend on the values of eS, eL, πS, πL as long as eS < eL.
Even if πL is much larger than πS and eL is only slightly higher than eS, player 2 would still
exert little effort in SPE, although it would be better for both players if player 2 exerted eL
(remember, only slightly larger than eS) and obtained a slice of the larger pie. The problem is
that player 1 cannot credibly promise to give that slice to player 2. Once player 2 expends
the effort, she can be held up for the entire pie by player 1.
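A numerical sketch of the holdup logic. The effort and pie sizes below are illustrative numbers of mine, not from the text; only e_small < e_large and pie_small < pie_large matter.

```python
# SPE logic of the Holdup Game: in each ultimatum subgame player 1 demands
# the whole pie, so player 2's payoff is just minus her effort.

e_small, e_large = 1.0, 1.5            # efforts, e_S < e_L (assumed values)
pie_small, pie_large = 4.0, 10.0       # pie sizes, pi_S < pi_L (assumed values)

def player2_payoff(effort):
    # Holdup: whatever the pie, player 1 keeps all of it in the subgame SPE.
    return 0.0 - effort

# Player 2 picks the effort with the higher payoff: the small one.
choice = max((e_small, e_large), key=player2_payoff)
assert choice == e_small

# Yet joint surplus would be higher under the large effort.
assert pie_large - e_large > pie_small - e_small
```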
This result holds for similar games where the bargaining procedure yields a more equitable
distribution. If player 2 must expend more effort to generate a larger pie and if the procedure
is such that some of this surplus pie goes to the other player, then for some values of player
2's cost of exerting this effort, she would strictly prefer to exert little effort. Although there
are many outcomes where both players would be strictly better off if player 2 exerted more
effort, these cannot be sustained in equilibrium because of player 1's incentives. In the
example above, player 1 would have liked to be able to commit credibly to offering some of
the extra pie to induce player 2 to exert the larger effort. Just like the problem with
non-credible threats, the problem of non-credible promises means that this cannot happen in
subgame perfect equilibrium.
4.3.4 A Two-Stage Game with Several Static Equilibria
Consider the multi-stage game corresponding to two repetitions of the symmetric normal form game depicted in Fig. 20 (p. 33). In the first stage of the game, the two players simultaneously choose among their actions, observe the outcome, and then in the second stage play the static game again. The payoffs are simply the discounted sums of the payoffs in each stage. That is, let pi1 represent player i's payoff at stage 1 and pi2 represent his payoff at stage 2. Then player i's payoff from the multi-stage game is ui = pi1 + δpi2, where δ ∈ (0, 1) is the discount factor.
                      Player 2
                  A       B       C
          A     0, 0    3, 4    6, 0
Player 1  B     4, 3    0, 0    0, 0
          C     0, 6    0, 0    5, 5

Figure 20: The Stage Game.
How do we find the MSNE? Suppose σ1(C) > 0 in a MSNE. We have several cases to consider. First, suppose that σ1(B) > 0 as well; that is, player 1 puts positive weight on both B and C (and possibly A). Since he is willing to mix, it follows that 4σ2(A) = 5σ2(C) ⟹ σ2(C) = (4/5)σ2(A). There are two ways to satisfy this requirement. Suppose σ2(C) = σ2(A) = 0; but in this case player 2 plays B for sure, player 1's best response is A, and we obtain ⟨A, B⟩. Suppose σ2(C) > 0, which implies σ2(A) > 0 too; that is, player 2 must be willing to mix between A and C (and possibly B). This now implies that 3σ1(B) + 6σ1(C) = 5σ1(C) ⟹ 3σ1(B) + σ1(C) = 0, a contradiction because σ1(C) > 0. Therefore, if σ1(C) > 0, then σ1(B) = 0. Second, suppose that σ1(A) > 0 as well; that is, player 1 puts positive weight on both A and C. Since he is willing to mix, it follows that 3σ2(B) + 6σ2(C) = 5σ2(C) ⟹ 3σ2(B) + σ2(C) = 0, which implies σ2(B) = σ2(C) = 0, which means σ2(A) = 1; but this means that player 1 will not mix, and we have ⟨B, A⟩. From all this, we conclude that σ1(C) = 0 in MSNE. Analogously, symmetry gives us σ2(C) = 0 too. This now reduces the game to the 2 × 2 variant in Fig. 21 (p. 34).
                  Player 2
                  A       B
          A     0, 0    3, 4
Player 1
          B     4, 3    0, 0

Figure 21: The Static Period Game after Some Equilibrium Logic.
This is now very easy to deal with. Since player 1 is willing to mix, it follows that 3σ2(B) = 4σ2(A), and since σ2(B) = 1 − σ2(A), this gives us σ2(A) = 3/7 and σ2(B) = 4/7. Analogously, we obtain σ1(A) = 3/7 and σ1(B) = 4/7. The last thing we need to do is check that the players will not want to use C given these mixtures (we already know this from the argument above, but it does not hurt to recall the MSNE requirement). It suffices to check for player 1: if he plays C, his payoff will be 0 given player 2's strategy of playing only A and B with positive probability, which is strictly worse than the expected payoff of 12/7 from either A or B. Hence, we do have our MSNE indeed. The payoffs in the three equilibria are (4, 3), (3, 4), and (12/7, 12/7) respectively.
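As a quick check, the candidate mixture σ(A) = 3/7, σ(B) = 4/7, σ(C) = 0 can be verified against the full 3 × 3 game in a short Python sketch; the payoff table is the one from Fig. 20, and the helper names are mine.

```python
# Player 1's payoffs in the stage game of Fig. 20, keyed by
# (player 1's action, player 2's action). The game is symmetric.
U1 = {("A", "A"): 0, ("A", "B"): 3, ("A", "C"): 6,
      ("B", "A"): 4, ("B", "B"): 0, ("B", "C"): 0,
      ("C", "A"): 0, ("C", "B"): 0, ("C", "C"): 5}

sigma = {"A": 3 / 7, "B": 4 / 7, "C": 0.0}  # candidate equilibrium mixture

def expected(action, opp_mix):
    """Expected payoff from a pure action against the opponent's mixture."""
    return sum(q * U1[(action, b)] for b, q in opp_mix.items())

vals = {a: expected(a, sigma) for a in "ABC"}
print(vals)  # A and B both yield 12/7; C yields 0, so it is not a profitable deviation
```

By symmetry, the same computation applies to player 2, which confirms the MSNE payoff pair (12/7, 12/7).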
The efficient payoff (5, 5) is not attainable in equilibrium if the game is played once.12 However, consider the following strategy profile for the two-stage game:

Player 1: play C at the first stage. If the outcome is (C, C), play B at the second stage; otherwise play σ1(A) = 3/7, σ1(B) = 4/7 at the second stage;

Player 2: play C at the first stage. If the outcome is (C, C), play A at the second stage; otherwise play σ2(A) = 3/7, σ2(B) = 4/7 at the second stage.
Is it subgame perfect? Since the strategies at the second stage specify playing Nash equilibrium profiles for all possible second stages, the strategies are optimal there. At the first stage players can deviate and increase their payoffs by 1 from 5 to 6 (either player can choose A). However, doing so results in playing the mixed strategy Nash equilibrium at the second stage, which lowers their payoffs to 12/7 from 4 for player 1 and from 3 for player 2. Thus, player 1 will not deviate if:

6 + δ(12/7) ≤ 5 + δ(4)  ⟺  1 ≤ δ(4 − 12/7)  ⟺  δ ≥ 7/16.

Similarly, player 2 will not deviate if:

6 + δ(12/7) ≤ 5 + δ(3)  ⟺  1 ≤ δ(3 − 12/7)  ⟺  δ ≥ 7/9.

12 An outcome is efficient if it is not possible to make some player better off without making the other one worse off. The outcomes with payoffs (0, 0) are all inefficient, as are the outcomes with payoffs (4, 3) and (3, 4). However, the outcomes (6, 0) and (0, 6) are also efficient.
We conclude that the strategy profile specified above is a subgame perfect equilibrium if δ ≥ 7/9. In effect, players can attain the non-Nash efficient outcome at stage 1 by threatening to revert to the worst possible Nash equilibrium at stage 2. This technique will be very useful when analyzing infinitely repeated games, where we shall see analogous results.
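The deviation check can be written as a small Python sketch; the function name is mine, and the payoffs come from the analysis above (cooperation payoff 5, deviation payoff 6, reward continuations worth 4 and 3, punishment continuation worth 12/7).

```python
# One-stage deviation check for the two-stage profile: play (C, C) in
# stage 1, reward with the asymmetric pure equilibrium, punish any
# deviation with the 12/7 mixed equilibrium.
def no_deviation(delta):
    """Both players prefer conforming iff neither stage-1 deviation pays."""
    p1_ok = 5 + delta * 4 >= 6 + delta * (12 / 7)   # player 1's reward is 4
    p2_ok = 5 + delta * 3 >= 6 + delta * (12 / 7)   # player 2's reward is 3
    return p1_ok and p2_ok

print(no_deviation(0.5))   # False: 0.5 < 7/9, player 2 deviates
print(no_deviation(0.8))   # True:  0.8 > 7/9
```

The binding constraint is player 2's, since her reward continuation (3) is smaller, which is why the profile requires δ ≥ 7/9 rather than δ ≥ 7/16.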
4.4 The Principle of Optimality
This is an important result that you will make heavy use of both for multi-stage games and for the infinitely repeated games that we shall look at next time. The principle states that to check whether a strategy profile of a multi-stage game with observed actions is subgame perfect, it suffices to check whether there are any histories ht where some player i can profit by deviating only from the actions prescribed by si(ht) and conforming to si thereafter. In other words, for games with arbitrarily long (but finite) histories,13 it suffices to check if some player can profit by deviating only at a single point in the game and then continuing to play his equilibrium strategy. That is, we do not have to check deviations that involve actions at several points in the game. You should be able to see how this simplifies matters considerably.
The following theorem is variously called the One-Shot (or One-Stage) Deviation Principle, and is essentially the principle of optimality in dynamic programming. Since this is such a nice result and because it may not be obvious why it holds, we shall go through the proof.
Theorem 6 (One-Shot Deviation Principle for Finite Horizon Games). In a finite multi-stage game with observed actions, the strategy profile (si, s−i) is subgame perfect if, and only if, no player i can profit by deviating from si at a single stage and conforming to si thereafter while all other players stick to s−i.

Proof. (Necessity.) This follows immediately from the definition of SPE. If (si, s−i) is a subgame perfect equilibrium, then no player has an incentive to deviate in any subgame.14

13 We shall see that this principle also works for infinitely repeated games under some conditions that will always be met by the games we consider.

14 This does not hold for Nash equilibrium, which may prescribe suboptimal actions off the equilibrium path (i.e. in some subgames).
(Sufficiency.) Suppose, to the contrary, that si satisfies the one-shot deviation principle (OSDP) but that some strategy ŝi is a strictly better response to s−i than si in the subgame starting with some history ĥ. Let t̂ be the largest t such that, for some ht, ŝi(ht) ≠ si(ht). (That is, ht̂ is the history that includes all deviations.) Because si satisfies the OSDP, ht̂ is longer than ĥ and, because the game is finite, t̂ is finite as well. Now consider an alternate strategy s̃i that agrees with ŝi at all t < t̂ and follows si from stage t̂ on. Because s̃i is the same as ŝi in every subgame beginning with some ht̂+1 and the same as ŝi in all subgames with t < t̂, the two differ only in the one-shot deviation at t̂, so the OSDP implies that s̃i is at least as good a response as ŝi. We can then construct a strategy that agrees with ŝi until t̂ − 2, and argue that it is as good a response as ŝi, and so on. The sequence of improving deviations unravels from its endpoint.
The proof works as follows. You start from the last deviation in a sequence of multiple deviations and argue that it cannot be profitable by itself, or else the OSDP would be violated. This now means that if you use the multiple-deviation strategy up to that point and follow the original OSDP strategy from that point on, you would get at least as good a payoff (again, because the last deviation could not have been the profitable one, the original OSDP strategy will do at least as well in that subgame). You then go up one step to the new last deviation and argue that this deviation cannot be profitable either: since we are comparing a subgame with this deviation followed by the original OSDP strategy to the OSDP strategy itself, the fact that the original strategy satisfies the OSDP implies that this particular deviation cannot be profitable. Hence, we can replace this deviation with the action from the OSDP strategy too and obtain at least as good a payoff as the multi-deviation strategy. You repeat this process until you reach the first stage with a deviation, and you reach a contradiction because this deviation cannot be profitable by itself either. In other words, if a strategy satisfies the OSDP, it must be subgame perfect.
An example here may be helpful. Since in equilibrium we hold all other players' strategies constant when we check for profitable deviations, the diagram in Fig. 22 (p. 36) omits the strategies for the other players and shows only player 1's moves at his information sets. Label the information sets consecutively with small Roman numerals for ease of exposition. Suppose that the strategy (adegi) satisfies the OSDP. We want to show that no other strategy is more profitable, even if it involves multiple deviations from this one. To make the illustration even more helpful, I have bolded the actions specified by the OSDP strategy.
[Figure 22: Player 1's moves at his information sets. At (i) he chooses a, leading to (ii), or b, leading to (iii). At (ii), d ends the game with payoff w, while c leads to (iv), where g yields u and h yields v. At (iii), e yields x, while f leads to (v), where i yields y and j yields z. The actions of the OSDP strategy (adegi) are in bold.]
For example, the OSDP implies that changing from g to h at (iv) cannot be profitable, which implies u ≥ v. Also, at (v), changing from i to j cannot be profitable, so y ≥ z. At (ii), changing to c cannot be profitable; since the strategy specifies playing g at (iv), this deviation leads to w ≥ u. At (iii), changing to f cannot be profitable. Since the original strategy specifies i at (v), this deviation leads to x ≥ y. Finally, at (i) changing to b cannot be profitable. Since the original strategy specifies e at (iii), this deviation leads to w ≥ x. The implications of the OSDP are listed as follows:

at (i):   w ≥ x   (1)
at (ii):  w ≥ u   (2)
at (iii): x ≥ y   (3)
at (iv):  u ≥ v   (4)
at (v):   y ≥ z   (5)

These inequalities now imply some further relationships: from the first and the third, we get w ≥ y, and putting this together with the last yields w ≥ z as well. Furthermore, from the third and last we obtain x ≥ z, and from the second and fourth we obtain w ≥ v. Putting everything together yields the following orderings of the payoffs:

w ≥ x ≥ y ≥ z   and   w ≥ u ≥ v.
We can now check whether there exist any profitable multi-stage deviations. (Obviously, there will be no single-stage profitable deviations because the strategy satisfies the OSDP.) Take, for example, an alternative strategy that deviates at (ii) and (iv); that is, in the subgame starting at (ii), it specifies ch. This will lead to the outcome v, which cannot improve on w, the outcome from following the original strategy. Consider another alternative strategy which deviates twice in the subgame starting at (iii); i.e., it prescribes fj, which would lead to the outcome z. This cannot improve on x, the outcome the player would get from following the original strategy. Going up to (i), consider a strategy that deviates at (i) and (v). That is, it prescribes b and j at these information sets. Since (v) is still never reached, this actually boils down to a one-shot deviation with the outcome x, which (not surprisingly) cannot improve on w, which is what the player can get from following the original strategy. What if he deviated at (i) and (iii) instead? This would lead to y, which is also no better than w. What if he deviated at (i), (iii), and (v)? This would lead to z, which is also no better than w. Since all other deviations that start at (i) leave (ii) and (iv) off the path of play, there is no need to consider them. This example then shows how the OSDP implies subgame perfection. Intuitively, if a strategy satisfies the OSDP, then it implies a certain preference ordering, which in turn ensures that no multi-stage deviations will be profitable.
To see how the proof would work here, take the longest deviation, e.g., a strategy that deviates at (i), (iii), and (v). Since it leaves (ii) and (iv) off the path, let's consider (bdfgj) as such a supposedly better alternative. Observe now that because (adegi) satisfies the OSDP, the deviation to j at (v) cannot be improving. This means that the strategy (bdfgi) is at least as good as (bdfgj). Hence, if (bdfgj) is better than the original, then (bdfgi) must also be better. Consider now (bdfgi): since it matches the original at (v), the OSDP implies that the deviation to f cannot be improving. Hence, the strategy (bdegi) is at least as good as (bdfgi), which implies it is also at least as good as (bdfgj). Hence, if (bdfgj) is better than the original, then (bdegi) must also be better. However, (bdegi) matches the original strategy at all information sets except (i); i.e., it involves a one-shot deviation to b, which cannot be improving by the OSDP. Since (bdegi) cannot improve on (adegi), neither can (bdfgj), a contradiction to the supposition that it is better than (adegi). This is essentially how the proof works.
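The enumeration argument can also be checked mechanically. The Python sketch below encodes the structure of Fig. 22 as described in the text, with hypothetical numeric payoffs chosen to satisfy the orderings w ≥ x ≥ y ≥ z and w ≥ u ≥ v; it verifies that (adegi) satisfies the OSDP and that no strategy, however many deviations it bundles, beats it at the root.

```python
from itertools import product

# Player 1's decision tree: each information set (uppercase Roman numeral)
# maps each action to another information set or a terminal outcome.
TREE = {
    "I":   {"a": "II", "b": "III"},
    "II":  {"c": "IV", "d": "w"},
    "III": {"e": "x",  "f": "V"},
    "IV":  {"g": "u",  "h": "v"},
    "V":   {"i": "y",  "j": "z"},
}
INFO_SETS = ["I", "II", "III", "IV", "V"]

# Hypothetical payoffs consistent with w >= x >= y >= z and w >= u >= v.
PAYOFF = {"w": 5, "x": 4, "y": 3, "z": 2, "u": 4, "v": 1}

def play(strategy, start="I"):
    """Follow the strategy (info set -> action) to a terminal outcome."""
    node = start
    while node in TREE:
        node = TREE[node][strategy[node]]
    return node

def satisfies_osdp(strategy):
    """True iff no single deviation, evaluated in the subgame where it
    occurs, improves the payoff."""
    for h in INFO_SETS:
        for alt in TREE[h]:
            deviant = dict(strategy, **{h: alt})
            if PAYOFF[play(deviant, h)] > PAYOFF[play(strategy, h)]:
                return False
    return True

original = {"I": "a", "II": "d", "III": "e", "IV": "g", "V": "i"}  # (adegi)
assert satisfies_osdp(original)

# No multi-deviation strategy is profitable at the root either.
best = PAYOFF[play(original)]
for actions in product(*(TREE[h] for h in INFO_SETS)):
    assert PAYOFF[play(dict(zip(INFO_SETS, actions)))] <= best
print("OSDP holds and no multi-deviation strategy is profitable")
```

Any payoff assignment satisfying inequalities (1)-(5) would work here; the point is exactly the one the text makes, that the OSDP orderings rule out profitable multi-deviation strategies.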
Let's now see what the OSDP gets you. Consider the game in Fig. 23 (p. 38). The SPE, which you can obtain by backward induction, is ((bf), d), with the outcome (3, 3).
[Figure 23: A game of perfect information. Player 1 chooses a, ending the game with payoffs (1, 0), or b, after which player 2 chooses between an action that ends the game with payoffs (0, 1) and d, which lets player 1 choose between e, with payoffs (2, 2), and f, with payoffs (3, 3).]
People began analyzing simple two-stage games (e.g. the ultimatum game, where one player makes an offer and the other gets to accept or reject it) to gain insight into the dynamics of bargaining. Slowly they moved to more complicated settings where one player makes all the offers while the other accepts or rejects, with no limit to the number of offers that can be made. The most attractive protocol is the alternating-offers protocol, where players take turns making offers and responding to the other player's last offer.

The alternating-offers game was made famous by Ariel Rubinstein in 1982, when he published a paper showing that while this game has infinitely many Nash equilibria (with any division supportable in equilibrium), it has a unique subgame perfect equilibrium! Now this is a great result, and since it is the foundation of most contemporary literature on strategic bargaining, we shall explore it in some detail.15
4.5.1 The Basic Alternating-Offers Model
Two players, A and B, bargain over a partition of a pie of size π > 0 according to the following procedure. At time t = 0 player A makes an offer to player B about how to partition the pie. If player B accepts the offer, then an agreement is made and they divide the pie accordingly, ending the game. If player B rejects the offer, then she makes a counteroffer at time t = 1. If the counteroffer is accepted by player A, the players divide the pie accordingly and the game ends. If player A rejects the offer, then he makes a counter-counteroffer at time t = 2. This process of alternating offers and counteroffers continues until some player accepts an offer.
To make the above a little more precise, we describe the model formally. Two players, A and B, make offers at discrete points in time indexed by t = 0, 1, 2, . . . . At time t when t is even (i.e. t = 0, 2, 4, . . .) player A offers x ∈ [0, π], where x is the share of the pie A would keep and π − x is the share B would keep in case of an agreement. If B accepts the offer, the division of the pie is (x, π − x). If player B rejects the offer, then at time t + 1 she makes a counteroffer y ∈ [0, π]. If player A accepts the offer, the division (π − y, y) obtains. Generally, we shall specify a proposal as an ordered pair, with the first number representing player A's share. Since this share uniquely determines player B's share (and vice versa), each proposal can be uniquely characterized by the share the proposer offers to keep for himself.
The payoffs are as follows. While players disagree, neither receives anything (which means that if they perpetually disagree then each player's payoff is zero). If some player agrees on a partition (x, π − x) at some time t, player A's payoff is δ^t x and player B's payoff is δ^t (π − x). The players discount the future with a common discount factor δ ∈ (0, 1). This captures the time preferences of the players: it is better to obtain an agreement sooner rather than later.

Here's how it works. Suppose agreement is reached on a partition (2, π − 2) at some time t. Player A's payoff is 2δ^t. Since 0 < δ < 1, as t increases, δ^t decreases. Let δ = .9 (so a dollar tomorrow is only worth 90 cents today). If this agreement is reached at t = 0, player A's payoff is (2)(.9)^0 = 2. If the agreement is reached at t = 1, player A's payoff is (2)(.9)^1 = 1.8. If it happens at t = 2, player A's payoff is (2)(.9)^2 = 1.62. At t = 10, the payoff is (2)(.9)^10 ≈ .697; at t = 100, it is only (2)(.9)^100 ≈ .000053, and so on and so forth. The point is that the further in the future a player gets some share, the less attractive this same share is compared to getting it sooner.
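These numbers are easy to check directly with a minimal sketch:

```python
# A share of 2 received at time t is worth 2 * (0.9 ** t) today
# when the discount factor is delta = 0.9.
delta, share = 0.9, 2.0
values = {t: share * delta ** t for t in (0, 1, 2, 10, 100)}
for t, v in values.items():
    print(t, v)
```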
15 The Rubinstein bargaining model is extremely attractive because it can be easily modified, adapted, and extended to various settings. There is a significant cottage industry that does just that. The Muthoo (1999) book gives an excellent overview of the most important developments. The discussion that follows is taken almost verbatim from Muthoo's book. If you are serious about studying bargaining, you should definitely get this book. Yours truly also tends to use variations of the Rubinstein model in his own work on intrawar negotiations.
This completes the formal description of the game. You can draw the extensive form tree for several periods, but since the game is not finite (there's an infinite number of possible offers at each information set, and the longest terminal history is infinite: the one where players always reject offers), we cannot draw the entire tree.
4.5.2 Nash Equilibria
Let's find the Nash equilibria in pure strategies for this game. Actually, we cannot find all Nash equilibria because there's an infinite number of those. What we can do, however, is characterize the payoffs that players can get in equilibrium. It turns out that any division of the pie can be supported in some Nash equilibrium. To see this, consider the strategies where player A demands x ∈ (0, π) in the first period and in each subsequent period where he gets to make an offer, and always rejects all of player B's offers. This is a valid strategy for the bargaining game. Given this strategy, player B does strictly better by accepting the offer (and receiving π − x) instead of rejecting forever, so she accepts the initial offer and rejects all subsequent offers. Given that B accepts the offer, player A's strategy is optimal.

The problem, of course, is that Nash equilibrium requires strategies to be mutually best responses only along the equilibrium path. It is just not reasonable to suppose that player A can credibly commit to rejecting all offers regardless of what player B does. To see this, suppose at some time t > 0, player B offers y < π. According to the Nash equilibrium strategy, player A would reject this (and all subsequent offers), which yields a payoff of 0. But player A can do strictly better by accepting the share π − y > 0! The Nash equilibrium is not subgame perfect because player A cannot credibly threaten to reject all offers.
4.5.3 The Unique Subgame Perfect Equilibrium
Since this is an infinite horizon game, we cannot use backward induction to solve it. However, since every subgame that begins with an offer by some player is structurally identical with all subgames that begin with an offer by that player, we shall look for an equilibrium with two intuitive properties: (1) no delay: whenever a player has to make an offer, the equilibrium offer is immediately accepted by the other player; and (2) stationarity: in equilibrium, a player always makes the same offer.

It is important to realize that at this point I do not claim that such an equilibrium exists; we shall look for one that has these properties. Also, I do not claim that if it does exist, it is the unique SPE of the game. We shall prove this later. However, given that the subgames are structurally identical, there is no a priori reason to think that offers must be non-stationary, and, if this is the case, that there should be any reason to delay agreement (given that doing so is costly). So it makes sense to look for an SPE with these properties.
Let x∗ denote player A's equilibrium offer and y∗ denote player B's equilibrium offer (again, because of stationarity, there is only one such offer for each player). Consider now some arbitrary time t at which player A has to make an offer to player B. From the two properties, it follows that if B rejects the offer, she will then offer y∗ in the next period (stationarity), which A will accept (no delay). So, B's payoff to rejecting A's offer is δy∗. Subgame perfection requires that B reject any offer x with π − x < δy∗ and accept any offer x with π − x > δy∗. From the no delay property, this implies π − x∗ ≥ δy∗. However, it cannot be the case that π − x∗ > δy∗, because player A could increase his payoff by offering some x such that x > x∗ and π − x > δy∗. Hence:

π − x∗ = δy∗   (6)

Equation 6 states that in equilibrium, player B must be indifferent between accepting and rejecting player A's equilibrium offer. By a symmetric argument, it follows that in equilibrium, player A must be indifferent between accepting and rejecting player B's equilibrium offer:

π − y∗ = δx∗   (7)

Solving equations (6) and (7) yields x∗ = y∗ = π/(1 + δ), which means that there may be at most one SPE satisfying the no delay and stationarity properties. The following proposition specifies this SPE.
Proposition 1. The following pair of strategies is a subgame perfect equilibrium of the alternating-offers game:

player A always offers x∗ = π/(1 + δ) and always accepts offers y ≤ y∗;

player B always offers y∗ = π/(1 + δ) and always accepts offers x ≤ x∗.
Proof. We show that player A's strategy as specified in the proposition is optimal given player B's strategy. Consider an arbitrary period t where player A has to make an offer. If he follows the equilibrium strategy, his payoff is x∗. If he deviates and offers x < x∗, player B would accept, leaving A strictly worse off. Therefore, such a deviation is not profitable. If he instead deviates by offering x > x∗, then player B would reject. Since player B always rejects such offers and never offers A more than π − y∗ herself, the best that player A can hope for in this case is max{δ(π − y∗), δ²x∗}. That is, either he accepts player B's offer in the next period, or he rejects it and A's offer in the period after the next one is accepted. (Anything further down the road will be worse because of discounting.) However, δ²x∗ < x∗ and also δ(π − y∗) = δ²x∗ < x∗, so such a deviation is not profitable. Therefore, by the one-shot deviation principle, player A's proposal rule is optimal given B's strategy.

Consider now player A's acceptance rule. At some arbitrary time t, player A must decide how to respond to an offer made by player B. From the above argument we know that player A's optimal proposal is to offer x∗, which implies that it is optimal to accept an offer y if, and only if, π − y ≥ δx∗. Solving this inequality yields y ≤ π − δx∗, and substituting for x∗ yields y ≤ y∗, just as the proposition claims.

This establishes the optimality of player A's strategy. By a symmetric argument, we can show the optimality of player B's strategy. Given that these strategies are mutually best responses at any point in the game, they constitute a subgame perfect equilibrium.
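The key inequality in the proposal step is easy to verify numerically; the pie size and discount factor below are arbitrary.

```python
# After an over-demand x > x*, the best player A can hope for is
# max(delta * (pi - y_star), delta**2 * x_star), which falls short of x*.
pi_size, delta = 1.0, 0.9
x_star = y_star = pi_size / (1 + delta)
best_after_rejection = max(delta * (pi_size - y_star), delta ** 2 * x_star)
print(best_after_rejection < x_star)  # True
```

Note that δ(π − y∗) and δ²x∗ coincide, exactly as equation (7) implies.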
This is good, but so far we have only proven that there is a unique SPE that satisfies the no delay and stationarity properties. We have not shown that there are no other subgame perfect equilibria in this game. The following proposition, whose proof involves knowing some (not much) real analysis, states this result.16

Proposition 2. The subgame perfect equilibrium described in Proposition 1 is the unique subgame perfect equilibrium of the alternating-offers game.

16 If you know what a supremum of a set is, you can ask me and I will tell you how the proposition can be proved. There is a very elegant proof due to Shaked and Sutton that is much easier to follow than the extremely complicated original proof by Rubinstein.
We are now in game theory heaven! The rather complicated-looking bargaining game has a unique SPE in which agreement is reached immediately. Player A offers x∗ at t = 0, and player B immediately accepts this offer. The shares obtained by player A and player B in the unique equilibrium are x∗ = π/(1 + δ) and π − x∗ = δπ/(1 + δ) respectively.

In equilibrium, the share depends on the discount factor, and player A's equilibrium share is strictly greater than player B's equilibrium share. In this game there exists a first-mover advantage because player A is able to extract all the surplus from what B must forego if she rejects the initial proposal. In your homework you will be asked to find the unique stationary no delay SPE when the two players use different discount factors.
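One way to see that equations (6) and (7) pin down the shares is to iterate them numerically; the Python sketch below (with an arbitrary pie size and discount factor) converges to π/(1 + δ) from any starting guess.

```python
# Iterate the indifference conditions, written as x = pi - delta*y and
# y = pi - delta*x. The iteration is a contraction (factor delta), so it
# converges to the unique fixed point x = y = pi / (1 + delta).
pi_size, delta = 1.0, 0.9
x, y = 0.0, 0.0
for _ in range(500):
    x, y = pi_size - delta * y, pi_size - delta * x
print(x, y, pi_size / (1 + delta))  # all three agree
```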
The Rubinstein bargaining model makes an important contribution to the study of negotiations. First, the stylized representation captures characteristics of most real-life negotiations: (a) players attempt to reach an agreement by making offers and counteroffers, and (b) bargaining imposes costs on both players.

Some people may argue that the infinite horizon assumption is implausible because players have finite lives. However, this involves a misunderstanding of what the infinite time horizon really represents. Rather than modeling a reality where bargaining can continue forever, it models a reality where players do not stop bargaining after some exogenously given, predefined time limit. The finite horizon assumption would force the two players to stop bargaining even though each would prefer to continue doing so if agreement has not been reached. Unless there's a good explanation of who or what prevents them from continuing to bargain, the infinite horizon assumption is appropriate. (There are other good reasons to use the assumption, and they have to do with the speed with which offers can be made. There are some interesting models that explore the bargaining model in the context of deadlines for reaching an agreement. All this is very neat stuff and you are strongly encouraged to read it.)
4.6 Bargaining with Fixed Costs
Osborne and Rubinstein also study an alternative specification of the alternating-offers bargaining game where delay costs are modeled not as time preferences but as direct per-period costs. These models do not behave nearly as nicely as the one we studied here, and they have not achieved widespread use in the literature.

As before, there are two players who bargain using the alternating-offers protocol, with time periods indexed by t = 0, 1, 2, . . . . Instead of discounting future payoffs, they pay per-period costs of delay, c2 > c1 > 0. That is, if agreement is reached at time t on (x, π − x), then player 1's payoff is x − tc1 and player 2's payoff is π − x − tc2.
Let's look for a stationary no-delay SPE as before. Consider a period t in which player 1 makes a proposal. If player 2 rejects, then she can obtain y∗ − (t + 1)c2 by our assumptions. If she accepts, on the other hand, she gets π − x − tc2 because of the t-period delay. Hence, player 2 will accept any x with π − x − tc2 ≥ y∗ − (t + 1)c2, or π − x ≥ y∗ − c2. To find now the maximum she can expect to demand, note that by rejecting her offer in t + 1, player 1 will get x∗ − (t + 2)c1, and by accepting it, he will get π − y − (t + 1)c1 because of the (t + 1)-period delay up to his acceptance. Therefore, he will accept any y with π − y − (t + 1)c1 ≥ x∗ − (t + 2)c1, which reduces to π − y ≥ x∗ − c1. Since player 2 will be demanding the most that player 1 will accept, it follows that y∗ = π − x∗ + c1. This now means that player 2 cannot credibly commit to reject any period-t offer that satisfies:

π − x ≥ π − x∗ + c1 − c2   ⟺   x∗ − x ≥ c1 − c2.

Observe now that since c1 < c2, it follows that the RHS of the second inequality is negative. Suppose now that x∗ < π; then it is always possible to find x > x∗ such that 0 > x∗ − x ≥ c1 − c2. For instance, take x = x∗ − (c1 − c2) = x∗ + (c2 − c1) > x∗, because c2 > c1. Therefore, if x∗ < π, it is possible to find x > x∗ such that player 1 will prefer to propose x instead of x∗, which contradicts the stationarity assumption. Therefore, x∗ = π. This now pins down y∗ = π − x∗ + c1 = c1. This yields the following result.
Proposition 3. The following pair of strategies constitutes the unique stationary no-delay subgame perfect equilibrium in the alternating-offers bargaining game with per-period costs of delay c2 > c1 > 0:

player 1 always offers x∗ = π and always accepts offers y ≤ c1;

player 2 always offers y∗ = c1 and always accepts offers x ≤ π.

The SPE outcome is that player 1 grabs the entire pie in the first period.
Obviously, if c1 > c2 > 0 instead, then player 1 will get c2 in the first period and the rest will go to player 2. In other words, the player with the lower cost of delay extracts the entire bargaining surplus, which in this case is heavily asymmetric. If the low-cost player gets to make the first offer, he will obtain the entire pie. It turns out that this SPE is also the unique SPE (if c1 = c2, then there can be multiple SPE, including some with delay).

This model is not well-behaved in the following sense. First, no matter how small the cost discrepancy is, the player with the lower cost gets everything. That is, it could be that player 1's cost is c1 = c2 − ε, where ε > 0 is arbitrarily small. Still, in the unique SPE, he obtains the entire pie. The solution is totally insensitive to the cardinal difference in the costs, only to their ordinal ranking. Note now that if the costs are very close to each other and we tweak them ever so slightly such that c1 > c2, then player 2 will get π − c2; i.e., the prediction is totally reversed! This is not something you want in your models. It is perhaps for this reason that the fixed-cost bargaining model has not found wide acceptance as a workhorse model.
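The knife-edge comparative static is easy to exhibit. The helper below is hypothetical and simply encodes the SPE shares stated above (player 1 proposing first); an arbitrarily small change in the costs flips the prediction completely.

```python
# Stationary no-delay SPE shares in the fixed-cost alternating-offers
# game when player 1 makes the first proposal.
def spe_shares(pi_size, c1, c2):
    if c1 < c2:
        return (pi_size, 0.0)        # player 1 takes the whole pie
    if c2 < c1:
        return (c2, pi_size - c2)    # reversed: player 1 gets only c2
    raise ValueError("c1 == c2: multiple SPE, including some with delay")

print(spe_shares(1.0, 0.10, 0.11))   # (1.0, 0.0)
print(spe_shares(1.0, 0.11, 0.10))   # roughly (0.1, 0.9)
```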
Although backward induction and subgame perfection give compelling arguments for reasonable play in simple two-stage games of perfect information, things get uglier once we consider games with many players or games where each player moves several times.
5.1 Critiques of Backward Induction
There are two criticisms of BI, and both have to do with questions about reasonable behavior. In my mind, the second critique has more bite than the first one, but I will give you both.

First, consider a game with n players that has the structure depicted in Fig. 24 (p. 44). Since this is a game of perfect information, we can apply the backward induction algorithm. The unique equilibrium is the profile where each player chooses C, and in the outcome each player gets 2.

People have argued that this is unreasonable because in order to get the payoff of 2, all n − 1 players must choose C. If the probability that any player chooses C is p < 1, independent of the others, then the probability that all n − 1 will choose C is p^(n−1), which can be quite small if n is large, even if p itself is very close to 1. For example, with p = .999 and n = 1001, this probability is .999^1000 ≈ .37.
[Figure 24: An n-player game of perfect information. Each player in turn chooses between C and stopping the game; if everyone chooses C, each player receives 2, while stopping at successive nodes yields common payoffs of 1, 1/2, . . . , 1/(n−1), 1/n for all players.]
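The arithmetic behind this critique is a one-liner:

```python
# Even with p very close to 1, the chance that all n - 1 other players
# choose C compounds down to a modest number.
p, n = 0.999, 1001
prob_all_continue = p ** (n - 1)
print(round(prob_all_continue, 4))  # ≈ 0.3677
```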
[Figure 25: A two-player Centipede Game. The players alternate choosing C (continue) or stopping; stopping at successive nodes yields the payoff pairs (2, 1), (1, 3), (4, 2), (3, 5), (6, 4), (5, 7), while continuing to the end yields (8, 6).]
Second, we may interpret the extensive form game as implicitly including the possibility that players sometimes make small mistakes, or "trembles," whenever they act. If the probabilities of trembles are independent across different information sets, then no matter how often past play has failed to conform to the predictions of backward induction, a player is still justified in continuing to use backward induction for the rest of the game. There is a trembling-hand perfect equilibrium due to Selten that formalizes this idea. (This is a defense of backward induction.)
The question now becomes one of choosing between two possible interpretations of deviations. In Fig. 25 (p. 44), if player 2 observes C, will she interpret this as a small mistake
by player 1 or as a signal that player 1 will choose C if given a chance? Who knows? I am
more inclined toward the latter interpretation, but your mileage may vary. To see why it may
make sense to treat deviations as a signal, suppose we extend the centipede game to 40 periods and now suppose we find ourselves in period 20; that is, both players have played C
10 times. Is it reasonable to suppose these were all mistakes? Or that perhaps the players are
trying to get closer to the endgame, where they would get better payoffs? In experimental
settings, players usually do continue for a while, although they do tend to stop well short
of the end. One way we can think about this is that the game is not actually capturing everything about the players. In particular, in experiments a player may doubt the rationality
of the opponent (so he may expect her to continue), or he may believe she doubts his own
rationality (so she expects him to continue, which in turn makes him expect her to continue
as well). At any rate, small doubts like this may move the play beyond the game-stopping
first choice by player 1. This does not mean that backward induction is wrong. What it
does mean is that the full-information, common-knowledge assumptions behind it may not
hold in experiments where real people play the Centipede Game. My reaction to this
is not to abandon backward induction but to modify the model and ask: what will happen if
players with small doubts about each other's rationality play the Centipede Game? This is a
topic for another discussion, though.
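Backward induction in the centipede game can itself be automated. The following is my own sketch (the `rollback` helper is hypothetical, not from the text), using the stopping payoffs from the six-move centipede of Fig. 25, with (8, 6) if nobody ever stops:

```python
# Stopping at move k (k = 1..6) yields these (player 1, player 2) payoffs;
# if both players always continue, the game ends with end_payoff.
stop_payoffs = [(2, 1), (1, 3), (4, 2), (3, 5), (6, 4), (5, 7)]
end_payoff = (8, 6)

def rollback(stop_payoffs, end_payoff):
    """Fold the game back from the last node; return the outcome and choices."""
    value = end_payoff  # continuation value past the last decision node
    choices = []
    # Player 1 moves at odd-numbered moves (even indices), player 2 at the rest.
    for k in range(len(stop_payoffs) - 1, -1, -1):
        mover = k % 2  # 0 = player 1, 1 = player 2
        if stop_payoffs[k][mover] >= value[mover]:
            value = stop_payoffs[k]          # the mover prefers to stop here
            choices.append((k + 1, "stop"))
        else:
            choices.append((k + 1, "continue"))
    return value, list(reversed(choices))

outcome, choices = rollback(stop_payoffs, end_payoff)
print(outcome)  # (2, 1): every mover stops, so player 1 stops immediately
```

The computation mirrors the verbal argument: the last mover prefers 7 to 6, so she stops; anticipating that, each earlier mover prefers stopping too, all the way back to player 1's first move.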
5.2 Critiques of Subgame Perfection
Obviously, all the critiques of backward induction apply here as well. However, in addition
to these problems, SP also requires that players agree on the play in a subgame even when BI
cannot predict the play.
The coordination game between players 1 and 3 has three Nash equilibria: two in pure
strategies with payoffs (7, 10, 7), and one in mixed strategies with payoffs (3.5, 5, 3.5).17 If
we specify an equilibrium in which players 1 and 3 successfully coordinate, then player 2 will
choose R, and so player 1 will choose R as well, expecting a payoff of 7. If we specify the
MSNE, then player 2 will choose L because R yields an expected payoff of 5 (coordination will
fail half of the time). Again player 1 will choose R, this time expecting a payoff of 8. Thus, in all SPE of
this game player 1 chooses R.
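This folding-back logic can be verified mechanically. Below is my own encoding of the game (not code from the text): `spe_outcome` takes whatever expected payoffs we assign to the third-stage coordination game and rolls them back through players 2 and 1.

```python
# Three-stage game from the text: player 1 chooses L, ending at (6, 0, 6),
# or R; player 2 then chooses L, ending at (8, 6, 8), or R, which leads to
# the coordination game between players 1 and 3.
def spe_outcome(stage3_payoffs):
    """Fold back given the (u1, u2, u3) expected in the third stage."""
    # Player 2: L ends the game at (8, 6, 8); R continues to stage 3.
    after_2 = (8, 6, 8) if 6 >= stage3_payoffs[1] else stage3_payoffs
    # Player 1: L ends the game at (6, 0, 6); R hands the move to player 2.
    return (6, 0, 6) if 6 >= after_2[0] else after_2

print(spe_outcome((7, 10, 7)))     # coordination in stage 3: (7, 10, 7)
print(spe_outcome((3.5, 5, 3.5)))  # MSNE in stage 3: (8, 6, 8)
```

Either way player 1 ends up choosing R, exactly as the text concludes for all SPE.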
Suppose, however, player 1 did not see a way to coordinate in the third stage, and hence
expected a payoff of 3.5 conditional on this stage being reached, but feared that player 2
would believe that play in the third stage would result in coordination on an efficient
equilibrium. (This is not unreasonable since the two pure-strategy Nash equilibria there are
the efficient ones.) If player 2 had such expectations, then she would choose R, which means
that player 1 would go L at the initial node!
17 In this MSNE, each player chooses A with probability 1/2, as you should readily see.
[Figure: the three-stage game. Player 1 chooses L, ending the game with payoffs (6,0,6), or R; player 2 then chooses L, ending with (8,6,8), or R; finally, player 3 chooses A or B and player 1 chooses A or B without observing player 3's move. Matching choices yield (7,10,7); mismatches yield (0,0,0).]