Game Theory WSC PDF
Contents

0 Basic Decision Theory
  0.1 Ordinal Utility
  0.2 Expected Utility and the von Neumann-Morgenstern Theorem
  0.3 Utility for Money and Risk Attitudes
  0.4 Bayesian Rationality

4 Bayesian Games
  4.1 Basic Definitions
  4.2 Bayesian Strategies, Dominance, and Bayesian Equilibrium
  4.3 Computing Bayesian Equilibria
  4.4 More on Interpretation
  4.5 Applications
    4.5.1 Auctions
    4.5.2 The revelation principle for normal form games
  4.6 Ex Post Equilibrium
  4.7 Correlated Types and Higher-Order Beliefs
  4.8 Incomplete Information, Hierarchies of Beliefs, and Bayesian Games
  4.9 Rationalizability in Bayesian Games

  7.6.2 Full surplus extraction via menus of lotteries
  7.6.3 The beliefs-determine-preferences property and the Wilson doctrine
  7.6.4 Robust mechanism design

References
0 Basic Decision Theory

0.1 Ordinal Utility

We consider a decision maker (or agent) who chooses among outcomes in some set Z. To
begin we assume that Z is finite.
The primitive description of preferences is in terms of a preference relation ⪰. For any
ordered pair of outcomes (x, y) ∈ Z × Z, the agent can tell us whether or not he weakly
prefers x to y. If yes, we write x ⪰ y. If no, we write x ⪰̸ y.
We can use these to define
strict preference: a ≻ b means [a ⪰ b and b ⪰̸ a].
indifference: a ∼ b means [a ⪰ b and b ⪰ a].
We say that the preference relation ⪰ is a weak order if it satisfies the two weak order axioms:
Completeness: for all x, y ∈ Z, either x ⪰ y or y ⪰ x (or both).
Transitivity: for all x, y, z ∈ Z, if x ⪰ y and y ⪰ z, then x ⪰ z.
Completeness says that there are no outcomes that the agent is unwilling or unable to
compare. (Consider Z = {do nothing, kill someone to save five others’ lives}.)
Transitivity rules out preference cycles. (Consider Z = {a scoop of ice-cream, an enormous
hunk of chocolate cake, a small plain salad}.)
Theorem 0.1. Let Z be finite and let ⪰ be a preference relation. Then there is an ordinal utility
function u : Z → R that represents ⪰ if and only if ⪰ is complete and transitive.
Moreover, the function u is unique up to increasing transformations: v : Z → R also represents
⪰ if and only if v = f ◦ u for some increasing function f : R → R.
In the first part of the theorem, the “only if” direction follows immediately from the
fact that the real numbers are ordered. For the “if” direction, assign the elements of Z
utility values sequentially; the weak order axioms ensure that this can be done without
contradiction.
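This sequential assignment is easy to make concrete. Below is a minimal Python sketch (the function name weakly_prefers and the encoding are ours, purely illustrative): given a complete and transitive relation supplied as a two-argument comparison function, sort Z and assign increasing values, giving indifferent outcomes equal values.

    from functools import cmp_to_key

    def ordinal_utility(Z, weakly_prefers):
        # Sort worst-to-best; completeness guarantees every pair is comparable,
        # and transitivity guarantees sorting yields a consistent order.
        def cmp(x, y):
            if weakly_prefers(x, y) and weakly_prefers(y, x):
                return 0                                  # indifferent
            return -1 if weakly_prefers(y, x) else 1      # x strictly worse / better
        ordered = sorted(Z, key=cmp_to_key(cmp))
        u, value = {}, 0
        for i, z in enumerate(ordered):
            if i > 0 and weakly_prefers(z, ordered[i-1]) and weakly_prefers(ordered[i-1], z):
                u[z] = u[ordered[i-1]]                    # indifferent outcomes share a value
            else:
                value += 1
                u[z] = value
        return u

    # Example: preferences over dollar amounts.
    print(ordinal_utility([10, 0, 5], lambda x, y: x >= y))   # {0: 1, 5: 2, 10: 3}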
“Ordinal” refers to the fact that only the order of the values of the utility function has
meaning. Neither the values nor differences between them convey information about
intensity of preferences. This is captured by the second part of the theorem, which says
that utility functions are only unique up to increasing transformations.
If Z is (uncountably) infinite, weak order is not enough to ensure that there is an ordinal
utility representation.
There are various additional assumptions that rule out such examples. One is continuity
of the preference relation:
Theorem 0.3. Let Z ⊆ Rn and let ⪰ be a preference relation. Then there is a continuous ordinal
utility function u : Z → R that represents ⪰ if and only if ⪰ is complete, transitive, and continuous.
0.2 Expected Utility and the von Neumann-Morgenstern Theorem

lottery 1: $1M with probability 1;
lottery 2: $2M with probability 1/2, $0 with probability 1/2.
One tempting possibility is to look at expected values: the weighted averages of the
possible values, with weights given by probabilities.
Let Z be a finite set of outcomes. We denote by ∆Z the set of probability distributions over
Z: ∆Z = {p : Z → R+ | Σ_{z∈Z} p(z) = 1}. Here we interpret p ∈ ∆Z as a lottery over outcomes
from Z.
[Figure: lottery trees for p = (.2, .8, 0), q = (.9, 0, .1), and r = (0, 0, 1) over the outcomes $0, $10, $100.]
An agent has preferences over ∆Z, where “p ⪰ q” means that he likes p at least as much
as q.
The preference relation ⪰ on ∆Z admits an expected utility representation if there is a function
u : Z → R such that

(1) p ⪰ q ⇔ Σ_{z∈Z} u(z) p(z) ≥ Σ_{z∈Z} u(z) q(z).
The function u is called a Bernoulli utility function. The expectations of u with respect to p
and q appearing on the right in (1) are called von Neumann-Morgenstern (or NM) expected
utility functions.
When can a preference relation on ∆Z be represented using expected utilities? To answer
this question, we introduce compound lotteries, or “lotteries over lotteries”. An example of
a compound lottery is shown at left in the figure below. We assume that the agent only
cares about the ultimate probabilities of obtaining each outcome in Z, as shown at right in
the figure. The lottery at right is called the reduced lottery corresponding to the compound
lottery.
[Figure: the compound lottery .7p + .3q, where p = (.2, .8, 0) and q = (.9, 0, .1), and the corresponding reduced lottery (.41, .56, .03) over the outcomes $0, $10, $100.]
Example 0.6. In the figure above, the compound lottery .7p + .3q is identified with the
single-stage lottery with outcome probabilities .7(.2, .8, 0) + .3(.9, 0, .1) = (.41, .56, .03). ♦
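The reduction is just a coordinate-by-coordinate weighted average; here is a minimal Python sketch (our own encoding of lotteries as tuples of probabilities):

    def reduce_lottery(alpha, p, q):
        # The compound lottery: alpha on p, 1 - alpha on q, reduced to one stage.
        return tuple(alpha * pz + (1 - alpha) * qz for pz, qz in zip(p, q))

    p = (.2, .8, 0)    # probabilities of $0, $10, $100
    q = (.9, 0, .1)
    print(reduce_lottery(.7, p, q))   # (.41, .56, .03), up to floating point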
(NM2) Continuity: For all p, q, and r such that p ≻ q ≻ r, there exist δ, ε ∈ (0, 1) such that
(1 − δ)p + δr ≻ q and q ≻ εp + (1 − ε)r.
The continuity axiom says that there is no outcome so good or so bad that having any
probability of receiving it dominates all other considerations.
Example 0.7. p = win Nobel prize for sure, q = nothing, r = get hit by a bus for sure. ♦
(NM3) Independence: For all p, q, r ∈ ∆Z and all α ∈ (0, 1),
p ⪰ q ⇔ αp + (1 − α)r ⪰ αq + (1 − α)r.
The independence axiom says that introducing the same probability of a third lottery r to
lotteries p and q does not reverse preferences between p and q.
Example 0.8. Let p = (.2, .8, 0), q = (.9, 0, .1), r = (0, 0, 1). Independence implies that if p is
preferred to q, then p̂ = .5p + .5r = (.1, .4, .5) is preferred to q̂ = .5q + .5r = (.45, 0, .55). ♦
[Figure: lottery trees for p̂ = .5p + .5r and q̂ = .5q + .5r from Example 0.8.]
uc = λua + (1 − λ)ub.
This says that λ is the probability on a in a lottery over a and b that makes this lottery
exactly as good as getting c for sure. ♦
Thus r ≻ s. ♦
The problem in Example 0.11 is that the preferences p ≻ q and s ≻ r reflect a greater
sensitivity to low probability events than NM expected utility allows. For alternative
models that address this and other problematic examples, see Kahneman and Tversky
(1979) and Gilboa (2009).
0.3 Utility for Money and Risk Attitudes

Suppose that the outcomes are amounts of money measured in dollars (as in our example
of $1M for sure vs. a 50% chance of $2M). Then for any lottery p ∈ ∆Z, we can compute
the expected dollar return
p̄ = Σ_{z∈Z} z p(z).
We say that an agent is risk averse if she always weakly prefers getting p̄ for sure to the
lottery p, risk loving if she always prefers p to getting p̄ for sure, and risk neutral if she is
always indifferent.
Risk preferences can be read from the shape of an agent’s utility function u : R → R from
dollar outcomes to units of utility. We generally assume that u is increasing: x < y ⇒
u(x) < u(y). If u is differentiable, this implies that u′(x) ≥ 0 for all x ∈ R.
[Figure: graphs of utility functions u(·) over dollar amounts, comparing u(αx + (1−α)y) with αu(x) + (1−α)u(y).]
Risk aversion is equivalent to concavity of u: u(Σᵢ αᵢxᵢ) ≥ Σᵢ αᵢu(xᵢ) whenever the weights
αᵢ ≥ 0 sum to 1. For n = 2 this is the definition of concavity. For n > 2 this is proved by induction.
0.4 Bayesian Rationality

In settings with uncertainty, where all relevant probabilities are objective and known, we
call an agent NM rational if he acts as if he is maximizing a NM expected utility function.
What if the probabilities are not given? We call an agent Bayesian rational (or say that he
has subjective expected utility preferences) if
(i) In settings with uncertainty, he forms beliefs about the probabilities of all relevant
events.
(ii) When making decisions, he acts to maximize his expected utility given his beliefs.
(iii) After receiving new information, he updates his beliefs by taking conditional prob-
abilities whenever possible.
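Item (iii) amounts to conditioning: probability is restricted to the states consistent with the new information and renormalized. A minimal sketch in Python (the dict encoding and names are illustrative):

    def bayes_update(beliefs, event):
        # `beliefs` maps states to prior probabilities; `event` is the set of
        # states consistent with the new information.
        total = sum(p for s, p in beliefs.items() if s in event)
        if total == 0:
            raise ValueError("cannot condition on a zero-probability event")
        return {s: (p / total if s in event else 0.0) for s, p in beliefs.items()}

    prior = {"rain": 0.3, "sun": 0.6, "snow": 0.1}
    print(bayes_update(prior, {"rain", "snow"}))   # {'rain': 0.75, 'sun': 0.0, 'snow': 0.25}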
In game theory, it is standard to begin analyses with the assumption that players are
Bayesian rational.
Foundations for subjective expected utility preferences are obtained from state-space models
of uncertainty. These models begin with a set of possible states whose probabilities are
not given, and consider preferences over maps from states to outcomes. Savage (1954)
provides an axiomatization of subjective expected utility preferences in this framework.
Both the utility function and the assignment of probabilities to states are determined as
part of the representation. Anscombe and Aumann (1963) consider a state-space model in
which preferences are not over state-contingent outcomes, but over maps from states to
lotteries à la von Neumann-Morgenstern. This formulation allows for a much simpler
derivation of subjective expected utility preferences, and fits very naturally into game-
theoretic models. See Gilboa (2009) for a textbook treatment of these and more general
models of decision under uncertainty.
We study some basic varieties of games and the connections among them.
1.1 Basic Concepts
Example 1.1.

         2
        c      d
1   C  5, 5   0, 6
    D  6, 0   3, 3
This game is an instance of a Prisoner’s Dilemma, and the traditional names for the strategies
are Cooperate and Defect; see the remarks following Example 1.9. ♦
1.1.1 Definition
If each player chooses some si ∈ Si, the strategy profile is s = (s1, . . . , sn) and player j’s
payoff is uj(s).
In our description of a game above, players each choose a particular pure strategy si ∈ Si .
But it is often worth considering the possibility that each player makes a randomized
choice.
Mixed strategies and mixed strategy profiles
If A is a finite set, then we let ∆A represent the set of probability distributions over A: that
is, ∆A = {p : A → R+ | Σ_{a∈A} p(a) = 1}.
Then σi ∈ ∆Si is a mixed strategy for i, while σ = (σ1, . . . , σn) ∈ ∏_{i∈P} ∆Si is a mixed strategy
profile.
Under a mixed strategy profile, players are assumed to randomize independently: for
instance, learning that 1 played C provides no information about what 2 did. In other
words, the distribution on the set S of pure strategy profiles created by σ is a product
distribution.
Example 1.2. Battle of the Sexes.
         2
        a      b
1   A  3, 1   0, 0
    B  0, 0   1, 3

Suppose that 1 plays A with probability 3/4, and 2 plays a with probability 1/4. Then

σ = (σ1, σ2) = ((σ1(A), σ1(B)), (σ2(a), σ2(b))) = ((3/4, 1/4), (1/4, 3/4)).

The pure strategy profile (A, a) is played with probability σ1(A) · σ2(a) = 3/4 · 1/4 = 3/16. The
complete product distribution is presented in the matrix below.

              2
           a (1/4)   b (3/4)
1  A (3/4)   3/16      9/16
   B (1/4)   1/16      3/16
♦
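Because the players randomize independently, the matrix above is an outer product of the two mixed strategies. A small sketch (the dict encodings are ours), which also verifies the product test discussed in Example 1.5 below:

    from fractions import Fraction

    def product_distribution(sigma1, sigma2):
        # Independent randomization: multiply marginal probabilities.
        return {(s1, s2): p1 * p2
                for s1, p1 in sigma1.items() for s2, p2 in sigma2.items()}

    sigma1 = {"A": Fraction(3, 4), "B": Fraction(1, 4)}
    sigma2 = {"a": Fraction(1, 4), "b": Fraction(3, 4)}
    rho = product_distribution(sigma1, sigma2)
    print(rho[("A", "a")])    # 3/16
    # rho(Aa) = (rho(Aa) + rho(Ab)) * (rho(Aa) + rho(Ba)) for product distributions:
    assert rho[("A","a")] == (rho[("A","a")] + rho[("A","b")]) * (rho[("A","a")] + rho[("B","a")])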
When player i has two strategies, his set of mixed strategies ∆Si is the simplex in R2 , which
is an interval.
When player i has three strategies, his set of mixed strategies ∆Si is the simplex in R3 ,
which is a triangle.
When player i has four strategies, his set of mixed strategies ∆Si is the simplex in R4 ,
which is a pyramid.
Correlated strategies
It is sometimes useful to introduce the possibility that agents can randomize in a correlated
way. This is possible if each player observes a signal that is correlated with the signals
observed by others, and conditions his action choice on the signal he observes. Such signals
can be introduced artificially, but they may also appear naturally in the environment where
the players play the game—see Sections 1.5.2 and 4.5.2.
Consider, for example, a correlated strategy that puts probability 1/2 on each of the
profiles (A, a) and (B, b):

        2
       a      b
1  A  1/2     0
   B   0     1/2
This behavior cannot be achieved using a mixed strategy profile, since it requires correla-
tion: any mixed strategy profile putting weight on (A, a) and (B, b) would also put weight
on (A, b) and (B, a):
                     2
                 a (y > 0)     b (1−y > 0)
1  A (x > 0)        xy           x(1−y)
   B (1−x > 0)    (1−x)y       (1−x)(1−y)
Example 1.4. Suppose that P = {1, 2, 3} and Si = {1, . . . , ki}. Then a mixed strategy profile
σ = (σ1, σ2, σ3) ∈ ∏_{i∈P} ∆Si consists of three probability vectors of lengths k1, k2, and k3,
while a correlated strategy ρ ∈ ∆(∏_{i∈P} Si) is a single probability vector of length k1 · k2 · k3.
♦
We write “⊂” because the sets on each side are subsets of different spaces. A more precise
statement is that the set of correlated strategies generated by mixed strategy profiles σ via
ρσ(s) = ∏_{i∈P} σi(si)
is strictly contained in the set of all correlated strategies. These two points are illustrated
in the next example.
Example 1.5. If S1 = {A, B} and S2 = {a, b}, then the set of mixed strategy profiles ∆{A, B} ×
∆{a, b} is the product of two intervals, and hence a square. The set of correlated strategies
∆({A, B} × {a, b}) is a pyramid.
[Figure: the square ∆{A, B} × ∆{a, b}, and the pyramid ∆({A, B} × {a, b}) with vertices Aa, Ab, Ba, Bb.]
The set of correlated strategies that correspond to mixed strategy profiles—in other words,
the product distributions on {A, B} × {a, b}—forms a surface in the pyramid. Specifically,
this surface consists of the correlated strategies satisfying ρ(Aa) = (ρ(Aa) + ρ(Ab)) · (ρ(Aa) +
ρ(Ba)). (Showing that this equation characterizes the product distributions is a good
exercise).
1.1.3 Conjectures
One can divide traditional game-theoretic analyses into two classes: equilibrium and
non-equilibrium. In equilibrium analyses (e.g., using Nash equilibrium), one assumes that
players correctly anticipate how opponents will act. In this case, a Bayesian rational player
will maximize his expected utility with respect to a correct prediction about opponents’
strategies. In nonequilibrium analyses (e.g., dominance arguments) correct predictions
are not assumed. Instead, Bayesian rationality requires players to form probabilistic
conjectures about how their opponents will act, and to maximize their expected payoffs
given their conjectures.
In a two-player game, a full conjecture of player i is a probability distribution νi over player
j’s mixed strategies. We can write this loosely as νi ∈ ∆(∆S j ). (This is loose because ∆S j
is an infinite set.) Player i’s conjecture is thus a “compound probability distribution”
over S j : he assigns probabilities to mixed strategies of player j, which themselves assign
probabilities to pure strategies in S j .
Since player i is an expected utility maximizer, he only cares about the “reduced proba-
bilities” over S j . These are represented by a reduced conjecture µi ∈ ∆S j . When νi has finite
support, the reduced conjecture µi it generates is
µi(sj) = Σ_{σj} νi(σj) σj(sj).
(An infinite support would require an integral.) Notice that a reduced conjecture is the
same kind of object as a mixed strategy of player j: both are probability measures on S j .
We call two full conjectures equivalent if they generate the same reduced conjecture.
Example 1.6. Let S2 = {L, R}, let σ2 = (σ2(L), σ2(R)) = (1/2, 1/2), let σ̂2 = (1/3, 2/3), and suppose that
player 1’s full conjecture is ν1(σ2) = 1/4, ν1(σ̂2) = 3/4. The corresponding reduced conjecture
is

µ1(L) = 1/4 · 1/2 + 3/4 · 1/3 = 3/8,   µ1(R) = 1/4 · 1/2 + 3/4 · 2/3 = 5/8.

This reduced conjecture is also generated by other full conjectures: by ν̃1, which puts
probability 3/8 on the pure strategy L and probability 5/8 on R, and by ν̄1, which puts
probability 1 on the mixed strategy σ̄2 = (3/8, 5/8).
Under ν̃1, player 1 isn’t sure of which pure strategy player 2 is playing, but he is sure that
the strategy is pure. Under ν̄1, player 1 is certain that player 2 will play mixed strategy σ̄2.
The final probabilities on L and R are the same in both cases. (Aside: The specification of
ν̃1 above is rigorous, but we would usually just write ν̃1(L) = 3/8 and ν̃1(R) = 5/8.) ♦
In our nonequilibrium analyses in Sections 1.2 and 1.3, it will be easier to work directly
with reduced conjectures, which we will call conjectures for short. An exception will occur
in Example 1.16.
The only novelty that arises in games with three or more players is that player i’s conjecture
may reflect correlation in opponents’ choices—for instance, through the observation of
signals that player i himself does not see. Here a (reduced) conjecture µi ∈ ∆(∏_{j≠i} Sj) is
a probability measure on pure strategy profiles of i’s opponents. It is thus the same
sort of object as a correlated strategy among i’s opponents. See Section 1.3.3 for further
discussion.
In all cases, we assume that a player’s own randomization is independent of the conjec-
tured behavior of his opponents, in the sense that joint probabilities are obtained by taking
products—see (4) below.
Suppose σ = (σ1, σ2) = ((σ1(A), σ1(B)), (σ2(a), σ2(b))) = ((3/4, 1/4), (1/4, 3/4)) is played. Then

u1(σ) = 3 · 3/16 + 0 · 9/16 + 0 · 1/16 + 1 · 3/16 = 3/4,
u2(σ) = 1 · 3/16 + 0 · 9/16 + 0 · 1/16 + 3 · 3/16 = 3/4. ♦
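These expectations extend to any finite number of players; a minimal sketch (the dict-based encoding is ours):

    from fractions import Fraction
    from itertools import product

    def expected_utility(u, sigmas):
        # u maps pure strategy profiles to payoffs; sigmas is one dict per player.
        total = 0
        for profile in product(*(list(s) for s in sigmas)):
            prob = 1
            for s_i, sigma_i in zip(profile, sigmas):
                prob *= sigma_i[s_i]
            total += prob * u[profile]
        return total

    u1 = {("A","a"): 3, ("A","b"): 0, ("B","a"): 0, ("B","b"): 1}
    sigma1 = {"A": Fraction(3, 4), "B": Fraction(1, 4)}
    sigma2 = {"a": Fraction(1, 4), "b": Fraction(3, 4)}
    print(expected_utility(u1, [sigma1, sigma2]))   # 3/4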
In nonequilibrium analyses, player i only has a conjecture µi ∈ ∆(∏_{j≠i} Sj) about his
opponents’ behavior. If player i plays mixed strategy σi ∈ ∆Si, his expected utility is

(4) ui(σi, µi) = Σ_{s∈S} ui(s) · σi(si) µi(s−i).
Here we are using the independence between own randomization and conjectured oppo-
nents’ behavior assumed at the end of the previous section.
There is a standard abuse of notation here. In (2) ui acts on correlated strategies (so that
ui : ∆(∏_{j∈P} Sj) → R), in (3) ui acts on mixed strategy profiles (so that ui : ∏_{j∈P} ∆Sj → R),
and in (4) ui acts on mixed strategy/conjecture pairs (so that ui : ∆Si × ∆(∏_{j≠i} Sj) → R).
Sometimes we even combine mixed strategies with pure strategies, as in ui (si , σ−i ). In the
end we are always taking the expectation of ui (s) over the relevant distribution on pure
strategy profiles s, so there is really no room for confusion.
Suppose we are given some normal form game G. How should we expect Bayesian ra-
tional players (i.e., players who form conjectures about opponents’ strategies and choose
optimally given their conjectures) playing G to behave? We consider a sequence of in-
creasingly restrictive methods for analyzing normal form games. We start by considering
the implications of Bayesian rationality and of common knowledge of rationality. After
this, we introduce equilibrium assumptions.
We always assume that the structure and payoffs of the game are common knowledge: that
everyone knows these things, that everyone knows that everyone knows them, and so on.
Notation:
G = {P, {Si}i∈P, {ui}i∈P}    a normal form game
s−i ∈ S−i = ∏_{j≠i} Sj    a profile of pure strategies for i’s opponents
µi ∈ ∆S−i    i’s (reduced) conjecture about his opponents’ strategies
“Dominance” concerns strategies whose performance is good (or bad) regardless of how
opponents behave.
Strategy σi ∈ ∆Si is strictly dominant if

(5) ui(σi, µi) > ui(σ′i, µi) for all σ′i ≠ σi and all µi ∈ ∆S−i.
That is, strategy σi is strictly dominant if it maximizes i’s expected payoffs regardless of
his conjecture about his opponents’ strategies.
While condition (5) directly concerns Bayesian rationality, the following condition is
easier to check: (i) only a pure strategy can be strictly dominant, and (ii) a pure strategy
si ∈ Si is strictly dominant if and only if

(6) ui(si, s−i) > ui(s′i, s−i) for all s′i ≠ si and s−i ∈ S−i.
In words, (ii) says that player i prefers si to any other pure strategy s′i for any pure strategy
profile of his opponents.
Proof. (i) The payoff to a mixed strategy is the weighted average of the payoffs of the
pure strategies it uses. Thus against any fixed conjecture µi , the mixed strategy cannot do
strictly better than all pure strategies. Thus the mixed strategy cannot be strictly dominant.
In detail: for any conjecture µi ∈ ∆(S−i), we have

ui(σi, µi) = Σ_{si∈Si} ui(si, µi) σi(si),  where ui(si, µi) = Σ_{s−i∈S−i} ui(si, s−i) µi(s−i).
It is thus impossible that ui (σi , µi ) > ui (si , µi ) for all si ∈ Si , so condition (5) cannot hold.
(ii) (5) ⇒ (6) is immediate: consider the conjecture µi with µi (s−i ) = 1.
(6) ⇒ (5) holds because the inequality in (5) is a weighted average of those in (6). In more
detail: By part (i), the dominating strategy must be pure. To check (5) when the alternate
strategy is also pure, observe that for any fixed µi ,
ui(si, µi) = Σ_{s−i∈S−i} ui(si, s−i) µi(s−i) > Σ_{s−i∈S−i} ui(s′i, s−i) µi(s−i) = ui(s′i, µi),
where the inequality follows from (6). Checking (5) when i’s alternate strategy is mixed
requires an additional step; this is left as an exercise.
Example 1.9.

         2
        c      d
1   C  5, 5   0, 6
    D  6, 0   3, 3
Joint payoffs are maximized if both players chill. But regardless of what player 2 does,
player 1 is better off dashing. The same is true for player 2. In other words, D and d are
strictly dominant strategies. ♦
Remarks
(i) The game above is an instance of a Prisoner’s Dilemma: a 2 × 2 symmetric normal
form game in which the strategy profile that maximizes total payoffs has both
players playing a strictly dominated strategy. An asymmetric version of this game
first appeared in an experiment run by Flood and Dresher in 1950; see Flood (1952).
A symmetric version of the game, and the famous backstory of prisoners being
interrogated in separate rooms, is due to Tucker; see Luce and Raiffa (1957).
(ii) The entries in the payoff bimatrix are the players’ Bernoulli utilities. If the game is
supposed to represent the game show from Example 1.1, then having these entries
correspond to the dollar amounts in the story is tantamount to assuming that (i)
each player is risk neutral, and (ii) each player cares only about his own dollar
payoffs. If other considerations are important—for instance, if the two players are
friends and care about each other’s fates—then the payoff matrix would need to be
changed to reflect this, and the analysis would differ correspondingly; see Gilboa
(2010, Section 7.1) for an illuminating discussion.
To summarize, the analysis above tells us only that if each player is rational and
cares only about his dollar payoffs, then we should expect to see (D, d).
Most games do not have strictly dominant strategies. How can we get more mileage out
of the notion of dominance?
Strategy σ′i ∈ ∆Si is strictly dominated by strategy σi ∈ ∆Si if ui(σi, µi) > ui(σ′i, µi) for all
µi ∈ ∆S−i.
(i) σ′i is strictly dominated by σi if and only if
(7) ui(σi, s−i) > ui(σ′i, s−i) for all s−i ∈ S−i.
For instance, mixtures of mixed strategies are again mixed strategies:

(2/3)M + (1/3)((1/2)T + (1/2)M) = (1/6)T + (5/6)M.

In the game below, none of player 1’s pure strategies is strictly dominated:

            2
          L      R
    T   3, −   0, −
1   M   0, −   3, −
    C   2, −   2, −

But any mixed strategy σ1 with both T and M in its support is strictly dominated: if σ1 has
σ1(T), σ1(M) ≥ p > 0, then by reducing σ1(T) and σ1(M) by p and increasing σ1(C) by 2p, we
can increase expected payoffs by 2p · (2 − 3/2) = p against both of player 2’s pure strategies.
Some games without a strictly dominant strategy can still be solved using the idea of
dominance.
Example 1.10. Consider the two-player normal form game G with players A (Al) and B
(Bess), strategy sets SA = SB = {1, 2, 3} (representing effort levels), and payoff functions
uA(sA, sB) = (min{sA, sB})² − |sA − sB|,
uB(sA, sB) = (sA)² − sB if (sA, sB) ≠ (3, 1), and uB(sA, sB) = −sB if (sA, sB) = (3, 1).
In words, Al receives the square of the lower strategy chosen and pays a cost equal to the
distance between the two strategies. Bess pays sB , and she receives the square of sA unless
she chooses the minimum effort and Al chooses the maximum. This game’s payoff matrix
is presented below.
              B
          1        2        3
     1   1, 0    0, −1   −1, −2
A    2   0, 3    4, 2     3, 1
     3  −1, −1   3, 7     9, 6
Analysis: Al does not have a dominated pure strategy. Bess has one dominated pure
strategy: sB = 3 is dominated by strategy sB = 2. Thus if Bess is rational she will not play
sB = 3.
If Al knows that Bess is rational, he knows that she will not play sB = 3. So if Al is rational,
he won’t play sA = 3, which is strictly dominated by sA = 2 once sB = 3 is removed.
Now if Bess knows:
(i) that Al knows that Bess is rational
(ii) that Al is rational
then Bess knows that Al will not play sA = 3. Hence, since Bess is rational she will not
play sB = 2.
Continuing in a similar vein: Al will not play sA = 2.
Therefore, (sA , sB ) = (1, 1) solves the game by iterated strict dominance. ♦
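The elimination logic in this example is mechanical enough to automate. Here is a minimal two-player sketch that iteratively removes pure strategies strictly dominated by other pure strategies (a final check for mixed dominators, as in the procedure described below, is omitted; the payoff dicts are our own encoding):

    def iterated_strict_dominance(S1, S2, u1, u2):
        S1, S2 = list(S1), list(S2)
        changed = True
        while changed:
            changed = False
            for own, opp, u, flip in ((S1, S2, u1, False), (S2, S1, u2, True)):
                def payoff(si, sj):
                    return u[(sj, si)] if flip else u[(si, sj)]
                # A strategy is removed if some other surviving pure strategy
                # does strictly better against every surviving opponent profile.
                dominated = [s for s in own
                             if any(all(payoff(t, sj) > payoff(s, sj) for sj in opp)
                                    for t in own if t != s)]
                for s in dominated:
                    own.remove(s)
                    changed = True
        return S1, S2

    # Example 1.10:
    u_A = {(a, b): min(a, b) ** 2 - abs(a - b) for a in (1, 2, 3) for b in (1, 2, 3)}
    u_B = {(a, b): (-b if (a, b) == (3, 1) else a ** 2 - b)
           for a in (1, 2, 3) for b in (1, 2, 3)}
    print(iterated_strict_dominance([1, 2, 3], [1, 2, 3], u_A, u_B))   # ([1], [1])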
Iterated strict dominance is driven by common knowledge of rationality—by the assumption
that all statements of the form “i knows that j knows that . . . k is rational” are true—as
well as by common knowledge of the game itself.
To see which strategies survive iterated strict dominance it is enough to
(i) Iteratively remove all dominated pure strategies.
(ii) When no further pure strategies can be removed, check all remaining mixed strate-
gies.
The remarks from Section 1.2.2 provide the reasons that we don’t need to consider mixed
strategies explicitly until the end. During the iterated elimination, removing a pure
strategy implicitly removes the mixed strategies that use it. This is justified by remark
(iii), which says that if a pure strategy is dominated, so are all mixed strategies that use it.
Remark (i) says that to see which of a player’s strategies are dominated, we only need to
consider performance against opponents’ pure strategy profiles. Because of this, removing
opponents’ dominated mixed strategies at some stage of the iteration won’t affect which
strategies can be removed in the next stage. But remark (iv) says we need to look for
dominated mixed strategies at the end, because a mixed strategy can be dominated even
if none of the pure strategies it uses are. (And remark (iii) says that we need to consider
mixed strategies as possible dominating strategies.)
Proposition 1.11. The set of strategies that remains after iteratively removing strictly dominated
strategies does not depend on the order in which the dominated strategies are removed.
See Dufwenberg and Stegeman (2002) or Ritzberger (2002) for a proof.
Often, iterated strict dominance will eliminate a few strategies but not completely solve
the game.
Strategy σi ∈ ∆Si weakly dominates strategy σ′i ∈ ∆Si if

(8) ui(σi, µi) ≥ ui(σ′i, µi) for all µi ∈ ∆S−i, with strict inequality for some µi ∈ ∆S−i.

As with strict domination, it is enough to consider opponents’ pure strategy profiles: (8)
is equivalent to

(9) ui(σi, s−i) ≥ ui(σ′i, s−i) for all s−i ∈ S−i, with strict inequality for some s−i ∈ S−i.
Strategy σi ∈ ∆Si is weakly dominant if it weakly dominates all other strategies. As the
names suggest, all strictly dominant strategies are also weakly dominant. And as with
strict dominance, only pure strategies can be weakly dominant.
(We use the term very weak dominance when only the first requirement in (8) or (9) is
imposed. This notion is more commonly applied in Bayesian games (Section 4.2) and
especially mechanism design (Section 7.1.3).)
Example 1.12. Weakly dominated strategies are not ruled out by Bayesian rationality alone.
         2
        L      R
1   T  1, −   0, −
    B  0, −   0, −

Here B is weakly dominated by T, but B is still a best response to the conjecture that puts
probability 1 on R. ♦
While the use of weakly dominated strategies is not ruled out by Bayesian rationality
alone, the avoidance of such strategies is often taken as a first principle. In decision
theory, this principle is referred to as admissibility; see Kohlberg and Mertens (1986) for
discussion and historical comments. In game theory, admissibility is sometimes deduced
from the principle of cautiousness, which requires that players not view any opponents’
behavior as impossible; see Asheim (2006) for discussion.
It is natural to contemplate iteratively removing weakly dominated strategies. However,
iterated removal and cautiousness conflict with one another: removing a strategy means
viewing it as impossible, which contradicts cautiousness. See Samuelson (1992) for dis-
cussion and analysis. One consequence is that the order of removal of weakly dominated
strategies can matter—see Example 1.13 below. (For results on when order of removal
does not matter, see Marx and Swinkels (1997) and Østerdal (2005).) But versions of
iterated weak dominance can be placed on a secure epistemic footing (see Brandenburger
et al. (2008)), and moreover, iterated weak dominance is a powerful tool for analyzing
extensive form games (see Section 2.5.1).
Example 1.13.

         2
        L      R
    U  5, 1   4, 0
1   M  6, 0   3, 1
    D  6, 4   4, 4
In the game above, removing weakly dominated strategy U, then weakly dominated
strategy L, and then strictly dominated strategy M leads to prediction (D, R). But remov-
ing weakly dominated strategy M, then weakly dominated strategy R, and then strictly
dominated strategy U leads to the prediction (D, L). ♦
An intermediate solution concept between ISD and IWD is introduced by Dekel and Fu-
denberg (1990), who suggest one round of elimination of all weakly dominated strategies,
followed by iterated elimination of strictly dominated strategies. Since weak dominance
is not applied iteratively, the tensions described above do not arise. Strategies that survive
this Dekel-Fudenberg procedure are sometimes called permissible. See Section 2.6 for further
discussion.
1.3 Rationalizability
Iterated strict dominance may not exhaust the set of strategies that won’t be played by
Bayesian rational players under common knowledge of rationality. Bayesian rational
players not only avoid dominated strategies; they also avoid strategies that are not a best
response to any conjecture about opponents’ behavior. If we apply this idea iteratively,
we obtain the sets of rationalizable strategies for each player.
Strategy σi is a best response to (reduced) conjecture µi ∈ ∆S−i if ui(σi, µi) ≥ ui(σ′i, µi) for
all σ′i ∈ ∆Si; we write Bi(µi) for the set of best responses to µi.
Proposition 1.14. Strategy σi is a best response to µi if and only if every pure strategy si in the
support of σi is a best response to µi .
Proof. Rewrite player i’s expected utility from mixed strategy σi given conjecture µi as
ui(σi, µi) = Σ_{si∈Si} Σ_{s−i∈S−i} ui(si, s−i) σi(si) µi(s−i) = Σ_{si∈Si} ui(si, µi) σi(si).
The final expression is a weighted average of the expected utilities ui (si , µi ) of each pure
strategy si given conjecture µi . The probability distributions σi ∈ ∆Si that maximize this
weighted average are precisely those that place all mass on pure strategies that maximize
ui (si , µi ).
The rationalizable strategies (Bernheim (1984), Pearce (1984)) are those that remain after we
iteratively remove all strategies that are not best responses to any allowable conjectures.
Because it only requires common knowledge of rationality, rationalizability is a relatively
weak solution concept. Still, when rationalizability leads to many rounds of removal, it
can result in stark predictions.
Example 1.15. Guessing 3/4 of the average.
There are n players. Each player’s strategy set is Si = {0, 1, . . . , 100}.
The target integer is defined to be 3/4 of the average strategy chosen, rounded down.
All players choosing the target integer split a prize worth V > 0. If no one chooses the
target integer, the prize is not awarded.
Which pure strategies are rationalizable in this game?
To start, we assert that for any pure strategy profile s−i of his opponents, player i has a
response ri ∈ Si such that the target integer generated by (ri , s−i ) is ri . (Proving this is a
short but tricky exercise.) Thus for any conjecture µi about his opponents, player i can
obtain a positive expected payoff by playing a best response to some s−i in the support of
µi .
So: Since Si = {0, 1, . . . , 100},
⇒ The highest possible average is 100.
⇒ The highest possible target is ⌊(3/4) · 100⌋ = 75.
⇒ Strategies in {76, 77, . . . , 100} yield a payoff of 0.
⇒ Since player i has a strategy that earns a positive expected payoff given his conjec-
ture, strategies in {76, 77, . . . , 100} are not best responses.
Thus if players are rational, no player chooses a strategy above 75.
⇒ The highest possible average is 75.
⇒ The highest possible target is ⌊(3/4) · 75⌋ = ⌊56 1/4⌋ = 56.
⇒ Strategies in {57, . . . , 100} yield a payoff of 0.
⇒ Since player i has a strategy that earns a positive expected payoff given his conjec-
ture, strategies in {57, . . . , 100} are not best responses.
Thus if players are rational and know that others are rational, no player chooses a strategy
above 56.
Proceeding through the rounds of eliminating strategies that cannot be best responses, we
find that no player will choose a strategy higher than
75 . . . 56 . . . 42 . . . 31 . . . 23 . . . 17 . . . 12 . . . 9 . . . 6 . . . 4 . . . 3 . . . 2 . . . 1 . . . 0.
Thus, after 14 rounds of iteratively removing strategies that cannot be best responses, we
conclude that each player’s unique rationalizable strategy is 0. ♦
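The chain of bounds is a one-line recursion, sketched below (an illustration of the arithmetic only):

    # Each round, no surviving strategy exceeds 3/4 of the previous bound, rounded down.
    bound, rounds = 100, 0
    while bound > 0:
        bound = (3 * bound) // 4
        rounds += 1
        print(rounds, bound)    # 1 75, 2 56, 3 42, ..., 14 0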
Example 1.16. Consider the following game:

          2
        L      C      Q
    T  3, 3   0, 0   0, 2
1   M  0, 0   3, 3   0, 2
    D  2, 2   2, 2   2, 0
[Figure: the best response correspondences B1 : ∆S2 ⇒ ∆S1 and B2 : ∆S1 ⇒ ∆S2, drawn over the simplices ∆S2 and ∆S1. Everything is a best response for player 2; player 1’s best responses place weight on at most one of T and M.]
In the next round of eliminating strategies that are not best responses to any conjectures,
player 2 knows that player 1 will never play a mixture that uses both T and M. Can it be
a best response for her to play Q?
Yes. Player 2 knows that T, M, and D are all possible best responses for 1. Thus player
2 may hold any conjectures that put all of their weight on these pure strategies. One
possible conjecture for player 2 is µ2(T) = µ2(M) = 1/2, and against this conjecture, Q is the
unique best response.
Likewise, if 2’s conjecture is µ2(T) = µ2(M) = 2/5 and µ2(D) = 1/5, then all of her strategies
are best responses. Thus 2’s set of rationalizable strategies is ∆S2. Since none of 2’s
strategies are removed, 1’s set of rationalizable strategies is his set of best responses,
{σ1 ∈ ∆S1 : σ1(T) = 0 or σ1(M) = 0}.
To explain what is happening more carefully we need to bring back full conjectures (Section
1.1.3). The reduced conjecture µ2(T) = µ2(M) = 1/2 is generated by the full conjecture
ν2(1/2 T + 1/2 M) = 1. Since 1/2 T + 1/2 M is not a best response, this full conjecture is not allowed.
But µ2(T) = µ2(M) = 1/2 is also generated by the full conjecture ν2(T) = ν2(M) = 1/2. Since
T and M are both best responses, this full conjecture is allowed, which explains why the
reduced conjecture is allowed. ♦
When we compute the rationalizable strategies, we must account for each player’s uncer-
tainty about his opponent’s strategies. Thus, during each iteration we must leave in all of
his best responses to all conjectures over the surviving pure strategies, even conjectures
that correspond to mixed strategies that can be ruled out. We thus have the following
procedure to compute the players’ rationalizable strategies:
(i) Iteratively remove pure strategies that are never a best response to any allowable
conjecture.
(ii) When no further pure strategies can be removed, remove mixed strategies that are
never a best response to any allowable conjecture.
Formally, let Σ^0_i = ∆Si, and for k ≥ 1 let Σ^k_i = Bi(conv(Σ^{k−1}_j)), where conv(A) denotes
the convex hull of the set A (and j is i’s opponent). Then the set R∗_i of player i’s rationalizable
strategies is the intersection of the nested sets Σ^k_i:

(12) R∗_i = ∩_{k=0}^{∞} Σ^k_i.

In a finite game, this procedure terminates in a finite number of steps, so that R∗_i = Σ^k_i for
all large enough finite k.
In this definition, Σ^{k−1}_j ⊆ ∆(Sj) is the set of mixed strategies of player j that survive k − 1
rounds of removal. Then conv(Σ^{k−1}_j) ⊆ ∆(Sj) is the set of allowable conjectures of player i
used in round k. Notice how this step takes advantage of the fact that j’s mixed strategies
and i’s conjectures are both elements of ∆(Sj). Finally Σ^k_i = Bi(conv(Σ^{k−1}_j)) is the set of i’s
best responses to these allowable conjectures, and so is the set of i’s strategies that survive
k rounds of removal.
As in the computation procedure described above, we could replace the set of mixed
strategies Σ^k_j with the set of pure strategies it contains until the final round of removal;
both sets generate the same set of conjectures, or equivalently, both have the same convex
hull.
It is obvious that:
Observation 1.17. If σi is strictly dominated, then σi is not a best response to any conjecture.
In two-player games, the converse of this observation is also true: any strategy that is not
a best response to any conjecture is strictly dominated. (Below we explain how to obtain
this result as an application of the supporting hyperplane theorem.) We therefore have an
equivalence:
Theorem 1.18. In a two-player game, σi is strictly dominated if and only if σi is not a best response
to any conjecture.
Theorem 1.19. In a two-player game, a strategy satisfies iterated strict dominance if and only if
it is rationalizable.
Whether Theorems 1.18 and 1.19 hold in games with more players depends on how we
define rationalizability in these games—see Section 1.3.3.
Background on hyperplanes
A hyperplane in Rn is a set of the form Hp,c = {x ∈ Rn : p · x = c}, where the normal vector
p ∈ Rn is nonzero. If we interpret each x ∈ Hp,0 as a vector (namely, the vector from the origin
to the point x), then Hp,0 is the set of vectors that are orthogonal to p. Being able to interpret
elements of Rn both as points and as vectors is key for understanding what follows.
Example 1.20. In R2, a subspace of dimension 1 is a line through the origin. For instance,
the line x2 = ax1 defines a subspace. Since we can rewrite this equation as (−a, 1) · x = 0, it
is the subspace Hp,0 with normal vector p = (−a, 1). Below is the case in which a = −1/2, so
that p = (1/2, 1). ♦
[Figure: the line x2 = −(1/2)x1, i.e. the hyperplane p · x = 0 with p = (1/2, 1), together with its parallel translates x2 = −(1/2)x1 + c, i.e. the hyperplanes p · x = c for c = 2, 4.]
When the normal vector p is drawn with its tail at the origin, it points towards hyperplanes
Hp,c with c > 0. This follows from the law of cosines: p · x = ||p|| ||x|| cos θ > 0 where θ is
the angle between vectors p and x; thus p · x > 0 when θ is acute.
A set of the form {x ∈ Rn : p · x ≥ c} is called a half space; it contains the points on the side
of the hyperplane Hp,c toward which p is pointing.
Let A and B be convex sets whose interiors do not intersect. The separating hyperplane
theorem provides a nonzero normal vector p ∈ Rn such that p · x ≤ p · x̂ for all x ∈ A and
x̂ ∈ B. Let c satisfy p · x ≤ c ≤ p · x̂ for all x ∈ A and x̂ ∈ B. Then Hp,c is a hyperplane that
separates A and B. Any y in bd(A) ∩ bd(B) must satisfy p · y = c.
[Figure: a hyperplane p · y = c separating a convex set A (on the side where p · x < c) from a convex set B (on the side where p · x̂ > c), with y a point in bd(A) ∩ bd(B).]
For proofs, discussion, examples, etc. see Hiriart-Urruty and Lemaréchal (2001).
The (⇒) direction is immediate. The (⇐) direction follows from the supporting hyperplane
theorem. We illustrate the proof using an example.
Example 1.24. Our goal is to show that in the two-player game below, [σi ∈ ∆Si is not
strictly dominated] implies that [σi is a best response to some µi ∈ ∆S−i ].
         2
        L      R
    A  2, −   5, −
    B  6, −   3, −
1   C  7, −   1, −
    D  3, −   2, −
Let v1 (σ1 ) = (u1 (σ1 , L), u1 (σ1 , R)) be the vector payoff induced by σ1 . Note that u1 (σ1 , µ1 ) =
µ1 · v1 (σ1 ).
Let V1 = {v1 (σ1 ) : σ1 ∈ ∆S1 } be the set of such vector payoffs. Equivalently, V1 is the convex
hull of the vector payoffs to player 1’s pure strategies. It is closed and convex.
Now σ1 ∈ ∆S1 is not strictly dominated if and only if v1(σ1) lies on the northeast boundary
of V1. For example, σ̂1 = 1/2 A + 1/2 B is not strictly dominated, with v1(σ̂1) = (4, 4). We want
to show that σ̂1 is a best response to some µ̂1 ∈ ∆S2.
[Figure: the set V1, the convex hull of v1(A) = (2, 5), v1(B) = (6, 3), v1(C) = (7, 1), and v1(D) = (3, 2). The point v1(σ̂1) = v1(1/2 A + 1/2 B) = (4, 4) lies on the line µ̂1 · w1 = 4 with µ̂1 = (1/3, 2/3), and the rest of V1 satisfies µ̂1 · w1 ≤ 4.]
A general principle: when you are given a point on the boundary of a convex set, the
normal vector at that point often reveals something interesting.
The point v1(σ̂1) lies on the hyperplane µ̂1 · w1 = 4, where µ̂1 = (1/3, 2/3).
This is a supporting hyperplane for the set V1, which lies in the half space µ̂1 · w1 ≤ 4.
Put differently,

µ̂1 · w1 ≤ µ̂1 · v1(σ̂1) for all w1 ∈ V1
⇒ µ̂1 · v1(σ1) ≤ µ̂1 · v1(σ̂1) for all σ1 ∈ ∆S1 (by the definition of V1)
⇒ u1(σ1, µ̂1) ≤ u1(σ̂1, µ̂1) for all σ1 ∈ ∆S1,

so σ̂1 is a best response to µ̂1. The same argument shows that every mixture of A and B is
a best response to µ̂1.
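A quick numerical check of this argument (encoding the vector payoffs as tuples):

    from fractions import Fraction

    v = {"A": (2, 5), "B": (6, 3), "C": (7, 1), "D": (3, 2)}
    mu_hat = (Fraction(1, 3), Fraction(2, 3))

    def dot(p, w):
        return p[0] * w[0] + p[1] * w[1]

    # Every vertex of V1 lies weakly below the hyperplane mu_hat . w = 4 ...
    assert all(dot(mu_hat, w) <= 4 for w in v.values())
    # ... while A and B (hence every mixture of them, including sigma_hat) attain 4,
    # so sigma_hat is a best response to mu_hat.
    assert dot(mu_hat, v["A"]) == dot(mu_hat, v["B"]) == 4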
We can repeat this argument for all mixed strategies of player 1 corresponding to points
on the northeast frontier of V1 , as in the figure below at left. The figure below at right
presents player 1’s best response correspondence, drawn beneath graphs of his pure
strategy payoff functions. Both figures link player 1’s conjectures and best responses: in
the left figure, player 1’s conjectures are the normal vectors, while in the right figure,
player 1’s conjectures correspond to different horizontal coordinates.
[Figure: at left, the set V1 with normal vectors µ̂1 = (1/3, 2/3) and µ̂1 = (2/3, 1/3) at points of its northeast frontier; at right, player 1’s pure strategy payoffs u1(A, µ1), u1(B, µ1), u1(C, µ1), u1(D, µ1) as µ1 ranges from L to R, with the best response switching from C to B at (2/3)L + (1/3)R and from B to A at (1/3)L + (2/3)R.] ♦
There are two definitions of rationalizability for games with three or more players. The
commonly used definition, called correlated rationalizability, allows for correlation in player
i’s conjecture about his opponents’ choices. This is how we defined conjectures in games
with three or more players in Section 1.1.3. Player i might hold a correlated conjecture
if he thought his opponents could coordinate their choices by observing a (not explicitly
modeled) signal that i himself does not observe. (This argument is related to the revelation
principle for normal form games; see Section 1.5.2 and especially Section 4.5.2.)
With correlated conjectures, the supporting hyperplane argument from the two-player
case goes through essentially unchanged, so we recover the earlier equivalences:
Theorem 1.25. In a finite-player game, σi is strictly dominated if and only if σi is not a best
response to any correlated conjecture.
Theorem 1.26. In a finite-player game, a strategy satisfies iterated strict dominance if and only if
it is correlated rationalizable.
The original definition of rationalizability from Bernheim (1984) and Pearce (1984) is
now called independent rationalizability. This definition assumes that a player separately
forms conjectures about each of his opponents’ mixed strategies, and that his conjectures
about different opponents’ choices are independent of one another, in that the induced
correlated conjecture is a product measure. With such independent conjectures, Theorem
1.25 fails: there are three-player games with strategies that are not a best response to any
independent conjectures but are not strictly dominated (Example 1.27). The reason the
separating hyperplane argument does not work is that with three or more players, the
normal vectors to the analogue of the set V1 at a “northeast” boundary point are correlated
conjectures, and none of these may correspond to an independent conjecture.
Example 1.27. Consider the following three-player game in which only player 3’s payoffs
are shown.
Strategy B is not strictly dominated, since a dominating mixture of A and C would need
to put probability at least 3/4 on A (in case 1 and 2 play (T, L)) and probability at least 3/4
on C (in case 1 and 2 play (B, R)), which is impossible. If player 3’s conjectures about
player 1’s choices and player 2’s choices are independent, B is not a best response:
Independence implies that for some t, l ∈ [0, 1], we can write µ3(T, L) = tl, µ3(T, R) = t(1 − l),
µ3(B, L) = (1 − t)l, and µ3(B, R) = (1 − t)(1 − l). A direct computation then shows that against
any such conjecture, either A or C yields a strictly higher expected payoff than B. ♦
1.3.4 A positive characterization of rationalizability
Let Ri ⊆ Si for each i ∈ P. Say that the sets R1, . . . , Rn satisfy condition (∗) if for each i ∈ P
and each si ∈ Ri, there is a conjecture µi ∈ ∆S−i such that
(a) µi(R−i) = 1, and
(b) si ∈ Bi(µi).
Theorem 1.28.
(i) If sets R1 , . . . , Rn satisfy condition (∗), then the strategies in these sets are rationalizable.
(ii) The sets R∗1 , . . . , R∗n of rationalizable pure strategies satisfy condition (∗).
Proof. For part (i), note that condition (∗) implies that strategies in the sets Ri cannot be
removed during the first round of elimination, and thus cannot be removed in the second
round, and thus not in the third either, and so on.
For part (ii), let R∗1, . . . , R∗n be the sets of rationalizable pure strategies. Since these sets
are obtained after a finite number of rounds of elimination, each si ∈ R∗i must be a best
response to a conjecture concentrated on R∗−i; otherwise it would have been removed
before the iteration terminated.
The following is the special case of Theorem 1.28(i) obtained when all conjectures µi are
concentrated on a single pure strategy profile.
Corollary 1.29. Let Ri ⊆ Si for all i ∈ P. Suppose that for each i ∈ P and each si ∈ Ri there is an
s−i ∈ R−i such that si ∈ Bi(s−i). Then the strategies in each set Ri are rationalizable.
Example 1.30. Consider the following game:
         2
        x      y      z
    X  3, 3   5, 2   5, 2
1   Y  2, 5   7, 0   0, 7
    Z  2, 5   0, 7   7, 0
We show that all pure strategies are rationalizable by applying Corollary 1.29.
First consider (R1 , R2 ) = ({X}, {x}). Since X ∈ B1 (x) and x ∈ B2 (X), these strategies are
rationalizable. Since each strategy is a best response to the opponent’s strategy, (X, x) is a
Nash equilibrium.
Now consider (R1 , R2 ) = ({Y, Z}, {y, z}). Since Y ∈ B1 (y), Z ∈ B1 (z), y ∈ B2 (Z), and z ∈
B2 (Y), these strategies are also rationalizable. Here no strategy profile consists of mutual
best responses; instead, the four strategy profiles form a best response cycle. Clearly,
the rationalizability of these strategies relies on the fact that correct conjectures are not
required. ♦
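The conditions invoked in this example are easy to verify mechanically; a small sketch (the payoff dicts are our own encoding):

    u1 = {("X","x"): 3, ("X","y"): 5, ("X","z"): 5, ("Y","x"): 2, ("Y","y"): 7,
          ("Y","z"): 0, ("Z","x"): 2, ("Z","y"): 0, ("Z","z"): 7}
    u2 = {("X","x"): 3, ("X","y"): 2, ("X","z"): 2, ("Y","x"): 5, ("Y","y"): 0,
          ("Y","z"): 7, ("Z","x"): 5, ("Z","y"): 7, ("Z","z"): 0}

    def br1(s2):   # player 1's pure best responses to pure s2
        m = max(u1[(s1, s2)] for s1 in "XYZ")
        return {s1 for s1 in "XYZ" if u1[(s1, s2)] == m}

    def br2(s1):   # player 2's pure best responses to pure s1
        m = max(u2[(s1, s2)] for s2 in "xyz")
        return {s2 for s2 in "xyz" if u2[(s1, s2)] == m}

    assert "X" in br1("x") and "x" in br2("X")    # mutual best responses: Nash
    # The best response cycle supporting {Y, Z} and {y, z}:
    assert "Y" in br1("y") and "z" in br2("Y") and "Z" in br1("z") and "y" in br2("Z")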
With some additional notation, we can restate (*) from Theorem 1.28 concisely as a (set-
valued) fixed point condition. For each set R−i , let
Bi(R−i) = ∪_{µi : µi(R−i)=1} Bi(µi)
be the set of strategies of player i that are best responses to some conjecture that is
concentrated on opponents’ strategy profiles in R−i. Then condition (∗) can be restated as
the requirement that

(13) Ri ⊆ Bi(R−i) for all i ∈ P.
1.4 Nash Equilibrium

1.4.1 Definition
We first define Nash equilibrium in a way that is indirect (and abuses notation) but
connects Nash equilibrium with rationalizabililty (cf. Theorem 1.28). Strategy profile
σ ∈ Σ is a Nash equilibrium supported by full conjectures {νi }i∈P if for all i ∈ P ,
(a) νi (σ−i ) = 1, and
(b) σi ∈ Bi (νi ).
In words, (a) says that player i has correct beliefs—he assigns probability 1 to the mixed
strategy profile his opponents will play; (b) says that player i’s strategy is optimal given
these beliefs.
Now that we have made our point about correct conjectures, we can define Nash equilibrium
directly, without reference to conjectures. To reduce the amount of notation, let
Σi = ∆Si denote player i’s set of mixed strategies; also let Σ = ∏_{j∈P} ∆Sj and Σ−i = ∏_{j≠i} ∆Sj.
Finally, recycling notation, let Bi : Σ−i ⇒ Σi be player i’s best response correspondence
defined over opponents’ mixed strategy profiles. Strategy profile σ ∈ Σ is a Nash equilibrium
if for all i ∈ P,

(14) σi ∈ Bi(σ−i).
In words: each player plays a best response to the strategy profile of his opponents.
(Compare this to characterization (13) of rationalizability.)
Example 1.31. Good Restaurant, Bad Restaurant.
         2
        g      b
1   G  2, 2   0, 0
    B  0, 0   1, 1

Everything is rationalizable.
The Nash equilibria are: (G, g), (B, b), and (1/3 G + 2/3 B, 1/3 g + 2/3 b).
Checking the mixed equilibrium:

u2(1/3 G + 2/3 B, g) = 1/3 · 2 + 2/3 · 0 = 2/3,
u2(1/3 G + 2/3 B, b) = 1/3 · 0 + 2/3 · 1 = 2/3. ♦
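In a 2 × 2 game, an interior mixed equilibrium can be computed directly from the opponent's indifference condition; a minimal sketch (the encoding is ours):

    from fractions import Fraction

    # Player 1 mixes p on G so that player 2 is indifferent between g and b:
    # p*u2(G,g) + (1-p)*u2(B,g) = p*u2(G,b) + (1-p)*u2(B,b).
    u2 = {("G","g"): 2, ("G","b"): 0, ("B","g"): 0, ("B","b"): 1}
    a, b = u2[("G","g")], u2[("B","g")]
    c, d = u2[("G","b")], u2[("B","b")]
    p = Fraction(d - b, (a - b) - (c - d))
    print(p)   # 1/3, so player 1 plays (1/3)G + (2/3)B; player 2's mix is symmetric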
We can refine our prediction by applying the notion of strict equilibrium. s∗ is a strict
equilibrium if for each i, s∗i is the unique best response to s∗−i. That is, Bi(s∗−i) = {s∗i} for all i.
Strict equilibria seem especially compelling.
But strict equilibria do not exist in all games (unlike Nash equilibria: see Section 1.4.3).
In the previous example, the Nash equilibrium (G, g) maximizes both players’ payoffs.
One might be tempted to say that a Nash equilibrium with this property is always the one
to focus on. But this criterion is not always compelling:
Example 1.32. Joint investment.
Each player can make a safe investment that pays 8 for sure, or a risky investment that
pays 9 if the other player joins in the investment and 0 otherwise.
         2
        r      s
1   R  9, 9   0, 8
    S  8, 0   8, 8
The Nash equilibria here are (R, r), (S, s), and (8/9 R + 1/9 S, 8/9 r + 1/9 s). Although (R, r) yields both
players the highest payoff, each player might be tempted by the sure payoff of 8 that the
safe investment guarantees. ♦
Finding the pure strategy Nash equilibria of a normal form game is simple. For each
player i and each pure strategy profile s−i of i’s opponents, mark the payoff corresponding
to each of i’s best responses. The boxes whose payoffs are all marked correspond to the
pure Nash equilibria.
Example 1.33. A Cournot duopoly. Two firms, Acme and Tyrell, compete in quantities in
{0, 1, 2, 3, 4}. Neither has fixed costs. Acme has a marginal cost of 2 for each of its first 2
units, and a marginal cost of 4 for subsequent units. Tyrell has a marginal cost of 3 for
each of its first 2 units, and a marginal cost of 4 for subsequent units. When a total of Q
units are produced, the market price of the good is 10 − Q.
                       Tyrell
           0       1       2       3       4
      0   0, 0    0, 6    0, 10   0, 11   0, 10
      1   7, 0    6, 5    5, 8    4, 8    3, 6
Acme  2  12, 0   10, 4    8, 6    6, 5    4, 2
      3  13, 0   10, 3    7, 4    4, 2    1, −2
      4  12, 0    8, 2    4, 2    0, −1  −4, −6
The unique pure Nash equilibrium is (2, 2). One can verify that this is also the solution by
iterated strict dominance, and thus the unique rationalizable strategy profile. ♦
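The marking method is easily mechanized: compute each firm's pure best responses to each opponent quantity and intersect. A sketch using the cost structure of Example 1.33 (the function names are ours):

    Q = range(5)

    def cost_acme(q):
        return 2 * min(q, 2) + 4 * max(q - 2, 0)

    def cost_tyrell(q):
        return 3 * min(q, 2) + 4 * max(q - 2, 0)

    def u_acme(qa, qt):
        return (10 - qa - qt) * qa - cost_acme(qa)

    def u_tyrell(qa, qt):
        return (10 - qa - qt) * qt - cost_tyrell(qt)

    # "Mark" best responses, then intersect the marks.
    br_acme = {qt: {qa for qa in Q if u_acme(qa, qt) == max(u_acme(x, qt) for x in Q)}
               for qt in Q}
    br_tyrell = {qa: {qt for qt in Q if u_tyrell(qa, qt) == max(u_tyrell(qa, x) for x in Q)}
                 for qa in Q}
    print([(qa, qt) for qa in Q for qt in Q
           if qa in br_acme[qt] and qt in br_tyrell[qa]])   # [(2, 2)]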
Example 1.34. Consider again the game from Example 1.16:

          2
        L      C      Q
    T  3, 3   0, 0   0, 2
1   M  0, 0   3, 3   0, 2
    D  2, 2   2, 2   2, 0

This game has two pure Nash equilibria, (T, L) and (M, C). ♦
This method extends directly to games with continuous strategy sets—see Section 1.4.5.
To proceed further, it is helpful to derive explicit links between Nash equilibrium and
rationalizability.
Proposition 1.35. (i) Any pure strategy used with positive probability in a Nash equilibrium
is rationalizable.
(ii) If each player has a unique rationalizable strategy, the profile of these strategies is a Nash
equilibrium.
Proof. By Theorem 1.28, the strategies in Ri ⊆ Si are rationalizable if for each i ∈ P and
each si ∈ Ri , there is a conjecture µi ∈ ∆S−i such that
(a) the support of µi is contained in R−i , and
(b) si is a best response to µi .
To prove part (i) of the proposition, let σ = (σ1, . . . , σn) be a Nash equilibrium, and let Ri
be the support of σi. Each si ∈ Ri is a best response to σ−i (by Proposition 1.14), so (a) and
(b) hold with µi determined by σ−i (i.e., µi(s−i) = ∏_{j≠i} σj(sj)).
Clearly, if all players’ rationalizable strategies are unique, they are pure. To prove part (ii)
of the proposition, suppose that s = (s1, . . . , sn) is the unique rationalizable strategy profile.
Then (a) and (b) imply that si is a best response to s−i, and so s is a Nash equilibrium.
With Proposition 1.35 as a starting point, one can specify guidelines for computing all
mixed strategy Nash equilibria of a game:
(i) Eliminate pure strategies that are not rationalizable.
(ii) For each profile of supports, find all equilibria.
Step (ii) is sometimes called the support enumeration algorithm. Once the profile of supports
is fixed, one identifies all equilibria with this profile of supports by introducing the
optimality conditions implied by the supports. For a player to use a certain support,
the pure strategies in the support of a player’s equilibrium strategy receive the same
payoff, which is at least as high as payoffs for strategies outside the support. In this way,
each player’s optimality conditions restrict what the other players’ strategies may be. In
addition, restricting all but one player to small supports, especially to singleton supports,
imposes direct restrictions on the remaining player’s best responses—see the first two cases of
Example 1.36. To sum up, the support enumeration algorithm divides the analysis into a
set of exhaustive cases, each of which places additional structure on the problem.
Inevitably, this approach is computationally intensive: if player i has ki strategies, there
are ∏_{i∈P}(2^{ki} − 1) possible profiles of supports, and each can have multiple equilibria. (In
practice, one fixes the supports of only n − 1 players’ strategies, and restricts the nth
player’s strategy using the implied optimality conditions—see the examples below.)
Section 1.4.3 discusses the structure of the set of Nash equilibria and algorithms for
computing Nash equilibria in two-player games.
Example 1.36. Consider again the game from Example 1.16:

          2
        L      C      Q
    T  3, 3   0, 0   0, 2
1   M  0, 0   3, 3   0, 2
    D  2, 2   2, 2   2, 0
[Figure: the best response correspondences B1 : ∆S2 ⇒ ∆S1 and B2 : ∆S1 ⇒ ∆S2, repeated from Example 1.16.]
We showed in Example 1.16 that the players’ sets of rationalizable strategies are R1 = {σ1 ∈
∆S1 : σ1(T) = 0 or σ1(M) = 0} and R2 = ∆S2.
Q is rationalizable because it is a best response to the conjecture µ2(T) = µ2(M) = 1/2. But
since Q is not a best response to any σ1 ∈ R1, Q is never played in a Nash equilibrium.
The key point here is that in Nash equilibrium, player 2’s conjecture is correct (i.e., places
probability 1 on player 1’s actual strategy). Thus, we need not consider any support for σ2
that includes Q.
Three possible supports for σ2 remain:
{L} ⇒ 1’s BR is T ⇒ 2’s BR is L ∴ (T, L) is Nash
{C} ⇒ 1’s BR is M ⇒ 2’s BR is C ∴ (M, C) is Nash
{L, C} ⇒ Optimality for player 2 requires that (i) u2(σ1, L) = u2(σ1, C) and (ii) u2(σ1, C) ≥ u2(σ1, Q).
To determine the set of σ1 for which (i) and (ii) hold, look at the picture of B2, or compute:
writing σ1 = (t, m, d), we have u2(σ1, L) = 3t + 2d, u2(σ1, C) = 3m + 2d, and u2(σ1, Q) = 2t + 2m,
so (i) is equivalent to t = m, and given this, (ii) is equivalent to 2d ≥ t, i.e., t = m ≤ 2/5.
But the only strategy in B1(∆S2) = {σ1 ∈ ∆S1 : σ1(T) = 0 or σ1(M) = 0} that puts equal
weight on T and M is the pure strategy D. Player 1 is willing to play D against σ2 = (l, c, 0)
if u1(D, σ2) = 2 ≥ 3l = u1(T, σ2) and 2 ≥ 3c = u1(M, σ2), i.e., if l, c ≤ 2/3.
Since we are assuming that player 2’s strategy has support {L, C}, her strategy must be of
the form αL + (1 − α)C with α ∈ [1/3, 2/3].
Thus each mixed strategy profile (D, αL + (1 − α)C) with α ∈ [1/3, 2/3] is a Nash equilibrium.
(It is worth tracking which best response conditions were used and how they were used
in each step of finding this component of Nash equilibria. First, the optimality of player
2’s strategy σ2 with support {L, C} was used to obtain the restrictions t = m ≤ 2/5 on player
1’s strategy σ1. Second, the optimality of σ1 against such a σ2 was used to obtain a further
restriction on player 1’s strategy, namely that he plays D. Third, the fact that D is optimal
for player 1 against σ2 was used to obtain restrictions on player 2’s strategy, namely that
l, c ≤ 2/3. Combining these last restrictions with the fact that σ2 has support {L, C} lets us
conclude that σ2 takes the form αL + (1 − α)C with α ∈ [1/3, 2/3].)
In conclusion, the set of Nash equilibria consists of three connected components: {(T, L)},
{(M, C)}, and {(D, αL + (1 − α)C) : α ∈ [1/3, 2/3]}. ♦
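A numerical check of this component (our encoding; exact arithmetic via fractions):

    from fractions import Fraction

    u1 = {("T","L"): 3, ("T","C"): 0, ("T","Q"): 0, ("M","L"): 0, ("M","C"): 3,
          ("M","Q"): 0, ("D","L"): 2, ("D","C"): 2, ("D","Q"): 2}
    u2 = {("T","L"): 3, ("T","C"): 0, ("T","Q"): 2, ("M","L"): 0, ("M","C"): 3,
          ("M","Q"): 2, ("D","L"): 2, ("D","C"): 2, ("D","Q"): 0}

    def is_nash(a):
        # sigma2 = a on L, (1 - a) on C; player 1 plays D.
        sigma2 = {"L": a, "C": 1 - a, "Q": Fraction(0)}
        u1_mix = {s1: sum(sigma2[s2] * u1[(s1, s2)] for s2 in "LCQ") for s1 in "TMD"}
        d_optimal = u1_mix["D"] == max(u1_mix.values())
        # Against D, player 2 is indifferent between L and C, and both beat Q.
        two_optimal = u2[("D","L")] == u2[("D","C")] >= u2[("D","Q")]
        return d_optimal and two_optimal

    for a in (Fraction(1,4), Fraction(1,3), Fraction(1,2), Fraction(2,3), Fraction(3,4)):
        print(a, is_nash(a))   # False, True, True, True, False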
Example 1.37. Consider the following symmetric game:

         2
        A        B        C
    A  0, 0    6, −3    −4, −1
1   B  −3, 6    0, 0     5, 3
    C  −1, −4   3, 5     0, 0
Since the game is symmetric, both players have the same incentives as a function of the
opponent’s behavior.
Writing (a, b, c) for a conjecture over the opponent’s strategies A, B, and C:
A ⪰ B ⇔ 6b − 4c ≥ −3a + 5c ⇔ a + 2b ≥ 3c;
A ⪰ C ⇔ 6b − 4c ≥ −a + 3b ⇔ a + 3b ≥ 4c;
B ⪰ C ⇔ −3a + 5c ≥ −a + 3b ⇔ 5c ≥ 2a + 3b.
[Figure: regions of the simplex of conjectures (a, b, c) in which A, B, and C are best responses; the boundary points on the edges are (4/5)A + (1/5)C, (5/7)A + (2/7)C, and (3/5)B + (2/5)C.]
Now consider each possible support of player 1’s equilibrium strategy.
Consider the three-player game below: player 1 chooses a row (A or D), player 2 chooses a
column (a or d), and player 3 chooses a matrix (L or R). Payoffs are listed in the order (u1, u2, u3).

3 plays L:
          a          d
    A  2, 2, 2    0, 0, 1
    D  0, 0, 3    0, 0, 3

3 plays R:
          a          d
    A  2, 2, 2    0, 3, 3
    D  1, 0, 2    1, 0, 2

We consider each possible support profile for players 1 and 2:
(D, d) Implies that 3 plays L. Since 1 and 2 are also playing best responses,
this is a Nash equilibrium.
(D, a) Implies that 3 plays L, which implies that 1 prefers to deviate to A.
(D, mix) Implies that 3 plays L, which with 2 mixing implies that 1 prefers
to deviate to A.
(A, d) Implies that 3 plays R, which implies that 1 prefers to deviate to D.
(A, a) 1 and 2 are willing to do this if σ3(L) ≥ 1/3. Since 3 cannot affect his
payoffs given the behavior of 1 and 2, these are Nash equilibria.
(A, mix) 2 only mixes if σ3(L) = 1/3; but if 1 plays A and 2 mixes, 3 strictly
prefers R – a contradiction.
(mix, a) Implies that 3 plays L, which implies that 1 strictly prefers A.
(mix, d) If 2 plays d, then for 1 to be willing to mix, 3 must play L; this leads
2 to deviate to a.
(mix, mix) Notice that 2 can only affect her own payoffs when 1 plays A.
Hence, for 2 to be indifferent, σ3(L) = 1/3. Given this, 1 is willing to
mix if σ2(d) = 2/3. Then for 3 to be indifferent, σ1(D) = 4/7. This is a
Nash equilibrium.
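The mixed equilibrium in the last case can be verified numerically; a sketch (players indexed 0, 1, 2 for 1, 2, 3; the encoding is ours):

    from fractions import Fraction

    pD, pd, pL = Fraction(4, 7), Fraction(2, 3), Fraction(1, 3)
    u = {  # payoffs (u1, u2, u3) keyed by (s1, s2, s3)
        ("A","a","L"): (2,2,2), ("A","d","L"): (0,0,1),
        ("D","a","L"): (0,0,3), ("D","d","L"): (0,0,3),
        ("A","a","R"): (2,2,2), ("A","d","R"): (0,3,3),
        ("D","a","R"): (1,0,2), ("D","d","R"): (1,0,2),
    }

    def eu(player, pure):
        # Expected payoff to `player` from playing `pure` while the others mix.
        sigma = [{"A": 1 - pD, "D": pD}, {"a": 1 - pd, "d": pd}, {"L": pL, "R": 1 - pL}]
        sigma[player] = {pure: Fraction(1)}
        return sum(sigma[0].get(s1, 0) * sigma[1].get(s2, 0) * sigma[2].get(s3, 0)
                   * u[(s1, s2, s3)][player]
                   for s1 in "AD" for s2 in "ad" for s3 in "LR")

    # Each player is indifferent between his two pure strategies, as required:
    assert eu(0, "A") == eu(0, "D") and eu(1, "a") == eu(1, "d") and eu(2, "L") == eu(2, "R")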
A classic theorem of Nash shows that every finite normal form game admits at least one
equilibrium, possibly in mixed strategies. Thus the Nash equilibrium concept always
provides at least one prediction of play.
Theorem 1.39 (Nash (1950)).
Any finite normal form game has at least one (possibly mixed) Nash equilibrium.
Equilibrium existence results are usually proved by means of fixed point theorems.
Theorem 1.40 (Brouwer (1912)). Let X ⊆ Rn be nonempty, compact, and convex. Let the
function f : X → X be continuous. Then there exists an x ∈ X such that x = f (x).
Theorem 1.41 (Kakutani (1941)). Let X ⊆ Rn be nonempty, compact, and convex. Let the
correspondence f : X ⇒ X be nonempty, upper hemicontinuous, and convex valued. Then there
exists an x ∈ X such that x ∈ f (x).
A slicker proof of Theorem 1.39, presented in Nash (1951), builds on Brown and von
Neumann (1950):
Define C : Σ → Σ by

Ci(σ)(si) = (σi(si) + φi(si, σ)) / (1 + Σ_{s′i∈Si} φi(s′i, σ)), where φi(si, σ) = max{0, ui(si, σ−i) − ui(σ)}.

The map C is continuous, and its fixed points are exactly the Nash equilibria of the game,
so existence follows from Brouwer’s theorem.
We next consider the structure of the set of Nash equilibria. The strongest results are
available for two-player games. We call a two-player game nondegenerate if for each
strategy σi , the number of pure best responses to σi is at most the cardinality of the
support of σi . (For instance, if σi uses two strategies, no best response to σi may use three
or more strategies.) Also, we define a polytope to be the convex hull of a finite number of
points.
Theorem 1.42.
(i) In a nondegenerate two-player game, the number of Nash equilibria is finite and odd.
(ii) In general, the set of Nash equilibria is the union of a finite number of polytopes.
Part (i) of the theorem describes the structure of the Nash set in “typical” two-player games.
It is due to Lemke and Howson (1964), who also provided an algebraic proof of the exis-
tence of Nash equilibrium in two-player games, and the best early algorithm for computing
a Nash equilibrium of a two-player game. Part (ii) of the theorem implies that the complete
set of Nash equilibria of a two-player game can be described by specifying the extreme points
of the equilibrium polytopes. Recent algorithms for finding all Nash equilibria of a two-
player game are presented in Avis et al. (2010). A web-based implementation of one of the
algorithms from this paper is posted at cgi.csc.liv.ac.uk/˜rahul/bimatrix_solver/.
The next result is a structure theorem for games with arbitrary numbers of players. A
normal form game is defined by the number of players n, a set of pure strategies Si for
each player, and a specification of payoffs {u(s)}s∈S ∈ Rn×#S . A property holds in generic
finite normal form games if for any choice of the number of players and the pure strategy
sets, the set of payoff specifications for which the property fails to hold has measure zero
in Rn×#S . Notice that on its own, a result about generic games does not tell us whether the
property in question holds in any particular game.
Theorem 1.43.
(i) In generic finite normal form games, the number of Nash equilibria is finite and odd.
(ii) In any finite normal form game, the set of Nash equilibria is the union of a finite number of
connected components (i.e., maximal connected subsets).
Part (i) of the theorem is due to Wilson (1971). Part (ii) is due to Kohlberg and Mertens
(1986), who conclude this from a stronger property, namely that the set of Nash equilibria
is semialgebraic. This property is also satisfied by the equilibrium sets defined by other
equilibrium concepts, including sequential equilibrium in extensive form games (Section
2.4). For applications and background, see Blume and Zame (1994) and Coste (2002).
The problem of finding a Nash equilibrium, even in two-player games, is known to be
computationally hard, in the sense that the running time of any algorithm that always
accomplishes this must grow exponentially in the size of the game (see Daskalakis et al.
(2009) and Chen et al. (2009)). This raises doubts about the unrestricted use of Nash
equilibrium to predict behavior in games. We discuss this point further in the next
section.
Example 1.44. (The Good Restaurant, Bad Restaurant game (Example 1.31))

              2
           g       b
      G  2, 2    0, 0
  1
      B  0, 0    1, 1

NE: (G, g), (B, b), and (1/3 G + 2/3 B, 1/3 g + 2/3 b). ♦
Example 1.45. Matching Pennies.

              2
           h        t
      H  1, −1   −1, 1
  1
      T  −1, 1    1, −1

unique NE: (1/2 H + 1/2 T, 1/2 h + 1/2 t). ♦
There is no general justification for assuming equilibrium knowledge. But justifications can be
found in certain specific instances:
in which all communication is ignored. For an overview of cheap talk games, see
Farrell and Rabin (1996).
(iii) Focal points (Schelling (1960)). Something about the game makes some Nash
equilibrium the obvious way to behave.
ex: coordinating on the good restaurant.
ex: meeting in NYC at the information booth at Grand Central Station at noon.
(iv) Learning/Evolution: If players recurrently face the same game, they may find their
way from arbitrary initial behavior to Nash equilibrium.
Heuristic learning: Small groups of players, typically employing rules that
condition on the empirical distribution of past play (Young (2004), Hart
(2005))
Evolutionary game theory: Large populations of agents using myopic up-
dating rules (Sandholm (2009, 2010))
In some classes of games (that include the two examples above), many learning
and evolutionary processes do converge to Nash equilibrium.
But there is no general guarantee of convergence:
Many games lead to cycling or chaotic behavior, and in some games any “reason-
able” dynamic process fails to converge to equilibrium (Shapley (1964), Hofbauer
and Swinkels (1996), Hart and Mas-Colell (2003)).
Indeed, results on the difficulty of computing Nash equilibrium imply that no
learning procedure, reasonable or not, can find a Nash equilibrium in a reasonable
amount of time in every large game (Daskalakis et al. (2009), Chen et al. (2009)).
Some games introduced in applications are known to have poor convergence prop-
erties (Hopkins and Seymour (2002), Lahkar (2011)).
In fact, evolutionary game theory models do not even support the elimination of
strictly dominated strategies in all games (Hofbauer and Sandholm (2011)).
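As a concrete illustration of a learning process that does converge in some games, here is a minimal simulation sketch (not from the notes; the names and parameters are ours) of fictitious play, discussed further below, in Matching Pennies; the empirical time averages approach the mixed equilibrium:

import numpy as np

A = np.array([[1, -1], [-1, 1]])        # player 1's payoffs; player 2's are -A
x1, x2 = np.array([1.0, 0.0]), np.array([1.0, 0.0])   # counts of past plays
for _ in range(100_000):
    s1 = int(np.argmax(A @ (x2 / x2.sum())))     # 1 best responds to 2's history
    s2 = int(np.argmax(-A.T @ (x1 / x1.sum())))  # 2 best responds to 1's history
    x1[s1] += 1
    x2[s2] += 1
print(x1 / x1.sum(), x2 / x2.sum())     # both time averages near (1/2, 1/2)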
Interpretation of mixed strategy Nash equilibrium: why mix in precisely the way that
makes your opponents indifferent?
In the unique equilibrium of Matching Pennies, player 1 is indifferent among all of his
mixed strategies. He chooses (1/2, 1/2) because this makes player 2 indifferent. Why should
we expect player 1 to behave in this way?
(i) In zero-sum games (Section 1.6), randomization can be used to ensure that you obtain
at least the equilibrium payoff regardless of how opponents behave:
In a mixed equilibrium, you randomize to make your opponent indifferent between
her strategies. In a zero-sum game, this implies that you are indifferent between
your opponent’s strategies. This implies that you do not care if your opponent
finds out your randomization probabilities in advance, as this does not enable her
to take advantage of you.
(ii) Mixed equilibrium as equilibrium in conjectures
One can interpret σ∗i as describing the conjecture that player i’s opponents have
about player i’s behavior. The fact that σ∗i is a mixed strategy then reflects the
opponents’ uncertainty about how i will behave, even if i is not actually planning
to randomize.
But as Rubinstein (1991) observes, this interpretation
“. . . implies that an equilibrium does not lead to a prediction (statistical or other-
wise) of the players’ behavior. Any player i’s action which is a best response given
his expectation about the other players’ behavior (the other n − 1 strategies) is con-
sistent as a prediction for i’s action (this might include actions which are outside the
support of the mixed strategy). This renders meaningless any comparative statics
or welfare analysis of the mixed strategy equilibrium. . . ”
(iii) Mixed equilibria as time averages of play: fictitious play (Brown (1951))
Suppose that the game is played repeatedly, and that in each period, each player
chooses a best response to the time average of past play.
Then in certain classes of games, the time average of each player's behavior con-
verges to his part in some Nash equilibrium strategy profile.
(iv) Mixed equilibria as population equilibria (Nash (1950))
Suppose that there is one population for the player 1 role and another for the player
2 role, and that players are randomly matched to play the game.
If half of the players in each population play Heads, no one has a reason to deviate.
Hence, the mixed equilibrium describes stationary distributions of pure strategies in
each population.
(v) Purification: mixed equilibria as pure equilibria of games with payoff uncertainty
(Harsanyi (1973))
Example 1.46. Purification in Matching Pennies. Suppose that while the Matching Pennies
payoff bimatrix gives players' approximate payoffs, players' actual payoffs also contain
small terms εH, εh representing a bias toward playing heads, and that each player only
knows his own bias. (The formal framework for modeling this situation is called a Bayesian
game—see Section 4.)
              2
             h                    t
      H  1 + εH, −1 + εh     −1 + εH, 1
  1
      T  −1, 1 + εh           1, −1
Specifically, suppose that εH and εh are independent random variables with P(εH > 0) =
P(εH < 0) = 1/2 and P(εh > 0) = P(εh < 0) = 1/2. Then it is a strict Nash equilibrium for each
player to follow his bias. From the ex ante point of view, the distribution over actions that
this equilibrium generates in the original normal form game is (1/2 H + 1/2 T, 1/2 h + 1/2 t).
Harsanyi (1973) shows that any mixed equilibrium can be purified in this way. This
includes not only “reasonable” mixed equilibria like that in Matching Pennies, but also
“unreasonable” ones like those in coordination games. ♦
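A minimal simulation sketch (not from the notes; the sampling distribution is our assumption) of the purified equilibrium: each player follows his own bias, and the ex ante action distribution approaches (1/2, 1/2) for both players.

import random

# Draw biases uniformly from [-1, 1]; this is an assumption, since any
# distribution with P(eps > 0) = P(eps < 0) = 1/2 works. Each player
# plays Heads exactly when his own bias is positive.
n = 100_000
p1_heads = sum(random.uniform(-1, 1) > 0 for _ in range(n)) / n
p2_heads = sum(random.uniform(-1, 1) > 0 for _ in range(n)) / n
print(p1_heads, p2_heads)   # both near 1/2, the mixed equilibrium probabilities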
Many games appearing in economic applications have continuous strategy sets—for in-
stance, an interval of real numbers. For better or worse, analyses of these games often
focus on pure strategy equilibrium.
Let G = {P , {Si }i∈P , {ui }i∈P } be a game in which the strategy sets Si may be finite or contin-
uous. Player i's pure best-response correspondence bi : S−i ⇒ Si is defined by
bi(s−i) = argmax_{si∈Si} ui(si, s−i).
Example 1.47. Cournot duopoly. Two firms separately choose what quantity of a homoge-
neous good to produce. All output is sent to a market maker who sets the price that clears
the market.
Firm i's profit function is

             qi(a − qi − qj − c)   if qi + qj ≤ a,
πi(qi, qj) =
             −c qi                 if qi + qj > a.
The graphs of the best response correspondences are shown below. The graphs intersect
exactly once, at the solution to the system q1 = (a − c − q2)/2, q2 = (a − c − q1)/2. Thus the game's unique
Nash equilibrium is (q1, q2) = ((a − c)/3, (a − c)/3). ♦
[Figure: the graphs of b1(q2) and b2(q1) in the (q1, q2) plane, with intercepts (a − c)/2 and a − c on each axis, crossing at ((a − c)/3, (a − c)/3).]
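A minimal numerical sketch (not from the notes; the parameter values are ours): iterating the best responses bi(qj) = (a − c − qj)/2 from any starting point converges to the equilibrium, since the iteration is a contraction.

# Best response iteration for the Cournot duopoly with a = 10, c = 1.
a, c = 10.0, 1.0
q1 = q2 = 0.0
for _ in range(50):
    q1, q2 = (a - c - q2) / 2, (a - c - q1) / 2   # simultaneous best responses
print(q1, q2, (a - c) / 3)                         # both quantities approach 3.0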
Example 1.48. Bertrand duopoly. Each firm chooses a price. The firm choosing the lower
price, pmin , sells Q(pmin ) units at this price, producing each unit at marginal cost c. The
firm choosing the higher price produces nothing and sells nothing. If the firms charge
the same price, they split the market. (A firm does not produce until it knows how many
units it will sell.)
Firm i's profit function is

              (a − pi)(pi − c)         if pi < pj and pi ≤ a,
πi(pi, pj) =  (1/2)(a − pi)(pi − c)    if pi = pj ≤ a,
              0                        if pi > pj or pi > a.
Let pm = (a + c)/2 be the monopoly price. Then firm i's best response correspondence is

           (pj, ∞)   if pj < c,
           [pj, ∞)   if pj = c,
bi(pj) =
           ∅         if c < pj ≤ pm,
           {pm}      if pj > pm.
The graphs of the best response correspondences are shown below. Their unique inter-
section point is the game's unique Nash equilibrium, (p1, p2) = (c, c). ♦
[Figure: the graphs of b1(p2) and b2(p1) in the (p1, p2) plane, with kinks at c and pm; they intersect only at (c, c).]
The basic existence theorem for games with continuous strategy sets requires continuous
payoff functions and quasiconcavity of payoffs in own strategies. Together these prop-
erties ensure that the pure best response correspondences have the properties needed to
apply the Kakutani fixed point theorem.
Theorem 1.49. Suppose that (i) each Si is a compact, convex subset of Rk , (ii) each ui is continuous
in s , and (iii) each ui is quasiconcave in si . Then the game G has at least one pure strategy Nash
equilibrium.
Theorem 1.49 applies to the Cournot game (check the quasiconcavity condition!) if one
restricts the strategy sets to make them compact intervals (e.g., by not allowing quantities
above a). The Bertrand game does not have continuous payoffs. However, payoffs
are quasiconcave in own strategies (check!), and in addition the game satisfies a property
called “better reply security”. Reny (1999) shows that these properties ensure the existence
of a pure strategy Nash equilibrium.
Dasgupta and Maskin (1986a) and Reny (1999, 2016) are the basic references on existence
of Nash equilibrium in games with discontinuous payoffs.
A closely related topic concerns existence of equilibrium in environments with discontin-
uous payoffs via endogenous sharing rules. In some applications, the map from strategy
profiles to payoff profiles is naturally represented by a correspondence. For instance, if
firms competing in prices charge the same price, then the consumers may choose between
the firms in a variety of ways, leading to a variety of possible payoff profiles for the
firms. Then one can ask whether there is a “sharing rule” that generates a game (with
single-valued payoff functions) in which a Nash equilibrium exists. Basic references here
are Simon and Zame (1990) and Jackson et al. (2002).
One can also consider existence of mixed strategy Nash equilibrium. (Note that a mixed
strategy in G is a probability measure on Si .) The basic existence result for games with
continuous payoffs drops the requirement of quasiconcavity of payoffs in own strategies.
It is proved using an extension of Kakutani’s fixed point theorem to suitable infinite-
dimensional spaces (Fan (1952), Glicksberg (1952)).
Theorem 1.50. Suppose that (i) each Si is a compact, convex subset of Rk and (ii) each ui is
continuous in s . Then the game G has at least one (possibly mixed) Nash equilibrium.
For results on existence of mixed strategy equilibrium in games with discontinuous pay-
offs, again see Dasgupta and Maskin (1986a) and Reny (1999, 2016).
1.5 Correlated Equilibrium

              2
           L       R
      T  5, 5    2, 6
  1
      B  6, 2    1, 1

Nash equilibria/payoffs: (T, R) ⇒ (2, 6), (B, L) ⇒ (6, 2), (1/2 T + 1/2 B, 1/2 L + 1/2 R) ⇒ (3 1/2, 3 1/2)
If the players can observe a toss of a fair coin before play, they can obtain equilibrium
payoffs of (4, 4) using the correlated strategy 1/2 TR + 1/2 BL. More generally, any point in
the convex hull of {(2, 6), (6, 2), (3 1/2, 3 1/2)} can be achieved in equilibrium using some three-
outcome randomization device.
We can imagine a mediator using such a device to determine which equilibrium he tells
the players to play. If a player expects his opponent to obey the announcement, then the
fact that the announced strategy profile is a Nash equilibrium implies that it is optimal
for the player to obey the announcement himself.
[Figure: the convex hull of the Nash equilibrium payoffs (2, 6), (3 1/2, 3 1/2), and (6, 2), with the point (4, 4) marked.]
Can a mediator use a randomizing device to generate expected payoffs whose sum exceeds 8 in
equilibrium?
Yes, by only telling each player what he is supposed to play, not what the opponent is
supposed to play.
Suppose the device specifies TL, TR, and BL each with probability 1/3, so that ρ = 1/3 TL + 1/3 TR + 1/3 BL.

              2
           L      R
      T   1/3    1/3
  1
      B   1/3     0
However, player 1 is only told whether his component is T or B, and player 2 is only told
if her component is L or R:
The correlated strategy ρ generates payoffs of (4 1/3, 4 1/3).
Moreover, we claim that both players obeying constitutes an equilibrium:
Suppose that player 2 plays as prescribed, and consider player 1’s incentives.
If player 1 sees B, he knows that player 2 will play L, so his best response is B (since 6 > 5).
If player 1 sees T, he believes that player 2 is equally likely to play L or R, and so T is a
best response for player 1 (since 3 1/2 = 3 1/2).
By symmetry, this is an equilibrium. ♦
In words, the first condition says that if i receives signal si and opponents obey their signals,
i cannot benefit from disobeying his signal. The second condition is mathematically
simpler: for each player i and all si, s′i ∈ Si,
Σ_{s−i∈S−i} ρ(si, s−i) ui(si, s−i) ≥ Σ_{s−i∈S−i} ρ(si, s−i) ui(s′i, s−i).
The constraints computed below are of this form.
Consider the following game:

              2
           g        b
      G  3, 3     0, 5
  1
      B  5, 0    −4, −4

The Nash equilibria are (G, b), (B, g), and (2/3 G + 1/3 B, 2/3 g + 1/3 b).
The set of correlated equilibria of any game is an intersection of a finite number of half
spaces. It is therefore a polytope: that is, the convex hull of a finite number of points.
The constraint ensuring that player 1 plays G when told is 3ρGg + 0ρGb ≥ 5ρGg − 4ρGb , or,
equivalently, 2ρGb ≥ ρGg . We compute the constraints for strategies B, g, and b similarly,
and list them along with the nonnegativity constraints:
(1) 2ρGb ≥ ρGg ;  (5) ρGg ≥ 0;
(2) ρBg ≥ 2ρBb ;  (6) ρGb ≥ 0;
(3) 2ρBg ≥ ρGg ;  (7) ρBg ≥ 0;
(4) ρGb ≥ 2ρBb ;  (8) ρBb ≥ 0.
The set of correlated equilibria is drawn below. Notice that vertices α and β are the
pure Nash equilibria, and that vertex ε is the mixed Nash equilibrium, at which all four
incentive constraints bind.
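The incentive constraints lend themselves to mechanical checking. A minimal sketch (not from the notes; the function name is ours):

def is_correlated_eq(rGg, rGb, rBg, rBb, tol=1e-9):
    # Incentive constraints (1)-(4) and nonnegativity (5)-(8).
    constraints = [2*rGb - rGg,      # (1): obey G when told G
                   rBg - 2*rBb,      # (2): obey B when told B
                   2*rBg - rGg,      # (3): obey g when told g
                   rGb - 2*rBb,      # (4): obey b when told b
                   rGg, rGb, rBg, rBb]
    total = rGg + rGb + rBg + rBb
    return all(c >= -tol for c in constraints) and abs(total - 1) < tol

# The mixed Nash equilibrium (2/3 G + 1/3 B, 2/3 g + 1/3 b) as a product distribution:
print(is_correlated_eq(4/9, 2/9, 2/9, 1/9))   # True; constraints (1)-(4) all bind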
[Figure: the set of correlated equilibria, a polytope with vertices α, β, γ, δ, and ε, shown from several angles. Vertices α and β are the pure Nash equilibria (G, b) and (B, g); vertex ε is the mixed Nash equilibrium.]
♦
1.5.2 Interpretation
Some game theorists feel that correlated equilibrium is the fundamental equilibrium
concept for normal form games:
Distributions of play arising from any correlated signal structure can be expressed as correlated equi-
libria. Thus, correlated equilibrium provides a way of implicitly including all
libria. Thus, correlated equilibrium provides a way of implicitly including all
signaling structures, possibly provided by a mediator, directly in the solution con-
cept. See Myerson (1991, Chapter 6) for an excellent textbook discussion of this
and related ideas.
The assumption that players best respond to opponents' “strategies” yields the same conclusions as the assumption of
common knowledge of rationality.
For generalizations of these ideas to Bayesian games, see Battigalli and Siniscalchi (2003).
Worst-case analysis
The analyses so far have assumed that players are Bayesian rational: they form beliefs
about opponents' choices of strategies and then maximize expected utility given those
beliefs. We start our analysis of zero-sum games by considering an alternative decision
criterion based on worst-case analysis: each player chooses a mixed strategy that maximizes
his minimal expected utility, where the minimum is taken over the possible strategies of
the opponents. In general, worst-case analysis can be understood as representing a strong
form of pessimism.
Under Bayesian rationality a player optimizes against a fixed belief about the opponents’
behavior. In contrast, under worst-case analysis a player’s anticipation about the oppo-
nent’s behavior depends on the agent’s own choice. Stoye (2011), building on Milnor
(1954), and using an Anscombe-Aumann decision-theoretic framework (see Section 0.4),
provides an axiomatization of this maxmin expected utility criterion, as well as other classical
criteria from statistical decision theory.
Zero-sum games
Let G = {P , {Si }i∈P , {ui }i∈P }, P = {1, 2} be a two-player game. We call G a zero-sum game if
u2 (s) = −u1 (s) for all s ∈ S. In a zero-sum game, the two players’ interests are diametrically
opposed: whatever is good for one player is bad for the other, and to the same degree.
Since cardinal utilities are only unique up to positive affine transformations of payoffs
(vi(s) = a + b ui(s) for all s ∈ S, with b > 0), games that are not zero-sum may be strategically equivalent
to a zero-sum game.
In economics, zero-sum games are somewhat exceptional, since typical economic situ-
ations involve a combination of competitive and cooperative elements. Nevertheless,
zero-sum games are important in economics because they describe the polar case of com-
pletely opposing interests.
When discussing zero-sum games, it is simplest to always speak in terms of player 1’s
payoffs. This way, player 1 is always the maximizer, and player 2 becomes the minimizer.
(This choice is completely arbitrary, and has no bearing on any of the conclusions we
reach. In other words, we could just as easily speak in terms of player 2’s payoffs. This
would change who we call the maximizer and who the minimizer, but would not alter our
conclusions about any given game—see the comment just before Example 1.54 below.)
It is convenient to present worst-case analysis in the context of zero-sum games. In this
context, player 2 really does want to minimize player 1’s payoffs, since by doing so she
maximizes her own payoffs. But the ideas introduced next are useful beyond the context of
zero-sum games. For instance, in repeated games, a player (or group of players) may use
a minmax strategy to punish an opponent for deviations from equilibrium behavior—see
Section 3.2.
Let α2(σ1) denote a choice of player 2 that minimizes player 1's payoff against σ1:
α2(σ1) ∈ argmin_{σ2∈∆S2} u1(σ1, σ2).
(In a zero-sum game, α2(σ1) is simply a best response of player 2 to σ1. But we will henceforth call α2(σ1) a punishment strategy to keep our nomenclature in
terms of player 1's payoffs.)
We say that σ̄1 ∈ ∆S1 is a maxmin strategy for player 1 if
σ̄1 ∈ argmax_{σ1∈∆S1} u1(σ1, α2(σ1)).
The resulting payoff is player 1's maxmin value:
v1^maxmin = u1(σ̄1, α2(σ̄1)) = max_{σ1∈∆S1} min_{σ2∈∆S2} u1(σ1, σ2).
In words: σ̄1 maximizes player 1’s payoffs given that player 2 correctly anticipates 1’s
choice and punishes him. By definition, strategy σ̄1 is optimal for a player with maxmin
expected utility preferences. As we will see, such a player may strictly prefer to play a
mixed strategy; this is not possible for a Bayesian rational player.
Now we consider a worst-case analysis by player 2. Suppose that player 2 anticipates that
whatever she does, player 1 will punish her. Since we are considering a zero-sum game,
player 1 punishes player 2 by playing a best response.
Let β1(σ2) be a best response of player 1 against σ2:
β1(σ2) ∈ argmax_{σ1∈∆S1} u1(σ1, σ2).
We say that σ2 ∈ ∆S2 is a minmax strategy for player 2 if it minimizes u1(β1(σ2), σ2) over ∆S2.
In words: σ2 minimizes player 1's payoffs given that player 1 correctly anticipates 2's
choice and best responds. The resulting payoff v1^minmax is player 1's minmax value:
v1^minmax = u1(β1(σ2), σ2) = min_{σ2∈∆S2} max_{σ1∈∆S1} u1(σ1, σ2).
Suppose that the players are only allowed to play pure strategies. In this case

min_{s2∈S2} max_{s1∈S1} u1(s1, s2) = min{4, 2, 3} = 2. ♦
It is easy to check that whether players are restricted to pure strategies or are allowed
to play mixed strategies, max min ≤ min max: going last (and getting to react to what your
opponent did) is at least as good as going first. (The proof is straightforward: see “Proof
of the Minmax Theorem” below.) The previous example shows that when players are
restricted to pure strategies, we can have max min < min max. Can this happen if mixed
strategies are allowed?
Example 1.55. [Figure: the graphs of u1(σ1, L), u1(σ1, C), and u1(σ1, R) as σ1 = αT + (1 − α)B ranges over ∆S1, together with the lower envelope u1(σ1, α2(σ1)); the envelope's maximum value is v1^maxmin = 5/3.]
Notice that we can divide ∆S1 into three punishment regions: writing σ1 = αT + (1 − α)B,
the regions are [0, 1/3] (where the punishment is L), [1/3, 2/3] (where it is C), and [2/3, 1] (where
it is R). Because the lower envelope is a minimum of linear functions, player 1's maxmin
strategy must occur at a vertex of one of the punishment regions: that is, at T, 2/3 T + 1/3 B,
1/3 T + 2/3 B, or B. An analogous statement is true in the case of player 2's minmax strategy, as
we will see next.
We can find player 2’s minmax strategy in a similar fashion. In this case, we are looking
for the strategy of player 2 that minimizes an upper envelope function. This calculation
uses player 1’s best response correspondence. The upper envelope of the payoff functions
pictured below is u1 (β1 (σ2 ), σ2 ). Because this upper envelope is the maximum of linear
functions, it is minimized at a vertex of one of the best response regions shown at bottom.
By computing u1 (β1 (σ2 ), σ2 ) at each vertex, we find that σ2 = 32 C + 13 R, where u1 (β1 ( 23 C +
1
3
R), 32 C + 13 R) = 53 = vminmax
1
.
Notice that v1^maxmin = v1^minmax: the payoff that player 1 is able to guarantee himself is equal
to the payoff that player 2 can hold him to. This is a consequence of the Minmax Theorem,
which we state next.
[Figure: top, the planes u1(T, ·) and u1(B, ·) over ∆S2 and their upper envelope; bottom, player 1's best response regions in ∆S2, with bracketed best response payoffs [4] at L, [2] at C, [3] at R, [5/3] at 2/3 C + 1/3 R, and [2] at 1/3 L + 2/3 R.]
We can also use a version of the first picture to compute σ2 and v1^minmax. We do so by
finding the convex combination of the graphs of u1(σ1, L), u1(σ1, C), and u1(σ1, R) whose
highest point is as low as possible. This is the horizontal line shown in the diagram
below at right. (It is clear that no line can have a lower highest point, because no lower
payoff for player 1 is feasible when σ1 = 2/3 T + 1/3 B = σ̄1.) Since this horizontal line is
the graph of 2/3 u1(σ1, C) + 1/3 u1(σ1, R) = u1(σ1, 2/3 C + 1/3 R) (check the endpoints), we conclude
that σ2 = 2/3 C + 1/3 R. 1's minmax payoff is the constant value of u1(σ1, 2/3 C + 1/3 R), which is
v1^minmax = 5/3.
In similar fashion, one can determine σ̄1 and v1^maxmin using the second picture by finding
the convex combination of the two planes whose lowest point is as high as possible. This
convex combination corresponds to the mixed strategy σ̄1 = 2/3 T + 1/3 B. (Sketch this plane
in to see for yourself.)
[Figure: the graphs of u1(σ1, L), u1(σ1, C), and u1(σ1, R) over σ1 ∈ ∆S1, at right with the horizontal line u1(σ1, 2/3 C + 1/3 R) ≡ 5/3 drawn in.] ♦
Theorem 1.56 (The Minmax Theorem; von Neumann (1928)). Let G be a finite two-player
zero-sum game. Then
(i) v1^maxmin = v1^minmax. We call this common value the value of G, denoted v*1 (or v(G)).
(ii) Strategy profile (σ*1, σ*2) is a Nash equilibrium of G if and only if σ*1 is a maxmin strategy
for player 1 and σ*2 is a minmax strategy for player 2.
(iii) In every Nash equilibrium of G, player 1's payoff is v*1.

The Minmax Theorem tells us that in a zero-sum game, player 1 can guarantee that he
gets at least v*1, and player 2 can guarantee that player 1 gets no more than v*1; moreover,
in such a game, worst-case analysis and Bayesian-rational equilibrium analysis generate
the same predictions of play.
For more on the structure of the equilibrium set in zero-sum games, see Shapley and Snow
(1950); see González-Díaz et al. (2010) for a textbook treatment. For minmax theorems for
games with infinite strategy sets, see Sion (1958).
Example 1.57. In the zero-sum game G with player 1's payoffs defined in Example 1.55,
the unique Nash equilibrium is (2/3 T + 1/3 B, 2/3 C + 1/3 R), and the game's value is v(G) = 5/3. ♦
Example 1.58. What are the Nash equilibria of this normal form game?
              2
           X        Y        Z
      A  10, 0   −1, 11   −1, 11
      B   9, 1    1, 9     1, 9
  1   C   2, 8    8, 2     1, 9
      D   2, 8    2, 8     7, 3
      E   4, 6    5, 5     6, 4
This game is a constant-sum game, and so creates the same incentives as a zero-sum game
(for instance, the game in which each player’s payoffs are always 5 units lower). Therefore,
by the Minmax Theorem, player 2’s Nash equilibrium strategies are her minmax strategies.
To find these, we draw player 1’s best response correspondence (see below). The numbers
in brackets are player 1’s best response payoffs against the given strategies of player 2.
Evidently, player 1’s minmax payoff is v1 = 51 11
= 4 117 , which player 2 can enforce using her
unique minmax strategy, σ2 = 11 5
X + 115 Y + 111 Z.
Now let σ*1 be an equilibrium strategy of player 1. Since player 2 uses her minmax strategy
σ2 = 5/11 X + 5/11 Y + 1/11 Z in any equilibrium, it must be a best response to σ*1; in particular, X,
Y, and Z must yield her the same payoff against σ*1. But σ*1, being a best response to σ2,
can only put weight on pure strategies B, C, and E. The only mixture of these strategies
that makes player 2 indifferent among X, Y, and Z is σ*1 = 13/77 B + 8/77 C + 56/77 E. We therefore
conclude that (13/77 B + 8/77 C + 56/77 E, 5/11 X + 5/11 Y + 1/11 Z) is the unique Nash equilibrium of G. ♦
[Figure: player 1's best response regions in ∆S2 = ∆{X, Y, Z}. Bracketed numbers are player 1's best response payoffs at the labeled points; the minimal value, 51/11, is attained at σ2 = 5/11 X + 5/11 Y + 1/11 Z.]
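A minimal computational sketch (not from the notes; the variable names are ours): player 2's minmax strategy solves a linear program, which we can set up with scipy.optimize.linprog.

import numpy as np
from scipy.optimize import linprog

# Player 1's payoffs in Example 1.58: rows A-E, columns X, Y, Z.
U = np.array([[10, -1, -1],
              [ 9,  1,  1],
              [ 2,  8,  1],
              [ 2,  2,  7],
              [ 4,  5,  6]])
m, n = U.shape

# Variables (sigma_2(X), sigma_2(Y), sigma_2(Z), v): minimize v subject to
# u1(s1, sigma_2) <= v for every pure s1, with sigma_2 in the simplex.
c = np.concatenate([np.zeros(n), [1.0]])
A_ub = np.hstack([U, -np.ones((m, 1))])
b_ub = np.zeros(m)
A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)
b_eq = [1.0]
bounds = [(0, None)] * n + [(None, None)]   # v itself is unbounded

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:n])    # approximately (5/11, 5/11, 1/11)
print(res.x[-1])    # approximately 51/11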
The proof of the Minmax Theorem is straightforward if one uses the Nash equilibrium
existence theorem to show that v1^maxmin ≥ v1^minmax. We do this next. Then Example 1.59
gives a proof that v1^maxmin ≥ v1^minmax that uses only the separating hyperplane theorem.

To prove that v1^maxmin = v1^minmax, we first show that v1^maxmin ≤ v1^minmax:

(†) v1^maxmin = u1(σ̄1, α2(σ̄1)) ≤ u1(σ̄1, σ2) ≤ u1(β1(σ2), σ2) = v1^minmax.

For the reverse inequality, let (σ*1, σ*2) be a Nash equilibrium of G, which exists by Theorem 1.39. Then

(‡) v1^maxmin = u1(σ̄1, α2(σ̄1)) ≥ u1(σ*1, α2(σ*1)) = u1(σ*1, σ*2) = u1(β1(σ*2), σ*2) ≥ u1(β1(σ2), σ2) = v1^minmax.
The argument above shows that all of the inequalities in (†) and (‡) are equalities. We now
use this fact to prove statements (ii) and (iii) of the theorem. First, the two new equalities
in (†) tell us that any pair of maxmin strategies forms a Nash equilibrium. (The second
new equality shows that 1's maxmin strategy σ̄1 is a best response to 2's minmax strategy
σ2; the first new equality shows the reverse.) Second, the two new equalities in (‡) tell us
that a Nash equilibrium strategy for a given player is a maxmin strategy for that player.
(The first new equality shows that the Nash equilibrium strategy σ*1 is a maxmin strategy
for 1; the second new equality shows that σ*2 is a minmax strategy for 2.) And third, the
equalities in (‡) also show that all Nash equilibria of G yield payoff v1^maxmin = v1^minmax for
player 1. This completes the proof of the theorem.
Example 1.59. Consider the two player normal form game G below, in which player 1 is
the row player and player 2 is the column player. (Only player 1’s payoffs are shown.)
2
a b c d
T 5, · 3, · 1, · 0, ·
1
B −1, · 0, · 4, · 7, ·
Define the following sets:
J = {(u1(T, s2), u1(B, s2)) : s2 ∈ S2} = {(5, −1), (3, 0), (1, 4), (0, 7)},
K = conv(J), and, for c ∈ R,
L(c) = {v ∈ R² : v ≤ (c, c)}.
(i) Explain in game theoretic terms what it means for a vector to be an element of the
set K.
(ii) Let c∗ be the smallest number such that the point (c∗ , c∗ ) is contained in K. What is
the value of c∗ ? Relate this number to player 1’s minmax payoff in G, explaining
the reason for the relationship you describe.
(iii) Specify the normal vector p∗ ∈ R2 and the intercept d∗ of the hyperplane H = {v ∈
R2 : p∗ · v = d∗ } that separates the set L(c∗ ) from the set K, choosing the vector p∗ to
have components that are nonnegative and sum to one.
(iv) Interpret the fact that p∗ · v ≥ d∗ for all v ∈ K in game theoretic terms. What
conclusions can we draw about player 1’s maxmin payoff in G?
(v) Let Gn be a two player normal form game in which player 1 has n ≥ 2 strategies.
Sketch a proof of the fact that player 1’s minmax payoff in Gn and his maxmin
payoff in Gn are equal. (When Gn is zero-sum, this fact is the Minmax Theorem.)
Solution:
(i) v ∈ K if and only if v = (u1 (T, σ2 ), u1 (B, σ2 )) for some σ2 ∈ ∆S2 .
(ii) c* = 2. This is player 1's minmax value: by playing 1/2 b + 1/2 c, the strategy that
generates (2, 2), player 2 ensures that player 1 cannot obtain a payoff higher than
2. If you draw a picture of K, you will see that player 2 cannot restrict 1 to a lower
payoff.
(iii) p* = (2/3, 1/3) and d* = 2.
(iv) Let σ*1 = p*. Then (p* · v ≥ 2 for all v ∈ K) is equivalent to (u1(σ*1, σ2) ≥ 2 for all
σ2 ∈ ∆S2). Thus, player 1's maxmin payoff is at least 2; by part (ii), it must be
exactly 2.
(v) Define n-strategy analogues of the sets J, K, and L(c). Let c∗ be the largest value of
c such that int(L(c)) and int(K) do not intersect. (Notice that if, for example, player
1 has a dominated strategy, L(c∗ ) ∩ K may not include (c∗ , . . ., c∗ ); this is why we
need the set L(c∗ ).) If player 2 chooses a σ2 that generates a point in L(c∗ ) ∩ K, then
player 1 cannot obtain a payoff higher than c∗ . Hence, player 1’s minmax value is
less than or equal to c∗ . (In fact, the way we chose c∗ tells us that it is exactly equal
to c∗ .)
Let p∗ and d∗ define a hyperplane that separates L(c∗ ) and K; we know that such a
hyperplane exists by the separating hyperplane theorem. Given the form of L(c∗ ),
we can choose p∗ to lie in ∆S1 ; therefore, since the hyperplane passes through the
point (c∗ , . . ., c∗ ), the d∗ corresponding to any p∗ chosen from ∆S1 (in particular, to
any p∗ whose components sum to one) is in fact c∗ . Thus, as in (iv) above, player
1’s maxmin value is at least c∗ . Since player 1’s maxmin value cannot exceed his
minmax value, we conclude that both of these values equal c∗ . ♦
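A minimal numerical check (not from the notes) of parts (ii)–(iv): since a linear function on a polytope attains its minimum at a vertex, it suffices to verify p* · v ≥ d* at the points of J.

# The columns of the payoff matrix, viewed as vectors (u1(T,s2), u1(B,s2)).
J = [(5, -1), (3, 0), (1, 4), (0, 7)]
p_star, d_star = (2/3, 1/3), 2.0

# p* . v >= d* at every extreme point of K = conv(J), hence on all of K;
# equivalently, u1(p*, s2) >= 2 against every pure strategy s2 of player 2.
print(all(p_star[0]*v1 + p_star[1]*v2 >= d_star - 1e-12 for v1, v2 in J))  # True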
Extensive form games describe strategic interactions in which moves may occur sequen-
tially. The two main classes of extensive form games are games of perfect information and
games of imperfect information. The key distinction here is that in the former class of games,
a player’s choices are always immediately observed by his opponents.
Simultaneous moves are modeled by incorporating unobserved moves (see Example 2.2),
and so lead to imperfect information games (but see Section 2.3.4).
Extensive form games may also include chance events, modeled as moves by Nature. For
our categorization, it is most convenient to understand “game of perfect information” to
refer only to games without moves by Nature.
The definition of extensive form games here follows Selten (1975). Osborne and Rubinstein
(1994) define extensive form games without explicitly introducing game trees. Instead,
what we call nodes are identified with the sequences of actions that lead to them. This
approach, which is equivalent to Selten’s, requires somewhat less notation, but at the cost
of being somewhat more abstract.
Example 2.1. Sequential Battle of the Sexes. In this game, player 2 observes player 1’s choice
before making her own choice.
[Game tree: player 1 moves at node x, choosing F or B. After F, player 2 moves at node y, choosing f or b; after B, she moves at node z, choosing f̂ or b̂. The terminal nodes ζ1, ζ2, ζ3, ζ4 yield payoffs (2, 1), (0, 0), (0, 0), (1, 2).] ♦
Example 2.2. (Simultaneous) Battle of the Sexes. Here player 2 does not observe player 1’s
choice before she moves herself. We represent the fact that player 2 cannot tell which of
her decision nodes has been reached by enclosing them in an oval. The decision nodes
are said to be in the same information set.
[Game tree: player 1 chooses F or B; player 2's two decision nodes, one following F and one following B, lie in a single information set, at which she chooses f or b.]
Example 2.3. A simple card game. Players 1 and 2 each bet $1. Player 1 is given a card which
is high or low; each is equally likely. Player 1 sees the card, player 2 doesn’t. Player 1 can
raise the bet to $2 or fold. If player 1 raises, player 2 can call or fold. If player 2 calls, then
player 1 wins if and only if his card is high.
The random assignment of a card is represented as move by Nature, marked in the game
tree as “player 0”. Since player 2 cannot tell whether player 1 has raised with a high card
or a low card, her two decision nodes are in a single information set.
[Game tree: Nature moves at node v, choosing H or L with probability 1/2 each. Player 1 moves at w (after H) or x (after L), choosing Raise or Fold; folding yields payoffs (−1, 1). After a raise, player 2 moves at y or z, which form a single information set, choosing Call or Fold.]
Let X be a set of nodes, and E ⊆ X × X a set of (directed) edges. The edge leading from
node x to node y is written as e = (x, y). We call the pair (X, E) a tree with root r if r has
no incoming edge, and if for each y ≠ r there is a unique path (i.e., sequence of edges
{(x0, x1), (x1, x2), ..., (xn−1, xn)}) from r = x0 to y = xn using only edges in E. Nodes with
outgoing edges are called decision nodes; those without are called terminal nodes. The sets
of these are denoted D and Z, respectively.
[Figure: a game tree, with the root at the top and terminal nodes at the bottom.]
All edges are labeled with actions:
a set of actions A;
a labeling α : E → A that assigns each edge an action, giving distinct actions to distinct
edges leaving the same node (e = (x, y), ê = (x, ŷ), e ≠ ê ⇒ α(e) ≠ α(ê)).
We let Ax = {α(e) : e = (x, y) for some y ∈ X} denote the set of actions available at decision
node x.
Finally, we introduce a set of players, assign each decision node to a player, and specify
each player’s utility at each terminal node.
[Figure: an example game tree: player 1 chooses L or R, player 2 then chooses a or b, and a payoff for each player is given at each terminal node; two of the payoff vectors shown are (0, 4, 2) and (0, 0, 1).]
This completes the definition of an extensive form game of perfect information. Extensive
form games, perfect information or not, are denoted by Γ.
Example 2.4. Sequential Battle of the Sexes: notation. In Example 2.1, we have:
Assignments of decision nodes: D1 = {x}, D2 = {y, z}.
Action sets: Ax = {F, B}, Ay = {f, b}, Az = {f̂, b̂}.
Example of utility: (u1 (ζ1 ), u2 (ζ1 )) = (2, 1). ♦
Moves by Nature
Chance events that occur during the course of play are represented by decision nodes
assigned to an artificial “player 0”, often called Nature, whose choice probabilities are
specified as part of the definition of the game.
To be more precise, the set of decision nodes D is now partitioned into D0 , D1 , . . . , Dn , with
D0 being the set of nodes assigned to Nature. Let x be such a node, so that Ax is the set
of actions available at x. Then the definition of the game includes a probability vector
px ∈ ∆Ax , which for each action a ∈ Ax specifies the probability px (a) that Nature chooses
a.
Although we sometimes refer to Nature as “player 0”, Nature is really just a device for
representing chance events, and should not be regarded as a player. That is, the set of
players is still P = {1, . . . n}.
Perfect recall
One nearly always restricts attention to games satisfying perfect recall: each player remem-
bers everything he once knew, including his own past moves.
To express this requirement formally, write e ≺ y when edge e precedes node y in the game
tree (i.e., when the path from the root node to y includes e). Let Ex = {(x, y) ∈ E : y ∈ X}
be the set of outgoing edges from x. Then perfect recall is defined as follows:
If x ∈ I ∈ Ii, e ∈ Ex, y, ŷ ∈ I′ ∈ Ii, and e ≺ y,
then there exist x̂ ∈ I and ê ∈ Ex̂ such that α(e) = α(ê) and ê ≺ ŷ.
Example 2.6. Here are two games in which perfect recall is violated. The game at right is
known as the absent-minded driver’s problem; see Piccione and Rubinstein (1997).
[Figure: two game trees. In the left tree, player 1 chooses L or R at node x, and his two subsequent nodes y and z lie in a single information set at which he chooses l or r, so he has forgotten his own first move. In the right tree (the absent-minded driver), nodes x and y lie in a single information set: at x, D yields 0 and W continues to y; at y, D yields 4 and W yields 1.] ♦
A player’s strategy specifies exactly one action at each of his information sets I. The player
cannot choose different actions at different nodes in an information set because he is
unable to distinguish these nodes during play.
Formally, a pure strategy si ∈ Si ≡ ∏_{I∈Ii} AI specifies an action si(I) for player i at each of his
information sets I ∈ Ii.
Example 2.7. Sequential Battle of the Sexes: strategies. In Example 2.1, S1 = Ax = {F, B} and
S2 = A y × Az = { f fˆ, f b̂, b fˆ, bb̂}. Note that even if player 1 chooses F, we still must specify
what player 2 would have done at z, so that 1 can evaluate his choices at x. ♦
A pure strategy for an extensive form game must provide a complete description of how a
player intends to play the game, regardless of what the other players do. In other words,
one role of a strategy is to specify a “plan of action” for playing the game.
However, in games where a player may be called upon to move more than once during the
course of play (that is, in games that do not have the single-move property), a pure strategy
contains information that a plan of action does not: it specifies how the player would act at
information sets that are unreachable given his strategy, meaning that the actions specified
by his strategy earlier in the game ensure that the information set is not reached.
[Figure: a centipede-style game: player 1 moves at x, choosing A or B; player 2 moves at y, choosing C or D; player 1 moves again at z, choosing E or F, with E at z leading to payoffs (1, 4).]
Example 2.9. For the game tree below,
Pure strategies: S2 = {L, R} × {l, r} = {Ll, Lr, Rl, Rr};
Mixed strategies: σ2 = (σ2(Ll), σ2(Lr), σ2(Rl), σ2(Rr)) with Σ_{s2∈S2} σ2(s2) = 1.
[Game tree Γ1: player 1 chooses U or D; player 2 moves at node x (after U), choosing L or R, and at node y (after D), choosing l or r.]
Another way to specify random behavior is to suppose that each time a player reaches
an information set I, he randomizes over the actions in AI. (These randomizations are
assumed to be independent of each other and of all other randomizations in the game.)
This way of specifying behavior is called a behavior strategy: βi ∈ ∏_{I∈Ii} ∆AI.
Example 2.10. In the game from Example 2.9,
β2 = (β2^x, β2^y), where β2^x(L) + β2^x(R) = 1 and β2^y(l) + β2^y(r) = 1. ♦
Note that mixed strategies are joint distributions, while behavior strategies are collections
of marginal distributions from which draws are assumed to be independent. (Their
relationship is thus the same as that between correlated strategies and mixed strategy
profiles in normal form games.)
It follows immediately that every distribution over pure strategies generated by a behavior
strategy can also be generated by a mixed strategy. However, the converse statement is
false in general, since behavior strategies cannot generate correlation in choices at different
information sets.
Example 2.11. In the game from Example 2.9, consider the behavior strategy β2 = ((β2^x(L), β2^x(R)),
(β2^y(l), β2^y(r))) = ((1/2, 1/2), (1/3, 2/3)). Since the randomizations at the two nodes are independent,
β2 generates a mixed strategy that is a product distribution over pure strategies, as shown
inside the table below:

                  y
               l (1/3)   r (2/3)
  x  L (1/2)     1/6       1/3
     R (1/2)     1/6       1/3
Or, in notation: σ2(Ll) = β2^x(L) β2^y(l) = 1/2 · 1/3 = 1/6, and similarly for the other pure strategies. ♦
Example 2.12. The mixed strategy σ̂2 = (1/3, 1/6, 0, 1/2) entails correlation between the ran-
domizations at player 2's two decision nodes. In behavior strategies, such correlation is
forbidden. But in this case the correlation is strategically irrelevant, since during a given
play of the game, only one of player 2's decision nodes will be reached. In fact, σ̂2 is also
“strategically equivalent” to β2. We make this idea precise next. ♦
Theorem 2.13 (Kuhn’s (1953) Theorem). Suppose that Γ has perfect recall. Then every mixed
strategy σi is outcome equivalent to some behavior strategy βi .
–76–
For a proof, see González-Díaz et al. (2010).
Suppose one is given a mixed strategy σi . How can one find an equivalent behavior
strategy βi ? Specifically, how do we define βIi (a), the probability placed on action a at
player i’s information set I?
Intuitively, βIi (a) should be the probability that a is chosen at I, conditional on i and his
opponents acting in such a way that I is reached.
Formally, βIi (a) can be defined as follows:
(i) Let σi (I) be the probability that σi places on pure strategies that do not preclude I’s
being reached.
(ii.a) If σi (I) > 0, let σi (I, a) be the probability that σi places on pure strategies that do not
preclude I’s being reached and that specify action a at I. Then let βIi (a) = σi (I, a)/σi (I).
(ii.b) If σi (I) = 0, specify βIi ∈ ∆AI arbitrarily.
In games with the single-move property, a player cannot prevent any of his own informa-
tion sets from being reached, so σi (I) = 1. Thus case (ii.a) always applies, so the procedure
above specifies a unique βi , and in fact this βi is the only behavior strategy that is outcome
equivalent to σi .
Example 2.14. Consider the game from Example 2.9, which has the single-move property.
Consider the mixed strategy σ2 = (σ2(Ll), σ2(Lr), σ2(Rl), σ2(Rr)) = (1/3, 1/3, 1/12, 1/4). Writing this inside
a joint distribution table, the marginal distributions are the equivalent behavior
strategy:

                  y
               l (5/12)   r (7/12)
  x  L (2/3)     1/3        1/3
     R (1/3)     1/12       1/4

In notation: β2^x(L) = σ2(Ll) + σ2(Lr) = 2/3 and β2^y(l) = σ2(Ll) + σ2(Rl) = 5/12.
Example 2.15. Again let σ2 = (1/3, 1/3, 1/12, 1/4) as above, but consider the game below:
[Game tree Γ3: player 2 moves first at x, choosing L or R; R ends the game with payoffs (3, 1). After L, player 1 chooses U or D; D yields (3, 0). After U, player 2 moves again at y, choosing l (payoffs (4, 1)) or r (payoffs (2, 2)).]
Then β2^x(L) = σ2(Ll) + σ2(Lr) = 2/3 as before. This game does not have the single-move prop-
erty, and player 2's second node can only be reached if L is played. In a joint distribution
table, we can represent this fact by crossing out the R row of the table (remembering that
we do this only when computing choice probabilities at y, not at x), and then scaling up
to get the conditional probabilities.

                  y
               l (1/2)   r (1/2)
  x  L (2/3)     1/3       1/3
     R (1/3)    [1/12      1/4]   ← row crossed out when computing behavior at y
♦
In notation, the computation of β2^y(l) is

β2^y(l) = σ2(Ll) / (σ2(Ll) + σ2(Lr)) = (1/3) / (1/3 + 1/3) = 1/2.
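A minimal computational sketch (not from the notes; function and variable names are ours) of steps (i) and (ii.a) of the procedure, applied to player 2's information set y in this game, which is reachable only if L is chosen at x:

def behavior_at_y(sigma2):
    # sigma2 maps pure strategies 'Ll','Lr','Rl','Rr' to probabilities.
    # Step (i): probability of pure strategies not precluding y (those choosing L at x).
    reach = sigma2["Ll"] + sigma2["Lr"]
    if reach == 0:
        return None              # case (ii.b): beta_2^y may be specified arbitrarily
    return sigma2["Ll"] / reach  # case (ii.a): beta_2^y(l) = sigma_2(I, l) / sigma_2(I)

sigma2 = {"Ll": 1/3, "Lr": 1/3, "Rl": 1/12, "Rr": 1/4}
print(behavior_at_y(sigma2))     # 0.5, as computed above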
Example 2.16. In the game from Example 2.15, if σ2 = (0, 0, 1/3, 2/3), then β2^x(L) = 0, but
β2^y is unrestricted, since if 2 plays σ2 her second node is not reached. ♦
When studying extensive form games directly, it is generally easier to work with behavior
strategies than with mixed strategies. Except when the two are being compared, the
notation σi is used to denote both.
Mixed strategies appear when we consider a normal form game G(Γ) derived from an
extensive form game Γ.
To motivate normal form representations, imagine that before play begins, the players
specify complete contingent plans for playing Γ. Each reports his plan to a moderator,
who then implements the plan. Since in this scenario strategies are chosen beforehand,
there is no loss in describing the interaction as a simultaneous move game.
Example 2.17.
[Game tree Γ1: player 1 chooses U or D; after U, player 2 moves at x, choosing L (payoffs (3, 3)) or R (payoffs (2, 1)); after D, she moves at y, choosing l (payoffs (1, 2)) or r (payoffs (4, 4)).]

G(Γ1):
              2
           Ll      Lr      Rl      Rr
      U  3, 3    3, 3    2, 1    2, 1
  1
      D  1, 2    4, 4    1, 2    4, 4   ♦
Example 2.18.
[Game tree Γ2: player 1 chooses U or D; player 2's two nodes, following U and D, lie in a single information set at which she chooses L or R; the terminal payoffs are (3, 3), (2, 1), (1, 2), and (4, 4).]

G(Γ2):
              2
            L       R
      U  3, 3    2, 1
  1
      D  1, 2    4, 4   ♦
Example 2.19.
[Game tree Γ3: as in Example 2.15, player 2 moves at x, choosing L or R; R yields (3, 1); after L, player 1 chooses U or D, with D yielding (3, 0); after U, player 2 moves at y, choosing l (payoffs (4, 1)) or r (payoffs (2, 2)).]

              2
           Ll      Lr      Rl      Rr
      U  4, 1    2, 2    3, 1    3, 1
  1
      D  3, 0    3, 0    3, 1    3, 1
We are not done: the (purely) reduced normal form consolidates equivalent pure strategies:
G(Γ3):
              2
           Ll      Lr      R
      U  4, 1    2, 2    3, 1
  1
      D  3, 0    3, 0    3, 1   ♦
In the previous example, the two pure strategies that were consolidated correspond to the
same “plan of action” (see Section 2.1.2). This is true in generic extensive form games.
Many extensive form games can have the same reduced normal form. For example,
switching the order of moves in Example 2.18 does not change the reduced normal form.
Thompson (1952) and Elmes and Reny (1994) show that two extensive form games with
perfect recall have the same reduced normal form if it is possible to convert one game to
the other by applying three basic transformations: interchange of simultaneous moves,
addition of superfluous moves, and coalescing of information sets.
Equilibrium concepts for normal form games can be applied to a finite extensive form
game Γ by way of the reduced normal form. For instance, strategy profile σ is a Nash
equilibrium of Γ if it is equivalent to a Nash equilibrium of the reduced normal form of Γ.
However, the solution concepts we have seen so far do not address questions of credibility
of commitment that arise in dynamic contexts.
Example 2.20. Entry deterrence. The players are firms, an entrant (1) and an incumbent (2).
The entrant moves first, deciding to stay Out or to Enter the market. If the entrant stays
Out, he gets a payoff of 0, while the Incumbent gets the monopoly profit of 3. If the entrant
Enters, the incumbent must choose between Fighting (so that both players obtain −1) or
Accommodating (so that both players obtain the duopoly profit of 1).
[Game tree: player 1 chooses O (payoffs (0, 3)) or E; after E, player 2 chooses F (payoffs (−1, −1)) or A (payoffs (1, 1)).]

              2
            F         A
      O   0, 3      0, 3
  1
      E  −1, −1     1, 1
If player 1 chooses E, player 2’s only reasonable response is A. Thus F is an empty threat.
But Nash equilibrium does not rule F out.
Suppose we require player 2 to make credible commitments: if she specifies an action, she
must be willing to carry out that action if her decision node is reached. This requirement
would force her to play A, which in turn would lead player 1 to play E. ♦
Once we specify a strategy profile in an extensive form game, certain information sets
cannot be reached. They are said to be off-path.
While all choices at unreached information sets are optimal, these choices can determine
which choices are optimal at information sets that are reached.
To make sure that off-path behavior is reasonable—in particular, that only credible threats
are made—we introduce the principle of sequential rationality: Predictions of play in exten-
sive form games should require optimal behavior starting from every information set, not
just those on the equilibrium path.
We will study two main solution concepts that formalize this principle: subgame perfect
equilibrium for perfect information games (Section 2.3), and sequential equilibrium for
imperfect information games (Section 2.4).
In generic games of perfect information, we can capture the principle of sequential ratio-
nality without the use of equilibrium assumptions, but with strong assumptions about
players’ belief in opponents’ rationality—see Section 2.3.2. Beyond this setting, equilib-
rium knowledge assumptions will be necessary.
2.3.1 Subgame perfect equilibrium, sequential rationality, one-shot deviations, and back-
ward induction
Recall that extensive form game Γ has perfect information if every information set of Γ is a
singleton, and if Γ contains no moves by Nature.
We now formalize the principle of sequential rationality in the context of perfect infor-
mation games by defining three solution concepts. While each definition will seem less
demanding than the previous ones, Theorem 2.21 will show that all three are equivalent.
The first two definitions make use of the notion of a subgame. The subgame of Γ starting
from x, denoted Γx , is the game defined by the portion of Γ that starts from decision node
x ∈ D and includes all subsequent nodes. Intuitively, a subgame is a portion of the game
that one can consider analyzing without reference to the rest of the game. (In games of
imperfect information, not every decision node is the beginning of a subgame: see Section
2.3.4.)
If σ is a strategy profile in Γ, then σ|x denotes the strategy profile that σ induces in subgame
Γx .
Definition (i). Strategy profile σ is a subgame perfect equilibrium of Γ (Selten (1965)) if in each
subgame Γx of Γ, σ|x is a Nash equilibrium.
Definition (ii). Strategy σi is sequentially rational given σ−i if for each decision node x of
player i, (σ|x )i is a best response to (σ|x )−i in Γx . If this is true for every player i, we say that
strategy profile σ itself is sequentially rational.
Definition (iii). Strategy σi admits no profitable one-shot deviations given σ−i if for each decision
node x of player i, player i cannot improve his payoff in Γx by changing his action at node
x but otherwise following strategy (σ|x )i . If this is true for every player i, we say that
strategy profile σ itself admits no profitable one-shot deviations.
Finite games
A finite perfect-information game has a finite number of decision nodes and a finite number
of actions at each decision node.
Theorem 2.21. Let Γ be a finite perfect information game. Then the following are equivalent:
(i) Strategy profile σ is a subgame perfect equilibrium.
(ii) Strategy profile σ is sequentially rational.
(iii) Strategy profile σ admits no profitable one-shot deviations.
The implications (i) ⇒ (ii) ⇒ (iii) follow easily from their definitions, and the implication
(ii) ⇒ (i) is mainly notation-juggling. The implication (iii) ⇒ (ii) is a direct consequence of
a basic result on single-agent sequential choice called the (finite-horizon) one-shot deviation
principle (Theorem 2.28). Thus the principle of sequential rationality has both game-
theoretic content (credibility of commitments) and decision-theoretic content (optimal
sequential decision-making). The proof of Theorem 2.21 and a detailed presentation of
the one-shot deviation principle are provided at the end of this section.
We can find all profiles satisfying (iii) using the following backward induction procedure:
Find a decision node x that is only followed by terminal nodes. Specify an action at
that node that leads to the terminal node z following x that yields the highest payoff for
the owner of node x. Then replace decision node x with terminal node z and repeat the
procedure.
If indifferences occur, the procedure branches, with each branch specifying a different
optimal choice at the point of indifference. Different strategy profiles that survive may
have different outcomes: while the player making the decision is indifferent between his
actions, other players generally are not (see Example 2.35).
The backward induction procedure systematically constructs strategy profiles from which
there are no profitable one-shot deviations: it first ensures this at nodes followed only by
terminal nodes, and then at the nodes before these, and so on.
Observation 2.22. In a finite perfect information game, a strategy profile admits no profitable
one-shot deviations if and only if it survives the backward induction procedure.
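A minimal sketch (not from the notes; the tree encoding and names are ours) of the backward induction procedure, run on the Entry Deterrence game of Example 2.20; ties are broken in favor of the first action encountered, corresponding to one branch of the procedure:

def backward_induction(node):
    # Terminal nodes are payoff tuples; decision nodes are dicts with the
    # owner's index, a node name, and a map from actions to subtrees.
    if isinstance(node, tuple):
        return node, {}
    player, profile = node["player"], {}
    best_action, best_payoffs = None, None
    for action, child in node["actions"].items():
        payoffs, sub_profile = backward_induction(child)
        profile.update(sub_profile)          # keep choices at off-path nodes too
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_action, best_payoffs = action, payoffs
    profile[node["name"]] = best_action      # replace node x by the chosen terminal
    return best_payoffs, profile

entry_deterrence = {
    "name": "x", "player": 0,                # entrant (player 1)
    "actions": {"O": (0, 3),
                "E": {"name": "y", "player": 1,   # incumbent (player 2)
                      "actions": {"F": (-1, -1), "A": (1, 1)}}},
}
print(backward_induction(entry_deterrence))  # ((1, 1), {'y': 'A', 'x': 'E'})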
Remarks:
(i) Since the backward induction procedure always generates at least one strategy
profile, existence of subgame perfect equilibrium is guaranteed. In fact, since it
is always possible to specify a pure action at every point of indifference, a pure
strategy subgame perfect equilibrium always exists.
(ii) If the backward induction procedure never leads to an indifference, it generates
a unique subgame perfect equilibrium, sometimes called the backward induction
solution. This is the case in generic finite perfect information games (specifically,
finite perfect information games in which no player is indifferent between any pair
of terminal nodes).
In this case “subgame perfect equilibrium” is somewhat of a misnomer. This
term suggests that equilibrium knowledge assumptions are needed to justify the
prediction. In Section 2.3.2 we explain why this is not true, but also why the
assumptions that are needed are still rather strong in many cases.
Example 2.23. Entry Deterrence: solution. In the Entry Deterrence game (Example 2.20), the
backward induction procedure selects A for player 2, and hence E for player 1. Thus the
backward induction solution is (E, A). ♦
Example 2.24. Multiple entrants. There are two entrants and an incumbent. The entrants
decide sequentially whether to stay out of the market or enter the market. Entrants who
stay out get 0. If both entrants stay out, the incumbent gets 5. If there is entry, the
incumbent can fight or accommodate. If the incumbent accommodates, per firm profits
are 2 for duopolists and −1 for triopolists. On top of this, fighting costs the incumbent 1
and the entrants who enter 3.
[Game tree: entrant 1 chooses O or E; entrant 2 then chooses o or e (after O) or o′ or e′ (after E); if anyone has entered, incumbent 3 chooses between f and a, f′ and a′, or f″ and a″, depending on the history. Terminal payoffs (to 1, 2, 3): (O, o) → (0, 0, 5); (O, e, f) → (0, −1, 1); (O, e, a) → (0, 2, 2); (E, o′, f′) → (−1, 0, 1); (E, o′, a′) → (2, 0, 2); (E, e′, f″) → (−4, −4, −2); (E, e′, a″) → (−1, −1, −1).]
The backward induction solution of this game is (E, (e, o′), (a, a′, a″)), generating outcome
(2, 0, 2). This is an instance of first mover advantage.
Note that Nash equilibrium does not restrict possible predictions very much in this game:
of the 64 pure strategy profiles, 20 are Nash equilibria of the reduced normal form;
(0, 0, 5), (0, 2, 2), and (2, 0, 2) are Nash equilibrium outcomes. Thus, requiring credibility
of commitments refines the set of predictions substantially.
The normal form payoffs (rows: player 3's strategies; columns: player 2's), shown separately for each of player 1's actions:

1 plays O:
             oo′       oe′       eo′        ee′
  f f′f″   0, 0, 5   0, 0, 5   0, −1, 1   0, −1, 1
  f f′a″   0, 0, 5   0, 0, 5   0, −1, 1   0, −1, 1
  f a′f″   0, 0, 5   0, 0, 5   0, −1, 1   0, −1, 1
  f a′a″   0, 0, 5   0, 0, 5   0, −1, 1   0, −1, 1
  a f′f″   0, 0, 5   0, 0, 5   0, 2, 2    0, 2, 2
  a f′a″   0, 0, 5   0, 0, 5   0, 2, 2    0, 2, 2
  a a′f″   0, 0, 5   0, 0, 5   0, 2, 2    0, 2, 2
  a a′a″   0, 0, 5   0, 0, 5   0, 2, 2    0, 2, 2

1 plays E:
             oo′          oe′           eo′          ee′
  f f′f″   −1, 0, 1    −4, −4, −2    −1, 0, 1    −4, −4, −2
  f f′a″   −1, 0, 1    −1, −1, −1    −1, 0, 1    −1, −1, −1
  f a′f″    2, 0, 2    −4, −4, −2     2, 0, 2    −4, −4, −2
  f a′a″    2, 0, 2    −1, −1, −1     2, 0, 2    −1, −1, −1
  a f′f″   −1, 0, 1    −4, −4, −2    −1, 0, 1    −4, −4, −2
  a f′a″   −1, 0, 1    −1, −1, −1    −1, 0, 1    −1, −1, −1
  a a′f″    2, 0, 2    −4, −4, −2     2, 0, 2    −4, −4, −2
  a a′a″    2, 0, 2    −1, −1, −1     2, 0, 2    −1, −1, −1
♦
Infinite games
Theorem 2.21 continues to hold in games with a finite number of decision nodes but
infinite numbers of actions at each node. But subgame perfect equilibria need not exist
because maximizing actions need not exist. (For instance, consider a one-player game in
which the player chooses x ∈ [0, 1) and receives payoff x.)
Observation 2.22 stated that in a finite perfect information game, if an indifference occurs
during the backward induction procedure, then each choice of action at the point of
indifference leads to a distinct subgame perfect equilibrium. This conclusion may fail
in games with infinite action sets, because some ways of breaking indifferences may be
inconsistent with equilibrium play—see Example 2.37.
Theorem 2.21 also holds in games with infinitely many decision nodes provided that a
mild condition on payoffs is satisfied. We say that a game’s payoffs are continuous at
infinity if for every ε > 0 there exists a K such that choices made after the Kth node of any
play path cannot alter any player’s payoffs by more than ε. This property holds in games
in which payoffs are discounted over time and (undiscounted) payoffs in each period are
bounded—see Example 2.39 and Section 3.
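A minimal numeric sketch (not from the notes; the parameter values are ours): with discount factor δ ∈ (0, 1) and per-period payoffs bounded by M, play after period K changes total discounted payoff by at most δ^K M/(1 − δ), which can be made smaller than any ε:

import math

delta, M, eps = 0.9, 1.0, 1e-3
# Smallest K with delta**K * M / (1 - delta) < eps; note log base delta < 1.
K = math.ceil(math.log(eps * (1 - delta) / M, delta))
print(K, delta**K * M / (1 - delta) < eps)   # e.g. K = 88, True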
Theorem 2.25. Let Γ be an arbitrary perfect information game whose payoffs are continuous at
infinity. Then the following are equivalent:
(i) Strategy profile σ is a subgame perfect equilibrium.
(ii) Strategy profile σ is sequentially rational.
(iii) Strategy profile σ admits no profitable one-shot deviations.
Again, implications (i) ⇒ (ii) ⇒ (iii) follow easily from their definitions, and implication (ii)
⇒ (i) is straightforward to check. Implication (iii) ⇒ (ii) follows from the infinite horizon
one-shot deviation principle, which is an easy extension of its finite-horizon counterpart;
see the end of this section for a proof. In these results, the assumption that payoffs are
continuous at infinity cannot be dispensed with; see Example 2.31.
In games with infinitely many decision nodes, subgame perfect equilibria cannot be found
using backward induction, since there are no last decision nodes from which to launch the
backward induction procedure. Instead, the one-shot deviation principle must be used
directly—again, see Example 2.39 and Section 3.
To begin the proof of Theorem 2.21, we explain why the implications (i) ⇒ (ii) and
(ii) ⇒ (iii) are immediate from the definitions. For the first implication, consider what
definitions (i) and (ii) require in each subgame Γx : in Definition (i), every player’s strategy
must be optimal given the strategies of the others; in definition (ii), only the strategy of
the owner of node x is required to be optimal.
For the second implication, consider the optimization problems the players are required
to solve in definitions (ii) and (iii). In definition (ii), when we evaluate the strategy (σ|x )i
of the owner i of node x in subgame Γx , we must compare the performance of (σ|x )i to that
of each other strategy for the subgame, including ones that specify different actions than
(σ|x )i at multiple decision nodes. Under definition (iii), the only changes in strategy that
are ever considered consist of a change in action at a single decision node.
Thus, the content of Theorem 2.21 is in the reverse implications, (ii) ⇒ (i) and (iii) ⇒ (ii).
Roughly speaking, (ii) implies (i) because in a sequentially rational strategy profile σ, the
optimality in subgame Γx of a strategy (σ|x ) j of a player j who does not own node x can be
deduced from the optimality of j's strategies in subgames Γ y of Γx whose initial nodes
are owned by j. The proof to follow makes this argument explicit.
Proof of Theorem 2.21, (ii) ⇒ (i). We give the proof for pure strategy profiles s; the proof for
mixed strategy profiles is similar.
Suppose that s is a sequentially rational strategy profile, and let x be a decision node of
player i. We need to show that s|x is a Nash equilibrium of Γx . First, note that by definition,
(s|x )i is a best response to (s|x )−i . Second, if a decision node belonging to player j ≠ i is on
the play path under s|x , then let y be the first such node. Since y is the first node of j's that
is reached, j’s choices outside of Γ y do not affect his payoff in Γx when his opponents play
s|x , and sequential rationality implies that (s| y ) j is a best response to (s| y )−j . Together, these
facts imply that (s|x ) j is a best response to (s|x )−j . Third, if no node of player k is reached on
the play path under s|x , then any behavior for k is a best response to (s|x )−k . We therefore
conclude that s|x is a Nash equilibrium of Γx .
As noted earlier, the implication (iii) ⇒ (ii) is a consequence of a fundamental result about
single-agent decision problems called the one-shot deviation principle, which we discuss
next. To understand this decision-theoretic aspect of backward induction, it is easiest
to focus on perfect information games with one player, which we call sequential decision
problems.
Example 2.26. [Figure: a one-player sequential decision problem. At x the player chooses A or B; at y (after A) he chooses C or D; at z (after B) he chooses E or F; at each of the four nodes that follow, he chooses between G and H, I and J, K and L, or M and N.]
Let σ and σ̂ be pure strategies of the lone player in sequential decision problem Γ.
We say that σ̂ is a profitable deviation from σ in subgame Γx if σ̂|x generates a higher payoff
than σ|x in subgame Γx . In this definition, it is essential that payoffs be evaluated from the
vantage of node x, not from the vantage of the initial node of Γ (see Example 2.29).
By definition, σ is sequentially rational in Γ if it does not admit a profitable deviation
in any subgame. Put differently, σ is sequentially rational if for any decision node x, σ|x
yields the lone player the highest payoff available in Γx .
Example 2.27. Sequential decision problem revisited. In the sequential decision problem from
Example 2.26, the sequentially rational strategy is (B, C, F, G, I, K, N). ♦
We call strategy σ̂ a one-shot deviation from strategy σ if it only differs from σ at a single
decision node, say x̂. This one-shot deviation is profitable if it generates a higher payoff
than σ in subgame Γx̂.

Theorem 2.28 (the one-shot deviation principle). In a finite sequential decision problem, (i) strategy σ is sequentially rational if and only if (ii) σ admits no profitable one-shot deviation.

Thus, to construct a strategy that does not admit profitable deviations of any kind, even
ones requiring changes in action at multiple nodes at once, it is enough to apply the
backward induction procedure, which never considers such deviations explicitly.
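To make the procedure concrete, here is a minimal Python sketch of backward induction for a one-player sequential decision problem. The tree encoding and all names are illustrative choices, not from the text: a terminal node is a payoff, and a decision node is a dict mapping actions to subtrees.

def backward_induction(tree):
    # Return (value, plan), where plan maps each decision node's position
    # (the tuple of actions leading to it) to the action chosen there.
    plan = {}

    def solve(node, position):
        if not isinstance(node, dict):          # terminal node: a payoff
            return node
        # Solve every subtree first, then pick a best action at this node.
        values = {a: solve(child, position + (a,)) for a, child in node.items()}
        best = max(values, key=values.get)
        plan[position] = best
        return values[best]

    value = solve(tree, ())
    return value, plan

# Example: a two-stage problem. The plan specifies an action at EVERY node,
# including nodes it never reaches, which is exactly what sequential
# rationality requires.
tree = {"A": {"C": 3, "D": 1}, "B": {"E": 0, "F": 2}}
print(backward_induction(tree))   # (3, {('A',): 'C', ('B',): 'F', (): 'A'})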
–87–
In the context of finite sequential decision problems, the one-shot deviation principle is
simple, but it is not trivial—it requires a proof. (One way to see that the theorem has some
content is to note that it does not extend to infinite sequential decision problems unless
additional assumptions are imposed—see Theorem 2.30 and Example 2.31 below.)
Proof of Theorem 2.28. It is immediate that (i) implies (ii). To prove the converse, suppose
that σ does not admit a profitable one-shot deviation, but that it does admit a profitable
deviation that requires changes in action over T periods of play. By hypothesis, the last
stage of this deviation is not profitable from the vantage of the node at which it is undertaken.
Therefore, following only the first T − 1 periods of the deviation yields at least as good a payoff
from the vantage of the initial stage of the deviation. By the same logic, it is at least as good
to follow only the first T − 2 periods of the deviation, and hence the first T − 3 periods of the
deviation . . . and hence just the first period of the deviation. But since a one-shot deviation
cannot be profitable, we have reached a contradiction.
[Figure: the decision problem of Example 2.29. At node x the player chooses O, ending the game with payoff 0, or I; at node y he chooses L (payoff −1) or R (payoff 1).]
If deviations are always evaluated from the point of view of the initial node x, then there
is no way to improve upon strategy (O, L) by only changing the action played at a single
node. But (O, L) clearly is not sequentially rational. Does this contradict the one-shot
deviation principle?
No. When we consider changing the action at node y from L to R, we should view the
effect of this deviation from the vantage of node y, where it is indeed profitable. The
choice of R at y in turn creates a profitable deviation at x from O to I. ♦
–88–
If σ admits no profitable one-shot deviations, then for each player i, fixing the behavior of his opponents turns each subgame into a single-agent sequential decision problem in which σi admits no profitable one-shot deviation, so Theorem 2.28 implies that σi is sequentially rational in these decision problems. As this
is true for all players, σ is sequentially rational in Γ.
Proof: Clearly (i) implies (ii). To prove the converse, suppose that (i) is false. Then there is a
decision node x whose owner i has an infinite-period deviation from σ that starts at node
x and increases his payoff in subgame Γx by some δ > 0. Since payoffs are continuous at
infinity, this player i must have a finite-period deviation in Γx that increases his payoff by
δ/2 in Γx. But then Theorem 2.28 implies that player i has a profitable one-shot deviation in Γx,
and thus (ii) must also be false.
Example 2.31. Without continuity of payoffs at infinity, the infinite horizon one-shot devi-
ation principle does not hold. For example, suppose that an agent must choose an infinite
sequence of Ls and Rs; his payoff is 1 if he always chooses R and is 0 otherwise. Consider
the strategy “always choose L”. While there is no profitable finite-period deviation from
this strategy, there is obviously a profitable infinite-period deviation. ♦
Under the single-move property, there is no way that a player’s behavior “early” in the
game can provide information about his behavior “later” in the game, since a player who
has already moved will not be called upon to move again. For this reason, in generic
games with the single-move property the backward induction solution can be justified
using mild assumptions. When there are just two players, common certainty of rationality
–89–
is sufficient. (Why common certainty of rationality? “Knowledge” refers to beliefs that
are both certain and correct. In extensive form contexts, using “certainty” rather than
“knowledge” allows for the possibility that a player discovers that his beliefs are incorrect
during the course of play.)
In games with more than two players, one also needs an assumption of epistemic indepen-
dence: obtaining information about one opponent (by observing his move) does not lead a
player to update his beliefs about other opponents (see Stalnaker (1998) and Battigalli and
Siniscalchi (1999)).
In games where the single-move property is not satisfied, so that some player may have
to move more than once during the course of play, common certainty of rationality and
epistemic independence are not enough to justify backward induction. In addition, one
needs to impose an assumption that regardless of what has happened in the past, there
continues to be common certainty that behavior will be rational in the future. See Halpern
(2001) and Asheim (2002) for formalizations of this idea. This assumption of “undying
common belief in future rational play” is sufficient to justify backward induction so long
as indifferences never arise during the backward induction procedure. This assumption is
strong, and sometimes leads to counterintuitive conclusions—see Example 2.34. But the
assumption is of a different nature than the equilibrium knowledge assumption needed to
justify Nash equilibrium play, as coordination of different players’ beliefs is not assumed.
[Figure: the game of Example 2.32: player 1 chooses C or S, player 2 chooses c or s, and player 1 then chooses C′ or S′; if all three moves continue, play ends with payoffs (1, 5).]
The backward induction solution of this game is ((S, S′), s). For player 1 to prefer to play S,
he must expect player 2 to play s if her node is reached. Is it possible for a rational player
2 to do this?
To answer this question, we have to consider what player 2 might think if her decision
node is reached: specifically, what she would conclude about player 1’s rationality, and
about what player 1 will do at the final decision node. One possibility is that player 2
thinks that a rational player 1 would play S at his initial node, and that a player 1 who
plays C there is irrational, and in particular would play C′ at the final node if given the
–90–
opportunity. If this is what player 2 would think if her node were reached, then she would
play c there.
If player 1 is rational and anticipates such a reaction by player 2, he will play C, resulting
in the play of ((C, S′), c). Indeed, this prediction is consistent with the assumption of
common certainty of rationality at the start of play—see Ben-Porath (1997). Therefore,
stronger assumptions of the sort described before this example are needed to ensure the
backward induction solution. ♦
[Figure: the three-player version of the game: player 1 chooses C or S, player 2 chooses c or s, and player 3 chooses C or S; if all continue, play ends with payoffs (1, 5, 1).]
The game above is a three-player version of the one from Example 2.32, with the third
node being assigned to player 3, and with player 3’s payoffs being identical to player
1's. The game's backward induction solution is (S, s, S). In this game, if player 2's node
is reached, she may become doubtful of player 1’s rationality, but there is no obvious
reason for her to doubt player 3’s rationality. Thus unlike in Example 2.32, the backward
induction prediction can be justified by common certainty of rationality and epistemic
independence. ♦
[Figure: the Centipede game of Example 2.34: the two players alternate choosing between Continue and Stop (S) over eight decision nodes.]
In the backward induction solution, everyone always stops, yielding payoffs of (0, 0). ♦
–91–
(i) This example raises a general critique of backward induction logic: Suppose you
are player 2, and you get to go. Since 1 deviated once, why not expect him to
deviate again?
(ii) The outcome (0, 0) is not only the backward induction solution; it is also the unique
Nash equilibrium outcome.
(iii) In experiments, subjects typically continue for several rounds, and so do better than the equilibrium prediction (McKelvey and Palfrey (1992)).
(iv) There are augmented versions of the game in which some Continuing occurs in
equilibrium: see Kreps et al. (1982) (cf. Example 2.53 below) and Jehiel (2005).
(v) Stable cooperative behavior arises in populations of optimizing agents if one makes
realistic assumptions about the information agents possess (Sandholm et al. (2016)).
When indifferences arise during the backward induction procedure, the procedure branches,
with each branch leading to a distinct subgame perfect equilibrium. To justify subgame
perfect equilibrium predictions in such cases, one must resort to equilibrium knowledge
assumptions to coordinate beliefs about what indifferent players will do.
Example 2.35. Discipline by an indifferent parent. Player 1 is a child and player 2 the parent.
The child chooses to Behave or to Misbehave. If the child Misbehaves, the parent chooses
to Punish or Not to punish; she is indifferent between these options, but the child is not.
[Figure: the game tree. At node x the child chooses B, ending the game with payoffs (0, 1), or M; at node y the parent chooses P, yielding payoffs (−1, 0), or N, yielding payoffs (1, 0).]
One can argue that the parent should be able to credibly commit to punishing—see Tranæs
(1998). ♦
–92–
2.3.3 Applications
–93–
Part (ii) tells us that in a long game, there is a first mover advantage.
The same limit result holds for odd T̂:

(u∗1, u∗2) = ( (1 − δ^{T̂+1})/(1 + δ), (δ + δ^{T̂+1})/(1 + δ) ) → ( 1/(1 + δ), δ/(1 + δ) ) as T̂ → ∞.

(Why? We saw that if T is even, 1 offers last, and his period 0 offer is

(*) ( (1 + δ^{T+1})/(1 + δ), (δ − δ^{T+1})/(1 + δ) ).

–94–

If T̂ = T + 1 is odd, then 2 offers last; since when she makes her period 1 offer there are
T periods to go, this offer is obtained by reversing (*):

( (δ − δ^{T+1})/(1 + δ), (1 + δ^{T+1})/(1 + δ) ) = ( (δ − δ^{T̂})/(1 + δ), (1 + δ^{T̂})/(1 + δ) ).

Thus, in period 0, 1's offer to 2 is u∗2 = δ · (1 + δ^{T̂})/(1 + δ) = (δ + δ^{T̂+1})/(1 + δ), and he keeps u∗1 = 1 − (δ + δ^{T̂+1})/(1 + δ) = (1 − δ^{T̂+1})/(1 + δ) for
himself.) ♦
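The finite-horizon formulas can be confirmed numerically. Below is a small Python sketch (the discount factor and the horizons are illustrative choices) that computes the period 0 proposer's share by backward induction and compares it with the closed forms above.

def proposer_share(T, d):
    # SPE share of the current proposer when offers can be made in
    # periods 0, 1, ..., T: offer the receiver exactly the discounted
    # value of being the proposer tomorrow.
    if T == 0:
        return 1.0                  # the last proposer keeps everything
    return 1.0 - d * proposer_share(T - 1, d)

d = 0.9
for T in [2, 3, 10, 11, 50, 51]:    # even: player 1 offers last; odd: player 2 does
    u1 = proposer_share(T, d)       # player 1 proposes in period 0
    closed = (1 + d**(T + 1)) / (1 + d) if T % 2 == 0 else (1 - d**(T + 1)) / (1 + d)
    print(T, round(u1, 6), round(closed, 6))   # the two columns agree
# As T grows, u1 -> 1/(1+d) = 0.526316..., matching the infinite-horizon result.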
Remark: The argument above shows that the unique subgame perfect equilibrium has
each player making offers that cause the opponent to be indifferent, and has the opponent
accepting such offers. The fact that the opponent always accepts when indifferent may
seem strange, but with a continuum of possible offers, it is necessary for equilibrium to
exist. It is comforting to know that this equilibrium can be viewed as the limit of equilibria
of games with finer and finer discrete strategy sets, equilibria in which the opponent rejects
offers that leave her indifferent.
Propositions 2.38 and 2.40 show that equilibrium payoffs in alternating-offer bargaining
games are “continuous at infinity”. (Continuity of equilibrium payoffs at infinity does not
hold in all games—see Section 3.1.)
Verifying that the specified strategy profile is a subgame perfect equilibrium is a simple
application of the one-shot deviation principle, and showing that it is uniquely so requires
surprisingly little calculation. This is because of the recursive structure of the game: each
even-period subgame "looks the same" as the full game, and each odd-period subgame "looks
the same" as the period 1 subgame (cf. Section 3.1). It is also helpful that subgames in even
and odd periods "look the same" apart from the identity of the proposer.
Proof of Proposition 2.40: Consider any period t ∈ {0, 1, 2, . . .} in which player i is the
proposer and player j the receiver. We argue in terms of continuation payoffs: that is, we
proceed as though the game started in period t, so that we need not bother discounting
everything by δt . The one-shot deviation principle tells us that to verify that a strategy
profile is a subgame perfect equilibrium, we need only consider deviations occurring at
one node at a time. To show that the strategy profile specified in the proposition is a
subgame perfect equilibrium, note first that player j could obtain a payoff of δ · 1/(1 + δ) by
declining the offer and becoming the proposer next period.
–95–
Thus it is not profitable for player j to deviate from accepting an offer of δ/(1 + δ) today.
Knowing this, it is optimal for player i to offer δ/(1 + δ) to player j, since this is the smallest
amount player j will accept, and since offering player j less than this will result in player i
obtaining a payoff of δ · δ/(1 + δ) < 1/(1 + δ).
The proof of uniqueness follows Shaked and Sutton (1984). Let ūi and ui be the supremum
and infimum of the set of player i’s subgame perfect equilibrium payoffs.
In any period t of any subgame perfect equilibrium, the receiver j will accept any offer
worth at least δū j to her. Thus it cannot be optimal for the proposer i to offer j more than
this, or, equivalently, to keep less than 1 − δū j for himself. This provides a lower bound
on player i’s possible subgame perfect equilibrium payoffs:
(15) ui ≥ 1 − δū j .
Likewise, in any period t of any subgame perfect equilibrium, the receiver j will reject any
offer worth less than δu j to her. Therefore, if the proposer i makes an offer that is accepted
in the equilibrium, the offer gives j at least δu j , and so gives i at most 1 − δu j . Alternatively,
if i makes an offer that is rejected, then j will obtain a payoff of at least δ · u j , and so i will
receive a payoff of at most δ · (1 − u j ) < 1 − δu j . These facts yield an upper bound on player
i’s equilibrium payoff:
(16) ūi ≤ 1 − δu j .
To avoid confusion, we now refer to the players as k and ℓ. Using inequalities (15) and
(16), we obtain the following bounds on uk and ūk:

uk ≥ 1 − δūℓ ≥ 1 − δ(1 − δuk)  and  ūk ≤ 1 − δuℓ ≤ 1 − δ(1 − δūk).

Rearranging the first chain gives uk ≥ 1/(1 + δ), and rearranging the second gives ūk ≤ 1/(1 + δ).
Thus uk ≥ 1/(1 + δ) ≥ ūk ≥ uk, so all of the inequalities bind. We conclude that equilibrium must
be of the form specified in the proposition. Since player 1 is the proposer in period 0, it is
he who obtains the equilibrium payoff of 1/(1 + δ). ♦
To discuss subgame perfect equilibrium more generally, we must specify what is meant
by a subgame in general extensive form games.
–96–
A subgame Γ′ of a general extensive form game Γ is a subset of Γ which
(i) contains a decision node, all succeeding edges and nodes, and no others; and
(ii) does not tear information sets: if x ∈ Γ′ and x, y ∈ I, then y ∈ Γ′.
Example 2.41. Below, a, b, and i are the initial nodes of subgames; the other nodes are not.
[Figure: a game tree with initial node a (player 1); nodes b, e, and f belong to player 2; nodes c, d, and i belong to player 1; and nodes g and h belong to player 3, with {c, d} and {g, h} forming nontrivial information sets.] ♦
Example 2.42. The game tree below has three subgames: the whole game, and the portions
of the game starting after (H, t) and (T, h).
[Figure: player 1 chooses H or T, and player 2 simultaneously chooses h or t. After (H, t) and after (T, h), player 1 chooses U or D and player 2 simultaneously chooses u or d.]
♦
Subgame perfect equilibrium is completely adequate for analyzing games of perfect infor-
mation. It is also completely adequate in games of stagewise perfect information (also called
multistage games with observable actions), which generalize perfect information games by
allowing multiple players to move at once at any point in play, with all actions being ob-
served immediately afterward. Thus nontrivial information sets are only used to capture
–97–
simultaneous choices. Example 2.42 has stagewise perfect information, but Example 2.41
does not.
The key point is that in stagewise perfect information games, once one fixes the behavior of
a player’s opponents, the player himself faces a collection of sequential decision problems,
implying that we can determine whether the player’s behavior is sequentially rational by
looking for profitable one-shot deviations; in particular, Theorem 2.21 extends to the
present class of games; see also Example 2.48 below.
Once simultaneous moves are possible, performing “backward induction” requires us to
find all equilibria at each stage of the game, allowing the analysis to branch whenever mul-
tiple equilibria are found. Equilibrium knowledge assumptions are needed to coordinate
beliefs about how play will proceed.
(Infinitely) repeated games (Section 3) are the simplest infinite-horizon version of games
of stagewise perfect information, so subgame perfect equilibrium is also the appropriate
solution concept for these games. Since players discount future payoffs, payoffs are
continuous at infinity in the sense defined in Section 2.3.1. Because of this, the one-shot
deviation principle applies, and the conclusions of Theorem 2.25 hold for repeated games.
As previously noted, backward induction cannot be used to solve games with infinite
numbers of decision nodes; instead, the one-shot deviation principle must be applied
directly.
We will see next that in general games of imperfect information, subgame perfect equilib-
rium is no longer sufficient to capture sequential rationality.
–98–
[Figure: the Entry deterrence II game. The entrant (player 1) chooses O, ending the game with payoffs (0, 2), or M or B; after M or B, the incumbent (player 2) chooses F or A at the information set {x, y}. The game's normal form:

        F          A
O     0, 2       0, 2
M    −1, −1      2, 1
B    −3, −1      4, 0 ]
The entrant can choose stay out (O), enter meekly (M), or enter boldly (B). The incumbent
cannot observe which sort of entry has occurred, but is better off accommodating either
way.
This game has two components of Nash equilibria, (B, A) and (O, σ2(F) ≥ 2/3).
Since the whole game is the only subgame, all of these Nash equilibria are subgame perfect
equilibria. ♦
To heed the principle of sequential rationality in this example, we need to ensure that
player 2 behaves rationally if her information set is reached.
To accomplish this, we will require players to form beliefs about where they are in an
information set, regardless of whether the information set is reached in equilibrium. We
then will require optimal behavior at each information set given these beliefs.
By introducing appropriate restrictions on allowable beliefs, we ultimately will define
the notion of sequential equilibrium, the fundamental equilibrium concept for general
extensive form games.
–99–
at her two nodes differed.
Beliefs
Given a strategy profile σ, we can compute the probability Pσ(x) that each node x ∈ X is
reached. Let Pσ(I) = Σ_{x∈I} Pσ(x).

The belief profile µ is Bayesian given profile σ if µi(x) = Pσ(x)/Pσ(I) whenever Pσ(I) > 0. In words:
beliefs are determined by conditional probabilities whenever possible.
[Figure: player 1 plays O, M, and B with probabilities 1/4, 1/4, and 1/2; player 2's information set I = {x, y} follows M and B, and she chooses F or A there. Here Pσ(I) = 3/4, so Bayesian beliefs are µ2(x) = 1/3 and µ2(y) = 2/3.]
At information sets I on the path of play, Bayesian beliefs describe the conditional proba-
bilities of nodes being reached.
But beliefs are most important at unreached information sets. In this case, they represent
“conditional probabilities” after a probability zero event has occurred.
In Section 2.4.3 we impose a key restriction on allowable beliefs in just this circumstance.
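As a concrete illustration, here is a small Python sketch (the function and node names are illustrative) that computes Bayesian beliefs from reach probabilities, reproducing the beliefs in the figure above.

def bayesian_beliefs(node_probs):
    # Map reach probabilities P_sigma(x) on an information set to beliefs
    # mu(x) = P_sigma(x) / P_sigma(I); defined only when P_sigma(I) > 0.
    total = sum(node_probs.values())
    if total == 0:
        raise ValueError("information set unreached: Bayes places no restriction")
    return {x: p / total for x, p in node_probs.items()}

print(bayesian_beliefs({"x": 1/4, "y": 1/2}))   # {'x': 0.333..., 'y': 0.666...}, i.e. (1/3, 2/3)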
–100–
Sequential rationality
Given a node x, let ui (σ|x) denote player i’s expected utility under strategy profile σ
conditional on node x being reached.
We call strategy σi rational starting from information set I ∈ Ii given σ−i and µi if

(17) Σ_{x∈I} µi(x) ui(σi, σ−i | x) ≥ Σ_{x∈I} µi(x) ui(σ̂i, σ−i | x) for all σ̂i.
In words: starting at information set I, strategy σi generates the best possible payoff for
player i given his beliefs at I and the other players’ strategies. (Remark: The definition
only depends on choices under σi and σ−i from information set I onward, and on beliefs
µi at information set I.)
If (17) holds for every information set I ∈ Ii , we call strategy σi sequentially rational given σ−i
and µi . If for a given σ and µ this is true for all players, we call strategy profile σ sequentially
rational given µ.
If the information set I = {x} is a singleton (as many are), then beliefs are trivial, so (17)
reduces to

ui(σi, σ−i | x) ≥ ui(σ̂i, σ−i | x) for all σ̂i.

Thus the full glory of (17) is only needed at nontrivial information sets.
A pair (σ, µ) consisting of a strategy profile σ and a belief profile µ is called an assessment.
The assessment (σ, µ) is a weak sequential equilibrium if
(i) µ is Bayesian given σ.
(ii) σ is sequentially rational given µ.
Remarks:
1. One can verify that σ is a Nash equilibrium if and only if (i) there are beliefs µ that
are Bayesian given σ and (ii) for each player i and each information set I ∈ Ii on
the path of play (i.e., with Pσ (I) > 0), σi is rational at information set I ∈ Ii given σ−i
and µi. (For instance, in Example 2.43, if player 1 plays O, player 2 need not play
a best response given her beliefs, so (O, F) is Nash.) It follows that weak sequential
equilibrium is a refinement of Nash equilibrium.
2. The concept we call “weak sequential equilibrium” is called different names by
different authors: “perfect Bayesian equilibrium”, “weak perfect Bayesian equi-
librium”. . . Moreover, these names are also used by different authors to mean
–101–
slightly different things. But fortunately, everyone agrees about the meaning of
“sequential equilibrium”, and this is the important concept.
Example 2.45. Entry deterrence II once more. Now player 2 must play a best response to
some beliefs µ2 . Since A is dominant at this information set, 2 must play it. Thus 1 must
play B, and so ((B, A), µ2 (y) = 1) is the unique weak sequential equilibrium. ♦
[Figure: game Γ of Example 2.46. The entrant first chooses O, ending the game with payoffs (0, 2), or E; after E he chooses F or C; the incumbent then chooses f or a at the information set {x, y}. Payoffs: (E, F, f) → (−5, 3), (E, F, a) → (−5, 2), (E, C, f) → (−2, −1), (E, C, a) → (2, 1).]
In this game the entrant moves in two stages. First, he chooses between staying out (O)
and entering (E); if he enters, he then chooses between entering foolishly (F) and entering
cleverly (C). Entering foolishly (i.e., choosing (E, F)) is strictly dominated.
If the entrant enters, the incumbent cannot observe which kind of entry has occurred.
Fighting ( f ) is optimal against foolish entry, but accommodating (a) is optimal against
clever entry.
This is a game of stagewise perfect information, so it can be analyzed fully using subgame
perfect equilibrium. The unique subgame perfect equilibrium of the game is ((E, C), a):
player 1 plays the dominant strategy C in the subgame, so 2 plays a, and so 1 plays E.
This corresponds to a weak sequential equilibrium (with beliefs µ2 (y) = 1 determined by
taking conditional probabilities).
In addition, ((O, ·), σ2(f) ≥ 1/2) are Nash equilibria.
These Nash equilibria correspond to a component of weak sequential equilibria (that are
not subgame perfect!):
–102–
((O, C), f) with µ2(x) ≥ 2/3;
((O, C), σ2(f) ≥ 1/2) with µ2(x) = 2/3.
Why are these weak sequential equilibria? If 1 plays O, condition (i) places no restriction
on beliefs at 2's information set. Therefore, player 2 can put all weight on x, despite the
fact that F is a dominated strategy. ♦
Definition

Belief profile µ is consistent given σ if there is a sequence of completely mixed strategy profiles σk converging to σ whose associated Bayesian belief profiles µk converge to µ. The assessment (σ, µ) is a sequential equilibrium (Kreps and Wilson (1982)) if (i) σ is sequentially rational given µ, and (ii) µ is consistent given σ.

Remarks
(i) The “there exists” in the definition of consistency is important. In general, one can
support more than one consistent belief profile µ for a given strategy profile σ by
considering different perturbed strategy profiles σk .
Later we consider perfect and proper equilibrium, which also use “there exists”
definitions (see the discussion before Example 2.62). We also consider KM stability,
which instead uses a “for all” definition, and so imposes a more demanding sort
of robustness.
(ii) Battigalli (1996) and Kohlberg and Reny (1997) offer characterizations of consistency
that clarify what this requirement entails.
Example 2.46 revisited. Profile ((E, C), a) is a subgame perfect equilibrium and a weak se-
quential equilibrium (with µ2(y) = 1). To see whether it is a sequential equilibrium, we
–103–
must check that µ2 (y) = 1 satisfies (ii). To do this, we must construct an appropriate se-
quence of perturbed strategy profiles, compute the implied beliefs by taking conditional
probabilities, and show that we get µ2 (y) = 1 in the limit.
The assessments (((O, C), f), µ2(x) ≥ 2/3) are weak sequential equilibria (but not subgame
perfect equilibria). Are they sequential equilibria? Take any sequence of completely mixed
strategies for player 1:

σ1(O) = 1 − εE, σ1(E) = εE, σ1(F) = εF, σ1(C) = 1 − εF  =⇒  µ2(y) = εE(1 − εF)/εE = 1 − εF → 1.

Since any such sequence converging to (O, C) must have εF → 0, consistency forces µ2(y) = 1.
Thus beliefs with µ2(x) ≥ 2/3 are not consistent, and these assessments are not sequential
equilibria. (An analogous computation for ((E, C), a) also yields µ2(y) = 1 − εF → 1, so the
belief µ2(y) = 1 is consistent and that profile is a sequential equilibrium.)
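The perturbation computation above can also be checked numerically. In the Python sketch below (the tremble sizes are illustrative), the implied belief tends to 1 no matter how quickly the trembles vanish.

def mu2_y(eps_E, eps_F):
    p_x = eps_E * eps_F            # reach x: play E, then F
    p_y = eps_E * (1 - eps_F)      # reach y: play E, then C
    return p_y / (p_x + p_y)

for k in range(1, 6):
    eps = 10.0 ** (-k)
    print(k, mu2_y(eps, eps))      # 0.9, 0.99, ..., converging to 1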
It can be shown that for any strategy profile σ, there exists a corresponding profile of
consistent beliefs µ. So if one can rule out all but one profile of beliefs as candidates for
consistency, this surviving profile must be consistent. This ruling out can be accomplished
using the following implications of consistency: if µ is consistent given σ, then
(i) µ is Bayesian given σ. In fact, if all information sets are reached under σ, then µ is
consistent if and only if it is Bayesian (cf. the previous example).
(ii) Preconsistency: If by changing his strategy a player can force one of his own un-
reached information sets to be reached, his beliefs there should be as if he did
so.
More precisely: If player i's information set I′ follows his information set I, and
if i has a deviation σ̂i such that play under (σ̂i, σ−i) starting from I reaches I′ with
positive probability, then i's beliefs at I′ are determined by conditional probabilities
under (σ̂i, σ−i) starting from I. (One can show that the beliefs at I′ so defined are
independent of the deviation σ̂i specified.)
Example 2.47. Below, preconsistency requires µ1(x) = 1/2 = µ1(y). The same would be true
if the tree shown here were an unreached portion of a larger game.
–104–
[Figure: player 1 chooses O or I; after I, player 2 plays L and R with probability 1/2 each, leading to player 1's information set {x, y}.]
Example 2.48. In games of stagewise perfect information (Section 2.3.4), the players play a
sequence of simultaneous move games during stages t = 0, 1, . . . , T, with the choices
made in each stage being observed before the next stage begins, and where the game that
is played in stage t may depend on the choices made in previous stages. (Moves by Nature
can also be introduced at the start of each stage.)
Let Γ be such a game. Each history describing all actions played through stage t − 1 < T
in Γ determines a stage t subgame of Γ. In every stage T subgame, stagewise consistency
ensures that each player’s beliefs about the other players’ choices in the subgame are
given by their strategies. Therefore, each player’s behavior in the subgame is sequentially
rational if and only if the players’ joint behavior in the subgame is a Nash equilibrium.
Applying backward induction then shows that in any game with stagewise perfect infor-
–105–
mation, the set of subgame perfect equilibria and the set of sequential equilibrium strategy
profiles are identical. ♦
(v) Cross-player consistency: Players with the same information must have the same
beliefs about opponents’ deviations.
Example 2.49. In the game tree below, if player 1 chooses A, consistency requires that
µ2 (y) = µ3 ( ŷ). The reason is that both players’ beliefs are generated by the same perturba-
tion of player 1’s strategy.
[Figure: player 1 chooses A, B, or C; player 2's nodes y (after B) and z (after C) follow, where she plays L with probability 2/3 and R with probability 1/3; player 3's information set {ŷ, ẑ} follows the play of L at y and z.]
One can regard cross-player consistency as an extension of the usual equilibrium assump-
tion: In a Nash equilibrium, players agree about opponents’ behavior off the equilibrium
path. Cross-player consistency requires them to agree about the relative likelihoods of
opponents’ deviations from equilibrium play.
Theorem 2.50 (Hendon et al. (1996)). Let Γ be a finite extensive form game with perfect recall,
let σ be a strategy profile for Γ, and suppose that belief profile µ is preconsistent given σ. Then σ is
sequentially rational given µ if and only if no player i has a profitable one-shot deviation from σi
given σ−i and µi .
–106–
Relationships among extensive form refinements
Kreps and Wilson (1982) show that all sequential equilibrium strategy profiles are subgame
perfect equilibria. Combining this with earlier facts gives us the following diagram of the
relationships among refinements.
Remark: Although sequential equilibrium does not allow the play of strictly dominated
strategies, the set of sequential equilibria of a game Γ may differ from that of a game Γ0 that
is obtained from Γ by removing a strictly dominated strategy: see Section 2.5, especially
Examples 2.57 and 2.59.
Example 2.51. Ace-King-Queen Poker is a two-player card game that is played using a
deck consisting of three cards: an Ace (the high card), a King (the middle card), and a
Queen (the low card). Play proceeds as follows:
–107–
(i) Draw the extensive form of this game.
(ii) Find all sequential equilibria of this game.
(iii) If you could choose whether to be player 1 or player 2 in this game, which player
would you choose to be?
(i) See below. The probabilities of Nature's moves (which are 1/6 each) are not shown.
(This is an example of an (extensive form) Bayesian game. The natural interpretation is an
ex ante one, in which the players do not know their types until play begins. See Section
4.4 for further discussion.)
(ii) The unique sequential equilibrium is

σ1(R|A) = σ1(R|K) = 1, σ1(R|Q) = 1/3;  σ2(c|a) = 1, σ2(c|k) = 1/3, σ2(f|q) = 1,

with the corresponding Bayesian beliefs. (Here, σ2(c|a) is the probability that a player 2 of
type a chooses c in the event that player 1 raises. Below, µ2 (A|k) will represent the probability
that player 2 assigns to player 1 being of type A when player 2 is of type k and player 1
raises.)
To begin the analysis, notice that R is dominant for (player 1 of) type A, c is dominant
for type a, and f is dominant for type q. In addition, R is dominant for type K: If type
K raises, the worst scenario for him has player 2 call with an Ace and fold with a Queen
(as we just said she would). Thus type K's lowest possible expected payoff from raising is
(1/2)(−2) + (1/2)(1) = −1/2, which exceeds the payoff of −1 that type K would get from folding.
All that remains is to determine σ1 (R|Q) and σ2 (c|k).
Suppose σ2 (c|k) = 1. Then σ1 (F|Q) = 1, which implies that µ2 (A|k) = 1, in which case 2
should choose f given k, a contradiction.
Suppose σ2(c|k) = 0. Then σ1(F|Q) = 0, which implies that µ2(A|k) = 1/2, in which case 2
should choose c given k, a contradiction.
It follows that player 2 must be mixing when she is of type k. For this to be true, it must
be that when she is of type k, her expected payoffs to calling and folding are equal; this is
true when µ2(A|k) · (−2) + (1 − µ2(A|k)) · 2 = −1, and hence when µ2(A|k) = 3/4. For this to be
her belief, it must be that σ1(R|Q) = 1/3.
For player 1 to mix when he is of type Q, his expected payoffs to F and to R must be equal.
This is true when

−1 = (1/2) · (−2) + (1/2) · [σ2(c|k) · (−2) + (1 − σ2(c|k)) · 1],

which yields σ2(c|k) = 1/3.
–108–
[Figure: the extensive form of Ace-King-Queen Poker. Nature chooses one of the six type profiles Ak, Aq, Ka, Kq, Qa, Qk; player 1 (holding A, K, or Q) chooses R or F, with F ending the game with payoffs (−1, 1). After R, player 2 (holding a, k, or q) chooses c or f; f yields (1, −1), while c yields (2, −2) if player 1 holds the higher card and (−2, 2) if player 2 does.]
Therefore, since this is a zero-sum game, player 2's expected payoff is 1/9, and so it is better
to be player 2. ♦
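The equilibrium value is easy to confirm by direct computation. The Python sketch below (identifiers are illustrative) encodes the equilibrium strategies derived above and averages player 1's payoff over the six equally likely type profiles, using exact arithmetic.

from fractions import Fraction as F

raise_prob = {"A": F(1), "K": F(1), "Q": F(1, 3)}   # sigma_1(R | type)
call_prob  = {"a": F(1), "k": F(1, 3), "q": F(0)}   # sigma_2(c | type)
rank = {"A": 3, "K": 2, "Q": 1, "a": 3, "k": 2, "q": 1}

def u1(t1, t2):
    # Player 1's expected payoff at a fixed type profile: folding pays -1,
    # a raise that is called pays +/-2, a raise that is folded to pays +1.
    r, c = raise_prob[t1], call_prob[t2]
    win = 2 if rank[t1] > rank[t2] else -2
    return (1 - r) * (-1) + r * (c * win + (1 - c) * 1)

profiles = [(a, b) for a in "AKQ" for b in "akq" if rank[a] != rank[b]]
value = sum(u1(a, b) for a, b in profiles) / len(profiles)
print(value)    # -1/9, so player 2's expected payoff is +1/9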
Example 2.52. Suppose we modify Ace-King-Queen Poker (Example 2.51) as follows: In-
stead of choosing between Raise and Fold, player 1 chooses between Raise and Laydown
(L). A choice of L means that the game ends, the players show their cards, and the player
with the higher card wins the pot. Find the sequential equilibria of this game. Which
player would you choose to be?
The unique sequential equilibrium is now

σ1(R|A) = 1, σ1(L|K) = 1, σ1(R|Q) = 1/3;  σ2(c|a) = 1, σ2(c|k) = 1/3, σ2(f|q) = 1,

with the corresponding Bayesian beliefs. That is, the only change from part (ii) (apart
from replacing F with L) is that player 1 now chooses L when he receives a King.
–109–
To see why, note that compared to the original game, the payoffs in this game are only
different when 1 chooses L after Ak, Aq, and Kq: in these cases he now gets 1 instead of
−1. As before, c is dominant for type a, and f is dominant for type q. Given these choices
of player 2, the unique best response for type K is now L. (The reason is that player 1's
expected payoff from choosing L when he has a King is (1/2)(−1) + (1/2)(1) = 0, which exceeds
his expected payoff from R of (1/2)(−2) + (1/2)(1) = −1/2.)
Unlike before, R is now only weakly dominant for A: since σ2(f|q) = 1, L is a best response
for type A when σ2(f|k) = 1. But if σ2(f|k) = 1, then σ1(R|Q) = 1, which implies that
µ2(A|k) ≤ 1/2, which implies that 2 should choose c given k, a contradiction. Therefore,
σ2 ( f |k) < 1, which implies that σ1 (R|A) = 1. Thus, all that is left is to determine σ1 (R|Q)
and σ2 (c|k), and this is done exactly as before.
In computing player 1's equilibrium payoff, the only change is that when the type profile
is Ka, player 1 gets a payoff of −1 rather than −2. This changes the expected payoff in the game
from −1/9 to −1/9 + 1/6 = 1/18. Therefore, the change in the rules makes it preferable to be player
1. ♦
Example 2.53. Centipede with a possibly altruistic opponent (Myerson (1991), based on Kreps
et al. (1982)).
[Figure: the game tree. Nature chooses player 2's type: normal (tn) with probability 19/20 or altruistic (ta) with probability 1/20. Player 1's first information set is {w, x}, where he chooses S (payoffs (0, 0)) or C; the normal type then chooses s (payoffs (−1, 5)) or c, and the altruistic type chooses ŝ (payoffs (−1, 4)) or ĉ. Player 1's second information set is {y, z}, where he chooses S′ (payoffs (4, 4) at y, (4, 8) at z) or C′; finally, player 2 chooses between s′ (payoffs (3, 9) at y; (3, 12) at z) and c′ (payoffs (8, 8) at y; (8, 16) at z).]
Start with a four-round Centipede game (Example 2.34) in which continuing costs you 1
but benefits your opponent 5. (If this were the whole game, "always stop" would be the unique subgame
perfect equilibrium.)
–110–
Suppose that player 1 assigns probability 1/20 to player 2 being altruistic, meaning that she
cares about the total amount of money the players receive.
(This is an example of an (extensive form) Bayesian game. While we could give this game
an ex ante interpretation, in which player 2 does not know her type until play begins, it
is more natural to give it an interim interpretation, in which player 2 is normal, and the
initial node is there to capture player 1’s uncertainty about player 2’s type. See Section 4.4
for further discussion.)
Notation for actions: capital letters for player 1; hats for the altruistic type of player 2,
primes for a player’s second information set.
Computation of equilibria:
The easy parts: µ1(w) = 19/20 (since beliefs must be Bayesian);
σ2(ĉ) = σ2(ĉ′) = 1 (by sequential rationality);
σ2(s′) = 1 (by sequential rationality).
From Bayesian beliefs: If σ1(C) > 0, then

(†)  µ1(y) = Pσ(y)/(Pσ(y) + Pσ(z)) = [(19/20)σ1(C)σ2(c)] / [(19/20)σ1(C)σ2(c) + (1/20)σ1(C)] = 19σ2(c)/(19σ2(c) + 1).
And if σ1 (C) = 0, preconsistency implies that (†) still describes the only consistent beliefs.
One can check that in equilibrium player 1 must mix at his second information set {y, z}. For him to be willing to do so, he must be indifferent there between S′ and C′; given σ2(s′) = σ2(ĉ′) = 1, this requires 4 = 3µ1(y) + 8(1 − µ1(y)), and hence µ1(y) = 4/5.

Thus equation (†) implies that 19σ2(c)/(19σ2(c) + 1) = 4/5, and hence that σ2(c) = 4/19.
–111–
For 2 to be willing to mix, she must be indifferent between s and c:

5 = (1 − σ1(C′)) · 4 + σ1(C′) · 9  ⇒  σ1(C′) = 1/5.

Finally, given these strategies one can compute that player 1's expected payoff at his first
information set is 1/4 for playing C, which exceeds the payoff of 0 from S. Therefore, player 1
continues, and the unique sequential equilibrium is

(C, (4/5)S′ + (1/5)C′), ((15/19)s + (4/19)c, s′, ĉ, ĉ′)  with µ1(w) = 19/20, µ1(y) = 4/5.
The point of the example: By making 1 uncertain about 2’s type, we make him willing to
continue. But this also makes 2 willing to continue, since by doing so she can keep player
1 uncertain and thereby prolong the game.
Kreps et al. (1982) introduced this idea to show that introducing a small amount of
uncertainty about one player’s preferences can lead to long initial runs of cooperation in
finitely repeated Prisoner’s Dilemmas (Example 3.2). ♦
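The indifference conditions behind this equilibrium can be verified mechanically. The Python sketch below checks them with exact arithmetic, using the values derived above.

from fractions import Fraction as F

sigma2_c  = F(4, 19)     # normal type 2 continues at her first node
sigma1_Cp = F(1, 5)      # player 1 continues at his second information set

# (dagger): Bayesian belief at player 1's second information set.
mu1_y = 19 * sigma2_c / (19 * sigma2_c + 1)
assert mu1_y == F(4, 5)

# Player 1 indifferent between S' (worth 4) and C': at y the normal type
# stops (s'), worth 3; at z the altruist continues, worth 8.
assert mu1_y * 3 + (1 - mu1_y) * 8 == 4

# Normal type 2 indifferent between s (worth 5) and c:
assert (1 - sigma1_Cp) * 4 + sigma1_Cp * 9 == 5
print("all indifference conditions check out")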
Example 2.54. Consider the following game played between two unions and a firm. First,
union 1 decides either to make a concession to the firm or not to make a concession. Union
2 observes union 1’s action, and then itself decides whether or not to make a concession
to the firm. The firm then chooses between production plan a and production plan b. The
firm has two information sets: it either observes that both unions have made concessions,
or it observes that at least one union did not make a concession.
Each union obtains $4 if the firm chooses plan a and $0 if the firm chooses plan b. In
addition, each union loses $2 for making a concession. If the firm chooses plan a, it
obtains $2 for each union that makes a concession. If the firm chooses plan b, it obtains
$1 for certain, and receives an additional $1 if both unions make concessions. Utilities are
linear in dollars.
(i) Draw an extensive form representation of this game, and find all of its sequential
equilibria.
(ii) Now suppose that union 2 cannot observe union 1’s decision. Draw an appropriate
extensive form for this new game, and compute all of its sequential equilibria.
(iii) The games in part (i) and (ii) each have a unique sequential equilibrium outcome,
but the choices made on these equilibrium paths are quite different. Explain in
words why the choices made on the equilibrium path in (i) cannot be made on the
equilibrium path in (ii), and vice versa. Evidently, the differences here must hinge
on whether or not player 2 can observe a deviation by player 1.
–112–
(i) The extensive form game is below.

[Figure: union 1 chooses C or D; union 2 observes this and chooses c or d after C, and ĉ or d̂ after D. After (C, c) the firm chooses A or B at a singleton node; the nodes x, y, z following (C, d), (D, ĉ), (D, d̂) form the firm's information set I, where it chooses a or b. Payoffs (union 1, union 2, firm): A → (2, 2, 4), B → (−2, −2, 2); at x: a → (2, 4, 2), b → (−2, 0, 1); at y: a → (4, 2, 2), b → (0, −2, 1); at z: a → (4, 4, 0), b → (0, 0, 1).]
In any sequential equilibrium, 3 chooses A and 2 chooses d̂. To proceed, we split the
analysis into cases according to 3's behavior at her right information set, which we call
I. Notice that b is a best response for player 3 if and only if 1 ≥ 2(1 − µ3(z)), which is
equivalent to µ3(z) ≥ 1/2.
If 3 plays a, then 2 plays d, so 1 plays D. But then 3 prefers b. Contradiction.
If 3 plays b, then 2 plays c and 1 plays C, and so I is unreached. For b to be a best response
for 3, it must be that µ3(z) ≥ 1/2. Since 2 chooses d̂, parsimony implies that µ3(y) = 0. Given
our freedom in choosing the perturbations ε_D^k and ε_d^k, it is not difficult to verify that any
beliefs satisfying µ3(z) ≥ 1/2 and µ3(y) = 0 are consistent. Thus (C, (c, d̂), (A, b)) with µ3(z) ≥ 1/2
and µ3(y) = 0 are sequential equilibria.
Now suppose that 3 plays a mixed strategy. Then µ3(z) = 1/2. We divide the analysis into
subcases:
First, suppose that I is off the equilibrium path: that is, 2 plays c and 1 plays C. We noted
above that the belief µ3(z) = 1/2 and µ3(y) = 0 is consistent in this case. Choosing c is
optimal for 2 if 4σ3(a) ≤ 2, or equivalently if σ3(a) ≤ 1/2; the same condition describes when
choosing C is optimal for 1. Thus, (C, (c, d̂), (A, σ3(b) ≥ 1/2)) with µ3(z) = 1/2 and µ3(y) = 0 are
sequential equilibria.
Second, suppose that I is on the equilibrium path. In this case, since beliefs are Bayesian
and 2 plays d̂, we have that

µ3(z) = σ1(D) / [(1 − σ1(D))σ2(d) + σ1(D)].

Setting this equal to 1/2 and rearranging yields

(†)  σ2(d) = σ1(D)/(1 − σ1(D)).
–113–
We split again into subcases:
If 2 plays c, then (†) implies that 1 plays C, in which case I is off the equilibrium path.
Contradiction.
If 2 plays d, then (†) implies that 1 plays (1/2)C + (1/2)D, but 1's best response is D. Contradiction.
If 2 mixes, then she is indifferent, which implies that σ3(a) = 1/2. Taking this into account,
we find that C is optimal for 1 if and only if 2σ2 (c) ≥ 2. Since 2 is mixing (σ2 (c) < 1), it
follows that 1 chooses D. But this contradicts (†).
To sum up, there is a single component of sequential equilibria: (C, (c, d̂), (A, σ3(b) ≥ 1/2)),
with µ3(y) = 0 and µ3(z) ≥ 1/2 (and with µ3(z) = 1/2 whenever 3 mixes at I). The equilibrium
outcome is (C, c, A), with payoffs (2, 2, 4).

(ii) The extensive form game is below. The analysis parallels part (i); the case that requires
work is the one in which I is on the equilibrium path and 3 mixes there, so that µ3(z) = 1/2.
Since beliefs are Bayesian, with Pσ(x) = (1 − σ1(D))σ2(d), Pσ(y) = σ1(D)(1 − σ2(d)), and
Pσ(z) = σ1(D)σ2(d), the requirement µ3(z) = 1/2 is equivalent to

(‡)  3σ1(D)σ2(d) = σ1(D) + σ2(d).

This equation defines a hyperbola in the plane.
–114–
[Figure: the part (ii) game. Union 1 chooses C or D; union 2, at her information set {v, w}, chooses c or d without observing 1's choice. After (C, c) the firm chooses A or B; the nodes x, y, z following (C, d), (D, c), (D, d) form the firm's information set I, where it chooses a or b. Payoffs are as in part (i).]
One point of intersection of the hyperbola with the unit square (where legitimate mixed
strategies live) is the point (σ1(D), σ2(d)) = (0, 0), but these choices leave I unreached, a
contradiction. The remaining points of intersection have positive components, allowing us
to rewrite (‡) as
(§)  3 = 1/σ1(D) + 1/σ2(d).
Along this curve σ2(d) increases as σ1(D) decreases, and the curve includes the points
(σ1(D), σ2(d)) = (1, 1/2), (2/3, 2/3), and (1/2, 1).
A calculation shows that for player 1 to prefer D and player 2 to prefer d, we must have

(a)  4σ3(a)(1 − σ2(d)) ≥ 2 − 4σ2(d)  and  (b)  4σ3(a)(1 − σ1(D)) ≥ 2 − 4σ1(D),

respectively. We now show that (a) and (b) are inconsistent with (§). If (σ1(D), σ2(d)) =
(1, 1/2), then since player 2 is mixing, (b) must bind, but since σ1(D) = 1, it does not, a
contradiction. Similar reasoning shows that (σ1(D), σ2(d)) = (1/2, 1) is impossible. Thus
for (§) to hold, players 1 and 2 must both be mixing, and so (a) and (b) must both bind.
It follows that σ2(d) = σ1(D), and hence, from (§), that σ2(d) = σ1(D) = 2/3. But then the
equality in (a) implies that σ3(a) = −1/2, which is impossible.
Thus, the unique sequential equilibrium is (D, d, (A, b)) (with µ2 (w) = 1 and µ3 (z) = 1),
which generates payoffs (0, 0, 1). Evidently, if player 2 cannot observe player 1’s choice,
all three players are worse off.
(iii) Why can't (D, d̂, b) be chosen on the equilibrium path in part (i)? If player 3 plays
(A, b), then player 2 will play c at her left node, while d̂ is always a best response at her
right node. If player 1 is planning to play D, he knows that when he switches to C, player
2 will observe this and play c rather than d̂, which makes this deviation profitable.
Why can’t (C, c, A) be chosen on the equilibrium path in part (ii)? If 1 and 2 are playing
(C, c), player 3’s information set I is unreached. If a deviation causes I to be reached, then
–115–
since 2 cannot observe 1's choice, it follows from parsimony that 3 cannot believe that
this is the result of a double deviation leading to z. Thus 3 must play a at I. Since 2
anticipates that 3 will play a at I, 2 is better off deviating to d. (The same logic shows that
1 is also better off deviating to D.) ♦
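One can also confirm the part (i) equilibrium mechanically via the one-shot deviation principle for assessments (Theorem 2.50). The Python sketch below (the payoff encoding and names are illustrative) checks every player's single-node deviations against (C, (c, d̂), (A, b)) with beliefs µ3(x) = 1/2, µ3(y) = 0, µ3(z) = 1/2.

pay = {  # terminal payoffs (union 1, union 2, firm), indexed by action path
    ("C", "c", "A"): (2, 2, 4),   ("C", "c", "B"): (-2, -2, 2),
    ("C", "d", "a"): (2, 4, 2),   ("C", "d", "b"): (-2, 0, 1),    # node x
    ("D", "ch", "a"): (4, 2, 2),  ("D", "ch", "b"): (0, -2, 1),   # node y
    ("D", "dh", "a"): (4, 4, 0),  ("D", "dh", "b"): (0, 0, 1),    # node z
}
eq = {"1": "C", "2C": "c", "2D": "dh", "3cc": "A", "3I": "b"}
mu3 = {"x": 0.5, "y": 0.0, "z": 0.5}            # firm's beliefs at I

def outcome(a1, a2C, a2D, a3cc, a3I):
    a2 = a2C if a1 == "C" else a2D
    a3 = a3cc if (a1, a2) == ("C", "c") else a3I
    return pay[(a1, a2, a3)]

base = outcome(*(eq[k] for k in ("1", "2C", "2D", "3cc", "3I")))
assert base == (2, 2, 4)
# One-shot deviations for union 1, union 2 (both nodes), and the firm's
# singleton node; each comparison is made from the deviating node onward.
assert base[0] >= outcome("D", eq["2C"], eq["2D"], eq["3cc"], eq["3I"])[0]
assert base[1] >= outcome("C", "d", eq["2D"], eq["3cc"], eq["3I"])[1]
assert pay[("D", "dh", "b")][1] >= pay[("D", "ch", "b")][1]
assert pay[("C", "c", "A")][2] >= pay[("C", "c", "B")][2]
# The firm at I: b pays 1 at every node; a pays 2 at x and y, and 0 at z.
assert 1.0 >= mu3["x"] * 2 + mu3["y"] * 2 + mu3["z"] * 0
print("no profitable one-shot deviations")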
In the next result, a sequential equilibrium outcome is the distribution over terminal nodes
generated by some sequential equilibrium.
Theorem 2.56 (Kreps and Wilson (1982), Kohlberg and Mertens (1986)).
(i) The set of sequential equilibrium strategy profiles of Γ consists of a finite number of connected
components.
(ii) In games with generic choices of payoffs, there are a finite number of sequential equilibrium
outcomes; in particular, the outcome is constant on each component of sequential equilibria.
For more on the (semialgebraic) structure of the set of sequential equilibria and solution
sets for other equilibrium concepts, see Blume and Zame (1994).
Theorem 2.56(ii) is also true if we replace “sequential equilibrium” with “Nash equilib-
rium”. The restriction to generic choices of payoffs in Γ rather than in G is important:
reduced normal forms of most extensive form games have payoff ties, and thus are non-
generic in the space of normal form games. Notice also that the result is only about the
set of equilibrium outcomes; often, components of equilibria contain an infinite number
of strategy profiles which differ in their specifications of off-path behavior.
There are fundamental difficulties with defining sequential equilibrium in games with
infinite action spaces. One basic problem is that with continuous action sets, one cannot
perturb players’ strategies in a way that makes every play path have positive probability. A
definition of sequential equilibrium for this setting has recently been proposed by Myerson
and Reny (2015), who “consider limits of strategy profiles that are approximately optimal
(among all strategies in the game) on finite sets of events that can be observed by players
in the game”.
So far we have:
• defined Nash equilibrium for normal form games.
• used this definition (and the notion of the reduced normal form game) to define
Nash equilibrium for extensive form games.
–116–
• considered refinements of Nash equilibrium for extensive form games to capture
the principle of sequential rationality.
In the coming sections, we supplement equilibrium and sequential rationality by intro-
ducing additional principles for analyzing behavior in games.
Example 2.57. Battle of the Sexes with an outside option. Consider the following game Γ and
its reduced normal form G(Γ):
[Figure: game Γ. Player 1 chooses O, yielding payoffs (2, 2), or I; after I, player 1 chooses T or B and player 2 simultaneously chooses L or R.] Its reduced normal form G(Γ):

        L        R
O     2, 2     2, 2
T     3, 1     0, 0
B     0, 0     1, 3
Since Γ has stagewise perfect information, all three subgame perfect equilibria of Γ are
sequential equilibria when combined with appropriate beliefs (cf. Example 2.48).
(Why? ((I, T), L) has µ(x) = 1 (since x is reached), ((O, B), R) has µ(y) = 1 (by parsimony),
and ((O, (3/4)T + (1/4)B), (1/4)L + (3/4)R) has µ(x) = 3/4 (which is easy to compute directly).)
Nevertheless, only one of the three equilibria seems reasonable: If player 1 enters the
subgame, he is giving up a certain payoff of 2. Realizing this, player 2 should expect him
to play T, and then play L herself. We therefore should expect ((I, T), L) to be played. ♦
Kohlberg and Mertens (1986) use this example to introduce the idea of forward induction:
“Essentially what is involved here is an argument of ‘forward induction’: a subgame
should not be treated as a separate game, because it was preceded by a very specific form
of preplay communication—the play leading to the subgame. In the above example, it is
common knowledge that, when player 2 has to play in the subgame, preplay communi-
cation (for the subgame) has effectively ended with the following message from player 1
–117–
to player 2: ‘Look, I had the opportunity to get 2 for sure, and nevertheless I decided to
play in this subgame, and my move is already made. And we both know that you can no
longer talk to me, because we are in the game, and my move is made. So think now well,
and make your decision.”
“Speeches” of this sort are often used to motivate forward induction arguments.
In the example above, forward induction can be captured by requiring that an equilibrium
persist after a strictly dominated strategy is removed. Notice that strategy (I, B) is strictly
dominated for player 1. If we remove this strategy, the unique subgame perfect equi-
librium (and hence sequential equilibrium) is ((I, T), L). This example shows that none
of these solution concepts is robust to the removal of strictly dominated strategies, and
hence to a weak form of forward induction.
In general, capturing forward induction requires more than persistence after the removal
of dominated strategies.
A stronger form of forward induction is captured by equilibrium dominance: an equilibrium
should persist after a strategy that is suboptimal given the equilibrium outcome is removed
(see Sections 2.5.2 and 2.7).
A general definition of forward induction for all extensive form games has been provided
by Govindan and Wilson (2009).
For intuition, GW say:
“Forward induction should ensure that a player’s belief assigns positive probability only
to a restricted set of strategies of other players. In each case, the restricted set comprises
strategies that satisfy minimal criteria for rational play.”
GW’s formal definition is along these lines:
“A player’s pure strategy is called relevant for an outcome of a game in extensive form
with perfect recall if there exists a weakly sequential equilibrium with that outcome for
which the strategy is an optimal reply at every information set it does not exclude. The
outcome satisfies forward induction if it results from a weakly sequential equilibrium
in which players’ beliefs assign positive probability only to relevant strategies at each
information set reached by a profile of relevant strategies.”
One can capture forward induction in Example 2.57 by applying iterated removal of
weakly dominated strategies to the normal form G(Γ): B is strictly dominated for player
1; once B is removed, R is weakly dominated for player 2; once this is removed, O is
strictly dominated for player 1, yielding the prediction (T, L). Furthermore, iterated weak
dominance is also powerful in the context of generic perfect information games, where
–118–
applying it to the reduced normal form yields the backward induction outcome, though
not necessarily the game’s backward induction solution: see Osborne and Rubinstein
(1994), Marx and Swinkels (1997), and Østerdal (2005). But as we noted in Section 1.2.4,
iterated removal of strategies and cautiousness conflict with one another. A resolution to
this conflict is provided by Brandenburger et al. (2008), who provide epistemic foundations
for the iterated removal of weakly dominated strategies.
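The procedure just described is easy to mechanize. The Python sketch below (helper names are illustrative; removal is simultaneous within each round, which is harmless in this example) applies iterated removal of weakly dominated strategies to G(Γ) and recovers (T, L).

u1 = {("O", "L"): 2, ("O", "R"): 2, ("T", "L"): 3, ("T", "R"): 0,
      ("B", "L"): 0, ("B", "R"): 1}
u2 = {("O", "L"): 2, ("O", "R"): 2, ("T", "L"): 1, ("T", "R"): 0,
      ("B", "L"): 0, ("B", "R"): 3}

def weakly_dominated(strats, opp_strats, payoff, mine):
    # Pure strategies of player `mine` weakly dominated by another pure strategy.
    def u(s, o):
        return payoff[(s, o) if mine == 1 else (o, s)]
    return {s for s in strats for t in strats if t != s
            and all(u(t, o) >= u(s, o) for o in opp_strats)
            and any(u(t, o) > u(s, o) for o in opp_strats)}

S1, S2 = {"O", "T", "B"}, {"L", "R"}
while True:
    d1 = weakly_dominated(S1, S2, u1, 1)
    d2 = weakly_dominated(S2, S1, u2, 2)
    if not d1 and not d2:
        break
    S1, S2 = S1 - d1, S2 - d2
print(S1, S2)   # {'T'} {'L'}: the forward induction outcome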
Closely related to iterated weak dominance is extensive form rationalizability (Pearce (1984);
Battigalli (1997)), which is based on there being common knowledge that players hold
a hierarchy of hypotheses about how opponents will act, and that observing behavior
inconsistent with the current hypothesis leads a player to proceed to the next unfalsified
hypothesis in his hierarchy. Extensive form rationalizability generates the backward
induction outcome (though not necessarily the backward induction solution) in generic
perfect information games, and leads to the forward induction outcome in Example 2.57.
Epistemic foundations for extensive form rationalizability are provided by Battigalli and
Siniscalchi (2002).
In a signaling game,
(i) Player 1 (the sender) receives a private signal (his type) and then chooses an action
(a message).
(ii) Player 2 (the receiver), observing only the message, chooses an action herself (a
response).
These games have many applications (to labor, IO, bargaining problems, etc.)
In signaling games, sequential equilibrium fails to adequately restrict predictions of play.
We therefore introduce new refinements that capture forward induction, and that take the
form of additional restrictions on out-of-equilibrium beliefs.
[Figure: a schematic signaling game. Nature draws the sender's type, ta with probability πa or tb with probability πb; the sender chooses message m1 or m2; the receiver, observing only the message, chooses a response from {r1, r2, r3}.]
–119–
P = {1, 2} the players (1 = the sender, 2 = the receiver)
T finite set of player 1’s types
π prior distribution over types; π(t) > 0 for all t ∈ T
If message m is sent, the receiver's expected utility from response r given beliefs µ2^m is

Σ_{t∈T} u2(t, m, r) µ2^m(t).
Proposition 2.58. In a signaling game, any Bayesian beliefs are consistent, and so every weak
sequential equilibrium is a sequential equilibrium.
This proposition says that any beliefs after an unsent message can be justified by intro-
ducing an appropriate sequence of perturbed strategies for player 1. Verifying this is a
good exercise.
–120–
Recall that a sequential equilibrium outcome is the distribution over terminal nodes
generated by some sequential equilibrium. We now focus on signaling games with a
finite number of equilibrium outcomes. By Theorem 2.56, this is true for generic choices
of payoffs, and it implies that the equilibrium outcome is constant on each of the finite
number of components of sequential equilibrium strategy profiles.
Below we attempt to eliminate entire components of equilibria with unused messages as
inconsistent with forward induction. (Surviving components may become smaller too.)
A weak form of forward induction rules out equilibria that vanish after a dominated
strategy is removed.
Example 2.59.
[Figure: the signaling game. Nature draws type ta with probability 3/4 or tb with probability 1/4. Each type chooses O, ending the game with payoffs (0, 0), or I; after I, the receiver chooses U or D at a single information set. Payoffs (sender, receiver): for ta, (I, U) → (−1, 1) and (I, D) → (−1, 0); for tb, (I, U) → (−1, 0) and (I, D) → (1, 1).]
Suppose I is played. This message is strictly dominated for ta , so the receiver really ought
to believe she is facing tb (i.e., µ2 (b) = 1), breaking the bad equilibria.
Story: Suppose you are tb. If you deviate, you tell the receiver: "ta would never want to
deviate. If you see a deviation it must be me, so you should play D."
If we introduce the requirement that equilibria be robust to the removal of dominated
strategies, the bad equilibria are eliminated: If we eliminate action I for type ta , then
player 2 must play D, and so type tb plays I.
Computation of equilibria:
–121–
We can treat each type of player 1 as a separate player.
Strategy O is strictly dominant for type ta , so he plays this in any sequential equilibrium.
Now consider type tb . If σ1b (I) > 0, then µ2 (tb ) = 1, so player 2 must play D, implying that
type tb plays I. Equilibrium.
If type tb plays O, then player 2’s information set is unreached, so her beliefs are unre-
stricted. Also, for O to be type tb's best response, it must be that 0 ≥ −σ2(U) + (1 − σ2(U)),
or equivalently that σ2(U) ≥ 1/2. σ2(U) = 1 is justified for player 2 whenever µ2(a) ≥ 1/2,
while σ2(U) ∈ [1/2, 1) is justified whenever µ2(a) = 1/2. These combinations are sequential
equilibria. ♦
Stronger forms of forward induction are based on equilibrium dominance: they rule out
components of equilibria that vanish after a strategy that is not a best response at any
equilibrium in the component is removed.
[Figure: the signaling game of Example 2.60 between a broker and an investor. Nature draws type th with probability 2/3 or tl with probability 1/3; the broker announces H or L; the investor chooses I (invest) or D (don't). Payoffs (broker, investor): for th, (H, I) → (4, 2), (H, D) → (1, 0), (L, I) → (3, 2), (L, D) → (0, 0); for tl, (H, I) → (3, −3), (H, D) → (0, 0), (L, I) → (4, −3), (L, D) → (1, 0).]
–122–
The investor receives a payoff of 2 from investing in a high-quality property, −3 from
investing in a low-quality one, and a payoff of 0 for not investing. The broker's payoff is
the sum of up to two terms: he gets a payoff of 3 if the investor invests or 0 if she does
not, and an additional payoff of 1 if his announcement about the quality of the investment
is truthful.
Components of sequential equilibria (the computation is below):

(1)  σ1h(H) = 1 = σ1ℓ(H);  σ2H(I) = 1;  σ2L(D) = 1 (≥ 1/3);  µ2L(tℓ) ≥ 2/5 (= 2/5).
(2)  σ1h(L) = 1 = σ1ℓ(L);  σ2L(I) = 1;  σ2H(D) = 1 (≥ 1/3);  µ2H(tℓ) ≥ 2/5 (= 2/5).

(In each line, the parenthetical values apply when the receiver mixes after the unused message.)
Are the equilibria in component (2) reasonable? Imagine that players expect such an
equilibrium to be played. Then a salesman with a low-quality property is getting his
highest possible payoff by reporting honestly; dishonestly sending message H can only
reduce his payoff. Therefore, if the investor receives message H, she should conclude that it
was sent by a salesman with a high-quality property, and so should invest. Expecting
this, a salesman with a high-quality property should deviate to his honest message H. To
sum up, the fact that the salesman with a high-quality property wants to reveal this to the
investor should lead his preferred equilibrium to be played.
In Example 2.59, certain beliefs were deemed unreasonable because they were based on
expecting a particular sender type to play a dominated strategy. This is not the case
here: H is not dominated by L for t` . Instead, we fixed the component of equilibria
under consideration, and then concluded that certain beliefs are unreasonable given the
anticipation of equilibrium payoffs: the possible payoffs to H for t` (which are 3 and 0) are
all smaller than this type’s equilibrium payoff to L (which is 4), so a receiver who sees H
should not think he is facing t` .
Computation of equilibria:
We divide the analysis into cases according to the choices of types th and t` .
(L, L). In this case µ2L(th) = 2/3 and hence s2L = I. µ2H is unrestricted. Playing L gives tℓ his
best payoff. Type th weakly prefers L iff 3 ≥ 4(1 − σ2H(D)) + σ2H(D), and hence iff σ2H(D) ≥ 1/3.
σ2H(D) = 1 is justified if µ2H(tℓ) ≥ 2/5, while σ2H(D) ∈ [1/3, 1) is justified if µ2H(tℓ) = 2/5. These
combinations form a component of sequential equilibria.
–123–
(H, L). In this case s2H = I and s2L = D. Thus type tℓ obtains 1 for playing L and 3 for playing
H, and so deviates to H. Contradiction.
(mix, H). In this case µ2L(th) = 1, so s2L = I, in which case tℓ strictly prefers L. Contradiction.
Fix a component of sequential equilibria of signaling game Γ, and let u∗1a be the payoff
received by type ta on this component. (Recall that we are restricting attention to games
in which payoffs are constant on every equilibrium component.)
–124–
(I) Let Dm = {ta ∈ T : u∗1a > max_{r ∈ BR2m(T)} u1(ta, m, r)}: Dm is the set of types for whom
message m is dominated by the equilibrium, given that the receiver behaves reasonably.
(II) If for some unused message m with Dm ≠ T and some type tb, we have

u∗1b < min_{r ∈ BR2m(T∖Dm)} u1(tb, m, r),

then the component of equilibria fails the Cho-Kreps criterion (a.k.a. the intuitive criterion).
Type tb would exceed his equilibrium payoff by playing message m if the receiver
played any best response to beliefs that exclude the types in Dm.
Example 2.60 revisited. Applying the Cho-Kreps criterion in the Beer-Quiche game.
Component (2): H is unused.
BR2H({th, tℓ}) = {I, D}.
u∗1ℓ = 4 > 3 = u1ℓ(H, I) > 0 = u1ℓ(H, D): tℓ cannot benefit from deviating to H.
But u∗1h = 3 < 4 = u1h(H, I) (*): th could benefit from deviating to H.
⇒ DH = {tℓ} and T − DH = {th}: only th could benefit from deviating to H.
BR2H({th}) = {I} ⇒ by (*), these equilibria fail the Cho-Kreps criterion.
(In words: Type tℓ is getting his highest payoff in the equilibrium. Thus if a deviation to
H occurs, the receiver should believe that it is type th, and so should play I. Given this,
type th prefers to deviate, since he will get a payoff of 4 rather than 3. This breaks the
equilibrium.)
Component (1): L is unused. Here u∗1h = 4 is type th's highest possible payoff in the game,
so DL = {th} and BR2L({tℓ}) = {D}; since u∗1ℓ = 3 > 1 = u1ℓ(L, D), type tℓ cannot benefit from
deviating to L, and these equilibria satisfy the criterion.
(In words: Type th is getting his highest payoff in the equilibrium. Thus if a deviation
to L occurs, the receiver should believe that it is type tℓ, and so should play D. This is
consistent with the equilibrium.) ♦
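The Cho-Kreps test is mechanical enough to check by machine. The Python sketch below (identifiers are illustrative; since the receiver has only two responses, best responses to point beliefs suffice to compute the relevant best-response sets) reproduces the verdict for component (2).

u_sender = {  # (type, message, response) -> broker's payoff
    ("th", "H", "I"): 4, ("th", "H", "D"): 1, ("th", "L", "I"): 3, ("th", "L", "D"): 0,
    ("tl", "H", "I"): 3, ("tl", "H", "D"): 0, ("tl", "L", "I"): 4, ("tl", "L", "D"): 1,
}
u_receiver = {("th", "I"): 2, ("tl", "I"): -3, ("th", "D"): 0, ("tl", "D"): 0}

def best_responses(types):
    # Receiver pure best responses to SOME belief concentrated on `types`.
    # With only two responses, checking point beliefs is enough.
    brs = set()
    for t in types:
        best = max(u_receiver[(t, r)] for r in "ID")
        brs |= {r for r in "ID" if u_receiver[(t, r)] == best}
    return brs

u_star = {"th": 3, "tl": 4}        # equilibrium payoffs in component (2)
m = "H"                            # the unused message

# D_m: types whose equilibrium payoff beats every reasonable payoff from m.
D_m = {t for t in u_star
       if u_star[t] > max(u_sender[(t, m, r)] for r in best_responses({"th", "tl"}))}
assert D_m == {"tl"}

survivors = {"th", "tl"} - D_m
fails = any(u_star[t] < min(u_sender[(t, m, r)] for r in best_responses(survivors))
            for t in survivors)
print("component (2) fails the Cho-Kreps criterion:", fails)   # True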
Further refinements
–125–
For a set of types I ⊆ T, let MBR2m(I) ⊆ ∆Rm be the set of (possibly mixed) responses to
message m that are optimal for the receiver under some beliefs that put probability 1 on
the sender's type being in I. Formally:

MBR2m(I) = ⋃_{µ2m : µ2m(I) = 1} {σ̂2m ∈ ∆Rm : support(σ̂2m) ⊆ BR2m(µ2m)}.
–126–
2.6 Invariance and Proper Equilibrium
Backward induction, invariance, and normal form refinements
The principle of invariance requires predictions of play in extensive form games with the
same reduced normal form to be the same.
The idea is that games with the same normal form differ only in terms of how they are
presented, so a theory of rational play should not treat them differently.
Example 2.61. Entry deterrence III rerevisited.
[Figure: games Γ and Γ′. In Γ (as in Example 2.46), player 1 chooses O, ending the game with payoffs (0, 2), or E, and after E chooses F or C, with player 2 then choosing f or a at the information set {x, y}. In Γ′, player 1 instead chooses among O, F, and C at a single node, with player 2 choosing f or a at the information set {x, y} after F or C. Both games have the same reduced normal form:

        f          a
O     0, 2       0, 2
F    −5, 3      −5, 2
C    −2, −1      2, 1 ]
We saw earlier that in game Γ, ((E, C), a) is the unique subgame perfect equilibrium and
the unique sequential equilibrium (with µ2(y) = 1). There are additional weak sequential
equilibria in which player 1 plays O: namely, ((O, C), f) with µ2(x) ≥ 2/3, and ((O, C), σ2(f) ≥ 1/2)
with µ2(x) = 2/3.
Notice that Γ and Γ′ differ only in how player 1's choices are presented. In particular,
both of these games have the same reduced normal form: G(Γ) = G(Γ′).
In Γ′, consistency places no restrictions on beliefs. Therefore, all weak sequential equilibria
of Γ above correspond to sequential equilibria in Γ′!
What is going on? When F and C are by themselves at a decision node, consistency
forces player 2 to discriminate between them. But in Γ′, F and C appear as choices at the
same decision node as O, so when player 1 chooses O, consistency does not discriminate
between F and C. ♦
One way to respect invariance is to perform analyses directly on reduced normal forms, so
that invariance holds by default. This also has the advantage of mathematical simplicity,
since normal form games are simpler objects than extensive form games.
Shifting the analysis to the reduced normal form may seem illegitimate. First, normal
form and extensive form games differ in a fundamental way, since only in the latter is it
possible to learn something about one’s opponent during the course of play (see Example
2.32). Second, working directly with the normal form appears to conflict with the use of
backward induction, whose logic seems tied to the extensive form.
Both of these criticisms can be addressed. First, the differences between extensive and
normal form games are much smaller if we only consider equilibrium play: when players
adhere to equilibrium strategies, nothing important is learned during the course of play.
Second, the fact that a strategy for an extensive form game specifies a player's complete
plan for playing a game suggests that the temporal structure provided by the extensive
form may not be as essential as it might seem. In fact, since an extensive form game creates
a telltale pattern of ties in its reduced normal form, one can “reverse engineer” a reduced
normal form to determine the canonical extensive form that generates it—see Mailath
et al. (1993).
To implement the normal form approach, we require robustness of equilibrium to low
probability mistakes, sometimes called trembles. Trembles ensure that all information sets
of the corresponding extensive form game are reached.
Perfect equilibrium
Throughout this section, we let G be a finite normal form game and Γ a finite extensive
form game (with perfect recall).
Strategy profile σ is an ε-perfect equilibrium of G if it is completely mixed and if si ∉ Bi(σ−i)
implies that σi(si) ≤ ε.
Strategy profile σ∗ is a perfect equilibrium of G if and only if it is the limit of a sequence
of ε-perfect equilibria with ε → 0.
Remarks:
(i) For σ∗ to be a perfect equilibrium, it is only necessary that there exist a sequence of
ε-perfect equilibria converging to σ∗ . It need not be the case that every sequence
of strategy profiles converging to σ∗ consists of ε-perfect equilibria. An analogous
point arose in the definition of consistency for sequential equilibrium (Section 2.4.3).
The analogous point holds for proper equilibrium, but not for KM stable sets—see
below.
(ii) The formulation of perfect equilibrium above is due to Myerson (1978). The original
definition, due to Selten (1975), is stated in terms of Nash equilibria of perturbed
games Gp, where p: ∪_{i∈P} Si → (0, 1), in which player i's mixed strategy must put at least
probability p_{si} on strategy si.
Example 2.62. Entry deterrence revisited.
[Figure: the extensive form Γ. Player 1 chooses O (ending the game with payoffs (0, 3)) or E;
after E, player 2 chooses F, yielding (−1, −1), or A, yielding (1, 1).]

G(Γ):          F          A
     O       0, 3       0, 3
     E      −1, −1      1, 1
Nash equilibria: (E, A) and (O, σ2(F) ≥ 1/2). Only (E, A) is subgame perfect.
What are the perfect equilibria of the normal form G(Γ)? Since F is weakly dominated, 2’s
best response to any completely mixed strategy of 1 is A, so in any ε-perfect equilibrium,
σ2 (F) ≤ ε. It follows that if ε is small, 1’s best response is E, so in any ε-perfect equilibrium,
σ1 (O) ≤ ε. Therefore, any sequence of ε-perfect equilibria with ε → 0 converges to (E, A),
which is thus the unique perfect equilibrium of G(Γ). ♦
Let Γ be a generic extensive form game of perfect information, so that Γ has a unique
subgame perfect equilibrium. Will applying perfection to G(Γ) rule out Nash equilibria of
Γ that are not subgame perfect?
If Γ has the single move property (i.e., if no player has more than one decision node on any
play path), then the perfect equilibrium of G(Γ) is unique, and it is outcome equivalent to
the unique subgame perfect equilibrium of Γ.
But beyond games with the single move property, perfect equilibrium is not adequate to
capture backward induction.
Example 2.66. In the game Γ below, ((B, D), R) is the unique subgame perfect equilibrium.
((A, ·), L) are Nash equilibria. (Actually (A, ·) with σ2(L) ≥ 1/2 are Nash too.)
[Figure: the extensive form Γ. Player 1 chooses A (ending the game with payoffs (2, 4)) or B;
after B, player 2 chooses L, yielding (1, 1), or R; after R, player 1 chooses C, yielding (0, 0),
or D, yielding (3, 3).]

G(Γ):          L         R
     A       2, 4      2, 4      (tremble weight ≈ 1)
     BC      1, 1      0, 0      (ε)
     BD      1, 1      3, 3      (≤ ε/2)
Why (A, L)? If in the ε-perfect equilibria, the weight on BC is at least double that on BD,
then L is player 2’s best response.
Implicitly, we are assuming that if 1’s second node is reached, he is more likely to be dumb
than smart. Sequential rationality forbids this, but perfect equilibrium does not. ♦
To handle this example, we need to force players to behave rationally at all information
sets, even those that occur after they themselves deviate.
Proper equilibrium
To capture subgame perfection directly in the reduced normal form, we introduce a
refinement that requires that more costly trembles be less likely to occur.
Strategy profile σ is ε-proper if it is completely mixed and if ui(si, σ−i) < ui(s′i, σ−i) implies
that σi(si) ≤ ε σi(s′i).
Strategy profile σ∗ is a proper equilibrium (Myerson (1978)) if it is the limit of a sequence of
ε-proper equilibria with ε → 0.
Example 2.66 rerevisited. Recall that in Γ, ((A, ·), L) is Nash but not subgame perfect. Now
we show that (A, L) is not proper.
G(Γ):          L         R
     A       2, 4      2, 4      (weight ≈ 1)
     BC      1, 1      0, 0      (≈ ε²)
     BD      1, 1      3, 3      (≈ ε)
Why? If σ2 (R) is small but positive, then u1 (A) > u1 (BD) > u1 (BC), so in any ε-proper
equilibrium we have σ1 (BD) ≤ εσ1 (A) and σ1 (BC) ≤ εσ1 (BD).
Therefore, player 2 puts most of her weight on R in any ε-proper equilibrium, and so L is
not played in any proper equilibrium. ♦
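To make the tremble ordering concrete, here is a small Python sketch (my own illustration,
not part of the notes) that starts from the candidate (A, L), builds the ε-proper profile
σ1 ≈ (1, ε², ε) on (A, BC, BD), and confirms that player 2 then strictly prefers R, so that
(A, L) cannot be proper:

    # Sketch: check the epsilon-proper logic in G(Gamma) from Example 2.66.
    u1 = {('A','L'): 2, ('A','R'): 2,
          ('BC','L'): 1, ('BC','R'): 0,
          ('BD','L'): 1, ('BD','R'): 3}
    u2 = {('A','L'): 4, ('A','R'): 4,
          ('BC','L'): 1, ('BC','R'): 0,
          ('BD','L'): 1, ('BD','R'): 3}

    eps = 0.01
    s2 = {'L': 1.0, 'R': eps}                              # candidate near (A, L)
    s2 = {k: v / sum(s2.values()) for k, v in s2.items()}
    s1 = {'A': 1.0, 'BC': eps**2, 'BD': eps}               # forced tremble ordering
    s1 = {k: v / sum(s1.values()) for k, v in s1.items()}

    U1 = {a: sum(s2[b] * u1[(a, b)] for b in s2) for a in s1}
    U2 = {b: sum(s1[a] * u2[(a, b)] for a in s1) for b in s2}

    # With sigma2(R) small, 1's payoff ranking is A > BD > BC, so properness forces
    # sigma1(BD) <= eps * sigma1(A) and sigma1(BC) <= eps * sigma1(BD).
    assert U1['A'] > U1['BD'] > U1['BC']
    assert s1['BD'] <= eps * s1['A'] + 1e-12
    assert s1['BC'] <= eps * s1['BD'] + 1e-12
    # Given these trembles, 2 strictly prefers R, contradicting sigma2(L) ~ 1.
    assert U2['R'] > U2['L']
    print(U1, U2)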
It can be shown that if Γ is a game of perfect information, then every proper equilibrium
of G(Γ) is outcome equivalent (i.e., induces the same distribution over terminal nodes) to
some subgame perfect equilibrium of Γ.
Remarkably, proper equilibrium also captures sequential rationality in games of imperfect
information:
Theorem 2.68 (van Damme (1984), Kohlberg and Mertens (1986)).
(i) Suppose that σ∗ is a proper equilibrium of G(Γ). Then there is an outcome equivalent
behavior strategy profile β∗ of Γ that is a sequential equilibrium strategy profile of Γ.
(ii) Let {σε } be a sequence of ε-proper equilibria of G(Γ) that converge to proper equilibrium σ∗ .
Let behavior strategy βε be outcome equivalent to σε , and let behavior strategy β∗ be a limit
point of the sequence {βε }. Then β∗ is a sequential equilibrium strategy profile of Γ.
What is the difference between parts (i) and (ii) of the theorem? In part (i), σ∗ is outcome
equivalent to some sequential equilibrium strategy profile β∗ . But outcome equivalent
strategy profiles may specify different behavior off the equilibrium path; moreover, the
strategy σ∗i for the reduced normal form does not specify how player i would behave at
unreachable information sets (i.e., at information sets that σ∗i itself prevents from being
reached). In part (ii), the ε-proper equilibria are used to explicitly construct the behavior
strategy profile β∗ . Thus, part (ii) shows that the construction of proper equilibrium does
not only lead to outcomes that agree with sequential equilibrium; by identifying choices
off the equilibrium path, it captures the full force of the principle of sequential rationality.
Theorem 2.68 shows that proper equilibrium achieves our goals of respecting the principle
of sequential rationality while ensuring invariance of predictions across games with the
same purely reduced normal form.
Extensive form perfect equilibrium (Selten (1975)) is defined as perfect equilibrium of the
agent normal form A(Γ), in which each information set of Γ is controlled by a distinct agent
who shares the payoffs of the player owning that information set. In the game Γ from
Example 2.66, only (B, R, D) is perfect in A(Γ), and so only ((B, D), R) is extensive form
perfect in Γ.
Why isn't (A, L, ·) perfect in A(Γ)? C is weakly dominated for player 3 (agent 1b), so he plays
D in any perfect equilibrium. Facing εC + (1 − ε)D, 2 prefers R. Therefore, L is not played in
any perfect equilibrium. ♦
Extensive form perfect equilibrium is the original equilibrium refinement used to capture
backward induction in extensive form games with imperfect information, but it is not an
easy concept to use. Kreps and Wilson (1982) introduced sequential equilibrium to retain
most of the force of extensive form perfect equilibrium, but in a simpler and more intuitive
way, using beliefs and sequential rationality.
The following result shows that extensive form perfect equilibrium and sequential equi-
librium are nearly equivalent, with the former being a just slightly stronger refinement.
Theorem 2.69 (Kreps and Wilson (1982), Blume and Zame (1994), Hendon et al. (1996)).
(i) Every extensive form perfect equilibrium is a sequential equilibrium strategy profile.
(ii) In generic extensive form games, every sequential equilibrium strategy profile is an extensive
form perfect equilibrium.
In rough terms, the distinction between the concepts is as follows: Extensive form perfect
equilibrium and sequential equilibrium require reasonable behavior at all information
sets. But extensive form perfect equilibrium requires best responses to the perturbed
strategies themselves, while sequential equilibrium only requires best responses in the
limit.
To make further connections, Kreps and Wilson (1982) define weak extensive form perfect
equilibrium, which generalizes Selten’s (1975) definition by allowing slight perturbations
to the game’s payoffs. They show that this concept is equivalent to sequential equilibrium.
Notice that extensive form perfect equilibrium retains the problem of making different
predictions in games with the same reduced normal form, since such games can have
different agent normal forms (cf. Example 2.61).
Surprisingly, extensive form perfect equilibria can use weakly dominated strategies:
[Figure: the extensive form Γ. Player 1 chooses A or B; after A, player 2 chooses L, yielding
(0, 0), or R, yielding (1, 1); after B, player 1 (agent 1b) chooses C, yielding (0, 0), or D,
yielding (1, 1).]

G(Γ):          L         R
     A       0, 0      1, 1
     BC      0, 0      0, 0
     BD      1, 1      1, 1

In G(Γ), A is weakly dominated by BD, so no perfect equilibrium of G(Γ) has player 1 choose A.
But ((A, D), R) is extensive form perfect: there are ε-perfect equilibria of the agent normal
form in which agent 1b is more likely to tremble to C than player 2 is to tremble to L, leading
agent 1a to play A. ♦
In fact, Mertens (1995) (see also Hillas and Kohlberg (2002)) provides an example of a game
in which all extensive form perfect equilibria use weakly dominated strategies! Again, the
difficulty is that some player believes that he is more likely to tremble than his opponents.
In extensive form game Γ, one defines quasi-perfect equilibrium (van Damme (1984)) in
essentially the same way as extensive-form perfect equilibrium, except that when con-
sidering player i’s best response at a given information set against perturbed strategy
profiles, one only perturbs the strategies of i’s opponents; one does not perturb player i’s
own choices at his other information sets. Put differently, we do not have player i consider
the possibility that he himself may tremble later in the game.
Neither extensive form perfection nor quasi-perfection implies the other. But unlike ex-
tensive form perfect equilibria, quasi-perfect equilibria never employ weakly dominated
strategies.
van Damme (1984) proves Theorem 2.68 by showing that proper equilibria must cor-
respond to quasi-perfect equilibria, which in turn correspond to sequential equilibria.
Mailath et al. (1997) show that proper equilibrium in a given normal form game G is
equivalent to what one might call “uniform quasi-perfection” across all extensive forms
with reduced normal form G.
opponent’s own choice of strategy. (For instance, in the Mini Centipede game (Example
2.32), player 1’s second decision node is not reachable if he chooses B at his first decision
node.) Foundations for the resulting rationalizability concept, sometimes called weakly
sequential rationalizability, are provided by Ben-Porath (1997); for the equilibrium analogue
of this concept, sometimes called weakly sequential equilibrium, see Reny (1992). Adding
common certainty of cautiousness yields the permissible strategies, which are the strategies
that survive the Dekel-Fudenberg procedure; see Dekel and Fudenberg (1990), Branden-
burger (1992), and Börgers (1994), as well as Section 1.2.4. The equilibrium analogue
of permissibility is normal form perfect equilibrium. See Asheim (2006) for a complete
treatment of these solution concepts and their epistemic foundations.
Finally, one can look at solution concepts that require correct expectations about oppo-
nents’ behavior on the equilibrium path, but allow for differences in beliefs about choices
off the equilibrium path. Such solution concepts, which are weaker than Nash equi-
librium, include self-confirming equilibrium (Fudenberg and Levine (1993), Battigalli and
Guaitoli (1997)) and rationalizable self-confirming equilibrium (Dekel et al. (1999, 2002)); the
latter concept uses rationalizability requirements to restrict beliefs about opponents’ play
at reachable nodes off the equilibrium path.
Example 2.71. Battle of the Sexes with an outside option revisited. Here is the game Γ from
Example 2.57, along with its reduced normal form G(Γ):
[Figure: the extensive form Γ. Player 1 chooses O (ending the game with payoffs (2, 2)) or I;
after I, player 1 chooses T or B and player 2, at an information set that does not distinguish
them, chooses L or R. Terminal payoffs: (T, L) → (3, 1), (T, R) → (0, 0), (B, L) → (0, 0),
(B, R) → (1, 3).]

G(Γ):          L         R
     O       2, 2      2, 2
     T       3, 1      0, 0
     B       0, 0      1, 3
[Figure: the extensive form Γ′. Player 1 chooses O (payoffs (2, 2)) or I; after I, player 1
chooses M or D. M leads to a move of nature that ends the game at (2, 2) with probability
3/4 and otherwise has player 1 play T; D leads to the Battle of the Sexes subgame as in Γ.
Player 2's information set contains all three nodes at which she moves.]

G(Γ′):         L                R
     O       2, 2             2, 2
     M       2 1/4, 1 3/4     1 1/2, 1 1/2
     T       3, 1             0, 0
     B       0, 0             1, 3

(The strategy M is thus equivalent to the mixture 3/4 O + 1/4 T.)
Example 2.72. Battle of the Sexes with an outside option rerevisited. We saw in Example 2.57 that
Γ has three subgame perfect equilibria, ((I, T), L), ((O, B), R), and ((O, 3/4 T + 1/4 B), 1/4 L + 3/4 R).
These correspond to three proper equilibria, (T, L), (B, R), and (3/4 T + 1/4 B, 1/4 L + 3/4 R).
Γ′ has a unique subgame perfect equilibrium, ((I, D, T), L). (In the subgame, player 1's
strategy DB, which yields him at most 1, is strictly dominated by his strategy M, which
yields him at least 1 1/2. Knowing this, player 2 will play L, so player 1 will play (D, T) in
the subgame and I initially.) It then follows from Theorem 2.68 that (T, L) is the unique
proper equilibrium of G(Γ′).
However, while Γ and Γ′ have different purely reduced normal forms, they share the same
fully reduced normal form: G∗(Γ′) = G∗(Γ) = G(Γ).
If one accepts full invariance as a desirable property, then this example displays a number
of difficulties with proper equilibrium:
(i) Adding or deleting a pure strategy that is equivalent to a mixture of other pure
strategies can alter the set of proper equilibria. (One can show that adding or
deleting duplicates of pure strategies does not affect proper equilibrium.)
(ii) A proper equilibrium of a fully reduced normal form G∗(Γ′) need not even corre-
spond to a subgame perfect equilibrium of Γ′. (Contrast this with Theorem 2.68, a
positive result for the purely reduced normal form.)
(iii) If we find solutions for games by applying proper equilibrium to their purely
reduced normal forms, then we may obtain different solutions to games with the
same fully reduced normal form. ♦
Example 2.71 rerevisited. As we have seen, Γ has three subgame perfect (and sequential)
equilibria. But Γ′ has the same fully reduced normal form as Γ, and its only subgame
perfect equilibrium is ((I, D, T), L). Therefore, our unique prediction of play in Γ should
be the corresponding subgame perfect equilibrium ((I, T), L). As we saw earlier, this is the
only equilibrium of Γ that respects forward induction. ♦
Building on this insight, Govindan and Wilson (2009) argue that together, backward
induction and full invariance imply forward induction, at least in generic two-player
games.
KM (Kohlberg and Mertens (1986)) propose desiderata for solution concepts, among them
invariance (D1), backward induction (D2), iterated dominance (D3), and admissibility (D4).
Iterated dominance (D3) embodies a limited form of forward induction—see Section 2.5.1.
KM argue that admissibility (D4) is a basic decision-theoretic postulate that should be
respected, and appeal to various authorities (Wald, Arrow, . . . ) in support of this point of
view.
In addition, KM require existence: a solution concept should offer at least one solution for
every game.
For a solution concept to satisfy invariance (D1), backward induction (D2), and existence
in all games, the solutions must be set-valued: see Example 2.73 below.
Similarly, set-valued solutions are required for the solution concept to satisfy (D1), (D3),
and existence (see KM, Section 2.7.B).
Set-valued solutions are natural: Extensive form games possess connected components
of Nash equilibria, elements of which differ only in terms of behavior at unreached
information sets. Each such component should be considered as a unit.
Once one moves to set-valued solutions, one must consider restrictions on the structure
of solution sets. KM argue that solution sets should be connected sets.
As build-up, KM introduce two set-valued solution concepts that satisfy (D1)–(D3) and
existence, but that fail admissibility (D4) and connectedness.
They then introduce their preferred solution concept: A closed set E of Nash equilibria
(of game G = G∗ (Γ)) is KM stable if it is minimal with respect to the following property:
"for any ε > 0 there exists some δ0 > 0 such that for any completely mixed strategy vector
(σ1, . . . , σn) and for any δ1, . . . , δn (0 < δi < δ0), the perturbed game where every strategy s
of player i is replaced by (1 − δi)s + δi σi has an equilibrium ε-close to E."
Remark: If in the above one replaces “for any (σ1 , . . . , σn ) and δ1 , . . . , δn ” with “for some
(σ1 , . . . , σn ) and δ1 , . . . , δn ”, the resulting requirement is equivalent to perfect equilibrium.
Thus, a key novelty in the definition of KM stability is the requirement that equilibria be
robust to all sequences of perturbations.
KM stability satisfies (D1), (D3), (D4), and existence. In fact, it even satisfies a stronger
forward induction requirement than (D3) called equilibrium dominance: A KM stable set E
contains a KM stable set of any game obtained by deletion of a strategy that is not a best
response to any equilibrium in E (see Section 2.5.2 for further discussion).
However, KM stability fails connectedness and backward induction (D2): KM provide
examples in which a KM stable set (of G∗ (Γ)) contains no strategy profile corresponding
to a sequential equilibrium (of Γ).
A variety of other definitions of strategic stability have been proposed since Kohlberg
and Mertens (1986). Mertens (1989, 1991) proposes a definition of strategic stability that
satisfies (D1)–(D4), existence, connectedness, and much besides, but that is couched in
terms of ideas from algebraic topology. Govindan and Wilson (2006) obtain (D1)–(D4) and
existence (but not connectedness) using a relatively basic definition of strategic stability.
Example 2.73. Why backward induction and full invariance require a set-valued solution
concept.
[Figure: the extensive form Γ(p). Player 1 chooses O (ending the game with payoffs (1, −1))
or I; after I, player 1 chooses M, T, or B. M leads to a move of nature that ends the game at
(1, −1) with probability p, and with probability 1 − p leads to a node at which player 1 plays
T. Player 2's information set {x, y, z} contains the node x following M's continuation, the
node y following T, and the node z following B; she chooses L or R. Payoffs at x and y:
(2, −2) after L and (−2, 2) after R; payoffs at z: (−2, 2) after L and (2, −2) after R.]

G:             L          R
     O       1, −1      1, −1
     T       2, −2      −2, 2
     B       −2, 2      2, −2
Analysis of G: By drawing the payoffs to each of player 1's pure strategies as a function
of player 2's mixed strategy, one can see that player 1's unique maxmin strategy is O, and
that player 2's maxmin strategies are αL + (1 − α)R with α ∈ [1/4, 3/4]. Since G is zero-sum, its
Nash equilibria are thus the profiles (O, αL + (1 − α)R) with α ∈ [1/4, 3/4].
Analysis of Γ(p): For each p ∈ (0, 1), this game has a unique sequential equilibrium, namely

    ((O, 1/(2−p) M + (1−p)/(2−p) B), (4−3p)/(8−4p) L + (4−p)/(8−4p) R).

To see this, first notice that there cannot be an equilibrium
in which player 2 plays a pure strategy. (For instance, if 2 plays L, then 1's best response
in the subgame would be T, in which case 2 would switch to R.) Thus, player 2 is mixing.
For this to be optimal, her beliefs must satisfy µ(z) = 1/2. To have an equilibrium in the
subgame in which player 2 has these beliefs, player 1's strategy must satisfy

(*)    σ1(B) = σ1(T) + (1 − p)σ1(M).
In particular, player 1 must place positive probability on B and on at least one of M and T.
For player 1 to place positive probability on T, he would have to be indifferent between B
and T, implying that 2 plays 1/2 L + 1/2 R. But in this case player 1 would be strictly better off
playing M in the subgame, a contradiction. Thus σ1(T) = 0, and so (*) implies that player
1 plays 1/(2−p) M + (1−p)/(2−p) B in the subgame.
For player 1 to be willing to randomize between M and B, it must be that 2 plays
σ2(L) = (4−3p)/(8−4p), in which case player 1's payoff in the subgame is p/(2−p).
Since p < 1, this payoff is less than 1, and so player 1 strictly prefers O at his initial node.
Each choice of p ∈ (0, 1) leads to a unique and distinct sequential equilibrium of Γ(p).
These equilibria correspond to distinct Nash equilibria of G, which itself is the reduced
normal form of each Γ(p). Therefore, if we accept backward induction and invariance, no
one Nash equilibrium of G constitutes an acceptable prediction of play. Thus, requiring
invariance and backward induction leads us to set-valued solution concepts.
(If in Γ(p) we had made the strategy M a randomization between O and B, the weight
σ2(L) would have gone from 3/4 to 1/2, giving us the other half of the component of equilibria.
This does not give us σ2(L) = 1/2, but this is the unique subgame perfect equilibrium of the
game where 1 does not have strategy M.) ♦
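The equilibrium just derived can be verified numerically. Below is a Python sketch of mine
that encodes the payoffs described in the figure placeholder above and checks the three
indifference/optimality conditions for several values of p:

    # Sketch: verify the sequential equilibrium of Gamma(p) for sample p.
    def check(p, tol=1e-12):
        sM, sB = 1/(2-p), (1-p)/(2-p)            # player 1's subgame mix (sT = 0)
        sL = (4-3*p)/(8-4*p); sR = 1 - sL        # player 2's mix
        mu_z = sB / (sB + (1-p)*sM)              # belief on node z (after B)
        uT = 2*sL - 2*sR                         # 1's subgame payoffs against (sL, sR)
        uB = -2*sL + 2*sR
        uM = p*1 + (1-p)*uT
        assert abs(mu_z - 0.5) < tol             # 2 is willing to mix
        assert abs(uM - uB) < tol                # 1 is willing to mix between M and B
        assert uM >= uT - tol                    # T is not strictly better
        assert abs(uM - p/(2-p)) < tol           # subgame payoff p/(2-p) < 1, so O is optimal

    for p in (0.1, 0.5, 0.9):
        check(p)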
3. Repeated Games
In many applications, players face the same interaction repeatedly. How does this affect
our predictions of play?
Repeated games provide a general framework for studying long run relationships.
While we will focus on the basic theory, this subject becomes even more interesting when
one introduces hidden information (⇒ reputation models (Kreps et al. (1982), Schmidt
(1993))), hidden actions (⇒ imperfect monitoring models (Green and Porter (1984), Abreu
et al. (1990), Fudenberg et al. (1994), Sekiguchi (1997), Sannikov (2007))), state variables
(Dutta (1995), Maskin and Tirole (2001)), or combinations of these elements. An excellent
general reference for this material is Mailath and Samuelson (2006).
Definitions
In an (infinitely) repeated game, an n player normal form game G, known as the stage game, is
played in periods t = 0, 1, 2, . . . , with the period t action choices being commonly observed
at the end of the period. Payoffs in the repeated game are the discounted sum of stage
game payoffs for some fixed discount rate δ ∈ (0, 1).
ai ∈ Ai             a pure action
αi ∈ ∆Ai            a mixed action
α ∈ ∏_{i∈P} ∆Ai     a mixed action profile
The discount rate δ can be interpreted as the probability that the game does not end in
any given period. But we don’t need to be too literal about infinite repetition: what is
important is that the players view the interaction as one with no clear end—see Rubinstein
(1991) for a discussion.
Given a sequence of action profiles {a^t}_{t=0}^∞, player i's repeated game payoff is
(1 − δ) Σ_{t=0}^∞ δ^t ui(a^t). The rescaling by (1 − δ) is done for convenience: it makes
payoffs in the repeated game commensurate with payoffs in the stage game. To see why,
recall that Σ_{t=0}^∞ δ^t = 1/(1 − δ); thus (1 − δ) Σ_{t=0}^∞ δ^t c = c, as desired.
As we noted in Section 2.3.4, the one-shot deviation principle applies in repeated games,
so that subgame perfection is equivalent to the absence of profitable one-shot deviations.
Formally, we say that strategy σi admits no profitable one-shot deviations given σ−i if after
each history ht , player i cannot improve his payoff in the continuation game that follows ht
by changing his action immediately after ht but otherwise following strategy (σ|ht )i . If this
is true for every player i, we say that strategy profile σ itself admits no profitable one-shot
deviations.
Theorem 3.1. Let G∞ (δ) be a repeated game. Strategy profile σ is a subgame perfect equilibrium
of G∞ (δ) if and only if σ admits no profitable one-shot deviations.
Example 3.2. The finitely repeated Prisoner's Dilemma G^T(δ), T < ∞, δ ∈ (0, 1].
Players play T + 1 times, starting with period 0 (so that the last period is T). Before playing
period t, the results of all previous periods are observed. Payoffs are the discounted sum
of stage game payoffs.
Proposition 3.3. In the unique subgame perfect equilibrium of GT (δ), both players always Defect.
Proposition 3.4. In any Nash equilibrium of GT (δ), players always defect on the equilibrium path.
Example 3.5. The infinitely repeated Prisoner’s Dilemma G∞ (δ), δ ∈ (0, 1).
What are the consequences of dropping the assumption that the game has a commonly
known final period?
Proposition 3.6.
(i) “Always defect” is a subgame perfect equilibrium of G∞ (δ) for all δ ∈ (0, 1).
(ii) If δ ≥ 1/2, the following defines a subgame perfect equilibrium of G∞ (δ): σi = “Cooperate
so long as no one has ever defected; otherwise defect,” (the grim trigger strategy).
There are many other equilibria—see Sections 3.2 and 3.3.
We determine whether a strategy profile is a subgame perfect equilibrium of G∞(δ) using
the one-shot deviation principle (Theorem 3.1). Specifically, it is enough to check that no
player can benefit from deviating only in the period immediately following each finite
history ht ∈ H. We do so by partitioning the finite histories into a small number of cases
according to the nature of continuation play.
Proof. We begin by considering part (ii). Since the equilibrium is symmetric, we need only
check player 1’s behavior. There are two sorts of histories to consider:
Histories in which no one has defected (so that continuation play follows the equilibrium path):

    Equilibrium: (C, C), (C, C), (C, C), . . . with stage payoffs 1, 1, 1, . . . ⇒ π1 = 1.
    Deviation: (D, C), (D, D), (D, D), . . . with stage payoffs 2, 0, 0, . . . ⇒ π1 = (1 − δ)(2 + Σ_{t=1}^∞ δ^t · 0) = 2(1 − δ).

The deviation is unprofitable if and only if 1 ≥ 2(1 − δ), that is, if δ ≥ 1/2.

Histories in which someone has defected: continuation play is "always defect". Since D is a
dominant action in the stage game and continuation play is unaffected by today's actions,
no one-shot deviation is profitable for any δ.
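These one-shot deviation checks are easy to mechanize. Here is a small Python sketch (my
own illustration, not from the text) that scans discount rates and reports where grim trigger
survives both checks, using the stage payoffs above ((C,C) → 1, (D,C) → 2, (D,D) → 0,
(C,D) → −1 for the row player):

    # Sketch: one-shot deviation checks for grim trigger in the repeated PD.
    u = {('C','C'): 1, ('C','D'): -1, ('D','C'): 2, ('D','D'): 0}

    def grim_trigger_is_spe(delta):
        # On-path: cooperate forever (payoff 1) vs. defect once, then mutual defection.
        on_path = 1 >= (1 - delta) * u[('D','C')] + delta * u[('D','D')]
        # Off-path: after a defection, D is a stage-game best response to D,
        # and continuation play is unchanged, so deviating to C cannot help.
        off_path = u[('D','D')] >= u[('C','D')]
        return on_path and off_path

    for delta in [0.3, 0.49, 0.5, 0.51, 0.9]:
        print(delta, grim_trigger_is_spe(delta))   # True exactly when delta >= 1/2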
Remarks
1. The analysis of infinitely repeated games is greatly simplified by their recursive
structure: the “continuation game” starting from any history ht is formally identical
to the game starting at the null history h0 .
(Dynamic programs have a related (but distinct) recursive structure.)
2. Since the strategies are contingent rules for behavior, a change in player i’s strategy
in one period can change the actions he and his opponents choose in future peri-
ods. For instance, if both players are supposed to play the grim trigger strategy,
then changing player 1’s strategy in the initial period changes the play path from
((C, C), (C, C), (C, C), . . .) to ((D, C), (D, D), (D, D) . . .).
[Figure: a partial game tree of the repeated Prisoner's Dilemma, with each period's
simultaneous choices of C or D drawn as consecutive moves by players 1 and 2; the bold
edges mark the grim trigger strategy profile.]
The (partial) game tree above is that of the repeated Prisoner’s Dilemma. The bold
edges represent the grim trigger strategy profile. If this strategy profile is played,
then play proceeds down the leftmost branch of the tree, and both players cooperate
in every period. If we modify one player’s strategy so that after a single cooperative
history he plays D rather than C, then play enters a subgame in which both players
defect in every period. Thus a deviation from the strategy’s prescription after a
single history can alter what actions are played in all subsequent periods.
This figure also shows the partition of histories into the two cases above. The
decision nodes of player 1 are the initial nodes of subgames. The leftmost subgames
follow histories in which no one has defected (including the null history). All other
subgames follow histories in which someone has defected.
3. The difference between the equilibrium outcomes of the finitely repeated and in-
finitely repeated Prisoner’s Dilemmas is quite stark. With many other stage games,
the difference is not so stark. If the stage game G has multiple Nash equilibrium
outcomes, one can often sustain the play of non-Nash action profiles of G in early
periods—for instance, by rewarding cooperative play in early periods with the
play of good Nash outcomes of G in later periods, and by punishing deviations in
early periods with bad Nash outcomes in later periods. For general analyses of
finitely repeated games, see Benoît and Krishna (1985) and Friedman (1985).
3.2 Stick-and-Carrot Strategies and the Folk Theorem
Repeated games typically have many subgame perfect equilibria. Therefore, rather than
looking for all subgame perfect equilibrium strategy profiles, we instead ask which payoff
vectors can be achieved in a subgame perfect equilibrium.
The most basic question is to characterize the set of payoff vectors that can be achieved in a
repeated game G∞ (δ) for some fixed discount rate δ ∈ (0, 1). The solution to this question,
due to Abreu et al. (1990), is studied in Section 3.3.
Before addressing this question, we consider one that turns out to be simpler, at least in its
basic formulation: What can we say about the set of subgame perfect equilibrium payoff
vectors when players are very patient—that is, when the discount rate δ approaches 1?
The folk theorem tells us that all feasible, strictly individually rational payoff vectors can
be obtained in subgame perfect equilibrium. Thus in repeated games with very patient
players, subgame perfection imposes no restrictions on our predictions of payoffs beyond
those evident from the stage game. In proving the folk theorem, we will introduce stick-
and-carrot strategies, a fundamental device for the provision of intertemporal incentives.
Player i's (mixed) minmax value is v̲i = min_{α−i ∈ ∏_{j≠i} ∆Aj} max_{ai ∈ Ai} ui(ai, α−i).
Thus v̲i is the payoff obtained when his opponents minmax him and he, anticipating what
they will do, plays a best response (see Section 1.6). This leads to a lower bound on what
player i obtains in any equilibrium of the repeated game: every Nash equilibrium of G∞(δ)
yields player i a payoff of at least v̲i.
Proof. Given any σ−i, i can always play his myopic best response in each period, and this
will yield him a payoff of at least v̲i.
The set of feasible, strictly individually rational payoff vectors is F∗ = {v ∈ conv(u(A)) : vi > v̲i for all i ∈ P}.
Theorem 3.8 (The folk theorem (Fudenberg and Maskin (1986, 1991), Abreu et al. (1994))).
Let v ∈ F∗, and suppose that either (i) there are two players, or (ii) no two players have
equivalent utility functions (i.e., for no pair i ≠ j is uj a positive affine transformation of ui).
Then for all δ close enough to 1, there is a subgame perfect equilibrium of G∞(δ) with payoffs v.
Condition (ii) is known as nonequivalent utilities, or NEU.
Remark: Why should we care about being able to sustain low subgame perfect equilibrium
payoffs? Consider a stage game G with an action profile a that gives all players high
payoffs, but is not a Nash equilibrium of G: there is a player i with an alternate action ãi
such that action profile (ãi , a−i ) gives player i a very high stage game payoff at the expense
of the others. To support repeated play of a as a subgame perfect equilibrium play path of
G∞ (δ), we must be able to threaten to punish player i with low payoffs starting tomorrow
if he plays ãi today. But to ensure that such a punishment is credible, the punishment
strategy profile itself must be a subgame perfect equilibrium of G∞ (δ). Thus the ability to
enforce low subgame perfect equilibrium payoffs can be crucial to achieving high subgame
perfect equilibrium payoffs.
This idea is applied in constructing the stick and carrot strategies used to prove the folk
theorem (see Example 3.11). It becomes more prominent still when we consider supporting
high equilibrium payoffs in G∞ (δ) for some fixed discount rate δ ∈ (0, 1) (see Example 3.12
and Section 3.3).
The next two examples prove special cases of the folk theorem in order to illustrate two of
the basic constructions of subgame perfect equilibria of repeated games. Later we discuss
what more must be done to prove the full result.
Example 3.9. Nash reversion.
We first prove the folk theorem under the additional assumptions that
(1) The payoff vector v = u(â) can be obtained from some pure action profile â ∈ A.
(2) For each player i, there is a Nash equilibrium α^i of the stage game such that
vi > ui(α^i).
Consider this Nash reversion strategy, a generalization of the grim trigger strategy:
σi: "Play âi if there has never been a period in which exactly one player deviated.
Otherwise, if j was the first to unilaterally deviate, play α^j_i."
Clearly, σ generates payoff vector v in G∞(δ) for any δ ∈ (0, 1). To verify that σ is a subgame
perfect equilibrium of G∞(δ) for δ large enough, we check that no player has a profitable one-shot
deviation.
Let v̄i = max_{a∈A} ui(a) denote player i's maximal stage game payoff.
There are 1 + n cases to consider. After histories with no unilateral deviations (e.g., on the
equilibrium path), i does not benefit from deviating if

    vi ≥ (1 − δ)v̄i + δ ui(α^i),

which holds for all δ close enough to 1, since vi > ui(α^i). After histories in which j was the
first unilateral deviator, α^j is played forever; since α^j is a Nash equilibrium of the stage game,
no one-shot deviation is profitable for any δ.
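Rearranging the displayed condition gives the cutoff discount rate directly. A one-line
Python sketch (mine; the numbers in the illustration are the Prisoner's Dilemma payoffs
used earlier, not part of this example):

    # Sketch: the on-path condition v_i >= (1 - delta) * vbar_i + delta * u_i(alpha^i)
    # rearranges to delta >= (vbar_i - v_i) / (vbar_i - u_i(alpha^i)); the binding
    # cutoff is the max over players.
    def nash_reversion_cutoff(v, vbar, u_nash):
        return max((vb - vi) / (vb - un) for vi, vb, un in zip(v, vbar, u_nash))

    # Prisoner's Dilemma: v = u(C,C) = (1,1), best deviation 2, reversion to (D,D) = (0,0).
    print(nash_reversion_cutoff(v=(1, 1), vbar=(2, 2), u_nash=(0, 0)))   # 0.5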
Example 3.10. To better understand what can and cannot be achieved using Nash reversion,
consider this symmetric normal form game G:

               a         b         c
     A       0, 0      4, 2      1, 0
     B       2, 4      0, 0      0, 0
     C       0, 1      0, 0      0, 0

The Nash equilibria of G are (A, b), (B, a), and (2/3 A + 1/3 B, 2/3 a + 1/3 b), which yield payoffs (4, 2),
(2, 4), and (4/3, 4/3). The strategies C and c are strictly dominated, but are the pure minmax
strategies; they generate the pure minmax payoffs (v1^p, v2^p) = (1, 1). The mixed minmax
strategies are 1/3 A + 2/3 C and 1/3 a + 2/3 c; they generate the mixed minmax payoffs (v̲1, v̲2) = (2/3, 2/3).
(All of these claims can be verified using the figure at left.)
[Figures: at left, player 1's payoffs to A, B, C, and the relevant mixtures (2/3 A + 1/3 B and
1/3 A + 2/3 C) as functions of player 2's mixed strategy, used to verify the minmax claims;
at right, the feasible payoff set in (u1, u2) space, with light, medium, and dark gray regions
marking the successively weaker individual rationality constraints discussed next.]
The five black dots in the figure at right correspond to the feasible payoffs from pure
strategy profiles of G. The convex hull of these points is the set F of feasible repeated
game payoffs.
The light gray region consists of feasible payoff vectors in which each player receives more
than his payoff in the mixed Nash equilibrium of G. In principle, these payoff vectors can
be obtained by patient players in subgame perfect equilibria of the repeated game using
Nash reversion. (Getting points in the interior of this region requires either alternation or
randomization on the equilibrium path; see discussion point 1 below.)
The union of the light and medium gray regions consists of payoff vectors that give each
player at least his pure minmax payoff. Again modulo alternation or randomization on
the equilibrium path, these points can be achieved in subgame perfect equilibrium using
the approach introduced in the next example.
The complete shaded region is the set F∗ considered in the folk theorem. To obtain points
in the dark gray region, one needs to introduce randomization not only on the equilibrium
path, but also on the punishment path; see discussion point 2 below. ♦
Example 3.11. Stick and carrot strategies (Fudenberg and Maskin (1986)).
We now prove the folk theorem under the assumptions that
(1) The payoff vector v = u(â) can be obtained from some pure action profile â ∈ A.
(2) There are exactly two players.
(3) For each player i, vi is greater than player i's pure strategy minmax value,

    vi^p = min_{aj ∈ Aj} max_{ai ∈ Ai} ui(ai, aj).

Let a^m_i be player i's minmaxing pure action, and consider the following stick and carrot
strategy:
σi: "(I) Play âi, both initially and after any history in which no one has deviated from this
prescription or from (II). (II) If a player deviates, play a^m_i for the next L periods, then
return to (I); if a player deviates during a punishment, restart the punishment."
The value of the punishment length L ≥ 1 will be determined below; often L = 1 is enough.
Again, σ generates payoff vector v in G∞(δ) for any δ ∈ (0, 1). To verify that σ is a subgame
perfect equilibrium of G∞(δ) for δ large enough, we check that no player has a profitable one-shot
deviation.
Let v̄i = max_a ui(a), and let vi^m = ui(a^m), where a^m = (a^m_1, a^m_2) is the joint minmaxing pure
action profile. Then

(18)    vi^m ≤ vi^p < vi ≤ v̄i.
We can therefore choose a positive integer L such that for i ∈ {1, 2},

    L(vi − vi^m) > v̄i − vi,    or equivalently,

(19)    (L + 1)vi > v̄i + L vi^m.

(In words: if player i were perfectly patient, he would prefer getting vi for L + 1 periods
to getting his maximum payoff v̄i once followed by his joint minmax payoff vi^m L times.)
There is no profitable one-shot deviation from the equilibrium phase (I) if
    vi = (1 − δ) Σ_{t=0}^∞ δ^t vi ≥ (1 − δ)(v̄i + Σ_{t=1}^L δ^t vi^m + Σ_{t=L+1}^∞ δ^t vi)

    ⇔ Σ_{t=0}^L δ^t vi ≥ v̄i + Σ_{t=1}^L δ^t vi^m.
Equation (19) implies that this inequality holds when δ is close enough to 1.
In the punishment phase (II), deviating is most tempting in the initial period, when all L
rounds of punishment still remain. Deviating in this period is not profitable if

    (1 − δ)(Σ_{t=0}^{L−1} δ^t vi^m + Σ_{t=L}^∞ δ^t vi) ≥ (1 − δ)(vi^p + Σ_{t=1}^L δ^t vi^m + Σ_{t=L+1}^∞ δ^t vi)

    ⇔ vi^m + δ^L vi ≥ vi^p + δ^L vi^m

    ⇔ δ^L vi + (1 − δ^L) vi^m ≥ vi^p.
Equation (18) implies that this inequality holds when δ is close enough to 1.
The intuition for why the punishment phase is an equilibrium is easiest to see in the
penultimate inequality. If δ = 1, the left-hand side is the sum of the joint minmax payoff
vi^m and the equilibrium payoff vi; the right-hand side is the sum of the minmax payoff vi^p
and the joint minmax payoff. The joint minmax payoffs cancel, and we are left with the
comparison vi ≥ vi^p, which holds strictly by assumption. Put differently: When players are
very patient, the salient difference between obeying the punishment and deviating from
it is one round of the equilibrium payoff in the former case vs. one round of the minmax
payoff in the latter. The player strictly prefers the former.
Why is σi called a stick and carrot strategy? The punishment phase (II) is the stick (i.e.,
the threat) that keeps players from deviating from the equilibrium path phase (I). The
equilibrium path phase (I) is the carrot (i.e., the reward) offered to players for carrying out
the punishment phase (II). ♦
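As an illustration (mine, not the text's), the following Python sketch picks the smallest
punishment length L satisfying (19) and then scans for discount rates at which both
one-shot deviation inequalities above hold:

    # Sketch: find L via (19), then check the phase (I) and phase (II) inequalities.
    def stick_and_carrot(v, v_bar, v_m, v_p, deltas):
        L = 1
        while (L + 1) * v <= v_bar + L * v_m:   # smallest L satisfying (19)
            L += 1
        ok = [d for d in deltas
              if sum(d**t for t in range(L + 1)) * v
                 >= v_bar + sum(d**t * v_m for t in range(1, L + 1))   # phase (I)
              and d**L * v + (1 - d**L) * v_m >= v_p]                  # phase (II)
        return L, ok[0] if ok else None

    # Illustration with the Prisoner's Dilemma from Example 3.5 (assumed payoffs:
    # v = 1 from (C,C), best deviation v_bar = 2, joint minmax v_m = pure minmax v_p = 0).
    print(stick_and_carrot(1, 2, 0, 0, [i / 100 for i in range(1, 100)]))

Note that the cutoff it reports (about 0.62) exceeds the grim trigger cutoff 1/2: a finite
punishment is a weaker threat than permanent reversion.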
Example 3.12. Stick-and-carrot strategies.
Let G∞(δ) be the infinite repetition of the following normal form game G:

               A         B          C
     A       4, 4      0, 5       1, 0
     B       5, 0      3, 3       0, 0
     C       0, 1      0, 0      −1/2, −1/2
(i) Suppose that only actions A and B may be played. For what values of δ can the
payoff vector (4, 4) be sustained as a subgame perfect equilibrium payoff? Why?
(ii) Now suppose that all three actions may be played. For what values of δ can the
payoff vector (4, 4) be sustained using a stick-and-carrot strategy with a one-period
punishment? Why?
(iii) Would increasing the length of the punishment allow (4, 4) to be sustained for lower
values of δ? Explain why or why not.
Solution:
(i) If only strategies A and B are available, then G∞(δ) is a repeated Prisoner's Dilemma.
The strongest punishment that can be applied is to use the grim trigger strategy. If
no one has deviated, then following this strategy is optimal for both players if

    4 ≥ 5(1 − δ) + 3δ    ⇔    δ ≥ 1/2.

If someone has deviated, then following the strategy is clearly optimal for any δ.
Thus payoff (4, 4) can be sustained if δ ≥ 1/2.
(ii) Consider the stick-and-carrot strategy whose equilibrium path is ((A, A), (A, A), . . .)
and whose punishment path is ((C, C), (A, A), (A, A), . . .). It is optimal to follow the
strategy on the equilibrium path if

    4 ≥ (1 − δ) · 5 + δ((1 − δ)(−1/2) + δ · 4).
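The remaining one-shot deviation check can be carried out numerically. Here is a small
Python sketch (my own; its outputs are grid approximations) that tests both the equilibrium
path condition above and the punishment phase condition, in which a deviator earns at
most 1 today (his best response to C) and restarts the punishment:

    # Sketch: numeric one-shot deviation checks for the one-period stick and carrot
    # in Example 3.12 (u(A,A) = 4, best on-path deviation 5; u(C,C) = -1/2;
    # best deviation against C earns 1).
    def path_value(first, delta):
        # Normalized value of getting `first` today and 4 in every later period.
        return (1 - delta) * first + delta * 4

    def follows(delta):
        v_punish = path_value(-0.5, delta)                   # serve punishment, then 4 forever
        on_path = 4 >= (1 - delta) * 5 + delta * v_punish    # deviate, then be punished
        off_path = v_punish >= (1 - delta) * 1 + delta * v_punish  # deviate, restart punishment
        return on_path and off_path

    print([d / 100 for d in range(1, 100) if follows(d / 100)][0])  # smallest grid delta

On this grid the punishment phase condition binds, with cutoff just above 1/3.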
1. How can one obtain payoff vectors that are not generated by any pure action profile?
Such payoff vectors can be achieved by deterministic alternation. However, the need to
alternate complicates the construction of equilibrium, since we must ensure that no player
has a profitable deviation at any point during the alternation. Fudenberg and Maskin (1991)
show that this can be accomplished so long as players are sufficiently patient.
To avoid alternation and the complications it brings, it is common to augment the repeated
game by introducing public randomization: at the beginning of each period, all players view
the realization of a uniform(0, 1) random variable, enabling them to play a correlated action
in every period. If on the equilibrium path players always play a correlated action whose
expected payoff is v, then each player's continuation payoff is always exactly v. Since in
addition the benefit obtained in the current period from a one-shot deviation is bounded
(by max_a ui(a) − min_a ui(a)), the equilibrium constructions and analyses from Examples 3.9
and 3.11 go through with very minor changes.
It is natural to ask whether public randomization introduces equilibrium outcomes that
would otherwise be impossible. Since the folk theorem holds without public randomiza-
tion, we know that for each payoff vector v ∈ F∗ , there is a δ(v) such that v can be achieved
in a subgame perfect equilibrium of G∞ (δ) whenever δ > δ(v). Furthermore, Fudenberg
et al. (1994) show that any given convex, compact set in the interior of F∗ contains only
subgame perfect equilibrium payoff vectors of G∞ (δ) once δ is large enough. However,
Yamamoto (2010) constructs an example in which the set of subgame perfect equilibrium
payoff vectors of G∞ (δ) is not convex (and in particular excludes certain points just inside
the Pareto frontier) for any δ < 1; thus, allowing public randomization is not entirely
without loss of generality even for discount factors arbitrarily close to 1.
2. How can one obtain payoffs close to the players’ mixed minmax values?
The stick and carrot equilibrium from Example 3.11 relied on pure minmax actions
as punishments. As Example 3.10 illustrates, such punishments are not always strong
enough to sustain all vectors in F∗ as subgame perfect equilibrium payoffs.
The difficulty with using mixed action punishments is that when a player chooses a mixed
action, his opponents cannot observe his randomization probabilities, but only his realized
pure action. Suppose we modified the stick and carrot strategy profile from Example 3.11
by specifying that player i play his mixed minmax action αi in the punishment phase.
Then during this punishment phase, unless player i expects to get the same stage game
payoff from each action in the support of αi , he will have a profitable and undetectable
deviation from αi to his favorite action in the support of αi .
To address this problem, we need to modify the repeated game strategies so that when
player i plays one of his less preferred actions in the support of αi , he is rewarded with a
higher continuation payoff. More precisely, by carefully balancing player i’s stage game
payoffs and continuation payoffs, one can make player i indifferent among playing any
of the actions in the support of αi , and therefore willing to randomize among them in the
way his mixed minmax action specifies. See Mailath and Samuelson (2006, Sec. 3.8) for a
textbook treatment.
3. What new issues arise when there are three or more players?
With two players, we can always find an action profile (α^m_1, α^m_2) in which players 1 and 2
simultaneously minmax each other. This possibility was the basis for the stick and carrot
equilibrium from Example 3.11.
Once there are three players, simultaneous minmaxing may no longer be possible: if
(α1 , α2 ) minmaxes player 3, there may be no α3 such that (α1 , α3 ) minmaxes player 2.
In such cases, there is no analogue of the two-player stick and carrot equilibrium, and
indeed, it is not always possible to achieve payoffs close to the players’ minmax values in
a subgame perfect equilibrium.
Example 3.13. Fudenberg and Maskin (1986). Consider this three-player game G, in which
player 1 chooses a row, player 2 a column, and player 3 a matrix:

     3 plays A3:                      3 plays B3:
               A2         B2                    A2         B2
     A1     1, 1, 1    0, 0, 0        A1     0, 0, 0    0, 0, 0
     B1     0, 0, 0    0, 0, 0        B1     0, 0, 0    1, 1, 1
Each player's minmax value is 0, so F∗ = {(λ, λ, λ) : λ ∈ (0, 1]}.
Let λ̲ = min{λ : (λ, λ, λ) is a subgame perfect equilibrium payoff of G∞(δ)}. (It can be
shown quite generally that this worst payoff exists—see Section 3.3.)
We claim that for all δ ∈ (0, 1), λ̲ ≥ 1/4:
Why? Consider the subgame perfect equilibrium that generates payoffs of (λ̲, λ̲, λ̲). For
any mixed action profile (α1, α2, α3) of G, there is a player who can guarantee himself 1/4. Both
the equilibrium path and the worst punishment path give each player a payoff of λ̲. Hence,
for no player to have an incentive to deviate, we must have that λ̲ ≥ (1 − δ) 1/4 + δλ̲, or
equivalently, that λ̲ ≥ 1/4. ♦
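The claim that some player can always guarantee 1/4 can be verified numerically. Writing
x, y, z for the probabilities of A1, A2, A3, player 1's best response earns max(yz, (1−y)(1−z)),
and similarly for the others; a grid search (a sketch of mine) shows the best guarantee
never falls below 1/4, the value attained when everyone mixes evenly:

    # Sketch: in Example 3.13, some player can always guarantee 1/4.
    n = 50
    worst = 1.0
    for i in range(n + 1):
        for j in range(n + 1):
            for k in range(n + 1):
                x, y, z = i / n, j / n, k / n              # P(A1), P(A2), P(A3)
                g1 = max(y * z, (1 - y) * (1 - z))         # 1's best response payoff
                g2 = max(x * z, (1 - x) * (1 - z))         # 2's best response payoff
                g3 = max(x * y, (1 - x) * (1 - y))         # 3's best response payoff
                worst = min(worst, max(g1, g2, g3))
    print(worst)   # 0.25, attained at x = y = z = 1/2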
The problem in the previous example is that the players’ payoff functions are identical,
so that no one player can be rewarded or punished independently of the others. In order
to obtain a folk theorem for games with three or more players, we must restrict attention
to stage games in which such independent provision of incentives is possible. Fudenberg
and Maskin (1986) accomplish this under the assumption that the set F∗ is full dimensional,
meaning that it has the same dimension as the number of players. Abreu et al. (1994)
do so under the weaker restriction that the stage game satisfy the nonequivalent utility
condition stated in Theorem 3.8, and show that this condition is essentially necessary for
the conclusion of the folk theorem to obtain.
To obtain a pure minmax folk theorem for games with three or more players satisfying the
NEU condition using an analogue of the stick and carrot strategy profile from Example
3.11, one first replaces the single punishment stage with distinct punishments for each
player (as in Example 3.9). To provide incentives to player i’s opponents to carry out a
punishment of player i, one must reward the punishers once the punishment is over. See
Mailath and Samuelson (2006, Sec. 3.4.2) for a textbook treatment.
3.3 Computing the Set of Subgame Perfect Equilibrium Payoffs
A good reference on this material is Stokey and Lucas (1989, Sec. 4.1–4.2).
Definitions
X ⊆ R^n          state space
Γ: X ⇒ X         feasibility correspondence
                 Γ(x) = states feasible tomorrow if today's state is x
Φ: X ⇒ X^∞       feasibility correspondence for sequences
                 Φ(x) = {{xt}_{t=0}^∞ : x0 = x, xt+1 ∈ Γ(xt) for all t ≥ 0}
F: X × X → R     payoff function. F(x, y) = agent's payoff today if
                 today's state is x and tomorrow's state is y
δ ∈ (0, 1)       discount rate.
Example 3.14. If xt = capital at time t and ct = f (xt ) − xt+1 = consumption at time t, then the
payoff at time t is u(ct ) = u( f (xt ) − xt+1 ) = F(xt , xt+1 ). ♦
Assumptions
(i) X is convex or finite.
(ii) Γ is non-empty, compact valued, and continuous.
(iii) F is bounded and continuous.
Theorem 3.15 (The principle of optimality (Bellman (1957))).
Consider the dynamic program

(D)    sup_{{xt} ∈ Φ(x)} Σ_{t=0}^∞ δ^t F(xt, xt+1),

and the associated functional equation

(F)    v(x) = max_{y ∈ Γ(x)} (F(x, y) + δ v(y)).
Theorem 3.15 is a direct analogue of the one-shot deviation principle (Theorem 2.25).
But its statement focuses on the value function (payoffs) rather than the policy function
(strategies).
Part (ii) of the theorem tells us that in order to solve the dynamic program (D), it is enough
to find a bounded solution to the functional equation (F).
For w in B(X), the set of bounded functions on X, define the Bellman operator T by

(20)    Tw(x) = max_{y ∈ Γ(x)} (F(x, y) + δ w(y)).

If w: X → R describes the continuation values tomorrow, then Tw(x) is the (optimal) value
of being at x today. Notice that Tw ∈ B(X).
Observation 3.16. v is a fixed point of T (v(x) = Tv(x) for all x) if and only if v solves (F).
Theorem 3.17.
(i) T(C(X)) ⊆ C(X). (T maps C(X) into itself.)
(ii) ‖Tw − Tŵ‖ ≤ δ‖w − ŵ‖. (T is a contraction.)
(iii) T admits a unique fixed point v, and lim_{k→∞} T^k w = v for all w ∈ C(X).
Part (ii) of the theorem says that the Bellman operator is a contraction mapping. Given
parts (i) and (ii), part (iii) follows from the Banach (or contraction mapping) fixed point
theorem.
Theorem 3.17(iii) says that the functional equation (F) can be solved by value function
iteration. One starts with an arbitrary guess v0 about the value function. One then uses v0
to define the continuation values in (20). Solving (20) generates a new value function v1,
which is then used to define the continuation values in (20). Theorem 3.17(iii) says that
the sequence of value functions generated by this process converges to the fixed point
of the Bellman operator T, which is the solution to the functional equation (F), and hence
the solution to the dynamic program (D).
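For a concrete (and purely illustrative) instance, here is a Python sketch of value function
iteration on a finite state space, using a toy consumption-savings problem in the spirit of
Example 3.14; the grid, production function, and tolerance are my own choices:

    # Sketch: value function iteration v_{k+1} = T v_k, with
    # Tw(x) = max_{y in Gamma(x)} [F(x, y) + delta * w(y)] on a finite grid.
    # Toy model: capital x, output f(x) = x**0.5, utility log(consumption).
    import math

    delta = 0.9
    X = [i / 20 for i in range(1, 21)]                       # state grid
    def Gamma(x):                                            # feasible next states
        return [y for y in X if y < math.sqrt(x)]
    def F(x, y):                                             # today's payoff
        return math.log(math.sqrt(x) - y)

    v = {x: 0.0 for x in X}                                  # initial guess v0
    for _ in range(500):
        v_new = {x: max(F(x, y) + delta * v[y] for y in Gamma(x)) for x in X}
        if max(abs(v_new[x] - v[x]) for x in X) < 1e-10:     # sup-norm convergence
            break
        v = v_new
    print({x: round(v[x], 3) for x in list(X)[:3]})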
We want to factor elements of V using pairs (a, c), where a ∈ A is the initial action profile
and c : A → Rn is a continuation value function.
The value of pair (a, c) is the vector v(a, c) = (1 − δ)u(a) + δc(a) ∈ Rn .
A pair (a, c) is enforceable if vi(a, c) ≥ vi((âi, a−i), c) for all âi ∈ Ai and all i ∈ P, so that no
player can gain by deviating from a today when future payoffs are given by c. For W ⊆ R^n,
let B(W) denote the set of values v(a, c) generated by enforceable pairs (a, c) with c: A → W.
We say that W is self-generating if W ⊆ B(W). This means that one can enforceably factor
any payoff vector in W into an action profile today and continuation values in W tomorrow,
so that the problem of obtaining a value in W can be "put off until tomorrow". But when
tomorrow comes, we can put off obtaining the value in W until the following day. . . If
we repeat this indefinitely, then in the end we will have constructed a subgame perfect
equilibrium. (W must be bounded for this to work for certain: see Example 3.23 below.)
Theorem 3.18. If W is self-generating and bounded, then W ⊆ V.
Theorem 3.20 provides an algorithm for computing the set of subgame perfect equilibrium
payoff vectors. To state it, we define W0 = {w ∈ conv(u(A)) : wi ≥ v̲i for all i ∈ P} to be the
set of feasible, weakly individually rational payoff vectors, and define {Wk}_{k=1}^∞ inductively
by Wk+1 = B(Wk). Obviously W1 = B(W0) ⊆ W0, and in fact, the sequence {Wk}_{k=1}^∞ is
monotone (decreasing). Since V is a fixed point of B, it is a plausible candidate to be the
limit of the sequence. Theorem 3.20 confirms that this is the case.
Theorem 3.20. The limit ∩_{k=0}^∞ Wk exists and equals V.
In this theorem, Wk is the set of equilibrium payoff vectors of games of the following
form: k rounds of the stage game G, followed by (history-dependent) continuation values
drawn from W0 (mnemonic: Wk = Bk (W0 ) comes k periods before W0 ). As k grows larger,
the time before the continuation payoffs from W0 appear is put off further and further into
the future.
Remarks:
1. All of these results can be generalized to repeated games with imperfect public
monitoring—see Abreu et al. (1990).
2. Using these results, one can prove that the set V of (normalized) subgame perfect
equilibrium payoffs is monotone in the discount factor δ ∈ (0, 1). In other words,
increasing players’ patience increases the set of equilibrium outcomes.
Example 3.21. Consider an infinite repetition of the Prisoner's Dilemma below. What is
the set of subgame perfect equilibrium payoffs when δ = 3/4?

               C          D
     C       1, 1      −1, 2
     D       2, −1      0, 0
[Figure: the payoff vectors u(C, C), u(C, D), u(D, C), and u(D, D); the set W0 is their convex
hull intersected with the nonnegative quadrant.]
To begin, we compute the set W1 = B(W0 ). For each action profile a, we determine the set
W1a of payoff vectors that can be enforceably factored by some pair (a, c) into W0 (meaning
that c : A → W0 ). Then W1 is the union of these sets.
First consider a = (D, D). For this action profile to be enforceable, neither player can prefer
to deviate to C. Player 1 does not prefer to deviate if

    (1 − δ)u1(D, D) + δc1(D, D) ≥ (1 − δ)u1(C, D) + δc1(C, D),  i.e.,  c1(D, D) ≥ c1(C, D) − 1/3,

and symmetrically for player 2.
These inequalities show that if c(D, D) = c(C, D) = c(D, C), the pair (a, c) will be enforceable.
(This makes sense: one does not need to promise future rewards to make players choose
a dominant action.) Thus, for any w ∈ W0, we can enforce action profile (D, D) using a
continuation value function c with c(D, D) = w.
The value for the pair ((D, D), c) with c(D, D) = w is

    (1 − δ)u(D, D) + δc(D, D) = 1/4 · (0, 0) + 3/4 · w.
In the figure below, the full shaded area is W0; the smaller shaded area is W1^DD =
1/4 · (0, 0) + 3/4 · W0, the set of payoff vectors that can be enforceably factored by some pair
((D, D), c) into W0.
[Figure: W0 and the subset W1^DD.]
Now consider a = (C, C). In this case, the enforceability constraints are

    (1 − δ)u1(C, C) + δc1(C, C) ≥ (1 − δ)u1(D, C) + δc1(D, C)
    ⇔ 1/4 · 1 + 3/4 c1(C, C) ≥ 1/4 · 2 + 3/4 c1(D, C)
    ⇔ c1(C, C) ≥ 1/3 + c1(D, C),    and, symmetrically,
       c2(C, C) ≥ 1/3 + c2(C, D).

Thus for any w ∈ {w ∈ W0 : w1, w2 ≥ 1/3}, we can enforce (C, C) using a c with c(C, C) = w.
The value for the pair ((C, C), c) with c(C, C) = w is

    (1 − δ)u(C, C) + δc(C, C) = 1/4 · (1, 1) + 3/4 · w.

In the figure below, the full shaded area is {w ∈ W0 : w1, w2 ≥ 1/3}, the set of allowable values
for c(C, C); the smaller shaded area is the set W1^CC = 1/4 · (1, 1) + 3/4 · {w ∈ W0 : w1, w2 ≥ 1/3}.
[Figure: the set {w ∈ W0 : w1, w2 ≥ 1/3} and the subset W1^CC.]
Next consider a = (C, D). In this case the enforceability constraints are

    c1(C, D) ≥ 1/3 + c1(D, D)    and    c2(C, D) ≥ −1/3 + c2(C, C).

That is, we only need to provide incentives for player 1. Reasoning as above, we find that
for any w ∈ {w ∈ W0 : w1 ≥ 1/3}, we can enforce (C, D) using a c with c(C, D) = w. The value
for the pair ((C, D), c) with c(C, D) = w is

    (1 − δ)u(C, D) + δc(C, D) = 1/4 · (−1, 2) + 3/4 · w.
In the figure below at left, the larger shaded area is {w ∈ W0 : w1 ≥ 1/3}; the smaller shaded
area is the set W1^CD = 1/4 · (−1, 2) + 3/4 · {w ∈ W0 : w1 ≥ 1/3}. The figure below at right shows
the construction of W1^DC, which is entirely symmetric.

[Figures: the sets W1^CD (left) and W1^DC (right). A final figure shows that the union of the
four sets is W1 = W0.]
Repeating the argument above shows that Wk = W0 for all k, implying that
V = W0. In other words, all feasible, weakly individually rational payoffs are achievable
in subgame perfect equilibrium when δ = 3/4. ♦
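The four affine maps above are easy to check by machine. A Python sketch (my own)
approximates W0 by a grid and applies the operator B as derived above, with each action
profile contributing 1/4 · u(a) + 3/4 · w over its allowable continuation values w; it confirms
that every image lies in W0 and that the union covers W0 up to the grid resolution:

    # Sketch: grid check that B(W0) = W0 in Example 3.21 (delta = 3/4).
    delta = 3 / 4
    def in_W0(w):   # W0 = conv hull of the four payoff vectors, with w >= 0
        x, y = w
        return x >= 0 and y >= 0 and x + 2*y <= 3 + 1e-9 and 2*x + y <= 3 + 1e-9

    W0 = [(i / 20, j / 20) for i in range(61) for j in range(61) if in_W0((i/20, j/20))]

    def images(w):
        x, y = w
        out = [(delta * x, delta * y)]                                        # (D,D)
        if x >= 1/3 and y >= 1/3:
            out.append((0.25 + delta * x, 0.25 + delta * y))                  # (C,C)
        if x >= 1/3:
            out.append((-0.25 + delta * x, 0.5 + delta * y))                  # (C,D)
        if y >= 1/3:
            out.append((0.5 + delta * x, -0.25 + delta * y))                  # (D,C)
        return out

    BW0 = {v for w in W0 for v in images(w)}
    assert all(in_W0(v) for v in BW0)                        # B(W0) subset of W0
    close = lambda p, S: any(abs(p[0]-q[0]) + abs(p[1]-q[1]) < 0.1 for q in S)
    assert all(close(w, BW0) for w in W0)                    # W0 covered (approximately)
    print(len(W0), len(BW0))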
Example 3.22. Consider the same Prisoner's Dilemma stage game as in the previous ex-
ample, but suppose that δ = 1/2. The sets W0, . . . , W5 are shown below.

[Figure: the sets W0, W1, W2, and subsequent iterates, each strictly smaller than the last.]
Taking the limit, we find that the set of subgame perfect equilibrium payoffs is

    V = ∪_{k=0}^∞ {(1/2^k, 1/2^k), (1/2^k, 0), (0, 1/2^k)} ∪ {(0, 0)}.
[Figure: the limit set W∗ = V, a collection of isolated points in W0.]
The payoff (1, 1) is obtained in a subgame perfect equilibrium from the grim trigger
strategy profile with equilibrium path (C, C), (C, C), . . . . The payoff (1, 0) is obtained from
the one with equilibrium path (D, C), (C, D), (D, C), (C, D), . . . ; payoff (0, 1) is obtained by
reversing roles. Payoff (1/2^k, 1/2^k) is obtained by playing (D, D) for k periods before
beginning the cooperative phase; similarly for (1/2^k, 0) and (0, 1/2^k). Finally the "always
defect" strategy profile yields payoff (0, 0). ♦
Once one understands why Theorems 3.18 and 3.19 are true, the proofs are basically
bookkeeping. The proof of Theorem 3.20, which calls on the previous two theorems,
requires more work, but it can be explained quickly if the technicalities are omitted. This
is what we do below.
Example 3.23. That W is bounded ensures that we can’t put off actually receiving our
payoffs forever. To see this, suppose that W = R+ and δ = 1/2. Then we can decompose
1 ∈ W as 1 = 0 + δ · 2. And we can decompose 2 ∈ W as 2 = 0 + δ · 4. And we can decompose
4 ∈ W as 4 = 0 + δ · 8 . . . And the payoff 1 is never obtained. ♦
For each a ∈ A, let c(a) = π(s|a), where s|a is the strategy profile starting in period 1 if action
profile a is played in period 0. Then:
(i) w = π(s) = v(a0, c). (In words: the pair (a0, c) generates the value w.)
(ii) Since s is a SPE, enforceability in period 0 requires that

    (1 − δ)ui(a0) + δπi(s|a0) ≥ (1 − δ)ui(ai, a0−i) + δπi(s|(ai, a0−i)) for all ai ∈ Ai, i ∈ P
    ⇔ vi(a0, c) ≥ vi((ai, a0−i), c) for all ai ∈ Ai, i ∈ P.
4. Bayesian Games
A strategic environment is said to have incomplete information if when play begins, play-
ers possess payoff-relevant information that is not common knowledge. Bayesian games
(Harsanyi (1967–1968)) provide a tractable way of modeling such environments. While
many basic Bayesian games are not difficult to interpret, the Bayesian game framework
turns out to be extremely general, and so involves many subtleties and complexities. Sec-
tions 4.1–4.6 focus on games whose interpretations are relatively straightforward; more
ingenious games, foundational questions, and other challenging material are presented in
Sections 4.7–4.9.
The fundamental new notion in Bayesian games is that of a type. A player’s type speci-
fies whatever payoff-relevant information and beliefs he holds that other players do not
know. For instance, in an auction environment, a player’s type includes both his private
information about the good for sale, as well as his beliefs about other players’ types. More
on this shortly.
A Bayesian game (Harsanyi (1967–1968)) is a collection BG = {P , {Ai }i∈P , {Ti }i∈P , {pi }i∈P , {ui }i∈P }.
As in our previous game models, the definition of the game is common knowledge among
the players.
The interaction defined above is a simultaneous move Bayesian game: all players choose
actions at the same time. We concentrate on this setting to focus our attention on the novel
aspects of Bayesian games.
We usually think of these choices being made when each player i knows his own type ti,
but not the other players' types. This moment is known as the interim stage.
Example 4.1. A k-card card game. Two players play a game using a deck containing k cards
that are numbered 1 through k. To start, each player antes $1. The deck is shuffled and
each player is dealt one card. Each player simultaneously chooses to Play, which requires
putting an additional $1 into the pot, or to Fold. If both players Play, the player with the
higher card takes the pot. If only one player Plays, he takes the pot. If both players Fold,
each takes back his ante.
This game is described formally as follows:
Ai = {P, F},    Ti = {1, . . . , k},

ui(a, t) =   2    if ai = aj = P and ti > tj,
             1    if ai = P and aj = F,
             0    if ai = aj = F,
            −1    if ai = F and aj = P,
            −2    if ai = aj = P and ti < tj,

pi(tj | ti) =   1/(k−1)   if tj ≠ ti,
                0          if tj = ti.   ♦
Beliefs in BG can be derived from a common prior distribution p ∈ ∆T if the first-order beliefs
pi(· | ti) are conditional probabilities (or posterior probabilities) generated from p:

(21)    pi(t−i | ti) = p(ti, t−i) / Σ_{t̂−i ∈ T−i} p(ti, t̂−i).
Example 4.2. The common prior distribution of the k-card card game describes the distri-
bution of possible deals:

    p(t1, t2) =   1/(k(k−1))   if t1 ≠ t2,
                  0            if t1 = t2.   ♦
In the card game, the common prior describes both players' beliefs at the ex ante stage,
meaning the time before the cards are dealt.
Most economic applications of Bayesian games employ the common prior assumption—the
assumption that the players’ first order beliefs pi can be derived from some common prior
p using (21). This assumption means that it is as if the game has an ex ante stage at which
no player has learned his type. See Section 4.4 for a discussion.
The next observation is useful for understanding Bayesian games with a common prior:
Observation 4.3. A Bayesian game BG with common prior distribution p is equivalent to the
following extensive form game ΓBG :
Stage 0: Nature draws t according to the common prior p ∈ ∆T. Player i learns ti only.
Stage 1: Each player i chooses an action from Ai .
In Bayesian games with independent types (including those in which only one player has
multiple types), we can identify a player’s type with some payoff-relevant information
that he possesses. For instance, imagine that a seller is auctioning off a collection of
disparate goods as a bundle. Each bidder receives a private signal about the quality of a
distinct good from the bundle, and these signals are independent of one another. Because
the goods are sold as a bundle, the signals of a bidder’s opponents contain information
about the value of the bundle to the bidder himself. This aspect of types is reflected in
the fact that player i’s utility function ui may condition on his own type ti and on other
players’ types t−i .
In Bayesian games with correlated types, a player’s type also describes beliefs that he
holds but that other players do not know. For instance, consider this environment for
mineral rights auctions (see also Section 4.4): A seller is auctioning the right to drill for oil
on a certain plot of land. Each bidder receives an unbiased private signal about the value
of the land (obtained, for instance, by having a team of geologists examine core samples).
In this case, a bidder who receives a high signal about the land’s quality is likely to believe
that other bidders have also received high signals. These beliefs depend on the bidder’s
signal, and so are only known to the bidder himself. This aspect of types is reflected in
the fact that a player’s first-order beliefs pi depend on his type ti .
Notice that one aspect of a player’s type is his beliefs about other players’ types, and that
one aspect of those types is those players’ beliefs about other players’ types, and that one
aspect of those types is those players’ beliefs about other players’ types. . . Thus in settings
with correlated types, the simple definition of Bayesian games given above is less simple
than it seems: in addition to first-order beliefs, we have second-order beliefs, third-order
beliefs, and so on. We return to this point in Sections 4.7–4.9.
4.2 Bayesian Strategies, Dominance, and Bayesian Equilibrium

Bayesian strategies

A pure Bayesian strategy for player i is a map si : Ti → Ai specifying an action for each of
his types. Let Si denote player i's set of pure Bayesian strategies. Let S−i = ∏_{j≠i} Sj and
S = ∏_{j∈P} Sj.
In games with an ex ante stage, a Bayesian strategy describes a player’s plan for playing
the game at that stage. (For other games, see Section 4.4.)
Interim payoffs
If player i of type ti chooses action ai, and each opponent j ≠ i plays some Bayesian strategy
sj, then type ti's interim payoff (or expected payoff) is

(22)   Ui(ai, s−i | ti) = Σ_{t−i ∈ T−i} pi(t−i | ti) ui((ai, s−i(t−i)), (ti, t−i)).
Example 4.4. Consider the card game from Example 4.1, in which (i) each player is dealt a
card which only he observes, and then (ii) each player simultaneously chooses an action.
According to definition (22), when a player with card ti is deciding what to do, he uses his
posterior beliefs pi(· | ti) to assess the probabilities of the possible cards tj of his
opponent. (In this example the correlation in types takes a simple form: the player knows
that his opponent’s card differs from his own, but puts equal weight on his opponent
holding each of the remaining cards.)
Now suppose that (as assumed in equilibrium) player i correctly anticipates his opponent’s
strategy s−i , and hence the action s j (t j ) that his opponent would play if she had card t j .
Definition (22) indicates that player i’s payoffs are affected by his opponent’s card in two
distinct ways: there is a direct effect (if both players Play, he will win if his card is better
than his opponent’s), as well as an indirect effect (his opponent’s card determines whether
she will Play or Fold). ♦
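To illustrate definition (22) computationally, here is a short Python sketch (ours; the function names are illustrative) that evaluates a type's interim payoff in the k-card game, with the opponent's Bayesian strategy represented as a map from cards to actions:

    def u(ai, aj, ti, tj):
        # Payoffs of the k-card game from Example 4.1, from player i's side.
        if ai == 'P' and aj == 'P':
            return 2 if ti > tj else -2
        if ai == 'P' and aj == 'F':
            return 1
        if ai == 'F' and aj == 'P':
            return -1
        return 0  # both Fold: antes are returned

    def interim_payoff(ai, s_opp, ti, k):
        # Equation (22), with beliefs p_i(t_j | t_i) = 1/(k-1) on each t_j != t_i.
        others = [tj for tj in range(1, k + 1) if tj != ti]
        return sum(u(ai, s_opp[tj], ti, tj) for tj in others) / (k - 1)

    k = 4
    always_play = {tj: 'P' for tj in range(1, k + 1)}
    for ti in range(1, k + 1):
        print(ti, interim_payoff('P', always_play, ti, k))
    # Against an opponent who always Plays, Playing earns, from card 1 up:
    # -2, -2/3, 2/3, 2 -- the direct effect of the opponent's card.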
Dominance
Action ai is very weakly dominant for type ti if

(23)   Ui(ai, s−i | ti) ≥ Ui(âi, s−i | ti) for all s−i ∈ S−i and âi ∈ Ai.
In words: action ai is optimal for ti given his beliefs about his opponents’ types, regardless
of his opponents’ Bayesian strategies.
Bayesian strategy si is very weakly dominant if for each type ti ∈ Ti , action si (ti ) is very
weakly dominant for ti . That is:
(24) Ui (si (ti ), s−i |ti ) ≥ Ui (âi , s−i |ti ) for all s−i ∈ S−i , âi ∈ Ai , and ti ∈ Ti .
This is the standard notion of dominance used in mechanism design (see Section 7.1.3).
In Bayesian games with private values (ui(a, t) ≡ ui(a, ti)), the interim condition (23) is
equivalent to the simpler ex post condition
ui ((ai , a−i ), ti ) ≥ ui ((âi , a−i ), ti ) for all a−i ∈ A−i and âi ∈ Ai .
(Verifying this is a good finger exercise.) Thus with private values, (24) simplifies to
(25) ui ((si (ti ), a−i ), ti ) ≥ ui ((âi , a−i ), ti ) for all a−i ∈ A−i , âi ∈ Ai , and ti ∈ Ti .
We introduce an ex post (though still equilibrium-based) solution concept for games with
interdependent values in Section 4.6.
Action ai is weakly dominant for type ti if, in addition to (23), for all âi ≠ ai there is an s−i ∈ S−i
for which the inequality in (23) is strict. Bayesian strategy si is weakly dominant if it is very
weakly dominant and if si(ti) is weakly dominant for some type ti ∈ Ti.
One can define weak domination, strict dominance, and strict domination in a similar
fashion.
Bayesian equilibrium
Pure Bayesian strategy profile s is a Bayesian equilibrium if

(26)   Ui(si(ti), s−i | ti) ≥ Ui(âi, s−i | ti) for all âi ∈ Ai, ti ∈ Ti, and i ∈ P.
In words: for each player i and each type ti , action si (ti ) is optimal given ti ’s beliefs about
his opponents’ types, and given his opponents’ Bayesian strategies s−i .
The definition of mixed strategy Bayesian equilibrium is what one would expect, but
requires more notation.
If player i of type ti chooses mixed action αi, and each opponent j ≠ i plays some mixed
Bayesian strategy σj, then type ti's expected payoff is

Ui(αi, σ−i | ti) = Σ_{t−i ∈ T−i} pi(t−i | ti) Σ_{a ∈ A} ( αi(ai) · ∏_{j≠i} σj(aj | tj) ) ui(a, (ti, t−i)).

Mixed Bayesian strategy profile σ is a (mixed) Bayesian equilibrium if

Ui(σi(ti), σ−i | ti) ≥ Ui(α̂i, σ−i | ti) for all α̂i ∈ ∆Ai, ti ∈ Ti, and i ∈ P.
Remarks:
(i) Since we are considering Bayesian games with simultaneous moves, Bayesian
equilibrium need not (and does not) include any notion of sequential rationality.
For this reason, the terms Bayes-Nash equilibrium or just Nash equilibrium are often
used in place of Bayesian equilibrium.
(ii) If Bayesian game BG has a common prior, then a Bayesian equilibrium of BG is just
a Nash equilibrium of the extensive form game ΓBG from Observation 4.3.
A Bayesian game BG can be represented as a simultaneous move game using the interim
normal form INF(BG). If BG has player set P and type sets Ti , the set of players of INF(BG)
is {(i, ti) : i ∈ P, ti ∈ Ti}: that is, the interim normal form has one player for every type
of every player in the original game, and hence Σ_{i∈P} |Ti| players in total. Player (i, ti)
has action set Ai , and his utility function is the interim payoff function Ui (ai , s−i |ti ) from
(22). (Notice that in INF(BG), player (i, ti )’s payoffs are independent of the actions of any
opponents (i, t̂i ) corresponding to the same player in the original game BG.)
Because Bayesian equilibrium only depends on interim payoffs, the following observation
is immediate:
Observation 4.5. A pure (or mixed) strategy profile is a Bayesian equilibrium of BG if and only
if it corresponds to a pure (or mixed) Nash equilibrium of INF(BG).
It follows immediately from this observation and Theorem 1.39 that every Bayesian game
with finite type and action sets admits a (possibly mixed) Bayesian equilibrium.
4.3 Computing Bayesian Equilibria

The remarks above show that Bayesian equilibrium is really just Nash equilibrium in the
context of a Bayesian game. Thus in settings with finite numbers of types and actions (and
even some without), computation of Bayesian equilibrium works like computation of Nash
equilibrium: first apply iterated dominance arguments, and then check all combinations
of strategy profiles that remain. In both cases, equilibria are fixed points, and finding all
fixed points generally requires an exhaustive search.
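For small two-player Bayesian games, this exhaustive search is easy to automate. The sketch below (ours; a brute-force illustration, not a method described in the text) enumerates all pure Bayesian strategy profiles and keeps those satisfying (26), with interim payoffs computed as in (22):

    from itertools import product

    def pure_bayesian_equilibria(types, actions, belief, u):
        # types[i], actions[i]: lists for players i = 0, 1;
        # belief[i](tj, ti) = p_i(tj | ti); u[i](ai, aj, ti, tj) = i's utility.
        def U(i, ai, s_opp, ti):  # interim payoff (22)
            j = 1 - i
            return sum(belief[i](tj, ti) * u[i](ai, s_opp[tj], ti, tj)
                       for tj in types[j])
        strategies = [[dict(zip(types[i], prof))
                       for prof in product(actions[i], repeat=len(types[i]))]
                      for i in (0, 1)]
        equilibria = []
        for s in product(strategies[0], strategies[1]):
            if all(U(i, s[i][ti], s[1 - i], ti) >= U(i, ai, s[1 - i], ti)
                   for i in (0, 1) for ti in types[i] for ai in actions[i]):
                equilibria.append(s)
        return equilibria

Plugging in the k-card game's utility function and beliefs (as in the earlier sketches) recovers its pure Bayesian equilibria. Note that the enumeration is exponential in the number of types, in keeping with the remark that finding all equilibria generally requires exhaustive search.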
Example 4.6. Example 2.51 introduced the extensive-form Ace-King-Queen Poker game
ΓAKQ . In this game, the cards are dealt, player 1 chooses to Raise (R) or Fold (F), and in the
event of a raise, player 2 chooses to Call (c) or Fold ( f ).
To represent this interaction as a simultaneous move Bayesian game BGAKQ , we describe
the play of the game after the cards are dealt as a simultaneous move game. This game
has type spaces T1 = {A, K, Q}, T2 = {a, k, q} and a common prior p ∈ ∆T, specified in the
next table, that describes the possible deals of the cards.
              t2
           a      k      q
      A    0     1/6    1/6
  t1  K   1/6     0     1/6
      Q   1/6    1/6     0
Payoffs are described below. Notice that payoffs only depend on types if player 1 raises
and player 2 calls, in which case the player with the higher card wins:
                  2
              c                 f
      R   u((R, c), t)        1, −1
  1
      F      −1, 1            −1, 1

where u((R, c), t) = (2, −2) if t ∈ {Ak, Aq, Kq}, and u((R, c), t) = (−2, 2) if t ∈ {Ka, Qa, Qk}.
In Example 2.51, we showed that the unique sequential equilibrium of the extensive form
game ΓAKQ is

(27)   t1 = Q plays 1/3 R + 2/3 F, and t2 = k plays 1/3 c + 2/3 f,

with the remaining types' behavior determined by dominance: in particular, type A plays
R and type a plays c. To verify this in the Bayesian game BGAKQ, note that these dominance
facts (together with the fact that a player's card rules out the matching card of his opponent)
reduce the strategic interaction to the following game between types t1 = Q and t2 = k:

                   t2 = k
                 c               f
          R   −2, 0         −1/2, −1
  t1 = Q
          F   −1, −1/2       −1, 0
The unique Nash equilibrium of this game has t1 = Q play 1/3 R + 2/3 F and has t2 = k play
1/3 c + 2/3 f, as stated in (27).
(How do we obtain the payoffs in the table above? First consider the payoffs of t1 = Q.
This type is equally likely to face a type a, who we know will play c, or a type k. Suppose
type Q raises. If type k plays c, then type Q gets −2 for sure. If instead type k plays f, then
type Q will get −2 if player 2 is type a and 1 if player 2 is type k, for an expected payoff of
(1/2)(−2) + (1/2)(1) = −1/2. Finally, if type Q folds, he gets −1 for sure.
Now consider the payoffs of t2 = k. This type is equally likely to face a type A, who we
know will play R, or a type Q. Suppose type k calls. If type Q plays R, then type k's
expected payoff is (1/2)(−2) + (1/2)(2) = 0; if type Q plays F, then type k's expected payoff is
(1/2)(−2) + (1/2)(1) = −1/2. Now suppose type k folds. If type Q plays R, then type k gets −1 for
sure. If type Q plays F, then type k's expected payoff is (1/2)(−1) + (1/2)(1) = 0.) ♦
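The indifference conditions behind this mixed equilibrium are easy to verify numerically; here is a quick check (ours) based on the 2 × 2 table above:

    # Entries (payoff of t1 = Q, payoff of t2 = k) from the table above.
    payoffs = {('R', 'c'): (-2, 0), ('R', 'f'): (-0.5, -1),
               ('F', 'c'): (-1, -0.5), ('F', 'f'): (-1, 0)}

    p_R, p_c = 1/3, 1/3  # candidate equilibrium mixtures

    for a1 in ('R', 'F'):  # Q's payoff to each action vs. k's mixture
        print(a1, p_c * payoffs[(a1, 'c')][0] + (1 - p_c) * payoffs[(a1, 'f')][0])
    for a2 in ('c', 'f'):  # k's payoff to each action vs. Q's mixture
        print(a2, p_R * payoffs[('R', a2)][1] + (1 - p_R) * payoffs[('F', a2)][1])
    # Q earns -1 from both R and F; k earns -1/3 from both c and f.
    # Both types are indifferent, confirming the mixed equilibrium.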
Example 4.7. In the two-player Bayesian game BG, player i’s type ti , representing his level
of productivity, takes values in the finite set Ti ⊂ {1, 2, . . .}. Types are drawn according to
the prior distribution p on T = T1 × T2 . After types are drawn, each player chooses to be
In or Out of a certain project. If player i chooses Out, his payoff is 0. If player i chooses In
and player j chooses Out, player i’s payoff is −c, where c > 0 is the cost of participating
in the project. Finally, if both players choose In, then player i’s payoff is ti t j − c. Thus,
a player who chooses In must pay a cost, but the project only succeeds if both players
choose In; in the latter case, the per-player benefit of the project is the product of the
players’ productivity levels.
Now suppose that the type sets are T1 = {3, 4, 5} and T2 = {4, 5, 6}, and that the prior
distribution p is given by the table below.
t2
4 5 6
3 .2 .1 0
t1 4 .1 .1 .1
5 0 .1 .3
Use σ_i^{ti} to denote the probability with which a player i of type ti chooses In.
When c = 15, there are three Bayesian equilibria. In all three, types t1 = 3, t1 = 4, t2 = 4,
and t2 = 5 choose Out. The possible behaviors of the remaining two types are: both choose
In, both choose Out, or both choose In with probability 2/3.
This is shown as follows: The highest benefit that type t1 = 3 could obtain from playing
In is (2/3) · 12 + (1/3) · 15 = 13. Since this is less than c = 15, this type stays out in any equilibrium.
Proceeding sequentially, we argue that types t2 = 4, t1 = 4, and t2 = 5 play Out. The
highest benefit type t2 = 4 could obtain from playing In is (1/3) · 16 = 5 1/3 < 15, so this
type plays Out; thus, the highest benefit type t1 = 4 could obtain from playing In is
(1/3) · 20 + (1/3) · 24 = 44/3 = 14 2/3 < 15, so this type plays Out; and thus, finally, the highest benefit
type t2 = 5 could obtain from playing In is (1/3) · 25 = 8 1/3 < 15, so this type plays Out.
Conditional on this behavior for the low and middle productivity types, the remaining
types, t1 = 5 and t2 = 6, are playing a 2 × 2 coordination game, namely

                     t2 = 6
                  In            Out
         In   7 1/2, 7 1/2    −15, 0
  t1 = 5
         Out    0, −15          0, 0

where 7 1/2 = (3/4) · 30 − 15. It is an equilibrium for both to be In, and also for both to be Out.
For type t1 = 5 to be willing to randomize, his expected benefit to being in must equal
c = 15; thus (3/4) · σ_2^6 · 30 = 15, implying that σ_2^6 = 2/3. Virtually the same calculation shows that
σ_1^5 = 2/3 as well. ♦
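The threshold calculations in this example can be automated. A small sketch (ours; names illustrative) that recomputes each type's highest possible benefit from In, given the opponent types already known to play Out:

    prior = {(3, 4): .2, (3, 5): .1, (3, 6): 0,
             (4, 4): .1, (4, 5): .1, (4, 6): .1,
             (5, 4): 0,  (5, 5): .1, (5, 6): .3}
    c = 15

    def max_benefit(player, t, out):
        # Upper bound on this type's benefit from In: opponent types in `out`
        # play Out; all other opponent types are assumed to play In.
        mine = lambda t1, t2: t1 if player == 1 else t2
        theirs = lambda t1, t2: t2 if player == 1 else t1
        pairs = [tt for tt in prior if mine(*tt) == t]
        denom = sum(prior[tt] for tt in pairs)
        return sum(prior[tt] / denom * tt[0] * tt[1]
                   for tt in pairs if theirs(*tt) not in out)

    out1, out2 = set(), set()
    print(max_benefit(1, 3, out2))   # 13 < 15: t1 = 3 plays Out
    out1.add(3)
    print(max_benefit(2, 4, out1))   # 16/3 < 15: t2 = 4 plays Out
    out2.add(4)
    print(max_benefit(1, 4, out2))   # 44/3 < 15: t1 = 4 plays Out
    out1.add(4)
    print(max_benefit(2, 5, out1))   # 25/3 < 15: t2 = 5 plays Out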
4.4 More on Interpretation

Observation 4.3 says that a Bayesian game BG with a common prior can be represented
as an extensive form game ΓBG that begins with a random draw of types from the common
prior distribution. There are applications in which this is just what happens: that is, there
really is an ex ante stage at which the players have identical information, followed by a
moment at which each player receives his private information. In such applications, we
say that the Bayesian game BG admits an ex ante interpretation. Card games (Examples 4.4
and 4.6) are one example of this sort. Another example is that of a mineral rights auction:
Bidders begin with publicly available information about possible amounts of a mineral
in a given tract of land. Then each bidder sends its own geologist to the tract, and the
signals reported to each bidder by its geologist are independent conditional on the tract’s
actual quality. (See Section 4.6 for an example.) The ex ante interpretation only applies
in some applications. But when it does, there is really no difference at all between BG
and ΓBG , and the Bayesian game is really just a particular sort of game with asymmetric
information.
On the other hand, many applications of Bayesian games in economics, even ones with a
common prior, do not have an ex ante stage. For instance, in the joint production game
(Example 4.7), the payoff-relevant information in a player’s type is his level of productivity,
which is a characteristic that we’d expect the player to have known long before the game
is played. In cases like this, the Bayesian game should be given an interim interpretation. In
this interpretation, each player i knows from the very start what his type ti is, and the other
types in Ti are there to allow us to capture other players' uncertainty about player i's type.
Now in defining Bayesian equilibrium, we require optimal play by all types of player i:
not just his actual type ti , but also all other types t̂i . This is the only sensible way to define
equilibrium in this setting: i’s opponents don’t know what type i is, so in equilibrium they
should anticipate optimal behavior from each possible type; if optimality is not required
for type t̂i , it is as if i’s opponents know that this is not i’s actual type.
To start the section, we said that a strategic interaction has incomplete information if when
play begins, players possess payoff-relevant information that is not common knowledge.
This is so when play begins at the interim stage. Thus a Bayesian game with the interim
interpretation is a convenient way of representing a strategic interaction with incomplete
information. In Section 4.8, we explain why any such interaction can be represented by a
Bayesian game.
Common priors
If there are two players and only one has private information, then the prior is just a
description of his opponent's beliefs. Also, if there are two players and neither player's
first-order beliefs depend on his type, then beliefs can be derived from a common prior
with independent types.
In information economics, the vast majority of models assume a common prior, with
or without independent types. There are practical reasons for doing this. Models with
a common prior are less complicated to write down. Also, in applications that do not
assume a common prior, equilibria can be driven by different players having irreconcilable
beliefs about payoff-relevant variables. In most applications this is viewed as undesirable.
But there are settings in which common priors lead to surprising results, most notably the
no trade theorem: Milgrom and Stokey (1982) consider risk-averse agents with a common
prior and a joint allocation that is Pareto efficient relative to this prior. They show that
if the agents subsequently receive private information, it cannot be common knowledge
that mutually beneficial recontracting is possible. The reason is that if one agent suggests
a new contract that he is willing to sign, this fact itself provides information about what
this agent learned, making the contract look undesirable to his opponent. If a common
prior is not assumed, then this impossibility result no longer holds; see Morris (1994). For
further discussion of these and related results, see Morris (1995) and Samuelson (2004).
Despite the prevalence of common priors in applications, and despite some assertions
to the contrary in the literature, there is no reason why a common prior must be assumed
under the interim interpretation of Bayesian games. For discussions, see Dekel and Gul
(1997), Gul (1998), and Section 4.8 below.
4.5 Applications
4.5.1 Auctions
Auction models not only have great practical importance, but are also a rich source of
examples of Bayesian games with continuous type and action sets. The analyses here
introduce ideas that will be useful in later sections.
Basic references: Vickrey (1961, 1962), Myerson (1981), Riley and Samuelson (1981). A
good textbook presentation is Krishna (2002, ch. 2).
We consider the following auction environment: The set of players is P = {1, . . . , n}. Player
i’s valuation for the good being sold, vi ∈ [0, 1], is his private information. Player i’s utility
is vi − p if he obtains the good and pays p, and his utility is −p if he does not obtain the
good and pays p.
Players’ valuations are drawn independently of one another, and they are symmetric,
in that they are all drawn from the same distribution on [0, 1]. The cdf and pdf of this
distribution are denoted F and f . We assume that f is positive, which implies that F is
increasing.
In auction theory, a sealed-bid auction is one in which the bidders submit bids simultane-
ously. A pure Bayesian strategy for player i, sometimes called a bidding strategy, is a map
bi : [0, 1] → [0, 1]. We will only consider pure strategies.
We consider two sealed-bid auction formats.
In a second-price auction, the player who submitted the highest bid wins the object and
pays the amount of the second-highest bid. If more than one player submits the highest
bid, the object is assigned to one of these players at random, who pays the amount of the
bid.
Proposition 4.8. In a second-price auction, β∗i (vi ) = vi (i.e., always bidding one’s valuation) is a
weakly dominant strategy.
Proof. Let vi be player i's valuation, let bi ≠ vi be a possible alternative bid for player i, and
let b∗−i = max_{j≠i} bj be the highest of the opponents' bids. First suppose that bi > vi. If b∗−i >
bi, then player i loses whether he bids vi or bi. If b∗−i < vi, i wins and pays b∗−i either way.
But if b∗−i ∈ (vi, bi), then player i loses if he bids vi but wins if he bids bi; in the latter case, he
pays b∗−i, yielding a payoff of vi − b∗−i < 0. Hence, whenever vi and bi perform differently,
vi does better. In other words, vi weakly dominates bi. Similar reasoning shows that this
is also true for bi < vi, and so β∗i(vi) = vi is a weakly dominant strategy.
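The case analysis in the proof can be mirrored by a brute-force numerical check. A sketch (ours; a discrete bid grid, with ties resolved against the bidder for simplicity):

    def second_price_payoff(v, bid, opp_high):
        # Win and pay the highest opposing bid, or lose and pay nothing.
        return v - opp_high if bid > opp_high else 0.0

    grid = [x / 20 for x in range(21)]  # bids and values on a grid in [0, 1]
    for v in grid:
        for alt in grid:
            for opp in grid:
                assert (second_price_payoff(v, v, opp)
                        >= second_price_payoff(v, alt, opp) - 1e-12)
    print("bidding one's value is never outperformed on the grid")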
Remarks:
(i) The proof of Proposition 4.8 is identical to the one for the case in which the values
are common knowledge. Thus this result does not depend on the assumption that
values are independent or identically distributed, or that they have full support on
[0, 1], but it does depend on the private values assumption.
(ii) While having all players bid their valuations is a Bayesian equilibrium in weakly
dominant strategies, there are many other Bayesian equilibria that use weakly
dominated strategies. For instance, it is a Bayesian equilibrium for player 1 to
always bid 1 and the others to always bid 0. For a complete analysis, see Blume
and Heidhues (2004).
In a first-price auction, the player who submitted the highest bid wins the object and pays
the amount of his bid. If more than one player submits the highest bid, the object is
assigned to one of these players at random, who pays the amount of the bid.
We only consider symmetric equilibria: b∗i(·) = b∗(·) for all i ∈ P. (It can be shown that
there are no asymmetric equilibria.)
Let Vi be a random variable representing player i's valuation at the ex ante stage. Let the
order statistic V^m_{(k)} be the kth lowest of V1, . . . , Vm. Then V^{n−1}_{(n−1)} = max{V1, . . . , Vn−1} has cdf

P(V^{n−1}_{(n−1)} ≤ v) = P(V1 ≤ v, . . . , Vn−1 ≤ v) = P(V1 ≤ v) × · · · × P(Vn−1 ≤ v) = F(v)^{n−1}.
Proposition 4.9. Any symmetric equilibrium bidding strategy b of the first price auction is
increasing and differentiable.

See Matthews (1995) for a proof. We will see proofs of similar results later, but this proof
has a number of picky details.

Proposition 4.10. The unique symmetric equilibrium of the first price auction is

b∗(v) = E(V^{n−1}_{(n−1)} | V^{n−1}_{(n−1)} < v).

That is, each player bids the expected value of the maximum of his opponents' valuations,
conditional on this maximum not exceeding his own valuation.
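For instance (a standard special case, not worked out in the text), suppose that valuations are uniformly distributed, so that F(v) = v. Then dF^{n−1}(w) = (n − 1)w^{n−2} dw, and

b∗(v) = ( ∫_0^v w (n − 1) w^{n−2} dw ) / v^{n−1} = ((n − 1)/n) v.

Each bidder shades his bid below his valuation by the factor (n − 1)/n, and the shading vanishes as the number of bidders grows.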
Proof. (Necessity.) Let b be a symmetric equilibrium. By the previous proposition b
is increasing and differentiable. If a player's opponents choose this strategy, then the
expected payoff of a player of type v who bids as though he were type v̂ is

(28)   (v − b(v̂)) P( max_{j≠i} b(Vj) < b(v̂) ) = (v − b(v̂)) P( max_{j≠i} Vj < v̂ ) = (v − b(v̂)) F^{n−1}(v̂).

Since it is optimal for the player to bid as his own type, the first-order condition for v̂ = v
to maximize (28) is

d/dv̂ [ b(v̂) F^{n−1}(v̂) ] |_{v̂=v} = v · d/dv̂ [ F^{n−1}(v̂) ] |_{v̂=v}.
Integrating both sides from 0 to ṽ (using the continuity of b and F^{n−1}) yields

b(ṽ) F^{n−1}(ṽ) − b(0) F^{n−1}(0) = ∫_0^ṽ w dF^{n−1}(w).

Since F(0) = 0, changing the names of the variables and rearranging yields

b(v) = ( ∫_0^v w dF^{n−1}(w) ) / F^{n−1}(v) = E( V^{n−1}_{(n−1)} | V^{n−1}_{(n−1)} < v ) = b∗(v).
This establishes necessity for v ∈ (0, 1). Since b is increasing and b∗ is continuous, we also
have b(0) = 0 = b∗ (0) and b(1) ≥ b∗ (1). In fact, it must be that b(1) = b∗ (1), since this bid
wins the object for sure.
(Sufficiency.) We need to show that b∗ is a symmetric equilibrium. Suppose that a player’s
opponents choose this strategy. Clearly the player should never choose a bid above b∗ (1),
since bidding b∗ (1) already ensures that he wins the object. Thus since b∗ has range
[0, b∗ (1)], it is enough to show that a player of type v is at least as well off choosing b∗ (v)
as b∗ (v̂) for any v̂ ∈ [0, 1].
Substituting b∗(v̂) into (28) shows that type v's payoff for placing bid b∗(v̂) is

( v − ( ∫_0^v̂ w dF^{n−1}(w) ) / F^{n−1}(v̂) ) F^{n−1}(v̂) = v F^{n−1}(v̂) − ∫_0^v̂ w dF^{n−1}(w) = ∫_0^v̂ (v − w) dF^{n−1}(w).

Since the integrand is positive when w < v and negative when w > v, this expression is
maximized by choosing v̂ = v. Thus bidding b∗(v) is optimal.
Remark: The proof of necessity considered what would happen if a player of type v acted
as though he were of some other type v̂. This trick is closely related to the revelation
principle for mechanism design, a fundamental result that we present in Section 7.1.4.
Proposition 4.11. In both the weakly dominant equilibrium of the second price auction and the
symmetric equilibrium of the first price auction,
(i) the expected payment of a bidder of type v > 0 is P(V^{n−1}_{(n−1)} < v) · E(V^{n−1}_{(n−1)} | V^{n−1}_{(n−1)} < v), and
(ii) the seller's expected revenue is E V^n_{(n−1)}.
The expected payment in statement (i) is the product of (a) the probability that the maxi-
mum of a player’s opponents’ valuations is lower than v and (b) the equilibrium bid for
valuation v in a first price auction. The expected revenue in statement (ii) is the expected
value of the second-highest of the players’ valuations.
Proposition 4.11 is an instance of a much more general phenomenon: see Theorems 7.11
and 7.12.
Proof. (i) For the first price auction, the statement is immediate from Proposition 4.10. For
the second price auction, since β∗i (v) = v, and using the fact that ties occur with probability
zero, player i’s expected payment when he is of type v is
P( max_{j≠i} β∗j(Vj) < β∗i(v) ) · E( max_{j≠i} β∗j(Vj) | max_{j≠i} β∗j(Vj) < β∗i(v) )
   = P( max_{j≠i} Vj < v ) · E( max_{j≠i} Vj | max_{j≠i} Vj < v )
   = P( V^{n−1}_{(n−1)} < v ) · E( V^{n−1}_{(n−1)} | V^{n−1}_{(n−1)} < v ).
(ii) In a symmetric private values environment, the seller's expected revenue can be
determined from the expected payments of each bidder type: if the expected payment of
type v is r̄(v), then the seller's expected revenue is n E(r̄(Vi)) = n ∫_0^1 r̄(v) dF(v). Part (i) says
that the function r̄(·) is the same in the given equilibria of both auction formats, so the
expected revenue is the same as well.
To obtain an expression for this expected revenue, it is enough to consider the second
price auction. In the weakly dominant equilibrium of this auction, the seller's revenue
is the second highest valuation among the n agents. In ex ante terms this is V^n_{(n−1)}, so the
expected revenue is E V^n_{(n−1)}.
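Proposition 4.11 can be checked by simulation. A minimal Monte Carlo sketch (ours; it assumes uniformly distributed values, for which the symmetric first price bid is b∗(v) = (n − 1)v/n, as computed after Proposition 4.10):

    import random

    def expected_revenues(n=4, trials=200_000, seed=0):
        rng = random.Random(seed)
        first = second = 0.0
        for _ in range(trials):
            vals = sorted(rng.random() for _ in range(n))
            second += vals[-2]                 # second price: runner-up's value
            first += (n - 1) / n * vals[-1]    # first price: winner bids b*(v)
        return first / trials, second / trials

    print(expected_revenues())
    # Both averages approach E V^n_(n-1) = (n - 1)/(n + 1) = 0.6 when n = 4.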
In contrast to the environment considered here, many auctions arising in practice feature
interdependent values, meaning that the signals of a bidder’s opponents would give the
bidder a clearer idea of his value for the good. The key early reference on standard
auctions in environments with interdependent values is Milgrom and Weber (1982); see
Krishna (2002) and Milgrom (2004) for textbook treatments. We consider an example
using a “non-standard” auction format in the next section.
4.5.2 The revelation principle for normal form games

Suppose that the players in a normal form game are sent correlated, payoff-irrelevant
signals before the start of play. This augmentation of the normal form game generates a
Bayesian game. Every Bayesian strategy profile in this game induces a distribution over
action profiles in the normal form game. According to the revelation principle for normal
form games (Proposition 4.12 below), the distribution over action profiles induced by any
Bayesian equilibrium is a correlated equilibrium of the normal form game. Thus correlated
equilibrium describes the distributions over action profiles that are possible if play of a
normal form game is preceded by the observation of any correlated, payoff-irrelevant
signals. See Myerson (1991, Chapter 6) for further discussion.
Let G = {P , {Ai }i∈P , {ui }i∈P } be a normal form game. Let BG = {P , {Ai }i∈P , {Ti }i∈P , p, {ui }i∈P } be
a Bayesian game with common prior distribution p ∈ ∆T, and with the same action sets
Ai and utility functions ui : A → R as G. Note that ui does not condition on players’ types.
Proposition 4.12. Let s∗ be a pure Bayesian equilibrium of BG, and let ρ∗ ∈ ∆A be the distribution
over action profiles that s∗ induces: ρ∗(a) = Σ_{t∈T : s∗(t)=a} p(t). Then ρ∗ is a correlated equilibrium
of G.

Proof. Assume without loss of generality that p(ti) = Σ_{t−i∈T−i} p(ti, t−i) > 0 for all ti ∈ Ti and
i ∈ P. Since s∗ is a Bayesian equilibrium, multiplying the equilibrium condition (26) for
type ti through by p(ti) and applying (21) yields

(30)   Σ_{t−i∈T−i} p(ti, t−i) ui(s∗i(ti), s∗−i(t−i)) ≥ Σ_{t−i∈T−i} p(ti, t−i) ui(a′i, s∗−i(t−i)) for all a′i ∈ Ai.

Let Ti∗(ai) = {ti ∈ Ti : s∗i(ti) = ai} be the set of player i types who choose action ai under
strategy s∗. Then

Σ_{ti∈Ti∗(ai)} Σ_{t−i∈T−i} p(ti, t−i) ui(s∗i(ti), s∗−i(t−i)) = Σ_{a−i∈A−i} ρ∗(ai, a−i) ui(ai, a−i).
Therefore, fixing ai and summing inequality (30) over ti ∈ Ti∗(ai), we find that

(31)   Σ_{a−i∈A−i} ρ∗(ai, a−i) ui(ai, a−i) ≥ Σ_{a−i∈A−i} ρ∗(ai, a−i) ui(a′i, a−i)

for all a′i ∈ Ai and i ∈ P. (31) says that ρ∗ is a correlated equilibrium of G.
4.6 Ex Post Equilibrium

In a Bayesian equilibrium, each type ti of player i chooses an action that maximizes his
expected payoffs given his opponents’ Bayesian strategies s−i . These expected payoffs (22)
depend on type ti ’s beliefs pi (·|ti ) about his opponents’ types conditional on his own type.
For a more demanding equilibrium concept, one can still assume that player i correctly
anticipates his opponents’ Bayesian strategies s−i , but can require that the action he chooses
when his type is ti is optimal for any realization of his opponents’ types (and so for all
possible beliefs about their types). This is not a notion of dominance, because equilibrium
knowledge is still used; what is dropped is all information about opponents’ types. This
relaxation is particularly important in auction and mechanism design: when an auction
with an ex post equilibrium can be constructed, there is no need for the designer to know
the players’ beliefs—see Sections 7.5 and 7.6.
Pure Bayesian strategy profile s is an ex post equilibrium if
(32) ui ((si (ti ), s−i (t−i )), t) ≥ ui ((âi , s−i (t−i )), t) for all t ∈ T, âi ∈ Ai , and i ∈ P .
The name “ex post equilibrium” refers to the fact that if there were an ex post stage at
which all players’ types are revealed, then no player of any type would prefer to switch
strategies at that stage. (This is typically not true in a Bayesian equilibrium. For an
example, consider the Bayesian equilibrium of a first price auction (Proposition 4.10.))
Ex post equilibrium is sometimes called belief-free equilibrium, a reference to the fact that
beliefs pi (·|ti ) are irrelevant in definition (32). (This term is more commonly used in the
context of repeated games with private monitoring, where it means something different.)
For either of these reasons, ex post equilibrium is well-defined in a “pre-Bayesian game”
in which beliefs are not specified at all.
An auction environment with interdependent values

Each player i ∈ P privately observes a signal ti ∈ [0, 1], and his valuation for the good
depends on the entire signal profile t:

(33)   vi(t) = ti + γ Σ_{j≠i} tj, where γ ∈ [0, 1].

Player i's utility for obtaining the good when the type profile is t and paying p is vi(t) − p,
and his utility for not obtaining the good and paying p is −p.
When γ = 0, (33) is a private values environment. When γ = 1, (33) is a common value
environment: conditional on any profile of signals t, everyone’s (expected) valuation for
the good is the same. Intermediate values of γ can be used, e.g., when each player’s signal
is most informative about the properties of the good he cares about most. (Note: Some
authors use the term “common values” to refer to all environments with interdependent
values.)
To complete the definition of the Bayesian game, we need to specify the joint distribution
of signals. But because we are considering ex post equilibrium, what we specify plays no
role in the analysis.
(The mathematically simplest specification is that of independent signals. This assump-
tion can be justified when the “good” being sold is a bundle, and each agent’s signal
concerns a distinct item in the bundle. But economic environments with interdependent
values typically have dependent signals. For instance, in a mineral rights auction, after
conditioning on the quality of the tract of land the signals may be independent. However,
the quality is precisely what the bidders do not know, and if we do not condition on this,
the signals are dependent. A standard model for this case is that of so-called affiliated
signals; see Milgrom and Weber (1982).)
Maskin (1992) introduced the following generalization of a second-price auction for the
environment above: Each agent makes a report t̂i ∈ [0, 1]. If bidder i has the unique highest
report, then he receives the object, and he pays the highest of his opponents’ reports plus
γ times the sum of his opponents’ reports. In the event of a tied highest report, one of the
bidders who made this report is selected at random to receive the object and to make the
payment described above.
Proposition 4.13. In this auction, it is an ex post equilibrium for all bidders to report truthfully.
Proof. Suppose that the type profile is t and that player i’s opponents report truthfully. If
bidder i’s report is higher than any other agent’s report, then agent i’s payoff is
ti + γ Σ_{j≠i} tj − ( max_{k≠i} tk + γ Σ_{j≠i} tj ) = ti − max_{k≠i} tk.
(Notice that this payoff from being allocated the good does not depend on agent i’s report.)
Thus the fact that it is always optimal for player i to tell the truth follows directly from the
analysis of the second price auction. (To check your understanding, make sure you see
why truthful reporting is not a weakly dominant strategy. This is related to how we used
the supposition that player i’s opponents report truthfully.)
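A numerical check of Proposition 4.13 (ours; names illustrative): for randomly drawn type profiles, no unilateral misreport improves a bidder's ex post payoff when the others report truthfully.

    import random

    def payoff(i, reports, t, gamma):
        # Winner pays the highest opposing report plus gamma times the sum
        # of opposing reports; ties are resolved against bidder i here.
        others = [r for j, r in enumerate(reports) if j != i]
        if reports[i] <= max(others):
            return 0.0
        vi = t[i] + gamma * (sum(t) - t[i])
        return vi - (max(others) + gamma * sum(others))

    rng = random.Random(1)
    gamma, n = 0.7, 3
    for _ in range(10_000):
        t = [rng.random() for _ in range(n)]
        for i in range(n):
            misreport = t[:]
            misreport[i] = rng.random()
            assert payoff(i, t, t, gamma) >= payoff(i, misreport, t, gamma) - 1e-12
    print("truthful reporting is ex post optimal in every trial")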
4.7 Correlated Types and Higher-Order Beliefs

In the examples we have seen so far, we have named a player's types after the payoff-
relevant information he possesses: his card, his productivity level, his valuation for a
good. But as we noted at the end of Section 4.1, when types are not independent, any
privately known elements of a player’s beliefs are also part of his type, and these beliefs
may concern other players’ beliefs about other players’ beliefs. . .
(Soon we will introduce notation for referring to beliefs about beliefs—see Section 4.8.)
When types are correlated, one needs to be careful about specifying beliefs. In some
settings, doing so naively can have dramatic and unexpected consequences. The leading
example here is the possibility of full surplus extraction in mechanism design. We explore
this topic in detail in Section 7.6.
Example 4.15. The e-mail game. Two players will play one of the two normal form games
below; the payoff matrix is GL with probability 2/3 and GR with probability 1/3. Player 1
observes which matrix is the true one; player 2 does not.

   GL:        2                      GR:        2
           A         B                       A         B
     A   2, 2      0, −3               A   0, 0      0, −3
  1                                 1
     B  −3, 0     −1, −1               B  −3, 0      2, 2

If the matrix is GR, player 1's computer sends a message to player 2's computer; player 2's
computer automatically sends back a confirmation, player 1's computer confirms the
confirmation, and so on, with each message independently failing to arrive with probability
ε, at which point communication stops. Player i's type ti is the number of messages his
computer has sent, so the common prior over type profiles is given by the following table:
                           t2
               0              1                 2                 3
       0      2/3             0                 0                 0         . . .
       1    (1/3)ε      (1/3)(1 − ε)ε           0                 0         . . .
  t1
       2      0        (1/3)(1 − ε)^2 ε   (1/3)(1 − ε)^3 ε        0         . . .
       3      0               0           (1/3)(1 − ε)^4 ε  (1/3)(1 − ε)^5 ε . . .
       ⋮      ⋮               ⋮                 ⋮                 ⋮
Proposition 4.16. If ε > 0, the unique Bayesian equilibrium has both players play A regardless
of type.
If ε > 0 is small, and the payoff matrix turns out to be GR , it is very likely that both players
know that the payoff matrix is GR . But the players still play A, even though (B, B) is a strict
equilibrium in GR .
Proof. If the payoff matrix is GL , player 1 knows this, and plays A, which is dominant for
him in this matrix.
Now suppose that player 2's type is t2 = 0. Let πn denote the prior probability that a total
of n messages were sent. Player 2's posterior probability that no message was sent (and
hence that the payoff matrix is GL) is

π0 / (π0 + π1) = (2/3) / ((2/3) + (1/3)ε) = 2 / (2 + ε) > 2/3.
Since player 1 plays A when no messages are sent, player 2's expected payoff to playing A is
more than (2/3) × 2 = 4/3, while her expected payoff to choosing B is less than (2/3) × (−3) + (1/3) × 2 = −4/3.
Therefore, when her type is t2 = 0, player 2 should choose A.
Next, suppose that player 1's type is t1 = 1, so that he knows the payoff matrix is GR.
Player 1's posterior probability that his message failed to arrive (so that t2 = 0) is

π1 / (π1 + π2) = (1/3)ε / ((1/3)ε + (1/3)(1 − ε)ε) = 1 / (1 + (1 − ε)) > 1/2.

Since player 2 plays A when t2 = 0, player 1 of type t1 = 1 assigns probability greater
than 1/2 to his opponent playing A; B would then earn him less than (1/2)(−3) + (1/2)(2) < 0,
while A earns him 0, so he plays A.
Continuing, suppose that player 2 is of type 1. In this case, both players know that the
payoff matrix is GR. But player 2 will assign probability greater than 1/2 to 2 messages
having been sent, and so, knowing that player 1 plays A in this case, will play A herself.
And so on.
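The posterior reasoning that drives the contagion can be computed mechanically. A sketch (ours) using the prior table above; the analogous computation for player 2's types gives the same ratio:

    from fractions import Fraction

    def p(t1, t2, eps):
        # Common prior of the e-mail game (the table above).
        if t1 == 0 and t2 == 0:
            return Fraction(2, 3)
        if t1 == t2 + 1 and t1 >= 1:          # player 1's latest message lost
            return Fraction(1, 3) * (1 - eps) ** (2 * t2) * eps
        if t1 == t2 and t1 >= 1:              # player 2's latest reply lost
            return Fraction(1, 3) * (1 - eps) ** (2 * t1 - 1) * eps
        return Fraction(0)

    eps = Fraction(1, 10)
    for m in range(1, 5):
        low, high = p(m, m - 1, eps), p(m, m, eps)
        print(m, low / (low + high))
    # Every type t1 = m >= 1 puts probability 1/(1 + (1 - eps)) = 10/19 > 1/2
    # on the profile (m, m - 1) with fewer messages -- the engine of contagion.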
Remarks:
1. The previous example and the next one illustrate the idea of contagion, which refers
to iterated dominance arguments that arise in certain Bayesian games with “chains”
of correlation in the common prior distribution. These examples illustrate the im-
portant role that higher-order beliefs can play in equilibrium analysis. Versions of
these examples have been used to model bank runs and currency crises—see Morris
and Shin (2003).
2. The email game raises questions about what should be considered “almost common
knowledge”, and the relationship between almost common knowledge and robust-
ness of equilibrium. To address these questions, one needs to think about notions of
“closeness” on type spaces. See Monderer and Samet (1989), Weinstein and Yildiz
(2007), Dekel et al. (2006), Chen et al. (2010), and Section 4.9.
Example 4.17. A global game. Consider the following normal form game, whose payoffs
depend on a parameter r:

              2
            I            N
      I   r, r       r − 1, 0
  1
      N  0, r − 1      0, 0
In this game, strategy I represents investing, and strategy N represents not investing.
Investing yields a payoff of r or r − 1 according to whether the player’s opponent invests
or not. Not investing yields a certain payoff of 0. If r < 0, then the unique NE is
(N, N); if r = 0, the NE are (N, N) and (I, I); if r ∈ (0, 1), the NE are (N, N), (I, I), and
((1 − r)I + rN, (1 − r)I + rN); if r = 1, the NE are (N, N) and (I, I); and if r > 1, then the unique
NE is (I, I).
Now consider a Bayesian game BG in which payoffs are given by the above payoff matrix,
but in which the value of r is the realization of a random variable that is uniformly
distributed on [−2, 3]. In addition, each player i only observes a noisy signal ti about the
value of r. Specifically, ti is defined by ti = r + εi, where εi is uniformly distributed on
[−1/10, 1/10], and r, ε1, and ε2 are independent of one another. This construction is known as
a global game; see Carlsson and van Damme (1993) and Morris and Shin (2003).
Pure strategies for player i are of the form si : [−21/10, 31/10] → {I, N}. To define a pure Bayesian
equilibrium, let ui(a, r) be player i's payoff under action profile a under payoff parameter
r (which we can read from the payoff matrix). Then define the payoff to a player i of type
ti for playing action ai against an opponent playing strategy sj as

Ui(ai, sj | ti) = ∫_{(tj, r)} ui((ai, sj(tj)), r) dµi(tj, r | ti),

where µi(· | ti) represents i's beliefs about (tj, r) when he is of type ti. Then for s to be a pure
Bayesian equilibrium, it must be that

Ui(si(ti), sj | ti) ≥ Ui(âi, sj | ti) for all âi ∈ {I, N}, (almost) all types ti, and i ∈ {1, 2}.
Proposition 4.18. In any Bayesian equilibrium of BG, player i strictly prefers not to invest when
his signal is less than 1/2, and strictly prefers to invest when his signal is above 1/2.
Even though the underlying normal form game has multiple strict equilibria when r ∈
(0, 1), the possibility that one's opponent may have a dominant strategy, combined with
uncertainty about others' preferences and beliefs, leads almost every type to have a unique
equilibrium action. The choice of noise level 1/10 is not important here; any positive noise
level would lead to the same result.
Proof. Observe that if ti ∈ [−19/10, 29/10], then player i's posterior belief about r conditional on
ti is uniform on [ti − 1/10, ti + 1/10], and his posterior belief about tj conditional on ti is the
triangular distribution with support [ti − 1/5, ti + 1/5] (that is, the conditional density equals
0 at tj = ti − 1/5, 5 at tj = ti, and 0 at tj = ti + 1/5, and is linear on [ti − 1/5, ti] and on [ti, ti + 1/5]).
In any Bayesian equilibrium, if the value of player i's signal ti is less than 0, then player i
strictly prefers not to invest. Indeed, if the random variable R represents the payoff r at
the ex ante stage, then E(R | ti) = ti when ti ∈ [−19/10, 0], and E(R | ti) < −18/10 when ti < −19/10.
It follows that if ti < 0, the expected payoff to I is negative, and so N is a strict best
response for player i.
Next, we show that if the value of player i's signal is less than 1/20, then player i strictly
prefers not to invest. We have just seen that if t2 < 0, then player 2 will play N. Now focus
on the most demanding case, in which t1 = 1/20. In this case, player 1's posterior beliefs
about player 2's type are described by a triangular distribution with support [−3/20, 1/4].
Since the density of this distribution reaches its maximum of 5 at t2 = 1/20, the probability
that player 1 assigns to player 2 having a type below 0 is the area of a triangle with base
3/20 and height (3/4) · 5. This area is 9/32; by the previous paragraph, it is a lower bound on the
probability that player 1 assigns to player 2 playing N. Therefore, player 1's expected
payoff to playing I is

U1(I, s2 | 1/20) = 1/20 − µ1(s2(t2) = N | t1 = 1/20) ≤ 1/20 − 9/32 < 0.

Iterating, suppose that t1 ≤ 1/10. The argument just given applies symmetrically to player 2,
so we know from the previous paragraph that player 2 will play N whenever t2 < 1/20, and
essentially the same calculation as before shows that player 1 again must assign a probability
of at least 9/32 to player 2 playing N. Thus, since U1(I, s2 | 1/10) ≤ 1/10 − 9/32 < 0, player 1's
strict best response when t1 ≤ 1/10 is to play N.
Rather than iterate this argument further, let τ be the supremum of the set of types that
can be shown by such an iterative argument to play N in any equilibrium. If τ were less
than 1/2, then since a player whose type is less than τ will play N, a player whose type is τ
obtains an expected payoff of at most τ − 1/2 < 0 from playing I, and so plays N. By continuity,
this is also true of a player whose type is slightly larger than τ, contradicting the definition
of τ. We therefore conclude that τ is at least 1/2. This establishes that player i strictly
prefers not to invest when his signal is less than 1/2. A symmetric argument shows that
player i strictly prefers to invest when his signal is above 1/2.
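The 9/32 bound is easy to verify by evaluating the triangular distribution's cdf, as in this sketch (ours):

    def triangular_cdf(x, center, h=0.2):
        # CDF of the symmetric triangular distribution on [center - h, center + h]
        # (density 0 at the endpoints and 1/h = 5 at the center).
        lo, hi = center - h, center + h
        if x <= lo: return 0.0
        if x >= hi: return 1.0
        if x <= center: return (x - lo) ** 2 / (2 * h * h)
        return 1 - (hi - x) ** 2 / (2 * h * h)

    prob_N = triangular_cdf(0.0, 1 / 20)   # P(t2 < 0 | t1 = 1/20) = 9/32
    print(prob_N, 1 / 20 - prob_N)         # 0.28125, and 1/20 - 9/32 < 0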
4.8 Incomplete Information, Hierarchies of Beliefs, and Bayesian Games

Informal explanation
Harsanyi (1967–1968) initiated the study of games of incomplete information, where at the
start of play, players possess payoff-relevant information that is not common knowledge.
To describe such a situation directly, one starts by introducing a set of payoff-relevant
states Θ = Θ0 × Θ1 × · · · × Θn . Taking the interim point of view, the actual payoff state is
some θ ∈ Θ, and each player i knows his θi , which is called his payoff type. The remaining
θ0 ∈ Θ0 is the state of nature, representing payoff-relevant information that no one observes.
(We did not include states of nature in our definition of Bayesian games from Section
4.1, because one need not include it explicitly to define the notions of dominance and
equilibrium introduced in Section 4.2. (Some of our examples actually had a state of
nature: for instance, the mineral rights auction described in Section 4.4, and the global
game of Example 4.17.) However, states of nature play a more important role when
defining rationalizability for Bayesian games—see Section 4.9.)
Harsanyi observed that to describe an environment with incomplete information, one
needs to specify a player’s beliefs about his opponents’ payoff types, his beliefs about his
opponents’ beliefs, and so on. Why? Player i’s beliefs about opponents’ payoff types θ−i
help determine his preferred action, because his opponents' actions depend on their types.
But this implies that player j should care about player i’s beliefs about θ−i , because player
j’s preferred action depends on player i’s action, which depends on player i’s beliefs about
θ−i . But this in turn implies that player k should care about player j’s beliefs about player
i’s beliefs about θ−i . . . And so on.
The direct approach to addressing this has one specify infinite hierarchies of beliefs for
each player. However, games with infinite hierarchies of beliefs are difficult to analyze.
Harsanyi proposed Bayesian games as a tractable alternative. The fundamental notion here
is that of a type space, a term which refers to the collection of all players’ type sets and
first-order beliefs. As usual, a player’s type encodes both his payoff-relevant information
(i.e., his payoff type) and information about his beliefs; and a player’s first-order beliefs
directly specify his beliefs about opponents’ types and the state of nature as a function
of his own type. The players’ first-order beliefs implicitly determine their higher-order
beliefs.
The simplification provided by Bayesian games comes from the fact that each player’s
beliefs are defined by a single probability distribution over opponents’ types rather than
by an infinite sequence of probability distributions. The games should be interpreted from
a suitable interim perspective, in which player i’s type corresponds to the payoff type and
hierarchy of beliefs we started with.
Every Bayesian game corresponds to a game with hierarchies of beliefs, because the
players’ first-order beliefs implicitly determine their higher-order beliefs. But can every
game with hierarchies of beliefs be represented as a Bayesian game? This question was
answered in the affirmative under some natural consistency conditions by Mertens and
Zamir (1985) (see also Brandenburger and Dekel (1993)), through their construction of
universal type spaces. (Aside: This construction may or may not lead to a Bayesian game
satisfying the common prior assumption.)
In conclusion, using Bayesian games to model strategic environments with incomplete
information is without loss of generality. Of course, complicated hierarchies of beliefs may
need to be represented by type spaces that are much nastier than those in the examples
we have seen. As a practical matter, Bayesian games appearing in most applications have
fairly simple type sets, even though beliefs of arbitrarily high order may matter. But
certain questions—in particular, questions about the robustness of predictions to slight
misspecifications of beliefs—require one to work with universal type spaces; see Section
4.9.
Some details
In the rest of this section, we assume that there are just two players, and that Θ0 is a
singleton (meaning that it can be ignored). The general case only requires more notation.
We thus begin with the following objects: for each player i ∈ {1, 2}, an action set Ai, a set
of payoff types Θi, and a utility function ui.
To describe a setting with incomplete information (and hence one understood from the
interim perspective), we introduce hierarchies of beliefs {π_i^k}_{k=1}^∞ for each player i ∈ {1, 2}. The
probability distribution π_i^k is player i's kth-order belief. To explain what kind of object this
is, we describe the sets in which each player's first three orders of beliefs live (here j ≠ i):

π_i^1 ∈ ∆Θj,   π_i^2 ∈ ∆(Θj × ∆Θi),   π_i^3 ∈ ∆(Θj × ∆Θi × ∆(Θj × ∆Θi)).
Each player’s first-order belief concerns his opponent’s payoff type. Each player’s second-
order belief concerns his opponent’s payoff type and first-order belief, and (naturally)
allows for correlation between them. Each player’s third-order belief concerns his oppo-
nent’s payoff type, first-order belief, and second-order belief, again allowing for correlation
among them. We can continue similarly to higher-order beliefs.
Notice that the marginal distribution of second-order belief π_1^2 on Θ2 can be viewed as
a first-order belief, which we will call π̂_1^1. Likewise, the marginal distribution of third-
order belief π_1^3 on Θ2 × ∆Θ1 can be viewed as a second-order belief, which we will call
π̂_1^2. For player 1's hierarchy of beliefs to be sensible, different orders of belief should not
contradict one another: we should have π̂_1^1 = π_1^1 and π̂_1^2 = π_1^2. If these agreements across
levels always hold for player i, that is, if π̂_i^k = π_i^k for all k ≥ 1, then player i's hierarchy of
beliefs is said to be coherent. These are the only hierarchies we should care about.
In addition, in a multi-player model of rational behavior, it is also natural to require player
i to be certain that (i) player j’s hierarchy of beliefs is coherent, and that (ii) player j is
certain that player i’s hierarchy of beliefs is coherent, and so on. When this is true, we say
that player i’s belief hierarchy is collectively coherent. We denote the set of such hierarchies
by Hi . (The set Hi can be defined inductively.)
Use the notation π_i^∗ = {π_i^k}_{k=1}^∞ as a shorthand for a belief hierarchy of player i. A game
with collectively coherent hierarchies of beliefs is a collection GH = {Ai, Θi, π_i^∗, ui}_{i∈{1,2}}, where
π_i^∗ ∈ Hi. Collective coherence ensures that neither player can experience inconsistencies
of any kind when reasoning about beliefs. This makes games with collectively coherent
hierarchies of beliefs acceptable for analysis, in that we can define solution concepts for
these games without running into loose ends. (The situation is somewhat analogous to
that of perfect recall in extensive form games: although we can write down games that
fail perfect recall, it is not clear how to analyze them.)
While in principle we can analyze games of form GH, the infinite hierarchies of beliefs can
make the analysis difficult to carry out. This raises the question of whether games of form
GH can always be represented as Bayesian games—that is, by specifying appropriate type
spaces and first-order beliefs for each player. Mertens and Zamir (1985) proved that this
is always possible, thus establishing that Bayesian games provide a completely general
approach to modeling strategic interactions with incomplete information. Brandenburger
and Dekel (1993) provided a much simpler proof under different technical assumptions.
Here are the details. The universal type space is a collection {(T_i^∗, p_i^∗)}_{i∈{1,2}}. Player i's set of
types is T_i^∗ = Θi × Hi, where Θi is his set of payoff types, and Hi is his set of collectively
coherent hierarchies of beliefs. (Thus there is nothing that is really new so far.) The
function p_i^∗ : T_i^∗ → ∆T_j^∗ is player i's first-order belief function; it describes his beliefs about
player j's type as a function of his own type. (This p_i^∗ is the new part. Let us note that in
general there is no reason that p_1^∗ and p_2^∗ should be derivable from a common prior.)
The key step in the construction is defining the first-order belief functions p_i^∗. To do so,
Mertens and Zamir (1985) show that there is a homeomorphism (a continuous function
with continuous inverse) between Hi and ∆T∗j . This function, which we denote by pi ,
assigns to each hierarchy of beliefs π∗i a probability distribution pi (·|π∗i ) on T∗j . A key
property of pi is that for every k ≥ 1, the marginal of distribution pi (·|π∗i ) that describes
player i’s kth order belief is none other than πki , the kth-order belief from hierarchy π∗i . To
sum up, this result tells us that the set Hi of collectively coherent hierarchies of beliefs is
equivalent (via pi ) to the set ∆T∗j of distributions over the opponent’s types, and that pi
preserves the meaning of the hierarchies of beliefs.
To finish the definition of the universal type space, we set p_i^∗(θi, π_i^∗) = pi(· | π_i^∗), so that i's
posterior beliefs only depend on the hierarchy part of his type, and capture this hierarchy
appropriately (by virtue of the key property of pi ).
Putting this all together, we can conclude that any game GH with payoff-type profiles
Θ and collectively coherent hierarchies of beliefs is equivalent to a Bayesian game BG =
{Ai , Ti∗ , p∗i , ui }i∈{1,2} whose type space {(Ti∗ , p∗i )}i∈{1,2} is the universal type space based on Θ,
with BG being given the appropriate interim interpretation.
Let us emphasize this last point. GH represents an incomplete information interaction
in which each player i has a particular payoff type θi and hierarchy of beliefs π∗i . BG
introduces a large set of possible types Ti∗ = Θi × Hi in order to include all types that may
arise when considering the players’ beliefs. Nevertheless, player i’s actual type is the pair
(θi , π∗i ) we started with.
Technical remark 1: For it to be meaningful to call the function pi : Hi → ∆T_j^∗ continuous,
topologies (i.e., notions of “closeness”, or more specifically, convergence and continuity)
must be specified for Hi and ∆T∗j . The central point here is that in Mertens and Zamir
(1985) and Brandenburger and Dekel (1993), the space of hierarchies H j is assigned the
product topology, under which a sequence of hierarchies π∗j converges if for each k ≥ 1,
the kth-order beliefs π_j^k converge in distribution (i.e., in the sense used in the central limit
theorem) to their limit point. Informally, convergence in the product topology requires
that beliefs of any given finite order eventually settle down. There are some contexts in
which more demanding notions of closeness are considered—see the last discussion point
in the next section.
Technical remark 2: The central ingredient of Brandenburger and Dekel’s (1993) short
proof is a result from probability theory called the Kolmogorov extension theorem. This
theorem guarantees the existence of a probability distribution on infinite sequences whose
marginals are any prespecified collection of distributions on finite subsequences that is
“coherent” (i.e., that could possibly fit together as the marginals of a single distribution).
This result seems plausible enough, but it turns out that constructing a distribution on
an infinite-dimensional space that captures an infinite collection of finite-dimensional
distributions is a complicated business.
4.9 Rationalizability in Bayesian Games

In this section a Bayesian game also includes a set Θ0 of states of nature, and utility functions
and first-order beliefs may condition on the state (so ui : A × T × Θ0 → R and pi(·, · | ti) ∈
∆(T−i × Θ0)), in keeping with the description of incomplete information environments from
Section 4.8. Taking this change into account, the condition (26) for a pure strategy profile s to be a
Bayesian equilibrium remains the same:
(26) Ui (si (ti ), s−i |ti ) ≥ Ui (âi , s−i |ti ) for all âi ∈ Ai , ti ∈ Ti , and i ∈ P .
Recall that in the interim interpretation of Bayesian games, there is no ex ante stage. Each
player knows his type from the start, and the other types in his type set are only there
to capture other players’ uncertainty, as well as higher-order uncertainty. When defining
rationalizability in this context, we allow a player’s different types to hold different con-
jectures about the others’ actions and the state of nature. In addition, we allow conjectures
to include not only correlation between different opponents' actions, but also correlation
between those actions and the state.
In this setting, a conjecture for type ti is a probability distribution µi (·|ti ) ∈ ∆(A−i × T−i × Θ0 ).
The conjecture must allow correlation between opponents’ actions and types, since dif-
ferent types of player j may choose different actions. To not contradict earlier definitions,
a type’s conjecture must agree with his first-order beliefs. Formally, we say that type ti ’s
conjecture µi (·|ti ) agrees with first-order beliefs if
(35)   Σ_{a−i ∈ A−i} µi(a−i, t−i, θ0 | ti) = pi(t−i, θ0 | ti).
Interim correlated rationalizability (ICR) (Battigalli and Siniscalchi (2003), Dekel et al. (2007))
is defined by iteratively eliminating actions of each type that are not a best response to
any of that type’s conjectures that are still feasible and that agree with first-order beliefs.
Since each type is considered separately, one refers to the ICR actions of a particular type
ti rather than of a player i.
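For finite Bayesian games this procedure can be implemented directly. A sketch (ours; two players, no state of nature, and scipy assumed available) in which the existence of a rationalizing conjecture is tested as a linear feasibility problem:

    from scipy.optimize import linprog

    def icr(types, actions, belief, u):
        # belief[i](tj, ti) = p_i(tj | ti); u[i](ai, aj, ti, tj) = i's utility.
        # Returns the set of ICR actions of each type (i, ti).
        surv = {(i, ti): set(actions[i]) for i in (0, 1) for ti in types[i]}
        changed = True
        while changed:
            changed = False
            for i in (0, 1):
                for ti in types[i]:
                    for ai in list(surv[(i, ti)]):
                        if not best_reply_to_some_conjecture(
                                i, ti, ai, types, actions, belief, u, surv):
                            surv[(i, ti)].remove(ai)
                            changed = True
        return surv

    def best_reply_to_some_conjecture(i, ti, ai, types, actions, belief, u, surv):
        # Feasibility LP over conjectures mu on surviving (aj, tj) pairs whose
        # marginal on tj equals p_i(. | ti) and that make ai a best response.
        j = 1 - i
        pairs = [(aj, tj) for tj in types[j] for aj in surv[(j, tj)]]
        if not pairs:
            return False
        A_eq = [[1.0 if tj == t else 0.0 for (aj, tj) in pairs] for t in types[j]]
        b_eq = [belief[i](t, ti) for t in types[j]]
        A_ub = [[u[i](alt, aj, ti, tj) - u[i](ai, aj, ti, tj) for (aj, tj) in pairs]
                for alt in actions[i] if alt != ai]
        res = linprog(c=[0.0] * len(pairs),
                      A_ub=A_ub or None, b_ub=[0.0] * len(A_ub) or None,
                      A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(pairs))
        return res.status == 0  # 0: feasible; 2: infeasible, so ai is eliminated

The elimination loop mirrors the contagion argument used for the e-mail game, which the next paragraph recasts in ICR terms.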
Our analysis of the e-mail game (Example 4.15) used just such an iterated elimination
procedure. Thus we can restate our conclusion there as saying that A is the unique ICR
action of all types of both players.
Some discussion and interesting results about ICR:
(i) As with rationalizability for normal form games, many game theorists find it most
natural to allow for correlations of all sorts in the context of Bayesian games, even
the novel form of correlation between opponents’ actions and an unobservable
state of nature—see e.g. p. 24 of Dekel et al. (2007).
(ii) We argued in Section 4.8 that the primitive description of an incomplete information
environment is a game GH defined by directly specifying each player’s payoff type
and hierarchy of beliefs. It is possible to specify two different Bayesian games,
BG and BG′, each of which has a type, ti and t′i respectively, whose payoff type and hierarchy
of beliefs agree with that of player i in GH. A desirable invariance property
for an interim solution concept is that it make the same predictions of play for ti
and t′i, since from a more primitive point of view these types should be regarded
as identical. Dekel et al. (2007) show that ICR satisfies this property. In contrast,
notions of interim rationalizability that require beliefs about others’ actions and the
state of nature to be independent do not have this invariance property. (Example 1
of Dekel et al. (2007) explains the issues here clearly.)
(iii) It is desirable to use solution concepts that are robust to slight misspecifications
of higher-order beliefs, since these may be hard for a modeler to know. Weinstein
and Yildiz (2007) show that if one allows enough flexibility in these slight misspec-
ifications, then ICR is robust, and, more importantly, no stronger interim solution
concept is robust. “Enough flexibility” is captured by a richness assumption, which
requires that for each player i and each action ai of that player, player i has a payoff
type for whom action ai is dominant. (More on this below.)
For intuition we again consider the e-mail game (Example 4.15). This example
included a normal form game GR with strict equilibria (A, A) and (B, B), with (B, B)
being Pareto dominant. The full e-mail game BG included types ti whose beliefs up
to high orders asserted that GR describes the actual payoffs. However, still-higher-
order beliefs of ti allowed that payoffs might be described by another normal form
game, GL , in which A is strictly dominant. In the end, A turned out to be type ti ’s
unique ICR action.
Weinstein and Yildiz (2007) observe that the selection of equilibrium (A, A) is frag-
ile. It is also possible to construct a Bayesian game BG′ with an "approximately
complete information" type t′i whose unique ICR action is B. Thus, if all we know
for selecting between actions A and B.
More generally, Weinstein and Yildiz (2007) show the following: Starting from any
Bayesian game in which some type ti has multiple ICR actions, one can always
slightly alter beliefs to create a type t′i that is nearly identical to ti but that has any
ICR action of ti you like as its unique ICR action. It follows that no refinement of ICR
is robust to small changes in higher-order beliefs.
(Two further results tell us more about the ICR correspondence: i.e., the map from
types in the universal type space to ICR actions. Weinstein and Yildiz (2007) show
that “nearly all” types in the universal type space have a unique ICR action, where
“nearly all” is defined in a natural topological sense. A complementary result of
Dekel et al. (2007) shows that the ICR correspondence is upper hemicontinuous.)
The richness assumption stated above is not sensible in all applications. For in-
stance, if one starts with a first price auction, it seems odd to introduce types who
find it dominant to place a bid higher than the highest value that any bidder in the
original auction assigns to the good. Likewise, one might want to impose other
common knowledge assumptions about the game’s structure that restrict what
slight modifications of the game are allowable. Penta (2013) shows that Weinstein
and Yildiz’s (2007) results extend as far as one could reasonably expect to settings
with such common knowledge restrictions.
(iv) The foregoing results follow Mertens and Zamir (1985) in using the product topol-
ogy on hierarchies of beliefs to define what is meant by a “slight” perturbation. One
can ask whether there is a stronger topology on hierarchies of beliefs (i.e., a more
demanding notion of closeness) under which Weinstein and Yildiz’s (2007) results
no longer hold. This question is studied by Dekel et al. (2006) and Chen et al. (2010).
The latter paper establishes continuity properties of the ICR correspondence under
a suitably defined uniform topology, in which a sequence of hierarchies of beliefs
π∗i converges if beliefs of all orders simultaneously become close to their limiting
values.
Weakening belief restrictions
Section 4.6 introduced ex post equilibrium (= belief-free equilibrium), an equilibrium
concept for Bayesian games that makes no use of agents’ beliefs. One can proceed in a
similar spirit with rationalizability. To define belief-free rationalizability, one drops condition
(35) that players’ conjectures agree with their first-order beliefs. The agnosticism inherent
in this solution concept makes it quite weak in most games. As in the equilibrium case,
belief-free rationalizability is well-defined in “pre-Bayesian games” in which beliefs are
not specified, only the sets of payoff types and states of nature. Given how complicated
beliefs can turn out to be, we can now appreciate that this is a dramatic simplification!
The definitions of ICR and belief-free rationalizability can be viewed as two extremes: in
the first, the beliefs of each type are specified exactly, while in the second, no information
about beliefs need be specified at all. One can also consider intermediate cases. For
a general approach, one can assume that it is common knowledge that the conjectures
of payoff type θi lie in some prespecified subset ∆θi of ∆(A−i × T−i × Θ0 ). To obtain the
intermediate cases just noted, one can choose ∆θi to embody restrictions on θi ’s beliefs
about opponents’ payoff types and the state of nature. More broadly, ∆θi can capture
joint restrictions on opponents’ payoff types and actions (for instance, that opponents’
bid functions are monotone). The class of solution concepts of this sort is introduced by
Battigalli and Siniscalchi (2003) under the name ∆-rationalizability.
Solution concepts with weak belief restrictions are basic tools in robust mechanism design,
as we describe in Section 7.6.4.
1. Hidden actions/moral hazard: one player takes an action that is not perfectly observed.
The basic principal-agent problem with hidden actions: The player whose actions are
perfectly observed, called the principal, chooses a contract. Then the player whose
action is not perfectly observed, called the agent, chooses whether to accept the
contract, and then chooses an action. The action generates an observable signal on
which payoffs depend.
2. Hidden information/adverse selection: one player has private information about his per-
sonal characteristics (preferences, productivity). In the basic case, this information
is obtained before play begins.
(a) Signalling: the informed player moves first.
i. one uninformed player: basic signalling games à la Cho and Kreps (1987).
ii. competing uninformed players: Spence (1973) job market signalling.
(b) Screening: the uninformed player moves first.
i. one uninformed player: the basic principal-agent problem with hidden infor-
mation/monopolistic screening. The player without private information, the
principal, chooses a menu of contracts. Then the privately informed player,
the agent, decides which contract to accept.
ii. competing uninformed players: Akerlof (1970) market for lemons, Roth-
schild and Stiglitz (1976) market for insurance
One can also organize information economics according to the tools used to analyze them,
and by their historical development:
The earliest models are those with competing uninformed players (Akerlof, Spence,
Rothschild-Stiglitz). The competition can be represented using a partial equilibrium
framework, as they were originally: price-taking behavior by the uninformed players
leads to a “zero profit” condition, which must be satisfied simultaneously with optimiza-
tion by each type of informed agent. These models are easily recast in game-theoretic
terms, which have the advantage of specifying what happens out of equilibrium.
Signalling models are most naturally presented in a game-theoretic framework.
The basic principal-agent problems use only rudimentary game theory; the analyses
amount to solving constrained optimization problems. Also, according to the revelation
principle, it is often enough to restrict attention to direct mechanisms, under which agents
report their private information directly to an “administrator”, and are provided with
incentives to do so truthfully. This device converts basic mechanism design problems
into optimization or feasibility problems. Moving beyond these basic models or looking
at “natural” mechanisms (like auctions) rather than direct mechanisms leads one to use
game theory in a more serious way.
The outline above describes only the most basic models, and there is a vast theory that
elaborates on these models in various directions.
5. Signalling and Screening with Competing Uninformed Players
Since the uninformed parties (the buyers) move first, this is a screening model.
(If both buyers choose the same price, we need to say which seller types sell to each
buyer—see Example 5.2.)
Example 5.1. Suppose F is uniform(0, 1) and b(θ) = βθ with β ∈ (1, 2). Then for p ∈ (0, 1],
B(p) = (∫₀ᵖ βθ dθ)/p = (½βp²)/p = (β/2)p < p.
Since being the sole buyer at a positive price generates a negative expected payoff, buyer
i does not want to choose pi > p j . If pi = p j = p > 0 then the overall expected benefit is less
than p, so at least one buyer is unhappy regardless of the seller’s strategy (see the final
paragraph). Thus in any subgame perfect equilibrium, pi = p j = 0, and trade occurs with
probability 0. This is despite the fact that trade is always efficient! ♦
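The computation in Example 5.1 is easy to check numerically. The following Python sketch (our own illustration; the value β = 1.5 and the Riemann-sum approximation are assumptions, not part of the notes) verifies that B(p) < p at every positive price:

beta = 1.5  # any beta in (1, 2)

def B(p, n=100_000):
    # average benefit b(theta) = beta*theta over the pool of selling types [0, p],
    # i.e., a midpoint Riemann sum for (1/p) * integral_0^p beta*theta dtheta
    h = p / n
    return sum(beta * (k + 0.5) * h for k in range(n)) * h / p

for p in [0.2, 0.5, 1.0]:
    print(f"p = {p:.1f}   B(p) = {B(p):.4f}   (beta/2)*p = {beta * p / 2:.4f}")
    assert B(p) < p  # being the sole buyer at any positive price loses money in expectation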
Example 5.2. Suppose F is uniform(0, 1) and b(θ) = α+βθ with α ∈ (0, 1) and β ∈ (1−α, 2−2α).
Then for p ∈ (0, 1],

B(p) = (∫₀ᵖ (α + βθ) dθ)/p = (αp + ½βp²)/p = α + (β/2)p.
B(p) − p∗ < B(p∗) − p∗ < 0, so buyer 2 would prefer to deviate to a lower price and never obtain the car.) ♦
Remarks:
(i) Akerlof’s original model used a form of competitive equilibrium that imposes a
zero profit condition on buyers. As above the equilibrium prices are defined as
fixed points of B(·), but what happens out of equilibrium is not specified. In the
game-theoretic model above, price competition among buyers results in zero profits
without appealing to a reduced form. The game-theoretic model also shows that
having two buyers is enough to drive the profits of each to zero.
(ii) If B(·) has multiple fixed points, then all fixed points are competitive equilibria à la Akerlof, but typically only the largest fixed point is a subgame perfect equilibrium price. (See the problem set.)
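As a hedged illustration of remark (ii), the sketch below locates the fixed point of B(·) from Example 5.2 by iteration (the parameter values are our assumptions; in this affine case B is a contraction, so there is a unique positive fixed point, p∗ = 2α/(2 − β)):

alpha, beta = 0.2, 1.0   # alpha in (0, 1), beta in (1 - alpha, 2 - 2*alpha)

def B(p):
    return alpha + beta / 2 * p

p = 1.0                  # iterate from the highest price; B is a contraction since beta/2 < 1
for _ in range(200):
    p = B(p)
print(f"fixed point: p* = {p:.6f}   (closed form 2*alpha/(2 - beta) = {2 * alpha / (2 - beta):.6f})")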
The model
Firm i’s payoff for hiring a worker of type θA ∈ Θ at wage w is ui (w, θA ) = θA − w. Its
payoff for not hiring a worker is 0. Thus we are assuming here that education has no
direct value; its only purpose is to signal ability. Other assumptions are also possible.
We assume cA (·) is differentiable, increasing, and strictly convex with cA (0) = 0, and that
the single crossing condition holds:

(36)  c′H(e) < c′L(e) for all e > 0.

Thus at every positive education level, the high ability types have a lower marginal cost of education.
Note that all of type θA's indifference curves have slope c′A(e) at education level e; this slope does not depend on the wage. (Let ūA ≡ w − cA(e) be a type θA indifference curve. Expressing w as a function of e yields w̄A(e) ≡ cA(e) + ūA, so differentiating yields w̄′A(e) ≡ c′A(e).) Thus a given pair of indifference curves, one for each type, crosses at most once; if they do cross, then the θH indifference curve crosses the θL indifference curve from above.
[Figure: the types' indifference curves ūL and ūH in (e, w) space.]
Thus type θL has a higher absolute cost than type θH of attaining any given education
level.
We will see that versions of the single crossing condition play a fundamental role through-
out information economics. For general treatments, see Milgrom and Shannon (1994) or
Milgrom (2004, ch. 4).
Analysis
Since there is imperfect information, our first thought is to look for sequential equilibria
of this model. However, it is not clear how to define sequential equilibrium when there
are continuous sets of actions and/or types. (The problem is that in perturbed strategy
profiles, almost all strategies will have probability zero.)
The usual practice is to use weak sequential equilibrium and to directly impose some of
the implications of consistency. Refinements of this sort are typically called perfect Bayesian
equilibrium (see Fudenberg and Tirole (1991a,b)). Exactly what this term means varies from
paper to paper. Here we assume that the firms have common beliefs µ : R+ → ∆Θ about
the worker’s type conditional on his education level. We write µ(e) = (µL (e), µH (e)) for the
beliefs given education level e. (Aside: if we write down the game tree, firm 2 will act
after firm 1 but without viewing firm 1’s choice, and so firm 2’s beliefs must also address
firm 1’s choice. In effect, we are assuming that firm 2’s beliefs about the worker’s type
and firm 1’s choice are independent of one another.)
In a pure perfect Bayesian equilibrium with common beliefs, (eL , eH , w1 (·), w2 (·), µ(·)):
(i) In the last stage, the worker chooses optimally among wage offers (i.e., works for
a firm whose wage offer is highest).
(ii) In the second-to-last stage, having observed the worker’s education choice e, the
firms choose wages w1 (e) and w2 (e) optimally given their common belief µ(e). In
view of the competition between the firms, both will choose wages equal to the
worker’s expected ability. (The details of this argument are the same as in the
Akerlof lemons model.)
(iii) Each worker type chooses his education level optimally given (i) and (ii). Thus the
education choice eA of a worker of type θA ∈ Θ must satisfy
eA ∈ argmax_{e∈R+} ( max{w1(e), w2(e)} − cA(e) ).
Mixed perfect Bayesian equilibrium is defined similarly, but we will not consider such
equilibria here.
Forward induction refinements à la Cho and Kreps (1987) are used to refine the set of
perfect Bayesian equilibria.
There are two kinds of pure equilibrium: separating (the two types choose different education levels) and pooling (the two types choose the same education level). There are also
semi-separating mixed equilibria (one type mixes, sometimes choosing the same education
level as the other type).
We must also check that neither type wants to choose the other type’s education level. He
would then be mistaken for the other type, and thus get that type’s equilibrium wage.
θL − cL(0) ≥ θH − cL(eH)  ⟺  cL(eH) ≥ θH − θL   (using cL(0) = 0),
θH − cH(eH) ≥ θL − cH(0)  ⟺  cH(eH) ≤ θH − θL   (using cH(0) = 0).

Combining these two conditions: cL(eH) ≥ θH − θL ≥ cH(eH).
(Loosely speaking: For the low ability type, the cost of obtaining education level eH exceeds
the resulting increase in wages; for the high ability type, it does not.)
At e = 0, the first inequality fails but the second is satisfied.
Now increase e until reaching the value e̲ at which the first inequality binds. Since cL(e) > cH(e) for all e > 0 (by (36)), the second inequality is satisfied strictly at e̲. As we continue to increase e, both inequalities are satisfied strictly until we reach the value ē at which the right inequality binds. For e > ē, the right inequality fails.
Thus the values of eH that are possible in equilibrium are those in the interval [e̲, ē].
The figure at left defines e̲ and ē and shows the types' utility levels in the eH = ē equilibrium, at which type θH is indifferent between education levels eH and 0. The figure at right shows the types' utility levels in the eH = e̲ equilibrium, at which type θL is indifferent. Of course, conditional on separation, θH is better off when he is able to choose a lower education level.
[Figures: left, the eH = ē separating equilibrium; right, the eH = e̲ separating equilibrium. Each panel plots the indifference curves ūL and ūH in (e, w) space, with wage levels θL and θH marked.]
To complete the specification of an equilibrium with effort levels eL = 0 and eH ∈ [e̲, ē], we need to specify the firms' beliefs after observing education levels other than eL and eH. These beliefs will determine the wage they offer after observing these unused signals, and so will determine whether either type of worker wants to deviate. The simplest specification of beliefs ensuring that neither worker type deviates is µL(e) = 1 for any e ∉ {0, eH}, so that wi(e) = θL for all such e. Many other choices of beliefs work as well. Forward induction refinements place additional structure on out-of-equilibrium beliefs.
We can refine the set of equilibria by applying a weak form of forward induction based on removal of conditionally dominated strategies. If firms only choose wages in [θL, θH], then θL must be better off playing e = 0 and obtaining a payoff of at least θL than playing e > e̲ and obtaining a payoff of at most

θH − cL(e) < θH − cL(e̲) = θL.

Thus a weak forward induction requirement—one that does not fix a particular equilibrium in advance—is to require µL(e) = 0 for all e > e̲ that are not dominated for type θH. This rules out every eH > e̲, since if type θH were to choose such an eH, he could deviate to e′H ∈ (e̲, eH), pay a lower cost, but still receive wage θH. Thus only the separating equilibrium with e∗H = e̲ survives. This is sometimes called the Riley outcome, after Riley (1979).
In summary:
Proposition 5.3. There are perfect Bayesian equilibria with eL = 0 and eH = e for every e ∈ [e̲, ē], where cL(e̲) = θH − θL = cH(ē). Only equilibria generating the Riley outcome, i.e., with eH = e̲, survive the removal of conditionally dominated strategies.
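A worked numeric instance of Proposition 5.3 (a sketch under assumed primitives that are ours, not the notes': quadratic costs cL(e) = e² and cH(e) = e²/2, which satisfy (36), and θH − θL = 1):

thetaL, thetaH = 1.0, 2.0
cL = lambda e: e ** 2        # assumed low-ability cost
cH = lambda e: e ** 2 / 2    # assumed high-ability cost; c'_H < c'_L for e > 0

def solve(f, target, lo=0.0, hi=10.0, tol=1e-12):
    # bisection for an increasing function f
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < target else (lo, mid)
    return lo

e_low = solve(cL, thetaH - thetaL)   # c_L(e_low) = thetaH - thetaL, so e_low = 1
e_bar = solve(cH, thetaH - thetaL)   # c_H(e_bar) = thetaH - thetaL, so e_bar = sqrt(2)
print(f"separating e_H values: [{e_low:.4f}, {e_bar:.4f}]")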
Thus µH (e∗ ) = pH and w1 (e∗ ) = w2 (e∗ ) = (1 − pH )θL + pH θH ≡ θ̄.
(Note: As in the Akerlof model, a full specification of equilibrium must have workers
choosing whom to work for in a way that gives both firms zero expected profits (e.g.,
randomizing uniformly). Otherwise, the firm earning negative expected profits would
prefer to deviate.)
Whether pooling at e∗ is optimal for both worker types depends on both equilibrium and
out-of-equilibrium wages. For the best chance of making this work, specify µL (e) = 1 for
e ≠ e∗, so wi(e) = θL for such e. Then pooling equilibrium requires

(37)  θ̄ − cL(e∗) ≥ θL,
(38)  θ̄ − cH(e∗) ≥ θL.

Let ẽ > 0 make (37) bind. Then (37) is satisfied when e∗ ∈ [0, ẽ], and by the single crossing property so is (38). Thus [0, ẽ] is the set of education levels arising in pooling equilibria. (Note that since cL(e̲) = θH − θL > θ̄ − θL = cL(ẽ), it follows that ẽ < e̲.)
We impose forward induction here by way of the Cho-Kreps criterion (Section 2.5.2).
Fix a perfect Bayesian equilibrium with payoffs u∗L and u∗H. For T ⊆ Θ, define WT(e) to be the set of equilibrium wages that are possible given education level e if the firms' beliefs satisfy Σ_{θA∈T} µA(e) = 1. Then

D(e) = { θA : u∗A > max_{w∈WΘ(e)} uA(w, e) }

is the set of types for whom education level e is equilibrium dominated. If for some e ≠ e∗ with D(e) ≠ Θ and some type θB ∈ Θ, we have

(†)  u∗B < min_{w∈W_{Θ−D(e)}(e)} uB(w, e),

then the equilibrium fails the Cho-Kreps criterion.
Fix a pooling equilibrium with common effort level e∗ ∈ [0, ẽ], so that θ̄ − cL(e∗) ≥ θL.
Define e′ > e∗ by θ̄ − cL(e∗) = θH − cL(e′) and e″ > e∗ by θ̄ − cH(e∗) = θH − cH(e″). The single crossing property implies that e″ > e′. (See the figures below for examples.)
We now argue that given the firms' anticipated reactions, only type θH workers will prefer to deviate to any e ∈ (e′, e″), breaking the pooling equilibrium. For such e,

θH − cL(e) < θ̄ − cL(e∗) = u∗L   and   θH − cH(e) > θ̄ − cH(e∗) = u∗H.

Thus D(e) = {θL} and Θ − D(e) = {θH}. But if µH(e) = 1, then both firms choose wi(e) = θH. Thus by (†), type θH prefers to deviate to e.
In summary:
Proposition 5.4. There are perfect Bayesian equilibria with eL = eH = e∗ for every e∗ ∈ [0, ẽ],
where cL (ẽ) = θ̄ − θL . No such equilibrium satisfies the Cho-Kreps criterion.
Some pooling equilibria can be ruled out using conditional dominance arguments, and so without appealing to the full strength of the Cho-Kreps criterion. In the figure at left, choices by type θL of effort levels above e′ are strictly dominated. Thus the simple forward induction argument used to refine the set of separating equilibria applies equally well here.
But in the figure at right, no effort level e ∈ (e′, e″) is strictly dominated for type θL, since the points (e, θH) are all above θL's indifference curve through (0, θL). However, since the points (e, θH) are below θL's indifference curve through (e∗, θ̄), an equilibrium dominance argument rules out such choices by θL. Thus the Cho-Kreps criterion rules out the pooling equilibrium pictured here.
[Figures: two pooling equilibria at e∗ with wage θ̄, showing the deviation range (e′, e″). In the left panel, θL's choices above e′ are strictly dominated; in the right panel they are only equilibrium dominated. Axes are (e, w), with wage levels θL, θ̄, θH marked.]
Remarks:
(i) The Riley outcome Pareto dominates the outcomes of the other separating equilib-
ria.
(ii) Relative to the Riley outcome, type θL workers would be better off if signalling
were unavailable, since they would then obtain wage (1 − pH )θL + pH θH .
(iii) Type θH workers might also be better off if signalling were unavailable. They prefer the best pooling equilibrium to the Riley outcome when

θ̄ = (1 − pH)θL + pH θH > θH − cH(e̲).

This will be true if pH is high enough, since in this case the no-signal wage is nearly as large as θH's separating equilibrium wage.
(iv) Another way to put (ii) and (iii) is to say that when pH is close to 1, the Riley
outcome is Pareto inferior to the outcome of the pooling equilibria with e∗ = 0. But
if the pooling equilibrium breaks down, the firms’ beliefs change from µH (0) = pH
to µH (0) = 0, and there is no reason for them to change back.
The presentation mainly follows Jehle and Reny (2011, Sec. 8.1.3).
The model
The utility of a type θP consumer from choosing policy (b, p) is

uP(p, b) = (1 − θP)v(x0) + θP v(x1),

where x0 denotes wealth in the no-accident state and x1 = x0 − ℓ + b wealth in the accident state, both net of the premium p.
Suppose we draw an indifference curve uP(p, b) = ūP of type θP in (b, p) space. Expressing p as a function of b and differentiating yields

(39)  dpP/db (b) = − (∂uP/∂b (b, p)) / (∂uP/∂p (b, p)) = θP v′(x1) / ((1 − θP)v′(x0) + θP v′(x1)) = 1 / (1 + (1 − θP)v′(x0)/(θP v′(x1))) > 0.

And v is strictly concave ⇒ v′(·) is decreasing ⇒ dpP/db (·) is decreasing ⇒ pP(·) is strictly concave.
Implications:
(i) The single crossing property holds: since dpP/db (b) is increasing in θP, type θH's indifference curves cross type θL's from below.
(ii) A full insurance policy (b, p) is one with benefit b = ℓ, implying that x1 = x0. Equation (39) shows that at such a policy, type θP's indifference curve has slope θP.
(iii) An actuarially fair policy (bP, pP) for type θP is one with p = θP b. Selling such a policy to its intended type earns a firm zero expected profit. By (ii), a type θP consumer who could choose any actuarially fair policy would choose the full insurance policy (bP, pP) = (ℓ, θPℓ). (See the figure.)
[Figure: the actuarially fair lines p = θH b and p = θL b in (b, p) space, with each type's indifference curve ūP through its full insurance point (ℓ, θPℓ).]
Analysis
We look for pure strategy subgame perfect equilibria. (Since the uninformed party moves
first, this game has stagewise perfect information, so subgame perfection is enough. We
comment on mixed equilibria at the end.)
Lemma 5.5. In any pure subgame perfect equilibrium, both firms earn zero expected profit.
Proof. Each firm’s expected profits are non-negative, since a firm can always offer a pair
of null policies. So suppose expected profits are πi > 0 and π j ∈ [0, πi ].
If both types choose policy (b, p) (that is offered by at least one firm), then j can offer (b + ε, p) for some small ε > 0 and improve his profits to ≈ πi + πj. □
So suppose θL chooses (bL, pL) and θH chooses (bH, pH). Then

(†)  uL(pL, bL) ≥ uL(pH, bH)   and   (‡)  uH(pH, bH) ≥ uH(pL, bL),

and the single crossing property implies that at least one inequality is strict. We suppose that (†) is strict; the proof if (‡) is strict is similar. If firm j offers (bH + εH, pH) for some εH > 0, then θH consumers will strictly prefer this to (bH, pH):

uH(pH, bH + εH) > uH(pH, bH) ≥ uH(pL, bL),   and, since (†) is strict,   uL(pL, bL) > uL(pH, bH + εH) for εH small enough.
For this fixed εH , the last two inequalities still hold if bL is replaced by bL + εL for εL > 0
small enough. So if firm j chooses ((bL + εL, pL), (bH + εH, pH)), it earns ≈ πi + πj. □
Lemma 5.6. There is no pure subgame perfect equilibrium with pooling (i.e., in which both
consumer types select the same policy (b∗ , p∗ ) ).
Proof. Since each firm's expected profits are zero, so are total expected profits:

(40)  αL(p∗ − θL b∗) + (1 − αL)(p∗ − θH b∗) = 0,

where αL is the probability of the low risk type θL.
Suppose that b∗ > 0. Then (40) implies that p∗ > 0, and in particular that p∗ − θL b∗ > 0.
[Figure: the pooling policy (b∗, p∗) and the two types' indifference curves through it; the shaded region between the curves contains policies that attract only type θL.]
If firm i offers (b∗, p∗), then firm j can offer a policy in the shaded region in the figure above and sell only to type θL. (This is known as cream skimming.) If this policy is close enough to (b∗, p∗), firm j earns ≈ αL(p∗ − θL b∗) > 0. □
Suppose instead that b∗ = 0. Then (40) implies that p∗ = 0. Since (0, 0) is actuarially fair, θH strictly prefers (ℓ, θHℓ) to (0, 0), and so strictly prefers (ℓ, θHℓ + ε) to (0, 0) when ε > 0 is small enough. Moreover, policy (ℓ, θHℓ + ε) generates positive expected profits if it is sold to type θH or to type θL (or both). Thus either firm can profitably deviate to this policy. □
Proposition 5.7. Suppose there is a pure subgame perfect equilibrium with separation, meaning that the policy (b∗L, p∗L) chosen by type θL differs from the policy (b∗H, p∗H) chosen by type θH. Then
(i) Each policy is actuarially fair: p∗H = θH b∗H and p∗L = θL b∗L.
(ii) Type θH's policy provides full insurance: (b∗H, p∗H) = (ℓ, θHℓ).
(iii) Type θL's policy makes type θH indifferent between the two policies.
(iv) Type θL's policy provides less than full insurance: b∗L < ℓ.
[Figure: the separating policies (b∗H, p∗H) on the line p = θH b and (b∗L, p∗L) on p = θL b, with the indifference curves ūH and ūL through them.]
Step 1: uH(p∗H, b∗H) ≥ uH(θHℓ, ℓ) (type θH obtains at least his actuarially fair full insurance payoff).
Why? If not, then a firm can profitably offer an almost actuarially fair policy and make positive profit (from just θH or from both types).
Step 2: p∗H − θH b∗H ≤ 0 (firms earn nonpositive expected profits from type θH).
Why? Since θH's favorite fair policy is full insurance, any policy θH likes at least as much as fair full insurance generates a nonpositive profit—see the figure.
[Figure: θH's indifference curve through (ℓ, θHℓ) is tangent to the line p = θH b there and lies below it elsewhere; hence any policy θH weakly prefers to (ℓ, θHℓ) has p ≤ θH b and earns nonpositive profit from θH.]
Step 3: p∗L − θL b∗L = 0 (firms generate zero expected profits from θL, and so from θH too).
Why? Since expected profits are zero by Lemma 5.5, Step 2 implies that expected profits from type θL are nonnegative: p∗L − θL b∗L ≥ 0.
Suppose p∗L − θL b∗L is positive. Then p∗L > 0, and so b∗L > 0; otherwise θL would prefer to be uninsured. Now suppose some firm i offers policy (b∗H, p∗H). Consider what happens if firm j ≠ i only offers policies that are both close to (b∗L, p∗L) and in the shaded region in the figure below. Since these policies are close to (b∗L, p∗L), they earn positive expected profits when sold to θL. Since they are in the shaded region, θL will buy them rather than (b∗L, p∗L), and θH likes them less than (b∗L, p∗L), and hence less than (b∗H, p∗H). Thus firm j will earn a positive expected profit. □

[Figure: policies near (b∗L, p∗L) in the shaded region between the types' indifference curves attract only type θL.]
Step 4: (b∗H, p∗H) = (ℓ, θHℓ) (type θH receives actuarially fair full insurance).
Why? The inequality in step 1 holds, and step 3 implies that the inequality in step 2 binds; (ℓ, θHℓ) is the only policy consistent with these two facts.
[Figure: θH's indifference curve through (b∗H, p∗H) = (ℓ, θHℓ) crosses the line p = θL b at the points (b̲, p̲) and (b̄, p̄).]
crosses p = θL b from above at a point southwest of (b̄, θL b̄). Thus θL strictly prefers (ℓ, θHℓ) to any (b∗L, θL b∗L) with b∗L ≥ b̄. □
If b∗L > b̲ (and b∗L < b̄ if b̄ exists), then θH strictly prefers (b∗L, p∗L) to (b∗H, p∗H) = (ℓ, θHℓ). □
If b∗L < b̲ and i offers (b∗L, p∗L), then since θL's indifference curve through (b∗L, p∗L) crosses the actuarially fair line from below, j can cream-skim by slightly increasing b and p. (The claim about the indifference curve is true because the function pL(·) describing θL's indifference curve is concave and has slope θL at b = ℓ, and thus slope greater than θL at b = b∗L.) □
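The steps above pin the separating contracts down numerically. The sketch below computes them under assumed primitives (ours, not the notes'): v(x) = √x, initial wealth 1, loss ℓ = 0.5, and accident probabilities θL = 0.25 and θH = 0.5. Type θH gets fair full insurance, and type θL's benefit b̲ is found by making θH indifferent, via bisection:

from math import sqrt

w0, loss = 1.0, 0.5
thetaL, thetaH = 0.25, 0.5
v = sqrt

def u(theta, p, b):
    # expected utility of a type-theta consumer from policy (b, p)
    return (1 - theta) * v(w0 - p) + theta * v(w0 - p - loss + b)

bH, pH = loss, thetaH * loss   # Step 4: theta_H receives fair full insurance

def gap(b):
    # theta_H's utility on the theta_L fair line, relative to his own contract
    return u(thetaH, thetaL * b, b) - u(thetaH, pH, bH)

lo, hi = 0.0, loss             # gap(0) < 0 < gap(loss), so a root exists in between
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if gap(mid) < 0 else (lo, mid)

bL = lo
print(f"theta_H: (b, p) = ({bH:.3f}, {pH:.3f});  theta_L: (b, p) = ({bL:.4f}, {thetaL * bL:.4f})")
print("partial insurance for the low risk type:", bL < loss)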
Note the similarity of the equilibrium outcome here to that of the Riley outcome of the
Spence model.
Each model has an “indicator” (benefit, education level) and a transfer (price, wage).
In each model,
(i) The firm obtains zero expected profit from each type.
(ii) The “bad” type gets its preferred indicator: since it does not benefit from being
identified, it will not accept an inferior indicator to make this happen.
(iii) The "good" type's indicator-transfer pair makes the "bad" type indifferent: because of the competition between the firms, the "good" type sacrifices only as much as it has to in order to achieve separation.
While we looked at signalling in a labor market and screening in an insurance market, we
could have reversed either of these:
(i) In a labor market screening game (firms offer wage-effort pairs), if a pure sub-
game perfect equilibrium exists it generates the Riley (1979) outcome. (See MWG
ch. 13.D.)
(ii) In an insurance signaling game (consumers propose policies, firms accept or reject), the Cho-Kreps criterion selects the "Riley outcome." (See Jehle-Reny sec. 8.1.2.)
Pure subgame perfect equilibria of this model need not exist. One possibility for a failure of
existence is illustrated in the figure below, where policies in the shaded region are accepted
by both types and generate positive expected profits. Such a region exists whenever αL is
close enough to 1 (i.e., whenever the probability of a low risk type is high enough).
Dasgupta and Maskin (1986b) prove that mixed subgame perfect equilibrium always
exists in this model.
[Figure: nonexistence of pure equilibrium. When αL is close to 1, the pooled fair line p = (αLθL + (1 − αL)θH)b lies close to p = θL b, and the shaded region of policies that both types accept earns positive expected profits.]
6. Principal-Agent Models
In a principal-agent model, an uninformed party (the principal) makes a take-it-or-leave-it
offer to an informed party (the agent). This setup gives the principal all of the bargaining
power, but the agent’s informational advantage gives her some power as well.
In the principal-agent model with moral hazard, the form of the informational advantage is
that the principal cannot observe the agent’s action choice, but only a stochastic output
that is influenced by this choice.
The agent's utility is u̲ ∈ R for rejecting the contract, and u(w) − e for choosing effort e and earning wage w, where u(·) is twice differentiable with u(0) = 0 and u′(w) > 0 and u″(w) < 0 for w > 0. Thus effort levels are named according to their costs.
The principal maximizes expected profit. (A common variation is to assume the principal
is risk-averse.)
Analysis: We look for subgame perfect equilibrium.
In stage 2, the agent will either choose her preferred effort level or choose to reject the
contract.
Thus for each possible effort level e, the principal determines the wage schedule that earns him the highest expected profit while inducing the agent to choose e. The principal then selects optimally among these wage schedules and the option to shut down.
One basic question concerns the form of the wage schedules: under what conditions are
wages nondecreasing in output?
Some key early papers: Mirrlees (1999), Holmström (1979). Complete analysis: Grossman
and Hart (1983). The presentation here borrows from the textbook treatment of Salanié
(2005, Sec. 5.1–5.2).
Notation
Analysis
Inducing e = eL :
Here we implicitly assume that when she is indifferent, the agent acts in the way that the
principal prefers. This must be the case in equilibrium, for the same reason that in the
unique equilibrium of alternating offer bargaining games, each player’s offers make the
opponent indifferent between accepting and rejecting.
Solution: Consider the relaxed problem that omits constraint (IL). In this problem,
[Figure: the relaxed problem in (wF, wS) space. The principal's isoprofit lines have slope −(1 − qL)/qL; the agent's reservation indifference curve u̲ has slope −((1 − qL)/qL)·(u′(wF)/u′(wS)). The two are tangent on the 45° line.]
But if wages are independent of output, constraint (IL ) is satisfied. Thus wS = wF = w∗L
solves the original problem.
Inducing e = eH :
Solution: Wages should be chosen to make (RH) bind, since if it did not bind, reducing u(wS) and u(wF) equally would increase profits without affecting (IH).
Constraint (IH ) must also bind. (See the figure. In the relaxed program without constraint
(IH ) the principal would choose the black dot. But the constraint curve u(wS )−u(wF ) = d > 0
is above the 45◦ line, and points satisfying the constraint are on or above this curve. Thus
the black dot is not feasible. The feasible set is the shaded region above the gray dot, so
the best the principal can do is to choose the gray dot.)
[Figure: in (wF, wS) space, the constraint curve u(wS) − u(wF) = d > 0 lies above the 45° line u(wS) − u(wF) = 0; the principal's isoprofit lines have slope −(1 − qH)/qH. The black dot lies on the reservation curve u̲ but below the constraint curve; the feasible set is the region above the gray dot.]
Thus simultaneously solving the binding versions of (IH ) and (RH ) gives us the equilibrium
utilities (and implicitly the equilibrium wages):
u(w∗F) = u̲ + (qH eL − qL eH)/(qH − qL),    u(w∗S) = u̲ + ((1 − qL)eH − (1 − qH)eL)/(qH − qL).

Thus since u(w∗L) = u̲ + eL, a calculation verifies that w∗F < w∗L < w∗S.
Π∗H ≥ Π∗L ⇔ qH (xS − w∗S ) + (1 − qH )(xF − w∗F ) ≥ qL (xS − w∗L ) + (1 − qL )(xF − w∗L )
Recall that w∗L , w∗F , and w∗S do not depend on xS or xF . Thus the greater is xS − xF , the more
attractive inducing effort becomes.
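A self-contained numeric sketch of the two-outcome model (all primitives are our assumptions: u(w) = √w, u̲ = 1, and the efforts, success probabilities, and outputs below). It reproduces the wage formulas above and compares the profits from inducing each effort:

u_res = 1.0               # reservation utility (written with an underbar in the text)
eL, eH = 0.1, 0.4         # effort costs
qL, qH = 0.4, 0.8         # success probabilities
xF, xS = 1.0, 4.0         # failure / success outputs

u_inv = lambda y: y ** 2  # inverse of u(w) = sqrt(w)

# inducing eL: a flat wage with u(wL*) = u_res + eL
wL = u_inv(u_res + eL)
profit_L = qL * (xS - wL) + (1 - qL) * (xF - wL)

# inducing eH: the binding (IH) and (RH) give the utilities in the display above
uF = u_res + (qH * eL - qL * eH) / (qH - qL)
uS = u_res + ((1 - qL) * eH - (1 - qH) * eL) / (qH - qL)
wF, wS = u_inv(uF), u_inv(uS)
profit_H = qH * (xS - wS) + (1 - qH) * (xF - wF)

print(f"flat contract:  wL* = {wL:.4f},  expected profit = {profit_L:.4f}")
print(f"incentive pay:  wF* = {wF:.4f}, wS* = {wS:.4f},  expected profit = {profit_H:.4f}")
print("wF* < wL* < wS*:", wF < wL < wS)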
Notation
output levels: x ∈ X = {x1 , . . . , xm }, x1 < . . . < xm .
output law: pi j = Prob(x j |ei ) > 0.
Analysis
In stage 2, the agent will choose his preferred effort level or to reject the contract. Thus as
before, the principal determines his maximal expected profit conditional on inducing each
effort level ei , and then optimizes over these possibilities and the option to shut down.
The following program determines the maximal expected profit and optimal wage sched-
ule for inducing effort level ei :
(Pi)   max_{w∈Rm}  Σ_{j=1}^{m} pij (xj − wj)   subject to

(Ih|i)  Σ_{j=1}^{m} pij u(wj) − ei  ≥  Σ_{j=1}^{m} phj u(wj) − eh   (for all h ≠ i)

(Ri)   Σ_{j=1}^{m} pij u(wj) − ei  ≥  u̲
Although (Pi ) is not a concave program, conditions (KT1)–(KT3) are necessary and suffi-
cient for maximization: Suppose we change the choice variables from the wages w j to the
utilities u(w j ). Then (Pi ) becomes a program with a strictly concave objective function and
linear constraints. The Kuhn-Tucker conditions for this new program are necessary and
sufficient for maximization, and in fact they are equivalent to (KT1)–(KT3).
We first consider inducing the lowest effort level.
Proposition 6.1. The optimal contract for inducing effort level e1 has a flat wage schedule:
w1 = . . . = wm = w∗, where u(w∗) = u̲ + e1.
Setting all λh's to 0 is tantamount to assuming that none of the (Ih|1) constraints bind. So in
essence, the proof above solves the relaxed program and then checks that the solution is
feasible in the original program.
So far, we have made no assumptions at all about the output law p. We now introduce
conditions on the output law that ensure that optimal wage schedules are nondecreasing.
We say that the output law satisfies the monotone likelihood ratio property (MLRP) if
if g < i, then pij/pgj is nondecreasing in j.

That is: higher outcomes have higher likelihood ratios in favor of higher actions. To match (KT1), we can write the MLRP in the following way:

(41)  if g < i, then 1 − pgj/pij is nondecreasing in j.
The next lemma is a consequence of the MLRP for cases in which the principal does not
need to actively dissuade the agent from higher efforts.
Lemma 6.2. If the MLRP holds and the optimal solution to (KT1)–(KT3) satisfies λh = 0 for
h > i, then wages are nondecreasing in output: k ≥ j implies that wk ≥ w j .
Proof. When λh = 0 for h > i, the MLRP (41) and (KT1) imply that increasing j weakly increases the right-hand side of the restated version of (KT1). To maintain the equality, u′(wj) must weakly decrease; thus since u is concave, wj must weakly increase. □
Of course, the condition on the multipliers is trivially true for the highest effort level. Thus
Lemma 6.2 immediately implies
Proposition 6.3. Under the MLRP, the optimal wage schedule for inducing the maximal effort en
is nondecreasing.
If there are more than two actions, what can we say about optimal contracts to induce
intermediate effort levels? Here clean results require an additional assumption, which we
state in terms of the decumulative probabilities
P̃ij = Σ_{k>j} pik = Prob(x > xj | ei).
Lemma 6.4. If the MLRP holds, then the outcome distributions pi· satisfy first order stochastic
dominance: for all j, P̃i j is nondecreasing in i.
If eg < ei < eh, then there is an α ∈ (0, 1) such that ei = (1 − α)eg + αeh. We say that the output law satisfies concavity of decumulative probabilities in effort (CDE) if

(42)  P̃ij ≥ (1 − α)P̃gj + αP̃hj   for all j.
Lemma 6.4 says that as effort ei increases, the probability P̃i j of an outcome better than x j
weakly increases. CDE adds the requirement that this probability increase at a weakly
decreasing rate. (For instance, if ei = i for all i, then (42) says that P̃(i+1)j − P̃i j , the increase
in the probability of an output above x j obtained by increasing effort from i to i + 1, is
nonincreasing in i.) Thus CDE is a form of decreasing marginal returns to effort.
Proposition 6.5. Under the MLRP and CDE, the optimal wage schedule for inducing any effort
level ei is nondecreasing in output.
Proof. We have already seen that the result is true for i = 1 and i = n.
Lemma 6.6. Let i > 1. In the optimal solution (w, λ−i, µ) to (Pi), λg > 0 (and hence (Ig|i) binds) for some g < i.
Proof. Suppose to the contrary that λg = 0 for all g < i. Then this and the fact that (w, λ−i, µ) satisfies (KT1)–(KT3) for (Pi) imply that (w, λ−i, µ) also satisfies (KT1)–(KT3) for the relaxed problem (P+i) in which only efforts in {ei, . . . , en} are available. In (P+i), ei is the smallest effort, so Proposition 6.1 implies that w1 = . . . = wm. But then (Ig|i) is violated for all g < i, implying that (w, λ−i, µ) is not feasible in (Pi). □
Now consider the relaxed problem (P−i ) of inducing ei when only efforts in {e1 , . . . , ei }
are available. By Proposition 6.3, the optimal wages w1 , . . . , wm for this problem are
nondecreasing. To complete the proof of the proposition, we show that these wages
remain feasible in (Pi ), and so are optimal in (Pi ).
To do so, fix h > i; we want to show that (Ih|i ) is satisfied. By Lemma 6.6, there is a g < i
such that (I g|i ) binds. The intuition now is that since effort levels e g and ei generate the same
expected payoffs under wage schedule w, decreasing marginal returns to effort implies
that the expected payoffs from effort level eh must be lower under w. In other words, CDE
implies that effort levels higher than the principal intends are unattractive to the agent.
To show this formally, write ei = (1 − α)e g + αeh , sum by parts, and apply CDE (42) to obtain
Σ_{j=1}^{m} pij u(wj) − ei = u(w1) + Σ_{j=1}^{m−1} P̃ij (u(wj+1) − u(wj)) − ei
  ≥ (1 − α) [ u(w1) + Σ_{j=1}^{m−1} P̃gj (u(wj+1) − u(wj)) − eg ] + α [ u(w1) + Σ_{j=1}^{m−1} P̃hj (u(wj+1) − u(wj)) − eh ]
  = (1 − α) [ Σ_{j=1}^{m} pgj u(wj) − eg ] + α [ Σ_{j=1}^{m} phj u(wj) − eh ]
  = (1 − α) [ Σ_{j=1}^{m} pij u(wj) − ei ] + α [ Σ_{j=1}^{m} phj u(wj) − eh ],
where the final equality follows from the fact that (I g|i ) binds. Rearranging the inequality
between the first and last expressions yields (Ih|i ).
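The change of variables to utilities described above (before Proposition 6.1) also makes (Pi) easy to solve numerically. The following sketch (requires scipy; the output law, outputs, effort costs, and u(w) = √w are all our assumptions) solves for the optimal wage schedule inducing the intermediate effort:

import numpy as np
from scipy.optimize import minimize

p = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.3, 0.4],
              [0.2, 0.25, 0.55]])   # an output law satisfying MLRP and CDE (checked above)
x = np.array([1.0, 2.0, 4.0])       # outputs
e = np.array([0.0, 0.2, 0.4])       # effort costs
u_res = 1.0                         # reservation utility
i = 1                               # induce the intermediate effort e_2

def neg_profit(y):
    # choice variables are utilities y_j = u(w_j), so wages are w_j = y_j^2;
    # this change of variables makes the program convex
    return -(p[i] @ (x - y ** 2))

cons = [{"type": "ineq", "fun": lambda y, h=h: (p[i] - p[h]) @ y - (e[i] - e[h])}
        for h in range(len(e)) if h != i]                          # constraints (I_h|i)
cons.append({"type": "ineq", "fun": lambda y: p[i] @ y - e[i] - u_res})  # constraint (R_i)

res = minimize(neg_profit, x0=np.full(3, 1.5), bounds=[(0, None)] * 3,
               constraints=cons, method="SLSQP")
w = res.x ** 2
print("optimal wages:", np.round(w, 4))
print("nondecreasing:", bool(np.all(np.diff(w) >= -1e-6)), "  profit:", round(-res.fun, 4))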
Here the (uninformed) principal offers a (privately informed) agent a menu of contracts.
Adverse selection refers to the fact that in designing the menu of contracts, the principal
must ensure that each type will select the contract intended for it rather than the contract
intended for some other type.
In the basic setting considered here, the principal’s payoff does not depend directly on the
agent’s type. This assumption is appropriate for modeling sales of consumption goods,
but not for insurance contracting.
We describe the model as one of screening via product quality. One can also interpret the
model as one of screening via quantity purchased—that is, one of nonlinear pricing. For
the latter interpretation to be appropriate in a given application, the principal must be
able to ensure that agents cannot circumvent the pricing scheme, either by making many
small purchases, or by teaming up to make group purchases that are then divided among
the team. See Wilson (1993) for a comprehensive treatment.
Key early references: Mirrlees (1971), Mussa and Rosen (1978), Baron and Myerson (1982),
Maskin and Riley (1984).
The agent's type, θ ∈ {θℓ, θh}, 0 < θℓ < θh, represents his marginal return to quality. The probability that he is type θh is πh ∈ (0, 1).
A type θ's utility for purchasing a good of quality q ≥ 0 at price p ∈ R is

u(q, p, θ) = θq − p.

Note that the single crossing condition holds: ∂²u(q, p, θ)/∂q∂θ = 1 > 0.
If the principal could observe the agent's type, he could extract all of the agent's surplus. The principal solves

max_{q≥0, p}  p − c(q)   subject to  θq − p ≥ 0.

Since it is clearly optimal to make the constraint bind, this problem is equivalent to

max_{q≥0}  θq − c(q).

The solution is to choose the q̂ satisfying c′(q̂) = θ, and to set price p̂ = θq̂.
What if the agent's type is unobservable? Suppose the principal offered the menu ((q̂ℓ, θℓq̂ℓ), (q̂h, θhq̂h)) consisting of each type's optimal contract when types are observable. Since

θh q̂ℓ − θℓ q̂ℓ = (θh − θℓ)q̂ℓ > 0 = θh q̂h − θh q̂h,

the high type prefers the low type's contract. What should the principal do?
The principal's problem is to choose a pair of contracts, ((qℓ, pℓ), (qh, ph)), that solves

max  (1 − πh)(pℓ − c(qℓ)) + πh(ph − c(qh))   subject to

(ICℓ)  θℓqℓ − pℓ ≥ θℓqh − ph
(ICh)  θhqh − ph ≥ θhqℓ − pℓ
(IRℓ)  θℓqℓ − pℓ ≥ 0
(IRh)  θhqh − ph ≥ 0

Constraints (ICℓ) and (ICh), which say that each type prefers the contract intended for it, are called incentive compatibility constraints. Constraints (IRℓ) and (IRh), which say that each type prefers its contract to no contract, are called individual rationality (or participation) constraints.
The null contract pair ((0, 0), (0, 0)) is feasible in the principal's problem. What contract pair is optimal?
Proposition 6.7. At the optimal solution ((q∗ℓ, p∗ℓ), (q∗h, p∗h)) of the principal's problem,
(i) (ICh) binds. (The high type is indifferent between the contracts.)
(ii) (IRℓ) binds. (The low type gets zero surplus.)
(iii) c′(q∗h) = θh. (The high type gets its efficient quality.)
(iv) c′(q∗ℓ) = max{θℓ − (πh/(1 − πh))(θh − θℓ), 0}. (The low type gets less than its efficient quality, and may not be served at all.)
The next figure illustrates a possible optimal contract pair. (IRℓ) binds because the type θℓ contract is on the type θℓ indifference curve through the origin. (ICh) binds because both contracts are on the same type θh indifference curve. Quality q∗h is determined by the marginal cost requirement c′(q∗h) = θh, and in the case illustrated, quality q∗ℓ is determined by the marginal cost requirement c′(q∗ℓ) = θℓ − (πh/(1 − πh))(θh − θℓ).
[Figure: the optimal menu in (q, p) space. (q∗ℓ, p∗ℓ) lies on the type θℓ indifference curve p = θℓq through the origin; (q∗h, p∗h) lies on the type θh indifference curve p = θhq − constant through (q∗ℓ, p∗ℓ).]
Proof.
Observe first that constraint (IRh) is redundant: If (ICh) and (IRℓ) hold, then

(43)  θhqh − ph ≥ θhqℓ − pℓ ≥ θℓqℓ − pℓ ≥ 0.
We now establish that properties (i) and (ii) must hold for a feasible menu ((q` , p` ), (qh , ph ))
to be optimal.
(i) Constraint (ICh) binds (i.e., ph = θh(qh − qℓ) + pℓ): If (ICh) is loose, then (43) implies that (IRh) is loose as well, so one can increase ph without violating these constraints. Increasing ph also makes (ICℓ) easier to satisfy. Thus increasing ph is feasible and increases the principal's payoffs.
(ii) Constraint (IRℓ) binds (i.e., pℓ = θℓqℓ): If (IRℓ) is loose, one can increase ph and pℓ equally without violating (IRℓ), (ICℓ), or (ICh) (or the redundant (IRh)). Thus doing so is feasible and increases profits.
When (ICh) binds, we can substitute the expression for ph from (i) into (ICℓ) to obtain

(44)  (θh − θℓ)(qh − qℓ) ≥ 0, or equivalently qh ≥ qℓ.

Therefore, since (IRh) is redundant, (i), (ii), and (44) reduce the principal's problem to

(45)  max_{qℓ, qh ≥ 0}  (1 − πh)(θℓqℓ − c(qℓ)) + πh(θhqh − (θh − θℓ)qℓ − c(qh))   subject to  qh ≥ qℓ.

To solve (45), we consider the relaxed problem in which the constraint qh ≥ qℓ is omitted. If we find that the solution to this relaxed problem satisfies qh ≥ qℓ, it is then a solution to (45), and hence (with the expressions for ph and pℓ above) a solution to the original problem.
To solve this relaxed version of (45), note that its objective function is strictly concave and
separable in qℓ and qh. The first order condition for qh is θh − c′(qh) = 0, giving us (iii). (Since c′(·) is increasing with c′(0) = 0 and lim_{q→∞} c′(q) = ∞, this equation has a unique solution.)
To obtain (iv), compute the partial derivative of the objective function with respect to qℓ:

(46)  (1 − πh)(θℓ − c′(qℓ)) − πh(θh − θℓ) = (1 − πh)( θℓ − (πh/(1 − πh))(θh − θℓ) − c′(qℓ) ).

It follows that if θℓ − (πh/(1 − πh))(θh − θℓ) is nonnegative, then setting c′(qℓ) equal to it yields the optimal choice of qℓ, which is less than qh because θh > θℓ > θℓ − (πh/(1 − πh))(θh − θℓ) and c′(·) is increasing. If instead θℓ − (πh/(1 − πh))(θh − θℓ) is negative, then (46) is negative for all qℓ ≥ 0, implying that it is optimal to choose qℓ = 0. Either way, qh ≥ qℓ as required.
Remarks:
(i) Type θh's equilibrium payoff is θhqh − ph = (θh − θℓ)qℓ. This is sometimes called type θh's information rent, since it is the part of her surplus that the principal is unable to extract.
(ii) Examining (45) shows that the principal chooses qℓ to maximize

(47)  θℓqℓ − (πh/(1 − πh))(θh − θℓ)qℓ − c(qℓ).

The first two terms of (47) are called the virtual utility of type θℓ, and (47) as a whole is called the virtual surplus from type θℓ. The first term in (47) is type θℓ's actual utility for obtaining quality qℓ, and the last term is the principal's cost of providing this quality. Subtracted from this is the information rent of type θh, multiplied by that type's relative probability. This deduction accounts for the fact that increasing qℓ increases the information rent that the principal will have to pay to type θh to keep her choosing the contract intended for her. It is the reason that the low type's quality is distorted downward from the quality she would be offered if types were observable.
The techniques used to study the continuum-of-types model are fundamental tools in
mechanism design.
Θ = [0, 1]  set of types
F, f  cdf and pdf of the type distribution; f > 0
t  transfer paid by the agent to the principal
u(q, θ)  agent's consumption utility, twice differentiable, with
   u(0, θ) = 0,
   ∂u/∂q (q, θ) ≥ 0, with strict inequality when θ > 0,
   ∂u/∂θ (q, θ) ≥ 0, with strict inequality when θ > 0, and bounded on compact sets,
   ∂²u/∂q∂θ (q, θ) > 0 for q, θ > 0 (single crossing)
u(q, θ) − t  agent's total utility
c(·)  principal's cost function; nondecreasing, c(0) = 0
t − c(q)  principal's utility
Payoff equivalence: characterizing incentive compatible menus
Theorem 6.11 characterizes the incentive compatible menus of contracts. It follows from
three lemmas.
Lemma 6.8. Fix q(·). If there is a t(·) such that (q(·), t(·)) satisfies (IC), then q(·) is nondecreasing.
Proof. Fix θ1 > θ0 and add the incentive constraints

(48)  u(q(θ1), θ1) − t(θ1) ≥ u(q(θ0), θ1) − t(θ0)   and
(49)  t(θ1) − u(q(θ1), θ0) ≥ t(θ0) − u(q(θ0), θ0)

to obtain

(50)  u(q(θ1), θ1) − u(q(θ1), θ0) ≥ u(q(θ0), θ1) − u(q(θ0), θ0),

which by the fundamental theorem of calculus can be written as

(51)  ∫_{θ0}^{θ1} ∂u/∂θ (q(θ1), θ̂) dθ̂  ≥  ∫_{θ0}^{θ1} ∂u/∂θ (q(θ0), θ̂) dθ̂.

If q(θ1) < q(θ0), then single crossing would imply that ∂u/∂θ (q(θ1), θ̂) < ∂u/∂θ (q(θ0), θ̂) for all θ̂, contradicting (51).
Let

(52)  U(θ) = u(q(θ), θ) − t(θ)

be type θ's payoff from her intended contract—in other words, type θ's information rent.
Lemma 6.9. If (q(·), t(·)) satisfies (IC), then for some U(0) ∈ R,
(53)  U(θ) = U(0) + ∫₀^θ ∂u/∂θ (q(θ̂), θ̂) dθ̂   for all θ ∈ Θ, or equivalently

(54)  t(θ) = u(q(θ), θ) − ∫₀^θ ∂u/∂θ (q(θ̂), θ̂) dθ̂ − U(0)   for all θ ∈ Θ.
Lemma 6.9 says that incentive compatibility and the menu of qualities determine all types’
payoffs / information rents (or equivalently, all types’ transfers) up to the choice of the
additive constant U(0).
It also shows that payoffs are nondecreasing in type: U′(θ) = ∂u/∂θ (q(θ), θ) ≥ 0.
In fact, up to technicalities (see below), equation (53) says that

(55)  U′(θ) = ∂u/∂θ (q(θ), θ).

To see where (55) comes from, note that by incentive compatibility,

U(θ) = max_{θ̂∈Θ} ( u(q(θ̂), θ) − t(θ̂) ) = u(q(θ̂∗(θ)), θ) − t(θ̂∗(θ)) = u(q(θ), θ) − t(θ),

where the maximizer is θ̂∗(θ) = θ. So by the envelope theorem, U′(θ) = ∂u/∂θ (q(θ), θ).
What was not rigorous? First, to apply the usual envelope theorem at value θ, we need various smoothness and continuity conditions to hold at θ, but this is not always the case. (For instance, the usual envelope theorem requires that ∂/∂θ (u(q(θ̂), θ) − t(θ̂)) be continuous in (θ, θ̂). Lemma 6.8 tells us that q(·) is nondecreasing, which implies that it is differentiable almost everywhere, but it may have discontinuities, and these may be passed on to ∂/∂θ (u(q(θ̂), θ) − t(θ̂)).) Second, the integration requires the fundamental theorem of calculus to hold. The necessary and sufficient condition for this is that U(·) be absolutely continuous (see, e.g., Folland (1999)); this property of U(·) must be established, either directly or indirectly.
The integral envelope theorem of Milgrom and Segal (2002) (see also Milgrom (2004, ch. 3))
gives conditions under which the integral formula (53) for U(·) must hold, establishing
absolute continuity of U(·) along the way.
Since ∂u/∂θ is bounded on compact sets, there is an M such that

0 ≤ ∂u/∂θ (q, θ) ≤ M   for (q, θ) ∈ [q(0), q(1)] × Θ.

Since q(·) is nondecreasing by Lemma 6.8, q(θ) ∈ [q(0), q(1)] for all θ ∈ Θ, and so

(56)  0 ≤ ∂u/∂θ (q(θ), θ) ≤ M   for all θ ∈ Θ.
Let θ1 > θ0. Since U(θ1) is the left-hand side of (48), and since −U(θ0) is the right-hand side of (49), we can augment (50) as follows:

u(q(θ1), θ1) − u(q(θ1), θ0)  ≥  U(θ1) − U(θ0)  ≥  u(q(θ0), θ1) − u(q(θ0), θ0).

Thus the mean value theorem implies that for some θ̄0, θ̄1 ∈ (θ0, θ1),

∂u/∂θ (q(θ1), θ̄1)(θ1 − θ0)  ≥  U(θ1) − U(θ0)  ≥  ∂u/∂θ (q(θ0), θ̄0)(θ1 − θ0).
Bound (56) then implies that U is Lipschitz continuous, and thus absolutely continuous.
Furthermore,

(57)  ∂u/∂θ (q(θ1), θ̄1)  ≥  (U(θ1) − U(θ0))/(θ1 − θ0)  ≥  ∂u/∂θ (q(θ0), θ̄0).
Lemma 6.10. If q(·) is nondecreasing and t(·) is given by (54) for some U(0) ∈ R, then (q(·), t(·))
satisfies (IC).
Proof. Since t(·) is given by (54), definition (52) of U(θ) implies that U(θ) satisfies (53).
If θ1 > θ0, then (53) and the single crossing property imply that

U(θ1) − U(θ0) = ∫_{θ0}^{θ1} ∂u/∂θ (q(θ̂), θ̂) dθ̂  ≥  ∫_{θ0}^{θ1} ∂u/∂θ (q(θ0), θ̂) dθ̂  =  u(q(θ0), θ1) − u(q(θ0), θ0).

Thus

u(q(θ1), θ1) − t(θ1) = U(θ1) ≥ U(θ0) + u(q(θ0), θ1) − u(q(θ0), θ0) = u(q(θ0), θ1) − t(θ0),

which is (ICθ0|θ1). A careful reading reveals that the argument remains valid if θ1 < θ0. Thus (IC) holds.
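The payoff equivalence machinery can be sanity-checked numerically. The sketch below (our own test case: u(q, θ) = θq, an arbitrary nondecreasing q(·), and U(0) = 0) builds transfers from (54) and verifies (IC) on a grid; the maximal violation should vanish as the grid is refined:

import numpy as np

n = 201
theta = np.linspace(0, 1, n)
q = np.minimum(1.0, 2 * theta ** 2)      # an arbitrary nondecreasing quality schedule

# (54) with u(q, theta) = theta*q and U(0) = 0: t = theta*q(theta) - int_0^theta q
h = theta[1] - theta[0]
t = theta * q - (np.cumsum(q) - q) * h   # left Riemann sum of the integral

payoff = np.outer(theta, q) - t          # payoff[i, j]: type theta_i takes contract j
violation = payoff.max(axis=1) - np.diag(payoff)
print("max IC violation:", float(violation.max()))   # ~ O(h), as expected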
In summary, Lemma 6.8 says that in solving the principal’s problem, we need only consider
nondecreasing q(·), Lemma 6.9 specifies the implied information rents U(·) up to the choice
of the constant U(0), and Lemma 6.10 says that we must consider all nondecreasing q(·).
Putting these together, we have:
Theorem 6.11. The menu (q(·), t(·)) satisfies (IC) if and only if
(i) q(·) is nondecreasing, and

(ii)  U(θ) = U(0) + ∫₀^θ ∂u/∂θ (q(θ̂), θ̂) dθ̂   for all θ ∈ Θ.

Equivalently, (q(·), t(·)) satisfies (IC) if and only if (i) holds and transfers are given by

(ii′)  t(θ) = u(q(θ), θ) − ∫₀^θ ∂u/∂θ (q(θ̂), θ̂) dθ̂ − U(0)   for all θ ∈ Θ.
Lemma 6.12. If (q(·), t(·)) satisfies (IC) and (IR0 ), then it satisfies (IRθ ) for all θ > 0.
Proof. Use (IC0|θ) and the fact that ∂u/∂θ (q, θ) ≥ 0:

U(θ) ≥ u(q(0), θ) − t(0) ≥ u(q(0), 0) − t(0) = U(0) ≥ 0.

Using (ii′) to substitute for the transfers, the principal's problem can be written as

(58)  max_{q(·), U(0)}  ∫₀¹ ( u(q(θ), θ) − ∫₀^θ ∂u/∂θ (q(θ̂), θ̂) dθ̂ − U(0) − c(q(θ)) ) f(θ) dθ
      subject to  q(·) nondecreasing,  U(0) ≥ 0.
Since U(0) enters the objective negatively, it is optimal to set U(0) = 0. Reversing the order of integration in the double integral gives

(59)  ∫₀¹ ( ∫₀^θ ∂u/∂θ (q(θ̂), θ̂) dθ̂ ) f(θ) dθ = ∫₀¹ ( ∫_{θ̂}^{1} f(θ) dθ ) ∂u/∂θ (q(θ̂), θ̂) dθ̂ = ∫₀¹ (1 − F(θ̂)) ∂u/∂θ (q(θ̂), θ̂) dθ̂.

Thus since f(·) is positive, we can rewrite the principal's problem again as
(60)  max_{q(·)}  ∫₀¹ ( u(q(θ), θ) − ((1 − F(θ))/f(θ)) ∂u/∂θ (q(θ), θ) − c(q(θ)) ) f(θ) dθ
      subject to  q(·) nondecreasing
The function f(·)/(1 − F(·)) is called the hazard rate of distribution F. Its reciprocal is the inverse hazard rate.
The expression

(61)  u(q, θ) − ((1 − F(θ))/f(θ)) ∂u/∂θ (q, θ)
from the integrand is called the virtual utility of type θ (cf. (47)). In the relaxed problem
without the requirement that q(·) be nondecreasing, the principal chooses q(θ) to maximize
the difference between (61) and the production cost c(q(θ)). (This difference is known as
the virtual surplus from type θ.) The second term in (61) accounts for the effect of the
choice of q(θ) on the information rents that must be paid to types above θ, as discussed
next.
Remark 6.13. To interpret virtual utility (61), it is easiest to imagine that the agent is drawn
from a population in which the type density is f .
We first interpret the reversal in the order of integration in (59). This is an accounting
trick that reassigns information rents from the types who earn them to the types who
necessitate their payment. In the initial expression in (59), the expression in parentheses is
the information rent that an agent of type θ obtains because of the provision of quality to
lower types θ̂ < θ (the vertical line in the figure at left). The outer integral totals this over
all types θ, so that the region of integration is the shaded triangle. After we reverse the
order of integration, the integrand of the outer integral (in parentheses) is the information
rent that provision of quality to agents of type θ̂ generates for agents of higher types θ > θ̂
(the horizontal line in the figure at right). When we substitute the final expression in
(59) into the objective function in (60), we change the name of the variable of integration
from θ̂ to θ. Thus consumption benefits are indexed by the type that receives them, but
information rents are indexed by the type that generates them.
[Figures: the region of integration for (59). Left: for fixed θ, the inner integral runs over θ̂ < θ (a vertical slice of the shaded triangle). Right: after reversing the order, for fixed θ̂ the inner integral runs over θ > θ̂ (a horizontal slice).]
We next factor out f (θ) from (60). There is already an f (θ) in the consumption benefit
term, since this term is indexed by who receives the benefit, and there are f (θ) agents of
type θ. But as explained above, the information rent term is indexed by who generates
the rent. Here the density term, which is tied to the agents receiving the rent, has been
integrated away. To factor out an f(θ) that is not there, we divide by it. Doing so converts 1 − F(θ), the information rent attributed to agents of type θ as a group, into (1 − F(θ))/f(θ), the information rent attributed to each agent of type θ. Factoring out f(θ) in this way gives us the virtual utility (61).
Problem (60) requires optimization over a set of functions, and so is an optimal control
problem. However, if choosing each q(θ) to maximize the integrand leads to a non-
decreasing q(·), this choice of q(·) is optimal. We consider environments in which the
agent’s utility is linear in her type.
Suppose now that u(q, θ) = θv(q), where v(·) is increasing and concave, and define the virtual type ψ(θ) = θ − (1 − F(θ))/f(θ). Then ∂u/∂θ (q, θ) = v(q), and problem (60) becomes

(62)  max_{q(·)}  ∫₀¹ ( v(q(θ))ψ(θ) − c(q(θ)) ) f(θ) dθ   subject to  q(·) nondecreasing.

We assume that ψ(·) is increasing; this holds if the hazard rate f(·)/(1 − F(·)) is nondecreasing. This monotone hazard rate condition is satisfied by many common distributions; see Bagnoli and Bergstrom (2005).
Since ψ(·) is increasing and continuous with ψ(0) < 0 < 1 = ψ(1), it has a unique zero
θ∗ ∈ (0, 1).
Proposition 6.14. Suppose that Q = [0, 1]. Then the optimal menu of qualities satisfies
if ψ(θ) < c′(0)/v′(0), let q(θ) = 0;
if ψ(θ) ∈ [c′(0)/v′(0), c′(1)/v′(1)], let q(θ) be a solution to ψ(θ) = c′(q(θ))/v′(q(θ));
if ψ(θ) > c′(1)/v′(1), let q(θ) = 1.
(Multiple optimal choices of q(θ) are possible if there is an interval of qualities q over which c′(q)/v′(q) is constant and equal to ψ(θ).)
Proof. By the assumptions on c′(·) and v′(·),

(63)  d/dq ( v(q)ψ(θ) − c(q) ) = v′(q)ψ(θ) − c′(q)   is nonincreasing in q.

Thus for a type θ with ψ(θ) < c′(0)/v′(0), the integrand from (62) is maximized at q = 0.
If ψ(θ) ∈ [c′(0)/v′(0), c′(1)/v′(1)], a q(θ) that satisfies ψ(θ) = c′(q(θ))/v′(q(θ)) also satisfies the first-order condition for maximizing the integrand from (62), and this condition is sufficient for maximization because of (63). Also, since c′(·)/v′(·) is nondecreasing, q(θ) is increasing in θ over this range.
Finally, if ψ(θ) > c′(1)/v′(1), then the derivative in (63) is still positive at q = 1, leading to the corner solution q(θ) = 1.
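A worked instance of Proposition 6.14 (all primitives assumed by us: v(q) = q, c(q) = q²/2, and F uniform on [0, 1], so ψ(θ) = 2θ − 1 and c′(q)/v′(q) = q):

import numpy as np

theta = np.linspace(0, 1, 11)
psi = 2 * theta - 1            # virtual type under the uniform distribution
q = np.clip(psi, 0.0, 1.0)     # pointwise maximizer; it is nondecreasing, hence optimal
for th, qq in zip(theta, q):
    print(f"theta = {th:.1f}   q(theta) = {qq:.1f}")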
Example 6.15. Suppose that Q = [0, 1], c(q) ≡ 0, and v(q) = q. Then since c′(q)/v′(q) ≡ 0, the optimal menu of qualities is

q(θ) = 0 if ψ(θ) < 0,   and   q(θ) = 1 if ψ(θ) > 0.

(One can choose q(ψ⁻¹(0)) arbitrarily.) The corresponding menu of transfers is

t(θ) = q(θ)θ − ∫₀^θ q(θ̂) dθ̂ = { 0 if ψ(θ) < 0;  θ − ∫_{ψ⁻¹(0)}^{θ} 1 dθ̂ = ψ⁻¹(0) if ψ(θ) > 0 }. ♦
Proposition 6.16. Suppose that Q = [0, ∞) and that c′(q̄) ≥ v′(q̄) for some q̄ ≥ 0. Then the optimal menu of qualities satisfies

if ψ(θ) < c′(0)/v′(0), let q(θ) = 0;
if ψ(θ) ≥ c′(0)/v′(0), let q(θ) be a solution to ψ(θ) = c′(q(θ))/v′(q(θ)).
Proof. Because c′(·)/v′(·) is continuous and nondecreasing, there are two ways that the condition on c′(·) can hold: either c′(0) > v′(0), or c′(q̄) = v′(q̄) for some q̄ ≥ 0. In the former case it is optimal to choose q(θ) ≡ 0.
If instead c′(q̄) = v′(q̄), then since ψ(1) = 1, setting q(1) = q̄ is optimal, and because c′(·)/v′(·) is nondecreasing and ψ(·) is increasing, the condition also ensures that optimal quantities q(θ) exist for all types θ < 1 and that q(·) is increasing in θ once ψ(θ) ≥ c′(0)/v′(0).
Remarks:
(i) If the agent's type θ > 0 were observable, the principal would maximize θv(q) − aq by solving v′(q) = a/θ. Here, with cost c(q) = aq, he instead either solves v′(q) = a/(θ − (1 − F(θ))/f(θ)) (if θ > θ∗) or chooses q = 0 (otherwise). Thus only the highest type receives her optimal quality, all other types above θ∗ receive lower-than-optimal qualities, and types θ∗ and below are not served.
(ii) If the assumption on the hazard rate fails, the allocation function q(·) obtained by
pointwise maximization of virtual surplus may not be nondecreasing. In this case,
one must use further arguments to find the optimal q(·). This q(·) will have flat
spots, reflecting bunching of intervals of types. This can happen when the hazard
rate f (·)/(1 − F(·)) has a decreasing segment: if there are relatively few agents with
types near θ, it may be optimal to extract less from these types in order to lower the
information rents of higher types. The term ironing is sometimes used to describe
the form of the optimal q(·).
To address these problems, Myerson (1981) (see also Baron and Myerson (1982) and Toikka (2011)) introduces a generalized virtual utility that incorporates the effect that the allocation to type θ has, through the monotonicity requirement, on the allocations to other types. Using convex analysis arguments, these papers show that pointwise maximization of generalized virtual surplus generates the optimal monotone allocation function.
7. Mechanism Design
Mechanism design considers the use of games to elicit information from groups of pri-
vately informed agents, whether to extract surplus from the agents, to ensure efficient
social choices, or to achieve other ends. Instead of optimal choice by a single agent, the
designer anticipates (some kind of) equilibrium behavior among the multiple agents. (The
designer himself is not a part of this equilibrium; instead he is assumed to have the power
to commit to the mechanism of his choice.)
Some basic questions:
(i) characterization of implementable social choice functions (where “social choice
function” and “implementable” will be defined shortly)
(ii) revenue maximization (often called “optimality”)
(iii) ensuring allocative efficiency
Revenue maximization models have strong commonalities with the principal-agent problem with adverse selection, and it is accurate to describe them as "principal–many agent" problems. Models focusing on allocative efficiency have a different flavor: for instance, it is more natural to think of the mechanism designer as a planner, or of the agents as acting as a group, rather than as a self-interested principal. But we shall see that there is significant overlap in how problems with different objectives are analyzed.
The father of mechanism design is Hurwicz (1960, 1972). Much of our analysis focuses on
Bayesian implementation, under which agents play a Bayesian equilibrium of the Bayesian
game introduced by the planner. Myerson (1979, 1981) are key early references.
A social choice function g : Θ → X assigns to each type profile θ a social alternative g(θ) ∈ X .
Social choice functions can be used to describe the designer’s aims at the ex ante stage.
Example 7.1. Assignment of distinct objects without monetary transfers. There is a set Z
consisting of n distinct prizes. Agent i’s type θi is an element of Rn , where θiz is agent i’s
benefit from receiving prize z. The distribution µ is the common prior over type profiles.
A social alternative x ∈ X is an assignment of one prize to each agent. Agent i's utility is vi(x, θ) = θiz when he receives prize z under x. Thus this environment has private values.
A social choice function g specifies an assignment g(θ) ∈ X for each type profile θ ∈ Θ. For example, g(θ) could be an assignment that maximizes the sum of the agents' utilities given their types. ♦
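The welfare-maximizing social choice function in Example 7.1 can be computed by brute force when n is small. A hedged sketch (the benefit matrix below is our own toy data):

from itertools import permutations

theta = [
    [3.0, 1.0, 0.0],   # agent 1's benefits from prizes z1, z2, z3
    [2.0, 2.0, 1.0],   # agent 2
    [0.0, 1.0, 4.0],   # agent 3
]
n = len(theta)
best = max(permutations(range(n)), key=lambda a: sum(theta[i][a[i]] for i in range(n)))
print("welfare-maximizing assignment (agent -> prize):", {i + 1: best[i] + 1 for i in range(n)})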
Quasilinear environments
Example 7.2. Allocation of an indivisible good. The set of allocations is X = A (who gets
the good), or X = A ∪ {0} (allowing the good not to be allocated). We assume private
values, with types representing valuations for the good. Thus agent i's consumption utility is described by ui(i, θi) = θi and ui(j, θi) = 0 for j ≠ i, and his total utility from allocation x = (j, t) is vi(x, θ) = ui(j, θi) − ti. Allocation function x is ex post efficient if x(θ) ∈ argmax_{j} θj for all θ ∈ Θ, so that the good is assigned to someone who values it most. ♦
We will see that analyses are often made easier by allowing randomized allocations:
𝒳 = ∆X × Rn, X finite
x = (q, t)
qx ∈ [0, 1]  the probability of allocation x ∈ X
vi(x, θ) = Σ_{x∈X} qx ui(x, θ) − ti
For the usual reasons (cf. Proposition 1.14), randomized allocation q is efficient for θ if
and only if every allocation x in its support is efficient for θ.
Now social choice functions are of the form g(·) = (q(·), t(·)), where q : Θ → ∆X. Allocation function q(·) is ex post efficient if for each type profile θ ∈ Θ, q(θ) is efficient for θ.
Example 7.4. Linear utility. Private good allocation (Example 7.3) and public good pro-
vision (Example 7.22) are instances of quasilinear environments with private values and
linear utility, which here means linearity in own type:
Often we will consider environments with independent private values. This means that
different agents’ types are drawn independently (µ is a product distribution), and that
each agent’s payoff does not depend on other agents’ types (vi (x , θ) = vi (x , θi )).
The independent private values environment provides a natural baseline, and it allows
for many powerful results. But in many applications, types are correlated, or values
are interdependent, or both. Results about optimality and efficiency in these settings
sometimes differ markedly from those in the independent private values case. We focus
on independent private values in Sections 7.2–7.4. We then consider interdependent
values and correlated types in Sections 7.5 and 7.6.
7.1.2 Mechanisms
A mechanism M = {{Ai}i∈A, γ} specifies an action set Ai for each agent i and a decision
function γ : A → 𝒳, where A = Πi∈A Ai. Together, E and M define a Bayesian game with
Bernoulli utility functions ui(a, θ) = vi(γ(a), θ).
We typically take E for granted and identify the Bayesian game with the mechanism M.
Example 7.5. First price auctions. In Example 7.3, suppose that agents’ types are indepen-
dent draws from a distribution with density f on [0, 1]. This is the independent private
values environment for auctions from Section 4.5.1.
Since preferences are quasilinear, a mechanism in this environment has a decision function
of the form γ(·) = (ρ(·), τ(·)), where ρ : A → ∆X and τ : A → Rn .
To express a first price auction using the definitions above, we let Ai = [0, 1] be the set of
agent i's bids. Ignoring the case of a tied high bid, we can define the decision function
γ(·) = (ρ(·), τ(·)) as follows: if {i} = argmaxj∈A aj, then (ρi(a), τi(a)) = (1, ai), and
(ρj(a), τj(a)) = (0, 0) for j ≠ i.
This is just as in Example 7.3, except that the allocation probabilities and transfers are now
determined as functions of the bid profile. Combining the last two displays shows how the
mechanism determines the social alternative directly from the bids. ♦

Suppose the agents play the Bayesian game defined by E and M according to a strategy
profile s : Θ → A. Then, the decision function γ : A → 𝒳 uses the action profile s(θ) to
determine the social alternative γ(s(θ)), so that the induced map from type profiles to
social alternatives is the composition γ ◦ s:

         γ◦s
    Θ -------> 𝒳
     \        ^
    s \      / γ
       v    /
        A
7.1.3 Implementation
Mechanism M = {{Ai}i∈A, γ} Bayesian implements social choice function g if the Bayesian
game it defines has a Bayesian equilibrium s∗ such that γ(s∗(θ)) = g(θ) for all θ ∈ Θ; M
implements g in dominant strategies if in addition each s∗i is a very weakly dominant strategy.
If M implements g in either sense, then the specified equilibrium strategy
profile s∗ makes the following diagram commute (meaning that either path from θ ∈ Θ to
𝒳 leads to the same x):

          g
    Θ -------> 𝒳
     \        ^
   s∗ \      / γ
       v    /
        A
Example 7.6. First price auctions revisited. As in Example 7.5, consider a first price auction
in an environment with symmetric independent private values drawn from Θi = [0, 1].
Proposition 4.10 showed that in the unique symmetric equilibrium, all agents use the
increasing bidding function b∗(v) = E(V(n−1)ⁿ⁻¹ | V(n−1)ⁿ⁻¹ ≤ v), where V(n−1)ⁿ⁻¹ denotes
the highest of n − 1 independent draws from the value distribution. Because b∗(·) is
increasing, the good is allocated to a bidder who values it most. Thus (again ignoring ties),
the first-price auction Bayesian implements a social choice function g(·) = (q(·), t(·)) satisfying

    if {i} = argmaxj∈A θj, then (qi(θ), ti(θ)) = (1, b∗(θi)), and (qj(θ), tj(θ)) = (0, 0) for j ≠ i.
Thus the allocation function q(·) implemented by a first price auction is ex post efficient. ♦
Remarks:
(i) The definition of Bayesian implementation requires that there be some equilibrium
of the mechanism that sustains the social choice function; there may be other
equilibria that do not. We can think of the designer as both proposing the mechanism
and suggesting the equilibrium to be played.
(ii) A stronger requirement is that every equilibrium of M sustains the social choice
function. This requirement is called full implementation, and this general topic is
known (somewhat confusingly) as implementation theory. Two key references here
are Postlewaite and Schmeidler (1986) and Jackson (1991). See Palfrey (1992) for a
survey of the early literature. We discuss some recent results in Section 7.6.4.
(iii) Starting in Section 7.2 we will focus on quasilinear environments, which suffice
for many economic applications. Another branch of the literature instead asks
about implementation when general preferences over the set of alternatives X
are allowed. The basic result here is the Gibbard-Satterthwaite theorem (Gibbard
(1973), Satterthwaite (1975)) which states that if each preference over alternatives is
possible for every agent, and if the social choice function g : Θ → X is onto (i.e., if
each alternative is assigned under some type profile), then g is dominant strategy
implementable if and only if it is dictatorial, meaning that there is a single agent
i whose favorite alternative is always chosen. This result is a descendant of the
Arrow impossibility theorem (Arrow (1951)), the foundational result of social choice
theory.
(iv) Here we consider implementation in Bayesian environments, where each agent is
uncertain about others’ types when choosing his action. A complementary branch
of the literature considers implementation in environments with complete informa-
tion, meaning that the agents (though not the principal) know one another’s types
when choosing their actions. The basic notion of implementation supposes that
agents play a Nash equilibrium of the realized complete information game. With
three or more agents, any social choice function can be implemented in the weak
sense (as described in (i)), so the interesting question is whether full implementa-
tion (as in (ii)) is possible. This is commonly referred to as Nash implementation. The
fundamental work here is due to Maskin (1999) (first version 1977), who introduced
a condition on social choice functions (now called (Maskin) monotonicity) which is
necessary and almost sufficient for Nash implementation. For a survey of the early
literature, see Moore (1992).
We now show that one can focus on mechanisms of a particularly simple kind.
A direct mechanism (or revelation mechanism) is a mechanism of the form M d = {{Θi }i∈A , g}.
Thus the actions are type announcements, often denoted θ̂i . The decision function g : Θ →
X is therefore a social choice function. However, when the mechanism is run this function
is applied to the profile of type announcements.
We sometimes refer to a social choice function g as a mechanism, and when we do so we
mean the direct mechanism M d = {{Θi }i∈A , g}. This will prove quite convenient, since by
virtue of Proposition 7.7 below, our analyses in later sections will largely focus on direct
mechanisms.
Writing si : Θi → Θi for a Bayesian strategy in M d, we can describe M d using the following
diagram:

          g
    Θ -------> 𝒳
     \        ^
    s \      / g
       v    /
        Θ
Proposition 7.7 (The revelation principle for Bayesian implementation). Suppose that
mechanism M = {{Ai}i∈A, γ} Bayesian implements social choice function g by way of equilibrium
s∗. Then truth-telling is a Bayesian equilibrium of the direct mechanism M d = {{Θi}i∈A, g}; that
is, g is Bayesian incentive compatible.

Proof. Since s∗ is a Bayesian equilibrium of M, each agent i of each type θi prefers his
equilibrium action s∗i(θi) to every other action âi:

    Σθ−i µ(θ−i|θi) vi(γ(s∗i(θi), s∗−i(θ−i)), θ) ≥ Σθ−i µ(θ−i|θi) vi(γ(âi, s∗−i(θ−i)), θ) for all âi ∈ Ai,

where the left-hand argument γ(s∗i(θi), s∗−i(θ−i)) equals g(θ), while the right-hand argument
γ(âi, s∗−i(θ−i)) is some x ∈ 𝒳.
In particular this is true when âi = s∗i (θ̂i ) for some θ̂i ∈ Θi , so that γ(s∗i (θ̂i ), s∗−i (θ−i )) =
g(θ̂i , θ−i ). Substituting into the previous expression thus yields
(64)    Σθ−i µ(θ−i|θi) vi(g(θi, θ−i), θ) ≥ Σθ−i µ(θ−i|θi) vi(g(θ̂i, θ−i), θ) for all θ̂i ∈ Θi.
This says that it is optimal for i to be truthful in M d when others are truthful, or in other
words, that g is Bayesian incentive compatible.
It is easy to explain in words why the revelation principle is true. Imagine that mechanism
M is used, but that each agent i must have her action enacted by a proxy. The proxy knows
s∗i from the start. Just before actions must be chosen, the agent reports a type θ̂i to the
proxy, causing it to play action s∗i (θ̂i ). (The proxy is like a machine—it has no incentives,
and its only role is to mechanically carry out strategy s∗i .)
Now suppose the other agents report their types truthfully to their proxies. Since s∗ is
an equilibrium of M , it must be optimal for i to report her type truthfully; otherwise,
s∗i was not optimal for i in the first place. This game of reporting types to proxies is the
direct mechanism M d , and we have just shown that truth-telling is an equilibrium of this
mechanism.
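The proxy construction can be written out in a few lines. The sketch below (illustrative names; a finite setting is assumed) builds the direct mechanism's decision function by composing the indirect mechanism's decision function gamma with the equilibrium strategies s_star, exactly as the proxies do:

    def direct_mechanism(gamma, s_star):
        # gamma: maps an action profile (tuple) to a social alternative.
        # s_star: per-agent equilibrium strategies, each a dict from types
        # to actions; these play the role of the proxies.
        def g(announcements):
            actions = tuple(s_star[i][t] for i, t in enumerate(announcements))
            return gamma(actions)
        return g

    # Two bidders with types 'lo' or 'hi'; the higher action wins.
    s_star = [{"lo": 0.1, "hi": 0.6}, {"lo": 0.1, "hi": 0.6}]
    gamma = lambda a: max(range(2), key=lambda i: a[i])
    g = direct_mechanism(gamma, s_star)
    print(g(("hi", "lo")))  # -> 0, i.e., g(theta_hat) = gamma(s*(theta_hat))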
Remarks:
(i) In a direct mechanism M d = {{Θi }i∈A , g}, elements of Θ have two interpretations: as
profiles of types, and as profiles of type announcements. For example, when asking
whether allocation x(θ) is efficient, we think of θ as a profile of types. But when
asking whether agent i has an incentive to report truthfully, we think of θi ∈ Θi
as i’s actual type and θ̂i ∈ Θi as i’s type announcement. Analyses often hinge on
the interplay between these two roles—for instance, see Section 7.4.1. (Beware!
Since truth-telling means that θ̂i = θi , both notations can refer either to types or
announcements.)
(ii) The revelation principle shows that for the purpose of determining which social
choice functions are implementable, direct mechanisms are enough. This is a
tremendous simplification. First of all, it implies that there is no need to construct
clever mechanisms. Second, it replaces the computation of Bayesian equilibria across
whole families of mechanisms with the checking of collections of constraints of the form
(64). The latter problem often has a simple structure, as we will soon see.
(iii) There are a variety of reasons for considering more general mechanisms:
(a) The revelation principle concerns the existence of an equilibrium that sustains
g(·). There may be other equilibria that do not. One role of general mechanisms
is to do away with such equilibria, so such mechanisms are common in work
on full implementation (see Section 7.1.3).
(b) In general an agent’s type may contain a great deal of information, both payoff-
relevant information (e.g., in combinatorial auction environments) and beliefs,
beliefs about beliefs, etc. It may be quite burdensome for the agents to reveal
this information to the planner and for the planner to process this information.
It therefore may be preferable to use less burdensome mechanisms (e.g., simple
auction formats) that achieve similar aims. Related reasons for preferring
simple mechanisms are put forward in the Wilson doctrine—see Section 7.6.
There is also a revelation principle for dominant strategy implementation, and it follows
the same lines as above. Given a direct mechanism M d = {{Θi }i∈A , g} with social choice
function g, we say that M d or g is dominant strategy incentive compatible if truth-telling is a
very weakly dominant strategy for all agents.
Example 7.9. Second price auctions revisited. Consider a second price auction in an envi-
ronment with symmetric independent private values drawn from Θi = [0, 1]. Under this
mechanism, each agent places a bid, and (ignoring ties) the good is awarded to the agent
who values it most, who then pays the second highest bid; other agents pay nothing. In
our present language, a second price auction is a direct mechanism with social choice
function g(·) = (x∗(·), t(·)), where (again ignoring ties)

    x∗(θ̂) = i when {i} = argmaxj∈A θ̂j,   and   ti(θ̂) = maxj≠i θ̂j if x∗(θ̂) = i, and 0 otherwise.
Proposition 4.8 showed that truthful reporting is a weakly dominant strategy. Thus the
second price auction is a dominant strategy incentive compatible direct mechanism that
implements the ex post efficient allocation function. ♦
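A minimal executable version of this direct mechanism, with a grid search that illustrates (though of course does not prove) the weak dominance of truth-telling; the names and the coarse grid are illustrative:

    def second_price(theta_hat):
        # Returns (winner, transfers) given announced types; ties go to the
        # lowest index, which does not affect the dominance property.
        winner = max(range(len(theta_hat)), key=lambda j: theta_hat[j])
        t = [0.0] * len(theta_hat)
        t[winner] = max(v for j, v in enumerate(theta_hat) if j != winner)
        return winner, t

    def utility(i, theta_i, theta_hat):
        winner, t = second_price(theta_hat)
        return (theta_i if winner == i else 0.0) - t[i]

    grid = [k / 10 for k in range(11)]
    print(all(utility(0, v, (v, w)) >= utility(0, v, (b, w))
              for v in grid for w in grid for b in grid))  # -> True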
Except in Section 7.5 we will focus on private values environments (vi (x , θ) = vi (x , θi )).
In these environments, dominant strategy incentive compatibility can be expressed in a
simple ex post form. In Section 4.2, we noted that in a Bayesian game with private-value
payoff functions ui : A × Θi → R, a Bayesian strategy si : Θi → Ai is very weakly dominant
if the following ex post optimality condition holds (cf. (25)):
ui ((si (θi ), a−i ), θi ) ≥ ui ((âi , a−i ), θi ) for all a−i ∈ A−i , âi ∈ Ai , and θi ∈ Θi
Under a direct mechanism M d = {{Θi }i∈A , g} for private values environment E , action
sets are Ai = Θi , payoff functions are ui ((θ̂i , θ−i ), θi ) = vi (g(θ̂i , θ−i ), θi ), and the truth-telling
strategy is si (θi ) = θi . We therefore have
Observation 7.10. Let g be a social choice function for the private values environment E . Then
g is dominant strategy incentive compatible if and only if for all agents i ∈ A ,
(65) vi (g(θi , θ̂−i ), θi ) ≥ vi (g(θ̌i , θ̂−i ), θi ) for all θi , θ̌i ∈ Θi and θ̂−i ∈ Θ−i .
Comparing the previous two equations reveals that in the dominant strategy incentive
compatibility constraint (65), θ̂−i represents the announcements of i’s opponents, not their
actual types. This is because the announcements determine which social alternative
is chosen, even though the actual types determine, e.g., whether a social alternative is
efficient.
(Here we use the notations θ̌i and θ̂−i to distinguish between a (possibly) false announce-
ment of type θi and arbitrary announcements by the agents besides i. The θ̂−i notation
is not needed when we consider Bayesian incentive compatibility: there the equilibrium
assumption of truth-telling means we only need consider truthful announcements by i’s
opponents, which we can denote by θ−i .)
Prior beliefs are irrelevant in condition (65). While in settings with interdependent values
dominant strategy incentive compatibility is generally too demanding to be useful, the
notion of ex post incentive compatibility also makes no use of prior beliefs, but it does assume
equilibrium knowledge of opponents’ (truth-telling) strategies. See Section 7.5.
The revelation principle for dominant strategy implementation appeared first, and was
introduced by a number of authors, including Gibbard (1973) and Green and Laffont
(1977). The revelation principle for Bayesian implementation was also developed by
many authors, including Dasgupta et al. (1979) and Myerson (1979, 1981, 1982).
In this section and the next, we consider mechanisms for allocation problems E with
quasilinear utility and independent private values. The analysis follows Myerson (1981).
A = {1, . . . , n}   set of agents
Θi = [0, 1]   set of agent i's types
Fi, fi   cdf and pdf of agent i's type distribution; fi positive
different agents' types are independent
𝒳 = ∆A × ℝⁿ   set of social alternatives
∆A = {q ∈ ℝⁿ₊ : Σi qi ≤ 1}   assignment probabilities, allowing for non-assignment

We write f(θ) = Πj∈A fj(θj) for the joint distribution of types, and f−i(θ−i) = Πj≠i fj(θj)
for the joint distribution of the types of i's opponents.
A social choice function is a map g(·) = (q(·), t(·)) with q : Θ → ∆A and t : Θ → Rn . (We
again use q and t for scalars and q(·) and t(·) for functions.)
Suppose that agent i reports θ̂i and others report truthfully. Then

(66)    q̄i(θ̂i) = ∫_{Θ−i} qi(θ̂i, θ−i) f−i(θ−i) dθ−i

is her interim probability of receiving the good, and

(67)    t̄i(θ̂i) = ∫_{Θ−i} ti(θ̂i, θ−i) f−i(θ−i) dθ−i

is her expected transfer. If in addition agent i's actual type is θi, then her expected utility is

    θi q̄i(θ̂i) − t̄i(θ̂i).
Notice the similarities between this and the single-agent model from Section 6.2.2. If we
suppress the dependence of q̄i and t̄i on the announcement, then the map (θi, q̄i, t̄i) ↦
θi q̄i − t̄i has the single-crossing property in θi and q̄i, in the sense that
∂²(θi q̄i − t̄i)/∂θi ∂q̄i = 1 > 0.
By definition, social choice function g(·) = (q(·), t(·)) is Bayesian incentive compatible if
under the direct mechanism for g(·), every type of every agent finds it optimal to report
truthfully:
(IC) θi q̄i (θi ) − t̄i (θi ) ≥ θi q̄i (θ̂i ) − t̄i (θ̂i ) for all θi , θ̂i ∈ Θi and i ∈ A .
Theorem 7.11 (Payoff equivalence). Social choice function g(·) = (q(·), t(·)) is Bayesian
incentive compatible if and only if for each agent i ∈ A,
(i) q̄i(·) is nondecreasing, and
(ii) Ūi(θi) = Ūi(0) + ∫₀^θi q̄i(θ̂i) dθ̂i for all θi ∈ Θi,
where Ūi(θi) = θi q̄i(θi) − t̄i(θi) is type θi's expected payoff under truth-telling. Equivalently,
(ii) can be written as a formula for expected transfers:
(ii′) t̄i(θi) = θi q̄i(θi) − ∫₀^θi q̄i(θ̂i) dθ̂i − Ūi(0).

The theorem tells us that Bayesian incentive compatibility and equilibrium expected
payoffs only depend on the interim allocation probabilities q̄i(·) and the lowest type's
expected payoff Ūi(0). In particular, the more detailed information about ex post allocation
probabilities contained in q(·) has no bearing on these questions.
Theorem 7.11 is a direct analogue of Theorem 6.11 from the single-agent setting. There
we had the more general utility function u(q, θ) − t with various assumptions on u, but
here we have multiple agents. However, in considering a single agent’s decision under a
direct mechanism, our focus on truth-telling lets us integrate out the effects of opponents’
types, essentially reducing the analysis here to the analysis of the single-agent setting.
One can prove the necessity of (ii) using either of the arguments used to prove Lemma 6.9.
Here we prove the necessity of (i) and (ii) using a convexity argument that takes advantage
of the linearity of utility in types. Thus the result extends easily to other environments
with linear utility (Example 7.4). We take advantage of this fact in Section 7.4.4.
Since Ūi(·) is the pointwise maximum of the collection of affine functions {vθ̂i(·)}θ̂i∈Θi, where
vθ̂i(θi) = θi q̄i(θ̂i) − t̄i(θ̂i), Ūi(·) is itself a convex function. (This can be confirmed by drawing
a picture, or see Theorem 5.5 of Rockafellar (1970).)
Next, since

    q̄i(θ̂i)θi − t̄i(θ̂i) = q̄i(θ̂i)θ̂i − t̄i(θ̂i) + q̄i(θ̂i)(θi − θ̂i) = Ūi(θ̂i) + q̄i(θ̂i)(θi − θ̂i),

the incentive constraint (IC) can be rewritten as

(68)    Ūi(θi) ≥ Ūi(θ̂i) + q̄i(θ̂i)(θi − θ̂i), with equality when θi = θ̂i.
Consider each side of inequality (68) as a function of θi . The left hand side, θi 7→ Ūi (θi ),
describes type θi ’s expected payoff under (q(·), t(·)). The right hand side, θi 7→ Ūi (θ̂i ) +
q̄i (θ̂i )(θi − θ̂i ), represents the expected payoff to type θi from reporting θ̂i , viewed as a
function of θi for some fixed θ̂i ; in other words, it describes each type’s benefit from
announcing θ̂i . This right hand side function is affine, passes through (θ̂i , Ūi (θ̂i )), and has
slope q̄i (θ̂i ) ∈ [0, 1]. Thus (68) says that this function supports Ūi (·) at θ̂i .
[Figure: the convex function Ūi(·) together with the affine support function vθ̂i(·), which
touches Ūi(·) at θ̂i and has slope q̄i(θ̂i).]
If Ūi (·) is differentiable, then Ūi0 (θ̂i ) equals q̄i (θ̂i ), the slope of the support function from
(68) at θ̂i . Thus the convexity of Ūi (·) implies that q̄i (·) is nondecreasing, and the integral
formula (ii) follows from the fundamental theorem of calculus. In general, these results
follow from the convexity of Ūi (·) and results from convex analysis (see Rockafellar (1970,
Theorem 24.2)).
(Sufficiency) Suppose that θ1 > θ0. Then since q̄i(·) is nondecreasing,

    Ūi(θ1) = Ūi(θ0) + ∫_{θ0}^{θ1} q̄i(θ̂i) dθ̂i ≥ Ūi(θ0) + ∫_{θ0}^{θ1} q̄i(θ0) dθ̂i = Ūi(θ0) + q̄i(θ0)(θ1 − θ0),
which is (ICθ0 |θ1 ) as expressed in (68). A careful reading of the argument shows that the
same inequality holds if θ1 < θ0 .
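As a numerical sanity check on formula (ii), consider a second price auction with n bidders and i.i.d. uniform types (an illustration, assuming the truthful equilibrium): there q̄i(θ) = θⁿ⁻¹ and t̄i(θ) = ∫₀^θ v dFⁿ⁻¹(v), and both Ūi(θ) and Ūi(0) + ∫₀^θ q̄i should equal θⁿ/n.

    import numpy as np

    n, theta = 4, 0.7
    v = np.linspace(0.0, theta, 100_001)

    q_bar = theta ** (n - 1)                         # P(win) = theta^(n-1)
    t_bar = np.trapz(v * (n - 1) * v ** (n - 2), v)  # expected payment
    U_direct = theta * q_bar - t_bar                 # left side of (ii), U(0) = 0
    U_integral = np.trapz(v ** (n - 1), v)           # right side of (ii)
    print(U_direct, U_integral, theta ** n / n)      # all approx. 0.060025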
Revenue equivalence
Returning to allocation problems, Theorem 7.11 and the revelation principle easily yield
the following result:
Theorem 7.12 (Revenue equivalence). Fix an allocation problem E , and suppose that mech-
anisms M and M † Bayesian implement social choice functions g(·) = (q(·), t(·)) and g† (·) =
(q† (·), t† (·)), where the latter have
(I) the same interim allocation probabilities (q̄i (·) = q̄†i (·) for all i ∈ A ), and
(II) the same expected utilities of the lowest types of each agent (Ūi (0) = Ūi† (0) for all i ∈ A ).
Then these mechanisms generate the same expected payoffs for each type of each agent and the same
expected revenue.
Proof. Fix an allocation problem E and a mechanism M that Bayesian implements social
choice function g(·) = (q(·), t(·)). By the revelation principle, g(·) is Bayesian incentive
compatible. Thus by payoff equivalence, agent i’s expected payoff function Ūi (·) and
expected transfer function t̄i (·) under (the relevant equilibrium of) M are given by formulas
(ii) and (ii’), whose right-hand sides only depend on g(·) by way of q̄i (·) and Ūi (0). The
expected transfer functions in turn determine the designer's expected revenue, which
is Σi∈A ∫₀¹ t̄i(θi) fi(θi) dθi. Thus interim expected payoffs and expected revenue under M
only depend on g(·) by way of q̄i (·) and Ūi (0).
Example 7.13. Revenue equivalence in auctions. Theorem 7.12 implies that any auction format
which Bayesian implements a social choice function that
(I′) awards the good to the bidder with the highest valuation, and
(II) gives each bidder’s lowest type an expected payoff of zero
generates the same expected revenue.
(Notice that (I′) is a more demanding sort of requirement than (I) from Theorem 7.12:
(I′) is a condition on ex post allocation probabilities, whereas (I) is a condition on interim
allocation probabilities.)
When the environment is symmetric (i.e., all agents' values are drawn from the same
distribution), there are many auction formats with equilibria satisfying (I′) and (II).
In asymmetric environments, conditions (I′) and (II) do not hold for all standard auction
formats. For instance, in a second price auction, bidding truthfully is still weakly
dominant, and so leads to an efficient allocation, but in a first price auction the equilib-
rium allocation need not be efficient. In fact, either format may generate higher expected
revenue than the other depending on the type distributions; see Section 4.3.2 of Krishna
(2002). ♦
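A quick Monte Carlo illustration of Example 7.13 (an illustrative sketch; i.i.d. uniform values, for which the symmetric first price equilibrium bid is b∗(v) = ((n−1)/n)v):

    import numpy as np

    rng = np.random.default_rng(0)
    n, draws = 3, 200_000
    vals = rng.random((draws, n))                  # i.i.d. U[0,1] valuations

    spa = np.sort(vals, axis=1)[:, -2].mean()      # winner pays 2nd-highest value
    fpa = ((n - 1) / n * vals.max(axis=1)).mean()  # winner pays own bid b*(v)
    print(spa, fpa)                                # both approx. (n-1)/(n+1) = 0.5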
There is also a payoff equivalence theorem for dominant strategy incentive compatibility
(Laffont and Maskin (1979)). Dominant strategy incentive compatibility requires the
reports of each type θi of each agent i to be optimal regardless of his opponents’ Bayesian
strategies (i.e., maps from types to announcements). By Observation 7.10, this requirement
amounts to the ex post incentive compatibility constraints
(69) θi qi (θi , θ−i )−ti (θi , θ−i ) ≥ θi qi (θ̂i , θ−i )−ti (θ̂i , θ−i ) for all θi , θ̂i ∈ Θi , θ−i ∈ Θ−i , and i ∈ A .
For each fixed θ−i , the constraints (69) are an instance of the incentive compatibility
constraints from the principal-agent problem from Section 6.2.2. Thus Theorem 7.14 below,
which characterizes dominant strategy incentive compatibility, follows immediately from
Theorem 6.11, though it too can be established using the convexity argument used to
prove Theorem 7.11.
To state this result, let Ui(θi, θ−i) = θi qi(θi, θ−i) − ti(θi, θ−i) be agent i's ex post utility under
social choice function g(·) = (q(·), t(·)).

Theorem 7.14. Social choice function g(·) = (q(·), t(·)) is dominant strategy incentive compatible
if and only if for each agent i ∈ A and each θ−i ∈ Θ−i, qi(·, θ−i) is nondecreasing and transfers
satisfy

(70)    ti(θi, θ−i) = θi qi(θi, θ−i) − ∫₀^θi qi(θ̂i, θ−i) dθ̂i − Ui(0, θ−i).
7.3 Revenue Maximization and Optimal Auctions
The principal (seller) seeks a Bayesian incentive compatible mechanism that maximizes
expected revenue subject to interim individual rationality: each type of each agent must be
willing to participate, i.e.,

(71)    Ūi(θi) ≥ 0 for all θi ∈ Θi and i ∈ A.

By payoff equivalence, (71) reduces to the requirement that Ūi(0) ≥ 0 for all i ∈ A. (Were
individual rationality not imposed, one could obtain arbitrarily high revenues by taking
each Ūi(0) to −∞.)
Now, recalling that Θi = [0, 1], we can write the principal's problem as follows:

(72)    max_{q(·),Ū(0)}  Σi∈A ∫₀¹ ( θi q̄i(θi) − ∫₀^θi q̄i(θ̂i) dθ̂i − Ūi(0) ) fi(θi) dθi
        subject to  q(θ) ∈ ∆A,
                    q̄i(·) nondecreasing for all i ∈ A,
                    Ūi(0) ≥ 0 for all i ∈ A.
It is instructive to compare problem (72) to the principal’s problem (58) in the setting of
adverse selection with a continuum of types (Section 6.2.2). If in the latter we assume that
utilities are of the linear form u(q, θ) = θq, we obtain
(58)    max_{q(·),U(0)}  ∫₀¹ ( θ q(θ) − ∫₀^θ q(θ̂) dθ̂ − U(0) − c(q(θ)) ) f(θ) dθ
Thus in this linear case, (58) closely resembles (72). In the former problem, the quality
q(θ) is a scalar, and the objective function includes a production cost term that is absent
from (72). For its part, (72) is a multiagent problem, with the allocation probabilities
q(θ) ∈ ℝⁿ required to satisfy the constraint q(θ) ∈ ∆A, and with the objective function and
monotonicity constraints stated in terms of the interim objects q̄i(·) and Ūi(0). The
analysis to follow will address these novelties.
We assume that the virtual valuation of type θi, defined by

(73)    ψi(θi) = θi − (1 − Fi(θi))/fi(θi),

is increasing in θi.

Theorem 7.15. Suppose that virtual valuations are increasing. Then problem (72) is solved by
setting Ūi(0) = 0 for all i ∈ A and choosing any allocation function that, at each type profile θ,
assigns the good (with probability one) to an agent whose virtual valuation ψi(θi) is highest when
this value is positive, and that does not assign the good when all virtual valuations are negative.
Any such allocation function is ex post monotone: qi(·, θ−i) is nondecreasing for each i ∈ A and
θ−i ∈ Θ−i.
Proof. To start, reversing the order of integration in the double integral in (72) yields

(74)    ∫₀¹ ( ∫₀^θi q̄i(θ̂i) dθ̂i ) fi(θi) dθi = ∫₀¹ ( ∫_{θ̂i}^1 fi(θi) dθi ) q̄i(θ̂i) dθ̂i = ∫₀¹ (1 − Fi(θ̂i)) q̄i(θ̂i) dθ̂i.
Setting each Ūi(0) equal to zero (which is clearly optimal) and substituting (74) into (72),
the objective function becomes

(75)    Σi∈A ∫₀¹ ( θi fi(θi) − (1 − Fi(θi)) ) q̄i(θi) dθi
            = Σi∈A ∫₀¹ ( θi − (1 − Fi(θi))/fi(θi) ) q̄i(θi) fi(θi) dθi
            = Σi∈A ∫_{Θi} ψi(θi) q̄i(θi) fi(θi) dθi.
Then substituting in definition (66) of q̄i(θi) and rearranging the result lets us express the
objective function as

(76)    Σi∈A ∫_{Θi} ( ∫_{Θ−i} ψi(θi) qi(θi, θ−i) f−i(θ−i) dθ−i ) fi(θi) dθi = ∫_Θ ( Σi∈A ψi(θi) qi(θ) ) f(θ) dθ.
If we ignore the requirement that q̄i(·) be nondecreasing, then under the constraint q(θ) ∈
∆A, maximization requires that

(77)    qi(θ) > 0 only if ψi(θi) = max{ψ1(θ1), . . . , ψn(θn), 0},

with the entire unit of probability assigned whenever this maximum is positive. To see
that the monotonicity requirement is moot, suppose that θ̂i < θi. Then ψi(θ̂i) <
ψi(θi), so (77) implies that qi(θ̂i, θ−i) ≤ qi(θi, θ−i) for all θ−i ∈ Θ−i. Thus integrating yields
q̄i(θ̂i) ≤ q̄i(θi) as required.
This proof provides an interpretation of the virtual valuation ψi(θi) = θi − (1 − Fi(θi))/fi(θi).
The first term is an agent's actual valuation. The second term is the information rent
attributed to an agent i of type θi, but this rent is earned by higher types of agent i.
Remark 7.16 explains why.
Roughly speaking, the virtual valuation ψi (θi ) represents the surplus that the principal can
extract from agent i of type θi if information rents to higher types of agent i are properly
accounted for. This explains why the principal never allocates the good to an agent i of
a type θi with a negative virtual valuation. Ex post the principal may prefer to do so,
but ex ante it is a bad idea: if the principal employs a mechanism that does so, his payoff
when type θi (or a nearby type) is realized will be outweighed by the expected rent that
the mechanism must pay to higher types of agent i.
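For instance (a standard special case, not worked out in the text above): when types are uniform on [0, 1], Fi(θi) = θi and fi(θi) = 1, so ψi(θi) = θi − (1 − θi) = 2θi − 1. This is increasing, and it is negative precisely when θi < 1/2, so a revenue-maximizing seller withholds the good from such types even though they value it positively.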
Remark 7.16.
(i) The interpretation of virtual valuations, which first appear in (75), is essentially
the same as the interpretation of virtual utilities in the monopolistic screening
problem (see Remark 6.13). Indeed, since ui(q, θ) = qθ, virtual valuations and
virtual utilities are equal here.
(ii) To proceed from (75) to the solution of the principal's problem, we rearrange (75)
to obtain the right-hand side of (76), an integral over type profiles. This is useful
because it allows us to try maximizing the expression in parentheses at each type
profile θ, subject to the constraint q(θ) ∈ ∆A, after which we can check each
agent's monotonicity condition. The independence of types, and implicitly the
assumption of private values, makes this rearrangement possible. These assump-
tions likewise ensure that the quantity ψi(θi) used to determine whether agent i
gets the object at type profile θ = (θi, θ−i) only reflects information rents paid to
higher types of agent i, and not information rents paid to other agents.
(iii) To interpret virtual valuations, Bulow and Roberts (1989) imagine a monopolist
who faces demand curve D(p) = 1 − Fi(p). (This means that the 90th
percentile of willingness to pay, p.90, is implicitly defined by 1 − Fi(p.90) = .10, and
so by Fi(p.90) = .90.) If this monopolist has no production costs, it should choose a
price p to maximize revenue R(p) = p(1 − Fi(p)). The marginal change in revenue
from reducing the price is −R′(p) = p fi(p) − (1 − Fi(p)) = fi(p)ψi(p). This captures the
tradeoff between the two effects of a price reduction: selling to more consumers on
the margin versus reducing the price paid by existing consumers.
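A small numeric check of the identity −R′(p) = f(p)ψ(p) (an illustrative sketch; here the willingness-to-pay cdf is taken to be F(p) = p² on [0, 1], so f(p) = 2p):

    import numpy as np

    F = lambda p: p ** 2                   # cdf of willingness to pay
    f = lambda p: 2 * p                    # pdf
    psi = lambda p: p - (1 - F(p)) / f(p)  # virtual valuation (73)
    R = lambda p: p * (1 - F(p))           # monopoly revenue

    p, h = 0.6, 1e-6
    dR = (R(p + h) - R(p - h)) / (2 * h)   # numerical derivative R'(p)
    print(-dR, f(p) * psi(p))              # both approx. 0.08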
We now introduce transfer function t(·) to complete the definition of the mechanism. In
view of payoff equivalence, t(·) must be chosen so that expected transfers satisfy

    t̄i(θi) = θi q̄i(θi) − ∫₀^θi q̄i(θ̂i) dθ̂i.

We can make this property of interim transfers hold by making it hold ex post—that is,
by defining

(78)    ti(θi, θ−i) = θi qi(θi, θ−i) − ∫₀^θi qi(θ̂i, θ−i) dθ̂i.
If qi (θi , θ−i ) = 0, then qi (θ̂i , θ−i ) = 0 for θ̂i < θi (by Theorem 7.15), so (78) implies that
ti (θi , θ−i ) = 0. In other words, an agent who is not assigned the object pays nothing.
To evaluate other cases, let

(79)    τi(θ−i) = inf{ θi : ψi(θi) = max{ψ1(θ1), . . . , ψn(θn), 0} }.
In words: if agent i is assigned the good, her payment is the lowest valuation she could
have had and still have been assigned the good.
Summing up, we have
Proposition 7.17. Suppose that virtual valuations are increasing. Then one mechanism that
maximizes revenue among Bayesian incentive compatible, individually rational mechanisms is the
direct mechanism

(80)    (qi(θ), ti(θ)) = (1, τi(θ−i)) if θi > τi(θ−i), and (0, 0) if θi < τi(θ−i),

where τi(θ−i) is defined by (79). In fact, this mechanism is dominant strategy incentive compatible.
The final claim is easy to check, but still quite surprising. We were looking to maximize
revenue while satisfying Bayesian incentive compatibility, but found we were able to do so
while satisfying the much more demanding requirement of dominant strategy incentive
compatibility, which does not require agents to correctly anticipate opponents’ behavior.
(We could have seen this coming: The statement of Theorem 7.15 noted (and it is easy
to see directly) that the allocation function in (80) is “ex post monotone”, in that for each
agent i and each profile θ−i of opponents' types, qi(·, θ−i) is nondecreasing. The transfers
(78) are the transfers (70) with Ui(0, θ−i) ≡ 0. Thus Theorem 7.14 implies that (qi(·), ti(·)) is
dominant strategy incentive compatible. We return to this point shortly.)
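The mechanism of Proposition 7.17 is straightforward to implement once the ψi are known. Here is a minimal sketch for agents with (possibly different) uniform distributions on [0, bi], for which ψi(θi) = 2θi − bi and ψi⁻¹(y) = (y + bi)/2; the names and distributional assumption are illustrative:

    def optimal_auction(theta, b):
        # theta[i]: reported type; b[i]: upper bound of i's U[0, b_i] types.
        psi = [2 * t - bi for t, bi in zip(theta, b)]   # virtual valuations
        best = max(range(len(theta)), key=lambda i: psi[i])
        pay = [0.0] * len(theta)
        if psi[best] <= 0:
            return None, pay                            # good is not sold
        # Winner's payment: lowest type that would still win (cf. (79)).
        rival = max([psi[j] for j in range(len(theta)) if j != best] + [0.0])
        pay[best] = (rival + b[best]) / 2               # psi_best^{-1}(rival)
        return best, pay

    # Asymmetric example: agent 0 wins and pays 0.5 even though 0.9 > 0.8.
    print(optimal_auction([0.8, 0.9], [1.0, 2.0]))  # -> (0, [0.5, 0.0])

Note how the asymmetric example also illustrates the inefficiency discussed below: the agent with the highest virtual valuation, not the highest valuation, receives the good.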
Theorem 7.15 shows that under revenue maximizing mechanisms, the seller may wind
up keeping the good. This is clearly not ex post efficient. In addition, the foregoing
arguments show that even when the good is sold, it may not be sold to the agent who
values it most, since the agent with the highest virtual valuation ψi (θi ) need not have
the highest valuation θi . However, if the environment is symmetric, with all agents’
types being drawn from the same distribution, the latter inefficiency is absent, as we now
discuss.
Symmetric environments
Now suppose further that the environment is symmetric, so that ψj(·) = ψ(·) for all j. Then
(79) becomes

    τi(θ−i) = inf{ θi : ψ(θi) = max{ψ(θ1), . . . , ψ(θn), 0} }
            = inf{ θi : θi = max{θ1, . . . , θn, ψ⁻¹(0)} }.
Proposition 7.18. Consider a symmetric independent private values environment with increasing
virtual valuations. Then a second price auction with reserve price ψ⁻¹(0) is revenue maximizing
among Bayesian incentive compatible, individually rational mechanisms. Moreover, this mecha-
nism is dominant strategy incentive compatible.
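For uniform U[0, 1] types, ψ(θ) = 2θ − 1, so the optimal reserve is ψ⁻¹(0) = 1/2. A Monte Carlo sketch (illustrative) of the resulting revenue gain in a two-bidder second price auction:

    import numpy as np

    rng = np.random.default_rng(1)
    draws, r = 500_000, 0.5
    v = np.sort(rng.random((draws, 2)), axis=1)
    hi, lo = v[:, 1], v[:, 0]

    no_reserve = lo.mean()                                       # approx. 1/3
    with_reserve = np.where(hi >= r, np.maximum(lo, r), 0.0).mean()
    print(no_reserve, with_reserve)                              # approx. 5/12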
In the allocation problem studied above, we were able to find dominant strategy incen-
tive compatible mechanisms that were optimal among all Bayesian incentive compatible
mechanisms. Returning to the focus of Section 7.2, suppose that the social choice function
(q(·), t(·)) is Bayesian incentive compatible. Under what conditions on the allocation func-
tion q(·) can we find an alternative transfer function t† (·) such that (q(·), t† (·)) is dominant
strategy incentive compatible? Theorem 7.14 implies that this is possible if and only if
qi (·, θ−i ) is nondecreasing for all θ−i ∈ Θ−i and i ∈ A . For results on this question in more
general environments, see Mookherjee and Reichelstein (1992).
One can also ask the following less demanding question: if a direct mechanism g(·) =
(q(·), t(·)) is Bayesian incentive compatible, when is it possible to find another direct mech-
anism g† (·) = (q† (·), t† (·)) with the same interim allocation probabilities (q̄†i (·) = q̄i (·)) and
expected transfers (t̄†i (·) = t̄i (·)), but that is dominant strategy incentive compatible? Re-
markably, Manelli and Vincent (2010) show that this is always possible! Of course, the
ex post allocation probabilities q† (·) under the new mechanism generally differ from the
allocation probabilities q(·) under the original mechanism; for instance, even if the original
allocation function q(·) was ex post efficient, the new allocation function q† (·) generally
will not be.
Gershkov et al. (2013) extend Manelli and Vincent’s (2010) result to all environments with
linear utility (Example 7.4), and provide a more direct proof. In the context of allocation
problems, the analysis boils down to the following non-obvious fact: given any “joint
distribution” q(·) whose “marginal distributions” q̄i (·) are nondecreasing, one can construct
a new “joint distribution” q† (·) with the same “marginal distributions” q̄†i (·) = q̄i (·), but
whose “conditional distributions” q†i (·, θ−i ) are all nondecreasing.
We now turn to the question of implementing efficient allocations. Here it is most natural
to think of the mechanism being chosen by a social planner, or designed by the agents
themselves (rather than as a principal / many agents problem as in Section 7.3). Ap-
plications include public good provision and allocation problems, in particular bilateral
trading problems.
(We have seen that revenue maximizing allocation mechanisms are generally inefficient:
the principal may wind up keeping the good, and in asymmetric environments, a good
may be sold to an agent whose valuation is not highest.)
We would like to implement social choice functions g(·) = (x∗(·), t(·)) whose allocation
functions x∗(·) are ex post efficient, as defined at the start of the section:

(81)    x∗(θ) ∈ argmax_{x∈X} Σi∈A ui(x, θi) for all θ ∈ Θ.
If all actors in the economy are agents in the mechanism, then true efficiency also requires
that transfers not be burned—see Section 7.4.2.
Let

    Σj≠i uj(x∗(θ), θj)

be the total utility of agents besides i at allocation x∗(θ) when the type profile is θ.
The direct mechanism for g(·) = (x∗(·), tG(·)), where the transfer function tG(·) is of the form

    tGi(θ) = −Σj≠i uj(x∗(θ), θj) + hi(θ−i)

for some hi : Θ−i → ℝ, is called a Groves mechanism. For future reference, we call the Groves
mechanism with hi(θ−i) ≡ 0 the plain Groves mechanism.
Proposition 7.19. If the allocation function x∗(·) is ex post efficient, then every Groves mechanism
is dominant strategy incentive compatible.

Proof. If agent i is of type θi, his payoff from reporting θ̌i when others report θ̂−i is

(82)    ui(x∗(θ̌i, θ̂−i), θi) − tGi(θ̌i, θ̂−i) = ui(x∗(θ̌i, θ̂−i), θi) + Σj≠i uj(x∗(θ̌i, θ̂−i), θ̂j) − hi(θ̂−i).
Since x∗ (·) is ex post efficient, it follows that for any θ̂−i , this function is maximized by
choosing θ̌i = θi . Thus truth-telling is a very weakly dominant strategy (cf. Observation
7.10).
While this proof is very short, it is not transparent, so it is worth going through slowly.
Consider the sum of the first two terms in (82). Since x∗ (·) is ex post efficient, allocation
x∗ (θi , θ̂−i ) maximizes this sum when the second arguments of u1 , . . . , un are given by the
components of (θi , θ̂−i ). By definition, this is true whether any given component of (θi , θ̂−i )
is an agent’s actual type or merely a type announcement.
So suppose that agent i is of type θi and that the others report θ̂−i . If agent i announces θi ,
then the allocation is x∗ (θi , θ̂−i ), which maximizes
X
(83) ui ( · , θi ) + u j ( · , θ̂ j ).
j,i
This is (82) (ignoring the hi (θ̂−i ) term). If the agent announces θ̌i , θi , then the allocation
is x∗ (θ̌i , θ̂−i ), which maximizes
X
ui ( · , θ̌i ) + u j ( · , θ̂ j ),
j,i
but which need not maximize (83). So for any type reports θ̂−i of the others, it is a best
response for an agent i of type θi to report θi . In other words, truth-telling is a very weakly
dominant strategy.
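A generic Groves mechanism over a finite allocation set follows directly from the definitions. The sketch below is illustrative (utilities are passed in as functions; omitting h gives the plain Groves mechanism):

    def groves(theta_hat, X, u, h=None):
        # theta_hat: announced types; X: finite list of allocations;
        # u[i](x, th): agent i's consumption utility; h[i]: map on theta_{-i}.
        n = len(theta_hat)
        welfare = lambda x: sum(u[i](x, theta_hat[i]) for i in range(n))
        x_star = max(X, key=welfare)          # ex post efficient allocation
        t = []
        for i in range(n):
            others = sum(u[j](x_star, theta_hat[j]) for j in range(n) if j != i)
            rebate = 0.0 if h is None else h[i](theta_hat[:i] + theta_hat[i+1:])
            t.append(-others + rebate)        # Groves transfer
        return x_star, t

    # Indivisible good: X = possible winners, u_i(x, th) = th if x == i else 0.
    u = [lambda x, th, i=i: th if x == i else 0.0 for i in range(3)]
    print(groves([0.3, 0.9, 0.5], X=[0, 1, 2], u=u))
    # -> (1, [-0.9, 0.0, -0.9]) under the plain Groves mechanism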
Since hi(·) does not depend on agent i's announcement, under a Groves mechanism any
report by agent i that leads to a given allocation x will also require i to pay the same
transfer. We will observe this property in the examples below.
Let x−i : Θ−i → X be an “allocation function” that is ex post efficient if agent i’s payoffs are
ignored:
    x−i(θ−i) ∈ argmax_{x∈X} Σj≠i uj(x, θj).
The VCG (Vickrey–Clarke–Groves, or pivot) mechanism is the Groves mechanism with
hi(θ−i) = Σj≠i uj(x−i(θ−i), θj), so that

    tVi(θ) = Σj≠i uj(x−i(θ−i), θj) − Σj≠i uj(x∗(θ), θj).

Observation 7.20. Transfers under the VCG mechanism are nonnegative, and agent i's transfer
is positive only if his announcement affects the allocation.
Example 7.21. Allocation of an indivisible private good (cf. Example 7.2). Here X = A, and

    x∗(θ) = i when {i} = argmaxj∈A θj, and x−i(θ−i) = k when {k} = argmaxj≠i θj, so

    tVi(θ) = maxj≠i θj if x∗(θ) = i, and tVi(θ) = 0 otherwise.

Thus the VCG mechanism coincides with a second price auction: the winner pays the
highest rival announcement, and the losers pay nothing. ♦

Example 7.22. Public good provision. The n agents must decide whether (allocation 1) or
not (allocation 0) to build a monument that costs c > 0. If the monument is built, each
agent pays an equal tax c/n, so that ui(1, θi) = θi − c/n and ui(0, θi) = 0. Then

    x∗(θ) = 1 if Σj∈A θj ≥ c, and 0 otherwise;   x−i(θ−i) = 1 if Σj≠i θj ≥ ((n−1)/n)·c, and 0 otherwise.
Thus

    tVi(θ) = ((n−1)/n)·c − Σj≠i θj   if x∗(θ) = 1 and x−i(θ−i) = 0,
    tVi(θ) = Σj≠i θj − ((n−1)/n)·c   if x∗(θ) = 0 and x−i(θ−i) = 1,
    tVi(θ) = 0                       if x∗(θ) = x−i(θ−i).
0
(The transfer payment tVi(·) is in addition to the tax c/n that the agent must pay if the
monument is built. The latter is already included in the definition of allocation 1 ∈ X;
in particular, it appears in the agents' utilities from that allocation.)
If agent i’s report causes the monument to be built, his VCG transfer tVi (θ) is the difference
between the others’ contributions to the construction cost and the sum of the others’
reported valuations. Since the monument would not have been built if agent i were
absent, this transfer is nonnegative. If agent i’s report prevents the monument from being
built, his VCG transfer is the difference between the sum of the others’ reported valuations
and their contributions to the construction cost; again, this transfer is nonnegative. If agent
i’s report does not affect the public decision, his VCG transfer is zero.
For some type profiles θ, the monument will be built (x∗(θ) = 1) even though some agent
i's valuation is less than his contribution to the construction cost (θi − c/n < 0). Under the
VCG mechanism, agent i still prefers to tell the truth. If he does so, then his transfer is
zero, so his payoff is θi − c/n < 0. If instead he reports a low enough type that the monument
is not built, then his VCG transfer exceeds |θi − c/n|, so he is even worse off. ♦
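A tiny numeric instance of these formulas (illustrative): three agents and cost c = 1.2, so each contributes c/3 = 0.4 if the monument is built.

    def vcg_public_good(theta, c):
        # Returns (build decision, VCG transfers) for the monument example.
        n = len(theta)
        build = sum(theta) >= c
        t = []
        for i in range(n):
            others = sum(th for j, th in enumerate(theta) if j != i)
            build_without_i = others >= (n - 1) / n * c
            if build and not build_without_i:
                t.append((n - 1) / n * c - others)   # i's report forces building
            elif (not build) and build_without_i:
                t.append(others - (n - 1) / n * c)   # i's report blocks building
            else:
                t.append(0.0)                        # i's report is not pivotal
        return build, t

    print(vcg_public_good([0.7, 0.2, 0.5], c=1.2))
    # -> (True, [approx. 0.1, 0.0, 0.0]): only agent 0's report is pivotal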
Example 7.22 can be interpreted as a collective choice problem, with the agents themselves
devising the mechanism in order to ensure an ex post efficient outcome. In such contexts,
some mechanisms that satisfy incentive compatibility constraints may not be useful in
practice. If agents have property rights, they may be unwilling to participate in a mech-
anism in which they always lose (as above). In most cases the mechanism cannot run
an ex post deficit, and if agents are unwilling to “burn money” it cannot run an ex post
surplus either. The remainder of this section considers what can be achieved when these
additional constraints must be met.
Also known as the expected externality mechanism. References: d’Aspremont and Gérard-
Varet (1979), Arrow (1979).
The VCG mechanism implements ex post efficient allocation functions in weakly dominant
strategies. Transfers under this mechanism are nonnegative.
Suppose instead that we would like to implement an ex post efficient social choice function
with a transfer function that satisfies (ex post) budget balance:

    Σi∈A ti(θ) = 0 for all θ ∈ Θ.

This means that money is not burned. Budget balance is necessary for full ex post efficiency
when everyone in the economy participates in the mechanism.
Budget balance is easily achieved if there is one agent whose preferences are known, as
we can have this agent receive the others’ payments. What can be done if this is not the
case? A natural idea is to have each agent make a payment equal to his VCG transfer, and
then to redistribute this payment among the other agents. But this raises the possibility
of an agent misrepresenting his type in order to increase the redistributions he receives.
We now show that if types are independent, both ex post efficiency and budget balance
can be achieved. However, in addition to assuming independence, we must weaken the
notion of implementation from dominant strategy to Bayesian implementation.
Let allocation function x∗ (·) be ex post efficient. Using the notation for continuous type
spaces, and using the assumption that types are independent, let
(85)    t̄Vi(θi) = ∫_{Θ−i} tVi(θi, θ−i) f−i(θ−i) dθ−i
denote type θi ’s expected VCG transfer when his opponents are truthful. (This is also
called type θi ’s “expected externality”.)
The direct mechanism for g(·) = (x∗(·), tA(·)), where

(86)    tAi(θ) = t̄Vi(θi) − (1/(n−1)) Σj≠i t̄Vj(θj),
is called the AGV mechanism. In words, an agent of type θi makes a payment equal to
t̄Vi (θi ), his expected transfer under the VCG mechanism, and he receives a share of each
other agent’s payment. Dividing each agent’s payment among the others ensures budget
balance. The fact that each agent’s payment only depends on his type, and hence that each
opponent’s payment only depends on that opponent’s type, ensures that misrepresenting
his type cannot change the payments that an agent receives from his opponents. (It is not
so important that the sharing is equal under (86); what matters is that each agent’s base
payment is given to the other agents in a manner that the others cannot influence.)
Example 7.23. Allocation of an indivisible private good. Once again let X = A, ui(i, θi) = θi
and ui(j, θi) = 0 for j ≠ i. We saw in Example 7.21 that the VCG mechanism is equivalent
to a second price auction:

    tVi(θ) = maxj≠i θj if x∗(θ) = i, and tVi(θ) = 0 otherwise.

Thus under the AGV mechanism, the payment made by an agent i of type θi is

    t̄Vi(θi) = ∫_{Θ−i} tVi(θi, θ−i) f−i(θ−i) dθ−i
             = ∫_{{θ−i : θi > maxj≠i θj}} ( maxj≠i θj ) f−i(θ−i) dθ−i
             = ∫₀^θi v dF^{n−1}(v),
since F^{n−1}(v) (= F(v)^{n−1}) is the distribution of the highest valuation of the other agents.
(Verifying the final equality rigorously is a good exercise.)
If types are uniformly distributed, then F^{n−1}(v) = v^{n−1} and f^{n−1}(v) = (n − 1)v^{n−2}, so the
payment of an agent i of type θi is

    t̄Vi(θi) = ∫₀^θi v·(n − 1)v^{n−2} dv = (n − 1) ∫₀^θi v^{n−1} dv = ((n − 1)/n)·(θi)ⁿ.
Thus an agent i of type θi pays each of the others (1/n)·(θi)ⁿ, and receives his shares of their
payments. ♦
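A Monte Carlo check of this computation (illustrative; uniform types with n = 3 and θi = 0.8, so the prediction is ((n−1)/n)θiⁿ = (2/3)(0.8)³ ≈ 0.3413):

    import numpy as np

    rng = np.random.default_rng(2)
    n, draws, theta_i = 3, 500_000, 0.8
    rivals = rng.random((draws, n - 1)).max(axis=1)   # highest rival valuation

    # VCG transfer: pay the highest rival value when winning, else zero.
    t_bar = np.where(theta_i > rivals, rivals, 0.0).mean()
    print(t_bar, (n - 1) / n * theta_i ** n)          # both approx. 0.3413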
Proposition 7.24. The AGV mechanism is Bayesian incentive compatible and budget balanced.
Proof. Budget balance is clear, so we need only check Bayesian incentive compatibility.
For the latter, we can ignore the redistribution term (i.e., the second term) in the AGV
transfers (86), since it is independent of agent i’s announcement, and so does not affect i’s
incentives. (This amounts to ignoring identical terms that would appear on each side of
each inequality below.) It is thus enough to show that the mechanism (x∗ (·), t̄V (·)) based
on the expected VCG transfers (85) is Bayesian incentive compatible.
To do so, we show that if agent i’s opponents are truthful, then his Bayesian incentive
compatibility constraints under (x∗ (·), t̄V (·)) are averages of his dominant strategy incentive
compatibility constraints from the VCG mechanism. Type θi ’s VCG incentive compati-
bility constraints require that for every announcement θ̂i ∈ Θi and every announcement
profile θ−i ∈ Θ−i of i’s opponents,
    ui(x∗(θi, θ−i), θi) − tVi(θi, θ−i) ≥ ui(x∗(θ̂i, θ−i), θi) − tVi(θ̂i, θ−i).

Integrating both sides against the density f−i(·) over Θ−i yields, equivalently,

    ∫_{Θ−i} ui(x∗(θi, θ−i), θi) f−i(θ−i) dθ−i − t̄Vi(θi) ≥ ∫_{Θ−i} ui(x∗(θ̂i, θ−i), θi) f−i(θ−i) dθ−i − t̄Vi(θ̂i)
for all announcements θ̂i ∈ Θi . This is type θi ’s Bayesian incentive compatibility constraint
under (x∗ (·), t̄V (·)). (Understanding why the independence assumption is needed in this
proof is a very good exercise.)
Remark: While we constructed the AGV mechanism by modifying the VCG mechanism,
we can also obtain a Bayesian incentive compatible, budget balanced mechanism by
modifying any other Groves mechanism in the same fashion. It is easy to check that
the resulting transfers t(·) are of the form ti(θ) = tAi(θ) + ci, where the constants ci satisfy
Σi∈A ci = 0 (as required for budget balance). The values of these constants determine, for
instance, whether the mechanism also satisfies individual rationality constraints, a topic
we return to in Sections 7.4.3 and 7.4.4.
Our aim in this section and the next is to answer the following question: in independent
private values environments, when can we find a mechanism that satisfies Bayesian incen-
tive compatibility, ex post efficiency, budget balance, and interim individual rationality?
The last requirement asks that each agent be willing to participate in the mechanism after
learning her own type.
Interim individual rationality
Type θi's expected utility under the plain Groves mechanism when all agents report
truthfully is

    ŪiP(θi) = ∫_{Θ−i} ( ui(x∗(θi, θ−i), θi) − tPi(θi, θ−i) ) f−i(θ−i) dθ−i

(89)         = Σj∈A ∫_{Θ−i} uj(x∗(θi, θ−i), θj) f−i(θ−i) dθ−i.
Let x† ∈ X denote the default alternative that an agent obtains if he does not participate.
For a Groves mechanism with expected utility functions ŪiG(·) (so that ŪiG = ŪiP for the
plain Groves mechanism), let

(90)    rGi = max_{θi∈Θi} ( ui(x†, θi) − ŪiG(θi) ).

(We implicitly assume that this maximum exists, as is true under suitable continuity
assumptions.) We call a type θ†i that achieves the maximum in (90) a most tempted type of
agent i. The direct mechanism for g(·) = (x∗(·), t I(·)), where

    tIi(θ) = tGi(θ) − rGi,

is called the individually rational Groves mechanism.
Observation 7.25. The individually rational Groves mechanism is dominant strategy incentive
compatible. It is also interim individually rational: each type’s expected utility, ŪiI (θi ) = ŪiG (θi ) +
rGi , is at least ui (x† , θi ), and ŪiI (θ†i ) = ui (x† , θ†i ) for any most tempted type θ†i .
Bayesian ones. To achieve both budget balance and interim individual rationality, we
must ensure that the agents do not find opting out of the mechanism too appealing. We
present a simple sufficient condition for this to be so, and show that in allocation problems,
this condition is also necessary, and thus cannot be improved upon.
Building on the previous section, let g(·) = (x∗(·), t I(·)) be an individually rational Groves
mechanism, let t̄Ii(θi) be type θi's expected transfer under this mechanism, and let t̿Ii be
agent i's ex ante expected transfer under this mechanism, assuming truth-telling in each
case:

    t̄Ii(θi) = ∫_{Θ−i} tIi(θi, θ−i) f−i(θ−i) dθ−i   and   t̿Ii = ∫_{Θi} t̄Ii(θi) fi(θi) dθi = ∫_Θ tIi(θ) f(θ) dθ.

By Proposition 7.27, t̄Ii(θi) and t̿Ii are the same for every individually rational Groves
mechanism.
Define the KPW mechanism to be the direct mechanism for g(·) = (x∗(·), t∗(·)), where

(92)    t∗i(θ) = ( t̄Ii(θi) − (1/n)·t̿Ii ) − (1/(n−1)) Σj≠i ( t̄Ij(θj) − (1/n)·t̿Ij ).

Under (92), agent i of type θi pays t̄Ii(θi) − (1/n)·t̿Ii, giving an equal share to each other agent;
he also receives his share of the other agents' payments. (The mechanism is named for
Krishna and Perry (2000) and Williams (1999)—see Section 7.4.4.)
Theorem 7.26. If an IR Groves mechanism (x∗ (·), t I (·)) generates nonnegative expected revenue,
then the KPW mechanism is Bayesian incentive compatible, interim individually rational, and
budget balanced.
Proof. To start, note that the assumption that an individually rational Groves mechanism
generates nonnegative expected revenue is expressed as Σ_{j=1}^{n} t̿Ij ≥ 0.
Rearranging (92) yields

(93)    t∗i(θ) = t̄Ii(θi) − (1/(n−1)) Σj≠i t̄Ij(θj) + (1/(n−1)) Σj≠i t̿Ij − (1/n) Σj∈A t̿Ij.
Consider the direct mechanisms for (x∗ (·), t(·)), with t(·) defined by the first k = 1, 2, 3, 4
terms of (93).
(1) If t(·) is defined by the first term in (93), then (x∗ (·), t(·)) satisfies IC and IR. (Aver-
aging tiI (θi , θ−i ) over θ−i does not affect θi ’s IR constraints, and because types are
independent, it does not affect θi ’s IC constraints either (see the proof of Proposition
7.24).)
(2) If t(·) is defined by the first two terms in (93), then (x∗ (·), t(·)) satisfies IC and BB
(since the first term has been redistributed), but it may not satisfy IR (since t̄ jI (θ j )
may be negative).
(3) If t(·) is defined by the first three terms in (93), then (x∗ (·), t(·)) satisfies IC and IR
(since t̄¯jI is the expected value of t̄ jI (θ j )), but it may not satisfy BB.
(4) If t(·) = t∗(·) is defined by the entirety of (93), then (x∗(·), t∗(·)) satisfies IC, BB (since
the third term has been redistributed), and IR (since Σj t̿Ij ≥ 0).
One can verify that the KPW mechanism is symmetric in the following sense: the slack
in the individual rationality constraint of each agent i’s most tempted type is equal to the
individually rational Groves mechanism’s expected revenue divided by n.
We derived the KPW mechanism and the sufficient condition from Theorem 7.26 using
the plain Groves mechanism as a starting point. We now show that starting from any
Groves mechanism would have led us to the same conclusion. This observation is often
useful when the VCG transfers take a simple form—see Example 7.31.
Recall that a Groves mechanism has transfers of the form

(94)    tGi(θ) = −Σj≠i uj(x∗(θ), θj) + hi(θ−i) = tPi(θ) + hi(θ−i).
Let

(95)    h̄i = ∫_{Θ−i} hi(θ−i) f−i(θ−i) dθ−i

be the expected value of the type-independent part of agent i's transfer, which of course does
not depend on agent i's type. Type θi's expected utility under the Groves mechanism
g(·) = (x∗(·), tG(·)) when all agents report truthfully is

(96)    ŪiG(θi) = ∫_{Θ−i} ( ui(x∗(θi, θ−i), θi) − tGi(θi, θ−i) ) f−i(θ−i) dθ−i
                = ∫_{Θ−i} ( ui(x∗(θi, θ−i), θi) − (tPi(θi, θ−i) + hi(θ−i)) ) f−i(θ−i) dθ−i
                = Σj∈A ∫_{Θ−i} uj(x∗(θi, θ−i), θj) f−i(θ−i) dθ−i − h̄i
                = ŪiP(θi) − h̄i.

Thus, by (90), the rebate for the individually rational version of this mechanism is

    rGi = max_{θi∈Θi} ( ui(x†, θi) − ŪiG(θi) ) = rPi + h̄i.
(Notice that the identities of the most tempted types do not depend on the functions hi .)
The new mechanism with rebates, g(·) = (x∗(·), tI,h(·)), has transfer functions

(97)    tI,hi(θ) = tGi(θ) − rGi = tPi(θ) + hi(θ−i) − (rPi + h̄i).
Proposition 7.27.
(i) All Groves mechanisms with rebates generate the same interim transfers for all types of all
agents:

    t̄I,hi(θi) = t̄Ii(θi) for all θi ∈ Θi and i ∈ A.
(ii) Transfers (92) under the KPW mechanism can be computed starting from any Groves
mechanism using the same formula:
    t∗i(θ) = ( t̄I,hi(θi) − (1/n)·t̿I,hi ) − (1/(n−1)) Σj≠i ( t̄I,hj(θj) − (1/n)·t̿I,hj ).
Proof. Part (i) follows from the definition of interim transfers and equations (95) and (97).
Part (ii) follows from part (i) and definition (92) of the KPW transfers.
An example
Here we ask whether the KPW mechanism satisfies the desired properties in a specific
Bayesian collective choice problem by computing expected transfers under the IR Groves
mechanism. In doing so, it is useful to separate out the rebate terms by writing
t̄Ii(θi) = t̄Gi(θi) − rGi and t̿Ii = t̿Gi − rGi.
Example 7.28. A collective choice problem. Agent 1 (a zoologist) and agent 2 are deciding
whether to jointly adopt a pet. They can adopt an alligator (a) or a bunny (b), or they
can not adopt (d), which is the default option. Each agent’s type θi ∈ [0, 1] represents his
bravery; types are independent and uniformly distributed.
Both agents' utilities from adoption are described by ui(a, θi) = θi and ui(b, θi) = 4/9. If the
agents do adopt, agent 1 must care for the pet, which costs him 2/3 for either sort of pet. To
represent this, we make his utility from the default option u1(d, θ1) = 2/3. (We define his
utility function this way for convenience; alternatively, we could also subtract 2/3 from his
utility at every allocation.) Agent 2's utility from the default option is u2(d, θ2) = 0.
Can we construct a Bayesian incentive compatible, interim individually rational, ex post
efficient, and budget balanced mechanism for this problem?
The efficient allocation function x∗(·) is

    x∗(θ) = b if θ1 + θ2 < 8/9,   and   x∗(θ) = a if θ1 + θ2 > 8/9.
(Either a or b can be specified when θ1 + θ2 = 8/9; since all of our functions are continuous
and since ties occur with probability zero we can safely ignore such cases.) Not adopting
is never efficient. Thus, since ui(a, θi) = θi and ui(b, θi) = 4/9 are the same for both agents,
each agent i's ex post consumption benefit under the efficient allocation function is given
by

    ui(x∗(θ), θi) = 4/9 if θ1 + θ2 < 8/9,   and   ui(x∗(θ), θi) = θi if θ1 + θ2 > 8/9.
Taking expectations over the other agent's uniformly distributed type yields agent i's
interim expected consumption benefit:

(98)    ū∗i(θi) = θi² − (1/3)θi + 32/81  if θi < 8/9,   and   ū∗i(θi) = θi  if θi > 8/9.
Now consider agent i's transfers and expected transfers under the plain Groves mechanism
(hi(θ−i) ≡ 0). Transfers under this mechanism are defined as

    tGi(θ) = −uj(x∗(θ), θj) = −4/9 if θi + θj < 8/9,   and   −θj if θi + θj > 8/9.

Integrating over the other agent's type yields

(99)    t̄Gi(θi) = (1/2)θi² − (4/9)θi − 1/2  if θi < 8/9,   and   t̄Gi(θi) = −1/2  if θi > 8/9.
Using (98) and (99), we find that type θi's expected utility under the plain Groves mecha-
nism is

    ŪiG(θi) = ū∗i(θi) − t̄Gi(θi) = (1/2)θi² + (1/9)θi + 145/162  if θi < 8/9,   and   θi + 1/2  if θi > 8/9.
Since this function is increasing in θi, and since the agents' utilities from the default
allocation do not depend on their types, each agent's most tempted type is 0, and by (90)
their rebates are given by

    rG1 = 2/3 − Ū1G(0) = 2/3 − 145/162 = −37/162   and   rG2 = 0 − Ū2G(0) = −145/162.
Thus the expected revenue of the individually rational Groves mechanism is

    t̿I1 + t̿I2 = (t̿G1 − rG1) + (t̿G2 − rG2) = 14/2187 ≈ 0.0064 > 0.
Since the individually rational Groves mechanism runs a surplus, Theorem 7.26 implies
that the KPW mechanism is Bayesian incentive compatible, interim individually rational,
and budget balanced. Under this mechanism, an agent 1 of type θ1 pays agent 2 the
transfer t∗1(θ) prescribed by (92). One can check that agent 1's “payment” is always
negative, and that agent 2's payment is always positive. ♦
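The computations in Example 7.28 can be spot-checked by simulation; the sketch below (illustrative) estimates the individually rational Groves mechanism's expected revenue using the rebate values derived above:

    import numpy as np

    rng = np.random.default_rng(3)
    th = rng.random((1_000_000, 2))              # uniform type profiles
    adopt_a = th.sum(axis=1) > 8 / 9

    # Plain Groves transfer of agent i is -u_j(x*(theta), theta_j).
    tG = np.where(adopt_a[:, None], -th[:, ::-1], -4 / 9)

    # Rebates derived above: most tempted type is 0, U^G_i(0) = 145/162.
    r = np.array([2 / 3 - 145 / 162, -145 / 162])
    revenue = (tG - r).sum(axis=1).mean()
    print(revenue, 14 / 2187)                    # both approx. 0.0064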
Theorem 7.26 shows that when an individually rational Groves mechanism runs an ex-
pected surplus, it is possible to find a mechanism that is Bayesian incentive compatible,
interim individually rational, ex post efficient, and budget balanced. We now show that
in the environment of the payoff and revenue equivalence theorems, the converse state-
ment is also true. The characterization that results is due to Krishna and Perry (2000) and
Williams (1999).
As in Section 7.2, we consider allocation problems with independent types and continuous
type sets, and we allow for random mechanisms. (With quasilinear utility, there is no
advantage in using random transfers, so we only need consider random allocations.)
Since we are interested in efficiency, we require the good to be allocated to one of the
agents.
A = {1, . . . , n}   set of agents
Θi = [0, 1]   set of agent i's types; different agents' types are independent
Fi, fi   cdf and pdf of agent i's type distribution
𝒳 = ∆A × ℝⁿ   set of social alternatives
ui(q, θ) − ti = θi qi − ti   agent i's utility

Allocation function q∗(·) is ex post efficient if k ∉ argmaxj θj implies that qk(θ) = 0.
Theorem 7.29 (Krishna and Perry (2000), Williams (1999)). In the environment above, a
mechanism that is Bayesian incentive compatible, interim individually rational, ex post efficient,
and budget balanced exists if and only if the individually rational Groves mechanisms generate
nonnegative expected revenue.

Proof. Sufficiency follows from Theorem 7.26 (which extends immediately to random
mechanisms). To establish necessity, suppose that a mechanism with the desired proper-
ties exists. Budget balance implies that the expected revenue of this mechanism is zero.
Thus the following proposition, which drops the budget balance constraint but retains
the others, implies that individually rational Groves mechanisms have a nonnegative
expected revenue.
Proposition 7.30. Among Bayesian incentive compatible, interim individually rational mecha-
nisms with ex post efficient allocation function q∗ (·), an individually rational Groves mechanism
maximizes the expected transfer from each type of each agent, and so maximizes expected revenue.
Proof. By payoff equivalence (Theorem 7.11), any Bayesian incentive compatible mech-
anism (q∗ (·), t(·)) that implements allocation function q∗ (·) has expected utility functions
that satisfy Ūi (θi ) = ŪiI (θi ) + ci .
Observation 7.25 says that the interim individual rationality constraint binds for some type
of agent i under an individually rational Groves mechanism. Thus the new mechanism
satisfies interim individual rationality if and only if ci ≥ 0. As the expected transfer of
type θi under the new mechanism is

    t̄i(θi) = θi q̄∗i(θi) − Ūi(θi) = θi q̄∗i(θi) − (ŪiI(θi) + ci),

it is maximized by choosing ci = 0, which yields the individually rational Groves mecha-
nism itself.
Remark: In the proof above, the restriction to allocation problems was used in the appeal
to payoff equivalence (Theorem 7.11). Since the conclusion of that theorem remains true
in all linear utility environments (Example 7.4), so does that of Theorem 7.29. So, for
example, if in the collective choice problem from Example 7.28 we had found that an
individually rational Groves mechanism generated negative expected revenue, it would
have followed that no mechanism satisfying all four desiderata exists.
Example 7.31. Bilateral trade. The owner and potential seller of a good values it at θs . A
potential buyer values it at θb . These valuations are drawn from distributions on [0, 1] with
positive densities fs and fb . Thus there is a positive probability that there are gains from
trade, and also a positive probability that there are not, and either of these probabilities
may be arbitrarily close to 1.
When the allocation probabilities and transfers are q = (qb , qs ) and t = (tb , ts ), the buyer’s
utility is ub (q, θb ) − tb = qb θb − tb , and the seller’s is us (q, θs ) − ts = qs θs − ts .
Since the seller owns the good at the start, the default allocation is q† = (q†b , q†s ) = (0, 1), so
that ub (q† , θb ) = 0 and us (q† , θs ) = θs .
The two agents would like to design a mechanism ensuring that trade occurs whenever
it is ex post efficient. The mechanism must be Bayesian incentive compatible and interim
individually rational, and must not require payments to or from a third party.
The following result shows that no such mechanism exists.

Theorem 7.32 (Myerson and Satterthwaite (1983)). In the bilateral trade environment above,
there is no mechanism that is Bayesian incentive compatible, interim individually rational, ex post
efficient, and budget balanced.
Proof. We apply Theorem 7.29 and Proposition 7.27 (to take the VCG transfers as our
starting point). Ignoring ties, efficient allocation requires

    q∗b(θ) = 1 if θb > θs, and 0 if θb < θs;    q∗s(θ) = 0 if θb > θs, and 1 if θb < θs.
The VCG transfers are tVb(θ) = θs q∗b(θ) and tVs(θ) = θb q∗s(θ).
Notice that tVb(θ) + tVs(θ) = min{θb, θs} for all θ ∈ Θ. It follows that

(100)    t̿Vb + t̿Vs = ∫_Θ ( tVb(θ) + tVs(θ) ) f(θ) dθ
                    = ∫_Θ min{θb, θs} f(θ) dθ
                    < ∫_Θ θb f(θ) dθ
                    = ∫_{Θb} θb fb(θb) dθb
                    ≡ θ̄b.
The strict inequality reflects the fact that θb > θs with positive probability.
Now observe that if the buyer is of type θb = 0, then (with probability 1) he does not receive
the good and pays no transfer (q∗b (0, θs ) = 0 and tVb (0, θs ) = θs q∗b (0, θs ) = 0 whenever θs > 0),
so his interim expected utility under the VCG mechanism is ŪbV (0) = 0.
Also, if the seller is of type θs = 1, then (with probability 1) she keeps the good and pays a
transfer of θb (q∗s (θb , 1) = 1 and tVs (θb , 1) = θb q∗s (θb , 1) = θb whenever θb < 1), so her interim
expected utility under the VCG mechanism is ŪsV (1) = 1 − t̄Vs (1) = 1 − θ̄b .
It then follows from (96) and (90) that the rebates in the individually rational VCG mech-
anism must satisfy

    rVb ≥ ub(q†, 0) − ŪbV(0) = 0   and   rVs ≥ us(q†, 1) − ŪsV(1) = 1 − (1 − θ̄b) = θ̄b.

Combining these inequalities with (100) shows that the expected surplus of the individu-
ally rational VCG mechanism is

    t̿Vb + t̿Vs − (rVb + rVs) < θ̄b − (0 + θ̄b) = 0.

Thus by Theorem 7.29, the desired bilateral trading mechanism does not exist.
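For uniform valuations the deficit is easy to quantify: E min{θb, θs} = 1/3 while θ̄b = 1/2, so by (100) and the rebate bounds the IR VCG surplus is below −1/6. A Monte Carlo sketch (illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    th_b, th_s = rng.random(1_000_000), rng.random(1_000_000)

    # Upper bound on the IR VCG surplus: E min{th_b, th_s} - E th_b.
    print(np.minimum(th_b, th_s).mean() - th_b.mean())   # approx. -1/6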
Remark: Before the development of information economics, it was sometimes taken for
granted that with property rights and enforceable contracts, allocations of goods among
individuals would be more-or-less efficient. After all, when allocations are inefficient,
mutually beneficial trades are available, and so will be undertaken voluntarily unless the
inefficiency is negligible.
The Myerson-Satterthwaite theorem shows that when individuals possess private infor-
mation, this claim is simply not true.
Section 7.4 considered the implementation of efficient allocations in private values envi-
ronments. Here we study the implementation of efficient allocations when each agent’s
utility may depend directly on other agents’ private information. We restrict attention to
the problem of allocating an indivisible private good. Until the end of the section, we
assume that each agent’s type, here interpreted as a signal about the quality of the good,
is one-dimensional.
A = {1, . . . , n}   set of agents
Θi = [0, 1]   set of agent i's types
F   cdf of the joint distribution of agents' types
X = A   set of allocations
vi(θ)   agent i's valuation for the good (differentiable)
vi(θ) − ti   agent i's utility if he receives the good and pays ti
−ti   agent i's utility if he does not receive the good and pays ti
Notice that agent i’s valuation for the good may depend on all agents’ signals. We assume
throughout that i’s valuation for the good is increasing in his own signal; in particular,
(101)   ∂vi/∂θi (θ) > 0 for all i ∈ A and θ ∈ Θ.
An allocation function x(·) is ex post efficient if for each θ ∈ Θ, an agent with the highest
valuation vi (θ) is allocated the good.
The following example shows that ex post efficiency cannot always be achieved:
Example 7.33. Suppose that there are two agents, and that only agent 1 receives a signal.
The agents’ value functions are v1 (θ1 ) = θ1 + 1 and v2 (θ1 ) = 3θ1 . Ignoring boundary cases,
the efficient allocation function has x∗ (θ1 ) = 1 when θ1 < 1/2 and x∗ (θ1 ) = 2 when θ1 > 1/2.
Only player 1 has private information, so we need only specify the transfer function t1 (·).
Moreover, the direct mechanism for g(·) = (x∗ (·), t1 (·)) is a single-agent decision problem,
so the only notion of implementation is to require optimal choices by each type of agent 1.
There is no transfer scheme t1 (·) for which (x∗ (·), t1 (·)) is incentive compatible. This follows
immediately from Lemma 6.8, but it is easy to show directly. Let θℓ1 < 1/2 < θh1. Then incentive
compatibility requires that θℓ1 + 1 − t1 (θℓ1) ≥ −t1 (θh1) and that −t1 (θh1) ≥ θh1 + 1 − t1 (θℓ1).
Adding these inequalities yields θℓ1 ≥ θh1, a contradiction. ♦
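The contradiction can also be seen numerically. A minimal Python sketch (the pair of types
below is arbitrary): only the difference d = t1(θℓ1) − t1(θh1) enters the two incentive
constraints, and no value of d satisfies both:

```python
import numpy as np

theta_l, theta_h = 0.3, 0.7   # arbitrary types with θl < 1/2 < θh

# Both constraints depend only on d = t1(θl) − t1(θh):
#   type θl truthful:  θl + 1 − t1(θl) ≥ −t1(θh)   ⟺   d ≤ θl + 1
#   type θh truthful:  −t1(θh) ≥ θh + 1 − t1(θl)   ⟺   d ≥ θh + 1
d = np.linspace(-5.0, 5.0, 100_001)
feasible = (d <= theta_l + 1.0) & (d >= theta_h + 1.0)
print("feasible values of d:", d[feasible])   # empty: no such transfer scheme
```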
What prevents efficient implementation in Example 7.33 is that while increasing agent 1’s
signal makes the good more valuable to him, it can also make it efficient to allocate the
good to agent 2. To obtain a positive result we must exclude this possibility. This can be
accomplished by assuming that own signals matter most:
(102)   ∂vi/∂θi (θ) > ∂vj/∂θi (θ) for all i ∈ A, j ≠ i, and θ ∈ Θ.
In words: increasing agent i’s signal increases his valuation for the good more than it
increases other agents’ valuations for the good.
If own signals matter most, efficient implementation is possible, not only in the Bayesian
sense, but also in the ex post sense. Recall from Section 4.6 that a Bayesian strategy profile
s∗ is an ex post equilibrium if no type θi of any agent i would benefit from deviating from
his specified action s∗i (θi ) regardless of the realizations θ−i of the other agents’ types (see
equation (32)).
Given a direct mechanism M d = {{Θi }i∈A , g} with social choice function g, we say that M d
or g is ex post incentive compatible if truth-telling is an ex post equilibrium. The revelation
principle also holds for ex post implementation, and takes the same form as those for
Bayesian and dominant strategy implementation (see Section 7.1.4).
Let x∗ (·) be an ex post efficient allocation function, and let
(103)   mi(θ−i) = inf{ θi ∈ Θi : vi(θi, θ−i) ≥ maxj∈A vj(θi, θ−i) }
be the lowest type that agent i could be and still value the good the most, given that
the others’ types are θ−i . (Note that if no type of agent i would value the good the most
(in which case x∗ (θ) ≠ i), then mi (θ−i ) = inf ∅ = +∞.) The generalized VCG mechanism is the
direct mechanism for g(·) = (x∗ (·), t(·)) with transfer function

ti(θ) = vi(mi(θ−i), θ−i)  if x∗(θ) = i,   and   ti(θ) = 0  if x∗(θ) ≠ i.
In words, if agent i is assigned the good when the others’ types are θ−i , he pays the
valuation he would have had for the good if he were the lowest type consistent with him
receiving the good. This mechanism becomes a second-price auction when values are
private, and it also generalizes the interdependent value auction studied in Proposition
4.13.
Proposition 7.34. Suppose that own signals matter most, i.e., that (102) holds. Then the
generalized VCG mechanism is ex post incentive compatible.
Proof. Fix any θ−i , and assume that agent i’s opponents report truthfully. We need to show
that it is optimal for agent i to report truthfully regardless of his type.
Suppose first that θi is such that x∗ (θ) = i. Then agent i values the good at least as much as
all opponents, so (102) and (103) imply that θi ≥ mi (θ−i ). If i reports truthfully, he receives
the good and pays

(104)   ti(θ) = vi(mi(θ−i), θ−i) = maxj≠i vj(mi(θ−i), θ−i),

where the latter equality follows from (103) and the continuity of the valuations. If
θi = mi (θ−i ), then i’s payoff is zero; thus if
θi > mi (θ−i ), then (102) implies that i’s payoff is positive. Now if i reports θ̂i > mi (θ−i ),
then (102) and (103) imply that

vi(θ̂i, θ−i) ≥ maxj∈A vj(θ̂i, θ−i);

so i still receives the good and pays (104), and he is indifferent between reporting θ̂i and θi .
If i reports θ̌i < mi (θ−i ), then (102) and (103) imply that

vi(θ̌i, θ−i) < maxj≠i vj(θ̌i, θ−i);

thus i does not receive the good and earns a payoff of zero. If i reports mi (θ−i ), then his
payoff is zero whether or not he receives the good. Thus truthful reporting is optimal.
Now suppose that θi is such that x∗ (θ) ≠ i. Then i has an opponent j who values the
good at least as much as i does, and so (102) and (103) imply that θi ≤ mi (θ−i ). If i reports
truthfully, or reports any θ̌i < mi (θ−i ), his payoff is zero. If i reports θ̂i > mi (θ−i ), then
i receives the good and pays vi (mi (θ−i ), θ−i ); thus his payoff is 0 if θi = mi (θ−i ), and is
negative if θi < mi (θ−i ) (by (101)). Thus truthful reporting is again optimal.
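A minimal Python sketch of the generalized VCG mechanism, under the illustrative
valuations vi(θ) = 2θi + Σj≠i θj (these are not from the text; they satisfy (101) and (102)).
The function m below computes (103) by grid search, and the final check verifies ex post
optimality of truthful reporting for agent 0 against a grid of possible misreports:

```python
import numpy as np

GRID = np.linspace(0.0, 1.0, 1001)

def v(i, theta):
    # Illustrative valuations v_i(θ) = 2θ_i + Σ_{j≠i} θ_j: own signals matter most.
    return theta[i] + theta.sum()

def m(i, theta):
    # Lowest type of agent i that values the good (weakly) the most, as in (103).
    for ti in GRID:
        th = theta.copy(); th[i] = ti
        if v(i, th) >= max(v(j, th) for j in range(len(th))):
            return ti
    return np.inf   # i can never value the good the most: inf ∅ = +∞

def mechanism(reports):
    # Generalized VCG: allocate to a highest-valuation agent at the reported
    # profile; the winner pays his valuation at the cutoff type m_i(θ_-i).
    n = len(reports)
    winner = max(range(n), key=lambda i: v(i, reports))
    transfers = np.zeros(n)
    cutoff = reports.copy(); cutoff[winner] = m(winner, reports)
    transfers[winner] = v(winner, cutoff)
    return winner, transfers

def payoff(i, report, theta):
    # Agent i's ex post payoff when he reports `report` and others are truthful.
    reports = theta.copy(); reports[i] = report
    winner, t = mechanism(reports)
    return (v(i, theta) if winner == i else 0.0) - t[i]

theta = np.array([0.6, 0.4, 0.8])   # true type profile; agent 2 wins under truth
truthful = payoff(0, theta[0], theta)
assert all(payoff(0, r, theta) <= truthful + 1e-9 for r in np.linspace(0, 1, 101))
print("agent 0's truthful payoff:", truthful)   # 0.0
```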
Remarks:
(i) As in a second-price auction, an agent’s report here helps determine whether he
receives the good, but does not affect what he pays if he does receive it. The latter
property is key for truthful reporting to be optimal.
(ii) The assumption (102) that own signals matter most is stronger than necessary. It
is enough that the inequality in (102) hold at signal profiles θ for which multiple
agents have the highest valuation, and then the inequality need only hold for those
agents. See Dasgupta and Maskin (2000).
(iii) The generalized VCG mechanism requires the designer to have detailed informa-
tion about the agents’ valuation functions. Dasgupta and Maskin (2000) and Perry
and Reny (2002) introduce mechanisms that require the planner to have far less
information. However, these mechanisms do require the agents to have detailed
information about one another’s preferences.
7.6 Correlated Types, Full Surplus Extraction, and the Wilson Doctrine
So far, all of the results that ensure Bayesian implementation of social choice functions
have relied on the assumption of independent types. (The results ensuring dominant
strategy implementation under private values (VCG, IR-VCG) and ex post implementation
(generalized VCG) could not depend on this assumption, since those solution concepts
make no reference to beliefs.) We will now see
that what can be achieved through Bayesian implementation expands radically when
correlation of types is introduced. We will then explain why this change takes advantage
of a naive aspect of standard mechanism design models.
We consider a private values environment like that in Section 7.4, except that we allow for
correlated types. We also assume that type spaces are finite.
This section presents the full surplus extraction theorem of Crémer and McLean (1985,
1988). In essence, this result shows that under essentially any correlation among agents’
types, there is a dominant strategy mechanism that implements the efficient allocation
function while delivering all of the agents’ benefits from the interaction to the designer.
For instance, in allocation problems, the designer is able to achieve perfect ex post price
discrimination despite only knowing the prior distribution on agents’ types!
Let x∗ (·) be an ex post efficient allocation function (see (81)). In Section 7.4.1, we saw
that the VCG mechanism, the direct mechanism with allocation function x∗ (·) and transfer
function tV (·) (see (84)), is dominant strategy incentive compatible. Player i’s interim
expected utility under this mechanism is

(105)   ūVi(θi) = Σθ−i∈Θ−i p(θ−i |θi ) [ui (x∗ (θ), θi ) − tVi (θ)].
Let the matrix Pi ∈ RΘi ×Θ−i , with elements pi (θ−i |θi ), describe agent i’s interim beliefs. Thus
the θi th row of Pi describes type θi ’s interim beliefs. If types were independent, then the
rows of Pi would be identical. In contrast, the full extraction result requires the rows of Pi
to be linearly independent.
Theorem 7.35 (Crémer and McLean (1985, 1988)). Suppose that for each agent i, the interim
belief matrix Pi has full row rank. Then there is a transfer function tCM (·) such that the direct
mechanism g(·) = (x∗ (·), tCM (·)) is dominant strategy incentive compatible and gives every type of
every agent an interim expected utility of zero.
Strikingly, Theorem 7.35 reveals that the slightest departure from independent types
allows for full surplus extraction via a dominant strategy mechanism. We call the direct
mechanism g(·) = (x∗ (·), tCM (·)) posited in the theorem the Crémer-McLean mechanism.
Proof. Let ūVi ∈ RΘi be the column vector whose elements are agent i’s interim expected
utilities under the VCG mechanism (105). Since Pi ∈ RΘi ×Θ−i has full row rank, there is a
column vector τi ∈ RΘ−i satisfying
(106) Pi τi = ūVi .
Define the Crémer-McLean transfer function by

(107)   tCMi(θ) = tVi(θ) + τi(θ−i).

Since truth-telling is dominant under the VCG mechanism, and since the τi term in
(107) does not depend on agent i’s own report, truth-telling is also dominant under the
Crémer-McLean mechanism. Furthermore, equations (107) and (106) imply that under this mechanism,
the interim expected utility of an agent i of type θi is
ūCMi(θi) = ūVi(θi) − Σθ−i∈Θ−i p(θ−i |θi ) τi (θ−i ) = ūVi(θi) − ūVi(θi) = 0.
The proof of Theorem 7.35 reveals what Pi having full row rank actually buys us. For
any function ci : Θi → R, the planner can design transfers τi : Θ−i → R with the following
property: If agent i’s opponents report truthfully, then for each type θi ∈ Θi , the expected
cost of the transfers is ci (θi ). Thus by setting ci (·) equal to ūVi (·), the principal is able to
make each type’s expected τi (·) payment exactly equal to its expected VCG surplus. This
makes full surplus extraction possible.
(In this argument, we are allowed to suppose that i’s opponents report truthfully because
we are addressing expected transfers, which are evaluated at the mechanism’s equilib-
rium. We could not have supposed this when checking for dominant strategy incentive
compatibility, but this property already follows from the Crémer-McLean mechanism
being a Groves mechanism.)
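A minimal Python sketch of the construction in the proof, for a hypothetical agent with
two types; the belief matrix and the interim VCG utilities are made-up numbers, chosen
only so that Pi has full row rank (for a non-square Pi one would use a least-squares solve
such as np.linalg.lstsq instead):

```python
import numpy as np

# Hypothetical two-type example: illustrative numbers with linearly
# independent rows of P_i (i.e., correlated types).
P_i = np.array([[0.6, 0.4],     # interim beliefs of type θ'
                [0.4, 0.6]])    # interim beliefs of type θ''
u_V = np.array([0.30, 0.55])    # assumed interim VCG utilities ūVi(θi)

tau_i = np.linalg.solve(P_i, u_V)   # the side payments τi(θ-i) of (106)
print("τi(θ-i):", tau_i)

# With tCMi(θ) = tVi(θ) + τi(θ-i) as in (107), each type's interim expected
# utility is ūVi(θi) − Σ p(θ-i|θi) τi(θ-i) = 0:
print("interim Crémer-McLean utilities:", u_V - P_i @ tau_i)   # ≈ [0, 0]
```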
Remarks:
(i) An analogue of Theorem 7.35 holds for settings with interdependent values, if we
assume as in Proposition 7.34 that own signals matter most. In this case, starting
with the generalized VCG mechanism and proceeding as in the proof of Theorem
7.35 shows that there is an ex post incentive compatible mechanism that achieves
full surplus extraction.
(ii) When types are “almost independent”, so that Pi “almost fails” to have full rank,
some of the transfer terms τi (θ−i ) may need to be very large. This makes sense
given the fact that when types are independent, full surplus extraction is no longer
possible. Robert (1991) shows that if one imposes a limited liability constraint,
placing an upper bound on the amount that agents may be required to pay, then the
planner’s optimal expected payoffs are continuous in the distribution of valuations,
implying that full surplus extraction cannot be achieved when the types are “almost
independent”.
(iii) The Crémer-McLean mechanism requires the designer to have detailed knowledge
about agents’ beliefs, and in particular about the correlations between different
agents’ types. We return to this point below.
To understand what makes full surplus extraction possible, we introduce a mechanism due
to Neeman (2004) that relies on the more demanding solution concept of Bayesian (rather
than dominant strategy) implementation, but achieves full surplus extraction ex post rather
than merely in interim expectation.
Let Z ∈ RΘi ×Θi be the matrix whose diagonal elements equal 0 and whose off-diagonal
elements equal −K, where K > 0 is large. By the full row rank condition on Pi , there is a
matrix Li ∈ RΘ−i ×Θi that satisfies Pi Li = Z. Each column Li^θ̂i ∈ RΘ−i of Li can be interpreted
as a “lottery” whose outcome Li^θ̂i(θ−i ) ∈ R depends on the announcements of agent i’s
opponents.
Suppose that the designer asks each agent i to choose a lottery Li^θ̂i by reporting a type θ̂i .
If the agent’s actual type is θi , his opponents report truthfully, and he reports θ̂i , then his
expected payoff from the lottery is

Σθ−i∈Θ−i p(θ−i |θi ) Li^θ̂i(θ−i ) = Zθiθ̂i .
This equals 0 if θi = θ̂i and equals −K otherwise. Thus it is a Bayesian equilibrium for all
agents to report truthfully, and if they do so their expected transfers are zero.
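A minimal Python sketch of the lottery menu, reusing the hypothetical two-type belief
matrix from the previous sketch:

```python
import numpy as np

K = 100.0                       # penalty for a detected lie
P_i = np.array([[0.6, 0.4],     # same illustrative correlated beliefs as above
                [0.4, 0.6]])
Z = np.full((2, 2), -K)
np.fill_diagonal(Z, 0.0)        # 0 on the diagonal, -K off it

L_i = np.linalg.solve(P_i, Z)   # lottery menu: column θ̂i solves Pi Li = Z

# Row θi of Pi @ Li is type θi's expected lottery payoff from each report θ̂i:
# 0 for a truthful report, -K for any lie (opponents reporting truthfully).
print(P_i @ L_i)
```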
Once all agents report their types truthfully, the planner can determine the efficient allo-
cation x∗ (θ) and each agent’s benefit ui (x∗ (θ), θi ) at this allocation. He can thus implement
allocation x∗ (θ), and require each agent to pay his benefit ui (x∗ (θ), θi ). If the penalty K
from the lotteries is large enough, the consequences of the reports made in the “first stage”
for payoffs in the “second stage” will not dissuade the agents from reporting truthfully.
We thus have
Proposition 7.36. Suppose that the interim belief matrices Pi have full row rank. Then the
Neeman mechanism Bayesian implements the ex post efficient allocation function x∗ (·) and gives
every type of every agent an ex post utility of zero.
Under the Neeman mechanism, the way that the designer learns each agent’s preferences
over social alternatives is split into two distinct pieces. The agent’s choice of lottery reveals
his beliefs about his opponents’ types. By the full row rank condition, these beliefs in turn
reveal the agent’s type, and thus his preferences over the social alternatives.
Neeman (2004) develops these last observations into a compelling criticism of the relevance
of full extraction results. This criticism returns us to our initial discussion of Bayesian
games in Section 4.1. There we noted that a player’s type captures two separate aspects
of his private information: his payoff-relevant information, and his beliefs about others’
types. If types are independent, then interim beliefs are common knowledge, so only the
payoff-relevant aspect of types remains private information. But if types are correlated,
then neither the payoff-relevant information nor interim beliefs are common knowledge.
The Crémer-McLean mechanism works by exploiting this joint uncertainty. It is able to do
so because of a peculiarity in the standard specification of mechanism design problems.
Traditionally, mechanism design problems are constructed by first specifying a set of
“payoff types”, which describe each agent’s payoff relevant information (see Section 4.8),
and then specifying the beliefs of each payoff type about opponents’ payoff types. Theorem
7.35 and the discussion that follows show that this approach to defining Bayesian games
can have unexpected consequences.
To make this idea precise, Neeman (2004) notes that the full rank condition on the interim
belief matrix Pi imposes the beliefs-determine-preferences property: it rules out the possibility
that there are two types of player i who have the same beliefs about opponents’ types,
but have different payoff types. For instance, in an allocation problem, it is plausible
to have two types of agent i that assign different values to the good, but that share the
same beliefs about opponents’ private information. Given any menu of lotteries whose
outcomes depend on the realization of θ−i , these two types will make the same choice,
and so that choice cannot reveal which of the two payoff types the chooser has. This
precludes full surplus extraction.
Wilson (1985, 1987) proposes two desirable properties that mechanisms should possess
in order to have practical relevance. Wilson (1985) suggests that mechanisms should
not require the designer to have very precise information about the agents’ beliefs and
preferences. (Dasgupta and Maskin (2000) call mechanisms with this property detail-free.)
The standard auction formats have this property, as do the interdependent value auctions
of Dasgupta and Maskin (2000) and Perry and Reny (2002) mentioned in Section 7.5;
the Crémer-McLean mechanism does not. In a similar vein, Wilson (1987) proposes that
useful mechanisms should not rely on unrealistically strong assumptions about what is
common knowledge among the players. Either or both of these criteria are referred to as
the Wilson doctrine.
The robust mechanism design literature is motivated both by Neeman’s (2004) criticism of
full extraction mechanisms and by the Wilson doctrine. Broadly speaking, its aims are
to see what social choice functions can be implemented under weak assumptions about
players’ beliefs. Implementation is often based on weak solution concepts, though as
explained below, this sometimes comes as a conclusion of the analysis rather than as an
assumption. We describe some specific contributions below. (The discussion may be
easier to follow after reviewing Sections 4.8 and 4.9.) See Bergemann and Morris (2012)
for a survey of the literature.
A natural starting point for robust mechanism design is a “pre mechanism design envi-
ronment” in which agents’ preferences and payoff types are specified, but their beliefs
are not. (This is a direct analogue of a pre-Bayesian game—see Section 4.8). Berge-
mann and Morris (2005) seek conditions that describe when a social choice function is
Bayesian implementable under any type spaces and beliefs that can be introduced to turn
the pre mechanism design environment into a fully-specified mechanism design environ-
ment. The necessary and sufficient condition is that the social choice function be ex post
implementable—a solution concept that is well-defined for the pre mechanism design
environment. In the quasilinear environments considered here, one might also ask about
implementation of social choice correspondences that fix an allocation function (e.g., the
efficient allocation function) but allow any transfer functions. Bergemann and Morris
(2005) show that the previous result holds in this context as well. These results have
the happy consequence that requiring robustness over all possible beliefs leads us to a
solution concept that is relatively easy to check.
The larger part of this literature looks at robust full implementation, which in addition
to robustness requires that the social choice function to be implemented be the unique
equilibrium outcome regardless of the specification of beliefs. (Previous work on full
implementation (without robustness) was noted briefly in Section 7.1.3.) The general
description above can be specified precisely in a variety of ways. For instance, Bergemann
and Morris (2008) consider ex post implementation, characterizing social choice functions g
for which there are mechanisms whose unique ex post equilibrium outcome is g.
Bergemann and Morris (2009, 2011) consider a more demanding requirement that they
dub robust implementation. Starting from a pre mechanism design environment, it requires
not only that the social choice function g be an ex post equilibrium of the mechanism, but
also that for every possible specification of beliefs, all Bayesian equilibrium outcomes are
close to g. They prove that this requirement is nearly equivalent to g being implementable
using the solution concept of belief-free rationalizability, which is obtained by applying
iterated dominance without imposing any common knowledge restrictions on agents’
beliefs (see Section 4.9).
Oury and Tercieux (2012) introduce what they term continuous implementation, which can
be roughly described as a “local” analogue of the approach of Bergemann and Morris
(2009, 2011). They fix a mechanism design environment, and consider the implications of
a social choice function being Bayesian implementable not only for that problem, but for all
problems in which agents’ beliefs are close to those in the original problem. They establish
that the preceding requirement is tightly linked to the notion of full implementation using
the solution concept of interim correlated rationalizability. The logic behind this is similar
to the reasoning behind Weinstein and Yildiz’s (2007) results on the robustness of ICR (see
Section 4.9).
The conclusions of Bergemann and Morris (2009, 2011) are stated in terms of a solution
concept that imposes no restrictions on agents’ beliefs, while those of Oury and Tercieux
(2012) use a solution concept for a fully specified Bayesian game. Taking a middle route,
Ollár and Penta (2015) consider full implementation in settings with common knowledge
of some belief restrictions. That is, in addition to the pre mechanism design environment,
certain restrictions on the beliefs held by each payoff type θi of agent i about other agents’
payoff types θ j are common knowledge. (For instance, rather than assuming that it
is common knowledge that payoff types are i.i.d. with a uniform distribution on the
unit interval (as one might in a usual mechanism design environment), it might only
be assumed common knowledge that the conditional expectation of each payoff type of
agent i about the payoff type of agent j ≠ i is 1/2.) Full implementation under such belief
restrictions is defined using Battigalli and Siniscalchi’s (2003) notion of ∆-rationalizability
(see Section 4.9). Ollár and Penta (2015) show that in a variety of applications, mild
belief restrictions are often enough to significantly expand the set of implementable social
choice functions relative to the belief-free approach studied in Bergemann and Morris
(2009, 2011).
References
Abreu, D., Dutta, P. K., and Smith, L. (1994). The folk theorem for repeated games: A
NEU condition. Econometrica, 62:939–48.
Abreu, D., Pearce, D., and Stacchetti, E. (1990). Toward a theory of discounted repeated
games with imperfect monitoring. Econometrica, 58:1041–63.
Akerlof, G. A. (1970). The market for ‘lemons’: quality uncertainty and the market
mechanism. Quarterly Journal of Economics, 84:488–500.
Arrow, K. J. (1951). Social Choice and Individual Values. Wiley, New York.
Arrow, K. J. (1979). The property rights doctrine and demand revelation under incomplete
information. In Boskin, M., editor, Economics and Human Welfare, pages 23–39. Academic
Press, New York.
Aumann, R. J. (1990). Nash equilibria are not self-enforcing. In Gabsewicz, J.-J., Richard,
J.-F., and Wolsey, L., editors, Economic Decision-Making: Games, Econometrics, and Opti-
misation, pages 201–206. North Holland, Amsterdam.
Avis, D., Rosenberg, G. D., Savani, R., and von Stengel, B. (2010). Enumeration of Nash
equilibria for two-player games. Economic Theory, 42:9–37.
Bagnoli, M. and Bergstrom, T. (2005). Log-concave probability and its applications. Eco-
nomic Theory, 26:445–469.
Battigalli, P. (1997). On rationalizability in extensive games. Journal of Economic Theory,
74:40–61.
Battigalli, P. and Siniscalchi, M. (2002). Strong belief and forward induction reasoning.
Journal of Economic Theory, 106:356–391.
Benoît, J.-P. and Krishna, V. (1985). Finitely repeated games. Econometrica, 53:905–922.
Blume, A. and Heidhues, P. (2004). All equilibria of the Vickrey auction. Journal of Economic
Theory, 114:170–177.
Blume, L. E. and Zame, W. R. (1994). The algebraic geometry of perfect and sequential
equilibrium. Econometrica, 62:783–794.
Brandenburger, A. (1992). Lexicographic probabilities and iterated admissibility. In Das-
gupta, P. et al., editors, Economic Analysis of Markets and Games, pages 282–290. MIT
Press, Cambridge.
Bulow, J. and Roberts, J. (1989). The simple economics of optimal auctions. Journal of
Political Economy, 97:1060–1090.
Carlsson, H. and van Damme, E. (1993). Global games and equilibrium selection. Econo-
metrica, 61:989–1018.
Chen, X., Deng, X., and Teng, S.-H. (2009). Settling the complexity of computing two-
player Nash equilibria. Journal of the ACM, 56: article 14.
Chen, Y.-C., Di Tillio, A., Faingold, E., and Xiong, S. (2010). Uniform topologies on types.
Theoretical Economics, 5:445–478.
Cho, I.-K. and Kreps, D. M. (1987). Signaling games and stable equilibria. The Quarterly
Journal of Economics, 102:179–221.
Crémer, J. and McLean, R. P. (1985). Optimal selling strategies under uncertainty for a
discriminating monopolist when demands are interdependent. Econometrica, 53:345–
361.
Crémer, J. and McLean, R. P. (1988). Full extraction of the surplus in Bayesian and dominant
strategy auctions. Econometrica, 56:1247–1257.
Dasgupta, P., Hammond, P., and Maskin, E. (1979). The implementation of social choice
rules: some results on incentive compatibility. Review of Economic Studies, 46:185–216.
Dekel, E. and Fudenberg, D. (1990). Rational behavior with payoff uncertainty. Journal of
Economic Theory, 52:243–267.
Dekel, E., Fudenberg, D., and Levine, D. K. (1999). Payoff information and self-confirming
equilibrium. Journal of Economic Theory, 89:165–185.
Dekel, E., Fudenberg, D., and Levine, D. K. (2002). Subjective uncertainty over behavior
strategies: A correction. Journal of Economic Theory, 104:473–478.
Dekel, E., Fudenberg, D., and Morris, S. (2006). Topologies on types. Theoretical Economics,
1:275–309. Correction by Y.-C. Chen and S. Xiong, 3 (2008), 283–285.
Dekel, E., Fudenberg, D., and Morris, S. (2007). Interim correlated rationalizability. Theo-
retical Economics, 2:15–40.
Dekel, E. and Gul, F. (1997). Rationality and knowledge in game theory. In Kreps, D. M.
and Wallis, K. F., editors, Advances in Economics and Econometrics: Theory and Applications;
Seventh World Congress, volume 1. Cambridge University Press.
Dutta, P. K. (1995). A folk theorem for stochastic games. Journal of Economic Theory, 66:1–32.
Elmes, S. and Reny, P. J. (1994). On the strategic equivalence of extensive form games.
Journal of Economic Theory, 62:1–23.
Ewerhart, C. (2000). Chess-like games are dominance solvable in at most two steps. Games
and Economic Behavior, 33:41–47.
Ewerhart, C. (2002). Backward induction and the game-theoretic analysis of chess. Games
and Economic Behavior, 39:206–214.
Fan, K. (1952). Fixed-point and minimax theorems in locally convex topological linear
spaces. Proceedings of the National Academy of Sciences, 38:121–126.
Farrell, J. and Rabin, M. (1996). Cheap talk. Journal of Economic Perspectives, 10:103–118.
Flood, M. M. (1952). Some experimental games. Report RM-789-1, The Rand Corporation.
Folland, G. B. (1999). Real Analysis: Modern Techniques and Their Applications. Wiley, New
York, second edition.
Foster, D. P. and Vohra, R. (1997). Calibrated learning and correlated equilibrium. Games
and Economic Behavior, 21:40–55.
Fudenberg, D., Levine, D. K., and Maskin, E. (1994). The folk theorem with imperfect
public information. Econometrica, 62:997–1039.
Fudenberg, D. and Maskin, E. (1986). The folk theorem in repeated games with discounting
or with incomplete information. Econometrica, 54:533–554.
Fudenberg, D. and Tirole, J. (1991b). Perfect Bayesian equilibrium and sequential equilib-
rium. Journal of Economic Theory, 53:236–260.
Gershkov, A., Goeree, J. K., Kushnir, A., Moldovanu, B., and Shi, X. (2013). On the
equivalence of Bayesian and dominant strategy implementation. Econometrica, 81:197–
220.
Gilboa, I. (2009). Theory of Decision under Uncertainty. Cambridge University Press, Cam-
bridge.
Govindan, S. and Wilson, R. (2006). Sufficient conditions for stable equilibria. Theoretical
Economics, 1:167–206.
Green, J. R. and Laffont, J.-J. (1977). Characterization of satisfactory mechanisms for the
revelation of preferences for public goods. Econometrica, 45:427–438.
Halpern, J. Y. (2001). Substantive rationality and backward induction. Games and Economic
Behavior, 37:425–435.
Harsanyi, J. C. (1973). Games with randomly disturbed payoffs: A new rationale for
mixed-strategy equilibrium points. International Journal of Game Theory, 2:1–23.
Hart, S. and Mas-Colell, A. (2003). Uncoupled dynamics do not lead to Nash equilibrium.
American Economic Review, 93:1830–1836.
Hendon, E., Jacobsen, H. J., and Sloth, B. (1996). The one-shot-deviation principle for
sequential rationality. Games and Economic Behavior, 12:274–282.
Hillas, J. (1998). How much of forward induction is implied by backward induction and
ordinality? Unpublished manuscript, University of Auckland.
Holmström, B. (1979). Moral hazard and observability. Bell Journal of Economics, 10:74–91.
Hopkins, E. and Seymour, R. M. (2002). The stability of price dispersion under seller and
consumer learning. International Economic Review, 43:1157–1190.
Howard, R. A. (1960). Dynamic Programming and Markov Processes. MIT Press, Cambridge.
Jackson, M. O., Simon, L. K., Swinkels, J. M., and Zame, W. R. (2002). Communication and
equilibrium in discontinuous games of incomplete information. Econometrica, 70:1711–
1740.
Jehiel, P., Meyer-ter-Vehn, M., Moldovanu, B., and Zame, W. R. (2006). The limits of ex
post implementation. Econometrica, 74:585–610.
Jehle, G. A. and Reny, P. J. (2011). Advanced Microeconomic Theory. Financial Times/Prentice
Hall/Pearson, Harlow, England, third edition.
Kohlberg, E. and Mertens, J.-F. (1986). On the strategic stability of equilibria. Econometrica,
54:1003–1037.
Kreps, D. M., Milgrom, P., Roberts, J., and Wilson, R. (1982). Rational cooperation in the
finitely repeated Prisoner’s Dilemma. Journal of Economic Theory, 27:245–252.
Laffont, J.-J. and Maskin, E. (1979). A differential approach to expected utility maximizing
mechanisms. In Laffont, J.-J., editor, Aggregation and Revelation of Preferences, pages
289–308. North Holland.
Lahkar, R. (2011). The dynamic instability of dispersed price equilibria. Journal of Economic
Theory, 146:1796–1827.
Luce, R. D. and Raiffa, H. (1957). Games and Decisions: Introduction and Critical Survey.
Wiley, New York.
Mailath, G. J. and Samuelson, L. (2006). Repeated Games and Reputations: Long-Run Rela-
tionships. Oxford University Press, Oxford.
Mailath, G. J., Samuelson, L., and Swinkels, J. M. (1993). Extensive form reasoning in
normal form games. Econometrica, 61:273–302.
Mailath, G. J., Samuelson, L., and Swinkels, J. M. (1997). How proper is sequential
equilibrium? Games and Economic Behavior, 18:193–218.
Marx, L. M. and Swinkels, J. M. (1997). Order independence for iterated weak dominance.
Games and Economic Behavior, 18:219–245. Corrigendum, 31 (2000), 324-329.
Maskin, E. (1992). Auctions and privatization. In Siebert, H., editor, Privatization. Institut
für Weltwirtschaft an der Universität Kiel.
Maskin, E. (1999). Nash equilibrium and welfare optimality. Review of Economic Studies,
66:23–38. Working paper circulated in 1977.
Maskin, E. and Riley, J. G. (1984). Monopoly with incomplete information. RAND Journal
of Economics, 15:171–196.
Maskin, E. and Tirole, J. (2001). Markov perfect equilibrium I: Observable actions. Journal
of Economic Theory, 100:191–219.
Mertens, J.-F. (1989). Stable equilibria—a reformulation. I. Definition and basic properties.
Mathematics of Operations Research, 14:575–625.
Mertens, J.-F. (1991). Stable equilibria—a reformulation. II. Discussion of the definition,
and further results. Mathematics of Operations Research, 16:694–753.
Mertens, J.-F. (1995). Two examples of strategic equilibrium. Games and Economic Behavior,
8:378–388.
Mertens, J.-F. and Zamir, S. (1985). Formulation of Bayesian analysis for games with
incomplete information. International Journal of Game Theory, 14:1–29.
Milgrom, P. and Segal, I. (2002). Envelope theorems for arbitrary choice sets. Econometrica,
70:583–601.
Milgrom, P. R. and Stokey, N. L. (1982). Information, trade, and common knowledge.
Journal of Economic Theory, 26:17–27.
Milnor, J. (1954). Games against Nature. In Thrall, R. M., Coombs, C. H., and Davis, R. L.,
editors, Decision Processes, pages 49–59. Wiley, New York.
Mirrlees, J. (1999). The theory of moral hazard and unobservable behavior, part I. Review
of Economic Studies, 66:3–21. Working paper circulated in 1975.
Morris, S. (1994). Trade with heterogeneous prior beliefs and asymmetric information.
Econometrica, 62:1327–1347.
Morris, S. (1995). The common prior assumption in economic theory. Economics and
Philosophy, 11:227–253.
Morris, S. and Shin, H. S. (2003). Global games: Theory and applications. In Dewatripont,
M., Hansen, L. P., and Turnovsky, S. J., editors, Advances in Economics and Econometrics:
Theory and Applications, Eighth World Congress, volume 1, pages 56–114. Cambridge
University Press, Cambridge.
Mussa, M. and Rosen, S. (1978). Monopoly and product quality. Journal of Economic Theory,
18:301–317.
Myerson, R. B. and Reny, P. J. (2015). Sequential equilibria of multi-stage games with
infinite sets of types and actions. Unpublished manuscript, University of Chicago.
Ollár, M. and Penta, A. (2015). Full implementation and belief restrictions. American
Economic Review, 108:2243–2277.
Osborne, M. J. and Rubinstein, A. (1994). A Course in Game Theory. MIT Press, Cambridge.
Perea, A. (2011). An algorithm for proper rationalizability. Games and Economic Behavior,
72:510–525.
Reny, P. J. (1992). Backward induction, normal form perfection, and explicable equilibria.
Econometrica, 60:627–649.
Reny, P. J. (1999). On the existence of pure and mixed strategy Nash equilibria in discon-
tinuous games. Econometrica, 67:1029–1056.
Rosenthal, R. W. (1981). Games of perfect information, predatory pricing and the chain-
store paradox. Journal of Economic Theory, 25:92–100.
Rubinstein, A. (1989). The electronic mail game: Strategic behavior under “almost com-
mon knowledge”. American Economic Review, 79:385–391.
Samuelson, L. (1992). Dominated strategies and common knowledge. Games and Economic
Behavior, 4:284–313.
Sandholm, W. H. (2010). Population Games and Evolutionary Dynamics. MIT Press, Cam-
bridge.
Sandholm, W. H., Izquierdo, S. S., and Izquierdo, L. R. (2016). Best experienced payoff
dynamics and cooperation in the Centipede game. Unpublished manuscript, University
of Wisconsin, Universidad de Valladolid, and Universidad de Burgos.
Sannikov, Y. (2007). Games with imperfectly observable actions in continuous time. Econo-
metrica, 75:1285–1329.
Shapley, L. S. (1964). Some topics in two person games. In Dresher, M., Shapley, L. S.,
and Tucker, A. W., editors, Advances in Game Theory, volume 52 of Annals of Mathematics
Studies, pages 1–28. Princeton University Press, Princeton.
Simon, L. K. and Zame, W. R. (1990). Discontinuous games and endogenous sharing rules.
Econometrica, 58:861–872.
Ståhl, I. (1972). Bargaining Theory. Stockholm School of Economics, Stockholm.
Stalnaker, R. (1998). Belief revision in games: forward and backward induction. Mathe-
matical Social Sciences, 36:31–56.
Stoye, J. (2011). Statistical decisions under ambiguity. Theory and Decision, 70:129–148.
Thompson, F. (1952). Equivalence of games in extensive form. RM 759, The Rand Cor-
poration. Reprinted in Classics in Game Theory, H. W. Kuhn, editor, pages 36–45, Princeton
University Press, Princeton (1997).
van Damme, E. (1984). A relation between perfect equilibria in extensive form games and
proper equilibria in normal form games. International Journal of Game Theory, 13:1–13.
Vickrey, W. (1962). Auctions and bidding games. In Recent Advances in Game Theory.
Princeton University Press, Princeton.
von Neumann, J. and Morgenstern, O. (1944). Theory of Games and Economic Behavior.
Princeton University Press, Princeton.
Weinstein, J. and Yildiz, M. (2007). A structure theorem for rationalizability with applica-
tion to robust predictions of refinements. Econometrica, 75:365–400.
Wilson, R. (1987). Game theoretic analysis of trading processes. In Bewley, T. F., editor,
Advances in Economic Theory: Fifth World Congress. Cambridge University Press.
Young, H. P. (2004). Strategic Learning and Its Limits. Oxford University Press, Oxford.
Zeeman, E. C. (1980). Population dynamics from game theory. In Nitecki, Z. and Robinson,
C., editors, Global Theory of Dynamical Systems (Evanston, 1979), number 819 in Lecture
Notes in Mathematics, pages 472–497, Berlin. Springer.