Game Theory 01: The Prisoner's Dilemma (Dominant and Weakly Dominant Strategies)

Fine姐

License: CC 4.0 BY-SA

Category: Algorithmic Game Theory. Tag: Algorithms

Article link: https://blog.csdn.net/moranxiao199/article/details/149341217

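Dominant and Weakly Dominant Strategies

Basic concepts of game theory
- The tragedy of the commons
- Basic setup of a game
The prisoner's dilemma
Iterated elimination of dominated strategies
- Example: a full run of iterated elimination
Weak dominance
- Example: eliminating weakly dominated strategies and its drawbacks
From this we see not only that iterated elimination of weakly dominated strategies is more fragile than iterated elimination of dominated strategies, but also that not every game can be completely solved in this way. The next chapter turns to another classic game, the battle of the sexes.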

Basic Concepts of Game Theory

Game theory
Game theory is the formal study of interactions between entities in strategic settings.

Entities
The entities may be countries, companies, other groups of people, or individuals.
In the context of game theory the entities are typically called players.

Game theory tries to model that the success of one player may depend not only on their own action, but also on the actions of some or all of the other players. For example, in this sense the stock market is a "game" in which the profit or loss that an individual trader makes depends on the actions of the other traders.

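The tragedy of the commons

Basic idea: in a system with a shared resource, each individual uses the resource according to their own best interest, without regard for the long-term interest of the group. As a result, the resource is overused and eventually exhausted.

Basic setup of a game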

A game consists of the following components:
- $n$ rational players, $[n] = \{1, \ldots, n\}$.
- Each player $i$ has a set of strategies $S_i$. From now on and until further notice, we will only consider finite strategy sets.
- Each player $i$ also has a payoff function $p_i : S_1 \times S_2 \times \cdots \times S_n \to \mathbb{R}$.
- Each player $i$ selects one strategy $s_i$ from her strategy set $S_i$. Such a collection of strategies $(s_1, s_2, \ldots, s_n)$ is also called a strategy profile.
- The players have complete information about the game. This means that each player knows not only her own payoff function, but the payoff functions of all other players as well.
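The components above can be sketched in a few lines of Python. This is only a minimal illustration: the strategy labels, the numbers in `payoff_table`, and the helper name `payoff` are all hypothetical and do not come from the lecture notes.

```python
from itertools import product

# Hypothetical 2-player game; labels and numbers are illustrative only.
strategy_sets = [("a", "b"), ("x", "y", "z")]  # S_1, S_2

# payoff_table[profile] = (p_1(profile), p_2(profile))
payoff_table = {
    ("a", "x"): (3, 1), ("a", "y"): (0, 2), ("a", "z"): (2, 0),
    ("b", "x"): (1, 4), ("b", "y"): (2, 2), ("b", "z"): (5, 1),
}

def payoff(i, profile):
    """Payoff p_i(profile) of player i (0-based index) for a strategy profile."""
    return payoff_table[profile][i]

# The set of strategy profiles is the Cartesian product S_1 x S_2.
profiles = list(product(*strategy_sets))

for profile in profiles:
    print(profile, [payoff(i, profile) for i in range(len(strategy_sets))])
```

Storing the payoffs in a dictionary keyed by strategy profile keeps the code close to the definition of $p_i$ as a function on $S_1 \times \cdots \times S_n$, which is convenient for finite games.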

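The prisoner's dilemma

Background

Two prisoners each choose between confessing and staying silent.

Situation	Outcome
Both confess	Each is sentenced to 10 years
One confesses, the other stays silent	The one who confesses gets 1 year, the one who stays silent gets 25 years
Both stay silent	Each is sentenced to 2 years

Both players want to maximize their payoff, so the payoffs are the negated prison terms: a sentence of $k$ years is a payoff of $-k$. Written as (payoff to player I, payoff to player II), the payoff matrix is:

	II confesses	II stays silent
I confesses	(-10, -10)	(-1, -25)
I stays silent	(-25, -1)	(-2, -2)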

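Solution

Each player can determine her best strategy without reasoning about the other player's action. No matter what the other player is doing, it is always better to confess.

Consider player I:
- Suppose player II confesses. Then player I serves 10 years if she confesses and 25 years if she stays silent, so she confesses.
- Suppose player II stays silent. Then player I serves 1 year if she confesses and 2 years if she stays silent, so she again confesses.
So whichever action player II chooses, player I confesses: confessing is player I's best strategy. By symmetry, the same holds for player II.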
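The case analysis for player I can be checked mechanically. The sketch below encodes the prison terms from the text; the names `YEARS`, `ACTIONS`, and `best_response_of_player_one` are chosen here for illustration.

```python
# Years in prison, indexed by (action of player I, action of player II).
YEARS = {
    ("confess", "confess"): (10, 10),
    ("confess", "silent"): (1, 25),
    ("silent", "confess"): (25, 1),
    ("silent", "silent"): (2, 2),
}
ACTIONS = ("confess", "silent")

def best_response_of_player_one(action_two):
    """Action of player I with the highest payoff (fewest years) against action_two."""
    return max(ACTIONS, key=lambda a: -YEARS[(a, action_two)][0])

for action_two in ACTIONS:
    print(action_two, "->", best_response_of_player_one(action_two))
# confess -> confess
# silent -> confess
```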

Definition of dominant strategies

Definition 2.1
Let $s$ and $t$ be two strategies of player $i$. Then $s$ dominates $t$ if for any fixed actions of the other players, the payoff to player $i$ when using strategy $s$ is strictly higher than when using strategy $t$. We say that $t$ is dominated (by $s$).

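That is, the inequality must be strict ($>$); $\geq$ is not enough.
In plain terms: no matter how the other players play, $s$ gives me a strictly better payoff than $t$. A strategy that dominates all of my other strategies in this way is a dominant strategy.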
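Definition 2.1 translates fairly directly into a check over all strategy choices of the other players. The following is a sketch under the assumption that a game is given as a list of strategy sets plus a function `payoff(i, profile)`; that interface and the prisoner's dilemma encoding below it are illustrative choices, not part of the notes.

```python
from itertools import product

def dominates(payoff, strategy_sets, i, s, t):
    """True if strategy s of player i strictly dominates strategy t (Definition 2.1)."""
    others = [S for j, S in enumerate(strategy_sets) if j != i]
    for rest in product(*others):
        # Re-insert player i's strategy at position i to form full profiles.
        profile_s = rest[:i] + (s,) + rest[i:]
        profile_t = rest[:i] + (t,) + rest[i:]
        if not payoff(i, profile_s) > payoff(i, profile_t):
            return False  # strict inequality fails for this choice of the others
    return True

# Prisoner's dilemma: payoff = negated years in prison.
PD_YEARS = {
    ("confess", "confess"): (10, 10),
    ("confess", "silent"): (1, 25),
    ("silent", "confess"): (25, 1),
    ("silent", "silent"): (2, 2),
}
PD_SETS = [("confess", "silent"), ("confess", "silent")]

def pd_payoff(i, profile):
    return -PD_YEARS[profile][i]

print(dominates(pd_payoff, PD_SETS, 0, "confess", "silent"))  # True
print(dominates(pd_payoff, PD_SETS, 0, "silent", "confess"))  # False
```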

Summary

In other words, we found a solution of the game. By our logic, we have deduced that the outcome of the game has to be that both players confess and they will spend 10 years in prison. Although both players know that they would do better if they both agreed to stay silent, this is not the outcome we predict due to the way the game is set up. We have seen a similar effect in the tragedy of the commons.

Iterated elimination of dominated strategies

Iterated elimination of dominated strategies: The process we have used here, namely the repeated elimination of dominated strategies to reduce the size of the game more and more, is called iterated elimination of dominated strategies.

Dominance solvable: For some games, such as the prisoner's dilemma, this method can be used to completely solve the game, i.e., to obtain a single, forced outcome. Games that can be completely solved in this way are called dominance solvable.

Note: A game may contain several dominated strategies at the same time, and any of them may be eliminated first. The order of elimination does not matter; the final result is always the same.
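For a two-player game, the procedure can be sketched as a loop that keeps removing strictly dominated strategies until none is left. The function names and the dictionary layout `payoffs[(row, col)] = (payoff I, payoff II)` are assumptions made for this illustration; the example data is the prisoner's dilemma from above.

```python
def strictly_dominated(payoffs, rows, cols, player):
    """Return one strictly dominated strategy of `player` (0 = row, 1 = column), or None."""
    own, other = (rows, cols) if player == 0 else (cols, rows)

    def u(mine, theirs):
        profile = (mine, theirs) if player == 0 else (theirs, mine)
        return payoffs[profile][player]

    for t in own:
        for s in own:
            if s != t and all(u(s, o) > u(t, o) for o in other):
                return t
    return None

def iterated_elimination(payoffs, rows, cols):
    """Repeatedly remove strictly dominated strategies; return the reduced game."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for player, own in ((0, rows), (1, cols)):
            t = strictly_dominated(payoffs, rows, cols, player)
            if t is not None:
                own.remove(t)
                changed = True
    return rows, cols

# Prisoner's dilemma with payoffs = negated years in prison.
PD = {
    ("confess", "confess"): (-10, -10),
    ("confess", "silent"): (-1, -25),
    ("silent", "confess"): (-25, -1),
    ("silent", "silent"): (-2, -2),
}
print(iterated_elimination(PD, ["confess", "silent"], ["confess", "silent"]))
# (['confess'], ['confess'])
```

A single remaining profile means the game is dominance solvable; if more than one strategy survives for some player, the method has only reduced the game.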

Example: a full run of iterated elimination

[Figure: payoff matrix]
First, strategy 4 of player I is dominated by strategy 3. Remove strategy 4.

Then strategy A of player II is dominated by strategy C. Remove strategy A.
[Figure: payoff matrix]
Then strategy 5 of player I is dominated by strategy 2. Remove strategy 5.

Then strategy B of player II is dominated by strategy E. Remove strategy B.

The remaining eliminations proceed in the same way.

Definition of weak dominance

Definition 2.2
Let $s$ and $t$ be two strategies of player $i$. Strategy $s$ weakly dominates $t$ if for any fixed strategies of the other players, the payoff to player $i$ when using $s$ is at least as high as when using $t$, and in at least one case strictly higher.

In plain terms: weak dominance relaxes the strict $>$ of dominance to $\geq$, while still requiring $>$ in at least one case.
Property: when several strategies are weakly dominated at the same time, the result can depend on the order in which they are removed. This is unlike the dominated strategies of the note above, where the order of elimination does not matter.
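The difference between the two definitions is easiest to see side by side. In this sketch each strategy is represented by the list of payoffs it earns against each possible choice of the other players; that representation and the function names are assumptions made for illustration.

```python
def strictly_dominates(payoffs_s, payoffs_t):
    """Definition 2.1: s is strictly better than t in every case."""
    return all(a > b for a, b in zip(payoffs_s, payoffs_t))

def weakly_dominates(payoffs_s, payoffs_t):
    """Definition 2.2: s is never worse than t and strictly better at least once."""
    pairs = list(zip(payoffs_s, payoffs_t))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)

print(weakly_dominates([1, 2], [1, 1]))    # True: equal once, strictly better once
print(strictly_dominates([1, 2], [1, 1]))  # False: the tie breaks strict dominance
print(weakly_dominates([1, 1], [1, 1]))    # False: never strictly better
```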

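Example: eliminating weakly dominated strategies and its drawbacks

In this game, strategy 2 of player I weakly dominates both strategy 1 and strategy 3, so either one may be eliminated first.
[Figure: payoff matrix]
If strategy 1 is eliminated first, we obtain a game that is not completely solved because there is no difference between strategy 2 and 3.

If strategy 3 is eliminated first, we likewise obtain a game that is not completely solved.

Clearly, the two choices lead to different results.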
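The order dependence can be reproduced on a small game. The payoff numbers below are hypothetical: they are chosen so that strategy 2 of player I weakly dominates strategies 1 and 3, and they are not necessarily those of the figure. The helper names are likewise illustrative.

```python
# Hypothetical game: PAYOFFS[(row, col)] = (payoff to player I, payoff to player II).
PAYOFFS = {
    (1, "A"): (1, 1), (1, "B"): (0, 0),
    (2, "A"): (1, 1), (2, "B"): (2, 1),
    (3, "A"): (0, 0), (3, "B"): (2, 1),
}

def weakly_dominated(rows, cols, player, t):
    """True if strategy t of `player` (0 = row, 1 = column) is weakly dominated."""
    own, other = (rows, cols) if player == 0 else (cols, rows)

    def u(mine, theirs):
        profile = (mine, theirs) if player == 0 else (theirs, mine)
        return PAYOFFS[profile][player]

    for s in own:
        if s == t:
            continue
        diffs = [u(s, o) - u(t, o) for o in other]
        if min(diffs) >= 0 and max(diffs) > 0:
            return True
    return False

def eliminate_in_order(order):
    """Apply eliminations (player, strategy) in the given order, checking each one."""
    rows, cols = [1, 2, 3], ["A", "B"]
    for player, t in order:
        if not weakly_dominated(rows, cols, player, t):
            raise ValueError(f"{t} is not weakly dominated at this point")
        (rows if player == 0 else cols).remove(t)
    return rows, cols

first_1 = eliminate_in_order([(0, 1), (1, "A")])  # remove strategy 1 first
first_3 = eliminate_in_order([(0, 3), (1, "B")])  # remove strategy 3 first
print(first_1)  # ([2, 3], ['B'])
print(first_3)  # ([1, 2], ['A'])
```

In both runs no weakly dominated strategy remains, yet the surviving games differ: one leaves player II with B, the other with A.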