Basics of Game Theory
Game theory is the mathematical analysis of strategic interaction. It can be applied in situations where two or more agents make decisions that affect each other, such as political negotiations, gambling (in games that are not only about luck) and economic behaviour. In economics, game theory is the appropriate tool when agents (e.g. firms or stock dealers) interact directly, not only through the market. On this course we will briefly introduce some basic ideas of the theory of two-person games (where the word “person” need not be taken literally).
Let’s examine a simple game with two players (A and B), each of whom has two possible strategies to choose from. A player’s payoff depends on both his own choice and the other player’s choice, and neither player knows the other’s choice before making his own. The game can be described easily by a payoff matrix, which could be e.g. as follows:
(Example 1)
                    Player B
                    B1        B2
Player A    A1      1,3       0,1
            A2      2,0       3,−1
So player A can choose from strategies A1 and A2, and player B can choose from strategies B1 and B2. Player A is often called the row player and player B the column player, and the possible strategies “top” and “bottom” (the strategies of the row player) and “left” and “right” (the strategies of the column player). The first entry of each pair of numbers in the payoff matrix is the payoff to player A, and the second entry is the payoff to player B. For example, if A chooses A1 and B chooses B2, the payoff to A will be 0 and the payoff to B will be 1. If the chosen strategies are A2 and B2, player A will get a payoff of 3 and player B a payoff of −1 (which means that B loses 1).
In this example both players have a dominant strategy: for player A it’s always better to choose A2, and for player B it’s always better to choose B1, no matter what the other player chooses. The dominant strategies form an equilibrium in this game: A plays strategy A2 and B plays strategy B1, and the alternative strategies are out of consideration.
By making some changes in the payoff matrix we get a game without dominant strategies:
(Example 2)
                    Player B
                    B1        B2
Player A    A1      1,3       0,0
            A2      0,0       3,1
Now the optimal strategy for player A depends on the choice of player B. If B chooses B1, it’s better for A to choose A1, but if B chooses B2, the best strategy for player A is A2. Likewise, the best strategy for player B depends on the choice of player A. Let’s suppose that A expects player B to choose B1 and decides to choose A1. If the game is repeated, it’s profitable for neither of the players to change strategy. That is a Nash equilibrium, which is defined for two-person games as a pair of strategies where A’s choice is optimal given B’s choice and B’s choice is optimal given A’s choice. If A and B each suppose that the other player keeps his strategy, it’s best to keep one’s own strategy too. So in this example the pair of strategies (A1,B1) is a Nash equilibrium. There can be more than one Nash equilibrium in a game: in Example 2, (A2,B2) is also a Nash equilibrium. The concept of Nash equilibrium was introduced in 1951 by the famous American mathematician John Nash, who received the Nobel Prize in economics in 1994.
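To make the definition concrete, here is a minimal Python sketch (an illustration, not part of the original text) that finds the pure-strategy Nash equilibria of a 2×2 game by checking the best-response condition for every strategy pair, using the payoffs of Example 2:

# Payoffs of Example 2: payoff_A[i][j] and payoff_B[i][j] are the payoffs
# when A plays row i and B plays column j.
payoff_A = [[1, 0], [0, 3]]
payoff_B = [[3, 0], [0, 1]]

def pure_nash_equilibria(pa, pb):
    equilibria = []
    for i in range(2):
        for j in range(2):
            # (i, j) is an equilibrium if A cannot gain by switching rows
            # and B cannot gain by switching columns.
            a_best = all(pa[i][j] >= pa[k][j] for k in range(2))
            b_best = all(pb[i][j] >= pb[i][k] for k in range(2))
            if a_best and b_best:
                equilibria.append(("A" + str(i + 1), "B" + str(j + 1)))
    return equilibria

print(pure_nash_equilibria(payoff_A, payoff_B))  # [('A1', 'B1'), ('A2', 'B2')]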
If the players make their selection only once and keep it, the selection is called a pure strategy. Imagine that you are player A in the following game:
(Example 3)
                    Player B
                    B1        B2
Player A    A1      0,0       0,−1
            A2      1,0       −1,3
If B plays B1, your optimal choice is A2. But if the game is repeated and you keep your choice, player B will probably change to B2, where his payoff is 3 instead of 0. Then you want to change to A1, where your payoff is 0 instead of −1. If B supposes that you will stay in A1, he changes back to B1 – and we are back at the starting point.
How would you play this kind of game? Probably you try to guess what B will do next, knowing that B is trying to guess your move, and that B knows that you are trying to guess B’s move, and so on. Both players are also observing how the other plays, maybe trying to find some kind of regular behaviour.
Such speculation is unnecessary if the selection of the strategy is randomised so that each of the strategies is chosen with a certain probability (the probabilities may be equal or unequal, but their sum must be 1). This is called a mixed strategy.
It can be shown that in Example 3 the optimal strategy for player A is to choose A1 with probability 3/4 and A2 with probability 1/4 when B chooses B1 and B2 with equal probabilities (1/2), and that choosing with equal probabilities is the optimal strategy for B when A plays A1 and A2 with probabilities 3/4 and 1/4. This is a Nash equilibrium in mixed strategies.
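Where do these probabilities come from? A standard way to derive them (not spelled out in the original text) is the indifference argument: in a mixed equilibrium each player’s probabilities make the other player indifferent between his pure strategies. If A plays A1 with probability p, B’s expected payoff from B1 is 0·p + 0·(1−p) = 0 and from B2 it is −1·p + 3·(1−p) = 3 − 4p; setting 3 − 4p = 0 gives p = 3/4. Similarly, if B plays B1 with probability q, A’s expected payoff from A1 is 0 and from A2 it is 1·q − 1·(1−q) = 2q − 1; setting 2q − 1 = 0 gives q = 1/2.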
The well-known game “rock paper scissors” is a simple example of a game where the mixed strategy with equal probabilities is the optimal strategy. In this game there are three rows and three columns in the payoff matrix, since both players have three alternative choices. See the website http://www.gametheory.net/dictionary/Games/RockPaperScissors.html. Note that when playing “rock paper scissors” in real life there usually isn’t any method of choosing absolutely randomly (the human mind cannot do that). If you play “rock paper scissors” against a computer program which uses the mixed strategy, choosing each of the possibilities randomly with equal probabilities, you cannot expect to do better than break even in the long run.
How can the optimal strategies be found? We’ll consider this in the case of two-person zero-sum games, where the payoff to one player equals the loss of the other player. Zero-sum games are games of pure competition. The game described in Example 2, by contrast, is a game of cooperation: when A gets a payoff, B can also get a payoff.
(Example 4)
                    Player B
                    B1        B2
Player A    A1      2,−2      3,−3
            A2      4,−4      1,−1
In this game player A tries to maximize his payoff and player B tries to minimize his loss. There isn’t a dominant strategy, because A’s optimal choice depends on B’s choice and vice versa. So it’s reasonable for player A to choose a mixed strategy. If A plays A1 with probability p, his expected payoff when B plays B1 is 2p + 4(1−p) = −2p + 4. If B plays B2, A’s expected payoff is 3p + 1·(1−p) = 2p + 1. What’s the best value for p?
[Figure: the expected payoffs −2p + 4 and 2p + 1 plotted as functions of p on the interval [0, 1]; the lines intersect at p = 0,75.]
The minimum of the two functions attains its maximal value at the intersection of the lines. The exact value of p can be found by solving the equation −2p + 4 = 2p + 1. It’s easy to see that the solution is p = 3/4 = 0,75. Thus A’s optimal strategy based on the maximin criterion is the mixed strategy with probabilities 3/4 and 1/4.
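As a cross-check, here is a minimal sketch (an illustration, not part of the course material) that finds the same maximin probability numerically by maximizing the worse of A’s two expected payoffs over a grid of p values:

# Player A's expected payoffs in Example 4: -2p + 4 against B1, 2p + 1 against B2.
# The maximin strategy maximizes the smaller of the two expected payoffs.
best_value, best_p = max(
    (min(-2 * p + 4, 2 * p + 1), p) for p in (i / 1000 for i in range(1001))
)
print(best_p, best_value)  # 0.75 and 2.5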
According to the classical definition, the probability of an event A is P(A) = nA/n, where nA is the number of outcomes favourable to A and n is the total number of equally likely outcomes. For example, let’s roll a pair of standard 6-sided dice and consider the event A = "the sum is 10". The outcomes favourable to event A are (4,6), (6,4) and (5,5), so nA = 3. The total number of outcomes is n = 6·6 = 36. Thus P(A) = 3/36 = 1/12 ≈ 8,3 %.
The classical definition is useful only if the numbers nA and n can be determined. That is not possible in the following example: let’s throw a small stone into a box (without a cover) whose bottom has an area of four square meters. What is the probability that the stone hits a one square meter circle drawn on the bottom of the box?
There are (at least) two ways to evaluate the chance. The geometric approach gives the probability 1/4. Then it must be assumed that the stone can hit anywhere on the bottom of the box with equal probability – no part of the bottom can be preferred.
Another way is the statistical approach: let’s throw the stone into the box repeatedly, say 1000 times, and check how many times the stone hits the circle. If the number of hits is 234, we can estimate the probability to be approximately 234/1000 ≈ 0,23.
Note: when throwing the stone another 1000 times, the number of hits can be something else (and probably it is). A probability calculated with the statistical method is only an estimate. The more throws there are, the better the estimate can be supposed to be.
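The statistical approach is easy to simulate. Below is a minimal Monte Carlo sketch (with assumed geometry: a 2 m × 2 m bottom and a circle of area one square meter centered in it):

import math, random

def estimate_hit_probability(throws=1000):
    radius = math.sqrt(1 / math.pi)    # circle of area 1 square meter
    hits = 0
    for _ in range(throws):
        x = random.uniform(-1.0, 1.0)  # landing point, uniform over the 2 m x 2 m bottom
        y = random.uniform(-1.0, 1.0)
        if math.hypot(x, y) <= radius:
            hits += 1
    return hits / throws

print(estimate_hit_probability())      # close to the geometric value 1/4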
2 Combinatorics
When using the classical approach the numbers nA and n must be determined. In the example above it was easy, but sometimes it’s more complicated. What is the number of possible starting hands in Texas hold’em poker? What about the number of possible hands including a pair of aces? Such problems belong to combinatorics. Combinatorial problems of this kind can usually be solved using the following principles.
Rule of product
Let’s suppose that we have to make n selections, the number of possibilities being x1 in the first selection, x2 in the second selection and so on. Then the total number of different sequences of selections is
x1 · x2 · ... · xn .
Example: If a menu includes three starters, four main dishes and two desserts, then the number of different meals (including a starter, a main dish and a dessert) is 3 · 4 · 2 = 24.
Permutations with repetitions
Let’s choose k objects from a set of n objects, so that every object can be chosen more than once and the order matters. Using the rule of product we can evaluate the number of different sequences of choices to be n^k. For example, the number of different passwords consisting of six lowercase characters from the alphabet a–z is 26^6 = 308 915 776. (Selection of the password can be thought of as a sequence of six choices, each of them having 26 possibilities.)
Permutations without repetitions
If an object can be chosen only once, the number of possibilities decreases by one after each selection. So the total number of different sequences is
n · (n−1) · (n−2) · ... · (n−k+1) = n!/(n−k)! .
That is the number of k-permutations of a set of n objects.
Example: How many possibilities are there to pick five cards in sequence from a standard deck of 52 different cards (when the order matters)? The answer is 52 · 51 · 50 · 49 · 48 = 311 875 200.
In particular, if k is equal to n, the number of k-permutations is actually the number of possibilities to order n objects into a sequence. Using the principle above we get n · (n−1) · (n−2) · ... · 1 = n!. (So if you didn’t know what the notation n! (read “n factorial”) in the formula above means, now you know.)
Combinations
How many possibilities are there to choose k objects from a set of n objects when the order doesn’t matter and each object can be chosen only once? That’s the number of k-combinations of a set of n objects.
There are n!/((n−k)! k!) different ways: it’s the number of k-permutations divided by k!, since each k-combination can produce k! different permutations. The expression n!/((n−k)! k!) is useful in many areas of mathematics and is called the binomial coefficient, shortly denoted C(n,k) and read “n choose k” (maybe you have the key [nCr] for it on your calculator).
Example: How many possible starting hands (five cards from the deck) are there in poker? As seen above, the number of ways to get five cards from the deck in a certain order is 52 · 51 · 50 · 49 · 48 = 311 875 200. But it doesn’t matter in which order you get the cards from the dealer, so the number of 5-permutations must be divided by 5! to get the answer:
(52 · 51 · 50 · 49 · 48)/5! = 2 598 960.
Note that it’s equal to 52!/((52−5)! 5!) = C(52,5).
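All of the counts above can be checked with a few lines of Python (math.perm and math.comb require Python 3.8 or newer):

import math

print(26**6)                # passwords of six lowercase letters: 308915776
print(math.perm(52, 5))     # ordered picks of 5 cards from 52: 311875200
print(math.comb(52, 5))     # poker starting hands: 2598960
print(math.perm(52, 5) // math.factorial(5) == math.comb(52, 5))  # True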
Let A and B be events. Then the following rules for the probabilities are valid.
1a) P(A or B) = P(A) + P(B) − P(A and B),
2a) P(A and B) = P(A) · P(B | A),
3) P(Ac) = 1 − P(A), where Ac is the complement of A, the event “not A” or “event A doesn’t happen”,
4) P(B | A) = P(A and B)/P(A).
Formula 1a: A shooter’s probability of hitting a target is 0,8. He shoots twice. What is the probability that at least one of the shots hits the target?
Solution. Let’s denote the events as follows: A = ”the first shot hits” and B = ”the second shot hits”. The probability asked for is P(A or B) = P(A) + P(B) − P(A and B) = 0,8 + 0,8 − 0,8 · 0,8 = 0,96. Formula 2a was also applied when calculating P(A and B) = 0,8 · 0,8 (the shots are independent, so P(B | A) = P(B) = 0,8).
Remember that the event ”A or B” can be expressed as “only A happens, or only B happens, or both of them happen”. The problem can also be solved from that point of view:
P(A or B) = 0,8 · 0,2 + 0,2 · 0,8 + 0,8 · 0,8 = 0,96. The result is the same as above.
Formula 2a: Shooters Jack and John both have probability 0,7 of hitting the target. But if one of the shooters, just before his turn to shoot, observes the other shooter hitting the target, his chance to hit drops to 0,4 (because of stress). Jack and John shoot one after another, Jack first. What is the probability that both of them hit the target?
Solution. Let A = ”Jack hits” and B = ”John hits”. Now the probability to be calculated is
P(A and B) = P(A) · P(B | A) = 0,7 · 0,4 = 0,28.
Formula 3: Here is yet another way to solve the first example. If D = ”at least one shot hits the target”, then Dc = ”both shots miss”. Thus P(D) = 1 − P(Dc) = 1 − 0,2 · 0,2 = 0,96.
Formula 4: Three persons have probabilities 0,5, 0,6 and 0,9 of hitting the target. Each of them shoots, and after that one (and only one) hit is found in the target. What’s the probability that it was the best shooter (the shooter with hit rate 0,9) who hit the target?
Solution. Let B = ”the best shooter hits” and A = ”exactly one hits, when everyone shoots once”. Using formula 4 we get
P(B | A) = P(A and B)/P(A) = (0,5 · 0,4 · 0,9)/(0,5 · 0,4 · 0,9 + 0,5 · 0,6 · 0,1 + 0,5 · 0,4 · 0,1) ≈ 0,782.
Note that P(A) is the sum of the probabilities of the events “one hit and two misses”. There are three different ways that can happen.
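The same conditional probability can be verified by brute force. A minimal sketch (illustrative, not from the course material): enumerate all eight hit/miss outcomes of the three shooters and apply P(B | A) = P(A and B)/P(A).

from itertools import product

hit_rates = [0.5, 0.6, 0.9]
p_one_hit = p_one_hit_by_best = 0.0
for outcome in product([0, 1], repeat=3):      # 1 = hit, 0 = miss
    p = 1.0
    for shot, rate in zip(outcome, hit_rates):
        p *= rate if shot else (1 - rate)
    if sum(outcome) == 1:                      # event A: exactly one hit
        p_one_hit += p
        if outcome[2] == 1:                    # event B: the best shooter hit
            p_one_hit_by_best += p

print(p_one_hit_by_best / p_one_hit)           # ≈ 0.782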
Reliability Engineering and Reliability Theory
Let us denote by T the random variable “length of the time period the unit works properly before the first failure” (or the time between failures). The unit was first started or introduced at moment t = 0. Usually the distribution of T is continuous, and very often it can be supposed to be a normal or exponential distribution.
Note that R(t) = 1 − P(T ≤ t) = 1 − F(t), where F is the cumulative distribution function of T, defined as
F(t) = ∫₀ᵗ f(s) ds
(the integral of the failure density f from 0 to t).
Failure density and failure rate can be understood very concretely by supposing that very many (e.g. 1000) similar units are introduced at moment t = 0, and observing only the first failure of each unit. Then the failure density f(t) can be considered as the proportion of units failing per time unit. The failure rate h(t) can be thought of as a “conditional density function”: it is also the proportion of units failing per time unit, but counted only among the units still working. For example, if f(t1) = 0,12 1/h and R(t1) = 0,333, then h(t1) = (0,12 1/h)/0,333 ≈ 0,36 1/h. So, at moment t = t1, 12 % of all units fail within an hour, but because only about 1/3 of the units are still working, the percentage of failed units per hour among them, the failure rate, is 3 times as high as the failure density.
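For the exponential distribution, often used as a failure-time model as noted above, these three functions take a particularly simple form. A minimal sketch with an illustrative failure rate value:

import math

lam = 0.36  # assumed failure rate, 1/h (an illustrative value)

def R(t): return math.exp(-lam * t)        # reliability, P(T > t)
def f(t): return lam * math.exp(-lam * t)  # failure density of T
def h(t): return f(t) / R(t)               # failure rate f(t)/R(t)

print(R(1.0), f(1.0), h(1.0))  # h(t) equals lam at every t: constant failure rate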
“The best guess” for the mean (average) of the values in the whole population is the mean of the values in the sample. It can be proved that the sample mean over several random samples is approximately normally distributed, regardless of the distribution of the variable being sampled. This is the basis of mean estimation methods. The expected value of the sample mean is the population mean; that’s why the sample mean is called an unbiased estimate of the population mean. (Note the difference between the original distribution and the distribution of the sample mean over several samples. The standard deviation of the distribution of sample means, called the standard error, isn’t the same as the standard deviation of the original distribution.)
If we want to know the mean income in a country, we can pick, say, 2000 employees and ask them their salary. The average of those salaries is our estimate of the mean income. Obviously it is the best estimate, but an interesting question is: how good is the estimate?
First, the sample should be selected sensibly: it should represent the population well. If all employees are selected from the same company or the same city, the sample probably isn’t representative. See pages 133-136 in the coursebook for different methods of getting a usable sample.
Let’s suppose that the sampling is done properly. Then the goodness of the estimate depends on the sample size – and on fortune. The error risk can be calculated, and it can be expressed in exact form using confidence intervals. A confidence interval with confidence level p is the interval, centered on the sample mean, that includes the population mean with probability p.
For example, if a poll found that 16 % of the voters support party A and the error margin of the poll is two percentage points with confidence level 0,95, then the real proportion of supporters of party A lies in the interval [14 %, 18 %] with probability 0,95. (And there’s still a 5 % possibility that the proportion is something else.)
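A minimal sketch of the poll example (the sample size 1000 is an assumption for illustration), using the normal approximation for a proportion:

import math

p_hat, n = 0.16, 1000   # observed support 16 % and an assumed sample size
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)   # 95 % confidence level
print(p_hat - margin, p_hat + margin)  # roughly the interval [0.137, 0.183]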
1. Introduction
But what does “as well as possible” mean? A commonly used and statistically reasonable criterion is based on minimizing the sum of squared residuals. Let us express the data points as (x1,y1), (x2,y2), ..., (xn,yn). Residual means the difference between the given (measured) value yi and the calculated value kxi + b. So the parameters k and b should have values such that the sum
(y1 − (kx1 + b))² + (y2 − (kx2 + b))² + ... + (yn − (kxn + b))²
is minimized. This is the same thing as finding the least squares solution of the linear system of equations
kx1 + b = y1
kx2 + b = y2
...
kxn + b = yn
The least squares solution of the system of equations minimizes the sum of squared differences between the left and right sides of the equations.
Every linear system of equations can be expressed as a matrix equation AX = B. It can be shown that the least squares solution of the equation AX = B is X = (A^T A)^(-1) A^T B, where A^T is the transpose of the coefficient matrix A and ^(-1) denotes the matrix inverse. X is the column vector containing the variables to be solved (in the example above k and b) and B is the column vector containing the constants of the system (in the example above y1, y2, ..., yn).
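A minimal sketch of this formula with NumPy (the data points are made up for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
A = np.column_stack([x, np.ones_like(x)])  # each row is [x_i, 1]
X = np.linalg.inv(A.T @ A) @ A.T @ y       # X = (A^T A)^(-1) A^T B, here B = y
print(X)                                   # [k, b]; np.linalg.lstsq(A, y, rcond=None) agrees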
More generally, we can ask which function of the form
f(x) = c1·f1(x) + c2·f2(x) + ... + cn·fn(x)    (*)
fits the data points (x1,y1), (x2,y2), ..., (xn,yn) as well as possible. The functions f1, f2, …, fn are known functions and the coefficients c1, c2, …, cn are to be determined.
If the function to be fitted to the data points is not of the form (*), it’s a matter of nonlinear regression. For example: find the parameters A and w to fit the function f(x) = A·sin(wx) to given data points (in this example A represents the amplitude of the sine curve, and w the so-called angular frequency).
With Excel’s Solver add-in the least squares solutions can be determined easily. Some simple forms of regression can also be done in Excel using the Add Trendline option.
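Outside Excel, the same nonlinear least squares fit can be sketched with SciPy’s curve_fit, which minimizes the same sum of squared residuals (the data below is synthetic, with assumed true values A = 2 and w = 1,5):

import numpy as np
from scipy.optimize import curve_fit

def model(x, A, w):
    return A * np.sin(w * x)

x = np.linspace(0, 10, 50)
y = 2.0 * np.sin(1.5 * x) + np.random.normal(0, 0.1, x.size)  # synthetic data
(A, w), _ = curve_fit(model, x, y, p0=[1.0, 1.0])  # a starting guess is needed
print(A, w)  # should be close to 2.0 and 1.5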
1. Introduction
The term “coursebook” below means Evans’ book Statistics, Data Analysis and Decision Modeling. The page and question numbers are from the 4th edition, but the corresponding numbers in the 3rd edition are also mentioned.
Hypothesis testing is a tool for making valid conclusions from statistical data. A simple example is to assess whether the difference between the averages of two different samples is significant or not. If in a certain area the average price in residential sales was 2800 euros per square meter in January 2008 and only 2500 €/m² in January 2009, can it be concluded that prices in the area are going down? In addition to the averages, we should know at least the sample sizes and variances. (For example: almost nothing can be concluded if only one apartment in poor condition was sold in the area in January 2009.)
In hypothesis testing two alternatives are defined: the null hypothesis, which represents the “status quo” – that there aren’t any changes (for example in residential prices), or that the commonly accepted theory or understanding is true – and the alternative hypothesis, which must be true if the null hypothesis is concluded to be false. Typically, the null hypothesis must be falsified with very high probability; it must be “almost sure” that the null hypothesis is false (and the alternative hypothesis is true). This method is widely used in science, but also in practical applications where incorrect conclusions from statistical data can be very harmful.
In statistical inference, even if the most sophisticated methods are used, there is usually some level of risk of making incorrect conclusions. When testing a hypothesis, a level of significance must be selected to define a risk that is small enough to be accepted. The level of significance, typically denoted by α, is defined exactly as the probability of making a so-called Type I error, i.e. the probability of incorrectly rejecting a null hypothesis which actually is true – and incorrectly concluding that the alternative hypothesis is true. See also the terms confidence coefficient, Type II error and power of a test on pages 177-178 (3rd edition: pages 163-164) in the coursebook.
How to test a hypothesis? There are lots of different methods, depending on what is being tested and what kind of statistical data is available. See the details in the coursebook, and use the Data Analysis tool of Excel to solve the following problems.
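The coursebook’s workflow uses Excel, but as a rough illustration here is the residential price example as a two-sample t-test in Python (the individual observations are invented for the sketch):

from scipy import stats

prices_2008 = [2750, 2900, 2810, 2780, 2860]  # hypothetical €/m² observations
prices_2009 = [2480, 2550, 2420, 2600, 2450]
t_stat, p_value = stats.ttest_ind(prices_2008, prices_2009, equal_var=False)
print(t_stat, p_value)  # reject the null "no change" if p_value < significance level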
LAST EXAM QUESTIONS
a) What’s the difference between a discrete and a continuous probability distribution?
Discrete probability distribution: describes a random variable with a finite or countable set of possible values. Each possible value has a known probability.
Continuous probability distribution: describes a random variable whose possible values form a continuum, so probabilities are assigned to intervals of values (e.g. through the cumulative distribution function). Any single value has probability 0 – in contrast to a discrete distribution.
b) What features of the normal distribution do the expected value and standard deviation describe? If you change the expected value, how does the pdf curve of the distribution change? What about if you change the standard deviation?
Expected value: describes the location of the distribution, i.e. where the values of the random variable are centered. Changing the expected value shifts the pdf curve along the x-axis without changing its shape.
Standard deviation: is the square root of the variance and describes the variation around the mean. Changing the standard deviation changes the shape of the pdf curve: a larger standard deviation gives a wider and flatter curve, a smaller one a narrower and taller curve.
a) Weight of a mechanical machine part, when there are similar parts produced in a factory, the mean weight being 1250 grams and the standard deviation 20 grams.
Normal distribution N(1250; 20).
c) Number of interruptions in a production process during one week, when there happen on average 0,5 interruptions in a day.
Poisson distribution. Parameter (expected value): 7 · 0,5 = 3,5.
Case 1: You should find out whether a recent advertisement campaign has had any effect on the sales of your company or not. You have sales data from before and after the campaign.
Case 2: The mean time between failures in a certain kind of system has been analyzed, and using this information you should find out the risk of the system failing during the next month.
Case 3: The management of a company wanted to know whether there is any connection between the time an employee needs to finish a certain kind of project and previous experience from similar projects, and if there is, what kind of connection it is (e.g. how much one year of work experience would shorten – or lengthen! – the time needed to finish the project).
Case 1: Hypothesis testing. The null hypothesis is “no change in sales”, and probably it’s best to compare average daily sales before and after the campaign.
Case 2: If the mean time between failures is constant (as obviously can be supposed), let’s denote it by T (months). Then the number of failures during one month is Poisson distributed with expected value 1/T. The probability of the system failing (at least once) during the next month is one minus the probability of zero failures, which by the formula of the Poisson distribution gives P = 1 − exp(−1/T).
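A quick numeric check of the Case 2 formula (T = 2 months is an assumed value for the sketch): the probability of at least one failure in a month is one minus the Poisson probability of zero failures.

import math
from scipy.stats import poisson

T = 2.0                           # assumed mean time between failures, months
print(1 - poisson.pmf(0, 1 / T))  # P(at least one failure during one month)
print(1 - math.exp(-1 / T))       # the same value from the formula directly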
Case 3: Regression analysis. At first it’s probably reasonable to try linear regression. The coefficient of determination (R²) can be used to analyze whether there is a connection between the variables. If there is a strong connection, the slope of the line can be used to estimate how much the experience affects the time needed to finish the project.
There can be other possibilities too: hypothesis testing could also be applied in Case 3.
The probability distribution tells the probabilities of the different values of a random variable.
If the distribution is binomial, the probability for the random variable to have the value k can be calculated with the command
" =BINOMDIST(k, n, p, 0) "
This gives the probability of getting k successes when making n repeats with success probability p.
The probabilities can be drawn as a bar diagram. Let n = 10 and p = 0,2.
[Bar diagram of the binomial probabilities P(X = k), k = 0, ..., 10, for n = 10 and p = 0,2; the values are listed below.]

k     P(X = k)
0     0,107374182
1     0,268435456
2     0,301989888
3     0,201326592
4     0,088080384
5     0,026424115
6     0,005505024
7     0,000786432
8     7,3728E-05
9     4,096E-06
10    1,024E-07
For example, we can suppose that you have a probability of 20 % to score from a penalty kick in football. If you take 10 penalty kicks, the probability of scoring exactly 4 times is about 8,8 %.
As you can see, the expected value, n·p = 10 · 0,2 = 2, has the largest probability. However, this is not always the case. For example, if the probability to score is 15 %, the expected value is 1,5, whose probability is zero, since 1,5 cannot be the number of scores in 10 penalty kicks! But the highest probability in the binomial distribution is always at the values near the expected value.
Note that a random process can be described with the binomial distribution only if each repeat has the same probability of success and the repeats are independent. In fact this is not obvious in the process of penalty kicks: if you miss a kick, maybe it makes you nervous and the probability of scoring next time will be lower…
Let's take a closer look at the syntax of the BINOMDIST command. The first three parameters of the command – k, n and p – have been explained above. What about the last parameter, with value 0 in our example? It's a logical variable having value 0 (false) or 1 (true), and it can be thought of as an answer to the question: "Do you want cumulative values or not?". The cumulative values are the probabilities for the random variable to have a value equal to or less than a certain number k. They are called cumulative because they can also be calculated by adding together the probabilities of all the values not greater than k.
For example, what is the probability that the number of scores is at most 3? It is
=BINOMDIST(3, n, p, 1) with n = 10 and p = 0,2, which gives 0,879126.
The same result is obtained by adding up the first four probabilities in the table above:
0,107374182 + 0,268435456 + 0,301989888 + 0,201326592 ≈ 0,879126.
Note that this is the same as the total length of the bars for the values 0, 1, 2 and 3 in the bar diagram of the probability distribution function above!
With value 0 for the last parameter the BINOMDIST command gives values of the probability distribution function (PDF), and with value 1 it gives values of the cumulative distribution function (CDF).
Many of the basic ideas remain the same if you replace BINOMDIST with POISSON, NORMDIST or EXPONDIST. The number of parameters and their meanings can be different – for example, in the POISSON command only the expected value is needed besides the value of the random variable and the last parameter having value 0 or 1.
The most notable difference is faced when considering a continuous distribution, like the normal or exponential distribution, instead of a discrete distribution, like the binomial and Poisson distributions. In that case the PDF is represented by a continuous curve instead of a bar chart, and the CDF is the area between the curve and the x-axis instead of the total length of the bars in a chart.
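The same calculations can be sketched outside Excel with SciPy: binom.pmf corresponds to BINOMDIST with last parameter 0, and binom.cdf to last parameter 1.

from scipy.stats import binom

n, p = 10, 0.2
print(binom.pmf(4, n, p))                          # P(X = 4), about 0.088
print(binom.cdf(3, n, p))                          # P(X <= 3), about 0.879126
print(sum(binom.pmf(k, n, p) for k in range(4)))   # the same CDF value by summing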