Quantitative Economics With Python
August 7, 2020
Contents
3 Modeling COVID 19 33
4 Linear Algebra 43
7 Heavy-Tailed Distributions 95
V Information 519
VI LQ Control 631
Chapter 1

Geometric Series for Elementary Economics
1.1 Contents
• Overview 1.2
• Key Formulas 1.3
• Example: The Money Multiplier in Fractional Reserve Banking 1.4
• Example: The Keynesian Multiplier 1.5
• Example: Interest Rates and Present Values 1.6
• Back to the Keynesian Multiplier 1.7
1.2 Overview
The lecture describes important ideas in economics that use the mathematics of geometric
series.
Among these are
• the Keynesian multiplier
• the money multiplier that prevails in fractional reserve banking systems
• interest rates and present values of streams of payouts from assets
(As we shall see below, the term multiplier comes down to meaning sum of a convergent
geometric series)
These and other applications prove the truth of the wisecrack that "in economics, a little knowledge of geometric series goes a long way."

1.3 Key Formulas

To start, let 𝑐 be a real number that we'll call the common ratio.

The infinite geometric series is

$$1 + c + c^2 + c^3 + \cdots$$

When $c \in (-1, 1)$, its sum is given by the key formula

$$1 + c + c^2 + c^3 + \cdots = \frac{1}{1-c} \tag{1}$$

To prove key formula (1), multiply both sides by $(1 - c)$ and verify that if $c \in (-1, 1)$, then
the outcome is the equation $1 = 1$.

The finite geometric series is

$$1 + c + c^2 + c^3 + \cdots + c^T$$

and its sum is

$$1 + c + c^2 + c^3 + \cdots + c^T = \frac{1 - c^{T+1}}{1 - c}$$
Remark: The above formula works for any value of the scalar 𝑐. We don’t have to restrict 𝑐
to be in the set (−1, 1).
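As a quick numerical check that the finite-sum formula holds even when 𝑐 lies outside (−1, 1), we can compare the direct sum with the closed form (the values of c and T below are arbitrary):

```python
c, T = 1.5, 10   # note: c need not lie in (-1, 1) for the finite sum
lhs = sum(c**k for k in range(T + 1))          # direct sum
rhs = (1 - c**(T + 1)) / (1 - c)               # closed form
print(abs(lhs - rhs) < 1e-9)                   # True: the two agree
```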
We now move on to describe some famous economic applications of geometric series.

1.4 Example: The Money Multiplier in Fractional Reserve Banking

In a fractional reserve banking system, banks hold only a fraction 𝑟 ∈ (0, 1) of cash behind
each deposit receipt that they issue
• In recent times
– cash consists of pieces of paper issued by the government and called dollars or
pounds or …
– a deposit is a balance in a checking or savings account that entitles the owner to
ask the bank for immediate payment in cash
• When the UK and France and the US were on either a gold or silver standard (before
1914, for example)
– cash was a gold or silver coin
– a deposit receipt was a bank note that the bank promised to convert into gold or
silver on demand; (sometimes it was also a checking or savings account balance)
Economists and financiers often define the supply of money as an economy-wide sum of
cash plus deposits.
In a fractional reserve banking system (one in which the reserve ratio 𝑟 satisfies 0 < 𝑟 <
1), banks create money by issuing deposits backed by fractional reserves plus loans that
they make to their customers.
A geometric series is a key tool for understanding how banks create money (i.e., deposits) in
a fractional reserve system.
The geometric series formula (1) is at the heart of the classic model of the money creation
process – one that leads us to the celebrated money multiplier.
Each bank 𝑖's balance sheet equates assets and liabilities:

$$L_i + R_i = D_i \tag{2}$$
The left side of the above equation is the sum of the bank’s assets, namely, the loans 𝐿𝑖 it
has outstanding plus its reserves of cash 𝑅𝑖 .
The right side records bank 𝑖’s liabilities, namely, the deposits 𝐷𝑖 held by its depositors; these
are IOU’s from the bank to its depositors in the form of either checking accounts or savings
accounts (or before 1914, bank notes issued by a bank stating promises to redeem note for
gold or silver on demand).
Each bank 𝑖 sets its reserves to satisfy the equation
𝑅𝑖 = 𝑟𝐷𝑖 (3)
Loans made by bank 𝑖 are deposited in the next bank, 𝑖 + 1:

$$D_{i+1} = L_i \tag{4}$$
Thus, we can think of the banks as being arranged along a line with loans from bank 𝑖 being
immediately deposited in 𝑖 + 1
• in this way, the debtors to bank 𝑖 become creditors of bank 𝑖 + 1
Finally, we add an initial condition about an exogenous level of bank 0’s deposits
𝐷0 is given exogenously
We can think of 𝐷0 as being the amount of cash that a first depositor put into the first bank
in the system, bank number 𝑖 = 0.
Now we do a little algebra.
Combining equations (2) and (3) tells us that
𝐿𝑖 = (1 − 𝑟)𝐷𝑖 (5)
This states that bank 𝑖 loans a fraction (1 − 𝑟) of its deposits and keeps a fraction 𝑟 as cash
reserves.
Combining equation (5) with equation (4) tells us that

$$D_{i+1} = (1 - r) D_i$$

which implies

$$D_i = (1 - r)^i D_0 \tag{6}$$

Equation (6) expresses $D_i$ as the $i$-th term in the product of $D_0$ and the geometric series

$$1, (1 - r), (1 - r)^2, \cdots$$
Therefore, total deposits in the banking system equal

$$\sum_{i=0}^{\infty} (1 - r)^i D_0 = \frac{D_0}{1 - (1 - r)} = \frac{D_0}{r} \tag{7}$$
The money multiplier is a number that tells the multiplicative factor by which an exoge-
nous injection of cash into bank 0 leads to an increase in the total deposits in the banking
system.
Equation (7) asserts that the money multiplier is $\frac{1}{r}$

• An initial deposit of cash of $D_0$ in bank 0 leads the banking system to create total deposits of $\frac{D_0}{r}$.
• The initial deposit $D_0$ is held as reserves, distributed throughout the banking system according to $D_0 = \sum_{i=0}^{\infty} R_i$.
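A short simulation of the deposit chain illustrates the multiplier (the values of r and D_0 below are arbitrary):

```python
r, D_0 = 0.1, 100.0
# D_i = (1 - r)^i * D_0; truncate the chain at a large number of banks
deposits = [D_0 * (1 - r)**i for i in range(2000)]
total_deposits = sum(deposits)
total_reserves = sum(r * d for d in deposits)
print(round(total_deposits, 6))   # 1000.0, i.e. D_0 / r
print(round(total_reserves, 6))   # 100.0, i.e. the initial deposit D_0
```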
1.5 Example: The Keynesian Multiplier
The famous economist John Maynard Keynes and his followers created a simple model in-
tended to determine national income 𝑦 in circumstances in which
• there are substantial unemployed resources, in particular excess supply of labor and
capital
• prices and interest rates fail to adjust to make aggregate supply equal demand (e.g.,
prices and interest rates are frozen)
• national income is entirely determined by aggregate demand

The first equation is a national income identity asserting that consumption plus investment equals national income:

$$c + i = y$$
The second equation is a Keynesian consumption function asserting that people consume a
fraction 𝑏 ∈ (0, 1) of their income:
$$c = by$$

Substituting the consumption function into the national income identity and solving for $y$ gives

$$y = \frac{1}{1-b} i$$

The quantity $\frac{1}{1-b}$ is called the investment multiplier or simply the multiplier.
Applying the formula for the sum of an infinite geometric series, we can write the above equation as

$$y = i \sum_{t=0}^{\infty} b^t = \frac{1}{1-b} i$$
The expression $\sum_{t=0}^{\infty} b^t$ motivates an interpretation of the multiplier as the outcome of a dynamic process that we describe next.
We arrive at a dynamic version by interpreting the nonnegative integer 𝑡 as indexing time and
changing our specification of the consumption function to take time into account
• we add a one-period lag in how income affects consumption
We let 𝑐𝑡 be consumption at time 𝑡 and 𝑖𝑡 be investment at time 𝑡.
We modify our consumption function to assume the form
𝑐𝑡 = 𝑏𝑦𝑡−1
so that 𝑏 is the marginal propensity to consume (now) out of last period’s income.
We begin with an initial condition stating that

$$y_{-1} = 0$$

We also assume that investment is constant over time:

$$i_t = i \quad \text{for all } t \geq 0$$

Then output in the initial period is

$$y_0 = i + c_0 = i + b y_{-1} = i$$
and
𝑦1 = 𝑐1 + 𝑖 = 𝑏𝑦0 + 𝑖 = (1 + 𝑏)𝑖
and
𝑦2 = 𝑐2 + 𝑖 = 𝑏𝑦1 + 𝑖 = (1 + 𝑏 + 𝑏2 )𝑖
Continuing in this fashion, for general $t$ we have

$$y_t = b y_{t-1} + i = (1 + b + b^2 + \cdots + b^t) i$$

or

$$y_t = \frac{1 - b^{t+1}}{1 - b} i$$
Evidently, as $t \to +\infty$,

$$y_t \to \frac{1}{1-b} i$$
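The recursion and its limit can be checked directly (the values of b and i below are arbitrary):

```python
b, i = 0.6, 1.0
y = 0.0                    # initial condition y_{-1} = 0
for t in range(200):
    y = b * y + i          # y_t = b * y_{t-1} + i
print(abs(y - i / (1 - b)) < 1e-8)   # True: converges to i / (1 - b)
```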
Remark 1: The above formula is often applied to assert that an exogenous increase in investment of $\Delta i$ at time 0 ignites a dynamic process of increases in national income by successive amounts

$$\Delta i, \ (1 + b)\Delta i, \ (1 + b + b^2)\Delta i, \cdots$$

at times $0, 1, 2, \ldots$.
Remark 2: Let 𝑔𝑡 be an exogenous sequence of government expenditures.
If we generalize the model so that the national income identity becomes
𝑐𝑡 + 𝑖 𝑡 + 𝑔 𝑡 = 𝑦 𝑡
then a version of the preceding argument shows that the government expenditures multiplier is also $\frac{1}{1-b}$, so that a permanent increase in government expenditures ultimately leads to an increase in national income equal to the multiplier times the increase in government expenditures.
1.6 Example: Interest Rates and Present Values

We can apply our formula for geometric series to study how interest rates affect values of streams of dollar payments that extend over time.
We work in discrete time and assume that 𝑡 = 0, 1, 2, … indexes time.
We let 𝑟 ∈ (0, 1) be a one-period net nominal interest rate
• if the nominal interest rate is 5 percent, then 𝑟 = .05
A one-period gross nominal interest rate 𝑅 is defined as
𝑅 = 1 + 𝑟 ∈ (1, 2)
• if 𝑟 = .05, then 𝑅 = 1.05
Remark: The gross nominal interest rate $R$ is an exchange rate or relative price of dollars between times $t$ and $t + 1$. The units of $R$ are dollars at time $t + 1$ per dollar at time $t$.
When people borrow and lend, they trade dollars now for dollars later or dollars later for dol-
lars now.
The price at which these exchanges occur is the gross nominal interest rate.
• If I sell 𝑥 dollars to you today, you pay me 𝑅𝑥 dollars tomorrow.
• This means that you borrowed 𝑥 dollars from me at a gross interest rate 𝑅 and a net interest rate 𝑟.
We assume that the net nominal interest rate 𝑟 is fixed over time, so that 𝑅 is the gross nom-
inal interest rate at times 𝑡 = 0, 1, 2, ….
Two important geometric sequences are
1, 𝑅, 𝑅2 , ⋯ (8)
and

$$1, R^{-1}, R^{-2}, \cdots \tag{9}$$

Sequence (8) tells us how dollar values of an investment accumulate through time.
Sequence (9) tells us how to discount future dollars to get their values in terms of today’s
dollars.
1.6.1 Accumulation
Geometric sequence (8) tells us how one dollar invested and re-invested in a project with gross one-period nominal rate of return $R$ accumulates

• here we assume that net interest payments are reinvested in the project
• thus, 1 dollar invested at time 0 pays interest $r$ dollars after one period, so we have $r + 1 = R$ dollars at time 1
• at time 1 we reinvest 1 + 𝑟 = 𝑅 dollars and receive interest of 𝑟𝑅 dollars at time 2 plus
the principal 𝑅 dollars, so we receive 𝑟𝑅 + 𝑅 = (1 + 𝑟)𝑅 = 𝑅2 dollars at the end of
period 2
• and so on
Evidently, if we invest $x$ dollars at time 0 and reinvest the proceeds, then the sequence

$$x, xR, xR^2, \cdots$$

tells how our investment accumulates at dates $t = 0, 1, 2, \ldots$.
1.6.2 Discounting
Geometric sequence (9) tells us how much future dollars are worth in terms of today’s dollars.
Remember that the units of 𝑅 are dollars at 𝑡 + 1 per dollar at 𝑡.
It follows that
• the units of 𝑅−1 are dollars at 𝑡 per dollar at 𝑡 + 1
• the units of 𝑅−2 are dollars at 𝑡 per dollar at 𝑡 + 2
• and so on; the units of 𝑅−𝑗 are dollars at 𝑡 per dollar at 𝑡 + 𝑗
So if someone has a claim on 𝑥 dollars at time 𝑡 + 𝑗, it is worth 𝑥𝑅−𝑗 dollars at time 𝑡 (e.g.,
today).
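For example, with an illustrative rate of r = .05:

```python
r = 0.05
R = 1 + r
x, j = 100.0, 3
pv = x * R**(-j)      # value today of x dollars at time t + j
print(round(pv, 2))   # 86.38
```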
Let the payment stream be $x_t = G^t x_0$, where $G := 1 + g$ and $g$ is a constant growth rate.

The present value of this infinite payment stream is

$$p_0 = x_0 + x_1/R + x_2/R^2 + \cdots = x_0 \left(1 + GR^{-1} + G^2 R^{-2} + \cdots\right) = x_0 \frac{1}{1 - GR^{-1}}$$
where the last line uses the formula for an infinite geometric series.
Recall that 𝑅 = 1 + 𝑟 and 𝐺 = 1 + 𝑔 and that 𝑅 > 𝐺 and 𝑟 > 𝑔 and that 𝑟 and 𝑔 are typically
small numbers, e.g., .05 or .03.
Use the Taylor series of $\frac{1}{1+r}$ about $r = 0$, namely,

$$\frac{1}{1+r} = 1 - r + r^2 - r^3 + \cdots$$

and the fact that $r$ is small to approximate $\frac{1}{1+r} \approx 1 - r$.
Use this approximation to write $p_0$ as

$$
\begin{aligned}
p_0 &= x_0 \frac{1}{1 - GR^{-1}} \\
&= x_0 \frac{1}{1 - (1+g)(1-r)} \\
&= x_0 \frac{1}{1 - (1 + g - r - rg)} \\
&\approx x_0 \frac{1}{r - g}
\end{aligned}
$$

The formula

$$p_0 = \frac{x_0}{r - g}$$

is known as the Gordon formula for the present value or current price of an infinite payment stream $x_0 G^t$ when the nominal one-period interest rate is $r$ and when $r > g$.
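We can compare the exact infinite-lease value with the Gordon approximation (the parameter values below are illustrative):

```python
x_0, r, g = 1.0, 0.05, 0.03
G, R = 1 + g, 1 + r
exact = x_0 / (1 - G / R)      # x_0 / (1 - G R^{-1})
gordon = x_0 / (r - g)         # Gordon approximation
print(round(exact, 2), round(gordon, 2))   # 52.5 50.0
```

The approximation error comes from dropping the $rg$ term in the denominator, which matters more as $r - g$ shrinks.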
We can also extend the asset pricing formula so that it applies to finite leases.
Let the payment stream on the lease now be $x_t$ for $t = 0, 1, \ldots, T$, where again
𝑥𝑡 = 𝐺𝑡 𝑥0
$$
\begin{aligned}
p_0 &= x_0 + x_1/R + \cdots + x_T/R^T \\
&= x_0 \left(1 + GR^{-1} + \cdots + G^T R^{-T}\right) \\
&= \frac{x_0 \left(1 - G^{T+1} R^{-(T+1)}\right)}{1 - GR^{-1}}
\end{aligned}
$$
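As a sanity check, the closed form agrees with the direct sum (illustrative parameters):

```python
x_0, r, g, T = 1.0, 0.05, 0.02, 10
G, R = 1 + g, 1 + r
direct = sum(x_0 * G**t / R**t for t in range(T + 1))
closed = x_0 * (1 - G**(T + 1) * R**(-(T + 1))) / (1 - G * R**(-1))
print(abs(direct - closed) < 1e-10)   # True
```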
Expanding $\frac{1}{(1+r)^{T+1}}$ in a Taylor series about $r = 0$ gives

$$\frac{1}{(1+r)^{T+1}} = 1 - r(T+1) + \frac{1}{2} r^2 (T+1)(T+2) + \cdots \approx 1 - r(T+1)$$

We could have also approximated by removing the second term $rgx_0(T+1)$ when $T$ is relatively small compared to $1/(rg)$ to get $x_0(T+1)$ as in the finite stream approximation.
We will plot the true finite stream present-value and the two approximations, under different values of $T$, $g$ and $r$, in Python.
First we plot the true finite stream present-value after computing it below

# True present value of a finite lease
def finite_lease_pv_true(T, g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    p = x_0 * (1 - G**(T + 1) * R**(-T - 1)) / (1 - G * R**(-1))
    return p

# Infinite lease
def infinite_lease(g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return x_0 / (1 - G * R**(-1))
Now that we have defined our functions, we can plot some outcomes.
First we study the quality of our approximations
T_max = 50
T = np.arange(0, T_max+1)
g = 0.02
r = 0.03
x_0 = 1
our_args = (T, g, r, x_0)
funcs = [finite_lease_pv_true, infinite_lease]

fig, ax = plt.subplots()
ax.set_title('Finite Lease Present Value $T$ Periods Ahead')
for f in funcs:
    plot_function(ax, T, f, our_args)
ax.legend()
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
plt.show()
The graph above shows how as duration 𝑇 → +∞, the value of a lease of duration 𝑇 ap-
proaches the value of a perpetual lease.
Now we consider two different views of what happens as 𝑟 and 𝑔 covary
ax.legend()
plt.show()
This graph gives a big hint for why the condition 𝑟 > 𝑔 is necessary if a lease of length 𝑇 =
+∞ is to have finite value.
For fans of 3-d graphs the same point comes through in the following graph.
If you aren’t enamored of 3-d graphs, feel free to skip the next visualization!
rr, gg = np.meshgrid(r, g)
z = finite_lease_pv_true(T, gg, rr, x_0)
We can use a little calculus to study how the present value 𝑝0 of a lease varies with 𝑟 and 𝑔.
We will use a library called SymPy.
SymPy enables us to do symbolic math calculations including computing derivatives of alge-
braic equations.
We will illustrate how it works by creating a symbolic expression that represents our present
value formula for an infinite lease.
After that, we’ll use SymPy to compute derivatives
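The derivative computations can be sketched as follows; the symbol setup is an assumption, but the $p_0$ expression follows the formulas above:

```python
import sympy as sym

g, r, x_0 = sym.symbols('g, r, x_0')
p_0 = x_0 / (1 - (1 + g) / (1 + r))   # present value of an infinite lease
dp_dg = sym.diff(p_0, g)
dp_dr = sym.diff(p_0, r)
print(sym.simplify(dp_dg))
print(sym.simplify(dp_dr))
```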
Out[7]: $\dfrac{x_0}{-\frac{g+1}{r+1} + 1}$
dp0 / dg is:

Out[8]: $\dfrac{x_0}{(r+1)\left(-\frac{g+1}{r+1} + 1\right)^{2}}$

dp0 / dr is:

Out[9]: $-\dfrac{x_0 (g+1)}{(r+1)^{2}\left(-\frac{g+1}{r+1} + 1\right)^{2}}$
We can see that $\frac{\partial p_0}{\partial r} < 0$ as long as $r > g$, $r > 0$, $g > 0$ and $x_0$ is positive, so $\frac{\partial p_0}{\partial r}$ will always be negative.

Similarly, $\frac{\partial p_0}{\partial g} > 0$ as long as $r > g$, $r > 0$, $g > 0$ and $x_0$ is positive, so $\frac{\partial p_0}{\partial g}$ will always be positive.
1.7 Back to the Keynesian Multiplier

We will now go back to the case of the Keynesian multiplier and plot the time path of $y_t$, given that consumption is a constant fraction of national income, and investment is fixed.
def calculate_y(i, b, g, T, y_init):
    y = np.zeros(T+1)
    y[0] = i + b * y_init + g
    for t in range(1, T+1):
        y[t] = b * y[t-1] + i + g
    return y

# Initial values
i_0 = 0.3
g_0 = 0.3
# 2/3 of income goes towards consumption
b = 2/3
y_init = 0
T = 100
fig, ax = plt.subplots()
ax.set_title('Path of Aggregate Output Over Time')
ax.set_xlabel('$t$')
ax.set_ylabel('$y_t$')
ax.plot(np.arange(0, T+1), calculate_y(i_0, b, g_0, T, y_init))
# Output predicted by geometric series
ax.hlines(i_0 / (1 - b) + g_0 / (1 - b), xmin=1, xmax=101, linestyles='--')
plt.show()
In this model, income grows over time, until it gradually converges to the infinite geometric
series sum of income.
We now examine what will happen if we vary the so-called marginal propensity to con-
sume, i.e., the fraction of income that is consumed
bs = [1/3, 2/3, 5/6, 0.9]   # marginal propensities to consume to compare (illustrative)

fig, ax = plt.subplots()
ax.set_title('Changing Consumption as a Fraction of Income')
ax.set_ylabel('$y_t$')
ax.set_xlabel('$t$')
x = np.arange(0, T+1)
for b in bs:
y = calculate_y(i_0, b, g_0, T, y_init)
ax.plot(x, y, label=r'$b=$'+f"{b:.2f}")
ax.legend()
plt.show()
Increasing the marginal propensity to consume 𝑏 increases the path of output over time.
Now we will compare the effects on output of increases in investment and government spend-
ing.
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 10))
x = np.arange(0, T+1)
values = [0.3, 0.4]
for i in values:
y = calculate_y(i, b, g_0, T, y_init)
ax1.plot(x, y, label=f"i={i}")
for g in values:
y = calculate_y(i_0, b, g, T, y_init)
ax2.plot(x, y, label=f"g={g}")
Notice here that whether government spending increases from 0.3 to 0.4 or investment increases from 0.3 to 0.4, the shifts in the graphs are identical.
Chapter 2
Multivariate Hypergeometric Distribution
2.1 Contents
• Overview 2.2
• The Administrator’s Problem 2.3
• Usage 2.4
2.2 Overview
$$X = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_c \end{pmatrix}.$$
To evaluate whether the selection procedure is color blind the administrator wants to study
whether the particular realization of 𝑋 drawn can plausibly be said to be a random draw
from the probability distribution that is implied by the color blind hypothesis.
The appropriate probability distribution is the one described here
Let’s now instantiate the administrator’s problem, while continuing to use the colored balls
metaphor.
The administrator has an urn with 𝑁 = 238 balls.
157 balls are blue, 11 balls are green, 46 balls are yellow, and 24 balls are black.
So (𝐾1 , 𝐾2 , 𝐾3 , 𝐾4 ) = (157, 11, 46, 24) and 𝑐 = 4.
15 balls are drawn without replacement.
So 𝑛 = 15.
The administrator wants to know the probability distribution of outcomes
$$X = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_4 \end{pmatrix}.$$
In particular, he wants to know whether a particular outcome (in the form of a 4 × 1 vector of integers recording the numbers of blue, green, yellow, and black balls, respectively) contains evidence against the hypothesis that the selection process is fair, which here means color blind and truly a random draw without replacement from the population of 𝑁 balls.
The right tool for the administrator's job is the multivariate hypergeometric distribution.
Its probability mass function is

$$\Pr\{X_i = k_i \ \forall i\} = \frac{\prod_{i=1}^{c} \binom{K_i}{k_i}}{\binom{N}{n}}$$

Mean:

$$E(X_i) = n \frac{K_i}{N}$$

Variance:

$$\operatorname{Var}(X_i) = n \frac{N - n}{N - 1} \frac{K_i}{N} \left(1 - \frac{K_i}{N}\right)$$

Covariance:

$$\operatorname{Cov}(X_i, X_j) = -n \frac{N - n}{N - 1} \frac{K_i}{N} \frac{K_j}{N}$$
First we construct an Urn class that implements these formulas (the pmf method uses comb from scipy.special).

from scipy.special import comb


class Urn:

    def __init__(self, K_arr):
        """
        Initialization given the number of each type i object in the urn.

        Parameters
        ----------
        K_arr: ndarray(int)
            number of each type i object.
        """
        self.K_arr = np.array(K_arr)
        self.N = np.sum(K_arr)
        self.c = len(K_arr)

    def pmf(self, k_arr):
        """
        Probability mass function.

        Parameters
        ----------
        k_arr: ndarray(int)
            number of observed successes of each object.
        """
        K_arr, N = self.K_arr, self.N
        k_arr = np.atleast_2d(k_arr)
        n = np.sum(k_arr, 1)
        num = np.prod(comb(K_arr, k_arr), 1)
        denom = comb(N, n)
        pr = num / denom
        return pr

    def moments(self, n):
        """
        Compute the mean and variance-covariance matrix.

        Parameters
        ----------
        n: int
            number of draws.
        """
        K_arr, N, c = self.K_arr, self.N, self.c
        # mean
        μ = n * K_arr / N
        # variance-covariance matrix
        Σ = np.ones((c, c)) * n * (N - n) / (N - 1) / N ** 2
        for i in range(c):
            Σ[i, i] *= K_arr[i] * (N - K_arr[i])
            for j in range(i+1, c):
                Σ[i, j] *= -K_arr[i] * K_arr[j]
                Σ[j, i] = Σ[i, j]
        return μ, Σ

    def simulate(self, n, size=1, seed=None):
        """
        Simulate a sample matrix.

        Parameters
        ----------
        n: int
            number of objects for each draw.
        size: int(optional)
            sample size.
        seed: int(optional)
            random seed.
        """
        K_arr = self.K_arr
        gen = np.random.Generator(np.random.PCG64(seed))
        sample = gen.multivariate_hypergeometric(K_arr, n, size=size)
        return sample
2.4 Usage
Suppose an urn contains 5 black, 10 white, and 15 red marbles, and that 6 marbles are drawn without replacement. The probability of drawing 2 black, 2 white, and 2 red marbles is

$$P(2 \text{ black}, 2 \text{ white}, 2 \text{ red}) = \frac{\binom{5}{2}\binom{10}{2}\binom{15}{2}}{\binom{30}{6}} = 0.079575596816976$$
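SciPy's scipy.stats.multivariate_hypergeom (SciPy >= 1.6) provides an independent check of this number:

```python
from scipy.stats import multivariate_hypergeom

# m: number of each type in the urn, n: number of draws
p = multivariate_hypergeom.pmf(x=[2, 2, 2], m=[5, 10, 15], n=6)
print(round(p, 7))   # 0.0795756
```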
Now use the Urn class's pmf method to compute the probability of the outcome $X = (2\ 2\ 2)$

In [4]: urn = Urn([5, 10, 15])
        urn.pmf([2, 2, 2])

Out[4]: array([0.0795756])
We can use the code to compute probabilities of a list of possible outcomes by constructing
a 2-dimensional array k_arr and pmf will return an array of probabilities for observing each
case.
In [6]: n = 6
μ, Σ = urn.moments(n)
In [7]: μ
In [8]: Σ
Out[10]: array([0.01547738])
In [13]: # mean
μ
We can simulate a large sample and verify that sample means and covariances closely approx-
imate the population means and covariances.
In [16]: # mean
np.mean(sample, 0)
Evidently, the sample means and covariances approximate their population counterparts well.
x_μ = x - μ_x
y_μ = y - μ_y
In [20]: @njit
def count(vec1, vec2, n):
    size = sample.shape[0]
    count_mat = np.zeros((n+1, n+1))
    for i in range(size):
        count_mat[vec1[i], vec2[i]] += 1
    return count_mat
In [21]: c = urn.c
fig, axs = plt.subplots(c, c, figsize=(14, 14))
for i in range(c):
axs[i, i].hist(sample[:, i], bins=np.arange(0, n, 1), alpha=0.5, density=True,
label='hypergeom')
axs[i, i].hist(sample_normal[:, i], bins=np.arange(0, n, 1), alpha=0.5,
density=True, label='normal')
axs[i, i].legend()
axs[i, i].set_title('$k_{' +str(i+1) +'}$')
for j in range(c):
if i == j:
continue
plt.show()
The diagonal graphs plot the marginal distributions of 𝑘𝑖 for each 𝑖 using histograms.
Note the substantial differences between hypergeometric distribution and the approximating
normal distribution.
The off-diagonal graphs plot the empirical joint distribution of 𝑘𝑖 and 𝑘𝑗 for each pair (𝑖, 𝑗).
The darker the blue, the more data points are contained in the corresponding cell.
(Note that 𝑘𝑖 is on the x-axis and 𝑘𝑗 is on the y-axis).
The contour maps plot the bivariate Gaussian density function of (𝑘𝑖 , 𝑘𝑗 ) with the population
mean and covariance given by slices of 𝜇 and Σ that we computed above.
Let’s also test the normality for each 𝑘𝑖 using scipy.stats.normaltest that implements
D’Agostino and Pearson’s test that combines skew and kurtosis to form an omnibus test of
normality.
The null hypothesis is that the sample follows a normal distribution.
normaltest returns an array of p-values associated with tests for each 𝑘𝑖 sample.
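A minimal sketch of calling normaltest (the synthetic normal sample here is an assumption, standing in for the simulated draws):

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(1234)
sample = rng.normal(size=(10_000, 4))   # stand-in for the simulated draws
stat, p_values = normaltest(sample)     # one test per column
print(p_values.shape)                   # (4,)
```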
As we can see, all the p-values are almost 0 and the null hypothesis is soundly rejected.
By contrast, the test applied to the sample from the normal distribution does not reject the null hypothesis.
The lesson to take away from this is that the normal approximation is imperfect.
Chapter 3
Modeling COVID 19
3.1 Contents
• Overview 3.2
• The SIR Model 3.3
• Implementation 3.4
• Experiments 3.5
• Ending Lockdown 3.6
3.2 Overview
This is a Python version of the code for analyzing the COVID-19 pandemic provided by Andrew Atkeson.
See, in particular
• NBER Working Paper No. 26867
• COVID-19 Working papers and code
The purpose of his notes is to introduce economists to quantitative modeling of infectious disease dynamics.
Dynamics are modeled using a standard SIR (Susceptible-Infected-Removed) model of disease
spread.
The model dynamics are represented by a system of ordinary differential equations.
The main objective is to study the impact of suppression through social distancing on the
spread of the infection.
The focus is on US outcomes but the parameters can be adjusted to study other countries.
We will use the following standard imports:
We will also use SciPy’s numerical routine odeint for solving differential equations.
This routine calls into compiled code from the FORTRAN library odepack.
In the version of the SIR model we will analyze there are four states.
All individuals in the population are assumed to be in one of these four states.
The states are: susceptible (S), exposed (E), infected (I) and removed (R).
Comments:
• Those in state R have been infected and either recovered or died.
• Those who have recovered are assumed to have acquired immunity.
• Those in the exposed group are not yet infectious.
$$
\begin{aligned}
\dot s(t) &= -\beta(t) \, s(t) \, i(t) \\
\dot e(t) &= \beta(t) \, s(t) \, i(t) - \sigma e(t) \\
\dot i(t) &= \sigma e(t) - \gamma i(t)
\end{aligned} \tag{1}
$$
In these equations,
• 𝛽(𝑡) is called the transmission rate (the rate at which individuals bump into others and
expose them to the virus).
• 𝜎 is called the infection rate (the rate at which those who are exposed become infected)
• 𝛾 is called the recovery rate (the rate at which infected people recover or die).
• the dot symbol 𝑦 ̇ represents the time derivative 𝑑𝑦/𝑑𝑡.
We do not need to model the fraction 𝑟 of the population in state 𝑅 separately because the
states form a partition.
In particular, the “removed” fraction of the population is 𝑟 = 1 − 𝑠 − 𝑒 − 𝑖.
We will also track 𝑐 = 𝑖 + 𝑟, which is the cumulative caseload (i.e., all those who have or have
had the infection).
The system (1) can be written in vector form as

$$\dot x = F(x, t), \qquad x := (s, e, i)$$

3.3.2 Parameters
3.4 Implementation
In [4]: γ = 1 / 18
σ = 1 / 5.2
"""
s, e, i = x
# Time derivatives
ds = ne
de = ne σ * e
di = σ * e γ * i
We solve for the time path numerically using odeint, at a sequence of dates t_vec.
"""
G = lambda x, t: F(x, t, R0)
s_path, e_path, i_path = odeint(G, x_init, t_vec).transpose()
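Putting the pieces together, here is a self-contained sketch of solving the system for a constant R0 (the initial conditions and R0 = 1.6 are illustrative; σ and γ follow the text):

```python
import numpy as np
from scipy.integrate import odeint

σ, γ = 1 / 5.2, 1 / 18
β = 1.6 * γ                       # constant R0 = 1.6 implies β = R0 * γ

def F(x, t):
    s, e, i = x
    ne = β * s * i                # new exposures of susceptibles
    return -ne, ne - σ * e, σ * e - γ * i

x_init = (0.99, 0.01, 0.0)        # initial fractions (s, e, i)
t_vec = np.linspace(0, 550, 551)
s, e, i = odeint(F, x_init, t_vec).T
r = 1 - s - e - i                 # removed fraction: the states form a partition
print(round(float((s + e + i + r)[-1]), 6))   # 1.0, fractions always sum to one
```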
3.5 Experiments
i_paths, c_paths = [], []
for r in R0_vals:
i_path, c_path = solve_path(r, t_vec)
i_paths.append(i_path)
c_paths.append(c_path)
fig, ax = plt.subplots()
ax.legend(loc='upper left')
plt.show()
Let’s look at a scenario where mitigation (e.g., social distancing) is successively imposed.
Here’s a specification for R0 as a function of time.
This is what the time path of R0 looks like at these alternative rates:
ax.legend()
plt.show()
i_paths, c_paths = [], []
for η in η_vals:
R0 = lambda t: R0_mitigating(t, η=η)
i_path, c_path = solve_path(R0, t_vec)
i_paths.append(i_path)
c_paths.append(c_path)
The following replicates additional results by Andrew Atkeson on the timing of lifting lockdown.
Consider these two mitigation scenarios:
1. 𝑅𝑡 = 0.5 for 30 days and then 𝑅𝑡 = 2 for the remaining 17 months. This corresponds to
lifting lockdown in 30 days.
2. 𝑅𝑡 = 0.5 for 120 days and then 𝑅𝑡 = 2 for the remaining 14 months. This corresponds
to lifting lockdown in 4 months.
The parameters considered here start the model with 25,000 active infections and 75,000
agents already exposed to the virus and thus soon to be contagious.
i_paths, c_paths = [], []
for R0 in R0_paths:
i_path, c_path = solve_path(R0, t_vec, x_init=x_0)
i_paths.append(i_path)
c_paths.append(c_path)
In [23]: ν = 0.01
Pushing the peak of curve further into the future may reduce cumulative deaths if a vaccine is
found.
Chapter 4
Linear Algebra
4.1 Contents
• Overview 4.2
• Vectors 4.3
• Matrices 4.4
• Solving Systems of Equations 4.5
• Eigenvalues and Eigenvectors 4.6
• Further Topics 4.7
• Exercises 4.8
• Solutions 4.9
4.2 Overview
Linear algebra is one of the most useful branches of applied mathematics for economists to
invest in.
For example, many applied problems in economics and finance require the solution of a linear
system of equations, such as
𝑦1 = 𝑎𝑥1 + 𝑏𝑥2
𝑦2 = 𝑐𝑥1 + 𝑑𝑥2
The objective here is to solve for the “unknowns” 𝑥1 , … , 𝑥𝑘 given 𝑎11 , … , 𝑎𝑛𝑘 and 𝑦1 , … , 𝑦𝑛 .
When considering such problems, it is essential that we first consider at least some of the fol-
lowing questions
• Does a solution actually exist?
• Are there in fact many solutions, and if so how should we interpret them?
• If no solution exists, is there a best “approximate” solution?
4.3 Vectors
A vector of length 𝑛 is just a sequence (or array, or tuple) of 𝑛 numbers, which we write as
𝑥 = (𝑥1 , … , 𝑥𝑛 ) or 𝑥 = [𝑥1 , … , 𝑥𝑛 ].
We will write these sequences either horizontally or vertically as we please.
(Later, when we wish to perform certain matrix operations, it will become necessary to distin-
guish between the two)
The set of all 𝑛-vectors is denoted by ℝ𝑛 .
For example, ℝ2 is the plane, and a vector in ℝ2 is just a point in the plane.
Traditionally, vectors are represented visually as arrows from the origin to the point.
The following figure represents three vectors in this manner
The two most common operators for vectors are addition and scalar multiplication, which we
now describe.
As a matter of definition, when we add two vectors, we add them element-by-element
$$x + y = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} := \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$
Scalar multiplication is an operation that takes a number $\gamma$ and a vector $x$ and produces

$$\gamma x := \begin{bmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{bmatrix}$$
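In NumPy, these two operations have direct elementwise counterparts (the vectors below are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])
print(x + y)   # element-by-element addition: [11. 22. 33.]
print(2 * x)   # scalar multiplication: [2. 4. 6.]
```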
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')
scalars = (-2, 2)
x = np.array(x)
for s in scalars:
v = s * x
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='red',
shrink=0,
alpha=0.5,
width=0.5))
ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()
In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is more
commonly represented as a NumPy array.
One advantage of NumPy arrays is that scalar multiplication and addition have very natural syntax
In [5]: 4 * x
The inner product of vectors $x, y \in \mathbb{R}^n$ is defined as

$$x' y := \sum_{i=1}^{n} x_i y_i$$

The norm of a vector $x$ represents its "length" and is defined as

$$\| x \| := \sqrt{x' x} := \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$$
Out[6]: 12.0
Out[7]: 1.7320508075688772
Out[8]: 1.7320508075688772
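The outputs above can be generated along the following lines, assuming x = np.ones(3) and y = np.array([2, 4, 6]) (an assumption consistent with the values shown):

```python
import numpy as np

x = np.ones(3)
y = np.array([2, 4, 6])
print(np.sum(x * y))           # inner product: 12.0
print(np.sqrt(np.sum(x**2)))   # norm of x, method one
print(np.linalg.norm(x))       # norm of x, method two
```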
4.3.3 Span
Given a set of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 , it’s natural to think about the new vectors we
can create by performing linear operations.
New vectors created in this manner are called linear combinations of 𝐴.
In this context, the values 𝛽1 , … , 𝛽𝑘 are called the coefficients of the linear combination.
The set of linear combinations of 𝐴 is called the span of 𝐴.
The next figure shows the span of 𝐴 = {𝑎1 , 𝑎2 } in ℝ3 .
The span is a two-dimensional plane passing through these two points and the origin.
α, β = 0.2, 0.1
# The plane is the graph of f(x, y) = α x + β y
f = lambda x, y: α * x + β * y
gs = 3
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k', lw=2, alpha=0.5)
# Lines to vectors
for i in (0, 1):
x = (0, x_coords[i])
y = (0, y_coords[i])
z = (0, f(x_coords[i], y_coords[i]))
ax.plot(x, y, z, 'b', lw=1.5, alpha=0.6)
Examples
If 𝐴 contains only one vector 𝑎1 ∈ ℝ2 , then its span is just the scalar multiples of 𝑎1 , which is
the unique line passing through both 𝑎1 and the origin.
If 𝐴 = {𝑒1 , 𝑒2 , 𝑒3 } consists of the canonical basis vectors of ℝ3 , that is
$$e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
then the span of 𝐴 is all of ℝ3 , because, for any 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 , we can write
𝑥 = 𝑥 1 𝑒1 + 𝑥 2 𝑒2 + 𝑥 3 𝑒3
As we’ll see, it’s often desirable to find families of vectors with relatively large span, so that
many vectors can be described by linear operators on a few vectors.
The condition we need for a set of vectors to have a large span is what's called linear independence.
In particular, a collection of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 is said to be
• linearly dependent if some strict subset of 𝐴 has the same span as 𝐴.
• linearly independent if it is not linearly dependent.
Put differently, a set of vectors is linearly independent if no vector is redundant to the span
and linearly dependent otherwise.
To illustrate the idea, recall the figure that showed the span of vectors {𝑎1 , 𝑎2 } in ℝ3 as a
plane through the origin.
If we take a third vector 𝑎3 and form the set {𝑎1 , 𝑎2 , 𝑎3 }, this set will be
• linearly dependent if 𝑎3 lies in the plane
• linearly independent otherwise
As another illustration of the concept, since $\mathbb{R}^n$ can be spanned by $n$ vectors (see the discussion of canonical basis vectors above), any collection of $m > n$ vectors in $\mathbb{R}^n$ must be linearly dependent.
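Such dependence can be checked numerically via the rank of the matrix whose rows are the vectors (the vectors below are illustrative):

```python
import numpy as np

# three vectors in R^2, stacked as rows: m = 3 > n = 2
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
print(np.linalg.matrix_rank(A))   # 2, so the three vectors are linearly dependent
```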
The following statements are equivalent to linear independence of $A := \{a_1, \ldots, a_k\} \subset \mathbb{R}^n$

1. No vector in $A$ can be formed as a linear combination of the other elements.
2. If $\beta_1 a_1 + \cdots + \beta_k a_k = 0$ for scalars $\beta_1, \ldots, \beta_k$, then $\beta_1 = \cdots = \beta_k = 0$.
Another nice thing about sets of linearly independent vectors is that each element in the span
has a unique representation as a linear combination of these vectors.
In other words, if $A := \{a_1, \ldots, a_k\} \subset \mathbb{R}^n$ is linearly independent and

$$y = \beta_1 a_1 + \cdots + \beta_k a_k$$

then no other coefficient sequence $\gamma_1, \ldots, \gamma_k$ will produce the same vector $y$.
4.4 Matrices
Matrices are a neat way of organizing data for use in linear operations.
An $n \times k$ matrix is a rectangular array $A$ of numbers with $n$ rows and $k$ columns:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nk} \end{bmatrix}$$
Often, the numbers in the matrix represent coefficients in a system of linear equations, as discussed at the start of this lecture.
For obvious reasons, the matrix 𝐴 is also called a vector if either 𝑛 = 1 or 𝑘 = 1.
In the former case, 𝐴 is called a row vector, while in the latter it is called a column vector.
If 𝑛 = 𝑘, then 𝐴 is called square.
The matrix formed by replacing 𝑎𝑖𝑗 by 𝑎𝑗𝑖 for every 𝑖 and 𝑗 is called the transpose of 𝐴 and
denoted 𝐴′ or 𝐴⊤ .
If 𝐴 = 𝐴′ , then 𝐴 is called symmetric.
For a square matrix $A$, the $n$ elements of the form $a_{ii}$ for $i = 1, \ldots, n$ are called the principal diagonal.
𝐴 is called diagonal if the only nonzero entries are on the principal diagonal.
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then
𝐴 is called the identity matrix and denoted by 𝐼.
Just as was the case for vectors, a number of algebraic operations are defined for matrices.
Scalar multiplication and addition are immediate generalizations of the vector case:

$$\gamma A := \begin{bmatrix} \gamma a_{11} & \cdots & \gamma a_{1k} \\ \vdots & & \vdots \\ \gamma a_{n1} & \cdots & \gamma a_{nk} \end{bmatrix}$$

and

$$A + B := \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1k} + b_{1k} \\ \vdots & & \vdots \\ a_{n1} + b_{n1} & \cdots & a_{nk} + b_{nk} \end{bmatrix}$$

In the latter case, the matrices must have the same shape in order for the definition to make sense.
We also have a convention for multiplying two matrices.
The rule for matrix multiplication generalizes the idea of inner products discussed above and
is designed to make multiplication play well with basic linear operations.
If 𝐴 and 𝐵 are two matrices, then their product 𝐴𝐵 is formed by taking as its 𝑖, 𝑗-th element
the inner product of the 𝑖-th row of 𝐴 and the 𝑗-th column of 𝐵.
There are many tutorials to help you visualize this operation, such as this one, or the discus-
sion on the Wikipedia page.
If 𝐴 is 𝑛 × 𝑘 and 𝐵 is 𝑗 × 𝑚, then to multiply 𝐴 and 𝐵 we require 𝑘 = 𝑗, and the resulting
matrix 𝐴𝐵 is 𝑛 × 𝑚.
As perhaps the most important special case, consider multiplying 𝑛 × 𝑘 matrix 𝐴 and 𝑘 × 1
column vector 𝑥.
According to the preceding rule, this gives us an $n \times 1$ column vector

$$Ax = \begin{bmatrix} a_{11} x_1 + \cdots + a_{1k} x_k \\ \vdots \\ a_{n1} x_1 + \cdots + a_{nk} x_k \end{bmatrix}$$
Note
𝐴𝐵 and 𝐵𝐴 are not generally the same thing.
NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all the standard matrix operations.
You can create them manually from tuples of tuples (or lists of lists) as follows

In [10]: A = ((1, 2),
              (3, 4))
         type(A)

Out[10]: tuple
In [11]: A = np.array(A)
type(A)
Out[11]: numpy.ndarray
In [12]: A.shape
Out[12]: (2, 2)
The shape attribute is a tuple giving the number of rows and columns — see here for more
discussion.
In [13]: A = np.identity(3)
         B = np.ones((3, 3))

         2 * A

Out[13]: array([[2., 0., 0.],
                [0., 2., 0.],
                [0., 0., 2.]])

In [14]: A + B

Out[14]: array([[2., 1., 1.],
                [1., 2., 1.],
                [1., 1., 2.]])
Each 𝑛 × 𝑘 matrix 𝐴 can be identified with a function 𝑓(𝑥) = 𝐴𝑥 that maps 𝑥 ∈ ℝ𝑘 into
𝑦 = 𝐴𝑥 ∈ ℝ𝑛 .
These kinds of functions have a special property: they are linear.
A function 𝑓 ∶ ℝᵏ → ℝⁿ is called linear if, for all 𝑥, 𝑦 ∈ ℝᵏ and all scalars 𝛼, 𝛽, we have

𝑓(𝛼𝑥 + 𝛽𝑦) = 𝛼𝑓(𝑥) + 𝛽𝑓(𝑦)
You can check that this holds for the function 𝑓(𝑥) = 𝐴𝑥 + 𝑏 when 𝑏 is the zero vector and
fails when 𝑏 is nonzero.
In fact, it’s known that 𝑓 is linear if and only if there exists a matrix 𝐴 such that 𝑓(𝑥) = 𝐴𝑥
for all 𝑥.
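A short numerical check of this property, using an arbitrary illustrative matrix (not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # an arbitrary 3 x 2 matrix
f = lambda x: A @ x               # the associated linear map from R^2 to R^3

x, y = np.array([1.0, -2.0]), np.array([0.5, 3.0])
α, β = 2.0, -1.5

# Linearity: f(αx + βy) = αf(x) + βf(y), up to floating-point precision
assert np.allclose(f(α * x + β * y), α * f(x) + β * f(y))
```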
𝑦 = 𝐴𝑥 (3)
The problem we face is to determine a vector 𝑥 ∈ ℝ𝑘 that solves (3), taking 𝑦 and 𝐴 as given.
This is a special case of a more general problem: Find an 𝑥 such that 𝑦 = 𝑓(𝑥).
Given an arbitrary function 𝑓 and a 𝑦, is there always an 𝑥 such that 𝑦 = 𝑓(𝑥)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows

In [15]: def f(x):
             # An illustrative nonlinear function; the original definition was
             # lost in extraction, but any f that is not one-to-one and has
             # limited range serves the purpose
             return 0.6 * np.cos(4 * x) + 1.4

         xmin, xmax = -1, 1
         x = np.linspace(xmin, xmax, 160)
         y = f(x)

         fig, axes = plt.subplots(2, 1, figsize=(10, 10))

         for ax in axes:
             # Set the axes through the origin
             for spine in ['left', 'bottom']:
                 ax.spines[spine].set_position('zero')
             for spine in ['right', 'top']:
                 ax.spines[spine].set_color('none')

         ax = axes[0]
         ax.plot(x, y, 'k-', lw=2)

         ax = axes[1]
         ybar = 2.6
         ax.plot(x, y, 'k-', lw=2)
         ax.plot(x, x * 0 + ybar, 'k', alpha=0.5)
         ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)
         plt.show()
In the first plot, there are multiple solutions, as the function is not one-to-one, while in the
second there are no solutions, since 𝑦 lies outside the range of 𝑓.
Can we impose conditions on 𝐴 in (3) that rule out these problems?
In this context, the most important thing to recognize about the expression 𝐴𝑥 is that it corresponds to a linear combination of the columns of 𝐴.
In particular, if 𝑎1 , … , 𝑎𝑘 are the columns of 𝐴, then
𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘
Let’s discuss some more details, starting with the case where 𝐴 is 𝑛 × 𝑛.
This is the familiar case where the number of unknowns equals the number of equations.
For arbitrary 𝑦 ∈ ℝ𝑛 , we hope to find a unique 𝑥 ∈ ℝ𝑛 such that 𝑦 = 𝐴𝑥.
In view of the observations immediately above, if the columns of 𝐴 are linearly independent,
then their span, and hence the range of 𝑓(𝑥) = 𝐴𝑥, is all of ℝ𝑛 .
Hence there always exists an 𝑥 such that 𝑦 = 𝐴𝑥.
Moreover, the solution is unique.
In particular, the following are equivalent

1. The columns of 𝐴 are linearly independent.
2. For any 𝑦 ∈ ℝⁿ, the equation 𝑦 = 𝐴𝑥 has a unique solution.
The property of having linearly independent columns is sometimes expressed as having full
column rank.
Inverse Matrices

Can we give some sort of expression for the solution? If 𝑦 and 𝐴 are scalar with 𝐴 ≠ 0, then the solution is 𝑥 = 𝐴⁻¹𝑦. A similar expression is available in the matrix case: if the square matrix 𝐴 has linearly independent columns, then it possesses a multiplicative inverse matrix 𝐴⁻¹, with the property that 𝐴𝐴⁻¹ = 𝐴⁻¹𝐴 = 𝐼. Pre-multiplying both sides of 𝑦 = 𝐴𝑥 by 𝐴⁻¹ then gives the solution 𝑥 = 𝐴⁻¹𝑦.
Determinants
Another quick comment about square matrices is that to every such matrix we assign a
unique number called the determinant of the matrix — you can find the expression for it
here.
If the determinant of 𝐴 is not zero, then we say that 𝐴 is nonsingular.
Perhaps the most important fact about determinants is that 𝐴 is nonsingular if and only if 𝐴
is of full column rank.
This gives us a useful one-number summary of whether or not a square matrix can be inverted.
This case is very important in many settings, not least in the setting of linear regression
(where 𝑛 is the number of observations, and 𝑘 is the number of explanatory variables).
Given arbitrary 𝑦 ∈ ℝ𝑛 , we seek an 𝑥 ∈ ℝ𝑘 such that 𝑦 = 𝐴𝑥.
In this setting, the existence of a solution is highly unlikely.
Without much loss of generality, let’s go over the intuition focusing on the case where the
columns of 𝐴 are linearly independent.
It follows that the span of the columns of 𝐴 is a 𝑘-dimensional subspace of ℝ𝑛 .
This span is very “unlikely” to contain arbitrary 𝑦 ∈ ℝ𝑛 .
To see why, recall the figure above, where 𝑘 = 2 and 𝑛 = 3.
Imagine an arbitrarily chosen 𝑦 ∈ ℝ3 , located somewhere in that three-dimensional space.
What’s the likelihood that 𝑦 lies in the span of {𝑎1 , 𝑎2 } (i.e., the two dimensional plane
through these points)?
In a sense, it must be very small, since this plane has zero “thickness”.
As a result, in the 𝑛 > 𝑘 case we usually give up on existence.
However, we can still seek the best approximation, for example, an 𝑥 that makes the distance
‖𝑦 − 𝐴𝑥‖ as small as possible.
To solve this problem, one can use either calculus or the theory of orthogonal projections.
The solution is known to be 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦 — see for example chapter 3 of these notes.
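As a quick sketch (with random illustrative data, not the lecture's code), we can confirm that the normal-equations formula agrees with SciPy's least-squares routine:

```python
import numpy as np
from scipy.linalg import lstsq

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2))   # n = 6 > k = 2; columns independent a.s.
y = rng.standard_normal(6)

# Normal-equations formula: x_hat = (A'A)^{-1} A'y
x_hat = np.linalg.inv(A.T @ A) @ A.T @ y

# SciPy's dedicated least-squares solver gives the same answer
x_ls, *_ = lstsq(A, y)
assert np.allclose(x_hat, x_ls)
```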
This is the 𝑛 × 𝑘 case with 𝑛 < 𝑘, so there are fewer equations than unknowns.
In this case there are either no solutions or infinitely many — in other words, uniqueness
never holds.
For example, consider the case where 𝑘 = 3 and 𝑛 = 2.
Thus, the columns of 𝐴 consist of 3 vectors in ℝ².
This set can never be linearly independent, since it is possible to find two vectors that span
ℝ2 .
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two.
For example, let's say that 𝑎₁ = 𝛼𝑎₂ + 𝛽𝑎₃.

Then if 𝑦 = 𝐴𝑥 = 𝑥₁𝑎₁ + 𝑥₂𝑎₂ + 𝑥₃𝑎₃, we can also write

𝑦 = 𝑥₁(𝛼𝑎₂ + 𝛽𝑎₃) + 𝑥₂𝑎₂ + 𝑥₃𝑎₃ = (𝑥₁𝛼 + 𝑥₂)𝑎₂ + (𝑥₁𝛽 + 𝑥₃)𝑎₃

In other words, many different choices of 𝑥 generate the same 𝑦.
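The following sketch, with hand-picked illustrative columns satisfying 𝑎₁ = 𝑎₂ + 𝑎₃, shows the failure of uniqueness concretely:

```python
import numpy as np

# Three columns in R^2, with a1 = 1*a2 + 1*a3
a2, a3 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
a1 = a2 + a3
A = np.column_stack([a1, a2, a3])

x = np.array([0.0, 2.0, 5.0])
# Shift weight onto a1 and compensate on a2, a3: same y, different x
x_alt = x + np.array([1.0, -1.0, -1.0])

assert np.allclose(A @ x, A @ x_alt)       # both solve y = Ax
assert not np.allclose(x, x_alt)           # yet the solutions differ
```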
Here’s an illustration of how to solve linear equations with SciPy’s linalg submodule.
All of these routines are Python front ends to time-tested and highly optimized FORTRAN code

In [15]: import numpy as np
         from scipy.linalg import inv, solve, det

         A = ((1, 2), (3, 4))
         A = np.array(A)
         y = np.ones((2, 1))   # Column vector

In [16]: det(A)   # Check that A is nonsingular

Out[16]: -2.0

In [17]: A_inv = inv(A)   # Compute the inverse
         A_inv

Out[17]: array([[-2. ,  1. ],
                [ 1.5, -0.5]])

In [18]: x = A_inv @ y    # Solution
         A @ x            # Should equal y

Out[18]: array([[1.],
                [1.]])

In [19]: solve(A, y)      # Produces the same solution

Out[19]: array([[-1.],
                [ 1.]])
Observe that we can solve for 𝑥 = 𝐴⁻¹𝑦 either via inv(A) @ y or by using solve(A, y).
The latter method uses a different algorithm (LU decomposition) that is numerically more
stable, and hence should almost always be preferred.
To obtain the least-squares solution 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦, use scipy.linalg.lstsq(A, y).
Let 𝐴 be a square matrix. If 𝜆 is a scalar and 𝑣 is a nonzero vector such that

𝐴𝑣 = 𝜆𝑣

then we say that 𝜆 is an eigenvalue of 𝐴, and 𝑣 is an eigenvector.

In [20]: A = ((1, 2),
              (2, 1))   # example matrix (literal reconstructed; lost in extraction)
         A = np.array(A)
         evals, evecs = eig(A)
         evecs = evecs[:, 0], evecs[:, 1]
         # (plotting commands for the eigenvectors were lost in extraction)
         plt.show()
The eigenvalue equation is equivalent to (𝐴 − 𝜆𝐼)𝑣 = 0, and this has a nonzero solution 𝑣 only
when the columns of 𝐴 − 𝜆𝐼 are linearly dependent.
This in turn is equivalent to stating that the determinant is zero.
Hence to find all eigenvalues, we can look for 𝜆 such that the determinant of 𝐴 − 𝜆𝐼 is zero.
This problem can be expressed as one of solving for the roots of a polynomial in 𝜆 of degree
𝑛.
This in turn implies the existence of 𝑛 solutions in the complex plane, although some might
be repeated.
Some nice facts about the eigenvalues of a square matrix 𝐴 are as follows

1. The determinant of 𝐴 equals the product of the eigenvalues.
2. The trace of 𝐴 (the sum of the elements on the principal diagonal) equals the sum of the eigenvalues.
3. If 𝐴 is symmetric, then all of its eigenvalues are real.
4. If 𝐴 is invertible and 𝜆₁, … , 𝜆ₙ are its eigenvalues, then the eigenvalues of 𝐴⁻¹ are 1/𝜆₁, … , 1/𝜆ₙ.
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues
are nonzero.
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows
In [21]: A = ((1, 2),
              (2, 1))   # example matrix (literal reconstructed; lost in extraction)
         A = np.array(A)
         evals, evecs = eig(A)
         evals

Out[21]: array([ 3.+0.j, -1.+0.j])

In [22]: evecs

Out[22]: array([[ 0.70710678, -0.70710678],
                [ 0.70710678,  0.70710678]])
It is sometimes useful to consider the generalized eigenvalue problem, which, for given matri-
ces 𝐴 and 𝐵, seeks generalized eigenvalues 𝜆 and eigenvectors 𝑣 such that
𝐴𝑣 = 𝜆𝐵𝑣
We round out our discussion by briefly mentioning several other important topics.
Recall the usual summation formula for a geometric progression, which states that if |𝑎| < 1, then ∑ₖ₌₀^∞ 𝑎ᵏ = (1 − 𝑎)⁻¹.
A generalization of this idea exists in the matrix setting.
Matrix Norms

Let 𝐴 be a square matrix, and let

‖𝐴‖ ∶= max {‖𝐴𝑥‖ ∶ ‖𝑥‖ = 1}

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand side is a matrix norm — in this case, the so-called spectral norm.
For example, for a square matrix 𝑆, the condition ‖𝑆‖ < 1 means that 𝑆 is contractive, in the sense that it pulls all vectors towards the origin [2].
Neumann's Theorem

Neumann's theorem states the following: if ‖𝐴ᵏ‖ < 1 for some 𝑘 ∈ ℕ, then 𝐼 − 𝐴 is invertible, and

(𝐼 − 𝐴)⁻¹ = ∑ₖ₌₀^∞ 𝐴ᵏ (4)
Spectral Radius
A result known as Gelfand's formula tells us that, for any square matrix 𝐴,

𝜌(𝐴) = limₖ→∞ ‖𝐴ᵏ‖^(1/𝑘)

Here 𝜌(𝐴) is the spectral radius, defined as maxᵢ |𝜆ᵢ|, where {𝜆ᵢ}ᵢ is the set of eigenvalues of 𝐴.

As a consequence of Gelfand's formula, if all eigenvalues are strictly less than one in modulus, then there exists a 𝑘 with ‖𝐴ᵏ‖ < 1, in which case (4) is valid.
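Here is a quick numerical illustration of the matrix geometric series, using an arbitrary matrix with spectral radius below one:

```python
import numpy as np

A = np.array([[0.4, 0.1],
              [0.3, 0.2]])   # eigenvalues 0.5 and 0.1, so ρ(A) < 1

# Accumulate partial sums of I + A + A^2 + ...
S, term = np.zeros((2, 2)), np.eye(2)
for _ in range(200):
    S += term
    term = term @ A

# The series converges to (I - A)^{-1}
assert np.allclose(S, np.linalg.inv(np.eye(2) - A))
```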
A symmetric 𝑛 × 𝑛 matrix 𝐴 is called positive definite if 𝑥′𝐴𝑥 > 0 for every nonzero 𝑥 ∈ ℝⁿ, and positive semi-definite if 𝑥′𝐴𝑥 ≥ 0 for every 𝑥 ∈ ℝⁿ.

Analogous definitions exist for negative definite and negative semi-definite matrices.
It is notable that if 𝐴 is positive definite, then all of its eigenvalues are strictly positive, and
hence 𝐴 is invertible (with positive definite inverse).
Let 𝑥 and 𝑎 be 𝑛 × 1 vectors, 𝐴 an 𝑛 × 𝑛 matrix, 𝑦 an 𝑚 × 1 vector, 𝑧 an 𝑛 × 1 vector, and 𝐵 an 𝑚 × 𝑛 matrix. Then

1. 𝜕(𝑎′𝑥)/𝜕𝑥 = 𝑎
2. 𝜕(𝐴𝑥)/𝜕𝑥 = 𝐴′
3. 𝜕(𝑥′𝐴𝑥)/𝜕𝑥 = (𝐴 + 𝐴′)𝑥
4. 𝜕(𝑦′𝐵𝑧)/𝜕𝑦 = 𝐵𝑧
5. 𝜕(𝑦′𝐵𝑧)/𝜕𝐵 = 𝑦𝑧′
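Rule 3, for instance, can be checked against a finite-difference gradient (illustrative random data; `grad_fd` is a small helper defined here, not a library function):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def grad_fd(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# d(x'Ax)/dx = (A + A')x
g_numeric = grad_fd(lambda v: v @ A @ v, x)
assert np.allclose(g_numeric, (A + A.T) @ x, atol=1e-4)
```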
4.8 Exercises
4.8.1 Exercise 1

Let 𝑥 be a given 𝑛 × 1 vector and consider the problem of choosing vectors 𝑦 and 𝑢 to maximize

−𝑦′𝑃𝑦 − 𝑢′𝑄𝑢

subject to the constraint

𝑦 = 𝐴𝑥 + 𝐵𝑢

Here
• 𝑃 is an 𝑛 × 𝑛 matrix and 𝑄 is an 𝑚 × 𝑚 matrix
• 𝐴 is an 𝑛 × 𝑛 matrix and 𝐵 is an 𝑛 × 𝑚 matrix
• both 𝑃 and 𝑄 are symmetric and positive semidefinite
(What must the dimensions of 𝑦 and 𝑢 be to make this a well-posed problem?)
One way to solve the problem is to form the Lagrangian
ℒ = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]
Try applying the formulas given above for differentiating quadratic and linear forms to ob-
tain the first-order conditions for maximizing ℒ with respect to 𝑦, 𝑢 and minimizing it with
respect to 𝜆.
Show that these conditions imply that

1. 𝜆 = −2𝑃𝑦.
2. The optimizing choice of 𝑢 satisfies 𝑢 = −(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴𝑥.
3. The function 𝑣 satisfies 𝑣(𝑥) = −𝑥′𝑃̃𝑥, where 𝑃̃ = 𝐴′𝑃𝐴 − 𝐴′𝑃𝐵(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴.
As we will see, in economic contexts Lagrange multipliers often are shadow prices.
Note
If we don’t care about the Lagrange multipliers, we can substitute the constraint
into the objective function, and then just maximize −(𝐴𝑥+𝐵𝑢)′ 𝑃 (𝐴𝑥+𝐵𝑢)−𝑢′ 𝑄𝑢
with respect to 𝑢. You can verify that this leads to the same maximizer.
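As a quick numerical sanity check on the substituted problem (a sketch with arbitrary illustrative matrices, not the lecture's code), we can verify that the 𝑢 given in part 2 of the exercise does maximize the objective:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
P = np.eye(n)              # symmetric positive semidefinite
Q = 2.0 * np.eye(m)        # symmetric positive definite
x = rng.standard_normal(n)

# Candidate maximizer from the exercise: u = -(Q + B'PB)^{-1} B'PAx
u_star = -np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A @ x)

def objective(u):
    y = A @ x + B @ u
    return -y @ P @ y - u @ Q @ u

# The objective is strictly concave in u, so u_star beats all perturbations
base = objective(u_star)
for _ in range(100):
    assert objective(u_star + 0.1 * rng.standard_normal(m)) <= base + 1e-12
```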
4.9 Solutions

4.9.1 Exercise 1

We have an optimization problem:

𝑣(𝑥) = max_{𝑦,𝑢} {−𝑦′𝑃𝑦 − 𝑢′𝑄𝑢}

s.t.

𝑦 = 𝐴𝑥 + 𝐵𝑢
with primitives
• 𝑃 be a symmetric and positive semidefinite 𝑛 × 𝑛 matrix
• 𝑄 be a symmetric and positive semidefinite 𝑚 × 𝑚 matrix
• 𝐴 an 𝑛 × 𝑛 matrix
• 𝐵 an 𝑛 × 𝑚 matrix
The associated Lagrangian is:
𝐿 = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]
1. 𝜆 = −2𝑃𝑦.
Differentiating the Lagrangian w.r.t. 𝑦 and setting the derivative equal to zero yields

𝜕𝐿/𝜕𝑦 = −(𝑃 + 𝑃′)𝑦 − 𝜆 = −2𝑃𝑦 − 𝜆 = 0,

since 𝑃 is symmetric.
Accordingly, the first-order condition for maximizing L w.r.t. y implies
𝜆 = −2𝑃 𝑦
2. 𝑢 = −(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴𝑥.
Differentiating the Lagrangian w.r.t. 𝑢 and setting the derivative equal to zero yields

𝜕𝐿/𝜕𝑢 = −(𝑄 + 𝑄′)𝑢 + 𝐵′𝜆 = −2𝑄𝑢 + 𝐵′𝜆 = 0
Substituting 𝜆 = −2𝑃𝑦 gives

𝑄𝑢 + 𝐵′𝑃𝑦 = 0

Substituting the constraint 𝑦 = 𝐴𝑥 + 𝐵𝑢 into this equation gives

𝑄𝑢 + 𝐵′𝑃(𝐴𝑥 + 𝐵𝑢) = 0

(𝑄 + 𝐵′𝑃𝐵)𝑢 + 𝐵′𝑃𝐴𝑥 = 0

Solving for 𝑢 yields

𝑢 = −(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴𝑥,

which follows from the definition of the first-order conditions for the Lagrangian equation.
3. 𝑣(𝑥) = −𝑥′𝑃̃𝑥.
Rewriting our problem by substituting the constraint into the objective function, we get

𝑣(𝑥) = −(𝐴𝑥 + 𝐵𝑢)′𝑃(𝐴𝑥 + 𝐵𝑢) − 𝑢′𝑄𝑢 = −𝑥′𝐴′𝑃𝐴𝑥 − 2𝑢′𝐵′𝑃𝐴𝑥 − 𝑢′(𝑄 + 𝐵′𝑃𝐵)𝑢

Since we know the optimal choice of 𝑢 satisfies 𝑢 = −(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴𝑥, write 𝑢 = 𝑆𝑥 with 𝑆 ∶= −(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴. Then, for the second term,

−2𝑢′𝐵′𝑃𝐴𝑥 = −2𝑥′𝑆′𝐵′𝑃𝐴𝑥 = 2𝑥′𝐴′𝑃𝐵(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴𝑥

Notice that the term (𝑄 + 𝐵′𝑃𝐵)⁻¹ is symmetric as both 𝑃 and 𝑄 are symmetric.

Regarding the third term −𝑢′(𝑄 + 𝐵′𝑃𝐵)𝑢,

−𝑢′(𝑄 + 𝐵′𝑃𝐵)𝑢 = −𝑥′𝑆′(𝑄 + 𝐵′𝑃𝐵)𝑆𝑥 = −𝑥′𝐴′𝑃𝐵(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴𝑥

Hence, the summation of the second and third terms is 𝑥′𝐴′𝑃𝐵(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴𝑥.

This implies that

𝑣(𝑥) = −𝑥′𝐴′𝑃𝐴𝑥 + 𝑥′𝐴′𝑃𝐵(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴𝑥 = −𝑥′[𝐴′𝑃𝐴 − 𝐴′𝑃𝐵(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴]𝑥

Therefore, the solution to the optimization problem 𝑣(𝑥) = −𝑥′𝑃̃𝑥 follows the above result by denoting 𝑃̃ ∶= 𝐴′𝑃𝐴 − 𝐴′𝑃𝐵(𝑄 + 𝐵′𝑃𝐵)⁻¹𝐵′𝑃𝐴
Footnotes
[1] Although there is a specialized matrix data type defined in NumPy, it’s more standard to
work with ordinary NumPy arrays. See this discussion.
[2] Suppose that ‖𝑆‖ < 1. Take any nonzero vector 𝑥, and let 𝑟 ∶= ‖𝑥‖. We have ‖𝑆𝑥‖ =
𝑟‖𝑆(𝑥/𝑟)‖ ≤ 𝑟‖𝑆‖ < 𝑟 = ‖𝑥‖. Hence every point is pulled towards the origin.
Chapter 5

Complex Numbers and Trigonometry
5.1 Contents
• Overview 5.2
• De Moivre’s Theorem 5.3
• Applications of de Moivre’s Theorem 5.4
5.2 Overview
A complex number 𝑧 = 𝑥 + 𝑖𝑦 has modulus

𝑟 = |𝑧| = √(𝑥² + 𝑦²)
The value 𝜃 is the angle of (𝑥, 𝑦) with respect to the real axis.
Evidently, the tangent of 𝜃 is 𝑦/𝑥.

Therefore,

𝜃 = tan⁻¹(𝑦/𝑥)
Three elementary trigonometric functions are

cos 𝜃 = 𝑥/𝑟, sin 𝜃 = 𝑦/𝑟, tan 𝜃 = 𝑦/𝑥
5.2.2 An Example
Consider the complex number 𝑧 = 1 + √3𝑖.

For 𝑧 = 1 + √3𝑖, 𝑥 = 1, 𝑦 = √3.

It follows that 𝑟 = 2 and 𝜃 = tan⁻¹(√3) = 𝜋/3 = 60°.
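NumPy's complex-number support lets us confirm these values directly:

```python
import numpy as np

z = 1 + np.sqrt(3) * 1j
r, θ = np.abs(z), np.angle(z)    # modulus and argument of z

assert np.isclose(r, 2.0)
assert np.isclose(θ, np.pi / 3)
assert np.isclose(z, r * np.exp(1j * θ))   # z = r e^{iθ}
```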
Let's use Python to plot the trigonometric form of the complex number 𝑧 = 1 + √3𝑖.
# Abbreviate a useful value
π = np.pi

# Set parameters
r = 2
θ = π/3
x = r * np.cos(θ)
x_range = np.linspace(0, x, 1000)
θ_range = np.linspace(0, θ, 1000)

# Plot
fig = plt.figure(figsize=(8, 8))
ax = plt.subplot(111, projection='polar')

# (The plot commands below are a reconstruction; the originals were lost
# in extraction)
ax.plot((0, θ), (0, r), marker='o', color='b')            # modulus r
ax.plot(np.zeros(x_range.shape), x_range, color='b')      # real part x
ax.plot(θ_range, x / np.cos(θ_range), color='b')          # vertical side y
ax.plot(θ_range, np.full(θ_range.shape, 0.1), color='r')  # angle θ

ax.set_rmax(2)
ax.set_rticks((0.5, 1, 1.5, 2))  # Fewer radial ticks
ax.set_rlabel_position(88.5)     # Move radial labels away from plotted line
ax.grid(True)
plt.show()
De Moivre's theorem states that

(𝑟(cos 𝜃 + 𝑖 sin 𝜃))ⁿ = 𝑟ⁿ(cos 𝑛𝜃 + 𝑖 sin 𝑛𝜃)

To prove it, note that

(𝑟(cos 𝜃 + 𝑖 sin 𝜃))ⁿ = (𝑟𝑒^{𝑖𝜃})ⁿ

and compute.
5.4.1 Example 1
1 = 𝑒^{𝑖𝜃}𝑒^{−𝑖𝜃}
  = (cos 𝜃 + 𝑖 sin 𝜃)(cos(−𝜃) + 𝑖 sin(−𝜃))
  = (cos 𝜃 + 𝑖 sin 𝜃)(cos 𝜃 − 𝑖 sin 𝜃)
  = cos²𝜃 + sin²𝜃
  = 𝑥²/𝑟² + 𝑦²/𝑟²
and thus
𝑥2 + 𝑦2 = 𝑟2
5.4.2 Example 2
𝑥ₙ = 𝑎𝑧ⁿ + 𝑎̄𝑧̄ⁿ
   = 𝑝𝑒^{𝑖𝜔}(𝑟𝑒^{𝑖𝜃})ⁿ + 𝑝𝑒^{−𝑖𝜔}(𝑟𝑒^{−𝑖𝜃})ⁿ
   = 𝑝𝑟ⁿ𝑒^{𝑖(𝜔+𝑛𝜃)} + 𝑝𝑟ⁿ𝑒^{−𝑖(𝜔+𝑛𝜃)}
   = 𝑝𝑟ⁿ[cos(𝜔 + 𝑛𝜃) + 𝑖 sin(𝜔 + 𝑛𝜃) + cos(𝜔 + 𝑛𝜃) − 𝑖 sin(𝜔 + 𝑛𝜃)]
   = 2𝑝𝑟ⁿ cos(𝜔 + 𝑛𝜃)
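The chain of equalities can be verified numerically for arbitrary illustrative values of 𝑝, 𝜔, 𝑟, 𝜃:

```python
import numpy as np

p, ω = 1.3, 0.4          # modulus and angle of a = p e^{iω} (illustrative)
r, θ = 0.9, np.pi / 6    # modulus and angle of z = r e^{iθ} (illustrative)
a, z = p * np.exp(1j * ω), r * np.exp(1j * θ)

for n in range(10):
    lhs = a * z**n + np.conj(a) * np.conj(z)**n   # x_n as defined above
    rhs = 2 * p * r**n * np.cos(ω + n * θ)
    # x_n is real and equals 2 p r^n cos(ω + nθ)
    assert np.isclose(lhs.real, rhs) and np.isclose(lhs.imag, 0.0)
```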
5.4.3 Example 3
This example provides machinery that is at the heart of Samuelson's analysis of his multiplier-accelerator model [93].
Thus, consider a second-order linear difference equation

𝑥ₙ₊₂ = 𝑐₁𝑥ₙ₊₁ + 𝑐₂𝑥ₙ

whose characteristic polynomial

𝑧² − 𝑐₁𝑧 − 𝑐₂ = 0

or

(𝑧² − 𝑐₁𝑧 − 𝑐₂) = (𝑧 − 𝑧₁)(𝑧 − 𝑧₂) = 0

has roots 𝑧₁, 𝑧₂.

A solution is a sequence {𝑥ₙ}ₙ₌₀^∞ that satisfies the difference equation.
Under the following circumstances, we can apply our example 2 formula to solve the difference equation
• the roots 𝑧₁, 𝑧₂ of the characteristic polynomial of the difference equation form a complex conjugate pair
• the values 𝑥0 , 𝑥1 are given initial conditions
To solve the difference equation, recall from example 2 that

𝑥ₙ = 2𝑝𝑟ⁿ cos(𝜔 + 𝑛𝜃)

where 𝜔, 𝑝 are coefficients to be determined from information encoded in the initial conditions 𝑥₀, 𝑥₁.
Since 𝑥₀ = 2𝑝 cos 𝜔 and 𝑥₁ = 2𝑝𝑟 cos(𝜔 + 𝜃), the ratio of 𝑥₁ to 𝑥₀ is

𝑥₁/𝑥₀ = 𝑟 cos(𝜔 + 𝜃)/cos 𝜔

We can solve this equation for 𝜔, then solve for 𝑝 using 𝑥₀ = 2𝑝𝑟⁰ cos(𝜔 + 0 ⋅ 𝜃) = 2𝑝 cos 𝜔.
With the sympy package in Python, we are able to solve and plot the dynamics of 𝑥𝑛 given
different values of 𝑛.
In this example, we set the initial values:

• 𝑟 = 0.9
• 𝜃 = ¼𝜋
• 𝑥₀ = 4
• 𝑥₁ = 𝑟 ⋅ 2√2 = 1.8√2
We first numerically solve for 𝜔 and 𝑝 using nsolve in the sympy package based on the above
initial condition:
# Solve for ω
## Note: we choose the solution near 0
eq1 = Eq(x1/x0 - r * cos(ω+θ) / cos(ω), 0)
ω = nsolve(eq1, ω, 0)
ω = np.float(ω)
print(f'ω = {ω:1.3f}')

# Solve for p
eq2 = Eq(x0 - 2 * p * cos(ω), 0)
p = nsolve(eq2, p, 0)
p = np.float(p)
print(f'p = {p:1.3f}')
ω = 0.000
p = 2.000
# Define x_n
x = lambda n: 2 * p * r**n * np.cos(ω + n * θ)

# Range of n (values reconstructed; the original literals were lost in
# extraction)
max_n = 30
n = np.arange(0, max_n + 1)

# Plot
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(n, x(n))
ax.set(xlim=(0, max_n), ylim=(-5, 5), xlabel='$n$', ylabel='$x_n$')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.grid()
plt.show()
cos(𝜔 + 𝜃) = (𝑒^{𝑖(𝜔+𝜃)} + 𝑒^{−𝑖(𝜔+𝜃)})/2

sin(𝜔 + 𝜃) = (𝑒^{𝑖(𝜔+𝜃)} − 𝑒^{−𝑖(𝜔+𝜃)})/(2𝑖)

Since both real and imaginary parts of the above formula should be equal, we get:

cos(𝜔 + 𝜃) = cos 𝜔 cos 𝜃 − sin 𝜔 sin 𝜃

sin(𝜔 + 𝜃) = cos 𝜔 sin 𝜃 + sin 𝜔 cos 𝜃

The equations above are also known as the angle sum identities. We can verify the equations using the simplify function in the sympy package:
# Verify
ω, θ = Symbol('ω'), Symbol('θ')   # redefine as symbols for symbolic work

print("cos(ω)cos(θ) - sin(ω)sin(θ) =",
      simplify(cos(ω)*cos(θ) - sin(ω) * sin(θ)))
print("cos(ω)sin(θ) + sin(ω)cos(θ) =",
      simplify(cos(ω)*sin(θ) + sin(ω) * cos(θ)))
We can also compute the trigonometric integrals using polar forms of complex numbers.
For example, we want to solve the following integral:
∫₋π^π cos(𝜔) sin(𝜔) 𝑑𝜔

Analytically, we have:

∫₋π^π cos(𝜔) sin(𝜔) 𝑑𝜔 = ½ sin²(𝜋) − ½ sin²(−𝜋) = 0
We can verify the analytical as well as numerical results using integrate in the sympy package:
In [6]: ω = Symbol('ω')
        print('The analytical solution for the integral of cos(ω)sin(ω) is:')
        integrate(cos(ω) * sin(ω), ω)

Out[6]: sin²(𝜔)/2

In [7]: print('The numerical solution for the integral from -π to π is:')
        integrate(cos(ω) * sin(ω), (ω, -π, π))

Out[7]: 0
5.4.6 Exercises
We invite the reader to verify analytically and with the sympy package the following two
equalities:
∫₋π^π cos²(𝜔) 𝑑𝜔 = 𝜋

∫₋π^π sin²(𝜔) 𝑑𝜔 = 𝜋
Chapter 6

LLN and CLT
6.1 Contents
• Overview 6.2
• Relationships 6.3
• LLN 6.4
• CLT 6.5
• Exercises 6.6
• Solutions 6.7
6.2 Overview
This lecture illustrates two of the most important theorems of probability and statistics: The
law of large numbers (LLN) and the central limit theorem (CLT).
These beautiful theorems lie behind many of the most fundamental results in econometrics
and quantitative economic modeling.
The lecture is based around simulations that show the LLN and CLT in action.
We also demonstrate how the LLN and CLT break down when the assumptions they are
based on do not hold.
In addition, we examine several useful extensions of the classical theorems, such as
• The delta method, for smooth functions of random variables.
• The multivariate case.
Some of these extensions are presented as exercises.
We'll need the following imports:

In [1]: # Import list reconstructed from the functions used later in the chapter
        import random
        import numpy as np
        import matplotlib.pyplot as plt
        from scipy.stats import t, beta, lognorm, expon, gamma, uniform, cauchy
        from scipy.stats import gaussian_kde, poisson, binom, norm, chi2
        from scipy.linalg import inv, sqrtm
6.3 Relationships

The CLT refines the LLN: the LLN gives conditions under which sample moments converge to population moments as the sample size grows, while the CLT provides information about the rate at which they converge.
6.4 LLN
We begin with the law of large numbers, which tells us when sample averages will converge to
their population means.
The classical law of large numbers concerns independent and identically distributed (IID)
random variables.
Here is the strongest version of the classical LLN, known as Kolmogorov’s strong law.
Let 𝑋₁, … , 𝑋ₙ be independent and identically distributed scalar random variables, with common distribution 𝐹.
When it exists, let 𝜇 denote the common mean of this sample:
𝜇 ∶= 𝔼𝑋 = ∫ 𝑥𝐹 (𝑑𝑥)
In addition, let

𝑋̄ₙ ∶= (1/𝑛) ∑ᵢ₌₁ⁿ 𝑋ᵢ

Kolmogorov's strong law states that, if 𝔼|𝑋| is finite, then

ℙ {𝑋̄ₙ → 𝜇 as 𝑛 → ∞} = 1 (1)
6.4.2 Proof
The proof of Kolmogorov’s strong law is nontrivial – see, for example, theorem 8.3.5 of [31].
On the other hand, we can prove a weaker version of the LLN very easily and still get most of
the intuition.
The version we prove is as follows: If 𝑋₁, … , 𝑋ₙ is IID with 𝔼𝑋ᵢ² < ∞, then, for any 𝜖 > 0, we have

ℙ {|𝑋̄ₙ − 𝜇| ≥ 𝜖} → 0 as 𝑛 → ∞ (2)

(This version is weaker because we claim only convergence in probability rather than almost sure convergence, and assume a finite second moment)
To see that this is so, fix 𝜖 > 0, and let 𝜎2 be the variance of each 𝑋𝑖 .
Recall the Chebyshev inequality, which tells us that

ℙ {|𝑋̄ₙ − 𝜇| ≥ 𝜖} ≤ 𝔼[(𝑋̄ₙ − 𝜇)²]/𝜖² (3)
Now observe that

𝔼[(𝑋̄ₙ − 𝜇)²] = 𝔼[((1/𝑛) ∑ᵢ₌₁ⁿ (𝑋ᵢ − 𝜇))²]
             = (1/𝑛²) ∑ᵢ₌₁ⁿ ∑ⱼ₌₁ⁿ 𝔼(𝑋ᵢ − 𝜇)(𝑋ⱼ − 𝜇)
             = (1/𝑛²) ∑ᵢ₌₁ⁿ 𝔼(𝑋ᵢ − 𝜇)²
             = 𝜎²/𝑛
Here the crucial step is at the third equality, which follows from independence.
Independence means that if 𝑖 ≠ 𝑗, then the covariance term 𝔼(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) drops out.
As a result, 𝑛2 − 𝑛 terms vanish, leading us to a final expression that goes to zero in 𝑛.
Combining our last result with (3), we come to the estimate

ℙ {|𝑋̄ₙ − 𝜇| ≥ 𝜖} ≤ 𝜎²/(𝑛𝜖²) (4)
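A small simulation (illustrative, with uniform draws) confirms that the empirical frequency of large deviations respects this bound:

```python
import numpy as np

rng = np.random.default_rng(4)
n, ϵ, reps = 100, 0.1, 50_000
μ, σ = 0.5, np.sqrt(1 / 12)        # mean and std of the uniform on [0, 1]

# Empirical frequency of |X̄_n - μ| ≥ ϵ across many replications
draws = rng.uniform(0, 1, size=(reps, n))
sample_means = draws.mean(axis=1)
freq = np.mean(np.abs(sample_means - μ) >= ϵ)

bound = σ**2 / (n * ϵ**2)          # the Chebyshev-based bound, ≈ 0.083
assert freq <= bound
```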
This idea is very important in time series analysis, and we’ll come across it again soon
enough.
6.4.3 Illustration
Let’s now illustrate the classical IID law of large numbers using simulation.
In particular, we aim to generate some sequences of IID random variables and plot the evolution of 𝑋̄ₙ as 𝑛 increases.
Below is a figure that does just this (as usual, you can click on it to expand it).
It shows IID observations from three different distributions and plots 𝑋̄ 𝑛 against 𝑛 in each
case.
The dots represent the underlying observations 𝑋𝑖 for 𝑖 = 1, … , 100.
In each of the three cases, convergence of 𝑋̄ 𝑛 to 𝜇 occurs as predicted
In [2]: n = 100

# Candidate distributions (dictionary reconstructed; the original entries
# were lost in extraction)
distributions = {"β(2, 2)": beta(2, 2),
                 "lognormal LN(0, 0.5)": lognorm(0.5),
                 "exponential with λ = 1": expon(1)}

fig, axes = plt.subplots(3, 1, figsize=(10, 10))
plt.subplots_adjust(hspace=0.5)
legend_args = {'ncol': 2, 'loc': 'upper right'}

for ax in axes:
    # Choose a randomly selected distribution
    name = random.choice(list(distributions.keys()))
    distribution = distributions.pop(name)

    # Generate n draws and the sequence of sample means
    data = distribution.rvs(n)
    sample_mean = np.cumsum(data) / np.arange(1, n + 1)

    # Plot
    ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5)
    axlabel = '$\\bar X_n$ for $X_i \sim$' + name
    ax.plot(list(range(n)), sample_mean, 'g', lw=3, alpha=0.6, label=axlabel)
    m = distribution.mean()
    ax.plot(list(range(n)), [m] * n, 'k', lw=1.5, label='$\mu$')
    ax.vlines(list(range(n)), m, data, lw=0.2)
    ax.legend(**legend_args, fontsize=12)

plt.show()
The three distributions are chosen at random from a selection stored in the dictionary distributions.
6.5 CLT
Next, we turn to the central limit theorem, which tells us about the distribution of the deviation between sample averages and population means.
The central limit theorem is one of the most remarkable results in all of mathematics.
In the classical IID setting, it tells us the following:
If the sequence 𝑋1 , … , 𝑋𝑛 is IID, with common mean 𝜇 and common variance 𝜎2 ∈ (0, ∞),
then
√ 𝑑
𝑛(𝑋̄ 𝑛 − 𝜇) → 𝑁 (0, 𝜎2 ) as 𝑛→∞ (5)
Here →𝑑 𝑁(0, 𝜎²) indicates convergence in distribution to a centered (i.e., zero mean) normal with standard deviation 𝜎.
6.5.2 Intuition
The striking implication of the CLT is that for any distribution with finite second moment,
the simple operation of adding independent copies always leads to a Gaussian curve.
A relatively simple proof of the central limit theorem can be obtained by working with characteristic functions (see, e.g., theorem 9.5.6 of [31]).
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition.
In fact, all of the proofs of the CLT that we know are similar in this respect.
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating the addition of independent Bernoulli
random variables.
In particular, let 𝑋𝑖 be binary, with ℙ{𝑋𝑖 = 0} = ℙ{𝑋𝑖 = 1} = 0.5, and let 𝑋1 , … , 𝑋𝑛 be
independent.
Think of 𝑋ᵢ = 1 as a "success", so that 𝑌ₙ = ∑ᵢ₌₁ⁿ 𝑋ᵢ is the number of successes in 𝑛 trials.
The next figure plots the probability mass function of 𝑌ₙ for 𝑛 = 1, 2, 4, 8

In [3]: fig, axes = plt.subplots(2, 2, figsize=(10, 6))
        plt.subplots_adjust(hspace=0.4)
        axes = axes.flatten()
        ns = (1, 2, 4, 8)
        dom = list(range(9))

        for ax, n in zip(axes, ns):
            b = binom(n, 0.5)
            ax.bar(dom, b.pmf(dom), alpha=0.6, align='center')
            ax.set(xlim=(-0.5, 8.5), ylim=(0, 0.55),
                   xticks=list(range(9)), yticks=(0, 0.2, 0.4),
                   title=f'$n = {n}$')

        plt.show()
When 𝑛 = 1, the distribution is flat — one success or no successes have the same probability.
When 𝑛 = 2 we can either have 0, 1 or 2 successes.
Notice the peak in probability mass at the mid-point 𝑘 = 1.
The reason is that there are more ways to get 1 success (“fail then succeed” or “succeed then
fail”) than to get zero or two successes.
Moreover, the two trials are independent, so the outcomes “fail then succeed” and “succeed
then fail” are just as likely as the outcomes “fail then fail” and “succeed then succeed”.
(If there was positive correlation, say, then "succeed then fail" would be less likely than "succeed then succeed")
Here, already we have the essence of the CLT: addition under independence leads probability
mass to pile up in the middle and thin out at the tails.
For 𝑛 = 4 and 𝑛 = 8 we again get a peak at the "middle" value (halfway between the minimum and the maximum possible value).
The intuition is the same — there are simply more ways to get these middle outcomes.
If we continue, the bell-shaped curve becomes even more pronounced.
We are witnessing the binomial approximation of the normal distribution.
6.5.3 Simulation 1
Since the CLT seems almost magical, running simulations that verify its implications is one
good way to build intuition.
To this end, we now perform the following simulation

1. Choose an arbitrary distribution 𝐹 for the underlying observations 𝑋ᵢ.
2. Generate independent draws of 𝑌ₙ ∶= √𝑛(𝑋̄ₙ − 𝜇).
3. Use these draws to compute some measure of their distribution — such as a histogram.
4. Compare the latter to 𝑁(0, 𝜎²).

Here's some code that does exactly this for the exponential distribution 𝐹(𝑥) = 1 − 𝑒^{−𝜆𝑥}.
(Please experiment with other choices of 𝐹, but remember that, to conform with the conditions of the CLT, the distribution must have a finite second moment)
# Setup (reconstructed; the generation step was lost in extraction)
n, k = 250, 100000
distribution = expon(2)   # exponential distribution, λ = 1/2
μ, s = distribution.mean(), distribution.std()
data = distribution.rvs((k, n))            # each row is a draw of X_1,...,X_n
Y = np.sqrt(n) * (data.mean(axis=1) - μ)   # k draws of Y_n

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k', lw=2, label='$N(0, \sigma^2)$')
ax.legend()
plt.show()
Notice the absence of for loops — every operation is vectorized, meaning that the major calculations are all shifted to highly optimized C code.
The fit to the normal density is already tight and can be further improved by increasing n.
You can also experiment with other specifications of 𝐹 .
6.5.4 Simulation 2
Our next simulation is somewhat like the first, except that we aim to track the distribution of 𝑌ₙ ∶= √𝑛(𝑋̄ₙ − 𝜇) as 𝑛 increases.
In the simulation, we’ll be working with random variables having 𝜇 = 0.
Thus, when 𝑛 = 1, we have 𝑌1 = 𝑋1 , so the first distribution is just the distribution of the
underlying random variable.
For 𝑛 = 2, the distribution of 𝑌₂ is that of (𝑋₁ + 𝑋₂)/√2, and so on.
What we expect is that, regardless of the distribution of the underlying random variable, the
distribution of 𝑌𝑛 will smooth out into a bell-shaped curve.
The next figure shows this process for 𝑋ᵢ ∼ 𝑓, where 𝑓 was specified as the convex combination of three different beta densities.
(Taking a convex combination is an easy way to produce an irregular shape for 𝑓)
In the figure, the closest density is that of 𝑌1 , while the furthest is that of 𝑌5
def gen_x_draws(k):
    """
    Returns a flat array containing k independent draws from the
    distribution of X, the underlying random variable. This distribution
    is itself a convex combination of three beta distributions.
    """
    # (Function body reconstructed; the original was lost in extraction)
    bdraws = beta(2, 2).rvs((3, k))
    # Transform rows, so each represents a different distribution
    bdraws[0, :] -= 0.5
    bdraws[1, :] += 0.6
    bdraws[2, :] -= 1.1
    # Set X[i] = bdraws[j, i], where j is a random draw from {0, 1, 2}
    js = np.random.randint(0, 3, size=k)
    X = bdraws[js, np.arange(k)]
    # Rescale, so that the random variable is zero mean, unit variance
    return (X - X.mean()) / X.std()

nmax = 5
reps = 100000
ns = list(range(1, nmax + 1))

# Form a matrix Z such that each column is reps independent draws of X,
# then build Y so that column j holds draws of Y_j (reconstructed step)
Z = np.empty((reps, nmax))
for i in range(nmax):
    Z[:, i] = gen_x_draws(reps)
S = Z.cumsum(axis=1)                 # cumulative sums across columns
Y = (1 / np.sqrt(ns)) * S            # scale the j-th column by 1/sqrt(j)

# Plot
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(projection='3d')

a, b = -3, 3
gs = 100
xs = np.linspace(a, b, gs)

# Build verts
greys = np.linspace(0.3, 0.7, nmax)
verts = []
for n in ns:
    density = gaussian_kde(Y[:, n-1])
    ys = density(xs)
    verts.append(list(zip(xs, ys)))
The law of large numbers and central limit theorem work just as nicely in multidimensional
settings.
To state the results, let’s recall some elementary facts about random vectors.
A random vector X is just a sequence of 𝑘 random variables (𝑋1 , … , 𝑋𝑘 ).
Each realization of X is an element of ℝ𝑘 .
A collection of random vectors X₁, … , Xₙ is called independent if, given any 𝑛 vectors x₁, … , xₙ in ℝᵏ, we have

ℙ {X₁ ≤ x₁, … , Xₙ ≤ xₙ} = ℙ {X₁ ≤ x₁} × ⋯ × ℙ {Xₙ ≤ xₙ}
The vector of means is written

𝔼[X] ∶= (𝔼[𝑋₁], 𝔼[𝑋₂], … , 𝔼[𝑋ₖ])′ = (𝜇₁, 𝜇₂, … , 𝜇ₖ)′ =∶ 𝜇
In this setting, with

X̄ₙ ∶= (1/𝑛) ∑ᵢ₌₁ⁿ Xᵢ

the LLN tells us that

ℙ {X̄ₙ → 𝜇 as 𝑛 → ∞} = 1 (6)

and, provided the variance-covariance matrix Σ of Xᵢ is finite, the CLT tells us that

√𝑛(X̄ₙ − 𝜇) →𝑑 𝑁(0, Σ) as 𝑛 → ∞ (7)
6.6 Exercises
6.6.1 Exercise 1

One very useful consequence of the central limit theorem is as follows: assume the conditions of the CLT as stated above, and let 𝑔 ∶ ℝ → ℝ be differentiable at 𝜇 with 𝑔′(𝜇) ≠ 0. Then

√𝑛{𝑔(𝑋̄ₙ) − 𝑔(𝜇)} →𝑑 𝑁(0, 𝑔′(𝜇)²𝜎²) as 𝑛 → ∞ (8)
This theorem is used frequently in statistics to obtain the asymptotic distribution of estimators — many of which can be expressed as functions of sample means.
(These kinds of results are often said to use the “delta method”)
The proof is based on a Taylor expansion of 𝑔 around the point 𝜇.
Taking the result as given, let the distribution 𝐹 of each 𝑋𝑖 be uniform on [0, 𝜋/2] and let
𝑔(𝑥) = sin(𝑥).
Derive the asymptotic distribution of √𝑛{𝑔(𝑋̄ₙ) − 𝑔(𝜇)} and illustrate convergence in the same spirit as the program discussed above.
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
What is the source of the problem?
6.6.2 Exercise 2
Here's a result that's often used in developing statistical tests, and is connected to the multivariate central limit theorem.
If you study econometric theory, you will see this result used again and again.
Assume the setting of the multivariate CLT discussed above, so that
√𝑛(X̄ₙ − 𝜇) →𝑑 𝑁(0, Σ) (9)
is valid.
In a statistical setting, one often wants the right-hand side to be standard normal so that
confidence intervals are easily computed.
This normalization can be achieved on the basis of three observations.
First, if X is a random vector in ℝ𝑘 and A is constant and 𝑘 × 𝑘, then
Var[AX] = A Var[X]A′
Second, by the continuous mapping theorem, if Zₙ →𝑑 Z in ℝᵏ and A is constant and 𝑘 × 𝑘, then

AZₙ →𝑑 AZ
Third, if S is a 𝑘×𝑘 symmetric positive definite matrix, then there exists a symmetric positive
definite matrix Q, called the inverse square root of S, such that
QSQ′ = I
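A short sketch (with an arbitrary positive definite S) of how this inverse square root can be computed with SciPy:

```python
import numpy as np
from scipy.linalg import sqrtm, inv

S = np.array([[2.0, 0.5],
              [0.5, 1.0]])       # symmetric positive definite

Q = inv(sqrtm(S))                # Q = S^{-1/2}, the inverse square root

# Q is symmetric and satisfies Q S Q' = I
assert np.allclose(Q, Q.T)
assert np.allclose(Q @ S @ Q.T, np.eye(2))
```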
Putting these facts together, your first exercise is to show that if Q is the inverse square root of Σ, then

Zₙ ∶= √𝑛Q(X̄ₙ − 𝜇) →𝑑 Z ∼ 𝑁(0, I)

Applying the continuous mapping theorem one more time tells us that

‖Zₙ‖² →𝑑 ‖Z‖²
Given the distribution of Z, we conclude that

𝑛‖Q(X̄ₙ − 𝜇)‖² →𝑑 𝜒²(𝑘) (10)

where 𝜒²(𝑘) is the chi-squared distribution with 𝑘 degrees of freedom.
Xᵢ ∶= (𝑊ᵢ, 𝑈ᵢ + 𝑊ᵢ)′
where
• each 𝑊𝑖 is an IID draw from the uniform distribution on [−1, 1].
• each 𝑈𝑖 is an IID draw from the uniform distribution on [−2, 2].
• 𝑈𝑖 and 𝑊𝑖 are independent of each other.
Hints:
1. scipy.linalg.sqrtm(A) computes the square root of A. You still need to invert it.
6.7 Solutions
6.7.1 Exercise 1
In [6]: """
Illustrates the delta method, a consequence of the central limit theorem.
"""
# Set parameters
n = 250
replications = 100000
distribution = uniform(loc=0, scale=(np.pi / 2))
μ, s = distribution.mean(), distribution.std()

g = np.sin
g_prime = np.cos

# Generate obs of sqrt{n} (g(X̄_n) - g(μ))  (reconstructed; this step was
# lost in extraction)
data = distribution.rvs((replications, n))
sample_means = data.mean(axis=1)
error_obs = np.sqrt(n) * (g(sample_means) - g(μ))

# Plot
asymptotic_sd = g_prime(μ) * s
fig, ax = plt.subplots(figsize=(10, 6))
xmin = -3 * g_prime(μ) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = "$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k', lw=2, label=lb)
ax.legend()
plt.show()
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
In this case, the mean 𝜇 of this distribution is 𝜋/2, and since 𝑔′ = cos, we have 𝑔′ (𝜇) = 0.
Hence the conditions of the delta theorem are not satisfied.
6.7.2 Exercise 2
First we need to verify the claim that

√𝑛Q(X̄ₙ − 𝜇) →𝑑 𝑁(0, I)

To do this, let

Yₙ ∶= √𝑛(X̄ₙ − 𝜇) and Y ∼ 𝑁(0, Σ)

By the multivariate CLT and the continuous mapping theorem, we have

QYₙ →𝑑 QY
Since linear combinations of normal random variables are normal, the vector QY is also normal.
Its mean is clearly 0, and its variance-covariance matrix is

Var[QY] = QVar[Y]Q′ = QΣQ′ = I
𝑑
In conclusion, QY𝑛 → QY ∼ 𝑁 (0, I), which is what we aimed to show.
Now we turn to the simulation exercise.
Our solution is as follows
# Compute Q = Σ^{-1/2}
Q = inv(sqrtm(Σ))
# Plot
Chapter 7

Heavy-Tailed Distributions
7.1 Contents
• Overview 7.2
• Visual Comparisons 7.3
• Failure of the LLN 7.4
• Classifying Tail Properties 7.5
• Exercises 7.6
• Solutions 7.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
7.2 Overview
Most commonly used probability distributions in classical statistics and the natural sciences
have either bounded support or light tails.
When a distribution is light-tailed, extreme observations are rare and draws tend not to deviate too much from the mean.
Having internalized these kinds of distributions, many researchers and practitioners use rules
of thumb such as “outcomes more than four or five standard deviations from the mean can
safely be ignored.”
However, some distributions encountered in economics have far more probability mass in the
tails than distributions like the normal distribution.
With such heavy-tailed distributions, what would be regarded as extreme outcomes for
someone accustomed to thin tailed distributions occur relatively frequently.
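A minimal numerical contrast (not the lecture's simulation) between a light-tailed and a heavy-tailed sampler makes the point:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
light = rng.standard_normal(n)    # light-tailed: normal draws
heavy = rng.standard_cauchy(n)    # heavy-tailed: Cauchy draws

# Over 100,000 draws, normal observations stay within a few standard
# deviations of zero, while Cauchy draws produce enormous outliers
assert np.max(np.abs(light)) < 10
assert np.max(np.abs(heavy)) > 100
```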
Examples of heavy-tailed distributions observed in economic and financial settings include
• the income distributions and the wealth distribution (see, e.g., [108], [9]),
• the firm size distribution ([7], [41]),
• the distribution of returns on holding assets over short time horizons ([76], [88]), and
• the distribution of city sizes ([91], [41]).
These heavy tails turn out to be important for our understanding of economic outcomes.
As one example, the heaviness of the tail in the wealth distribution is one natural measure of
inequality.
It matters for taxation and redistribution policies, as well as for flow-on effects for productivity growth, business cycles, and political economy
• see, e.g., [2], [43], [15] or [3].
This lecture formalizes some of the concepts introduced above and reviews the key ideas.
Let’s start with some imports:
The following two lines can be added to avoid an annoying FutureWarning, and prevent a
specific compatibility issue between pandas and matplotlib from causing problems down the
line:
One way to build intuition on the difference between light and heavy tails is to plot independent draws and compare them side-by-side.
7.3.1 A Simulation
The figure below shows a simulation. (You will be asked to replicate it in the exercises.)
The top two subfigures each show 120 independent draws from the normal distribution, which
is light-tailed.
The bottom subfigure shows 120 independent draws from the Cauchy distribution, which is heavy-tailed.
In the top subfigure, the standard deviation of the normal distribution is 2, and the draws are
clustered around the mean.
In the middle subfigure, the standard deviation is increased to 12 and, as expected, the
amount of dispersion rises.
The bottom subfigure, with the Cauchy draws, shows a different pattern: tight clustering
around the mean for the great majority of observations, combined with a few sudden large
deviations from the mean.
This is typical of a heavy-tailed distribution.
import yfinance as yf
# daily prices for a single asset; the ticker and sample period
# shown here are illustrative placeholders
s = yf.download('AMZN', '2015-1-1', '2019-11-1')['Adj Close']
r = s.pct_change()
fig, ax = plt.subplots()
ax.plot(r, linestyle='', marker='o', alpha=0.5, ms=4)
ax.set_ylabel('returns', fontsize=12)
ax.set_xlabel('date', fontsize=12)
plt.show()
Five of the 1217 observations are more than 5 standard deviations from the mean.
Overall, the figure is suggestive of heavy tails, although not to the same degree as the Cauchy distribution in the figure above.
If, however, one takes tick-by-tick data rather than daily data, the heavy-tailedness of the distribution increases further.
7.4 Failure of the LLN
One impact of heavy tails is that sample averages can be poor estimators of the underlying mean of the distribution.
To understand this point better, recall our earlier discussion of the Law of Large Numbers,
which considered IID 𝑋1 , … , 𝑋𝑛 with common distribution 𝐹
If $\mathbb{E}|X_i|$ is finite, then the sample mean $\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i$ satisfies

$$
\mathbb{P} \left\{ \bar X_n \to \mu \text{ as } n \to \infty \right\} = 1 \tag{1}
$$

where $\mu := \mathbb{E} X_i$ is the common mean.
np.random.seed(1234)
N = 1_000
distribution = cauchy()
fig, ax = plt.subplots()
data = distribution.rvs(N)
# Sample mean at each n; for Cauchy draws this sequence never settles down
sample_mean = np.cumsum(data) / np.arange(1, N + 1)
# Plot
ax.plot(range(N), sample_mean, alpha=0.6, label='$\\bar X_n$')
ax.legend()
plt.show()
$$
\mathbb{E} e^{i t \bar X_n}
= \mathbb{E} \exp \left\{ i \frac{t}{n} \sum_{j=1}^n X_j \right\}
= \mathbb{E} \prod_{j=1}^n \exp \left\{ i \frac{t}{n} X_j \right\}
= \prod_{j=1}^n \mathbb{E} \exp \left\{ i \frac{t}{n} X_j \right\}
= [\phi(t/n)]^n
$$
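In the Cauchy case, the final expression pins down the distribution of the sample mean. The standard Cauchy characteristic function is $\phi(t) = e^{-|t|}$, so

$$
[\phi(t/n)]^n = \left[ e^{-|t|/n} \right]^n = e^{-|t|} = \phi(t)
$$

In other words, $\bar X_n$ has the same Cauchy distribution as each individual $X_j$, no matter how large $n$ becomes — which is why the sample mean never converges.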
To keep our discussion precise, we need some definitions concerning tail properties.
We will focus our attention on the right hand tails of nonnegative random variables and their
distributions.
The definitions for left hand tails are very similar and we omit them to simplify the exposi-
tion.
$$
\int_0^\infty \exp(tx) F(dx) = \infty \quad \text{for all } t > 0 \tag{3}
$$
One specific class of heavy-tailed distributions has been found repeatedly in economic and
social phenomena: the class of so-called power laws.
Specifically, given 𝛼 > 0, a nonnegative random variable 𝑋 is said to have a Pareto tail with tail index 𝛼 if

$$
\lim_{x \to \infty} x^\alpha \, \mathbb{P}\{X > x\} = c \quad \text{for some } c > 0 \tag{4}
$$
Evidently (4) implies the existence of positive constants $b$ and $\bar x$ such that $\mathbb{P}\{X > x\} \geq b x^{-\alpha}$ whenever $x \geq \bar x$.
The implication is that $\mathbb{P}\{X > x\}$ converges to zero no faster than $x^{-\alpha}$.
In some sources, a random variable obeying (4) is said to have a power law tail.
The primary example is the Pareto distribution, which has distribution function

$$
F(x) = \begin{cases}
1 - (\bar x / x)^\alpha & \text{if } x \geq \bar x \\
0 & \text{if } x < \bar x
\end{cases} \tag{5}
$$
One graphical technique for investigating Pareto tails and power laws is the so-called rank-
size plot.
This kind of figure plots log size against log rank of the population (i.e., location in the popu-
lation when sorted from smallest to largest).
Often just the largest 5 or 10% of observations are plotted.
For a sufficiently large number of draws from a Pareto distribution, the plot generates a
straight line. For distributions with thinner tails, the data points are concave.
A discussion of why this occurs can be found in [84].
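As a small illustration of the idea (the helper name and parameter values below are ours, not from the lecture), the following sketch computes rank-size pairs and checks that, for Pareto data, log size is approximately linear in log rank with slope $-1/\alpha$:

```python
import numpy as np

def rank_size(data, frac=1.0):
    """Return (rank, size): observations sorted largest first,
    keeping only the top `frac` fraction of the sample."""
    w = -np.sort(-np.asarray(data))        # sort a copy, descending
    n = int(len(w) * frac)
    return np.arange(1, n + 1), w[:n]

np.random.seed(42)
alpha = 2.0
# Pareto draws with minimum 1, via inverse transform sampling
draws = np.random.uniform(size=100_000) ** (-1 / alpha)
rank, size = rank_size(draws, frac=0.1)
# OLS slope of log size on log rank, approximately -1/alpha
slope = np.polyfit(np.log(rank), np.log(size), 1)[0]
```

Plotting `np.log(rank)` against `np.log(size)` (or using `ax.loglog`) then produces the straight line described above.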
The figure below provides one example, using simulated data.
The rank-size plot shows draws from three different distributions: folded normal, chi-squared with 1 degree of freedom and Pareto.
The Pareto sample produces a straight line, while the lines produced by the other samples are
concave.
7.6 Exercises
7.6.1 Exercise 1
Replicate the figure presented above that compares normal and Cauchy draws.
Use np.random.seed(11) to set the seed.
7.6.2 Exercise 2
Prove: If 𝑋 has a Pareto tail with tail index 𝛼, then 𝔼[𝑋 𝑟 ] = ∞ for all 𝑟 ≥ 𝛼.
7.6.3 Exercise 3
Repeat exercise 1, but replace the three distributions (two normal, one Cauchy) with three
Pareto distributions using different choices of 𝛼.
For 𝛼, try 1.15, 1.5 and 1.75.
Use np.random.seed(11) to set the seed.
7.6.4 Exercise 4
7.6.5 Exercise 5
There is an ongoing argument about whether the firm size distribution should be modeled as
a Pareto distribution or a lognormal distribution (see, e.g., [40], [65] or [99]).
This sounds esoteric but has real implications for a variety of economic phenomena.
To illustrate this fact in a simple way, let us consider an economy with 100,000 firms, an in-
terest rate of r = 0.05 and a corporate tax rate of 15%.
Your task is to estimate the present discounted value of projected corporate tax revenue over
the next 10 years.
Because we are forecasting, we need a model.
We will suppose that

1. the number of firms and the firm size distribution (measured in profits) remain fixed and

2. the firm size distribution is either lognormal or Pareto.

The present discounted value of tax revenue will be estimated by

1. generating 100,000 draws of firm profit from the firm size distribution,

2. multiplying profits by the tax rate, and

3. summing the resulting tax revenue and discounting at the rate 𝑟.
The Pareto distribution is assumed to take the form (5) with 𝑥̄ = 1 and 𝛼 = 1.05.
(The value of the tail index 𝛼 is plausible given the data [41].)
To make the lognormal option as similar as possible to the Pareto option, choose its parame-
ters such that the mean and median of both distributions are the same.
Note that, for each distribution, your estimate of tax revenue will be random because it is
based on a finite number of draws.
To take this into account, generate 100 replications (evaluations of tax revenue) for each of
the two distributions and compare the two samples by
• producing a violin plot visualizing the two samples side-by-side and
• printing the mean and standard deviation of both samples.
For the seed use np.random.seed(1234).
What differences do you observe?
(Note: a better approach to this problem would be to model firm dynamics and try to track
individual firms given the current distribution. We will discuss firm dynamics in later lec-
tures.)
7.7 Solutions
7.7.1 Exercise 1
In [6]: n = 120
np.random.seed(11)
fig, axes = plt.subplots(3, 1, figsize=(6, 12))
for ax in axes:
    ax.set_ylim((-120, 120))
s_vals = 2, 12
for ax, s in zip(axes[:2], s_vals):
    data = np.random.randn(n) * s
    ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
    ax.vlines(list(range(n)), 0, data, lw=0.2)
    ax.set_title(f"draws from $N(0, \\sigma^2)$ with $\\sigma = {s}$", fontsize=11)
ax = axes[2]
distribution = cauchy()
data = distribution.rvs(n)
ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
ax.vlines(list(range(n)), 0, data, lw=0.2)
ax.set_title("draws from the Cauchy distribution", fontsize=11)
plt.subplots_adjust(hspace=0.25)
plt.show()
7.7.2 Exercise 2
Let 𝑋 have a Pareto tail with tail index 𝛼 and let 𝐹 be its cdf.
Fix 𝑟 ≥ 𝛼.
As discussed after (4), we can take positive constants 𝑏 and 𝑥̄ such that

$$
\mathbb{P}\{X > x\} \geq b x^{-\alpha} \quad \text{whenever } x \geq \bar x
$$

But then
$$
\mathbb{E} X^r
= r \int_0^\infty x^{r-1} \mathbb{P}\{X > x\} \, dx
\geq r \int_0^{\bar x} x^{r-1} \mathbb{P}\{X > x\} \, dx + r \int_{\bar x}^\infty x^{r-1} b x^{-\alpha} \, dx.
$$
We know that $\int_{\bar x}^\infty x^{r - \alpha - 1} \, dx = \infty$ whenever $r - \alpha - 1 \geq -1$.
Since 𝑟 ≥ 𝛼, we have 𝔼𝑋 𝑟 = ∞.
7.7.3 Exercise 3
np.random.seed(11)
n = 120
alphas = [1.15, 1.50, 1.75]
fig, axes = plt.subplots(3, 1, figsize=(6, 8))
for a, ax in zip(alphas, axes):
    data = np.random.uniform(size=n)**(-1/a)  # Pareto draws, x̄ = 1
    ax.plot(range(n), data, linestyle='', marker='o', alpha=0.5, ms=4)
    ax.set_title(f"Pareto draws with $\\alpha = {a}$", fontsize=11)
plt.subplots_adjust(hspace=0.4)
plt.show()
7.7.4 Exercise 4
sample_size = 1000                        # illustrative sample size
z = np.random.randn(sample_size)          # standard normal draws
data_1 = np.abs(z)                        # folded normal
data_2 = np.exp(z)                        # lognormal
data_3 = np.exp(np.random.exponential(scale=1.0, size=sample_size))  # Pareto tail
ax.legend()
fig.subplots_adjust(hspace=0.4)
plt.show()
7.7.5 Exercise 5
To do the exercise, we need to choose the parameters 𝜇 and 𝜎 of the lognormal distribution to
match the mean and median of the Pareto distribution.
Here we understand the lognormal distribution as that of the random variable exp(𝜇 + 𝜎𝑍)
when 𝑍 is standard normal.
The mean and median of the Pareto distribution (5) with 𝑥̄ = 1 are
$$
\text{mean} = \frac{\alpha}{\alpha - 1}
\quad \text{and} \quad
\text{median} = 2^{1/\alpha}
$$
Using the corresponding expressions for the lognormal distribution leads us to the equations
$$
\frac{\alpha}{\alpha - 1} = \exp(\mu + \sigma^2 / 2)
\quad \text{and} \quad
2^{1/\alpha} = \exp(\mu)
$$
num_firms = 100_000
num_years = 10
tax_rate = 0.15
r = 0.05
β = 1 / (1 + r) # discount factor
x_bar = 1.0
α = 1.05
def pareto_rvs(n):
"Uses a standard method to generate Pareto draws."
u = np.random.uniform(size=n)
y = x_bar / (u**(1/α))
return y
In [11]: μ = np.log(2) / α
σ_sq = 2 * (np.log(α/(α - 1)) - np.log(2)/α)
σ = np.sqrt(σ_sq)
Here’s a function to compute a single estimate of tax revenue for a particular choice of distri-
bution dist.
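The body of this function did not survive extraction. A self-contained sketch consistent with the exercise setup (the yearly-redraw assumption and helper names are ours) is:

```python
import numpy as np

num_firms, num_years = 100_000, 10
tax_rate, r = 0.15, 0.05
β = 1 / (1 + r)                       # discount factor
x_bar, α = 1.0, 1.05                  # Pareto parameters from the exercise
μ = np.log(2) / α                     # lognormal parameters chosen to match
σ = np.sqrt(2 * (np.log(α / (α - 1)) - np.log(2) / α))  # mean and median

def pareto_rvs(n):
    "Pareto draws via inverse transform sampling."
    return x_bar / np.random.uniform(size=n)**(1 / α)

def tax_rev(dist):
    "One estimate of the PDV of tax revenue, redrawing profits each year."
    tax_raised = 0.0
    for t in range(num_years):
        if dist == 'pareto':
            π = pareto_rvs(num_firms)
        else:
            π = np.exp(μ + σ * np.random.randn(num_firms))
        tax_raised += β**t * tax_rate * np.sum(π)
    return tax_raised
```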
tax_rev_lognorm = np.empty(num_reps)
tax_rev_pareto = np.empty(num_reps)
for i in range(num_reps):
tax_rev_pareto[i] = tax_rev('pareto')
tax_rev_lognorm[i] = tax_rev('lognorm')
fig, ax = plt.subplots()
ax.violinplot(data)
plt.show()
Looking at the output of the code, our main conclusion is that the Pareto assumption leads
to a lower mean and greater dispersion.
Part II
Introduction to Dynamics
Chapter 8
Dynamics in One Dimension
8.1 Contents
• Overview 8.2
• Some Definitions 8.3
• Graphical Analysis 8.4
• Exercises 8.5
• Solutions 8.6
8.2 Overview
In this lecture we give a quick introduction to discrete time dynamics in one dimension.
In one-dimensional models, the state of the system is described by a single variable.
Although most interesting dynamic models have two or more state variables, the one-
dimensional setting is a good place to learn the foundations of dynamics and build intuition.
Let’s start with some standard imports:
This section sets out the objects of interest and the kinds of properties we study.
Here 𝑆 is called the state space and 𝑥 is called the state variable.
In the definition,
• time homogeneity means that 𝑔 is the same at each time 𝑡
• first order means dependence on only one lag (i.e., earlier states such as 𝑥𝑡−1 do not en-
ter into (1)).
If 𝑥0 ∈ 𝑆 is given, then (1) recursively defines the sequence

$$
x_0, \quad x_1 = g(x_0), \quad x_2 = g(x_1) = g(g(x_0)), \quad \ldots \tag{2}
$$
Continuing in this way, and using our knowledge of geometric series, we find that, for any 𝑡 ≥ 0,

$$
x_t = a^t x_0 + b \frac{1 - a^t}{1 - a} \tag{4}
$$
This is about all we need to know about the linear model.
We have an exact expression for 𝑥𝑡 for all 𝑡 and hence a full understanding of the dynamics.
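As a quick sanity check of (4), we can iterate the map directly and compare with the closed-form expression (the parameter values here are arbitrary):

```python
a, b, x0 = 0.8, 2.0, 1.0
T = 25

# iterate x_{t+1} = a x_t + b for T steps
x = x0
for t in range(T):
    x = a * x + b

# closed form from (4): x_T = a^T x_0 + b (1 - a^T) / (1 - a)
closed_form = a**T * x0 + b * (1 - a**T) / (1 - a)
```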
Notice in particular that if |𝑎| < 1, then, by (4), we have

$$
x_t \to \frac{b}{1 - a} \quad \text{as } t \to \infty \tag{5}
$$

regardless of 𝑥0.
This is an example of what is called global stability, a topic we return to below.
In the linear example above, we obtained an exact analytical expression for 𝑥𝑡 in terms of ar-
bitrary 𝑡 and 𝑥0 .
This made analysis of dynamics very easy.
When models are nonlinear, however, the situation can be quite different.
For example, recall how we previously studied the law of motion for the Solow growth model, a simplified version of which is

$$
k_{t+1} = g(k_t) \quad \text{where} \quad g(k) := s z k^\alpha + (1 - \delta) k \tag{6}
$$

Here 𝑘 is capital stock and 𝑠, 𝑧, 𝛼, 𝛿 are positive parameters with 0 < 𝛼, 𝛿 < 1.
If you try to iterate like we did in (3), you will find that the algebra gets messy quickly.
Analyzing the dynamics of this model requires a different method (see below).
8.3.4 Stability
A steady state of the difference equation 𝑥𝑡+1 = 𝑔(𝑥𝑡 ) is a point 𝑥∗ in 𝑆 such that 𝑥∗ =
𝑔(𝑥∗ ).
In other words, 𝑥∗ is a fixed point of the function 𝑔 in 𝑆.
For example, for the linear model 𝑥𝑡+1 = 𝑎𝑥𝑡 + 𝑏, you can use the definition to check that
• 𝑥∗ ∶= 𝑏/(1 − 𝑎) is a steady state whenever 𝑎 ≠ 1.
• if 𝑎 = 1 and 𝑏 = 0, then every 𝑥 ∈ ℝ is a steady state.
• if 𝑎 = 1 and 𝑏 ≠ 0, then the linear model has no steady state in ℝ.
A steady state 𝑥∗ of 𝑥𝑡+1 = 𝑔(𝑥𝑡 ) is called globally stable if, for all 𝑥0 ∈ 𝑆,
$$
x_t = g^t(x_0) \to x^* \quad \text{as } t \to \infty
$$
For example, in the linear model 𝑥𝑡+1 = 𝑎𝑥𝑡 + 𝑏 with 𝑎 ≠ 1, the steady state 𝑥∗
• is globally stable if |𝑎| < 1 and
• fails to be globally stable otherwise.
This follows directly from (4).
A steady state 𝑥∗ of 𝑥𝑡+1 = 𝑔(𝑥𝑡 ) is called locally stable if there exists an 𝜖 > 0 such that
Let’s look at an example: the Solow model with dynamics given in (6).
We begin with some plotting code that you can ignore at first reading.
The function of the code is to produce 45 degree diagrams and time series plots.
return fig, ax
x = x0
xticks = [xmin]
xtick_labels = [xmin]
for i in range(num_arrows):
if i == 0:
ax.arrow(x, 0.0, 0.0, g(x), **arrow_args) # x, y, dx, dy
else:
ax.arrow(x, x, 0.0, g(x) - x, **arrow_args)
ax.plot((x, x), (0, x), 'k', ls='dotted')
x = g(x)
xticks.append(x)
xtick_labels.append(r'${}_{}$'.format(var, str(i+1)))
ax.plot((x, x), (0, x), 'k', ls='dotted')
xticks.append(xmax)
xtick_labels.append(xmax)
ax.set_xticks(xticks)
ax.set_yticks(xticks)
ax.set_xticklabels(xtick_labels)
ax.set_yticklabels(xtick_labels)
Let’s create a 45 degree diagram for the Solow model with a fixed set of parameters
$$
k^* = \left( \frac{s z}{\delta} \right)^{1/(1-\alpha)}
$$
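Before plotting, we can verify numerically that $k^*$ is indeed a fixed point of the Solow update rule (the parameter values below are illustrative only):

```python
s, z, alpha, delta = 0.3, 2.0, 0.3, 0.4   # illustrative parameter values

def g(k):
    # Solow law of motion: k_{t+1} = s z k^alpha + (1 - delta) k
    return s * z * k**alpha + (1 - delta) * k

k_star = (s * z / delta)**(1 / (1 - alpha))
```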
8.4.1 Trajectories
By the preceding discussion, in regions where 𝑔 lies above the 45 degree line, we know that
the trajectory is increasing.
The next figure traces out a trajectory in such a region so we can see this more clearly.
The initial condition is 𝑘0 = 0.25.
In [6]: k0 = 0.25
We can plot the time series of capital corresponding to the figure above as follows:
When capital stock is higher than the unique positive steady state, we see that it declines:
In [9]: k0 = 2.95
The Solow model is nonlinear but still generates very regular dynamics.
One model that generates irregular dynamics is the quadratic map

$$
g(x) = 4 x (1 - x), \qquad x \in [0, 1]
$$
x0 = 0.3
plot45(g, xmin, xmax, x0, num_arrows=0)
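To see the irregularity directly, it is enough to compute a few iterates of the map (here with the same initial condition $x_0 = 0.3$):

```python
g = lambda x: 4 * x * (1 - x)   # the quadratic map
x = 0.3
trajectory = [x]
for _ in range(10):
    x = g(x)
    trajectory.append(x)
# the iterates wander over [0, 1] with no evident pattern
```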
8.5 Exercises
8.5.1 Exercise 1
8.6 Solutions
8.6.1 Exercise 1
In [15]: a, b = 0.5, 1
xmin, xmax = -1, 3
g = lambda x: a * x + b
In [16]: x0 = 0.5
plot45(g, xmin, xmax, x0, num_arrows=5)
Here is the corresponding time series, which converges towards the steady state.
In [18]: a, b = -0.5, 1
xmin, xmax = -1, 3
g = lambda x: a * x + b
In [19]: x0 = 0.5
plot45(g, xmin, xmax, x0, num_arrows=5)
Here is the corresponding time series, which converges towards the steady state.
Once again, we have convergence to the steady state but the nature of convergence differs.
In particular, the time series jumps from above the steady state to below it and back again.
In the current context, the series is said to exhibit damped oscillations.
Chapter 9
AR1 Processes
9.1 Contents
• Overview 9.2
• The AR(1) Model 9.3
• Stationarity and Asymptotic Stability 9.4
• Ergodicity 9.5
• Exercises 9.6
• Solutions 9.7
9.2 Overview
In this lecture we are going to study a very simple class of stochastic models called AR(1)
processes.
These simple models are used again and again in economic research to represent the dynamics
of series such as
• labor income
• dividends
• productivity, etc.
AR(1) processes can take negative values but are easily converted into positive processes
when necessary by a transformation such as exponentiation.
We are going to study AR(1) processes partly because they are useful and partly because
they help us understand important concepts.
Let’s start with some imports:
$$
X_t = a^t X_0 + b \sum_{j=0}^{t-1} a^j + c \sum_{j=0}^{t-1} a^j W_{t-j} \tag{2}
$$
Equation (2) shows that 𝑋𝑡 is a well defined random variable, the value of which depends on
• the parameters,
• the initial condition 𝑋0 and
• the shocks 𝑊1 , … 𝑊𝑡 from time 𝑡 = 1 to the present.
Throughout, the symbol 𝜓𝑡 will be used to refer to the density of this random variable 𝑋𝑡 .
One of the nice things about this model is that it’s so easy to trace out the sequence of distri-
butions {𝜓𝑡 } corresponding to the time series {𝑋𝑡 }.
To see this, we first note that 𝑋𝑡 is normally distributed for each 𝑡.
This is immediate from (2), since linear combinations of independent normal random variables are normal.
Given that 𝑋𝑡 is normally distributed, we will know the full distribution 𝜓𝑡 if we can pin
down its first two moments.
Let 𝜇𝑡 and 𝑣𝑡 denote the mean and variance of 𝑋𝑡 respectively.
We can pin down these values from (2) or we can use the following recursive expressions:

$$
\mu_{t+1} = a \mu_t + b
\quad \text{and} \quad
v_{t+1} = a^2 v_t + c^2 \tag{3}
$$
These expressions are obtained from (1) by taking, respectively, the expectation and variance
of both sides of the equality.
In calculating the second expression, we are using the fact that 𝑋𝑡 and 𝑊𝑡+1 are independent.
(This follows from our assumptions and (2).)
Given the dynamics in (2) and initial conditions 𝜇0 , 𝑣0 , we obtain 𝜇𝑡 , 𝑣𝑡 and hence
𝜓𝑡 = 𝑁 (𝜇𝑡 , 𝑣𝑡 )
The following code uses these facts to track the sequence of marginal distributions {𝜓𝑡 }.
The parameters are

a, b, c = 0.9, 0.1, 0.5  # illustrative parameter values
mu, v = -3.0, 0.6        # initial conditions mu_0, v_0
sim_length = 10
grid = np.linspace(-5, 7, 120)
fig, ax = plt.subplots()
for t in range(sim_length):
mu = a * mu + b
v = a**2 * v + c**2
ax.plot(grid, norm.pdf(grid, loc=mu, scale=np.sqrt(v)),
label=f"$\psi_{t}$",
alpha=0.7)
ax.legend(bbox_to_anchor=[1.05,1],loc=2,borderaxespad=1)
plt.show()
Notice that, in the figure above, the sequence {𝜓𝑡 } seems to be converging to a limiting dis-
tribution.
This is even clearer if we project forward further into the future:
fig, ax = plt.subplots()
plot_density_seq(ax)
plt.show()
In fact it’s easy to show that such convergence will occur, regardless of the initial condition,
whenever |𝑎| < 1.
To see this, we just have to look at the dynamics of the first two moments, as given in (3).
When |𝑎| < 1, these sequences converge to the respective limits

$$
\mu^* := \frac{b}{1 - a}
\quad \text{and} \quad
v^* := \frac{c^2}{1 - a^2} \tag{4}
$$
(See our lecture on one dimensional dynamics for background on deterministic convergence.)
Hence
𝜓𝑡 → 𝜓∗ = 𝑁 (𝜇∗ , 𝑣∗ ) as 𝑡 → ∞ (5)
We can confirm this is valid for the sequence above using the following code.
mu_star = b / (1 - a)
std_star = np.sqrt(c**2 / (1 - a**2)) # square root of v_star
psi_star = norm.pdf(grid, loc=mu_star, scale=std_star)
ax.plot(grid, psi_star, 'k', lw=2, label="$\psi^*$")
ax.legend()
plt.show()
A stationary distribution is a distribution that is a fixed point of the update rule for distribu-
tions.
In other words, if 𝜓𝑡 is stationary, then 𝜓𝑡+𝑗 = 𝜓𝑡 for all 𝑗 in ℕ.
A different way to put this, specialized to the current setting, is as follows: a density 𝜓 on ℝ
is stationary for the AR(1) process if
𝑋𝑡 ∼ 𝜓 ⟹ 𝑎𝑋𝑡 + 𝑏 + 𝑐𝑊𝑡+1 ∼ 𝜓
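We can confirm that $N(\mu^*, v^*)$ satisfies this property by pushing its mean and variance through one step of the update (the parameter values below are illustrative):

```python
a, b, c = 0.9, 0.1, 0.5          # illustrative values with |a| < 1

mu_star = b / (1 - a)            # stationary mean
v_star = c**2 / (1 - a**2)       # stationary variance

# one step of X -> a X + b + c W with W ~ N(0, 1)
mu_next = a * mu_star + b
v_next = a**2 * v_star + c**2
```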
9.5 Ergodicity
$$
\frac{1}{m} \sum_{t=1}^m h(X_t) \to \int h(x) \psi^*(x) \, dx \quad \text{as } m \to \infty \tag{6}
$$
whenever the integral on the right hand side is finite and well defined.
Notes:
• In (6), convergence holds with probability one.
• The textbook by [81] is a classic reference on ergodicity.
For example, if we consider the identity function ℎ(𝑥) = 𝑥, we get
$$
\frac{1}{m} \sum_{t=1}^m X_t \to \int x \, \psi^*(x) \, dx \quad \text{as } m \to \infty
$$
In other words, the time series sample mean converges to the mean of the stationary distribu-
tion.
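A quick simulation confirms this for the AR(1) model (again with illustrative parameter values):

```python
import numpy as np

np.random.seed(0)
a, b, c = 0.9, 0.1, 0.5
mu_star = b / (1 - a)            # stationary mean, here 1.0

m = 1_000_000
x, total = mu_star, 0.0
for t in range(m):
    x = a * x + b + c * np.random.randn()
    total += x
time_avg = total / m             # should be close to mu_star
```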
As will become clear over the next few lectures, ergodicity is a very important concept for
statistics and simulation.
9.6 Exercises
9.6.1 Exercise 1
$$
M_k := \mathbb{E}[(X - \mathbb{E}X)^k]
$$

When $X \sim N(\mu, \sigma^2)$, these moments satisfy

$$
M_k = \begin{cases}
0 & \text{if } k \text{ is odd} \\
\sigma^k (k - 1)!! & \text{if } k \text{ is even}
\end{cases}
$$
$$
\frac{1}{m} \sum_{t=1}^m (X_t - \mu^*)^k \approx M_k
$$
when 𝑚 is large.
Confirm this by simulation at a range of 𝑘 using the default parameters from the lecture.
9.6.2 Exercise 2
Write your own version of a one dimensional kernel density estimator, which estimates a den-
sity from a sample.
Write it as a class that takes the data 𝑋 and bandwidth ℎ when initialized and provides a
method 𝑓 such that
$$
f(x) = \frac{1}{hn} \sum_{i=1}^n K \left( \frac{x - X_i}{h} \right)
$$
9.6.3 Exercise 3
In the lecture we discussed the following fact: for the 𝐴𝑅(1) process
3. Use the resulting sample of 𝑋𝑡+1 values to produce a density estimate via kernel density
estimation.
Try this for 𝑛 = 2000 and confirm that the simulation based estimate of 𝜓𝑡+1 does converge
to the theoretical distribution.
9.7 Solutions
9.7.1 Exercise 1
@njit
def sample_moments_ar1(k, m=100_000, mu_0=0.0, sigma_0=1.0, seed=1234):
np.random.seed(seed)
sample_sum = 0.0
x = mu_0 + sigma_0 * np.random.randn()
for t in range(m):
sample_sum += (x - mu_star)**k
x = a * x + b + c * np.random.randn()
return sample_sum / m
def true_moments_ar1(k):
if k % 2 == 0:
return std_star**k * factorial2(k - 1)
else:
return 0
k_vals = np.arange(6) + 1
sample_moments = np.empty_like(k_vals)
true_moments = np.empty_like(k_vals)
fig, ax = plt.subplots()
ax.plot(k_vals, true_moments, label="true moments")
ax.plot(k_vals, sample_moments, label="sample moments")
ax.legend()
plt.show()
9.7.2 Exercise 2
In [8]: K = norm.pdf
class KDE:
if h is None:
c = x_data.std()
n = len(x_data)
h = 1.06 * c * n**(-1/5)  # Silverman's rule of thumb
self.h = h
self.x_data = x_data
ax.legend()
plt.show()
n = 500
parameter_pairs = (2, 2), (2, 5), (0.5, 0.5)
for α, β in parameter_pairs:
plot_kde(beta(α, β))
We see that the kernel density estimator is effective when the underlying distribution is
smooth but less so otherwise.
9.7.3 Exercise 3
In [11]: a = 0.9
b = 0.0
c = 0.1
μ = 3
s = 0.2
In [12]: μ_next = a * μ + b
s_next = np.sqrt(a**2 * s**2 + c**2)
In [14]: ψ = norm(μ, s)
ψ_next = norm(μ_next, s_next)
In [15]: n = 2000
x_draws = ψ.rvs(n)
x_draws_next = a * x_draws + b + c * np.random.randn(n)
kde = KDE(x_draws_next)
ax.legend()
plt.show()
The simulated distribution approximately coincides with the theoretical distribution, as pre-
dicted.
Chapter 10
Finite Markov Chains
10.1 Contents
• Overview 10.2
• Definitions 10.3
• Simulation 10.4
• Marginal Distributions 10.5
• Irreducibility and Aperiodicity 10.6
• Stationary Distributions 10.7
• Ergodicity 10.8
• Computing Expectations 10.9
• Exercises 10.10
• Solutions 10.11
In addition to what’s in Anaconda, this lecture will need the following libraries:
10.2 Overview
Markov chains are one of the most useful classes of stochastic processes, being
• simple, flexible and supported by many elegant theoretical results
• valuable for building intuition about random dynamic models
• central to quantitative modeling in their own right
You will find them in many of the workhorse models of economics and finance.
In this lecture, we review some of the theory of Markov chains.
We will also introduce some of the high-quality routines for working with Markov chains
available in QuantEcon.py.
Prerequisite knowledge is basic probability and linear algebra.
Let’s start with some standard imports:
10.3 Definitions
Each row of 𝑃 can be regarded as a probability mass function over 𝑛 possible outcomes.
It is not difficult to check that if 𝑃 is a stochastic matrix, then so is the 𝑘-th power 𝑃 𝑘 for all 𝑘 ∈ ℕ.
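For instance, using NumPy with a stochastic matrix that appears later in this lecture, we can verify the claim for one power directly:

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])   # a stochastic matrix used below

P3 = np.linalg.matrix_power(P, 3)   # rows of P^3 still sum to one
```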
In other words, knowing the current state is enough to know probabilities for future states.
In particular, the dynamics of a Markov chain are fully determined by the set of values
By construction,
• 𝑃 (𝑥, 𝑦) is the probability of going from 𝑥 to 𝑦 in one unit of time (one step)
• 𝑃 (𝑥, ⋅) is the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥
$$
P_{ij} = P(x_i, x_j) \qquad 1 \leq i, j \leq n
$$
Going the other way, if we take a stochastic matrix 𝑃 , we can generate a Markov chain {𝑋𝑡 }
as follows:
• draw 𝑋0 from some specified distribution
• for each 𝑡 = 0, 1, …, draw 𝑋𝑡+1 from 𝑃 (𝑋𝑡 , ⋅)
By construction, the resulting process satisfies (2).
10.3.3 Example 1
Consider a worker who, at any given time 𝑡, is either unemployed (state 0) or employed (state
1).
Suppose that, over a one month period,

1. An unemployed worker finds a job with probability 𝛼 ∈ (0, 1).

2. An employed worker loses her job and becomes unemployed with probability 𝛽 ∈ (0, 1).

In terms of a Markov chain, we then have the stochastic matrix

$$
P = \begin{pmatrix}
1 - \alpha & \alpha \\
\beta & 1 - \beta
\end{pmatrix} \tag{3}
$$
Once we have the values 𝛼 and 𝛽, we can address a range of questions, such as
• What is the average duration of unemployment?
• Over the long-run, what fraction of time does a worker find herself unemployed?
• Conditional on employment, what is the probability of becoming unemployed at least
once over the next 12 months?
We’ll cover such applications below.
10.3.4 Example 2
$$
P = \begin{pmatrix}
0.971 & 0.029 & 0 \\
0.145 & 0.778 & 0.077 \\
0 & 0.508 & 0.492
\end{pmatrix}
$$
where
• the frequency is monthly
10.4 Simulation
One natural way to answer questions about Markov chains is to simulate them.
(To approximate the probability of event 𝐸, we can simulate many times and count the frac-
tion of times that 𝐸 occurs).
Nice functionality for simulating Markov chains exists in QuantEcon.py.
• Efficient, bundled with lots of other useful routines for handling Markov chains.
However, it’s also a good exercise to roll our own routines — let’s do that first and then come
back to the methods in QuantEcon.py.
In these exercises, we'll take the state space to be 𝑆 = {0, … , 𝑛 − 1}.
To simulate a Markov chain, we need its stochastic matrix 𝑃 and a probability distribution 𝜓
for the initial state to be drawn from.
The Markov chain is then constructed as discussed above. To repeat:

1. At time 𝑡 = 0, draw a realization of 𝑋0 from 𝜓.

2. At each subsequent time 𝑡, the new state 𝑋𝑡+1 is drawn from 𝑃 (𝑋𝑡 , ⋅).
To implement this simulation procedure, we need a method for generating draws from a dis-
crete distribution.
For this task, we’ll use random.draw from QuantEcon, which works as follows:
We’ll write our code as a function that takes the following three arguments
• A stochastic matrix P
• An initial state init
• A positive integer sample_size representing the length of the time series the function
should return
def mc_sample_path(P, init=0, sample_size=1_000):
    # set up
    P = np.asarray(P)
    X = np.empty(sample_size, dtype=int)
    # convert each row of P into a cumulative distribution for qe.random.draw
    P_dist = [np.cumsum(P[i, :]) for i in range(len(P))]
    # simulate
    X[0] = init
    for t in range(sample_size - 1):
        X[t+1] = qe.random.draw(P_dist[X[t]])
    return X
As we’ll see later, for a long series drawn from P, the fraction of the sample that takes value 0
will be about 0.25.
Moreover, this is true regardless of the initial distribution from which 𝑋0 is drawn.
The following code illustrates this
Out[6]: 0.25111
You can try changing the initial distribution to confirm that the output is always close to
0.25.
As discussed above, QuantEcon.py has routines for handling Markov chains, including simula-
tion.
Here’s an illustration using the same P as the preceding example
mc = qe.MarkovChain(P)
X = mc.simulate(ts_length=1_000_000)
np.mean(X == 0)
Out[7]: 0.250219
CPU times: user 791 ms, sys: 1.36 ms, total: 792 ms
Wall time: 792 ms
CPU times: user 28.6 ms, sys: 8.11 ms, total: 36.7 ms
Wall time: 36.4 ms
If we want to simulate with output as indices rather than state values we can use
In [13]: mc.simulate_indices(ts_length=4)
Suppose that

1. {𝑋𝑡} is a Markov chain with stochastic matrix 𝑃, and

2. the distribution of 𝑋𝑡 is known to be 𝜓𝑡.

What, then, is the distribution of 𝑋𝑡+1? By the law of total probability,

$$
\mathbb{P}\{X_{t+1} = y\} = \sum_{x \in S} \mathbb{P}\{X_{t+1} = y \mid X_t = x\} \cdot \mathbb{P}\{X_t = x\}
$$

In words, to get the probability of being at 𝑦 tomorrow, we account for all ways this can happen and sum their probabilities.

Rewriting this statement in terms of marginal and conditional probabilities gives

$$
\psi_{t+1} = \psi_t P \tag{4}
$$
In other words, to move the distribution forward one unit of time, we postmultiply by 𝑃 .
By repeating this 𝑚 times we move forward 𝑚 steps into the future.
Hence, iterating on (4), the expression 𝜓𝑡+𝑚 = 𝜓𝑡 𝑃 𝑚 is also valid — here 𝑃 𝑚 is the 𝑚-th
power of 𝑃 .
As a special case, we see that if 𝜓0 is the initial distribution from which 𝑋0 is drawn, then
𝜓0 𝑃 𝑚 is the distribution of 𝑋𝑚 .
This is very important, so let’s repeat it
$$
X_0 \sim \psi_0 \implies X_m \sim \psi_0 P^m \tag{5}
$$

$$
X_t \sim \psi_t \implies X_{t+m} \sim \psi_t P^m \tag{6}
$$
We know that the probability of transitioning from 𝑥 to 𝑦 in one step is 𝑃 (𝑥, 𝑦).
It turns out that the probability of transitioning from 𝑥 to 𝑦 in 𝑚 steps is 𝑃 𝑚 (𝑥, 𝑦), the
(𝑥, 𝑦)-th element of the 𝑚-th power of 𝑃 .
To see why, consider again (6), but now with 𝜓𝑡 putting all probability on state 𝑥
• 1 in the 𝑥-th position and zero elsewhere
Inserting this into (6), we see that, conditional on 𝑋𝑡 = 𝑥, the distribution of 𝑋𝑡+𝑚 is the
𝑥-th row of 𝑃 𝑚 .
In particular
Recall the stochastic matrix 𝑃 for recession and growth considered above.
Suppose that the current state is unknown — perhaps statistics are available only at the end
of the current month.
We estimate the probability that the economy is in state 𝑥 to be 𝜓(𝑥).
The probability of being in recession (either mild or severe) in 6 months time is given by the
inner product
$$
\psi P^6 \cdot \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}
$$
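Computationally, with the recession/growth matrix from above and an arbitrary belief vector 𝜓 (the numbers in 𝜓 below are hypothetical placeholders), this inner product is:

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.0],
              [0.145, 0.778, 0.077],
              [0.0,   0.508, 0.492]])    # recession/growth matrix from above

ψ = np.array([0.2, 0.4, 0.4])            # hypothetical current beliefs
# probability of mild or severe recession in 6 months
prob = ψ @ np.linalg.matrix_power(P, 6) @ np.array([0, 1, 1])
```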
The marginal distributions we have been studying can be viewed either as probabilities or as
cross-sectional frequencies in large samples.
To illustrate, recall our model of employment/unemployment dynamics for a given worker
discussed above.
Consider a large population of workers, each of whose lifetime experience is described by the
specified dynamics, independent of one another.
Let 𝜓 be the current cross-sectional distribution over {0, 1}.
The cross-sectional distribution records the fractions of workers employed and unemployed at
a given moment.
• For example, 𝜓(0) is the unemployment rate.
What will the cross-sectional distribution be in 10 periods hence?
The answer is 𝜓𝑃 10 , where 𝑃 is the stochastic matrix in (3).
This is because each worker is updated according to 𝑃 , so 𝜓𝑃 10 represents probabilities for a
single randomly selected worker.
But when the sample is large, outcomes and probabilities are roughly equal (by the Law of
Large Numbers).
So for a very large (tending to infinite) population, 𝜓𝑃 10 also represents the fraction of work-
ers in each state.
This is exactly the cross-sectional distribution.
Irreducibility and aperiodicity are central concepts of modern Markov chain theory.
Let’s see what they’re about.
10.6.1 Irreducibility
We can translate this into a stochastic matrix, putting zeros where there’s no edge between
nodes
$$
P := \begin{pmatrix}
0.9 & 0.1 & 0 \\
0.4 & 0.4 & 0.2 \\
0.1 & 0.1 & 0.8
\end{pmatrix}
$$
It’s clear from the graph that this stochastic matrix is irreducible: we can reach any state
from any other state eventually.
We can also test this using QuantEcon.py’s MarkovChain class
Out[14]: True
Here’s a more pessimistic scenario, where the poor are poor forever
This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor.
Let’s confirm this
Out[15]: False
In [16]: mc.communication_classes
It might be clear to you already that irreducibility is going to be important in terms of long
run outcomes.
For example, poverty is a life sentence in the second graph but not the first.
We’ll come back to this a bit later.
10.6.2 Aperiodicity
Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and aperiodic otherwise.
Here’s a trivial example with three states
mc = qe.MarkovChain(P)
mc.period
Out[17]: 3
More formally, the period of a state 𝑥 is the greatest common divisor of the set of integers

$$
D(x) := \{ j \geq 1 : P^j(x, x) > 0 \}
$$
In the last example, 𝐷(𝑥) = {3, 6, 9, …} for every state 𝑥, so the period is 3.
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic other-
wise.
For example, the stochastic matrix associated with the transition probabilities below is peri-
odic because, for example, state 𝑎 has period 2
mc = qe.MarkovChain(P)
mc.period
Out[18]: 2
In [19]: mc.is_aperiodic
Out[19]: False
As seen in (4), we can shift probabilities forward one unit of time via postmultiplication by
𝑃.
Some distributions are invariant under this updating process — that is, they satisfy 𝜓 = 𝜓𝑃. Such distributions are called stationary.

• For example, if 𝑃 is the identity matrix, then all distributions are stationary.
Since stationary distributions are long run equilibria, to get uniqueness we require that initial
conditions are not infinitely persistent.
Infinite persistence of initial conditions occurs if certain regions of the state space cannot be
accessed from other regions, which is the opposite of irreducibility.
This gives some intuition for the following fundamental theorem.
Theorem. If 𝑃 is both aperiodic and irreducible, then

1. 𝑃 has exactly one stationary distribution 𝜓∗.

2. For any initial distribution 𝜓0, the distribution 𝜓0𝑃 𝑡 of 𝑋𝑡 converges to 𝜓∗ as 𝑡 → ∞.
10.7.1 Example
Recall our model of employment/unemployment dynamics for a given worker discussed above.
Assuming 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), the uniform ergodicity condition is satisfied.
Let 𝜓∗ = (𝑝, 1 − 𝑝) be the stationary distribution, so that 𝑝 corresponds to unemployment
(state 0).
Using 𝜓∗ = 𝜓∗ 𝑃 and a bit of algebra yields
$$
p = \frac{\beta}{\alpha + \beta}
$$
This is, in some sense, a steady state probability of unemployment — more on interpretation
below.
Not surprisingly it tends to zero as 𝛽 → 0, and to one as 𝛼 → 0.
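This expression is easy to verify numerically by solving $\psi = \psi P$ directly (the values of $\alpha$ and $\beta$ below are hypothetical):

```python
import numpy as np

α, β = 0.3, 0.1                    # hypothetical hiring and separation rates
P = np.array([[1 - α, α],
              [β, 1 - β]])

# solve ψ (P - I) = 0 subject to ψ summing to one
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b_vec = np.array([0.0, 0.0, 1.0])
ψ_star, *_ = np.linalg.lstsq(A, b_vec, rcond=None)

p = β / (α + β)                    # closed form for unemployment probability
```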
As discussed above, a given Markov matrix 𝑃 can have many stationary distributions.
That is, there can be many row vectors 𝜓 such that 𝜓 = 𝜓𝑃 .
In fact if 𝑃 has two distinct stationary distributions 𝜓1 , 𝜓2 then it has infinitely many, since
in this case, as you can verify,
𝜓3 ∶= 𝜆𝜓1 + (1 − 𝜆)𝜓2 is also a stationary distribution for 𝑃 for every 𝜆 ∈ [0, 1].
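Here is a numerical illustration of that claim, using the identity matrix (for which, as noted above, every distribution is stationary):

```python
import numpy as np

P = np.identity(2)                   # trivially reducible: both states absorbing
ψ1 = np.array([1.0, 0.0])
ψ2 = np.array([0.0, 1.0])

for λ in (0.0, 0.25, 0.5, 0.9, 1.0):
    ψ3 = λ * ψ1 + (1 - λ) * ψ2
    assert np.allclose(ψ3 @ P, ψ3)   # every convex combination is stationary
```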
mc = qe.MarkovChain(P)
mc.stationary_distributions # Show all stationary distributions
Part 2 of the Markov chain convergence theorem stated above tells us that the distribution of
𝑋𝑡 converges to the stationary distribution regardless of where we start off.
This adds considerable weight to our interpretation of 𝜓∗ as a stochastic steady state.
The convergence in the theorem is illustrated in the next figure
ψ = ψ @ P
mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ax.scatter(ψ_star[0], ψ_star[1], ψ_star[2], c='k', s=60)
plt.show()
Here
• 𝑃 is the stochastic matrix for recession and growth considered above.
• The highest red dot is an arbitrarily chosen initial probability distribution 𝜓, repre-
sented as a vector in ℝ3 .
• The other red dots are the distributions 𝜓𝑃 𝑡 for 𝑡 = 1, 2, ….
• The black dot is 𝜓∗ .
You might like to try experimenting with different initial conditions.
10.8 Ergodicity
(1/𝑚) ∑_{𝑡=1}^{𝑚} 1{𝑋𝑡 = 𝑥} → 𝜓∗(𝑥)  as 𝑚 → ∞   (7)
Here
• 1{𝑋𝑡 = 𝑥} = 1 if 𝑋𝑡 = 𝑥 and zero otherwise
• convergence is with probability one
• the result does not depend on the distribution (or value) of 𝑋0
The result tells us that the fraction of time the chain spends at state 𝑥 converges to 𝜓∗ (𝑥) as
time goes to infinity.
This gives us another way to interpret the stationary distribution — provided that the con-
vergence result in (7) is valid.
The convergence in (7) is a special case of a law of large numbers result for Markov chains —
see EDTC, section 4.3.4 for some additional information.
10.8.1 Example
𝑝 = 𝛽/(𝛼 + 𝛽)
𝔼[ℎ(𝑋𝑡 )] (8)
𝔼[ℎ(𝑋𝑡+𝑘 ) ∣ 𝑋𝑡 = 𝑥] (9)
where
• {𝑋𝑡 } is a Markov chain generated by 𝑛 × 𝑛 stochastic matrix 𝑃
• ℎ is the column vector with 𝑖-th element ℎ(𝑥𝑖), that is, ℎ = (ℎ(𝑥1), … , ℎ(𝑥𝑛))′
The unconditional expectation (8) is easy: We just sum over the distribution of 𝑋𝑡 to get
𝔼[ℎ(𝑋𝑡)] = (𝜓𝑃^𝑡)ℎ
For the conditional expectation (9), we need to sum over the conditional distribution of 𝑋𝑡+𝑘
given 𝑋𝑡 = 𝑥.
We already know that this is 𝑃^𝑘(𝑥, ⋅), so 𝔼[ℎ(𝑋𝑡+𝑘) ∣ 𝑋𝑡 = 𝑥] = (𝑃^𝑘ℎ)(𝑥).
𝔼[∑_{𝑗=0}^{∞} 𝛽^𝑗 ℎ(𝑋𝑡+𝑗) ∣ 𝑋𝑡 = 𝑥] = [(𝐼 − 𝛽𝑃)⁻¹ℎ](𝑥)
where
(𝐼 − 𝛽𝑃)⁻¹ = 𝐼 + 𝛽𝑃 + 𝛽²𝑃² + ⋯
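This Neumann series expansion can be checked numerically with an arbitrary two-state chain and 𝛽 ∈ (0, 1):

```python
import numpy as np

β = 0.9
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

inverse = np.linalg.inv(np.identity(2) - β * P)

# Truncated Neumann series I + βP + β²P² + ...
total = np.zeros((2, 2))
term = np.identity(2)
for _ in range(500):
    total += term
    term = β * (term @ P)            # next term of the series

assert np.allclose(total, inverse, atol=1e-8)
```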
10.10 Exercises
10.10.1 Exercise 1
According to the discussion above, if a worker’s employment dynamics obey the stochastic
matrix
𝑃 = [[1 − 𝛼, 𝛼], [𝛽, 1 − 𝛽]]
with 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), then, in the long-run, the fraction of time spent unemployed
will be
𝑝 ∶= 𝛽/(𝛼 + 𝛽)
In other words, if {𝑋𝑡 } represents the Markov chain for employment, then 𝑋̄ 𝑚 → 𝑝 as 𝑚 →
∞, where
𝑋̄𝑚 ∶= (1/𝑚) ∑_{𝑡=1}^{𝑚} 1{𝑋𝑡 = 0}
The exercise is to illustrate this convergence by computing 𝑋̄ 𝑚 for large 𝑚 and checking that
it is close to 𝑝.
You will see that this statement is true regardless of the choice of initial condition or the val-
ues of 𝛼, 𝛽, provided both lie in (0, 1).
10.10.2 Exercise 2
2. have the matches returned in order, where the order corresponds to some measure of
“importance”
𝑟𝑗 = ∑_{𝑖∈𝐿𝑗} 𝑟𝑖/ℓ𝑖
where
• ℓ𝑖 is the total number of outbound links from 𝑖
• 𝐿𝑗 is the set of all pages 𝑖 such that 𝑖 has a link to 𝑗
This is a measure of the number of inbound links, weighted by their own ranking (and nor-
malized by 1/ℓ𝑖 ).
There is, however, another interpretation, and it brings us back to Markov chains.
Let 𝑃 be the matrix given by 𝑃 (𝑖, 𝑗) = 1{𝑖 → 𝑗}/ℓ𝑖 where 1{𝑖 → 𝑗} = 1 if 𝑖 has a link to 𝑗
and zero otherwise.
The matrix 𝑃 is a stochastic matrix provided that each page has at least one link.
With this definition of 𝑃 we have
𝑟𝑗 = ∑_{𝑖∈𝐿𝑗} 𝑟𝑖/ℓ𝑖 = ∑_{all 𝑖} 1{𝑖 → 𝑗} 𝑟𝑖/ℓ𝑖 = ∑_{all 𝑖} 𝑃(𝑖, 𝑗)𝑟𝑖
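In other words, the ranking vector 𝑟 is a stationary distribution of 𝑃, which suggests computing it by power iteration. A sketch on a small hypothetical web (not the graph from the data file below):

```python
import numpy as np

# Hypothetical link structure: row i gives the jump probabilities from page i
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [1/3, 1/3, 0.0, 1/3],
              [0.0, 0.0, 1.0, 0.0]])

r = np.full(4, 0.25)                 # start from the uniform distribution
for _ in range(1000):                # power iteration: r <- rP
    r = r @ P

assert np.allclose(r @ P, r, atol=1e-8)  # r is (approximately) stationary
print(np.argsort(-r))                # pages ordered from highest to lowest rank
```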
d > h;
h > g;
h > l;
h > m;
i > g;
i > h;
i > n;
j > e;
j > i;
j > k;
k > n;
l > m;
m > g;
n > c;
n > j;
n > m;
Writing web_graph_data.txt
To parse this file and extract the relevant information, you can use regular expressions.
The following code snippet provides a hint as to how you can go about this
In [24]: import re
re.findall(r'\w', 'x +++ y ****** z') # \w matches alphanumerics
When you solve for the ranking, you will find that the highest ranked node is in fact g, while
the lowest is a.
10.10.3 Exercise 3
Consider the univariate AR(1) process 𝑦𝑡+1 = 𝜌𝑦𝑡 + 𝑢𝑡+1, where {𝑢𝑡} is IID 𝑁(0, 𝜎𝑢²) and |𝜌| < 1, so that the stationary variance of 𝑦𝑡 is
𝜎𝑦² ∶= 𝜎𝑢²/(1 − 𝜌²)
Tauchen’s method [105] is the most common method for approximating this continuous state
process with a finite state Markov chain.
A routine for this already exists in QuantEcon.py but let’s write our own version as an exer-
cise.
As a first step, we choose
• 𝑛, the number of states for the discrete approximation
• 𝑚, an integer that parameterizes the width of the state space
Next, we create a state space {𝑥0 , … , 𝑥𝑛−1 } ⊂ ℝ and a stochastic 𝑛 × 𝑛 matrix 𝑃 such that
• 𝑥0 = −𝑚 𝜎𝑦
• 𝑥𝑛−1 = 𝑚 𝜎𝑦
• 𝑥𝑖+1 = 𝑥𝑖 + 𝑠 where 𝑠 = (𝑥𝑛−1 − 𝑥0 )/(𝑛 − 1)
Let 𝐹 be the cumulative distribution function of the normal distribution 𝑁 (0, 𝜎𝑢2 ).
The values 𝑃 (𝑥𝑖 , 𝑥𝑗 ) are computed to approximate the AR(1) process — omitting the deriva-
tion, the rules are as follows:
1. If 𝑗 = 0, then set 𝑃(𝑥𝑖, 𝑥𝑗) = 𝐹(𝑥0 − 𝜌𝑥𝑖 + 𝑠/2)
2. If 𝑗 = 𝑛 − 1, then set 𝑃(𝑥𝑖, 𝑥𝑗) = 1 − 𝐹(𝑥𝑛−1 − 𝜌𝑥𝑖 − 𝑠/2)
3. Otherwise, set 𝑃(𝑥𝑖, 𝑥𝑗) = 𝐹(𝑥𝑗 − 𝜌𝑥𝑖 + 𝑠/2) − 𝐹(𝑥𝑗 − 𝜌𝑥𝑖 − 𝑠/2)
The exercise is to write a function approx_markov(rho, sigma_u, m=3, n=7) that returns
{𝑥0 , … , 𝑥𝑛−1 } ⊂ ℝ and 𝑛 × 𝑛 matrix 𝑃 as described above.
• Even better, write a function that returns an instance of QuantEcon.py’s MarkovChain
class.
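One possible sketch of such a function, using only the standard library's erf for the normal cdf (this is an illustrative solution, not the QuantEcon.py routine):

```python
import math
import numpy as np

def approx_markov(rho, sigma_u, m=3, n=7):
    """Tauchen-style finite-state approximation of y' = rho*y + u, u ~ N(0, sigma_u^2)."""
    F = lambda z: 0.5 * (1 + math.erf(z / (sigma_u * math.sqrt(2))))  # N(0, σ_u²) cdf
    sigma_y = sigma_u / math.sqrt(1 - rho**2)
    x = np.linspace(-m * sigma_y, m * sigma_y, n)   # evenly spaced state space
    s = x[1] - x[0]
    P = np.empty((n, n))
    for i in range(n):
        P[i, 0] = F(x[0] - rho * x[i] + s / 2)
        P[i, n - 1] = 1 - F(x[n - 1] - rho * x[i] - s / 2)
        for j in range(1, n - 1):
            P[i, j] = F(x[j] - rho * x[i] + s / 2) - F(x[j] - rho * x[i] - s / 2)
    return x, P

x, P = approx_markov(0.9, 0.1)
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a probability distribution
```

The telescoping of the middle terms guarantees each row sums to one, which is a useful sanity check for any implementation.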
10.11 Solutions
10.11.1 Exercise 1
In [26]: α = β = 0.1
N = 10000
p = β / (α + β)
ax.legend(loc='upper right')
plt.show()
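The core of the computation can also be sketched in plain Python, without plotting (seed and sample size below are arbitrary):

```python
import numpy as np

α = β = 0.1
N = 100_000
p = β / (α + β)                      # = 0.5

rng = np.random.default_rng(1234)
x, visits_to_zero = 0, 0
for t in range(N):
    visits_to_zero += (x == 0)
    if x == 0:
        x = 1 if rng.random() < α else 0     # leave unemployment w.p. α
    else:
        x = 0 if rng.random() < β else 1     # lose job w.p. β

X_bar = visits_to_zero / N
assert abs(X_bar - p) < 0.05         # time average is close to p
```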
10.11.2 Exercise 2
In [27]: """
Return list of pages, ordered by rank
"""
import re
from operator import itemgetter
infile = 'web_graph_data.txt'
alphabet = 'abcdefghijklmnopqrstuvwxyz'
Rankings
***
g: 0.1607
j: 0.1594
m: 0.1195
n: 0.1088
k: 0.09106
b: 0.08326
e: 0.05312
i: 0.05312
c: 0.04834
h: 0.0456
l: 0.03202
d: 0.03056
f: 0.01164
a: 0.002911
10.11.3 Exercise 3
Chapter 11
Inventory Dynamics
11.1 Contents
• Overview 11.2
• Sample Paths 11.3
• Marginal Distributions 11.4
• Exercises 11.5
• Solutions 11.6
11.2 Overview
In this lecture we will study the time path of inventories for firms that follow so-called s-S
inventory dynamics.
Such firms wait until their inventory falls below a threshold 𝑠 and then restock, in a single bulk order, up to a capacity level 𝑆.
These kinds of policies are common in practice and also optimal in certain circumstances.
A review of early literature and some macroeconomic implications can be found in [19].
Here our main aim is to learn more about simulation, time series and Markov dynamics.
While our Markov environment and many of the concepts we consider are related to those
found in our lecture on finite Markov chains, the state space is a continuum in the current
application.
Let’s start with some imports
𝑋𝑡+1 = (𝑆 − 𝐷𝑡+1)⁺  if 𝑋𝑡 ≤ 𝑠
𝑋𝑡+1 = (𝑋𝑡 − 𝐷𝑡+1)⁺  if 𝑋𝑡 > 𝑠
where 𝑎⁺ ∶= max(𝑎, 0).
𝐷𝑡 = exp(𝜇 + 𝜎𝑍𝑡 )
where 𝜇 and 𝜎 are parameters and {𝑍𝑡 } is IID and standard normal.
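Before wrapping everything in a class, here is a single-firm sketch of the update rule (the parameter values are illustrative assumptions):

```python
import numpy as np

s, S, mu, sigma = 10.0, 100.0, 1.0, 0.5      # illustrative parameter values
rng = np.random.default_rng(0)

def update(x):
    D = np.exp(mu + sigma * rng.standard_normal())   # lognormal demand draw
    return max(S - D, 0.0) if x <= s else max(x - D, 0.0)

x = 50.0
for _ in range(10):
    x = update(x)

assert 0.0 <= x <= S                 # inventory stays between 0 and capacity
```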
Here’s a class that stores parameters and generates time paths for inventory.
In [2]: firm_data = [
('s', float64), # restock trigger level
('S', float64), # capacity
('mu', float64), # shock location parameter
('sigma', float64) # shock scale parameter
]
@jitclass(firm_data)
class Firm:

    def __init__(self, s=10, S=100, mu=1.0, sigma=0.5):
        # default parameter values here are illustrative
        self.s, self.S, self.mu, self.sigma = s, S, mu, sigma

    def update(self, x):
        "Update the inventory state from one period to the next."
        Z = np.random.randn()
        D = np.exp(self.mu + self.sigma * Z)
        if x <= self.s:
            return max(self.S - D, 0)
        else:
            return max(x - D, 0)

    def sim_inventory_path(self, x_init, sim_length):
        X = np.empty(sim_length)
        X[0] = x_init
        for t in range(sim_length - 1):
            X[t+1] = self.update(X[t])
        return X
/usr/share/miniconda3/envs/qelectures/lib/python3.7/site-packages/ipykernel_launcher.py:9: NumbaDeprecationWarning: The 'numba.jitclass' decorator has moved to 'numba.experimental.jitclass' to better reflect the experimental nature of the functionality. Please update your imports to accommodate
s, S = firm.s, firm.S
sim_length = 100
x_init = 50
X = firm.sim_inventory_path(x_init, sim_length)
fig, ax = plt.subplots()
bbox = (0., 1.02, 1., .102)
legend_args = {'ncol': 3,
'bbox_to_anchor': bbox,
'loc': 3,
'mode': 'expand'}
ax.plot(X, label="inventory")
ax.plot(s * np.ones(sim_length), 'k', label="$s$")
ax.plot(S * np.ones(sim_length), 'k', label="$S$")
ax.set_ylim(0, S+10)
ax.set_xlabel("time")
ax.legend(**legend_args)
plt.show()
Now let’s simulate multiple paths in order to build a more complete picture of the probabili-
ties of different outcomes:
In [4]: sim_length=200
fig, ax = plt.subplots()
for i in range(400):
X = firm.sim_inventory_path(x_init, sim_length)
ax.plot(X, 'b', alpha=0.2, lw=0.5)
plt.show()
In [5]: T = 50
M = 200 # Number of draws
ymin, ymax = 0, S + 10
for ax in axes:
ax.grid(alpha=0.4)
ax = axes[0]
ax.set_ylim(ymin, ymax)
ax.set_ylabel('$X_t$', fontsize=16)
ax.vlines((T,), -1.5, 1.5)
ax.set_xticks((T,))
ax.set_xticklabels((r'$T$',))
sample = np.empty(M)
for m in range(M):
X = firm.sim_inventory_path(x_init, 2 * T)
ax.plot(X, 'b', lw=1, alpha=0.5)
ax.plot((T,), (X[T+1],), 'ko', alpha=0.5)
sample[m] = X[T+1]
axes[1].set_ylim(ymin, ymax)
axes[1].hist(sample,
bins=16,
density=True,
orientation='horizontal',
histtype='bar',
alpha=0.5)
plt.show()
In [6]: T = 50
M = 50_000
fig, ax = plt.subplots()
sample = np.empty(M)
for m in range(M):
X = firm.sim_inventory_path(x_init, T+1)
sample[m] = X[T]
ax.hist(sample,
bins=36,
density=True,
histtype='bar',
alpha=0.75)
plt.show()
The allocation of probability mass is similar to what was shown by the histogram just above.
11.5 Exercises
11.5.1 Exercise 1
Try different initial conditions to verify that, in the long run, the distribution is invariant
across initial conditions.
11.5.2 Exercise 2
Using simulation, calculate the probability that firms that start with 𝑋0 = 70 need to order
twice or more in the first 50 periods.
You will need a large sample size to get an accurate reading.
11.6 Solutions
11.6.1 Exercise 1
@njit(parallel=True)
def shift_firms_forward(current_inventory_levels, num_periods):
num_firms = len(current_inventory_levels)
new_inventory_levels = np.empty(num_firms)
for f in prange(num_firms):
x = current_inventory_levels[f]
for t in range(num_periods):
Z = np.random.randn()
D = np.exp(mu + sigma * Z)
if x <= s:
x = max(S - D, 0)
else:
x = max(x - D, 0)
new_inventory_levels[f] = x
return new_inventory_levels
In [10]: x_init = 50
num_firms = 50_000
first_diffs = np.diff(sample_dates)
fig, ax = plt.subplots()
X = np.ones(num_firms) * x_init
current_date = 0
for d in first_diffs:
X = shift_firms_forward(X, d)
current_date += d
plot_kde(X, ax, label=f't = {current_date}')
ax.set_xlabel('inventory')
ax.set_ylabel('probability')
ax.legend()
plt.show()
/usr/share/miniconda3/envs/qelectures/lib/python3.7/site-packages/numba/np/ufunc/parallel.py:355: NumbaWarning: The TBB threading layer
requires TBB version 2019.5 or later i.e., TBB_INTERFACE_VERSION >= 11005. Found
TBB_INTERFACE_VERSION = 9107. The TBB threading layer is disabled.
warnings.warn(problem)
11.6.2 Exercise 2
In [11]: @njit(parallel=True)
         def compute_freq(sim_length=50, x_init=70, num_firms=1_000_000):

             firm_counter = 0  # Records number of firms that restock 2x or more
             for m in prange(num_firms):
                 x = x_init
                 restock_counter = 0  # Will record number of restocks for firm m

                 for t in range(sim_length):
                     Z = np.random.randn()
                     D = np.exp(mu + sigma * Z)
                     if x <= s:
                         x = max(S - D, 0)
                         restock_counter += 1
                     else:
                         x = max(x - D, 0)

                 if restock_counter > 1:
                     firm_counter += 1

             return firm_counter / num_firms
Note the time the routine takes to run, as well as the output.
In [12]: %%time
freq = compute_freq()
print(f"Frequency of at least two stock outs = {freq}")
Try switching the parallel flag to False in the jitted function above.
Depending on your system, the difference can be substantial.
(On our desktop machine, the speed up is by a factor of 5.)
Chapter 12
Linear State Space Models
12.1 Contents
• Overview 12.2
• The Linear State Space Model 12.3
• Distributions and Moments 12.4
• Stationarity and Ergodicity 12.5
• Noisy Observations 12.6
• Prediction 12.7
• Code 12.8
• Exercises 12.9
• Solutions 12.10
“We may regard the present state of the universe as the effect of its past and the
cause of its future” – Marquis de Laplace
In addition to what’s in Anaconda, this lecture will need the following libraries:
12.2 Overview
184 CHAPTER 12. LINEAR STATE SPACE MODELS
12.3.1 Primitives
1. the matrices 𝐴, 𝐶, 𝐺
2. the shock distribution, which we have specified as 𝑤𝑡 ∼ 𝑁(0, 𝐼)
3. the distribution of the initial condition 𝑥0, which we have set to 𝑁(𝜇0, Σ0)
Given 𝐴, 𝐶, 𝐺 and draws of 𝑥0 and 𝑤1 , 𝑤2 , …, the model (1) pins down the values of the se-
quences {𝑥𝑡 } and {𝑦𝑡 }.
Even without these draws, the primitives 1–3 pin down the probability distributions of {𝑥𝑡 }
and {𝑦𝑡 }.
Later we’ll see how to compute these distributions and their moments.
We’ve made the common assumption that the shocks are independent standardized normal
vectors.
But some of what we say will be valid under the assumption that {𝑤𝑡+1 } is a martingale
difference sequence.
A martingale difference sequence is a sequence that is zero mean when conditioned on past
information.
In the present case, since {𝑥𝑡} is our state sequence, this means that it satisfies 𝔼[𝑤𝑡+1 ∣ 𝑥𝑡, 𝑥𝑡−1, … , 𝑥0] = 0.
This is a weaker condition than that {𝑤𝑡 } is IID with 𝑤𝑡+1 ∼ 𝑁 (0, 𝐼).
12.3.2 Examples
𝑥𝑡 = [1, 𝑦𝑡, 𝑦𝑡−1]′   𝐴 = [[1, 0, 0], [𝜙0, 𝜙1, 𝜙2], [0, 1, 0]]   𝐶 = [0, 0, 0]′   𝐺 = [0, 1, 0]
You can confirm that under these definitions, (1) and (2) agree.
The next figure shows the dynamics of this process when 𝜙0 = 1.1, 𝜙1 = 0.8, 𝜙2 = −0.8, 𝑦0 =
𝑦−1 = 1.
ar = LinearStateSpace(A, C, G, mu_0=np.ones(n))
x, y = ar.simulate(ts_length)
fig, ax = plt.subplots()
y = y.flatten()
ax.plot(y, 'b', lw=2, alpha=0.7)
ax.grid()
ax.set_xlabel('time', fontsize=12)
ax.set_ylabel('$y_t$', fontsize=12)
plt.show()
A = [[1, 0, 0 ],
[ϕ_0, ϕ_1, ϕ_2],
[0, 1, 0 ]]
C = np.zeros((3, 1))
G = [0, 1, 0]
plot_lss(A, C, G)
To put this in the linear state space format we take 𝑥𝑡 = [𝑦𝑡, 𝑦𝑡−1, 𝑦𝑡−2, 𝑦𝑡−3]′ and
𝐴 = [[𝜙1, 𝜙2, 𝜙3, 𝜙4], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]   𝐶 = [𝜎, 0, 0, 0]′   𝐺 = [1, 0, 0, 0]
The matrix 𝐴 has the form of the companion matrix to the vector [𝜙1 𝜙2 𝜙3 𝜙4 ].
The next figure shows the dynamics of this process when
C_1 = [[σ],
[0],
[0],
[0]]
G_1 = [1, 0, 0, 0]
Vector Autoregressions
𝑥𝑡 = [𝑦𝑡, 𝑦𝑡−1, 𝑦𝑡−2, 𝑦𝑡−3]′   𝐴 = [[𝜙1, 𝜙2, 𝜙3, 𝜙4], [𝐼, 0, 0, 0], [0, 𝐼, 0, 0], [0, 0, 𝐼, 0]]   𝐶 = [𝜎, 0, 0, 0]′   𝐺 = [𝐼, 0, 0, 0]
Seasonals
𝐴 = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
It is easy to check that 𝐴⁴ = 𝐼, which implies that 𝑥𝑡 is strictly periodic with period 4 [1]:
𝑥𝑡+4 = 𝑥𝑡
Such an 𝑥𝑡 process can be used to model deterministic seasonals in quarterly time series.
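The claim 𝐴⁴ = 𝐼 is easy to confirm numerically:

```python
import numpy as np

A = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])

# A is a cyclic permutation, so applying it four times returns any x to itself
assert np.array_equal(np.linalg.matrix_power(A, 4), np.identity(4, dtype=int))
```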
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations.
Time Trends
𝐴 = [[1, 1], [0, 1]]   𝐶 = [0, 0]′   𝐺 = [𝑎, 𝑏]   (4)
and starting at initial condition 𝑥0 = [0, 1]′.
In fact, it’s possible to use the state-space system to represent polynomial trends of any or-
der.
For instance, we can represent the model 𝑦𝑡 = 𝑎𝑡2 + 𝑏𝑡 + 𝑐 in the linear state space form by
taking
𝐴 = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]   𝐶 = [0, 0, 0]′   𝐺 = [2𝑎, 𝑎 + 𝑏, 𝑐]
and starting at initial condition 𝑥0 = [0, 0, 1]′.
It follows that
𝐴^𝑡 = [[1, 𝑡, 𝑡(𝑡 − 1)/2], [0, 1, 𝑡], [0, 0, 1]]
Then 𝑥′𝑡 = [𝑡(𝑡 − 1)/2 𝑡 1]. You can now confirm that 𝑦𝑡 = 𝐺𝑥𝑡 has the correct form.
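It is straightforward to confirm the quadratic trend by iterating the state equation (the coefficients a, b, c below are arbitrary):

```python
import numpy as np

a, b, c = 0.5, -1.0, 3.0             # arbitrary trend coefficients
A = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])
G = np.array([2 * a, a + b, c])

x = np.array([0., 0., 1.])           # x_0
for t in range(20):
    # y_t = G x_t should equal the quadratic a t² + b t + c
    assert np.isclose(G @ x, a * t**2 + b * t + c)
    x = A @ x
```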
𝑥𝑡 = 𝐴𝑥𝑡−1 + 𝐶𝑤𝑡
  = 𝐴²𝑥𝑡−2 + 𝐴𝐶𝑤𝑡−1 + 𝐶𝑤𝑡
  ⋮
  = ∑_{𝑗=0}^{𝑡−1} 𝐴^𝑗𝐶𝑤𝑡−𝑗 + 𝐴^𝑡𝑥0   (5)
𝐴 = [[1, 1], [0, 1]]   𝐶 = [1, 0]′
You will be able to show that 𝐴^𝑡 = [[1, 𝑡], [0, 1]] and 𝐴^𝑗𝐶 = [1, 0]′.
Substituting into the moving average representation (5), we obtain
𝑥1𝑡 = ∑_{𝑗=0}^{𝑡−1} 𝑤𝑡−𝑗 + [1, 𝑡] 𝑥0
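A short simulation confirms this moving average representation (the initial condition and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
A = np.array([[1., 1.],
              [0., 1.]])
C = np.array([1., 0.])

x0 = np.array([0.2, 1.0])            # arbitrary initial condition
T = 25
w = rng.standard_normal(T)

x = x0.copy()
for t in range(T):
    x = A @ x + C * w[t]             # x_{t+1} = A x_t + C w_{t+1}

# first component equals the sum of the shocks plus [1, T] x_0
assert np.isclose(x[0], w.sum() + x0[0] + T * x0[1])
```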
Using (1), it’s easy to obtain expressions for the (unconditional) means of 𝑥𝑡 and 𝑦𝑡 .
We’ll explain what unconditional and conditional mean soon.
Letting 𝜇𝑡 ∶= 𝔼[𝑥𝑡] and using linearity of expectations, we find that 𝜇𝑡+1 = 𝐴𝜇𝑡, with 𝜇0 given.
12.4.2 Distributions
In general, knowing the mean and variance-covariance matrix of a random vector is not quite
as good as knowing the full distribution.
However, there are some situations where these moments alone tell us all we need to know.
These are situations in which the mean vector and covariance matrix are sufficient statis-
tics for the population distribution.
(Sufficient statistics form a list of objects that characterize a population distribution)
One such situation is when the vector in question is Gaussian (i.e., normally distributed).
This is the case here: given our Gaussian assumptions on the primitives and the linearity of (1), we can see immediately that both 𝑥𝑡 and 𝑦𝑡 are Gaussian for all 𝑡 ≥ 0 [2].
Since 𝑥𝑡 is Gaussian, to find the distribution, all we need to do is find its mean and variance-
covariance matrix.
But in fact we’ve already done this, in (6) and (7).
Letting 𝜇𝑡 and Σ𝑡 be as defined by these equations, we have
𝑥𝑡 ∼ 𝑁 (𝜇𝑡 , Σ𝑡 ) (11)
The next figure shows 20 simulations, producing 20 time series for {𝑦𝑡 }, and hence 20 draws
of 𝑦𝑇 .
The system in question is the univariate autoregressive model (3).
The values of 𝑦𝑇 are represented by black dots in the left-hand figure
ar = LinearStateSpace(A, C, G, mu_0=np.ones(n))
for ax in axes:
ax.grid(alpha=0.4)
ax.set_ylim(ymin, ymax)
ax = axes[0]
ax.set_ylim(ymin, ymax)
ax.set_ylabel('$y_t$', fontsize=12)
ax.set_xlabel('time', fontsize=12)
ax.vlines((T,), -1.5, 1.5)
ax.set_xticks((T,))
ax.set_xticklabels(('$T$',))
sample = []
for i in range(sample_size):
rcolor = random.choice(('c', 'g', 'b', 'k'))
x, y = ar.simulate(ts_length=T+15)
y = y.flatten()
ax.plot(y, color=rcolor, lw=1, alpha=0.5)
ax.plot((T,), (y[T],), 'ko', alpha=0.5)
sample.append(y[T])
y = y.flatten()
axes[1].set_ylim(ymin, ymax)
axes[1].set_ylabel('$y_t$', fontsize=12)
axes[1].set_xlabel('relative frequency', fontsize=12)
axes[1].hist(sample, bins=16, density=True, orientation='horizontal', alpha=0.5)
plt.show()
G_2 = [1, 0, 0, 0]
In the right-hand figure, these values are converted into a rotated histogram that shows rela-
tive frequencies from our sample of 20 𝑦𝑇 ’s.
Here is another figure, this time with 100 observations
In [8]: t = 100
cross_section_plot(A_2, C_2, G_2, T=t)
Let’s now try with 500,000 observations, showing only the histogram (without rotation)
In [9]: T = 100
ymin = -0.8
ymax=1.25
sample_size = 500_000
Ensemble Means
Just as the histogram approximates the population distribution, the ensemble or cross-
sectional average
𝑦𝑇̄ ∶= (1/𝐼) ∑_{𝑖=1}^{𝐼} 𝑦𝑇^𝑖
approximates the expectation 𝔼[𝑦𝑇 ] = 𝐺𝜇𝑇 (as implied by the law of large numbers).
Here’s a simulation comparing the ensemble averages and population means at time points
𝑡 = 0, … , 50.
The parameters are the same as for the preceding figures, and the sample size is relatively
small (𝐼 = 20).
In [10]: I = 20
T = 50
ymin = 0.5
ymax = 1.15
fig, ax = plt.subplots()
ensemble_mean = np.zeros(T)
for i in range(I):
x, y = ar.simulate(ts_length=T)
y = y.flatten()
ax.plot(y, 'c', lw=0.8, alpha=0.5)
ensemble_mean = ensemble_mean + y
ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label='$\\bar y_t$')
m = ar.moment_sequence()
population_means = []
for t in range(T):
μ_x, μ_y, Σ_x, Σ_y = next(m)
population_means.append(float(μ_y))
𝑥𝑇̄ ∶= (1/𝐼) ∑_{𝑖=1}^{𝐼} 𝑥𝑇^𝑖 → 𝜇𝑇   (𝐼 → ∞)
(1/𝐼) ∑_{𝑖=1}^{𝐼} (𝑥𝑇^𝑖 − 𝑥𝑇̄)(𝑥𝑇^𝑖 − 𝑥𝑇̄)′ → Σ𝑇   (𝐼 → ∞)
𝑝(𝑥0, 𝑥1, … , 𝑥𝑇) = 𝑝(𝑥0) ∏_{𝑡=0}^{𝑇−1} 𝑝(𝑥𝑡+1 ∣ 𝑥𝑡)
𝑝(𝑥𝑡+1 | 𝑥𝑡 ) = 𝑁 (𝐴𝑥𝑡 , 𝐶𝐶 ′ )
Autocovariance Functions
Σ𝑡+𝑗,𝑡 = 𝐴^𝑗Σ𝑡   (14)
Notice that Σ𝑡+𝑗,𝑡 in general depends on both 𝑗, the gap between the two dates, and 𝑡, the
earlier date.
Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of
linear state space models.
Let’s start with the intuition.
Let’s look at some more time series from the same model that we analyzed above.
This picture shows cross-sectional distributions for 𝑦 at times 𝑇 , 𝑇 ′ , 𝑇 ″
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
if steady_state == 'True':
μ_x, μ_y, Σ_x, Σ_y = ar.stationary_distributions()
ar_state = LinearStateSpace(A, C, G, mu_0=μ_x, Sigma_0=Σ_x)
if steady_state == 'True':
x, y = ar_state.simulate(ts_length=T4)
else:
x, y = ar.simulate(ts_length=T4)
y = y.flatten()
ax.plot(y, color=rcolor, lw=0.8, alpha=0.5)
ax.plot((T0, T1, T2), (y[T0], y[T1], y[T2],), 'ko', alpha=0.5)
plt.show()
Note how the time series “settle down” in the sense that the distributions at 𝑇 ′ and 𝑇 ″ are
relatively similar to each other — but unlike the distribution at 𝑇 .
Apparently, the distributions of 𝑦𝑡 converge to a fixed long-run distribution as 𝑡 → ∞.
When such a distribution exists it is called a stationary distribution.
Since
𝜓∞ = 𝑁 (𝜇∞ , Σ∞ )
Let’s see what happens to the preceding figure if we start 𝑥0 at the stationary distribution.
Now the differences in the observed distributions at 𝑇 , 𝑇 ′ and 𝑇 ″ come entirely from random
fluctuations due to the finite sample size.
By
• our choosing 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ )
• the definitions of 𝜇∞ and Σ∞ as fixed points of (6) and (7) respectively
we’ve ensured that
Moreover, in view of (14), the autocovariance function takes the form Σ𝑡+𝑗,𝑡 = 𝐴𝑗 Σ∞ , which
depends on 𝑗 but not on 𝑡.
This motivates the following definition.
A process {𝑥𝑡 } is said to be covariance stationary if
• both 𝜇𝑡 and Σ𝑡 are constant in 𝑡
• Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on time 𝑡
In our setting, {𝑥𝑡 } will be covariance stationary if 𝜇0 , Σ0 , 𝐴, 𝐶 assume values that imply that
none of 𝜇𝑡 , Σ𝑡 , Σ𝑡+𝑗,𝑡 depends on 𝑡.
The difference equation 𝜇𝑡+1 = 𝐴𝜇𝑡 is known to have unique fixed point 𝜇∞ = 0 if all eigen-
values of 𝐴 have moduli strictly less than unity.
That is, if (np.absolute(np.linalg.eigvals(A)) < 1).all() == True.
The difference equation (7) also has a unique fixed point in this case, and, moreover
𝜇𝑡 → 𝜇∞ = 0 and Σ𝑡 → Σ∞ as 𝑡→∞
𝐴 = [[𝐴1, 𝑎], [0, 1]]   𝐶 = [𝐶1, 0]′
where
• 𝐴1 is an (𝑛 − 1) × (𝑛 − 1) matrix
• 𝑎 is an (𝑛 − 1) × 1 column vector
Let 𝑥𝑡 = [𝑥1𝑡′, 1]′ where 𝑥1𝑡 is (𝑛 − 1) × 1.
It follows that 𝑥1,𝑡+1 = 𝐴1𝑥1𝑡 + 𝑎 + 𝐶1𝑤𝑡+1
Let 𝜇1𝑡 = 𝔼[𝑥1𝑡] and take expectations on both sides of this expression to get 𝜇1,𝑡+1 = 𝐴1𝜇1𝑡 + 𝑎   (15)
Assume now that the moduli of the eigenvalues of 𝐴1 are all strictly less than one.
Then (15) has a unique stationary solution, namely,
𝜇1∞ = (𝐼 − 𝐴1 )−1 𝑎
The stationary value of 𝜇𝑡 itself is then 𝜇∞ ∶= [𝜇1∞′, 1]′.
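The fixed-point formula for the stationary mean is easy to verify numerically; the matrices below are hypothetical, chosen only so that 𝐴1 is stable:

```python
import numpy as np

A1 = np.array([[0.5, 0.2],
               [0.1, 0.4]])          # eigenvalues 0.6 and 0.3, inside the unit circle
a = np.array([1.0, 2.0])

μ1 = np.linalg.solve(np.identity(2) - A1, a)   # μ1∞ = (I − A1)⁻¹ a

# μ1 solves the stationarity condition μ1 = A1 μ1 + a
assert np.allclose(A1 @ μ1 + a, μ1)
```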
The stationary values of Σ𝑡 and Σ𝑡+𝑗,𝑡 satisfy
Σ∞ = 𝐴Σ∞𝐴′ + 𝐶𝐶′
Σ𝑡+𝑗,𝑡 = 𝐴^𝑗Σ∞   (16)
Notice that here Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on calendar time 𝑡.
In conclusion, if
• 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ ) and
• the moduli of the eigenvalues of 𝐴1 are all strictly less than unity
then the {𝑥𝑡 } process is covariance stationary, with constant state component.
Note
If the eigenvalues of 𝐴1 are less than unity in modulus, then (a) starting from any
initial value, the mean and variance-covariance matrix both converge to their sta-
tionary values; and (b) iterations on (7) converge to the fixed point of the discrete
Lyapunov equation in the first line of (16).
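Claim (b) in the note suggests a simple way to approximate Σ∞: iterate on the Lyapunov map 𝑆 ↦ 𝐴𝑆𝐴′ + 𝐶𝐶′. A sketch with hypothetical stable matrices:

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])           # stable: eigenvalues 0.9 and 0.8
C = np.array([[0.5],
              [0.2]])

S = np.zeros((2, 2))
for _ in range(2000):                # iterate S <- A S A' + C C'
    S = A @ S @ A.T + C @ C.T

# S is (approximately) a fixed point of the discrete Lyapunov equation
assert np.allclose(A @ S @ A.T + C @ C.T, S, atol=1e-10)
```

In practice one would call a dedicated solver (e.g. a discrete Lyapunov routine) rather than iterate by hand, but the iteration makes claim (b) concrete.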
12.5.5 Ergodicity
Ensemble averages across simulations are interesting theoretically, but in real life, we usually
observe only a single realization {𝑥𝑡 , 𝑦𝑡 }𝑇𝑡=0 .
So now let’s take a single realization and form the time-series averages
𝑥̄ ∶= (1/𝑇) ∑_{𝑡=1}^{𝑇} 𝑥𝑡   and   𝑦̄ ∶= (1/𝑇) ∑_{𝑡=1}^{𝑇} 𝑦𝑡
Do these time series averages converge to something interpretable in terms of our basic state-
space representation?
The answer depends on something called ergodicity.
Ergodicity is the property that time series and ensemble averages coincide.
More formally, ergodicity implies that time series sample averages converge to their expecta-
tion under the stationary distribution.
In particular,
• (1/𝑇) ∑_{𝑡=1}^{𝑇} 𝑥𝑡 → 𝜇∞
• (1/𝑇) ∑_{𝑡=1}^{𝑇} (𝑥𝑡 − 𝑥𝑇̄)(𝑥𝑡 − 𝑥𝑇̄)′ → Σ∞
• (1/𝑇) ∑_{𝑡=1}^{𝑇} (𝑥𝑡+𝑗 − 𝑥𝑇̄)(𝑥𝑡 − 𝑥𝑇̄)′ → 𝐴^𝑗Σ∞
In our linear Gaussian setting, any covariance stationary process is also ergodic.
In some settings, the observation equation 𝑦𝑡 = 𝐺𝑥𝑡 is modified to include an error term.
Often this error term represents the idea that the true state can only be observed imperfectly.
To include an error term in the observation we introduce
• An IID sequence of ℓ × 1 random vectors 𝑣𝑡 ∼ 𝑁 (0, 𝐼).
• A 𝑘 × ℓ matrix 𝐻.
and extend the linear state-space system to
𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1
𝑦𝑡 = 𝐺𝑥𝑡 + 𝐻𝑣𝑡
The distribution of 𝑦𝑡 is then
𝑦𝑡 ∼ 𝑁(𝐺𝜇𝑡, 𝐺Σ𝑡𝐺′ + 𝐻𝐻′)
12.7 Prediction
The theory of prediction for linear state space systems is elegant and simple.
For example, the one-step-ahead forecast of the state is 𝔼𝑡[𝑥𝑡+1] ∶= 𝔼[𝑥𝑡+1 ∣ 𝑥𝑡, 𝑥𝑡−1, … , 𝑥0] = 𝐴𝑥𝑡. The right-hand side follows from 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 and the fact that 𝑤𝑡+1 is zero mean and independent of 𝑥𝑡, 𝑥𝑡−1, … , 𝑥0.
More generally, we’d like to compute the 𝑗-step ahead forecasts 𝔼𝑡 [𝑥𝑡+𝑗 ] and 𝔼𝑡 [𝑦𝑡+𝑗 ].
With a bit of algebra, we obtain
In view of the IID property, current and past state values provide no information about fu-
ture values of the shock.
Hence 𝔼𝑡 [𝑤𝑡+𝑘 ] = 𝔼[𝑤𝑡+𝑘 ] = 0.
It now follows from linearity of expectations that the 𝑗-step ahead forecast of 𝑥 is
𝔼𝑡[𝑥𝑡+𝑗] = 𝐴^𝑗𝑥𝑡
It is useful to obtain the covariance matrix of the vector of 𝑗-step-ahead prediction errors
𝑥𝑡+𝑗 − 𝔼𝑡[𝑥𝑡+𝑗] = ∑_{𝑠=0}^{𝑗−1} 𝐴^𝑠𝐶𝑤𝑡−𝑠+𝑗   (20)
Evidently,
𝑉𝑗 ∶= 𝔼𝑡[(𝑥𝑡+𝑗 − 𝔼𝑡[𝑥𝑡+𝑗])(𝑥𝑡+𝑗 − 𝔼𝑡[𝑥𝑡+𝑗])′] = ∑_{𝑘=0}^{𝑗−1} 𝐴^𝑘𝐶𝐶′(𝐴^𝑘)′   (21)
𝑉𝑗 is the conditional covariance matrix of the errors in forecasting 𝑥𝑡+𝑗 , conditioned on time 𝑡
information 𝑥𝑡 .
Under particular conditions, 𝑉𝑗 converges to
𝑉∞ = 𝐶𝐶 ′ + 𝐴𝑉∞ 𝐴′ (23)
12.8 Code
Our preceding simulations and calculations are based on code in the file lss.py from the
QuantEcon.py package.
The code implements a class for handling linear state space models (simulations, calculating
moments, etc.).
One Python construct you might not be familiar with is the use of a generator function in the
method moment_sequence().
Go back and read the relevant documentation if you’ve forgotten how generator functions
work.
Examples of usage are given in the solutions to the exercises.
12.9 Exercises
12.9.1 Exercise 1
In several contexts, we want to compute forecasts of geometric sums of future random vari-
ables governed by the linear state-space system (1).
We want the following objects
• Forecast of a geometric sum of future 𝑥's, or 𝔼𝑡[∑_{𝑗=0}^{∞} 𝛽^𝑗𝑥𝑡+𝑗].
• Forecast of a geometric sum of future 𝑦's, or 𝔼𝑡[∑_{𝑗=0}^{∞} 𝛽^𝑗𝑦𝑡+𝑗].
These objects are important components of some famous and interesting dynamic models.
For example,
• if {𝑦𝑡} is a stream of dividends, then 𝔼[∑_{𝑗=0}^{∞} 𝛽^𝑗𝑦𝑡+𝑗 ∣ 𝑥𝑡] is a model of a stock price
• if {𝑦𝑡} is the money supply, then 𝔼[∑_{𝑗=0}^{∞} 𝛽^𝑗𝑦𝑡+𝑗 ∣ 𝑥𝑡] is a model of the price level
Show that:
𝔼𝑡[∑_{𝑗=0}^{∞} 𝛽^𝑗𝑥𝑡+𝑗] = [𝐼 − 𝛽𝐴]⁻¹𝑥𝑡
and
𝔼𝑡[∑_{𝑗=0}^{∞} 𝛽^𝑗𝑦𝑡+𝑗] = 𝐺[𝐼 − 𝛽𝐴]⁻¹𝑥𝑡
12.10 Solutions
12.10.1 Exercise 1
Suppose that every eigenvalue of 𝐴 has modulus strictly less than 1/𝛽.
It then follows that 𝐼 + 𝛽𝐴 + 𝛽²𝐴² + ⋯ = [𝐼 − 𝛽𝐴]⁻¹.
This leads to our formulas:
• Forecast of a geometric sum of future 𝑥’s
𝔼𝑡[∑_{𝑗=0}^{∞} 𝛽^𝑗𝑥𝑡+𝑗] = [𝐼 + 𝛽𝐴 + 𝛽²𝐴² + ⋯ ]𝑥𝑡 = [𝐼 − 𝛽𝐴]⁻¹𝑥𝑡
• Forecast of a geometric sum of future 𝑦's
𝔼𝑡[∑_{𝑗=0}^{∞} 𝛽^𝑗𝑦𝑡+𝑗] = 𝐺[𝐼 + 𝛽𝐴 + 𝛽²𝐴² + ⋯ ]𝑥𝑡 = 𝐺[𝐼 − 𝛽𝐴]⁻¹𝑥𝑡
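The two formulas can be checked numerically by truncating the geometric sums; the matrices and 𝛽 below are arbitrary, subject to the eigenvalue condition used above:

```python
import numpy as np

β = 0.9
A = np.array([[0.5, 0.2],
              [0.0, 0.6]])           # spectral radius 0.6 < 1/β
G = np.array([1.0, -1.0])
x = np.array([2.0, 3.0])

closed_form = np.linalg.solve(np.identity(2) - β * A, x)

truncated = np.zeros(2)
term = x.copy()
for _ in range(300):                 # Σ β^j A^j x, truncated at j = 299
    truncated += term
    term = β * (A @ term)

assert np.allclose(truncated, closed_form)
assert np.isclose(G @ truncated, G @ closed_form)
```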
Footnotes
[1] The eigenvalues of 𝐴 are (1, −1, 𝑖, −𝑖).
[2] The correct way to argue this is by induction. Suppose that 𝑥𝑡 is Gaussian. Then (1) and
(10) imply that 𝑥𝑡+1 is Gaussian. Since 𝑥0 is assumed to be Gaussian, it follows that every 𝑥𝑡
is Gaussian. Evidently, this implies that each 𝑦𝑡 is Gaussian.
Chapter 13
Application: The Samuelson Multiplier-Accelerator
13.1 Contents
• Overview 13.2
• Details 13.3
• Implementation 13.4
• Stochastic Shocks 13.5
• Government Spending 13.6
• Wrapping Everything Into a Class 13.7
• Using the LinearStateSpace Class 13.8
• Pure Multiplier Model 13.9
• Summary 13.10
In addition to what’s in Anaconda, this lecture will need the following libraries:
13.2 Overview
This lecture creates non-stochastic and stochastic versions of Paul Samuelson’s celebrated
multiplier accelerator model [93].
In doing so, we extend the example of the Solow model class in our second OOP lecture.
Our objectives are to
• provide a more detailed example of OOP and classes
• review a famous model
• review linear difference equations, both deterministic and stochastic
Let’s start with some standard imports:
We’ll also use the following for various tasks described below:
208 CHAPTER 13. APPLICATION: THE SAMUELSON MULTIPLIER-ACCELERATOR
Samuelson used a second-order linear difference equation to represent a model of national out-
put based on three components:
• a national output identity asserting that national output is the sum of consumption plus investment plus government purchases.
• a Keynesian consumption function asserting that consumption at time 𝑡 is equal to a
constant times national output at time 𝑡 − 1.
• an investment accelerator asserting that investment at time 𝑡 equals a constant called
the accelerator coefficient times the difference in output between period 𝑡 − 1 and 𝑡 − 2.
• the idea that consumption plus investment plus government purchases constitute aggre-
gate demand, which automatically calls forth an equal amount of aggregate supply.
(To read about linear difference equations see here or chapter IX of [95])
Samuelson used the model to analyze how particular values of the marginal propensity to
consume and the accelerator coefficient might give rise to transient business cycles in national
output.
Possible dynamic properties include
• smooth convergence to a constant level of output
• damped business cycles that eventually converge to a constant level of output
• persistent business cycles that neither dampen nor explode
Later we present an extension that adds a random shock to the right side of the national in-
come identity representing random fluctuations in aggregate demand.
This modification makes national output become governed by a second-order stochastic linear
difference equation that, with appropriate parameter values, gives rise to recurrent irregular
business cycles.
(To read about stochastic linear difference equations see chapter XI of [95])
13.3 Details
𝐶𝑡 = 𝑎𝑌𝑡−1 + 𝛾   (1)
𝐼𝑡 = 𝑏(𝑌𝑡−1 − 𝑌𝑡−2)   (2)
𝑌𝑡 = 𝐶𝑡 + 𝐼𝑡 + 𝐺𝑡   (3)
• The parameter 𝑎 is peoples' marginal propensity to consume out of income - equation (1) asserts that people consume a fraction 𝑎 ∈ (0, 1) of each additional dollar of income.
• The parameter 𝑏 > 0 is the investment accelerator coefficient - equation (2) asserts that
people invest in physical capital when income is increasing and disinvest when it is de-
creasing.
Equations (1), (2), and (3) imply the following second-order linear difference equation for na-
tional income:
𝑌𝑡 = (𝑎 + 𝑏)𝑌𝑡−1 − 𝑏𝑌𝑡−2 + (𝛾 + 𝐺𝑡 )
or
𝑌𝑡 = 𝜌1𝑌𝑡−1 + 𝜌2𝑌𝑡−2 + (𝛾 + 𝐺𝑡)
where 𝜌1 = 𝑎 + 𝑏 and 𝜌2 = −𝑏.
𝑌−1 = 𝑌̄−1,   𝑌−2 = 𝑌̄−2
We’ll ordinarily set the parameters (𝑎, 𝑏) so that starting from an arbitrary pair of initial con-
̄ , 𝑌−2
ditions (𝑌−1 ̄ ), national income 𝑌 _𝑡 converges to a constant value as 𝑡 becomes large.
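A short simulation illustrates this convergence; the parameter values below are illustrative, not Samuelson's:

```python
a, b = 0.7, 0.4                      # illustrative multiplier and accelerator
γ, G = 10.0, 10.0
ρ1, ρ2 = a + b, -b

Y = [100.0, 100.0]                   # arbitrary initial conditions Ȳ_{-2}, Ȳ_{-1}
for t in range(80):
    Y.append(ρ1 * Y[-1] + ρ2 * Y[-2] + γ + G)

# the steady state of Y = (a+b)Y - bY + γ + G is (γ + G)/(1 - a)
Y_star = (γ + G) / (1 - a)
assert abs(Y[-1] - Y_star) < 1e-6
```

With these values the roots of the characteristic polynomial are complex with modulus √𝑏 < 1, so the path oscillates while damping toward the steady state.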
𝑌𝑡 = 𝜌1 𝑌𝑡−1 + 𝜌2 𝑌𝑡−2
or
To discover the properties of the solution of (6), it is useful first to form the characteristic
polynomial for (6):
𝑧² − 𝜌1𝑧 − 𝜌2   (7)
𝑧² − 𝜌1𝑧 − 𝜌2 = (𝑧 − 𝜆1)(𝑧 − 𝜆2) = 0   (8)
𝜆1 = 𝑟e^{𝑖𝜔},   𝜆2 = 𝑟e^{−𝑖𝜔}
where 𝑟 is the amplitude of the complex number and 𝜔 is its angle or phase.
These can also be represented as
𝜆1 = 𝑟(cos(𝜔) + 𝑖 sin(𝜔))
𝜆2 = 𝑟(cos(𝜔) − 𝑖 sin(𝜔))
𝑌𝑡 = 𝑐1𝜆1^𝑡 + 𝑐2𝜆2^𝑡
where 𝑐1 and 𝑐2 are constants that depend on the two initial conditions and on 𝜌1 , 𝜌2 .
When the roots are complex, it is useful to pursue the following calculations.
Notice that
𝑌𝑡 = 𝑐1(𝑟e^{𝑖𝜔})^𝑡 + 𝑐2(𝑟e^{−𝑖𝜔})^𝑡
  = 𝑐1𝑟^𝑡e^{𝑖𝜔𝑡} + 𝑐2𝑟^𝑡e^{−𝑖𝜔𝑡}
  = 𝑐1𝑟^𝑡[cos(𝜔𝑡) + 𝑖 sin(𝜔𝑡)] + 𝑐2𝑟^𝑡[cos(𝜔𝑡) − 𝑖 sin(𝜔𝑡)]
  = (𝑐1 + 𝑐2)𝑟^𝑡 cos(𝜔𝑡) + 𝑖(𝑐1 − 𝑐2)𝑟^𝑡 sin(𝜔𝑡)
The only way that 𝑌𝑡 can be a real number for each 𝑡 is if 𝑐1 + 𝑐2 is a real number and 𝑐1 − 𝑐2
is an imaginary number.
This happens only when 𝑐1 and 𝑐2 are complex conjugates, in which case they can be written
in the polar forms
𝑐1 = 𝑣𝑒𝑖𝜃 , 𝑐2 = 𝑣𝑒−𝑖𝜃
So we can write

𝑌𝑡 = 𝑣𝑒𝑖𝜃 (𝑟𝑒𝑖𝜔 )𝑡 + 𝑣𝑒−𝑖𝜃 (𝑟𝑒−𝑖𝜔 )𝑡 = 2𝑣𝑟𝑡 cos(𝜔𝑡 + 𝜃)
where 𝑣 and 𝜃 are constants that must be chosen to satisfy initial conditions for 𝑌−1 , 𝑌−2 .
This formula shows that when the roots are complex, 𝑌𝑡 displays oscillations with period 𝑝̌ = 2𝜋/𝜔 and damping factor 𝑟.
We say that 𝑝̌ is the period because in that amount of time the cosine wave cos(𝜔𝑡 + 𝜃) goes through exactly one complete cycle.
(Draw a cosine function to convince yourself of this please)
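To make these formulas concrete, the sketch below (not part of the original lecture) recovers the damping factor and the period from a (𝜌1 , 𝜌2 ) pair with complex roots; the values used are the ones that appear later in this lecture:

```python
import numpy as np

ρ1, ρ2 = 1.5371322893124, -0.9025    # values used later in this lecture

# Roots of the characteristic polynomial z² - ρ1 z - ρ2
λ1, λ2 = np.roots([1, -ρ1, -ρ2])

r = abs(λ1)                  # damping factor (modulus)
ω = abs(np.angle(λ1))        # frequency (phase)
period = 2 * np.pi / ω

print(r, period)             # ≈ 0.95, ≈ 10
```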
212 CHAPTER 13. APPLICATION: THE SAMUELSON MULTIPLIER-ACCELERATOR
Remark: Following [93], we want to choose the parameters 𝑎, 𝑏 of the model so that the absolute values of the (possibly complex) roots 𝜆1 , 𝜆2 of the characteristic polynomial are both strictly less than one.
Remark: When both roots 𝜆1 , 𝜆2 of the characteristic polynomial have absolute values strictly less than one, the absolute value of the larger one governs the rate of convergence to the steady state of the nonstochastic version of the model.
We can use a LinearStateSpace instance to do various things that we did above with our
homemade function and class.
Among other things, we show by example that the eigenvalues of the matrix 𝐴 that we use to
form the instance of the LinearStateSpace class for the Samuelson model equal the roots of
the characteristic polynomial (7) for the Samuelson multiplier-accelerator model.
Here is the formula for the matrix 𝐴 in the linear state space system in the case that government expenditures are a constant 𝐺:
      ⎡   1     0    0 ⎤
𝐴 =   ⎢ 𝛾 + 𝐺   𝜌1   𝜌2 ⎥
      ⎣   0     1    0 ⎦
13.4 Implementation
# Set axis
xmin, ymin = -3, -2
xmax, ymax = -xmin, -ymin
plt.axis([xmin, xmax, ymin, ymax])
return fig
param_plot()
plt.show()
The graph portrays regions in which the (𝜆1 , 𝜆2 ) root pairs implied by the (𝜌1 = (𝑎 + 𝑏), 𝜌2 =
−𝑏) difference equation parameter pairs in the Samuelson model are such that:
• (𝜆1 , 𝜆2 ) are complex with modulus less than 1 - in this case, the {𝑌𝑡 } sequence displays
damped oscillations.
• (𝜆1 , 𝜆2 ) are both real, but one is strictly greater than 1 - this leads to explosive growth.
• (𝜆1 , 𝜆2 ) are both real, but one is strictly less than −1 - this leads to explosive oscilla-
tions.
• (𝜆1 , 𝜆2 ) are both real and both are less than 1 in absolute value - in this case, there is
smooth convergence to the steady state without damped cycles.
Later we’ll present the graph with a red mark showing the particular point implied by the
setting of (𝑎, 𝑏).
def categorize_solution(ρ1, ρ2):
    """This function takes values of ρ1 and ρ2 and uses them
    to classify the type of solution
    """
    discriminant = ρ1 ** 2 + 4 * ρ2
    if ρ2 > 1 + ρ1 or ρ2 < -1:
        print('Explosive oscillations')
    elif ρ1 + ρ2 > 1:
        print('Explosive growth')
    elif discriminant < 0:
        print('Roots are complex with modulus less than one; \
therefore damped oscillations')
    else:
        print('Roots are real and absolute values are less than one; \
therefore get smooth convergence to a steady state')
categorize_solution(1.3, -.4)
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
def plot_y(function=None):
    """Plot the path of Y_t"""
    plt.subplots(figsize=(10, 6))
    plt.plot(function)
    plt.xlabel('Time $t$')
    plt.ylabel('$Y_t$', rotation=0)
    plt.grid()
    plt.show()
The following function calculates roots of the characteristic polynomial using high school algebra.
(We’ll calculate the roots in other ways later)
The function also plots a 𝑌𝑡 starting from initial conditions that we set
def y_nonstochastic(y_0=100, y_1=80, α=0.92, β=0.5, γ=10, n=80):
    """Computes the roots of the characteristic polynomial and
    simulates a path of length n for the nonstochastic version
    of the Samuelson model
    """
    roots = []

    ρ1 = α + β
    ρ2 = -β

    print(f'ρ_1 is {ρ1}')
    print(f'ρ_2 is {ρ2}')

    discriminant = ρ1 ** 2 + 4 * ρ2

    if discriminant == 0:
        roots.append(ρ1 / 2)
        print('Single real root: ')
        print(''.join(str(roots)))
    elif discriminant > 0:
        roots.append((ρ1 - sqrt(discriminant).real) / 2)
        roots.append((ρ1 + sqrt(discriminant).real) / 2)
        print('Two real roots: ')
        print(''.join(str(roots)))
    else:
        roots.append((ρ1 - sqrt(discriminant)) / 2)
        roots.append((ρ1 + sqrt(discriminant)) / 2)
        print('Two complex roots: ')
        print(''.join(str(roots)))

    if all(abs(root) < 1 for root in roots):
        print('Absolute values of roots are less than one')
    else:
        print('Absolute values of roots are not less than one')

    # Simulate the path of output
    y_t = [y_0, y_1]
    for t in range(2, n):
        y_t.append(ρ1 * y_t[t-1] + ρ2 * y_t[t-2] + γ)
    return y_t

plot_y(y_nonstochastic())
plot_y(y_nonstochastic())
ρ_1 is 1.42
ρ_2 is 0.5
Two real roots:
[0.6459687576256715, 0.7740312423743284]
Absolute values of roots are less than one
The next cell writes code that takes as inputs the modulus 𝑟 and phase 𝜙 of a conjugate pair
of complex numbers in polar form
𝜆1 = 𝑟 exp(𝑖𝜙), 𝜆2 = 𝑟 exp(−𝑖𝜙)
• The code assumes that these two complex numbers are the roots of the characteristic
polynomial
• It then reverse-engineers (𝑎, 𝑏) and (𝜌1 , 𝜌2 ), pairs that would generate those roots
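The reverse-engineering code itself is not reproduced here; a sketch consistent with the outputs below (the function name is ours) uses the root relations 𝜆1 + 𝜆2 = 𝜌1 = 𝑎 + 𝑏 and 𝜆1 𝜆2 = −𝜌2 = 𝑏:

```python
import cmath
import math

def reverse_engineer(r, ϕ):
    """
    Map conjugate roots λ = r e^{±iϕ} of z² - ρ1 z - ρ2 back to (a, b),
    using λ1 + λ2 = ρ1 = a + b and λ1 λ2 = -ρ2 = b.
    """
    λ1 = r * cmath.exp(1j * ϕ)
    λ2 = r * cmath.exp(-1j * ϕ)
    b = λ1 * λ2               # = r², the accelerator coefficient
    a = λ1 + λ2 - b           # = 2r cos(ϕ) - r²
    return a, b

a, b = reverse_engineer(0.95, 2 * math.pi / 10)
print(a, b)    # ≈ (0.6346+0j) (0.9025+0j)
```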
r = .95
period = 10 # Length of cycle in units of time
ϕ = 2 * math.pi/period
a, b = (0.6346322893124001+0j), (0.9024999999999999+0j)
ρ1, ρ2 = (1.5371322893124+0j), (-0.9024999999999999+0j)
ρ1 = ρ1.real
ρ2 = ρ2.real
ρ1, ρ2
Here we’ll use numpy to compute the roots of the characteristic polynomial
p1 = cmath.polar(r1)
p2 = cmath.polar(r2)
r, ϕ = 0.95, 0.6283185307179586
p1, p2 = (0.95, 0.6283185307179586), (0.95, -0.6283185307179586)
a, b = (0.6346322893124001+0j), (0.9024999999999999+0j)
ρ1, ρ2 = 1.5371322893124, -0.9024999999999999
# Useful constants
ρ1 = α + β
ρ2 = -β
categorize_solution(ρ1, ρ2)
return y_t
plot_y(y_nonstochastic())
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.85+0.27838822j 0.85-0.27838822j]
Roots are complex
Roots are less than one
a, b = 0.6180339887498949, 1.0
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.80901699+0.58778525j 0.80901699-0.58778525j]
Roots are complex
Roots are less than one
We can also use sympy to compute analytic formulas for the roots
In [14]: init_printing()

r1 = Symbol("ρ_1")
r2 = Symbol("ρ_2")
z = Symbol("z")

sympy.solve(z**2 - r1*z - r2, z)

Out[14]:

[𝜌1 /2 − √(𝜌1 ² + 4𝜌2 )/2, 𝜌1 /2 + √(𝜌1 ² + 4𝜌2 )/2]
In [15]: a = Symbol("α")
b = Symbol("β")
r1 = a + b
r2 = -b

sympy.solve(z**2 - r1*z - r2, z)

Out[15]:

[𝛼/2 + 𝛽/2 − √(𝛼² + 2𝛼𝛽 + 𝛽² − 4𝛽)/2, 𝛼/2 + 𝛽/2 + √(𝛼² + 2𝛼𝛽 + 𝛽² − 4𝛽)/2]
13.5 Stochastic Shocks

Now we’ll construct some code to simulate the stochastic version of the model that emerges when we add a random shock process to aggregate demand
# Useful constants
ρ1 = α + β
ρ2 = -β
# Categorize solution
categorize_solution(ρ1, ρ2)
# Generate shocks
ϵ = np.random.normal(0, 1, n)
return y_t
plot_y(y_stochastic())
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
Let’s do a simulation in which there are shocks and the characteristic polynomial has complex
roots
In [17]: r = .97
a, b = 0.6285929690873979, 0.9409000000000001
Roots are complex with modulus less than one; therefore damped oscillations
[0.78474648+0.57015169j 0.78474648-0.57015169j]
Roots are complex
Roots are less than one
# Useful constants
ρ1 = α + β
ρ2 = -β
# Categorize solution
categorize_solution(ρ1, ρ2)
else:
print('Roots are real')
# Generate shocks
ϵ = np.random.normal(0, 1, n)
# Stochastic
else:
ϵ = np.random.normal(0, 1, n)
return ρ1 * x[t-1] + ρ2 * x[t-2] + γ + g + σ * ϵ[t]
# No government spending
if g == 0:
y_t.append(transition(y_t, t))
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
We can also see the response to a one time jump in government expenditures
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
13.7 Wrapping Everything Into a Class
Parameters
y_0 : scalar
Initial condition for Y_0
y_1 : scalar
Initial condition for Y_1
α : scalar
Marginal propensity to consume
β : scalar
Accelerator coefficient
n : int
Number of iterations
σ : scalar
Volatility parameter. It must be greater than or equal to 0. Set
equal to 0 for a nonstochastic model.
g : scalar
Government spending shock
g_t : int
Time at which government spending shock occurs. Must be specified
when duration != None.
duration : {None, 'permanent', 'oneoff'}
Specifies type of government spending shock. If None, government
spending is equal to g for all t.
"""
def __init__(self,
y_0=100,
y_1=50,
α=1.3,
β=0.2,
γ=10,
n=100,
σ=0,
g=0,
g_t=0,
duration=None):
def root_type(self):
if all(isinstance(root, complex) for root in self.roots):
return 'Complex conjugate'
elif len(self.roots) > 1:
return 'Double real'
else:
return 'Single real'
def root_less_than_one(self):
if all(abs(root) < 1 for root in self.roots):
return True
def solution_type(self):
ρ1, ρ2 = self.ρ1, self.ρ2
discriminant = ρ1 ** 2 + 4 * ρ2
if ρ2 >= 1 + ρ1 or ρ2 <= -1:
return 'Explosive oscillations'
elif ρ1 + ρ2 >= 1:
return 'Explosive growth'
elif discriminant < 0:
return 'Damped oscillations'
else:
return 'Steady state'
# Stochastic
else:
ϵ = np.random.normal(0, 1, self.n)
return self.ρ1 * x[t-1] + self.ρ2 * x[t-2] + self.γ + g \
    + self.σ * ϵ[t]
def generate_series(self):
# No government spending
if self.g == 0:
y_t.append(self._transition(y_t, t))
def summary(self):
print('Summary\n' + '-' * 50)
print(f'Root type: {self.root_type()}')
print(f'Solution type: {self.solution_type()}')
print(f'Roots: {str(self.roots)}')
if self.root_less_than_one() == True:
print('Absolute value of roots is less than one')
else:
print('Absolute value of roots is not less than one')
if self.σ > 0:
print('Stochastic series with σ = ' + str(self.σ))
else:
print('Nonstochastic series')
if self.g != 0:
print('Government spending equal to ' + str(self.g))
if self.duration != None:
print(self.duration.capitalize() +
' government spending shock at t = ' + str(self.g_t))
def plot(self):
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(self.generate_series())
ax.set(xlabel='Iteration', xlim=(0, self.n))
ax.set_ylabel('$Y_t$', rotation=0)
ax.grid()
return fig
def param_plot(self):
fig = param_plot()
ax = fig.gca()
plt.legend(fontsize=12, loc=3)
return fig
Summary
Root type: Complex conjugate
Solution type: Damped oscillations
Roots: [0.65+0.27838822j 0.65-0.27838822j]
Absolute value of roots is less than one
Stochastic series with σ = 2
Government spending equal to 10
Permanent government spending shock at t = 20
In [23]: sam.plot()
plt.show()
We’ll use our graph to show where the roots lie and how their location is consistent with the
behavior of the path just graphed.
The red + sign shows the location of the roots
In [24]: sam.param_plot()
plt.show()
13.8 Using the LinearStateSpace Class

It turns out that we can use the QuantEcon.py LinearStateSpace class to do much of the work that we have done from scratch above.
Here is how we map the Samuelson model into an instance of a LinearStateSpace class
A = [[1, 0, 0],
[γ + g, ρ1, ρ2],
[0, 1, 0]]
x, y = sam_t.simulate(ts_length=n)
axes[1].set_xlabel('Iteration')
plt.show()
Let’s plot impulse response functions for the instance of the Samuelson model using a
method in the LinearStateSpace class
Out[26]: (2, 6, 1)
(2, 6, 1)
Now let’s compute the zeros of the characteristic polynomial by simply calculating the eigenvalues of 𝐴
In [27]: A = np.asarray(A)
w, v = np.linalg.eig(A)
print(w)
We could also create a subclass of LinearStateSpace (inheriting all its methods and attributes) to add more functions to use
"""
This subclass creates a Samuelson multiplier-accelerator model
as a linear state space system.
"""
def __init__(self,
y_0=100,
y_1=100,
α=0.8,
β=0.9,
γ=10,
σ=1,
g=10):
self.α, self.β = α, β
self.y_0, self.y_1, self.g = y_0, y_1, g
self.γ, self.σ = γ, σ
self.ρ1 = α + β
self.ρ2 = -β
except ValueError:
print('Stationary distribution does not exist')
x, y = self.simulate(ts_length)
axes[1].set_xlabel('Iteration')
return fig
x, y = self.impulse_response(j)
return fig
13.8.3 Illustrations
In [32]: samlss.plot_irf(100)
plt.show()
In [33]: samlss.multipliers()
Let’s shut down the accelerator by setting 𝑏 = 0 to get a pure multiplier model
• the absence of cycles gives an idea about why Samuelson included the accelerator
In [35]: pure_multiplier.plot_simulation()
Out[35]:
In [37]: pure_multiplier.plot_simulation()
Out[37]:
In [38]: pure_multiplier.plot_irf(100)
Out[38]:
13.10 Summary
In this lecture, we wrote functions and classes to represent non-stochastic and stochastic versions of the Samuelson (1939) multiplier-accelerator model, described in [93].
We saw that different parameter values led to different output paths, which could either be
stationary, explosive, or oscillating.
We also were able to represent the model using the QuantEcon.py LinearStateSpace class.
Chapter 14
Kesten Processes and Firm Dynamics
14.1 Contents
• Overview 14.2
• Kesten Processes 14.3
• Heavy Tails 14.4
• Application: Firm Dynamics 14.5
• Exercises 14.6
• Solutions 14.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
14.2 Overview
242 CHAPTER 14. KESTEN PROCESSES AND FIRM DYNAMICS
import quantecon as qe
The following two lines are only added to avoid a FutureWarning caused by compatibility issues between pandas and matplotlib.
Additional technical background related to this lecture can be found in the monograph of
[17].
The GARCH model is common in financial applications, where time series such as asset returns exhibit time varying volatility.
For example, consider the following plot of daily returns on the Nasdaq Composite Index for
the period 1st January 2006 to 1st November 2019.
r = s.pct_change()
fig, ax = plt.subplots()
ax.plot(r, alpha=0.7)
ax.set_ylabel('returns', fontsize=12)
ax.set_xlabel('date', fontsize=12)
plt.show()
[*********************100%***********************] 1 of 1 completed
Notice how the series exhibits bursts of volatility (high variance) and then settles down again.
GARCH models can replicate this feature.
The GARCH(1, 1) volatility process takes the form

𝜎𝑡+1² = 𝛼0 + 𝜎𝑡² (𝛼1 𝜉𝑡+1² + 𝛽) (2)

where {𝜉𝑡 } is IID with 𝔼𝜉𝑡² = 1 and all parameters are positive.
Returns on a given asset are then modeled as
𝑟𝑡 = 𝜎𝑡 𝜁𝑡 (3)
Suppose that a given household saves a fixed fraction 𝑠 of its current wealth in every period.

The household earns labor income 𝑦𝑡 at the start of time 𝑡.

Wealth then evolves according to

𝑤𝑡+1 = 𝑅𝑡+1 𝑠𝑤𝑡 + 𝑦𝑡+1 (4)

where 𝑅𝑡 ∶= 1 + 𝑟𝑡 is the gross rate of return on assets.
14.3.3 Stationarity
In earlier lectures, such as the one on AR(1) processes, we introduced the notion of a stationary distribution.
In the present context, we can define a stationary distribution as follows:
The distribution 𝐹 ∗ on ℝ is called stationary for the Kesten process (1) if

𝑋𝑡 ∼ 𝐹 ∗ ⟹ 𝑎𝑡+1 𝑋𝑡 + 𝜂𝑡+1 ∼ 𝐹 ∗ (5)

In other words, if the current state 𝑋𝑡 has distribution 𝐹 ∗ , then so does the next period state 𝑋𝑡+1 .

We can write this alternatively as

∫ ℙ{𝑎𝑡+1 𝑥 + 𝜂𝑡+1 ≤ 𝑦} 𝐹 ∗ (d𝑥) = 𝐹 ∗ (𝑦) for all 𝑦 ≥ 0 (6)

The left hand side is the distribution of the next period state when the current state is drawn from 𝐹 ∗ .

The equality in (6) states that this distribution is unchanged.
By the definition of stationarity and the assumption that 𝐹 ∗ is stationary for the wealth process, this is just 𝐹 ∗ (𝑦).

Hence the fraction of households with wealth in [0, 𝑦] is the same next period as it is this period.
Since 𝑦 was chosen arbitrarily, the distribution is unchanged.
The Kesten process 𝑋𝑡+1 = 𝑎𝑡+1 𝑋𝑡 + 𝜂𝑡+1 does not always have a stationary distribution.
For example, if 𝑎𝑡 ≡ 𝜂𝑡 ≡ 1 for all 𝑡, then 𝑋𝑡 = 𝑋0 + 𝑡, which diverges to infinity.
To prevent this kind of divergence, we require that {𝑎𝑡 } is strictly less than 1 most of the
time.
In particular, if 𝔼 ln 𝑎𝑡 < 0 and 𝔼𝜂𝑡 < ∞, then a unique stationary distribution exists on ℝ+ .
Under certain conditions, the stationary distribution of a Kesten process has a Pareto tail.
(See our earlier lecture on heavy-tailed distributions for background.)
This fact is significant for economics because of the prevalence of Pareto-tailed distributions.
To state the conditions under which the stationary distribution of a Kesten process has a
Pareto tail, we first recall that a random variable is called nonarithmetic if its distribution
is not concentrated on {… , −2𝑡, −𝑡, 0, 𝑡, 2𝑡, …} for any 𝑡 ≥ 0.
For example, any random variable with a density is nonarithmetic.
The famous Kesten–Goldie Theorem (see, e.g., [17], theorem 2.4.4) states that if 𝑎𝑡 is nonarithmetic and there exists a positive constant 𝛼 such that

𝔼𝑎𝑡𝛼 = 1, 𝔼𝜂𝑡𝛼 < ∞, and 𝔼[𝑎𝑡𝛼+1 ] < ∞
then the stationary distribution of the Kesten process has a Pareto tail with tail index 𝛼.
More precisely, if 𝐹 ∗ is the unique stationary distribution and 𝑋 ∗ ∼ 𝐹 ∗ , then

lim𝑥→∞ 𝑥𝛼 ℙ{𝑋 ∗ > 𝑥} = 𝑐

for some positive constant 𝑐.
14.4.2 Intuition
In [5]: μ = -0.5
σ = 1.0

def kesten_ts(ts_length=100):
    x = np.zeros(ts_length)
    for t in range(ts_length-1):
        a = np.exp(μ + σ * np.random.randn())
        b = np.exp(np.random.randn())
        x[t+1] = a * x[t] + b
    return x

fig, ax = plt.subplots()

num_paths = 10
np.random.seed(12)

for i in range(num_paths):
    ax.plot(kesten_ts())

ax.set(xlabel='time', ylabel='$X_t$')
plt.show()
14.5 Application: Firm Dynamics

As noted in our lecture on heavy tails, for common measures of firm size such as revenue or employment, the US firm size distribution exhibits a Pareto tail (see, e.g., [7], [41]).
Let us try to explain this rather striking fact using the Kesten–Goldie Theorem.
It was postulated many years ago by Robert Gibrat [42] that firm size evolves according to a
simple rule whereby size next period is proportional to current size.
This is now known as Gibrat’s law of proportional growth.
We can express this idea by stating that a suitably defined measure 𝑠𝑡 of firm size obeys
𝑠𝑡+1 /𝑠𝑡 = 𝑎𝑡+1 (8)
Subsequent empirical research has documented deviations from Gibrat’s law; in particular, it is typically found that

1. small firms grow faster than large firms (see, e.g., [35] and [46]) and
2. the growth rate of small firms is more volatile than that of large firms [32].
On the other hand, Gibrat’s law is generally found to be a reasonable approximation for large
firms [35].
We can accommodate these empirical findings by modifying (8) to

𝑠𝑡+1 = 𝑎𝑡+1 𝑠𝑡 + 𝑏𝑡+1 (9)
where {𝑎𝑡 } and {𝑏𝑡 } are both IID and independent of each other.
In the exercises you are asked to show that (9) is more consistent with the empirical findings
presented above than Gibrat’s law in (8).
14.6 Exercises
14.6.1 Exercise 1
Simulate and plot 15 years of daily returns (consider each year as having 250 working days)
using the GARCH(1, 1) process in (2)–(3).
Take 𝜉𝑡 and 𝜁𝑡 to be independent and standard normal.
Set 𝛼0 = 0.00001, 𝛼1 = 0.1, 𝛽 = 0.9 and 𝜎0 = 0.
Compare visually with the Nasdaq Composite Index returns shown above.
While the time path differs, you should see bursts of high volatility.
14.6.2 Exercise 2
In our discussion of firm dynamics, it was claimed that (9) is more consistent with the empirical literature than Gibrat’s law in (8).
(The empirical literature was reviewed immediately above (9).)
In what sense is this true (or false)?
14.6.3 Exercise 3
14.6.4 Exercise 4
One unrealistic aspect of the firm dynamics specified in (9) is that it ignores entry and exit.
In any given period and in any given market, we observe significant numbers of firms entering
and exiting the market.
Empirical discussion of this can be found in a famous paper by Hugo Hopenhayn [57].
In the same paper, Hopenhayn builds a model of entry and exit that incorporates profit maximization by firms and market clearing quantities, wages and prices.
In his model, a stationary equilibrium occurs when the number of entrants equals the number
of exiting firms.
In this setting, firm dynamics can be expressed as
𝑠𝑡+1 = 𝑒𝑡+1 𝟙{𝑠𝑡 < 𝑠}̄ + (𝑎𝑡+1 𝑠𝑡 + 𝑏𝑡+1 )𝟙{𝑠𝑡 ≥ 𝑠}̄ (10)
Here
• the state variable 𝑠𝑡 represents productivity (which is a proxy for output and hence firm size),
• the IID sequence {𝑒𝑡 } is thought of as a productivity draw for a new entrant and
• the variable 𝑠 ̄ is a threshold value that we take as given, although it is determined endogenously in Hopenhayn’s model.
The idea behind (10) is that firms stay in the market as long as their productivity 𝑠𝑡 remains
at or above 𝑠.̄
• In this case, their productivity updates according to (9).
14.7 Solutions
14.7.1 Exercise 1
years = 15
days = years * 250

# Parameters from the exercise statement
α_0 = 1e-5
α_1 = 0.1
β = 0.9

def garch_ts(ts_length=days):
    σ2 = 0
    r = np.zeros(ts_length)
    for t in range(ts_length-1):
        ξ = np.random.randn()
        σ2 = α_0 + σ2 * (α_1 * ξ**2 + β)
        r[t] = np.sqrt(σ2) * np.random.randn()
    return r
fig, ax = plt.subplots()
np.random.seed(12)
ax.plot(garch_ts(), alpha=0.7)
ax.set(xlabel='time', ylabel='$\\sigma_t^2$')
plt.show()
14.7.2 Exercise 2

The empirical findings cited in the lecture are that

1. small firms grow faster than large firms and
2. the growth rate of small firms is more volatile than that of large firms.

Also, Gibrat’s law is generally found to be a better approximation for large firms than for small firms.
The claim is that the dynamics in (9) are more consistent with points 1-2 than Gibrat’s law.
To see why, we rewrite (9) in terms of growth dynamics:
𝑠𝑡+1 /𝑠𝑡 = 𝑎𝑡+1 + 𝑏𝑡+1 /𝑠𝑡 (11)

Taking 𝑠𝑡 = 𝑠 as given, the conditional mean and variance of the growth rate are

𝔼𝑎 + 𝔼𝑏/𝑠 and 𝕍𝑎 + 𝕍𝑏/𝑠²
Both of these decline with firm size 𝑠, consistent with the data.
Moreover, the law of motion (11) clearly approaches Gibrat’s law (8) as 𝑠𝑡 gets large.
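A quick numerical illustration of these two points, with hypothetical parameter values for the lognormal shocks in (9):

```python
import numpy as np

# Hypothetical parameter values for the shocks a_t and b_t
np.random.seed(0)
μ_a, σ_a, μ_b, σ_b = -0.5, 0.1, 0.0, 0.5
a = np.exp(μ_a + σ_a * np.random.randn(100_000))
b = np.exp(μ_b + σ_b * np.random.randn(100_000))

for s in (1.0, 100.0):
    growth = a + b / s      # growth rate s_{t+1} / s_t at current size s
    print(f"size {s}: mean {growth.mean():.3f}, volatility {growth.std():.3f}")
```

Both the mean and the volatility of growth are higher at the small firm size, in line with the empirical findings.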
14.7.3 Exercise 3
Since 𝑎𝑡 = 𝑒𝜇+𝜎𝑍 for some standard normal 𝑍, we have 𝔼 ln 𝑎𝑡 = 𝔼(𝜇 + 𝜎𝑍) = 𝜇,
and since 𝜂𝑡 has finite moments of all orders, the stationarity condition holds if and only if
𝜇 < 0.
Given the properties of the lognormal distribution (which has finite moments of all orders), the only other condition in doubt is existence of a positive constant 𝛼 such that 𝔼𝑎𝑡𝛼 = 1.

This is equivalent to the statement

exp (𝛼𝜇 + 𝛼²𝜎²/2) = 1

Solving for the positive root gives 𝛼 = −2𝜇/𝜎².
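This can be checked numerically; the sketch below (with hypothetical 𝜇 and 𝜎) verifies both the closed-form root 𝛼 = −2𝜇/𝜎² and the condition 𝔼𝑎𝑡𝛼 = 1 by Monte Carlo:

```python
import numpy as np

μ, σ = -0.5, 1.0                 # hypothetical lognormal parameters
α = -2 * μ / σ**2                # positive root of exp(αμ + α²σ²/2) = 1

# Analytic check of the condition E[a_t^α] = 1
assert abs(np.exp(α * μ + α**2 * σ**2 / 2) - 1) < 1e-12

# Monte Carlo check
np.random.seed(42)
a = np.exp(μ + σ * np.random.randn(1_000_000))
print(α, (a**α).mean())          # α = 1.0, sample mean close to 1
```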
14.7.4 Exercise 4
@njit(parallel=True)
def generate_draws(μ_a=-0.5,
                   σ_a=0.1,
                   μ_b=0.0,
                   σ_b=0.5,
                   μ_e=0.0,
                   σ_e=0.5,
                   s_bar=1.0,
                   T=500,
                   M=1_000_000,
                   s_init=1.0):

    draws = np.empty(M)
    for m in prange(M):
        s = s_init
        for t in range(T):
            # Firms below the threshold exit and are replaced by a
            # new entrant's productivity draw
            if s < s_bar:
                new_s = np.exp(μ_e + σ_e * randn())
            else:
                a = np.exp(μ_a + σ_a * randn())
                b = np.exp(μ_b + σ_b * randn())
                new_s = a * s + b
            s = new_s
        draws[m] = s
    return draws

data = generate_draws()
plt.show()
Chapter 15

Wealth Distribution Dynamics
15.1 Contents
• Overview 15.2
• Lorenz Curves and the Gini Coefficient 15.3
• A Model of Wealth Dynamics 15.4
• Implementation 15.5
• Applications 15.6
• Exercises 15.7
• Solutions 15.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
15.2 Overview
The evolution of wealth for any given household depends on their savings behavior.
256 CHAPTER 15. WEALTH DISTRIBUTION DYNAMICS
Modeling such behavior will form an important part of this lecture series.
However, in this particular lecture, we will be content with rather ad hoc (but plausible) savings rules.
We do this to more easily explore the implications of different specifications of income dynamics and investment returns.
At the same time, all of the techniques discussed here can be plugged into models that use
optimization to obtain savings rules.
We will use the following imports.
import quantecon as qe
from numba import njit, jitclass, float64, prange
fig, ax = plt.subplots()
ax.plot(f_vals, l_vals, label='Lorenz curve, lognormal sample')
ax.plot(f_vals, f_vals, label='Lorenz curve, equality')
ax.legend()
plt.show()
This curve can be understood as follows: if point (𝑥, 𝑦) lies on the curve, it means that, collectively, the bottom (100𝑥)% of the population holds (100𝑦)% of the wealth.
The “equality” line is the 45 degree line (which might not be exactly 45 degrees in the figure,
depending on the aspect ratio).
A sample that produces this line exhibits perfect equality.
The other line in the figure is the Lorenz curve for the lognormal sample, which deviates significantly from perfect equality.
For example, the bottom 80% of the population holds around 40% of total wealth.
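For reference, the Lorenz points themselves are straightforward to compute by hand; the following sketch (the lecture relies on a library routine, so this helper is ours) shows the construction:

```python
import numpy as np

def lorenz_points(sample):
    """
    Return cumulative population shares and the matching cumulative
    wealth shares for a positive sample.
    """
    s = np.sort(sample)
    f_vals = np.arange(1, len(s) + 1) / len(s)
    l_vals = np.cumsum(s) / s.sum()
    return f_vals, l_vals

# A perfectly equal sample lies exactly on the 45 degree line
f, l = lorenz_points(np.ones(100))
print(np.allclose(f, l))   # True
```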
Here is another example, which shows how the Lorenz curve shifts as the underlying distribution changes.
We generate 10,000 observations using the Pareto distribution with a range of parameters,
and then compute the Lorenz curve corresponding to each set of observations.
You can see that, as the tail parameter of the Pareto distribution increases, inequality decreases.
This is to be expected, because a higher tail index implies less weight in the tail of the Pareto
distribution.
The definition and interpretation of the Gini coefficient can be found on the corresponding
Wikipedia page.
A value of 0 indicates perfect equality (corresponding to the case where the Lorenz curve matches the 45 degree line) and a value of 1 indicates complete inequality (all wealth held by the richest household).
The QuantEcon.py library contains a function to calculate the Gini coefficient.
We can test it on the Weibull distribution with parameter 𝑎, where the Gini coefficient is
known to be
𝐺 = 1 − 2−1/𝑎
Let’s see if the Gini coefficient computed from a simulated sample matches this at each fixed
value of 𝑎.
a_vals = np.linspace(1, 3, 25)   # Weibull parameter values (illustrative)
n = 10_000                       # sample size (illustrative)

ginis = []
ginis_theoretical = []

fig, ax = plt.subplots()

for a in a_vals:
    y = np.random.weibull(a, size=n)
    ginis.append(qe.gini_coefficient(y))
    ginis_theoretical.append(1 - 2**(-1/a))

ax.plot(a_vals, ginis, label='estimated gini coefficient')
ax.plot(a_vals, ginis_theoretical, label='theoretical gini coefficient')
ax.legend()
ax.set_xlabel("Weibull parameter $a$")
ax.set_ylabel("Gini coefficient")
plt.show()
15.4 A Model of Wealth Dynamics

The wealth dynamics we study take the form

𝑤𝑡+1 = (1 + 𝑟𝑡+1 )𝑠(𝑤𝑡 ) + 𝑦𝑡+1 (1)

where
• 𝑤𝑡 is wealth at time 𝑡 for a given household,
• 𝑟𝑡 is the rate of return of financial assets,
• 𝑦𝑡 is current non-financial (e.g., labor) income and
• 𝑠(𝑤𝑡 ) is current wealth net of consumption
Letting {𝑧𝑡 } be a correlated state process of the form

𝑧𝑡+1 = 𝑎𝑧𝑡 + 𝑏 + 𝜎𝑧 𝜖𝑡+1

we’ll assume that
𝑅𝑡 ∶= 1 + 𝑟𝑡 = 𝑐𝑟 exp(𝑧𝑡 ) + exp(𝜇𝑟 + 𝜎𝑟 𝜉𝑡 )
and
𝑦𝑡 = 𝑐𝑦 exp(𝑧𝑡 ) + exp(𝜇𝑦 + 𝜎𝑦 𝜁𝑡 )
𝑠(𝑤) = 𝑠0 𝑤 ⋅ 𝟙{𝑤 ≥ 𝑤̂} (2)
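In code, the savings rule (2) is a one-liner; the sketch below uses the default parameter values that appear in the class defined shortly:

```python
def s(w, s_0=0.75, w_hat=1.0):
    """Savings rule (2): save fraction s_0 of wealth once wealth
    reaches the threshold w_hat, and nothing below it."""
    return s_0 * w if w >= w_hat else 0.0

print(s(0.5), s(2.0))   # 0.0 1.5
```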
15.5 Implementation
In [7]: wealth_dynamics_data = [
('w_hat', float64), # savings parameter
('s_0', float64), # savings parameter
('c_y', float64), # labor income parameter
('μ_y', float64), # labor income parameter
('σ_y', float64), # labor income parameter
('c_r', float64), # rate of return parameter
('μ_r', float64), # rate of return parameter
('σ_r', float64), # rate of return parameter
('a', float64), # aggregate shock parameter
('b', float64), # aggregate shock parameter
('σ_z', float64), # aggregate shock parameter
('z_mean', float64), # mean of z process
('z_var', float64), # variance of z process
('y_mean', float64), # mean of y process
('R_mean', float64) # mean of R process
]
Here’s a class that stores instance data and implements methods that update the aggregate
state and household wealth.
In [8]: @jitclass(wealth_dynamics_data)
class WealthDynamics:
def __init__(self,
w_hat=1.0,
s_0=0.75,
c_y=1.0,
μ_y=1.0,
σ_y=0.2,
c_r=0.05,
μ_r=0.1,
σ_r=0.5,
a=0.5,
b=0.0,
σ_z=0.1):
def parameters(self):
"""
Collect and return parameters.
"""
parameters = (self.w_hat, self.s_0,
self.c_y, self.μ_y, self.σ_y,
self.c_r, self.μ_r, self.σ_r,
self.a, self.b, self.σ_z)
return parameters
    def update_states(self, w, z):
        """
        Update one period, given current wealth w and persistent
        state z.
        """
        # Simplify names
        params = self.parameters()
        w_hat, s_0, c_y, μ_y, σ_y, c_r, μ_r, σ_r, a, b, σ_z = params
        zp = a * z + b + σ_z * np.random.randn()

        # Update wealth
        y = c_y * np.exp(zp) + np.exp(μ_y + σ_y * np.random.randn())
        wp = y
        if w >= w_hat:
            R = c_r * np.exp(zp) + np.exp(μ_r + σ_r * np.random.randn())
            wp += R * s_0 * w
        return wp, zp
Here’s a function to simulate a time series of wealth for individual households.
In [9]: @njit
def wealth_time_series(wdy, w_0, n):
"""
Generate a single time series of length n for wealth given
initial value w_0.
The initial persistent state z_0 for each household is drawn from
the stationary distribution of the AR(1) process.
"""
z = wdy.z_mean + np.sqrt(wdy.z_var) * np.random.randn()
w = np.empty(n)
w[0] = w_0
for t in range(n-1):
w[t+1], z = wdy.update_states(w[t], z)
return w
In [10]: @njit(parallel=True)
def update_cross_section(wdy, w_distribution, shift_length=500):
    """
    Shifts a cross-section of households forward in time
    """
    new_distribution = np.empty_like(w_distribution)

    # Update each household
    for i in prange(len(new_distribution)):
        z = wdy.z_mean + np.sqrt(wdy.z_var) * np.random.randn()
        w = w_distribution[i]
        for t in range(shift_length-1):
            w, z = wdy.update_states(w, z)
        new_distribution[i] = w
    return new_distribution
Parallelization is very effective in the function above because the time path of each household
can be calculated independently once the path for the aggregate state is known.
15.6 Applications
Let’s try simulating the model at different parameter values and investigate the implications
for the wealth distribution.
wdy = WealthDynamics()
ts_length = 200
w = wealth_time_series(wdy, wdy.y_mean, ts_length)
fig, ax = plt.subplots()
ax.plot(w)
plt.show()
Now we investigate how the Lorenz curves associated with the wealth distribution change as
return to savings varies.
The code below plots Lorenz curves for three different values of 𝜇𝑟 .
If you are running this yourself, note that it will take one or two minutes to execute.
This is unavoidable because we are executing a CPU intensive task.
In fact the code, which is JIT compiled and parallelized, runs extremely fast relative to the
number of computations.
The Lorenz curve shifts downwards as returns on financial income rise, indicating a rise in
inequality.
We will look at this again via the Gini coefficient immediately below, but first consider the
following image of our system resources when the code above is executing:
Notice how effectively Numba has implemented multithreading for this routine: all 8 CPUs
on our workstation are running at maximum capacity (even though four of them are virtual).
Since the code is both efficiently JIT compiled and fully parallelized, it’s close to impossible
to make this sequence of tasks run faster without changing hardware.
Now let’s check the Gini coefficient.
Once again, we see that inequality increases as returns on financial income rise.
Let’s finish this section by investigating what happens when we change the volatility term 𝜎𝑟
in financial returns.
We see that greater volatility has the effect of increasing inequality in this model.
15.7 Exercises
15.7.1 Exercise 1
For a wealth or income distribution with Pareto tail, a higher tail index suggests lower inequality.

Indeed, it is possible to prove that the Gini coefficient of the Pareto distribution with tail index 𝑎 is 1/(2𝑎 − 1).
To the extent that you can, confirm this by simulation.
In particular, generate a plot of the Gini coefficient against the tail index using both the theoretical value just given and the value computed from a sample via qe.gini_coefficient.
For the values of the tail index, use a_vals = np.linspace(1, 10, 25).
Use a sample of size 1,000 for each 𝑎 and the sampling method for generating Pareto draws employed in the discussion of Lorenz curves for the Pareto distribution.

To the extent that you can, interpret the monotone relationship between the Gini index and 𝑎.
15.7.2 Exercise 2
The Kesten–Goldie theorem tells us that Kesten processes have Pareto tails under a range of
parameterizations.
The theorem does not directly apply here, since savings is not always constant and since the
multiplicative and additive terms in (1) are not IID.
At the same time, given the similarities, perhaps Pareto tails will arise.
To test this, run a simulation that generates a cross-section of wealth and generate a rank-size
plot.
If you like, you can use the function rank_size from the quantecon library (documentation
here).
In viewing the plot, remember that Pareto tails generate a straight line. Is this what you see?
For sample size and initial conditions, use
15.8 Solutions
Here is one solution, which produces a good match between theory and simulation.
15.8.1 Exercise 1
In general, for a Pareto distribution, a higher tail index implies less weight in the right hand
tail.
This means less extreme values for wealth and hence more equality.
More equality translates to a lower Gini index.
15.8.2 Exercise 2
plt.show()
270 CHAPTER 15. WEALTH DISTRIBUTION DYNAMICS
Chapter 16

A First Look at the Kalman Filter
16.1 Contents
• Overview 16.2
• The Basic Idea 16.3
• Convergence 16.4
• Implementation 16.5
• Exercises 16.6
• Solutions 16.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
16.2 Overview
This lecture provides a simple and intuitive introduction to the Kalman filter, for those who
either
• have heard of the Kalman filter but don’t know how it works, or
• know the Kalman filter equations, but don’t know where they come from
For additional (more advanced) reading on the Kalman filter, see
• [72], section 2.7
• [5]
The second reference presents a comprehensive treatment of the Kalman filter.
Required knowledge: Familiarity with matrix manipulations, multivariate normal distribu-
tions, covariance matrices, etc.
We’ll need the following imports:
272 CHAPTER 16. A FIRST LOOK AT THE KALMAN FILTER
The Kalman filter has many applications in economics, but for now let’s pretend that we are
rocket scientists.
A missile has been launched from country Y and our mission is to track it.
Let 𝑥 ∈ ℝ2 denote the current location of the missile—a pair indicating latitude-longitude
coordinates on a map.
At the present moment in time, the precise location 𝑥 is unknown, but we do have some be-
liefs about 𝑥.
One way to summarize our knowledge is a point prediction 𝑥̂
• But what if the President wants to know the probability that the missile is currently
over the Sea of Japan?
• Then it is better to summarize our initial beliefs with a bivariate probability density 𝑝
– ∫𝐸 𝑝(𝑥)𝑑𝑥 indicates the probability that we attach to the missile being in region 𝐸.
The density 𝑝 is called our prior for the random variable 𝑥.
To keep things tractable in our example, we assume that our prior is Gaussian.
In particular, we take
𝑝 = 𝑁 (𝑥,̂ Σ) (1)
where 𝑥̂ is the mean of the distribution and Σ is a 2×2 covariance matrix. In our simulations,
we will suppose that
This density 𝑝(𝑥) is shown below as a contour map, with the center of the red ellipse being
equal to 𝑥.̂
Parameters
x : array_like(float)
Random variable
y : array_like(float)
Random variable
σ_x : array_like(float)
Standard deviation of random variable x
σ_y : array_like(float)
Standard deviation of random variable y
μ_x : scalar(float)
Mean value of random variable x
μ_y : scalar(float)
Mean value of random variable y
σ_xy : array_like(float)
Covariance of random variables x and y
"""
x_μ = x - μ_x
y_μ = y - μ_y
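The body of this plotting helper is not fully reproduced above; a self-contained sketch of the bivariate normal density it evaluates, matching the docstring's parameter names, could look like:

```python
import numpy as np

def bivariate_normal(x, y, σ_x=1.0, σ_y=1.0, μ_x=0.0, μ_y=0.0, σ_xy=0.0):
    # Density of a bivariate normal with means (μ_x, μ_y), standard
    # deviations (σ_x, σ_y) and covariance σ_xy, evaluated at (x, y)
    x_μ = x - μ_x
    y_μ = y - μ_y
    ρ = σ_xy / (σ_x * σ_y)     # correlation coefficient
    z = (x_μ / σ_x)**2 - 2 * ρ * x_μ * y_μ / (σ_x * σ_y) + (y_μ / σ_y)**2
    denom = 2 * np.pi * σ_x * σ_y * np.sqrt(1 - ρ**2)
    return np.exp(-z / (2 * (1 - ρ**2))) / denom
```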
Z = gen_gaussian_plot_vals(x_hat, Σ)
plt.show()
We are now presented with some good news and some bad news.
The good news is that the missile has been located by our sensors, which report that the cur-
rent location is 𝑦 = (2.3, −1.9).
The next figure shows the original prior 𝑝(𝑥) and the new reported location 𝑦
Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")
plt.show()
16.3. THE BASIC IDEA 275
We assume that the measurement 𝑦 is related to the hidden location 𝑥 via

𝑦 = 𝐺𝑥 + 𝑣,   where 𝑣 ∼ 𝑁(0, 𝑅)

Here 𝐺 and 𝑅 are 2 × 2 matrices with 𝑅 positive definite. Both are assumed known, and the noise term 𝑣 is assumed to be independent of 𝑥.
How then should we combine our prior 𝑝(𝑥) = 𝑁 (𝑥,̂ Σ) and this new information 𝑦 to improve
our understanding of the location of the missile?
As you may have guessed, the answer is to use Bayes’ theorem, which tells us to update our
prior 𝑝(𝑥) to 𝑝(𝑥 | 𝑦) via
𝑝(𝑦 | 𝑥) 𝑝(𝑥)
𝑝(𝑥 | 𝑦) =
𝑝(𝑦)
𝑝(𝑥 | 𝑦) = 𝑁 (𝑥𝐹̂ , Σ𝐹 )
where
𝑥𝐹̂ ∶= 𝑥̂ + Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 (𝑦 − 𝐺𝑥)̂ and Σ𝐹 ∶= Σ − Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 𝐺Σ (4)
Here Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 is the matrix of population regression coefficients of the hidden object
𝑥 − 𝑥̂ on the surprise 𝑦 − 𝐺𝑥.̂
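As a quick numerical sketch of the update in (4) — the prior mean and covariance below are illustrative placeholders, while 𝐺, 𝑅 and 𝑦 follow the text:

```python
import numpy as np

# Illustrative prior (placeholder values, not necessarily the text's)
x_hat = np.array([0.2, -0.2])
Σ = np.array([[0.4, 0.3],
              [0.3, 0.45]])

G = np.eye(2)              # G is the identity, as in the figure below
R = 0.5 * Σ                # R = 0.5 Σ, as in the figure below
y = np.array([2.3, -1.9])  # the reported location

# Filtering step from (4)
M = Σ @ G.T @ np.linalg.inv(G @ Σ @ G.T + R)  # population regression coefficients
x_hat_F = x_hat + M @ (y - G @ x_hat)
Σ_F = Σ - M @ G @ Σ
```

With 𝐺 = 𝐼 and 𝑅 = 0.5Σ this collapses to 𝑥̂_F = 𝑥̂ + (2/3)(𝑦 − 𝑥̂) and Σ_F = Σ/3.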
This new density 𝑝(𝑥 | 𝑦) = 𝑁 (𝑥𝐹̂ , Σ𝐹 ) is shown in the next figure via contour lines and the
color map.
The original density is left in as contour lines for comparison
Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
new_Z = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, new_Z, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)
ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")
plt.show()
Our new density twists the prior 𝑝(𝑥) in a direction determined by the new information 𝑦 −
𝐺𝑥.̂
In generating the figure, we set 𝐺 to the identity matrix and 𝑅 = 0.5Σ for Σ defined in (2).
The law of motion for the missile is

𝑥_{t+1} = 𝐴𝑥_t + 𝑤_{t+1},   where 𝑤_{t+1} ∼ 𝑁(0, 𝑄)   (5)

Our aim is to combine this law of motion and our current distribution 𝑝(𝑥 | 𝑦) = 𝑁(𝑥̂_F, Σ_F) to come up with a new predictive distribution for the location in one unit of time.
In view of (5), all we have to do is introduce a random vector 𝑥𝐹 ∼ 𝑁 (𝑥𝐹̂ , Σ𝐹 ) and work out
the distribution of 𝐴𝑥𝐹 + 𝑤 where 𝑤 is independent of 𝑥𝐹 and has distribution 𝑁 (0, 𝑄).
Since linear combinations of Gaussians are Gaussian, 𝐴𝑥𝐹 + 𝑤 is Gaussian.
Elementary calculations and the expressions in (4) tell us that
and
The matrix 𝐴Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 is often written as 𝐾Σ and called the Kalman gain.
• The subscript Σ has been added to remind us that 𝐾Σ depends on Σ, but not 𝑦 or 𝑥.̂
Using this notation, we can summarize our results as follows.
Our updated prediction is the density 𝑁(𝑥̂_new, Σ_new) where

𝑥̂_new := 𝐴𝑥̂ + 𝐾_Σ(𝑦 − 𝐺𝑥̂)
Σ_new := 𝐴Σ𝐴′ − 𝐾_Σ𝐺Σ𝐴′ + 𝑄   (6)

• The density 𝑝_new(𝑥) = 𝑁(𝑥̂_new, Σ_new) is called the predictive distribution
The predictive distribution is the new density shown in the following figure, where the update has used the parameters

𝐴 = ( 1.2   0.0
      0.0  −0.2 ),   𝑄 = 0.3 ∗ Σ
# Density 1
Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)
# Density 2
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
Z_F = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, Z_F, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)
# Density 3
new_x_hat = A * x_hat_F
new_Σ = A * Σ_F * A.T + Q
new_Z = gen_gaussian_plot_vals(new_x_hat, new_Σ)
cs3 = ax.contour(X, Y, new_Z, 6, colors="black")
ax.clabel(cs3, inline=1, fontsize=10)
ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")
plt.show()
𝑥̂_{t+1} = 𝐴𝑥̂_t + 𝐾_{Σ_t}(𝑦_t − 𝐺𝑥̂_t)
Σ_{t+1} = 𝐴Σ_t𝐴′ − 𝐾_{Σ_t}𝐺Σ_t𝐴′ + 𝑄   (7)
These are the standard dynamic equations for the Kalman filter (see, for example, [72], page
58).
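The recursion (7) can be sketched compactly as a function — a bare-bones version of what the Kalman class introduced below implements:

```python
import numpy as np

def kalman_step(x_hat, Σ, y, A, G, Q, R):
    # One pass through (7): given the current prior (x_hat, Σ) and an
    # observation y, return next period's prior (x_hat_new, Σ_new)
    K = A @ Σ @ G.T @ np.linalg.inv(G @ Σ @ G.T + R)  # Kalman gain K_Σ
    x_hat_new = A @ x_hat + K @ (y - G @ x_hat)
    Σ_new = A @ Σ @ A.T - K @ G @ Σ @ A.T + Q
    return x_hat_new, Σ_new
```

With 𝐴 = 𝐺 = 𝐼 and 𝑄 = 0, repeated application shrinks Σ, reflecting the accumulation of information.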
16.4 Convergence
16.5 Implementation
The class Kalman from the QuantEcon.py package implements the Kalman filter
𝑄 ∶= 𝐶𝐶 ′ and 𝑅 ∶= 𝐻𝐻 ′
• The class Kalman from the QuantEcon.py package has a number of methods, some that
we will wait to use until we study more advanced applications in subsequent lectures.
• Methods pertinent for this lecture are:
– prior_to_filtered, which updates (𝑥̂_t, Σ_t) to (𝑥̂_t^F, Σ_t^F)
– filtered_to_forecast, which updates the filtering distribution to the predictive
distribution – which becomes the new prior (𝑥𝑡+1̂ , Σ𝑡+1 )
– update, which combines the last two methods
– stationary_values, which computes the solution to (9) and the corresponding
(stationary) Kalman gain
You can view the program on GitHub.
16.6 Exercises
16.6.1 Exercise 1
Consider the following simple application of the Kalman filter, loosely based on [72], section
2.9.2.
Suppose that
• all variables are scalars
• the hidden state {𝑥𝑡 } is in fact constant, equal to some 𝜃 ∈ ℝ unknown to the modeler
State dynamics are therefore given by (5) with 𝐴 = 1, 𝑄 = 0 and 𝑥0 = 𝜃.
The measurement equation is 𝑦𝑡 = 𝜃 + 𝑣𝑡 where 𝑣𝑡 is 𝑁 (0, 1) and IID.
The task of this exercise to simulate the model and, using the code from kalman.py, plot the
first five predictive densities 𝑝𝑡 (𝑥) = 𝑁 (𝑥𝑡̂ , Σ𝑡 ).
As shown in [72], sections 2.9.1–2.9.2, these distributions asymptotically put all mass on the
unknown value 𝜃.
In the simulation, take 𝜃 = 10, 𝑥0̂ = 8 and Σ0 = 1.
Your figure should – modulo randomness – look something like this
16.6.2 Exercise 2
The preceding figure gives some support to the idea that probability mass converges to 𝜃.
To get a better idea, choose a small 𝜖 > 0 and calculate
𝑧_t := 1 − ∫_{𝜃−𝜖}^{𝜃+𝜖} 𝑝_t(𝑥) 𝑑𝑥
for 𝑡 = 0, 1, 2, … , 𝑇 .
Plot 𝑧_t against 𝑡, setting 𝜖 = 0.1 and 𝑇 = 600.
Your figure should show the error declining erratically, something like this
16.6.3 Exercise 3
As discussed above, if the shock sequence {𝑤𝑡 } is not degenerate, then it is not in general
possible to predict 𝑥𝑡 without error at time 𝑡 − 1 (and this would be the case even if we could
observe 𝑥𝑡−1 ).
Let’s now compare the prediction 𝑥𝑡̂ made by the Kalman filter against a competitor who is
allowed to observe 𝑥𝑡−1 .
This competitor will use the conditional expectation 𝔼[𝑥𝑡 | 𝑥𝑡−1 ], which in this case is 𝐴𝑥𝑡−1 .
The conditional expectation is known to be the optimal prediction method in terms of mini-
mizing mean squared error.
(More precisely, the minimizer of 𝔼 ‖𝑥𝑡 − 𝑔(𝑥𝑡−1 )‖2 with respect to 𝑔 is 𝑔∗ (𝑥𝑡−1 ) ∶= 𝔼[𝑥𝑡 | 𝑥𝑡−1 ])
Thus we are comparing the Kalman filter against a competitor who has more information (in
the sense of being able to observe the latent state) and behaves optimally in terms of mini-
mizing squared error.
Our horse race will be assessed in terms of squared error.
In particular, your task is to generate a graph plotting observations of both ‖𝑥𝑡 − 𝐴𝑥𝑡−1 ‖2 and
‖𝑥𝑡 − 𝑥𝑡̂ ‖2 against 𝑡 for 𝑡 = 1, … , 50.
For the parameters, set 𝐺 = 𝐼, 𝑅 = 0.5𝐼 and 𝑄 = 0.3𝐼, where 𝐼 is the 2 × 2 identity.
Set
𝐴 = ( 0.5  0.4
      0.6  0.3 )

Σ_0 = ( 0.9  0.3
        0.3  0.9 )
Observe how, after an initial learning period, the Kalman filter performs quite well, even rela-
tive to the competitor who predicts optimally with knowledge of the latent state.
16.6.4 Exercise 4
16.7 Solutions
16.7.1 Exercise 1
In [7]: # Parameters
θ = 10 # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)
x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)
# Draw observations of y from the state space model
N = 5
x, y = ss.simulate(N)
y = y.flatten()

# Set up plot
fig, ax = plt.subplots(figsize=(10, 8))
xgrid = np.linspace(θ - 5, θ + 2, 200)

for i in range(N):
    # Record the current predicted mean and variance
    m, v = [float(z) for z in (kalman.x_hat, kalman.Sigma)]
    # Plot, update filter
    ax.plot(xgrid, norm.pdf(xgrid, loc=m, scale=np.sqrt(v)), label=f'$t={i}$')
    kalman.update(y[i])
16.7.2 Exercise 2
In [8]: ϵ = 0.1
θ = 10 # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)
x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)
T = 600
z = np.empty(T)
x, y = ss.simulate(T)
y = y.flatten()
for t in range(T):
    # Record the current predicted mean and variance
    m, v = [float(temp) for temp in (kalman.x_hat, kalman.Sigma)]
    # Mass outside the interval (θ - ϵ, θ + ϵ) under the predictive density
    z[t] = 1 - (norm.cdf(θ + ϵ, loc=m, scale=np.sqrt(v))
                - norm.cdf(θ - ϵ, loc=m, scale=np.sqrt(v)))
    kalman.update(y[t])
16.7.3 Exercise 3
In [9]: # Define A, C, G, H
G = np.identity(2)
H = np.sqrt(0.5) * np.identity(2)
A = [[0.5, 0.4],
[0.6, 0.3]]
C = np.sqrt(0.3) * np.identity(2)
# Print eigenvalues of A
print("Eigenvalues of A:")
print(eigvals(A))
# Print stationary Σ
S, K = kn.stationary_values()
print("Stationary prediction error variance:")
print(S)
T = 50
e1 = np.empty(T-1)
e2 = np.empty(T-1)
fig, ax = plt.subplots(figsize=(9,6))
ax.plot(range(1, T), e1, 'k', lw=2, alpha=0.6,
label='Kalman filter error')
ax.plot(range(1, T), e2, 'g', lw=2, alpha=0.6,
label='Conditional expectation error')
ax.legend()
plt.show()
Eigenvalues of A:
[ 0.9+0.j 0.1+0.j]
Stationary prediction error variance:
[[0.40329108 0.1050718 ]
[0.1050718 0.41061709]]
Footnotes
[1] See, for example, page 93 of [16]. To get from his expressions to the ones used above, you
will also need to apply the Woodbury matrix identity.
Chapter 17
Shortest Paths
17.1 Contents
• Overview 17.2
• Outline of the Problem 17.3
• Finding Least-Cost Paths 17.4
• Solving for Minimum Cost-to-Go 17.5
• Exercises 17.6
• Solutions 17.7
17.2 Overview
The shortest path problem is a classic problem in mathematics and computer science with
applications in
• Economics (sequential decision making, analysis of social networks, etc.)
• Operations research and transportation
• Robotics and artificial intelligence
• Telecommunication network design and routing
• etc., etc.
Variations of the methods we discuss in this lecture are used millions of times every day, in
applications such as
• Google Maps
• routing packets on the internet
For us, the shortest path problem also provides a nice introduction to the logic of dynamic
programming.
Dynamic programming is an extremely powerful optimization technique that we apply in
many lectures on this site.
The only scientific library we’ll need in what follows is NumPy:
The shortest path problem is one of finding how to traverse a graph from one specified node
to another at minimum cost.
Consider the following graph
• A, D, F, G at cost 8
1. Start at node 𝑣 = 𝐴
where
• 𝐹𝑣 is the set of nodes that can be reached from 𝑣 in one step.
• 𝑐(𝑣, 𝑤) is the cost of traveling from 𝑣 to 𝑤.
Hence, if we know the function 𝐽 , then finding the best path is almost trivial.
But how can we find the cost-to-go function 𝐽 ?
Some thought will convince you that, for every node 𝑣, the function 𝐽 satisfies

𝐽(𝑣) = min_{𝑤 ∈ 𝐹_𝑣} { 𝑐(𝑣, 𝑤) + 𝐽(𝑤) }

This is known as the Bellman equation, after the mathematician Richard Bellman.
The Bellman equation can be thought of as a restriction that 𝐽 must satisfy.
What we want to do now is use this restriction to compute 𝐽 .
Let’s look at an algorithm for computing 𝐽 and then think about how to implement it.
17.5 Solving for Minimum Cost-to-Go

The standard algorithm for finding 𝐽 is to start with an initial guess and then iterate.
This is a standard approach to solving nonlinear equations, often called the method of suc-
cessive approximations.
Our initial guess will be
Now
1. Set 𝑛 = 0
17.5.2 Implementation
Having an algorithm is a good start, but we also need to think about how to implement it on
a computer.
First, for the cost function 𝑐, we’ll implement it as a matrix 𝑄, where a typical element is
𝑄(𝑣, 𝑤) = { 𝑐(𝑣, 𝑤)  if 𝑤 ∈ 𝐹_𝑣
          { +∞       otherwise

Notice that the cost of staying still (on the principal diagonal) is set to +∞ for ordinary nodes, so the path must keep moving, and to 0 for the destination node, where we stop.
max_iter = 500
i = 0
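To make the iteration concrete, here is a sketch on a small hypothetical cost matrix in which node 2 is the destination:

```python
import numpy as np

inf = np.inf
# Hypothetical 3-node network: Q[v, w] is the cost of moving from v to w
Q = np.array([[inf, 1.0, 5.0],
              [inf, inf, 1.0],
              [inf, inf, 0.0]])  # staying at the destination costs 0

J = np.zeros(3)                  # initial guess for the cost-to-go function

for _ in range(500):
    # Bellman operator: J_new(v) = min over w of { Q(v, w) + J(w) }
    next_J = np.min(Q + J, axis=1)
    if np.allclose(next_J, J):
        break
    J = next_J
# J converges to [2, 1, 0]: the cheapest route from node 0 runs through node 1
```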
17.6 Exercises
17.6.1 Exercise 1
Writing graph.txt
17.7 Solutions
17.7.1 Exercise 1
First let’s write a function that reads in the graph data above and builds a distance matrix.
def map_graph_to_distance_matrix(in_file):
infile.close()
return Q
1. a “Bellman operator” function that takes a distance matrix and current guess of J and
returns an updated guess of J, and
def compute_cost_to_go(Q):
    num_nodes = Q.shape[0]
    J = np.zeros(num_nodes)      # Initial guess
    next_J = np.empty(num_nodes) # Stores updated guess
    max_iter = 500
    i = 0
    while i < max_iter:
        next_J[:] = np.min(Q + J, axis=1)  # Bellman operator applied to J
        if np.allclose(next_J, J):
            break
        J[:] = next_J
        i += 1
    return(J)
We used np.allclose() rather than testing exact equality because we are dealing with floating
point numbers now.
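A one-line illustration of why exact comparison is unreliable with floats:

```python
import numpy as np

# 0.1 + 0.2 is not exactly 0.3 in binary floating point
print(0.1 + 0.2 == 0.3)             # False
print(np.allclose(0.1 + 0.2, 0.3))  # True
```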
Finally, here’s a function that uses the cost-to-go function to obtain the optimal path (and its
cost).
print(destination_node)
print('Cost: ', sum_costs)
Okay, now that we have the necessary functions, let’s call them to do the job we were assigned.
In [8]: Q = map_graph_to_distance_matrix('graph.txt')
J = compute_cost_to_go(Q)
print_best_path(J, Q)
0
8
11
18
23
33
41
53
56
57
60
67
70
73
76
85
87
88
93
94
96
97
98
99
Cost: 160.55000000000007
Chapter 18

Cass-Koopmans Planning Problem
18.1 Contents
• Overview 18.2
• The Model 18.3
• Planning Problem 18.4
• Shooting Algorithm 18.5
• Setting Initial Capital to Steady State Capital 18.6
• A Turnpike Property 18.7
• A Limiting Economy 18.8
• Concluding Remarks 18.9
18.2 Overview
This lecture and in Cass-Koopmans Competitive Equilibrium describe a model that Tjalling
Koopmans [66] and David Cass [22] used to analyze optimal growth.
The model can be viewed as an extension of the model of Robert Solow described in an ear-
lier lecture but adapted to make the saving rate the outcome of an optimal choice.
(Solow assumed a constant saving rate determined outside the model).
We describe two versions of the model, one in this lecture and the other in Cass-Koopmans
Competitive Equilibrium.
Together, the two lectures illustrate what is, in fact, a more general connection between a
planned economy and a decentralized economy organized as a competitive equilibrium.
This lecture is devoted to the planned economy version.
The lecture uses important ideas including
• A min-max problem for solving a planning problem.
• A shooting algorithm for solving difference equations subject to initial and terminal
conditions.
• A turnpike property that describes optimal paths for long but finite-horizon
economies.
Let’s start with some standard imports:
300 CHAPTER 18. CASS-KOOPMANS PLANNING PROBLEM
𝑈(𝐶⃗) = ∑_{t=0}^{T} 𝛽^t 𝐶_t^{1−𝛾} / (1 − 𝛾)   (1)
where 𝛽 ∈ (0, 1) is a discount factor and 𝛾 > 0 governs the curvature of the one-period utility
function with larger 𝛾 implying more curvature.
Note that
𝑢(𝐶_t) = 𝐶_t^{1−𝛾} / (1 − 𝛾)   (2)
ℒ(𝐶⃗, 𝐾⃗, 𝜇⃗) = ∑_{t=0}^{T} 𝛽^t {𝑢(𝐶_t) + 𝜇_t(𝐹(𝐾_t, 1) + (1 − 𝛿)𝐾_t − 𝐶_t − 𝐾_{t+1})}
• Extremization means maximization with respect to 𝐶,⃗ 𝐾⃗ and minimization with re-
spect to 𝜇.⃗
• Our problem satisfies conditions that assure that required second-order conditions are
satisfied at an allocation that satisfies the first-order conditions that we are about to
compute.
Before computing first-order conditions, we present some handy formulas.
𝐹(𝐾_t, 𝑁_t) = 𝐴𝐾_t^𝛼 𝑁_t^{1−𝛼} = 𝑁_t 𝐴 (𝐾_t/𝑁_t)^𝛼

𝐹(𝐾_t, 𝑁_t)/𝑁_t ≡ 𝑓(𝐾_t/𝑁_t) = 𝐴 (𝐾_t/𝑁_t)^𝛼
𝜕𝐹(𝐾_t, 𝑁_t)/𝜕𝐾_t = 𝜕[𝑁_t 𝑓(𝐾_t/𝑁_t)]/𝜕𝐾_t
                 = 𝑁_t 𝑓′(𝐾_t/𝑁_t) (1/𝑁_t)   (Chain rule)
                 = 𝑓′(𝐾_t/𝑁_t)|_{𝑁_t=1}
                 = 𝑓′(𝐾_t)   (6)
𝜕𝐹(𝐾_t, 𝑁_t)/𝜕𝑁_t = 𝜕[𝑁_t 𝑓(𝐾_t/𝑁_t)]/𝜕𝑁_t   (Product rule)
                 = 𝑓(𝐾_t/𝑁_t) + 𝑁_t 𝑓′(𝐾_t/𝑁_t) (−𝐾_t/𝑁_t²)   (Chain rule)
                 = 𝑓(𝐾_t/𝑁_t) − (𝐾_t/𝑁_t) 𝑓′(𝐾_t/𝑁_t)|_{𝑁_t=1}
                 = 𝑓(𝐾_t) − 𝐾_t 𝑓′(𝐾_t)
We now compute first order necessary conditions for extremization of the Lagrangian:
In computing (9) we recognize that 𝐾_t appears in both the time 𝑡 and time 𝑡 − 1 feasibility constraints.
(10) comes from differentiating with respect to 𝐾𝑇 +1 and applying the following Karush-
Kuhn-Tucker condition (KKT) (see Karush-Kuhn-Tucker conditions):
𝜇𝑇 𝐾𝑇 +1 = 0 (11)
Applying the inverse of the marginal utility function to both sides of the above equation gives

𝐶_{t+1} = 𝑢′⁻¹(((𝛽/𝑢′(𝐶_t)) [𝑓′(𝐾_{t+1}) + (1 − 𝛿)])⁻¹)

which for our utility function (2) becomes the consumption Euler equation

𝐶_{t+1} = (𝛽𝐶_t^𝛾 [𝑓′(𝐾_{t+1}) + (1 − 𝛿)])^{1/𝛾} = 𝐶_t (𝛽 [𝑓′(𝐾_{t+1}) + (1 − 𝛿)])^{1/𝛾}
Below we define a jitclass that stores parameters and functions that define our economy.
In [2]: planning_data = [
('γ', float64), # Coefficient of relative risk aversion
('β', float64), # Discount factor
('δ', float64), # Depreciation rate on capital
('α', float64), # Return to capital per capita
('A', float64) # Technology
]
In [3]: @jitclass(planning_data)
class PlanningProblem():
    # defaults: β, δ, α, A match the steady state calculation later
    # in the lecture; γ = 2 is an illustrative assumption
    def __init__(self, γ=2., β=0.95, δ=0.02, α=0.33, A=1.):
        self.γ, self.β = γ, β
        self.δ, self.α, self.A = δ, α, A
    def u_prime(self, c):
        return c ** (-self.γ)
    def u_prime_inv(self, c):
        return c ** (-1 / self.γ)
    def f(self, k):
        return self.A * k ** self.α
    def f_prime(self, k):
        return self.α * self.A * k ** (self.α - 1)
    def f_prime_inv(self, k):
        return (k / (self.A * self.α)) ** (1 / (self.α - 1))
    def next_k_c(self, k, c):
        k_next = self.f(k) + (1 - self.δ) * k - c
        c_next = self.u_prime_inv(self.u_prime(c) / (self.β * (self.f_prime(k_next) + (1 - self.δ))))
        return k_next, c_next
In [4]: pp = PlanningProblem()
We use shooting to compute an optimal allocation 𝐶,⃗ 𝐾⃗ and an associated Lagrange multi-
plier sequence 𝜇.⃗
The first-order necessary conditions (7), (8), and (9) for the planning problem form a system
of difference equations with two boundary conditions:
• 𝐾0 is a given initial condition for capital
• 𝐾𝑇 +1 = 0 is a terminal condition for capital that we deduced from the first-order
necessary condition for 𝐾𝑇 +1 the KKT condition (11)
We have no initial condition for the Lagrange multiplier 𝜇0 .
If we did, our job would be easy:
• Given 𝜇0 and 𝑘0 , we could compute 𝑐0 from equation (7) and then 𝑘1 from equation (9)
and 𝜇1 from equation (8).
• We could continue in this way to compute the remaining elements of 𝐶,⃗ 𝐾,⃗ 𝜇.⃗
But we don’t have an initial condition for 𝜇0 , so this won’t work.
Indeed, part of our task is to compute the optimal value of 𝜇0 .
To compute 𝜇0 and the other objects we want, a simple modification of the above procedure
will work.
It is called the shooting algorithm.
It is an instance of a guess and verify algorithm that consists of the following steps:
• Guess an initial Lagrange multiplier 𝜇0 .
• Apply the simple algorithm described above.
• Compute 𝑘𝑇 +1 and check whether it equals zero.
• If 𝐾𝑇 +1 = 0, we have solved the problem.
• If 𝐾𝑇 +1 > 0, lower 𝜇0 and try again.
• If 𝐾𝑇 +1 < 0, raise 𝜇0 and try again.
The following Python code implements the shooting algorithm for the planning problem.
We actually modify the algorithm slightly by starting with a guess for 𝑐0 instead of 𝜇0 in the
following code.
In [5]: @njit
def shooting(pp, c0, k0, T=10):
    '''
    Given the initial condition of capital k0 and an initial guess
    of consumption c0, computes the whole paths of c and k
    using the state transition law and Euler equation for T periods.
    '''
    if c0 > pp.f(k0):
        print("initial consumption is not feasible")
        return None
    # initialize vectors of c and k
    c_vec = np.empty(T+1)
    k_vec = np.empty(T+2)
    c_vec[0] = c0
    k_vec[0] = k0
    for t in range(T):
        k_vec[t+1], c_vec[t+1] = pp.next_k_c(k_vec[t], c_vec[t])
    # terminal capital from the last feasibility constraint
    k_vec[T+1] = pp.f(k_vec[T]) + (1 - pp.δ) * k_vec[T] - c_vec[T]
    return c_vec, k_vec
T = paths[0].size - 1
for i in range(2):
    axs[i].plot(paths[i], c=colors[i])
    axs[i].set(xlabel='t', ylabel=ylabels[i], title=titles[i])
axs[1].scatter(T+1, 0, s=80)
axs[1].axvline(T+1, color='k', ls='--', lw=1)
plt.show()
Evidently, our initial guess for 𝜇_0 is too high, so initial consumption is too low.
We know this because we miss our 𝐾_{T+1} = 0 target on the high side.
Now we automate things with a search-for-a-good 𝜇_0 algorithm that stops when we hit the target 𝐾_{T+1} = 0.
We use a bisection method.
We make an initial guess for 𝐶0 (we can eliminate 𝜇0 because 𝐶0 is an exact function of 𝜇0 ).
We know that the lowest 𝐶0 can ever be is 0 and the largest it can be is initial output 𝑓(𝐾0 ).
Guess 𝐶0 and shoot forward to 𝑇 + 1.
If 𝐾𝑇 +1 > 0, we take it to be our new lower bound on 𝐶0 .
If 𝐾𝑇 +1 < 0, we take it to be our new upper bound.
Make a new guess for 𝐶0 that is halfway between our new upper and lower bounds.
Shoot forward again, iterating on these steps until we converge.
When 𝐾𝑇 +1 gets close enough to 0 (i.e., within an error tolerance bounds), we stop.
In [8]: @njit
def bisection(pp, c0, k0, T=10, tol=1e-4, max_iter=500, k_ter=0, verbose=True):
    # initial bounds for the guess of c0
    c0_upper = pp.f(k0)
    c0_lower = 0
    i = 0
    while True:
        c_vec, k_vec = shooting(pp, c0, k0, T)
        error = k_vec[-1] - k_ter
        # check whether the terminal condition is satisfied
        if np.abs(error) < tol:
            if verbose:
                print('Converged successfully on iteration ', i+1)
            return c_vec, k_vec
        i += 1
        if i == max_iter:
            if verbose:
                print('Convergence failed.')
            return c_vec, k_vec
        # update bounds and guess of c0
        if error > 0:
            c0_lower = c0
        else:
            c0_upper = c0
        c0 = (c0_lower + c0_upper) / 2
if axs is None:
    fig, axs = plt.subplots(1, 3, figsize=(16, 4))
ylabels = ['$c_t$', '$k_t$', '$\mu_t$']
titles = ['Consumption', 'Capital', 'Lagrange Multiplier']
c_paths = []
k_paths = []
for T in T_arr:
c_vec, k_vec = bisection(pp, c0, k0, T, k_ter=k_ter, verbose=False)
c_paths.append(c_vec)
k_paths.append(k_vec)
μ_vec = pp.u_prime(c_vec)
paths = [c_vec, k_vec, μ_vec]
for i in range(3):
axs[i].plot(paths[i])
axs[i].set(xlabel='t', ylabel=ylabels[i], title=titles[i])
Now we can solve the model and plot the paths of consumption, capital, and Lagrange multi-
plier.
When 𝑇 → +∞, the optimal allocation converges to steady state values of 𝐶𝑡 and 𝐾𝑡 .
It is instructive to set 𝐾0 equal to the lim𝑇 →+∞ 𝐾𝑡 , which we’ll call steady state capital.
In a steady state 𝐾𝑡+1 = 𝐾𝑡 = 𝐾̄ for all very large 𝑡.
Evaluating the feasibility constraint (4) at 𝐾̄ gives
𝑓(𝐾)̄ − 𝛿 𝐾̄ = 𝐶 ̄ (13)
1 = 𝛽 (𝑢′(𝐶̄)/𝑢′(𝐶̄)) [𝑓′(𝐾̄) + (1 − 𝛿)]

Defining 𝛽 = 1/(1 + 𝜌) and cancelling gives

1 + 𝜌 = 𝑓′(𝐾̄) + (1 − 𝛿)

Simplifying gives

𝑓′(𝐾̄) = 𝜌 + 𝛿
and
𝐾̄ = 𝑓′⁻¹(𝜌 + 𝛿)
𝛼𝐾̄^{𝛼−1} = 𝜌 + 𝛿

𝐾̄ = ((33/100) / (1/50 + 1/19))^{100/67} ≈ 9.57583
Let’s verify this with Python and then use this steady state 𝐾̄ as our initial capital stock 𝐾0 .
In [11]: ρ = 1 / pp.β - 1
k_ss = pp.f_prime_inv(ρ + pp.δ)
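The same check can also be written self-containedly, assuming the parameter values implied by the fractions above (𝛼 = 33/100, 𝛽 = 0.95 so that 𝜌 = 1/19, 𝛿 = 1/50, 𝐴 = 1):

```python
# Verify f'(K̄) = ρ + δ for Cobb-Douglas production under the
# assumed parameterization α = 0.33, β = 0.95, δ = 0.02, A = 1
α, β, δ, A = 0.33, 0.95, 0.02, 1

ρ = 1 / β - 1                               # discount rate, 1/19
k_ss = (α * A / (ρ + δ)) ** (1 / (1 - α))   # f'⁻¹(ρ + δ)
print(k_ss)  # ≈ 9.57583
```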
Now we plot
Notice how the planner pushes capital toward the steady state, stays near there for a while,
then pushes 𝐾𝑡 toward the terminal value 𝐾𝑇 +1 = 0 when 𝑡 closely approaches 𝑇 .
The following graphs compare optimal outcomes as we vary 𝑇 .
The following calculation indicates that when 𝑇 is very large, the optimal capital stock stays
close to its steady state value most of the time.
In [16]: @njit
def saving_rate(pp, c_path, k_path):
    'Given paths of c and k, computes the path of the saving rate.'
    production = pp.f(k_path[:-1])
    return (production - c_path) / production

for i, T in enumerate(T_arr):
    s_path = saving_rate(pp, c_paths[i], k_paths[i])
    axs[1, 1].plot(s_path)
a condition that will be satisfied by a path that converges to an optimal steady state.
We can approximate the optimal path by starting from an arbitrary initial 𝐾_0 and shooting towards the optimal steady state 𝐾̄ at a large but finite 𝑇 + 1.
In the following code, we do this for a large 𝑇 and plot consumption, capital, and the saving
rate.
We know that in the steady state the saving rate is constant and that 𝑠̄ = (𝑓(𝐾̄) − 𝐶̄)/𝑓(𝐾̄).

From (13) the steady state saving rate equals

𝑠̄ = 𝛿𝐾̄/𝑓(𝐾̄)
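Continuing the numerical example under the same assumed parameterization (𝛼 = 0.33, 𝛽 = 0.95, 𝛿 = 0.02, 𝐴 = 1), the steady state saving rate is easy to evaluate:

```python
# Steady state saving rate s̄ = δ K̄ / f(K̄), under the assumed
# parameterization used in the steady state calculation above
α, β, δ, A = 0.33, 0.95, 0.02, 1

ρ = 1 / β - 1
k_ss = (α * A / (ρ + δ)) ** (1 / (1 - α))
s_bar = δ * k_ss / (A * k_ss ** α)
print(s_bar)  # roughly 0.09
```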
The planner slowly lowers the saving rate until reaching a steady state in which 𝑓 ′ (𝐾) = 𝜌+𝛿.
18.8.1 Exercise
• Plot the optimal consumption, capital, and saving paths when the initial capital level
begins at 1.5 times the steady state level as we shoot towards the steady state at 𝑇 =
130.
• Why does the saving rate respond as it does?
18.8.2 Solution
The relationship between a command economy like the one studied in this lecture and a mar-
ket economy like that studied in Cass-Koopmans Competitive Equilibrium is a foundational
topic in general equilibrium theory and welfare economics.
Chapter 19
Cass-Koopmans Competitive
Equilibrium
19.1 Contents
• Overview 19.2
• Review of Cass-Koopmans Model 19.3
• Competitive Equilibrium 19.4
• Market Structure 19.5
• Firm Problem 19.6
• Household Problem 19.7
• Computing a Competitive Equilibrium 19.8
• Yield Curves and Hicks-Arrow Prices 19.9
19.2 Overview
This lecture continues our analysis in this lecture Cass-Koopmans Planning Model about the
model that Tjalling Koopmans [66] and David Cass [22] used to study optimal growth.
This lecture illustrates what is, in fact, a more general connection between a planned econ-
omy and an economy organized as a competitive equilibrium.
The earlier lecture Cass-Koopmans Planning Model studied a planning problem and used
ideas including
• A min-max problem for solving the planning problem.
• A shooting algorithm for solving difference equations subject to initial and terminal
conditions.
• A turnpike property that describes optimal paths for long-but-finite horizon
economies.
The present lecture uses additional ideas including
• Hicks-Arrow prices named after John R. Hicks and Kenneth Arrow.
• A connection between some Lagrange multipliers in the min-max problem and the
Hicks-Arrow prices.
• A Big 𝐾 , little 𝑘 trick widely used in macroeconomic dynamics.
• We shall encounter this trick in this lecture and also in this lecture.
𝑈(𝐶⃗) = ∑_{t=0}^{T} 𝛽^t 𝐶_t^{1−𝛾} / (1 − 𝛾)
where 𝛽 ∈ (0, 1) is a discount factor and 𝛾 > 0 governs the curvature of the one-period utility
function.
We assume that 𝐾0 > 0.
There is an economy-wide production function
19.4 Competitive Equilibrium
The allocation that solves the planning problem plays an important role in a competitive
equilibrium as we shall see below.
The representative household and the representative firm are both price takers.
The household owns both factors of production, namely, labor and physical capital.
Each period, the firm rents both factors from the household.
There is a single grand competitive market in which a household can trade date 0 goods for
goods at all other dates 𝑡 = 1, 2, … , 𝑇 .
19.5.1 Prices
There are sequences of prices {𝑤_t, 𝜂_t}_{t=0}^{T} = {𝑤⃗, 𝜂⃗} where 𝑤_t is a wage or rental rate for labor at time 𝑡 and 𝜂_t is a rental rate for capital at time 𝑡.
In addition there are intertemporal prices that work as follows.
Let 𝑞𝑡0 be the price of a good at date 𝑡 relative to a good at date 0.
We call {𝑞𝑡0 }𝑇𝑡=0 a vector of Hicks-Arrow prices, named after the 1972 economics Nobel
prize winners.
Evidently,
Because 𝑞𝑡0 is a relative price, the units in terms of which prices are quoted are arbitrary –
we are free to normalize them.
𝐹(𝑘̃_t, 𝑛̃_t) − 𝑤_t𝑛̃_t − 𝜂_t𝑘̃_t

𝐹_k(𝑘̃_t, 𝑛̃_t) = 𝜂_t

and

𝐹_n(𝑘̃_t, 𝑛̃_t) = 𝑤_t   (1)
𝐹(𝑘̃_t, 𝑛̃_t) = (𝜕𝐹/𝜕𝑘̃_t) 𝑘̃_t + (𝜕𝐹/𝜕𝑛̃_t) 𝑛̃_t

so that profits can be written

(𝜕𝐹/𝜕𝑘̃_t) 𝑘̃_t + (𝜕𝐹/𝜕𝑛̃_t) 𝑛̃_t − 𝑤_t𝑛̃_t − 𝜂_t𝑘̃_t

or

(𝜕𝐹/𝜕𝑘̃_t − 𝜂_t) 𝑘̃_t + (𝜕𝐹/𝜕𝑛̃_t − 𝑤_t) 𝑛̃_t

Because 𝐹 is homogeneous of degree 1, it follows that 𝜕𝐹/𝜕𝑘̃_t and 𝜕𝐹/𝜕𝑛̃_t are homogeneous of degree 0 and therefore fixed with respect to 𝑘̃_t and 𝑛̃_t.
If 𝜕𝐹/𝜕𝑘̃_t > 𝜂_t, then the firm makes positive profits on each additional unit of 𝑘̃_t, so it will want to make 𝑘̃_t arbitrarily large.

But setting 𝑘̃_t = +∞ is not physically feasible, so presumably equilibrium prices will assume values that present the firm with no such arbitrage opportunity.

A similar argument applies if 𝜕𝐹/𝜕𝑛̃_t > 𝑤_t.

If 𝜕𝐹/𝜕𝑘̃_t < 𝜂_t, the firm will set 𝑘̃_t to zero, something that is not feasible.
𝑤_t ⋅ 1 + 𝜂_t𝑘_t
Here (𝑘𝑡+1 − (1 − 𝛿)𝑘𝑡 ) is the household’s net investment in physical capital and 𝛿 ∈ (0, 1) is
again a depreciation rate of capital.
In period 𝑡, the household is free to purchase more goods to be consumed and invested in physical capital than its income from supplying capital and labor to the firm, provided that in some other periods its income exceeds its purchases.
A household’s net excess demand for time 𝑡 consumption goods is the gap

𝑒_t := (𝑐_t + (𝑘_{t+1} − (1 − 𝛿)𝑘_t)) − (𝑤_t ⋅ 1 + 𝜂_t𝑘_t)

The household’s budget constraint requires

∑_{t=0}^{T} 𝑞_t^0 𝑒_t ≤ 0
or
∑_{t=0}^{T} 𝑞_t^0 (𝑐_t + (𝑘_{t+1} − (1 − 𝛿)𝑘_t) − (𝑤_t ⋅ 1 + 𝜂_t𝑘_t)) ≤ 0
max_{𝑐⃗, 𝑘⃗} ∑_{t=0}^{T} 𝛽^t 𝑢(𝑐_t)

subject to ∑_{t=0}^{T} 𝑞_t^0 (𝑐_t + (𝑘_{t+1} − (1 − 𝛿)𝑘_t) − 𝑤_t − 𝜂_t𝑘_t) ≤ 0
19.7.1 Definitions
In this lecture Cass-Koopmans Planning Model, we computed an allocation {𝐶,⃗ 𝐾,⃗ 𝑁⃗ } that
solves the planning problem.
(This allocation will constitute the Big 𝐾 in the present instance of the Big 𝐾, little 𝑘 trick that we’ll apply to a competitive equilibrium in the spirit of this lecture and this lecture.)
We use that allocation to construct a guess for the equilibrium price system.
In particular, we guess that for 𝑡 = 0, … , 𝑇 :
𝜂𝑡 = 𝑓 ′ (𝐾𝑡 ) (4)
and so on.
If our guess for the equilibrium price system is correct, then it must occur that
𝑘𝑡∗ = 𝑘̃ 𝑡∗ (6)
1 = 𝑛̃_t^∗   (7)

𝑐_t^∗ + 𝑘_{t+1}^∗ − (1 − 𝛿)𝑘_t^∗ = 𝐹(𝑘̃_t^∗, 𝑛̃_t^∗)
We shall verify that for 𝑡 = 0, … , 𝑇 the allocations chosen by the household and the firm both
equal the allocation that solves the planning problem:
Our approach is to stare at first-order necessary conditions for the optimization problems of
the household and the firm.
At the price system we have guessed, we’ll then verify that both sets of first-order conditions
are satisfied at the allocation that solves the planning problem.
ℒ(𝑐,⃗ 𝑘,⃗ 𝜆) = ∑_{𝑡=0}^{𝑇} 𝛽𝑡𝑢(𝑐𝑡) + 𝜆 (∑_{𝑡=0}^{𝑇} 𝑞𝑡0 ((1 − 𝛿)𝑘𝑡 + 𝑤𝑡 + 𝜂𝑡𝑘𝑡 − 𝑐𝑡 − 𝑘𝑡+1))
𝜆 ∶ ∑_{𝑡=0}^{𝑇} 𝑞𝑡0 (𝑐𝑡 + (𝑘𝑡+1 − (1 − 𝛿)𝑘𝑡) − 𝑤𝑡 − 𝜂𝑡𝑘𝑡) ≤ 0 (11)
Now we plug in our guesses of prices and embark on some algebra in the hope of deriving all first-order necessary conditions (7)-(10) for the planning problem from this lecture Cass-Koopmans Planning Model.
Combining (9) and (2), we get:
𝑢′ (𝐶𝑡 ) = 𝜇𝑡
which is (7).
Combining (10), (2), and (4) we get:

−𝜆𝛽𝑡𝜇𝑡 + 𝜆𝛽𝑡+1𝜇𝑡+1 ((1 − 𝛿) + 𝑓′(𝐾𝑡+1)) = 0 (13)

Rewriting (13) by dividing by 𝜆 on both sides (which is nonzero since 𝑢′ > 0) we get:

𝛽𝑡+1𝜇𝑡+1 ((1 − 𝛿) + 𝑓′(𝐾𝑡+1)) = 𝛽𝑡𝜇𝑡

or

𝜇𝑡 = 𝛽𝜇𝑡+1 ((1 − 𝛿) + 𝑓′(𝐾𝑡+1))

which is (8).
Combining (11), (2), (3) and (4) after multiplying both sides of (11) by 𝜆, we get
∑_{𝑡=0}^{𝑇} 𝛽𝑡𝜇𝑡 (𝐶𝑡 + (𝐾𝑡+1 − (1 − 𝛿)𝐾𝑡) − 𝑓(𝐾𝑡) + 𝐾𝑡𝑓′(𝐾𝑡) − 𝑓′(𝐾𝑡)𝐾𝑡) ≤ 0
which simplifies to

∑_{𝑡=0}^{𝑇} 𝛽𝑡𝜇𝑡 (𝐶𝑡 + 𝐾𝑡+1 − (1 − 𝛿)𝐾𝑡 − 𝐹(𝐾𝑡, 1)) ≤ 0
which is (9).
Combining (12) and (2), we get:

−𝛽𝑇+1𝜇𝑇+1 ≤ 0

Dividing both sides by 𝛽𝑇+1 > 0 gives

−𝜇𝑇+1 ≤ 0

which is (10).
Turning to the firm, its first-order condition for capital evaluated at the planned allocation gives

𝜕𝐹(𝐾𝑡, 1)/𝜕𝐾𝑡 = 𝑓′(𝐾𝑡) = 𝜂𝑡

which is (4).
If we now plug (8) into (1) for all t, we get:
𝜕𝐹(𝐾̃𝑡, 1)/𝜕𝐿̃𝑡 = 𝑓(𝐾𝑡) − 𝑓′(𝐾𝑡)𝐾𝑡 = 𝑤𝑡
So at our guess for the equilibrium price system, the allocation that solves the planning prob-
lem also solves the problem faced by a firm within a competitive equilibrium.
By (6) and (7) this allocation is identical to the one that solves the consumer’s problem.
Note: Because budget sets are affected only by relative prices, {𝑞𝑡0} is determined only up to multiplication by a positive constant.

Normalization: We are free to choose a {𝑞𝑡0} that makes 𝜆 = 1 so that we are measuring 𝑞𝑡0 in units of the marginal utility of time 0 goods.
We will plot 𝑞, 𝑤, 𝜂 below to show these equilibrium prices induce the same aggregate move-
ments that we saw earlier in the planning problem.
To proceed, we bring in Python code that Cass-Koopmans Planning Model used to solve the
planning problem
First let’s define a jitclass that stores parameters and functions that characterize an economy.
In [2]: planning_data = [
('γ', float64), # Coefficient of relative risk aversion
('β', float64), # Discount factor
('δ', float64), # Depreciation rate on capital
('α', float64), # Return to capital per capita
('A', float64) # Technology
]
In [3]: @jitclass(planning_data)
        class PlanningProblem():

            def __init__(self, γ=2, β=0.95, δ=0.02, α=0.33, A=1):
                self.γ, self.β = γ, β
                self.δ, self.α, self.A = δ, α, A

            def u_prime(self, c):               # u'(c)
                return c ** (-self.γ)

            def u_prime_inv(self, c):           # inverse of u'
                return c ** (-1 / self.γ)

            def f(self, k):                     # production function
                return self.A * k ** self.α

            def f_prime(self, k):               # f'(k)
                return self.α * self.A * k ** (self.α - 1)

            def f_prime_inv(self, k):           # inverse of f'
                return (k / (self.A * self.α)) ** (1 / (self.α - 1))

            def next_k_c(self, k, c):           # state transition and Euler equation
                k_next = self.f(k) + (1 - self.δ) * k - c
                c_next = self.u_prime_inv(self.u_prime(c) /
                                          (self.β * (self.f_prime(k_next) + (1 - self.δ))))
                return k_next, c_next
In [4]: @njit
        def shooting(pp, c0, k0, T=10):
            '''
            Given the initial condition of capital k0 and an initial guess
            of consumption c0, computes the whole paths of c and k
            using the state transition law and Euler equation for T periods.
            '''
            if c0 > pp.f(k0):
                print("initial consumption is not feasible")
                return None

            # initialize vectors of c and k
            c_vec = np.empty(T+1)
            k_vec = np.empty(T+2)
            c_vec[0] = c0
            k_vec[0] = k0

            for t in range(T):
                k_vec[t+1], c_vec[t+1] = pp.next_k_c(k_vec[t], c_vec[t])

            k_vec[T+1] = pp.f(k_vec[T]) + (1 - pp.δ) * k_vec[T] - c_vec[T]

            return c_vec, k_vec
In [5]: @njit
        def bisection(pp, c0, k0, T=10, tol=1e-4, max_iter=500, k_ter=0, verbose=True):

            # initial boundaries for the guess of c0
            c0_upper = pp.f(k0)
            c0_lower = 0

            i = 0
            while True:
                c_vec, k_vec = shooting(pp, c0, k0, T)
                error = k_vec[-1] - k_ter

                # check if the terminal condition is satisfied
                if np.abs(error) < tol:
                    return c_vec, k_vec

                i += 1
                if i == max_iter:
                    if verbose:
                        print('Convergence failed.')
                    return c_vec, k_vec

                # otherwise update the boundaries and bisect
                if error > 0:
                    c0_lower = c0
                else:
                    c0_upper = c0
                c0 = (c0_lower + c0_upper) / 2
In [6]: pp = PlanningProblem()
# Steady states
        ρ = 1 / pp.β - 1
        k_ss = pp.f_prime_inv(ρ + pp.δ)
        c_ss = pp.f(k_ss) - pp.δ * k_ss
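For the Cobb-Douglas case 𝑓(𝑘) = 𝐴𝑘^𝛼 these steady-state formulas can be checked directly; a sketch (the parameter values mirror the `PlanningProblem` defaults and are assumptions here):

```python
import numpy as np

# Parameter values mirroring the PlanningProblem defaults (assumptions here)
A, α, β, δ = 1.0, 0.33, 0.95, 0.02

ρ = 1 / β - 1                                 # rate of time preference
k_ss = (α * A / (ρ + δ)) ** (1 / (1 - α))     # solves f'(k) = ρ + δ
c_ss = A * k_ss**α - δ * k_ss                 # steady-state consumption

f_prime = lambda k: α * A * k**(α - 1)
print(k_ss, c_ss, f_prime(k_ss) - (ρ + δ))    # last term should be ~0
```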
The above code from this lecture Cass-Koopmans Planning Model lets us compute an optimal allocation for the planning problem that turns out to be the allocation associated with a competitive equilibrium.
Now we’re ready to bring in Python code that we require to compute additional objects that
appear in a competitive equilibrium.
In [7]: @njit
        def q(pp, c_path):
            # Here we choose numeraire to be u'(c_0) -- this is q^(t_0)_t
            T = len(c_path) - 1
            q_path = np.ones(T+1)
            q_path[0] = 1
            for t in range(1, T+1):
                q_path[t] = pp.β ** t * pp.u_prime(c_path[t]) / pp.u_prime(c_path[0])
            return q_path
19.8. COMPUTING A COMPETITIVE EQUILIBRIUM 327
@njit
def w(pp, k_path):
            w_path = pp.f(k_path) - k_path * pp.f_prime(k_path)
return w_path
@njit
def η(pp, k_path):
η_path = pp.f_prime(k_path)
return η_path
for T in T_arr:
c_path, k_path = bisection(pp, 0.3, k_ss/3, T, verbose=False)
μ_path = pp.u_prime(c_path)
        for i, ax in enumerate(axs.flatten()):
            ax.plot(paths[i])
            ax.set(title=titles[i], ylabel=ylabels[i], xlabel='t')
            if titles[i] == 'Capital':
                ax.axhline(k_ss, lw=1, ls='--', c='k')
            if titles[i] == 'Consumption':
                ax.axhline(c_ss, lw=1, ls='--', c='k')
plt.tight_layout()
plt.show()
Varying Curvature
Now we see how our results change if we keep 𝑇 constant, but allow the curvature parameter,
𝛾 to vary, starting with 𝐾0 below the steady state.
We plot the results for 𝑇 = 150
In [9]: T = 150
γ_arr = [1.1, 4, 6, 8]
for γ in γ_arr:
pp_γ = PlanningProblem(γ=γ)
c_path, k_path = bisection(pp_γ, 0.3, k_ss/3, T, verbose=False)
μ_path = pp_γ.u_prime(c_path)
            for i, ax in enumerate(axs.flatten()):
                ax.plot(paths[i], label=f'$\gamma = {γ}$')
                ax.set(title=titles[i], ylabel=ylabels[i], xlabel='t')
                if titles[i] == 'Capital':
                    ax.axhline(k_ss, lw=1, ls='--', c='k')
                if titles[i] == 'Consumption':
                    ax.axhline(c_ss, lw=1, ls='--', c='k')
axs[0, 0].legend()
plt.tight_layout()
plt.show()
We return to Hicks-Arrow prices and calculate how they are related to yields on loans of al-
ternative maturities.
This will let us plot a yield curve that graphs yields on bonds of maturities 𝑗 = 1, 2, … against maturity 𝑗.
The formulas we want are:
A yield to maturity on a loan made at time 𝑡0 that matures at time 𝑡 > 𝑡0 is

𝑟_{𝑡0,𝑡} = − log 𝑞^{𝑡0}_𝑡 / (𝑡 − 𝑡0)

and a Hicks-Arrow price with base year 𝑡0 is

𝑞^{𝑡0}_𝑡 = 𝛽^{𝑡−𝑡0} 𝑢′(𝑐𝑡)/𝑢′(𝑐𝑡0) = 𝛽^{𝑡−𝑡0} 𝑐𝑡^{−𝛾}/𝑐𝑡0^{−𝛾}
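A quick sanity check on these formulas: along a constant consumption path the marginal-utility ratio is one, so 𝑞^{𝑡0}_𝑡 = 𝛽^{𝑡−𝑡0} and every yield equals −log 𝛽, giving a flat yield curve. A minimal sketch (parameter values are illustrative):

```python
import numpy as np

β, γ = 0.95, 2.0                    # illustrative parameter values
c_path = np.full(10, 1.5)           # a constant consumption path
t0 = 0

u_prime = lambda c: c ** (-γ)

# Hicks-Arrow prices q_t^{t0} = β**(t - t0) * u'(c_t) / u'(c_{t0})
t = np.arange(len(c_path))
q_path = β ** (t - t0) * u_prime(c_path) / u_prime(c_path[t0])

# Yields to maturity r_{t0,t} = -log(q_t^{t0}) / (t - t0)
r_path = -np.log(q_path[1:]) / t[1:]

print(r_path)   # constant consumption implies a flat yield curve
```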
We redefine our function for 𝑞 to allow arbitrary base years, and define a new function for 𝑟,
then plot both.
We begin by continuing to assume that 𝑡0 = 0 and plot things for different maturities 𝑡 = 𝑇 ,
with 𝐾0 below the steady state
In [10]: @njit
         def q_generic(pp, t0, c_path):
             # simplify notations
             β = pp.β
             u_prime = pp.u_prime

             T = len(c_path) - 1
             q_path = np.zeros(T+1-t0)
             q_path[0] = 1
             for t in range(t0+1, T+1):
                 q_path[t-t0] = β ** (t-t0) * u_prime(c_path[t]) / u_prime(c_path[t0])
             return q_path

         @njit
         def r(pp, t0, q_path):
             '''Yield to maturity'''
             r_path = -np.log(q_path[1:]) / np.arange(1, len(q_path))
             return r_path
for T in T_arr:
c_path, k_path = bisection(pp, c0, k0, T, verbose=False)
q_path = q_generic(pp, t0, c_path)
r_path = r(pp, t0, q_path)
We aim to have more to say about the term structure of interest rates in a planned lecture on
the topic.
Part III
Search
Chapter 20
20.1 Contents
• Overview 20.2
• The McCall Model 20.3
• Computing the Optimal Policy: Take 1 20.4
• Computing the Optimal Policy: Take 2 20.5
• Exercises 20.6
• Solutions 20.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
20.2 Overview
The McCall search model [80] helped transform economists’ way of thinking about labor mar-
kets.
To clarify vague notions such as “involuntary” unemployment, McCall modeled the decision
problem of unemployed agents directly, in terms of factors such as
• current and likely future wages
• impatience
• unemployment compensation
To solve the decision problem he used dynamic programming.
334 CHAPTER 20. JOB SEARCH I: THE MCCALL SEARCH MODEL
Here we set up McCall’s model and adopt the same solution method.
As we’ll see, McCall’s model is not only interesting in its own right but also an excellent vehi-
cle for learning dynamic programming.
Let’s start with some imports:
𝑤𝑡 = 𝑤(𝑠𝑡 ) where 𝑠𝑡 ∈ 𝕊
Here you should think of the state process {𝑠𝑡 } as some underlying, unspecified random factor that impacts on wages.
(Introducing an exogenous stochastic state process is a standard way for economists to inject
randomness into their models.)
In this lecture, we adopt the following simple environment:
• {𝑠𝑡 } is IID, with 𝑞(𝑠) being the probability of observing state 𝑠 in 𝕊 at each point in
time, and
• the agent observes 𝑠𝑡 at the start of 𝑡 and hence knows 𝑤𝑡 = 𝑤(𝑠𝑡 ),
• the set 𝕊 is finite.
(In later lectures, we will relax all of these assumptions.)
At time 𝑡, our agent has two choices:

1. Accept the offer and work permanently at constant wage 𝑤𝑡 .

2. Reject the offer, receive unemployment compensation 𝑐, and reconsider next period.
The agent is infinitely lived and aims to maximize the expected discounted sum of earnings
𝔼 ∑_{𝑡=0}^{∞} 𝛽𝑡𝑦𝑡
20.3. THE MCCALL MODEL 335
20.3.1 A Trade-Off
In order to optimally trade-off current and future rewards, we need to think about two things:

1. the current payoffs we get from different choices

2. the different states that those choices will lead to in next period (in this case, either employment or unemployment)
To weigh these two aspects of the decision problem, we need to assign values to states.
To this end, let 𝑣∗ (𝑠) be the total lifetime value accruing to an unemployed worker who enters
the current period unemployed when the state is 𝑠 ∈ 𝕊.
In particular, the agent has wage offer 𝑤(𝑠) in hand.
More precisely, 𝑣∗ (𝑠) denotes the value of the objective function (1) when an agent in this sit-
uation makes optimal decisions now and at all future points in time.
Of course 𝑣∗ (𝑠) is not trivial to calculate because we don’t yet know what decisions are opti-
mal and what aren’t!
But think of 𝑣∗ as a function that assigns to each possible state 𝑠 the maximal lifetime value
that can be obtained with that offer in hand.
A crucial observation is that this function 𝑣∗ must satisfy the recursion
𝑣∗(𝑠) = max { 𝑤(𝑠)/(1 − 𝛽), 𝑐 + 𝛽 ∑_{𝑠′∈𝕊} 𝑣∗(𝑠′)𝑞(𝑠′) } (1)

for every possible 𝑠 in 𝕊.

• the first term inside the max operation is the lifetime payoff from accepting the current offer, since

𝑤(𝑠)/(1 − 𝛽) = 𝑤(𝑠) + 𝛽𝑤(𝑠) + 𝛽²𝑤(𝑠) + ⋯
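This geometric-series identity is easy to confirm numerically by truncating the infinite sum:

```python
β, w = 0.95, 10.0   # illustrative discount factor and wage

# Present value of working forever at wage w: w + βw + β²w + ... = w / (1 - β)
pv_closed_form = w / (1 - β)
pv_truncated = sum(β**t * w for t in range(10_000))   # truncated infinite sum

print(pv_closed_form, pv_truncated)
```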
• the second term inside the max operation is the continuation value, which is the life-
time payoff from rejecting the current offer and then behaving optimally in all subse-
quent periods
If we optimize and pick the best of these two options, we obtain maximal lifetime value from
today, given current state 𝑠.
But this is precisely 𝑣∗ (𝑠), which is the l.h.s. of (1).
Suppose for now that we are able to solve (1) for the unknown function 𝑣∗ .
Once we have this function in hand we can behave optimally (i.e., make the right choice be-
tween accept and reject).
All we have to do is select the maximal choice on the r.h.s. of (1).
The optimal action is best thought of as a policy, which is, in general, a map from states to
actions.
Given any 𝑠, we can read off the corresponding best choice (accept or reject) by picking the
max on the r.h.s. of (1).
Thus, we have a map from ℝ to {0, 1}, with 1 meaning accept and 0 meaning reject.
We can write the policy as follows
𝜎(𝑠) ∶= 1 { 𝑤(𝑠)/(1 − 𝛽) ≥ 𝑐 + 𝛽 ∑_{𝑠′∈𝕊} 𝑣∗(𝑠′)𝑞(𝑠′) }
𝜎(𝑠) ∶= 1{𝑤(𝑠) ≥ 𝑤̄}

where

𝑤̄ ∶= (1 − 𝛽) [𝑐 + 𝛽 ∑_{𝑠′∈𝕊} 𝑣∗(𝑠′)𝑞(𝑠′)] (2)
20.4. COMPUTING THE OPTIMAL POLICY: TAKE 1 337
Here 𝑤̄ (called the reservation wage) is a constant depending on 𝛽, 𝑐 and the wage distribu-
tion.
The agent should accept if and only if the current wage offer exceeds the reservation wage.
In view of (2), we can compute this reservation wage if we can compute the value function.
To put the above ideas into action, we need to compute the value function at each possible
state 𝑠 ∈ 𝕊.
Let’s suppose that 𝕊 = {1, … , 𝑛}.
The value function is then represented by the vector 𝑣∗ = (𝑣∗ (𝑖))𝑛𝑖=1 .
In view of (1), this vector satisfies the nonlinear system of equations
𝑣∗(𝑖) = max { 𝑤(𝑖)/(1 − 𝛽), 𝑐 + 𝛽 ∑_{1≤𝑗≤𝑛} 𝑣∗(𝑗)𝑞(𝑗) } for 𝑖 = 1, … , 𝑛 (3)
Step 1: pick an arbitrary initial guess 𝑣 ∈ ℝ𝑛.

Step 2: compute a new vector 𝑣′ ∈ ℝ𝑛 via

𝑣′(𝑖) = max { 𝑤(𝑖)/(1 − 𝛽), 𝑐 + 𝛽 ∑_{1≤𝑗≤𝑛} 𝑣(𝑗)𝑞(𝑗) } for 𝑖 = 1, … , 𝑛 (4)
Step 3: calculate a measure of the deviation between 𝑣 and 𝑣′ , such as max𝑖 |𝑣(𝑖) − 𝑣′ (𝑖)|.
Step 4: if the deviation is larger than some fixed tolerance, set 𝑣 = 𝑣′ and go to step 2, else
continue.
Step 5: return 𝑣.
Let {𝑣𝑘 } denote the sequence generated by this algorithm.
This sequence converges to the solution to (3) as 𝑘 → ∞, which is the value function 𝑣∗ .
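As a sketch of this successive-approximation scheme, here is a plain-NumPy version on a small illustrative wage distribution (the three wage values and probabilities are assumptions, not the lecture's Beta-binomial default):

```python
import numpy as np

c, β = 25.0, 0.99
w = np.array([30.0, 50.0, 70.0])    # wage at each state (illustrative)
q = np.array([0.3, 0.4, 0.3])       # probability of each state (illustrative)

v = w / (1 - β)                     # initial guess: value of accepting everywhere
tol, error = 1e-6, 1.0
while error > tol:
    # Bellman update: max of accepting (w/(1-β)) and rejecting (c + β E[v])
    v_new = np.maximum(w / (1 - β), c + β * np.sum(v * q))
    error = np.max(np.abs(v_new - v))
    v = v_new

h = c + β * np.sum(v * q)           # continuation value at the fixed point
w_bar = (1 - β) * h                 # reservation wage
print(v, w_bar)
```

With these numbers the reservation wage lands between the middle and highest offers, so only the top wage is accepted.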
(𝑇 𝑣)(𝑖) = max { 𝑤(𝑖)/(1 − 𝛽), 𝑐 + 𝛽 ∑_{1≤𝑗≤𝑛} 𝑣(𝑗)𝑞(𝑗) } for 𝑖 = 1, … , 𝑛 (5)
(A new vector 𝑇 𝑣 is obtained from given vector 𝑣 by evaluating the r.h.s. at each 𝑖)
The element 𝑣𝑘 in the sequence {𝑣𝑘 } of successive approximations corresponds to 𝑇 𝑘 𝑣.
• This is 𝑇 applied 𝑘 times, starting at the initial guess 𝑣
One can show that the conditions of the Banach fixed point theorem are satisfied by 𝑇 on ℝ𝑛 .
One implication is that 𝑇 has a unique fixed point in ℝ𝑛 .
• That is, a unique vector 𝑣̄ such that 𝑇𝑣̄ = 𝑣̄.
Moreover, it’s immediate from the definition of 𝑇 that this fixed point is 𝑣∗ .
A second implication of the Banach contraction mapping theorem is that {𝑇 𝑘 𝑣} converges to
the fixed point 𝑣∗ regardless of 𝑣.
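A quick numerical check of the contraction property ‖𝑇𝑣₁ − 𝑇𝑣₂‖∞ ≤ 𝛽‖𝑣₁ − 𝑣₂‖∞ behind these claims, on illustrative primitives:

```python
import numpy as np

c, β = 25.0, 0.8                    # illustrative parameters
w = np.array([30.0, 50.0, 70.0])
q = np.array([0.3, 0.4, 0.3])

def T(v):
    "Bellman operator for the toy McCall model above."
    return np.maximum(w / (1 - β), c + β * np.sum(v * q))

rng = np.random.default_rng(0)
v1 = rng.uniform(0, 500, 3)         # two arbitrary points in R^3
v2 = rng.uniform(0, 500, 3)

lhs = np.max(np.abs(T(v1) - T(v2)))   # ||T v1 - T v2||_inf
rhs = β * np.max(np.abs(v1 - v2))     # β ||v1 - v2||_inf
print(lhs, rhs)                        # lhs is bounded by rhs
```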
20.4.3 Implementation
Our default for 𝑞, the distribution of the state process, will be Beta-binomial.
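The cell that constructed these defaults is not reproduced above; a sketch of an equivalent construction using SciPy (the parameter values `n=50, a=200, b=100` and the wage range `[10, 60]` follow the lecture's defaults, but treat them as assumptions here):

```python
import numpy as np
from scipy.stats import betabinom

# Beta-binomial distribution on {0, ..., n}, mapped onto n+1 wage values
n, a, b = 50, 200, 100
q_default = betabinom(n, a, b).pmf(np.arange(n + 1))   # probabilities q(s)

w_min, w_max = 10, 60
w_default = np.linspace(w_min, w_max, n + 1)           # wage at each state

print(q_default.sum(), w_default[:3])
```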
In [6]: mccall_data = [
('c', float64), # unemployment compensation
('β', float64), # discount factor
('w', float64[:]), # array of wage values, w[i] = wage at state i
('q', float64[:]) # array of probabilities
]
Here’s a class that stores the data and computes the values of state-action pairs, i.e. the value
in the maximum bracket on the right hand side of the Bellman equation (4), given the current
state and an arbitrary feasible action.
Default parameter values are embedded in the class.
In [7]: @jitclass(mccall_data)
        class McCallModel:

            def __init__(self, c=25, β=0.99, w=w_default, q=q_default):
                self.c, self.β = c, β
                self.w, self.q = w_default, q_default

            def state_action_values(self, i, v):
                "The values of state-action pairs: accept or reject offer i."
                c, β, w, q = self.c, self.β, self.w, self.q
                accept = w[i] / (1 - β)
                reject = c + β * np.sum(v * q)
                return np.array([accept, reject])
Based on these defaults, let’s try plotting the first few approximate value functions in the se-
quence {𝑇 𝑘 𝑣}.
We will start from guess 𝑣 given by 𝑣(𝑖) = 𝑤(𝑖)/(1 − 𝛽), which is the value of accepting at
every given wage.
Here’s a function to implement this:
"""
n = len(mcm.w)
v = mcm.w / (1 mcm.β)
v_next = np.empty_like(v)
for i in range(num_plots):
ax.plot(mcm.w, v, '', alpha=0.4, label=f"iterate {i}")
# Update guess
for i in range(n):
v_next[i] = np.max(mcm.state_action_values(i, v))
v[:] = v_next # copy contents into v
ax.legend(loc='lower right')
fig, ax = plt.subplots()
ax.set_xlabel('wage')
ax.set_ylabel('value')
plot_value_function_seq(mcm, ax)
plt.show()
You can see that convergence is occurring: successive iterates are getting closer together.
Here’s a more serious iteration effort to compute the limit, which continues until measured
deviation between successive iterates is below tol.
Once we obtain a good approximation to the limit, we will use it to calculate the reservation
wage.
We’ll be using JIT compilation via Numba to turbocharge our loops.
In [10]: @jit(nopython=True)
         def compute_reservation_wage(mcm,
                                      max_iter=500,
                                      tol=1e-6):

             # Simplify names
             c, β, w, q = mcm.c, mcm.β, mcm.w, mcm.q

             # == First compute the value function == #
             n = len(w)
             v = w / (1 - β)          # initial guess
             v_next = np.empty_like(v)
             i = 0
             error = tol + 1
             while i < max_iter and error > tol:
                 for j in range(n):
                     v_next[j] = np.max(mcm.state_action_values(j, v))
                 error = np.max(np.abs(v_next - v))
                 i += 1
                 v[:] = v_next

             # == Now compute the reservation wage == #
             return (1 - β) * (c + β * np.sum(v * q))
The next line computes the reservation wage at the default parameters
In [11]: compute_reservation_wage(mcm)
Out[11]: 47.316499710024964
Now that we know how to compute the reservation wage, let’s see how it varies with parameters.
In particular, let’s look at what happens when we change 𝛽 and 𝑐.
In [12]: grid_size = 25
         R = np.empty((grid_size, grid_size))

         c_vals = np.linspace(10.0, 30.0, grid_size)
         β_vals = np.linspace(0.9, 0.99, grid_size)
for i, c in enumerate(c_vals):
for j, β in enumerate(β_vals):
mcm = McCallModel(c=c, β=β)
R[i, j] = compute_reservation_wage(mcm)
ax.set_title("reservation wage")
ax.set_xlabel("$c$", fontsize=16)
ax.set_ylabel("$β$", fontsize=16)
ax.ticklabel_format(useOffset=False)
plt.show()
20.5. COMPUTING THE OPTIMAL POLICY: TAKE 2 343
As expected, the reservation wage increases both with patience and with unemployment com-
pensation.
The approach to dynamic programming just described is very standard and broadly applica-
ble.
For this particular problem, there’s also an easier way, which circumvents the need to com-
pute the value function.
Let ℎ denote the continuation value:

ℎ = 𝑐 + 𝛽 ∑_{𝑠′∈𝕊} 𝑣∗(𝑠′)𝑞(𝑠′) (6)

The Bellman equation can now be written as

𝑣∗(𝑠′) = max { 𝑤(𝑠′)/(1 − 𝛽), ℎ }

Substituting this last equation into (6) gives

ℎ = 𝑐 + 𝛽 ∑_{𝑠′∈𝕊} max { 𝑤(𝑠′)/(1 − 𝛽), ℎ } 𝑞(𝑠′) (7)
ℎ′ = 𝑐 + 𝛽 ∑_{𝑠′∈𝕊} max { 𝑤(𝑠′)/(1 − 𝛽), ℎ } 𝑞(𝑠′) (8)
In [14]: @jit(nopython=True)
         def compute_reservation_wage_two(mcm,
                                          max_iter=500,
                                          tol=1e-5):

             # Simplify names
             c, β, w, q = mcm.c, mcm.β, mcm.w, mcm.q

             # == First compute h == #
             h = np.sum(w * q) / (1 - β)
             i = 0
             error = tol + 1
             while i < max_iter and error > tol:
                 s = np.maximum(w / (1 - β), h)
                 h_next = c + β * np.sum(s * q)

                 error = np.abs(h_next - h)
                 i += 1

                 h = h_next

             # == Now compute the reservation wage == #
             return (1 - β) * h
20.6 Exercises
20.6.1 Exercise 1
Compute the average duration of unemployment when 𝛽 = 0.99 and 𝑐 takes the following
values
That is, start the agent off as unemployed, compute their reservation wage given the parame-
ters, and then simulate to see how long it takes to accept.
Repeat a large number of times and take the average.
Plot mean unemployment duration as a function of 𝑐 in c_vals.
20.6.2 Exercise 2
The purpose of this exercise is to show how to replace the discrete wage offer distribution
used above with a continuous distribution.
This is a significant topic because many convenient distributions are continuous (i.e., have a
density).
Fortunately, the theory changes little in our simple model.
Recall that ℎ in (6) denotes the value of not accepting a job in this period but then behaving optimally in all subsequent periods:

ℎ = 𝑐 + 𝛽 ∑_{𝑠′∈𝕊} 𝑣∗(𝑠′)𝑞(𝑠′)

To shift to a continuous offer distribution, we can replace (6) by
ℎ = 𝑐 + 𝛽 ∫ max { 𝑤(𝑠′)/(1 − 𝛽), ℎ } 𝑞(𝑠′)𝑑𝑠′ (10)
The aim is to solve this nonlinear equation by iteration, and from it obtain the reservation
wage.
Try to carry this out, setting
• the state sequence {𝑠𝑡 } to be IID and standard normal and
• the wage function to be 𝑤(𝑠) = exp(𝜇 + 𝜎𝑠).
You will need to implement a new version of the McCallModel class that assumes a lognormal
wage distribution.
Calculate the integral by Monte Carlo, by averaging over a large number of wage draws.
For default parameters, use c=25, β=0.99, σ=0.5, μ=2.5.
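As a hint for the Monte Carlo step, with the default parameters just listed, the integral on the right-hand side of (10) can be estimated by averaging over wage draws (ℎ here is an arbitrary candidate value, chosen only for illustration):

```python
import numpy as np

μ, σ, β = 2.5, 0.5, 0.99            # default parameters from the exercise
np.random.seed(1234)
w_draws = np.exp(μ + σ * np.random.randn(100_000))   # lognormal wage draws

h = 100.0   # an arbitrary candidate value for h

# Monte Carlo estimate of the integral ∫ max{w/(1-β), h} q(w) dw
integral = np.mean(np.maximum(w_draws / (1 - β), h))
print(integral)
```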
Once your code is working, investigate how the reservation wage changes with 𝑐 and 𝛽.
20.7 Solutions
20.7.1 Exercise 1
In [15]: cdf = np.cumsum(q_default)

@jit(nopython=True)
def compute_stopping_time(w_bar, seed=1234):
np.random.seed(seed)
t = 1
while True:
# Generate a wage draw
w = w_default[qe.random.draw(cdf)]
# Stop when the draw is above the reservation wage
if w >= w_bar:
stopping_time = t
break
else:
t += 1
return stopping_time
@jit(nopython=True)
def compute_mean_stopping_time(w_bar, num_reps=100000):
obs = np.empty(num_reps)
for i in range(num_reps):
obs[i] = compute_stopping_time(w_bar, seed=i)
return obs.mean()
fig, ax = plt.subplots()
plt.show()
20.7.2 Exercise 2
In [16]: mccall_data_continuous = [
('c', float64), # unemployment compensation
('β', float64), # discount factor
('σ', float64), # scale parameter in lognormal distribution
('μ', float64), # location parameter in lognormal distribution
('w_draws', float64[:]) # draws of wages for Monte Carlo
]
@jitclass(mccall_data_continuous)
class McCallModelContinuous:
@jit(nopython=True)
         def compute_reservation_wage_continuous(mcmc, max_iter=500, tol=1e-5):

             c, β, σ, μ, w_draws = mcmc.c, mcmc.β, mcmc.σ, mcmc.μ, mcmc.w_draws

             h = np.mean(w_draws) / (1 - β)  # initial guess
             i = 0
             error = tol + 1
             while i < max_iter and error > tol:
                 integral = np.mean(np.maximum(w_draws / (1 - β), h))
                 h_next = c + β * integral

                 error = np.abs(h_next - h)
                 i += 1

                 h = h_next

             # == Now compute the reservation wage == #
             return (1 - β) * h
In [17]: grid_size = 25
         R = np.empty((grid_size, grid_size))

         c_vals = np.linspace(10.0, 30.0, grid_size)
         β_vals = np.linspace(0.9, 0.99, grid_size)
for i, c in enumerate(c_vals):
for j, β in enumerate(β_vals):
mcmc = McCallModelContinuous(c=c, β=β)
R[i, j] = compute_reservation_wage_continuous(mcmc)
ax.set_title("reservation wage")
ax.set_xlabel("$c$", fontsize=16)
ax.set_ylabel("$β$", fontsize=16)
ax.ticklabel_format(useOffset=False)
plt.show()
Chapter 21
21.1 Contents
• Overview 21.2
• The Model 21.3
• Solving the Model 21.4
• Implementation 21.5
• Impact of Parameters 21.6
• Exercises 21.7
• Solutions 21.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
21.2 Overview
Previously we looked at the McCall job search model [80] as a way of understanding unem-
ployment and worker decisions.
One unrealistic feature of the model is that every job is permanent.
In this lecture, we extend the McCall model by introducing job separation.
Once separation enters the picture, the agent comes to view
• the loss of a job as a capital loss, and
• a spell of unemployment as an investment in searching for an acceptable job
The other minor addition is that a utility function will be included to make worker prefer-
ences slightly more sophisticated.
We’ll need the following imports
352 CHAPTER 21. JOB SEARCH II: SEARCH AND SEPARATION
𝔼 ∑_{𝑡=0}^{∞} 𝛽𝑡𝑢(𝑦𝑡) (1)
At this stage the only difference from the baseline model is that we’ve added some flexibility
to preferences by introducing a utility function 𝑢.
It satisfies 𝑢′ > 0 and 𝑢″ < 0.
For now we will drop the separation between the state process and the wage process that we maintained for the baseline model.
In particular, we simply suppose that wage offers {𝑤𝑡 } are IID with common distribution 𝑞.
The set of possible wage values is denoted by 𝕎.
(Later we will go back to having a separate state process {𝑠𝑡 } driving random outcomes, since
this formulation is usually convenient in more sophisticated models.)
If currently unemployed, the worker either accepts or rejects the current offer 𝑤𝑡 .
If he accepts, then he begins work immediately at wage 𝑤𝑡 .
If he rejects, then he receives unemployment compensation 𝑐.
The process then repeats.
(Note: we do not allow for job search while employed—this topic is taken up in a later lec-
ture)
We drop time subscripts in what follows and primes denote next period values.
Let
• 𝑣(𝑤𝑒 ) be total lifetime value accruing to a worker who enters the current period em-
ployed with existing wage 𝑤𝑒
• ℎ(𝑤) be total lifetime value accruing to a worker who enters the current period unemployed and receives wage offer 𝑤.
Here value means the value of the objective function (1) when the worker makes optimal deci-
sions at all future points in time.
Our first aim is to obtain these functions.
Suppose for now that the worker can calculate the functions 𝑣 and ℎ and use them in his de-
cision making.
Then 𝑣 and ℎ should satisfy

𝑣(𝑤) = 𝑢(𝑤) + 𝛽 [(1 − 𝛼)𝑣(𝑤) + 𝛼 ∑_{𝑤′∈𝕎} ℎ(𝑤′)𝑞(𝑤′)] (2)

and

ℎ(𝑤) = max { 𝑣(𝑤), 𝑢(𝑐) + 𝛽 ∑_{𝑤′∈𝕎} ℎ(𝑤′)𝑞(𝑤′) } (3)
Rather than jumping straight into solving these equations, let’s see if we can simplify them
somewhat.
(This process will be analogous to our second pass at the plain vanilla McCall model, where
we simplified the Bellman equation.)
First, let

𝑑 ∶= ∑_{𝑤′∈𝕎} ℎ(𝑤′)𝑞(𝑤′) (4)

With this notation, the pair of Bellman equations can be written as

𝑣(𝑤) = 𝑢(𝑤) + 𝛽 [(1 − 𝛼)𝑣(𝑤) + 𝛼𝑑] (5)

and

𝑑 = ∑_{𝑤′∈𝕎} max { 𝑣(𝑤′), 𝑢(𝑐) + 𝛽𝑑 } 𝑞(𝑤′) (6)
It is clear that 𝑣 is (at least weakly) increasing in 𝑤, since the agent is never made worse off
by a higher wage offer.
Hence, we can express the optimal choice as accepting wage offer 𝑤 if and only if 𝑣(𝑤) ≥ 𝑢(𝑐) + 𝛽𝑑.
We’ll use the same iterative approach to solving the Bellman equations that we adopted in
the first job search lecture.
Here this amounts to

1. make guesses for 𝑣 and 𝑑

2. plug these guesses into the right-hand sides of (5) and (6)

3. update the left-hand sides from this rule and then repeat
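The three steps above can be sketched in plain NumPy before turning to the jitted implementation (the wage values, probabilities, and parameters below are illustrative assumptions):

```python
import numpy as np

α, β, c = 0.2, 0.98, 6.0            # separation rate, discount factor, compensation
w = np.array([10.0, 20.0, 30.0])    # illustrative wage values
q = np.array([0.3, 0.4, 0.3])       # illustrative offer probabilities
u = np.log                          # log utility for simplicity

v = np.ones_like(w)                 # step 1: guess for v
d = 1.0                             # step 1: guess for d
for _ in range(2000):
    # step 2: plug (v, d) into the right-hand sides of the two equations
    v_new = u(w) + β * ((1 - α) * v + α * d)
    d_new = np.sum(np.maximum(v, u(c) + β * d) * q)
    # step 3: update and repeat
    v, d = v_new, d_new

print(v, d)
```

The iterates converge because the update is a contraction; the resulting 𝑣 is increasing in 𝑤, as argued below.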
21.5 Implementation
In [3]: @njit
def u(c, σ=2.0):
            return (c**(1 - σ) - 1) / (1 - σ)
Also, here’s a default wage distribution, based around the BetaBinomial distribution:
Here’s our jitted class for the McCall model with separation.
In [5]: mccall_data = [
('α', float64), # job separation rate
('β', float64), # discount factor
('c', float64), # unemployment compensation
('w', float64[:]), # list of wage values
('q', float64[:]) # pmf of random variable w
]
        @jitclass(mccall_data)
        class McCallModel:
            """
            Stores the parameters and functions associated with a given model.
            """

            def __init__(self, α=0.2, β=0.98, c=6.0, w=w_default, q=q_default):
                self.α, self.β, self.c = α, β, c
                self.w, self.q = w_default, q_default

            def update(self, v, d):
                "One step of the iteration on the Bellman equations (5)-(6)."
                α, β, c, w, q = self.α, self.β, self.c, self.w, self.q

                v_new = np.empty_like(v)
                for i in range(len(w)):
                    v_new[i] = u(w[i]) + β * ((1 - α) * v[i] + α * d)

                d_new = np.sum(np.maximum(v, u(c) + β * d) * q)

                return v_new, d_new
Now we iterate until successive realizations are closer together than some small tolerance
level.
We then return the current iterate as an approximate solution.
In [6]: @njit
        def solve_model(mcm, tol=1e-5, max_iter=2000):
            """
            Iterates to convergence on the Bellman equations
            """
            v = np.ones_like(mcm.w)  # initial guess of v
            d = 1                    # initial guess of d
            i = 0
            error = tol + 1

            while error > tol and i < max_iter:
                v_new, d_new = mcm.update(v, d)
                error = max(np.max(np.abs(v_new - v)), np.abs(d_new - d))
                v, d = v_new, d_new
                i += 1

            return v, d
fig, ax = plt.subplots()
plt.show()
The value 𝑣 is increasing because higher 𝑤 generates a higher wage flow conditional on stay-
ing employed.
In [8]: @njit
def compute_reservation_wage(mcm):
"""
Computes the reservation wage of an instance of the McCall model
by finding the smallest w such that v(w) >= h.
v, d = solve_model(mcm)
h = u(mcm.c) + mcm.β * d
w_bar = np.inf
for i, wage in enumerate(mcm.w):
                if v[i] >= h:
w_bar = wage
break
return w_bar
Next we will investigate how the reservation wage varies with parameters.
21.6. IMPACT OF PARAMETERS 359
In each instance below, we’ll show you a figure and then ask you to reproduce it in the exer-
cises.
As expected, higher unemployment compensation causes the worker to hold out for higher
wages.
In effect, the cost of continuing job search is reduced.
Again, the results are intuitive: More patient workers will hold out for higher wages.
Finally, let’s look at how 𝑤̄ varies with the job separation rate 𝛼.
Higher 𝛼 translates to a greater chance that a worker will face termination in each period
once employed.
21.7 Exercises
21.7.1 Exercise 1
In [9]: grid_size = 25
c_vals = np.linspace(2, 12, grid_size) # unemployment compensation
beta_vals = np.linspace(0.8, 0.99, grid_size) # discount factors
alpha_vals = np.linspace(0.05, 0.5, grid_size) # separation rate
21.8 Solutions
21.8.1 Exercise 1
w_bar_vals = np.empty_like(c_vals)
fig, ax = plt.subplots()
for i, c in enumerate(c_vals):
mcm.c = c
w_bar = compute_reservation_wage(mcm)
w_bar_vals[i] = w_bar
ax.set(xlabel='unemployment compensation',
ylabel='reservation wage')
ax.plot(c_vals, w_bar_vals, label=r'$\bar w$ as a function of $c$')
ax.legend()
plt.show()
for i, β in enumerate(beta_vals):
mcm.β = β
w_bar = compute_reservation_wage(mcm)
w_bar_vals[i] = w_bar
plt.show()
for i, α in enumerate(alpha_vals):
mcm.α = α
w_bar = compute_reservation_wage(mcm)
w_bar_vals[i] = w_bar
plt.show()
Chapter 22
22.1 Contents
• Overview 22.2
• The Algorithm 22.3
• Implementation 22.4
• Exercises 22.5
• Solutions 22.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
22.2 Overview
In this lecture we again study the McCall job search model with separation, but now with a
continuous wage distribution.
While we already considered continuous wage distributions briefly in the exercises of the first
job search lecture, the change was relatively trivial in that case.
This is because we were able to reduce the problem to solving for a single scalar value (the
continuation value).
Here, with separation, the change is less trivial, since a continuous wage distribution leads to
an uncountably infinite state space.
The infinite state space leads to additional challenges, particularly when it comes to applying
value function iteration (VFI).
These challenges will lead us to modify VFI by adding an interpolation step.
The combination of VFI and this interpolation step is called fitted value function itera-
tion (fitted VFI).
Fitted VFI is very common in practice, so we will take some time to work through the de-
tails.
366 CHAPTER 22. JOB SEARCH III: FITTED VALUE FUNCTION ITERATION
import quantecon as qe
from interpolation import interp
from numpy.random import randn
from numba import njit, jitclass, prange, float64, int32
The model is the same as the McCall model with job separation we studied before, except
that the wage offer distribution is continuous.
We are going to start with the two Bellman equations we obtained for the model with job
separation after a simplifying transformation.
Modified to accommodate continuous wage draws, they take the following form:

𝑑 = ∫ max { 𝑣(𝑤′), 𝑢(𝑐) + 𝛽𝑑 } 𝑞(𝑤′)𝑑𝑤′ (1)

and

𝑣(𝑤) = 𝑢(𝑤) + 𝛽 [(1 − 𝛼)𝑣(𝑤) + 𝛼𝑑] (2)
The differences from the discrete case are:

1. in (1), what used to be a sum over a finite number of wage values is an integral over an infinite set.

2. the function 𝑣 in (2) is defined over all 𝑤 ∈ ℝ+.

The basic idea of value function iteration carries over:

1. Begin with guesses 𝑣, 𝑑 for the solutions to (1)–(2).

2. Plug 𝑣, 𝑑 into the right hand side of (1)–(2) and compute the left hand side to obtain
updates 𝑣′, 𝑑′

3. Unless some stopping condition is satisfied, set (𝑣, 𝑑) = (𝑣′, 𝑑′) and go to step 2.
However, there is a problem we must confront before we implement this procedure: The iter-
ates of the value function can neither be calculated exactly nor stored on a computer.
To see the issue, consider (2).
Even if 𝑣 is a known function, the only way to store its update 𝑣′ is to record its value 𝑣′ (𝑤)
for every 𝑤 ∈ ℝ+ .
Clearly, this is impossible.
1. Begin with an array v representing the values of an initial guess of the value function on
some grid points {𝑤𝑖 }.
2. Build a function 𝑣 on the state space ℝ+ by interpolation or approximation, based on v
and {𝑤𝑖 }.
3. Obtain and record the samples of the updated function 𝑣′ (𝑤𝑖 ) on each grid point 𝑤𝑖 .
4. Unless some stopping condition is satisfied, take this as the new array and go to step 1.
1. combines well with value function iteration (see, e.g., [44] or [100]) and
2. preserves useful shape properties such as monotonicity and concavity/convexity.
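As a quick illustration of the second property, NumPy's `np.interp` builds a piecewise linear interpolant, and monotone grid values produce a monotone approximating function:

```python
import numpy as np

grid = np.linspace(0, 5, 8)          # grid points {w_i}
vals = np.sqrt(grid)                 # monotone (and concave) values on the grid

x = np.linspace(0, 5, 200)
approx = np.interp(x, grid, vals)    # piecewise linear interpolant

is_monotone = bool(np.all(np.diff(approx) >= 0))
print(is_monotone)
```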
In [3]: def f(x):
            y1 = 2 * np.cos(6 * x) + np.sin(14 * x)
            return y1 + 2.5

        c_grid = np.linspace(0, 1, 6)
        f_grid = np.linspace(0, 1, 150)

        def Af(x):
            return interp(c_grid, f(c_grid), x)

        fig, ax = plt.subplots()
        ax.plot(f_grid, f(f_grid), 'b-', label='true function')
        ax.plot(f_grid, Af(f_grid), 'g-', label='linear approximation')
        ax.legend(loc="upper center")
        plt.show()
22.4 Implementation
The first step is to build a jitted class for the McCall model with separation and a continuous
wage offer distribution.
We will take the utility function to be the log function for this application, with 𝑢(𝑐) = ln 𝑐.
We will adopt the lognormal distribution for wages, with 𝑤 = exp(𝜇 + 𝜎𝑧) when 𝑧 is standard
normal and 𝜇, 𝜎 are parameters.
In [4]: @njit
def lognormal_draws(n=1000, μ=2.5, σ=0.5, seed=1234):
np.random.seed(seed)
z = np.random.randn(n)
w_draws = np.exp(μ + σ * z)
return w_draws
In [5]: mccall_data_continuous = [
('c', float64), # unemployment compensation
('α', float64), # job separation rate
('β', float64), # discount factor
('σ', float64), # scale parameter in lognormal distribution
('μ', float64), # location parameter in lognormal distribution
('w_grid', float64[:]), # grid of points for fitted VFI
('w_draws', float64[:]) # draws of wages for Monte Carlo
]
@jitclass(mccall_data_continuous)
class McCallModelContinuous:
def __init__(self,
c=1,
α=0.1,
β=0.96,
grid_min=1e-10,
grid_max=5,
grid_size=100,
w_draws=lognormal_draws()):
# Simplify names
c, α, β, σ, μ = self.c, self.α, self.β, self.σ, self.μ
w = self.w_grid
u = lambda x: np.log(x)
# Update v
v_new = u(w) + β * ((1 - α) * v + α * d)
370 CHAPTER 22. JOB SEARCH III: FITTED VALUE FUNCTION ITERATION
In [6]: @njit
def solve_model(mcm, tol=1e-5, max_iter=2000):
"""
Iterates to convergence on the Bellman equations
"""
return v, d
In [7]: @njit
def compute_reservation_wage(mcm):
"""
Computes the reservation wage of an instance of the McCall model
by finding the smallest w such that v(w) >= h.
"""
v, d = solve_model(mcm)
h = u(mcm.c) + mcm.β * d
w_bar = np.inf
for i, wage in enumerate(mcm.w_grid):
if v[i] > h:
w_bar = wage
break
return w_bar
The exercises ask you to explore the solution and how it changes with parameters.
22.5 Exercises
22.5.1 Exercise 1
Use the code above to explore what happens to the reservation wage when the wage parame-
ter 𝜇 changes.
Use the default parameters and 𝜇 in mu_vals = np.linspace(0.0, 2.0, 15).
Is the impact on the reservation wage as you expected?
22.5.2 Exercise 2
22.6 Solutions
22.6.1 Exercise 1
fig, ax = plt.subplots()
for i, m in enumerate(mu_vals):
mcm.w_draws = lognormal_draws(μ=m)
w_bar = compute_reservation_wage(mcm)
w_bar_vals[i] = w_bar
plt.show()
Not surprisingly, the agent is more inclined to wait when the distribution of offers shifts to
the right.
22.6.2 Exercise 2
fig, ax = plt.subplots()
for i, s in enumerate(s_vals):
a, b = m - s, m + s
mcm.w_draws = np.random.uniform(low=a, high=b, size=10_000)
w_bar = compute_reservation_wage(mcm)
w_bar_vals[i] = w_bar
plt.show()
Chapter 23
Job Search IV: Correlated Wage Offers
23.1 Contents
• Overview 23.2
• The Model 23.3
• Implementation 23.4
• Unemployment Duration 23.5
• Exercises 23.6
• Solutions 23.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
23.2 Overview
In this lecture we solve a McCall style job search model with persistent and transitory com-
ponents to wages.
In other words, we relax the unrealistic assumption that randomness in wages is independent
over time.
At the same time, we will go back to assuming that jobs are permanent and no separation
occurs.
This is to keep the model relatively simple as we study the impact of correlation.
We will use the following imports:
import quantecon as qe
from interpolation import interp
from numpy.random import randn
from numba import njit, jitclass, prange, float64
𝑤𝑡 = exp(𝑧𝑡 ) + 𝑦𝑡
where
𝑦𝑡 = exp(𝜇 + 𝑠𝜁𝑡 ) and 𝑧𝑡+1 = 𝑑 + 𝜌𝑧𝑡 + 𝜎𝜖𝑡+1
Here {𝜁𝑡 } and {𝜖𝑡 } are both IID and standard normal.
Here {𝑦𝑡 } is a transitory component and {𝑧𝑡 } is persistent.
As before, the worker can either
𝑣∗ (𝑤, 𝑧) = max {𝑢(𝑤)/(1 − 𝛽), 𝑢(𝑐) + 𝛽 𝔼𝑧 𝑣∗ (𝑤′ , 𝑧 ′ )}
In this expression, 𝑢 is a utility function and 𝔼𝑧 is expectation of next period variables given current 𝑧.
The variable 𝑧 enters as a state in the Bellman equation because its current value helps pre-
dict future wages.
23.3.1 A Simplification
There is a way that we can reduce dimensionality in this problem, which greatly accelerates
computation.
To start, let 𝑓 ∗ be the continuation value function, defined by
𝑣∗ (𝑤, 𝑧) = max {𝑢(𝑤)/(1 − 𝛽), 𝑓 ∗ (𝑧)}
Combining the last two expressions, we see that the continuation value function satisfies
𝑓 ∗ (𝑧) = 𝑢(𝑐) + 𝛽 𝔼𝑧 max {𝑢(𝑤′ )/(1 − 𝛽), 𝑓 ∗ (𝑧 ′ )}
𝑄𝑓(𝑧) = 𝑢(𝑐) + 𝛽 𝔼𝑧 max {𝑢(𝑤′ )/(1 − 𝛽), 𝑓(𝑧 ′ )}
The worker accepts the current offer 𝑤 when the stopping value exceeds the continuation value, that is, when
𝑢(𝑤)/(1 − 𝛽) ≥ 𝑓 ∗ (𝑧)
With 𝑢 = ln, this is equivalent to 𝑤 ≥ 𝑤̄(𝑧), where
𝑤̄(𝑧) ∶= exp(𝑓 ∗ (𝑧)(1 − 𝛽)) (1)
Our main aim is to solve for the reservation rule and study its properties and implications.
23.4 Implementation
In [3]: job_search_data = [
('μ', float64), # transient shock log mean
('s', float64), # transient shock log variance
('d', float64), # shift coefficient of persistent state
('ρ', float64), # correlation coefficient of persistent state
('σ', float64), # state volatility
('β', float64), # discount factor
Here’s a class that stores the data and the right hand side of the Bellman equation.
Default parameter values are embedded in the class.
In [4]: @jitclass(job_search_data)
class JobSearch:
def __init__(self,
μ=0.0, # transient shock log mean
s=1.0, # transient shock log variance
d=0.0, # shift coefficient of persistent state
ρ=0.9, # correlation coefficient of persistent state
σ=0.1, # state volatility
β=0.98, # discount factor
c=5, # unemployment compensation
mc_size=1000,
grid_size=100):
# Set up grid
z_mean = d / (1 - ρ)
z_sd = np.sqrt(σ / (1 - ρ**2))
k = 3 # std devs from mean
a, b = z_mean - k * z_sd, z_mean + k * z_sd
self.z_grid = np.linspace(a, b, grid_size)
def parameters(self):
"""
Return all parameters as a tuple.
"""
return self.μ, self.s, self.d, \
self.ρ, self.σ, self.β, self.c
In [5]: @njit(parallel=True)
def Q(js, f_in, f_out):
"""
Apply the operator Q.
* js is an instance of JobSearch
* f_in and f_out are arrays that represent f and Qf respectively
"""
μ, s, d, ρ, σ, β, c = js.parameters()
M = js.e_draws.shape[1]
for i in prange(len(js.z_grid)):
z = js.z_grid[i]
expectation = 0.0
for m in range(M):
e1, e2 = js.e_draws[:, m]
z_next = d + ρ * z + σ * e1
go_val = interp(js.z_grid, f_in, z_next) # f(z')
y_next = np.exp(μ + s * e2) # y' draw
w_next = np.exp(z_next) + y_next # w' draw
stop_val = np.log(w_next) / (1 - β)
expectation += max(stop_val, go_val)
expectation = expectation / M
f_out[i] = np.log(c) + β * expectation
# Set up loop
f_in = f_init
f_out = np.empty_like(f_in)
i = 0
error = tol + 1
while i < max_iter and error > tol:
    Q(js, f_in, f_out)
    error = np.max(np.abs(f_in - f_out))
    i += 1
    f_in[:] = f_out
if i == max_iter:
    print("Failed to converge!")
return f_out
In [7]: js = JobSearch()
qe.tic()
Out[7]: 6.979816436767578
Next we will compute and plot the reservation wage function defined in (1).
fig, ax = plt.subplots()
ax.plot(js.z_grid, res_wage_function, label="reservation wage given $z$")
ax.set(xlabel="$z$", ylabel="wage")
ax.legend()
plt.show()
In [9]: c_vals = 1, 2, 3
fig, ax = plt.subplots()
for c in c_vals:
js = JobSearch(c=c)
f_star = compute_fixed_point(js, verbose=False)
res_wage_function = np.exp(f_star * (1 - js.β))
ax.plot(js.z_grid, res_wage_function, label=f"$\\bar w$ at $c = {c}$")
ax.set(xlabel="$z$", ylabel="wage")
ax.legend()
plt.show()
As expected, higher unemployment compensation shifts the reservation wage up at all state
values.
Next we study how mean unemployment duration varies with unemployment compensation.
For simplicity we’ll fix the initial state at 𝑧𝑡 = 0.
@njit
def f_star_function(z):
return interp(z_grid, f_star, z)
@njit
def draw_tau(t_max=10_000):
z = 0
t = 0
unemployed = True
while unemployed and t < t_max:
# draw current wage
y = np.exp(μ + s * np.random.randn())
w = np.exp(z) + y
res_wage = np.exp(f_star_function(z) * (1 - β))
# if optimal to stop, record t
if w >= res_wage:
unemployed = False
τ = t
# else increment data and state
else:
z = ρ * z + d + σ * np.random.randn()
t += 1
return τ
@njit(parallel=True)
def compute_expected_tau(num_reps=100_000):
sum_value = 0
for i in prange(num_reps):
sum_value += draw_tau()
return sum_value / num_reps
return compute_expected_tau()
Let’s test this out with some possible values for unemployment compensation.
23.6 Exercises
23.6.1 Exercise 1
Investigate how mean unemployment duration varies with the discount factor 𝛽.
• What is your prior expectation?
• Do your results match up?
23.7 Solutions
23.7.1 Exercise 1
ax.set_xlabel("$\\beta$")
ax.set_ylabel("mean unemployment duration")
plt.show()
The figure shows that more patient individuals tend to wait longer before accepting an offer.
Chapter 24
Job Search V: Modeling Career Choice
24.1 Contents
• Overview 24.2
• Model 24.3
• Implementation 24.4
• Exercises 24.5
• Solutions 24.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
24.2 Overview
/usr/share/miniconda3/envs/qelectures/lib/python3.7/site
packages/numba/np/ufunc/parallel.py:355: NumbaWarning: The TBB threading layer
requires TBB version 2019.5 or later i.e., TBB_INTERFACE_VERSION >= 11005. Found
TBB_INTERFACE_VERSION = 9107. The TBB threading layer is disabled.
warnings.warn(problem)
• Career and job within career both chosen to maximize expected discounted wage flow.
• Infinite horizon dynamic programming with two state variables.
24.3 Model
𝔼 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑤𝑡 (1)
where
𝐼 = 𝜃 + 𝜖 + 𝛽𝑣(𝜃, 𝜖)
𝐼𝐼 = 𝜃 + ∫ 𝜖′ 𝐺(𝑑𝜖′ ) + 𝛽 ∫ 𝑣(𝜃, 𝜖′ )𝐺(𝑑𝜖′ )
𝐼𝐼𝐼 = ∫ 𝜃′ 𝐹 (𝑑𝜃′ ) + ∫ 𝜖′ 𝐺(𝑑𝜖′ ) + 𝛽 ∫ ∫ 𝑣(𝜃′ , 𝜖′ )𝐺(𝑑𝜖′ )𝐹 (𝑑𝜃′ ) (2)
Evidently 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 correspond to “stay put”, “new job” and “new life”, respectively.
24.3.1 Parameterization
As in [72], section 6.5, we will focus on a discrete version of the model, parameterized as fol-
lows:
• both 𝜃 and 𝜖 take values in the set np.linspace(0, B, grid_size) — an even grid of
points between 0 and 𝐵 inclusive
• grid_size = 50
• B = 5
• β = 0.95
The distributions 𝐹 and 𝐺 are discrete distributions generating draws from the grid points
np.linspace(0, B, grid_size).
A very useful family of discrete distributions is the Beta-binomial family, with probability
mass function
𝑝(𝑘 | 𝑛, 𝑎, 𝑏) = (𝑛 choose 𝑘) 𝐵(𝑘 + 𝑎, 𝑛 − 𝑘 + 𝑏)/𝐵(𝑎, 𝑏), 𝑘 = 0, … , 𝑛
Interpretation:
• draw 𝑞 from a Beta distribution with shape parameters (𝑎, 𝑏)
• run 𝑛 independent binary trials, each with success probability 𝑞
• 𝑝(𝑘 | 𝑛, 𝑎, 𝑏) is the probability of 𝑘 successes in these 𝑛 trials
Nice properties:
• very flexible class of distributions, including uniform, symmetric unimodal, etc.
• only three parameters
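The plotting cell below relies on a helper gen_probs(n, a, b) returning this pmf as an array; the defining cell does not appear here, so the implementation below is our own minimal sketch, written with log-gamma functions for numerical stability.

```python
import numpy as np
from math import lgamma, exp

def gen_probs(n, a, b):
    # Beta-binomial pmf: p(k | n, a, b) = C(n, k) B(k+a, n-k+b) / B(a, b)
    def log_choose(n, k):
        return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    def log_beta(a, b):
        return lgamma(a) + lgamma(b) - lgamma(a + b)
    return np.array([exp(log_choose(n, k)
                         + log_beta(k + a, n - k + b)
                         - log_beta(a, b))
                     for k in range(n + 1)])
```

With a = b = 1 the distribution is uniform on {0, …, n}, one of the special cases mentioned above.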
Here’s a figure showing the effect on the pmf of different shape parameters when 𝑛 = 50.
n = 50
a_vals = [0.5, 1, 100]
b_vals = [0.5, 1, 100]
fig, ax = plt.subplots(figsize=(10, 6))
for a, b in zip(a_vals, b_vals):
ab_label = f'$a = {a:.1f}$, $b = {b:.1f}$'
ax.plot(list(range(0, n+1)), gen_probs(n, a, b), 'o', label=ab_label)
ax.legend()
plt.show()
24.4 Implementation
We will first create a class CareerWorkerProblem which will hold the default parameterizations
of the model and an initial guess for the value function.
def __init__(self,
B=5.0, # Upper bound
β=0.95, # Discount factor
grid_size=50, # Grid size
F_a=1,
F_b=1,
G_a=1,
G_b=1):
The following function takes an instance of CareerWorkerProblem and returns the correspond-
ing Bellman operator 𝑇 and the greedy policy function.
In this model, 𝑇 is defined by 𝑇 𝑣(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}, where 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 are as given in
(2).
"""
Returns jitted versions of the Bellman operator and the
greedy policy function
cw is an instance of ``CareerWorkerProblem``
"""
@njit(parallel=parallel_flag)
def T(v):
"The Bellman operator"
v_new = np.empty_like(v)
for i in prange(len(v)):
for j in prange(len(v)):
v1 = θ[i] + ϵ[j] + β * v[i, j] # Stay put
v2 = θ[i] + G_mean + β * v[i, :] @ G_probs # New job
v3 = G_mean + F_mean + β * F_probs @ v @ G_probs # New life
v_new[i, j] = max(v1, v2, v3)
return v_new
@njit
def get_greedy(v):
"Computes the v-greedy policy"
σ = np.empty(v.shape)
for i in range(len(v)):
for j in range(len(v)):
v1 = θ[i] + ϵ[j] + β * v[i, j]
v2 = θ[i] + G_mean + β * v[i, :] @ G_probs
v3 = G_mean + F_mean + β * F_probs @ v @ G_probs
if v1 > max(v2, v3):
action = 1
elif v2 > max(v1, v3):
action = 2
else:
action = 3
σ[i, j] = action
return σ
return T, get_greedy
Lastly, solve_model will take an instance of CareerWorkerProblem and iterate using the Bell-
man operator to find the fixed point of the value function.
verbose=True,
print_skip=25):
T, _ = operator_factory(cw, parallel_flag=use_parallel)
# Set up loop
v = np.ones((cw.grid_size, cw.grid_size)) * 100 # Initial guess
i = 0
error = tol + 1
while i < max_iter and error > tol:
    v_new = T(v)
    error = np.max(np.abs(v_new - v))
    i += 1
    if verbose and i % print_skip == 0:
        print(f"Error at iteration {i} is {error}.")
    v = v_new
if i == max_iter:
    print("Failed to converge!")
return v_new
In [7]: cw = CareerWorkerProblem()
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)
Interpretation:
• If both job and career are poor or mediocre, the worker will experiment with a new job
and new career.
• If career is sufficiently good, the worker will hold it and experiment with new jobs until
a sufficiently good one is found.
• If both job and career are good, the worker will stay put.
Notice that the worker will always hold on to a sufficiently good career, but not necessarily
hold on to even the best paying job.
The reason is that high lifetime wages require both variables to be large, and the worker can-
not change careers without changing jobs.
• Sometimes a good job must be sacrificed in order to change to a better career.
24.5 Exercises
24.5.1 Exercise 1
Using the default parameterization in the class CareerWorkerProblem, generate and plot typi-
cal sample paths for 𝜃 and 𝜖 when the worker follows the optimal policy.
In particular, modulo randomness, reproduce the following figure (where the horizontal axis
represents time)
Hint: To generate the draws from the distributions 𝐹 and 𝐺, use quantecon.random.draw().
24.5.2 Exercise 2
Let’s now consider how long it takes for the worker to settle down to a permanent job, given
a starting point of (𝜃, 𝜖) = (0, 0).
In other words, we want to study the distribution of the random variable
𝑇 ∗ ∶= the first point in time from which the worker’s job no longer changes
Evidently, the worker’s job becomes permanent if and only if (𝜃𝑡 , 𝜖𝑡 ) enters the “stay put”
region of (𝜃, 𝜖) space.
Letting 𝑆 denote this region, 𝑇 ∗ can be expressed as the first passage time to 𝑆 under the
optimal policy:
𝑇 ∗ ∶= inf{𝑡 ≥ 0 | (𝜃𝑡 , 𝜖𝑡 ) ∈ 𝑆}
Collect 25,000 draws of this random variable and compute the median (which should be
about 7).
Repeat the exercise with 𝛽 = 0.99 and interpret the change.
24.5.3 Exercise 3
Set the parameterization to G_a = G_b = 100 and generate a new optimal policy figure – in-
terpret.
24.6 Solutions
24.6.1 Exercise 1
In [9]: F = np.cumsum(cw.F_probs)
G = np.cumsum(cw.G_probs)
v_star = solve_model(cw, verbose=False)
T, get_greedy = operator_factory(cw)
greedy_star = get_greedy(v_star)
plt.legend()
plt.show()
24.6.2 Exercise 2
In [10]: cw = CareerWorkerProblem()
F = np.cumsum(cw.F_probs)
G = np.cumsum(cw.G_probs)
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)
@njit
def passage_time(optimal_policy, F, G):
t = 0
i = j = 0
while True:
if optimal_policy[i, j] == 1: # Stay put
return t
elif optimal_policy[i, j] == 2: # New job
j = int(qe.random.draw(G))
else: # New life
i, j = int(qe.random.draw(F)), int(qe.random.draw(G))
t += 1
@njit(parallel=True)
def median_time(optimal_policy, F, G, M=25000):
samples = np.empty(M)
for i in prange(M):
samples[i] = passage_time(optimal_policy, F, G)
return np.median(samples)
median_time(greedy_star, F, G)
Out[10]: 7.0
To compute the median with 𝛽 = 0.99 instead of the default value 𝛽 = 0.95, replace cw =
CareerWorkerProblem() with cw = CareerWorkerProblem(β=0.99).
The medians are subject to randomness but should be about 7 and 14 respectively.
Not surprisingly, more patient workers will wait longer to settle down to their final job.
24.6.3 Exercise 3
In the new figure, you see that the region for which the worker stays put has grown because
the distribution for 𝜖 has become more concentrated around the mean, making high-paying
jobs less realistic.
Chapter 25
Job Search VI: On-the-Job Search
25.1 Contents
• Overview 25.2
• Model 25.3
• Implementation 25.4
• Solving for Policies 25.5
• Exercises 25.6
• Solutions 25.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
25.2 Overview
25.3 Model
Let 𝑥𝑡 denote the time-𝑡 job-specific human capital of a worker employed at a given firm and
let 𝑤𝑡 denote current wages.
Let 𝑤𝑡 = 𝑥𝑡 (1 − 𝑠𝑡 − 𝜙𝑡 ), where
• 𝜙𝑡 is investment in job-specific human capital for the current role and
• 𝑠𝑡 is search effort, devoted to obtaining new offers from other firms.
For as long as the worker remains in the current job, evolution of {𝑥𝑡 } is given by 𝑥𝑡+1 =
𝑔(𝑥𝑡 , 𝜙𝑡 ).
When search effort at 𝑡 is 𝑠𝑡 , the worker receives a new job offer with probability 𝜋(𝑠𝑡 ) ∈
[0, 1].
The value of the offer, measured in job-specific human capital, is 𝑢𝑡+1 , where {𝑢𝑡 } is IID with
common distribution 𝑓.
The worker can reject the current offer and continue with existing job.
Hence 𝑥𝑡+1 = 𝑢𝑡+1 if he/she accepts and 𝑥𝑡+1 = 𝑔(𝑥𝑡 , 𝜙𝑡 ) otherwise.
Let 𝑏𝑡+1 ∈ {0, 1} be a binary random variable, where 𝑏𝑡+1 = 1 indicates that the worker
receives an offer at the end of time 𝑡.
We can write
𝑥𝑡+1 = (1 − 𝑏𝑡+1 )𝑔(𝑥𝑡 , 𝜙𝑡 ) + 𝑏𝑡+1 max{𝑔(𝑥𝑡 , 𝜙𝑡 ), 𝑢𝑡+1 } (1)
Agent’s objective: maximize expected discounted sum of wages via controls {𝑠𝑡 } and {𝜙𝑡 }.
Taking the expectation of 𝑣(𝑥𝑡+1 ) and using (1), the Bellman equation for this problem can
be written as
𝑣(𝑥) = max_{𝑠+𝜙≤1} {𝑥(1 − 𝑠 − 𝜙) + 𝛽(1 − 𝜋(𝑠))𝑣[𝑔(𝑥, 𝜙)] + 𝛽𝜋(𝑠) ∫ 𝑣[𝑔(𝑥, 𝜙) ∨ 𝑢]𝑓(𝑑𝑢)} (2)
25.3.1 Parameterization
𝑔(𝑥, 𝜙) = 𝐴(𝑥𝜙)^𝛼 , 𝜋(𝑠) = √𝑠 and 𝑓 = Beta(2, 2)
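Under this parameterization, a single step of the law of motion (1) can be simulated directly (a sketch with names of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(1234)
A, α = 1.4, 0.6
g = lambda x, ϕ: A * (x * ϕ)**α      # next-period capital absent an offer
π = np.sqrt                          # offer arrival probability π(s) = √s

def step(x, s, ϕ):
    """One draw of x_{t+1} from the law of motion (1)."""
    if rng.random() < π(s):          # b_{t+1} = 1: an offer u_{t+1} arrives
        u = rng.beta(2, 2)           # offer value, u ~ f = Beta(2, 2)
        return max(g(x, ϕ), u)       # worker keeps the better of the two
    return g(x, ϕ)                   # b_{t+1} = 0: no offer this period
```

With s = 0 no offers arrive and capital evolves deterministically via g; with s = 1 an offer arrives every period and next-period capital can never fall below g(x, ϕ).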
Before we solve the model, let’s make some quick calculations that provide intuition on what
the solution should look like.
To begin, observe that the worker has two instruments to build capital and hence wages:
1. investment in job-specific human capital via 𝜙 and
2. search for a new job with better job-specific capital match via 𝑠
Since wages are 𝑥(1 − 𝑠 − 𝜙), marginal cost of investment via either 𝜙 or 𝑠 is identical.
Our risk-neutral worker should focus on whatever instrument has the highest expected return.
The relative expected return will depend on 𝑥.
For example, suppose first that 𝑥 = 0.05
• If 𝑠 = 1 and 𝜙 = 0, then since 𝑔(𝑥, 𝜙) = 0, taking expectations of (1) gives expected
next period capital equal to 𝜋(𝑠)𝔼𝑢 = 𝔼𝑢 = 0.5.
• If 𝑠 = 0 and 𝜙 = 1, then next period capital is 𝑔(𝑥, 𝜙) = 𝑔(0.05, 1) ≈ 0.23.
Both rates of return are good, but the return from search is better.
Next, suppose that 𝑥 = 0.4
• If 𝑠 = 1 and 𝜙 = 0, then expected next period capital is again 0.5
• If 𝑠 = 0 and 𝜙 = 1, then 𝑔(𝑥, 𝜙) = 𝑔(0.4, 1) ≈ 0.8
Return from investment via 𝜙 dominates expected return from search.
Combining these observations gives us two informal predictions:
1. At any given state 𝑥, the two controls 𝜙 and 𝑠 will function primarily as substitutes — worker will focus on whichever instrument has the higher expected return.
2. For sufficiently small 𝑥, search will be preferable to investment in job-specific human capital. For larger 𝑥, the reverse will be true.
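The back-of-envelope numbers above are easy to verify:

```python
A, α = 1.4, 0.6
g = lambda x, ϕ: A * (x * ϕ)**α   # next-period capital under pure investment

# With s = 1, ϕ = 0 expected next-period capital is 𝔼u = 0.5 (u ~ Beta(2, 2)),
# independent of x; with s = 0, ϕ = 1 it is g(x, 1):
print(round(g(0.05, 1), 2), round(g(0.4, 1), 2))
```

At x = 0.05 search dominates (0.5 vs roughly 0.23), while at x = 0.4 investment dominates (roughly 0.8 vs 0.5).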
Now let’s turn to implementation, and see if we can match our predictions.
25.4 Implementation
We will set up a class JVWorker that holds the parameters of the model described above
"""
def __init__(self,
A=1.4,
α=0.6,
β=0.96, # Discount factor
π=np.sqrt, # Search effort function
a=2, # Parameter of f
b=2, # Parameter of f
grid_size=50,
mc_size=100,
ϵ=1e-4):
# Max of grid is the max of a large quantile value for f and the
# fixed point y = g(y, 1)
grid_max = max(A**(1 / (1 - α)), stats.beta(a, b).ppf(1 - ϵ))
# Human capital
self.x_grid = np.linspace(ϵ, grid_max, grid_size)
The function operator_factory takes an instance of this class and returns a jitted version of the Bellman operator T, i.e.
𝑇 𝑣(𝑥) = max_{𝑠+𝜙≤1} 𝑤(𝑠, 𝜙)
where
𝑤(𝑠, 𝜙) ∶= 𝑥(1 − 𝑠 − 𝜙) + 𝛽(1 − 𝜋(𝑠))𝑣[𝑔(𝑥, 𝜙)] + 𝛽𝜋(𝑠) ∫ 𝑣[𝑔(𝑥, 𝜙) ∨ 𝑢]𝑓(𝑑𝑢) (3)
When we represent 𝑣, it will be with a NumPy array v giving values on grid x_grid.
But to evaluate the right-hand side of (3), we need a function, so we replace the arrays v and
x_grid with a function v_func that gives linear interpolation of v on x_grid.
Inside the for loop, for each x in the grid over the state space, we set up the function 𝑤(𝑧) =
𝑤(𝑠, 𝜙) defined in (3).
The function is maximized over all feasible (𝑠, 𝜙) pairs.
Another function, get_greedy returns the optimal choice of 𝑠 and 𝜙 at each 𝑥, given a value
function.
"""
Returns a jitted version of the Bellman operator T
jv is an instance of JVWorker
"""
π, β = jv.π, jv.β
x_grid, ϵ, mc_size = jv.x_grid, jv.ϵ, jv.mc_size
f_rvs, g = jv.f_rvs, jv.g
@njit
def state_action_values(z, x, v):
s, ϕ = z
integral = 0
for m in range(mc_size):
u = f_rvs[m]
integral += v_func(max(g(x, ϕ), u))
integral = integral / mc_size
return x * (1 - s - ϕ) + β * ((1 - π(s)) * v_func(g(x, ϕ)) + π(s) * integral)
@njit(parallel=parallel_flag)
def T(v):
"""
The Bellman operator
"""
v_new = np.empty_like(v)
for i in prange(len(x_grid)):
x = x_grid[i]
# Search on a grid
search_grid = np.linspace(ϵ, 1, 15)
max_val = -1.0
for s in search_grid:
for ϕ in search_grid:
current_val = state_action_values((s, ϕ), x, v) if s + ϕ <= 1 else -1
if current_val > max_val:
max_val = current_val
v_new[i] = max_val
return v_new
@njit
def get_greedy(v):
"""
Computes the vgreedy policy of a given function v
"""
s_policy, ϕ_policy = np.empty_like(v), np.empty_like(v)
for i in range(len(x_grid)):
x = x_grid[i]
# Search on a grid
search_grid = np.linspace(ϵ, 1, 15)
max_val = -1.0
for s in search_grid:
for ϕ in search_grid:
current_val = state_action_values((s, ϕ), x, v) if s + ϕ <= 1 else -1
if current_val > max_val:
max_val = current_val
max_s, max_ϕ = s, ϕ
s_policy[i], ϕ_policy[i] = max_s, max_ϕ
return s_policy, ϕ_policy
return T, get_greedy
To solve the model, we will write a function that uses the Bellman operator and iterates to
find a fixed point.
use_parallel=True,
tol=1e-4,
max_iter=1000,
verbose=True,
print_skip=25):
"""
Solves the model by value function iteration
* jv is an instance of JVWorker
"""
T, _ = operator_factory(jv, parallel_flag=use_parallel)
# Set up loop
v = jv.x_grid * 0.5 # Initial condition
i = 0
error = tol + 1
while i < max_iter and error > tol:
    v_new = T(v)
    error = np.max(np.abs(v_new - v))
    i += 1
    if verbose and i % print_skip == 0:
        print(f"Error at iteration {i} is {error}.")
    v = v_new
if i == max_iter:
    print("Failed to converge!")
return v_new
Let’s generate the optimal policies and see what they look like.
In [6]: jv = JVWorker()
T, get_greedy = operator_factory(jv)
v_star = solve_model(jv)
s_star, ϕ_star = get_greedy(v_star)
axes[1].set_xlabel("x")
plt.show()
The horizontal axis is the state 𝑥, while the vertical axis gives 𝑠(𝑥) and 𝜙(𝑥).
Overall, the policies match well with our predictions from above
• Worker switches from one investment strategy to the other depending on relative re-
turn.
• For low values of 𝑥, the best option is to search for a new job.
• Once 𝑥 is larger, worker does better by investing in human capital specific to the cur-
rent position.
25.6 Exercises
25.6.1 Exercise 1
Let’s look at the dynamics for the state process {𝑥𝑡 } associated with these policies.
The dynamics are given by (1) when 𝜙𝑡 and 𝑠𝑡 are chosen according to the optimal policies,
and ℙ{𝑏𝑡+1 = 1} = 𝜋(𝑠𝑡 ).
Since the dynamics are random, analysis is a bit subtle.
One way to do it is to plot, for each 𝑥 in a relatively fine grid called plot_grid, a large num-
ber 𝐾 of realizations of 𝑥𝑡+1 given 𝑥𝑡 = 𝑥.
Plot this with one dot for each realization, in the form of a 45 degree diagram, setting
jv = JVWorker(grid_size=25, mc_size=50)
plot_grid_max, plot_grid_size = 1.2, 100
plot_grid = np.linspace(0, plot_grid_max, plot_grid_size)
fig, ax = plt.subplots()
ax.set_xlim(0, plot_grid_max)
ax.set_ylim(0, plot_grid_max)
By examining the plot, argue that under the optimal policies, the state 𝑥𝑡 will converge to a
constant value 𝑥̄ close to unity.
Argue that at the steady state, 𝑠𝑡 ≈ 0 and 𝜙𝑡 ≈ 0.6.
25.6.2 Exercise 2
In the preceding exercise, we found that 𝑠𝑡 converges to zero and 𝜙𝑡 converges to about 0.6.
Since these results were calculated at a value of 𝛽 close to one, let’s compare them to the best
choice for an infinitely patient worker.
Intuitively, an infinitely patient worker would like to maximize steady state wages, which are
a function of steady state capital.
You can take it as given—it’s certainly true—that the infinitely patient worker does not
search in the long run (i.e., 𝑠𝑡 = 0 for large 𝑡).
Thus, given 𝜙, steady state capital is the positive fixed point 𝑥∗ (𝜙) of the map 𝑥 ↦ 𝑔(𝑥, 𝜙).
Steady state wages can be written as 𝑤∗ (𝜙) = 𝑥∗ (𝜙)(1 − 𝜙).
Graph 𝑤∗ (𝜙) with respect to 𝜙, and examine the best choice of 𝜙.
Can you give a rough interpretation for the value that you see?
25.7 Solutions
25.7.1 Exercise 1
plt.show()
25.7.2 Exercise 2
In [9]: jv = JVWorker()
def xbar(ϕ):
A, α = jv.A, jv.α
return (A * ϕ**α)**(1 / (1 - α))
plt.show()
Chapter 26
Cake Eating I: Introduction to Optimal Saving
26.1 Contents
• Overview 26.2
• The Model 26.3
• The Value Function 26.4
• The Optimal Policy 26.5
• The Euler Equation 26.6
• Exercises 26.7
• Solutions 26.8
26.2 Overview
𝑥𝑡+1 = 𝑥𝑡 − 𝑐𝑡
left in period 𝑡 + 1.
Consuming quantity 𝑐 of the cake gives current utility 𝑢(𝑐).
We adopt the CRRA utility function
𝑢(𝑐) = 𝑐^{1−𝛾} /(1 − 𝛾) (𝛾 > 0, 𝛾 ≠ 1) (1)
In Python this is
def u(c, γ):
    return c**(1 - γ) / (1 - γ)
max_{{𝑐𝑡 }} ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐𝑡 ) (2)
subject to
𝑥𝑡+1 = 𝑥𝑡 − 𝑐𝑡 and 0 ≤ 𝑐𝑡 ≤ 𝑥𝑡 (3)
for all 𝑡.
A consumption path {𝑐𝑡 } satisfying (3) where 𝑥0 = 𝑥̄ is called feasible.
In this problem, the following terminology is standard:
• 𝑥𝑡 is called the state variable
• 𝑐𝑡 is called the control variable or the action
• 𝛽 and 𝛾 are parameters
26.3.1 Trade-Off
26.3.2 Intuition
The reasoning given above suggests that the discount factor 𝛽 and the curvature parameter 𝛾
will play a key role in determining the rate of consumption.
Here’s an educated guess as to what impact these parameters will have.
First, higher 𝛽 implies less discounting, and hence the agent is more patient, which should
reduce the rate of consumption.
Second, higher 𝛾 implies that marginal utility 𝑢′ (𝑐) = 𝑐−𝛾 falls faster with 𝑐.
This suggests more smoothing, and hence a lower rate of consumption.
In summary, we expect the rate of consumption to be decreasing in both parameters.
Let’s see if this is true.
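Anticipating the optimal policy derived below, under which a constant fraction θ = 1 − β^{1/γ} of the remaining cake is consumed each period, the claim can be checked directly:

```python
θ = lambda β, γ: 1 - β**(1 / γ)   # consumption rate under the optimal policy

# Higher β (more patience) and higher γ (more smoothing) both lower the rate
assert θ(0.98, 2.0) < θ(0.95, 2.0) < θ(0.95, 1.5)
```

Both comparative statics line up with the intuition above.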
26.4 The Value Function
The first step of our dynamic programming treatment is to obtain the Bellman equation.
The next step is to use it to calculate the solution.
To this end, we let 𝑣(𝑥) be maximum lifetime utility attainable from the current time when 𝑥
units of cake are left.
That is,
𝑣(𝑥) = max ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐𝑡 ) (4)
where the maximization is over all paths {𝑐𝑡 } that are feasible from 𝑥0 = 𝑥.
At this point, we do not have an expression for 𝑣, but we can still make inferences about it.
For example, as was the case with the McCall model, the value function will satisfy a version
of the Bellman equation.
In the present case, this equation states that 𝑣 satisfies
𝑣(𝑥) = max_{0≤𝑐≤𝑥} {𝑢(𝑐) + 𝛽𝑣(𝑥 − 𝑐)} (5)
The intuition here is essentially the same it was for the McCall model.
Choosing 𝑐 optimally means trading off current vs future rewards.
Current rewards from choice 𝑐 are just 𝑢(𝑐).
Future rewards given current cake size 𝑥, measured from next period and assuming optimal
behavior, are 𝑣(𝑥 − 𝑐).
These are the two terms on the right hand side of (5), after suitable discounting.
If 𝑐 is chosen optimally using this trade off strategy, then we obtain maximal lifetime rewards
from our current state 𝑥.
Hence, 𝑣(𝑥) equals the right hand side of (5), as claimed.
It has been shown that, with 𝑢 as the CRRA utility function in (1), the function
𝑣∗ (𝑥𝑡 ) = (1 − 𝛽^{1/𝛾} )^{−𝛾} 𝑢(𝑥𝑡 ) (6)
solves the Bellman equation and hence is equal to the value function.
You are asked to confirm that this is true in the exercises below.
The solution (6) depends heavily on the CRRA utility function.
In fact, if we move away from CRRA utility, usually there is no analytical solution at all.
In other words, beyond CRRA utility, we know that the value function still satisfies the Bell-
man equation, but we do not have a way of writing it explicitly, as a function of the state
variable and the parameters.
We will deal with that situation numerically when the time comes.
Here is a Python representation of the value function:
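A minimal implementation consistent with (6), with the CRRA utility (1) inlined (the function signatures are our own assumption):

```python
def u(c, γ):
    # CRRA utility, equation (1)
    return c**(1 - γ) / (1 - γ)

def v_star(x, β, γ):
    # v*(x) = (1 - β^{1/γ})^{-γ} u(x), equation (6)
    return (1 - β**(1 / γ))**(-γ) * u(x, γ)
```

One can confirm numerically that v_star satisfies the Bellman equation (5): with c = (1 − β^{1/γ})x, the value u(c) + β v_star(x − c) coincides with v_star(x).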
fig, ax = plt.subplots()
ax.set_xlabel('$x$', fontsize=12)
ax.legend(fontsize=12)
plt.show()
26.5 The Optimal Policy
Now that we have the value function, it is straightforward to calculate the optimal action at
each state.
We should choose consumption to maximize the right hand side of the Bellman equation (5).
We can think of this optimal choice as a function of the state 𝑥, in which case we call it the
optimal policy.
We denote the optimal policy by 𝜎∗ , so that
𝜎∗ (𝑥) = arg max_{0≤𝑐≤𝑥} {𝑢(𝑐) + 𝛽𝑣∗ (𝑥 − 𝑐)}
If we plug the analytical expression (6) for the value function into the right hand side and compute the optimum, we find that
𝜎∗ (𝑥) = (1 − 𝛽^{1/𝛾} ) 𝑥 (7)
In Python:
def c_star(x, β, γ):
    return (1 - β**(1/γ)) * x
Continuing with the values for 𝛽 and 𝛾 used above, the plot is
plt.show()
In the discussion above we have provided a complete solution to the cake eating problem in
the case of CRRA utility.
There is in fact another way to solve for the optimal policy, based on the so-called Euler
equation.
Although we already have a complete solution, now is a good time to study the Euler equa-
tion.
This is because, for more difficult problems, this equation provides key insights that are hard
to obtain by other methods.
Note
A functional equation is an equation where the unknown object is a function.
For a proof of sufficiency of the Euler equation in a very general setting, see proposition 2.2 of
[75].
The following arguments focus on necessity, explaining why an optimal path or policy should
satisfy the Euler equation.
max_{𝑐∈𝐹} 𝑈 (𝑐) where 𝑈 (𝑐) ∶= ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐𝑡 )
Note
If you want to know exactly how the derivative 𝑈 ′ (𝑐∗ ) is defined, given that the
argument 𝑐∗ is a vector of infinite length, you can start by learning about Gateaux
derivatives. However, such knowledge is not assumed in what follows.
In other words, the rate of change in 𝑈 must be zero for any infinitesimally small (and feasi-
ble) perturbation away from the optimal path.
So consider a feasible perturbation that reduces consumption at time 𝑡 to 𝑐∗𝑡 − ℎ and increases it in the next period to 𝑐∗𝑡+1 + ℎ.
Consumption does not change in any other period.
We call this perturbed path 𝑐ℎ .
By the preceding argument about zero gradients, we have
lim_{ℎ→0} [𝑈(𝑐^ℎ) − 𝑈(𝑐^∗)] / ℎ = 𝑈′(𝑐^∗) = 0
lim_{ℎ→0} [𝑢(𝑐_𝑡^∗ − ℎ) − 𝑢(𝑐_𝑡^∗)] / ℎ + lim_{ℎ→0} [𝛽𝑢(𝑐_{𝑡+1}^∗ + ℎ) − 𝛽𝑢(𝑐_{𝑡+1}^∗)] / ℎ = 0
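Evaluating these limits gives −𝑢′(𝑐_𝑡^∗) + 𝛽𝑢′(𝑐_{𝑡+1}^∗) = 0, i.e., the Euler equation 𝑢′(𝑐_𝑡^∗) = 𝛽𝑢′(𝑐_{𝑡+1}^∗). As a quick numerical sanity check (a sketch, using the CRRA closed-form solution derived in the exercises below; the value of 𝑥 is arbitrary):

```python
β, γ = 0.96, 1.5
θ = 1 - β**(1/γ)            # optimal consumption rate from the exercise solution

def u_prime(c):
    "Marginal utility under CRRA preferences."
    return c**(-γ)

# On the optimal path: c*_t = θ x_t and x_{t+1} = (1 - θ) x_t,
# so c*_{t+1} = θ (1 - θ) x_t
x = 2.5
c_t = θ * x
c_t1 = θ * (1 - θ) * x

euler_gap = u_prime(c_t) - β * u_prime(c_t1)   # should be ~0 at the optimum
```

Since 1 − 𝜃 = 𝛽^{1/𝛾}, the ratio 𝑢′(𝑐_𝑡^∗)/𝑢′(𝑐_{𝑡+1}^∗) equals (1 − 𝜃)^𝛾 = 𝛽, so the gap is zero up to floating point error.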
Another way to derive the Euler equation is to use the Bellman equation (5).
Taking the derivative on the right hand side of the Bellman equation with respect to 𝑐 and
setting it to zero, we get
To obtain 𝑣′ (𝑥 − 𝑐), we set 𝑔(𝑐, 𝑥) = 𝑢(𝑐) + 𝛽𝑣(𝑥 − 𝑐), so that, at the optimal choice of con-
sumption,
Differentiating both sides while acknowledging that the maximizing consumption will depend
on 𝑥, we get
𝑣′(𝑥) = (∂/∂𝑐) 𝑔(𝑐, 𝑥) · (∂𝑐/∂𝑥) + (∂/∂𝑥) 𝑔(𝑐, 𝑥)
When 𝑔(𝑐, 𝑥) is maximized at 𝑐, we have (∂/∂𝑐) 𝑔(𝑐, 𝑥) = 0.
Hence the derivative simplifies to
𝑣′(𝑥) = ∂𝑔(𝑐, 𝑥)/∂𝑥 = (∂/∂𝑥) 𝛽𝑣(𝑥 − 𝑐) = 𝛽𝑣′(𝑥 − 𝑐)   (12)
26.7 Exercises
26.7.1 Exercise 1
How does one obtain the expressions for the value function and optimal policy given in (6)
and (7) respectively?
The first step is to make a guess of the functional form for the consumption policy.
So suppose that we do not know the solutions and start with a guess that the optimal policy
is linear.
In other words, we conjecture that there exists a positive 𝜃 such that setting 𝑐𝑡∗ = 𝜃𝑥𝑡 for all 𝑡
produces an optimal path.
Starting from this conjecture, try to obtain the solutions (6) and (7).
In doing so, you will need to use the definition of the value function and the Bellman equa-
tion.
26.8 Solutions
26.8.1 Exercise 1
We start with the conjecture 𝑐𝑡∗ = 𝜃𝑥𝑡 , which leads to a path for the state variable (cake size)
given by
𝑥𝑡+1 = 𝑥𝑡 (1 − 𝜃)
𝑣(𝑥_0) = ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝜃𝑥_𝑡)
       = ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝜃𝑥_0 (1 − 𝜃)^𝑡)
       = ∑_{𝑡=0}^{∞} 𝜃^{1−𝛾} 𝛽^𝑡 (1 − 𝜃)^{𝑡(1−𝛾)} 𝑢(𝑥_0)
       = [𝜃^{1−𝛾} / (1 − 𝛽(1 − 𝜃)^{1−𝛾})] 𝑢(𝑥_0)
Plugging this conjectured value function into the Bellman equation gives

𝑣(𝑥) = max_{0≤𝑐≤𝑥} {𝑢(𝑐) + 𝛽 [𝜃^{1−𝛾} / (1 − 𝛽(1 − 𝜃)^{1−𝛾})] 𝑢(𝑥 − 𝑐)}
     = max_{0≤𝑐≤𝑥} {𝑐^{1−𝛾}/(1 − 𝛾) + 𝛽 [𝜃^{1−𝛾} / (1 − 𝛽(1 − 𝜃)^{1−𝛾})] (𝑥 − 𝑐)^{1−𝛾}/(1 − 𝛾)}

The first-order condition with respect to 𝑐 is

𝑐^{−𝛾} + 𝛽 [𝜃^{1−𝛾} / (1 − 𝛽(1 − 𝜃)^{1−𝛾})] (𝑥 − 𝑐)^{−𝛾} (−1) = 0
or
𝑐^{−𝛾} = 𝛽 [𝜃^{1−𝛾} / (1 − 𝛽(1 − 𝜃)^{1−𝛾})] (𝑥 − 𝑐)^{−𝛾}
With 𝑐 = 𝜃𝑥 we get
(𝜃𝑥)^{−𝛾} = 𝛽 [𝜃^{1−𝛾} / (1 − 𝛽(1 − 𝜃)^{1−𝛾})] (𝑥(1 − 𝜃))^{−𝛾}
𝜃 = 1 − 𝛽^{1/𝛾}
Substituting into the conjecture 𝑐_𝑡^∗ = 𝜃𝑥_𝑡 gives the optimal policy

𝑐_𝑡^∗ = (1 − 𝛽^{1/𝛾}) 𝑥_𝑡
𝑣^∗(𝑥_𝑡) = [(1 − 𝛽^{1/𝛾})^{1−𝛾} / (1 − 𝛽(𝛽^{1/𝛾})^{1−𝛾})] 𝑢(𝑥_𝑡)
Rearranging gives
𝑣^∗(𝑥_𝑡) = (1 − 𝛽^{1/𝛾})^{−𝛾} 𝑢(𝑥_𝑡)
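As a quick numerical check of this solution (a sketch; the values of 𝛽, 𝛾 and 𝑥 are arbitrary), we can verify that 𝑐 = 𝜃𝑥 with 𝜃 = 1 − 𝛽^{1/𝛾} satisfies the first-order condition above:

```python
β, γ, x = 0.96, 1.5, 2.0
θ = 1 - β**(1/γ)
c = θ * x

# Coefficient on u(x - c) in the conjectured value function
A = θ**(1 - γ) / (1 - β * (1 - θ)**(1 - γ))

# First-order condition: c^(-γ) = β A (x - c)^(-γ)
lhs = c**(-γ)
rhs = β * A * (x - c)**(-γ)
```

The two sides agree up to floating point error, confirming the fixed point.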
Chapter 27

Cake Eating II: Numerical Methods

27.1 Contents
• Overview 27.2
• Reviewing the Model 27.3
• Value Function Iteration 27.4
• Time Iteration 27.5
• Exercises 27.6
• Solutions 27.7
In addition to what’s in Anaconda, this lecture will require the following library:
27.2 Overview
def c_star(x, β, γ):
    "Optimal consumption policy for the cake eating problem."
    return (1 - β ** (1/γ)) * x
We introduce the Bellman operator 𝑇 that takes a function 𝑣 as an argument and returns
a new function 𝑇𝑣 defined by

𝑇𝑣(𝑥) = max_{0≤𝑐≤𝑥} {𝑢(𝑐) + 𝛽𝑣(𝑥 − 𝑐)}
1. Begin with an array of values {𝑣_0, … , 𝑣_𝐼} representing the values of some initial function
𝑣 on the grid points {𝑥_0, … , 𝑥_𝐼}.

2. Build a function 𝑣̂ on the state space ℝ+ by linear interpolation, based on these data
points.

3. Obtain and record the value 𝑇𝑣̂(𝑥_𝑖) on each grid point 𝑥_𝑖 by solving the maximization
problem in the Bellman equation.

4. Unless some stopping condition is satisfied, set {𝑣_0, … , 𝑣_𝐼} = {𝑇𝑣̂(𝑥_0), … , 𝑇𝑣̂(𝑥_𝐼)} and
go to step 2.
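A minimal self-contained sketch of this fitted value function iteration scheme for the cake eating problem (using the lecture's default 𝛽 and 𝛾; the helper names here are illustrative, not the lecture's own code):

```python
import numpy as np
from scipy.optimize import minimize_scalar

β, γ = 0.96, 1.5
u = lambda c: c**(1 - γ) / (1 - γ)
x_grid = np.linspace(1e-3, 2.5, 120)

def T(v_array):
    "One step of fitted value function iteration on the grid."
    v_hat = lambda x: np.interp(x, x_grid, v_array)   # step 2: linear interpolation
    v_new = np.empty_like(v_array)
    for i, x in enumerate(x_grid):
        # step 3: maximize u(c) + β v̂(x - c) over 0 < c <= x
        obj = lambda c: -(u(c) + β * v_hat(x - c))
        res = minimize_scalar(obj, bounds=(1e-10, x), method='bounded')
        v_new[i] = -res.fun
    return v_new
```

Applying T to the analytical value function 𝑣^∗(𝑥) = (1 − 𝛽^{1/𝛾})^{−𝛾}𝑢(𝑥) approximately reproduces it away from the steep lower boundary.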
27.4.3 Implementation
The maximize function below is a small helper function that converts a SciPy minimization
routine into a maximization routine.
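The helper itself is not shown here; a sketch of what such a wrapper typically looks like (assuming SciPy's minimize_scalar with the bounded method):

```python
from scipy.optimize import minimize_scalar

def maximize(g, a, b, args):
    """
    Maximize the function g over the interval [a, b].

    We use the fact that the maximizer of g on any interval is
    also the minimizer of -g.  The tuple args collects any extra
    arguments to g.

    Returns the maximizer and the maximal value.
    """
    objective = lambda x: -g(x, *args)
    result = minimize_scalar(objective, bounds=(a, b), method='bounded')
    maximizer, maximum = result.x, -result.fun
    return maximizer, maximum
```

For example, maximizing 𝑔(𝑥) = −(𝑥 − 1)² over [0, 2] returns a maximizer close to 1 and a maximum close to 0.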
def __init__(self,
β=0.96, # discount factor
γ=1.5, # degree of relative risk aversion
x_grid_min=1e-3, # exclude zero for numerical stability
x_grid_max=2.5, # size of cake
x_grid_size=120):
self.β, self.γ = β, γ
# Set up grid
self.x_grid = np.linspace(x_grid_min, x_grid_max, x_grid_size)
# Utility function
def u(self, c):
γ = self.γ
if γ == 1:
return np.log(c)
else:
return (c ** (1 - γ)) / (1 - γ)
# first derivative of utility function
def u_prime(self, c):
    return c ** (-self.γ)
u, β = self.u, self.β
v = lambda x: interp(self.x_grid, v_array, x)
* ce is an instance of CakeEating
* v is an array representing a guess of the value function
"""
v_new = np.empty_like(v)
for i, x in enumerate(ce.x_grid):
# Maximize RHS of Bellman equation at state x
v_new[i] = maximize(ce.state_action_value, 1e-10, x, (x, v))[1]
return v_new
After defining the Bellman operator, we are ready to solve the model.
Let’s start by creating a CakeEating instance using the default parameterization.
In [7]: ce = CakeEating()
fig, ax = plt.subplots()
ax.plot(x_grid, v, color=plt.cm.jet(0),
lw=2, alpha=0.6, label='Initial guess')
for i in range(n):
v = T(v, ce) # Apply the Bellman operator
ax.plot(x_grid, v, color=plt.cm.jet(i / n), lw=2, alpha=0.6)
ax.legend()
ax.set_ylabel('value', fontsize=12)
ax.set_xlabel('cake size $x$', fontsize=12)
ax.set_title('Value function iterations')
plt.show()
# Set up loop
v = np.zeros(len(ce.x_grid)) # Initial guess
i = 0
error = tol + 1
while i < max_iter and error > tol:
    v_new = T(v, ce)
    error = np.max(np.abs(v - v_new))
    i += 1
    v = v_new

if i == max_iter:
    print("Failed to converge!")

return v_new
Now let’s call it, noting that it takes a little while to run.
In [10]: v = compute_value_function(ce)
Now we can plot and see what the converged value function looks like.
The quality of approximation is reasonably good for large 𝑥, but less so near the lower
boundary.
The reason is that the utility function and hence value function is very steep near the lower
boundary, and hence hard to approximate.
Let’s see how this plays out in terms of computing the optimal policy.
In the first lecture on cake eating, the optimal consumption policy was shown to be
𝜎^∗(𝑥) = (1 − 𝛽^{1/𝛾}) 𝑥
* ce is an instance of CakeEating
* v is a value function array
"""
c = np.empty_like(v)
for i in range(len(ce.x_grid)):
x = ce.x_grid[i]
# Maximize RHS of Bellman equation at state x
c[i] = maximize(ce.state_action_value, 1e-10, x, (x, v))[0]
return c
Now let’s pass the approximate value function and compute optimal consumption:
In [15]: c = σ(ce, v)
fig, ax = plt.subplots()
plt.show()
We can improve it by increasing the grid size or reducing the error tolerance in the value
function iteration routine.
However, both changes will lead to a longer compute time.
Another possibility is to use an alternative algorithm, which offers the possibility of faster
compute time and, at the same time, more accuracy.
We explore this next.
Computationally, we can start with any initial guess 𝜎_0 and then, at each state 𝑥, choose 𝑐 to solve

𝑢′(𝑐) = 𝛽 𝑢′(𝜎_0(𝑥 − 𝑐))
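A compact sketch of this procedure under CRRA utility (so 𝑢′(𝑐) = 𝑐^{−𝛾}); the parameter values, grid and small-𝑥 guard below are illustrative:

```python
import numpy as np
from scipy.optimize import bisect

β, γ = 0.96, 1.5
u_prime = lambda c: c**(-γ)
x_grid = np.linspace(0.0, 2.5, 50)

def K(σ_array):
    "One time iteration step: update the policy guess on the grid."
    σ = lambda x: np.interp(x, x_grid, σ_array)
    σ_new = np.empty_like(σ_array)
    for i, x in enumerate(x_grid):
        if x < 1e-12:
            σ_new[i] = 0.0      # no cake left, nothing to choose
        else:
            # solve u'(c) = β u'(σ(x - c)) for c
            euler_diff = lambda c: u_prime(c) - β * u_prime(σ(x - c))
            σ_new[i] = bisect(euler_diff, 1e-10, x - 1e-10)
    return σ_new

σ_array = x_grid.copy()          # initial guess: consume everything
for _ in range(100):
    σ_array = K(σ_array)
```

The iterates approach the analytical policy 𝜎^∗(𝑥) = (1 − 𝛽^{1/𝛾})𝑥 from the first lecture on cake eating.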
27.6 Exercises
27.6.1 Exercise 1
Instead of the cake size changing according to 𝑥𝑡+1 = 𝑥𝑡 − 𝑐𝑡 , let it change according to
𝑥𝑡+1 = (𝑥𝑡 − 𝑐𝑡 )𝛼
27.6.2 Exercise 2
Implement time iteration, returning to the original case (i.e., dropping the modification in the
exercise above).
27.7 Solutions
27.7.1 Exercise 1
We need to create a class to hold our primitives and return the right hand side of the
Bellman equation.
We will use inheritance to maximize code reuse.
def __init__(self,
β=0.96, # discount factor
γ=1.5, # degree of relative risk aversion
α=0.4, # productivity parameter
x_grid_min=1e-3, # exclude zero for numerical stability
x_grid_max=2.5, # size of cake
x_grid_size=120):
self.α = α
CakeEating.__init__(self, β, γ, x_grid_min, x_grid_max, x_grid_size)
In [18]: og = OptimalGrowth()
fig, ax = plt.subplots()
plt.show()
Here’s the computed policy, combined with the solution we derived above for the standard
cake eating case 𝛼 = 1.
fig, ax = plt.subplots()
ax.set_ylabel('consumption', fontsize=12)
ax.set_xlabel('$x$', fontsize=12)
ax.legend(fontsize=12)
plt.show()
Consumption is higher when 𝛼 < 1 because, at least for large 𝑥, the return to savings is
lower.
27.7.2 Exercise 2
"""
for i, x in enumerate(x_grid):
    # handle small x separately to avoid numerical issues
    if x < 1e-12:
        σ_new[i] = 0.0
    # handle other x
    else:
        σ_new[i] = bisect(euler_diff, 1e-10, x - 1e-10, x)
return σ_new
x_grid = ce.x_grid
i = 0
error = tol + 1
while i < max_iter and error > tol:
    σ_new = K(σ, ce)
    error = np.max(np.abs(σ_new - σ))
    i += 1
    σ = σ_new

if i == max_iter:
    print("Failed to converge!")

return σ
In [23]: ce = CakeEating(x_grid_min=0.0)
c_euler = iterate_euler_equation(ce)
ax.set_ylabel('consumption')
ax.set_xlabel('$x$')
ax.legend(fontsize=12)
plt.show()
Chapter 28

Optimal Growth I: The Stochastic Optimal Growth Model
28.1 Contents
• Overview 28.2
• The Model 28.3
• Computation 28.4
• Exercises 28.5
• Solutions 28.6
28.2 Overview
In this lecture, we’re going to study a simple optimal growth model with one agent.
The model is a version of the standard one sector infinite horizon growth model studied in
• [102], chapter 2
• [72], section 3.1
• EDTC, chapter 1
• [104], chapter 12
It is an extension of the simple cake eating problem we looked at earlier.
The extension involves
• nonlinear returns to saving, through a production function, and
• stochastic returns, due to shocks to production.
Despite these additions, the model is still relatively simple.
We regard it as a stepping stone to more sophisticated models.
We solve the model using dynamic programming and a range of numerical techniques.
In this first lecture on optimal growth, the solution method will be value function iteration
(VFI).
While the code in this first lecture runs slowly, we will use a variety of techniques to drasti-
cally improve execution time over the next few lectures.
%matplotlib inline
𝑘𝑡+1 + 𝑐𝑡 ≤ 𝑦𝑡 (1)
In what follows,
• The sequence {𝜉𝑡 } is assumed to be IID.
• The common distribution of each 𝜉𝑡 will be denoted by 𝜙.
• The production function 𝑓 is assumed to be increasing and continuous.
• Depreciation of capital is not made explicit but can be incorporated into the production
function.
While many other treatments of the stochastic growth model use 𝑘𝑡 as the state variable, we
will use 𝑦𝑡 .
This will allow us to treat a stochastic model while maintaining only one state variable.
We consider alternative states and timing specifications in some of our other lectures.
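For example, given any fixed consumption policy 𝑐_𝑡 = 𝜎(𝑦_𝑡), the implied law of motion 𝑦_{𝑡+1} = 𝑓(𝑦_𝑡 − 𝑐_𝑡)𝜉_{𝑡+1} can be simulated directly (a sketch; the linear policy and parameter values are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1234)
α, μ, s = 0.4, 0.0, 0.1
f = lambda k: k**α                      # Cobb-Douglas production

def simulate_income(σ, y0, T):
    "Simulate y_{t+1} = f(y_t - σ(y_t)) ξ_{t+1} with lognormal shocks."
    y = np.empty(T + 1)
    y[0] = y0
    for t in range(T):
        ξ = np.exp(μ + s * rng.standard_normal())
        y[t + 1] = f(y[t] - σ(y[t])) * ξ
    return y

path = simulate_income(σ=lambda y: 0.3 * y, y0=1.0, T=100)
```

Any feasible policy keeps 0 ≤ 𝜎(𝑦) ≤ 𝑦, so output stays strictly positive along the path.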
28.3.2 Optimization
𝔼 [∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐_𝑡)]   (2)
subject to

𝑦_{𝑡+1} = 𝑓(𝑦_𝑡 − 𝑐_𝑡) 𝜉_{𝑡+1} and 0 ≤ 𝑐_𝑡 ≤ 𝑦_𝑡 for all 𝑡   (3)
where
• 𝑢 is a bounded, continuous and strictly increasing utility function and
• 𝛽 ∈ (0, 1) is a discount factor.
In (3) we are assuming that the resource constraint (1) holds with equality — which is rea-
sonable because 𝑢 is strictly increasing and no output will be wasted at the optimum.
In summary, the agent’s aim is to select a path 𝑐0 , 𝑐1 , 𝑐2 , … for consumption that is
1. nonnegative,
3. optimal, in the sense that it maximizes (2) relative to all other feasible consumption
sequences, and
4. adapted, in the sense that the action 𝑐𝑡 depends only on observable outcomes, not on
future outcomes such as 𝜉𝑡+1 .
One way to think about solving this problem is to look for the best policy function.
A policy function is a map from past and present observables into current action.
We’ll be particularly interested in Markov policies, which are maps from the current state
𝑦𝑡 into a current action 𝑐𝑡 .
For dynamic programming problems such as this one (in fact for any Markov decision pro-
cess), the optimal policy is always a Markov policy.
In other words, the current state 𝑦𝑡 provides a sufficient statistic for the history in terms of
making an optimal decision today.
This is quite intuitive, but if you wish you can find proofs in texts such as [102] (section 4.1).
Hereafter we focus on finding the best Markov policy.
In our context, a Markov policy is a function 𝜎 ∶ ℝ+ → ℝ+ , with the understanding that states
are mapped to actions via

𝑐_𝑡 = 𝜎(𝑦_𝑡) for all 𝑡
In other words, a feasible consumption policy is a Markov policy that respects the resource
constraint.
The set of all feasible consumption policies will be denoted by Σ.
Each 𝜎 ∈ Σ determines a continuous state Markov process {𝑦_𝑡} for output via

𝑦_{𝑡+1} = 𝑓(𝑦_𝑡 − 𝜎(𝑦_𝑡)) 𝜉_{𝑡+1}, with 𝑦_0 given
This is the time path for output when we choose and stick with the policy 𝜎.
We insert this process into the objective function to get
𝔼 [∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐_𝑡)] = 𝔼 [∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝜎(𝑦_𝑡))]   (6)
This is the total expected present value of following policy 𝜎 forever, given initial income 𝑦0 .
The aim is to select a policy that makes this number as large as possible.
The next section covers these ideas more formally.
28.3.4 Optimality
𝑣_𝜎(𝑦) = 𝔼 [∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝜎(𝑦_𝑡))]   (7)
The value function is then defined as

𝑣^∗(𝑦) ∶= sup_{𝜎∈Σ} 𝑣_𝜎(𝑦)   (8)

The value function gives the maximal value that can be obtained from state 𝑦, after
considering all feasible policies.
A policy 𝜎 ∈ Σ is called optimal if it attains the supremum in (8) for all 𝑦 ∈ ℝ+ .
With our assumptions on utility and production functions, the value function as defined in
(8) also satisfies a Bellman equation.
For this problem, the Bellman equation takes the form

𝑣(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)} for all 𝑦 ∈ ℝ+   (9)
The primary importance of the value function is that we can use it to compute optimal poli-
cies.
The details are as follows.
Given a continuous function 𝑣 on ℝ+, we say that 𝜎 ∈ Σ is 𝑣-greedy if 𝜎(𝑦) is a solution to

max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}   (10)
for every 𝑦 ∈ ℝ+ .
In other words, 𝜎 ∈ Σ is 𝑣-greedy if it optimally trades off current and future rewards when 𝑣
is taken to be the value function.
In our setting, we have the following key result: a feasible consumption policy is optimal if
and only if it is 𝑣^∗-greedy.
The intuition is similar to the intuition for the Bellman equation, which was provided after
(9).
See, for example, theorem 10.1.11 of EDTC.
Hence, once we have a good approximation to 𝑣∗ , we can compute the (approximately) opti-
mal policy by computing the corresponding greedy policy.
The advantage is that we are now solving a much lower dimensional optimization problem.
The Bellman operator is denoted by 𝑇 and defined by

𝑇𝑣(𝑦) ∶= max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}   (11)

In other words, 𝑇 sends the function 𝑣 into the new function 𝑇𝑣 defined by (11).
By construction, the set of solutions to the Bellman equation (9) exactly coincides with the
set of fixed points of 𝑇 .
For example, if 𝑇𝑣 = 𝑣, then, for any 𝑦 ≥ 0,

𝑣(𝑦) = 𝑇𝑣(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}

which says precisely that 𝑣 is a solution to the Bellman equation.
One can also show that 𝑇 is a contraction mapping on the set of continuous bounded
functions on ℝ+ under the supremum distance

𝜌(𝑣, 𝑤) = sup_{𝑦≥0} |𝑣(𝑦) − 𝑤(𝑦)|
It’s not too hard to show that a 𝑣∗ -greedy policy exists (see EDTC, theorem 10.1.11 if you
get stuck).
Hence, at least one optimal policy exists.
Our problem now is how to compute it.
The results stated above assume that the utility function is bounded.
In practice economists often work with unbounded utility functions — and so will we.
In the unbounded setting, various optimality theories exist.
Unfortunately, they tend to be case-specific, as opposed to valid for a large range of applica-
tions.
Nevertheless, their main conclusions are usually in line with those stated for the bounded case
just above (as long as we drop the word “bounded”).
Consult, for example, section 12.2 of EDTC, [64] or [78].
28.4 Computation
Let’s now look at computing the value function and the optimal policy.
Our implementation in this lecture will focus on clarity and flexibility.
Both of these things are helpful, but they do cost us some speed — as you will see when you
run the code.
Later we will sacrifice some of this clarity and flexibility in order to accelerate our code with
just-in-time (JIT) compilation.
The algorithm we will use is fitted value function iteration, which was described in earlier lec-
tures the McCall model and cake eating.
The algorithm will be
1. Begin with an array of values {𝑣_1, … , 𝑣_𝐼} representing the values of some initial function
𝑣 on the grid points {𝑦_1, … , 𝑦_𝐼}.

2. Build a function 𝑣̂ on the state space ℝ+ by linear interpolation, based on these data
points.

3. Obtain and record the value 𝑇𝑣̂(𝑦_𝑖) on each grid point 𝑦_𝑖 by solving the maximization
problem in the Bellman equation.

4. Unless some stopping condition is satisfied, set {𝑣_1, … , 𝑣_𝐼} = {𝑇𝑣̂(𝑦_1), … , 𝑇𝑣̂(𝑦_𝐼)} and
go to step 2.
To maximize the right hand side of the Bellman equation (9), we are going to use the
minimize_scalar routine from SciPy.
Since we are maximizing rather than minimizing, we will use the fact that the maximizer of 𝑔
on the interval [𝑎, 𝑏] is the minimizer of −𝑔 on the same interval.
To this end, and to keep the interface tidy, we will wrap minimize_scalar in an outer func-
tion as follows:
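The wrapper itself was presumably along these lines (a sketch; minimize_scalar's bounded method handles the interval constraint):

```python
from scipy.optimize import minimize_scalar

def maximize(g, a, b, args):
    """
    Maximize g over [a, b] by minimizing -g with SciPy.

    The tuple args collects extra arguments to g.
    Returns the maximizer and the maximal value.
    """
    objective = lambda x: -g(x, *args)
    result = minimize_scalar(objective, bounds=(a, b), method='bounded')
    return result.x, -result.fun
```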
We will assume for now that 𝜙 is the distribution of 𝜉 ∶= exp(𝜇 + 𝑠𝜁) where
• 𝜁 is standard normal,
• 𝜇 is a shock location parameter and
• 𝑠 is a shock scale parameter.
We will store this and other primitives of the optimal growth model in a class.
The class, defined below, combines both parameters and a method that realizes the right
hand side of the Bellman equation (9).
def __init__(self,
u, # utility function
f, # production function
β=0.96, # discount factor
μ=0, # shock location parameter
s=0.1, # shock scale parameter
grid_max=4,
grid_size=120,
shock_size=250,
seed=1234):
# Set up grid
self.grid = np.linspace(1e-4, grid_max, grid_size)
"""
v = interp1d(self.grid, v_array)
∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧) ≈ (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑣(𝑓(𝑦 − 𝑐)𝜉_𝑖)
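In code, this Monte Carlo step can be sketched as follows (the function and draw names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1234)
μ, s, n = 0.0, 0.1, 250
shocks = np.exp(μ + s * rng.standard_normal(n))   # draws of ξ = exp(μ + s ζ)

def expected_value(v, f, y, c, shocks):
    "Monte Carlo estimate of ∫ v(f(y - c) z) φ(dz)."
    return np.mean(v(f(y - c) * shocks))
```

With 𝑣 the identity, the estimate reduces to 𝑓(𝑦 − 𝑐) times the sample mean of the shocks, which makes the construction easy to verify.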
* og is an instance of OptimalGrowthModel
* v is an array representing a guess of the value function
"""
v_new = np.empty_like(v)
v_greedy = np.empty_like(v)
for i in range(len(grid)):
y = grid[i]
28.4.4 An Example
For this particular problem, an exact analytical solution is available (see [72], section 3.1.2),
with
𝜎∗ (𝑦) = (1 − 𝛼𝛽)𝑦
It is valuable to have these closed-form solutions because it lets us check whether our code
works for this particular case.
In Python, the functions above can be expressed as:
Next let’s create an instance of the model with the above primitives and assign it to the vari-
able og.
In [6]: α = 0.4
def fcd(k):
return k**α
og = OptimalGrowthModel(u=np.log, f=fcd)
Now let’s see what happens when we apply our Bellman operator to the exact solution 𝑣∗ in
this case.
In theory, since 𝑣∗ is a fixed point, the resulting function should again be 𝑣∗ .
In practice, we expect some small numerical error.
fig, ax = plt.subplots()
ax.set_ylim(-35, -24)
ax.plot(grid, v, lw=2, alpha=0.6, label='$Tv^*$')
ax.plot(grid, v_init, lw=2, alpha=0.6, label='$v^*$')
ax.legend()
plt.show()
The two functions are essentially indistinguishable, so we are off to a good start.
Now let’s have a look at iterating with the Bellman operator, starting from an arbitrary ini-
tial condition.
The initial condition we’ll start with is, somewhat arbitrarily, 𝑣(𝑦) = 5 ln(𝑦).
fig, ax = plt.subplots()
ax.plot(grid, v, color=plt.cm.jet(0),
lw=2, alpha=0.6, label='Initial condition')
for i in range(n):
v_greedy, v = T(v, og) # Apply the Bellman operator
ax.plot(grid, v, color=plt.cm.jet(i / n), lw=2, alpha=0.6)
ax.legend()
ax.set(ylim=(-40, 10), xlim=(np.min(grid), np.max(grid)))
plt.show()
1. the first 36 functions generated by the fitted value function iteration algorithm, with
hotter colors given to higher iterates
We can write a function that iterates until the difference is below a particular tolerance level.
"""
# Set up loop
v = og.u(og.grid) # Initial condition
i = 0
error = tol + 1
while i < max_iter and error > tol:
    v_greedy, v_new = T(v, og)
    error = np.max(np.abs(v - v_new))
    i += 1
    if verbose and i % print_skip == 0:
        print(f"Error at iteration {i} is {error}.")
    v = v_new

if i == max_iter:
    print("Failed to converge!")

return v_greedy, v_new
ax.legend()
ax.set_ylim(-35, -24)
plt.show()
ax.legend()
plt.show()
The figure shows that we’ve done a good job in this instance of approximating the true pol-
icy.
28.5 Exercises
28.5.1 Exercise 1
A common choice for utility function in this kind of work is the CRRA specification
𝑢(𝑐) = 𝑐^{1−𝛾} / (1 − 𝛾)
Maintaining the other defaults, including the Cobb-Douglas production function, solve the
optimal growth model with this utility specification.
Setting 𝛾 = 1.5, compute and plot an estimate of the optimal policy.
Time how long this function takes to run, so you can compare it to faster code developed in
the next lecture.
28.5.2 Exercise 2
Time how long it takes to iterate with the Bellman operator 20 times, starting from initial
condition 𝑣(𝑦) = 𝑢(𝑦).
Use the model specification in the previous exercise.
(As before, we will compare this number with that for the faster code developed in the next
lecture.)
28.6 Solutions
28.6.1 Exercise 1
def u_crra(c):
    return (c**(1 - γ) - 1) / (1 - γ)
og = OptimalGrowthModel(u=u_crra, f=fcd)
In [14]: %%time
v_greedy, v_solution = solve_model(og)
Let’s plot the policy function just to see what it looks like:
ax.legend()
plt.show()
28.6.2 Exercise 2
In [17]: %%time
for i in range(20):
v_greedy, v_new = T(v, og)
v = v_new
Chapter 29

Optimal Growth II: Accelerating the Code with Numba

29.1 Contents
• Overview 29.2
• The Model 29.3
• Computation 29.4
• Exercises 29.5
• Solutions 29.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
29.2 Overview
Previously, we studied a stochastic optimal growth model with one representative agent.
We solved the model using dynamic programming.
In writing our code, we focused on clarity and flexibility.
These are important, but there’s often a trade-off between flexibility and speed.
The reason is that, when code is less flexible, we can exploit structure more easily.
(This is true about algorithms and mathematical problems more generally: more specific
problems have more structure, which, with some thought, can be exploited for better results.)
So, in this lecture, we are going to accept less flexibility while gaining speed, using just-in-
time (JIT) compilation to accelerate our code.
Let’s start with some imports:
%matplotlib inline
The model is the same as discussed in our previous lecture on optimal growth.
We will start with log utility:
𝑢(𝑐) = ln(𝑐)
29.4 Computation
We will again store the primitives of the optimal growth model in a class.
But now we are going to use Numba’s @jitclass decorator to target our class for JIT compi-
lation.
Because we are going to use Numba to compile our class, we need to specify the data types.
You will see this as a list called opt_growth_data above our class.
Unlike in the previous lecture, we hardwire the production and utility specifications into the
class.
This is where we sacrifice flexibility in order to gain more speed.
In [4]: opt_growth_data = [
('α', float64), # Production parameter
('β', float64), # Discount factor
('μ', float64), # Shock location parameter
('s', float64), # Shock scale parameter
('grid', float64[:]), # Grid (array)
('shocks', float64[:]) # Shock draws (array)
]
@jitclass(opt_growth_data)
class OptimalGrowthModel:
def __init__(self,
α=0.4,
β=0.96,
μ=0,
s=0.1,
grid_max=4,
grid_size=120,
shock_size=250,
seed=1234):
# Set up grid
self.grid = np.linspace(1e-5, grid_max, grid_size)
def u_prime_inv(self, c):
    "Inverse of u'"
    return 1 / c
The class includes some methods such as u_prime that we do not need now but will use in
later lectures.
In [5]: @njit
def state_action_value(c, y, v_array, og):
    """
    Right hand side of the Bellman equation.

    * c is consumption
    * y is income
    * og is an instance of OptimalGrowthModel
    * v_array represents a guess of the value function on the grid
    """
    u, f, β, shocks = og.u, og.f, og.β, og.shocks
    v = lambda x: interp(og.grid, v_array, x)
    return u(c) + β * np.mean(v(f(y - c) * shocks))
Now we can implement the Bellman operator, which maximizes the right hand side of the
Bellman equation:
In [6]: @jit(nopython=True)
def T(v, og):
"""
The Bellman operator.
* og is an instance of OptimalGrowthModel
* v is an array representing a guess of the value function
"""
v_new = np.empty_like(v)
v_greedy = np.empty_like(v)
for i in range(len(og.grid)):
y = og.grid[i]
"""
# Set up loop
v = og.u(og.grid) # Initial condition
i = 0
error = tol + 1
while i < max_iter and error > tol:
    v_greedy, v_new = T(v, og)
    error = np.max(np.abs(v - v_new))
    i += 1
    v = v_new

if i == max_iter:
    print("Failed to converge!")

return v_greedy, v_new
In [8]: og = OptimalGrowthModel()
Now we call solve_model, using the %%time magic to check how long it takes.
In [9]: %%time
v_greedy, v_solution = solve_model(og)
You will notice that this is much faster than our original implementation.
Here is a plot of the resulting policy, compared with the true policy:
ax.legend()
plt.show()
Again, the fit is excellent — this is as expected since we have not changed the algorithm.
The maximal absolute deviation between the two policies is
Out[11]: 0.0010480511607799947
29.5 Exercises
29.5.1 Exercise 1
Time how long it takes to iterate with the Bellman operator 20 times, starting from initial
condition 𝑣(𝑦) = 𝑢(𝑦).
Use the default parameterization.
29.5.2 Exercise 2
Modify the optimal growth model to use the CRRA utility specification

𝑢(𝑐) = 𝑐^{1−𝛾} / (1 − 𝛾)
29.5.3 Exercise 3
The next figure shows a simulation of 100 elements of this sequence for three different dis-
count factors (and hence three different policies).
29.6 Solutions
29.6.1 Exercise 1
In [12]: v = og.u(og.grid)
In [13]: %%time
for i in range(20):
v_greedy, v_new = T(v, og)
v = v_new
Compared with our timing for the non-compiled version of value function iteration, the JIT-
compiled code is usually an order of magnitude faster.
29.6.2 Exercise 2
In [14]: opt_growth_data = [
('α', float64), # Production parameter
('β', float64), # Discount factor
('μ', float64), # Shock location parameter
('γ', float64), # Preference parameter
('s', float64), # Shock scale parameter
('grid', float64[:]), # Grid (array)
('shocks', float64[:]) # Shock draws (array)
]
@jitclass(opt_growth_data)
class OptimalGrowthModel_CRRA:
def __init__(self,
α=0.4,
β=0.96,
μ=0,
s=0.1,
γ=1.5,
grid_max=4,
grid_size=120,
shock_size=250,
seed=1234):
# Set up grid
self.grid = np.linspace(1e-5, grid_max, grid_size)
def u_prime_inv(self, c):
    return c**(-1 / self.γ)
Now we call solve_model, using the %%time magic to check how long it takes.
In [16]: %%time
v_greedy, v_solution = solve_model(og_crra)
ax.legend(loc='lower right')
plt.show()
This matches the solution that we obtained in our non-jitted code, in the exercises.
Execution time is an order of magnitude faster.
29.6.3 Exercise 3
og = OptimalGrowthModel(β=β, s=0.05)
ax.legend(loc='lower right')
plt.show()
Chapter 30

Optimal Growth III: Time Iteration
30.1 Contents
• Overview 30.2
• The Euler Equation 30.3
• Implementation 30.4
• Exercises 30.5
• Solutions 30.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
30.2 Overview
In this lecture, we’ll continue our earlier study of the stochastic optimal growth model.
In that lecture, we solved the associated dynamic programming problem using value function
iteration.
The beauty of this technique is its broad applicability.
With numerical problems, however, we can often attain higher efficiency in specific applica-
tions by deriving methods that are carefully tailored to the application at hand.
The stochastic optimal growth model has plenty of structure to exploit for this purpose, espe-
cially when we adopt some concavity and smoothness assumptions over primitives.
We’ll use this structure to obtain an Euler equation based method.
This will be an extension of the time iteration method considered in our elementary lecture
on cake eating.
In a subsequent lecture, we’ll see that time iteration can be further adjusted to obtain even
more efficiency.
Let’s start with some imports:
Our first step is to derive the Euler equation, which is a generalization of the Euler equation
we obtained in the lecture on cake eating.
We take the model set out in the stochastic growth model lecture and add the following as-
sumptions:
1. 𝑢 and 𝑓 are continuously differentiable and strictly concave

2. 𝑓(0) = 0

3. lim_{𝑐→0} 𝑢′(𝑐) = ∞ and lim_{𝑐→∞} 𝑢′(𝑐) = 0

4. lim_{𝑘→0} 𝑓′(𝑘) = ∞ and lim_{𝑘→∞} 𝑓′(𝑘) = 0
Under these assumptions, the value function 𝑣^∗ is differentiable on the interior of its domain
and the optimal policy 𝜎^∗ is interior, with

𝑣^∗′(𝑦) = 𝑢′(𝜎^∗(𝑦))   (2)

The last result is called the envelope condition due to its relationship with the envelope
theorem.
To see why (2) holds, write the Bellman equation in the equivalent form
Differentiating with respect to 𝑦, and then evaluating at the optimum yields (2).
(Section 12.1 of EDTC contains full proofs of these results, and closely related discussions can
be found in many other texts.)
Differentiability of the value function and interiority of the optimal policy imply that
optimal consumption satisfies the first-order condition associated with (1), which is

𝑢′(𝜎^∗(𝑦)) = 𝛽 ∫ 𝑣^∗′(𝑓(𝑦 − 𝜎^∗(𝑦))𝑧) 𝑓′(𝑦 − 𝜎^∗(𝑦)) 𝑧 𝜙(𝑑𝑧)   (3)

Combining (2) and the first-order condition (3) gives the Euler equation

𝑢′(𝜎^∗(𝑦)) = 𝛽 ∫ (𝑢′ ∘ 𝜎^∗)(𝑓(𝑦 − 𝜎^∗(𝑦))𝑧) 𝑓′(𝑦 − 𝜎^∗(𝑦)) 𝑧 𝜙(𝑑𝑧)   (4)

We can view the Euler equation as a functional equation

(𝑢′ ∘ 𝜎)(𝑦) = 𝛽 ∫ (𝑢′ ∘ 𝜎)(𝑓(𝑦 − 𝜎(𝑦))𝑧) 𝑓′(𝑦 − 𝜎(𝑦)) 𝑧 𝜙(𝑑𝑧)   (5)

over interior consumption policies 𝜎, one solution of which is the optimal policy 𝜎^∗.
Our aim is to solve the functional equation (5) and hence obtain 𝜎∗ .
Just as we introduced the Bellman operator to solve the Bellman equation, we will now intro-
duce an operator over policies to help us solve the Euler equation.
This operator 𝐾 will act on the set of all 𝜎 ∈ Σ that are continuous, strictly increasing and
interior.
Henceforth we denote this set of policies by 𝒫.

The operator 𝐾

1. takes as its argument a 𝜎 ∈ 𝒫 and

2. returns a new function 𝐾𝜎, where 𝐾𝜎(𝑦) is the 𝑐 ∈ (0, 𝑦) that solves

𝑢′(𝑐) = 𝛽 ∫ (𝑢′ ∘ 𝜎)(𝑓(𝑦 − 𝑐)𝑧) 𝑓′(𝑦 − 𝑐) 𝑧 𝜙(𝑑𝑧)
We call this operator the Coleman-Reffett operator to acknowledge the work of [23] and
[89].
In essence, 𝐾𝜎 is the consumption policy that the Euler equation tells you to choose today
when your future consumption policy is 𝜎.
The important thing to note about 𝐾 is that, by construction, its fixed points coincide with
solutions to the functional equation (5).
In particular, the optimal policy 𝜎∗ is a fixed point.
Indeed, for fixed 𝑦, the value 𝐾𝜎^∗(𝑦) is the 𝑐 that solves

𝑢′(𝑐) = 𝛽 ∫ (𝑢′ ∘ 𝜎^∗)(𝑓(𝑦 − 𝑐)𝑧) 𝑓′(𝑦 − 𝑐) 𝑧 𝜙(𝑑𝑧)

In view of the Euler equation, this is exactly 𝜎^∗(𝑦).
It is possible to prove that there is a tight relationship between iterates of 𝐾 and iterates of
the Bellman operator.
Mathematically, the two operators are topologically conjugate.
Loosely speaking, this means that if iterates of one operator converge then so do iterates of
the other, and vice versa.
Moreover, there is a sense in which they converge at the same rate, at least in theory.
However, it turns out that the operator 𝐾 is more stable numerically and hence more efficient
in the applications we consider.
Examples are given below.
30.4 Implementation
• 𝑢(𝑐) = ln 𝑐
• 𝑓(𝑘) = 𝑘𝛼
• 𝜙 is the distribution of 𝜉 ∶= exp(𝜇 + 𝑠𝜁) when 𝜁 is standard normal
This will allow us to compare our results to the analytical solutions
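Before implementing anything, we can verify the analytical policy 𝜎^∗(𝑦) = (1 − 𝛼𝛽)𝑦 from the earlier lecture directly against the Euler equation (a sketch; with log utility the shock terms cancel, so the check is exact up to floating point):

```python
import numpy as np

rng = np.random.default_rng(1234)
α, β, μ, s = 0.4, 0.96, 0.0, 0.1
shocks = np.exp(μ + s * rng.standard_normal(10_000))

u_prime = lambda c: 1 / c                     # log utility
f = lambda k: k**α
f_prime = lambda k: α * k**(α - 1)
σ_star = lambda y: (1 - α * β) * y            # analytical optimal policy

y = 1.7
c = σ_star(y)
lhs = u_prime(c)
rhs = β * np.mean(u_prime(σ_star(f(y - c) * shocks)) * f_prime(y - c) * shocks)
```

Both sides equal 1 / ((1 − 𝛼𝛽)𝑦), so the Euler equation holds at every 𝑦 > 0.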
As discussed above, our plan is to solve the model using time iteration, which means iterating
with the operator 𝐾.
For this we need access to the functions 𝑢′ and 𝑓, 𝑓 ′ .
These are available in a class called OptimalGrowthModel that we constructed in an earlier
lecture.
In [4]: opt_growth_data = [
('α', float64), # Production parameter
('β', float64), # Discount factor
('μ', float64), # Shock location parameter
('s', float64), # Shock scale parameter
('grid', float64[:]), # Grid (array)
('shocks', float64[:]) # Shock draws (array)
]
@jitclass(opt_growth_data)
class OptimalGrowthModel:
def __init__(self,
α=0.4,
β=0.96,
μ=0,
s=0.1,
grid_max=4,
grid_size=120,
shock_size=250,
seed=1234):
# Set up grid
self.grid = np.linspace(1e-5, grid_max, grid_size)
In [5]: @njit
def euler_diff(c, σ, y, og):
    """
    Set up a function such that the root with respect to c,
    given y and σ, is equal to Kσ(y).
    """
    β, shocks, grid = og.β, og.shocks, og.grid
    f, f_prime, u_prime = og.f, og.f_prime, og.u_prime
    # Turn σ into a function via linear interpolation
    σ_func = lambda x: interp(grid, σ, x)
    # Difference between the two sides of the Euler equation
    vals = u_prime(σ_func(f(y - c) * shocks)) * f_prime(y - c) * shocks
    return u_prime(c) - β * np.mean(vals)
The function euler_diff evaluates integrals by Monte Carlo and approximates functions using linear interpolation.
We will use a root-finding algorithm to solve (8) for 𝑐 given state 𝑦 and 𝜎, the current guess
of the policy.
Here’s the operator 𝐾, that implements the root-finding step.
In [6]: @njit
def K(σ, og):
"""
The Coleman-Reffett operator
"""
β = og.β
f, f_prime, u_prime = og.f, og.f_prime, og.u_prime
grid, shocks = og.grid, og.shocks
σ_new = np.empty_like(σ)
for i, y in enumerate(grid):
# Solve for optimal c at y
c_star = brentq(euler_diff, 1e-10, y - 1e-10, args=(σ, y, og))[0]
σ_new[i] = c_star
return σ_new
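As a cross-check on this root-finding step, here is a self-contained sketch of time iteration for the deterministic special case (log utility, Cobb-Douglas production, shocks switched off), using SciPy's brentq in place of the jitted routine above. The optimal policy keeps the closed form (1 − 𝛼𝛽)𝑦, so accuracy is easy to verify:

```python
import numpy as np
from scipy.optimize import brentq

α, β = 0.4, 0.96
f = lambda k: k**α                 # production function
f_prime = lambda k: α * k**(α - 1)
u_prime = lambda c: 1 / c          # marginal utility for u(c) = ln c

grid = np.linspace(1e-5, 4, 120)   # grid over income y

def K(σ):
    """One Coleman-Reffett step: solve the Euler equation at each y."""
    σ_func = lambda x: np.interp(x, grid, σ)
    σ_new = np.empty_like(σ)
    for i, y in enumerate(grid):
        # Root of u'(c) - β u'(σ(f(y - c))) f'(y - c) in c
        def euler_diff(c):
            k = y - c
            return u_prime(c) - β * u_prime(σ_func(f(k))) * f_prime(k)
        σ_new[i] = brentq(euler_diff, 1e-10, y - 1e-10)
    return σ_new

σ = grid.copy()                    # initial condition σ(y) = y
for _ in range(50):
    σ = K(σ)

max_err = np.max(np.abs(σ - (1 - α * β) * grid))  # vs known closed form
```

This is a sketch for illustration only; the lecture's jitted version handles the stochastic integral by Monte Carlo.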
30.4.1 Testing
Let’s generate an instance and plot some iterates of 𝐾, starting from 𝜎(𝑦) = 𝑦.
In [7]: og = OptimalGrowthModel()
grid = og.grid
n = 15
σ = grid.copy() # Set initial condition
fig, ax = plt.subplots()
lb = 'initial condition $\sigma(y) = y$'
ax.plot(grid, σ, color=plt.cm.jet(0), alpha=0.6, label=lb)
for i in range(n):
σ = K(σ, og)
ax.plot(grid, σ, color=plt.cm.jet(i / n), alpha=0.6)
# Update one more time and plot the last iterate in black
σ = K(σ, og)
ax.plot(grid, σ, color='k', alpha=0.8, label='last iterate')
ax.legend()
plt.show()
478 CHAPTER 30. OPTIMAL GROWTH III: TIME ITERATION
We see that the iteration process converges quickly to a limit that resembles the solution we
obtained in the previous lecture.
Here is a function called solve_model_time_iter that takes an instance of
OptimalGrowthModel and returns an approximation to the optimal policy, using time
iteration.
# Set up loop
i = 0
error = tol + 1
if i == max_iter:
print("Failed to converge!")
return σ_new
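The elided loop body follows a standard successive-approximation pattern. A generic sketch (here the operator is passed directly rather than a model instance, and the names and defaults are ours) is:

```python
import numpy as np

def solve_model_time_iter(K, σ_init, tol=1e-4, max_iter=1000, verbose=True):
    """Iterate σ ↦ K(σ) until successive iterates are within tol."""
    σ = σ_init
    for i in range(1, max_iter + 1):
        σ_new = K(σ)
        error = np.max(np.abs(σ_new - σ))
        σ = σ_new
        if error < tol:
            if verbose:
                print(f"Converged in {i} iterations.")
            return σ
    print("Failed to converge!")
    return σ

# Example: a contraction with fixed point 2.0 everywhere
σ = solve_model_time_iter(lambda s: 0.5 * s + 1, np.zeros(5), verbose=False)
```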
Converged in 11 iterations.
Here is a plot of the resulting policy, compared with the true policy:
ax.plot(og.grid, σ, lw=2,
alpha=0.8, label='approximate policy function')
ax.legend()
plt.show()
Out[11]: 2.5329106212446106e-05
In [12]: %%timeit -n 3 -r 1
σ = solve_model_time_iter(og, σ_init, verbose=False)
Convergence is very fast, even compared to our JIT-compiled value function iteration.
Overall, we find that time iteration provides a very high degree of efficiency and accuracy, at
least for this model.
30.5 Exercises
30.5.1 Exercise 1
Solve the model with CRRA utility

𝑢(𝑐) = 𝑐^{1−𝛾} / (1 − 𝛾)
Set γ = 1.5.
Compute and plot the optimal policy.
30.6 Solutions
30.6.1 Exercise 1
In [13]: opt_growth_data = [
('α', float64), # Production parameter
('β', float64), # Discount factor
('μ', float64), # Shock location parameter
('γ', float64), # Preference parameter
('s', float64), # Shock scale parameter
('grid', float64[:]), # Grid (array)
('shocks', float64[:]) # Shock draws (array)
]
@jitclass(opt_growth_data)
class OptimalGrowthModel_CRRA:
def __init__(self,
α=0.4,
β=0.96,
μ=0,
s=0.1,
γ=1.5,
grid_max=4,
grid_size=120,
shock_size=250,
seed=1234):
# Set up grid
self.grid = np.linspace(1e-5, grid_max, grid_size)
def u_prime_inv(c):
return c**(-1 / self.γ)
In [15]: %%time
σ = solve_model_time_iter(og_crra, σ_init)
fig, ax = plt.subplots()
ax.plot(og.grid, σ, lw=2,
alpha=0.8, label='approximate policy function')
ax.legend()
plt.show()
Converged in 13 iterations.
Chapter 31
31.1 Contents
• Overview 31.2
• Key Idea 31.3
• Implementation 31.4
In addition to what’s in Anaconda, this lecture will need the following libraries:
31.2 Overview
484 CHAPTER 31. OPTIMAL GROWTH IV: THE ENDOGENOUS GRID METHOD
31.3 Key Idea
Let’s start by reminding ourselves of the theory and then see how the numerics fit in.
31.3.1 Theory
Take the model set out in the time iteration lecture, following the same terminology and no-
tation.
The Euler equation is

(𝑢′ ∘ 𝜎∗)(𝑦) = 𝛽 ∫ (𝑢′ ∘ 𝜎∗)(𝑓(𝑦 − 𝜎∗(𝑦))𝑧) 𝑓′(𝑦 − 𝜎∗(𝑦)) 𝑧 𝜙(𝑑𝑧)
The method discussed above requires a root-finding routine to find the 𝑐𝑖 corresponding to a
given income value 𝑦𝑖 .
Root-finding is costly because it typically involves a significant number of function evalua-
tions.
As pointed out by Carroll [21], we can avoid this if 𝑦𝑖 is chosen endogenously.
The only assumption required is that 𝑢′ is invertible on (0, ∞).
Let (𝑢′ )−1 be the inverse function of 𝑢′ .
The idea is this:
• First, we fix an exogenous grid {𝑘𝑖 } for capital (𝑘 = 𝑦 − 𝑐).
• Then we obtain 𝑐𝑖 via

𝑐𝑖 = (𝑢′)^{−1} {𝛽 ∫ (𝑢′ ∘ 𝜎)(𝑓(𝑘𝑖)𝑧) 𝑓′(𝑘𝑖) 𝑧 𝜙(𝑑𝑧)}

where 𝜎 is the current guess of the policy, and the corresponding endogenous grid point is 𝑦𝑖 = 𝑘𝑖 + 𝑐𝑖.
31.4 Implementation
In [4]: opt_growth_data = [
('α', float64), # Production parameter
@jitclass(opt_growth_data)
class OptimalGrowthModel:
def __init__(self,
α=0.4,
β=0.96,
μ=0,
s=0.1,
grid_max=4,
grid_size=120,
shock_size=250,
seed=1234):
# Set up grid
self.grid = np.linspace(1e-5, grid_max, grid_size)
In [5]: @njit
def K(σ_array, og):
"""
The Coleman-Reffett operator using EGM
"""
# Simplify names
f, β = og.f, og.β
f_prime, u_prime = og.f_prime, og.u_prime
u_prime_inv = og.u_prime_inv
grid, shocks = og.grid, og.shocks
return c
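To see the endogenous grid mechanics in isolation, here is a minimal sketch for the same deterministic special case used earlier (log utility, Cobb-Douglas production, no shocks). Because the Euler equation is inverted analytically, no root-finding occurs, and the limit can be checked against the closed form (1 − 𝛼𝛽)𝑦:

```python
import numpy as np

α, β = 0.4, 0.96
f = lambda k: k**α
f_prime = lambda k: α * k**(α - 1)
u_prime = lambda c: 1 / c           # log utility
u_prime_inv = lambda m: 1 / m

k_grid = np.linspace(1e-5, 2, 200)  # exogenous grid over savings k = y - c

def K_egm(y, σ):
    """One EGM step; (y, σ) gives the current policy on its own grid."""
    σ_func = lambda x: np.interp(x, y, σ)
    # Invert the Euler equation analytically -- no root-finding needed
    c = u_prime_inv(β * u_prime(σ_func(f(k_grid))) * f_prime(k_grid))
    return k_grid + c, c            # endogenous income grid and policy

y, σ = k_grid.copy(), k_grid.copy() # initial condition σ(y) = y
for _ in range(60):
    y, σ = K_egm(y, σ)

max_err = np.max(np.abs(σ - (1 - α * β) * y))  # vs known closed form
```

This is an illustration under the stated assumptions, not the lecture's jitted implementation; the stochastic version replaces the inner expression with a Monte Carlo average over shocks.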
31.4.2 Testing
In [6]: og = OptimalGrowthModel()
grid = og.grid
# Set up loop
i = 0
error = tol + 1
i += 1
if verbose and i % print_skip == 0:
print(f"Error at iteration {i} is {error}.")
σ = σ_new
if i == max_iter:
print("Failed to converge!")
return σ_new
Converged in 12 iterations.
Here is a plot of the resulting policy, compared with the true policy:
fig, ax = plt.subplots()
ax.plot(y, σ, lw=2,
alpha=0.8, label='approximate policy function')
ax.legend()
plt.show()
Out[10]: 1.530274914252061e-05
In [11]: %%timeit -n 3 -r 1
σ = solve_model_time_iter(og, σ_init, verbose=False)
Relative to time iteration, which was already found to be highly efficient, EGM has managed to shave off still more run time without compromising accuracy.
This is due to the lack of a numerical root-finding step.
We can now solve the optimal growth model at given parameters extremely fast.
Chapter 32
32.1 Contents
• Overview 32.2
• The Optimal Savings Problem 32.3
• Computation 32.4
• Implementation 32.5
• Exercises 32.6
• Solutions 32.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
32.2 Overview
In this lecture, we study an optimal savings problem for an infinitely lived consumer—the
“common ancestor” described in [72], section 1.3.
This is an essential sub-problem for many representative macroeconomic models
• [4]
• [59]
• etc.
It is related to the decision problem in the stochastic optimal growth model and yet differs in
important ways.
For example, the choice problem for the agent includes an additive income term that leads to
an occasionally binding constraint.
Moreover, in this and the following lectures, we will inject more realistic features such as correlated shocks.
To solve the model we will use Euler equation based time iteration, which proved to be fast
and accurate in our investigation of the stochastic optimal growth model.
492 CHAPTER 32. THE INCOME FLUCTUATION PROBLEM I: BASIC MODEL
Time iteration is globally convergent under mild assumptions, even when utility is unbounded
(both above and below).
We’ll need the following imports:
32.2.1 References
32.3 The Optimal Savings Problem
Let’s write down the model and then discuss how to solve it.
32.3.1 Set-Up
Consider a household that chooses a state-contingent consumption plan {𝑐𝑡 }𝑡≥0 to maximize
𝔼 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐𝑡)
subject to

𝑎𝑡+1 ≤ 𝑅(𝑎𝑡 − 𝑐𝑡) + 𝑌𝑡+1,  𝑐𝑡 ≥ 0,  𝑎𝑡 ≥ 0,  𝑡 = 0, 1, …
Here
• 𝛽 ∈ (0, 1) is the discount factor
• 𝑎𝑡 is asset holdings at time 𝑡, with borrowing constraint 𝑎𝑡 ≥ 0
• 𝑐𝑡 is consumption
• 𝑌𝑡 is non-capital income (wages, unemployment compensation, etc.)
• 𝑅 ∶= 1 + 𝑟, where 𝑟 > 0 is the interest rate on savings
The timing here is as follows:
2. Labor is supplied by the household throughout the period and labor income 𝑌𝑡+1 is received at the end of period 𝑡.
1. 𝛽𝑅 < 1
2. 𝑢 is smooth, strictly increasing and strictly concave with lim𝑐→0 𝑢′ (𝑐) = ∞ and
lim𝑐→∞ 𝑢′ (𝑐) = 0
1. (𝑎0 , 𝑧0 ) = (𝑎, 𝑧)
The meaning of the third point is just that consumption at time 𝑡 cannot be a function of outcomes that are yet to be observed.
In fact, for this problem, consumption can be chosen optimally by taking it to be contingent
only on the current state.
Optimality is defined below.
𝑉(𝑎, 𝑧) ∶= max 𝔼 {∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐𝑡)}   (2)
where the maximization is over all feasible consumption paths from (𝑎, 𝑧).
An optimal consumption path from (𝑎, 𝑧) is a feasible consumption path from (𝑎, 𝑧) that at-
tains the supremum in (2).
To pin down such paths we can use a version of the Euler equation, which in the present setting is

𝑢′(𝑐𝑡) = max {𝛽𝑅 𝔼𝑡 𝑢′(𝑐𝑡+1), 𝑢′(𝑎𝑡)}   (5)

and the transversality condition

lim_{𝑡→∞} 𝛽^𝑡 𝔼 [𝑢′(𝑐𝑡) 𝑎𝑡+1] = 0   (6)
As shown in [75],
1. For each (𝑎, 𝑧) ∈ S, a unique optimal consumption path from (𝑎, 𝑧) exists
2. This path is the unique feasible path from (𝑎, 𝑧) satisfying the Euler equality (5) and
the transversality condition
Moreover, there exists an optimal consumption function 𝜎∗ ∶ S → ℝ+ such that the path from
(𝑎, 𝑧) generated by
satisfies both (5) and (6), and hence is the unique optimal path from (𝑎, 𝑧).
Thus, to solve the optimization problem, we need to compute the policy 𝜎∗ .
32.4 Computation
Our investigation of the cake eating problem and the stochastic optimal growth model suggests that time iteration will be faster and more accurate than value function iteration.
This is the approach that we apply below.
We can rewrite (5) to make it a statement about functions rather than random variables.
In particular, consider the functional equation
(𝑢′ ∘ 𝜎)(𝑎, 𝑧) = max {𝛽𝑅 𝔼𝑧 (𝑢′ ∘ 𝜎)[𝑅(𝑎 − 𝜎(𝑎, 𝑧)) + 𝑌 ̂ , 𝑍]̂ , 𝑢′ (𝑎)} (7)
where
• (𝑢′ ∘ 𝜎)(𝑠) ∶= 𝑢′(𝜎(𝑠))
• 𝔼𝑧 conditions on the current state 𝑧, and 𝑋̂ indicates the next-period value of a random variable 𝑋
• 𝜎 is the unknown function
We need a suitable class of candidate solutions for the optimal consumption policy.
The right way to pick such a class is to consider what properties the solution is likely to have,
in order to restrict the search space and ensure that iteration is well behaved.
To this end, let 𝒞 be the space of continuous functions 𝜎 ∶ S → ℝ such that 𝜎 is increasing in
the first argument, 0 < 𝜎(𝑎, 𝑧) ≤ 𝑎 for all (𝑎, 𝑧) ∈ S, and
The proof of the last statement is somewhat technical but here is a quick summary:
It is shown in [75] that 𝐾 is a contraction mapping on 𝒞 under the metric

𝜌(𝑐, 𝑑) ∶= ‖𝑢′ ∘ 𝑐 − 𝑢′ ∘ 𝑑‖ ∶= sup_{𝑠 ∈ S} |𝑢′(𝑐(𝑠)) − 𝑢′(𝑑(𝑠))|   (𝑐, 𝑑 ∈ 𝒞)
32.5 Implementation
We use the CRRA utility specification

𝑢(𝑐) = 𝑐^{1−𝛾} / (1 − 𝛾)
The exogenous state process {𝑍𝑡} defaults to a two-state Markov chain with state space {0, 1} and transition matrix 𝑃.
Here we build a class called IFP that stores the model primitives.
In [3]: ifp_data = [
('R', float64), # Interest rate 1 + r
('β', float64), # Discount factor
('γ', float64), # Preference parameter
('P', float64[:, :]), # Markov matrix for binary Z_t
('y', float64[:]), # Income is Y_t = y[Z_t]
('asset_grid', float64[:]) # Grid (array)
]
@jitclass(ifp_data)
class IFP:
def __init__(self,
r=0.01,
β=0.96,
γ=1.5,
P=((0.6, 0.4),
(0.05, 0.95)),
y=(0.0, 2.0),
grid_max=16,
grid_size=50):
self.R = 1 + r
self.β, self.γ = β, γ
self.P, self.y = np.array(P), np.array(y)
self.asset_grid = np.linspace(0, grid_max, grid_size)
In [4]: @njit
def euler_diff(c, a, z, σ_vals, ifp):
"""
The difference between the left- and right-hand sides
of the Euler equation, given the current policy σ.
"""
# Simplify names
R, P, y, β, γ = ifp.R, ifp.P, ifp.y, ifp.β, ifp.γ
asset_grid, u_prime = ifp.asset_grid, ifp.u_prime
n = len(P)
Note that we use linear interpolation along the asset grid to approximate the policy function.
The next step is to obtain the root of the Euler difference.
In [5]: @njit
def K(σ, ifp):
"""
The operator K.
"""
σ_new = np.empty_like(σ)
for i, a in enumerate(ifp.asset_grid):
for z in (0, 1):
result = brentq(euler_diff, 1e-8, a, args=(a, z, σ, ifp))
σ_new[i, z] = result.root
return σ_new
With the operator 𝐾 in hand, we can choose an initial condition and start to iterate.
The following function iterates to convergence and returns the approximate optimal policy.
# Set up loop
i = 0
error = tol + 1
if i == max_iter:
print("Failed to converge!")
return σ_new
Let’s carry this out using the default parameters of the IFP class:
Converged in 60 iterations.
The following exercises walk you through several applications where policy functions are computed.
return (1 - β**(1/γ)) * x
fig, ax = plt.subplots()
ax.plot(a_grid, σ_star[:, 0], label='numerical')
ax.plot(a_grid, c_star(a_grid, ifp.β, ifp.γ), '--', label='analytical')
ax.set(xlabel='assets', ylabel='consumption')
ax.legend()
plt.show()
Success!
32.6 Exercises
32.6.1 Exercise 1
32.6.2 Exercise 2
Now let’s consider the long run asset levels held by households under the default parameters.
The following figure is a 45 degree diagram showing the law of motion for assets when consumption is optimal
fig, ax = plt.subplots()
for z, lb in zip((0, 1), ('low income', 'high income')):
ax.plot(a, R * (a - σ_star[:, z]) + y[z], label=lb)
ax.plot(a, a, 'k')
ax.set(xlabel='current assets', ylabel='next period assets')
ax.legend()
plt.show()
The unbroken lines show the update function for assets at each 𝑧, which is

𝑎 ↦ 𝑅(𝑎 − 𝜎∗(𝑎, 𝑧)) + 𝑦(𝑧)
32.6.3 Exercise 3
Following on from exercises 1 and 2, let’s look at how savings and aggregate asset holdings
vary with the interest rate
• Note: [72] section 18.6 can be consulted for more background on the topic treated in
this exercise.
For a given parameterization of the model, the mean of the stationary distribution of assets
can be interpreted as aggregate capital in an economy with a unit mass of ex-ante identical
households facing idiosyncratic shocks.
Your task is to investigate how this measure of aggregate capital varies with the interest rate.
Following tradition, put the price (i.e., the interest rate) on the vertical axis.
On the horizontal axis, put aggregate capital, computed as the mean of the stationary distribution given the interest rate.
32.7 Solutions
32.7.1 Exercise 1
fig, ax = plt.subplots()
for r_val in r_vals:
ifp = IFP(r=r_val)
σ_star = solve_model_time_iter(ifp, σ_init, verbose=False)
ax.plot(ifp.asset_grid, σ_star[:, 0], label=f'$r = {r_val:.3f}$')
32.7.2 Exercise 2
Now we call the function, generate the series and then histogram it:
fig, ax = plt.subplots()
ax.hist(a, bins=20, alpha=0.5, density=True)
ax.set(xlabel='assets')
plt.show()
32.7.3 Exercise 3
In [15]: M = 25
r_vals = np.linspace(0, 0.02, M)
fig, ax = plt.subplots()
asset_mean = []
for r in r_vals:
print(f'Solving model at r = {r}')
ifp = IFP(r=r)
mean = np.mean(compute_asset_series(ifp, T=250_000))
asset_mean.append(mean)
ax.plot(asset_mean, r_vals)
plt.show()
Chapter 33
33.1 Contents
• Overview 33.2
• The Savings Problem 33.3
• Solution Algorithm 33.4
• Implementation 33.5
• Exercises 33.6
• Solutions 33.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
33.2 Overview
508 CHAPTER 33. THE INCOME FLUCTUATION PROBLEM II: STOCHASTIC RETURNS ON ASSETS
33.3 The Savings Problem
33.3.1 Set Up
𝔼 {∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐𝑡)}   (1)
subject to

𝑎𝑡+1 = 𝑅𝑡+1(𝑎𝑡 − 𝑐𝑡) + 𝑌𝑡+1,  0 ≤ 𝑐𝑡 ≤ 𝑎𝑡   (2)

with 𝑅𝑡 ∶= 𝑅(𝑍𝑡, 𝜁𝑡) and 𝑌𝑡 ∶= 𝑌(𝑍𝑡, 𝜂𝑡), where
• the maps 𝑅 and 𝑌 are time-invariant nonnegative functions,
• the innovation processes {𝜁𝑡 } and {𝜂𝑡 } are IID and independent of each other, and
• {𝑍𝑡 }𝑡≥0 is an irreducible time-homogeneous Markov chain on a finite set Z
Let 𝑃 represent the Markov matrix for the chain {𝑍𝑡 }𝑡≥0 .
Our assumptions on preferences are the same as in our previous lecture on the income fluctuation problem.
As before, 𝔼𝑧 𝑋̂ means expectation of next period value 𝑋̂ given current value 𝑍 = 𝑧.
33.3.2 Assumptions
We need restrictions to ensure that the objective (1) is finite and the solution methods described below converge.
We also need to ensure that the present discounted value of wealth does not grow too quickly.
When {𝑅𝑡 } was constant we required that 𝛽𝑅 < 1.
Now that it is stochastic, we require that

𝛽𝐺𝑅 < 1, where 𝐺𝑅 ∶= lim_{𝑛→∞} (𝔼 ∏_{𝑡=1}^{𝑛} 𝑅𝑡)^{1/𝑛}   (4)
Notice that, when {𝑅𝑡} takes some constant value 𝑅, this reduces to the previous restriction 𝛽𝑅 < 1.
The value 𝐺𝑅 can be thought of as the long run (geometric) average gross rate of return.
More intuition behind (4) is provided in [75].
Discussion on how to check it is given below.
Finally, we impose some routine technical restrictions on non-financial income.
One relatively simple setting where all these restrictions are satisfied is the IID and CRRA
environment of [10].
33.3.3 Optimality
(Intuition and derivation are similar to our earlier lecture on the income fluctuation problem.)
We again solve the Euler equation using time iteration, iterating with a Coleman-Reffett operator 𝐾 defined to match the Euler equation (5).
Our definition of the candidate class 𝜎 ∈ 𝒞 of consumption policies is the same as in our earlier lecture on the income fluctuation problem.
For fixed 𝜎 ∈ 𝒞 and (𝑎, 𝑧) ∈ S, the value 𝐾𝜎(𝑎, 𝑧) of the function 𝐾𝜎 at (𝑎, 𝑧) is defined as the 𝜉 ∈ (0, 𝑎] that solves

𝑢′(𝜉) = max {𝛽 𝔼𝑧 𝑅̂ (𝑢′ ∘ 𝜎)[𝑅̂(𝑎 − 𝜉) + 𝑌̂, 𝑍̂], 𝑢′(𝑎)}   (6)
The idea behind 𝐾 is that, as can be seen from the definitions, 𝜎 ∈ 𝒞 satisfies the Euler equation if and only if 𝐾𝜎(𝑎, 𝑧) = 𝜎(𝑎, 𝑧) for all (𝑎, 𝑧) ∈ S.
This means that fixed points of 𝐾 in 𝒞 and optimal consumption policies exactly coincide
(see [75] for more details).
2. there exists an integer 𝑛 such that 𝐾 𝑛 is a contraction mapping on (𝒞, 𝜌), and
We now have a clear path to successfully approximating the optimal policy: choose some 𝜎 ∈ 𝒞 and then iterate with 𝐾 until convergence (as measured by the distance 𝜌).
In the study of that model we found that it was possible to further accelerate time iteration
via the endogenous grid method.
We will use the same method here.
The methodology is the same as it was for the optimal growth model, with the minor exception that we need to remember that consumption is not always interior.
In particular, optimal consumption can be equal to assets when the level of assets is low.
The endogenous grid method (EGM) calls for us to take a grid of savings values 𝑠𝑖 , where
each such 𝑠 is interpreted as 𝑠 = 𝑎 − 𝑐.
For the lowest grid point we take 𝑠0 = 0.
For the corresponding 𝑎0 , 𝑐0 pair we have 𝑎0 = 𝑐0 .
This happens close to the origin, where assets are low and the household consumes all that it
can.
Although there are many solutions, the one we take is 𝑎0 = 𝑐0 = 0, which pins down the
policy at the origin, aiding interpolation.
For 𝑠 > 0, we have, by definition, 𝑐 < 𝑎, and hence consumption is interior.
Hence the max component of (5) drops out, and we solve for

𝑐𝑖 = (𝑢′)^{−1} {𝛽 𝔼𝑧 𝑅̂ (𝑢′ ∘ 𝜎)[𝑅̂ 𝑠𝑖 + 𝑌̂, 𝑍̂]}   (7)
at each 𝑠𝑖 .
Iterating
Once we have the pairs {𝑠𝑖 , 𝑐𝑖 }, the endogenous asset grid is obtained by 𝑎𝑖 = 𝑐𝑖 + 𝑠𝑖 .
Also, we held 𝑧 ∈ Z fixed in the discussion above, so we can pair it with 𝑎𝑖.
An approximation of the policy (𝑎, 𝑧) ↦ 𝜎(𝑎, 𝑧) can be obtained by interpolating {𝑎𝑖 , 𝑐𝑖 } at
each 𝑧.
In what follows, we use linear interpolation.
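For instance, given hypothetical (𝑎𝑖, 𝑐𝑖) pairs (the numbers below are illustrative only), the interpolated policy with the 𝑎0 = 𝑐0 = 0 anchor can be built in one line:

```python
import numpy as np

# Hypothetical endogenous grid output for one value of z; the (0, 0)
# pair anchors the policy at the origin, aiding interpolation
a_i = np.array([0.0, 0.5, 1.2, 2.5])   # endogenous asset grid a_i = s_i + c_i
c_i = np.array([0.0, 0.3, 0.6, 1.0])   # consumption at each a_i

σ = lambda a: np.interp(a, a_i, c_i)   # linear interpolation in assets
```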
Convergence of time iteration is dependent on the condition 𝛽𝐺𝑅 < 1 being satisfied.
One can check this using the fact that 𝐺𝑅 is equal to the spectral radius of the matrix 𝐿 defined by
This identity is proved in [75], where 𝜙 is the density of the innovation 𝜁𝑡 to returns on assets.
(Remember that Z is a finite set, so this expression defines a matrix.)
Checking the condition is even easier when {𝑅𝑡 } is IID.
In that case, it is clear from the definition of 𝐺𝑅 that 𝐺𝑅 is just 𝔼𝑅𝑡 .
We test the condition 𝛽𝔼𝑅𝑡 < 1 in the code below.
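With the lognormal specification used below, 𝑅𝑡 = exp(𝑎𝑟𝜁𝑡 + 𝑏𝑟) with 𝜁𝑡 standard normal, the lognormal mean gives 𝔼𝑅𝑡 = exp(𝑏𝑟 + 𝑎𝑟²/2), so the IID check is a one-liner (the parameter values are the class defaults that appear below):

```python
import numpy as np

β, a_r, b_r = 0.96, 0.1, 0.0
# E R_t for lognormal R_t = exp(a_r ζ + b_r), ζ ~ N(0, 1)
G_R = np.exp(b_r + a_r**2 / 2)
stable = β * G_R < 1   # True: the default parameters satisfy β E R_t < 1
```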
33.5 Implementation
We will assume that 𝑅𝑡 = exp(𝑎𝑟 𝜁𝑡 + 𝑏𝑟 ) where 𝑎𝑟 , 𝑏𝑟 are constants and {𝜁𝑡 } is IID standard
normal.
We allow labor income to be correlated, with
𝑌𝑡 = exp(𝑎𝑦 𝜂𝑡 + 𝑍𝑡 𝑏𝑦 )
where {𝜂𝑡 } is also IID standard normal and {𝑍𝑡 } is a Markov chain taking values in {0, 1}.
In [3]: ifp_data = [
('γ', float64), # utility parameter
('β', float64), # discount factor
('P', float64[:, :]), # transition probs for z_t
('a_r', float64), # scale parameter for R_t
('b_r', float64), # additive parameter for R_t
('a_y', float64), # scale parameter for Y_t
('b_y', float64), # additive parameter for Y_t
('s_grid', float64[:]), # Grid over savings
('η_draws', float64[:]), # Draws of innovation η for MC
('ζ_draws', float64[:]) # Draws of innovation ζ for MC
]
In [4]: @jitclass(ifp_data)
class IFP:
"""
A class that stores primitives for the income fluctuation
problem.
"""
def __init__(self,
γ=1.5,
β=0.96,
P=np.array([(0.9, 0.1),
(0.1, 0.9)]),
a_r=0.1,
b_r=0.0,
a_y=0.2,
b_y=0.5,
shock_draw_size=50,
grid_max=10,
grid_size=100,
seed=1234):
# Marginal utility
def u_prime(self, c):
return c**(-self.γ)
In [5]: @njit
def K(a_in, σ_in, ifp):
"""
The Coleman-Reffett operator for the income fluctuation problem,
using the endogenous grid method.
"""
# Simplify names
u_prime, u_prime_inv = ifp.u_prime, ifp.u_prime_inv
R, Y, P, β = ifp.R, ifp.Y, ifp.P, ifp.β
s_grid, η_draws, ζ_draws = ifp.s_grid, ifp.η_draws, ifp.ζ_draws
n = len(P)
# Allocate memory
σ_out = np.empty_like(σ_in)
The next function solves for an approximation of the optimal consumption policy via time
iteration.
# Set up loop
i = 0
error = tol + 1
if i == max_iter:
print("Failed to converge!")
Converged in 45 iterations.
plt.legend()
plt.show()
Notice that we consume all assets in the lower range of the asset space.
This is because we anticipate income 𝑌𝑡+1 tomorrow, which makes the need to save less urgent.
Can you explain why consuming all assets ends earlier (for lower values of assets) when 𝑧 =
0?
Let’s try to get some idea of what will happen to assets over the long run under this con-
sumption policy.
As with our earlier lecture on the income fluctuation problem, we begin by producing a 45
degree diagram showing the law of motion for assets
a = a_star
fig, ax = plt.subplots()
for z, lb in zip((0, 1), ('bad state', 'good state')):
ax.legend()
plt.show()
The unbroken lines represent, for each 𝑧, an average update function for assets, given by

𝑎 ↦ 𝑅̄(𝑎 − 𝜎∗(𝑎, 𝑧)) + 𝑌̄(𝑧)
Here
• 𝑅̄ = 𝔼𝑅𝑡 is the mean gross return and
• 𝑌̄(𝑧) = 𝔼𝑧 𝑌(𝑧, 𝜂𝑡) is mean labor income in state 𝑧
The dashed line is the 45 degree line.
We can see from the figure that the dynamics will be stable — assets do not diverge even in
the highest state.
33.6 Exercises
33.6.1 Exercise 1
Let’s repeat our earlier exercise on the long-run cross sectional distribution of assets.
In that exercise, we used a relatively simple income fluctuation model.
In the solution, we found the shape of the asset distribution to be unrealistic.
In particular, we failed to match the long right tail of the wealth distribution.
Your task is to try again, repeating the exercise, but now with our more sophisticated model.
Use the default parameters.
33.7 Solutions
33.7.1 Exercise 1
In [12]: @njit
def compute_asset_series(ifp, a_star, σ_star, z_seq, T=500_000):
"""
Simulates a time series of length T for assets, given optimal
savings behavior.
"""
Now we call the function, generate the series and then histogram it, using the solutions computed above.
In [13]: T = 1_000_000
mc = MarkovChain(ifp.P)
z_seq = mc.simulate(T, random_state=1234)
fig, ax = plt.subplots()
ax.hist(a, bins=40, alpha=0.5, density=True)
ax.set(xlabel='assets')
plt.show()
Now we have managed to successfully replicate the long right tail of the wealth distribution.
Here’s another view of this using a horizontal violin plot.
Information
Chapter 34
34.1 Contents
• Overview 34.2
• Model 34.3
• Take 1: Solution by VFI 34.4
• Take 2: A More Efficient Method 34.5
• Another Functional Equation 34.6
• Solving the RWFE 34.7
• Implementation 34.8
• Exercises 34.9
• Solutions 34.10
• Appendix A 34.11
• Appendix B 34.12
• Examples 34.13
In addition to what’s in Anaconda, this lecture deploys the libraries:
34.2 Overview
In this lecture, we consider an extension of the previously studied job search model of McCall
[80].
We’ll build on a model of Bayesian learning discussed in this lecture on the topic of exchange-
ability and its relationship to the concept of IID (identically and independently distributed)
random variables and to Bayesian updating.
In the McCall model, an unemployed worker decides when to accept a permanent job at a
specific fixed wage, given
• his or her discount factor
• the level of unemployment compensation
• the distribution from which wage offers are drawn
522 CHAPTER 34. JOB SEARCH VII: SEARCH WITH LEARNING
In the version considered below, the wage distribution is unknown and must be learned.
• The following is based on the presentation in [72], section 6.6.
Let’s start with some imports
• Infinite horizon dynamic programming with two states and one binary control.
• Bayesian updating to learn the unknown distribution.
34.3 Model
Let’s first review the basic McCall model [80] and then add the variation we want to consider.
Recall that, in the baseline model, an unemployed worker is presented in each period with a permanent job offer at wage 𝑊𝑡.
At time 𝑡, our worker either

1. accepts the offer and works permanently at wage 𝑤𝑡 or
2. rejects the offer, receives unemployment compensation 𝑐 and reconsiders next period
𝑣(𝑤) = max { 𝑤/(1 − 𝛽), 𝑐 + 𝛽 ∫ 𝑣(𝑤′) 𝑞(𝑤′) 𝑑𝑤′ }   (1)
Now let’s extend the model by considering the variation presented in [72], section 6.6.
The model is as above, apart from the fact that
𝜋𝑡+1 = 𝜋𝑡 𝑓(𝑤𝑡+1) / (𝜋𝑡 𝑓(𝑤𝑡+1) + (1 − 𝜋𝑡) 𝑔(𝑤𝑡+1))   (2)
This last expression follows from Bayes’ rule, which tells us that
ℙ{𝑞 = 𝑓 | 𝑊 = 𝑤} = ℙ{𝑊 = 𝑤 | 𝑞 = 𝑓} ℙ{𝑞 = 𝑓} / ℙ{𝑊 = 𝑤},  where  ℙ{𝑊 = 𝑤} = ∑_{𝜔∈{𝑓,𝑔}} ℙ{𝑊 = 𝑤 | 𝑞 = 𝜔} ℙ{𝑞 = 𝜔}
The fact that (2) is recursive allows us to progress to a recursive solution method.
Letting

𝑞𝜋(𝑤) ∶= 𝜋𝑓(𝑤) + (1 − 𝜋)𝑔(𝑤)  and  𝜅(𝑤, 𝜋) ∶= 𝜋𝑓(𝑤) / (𝜋𝑓(𝑤) + (1 − 𝜋)𝑔(𝑤))
we can express the value function for the unemployed worker recursively as follows
𝑣(𝑤, 𝜋) = max { 𝑤/(1 − 𝛽), 𝑐 + 𝛽 ∫ 𝑣(𝑤′, 𝜋′) 𝑞𝜋(𝑤′) 𝑑𝑤′ }  where 𝜋′ = 𝜅(𝑤′, 𝜋)   (3)
Notice that the current guess 𝜋 is a state variable, since it affects the worker’s perception of
probabilities for future rewards.
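The update map 𝜅 is simple to code directly. A minimal sketch, assuming the Beta(1, 1) and Beta(3, 1.2) densities used in the parameterization below:

```python
import numpy as np
from scipy.stats import beta

f = lambda w: beta.pdf(w, 1, 1)      # uniform offer density
g = lambda w: beta.pdf(w, 3, 1.2)    # density favoring high offers

def κ(w, π):
    """Posterior probability that the offer density is f, after seeing w."""
    fw, gw = f(w), g(w)
    return π * fw / (π * fw + (1 - π) * gw)

# A low offer is more likely under f, so it pushes beliefs toward f
π_after_low = κ(0.1, 0.5)
π_after_high = κ(0.9, 0.5)
```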
34.3.3 Parameterization
In [3]: @vectorize
def p(x, a, b):
r = gamma(a + b) / (gamma(a) * gamma(b))
return r * x**(a-1) * (1 - x)**(b-1)
f = lambda x: p(x, 1, 1)
g = lambda x: p(x, 3, 1.2)
ax.legend()
plt.show()
What kind of optimal policy might result from (3) and the parameterization specified above?
Intuitively, if we accept at 𝑤𝑎 and 𝑤𝑎 ≤ 𝑤𝑏, then — all other things being equal — we should also accept at 𝑤𝑏.
This suggests a policy of accepting whenever 𝑤 exceeds some threshold value 𝑤̄.
But 𝑤̄ should depend on 𝜋 — in fact, it should be decreasing in 𝜋 because
• 𝑓 is a less attractive offer distribution than 𝑔
• larger 𝜋 means more weight on 𝑓 and less on 𝑔
Thus larger 𝜋 depresses the worker’s assessment of her future prospects, and relatively low
current offers become more attractive.
Summary: We conjecture that the optimal policy is of the form 𝟙{𝑤 ≥ 𝑤̄(𝜋)} for some decreasing function 𝑤̄.
34.4 Take 1: Solution by VFI
Let’s set about solving the model and see how our results match with our intuition.
We begin by solving via value function iteration (VFI), which is natural but ultimately turns
out to be second best.
The class SearchProblem is used to store parameters and methods needed to compute optimal
actions.
"""
def __init__(self,
β=0.95, # Discount factor
c=0.3, # Unemployment compensation
F_a=1,
F_b=1,
G_a=3,
G_b=1.2,
w_max=1, # Maximum wage possible
w_grid_size=100,
π_grid_size=100,
mc_size=500):
self.mc_size = mc_size
The following function takes an instance of this class and returns jitted versions of the Bellman operator T, and a get_greedy() function to compute the approximate optimal policy from a guess v of the value function
f, g = sp.f, sp.g
w_f, w_g = sp.w_f, sp.w_g
β, c = sp.β, sp.c
mc_size = sp.mc_size
w_grid, π_grid = sp.w_grid, sp.π_grid
@njit
def v_func(x, y, v):
return mlinterp((w_grid, π_grid), v, (x, y))
@njit
return π_new
@njit(parallel=parallel_flag)
def T(v):
"""
The Bellman operator.
"""
v_new = np.empty_like(v)
for i in prange(len(w_grid)):
for j in prange(len(π_grid)):
w = w_grid[i]
π = π_grid[j]
v_1 = w / (1 - β)
integral_f, integral_g = 0, 0
for m in prange(mc_size):
integral_f += v_func(w_f[m], κ(w_f[m], π), v)
integral_g += v_func(w_g[m], κ(w_g[m], π), v)
integral = (π * integral_f + (1 - π) * integral_g) / mc_size
v_2 = c + β * integral
v_new[i, j] = max(v_1, v_2)
return v_new
@njit(parallel=parallel_flag)
def get_greedy(v):
""""
Compute optimal actions taking v as the value function.
"""
σ = np.empty_like(v)
for i in prange(len(w_grid)):
for j in prange(len(π_grid)):
w = w_grid[i]
π = π_grid[j]
v_1 = w / (1 - β)
integral_f, integral_g = 0, 0
for m in prange(mc_size):
integral_f += v_func(w_f[m], κ(w_f[m], π), v)
integral_g += v_func(w_g[m], κ(w_g[m], π), v)
integral = (π * integral_f + (1 - π) * integral_g) / mc_size
v_2 = c + β * integral
σ[i, j] = v_1 > v_2  # Evaluate policy
return σ
return T, get_greedy
We will omit a detailed discussion of the code because there is a more efficient solution
method that we will use later.
To solve the model we will use the following function that iterates using T to find a fixed
point
"""
Solves for the value function
* sp is an instance of SearchProblem
"""
T, _ = operator_factory(sp, use_parallel)
# Set up loop
i = 0
error = tol + 1
m, n = len(sp.w_grid), len(sp.π_grid)
# Initialize v
v = np.zeros((m, n)) + sp.c / (1 sp.β)
if i == max_iter:
print("Failed to converge!")
return v_new
In [7]: sp = SearchProblem()
v_star = solve_model(sp)
fig, ax = plt.subplots(figsize=(6, 6))
ax.contourf(sp.π_grid, sp.w_grid, v_star, 12, alpha=0.6, cmap=cm.jet)
cs = ax.contour(sp.π_grid, sp.w_grid, v_star, 12, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.set(xlabel='$\pi$', ylabel='$w$')
plt.show()
Converged in 34 iterations.
The results fit well with our intuition from the section on looking forward.

• The black line in the figure above corresponds to the function w̄(π) introduced there.
• It is decreasing as expected.
This section illustrates the point that when it comes to programming, a bit of mathematical
analysis goes a long way.
Recall that the reservation wage w̄(π) is the wage at which the worker is indifferent between accepting and rejecting, so it satisfies

w̄(π) / (1 − β) = c + β ∫ v(w′, π′) qπ(w′) dw′    (4)

where v is the value function, which satisfies

v(w, π) = max { w / (1 − β), w̄(π) / (1 − β) }    (5)

Substituting (5) into (4) gives

w̄(π) / (1 − β) = c + β ∫ max { w′ / (1 − β), w̄ ∘ κ(w′, π) / (1 − β) } qπ(w′) dw′

Multiplying both sides by 1 − β yields

w̄(π) = (1 − β)c + β ∫ max { w′, w̄ ∘ κ(w′, π) } qπ(w′) dw′    (6)
Equation (6) can be understood as a functional equation, where 𝑤̄ is the unknown function.
• Let’s call it the reservation wage functional equation (RWFE).
• The solution 𝑤̄ to the RWFE is the object that we wish to compute.
To solve the RWFE, we will first show that its solution is the fixed point of a contraction
mapping.
To this end, let
• 𝑏[0, 1] be the bounded real-valued functions on [0, 1]
• ‖𝜔‖ ∶= sup𝑥∈[0,1] |𝜔(𝑥)|
Consider the operator Q mapping ω ∈ b[0, 1] into Qω ∈ b[0, 1] via

(Qω)(π) = (1 − β)c + β ∫ max { w′, ω ∘ κ(w′, π) } qπ(w′) dw′    (7)
Comparing (6) and (7), we see that the set of fixed points of 𝑄 exactly coincides with the set
of solutions to the RWFE.
• If 𝑄𝑤̄ = 𝑤̄ then 𝑤̄ solves (6) and vice versa.
Moreover, for any ω, ω′ ∈ b[0, 1], basic algebra and the triangle inequality for integrals tell us that
|(𝑄𝜔)(𝜋) − (𝑄𝜔′ )(𝜋)| ≤ 𝛽 ∫ |max {𝑤′ , 𝜔 ∘ 𝜅(𝑤′ , 𝜋)} − max {𝑤′ , 𝜔′ ∘ 𝜅(𝑤′ , 𝜋)}| 𝑞𝜋 (𝑤′ ) 𝑑𝑤′ (8)
Working case by case, it is easy to check that for real numbers a, b, c we always have

|max {a, b} − max {a, c}| ≤ |b − c|    (9)

Combining (8) and (9) yields
|(𝑄𝜔)(𝜋) − (𝑄𝜔′ )(𝜋)| ≤ 𝛽 ∫ |𝜔 ∘ 𝜅(𝑤′ , 𝜋) − 𝜔′ ∘ 𝜅(𝑤′ , 𝜋)| 𝑞𝜋 (𝑤′ ) 𝑑𝑤′ ≤ 𝛽‖𝜔 − 𝜔′ ‖ (10)
In other words, 𝑄 is a contraction of modulus 𝛽 on the complete metric space (𝑏[0, 1], ‖ ⋅ ‖).
Hence
• A unique solution 𝑤̄ to the RWFE exists in 𝑏[0, 1].
• 𝑄𝑘 𝜔 → 𝑤̄ uniformly as 𝑘 → ∞, for any 𝜔 ∈ 𝑏[0, 1].
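To make the contraction property concrete, here is a small illustrative sketch (a toy scalar map, not the lecture's operator Q): iterating any contraction of modulus β shrinks the distance to the fixed point by the factor β at every step, which is why Qᵏω converges.

```python
β = 0.95

def Q_toy(x):
    # A toy contraction of modulus β with fixed point x* = 1 / (1 - β);
    # a stand-in for the operator Q, purely for illustration
    return 1 + β * x

x, x_star = 0.0, 1 / (1 - β)
errors = []
for _ in range(200):
    x = Q_toy(x)
    errors.append(abs(x - x_star))

# Each iteration shrinks the error by exactly the factor β
print(errors[1] / errors[0])  # ≈ 0.95
```

The same geometric error decay governs the function iteration below, which is why relatively few iterations suffice.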
34.8 Implementation
The following function takes an instance of SearchProblem and returns the operator Q
f, g = sp.f, sp.g
w_f, w_g = sp.w_f, sp.w_g
β, c = sp.β, sp.c
mc_size = sp.mc_size
w_grid, π_grid = sp.w_grid, sp.π_grid
@njit
def ω_func(p, ω):
    return interp(π_grid, ω, p)
@njit
def κ(w, π):
    """
    Updates π using Bayes' rule and the current wage observation w.
    """
    pf, pg = π * f(w), (1 - π) * g(w)
    π_new = pf / (pf + pg)
    return π_new
@njit(parallel=parallel_flag)
def Q(ω):
    """
    Updates the reservation wage function guess ω via the operator Q.
    """
    ω_new = np.empty_like(ω)
    for i in prange(len(π_grid)):
        π = π_grid[i]
        integral_f, integral_g = 0, 0
        for m in prange(mc_size):
            integral_f += max(w_f[m], ω_func(κ(w_f[m], π), ω))
            integral_g += max(w_g[m], ω_func(κ(w_g[m], π), ω))
        integral = (π * integral_f + (1 - π) * integral_g) / mc_size
        ω_new[i] = (1 - β) * c + β * integral
    return ω_new

return Q
34.9 Exercises
34.9.1 Exercise 1
34.10 Solutions
34.10.1 Exercise 1
This code solves the “Offer Distribution Unknown” model by iterating on a guess of the reser-
vation wage function.
You should find that the run time is shorter than that of the value function approach.
Similar to above, we set up a function to iterate with Q to find the fixed point
def solve_wbar(sp, use_parallel=True, tol=1e-4, max_iter=1000):
    """
    Solves for the reservation wage function

    * sp is an instance of SearchProblem
    """
    Q = Q_factory(sp, use_parallel)

    # Set up loop
    i = 0
    error = tol + 1

    # Initialize w
    w = np.ones_like(sp.π_grid)

    while i < max_iter and error > tol:
        w_new = Q(w)
        error = np.max(np.abs(w - w_new))
        i += 1
        w = w_new

    if i == max_iter:
        print("Failed to converge!")

    return w_new
In [11]: sp = SearchProblem()
w_bar = solve_wbar(sp)
Converged in 25 iterations.
34.11 Appendix A
The next piece of code generates a fun simulation to see what the effect of a change in the
underlying distribution on the unemployment rate is.
At a point in the simulation, the distribution becomes significantly worse.
It takes a while for agents to learn this, and in the meantime, they are too optimistic and
turn down too many jobs.
As a result, the unemployment rate spikes
@njit
def update(a, b, e, π):
    "Update e and π by drawing wage offer from beta distribution with parameters a and b"
    if e == False:
        w = np.random.beta(a, b)  # Draw random wage
        if w >= w_func(π):
            e = True              # Take new job
        else:
            π = 1 / (1 + ((1 - π) * g(w)) / (π * f(w)))

    return e, π
@njit
def simulate_path(F_a=F_a,
                  F_b=F_b,
                  G_a=G_a,
                  G_b=G_b,
                  N=5000,    # Number of agents
                  T=600,     # Simulation length
                  d=200,     # Change date
                  s=0.025):  # Separation rate
    e = np.ones((N, T+1))
    π = np.ones((N, T+1)) * 1e-3
    a, b = G_a, G_b  # Initial distribution parameters
    for t in range(T+1):
        if t == d:
            a, b = F_a, F_b  # Change distribution parameters
34.12 Appendix B
In this appendix we provide more details about how Bayes’ Law contributes to the workings
of the model.
We present some graphs that bring out additional insights about how learning works.
We build on graphs proposed in this lecture.
In particular, we’ll add actions of our searching worker to a key graph presented in that lec-
ture.
To begin, we first define two functions for computing the empirical distributions of unemploy-
ment duration and π at the time of employment.
In [13]: @njit
def empirical_dist(F_a, F_b, G_a, G_b, w_bar, π_grid,
                   N=10000, T=600):
    """
    Simulates population for computing empirical cumulative
    distribution of unemployment duration and π at time when
    the worker accepts the wage offer. For each job searching
    problem, we simulate for two cases that either f or g is
    the true offer distribution.

    Parameters
    ----------

    Returns
    -------
    accept_t : 2 by N ndarray. the empirical distribution of
               unemployment duration when f or g generates offers.
    accept_π : 2 by N ndarray. the empirical distribution of
               π at the time of employment when f or g generates offers.
    """
    # f or g generates offers
    for i, (a, b) in enumerate([(F_a, F_b), (G_a, G_b)]):
        # update each agent
        for n in range(N):

            # initial prior
            π = 0.5

            for t in range(T+1):
def cumfreq_x(res):
    """
    A helper function for calculating the x grids of
    the cumulative frequency histogram.
    """
    cumcount = res.cumcount
    lowerlimit, binsize = res.lowerlimit, res.binsize

    x = lowerlimit + np.linspace(0, binsize * cumcount.size, cumcount.size)

    return x
Now we define a wrapper function for analyzing job search models with learning under differ-
ent parameterizations.
The wrapper takes parameters of beta distributions and unemployment compensation as in-
puts and then displays various things we want to know to interpret the solution of our search
model.
In addition, it computes empirical cumulative distributions of two key objects.
# part 1: display the details of the model settings and some results
w_grid = np.linspace(1e-12, 1-1e-12, 100)
ΔW = np.zeros((len(W), len(Π)))
ΔΠ = np.empty((len(W), len(Π)))
for i, w in enumerate(W):
    for j, π in enumerate(Π):
        lw = l(w)
        ΔΠ[i, j] = π * (lw / (π * lw + 1 - π) - 1)
plt.show()
We now provide some examples that provide insights about how the model works.
34.13 Examples
The formula implies that the direction of motion of 𝜋𝑡 is determined by the relationship be-
tween 𝑙(𝑤𝑡 ) and 1.
The magnitude is small if

• l(w) is close to 1, which means the new w is not very informative for distinguishing the two distributions, or
• π_{t−1} is close to either 0 or 1, which means the prior is strong.
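These two claims are easy to check numerically; the following sketch (illustrative values only) evaluates the one-step belief change Δπ = π(l/(πl + 1 − π) − 1) implied by Bayes' rule.

```python
def delta_pi(π, l):
    # One-step change in the belief π implied by Bayes' rule,
    # given a likelihood ratio l = f(w) / g(w)
    return π * (l / (π * l + 1 - π) - 1)

# l close to 1: the draw is uninformative, so π barely moves
print(delta_pi(0.5, 1.01))

# π close to 1: the prior is strong, so even l = 3 moves π little
print(delta_pi(0.999, 3.0))

# by contrast, an informative draw under a diffuse prior moves π a lot
print(delta_pi(0.5, 3.0))  # 0.25
```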
Will an unemployed worker accept an offer earlier or not, when the actual ruling distribution
is 𝑔 instead of 𝑓?
Two countervailing effects are at work.
• if f generates successive wage offers, then w is more likely to be low, but π is moving up toward 1, which lowers the reservation wage, i.e., the worker becomes less selective the longer he or she remains unemployed.
• if g generates wage offers, then 𝑤 is more likely to be high, but 𝜋 is moving downward
toward 0, increasing the reservation wage, i.e., the worker becomes more selective the
longer he or she remains unemployed.
Quantitatively, the lower right figure sheds light on which effect dominates in this example.
It shows the probability that a previously unemployed worker accepts an offer at different val-
ues of 𝜋 when 𝑓 or 𝑔 generates wage offers.
That graph shows that for the particular 𝑓 and 𝑔 in this example, the worker is always more
likely to accept an offer when 𝑓 generates the data even when 𝜋 is close to zero so that the
worker believes the true distribution is 𝑔 and therefore is relatively more selective.
The empirical cumulative distribution of the duration of unemployment verifies our conjec-
ture.
In [15]: job_search_example()
34.13.2 Example 2
34.13.3 Example 3
34.13.4 Example 4
34.13.5 Example 5
Chapter 35

Likelihood Ratio Processes

35.1 Contents
• Overview 35.2
• Likelihood Ratio Process 35.3
• Nature Permanently Draws from Density g 35.4
• Nature Permanently Draws from Density f 35.5
• Likelihood Ratio Test 35.6
• Sequels 35.7
35.2 Overview
This lecture describes likelihood ratio processes and some of their uses.
We’ll use the simple statistical setting also used in this lecture.
Among the things that we’ll learn about are
• How a likelihood ratio process is the key ingredient in frequentist hypothesis testing
• How a receiver operator characteristic curve summarizes information about a false
alarm probability and power in frequentist hypothesis testing
• How during World War II the United States Navy devised a decision rule that Captain
Garret L. Schyler challenged and asked Milton Friedman to justify to him, a topic to be
studied in this lecture
35.3 Likelihood Ratio Process

A nonnegative random variable 𝑊 has one of two probability density functions, either 𝑓 or 𝑔.
Before the beginning of time, nature once and for all decides whether she will draw a se-
quence of IID draws from either 𝑓 or 𝑔.
We will sometimes let 𝑞 be the density that nature chose once and for all, so that 𝑞 is either 𝑓
or 𝑔, permanently.
Nature knows which density it permanently draws from, but we the observers do not.
We do know both 𝑓 and 𝑔 but we don’t know which density nature chose.
But we want to know.
To do that, we use observations.
We observe a sequence {𝑤𝑡 }𝑇𝑡=1 of 𝑇 IID draws from either 𝑓 or 𝑔.
We want to use these observations to infer whether nature chose 𝑓 or 𝑔.
A likelihood ratio process is a useful tool for this task.
To begin, we define the key component of a likelihood ratio process, namely, the time t likelihood ratio, as the random variable

ℓ(w_t) = f(w_t) / g(w_t),    t ≥ 1.
We assume that f and g both put positive probabilities on the same intervals of possible realizations of the random variable W.

That means that under the g density, ℓ(w_t) = f(w_t)/g(w_t) is evidently a nonnegative random variable with mean 1.
A likelihood ratio process for the sequence {w_t}_{t=1}^∞ is defined as

L(w^t) = ∏_{i=1}^{t} ℓ(w_i),

where w^t = {w_1, …, w_t} is a history of observations up to and including time t.
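As a minimal sketch, a likelihood ratio process can be built with a cumulative product; the beta parameters below are illustrative stand-ins, not necessarily the ones used later in the lecture.

```python
import numpy as np
from math import gamma

def beta_pdf(x, a, b):
    # Density of a Beta(a, b) random variable
    r = gamma(a + b) / (gamma(a) * gamma(b))
    return r * x**(a - 1) * (1 - x)**(b - 1)

f = lambda w: beta_pdf(w, 1, 1)
g = lambda w: beta_pdf(w, 3, 1.2)

rng = np.random.default_rng(1234)
w = rng.beta(3, 1.2, size=50)   # nature draws the history from g
l = f(w) / g(w)                 # time-t likelihood ratios ℓ(w_t)
L = np.cumprod(l)               # likelihood ratio process L(w^t)

print(L.shape)  # (50,)
```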
The likelihood ratio and its logarithm are key tools for making inferences using a classic fre-
quentist approach due to Neyman and Pearson [? ].
To help us appreciate how things work, the following Python code evaluates 𝑓 and 𝑔 as two
different beta distributions, then computes and simulates an associated likelihood ratio pro-
cess by generating a sequence 𝑤𝑡 from some probability distribution, for example, a sequence
of IID draws from 𝑔.
@vectorize
def p(x, a, b):
    r = gamma(a + b) / (gamma(a) * gamma(b))
    return r * x**(a-1) * (1 - x)**(b-1)
In [3]: @njit
def simulate(a, b, T=50, N=500):
    '''
    Generate N sets of T observations of the likelihood ratio,
    return as N x T matrix.
    '''
    l_arr = np.empty((N, T))
    for i in range(N):
        for j in range(T):
            w = np.random.beta(a, b)
            l_arr[i, j] = f(w) / g(w)

    return l_arr
35.4 Nature Permanently Draws from Density g

We first simulate the likelihood ratio process when nature permanently draws from 𝑔.
In [5]: N, T = l_arr_g.shape
for i in range(N):
    plt.plot(range(T), l_seq_g[i, :], color='b', lw=0.8, alpha=0.5)

plt.ylim([0, 3])
plt.title("$L(w^{t})$ paths");
Evidently, as sample length T grows, most probability mass shifts toward zero.

To see this more clearly, we plot over time the fraction of paths L(w^t) that fall in the interval [0, 0.01].
Despite the evident convergence of most probability mass to a very small interval near 0, the
unconditional mean of 𝐿 (𝑤𝑡 ) under probability density 𝑔 is identically 1 for all 𝑡.
To verify this assertion, first notice that, as mentioned earlier, the unconditional mean E₀[ℓ(w_t) ∣ q = g] is 1 for all t:

E₀[ℓ(w_t) ∣ q = g] = ∫ (f(w_t)/g(w_t)) g(w_t) dw_t = ∫ f(w_t) dw_t = 1,

which immediately implies

E₀[L(w¹) ∣ q = g] = E₀[ℓ(w₁) ∣ q = g] = 1.
Because L(w^t) = ℓ(w_t) L(w^{t−1}) and {w_t}_{t=1}^∞ is an IID sequence, we have

E₀[L(w^t) ∣ q = g] = E₀[L(w^{t−1}) ℓ(w_t) ∣ q = g] = E₀[L(w^{t−1}) ∣ q = g] E₀[ℓ(w_t) ∣ q = g] = E₀[L(w^{t−1}) ∣ q = g]

for any t ≥ 1.
Mathematical induction implies 𝐸0 [𝐿 (𝑤𝑡 ) ∣ 𝑞 = 𝑔] = 1 for all 𝑡 ≥ 1.
How can 𝐸0 [𝐿 (𝑤𝑡 ) ∣ 𝑞 = 𝑔] = 1 possibly be true when most probability mass of the likelihood
ratio process is piling up near 0 as 𝑡 → +∞?
The answer has to be that as t → +∞, the distribution of L_t becomes more and more fat-tailed: enough mass shifts to larger and larger values of L_t to make the mean of L_t continue to be one despite most of the probability mass piling up near 0.
To illustrate this peculiar property, we simulate many paths and calculate the unconditional
mean of 𝐿 (𝑤𝑡 ) by averaging across these many paths at each 𝑡.
The following Python code approximates unconditional means 𝐸0 [𝐿 (𝑤𝑡 )] by averaging across
sample paths.
Please notice that while sample averages hover around their population means of 1, there is
quite a bit of variability, a consequence of the fat tail of the distribution of 𝐿 (𝑤𝑡 ).
In [8]: N, T = l_arr_g.shape
plt.plot(range(T), np.mean(l_arr_g, axis=0))
plt.hlines(1, 0, T, linestyle='--')
35.5 Nature Permanently Draws from Density f

Now suppose that before time 0 nature permanently decided to draw repeatedly from density 𝑓.
While the mean of the likelihood ratio ℓ (𝑤𝑡 ) under density 𝑔 is 1, its mean under the density
𝑓 exceeds one.
To see this, we compute
E₀[ℓ(w_t) ∣ q = f] = ∫ (f(w_t)/g(w_t)) f(w_t) dw_t
                   = ∫ (f(w_t)/g(w_t)) (f(w_t)/g(w_t)) g(w_t) dw_t
                   = ∫ ℓ(w_t)² g(w_t) dw_t
                   = E₀[ℓ(w_t)² ∣ q = g]
                   = E₀[ℓ(w_t) ∣ q = g]² + Var(ℓ(w_t) ∣ q = g)
                   > E₀[ℓ(w_t) ∣ q = g]² = 1
This in turn implies that the unconditional mean of the likelihood ratio process L(w^t) diverges toward +∞.
Simulations below confirm this conclusion.
Please note the scale of the 𝑦 axis.
In [10]: N, T = l_arr_f.shape
plt.plot(range(T), np.mean(l_seq_f, axis=0))
We also plot the probability that 𝐿 (𝑤𝑡 ) falls into the interval [10000, ∞) as a function of time
and watch how fast probability mass diverges to +∞.
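Before moving on, a quick Monte Carlo check of the key equality E₀[ℓ ∣ q = f] = E₀[ℓ² ∣ q = g] is easy to run. To keep the moments in closed form, this sketch uses f = Beta(2, 2) and g = Beta(1, 1) (uniform), which are illustrative choices and not the densities simulated above; here E₀[ℓ² ∣ q = g] = ∫ f(w)² dw = 1.2.

```python
import numpy as np

f = lambda w: 6 * w * (1 - w)    # Beta(2, 2) density
g = lambda w: np.ones_like(w)    # Beta(1, 1) density (uniform)

rng = np.random.default_rng(0)
w_g = rng.uniform(size=1_000_000)     # draws from g
w_f = rng.beta(2, 2, size=1_000_000)  # draws from f

mean_under_g = np.mean(f(w_g) / g(w_g))   # ≈ 1
mean_under_f = np.mean(f(w_f) / g(w_f))   # ≈ 1.2 = E₀[ℓ² | q = g] here
print(mean_under_g, mean_under_f)
```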
35.6 Likelihood Ratio Test

We now describe how to employ the machinery of Neyman and Pearson [? ] to test the hypothesis that history w^t is generated by repeated IID draws from density g.
Denote 𝑞 as the data generating process, so that 𝑞 = 𝑓 or 𝑔.
Upon observing a sample {𝑊𝑖 }𝑡𝑖=1 , we want to decide which one is the data generating pro-
cess by performing a (frequentist) hypothesis test.
We specify
• Null hypothesis 𝐻0 : 𝑞 = 𝑓,
• Alternative hypothesis 𝐻1 : 𝑞 = 𝑔.
Neyman and Pearson proved that the best way to test this hypothesis is to use a likelihood
ratio test that takes the form:
• reject 𝐻0 if 𝐿(𝑊 𝑡 ) < 𝑐,
• accept 𝐻0 otherwise.
where 𝑐 is a given discrimination threshold, to be chosen in a way we’ll soon describe.
This test is best in the sense that it is a uniformly most powerful test.
To understand what this means, we have to define probabilities of two important events that
allow us to characterize a test associated with given threshold 𝑐.
The two probabilities are:

• Probability of detection (= power = 1 minus probability of Type II error):

1 − β ≡ Pr{L(w^t) < c ∣ q = g}

• Probability of false alarm (= significance level = probability of Type I error):

α ≡ Pr{L(w^t) < c ∣ q = f}
The Neyman-Pearson Lemma states that among all possible tests, a likelihood ratio test max-
imizes the probability of detection for a given probability of false alarm.
Another way to say the same thing is that among all possible tests, a likelihood ratio test
maximizes power for a given significance level.
To have made a confident inference, we want a small probability of false alarm and a large
probability of detection.
With sample size 𝑡 fixed, we can change our two probabilities by adjusting 𝑐.
A troublesome “that’s life” fact is that these two probabilities move in the same direction as
we vary the critical value 𝑐.
Without specifying quantitative losses from making Type I and Type II errors, there is little
that we can say about how we should trade off probabilities of the two types of mistakes.
We do know that increasing sample size 𝑡 improves statistical inference.
Below we plot some informative figures that illustrate this.
We also present a classical frequentist method for choosing a sample size 𝑡.
Let’s start with a case in which we fix the threshold 𝑐 at 1.
In [12]: c = 1
Below we plot empirical distributions of logarithms of the cumulative likelihood ratios simu-
lated above, which are generated by either 𝑓 or 𝑔.
Taking logarithms has no effect on calculating the probabilities because the log is a mono-
tonic transformation.
As 𝑡 increases, the probabilities of making Type I and Type II errors both decrease, which is
good.
This is because most of the probability mass of log(L(w^t)) moves toward −∞ when g is the data generating process, while log(L(w^t)) goes to ∞ when data are generated by f.
This diverse behavior is what makes it possible to distinguish 𝑞 = 𝑓 from 𝑞 = 𝑔.
axs[nr, nc].legend()
axs[nr, nc].set_title(f"t={t}")
plt.show()
The graph below shows more clearly that, when we hold the threshold 𝑐 fixed, the probabil-
ity of detection monotonically increases with increases in 𝑡 and that the probability of a false
alarm monotonically decreases.
In [14]: PD = np.empty(T)
PFA = np.empty(T)
for t in range(T):
    PD[t] = np.sum(l_seq_g[:, t] < c) / N
    PFA[t] = np.sum(l_seq_f[:, t] < c) / N
For a given sample size 𝑡, the threshold 𝑐 uniquely pins down probabilities of both types of
error.
If for a fixed 𝑡 we now free up and move 𝑐, we will sweep out the probability of detection as a
function of the probability of false alarm.
This produces what is called a receiver operating characteristic curve for a given discrimina-
tion threshold 𝑐.
Below, we plot receiver operating characteristic curves for a given discrimination threshold 𝑐
but different sample sizes 𝑡.
Notice that as 𝑡 increases, we are assured a larger probability of detection and a smaller prob-
ability of false alarm associated with a given discrimination threshold 𝑐.
As t → +∞, we approach the perfect detection curve that is indicated by a right angle hinging on the green dot.
For a given sample size t, a value of the discrimination threshold c determines a point on the receiver operating characteristic curve.
It is up to the test designer to trade off probabilities of making the two types of errors.
But we know how to choose the smallest sample size to achieve given targets for the probabil-
ities.
Typically, frequentists aim for a high probability of detection that respects an upper bound
on the probability of false alarm.
Below we show an example in which we fix the probability of false alarm at 0.05.
The required sample size for making a decision is then determined by a target probability of
detection, for example, 0.9, as depicted in the following graph.
PFA = 0.05
PD = np.empty(T)
for t in range(T):
    # Choose the threshold c that delivers the target false alarm rate at t
    c = np.percentile(l_seq_f[:, t], PFA * 100)
    PD[t] = np.sum(l_seq_g[:, t] < c) / N

plt.plot(range(T), PD)
plt.axhline(0.9, color="k", ls="--")
plt.xlabel("t")
plt.ylabel("Probability of detection")
plt.title(f"Probability of false alarm={PFA}")
plt.show()
The United States Navy evidently used a procedure like this to select a sample size 𝑡 for do-
ing quality control tests during World War II.
A Navy Captain who had been ordered to perform tests of this kind had second thoughts
about it that he presented to Milton Friedman, as we describe in this lecture.
35.7 Sequels
Likelihood processes play an important role in Bayesian learning, as described in this lecture
and as applied in this lecture.
Likelihood ratio processes appear again in this lecture, which contains another illustration of
the peculiar property of likelihood ratio processes described above.
Chapter 36

A Problem That Stumped Milton Friedman
36.1 Contents
• Overview 36.2
• Origin of the Problem 36.3
• A Dynamic Programming Approach 36.4
• Implementation 36.5
• Analysis 36.6
• Comparison with Neyman-Pearson Formulation 36.7
• Sequels 36.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
36.2 Overview
This lecture describes a statistical decision problem encountered by Milton Friedman and W.
Allen Wallis during World War II when they were analysts at the U.S. Government’s Statisti-
cal Research Group at Columbia University.
This problem led Abraham Wald [109] to formulate sequential analysis, an approach to
statistical decision problems intimately related to dynamic programming.
In this lecture, we apply dynamic programming algorithms to Friedman and Wallis and
Wald’s problem.
Key ideas in play will be:
• Bayes’ Law
• Dynamic programming
• Type I and type II statistical errors
– a type I error occurs when you reject a null hypothesis that is true
This lecture uses ideas studied in this lecture, this lecture, and this lecture.
36.3 Origin of the Problem

On pages 137-139 of his 1998 book Two Lucky People with Rose Friedman [39], Milton Friedman described a problem presented to him and Allen Wallis during World War II, when they worked at the US Government's Statistical Research Group at Columbia University.
Let’s listen to Milton Friedman tell us what happened
The standard statistical answer was to specify a number of firings (say 1,000) and
a pair of percentages (e.g., 53% and 47%) and tell the client that if A receives a 1
in more than 53% of the firings, it can be regarded as superior; if it receives a 1 in
fewer than 47%, B can be regarded as superior; if the percentage is between 47%
and 53%, neither can be so regarded.
When Allen Wallis was discussing such a problem with (Navy) Captain Garret L.
Schyler, the captain objected that such a test, to quote from Allen’s account, may
prove wasteful. If a wise and seasoned ordnance officer like Schyler were on the
premises, he would see after the first few thousand or even few hundred [rounds]
that the experiment need not be completed either because the new method is ob-
viously inferior or because it is obviously superior beyond what was hoped for ….
Friedman and Wallis struggled with the problem but, after realizing that they were not able
to solve it, described the problem to Abraham Wald.
That started Wald on the path that led him to Sequential Analysis [109].
We’ll formulate the problem using dynamic programming.
36.4 A Dynamic Programming Approach
The following presentation of the problem closely follows Dimitri Bertsekas's treatment in Dynamic Programming and Stochastic Control [12].
A decision-maker observes a sequence of draws of a random variable 𝑧.
He (or she) wants to know which of two probability distributions 𝑓0 or 𝑓1 governs 𝑧.
Conditional on knowing that successive observations are drawn from distribution 𝑓0 , the se-
quence of random variables is independently and identically distributed (IID).
Conditional on knowing that successive observations are drawn from distribution 𝑓1 , the se-
quence of random variables is also independently and identically distributed (IID).
But the observer does not know which of the two distributions generated the sequence.
For reasons explained Exchangeability and Bayesian Updating, this means that the sequence
is not IID and that the observer has something to learn, even though he knows both 𝑓0 and
𝑓1 .
After a number of draws, also to be determined, he makes a decision about which of the dis-
tributions is generating the draws he observes.
He starts with prior

π₋₁ = ℙ{f = f₀ ∣ no observations} ∈ (0, 1).

After observing k + 1 draws z_k, z_{k−1}, …, z₀, he updates this to

π_k = ℙ{f = f₀ ∣ z_k, z_{k−1}, …, z₀},

which is calculated recursively by applying Bayes' law:

π_{k+1} = π_k f₀(z_{k+1}) / (π_k f₀(z_{k+1}) + (1 − π_k) f₁(z_{k+1})),    k = −1, 0, 1, …
After observing z_k, z_{k−1}, …, z₀, the decision-maker believes that z_{k+1} has probability distribution

f_{π_k}(v) = π_k f₀(v) + (1 − π_k) f₁(v),

which is a mixture of distributions f₀ and f₁, with the weight on f₀ being the posterior probability that f = f₀.
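A small illustrative sketch of the posterior recursion above (with hypothetical densities f₀ = Beta(2, 1) and f₁ = Beta(1, 2), not the lecture's parameters): when nature actually draws from f₀, the posterior π_k should drift toward 1.

```python
import numpy as np

f0 = lambda z: 2 * z        # Beta(2, 1) density (illustrative choice)
f1 = lambda z: 2 * (1 - z)  # Beta(1, 2) density (illustrative choice)

rng = np.random.default_rng(42)
π = 0.5                     # prior belief that f = f0
for _ in range(200):
    z = rng.beta(2, 1)      # nature draws from f0
    π = π * f0(z) / (π * f0(z) + (1 - π) * f1(z))

print(π)  # close to 1
```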
To help illustrate this kind of distribution, let’s inspect some mixtures of beta distributions.
The density of a beta probability distribution with parameters a and b is

f(z; a, b) = Γ(a + b) z^{a−1} (1 − z)^{b−1} / (Γ(a) Γ(b))    where    Γ(t) := ∫₀^∞ x^{t−1} e^{−x} dx
The next figure shows two beta distributions in the top panel.
The bottom panel presents mixtures of these distributions, with various mixing probabilities
𝜋𝑘
In [3]: @jit(nopython=True)
def p(x, a, b):
    r = gamma(a + b) / (gamma(a) * gamma(b))
    return r * x**(a-1) * (1 - x)**(b-1)
f0 = lambda x: p(x, 1, 1)
f1 = lambda x: p(x, 9, 9)
grid = np.linspace(0, 1, 50)
axes[0].set_title("Original Distributions")
axes[0].plot(grid, f0(grid), lw=2, label="$f_0$")
axes[0].plot(grid, f1(grid), lw=2, label="$f_1$")
axes[1].set_title("Mixtures")
for π in 0.25, 0.5, 0.75:
    y = π * f0(grid) + (1 - π) * f1(grid)
    axes[1].plot(y, lw=2, label=f"$\pi_k$ = {π}")
for ax in axes:
ax.legend()
ax.set(xlabel="$z$ values", ylabel="probability of $z_k$")
plt.tight_layout()
plt.show()
After observing 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 , the decision-maker chooses among three distinct actions:
• He decides that 𝑓 = 𝑓0 and draws no more 𝑧’s
• He decides that 𝑓 = 𝑓1 and draws no more 𝑧’s
• He postpones deciding now and instead chooses to draw a 𝑧𝑘+1
Associated with these three actions, the decision-maker can suffer three kinds of losses:
• A loss 𝐿0 if he decides 𝑓 = 𝑓0 when actually 𝑓 = 𝑓1
• A loss 𝐿1 if he decides 𝑓 = 𝑓1 when actually 𝑓 = 𝑓0
• A cost 𝑐 if he postpones deciding and chooses instead to draw another 𝑧
36.4.3 Intuition
Let’s try to guess what an optimal decision rule might look like before we go further.
Suppose at some given point in time that 𝜋 is close to 1.
Then our prior beliefs and the evidence so far point strongly to 𝑓 = 𝑓0 .
If, on the other hand, 𝜋 is close to 0, then 𝑓 = 𝑓1 is strongly favored.
Finally, if 𝜋 is in the middle of the interval [0, 1], then we have little information in either di-
rection.
This reasoning suggests a decision rule such as the one shown in the figure
As we’ll see, this is indeed the correct form of the decision rule.
The key problem is to determine the threshold values 𝛼, 𝛽, which will depend on the parame-
ters listed above.
You might like to pause at this point and try to predict the impact of a parameter such as 𝑐
or 𝐿0 on 𝛼 or 𝛽.
Let 𝐽 (𝜋) be the total loss for a decision-maker with current belief 𝜋 who chooses optimally.
With some thought, you will agree that J should satisfy the Bellman equation

J(π) = min {(1 − π)L₀, πL₁, c + 𝔼[J(π′)]}    (1)

where π′ is the random variable defined by Bayes' Law

π′ = κ(z′, π) = π f₀(z′) / (π f₀(z′) + (1 − π) f₁(z′))    (2)

when π is fixed and z′ is drawn from the current best guess, which is the distribution f defined by

f_π(z′) = π f₀(z′) + (1 − π) f₁(z′)    (3)

In the Bellman equation, minimization is over the three actions: accept f₀, accept f₁, or postpone deciding and draw again.

The optimal decision rule is characterized by two cutoff values α, β ∈ (0, 1) and takes the form

accept f = f₀ if π ≥ α
accept f = f₁ if π ≤ β
draw another z if β ≤ π ≤ α
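For concreteness, the cutoff rule can be sketched as a tiny function (the values of α and β here are placeholders; the lecture computes them from the value function below).

```python
def decide(π, α=0.8, β=0.2):
    # Cutoff decision rule: accept f0 for high beliefs, f1 for low ones,
    # and keep sampling in the indecisive middle region
    if π >= α:
        return "accept f0"
    elif π <= β:
        return "accept f1"
    return "draw another z"

print(decide(0.95), decide(0.05), decide(0.5))
```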
Our aim is to compute the value function 𝐽 , and from it the associated cutoffs 𝛼 and 𝛽.
To make our computations simpler, using (2), we can write the continuation value h(π) = c + 𝔼[J(π′)] as

h(π) = c + ∫ min {(1 − κ(z′, π))L₀, κ(z′, π)L₁, h(κ(z′, π))} f_π(z′) dz′    (4)
36.5 Implementation
In [5]: @jitclass(wf_data)
class WaldFriedman:

    def __init__(self,
                 c=1.25,
                 a0=1,
                 b0=1,
                 a1=3,
                 b1=1.2,
                 L0=25,
                 L1=25,
                 π_grid_size=200,
                 mc_size=1000):
    def f0_rvs(self):
        return np.random.beta(self.a0, self.b0)

    def f1_rvs(self):
        return np.random.beta(self.a1, self.b1)

    def κ(self, z, π):
        """
        Updates π using Bayes' rule and the current observation z.
        """
        f0, f1 = self.f0(z), self.f1(z)
        π_f0, π_f1 = π * f0, (1 - π) * f1
        π_new = π_f0 / (π_f0 + π_f1)
        return π_new
κ = wf.κ

h_new = np.empty_like(π_grid)
h_func = lambda p: interp(π_grid, h, p)

for i in prange(len(π_grid)):
    π = π_grid[i]

    # Find the expected continuation value by Monte Carlo integration,
    # mixing draws from f0 and f1 with weights π and 1 - π
    integral_f0, integral_f1 = 0, 0
    for m in range(mc_size):
        π_0 = κ(z0[m], π)  # Draw z from f0 and update π
        integral_f0 += min((1 - π_0) * L0, π_0 * L1, h_func(π_0))

        π_1 = κ(z1[m], π)  # Draw z from f1 and update π
        integral_f1 += min((1 - π_1) * L0, π_1 * L1, h_func(π_1))

    integral = (π * integral_f0 + (1 - π) * integral_f1) / mc_size

    h_new[i] = c + integral

return h_new
To solve the model, we will iterate using Q to find the fixed point
In [7]: @jit(nopython=True)
def solve_model(wf, tol=1e-4, max_iter=1000):
    """
    Compute the continuation value function

    * wf is an instance of WaldFriedman
    """
    # Set up loop
    h = np.zeros(len(wf.π_grid))
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        h_new = Q(h, wf)
        error = np.max(np.abs(h - h_new))
        i += 1
        h = h_new

    if i == max_iter:
        print("Failed to converge!")

    return h_new
36.6 Analysis
In [8]: wf = WaldFriedman()
ax.plot(wf.f0(wf.π_grid), label="$f_0$")
ax.plot(wf.f1(wf.π_grid), label="$f_1$")
ax.set(ylabel="probability of $z_k$", xlabel="$k$", title="Distributions")
ax.legend()
plt.show()
We will also set up a function to compute the cutoffs 𝛼 and 𝛽 and plot these on our value
function plot
In [10]: @jit(nopython=True)
def find_cutoff_rule(wf, h):
    """
    This function takes a continuation value function and returns the
    corresponding cutoffs of where you transition between continuing and
    choosing a specific model
    """
    π_grid = wf.π_grid
    L0, L1 = wf.L0, wf.L1

    # Evaluate the cost of accepting each model at every grid point
    cost_f0 = (1 - π_grid) * L0
    cost_f1 = π_grid * L1

    # The worker keeps drawing wherever continuing is the cheapest option
    continuation = h < np.minimum(cost_f0, cost_f1)
    β = π_grid[continuation][0]   # Lowest belief at which he still draws
    α = π_grid[continuation][-1]  # Highest belief at which he still draws

    return (β, α)
β, α = find_cutoff_rule(wf, h_star)
cost_L0 = (1 - wf.π_grid) * wf.L0
cost_L1 = wf.π_grid * wf.L1
plt.legend(borderpad=1.1)
plt.show()
36.6.2 Simulations
The next figure shows the outcomes of 500 simulations of the decision process.
On the left is a histogram of the stopping times, which equal the number of draws of 𝑧𝑘 re-
quired to make a decision.
The average number of draws is around 6.6.
On the right is the fraction of correct decisions at the stopping time.
In this case, the decision-maker is correct 80% of the time
"""
This function takes an initial condition and simulates until it
stops (when a decision is made)
"""
if true_dist == "f0":
f, f_rvs = wf.f0, wf.f0_rvs
elif true_dist == "f1":
f, f_rvs = wf.f1, wf.f1_rvs
# Find cutoffs
β, α = find_cutoff_rule(wf, h_star)
if true_dist == "f0":
if decision == 0:
correct = True
else:
correct = False
return correct, π, t
"""
Simulates repeatedly to get distributions of time needed to make a
decision and how often they are correct
"""
for i in range(ndraws):
correct, π, t = simulate(wf, true_dist, h_star)
tdist[i] = t
cdist[i] = correct
def simulation_plot(wf):
    h_star = solve_model(wf)
    ndraws = 500
    cdist, tdist = stopping_dist(wf, h_star, ndraws)

    fig, ax = plt.subplots(1, 2, figsize=(16, 5))

    ax[0].hist(tdist, bins=np.max(tdist))
    ax[0].set_title(f"Stopping times over {ndraws} replications")
    ax[0].set(xlabel="time", ylabel="number of stops")
    ax[0].annotate(f"mean = {np.mean(tdist)}", xy=(max(tdist) / 2,
                   max(np.histogram(tdist, bins=max(tdist))[0]) / 2))

    ax[1].hist(cdist.astype(int), bins=2)
    ax[1].set_title(f"Correct decisions over {ndraws} replications")
    ax[1].annotate(f"% correct = {np.mean(cdist)}",
                   xy=(0.05, ndraws / 2))

    plt.show()

simulation_plot(wf)
In [12]: wf = WaldFriedman(c=2.5)
simulation_plot(wf)
Increased cost per draw has induced the decision-maker to take fewer draws before deciding.

Because he decides on the basis of less information, the percentage of time he is correct drops.

This leads to him having a higher expected loss when he puts equal weight on both models.
To facilitate comparative statics, we provide a Jupyter notebook that generates the same
plots, but with sliders.
With these sliders, you can adjust parameters and immediately observe
• effects on the smoothness of the value function in the indecisive middle range as we in-
crease the number of grid points in the piecewise linear approximation.
• effects of different settings for the cost parameters 𝐿0 , 𝐿1 , 𝑐, the parameters of two beta
distributions 𝑓0 and 𝑓1 , and the number of points and linear functions 𝑚 to use in the
piece-wise continuous approximation to the value function.
• various simulations from 𝑓0 and associated distributions of waiting times to making a
decision.
• associated histograms of correct and incorrect decisions.
For several reasons, it is useful to describe the theory underlying the test that Navy Captain
G. S. Schuyler had been told to use and that led him to approach Milton Friedman and Allan
Wallis to convey his conjecture that superior practical procedures existed.
Evidently, the Navy had told Captain Schuyler to use what it knew to be a state-of-the-art
Neyman-Pearson test.
We’ll rely on Abraham Wald’s [109] elegant summary of Neyman-Pearson theory.
For our purposes, watch for these features of the setup:
• the assumption of a fixed sample size 𝑛
• the application of laws of large numbers, conditioned on alternative probability models,
to interpret the probabilities 𝛼 and 𝛽 defined in the Neyman-Pearson theory
Recall that in the sequential analytic formulation above:
• The sample size 𝑛 is not fixed but rather an object to be chosen; technically 𝑛 is a ran-
dom variable.
• The parameters 𝛽 and 𝛼 characterize cut-off rules used to determine 𝑛 as a random
variable.
• Laws of large numbers make no appearances in the sequential construction.
In chapter 1 of Sequential Analysis [109] Abraham Wald summarizes the Neyman-Pearson
approach to hypothesis testing.
Wald frames the problem as making a decision about a probability distribution that is par-
tially known.
(You have to assume that something is already known in order to state a well-posed problem
– usually, something means a lot)
578 CHAPTER 36. A PROBLEM THAT STUMPED MILTON FRIEDMAN
By limiting what is unknown, Wald uses the following simple structure to illustrate the main
ideas:
• A decision-maker wants to decide which of two distributions 𝑓0 , 𝑓1 govern an IID ran-
dom variable 𝑧.
• The null hypothesis 𝐻0 is the statement that 𝑓0 governs the data.
• The alternative hypothesis 𝐻1 is the statement that 𝑓1 governs the data.
• The problem is to devise and analyze a test of hypothesis 𝐻0 against the alternative
hypothesis 𝐻1 on the basis of a sample of a fixed number 𝑛 independent observations
𝑧1 , 𝑧2 , … , 𝑧𝑛 of the random variable 𝑧.
To quote Abraham Wald,
As a basis for choosing among critical regions the following considerations have
been advanced by Neyman and Pearson: In accepting or rejecting 𝐻0 we may
commit errors of two kinds. We commit an error of the first kind if we reject 𝐻0
when it is true; we commit an error of the second kind if we accept 𝐻0 when 𝐻1
is true. After a particular critical region 𝑊 has been chosen, the probability of
committing an error of the first kind, as well as the probability of committing an
error of the second kind is uniquely determined. The probability of committing an
error of the first kind is equal to the probability, determined by the assumption
that 𝐻0 is true, that the observed sample will be included in the critical region 𝑊 .
The probability of committing an error of the second kind is equal to the proba-
bility, determined on the assumption that 𝐻1 is true, that the probability will fall
outside the critical region 𝑊 . For any given critical region 𝑊 we shall denote the
probability of an error of the first kind by 𝛼 and the probability of an error of the
second kind by 𝛽.
Let’s listen carefully to how Wald applies law of large numbers to interpret 𝛼 and 𝛽:
𝐻1 is true, the probability is nearly 1 that the proportion of wrong statements will
be approximately 𝛽. Thus, we can say that in the long run [ here Wald applies
law of large numbers by driving 𝑀 → ∞ (our comment, not Wald’s) ] the propor-
tion of wrong statements will be 𝛼 if 𝐻0 is true and 𝛽 if 𝐻1 is true.
The quantity 𝛼 is called the size of the critical region, and the quantity 1 − 𝛽 is called the
power of the critical region.
Wald notes that
one critical region 𝑊 is more desirable than another if it has smaller values of 𝛼
and 𝛽. Although either 𝛼 or 𝛽 can be made arbitrarily small by a proper choice of
the critical region 𝑊 , it is impossible to make both 𝛼 and 𝛽 arbitrarily small for a
fixed value of 𝑛, i.e., a fixed sample size.
Neyman and Pearson show that a region consisting of all samples (𝑧1 , 𝑧2 , … , 𝑧𝑛 )
which satisfy the inequality
𝑓1 (𝑧1 ) ⋯ 𝑓1 (𝑧𝑛 ) / (𝑓0 (𝑧1 ) ⋯ 𝑓0 (𝑧𝑛 )) ≥ 𝑘
is a most powerful critical region for testing the hypothesis 𝐻0 against the alternative hy-
pothesis 𝐻1 . The term 𝑘 on the right side is a constant chosen so that the region will have
the required size 𝛼.
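To make the construction concrete, here is a small Monte Carlo sketch (our own illustration, not part of the lecture; the beta densities for 𝑓0 and 𝑓1 are assumptions) that chooses the constant 𝑘 so that the test "reject 𝐻0 when the likelihood ratio is at least 𝑘" has approximately a prescribed size 𝛼, and then estimates the power 1 − 𝛽:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
n, α = 10, 0.05                       # fixed sample size and target size
f0, f1 = beta(1, 1), beta(3, 1.2)     # hypothetical H0 and H1 densities
reps = 20_000

def L_n(true_dist):
    "reps realizations of the Neyman-Pearson likelihood ratio statistic."
    z = true_dist.rvs((reps, n), random_state=rng)
    return np.prod(f1.pdf(z) / f0.pdf(z), axis=1)

L_h0 = L_n(f0)
k = np.quantile(L_h0, 1 - α)     # P(L_n >= k | H0) ≈ α by construction

L_h1 = L_n(f1)
power = np.mean(L_h1 >= k)       # 1 - β: probability of rejecting H0 under H1
print(k, power)
```

Because the sample size 𝑛 is fixed, pushing 𝛼 down (raising 𝑘) mechanically lowers the power, which is the trade-off Wald emphasizes.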
Wald goes on to discuss Neyman and Pearson’s concept of uniformly most powerful test.
Here is how Wald introduces the notion of a sequential test
A rule is given for making one of the following three decisions at any stage of the
experiment (at the 𝑚th trial for each integral value of 𝑚): (1) to accept the hy-
pothesis 𝐻, (2) to reject the hypothesis 𝐻, (3) to continue the experiment by
making an additional observation. Thus, such a test procedure is carried out se-
quentially. On the basis of the first observation, one of the aforementioned deci-
sions is made. If the first or second decision is made, the process is terminated. If
the third decision is made, a second trial is performed. Again, on the basis of the
first two observations, one of the three decisions is made. If the third decision is
made, a third trial is performed, and so on. The process is continued until either
the first or the second decision is made. The number 𝑛 of observations required
by such a test procedure is a random variable, since the value of 𝑛 depends on the
outcome of the observations.
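Wald's verbal description maps almost line-by-line into code. Below is a minimal sketch of such a sequential test (our construction, not the lecture's code; the beta densities and the cutoffs 𝐴 and 𝐵 are illustrative assumptions, with 𝐴 < 1 < 𝐵 roughly targeting 10% error probabilities):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
f0, f1 = beta(1, 1), beta(3, 1.2)   # assumed H0 and H1 densities
A, B = 1 / 9, 9                     # assumed continuation region (A, B)

def sequential_test(true_dist, max_draws=10_000):
    """Keep sampling while A < L_n < B; accept H0 at the lower
    boundary, reject H0 at the upper one.  n is a random variable."""
    L = 1.0
    for n in range(1, max_draws + 1):
        z = true_dist.rvs(random_state=rng)
        L *= f1.pdf(z) / f0.pdf(z)
        if L <= A:
            return "accept H0", n
        if L >= B:
            return "reject H0", n
    return "no decision", max_draws

decisions = [sequential_test(f0) for _ in range(200)]
accept_share = np.mean([d == "accept H0" for d, _ in decisions])
print(accept_share)
```

Note that the stopping time varies across replications, illustrating Wald's point that 𝑛 is random rather than fixed.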
Footnotes
[1] The decision maker acts as if he believes that the sequence of random variables [𝑧0 , 𝑧1 , …]
is exchangeable. See Exchangeability and Bayesian Updating and [67] chapter 11, for discus-
sions of exchangeability.
36.8 Sequels
We’ll dig deeper into some of the ideas used here in the following lectures:
• this lecture discusses the key concept of exchangeability that rationalizes statistical
learning
• this lecture describes likelihood ratio processes and their role in frequentist and
Bayesian statistical theories
• this lecture discusses the role of likelihood ratio processes in Bayesian learning
• this lecture returns to the subject of this lecture and studies whether the Captain’s
hunch that the (frequentist) decision rule that the Navy had ordered him to use can
be expected to be better or worse than the sequential rule that Abraham Wald de-
signed
Chapter 37

Exchangeability and Bayesian Updating
37.1 Contents
• Overview 37.2
• Independently and Identically Distributed 37.3
• A Setting in Which Past Observations Are Informative 37.4
• Relationship Between IID and Exchangeable 37.5
• Exchangeability 37.6
• Bayes’ Law 37.7
• More Details about Bayesian Updating 37.8
• Appendix 37.9
• Sequels 37.10
37.2 Overview
582 CHAPTER 37. EXCHANGEABILITY AND BAYESIAN UPDATING
so that the joint density is the product of a sequence of identical marginal densities.
37.3.1 IID Means Past Observations Don’t Tell Us Anything About Future Observations
If a sequence of random variables is IID, past information provides no information about fu-
ture realizations.
In this sense, there is nothing to learn about the future from the past.
To understand these statements, let the joint distribution of a sequence of random variables
{𝑊𝑡 }𝑇𝑡=0 that is not necessarily IID, be
𝑝(𝑊𝑇 , 𝑊𝑇 −1 , … , 𝑊1 , 𝑊0 )
Using the laws of probability, we can always factor such a joint density into a product of con-
ditional densities:

𝑝(𝑊𝑇 , 𝑊𝑇 −1 , … , 𝑊1 , 𝑊0 ) = 𝑝(𝑊𝑇 |𝑊𝑇 −1 , … , 𝑊0 ) 𝑝(𝑊𝑇 −1 |𝑊𝑇 −2 , … , 𝑊0 ) ⋯ 𝑝(𝑊1 |𝑊0 ) 𝑝(𝑊0 )

In general,

𝑝(𝑊𝑡 |𝑊𝑡−1 , … , 𝑊0 ) ≠ 𝑝(𝑊𝑡 )

which states that the conditional density on the left side does not equal the marginal
density on the right side.
In the special IID case, 𝑝(𝑊𝑡 |𝑊𝑡−1 , … , 𝑊0 ) = 𝑝(𝑊𝑡 ).
We now consider a decision maker who
• doesn’t know which of these two distributions nature has drawn
• summarizes his ignorance by acting as if or thinking that nature chose distribution 𝐹
with probability 𝜋̃ ∈ (0, 1) and distribution 𝐺 with probability 1 − 𝜋̃
• at date 𝑡 ≥ 0 has observed the partial history 𝑤𝑡 , 𝑤𝑡−1 , … , 𝑤0 of draws from the appro-
priate joint density of the partial history
But what do we mean by the appropriate joint distribution?
We’ll discuss that next and in the process describe the concept of exchangeability.
𝑓(𝑊0 )𝑓(𝑊1 ) ⋯
𝑔(𝑊0 )𝑔(𝑊1 ) ⋯
ℎ(𝑊0 , 𝑊1 , …) ≡ 𝜋̃ [𝑓(𝑊0 )𝑓(𝑊1 ) ⋯] + (1 − 𝜋̃ ) [𝑔(𝑊0 )𝑔(𝑊1 ) ⋯]    (1)
Under the unconditional distribution ℎ(𝑊0 , 𝑊1 , …), the sequence 𝑊0 , 𝑊1 , … is not indepen-
dently and identically distributed.
To verify this claim, it is sufficient to notice, for example, that
ℎ(𝑤0 , 𝑤1 ) = 𝜋̃ 𝑓(𝑤0 )𝑓(𝑤1 ) + (1 − 𝜋̃ )𝑔(𝑤0 )𝑔(𝑤1 ) ≠ (𝜋̃ 𝑓(𝑤0 ) + (1 − 𝜋̃ )𝑔(𝑤0 )) (𝜋̃ 𝑓(𝑤1 ) + (1 − 𝜋̃ )𝑔(𝑤1 ))
ℎ(𝑤1 |𝑤0 ) ≡ ℎ(𝑤0 , 𝑤1 ) / (𝜋̃ 𝑓(𝑤0 ) + (1 − 𝜋̃ )𝑔(𝑤0 )) ≠ 𝜋̃ 𝑓(𝑤1 ) + (1 − 𝜋̃ )𝑔(𝑤1 )
37.6 Exchangeability
While the sequence 𝑊0 , 𝑊1 , … is not IID, it can be verified that it is exchangeable, which
means that
ℎ(𝑤0 , 𝑤1 ) = ℎ(𝑤1 , 𝑤0 )
and so on.
More generally, a sequence of random variables is said to be exchangeable if the joint prob-
ability distribution for the sequence does not change when the positions in the sequence in
which finitely many of the random variables appear are altered.
Equation (1) represents our instance of an exchangeable joint density over a sequence of ran-
dom variables as a mixture of two IID joint densities over a sequence of random variables.
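Both claims — exchangeability but not IID — can be verified numerically at sample points. Here is a small sketch of that check (our own addition; it uses the uniform and Beta(3, 1.2) densities that appear later in this lecture, with 𝜋̃ = 0.5):

```python
import numpy as np
from scipy.stats import beta

π_tilde = 0.5
f = beta(1, 1).pdf           # F: uniform on [0, 1]
g = beta(3, 1.2).pdf         # G: the alternative used later in the lecture

def h(w0, w1):
    "The exchangeable joint density (1): a π̃-mixture of IID joint densities."
    return π_tilde * f(w0) * f(w1) + (1 - π_tilde) * g(w0) * g(w1)

def marginal(w):
    "Common marginal density of each draw under h."
    return π_tilde * f(w) + (1 - π_tilde) * g(w)

w0, w1 = 0.3, 0.8
symmetric = np.isclose(h(w0, w1), h(w1, w0))              # exchangeability
iid = np.isclose(h(w0, w1), marginal(w0) * marginal(w1))  # fails: not IID
print(symmetric, iid)
```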
For a Bayesian statistician, the mixing parameter 𝜋̃ ∈ (0, 1) has a special interpretation as a
prior probability that nature selected probability distribution 𝐹 .
DeFinetti [25] established a related representation of an exchangeable process created by mix-
ing sequences of IID Bernoulli random variables with parameter 𝜃 and mixing probability
𝜋(𝜃) for a density 𝜋(𝜃) that a Bayesian statistician would interpret as a prior over the un-
known Bernoulli parameter 𝜃.
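De Finetti's construction can be illustrated with a short computation. In the sketch below (our own; the Beta(2, 3) prior over 𝜃 is an arbitrary choice), the probability of a 0–1 sequence depends only on the number of ones it contains, so reordering the sequence leaves its probability unchanged — exactly exchangeability:

```python
from itertools import product
from scipy.special import beta as B

a, b = 2.0, 3.0   # an assumed Beta(a, b) prior over the Bernoulli parameter θ

def seq_prob(x):
    """P(x_1,...,x_n) = ∫ θ^s (1-θ)^(n-s) π(θ) dθ with s = sum(x):
    the probability depends on x only through s, so any reordering
    of the sequence has exactly the same probability."""
    s, n = sum(x), len(x)
    return B(a + s, b + n - s) / B(a, b)

p_100 = seq_prob([1, 0, 0])
p_010 = seq_prob([0, 1, 0])
total = sum(seq_prob(x) for x in product([0, 1], repeat=3))
print(p_100, p_010, total)
```

The probabilities of all 2³ sequences also sum to one, confirming that the mixture defines a valid joint distribution.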
We noted above that in our example model there is something to learn about the fu-
ture from past data drawn from our particular instance of a process that is exchangeable but
not IID.
But how can we learn?
And about what?
The answer to the about what question is 𝜋̃.
The answer to the how question is to use Bayes’ Law.
Another way to say use Bayes’ Law is to say compute an appropriate conditional distribution.
Let’s dive into Bayes’ Law in this context.
Let 𝑞 represent the distribution from which nature actually draws 𝑤 and let
𝜋 = ℙ{𝑞 = 𝑓}
where we regard 𝜋 as the decision maker’s subjective probability (also called a personal
probability).
Suppose that at 𝑡 ≥ 0, the decision maker has observed a history 𝑤𝑡 ≡ [𝑤𝑡 , 𝑤𝑡−1 , … , 𝑤0 ].
We let
𝜋𝑡 = ℙ{𝑞 = 𝑓|𝑤𝑡 }
where we adopt the convention that 𝜋−1 = 𝜋̃.

The distribution of 𝑤𝑡+1 conditional on 𝑤𝑡 is then 𝜋𝑡 𝑓 + (1 − 𝜋𝑡 )𝑔.
𝜋𝑡+1 = 𝜋𝑡 𝑓(𝑤𝑡+1 ) / (𝜋𝑡 𝑓(𝑤𝑡+1 ) + (1 − 𝜋𝑡 )𝑔(𝑤𝑡+1 ))    (2)
The last expression follows from Bayes’ rule, which tells us that
ℙ{𝑞 = 𝑓 | 𝑊 = 𝑤} = ℙ{𝑊 = 𝑤 | 𝑞 = 𝑓} ℙ{𝑞 = 𝑓} / ℙ{𝑊 = 𝑤}, where ℙ{𝑊 = 𝑤} = ∑𝜔∈{𝑓,𝑔} ℙ{𝑊 = 𝑤 | 𝑞 = 𝜔} ℙ{𝑞 = 𝜔}
Let’s stare at and rearrange Bayes’ Law as represented in equation (2) with the aim of under-
standing how the posterior 𝜋𝑡+1 is influenced by the prior 𝜋𝑡 and the likelihood ratio
𝑙(𝑤) = 𝑓(𝑤) / 𝑔(𝑤)
𝜋𝑡+1 = 𝜋𝑡 𝑓(𝑤𝑡+1 ) / (𝜋𝑡 𝑓(𝑤𝑡+1 ) + (1 − 𝜋𝑡 )𝑔(𝑤𝑡+1 )) = (𝜋𝑡 𝑓(𝑤𝑡+1 )/𝑔(𝑤𝑡+1 )) / (𝜋𝑡 𝑓(𝑤𝑡+1 )/𝑔(𝑤𝑡+1 ) + (1 − 𝜋𝑡 )) = 𝜋𝑡 𝑙(𝑤𝑡+1 ) / (𝜋𝑡 𝑙(𝑤𝑡+1 ) + (1 − 𝜋𝑡 ))    (3)
Notice how the likelihood ratio and the prior interact to determine whether an observation
𝑤𝑡+1 leads the decision maker to increase or decrease the subjective probability he/she at-
taches to distribution 𝐹 .
When the likelihood ratio 𝑙(𝑤𝑡+1 ) exceeds one, the observation 𝑤𝑡+1 nudges the probability
𝜋 put on distribution 𝐹 upward, and when the likelihood ratio 𝑙(𝑤𝑡+1 ) is less than one, the
observation 𝑤𝑡+1 nudges 𝜋 downward.
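This nudge property is easy to check numerically. The following sketch (ours, not the lecture's code) re-implements the update in representation (3) and confirms the direction of the push for likelihood ratios above, below, and equal to one:

```python
def bayes_update(π, l):
    "One step of Bayes' Law in likelihood-ratio form, as in (3)."
    return π * l / (π * l + 1 - π)

π = 0.5
up = bayes_update(π, 1.5)     # l > 1: evidence favoring F
down = bayes_update(π, 0.5)   # l < 1: evidence favoring G
flat = bayes_update(π, 1.0)   # l = 1: an uninformative draw
print(up, down, flat)
```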
Representation (3) is the foundation of the graphs that we’ll use to display the dynamics of
{𝜋𝑡 }∞𝑡=0 that are induced by Bayes’ Law.
We’ll plot 𝑙 (𝑤) as a way to enlighten us about how learning – i.e., Bayesian updating of the
probability 𝜋 that nature has chosen distribution 𝑓 – works.
To create the Python infrastructure to do our work for us, we construct a wrapper function
that displays informative graphs given parameters of 𝑓 and 𝑔.
In [2]: @vectorize
        def p(x, a, b):
            "The general beta distribution function."
            r = gamma(a + b) / (gamma(a) * gamma(b))
            return r * x ** (a - 1) * (1 - x) ** (b - 1)
w_max = 1
w_grid = np.linspace(1e-12, w_max - 1e-12, 100)
ΔW = np.zeros((len(W), len(Π)))
ΔΠ = np.empty((len(W), len(Π)))
for i, w in enumerate(W):
    for j, π in enumerate(Π):
        lw = l(w)
        ΔΠ[i, j] = π * (lw / (π * lw + 1 - π) - 1)
plt.show()
Now we’ll create a group of graphs designed to illustrate the dynamics induced by Bayes’
Law.
We’ll begin with the default values of various objects, then change them in a subsequent ex-
ample.
In [3]: learning_example()
Please look at the three graphs above created for an instance in which 𝑓 is a uniform distri-
bution on [0, 1] (i.e., a Beta distribution with parameters 𝐹𝑎 = 1, 𝐹𝑏 = 1), while 𝑔 is a Beta
distribution with the default parameter values 𝐺𝑎 = 3, 𝐺𝑏 = 1.2.
The graph on the left plots the likelihood ratio 𝑙(𝑤) on the ordinate axis against 𝑤 on the
coordinate axis.
The middle graph plots both 𝑓(𝑤) and 𝑔(𝑤) against 𝑤, with the horizontal dotted lines show-
ing values of 𝑤 at which the likelihood ratio equals 1.
The graph on the right side plots arrows to the right that show when Bayes’ Law makes 𝜋
increase and arrows to the left that show when Bayes’ Law makes 𝜋 decrease.
Notice how the length of the arrows, which show the magnitude of the force from Bayes’ Law
impelling 𝜋 to change, depend on both the prior probability 𝜋 on the ordinate axis and the
evidence in the form of the current draw of 𝑤 on the coordinate axis.
The fractions in the colored areas of the middle graphs are probabilities under 𝐹 and 𝐺, re-
spectively, that realizations of 𝑤 fall into the interval that updates the belief 𝜋 in a correct
direction (i.e., toward 0 when 𝐺 is the true distribution, and towards 1 when 𝐹 is the true
distribution).
For example, in the above example, under true distribution 𝐹 , 𝜋 will be updated toward 0 if
𝑤 falls into the interval [0.524, 0.999], which occurs with probability 1 − .524 = .476 under
𝐹 . But this would occur with probability 0.816 if 𝐺 were the true distribution. The fraction
0.816 in the orange region is the integral of 𝑔(𝑤) over this interval.
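The numbers 0.524 and 0.816 quoted above can be reproduced directly. The check below (added by us; it assumes scipy is available and uses the stated parameters 𝐹 = Beta(1, 1), 𝐺 = Beta(3, 1.2)) finds the crossing point where 𝑙(𝑤) = 1 and the two tail probabilities:

```python
from scipy.stats import beta
from scipy.optimize import brentq

F, G = beta(1, 1), beta(3, 1.2)

# w* solves l(w) = f(w)/g(w) = 1, i.e. g(w) = f(w), in the interior of [0, 1]
w_star = brentq(lambda w: G.pdf(w) - F.pdf(w), 0.1, 0.9)

prob_F = F.sf(w_star)   # P_F(w > w*): belief updated toward 0 under F
prob_G = G.sf(w_star)   # P_G(w > w*): probability of the same event under G
print(w_star, prob_F, prob_G)
```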
Next we use our code to create graphs for another instance of our model.
We keep 𝐹 the same as in the preceding instance, namely a uniform distribution, but now
assume that 𝐺 is a Beta distribution with parameters 𝐺𝑎 = 2, 𝐺𝑏 = 1.6.
Notice how the likelihood ratio, the middle graph, and the arrows compare with the previous
instance of our example.
37.9 Appendix
Now we’ll have some fun by plotting multiple realizations of sample paths of 𝜋𝑡 under two
possible assumptions about nature’s choice of distribution:
• that nature permanently draws from 𝐹
• that nature permanently draws from 𝐺
Outcomes depend on a peculiar property of likelihood ratio processes that are discussed in
this lecture
To do this, we create some Python code.
# define f and g
f = njit(lambda x: p(x, F_a, F_b))
g = njit(lambda x: p(x, G_a, G_b))
@njit
# Draw
w = np.random.beta(a, b)
# Update belief
π = 1 / (1 + ((1 - π) * g(w)) / (π * f(w)))
return π
@njit
def simulate_path(a, b, T=50):
"Simulates a path of beliefs π with length T"
π = np.empty(T+1)
# initial condition
π[0] = 0.5
return π
for i in range(N):
π_paths[i] = simulate_path(a=a, b=b, T=T)
if display:
plt.plot(range(T+1), π_paths[i], color='b', lw=0.8, alpha=0.5)
if display:
plt.show()
return π_paths
return simulate
We begin by generating 𝑁 simulated {𝜋𝑡 } paths with 𝑇 periods when the sequence is truly
IID draws from 𝐹 . We set the initial prior 𝜋−1 = .5.
In [7]: T = 50
In the above graph we observe that for most paths 𝜋𝑡 → 1. So Bayes’ Law evidently eventu-
ally discovers the truth for most of our paths.
Next, we generate paths with 𝑇 periods when the sequence is truly IID draws from 𝐺. Again,
we set the initial prior 𝜋−1 = .5.
We study rates of convergence of 𝜋𝑡 to 1 when nature generates the data as IID draws from 𝐹
and of 𝜋𝑡 to 0 when nature generates the data as IID draws from 𝐺.
We do this by averaging across simulated paths of {𝜋𝑡 }𝑇𝑡=0 .
Using 𝑁 simulated 𝜋𝑡 paths, we compute 1 − (1/𝑁) ∑𝑁𝑖=1 𝜋𝑖,𝑡 at each 𝑡 when the data are generated
as draws from 𝐹 and compute (1/𝑁) ∑𝑁𝑖=1 𝜋𝑖,𝑡 when the data are generated as draws from 𝐺.
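A compact, self-contained version of this averaging exercise (our sketch, independent of the lecture's njit code, with the same uniform and Beta(3, 1.2) parameters used above):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(42)
F, G = beta(1, 1), beta(3, 1.2)       # the two candidate densities
N, T = 500, 50

def mean_path(true_dist):
    "Average of N posterior paths {π_t} when true_dist generates the data."
    π = np.full(N, 0.5)               # initial prior on F for every path
    means = np.empty(T + 1)
    means[0] = π.mean()
    for t in range(T):
        w = true_dist.rvs(N, random_state=rng)
        l = F.pdf(w) / G.pdf(w)       # likelihood ratio f/g
        π = π * l / (π * l + 1 - π)   # Bayes' Law, as in (2)
        means[t + 1] = π.mean()
    return means

mean_F = mean_path(F)                 # drifts toward 1 under F
mean_G = mean_path(G)                 # drifts toward 0 under G
print(mean_F[-1], mean_G[-1])
```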
From the above graph, rates of convergence appear not to depend on whether 𝐹 or 𝐺 gener-
ates the data.
More insights about the dynamics of {𝜋𝑡 } can be gleaned by computing the following con-
ditional expectations of 𝜋𝑡+1 /𝜋𝑡 as functions of 𝜋𝑡 via integration with respect to the pertinent
probability distribution:

𝐸 [𝜋𝑡+1 /𝜋𝑡 ∣ 𝑞 = 𝜔, 𝜋𝑡 ] = 𝐸 [𝑙(𝑤𝑡+1 ) / (𝜋𝑡 𝑙(𝑤𝑡+1 ) + (1 − 𝜋𝑡 )) ∣ 𝑞 = 𝜔, 𝜋𝑡 ] = ∫0^1 [𝑙(𝑤𝑡+1 ) / (𝜋𝑡 𝑙(𝑤𝑡+1 ) + (1 − 𝜋𝑡 ))] 𝜔(𝑤𝑡+1 ) 𝑑𝑤𝑡+1
where 𝜔 = 𝑓, 𝑔.
The following code approximates the integral above:
# define f and g
f = njit(lambda x: p(x, F_a, F_b))
g = njit(lambda x: p(x, G_a, G_b))
expected_ratio = np.empty(len(π_grid))
for q, inte in zip(["f", "g"], [integrand_f, integrand_g]):
    for i, π in enumerate(π_grid):
        expected_ratio[i] = quad(inte, 0, 1, args=(π,))[0]
    plt.plot(π_grid, expected_ratio, label=f"{q} generates")

plt.hlines(1, 0, 1, linestyle="--")
plt.xlabel("$π_t$")
plt.ylabel("$E[\pi_{t+1}/\pi_t]$")
plt.legend()
plt.show()
In [12]: expected_ratio()
The above graph shows that when 𝐹 generates the data, 𝜋𝑡 on average always heads north,
while when 𝐺 generates the data, 𝜋𝑡 heads south.
Next, we’ll look at a degenerate case in which 𝑓 and 𝑔 are identical beta distributions, with
𝐹𝑎 = 𝐺𝑎 = 3, 𝐹𝑏 = 𝐺𝑏 = 1.2.
In a sense, here there is nothing to learn.
The above graph says that 𝜋𝑡 is inert and would remain at its initial value.
Finally, let’s look at a case in which 𝑓 and 𝑔 are neither very different nor identical, in partic-
ular one in which 𝐹𝑎 = 2, 𝐹𝑏 = 1 and 𝐺𝑎 = 3, 𝐺𝑏 = 1.2.
37.10 Sequels
We’ll dig deeper into some of the ideas used here in the following lectures:
• this lecture describes likelihood ratio processes and their role in frequentist and
Bayesian statistical theories
• this lecture returns to the subject of this lecture and studies whether the Captain’s
hunch that the (frequentist) decision rule that the Navy had ordered him to use can
be expected to be better or worse than the sequential rule that Abraham Wald de-
signed
Chapter 38

Likelihood Ratio Processes and Bayesian Learning
38.1 Contents
• Overview 38.2
• The Setting 38.3
• Likelihood Ratio Process and Bayes’ Law 38.4
• Sequels 38.5
38.2 Overview
This lecture describes the role that likelihood ratio processes play in Bayesian learning.
As in this lecture, we’ll use a simple statistical setting from this lecture.
We’ll focus on how a likelihood ratio process and a prior probability determine a posterior
probability.
We’ll derive a convenient recursion for today’s posterior as a function of yesterday’s posterior
and today’s multiplicative increment to a likelihood process.
We’ll also present a useful generalization of that formula that represents today’s posterior in
terms of an initial prior and today’s realization of the likelihood ratio process.
We’ll study how, at least in our setting, a Bayesian eventually learns the probability distribu-
tion that generates the data, an outcome that rests on the asymptotic behavior of likelihood
ratio processes studied in this lecture.
This lecture provides technical results that underlie outcomes to be studied in this lecture and
this lecture and this lecture.
598 CHAPTER 38. LIKELIHOOD RATIO PROCESSES AND BAYESIAN LEARNING
We begin by reviewing the setting in this lecture, which we adopt here too.
A nonnegative random variable 𝑊 has one of two probability density functions, either 𝑓 or 𝑔.
Before the beginning of time, nature once and for all decides whether she will draw a se-
quence of IID draws from either 𝑓 or 𝑔.
We will sometimes let 𝑞 be the density that nature chose once and for all, so that 𝑞 is either 𝑓
or 𝑔, permanently.
Nature knows which density it permanently draws from, but we the observers do not.
We do know both 𝑓 and 𝑔 but we don’t know which density nature chose.
But we want to know.
To do that, we use observations.
We observe a sequence {𝑤𝑡 }𝑇𝑡=1 of 𝑇 IID draws from either 𝑓 or 𝑔.
We want to use these observations to infer whether nature chose 𝑓 or 𝑔.
A likelihood ratio process is a useful tool for this task.
To begin, we define the key component of a likelihood ratio process, namely, the time 𝑡 likeli-
hood ratio as the random variable
ℓ(𝑤𝑡 ) = 𝑓(𝑤𝑡 ) / 𝑔(𝑤𝑡 ), 𝑡 ≥ 1.
We assume that 𝑓 and 𝑔 both put positive probabilities on the same intervals of possible real-
izations of the random variable 𝑊 .
That means that under the 𝑔 density, ℓ(𝑤𝑡 ) = 𝑓(𝑤𝑡 )/𝑔(𝑤𝑡 ) is evidently a nonnegative random vari-
able with mean 1.
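The mean-1 property is easy to confirm by simulation. The following check (ours; it uses an illustrative pair of beta densities with a bounded likelihood ratio rather than the lecture's defaults) draws from 𝑔 and averages ℓ:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(7)
f, g = beta(2, 2), beta(1, 1)          # an illustrative pair; g is uniform

w = g.rvs(200_000, random_state=rng)   # IID draws from the g density
l = f.pdf(w) / g.pdf(w)                # time-t likelihood ratio ℓ(w_t)
mean_l = l.mean()                      # E_g[ℓ] = ∫ (f/g) g dw = ∫ f dw = 1
print(mean_l)
```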
A likelihood ratio process for sequence {𝑤𝑡 }∞𝑡=1 is defined as

𝐿(𝑤𝑡 ) = ∏𝑡𝑖=1 ℓ(𝑤𝑖 ),
The likelihood ratio and its logarithm are key tools for making inferences using a classic fre-
quentist approach due to Neyman and Pearson [? ].
We’ll again deploy the following Python code from this lecture that evaluates 𝑓 and 𝑔 as two
different beta distributions, then computes and simulates an associated likelihood ratio pro-
cess by generating a sequence 𝑤𝑡 from some probability distribution, for example, a sequence
of IID draws from 𝑔.
@vectorize
def p(x, a, b):
    r = gamma(a + b) / (gamma(a) * gamma(b))
    return r * x ** (a - 1) * (1 - x) ** (b - 1)
In [3]: @njit
def simulate(a, b, T=50, N=500):
'''
Generate N sets of T observations of the likelihood ratio,
return as N x T matrix.
'''
for i in range(N):
for j in range(T):
w = np.random.beta(a, b)
l_arr[i, j] = f(w) / g(w)
return l_arr
We’ll also use the following Python code to prepare some informative simulations
𝜋𝑡 = Prob(𝑞 = 𝑓|𝑤𝑡 )
The likelihood ratio process is a principal actor in the formula that governs the evolution of
the posterior probability 𝜋𝑡 , an instance of Bayes’ Law.
Bayes’ law implies that {𝜋𝑡 } obeys the recursion
𝜋𝑡 = 𝜋𝑡−1 𝑙𝑡 (𝑤𝑡 ) / (𝜋𝑡−1 𝑙𝑡 (𝑤𝑡 ) + 1 − 𝜋𝑡−1 )    (1)
with 𝜋0 being a Bayesian prior probability that 𝑞 = 𝑓, i.e., a personal or subjective belief
about 𝑞 based on our having seen no data.
Below we define a Python function that updates belief 𝜋 using likelihood ratio ℓ according to
recursion (1)
In [6]: @njit
def update(π, l):
"Update π using likelihood l"
# Update belief
π = π * l / (π * l + 1 - π)
return π
Formula (1) can be generalized by iterating on it and thereby deriving an expression for the
time 𝑡 posterior 𝜋𝑡+1 as a function of the time 0 prior 𝜋0 and the likelihood ratio process
𝐿(𝑤𝑡+1 ) at time 𝑡.
To begin, notice that the updating rule
𝜋𝑡+1 = 𝜋𝑡 ℓ(𝑤𝑡+1 ) / (𝜋𝑡 ℓ(𝑤𝑡+1 ) + (1 − 𝜋𝑡 ))
implies
1/𝜋𝑡+1 = (𝜋𝑡 ℓ(𝑤𝑡+1 ) + (1 − 𝜋𝑡 )) / (𝜋𝑡 ℓ(𝑤𝑡+1 )) = 1 − 1/ℓ(𝑤𝑡+1 ) + (1/ℓ(𝑤𝑡+1 )) (1/𝜋𝑡 )

⇒ 1/𝜋𝑡+1 − 1 = (1/ℓ(𝑤𝑡+1 )) (1/𝜋𝑡 − 1).
Therefore
1/𝜋𝑡+1 − 1 = (1 / ∏𝑡+1𝑖=1 ℓ(𝑤𝑖 )) (1/𝜋0 − 1) = (1 / 𝐿(𝑤𝑡+1 )) (1/𝜋0 − 1).
Since 𝜋0 ∈ (0, 1) and 𝐿 (𝑤𝑡+1 ) > 0, we can verify that 𝜋𝑡+1 ∈ (0, 1).
After rearranging the preceding equation, we can express 𝜋𝑡+1 as a function of 𝐿 (𝑤𝑡+1 ), the
likelihood ratio process at 𝑡 + 1, and the initial prior 𝜋0
𝜋𝑡+1 = 𝜋0 𝐿(𝑤𝑡+1 ) / (𝜋0 𝐿(𝑤𝑡+1 ) + 1 − 𝜋0 ).    (2)
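As a quick numerical check (our addition, separate from the lecture's code) that iterating recursion (1) reproduces the closed form (2), we can feed both the same arbitrary ℓ-sequence:

```python
import numpy as np

rng = np.random.default_rng(3)

def update(π, l):
    "Recursion (1): one Bayesian update using the likelihood ratio l."
    return π * l / (π * l + 1 - π)

π0 = 0.3
l_seq = rng.uniform(0.5, 2.0, size=100)   # an arbitrary positive ℓ-sequence

π = π0
for l in l_seq:                           # iterate recursion (1)
    π = update(π, l)

L = np.prod(l_seq)                        # L(w^T) = ∏ ℓ(w_i)
π_closed = π0 * L / (π0 * L + 1 - π0)     # closed form (2)
print(π, π_closed)
```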
To illustrate this insight, below we will plot graphs showing one simulated path of the likeli-
hood ratio process 𝐿𝑡 along with two paths of 𝜋𝑡 that are associated with the same realization
of the likelihood ratio process but different initial prior probabilities 𝜋0 .
First, we tell Python two values of 𝜋0 .
Next we generate paths of the likelihood ratio process 𝐿𝑡 and the posterior 𝜋𝑡 for a history of
IID draws from density 𝑓.
In [8]: T = l_arr_f.shape[1]
π_seq_f = np.empty((2, T+1))
π_seq_f[:, 0] = π1, π2
for t in range(T):
for i in range(2):
π_seq_f[i, t+1] = update(π_seq_f[i, t], l_arr_f[0, t])
for i in range(2):
ax1.plot(range(T+1), π_seq_f[i, :], label=f"$\pi_0$={π_seq_f[i, 0]}")
ax1.set_ylabel("$\pi_t$")
ax1.set_xlabel("t")
ax1.legend()
ax1.set_title("when f governs data")
ax2 = ax1.twinx()
ax2.plot(range(1, T+1), np.log(l_seq_f[0, :]), '--', color='b')
ax2.set_ylabel("$log(L(w^{t}))$")
plt.show()
The dotted line in the graph above records the logarithm of the likelihood ratio process
log 𝐿(𝑤𝑡 ).
Please note that there are two different scales on the 𝑦 axis.
Now let’s study what happens when the history consists of IID draws from density 𝑔
In [10]: T = l_arr_g.shape[1]
π_seq_g = np.empty((2, T+1))
π_seq_g[:, 0] = π1, π2
for t in range(T):
for i in range(2):
π_seq_g[i, t+1] = update(π_seq_g[i, t], l_arr_g[0, t])
for i in range(2):
ax1.plot(range(T+1), π_seq_g[i, :], label=f"$\pi_0$={π_seq_g[i, 0]}")
ax1.set_ylabel("$\pi_t$")
ax1.set_xlabel("t")
ax1.legend()
ax1.set_title("when g governs data")
ax2 = ax1.twinx()
ax2.plot(range(1, T+1), np.log(l_seq_g[0, :]), '--', color='b')
ax2.set_ylabel("$log(L(w^{t}))$")
plt.show()
Below we offer Python code that verifies that the closed form (2) agrees with the recursively
computed path of 𝜋𝑡 when the history consists of IID draws from density 𝑓.
for i in range(2):
πL = π_seq[i, 0] * l_seq_f[0, :]
π_seq[i, 1:] = πL / (πL + 1 - π_seq[i, 0])
Out[13]: True
We thus conclude that the likelihood ratio process is a key ingredient of the formula (2) for
a Bayesian’s posterior probability that nature has drawn history 𝑤𝑡 as repeated draws from
density 𝑔.
38.5 Sequels
Chapter 39

Bayesian versus Frequentist Decision Rules
39.1 Contents
• Overview 39.2
• Setup 39.3
• Frequentist Decision Rule 39.4
• Bayesian Decision Rule 39.5
• Was the Navy Captain’s hunch correct? 39.6
• More details 39.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
39.2 Overview
606 CHAPTER 39. BAYESIAN VERSUS FREQUENTIST DECISION RULES
(The Navy had ordered the Captain to use an instance of a frequentist decision rule.)
Milton Friedman recognized the Captain’s conjecture as posing a challenging statistical
problem that he and other members of the US Government’s Statistical Research Group at
Columbia University proceeded to try to solve.
One of the members of the group, the great mathematician Abraham Wald, soon solved the
problem.
A good way to formulate the problem is to use some ideas from Bayesian statistics that we
describe in this lecture Exchangeability and Bayesian Updating and in this lecture Likelihood
Ratio Processes, which describes the link between Bayesian updating and likelihood ratio pro-
cesses.
The present lecture uses Python to generate simulations that evaluate expected losses un-
der frequentist and Bayesian decision rules for instances of the Navy Captain’s decision
problem.
The simulations validate the Navy Captain’s hunch that there is a better rule than the one
the Navy had ordered him to use.
39.3 Setup
To formalize the problem of the Navy Captain whose questions posed the problem that Mil-
ton Friedman and Allan Wallis handed over to Abraham Wald, we consider a setting with the
following parts.
• Each period a decision maker draws a non-negative random variable 𝑍 from a probabil-
ity distribution that he does not completely understand. He knows that two probability
distributions are possible, 𝑓0 and 𝑓1 , and that whichever distribution it is remains fixed
over time. The decision maker believes that before the beginning of time, nature once
and for all selected either 𝑓0 or 𝑓1 and that the probability that it selected 𝑓0 is 𝜋∗ .
• The decision maker observes a sample {𝑧𝑖 }𝑡𝑖=0 from the distribution chosen by nature.
The decision maker wants to decide which distribution actually governs 𝑍 and is worried by
two types of errors and the losses that they impose on him.
• a loss 𝐿̄ 1 from a type I error that occurs when he decides that 𝑓 = 𝑓1 when actually
𝑓 = 𝑓0
• a loss 𝐿̄ 0 from a type II error that occurs when he decides that 𝑓 = 𝑓0 when actually
𝑓 = 𝑓1
The decision maker pays a cost 𝑐 for drawing another 𝑧.
We mainly borrow parameters from the quantecon lecture “A Problem that Stumped Milton
Friedman” except that we increase both 𝐿̄ 0 and 𝐿̄ 1 from 25 to 100 to encourage the frequen-
tist Navy Captain to take more draws before deciding.
We set the cost 𝑐 of taking one more draw at 1.25.
We set the probability distributions 𝑓0 and 𝑓1 to be beta distributions with 𝑎0 = 𝑏0 = 1,
𝑎1 = 3, and 𝑏1 = 1.2, respectively.
Below is some Python code that sets up these objects.
39.3. SETUP 607
In [3]: @njit
def p(x, a, b):
"Beta distribution."
We start by defining a jitclass that stores parameters and functions we need to solve
problems for both the Bayesian and frequentist Navy Captains.
In [4]: wf_data = [
('c', float64), # cost of drawing another z
('a0', float64), # parameters of beta distribution
('b0', float64),
('a1', float64),
('b1', float64),
('L0', float64), # cost of selecting f0 when f1 is true
('L1', float64), # cost of selecting f1 when f0 is true
('π_grid', float64[:]), # grid of beliefs π
('π_grid_size', int64),
('mc_size', int64), # size of Monte Carlo simulation
('z0', float64[:]), # sequence of random values
('z1', float64[:]) # sequence of random values
]
In [5]: @jitclass(wf_data)
class WaldFriedman:
def __init__(self,
c=1.25,
a0=1,
b0=1,
a1=3,
b1=1.2,
L0=100,
L1=100,
π_grid_size=200,
mc_size=1000):
return π_new
In [6]: wf = WaldFriedman()
plt.figure()
plt.title("Two Distributions")
plt.plot(grid, wf.f0(grid), lw=2, label="$f_0$")
plt.plot(grid, wf.f1(grid), lw=2, label="$f_1$")
plt.legend()
plt.xlabel("$z$ values")
plt.ylabel("density of $z_k$")
plt.tight_layout()
plt.show()
39.4 Frequentist Decision Rule

𝑉̄𝑓𝑟𝑒 (𝑡, 𝑑) = 𝑐𝑡 + 𝜋∗ 𝑃𝐹𝐴 × 𝐿̄ 1 + (1 − 𝜋∗ ) (1 − 𝑃𝐷) × 𝐿̄ 0    (1)

where 𝑃𝐹𝐴 = Pr {𝐿(𝑧𝑡 ) < 𝑑 ∣ 𝑞 = 𝑓0 } and 𝑃𝐷 = Pr {𝐿(𝑧𝑡 ) < 𝑑 ∣ 𝑞 = 𝑓1 }.
Here
• 𝑃 𝐹 𝐴 denotes the probability of a false alarm, i.e., rejecting 𝐻0 when it is true
• 𝑃 𝐷 denotes the probability of a detection error, i.e., not rejecting 𝐻0 when 𝐻1 is
true
For a given sample size 𝑡, the pairs (𝑃 𝐹 𝐴, 𝑃 𝐷) lie on a “receiver operating characteristic
curve” and can be uniquely pinned down by choosing 𝑑.
To see some receiver operating characteristic curves, please see this lecture Likelihood Ratio
Processes.
To solve for 𝑉̄𝑓𝑟𝑒 (𝑡, 𝑑) numerically, we first simulate sequences of 𝑧 when either 𝑓0 or 𝑓1 gen-
erates data.
In [7]: N = 10000
T = 100
L0_arr = np.cumprod(l0_arr, 1)
L1_arr = np.cumprod(l1_arr, 1)
With an empirical distribution of likelihood ratios in hand, we can draw “receiver operating
characteristic curves” by enumerating (𝑃 𝐹 𝐴, 𝑃 𝐷) pairs given each sample size 𝑡.
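Here is a minimal sketch of that enumeration (our own; the grid of thresholds and the use of log-scale comparison are our choices), producing (𝑃𝐹𝐴, 𝑃𝐷) pairs for one sample size 𝑡:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(5)
f0, f1 = beta(1, 1), beta(3, 1.2)
reps, t = 10_000, 10                  # Monte Carlo replications, sample size

def log_L(true_dist):
    "reps realizations of log L(z^t) when true_dist generates the data."
    z = true_dist.rvs((reps, t), random_state=rng)
    return np.log(f1.pdf(z) / f0.pdf(z)).sum(axis=1)

logL0, logL1 = log_L(f0), log_L(f1)

# each threshold pins down one (PFA, PD) pair on the ROC curve;
# thresholds are on log L(z^t), equivalent since log is monotone
thresholds = np.linspace(-5, 5, 21)
PFA = np.array([np.mean(logL0 < d) for d in thresholds])  # Pr{L < d | q = f0}
PD = np.array([np.mean(logL1 < d) for d in thresholds])   # Pr{L < d | q = f1}
print(PFA[10], PD[10])
```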
Our frequentist minimizes the expected total loss presented in equation (1) by choosing (𝑡, 𝑑).
Doing that delivers an expected loss
𝑉̄𝑓𝑟𝑒 = min𝑡,𝑑 𝑉̄𝑓𝑟𝑒 (𝑡, 𝑑).
In [13]: @njit
def V_fre_d_t(d, t, L0_arr, L1_arr, π_star, wf):
N = L0_arr.shape[0]
return V
d = res.x
return V, PFA, PD
T = L0_arr.shape[1]
V_fre_arr = np.empty(T)
PFA_arr = np.empty(T)
PD_arr = np.empty(T)
In [18]: msg = (f"The above graph indicates that minimizing over t tells the "
                f"frequentist to draw {t_optimal} observations and then decide.")
         print(msg)
The above graph indicates that minimizing over t tells the frequentist to draw 8
observations and then decide.
Let’s now change the value of 𝜋∗ and watch how the decision rule changes.
In [19]: n_π = 20
π_star_arr = np.linspace(0.1, 0.9, n_π)
V_fre_bar_arr = np.empty(n_π)
t_optimal_arr = np.empty(n_π)
PFA_optimal_arr = np.empty(n_π)
PD_optimal_arr = np.empty(n_π)
V_fre_bar_arr[i] = V_fre_arr[t_idx]
t_optimal_arr[i] = t_idx + 1
PFA_optimal_arr[i] = PFA_arr[t_idx]
PD_optimal_arr[i] = PD_arr[t_idx]
plt.show()
614 CHAPTER 39. BAYESIAN VERSUS FREQUENTIST DECISION RULES
The following shows how the optimal sample size 𝑡 and targeted (𝑃𝐹𝐴, 𝑃𝐷) change as 𝜋∗ varies.
axs[0].plot(π_star_arr, t_optimal_arr)
axs[0].set_xlabel('$\pi^*$')
axs[0].set_title('optimal sample size given $\pi^*$')
plt.show()
In this lecture A Problem that Stumped Milton Friedman, we learned how Abraham Wald
confirmed the Navy Captain’s hunch that there is a better decision rule.
We presented a Bayesian procedure that instructed the Captain to make decisions by comparing his current Bayesian posterior probability 𝜋 with two cutoff probabilities called 𝛼 and 𝛽.
To proceed, we borrow some Python code from the quantecon lecture A Problem that
Stumped Milton Friedman that computes 𝛼 and 𝛽.
In [22]: @njit(parallel=True)
def Q(h, wf):
κ = wf.κ
h_new = np.empty_like(π_grid)
h_func = lambda p: interp(π_grid, h, p)
for i in prange(len(π_grid)):
π = π_grid[i]
h_new[i] = c + integral
return h_new
In [23]: @njit
def solve_model(wf, tol=1e-4, max_iter=1000):
"""
Compute the continuation value function
* wf is an instance of WaldFriedman
"""
# Set up loop
h = np.zeros(len(wf.π_grid))
i = 0
error = tol + 1
if i == max_iter:
print("Failed to converge!")
return h_new
In [25]: @njit
def find_cutoff_rule(wf, h):
"""
This function takes a continuation value function and returns the
corresponding cutoffs where the decision maker stops sampling.
"""
π_grid = wf.π_grid
L0, L1 = wf.L0, wf.L1
return (β, α)
β, α = find_cutoff_rule(wf, h_star)
cost_L0 = (1 - wf.π_grid) * wf.L0
cost_L1 = wf.π_grid * wf.L1
plt.legend(borderpad=1.1)
plt.show()
39.5. BAYESIAN DECISION RULE 617
The above figure portrays the value function plotted against the decision maker's Bayesian posterior.
It also shows the probabilities 𝛼 and 𝛽.
The Bayesian decision rule is:
• accept 𝐻0 if 𝜋 ≥ 𝛼
• accept 𝐻1 if 𝜋 ≤ 𝛽
• delay deciding and draw another 𝑧 if 𝛽 ≤ 𝜋 ≤ 𝛼
We can calculate two “objective” loss functions, conditioning on knowing for sure that nature has selected 𝑓0, in the first case, or 𝑓1, in the second case.
1. under 𝑓0,

$$
V^0(\pi) = \begin{cases}
0 & \text{if } \alpha \leq \pi, \\
c + E V^0(\pi') & \text{if } \beta \leq \pi < \alpha, \\
\bar{L}_1 & \text{if } \pi < \beta.
\end{cases}
$$

2. under 𝑓1,

$$
V^1(\pi) = \begin{cases}
\bar{L}_0 & \text{if } \alpha \leq \pi, \\
c + E V^1(\pi') & \text{if } \beta \leq \pi < \alpha, \\
0 & \text{if } \pi < \beta.
\end{cases}
$$
where

$$
\pi' = \frac{\pi f_0(z')}{\pi f_0(z') + (1 - \pi) f_1(z')}
$$

Given a prior probability 𝜋0, the expected loss for the Bayesian is
$$
\bar{V}_{Bayes}(\pi_0) = \pi^* V^0(\pi_0) + (1 - \pi^*) V^1(\pi_0)
$$
Below we write some Python code that computes 𝑉 0 (𝜋) and 𝑉 1 (𝜋) numerically.
In [26]: @njit(parallel=True)
def V_q(wf, flag):
V = np.zeros(wf.π_grid_size)
if flag == 0:
z_arr = wf.z0
V[wf.π_grid < β] = wf.L1
else:
z_arr = wf.z1
V[wf.π_grid >= α] = wf.L0
V_old = np.empty_like(V)
while True:
V_old[:] = V[:]
V[(β <= wf.π_grid) & (wf.π_grid < α)] = 0
for i in prange(len(wf.π_grid)):
π = wf.π_grid[i]
if π >= α or π < β:
continue
for j in prange(len(z_arr)):
π_next = wf.κ(z_arr[j], π)
V[i] += wf.c + interp(wf.π_grid, V_old, π_next)
V[i] /= wf.mc_size
if np.abs(V - V_old).max() < 1e-5:
    break
return V
In [27]: V0 = V_q(wf, 0)
V1 = V_q(wf, 1)
Given an assumed value for $\pi^* = \Pr\{\text{nature selects } f_0\}$, we can then compute $\bar{V}_{Bayes}(\pi_0)$.
We can then determine an initial Bayesian prior 𝜋0∗ that minimizes this objective concept of
expected loss.
The figure below plots four cases corresponding to 𝜋∗ = 0.25, 0.3, 0.5, 0.7.
We observe that in each case 𝜋0∗ equals 𝜋∗ .
axs[row_i, col_i].set_ylabel('$\overline{V}_{baye}(\pi)$')
axs[row_i, col_i].set_title('$\pi^*=$' + f'{π_star}')
fig.suptitle('$\overline{V}_{baye}(\pi)=\pi^*V^0(\pi) + (1-\pi^*)V^1(\pi)$',
fontsize=16)
plt.show()
V_baye_bar_arr[i] = V_baye_bar
π_optimal_arr[i] = π_optimal
axs[0].plot(π_star_arr, V_baye_bar_arr)
axs[0].set_xlabel('$\pi^*$')
axs[0].set_title('$\overline{V}_{baye}$')
[π_star_arr.min(), π_star_arr.max()],
c='k', linestyle='--', label='45 degree line')
axs[1].set_xlabel('$\pi^*$')
axs[1].set_title('optimal prior given $\pi^*$')
axs[1].legend()
plt.show()
We now compare average (i.e., frequentist) losses obtained by the frequentist and Bayesian
decision rules.
As a starting point, let’s compare average loss functions when 𝜋∗ = 0.5.
In [32]: # frequentist
V_fre_arr, PFA_arr, PD_arr = compute_V_fre(L0_arr, L1_arr, π_star, wf)
# bayesian
V_baye = π_star * V0 + (1 - π_star) * V1
V_baye_bar = V_baye.min()
Evidently, there is no sample size 𝑡 at which the frequentist decision rule attains a lower loss
function than does the Bayesian rule.
Furthermore, the following graph indicates that the Bayesian decision rule does better on av-
erage for all values of 𝜋∗ .
plt.show()
39.7. MORE DETAILS 623
The right panel of the above graph plots the difference $\bar{V}_{fre} - \bar{V}_{Bayes}$.
It is always positive.
We can provide more insights by focusing solely on the case in which 𝜋∗ = 0.5 = 𝜋0.
Recall that when 𝜋∗ = 0.5, the frequentist decision rule sets a sample size t_optimal ex ante.
For our parameter settings, we can compute its value:
In [36]: t_optimal
Out[36]: 8
For convenience, let’s define t_idx as the Python array index corresponding to t_optimal
sample size.
By using simulations, we compute the frequency distribution of times to decide for the Bayesian decision rule and compare those times to the frequentist rule's fixed 𝑡.
The following Python code creates a graph that shows the frequency distribution of times to decide for the Bayesian decision maker, conditional on distribution 𝑞 = 𝑓0 or 𝑞 = 𝑓1 generating the data.
The blue and red dotted lines show averages for the Bayesian decision rule, while the black
dotted line shows the frequentist optimal sample size 𝑡.
On average the Bayesian rule decides earlier than the frequentist rule when 𝑞 = 𝑓0 and later
when 𝑞 = 𝑓1 .
In [38]: @njit(parallel=True)
def check_results(L_arr, α, β, flag, π0):
N, T = L_arr.shape
time_arr = np.empty(N)
correctness = np.empty(N)
for i in prange(N):
for t in range(T):
if (π_arr[i, t] < β) or (π_arr[i, t] > α):
time_arr[i] = t + 1
correctness[i] = ((flag == 0 and π_arr[i, t] > α) or
                  (flag == 1 and π_arr[i, t] < β))
break
# unconditional distribution
time_arr_u = np.concatenate((time_arr0, time_arr1))
correctness_u = np.concatenate((correctness0, correctness1))
plt.xlabel('t')
plt.ylabel('n')
plt.title('Conditional frequency distribution of times')
plt.show()
Later we’ll figure out how these distributions ultimately affect objective expected values un-
der the two decision rules.
To begin, let’s look at simulations of the Bayesian’s beliefs over time.
We can easily compute the updated beliefs at any time 𝑡 using the one-to-one mapping from
𝐿𝑡 to 𝜋𝑡 given 𝜋0 described in this lecture Likelihood Ratio Processes.
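As a sketch of that mapping (assuming, as in the likelihood-ratio lecture, that $L_t$ multiplies ratios $f_0/f_1$):

```python
import numpy as np

def posterior_from_L(π0, L):
    """Map a cumulative likelihood ratio L_t = Π f0(z_k)/f1(z_k) into the
    Bayesian posterior π_t = π0 L_t / (π0 L_t + 1 - π0), given prior π0."""
    return π0 * L / (π0 * L + 1 - π0)

# L = 1 leaves the prior unchanged; large L pushes π toward 1
assert np.isclose(posterior_from_L(0.5, 1.0), 0.5)
assert posterior_from_L(0.5, 100.0) > 0.99
```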
plt.show()
The above figures compare averages and variances of updated Bayesian posteriors after 𝑡
draws.
The left graph compares 𝐸 (𝜋𝑡 ) under 𝑓0 to 1 − 𝐸 (𝜋𝑡 ) under 𝑓1 : they lie on top of each other.
However, as the right-hand side graph shows, there is a significant difference in variances when 𝑡 is small: the variance is lower under 𝑓1.
The difference in variances is the reason that the Bayesian decision maker waits longer to de-
cide when 𝑓1 generates the data.
The code below plots outcomes of constructing an unconditional distribution by simply pool-
ing the simulated data across the two possible distributions 𝑓0 and 𝑓1 .
The pooled distribution describes a sense in which on average the Bayesian decides earlier, an
outcome that seems at least partly to confirm the Navy Captain’s hunch.
plt.xlabel('t')
plt.ylabel('n')
plt.title('Unconditional distribution of times')
plt.show()
Now we use simulations to compute the fraction of samples in which the Bayesian and the
frequentist decision rules decide correctly.
For the frequentist rule, the probability of making the correct decision under 𝑓1 is the optimal
probability of detection given 𝑡 that we defined earlier, and similarly it equals 1 minus the
optimal probability of a false alarm under 𝑓0 .
Below we plot these two probabilities for the frequentist rule, along with the conditional
probabilities that the Bayesian rule decides before 𝑡 and that the decision is correct.
In [45]: plt.plot([1, 20], [PD, PD], linestyle='--', label='PD: fre. chooses f1 correctly')
plt.plot([1, 20], [1 - PFA, 1 - PFA], linestyle='--',
label='1-PFA: fre. chooses f0 correctly')
N = time_arr0.size
T_arr = np.arange(1, 21)
plt.plot(T_arr, [np.sum(correctness0[time_arr0 <= t] == 1) / N for t in T_arr],
label='q=f0 and baye. choose f0')
plt.plot(T_arr, [np.sum(correctness1[time_arr1 <= t] == 1) / N for t in T_arr],
label='q=f1 and baye. choose f1')
plt.legend(loc=4)
plt.xlabel('t')
plt.ylabel('Probability')
plt.title('Cond. probability of making correct decisions before t')
plt.show()
N = time_arr_u.size
plt.plot(T_arr, [np.sum(correctness_u[time_arr_u <= t] == 1) / N for t in T_arr],
label="bayesian makes correct decision")
plt.legend()
plt.xlabel('t')
plt.ylabel('Probability')
plt.title('Uncond. probability of making correct decisions before t')
plt.show()
plt.xlabel('log(L)')
plt.ylabel('n')
plt.show()
The next graph plots the unconditional distribution of Bayesian times to decide, constructed
as earlier by pooling the two conditional distributions.
plt.xlabel('log(L)')
plt.ylabel('n')
plt.title('Uncond. distribution of log likelihood ratio at frequentist t')
plt.show()
Part VI
LQ Control
Chapter 40
LQ Control: Foundations
40.1 Contents
• Overview 40.2
• Introduction 40.3
• Optimality – Finite Horizon 40.4
• Implementation 40.5
• Extensions and Comments 40.6
• Further Applications 40.7
• Exercises 40.8
• Solutions 40.9
In addition to what’s in Anaconda, this lecture will need the following libraries:
40.2 Overview
Linear quadratic (LQ) control refers to a class of dynamic optimization problems that have
found applications in almost every scientific field.
This lecture provides an introduction to LQ control and its economic applications.
As we will see, LQ systems have a simple structure that makes them an excellent workhorse
for a wide variety of economic problems.
Moreover, while the linear-quadratic structure is restrictive, it is in fact far more flexible than
it may appear initially.
These themes appear repeatedly below.
Mathematically, LQ control problems are closely related to the Kalman filter
• Recursive formulations of linear-quadratic control problems and Kalman filtering prob-
lems both involve matrix Riccati equations.
• Classical formulations of linear control and linear filtering problems make use of similar
matrix decompositions (see for example this lecture and this lecture).
In reading what follows, it will be useful to have some familiarity with
• matrix manipulations
40.3 Introduction
The “linear” part of LQ is a linear law of motion for the state, while the “quadratic” part
refers to preferences.
Let's begin with the former, move on to the latter, and then put them together into an optimization problem.

40.3.1 The Law of Motion

Let 𝑥𝑡 be a vector describing the state of some economic system, and suppose that 𝑥𝑡 follows a linear law of motion given by

$$
x_{t+1} = A x_t + B u_t + C w_{t+1}, \qquad t = 0, 1, 2, \ldots \tag{1}
$$

Here
• 𝑢𝑡 is a “control” vector, incorporating choices available to a decision-maker confronting
the current state 𝑥𝑡
• {𝑤𝑡 } is an uncorrelated zero mean shock process satisfying 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼, where the right-
hand side is the identity matrix
Regarding the dimensions
• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛
• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗
40.3. INTRODUCTION 635
Example 1
𝑎𝑡+1 + 𝑐𝑡 = (1 + 𝑟)𝑎𝑡 + 𝑦𝑡
Here 𝑎𝑡 is assets, 𝑟 is a fixed interest rate, 𝑐𝑡 is current consumption, and 𝑦𝑡 is current non-
financial income.
If we suppose that {𝑦𝑡 } is serially uncorrelated and 𝑁 (0, 𝜎2 ), then, taking {𝑤𝑡 } to be stan-
dard normal, we can write the system as
This is clearly a special case of (1), with assets being the state and consumption being the
control.
Example 2
One unrealistic feature of the previous model is that non-financial income has a zero mean
and is often negative.
This can easily be overcome by adding a sufficiently large mean.
Hence in this example, we take 𝑦𝑡 = 𝜎𝑤𝑡+1 + 𝜇 for some positive real number 𝜇.
Another alteration that’s useful to introduce (we’ll see why soon) is to change the control
variable from consumption to the deviation of consumption from some “ideal” quantity 𝑐.̄
(Most parameterizations will be such that 𝑐 ̄ is large relative to the amount of consumption
that is attainable in each period, and hence the household wants to increase consumption)
For this reason, we now take our control to be 𝑢𝑡 ∶= 𝑐𝑡 − 𝑐.̄
In terms of these variables, the budget constraint 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡 becomes

$$
a_{t+1} = (1 + r) a_t - u_t - \bar{c} + \sigma w_{t+1} + \mu \tag{2}
$$
How can we write this new system in the form of equation (1)?
If, as in the previous example, we take 𝑎𝑡 as the state, then we run into a problem: the law of
motion contains some constant terms on the right-hand side.
This means that we are dealing with an affine function, not a linear one (recall this discus-
sion).
Fortunately, we can easily circumvent this problem by adding an extra state variable.
In particular, if we write
$$
\begin{pmatrix} a_{t+1} \\ 1 \end{pmatrix} =
\begin{pmatrix} 1 + r & -\bar{c} + \mu \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} a_t \\ 1 \end{pmatrix} +
\begin{pmatrix} -1 \\ 0 \end{pmatrix} u_t +
\begin{pmatrix} \sigma \\ 0 \end{pmatrix} w_{t+1} \tag{3}
$$

then the first row reproduces the budget constraint, while the second row is the trivial identity 1 = 1. The system now fits the form of (1) with

$$
x_t := \begin{pmatrix} a_t \\ 1 \end{pmatrix}, \quad
A := \begin{pmatrix} 1 + r & -\bar{c} + \mu \\ 0 & 1 \end{pmatrix}, \quad
B := \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \quad
C := \begin{pmatrix} \sigma \\ 0 \end{pmatrix} \tag{4}
$$
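A quick NumPy check of this augmented system, using the same symbols as above with illustrative parameter values chosen only for the demonstration:

```python
import numpy as np

# Illustrative parameter values (chosen only for this demonstration)
r, c_bar, μ, σ = 0.05, 2.0, 1.0, 0.25

A = np.array([[1 + r, -c_bar + μ],
              [0.0,    1.0      ]])
B = np.array([[-1.0],
              [ 0.0]])
C = np.array([[σ],
              [0.0]])

# One step of x_{t+1} = A x_t + B u_t + C w_{t+1}
x = np.array([1.0, 1.0])          # state (a_0, 1)
u, w = 0.3, 0.5                   # an arbitrary control and shock
x_next = A @ x + B.flatten() * u + C.flatten() * w

assert np.isclose(x_next[1], 1.0)  # the constant component stays at 1
```

The second state component stays pinned at 1, which is exactly how the affine constant is smuggled into a linear system.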
40.3.2 Preferences
In the LQ model, the aim is to minimize a flow of losses, where the time-𝑡 loss is given by the quadratic expression

$$
x_t' R x_t + u_t' Q u_t \tag{5}
$$

Here
• 𝑅 is assumed to be 𝑛 × 𝑛, symmetric and nonnegative definite.
• 𝑄 is assumed to be 𝑘 × 𝑘, symmetric and positive definite.
Note
In fact, for many economic problems, the definiteness conditions on 𝑅 and 𝑄 can
be relaxed. It is sufficient that certain submatrices of 𝑅 and 𝑄 be nonnegative
definite. See [50] for details.
Example 1
A very simple example that satisfies these assumptions is to take 𝑅 and 𝑄 to be identity matrices, so that current loss is

$$
x_t' x_t + u_t' u_t = \| x_t \|^2 + \| u_t \|^2
$$
Thus, for both the state and the control, loss is measured as squared distance from the origin.
(In fact, the general case (5) can also be understood in this way, but with 𝑅 and 𝑄 identify-
ing other – non-Euclidean – notions of “distance” from the zero vector).
Intuitively, we can often think of the state 𝑥𝑡 as representing deviation from a target, such as
• deviation of inflation from some target level
• deviation of a firm’s capital stock from some desired quantity
The aim is to put the state close to the target, while using controls parsimoniously.
Example 2
Under this specification, the household’s current loss is the squared deviation of consumption
from the ideal level 𝑐.̄
40.4 Optimality – Finite Horizon
Let’s now be precise about the optimization problem we wish to consider, and look at how to
solve it.
We will begin with the finite horizon case, with terminal time 𝑇 ∈ ℕ.
In this case, the aim is to choose a sequence of controls {𝑢0 , … , 𝑢𝑇 −1 } to minimize the objec-
tive
$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t) + \beta^T x_T' R_f x_T \right\} \tag{6}
$$
40.4.2 Information
There’s one constraint we’ve neglected to mention so far, which is that the decision-maker
who solves this LQ problem knows only the present and the past, not the future.
To clarify this point, consider the sequence of controls {𝑢0 , … , 𝑢𝑇 −1 }.
When choosing these controls, the decision-maker is permitted to take into account the effects
of the shocks {𝑤1 , … , 𝑤𝑇 } on the system.
However, it is typically assumed — and will be assumed here — that the time-𝑡 control 𝑢𝑡
can be made with knowledge of past and present shocks only.
The fancy measure-theoretic way of saying this is that 𝑢𝑡 must be measurable with respect to
the 𝜎-algebra generated by 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 .
This is in fact equivalent to stating that 𝑢𝑡 can be written in the form 𝑢𝑡 =
𝑔𝑡 (𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 ) for some Borel measurable function 𝑔𝑡 .
(Just about every function that’s useful for applications is Borel measurable, so, for the pur-
poses of intuition, you can read that last phrase as “for some function 𝑔𝑡 ”)
Now note that 𝑥𝑡 will ultimately depend on the realizations of 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 .
In fact, it turns out that 𝑥𝑡 summarizes all the information about these historical shocks that
the decision-maker needs to set controls optimally.
More precisely, it can be shown that any optimal control 𝑢𝑡 can always be written as a func-
tion of the current state alone.
Hence in what follows we restrict attention to control policies (i.e., functions) of the form
𝑢𝑡 = 𝑔𝑡 (𝑥𝑡 ).
Actually, the preceding discussion applies to all standard dynamic programming problems.
What’s special about the LQ case is that – as we shall soon see — the optimal 𝑢𝑡 turns out
to be a linear function of 𝑥𝑡 .
40.4.3 Solution
To solve the finite horizon LQ problem we can use a dynamic programming strategy based on
backward induction that is conceptually similar to the approach adopted in this lecture.
For reasons that will soon become clear, we first introduce the notation 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥.
Now consider the problem of the decision-maker in the second to last period.
In particular, let the time be 𝑇 − 1, and suppose that the state is 𝑥𝑇 −1 .
The decision-maker must trade off current and (discounted) final losses, and hence solves

$$
J_{T-1}(x) := \min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} \, J_T(A x + B u + C w) \right\} \tag{7}
$$
The function 𝐽𝑇 −1 will be called the 𝑇 − 1 value function, and 𝐽𝑇 −1 (𝑥) can be thought of as
representing total “loss-to-go” from state 𝑥 at time 𝑇 − 1 when the decision-maker behaves
optimally.
Now let’s step back to 𝑇 − 2.
For a decision-maker at 𝑇 − 2, the value 𝐽𝑇 −1 (𝑥) plays a role analogous to that played by the
terminal loss 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥 for the decision-maker at 𝑇 − 1.
That is, 𝐽𝑇 −1 (𝑥) summarizes the future loss associated with moving to state 𝑥.
The decision-maker chooses her control 𝑢 to trade off current loss against future loss, where
• the next period state is 𝑥𝑇 −1 = 𝐴𝑥𝑇 −2 + 𝐵𝑢 + 𝐶𝑤𝑇 −1 , and hence depends on the choice
of current control.
• the “cost” of landing in state 𝑥𝑇 −1 is 𝐽𝑇 −1 (𝑥𝑇 −1 ).
Her problem is therefore
Letting
The first equality is the Bellman equation from dynamic programming theory specialized to
the finite horizon LQ problem.
Now that we have {𝐽0 , … , 𝐽𝑇 }, we can obtain the optimal controls.
As a first step, let’s find out what the value functions look like.
It turns out that every 𝐽𝑡 has the form 𝐽𝑡(𝑥) = 𝑥′𝑃𝑡𝑥 + 𝑑𝑡 where 𝑃𝑡 is an 𝑛 × 𝑛 matrix and 𝑑𝑡 is a constant.
We can show this by induction, starting from 𝑃𝑇 ∶= 𝑅𝑓 and 𝑑𝑇 = 0.
Using this notation, (7) becomes

$$
J_{T-1}(x) = \min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} \, (A x + B u + C w)' P_T (A x + B u + C w) \right\} \tag{8}
$$
To obtain the minimizer, we can take the derivative of the r.h.s. with respect to 𝑢 and set it
equal to zero.
Applying the relevant rules of matrix calculus, this gives

$$
u = -(Q + \beta B' P_T B)^{-1} \beta B' P_T A x \tag{9}
$$

Plugging this back in and rearranging yields

$$
J_{T-1}(x) = x' P_{T-1} x + d_{T-1}
$$
where

$$
P_{T-1} := R - \beta^2 A' P_T B (Q + \beta B' P_T B)^{-1} B' P_T A + \beta A' P_T A \tag{10}
$$

and

$$
d_{T-1} := \beta \, \operatorname{trace}(C' P_T C) \tag{11}
$$
Continuing to work backwards in this way, it can be shown that 𝐽𝑡(𝑥) = 𝑥′𝑃𝑡𝑥 + 𝑑𝑡 for every 𝑡, where {𝑃𝑡} and {𝑑𝑡} satisfy the recursions

$$
P_{t-1} = R - \beta^2 A' P_t B (Q + \beta B' P_t B)^{-1} B' P_t A + \beta A' P_t A
\quad \text{with } P_T = R_f \tag{12}
$$

and

$$
d_{t-1} = \beta \left( d_t + \operatorname{trace}(C' P_t C) \right)
\quad \text{with } d_T = 0 \tag{13}
$$

The optimal controls are given by 𝑢𝑡 = −𝐹𝑡𝑥𝑡, where

$$
F_t := (Q + \beta B' P_{t+1} B)^{-1} \beta B' P_{t+1} A \tag{14}
$$

Substituting this policy into the law of motion, the optimal state dynamics are

$$
x_{t+1} = (A - B F_t) x_t + C w_{t+1} \tag{15}
$$
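A sketch of one backward step of this recursion in NumPy — a generic implementation of the update formulas above, not the book's `lqcontrol.py` code:

```python
import numpy as np

def riccati_step(P, A, B, C, Q, R, β):
    """One backward step of the finite-horizon LQ recursion: given the
    time-(t+1) value matrix P, return (P_t, F_t, d-increment)."""
    S = Q + β * B.T @ P @ B
    F = β * np.linalg.solve(S, B.T @ P @ A)          # optimal gain
    P_new = R + β * A.T @ P @ A \
            - β**2 * A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
    d_inc = β * np.trace(C.T @ P @ C)
    return P_new, F, d_inc

# Scalar example: A = B = Q = R = 1, C = 0, β = 1, starting from P = 1
P, F, d = riccati_step(np.eye(1), np.eye(1), np.eye(1), np.zeros((1, 1)),
                       np.eye(1), np.eye(1), 1.0)
assert np.isclose(P[0, 0], 1.5)    # 1 + 1 - 1/2
```

Iterating this step backwards from the terminal condition produces the whole sequence of value matrices and gains.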
40.5 Implementation
We will use code from lqcontrol.py in QuantEcon.py to solve finite and infinite horizon linear
quadratic control problems.
In the module, the various updating, simulation and fixed point methods are wrapped in a
class called LQ, which includes
• Instance data:
– The required parameters 𝑄, 𝑅, 𝐴, 𝐵 and optional parameters C, β, T, R_f, N spec-
ifying a given LQ model
* set 𝑇 and 𝑅𝑓 to None in the infinite horizon case
* set C = None (or zero) in the deterministic case
– the value function and policy data
* 𝑑𝑡 , 𝑃𝑡 , 𝐹𝑡 in the finite horizon case
* 𝑑, 𝑃 , 𝐹 in the infinite horizon case
• Methods:
– update_values — shifts 𝑑𝑡 , 𝑃𝑡 , 𝐹𝑡 to their 𝑡 − 1 values via (12), (13) and (14)
– stationary_values — computes 𝑃 , 𝑑, 𝐹 in the infinite horizon case
– compute_sequence —- simulates the dynamics of 𝑥𝑡 , 𝑢𝑡 , 𝑤𝑡 given 𝑥0 and assuming
standard normal shocks
40.5.1 An Application
Early Keynesian models assumed that households have a constant marginal propensity to
consume from current income.
Data contradicted the constancy of the marginal propensity to consume.
In response, Milton Friedman, Franco Modigliani and others built models based on a con-
sumer’s preference for an intertemporally smooth consumption stream.
(See, for example, [38] or [82])
One property of those models is that households purchase and sell financial assets to make
consumption streams smoother than income streams.
The household savings problem outlined above captures these ideas.
The optimization problem for the household is to choose a consumption sequence in order to
minimize
$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar{c})^2 + \beta^T q a_T^2 \right\} \tag{16}
$$
$$
Q := 1, \quad
R := \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{and} \quad
R_f := \begin{pmatrix} q & 0 \\ 0 & 0 \end{pmatrix}
$$
Now that the problem is expressed in LQ form, we can proceed to the solution by applying
(12) and (14).
After generating shocks 𝑤1 , … , 𝑤𝑇 , the dynamics for assets and consumption can be simu-
lated via (15).
The following figure was computed using $r = 0.05$, $\beta = 1/(1+r)$, $\bar{c} = 2$, $\mu = 1$, $\sigma = 0.25$, $T = 45$ and $q = 10^6$.
The shocks {𝑤𝑡 } were taken to be IID and standard normal.
# Formulate as an LQ problem
Q = 1
R = np.zeros((2, 2))
Rf = np.zeros((2, 2))
Rf[0, 0] = q
A = [[1 + r, -c_bar + μ],
[0, 1]]
B = [[-1],
[ 0]]
C = [[σ],
[0]]
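The cell above only formulates the matrices; in the book the model is then solved with QuantEcon's `LQ` class, whose call is not reproduced here. As a self-contained sketch in plain NumPy (with illustrative parameter values, and no claim to match the library's interface), backward induction and forward simulation can be wired together like this:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameters (hypothetical stand-ins for the text's values)
r, β, c_bar, μ, σ, T, q = 0.05, 1 / 1.05, 2.0, 1.0, 0.25, 45, 1e6

A = np.array([[1 + r, -c_bar + μ], [0.0, 1.0]])
B = np.array([[-1.0], [0.0]])
C = np.array([[σ], [0.0]])
Q = np.array([[1.0]])
R = np.zeros((2, 2))
Rf = np.zeros((2, 2))
Rf[0, 0] = q

# Backward induction: compute the gain matrices F_0, ..., F_{T-1}
P = Rf
Fs = []
for _ in range(T):
    S = Q + β * B.T @ P @ B
    F = β * np.linalg.solve(S, B.T @ P @ A)
    P = R + β * A.T @ P @ A \
        - β**2 * A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
    Fs.append(F)
Fs.reverse()                      # Fs[t] is now the time-t gain

# Forward simulation under the policy u_t = -F_t x_t
x = np.array([0.0, 1.0])          # zero initial assets
for t in range(T):
    u = -Fs[t] @ x
    x = A @ x + (B @ u).flatten() + C.flatten() * rng.standard_normal()

# The large terminal penalty q should push final assets close to zero
assert abs(x[0]) < 2.0
```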
# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for ax in axes:
ax.grid()
ax.set_xlabel('Time')
ax.legend(ncol=2, **legend_args)
plt.show()
The top panel shows the time path of consumption 𝑐𝑡 and income 𝑦𝑡 in the simulation.
As anticipated by the discussion on consumption smoothing, the time path of consumption is
much smoother than that for income.
(But note that consumption becomes more irregular towards the end of life, when the zero
final asset requirement impinges more on consumption choices).
The second panel in the figure shows that the time path of assets 𝑎𝑡 is closely correlated with
cumulative unanticipated income, where the latter is defined as
$$
z_t := \sum_{j=0}^{t} \sigma w_j
$$
A key message is that unanticipated windfall gains are saved rather than consumed, while
unanticipated negative shocks are met by reducing assets.
(Again, this relationship breaks down towards the end of life due to the zero final asset re-
quirement)
These results are relatively robust to changes in parameters.
For example, let’s increase 𝛽 from 1/(1 + 𝑟) ≈ 0.952 to 0.96 while keeping other parameters
fixed.
This consumer is slightly more patient than the last one, and hence puts relatively more
weight on later consumption values.
# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for ax in axes:
ax.grid()
ax.set_xlabel('Time')
ax.legend(ncol=2, **legend_args)
plt.show()
We now have a slowly rising consumption stream and a hump-shaped build-up of assets in the
middle periods to fund rising consumption.
However, the essential features are the same: consumption is smooth relative to income, and
assets are strongly positively correlated with cumulative unanticipated income.
40.6 Extensions and Comments

Let's now consider a number of standard extensions to the LQ problem treated above.
For further examples and a more systematic treatment, see [51], section 2.4.
In some LQ problems, preferences include a cross-product term 𝑢′𝑡 𝑁 𝑥𝑡 , so that the objective
function becomes
$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) + \beta^T x_T' R_f x_T \right\} \tag{17}
$$
Finally, we consider the infinite horizon case, with cross-product term, unchanged dynamics
and objective function given by
$$
\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) \right\} \tag{20}
$$
In the infinite horizon case, optimal policies can depend on time only if time itself is a compo-
nent of the state vector 𝑥𝑡 .
In other words, there exists a fixed matrix 𝐹 such that 𝑢𝑡 = −𝐹 𝑥𝑡 for all 𝑡.
That decision rules are constant over time is intuitive — after all, the decision-maker faces
the same infinite horizon at every stage, with only the current state changing.
Not surprisingly, 𝑃 and 𝑑 are also constant.
The stationary matrix 𝑃 is the solution to the discrete-time algebraic Riccati equation.
Equation (21) is also called the LQ Bellman equation, and the map that sends a given 𝑃 into
the right-hand side of (21) is called the LQ Bellman operator.
The stationary optimal policy for this model is

$$
u_t = -F x_t \quad \text{where} \quad F := (Q + \beta B' P B)^{-1} (\beta B' P A + N) \tag{22}
$$

and the constant term in the stationary value function is

$$
d := \frac{\beta}{1 - \beta} \operatorname{trace}(C' P C) \tag{23}
$$
The state evolves according to the time-homogeneous process 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 )𝑥𝑡 + 𝐶𝑤𝑡+1 .
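A sketch of computing the stationary $P$, $F$ and $d$ by iterating the Bellman operator to a fixed point — plain NumPy, with no cross-product term (i.e. $N = 0$), and with convergence assumed rather than verified analytically:

```python
import numpy as np

def stationary_lq(A, B, C, Q, R, β, tol=1e-10, max_iter=10_000):
    """Iterate the LQ Bellman operator to a fixed point P, then recover
    the stationary gain F and constant d (no cross-product term)."""
    P = np.array(R, dtype=float)
    for _ in range(max_iter):
        S = Q + β * B.T @ P @ B
        P_new = R + β * A.T @ P @ A \
                - β**2 * A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    F = β * np.linalg.solve(Q + β * B.T @ P @ B, B.T @ P @ A)
    d = β / (1 - β) * np.trace(C.T @ P @ C)
    return P, F, d

# Scalar sanity check: A = B = Q = R = 1, no noise, β = 0.9
P, F, d = stationary_lq(np.eye(1), np.eye(1), np.zeros((1, 1)),
                        np.eye(1), np.eye(1), 0.9)
assert P[0, 0] > 1.0 and d == 0.0
```

In production settings the algebraic Riccati equation is usually solved by a dedicated routine rather than naive iteration, but the fixed-point view matches the Bellman-operator description above.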
An example infinite horizon problem is treated below.
Linear quadratic control problems of the class discussed above have the property of certainty
equivalence.
By this, we mean that the optimal policy 𝐹 is not affected by the parameters in 𝐶, which
specify the shock process.
This can be confirmed by inspecting (22) or (19).
It follows that we can ignore uncertainty when solving for optimal behavior, and plug it back
in when examining optimal state dynamics.
$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar{c})^2 + \beta^T q a_T^2 \right\} \tag{24}
$$
To put this into an LQ setting, consider the budget constraint, which becomes

$$
a_{t+1} = (1 + r) a_t - u_t - \bar{c} + m_1 t + m_2 t^2 + \sigma w_{t+1} \tag{25}
$$
The fact that 𝑎𝑡+1 is a linear function of (𝑎𝑡 , 1, 𝑡, 𝑡2 ) suggests taking these four variables as
the state vector 𝑥𝑡 .
Once a good choice of state and control (recall 𝑢𝑡 = 𝑐𝑡 − 𝑐)̄ has been made, the remaining
specifications fall into place relatively easily.
Thus, for the dynamics we set
$$
x_t := \begin{pmatrix} a_t \\ 1 \\ t \\ t^2 \end{pmatrix}, \quad
A := \begin{pmatrix} 1 + r & -\bar{c} & m_1 & m_2 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 2 & 1 \end{pmatrix}, \quad
B := \begin{pmatrix} -1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad
C := \begin{pmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{26}
$$
If you expand the expression 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 using this specification, you will find
that assets follow (25) as desired and that the other state variables also update appropriately.
To implement preference specification (24) we take
$$
Q := 1, \quad
R := \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad
R_f := \begin{pmatrix} q & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{27}
$$
The next figure shows a simulation of consumption and assets computed using the
In the previous application, we generated income dynamics with an inverted U shape using
polynomials and placed them in an LQ framework.
It is arguably the case that this income process still contains unrealistic features.
A more common earning profile is where
1. income grows over working life, fluctuating around an increasing trend, with growth
flattening off in later years
2. retirement follows, with lower but relatively stable (non-financial) income
$$
y_t = \begin{cases}
p(t) + \sigma w_{t+1} & \text{if } t \leq K \\
s & \text{otherwise}
\end{cases} \tag{28}
$$
Here
• 𝑝(𝑡) ∶= 𝑚1 𝑡 + 𝑚2 𝑡2 with the coefficients 𝑚1 , 𝑚2 chosen such that 𝑝(𝐾) = 𝜇 and 𝑝(0) =
𝑝(2𝐾) = 0
• 𝑠 is retirement income
We suppose that preferences are unchanged and given by (16).
The budget constraint is also unchanged and given by 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡 .
Our aim is to solve this problem and simulate paths using the LQ techniques described in this
lecture.
In fact, this is a nontrivial problem, as the kink in the dynamics (28) at 𝐾 makes it very diffi-
cult to express the law of motion as a fixed-coefficient linear system.
However, we can still use our LQ methods here by suitably linking two-component LQ prob-
lems.
These two LQ problems describe the consumer’s behavior during her working life
(lq_working) and retirement (lq_retired).
(This is possible because, in the two separate periods of life, the respective income processes
[polynomial trend and constant] each fit the LQ framework)
The basic idea is that although the whole problem is not a single time-invariant LQ problem,
it is still a dynamic programming problem, and hence we can use appropriate Bellman equa-
tions at every stage.
Based on this logic, we can
1. solve lq_retired by the usual backward induction procedure, iterating back to the start
of retirement.
2. take the start-of-retirement value function generated by this process, and use it as the
terminal condition 𝑅𝑓 to feed into the lq_working specification.
3. solve lq_working by backward induction from this choice of 𝑅𝑓 , iterating back to the
start of working life.
This process gives the entire life-time sequence of value functions and optimal policies.
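Schematically, the two-phase procedure can be sketched by reusing a generic backward-induction routine and feeding the start-of-retirement value matrix in as the working phase's terminal condition. The numbers below are toy one-dimensional stand-ins, not the lecture's calibration:

```python
import numpy as np

def backward_gains(A, B, Q, R, Rf, β, T):
    """Backward induction over T periods; returns the list of gains
    F_0, ..., F_{T-1} and the start-of-horizon value matrix."""
    P = Rf
    Fs = []
    for _ in range(T):
        S = Q + β * B.T @ P @ B
        F = β * np.linalg.solve(S, B.T @ P @ A)
        P = R + β * A.T @ P @ A \
            - β**2 * A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
        Fs.append(F)
    Fs.reverse()
    return Fs, P

# Toy stand-ins for lq_retired and lq_working (hypothetical numbers)
I = np.eye(1)
Rf_final = 100 * I                 # terminal penalty at the end of life

# Step 1: solve the retirement phase back to the start of retirement
_, P_retirement = backward_gains(I, I, I, I, Rf_final, 0.95, 20)

# Steps 2-3: use that value matrix as the working phase's R_f
F_working, _ = backward_gains(I, I, I, I, P_retirement, 0.95, 30)
assert len(F_working) == 30
```

The key point is that the dynamic programming principle lets the retirement phase's value function summarize everything the working-age decision maker needs to know about life after 𝐾.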
The full set of parameters used in the simulation is discussed in Exercise 2, where you are
asked to replicate the figure.
Once again, the dominant feature observable in the simulation is consumption smoothing.
The asset path fits well with standard life cycle theory, with dissaving early in life followed by
later saving.
Assets peak at retirement and subsequently decline.
𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡 + 𝑑𝑡
$$
\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t \pi_t \right\} \quad \text{where} \quad
\pi_t := p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2 \tag{29}
$$
Here
• 𝛾(𝑞𝑡+1 − 𝑞𝑡 )2 represents adjustment costs
• 𝑐 is average cost of production
This can be formulated as an LQ problem and then solved and simulated, but first let’s study
the problem and try to get some intuition.
One way to start thinking about the problem is to consider what would happen if 𝛾 = 0.
Without adjustment costs there is no intertemporal trade-off, so the monopolist will choose
output to maximize current profit in each period.
It's not difficult to show that profit-maximizing output is

$$
\bar{q}_t := \frac{a_0 - c + d_t}{2 a_1}
$$
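As a quick numerical check of this formula, with hypothetical parameter values:

```python
import numpy as np

# Hypothetical parameter values for the monopolist example
a0, a1, c = 5.0, 0.5, 2.0

def q_bar(d):
    """Profit-maximizing output when adjustment costs are zero (γ = 0)."""
    return (a0 - c + d) / (2 * a1)

assert np.isclose(q_bar(0.0), 3.0)   # (5 - 2 + 0) / (2 * 0.5)
```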
This makes no difference to the solution, since $a_1 \bar{q}_t^2$ does not depend on the controls.

(In fact, we are just adding a constant term to (29), and optimizers are not affected by constant terms.)

The reason for making this substitution is that, as you will be able to verify, $\hat{\pi}_t$ reduces to the simple quadratic
$$
\min \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left\{ a_1 (q_t - \bar{q}_t)^2 + \gamma u_t^2 \right\} \tag{30}
$$
It’s now relatively straightforward to find 𝑅 and 𝑄 such that (30) can be written as (20).
Furthermore, the matrices 𝐴, 𝐵 and 𝐶 from (1) can be found by writing down the dynamics
of each element of the state.
Exercise 3 asks you to complete this process, and reproduce the preceding figures.
40.8 Exercises
40.8.1 Exercise 1
40.8.2 Exercise 2
With some careful footwork, the simulation can be generated by patching together the simu-
lations from these two separate models.
40.8.3 Exercise 3
40.9 Solutions
40.9.1 Exercise 1
$$
y_t = m_1 t + m_2 t^2 + \sigma w_{t+1}
$$

where {𝑤𝑡} is IID 𝑁(0, 1) and the coefficients 𝑚1 and 𝑚2 are chosen so that 𝑝(𝑡) = 𝑚1𝑡 + 𝑚2𝑡² has an inverted U shape with
• 𝑝(0) = 0, 𝑝(𝑇/2) = 𝜇, and
• 𝑝(𝑇) = 0
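The two non-trivial conditions pin down 𝑚1 and 𝑚2 as the solution of a 2 × 2 linear system, which can be computed directly (the values of 𝑇 and 𝜇 below are the ones used earlier in this lecture):

```python
import numpy as np

# Solve for m1, m2 so that p(t) = m1*t + m2*t**2 satisfies
# p(T/2) = μ and p(T) = 0 (p(0) = 0 holds automatically)
T, μ = 45, 1.0
M = np.array([[T / 2, (T / 2)**2],
              [T,     T**2      ]])
m1, m2 = np.linalg.solve(M, np.array([μ, 0.0]))

assert np.isclose(m1 * (T / 2) + m2 * (T / 2)**2, μ)
assert np.isclose(m1 * T + m2 * T**2, 0.0)
```

Since the peak is at 𝑇/2 and both endpoints are zero, 𝑚1 comes out positive and 𝑚2 negative, giving the inverted U shape.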
# Formulate as an LQ problem
Q = 1
R = np.zeros((4, 4))
Rf = np.zeros((4, 4))
Rf[0, 0] = q
A = [[1 + r, -c_bar, m1, m2],
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 2, 1]]
B = [[-1],
[ 0],
[ 0],
[ 0]]
C = [[σ],
[0],
[0],
[0]]
# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for ax in axes:
ax.grid()
ax.set_xlabel('Time')
ax.legend(ncol=2, **legend_args)
plt.show()
40.9.2 Exercise 2
This is a permanent income / life-cycle model with polynomial growth in income over work-
ing life followed by a fixed retirement income.
The model is solved by combining two LQ programming problems as described in the lecture.
up = np.column_stack((up_w, up_r))
c = up.flatten() + c_bar # Consumption
# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for ax in axes:
ax.grid()
ax.set_xlabel('Time')
ax.legend(ncol=2, **legend_args)
plt.show()
40.9.3 Exercise 3
The first task is to find the matrices 𝐴, 𝐵, 𝐶, 𝑄, 𝑅 that define the LQ problem.
Recall that 𝑥𝑡 = (𝑞𝑡̄ 𝑞𝑡 1)′ , while 𝑢𝑡 = 𝑞𝑡+1 − 𝑞𝑡 .
Letting 𝑚0 ∶= (𝑎0 − 𝑐)/2𝑎1 and 𝑚1 ∶= 1/2𝑎1 , we can write 𝑞𝑡̄ = 𝑚0 + 𝑚1 𝑑𝑡 , and then, with
some manipulation
𝑞𝑡+1̄ = 𝑚0 (1 − 𝜌) + 𝜌𝑞𝑡̄ + 𝑚1 𝜎𝑤𝑡+1
min 𝔼 {∑_{𝑡=0}^{∞} 𝛽^𝑡 [𝑎1 (𝑞𝑡 − 𝑞𝑡̄ )² + 𝛾𝑢𝑡²]}
# Useful constants
m0 = (a0 - c) / (2 * a1)
m1 = 1/(2 * a1)
# Formulate LQ problem
Q = γ
R = [[ a1, -a1, 0],
[-a1, a1, 0],
[ 0, 0, 0]]
A = [[ρ, 0, m0 * (1 - ρ)],
[0, 1, 0],
[0, 0, 1]]
B = [[0],
[1],
[0]]
C = [[m1 * σ],
[ 0],
[ 0]]
time = range(len(q))
ax.set(xlabel='Time', xlim=(0, max(time)))
ax.plot(time, q_bar, 'k', lw=2, alpha=0.6, label=r'$\bar q_t$')
ax.plot(time, q, 'b', lw=2, alpha=0.6, label='$q_t$')
ax.legend(ncol=2, **legend_args)
s = f'dynamics with $\gamma = {γ}$'
ax.text(max(time) * 0.6, 1 * q_bar.max(), s, fontsize=14)
plt.show()
Chapter 41

The Permanent Income Model
41.1 Contents
• Overview 41.2
• The Savings Problem 41.3
• Alternative Representations 41.4
• Two Classic Examples 41.5
• Further Reading 41.6
• Appendix: The Euler Equation 41.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
41.2 Overview
This lecture describes a rational expectations version of the famous permanent income model
of Milton Friedman [38].
Robert Hall cast Friedman’s model within a linear-quadratic setting [47].
Like Hall, we formulate an infinite-horizon linear-quadratic savings problem.
We use the model as a vehicle for illustrating
• alternative formulations of the state of a dynamic system
• the idea of cointegration
• impulse response functions
• the idea that changes in consumption are useful as predictors of movements in income
Background readings on the linear-quadratic-Gaussian permanent income model are Hall’s
[47] and chapter 2 of [72].
Let’s start with some imports
664 CHAPTER 41. THE PERMANENT INCOME MODEL
In this section, we state and solve the savings and consumption problem faced by the con-
sumer.
41.3.1 Preliminaries
𝔼𝑡 [𝑋𝑡+1 ] = 𝑋𝑡 , 𝑡 = 0, 1, 2, …
𝑋𝑡+1 = 𝑋𝑡 + 𝑤𝑡+1
𝑋𝑡 = ∑_{𝑗=1}^{𝑡} 𝑤𝑗 + 𝑋0
Not every martingale arises as a random walk (see, for example, Wald’s martingale).
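The martingale property of a random walk is easy to see in a simulation; a quick sketch (seed and sample sizes are arbitrary choices):

```python
import numpy as np

# Simulate many random-walk paths X_{t+1} = X_t + w_{t+1}, w IID N(0, 1), X_0 = 0
rng = np.random.default_rng(0)
n_paths, T = 100_000, 5
w = rng.standard_normal((n_paths, T))
X = np.cumsum(w, axis=1)

# Averaging across paths, the one-step-ahead change has mean ≈ 0,
# consistent with 𝔼_t[X_{t+1}] = X_t
mean_increment = (X[:, 1:] - X[:, :-1]).mean()
```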
A consumer has preferences over consumption streams that are ordered by the utility func-
tional
𝔼0 [∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐𝑡 )]    (1)
where
• 𝔼𝑡 is the mathematical expectation conditioned on the consumer’s time 𝑡 information
• 𝑐𝑡 is time 𝑡 consumption
𝑐𝑡 + 𝑏𝑡 = 𝑏𝑡+1 /(1 + 𝑟) + 𝑦𝑡 ,    𝑡 ≥ 0    (2)
Here
• 𝑦𝑡 is an exogenous endowment process.
• 𝑟 > 0 is a time-invariant risk-free net interest rate.
• 𝑏𝑡 is one-period risk-free debt maturing at 𝑡.
The consumer also faces initial conditions 𝑏0 and 𝑦0 , which can be fixed or random.
41.3.3 Assumptions
For the remainder of this lecture, we follow Friedman and Hall in assuming that (1+𝑟)−1 = 𝛽.
Regarding the endowment process, we assume it has the state-space representation
where
• {𝑤𝑡 } is an IID vector process with 𝔼𝑤𝑡 = 0 and 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼.
• The spectral radius of 𝐴 satisfies 𝜌(𝐴) < √1/𝛽.
• 𝑈 is a selection vector that pins down 𝑦𝑡 as a particular linear combination of compo-
nents of 𝑧𝑡 .
The restriction on 𝜌(𝐴) prevents income from growing so fast that discounted geometric sums
of some quadratic forms to be described below become infinite.
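The restriction 𝜌(𝐴) < √(1/𝛽) is easy to check numerically; here is a sketch for a hypothetical 𝐴 carrying a constant and an AR(1) income component (the numbers are assumptions, not the lecture's calibration):

```python
import numpy as np

β = 0.95                           # illustrative discount factor
A = np.array([[1.0, 0.0],          # first state: the constant
              [0.5, 0.9]])         # second state: AR(1) with persistence 0.9
spectral_radius = max(abs(np.linalg.eigvals(A)))
condition_holds = spectral_radius < np.sqrt(1 / β)
```

Here 𝜌(𝐴) = 1 (because of the constant), which is still below √(1/0.95) ≈ 1.026, so discounted geometric sums of quadratic forms in the state remain finite.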
Regarding preferences, we assume the quadratic utility function
Note
Along with this quadratic utility specification, we allow consumption to be nega-
tive. However, by choosing parameters appropriately, we can make the probability
that the model generates negative consumption paths over finite time horizons as
low as desired.
𝔼0 [∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑏𝑡²] < ∞    (4)
This condition rules out an always-borrow scheme that would allow the consumer to enjoy
bliss consumption forever.
𝔼𝑡 [𝑐𝑡+1 ] = 𝑐𝑡 (6)
(In fact, quadratic preferences are necessary for this conclusion)
One way to interpret (6) is that consumption will change only when “new information” about
permanent income is revealed.
These ideas will be clarified below.
Note
One way to solve the consumer’s problem is to apply dynamic programming as
in this lecture. We do this later. But first we use an alternative approach that is
revealing and shows the work that dynamic programming does for us behind the
scenes.
To accomplish this, observe first that (4) implies lim𝑡→∞ 𝛽^{𝑡/2} 𝑏𝑡+1 = 0.
Using this restriction on the debt path and solving (2) forward yields
𝑏𝑡 = ∑_{𝑗=0}^{∞} 𝛽^𝑗 (𝑦𝑡+𝑗 − 𝑐𝑡+𝑗 )    (7)
Take conditional expectations on both sides of (7) and use the martingale property of con-
sumption and the law of iterated expectations to deduce
𝑏𝑡 = ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝔼𝑡 [𝑦𝑡+𝑗 ] − 𝑐𝑡 /(1 − 𝛽)    (8)
𝑐𝑡 = (1 − 𝛽) [∑_{𝑗=0}^{∞} 𝛽^𝑗 𝔼𝑡 [𝑦𝑡+𝑗 ] − 𝑏𝑡 ] = (𝑟/(1 + 𝑟)) [∑_{𝑗=0}^{∞} 𝛽^𝑗 𝔼𝑡 [𝑦𝑡+𝑗 ] − 𝑏𝑡 ]    (9)
∑_{𝑗=0}^{∞} 𝛽^𝑗 𝔼𝑡 [𝑦𝑡+𝑗 ] = 𝔼𝑡 [∑_{𝑗=0}^{∞} 𝛽^𝑗 𝑦𝑡+𝑗 ] = 𝑈 (𝐼 − 𝛽𝐴)−1 𝑧𝑡
𝑐𝑡 = (𝑟/(1 + 𝑟)) [𝑈 (𝐼 − 𝛽𝐴)−1 𝑧𝑡 − 𝑏𝑡 ]    (10)
𝑏𝑡+1 = (1 + 𝑟)(𝑏𝑡 + 𝑐𝑡 − 𝑦𝑡 )
= (1 + 𝑟)𝑏𝑡 + 𝑟[𝑈 (𝐼 − 𝛽𝐴)−1 𝑧𝑡 − 𝑏𝑡 ] − (1 + 𝑟)𝑈 𝑧𝑡
= 𝑏𝑡 + 𝑈 [𝑟(𝐼 − 𝛽𝐴)−1 − (1 + 𝑟)𝐼]𝑧𝑡
= 𝑏𝑡 + 𝑈 (𝐼 − 𝛽𝐴)−1 (𝐴 − 𝐼)𝑧𝑡
To get from the second-to-last to the last expression in this chain of equalities is not trivial.
A key is to use the fact that (1 + 𝑟)𝛽 = 1 and (𝐼 − 𝛽𝐴)−1 = ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝐴^𝑗.
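Both facts behind this step can be verified numerically; the following sketch uses an arbitrary stable matrix, not the model's 𝐴:

```python
import numpy as np
from numpy.linalg import inv, matrix_power

β = 0.95
r = 1 / β - 1                              # so that (1 + r)β = 1
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])                 # illustrative, with ρ(βA) < 1

# (I - βA)^{-1} = Σ_j β^j A^j, truncated here at a large horizon
S = sum(β**j * matrix_power(A, j) for j in range(2000))

# The algebraic step in the chain: r(I - βA)^{-1} - (1 + r)I = (I - βA)^{-1}(A - I)
lhs = r * inv(np.eye(2) - β * A) - (1 + r) * np.eye(2)
rhs = inv(np.eye(2) - β * A) @ (A - np.eye(2))
```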
We’ve now successfully written 𝑐𝑡 and 𝑏𝑡+1 as functions of 𝑏𝑡 and 𝑧𝑡 .
A State-Space Representation
We can summarize our dynamics in the form of a linear state-space system governing con-
sumption, debt and income:
𝑥𝑡 = ⎡ 𝑧𝑡 ⎤ ,    𝐴̃ = ⎡ 𝐴                      0 ⎤ ,    𝐶̃ = ⎡ 𝐶 ⎤
     ⎣ 𝑏𝑡 ⎦         ⎣ 𝑈 (𝐼 − 𝛽𝐴)−1 (𝐴 − 𝐼)   1 ⎦         ⎣ 0 ⎦

and

𝑈̃ = ⎡ 𝑈                     0        ⎤ ,    𝑦𝑡̃ = ⎡ 𝑦𝑡 ⎤
     ⎣ (1 − 𝛽)𝑈 (𝐼 − 𝛽𝐴)−1   −(1 − 𝛽) ⎦           ⎣ 𝑐𝑡 ⎦
𝑥𝑡+1 = 𝐴̃𝑥𝑡 + 𝐶̃𝑤𝑡+1
𝑦𝑡̃ = 𝑈̃ 𝑥𝑡    (12)
We can use the following formulas from linear state space models to compute population
mean 𝜇𝑡 = 𝔼𝑥𝑡 and covariance Σ𝑡 ∶= 𝔼[(𝑥𝑡 − 𝜇𝑡 )(𝑥𝑡 − 𝜇𝑡 )′ ]
𝜇𝑡+1 = 𝐴̃𝜇𝑡    with 𝜇0 given    (13)

Σ𝑡+1 = 𝐴̃Σ𝑡 𝐴̃′ + 𝐶̃𝐶̃′    with Σ0 given    (14)

𝜇𝑦,𝑡 = 𝑈̃ 𝜇𝑡
Σ𝑦,𝑡 = 𝑈̃ Σ𝑡 𝑈̃ ′    (15)
To gain some preliminary intuition on the implications of (11), let’s look at a highly stylized
example where income is just IID.
(Later examples will investigate more realistic income streams)
In particular, let {𝑤𝑡 }_{𝑡=1}^{∞} be IID and scalar standard normal, and let

𝑧𝑡 = ⎡ 𝑧1𝑡 ⎤ ,    𝐴 = ⎡ 0  0 ⎤ ,    𝑈 = [1  𝜇] ,    𝐶 = ⎡ 𝜎 ⎤
     ⎣  1  ⎦         ⎣ 0  1 ⎦                           ⎣ 0 ⎦
𝑏𝑡 = −𝜎 ∑_{𝑗=1}^{𝑡−1} 𝑤𝑗
𝑐𝑡 = 𝜇 + (1 − 𝛽)𝜎 ∑_{𝑗=1}^{𝑡} 𝑤𝑗
Thus income is IID and debt and consumption are both Gaussian random walks.
Defining assets as −𝑏𝑡 , we see that assets are just the cumulative sum of unanticipated in-
comes prior to the present date.
The next figure shows a typical realization with 𝑟 = 0.05, 𝜇 = 1, and 𝜎 = 0.15
In [3]: r = 0.05
β = 1 / (1 + r)
σ = 0.15
μ = 1
T = 60
@njit
def time_path(T):
w = np.random.randn(T+1) # w_0, w_1, ..., w_T
w[0] = 0
b = np.zeros(T+1)
for t in range(1, T+1):
b[t] = w[1:t].sum()
b = -σ * b
c = μ + (1 - β) * (σ * w - b)
return w, b, c
w, b, c = time_path(T)
plt.show()
b_sum = np.zeros(T+1)
for i in range(250):
w, b, c = time_path(T) # Generate new time path
rcolor = random.choice(('c', 'g', 'b', 'k'))
ax.plot(c, color=rcolor, lw=0.8, alpha=0.7)
ax.grid()
ax.set(xlabel='Time', ylabel='Consumption')
plt.show()
In this section, we shed more light on the evolution of savings, debt and consumption by rep-
resenting their dynamics in several different ways.
Hall [47] suggested an insightful way to summarize the implications of LQ permanent income
theory.
First, to represent the solution for 𝑏𝑡 , shift (9) forward one period and eliminate 𝑏𝑡+1 by using
(2) to obtain
𝑐𝑡+1 = (1 − 𝛽) ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝔼𝑡+1 [𝑦𝑡+𝑗+1 ] − (1 − 𝛽) [𝛽−1 (𝑐𝑡 + 𝑏𝑡 − 𝑦𝑡 )]
If we add and subtract 𝛽−1 (1 − 𝛽) ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝔼𝑡 𝑦𝑡+𝑗 from the right side of the preceding equation and rearrange, we obtain
𝑐𝑡+1 − 𝑐𝑡 = (1 − 𝛽) ∑_{𝑗=0}^{∞} 𝛽^𝑗 {𝔼𝑡+1 [𝑦𝑡+𝑗+1 ] − 𝔼𝑡 [𝑦𝑡+𝑗+1 ]}    (16)
The right side is the time 𝑡 + 1 innovation to the expected present value of the endowment
process {𝑦𝑡 }.
We can represent the optimal decision rule for (𝑐𝑡 , 𝑏𝑡+1 ) in the form of (16) and (8), which we
repeat:
𝑏𝑡 = ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝔼𝑡 [𝑦𝑡+𝑗 ] − 𝑐𝑡 /(1 − 𝛽)    (17)
Equation (17) asserts that the consumer’s debt due at 𝑡 equals the expected present value of
its endowment minus the expected present value of its consumption stream.
A high debt thus indicates a large expected present value of surpluses 𝑦𝑡 − 𝑐𝑡 .
Recalling again our discussion on forecasting geometric sums, we have
𝔼𝑡 ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝑦𝑡+𝑗 = 𝑈 (𝐼 − 𝛽𝐴)−1 𝑧𝑡

𝔼𝑡+1 ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝑦𝑡+𝑗+1 = 𝑈 (𝐼 − 𝛽𝐴)−1 𝑧𝑡+1

𝔼𝑡 ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝑦𝑡+𝑗+1 = 𝑈 (𝐼 − 𝛽𝐴)−1 𝐴𝑧𝑡
Using these formulas together with (3) and substituting into (16) and (17) gives the following
representation for the consumer’s optimum decision rule:
41.4.2 Cointegration
Representation (18) reveals that the joint process {𝑐𝑡 , 𝑏𝑡 } possesses the property that Engle
and Granger [33] called cointegration.
Cointegration is a tool that allows us to apply powerful results from the theory of stationary
stochastic processes to (certain transformations of) nonstationary models.
To apply cointegration in the present context, suppose that 𝑧𝑡 is asymptotically stationary.
Despite this, both 𝑐𝑡 and 𝑏𝑡 will be non-stationary because they have unit roots (see (11) for
𝑏𝑡 ).
Nevertheless, there is a linear combination of 𝑐𝑡 , 𝑏𝑡 that is asymptotically stationary.
In particular, from the second equality in (18) we have
(1 − 𝛽)𝑏𝑡 + 𝑐𝑡 = (1 − 𝛽)𝔼𝑡 ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝑦𝑡+𝑗    (20)
Equation (20) asserts that the cointegrating residual on the left side equals the conditional expectation of the geometric sum of future incomes on the right.
Consider again (18), this time in light of our discussion of distribution dynamics in the lec-
ture on linear systems.
The dynamics of 𝑐𝑡 are given by
or
𝑐𝑡 = 𝑐0 + ∑_{𝑗=1}^{𝑡} 𝑤̂ 𝑗    for    𝑤̂ 𝑡+1 ∶= (1 − 𝛽)𝑈 (𝐼 − 𝛽𝐴)−1 𝐶𝑤𝑡+1
The unit root affecting 𝑐𝑡 causes the time 𝑡 variance of 𝑐𝑡 to grow linearly with 𝑡.
In particular, since {𝑤̂ 𝑡 } is IID, we have
where
Impulse response functions measure responses to various impulses (i.e., temporary shocks).
The impulse response function of {𝑐𝑡 } to the innovation {𝑤𝑡 } is a box.
In particular, the response of 𝑐𝑡+𝑗 to a unit increase in the innovation 𝑤𝑡+1 is (1 − 𝛽)𝑈 (𝐼 −
𝛽𝐴)−1 𝐶 for all 𝑗 ≥ 1.
It’s useful to express the innovation to the expected present value of the endowment process
in terms of a moving average representation for income 𝑦𝑡 .
The endowment process defined by (3) has the moving average representation
where
• 𝑑(𝐿) = ∑_{𝑗=0}^{∞} 𝑑𝑗 𝐿^𝑗 for some sequence 𝑑𝑗 , where 𝐿 is the lag operator
• at time 𝑡, the consumer has an information set 𝑤^𝑡 = [𝑤𝑡 , 𝑤𝑡−1 , …]
Notice that
It follows that
The object 𝑑(𝛽) is the present value of the moving average coefficients in the represen-
tation for the endowment process 𝑦𝑡 .
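For instance, if the endowment followed the AR(1) process 𝑦𝑡 = 𝜌𝑦𝑡−1 + 𝑤𝑡 (a hypothetical special case, not the lecture's specification), the moving average coefficients would be 𝑑𝑗 = 𝜌^𝑗 and 𝑑(𝛽) = 1/(1 − 𝛽𝜌):

```python
# d(β) = Σ_j β^j d_j with d_j = ρ^j for an AR(1); compare a truncated sum
# with the closed form (β and ρ are illustrative values)
β, ρ = 0.95, 0.6
d_beta = sum((β * ρ)**j for j in range(1000))
closed_form = 1 / (1 - β * ρ)
```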
⎡ 𝑧1𝑡+1 ⎤ = ⎡ 1  0 ⎤ ⎡ 𝑧1𝑡 ⎤ + ⎡ 𝜎1  0  ⎤ ⎡ 𝑤1𝑡+1 ⎤
⎣ 𝑧2𝑡+1 ⎦   ⎣ 0  0 ⎦ ⎣ 𝑧2𝑡 ⎦   ⎣ 0   𝜎2 ⎦ ⎣ 𝑤2𝑡+1 ⎦
Here
• 𝑤𝑡+1 is an IID 2 × 1 process distributed as 𝑁 (0, 𝐼).
• 𝑧1𝑡 is a permanent component of 𝑦𝑡 .
• 𝑧2𝑡 is a purely transitory component of 𝑦𝑡 .
41.5.1 Example 1
Formula (26) shows how an increment 𝜎1 𝑤1𝑡+1 to the permanent component of income 𝑧1𝑡+1
leads to
• a permanent one-for-one increase in consumption and
• no increase in savings −𝑏𝑡+1
But the purely transitory component of income 𝜎2 𝑤2𝑡+1 leads to a permanent increment in
consumption by a fraction 1 − 𝛽 of transitory income.
The remaining fraction 𝛽 is saved, leading to a permanent increment in −𝑏𝑡+1 .
Application of the formula for debt in (11) to this example shows that
This confirms that none of 𝜎1 𝑤1𝑡 is saved, while all of 𝜎2 𝑤2𝑡 is saved.
The next figure illustrates these very different reactions to transitory and permanent income
shocks using impulse-response functions
In [5]: r = 0.05
β = 1 / (1 + r)
S = 5 # Impulse date
σ1 = σ2 = 0.15
@njit
def time_path(T, permanent=False):
"Time path of consumption and debt given shock sequence"
w1 = np.zeros(T+1)
w2 = np.zeros(T+1)
b = np.zeros(T+1)
c = np.zeros(T+1)
if permanent:
w1[S+1] = 1.0
else:
w2[S+1] = 1.0
for t in range(1, T):
b[t+1] = b[t] - σ2 * w2[t+1]
c[t+1] = c[t] + σ1 * w1[t+1] + (1 - β) * σ2 * w2[t+1]
return b, c
L = 0.175
axes[0].legend(loc='lower right')
plt.tight_layout()
plt.show()
41.5.2 Example 2
Assume now that at time 𝑡 the consumer observes 𝑦𝑡 , and its history up to 𝑡, but not 𝑧𝑡 .
Under this assumption, it is appropriate to use an innovation representation to form 𝐴, 𝐶, 𝑈
in (18).
The discussion in sections 2.9.1 and 2.11.3 of [72] shows that the pertinent state space repre-
sentation for 𝑦𝑡 is
⎡ 𝑦𝑡+1 ⎤ = ⎡ 1  −(1 − 𝐾) ⎤ ⎡ 𝑦𝑡 ⎤ + ⎡ 1 ⎤ 𝑎𝑡+1
⎣ 𝑎𝑡+1 ⎦   ⎣ 0      0    ⎦ ⎣ 𝑎𝑡 ⎦   ⎣ 1 ⎦

𝑦𝑡 = [1  0] ⎡ 𝑦𝑡 ⎤
            ⎣ 𝑎𝑡 ⎦
where
where the endowment process can now be represented in terms of the univariate innovation to
𝑦𝑡 as
This indicates how the fraction 𝐾 of the innovation to 𝑦𝑡 that is regarded as permanent influ-
ences the fraction of the innovation that is saved.
The model described above significantly changed how economists think about consumption.
While Hall’s model does a remarkably good job as a first approximation to consumption data,
it’s widely believed that it doesn’t capture important aspects of some consumption/savings
data.
For example, liquidity constraints and precautionary savings appear to be present sometimes.
Further discussion can be found in, e.g., [48], [86], [26], [20].
𝑐0 = 𝑏1 /(1 + 𝑟) − 𝑏0 + 𝑦0    and    𝑐1 = 𝑦1 − 𝑏1

max_{𝑏1} {𝑢(𝑏1 /𝑅 − 𝑏0 + 𝑦0 ) + 𝛽 𝔼0 [𝑢(𝑦1 − 𝑏1 )]}
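To see the logic of the first-order condition, here is a sketch with quadratic utility 𝑢(𝑐) = −(𝑐 − 𝛾)² and 𝛽𝑅 = 1, under which the condition (1/𝑅)𝑢′(𝑐0) = 𝛽𝔼0[𝑢′(𝑐1)] reduces to 𝔼0[𝑐1] = 𝑐0; all numeric values below are assumptions for illustration:

```python
R = 1.05
β = 1 / R                 # so that βR = 1
b0, y0 = 0.0, 2.0
Ey1 = 2.5                 # assumed expected period-1 income

# With quadratic u and βR = 1 the FOC is E0[c1] = c0.
# Using c0 = b1/R - b0 + y0 and E0[c1] = Ey1 - b1 and solving for b1:
b1 = (Ey1 + b0 - y0) * R / (1 + R)
c0 = b1 / R - b0 + y0
Ec1 = Ey1 - b1
```

The consumer borrows just enough that expected consumption is flat across the two periods — the two-period version of the martingale result (6).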
42.1 Contents
• Overview 42.2
• Setup 42.3
• The LQ Approach 42.4
• Implementation 42.5
• Two Example Economies 42.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
42.2 Overview
This lecture continues our analysis of the linear-quadratic (LQ) permanent income model of
savings and consumption.
As we saw in our previous lecture on this topic, Robert Hall [47] used the LQ permanent in-
come model to restrict and interpret intertemporal comovements of nondurable consumption,
nonfinancial income, and financial wealth.
For example, we saw how the model asserts that for any covariance stationary process for
nonfinancial income
• consumption is a random walk
• financial wealth has a unit root and is cointegrated with consumption
Other applications use the same LQ framework.
For example, a model isomorphic to the LQ permanent income model has been used by
Robert Barro [8] to interpret intertemporal comovements of a government’s tax collections,
its expenditures net of debt service, and its public debt.
This isomorphism means that in analyzing the LQ permanent income model, we are in effect
also analyzing the Barro tax smoothing model.
It is just a matter of appropriately relabeling the variables in Hall’s model.
680 CHAPTER 42. PERMANENT INCOME II: LQ TECHNIQUES
42.3 Setup
Let’s recall the basic features of the model discussed in the permanent income model.
Consumer preferences are ordered by
𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐𝑡 )    (1)
𝑐𝑡 + 𝑏𝑡 = 𝑏𝑡+1 /(1 + 𝑟) + 𝑦𝑡 ,    𝑡 ≥ 0    (2)
𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑏𝑡² < ∞    (3)
The interpretation of all variables and parameters are the same as in the previous lecture.
We continue to assume that (1 + 𝑟)𝛽 = 1.
The dynamics of {𝑦𝑡 } again follow the linear state space model
The restrictions on the shock process and parameters are the same as in our previous lecture.
For the purposes of this lecture, let’s assume {𝑦𝑡 } is a second-order univariate autoregressive
process:
We can map this into the linear state space framework in (4), as discussed in our lecture on
linear models.
To do so we take
     ⎡  1   ⎤        ⎡ 1   0   0  ⎤        ⎡ 0 ⎤
𝑧𝑡 = ⎢  𝑦𝑡  ⎥ ,  𝐴 = ⎢ 𝛼   𝜌1  𝜌2 ⎥ ,  𝐶 = ⎢ 𝜎 ⎥ ,  and  𝑈 = [0  1  0]
     ⎣ 𝑦𝑡−1 ⎦        ⎣ 0   1   0  ⎦        ⎣ 0 ⎦
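As a quick sanity check on this mapping, the state-space recursion should reproduce the AR(2) law 𝑦𝑡+1 = 𝛼 + 𝜌1 𝑦𝑡 + 𝜌2 𝑦𝑡−1 + 𝜎𝑤𝑡+1; a sketch with illustrative parameter values:

```python
import numpy as np

α, ρ1, ρ2, σ = 10.0, 0.9, 0.0, 1.0          # illustrative values
A = np.array([[1, 0, 0],
              [α, ρ1, ρ2],
              [0, 1, 0]])
C = np.array([0.0, σ, 0.0])
U = np.array([0.0, 1.0, 0.0])

z = np.array([1.0, 2.0, 1.5])               # (1, y_t, y_{t-1})
w = 0.3
z_next = A @ z + C * w                      # (1, y_{t+1}, y_t)
y_next = U @ z_next
```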
Previously we solved the permanent income model by solving a system of linear expectational
difference equations subject to two boundary conditions.
Here we solve the same model using LQ methods based on dynamic programming.
After confirming that answers produced by the two methods agree, we apply QuantEcon’s
LinearStateSpace class to illustrate features of the model.
Why solve a model in two distinct ways?
Because by doing so we gather insights about the structure of the model.
Our earlier approach based on solving a system of expectational difference equations brought
to the fore the role of the consumer’s expectations about future nonfinancial income.
On the other hand, formulating the model in terms of an LQ dynamic programming problem
reminds us that
• finding the state (of a dynamic programming problem) is an art, and
• iterations on a Bellman equation implicitly jointly solve both a forecasting problem and
a control problem
Recall from our lecture on LQ theory that the optimal linear regulator problem is to choose a
decision rule for 𝑢𝑡 to minimize
𝔼 ∑_{𝑡=0}^{∞} 𝛽^𝑡 {𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 },

subject to

𝑥𝑡+1 = 𝐴̃𝑥𝑡 + 𝐵̃𝑢𝑡 + 𝐶̃𝑤𝑡+1 ,    𝑡 ≥ 0,    (5)
where 𝑤𝑡+1 is IID with mean vector zero and 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼.
The tildes in 𝐴,̃ 𝐵,̃ 𝐶 ̃ are to avoid clashing with notation in (4).
The value function for this problem is 𝑣(𝑥) = −𝑥′ 𝑃 𝑥 − 𝑑, where
• 𝑃 is the unique positive semidefinite solution of the corresponding matrix Riccati equation.
• The scalar 𝑑 is given by 𝑑 = 𝛽(1 − 𝛽)−1 trace(𝑃 𝐶̃𝐶̃′ ).
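For intuition, 𝑃 can be computed by iterating to convergence on the discounted Riccati map; a minimal sketch with illustrative matrices (not the model's 𝐴̃, 𝐵̃, 𝑅, 𝑄):

```python
import numpy as np

β = 0.95
A = np.array([[0.9, 0.0],
              [0.1, 0.8]])
B = np.array([[1.0],
              [0.0]])
R = np.eye(2)
Q = np.array([[1.0]])

P = np.zeros((2, 2))
for _ in range(1000):
    # Riccati map for minimizing E Σ β^t (x'Rx + u'Qu) with x' = Ax + Bu + Cw
    P = R + β * A.T @ P @ A \
        - β**2 * A.T @ P @ B @ np.linalg.inv(Q + β * B.T @ P @ B) @ B.T @ P @ A
```

In practice one uses QuantEcon's `LQ.stationary_values`, which does this (more robustly) for us.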
               ⎡  1   ⎤
𝑥𝑡 ∶= ⎡ 𝑧𝑡 ⎤ = ⎢  𝑦𝑡  ⎥
      ⎣ 𝑏𝑡 ⎦   ⎢ 𝑦𝑡−1 ⎥
               ⎣  𝑏𝑡  ⎦
𝐴̃ ∶= ⎡ 𝐴                  0    ⎤ ,    𝐵̃ ∶= ⎡   0   ⎤ ,    and    𝐶̃ ∶= ⎡ 𝐶 ⎤
     ⎣ (1 + 𝑟)(𝑈𝛾 − 𝑈)   1 + 𝑟 ⎦           ⎣ 1 + 𝑟 ⎦                  ⎣ 0 ⎦
Please confirm for yourself that, with these definitions, the LQ dynamics (5) match the dy-
namics of 𝑧𝑡 and 𝑏𝑡 described above.
To map utility into the quadratic form 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 we can set
• 𝑄 ∶= 1 (remember that we are minimizing) and
• 𝑅 ∶= a 4 × 4 matrix of zeros
However, there is one problem remaining.
We have no direct way to capture the non-recursive restriction (3) on the debt sequence {𝑏𝑡 }
from within the LQ framework.
To try to enforce it, we’re going to use a trick: put a small penalty on 𝑏𝑡2 in the criterion func-
tion.
In the present setting, this means adding a small entry 𝜖 > 0 in the (4, 4) position of 𝑅.
That will induce a (hopefully) small approximation error in the decision rule.
We’ll check whether it really is small numerically soon.
42.5 Implementation
R = 1 / β
A = np.array([[1., 0., 0.],
[α, ρ1, ρ2],
[0., 1., 0.]])
C = np.array([[0.], [σ], [0.]])
G = np.array([[0., 1., 0.]])
QLQ = np.array([1.0])
BLQ = np.array([0., 0., 0., R]).reshape(4,1)
CLQ = np.array([0., σ, 0., 0.]).reshape(4,1)
β_LQ = β
A =
[[ 1. 0. 0. 0. ]
[10. 0.9 0. 0. ]
[ 0. 1. 0. 0. ]
[ 0. -1.05263158 0. 1.05263158]]
B =
[[0. ]
[0. ]
[0. ]
[1.05263158]]
R =
[[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 1.e-09]]
Q =
[1.]
We’ll save the implied optimal policy function and soon compare it with what we get by employing an alternative solution method.
In our first lecture on the infinite horizon permanent income problem we used a different solu-
tion method.
The method was based around
• deducing the Euler equations that are the first-order conditions with respect to con-
sumption and savings.
• using the budget constraints and boundary condition to complete a system of expecta-
tional linear difference equations.
• solving those equations to obtain the solution.
Expressed in state space notation, the solution took the form
In [8]: # Use the above formulas to create the optimal policies for b_{t+1} and c_t
b_pol = G @ la.inv(np.eye(3, 3) - β * A) @ (A - np.eye(3, 3))
c_pol = (1 - β) * G @ la.inv(np.eye(3, 3) - β * A)
# Use the following values to start everyone off at b=0, initial incomes zero
μ_0 = np.array([1., 0., 0., 0.])
Σ_0 = np.zeros((4, 4))
A_LSS calculated as we have here should equal ABF calculated above using the LQ model
[[65.51724138 0.34482759 0. ]]
[[ 6.55172323e+01 3.44827677e-01 0.00000000e+00 -5.00000190e-02]]
We have verified that the two methods give the same solution.
Now let’s create instances of the LinearStateSpace class and use it to do some interesting ex-
periments.
To do this, we’ll use the outcomes from our second method.
• In the second example, while all begin with zero debt, we draw their initial income lev-
els from the invariant distribution of financial income.
– Consumers are ex-ante heterogeneous.
In the first example, consumers’ nonfinancial income paths display pronounced transients
early in the sample
• these will affect outcomes in striking ways
Those transient effects will not be present in the second example.
We use methods affiliated with the LinearStateSpace class to simulate the model.
We generate 25 paths of the exogenous non-financial income process and the associated opti-
mal consumption and debt paths.
In the first set of graphs, darker lines depict a particular sample path, while the lighter lines
describe 24 other paths.
A second graph plots a collection of simulations against the population distribution that we
extract from the LinearStateSpace instance LSS.
Comparing sample paths with population distributions at each date 𝑡 is a useful exercise—see
our discussion of the laws of large numbers
# Simulation/Moment Parameters
moment_generator = lss.moment_sequence()
for i in range(npaths):
sims = lss.simulate(T)
bsim[i, :] = sims[0][1, :]
csim[i, :] = sims[1][1, :]
ysim[i, :] = sims[1][0, :]
# Get T
T = bsim.shape[1]
# Plot debt
ax[1].plot(bsim[0, :], label="b", color="r")
ax[1].plot(bsim.T, alpha=.1, color="r")
ax[1].legend(loc=4)
ax[1].set(xlabel="t", ylabel="debt")
fig.tight_layout()
return fig
# Consumption fan
ax[0].plot(xvals, cons_mean, color="k")
ax[0].plot(csim.T, color="k", alpha=.25)
ax[0].fill_between(xvals, c_perc_95m, c_perc_95p, alpha=.25, color="b")
ax[0].fill_between(xvals, c_perc_90m, c_perc_90p, alpha=.25, color="r")
ax[0].set(title="Consumption/Debt over time",
ylim=(cmean - 15, cmean + 15), ylabel="consumption")
# Debt fan
ax[1].plot(xvals, debt_mean, color="k")
ax[1].plot(bsim.T, color="k", alpha=.25)
ax[1].fill_between(xvals, d_perc_95m, d_perc_95p, alpha=.25, color="b")
ax[1].fill_between(xvals, d_perc_90m, d_perc_90p, alpha=.25, color="r")
ax[1].set(xlabel="t", ylabel="debt")
fig.tight_layout()
return fig
Now let’s create figures with initial conditions of zero for 𝑦0 and 𝑏0
plt.show()
plt.show()
(1 − 𝛽)𝑏𝑡 + 𝑐𝑡 = (1 − 𝛽)𝐸𝑡 ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝑦𝑡+𝑗    (6)
So at time 0 we have
𝑐0 = (1 − 𝛽)𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑦𝑡
This tells us that consumption starts at the income that would be paid by an annuity whose
value equals the expected discounted value of nonfinancial income at time 𝑡 = 0.
To support that level of consumption, the consumer borrows a lot early and consequently
builds up substantial debt.
In fact, he or she incurs so much debt that eventually, in the stochastic steady state, he con-
sumes less each period than his nonfinancial income.
He uses the gap between consumption and nonfinancial income mostly to service the interest
payments due on his debt.
Thus, when we look at the panel of debt in the accompanying graph, we see that this is a
group of ex-ante identical people each of whom starts with zero debt.
All of them accumulate debt in anticipation of rising nonfinancial income.
They expect their nonfinancial income to rise toward the invariant distribution of income, a
consequence of our having started them at 𝑦−1 = 𝑦−2 = 0.
Cointegration Residual
The following figure plots realizations of the left side of (6), which, as discussed in our last
lecture, is called the cointegrating residual.
As mentioned above, the right side can be thought of as an annuity payment on the expected present value of future income 𝐸𝑡 ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝑦𝑡+𝑗 .
Early along a realization, 𝑐𝑡 is approximately constant while (1 − 𝛽)𝑏𝑡 and (1 − 𝛽)𝐸𝑡 ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝑦𝑡+𝑗 both rise markedly as the household’s present value of income and borrowing rise pretty much together.
This example illustrates the following point: the definition of cointegration implies that the
cointegrating residual is asymptotically covariance stationary, not covariance stationary.
The cointegrating residual for the specification with zero income and zero debt initially has a
notable transient component that dominates its behavior early in the sample.
By altering initial conditions, we shall remove this transient in our second example to be pre-
sented below
return fig
When we set 𝑦−1 = 𝑦−2 = 0 and 𝑏0 = 0 in the preceding exercise, we make debt “head north”
early in the sample.
Average debt in the cross-section rises and approaches the asymptote.
We can regard these as outcomes of a “small open economy” that borrows from abroad at the
fixed gross interest rate 𝑅 = 𝑟 + 1 in anticipation of rising incomes.
So with the economic primitives set as above, the economy converges to a steady state in
which there is an excess aggregate supply of risk-free loans at a gross interest rate of 𝑅.
This excess supply is filled by “foreign lenders” willing to make those loans.
We can use virtually the same code to rig a “poor man’s Bewley [14] model” in the following
way
• as before, we start everyone at 𝑏0 = 0.
• But instead of starting everyone at 𝑦−1 = 𝑦−2 = 0, we draw [𝑦−1 , 𝑦−2 ]′ from the invariant distribution of the {𝑦𝑡 } process.
This rigs a closed economy in which people are borrowing and lending with each other at a
gross risk-free interest rate of 𝑅 = 𝛽 −1 .
Across the group of people being analyzed, risk-free loans are in zero excess supply.
We have arranged primitives so that 𝑅 = 𝛽 −1 clears the market for risk-free loans at zero
aggregate excess supply.
So the risk-free loans are being made from one person to another within our closed set of agents.
There is no need for foreigners to lend to our group.
Let’s have a look at the corresponding figures
plt.show()
plt.show()
43.1 Contents
• Overview 43.2
• Example 1 43.3
• Inventories Not Useful 43.4
• Inventories Useful but are Hardwired to be Zero Always 43.5
• Example 2 43.6
• Example 3 43.7
• Example 4 43.8
• Example 5 43.9
• Example 6 43.10
• Exercises 43.11
In addition to what’s in Anaconda, this lecture employs the following library:
43.2 Overview
698 CHAPTER 43. PRODUCTION SMOOTHING VIA INVENTORIES
period inventories.
We compute examples designed to indicate how the firm optimally chooses to smooth produc-
tion and manage inventories while keeping inventories close to sales.
To introduce components of the model, let
• 𝑆𝑡 be sales at time 𝑡
• 𝑄𝑡 be production at time 𝑡
• 𝐼𝑡 be inventories at the beginning of time 𝑡
• 𝛽 ∈ (0, 1) be a discount factor
• 𝑐(𝑄𝑡 ) = 𝑐1 𝑄𝑡 + 𝑐2 𝑄𝑡², where 𝑐1 > 0, 𝑐2 > 0, be a cost-of-production function
• 𝑑(𝐼𝑡 , 𝑆𝑡 ) = 𝑑1 𝐼𝑡 + 𝑑2 (𝑆𝑡 − 𝐼𝑡 )², where 𝑑1 > 0, 𝑑2 > 0, be a cost-of-holding-inventories
function, consisting of two components:
– a cost 𝑑1 𝐼𝑡 of carrying inventories, and
– a cost 𝑑2 (𝑆𝑡 − 𝐼𝑡 )² of having inventories deviate from sales
• 𝑝𝑡 = 𝑎0 − 𝑎1 𝑆𝑡 + 𝑣𝑡 be an inverse demand function for a firm’s product, where 𝑎0 >
0, 𝑎1 > 0 and 𝑣𝑡 is a demand shock at time 𝑡
• 𝜋𝑡 = 𝑝𝑡 𝑆𝑡 − 𝑐(𝑄𝑡 ) − 𝑑(𝐼𝑡 , 𝑆𝑡 ) be the firm’s profits at time 𝑡
• ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝜋𝑡 be the present value of the firm’s profits at time 0
• 𝐼𝑡+1 = 𝐼𝑡 + 𝑄𝑡 − 𝑆𝑡 be the law of motion of inventories
• 𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝜖𝑡+1 be the law of motion for an exogenous state vector 𝑧𝑡 that con-
tains time 𝑡 information useful for predicting the demand shock 𝑣𝑡
• 𝑣𝑡 = 𝐺𝑧𝑡 link the demand shock to the information set 𝑧𝑡
• the constant 1 be the first component of 𝑧𝑡
To map our problem into a linear-quadratic discounted dynamic programming problem (also
known as an optimal linear regulator), we define the state vector at time 𝑡 as
𝑥𝑡 = ⎡ 𝐼𝑡 ⎤
     ⎣ 𝑧𝑡 ⎦
𝑢𝑡 = ⎡ 𝑄𝑡 ⎤
     ⎣ 𝑆𝑡 ⎦
⎡ 𝐼𝑡+1 ⎤ = ⎡ 1   0  ⎤ ⎡ 𝐼𝑡 ⎤ + ⎡ 1  −1 ⎤ ⎡ 𝑄𝑡 ⎤ + ⎡ 0  ⎤ 𝜖𝑡+1
⎣ 𝑧𝑡+1 ⎦   ⎣ 0  𝐴22 ⎦ ⎣ 𝑧𝑡 ⎦   ⎣ 0   0 ⎦ ⎣ 𝑆𝑡 ⎦   ⎣ 𝐶2 ⎦
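A sketch of assembling this transition law in code (the 𝑧 process below is illustrative, not the lecture's calibration):

```python
import numpy as np

A22 = np.array([[1, 0],
                [1, 0.9]])        # illustrative z process
C2 = np.array([[0],
               [1.0]])
k = A22.shape[0]
n, m = k + 1, 2                   # state (I_t, z_t); controls (Q_t, S_t)

A = np.zeros((n, n))
A[0, 0] = 1                       # I_{t+1} carries I_t one for one
A[1:, 1:] = A22

B = np.zeros((n, m))
B[0, :] = 1, -1                   # I_{t+1} = I_t + Q_t - S_t

C = np.zeros((n, 1))
C[1:, :] = C2                     # only z is hit by ε_{t+1}
```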
or
(At this point, we ask that you please forgive us for using 𝑄𝑡 to be the firm’s production at
time 𝑡, while below we use 𝑄 as the matrix in the quadratic form 𝑢′𝑡 𝑄𝑢𝑡 that appears in the
firm’s one-period profit function)
We can express the firm’s profit as a function of states and controls as
To form the matrices 𝑅, 𝑄, 𝑁, we note that the firm’s profits at time 𝑡 can be expressed
𝜋𝑡 = − (𝑑1 𝐼𝑡 + 𝑑2 𝐼𝑡² + 𝑐2 𝑄𝑡² + (𝑎1 + 𝑑2 )𝑆𝑡² − 𝑎0 𝑆𝑡 − 𝐺𝑧𝑡 𝑆𝑡 + 𝑐1 𝑄𝑡 − 2𝑑2 𝑆𝑡 𝐼𝑡 )
   = − (𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 + 2𝑢′𝑡 𝑁 𝑥𝑡 )                                        (43.1)

where

𝑅 = ⎡ 𝑑2          (𝑑1 /2)𝑆𝑐 ⎤ ,   𝑄 = ⎡ 𝑐2     0     ⎤ ,   𝑁 = ⎡  0     (𝑐1 /2)𝑆𝑐           ⎤
    ⎣ (𝑑1 /2)𝑆𝑐′      0     ⎦        ⎣ 0    𝑎1 + 𝑑2 ⎦        ⎣ −𝑑2   −(𝑎0 /2)𝑆𝑐 − 𝐺/2 ⎦
𝑢𝑡 = −𝐹 𝑥𝑡
and the evolution of the state under the optimal decision rule is
Here is code for computing an optimal decision rule and for analyzing its consequences.
def __init__(self,
β=0.96, # Discount factor
c1=1, # Cost-of-production
c2=1,
d1=1, # Cost-of-holding inventories
d2=1,
a0=10, # Inverse demand function
a1=1,
A22=[[1, 0], # z process
[1, 0.9]],
C2=[[0], [1]],
G=[0, 1]):
self.β = β
self.c1, self.c2 = c1, c2
self.d1, self.d2 = d1, d2
self.a0, self.a1 = a0, a1
self.A22 = np.atleast_2d(A22)
self.C2 = np.atleast_2d(C2)
self.G = np.atleast_2d(G)
# Dimensions
k, j = self.C2.shape # Dimensions for randomness part
n = k + 1 # Number of states
m = 2 # Number of controls
Sc = np.zeros(k)
Sc[0] = 1
# Construct matrices of transition law
A = np.zeros((n, n))
A[0, 0] = 1
A[1:, 1:] = self.A22
B = np.zeros((n, m))
B[0, :] = 1, -1
C = np.zeros((n, j))
C[1:, :] = self.C2
# Construct matrices of one-period profit function
R = np.zeros((n, n))
R[0, 0] = d2
R[0, 1:] = d1 / 2 * Sc
R[1:, 0] = d1 / 2 * Sc
Q = np.zeros((m, m))
Q[0, 0] = c2
Q[1, 1] = a1 + d2
N = np.zeros((m, n))
N[1, 0] = -d2
N[0, 1:] = c1 / 2 * Sc
N[1, 1:] = -a0 / 2 * Sc - self.G / 2
# Construct LQ instance
self.LQ = qe.LQ(Q, R, A, B, C, N, beta=β)
self.LQ.stationary_values()
Q_path = u_path[0, :]
S_path = u_path[1, :]
plt.show()
Notice that the above code sets parameters at the following default values
43.3 Example 1
𝜈𝑡 = 𝛼 + 𝜌𝜈𝑡−1 + 𝜖𝑡 ,
which implies
𝑧𝑡+1 = ⎡  1   ⎤ = ⎡ 1  0 ⎤ ⎡  1  ⎤ + ⎡ 0 ⎤ 𝜖𝑡+1 .
       ⎣ 𝑣𝑡+1 ⎦   ⎣ 𝛼  𝜌 ⎦ ⎣ 𝑣𝑡 ⎦   ⎣ 1 ⎦
x0 = [0, 1, 0]
ex1.simulate(x0)
Starting from zero inventories, the firm builds up a stock of inventories and uses them to
smooth costly production in the face of demand shocks.
Optimal decisions evidently respond to demand shocks.
Inventories are always less than sales, so some sales come from current production, a consequence of the cost 𝑑1 𝐼𝑡 of holding inventories.
The lower right panel shows differences between optimal production and two alternative pro-
duction concepts that come from altering the firm’s cost structure – i.e., its technology.
These two concepts correspond to these distinct altered firm problems.
• a setting in which inventories are not needed
• a setting in which they are needed but we arbitrarily prevent the firm from holding in-
ventories by forcing it to set 𝐼𝑡 = 0 always
We use these two alternative production concepts in order to shed light on the baseline
model.
∑_{𝑡=0}^{∞} 𝛽^𝑡 {𝑝𝑡 𝑄𝑡 − 𝑐(𝑄𝑡 )}
It turns out that the optimal plan for 𝑄𝑡 for this problem also solves a sequence of static
problems max𝑄𝑡 {𝑝𝑡 𝑄𝑡 − 𝑐(𝑄𝑡 )}.
When inventories aren’t required or used, sales always equal production.
This simplifies the problem and the optimal no-inventory production maximizes the expected
value of
∑_{𝑡=0}^{∞} 𝛽^𝑡 {𝑝𝑡 𝑄𝑡 − 𝑐(𝑄𝑡 )} .
𝑄𝑡^{𝑛𝑖} = (𝑎0 + 𝜈𝑡 − 𝑐1 )/(𝑐2 + 𝑎1 ) .
Next, we turn to a distinct problem in which inventories are useful – meaning that there are
costs of 𝑑2 (𝐼𝑡 − 𝑆𝑡 )2 associated with having sales not equal to inventories – but we arbitrarily
impose on the firm the costly restriction that it never hold inventories.
Here the firm’s maximization problem is
max_{𝐼𝑡 ,𝑄𝑡 ,𝑆𝑡 } ∑_{𝑡=0}^∞ 𝛽^𝑡 {𝑝𝑡 𝑆𝑡 − 𝐶(𝑄𝑡 ) − 𝑑(𝐼𝑡 , 𝑆𝑡 )}

subject to 𝐼𝑡 = 0 for all 𝑡 and 𝐼𝑡+1 = 𝐼𝑡 + 𝑄𝑡 − 𝑆𝑡 . Because 𝐼𝑡 = 0 implies 𝑆𝑡 = 𝑄𝑡 , the problem reduces to

max_{𝑄𝑡 } ∑_{𝑡=0}^∞ 𝛽^𝑡 {𝑝𝑡 𝑄𝑡 − 𝐶(𝑄𝑡 ) − 𝑑(0, 𝑄𝑡 )} ,

which is maximized by

𝑄𝑡^ℎ = (𝑎0 + 𝜈𝑡 − 𝑐1 ) / (𝑐2 + 𝑎1 + 𝑑2 )
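The two formulas can be compared directly. With illustrative parameter values (not the lecture's defaults), production under the hardwired-zero-inventories specification is lower, because 𝑑2 enters the denominator:

```python
# Illustrative parameters (not the lecture's defaults)
a0, a1, c1, c2, d2 = 10.0, 1.0, 1.0, 1.0, 1.0
nu = 0.5   # current demand shock

Q_ni = (a0 + nu - c1) / (c2 + a1)        # inventories not useful
Q_h = (a0 + nu - c1) / (c2 + a1 + d2)    # inventories useful but hardwired to zero
# Q_h < Q_ni whenever d2 > 0
```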
We introduce this "𝐼𝑡 hardwired to zero" specification in order to shed light on the role that inventories play, by comparing outcomes with those under our two other versions of the problem.
The bottom right panel displays a production path for the original problem that we are interested in (the blue line), together with an optimal production path for the model in which inventories are not useful (the green line), and also for the model in which, although inventories are useful, they are hardwired to zero and the firm pays the cost 𝑑(0, 𝑄𝑡 ) associated with its sales 𝑆𝑡 = 𝑄𝑡 deviating from its (zero) inventories (the orange line).
Notice that it is typically optimal for the firm to produce more when inventories aren’t useful.
Here there is no requirement to sell out of inventories and no costs from having sales deviate
from inventories.
But “typical” does not mean “always”.
Thus, if we look closely, we notice that for small 𝑡, the green “production when inventories
aren’t useful” line in the lower right panel is below optimal production in the original model.
High optimal production in the original model early on occurs because the firm wants to ac-
cumulate inventories quickly in order to acquire high inventories for use in later periods.
But how the green line compares to the blue line early on depends on the evolution of the
demand shock, as we will see in a deterministically seasonal demand shock example to be an-
alyzed below.
In that example, the original firm optimally accumulates inventories slowly because the next
positive demand shock is in the distant future.
To make the green-blue model production comparison easier to see, let’s confine the graphs to
the first 10 periods:
43.6 Example 2
Next, we shut down randomness in demand and assume that the demand shock 𝜈𝑡 follows a
deterministic path:
𝜈𝑡 = 𝛼 + 𝜌𝜈𝑡−1
x0 = [0, 1, 0]
ex2.simulate(x0)
43.7 Example 3
Now we’ll put randomness back into the demand shock process and also assume that there
are zero costs of holding inventories.
In particular, we’ll look at a situation in which 𝑑1 = 0 but 𝑑2 > 0.
Now it becomes optimal to set sales approximately equal to inventories and to use inventories
to smooth production quite well, as the following figures confirm
x0 = [0, 1, 0]
ex3.simulate(x0)
43.8 Example 4
To bring out some features of the optimal policy that are related to some technical issues in
linear control theory, we’ll now temporarily assume that it is costless to hold inventories.
When we completely shut down the cost of holding inventories by setting 𝑑1 = 0 and 𝑑2 = 0,
something absurd happens (because the Bellman equation is opportunistic and very smart).
(Technically, we have set parameters that end up violating conditions needed to assure sta-
bility of the optimally controlled state.)
The firm finds it optimal to set 𝑄𝑡 ≡ 𝑄∗ = −𝑐1 /(2𝑐2 ), an output level that sets the costs of production to zero (when 𝑐1 > 0, as it is with our default settings, it is then optimal to set production negative, whatever that means!).
Recall the law of motion for inventories
𝐼𝑡+1 = 𝐼𝑡 + 𝑄𝑡 − 𝑆𝑡
So when 𝑑1 = 𝑑2 = 0, so that the firm finds it optimal to set 𝑄𝑡 = −𝑐1 /(2𝑐2 ) for all 𝑡, then

𝐼𝑡+1 − 𝐼𝑡 = −𝑐1 /(2𝑐2 ) − 𝑆𝑡 < 0
for almost all values of 𝑆𝑡 under our default parameters that keep demand positive almost all
of the time.
The dynamic program instructs the firm to set production costs to zero and to run a Ponzi
scheme by running inventories down forever.
(We can interpret this as the firm somehow going short in or borrowing inventories)
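With illustrative values for 𝑐1 and 𝑐2 and a hypothetical constant sales level, the arithmetic of this downward drift is easy to check:

```python
c1, c2 = 1.0, 0.5
Q_star = -c1 / (2 * c2)   # production choice when d1 = d2 = 0
S = 1.0                   # a hypothetical constant, positive sales level

I, path = 0.0, []
for t in range(5):
    path.append(I)
    I = I + Q_star - S    # law of motion I_{t+1} = I_t + Q_t - S_t
# Inventories fall by c1/(2 c2) + S = 2.0 every period, without bound
```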
The following figures confirm that inventories head south without limit
x0 = [0, 1, 0]
ex4.simulate(x0)
Let’s shorten the time span displayed in order to highlight what is going on.
We’ll set the horizon 𝑇 = 30 with the following code
43.9 Example 5
Now we’ll assume that the demand shock that follows a linear time trend
0
To represent this, we set 𝐶2 = [ ] and
0
1 0 1
𝐴22 = [ ] , 𝑥0 = [ ] , 𝐺 = [ 𝑏 𝑎 ]
1 1 0
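We can verify that this representation generates the trend 𝜈𝑡 = 𝑏 + 𝑎𝑡 (the values of 𝑎 and 𝑏 below are hypothetical):

```python
import numpy as np

a, b = 0.5, 3.0            # hypothetical trend slope and intercept
A22 = np.array([[1.0, 0.0],
                [1.0, 1.0]])
G = np.array([b, a])
z = np.array([1.0, 0.0])   # x0 = (1, 0)'

nu = []
for t in range(4):
    nu.append(G @ z)       # ν_t = G z_t
    z = A22 @ z            # deterministic transition (C2 = 0)
# The second state component counts t, so ν_t = b + a t
```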
In [11]: ex5 = SmoothingExample(A22=[[1, 0], [1, 1]], C2=[[0], [0]], G=[b, a])
43.10 Example 6
Now we'll assume a deterministically seasonal demand shock. To represent it, we set

𝐴22 =
[ 1 0 0 0 0 ]
[ 0 0 0 0 1 ]
[ 0 1 0 0 0 ]
[ 0 0 1 0 0 ]
[ 0 0 0 1 0 ]

𝐶2 = (0, 0, 0, 0, 0)′ , 𝐺 = [ 𝑏 𝑎 0 0 0 ] , 𝑥0 = (1, 0, 1, 0, 0)′
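To see that these matrices generate a deterministic seasonal, we can iterate the (noise-free) state and read off 𝜈𝑡 = 𝐺𝑧𝑡; with hypothetical values of 𝑎 and 𝑏, the shock repeats with period 4:

```python
import numpy as np

a, b = 2.0, 1.0   # hypothetical seasonal amplitude and base level
A22 = np.array([[1, 0, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 1, 0, 0, 0],
                [0, 0, 1, 0, 0],
                [0, 0, 0, 1, 0]], dtype=float)
G = np.array([b, a, 0, 0, 0])
z = np.array([1, 0, 1, 0, 0], dtype=float)   # the x0 above

nu = []
for t in range(8):
    nu.append(G @ z)   # ν_t = b + a × (seasonal indicator)
    z = A22 @ z        # rotates the four seasonal indicators
```

The last four entries of the state are a one-hot season marker that the shift structure of 𝐴22 rotates each period, so 𝜈𝑡 spikes once every four periods.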
Now we’ll generate some more examples that differ simply from the initial season of the year
in which we begin the demand shock
43.11 Exercises
Please try to analyze some inventory sales smoothing problems using the SmoothingExample
class.
43.11.1 Exercise 1
Assume that the demand shock follows an AR(2) process

𝜈𝑡 = 𝛼 + 𝜌1 𝜈𝑡−1 + 𝜌2 𝜈𝑡−2 + 𝜖𝑡

where 𝛼 = 1, 𝜌1 = 1.2, and 𝜌2 = −0.3. You need to construct the 𝐴22, 𝐶, and 𝐺 matrices properly and then input them as keyword arguments of the SmoothingExample class. Simulate paths starting from the initial condition 𝑥0 = (0, 1, 0, 0)′.
After this, try to construct a very similar SmoothingExample with the same demand shock process but without the randomness 𝜖𝑡 . Compute the stationary state 𝑥̄ by simulating for a long period. Then add shocks of different magnitudes to 𝜈 ̄ and simulate the resulting paths. You should see how the firm responds differently by examining the production plans.
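One possible companion-form setup is sketched below (the 𝐺 here assumes the demand shock enters as 𝜈𝑡 = 𝐺𝑧𝑡 , as in the lecture's default examples):

```python
import numpy as np

alpha, rho1, rho2 = 1.0, 1.2, -0.3
# State z_t = (1, ν_t, ν_{t-1})' in companion form
A22 = np.array([[1.0,   0.0,  0.0],
                [alpha, rho1, rho2],
                [0.0,   1.0,  0.0]])
C2 = np.array([0.0, 1.0, 0.0])
G = np.array([0.0, 1.0, 0.0])   # picks ν_t out of z_t

def step(z, eps):
    return A22 @ z + C2 * eps

z = np.array([1.0, 0.5, 0.2])   # ν_0 = 0.5, ν_{-1} = 0.2
z1 = step(z, 0.1)
# z1[1] should equal α + ρ1 ν_0 + ρ2 ν_{-1} + ε_1,
# and z1[2] should equal the lagged shock ν_0
```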
43.11.2 Exercise 2

Change the parameters of 𝐶(𝑄𝑡 ) and 𝑑(𝐼𝑡 , 𝑆𝑡 ). First, make production more costly by setting 𝑐2 = 5. Then increase the cost of having sales not equal inventories by setting 𝑑2 = 5.
43.11.3 Solution 1
# initial condition
x0 = [0, 1, 0, 0]
In the following, we add small and large shocks to 𝜈 ̄ and compare how the firm responds in terms of quantities. As the shock is not very persistent under the parameterization we are using, we focus on the short-run response.
In [20]: T = 40
43.11.4 Solution 2
In [23]: x0 = [0, 1, 0]
In [24]: SmoothingExample(c2=5).simulate(x0)
In [25]: SmoothingExample(d2=5).simulate(x0)
Part VII
Chapter 44

Schelling's Segregation Model
44.1 Contents
• Outline 44.2
• The Model 44.3
• Results 44.4
• Exercises 44.5
• Solutions 44.6
44.2 Outline
In 1969, Thomas C. Schelling developed a simple but striking model of racial segregation [98].
His model studies the dynamics of racially mixed neighborhoods.
Like much of Schelling’s work, the model shows how local interactions can lead to surprising
aggregate structure.
In particular, it shows that a relatively mild preference for neighbors of similar race can lead, in aggregate, to the collapse of mixed neighborhoods and high levels of segregation.
In recognition of this and other research, Schelling was awarded the 2005 Nobel Prize in Eco-
nomic Sciences (joint with Robert Aumann).
In this lecture, we (in fact you) will build and run a version of Schelling’s model.
Let’s start with some imports:
We will cover a variation of Schelling’s model that is easy to program and captures the main
idea.
720 CHAPTER 44. SCHELLING’S SEGREGATION MODEL
44.3.1 Set-Up
Suppose we have two types of people: orange people and green people.
For the purpose of this lecture, we will assume there are 250 of each type.
These agents all live on a single unit square.
The location of an agent is just a point (𝑥, 𝑦), where 0 < 𝑥, 𝑦 < 1.
44.3.2 Preferences
We will say that an agent is happy if half or more of her 10 nearest neighbors are of the same
type.
Here ‘nearest’ is in terms of Euclidean distance.
An agent who is not happy is called unhappy.
An important point here is that agents are not averse to living in mixed areas.
They are perfectly happy if half their neighbors are of the other color.
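The happiness test is easy to express in code. Here is a minimal sketch with a handful of hand-placed agents and only two neighbors considered (the lecture's model uses 10 neighbors and a threshold of 5):

```python
import numpy as np

def is_happy(own_loc, own_type, locations, types,
             num_neighbors=2, require_same=1):
    """True if at least `require_same` of the `num_neighbors` nearest
    agents (by Euclidean distance) share the agent's type."""
    dists = np.hypot(locations[:, 0] - own_loc[0],
                     locations[:, 1] - own_loc[1])
    nearest = np.argsort(dists)[:num_neighbors]
    return int(np.sum(types[nearest] == own_type)) >= require_same

# Two orange agents (type 0) near the origin, one green (type 1) far away
locations = np.array([[0.1, 0.1], [0.2, 0.1], [0.9, 0.9]])
types = np.array([0, 0, 1])

happy_orange = is_happy((0.15, 0.1), 0, locations, types)
happy_green = is_happy((0.15, 0.1), 1, locations, types)
```

An orange agent at (0.15, 0.1) is happy because both of its nearest neighbors are orange; a green agent at the same spot is not.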
44.3.3 Behavior
When an agent is unhappy, she updates her location as follows:
1. Draw a random new location in the unit square
2. If happy at the new location, move there
3. Else, go to step 1
44.4 Results
Let’s have a look at the results we got when we coded and ran this model.
As discussed above, agents are initially mixed randomly together.
But after several cycles, they become segregated into distinct regions.
In this instance, the program terminated after 4 cycles through the set of agents, indicating
that all agents had reached a state of happiness.
What is striking about the pictures is how rapidly racial integration breaks down.
This is despite the fact that people in the model don’t actually mind living mixed with the
other type.
Even with these preferences, the outcome is a high degree of segregation.
44.5 Exercises
44.5.1 Exercise 1
Implement and run this simulation for yourself, treating each agent as an object with
* Data: type (orange or green) and location
* Methods: determine whether the agent is happy given the locations of the other agents, and update the location if not
44.6 Solutions
44.6.1 Exercise 1
from random import uniform

class Agent:

    def draw_location(self):
        self.location = uniform(0, 1), uniform(0, 1)
# == Main == #
num_of_type_0 = 250
num_of_type_1 = 250
num_neighbors = 10 # Number of agents regarded as neighbors
require_same_type = 5 # Want at least this many neighbors to be same type
count = 1
# == Loop until none wishes to move == #
while True:
print('Entering loop ', count)
plot_distribution(agents, count)
count += 1
no_one_moved = True
for agent in agents:
old_location = agent.location
agent.update(agents)
if agent.location != old_location:
no_one_moved = False
if no_one_moved:
break
print('Converged, terminating.')
Entering loop 1
Entering loop 2
Entering loop 3
Entering loop 4
Converged, terminating.
Chapter 45

A Lake Model of Employment and Unemployment
45.1 Contents
• Overview 45.2
• The Model 45.3
• Implementation 45.4
• Dynamics of an Individual Worker 45.5
• Endogenous Job Finding Rate 45.6
• Exercises 45.7
• Solutions 45.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
45.2 Overview
732 CHAPTER 45. A LAKE MODEL OF EMPLOYMENT AND UNEMPLOYMENT
Later, we’ll determine some of these transition rates endogenously using the McCall search
model.
We’ll also use some nifty concepts like ergodicity, which provides a fundamental link between
cross-sectional and long run time series distributions.
These concepts will help us build an equilibrium model of ex-ante homogeneous workers
whose different luck generates variations in their ex post experiences.
Let’s start with some imports:
45.2.1 Prerequisites
Before working through what follows, we recommend you read the lecture on finite Markov
chains.
You will also need some basic linear algebra and probability.
The value 𝑏(𝐸𝑡 + 𝑈𝑡 ) is the mass of new workers entering the labor force unemployed.
The total stock of workers 𝑁𝑡 = 𝐸𝑡 + 𝑈𝑡 evolves as

𝑁𝑡+1 = (1 + 𝑏 − 𝑑)𝑁𝑡 = (1 + 𝑔)𝑁𝑡 where 𝑔 ∶= 𝑏 − 𝑑

Letting 𝑋𝑡 ∶= (𝑈𝑡 , 𝐸𝑡 )′, the law of motion for 𝑋 is

𝑋𝑡+1 = 𝐴𝑋𝑡 where 𝐴 ∶= ( (1 − 𝑑)(1 − 𝜆) + 𝑏   (1 − 𝑑)𝛼 + 𝑏
                         (1 − 𝑑)𝜆             (1 − 𝑑)(1 − 𝛼) )
This law tells us how total employment and unemployment evolve over time.
( 𝑈𝑡+1 /𝑁𝑡+1 , 𝐸𝑡+1 /𝑁𝑡+1 )′ = (1/(1 + 𝑔)) 𝐴 ( 𝑈𝑡 /𝑁𝑡 , 𝐸𝑡 /𝑁𝑡 )′
Letting

𝑥𝑡 ∶= (𝑢𝑡 , 𝑒𝑡 )′ = (𝑈𝑡 /𝑁𝑡 , 𝐸𝑡 /𝑁𝑡 )′

we have

𝑥𝑡+1 = 𝐴̂ 𝑥𝑡 where 𝐴̂ ∶= (1/(1 + 𝑔)) 𝐴
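With the lecture's default parameter values, the rate dynamics can be iterated to their steady state in a few lines (a minimal sketch of what the class's rate_steady_state method does):

```python
import numpy as np

# Default parameter values from the lecture
lam, alpha, b, d = 0.283, 0.013, 0.0124, 0.00822
g = b - d
A = np.array([[(1 - d) * (1 - lam) + b, (1 - d) * alpha + b],
              [(1 - d) * lam,           (1 - d) * (1 - alpha)]])
A_hat = A / (1 + g)

# Iterate x_{t+1} = Â x_t from an arbitrary starting rate vector
x = np.array([0.5, 0.5])
for _ in range(2000):
    x = A_hat @ x
# The limit is a fixed point of Â whose components still sum to one
```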
45.4 Implementation
Please be careful: the implied objects 𝑔, 𝐴, 𝐴̂ will not change if you only change the primitives of an existing instance. For example, if you would like to use a new primitive value such as 𝛼 = 0.03, you need to create a new instance with lm = LakeModel(α=0.03).
In the exercises, we show how to avoid this issue by using getter and setter methods.
Parameters:
λ : scalar
The job finding rate for currently unemployed workers
α : scalar
The dismissal rate for currently employed workers
b : scalar
Entry rate into the labor force
d : scalar
Exit rate from the labor force
"""
def __init__(self, λ=0.283, α=0.013, b=0.0124, d=0.00822):
Returns
xbar : steady state vector of employment and unemployment rates
"""
x = 0.5 * np.ones(2)
error = tol + 1
while error > tol:
new_x = self.A_hat @ x
error = np.max(np.abs(new_x - x))
x = new_x
return x
Parameters
X0 : array
Contains initial values (E0, U0)
T : int
Number of periods to simulate
Returns
X : iterator
Contains sequence of employment and unemployment stocks
"""
Parameters
x0 : array
Contains initial values (e0,u0)
T : int
Number of periods to simulate
Returns
x : iterator
Contains sequence of employment and unemployment rates
"""
x = np.atleast_1d(x0) # Recast as array just in case
for t in range(T):
yield x
x = self.A_hat @ x
In [4]: lm = LakeModel()
lm.α
Out[4]: 0.013
In [5]: lm.A
Let’s run a simulation under the default parameters (see above) starting from 𝑋0 = (12, 138)
In [7]: lm = LakeModel()
N_0 = 150 # Population
e_0 = 0.92 # Initial employment rate
u_0 = 1 - e_0 # Initial unemployment rate
T = 50 # Simulation length
axes[2].plot(X_path.sum(1), lw=2)
axes[2].set_title('Labor force')
for ax in axes:
ax.grid()
plt.tight_layout()
plt.show()
The aggregates 𝐸𝑡 and 𝑈𝑡 don’t converge because their sum 𝐸𝑡 + 𝑈𝑡 grows at rate 𝑔.
On the other hand, the vector of employment and unemployment rates 𝑥𝑡 can be in a steady
state 𝑥̄ if there exists an 𝑥̄ such that
• 𝑥̄ = 𝐴̂ 𝑥̄
• the components satisfy 𝑒 ̄ + 𝑢̄ = 1
This equation tells us that a steady state level 𝑥̄ is an eigenvector of 𝐴 ̂ associated with a unit
eigenvalue.
We also have 𝑥𝑡 → 𝑥̄ as 𝑡 → ∞ provided that the remaining eigenvalue of 𝐴̂ has modulus less than 1.
This is the case for our default parameters:
In [8]: lm = LakeModel()
e, f = np.linalg.eigvals(lm.A_hat)
abs(e), abs(f)
Let’s look at the convergence of the unemployment and employment rate to steady state lev-
els (dashed red line)
In [9]: lm = LakeModel()
e_0 = 0.92 # Initial employment rate
u_0 = 1 - e_0 # Initial unemployment rate
T = 50 # Simulation length
xbar = lm.rate_steady_state()
plt.tight_layout()
plt.show()
An individual worker’s employment dynamics are governed by a finite state Markov process.
The worker can be in one of two states:
• 𝑠𝑡 = 0 means unemployed
• 𝑠𝑡 = 1 means employed
𝑃 = ( 1 − 𝜆    𝜆
      𝛼        1 − 𝛼 )
Let 𝜓𝑡 denote the marginal distribution over employment/unemployment states for the
worker at time 𝑡.
As usual, we regard it as a row vector.
We know from an earlier discussion that 𝜓𝑡 follows the law of motion
𝜓𝑡+1 = 𝜓𝑡 𝑃
We also know from the lecture on finite Markov chains that if 𝛼 ∈ (0, 1) and 𝜆 ∈ (0, 1), then
𝑃 has a unique stationary distribution, denoted here by 𝜓∗ .
The unique stationary distribution satisfies
𝜓∗ [0] = 𝛼/(𝛼 + 𝜆)
Not surprisingly, probability mass on the unemployment state increases with the dismissal
rate and falls with the job finding rate.
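We can confirm directly that 𝜓∗ = (𝛼/(𝛼 + 𝜆), 𝜆/(𝛼 + 𝜆)) is stationary, i.e. that 𝜓∗𝑃 = 𝜓∗, using the default rates:

```python
import numpy as np

alpha, lam = 0.013, 0.283   # the lecture's default rates
P = np.array([[1 - lam, lam],
              [alpha, 1 - alpha]])
# Candidate stationary distribution as a row vector
psi_star = np.array([alpha, lam]) / (alpha + lam)
# psi_star @ P should reproduce psi_star exactly
```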
45.5.1 Ergodicity
𝑠̄𝑢,𝑇 ∶= (1/𝑇 ) ∑_{𝑡=1}^𝑇 𝟙{𝑠𝑡 = 0}

and

𝑠̄𝑒,𝑇 ∶= (1/𝑇 ) ∑_{𝑡=1}^𝑇 𝟙{𝑠𝑡 = 1}

These are the fractions of time a worker spends unemployed and employed, respectively, up until period 𝑇. If 𝛼 ∈ (0, 1) and 𝜆 ∈ (0, 1), then 𝑃 is ergodic, and hence

lim_{𝑇 →∞} 𝑠̄𝑢,𝑇 = 𝜓∗ [0] and lim_{𝑇 →∞} 𝑠̄𝑒,𝑇 = 𝜓∗ [1]
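A quick simulation sketch (with illustrative rates, not the defaults, and a seeded generator for reproducibility) shows the sample average of time spent unemployed settling near 𝜓∗ [0]:

```python
import numpy as np

rng = np.random.default_rng(1234)
alpha, lam = 0.1, 0.3   # illustrative rates
T = 100_000

s, visits_unemployed = 0, 0   # start unemployed
for t in range(T):
    visits_unemployed += (s == 0)
    if s == 0:
        s = 1 if rng.random() < lam else 0   # find a job with prob λ
    else:
        s = 0 if rng.random() < alpha else 1  # lose the job with prob α

sbar_u = visits_unemployed / T
psi0 = alpha / (alpha + lam)   # stationary probability of unemployment, 0.25
```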
How long does it take for time series sample averages to converge to cross-sectional averages?
We can use QuantEcon.py’s MarkovChain class to investigate this.
Let’s plot the path of the sample averages over 5,000 periods
α, λ = lm.α, lm.λ
P = [[1 - λ, λ],
     [ α, 1 - α]]
mc = MarkovChain(P)
xbar = lm.rate_steady_state()
plt.tight_layout()
plt.show()
45.6 Endogenous Job Finding Rate
The most important thing to remember about the model is that optimal decisions are charac-
terized by a reservation wage 𝑤̄
• If the wage offer 𝑤 in hand is greater than or equal to 𝑤,̄ then the worker accepts.
• Otherwise, the worker rejects.
As we saw in our discussion of the model, the reservation wage depends on the wage offer dis-
tribution and the parameters
• 𝛼, the separation rate
Suppose that all workers inside a lake model behave according to the McCall search model.
The exogenous probability of leaving employment remains 𝛼.
But their optimal decision rules determine the probability 𝜆 of leaving unemployment.
This is now
𝜆 = 𝛾ℙ{𝑤𝑡 ≥ 𝑤̄} = 𝛾 ∑_{𝑤′ ≥𝑤̄} 𝑝(𝑤′ ) (1)
We can use the McCall search version of the Lake Model to find an optimal level of unem-
ployment insurance.
We assume that the government sets unemployment compensation 𝑐.
The government imposes a lump-sum tax 𝜏 sufficient to finance total unemployment pay-
ments.
To attain a balanced budget at a steady state, taxes, the steady state unemployment rate 𝑢,
and the unemployment compensation rate must satisfy
𝜏 = 𝑢𝑐
Since the steady state unemployment rate depends on the compensation and tax levels, say 𝑢 = 𝑢(𝑐, 𝜏 ), the government must solve the fixed point problem

𝜏 = 𝑢(𝑐, 𝜏 )𝑐
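The fixed point can be found by bisection on the budget gap 𝜏 − 𝑢(𝑐, 𝜏 )𝑐. The unemployment-rate function below is a hypothetical stand-in for the steady state implied by the full McCall/lake model, used only to show the mechanics:

```python
def u_rate(c, tau):
    # Hypothetical: unemployment rises with compensation net of taxes
    return 0.05 + 0.02 * (c - tau)

def budget_gap(tau, c):
    """Positive when tax revenue exceeds unemployment payments."""
    return tau - u_rate(c, tau) * c

def find_tax(c, lo=0.0, hi=2.0, tol=1e-10):
    """Bisect for the tax that balances the budget at compensation c."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if budget_gap(mid, c) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

tau = find_tax(c=1.0)
```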
𝑊 ∶= 𝑒 𝔼[𝑉 | employed] + 𝑢 𝑈
where the notation 𝑉 and 𝑈 is as defined in the McCall search model lecture.
The wage offer distribution will be a discretized version of the lognormal distribution
𝐿𝑁 (log(20), 1), as shown in the next figure
@jit
def u(c, σ):
    if c > 0:
        return (c**(1 - σ) - 1) / (1 - σ)
    else:
        return -10e6
class McCallModel:
"""
Stores the parameters and functions associated with a given model.
"""
def __init__(self,
α=0.2, # Job separation rate
β=0.98, # Discount rate
# Add a default wage vector and probabilities over the vector using
# the betabinomial distribution
if w_vec is None:
n = 60 # Number of possible outcomes for wage
# Wages between 10 and 20
self.w_vec = np.linspace(10, 20, n)
a, b = 600, 400 # Shape parameters
dist = BetaBinomial(n-1, a, b)
self.p_vec = dist.pdf()
else:
self.w_vec = w_vec
self.p_vec = p_vec
@jit
def _update_bellman(α, β, γ, c, σ, w_vec, p_vec, V, V_new, U):
"""
A jitted function to update the Bellman equations. Note that V_new is
modified in place (i.e., modified by this function). The new value of U
is returned.
"""
for w_idx, w in enumerate(w_vec):
# w_idx indexes the vector of possible wages
V_new[w_idx] = u(w, σ) + β * ((1 - α) * V[w_idx] + α * U)
U_new = u(c, σ) + β * (1 - γ) * U + \
β * γ * np.sum(np.maximum(U, V) * p_vec)
return U_new
Parameters
mcm : an instance of McCallModel
tol : float
error tolerance
max_iter : int
the maximum number of iterations
"""
error_2 = np.abs(U_new - U)
error = max(error_1, error_2)
V[:] = V_new
U = U_new
i += 1
return V, U
If V(w) > U for all w, then the reservation wage w_bar is set to
the lowest wage in mcm.w_vec.
Parameters
mcm : an instance of McCallModel
return_values : bool (optional, default=False)
Return the value functions as well
Returns
w_bar : scalar
The reservation wage
"""
V, U = solve_mccall_model(mcm)
w_idx = np.searchsorted(V - U, 0)
if w_idx == len(V):
w_bar = np.inf
else:
w_bar = mcm.w_vec[w_idx]
if return_values == False:
return w_bar
else:
return w_bar, V, U
Now let’s compute and plot welfare, employment, unemployment, and tax revenue as a func-
tion of the unemployment compensation rate
logw_dist = norm(np.log(log_wage_mean), 1)
w_vec = np.linspace(1e-8, max_wage, wage_grid_size + 1)
cdf = logw_dist.cdf(np.log(w_vec))
pdf = cdf[1:] - cdf[:-1]
p_vec = pdf / pdf.sum()
w_vec = (w_vec[1:] + w_vec[:-1]) / 2
"""
mcm = McCallModel(α=α_q,
β=β,
γ=γ,
c=c-τ, # Post-tax compensation
σ=σ,
w_vec=w_vec-τ, # Post-tax wages
p_vec=p_vec)
"""
w_bar, λ, V, U = compute_optimal_quantities(c, τ)
return e, u, welfare
def find_balanced_budget_tax(c):
"""
Find the tax level that will induce a balanced budget.
"""
def steady_state_budget(t):
e, u, w = compute_steady_state_quantities(c, t)
return t - u * c
tax_vec = []
unempl_vec = []
empl_vec = []
welfare_vec = []
for c in c_vec:
t = find_balanced_budget_tax(c)
e_rate, u_rate, welfare = compute_steady_state_quantities(c, t)
tax_vec.append(t)
unempl_vec.append(u_rate)
empl_vec.append(e_rate)
welfare_vec.append(welfare)
plt.tight_layout()
plt.show()
45.7 Exercises
45.7.1 Exercise 1
In the Lake Model, there is derived data such as 𝐴 which depends on primitives like 𝛼 and 𝜆.
So, when a user alters these primitives, we need the derived data to update automatically.
(For example, if a user changes the value of 𝑏 for a given instance of the class, we would like
𝑔 = 𝑏 − 𝑑 to update automatically)
In the code above, we took care of this issue by creating new instances every time we wanted
to change parameters.
That way the derived data is always matched to current parameter values.
However, we can use descriptors instead, so that derived data is updated whenever parame-
ters are changed.
This is safer and means we don’t need to create a fresh instance for every new parameteriza-
tion.
(On the other hand, the code becomes denser, which is why we don’t always use the descrip-
tor approach in our lectures.)
In this exercise, your task is to arrange the LakeModel class by using descriptors and decora-
tors such as @property.
(If you need to refresh your understanding of how these work, consult this lecture.)
45.7.2 Exercise 2
Consider an economy with an initial stock of workers 𝑁0 = 100 at the steady state level of
employment in the baseline parameterization
• 𝛼 = 0.013
• 𝜆 = 0.283
• 𝑏 = 0.0124
• 𝑑 = 0.00822
(The values for 𝛼 and 𝜆 follow [24])
Suppose that in response to new legislation the hiring rate reduces to 𝜆 = 0.2.
Plot the transition dynamics of the unemployment and employment stocks for 50 periods.
Plot the transition dynamics for the rates.
How long does the economy take to converge to its new steady state?
What is the new steady state level of employment?
Note: it may be easier to use the class created in exercise 1 to help with changing variables.
45.7.3 Exercise 3
Consider an economy with an initial stock of workers 𝑁0 = 100 at the steady state level of
employment in the baseline parameterization.
Suppose that for 20 periods the birth rate was temporarily high (𝑏 = 0.025) and then re-
turned to its original level.
Plot the transition dynamics of the unemployment and employment stocks for 50 periods.
Plot the transition dynamics for the rates.
How long does the economy take to return to its original steady state?
45.8 Solutions
45.8.1 Exercise 1
Parameters:
λ : scalar
The job finding rate for currently unemployed workers
α : scalar
The dismissal rate for currently employed workers
b : scalar
Entry rate into the labor force
d : scalar
Exit rate from the labor force
"""
def __init__(self, λ=0.283, α=0.013, b=0.0124, d=0.00822):
self._λ, self._α, self._b, self._d = λ, α, b, d
self.compute_derived_values()
def compute_derived_values(self):
    # Unpack names to simplify expression
    λ, α, b, d = self._λ, self._α, self._b, self._d
    self._g = b - d
    self._A = np.array([[(1 - d) * (1 - λ) + b, (1 - d) * α + b],
                        [(1 - d) * λ,           (1 - d) * (1 - α)]])
    self._A_hat = self._A / (1 + self._g)
@property
def g(self):
return self._g
@property
def A(self):
return self._A
@property
def A_hat(self):
return self._A_hat
@property
def λ(self):
return self._λ
@λ.setter
def λ(self, new_value):
self._λ = new_value
self.compute_derived_values()
@property
def α(self):
return self._α
@α.setter
def α(self, new_value):
self._α = new_value
self.compute_derived_values()
@property
def b(self):
return self._b
@b.setter
def b(self, new_value):
self._b = new_value
self.compute_derived_values()
@property
def d(self):
return self._d
@d.setter
def d(self, new_value):
self._d = new_value
self.compute_derived_values()
Returns
xbar : steady state vector of employment and unemployment rates
"""
x = 0.5 * np.ones(2)
error = tol + 1
while error > tol:
new_x = self.A_hat @ x
error = np.max(np.abs(new_x - x))
x = new_x
return x
Parameters
X0 : array
Contains initial values (E0, U0)
T : int
Number of periods to simulate
Returns
X : iterator
Contains sequence of employment and unemployment stocks
"""
Parameters
x0 : array
Contains initial values (e0,u0)
T : int
Number of periods to simulate
Returns
x : iterator
Contains sequence of employment and unemployment rates
"""
x = np.atleast_1d(x0) # Recast as array just in case
for t in range(T):
yield x
x = self.A_hat @ x
45.8.2 Exercise 2
We begin by constructing the class containing the default parameters and assigning the
steady state values to x0
In [15]: lm = LakeModelModified()
x0 = lm.rate_steady_state()
print(f"Initial Steady State: {x0}")
In [16]: N0 = 100
T = 50
axes[0].plot(X_path[:, 0])
axes[0].set_title('Unemployment')
axes[1].plot(X_path[:, 1])
axes[1].set_title('Employment')
axes[2].plot(X_path.sum(1))
axes[2].set_title('Labor force')
for ax in axes:
ax.grid()
plt.tight_layout()
plt.show()
plt.tight_layout()
plt.show()
We see that it takes 20 periods for the economy to converge to its new steady state levels.
45.8.3 Exercise 3
This next exercise has the economy experiencing a boom in entrances to the labor market and
then later returning to the original levels.
For 20 periods the economy has a new entry rate into the labor market.
Let’s start off at the baseline parameterization and record the steady state
In [20]: lm = LakeModelModified()
x0 = lm.rate_steady_state()
Now we reset 𝑏 to the original value and then, using the state after 20 periods for the new
initial conditions, we simulate for the additional 30 periods
axes[0].plot(X_path[:, 0])
axes[0].set_title('Unemployment')
axes[1].plot(X_path[:, 1])
axes[1].set_title('Employment')
axes[2].plot(X_path.sum(1))
axes[2].set_title('Labor force')
for ax in axes:
ax.grid()
plt.tight_layout()
plt.show()
plt.tight_layout()
plt.show()
Chapter 46

Rational Expectations Equilibrium
46.1 Contents
• Overview 46.2
• Defining Rational Expectations Equilibrium 46.3
• Computation of an Equilibrium 46.4
• Exercises 46.5
• Solutions 46.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
46.2 Overview
758 CHAPTER 46. RATIONAL EXPECTATIONS EQUILIBRIUM
Finally, we will learn about the important “Big 𝐾, little 𝑘” trick, a modeling device widely
used in macroeconomics.
Except that for us
• Instead of “Big 𝐾” it will be “Big 𝑌 ”.
• Instead of “little 𝑘” it will be “little 𝑦”.
Let’s start with some standard imports:
This widely used method applies in contexts in which a “representative firm” or agent is a
“price taker” operating within a competitive equilibrium.
We want to impose that
• The representative firm or individual takes aggregate 𝑌 as given when it chooses indi-
vidual 𝑦, but ….
• At the end of the day, 𝑌 = 𝑦, so that the representative firm is indeed representative.
The Big 𝑌 , little 𝑦 trick accomplishes these two goals by
• Taking 𝑌 as beyond control when posing the choice problem of who chooses 𝑦; but ….
• Imposing 𝑌 = 𝑦 after having solved the individual’s optimization problem.
Please watch for how this strategy is applied as the lecture unfolds.
We begin by applying the Big 𝑌 , little 𝑦 trick in a very simple static context.
Consider a static model in which a collection of 𝑛 firms produce a homogeneous good that is
sold in a competitive market.
Each of these 𝑛 firms sell output 𝑦.
The price 𝑝 of the good lies on an inverse demand curve
𝑝 = 𝑎 0 − 𝑎1 𝑌 (1)
where
• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌 = 𝑛𝑦 is the market-wide level of output
Each firm has a total cost function

𝐶(𝑦) = 𝑐1 𝑦 + (𝑐2 /2) 𝑦² , 𝑐1 , 𝑐2 > 0 (2)

Each firm chooses 𝑦 to maximize profit 𝑝𝑦 − 𝐶(𝑦), taking 𝑌 as given, which leads to the first-order condition

𝑎0 − 𝑎1 𝑌 − 𝑐1 − 𝑐2 𝑦 = 0 (3)
At this point, but not before, we substitute 𝑌 = 𝑛𝑦 into (3) to obtain the following linear equation

𝑎0 − 𝑎1 𝑛𝑦 − 𝑐1 − 𝑐2 𝑦 = 0 (4)

to be solved for 𝑦.
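Solving the resulting linear equation gives 𝑦 = (𝑎0 − 𝑐1 )/(𝑎1 𝑛 + 𝑐2 ). A quick check with illustrative parameter values confirms that the first-order condition holds at this output level:

```python
a0, a1, c1, c2, n = 100.0, 1.0, 10.0, 2.0, 10   # illustrative values

y = (a0 - c1) / (a1 * n + c2)   # firm-level output after imposing Y = n y
Y = n * y                       # market-wide output
p = a0 - a1 * Y                 # equilibrium price from the inverse demand curve

# The first-order condition a0 - a1 Y - c1 - c2 y should be zero
foc = a0 - a1 * Y - c1 - c2 * y
```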
Our first illustration of a rational expectations equilibrium involves a market with 𝑛 firms,
each of which seeks to maximize the discounted present value of profits in the face of adjust-
ment costs.
The adjustment costs induce the firms to make gradual adjustments, which in turn requires
consideration of future prices.
Individual firms understand that, via the inverse demand curve, the price is determined by
the amounts supplied by other firms.
Hence each firm wants to forecast future total industry supplies.
In our context, a forecast is generated by a belief about the law of motion for the aggregate
state.
Rational expectations equilibrium prevails when this belief coincides with the actual law of
motion generated by production choices induced by this belief.
We formulate a rational expectations equilibrium in terms of a fixed point of an operator that
maps beliefs into optimal beliefs.
𝑝𝑡 = 𝑎0 − 𝑎1 𝑌𝑡 (5)
where
• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌𝑡 = 𝑛𝑦𝑡 is the market-wide level of output
∑_{𝑡=0}^∞ 𝛽^𝑡 𝑟𝑡 (6)

where

𝑟𝑡 ∶= 𝑝𝑡 𝑦𝑡 − 𝛾(𝑦𝑡+1 − 𝑦𝑡 )²/2 , 𝑦0 given (7)
This includes ones that the firm cares about but does not control like 𝑝𝑡 .
We turn to this problem now.
In view of (5), the firm’s incentive to forecast the market price translates into an incentive to
forecast aggregate output 𝑌𝑡 .
Aggregate output depends on the choices of other firms.
We assume that 𝑛 is such a large number that the output of any single firm has a negligible
effect on aggregate output.
That justifies firms in regarding their forecasts of aggregate output as being unaffected by
their own output decisions.
We suppose the firm believes that market-wide output 𝑌𝑡 follows the law of motion

𝑌𝑡+1 = 𝐻(𝑌𝑡 ) (8)
For now, let’s fix a particular belief 𝐻 in (8) and investigate the firm’s response to it.
Let 𝑣 be the optimal value function for the firm’s problem given 𝐻.
The value function satisfies the Bellman equation
𝑣(𝑦, 𝑌 ) = max_{𝑦′ } {𝑎0 𝑦 − 𝑎1 𝑦𝑌 − 𝛾(𝑦′ − 𝑦)²/2 + 𝛽𝑣(𝑦′ , 𝐻(𝑌 ))} (9)

Let ℎ denote the corresponding optimal policy function:

ℎ(𝑦, 𝑌 ) ∶= argmax_{𝑦′ } {𝑎0 𝑦 − 𝑎1 𝑦𝑌 − 𝛾(𝑦′ − 𝑦)²/2 + 𝛽𝑣(𝑦′ , 𝐻(𝑌 ))} (11)
A First-Order Characterization
𝑣𝑦 (𝑦, 𝑌 ) = 𝑎0 − 𝑎1 𝑌 + 𝛾(𝑦′ − 𝑦)
The firm optimally sets an output path that satisfies (13), taking (8) as given, and subject to
• the initial conditions for (𝑦0 , 𝑌0 ).
• the terminal condition lim𝑡→∞ 𝛽 𝑡 𝑦𝑡 𝑣𝑦 (𝑦𝑡 , 𝑌𝑡 ) = 0.
This last condition is called the transversality condition, and acts as a first-order necessary
condition “at infinity”.
The firm’s decision rule solves the difference equation (13) subject to the given initial condi-
tion 𝑦0 and the transversality condition.
Note that solving the Bellman equation (9) for 𝑣 and then ℎ in (11) yields a decision rule that
automatically imposes both the Euler equation (13) and the transversality condition.
Thus, when firms believe that the law of motion for market-wide output is (8), their optimiz-
ing behavior makes the actual law of motion be (14).
A rational expectations equilibrium or recursive competitive equilibrium of the model with ad-
justment costs is a decision rule ℎ and an aggregate law of motion 𝐻 such that
Thus, a rational expectations equilibrium equates the perceived and actual laws of motion (8)
and (14).
As we’ve seen, the firm’s optimum problem induces a mapping Φ from a perceived law of mo-
tion 𝐻 for market-wide output to an actual law of motion Φ(𝐻).
The mapping Φ is the composition of two operations, taking a perceived law of motion into a
decision rule via (9)–(11), and a decision rule into an actual law via (14).
The 𝐻 component of a rational expectations equilibrium is a fixed point of Φ.
Now let’s consider the problem of computing the rational expectations equilibrium.
Readers accustomed to dynamic programming arguments might try to address this problem
by choosing some guess 𝐻0 for the aggregate law of motion and then iterating with Φ.
Unfortunately, the mapping Φ is not a contraction.
In particular, there is no guarantee that direct iterations on Φ converge.
Furthermore, there are examples in which these iterations diverge.
Fortunately, there is another method that works here.
The method exploits a connection between equilibrium and Pareto optimality expressed in
the fundamental theorems of welfare economics (see, e.g., [79]).
Lucas and Prescott [74] used this method to construct a rational expectations equilibrium.
The details follow.
Our plan of attack is to match the Euler equations of the market problem with those for a
single-agent choice problem.
As we’ll see, this planning problem can be solved by LQ control (linear regulator).
The optimal quantities from the planning problem are rational expectations equilibrium
quantities.
The rational expectations equilibrium price can be obtained as a shadow price in the planning
problem.
For convenience, in this section, we set 𝑛 = 1.
We first compute a sum of consumer and producer surplus at time 𝑡
764 CHAPTER 46. RATIONAL EXPECTATIONS EQUILIBRIUM
$$s(Y_t, Y_{t+1}) := \int_0^{Y_t} (a_0 - a_1 x)\, dx - \frac{\gamma (Y_{t+1} - Y_t)^2}{2} \tag{15}$$
The first term is the area under the demand curve, while the second measures the social costs
of changing output.
The planning problem is to choose a production plan {𝑌𝑡 } to maximize
$$\sum_{t=0}^{\infty} \beta^t s(Y_t, Y_{t+1})$$
Evaluating the integral in (15) yields the quadratic form 𝑎0 𝑌𝑡 − 𝑎1 𝑌𝑡2 /2.
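We can confirm this bit of calculus numerically. Because the integrand is linear in $x$, the trapezoid rule reproduces the closed form up to rounding error (a quick sketch with illustrative parameter values):

```python
# Verify that the integral of (a0 - a1 x) from 0 to Y equals a0 Y - a1 Y^2 / 2
a0, a1, Y = 100.0, 0.05, 3.0

closed_form = a0 * Y - a1 * Y**2 / 2

# trapezoid rule: exact (up to float rounding) for a linear integrand
n = 1_000
h = Y / n
f = lambda x: a0 - a1 * x
numeric = h * (0.5 * f(0) + sum(f(i * h) for i in range(1, n)) + 0.5 * f(Y))

print(closed_form, numeric)
```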
As a result, the Bellman equation for the planning problem is
$$V(Y) = \max_{Y'} \left\{ a_0 Y - \frac{a_1}{2} Y^2 - \frac{\gamma (Y' - Y)^2}{2} + \beta V(Y') \right\} \tag{16}$$
The first-order condition for the maximization on the right side of (16) is
$$-\gamma (Y' - Y) + \beta V'(Y') = 0 \tag{17}$$
and an application of the Benveniste-Scheinkman (envelope) formula gives
$$V'(Y) = a_0 - a_1 Y + \gamma (Y' - Y)$$
Substituting this into equation (17) and rearranging leads to the Euler equation
$$\beta a_0 + \gamma Y_t - \left[\beta a_1 + \gamma(1+\beta)\right] Y_{t+1} + \gamma\beta Y_{t+2} = 0 \tag{18}$$
2. substituting into it the expression 𝑌𝑡 = 𝑛𝑦𝑡 that “makes the representative firm be rep-
resentative”.
If it is appropriate to apply the same terminal conditions for these two difference equations,
which it is, then we have verified that a solution of the planning problem is also a rational
expectations equilibrium quantity sequence.
It follows that for this example we can compute equilibrium quantities by forming the optimal
linear regulator problem corresponding to the Bellman equation (16).
The optimal policy function for the planning problem is the aggregate law of motion 𝐻 that
the representative firm faces within a rational expectations equilibrium.
As you are asked to show in the exercises, the fact that the planner’s problem is an LQ prob-
lem implies an optimal policy — and hence aggregate law of motion — taking the form
𝑌𝑡+1 = 𝜅0 + 𝜅1 𝑌𝑡 (19)
while the firm's decision rule within the rational expectations equilibrium takes the form
$$y_{t+1} = h_0 + h_1 y_t + h_2 Y_t \tag{20}$$
46.5 Exercises
46.5.1 Exercise 1
Express the solution of the firm’s problem in the form (20) and give the values for each ℎ𝑗 .
If there were $n$ identical competitive firms all behaving according to (20), what would (20) imply for the actual law of motion (8) for market supply?
46.5.2 Exercise 2
Consider the following 𝜅0 , 𝜅1 pairs as candidates for the aggregate law of motion component
of a rational expectations equilibrium (see (19)).
Extending the program that you wrote for exercise 1, determine which if any satisfy the defi-
nition of a rational expectations equilibrium
• (94.0886298678, 0.923409232937)
• (93.2119845412, 0.984323478873)
• (95.0818452486, 0.952459076301)
Describe an iterative algorithm that uses the program that you wrote for exercise 1 to com-
pute a rational expectations equilibrium.
(You are not being asked actually to use the algorithm you are suggesting)
46.5.3 Exercise 3
Recall the planner's problem described above. Formulate the planner's problem as an LQ problem and solve it using the same parameter values as in Exercise 1. Represent the solution in the form $Y_{t+1} = \kappa_0 + \kappa_1 Y_t$ and compare it with your answer to Exercise 2.
46.5.4 Exercise 4
A monopolist faces the industry demand curve (5) and chooses $\{Y_t\}$ to maximize $\sum_{t=0}^{\infty} \beta^t r_t$ where
$$r_t = p_t Y_t - \frac{\gamma (Y_{t+1} - Y_t)^2}{2}$$
Formulate this problem as an LQ problem.
Compute the optimal policy using the same parameters as the previous exercise.
In particular, solve for the parameters in
𝑌𝑡+1 = 𝑚0 + 𝑚1 𝑌𝑡
46.6 Solutions
46.6.1 Exercise 1
To map a problem into a discounted optimal linear control problem, we need to define
$$x_t = \begin{bmatrix} y_t \\ Y_t \\ 1 \end{bmatrix}, \qquad u_t = y_{t+1} - y_t$$
For $A, B, Q, R$ we set
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \kappa_1 & \kappa_0 \\ 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad R = \begin{bmatrix} 0 & a_1/2 & -a_0/2 \\ a_1/2 & 0 & 0 \\ -a_0/2 & 0 & 0 \end{bmatrix}, \quad Q = \gamma/2$$
𝑦𝑡+1 − 𝑦𝑡 = −𝐹0 𝑦𝑡 − 𝐹1 𝑌𝑡 − 𝐹2
ℎ0 = −𝐹2 , ℎ 1 = 1 − 𝐹0 , ℎ2 = −𝐹1
a0 = 100
a1 = 0.05
β = 0.95
γ = 10.0
# Beliefs
κ0 = 95.5
κ1 = 0.95
# Formulate the LQ problem (matrices as derived above)
A = np.array([[1., 0., 0.], [0., κ1, κ0], [0., 0., 1.]])
B = np.array([[1.], [0.], [0.]])
R = np.array([[0., a1 / 2, -a0 / 2], [a1 / 2, 0., 0.], [-a0 / 2, 0., 0.]])
Q = 0.5 * γ
lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()
F = F.flatten()
out1 = f"F = [{F[0]:.3f}, {F[1]:.3f}, {F[2]:.3f}]"
h0, h1, h2 = -F[2], 1 - F[0], -F[1]
out2 = f"(h0, h1, h2) = ({h0:.3f}, {h1:.3f}, {h2:.3f})"
print(out1)
print(out2)
For the case $n > 1$, recall that $Y_t = n y_t$, which, combined with the previous equation, yields
$$Y_{t+1} = n h(Y_t/n, Y_t) = n h_0 + (h_1 + n h_2) Y_t$$
46.6.2 Exercise 2
To determine whether a 𝜅0 , 𝜅1 pair forms the aggregate law of motion component of a ratio-
nal expectations equilibrium, we can proceed as follows:
• Determine the corresponding firm law of motion 𝑦𝑡+1 = ℎ0 + ℎ1 𝑦𝑡 + ℎ2 𝑌𝑡 .
• Test whether the associated aggregate law $Y_{t+1} = n h(Y_t/n, Y_t)$ evaluates to $Y_{t+1} = \kappa_0 + \kappa_1 Y_t$.
In the second step, we can use $Y_t = n y_t = y_t$ (recall that $n = 1$), so that $Y_{t+1} = n h(Y_t/n, Y_t)$ becomes
$$Y_{t+1} = h(Y_t, Y_t) = h_0 + (h_1 + h_2) Y_t$$
The output tells us that the answer is pair (iii), which implies (ℎ0 , ℎ1 , ℎ2 ) =
(95.0819, 1.0000, −.0475).
(Notice we use np.allclose to test equality of floating-point numbers, since exact equality is
too strict).
Regarding the iterative algorithm, one could loop from a given (𝜅0 , 𝜅1 ) pair to the associated
firm law and then to a new (𝜅0 , 𝜅1 ) pair.
This amounts to implementing the operator Φ described in the lecture.
(There is in general no guarantee that this iterative process will converge to a rational expec-
tations equilibrium)
46.6.3 Exercise 3
$$x_t = \begin{bmatrix} Y_t \\ 1 \end{bmatrix}, \qquad u_t = Y_{t+1} - Y_t$$
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad R = \begin{bmatrix} a_1/2 & -a_0/2 \\ -a_0/2 & 0 \end{bmatrix}, \quad Q = \gamma/2$$
𝑌𝑡+1 − 𝑌𝑡 = −𝐹0 𝑌𝑡 − 𝐹1
we can obtain the implied aggregate law of motion via 𝜅0 = −𝐹1 and 𝜅1 = 1 − 𝐹0 .
The Python code to solve this problem is below:
A = np.array([[1., 0.], [0., 1.]])
B = np.array([[1.], [0.]])
R = np.array([[a1 / 2, -a0 / 2], [-a0 / 2, 0.]])
Q = 0.5 * γ
lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()
F = F.flatten()
κ0, κ1 = -F[1], 1 - F[0]
print(κ0, κ1)
95.08187459214828 0.9524590627039239
The output yields the same (𝜅0 , 𝜅1 ) pair obtained as an equilibrium from the previous exer-
cise.
46.6.4 Exercise 4
The monopolist’s LQ problem is almost identical to the planner’s problem from the previous
exercise, except that
$$R = \begin{bmatrix} a_1 & -a_0/2 \\ -a_0/2 & 0 \end{bmatrix}$$
R = np.array([[a1, -a0 / 2], [-a0 / 2, 0.]])
lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()
F = F.flatten()
m0, m1 = -F[1], 1 - F[0]
print(m0, m1)
73.47294403502859 0.9265270559649703
We see that the law of motion for the monopolist is approximately 𝑌𝑡+1 = 73.4729 + 0.9265𝑌𝑡 .
In the rational expectations case, the law of motion was approximately 𝑌𝑡+1 = 95.0818 +
0.9525𝑌𝑡 .
One way to compare these two laws of motion is by their fixed points, which give long-run
equilibrium output in each case.
For laws of the form 𝑌𝑡+1 = 𝑐0 + 𝑐1 𝑌𝑡 , the fixed point is 𝑐0 /(1 − 𝑐1 ).
If you crunch the numbers, you will see that the monopolist adopts a lower long-run quantity
than obtained by the competitive market, implying a higher market price.
This is analogous to the elementary static-case results.
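Crunching the numbers with the rounded coefficients reported above (a quick sketch):

```python
# Long-run outputs: the fixed point of Y' = c0 + c1 Y is c0 / (1 - c1)
κ0, κ1 = 95.0818, 0.9525      # rational expectations (competitive) law
m0, m1 = 73.4729, 0.9265      # monopolist law

Y_competitive = κ0 / (1 - κ1)
Y_monopoly = m0 / (1 - m1)

print(Y_competitive, Y_monopoly)
```

The monopolist settles on roughly half the competitive long-run quantity, confirming the lower quantity and higher price claimed above.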
Footnotes
[1] A literature that studies whether models populated with agents who learn can converge
to rational expectations equilibria features iterations on a modification of the mapping Φ that
Chapter 47

Stability in Linear Rational Expectations Models

47.1 Contents
• Overview 47.2
• Linear difference equations 47.3
• Illustration: Cagan’s Model 47.4
• Some Python code 47.5
• Alternative code 47.6
• Another perspective 47.7
• Log money supply feeds back on log price level 47.8
• Big 𝑃 , little 𝑝 interpretation 47.9
• Fun with Sympy code 47.10
In addition to what’s in Anaconda, this lecture deploys the following libraries:
47.2 Overview
This lecture studies stability in the context of an elementary rational expectations model.
We study a rational expectations version of Philip Cagan’s model [18] linking the price level
to the money supply.
774 CHAPTER 47. STABILITY IN LINEAR RATIONAL EXPECTATIONS MODELS
Cagan did not use a rational expectations version of his model, but Sargent [94] did.
We study this model because it is intrinsically interesting and also because it has a mathematical structure that also appears in virtually all linear rational expectations models, namely, that a key endogenous variable equals a mathematical expectation of a geometric sum of future values of another variable.
In a rational expectations version of Cagan’s model, the endogenous variable is the price level
or rate of inflation and the other variable is the money supply or the rate of change in the
money supply.
In this lecture, we’ll encounter:
• a convenient formula for the expectation of geometric sum of future values of a variable
• a way of solving an expectational difference equation by mapping it into a vector first-
order difference equation and appropriately manipulating an eigen decomposition of the
transition matrix in order to impose stability
• a way to use a Big 𝐾, little 𝑘 argument to allow apparent feedback from endogenous to
exogenous variables within a rational expectations equilibrium
• a use of eigenvector decompositions of matrices that allowed Blanchard and Kahn (1980) and Whiteman (1983) [110] to solve a class of linear rational expectations models
Cagan’s model with rational expectations is formulated as an expectational difference
equation whose solution is a rational expectations equilibrium.
We’ll start this lecture with a quick review of deterministic (i.e., non-random) first-order and
second-order linear difference equations.
In this quick review of linear difference equations, we'll use the backward shift or lag operator $L$.

The lag operator $L$ maps a sequence $\{x_t\}_{t=0}^{\infty}$ into the sequence $\{x_{t-1}\}_{t=0}^{\infty}$.

We can use $L$ in linear difference equations by using the equality $L x_t \equiv x_{t-1}$ in algebraic expressions.

Further, the inverse $L^{-1}$ of the lag operator is the forward shift operator; we'll often use the equality $L^{-1} x_t \equiv x_{t+1}$ in the algebra below.
The algebra of lag and forward shift operators often simplifies formulas for linear difference
equations and their solutions.
Let 𝐿 be the lag operator defined by 𝐿𝑥𝑡 ≡ 𝑥𝑡−1 and let 𝐿−1 be the forward shift operator
defined by 𝐿−1 𝑥𝑡 ≡ 𝑥𝑡+1 .
Then
(1 − 𝜆𝐿)𝑦𝑡 = 𝑢𝑡 , ∀𝑡 (1)
has solutions
$$y_t = (1 - \lambda L)^{-1} u_t + k\lambda^t$$
or
$$y_t = \sum_{j=0}^{\infty} \lambda^j u_{t-j} + k\lambda^t \tag{2}$$
for any real number $k$.

Another solution is
$$y_t = -\lambda^{-1} \left( \frac{1}{1 - \lambda^{-1} L^{-1}} \right) u_{t+1} + k\lambda^t \tag{5}$$
for any 𝑘.
To verify that this is a solution, check the consequences of operating on both sides of equation
(5) by (1 − 𝜆𝐿) and compare to equation (1).
Solution (2) exists for |𝜆| < 1 because the distributed lag in 𝑢 converges.
Solution (5) exists when |𝜆| > 1 because the distributed lead in 𝑢 converges.
When |𝜆| > 1, the distributed lag in 𝑢 in (2) may diverge, so that a solution of this form does
not exist.
The distributed lead in 𝑢 in (5) need not converge when |𝜆| < 1.
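For instance, with $|\lambda| < 1$ the weights in the distributed lag settle down quickly, so the lag converges whenever $\{u_t\}$ is bounded (a small numerical illustration):

```python
# Partial sums of the geometric series Σ λ^j approach 1 / (1 - λ)
# when |λ| < 1, which is what makes the distributed lag in (2) converge
λ = 0.9
partial_sums = [sum(λ**j for j in range(n)) for n in (10, 100, 1000)]
limit = 1 / (1 - λ)
print(partial_sums, limit)
```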
Now consider the second-order difference equation
$$(1 - \lambda_1 L)(1 - \lambda_2 L) y_{t+1} = u_t \tag{6}$$
where $\{u_t\}$ is a bounded sequence, $y_0$ is an initial condition, $|\lambda_1| < 1$ and $|\lambda_2| > 1$.
We seek a bounded sequence {𝑦𝑡 }∞ 𝑡=0 that satisfies (6). Using insights from our analysis of the
first-order equation, operate on both sides of (6) by the forward inverse of (1−𝜆2 𝐿) to rewrite
equation (6) as
$$(1 - \lambda_1 L) y_{t+1} = -\frac{\lambda_2^{-1}}{1 - \lambda_2^{-1} L^{-1}} u_{t+1}$$
or
$$y_{t+1} = \lambda_1 y_t - \lambda_2^{-1} \sum_{j=0}^{\infty} \lambda_2^{-j} u_{t+j+1} \tag{7}$$
Thus, we obtained equation (7) by solving stable roots (in this case 𝜆1 ) backward, and un-
stable roots (in this case 𝜆2 ) forward.
Equation (7) has a form that we shall encounter often.
$\lambda_1 y_t$ is called the feedback part and $-\frac{\lambda_2^{-1}}{1 - \lambda_2^{-1} L^{-1}} u_{t+1}$ is called the feedforward part of the solution.
Let
• 𝑚𝑑𝑡 be the log of the demand for money
• 𝑚𝑡 be the log of the supply of money
• 𝑝𝑡 be the log of the price level
It follows that 𝑝𝑡+1 − 𝑝𝑡 is the rate of inflation.
The logarithm of the demand for real money balances $m_t^d - p_t$ is an inverse function of the expected rate of inflation $p_{t+1} - p_t$ for $t \geq 0$:
$$m_t^d - p_t = -\beta (p_{t+1} - p_t), \qquad \beta > 0$$
Equate the demand for log money $m_t^d$ to the supply of log money $m_t$ in the above equation and rearrange to deduce that the logarithm of the price level $p_t$ is related to the logarithm of the money supply $m_t$ by
$$p_t = (1 - \lambda) m_t + \lambda p_{t+1} \tag{8}$$
where $\lambda \equiv \frac{\beta}{1+\beta} \in (0, 1)$.

Solving (8) forward gives
$$p_t = (1 - \lambda) \sum_{j=0}^{\infty} \lambda^j m_{t+j} \tag{9}$$
which is the unique stable solution of difference equation (8) among a class of more general
solutions
$$p_t = (1 - \lambda) \sum_{j=0}^{\infty} \lambda^j m_{t+j} + c\lambda^{-t} \tag{10}$$
where $c$ is an arbitrary real number.
We assume that the money supply $m_t$ is governed by a linear state-space system
$$m_t = G x_t, \qquad x_{t+1} = A x_t \tag{11}$$
For example, the log money supply might obey the second-order autoregression
$$m_{t+1} = \alpha + \rho_1 m_t + \rho_2 m_{t-1}$$
where the zeros of the characteristic polynomial $(1 - \rho_1 z - \rho_2 z^2)$ are strictly greater than 1 in modulus.
We seek a stable or non-explosive solution of the difference equation (8) that obeys the sys-
tem comprised of (8)-(11).
By stable or non-explosive, we mean that neither 𝑚𝑡 nor 𝑝𝑡 diverges as 𝑡 → +∞.
This means that we are shutting down the term $c\lambda^{-t}$ in equation (10) above by setting $c = 0$.

The solution we are after is
𝑝𝑡 = 𝐹 𝑥𝑡 (12)
where
$$F = (1 - \lambda) G (I - \lambda A)^{-1} \tag{13}$$
Note: As mentioned above, an explosive solution of difference equation (8) can be constructed by adding to the right-hand side of (12) a sequence $c\lambda^{-t}$ where $c$ is an arbitrary positive constant.
We'll study the special case in which the log money supply obeys the second-order autoregression
$$m_{t+1} = \alpha + \rho_1 m_t + \rho_2 m_{t-1} \tag{14}$$
that is parameterized by $\rho_1, \rho_2, \alpha$.
To capture this parameterization with system (11) we set
$$x_t = \begin{bmatrix} 1 \\ m_t \\ m_{t-1} \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 0 & 0 \\ \alpha & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}, \quad G = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$$
In [3]: λ = .9
α = 0
ρ1 = .9
ρ2 = .05
A = np.array([[1, 0, 0],
[α, ρ1, ρ2],
[0, 1, 0]])
G = np.array([[0, 1, 0]])
The matrix 𝐴 has one eigenvalue equal to unity that is associated with the 𝐴11 component
that captures a constant component of the state 𝑥𝑡 .
We can verify that the two eigenvalues of 𝐴 not associated with the constant in the state 𝑥𝑡
are strictly less than unity in modulus.
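A check along the following lines produces the output shown below (a sketch; the lecture's own cell is not displayed here):

```python
import numpy as np

# the A matrix from the cell above
α, ρ1, ρ2 = 0, 0.9, 0.05
A = np.array([[1,  0,  0],
              [α, ρ1, ρ2],
              [0,  1,  0]])

# sorted moduli of the eigenvalues of A
eigvals = np.sort(np.abs(np.linalg.eigvals(A)))
print(eigvals)
```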
[0.05249378 0.95249378 1. ]
Out[5]: True
# compute F via formula (13), set the initial state, and simulate
F = (1 - λ) * G @ np.linalg.inv(np.eye(3) - λ * A)
T = 100
x0 = np.array([1, 1, 0])
x_old = x0

m_seq = np.empty(T+1)
p_seq = np.empty(T+1)
m_seq[0] = G @ x0
p_seq[0] = F @ x0
for t in range(T):
    x = A @ x_old
    m_seq[t+1] = G @ x
    p_seq[t+1] = F @ x
    x_old = x
In [8]: plt.figure()
plt.plot(range(T+1), m_seq, label='$m_t$')
plt.plot(range(T+1), p_seq, label='$p_t$')
plt.xlabel('t')
plt.title(f'λ={λ}, α={α}, $ρ_1$={ρ1}, $ρ_2$={ρ2}')
plt.legend()
plt.show()
In the above graph, why is the log of the price level always less than the log of the money
supply?
The answer is because
• according to equation (9), 𝑝𝑡 is a geometric weighted average of current and future val-
ues of 𝑚𝑡 , and
• it happens that in this example future 𝑚’s are always less than the current 𝑚
We could also have run the simulation using the quantecon LinearStateSpace code.
The following code block performs the calculation with that code.
# stack G and F
G_ext = np.vstack([G, F])
C = np.zeros((A.shape[0], 1))
In [10]: T = 100
         # assumes LinearStateSpace has been imported from quantecon
         ss = LinearStateSpace(A, C, G_ext, mu_0=x0)
         x, y = ss.simulate(ts_length=T)
# plot
plt.figure()
plt.plot(range(T), y[0,:], label='$m_t$')
plt.plot(range(T), y[1,:], label='$p_t$')
plt.xlabel('t')
plt.title(f'λ={λ}, α={α}, $ρ_1$={ρ1}, $ρ_2$={ρ2}')
plt.legend()
plt.show()
To simplify our presentation in ways that will let us focus on an important idea, in the above second-order difference equation (14) that governs $m_t$, we now set $\alpha = 0$, $\rho_1 = \rho \in (-1, 1)$, and $\rho_2 = 0$, so that the law of motion for $m_t$ becomes
$$m_{t+1} = \rho m_t \tag{15}$$
and the state reduces to $x_t = m_t$.

Consequently, formula (13) for $F$ reduces to the scalar
$$F = (1 - \lambda)(1 - \lambda\rho)^{-1}$$
and the stable solution is
$$p_t = F m_t.$$
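A two-line check confirms where the scalar formula comes from. With $m_{t+1} = \rho m_t$, the guess $p_t = F m_t$ satisfies the Cagan equation $p_t = (1-\lambda)m_t + \lambda p_{t+1}$ exactly when $F = (1-\lambda) + \lambda\rho F$, whose solution is the formula above (a quick sketch with the simulation's parameter values):

```python
λ, ρ = 0.9, 0.9
F = (1 - λ) / (1 - λ * ρ)

m = 2.0                               # an arbitrary value of log money
lhs = F * m                           # p_t under the guess
rhs = (1 - λ) * m + λ * (F * ρ * m)   # (1-λ) m_t + λ p_{t+1}
print(F, lhs - rhs)
```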
Please keep these formulas in mind as we investigate an alternative route to and interpreta-
tion of the formula for 𝐹 .
Above, we imposed stability or non-explosiveness on the solution of the key difference equa-
tion (8) in Cagan’s model by solving the unstable root 𝜆−1 forward.
To shed light on the mechanics involved in imposing stability on a solution of a potentially
unstable system of linear difference equations and to prepare the way for generalizations of
our model in which the money supply is allowed to feed back on the price level itself, we stack
equations (8) and (15) to form the system
$$\begin{bmatrix} m_{t+1} \\ p_{t+1} \end{bmatrix} = \begin{bmatrix} \rho & 0 \\ -(1-\lambda)/\lambda & \lambda^{-1} \end{bmatrix} \begin{bmatrix} m_t \\ p_t \end{bmatrix} \tag{16}$$
or
$$y_{t+1} = H y_t \tag{17}$$
where $y_t := \begin{bmatrix} m_t \\ p_t \end{bmatrix}$ and
$$H = \begin{bmatrix} \rho & 0 \\ -(1-\lambda)/\lambda & \lambda^{-1} \end{bmatrix}. \tag{18}$$
𝐻 = 𝑄Λ𝑄−1 .
Here Λ is a diagonal matrix of eigenvalues of 𝐻 and 𝑄 is a matrix whose columns are eigen-
vectors of the corresponding eigenvalues.
Note that
𝐻 𝑡 = 𝑄Λ𝑡 𝑄−1
so that
𝑦𝑡 = 𝑄Λ𝑡 𝑄−1 𝑦0
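The identity $H^t = Q\Lambda^t Q^{-1}$ is easy to confirm numerically (illustrative parameter values):

```python
import numpy as np

λ, ρ = 0.5, 0.9
H = np.array([[ρ, 0.],
              [-(1 - λ) / λ, 1 / λ]])

eigvals, Q = np.linalg.eig(H)
Λ = np.diag(eigvals)

H5_direct = np.linalg.matrix_power(H, 5)
H5_eig = Q @ np.linalg.matrix_power(Λ, 5) @ np.linalg.inv(Q)
print(np.allclose(H5_direct, H5_eig))
```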
For almost all initial vectors 𝑦0 , the presence of the eigenvalue 𝜆−1 > 1 causes both compo-
nents of 𝑦𝑡 to diverge in absolute value to +∞.
To explore this outcome in more detail, we use the following transformation
𝑦𝑡∗ = 𝑄−1 𝑦𝑡
that allows us to represent the dynamics in a way that isolates the source of the propensity of
paths to diverge:
$$y_{t+1}^* = \Lambda y_t^*$$
Because $\lambda^{-1} > 1$, unless
$$y_0^* = \begin{bmatrix} y_{1,0}^* \\ 0 \end{bmatrix}, \tag{19}$$
the path of $y_t^*$ and therefore the paths of both components of $y_t = Q y_t^*$ will diverge in absolute value as $t \to +\infty$. (We say that such paths explode.)
Equation (19) also leads us to conclude that there is a unique setting for the initial vector 𝑦0
for which both components of 𝑦𝑡 do not diverge.
The required setting of 𝑦0 must evidently have the property that
$$Q^{-1} y_0 = y_0^* = \begin{bmatrix} y_{1,0}^* \\ 0 \end{bmatrix}.$$
0
But note that since $y_0 = \begin{bmatrix} m_0 \\ p_0 \end{bmatrix}$ and $m_0$ is given to us as an initial condition, it has to be $p_0$ that does all the adjusting to satisfy this equation.
Sometimes this situation is described by saying that while 𝑚0 is truly a state variable, 𝑝0 is a
jump variable that is free to adjust at 𝑡 = 0 in order to satisfy the equation.
Thus, in a nutshell the unique value of the vector 𝑦0 for which the paths of 𝑦𝑡 do not diverge
must have second component 𝑝0 that verifies equality (19) by setting the second component of
𝑦0∗ equal to zero.
The component $p_0$ of the initial vector $y_0 = \begin{bmatrix} m_0 \\ p_0 \end{bmatrix}$ must evidently satisfy
$$Q^{\{2\}} y_0 = 0$$
where $Q^{\{2\}}$ denotes the second row of $Q^{-1}$, a restriction that is equivalent to
$$Q^{21} m_0 + Q^{22} p_0 = 0 \tag{20}$$
where $Q^{21}, Q^{22}$ denote the $(2,1)$ and $(2,2)$ components of $Q^{-1}$, or
$$p_0 = -(Q^{22})^{-1} Q^{21} m_0. \tag{21}$$
We can get an even more convenient formula for 𝑝0 that is cast in terms of components of 𝑄
instead of components of 𝑄−1 .
To get this formula, first note that because $(Q^{21}\ Q^{22})$ is the second row of the inverse of $Q$ and because $Q^{-1} Q = I$, it follows that
$$\begin{bmatrix} Q^{21} & Q^{22} \end{bmatrix} \begin{bmatrix} Q_{11} \\ Q_{21} \end{bmatrix} = 0$$
Therefore,
$$-(Q^{22})^{-1} Q^{21} = Q_{21} Q_{11}^{-1}$$
So we can write
$$p_0 = Q_{21} Q_{11}^{-1} m_0. \tag{22}$$
It can be verified that this formula replicates itself over time so that
$$p_t = Q_{21} Q_{11}^{-1} m_t. \tag{23}$$
To implement formula (23), we want to compute $Q_1$, the eigenvector of $H$ associated with the stable eigenvalue $\rho$ of $H$.
By hand it can be verified that the eigenvector associated with the stable eigenvalue $\rho$ is proportional to
$$Q_1 = \begin{bmatrix} 1 - \lambda\rho \\ 1 - \lambda \end{bmatrix}.$$
Thus, partitioning $Q_1$ as
$$Q_1 = \begin{bmatrix} Q_{11} \\ Q_{21} \end{bmatrix},$$
formula (23) implies
$$p_t = Q_{21} Q_{11}^{-1} m_t = \frac{1 - \lambda}{1 - \lambda\rho}\, m_t,$$
which agrees with the formula $F = (1 - \lambda)(1 - \lambda\rho)^{-1}$ obtained earlier.
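We can let NumPy find the same object (a sketch with illustrative parameter values; `np.linalg.eig` normalizes eigenvectors differently than the by-hand calculation, but the ratio $Q_{21}/Q_{11}$ is invariant to scale):

```python
import numpy as np

λ, ρ = 0.5, 0.9
H = np.array([[ρ, 0.],
              [-(1 - λ) / λ, 1 / λ]])

eigvals, Q = np.linalg.eig(H)
stable = np.argmin(np.abs(eigvals))    # index of the stable eigenvalue ρ
F_star = Q[1, stable] / Q[0, stable]   # = Q21 / Q11

print(eigvals[stable], F_star)
```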
We have expressed (16) in what superficially appears to be a form in which $y_{t+1}$ feeds back on $y_t$, even though what we actually want to represent is that the component $p_t$ feeds forward on $p_{t+1}$ and, through it, on future values $m_{t+j}$, $j = 0, 1, 2, \ldots$.
A tell-tale sign that we should look beyond its superficial “feedback” form is that 𝜆−1 > 1 so
that the matrix 𝐻 in (16) is unstable
• it has one eigenvalue 𝜌 that is less than one in modulus that does not imperil stability,
but …
• it has a second eigenvalue 𝜆−1 that exceeds one in modulus and that makes 𝐻 an unsta-
ble matrix
We’ll keep these observations in mind as we turn now to a case in which the log money sup-
ply actually does feed back on the log of the price level.
The same pattern of eigenvalues splitting around unity, with one being below unity and an-
other greater than unity, sometimes continues to prevail when there is feedback from the log
price level to the log money supply.
Let the feedback rule be
$$m_{t+1} = \rho m_t + \delta p_t$$
𝑦𝑡+1 = 𝐻𝑦𝑡
now becomes
$$H = \begin{bmatrix} \rho & \delta \\ -(1-\lambda)/\lambda & \lambda^{-1} \end{bmatrix}.$$
We take $m_0$ as a given initial condition and, as before, seek an initial value $p_0$ that stabilizes the system in the sense that $y_t$ converges as $t \to +\infty$.
Our approach is identical with that followed above and is based on an eigenvalue decomposi-
tion in which, cross our fingers, one eigenvalue exceeds unity and the other is less than unity
in absolute value.
When $\delta \neq 0$, as we now assume, the eigenvalues of $H$ are no longer $\rho \in (0, 1)$ and $\lambda^{-1} > 1$.

We'll just calculate them and apply the same algorithm that we used above.
That algorithm remains valid so long as the eigenvalues split around unity as before.
Again we assume that 𝑚0 is an initial condition, but that 𝑝0 is not given but to be solved for.
Let’s write and execute some Python code that will let us explore how outcomes depend on 𝛿.
def construct_H(ρ, λ, δ):
    "Construct the matrix H given parameters ρ, λ, δ."
    H = np.empty((2, 2))
    H[0, :] = ρ, δ
    H[1, :] = -(1 - λ) / λ, 1 / λ
    return H
def H_eigvals(ρ=0.9, λ=0.5, δ=0):
    "Compute the eigenvalues of H (default parameter values are illustrative)."
    # construct H matrix
    H = construct_H(ρ, λ, δ)
    # compute eigenvalues
    eigvals = np.linalg.eigvals(H)
    return eigvals
In [12]: H_eigvals()
Notice that a negative δ will not imperil the stability of the matrix 𝐻, even if it has a big
absolute value.
But a large enough positive δ makes both eigenvalues of 𝐻 strictly greater than unity in mod-
ulus.
For example,
In [16]: H_eigvals(δ=0.2)
We want to study systems in which one eigenvalue exceeds unity in modulus while the other
is less than unity in modulus, so we avoid values of 𝛿 that are too large
def solve_p0(m0, ρ=0.9, λ=0.5, δ=0):
    "Compute the stabilizing p0 = Q21 Q11^{-1} m0, if one exists."
    # (function name and default parameter values are illustrative)
    H = construct_H(ρ, λ, δ)
    eigvals, Q = np.linalg.eig(H)
    # locate the eigenvalue that is smallest in modulus
    ind = np.argmin(np.abs(eigvals))
    if np.abs(eigvals[ind]) > 1:
        # both eigenvalues exceed unity in modulus: no stabilizing p0
        return None
    p0 = Q[1, ind] / Q[0, ind] * m0
    return p0
Let’s plot how the solution 𝑝0 changes as 𝑚0 changes for different settings of 𝛿.
plt.xlabel("$m_0$")
plt.ylabel("$p_0$")
plt.show()
To look at things from a different angle, we can fix the initial value 𝑚0 and see how 𝑝0
changes as 𝛿 changes.
In [19]: m0 = 1
Notice that when 𝛿 is large enough, both eigenvalues exceed unity in modulus, causing a sta-
bilizing value of 𝑝0 not to exist.
It is helpful to view our solutions with feedback from the price level or inflation to money or
the rate of money creation in terms of the Big 𝐾, little 𝑘 idea discussed in Rational Expecta-
tions Models
This will help us sort out what is taken as given by the decision makers who use the differ-
ence equation (9) to determine 𝑝𝑡 as a function of their forecasts of future values of 𝑚𝑡 .
Let’s write the stabilizing solution that we have computed using the eigenvector decomposi-
tion of 𝐻 as 𝑃𝑡 = 𝐹 ∗ 𝑚𝑡 where
𝐹 ∗ = 𝑄21 𝑄−1
11
Then from 𝑃𝑡+1 = 𝐹 ∗ 𝑚𝑡+1 and 𝑚𝑡+1 = 𝜌𝑚𝑡 + 𝛿𝑃𝑡 we can deduce the recursion 𝑃𝑡+1 =
𝐹 ∗ 𝜌𝑚𝑡 + 𝐹 ∗ 𝛿𝑃𝑡 and create the stacked system
$$\begin{bmatrix} m_{t+1} \\ P_{t+1} \end{bmatrix} = \begin{bmatrix} \rho & \delta \\ F^*\rho & F^*\delta \end{bmatrix} \begin{bmatrix} m_t \\ P_t \end{bmatrix}$$
or
𝑥𝑡+1 = 𝐴𝑥𝑡
where $x_t = \begin{bmatrix} m_t \\ P_t \end{bmatrix}$.
Then apply formula (13) for 𝐹 to deduce that
$$p_t = F \begin{bmatrix} m_t \\ P_t \end{bmatrix} = F \begin{bmatrix} m_t \\ F^* m_t \end{bmatrix}$$
or
$$p_t = \begin{bmatrix} F_1 & F_2 \end{bmatrix} \begin{bmatrix} m_t \\ F^* m_t \end{bmatrix} = F_1 m_t + F_2 F^* m_t$$
If the Big $P$, little $p$ reasoning is correct, so that $p_t = P_t = F^* m_t$, then
$$F^* = F_1 + F_2 F^*$$
We verify this equality in the next block of Python code that implements the following com-
putations.
1. For the system with 𝛿 ≠ 0 so that there is feedback, we compute the stabilizing solution
for 𝑝𝑡 in the form 𝑝𝑡 = 𝐹 ∗ 𝑚𝑡 where 𝐹 ∗ = 𝑄21 𝑄−1
11 as above.
2. Recalling the system (11), (12), and (13) above, we define $x_t = \begin{bmatrix} m_t \\ P_t \end{bmatrix}$ and notice that it is Big $P_t$ and not little $p_t$ here. Then we form $A$ and $G$ as
$$A = \begin{bmatrix} \rho & \delta \\ F^*\rho & F^*\delta \end{bmatrix}, \qquad G = \begin{bmatrix} 1 & 0 \end{bmatrix}$$
and we compute $\begin{bmatrix} F_1 & F_2 \end{bmatrix} \equiv F$ from equation (13) above.
Out[22]: 0.9501243788791095
G = np.array([1, 0])
F_check = (1 - λ) * G @ np.linalg.inv(np.eye(2) - λ * A)
F_check
Compare 𝐹 ∗ with 𝐹1 + 𝐹2 𝐹 ∗
This section is a small gift for readers who have made it this far.
It puts Sympy to work on our model.
Thus, we use Sympy to compute some of the key objects comprising the eigenvector decompo-
sition of 𝐻.
First, we form $H$ with nonzero $\delta$.
In [27]: H1
Out[27]: $\begin{bmatrix} \rho & \delta \\ \frac{\lambda - 1}{\lambda} & \frac{1}{\lambda} \end{bmatrix}$
In [28]: H1.eigenvals()
In [29]: H1.eigenvects()
Out[29]: (The symbolic eigenvalues and eigenvectors of $H_1$ are lengthy expressions in $\rho$, $\lambda$, and $\delta$; we omit them and turn to the cleaner special case with $\delta = 0$.)
In [31]: H2
Out[31]: $\begin{bmatrix} \rho & 0 \\ \frac{\lambda - 1}{\lambda} & \frac{1}{\lambda} \end{bmatrix}$
In [32]: H2.eigenvals()
Out[32]: $\left\{ \frac{1}{\lambda} : 1,\ \rho : 1 \right\}$
In [33]: H2.eigenvects()
Out[33]: $\left[ \left( \frac{1}{\lambda},\ 1,\ \left[ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right] \right),\ \left( \rho,\ 1,\ \left[ \begin{bmatrix} \frac{\lambda\rho - 1}{\lambda - 1} \\ 1 \end{bmatrix} \right] \right) \right]$
1. We compute the matrix $Q$ whose first column is the eigenvector associated with $\rho$ and whose second column is the eigenvector associated with $\lambda^{-1}$.
4. Where $Q^{ij}$ denotes the $(i,j)$ component of $Q^{-1}$, we use Sympy to compute $-(Q^{22})^{-1} Q^{21}$ (again in symbols).
In [34]: # construct Q
vec = []
for i, (eigval, _, eigvec) in enumerate(H2.eigenvects()):
vec.append(eigvec[0])
if eigval == ρ:
ind = i
Q = vec[ind].col_insert(1, vec[1 - ind])
In [35]: Q
Out[35]: $\begin{bmatrix} \frac{\lambda\rho - 1}{\lambda - 1} & 0 \\ 1 & 1 \end{bmatrix}$
Next, $Q^{-1}$:
Out[36]: $\begin{bmatrix} \frac{\lambda - 1}{\lambda\rho - 1} & 0 \\ -\frac{\lambda - 1}{\lambda\rho - 1} & 1 \end{bmatrix}$
$Q_{21} Q_{11}^{-1}$:

Out[37]: $\dfrac{\lambda - 1}{\lambda\rho - 1}$

Out[38]: $\dfrac{\lambda - 1}{\lambda\rho - 1}$
Chapter 48

Markov Perfect Equilibrium
48.1 Contents
• Overview 48.2
• Background 48.3
• Linear Markov Perfect Equilibria 48.4
• Application 48.5
• Exercises 48.6
• Solutions 48.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
48.2 Overview
794 CHAPTER 48. MARKOV PERFECT EQUILIBRIUM
48.3 Background
Two firms are the only producers of a good, the demand for which is governed by a linear in-
verse demand function
𝑝 = 𝑎0 − 𝑎1 (𝑞1 + 𝑞2 ) (1)
Here 𝑝 = 𝑝𝑡 is the price of the good, 𝑞𝑖 = 𝑞𝑖𝑡 is the output of firm 𝑖 = 1, 2 at time 𝑡 and
𝑎0 > 0, 𝑎1 > 0.
In (1) and what follows,
• the time subscript is suppressed when possible to simplify notation
• 𝑥̂ denotes a next period value of variable 𝑥
Each firm recognizes that its output affects total output and therefore the market price.
The one-period payoff function of firm $i$ is price times quantity minus adjustment costs:
$$\pi_i = p q_i - \gamma (\hat q_i - q_i)^2, \qquad \gamma > 0 \tag{2}$$
Substituting the inverse demand curve (1) into (2) lets us express the one-period payoff as
$$\pi_i(q_i, q_{-i}, \hat q_i) = a_0 q_i - a_1 q_i^2 - a_1 q_i q_{-i} - \gamma (\hat q_i - q_i)^2 \tag{3}$$
where $q_{-i}$ denotes the output of the firm other than $i$. Firm $i$ chooses a decision rule that sets next-period quantity $\hat q_i$ as a function $f_i$ of the current state $(q_i, q_{-i})$; its value function obeys the Bellman equation
𝑣𝑖 (𝑞𝑖 , 𝑞−𝑖 ) = max {𝜋𝑖 (𝑞𝑖 , 𝑞−𝑖 , 𝑞𝑖̂ ) + 𝛽𝑣𝑖 (𝑞𝑖̂ , 𝑓−𝑖 (𝑞−𝑖 , 𝑞𝑖 ))} (4)
𝑞𝑖̂
Definition A Markov perfect equilibrium of the duopoly model is a pair of value functions
(𝑣1 , 𝑣2 ) and a pair of policy functions (𝑓1 , 𝑓2 ) such that, for each 𝑖 ∈ {1, 2} and each possible
state,
• The value function 𝑣𝑖 satisfies Bellman equation (4).
• The maximizer on the right side of (4) equals 𝑓𝑖 (𝑞𝑖 , 𝑞−𝑖 ).
The adjective “Markov” denotes that the equilibrium decision rules depend only on the cur-
rent values of the state variables, not other parts of their histories.
“Perfect” means complete, in the sense that the equilibrium is constructed by backward in-
duction and hence builds in optimizing behavior for each firm at all possible future states.
• These include many states that will not be reached when we iterate forward
on the pair of equilibrium strategies 𝑓𝑖 starting from a given initial state.
48.3.2 Computation
One strategy for computing a Markov perfect equilibrium is iterating to convergence on pairs
of Bellman equations and decision rules.
In particular, let 𝑣𝑖𝑗 , 𝑓𝑖𝑗 be the value function and policy function for firm 𝑖 at the 𝑗-th itera-
tion.
Imagine constructing the iterates
𝑣𝑖𝑗+1 (𝑞𝑖 , 𝑞−𝑖 ) = max {𝜋𝑖 (𝑞𝑖 , 𝑞−𝑖 , 𝑞𝑖̂ ) + 𝛽𝑣𝑖𝑗 (𝑞𝑖̂ , 𝑓−𝑖 (𝑞−𝑖 , 𝑞𝑖 ))} (5)
𝑞𝑖̂
As we saw in the duopoly example, the study of Markov perfect equilibria in games with two
players leads us to an interrelated pair of Bellman equations.
In linear-quadratic dynamic games, these “stacked Bellman equations” become “stacked Ric-
cati equations” with a tractable mathematical structure.
We’ll lay out that structure in a general setup and then apply it to some simple problems.
Player $i$ chooses a sequence of controls to minimize
$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} \right\} \tag{6}$$
Here
• 𝑥𝑡 is an 𝑛 × 1 state vector and 𝑢𝑖𝑡 is a 𝑘𝑖 × 1 vector of controls for player 𝑖
• 𝑅𝑖 is 𝑛 × 𝑛
• 𝑆𝑖 is 𝑘−𝑖 × 𝑘−𝑖
• 𝑄𝑖 is 𝑘𝑖 × 𝑘𝑖
• 𝑊𝑖 is 𝑛 × 𝑘𝑖
• 𝑀𝑖 is 𝑘−𝑖 × 𝑘𝑖
• 𝐴 is 𝑛 × 𝑛
• $B_i$ is $n \times k_i$

The state evolves according to the law of motion
$$x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t} \tag{7}$$
Taking the other player's sequence of decision rules $\{F_{2t}\}$ as given, player 1's problem is to minimize
$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' \Pi_{1t} x_t + u_{1t}' Q_1 u_{1t} + 2 u_{1t}' \Gamma_{1t} x_t \right\} \tag{8}$$
subject to
$$x_{t+1} = \Lambda_{1t} x_t + B_1 u_{1t} \tag{9}$$
where
• $\Lambda_{it} := A - B_{-i} F_{-it}$
• $\Pi_{it} := R_i + F_{-it}' S_i F_{-it}$
• $\Gamma_{it} := W_i' - M_i' F_{-it}$
𝐹1𝑡 = (𝑄1 + 𝛽𝐵1′ 𝑃1𝑡+1 𝐵1 )−1 (𝛽𝐵1′ 𝑃1𝑡+1 Λ1𝑡 + Γ1𝑡 ) (10)
𝑃1𝑡 = Π1𝑡 −(𝛽𝐵1′ 𝑃1𝑡+1 Λ1𝑡 +Γ1𝑡 )′ (𝑄1 +𝛽𝐵1′ 𝑃1𝑡+1 𝐵1 )−1 (𝛽𝐵1′ 𝑃1𝑡+1 Λ1𝑡 +Γ1𝑡 )+𝛽Λ′1𝑡 𝑃1𝑡+1 Λ1𝑡 (11)
𝐹2𝑡 = (𝑄2 + 𝛽𝐵2′ 𝑃2𝑡+1 𝐵2 )−1 (𝛽𝐵2′ 𝑃2𝑡+1 Λ2𝑡 + Γ2𝑡 ) (12)
𝑃2𝑡 = Π2𝑡 −(𝛽𝐵2′ 𝑃2𝑡+1 Λ2𝑡 +Γ2𝑡 )′ (𝑄2 +𝛽𝐵2′ 𝑃2𝑡+1 𝐵2 )−1 (𝛽𝐵2′ 𝑃2𝑡+1 Λ2𝑡 +Γ2𝑡 )+𝛽Λ′2𝑡 𝑃2𝑡+1 Λ2𝑡 (13)
Key Insight
A key insight is that equations (10) and (12) are linear in 𝐹1𝑡 and 𝐹2𝑡 .
After these equations have been solved, we can take 𝐹𝑖𝑡 and solve for 𝑃𝑖𝑡 in (11) and (13).
Infinite Horizon
We often want to compute the solutions of such games for infinite horizons, in the hope that
the decision rules 𝐹𝑖𝑡 settle down to be time-invariant as 𝑡1 → +∞.
In practice, we usually fix 𝑡1 and compute the equilibrium of an infinite horizon game by driv-
ing 𝑡0 → −∞.
This is the approach we adopt in the next section.
48.4.3 Implementation
We use the function nnash from QuantEcon.py that computes a Markov perfect equilibrium
of the infinite horizon linear-quadratic dynamic game in the manner described above.
48.5 Application
Let’s use these procedures to treat some applications, starting with the duopoly model.
To map the duopoly model into coupled linear-quadratic dynamic programming problems,
define the state and controls as
$$x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix} \quad \text{and} \quad u_{it} := q_{i,t+1} - q_{it}, \qquad i = 1, 2$$
If we write $x_t' R_i x_t + u_{it}' Q_i u_{it}$ for $-\pi_{it}$, where $Q_1 = Q_2 = \gamma$,
$$R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix} \quad \text{and} \quad R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}$$
then we recover the one-period payoffs in expression (6).

The law of motion for the state $x_t$ is $x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t}$, where
$$A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
The optimal decision rule of firm $i$ will take the form $u_{it} = -F_i x_t$, inducing the following closed-loop system for the evolution of $x_t$ in the Markov perfect equilibrium:
$$x_{t+1} = (A - B_1 F_1 - B_2 F_2) x_t \tag{14}$$
Consider the previously presented duopoly model with parameter values of:
• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12
From these, we compute the infinite horizon MPE using the preceding code
# Parameters
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# In LQ form
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
R1 = np.array([[0., -a0 / 2, 0.], [-a0 / 2, a1, a1 / 2], [0., a1 / 2, 0.]])
R2 = np.array([[0., 0., -a0 / 2], [0., 0., a1 / 2], [-a0 / 2, a1 / 2, a1]])
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0
# Solve using QE's nnash function
F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1, Q2, S1, S2,
                          W1, W2, M1, M2, beta=β)
# Display policies
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
print("\n")
In [4]: Λ1 = A - B2 @ F2
lq1 = qe.LQ(Q1, R1, Λ1, B1, beta=β)
P1_ih, F1_ih, d = lq1.stationary_values()
F1_ih
This is close enough for rock and roll, as they say in the trade.
Indeed, np.allclose agrees with our assessment
Out[5]: True
48.5.3 Dynamics
Let’s now investigate the dynamics of price and output in this simple duopoly model under
the MPE policies.
Given our optimal policies 𝐹 1 and 𝐹 2, the state evolves according to (14).
The following program
• imports 𝐹 1 and 𝐹 2 from the previous program along with all parameters.
• computes the evolution of 𝑥𝑡 using (14).
• extracts and plots industry output 𝑞𝑡 = 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡 .
In [6]: AF = A - B1 @ F1 - B2 @ F2
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
    x[:, t+1] = AF @ x[:, t]
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2       # Total output, MPE
p = a0 - a1 * q   # Price, MPE
Note that the initial condition has been set to 𝑞10 = 𝑞20 = 1.0.
To gain some perspective we can compare this to what happens in the monopoly case.
The first panel in the next figure compares output of the monopolist and industry output un-
der the MPE, as a function of time.
Here parameters are the same as above for both the MPE and monopoly solutions.
The monopolist initial condition is 𝑞0 = 2.0 to mimic the industry initial condition 𝑞10 =
𝑞20 = 1.0 in the MPE case.
As expected, output is higher and prices are lower under duopoly than monopoly.
48.6 Exercises
48.6.1 Exercise 1
Replicate the pair of figures showing the comparison of output and prices for the monopolist
and duopoly under MPE.
Parameters are as in duopoly_mpe.py and you can use that code to compute MPE policies
under duopoly.
The optimal policy in the monopolist case can be computed using QuantEcon.py’s LQ class.
48.6.2 Exercise 2
It takes the form of an infinite horizon linear-quadratic game proposed by Judd [63].
Two firms set prices and quantities of two goods interrelated through their demand curves.
Relevant variables are defined as follows:
• 𝐼𝑖𝑡 = inventories of firm 𝑖 at beginning of 𝑡
• 𝑞𝑖𝑡 = production of firm 𝑖 during period 𝑡
• 𝑝𝑖𝑡 = price charged by firm 𝑖 during period 𝑡
• 𝑆𝑖𝑡 = sales made by firm 𝑖 during period 𝑡
• 𝐸𝑖𝑡 = costs of production of firm 𝑖 during period 𝑡
• 𝐶𝑖𝑡 = costs of carrying inventories for firm 𝑖 during 𝑡
The firms’ cost functions are
• $C_{it} = c_{i1} + c_{i2} I_{it} + 0.5 c_{i3} I_{it}^2$
• $E_{it} = e_{i1} + e_{i2} q_{it} + 0.5 e_{i3} q_{it}^2$ where $e_{ij}, c_{ij}$ are positive scalars
Inventories obey the law of motion

$$I_{i,t+1} = (1 - \delta) I_{it} + q_{it} - S_{it}$$

where $\delta$ is a depreciation rate.

Demand is governed by the linear schedule

$$S_t = D p_t + b$$

where

• $S_t = \begin{bmatrix} S_{1t} & S_{2t} \end{bmatrix}'$ and $p_t = \begin{bmatrix} p_{1t} & p_{2t} \end{bmatrix}'$
• $D$ is a $2 \times 2$ negative definite matrix and
• $b$ is a vector of constants
Firm 𝑖 maximizes the undiscounted sum
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} \left( p_{it} S_{it} - E_{it} - C_{it} \right)$$

We can convert this to a linear-quadratic problem by taking

$$u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix}
\quad \text{and} \quad
x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}$$
Decision rules for price and quantity take the form 𝑢𝑖𝑡 = −𝐹𝑖 𝑥𝑡 .
The Markov perfect equilibrium of Judd’s model can be computed by filling in the matrices
appropriately.
The exercise is to calculate these matrices and compute the following figures.
The first figure shows the dynamics of inventories for each firm when the parameters are
In [7]: δ = 0.02
        D = np.array([[-1, 0.5], [0.5, -1]])
        b = np.array([25, 25])
        c1 = c2 = np.array([1, -2, 1])
        e1 = e2 = np.array([10, 10, 3])
48.7 Solutions
48.7.1 Exercise 1
First, let’s compute the duopoly MPE under the stated parameters
In [8]: # == Parameters == #
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# == In LQ form == #
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
        R1 = [[ 0.,       -a0 / 2,   0.],
              [-a0 / 2.,   a1,       a1 / 2.],
              [ 0,         a1 / 2.,  0.]]
        R2 = [[ 0.,        0.,      -a0 / 2],
              [ 0.,        0.,       a1 / 2.],
              [-a0 / 2,    a1 / 2.,  a1]]
        Q1 = Q2 = γ
        S1 = S2 = W1 = W2 = M1 = M2 = 0.0
        # == Solve for the MPE policies using QuantEcon's nnash == #
        F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1, Q2, S1, S2,
                                  W1, W2, M1, M2, beta=β)
Now we evaluate the time path of industry output and prices given initial condition 𝑞10 =
𝑞20 = 1.
In [9]: AF = A - B1 @ F1 - B2 @ F2
        n = 20
        x = np.empty((3, n))
        x[:, 0] = 1, 1, 1
        for t in range(n-1):
            x[:, t+1] = AF @ x[:, t]
        q1 = x[1, :]
        q2 = x[2, :]
        q = q1 + q2        # Total output, MPE
        p = a0 - a1 * q    # Price, MPE
The monopolist's problem can also be cast as an LQ problem. With $\bar q := a_0 / (2 a_1)$, define the state and control

$$x_t = q_t - \bar q \quad \text{and} \quad u_t = q_{t+1} - q_t$$

so that the problem has

$$R = a_1 \quad \text{and} \quad Q = \gamma$$

$$A = B = 1$$
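As a cross-check on this transformation, the scalar Riccati equation for the monopolist can be iterated by hand. This is only a sketch for intuition (the lecture itself relies on QuantEcon's LQ class); the values $a_1 = 2.0$, $\gamma = 12.0$, $\beta = 0.96$ are taken from the parameter block above.

```python
# Scalar discounted Riccati iteration for the monopolist problem,
# specialized to A = B = 1
a1, gamma, beta = 2.0, 12.0, 0.96
R, Q = a1, gamma

P = 0.0
for _ in range(1_000):
    # one step of the discounted algebraic Riccati recursion
    P = R + beta * P - (beta * P) ** 2 / (Q + beta * P)

F = beta * P / (Q + beta * P)   # optimal policy u = -F x
```

Since $F \in (0, 1)$, the deviation $x_t = q_t - \bar q$ shrinks geometrically, so monopoly output converges monotonically to $\bar q$.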
In [10]: R = a1
         Q = γ
         A = B = 1
         lq_alt = qe.LQ(Q, R, A, B, beta=β)
         P, F, d = lq_alt.stationary_values()
         q_bar = a0 / (2.0 * a1)
         qm = np.empty(n)
         qm[0] = 2
         x0 = qm[0] - q_bar
         x = x0
         for i in range(1, n):
             x = A * x - B * F * x
             qm[i] = float(x) + q_bar
         pm = a0 - a1 * qm
         fig, axes = plt.subplots(2, 1, figsize=(9, 9))
ax = axes[0]
ax.plot(qm, 'b', lw=2, alpha=0.75, label='monopolist output')
ax.plot(q, 'g', lw=2, alpha=0.75, label='MPE total output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
ax.plot(pm, 'b', lw=2, alpha=0.75, label='monopolist price')
ax.plot(p, 'g', lw=2, alpha=0.75, label='MPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()
48.7.2 Exercise 2
In [12]: δ = 0.02
         D = np.array([[-1, 0.5], [0.5, -1]])
         b = np.array([25, 25])
         c1 = c2 = np.array([1, -2, 1])
         e1 = e2 = np.array([10, 10, 3])
         δ_1 = 1 - δ
As before, the control and state are

$$u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix}
\quad \text{and} \quad
x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}$$
S1 = np.zeros((2, 2))
S2 = np.copy(S1)
         W1 = np.array([[           0.,         0.],
                        [           0.,         0.],
                        [-0.5 * e1[1],  b[0] / 2.]])
         W2 = np.array([[           0.,         0.],
                        [           0.,         0.],
                        [-0.5 * e2[1],  b[1] / 2.]])
Now let’s look at the dynamics of inventories, and reproduce the graph corresponding to 𝛿 =
0.02
In [15]: AF = A - B1 @ F1 - B2 @ F2
         n = 25
         x = np.empty((3, n))
         x[:, 0] = 2, 0, 1
         for t in range(n-1):
             x[:, t+1] = AF @ x[:, t]
I1 = x[0, :]
I2 = x[1, :]
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(I1, 'b', lw=2, alpha=0.75, label='inventories, firm 1')
ax.plot(I2, 'g', lw=2, alpha=0.75, label='inventories, firm 2')
ax.set_title(rf'$\delta = {δ}$')
ax.legend()
plt.show()
Chapter 49
Uncertainty Traps
49.1 Contents
• Overview 49.2
• The Model 49.3
• Implementation 49.4
• Results 49.5
• Exercises 49.6
• Solutions 49.7
49.2 Overview
The original model described in [37] has many interesting moving parts.
Here we examine a simplified version that nonetheless captures many of the key ideas.
49.3.1 Fundamentals

The evolution of the fundamental process $\{\theta_t\}$ is given by

$$\theta_{t+1} = \rho \theta_t + \sigma_\theta w_{t+1}$$

where
• 𝜎𝜃 > 0 and 0 < 𝜌 < 1
• {𝑤𝑡 } is IID and standard normal
The random variable 𝜃𝑡 is not observable at any time.
49.3.2 Output

There is a total $\bar M$ of risk-averse entrepreneurs. Output of the $m$-th entrepreneur, conditional on being active in the market at time $t$, equals

$$x_m = \theta + \epsilon_m \quad \text{where} \quad \epsilon_m \sim N(0, \gamma_x^{-1}) \tag{1}$$

Here the time subscript has been dropped to simplify notation.

49.3.3 Information and Beliefs

Dropping time subscripts, beliefs for current $\theta$ are represented by the normal distribution $N(\mu, \gamma^{-1})$.
Here 𝛾 is the precision of beliefs; its inverse is the degree of uncertainty.
These parameters are updated by Kalman filtering.
Let
• 𝕄 ⊂ {1, … , 𝑀̄ } denote the set of currently active firms.
• 𝑀 ∶= |𝕄| denote the number of currently active firms.
• $X$ be the average output $\frac{1}{M} \sum_{m \in \mathbb M} x_m$ of the active firms.
With this notation and primes for next period values, we can write the updating of the mean
and precision via
$$\mu' = \rho \, \frac{\gamma \mu + M \gamma_x X}{\gamma + M \gamma_x} \tag{2}$$

$$\gamma' = \left( \frac{\rho^2}{\gamma + M \gamma_x} + \sigma_\theta^2 \right)^{-1} \tag{3}$$
These are standard Kalman filtering results applied to the current setting.
Exercise 1 provides more details on how (2) and (3) are derived and then asks you to fill in
remaining steps.
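A direct transcription of (2) and (3) makes the updates concrete. This is a minimal sketch, using the parameter values $\rho = 0.99$, $\gamma_x = 0.5$, $\sigma_\theta = 0.5$ quoted below:

```python
def update_beliefs(μ, γ, X, M, ρ=0.99, γ_x=0.5, σ_θ=0.5):
    """One step of the belief updates (2) and (3)."""
    μ_new = ρ * (γ * μ + M * γ_x * X) / (γ + M * γ_x)
    γ_new = 1 / (ρ**2 / (γ + M * γ_x) + σ_θ**2)
    return μ_new, γ_new

# With no active firms (M = 0), X carries no information: the mean
# simply decays by ρ and precision drifts down toward its lower bound
μ1, γ1 = update_beliefs(μ=1.0, γ=4.0, X=0.0, M=0)
```

Note that with $M = 0$ precision necessarily falls, which is the seed of the uncertainty trap dynamics discussed below.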
The next figure plots the law of motion for the precision in (3) as a 45 degree diagram, with
one curve for each 𝑀 ∈ {0, … , 6}.
The other parameter values are 𝜌 = 0.99, 𝛾𝑥 = 0.5, 𝜎𝜃 = 0.5
Points where the curves hit the 45 degree lines are long-run steady states for precision for dif-
ferent values of 𝑀 .
Thus, if one of these values for 𝑀 remains fixed, a corresponding steady state is the equilib-
rium level of precision
• high values of 𝑀 correspond to greater information about the fundamental, and hence
more precision in steady state
• low values of 𝑀 correspond to less information and more uncertainty in steady state
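These steady states can be computed by iterating the map (3) to convergence. A sketch (note that, as the figure suggests, the map can have multiple fixed points for some values of $M$, so the limit found this way depends on the starting value):

```python
def precision_steady_state(M, γ0=1.0, ρ=0.99, γ_x=0.5, σ_θ=0.5, tol=1e-12):
    """Iterate γ' = (ρ² / (γ + M γ_x) + σ_θ²)^(-1) from γ0 to a fixed point."""
    γ = γ0
    for _ in range(100_000):
        γ_new = 1 / (ρ**2 / (γ + M * γ_x) + σ_θ**2)
        if abs(γ_new - γ) < tol:
            break
        γ = γ_new
    return γ_new

low = precision_steady_state(M=0)    # few firms  → low steady-state precision
high = precision_steady_state(M=6)   # many firms → high steady-state precision
```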
In practice, as we’ll see, the number of active firms fluctuates stochastically.
49.3.4 Participation
Omitting time subscripts once more, entrepreneurs enter the market in the current period if

$$\mathbb E[u(x_m - F_m)] > c \tag{4}$$

Here
• the mathematical expectation of 𝑥𝑚 is based on (1) and beliefs 𝑁 (𝜇, 𝛾 −1 ) for 𝜃
• 𝐹𝑚 is a stochastic but pre-visible fixed cost, independent across time and firms
• 𝑐 is a constant reflecting opportunity costs
The statement that $F_m$ is pre-visible means that it is realized at the start of the period and is observed by the entrepreneur before the entry decision is made.

The utility function has the constant absolute risk aversion form

$$u(x) = \frac{1}{a} \left( 1 - \exp(-a x) \right) \tag{5}$$
where 𝑎 is a positive parameter.
Combining (4) and (5), entrepreneur 𝑚 participates in the market (or is said to be active)
when
$$\frac{1}{a} \left\{ 1 - \mathbb E[\exp(-a(\theta + \epsilon_m - F_m))] \right\} > c$$
Using standard formulas for expectations of lognormal random variables, this is equivalent to
the condition
$$\psi(\mu, \gamma, F_m) := \frac{1}{a} \left( 1 - \exp\left( -a\mu + a F_m + \frac{a^2 \left( \frac{1}{\gamma} + \frac{1}{\gamma_x} \right)}{2} \right) \right) - c > 0 \tag{6}$$
49.4 Implementation

We want to simulate this economy. As a first step, let's put together a class that bundles the parameters, the current values of the state variables and the relevant update methods.

In [2]: class UncertaintyTrapEcon:

            def __init__(self,
a=1.5, # Risk aversion
γ_x=0.5, # Production shock precision
ρ=0.99, # Correlation coefficient for θ
σ_θ=0.5, # Standard dev of θ shock
num_firms=100, # Number of firms
σ_F=1.5, # Standard dev of fixed costs
c=420, # External opportunity cost
μ_init=0, # Initial value for μ
γ_init=4, # Initial value for γ
θ_init=0): # Initial value for θ
# == Record values == #
self.a, self.γ_x, self.ρ, self.σ_θ = a, γ_x, ρ, σ_θ
self.num_firms, self.σ_F, self.c, = num_firms, σ_F, c
self.σ_x = np.sqrt(1/γ_x)
# == Initialize states == #
                self.γ, self.μ, self.θ = γ_init, μ_init, θ_init

            def ψ(self, F):
                # Payoff criterion from equation (6)
                temp1 = -self.a * (self.μ - F)
                temp2 = self.a**2 * (1/self.γ + 1/self.γ_x) / 2
                return (1 / self.a) * (1 - np.exp(temp1 + temp2)) - self.c

            def update_beliefs(self, X, M):
                """
                Update beliefs (μ, γ) based on aggregates X and M,
                using equations (2) and (3).
                """
                # Simplify names
                γ_x, ρ, σ_θ = self.γ_x, self.ρ, self.σ_θ
                # Update μ via (2), using the current γ
                new_μ = ρ * (self.γ * self.μ + M * γ_x * X) / (self.γ + M * γ_x)
                # Update γ via (3)
                self.γ = 1 / (ρ**2 / (self.γ + M * γ_x) + σ_θ**2)
                self.μ = new_μ

            def update_θ(self, w):
                """
                Update the fundamental state θ given shock w.
                """
                self.θ = self.ρ * self.θ + self.σ_θ * w
def gen_aggregates(self):
"""
Generate aggregates based on current beliefs (μ, γ). This
is a simulation step that depends on the draws for F.
"""
F_vals = self.σ_F * np.random.randn(self.num_firms)
M = np.sum(self.ψ(F_vals) > 0) # Counts number of active firms
if M > 0:
x_vals = self.θ + self.σ_x * np.random.randn(M)
X = x_vals.mean()
else:
X = 0
return X, M
In the results below we use this code to simulate time series for the major variables.
49.5 Results
Let’s look first at the dynamics of 𝜇, which the agents use to track 𝜃
We see that 𝜇 tracks 𝜃 well when there are sufficient firms in the market.
However, there are times when 𝜇 tracks 𝜃 poorly due to insufficient information.
These are episodes where the uncertainty traps take hold.
During these episodes
• precision is low and uncertainty is high
• few firms are in the market
To get a clearer idea of the dynamics, let’s look at all the main time series at once, for a given
set of shocks
Notice how the traps only take hold after a sequence of bad draws for the fundamental.
Thus, the model gives us a propagation mechanism that maps bad random draws into long
downturns in economic activity.
49.6 Exercises
49.6.1 Exercise 1
Fill in the details behind (2) and (3) based on the following standard result (see, e.g., p. 24 of
[112]).
Fact Let x = (𝑥1 , … , 𝑥𝑀 ) be a vector of IID draws from common distribution 𝑁 (𝜃, 1/𝛾𝑥 ) and
let 𝑥̄ be the sample mean. If 𝛾𝑥 is known and the prior for 𝜃 is 𝑁 (𝜇, 1/𝛾), then the posterior
distribution of $\theta$ given $\mathbf x$ is

$$\pi(\theta \mid \mathbf x) = N(\mu_0, 1/\gamma_0)$$

where

$$\mu_0 = \frac{\mu \gamma + M \bar x \gamma_x}{\gamma + M \gamma_x} \quad \text{and} \quad \gamma_0 = \gamma + M \gamma_x$$
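The fact is just precision-weighted averaging of the prior mean and the sample mean. A quick sketch of the update, with illustrative numbers:

```python
def gaussian_posterior(μ, γ, x_bar, M, γ_x):
    """Posterior (μ0, γ0) for θ after M draws with sample mean x_bar."""
    γ0 = γ + M * γ_x
    μ0 = (μ * γ + M * x_bar * γ_x) / γ0
    return μ0, γ0

# Prior N(0, 1/2), ten observations with unit precision and sample mean 1.0
μ0, γ0 = gaussian_posterior(μ=0.0, γ=2.0, x_bar=1.0, M=10, γ_x=1.0)
```

Precisions add, and the posterior mean lies between the prior mean and the sample mean, closer to whichever is more precise.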
49.6.2 Exercise 2
49.7 Solutions
49.7.1 Exercise 1
This exercise asked you to validate the laws of motion for 𝛾 and 𝜇 given in the lecture, based
on the stated result about Bayesian updating in a scalar Gaussian setting. The stated result
tells us that after observing average output 𝑋 of the 𝑀 firms, our posterior beliefs will be
𝑁 (𝜇0 , 1/𝛾0 )
where
$$\mu_0 = \frac{\mu \gamma + M X \gamma_x}{\gamma + M \gamma_x} \quad \text{and} \quad \gamma_0 = \gamma + M \gamma_x$$
If we take a random variable 𝜃 with this distribution and then evaluate the distribution of
𝜌𝜃 + 𝜎𝜃 𝑤 where 𝑤 is independent and standard normal, we get the expressions for 𝜇′ and 𝛾 ′
given in the lecture.
49.7.2 Exercise 2
First, let’s replicate the plot that illustrates the law of motion for precision, which is
$$\gamma_{t+1} = \left( \frac{\rho^2}{\gamma_t + M \gamma_x} + \sigma_\theta^2 \right)^{-1}$$
Here 𝑀 is the number of active firms. The next figure plots 𝛾𝑡+1 against 𝛾𝑡 on a 45 degree
diagram for different values of 𝑀
In [3]: econ = UncertaintyTrapEcon()
        ρ, σ_θ, γ_x = econ.ρ, econ.σ_θ, econ.γ_x    # Simplify names
        γ = np.linspace(1e-10, 3, 200)              # γ grid
        fig, ax = plt.subplots(figsize=(9, 9))
        ax.plot(γ, γ, 'k-')                         # 45 degree line
        for M in range(7):
γ_next = 1 / (ρ**2 / (γ + M * γ_x) + σ_θ**2)
label_string = f"$M = {M}$"
ax.plot(γ, γ_next, lw=2, label=label_string)
ax.legend(loc='lower right', fontsize=14)
ax.set_xlabel(r'$\gamma$', fontsize=16)
ax.set_ylabel(r"$\gamma'$", fontsize=16)
ax.grid()
plt.show()
The points where the curves hit the 45 degree lines are the long-run steady states correspond-
ing to each 𝑀 , if that value of 𝑀 was to remain fixed. As the number of firms falls, so does
the long-run steady state of precision.
Next let’s generate time series for beliefs and the aggregates – that is, the number of active
firms and average output
In [4]: sim_length=2000
μ_vec = np.empty(sim_length)
θ_vec = np.empty(sim_length)
γ_vec = np.empty(sim_length)
X_vec = np.empty(sim_length)
M_vec = np.empty(sim_length)
μ_vec[0] = econ.μ
γ_vec[0] = econ.γ
θ_vec[0] = 0
w_shocks = np.random.randn(sim_length)
         for t in range(sim_length - 1):
X, M = econ.gen_aggregates()
X_vec[t] = X
M_vec[t] = M
econ.update_beliefs(X, M)
econ.update_θ(w_shocks[t])
μ_vec[t+1] = econ.μ
γ_vec[t+1] = econ.γ
θ_vec[t+1] = econ.θ
plt.show()
If you run the code above you’ll get different plots, of course.
Try experimenting with different parameters to see the effects on the time series.
(It would also be interesting to experiment with non-Gaussian distributions for the shocks,
but this is a big exercise since it takes us outside the world of the standard Kalman filter)
Chapter 50

The Aiyagari Model
50.1 Contents
• Overview 50.2
• The Economy 50.3
• Firms 50.4
• Code 50.5
In addition to what’s in Anaconda, this lecture will need the following libraries:
50.2 Overview
In this lecture, we describe the structure of a class of models that build on work by Truman
Bewley [13].
We begin by discussing an example of a Bewley model due to Rao Aiyagari.
The model features
• Heterogeneous agents
• A single exogenous vehicle for borrowing and lending
• Limits on amounts individual agents may borrow
The Aiyagari model has been used to investigate many topics, including
• precautionary savings and the effect of liquidity constraints [4]
• risk sharing and asset pricing [55]
• the shape of the wealth distribution [10]
• etc., etc., etc.
Let’s start with some imports:
50.2.1 References
50.3.1 Households
$$\max \; \mathbb E \sum_{t=0}^{\infty} \beta^t u(c_t)$$

subject to

$$a_{t+1} + c_t \leq w z_t + (1 + r) a_t, \qquad c_t \geq 0, \quad \text{and} \quad a_t \geq -B$$
where
• 𝑐𝑡 is current consumption
• 𝑎𝑡 is assets
• 𝑧𝑡 is an exogenous component of labor income capturing stochastic unemployment risk,
etc.
• 𝑤 is a wage rate
• 𝑟 is a net interest rate
• 𝐵 is the maximum amount that the agent is allowed to borrow
The exogenous process {𝑧𝑡 } follows a finite state Markov chain with given stochastic matrix
𝑃.
The wage and interest rate are fixed over time.
In this simple version of the model, households supply labor inelastically because they do not
value leisure.
50.4 Firms
$$Y_t = A K_t^\alpha N^{1 - \alpha}$$
where
• 𝐴 and 𝛼 are parameters with 𝐴 > 0 and 𝛼 ∈ (0, 1)
• 𝐾𝑡 is aggregate capital
• 𝑁 is total labor supply (which is constant in this simple version of the model)
The firm's problem is

$$\max_{K, N} \left\{ A K^\alpha N^{1 - \alpha} - (r + \delta) K - w N \right\}$$

where $\delta$ is the depreciation rate.

From the first-order condition with respect to capital, the firm's inverse demand for capital is

$$r = A \alpha \left( \frac{N}{K} \right)^{1 - \alpha} - \delta \tag{1}$$

Using this expression and the firm's first-order condition for labor, we can pin down the equilibrium wage rate as a function of $r$ as

$$w(r) = A (1 - \alpha) (A \alpha / (r + \delta))^{\alpha / (1 - \alpha)} \tag{2}$$
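Relations (1) and (2) can be sanity-checked numerically. The sketch below uses the parameter values $A = 1.0$, $N = 1.0$, $\alpha = 0.33$, $\delta = 0.05$ that appear in the code later in the lecture, and verifies that inverting (1) recovers the original capital stock:

```python
A, N, α, δ = 1.0, 1.0, 0.33, 0.05

def rd(K):
    # interest rate from the inverse demand for capital, equation (1)
    return A * α * (N / K) ** (1 - α) - δ

def r_to_w(r):
    # equilibrium wage implied by r, equation (2)
    return A * (1 - α) * (A * α / (r + δ)) ** (α / (1 - α))

K = 10.0
r = rd(K)
K_implied = N * (A * α / (r + δ)) ** (1 / (1 - α))  # invert (1) for K
```

Since (1) is strictly decreasing in $K$, the inversion is exact, and the wage in (2) is positive for any $r > -\delta$.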
50.4.1 Equilibrium

In practice, we check for an equilibrium via the following steps:

1. pick a proposed quantity $K$ for aggregate capital

2. determine corresponding prices, with interest rate $r$ determined by (1) and a wage rate $w(r)$ as given in (2)
3. determine the common optimal savings policy of the households given these prices
4. compute aggregate capital as the mean of steady state capital given this savings policy

If this final quantity agrees with the proposed $K$, then we have a stationary equilibrium; otherwise we adjust $K$ and repeat.
50.5 Code

Let's look at how we might compute such an equilibrium in practice. To solve the household's dynamic programming problem, we'll use the DiscreteDP class from QuantEcon.py.

In [2]: class Household:
            """
            This class takes the parameters that define a household asset
            accumulation problem and computes the corresponding reward and
            transition matrices R and Q required to generate an instance of
            DiscreteDP, and thereby solve for the optimal policy.
            """
            def __init__(self,
r=0.01, # Interest rate
w=1.0, # Wages
β=0.96, # Discount factor
                         a_min=1e-10,
Π=[[0.9, 0.1], [0.1, 0.9]], # Markov chain
z_vals=[0.1, 1.0], # Exogenous states
a_max=18,
a_size=200):
                # Store values, set up grids over a and z
                self.r, self.w, self.β = r, w, β
                self.a_min, self.a_max, self.a_size = a_min, a_max, a_size
                self.Π = np.asarray(Π)
                self.z_vals = np.asarray(z_vals)
                self.z_size = len(z_vals)
                self.a_vals = np.linspace(a_min, a_max, a_size)
                self.n = a_size * self.z_size
                # Build the array Q
                self.Q = np.zeros((self.n, a_size, self.n))
                self.build_Q()
                # Build the array R
                self.R = np.empty((self.n, a_size))
                self.build_R()

            def set_prices(self, r, w):
                """
                Use this method to reset prices. Calling the method
                will trigger a re-build of R.
                """
                self.r, self.w = r, w
                self.build_R()
def build_Q(self):
populate_Q(self.Q, self.a_size, self.z_size, self.Π)
def build_R(self):
                self.R.fill(-np.inf)
populate_R(self.R,
self.a_size,
self.z_size,
self.a_vals,
self.z_vals,
self.r,
self.w)
@jit(nopython=True)
def populate_R(R, a_size, z_size, a_vals, z_vals, r, w):
n = a_size * z_size
for s_i in range(n):
a_i = s_i // z_size
z_i = s_i % z_size
a = a_vals[a_i]
z = z_vals[z_i]
for new_a_i in range(a_size):
a_new = a_vals[new_a_i]
                c = w * z + (1 + r) * a - a_new
if c > 0:
R[s_i, new_a_i] = np.log(c) # Utility
@jit(nopython=True)
def populate_Q(Q, a_size, z_size, Π):
n = a_size * z_size
for s_i in range(n):
z_i = s_i % z_size
for a_i in range(a_size):
for next_z_i in range(z_size):
Q[s_i, a_i, a_i*z_size + next_z_i] = Π[z_i, next_z_i]
@jit(nopython=True)
def asset_marginal(s_probs, a_size, z_size):
a_probs = np.zeros(a_size)
for a_i in range(a_size):
for z_i in range(z_size):
a_probs[a_i] += s_probs[a_i*z_size + z_i]
return a_probs
As a first example of what we can do, let’s compute and plot an optimal accumulation policy
at fixed prices.
# Simplify names
z_size, a_size = am.z_size, am.a_size
z_vals, a_vals = am.z_vals, am.a_vals
n = a_size * z_size
# Get all optimal actions across the set of a indices with z fixed in each row
a_star = np.empty((z_size, a_size))
for s_i in range(n):
a_i = s_i // z_size
z_i = s_i % z_size
a_star[z_i, a_i] = a_vals[results.sigma[s_i]]
plt.show()
The plot shows asset accumulation policies at different values of the exogenous state.
Now we want to calculate the equilibrium.
Let’s do this visually as a first pass.
The following code draws aggregate supply and demand curves.
The intersection gives equilibrium interest rates and capital.
In [5]: A = 1.0
N = 1.0
α = 0.33
β = 0.96
δ = 0.05
def r_to_w(r):
"""
Equilibrium wages associated with a given interest rate r.
"""
            return A * (1 - α) * (A * α / (r + δ))**(α / (1 - α))
        def rd(K):
            """
            Inverse demand curve for capital. The interest rate associated with a
            given demand for capital K.
            """
            return A * α * (N / K)**(1 - α) - δ
        def prices_to_capital_stock(am, r):
            """
            Map prices to the induced level of capital stock.

            Parameters:
            ----------
            am : Household
                An instance of an aiyagari_household.Household
            r : float
                The interest rate
            """
w = r_to_w(r)
am.set_prices(r, w)
aiyagari_ddp = DiscreteDP(am.R, am.Q, β)
# Compute the optimal policy
results = aiyagari_ddp.solve(method='policy_iteration')
# Compute the stationary distribution
stationary_probs = results.mc.stationary_distributions[0]
# Extract the marginal distribution for assets
asset_probs = asset_marginal(stationary_probs, am.a_size, am.z_size)
# Return K
return np.sum(asset_probs * am.a_vals)
plt.show()
Part VIII
Chapter 51

Asset Pricing: Finite State Models
51.1 Contents
• Overview 51.2
• Pricing Models 51.3
• Prices in the Risk-Neutral Case 51.4
• Asset Prices under Risk Aversion 51.5
• Exercises 51.6
• Solutions 51.7
“A little knowledge of geometric series goes a long way” – Robert E. Lucas, Jr.
In addition to what’s in Anaconda, this lecture will need the following libraries:
51.2 Overview
What happens if for some reason traders discount payouts differently depending on the state
of the world?
Michael Harrison and David Kreps [54] and Lars Peter Hansen and Scott Richard [52] showed that in quite general settings the price of an ex-dividend asset obeys

$$p_t = \mathbb E_t \left[ m_{t+1} (d_{t+1} + p_{t+1}) \right] \tag{2}$$

for some stochastic discount factor $m_{t+1}$.
The way anticipated future payoffs are evaluated can now depend on various random out-
comes.
One example of this idea is that assets that tend to have good payoffs in bad states of the
world might be regarded as more valuable.
This is because they pay well when the funds are more urgently needed.
We give examples of how the stochastic discount factor has been modeled below.
Recall that, from the definition of a conditional covariance $\mathrm{cov}_t(x_{t+1}, y_{t+1})$, we have

$$\mathbb E_t (x_{t+1} y_{t+1}) = \mathrm{cov}_t (x_{t+1}, y_{t+1}) + \mathbb E_t x_{t+1} \, \mathbb E_t y_{t+1} \tag{3}$$
Aside from prices, another quantity of interest is the price-dividend ratio 𝑣𝑡 ∶= 𝑝𝑡 /𝑑𝑡 .
Let’s write down an expression that this ratio should satisfy.
We can divide both sides of (2) by 𝑑𝑡 to get
$$v_t = \mathbb E_t \left[ m_{t+1} \frac{d_{t+1}}{d_t} (1 + v_{t+1}) \right] \tag{5}$$
What can we say about price dynamics on the basis of the models described above?
The answer to this question depends on

1. the process we specify for dividends

2. the stochastic discount factor and how it correlates with dividends
For now let’s focus on the risk-neutral case, where the stochastic discount factor is constant,
and study how prices depend on the dividend process.
The simplest case is risk-neutral pricing in the face of a constant, non-random dividend
stream 𝑑𝑡 = 𝑑 > 0.
Removing the expectation from (1) and iterating forward gives
$$\begin{aligned}
p_t &= \beta(d + p_{t+1}) \\
    &= \beta(d + \beta(d + p_{t+2})) \\
    &\;\;\vdots \\
    &= \beta(d + \beta d + \beta^2 d + \cdots + \beta^{k-2} d + \beta^{k-1} p_{t+k})
\end{aligned}$$

Assuming $\beta^{k-1} p_{t+k} \to 0$ as $k \to \infty$, the price settles down to the constant

$$\bar p := \frac{\beta d}{1 - \beta} \tag{6}$$
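This limit can be illustrated by brute-force iteration on the recursion $p = \beta(d + p)$; a minimal sketch with illustrative values $\beta = 0.9$, $d = 1$:

```python
β, d = 0.9, 1.0          # illustrative values
p = 0.0
for _ in range(500):
    p = β * (d + p)      # one step of the recursion p_t = β(d + p_{t+1})

p_bar = β * d / (1 - β)  # the closed form from (6)
```

The gap between the iterate and the fixed point shrinks by the factor $\beta$ each step, so convergence is geometric.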
Consider a growing, non-random dividend process 𝑑𝑡+1 = 𝑔𝑑𝑡 where 0 < 𝑔𝛽 < 1.
While prices are not usually constant when dividends grow over time, the price dividend-ratio
might be.
If we guess this, substituting 𝑣𝑡 = 𝑣 into (5) as well as our other assumptions, we get 𝑣 =
𝛽𝑔(1 + 𝑣).
Since 𝛽𝑔 < 1, we have a unique positive solution:
$$v = \frac{\beta g}{1 - \beta g}$$

The price is then

$$p_t = \frac{\beta g}{1 - \beta g} \, d_t$$
If, in this example, we take $g = 1 + \kappa$ and let $\rho := 1/\beta - 1$, then the price becomes

$$p_t = \frac{1 + \kappa}{\rho - \kappa} \, d_t$$

This is called the Gordon formula.
𝑔𝑡 = 𝑔(𝑋𝑡 ), 𝑡 = 1, 2, …
where
1. $\{X_t\}$ is a finite Markov chain with state space $S$ and transition probabilities $P(x, y) := \mathbb P\{X_{t+1} = y \mid X_t = x\}$ for $x, y \in S$
Pricing
To obtain asset prices in this setting, let’s adapt our analysis from the case of deterministic
growth.
In that case, we found that 𝑣 is constant.
This encourages us to guess that, in the current case, 𝑣𝑡 is constant given the state 𝑋𝑡 .
In other words, we are looking for a fixed function 𝑣 such that the price-dividend ratio satis-
fies 𝑣𝑡 = 𝑣(𝑋𝑡 ).
We can substitute this guess into (5) to get

$$v(x) = \beta \sum_{y \in S} g(y) (1 + v(y)) P(x, y)$$

If we let $K(x, y) := g(y) P(x, y)$, then we can rewrite this in vector form as

$$v = \beta K (\mathbb 1 + v) \tag{9}$$
Here
• 𝑣 is understood to be the column vector (𝑣(𝑥1 ), … , 𝑣(𝑥𝑛 ))′ .
• 𝐾 is the matrix (𝐾(𝑥𝑖 , 𝑥𝑗 ))1≤𝑖,𝑗≤𝑛 .
• 𝟙 is a column vector of ones.
When does (9) have a unique solution?
From the Neumann series lemma and Gelfand’s formula, this will be the case if 𝛽𝐾 has spec-
tral radius strictly less than one.
In other words, we require that the eigenvalues of 𝐾 be strictly less than 𝛽 −1 in modulus.
The solution is then

$$v = (I - \beta K)^{-1} \beta K \mathbb 1 \tag{10}$$
51.4.4 Code
K = mc.P * np.exp(mc.state_values)
I = np.identity(n)
        v = solve(I - β * K, β * K @ np.ones(n))
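The same computation can be run on a hand-built two-state chain. This is a toy sketch; the discount factor, transition matrix and state values below are illustrative choices, not the lecture's calibration:

```python
import numpy as np

β = 0.9
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
x = np.array([-0.02, 0.02])       # state values, with g(x) = exp(x)
K = P * np.exp(x)                 # K(x, y) = g(y) P(x, y)

# Neumann series lemma condition: spectral radius of βK below one
assert max(abs(np.linalg.eigvals(β * K))) < 1

# solve (I - βK) v = βK 1, i.e. equation (10)
v = np.linalg.solve(np.eye(2) - β * K, β * K @ np.ones(2))
```

Multiplying `P` by `np.exp(x)` relies on broadcasting across columns, which is exactly the $K(x, y) = g(y) P(x, y)$ indexing used in the text.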
Now let’s turn to the case where agents are risk averse.
We’ll price several distinct assets, including
• The price of an endowment stream
• A consol (a type of bond issued by the UK government in the 19th century)
• Call options on a consol
Let’s start with a version of the celebrated asset pricing model of Robert E. Lucas, Jr. [73].
As in [73], suppose that the stochastic discount factor takes the form
$$m_{t+1} = \beta \frac{u'(c_{t+1})}{u'(c_t)} \tag{11}$$
$$u(c) = \frac{c^{1-\gamma}}{1 - \gamma} \quad \text{with} \quad \gamma > 0 \tag{12}$$
$$m_{t+1} = \beta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} = \beta g_{t+1}^{-\gamma} \tag{13}$$
If we let $J(x, y) := g(y)^{1-\gamma} P(x, y)$, then the price-dividend ratio satisfies

$$v = \beta J (\mathbb 1 + v)$$
Assuming that the spectral radius of 𝐽 is strictly less than 𝛽 −1 , this equation has the unique
solution
𝑣 = (𝐼 − 𝛽𝐽 )−1 𝛽𝐽 𝟙 (14)
We will define a function tree_price to solve for 𝑣 given parameters stored in the class Asset-
PriceModel
Parameters
β : scalar, float
Discount factor
mc : MarkovChain
Contains the transition matrix and set of state values for the state
process
γ : scalar(float)
Coefficient of risk aversion
g : callable
The function mapping states to growth rates
"""
        def __init__(self, β=0.96, mc=None, γ=2.0, g=np.exp):
            self.β, self.γ = β, γ
            self.g = g

            # == A default process for the Markov chain == #
            if mc is None:
                self.ρ = 0.9
                self.σ = 0.02
                self.mc = qe.tauchen(self.ρ, self.σ, n=25)
            else:
                self.mc = mc

            self.n = self.mc.P.shape[0]
def tree_price(ap):
"""
        Computes the price-dividend ratio of the Lucas tree.
Parameters
ap: AssetPriceModel
An instance of AssetPriceModel containing primitives
Returns
v : array_like(float)
            Lucas tree price-dividend ratio
"""
# Simplify names, set up matrices
β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
        J = P * ap.g(y)**(1 - γ)
# Compute v
I = np.identity(ap.n)
Ones = np.ones(ap.n)
        v = solve(I - β * J, β * J @ Ones)
return v
Here’s a plot of 𝑣 as a function of the state for several values of 𝛾, with a positively correlated
Markov process and 𝑔(𝑥) = exp(𝑥)
for γ in γs:
ap.γ = γ
v = tree_price(ap)
ax.plot(states, v, lw=2, alpha=0.6, label=rf"$\gamma = {γ}$")
Special Cases

In the special case $\gamma = 1$, we have $J = P$.

Recalling that $P^i \mathbb 1 = \mathbb 1$ for all $i$ and applying Neumann's geometric series lemma, we are led to

$$v = \beta (I - \beta P)^{-1} \mathbb 1 = \beta \sum_{i=0}^{\infty} \beta^i P^i \mathbb 1 = \beta \frac{1}{1 - \beta} \mathbb 1$$
Thus, with log preferences, the price-dividend ratio for a Lucas tree is constant.
Alternatively, if 𝛾 = 0, then 𝐽 = 𝐾 and we recover the risk-neutral solution (10).
This is as expected, since 𝛾 = 0 implies 𝑢(𝑐) = 𝑐 (and hence agents are risk-neutral).
Consider next a consol bond that promises to pay a constant coupon $\zeta$ each period. The ex-coupon price satisfies

$$p_t = \mathbb E_t \left[ m_{t+1} (\zeta + p_{t+1}) \right]$$
Substituting the discount factor (13) into this equation yields

$$p_t = \mathbb E_t \left[ \beta g_{t+1}^{-\gamma} (\zeta + p_{t+1}) \right] \tag{15}$$
Letting 𝑀 (𝑥, 𝑦) = 𝑃 (𝑥, 𝑦)𝑔(𝑦)−𝛾 and rewriting in vector notation yields the solution
𝑝 = (𝐼 − 𝛽𝑀 )−1 𝛽𝑀 𝜁𝟙 (16)
        def consol_price(ap, ζ):
            """
            Computes price of a consol bond with coupon ζ

            Parameters
            ----------
            ap: AssetPriceModel
                An instance of AssetPriceModel containing primitives

            ζ : scalar(float)
                Coupon of the consol

            Returns
            -------
            p : array_like(float)
                Consol bond prices
            """
            # Simplify names, set up matrices
            β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
            M = P * ap.g(y)**(-γ)

            # Compute price
            I = np.identity(ap.n)
            Ones = np.ones(ap.n)
            p = solve(I - β * M, β * ζ * M @ Ones)

            return p
Let's now price options of varying maturity that give the right to purchase a consol at a price $p_S$.

An Infinite Horizon Call Option

We want to price an infinite horizon option to purchase a consol at a price $p_S$. At the beginning of each period, the owner of the option chooses either

1. to exercise the option now, purchasing the consol at price $p_S$, or

2. not to exercise the option now but to retain the right to exercise it later
Thus, the owner either exercises the option now or chooses not to exercise and wait until next
period.
This is termed an infinite-horizon call option with strike price 𝑝𝑆 .
The owner of the option is entitled to purchase the consol at the price 𝑝𝑆 at the beginning of
any period, after the coupon has been paid to the previous owner of the bond.
The fundamentals of the economy are identical with the one above, including the stochastic
discount factor and the process for consumption.
Let 𝑤(𝑋𝑡 , 𝑝𝑆 ) be the value of the option when the time 𝑡 growth state is known to be 𝑋𝑡 but
before the owner has decided whether or not to exercise the option at time 𝑡 (i.e., today).
Recalling that 𝑝(𝑋𝑡 ) is the value of the consol when the initial growth state is 𝑋𝑡 , the value
of the option satisfies
$$w(X_t, p_S) = \max \left\{ \beta \, \mathbb E_t \frac{u'(c_{t+1})}{u'(c_t)} w(X_{t+1}, p_S), \; p(X_t) - p_S \right\}$$
The first term on the right is the value of waiting, while the second is the value of exercising
now.
We can also write this as

$$w(x, p_S) = \max \left\{ \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} w(y, p_S), \; p(x) - p_S \right\} \tag{17}$$

With $M(x, y) = P(x, y) g(y)^{-\gamma}$ and $w$ as the vector of values $(w(x_i, p_S))_{i=1}^n$, we can express (17) as the nonlinear vector equation

$$w = \max \{ \beta M w, \; p - p_S \mathbb 1 \} \tag{18}$$
To solve (18), form the operator 𝑇 mapping vector 𝑤 into vector 𝑇 𝑤 via
𝑇 𝑤 = max{𝛽𝑀 𝑤, 𝑝 − 𝑝𝑆 𝟙}
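Iterating $T$ from $w = 0$ converges, since $\beta M$ has spectral radius below one. A toy sketch of the iteration with made-up values for $M$, $p$ and the strike (hypothetical numbers, not the model's calibration):

```python
import numpy as np

β = 0.9
M = np.array([[0.8, 0.2],          # hypothetical M(x, y) = P(x, y) g(y)^(-γ)
              [0.3, 0.7]])
p = np.array([10.0, 12.0])         # hypothetical consol prices by state
p_s = 11.0                         # strike price

w = np.zeros(2)
error = 1.0
while error > 1e-10:
    w_new = np.maximum(β * M @ w, p - p_s)   # apply the operator T
    error = np.max(np.abs(w_new - w))
    w = w_new
```

The limit is a fixed point of $T$, and by construction it dominates the immediate exercise value $p - p_S \mathbb 1$ in every state.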
        def call_option(ap, ζ, p_s, ϵ=1e-8):
            """
            Computes price of a call option on a consol bond.

            Parameters
            ----------
            ap: AssetPriceModel
                An instance of AssetPriceModel containing primitives

            ζ : scalar(float)
                Coupon of the consol

            p_s : scalar(float)
                Strike price

            ϵ : scalar(float), optional(default=1e-8)
                Tolerance for infinite horizon problem

            Returns
            -------
            w : array_like(float)
                Infinite horizon call option prices
            """
            # Simplify names, set up matrices
            β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
            M = P * ap.g(y)**(-γ)

            # Compute option price by iterating on T
            p = consol_price(ap, ζ)
            w = np.zeros(ap.n)
            error = ϵ + 1
            while error > ϵ:
                # Maximize across columns
                w_new = np.maximum(β * M @ w, p - p_s)
                # Find maximal difference of each component and update
                error = np.amax(np.abs(w - w_new))
                w = w_new

            return w
In [9]: ap = AssetPriceModel(β=0.9)
ζ = 1.0
strike_price = 40
x = ap.mc.state_values
p = consol_price(ap, ζ)
w = call_option(ap, ζ, strike_price)
𝑚1 = 𝛽𝑀 𝟙
where the 𝑖-th element of 𝑚1 is the reciprocal of the one-period gross risk-free interest rate in
state 𝑥𝑖 .
Other Terms
Let 𝑚𝑗 be an 𝑛 × 1 vector whose 𝑖 th component is the reciprocal of the 𝑗 -period gross risk-
free interest rate in state 𝑥𝑖 .
Then $m_1 = \beta M \mathbb 1$, and $m_{j+1} = \beta M m_j$ for $j \geq 1$.
51.6 Exercises
51.6.1 Exercise 1
51.6.2 Exercise 2
In [10]: n = 5
P = 0.0125 * np.ones((n, n))
         P += np.diag(0.95 - 0.0125 * np.ones(5))
51.6.3 Exercise 3
Let’s consider finite horizon call options, which are more common than the infinite horizon
variety.
Finite horizon options obey functional equations closely related to (17).
A 𝑘 period option expires after 𝑘 periods.
If we view today as date zero, a 𝑘 period option gives the owner the right to exercise the op-
tion to purchase the risk-free consol at the strike price 𝑝𝑆 at dates 0, 1, … , 𝑘 − 1.
The option expires at time 𝑘.
Thus, for 𝑘 = 1, 2, …, let 𝑤(𝑥, 𝑘) be the value of a 𝑘-period option.
It obeys

$$w(x, k) = \max \left\{ \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} w(y, k-1), \; p(x) - p_S \right\} \tag{19}$$

where $w(x, 0) = 0$ for all $x$.
51.7 Solutions
51.7.1 Exercise 1
𝑝𝑡 = 𝑑𝑡 + 𝛽𝔼𝑡 [𝑝𝑡+1 ]
With a constant, non-random dividend stream $d_t = d$, the cum-dividend price is

$$p_t = \frac{1}{1 - \beta} \, d_t$$

With a growing, non-random dividend process $d_{t+1} = g d_t$ where $0 < g \beta < 1$, the cum-dividend price becomes

$$p_t = \frac{1}{1 - \beta g} \, d_t$$
51.7.2 Exercise 2
In [11]: n = 5
P = 0.0125 * np.ones((n, n))
         P += np.diag(0.95 - 0.0125 * np.ones(5))
s = np.array([0.95, 0.975, 1.0, 1.025, 1.05]) # State values
mc = qe.MarkovChain(P, state_values=s)
γ = 2.0
β = 0.94
ζ = 1.0
p_s = 150.0
In [13]: tree_price(apm)
In [14]: consol_price(apm, ζ)
51.7.3 Exercise 3
In [15]: def finite_horizon_call_option(ap, ζ, p_s, k):
             """
             Computes k period option value.
             """
             # Simplify names, set up matrices
             β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
             M = P * ap.g(y)**(-γ)

             # Compute option price by backward induction, as in (19)
             p = consol_price(ap, ζ)
             w = np.zeros(ap.n)
             for i in range(k):
                 # Maximize across columns
                 w = np.maximum(β * M @ w, p - p_s)

             return w
Chapter 52

Asset Pricing with Incomplete Markets

52.1 Contents
• Overview 52.2
• Structure of the Model 52.3
• Solving the Model 52.4
• Exercises 52.5
• Solutions 52.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
52.2 Overview
52.2.1 References
Prior to reading the following, you might like to review our lectures on
• Markov chains
• Asset pricing with finite state space
52.2.2 Bubbles
The model simplifies by ignoring alterations in the distribution of wealth among investors
having different beliefs about the fundamentals that determine asset payouts.
There is a fixed number 𝐴 of shares of an asset.
Each share entitles its owner to a stream of dividends $\{d_t\}$ governed by a Markov chain defined on the state space $S = \{0, 1\}$.
The dividend obeys
$$d_t = \begin{cases} 0 & \text{if } s_t = 0 \\ 1 & \text{if } s_t = 1 \end{cases}$$
The owner of a share at the beginning of time 𝑡 is entitled to the dividend paid at time 𝑡.
The owner of the share at the beginning of time 𝑡 is also entitled to sell the share to another
investor during time 𝑡.
Two types ℎ = 𝑎, 𝑏 of investors differ only in their beliefs about a Markov transition matrix 𝑃
with typical element
𝑃 (𝑖, 𝑗) = ℙ{𝑠𝑡+1 = 𝑗 ∣ 𝑠𝑡 = 𝑖}
Type $a$ believes the transition matrix is

$$P_a = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\[4pt] \frac{2}{3} & \frac{1}{3} \end{bmatrix}$$
Type $b$ believes the transition matrix is

$$P_b = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\[4pt] \frac{1}{4} & \frac{3}{4} \end{bmatrix}$$
The stationary (i.e., invariant) distributions of these two matrices can be calculated as fol-
lows:
In [4]: mcb.stationary_distributions
An owner of the asset at the end of time 𝑡 is entitled to the dividend at time 𝑡 + 1 and also
has the right to sell the asset at time 𝑡 + 1.
Both types of investors are risk-neutral and both have the same fixed discount factor 𝛽 ∈
(0, 1).
In our numerical example, we’ll set 𝛽 = .75, just as Harrison and Kreps did.
We’ll eventually study the consequences of two different assumptions about the number of
shares 𝐴 relative to the resources that our two types of investors can invest in the stock.
1. Both types of investors have enough resources (either wealth or the capacity to borrow) so that they can purchase the entire available stock of the asset.
2. No single type of investor has sufficient resources to purchase the entire stock.
The above specifications of the perceived transition matrices 𝑃𝑎 and 𝑃𝑏 , taken directly from
Harrison and Kreps, build in stochastically alternating temporary optimism and pessimism.
Remember that state 1 is the high dividend state.
• In state 0, a type 𝑎 agent is more optimistic about next period’s dividend than a type 𝑏
agent.
• In state 1, a type 𝑏 agent is more optimistic about next period’s dividend.
However, the stationary distributions $\pi_a = [.57 \;\; .43]$ and $\pi_b = [.43 \;\; .57]$ tell us that a type $b$ person is more optimistic about the dividend process in the long run than is a type $a$ person.
Transition matrices for the temporarily optimistic and pessimistic investors are constructed as
follows.
Temporarily optimistic investors (i.e., the investor with the most optimistic beliefs in each
state) believe the transition matrix
$$P_o = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\[4pt] \frac{1}{4} & \frac{3}{4} \end{bmatrix}$$
Temporarily pessimistic investors (i.e., the investor with the most pessimistic beliefs in each state) believe the transition matrix

$$P_p = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\[4pt] \frac{2}{3} & \frac{1}{3} \end{bmatrix}$$
52.3.4 Information
Investors know a price function mapping the state 𝑠𝑡 at 𝑡 into the equilibrium price 𝑝(𝑠𝑡 ) that
prevails in that state.
This price function is endogenous and to be determined below.
When investors choose whether to purchase or sell the asset at 𝑡, they also know 𝑠𝑡 .
2. There are two types of agents differentiated only by their beliefs. Each type of agent
has sufficient resources to purchase all of the asset (Harrison and Kreps’s setting).
3. There are two types of agents with different beliefs, but because of limited wealth
and/or limited leverage, both types of investors hold the asset each period.
𝑠𝑡 0 1
𝑝𝑎 1.33 1.22
𝑝𝑏 1.45 1.91
𝑝𝑜 1.85 2.08
𝑝𝑝 1 1
𝑝𝑎̂ 1.85 1.69
𝑝𝑏̂ 1.69 2.08
Here
• 𝑝𝑎 is the equilibrium price function under homogeneous beliefs 𝑃𝑎
• 𝑝𝑏 is the equilibrium price function under homogeneous beliefs 𝑃𝑏
• 𝑝𝑜 is the equilibrium price function under heterogeneous beliefs with optimistic marginal
investors
• 𝑝𝑝 is the equilibrium price function under heterogeneous beliefs with pessimistic
marginal investors
• 𝑝𝑎̂ is the amount type 𝑎 investors are willing to pay for the asset
• 𝑝𝑏̂ is the amount type 𝑏 investors are willing to pay for the asset
We’ll explain these values and how they are calculated one row at a time.
Under homogeneous beliefs $P_h$ (for $h = a, b$), the equilibrium price satisfies $p_h(s) = \beta \sum_{s'} P_h(s, s')(d(s') + p_h(s'))$, which can be solved as

$$\begin{bmatrix} p_h(0) \\ p_h(1) \end{bmatrix} = \beta [I - \beta P_h]^{-1} P_h \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{1}$$
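Applying (1) with Harrison and Kreps's matrices and $\beta = 0.75$ reproduces the first two rows of the summary table; a minimal sketch:

```python
import numpy as np

β = 0.75
P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])
d = np.array([0.0, 1.0])                 # dividend in each state

def price_single_belief(P, β=0.75):
    # p = β (I - βP)^(-1) P d, equation (1)
    return β * np.linalg.solve(np.eye(2) - β * P, P @ d)

p_a = price_single_belief(P_a)           # ≈ [1.33, 1.22]
p_b = price_single_belief(P_b)           # ≈ [1.45, 1.91]
```

These are the "fundamental values" under each belief, against which the heterogeneous-beliefs prices below can be compared.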
The first two rows of the table report 𝑝𝑎 (𝑠) and 𝑝𝑏 (𝑠).
Here’s a function that can be used to compute these values
In [5]: def price_single_beliefs(transition, dividend_payoff, β=0.75):
            """
            Solve for single-beliefs prices via equation (1)
            """
            # First compute inverse piece
            imbq_inv = np.linalg.inv(np.eye(transition.shape[0])
                                     - β * transition)

            # Next compute prices
            prices = β * imbq_inv @ transition @ dividend_payoff

            return prices
These equilibrium prices under homogeneous beliefs are important benchmarks for the subse-
quent analysis.
• 𝑝ℎ (𝑠) tells what investor ℎ thinks is the “fundamental value” of the asset.
• Here “fundamental value” means the expected discounted present value of future divi-
dends.
We will compare these fundamental values of the asset with equilibrium values when traders
have different beliefs.
With heterogeneous beliefs and ample wealth, the marginal investor in each state is the more optimistic type, and the equilibrium price satisfies

$$\bar p(s) = \beta \max \left\{ P_a(s, 0) \bar p(0) + P_a(s, 1)(1 + \bar p(1)), \; P_b(s, 0) \bar p(0) + P_b(s, 1)(1 + \bar p(1)) \right\} \tag{2}$$
for 𝑠 = 0, 1.
The marginal investor who prices the asset in state 𝑠 is of type 𝑎 if
$$P_a(s, 0) \bar p(0) + P_a(s, 1)(1 + \bar p(1)) > P_b(s, 0) \bar p(0) + P_b(s, 1)(1 + \bar p(1))$$
The marginal investor who prices the asset in state $s$ is of type $b$ if

$$P_a(s, 0) \bar p(0) + P_a(s, 1)(1 + \bar p(1)) < P_b(s, 0) \bar p(0) + P_b(s, 1)(1 + \bar p(1))$$
𝑝̄𝑗+1 (𝑠) = 𝛽 max {𝑃𝑎 (𝑠, 0)𝑝̄𝑗 (0) + 𝑃𝑎 (𝑠, 1)(1 + 𝑝̄𝑗 (1)), 𝑃𝑏 (𝑠, 0)𝑝̄𝑗 (0) + 𝑃𝑏 (𝑠, 1)(1 + 𝑝̄𝑗 (1))} (3)
for 𝑠 = 0, 1.
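A sketch of a price_optimistic_beliefs function that iterates on (3). The transition matrices, β = .75, and the exact signature are assumptions here:

```python
import numpy as np

def price_optimistic_beliefs(transitions, dividend_payoff, β=.75,
                             max_iter=500, tol=1e-10):
    """Iterate on equation (3): in each state the asset is priced
    by the most optimistic belief about next period's payoff."""
    p_new = np.zeros(2)
    for _ in range(max_iter):
        p_old = p_new
        # Payoff vector: p̄(0) if next state is 0, 1 + p̄(1) if it is 1
        payoff = np.array([p_old[0], 1 + p_old[1]])
        p_new = β * np.max([q @ payoff for q in transitions], axis=0)
        if np.max(np.abs(p_new - p_old)) < tol:
            break
    return p_new

qa = np.array([[1/2, 1/2], [2/3, 1/3]])  # assumed P_a
qb = np.array([[2/3, 1/3], [1/4, 3/4]])  # assumed P_b
p_optimistic = price_optimistic_beliefs([qa, qb], np.array([0., 1.]))
```

With these matrices the iteration converges to the third row of the table.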
The third row of the table reports equilibrium prices that solve the functional equation when
𝛽 = .75.
Here the type that is optimistic about 𝑠𝑡+1 prices the asset in state 𝑠𝑡 .
It is instructive to compare these prices with the equilibrium prices for the homogeneous be-
lief economies that solve under beliefs 𝑃𝑎 and 𝑃𝑏 .
Equilibrium prices 𝑝̄ in the heterogeneous beliefs economy exceed what any prospective in-
vestor regards as the fundamental value of the asset in each possible state.
Nevertheless, the economy recurrently visits a state that makes each investor want to pur-
chase the asset for more than he believes its future dividends are worth.
The reason is that he expects to have the option to sell the asset later to another investor
who will value the asset more highly than he will.
• Investors of type 𝑎 are willing to pay the following price for the asset
p̂_a(s) = { p̄(0)                                        if s_t = 0
         { β (P_a(1, 0) p̄(0) + P_a(1, 1)(1 + p̄(1)))    if s_t = 1
• Investors of type 𝑏 are willing to pay the following price for the asset
p̂_b(s) = { β (P_b(0, 0) p̄(0) + P_b(0, 1)(1 + p̄(1)))    if s_t = 0
         { p̄(1)                                        if s_t = 1
Outcomes differ when the more optimistic type of investor has insufficient wealth — or insuf-
ficient ability to borrow enough — to hold the entire stock of the asset.
In this case, the asset price must adjust to attract pessimistic investors.
Instead of equation (2), the equilibrium price satisfies
p̌(s) = β min {P_a(s, 0) p̌(0) + P_a(s, 1)(1 + p̌(1)), P_b(s, 0) p̌(0) + P_b(s, 1)(1 + p̌(1))}    (4)
and the marginal investor who prices the asset is always the one that values it less highly
than does the other type.
Now the marginal investor is always the (temporarily) pessimistic type.
Notice from the table that the pessimistic price p̌ = p_p is lower than the homogeneous belief
prices p_a and p_b in both states.
When pessimistic investors price the asset according to (4), optimistic investors think that
the asset is underpriced.
If they could, optimistic investors would willingly borrow at the one-period gross interest rate
𝛽 −1 to purchase more of the asset.
Implicit constraints on leverage prohibit them from doing so.
When optimistic investors price the asset as in equation (2), pessimistic investors think that
the asset is overpriced and would like to sell the asset short.
Constraints on short sales prevent that.
Here’s code to solve for 𝑝̌ using iteration
return p_new
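A sketch of that iteration, with min replacing max in the optimistic recursion. The same assumed transition matrices as before:

```python
import numpy as np

def price_pessimistic_beliefs(transitions, dividend_payoff, β=.75,
                              max_iter=500, tol=1e-10):
    """Iterate on equation (4): the asset is priced in each state
    by the investor who values it less highly."""
    p_new = np.zeros(2)
    for _ in range(max_iter):
        p_old = p_new
        payoff = np.array([p_old[0], 1 + p_old[1]])
        p_new = β * np.min([q @ payoff for q in transitions], axis=0)
        if np.max(np.abs(p_new - p_old)) < tol:
            break
    return p_new

qa = np.array([[1/2, 1/2], [2/3, 1/3]])  # assumed P_a
qb = np.array([[2/3, 1/3], [1/4, 3/4]])  # assumed P_b
p_pessimistic = price_pessimistic_beliefs([qa, qb], np.array([0., 1.]))
```

With these matrices the iteration converges to p̌ = (1, 1), the fourth row of the table.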
52.5 Exercises
52.5.1 Exercise 1
Recreate the summary table using the functions we have built above.
𝑠𝑡 0 1
𝑝𝑎 1.33 1.22
𝑝𝑏 1.45 1.91
𝑝𝑜 1.85 2.08
𝑝𝑝 1 1
𝑝𝑎̂ 1.85 1.69
𝑝𝑏̂ 1.69 2.08
You will first need to define the transition matrices and dividend payoff vector.
52.6 Solutions
52.6.1 Exercise 1
First, we will obtain equilibrium price vectors with homogeneous beliefs, including when all
investors are optimistic or pessimistic.
p_a
====================
State 0: [1.33]
State 1: [1.22]
p_b
====================
State 0: [1.45]
State 1: [1.91]
p_optimistic
====================
State 0: [1.85]
State 1: [2.08]
p_pessimistic
====================
State 0: [1.]
State 1: [1.]
We will use the price_optimistic_beliefs function to find the price under heterogeneous be-
liefs.
print(f"State 0: {s0}")
print(f"State 1: {s1}")
print("=" * 20)
p_optimistic
====================
State 0: [1.85]
State 1: [2.08]
p_hat_a
====================
State 0: [1.85]
State 1: [1.69]
p_hat_b
====================
State 0: [1.69]
State 1: [2.08]
Notice that the equilibrium price with heterogeneous beliefs is equal to the price under single
beliefs with optimistic investors - this is due to the marginal investor being the temporarily
optimistic type.
Footnotes
[1] By assuming that both types of agents always have “deep enough pockets” to purchase
all of the asset, the model takes wealth dynamics off the table. The Harrison-Kreps model
generates high trading volume when the state changes either from 0 to 1 or from 1 to 0.
Part IX
Chapter 53
Pandas for Panel Data
53.1 Contents
• Overview 53.2
• Slicing and Reshaping Data 53.3
• Merging Dataframes and Filling NaNs 53.4
• Grouping and Summarizing Data 53.5
• Final Remarks 53.6
• Exercises 53.7
• Solutions 53.8
53.2 Overview
We will read in a dataset from the OECD of real minimum wages in 32 countries and assign
it to realwage.
The dataset can be accessed with the following link:
realwage = pd.read_csv(url1)
The data is currently in long format, which is difficult to analyze when there are several di-
mensions to the data.
We will use pivot_table to create a wide format panel, with a MultiIndex to handle higher
dimensional data.
pivot_table arguments should specify the data (values), the index, and the columns we want
in our resulting dataframe.
By passing a list in columns, we can create a MultiIndex in our column axis
Time
20060101 20,410.65 10.33
20070101 21,087.57 10.67
20080101 20,718.24 10.48
20090101 20,984.77 10.62
20100101 20,879.33 10.57
Country … \
Series In 2015 constant prices at 2015 USD exchange rates …
Pay period Annual …
Time …
20060101 23,826.64 …
20070101 24,616.84 …
20080101 24,185.70 …
20090101 24,496.84 …
20100101 24,373.76 …
Country
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Time
20060101 12,594.40 6.05
20070101 12,974.40 6.24
20080101 14,097.56 6.78
20090101 15,756.42 7.58
20100101 16,391.31 7.88
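As a minimal illustration of this reshaping step, here is a toy long-format frame pivoted to wide format. The column names mirror those described above, but the data is hypothetical, not the OECD dataset:

```python
import pandas as pd

# A small long-format frame with hypothetical values
long_df = pd.DataFrame({
    'Country': ['Australia', 'Australia', 'Belgium', 'Belgium'],
    'Pay period': ['Annual', 'Hourly', 'Annual', 'Hourly'],
    'Time': ['2006-01-01'] * 4,
    'value': [23826.64, 12.06, 21042.28, 10.09],
})

# Passing a list to `columns` creates a MultiIndex on the column axis
wide = long_df.pivot_table(values='value',
                           index='Time',
                           columns=['Country', 'Pay period'])
```

The resulting columns form a two-level MultiIndex (Country > Pay period), so individual series can be selected with tuples such as `('Belgium', 'Hourly')`.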
To more easily filter our time series data later on, we will convert the index into a
DateTimeIndex
Out[5]: pandas.core.indexes.datetimes.DatetimeIndex
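A sketch of the conversion, assuming the index holds date strings:

```python
import pandas as pd

# Hypothetical frame whose index is date strings
df = pd.DataFrame({'x': [1, 2]}, index=['2006-01-01', '2007-01-01'])

# Convert the string index into a DatetimeIndex
df.index = pd.to_datetime(df.index)
```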
The columns contain multiple levels of indexing, known as a MultiIndex, with levels being
ordered hierarchically (Country > Series > Pay period).
A MultiIndex is the simplest and most flexible way to manage panel data in pandas
In [6]: type(realwage.columns)
Out[6]: pandas.core.indexes.multi.MultiIndex
In [7]: realwage.columns.names
Like before, we can select the country (the top level of our MultiIndex)
Stacking and unstacking levels of the MultiIndex will be used throughout this lecture to re-
shape our dataframe into a format we need.
.stack() rotates the lowest level of the column MultiIndex to the row index (.unstack()
works in the opposite direction - try it out)
In [9]: realwage.stack().head()
Country \
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
20060101 Annual 23,826.64
Hourly 12.06
20070101 Annual 24,616.84
Hourly 12.46
20080101 Annual 24,185.70
Country Belgium … \
Series In 2015 constant prices at 2015 USD PPPs …
Time Pay period …
20060101 Annual 21,042.28 …
Hourly 10.09 …
20070101 Annual 21,310.05 …
Hourly 10.22 …
20080101 Annual 21,416.96 …
Country
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
20060101 Annual 12,594.40
Hourly 6.05
20070101 Annual 12,974.40
Hourly 6.24
20080101 Annual 14,097.56
[5 rows x 64 columns]
We can also pass in an argument to select the level we would like to stack
In [10]: realwage.stack(level='Country').head()
Time
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Country
Australia 25,349.90 12.83
Belgium 20,753.48 9.95
Brazil 2,842.28 1.21
Canada 17,367.24 8.35
Chile 4,251.49 1.81
For the rest of the lecture, we will work with a dataframe of the hourly real minimum wages
across countries and time, measured in 2015 US dollars.
To create our filtered dataframe (realwage_f), we can use the xs method to select values at
lower levels in the multiindex, while keeping the higher levels (countries in this case)
In [12]: realwage_f = realwage.xs(('Hourly', 'In 2015 constant prices at 2015 USD exchange
rates'),
level=('Pay period', 'Series'), axis=1)
realwage_f.head()
[5 rows x 32 columns]
Similar to relational databases like SQL, pandas has built in methods to merge datasets to-
gether.
Using country information from WorldData.info, we’ll add the continent of each country to
realwage_f with the merge function.
[5 rows x 17 columns]
First, we’ll select just the country and continent variables from worlddata and rename the
column to ‘Country’
In [16]: realwage_f.transpose().head()
Time 20160101
Country
Australia 12.98
Belgium 9.76
Brazil 1.24
Canada 8.48
Chile 1.91
[5 rows x 11 columns]
We can use either left, right, inner, or outer join to merge our datasets:
• left join includes only countries from the left dataset
• right join includes only countries from the right dataset
• outer join includes countries that are in either the left or right datasets
• inner join includes only countries common to both the left and right datasets
By default, merge will use an inner join.
Here we will pass how='left' to keep all countries in realwage_f, but discard countries in
worlddata that do not have a corresponding data entry in realwage_f.
We will also need to specify where the country name is located in each dataframe, which will
be the key that is used to merge the dataframes ‘on’.
Our ‘left’ dataframe (realwage_f.transpose()) contains countries in the index, so we set
left_index=True.
Our ‘right’ dataframe (worlddata) contains countries in the ‘Country’ column, so we set
right_on='Country'
[5 rows x 13 columns]
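A toy version of this merge, with hypothetical data; countries missing from the right frame get NaN in the 'Continent' column, just as described below:

```python
import pandas as pd

# 'Left' frame: countries in the index (like realwage_f.transpose())
left = pd.DataFrame({'2016': [12.98, 9.76, 1.91]},
                    index=['Australia', 'Belgium', 'Chile'])

# 'Right' frame: countries in a 'Country' column (like worlddata)
right = pd.DataFrame({'Country': ['Australia', 'Belgium'],
                      'Continent': ['Australia', 'Europe']})

# how='left' keeps every row of `left`;
# left_index / right_on name the join keys on each side
merged = left.merge(right, how='left', left_index=True, right_on='Country')
```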
Countries that appeared in realwage_f but not in worlddata will have NaN in the Continent
column.
To check whether this has occurred, we can use .isnull() on the continent column and filter
the merged dataframe
In [18]: merged[merged['Continent'].isnull()]
[3 rows x 13 columns]
merged['Country'].map(missing_continents)
122.00 NaN
123.00 NaN
138.00 NaN
153.00 NaN
151.00 NaN
174.00 NaN
175.00 NaN
nan Europe
nan Europe
198.00 NaN
200.00 NaN
227.00 NaN
241.00 NaN
240.00 NaN
Name: Country, dtype: object
In [20]: merged['Continent'] = merged['Continent'].fillna(merged['Country'].map(missing_continents))
merged[merged['Country'] == 'Korea']
[1 rows x 13 columns]
We will also combine the Americas into a single continent - this will make our visualization
nicer later on.
To do this, we will use .replace() and loop through a list of the continent values we want to
replace
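A sketch of that replacement loop on a hypothetical series of continent labels:

```python
import pandas as pd

continents = pd.Series(['South America', 'Europe',
                        'North America', 'Central America'])

# Collapse the separate American continents into a single label
replace = ['Central America', 'North America', 'South America']
for old in replace:
    continents = continents.replace(old, 'America')
```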
Now that we have all the data we want in a single DataFrame, we will reshape it back into
panel form with a MultiIndex.
We should also ensure to sort the index using .sort_index() so that we can efficiently filter
our dataframe later on.
By default, levels will be sorted top-down
20150101 20160101
Continent Country
America Brazil 1.21 1.24
Canada 8.35 8.48
Chile 1.81 1.91
Colombia 1.13 1.12
Costa Rica 2.56 2.63
[5 rows x 11 columns]
While merging, we lost our DatetimeIndex, as we merged columns that were not in datetime
format
In [23]: merged.columns
Now that we have set the merged columns as the index, we can recreate a DatetimeIndex us-
ing .to_datetime()
The DatetimeIndex tends to work more smoothly in the row axis, so we will go ahead and
transpose merged
[5 rows x 32 columns]
Grouping and summarizing data can be particularly useful for understanding large panel
datasets.
A simple way to summarize data is to call an aggregation method on the dataframe, such as
.mean() or .max().
For example, we can calculate the average real minimum wage for each country over the pe-
riod 2006 to 2016 (the default is to aggregate over rows)
In [26]: merged.mean().head(10)
Using this series, we can plot the average real minimum wage over the past decade for each
country in our data set
plt.show()
Passing in axis=1 to .mean() will aggregate over columns (giving the average minimum wage
for all countries over time)
In [28]: merged.mean(axis=1).head()
Out[28]: Time
20060101 4.69
20070101 4.84
20080101 4.90
20090101 5.08
20100101 5.11
dtype: float64
In [29]: merged.mean(axis=1).plot()
plt.title('Average real minimum wage 2006 - 2016')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()
We can also specify a level of the MultiIndex (in the column axis) to aggregate over
We can plot the average minimum wages in each continent as a time series
In [33]: merged.stack().describe()
Calling an aggregation method on the object applies the function to each group, the results of
which are combined in a new data structure.
For example, we can return the number of countries in our dataset for each continent using
.size().
In [35]: grouped.size()
Out[35]: Continent
America 7
Asia 4
Europe 19
dtype: int64
Calling .get_group() to return just the countries in a single group, we can create a kernel
density estimate of the distribution of real minimum wages in 2016 for each continent.
grouped.groups.keys() will return the keys from the groupby object
continents = grouped.groups.keys()
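The grouping operations above can be sketched on hypothetical data (the seaborn kernel density step is omitted so the example stays dependency-light):

```python
import pandas as pd

# Hypothetical country-level data
df = pd.DataFrame({
    'Continent': ['America', 'America', 'Europe', 'Asia'],
    'wage2016': [8.48, 1.91, 9.76, 0.92],
})

grouped = df.groupby('Continent')

sizes = grouped.size()                  # number of countries per continent
keys = list(grouped.groups.keys())      # the group labels
america = grouped.get_group('America')  # just the rows in one group
```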
This lecture has provided an introduction to some of pandas’ more advanced features, includ-
ing multiindices, merging, grouping and plotting.
Other tools that may be useful in panel data analysis include xarray, a Python package that
extends pandas to N-dimensional data structures.
53.7 Exercises
53.7.1 Exercise 1
In these exercises, you’ll work with a dataset of employment rates in Europe by age and sex
from Eurostat.
The dataset can be accessed with the following link:
Reading in the CSV file returns a panel dataset in long format. Use .pivot_table() to con-
struct a wide format dataframe with a MultiIndex in the columns.
Start off by exploring the dataframe and the variables available in the MultiIndex levels.
Write a program that quickly returns all values in the MultiIndex.
53.7.2 Exercise 2
Filter the above dataframe to only include employment as a percentage of ‘active population’.
Create a grouped boxplot using seaborn of employment rates in 2015 by age group and sex.
Hint: GEO includes both areas and countries.
53.8 Solutions
53.8.1 Exercise 1
UNIT
AGE
SEX
INDIC_EM
GEO United Kingdom
DATE
20070101 4,131.00
20080101 4,204.00
20090101 4,193.00
20100101 4,186.00
20110101 4,164.00
This is a large dataset so it is useful to explore the levels and variables available
In [39]: employ.columns.names
53.8.2 Exercise 2
To easily filter by country, swap GEO to the top level and sort the MultiIndex
We need to get rid of a few items in GEO which are not countries.
A fast way to get rid of the EU areas is to use a list comprehension to find the level values in
GEO that begin with ‘Euro’
Select only percentage employed in the active population from the dataframe
GEO
AGE
SEX Total
DATE
20070101 59.30
20080101 59.80
20090101 60.30
20100101 60.00
20110101 59.70
Chapter 54
Linear Regression in Python
54.1 Contents
• Overview 54.2
• Simple Linear Regression 54.3
• Extending the Linear Regression Model 54.4
• Endogeneity 54.5
• Summary 54.6
• Exercises 54.7
• Solutions 54.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
54.2 Overview
Linear regression is a standard tool for analyzing the relationship between two or more vari-
ables.
In this lecture, we’ll use the Python package statsmodels to estimate, interpret, and visualize
linear regression models.
Along the way, we’ll discuss a variety of topics, including
• simple and multivariate linear regression
• visualization
• endogeneity and omitted variable bias
• two-stage least squares
As an example, we will replicate results from Acemoglu, Johnson and Robinson’s seminal pa-
per [1].
• You can download a copy here.
In the paper, the authors emphasize the importance of institutions in economic development.
The main contribution is the use of settler mortality rates as a source of exogenous variation
in institutional differences.
Such variation is needed to determine whether it is institutions that give rise to greater eco-
nomic growth, rather than the other way around.
Let’s start with some imports:
54.2.1 Prerequisites
[1] wish to determine whether or not differences in institutions can help to explain observed
economic outcomes.
How do we measure institutional differences and economic outcomes?
In this paper,
• economic outcomes are proxied by log GDP per capita in 1995, adjusted for exchange
rates.
• institutional differences are proxied by an index of protection against expropriation on
average over 1985-95, constructed by the Political Risk Services Group.
These variables and other data used in the paper are available for download on Daron Ace-
moglu’s webpage.
We will use pandas’ .read_stata() function to read in data contained in the .dta files to
dataframes
Let’s use a scatterplot to see whether any obvious relationship exists between GDP per capita
and the protection against expropriation index
In [4]: plt.style.use('seaborn')
The plot shows a fairly strong positive relationship between protection against expropriation
and log GDP per capita.
Specifically, if higher protection against expropriation is a measure of institutional quality,
then better institutions appear to be positively correlated with better economic outcomes
(higher GDP per capita).
Given the plot, choosing a linear model to describe this relationship seems like a reasonable
assumption.
We can write our model as
𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 = 𝛽0 + 𝛽1 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 + 𝑢𝑖
where:
• 𝛽0 is the intercept of the linear trend line on the y-axis
• 𝛽1 is the slope of the linear trend line, representing the marginal effect of protection
against risk on log GDP per capita
• 𝑢𝑖 is a random error term (deviations of observations from the linear trend due to fac-
tors not included in the model)
Visually, this linear model involves choosing a straight line that best fits the data, as in the
following plot (Figure 2 in [1])
X = df1_subset['avexpr']
y = df1_subset['logpgp95']
labels = df1_subset['shortnam']
ax.set_xlim([3.3,10.5])
ax.set_ylim([4,10.5])
ax.set_xlabel('Average Expropriation Risk 1985-95')
ax.set_ylabel('Log GDP per capita, PPP, 1995')
ax.set_title('Figure 2: OLS relationship between expropriation \
risk and income')
plt.show()
The most common technique to estimate the parameters (𝛽’s) of the linear model is Ordinary
Least Squares (OLS).
As the name implies, an OLS model is solved by finding the parameters that minimize the
sum of squared residuals, i.e.
min_{β̂} ∑_{i=1}^{N} û_i²
where 𝑢̂𝑖 is the difference between the observation and the predicted value of the dependent
variable.
To estimate the constant term 𝛽0 , we need to add a column of 1’s to our dataset (consider
the equation if 𝛽0 was replaced with 𝛽0 𝑥𝑖 and 𝑥𝑖 = 1)
In [6]: df1['const'] = 1
Now we can construct our model in statsmodels using the OLS function.
We will use pandas dataframes with statsmodels, however standard arrays can also be used
as arguments
Out[7]: statsmodels.regression.linear_model.OLS
Out[8]: statsmodels.regression.linear_model.RegressionResultsWrapper
In [9]: print(results.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
logpgp95̂_i = 4.63 + 0.53 avexpr_i
This equation describes the line that best fits our data, as shown in Figure 2.
We can use this equation to predict the level of log GDP per capita for a value of the index of
expropriation protection.
For example, for a country with an index value of 7.07 (the average for the dataset), we find
that their predicted level of log GDP per capita in 1995 is 8.38.
Out[10]: 6.515625
Out[11]: 8.3771
An easier (and more accurate) way to obtain this result is to use .predict() and set
𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡 = 1 and 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 = 𝑚𝑒𝑎𝑛_𝑒𝑥𝑝𝑟
Out[12]: array([8.09156367])
We can obtain an array of predicted 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 for every value of 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 in our dataset by
calling .predict() on our results.
Plotting the predicted values against 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 shows that the predicted values lie along the
linear line that we fitted above.
The observed values of 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 are also plotted for comparison purposes
fix, ax = plt.subplots()
ax.scatter(df1_plot['avexpr'], results.predict(), alpha=0.5,
label='predicted')
ax.legend()
ax.set_title('OLS predicted values')
ax.set_xlabel('avexpr')
ax.set_ylabel('logpgp95')
plt.show()
So far we have only accounted for institutions affecting economic performance - almost cer-
tainly there are numerous other factors affecting GDP that are not included in our model.
Leaving out variables that affect 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 will result in omitted variable bias, yielding
biased and inconsistent parameter estimates.
We can extend our bivariate regression model to a multivariate regression model by
adding in other factors that may affect 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 .
[1] consider other factors such as:
• the effect of climate on economic outcomes; latitude is used to proxy this
• differences that affect both economic performance and institutions, eg. cultural, histori-
cal, etc.; controlled for with the use of continent dummies
Let’s estimate some of the extended models considered in the paper (Table 2) using data from
maketable2.dta
Now that we have fitted our model, we will use summary_col to display the results in a single
table (model numbers correspond to those in the paper)
results_table = summary_col(results=[reg1,reg2,reg3],
float_format='%0.2f',
stars = True,
model_names=['Model 1',
'Model 3',
'Model 4'],
info_dict=info_dict,
regressor_order=['const',
'avexpr',
'lat_abst',
'asia',
'africa'])
print(results_table)
54.5 Endogeneity
As [1] discuss, the OLS models likely suffer from endogeneity issues, resulting in biased and
inconsistent model estimates.
Namely, there is likely a two-way relationship between institutions and economic outcomes:
• richer countries may be able to afford or prefer better institutions
• variables that affect income may also be correlated with institutional differences
• the construction of the index may be biased; analysts may be biased towards seeing
countries with higher income having better institutions
To deal with endogeneity, we can use two-stage least squares (2SLS) regression, which
is an extension of OLS regression.
This method requires replacing the endogenous variable 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 with a variable that is:
1. correlated with 𝑎𝑣𝑒𝑥𝑝𝑟𝑖
2. not correlated with the error term (i.e. it should not directly affect the dependent variable)
The new set of regressors is called an instrument, which aims to remove endogeneity in our
proxy of institutional differences.
The main contribution of [1] is the use of settler mortality rates to instrument for institu-
tional differences.
They hypothesize that higher mortality rates of colonizers led to the establishment of insti-
tutions that were more extractive in nature (less protection against expropriation), and these
institutions still persist today.
Using a scatterplot (Figure 3 in [1]), we can see protection against expropriation is negatively
correlated with settler mortality rates, coinciding with the authors’ hypothesis and satisfying
the first condition of a valid instrument.
X = df1_subset2['logem4']
y = df1_subset2['avexpr']
labels = df1_subset2['shortnam']
ax.set_xlim([1.8,8.4])
ax.set_ylim([3.3,10.4])
ax.set_xlabel('Log of Settler Mortality')
ax.set_ylabel('Average Expropriation Risk 1985-95')
ax.set_title('Figure 3: First-stage relationship between settler mortality \
and expropriation risk')
plt.show()
The second condition may not be satisfied if settler mortality rates in the 17th to 19th
centuries have a direct effect on current GDP (in addition to their indirect effect through
institutions).
For example, settler mortality rates may be related to the current disease environment in a
country, which could affect current economic performance.
[1] argue this is unlikely because:
• The majority of settler deaths were due to malaria and yellow fever and had a limited
effect on local people.
• The disease burden on local people in Africa or India, for example, did not appear to
be higher than average, supported by relatively high population densities in these areas
before colonization.
As we appear to have a valid instrument, we can use 2SLS regression to obtain consistent and
unbiased parameter estimates.
First stage
The first stage involves regressing the endogenous variable (𝑎𝑣𝑒𝑥𝑝𝑟𝑖 ) on the instrument.
The instrument is the set of all exogenous variables in our model (and not just the variable
we have replaced).
Using model 1 as an example, our instrument is simply a constant and settler mortality rates
𝑙𝑜𝑔𝑒𝑚4𝑖 .
Therefore, we will estimate the first-stage regression as
𝑎𝑣𝑒𝑥𝑝𝑟𝑖 = 𝛿0 + 𝛿1 𝑙𝑜𝑔𝑒𝑚4𝑖 + 𝑣𝑖
The data we need to estimate this equation is located in maketable4.dta (only complete data,
indicated by baseco = 1, is used for estimation)
const 9.3414 0.611 15.296 0.000 8.121 10.562
logem4 -0.6068 0.127 -4.790 0.000 -0.860 -0.354
==============================================================================
Omnibus: 0.035 Durbin-Watson: 2.003
Prob(Omnibus): 0.983 Jarque-Bera (JB): 0.172
Skew: 0.045 Prob(JB): 0.918
Kurtosis: 2.763 Cond. No. 19.4
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
Second stage
We need to retrieve the predicted values of avexpr_i using .predict().
We then replace the endogenous variable avexpr_i with the predicted values avexpr̂_i in the
original linear model.
Our second stage regression is thus
logpgp95_i = β0 + β1 avexpr̂_i + u_i
results_ss = sm.OLS(df4['logpgp95'],
df4[['const', 'predicted_avexpr']]).fit()
print(results_ss.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
The second-stage regression results give us an unbiased and consistent estimate of the effect
of institutions on economic outcomes.
The result suggests a stronger positive relationship than what the OLS results indicated.
Note that while our parameter estimates are correct, our standard errors are not and for this
reason, computing 2SLS ‘manually’ (in stages with OLS) is not recommended.
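The two stages can be sketched with numpy on simulated data; everything below is synthetic, not the Acemoglu, Johnson and Robinson dataset. The point of the sketch is that 2SLS recovers the true coefficient where OLS is biased:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulate an endogenous regressor: x and the error u share the component v
z = rng.normal(size=n)              # instrument: drives x, independent of u
v = rng.normal(size=n)
x = 0.8 * z + v                     # endogenous regressor
u = 0.5 * v + rng.normal(size=n)    # error correlated with x through v
y = 2.0 + 1.0 * x + u               # true slope is 1.0

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# First stage: regress x on the instrument, keep the fitted values
pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ x)
x_hat = Z @ pi_hat

# Second stage: regress y on the fitted values
X_hat = np.column_stack([np.ones(n), x_hat])
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# OLS for comparison -- biased upward here because cov(x, u) > 0
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

This illustrates the caveat in the text: the manual second-stage point estimate is consistent, but its naive standard errors are wrong, which is why a one-step routine such as IV2SLS is preferred in practice.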
We can correctly estimate a 2SLS regression in one step using the linearmodels package, an
extension of statsmodels
Note that when using IV2SLS, the exogenous and instrument variables are split up in the
function arguments (whereas before the instrument included exogenous variables)
In [19]: iv = IV2SLS(dependent=df4['logpgp95'],
exog=df4['const'],
endog=df4['avexpr'],
instruments=df4['logem4']).fit(cov_type='unadjusted')
print(iv.summary)
Parameter Estimates
==============================================================================
Parameter Std. Err. Tstat Pvalue Lower CI Upper CI
const 1.9097 1.0106 1.8897 0.0588 0.0710 3.8903
avexpr 0.9443 0.1541 6.1293 0.0000 0.6423 1.2462
==============================================================================
Endogenous: avexpr
Instruments: logem4
Unadjusted Covariance (Homoskedastic)
Debiased: False
Given that we now have consistent and unbiased estimates, we can infer from the model we
have estimated that institutional differences (stemming from institutions set up during colo-
nization) can help to explain differences in income levels across countries today.
[1] use a marginal effect of 0.94 to calculate that the difference in the index between Chile
and Nigeria (ie. institutional quality) implies up to a 7-fold difference in income, emphasizing
the significance of institutions in economic development.
54.6 Summary
We have demonstrated basic OLS and 2SLS regression in statsmodels and linearmodels.
If you are familiar with R, you may want to use the formula interface to statsmodels, or con-
sider using r2py to call R from within Python.
54.7 Exercises
54.7.1 Exercise 1
In the lecture, we think the original model suffers from endogeneity bias due to the likely ef-
fect income has on institutional development.
Although endogeneity is often best identified by thinking about the data and model, we can
formally test for endogeneity using the Hausman test.
We want to test for correlation between the endogenous variable, avexpr_i, and the errors, u_i.
First, we regress avexpr_i on the instrument, logem4_i
avexpr_i = π0 + π1 logem4_i + υ_i
Second, we retrieve the residuals υ̂_i and include them in the original equation
logpgp95_i = β0 + β1 avexpr_i + α υ̂_i + e_i
If 𝛼 is statistically significant (with a p-value < 0.05), then we reject the null hypothesis and
conclude that 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 is endogenous.
Using the above information, estimate a Hausman test and interpret your results.
54.7.2 Exercise 2
The OLS parameter 𝛽 can also be estimated using matrix algebra and numpy (you may need
to review the numpy lecture to complete this exercise).
The linear equation we want to estimate is (written in matrix form)
𝑦 = 𝑋𝛽 + 𝑢
To solve for the unknown parameter 𝛽, we want to minimize the sum of squared residuals
min_{β̂} û′û
Rearranging the first equation and substituting into the second equation, we can write
min_{β̂} (y − Xβ̂)′(y − Xβ̂)
Solving this optimization problem gives the solution for the 𝛽 ̂ coefficients
β̂ = (X′X)⁻¹ X′y
Using the above information, compute 𝛽 ̂ from model 1 using numpy - your results should be
the same as those in the statsmodels output from earlier in the lecture.
54.8 Solutions
54.8.1 Exercise 1
print(reg2.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
The output shows that the coefficient on the residuals is statistically significant, indicating
𝑎𝑣𝑒𝑥𝑝𝑟𝑖 is endogenous.
54.8.2 Exercise 2
# Compute β_hat
β_hat = np.linalg.solve(X.T @ X, X.T @ y)
β_0 = 4.6
β_1 = 0.53
Chapter 55
Maximum Likelihood Estimation
55.1 Contents
• Overview 55.2
• Set Up and Assumptions 55.3
• Conditional Distributions 55.4
• Maximum Likelihood Estimation 55.5
• MLE with Numerical Methods 55.6
• Maximum Likelihood Estimation with statsmodels 55.7
• Summary 55.8
• Exercises 55.9
• Solutions 55.10
55.2 Overview
In a previous lecture, we estimated the relationship between dependent and explanatory vari-
ables using linear regression.
But what if a linear relationship is not an appropriate assumption for our model?
One widely used alternative is maximum likelihood estimation, which involves specifying a
class of distributions, indexed by unknown parameters, and then using the data to pin down
these parameter values.
The benefit relative to linear regression is that it allows more flexibility in the probabilistic
relationships between variables.
Here we illustrate maximum likelihood by replicating Daniel Treisman’s (2016) paper, Rus-
sia’s Billionaires, which connects the number of billionaires in a country to its economic char-
acteristics.
The paper concludes that Russia has a higher number of billionaires than economic factors
such as market size and tax rate predict.
We’ll require the following imports:
55.2.1 Prerequisites
Let’s consider the steps we need to go through in maximum likelihood estimation and how
they pertain to this study.
The first step with maximum likelihood estimation is to choose the probability distribution
believed to be generating the data.
More precisely, we need to make an assumption as to which parametric class of distributions
is generating the data.
• e.g., the class of all normal distributions, or the class of all gamma distributions.
Each such class is a family of distributions indexed by a finite number of parameters.
• e.g., the class of normal distributions is a family of distributions indexed by its mean
𝜇 ∈ (−∞, ∞) and standard deviation 𝜎 ∈ (0, ∞).
We’ll let the data pick out a particular element of the class by pinning down the parameters.
The parameter estimates so produced will be called maximum likelihood estimates.
The Poisson distribution has probability mass function
f(y) = (μ^y / y!) e^(−μ),    y = 0, 1, 2, …
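Since later code refers to a poisson_pmf function, here is a sketch of one consistent with the formula above (a scalar version; the lecture's own version may be vectorized):

```python
from math import exp, factorial

def poisson_pmf(y, μ):
    """Poisson pmf f(y | μ) = (μ**y / y!) * e**(-μ) for y = 0, 1, 2, ..."""
    return μ ** y / factorial(y) * exp(-μ)
```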
We can plot the Poisson distribution over 𝑦 for different values of 𝜇 as follows
y_values = range(0, 25)
fig, ax = plt.subplots(figsize=(12, 8))
for μ in [1, 5, 10]:
    pmf = [poisson_pmf(y, μ) for y in y_values]
    ax.plot(y_values, pmf, marker='o', alpha=0.5, label=f'$\\mu$={μ}')
ax.grid()
ax.set_xlabel('$y$', fontsize=14)
ax.set_ylabel('$f(y \\mid \\mu)$', fontsize=14)
ax.axis(xmin=0, ymin=0)
ax.legend(fontsize=14)
plt.show()
Notice that the Poisson distribution begins to resemble a normal distribution as the mean of
𝑦 increases.
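A quick numerical check of this, using scipy’s built-in distributions (the value of 𝜇 and the tolerance are illustrative):

```python
import numpy as np
from scipy.stats import norm, poisson

# Compare the Poisson(μ) pmf with the Normal(μ, sqrt(μ)) density pointwise
μ = 30
y = np.arange(0, 2 * μ)
gap = np.max(np.abs(poisson.pmf(y, μ) - norm.pdf(y, loc=μ, scale=np.sqrt(μ))))
print(gap)
```

For 𝜇 = 30 the largest pointwise gap is already below 0.01, and it shrinks further as 𝜇 grows.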
Let’s have a look at the distribution of the data we’ll be working with in this lecture.
Treisman’s main source of data is Forbes’ annual rankings of billionaires and their estimated
net worth.
The dataset mle/fp.dta can be downloaded here or from its AER page.
In [3]: pd.options.display.max_columns = 10

        # Load the dataset downloaded above (the path here is illustrative)
        df = pd.read_stata('fp.dta')
        df.head()
[5 rows x 36 columns]
Using a histogram, we can view the distribution of the number of billionaires per country,
numbil0, in 2008 (the United States is dropped for plotting purposes)
numbil0_2008 = df[(df['year'] == 2008) & (
    df['country'] != 'United States')].loc[:, 'numbil0']

plt.subplots(figsize=(12, 8))
plt.hist(numbil0_2008, bins=30)
plt.xlim(left=0)
plt.grid()
plt.xlabel('Number of billionaires in 2008')
plt.ylabel('Count')
plt.show()
From the histogram, it appears that the Poisson assumption is not unreasonable (albeit with a very low 𝜇 and some outliers).

55.4 Conditional Distributions
In Treisman’s paper, the number of billionaires 𝑦𝑖 in country 𝑖 is modeled as conditionally Poisson:

$$
f(y_i \mid \mathbf{x}_i) = \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}; \qquad y_i = 0, 1, 2, \ldots \tag{1}
$$

where $\mu_i = \exp(\mathbf{x}_i' \beta)$.
To illustrate the idea that the distribution of 𝑦𝑖 depends on x𝑖 let’s run a simple simulation.
We use our poisson_pmf function from above and arbitrary values for 𝛽 and x𝑖
y_values = range(0, 20)

# Illustrative coefficient and regressor values
β = np.array([0.26, 0.18, 0.25, -0.1, -0.22])
datasets = [np.array([0, 1, 1, 1, 2]),
            np.array([2, 3, 2, 4, 0])]

fig, ax = plt.subplots(figsize=(12, 8))

for X in datasets:
    μ = exp(X @ β)
    distribution = []
    for y_i in y_values:
        distribution.append(poisson_pmf(y_i, μ))
    ax.plot(y_values,
            distribution,
            label=f'$\\mu_i$={μ:.1}',
            marker='o',
            markersize=8,
            alpha=0.5)

ax.grid()
ax.legend()
ax.set_xlabel('$y \\mid x_i$')
ax.set_ylabel(r'$f(y \mid x_i; \beta )$')
ax.axis(xmin=0, ymin=0)
plt.show()
55.5 Maximum Likelihood Estimation

In our model for number of billionaires, the conditional distribution contains 4 (𝑘 = 4) parameters that we need to estimate.
We will label our entire parameter vector as 𝛽 where
$$
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
$$
To estimate the model using MLE, we want to maximize the likelihood that our estimate 𝛽̂ is
the true parameter 𝛽.
Intuitively, we want to find the 𝛽̂ that best fits our data.
First, we need to construct the likelihood function ℒ(𝛽), which is similar to a joint probabil-
ity density function.
Assume we have some data 𝑦 = {𝑦1 , 𝑦2 } and that each observation is drawn from a distribution 𝑓(𝑦𝑖 ).
If 𝑦1 and 𝑦2 are independent, the joint pmf of these data is 𝑓(𝑦1 , 𝑦2 ) = 𝑓(𝑦1 ) ⋅ 𝑓(𝑦2 ).
If 𝑦𝑖 follows a Poisson distribution with 𝜆 = 7, we can visualize the joint pmf like so
plot_joint_poisson(μ=7, y_n=20)
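plot_joint_poisson is defined in the lecture’s source notebook; a minimal sketch of such a helper (the surface-plot styling is one reasonable choice, not necessarily the original’s):

```python
import numpy as np
from scipy.stats import poisson
import matplotlib.pyplot as plt

def plot_joint_poisson(μ=7, y_n=20):
    # Joint pmf of two independent Poisson(μ) draws on a grid
    yi_values = np.arange(0, y_n)
    X, Y = np.meshgrid(yi_values, yi_values)
    Z = poisson.pmf(X, μ) * poisson.pmf(Y, μ)  # independence: f(y1) * f(y2)

    fig = plt.figure(figsize=(12, 8))
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, Z, cmap='terrain', alpha=0.6)
    ax.set_xlabel('$y_1$')
    ax.set_ylabel('$y_2$')
    ax.set_zlabel('$f(y_1, y_2)$')
    plt.show()
```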
Similarly, the joint pmf of our data (which is distributed as a conditional Poisson distribu-
tion) can be written as
$$
f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta)
= \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}
$$
The likelihood function is the same as the joint pmf, but treats the parameter 𝛽 as the variable to be chosen and takes the observations (𝑦𝑖 , x𝑖 ) as given
$$
\mathcal{L}(\beta \mid y_1, y_2, \ldots, y_n; \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)
= \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}
= f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta)
$$
Now that we have our likelihood function, we want to find the 𝛽̂ that yields the maximum
likelihood value
$$
\max_{\beta} \mathcal{L}(\beta)
$$
Taking logs (which preserves the maximizer), the MLE 𝛽̂ of the Poisson model can be obtained by solving

$$
\max_{\beta} \Big( \sum_{i=1}^{n} y_i \log \mu_i - \sum_{i=1}^{n} \mu_i - \sum_{i=1}^{n} \log y_i! \Big)
$$
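Differentiating the objective above, using $\mu_i = \exp(\mathbf{x}_i'\beta)$, gives closed-form expressions for the gradient and Hessian (these reappear later as the G and H methods); it is the first-order condition, nonlinear in $\beta$ through $\mu_i$, that has no closed-form solution:

```latex
\frac{\partial \log \mathcal{L}}{\partial \beta}
  = \sum_{i=1}^{n} (y_i - \mu_i)\, \mathbf{x}_i,
\qquad
\frac{\partial^2 \log \mathcal{L}}{\partial \beta\, \partial \beta'}
  = -\sum_{i=1}^{n} \mu_i\, \mathbf{x}_i \mathbf{x}_i'
```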
However, no analytical solution exists to the above problem – to find the MLE we need to use
numerical methods.
55.6 MLE with Numerical Methods

Many distributions do not have nice, analytical solutions and therefore require numerical methods to solve for parameter estimates.
One such numerical method is the Newton-Raphson algorithm.
Our goal is to find the maximum likelihood estimate 𝛽.̂
At 𝛽,̂ the first derivative of the log-likelihood function will be equal to 0.
ax1.set_ylabel(r'$\log \mathcal{L}(\beta)$',
               rotation=0,
               labelpad=35,
               fontsize=15)
ax2.set_ylabel(r'$\frac{d \log \mathcal{L}(\beta)}{d \beta}$',
               rotation=0,
               labelpad=35,
               fontsize=19)
ax2.set_xlabel(r'$\beta$', fontsize=15)
ax1.grid(), ax2.grid()
plt.axhline(c='black')
plt.show()
The plot shows that the maximum likelihood value (the top plot) occurs when $\frac{d \log \mathcal{L}(\beta)}{d \beta} = 0$ (the bottom plot).
Therefore, the likelihood is maximized when 𝛽 = 10.
We can also ensure that this value is a maximum (as opposed to a minimum) by checking
that the second derivative (slope of the bottom plot) is negative.
The Newton-Raphson algorithm finds a point where the first derivative is 0.
To use the algorithm, we take an initial guess at the maximum, 𝛽 (0) (the OLS parameter estimates might be a reasonable guess), then iterate the updating rule

$$
\beta^{(k+1)} = \beta^{(k)} - H^{-1}(\beta^{(k)}) \, G(\beta^{(k)})
$$

where $G(\beta^{(k)})$ is the gradient vector and $H(\beta^{(k)})$ the Hessian matrix of the log-likelihood, both evaluated at $\beta^{(k)}$.

(In practice, we stop iterating when the difference between 𝛽 (𝑘+1) and 𝛽 (𝑘) is below a small tolerance threshold)
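To see the rule in action on something simple, here is a scalar sketch (a hypothetical objective, chosen to peak at 𝛽 = 10 like the plot above):

```python
def newton_maximize(grad, hess, β0, tol=1e-8, max_iter=100):
    """Find a zero of grad (a stationary point) by Newton-Raphson."""
    β = β0
    for _ in range(max_iter):
        β_new = β - grad(β) / hess(β)  # the updating rule
        if abs(β_new - β) < tol:
            return β_new
        β = β_new
    return β

# Maximize g(β) = -(β - 10)**2: g'(β) = -2(β - 10), g''(β) = -2
β_hat = newton_maximize(lambda β: -2 * (β - 10), lambda β: -2.0, β0=0.0)
print(β_hat)  # converges to 10.0 (in one step, since g is quadratic)
```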
Let’s have a go at implementing the Newton-Raphson algorithm.
First, we’ll create a class called PoissonRegression so we can easily recompute the values of
the log likelihood, gradient and Hessian for every iteration
class PoissonRegression:

    def __init__(self, y, X, β):
        self.X = X
        self.n, self.k = X.shape
        # Reshape y as n_by_1 column vector
        self.y = y.reshape(self.n, 1)
        # Reshape β as k_by_1 column vector
        self.β = β.reshape(self.k, 1)

    def μ(self):
        return np.exp(self.X @ self.β)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) - μ - np.log(factorial(y)))

    def G(self):
        X = self.X
        y = self.y
        μ = self.μ()
        return X.T @ (y - μ)

    def H(self):
        X = self.X
        μ = self.μ()
        return -(X.T @ (μ * X))
Our function newton_raphson will take a PoissonRegression object that has an initial guess
of the parameter vector 𝛽 0 .
The algorithm will update the parameter vector according to the updating rule, and recalcu-
late the gradient and Hessian matrices at the new parameter estimates.
Iteration will end when either:
• The difference between the parameter and the updated parameter is below a tolerance
level.
• The maximum number of iterations has been achieved (meaning convergence is not
achieved).
So we can get an idea of what’s going on while the algorithm is running, an option
display=True is added to print out values at each iteration.
def newton_raphson(model, tol=1e-3, max_iter=1000, display=True):
    i = 0
    error = 100  # Initial error value

    if display:
        print(f'{"Iteration_k":<13}{"Log-likelihood":<16}{"θ":<60}')

    # Iterate the Newton-Raphson update until convergence
    while np.any(np.abs(error) > tol) and i < max_iter:
        H, G = model.H(), model.G()
        β_new = model.β - (np.linalg.inv(H) @ G)  # Update rule
        error = β_new - model.β
        model.β = β_new

        # Print iterations
        if display:
            β_list = [f'{t:.3}' for t in list(model.β.flatten())]
            update = f'{i:<13}{model.logL():<16.8}{β_list}'
            print(update)
        i += 1

    print(f'Number of iterations: {i}')
    print(f'β_hat = {model.β.flatten()}')
    return model.β.flatten()
Let’s try out our algorithm with a small dataset of 5 observations and 3 variables in X.
X = np.array([[1, 2, 5],
              [1, 1, 3],
              [1, 4, 2],
              [1, 5, 2],
              [1, 3, 1]])

y = np.array([1, 0, 1, 1, 0])

# Take a guess at initial βs
init_β = np.array([0.1, 0.1, 0.1])

# Create an object with Poisson model values
poi = PoissonRegression(y, X, β=init_β)

# Use newton_raphson to find the MLE
β_hat = newton_raphson(poi, display=True)
Iteration_k  Log-likelihood  θ
0            -4.3447622      ['-1.49', '0.265', '0.244']
1            -3.5742413      ['-3.38', '0.528', '0.474']
2            -3.3999526      ['-5.06', '0.782', '0.702']
3            -3.3788646      ['-5.92', '0.909', '0.82']
4            -3.3783559      ['-6.07', '0.933', '0.843']
5            -3.3783555      ['-6.08', '0.933', '0.843']
Number of iterations: 6
β_hat = [-6.07848205  0.93340226  0.84329625]
As this was a simple model with few observations, the algorithm achieved convergence in only
6 iterations.
You can see that with each iteration, the log-likelihood value increased.
Remember, our objective was to maximize the log-likelihood function, which the algorithm
has worked to achieve.
Also, note that the increase in log ℒ(𝛽 (𝑘) ) becomes smaller with each iteration.
This is because the gradient is approaching 0 as we reach the maximum, and therefore the
numerator in our updating equation is becoming smaller.
The gradient vector should be close to 0 at 𝛽̂

In [11]: poi.G()

Out[11]: array([[3.95169228e-07],
                [1.00114805e-06],
                [7.73114562e-07]])
The iterative process can be visualized in the following diagram, where the maximum is found
at 𝛽 = 10
β = np.linspace(2, 18)
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(β, logL(β), lw=2, c='black')
Note that our implementation of the Newton-Raphson algorithm is rather basic — for more
robust implementations see, for example, scipy.optimize.
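For example, scipy.optimize.minimize applied to the negative log-likelihood reaches the same kind of answer; the dataset below is hypothetical, for illustration only:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import factorial

# Hypothetical small dataset
X = np.array([[1., 2., 5.],
              [1., 1., 3.],
              [1., 4., 2.],
              [1., 5., 2.],
              [1., 3., 1.]])
y = np.array([1., 0., 1., 1., 0.])

def neg_logL(β):
    # Negative Poisson log-likelihood: minimizing it maximizes log L
    μ = np.exp(X @ β)
    return -np.sum(y * np.log(μ) - μ - np.log(factorial(y)))

res = minimize(neg_logL, x0=np.zeros(3))  # default BFGS
print(res.x)
```

Minimizing the negative log-likelihood is the standard trick for using a minimizer to solve a maximization problem.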
55.7 MLE with statsmodels

Now that we know what’s going on under the hood, we can apply MLE to an interesting application.
We’ll use the Poisson regression model in statsmodels to obtain a richer output with stan-
dard errors, test values, and more.
statsmodels uses the same algorithm as above to find the maximum likelihood estimates.
Before we begin, let’s re-estimate our simple model with statsmodels to confirm we obtain
the same coefficients and log-likelihood value.
X = np.array([[1, 2, 5],
              [1, 1, 3],
              [1, 4, 2],
              [1, 5, 2],
              [1, 3, 1]])

y = np.array([1, 0, 1, 1, 0])

stats_poisson = Poisson(y, X).fit()
print(stats_poisson.summary())
Now let’s replicate results from Daniel Treisman’s paper, Russia’s Billionaires, mentioned ear-
lier in the lecture.
Treisman starts by estimating equation (1), where:
• 𝑦𝑖 is number of billionaires in country 𝑖
• 𝑥𝑖1 is log GDP per capita of country 𝑖
• 𝑥𝑖2 is log population of country 𝑖
• 𝑥𝑖3 is years in GATT and WTO, a proxy for access to international markets
The paper only considers the year 2008 for estimation.
We will set up our variables for estimation like so (you should have the data assigned to df
from earlier in the lecture)
# Add a constant
df['const'] = 1
# Variable sets
reg1 = ['const', 'lngdppc', 'lnpop', 'gattwto08']
reg2 = ['const', 'lngdppc', 'lnpop',
'gattwto08', 'lnmcap08', 'rintr', 'topint08']
reg3 = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08',
'rintr', 'topint08', 'nrrents', 'roflaw']
Then we can use the Poisson function from statsmodels to fit the model.
We’ll use robust standard errors as in the author’s paper
results_table = summary_col(results=results,
float_format='%0.3f',
stars=True,
model_names=reg_names,
info_dict=info_dict,
regressor_order=regressor_order)
results_table.add_title('Table 1 Explaining the Number of Billionaires \
in 2008')
print(results_table)
The output suggests that the frequency of billionaires is positively correlated with GDP
per capita, population size, stock market capitalization, and negatively correlated with top
marginal income tax rate.
To analyze our results by country, we can plot the difference between the predicted and actual values, then sort from highest to lowest and plot the first 15

# Calculate difference
results_df['difference'] = results_df['numbil0'] - results_df['prediction']
As we can see, Russia has by far the highest number of billionaires in excess of what is pre-
dicted by the model (around 50 more than expected).
Treisman uses this empirical result to discuss possible reasons for Russia’s excess of billion-
aires, including the origination of wealth in Russia, the political climate, and the history of
privatization in the years after the USSR.
55.8 Summary
55.9 Exercises
55.9.1 Exercise 1
Suppose we wanted to estimate the probability of an event 𝑦𝑖 occurring, given some observa-
tions.
$$
f(y_i; \beta) = \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}, \qquad y_i = 0, 1
$$

where $\mu_i = \Phi(\mathbf{x}_i' \beta)$
Φ represents the cumulative normal distribution and constrains the predicted 𝑦𝑖 to be be-
tween 0 and 1 (as required for a probability).
𝛽 is a vector of coefficients.
Following the example in the lecture, write a class to represent the Probit model.
To begin, find the log-likelihood function and derive the gradient and Hessian.
The scipy module stats.norm contains the functions needed to compute the cdf and pdf of the normal distribution.
55.9.2 Exercise 2
Use the following dataset and initial values of 𝛽 to estimate the MLE with the Newton-
Raphson algorithm developed earlier in the lecture
$$
X = \begin{bmatrix}
1 & 2 & 4 \\
1 & 1 & 1 \\
1 & 4 & 3 \\
1 & 5 & 6 \\
1 & 3 & 5
\end{bmatrix}
\qquad
y = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
\qquad
\beta^{(0)} = \begin{bmatrix} 0.1 \\ 0.1 \\ 0.1 \end{bmatrix}
$$
Verify your results with statsmodels - you can import the Probit function with the following import statement

from statsmodels.discrete.discrete_model import Probit
Note that the simple Newton-Raphson algorithm developed in this lecture is very sensitive to
initial values, and therefore you may fail to achieve convergence with different starting values.
55.10 Solutions
55.10.1 Exercise 1
$$
\log \mathcal{L} = \sum_{i=1}^{n} \big[ y_i \log \Phi(\mathbf{x}_i' \beta) + (1 - y_i) \log (1 - \Phi(\mathbf{x}_i' \beta)) \big]
$$
Using the fact that the standard normal pdf is the derivative of its cdf,

$$
\frac{\partial}{\partial s} \Phi(s) = \phi(s)
$$
the gradient and Hessian are

$$
\frac{\partial \log \mathcal{L}}{\partial \beta}
= \sum_{i=1}^{n} \Big[ y_i \frac{\phi(\mathbf{x}_i' \beta)}{\Phi(\mathbf{x}_i' \beta)}
- (1 - y_i) \frac{\phi(\mathbf{x}_i' \beta)}{1 - \Phi(\mathbf{x}_i' \beta)} \Big] \mathbf{x}_i
$$

$$
\frac{\partial^2 \log \mathcal{L}}{\partial \beta \, \partial \beta'}
= -\sum_{i=1}^{n} \phi(\mathbf{x}_i' \beta)
\Big[ y_i \frac{\phi(\mathbf{x}_i' \beta) + \mathbf{x}_i' \beta \, \Phi(\mathbf{x}_i' \beta)}{[\Phi(\mathbf{x}_i' \beta)]^2}
+ (1 - y_i) \frac{\phi(\mathbf{x}_i' \beta) - \mathbf{x}_i' \beta \, (1 - \Phi(\mathbf{x}_i' \beta))}{[1 - \Phi(\mathbf{x}_i' \beta)]^2} \Big]
\mathbf{x}_i \mathbf{x}_i'
$$
Using these results, we can write a class for the Probit model as follows
class ProbitRegression:

    def __init__(self, y, X, β):
        self.X, self.y, self.β = X, y, β
        self.n, self.k = X.shape

    def μ(self):
        return norm.cdf(self.X @ self.β.T)

    def ϕ(self):
        return norm.pdf(self.X @ self.β.T)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) + (1 - y) * np.log(1 - μ))

    def G(self):
        X = self.X
        y = self.y
        μ = self.μ()
        ϕ = self.ϕ()
        return np.sum((X.T * y * ϕ / μ - X.T * (1 - y) * ϕ / (1 - μ)),
                      axis=1)

    def H(self):
        X = self.X
        y = self.y
        β = self.β
        μ = self.μ()
        ϕ = self.ϕ()
        a = (ϕ + (X @ β.T) * μ) / μ**2
        b = (ϕ - (X @ β.T) * (1 - μ)) / (1 - μ)**2
        return -(ϕ * (y * a + (1 - y) * b) * X.T) @ X
55.10.2 Exercise 2
X = np.array([[1, 2, 4],
              [1, 1, 1],
              [1, 4, 3],
              [1, 5, 6],
              [1, 3, 5]])

y = np.array([1, 0, 1, 1, 0])

# Take a guess at initial βs
β = np.array([0.1, 0.1, 0.1])

# Create instance of Probit regression class
prob = ProbitRegression(y, X, β)

# Run Newton-Raphson algorithm
newton_raphson(prob)
Iteration_k  Log-likelihood  θ
0            -2.3796884      ['-1.34', '0.775', '-0.157']
1            -2.3687526      ['-1.53', '0.775', '-0.0981']
2            -2.3687294      ['-1.55', '0.778', '-0.0971']
3            -2.3687294      ['-1.55', '0.778', '-0.0971']
Number of iterations: 4
β_hat = [-1.54625858  0.77778952 -0.09709757]
print(Probit(y, X).fit().summary())
Bibliography

[1] Daron Acemoglu, Simon Johnson, and James A Robinson. The colonial origins of comparative development: An empirical investigation. The American Economic Review, 91(5):1369–1401, 2001.
[2] Daron Acemoglu and James A. Robinson. The political economy of the Kuznets curve.
Review of Development Economics, 6(2):183–203, 2002.
[3] SeHyoun Ahn, Greg Kaplan, Benjamin Moll, Thomas Winberry, and Christian Wolf.
When inequality matters for macro and macro matters for inequality. NBER Macroeco-
nomics Annual, 32(1):1–75, 2018.
[4] S Rao Aiyagari. Uninsured Idiosyncratic Risk and Aggregate Saving. The Quarterly
Journal of Economics, 109(3):659–684, 1994.
[7] Robert L Axtell. Zipf distribution of US firm sizes. Science, 293(5536):1818–1820, 2001.
[8] Robert J Barro. On the Determination of the Public Debt. Journal of Political Econ-
omy, 87(5):940–971, 1979.
[9] Jess Benhabib and Alberto Bisin. Skewed wealth distributions: Theory and empirics.
Journal of Economic Literature, 56(4):1261–91, 2018.
[10] Jess Benhabib, Alberto Bisin, and Shenghao Zhu. The wealth distribution in bewley
economies with capital income risk. Journal of Economic Theory, 159:489–515, 2015.
[12] Dimitri Bertsekas. Dynamic Programming and Stochastic Control. Academic Press, New York, 1975.
[13] Truman Bewley. The permanent income hypothesis: A theoretical formulation. Journal
of Economic Theory, 16(2):252–292, 1977.
[15] Anmol Bhandari, David Evans, Mikhail Golosov, and Thomas J Sargent. Inequality,
business cycles, and monetary-fiscal policy. Technical report, National Bureau of Eco-
nomic Research, 2018.
[17] Dariusz Buraczewski, Ewa Damek, Thomas Mikosch, et al. Stochastic models with
power-law tails. Springer, 2016.
[18] Philip Cagan. The monetary dynamics of hyperinflation. In Milton Friedman, editor,
Studies in the Quantity Theory of Money, pages 25–117. University of Chicago Press,
Chicago, 1956.
[19] Andrew S Caplin. The variability of aggregate demand with (s, s) inventory policies.
Econometrica, pages 1395–1409, 1985.
[20] Christopher D Carroll. A Theory of the Consumption Function, with and without Liq-
uidity Constraints. Journal of Economic Perspectives, 15(3):23–45, 2001.
[21] Christopher D Carroll. The method of endogenous gridpoints for solving dynamic
stochastic optimization problems. Economics Letters, 91(3):312–320, 2006.
[22] David Cass. Optimum growth in an aggregative model of capital accumulation. Review
of Economic Studies, 32(3):233–240, 1965.
[23] Wilbur John Coleman. Solving the Stochastic Growth Model by Policy-Function Itera-
tion. Journal of Business & Economic Statistics, 8(1):27–29, 1990.
[24] Steven J Davis, R Jason Faberman, and John Haltiwanger. The flow approach to labor
markets: New data sources, micro-macro links and the recent downturn. Journal of
Economic Perspectives, 2006.
[25] Bruno de Finetti. La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré, 7:1–68, 1937. English translation in Kyburg and Smokler (eds.), Studies in Subjective Probability, Wiley, New York, 1964.
[26] Angus Deaton. Saving and Liquidity Constraints. Econometrica, 59(5):1221–1248, 1991.
[27] Angus Deaton and Christina Paxson. Intertemporal Choice and Inequality. Journal of
Political Economy, 102(3):437–467, 1994.
[28] Wouter J Den Haan. Comparison of solutions to the incomplete markets model with
aggregate uncertainty. Journal of Economic Dynamics and Control, 34(1):4–27, 2010.
[29] Ulrich Doraszelski and Mark Satterthwaite. Computable markov-perfect industry dy-
namics. The RAND Journal of Economics, 41(2):215–243, 2010.
[30] Y E Du, Ehud Lehrer, and A D Y Pauzner. Competitive economy as a ranking device
over networks. submitted, 2013.
[31] R M Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathe-
matics. Cambridge University Press, 2002.
[32] Timothy Dunne, Mark J Roberts, and Larry Samuelson. The growth and failure of us
manufacturing plants. The Quarterly Journal of Economics, 104(4):671–698, 1989.
[33] Robert F Engle and Clive W J Granger. Co-integration and Error Correction: Repre-
sentation, Estimation, and Testing. Econometrica, 55(2):251–276, 1987.
[34] Richard Ericson and Ariel Pakes. Markov-perfect industry dynamics: A framework for
empirical work. The Review of Economic Studies, 62(1):53–82, 1995.
[35] David S Evans. The relationship between firm growth, size, and age: Estimates for 100
manufacturing industries. The Journal of Industrial Economics, pages 567–581, 1987.
[39] Milton Friedman and Rose D Friedman. Two Lucky People. University of Chicago
Press, 1998.
[40] Yoshi Fujiwara, Corrado Di Guilmi, Hideaki Aoyama, Mauro Gallegati, and Wataru Souma. Do Pareto–Zipf and Gibrat laws hold true? An analysis with European firms. Physica A: Statistical Mechanics and its Applications, 335(1-2):197–216, 2004.
[41] Xavier Gabaix. Power laws in economics: An introduction. Journal of Economic Per-
spectives, 30(1):185–206, 2016.
[42] Robert Gibrat. Les inégalités économiques: Applications d’une loi nouvelle, la loi de
l’effet proportionnel. PhD thesis, Recueil Sirey, 1931.
[43] Edward Glaeser, Jose Scheinkman, and Andrei Shleifer. The injustice of inequality.
Journal of Monetary Economics, 50(1):199–222, 2003.
[45] Olle Häggström. Finite Markov chains and algorithmic applications, volume 52. Cam-
bridge University Press, 2002.
[46] Bronwyn H Hall. The relationship between firm size and firm growth in the us manu-
facturing sector. The Journal of Industrial Economics, pages 583–606, 1987.
[47] Robert E Hall. Stochastic Implications of the Life Cycle-Permanent Income Hypothesis:
Theory and Evidence. Journal of Political Economy, 86(6):971–987, 1978.
[48] Robert E Hall and Frederic S Mishkin. The Sensitivity of Consumption to Transitory
Income: Estimates from Panel Data on Households. National Bureau of Economic Re-
search Working Paper Series, No. 505, 1982.
[49] James D Hamilton. What’s real about the business cycle? Federal Reserve Bank of St.
Louis Review, (July-August):435–452, 2005.
[51] L P Hansen and T J Sargent. Recursive Models of Dynamic Linear Economies. The
Gorman Lectures in Economics. Princeton University Press, 2013.
[52] Lars Peter Hansen and Scott F Richard. The Role of Conditioning Information in De-
ducing Testable. Econometrica, 55(3):587–613, May 1987.
[53] J. Michael Harrison and David M. Kreps. Speculative investor behavior in a stock mar-
ket with heterogeneous expectations. The Quarterly Journal of Economics, 92(2):323–
336, 1978.
[54] J. Michael Harrison and David M. Kreps. Martingales and arbitrage in multiperiod
securities markets. Journal of Economic Theory, 20(3):381–408, June 1979.
[55] John Heaton and Deborah J Lucas. Evaluating the effects of incomplete markets on risk
sharing and asset pricing. Journal of Political Economy, pages 443–487, 1996.
[57] Hugo A Hopenhayn. Entry, exit, and firm dynamics in long run equilibrium. Economet-
rica: Journal of the Econometric Society, pages 1127–1150, 1992.
[58] Hugo A Hopenhayn and Edward C Prescott. Stochastic Monotonicity and Stationary
Distributions for Dynamic Economies. Econometrica, 60(6):1387–1406, 1992.
[60] K Jänich. Linear Algebra. Springer Undergraduate Texts in Mathematics and Technol-
ogy. Springer, 1994.
[61] John Y. Campbell and Robert J. Shiller. The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors. Review of Financial Studies, 1(3):195–228, 1988.
[62] Boyan Jovanovic. Firm-specific capital and turnover. Journal of Political Economy,
87(6):1246–1260, 1979.
[63] K L Judd. Cournot versus bertrand: A dynamic resolution. Technical report, Hoover
Institution, Stanford University, 1990.
[64] Takashi Kamihigashi. Elementary results on solutions to the bellman equation of dy-
namic programming: existence, uniqueness, and convergence. Technical report, Kobe
University, 2012.
[65] Illenin Kondo, Logan T Lewis, and Andrea Stella. On the us firm and establishment
size distributions. Technical report, SSRN, 2018.
[67] David M. Kreps. Notes on the Theory of Choice. Westview Press, Boulder, Colorado,
1988.
[69] Martin Lettau and Sydney Ludvigson. Consumption, Aggregate Wealth, and Expected
Stock Returns. Journal of Finance, 56(3):815–849, 06 2001.
[70] Martin Lettau and Sydney C. Ludvigson. Understanding Trend and Cycle in Asset
Values: Reevaluating the Wealth Effect on Consumption. American Economic Review,
94(1):276–299, March 2004.
[71] David Levhari and Leonard J Mirman. The great fish war: an example using a dynamic
cournot-nash solution. The Bell Journal of Economics, pages 322–334, 1980.
[72] L Ljungqvist and T J Sargent. Recursive Macroeconomic Theory. MIT Press, 4 edition,
2018.
[73] Robert E Lucas, Jr. Asset prices in an exchange economy. Econometrica: Journal of the
Econometric Society, 46(6):1429–1445, 1978.
[74] Robert E Lucas, Jr. and Edward C Prescott. Investment under uncertainty. Economet-
rica: Journal of the Econometric Society, pages 659–681, 1971.
[75] Qingyin Ma, John Stachurski, and Alexis Akira Toda. The income fluctuation problem
and the evolution of wealth. Journal of Economic Theory, 187:105003, 2020.
[76] Benoit Mandelbrot. The variation of certain speculative prices. The Journal of Busi-
ness, 36(4):394–419, 1963.
[77] Albert Marcet and Thomas J Sargent. Convergence of Least-Squares Learning in En-
vironments with Hidden State Variables and Private Information. Journal of Political
Economy, 97(6):1306–1322, 1989.
[78] V Filipe Martins-da Rocha and Yiannis Vailakis. Existence and Uniqueness of a Fixed
Point for Local Contractions. Econometrica, 78(3):1127–1141, 2010.
[80] J J McCall. Economics of Information and Job Search. The Quarterly Journal of Eco-
nomics, 84(1):113–126, 1970.
[81] S P Meyn and R L Tweedie. Markov Chains and Stochastic Stability. Cambridge Uni-
versity Press, 2009.
[82] F. Modigliani and R. Brumberg. Utility analysis and the consumption function: An in-
terpretation of cross-section data. In K.K Kurihara, editor, Post-Keynesian Economics.
1954.
[83] Derek Neal. The Complexity of Job Mobility among Young Men. Journal of Labor
Economics, 17(2):237–261, 1999.
[84] Y Nishiyama, S Osada, and K Morimune. Estimation and testing for rank size rule
regression under pareto distribution. In Proceedings of the International Environmental
Modelling and Software Society iEMSs 2004 International Conference. Citeseer, 2004.
[85] Jenő Pál and John Stachurski. Fitted value function iteration with probability one con-
tractions. Journal of Economic Dynamics and Control, 37(1):251–264, 2013.
[87] Guillaume Rabault. When do borrowing constraints bind? Some new results on the
income fluctuation problem. Journal of Economic Dynamics and Control, 26(2):217–
245, 2002.
[88] Svetlozar Todorov Rachev. Handbook of heavy tailed distributions in finance: Handbooks
in finance, volume 1. Elsevier, 2003.
[89] Kevin L Reffett. Production-based asset pricing in monetary economies with transac-
tions costs. Economica, pages 427–443, 1996.
[91] Hernán D Rozenfeld, Diego Rybski, Xavier Gabaix, and Hernán A Makse. The area
and population of cities: New insights from a different perspective on cities. American
Economic Review, 101(5):2205–25, 2011.
[93] Paul A. Samuelson. Interactions between the multiplier analysis and the principle of acceleration. Review of Economics and Statistics, 21(2):75–78, 1939.
[94] Thomas J Sargent. The Demand for Money During Hyperinflations under Rational
Expectations: I. International Economic Review, 18(1):59–82, February 1977.
[95] Thomas J Sargent. Macroeconomic Theory. Academic Press, New York, 2nd edition,
1987.
[96] Jack Schechtman and Vera L S Escudero. Some results on an income fluctuation prob-
lem. Journal of Economic Theory, 16(2):151–166, 1977.
[97] Jose A. Scheinkman. Speculation, Trading, and Bubbles. Columbia University Press,
New York, 2014.
[99] Christian Schluter and Mark Trede. Size distributions reconsidered. Econometric Re-
views, 38(6):695–710, 2019.
[100] John Stachurski. Continuous state dynamic programming via nonexpansive approxima-
tion. Computational Economics, 31(2):141–160, 2008.
[101] John Stachurski and Alexis Akira Toda. An impossibility theorem for wealth in
heterogeneous-agent models with limited heterogeneity. Journal of Economic Theory,
182:1–24, 2019.
[103] Kjetil Storesletten, Christopher I Telmer, and Amir Yaron. Consumption and risk shar-
ing over the life cycle. Journal of Monetary Economics, 51(3):609–633, 2004.
[105] George Tauchen. Finite state markov-chain approximations to univariate and vector
autoregressions. Economics Letters, 20(2):177–181, 1986.
[106] Daniel Treisman. Russia’s billionaires. The American Economic Review, 106(5):236–241,
2016.
[107] Ngo Van Long. Dynamic games in the economics of natural resources: a survey. Dy-
namic Games and Applications, 1(1):115–148, 2011.
[109] Abraham Wald. Sequential Analysis. John Wiley and Sons, New York, 1947.
[110] Charles Whiteman. Linear Rational Expectations Models: A User’s Guide. University of
Minnesota Press, Minneapolis, Minnesota, 1983.
[112] G Alastair Young and Richard L Smith. Essentials of statistical inference. Cambridge
University Press, 2005.