Quantitative Economics with Python

Thomas J. Sargent and John Stachurski

August 7, 2020
Contents

I Tools and Techniques

1 Geometric Series for Elementary Economics
2 Multivariate Hypergeometric Distribution
3 Modeling COVID 19
4 Linear Algebra
5 Complex Numbers and Trigonometry
6 LLN and CLT
7 Heavy-Tailed Distributions

II Introduction to Dynamics

8 Dynamics in One Dimension
9 AR1 Processes
10 Finite Markov Chains
11 Inventory Dynamics
12 Linear State Space Models
13 Application: The Samuelson Multiplier-Accelerator
14 Kesten Processes and Firm Dynamics
15 Wealth Distribution Dynamics
16 A First Look at the Kalman Filter
17 Shortest Paths
18 Cass-Koopmans Planning Problem
19 Cass-Koopmans Competitive Equilibrium

III Search

20 Job Search I: The McCall Search Model
21 Job Search II: Search and Separation
22 Job Search III: Fitted Value Function Iteration
23 Job Search IV: Correlated Wage Offers
24 Job Search V: Modeling Career Choice
25 Job Search VI: On-the-Job Search

IV Consumption, Savings and Growth

26 Cake Eating I: Introduction to Optimal Saving
27 Cake Eating II: Numerical Methods
28 Optimal Growth I: The Stochastic Optimal Growth Model
29 Optimal Growth II: Accelerating the Code with Numba
30 Optimal Growth III: Time Iteration
31 Optimal Growth IV: The Endogenous Grid Method
32 The Income Fluctuation Problem I: Basic Model
33 The Income Fluctuation Problem II: Stochastic Returns on Assets

V Information

34 Job Search VII: Search with Learning
35 Likelihood Ratio Processes
36 A Problem that Stumped Milton Friedman
37 Exchangeability and Bayesian Updating
38 Likelihood Ratio Processes and Bayesian Learning
39 Bayesian versus Frequentist Decision Rules

VI LQ Control

40 LQ Control: Foundations
41 The Permanent Income Model
42 Permanent Income II: LQ Techniques
43 Production Smoothing via Inventories

VII Multiple Agent Models

44 Schelling’s Segregation Model
45 A Lake Model of Employment and Unemployment
46 Rational Expectations Equilibrium
47 Stability in Linear Rational Expectations Models
48 Markov Perfect Equilibrium
49 Uncertainty Traps
50 The Aiyagari Model

VIII Asset Pricing and Finance

51 Asset Pricing: Finite State Models
52 Asset Pricing with Incomplete Markets

IX Data and Empirics

53 Pandas for Panel Data
54 Linear Regression in Python
55 Maximum Likelihood Estimation


Part I

Tools and Techniques

Chapter 1

Geometric Series for Elementary Economics

1.1 Contents

• Overview 1.2
• Key Formulas 1.3
• Example: The Money Multiplier in Fractional Reserve Banking 1.4
• Example: The Keynesian Multiplier 1.5
• Example: Interest Rates and Present Values 1.6
• Back to the Keynesian Multiplier 1.7

1.2 Overview

The lecture describes important ideas in economics that use the mathematics of geometric
series.
Among these are
• the Keynesian multiplier
• the money multiplier that prevails in fractional reserve banking systems
• interest rates and present values of streams of payouts from assets
(As we shall see below, the term multiplier comes down to meaning the sum of a convergent geometric series.)
These and other applications prove the truth of the wisecrack that

“in economics, a little knowledge of geometric series goes a long way”

Below we’ll use the following imports:

In [1]: import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import sympy as sym
from sympy import init_printing
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D


1.3 Key Formulas

To start, let 𝑐 be a real number that lies strictly between −1 and 1.


• We often write this as 𝑐 ∈ (−1, 1).
• Here (−1, 1) denotes the collection of all real numbers that are strictly less than 1 and
strictly greater than −1.
• The symbol ∈ means in or belongs to the set after the symbol.
We want to evaluate geometric series of two types – infinite and finite.

1.3.1 Infinite Geometric Series

The first type of geometric series that interests us is the infinite series

$$1 + c + c^2 + c^3 + \cdots$$

where $\cdots$ means that the series continues without end.


The key formula is

$$1 + c + c^2 + c^3 + \cdots = \frac{1}{1 - c} \tag{1}$$

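To prove key formula (1), multiply both sides by $(1 - c)$: the left side telescopes, since $(1 - c)(1 + c + \cdots + c^T) = 1 - c^{T+1}$, and if $c \in (-1, 1)$ then $c^{T+1} \rightarrow 0$ as $T \rightarrow \infty$, so the outcome is the equation $1 = 1$.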
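A quick numerical check can make the convergence in formula (1) concrete. The sketch below is only an illustration; the helper name partial_sum and the choice $c = 0.9$ are ours, not part of the lecture. It compares partial sums of the series with $1/(1 - c)$ and shows the gap shrinking as more terms are added.

```python
def partial_sum(c, T):
    """Return 1 + c + c**2 + ... + c**T by adding the terms one at a time."""
    return sum(c**t for t in range(T + 1))

c = 0.9
limit = 1 / (1 - c)  # right side of formula (1)

# The gap between the partial sum and 1 / (1 - c) shrinks as T grows
for T in (10, 100, 1000):
    s = partial_sum(c, T)
    print(f"T = {T:5d}  partial sum = {s:.10f}  gap = {abs(s - limit):.2e}")
```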

1.3.2 Finite Geometric Series

The second series that interests us is the finite geometric series

$$1 + c + c^2 + c^3 + \cdots + c^T$$

where $T$ is a positive integer.

The key formula here is

$$1 + c + c^2 + c^3 + \cdots + c^T = \frac{1 - c^{T+1}}{1 - c}$$

Remark: The above formula works for any value of the scalar $c$ other than $c = 1$ (where the sum is simply $T + 1$). We don’t have to restrict $c$ to be in the set $(-1, 1)$.
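The remark can be checked directly. In this sketch (the helper names are hypothetical), the sum is computed term by term and with the closed form for values of $c$ both inside and outside $(-1, 1)$; integer inputs keep the comparison exact. The closed form divides by $1 - c$, so it cannot be evaluated at $c = 1$.

```python
def finite_sum_direct(c, T):
    """1 + c + c**2 + ... + c**T, computed term by term."""
    return sum(c**t for t in range(T + 1))

def finite_sum_formula(c, T):
    """Closed form (1 - c**(T + 1)) / (1 - c); valid for any c != 1."""
    return (1 - c**(T + 1)) / (1 - c)

# c = 2 and c = -3 lie outside (-1, 1), yet the closed form still matches
for c in (0.5, 2, -3):
    print(c, finite_sum_direct(c, 5), finite_sum_formula(c, 5))
```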
We now move on to describe some famous economic applications of geometric series.

1.4 Example: The Money Multiplier in Fractional Reserve Banking

In a fractional reserve banking system, banks hold only a fraction $r \in (0, 1)$ of cash behind each deposit receipt that they issue.

• In recent times
– cash consists of pieces of paper issued by the government and called dollars or
pounds or …
– a deposit is a balance in a checking or savings account that entitles the owner to
ask the bank for immediate payment in cash
• When the UK and France and the US were on either a gold or silver standard (before
1914, for example)
– cash was a gold or silver coin
– a deposit receipt was a bank note that the bank promised to convert into gold or
silver on demand; (sometimes it was also a checking or savings account balance)
Economists and financiers often define the supply of money as an economy-wide sum of
cash plus deposits.
In a fractional reserve banking system (one in which the reserve ratio 𝑟 satisfies 0 < 𝑟 <
1), banks create money by issuing deposits backed by fractional reserves plus loans that
they make to their customers.
A geometric series is a key tool for understanding how banks create money (i.e., deposits) in
a fractional reserve system.
The geometric series formula (1) is at the heart of the classic model of the money creation
process – one that leads us to the celebrated money multiplier.

1.4.1 A Simple Model

There is a set of banks named 𝑖 = 0, 1, 2, ….


Bank $i$’s loans $L_i$, deposits $D_i$, and reserves $R_i$ must satisfy the balance sheet equation (because balance sheets balance):

$$L_i + R_i = D_i \tag{2}$$

The left side of the above equation is the sum of the bank’s assets, namely, the loans $L_i$ it has outstanding plus its reserves of cash $R_i$.
The right side records bank $i$’s liabilities, namely, the deposits $D_i$ held by its depositors; these are IOUs from the bank to its depositors in the form of either checking accounts or savings accounts (or, before 1914, bank notes issued by a bank stating promises to redeem the note for gold or silver on demand).
Each bank $i$ sets its reserves to satisfy the equation

$$R_i = r D_i \tag{3}$$

where $r \in (0, 1)$ is its reserve-deposit ratio or reserve ratio for short

• the reserve ratio is either set by a government or chosen by banks for precautionary reasons
Next we add a theory stating that bank $i + 1$’s deposits depend entirely on loans made by bank $i$, namely

$$D_{i+1} = L_i \tag{4}$$

Thus, we can think of the banks as being arranged along a line with loans from bank $i$ being immediately deposited in bank $i + 1$
• in this way, the debtors to bank $i$ become creditors of bank $i + 1$
Finally, we add an initial condition about an exogenous level of bank 0’s deposits

$$D_0 \text{ is given exogenously}$$

We can think of $D_0$ as being the amount of cash that a first depositor put into the first bank in the system, bank number $i = 0$.
Now we do a little algebra.
Combining equations (2) and (3) tells us that

$$L_i = (1 - r) D_i \tag{5}$$

This states that bank $i$ loans a fraction $(1 - r)$ of its deposits and keeps a fraction $r$ as cash reserves.
Combining equation (5) with equation (4) tells us that

$$D_{i+1} = (1 - r) D_i \quad \text{for } i \geq 0$$

which implies that

$$D_i = (1 - r)^i D_0 \quad \text{for } i \geq 0 \tag{6}$$

Equation (6) expresses $D_i$ as the product of $D_0$ and the $i$th term of the geometric sequence

$$1, (1 - r), (1 - r)^2, \cdots$$

Therefore, the sum of all deposits in our banking system $i = 0, 1, 2, \ldots$ is

$$\sum_{i=0}^{\infty} (1 - r)^i D_0 = \frac{D_0}{1 - (1 - r)} = \frac{D_0}{r} \tag{7}$$

1.4.2 Money Multiplier

The money multiplier is a number that tells the multiplicative factor by which an exogenous injection of cash into bank 0 leads to an increase in the total deposits in the banking system.
Equation (7) asserts that the money multiplier is $\frac{1}{r}$

• An initial deposit of cash of $D_0$ in bank 0 leads the banking system to create total deposits of $\frac{D_0}{r}$.
• The initial deposit $D_0$ is held as reserves, distributed throughout the banking system according to $D_0 = \sum_{i=0}^{\infty} R_i$ (by equation (3), $\sum_i R_i = r \sum_i D_i = r \cdot \frac{D_0}{r}$).
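The chain of equations (3) to (7) can be simulated directly. The sketch below is illustrative rather than taken from the lecture: it assumes a finite chain of 500 banks as a stand-in for the infinite sequence, and the values $D_0 = 100$ and $r = 0.1$ are arbitrary. It checks that total deposits come close to $D_0 / r$ and that reserves add back up to $D_0$.

```python
def bank_chain(D0, r, n_banks):
    """Deposits, loans and reserves along a finite chain of banks.

    Applies R_i = r D_i (3), L_i = (1 - r) D_i (5) and D_{i+1} = L_i (4).
    """
    deposits, loans, reserves = [], [], []
    D = D0
    for _ in range(n_banks):
        R = r * D          # reserves held by bank i, equation (3)
        L = (1 - r) * D    # loans made by bank i, equation (5)
        deposits.append(D)
        loans.append(L)
        reserves.append(R)
        D = L              # bank i's loans are deposited in bank i + 1, equation (4)
    return deposits, loans, reserves

D0, r = 100.0, 0.1
deposits, loans, reserves = bank_chain(D0, r, n_banks=500)
print(sum(deposits), D0 / r)   # total deposits vs. D0 / r from equation (7)
print(sum(reserves), D0)       # reserves add back up to the initial deposit
```

With $r = 0.1$ the multiplier is 10, so an initial deposit of 100 supports total deposits of about 1000. Stopping after 500 banks leaves out deposits of $(1 - r)^{500} D_0 / r$, which is negligible here.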

1.5 Example: The Keynesian Multiplier

The famous economist John Maynard Keynes and his followers created a simple model intended to determine national income $y$ in circumstances in which
• there are substantial unemployed resources, in particular excess supply of labor and
capital
• prices and interest rates fail to adjust to make aggregate supply equal demand (e.g.,
prices and interest rates are frozen)
• national income is entirely determined by aggregate demand

1.5.1 Static Version

An elementary Keynesian model of national income determination consists of three equations that describe aggregate demand for $y$ and its components.
The first equation is a national income identity asserting that consumption 𝑐 plus investment
𝑖 equals national income 𝑦:

𝑐+𝑖=𝑦

The second equation is a Keynesian consumption function asserting that people consume a
fraction 𝑏 ∈ (0, 1) of their income:

𝑐 = 𝑏𝑦

The fraction 𝑏 ∈ (0, 1) is called the marginal propensity to consume.


The fraction 1 − 𝑏 ∈ (0, 1) is called the marginal propensity to save.
The third equation simply states that investment is exogenous at level 𝑖.
• exogenous means determined outside this model.
Substituting the second equation into the first gives $(1 - b) y = i$.
Solving this equation for $y$ gives

$$y = \frac{1}{1 - b} i$$

The quantity $\frac{1}{1 - b}$ is called the investment multiplier or simply the multiplier.
Applying the formula for the sum of an infinite geometric series, we can write the above equation as

$$y = i \sum_{t=0}^{\infty} b^t$$

where $t$ is a nonnegative integer.

So we arrive at the following equivalent expressions for the multiplier:

$$\frac{1}{1 - b} = \sum_{t=0}^{\infty} b^t$$

The expression $\sum_{t=0}^{\infty} b^t$ motivates an interpretation of the multiplier as the outcome of a dynamic process that we describe next.

1.5.2 Dynamic Version

We arrive at a dynamic version by interpreting the nonnegative integer 𝑡 as indexing time and
changing our specification of the consumption function to take time into account
• we add a one-period lag in how income affects consumption
We let $c_t$ be consumption at time $t$ and $i_t$ be investment at time $t$.
We modify our consumption function to assume the form

$$c_t = b y_{t-1}$$

so that $b$ is the marginal propensity to consume (now) out of last period’s income.
We begin with an initial condition stating that

$$y_{-1} = 0$$

We also assume that

$$i_t = i \quad \text{for all } t \geq 0$$

so that investment is constant over time.


It follows that

$$y_0 = i + c_0 = i + b y_{-1} = i$$

and

$$y_1 = c_1 + i = b y_0 + i = (1 + b) i$$

and

$$y_2 = c_2 + i = b y_1 + i = (1 + b + b^2) i$$

and more generally

$$y_t = b y_{t-1} + i = (1 + b + b^2 + \cdots + b^t) i$$

or

$$y_t = \frac{1 - b^{t+1}}{1 - b} i$$

Evidently, as $t \rightarrow +\infty$,

$$y_t \rightarrow \frac{1}{1 - b} i$$
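The recursion $y_t = b y_{t-1} + i$ is easy to iterate numerically and compare with the closed form above. This is a minimal sketch with illustrative values $i = 1$ and $b = 0.75$ (so the multiplier is 4); the function name income_path is hypothetical.

```python
def income_path(i, b, T, y_init=0.0):
    """Iterate y_t = b * y_{t-1} + i for t = 0, ..., T, starting from y_{-1} = y_init."""
    path = []
    y_prev = y_init
    for _ in range(T + 1):
        y_t = b * y_prev + i
        path.append(y_t)
        y_prev = y_t
    return path

i, b, T = 1.0, 0.75, 60
path = income_path(i, b, T)

# Closed form y_t = (1 - b**(t+1)) / (1 - b) * i from the derivation above
closed_form = [(1 - b**(t + 1)) / (1 - b) * i for t in range(T + 1)]
print(max(abs(p - q) for p, q in zip(path, closed_form)))  # rounding error only
print(path[-1], i / (1 - b))   # y_60 is already very close to the limit i / (1 - b)
```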

Remark 1: The above formula is often applied to assert that an exogenous increase in investment of $\Delta i$ that starts at time 0 and persists ignites a dynamic process of increases in national income by successive amounts

$$\Delta i, (1 + b) \Delta i, (1 + b + b^2) \Delta i, \cdots$$

at times 0, 1, 2, ….
Remark 2: Let $g_t$ be an exogenous sequence of government expenditures.
If we generalize the model so that the national income identity becomes

$$c_t + i_t + g_t = y_t$$

then a version of the preceding argument shows that the government expenditures multiplier is also $\frac{1}{1 - b}$, so that a permanent increase in government expenditures ultimately leads to an increase in national income equal to the multiplier times the increase in government expenditures.

1.6 Example: Interest Rates and Present Values

We can apply our formula for geometric series to study how interest rates affect values of
streams of dollar payments that extend over time.
We work in discrete time and assume that 𝑡 = 0, 1, 2, … indexes time.
We let 𝑟 ∈ (0, 1) be a one-period net nominal interest rate
• if the nominal interest rate is 5 percent, then 𝑟 = .05
A one-period gross nominal interest rate 𝑅 is defined as

𝑅 = 1 + 𝑟 ∈ (1, 2)
• if 𝑟 = .05, then 𝑅 = 1.05
Remark: The gross nominal interest rate $R$ is an exchange rate or relative price of dollars between times $t$ and $t + 1$. The units of $R$ are dollars at time $t + 1$ per dollar at time $t$.
When people borrow and lend, they trade dollars now for dollars later or dollars later for dollars now.
The price at which these exchanges occur is the gross nominal interest rate.
• If I sell $x$ dollars to you today, you pay me $R x$ dollars tomorrow.
• This means that you borrowed $x$ dollars from me at a gross interest rate $R$ and a net interest rate $r$.
We assume that the net nominal interest rate $r$ is fixed over time, so that $R$ is the gross nominal interest rate at times $t = 0, 1, 2, \ldots$.
Two important geometric sequences are

$$1, R, R^2, \cdots \tag{8}$$

and

$$1, R^{-1}, R^{-2}, \cdots \tag{9}$$

Sequence (8) tells us how dollar values of an investment accumulate through time.
Sequence (9) tells us how to discount future dollars to get their values in terms of today’s
dollars.

1.6.1 Accumulation

Geometric sequence (8) tells us how one dollar invested and re-invested in a project with gross one-period nominal rate of return $R$ accumulates
• here we assume that net interest payments are reinvested in the project
• thus, 1 dollar invested at time 0 pays interest $r$ dollars after one period, so we have $r + 1 = R$ dollars at time 1
• at time 1 we reinvest $1 + r = R$ dollars and receive interest of $r R$ dollars at time 2 plus the principal $R$ dollars, so we receive $r R + R = (1 + r) R = R^2$ dollars at the end of period 2
• and so on
Evidently, if we invest $x$ dollars at time 0 and reinvest the proceeds, then the sequence

$$x, x R, x R^2, \cdots$$

tells how our account accumulates at dates $t = 0, 1, 2, \ldots$.

1.6.2 Discounting

Geometric sequence (9) tells us how much future dollars are worth in terms of today’s dollars.
Remember that the units of $R$ are dollars at $t + 1$ per dollar at $t$.
It follows that
• the units of $R^{-1}$ are dollars at $t$ per dollar at $t + 1$
• the units of $R^{-2}$ are dollars at $t$ per dollar at $t + 2$
• and so on; the units of $R^{-j}$ are dollars at $t$ per dollar at $t + j$
So if someone has a claim on $x$ dollars at time $t + j$, it is worth $x R^{-j}$ dollars at time $t$ (e.g., today).
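A small numerical illustration of sequences (8) and (9), assuming a net rate $r = 0.05$ and an initial amount of 100 dollars (both arbitrary): accumulating at rate $R$ and then discounting by $R^{-j}$ returns the original amount, since the two operations undo each other. The helper name present_value is ours.

```python
r = 0.05
R = 1 + r          # gross one-period interest rate

def present_value(amount, j, R):
    """Value at t of a claim to `amount` dollars at t + j, i.e. amount * R**(-j)."""
    return amount * R**(-j)

x = 100.0
# Accumulation: x, xR, xR^2, xR^3 (sequence (8) scaled by x)
accumulated = [x * R**t for t in range(4)]
print(accumulated)

# Discounting each balance back to time 0 recovers the original x
print([present_value(a, t, R) for t, a in enumerate(accumulated)])
```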

1.6.3 Application to Asset Pricing

A lease requires a payment stream of $x_t$ dollars at times $t = 0, 1, 2, \ldots$ where

$$x_t = G^t x_0$$

where $G = (1 + g)$ and $g \in (0, 1)$.

Thus, lease payments grow at the net rate $g$ per period (that is, by $100 g$ percent).
For a reason soon to be revealed, we assume that $G < R$.
The present value of the lease is

$$
\begin{aligned}
p_0 & = x_0 + x_1/R + x_2/R^2 + \cdots \\
    & = x_0 (1 + G R^{-1} + G^2 R^{-2} + \cdots) \\
    & = x_0 \frac{1}{1 - G R^{-1}}
\end{aligned}
$$

where the last line uses the formula for an infinite geometric series.
Recall that 𝑅 = 1 + 𝑟 and 𝐺 = 1 + 𝑔 and that 𝑅 > 𝐺 and 𝑟 > 𝑔 and that 𝑟 and 𝑔 are typically
small numbers, e.g., .05 or .03.
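Use the Taylor series of $\frac{1}{1+r}$ about $r = 0$, namely,

$$\frac{1}{1+r} = 1 - r + r^2 - r^3 + \cdots$$

and the fact that $r$ is small to approximate $\frac{1}{1+r} \approx 1 - r$.
Use this approximation to write $p_0$ as

$$
\begin{aligned}
p_0 & = x_0 \frac{1}{1 - G R^{-1}} \\
    & \approx x_0 \frac{1}{1 - (1 + g)(1 - r)} \\
    & = x_0 \frac{1}{1 - (1 + g - r - r g)} \\
    & \approx x_0 \frac{1}{r - g}
\end{aligned}
$$

where the last step uses the approximation $r g \approx 0$.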


The approximation

$$p_0 = \frac{x_0}{r - g}$$

is known as the Gordon formula for the present value or current price of an infinite payment stream $x_0 G^t$ when the nominal one-period interest rate is $r$ and when $r > g$.
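To see how close the Gordon approximation is, the sketch below compares it with the exact infinite-lease value $x_0 / (1 - G R^{-1})$ and with a long truncated sum of discounted payments. The parameter values $g = 0.03$ and $r = 0.05$ are illustrative assumptions, and the function names are ours.

```python
def lease_pv_exact(x0, g, r):
    """x0 * (1 + G/R + (G/R)**2 + ...) = x0 / (1 - G * R**(-1)); requires G < R."""
    G, R = 1 + g, 1 + r
    return x0 / (1 - G * R**(-1))

def lease_pv_gordon(x0, g, r):
    """Gordon approximation x0 / (r - g)."""
    return x0 / (r - g)

x0, g, r = 1.0, 0.03, 0.05   # illustrative values with r > g

# Brute-force check: discount a long (truncated) stream of growing payments
truncated = sum(x0 * ((1 + g) / (1 + r))**t for t in range(5000))

print(lease_pv_exact(x0, g, r), truncated, lease_pv_gordon(x0, g, r))
```

Here the exact value is 52.5 while the Gordon formula gives 50. In fact $\frac{1}{1 - G R^{-1}} = \frac{1 + r}{r - g}$ exactly, so the approximation drops the factor $1 + r$, which is close to 1 when $r$ is small.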
We can also extend the asset pricing formula so that it applies to finite leases.
Let the payment stream on the lease now be $x_t$ for $t = 0, 1, 2, \ldots, T$, where again

$$x_t = G^t x_0$$

The present value of this lease is:

$$
\begin{aligned}
p_0 & = x_0 + x_1/R + \cdots + x_T/R^T \\
    & = x_0 (1 + G R^{-1} + \cdots + G^T R^{-T}) \\
    & = \frac{x_0 (1 - G^{T+1} R^{-(T+1)})}{1 - G R^{-1}}
\end{aligned}
$$

Applying the Taylor series to $R^{-(T+1)}$ about $r = 0$ we get:

$$\frac{1}{(1 + r)^{T+1}} = 1 - r(T + 1) + \frac{1}{2} r^2 (T + 1)(T + 2) + \cdots \approx 1 - r(T + 1)$$

Similarly, applying the Taylor series to $G^{T+1}$ about $g = 0$:

$$(1 + g)^{T+1} = 1 + (T + 1) g + \frac{1}{2} (T + 1) T g^2 + \cdots \approx 1 + (T + 1) g$$

Thus, we get the following approximation:

$$p_0 \approx \frac{x_0 \left(1 - (1 + (T + 1) g)(1 - r(T + 1))\right)}{1 - (1 - r)(1 + g)}$$

Expanding:

$$
\begin{aligned}
p_0 & \approx \frac{x_0 \left(1 - 1 + (T + 1)^2 r g + r(T + 1) - g(T + 1)\right)}{1 - 1 + r - g + r g} \\
    & = \frac{x_0 (T + 1)\left((T + 1) r g + r - g\right)}{r - g + r g} \\
    & \approx \frac{x_0 (T + 1)(r - g)}{r - g} + \frac{x_0 r g (T + 1)^2}{r - g} \\
    & = x_0 (T + 1) + \frac{x_0 r g (T + 1)^2}{r - g}
\end{aligned}
$$

where the third line drops $r g$ from the denominator.
We could also approximate $p_0$ by removing the second term $\frac{x_0 r g (T + 1)^2}{r - g}$, which is small relative to $x_0 (T + 1)$ when $T + 1$ is small compared to $\frac{r - g}{r g}$; this gives the cruder approximation $x_0 (T + 1)$.
We will plot the true finite stream present value and the two approximations, under different values of $T$, $g$, and $r$, in Python.
First we compute the true finite stream present value, the two approximations, and the value of an infinite lease:

In [2]: # True present value of a finite lease
def finite_lease_pv_true(T, g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return (x_0 * (1 - G**(T + 1) * R**(-T - 1))) / (1 - G * R**(-1))

# First approximation for our finite lease
def finite_lease_pv_approx_1(T, g, r, x_0):
    p = x_0 * (T + 1) + x_0 * r * g * (T + 1)**2 / (r - g)
    return p

# Second approximation for our finite lease
def finite_lease_pv_approx_2(T, g, r, x_0):
    return (x_0 * (T + 1))

# Infinite lease
def infinite_lease(g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return x_0 / (1 - G * R**(-1))

Now that we have defined our functions, we can plot some outcomes.
First we study the quality of our approximations.

In [3]: def plot_function(axes, x_vals, func, args):
    axes.plot(x_vals, func(*args), label=func.__name__)

T_max = 50

T = np.arange(0, T_max+1)
g = 0.02
r = 0.03
x_0 = 1

our_args = (T, g, r, x_0)
funcs = [finite_lease_pv_true,
         finite_lease_pv_approx_1,
         finite_lease_pv_approx_2]
## the three functions we want to compare

fig, ax = plt.subplots()
ax.set_title('Finite Lease Present Value $T$ Periods Ahead')
for f in funcs:
    plot_function(ax, T, f, our_args)
ax.legend()
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
plt.show()

Evidently our approximations perform well for small values of $T$.
However, holding $g$ and $r$ fixed, our approximations deteriorate as $T$ increases.
Next we compare the infinite and finite duration lease present values over different lease lengths $T$.

In [4]: # Convergence of infinite and finite
T_max = 1000
T = np.arange(0, T_max+1)
fig, ax = plt.subplots()
ax.set_title('Infinite and Finite Lease Present Value $T$ Periods Ahead')
f_1 = finite_lease_pv_true(T, g, r, x_0)
f_2 = np.ones(T_max+1)*infinite_lease(g, r, x_0)
ax.plot(T, f_1, label='T-period lease PV')
ax.plot(T, f_2, '--', label='Infinite lease PV')
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
ax.legend()
plt.show()

The graph above shows how, as duration $T \rightarrow +\infty$, the value of a lease of duration $T$ approaches the value of a perpetual lease.
Now we consider two different views of what happens as $r$ and $g$ covary.

In [5]: # First view
# Changing r and g
fig, ax = plt.subplots()
ax.set_title('Value of lease of length $T$')
ax.set_ylabel('Present Value, $p_0$')
ax.set_xlabel('$T$ periods ahead')
T_max = 10
T = np.arange(0, T_max+1)

rs, gs = (0.9, 0.5, 0.4001, 0.4), (0.4, 0.4, 0.4, 0.5)
# Raw strings keep the backslashes in the LaTeX labels from being read as escapes
comparisons = (r'$\gg$', '$>$', r'$\approx$', '$<$')
for r, g, comp in zip(rs, gs, comparisons):
    ax.plot(finite_lease_pv_true(T, g, r, x_0), label=f'r(={r}) {comp} g(={g})')

ax.legend()
plt.show()

This graph gives a big hint for why the condition 𝑟 > 𝑔 is necessary if a lease of length 𝑇 =
+∞ is to have finite value.
For fans of 3-d graphs the same point comes through in the following graph.
If you aren’t enamored of 3-d graphs, feel free to skip the next visualization!

In [6]: # Second view
fig = plt.figure()
T = 3
# fig.gca(projection='3d') fails in recent Matplotlib; add_subplot creates the 3-d axes
ax = fig.add_subplot(projection='3d')
r = np.arange(0.01, 0.99, 0.005)
g = np.arange(0.011, 0.991, 0.005)

rr, gg = np.meshgrid(r, g)
z = finite_lease_pv_true(T, gg, rr, x_0)

# Removes points where undefined
same = (rr == gg)
z[same] = np.nan
surf = ax.plot_surface(rr, gg, z, cmap=cm.coolwarm,
                       antialiased=True, clim=(0, 15))
fig.colorbar(surf, shrink=0.5, aspect=5)
ax.set_xlabel('$r$')
ax.set_ylabel('$g$')
ax.set_zlabel('Present Value, $p_0$')
ax.view_init(20, 10)
ax.set_title('Three Period Lease PV with Varying $g$ and $r$')
plt.show()

We can use a little calculus to study how the present value $p_0$ of a lease varies with $r$ and $g$.
We will use a library called SymPy.
SymPy enables us to do symbolic math calculations, including computing derivatives of algebraic expressions.
We will illustrate how it works by creating a symbolic expression that represents our present value formula for an infinite lease.
After that, we’ll use SymPy to compute derivatives.

In [7]: # Creates algebraic symbols that can be used in an algebraic expression
g, r, x0 = sym.symbols('g, r, x0')
G = (1 + g)
R = (1 + r)
p0 = x0 / (1 - G * R**(-1))
init_printing()
print('Our formula is:')
p0

Our formula is:

Out[7]: $$\frac{x_0}{- \frac{g + 1}{r + 1} + 1}$$

In [8]: print('dp0 / dg is:')
dp_dg = sym.diff(p0, g)
dp_dg

dp0 / dg is:

Out[8]: $$\frac{x_0}{\left(r + 1\right) \left(- \frac{g + 1}{r + 1} + 1\right)^{2}}$$

In [9]: print('dp0 / dr is:')
dp_dr = sym.diff(p0, r)
dp_dr

dp0 / dr is:

Out[9]: $$- \frac{x_0 \left(g + 1\right)}{\left(r + 1\right)^{2} \left(- \frac{g + 1}{r + 1} + 1\right)^{2}}$$

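We can see that $\frac{\partial p_0}{\partial r} < 0$ as long as $r > g$, $r > 0$, $g > 0$, and $x_0 > 0$: the squared factors in the denominator are positive, so the sign is that of $-x_0 (g + 1)$, which is always negative.
Similarly, $\frac{\partial p_0}{\partial g} > 0$ under the same conditions, since its numerator $x_0$ and both factors in its denominator are positive.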
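As an independent check on the SymPy derivatives, one can compare them with central finite differences at a particular point. This sketch assumes $g = 0.02$, $r = 0.05$, and $x_0 = 1$ purely for illustration; the helper names p0 and central_diff are hypothetical.

```python
def p0(g, r, x0=1.0):
    """Present value of an infinite lease: x0 / (1 - (1 + g) / (1 + r))."""
    return x0 / (1 - (1 + g) / (1 + r))

def central_diff(f, z, h=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)

g, r, x0 = 0.02, 0.05, 1.0   # illustrative values with r > g > 0

# Numerical partial derivatives, holding the other variable fixed
dp_dg_num = central_diff(lambda gg: p0(gg, r, x0), g)
dp_dr_num = central_diff(lambda rr: p0(g, rr, x0), r)

# Closed forms corresponding to the SymPy output above
D = 1 - (1 + g) / (1 + r)
dp_dg_exact = x0 / ((1 + r) * D**2)
dp_dr_exact = -x0 * (1 + g) / ((1 + r)**2 * D**2)

print(dp_dg_num, dp_dg_exact)   # both positive
print(dp_dr_num, dp_dr_exact)   # both negative
```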

1.7 Back to the Keynesian Multiplier

We will now go back to the case of the Keynesian multiplier and plot the time path of 𝑦𝑡 ,
given that consumption is a constant fraction of national income, and investment is fixed.

In [10]: # Function that calculates a path of y
def calculate_y(i, b, g, T, y_init):
    y = np.zeros(T+1)
    y[0] = i + b * y_init + g
    for t in range(1, T+1):
        y[t] = b * y[t-1] + i + g
    return y

# Initial values
i_0 = 0.3
g_0 = 0.3
# 2/3 of income goes towards consumption
b = 2/3
y_init = 0
T = 100

fig, ax = plt.subplots()
ax.set_title('Path of Aggregate Output Over Time')
ax.set_xlabel('$t$')
ax.set_ylabel('$y_t$')
ax.plot(np.arange(0, T+1), calculate_y(i_0, b, g_0, T, y_init))
# Output predicted by geometric series
ax.hlines(i_0 / (1 - b) + g_0 / (1 - b), xmin=-1, xmax=101, linestyles='--')
plt.show()

In this model, income grows over time until it gradually converges to the sum of the infinite geometric series, $(i + g)/(1 - b)$.
We now examine what will happen if we vary the so-called marginal propensity to consume, i.e., the fraction of income that is consumed.

In [11]: bs = (1/3, 2/3, 5/6, 0.9)

fig, ax = plt.subplots()
ax.set_title('Changing Consumption as a Fraction of Income')
ax.set_ylabel('$y_t$')
ax.set_xlabel('$t$')
x = np.arange(0, T+1)
for b in bs:
    y = calculate_y(i_0, b, g_0, T, y_init)
    ax.plot(x, y, label=r'$b=$'+f"{b:.2f}")
ax.legend()
plt.show()

Increasing the marginal propensity to consume 𝑏 increases the path of output over time.
Now we will compare the effects on output of increases in investment and government spending.

In [12]: fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 10))
fig.subplots_adjust(hspace=0.3)

x = np.arange(0, T+1)
values = [0.3, 0.4]

for i in values:
    y = calculate_y(i, b, g_0, T, y_init)
    ax1.plot(x, y, label=f"i={i}")
for g in values:
    y = calculate_y(i_0, b, g, T, y_init)
    ax2.plot(x, y, label=f"g={g}")

axes = ax1, ax2
param_labels = "Investment", "Government Spending"
for ax, param in zip(axes, param_labels):
    ax.set_title(f'An Increase in {param} on Output')
    ax.legend(loc="lower right")
    ax.set_ylabel('$y_t$')
    ax.set_xlabel('$t$')
plt.show()

Notice that whether government spending increases from 0.3 to 0.4 or investment increases from 0.3 to 0.4, the shifts in the graphs are identical: in calculate_y, $i$ and $g$ enter only through their sum, so both have the same multiplier $\frac{1}{1 - b}$.

Chapter 2

Multivariate Hypergeometric Distribution

2.1 Contents

• Overview 2.2
• The Administrator’s Problem 2.3
• Usage 2.4

2.2 Overview

This lecture describes how an administrator deployed a multivariate hypergeometric distribution in order to assess the fairness of a procedure for awarding research grants.
In the lecture we’ll learn about
• properties of the multivariate hypergeometric distribution
• first and second moments of a multivariate hypergeometric distribution
• using a Monte Carlo simulation of a multivariate normal distribution to evaluate the
quality of a normal approximation
• the administrator’s problem and why the multivariate hypergeometric distribution is the
right tool

2.3 The Administrator’s Problem

An administrator in charge of allocating research grants is in the following situation.


To help us forget details that are none of our business here and to protect the anonymity of the administrator and the subjects, we call research proposals balls and continents of residence of authors of a proposal a color.
There are $K_i$ balls (proposals) of color $i$.
There are $c$ distinct colors (continents of residence).
Thus, $i = 1, 2, \ldots, c$.
So there is a total of $N = \sum_{i=1}^c K_i$ balls.

All 𝑁 of these balls are placed in an urn.


Then 𝑛 balls are drawn randomly.
The selection procedure is supposed to be color blind, meaning that ball quality, a random variable that is supposed to be independent of ball color, governs whether a ball is drawn.
Thus, the selection procedure is supposed to draw $n$ balls from the urn at random.
The 𝑛 balls drawn represent successful proposals and are awarded research funds.
The remaining 𝑁 − 𝑛 balls receive no research funds.
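The drawing procedure itself is easy to mimic. The sketch below uses only Python's standard library rather than anything from the lecture: it builds an urn with one entry per ball, draws $n$ balls without replacement with random.sample, and counts the draws by color. The urn composition reuses the counts (157, 11, 46, 24) that the administrator's example below assumes; the function name is hypothetical.

```python
import random
from collections import Counter

def draw_without_replacement(K, n, seed=None):
    """Draw n balls at random, without replacement, from an urn with K[i] balls of color i.

    Returns the list of counts (k_1, ..., k_c) of drawn balls by color.
    """
    rng = random.Random(seed)
    # One list entry per ball, labeled by its color index
    urn = [color for color, K_i in enumerate(K) for _ in range(K_i)]
    drawn = rng.sample(urn, n)   # random.sample draws without replacement
    counts = Counter(drawn)
    return [counts.get(color, 0) for color in range(len(K))]

K = [157, 11, 46, 24]   # blue, green, yellow, black
k = draw_without_replacement(K, n=15, seed=0)
print(k, sum(k))        # the counts always add up to n = 15
```

Repeating such draws many times gives an empirical distribution of the color counts under the color blind hypothesis, against which a realized outcome can be compared.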

2.3.1 Details of the Awards Procedure Under Study

Let $k_i$ be the number of balls of color $i$ that are drawn.

Things have to add up, so $\sum_{i=1}^c k_i = n$.
Under the hypothesis that the selection process judges proposals on their quality and that quality is independent of the author’s continent of residence, the administrator views the outcome of the selection procedure as a random vector

$$X = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_c \end{pmatrix}.$$

To evaluate whether the selection procedure is color blind, the administrator wants to study whether the particular realization of $X$ drawn can plausibly be said to be a random draw from the probability distribution that is implied by the color blind hypothesis.
The appropriate probability distribution is the one described below.
Let’s now instantiate the administrator’s problem, while continuing to use the colored balls
metaphor.
The administrator has an urn with 𝑁 = 238 balls.
157 balls are blue, 11 balls are green, 46 balls are yellow, and 24 balls are black.
So $(K_1, K_2, K_3, K_4) = (157, 11, 46, 24)$ and $c = 4$.
15 balls are drawn without replacement.
So 𝑛 = 15.
The administrator wants to know the probability distribution of outcomes

$$X = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_4 \end{pmatrix}.$$

In particular, he wants to know whether a particular outcome (a $4 \times 1$ vector of integers recording the numbers of blue, green, yellow, and black balls, respectively) contains evidence against the hypothesis that the selection process is fair, which here means color blind: the $n$ balls are truly random draws without replacement from the population of $N$ balls.

The right tool for the administrator’s job is the multivariate hypergeometric distribution.

2.3.2 Multivariate Hypergeometric Distribution

Let’s start with some imports.

In [1]: import numpy as np
from scipy.special import comb
from scipy.stats import normaltest
from numba import njit, prange
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.cm as cm

To recapitulate, we assume there are in total $c$ types of objects in an urn.

If there are $K_i$ type $i$ objects in the urn and we take $n$ draws at random without replacement, then the numbers of type $i$ objects in the sample $(k_1, k_2, \ldots, k_c)$ have the multivariate hypergeometric distribution.
Note again that $N = \sum_{i=1}^c K_i$ is the total number of objects in the urn and $n = \sum_{i=1}^c k_i$.
Notation
We use the following notation for binomial coefficients: $\binom{m}{q} = \frac{m!}{q!(m-q)!}$.

The multivariate hypergeometric distribution has the following properties:

Probability mass function:

$$\Pr\{X_i = k_i \ \forall i\} = \frac{\prod_{i=1}^c \binom{K_i}{k_i}}{\binom{N}{n}}$$

Mean:

$$E(X_i) = n \frac{K_i}{N}$$

Variances and covariances (for $i \neq j$):

$$Var(X_i) = n \frac{N - n}{N - 1} \frac{K_i}{N} \left(1 - \frac{K_i}{N}\right)$$

$$Cov(X_i, X_j) = -n \frac{N - n}{N - 1} \frac{K_i}{N} \frac{K_j}{N}$$
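The formulas above can be coded directly with the standard library's math.comb and math.prod (Python 3.8 or later). This is a minimal sketch for checking the formulas, separate from the Urn class that follows; the function names mvhg_pmf and mvhg_moments are hypothetical. The marble example used later in the lecture, with $K = (5, 10, 15)$ and $n = 6$, serves as a test case.

```python
from math import comb, prod

def mvhg_pmf(k, K):
    """Pr{X_i = k_i for all i} = prod_i C(K_i, k_i) / C(N, n)."""
    N, n = sum(K), sum(k)
    return prod(comb(K_i, k_i) for K_i, k_i in zip(K, k)) / comb(N, n)

def mvhg_moments(n, K):
    """Mean vector and covariance matrix (as nested lists) for n draws."""
    N, c = sum(K), len(K)
    factor = n * (N - n) / (N - 1)
    mean = [n * K_i / N for K_i in K]
    # Diagonal: factor * (K_i/N) * (1 - K_i/N); off-diagonal: -factor * (K_i/N) * (K_j/N)
    cov = [[factor * (K[i] / N) * ((1 - K[i] / N) if i == j else -K[j] / N)
            for j in range(c)]
           for i in range(c)]
    return mean, cov

K = [5, 10, 15]
print(mvhg_pmf([2, 2, 2], K))   # about 0.0796
mean, cov = mvhg_moments(6, K)
print(mean)                      # [1.0, 2.0, 3.0]
```

Each row of the covariance matrix sums to zero, because $\sum_i X_i = n$ is not random.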

To do our work for us, we’ll write an Urn class.

In [2]: class Urn:

    def __init__(self, K_arr):
        """
        Initialization given the number of each type i object in the urn.

        Parameters
        ----------
        K_arr: ndarray(int)
            number of each type i object.
        """

        self.K_arr = np.array(K_arr)
        self.N = np.sum(K_arr)
        self.c = len(K_arr)

    def pmf(self, k_arr):
        """
        Probability mass function.

        Parameters
        ----------
        k_arr: ndarray(int)
            number of observed successes of each object.
        """

        K_arr, N = self.K_arr, self.N

        k_arr = np.atleast_2d(k_arr)
        n = np.sum(k_arr, 1)

        num = np.prod(comb(K_arr, k_arr), 1)
        denom = comb(N, n)

        pr = num / denom

        return pr

    def moments(self, n):
        """
        Compute the mean and variance-covariance matrix for
        multivariate hypergeometric distribution.

        Parameters
        ----------
        n: int
            number of draws.
        """

        K_arr, N, c = self.K_arr, self.N, self.c

        # mean
        μ = n * K_arr / N

        # variance-covariance matrix
        Σ = np.ones((c, c)) * n * (N - n) / (N - 1) / N ** 2
        for i in range(c-1):
            Σ[i, i] *= K_arr[i] * (N - K_arr[i])
            for j in range(i+1, c):
                Σ[i, j] *= - K_arr[i] * K_arr[j]
                Σ[j, i] = Σ[i, j]

        Σ[-1, -1] *= K_arr[-1] * (N - K_arr[-1])

        return μ, Σ

    def simulate(self, n, size=1, seed=None):
        """
        Simulate a sample from multivariate hypergeometric
        distribution where at each draw we take n objects
        from the urn without replacement.

        Parameters
        ----------
        n: int
            number of objects for each draw.
        size: int(optional)
            sample size.
        seed: int(optional)
            random seed.
        """

        K_arr = self.K_arr

        gen = np.random.Generator(np.random.PCG64(seed))
        sample = gen.multivariate_hypergeometric(K_arr, n, size=size)

        return sample

2.4 Usage

2.4.1 First example

Apply this to an example from Wikipedia:


Suppose there are 5 black, 10 white, and 15 red marbles in an urn. If six marbles are chosen without replacement, the probability that exactly two of each color are chosen is

$$
P(2 \text{ black}, 2 \text{ white}, 2 \text{ red}) = \frac{\binom{5}{2}\binom{10}{2}\binom{15}{2}}{\binom{30}{6}} = 0.079575596816976
$$

In [3]: # construct the urn
K_arr = [5, 10, 15]
urn = Urn(K_arr)

Now use the pmf method of the Urn class to compute the probability of the outcome 𝑋 = (2, 2, 2).

In [4]: k_arr = [2, 2, 2] # array of number of observed successes
urn.pmf(k_arr)

Out[4]: array([0.0795756])

We can also compute probabilities of a list of possible outcomes: if we pass a 2-dimensional array k_arr with one outcome per row, pmf returns an array containing the probability of each case.

In [5]: k_arr = [[2, 2, 2], [1, 3, 2]]
urn.pmf(k_arr)

Out[5]: array([0.0795756, 0.1061008])

Now let’s compute the mean vector and variance-covariance matrix.



In [6]: n = 6
μ, Σ = urn.moments(n)

In [7]: μ

Out[7]: array([1., 2., 3.])

In [8]: Σ

Out[8]: array([[ 0.68965517, -0.27586207, -0.4137931 ],
               [-0.27586207,  1.10344828, -0.82758621],
               [-0.4137931 , -0.82758621,  1.24137931]])
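Assuming SciPy 1.6 or later, which provides `scipy.stats.multivariate_hypergeom`, these results can be cross-checked against an independent implementation. The sketch below does this for the marble urn; it is a verification aid under that version assumption, not part of the lecture's code.

```python
import numpy as np
from scipy.stats import multivariate_hypergeom

m = np.array([5, 10, 15])   # K_i: number of objects of each type in the urn
n = 6                       # number of draws
N = m.sum()

# PMF of the outcome (2, 2, 2) from the Wikipedia example
p_222 = multivariate_hypergeom.pmf(x=[2, 2, 2], m=m, n=n)

# Closed-form moments from Section 2.3.2
p = m / N
mean_formula = n * p
cov_formula = n * (N - n) / (N - 1) * (np.diag(p) - np.outer(p, p))

print(p_222)
print(np.allclose(multivariate_hypergeom.mean(m=m, n=n), mean_formula))
print(np.allclose(multivariate_hypergeom.cov(m=m, n=n), cov_formula))
```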

2.4.2 Back to The Administrator’s Problem

Now let’s turn to the grant administrator’s problem.


Here the array of numbers of type 𝑖 objects in the urn is (157, 11, 46, 24).

In [9]: K_arr = [157, 11, 46, 24]
urn = Urn(K_arr)

Let’s compute the probability of the outcome (10, 1, 4, 0).

In [10]: k_arr = [10, 1, 4, 0]
urn.pmf(k_arr)

Out[10]: array([0.01547738])

We can compute probabilities of three possible outcomes by constructing a 2-dimensional array k_arr with one outcome per row and passing it to the pmf method of the Urn class.

In [11]: k_arr = [[5, 5, 4, 1], [10, 1, 2, 2], [13, 0, 2, 0]]
urn.pmf(k_arr)

Out[11]: array([6.21412534e-06, 2.70935969e-02, 1.61839976e-02])

Now let’s compute the mean and variance-covariance matrix of 𝑋 when 𝑛 = 6.

In [12]: n = 6 # number of draws
μ, Σ = urn.moments(n)

In [13]: # mean
μ

Out[13]: array([3.95798319, 0.27731092, 1.15966387, 0.60504202])

In [14]: # variance-covariance matrix
Σ

Out[14]: array([[ 1.31862604, -0.17907267, -0.74884935, -0.39070401],
                [-0.17907267,  0.25891399, -0.05246715, -0.02737417],
                [-0.74884935, -0.05246715,  0.91579029, -0.11447379],
                [-0.39070401, -0.02737417, -0.11447379,  0.53255196]])

We can simulate a large sample and verify that sample means and covariances closely approximate the population means and covariances.

In [15]: size = 10_000_000
sample = urn.simulate(n, size=size)

In [16]: # mean
np.mean(sample, 0)

Out[16]: array([3.957848 , 0.2773974, 1.159666 , 0.6050886])

In [17]: # variance covariance matrix
np.cov(sample.T)

Out[17]: array([[ 1.31833954, -0.17869656, -0.74849613, -0.39114684],
                [-0.17869656,  0.25893371, -0.05292064, -0.02731651],
                [-0.74849613, -0.05292064,  0.91564606, -0.11422929],
                [-0.39114684, -0.02731651, -0.11422929,  0.53269264]])

Evidently, the sample means and covariances approximate their population counterparts well.

2.4.3 Quality of Normal Approximation

To judge the quality of a multivariate normal approximation to the multivariate hypergeometric distribution, we draw a large sample from a multivariate normal distribution with the mean vector and covariance matrix of the corresponding multivariate hypergeometric distribution, and compare the simulated normal distribution with the multivariate hypergeometric distribution, as represented by the large sample simulated above.

In [18]: sample_normal = np.random.multivariate_normal(μ, Σ, size=size)

In [19]: def bivariate_normal(x, y, μ, Σ, i, j):
    # Density of the bivariate normal for components (i, j), with mean
    # (μ[i], μ[j]) and covariance given by the matching entries of Σ
    μ_x, μ_y = μ[i], μ[j]
    σ_x, σ_y = np.sqrt(Σ[i, i]), np.sqrt(Σ[j, j])
    σ_xy = Σ[i, j]

    x_μ = x - μ_x
    y_μ = y - μ_y

    # correlation coefficient
    ρ = σ_xy / (σ_x * σ_y)

    z = x_μ**2 / σ_x**2 + y_μ**2 / σ_y**2 - 2 * ρ * x_μ * y_μ / (σ_x * σ_y)
    denom = 2 * np.pi * σ_x * σ_y * np.sqrt(1 - ρ**2)

    return np.exp(-z / (2 * (1 - ρ**2))) / denom

In [20]: @njit
def count(vec1, vec2, n):
    # Tabulate how often each pair (vec1[i], vec2[i]) occurs;
    # the entries of vec1 and vec2 lie in {0, ..., n}
    size = vec1.shape[0]
    count_mat = np.zeros((n+1, n+1))

    for i in range(size):
        count_mat[vec1[i], vec2[i]] += 1

    return count_mat

In [21]: c = urn.c
fig, axs = plt.subplots(c, c, figsize=(14, 14))

# grids for plotting the bivariate Gaussian
x_grid = np.linspace(-2, n+1, 100)
y_grid = np.linspace(-2, n+1, 100)
X, Y = np.meshgrid(x_grid, y_grid)

for i in range(c):
    axs[i, i].hist(sample[:, i], bins=np.arange(0, n, 1), alpha=0.5, density=True,
                   label='hypergeom')
    axs[i, i].hist(sample_normal[:, i], bins=np.arange(0, n, 1), alpha=0.5,
                   density=True, label='normal')
    axs[i, i].legend()
    axs[i, i].set_title('$k_{' +str(i+1) +'}$')
    for j in range(c):
        if i == j:
            continue

        # bivariate Gaussian density function
        Z = bivariate_normal(X, Y, μ, Σ, i, j)
        cs = axs[i, j].contour(X, Y, Z, 4, colors="black", alpha=0.6)
        axs[i, j].clabel(cs, inline=1, fontsize=10)

        # empirical multivariate hypergeometric distribution
        count_mat = count(sample[:, i], sample[:, j], n)
        axs[i, j].pcolor(count_mat.T/size, cmap='Blues')
        axs[i, j].set_title('$(k_{' +str(i+1) +'}, k_{' + str(j+1) + '})$')

plt.show()

The diagonal graphs plot the marginal distributions of 𝑘𝑖 for each 𝑖 using histograms.
Note the substantial differences between the hypergeometric distribution and the approximating normal distribution.
The off-diagonal graphs plot the empirical joint distribution of 𝑘𝑖 and 𝑘𝑗 for each pair (𝑖, 𝑗).

The darker the blue, the more data points are contained in the corresponding cell.
(Note that 𝑘𝑖 is on the x-axis and 𝑘𝑗 is on the y-axis.)

The contour maps plot the bivariate Gaussian density function of (𝑘𝑖, 𝑘𝑗) with the population mean and covariance given by slices of 𝜇 and Σ that we computed above.
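One way to confirm that the hand-coded bivariate_normal density behind the contour maps is correct is to compare it with scipy.stats.multivariate_normal evaluated on the corresponding 2 × 2 slice of Σ. So that the sketch runs on its own, it re-implements the same closed form under a new, hypothetical name and uses the moments of the marble urn from the first example rather than the administrator's urn.

```python
import numpy as np
from scipy.stats import multivariate_normal


def bivariate_normal_pdf(x, y, mu, Sigma, i, j):
    """Same closed form as bivariate_normal, for components i and j."""
    mu_x, mu_y = mu[i], mu[j]
    s_x, s_y = np.sqrt(Sigma[i, i]), np.sqrt(Sigma[j, j])
    rho = Sigma[i, j] / (s_x * s_y)
    dx, dy = x - mu_x, y - mu_y
    z = dx**2 / s_x**2 + dy**2 / s_y**2 - 2 * rho * dx * dy / (s_x * s_y)
    return np.exp(-z / (2 * (1 - rho**2))) / (2 * np.pi * s_x * s_y * np.sqrt(1 - rho**2))


# Moments of the 5/10/15 marble urn with n = 6 draws
p = np.array([5, 10, 15]) / 30
n = 6
mu = n * p
Sigma = n * (30 - n) / (30 - 1) * (np.diag(p) - np.outer(p, p))

i, j = 0, 2
X, Y = np.meshgrid(np.linspace(-2, 7, 50), np.linspace(-2, 7, 50))
Z_closed = bivariate_normal_pdf(X, Y, mu, Sigma, i, j)

# The "slice" of (mu, Sigma) belonging to the pair (k_i, k_j)
idx = [i, j]
rv = multivariate_normal(mean=mu[idx], cov=Sigma[np.ix_(idx, idx)])
Z_scipy = rv.pdf(np.dstack((X, Y)))

print(np.allclose(Z_closed, Z_scipy))
```

Note that the full 3 × 3 covariance matrix is singular (the counts always sum to 𝑛), but each 2 × 2 slice is positive definite, so the bivariate densities are well defined.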
Let’s also test the normality of each 𝑘𝑖 using scipy.stats.normaltest, which implements D’Agostino and Pearson’s test; it combines skewness and kurtosis to form an omnibus test of normality.
The null hypothesis is that the sample comes from a normal distribution.

normaltest returns a result whose pvalue attribute is an array of p-values, one for the test on each 𝑘𝑖 sample.
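To see what normaltest combines, the sketch below recomputes its statistic as the sum of the squared z-scores returned by scipy.stats.skewtest and scipy.stats.kurtosistest, and its p-value from a chi-squared distribution with 2 degrees of freedom. It does this for one normal sample and one univariate hypergeometric sample. The sample sizes, the seed, and the choice of the type-2 count from the administrator's urn (11 of 238 objects, 15 draws) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2, kurtosistest, normaltest, skewtest


def dagostino_pearson(x):
    """Recompute normaltest's statistic and p-value from its two ingredients."""
    s = skewtest(x).statistic        # z-score for the sample skewness
    k = kurtosistest(x).statistic    # z-score for the sample kurtosis
    k2 = s**2 + k**2                 # omnibus statistic K^2
    return k2, chi2.sf(k2, df=2)     # K^2 is approximately chi-squared(2) under the null


rng = np.random.default_rng(0)
samples = {
    "normal": rng.normal(size=5_000),
    # type-2 count when drawing 15 objects from the administrator's urn
    "hypergeometric": rng.hypergeometric(ngood=11, nbad=227, nsample=15, size=5_000),
}

for name, x in samples.items():
    res = normaltest(x)
    k2, p = dagostino_pearson(x)
    print(f"{name:>14}: K^2 = {res.statistic:10.2f} (recomputed {k2:10.2f}), "
          f"p-value = {res.pvalue:.4f} (recomputed {p:.4f})")
```

The strongly skewed hypergeometric count produces a very large statistic, which is why p-values like those reported below are essentially zero.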

In [22]: test_multihyper = normaltest(sample)
test_multihyper.pvalue

Out[22]: array([0., 0., 0., 0.])

As we can see, all the p-values are essentially 0, so the null hypothesis is soundly rejected.
By contrast, the test does not reject the null hypothesis for the sample drawn from the normal distribution.

In [23]: test_normal = normaltest(sample_normal)
test_normal.pvalue

Out[23]: array([0.21399014, 0.82811932, 0.90821621, 0.62834071])

The lesson to take away from this is that the normal approximation is imperfect.
Chapter 3

Modeling COVID 19

3.1 Contents

• Overview 3.2
• The SIR Model 3.3
• Implementation 3.4
• Experiments 3.5
• Ending Lockdown 3.6

3.2 Overview

This is a Python version of the code for analyzing the COVID-19 pandemic provided by Andrew Atkeson.
See, in particular:
• NBER Working Paper No. 26867
• COVID-19 Working papers and code
The purpose of his notes is to introduce economists to quantitative modeling of infectious disease dynamics.
Dynamics are modeled using a standard SIR (Susceptible-Infected-Removed) model of disease spread.
The model dynamics are represented by a system of ordinary differential equations.
The main objective is to study the impact of suppression through social distancing on the spread of the infection.
The focus is on US outcomes, but the parameters can be adjusted to study other countries.
We will use the following standard imports:

In [1]: import numpy as np
from numpy import exp
import matplotlib.pyplot as plt

We will also use SciPy’s numerical routine odeint for solving differential equations.


In [2]: from scipy.integrate import odeint

This routine calls into compiled code from the FORTRAN library odepack.

3.3 The SIR Model

In the version of the SIR model we will analyze there are four states.
All individuals in the population are assumed to be in one of these four states.
The states are: susceptible (S), exposed (E), infected (I) and removed (R).
Comments:
• Those in state R have been infected and either recovered or died.
• Those who have recovered are assumed to have acquired immunity.
• Those in the exposed group are not yet infectious.

3.3.1 Time Path

The flow across states follows the path 𝑆 → 𝐸 → 𝐼 → 𝑅.


When the transmission rate is positive and 𝑖(0) > 0, the infection spreads through the population, although in this model a positive fraction of the population never becomes infected.
The interest is primarily in
• the number of infections at a given time (which determines whether or not the health
care system is overwhelmed) and
• how long the caseload can be deferred (hopefully until a vaccine arrives)
Using lower case letters for the fraction of the population in each state, the dynamics are

$$
\begin{aligned}
\dot s(t) &= -\beta(t) \, s(t) \, i(t) \\
\dot e(t) &= \beta(t) \, s(t) \, i(t) - \sigma e(t) \\
\dot i(t) &= \sigma e(t) - \gamma i(t)
\end{aligned}
\tag{1}
$$

In these equations,
• 𝛽(𝑡) is called the transmission rate (the rate at which individuals bump into others and expose them to the virus).
• 𝜎 is called the infection rate (the rate at which those who are exposed become infected).
• 𝛾 is called the recovery rate (the rate at which infected people recover or die).
• The dot symbol $\dot y$ represents the time derivative $dy/dt$.
We do not need to model the fraction 𝑟 of the population in state 𝑅 separately because the
states form a partition.
In particular, the “removed” fraction of the population is 𝑟 = 1 − 𝑠 − 𝑒 − 𝑖.
We will also track 𝑐 = 𝑖 + 𝑟, which is the cumulative caseload (i.e., all those who have or have
had the infection).
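Summing the three equations in (1) gives the laws of motion for the two fractions that are not tracked directly. This short derivation uses only the definitions above, and it explains why cumulative cases can be computed as 1 − 𝑠 − 𝑒:

$$
\begin{aligned}
\dot r(t) &= -\left(\dot s(t) + \dot e(t) + \dot i(t)\right) = \gamma i(t), \\
c(t) &= i(t) + r(t) = 1 - s(t) - e(t), \qquad \dot c(t) = -\left(\dot s(t) + \dot e(t)\right) = \sigma e(t).
\end{aligned}
$$

In words, removals occur at rate 𝛾𝑖(𝑡), and new cases (entries into the infected state) occur at rate 𝜎𝑒(𝑡).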
The system (1) can be written in vector form as

$$
\dot x = F(x, t), \qquad x := (s, e, i) \tag{2}
$$

for suitable definition of 𝐹 (see the code below).

3.3.2 Parameters

Both 𝜎 and 𝛾 are thought of as fixed, biologically determined parameters.


As in Atkeson’s note, we set
• 𝜎 = 1/5.2 to reflect an average incubation period of 5.2 days.
• 𝛾 = 1/18 to match an average illness duration of 18 days.
The transmission rate is modeled as
• 𝛽(𝑡) ∶= 𝑅(𝑡)𝛾 where 𝑅(𝑡) is the effective reproduction number at time 𝑡.
(The notation is slightly confusing, since 𝑅(𝑡) is different from 𝑅, the symbol that represents the removed state.)

3.4 Implementation

First we set the population size to match the US.

In [3]: pop_size = 3.3e8

Next we fix parameters as described above.

In [4]: γ = 1 / 18
σ = 1 / 5.2

Now we construct a function that represents 𝐹 in (2)

In [5]: def F(x, t, R0=1.6):
    """
    Time derivative of the state vector.

    * x is the state vector (array_like)
    * t is time (scalar)
    * R0 is the effective reproduction number, defaulting to a constant

    """
    s, e, i = x

    # New exposure of susceptibles
    β = R0(t) * γ if callable(R0) else R0 * γ
    ne = β * s * i

    # Time derivatives
    ds = - ne
    de = ne - σ * e
    di = σ * e - γ * i

    return ds, de, di



Note that R0 can be either constant or a given function of time.


The initial conditions are set to

In [6]: # initial conditions of s, e, i
i_0 = 1e-7
e_0 = 4 * i_0
s_0 = 1 - i_0 - e_0

In vector form the initial condition is

In [7]: x_0 = s_0, e_0, i_0

We solve for the time path numerically using odeint, at a sequence of dates t_vec.

In [8]: def solve_path(R0, t_vec, x_init=x_0):
    """
    Solve for i(t) and c(t) via numerical integration,
    given the time path for R0.

    """
    G = lambda x, t: F(x, t, R0)
    s_path, e_path, i_path = odeint(G, x_init, t_vec).transpose()

    c_path = 1 - s_path - e_path # cumulative cases

    return i_path, c_path
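The lecture uses odeint. SciPy's newer interface, scipy.integrate.solve_ivp, expects the right-hand side as fun(t, y) rather than f(x, t). The self-contained sketch below repeats the definition of F and the initial condition, then checks that both solvers produce essentially the same infected path for constant R0 = 1.6. The LSODA method and the tolerances are our own choices, not Atkeson's.

```python
import numpy as np
from scipy.integrate import odeint, solve_ivp

γ, σ = 1 / 18, 1 / 5.2


def F(x, t, R0=1.6):
    """Same right-hand side as F above (odeint convention: f(x, t))."""
    s, e, i = x
    β = R0(t) * γ if callable(R0) else R0 * γ
    ne = β * s * i
    return -ne, ne - σ * e, σ * e - γ * i


i_0 = 1e-7
x_0 = (1 - 5 * i_0, 4 * i_0, i_0)   # s_0 = 1 - i_0 - e_0 with e_0 = 4 i_0
t_vec = np.linspace(0, 550, 1000)

# odeint calls f(x, t); solve_ivp calls fun(t, y), hence the argument swap
sol_odeint = odeint(F, x_0, t_vec)
sol_ivp = solve_ivp(lambda t, x: F(x, t), (t_vec[0], t_vec[-1]), x_0,
                    t_eval=t_vec, method="LSODA", rtol=1e-8, atol=1e-12)

# Rows of sol_ivp.y are s, e, i; columns of sol_odeint are s, e, i
max_gap = np.max(np.abs(sol_ivp.y[2] - sol_odeint[:, 2]))
print(sol_ivp.success, max_gap)
```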

3.5 Experiments

Let’s run some experiments using this code.


The time period we investigate will be 550 days, or around 18 months:

In [9]: t_length = 550
grid_size = 1000
t_vec = np.linspace(0, t_length, grid_size)

3.5.1 Experiment 1: Constant R0 Case

Let’s start with the case where R0 is constant.


We calculate the time path of infected people under different assumptions for R0:

In [10]: R0_vals = np.linspace(1.6, 3.0, 6)
labels = [f'$R0 = {r:.2f}$' for r in R0_vals]
i_paths, c_paths = [], []

for r in R0_vals:
    i_path, c_path = solve_path(r, t_vec)
    i_paths.append(i_path)
    c_paths.append(c_path)

Here’s some code to plot the time paths.



In [11]: def plot_paths(paths, labels, times=t_vec):

    fig, ax = plt.subplots()

    for path, label in zip(paths, labels):
        ax.plot(times, path, label=label)

    ax.legend(loc='upper left')

    plt.show()

Let’s plot current cases as a fraction of the population.

In [12]: plot_paths(i_paths, labels)

As expected, lower effective transmission rates defer the peak of infections.


They also lead to a lower peak in current cases.
Here are cumulative cases, as a fraction of population:

In [13]: plot_paths(c_paths, labels)



3.5.2 Experiment 2: Changing Mitigation

Let’s look at a scenario where mitigation (e.g., social distancing) is successively imposed.
Here’s a specification for R0 as a function of time.

In [14]: def R0_mitigating(t, r0=3, η=1, r_bar=1.6):
    R0 = r0 * exp(- η * t) + (1 - exp(- η * t)) * r_bar
    return R0

The idea is that R0 starts off at 3 and falls to 1.6.


This is due to progressive adoption of stricter mitigation measures.
The parameter η controls the rate, or speed, at which restrictions are imposed.
We consider several different rates:

In [15]: η_vals = 1/5, 1/10, 1/20, 1/50, 1/100
labels = [fr'$\eta = {η:.2f}$' for η in η_vals]

This is what the time path of R0 looks like at these alternative rates:

In [16]: fig, ax = plt.subplots()

for η, label in zip(η_vals, labels):
    ax.plot(t_vec, R0_mitigating(t_vec, η=η), label=label)

ax.legend()
plt.show()
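Because R0(𝑡) − r̄ = (r0 − r̄)e^{−η𝑡} under this specification, the gap between the current and the long-run reproduction number halves every ln 2/η days. That gives a concrete reading of the rates chosen above. The sketch below re-evaluates R0_mitigating (copied so the block runs on its own) to tabulate these half-lives.

```python
import numpy as np


def R0_mitigating(t, r0=3, η=1, r_bar=1.6):
    # same specification as In [14]
    return r0 * np.exp(-η * t) + (1 - np.exp(-η * t)) * r_bar


for η in (1/5, 1/10, 1/20, 1/50, 1/100):
    half_life = np.log(2) / η                   # days for R0 - r_bar to halve
    midpoint = R0_mitigating(half_life, η=η)    # halfway between 3 and 1.6
    print(f"η = {η:.2f}: half-life ≈ {half_life:5.1f} days, R0 there = {midpoint:.2f}")
```

At the fastest rate (η = 0.2), half of the eventual reduction in R0 is achieved in under four days; at the slowest (η = 0.01), it takes about ten weeks.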

Let’s calculate the time path of infected people:

In [17]: i_paths, c_paths = [], []

for η in η_vals:
    R0 = lambda t: R0_mitigating(t, η=η)
    i_path, c_path = solve_path(R0, t_vec)
    i_paths.append(i_path)
    c_paths.append(c_path)

These are current cases under the different scenarios:

In [18]: plot_paths(i_paths, labels)



Here are cumulative cases, as a fraction of population:

In [19]: plot_paths(c_paths, labels)

3.6 Ending Lockdown

The following replicates additional results by Andrew Atkeson on the timing of lifting lockdown.
Consider these two mitigation scenarios:

1. 𝑅𝑡 = 0.5 for 30 days and then 𝑅𝑡 = 2 for the remaining 17 months. This corresponds to
lifting lockdown in 30 days.

2. 𝑅𝑡 = 0.5 for 120 days and then 𝑅𝑡 = 2 for the remaining 14 months. This corresponds
to lifting lockdown in 4 months.

The parameters considered here start the model with 25,000 active infections and 75,000
agents already exposed to the virus and thus soon to be contagious.

In [20]: # initial conditions
i_0 = 25_000 / pop_size
e_0 = 75_000 / pop_size
s_0 = 1 - i_0 - e_0
x_0 = s_0, e_0, i_0

Let’s calculate the paths:



In [21]: R0_paths = (lambda t: 0.5 if t < 30 else 2,
            lambda t: 0.5 if t < 120 else 2)

labels = [f'scenario {i}' for i in (1, 2)]

i_paths, c_paths = [], []

for R0 in R0_paths:
    i_path, c_path = solve_path(R0, t_vec, x_init=x_0)
    i_paths.append(i_path)
    c_paths.append(c_path)

Here is the number of active infections:

In [22]: plot_paths(i_paths, labels)

What kind of mortality can we expect under these scenarios?


Suppose that 1% of cases result in death:

In [23]: ν = 0.01

This is the cumulative number of deaths:

In [24]: paths = [path * ν * pop_size for path in c_paths]
plot_paths(paths, labels)

This is the daily death rate:

In [25]: paths = [path * ν * γ * pop_size for path in i_paths]
plot_paths(paths, labels)

Pushing the peak of the curve further into the future may reduce cumulative deaths if a vaccine is found.
Chapter 4

Linear Algebra

4.1 Contents

• Overview 4.2
• Vectors 4.3
• Matrices 4.4
• Solving Systems of Equations 4.5
• Eigenvalues and Eigenvectors 4.6
• Further Topics 4.7
• Exercises 4.8
• Solutions 4.9

4.2 Overview

Linear algebra is one of the most useful branches of applied mathematics for economists to
invest in.
For example, many applied problems in economics and finance require the solution of a linear
system of equations, such as

$$
\begin{aligned}
y_1 &= a x_1 + b x_2 \\
y_2 &= c x_1 + d x_2
\end{aligned}
$$

or, more generally,

$$
\begin{aligned}
y_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1k} x_k \\
&\ \,\vdots \\
y_n &= a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nk} x_k
\end{aligned}
\tag{1}
$$

The objective here is to solve for the “unknowns” 𝑥1 , … , 𝑥𝑘 given 𝑎11 , … , 𝑎𝑛𝑘 and 𝑦1 , … , 𝑦𝑛 .
When considering such problems, it is essential that we first consider at least some of the following questions:
• Does a solution actually exist?
• Are there in fact many solutions, and if so how should we interpret them?
• If no solution exists, is there a best “approximate” solution?


• If a solution exists, how should we compute it?


These are the kinds of topics addressed by linear algebra.
In this lecture we will cover the basics of linear and matrix algebra, treating both theory and
computation.
We admit some overlap with this lecture, where operations on NumPy arrays were first explained.
Note that this lecture is more theoretical than most, and contains background material that
will be used in applications as we go along.
Let’s start with some imports:

In [1]: import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
from scipy.interpolate import interp2d
from scipy.linalg import inv, solve, det, eig

4.3 Vectors

A vector of length 𝑛 is just a sequence (or array, or tuple) of 𝑛 numbers, which we write as
𝑥 = (𝑥1 , … , 𝑥𝑛 ) or 𝑥 = [𝑥1 , … , 𝑥𝑛 ].
We will write these sequences either horizontally or vertically as we please.
(Later, when we wish to perform certain matrix operations, it will become necessary to distinguish between the two.)
The set of all 𝑛-vectors is denoted by ℝ𝑛 .
For example, ℝ2 is the plane, and a vector in ℝ2 is just a point in the plane.
Traditionally, vectors are represented visually as arrows from the origin to the point.
The following figure represents three vectors in this manner

In [2]: fig, ax = plt.subplots(figsize=(10, 8))
# Set the axes through the origin
for spine in ['left', 'bottom']:
    ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
    ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))
ax.grid()
vecs = ((2, 4), (-3, 3), (-4, -3.5))
for v in vecs:
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='blue',
                shrink=0,
                alpha=0.7,
                width=0.5))
    ax.text(1.1 * v[0], 1.1 * v[1], str(v))
plt.show()

4.3.1 Vector Operations

The two most common operators for vectors are addition and scalar multiplication, which we
now describe.
As a matter of definition, when we add two vectors, we add them element-by-element

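$$
x + y =
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} +
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} :=
\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}
$$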

Scalar multiplication is an operation that takes a number 𝛾 and a vector 𝑥 and produces

$$
\gamma x :=
\begin{bmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{bmatrix}
$$

Scalar multiplication is illustrated in the next figure

In [3]: fig, ax = plt.subplots(figsize=(10, 8))
# Set the axes through the origin
for spine in ['left', 'bottom']:
    ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
    ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))
x = (2, 2)
ax.annotate('', xy=x, xytext=(0, 0),
            arrowprops=dict(facecolor='blue',
            shrink=0,
            alpha=1,
            width=0.5))
ax.text(x[0] + 0.4, x[1] - 0.2, '$x$', fontsize='16')

scalars = (-2, 2)
x = np.array(x)

for s in scalars:
    v = s * x
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='red',
                shrink=0,
                alpha=0.5,
                width=0.5))
    ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()

In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is more
commonly represented as a NumPy array.
One advantage of NumPy arrays is that scalar multiplication and addition have very natural syntax

In [4]: x = np.ones(3) # Vector of three ones
y = np.array((2, 4, 6)) # Converts tuple (2, 4, 6) into array
x + y

Out[4]: array([3., 5., 7.])

In [5]: 4 * x

Out[5]: array([4., 4., 4.])

4.3.2 Inner Product and Norm

The inner product of vectors 𝑥, 𝑦 ∈ ℝ𝑛 is defined as

$$
x' y := \sum_{i=1}^{n} x_i y_i
$$

Two vectors are called orthogonal if their inner product is zero.


The norm of a vector 𝑥 represents its “length” (i.e., its distance from the zero vector) and is
defined as

$$
\| x \| := \sqrt{x' x} := \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}
$$

The expression ‖𝑥 − 𝑦‖ is thought of as the distance between 𝑥 and 𝑦.


Continuing on from the previous example, the inner product and norm can be computed as
follows

In [6]: np.sum(x * y) # Inner product of x and y

Out[6]: 12.0

In [7]: np.sqrt(np.sum(x**2)) # Norm of x, take one

Out[7]: 1.7320508075688772

In [8]: np.linalg.norm(x) # Norm of x, take two

Out[8]: 1.7320508075688772
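Continuing the example, the sketch below also checks orthogonality and computes the distance ‖𝑥 − 𝑦‖. The vector z is an extra vector chosen for illustration to be orthogonal to x, and the @ operator on two 1-D arrays computes the same inner product as np.sum(x * y).

```python
import numpy as np

x = np.ones(3)
y = np.array((2, 4, 6))
z = np.array((1, -1, 0))           # illustrative vector orthogonal to x

print(x @ y)                        # inner product, same as np.sum(x * y)
print(x @ z)                        # zero, so x and z are orthogonal
print(np.linalg.norm(x - y))        # distance between x and y
print(np.sqrt(np.sum((x - y)**2)))  # same distance, from the definition
```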

4.3.3 Span

Given a set of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 , it’s natural to think about the new vectors we
can create by performing linear operations.
New vectors created in this manner are called linear combinations of 𝐴.

In particular, 𝑦 ∈ ℝ𝑛 is a linear combination of 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } if

𝑦 = 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘 for some scalars 𝛽1 , … , 𝛽𝑘

In this context, the values 𝛽1 , … , 𝛽𝑘 are called the coefficients of the linear combination.
The set of linear combinations of 𝐴 is called the span of 𝐴.
The next figure shows the span of 𝐴 = {𝑎1 , 𝑎2 } in ℝ3 .
The span is a two-dimensional plane passing through these two points and the origin.

In [9]: fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(projection='3d')

x_min, x_max = -5, 5
y_min, y_max = -5, 5

α, β = 0.2, 0.1

ax.set(xlim=(x_min, x_max), ylim=(x_min, x_max), zlim=(x_min, x_max),
       xticks=(0,), yticks=(0,), zticks=(0,))

gs = 3
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)

# Fixed linear function, to generate a plane
def f(x, y):
    return α * x + β * y

# Vector locations, by coordinate
x_coords = np.array((3, 3))
y_coords = np.array((4, -4))
z = f(x_coords, y_coords)
for i in (0, 1):
    ax.text(x_coords[i], y_coords[i], z[i], f'$a_{i+1}$', fontsize=14)

# Lines to vectors
for i in (0, 1):
    x = (0, x_coords[i])
    y = (0, y_coords[i])
    z = (0, f(x_coords[i], y_coords[i]))
    ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)

# Draw the plane
grid_size = 20
xr2 = np.linspace(x_min, x_max, grid_size)
yr2 = np.linspace(y_min, y_max, grid_size)
x2, y2 = np.meshgrid(xr2, yr2)
z2 = f(x2, y2)
ax.plot_surface(x2, y2, z2, rstride=1, cstride=1, cmap=cm.jet,
                linewidth=0, antialiased=True, alpha=0.2)
plt.show()

Examples

If 𝐴 contains only one nonzero vector 𝑎1 ∈ ℝ2, then its span is just the scalar multiples of 𝑎1, which is the unique line passing through both 𝑎1 and the origin.
If 𝐴 = {𝑒1 , 𝑒2 , 𝑒3 } consists of the canonical basis vectors of ℝ3 , that is

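$$
e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$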

then the span of 𝐴 is all of ℝ3 , because, for any 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 , we can write

$$
x = x_1 e_1 + x_2 e_2 + x_3 e_3
$$

Now consider 𝐴0 = {𝑒1 , 𝑒2 , 𝑒1 + 𝑒2 }.


If 𝑦 = (𝑦1 , 𝑦2 , 𝑦3 ) is any linear combination of these vectors, then 𝑦3 = 0 (check it).

Hence 𝐴0 fails to span all of ℝ3 .
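The "check it" above can also be done numerically. The sketch below stacks the vectors of 𝐴0 as the columns of a matrix, confirms that a randomly drawn linear combination has third entry zero, and uses np.linalg.matrix_rank and a least-squares fit (np.linalg.lstsq) to show that the span is a plane that misses 𝑒3. The random coefficients are illustrative only.

```python
import numpy as np

e1, e2, e3 = np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])
A0 = np.column_stack([e1, e2, e1 + e2])   # vectors of A_0 as columns

# Any linear combination A0 @ β has third entry 0
rng = np.random.default_rng(1)
β = rng.normal(size=3)
print(A0 @ β)

# The span has dimension 2 (a plane), not 3
print(np.linalg.matrix_rank(A0))

# e3 is not in the span: the best least-squares fit still misses it
coef, _, _, _ = np.linalg.lstsq(A0, e3, rcond=None)
print(np.linalg.norm(A0 @ coef - e3))     # distance from e3 to the span
```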

4.3.4 Linear Independence

As we’ll see, it’s often desirable to find families of vectors with relatively large span, so that
many vectors can be described by linear operators on a few vectors.
The condition we need for a set of vectors to have a large span is what’s called linear independence.
In particular, a collection of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 is said to be
• linearly dependent if some strict subset of 𝐴 has the same span as 𝐴.
• linearly independent if it is not linearly dependent.
Put differently, a set of vectors is linearly independent if no vector is redundant to the span
and linearly dependent otherwise.
To illustrate the idea, recall the figure that showed the span of vectors {𝑎1 , 𝑎2 } in ℝ3 as a
plane through the origin.
If we take a third vector 𝑎3 and form the set {𝑎1 , 𝑎2 , 𝑎3 }, this set will be
• linearly dependent if 𝑎3 lies in the plane
• linearly independent otherwise
As another illustration of the concept, since ℝ𝑛 can be spanned by 𝑛 vectors (see the discussion of canonical basis vectors above), any collection of 𝑚 > 𝑛 vectors in ℝ𝑛 must be linearly dependent.
The following statements are equivalent to linear independence of 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } ⊂ ℝ𝑛

1. No vector in 𝐴 can be formed as a linear combination of the other elements.

2. If 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘 = 0 for scalars 𝛽1, …, 𝛽𝑘, then 𝛽1 = ⋯ = 𝛽𝑘 = 0.

(The zero in the first expression is the origin of ℝ𝑛.)
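Condition 2 suggests a practical test. Stack the vectors as the columns of a matrix 𝐴. They are linearly independent exactly when the only solution of 𝐴𝛽 = 0 is 𝛽 = 0, that is, when the rank of 𝐴 equals the number of vectors. The helper below is a hypothetical sketch of that test. It relies on NumPy's numerical rank, which uses a floating-point tolerance, and is applied to examples from this section.

```python
import numpy as np


def is_linearly_independent(vectors):
    """Hypothetical helper: True if the given vectors in R^n are linearly independent.

    The vectors are stacked as columns of A; they are independent exactly when
    rank(A) equals the number of vectors (so A β = 0 forces β = 0).
    """
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[1]


e1, e2, e3 = np.eye(3)
print(is_linearly_independent([e1, e2, e3]))            # canonical basis
print(is_linearly_independent([e1, e2, e1 + e2]))       # e1 + e2 is redundant
print(is_linearly_independent([e1, e2, e3, e1 + e3]))   # 4 vectors in R^3
```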

4.3.5 Unique Representations

Another nice thing about sets of linearly independent vectors is that each element in the span
has a unique representation as a linear combination of these vectors.
In other words, if 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } ⊂ ℝ𝑛 is linearly independent and

$$
y = \beta_1 a_1 + \cdots + \beta_k a_k
$$

then no other coefficient sequence 𝛾1 , … , 𝛾𝑘 will produce the same vector 𝑦.


Indeed, if we also have 𝑦 = 𝛾1 𝑎1 + ⋯ + 𝛾𝑘 𝑎𝑘, then

(𝛽1 − 𝛾1 )𝑎1 + ⋯ + (𝛽𝑘 − 𝛾𝑘 )𝑎𝑘 = 0

Linear independence now implies 𝛾𝑖 = 𝛽𝑖 for all 𝑖.



4.4 Matrices

Matrices are a neat way of organizing data for use in linear operations.
An 𝑛 × 𝑘 matrix is a rectangular array 𝐴 of numbers with 𝑛 rows and 𝑘 columns:

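$$
A =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk}
\end{bmatrix}
$$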

Often, the numbers in the matrix represent coefficients in a system of linear equations, as discussed at the start of this lecture.
For obvious reasons, the matrix 𝐴 is also called a vector if either 𝑛 = 1 or 𝑘 = 1.
In the former case, 𝐴 is called a row vector, while in the latter it is called a column vector.
If 𝑛 = 𝑘, then 𝐴 is called square.
The matrix formed by replacing 𝑎𝑖𝑗 by 𝑎𝑗𝑖 for every 𝑖 and 𝑗 is called the transpose of 𝐴 and
denoted 𝐴′ or 𝐴⊤ .
If 𝐴 = 𝐴′ , then 𝐴 is called symmetric.
For a square matrix 𝐴, the 𝑛 elements of the form 𝑎𝑖𝑖 for 𝑖 = 1, …, 𝑛 are called the principal diagonal.
𝐴 is called diagonal if the only nonzero entries are on the principal diagonal.
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then
𝐴 is called the identity matrix and denoted by 𝐼.

4.4.1 Matrix Operations

Just as was the case for vectors, a number of algebraic operations are defined for matrices.
Scalar multiplication and addition are immediate generalizations of the vector case:

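$$
\gamma A =
\gamma
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix}
:=
\begin{bmatrix}
\gamma a_{11} & \cdots & \gamma a_{1k} \\
\vdots & \vdots & \vdots \\
\gamma a_{n1} & \cdots & \gamma a_{nk}
\end{bmatrix}
$$

and

$$
A + B =
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix}
+
\begin{bmatrix}
b_{11} & \cdots & b_{1k} \\
\vdots & \vdots & \vdots \\
b_{n1} & \cdots & b_{nk}
\end{bmatrix}
:=
\begin{bmatrix}
a_{11} + b_{11} & \cdots & a_{1k} + b_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} + b_{n1} & \cdots & a_{nk} + b_{nk}
\end{bmatrix}
$$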

In the latter case, the matrices must have the same shape in order for the definition to make
sense.
We also have a convention for multiplying two matrices.
The rule for matrix multiplication generalizes the idea of inner products discussed above and
is designed to make multiplication play well with basic linear operations.

If 𝐴 and 𝐵 are two matrices, then their product 𝐴𝐵 is formed by taking as its 𝑖, 𝑗-th element
the inner product of the 𝑖-th row of 𝐴 and the 𝑗-th column of 𝐵.
There are many tutorials to help you visualize this operation, such as this one, or the discussion on the Wikipedia page.
If 𝐴 is 𝑛 × 𝑘 and 𝐵 is 𝑗 × 𝑚, then to multiply 𝐴 and 𝐵 we require 𝑘 = 𝑗, and the resulting
matrix 𝐴𝐵 is 𝑛 × 𝑚.
As perhaps the most important special case, consider multiplying 𝑛 × 𝑘 matrix 𝐴 and 𝑘 × 1
column vector 𝑥.
According to the preceding rule, this gives us an 𝑛 × 1 column vector

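$$
Ax =
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix}
\begin{bmatrix}
x_1 \\
\vdots \\
x_k
\end{bmatrix}
:=
\begin{bmatrix}
a_{11} x_1 + \cdots + a_{1k} x_k \\
\vdots \\
a_{n1} x_1 + \cdots + a_{nk} x_k
\end{bmatrix}
\tag{2}
$$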

Note
𝐴𝐵 and 𝐵𝐴 are not generally the same thing.

Another important special case is the identity matrix.


You should check that if 𝐴 is 𝑛 × 𝑘 and 𝐼 is the 𝑘 × 𝑘 identity matrix, then 𝐴𝐼 = 𝐴.
If 𝐼 is the 𝑛 × 𝑛 identity matrix, then 𝐼𝐴 = 𝐴.
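The shape rule, the row-by-column definition of the product, and the identity-matrix facts can all be checked in a few lines of NumPy (the @ operator used here is the matrix-multiplication operator described in Section 4.4.2). The matrices below are arbitrary examples. With non-square 𝐴 and 𝐵, the products 𝐴𝐵 and 𝐵𝐴 do not even have the same shape.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3
B = np.array([[1, 0],
              [2, 1],
              [0, 3]])             # 3 x 2

AB = A @ B                         # (2 x 3)(3 x 2) -> 2 x 2
BA = B @ A                         # (3 x 2)(2 x 3) -> 3 x 3
print(AB.shape, BA.shape)

# The (i, j)-th element of AB is the inner product of row i of A and column j of B
i, j = 1, 0
print(AB[i, j], A[i, :] @ B[:, j])

# Identity matrices of the appropriate size leave A unchanged
print(np.array_equal(A @ np.identity(3), A), np.array_equal(np.identity(2) @ A, A))
```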

4.4.2 Matrices in NumPy

NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all the standard matrix operations.
You can create them manually from tuples of tuples (or lists of lists) as follows

In [10]: A = ((1, 2),
     (3, 4))

type(A)

Out[10]: tuple

In [11]: A = np.array(A)

type(A)

Out[11]: numpy.ndarray

In [12]: A.shape

Out[12]: (2, 2)

The shape attribute is a tuple giving the number of rows and columns — see here for more
discussion.

To get the transpose of A, use A.transpose() or, more simply, A.T.


There are many convenient functions for creating common matrices (matrices of zeros, ones,
etc.) — see here.
Since operations are performed elementwise by default, scalar multiplication and addition
have very natural syntax

In [13]: A = np.identity(3)
B = np.ones((3, 3))
2 * A

Out[13]: array([[2., 0., 0.],
                [0., 2., 0.],
                [0., 0., 2.]])

In [14]: A + B

Out[14]: array([[2., 1., 1.],
                [1., 2., 1.],
                [1., 1., 2.]])

To multiply matrices we use the @ symbol.


In particular, A @ B is matrix multiplication, whereas A * B is element-by-element multiplication.
See here for more discussion.

4.4.3 Matrices as Maps

Each 𝑛 × 𝑘 matrix 𝐴 can be identified with a function 𝑓(𝑥) = 𝐴𝑥 that maps 𝑥 ∈ ℝ𝑘 into
𝑦 = 𝐴𝑥 ∈ ℝ𝑛 .
These kinds of functions have a special property: they are linear.
A function 𝑓 ∶ ℝ𝑘 → ℝ𝑛 is called linear if, for all 𝑥, 𝑦 ∈ ℝ𝑘 and all scalars 𝛼, 𝛽, we have

𝑓(𝛼𝑥 + 𝛽𝑦) = 𝛼𝑓(𝑥) + 𝛽𝑓(𝑦)

You can check that this holds for the function 𝑓(𝑥) = 𝐴𝑥 + 𝑏 when 𝑏 is the zero vector and
fails when 𝑏 is nonzero.
In fact, it’s known that 𝑓 is linear if and only if there exists a matrix 𝐴 such that 𝑓(𝑥) = 𝐴𝑥
for all 𝑥.
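Here is a numerical version of the check suggested above. It uses a randomly drawn 3 × 2 matrix 𝐴, an arbitrary nonzero vector 𝑏, and arbitrary scalars 𝛼 and 𝛽. A test that passes at one pair of points is an illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.normal(size=(3, 2))        # maps R^2 into R^3
b = np.array([1.0, 0.0, -1.0])     # an arbitrary nonzero shift


def f(x):
    return A @ x                   # linear


def g(x):
    return A @ x + b               # affine: not linear when b is nonzero


x, y = rng.normal(size=2), rng.normal(size=2)
α, β = 2.0, -0.5

print(np.allclose(f(α * x + β * y), α * f(x) + β * f(y)))   # linearity holds
print(np.allclose(g(α * x + β * y), α * g(x) + β * g(y)))   # linearity fails
```

For g the two sides differ by (1 − 𝛼 − 𝛽)𝑏, which is nonzero here because 𝛼 + 𝛽 = 1.5.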

4.5 Solving Systems of Equations

Recall again the system of equations (1).


If we compare (1) and (2), we see that (1) can now be written more conveniently as

$$
y = Ax \tag{3}
$$

The problem we face is to determine a vector 𝑥 ∈ ℝ𝑘 that solves (3), taking 𝑦 and 𝐴 as given.
This is a special case of a more general problem: Find an 𝑥 such that 𝑦 = 𝑓(𝑥).
Given an arbitrary function 𝑓 and a 𝑦, is there always an 𝑥 such that 𝑦 = 𝑓(𝑥)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows

In [15]: def f(x):
    return 0.6 * np.cos(4 * x) + 1.4

xmin, xmax = -1, 1
x = np.linspace(xmin, xmax, 160)
y = f(x)
ya, yb = np.min(y), np.max(y)

fig, axes = plt.subplots(2, 1, figsize=(10, 10))

for ax in axes:
    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')

    ax.set(ylim=(-0.6, 3.2), xlim=(xmin, xmax),
           yticks=(), xticks=())

    ax.plot(x, y, 'k-', lw=2, label='$f$')
    ax.fill_between(x, ya, yb, facecolor='blue', alpha=0.05)
    ax.vlines([0], ya, yb, lw=3, color='blue', label='range of $f$')
    ax.text(0.04, -0.3, '$0$', fontsize=16)

ax = axes[0]

ax.legend(loc='upper right', frameon=False)
ybar = 1.5
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.05, 0.8 * ybar, '$y$', fontsize=16)
for i, z in enumerate((-0.35, 0.35)):
    ax.vlines(z, 0, f(z), linestyle='--', alpha=0.5)
    ax.text(z, -0.2, f'$x_{i}$', fontsize=16)

ax = axes[1]

ybar = 2.6
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)

plt.show()
4.5. SOLVING SYSTEMS OF EQUATIONS 55

In the first plot, there are multiple solutions, as the function is not one-to-one, while in the
second there are no solutions, since 𝑦 lies outside the range of 𝑓.
Can we impose conditions on 𝐴 in (3) that rule out these problems?
In this context, the most important thing to recognize about the expression 𝐴𝑥 is that it corresponds to a linear combination of the columns of 𝐴.
In particular, if 𝑎1 , … , 𝑎𝑘 are the columns of 𝐴, then

𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘

Hence the range of 𝑓(𝑥) = 𝐴𝑥 is exactly the span of the columns of 𝐴.
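To see the column interpretation concretely, here is a short check that 𝐴𝑥 equals the corresponding linear combination of the columns of 𝐴. The 3 × 2 matrix and the vector are illustrative choices:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])    # 3 x 2, columns a_1 and a_2
x = np.array([2.0, -1.0])

direct = A @ x
combination = x[0] * A[:, 0] + x[1] * A[:, 1]   # x_1 a_1 + x_2 a_2

print(direct)                              # [0. 2. 4.]
print(np.allclose(direct, combination))    # True
```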


We want the range to be large so that it contains arbitrary 𝑦.
As you might recall, the condition that we want for the span to be large is linear independence.
A happy fact is that linear independence of the columns of 𝐴 also gives us uniqueness.
Indeed, it follows from our earlier discussion that if {𝑎1 , … , 𝑎𝑘 } are linearly independent and
𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘 , then no 𝑧 ≠ 𝑥 satisfies 𝑦 = 𝐴𝑧.

4.5.1 The Square Matrix Case

Let’s discuss some more details, starting with the case where 𝐴 is 𝑛 × 𝑛.
This is the familiar case where the number of unknowns equals the number of equations.
For arbitrary 𝑦 ∈ ℝ𝑛 , we hope to find a unique 𝑥 ∈ ℝ𝑛 such that 𝑦 = 𝐴𝑥.
In view of the observations immediately above, if the columns of 𝐴 are linearly independent,
then their span, and hence the range of 𝑓(𝑥) = 𝐴𝑥, is all of ℝ𝑛 .
Hence there always exists an 𝑥 such that 𝑦 = 𝐴𝑥.
Moreover, the solution is unique.
In particular, the following are equivalent:

1. The columns of 𝐴 are linearly independent.

2. For any 𝑦 ∈ ℝ𝑛 , the equation 𝑦 = 𝐴𝑥 has a unique solution.

The property of having linearly independent columns is sometimes expressed as having full
column rank.
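One practical way to test for full column rank is to compare the numerical rank with the number of columns. The sketch below applies NumPy's matrix_rank to two hand-picked matrices, which are illustrative assumptions and not taken from the text:

```python
import numpy as np

A_good = np.array([[1.0, 2.0],
                   [3.0, 4.0]])   # columns are not multiples of each other
A_bad = np.array([[1.0, 2.0],
                  [2.0, 4.0]])    # second column = 2 * first column

# Full column rank <=> rank equals the number of columns
for A in (A_good, A_bad):
    print(np.linalg.matrix_rank(A) == A.shape[1])
# True
# False
```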

Inverse Matrices

Can we give some sort of expression for the solution?


If 𝑦 and 𝐴 are scalar with 𝐴 ≠ 0, then the solution is 𝑥 = 𝐴−1 𝑦.
A similar expression is available in the matrix case.
In particular, if square matrix 𝐴 has full column rank, then it possesses a multiplicative inverse matrix $A^{-1}$, with the property that $A A^{-1} = A^{-1} A = I$.
As a consequence, if we pre-multiply both sides of 𝑦 = 𝐴𝑥 by 𝐴−1 , we get 𝑥 = 𝐴−1 𝑦.
This is the solution that we’re looking for.

Determinants

Another quick comment about square matrices is that to every such matrix we assign a
unique number called the determinant of the matrix — you can find the expression for it
here.
If the determinant of 𝐴 is not zero, then we say that 𝐴 is nonsingular.
Perhaps the most important fact about determinants is that 𝐴 is nonsingular if and only if 𝐴
is of full column rank.
This gives us a useful one-number summary of whether or not a square matrix can be inverted.
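The sketch below illustrates the equivalence on two hand-picked matrices. It checks whether the determinant is zero and whether the matrix has full column rank. Floating-point determinants are rarely exactly zero, so it treats "zero" as np.isclose(det(A), 0.0). That tolerance is an assumption of the sketch:

```python
import numpy as np
from scipy.linalg import det

A_nonsingular = np.array([[1.0, 2.0],
                          [3.0, 4.0]])
A_singular = np.array([[1.0, 2.0],
                       [2.0, 4.0]])   # second column is twice the first

for A in (A_nonsingular, A_singular):
    nonsingular = not np.isclose(det(A), 0.0)           # det(A) != 0, up to rounding
    full_rank = np.linalg.matrix_rank(A) == A.shape[1]  # full column rank
    print(det(A), nonsingular, full_rank)               # the two flags agree
```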

4.5.2 More Rows than Columns

This is the 𝑛 × 𝑘 case with 𝑛 > 𝑘.



This case is very important in many settings, not least in the setting of linear regression
(where 𝑛 is the number of observations, and 𝑘 is the number of explanatory variables).
Given arbitrary 𝑦 ∈ ℝ𝑛 , we seek an 𝑥 ∈ ℝ𝑘 such that 𝑦 = 𝐴𝑥.
In this setting, the existence of a solution is highly unlikely.
Without much loss of generality, let’s go over the intuition focusing on the case where the
columns of 𝐴 are linearly independent.
It follows that the span of the columns of 𝐴 is a 𝑘-dimensional subspace of ℝ𝑛 .
This span is very “unlikely” to contain arbitrary 𝑦 ∈ ℝ𝑛 .
To see why, recall the figure above, where 𝑘 = 2 and 𝑛 = 3.
Imagine an arbitrarily chosen 𝑦 ∈ ℝ3 , located somewhere in that three-dimensional space.
What's the likelihood that 𝑦 lies in the span of $\{a_1, a_2\}$ (i.e., the two-dimensional plane through the origin spanned by these vectors)?
In a sense, it must be very small, since this plane has zero “thickness”.
As a result, in the 𝑛 > 𝑘 case we usually give up on existence.
However, we can still seek the best approximation, for example, an 𝑥 that makes the distance
‖𝑦 − 𝐴𝑥‖ as small as possible.
To solve this problem, one can use either calculus or the theory of orthogonal projections.
The solution is known to be 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦 — see for example chapter 3 of these notes.
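Here is a minimal sketch of the projection interpretation on randomly generated data (an assumption for illustration). Solving the normal equations 𝐴′𝐴𝑥 = 𝐴′𝑦 gives 𝑥̂. The residual 𝑦 − 𝐴𝑥̂ is then orthogonal to every column of 𝐴, and moving away from 𝑥̂ does not shrink the residual:

```python
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(1)   # arbitrary seed
n, k = 50, 3                     # more rows (observations) than columns
A = rng.standard_normal((n, k))
y = rng.standard_normal(n)

# x̂ = (A'A)^{-1} A'y, computed by solving the normal equations A'A x = A'y
x_hat = solve(A.T @ A, A.T @ y)

# Projection view: the residual y - A x̂ is orthogonal to every column of A
residual = y - A @ x_hat
print(np.allclose(A.T @ residual, 0.0))   # True

# Any other x gives a residual at least as large
x_other = x_hat + 0.1 * rng.standard_normal(k)
print(np.linalg.norm(y - A @ x_hat) <= np.linalg.norm(y - A @ x_other))   # True
```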

4.5.3 More Columns than Rows

This is the 𝑛 × 𝑘 case with 𝑛 < 𝑘, so there are fewer equations than unknowns.
In this case there are either no solutions or infinitely many — in other words, uniqueness
never holds.
For example, consider the case where 𝑘 = 3 and 𝑛 = 2.
Thus, the columns of 𝐴 consist of 3 vectors in ℝ².
This set can never be linearly independent, since at most two vectors in ℝ² can be linearly independent (for example, the two canonical basis vectors already span ℝ²).
It follows that one column is a linear combination of the other two.
For example, let’s say that 𝑎1 = 𝛼𝑎2 + 𝛽𝑎3 .
Then if 𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 + 𝑥2 𝑎2 + 𝑥3 𝑎3 , we can also write

𝑦 = 𝑥1 (𝛼𝑎2 + 𝛽𝑎3 ) + 𝑥2 𝑎2 + 𝑥3 𝑎3 = (𝑥1 𝛼 + 𝑥2 )𝑎2 + (𝑥1 𝛽 + 𝑥3 )𝑎3

In other words, uniqueness fails.
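The argument can be checked numerically with a hand-picked 2 × 3 matrix (an illustrative assumption). SciPy's null_space returns an orthonormal basis for the vectors 𝑧 with 𝐴𝑧 = 0. Adding any multiple of such a vector to one solution yields another solution:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # n = 2 equations, k = 3 unknowns
y = np.array([1.0, 2.0])

x = np.array([1.0, 2.0, 0.0])     # one particular solution: A @ x = y
N = null_space(A)                 # orthonormal basis for {z : A z = 0}
print(N.shape)                    # (3, 1): a one-dimensional null space

# Adding any multiple of a null-space vector gives another solution
for t in (0.0, 1.0, -2.5):
    x_t = x + t * N[:, 0]
    print(np.allclose(A @ x_t, y))   # True each time
```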

4.5.4 Linear Equations with SciPy

Here’s an illustration of how to solve linear equations with SciPy’s linalg submodule.
All of these routines are Python front ends to time-tested and highly optimized FORTRAN code.

In [16]: A = ((1, 2), (3, 4))
         A = np.array(A)
         y = np.ones((2, 1))  # Column vector
         det(A)  # Check that A is nonsingular, and hence invertible

Out[16]: -2.0

In [17]: A_inv = inv(A)  # Compute the inverse
         A_inv

Out[17]: array([[-2. ,  1. ],
                [ 1.5, -0.5]])

In [18]: x = A_inv @ y  # Solution
         A @ x          # Should equal y

Out[18]: array([[1.],
                [1.]])

In [19]: solve(A, y)  # Produces the same solution

Out[19]: array([[-1.],
                [ 1.]])

Observe how we can solve for $x = A^{-1} y$ either via inv(A) @ y or via solve(A, y).
The latter method uses a different algorithm (LU decomposition) that is numerically more
stable, and hence should almost always be preferred.
To obtain the least-squares solution 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦, use scipy.linalg.lstsq(A, y).
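Note that scipy.linalg.lstsq returns a tuple rather than just the solution. The sketch below uses a randomly generated tall matrix, an assumption for illustration. It shows that the first element of that tuple agrees with the normal-equations formula:

```python
import numpy as np
from scipy.linalg import lstsq, solve

rng = np.random.default_rng(2)   # arbitrary seed
A = rng.standard_normal((20, 2))
y = rng.standard_normal(20)

# scipy.linalg.lstsq returns (solution, residues, rank, singular values)
x_lstsq, residues, rank, sv = lstsq(A, y)

# Compare with the normal-equations formula x̂ = (A'A)^{-1} A'y
x_normal = solve(A.T @ A, A.T @ y)
print(np.allclose(x_lstsq, x_normal))   # True
print(rank)                              # 2
```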

4.6 Eigenvalues and Eigenvectors

Let 𝐴 be an 𝑛 × 𝑛 square matrix.


If 𝜆 is scalar and 𝑣 is a non-zero vector in ℝ𝑛 such that

𝐴𝑣 = 𝜆𝑣

then we say that 𝜆 is an eigenvalue of 𝐴, and 𝑣 is an eigenvector.


Thus, an eigenvector of 𝐴 is a vector such that when the map 𝑓(𝑥) = 𝐴𝑥 is applied, 𝑣 is
merely scaled.
The next figure shows two eigenvectors (blue arrows) and their images under 𝐴 (red arrows).
As expected, the image 𝐴𝑣 of each 𝑣 is just a scaled version of the original.

In [20]: A = ((1, 2),
              (2, 1))
         A = np.array(A)
         evals, evecs = eig(A)
         evecs = evecs[:, 0], evecs[:, 1]

         fig, ax = plt.subplots(figsize=(10, 8))
         # Set the axes through the origin
         for spine in ['left', 'bottom']:
             ax.spines[spine].set_position('zero')
         for spine in ['right', 'top']:
             ax.spines[spine].set_color('none')
         ax.grid(alpha=0.4)

         xmin, xmax = -3, 3
         ymin, ymax = -3, 3
         ax.set(xlim=(xmin, xmax), ylim=(ymin, ymax))

         # Plot each eigenvector
         for v in evecs:
             ax.annotate('', xy=v, xytext=(0, 0),
                         arrowprops=dict(facecolor='blue',
                                         shrink=0,
                                         alpha=0.6,
                                         width=0.5))

         # Plot the image of each eigenvector
         for v in evecs:
             v = A @ v
             ax.annotate('', xy=v, xytext=(0, 0),
                         arrowprops=dict(facecolor='red',
                                         shrink=0,
                                         alpha=0.6,
                                         width=0.5))

         # Plot the lines they run through
         x = np.linspace(xmin, xmax, 3)
         for v in evecs:
             a = v[1] / v[0]
             ax.plot(x, a * x, 'b-', lw=0.4)

         plt.show()

The eigenvalue equation is equivalent to (𝐴 − 𝜆𝐼)𝑣 = 0, and this has a nonzero solution 𝑣 only
when the columns of 𝐴 − 𝜆𝐼 are linearly dependent.
This in turn is equivalent to stating that the determinant is zero.
Hence to find all eigenvalues, we can look for 𝜆 such that the determinant of 𝐴 − 𝜆𝐼 is zero.
This problem can be expressed as one of solving for the roots of a polynomial in 𝜆 of degree
𝑛.
This in turn implies the existence of 𝑛 solutions in the complex plane, although some might
be repeated.
Some nice facts about the eigenvalues of a square matrix 𝐴 are as follows:

1. The determinant of 𝐴 equals the product of the eigenvalues.

2. The trace of 𝐴 (the sum of the elements on the principal diagonal) equals the sum of
the eigenvalues.

3. If 𝐴 is symmetric, then all of its eigenvalues are real.

4. If 𝐴 is invertible and 𝜆1 , … , 𝜆𝑛 are its eigenvalues, then the eigenvalues of 𝐴−1 are
1/𝜆1 , … , 1/𝜆𝑛 .

A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues
are nonzero.
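These facts are easy to spot-check numerically. The sketch below uses the symmetric matrix from the eigenvector figure above. It illustrates the facts for one example only and is not a proof:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])          # symmetric, so its eigenvalues are real
lam = np.linalg.eigvals(A)

print(np.isclose(np.prod(lam), np.linalg.det(A)))   # fact 1: True
print(np.isclose(np.sum(lam), np.trace(A)))         # fact 2: True
print(np.allclose(np.imag(lam), 0.0))               # fact 3: True

# Fact 4: the eigenvalues of A^{-1} are the reciprocals 1/λ_i
lam_inv = np.linalg.eigvals(np.linalg.inv(A))
print(np.allclose(np.sort(lam_inv), np.sort(1 / lam)))  # True
```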
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows:

In [21]: A = ((1, 2),
              (2, 1))
         A = np.array(A)
         evals, evecs = eig(A)
         evals

Out[21]: array([ 3.+0.j, -1.+0.j])

In [22]: evecs

Out[22]: array([[ 0.70710678, -0.70710678],
                [ 0.70710678,  0.70710678]])

Note that the columns of evecs are the eigenvectors.


Since any scalar multiple of an eigenvector is an eigenvector with the same eigenvalue (check
it), the eig routine normalizes the length of each eigenvector to one.
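Both statements can be checked quickly for the same matrix. Each column of evecs satisfies 𝐴𝑣 = 𝜆𝑣 and has unit Euclidean length. This sketch covers only this particular matrix:

```python
import numpy as np
from scipy.linalg import eig

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
evals, evecs = eig(A)

# Column j of evecs is an eigenvector for evals[j]: A V = V diag(λ)
print(np.allclose(A @ evecs, evecs * evals))          # True
# Each column has been normalized to unit length
print(np.allclose(np.linalg.norm(evecs, axis=0), 1))  # True
```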

4.6.1 Generalized Eigenvalues

It is sometimes useful to consider the generalized eigenvalue problem, which, for given matrices 𝐴 and 𝐵, seeks generalized eigenvalues 𝜆 and eigenvectors 𝑣 such that

𝐴𝑣 = 𝜆𝐵𝑣

This can be solved in SciPy via scipy.linalg.eig(A, B).


Of course, if 𝐵 is square and invertible, then we can treat the generalized eigenvalue problem
as an ordinary eigenvalue problem 𝐵−1 𝐴𝑣 = 𝜆𝑣, but this is not always the case.
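The sketch below compares scipy.linalg.eig(A, B) with the ordinary eigenvalues of $B^{-1} A$. The matrices 𝐴 and 𝐵 are chosen for illustration. 𝐵 is invertible, so both routes apply:

```python
import numpy as np
from scipy.linalg import eig, inv

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])     # square and invertible, chosen for illustration

w_gen, v_gen = eig(A, B)              # solves A v = λ B v
w_ord, _ = eig(inv(B) @ A)            # ordinary problem for B^{-1} A

print(np.allclose(np.sort(w_gen.real), np.sort(w_ord.real)))  # True

# Check the defining equation for the first generalized pair
v = v_gen[:, 0]
print(np.allclose(A @ v, w_gen[0] * (B @ v)))  # True
```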

4.7 Further Topics

We round out our discussion by briefly mentioning several other important topics.

4.7.1 Series Expansions

Recall the usual summation formula for a geometric progression, which states that if |𝑎| < 1, then $\sum_{k=0}^{\infty} a^k = (1 - a)^{-1}$.
A generalization of this idea exists in the matrix setting.

Matrix Norms

Let 𝐴 be a square matrix, and let

$$ \| A \| := \max_{\| x \| = 1} \| A x \| $$

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand
side is a matrix norm — in this case, the so-called spectral norm.
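For the spectral norm, np.linalg.norm(A, 2) returns the largest singular value of 𝐴. The sketch below uses an arbitrary 2 × 2 matrix. It compares that value with a brute-force maximization of ‖𝐴𝑥‖ over a fine grid of unit vectors, which is a crude approximation of the definition:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])

# Spectral norm: np.linalg.norm with ord=2 returns the largest singular value
spec = np.linalg.norm(A, 2)
print(np.isclose(spec, np.linalg.svd(A, compute_uv=False).max()))  # True

# Crude check of the definition: maximize ||Ax|| over unit vectors on a grid
angles = np.linspace(0, 2 * np.pi, 100_001)
unit_vectors = np.vstack([np.cos(angles), np.sin(angles)])   # 2 x m
brute = np.linalg.norm(A @ unit_vectors, axis=0).max()
print(np.isclose(spec, brute, atol=1e-6))   # True
```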

For example, for a square matrix 𝑆, the condition ‖𝑆‖ < 1 means that 𝑆 is contractive, in the sense that it pulls all nonzero vectors towards the origin (see footnote [2]).

Neumann’s Theorem

Let 𝐴 be a square matrix and let $A^k := A A^{k-1}$ with $A^1 := A$.
In other words, $A^k$ is the 𝑘-th power of 𝐴.
Neumann's theorem states the following: If $\| A^k \| < 1$ for some $k \in \mathbb{N}$, then $I - A$ is invertible, and

$$ (I - A)^{-1} = \sum_{k=0}^{\infty} A^k \tag{4} $$
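Here is a minimal numerical illustration. It assumes a small matrix with ‖𝐴‖ < 1, so 𝑘 = 1 already satisfies the hypothesis, and it truncates the infinite sum at 200 terms:

```python
import numpy as np

A = np.array([[0.4, 0.1],
              [0.2, 0.3]])       # here ||A|| < 1, so k = 1 already works
print(np.linalg.norm(A, 2) < 1)  # True

# Partial sums I + A + A^2 + ... + A^(K-1)
K = 200
partial_sum = np.zeros_like(A)
A_power = np.eye(2)              # A^0 = I
for _ in range(K):
    partial_sum += A_power
    A_power = A_power @ A

print(np.allclose(partial_sum, np.linalg.inv(np.eye(2) - A)))  # True
```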

Spectral Radius

A result known as Gelfand’s formula tells us that, for any square matrix 𝐴,

$$ \rho(A) = \lim_{k \to \infty} \| A^k \|^{1/k} $$

Here $\rho(A)$ is the spectral radius, defined as $\max_i |\lambda_i|$, where $\{\lambda_i\}_i$ is the set of eigenvalues of 𝐴.
As a consequence of Gelfand's formula, if all eigenvalues are strictly less than one in modulus, there exists a 𝑘 with $\| A^k \| < 1$.
In that case, (4) is valid.
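To see Gelfand's formula at work, consider a matrix chosen for illustration so that ‖𝐴‖ > 1 while both eigenvalues equal 0.5. The sketch computes $\|A^k\|^{1/k}$ for increasing 𝑘:

```python
import numpy as np

A = np.array([[0.5, 10.0],
              [0.0, 0.5]])   # illustrative: a large off-diagonal entry
rho = max(abs(np.linalg.eigvals(A)))          # spectral radius ρ(A)
print("||A|| =", np.linalg.norm(A, 2), " rho(A) =", rho)   # ||A|| > 1 > ρ(A)

# Gelfand: ||A^k||^(1/k) -> ρ(A) as k grows
for k in (1, 10, 100, 1000):
    Ak = np.linalg.matrix_power(A, k)
    print(k, np.linalg.norm(Ak, 2) ** (1 / k))

# Because ρ(A) < 1, some power has norm below one, so (4) applies
print(np.linalg.norm(np.linalg.matrix_power(A, 100), 2) < 1)   # True
```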

4.7.2 Positive Definite Matrices

Let 𝐴 be a symmetric 𝑛 × 𝑛 matrix.


We say that 𝐴 is

1. positive definite if 𝑥′𝐴𝑥 > 0 for every 𝑥 ∈ ℝ𝑛 ∖ {0}

2. positive semi-definite or nonnegative definite if 𝑥′ 𝐴𝑥 ≥ 0 for every 𝑥 ∈ ℝ𝑛

Analogous definitions exist for negative definite and negative semi-definite matrices.
It is notable that if 𝐴 is positive definite, then all of its eigenvalues are strictly positive, and
hence 𝐴 is invertible (with positive definite inverse).
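In practice, a common way to test a symmetric matrix for positive definiteness is to attempt a Cholesky factorization. Up to floating-point issues for nearly singular matrices, this factorization exists exactly when the matrix is positive definite. The sketch below uses two illustrative matrices, and is_positive_definite is a hypothetical helper name. It also checks the eigenvalue and inverse statements above:

```python
import numpy as np

def is_positive_definite(A):
    """Hypothetical helper: True if the symmetric matrix A is positive definite.

    np.linalg.cholesky raises LinAlgError when the factorization fails,
    which for a symmetric matrix signals that it is not positive definite.
    """
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A_pd = np.array([[2.0, 1.0],
                 [1.0, 2.0]])     # eigenvalues 1 and 3
A_indef = np.array([[1.0, 2.0],
                    [2.0, 1.0]])  # eigenvalues 3 and -1

print(is_positive_definite(A_pd), np.linalg.eigvalsh(A_pd))        # True, all positive
print(is_positive_definite(A_indef), np.linalg.eigvalsh(A_indef))  # False, one negative
# The inverse of a positive definite matrix is positive definite too
print(is_positive_definite(np.linalg.inv(A_pd)))                   # True
```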

4.7.3 Differentiating Linear and Quadratic Forms

The following formulas are useful in many economic contexts. Let


• 𝑧, 𝑥 and 𝑎 all be 𝑛 × 1 vectors
• 𝐴 be an 𝑛 × 𝑛 matrix
• 𝐵 be an 𝑚 × 𝑛 matrix and 𝑦 be an 𝑚 × 1 vector

Then

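1. $\frac{\partial a' x}{\partial x} = a$
2. $\frac{\partial A x}{\partial x} = A'$
3. $\frac{\partial x' A x}{\partial x} = (A + A') x$
4. $\frac{\partial y' B z}{\partial y} = B z$
5. $\frac{\partial y' B z}{\partial B} = y z'$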

Exercise 1 below asks you to apply these formulas.
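Formulas 1 and 3 can be spot-checked with central finite differences. In the sketch below, 𝐴, 𝑎 and 𝑥 are random draws chosen for illustration, and numerical_gradient is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(3)    # arbitrary seed
n = 4
A = rng.standard_normal((n, n))   # not symmetric, so A + A' matters
a = rng.standard_normal(n)
x = rng.standard_normal(n)

def numerical_gradient(g, x, h=1e-6):
    """Hypothetical helper: central finite-difference gradient of g at x."""
    grad = np.empty_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (g(x + e) - g(x - e)) / (2 * h)
    return grad

# Formula 1: d(a'x)/dx = a
print(np.allclose(numerical_gradient(lambda z: a @ z, x), a))
# Formula 3: d(x'Ax)/dx = (A + A')x
print(np.allclose(numerical_gradient(lambda z: z @ A @ z, x),
                  (A + A.T) @ x, atol=1e-6))
```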

4.7.4 Further Reading

The documentation of the scipy.linalg submodule can be found here.


Chapters 2 and 3 of Econometric Theory contain a discussion of linear algebra along the same lines as above, with solved exercises.
If you don’t mind a slightly abstract approach, a nice intermediate-level text on linear algebra
is [60].

4.8 Exercises

4.8.1 Exercise 1

Let 𝑥 be a given 𝑛 × 1 vector and consider the problem

$$ v(x) = \max_{y, u} \{ -y' P y - u' Q u \} $$

subject to the linear constraint

𝑦 = 𝐴𝑥 + 𝐵𝑢

Here
• 𝑃 is an 𝑛 × 𝑛 matrix and 𝑄 is an 𝑚 × 𝑚 matrix
• 𝐴 is an 𝑛 × 𝑛 matrix and 𝐵 is an 𝑛 × 𝑚 matrix
• both 𝑃 and 𝑄 are symmetric and positive semidefinite
(What must the dimensions of 𝑦 and 𝑢 be to make this a well-posed problem?)
One way to solve the problem is to form the Lagrangian

ℒ = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]

where 𝜆 is an 𝑛 × 1 vector of Lagrange multipliers.



Try applying the formulas given above for differentiating quadratic and linear forms to obtain the first-order conditions for maximizing ℒ with respect to 𝑦, 𝑢 and minimizing it with respect to 𝜆.
Show that these conditions imply that

1. 𝜆 = −2𝑃 𝑦.

2. The optimizing choice of 𝑢 satisfies 𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥.

3. The function 𝑣 satisfies $v(x) = -x' \tilde P x$ where $\tilde P = A' P A - A' P B (Q + B' P B)^{-1} B' P A$.

As we will see, in economic contexts Lagrange multipliers often are shadow prices.

Note
If we don’t care about the Lagrange multipliers, we can substitute the constraint
into the objective function, and then just maximize −(𝐴𝑥+𝐵𝑢)′ 𝑃 (𝐴𝑥+𝐵𝑢)−𝑢′ 𝑄𝑢
with respect to 𝑢. You can verify that this leads to the same maximizer.

4.9 Solutions

4.9.1 Solution to Exercise 1

We have an optimization problem:

$$ v(x) = \max_{y, u} \{ -y' P y - u' Q u \} $$

s.t.

𝑦 = 𝐴𝑥 + 𝐵𝑢

with primitives
• 𝑃 a symmetric and positive semidefinite 𝑛 × 𝑛 matrix
• 𝑄 a symmetric and positive semidefinite 𝑚 × 𝑚 matrix
• 𝐴 an 𝑛 × 𝑛 matrix
• 𝐵 an 𝑛 × 𝑚 matrix
The associated Lagrangian is:

𝐿 = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]

1.

Differentiating the Lagrangian with respect to 𝑦 and setting the derivative equal to zero yields

$$ \frac{\partial L}{\partial y} = -(P + P') y - \lambda = -2 P y - \lambda = 0, $$

since P is symmetric.
Accordingly, the first-order condition for maximizing L w.r.t. y implies

𝜆 = −2𝑃 𝑦

2.

Differentiating the Lagrangian with respect to 𝑢 and setting the derivative equal to zero yields

$$ \frac{\partial L}{\partial u} = -(Q + Q') u + B' \lambda = -2 Q u + B' \lambda = 0 $$
Substituting 𝜆 = −2𝑃 𝑦 gives

𝑄𝑢 + 𝐵′ 𝑃 𝑦 = 0

Substituting the linear constraint 𝑦 = 𝐴𝑥 + 𝐵𝑢 into above equation gives

𝑄𝑢 + 𝐵′ 𝑃 (𝐴𝑥 + 𝐵𝑢) = 0

(𝑄 + 𝐵′ 𝑃 𝐵)𝑢 + 𝐵′ 𝑃 𝐴𝑥 = 0

which is the first-order condition for maximizing L w.r.t. u.


Thus, the optimal choice of u must satisfy

𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥 ,

which follows from solving the first-order condition above for 𝑢, assuming $Q + B' P B$ is invertible.

3.

Rewriting our problem by substituting the constraint into the objective function, we get

$$ v(x) = \max_{u} \{ -(A x + B u)' P (A x + B u) - u' Q u \} $$

Since we know the optimal choice of u satisfies 𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥, then

$$ v(x) = -(A x + B u)' P (A x + B u) - u' Q u \quad \text{with} \quad u = -(Q + B' P B)^{-1} B' P A x $$

To evaluate the function, expand it as

$$
\begin{aligned}
v(x) &= -(A x + B u)' P (A x + B u) - u' Q u \\
&= -(x' A' + u' B') P (A x + B u) - u' Q u \\
&= -x' A' P A x - u' B' P A x - x' A' P B u - u' B' P B u - u' Q u \\
&= -x' A' P A x - 2 u' B' P A x - u' (Q + B' P B) u
\end{aligned}
$$

For simplicity, let $S := (Q + B' P B)^{-1} B' P A$, so that $u = -S x$.


Regarding the second term $-2 u' B' P A x$, since $u' = -x' S'$,

$$
\begin{aligned}
-2 u' B' P A x &= 2 x' S' B' P A x \\
&= 2 x' A' P B (Q + B' P B)^{-1} B' P A x
\end{aligned}
$$

Notice that the term (𝑄 + 𝐵′ 𝑃 𝐵)−1 is symmetric as both P and Q are symmetric.
Regarding the third term −𝑢′ (𝑄 + 𝐵′ 𝑃 𝐵)𝑢,

$$
\begin{aligned}
-u' (Q + B' P B) u &= -x' S' (Q + B' P B) S x \\
&= -x' A' P B (Q + B' P B)^{-1} B' P A x
\end{aligned}
$$

Hence, the sum of the second and third terms is $x' A' P B (Q + B' P B)^{-1} B' P A x$.
This implies that

$$
\begin{aligned}
v(x) &= -x' A' P A x - 2 u' B' P A x - u' (Q + B' P B) u \\
&= -x' A' P A x + x' A' P B (Q + B' P B)^{-1} B' P A x \\
&= -x' [A' P A - A' P B (Q + B' P B)^{-1} B' P A] x
\end{aligned}
$$

Therefore, denoting $\tilde P := A' P A - A' P B (Q + B' P B)^{-1} B' P A$, the value of the optimization problem is $v(x) = -x' \tilde P x$.
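The closed forms above can be checked numerically. The sketch below draws random primitives that satisfy the stated assumptions. 𝑃 and 𝑄 are built as 𝑀𝑀′, which makes them symmetric and positive semidefinite. The sketch substitutes the constraint into the objective and compares the value at 𝑢 = −𝑆𝑥 with $-x' \tilde P x$. It assumes $Q + B' P B$ is invertible, which holds for these random draws with probability one:

```python
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(4)   # arbitrary seed
n, m = 3, 2

# Random primitives; P and Q built as M M' so they are symmetric PSD
M = rng.standard_normal((n, n))
P = M @ M.T
N = rng.standard_normal((m, m))
Q = N @ N.T
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
x = rng.standard_normal(n)

def objective(u):
    y = A @ x + B @ u                  # substitute the constraint
    return -y @ P @ y - u @ Q @ u

# Closed forms from the solution above
S = solve(Q + B.T @ P @ B, B.T @ P @ A)      # S = (Q + B'PB)^{-1} B'PA
u_star = -S @ x
P_tilde = A.T @ P @ A - A.T @ P @ B @ S

print(np.isclose(objective(u_star), -x @ P_tilde @ x))   # True: values match
# Perturbing u_star should never increase the (concave) objective
print(all(objective(u_star + 1e-2 * rng.standard_normal(m)) <= objective(u_star)
          for _ in range(100)))                            # True
```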
Footnotes
[1] Although there is a specialized matrix data type defined in NumPy, it’s more standard to
work with ordinary NumPy arrays. See this discussion.
[2] Suppose that ‖𝑆‖ < 1. Take any nonzero vector 𝑥, and let 𝑟 ∶= ‖𝑥‖. We have ‖𝑆𝑥‖ =
𝑟‖𝑆(𝑥/𝑟)‖ ≤ 𝑟‖𝑆‖ < 𝑟 = ‖𝑥‖. Hence every point is pulled towards the origin.
Chapter 5

Complex Numbers and Trigonometry

5.1 Contents

• Overview 5.2
• De Moivre’s Theorem 5.3
• Applications of de Moivre’s Theorem 5.4

5.2 Overview

This lecture introduces some elementary mathematics and trigonometry.


Useful and interesting in their own right, these concepts reap substantial rewards when studying dynamics generated by linear difference equations or linear differential equations.
For example, these tools are keys to understanding outcomes attained by Paul Samuelson
(1939) [93] in his classic paper on interactions between the investment accelerator and the
Keynesian consumption function, our topic in the lecture Samuelson Multiplier Accelerator.
In addition to providing foundations for Samuelson's work and extensions of it, this lecture can be read as a stand-alone quick reminder of key results from elementary high school trigonometry.
So let’s dive in.

5.2.1 Complex Numbers

A complex number has a real part 𝑥 and a purely imaginary part 𝑦.


The Euclidean, polar, and trigonometric forms of a complex number 𝑧 are:

$$ z = x + i y = r e^{i\theta} = r (\cos\theta + i \sin\theta) $$

The second equality above is known as Euler's formula.


• Euler contributed many other formulas too!


The complex conjugate $\bar z$ of 𝑧 is defined as

$$ \bar z = x - i y = r e^{-i\theta} = r (\cos\theta - i \sin\theta) $$

The value 𝑥 is the real part of 𝑧 and 𝑦 is the imaginary part of 𝑧.



The symbol $|z| = \sqrt{\bar z \cdot z} = r$ represents the modulus of 𝑧.
The value 𝑟 is the Euclidean distance of vector (𝑥, 𝑦) from the origin:

$$ r = |z| = \sqrt{x^2 + y^2} $$

The value 𝜃 is the angle of (𝑥, 𝑦) with respect to the real axis.
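Evidently, the tangent of 𝜃 is $\frac{y}{x}$.
Therefore,

$$ \theta = \tan^{-1} \left( \frac{y}{x} \right) $$

Three elementary trigonometric functions are

$$ \cos\theta = \frac{x}{r} = \frac{e^{i\theta} + e^{-i\theta}}{2}, \quad \sin\theta = \frac{y}{r} = \frac{e^{i\theta} - e^{-i\theta}}{2i}, \quad \tan\theta = \frac{y}{x} $$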
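These relationships can be checked with Python's standard cmath module. The sketch below uses the number $z = 1 + \sqrt{3} i$ from the example that follows:

```python
import cmath
import math

z = complex(1, math.sqrt(3))     # z = 1 + √3 i
r, theta = cmath.polar(z)        # modulus r = |z| and angle θ (radians)

print(round(r, 10), round(math.degrees(theta), 10))   # 2.0 60.0

# Euler: r e^{iθ} recovers z; the conjugate is r e^{-iθ}
print(cmath.isclose(r * cmath.exp(1j * theta), z))                # True
print(cmath.isclose(r * cmath.exp(-1j * theta), z.conjugate()))   # True

# cos θ = (e^{iθ} + e^{-iθ}) / 2 = x/r and sin θ = (e^{iθ} - e^{-iθ}) / (2i) = y/r
cos_euler = (cmath.exp(1j * theta) + cmath.exp(-1j * theta)) / 2
sin_euler = (cmath.exp(1j * theta) - cmath.exp(-1j * theta)) / 2j
print(math.isclose(cos_euler.real, z.real / r),
      math.isclose(sin_euler.real, z.imag / r))                   # True True
```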
We’ll need the following imports:

In [1]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline
        from sympy import *

5.2.2 An Example

Consider the complex number $z = 1 + \sqrt{3} i$.
For $z = 1 + \sqrt{3} i$, we have $x = 1$ and $y = \sqrt{3}$.
It follows that $r = 2$ and $\theta = \tan^{-1}(\sqrt{3}) = \frac{\pi}{3} = 60^\circ$.
Let's use Python to plot the trigonometric form of the complex number $z = 1 + \sqrt{3} i$.

In [2]: # Abbreviate useful values and functions
        π = np.pi
        zeros = np.zeros
        ones = np.ones

        # Set parameters
        r = 2
        θ = π/3
        x = r * np.cos(θ)
        x_range = np.linspace(0, x, 1000)
        θ_range = np.linspace(0, θ, 1000)

        # Plot
        fig = plt.figure(figsize=(8, 8))
        ax = plt.subplot(111, projection='polar')

        ax.plot((0, θ), (0, r), marker='o', color='b')          # Plot r
        ax.plot(zeros(x_range.shape), x_range, color='b')       # Plot x
        ax.plot(θ_range, x / np.cos(θ_range), color='b')        # Plot y
        ax.plot(θ_range, ones(θ_range.shape) * 0.1, color='r')  # Plot θ

        ax.margins(0)  # Let the plot start at the origin

        ax.set_title("Trigonometry of complex numbers", va='bottom',
                     fontsize='x-large')

        ax.set_rmax(2)
        ax.set_rticks((0.5, 1, 1.5, 2))  # Fewer radial ticks
        ax.set_rlabel_position(-88.5)    # Move radial labels away from the plotted line

        ax.text(θ, r+0.01, r'$z = x + iy = 1 + \sqrt{3}\, i$')  # Label z
        ax.text(θ+0.2, 1, '$r = 2$')                             # Label r
        ax.text(0-0.2, 0.5, '$x = 1$')                           # Label x
        ax.text(0.5, 1.2, r'$y = \sqrt{3}$')                     # Label y
        ax.text(0.25, 0.15, r'$\theta = 60^o$')                  # Label θ

        ax.grid(True)
        plt.show()

5.3 De Moivre’s Theorem

de Moivre’s theorem states that:

$$ (r(\cos\theta + i \sin\theta))^n = r^n e^{i n \theta} = r^n (\cos n\theta + i \sin n\theta) $$

To prove de Moivre's theorem, note that

$$ (r(\cos\theta + i \sin\theta))^n = \left( r e^{i\theta} \right)^n $$

and compute.
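As a numerical spot check (not a proof), the three expressions in de Moivre's theorem agree for arbitrary illustrative values of 𝑟, 𝜃 and 𝑛:

```python
import cmath
import math

r, theta, n = 0.9, math.pi / 4, 7     # arbitrary illustrative values

lhs = (r * (math.cos(theta) + 1j * math.sin(theta))) ** n
middle = r**n * cmath.exp(1j * n * theta)
rhs = r**n * (math.cos(n * theta) + 1j * math.sin(n * theta))

print(cmath.isclose(lhs, middle), cmath.isclose(middle, rhs))   # True True
```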

5.4 Applications of de Moivre’s Theorem

5.4.1 Example 1

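We can use de Moivre's theorem to show that $r = \sqrt{x^2 + y^2}$.

We have

$$
\begin{aligned}
1 &= e^{i\theta} e^{-i\theta} \\
&= (\cos\theta + i \sin\theta)(\cos(-\theta) + i \sin(-\theta)) \\
&= (\cos\theta + i \sin\theta)(\cos\theta - i \sin\theta) \\
&= \cos^2\theta + \sin^2\theta \\
&= \frac{x^2}{r^2} + \frac{y^2}{r^2}
\end{aligned}
$$

and thus

$$ x^2 + y^2 = r^2 $$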

We recognize this as a theorem of Pythagoras.

5.4.2 Example 2

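Let $z = r e^{i\theta}$ and $\bar z = r e^{-i\theta}$ so that $\bar z$ is the complex conjugate of 𝑧.
$(z, \bar z)$ form a complex conjugate pair of complex numbers.
Let $a = p e^{i\omega}$ and $\bar a = p e^{-i\omega}$ be another complex conjugate pair.
For each element of a sequence of integers $n = 0, 1, 2, \ldots$, we want to compute $x_n = a z^n + \bar a \bar z^n$.
To do so, we can apply de Moivre's formula.
Thus,

$$
\begin{aligned}
x_n &= a z^n + \bar a \bar z^n \\
&= p e^{i\omega} (r e^{i\theta})^n + p e^{-i\omega} (r e^{-i\theta})^n \\
&= p r^n e^{i(\omega + n\theta)} + p r^n e^{-i(\omega + n\theta)} \\
&= p r^n [\cos(\omega + n\theta) + i \sin(\omega + n\theta) + \cos(\omega + n\theta) - i \sin(\omega + n\theta)] \\
&= 2 p r^n \cos(\omega + n\theta)
\end{aligned}
$$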
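Here is a quick numerical check of this result. The values of 𝑝, 𝜔, 𝑟 and 𝜃 are illustrative assumptions, not taken from the text. The imaginary parts cancel, and the real part matches $2 p r^n \cos(\omega + n\theta)$:

```python
import cmath
import math

p, omega = 1.5, 0.3      # a = p e^{iω}   (illustrative values)
r, theta = 0.9, 0.7      # z = r e^{iθ}
a = p * cmath.exp(1j * omega)
z = r * cmath.exp(1j * theta)

for n in range(6):
    x_n = a * z**n + a.conjugate() * z.conjugate()**n
    closed_form = 2 * p * r**n * math.cos(omega + n * theta)
    # The imaginary parts cancel, leaving the real closed form
    print(n, abs(x_n.imag) < 1e-12,
          math.isclose(x_n.real, closed_form, abs_tol=1e-12))
```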

5.4.3 Example 3

This example provides machinery that is at the heart of Samuelson's analysis of his multiplier-accelerator model [93].
Thus, consider a second-order linear difference equation

$$ x_{n+2} = c_1 x_{n+1} + c_2 x_n $$

whose characteristic equation is

$$ z^2 - c_1 z - c_2 = 0 $$

or

$$ (z^2 - c_1 z - c_2) = (z - z_1)(z - z_2) = 0 $$

with roots $z_1, z_2$.
A solution is a sequence $\{x_n\}_{n=0}^{\infty}$ that satisfies the difference equation.

Under the following circumstances, we can apply our example 2 formula to solve the difference equation:
• the roots $z_1, z_2$ of the characteristic polynomial of the difference equation form a complex conjugate pair
• the values $x_0, x_1$ are given initial conditions
To solve the difference equation, recall from example 2 that

$$ x_n = 2 p r^n \cos(\omega + n\theta) $$

where 𝜔, 𝑝 are coefficients to be determined from information encoded in the initial conditions $x_1, x_0$.
Since $x_0 = 2 p \cos\omega$ and $x_1 = 2 p r \cos(\omega + \theta)$, the ratio of $x_1$ to $x_0$ is

$$ \frac{x_1}{x_0} = \frac{r \cos(\omega + \theta)}{\cos\omega} $$

We can solve this equation for 𝜔, then solve for 𝑝 using $x_0 = 2 p r^0 \cos(\omega + 0 \cdot \theta) = 2 p \cos\omega$.
With the sympy package in Python, we are able to solve and plot the dynamics of 𝑥𝑛 given
different values of 𝑛.
In this example, we set the parameters and initial values:
• $r = 0.9$
• $\theta = \frac{1}{4}\pi$
• $x_0 = 4$
• $x_1 = r \cdot 2\sqrt{2} = 1.8\sqrt{2}$

We first numerically solve for 𝜔 and 𝑝 using nsolve in the sympy package based on the above
initial condition:

In [3]: # Set parameters
        r = 0.9
        θ = π/4
        x0 = 4
        x1 = 2 * r * sqrt(2)

        # Define symbols to be calculated
        ω, p = symbols('ω p', real=True)

        # Solve for ω
        ## Note: we choose the solution near 0
        eq1 = Eq(x1/x0 - r * cos(ω+θ) / cos(ω), 0)
        ω = nsolve(eq1, ω, 0)
        ω = float(ω)  # np.float is deprecated; use the builtin float
        print(f'ω = {ω:1.3f}')

        # Solve for p
        eq2 = Eq(x0 - 2 * p * cos(ω), 0)
        p = nsolve(eq2, p, 0)
        p = float(p)
        print(f'p = {p:1.3f}')

ω = 0.000
p = 2.000

Using the code above, we compute that 𝜔 = 0 and 𝑝 = 2.


Then we plug in the values we solved for 𝜔 and 𝑝 and plot the dynamics.

In [4]: # Define range of n
        max_n = 30
        n = np.arange(0, max_n+1, 0.01)

        # Define x_n
        x = lambda n: 2 * p * r**n * np.cos(ω + n * θ)

        # Plot
        fig, ax = plt.subplots(figsize=(12, 8))

        ax.plot(n, x(n))
        ax.set(xlim=(0, max_n), ylim=(-5, 5), xlabel='$n$', ylabel='$x_n$')

        # Set x-axis in the middle of the plot
        ax.spines['bottom'].set_position('center')
        ax.spines['right'].set_color('none')
        ax.spines['top'].set_color('none')
        ax.xaxis.set_ticks_position('bottom')
        ax.yaxis.set_ticks_position('left')

        ticklab = ax.xaxis.get_ticklabels()[0]  # Set x-label position
        trans = ticklab.get_transform()
        ax.xaxis.set_label_coords(31, 0, transform=trans)

        ticklab = ax.yaxis.get_ticklabels()[0]  # Set y-label position
        trans = ticklab.get_transform()
        ax.yaxis.set_label_coords(0, 5, transform=trans)

        ax.grid()
        plt.show()
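The ratio equation can also be solved for 𝜔 in closed form, which gives a cross-check that does not rely on sympy. Expanding $\cos(\omega + \theta)$ gives $x_1 / x_0 = r(\cos\theta - \tan\omega \sin\theta)$. The sketch below uses that rearrangement and picks the arctan branch near 0, as the sympy code does. It then iterates the difference equation directly with $c_1 = 2r\cos\theta$ and $c_2 = -r^2$. These are the coefficients whose characteristic roots are $r e^{\pm i\theta}$, since $c_1 = z_1 + z_2$ and $c_2 = -z_1 z_2$:

```python
import numpy as np

# Parameters and initial conditions from the example above
r, theta = 0.9, np.pi / 4
x0, x1 = 4.0, 1.8 * np.sqrt(2)

# x1/x0 = r (cos θ - tan(ω) sin θ)  =>  tan(ω) = (cos θ - x1/(r x0)) / sin θ
omega = np.arctan((np.cos(theta) - x1 / (r * x0)) / np.sin(theta))
p = x0 / (2 * np.cos(omega))
print(omega, p)   # approximately 0 and 2

# x_{n+2} = c1 x_{n+1} + c2 x_n with characteristic roots r e^{±iθ}
c1, c2 = 2 * r * np.cos(theta), -r**2

x = [x0, x1]
for _ in range(28):
    x.append(c1 * x[-1] + c2 * x[-2])

n = np.arange(len(x))
closed_form = 2 * p * r**n * np.cos(omega + n * theta)
print(np.allclose(x, closed_form))   # True
```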

5.4.4 Trigonometric Identities

We can obtain a complete suite of trigonometric identities by appropriately manipulating polar forms of complex numbers.
We'll get many of them by deducing implications of the equality

$$ e^{i(\omega + \theta)} = e^{i\omega} e^{i\theta} $$

For example, we'll calculate identities for cos(𝜔 + 𝜃) and sin(𝜔 + 𝜃).
Using the sine and cosine formulas presented at the beginning of this lecture, we have:
$$ \cos(\omega + \theta) = \frac{e^{i(\omega + \theta)} + e^{-i(\omega + \theta)}}{2} $$

$$ \sin(\omega + \theta) = \frac{e^{i(\omega + \theta)} - e^{-i(\omega + \theta)}}{2i} $$

We can also obtain the trigonometric identities as follows:

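$$
\begin{aligned}
\cos(\omega + \theta) + i \sin(\omega + \theta) &= e^{i(\omega + \theta)} \\
&= e^{i\omega} e^{i\theta} \\
&= (\cos\omega + i \sin\omega)(\cos\theta + i \sin\theta) \\
&= (\cos\omega \cos\theta - \sin\omega \sin\theta) + i (\cos\omega \sin\theta + \sin\omega \cos\theta)
\end{aligned}
$$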

Equating the real parts and the imaginary parts of both sides, we get:

cos (𝜔 + 𝜃) = cos 𝜔 cos 𝜃 − sin 𝜔 sin 𝜃


sin (𝜔 + 𝜃) = cos 𝜔 sin 𝜃 + sin 𝜔 cos 𝜃

The equations above are also known as the angle sum identities. We can verify the equations using the simplify function in the sympy package:

In [5]: # Define symbols
        ω, θ = symbols('ω θ', real=True)

        # Verify
        print("cos(ω)cos(θ) - sin(ω)sin(θ) =",
              simplify(cos(ω)*cos(θ) - sin(ω) * sin(θ)))
        print("cos(ω)sin(θ) + sin(ω)cos(θ) =",
              simplify(cos(ω)*sin(θ) + sin(ω) * cos(θ)))

cos(ω)cos(θ) - sin(ω)sin(θ) = cos(θ + ω)
cos(ω)sin(θ) + sin(ω)cos(θ) = sin(θ + ω)

5.4.5 Trigonometric Integrals

We can also compute the trigonometric integrals using polar forms of complex numbers.
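For example, suppose we want to compute the following integral:

$$ \int_{-\pi}^{\pi} \cos(\omega) \sin(\omega) \, d\omega $$

Using Euler's formula, we have:

$$
\begin{aligned}
\int \cos(\omega) \sin(\omega) \, d\omega &= \int \frac{(e^{i\omega} + e^{-i\omega})}{2} \frac{(e^{i\omega} - e^{-i\omega})}{2i} \, d\omega \\
&= \frac{1}{4i} \int \left( e^{2i\omega} - e^{-2i\omega} \right) d\omega \\
&= \frac{1}{4i} \left( \frac{-i}{2} e^{2i\omega} - \frac{i}{2} e^{-2i\omega} + C_1 \right) \\
&= -\frac{1}{8} \left[ \left( e^{i\omega} \right)^2 + \left( e^{-i\omega} \right)^2 - 2 \right] + C_2 \\
&= -\frac{1}{8} \left( e^{i\omega} - e^{-i\omega} \right)^2 + C_2 \\
&= \frac{1}{2} \left( \frac{e^{i\omega} - e^{-i\omega}}{2i} \right)^2 + C_2 \\
&= \frac{1}{2} \sin^2(\omega) + C_2
\end{aligned}
$$

and thus:

$$ \int_{-\pi}^{\pi} \cos(\omega) \sin(\omega) \, d\omega = \frac{1}{2} \sin^2(\pi) - \frac{1}{2} \sin^2(-\pi) = 0 $$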
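The same result can be confirmed by numerical quadrature. This sketch uses scipy.integrate.quad, which is not used elsewhere in this lecture. It also checks the antiderivative on an arbitrary sub-interval [0, 1]:

```python
import numpy as np
from scipy.integrate import quad

# Numerically integrate cos(ω) sin(ω) over [-π, π]
value, abs_err = quad(lambda w: np.cos(w) * np.sin(w), -np.pi, np.pi)
print(abs(value) < 1e-10)   # True: the integral is 0

# Check the antiderivative sin²(ω)/2 on the sub-interval [0, 1]
upper = 1.0
partial, _ = quad(lambda w: np.cos(w) * np.sin(w), 0.0, upper)
print(np.isclose(partial, np.sin(upper)**2 / 2 - np.sin(0.0)**2 / 2))   # True
```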

We can verify the analytical as well as numerical results using integrate in the sympy package:

In [6]: # Set initial printing
        init_printing()

        ω = Symbol('ω')
        print('The analytical solution for integral of cos(ω)sin(ω) is:')
        integrate(cos(ω) * sin(ω), ω)

The analytical solution for integral of cos(ω)sin(ω) is:

Out[6]: sin²(ω)/2

In [7]: print('The numerical solution for the integral of cos(ω)sin(ω) '
              'from -π to π is:')
        integrate(cos(ω) * sin(ω), (ω, -π, π))

The numerical solution for the integral of cos(ω)sin(ω) from -π to π is:

Out[7]: 0

5.4.6 Exercises

We invite the reader to verify analytically and with the sympy package the following two equalities:

$$ \int_{-\pi}^{\pi} \cos(\omega)^2 \, d\omega = \pi $$

$$ \int_{-\pi}^{\pi} \sin(\omega)^2 \, d\omega = \pi $$
Chapter 6

LLN and CLT

6.1 Contents

• Overview 6.2
• Relationships 6.3
• LLN 6.4
• CLT 6.5
• Exercises 6.6
• Solutions 6.7

6.2 Overview

This lecture illustrates two of the most important theorems of probability and statistics: The
law of large numbers (LLN) and the central limit theorem (CLT).
These beautiful theorems lie behind many of the most fundamental results in econometrics
and quantitative economic modeling.
The lecture is based around simulations that show the LLN and CLT in action.
We also demonstrate how the LLN and CLT break down when the assumptions they are
based on do not hold.
In addition, we examine several useful extensions of the classical theorems, such as
• The delta method, for smooth functions of random variables.
• The multivariate case.
Some of these extensions are presented as exercises.
We’ll need the following imports:

In [1]: import random
        import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline
        from scipy.stats import t, beta, lognorm, expon, gamma, uniform, cauchy
        from scipy.stats import gaussian_kde, poisson, binom, norm, chi2
        from mpl_toolkits.mplot3d import Axes3D
        from matplotlib.collections import PolyCollection
        from scipy.linalg import inv, sqrtm

6.3 Relationships

The CLT refines the LLN.


The LLN gives conditions under which sample moments converge to population moments as
sample size increases.
The CLT provides information about the rate at which sample moments converge to population moments as sample size increases.

6.4 LLN

We begin with the law of large numbers, which tells us when sample averages will converge to
their population means.

6.4.1 The Classical LLN

The classical law of large numbers concerns independent and identically distributed (IID)
random variables.
Here is the strongest version of the classical LLN, known as Kolmogorov’s strong law.
Let $X_1, \ldots, X_n$ be independent and identically distributed scalar random variables, with common distribution 𝐹.
When it exists, let 𝜇 denote the common mean of these random variables:

𝜇 ∶= 𝔼𝑋 = ∫ 𝑥𝐹 (𝑑𝑥)

In addition, let

$$ \bar X_n := \frac{1}{n} \sum_{i=1}^{n} X_i $$

Kolmogorov's strong law states that, if 𝔼|𝑋| is finite, then

$$ \mathbb{P} \{ \bar X_n \to \mu \text{ as } n \to \infty \} = 1 \tag{1} $$

What does this last expression mean?


Let's think about it from a simulation perspective, imagining for a moment that our computer can generate perfect random samples (which of course it can't).
Let’s also imagine that we can generate infinite sequences so that the statement 𝑋̄ 𝑛 → 𝜇 can
be evaluated.
In this setting, (1) should be interpreted as meaning that the probability of the computer
producing a sequence where 𝑋̄ 𝑛 → 𝜇 fails to occur is zero.

6.4.2 Proof

The proof of Kolmogorov’s strong law is nontrivial – see, for example, theorem 8.3.5 of [31].

On the other hand, we can prove a weaker version of the LLN very easily and still get most of
the intuition.
The version we prove is as follows: If $X_1, \ldots, X_n$ is IID with $\mathbb{E} X_i^2 < \infty$, then, for any $\epsilon > 0$, we have

$$ \mathbb{P} \{ |\bar X_n - \mu| \geq \epsilon \} \to 0 \quad \text{as} \quad n \to \infty \tag{2} $$

(This version is weaker because we claim only convergence in probability rather than almost sure convergence, and assume a finite second moment.)
To see that this is so, fix 𝜖 > 0, and let $\sigma^2$ be the variance of each $X_i$.
Recall the Chebyshev inequality, which tells us that

$$ \mathbb{P} \{ |\bar X_n - \mu| \geq \epsilon \} \leq \frac{\mathbb{E}[(\bar X_n - \mu)^2]}{\epsilon^2} \tag{3} $$

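Now observe that

$$
\begin{aligned}
\mathbb{E}[(\bar X_n - \mu)^2] &= \mathbb{E} \left\{ \left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu) \right]^2 \right\} \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathbb{E}(X_i - \mu)(X_j - \mu) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \mathbb{E}(X_i - \mu)^2 \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$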

Here the crucial step is at the third equality, which follows from independence.
Independence means that if 𝑖 ≠ 𝑗, then the covariance term 𝔼(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) drops out.
As a result, $n^2 - n$ terms vanish, leading us to a final expression that goes to zero in 𝑛.
Combining our last result with (3), we come to the estimate

$$ \mathbb{P} \{ |\bar X_n - \mu| \geq \epsilon \} \leq \frac{\sigma^2}{n \epsilon^2} \tag{4} $$

The claim in (2) is now clear.
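The bound (4) can be illustrated by simulation. The sketch below assumes Uniform(0, 1) draws, so 𝜇 = 1/2 and $\sigma^2 = 1/12$. The choices of 𝑛, 𝜖 and the number of replications are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed
n, eps, num_reps = 50, 0.1, 20_000

# X_i ~ Uniform(0, 1): μ = 1/2 and σ² = 1/12
mu, sigma2 = 0.5, 1 / 12
sample_means = rng.uniform(0, 1, size=(num_reps, n)).mean(axis=1)

empirical = np.mean(np.abs(sample_means - mu) >= eps)
bound = sigma2 / (n * eps**2)     # right-hand side of (4)
print(empirical, bound)           # the simulated frequency sits below the bound
```

Chebyshev's inequality is typically far from tight, so the simulated frequency can be much smaller than the bound.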


Of course, if the sequence $X_1, \ldots, X_n$ is correlated, then the cross-product terms $\mathbb{E}(X_i - \mu)(X_j - \mu)$ are not necessarily zero.
While this doesn’t mean that the same line of argument is impossible, it does mean that if we
want a similar result then the covariances should be “almost zero” for “most” of these terms.
In a long sequence, this would be true if, for example, 𝔼(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) approached zero
when the difference between 𝑖 and 𝑗 became large.
In other words, the LLN can still work if the sequence 𝑋1 , … , 𝑋𝑛 has a kind of “asymptotic
independence”, in the sense that correlation falls to zero as variables become further apart in
the sequence.

This idea is very important in time series analysis, and we’ll come across it again soon
enough.
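Here is a minimal sketch of this idea. It assumes a stationary AR(1) process $X_{t+1} = \rho X_t + \epsilon_{t+1}$ with standard normal shocks, a model this lecture does not introduce. The correlation between terms decays like $\rho^{|i-j|}$, and the sample mean still settles near the population mean of zero:

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
rho, n = 0.9, 200_000            # |ρ| < 1, so correlation decays as ρ^{|i-j|}

# Stationary AR(1): X_{t+1} = ρ X_t + ε_{t+1}, with population mean 0
x = np.empty(n)
x[0] = rng.standard_normal() / np.sqrt(1 - rho**2)   # draw from the stationary law
shocks = rng.standard_normal(n)
for t in range(n - 1):
    x[t + 1] = rho * x[t] + shocks[t + 1]

print(x.mean())   # close to the population mean 0 despite the correlation
```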

6.4.3 Illustration

Let’s now illustrate the classical IID law of large numbers using simulation.
In particular, we aim to generate some sequences of IID random variables and plot the evolution of $\bar X_n$ as 𝑛 increases.
Below is a figure that does just this.
It shows IID observations from three different distributions and plots $\bar X_n$ against 𝑛 in each case.
The dots represent the underlying observations $X_i$ for $i = 1, \ldots, 100$.
In each of the three cases, convergence of $\bar X_n$ to 𝜇 occurs as predicted.

In [2]: import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t, beta, lognorm, gamma, poisson, expon

n = 100

# Arbitrary collection of distributions
# (expon(1) would set loc=1, i.e. a shifted exponential, so pass the scale explicitly)
distributions = {"student's t with 10 degrees of freedom": t(10),
                 "β(2, 2)": beta(2, 2),
                 "lognormal LN(0, 1/2)": lognorm(0.5),
                 "γ(5, 1/2)": gamma(5, scale=2),
                 "poisson(4)": poisson(4),
                 "exponential with λ = 1": expon(scale=1)}

# Create a figure and some axes
num_plots = 3
fig, axes = plt.subplots(num_plots, 1, figsize=(10, 20))

# Set some plotting parameters to improve layout
bbox = (0., 1.02, 1., .102)
legend_args = {'ncol': 2,
               'bbox_to_anchor': bbox,
               'loc': 3,
               'mode': 'expand'}
plt.subplots_adjust(hspace=0.5)

for ax in axes:
    # Choose a randomly selected distribution
    name = random.choice(list(distributions.keys()))
    distribution = distributions.pop(name)

    # Generate n draws from the distribution
    data = distribution.rvs(n)

    # Compute sample mean at each n
    sample_mean = np.empty(n)
    for i in range(n):
        sample_mean[i] = np.mean(data[:i+1])

    # Plot
    ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5)
    axlabel = r'$\bar X_n$ for $X_i \sim$' + name
    ax.plot(list(range(n)), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel)
    m = distribution.mean()
    ax.plot(list(range(n)), [m] * n, 'k--', lw=1.5, label=r'$\mu$')
    ax.vlines(list(range(n)), m, data, lw=0.2)
    ax.legend(**legend_args, fontsize=12)

plt.show()

The three distributions are chosen at random from a selection stored in the dictionary distributions.

6.5 CLT

Next, we turn to the central limit theorem, which tells us about the distribution of the deviation between sample averages and population means.

6.5.1 Statement of the Theorem

The central limit theorem is one of the most remarkable results in all of mathematics.
In the classical IID setting, it tells us the following:
If the sequence 𝑋₁, …, 𝑋ₙ is IID, with common mean 𝜇 and common variance 𝜎² ∈ (0, ∞), then

$$
\sqrt{n}(\bar X_n - \mu) \stackrel{d}{\to} N(0, \sigma^2) \quad \text{as} \quad n \to \infty \tag{5}
$$

Here →ᵈ 𝑁(0, 𝜎²) indicates convergence in distribution to a centered (i.e., zero-mean) normal with standard deviation 𝜎.
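One way to make (5) concrete is to check an implication numerically: if √𝑛(𝑋̄ₙ − 𝜇) is approximately 𝑁(0, 𝜎²), then about 95% of its draws should lie within ±1.96𝜎. The sketch below does this for uniform draws on [0, 1], an arbitrary choice with 𝜇 = 1/2 and 𝜎² = 1/12; the sample size and seed are also illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)      # mean and std of Uniform(0, 1)

draws = rng.uniform(size=(reps, n))   # each row is one sample X_1, ..., X_n
Y = np.sqrt(n) * (draws.mean(axis=1) - mu)

# In the IID case Var[Y] equals sigma^2 for every n; the CLT is about the shape
print(f"sample variance of Y_n: {Y.var():.5f} (sigma^2 = {sigma**2:.5f})")
coverage = np.mean(np.abs(Y) <= 1.96 * sigma)
print(f"share within +/- 1.96 sigma: {coverage:.4f}")
```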

6.5.2 Intuition

The striking implication of the CLT is that for any distribution with finite second moment, the simple operation of adding independent copies and normalizing always leads, in the limit, to a Gaussian curve.
A relatively simple proof of the central limit theorem can be obtained by working with characteristic functions (see, e.g., theorem 9.5.6 of [31]).
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition.
In fact, all of the proofs of the CLT that we know are similar in this respect.
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating the addition of independent Bernoulli
random variables.
In particular, let 𝑋𝑖 be binary, with ℙ{𝑋𝑖 = 0} = ℙ{𝑋𝑖 = 1} = 0.5, and let 𝑋1 , … , 𝑋𝑛 be
independent.
Think of 𝑋ᵢ = 1 as a “success”, so that 𝑌ₙ = ∑ᵢ₌₁ⁿ 𝑋ᵢ is the number of successes in 𝑛 trials.
The next figure plots the probability mass function of 𝑌ₙ for 𝑛 = 1, 2, 4, 8.

In [3]: import matplotlib.pyplot as plt
from scipy.stats import binom

fig, axes = plt.subplots(2, 2, figsize=(10, 6))
plt.subplots_adjust(hspace=0.4)
axes = axes.flatten()
ns = [1, 2, 4, 8]
dom = list(range(9))

for ax, n in zip(axes, ns):
    b = binom(n, 0.5)
    ax.bar(dom, b.pmf(dom), alpha=0.6, align='center')
    ax.set(xlim=(-0.5, 8.5), ylim=(0, 0.55),
           xticks=list(range(9)), yticks=(0, 0.2, 0.4),
           title=f'$n = {n}$')

plt.show()

When 𝑛 = 1, the distribution is flat — one success or no successes have the same probability.
When 𝑛 = 2 we can either have 0, 1 or 2 successes.
Notice the peak in probability mass at the mid-point 𝑘 = 1.
The reason is that there are more ways to get 1 success (“fail then succeed” or “succeed then
fail”) than to get zero or two successes.
Moreover, the two trials are independent, so the outcomes “fail then succeed” and “succeed
then fail” are just as likely as the outcomes “fail then fail” and “succeed then succeed”.
(If there were positive correlation, say, then “succeed then fail” would be less likely than “succeed then succeed”.)
Here we already have the essence of the CLT: addition under independence leads probability mass to pile up in the middle and thin out at the tails.
For 𝑛 = 4 and 𝑛 = 8 we again get a peak at the “middle” value (halfway between the minimum and the maximum possible value).
The intuition is the same — there are simply more ways to get these middle outcomes.
If we continue, the bell-shaped curve becomes even more pronounced.
We are witnessing the normal approximation to the binomial distribution.
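The quality of this approximation can be quantified. Since 𝑌ₙ has mean 𝑛/2 and variance 𝑛/4, the sketch below compares the binomial pmf with the 𝑁(𝑛/2, 𝑛/4) density evaluated at the integers 0, …, 𝑛 and reports the largest gap for several values of 𝑛. The helper name max_gap and the chosen values of 𝑛 are illustrative.

```python
import numpy as np
from scipy.stats import binom, norm

def max_gap(n, p=0.5):
    """Largest absolute difference between the Binomial(n, p) pmf and the
    normal density with the same mean and variance, over k = 0, ..., n."""
    k = np.arange(n + 1)
    mean, sd = n * p, np.sqrt(n * p * (1 - p))
    return np.max(np.abs(binom.pmf(k, n, p) - norm.pdf(k, loc=mean, scale=sd)))

for n in (8, 32, 128, 512):
    print(f"n = {n:4d}: max gap = {max_gap(n):.6f}")
```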

6.5.3 Simulation 1

Since the CLT seems almost magical, running simulations that verify its implications is one
good way to build intuition.
To this end, we now perform the following simulation:

1. Choose an arbitrary distribution 𝐹 for the underlying observations 𝑋ᵢ.

2. Generate independent draws of 𝑌ₙ := √𝑛(𝑋̄ₙ − 𝜇).

3. Use these draws to compute some measure of their distribution, such as a histogram.

4. Compare the latter to 𝑁(0, 𝜎²).

Here’s some code that does exactly this for the exponential distribution 𝐹(𝑥) = 1 − exp(−𝜆𝑥).
(Please experiment with other choices of 𝐹, but remember that, to conform with the conditions of the CLT, the distribution must have a finite second moment.)

In [4]: import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import expon, norm

# Set parameters
n = 250                         # Choice of n
k = 100000                      # Number of draws of Y_n
distribution = expon(scale=2)   # Exponential distribution, λ = 1/2 (scale = 1/λ)
μ, s = distribution.mean(), distribution.std()

# Draw underlying RVs. Each row contains a draw of X_1,..,X_n
data = distribution.rvs((k, n))
# Compute mean of each row, producing k draws of \bar X_n
sample_means = data.mean(axis=1)
# Generate observations of Y_n
Y = np.sqrt(n) * (sample_means - μ)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label=r'$N(0, \sigma^2)$')
ax.legend()

plt.show()

Notice the absence of for loops: every operation is vectorized, meaning that the major calculations are all shifted to highly optimized C code.
The fit to the normal density is already tight and can be further improved by increasing n.
You can also experiment with other specifications of 𝐹 .
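To see what vectorization means in practice, the sketch below computes the same row means two ways, once with an explicit Python loop and once with a single call to data.mean(axis=1), and confirms that they agree. The array sizes and helper names are illustrative; timing both functions (for example with %timeit in IPython) typically shows the vectorized version is much faster, though the exact speedup depends on the machine.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 2_000, 250
data = rng.exponential(scale=2.0, size=(k, n))   # each row is one sample X_1, ..., X_n

def means_loop(data):
    """Row means via an explicit Python loop (slow)."""
    out = np.empty(data.shape[0])
    for i in range(data.shape[0]):
        out[i] = np.mean(data[i, :])
    return out

def means_vectorized(data):
    """Row means via a single NumPy call (fast)."""
    return data.mean(axis=1)

print(np.allclose(means_loop(data), means_vectorized(data)))   # True
```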

6.5.4 Simulation 2

Our next simulation is somewhat like the first, except that we aim to track the distribution of 𝑌ₙ := √𝑛(𝑋̄ₙ − 𝜇) as 𝑛 increases.
In the simulation, we’ll be working with random variables having 𝜇 = 0.
Thus, when 𝑛 = 1, we have 𝑌1 = 𝑋1 , so the first distribution is just the distribution of the
underlying random variable.

For 𝑛 = 2, the distribution of 𝑌₂ is that of (𝑋₁ + 𝑋₂)/√2, and so on.
What we expect is that, regardless of the distribution of the underlying random variable, the
distribution of 𝑌𝑛 will smooth out into a bell-shaped curve.
The next figure shows this process for 𝑋ᵢ ∼ 𝑓, where 𝑓 was specified as the convex combination of three different beta densities.
(Taking a convex combination is an easy way to produce an irregular shape for 𝑓.)
In the figure, the closest density is that of 𝑌₁, while the furthest is that of 𝑌₅.

In [5]: import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection
from scipy.stats import beta, gaussian_kde

beta_dist = beta(2, 2)

def gen_x_draws(k):
    """
    Returns a flat array containing k independent draws from the
    distribution of X, the underlying random variable.  This distribution
    is itself a convex combination of three beta distributions.
    """
    bdraws = beta_dist.rvs((3, k))
    # Transform rows, so each represents a different distribution
    bdraws[0, :] -= 0.5
    bdraws[1, :] += 0.6
    bdraws[2, :] -= 1.1
    # Set X[i] = bdraws[j, i], where j is a random draw from {0, 1, 2}
    # (the upper bound of randint is exclusive, so it must be 3)
    js = np.random.randint(0, 3, size=k)
    X = bdraws[js, np.arange(k)]
    # Standardize, so that the random variable has zero mean and unit variance
    m, sigma = X.mean(), X.std()
    return (X - m) / sigma

nmax = 5
reps = 100000
ns = list(range(1, nmax + 1))

# Form a matrix Z such that each column is reps independent draws of X
Z = np.empty((reps, nmax))
for i in range(nmax):
    Z[:, i] = gen_x_draws(reps)
# Take cumulative sum across columns
S = Z.cumsum(axis=1)
# Divide the j-th column by sqrt(j)
Y = (1 / np.sqrt(ns)) * S

# Plot
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(projection='3d')   # fig.gca(projection=...) fails in recent Matplotlib

a, b = -3, 3
gs = 100
xs = np.linspace(a, b, gs)

# Build verts
greys = np.linspace(0.3, 0.7, nmax)
verts = []
for n in ns:
    density = gaussian_kde(Y[:, n-1])
    ys = density(xs)
    verts.append(list(zip(xs, ys)))

poly = PolyCollection(verts, facecolors=[str(g) for g in greys])
poly.set_alpha(0.85)
ax.add_collection3d(poly, zs=ns, zdir='x')

ax.set(xlim3d=(1, nmax), xticks=(ns), ylabel='$Y_n$', zlabel='$p(y_n)$',
       xlabel=("n"), yticks=((-3, 0, 3)), ylim3d=(a, b),
       zlim3d=(0, 0.4), zticks=((0.2, 0.4)))
ax.invert_xaxis()
# Set the viewing angle: 30 degrees elevation, 45 degrees azimuth
ax.view_init(30, 45)
plt.show()

As expected, the distribution smooths out into a bell curve as 𝑛 increases.


If you run the code from an ordinary IPython shell rather than a notebook, the figure should pop up in a window that you can rotate with your mouse, giving different views of the density sequence.

6.5.5 The Multivariate Case

The law of large numbers and central limit theorem work just as nicely in multidimensional
settings.
To state the results, let’s recall some elementary facts about random vectors.
A random vector X is just a sequence of 𝑘 random variables (𝑋1 , … , 𝑋𝑘 ).
Each realization of X is an element of ℝ𝑘 .
A collection of random vectors X1 , … , X𝑛 is called independent if, given any 𝑛 vectors
x1 , … , x𝑛 in ℝ𝑘 , we have

ℙ{X1 ≤ x1 , … , X𝑛 ≤ x𝑛 } = ℙ{X1 ≤ x1 } × ⋯ × ℙ{X𝑛 ≤ x𝑛 }

(The vector inequality X ≤ x means that 𝑋𝑗 ≤ 𝑥𝑗 for 𝑗 = 1, … , 𝑘)


Let 𝜇𝑗 ∶= 𝔼[𝑋𝑗 ] for all 𝑗 = 1, … , 𝑘.

The expectation 𝔼[X] of X is defined to be the vector of expectations:

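$$
\mathbb{E}[\mathbf{X}] :=
\begin{pmatrix}
\mathbb{E}[X_1] \\
\mathbb{E}[X_2] \\
\vdots \\
\mathbb{E}[X_k]
\end{pmatrix}
=
\begin{pmatrix}
\mu_1 \\
\mu_2 \\
\vdots \\
\mu_k
\end{pmatrix}
=: \mu
$$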

The variance-covariance matrix of random vector X is defined as

Var[X] ∶= 𝔼[(X − 𝜇)(X − 𝜇)′ ]

Expanding this out, we get

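$$
\operatorname{Var}[\mathbf{X}] =
\begin{pmatrix}
\mathbb{E}[(X_1 - \mu_1)(X_1 - \mu_1)] & \cdots & \mathbb{E}[(X_1 - \mu_1)(X_k - \mu_k)] \\
\mathbb{E}[(X_2 - \mu_2)(X_1 - \mu_1)] & \cdots & \mathbb{E}[(X_2 - \mu_2)(X_k - \mu_k)] \\
\vdots & \vdots & \vdots \\
\mathbb{E}[(X_k - \mu_k)(X_1 - \mu_1)] & \cdots & \mathbb{E}[(X_k - \mu_k)(X_k - \mu_k)]
\end{pmatrix}
$$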

The (𝑖, 𝑗)-th term is the scalar covariance between 𝑋ᵢ and 𝑋ⱼ.
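The definition Var[X] = 𝔼[(X − 𝜇)(X − 𝜇)′] translates directly into code. The sketch below estimates it from simulated draws by averaging the outer products of demeaned observations, and checks the result against NumPy's np.cov (with bias=True, so both divide by the number of draws). The random vector X = (𝑍₁, 𝑍₁ + 𝑍₂), with 𝑍₁ and 𝑍₂ independent standard normals, is an arbitrary illustration whose true variance-covariance matrix has entries 1, 1, 1, 2.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200_000                                          # number of draws of the random vector
Z = rng.standard_normal((m, 2))
X = np.column_stack([Z[:, 0], Z[:, 0] + Z[:, 1]])    # each row is one draw of X

mu_hat = X.mean(axis=0)
D = X - mu_hat                                       # demeaned draws
# Average of the outer products (X - mu)(X - mu)' over all draws
var_hat = (D[:, :, None] * D[:, None, :]).mean(axis=0)

print(var_hat)                                       # close to [[1, 1], [1, 2]]
print(np.allclose(var_hat, np.cov(X, rowvar=False, bias=True)))   # True
```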


With this notation, we can proceed to the multivariate LLN and CLT.
Let X1 , … , X𝑛 be a sequence of independent and identically distributed random vectors, each
one taking values in ℝ𝑘 .
Let 𝜇 be the vector 𝔼[X𝑖 ], and let Σ be the variance-covariance matrix of X𝑖 .
Interpreting vector addition and scalar multiplication in the usual way (i.e., pointwise), let

$$
\bar{\mathbf{X}}_n := \frac{1}{n} \sum_{i=1}^n \mathbf{X}_i
$$

In this setting, the LLN tells us that

$$
\mathbb{P}\{\bar{\mathbf{X}}_n \to \mu \text{ as } n \to \infty\} = 1 \tag{6}
$$

Here X̄ 𝑛 → 𝜇 means that ‖X̄ 𝑛 − 𝜇‖ → 0, where ‖ ⋅ ‖ is the standard Euclidean norm.


The CLT tells us that, provided Σ is finite,

$$
\sqrt{n}(\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, \Sigma) \quad \text{as} \quad n \to \infty \tag{7}
$$
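A rough simulation check of (7), under illustrative assumptions: let Xᵢ = (𝐸₁, 𝐸₁ + 𝐸₂), where 𝐸₁ and 𝐸₂ are independent exponential random variables with mean 1, so that 𝜇 = (1, 2) and Σ has entries 1, 1, 1, 2. In the IID case the covariance of √𝑛(X̄ₙ − 𝜇) equals Σ for every 𝑛, so the covariance printout mainly confirms the setup; the distributional content of (7) shows up in the near-normal coverage of the first component, whose limit is 𝑁(0, 1).

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 20_000
mu = np.array([1.0, 2.0])                 # mean of X_i = (E_1, E_1 + E_2)
Sigma = np.array([[1.0, 1.0],
                  [1.0, 2.0]])            # covariance of X_i

# E has shape (reps, n, 2): reps independent samples, each of size n
E = rng.exponential(scale=1.0, size=(reps, n, 2))
X = np.stack([E[..., 0], E[..., 0] + E[..., 1]], axis=-1)

Y = np.sqrt(n) * (X.mean(axis=1) - mu)    # reps draws of sqrt(n)(X_bar_n - mu)
print(np.cov(Y, rowvar=False))            # close to Sigma

# Normality check on the first component, whose limit is N(0, 1)
coverage = np.mean(np.abs(Y[:, 0]) <= 1.96)
print(f"share of first component within +/- 1.96: {coverage:.4f}")   # near 0.95
```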

6.6 Exercises

6.6.1 Exercise 1

One very useful consequence of the central limit theorem is as follows.


Assume the conditions of the CLT as stated above.
If 𝑔 ∶ ℝ → ℝ is differentiable at 𝜇 and 𝑔′ (𝜇) ≠ 0, then

$$
\sqrt{n}\{g(\bar X_n) - g(\mu)\} \stackrel{d}{\to} N(0, g'(\mu)^2 \sigma^2) \quad \text{as} \quad n \to \infty \tag{8}
$$

This theorem is used frequently in statistics to obtain the asymptotic distribution of estimators, many of which can be expressed as functions of sample means.
(These kinds of results are often said to use the “delta method”.)
The proof is based on a Taylor expansion of 𝑔 around the point 𝜇.
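For intuition, here is a heuristic version of that argument, not a rigorous proof. Since 𝑔 is differentiable at 𝜇, a first-order Taylor expansion gives

$$
g(\bar X_n) - g(\mu) = g'(\mu)(\bar X_n - \mu) + o(|\bar X_n - \mu|)
$$

Multiplying by √𝑛, the remainder becomes negligible in probability (by (5), √𝑛|𝑋̄ₙ − 𝜇| stays bounded in probability while 𝑋̄ₙ → 𝜇), so

$$
\sqrt{n}\{g(\bar X_n) - g(\mu)\} \approx g'(\mu)\sqrt{n}(\bar X_n - \mu) \stackrel{d}{\to} g'(\mu) \cdot N(0, \sigma^2) = N(0, g'(\mu)^2 \sigma^2)
$$

where the last step uses the fact that multiplying a 𝑁(0, 𝜎²) random variable by the constant 𝑔′(𝜇) gives a 𝑁(0, 𝑔′(𝜇)²𝜎²) random variable.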
Taking the result as given, let the distribution 𝐹 of each 𝑋𝑖 be uniform on [0, 𝜋/2] and let
𝑔(𝑥) = sin(𝑥).

Derive the asymptotic distribution of √𝑛{𝑔(𝑋̄ₙ) − 𝑔(𝜇)} and illustrate convergence in the same spirit as the program discussed above.
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
What is the source of the problem?

6.6.2 Exercise 2

Here’s a result that’s often used in developing statistical tests, and is connected to the multivariate central limit theorem.
If you study econometric theory, you will see this result used again and again.
Assume the setting of the multivariate CLT discussed above, so that

1. X₁, …, Xₙ is a sequence of IID random vectors, each taking values in ℝᵏ.

2. 𝜇 := 𝔼[Xᵢ], and Σ is the variance-covariance matrix of Xᵢ.

3. The convergence

$$
\sqrt{n}(\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, \Sigma) \tag{9}
$$

is valid.
In a statistical setting, one often wants the right-hand side to be standard normal so that
confidence intervals are easily computed.
This normalization can be achieved on the basis of three observations.
First, if X is a random vector in ℝ𝑘 and A is constant and 𝑘 × 𝑘, then

Var[AX] = A Var[X]A′

Second, by the continuous mapping theorem, if Zₙ →ᵈ Z in ℝᵏ and A is constant and 𝑘 × 𝑘, then

$$
\mathbf{A}\mathbf{Z}_n \stackrel{d}{\to} \mathbf{A}\mathbf{Z}
$$

Third, if S is a 𝑘×𝑘 symmetric positive definite matrix, then there exists a symmetric positive
definite matrix Q, called the inverse square root of S, such that

QSQ′ = I

Here I is the 𝑘 × 𝑘 identity matrix.
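For a concrete picture of the third fact, the sketch below builds Q for an arbitrary symmetric positive definite matrix S from the eigendecomposition S = V diag(𝜆) V′, setting Q = V diag(𝜆^(−1/2)) V′, and checks that Q is symmetric and QSQ′ = I. This is one standard construction, not necessarily the one intended in the exercise; it should coincide with inv(sqrtm(S)) from scipy.linalg for positive definite S, which the last line checks. The helper name inv_sqrt and the matrix S are illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm, inv

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix S,
    computed from its eigendecomposition S = V diag(lam) V'."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam**-0.5) @ V.T

S = np.array([[2.0, 0.5],
              [0.5, 1.0]])              # an arbitrary symmetric positive definite matrix
Q = inv_sqrt(S)

print(np.allclose(Q, Q.T))                    # Q is symmetric: True
print(np.allclose(Q @ S @ Q.T, np.eye(2)))    # QSQ' = I: True
print(np.allclose(Q, inv(sqrtm(S))))          # agrees with the sqrtm route: True
```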


Putting these things together, your first exercise is to show that if Q is the inverse square root of Σ, then

$$
\mathbf{Z}_n := \sqrt{n}\mathbf{Q}(\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} \mathbf{Z} \sim N(0, \mathbf{I})
$$

Applying the continuous mapping theorem one more time tells us that

$$
\|\mathbf{Z}_n\|^2 \stackrel{d}{\to} \|\mathbf{Z}\|^2
$$

Given the distribution of Z, we conclude that

$$
n\|\mathbf{Q}(\bar{\mathbf{X}}_n - \mu)\|^2 \stackrel{d}{\to} \chi^2(k) \tag{10}
$$

where 𝜒²(𝑘) is the chi-squared distribution with 𝑘 degrees of freedom.


(Recall that 𝑘 is the dimension of X𝑖 , the underlying random vectors)
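The step from ‖Z‖² to 𝜒²(𝑘) uses the fact that, when Z ∼ 𝑁(0, I) in ℝᵏ, ‖Z‖² = 𝑍₁² + ⋯ + 𝑍ₖ² is a sum of 𝑘 squared independent standard normals, which is the definition of the 𝜒²(𝑘) distribution. A quick simulation sketch (𝑘 = 3 and the seed are arbitrary) compares the sample moments with the 𝜒²(𝑘) mean 𝑘 and variance 2𝑘, taken from scipy.stats.chi2.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
k, reps = 3, 200_000
Z = rng.standard_normal((reps, k))     # reps draws of Z ~ N(0, I) in R^k
sq_norms = np.sum(Z**2, axis=1)        # ||Z||^2 for each draw

print(sq_norms.mean(), chi2.mean(k))   # both close to k
print(sq_norms.var(), chi2.var(k))     # both close to 2k
```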
Your second exercise is to illustrate the convergence in (10) with a simulation.
In doing so, let

$$
\mathbf{X}_i := \begin{pmatrix} W_i \\ U_i + W_i \end{pmatrix}
$$

where
• each 𝑊𝑖 is an IID draw from the uniform distribution on [−1, 1].
• each 𝑈𝑖 is an IID draw from the uniform distribution on [−2, 2].
• 𝑈𝑖 and 𝑊𝑖 are independent of each other.
Hints:

1. scipy.linalg.sqrtm(A) computes the square root of A. You still need to invert it.

2. You should be able to work out Σ from the preceding information.

6.7 Solutions

6.7.1 Exercise 1

Here is one solution

In [6]: """
Illustrates the delta method, a consequence of the central limit theorem.
"""
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform, norm

# Set parameters
n = 250
replications = 100000
distribution = uniform(loc=0, scale=(np.pi / 2))
μ, s = distribution.mean(), distribution.std()

g = np.sin
g_prime = np.cos

# Generate obs of sqrt{n} (g(X_n) - g(μ))
data = distribution.rvs((replications, n))
sample_means = data.mean(axis=1)  # Compute mean of each row
error_obs = np.sqrt(n) * (g(sample_means) - g(μ))

# Plot
asymptotic_sd = g_prime(μ) * s
fig, ax = plt.subplots(figsize=(10, 6))
xmin = -3 * g_prime(μ) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = r"$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k-', lw=2, label=lb)
ax.legend()
plt.show()

What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
In this case, the mean 𝜇 of this distribution is 𝜋/2, and since 𝑔′ = cos, we have 𝑔′ (𝜇) = 0.
Hence the conditions of the delta theorem are not satisfied.
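With [0, 𝜋], the first-order term vanishes, so the normal limit in (8) degenerates to a point mass at zero. The sketch below illustrates this by simulation. A second-order Taylor expansion (using 𝑔″(𝜋/2) = −sin(𝜋/2) = −1) suggests that the right scaling is then 𝑛 rather than √𝑛, with 𝑛{𝑔(𝑋̄ₙ) − 𝑔(𝜇)} behaving like −𝜎²/2 times a 𝜒²(1) variable, whose mean is −𝜎²/2; the code checks both features. Sample sizes and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 250, 20_000
mu = np.pi / 2                       # mean of Uniform(0, pi)
sigma2 = np.pi**2 / 12               # variance of Uniform(0, pi)

data = rng.uniform(0, np.pi, size=(reps, n))
diff = np.sin(data.mean(axis=1)) - np.sin(mu)   # g(X_bar_n) - g(mu) with g = sin

root_n_scaled = np.sqrt(n) * diff    # the scaling used in (8)
n_scaled = n * diff                  # the scaling suggested by a second-order expansion

print(f"std of sqrt(n)-scaled errors: {root_n_scaled.std():.4f}")   # near 0
print(f"mean of n-scaled errors: {n_scaled.mean():.4f}, "
      f"-sigma^2/2 = {-sigma2 / 2:.4f}")
```

Since sin(𝑥) ≤ 1 = sin(𝜋/2), every error here is nonpositive, which already rules out a centered normal limit with positive variance.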

6.7.2 Exercise 2

First we want to verify the claim that

$$
\sqrt{n}\mathbf{Q}(\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, \mathbf{I})
$$

This is straightforward given the facts presented in the exercise.


Let

$$
\mathbf{Y}_n := \sqrt{n}(\bar{\mathbf{X}}_n - \mu) \quad \text{and} \quad \mathbf{Y} \sim N(0, \Sigma)
$$

By the multivariate CLT and the continuous mapping theorem, we have

$$
\mathbf{Q}\mathbf{Y}_n \stackrel{d}{\to} \mathbf{Q}\mathbf{Y}
$$

Since linear combinations of normal random variables are normal, the vector QY is also normal.
Its mean is clearly 0, and its variance-covariance matrix is

Var[QY] = QVar[Y]Q′ = QΣQ′ = I

In conclusion, QYₙ →ᵈ QY ∼ 𝑁(0, I), which is what we aimed to show.
Now we turn to the simulation exercise.
Our solution is as follows

In [7]: import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import inv, sqrtm
from scipy.stats import uniform, chi2

# Set parameters
n = 250
replications = 50000
dw = uniform(loc=-1, scale=2)  # Uniform(-1, 1)
du = uniform(loc=-2, scale=4)  # Uniform(-2, 2)
sw, su = dw.std(), du.std()
vw, vu = sw**2, su**2
Σ = ((vw, vw), (vw, vw + vu))
Σ = np.array(Σ)

# Compute Σ^{-1/2}
Q = inv(sqrtm(Σ))

# Generate observations of the normalized sample mean
error_obs = np.empty((2, replications))
for i in range(replications):
    # Generate one sequence of bivariate shocks
    X = np.empty((2, n))
    W = dw.rvs(n)
    U = du.rvs(n)
    # Construct the n observations of the random vector
    X[0, :] = W
    X[1, :] = W + U
    # Construct the i-th observation of Y_n (here μ = 0)
    error_obs[:, i] = np.sqrt(n) * X.mean(axis=1)

# Premultiply by Q and then take the squared norm
temp = Q @ error_obs
chisq_obs = np.sum(temp**2, axis=0)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmax = 8
ax.set_xlim(0, xmax)
xgrid = np.linspace(0, xmax, 200)
lb = "Chi-squared with 2 degrees of freedom"
ax.plot(xgrid, chi2.pdf(xgrid, 2), 'k-', lw=2, label=lb)
ax.legend()
ax.hist(chisq_obs, bins=50, density=True)
plt.show()
Chapter 7

Heavy-Tailed Distributions

7.1 Contents

• Overview 7.2
• Visual Comparisons 7.3
• Failure of the LLN 7.4
• Classifying Tail Properties 7.5
• Exercises 7.6
• Solutions 7.7
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon
!pip install --upgrade yfinance

7.2 Overview

Most commonly used probability distributions in classical statistics and the natural sciences
have either bounded support or light tails.
When a distribution is light-tailed, extreme observations are rare and draws tend not to deviate too much from the mean.
Having internalized these kinds of distributions, many researchers and practitioners use rules
of thumb such as “outcomes more than four or five standard deviations from the mean can
safely be ignored.”
However, some distributions encountered in economics have far more probability mass in the
tails than distributions like the normal distribution.
With such heavy-tailed distributions, what would be regarded as extreme outcomes for someone accustomed to thin-tailed distributions occur relatively frequently.
Examples of heavy-tailed distributions observed in economic and financial settings include
• the income distribution and the wealth distribution (see, e.g., [108], [9]),
• the firm size distribution ([7], [41]),
• the distribution of returns on holding assets over short time horizons ([76], [88]), and
• the distribution of city sizes ([91], [41]).


These heavy tails turn out to be important for our understanding of economic outcomes.
As one example, the heaviness of the tail in the wealth distribution is one natural measure of
inequality.
It matters for taxation and redistribution policies, as well as for flow-on effects for productivity growth, business cycles, and political economy (see, e.g., [2], [43], [15] or [3]).
This lecture formalizes some of the concepts introduced above and reviews the key ideas.
Let’s start with some imports:

In [2]: import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt
%matplotlib inline


The following two lines can be added to avoid an annoying FutureWarning, and prevent a
specific compatibility issue between pandas and matplotlib from causing problems down the
line:

In [3]: from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

7.3 Visual Comparisons

One way to build intuition on the difference between light and heavy tails is to plot independent draws and compare them side-by-side.

7.3.1 A Simulation

The figure below shows a simulation. (You will be asked to replicate it in the exercises.)
The top two subfigures each show 120 independent draws from the normal distribution, which
is light-tailed.
The bottom subfigure shows 120 independent draws from the Cauchy distribution, which is heavy-tailed.

In the top subfigure, the standard deviation of the normal distribution is 2, and the draws are
clustered around the mean.
In the middle subfigure, the standard deviation is increased to 12 and, as expected, the
amount of dispersion rises.
The bottom subfigure, with the Cauchy draws, shows a different pattern: tight clustering
around the mean for the great majority of observations, combined with a few sudden large
deviations from the mean.
This is typical of a heavy-tailed distribution.

7.3.2 Heavy Tails in Asset Returns

Next let’s look at some financial data.


Our aim is to plot the daily change in the price of Amazon (AMZN) stock for the period from
1st January 2015 to 1st November 2019.
This equates to daily returns if we set dividends aside.
The code below produces the desired plot using Yahoo financial data via the yfinance library.

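In [4]: import yfinance as yf
import pandas as pd

# auto_adjust=False keeps the 'Adj Close' column in recent yfinance versions;
# squeeze() turns a one-column DataFrame into a Series if one is returned
s = yf.download('AMZN', '2015-1-1', '2019-11-1', auto_adjust=False)['Adj Close'].squeeze()

r = s.pct_change()

fig, ax = plt.subplots()

ax.plot(r, linestyle='', marker='o', alpha=0.5, ms=4)
ax.vlines(r.index, 0, r.values, lw=0.2)

ax.set_ylabel('returns', fontsize=12)
ax.set_xlabel('date', fontsize=12)

plt.show()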


Five of the 1217 observations are more than 5 standard deviations from the mean.
Overall, the figure is suggestive of heavy tails, although not to the same degree as the Cauchy distribution in the figure above.
If, however, one takes tick-by-tick data rather than daily data, the heavy-tailedness of the distribution increases further.
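To put five observations beyond 5 standard deviations in perspective, the sketch below computes how many such observations one would expect among 1217 IID normal draws, using scipy.stats.norm.sf for the upper tail probability. Treating daily returns as IID normal is an assumption made only for this comparison.

```python
from scipy.stats import norm

num_obs = 1217
p_beyond_5sd = 2 * norm.sf(5)          # P(|Z| > 5) for Z ~ N(0, 1)
expected = num_obs * p_beyond_5sd

print(f"P(|Z| > 5) = {p_beyond_5sd:.3e}")
print(f"expected count among {num_obs} normal draws = {expected:.2e}")
```

Under normality one would expect far less than a single such observation, so seeing five is hard to reconcile with thin tails.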

7.4 Failure of the LLN

One impact of heavy tails is that sample averages can be poor estimators of the underlying
mean of the distribution.
To understand this point better, recall our earlier discussion of the Law of Large Numbers,
which considered IID 𝑋₁, …, 𝑋ₙ with common distribution 𝐹.
If 𝔼|𝑋ᵢ| is finite, then the sample mean 𝑋̄ₙ := (1/𝑛) ∑ᵢ₌₁ⁿ 𝑋ᵢ satisfies

$$
\mathbb{P}\{\bar X_n \to \mu \text{ as } n \to \infty\} = 1 \tag{1}
$$

where 𝜇 := 𝔼𝑋ᵢ = ∫ 𝑥 𝐹(𝑑𝑥) is the common mean of the 𝑋ᵢ.


The condition 𝔼|𝑋ᵢ| = ∫ |𝑥| 𝐹(𝑑𝑥) < ∞ holds in most cases but can fail if the distribution 𝐹 is very heavy-tailed.
For example, it fails for the Cauchy distribution.
Let’s have a look at the behavior of the sample mean in this case, and see whether or not the
LLN is still valid.

In [5]: from scipy.stats import cauchy

np.random.seed(1234)
N = 1_000

distribution = cauchy()

fig, ax = plt.subplots()
data = distribution.rvs(N)

# Compute sample mean at each n (sample_mean[n] averages the first n + 1 draws)
sample_mean = np.empty(N)
for n in range(N):
    sample_mean[n] = np.mean(data[:n+1])

# Plot
ax.plot(range(N), sample_mean, alpha=0.6, label='$\\bar X_n$')
ax.plot(range(N), np.zeros(N), 'k--', lw=0.5)
ax.legend()

plt.show()

The sequence shows no sign of converging.


Will convergence occur if we take 𝑛 even larger?
The answer is no.
To see this, recall that the characteristic function of the Cauchy distribution is

$$
\phi(t) = \mathbb{E} e^{itX} = \int e^{itx} f(x) dx = e^{-|t|} \tag{2}
$$

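Using independence, the characteristic function of the sample mean becomes

$$
\begin{aligned}
\mathbb{E} e^{it\bar X_n}
& = \mathbb{E} \exp\left\{ i \frac{t}{n} \sum_{j=1}^n X_j \right\} \\
& = \mathbb{E} \prod_{j=1}^n \exp\left\{ i \frac{t}{n} X_j \right\} \\
& = \prod_{j=1}^n \mathbb{E} \exp\left\{ i \frac{t}{n} X_j \right\} = [\phi(t/n)]^n
\end{aligned}
$$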

In view of (2), this is just [exp(−|𝑡|/𝑛)]ⁿ = exp(−|𝑡|), the characteristic function of the standard Cauchy distribution.


Thus, in the case of the Cauchy distribution, the sample mean itself has the very same
Cauchy distribution, regardless of 𝑛!
In particular, the sequence 𝑋̄ 𝑛 does not converge to any point.
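This can also be checked by simulation. The standard Cauchy distribution has quartiles at −1 and +1 (its cdf is 1/2 + arctan(𝑥)/𝜋), so the sketch below draws many sample means with 𝑛 = 100 and compares their empirical quartiles with those values. It uses NumPy's Generator.standard_cauchy for the draws; the sample sizes and seed are illustrative.

```python
import numpy as np
from scipy.stats import cauchy

rng = np.random.default_rng(5)
n, reps = 100, 20_000
draws = rng.standard_cauchy(size=(reps, n))   # each row is one sample of size n
sample_means = draws.mean(axis=1)

q25, q75 = np.quantile(sample_means, [0.25, 0.75])
print(f"empirical quartiles of X_bar_n: {q25:.3f}, {q75:.3f}")
print(f"standard Cauchy quartiles:      {cauchy.ppf(0.25):.3f}, {cauchy.ppf(0.75):.3f}")
```

If 𝑋̄ₙ concentrated around some point as 𝑛 grew, its quartiles would move together; instead they stay near ±1.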

7.5 Classifying Tail Properties

To keep our discussion precise, we need some definitions concerning tail properties.
We will focus our attention on the right hand tails of nonnegative random variables and their
distributions.
The definitions for left hand tails are very similar and we omit them to simplify the exposition.

7.5.1 Light and Heavy Tails

A distribution 𝐹 on ℝ+ is called heavy-tailed if


$$
\int_0^\infty \exp(tx) F(dx) = \infty \quad \text{for all } t > 0. \tag{3}
$$

We say that a nonnegative random variable 𝑋 is heavy-tailed if its distribution 𝐹(𝑥) := ℙ{𝑋 ≤ 𝑥} is heavy-tailed.
This is equivalent to stating that its moment generating function 𝑚(𝑡) ∶= 𝔼 exp(𝑡𝑋) is
infinite for all 𝑡 > 0.
• For example, the lognormal distribution is heavy-tailed because its moment generating
function is infinite everywhere on (0, ∞).
A distribution 𝐹 on ℝ+ is called light-tailed if it is not heavy-tailed.
A nonnegative random variable 𝑋 is light-tailed if its distribution 𝐹 is light-tailed.
• Example: Every random variable with bounded support is light-tailed. (Why?)
• Example: If 𝑋 has the exponential distribution, with cdf 𝐹(𝑥) = 1 − exp(−𝜆𝑥) for some 𝜆 > 0, then its moment generating function is finite whenever 𝑡 < 𝜆. Hence 𝑋 is light-tailed.
One can show that if 𝑋 is light-tailed, then all of its moments are finite.
The contrapositive is that if some moment is infinite, then 𝑋 is heavy-tailed.

The latter condition is not necessary, however.


• Example: the lognormal distribution is heavy-tailed but every moment is finite.

7.5.2 Pareto Tails

One specific class of heavy-tailed distributions has been found repeatedly in economic and
social phenomena: the class of so-called power laws.
Specifically, given 𝛼 > 0, a nonnegative random variable 𝑋 is said to have a Pareto tail with
tail index 𝛼 if

$$
\lim_{x \to \infty} x^\alpha \, \mathbb{P}\{X > x\} = c \quad \text{for some } c > 0. \tag{4}
$$

Evidently (4) implies the existence of positive constants 𝑏 and 𝑥̄ such that ℙ{𝑋 > 𝑥} ≥ 𝑏𝑥⁻ᵅ whenever 𝑥 ≥ 𝑥̄.
The implication is that ℙ{𝑋 > 𝑥} converges to zero no faster than 𝑥⁻ᵅ.
In some sources, a random variable obeying (4) is said to have a power law tail.
The primary example is the Pareto distribution, which has cdf

$$
F(x) =
\begin{cases}
1 - (\bar x / x)^{\alpha} & \text{if } x \geq \bar x \\
0 & \text{if } x < \bar x
\end{cases}
\tag{5}
$$

for some positive constants 𝑥̄ and 𝛼.


It is easy to see that if 𝑋 ∼ 𝐹 , then ℙ{𝑋 > 𝑥} satisfies (4).
Thus, in line with the terminology, Pareto distributed random variables have a Pareto tail.
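To spell this out: for 𝑥 ≥ 𝑥̄, (5) gives ℙ{𝑋 > 𝑥} = (𝑥̄/𝑥)^𝛼, so 𝑥^𝛼 ℙ{𝑋 > 𝑥} equals the constant 𝑥̄^𝛼 and (4) holds with 𝑐 = 𝑥̄^𝛼. The sketch below checks this numerically with scipy.stats.pareto, whose shape parameter b plays the role of 𝛼 and whose scale plays the role of 𝑥̄ (the values 𝛼 = 1.5 and 𝑥̄ = 2 are arbitrary).

```python
import numpy as np
from scipy.stats import pareto

alpha, x_bar = 1.5, 2.0
dist = pareto(b=alpha, scale=x_bar)        # cdf (5) with these parameters

xs = np.array([2.0, 10.0, 100.0, 1_000.0])
scaled_tail = xs**alpha * dist.sf(xs)      # x^alpha * P{X > x}
print(scaled_tail)                          # every entry equals x_bar**alpha
print(x_bar**alpha)
```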

7.5.3 Rank-Size Plots

One graphical technique for investigating Pareto tails and power laws is the so-called rank-size plot.
This kind of figure plots log size against log rank of the population (i.e., location in the population when sorted from largest to smallest).
Often just the largest 5% or 10% of observations are plotted.
For a sufficiently large number of draws from a Pareto distribution, the plot generates a straight line. For distributions with thinner tails, the data points trace out a concave curve.
A discussion of why this occurs can be found in [84].
The figure below provides one example, using simulated data.
The rank-size plot shows draws from three different distributions: folded normal, chi-squared with 1 degree of freedom and Pareto.
The Pareto sample produces a straight line, while the lines produced by the other samples are
concave.

You are asked to reproduce this figure in the exercises.
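Why a straight line? A heuristic: with 𝑁 draws from (5), the observation of rank 𝑟 (counting from the largest) is roughly the 𝑥 at which 𝑁 ℙ{𝑋 > 𝑥} = 𝑟, that is, 𝑥 ≈ 𝑥̄(𝑁/𝑟)^(1/𝛼). Taking logs gives log size ≈ const − (1/𝛼) log rank, a line with slope −1/𝛼. The sketch below checks this by sorting simulated Pareto draws from largest to smallest and fitting a line to the top 1% in log-log coordinates with np.polyfit. The sample size, 𝛼 and cutoff are illustrative, and this plain least-squares fit is a rough diagnostic rather than a recommended estimator of 𝛼.

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, x_bar, N = 1.5, 1.0, 1_000_000

# Inverse-transform sampling from the Pareto cdf (5); 1 - U lies in (0, 1]
draws = x_bar / (1 - rng.uniform(size=N))**(1 / alpha)

sizes = np.sort(draws)[::-1]            # largest first, so rank 1 is the largest
ranks = np.arange(1, N + 1)
top = N // 100                          # keep the largest 1% of observations

slope, intercept = np.polyfit(np.log(ranks[:top]), np.log(sizes[:top]), 1)
print(f"fitted slope = {slope:.3f}, -1/alpha = {-1 / alpha:.3f}")
```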

7.6 Exercises

7.6.1 Exercise 1

Replicate the figure presented above that compares normal and Cauchy draws.
Use np.random.seed(11) to set the seed.

7.6.2 Exercise 2

Prove: If 𝑋 has a Pareto tail with tail index 𝛼, then 𝔼[𝑋ʳ] = ∞ for all 𝑟 ≥ 𝛼.

7.6.3 Exercise 3

Repeat exercise 1, but replace the three distributions (two normal, one Cauchy) with three
Pareto distributions using different choices of 𝛼.
For 𝛼, try 1.15, 1.5 and 1.75.
Use np.random.seed(11) to set the seed.

7.6.4 Exercise 4

Replicate the rank-size plot figure presented above.


If you like, you can use the function qe.rank_size from the quantecon library to generate the plots.
Use np.random.seed(13) to set the seed.

7.6.5 Exercise 5

There is an ongoing argument about whether the firm size distribution should be modeled as
a Pareto distribution or a lognormal distribution (see, e.g., [40], [65] or [99]).
This sounds esoteric but has real implications for a variety of economic phenomena.
To illustrate this fact in a simple way, let us consider an economy with 100,000 firms, an interest rate of r = 0.05 and a corporate tax rate of 15%.
Your task is to estimate the present discounted value of projected corporate tax revenue over
the next 10 years.
Because we are forecasting, we need a model.
We will suppose that

1. the number of firms and the firm size distribution (measured in profits) remain fixed and

2. the firm size distribution is either lognormal or Pareto.

Present discounted value of tax revenue will be estimated by

1. generating 100,000 draws of firm profit from the firm size distribution,

2. multiplying by the tax rate, and

3. summing the results with discounting to obtain present value.

The Pareto distribution is assumed to take the form (5) with 𝑥̄ = 1 and 𝛼 = 1.05.
(The value of the tail index 𝛼 is plausible given the data [41].)
To make the lognormal option as similar as possible to the Pareto option, choose its parameters such that the mean and median of both distributions are the same.
Note that, for each distribution, your estimate of tax revenue will be random because it is
based on a finite number of draws.

To take this into account, generate 100 replications (evaluations of tax revenue) for each of
the two distributions and compare the two samples by
• producing a violin plot visualizing the two samples side-by-side and
• printing the mean and standard deviation of both samples.
For the seed use np.random.seed(1234).
What differences do you observe?
(Note: a better approach to this problem would be to model firm dynamics and try to track individual firms given the current distribution. We will discuss firm dynamics in later lectures.)

7.7 Solutions

7.7.1 Exercise 1

In [6]: from scipy.stats import cauchy

n = 120
np.random.seed(11)

fig, axes = plt.subplots(3, 1, figsize=(6, 12))

for ax in axes:
    ax.set_ylim((-120, 120))

s_vals = 2, 12

for ax, s in zip(axes[:2], s_vals):
    data = np.random.randn(n) * s
    ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
    ax.vlines(list(range(n)), 0, data, lw=0.2)
    ax.set_title(rf"draws from $N(0, \sigma^2)$ with $\sigma = {s}$", fontsize=11)

ax = axes[2]
distribution = cauchy()
data = distribution.rvs(n)
ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
ax.vlines(list(range(n)), 0, data, lw=0.2)
ax.set_title("draws from the Cauchy distribution", fontsize=11)

plt.subplots_adjust(hspace=0.25)

plt.show()

7.7.2 Exercise 2

Let 𝑋 have a Pareto tail with tail index 𝛼 and let 𝐹 be its cdf.
Fix 𝑟 ≥ 𝛼.
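As discussed after (4), we can take positive constants 𝑏 and 𝑥̄ such that

$$
\mathbb{P}\{X > x\} \geq b x^{-\alpha} \quad \text{whenever } x \geq \bar x
$$

But then

$$
\mathbb{E} X^r = r \int_0^\infty x^{r-1} \mathbb{P}\{X > x\} dx
\geq r \int_0^{\bar x} x^{r-1} \mathbb{P}\{X > x\} dx + r \int_{\bar x}^\infty x^{r-1} b x^{-\alpha} dx.
$$

The first integral on the right is nonnegative, and we know that $\int_{\bar x}^\infty x^{r-\alpha-1} dx = \infty$ whenever 𝑟 − 𝛼 − 1 ≥ −1.
Since 𝑟 ≥ 𝛼, we have 𝔼𝑋ʳ = ∞.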

7.7.3 Exercise 3

In [7]: from scipy.stats import pareto

np.random.seed(11)

n = 120
alphas = [1.15, 1.50, 1.75]

fig, axes = plt.subplots(3, 1, figsize=(6, 8))

for (a, ax) in zip(alphas, axes):
    ax.set_ylim((-5, 50))
    data = pareto.rvs(size=n, scale=1, b=a)
    ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
    ax.vlines(list(range(n)), 0, data, lw=0.2)
    ax.set_title(f"Pareto draws with $\\alpha = {a}$", fontsize=11)

plt.subplots_adjust(hspace=0.4)

plt.show()

7.7.4 Exercise 4

First let’s generate the data for the plots:

In [8]: sample_size = 1000
np.random.seed(13)
z = np.random.randn(sample_size)

data_1 = np.abs(z)
data_2 = np.exp(z)
data_3 = np.exp(np.random.exponential(scale=1.0, size=sample_size))

data_list = [data_1, data_2, data_3]

Now we plot the data:

In [9]: fig, axes = plt.subplots(3, 1, figsize=(6, 8))
axes = axes.flatten()
labels = ['$|z|$', r'$\exp(z)$', 'Pareto with tail index $1.0$']

for data, label, ax in zip(data_list, labels, axes):

    rank_data, size_data = qe.rank_size(data)

    ax.loglog(rank_data, size_data, 'o', markersize=3.0, alpha=0.5, label=label)
    ax.set_xlabel("log rank")
    ax.set_ylabel("log size")

    ax.legend()

fig.subplots_adjust(hspace=0.4)

plt.show()
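To see why the third panel is close to a straight line, here is a minimal sketch of a rank-size computation in plain NumPy. The helper name rank_size_np is hypothetical, and this is not QuantEcon's implementation. If $X$ is Pareto with tail index $\alpha$, an observation of size $x$ has rank of roughly $n \mathbb{P}\{X > x\} \propto x^{-\alpha}$, so a least-squares line through log rank versus log size should have slope near $-\alpha$. For data generated like data_3 above, $\alpha = 1$; the seed and sample size are arbitrary.

```python
import numpy as np

def rank_size_np(data):
    "Return ranks 1..n and the data sorted from largest to smallest."
    size = np.sort(np.asarray(data))[::-1]
    rank = np.arange(1, len(size) + 1)
    return rank, size

rng = np.random.default_rng(13)
sample = np.exp(rng.exponential(scale=1.0, size=10_000))  # Pareto, tail index 1

rank, size = rank_size_np(sample)
slope, intercept = np.polyfit(np.log(size), np.log(rank), 1)
print(f"fitted slope of log rank on log size: {slope:.3f}")
```

Repeating the fit with np.abs(z) or np.exp(z) data typically gives a curve rather than a line, which is what the first two panels show.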

7.7.5 Exercise 5

To do the exercise, we need to choose the parameters 𝜇 and 𝜎 of the lognormal distribution to
match the mean and median of the Pareto distribution.
Here we understand the lognormal distribution as that of the random variable exp(𝜇 + 𝜎𝑍)
when 𝑍 is standard normal.
The mean and median of the Pareto distribution (5) with 𝑥̄ = 1 are

$$\text{mean} = \frac{\alpha}{\alpha - 1} \quad \text{and} \quad \text{median} = 2^{1/\alpha}$$

Using the corresponding expressions for the lognormal distribution leads us to the equations

$$\frac{\alpha}{\alpha - 1} = \exp(\mu + \sigma^2/2) \quad \text{and} \quad 2^{1/\alpha} = \exp(\mu)$$

which we solve for 𝜇 and 𝜎 given 𝛼 = 1.05.
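Solving, $\mu = \log(2)/\alpha$ and $\sigma^2 = 2(\log(\alpha/(\alpha - 1)) - \log(2)/\alpha)$, which is what the cell computing μ and σ_sq in this solution implements. As a hedged cross-check, the sketch below compares the closed forms with scipy.stats.pareto (shape b plays the role of $\alpha$, scale the role of $\bar x$) and scipy.stats.lognorm (shape s is $\sigma$, scale is $\exp(\mu)$), and confirms that the matched lognormal has the same mean and median as the Pareto distribution.

```python
import numpy as np
from scipy.stats import pareto, lognorm

α = 1.05  # tail index used in the exercise

# closed-form Pareto mean and median with x_bar = 1
pareto_mean = α / (α - 1)
pareto_median = 2**(1 / α)

# lognormal parameters that match them
μ = np.log(2) / α
σ = np.sqrt(2 * (np.log(α / (α - 1)) - np.log(2) / α))

pareto_dist = pareto(b=α, scale=1.0)
lognorm_dist = lognorm(s=σ, scale=np.exp(μ))

print(pareto_dist.mean(), pareto_mean)
print(pareto_dist.median(), pareto_median)
print(lognorm_dist.mean(), lognorm_dist.median())
```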


Here is code that generates the two samples, produces the violin plot and prints the mean
and standard deviation of the two samples.

In [10]: num_firms = 100_000
num_years = 10
tax_rate = 0.15
r = 0.05

β = 1 / (1 + r)    # discount factor

x_bar = 1.0
α = 1.05

def pareto_rvs(n):
    "Uses a standard method to generate Pareto draws."
    u = np.random.uniform(size=n)
    y = x_bar / (u**(1/α))
    return y
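The "standard method" in pareto_rvs is inverse transform sampling: if $U$ is uniform on $(0, 1)$, then $Y = \bar x U^{-1/\alpha}$ satisfies $\mathbb{P}\{Y > y\} = \mathbb{P}\{U < (\bar x / y)^{\alpha}\} = (\bar x / y)^{\alpha}$ for $y \geq \bar x$, which is the Pareto tail. A rough empirical check, with an arbitrary seed, sample size and threshold, might look like this:

```python
import numpy as np

x_bar, α = 1.0, 1.05
rng = np.random.default_rng(42)

u = rng.uniform(size=1_000_000)
y = x_bar / u**(1 / α)          # same transformation as pareto_rvs

threshold = 2.0
empirical_tail = np.mean(y > threshold)
theoretical_tail = (x_bar / threshold)**α
print(empirical_tail, theoretical_tail)
```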

Let’s compute the lognormal parameters:

In [11]: μ = np.log(2) / α
σ_sq = 2 * (np.log(α/(α - 1)) - np.log(2)/α)
σ = np.sqrt(σ_sq)

Here’s a function to compute a single estimate of tax revenue for a particular choice of distribution dist.

In [12]: def tax_rev(dist):
    tax_raised = 0
    for t in range(num_years):
        if dist == 'pareto':
            π = pareto_rvs(num_firms)
        else:
            π = np.exp(μ + σ * np.random.randn(num_firms))
        tax_raised += β**t * np.sum(π * tax_rate)
    return tax_raised

Now let’s generate the violin plot.

In [13]: num_reps = 100
np.random.seed(1234)

tax_rev_lognorm = np.empty(num_reps)
tax_rev_pareto = np.empty(num_reps)

for i in range(num_reps):
    tax_rev_pareto[i] = tax_rev('pareto')
    tax_rev_lognorm[i] = tax_rev('lognorm')

fig, ax = plt.subplots()

data = tax_rev_pareto, tax_rev_lognorm

ax.violinplot(data)

plt.show()

Finally, let’s print the means and standard deviations.

In [14]: tax_rev_pareto.mean(), tax_rev_pareto.std()

Out[14]: (1458729.0546623734, 406089.3613661567)

In [15]: tax_rev_lognorm.mean(), tax_rev_lognorm.std()

Out[15]: (2556174.8615230713, 25586.44456513965)

Looking at the output of the code, our main conclusion is that the Pareto assumption leads
to a lower mean and greater dispersion.
Part II

Introduction to Dynamics

Chapter 8

Dynamics in One Dimension

8.1 Contents

• Overview 8.2
• Some Definitions 8.3
• Graphical Analysis 8.4
• Exercises 8.5
• Solutions 8.6

8.2 Overview

In this lecture we give a quick introduction to discrete time dynamics in one dimension.
In one-dimensional models, the state of the system is described by a single variable.
Although most interesting dynamic models have two or more state variables, the one-
dimensional setting is a good place to learn the foundations of dynamics and build intuition.
Let’s start with some standard imports:

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

8.3 Some Definitions

This section sets out the objects of interest and the kinds of properties we study.

8.3.1 Difference Equations

A time homogeneous first order difference equation is an equation of the form

$$x_{t+1} = g(x_t) \tag{1}$$

where 𝑔 is a function from some subset 𝑆 of ℝ to itself.


Here 𝑆 is called the state space and 𝑥 is called the state variable.
In the definition,
• time homogeneity means that 𝑔 is the same at each time 𝑡
• first order means dependence on only one lag (i.e., earlier states such as $x_{t-1}$ do not enter into (1)).
If 𝑥0 ∈ 𝑆 is given, then (1) recursively defines the sequence

$$x_0, \quad x_1 = g(x_0), \quad x_2 = g(x_1) = g(g(x_0)), \quad \text{etc.} \tag{2}$$

This sequence is called the trajectory of 𝑥0 under 𝑔.


If we define $g^n$ to be $n$ compositions of $g$ with itself, then we can write the trajectory more simply as $x_t = g^t(x_0)$ for $t \geq 0$.
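In code, a trajectory is just repeated application of $g$. The helper below (the name trajectory is ours, purely illustrative) returns $x_0, x_1, \ldots, x_n$ for a given update function.

```python
def trajectory(g, x0, n):
    "Return [x_0, x_1, ..., x_n] where x_{t+1} = g(x_t)."
    xs = [x0]
    for _ in range(n):
        xs.append(g(xs[-1]))
    return xs

# Example: g(x) = 2x, so x_t = g^t(x_0) = 2^t x_0
print(trajectory(lambda x: 2 * x, 1, 4))   # [1, 2, 4, 8, 16]
```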

8.3.2 Example: A Linear Model

One simple example is the linear difference equation

$$x_{t+1} = a x_t + b, \qquad S = \mathbb{R}$$

where 𝑎, 𝑏 are fixed constants.


In this case, given 𝑥0 , the trajectory (2) is

$$x_0, \quad a x_0 + b, \quad a^2 x_0 + a b + b, \quad \text{etc.} \tag{3}$$

Continuing in this way, and using our knowledge of geometric series, we find that, for any $t \geq 0$ (and provided $a \neq 1$),

$$x_t = a^t x_0 + b \frac{1 - a^t}{1 - a} \tag{4}$$
This is about all we need to know about the linear model.
We have an exact expression for 𝑥𝑡 for all 𝑡 and hence a full understanding of the dynamics.
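A quick way to convince yourself of (4) is to compare it with brute-force iteration. The parameter values below are arbitrary, and the formula assumes $a \neq 1$.

```python
import numpy as np

a, b, x0 = 0.8, 2.0, 10.0   # illustrative values with a != 1
T = 25

# iterate x_{t+1} = a x_t + b directly
x = x0
iterated = [x]
for t in range(T):
    x = a * x + b
    iterated.append(x)

# closed form (4): x_t = a^t x_0 + b (1 - a^t) / (1 - a)
t = np.arange(T + 1)
closed_form = a**t * x0 + b * (1 - a**t) / (1 - a)

print(np.max(np.abs(np.array(iterated) - closed_form)))  # only rounding error remains
```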
Notice in particular that if $|a| < 1$, then, by (4), we have

$$x_t \to \frac{b}{1 - a} \quad \text{as } t \to \infty \tag{5}$$

regardless of $x_0$.
This is an example of what is called global stability, a topic we return to below.

8.3.3 Example: A Nonlinear Model

In the linear example above, we obtained an exact analytical expression for $x_t$ in terms of arbitrary $t$ and $x_0$.
This made analysis of dynamics very easy.

When models are nonlinear, however, the situation can be quite different.
For example, recall how we previously studied the law of motion for the Solow growth model,
a simplified version of which is

$$k_{t+1} = s z k_t^{\alpha} + (1 - \delta) k_t \tag{6}$$

Here 𝑘 is capital stock and 𝑠, 𝑧, 𝛼, 𝛿 are positive parameters with 0 < 𝛼, 𝛿 < 1.
If you try to iterate like we did in (3), you will find that the algebra gets messy quickly.
Analyzing the dynamics of this model requires a different method (see below).

8.3.4 Stability

A steady state of the difference equation $x_{t+1} = g(x_t)$ is a point $x^*$ in $S$ such that $x^* = g(x^*)$.
In other words, 𝑥∗ is a fixed point of the function 𝑔 in 𝑆.
For example, for the linear model 𝑥𝑡+1 = 𝑎𝑥𝑡 + 𝑏, you can use the definition to check that
• 𝑥∗ ∶= 𝑏/(1 − 𝑎) is a steady state whenever 𝑎 ≠ 1.
• if 𝑎 = 1 and 𝑏 = 0, then every 𝑥 ∈ ℝ is a steady state.
• if 𝑎 = 1 and 𝑏 ≠ 0, then the linear model has no steady state in ℝ.
A steady state $x^*$ of $x_{t+1} = g(x_t)$ is called globally stable if, for all $x_0 \in S$,

$$x_t = g^t(x_0) \to x^* \quad \text{as } t \to \infty$$

For example, in the linear model $x_{t+1} = a x_t + b$ with $a \neq 1$, the steady state $x^*$
• is globally stable if |𝑎| < 1 and
• fails to be globally stable otherwise.
This follows directly from (4).
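Concretely, subtracting $x^* = b/(1 - a)$ from both sides of (4) gives $x_t - x^* = a^t (x_0 - x^*)$, so the distance to the steady state shrinks geometrically when $|a| < 1$ and grows when $|a| > 1$ (provided $x_0 \neq x^*$). The sketch below checks this numerically for two arbitrary choices of $a$; the function name is illustrative.

```python
def distance_to_steady_state(a, b, x0, T):
    "Return |x_T - x*| for x_{t+1} = a x_t + b, assuming a != 1."
    x_star = b / (1 - a)
    x = x0
    for _ in range(T):
        x = a * x + b
    return abs(x - x_star)

b, x0, T = 1.0, 5.0, 50
stable = distance_to_steady_state(0.5, b, x0, T)    # |a| < 1: distance shrinks
unstable = distance_to_steady_state(1.5, b, x0, T)  # |a| > 1: distance grows
print(stable, unstable)
```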
A steady state $x^*$ of $x_{t+1} = g(x_t)$ is called locally stable if there exists an $\epsilon > 0$ such that

$$|x_0 - x^*| < \epsilon \implies x_t = g^t(x_0) \to x^* \quad \text{as } t \to \infty$$

Obviously every globally stable steady state is also locally stable.


We will see examples below where the converse is not true.
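For a minimal illustration of our own choosing (not one taken from this lecture), consider $g(x) = x^2$ on $S = \mathbb{R}$, which has steady states $0$ and $1$. The steady state $0$ is locally stable with $\epsilon = 1$, since $|x_0| < 1$ implies $x_t = x_0^{2^t} \to 0$, but it is not globally stable: $x_0 = 2$ diverges and $x_0 = 1$ stays at the other steady state.

```python
def iterate(g, x0, T):
    "Apply g to x0 T times and return x_T."
    x = x0
    for _ in range(T):
        x = g(x)
    return x

g = lambda x: x**2     # steady states at 0 and 1

near = iterate(g, 0.9, 8)    # starts within distance 1 of x* = 0
far = iterate(g, 2.0, 8)     # starts outside that neighborhood
print(near, far)
```

The horizon is kept at 8 steps because $2^{2^{10}}$ would already overflow a Python float.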

8.4 Graphical Analysis

As we saw above, analyzing the dynamics for nonlinear models is nontrivial.


There is no single way to tackle all nonlinear models.
However, there is one technique for one-dimensional models that provides a great deal of intuition.
This is a graphical approach based on 45 degree diagrams.

Let’s look at an example: the Solow model with dynamics given in (6).
We begin with some plotting code that you can ignore at first reading.
The function of the code is to produce 45 degree diagrams and time series plots.

In [2]: def subplots(fs):
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots(figsize=fs)

    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
        ax.spines[spine].set_color('green')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')

    return fig, ax

def plot45(g, xmin, xmax, x0, num_arrows=6, var='x'):

    xgrid = np.linspace(xmin, xmax, 200)

    fig, ax = subplots((6.5, 6))
    ax.set_xlim(xmin, xmax)
    ax.set_ylim(xmin, xmax)

    hw = (xmax - xmin) * 0.01
    hl = 2 * hw
    arrow_args = dict(fc="k", ec="k", head_width=hw,
                      length_includes_head=True, lw=1,
                      alpha=0.6, head_length=hl)

    ax.plot(xgrid, g(xgrid), 'b-', lw=2, alpha=0.6, label='g')
    ax.plot(xgrid, xgrid, 'k-', lw=1, alpha=0.7, label='45')

    x = x0
    xticks = [xmin]
    xtick_labels = [xmin]

    for i in range(num_arrows):
        if i == 0:
            ax.arrow(x, 0.0, 0.0, g(x), **arrow_args)  # x, y, dx, dy
        else:
            ax.arrow(x, x, 0.0, g(x) - x, **arrow_args)
            ax.plot((x, x), (0, x), 'k', ls='dotted')

        ax.arrow(x, g(x), g(x) - x, 0, **arrow_args)
        xticks.append(x)
        xtick_labels.append(r'${}_{}$'.format(var, str(i)))

        x = g(x)
        xticks.append(x)
        xtick_labels.append(r'${}_{}$'.format(var, str(i+1)))
        ax.plot((x, x), (0, x), 'k', ls='dotted')

    xticks.append(xmax)
    xtick_labels.append(xmax)
    ax.set_xticks(xticks)
    ax.set_yticks(xticks)
    ax.set_xticklabels(xtick_labels)
    ax.set_yticklabels(xtick_labels)

    bbox = (0., 1.04, 1., .104)
    legend_args = {'bbox_to_anchor': bbox, 'loc': 'upper right'}

    ax.legend(ncol=2, frameon=False, **legend_args, fontsize=14)
    plt.show()

def ts_plot(g, xmin, xmax, x0, ts_length=6, var='x'):
    fig, ax = subplots((7, 5.5))
    ax.set_ylim(xmin, xmax)
    ax.set_xlabel(r'$t$', fontsize=14)
    ax.set_ylabel(r'${}_t$'.format(var), fontsize=14)
    x = np.empty(ts_length)
    x[0] = x0
    for t in range(ts_length-1):
        x[t+1] = g(x[t])
    ax.plot(range(ts_length),
            x,
            'bo-',
            alpha=0.6,
            lw=2,
            label=r'${}_t$'.format(var))
    ax.legend(loc='best', fontsize=14)
    ax.set_xticks(range(ts_length))
    plt.show()

Let’s create a 45 degree diagram for the Solow model with a fixed set of parameters (in the code, A plays the role of 𝑧 in (6)):

In [3]: A, s, alpha, delta = 2, 0.3, 0.3, 0.4

Here’s the update function corresponding to the model.

In [4]: def g(k):
    return A * s * k**alpha + (1 - delta) * k

Here is the 45 degree plot.

In [5]: xmin, xmax = 0, 4 # Suitable plotting region.

plot45(g, xmin, xmax, 0, num_arrows=0)



The plot shows the function 𝑔 and the 45 degree line.


Think of 𝑘𝑡 as a value on the horizontal axis.
To calculate 𝑘𝑡+1 , we can use the graph of 𝑔 to see its value on the vertical axis.
Clearly,
• If 𝑔 lies above the 45 degree line at this point, then we have 𝑘𝑡+1 > 𝑘𝑡 .
• If 𝑔 lies below the 45 degree line at this point, then we have 𝑘𝑡+1 < 𝑘𝑡 .
• If 𝑔 hits the 45 degree line at this point, then we have 𝑘𝑡+1 = 𝑘𝑡 , so 𝑘𝑡 is a steady state.
For the Solow model, there are two steady states when 𝑆 = ℝ+ = [0, ∞).
• the origin 𝑘 = 0
• the unique positive number such that $k = s z k^{\alpha} + (1 - \delta) k$.
By using some algebra, we can show that in the second case, the steady state is

$$k^* = \left( \frac{s z}{\delta} \right)^{1/(1 - \alpha)}$$
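The algebra is short: dividing $k = s z k^{\alpha} + (1 - \delta) k$ by $k > 0$ gives $\delta = s z k^{\alpha - 1}$, hence $k^* = (s z / \delta)^{1/(1 - \alpha)}$. The sketch below checks numerically that this $k^*$ is a fixed point of $g$, using the parameter values from the plotting example in this lecture (A plays the role of $z$), and that iterating $g$ from $k_0 = 0.25$ approaches it.

```python
A, s, alpha, delta = 2, 0.3, 0.3, 0.4   # A corresponds to z in (6)

def g(k):
    return A * s * k**alpha + (1 - delta) * k

k_star = (s * A / delta)**(1 / (1 - alpha))
print(k_star, g(k_star))    # the two numbers agree up to rounding

k = 0.25
for t in range(200):
    k = g(k)
print(k)                    # close to k_star
```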

8.4.1 Trajectories

By the preceding discussion, in regions where 𝑔 lies above the 45 degree line, we know that
the trajectory is increasing.
The next figure traces out a trajectory in such a region so we can see this more clearly.
The initial condition is 𝑘0 = 0.25.

In [6]: k0 = 0.25

plot45(g, xmin, xmax, k0, num_arrows=5, var='k')

We can plot the time series of capital corresponding to the figure above as follows:

In [7]: ts_plot(g, xmin, xmax, k0, var='k')



Here’s a somewhat longer view:

In [8]: ts_plot(g, xmin, xmax, k0, ts_length=20, var='k')



When capital stock is higher than the unique positive steady state, we see that it declines:

In [9]: k0 = 2.95

plot45(g, xmin, xmax, k0, num_arrows=5, var='k')



Here is the time series:

In [10]: ts_plot(g, xmin, xmax, k0, var='k')



8.4.2 Complex Dynamics

The Solow model is nonlinear but still generates very regular dynamics.
One model that generates irregular dynamics is the quadratic map

$$g(x) = 4 x (1 - x), \qquad x \in [0, 1]$$

Let’s have a look at the 45 degree diagram.

In [11]: xmin, xmax = 0, 1
g = lambda x: 4 * x * (1 - x)

x0 = 0.3
plot45(g, xmin, xmax, x0, num_arrows=0)

Now let’s look at a typical trajectory.

In [12]: plot45(g, xmin, xmax, x0, num_arrows=6)



Notice how irregular it is.


Here is the corresponding time series plot.

In [13]: ts_plot(g, xmin, xmax, x0, ts_length=6)



The irregularity is even clearer over a longer time horizon:

In [14]: ts_plot(g, xmin, xmax, x0, ts_length=20)
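One way to quantify this irregularity is sensitive dependence on initial conditions: two trajectories of the quadratic map that start very close together soon move far apart. The starting points and horizon below are arbitrary choices for illustration.

```python
import numpy as np

g = lambda x: 4 * x * (1 - x)

T = 100
x, y = 0.3, 0.3 + 1e-9     # two nearly identical initial conditions
gaps = []
for t in range(T):
    x, y = g(x), g(y)
    gaps.append(abs(x - y))

gaps = np.array(gaps)
print(gaps[:5])            # still tiny
print(gaps.max())          # of order one
first_big = int(np.argmax(gaps > 0.1))
print(f"gap first exceeds 0.1 after {first_big + 1} steps")
```

By contrast, running the same experiment with the Solow update function above would show the gap shrinking, since both trajectories converge to $k^*$.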



8.5 Exercises

8.5.1 Exercise 1

Consider again the linear model 𝑥𝑡+1 = 𝑎𝑥𝑡 + 𝑏 with 𝑎 ≠ 1.


The unique steady state is 𝑏/(1 − 𝑎).
The steady state is globally stable if |𝑎| < 1.
Try to illustrate this graphically by looking at a range of initial conditions.
What differences do you notice in the cases 𝑎 ∈ (−1, 0) and 𝑎 ∈ (0, 1)?
Use 𝑎 = 0.5 and then 𝑎 = −0.5 and study the trajectories.
Set 𝑏 = 1 throughout.

8.6 Solutions

8.6.1 Exercise 1

We will start with the case 𝑎 = 0.5.


Let’s set up the model and plotting region:

In [15]: a, b = 0.5, 1
xmin, xmax = -1, 3
g = lambda x: a * x + b

Now let’s plot a trajectory:

In [16]: x0 = -0.5
plot45(g, xmin, xmax, x0, num_arrows=5)

Here is the corresponding time series, which converges towards the steady state.

In [17]: ts_plot(g, xmin, xmax, x0, ts_length=10)



Now let’s try 𝑎 = −0.5 and see what differences we observe.


Let’s set up the model and plotting region:

In [18]: a, b = -0.5, 1
xmin, xmax = -1, 3
g = lambda x: a * x + b

Now let’s plot a trajectory:

In [19]: x0 = -0.5
plot45(g, xmin, xmax, x0, num_arrows=5)

Here is the corresponding time series, which converges towards the steady state.

In [20]: ts_plot(g, xmin, xmax, x0, ts_length=10)



Once again, we have convergence to the steady state but the nature of convergence differs.
In particular, the time series jumps from above the steady state to below it and back again.
In the current context, the series is said to exhibit damped oscillations.
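The sign pattern is easy to verify: for the linear model, $x_t - x^* = a^t (x_0 - x^*)$, so with $a = -0.5$ the deviation from the steady state flips sign every period while halving in size. The values below match the solution above.

```python
import numpy as np

a, b, x0 = -0.5, 1.0, -0.5
x_star = b / (1 - a)            # = 2/3

x = x0
deviations = []
for t in range(10):
    deviations.append(x - x_star)
    x = a * x + b

deviations = np.array(deviations)
print(np.sign(deviations))      # alternates between -1 and +1
```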
Chapter 9

AR1 Processes

9.1 Contents

• Overview 9.2
• The AR(1) Model 9.3
• Stationarity and Asymptotic Stability 9.4
• Ergodicity 9.5
• Exercises 9.6
• Solutions 9.7

9.2 Overview

In this lecture we are going to study a very simple class of stochastic models called AR(1)
processes.
These simple models are used again and again in economic research to represent the dynamics
of series such as
• labor income
• dividends
• productivity, etc.
AR(1) processes can take negative values but are easily converted into positive processes
when necessary by a transformation such as exponentiation.
We are going to study AR(1) processes partly because they are useful and partly because
they help us understand important concepts.
Let’s start with some imports:

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

9.3 The AR(1) Model

The AR(1) model (autoregressive model of order 1) takes the form


$$X_{t+1} = a X_t + b + c W_{t+1} \tag{1}$$

where 𝑎, 𝑏, 𝑐 are scalar-valued parameters.


This law of motion generates a time series {𝑋𝑡 } as soon as we specify an initial condition 𝑋0 .
This is called the state process and the state space is ℝ.
To make things even simpler, we will assume that
• the process {𝑊𝑡 } is IID and standard normal,
• the initial condition 𝑋0 is drawn from the normal distribution 𝑁 (𝜇0 , 𝑣0 ) and
• the initial condition 𝑋0 is independent of {𝑊𝑡 }.

9.3.1 Moving Average Representation

Iterating backwards from time 𝑡, we obtain

$$X_t = a X_{t-1} + b + c W_t = a^2 X_{t-2} + a b + a c W_{t-1} + b + c W_t = \cdots$$

If we work all the way back to time zero, we get

$$X_t = a^t X_0 + b \sum_{j=0}^{t-1} a^j + c \sum_{j=0}^{t-1} a^j W_{t-j} \tag{2}$$

Equation (2) shows that 𝑋𝑡 is a well defined random variable, the value of which depends on
• the parameters,
• the initial condition 𝑋0 and
• the shocks $W_1, \ldots, W_t$ from time $t = 1$ to the present.
Throughout, the symbol 𝜓𝑡 will be used to refer to the density of this random variable 𝑋𝑡 .
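The moving average representation (2) can be checked directly: feed the same shock sequence into the recursion (1) and into formula (2), and the two results should agree up to rounding error. The parameter values, seed and horizon are arbitrary.

```python
import numpy as np

a, b, c = 0.9, 0.1, 0.5
t_max = 30
rng = np.random.default_rng(0)
X0 = rng.normal()
W = rng.normal(size=t_max + 1)     # W[1], ..., W[t_max] are used; W[0] is ignored

# recursion (1)
X = np.empty(t_max + 1)
X[0] = X0
for t in range(t_max):
    X[t + 1] = a * X[t] + b + c * W[t + 1]

# formula (2) at time t_max
j = np.arange(t_max)               # j = 0, ..., t_max - 1
X_formula = a**t_max * X0 + b * np.sum(a**j) + c * np.sum(a**j * W[t_max - j])

print(X[t_max], X_formula)
```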

9.3.2 Distribution Dynamics

One of the nice things about this model is that it’s so easy to trace out the sequence of distributions $\{\psi_t\}$ corresponding to the time series $\{X_t\}$.
To see this, we first note that $X_t$ is normally distributed for each $t$.
This is immediate from (2), since linear combinations of independent normal random variables are normal.
Given that 𝑋𝑡 is normally distributed, we will know the full distribution 𝜓𝑡 if we can pin
down its first two moments.
Let 𝜇𝑡 and 𝑣𝑡 denote the mean and variance of 𝑋𝑡 respectively.
We can pin down these values from (2) or we can use the following recursive expressions:

$$\mu_{t+1} = a \mu_t + b \quad \text{and} \quad v_{t+1} = a^2 v_t + c^2 \tag{3}$$

These expressions are obtained from (1) by taking, respectively, the expectation and variance
of both sides of the equality.

In calculating the second expression, we are using the fact that 𝑋𝑡 and 𝑊𝑡+1 are independent.
(This follows from our assumptions and (2).)
Given the dynamics in (3) and initial conditions $\mu_0, v_0$, we obtain $\mu_t, v_t$ and hence

$$\psi_t = N(\mu_t, v_t)$$

The following code uses these facts to track the sequence of marginal distributions {𝜓𝑡 }.
The parameters are

In [2]: a, b, c = 0.9, 0.1, 0.5

mu, v = -3.0, 0.6  # initial conditions mu_0, v_0

Here’s the sequence of distributions:

In [3]: from scipy.stats import norm

sim_length = 10
grid = np.linspace(-5, 7, 120)

fig, ax = plt.subplots()

for t in range(sim_length):
    mu = a * mu + b
    v = a**2 * v + c**2
    # after the update, (mu, v) are the moments of ψ_{t+1}
    ax.plot(grid, norm.pdf(grid, loc=mu, scale=np.sqrt(v)),
            label=rf"$\psi_{{{t+1}}}$",
            alpha=0.7)

ax.legend(bbox_to_anchor=[1.05,1],loc=2,borderaxespad=1)

plt.show()

9.4 Stationarity and Asymptotic Stability

Notice that, in the figure above, the sequence $\{\psi_t\}$ seems to be converging to a limiting distribution.
This is even clearer if we project forward further into the future:

In [4]: def plot_density_seq(ax, mu_0=-3.0, v_0=0.6, sim_length=60):
    mu, v = mu_0, v_0
    for t in range(sim_length):
        mu = a * mu + b
        v = a**2 * v + c**2
        ax.plot(grid,
                norm.pdf(grid, loc=mu, scale=np.sqrt(v)),
                alpha=0.5)

fig, ax = plt.subplots()
plot_density_seq(ax)
plt.show()

Moreover, the limit does not depend on the initial condition.


For example, this alternative density sequence also converges to the same limit.

In [5]: fig, ax = plt.subplots()


plot_density_seq(ax, mu_0=3.0)
plt.show()

In fact it’s easy to show that such convergence will occur, regardless of the initial condition,
whenever |𝑎| < 1.
To see this, we just have to look at the dynamics of the first two moments, as given in (3).
When $|a| < 1$, these sequences converge to the respective limits

$$\mu^* := \frac{b}{1 - a} \quad \text{and} \quad v^* := \frac{c^2}{1 - a^2} \tag{4}$$

(See our lecture on one dimensional dynamics for background on deterministic convergence.)
Hence

$$\psi_t \to \psi^* = N(\mu^*, v^*) \quad \text{as } t \to \infty \tag{5}$$

We can confirm this is valid for the sequence above using the following code.

In [6]: fig, ax = plt.subplots()

plot_density_seq(ax, mu_0=3.0)

mu_star = b / (1 - a)
std_star = np.sqrt(c**2 / (1 - a**2))  # square root of v_star
psi_star = norm.pdf(grid, loc=mu_star, scale=std_star)
ax.plot(grid, psi_star, 'k-', lw=2, label=r"$\psi^*$")
ax.legend()

plt.show()

As claimed, the sequence {𝜓𝑡 } converges to 𝜓∗ .
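The same convergence can be read off numerically from the moment recursions (3). Iterating them from the two initial means used above, with the lecture's parameters a, b, c = 0.9, 0.1, 0.5 and $v_0 = 0.6$, both sequences settle at $\mu^*$ and $v^*$ from (4). The helper name and the number of steps are illustrative.

```python
import numpy as np

a, b, c = 0.9, 0.1, 0.5

def moments_after(mu, v, T):
    "Iterate mu_{t+1} = a mu_t + b and v_{t+1} = a^2 v_t + c^2 for T steps."
    for _ in range(T):
        mu, v = a * mu + b, a**2 * v + c**2
    return mu, v

mu_star, v_star = b / (1 - a), c**2 / (1 - a**2)
for mu_0 in (-3.0, 3.0):
    print(moments_after(mu_0, 0.6, 500), (mu_star, v_star))
```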

9.4.1 Stationary Distributions

A stationary distribution is a distribution that is a fixed point of the update rule for distributions.
In other words, if $\psi_t$ is stationary, then $\psi_{t+j} = \psi_t$ for all $j$ in $\mathbb{N}$.
A different way to put this, specialized to the current setting, is as follows: a density 𝜓 on ℝ
is stationary for the AR(1) process if

$$X_t \sim \psi \implies a X_t + b + c W_{t+1} \sim \psi$$

The distribution 𝜓∗ in (5) has this property — checking this is an exercise.


(Of course, we are assuming that |𝑎| < 1 so that 𝜓∗ is well defined.)
In fact, it can be shown that no other distribution on ℝ has this property.
Thus, when |𝑎| < 1, the AR(1) model has exactly one stationary density and that density is
given by 𝜓∗ .
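For the record, here is one way to carry out the check mentioned above. If $X_t \sim N(\mu^*, v^*)$ and $W_{t+1}$ is an independent standard normal, then $a X_t + b + c W_{t+1}$ is normal with mean and variance

$$a \mu^* + b = \frac{a b}{1 - a} + b = \frac{b}{1 - a} = \mu^*, \qquad a^2 v^* + c^2 = \frac{a^2 c^2}{1 - a^2} + c^2 = \frac{c^2}{1 - a^2} = v^*,$$

so its distribution is again $N(\mu^*, v^*) = \psi^*$.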

9.5 Ergodicity

The concept of ergodicity is used in different ways by different authors.


One way to understand it in the present setting is that a version of the Law of Large Numbers is valid for $\{X_t\}$, even though it is not IID.
In particular, averages over time series converge to expectations under the stationary distribution.

Indeed, it can be proved that, whenever |𝑎| < 1, we have

$$\frac{1}{m} \sum_{t=1}^m h(X_t) \to \int h(x) \psi^*(x) \, dx \quad \text{as } m \to \infty \tag{6}$$

whenever the integral on the right hand side is finite and well defined.
Notes:
• In (6), convergence holds with probability one.
• The textbook by [81] is a classic reference on ergodicity.
For example, if we consider the identity function ℎ(𝑥) = 𝑥, we get

$$\frac{1}{m} \sum_{t=1}^m X_t \to \int x \psi^*(x) \, dx \quad \text{as } m \to \infty$$

In other words, the time series sample mean converges to the mean of the stationary distribution.
As will become clear over the next few lectures, ergodicity is a very important concept for
statistics and simulation.
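A minimal simulation of (6) with $h(x) = x$, assuming the lecture's parameters a, b, c = 0.9, 0.1, 0.5 (so $\mu^* = 1$): the sample mean along one long path should be close to $\mu^*$, even though consecutive values are correlated. The seed, path length and initial condition are arbitrary.

```python
import numpy as np

a, b, c = 0.9, 0.1, 0.5
mu_star = b / (1 - a)

m = 200_000
rng = np.random.default_rng(1234)
W = rng.standard_normal(m)

x = -3.0                      # arbitrary initial condition
total = 0.0
for t in range(m):
    x = a * x + b + c * W[t]
    total += x

print(total / m, mu_star)
```

Because of the serial correlation, the time average converges more slowly than an average of IID draws with the same marginal distribution would.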

9.6 Exercises

9.6.1 Exercise 1

Let 𝑘 be a natural number.


The 𝑘-th central moment of a random variable is defined as

$$M_k := \mathbb{E}[(X - \mathbb{E} X)^k]$$

When that random variable is $N(\mu, \sigma^2)$, it is known that

$$M_k = \begin{cases} 0 & \text{if } k \text{ is odd} \\ \sigma^k (k - 1)!! & \text{if } k \text{ is even} \end{cases}$$

Here 𝑛!! is the double factorial.
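For example, $5!! = 5 \cdot 3 \cdot 1 = 15$ and $6!! = 6 \cdot 4 \cdot 2 = 48$, so the fourth central moment of $N(\mu, \sigma^2)$ is $M_4 = 3 \sigma^4$. The sketch below computes double factorials with a small hypothetical helper and compares the formula with the moment method of a frozen scipy.stats.norm distribution, which returns non-central moments; with mean zero these coincide with central moments. The value σ = 2 is arbitrary.

```python
from scipy.stats import norm

def double_factorial(n):
    "n!! = n (n - 2) (n - 4) ... down to 1 or 2; returns 1 for n <= 1."
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def gaussian_central_moment(k, sigma):
    "M_k for N(mu, sigma^2), using the formula in the text."
    return 0.0 if k % 2 == 1 else sigma**k * double_factorial(k - 1)

sigma = 2.0
for k in range(1, 7):
    print(k, gaussian_central_moment(k, sigma), norm(loc=0, scale=sigma).moment(k))
```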


According to (6), we should have, for any 𝑘 ∈ ℕ,

$$\frac{1}{m} \sum_{t=1}^m (X_t - \mu^*)^k \approx M_k$$

when 𝑚 is large.
Confirm this by simulation at a range of 𝑘 using the default parameters from the lecture.

9.6.2 Exercise 2

Write your own version of a one dimensional kernel density estimator, which estimates a density from a sample.
Write it as a class that takes the data 𝑋 and bandwidth ℎ when initialized and provides a
method 𝑓 such that

$$f(x) = \frac{1}{hn} \sum_{i=1}^n K \left( \frac{x - X_i}{h} \right)$$

For 𝐾 use the Gaussian kernel (𝐾 is the standard normal density).


Write the class so that the bandwidth defaults to Silverman’s rule (see the “rule of thumb”
discussion on this page). Test the class you have written by going through the steps

1. simulate data 𝑋1 , … , 𝑋𝑛 from distribution 𝜙

2. plot the kernel density estimate over a suitable range

3. plot the density of 𝜙 on the same figure

for distributions 𝜙 of the following types


• beta distribution with 𝛼 = 𝛽 = 2
• beta distribution with 𝛼 = 2 and 𝛽 = 5
• beta distribution with 𝛼 = 𝛽 = 0.5
Use 𝑛 = 500.
Make a comment on your results. (Do you think this is a good estimator of these distributions?)

9.6.3 Exercise 3

In the lecture we discussed the following fact: for the 𝐴𝑅(1) process

$$X_{t+1} = a X_t + b + c W_{t+1}$$

with {𝑊𝑡 } iid and standard normal,

$$\psi_t = N(\mu, s^2) \implies \psi_{t+1} = N(a \mu + b, a^2 s^2 + c^2)$$

Confirm this, at least approximately, by simulation. Let


• 𝑎 = 0.9
• 𝑏 = 0.0
• 𝑐 = 0.1
• 𝜇 = −3
• 𝑠 = 0.2
First, plot 𝜓𝑡 and 𝜓𝑡+1 using the true distributions described above.
Second, plot 𝜓𝑡+1 on the same figure (in a different color) as follows:

1. Generate 𝑛 draws of 𝑋𝑡 from the 𝑁 (𝜇, 𝑠2 ) distribution

2. Update them all using the rule 𝑋𝑡+1 = 𝑎𝑋𝑡 + 𝑏 + 𝑐𝑊𝑡+1

3. Use the resulting sample of 𝑋𝑡+1 values to produce a density estimate via kernel density
estimation.

Try this for 𝑛 = 2000 and confirm that the simulation based estimate of 𝜓𝑡+1 does converge
to the theoretical distribution.

9.7 Solutions

9.7.1 Exercise 1

In [7]: from numba import njit
from scipy.special import factorial2

@njit
def sample_moments_ar1(k, m=100_000, mu_0=0.0, sigma_0=1.0, seed=1234):
    np.random.seed(seed)
    sample_sum = 0.0
    x = mu_0 + sigma_0 * np.random.randn()
    for t in range(m):
        sample_sum += (x - mu_star)**k
        x = a * x + b + c * np.random.randn()
    return sample_sum / m

def true_moments_ar1(k):
    if k % 2 == 0:
        return std_star**k * factorial2(k - 1)
    else:
        return 0

k_vals = np.arange(6) + 1
# float arrays: np.empty_like(k_vals) would be integer-valued and truncate the moments
sample_moments = np.empty(len(k_vals))
true_moments = np.empty(len(k_vals))

for k_idx, k in enumerate(k_vals):
    sample_moments[k_idx] = sample_moments_ar1(k)
    true_moments[k_idx] = true_moments_ar1(k)

fig, ax = plt.subplots()
ax.plot(k_vals, true_moments, label="true moments")
ax.plot(k_vals, sample_moments, label="sample moments")
ax.legend()

plt.show()

9.7.2 Exercise 2

Here is one solution:

In [8]: K = norm.pdf

class KDE:

    def __init__(self, x_data, h=None):

        if h is None:
            c = x_data.std()
            n = len(x_data)
            h = 1.06 * c * n**(-1/5)
        self.h = h
        self.x_data = x_data

    def f(self, x):
        if np.isscalar(x):
            return K((x - self.x_data) / self.h).mean() * (1/self.h)
        else:
            y = np.empty_like(x)
            for i, x_val in enumerate(x):
                y[i] = K((x_val - self.x_data) / self.h).mean() * (1/self.h)
            return y
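A design note on the class above: the loop in f evaluates the kernel one point at a time. An alternative is to let NumPy broadcasting build the whole matrix of scaled differences at once. The sketch below (kde_vectorized is a hypothetical name, not part of the exercise) computes the same Gaussian estimator with the same Silverman bandwidth, trading memory proportional to len(x) × len(x_data) for speed. The beta sample, grid and seed are arbitrary.

```python
import numpy as np
from scipy.stats import norm

def kde_vectorized(x, x_data, h=None):
    "Gaussian KDE evaluated at the points x, using broadcasting."
    x = np.atleast_1d(np.asarray(x, dtype=float))
    x_data = np.asarray(x_data, dtype=float)
    if h is None:                                  # Silverman's rule, as in KDE
        h = 1.06 * x_data.std() * len(x_data)**(-1/5)
    z = (x[:, None] - x_data[None, :]) / h         # shape (len(x), len(x_data))
    return norm.pdf(z).mean(axis=1) / h

rng = np.random.default_rng(0)
sample = rng.beta(2, 2, size=500)
grid = np.linspace(-0.5, 1.5, 2001)
estimate = kde_vectorized(grid, sample)

# a density estimate should integrate to (roughly) one
total_mass = estimate.sum() * (grid[1] - grid[0])
print(total_mass)
```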

In [9]: def plot_kde(ϕ, x_min=-0.2, x_max=1.2):
    x_data = ϕ.rvs(n)
    kde = KDE(x_data)

    x_grid = np.linspace(x_min, x_max, 100)

    fig, ax = plt.subplots()
    ax.plot(x_grid, kde.f(x_grid), label="estimate")
    ax.plot(x_grid, ϕ.pdf(x_grid), label="true density")
    ax.legend()
    plt.show()

In [10]: from scipy.stats import beta

n = 500
parameter_pairs = (2, 2), (2, 5), (0.5, 0.5)
for α, β in parameter_pairs:
    plot_kde(beta(α, β))

We see that the kernel density estimator is effective when the underlying distribution is
smooth but less so otherwise.

9.7.3 Exercise 3

Here is our solution:

In [11]: a = 0.9
b = 0.0
c = 0.1
μ = -3
s = 0.2

In [12]: μ_next = a * μ + b
s_next = np.sqrt(a**2 * s**2 + c**2)

In [13]: ψ = lambda x: K((x - μ) / s)
ψ_next = lambda x: K((x - μ_next) / s_next)

In [14]: ψ = norm(μ, s)
ψ_next = norm(μ_next, s_next)

In [15]: n = 2000
x_draws = ψ.rvs(n)
x_draws_next = a * x_draws + b + c * np.random.randn(n)
kde = KDE(x_draws_next)

x_grid = np.linspace(μ - 1, μ + 1, 100)

fig, ax = plt.subplots()

ax.plot(x_grid, ψ.pdf(x_grid), label=r"$\psi_t$")
ax.plot(x_grid, ψ_next.pdf(x_grid), label=r"$\psi_{t+1}$")
ax.plot(x_grid, kde.f(x_grid), label=r"estimate of $\psi_{t+1}$")

ax.legend()
plt.show()

The simulated distribution approximately coincides with the theoretical distribution, as predicted.
Chapter 10

Finite Markov Chains

10.1 Contents

• Overview 10.2
• Definitions 10.3
• Simulation 10.4
• Marginal Distributions 10.5
• Irreducibility and Aperiodicity 10.6
• Stationary Distributions 10.7
• Ergodicity 10.8
• Computing Expectations 10.9
• Exercises 10.10
• Solutions 10.11
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

10.2 Overview

Markov chains are one of the most useful classes of stochastic processes, being
• simple, flexible and supported by many elegant theoretical results
• valuable for building intuition about random dynamic models
• central to quantitative modeling in their own right
You will find them in many of the workhorse models of economics and finance.
In this lecture, we review some of the theory of Markov chains.
We will also introduce some of the high-quality routines for working with Markov chains
available in QuantEcon.py.
Prerequisite knowledge is basic probability and linear algebra.
Let’s start with some standard imports:

In [2]: import quantecon as qe
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
%matplotlib inline


10.3 Definitions

The following concepts are fundamental.

10.3.1 Stochastic Matrices

A stochastic matrix (or Markov matrix) is an 𝑛 × 𝑛 square matrix 𝑃 such that

1. each element of 𝑃 is nonnegative, and

2. each row of 𝑃 sums to one

Each row of 𝑃 can be regarded as a probability mass function over 𝑛 possible outcomes.
It is not too difficult to check that if $P$ is a stochastic matrix, then so is the $k$-th power $P^k$ for all $k \in \mathbb{N}$.
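One way to see this: if $P$ is stochastic and $\mathbf{1}$ is the column vector of ones, then $P \mathbf{1} = \mathbf{1}$, so $P^k \mathbf{1} = P^{k-1} (P \mathbf{1}) = \cdots = \mathbf{1}$, and products of nonnegative matrices are nonnegative. A numerical spot check with numpy.linalg.matrix_power, using a randomly generated stochastic matrix (the size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
P = rng.random((n, n))
P = P / P.sum(axis=1, keepdims=True)     # normalize rows so each sums to one

for k in (1, 2, 5, 20):
    Pk = np.linalg.matrix_power(P, k)
    print(k, Pk.min() >= 0, np.allclose(Pk.sum(axis=1), 1.0))
```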

10.3.2 Markov Chains

There is a close connection between stochastic matrices and Markov chains.


To begin, let 𝑆 be a finite set with 𝑛 elements {𝑥1 , … , 𝑥𝑛 }.
The set 𝑆 is called the state space and 𝑥1 , … , 𝑥𝑛 are the state values.
A Markov chain {𝑋𝑡 } on 𝑆 is a sequence of random variables on 𝑆 that have the Markov
property.
This means that, for any date 𝑡 and any state 𝑦 ∈ 𝑆,

$$\mathbb{P}\{X_{t+1} = y \mid X_t\} = \mathbb{P}\{X_{t+1} = y \mid X_t, X_{t-1}, \ldots\} \tag{1}$$

In other words, knowing the current state is enough to know probabilities for future states.
In particular, the dynamics of a Markov chain are fully determined by the set of values

$$P(x, y) := \mathbb{P}\{X_{t+1} = y \mid X_t = x\} \qquad (x, y \in S) \tag{2}$$

By construction,
• 𝑃 (𝑥, 𝑦) is the probability of going from 𝑥 to 𝑦 in one unit of time (one step)
• 𝑃 (𝑥, ⋅) is the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥

We can view 𝑃 as a stochastic matrix where

$$P_{ij} = P(x_i, x_j) \qquad 1 \leq i, j \leq n$$

Going the other way, if we take a stochastic matrix 𝑃 , we can generate a Markov chain {𝑋𝑡 }
as follows:
• draw 𝑋0 from some specified distribution
• for each 𝑡 = 0, 1, …, draw 𝑋𝑡+1 from 𝑃 (𝑋𝑡 , ⋅)
By construction, the resulting process satisfies (2).

10.3.3 Example 1

Consider a worker who, at any given time 𝑡, is either unemployed (state 0) or employed (state
1).
Suppose that, over a one month period,

1. An unemployed worker finds a job with probability 𝛼 ∈ (0, 1).

2. An employed worker loses her job and becomes unemployed with probability 𝛽 ∈ (0, 1).

In terms of a Markov model, we have


• 𝑆 = {0, 1}
• 𝑃 (0, 1) = 𝛼 and 𝑃 (1, 0) = 𝛽
We can write out the transition probabilities in matrix form as

$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix} \tag{3}$$

Once we have the values 𝛼 and 𝛽, we can address a range of questions, such as
• What is the average duration of unemployment?
• Over the long-run, what fraction of time does a worker find herself unemployed?
• Conditional on employment, what is the probability of becoming unemployed at least
once over the next 12 months?
We’ll cover such applications below.
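As a preview, two of these questions have simple closed-form answers in this two-state model. An unemployed worker leaves unemployment with probability $\alpha$ each month, so an unemployment spell is geometric with mean $1/\alpha$ months; and the long-run fraction of time unemployed is the first component of the distribution $\psi^*$ solving $\psi^* P = \psi^*$, which here equals $\beta/(\alpha + \beta)$. The sketch below checks the second fact numerically; the values of α and β are made up for illustration.

```python
import numpy as np

alpha, beta = 0.1, 0.05     # hypothetical monthly job-finding and separation rates
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Solve psi (P - I) = 0 together with psi_0 + psi_1 = 1 (least squares on the stacked system)
A = np.vstack([(P - np.eye(2)).T, np.ones(2)])
rhs = np.array([0.0, 0.0, 1.0])
psi_star, *_ = np.linalg.lstsq(A, rhs, rcond=None)

print("mean unemployment duration (months):", 1 / alpha)
print("long-run fraction unemployed:", psi_star[0], beta / (alpha + beta))
```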

10.3.4 Example 2

Using US unemployment data, Hamilton [49] estimated the stochastic matrix

$$P = \begin{pmatrix} 0.971 & 0.029 & 0 \\ 0.145 & 0.778 & 0.077 \\ 0 & 0.508 & 0.492 \end{pmatrix}$$

where
• the frequency is monthly

• the first state represents “normal growth”


• the second state represents “mild recession”
• the third state represents “severe recession”
For example, the matrix tells us that when the state is normal growth, the state will again be
normal growth next month with probability 0.97.
In general, large values on the main diagonal indicate persistence in the process {𝑋𝑡 }.
This Markov process can also be represented as a directed graph, with edges labeled by transition probabilities.

Here “ng” is normal growth, “mr” is mild recession, etc.
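The persistence can be quantified. Since the chain leaves state $i$ with probability $1 - P_{ii}$ each month, a spell in state $i$ lasts $1/(1 - P_{ii})$ months on average. The sketch below computes these expected spell lengths and a stationary distribution for Hamilton's matrix, taking the left eigenvector for the eigenvalue closest to one via numpy.linalg.eig.

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.0],
              [0.145, 0.778, 0.077],
              [0.0,   0.508, 0.492]])
states = ["normal growth", "mild recession", "severe recession"]

expected_spell = 1 / (1 - np.diag(P))
for name, d in zip(states, expected_spell):
    print(f"{name}: expected spell length {d:.1f} months")

# stationary distribution: left eigenvector of P for eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(eigvals - 1))
psi_star = np.real(eigvecs[:, i])
psi_star = psi_star / psi_star.sum()
print("stationary distribution:", psi_star)
```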

10.4 Simulation

One natural way to answer questions about Markov chains is to simulate them.
(To approximate the probability of event 𝐸, we can simulate many times and count the fraction of times that 𝐸 occurs.)
Nice functionality for simulating Markov chains exists in QuantEcon.py.
• Efficient, bundled with lots of other useful routines for handling Markov chains.
However, it’s also a good exercise to roll our own routines — let’s do that first and then come
back to the methods in QuantEcon.py.
In these exercises, we’ll take the state space to be $S = \{0, \ldots, n - 1\}$.
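As a small example of this approach, consider the third question from Example 1: starting from employment, what is the probability of being unemployed at least once in the next 12 months? The sketch below, with hypothetical values of α and β, simulates many independent 12-month paths of the two-state chain and counts how often the event occurs. For this particular event there is also an exact answer, $1 - (1 - \beta)^{12}$, because the event fails only if the worker keeps her job in each of the 12 months, which lets us check the simulation.

```python
import numpy as np

alpha, beta = 0.1, 0.05        # hypothetical job-finding and separation probabilities
num_paths, horizon = 100_000, 12
rng = np.random.default_rng(2020)

state = np.ones(num_paths, dtype=int)        # everyone starts employed (state 1)
ever_unemployed = np.zeros(num_paths, dtype=bool)

for t in range(horizon):
    u = rng.random(num_paths)
    # employed workers lose their job w.p. beta; unemployed find one w.p. alpha
    new_state = np.where(state == 1,
                         np.where(u < beta, 0, 1),
                         np.where(u < alpha, 1, 0))
    ever_unemployed |= (new_state == 0)
    state = new_state

estimate = ever_unemployed.mean()
exact = 1 - (1 - beta)**horizon
print(estimate, exact)
```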

10.4.1 Rolling Our Own

To simulate a Markov chain, we need its stochastic matrix 𝑃 and a probability distribution 𝜓
for the initial state to be drawn from.
The Markov chain is then constructed as discussed above. To repeat:

1. At time $t = 0$, $X_0$ is drawn from $\psi$.

2. At each subsequent time 𝑡, the new state 𝑋𝑡+1 is drawn from 𝑃 (𝑋𝑡 , ⋅).

To implement this simulation procedure, we need a method for generating draws from a discrete distribution.
For this task, we’ll use random.draw from QuantEcon, which works as follows:

In [3]: ψ = (0.3, 0.7)           # probabilities over {0, 1}
cdf = np.cumsum(ψ)       # convert into cumulative distribution
qe.random.draw(cdf, 5)   # generate 5 independent draws from ψ

Out[3]: array([1, 0, 1, 0, 0])
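The idea behind drawing from a cumulative distribution can be sketched in plain NumPy: draw $U$ uniform on $[0, 1)$ and return the first index $i$ with $U < \text{cdf}[i]$. This illustrates the technique and is not necessarily how qe.random.draw is implemented internally; the function name draw_from_cdf is hypothetical.

```python
import numpy as np

def draw_from_cdf(cdf, size, rng):
    "Return `size` draws of index i, each with probability cdf[i] - cdf[i-1]."
    u = rng.random(size)
    # side='right' gives the first index i with cdf[i] > u
    return np.searchsorted(cdf, u, side='right')

rng = np.random.default_rng(0)
cdf = np.cumsum((0.3, 0.7))
draws = draw_from_cdf(cdf, 100_000, rng)
print(np.mean(draws == 0), np.mean(draws == 1))   # roughly 0.3 and 0.7
```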

We’ll write our code as a function that takes the following three arguments
• A stochastic matrix P
• An optional initial distribution ψ_0 for 𝑋0 (if it is omitted, the chain starts in state 0)
• A positive integer sample_size representing the length of the time series the function should return

In [4]: def mc_sample_path(P, ψ_0=None, sample_size=1_000):

    # set up
    P = np.asarray(P)
    X = np.empty(sample_size, dtype=int)

    # Convert each row of P into a cdf
    n = len(P)
    P_dist = [np.cumsum(P[i, :]) for i in range(n)]

    # draw initial state, defaulting to 0
    if ψ_0 is not None:
        X_0 = qe.random.draw(np.cumsum(ψ_0))
    else:
        X_0 = 0

    # simulate
    X[0] = X_0
    for t in range(sample_size - 1):
        X[t+1] = qe.random.draw(P_dist[X[t]])

    return X

Let’s see how it works using the small matrix

In [5]: P = [[0.4, 0.6],
             [0.2, 0.8]]

As we’ll see later, for a long series drawn from P, the fraction of the sample that takes value 0 will be about 0.25.
Moreover, this is true regardless of the initial distribution from which 𝑋0 is drawn.
The following code illustrates this

In [6]: X = mc_sample_path(P, ψ_0=[0.1, 0.9], sample_size=100_000)
        np.mean(X == 0)

Out[6]: 0.25111

You can try changing the initial distribution to confirm that the output is always close to
0.25.

10.4.2 Using QuantEcon’s Routines

As discussed above, QuantEcon.py has routines for handling Markov chains, including simulation.
Here’s an illustration using the same P as the preceding example

In [7]: from quantecon import MarkovChain

mc = qe.MarkovChain(P)
X = mc.simulate(ts_length=1_000_000)
np.mean(X == 0)

Out[7]: 0.250219

The QuantEcon.py routine is JIT compiled and much faster.

In [8]: %time mc_sample_path(P, sample_size=1_000_000) # Our version

CPU times: user 791 ms, sys: 1.36 ms, total: 792 ms
Wall time: 792 ms

Out[8]: array([0, 1, 0, …, 1, 1, 1])

In [9]: %time mc.simulate(ts_length=1_000_000) # qe version

CPU times: user 28.6 ms, sys: 8.11 ms, total: 36.7 ms
Wall time: 36.4 ms

Out[9]: array([0, 1, 0, …, 1, 1, 1])

Adding State Values and Initial Conditions

If we wish to, we can provide a specification of state values to MarkovChain.
These state values can be integers, floats, or even strings.
The following code illustrates

In [10]: mc = qe.MarkovChain(P, state_values=('unemployed', 'employed'))
         mc.simulate(ts_length=4, init='employed')

Out[10]: array(['employed', 'employed', 'unemployed', 'unemployed'], dtype='<U10')

In [11]: mc.simulate(ts_length=4, init='unemployed')

Out[11]: array(['unemployed', 'unemployed', 'employed', 'employed'], dtype='<U10')

In [12]: mc.simulate(ts_length=4) # Start at randomly chosen initial state

Out[12]: array(['unemployed', 'employed', 'employed', 'employed'], dtype='<U10')



If we want to simulate with output as indices rather than state values, we can use

In [13]: mc.simulate_indices(ts_length=4)

Out[13]: array([1, 1, 1, 1])

10.5 Marginal Distributions

Suppose that

1. {𝑋𝑡 } is a Markov chain with stochastic matrix 𝑃

2. the distribution of 𝑋𝑡 is known to be 𝜓𝑡

What then is the distribution of 𝑋𝑡+1 , or, more generally, of 𝑋𝑡+𝑚 ?


To answer this, we let 𝜓𝑡 be the distribution of 𝑋𝑡 for 𝑡 = 0, 1, 2, ….
Our first aim is to find 𝜓𝑡+1 given 𝜓𝑡 and 𝑃 .
To begin, pick any 𝑦 ∈ 𝑆.
Using the law of total probability, we can decompose the probability that 𝑋𝑡+1 = 𝑦 as follows:

$$\mathbb{P}\{X_{t+1} = y\} = \sum_{x \in S} \mathbb{P}\{X_{t+1} = y \mid X_t = x\} \cdot \mathbb{P}\{X_t = x\}$$

In words, to get the probability of being at 𝑦 tomorrow, we account for all ways this can happen and sum their probabilities.
Rewriting this statement in terms of marginal and conditional probabilities gives

$$\psi_{t+1}(y) = \sum_{x \in S} P(x, y) \psi_t(x)$$

There are 𝑛 such equations, one for each 𝑦 ∈ 𝑆.


If we think of $\psi_{t+1}$ and $\psi_t$ as row vectors (as is traditional in this literature), these 𝑛 equations are summarized by the matrix expression

$$\psi_{t+1} = \psi_t P \tag{4}$$

In other words, to move the distribution forward one unit of time, we postmultiply by 𝑃 .
By repeating this 𝑚 times we move forward 𝑚 steps into the future.
Hence, iterating on (4), the expression $\psi_{t+m} = \psi_t P^m$ is also valid, where $P^m$ is the 𝑚-th power of 𝑃.
As a special case, we see that if $\psi_0$ is the initial distribution from which $X_0$ is drawn, then $\psi_0 P^m$ is the distribution of $X_m$.
This is very important, so let’s repeat it

$$X_0 \sim \psi_0 \implies X_m \sim \psi_0 P^m \tag{5}$$

and, more generally,

$$X_t \sim \psi_t \implies X_{t+m} \sim \psi_t P^m \tag{6}$$
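A quick numerical check of (4) and (6), as a sketch: the code evaluates the law-of-total-probability sum one state 𝑦 at a time, compares it with the row-vector product $\psi_t P$, and then compares 𝑚 rounds of updating with a single multiplication by $P^m$. The matrix is the two-state P used earlier; the current distribution (0.1, 0.9) and 𝑚 = 10 are arbitrary choices.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
ψ_t = np.array([0.1, 0.9])       # an arbitrary current distribution
n = len(P)

# Law of total probability, one y at a time: ψ_{t+1}(y) = Σ_x P(x, y) ψ_t(x)
ψ_next_loop = np.array([sum(P[x, y] * ψ_t[x] for x in range(n))
                        for y in range(n)])

# The same n equations as one row-vector/matrix product, as in (4)
ψ_next_mat = ψ_t @ P

# m steps ahead: iterate (4) m times, or use the m-th matrix power as in (6)
m = 10
ψ_iter = ψ_t.copy()
for _ in range(m):
    ψ_iter = ψ_iter @ P
ψ_power = ψ_t @ np.linalg.matrix_power(P, m)

print(ψ_next_loop, ψ_next_mat)
print(ψ_iter, ψ_power)
```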

10.5.1 Multiple Step Transition Probabilities

We know that the probability of transitioning from 𝑥 to 𝑦 in one step is 𝑃 (𝑥, 𝑦).
It turns out that the probability of transitioning from 𝑥 to 𝑦 in 𝑚 steps is $P^m(x, y)$, the (𝑥, 𝑦)-th element of the 𝑚-th power of 𝑃.
To see why, consider again (6), but now with $\psi_t$ putting all probability on state 𝑥, that is, a row vector with 1 in the 𝑥-th position and zero elsewhere.
Inserting this into (6), we see that, conditional on $X_t = x$, the distribution of $X_{t+m}$ is the 𝑥-th row of $P^m$.
In particular

$$\mathbb{P}\{X_{t+m} = y \mid X_t = x\} = P^m(x, y) = (x, y)\text{-th element of } P^m$$
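The argument can be checked numerically. The sketch below uses the same two-state P as above and an arbitrary choice 𝑚 = 3, and confirms that pushing a point mass at 𝑥 through (6) returns the 𝑥-th row of $P^m$.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
m = 3
P_m = np.linalg.matrix_power(P, m)

for x in range(len(P)):
    # Distribution that puts all probability on state x
    e_x = np.zeros(len(P))
    e_x[x] = 1.0
    # By (6), the distribution of X_{t+m} given X_t = x is e_x P^m ...
    dist = e_x @ P_m
    # ... which is just the x-th row of P^m
    print(x, dist, P_m[x])
```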

10.5.2 Example: Probability of Recession

Recall the stochastic matrix 𝑃 for recession and growth considered above.
Suppose that the current state is unknown — perhaps statistics are available only at the end
of the current month.
We estimate the probability that the economy is in state 𝑥 to be 𝜓(𝑥).
The probability of being in recession (either mild or severe) in 6 months’ time is given by the inner product

$$\psi P^6 \cdot \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$
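As a sketch of this calculation, the code below uses the recession/growth matrix that appears later in this lecture (states ordered as ng, mr, sr) together with a hypothetical estimate ψ = (0.2, 0.2, 0.6) of the current state. The estimate is purely illustrative.

```python
import numpy as np

# Recession/growth matrix (states: normal growth, mild recession, severe recession)
P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])

ψ = np.array([0.2, 0.2, 0.6])    # hypothetical estimate of the current state
recession = np.array([0, 1, 1])  # indicator of mild or severe recession

prob = ψ @ np.linalg.matrix_power(P, 6) @ recession
print(f"Probability of recession in 6 months: {prob:.4f}")
```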

10.5.3 Example 2: Cross-Sectional Distributions

The marginal distributions we have been studying can be viewed either as probabilities or as
cross-sectional frequencies in large samples.
To illustrate, recall our model of employment/unemployment dynamics for a given worker
discussed above.
Consider a large population of workers, each of whose lifetime experience is described by the
specified dynamics, independent of one another.
Let 𝜓 be the current cross-sectional distribution over {0, 1}.

The cross-sectional distribution records the fractions of workers employed and unemployed at
a given moment.
• For example, 𝜓(0) is the unemployment rate.
What will the cross-sectional distribution be in 10 periods hence?
The answer is $\psi P^{10}$, where 𝑃 is the stochastic matrix in (3).
This is because each worker is updated according to 𝑃, so $\psi P^{10}$ represents probabilities for a single randomly selected worker.
But when the sample is large, outcomes and probabilities are roughly equal (by the Law of Large Numbers).
So for a very large (tending to infinity) population, $\psi P^{10}$ also represents the fraction of workers in each state.
This is exactly the cross-sectional distribution.
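The Law of Large Numbers argument can be checked by simulation. The sketch below assumes the two-state employment matrix with rows (1 − 𝛼, 𝛼) and (𝛽, 1 − 𝛽), with state 0 being unemployment, and uses hypothetical values 𝛼 = 0.1, 𝛽 = 0.05 and ψ = (0.2, 0.8). It moves a large population of independent workers forward 10 periods and compares the simulated unemployment rate with the first entry of $\psi P^{10}$.

```python
import numpy as np

α, β = 0.1, 0.05                 # hypothetical transition probabilities
P = np.array([[1 - α, α],        # state 0: unemployed
              [β, 1 - β]])       # state 1: employed

ψ = np.array([0.2, 0.8])         # hypothetical current cross-section
ψ_10 = ψ @ np.linalg.matrix_power(P, 10)

# Simulate a large population of independent workers for 10 periods
rng = np.random.default_rng(0)
num_workers = 200_000
X = rng.choice(2, size=num_workers, p=ψ)   # initial states drawn from ψ
for _ in range(10):
    u = rng.uniform(size=num_workers)
    # A worker in state x moves to state 1 with probability P[x, 1]
    X = (u < P[X, 1]).astype(int)

print(ψ_10[0], np.mean(X == 0))  # unemployment rate: theory vs. simulation
```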

10.6 Irreducibility and Aperiodicity

Irreducibility and aperiodicity are central concepts of modern Markov chain theory.
Let’s see what they’re about.

10.6.1 Irreducibility

Let 𝑃 be a fixed stochastic matrix.


Two states 𝑥 and 𝑦 are said to communicate with each other if there exist positive integers
𝑗 and 𝑘 such that

$$P^j(x, y) > 0 \quad \text{and} \quad P^k(y, x) > 0$$

In view of our discussion above, this means precisely that


• state 𝑥 can be reached eventually from state 𝑦, and
• state 𝑦 can be reached eventually from state 𝑥
The stochastic matrix 𝑃 is called irreducible if all states communicate; that is, if 𝑥 and 𝑦
communicate for all (𝑥, 𝑦) in 𝑆 × 𝑆.
For example, consider the following transition probabilities for wealth of a fictitious set of
households

We can translate this into a stochastic matrix, putting zeros where there’s no edge between
nodes

$$P := \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0.4 & 0.4 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$

It’s clear from the graph that this stochastic matrix is irreducible: we can reach any state
from any other state eventually.
We can also test this using QuantEcon.py’s MarkovChain class

In [14]: P = [[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]]

         mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))
         mc.is_irreducible

Out[14]: True

Here’s a more pessimistic scenario, where the poor are poor forever

This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor.
Let’s confirm this

In [15]: P = [[1.0, 0.0, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]]

         mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))
         mc.is_irreducible

Out[15]: False

We can also determine the “communication classes”



In [16]: mc.communication_classes

Out[16]: [array(['poor'], dtype='<U6'), array(['middle', 'rich'], dtype='<U6')]

It might be clear to you already that irreducibility is going to be important in terms of long
run outcomes.
For example, poverty is a life sentence in the second graph but not the first.
We’ll come back to this a bit later.
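To see what a check like is_irreducible involves, here is a minimal sketch based on reachability; it is not QuantEcon's own algorithm, and the function name is hypothetical. Only the pattern of positive entries of 𝑃 matters: state 𝑦 can be reached from 𝑥 if there is a path of positive-probability steps, and 𝑛 − 1 rounds of one-step extension are enough to find every such path.

```python
import numpy as np

def is_irreducible(P):
    """Hypothetical helper: True if every state of P can reach every other state."""
    P = np.asarray(P)
    n = len(P)
    # R[x, y] is True if y can be reached from x in at most one step
    R = np.eye(n, dtype=bool) | (P > 0)
    reach = R.copy()
    # After k extensions, reach[x, y] covers paths of length at most k + 1
    for _ in range(n - 1):
        reach = (reach.astype(int) @ R.astype(int)) > 0
    return bool(reach.all())

P_1 = [[0.9, 0.1, 0.0],
       [0.4, 0.4, 0.2],
       [0.1, 0.1, 0.8]]
P_2 = [[1.0, 0.0, 0.0],
       [0.1, 0.8, 0.1],
       [0.0, 0.2, 0.8]]
print(is_irreducible(P_1), is_irreducible(P_2))   # True False
```

The two matrices are the ones tested with mc.is_irreducible above, and the results agree.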

10.6.2 Aperiodicity

Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and aperiodic otherwise.
Here’s a trivial example with three states

The chain cycles with period 3:

In [17]: P = [[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]]

         mc = qe.MarkovChain(P)
         mc.period

Out[17]: 3

More formally, the period of a state 𝑥 is the greatest common divisor of the set of integers

$$D(x) := \{j \ge 1 : P^j(x, x) > 0\}$$

In the last example, 𝐷(𝑥) = {3, 6, 9, …} for every state 𝑥, so the period is 3.
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic otherwise.
For example, the stochastic matrix associated with the transition probabilities below is periodic because, for instance, state 𝑎 has period 2

We can confirm that the stochastic matrix is periodic as follows

In [18]: P = [[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]]

         mc = qe.MarkovChain(P)
         mc.period

Out[18]: 2

In [19]: mc.is_aperiodic

Out[19]: False
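The definition of the period can also be applied directly. The sketch below, with a hypothetical helper period_of_state, tracks which entries of $P^j$ are positive for 𝑗 = 1, …, max_steps and takes the gcd of the return times to 𝑥. Because 𝐷(𝑥) is infinite, truncating it at max_steps is an assumption; the default horizon of 2𝑛² is a heuristic that is ample for small examples like those above.

```python
from functools import reduce
from math import gcd

import numpy as np

def period_of_state(P, x, max_steps=None):
    """gcd of the return times {j : P^j(x, x) > 0}, truncated at max_steps."""
    support = (np.asarray(P) > 0).astype(int)   # pattern of positive entries
    n = len(support)
    if max_steps is None:
        max_steps = 2 * n * n                   # heuristic horizon (assumption)
    pattern = np.eye(n, dtype=int)
    returns = []
    for j in range(1, max_steps + 1):
        # Positive entries of P^j = positive entries of P^(j-1) times P
        pattern = ((pattern @ support) > 0).astype(int)
        if pattern[x, x]:
            returns.append(j)
    # Convention for this sketch: 0 if x is never revisited within the horizon
    return reduce(gcd, returns) if returns else 0

P_cycle = [[0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]]
print(period_of_state(P_cycle, 0))   # 3
```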

10.7 Stationary Distributions

As seen in (4), we can shift probabilities forward one unit of time via postmultiplication by
𝑃.
Some distributions are invariant under this updating process — for example,

In [20]: P = np.array([[0.4, 0.6],
                       [0.2, 0.8]])
         ψ = (0.25, 0.75)
         ψ @ P

Out[20]: array([0.25, 0.75])

Such distributions are called stationary, or invariant.


Formally, a distribution 𝜓∗ on 𝑆 is called stationary for 𝑃 if 𝜓∗ = 𝜓∗ 𝑃 .
(This is the same notion of stationarity that we learned about in the lecture on AR(1) processes, applied to a different setting.)
From this equality, we immediately get $\psi^* = \psi^* P^t$ for all 𝑡.
This tells us an important fact: If the distribution of 𝑋0 is a stationary distribution, then 𝑋𝑡
will have this same distribution for all 𝑡.
Hence stationary distributions have a natural interpretation as stochastic steady states —
we’ll discuss this more in just a moment.
Mathematically, a stationary distribution is a fixed point of 𝑃 when 𝑃 is thought of as the
map 𝜓 ↦ 𝜓𝑃 from (row) vectors to (row) vectors.
Theorem. Every stochastic matrix 𝑃 has at least one stationary distribution.
(We are assuming here that the state space 𝑆 is finite; if not, more assumptions are required.)
For a proof of this result, you can apply Brouwer’s fixed point theorem, or see EDTC, theorem 4.3.5.
There may in fact be many stationary distributions corresponding to a given stochastic matrix 𝑃.

• For example, if 𝑃 is the identity matrix, then all distributions are stationary.
Since stationary distributions are long run equilibria, to get uniqueness we require that initial
conditions are not infinitely persistent.
Infinite persistence of initial conditions occurs if certain regions of the state space cannot be
accessed from other regions, which is the opposite of irreducibility.
This gives some intuition for the following fundamental theorem.
Theorem. If 𝑃 is both aperiodic and irreducible, then

1. 𝑃 has exactly one stationary distribution 𝜓∗ .

2. For any initial distribution $\psi_0$, we have $\|\psi_0 P^t - \psi^*\| \to 0$ as $t \to \infty$.

For a proof, see, for example, theorem 5.2 of [45].


(Note that part 1 of the theorem requires only irreducibility, whereas part 2 requires both irreducibility and aperiodicity.)
A stochastic matrix satisfying the conditions of the theorem is sometimes called uniformly
ergodic.
One easy sufficient condition for aperiodicity and irreducibility is that every element of 𝑃 is
strictly positive.
• Try to convince yourself of this.
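A numerical illustration of part 2 of the theorem, as a sketch: the two-state matrix below has strictly positive entries, and its stationary distribution is (0.25, 0.75), which can be checked by hand from 𝜓∗ = 𝜓∗𝑃. Starting from a point mass on state 0, the code records the distance between $\psi_0 P^t$ and 𝜓∗; measuring it in the L1 norm is our choice.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])         # every entry strictly positive
ψ_star = np.array([0.25, 0.75])     # stationary distribution of this P

ψ = np.array([1.0, 0.0])            # all mass on state 0
distances = []
for t in range(1, 11):
    ψ = ψ @ P
    distances.append(np.abs(ψ - ψ_star).sum())   # L1 distance to ψ*
    print(t, distances[-1])
```

For this particular 𝑃 the distance shrinks by a factor of 0.2 each period, since 0.2 is the second eigenvalue of 𝑃.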

10.7.1 Example

Recall our model of employment/unemployment dynamics for a given worker discussed above.
Assuming 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), the uniform ergodicity condition is satisfied.
Let 𝜓∗ = (𝑝, 1 − 𝑝) be the stationary distribution, so that 𝑝 corresponds to unemployment
(state 0).
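Using 𝜓∗ = 𝜓∗𝑃, the first component of this equation reads 𝑝(1 − 𝛼) + (1 − 𝑝)𝛽 = 𝑝, that is, 𝑝𝛼 = (1 − 𝑝)𝛽, which yields

$$p = \frac{\beta}{\alpha + \beta}$$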

This is, in some sense, a steady state probability of unemployment — more on interpretation
below.
Not surprisingly it tends to zero as 𝛽 → 0, and to one as 𝛼 → 0.
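A quick check of this formula, assuming hypothetical values 𝛼 = 0.3 and 𝛽 = 0.1 (so 𝑝 = 0.25): the vector (𝑝, 1 − 𝑝) should be left unchanged by postmultiplication by 𝑃.

```python
import numpy as np

α, β = 0.3, 0.1                      # hypothetical values in (0, 1)
P = np.array([[1 - α, α],
              [β, 1 - β]])
p = β / (α + β)
ψ_star = np.array([p, 1 - p])
print(ψ_star, ψ_star @ P)            # the two vectors should coincide
```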

10.7.2 Calculating Stationary Distributions

As discussed above, a given Markov matrix 𝑃 can have many stationary distributions.
That is, there can be many row vectors 𝜓 such that 𝜓 = 𝜓𝑃 .
In fact if 𝑃 has two distinct stationary distributions 𝜓1 , 𝜓2 then it has infinitely many, since
in this case, as you can verify,

$$\psi_3 := \lambda \psi_1 + (1 - \lambda) \psi_2$$

is a stationary distribution for 𝑃 for any 𝜆 ∈ [0, 1].
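The following sketch uses a hypothetical reducible matrix with two closed classes, {0, 1} and {2}. Each class supports its own stationary distribution, and every mixture of the two is again stationary, as the argument above indicates.

```python
import numpy as np

# Hypothetical reducible example: closed classes {0, 1} and {2}
P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
ψ_1 = np.array([0.5, 0.5, 0.0])   # stationary, supported on the first class
ψ_2 = np.array([0.0, 0.0, 1.0])   # stationary, supported on the second class

for λ in (0.0, 0.25, 0.5, 1.0):
    ψ_3 = λ * ψ_1 + (1 - λ) * ψ_2
    print(λ, np.allclose(ψ_3 @ P, ψ_3))
```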


If we restrict attention to the case where only one stationary distribution exists, one option
for finding it is to try to solve the linear system 𝜓(𝐼𝑛 − 𝑃 ) = 0 for 𝜓, where 𝐼𝑛 is the 𝑛 × 𝑛
identity.
But the zero vector solves this equation, so we need to proceed carefully.
In essence, we need to impose the restriction that the solution must be a probability distribu-
tion.
There are various ways to do this.
One option is to regard this as an eigenvector problem: a vector 𝜓 such that 𝜓 = 𝜓𝑃 is a left
eigenvector associated with the unit eigenvalue 𝜆 = 1.
A more stable and sophisticated algorithm is implemented in QuantEcon.py.
This is the one we recommend you use:

In [21]: P = [[0.4, 0.6],
              [0.2, 0.8]]

         mc = qe.MarkovChain(P)
         mc.stationary_distributions   # Show all stationary distributions

Out[21]: array([[0.25, 0.75]])
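For comparison, here is a sketch of two more elementary approaches; neither is the algorithm used by QuantEcon.py. The first takes the left eigenvector of 𝑃 for the eigenvalue closest to 1 (equivalently, a right eigenvector of the transpose) and rescales it to sum to one. The second appends the restriction that 𝜓 sums to one to the system 𝜓(𝐼𝑛 − 𝑃) = 0 and solves by least squares. Both assume the stationary distribution is unique.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
n = len(P)

# Approach 1: left eigenvector of P = right eigenvector of P.T
eigvals, eigvecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(eigvals - 1))     # eigenvalue closest to 1
ψ_eig = np.real(eigvecs[:, i])
ψ_eig = ψ_eig / ψ_eig.sum()            # rescale to a probability vector

# Approach 2: stack (I - P).T ψ = 0 with the row sum(ψ) = 1
A = np.vstack([(np.eye(n) - P).T, np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
ψ_lin, *_ = np.linalg.lstsq(A, b, rcond=None)

print(ψ_eig, ψ_lin)   # both should be close to [0.25, 0.75]
```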

10.7.3 Convergence to Stationarity

Part 2 of the Markov chain convergence theorem stated above tells us that the distribution of
𝑋𝑡 converges to the stationary distribution regardless of where we start off.
This adds considerable weight to our interpretation of 𝜓∗ as a stochastic steady state.
The convergence in the theorem is illustrated in the next figure

In [22]: P = ((0.971, 0.029, 0.000),
              (0.145, 0.778, 0.077),
              (0.000, 0.508, 0.492))
         P = np.array(P)

         ψ = (0.0, 0.2, 0.8)        # Initial condition

         fig = plt.figure(figsize=(8, 6))
         ax = fig.add_subplot(111, projection='3d')

         ax.set(xlim=(0, 1), ylim=(0, 1), zlim=(0, 1),
                xticks=(0.25, 0.5, 0.75),
                yticks=(0.25, 0.5, 0.75),
                zticks=(0.25, 0.5, 0.75))

         x_vals, y_vals, z_vals = [], [], []
         for t in range(20):
             x_vals.append(ψ[0])
             y_vals.append(ψ[1])
             z_vals.append(ψ[2])
             ψ = ψ @ P

         ax.scatter(x_vals, y_vals, z_vals, c='r', s=60)
         ax.view_init(30, 210)

         mc = qe.MarkovChain(P)
         ψ_star = mc.stationary_distributions[0]
         ax.scatter(ψ_star[0], ψ_star[1], ψ_star[2], c='k', s=60)

         plt.show()

Here
• 𝑃 is the stochastic matrix for recession and growth considered above.
• The highest red dot is an arbitrarily chosen initial probability distribution 𝜓, represented as a vector in ℝ³.
• The other red dots are the distributions $\psi P^t$ for 𝑡 = 1, 2, ….
• The black dot is 𝜓∗ .
You might like to try experimenting with different initial conditions.

10.8 Ergodicity

Under irreducibility, yet another important result obtains: for all 𝑥 ∈ 𝑆,

$$\frac{1}{m} \sum_{t=1}^{m} \mathbb{1}\{X_t = x\} \to \psi^*(x) \quad \text{as } m \to \infty \tag{7}$$

Here
• 1{𝑋𝑡 = 𝑥} = 1 if 𝑋𝑡 = 𝑥 and zero otherwise
• convergence is with probability one
• the result does not depend on the distribution (or value) of 𝑋0
The result tells us that the fraction of time the chain spends at state 𝑥 converges to 𝜓∗ (𝑥) as
time goes to infinity.
This gives us another way to interpret the stationary distribution, provided that the convergence result in (7) is valid.
The convergence in (7) is a special case of a law of large numbers result for Markov chains —
see EDTC, section 4.3.4 for some additional information.

10.8.1 Example

Recall our cross-sectional interpretation of the employment/unemployment model discussed above.
Assume that 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), so that irreducibility and aperiodicity both hold.
We saw that the stationary distribution is (𝑝, 1 − 𝑝), where

$$p = \frac{\beta}{\alpha + \beta}$$

In the cross-sectional interpretation, this is the fraction of people unemployed.


In view of our latest (ergodicity) result, it is also the fraction of time that a worker can expect to spend unemployed.
Thus, in the long run, cross-sectional averages for a population and time-series averages for a given person coincide.
This is one interpretation of the notion of ergodicity.

10.9 Computing Expectations

We are interested in computing expectations of the form

$$\mathbb{E}[h(X_t)] \tag{8}$$

and conditional expectations such as

$$\mathbb{E}[h(X_{t+k}) \mid X_t = x] \tag{9}$$

where
• {𝑋𝑡} is a Markov chain generated by 𝑛 × 𝑛 stochastic matrix 𝑃
• ℎ is a given function, which, in expressions involving matrix algebra, we’ll think of as the column vector

$$h = \begin{pmatrix} h(x_1) \\ \vdots \\ h(x_n) \end{pmatrix}$$

The unconditional expectation (8) is easy: we just sum over the distribution of 𝑋𝑡 to get

$$\mathbb{E}[h(X_t)] = \sum_{x \in S} (\psi P^t)(x) h(x)$$

Here 𝜓 is the distribution of 𝑋0.
Since 𝜓 and hence $\psi P^t$ are row vectors, we can also write this as

$$\mathbb{E}[h(X_t)] = \psi P^t h$$

For the conditional expectation (9), we need to sum over the conditional distribution of $X_{t+k}$ given $X_t = x$.
We already know that this is $P^k(x, \cdot)$, so

$$\mathbb{E}[h(X_{t+k}) \mid X_t = x] = (P^k h)(x) \tag{10}$$

The vector $P^k h$ stores the conditional expectation $\mathbb{E}[h(X_{t+k}) \mid X_t = x]$ over all 𝑥.
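Formulas (8) and (10) translate directly into matrix code. In the sketch below, 𝑃 is the two-state matrix used earlier, while ℎ = (1, 5), the point-mass initial condition and 𝑡 = 𝑘 = 3 are hypothetical choices. A short Monte Carlo run checks the entry of $P^k h$ for 𝑥 = 1.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
h = np.array([1.0, 5.0])      # hypothetical values h(0), h(1)
ψ = np.array([1.0, 0.0])      # hypothetical: X_0 = 0 with probability one
t = k = 3

# Unconditional expectation (8): ψ P^t h
Eh_t = ψ @ np.linalg.matrix_power(P, t) @ h

# Conditional expectations (10): entry x of P^k h is E[h(X_{t+k}) | X_t = x]
cond_Eh = np.linalg.matrix_power(P, k) @ h

# Monte Carlo check of (10) for x = 1: many chains started at state 1
rng = np.random.default_rng(42)
X = np.ones(100_000, dtype=int)
for _ in range(k):
    # A chain in state x moves to state 1 with probability P[x, 1]
    X = (rng.uniform(size=X.size) < P[X, 1]).astype(int)
mc_estimate = h[X].mean()

print(Eh_t, cond_Eh, mc_estimate)
```

Since 𝜓 puts all mass on state 0 here, $\psi P^t h$ coincides with the first entry of $P^k h$.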

10.9.1 Expectations of Geometric Sums

Sometimes we also want to compute expectations of a geometric sum, such as $\sum_t \beta^t h(X_t)$.
In view of the preceding discussion, this is

$$\mathbb{E}\left[\sum_{j=0}^{\infty} \beta^j h(X_{t+j}) \,\Big|\, X_t = x\right] = [(I - \beta P)^{-1} h](x)$$

where, for 0 < 𝛽 < 1,

$$(I - \beta P)^{-1} = I + \beta P + \beta^2 P^2 + \cdots$$

Premultiplication by $(I - \beta P)^{-1}$ amounts to “applying the resolvent operator”.
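As a sketch, assuming a hypothetical discount factor 𝛽 = 0.95 and ℎ = (1, 5) for the same two-state 𝑃, the code below computes $(I - \beta P)^{-1} h$ by solving a linear system, which avoids forming the inverse explicitly, and compares the result with a truncated version of the series $I + \beta P + \beta^2 P^2 + \cdots$ applied to ℎ.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
h = np.array([1.0, 5.0])     # hypothetical h
β = 0.95                     # hypothetical discount factor in (0, 1)
n = len(P)

# Exact: v = (I - βP)^{-1} h, computed by solving (I - βP) v = h
v = np.linalg.solve(np.eye(n) - β * P, h)

# Truncated series: Σ_{j < 2000} β^j P^j h, building each term from the last
v_series = np.zeros(n)
term = h.copy()
for j in range(2000):
    v_series += term
    term = β * (P @ term)

print(v, v_series)
```

Since every entry of $P^j h$ lies between 1 and 5, each entry of 𝑣 must lie between 1/(1 − 𝛽) = 20 and 5/(1 − 𝛽) = 100.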

10.10 Exercises

10.10.1 Exercise 1

According to the discussion above, if a worker’s employment dynamics obey the stochastic matrix

$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$$

with 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), then, in the long run, the fraction of time spent unemployed will be

$$p := \frac{\beta}{\alpha + \beta}$$

In other words, if {𝑋𝑡} represents the Markov chain for employment, then $\bar X_m \to p$ as $m \to \infty$, where

$$\bar X_m := \frac{1}{m} \sum_{t=1}^{m} \mathbb{1}\{X_t = 0\}$$

The exercise is to illustrate this convergence by computing 𝑋̄ 𝑚 for large 𝑚 and checking that
it is close to 𝑝.
You will see that this statement is true regardless of the choice of initial condition or the values of 𝛼, 𝛽, provided both lie in (0, 1).

10.10.2 Exercise 2

A topic of interest for economics and many other disciplines is ranking.


Let’s now consider one of the most practical and important ranking problems: the rank assigned to web pages by search engines.
(Although the problem is motivated from outside of economics, there is in fact a deep connection between search ranking systems and prices in certain competitive equilibria; see [30].)
To understand the issue, consider the set of results returned by a query to a web search engine.
For the user, it is desirable to

1. receive a large set of accurate matches

2. have the matches returned in order, where the order corresponds to some measure of
“importance”

Ranking according to a measure of importance is the problem we now consider.


The methodology developed to solve this problem by Google founders Larry Page and Sergey
Brin is known as PageRank.
To illustrate the idea, consider the following diagram

Imagine that this is a miniature version of the WWW, with


• each node representing a web page
• each arrow representing the existence of a link from one page to another
Now let’s think about which pages are likely to be important, in the sense of being valuable
to a search engine user.
One possible criterion for the importance of a page is the number of inbound links, an indication of popularity.
By this measure, m and j are the most important pages, with 5 inbound links each.
However, what if the pages linking to m, say, are not themselves important?
Thinking this way, it seems appropriate to weight the inbound nodes by relative importance.
The PageRank algorithm does precisely this.
A slightly simplified presentation that captures the basic idea is as follows.
Letting 𝑗 be (the integer index of) a typical page and 𝑟𝑗 be its ranking, we set

$$r_j = \sum_{i \in L_j} \frac{r_i}{\ell_i}$$

where
• ℓ𝑖 is the total number of outbound links from 𝑖
• 𝐿𝑗 is the set of all pages 𝑖 such that 𝑖 has a link to 𝑗
This is a measure of the number of inbound links, weighted by their own ranking (and normalized by 1/ℓ𝑖).
There is, however, another interpretation, and it brings us back to Markov chains.
Let 𝑃 be the matrix given by 𝑃 (𝑖, 𝑗) = 1{𝑖 → 𝑗}/ℓ𝑖 where 1{𝑖 → 𝑗} = 1 if 𝑖 has a link to 𝑗
and zero otherwise.
The matrix 𝑃 is a stochastic matrix provided that each page has at least one link.
With this definition of 𝑃 we have
$$r_j = \sum_{i \in L_j} \frac{r_i}{\ell_i} = \sum_{\text{all } i} \mathbb{1}\{i \to j\} \frac{r_i}{\ell_i} = \sum_{\text{all } i} P(i, j) r_i$$

Writing 𝑟 for the row vector of rankings, this becomes 𝑟 = 𝑟𝑃 .


Hence 𝑟 is the stationary distribution of the stochastic matrix 𝑃 .
Let’s think of 𝑃 (𝑖, 𝑗) as the probability of “moving” from page 𝑖 to page 𝑗.
The value 𝑃 (𝑖, 𝑗) has the interpretation
• 𝑃 (𝑖, 𝑗) = 1/𝑘 if 𝑖 has 𝑘 outbound links and 𝑗 is one of them
• 𝑃 (𝑖, 𝑗) = 0 if 𝑖 has no direct link to 𝑗
Thus, motion from page to page is that of a web surfer who moves from one page to another
by randomly clicking on one of the links on that page.
Here “random” means that each link is selected with equal probability.
Since 𝑟 is the stationary distribution of 𝑃 , assuming that the uniform ergodicity condition is
valid, we can interpret 𝑟𝑗 as the fraction of time that a (very persistent) random surfer spends
at page 𝑗.
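To make the construction of 𝑃 and the interpretation of 𝑟 concrete, the sketch below uses a hypothetical three-page web (not the graph from the exercise) with links 0 → 1, 0 → 2, 1 → 2 and 2 → 0. It builds 𝑃(𝑖, 𝑗) = 1{𝑖 → 𝑗}/ℓ𝑖 and computes 𝑟 by repeatedly applying 𝑟 ↦ 𝑟𝑃 (power iteration), which is only one of several ways to obtain a stationary distribution.

```python
import numpy as np

# Hypothetical 3-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
links = {0: [1, 2], 1: [2], 2: [0]}
n = len(links)

# P(i, j) = 1{i -> j} / ℓ_i, where ℓ_i is the number of outbound links of i
P = np.zeros((n, n))
for i, outs in links.items():
    P[i, outs] = 1 / len(outs)

# Power iteration: apply r -> r P starting from the uniform distribution
r = np.full(n, 1 / n)
for _ in range(1000):
    r = r @ P

print(r)   # the rankings
```

Solving 𝑟 = 𝑟𝑃 by hand gives 𝑟 = (0.4, 0.2, 0.4), so pages 0 and 2 tie for first place. The iteration converges here because this small chain is irreducible and aperiodic (it has cycles of lengths 2 and 3).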
Your exercise is to apply this ranking algorithm to the graph pictured above and return the
list of pages ordered by rank.
There is a total of 14 nodes (i.e., web pages), the first named a and the last named n.
A typical line from the file has the form

d -> h;

This should be interpreted as meaning that there exists a link from d to h.


The data for this graph is shown below, and is written to a file called web_graph_data.txt when the cell is executed.

In [23]: %%file web_graph_data.txt
         a -> d;
         a -> f;
         b -> j;
         b -> k;
         b -> m;
         c -> c;
         c -> g;
         c -> j;
         c -> m;
         d -> f;
         d -> h;
         d -> k;
         e -> d;
         e -> h;
         e -> l;
         f -> a;
         f -> b;
         f -> j;
         f -> l;
         g -> b;
         g -> j;
         h -> d;
         h -> g;
         h -> l;
         h -> m;
         i -> g;
         i -> h;
         i -> n;
         j -> e;
         j -> i;
         j -> k;
         k -> n;
         l -> m;
         m -> g;
         n -> c;
         n -> j;
         n -> m;
Writing web_graph_data.txt

To parse this file and extract the relevant information, you can use regular expressions.
The following code snippet provides a hint as to how you can go about this

In [24]: import re
         re.findall(r'\w', 'x +++ y ****** z')  # \w matches alphanumerics

Out[24]: ['x', 'y', 'z']

In [25]: re.findall(r'\w', 'a ^^ b &&& $$ c')

Out[25]: ['a', 'b', 'c']

When you solve for the ranking, you will find that the highest ranked node is in fact g, while
the lowest is a.

10.10.3 Exercise 3

In numerical work, it is sometimes convenient to replace a continuous model with a discrete one.
In particular, Markov chains are routinely generated as discrete approximations to AR(1)
processes of the form

$$y_{t+1} = \rho y_t + u_{t+1}$$

Here $u_t$ is assumed to be IID and $N(0, \sigma_u^2)$.
The variance of the stationary probability distribution of $\{y_t\}$ is

$$\sigma_y^2 := \frac{\sigma_u^2}{1 - \rho^2}$$
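As a sanity check on this formula, the sketch below simulates a long AR(1) path with hypothetical parameters 𝜌 = 0.9 and 𝜎𝑢 = 1 and compares the sample variance, after discarding a burn-in period of 1,000 observations, with $\sigma_u^2/(1 - \rho^2)$.

```python
import numpy as np

ρ, σ_u = 0.9, 1.0                 # hypothetical parameters, |ρ| < 1
T = 200_000
rng = np.random.default_rng(0)
u = rng.normal(0.0, σ_u, size=T)

y = np.empty(T)
y[0] = 0.0
for t in range(T - 1):
    y[t + 1] = ρ * y[t] + u[t + 1]

σ_y2 = σ_u**2 / (1 - ρ**2)
print(σ_y2, y[1000:].var())       # theory vs. sample variance after burn-in
```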

Tauchen’s method [105] is the most common method for approximating this continuous state
process with a finite state Markov chain.
A routine for this already exists in QuantEcon.py but let’s write our own version as an exercise.
As a first step, we choose
• 𝑛, the number of states for the discrete approximation
• 𝑚, an integer that parameterizes the width of the state space
Next, we create a state space $\{x_0, \ldots, x_{n-1}\} \subset \mathbb{R}$ and a stochastic 𝑛 × 𝑛 matrix 𝑃 such that
• $x_0 = -m \sigma_y$
• $x_{n-1} = m \sigma_y$
• $x_{i+1} = x_i + s$ where $s = (x_{n-1} - x_0)/(n - 1)$
Let 𝐹 be the cumulative distribution function of the normal distribution $N(0, \sigma_u^2)$.
The values $P(x_i, x_j)$ are computed to approximate the AR(1) process. Omitting the derivation, the rules are as follows:

1. If 𝑗 = 0, then set

$$P(x_i, x_j) = P(x_i, x_0) = F(x_0 - \rho x_i + s/2)$$

2. If 𝑗 = 𝑛 − 1, then set

$$P(x_i, x_j) = P(x_i, x_{n-1}) = 1 - F(x_{n-1} - \rho x_i - s/2)$$

3. Otherwise, set

$$P(x_i, x_j) = F(x_j - \rho x_i + s/2) - F(x_j - \rho x_i - s/2)$$

The exercise is to write a function approx_markov(rho, sigma_u, m=3, n=7) that returns
{𝑥0 , … , 𝑥𝑛−1 } ⊂ ℝ and 𝑛 × 𝑛 matrix 𝑃 as described above.
• Even better, write a function that returns an instance of QuantEcon.py’s MarkovChain
class.

10.11 Solutions

10.11.1 Exercise 1

We will address this exercise graphically.


The plots show the time series of 𝑋̄ 𝑚 − 𝑝 for two initial conditions.
As 𝑚 gets large, both series converge to zero.

In [26]: α = β = 0.1
         N = 10000
         p = β / (α + β)

         P = ((1 - α,     α),    # Careful: P and p are distinct
              (    β, 1 - β))
         P = np.array(P)
         mc = MarkovChain(P)

         fig, ax = plt.subplots(figsize=(9, 6))
         ax.set_ylim(-0.25, 0.25)
         ax.grid()
         ax.hlines(0, 0, N, lw=2, alpha=0.6)   # Horizontal line at zero

         for x0, col in ((0, 'blue'), (1, 'green')):
             # Generate time series for worker that starts at x0
             X = mc.simulate(N, init=x0)
             # Compute fraction of time spent unemployed, for each n
             X_bar = (X == 0).cumsum() / (1 + np.arange(N, dtype=float))
             # Plot
             ax.fill_between(range(N), np.zeros(N), X_bar - p, color=col, alpha=0.1)
             ax.plot(X_bar - p, color=col, label=rf'$X_0 = \, {x0} $')
             # Overlay in black to make lines clearer
             ax.plot(X_bar - p, 'k-', alpha=0.6)

         ax.legend(loc='upper right')
         plt.show()

10.11.2 Exercise 2

In [27]: """
         Return list of pages, ordered by rank
         """
         import re
         from operator import itemgetter

         infile = 'web_graph_data.txt'
         alphabet = 'abcdefghijklmnopqrstuvwxyz'

         n = 14   # Total number of web pages (nodes)

         # Create a matrix Q indicating existence of links
         #  * Q[i, j] = 1 if there is a link from i to j
         #  * Q[i, j] = 0 otherwise
         Q = np.zeros((n, n), dtype=int)
         with open(infile, 'r') as f:
             edges = f.readlines()
         for edge in edges:
             from_node, to_node = re.findall(r'\w', edge)
             i, j = alphabet.index(from_node), alphabet.index(to_node)
             Q[i, j] = 1
         # Create the corresponding Markov matrix P
         P = np.empty((n, n))
         for i in range(n):
             P[i, :] = Q[i, :] / Q[i, :].sum()
         mc = MarkovChain(P)
         # Compute the stationary distribution r
         r = mc.stationary_distributions[0]
         ranked_pages = {alphabet[i] : r[i] for i in range(n)}
         # Print solution, sorted from highest to lowest rank
         print('Rankings\n ***')
         for name, rank in sorted(ranked_pages.items(), key=itemgetter(1), reverse=True):
             print(f'{name}: {rank:.4}')

Rankings
***
g: 0.1607
j: 0.1594
m: 0.1195
n: 0.1088
k: 0.09106
b: 0.08326
e: 0.05312
i: 0.05312
c: 0.04834
h: 0.0456
l: 0.03202
d: 0.03056
f: 0.01164
a: 0.002911

10.11.3 Exercise 3

A solution can be found in the QuantEcon.py library, which implements Tauchen’s method in its tauchen routine.
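For readers who want to compare with their own attempt, here is a minimal sketch of approx_markov that follows the three rules stated in Exercise 3 literally. It is not the QuantEcon.py implementation, and to stay dependency-free it evaluates the normal CDF 𝐹 through math.erf.

```python
from math import erf, sqrt

import numpy as np

def approx_markov(rho, sigma_u, m=3, n=7):
    """Tauchen-style discretization of y' = rho * y + u, u ~ N(0, sigma_u^2).

    Follows the rules in Exercise 3; returns the grid x and the n x n matrix P.
    """
    def F(z):
        # CDF of N(0, sigma_u^2), written with the error function
        return 0.5 * (1 + erf(z / (sigma_u * sqrt(2))))

    sigma_y = sqrt(sigma_u**2 / (1 - rho**2))      # stationary std of y_t
    x = np.linspace(-m * sigma_y, m * sigma_y, n)  # x_0, ..., x_{n-1}
    s = (x[-1] - x[0]) / (n - 1)                   # grid step

    P = np.empty((n, n))
    for i in range(n):
        P[i, 0] = F(x[0] - rho * x[i] + s / 2)
        P[i, n - 1] = 1 - F(x[n - 1] - rho * x[i] - s / 2)
        for j in range(1, n - 1):
            z = x[j] - rho * x[i]
            P[i, j] = F(z + s / 2) - F(z - s / 2)
    return x, P

x, P = approx_markov(0.9, 0.1)
print(x.round(3))
print(P.sum(axis=1))   # each row sums to one
```

The rows sum to one because the three rules telescope: each interior term cancels against its neighbours, leaving 𝐹(⋅) from the first rule and 1 − 𝐹(⋅) from the second. Passing the pair to qe.MarkovChain(P, state_values=x) gives the MarkovChain version the exercise asks for.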


Footnotes
[1] Hint: First show that if 𝑃 and 𝑄 are stochastic matrices then so is their product; to check the row sums, try postmultiplying by a column vector of ones. Finally, argue that $P^n$ is a stochastic matrix using induction.
Chapter 11

Inventory Dynamics

11.1 Contents

• Overview 11.2
• Sample Paths 11.3
• Marginal Distributions 11.4
• Exercises 11.5
• Solutions 11.6

11.2 Overview

In this lecture we will study the time path of inventories for firms that follow so-called s-S
inventory dynamics.
Such firms

1. wait until inventory falls below some level 𝑠 and then

2. order sufficient quantities to bring their inventory back up to capacity 𝑆.

These kinds of policies are common in practice and also optimal in certain circumstances.
A review of early literature and some macroeconomic implications can be found in [19].
Here our main aim is to learn more about simulation, time series and Markov dynamics.
While our Markov environment and many of the concepts we consider are related to those
found in our lecture on finite Markov chains, the state space is a continuum in the current
application.
Let’s start with some imports

In [1]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline

        from numba import njit, float64, prange
        from numba.experimental import jitclass


11.3 Sample Paths

Consider a firm with inventory 𝑋𝑡 .


The firm waits until 𝑋𝑡 ≤ 𝑠 and then restocks up to 𝑆 units.
It faces stochastic demand {𝐷𝑡 }, which we assume is IID.
With notation $a^+ := \max\{a, 0\}$, inventory dynamics can be written as

$$X_{t+1} = \begin{cases} (S - D_{t+1})^+ & \text{if } X_t \le s \\ (X_t - D_{t+1})^+ & \text{if } X_t > s \end{cases}$$

In what follows, we will assume that each 𝐷𝑡 is lognormal, so that

$$D_t = \exp(\mu + \sigma Z_t)$$

where 𝜇 and 𝜎 are parameters and {𝑍𝑡 } is IID and standard normal.
Here’s a class that stores parameters and generates time paths for inventory.

In [2]: firm_data = [
           ('s', float64),          # restock trigger level
           ('S', float64),          # capacity
           ('mu', float64),         # shock location parameter
           ('sigma', float64)       # shock scale parameter
        ]

        @jitclass(firm_data)
        class Firm:

            def __init__(self, s=10, S=100, mu=1.0, sigma=0.5):

                self.s, self.S, self.mu, self.sigma = s, S, mu, sigma

            def update(self, x):
                "Update the state from t to t+1 given current state x."

                Z = np.random.randn()
                D = np.exp(self.mu + self.sigma * Z)
                if x <= self.s:
                    return max(self.S - D, 0)
                else:
                    return max(x - D, 0)

            def sim_inventory_path(self, x_init, sim_length):

                X = np.empty(sim_length)
                X[0] = x_init

                for t in range(sim_length-1):
                    X[t+1] = self.update(X[t])
                return X


Let’s run a first simulation, of a single path:

In [3]: firm = Firm()

        s, S = firm.s, firm.S
        sim_length = 100
        x_init = 50

        X = firm.sim_inventory_path(x_init, sim_length)

        fig, ax = plt.subplots()
        bbox = (0., 1.02, 1., .102)
        legend_args = {'ncol': 3,
                       'bbox_to_anchor': bbox,
                       'loc': 3,
                       'mode': 'expand'}

        ax.plot(X, label="inventory")
        ax.plot(s * np.ones(sim_length), 'k--', label="$s$")
        ax.plot(S * np.ones(sim_length), 'k-', label="$S$")
        ax.set_ylim(0, S+10)
        ax.set_xlabel("time")
        ax.legend(**legend_args)

        plt.show()

Now let’s simulate multiple paths in order to build a more complete picture of the probabilities of different outcomes:

In [4]: sim_length = 200
        fig, ax = plt.subplots()

        ax.plot(s * np.ones(sim_length), 'k--', label="$s$")
        ax.plot(S * np.ones(sim_length), 'k-', label="$S$")
        ax.set_ylim(0, S+10)
        ax.legend(**legend_args)

        for i in range(400):
            X = firm.sim_inventory_path(x_init, sim_length)
            ax.plot(X, 'b', alpha=0.2, lw=0.5)

        plt.show()

11.4 Marginal Distributions

Now let’s look at the marginal distribution 𝜓𝑇 of 𝑋𝑇 for some fixed 𝑇 .


We will do this by generating many draws of 𝑋𝑇 given initial condition 𝑋0 .
With these draws of 𝑋𝑇 we can build up a picture of its distribution 𝜓𝑇.
Here’s one visualization, with 𝑇 = 50.

In [5]: T = 50
        M = 200  # Number of draws

        ymin, ymax = 0, S + 10

        fig, axes = plt.subplots(1, 2, figsize=(11, 6))

        for ax in axes:
            ax.grid(alpha=0.4)

        ax = axes[0]

        ax.set_ylim(ymin, ymax)
        ax.set_ylabel('$X_t$', fontsize=16)
        ax.vlines((T,), -1.5, 1.5)

        ax.set_xticks((T,))
        ax.set_xticklabels((r'$T$',))

        sample = np.empty(M)
        for m in range(M):
            X = firm.sim_inventory_path(x_init, 2 * T)
            ax.plot(X, 'b-', lw=1, alpha=0.5)
            ax.plot((T,), (X[T],), 'ko', alpha=0.5)   # inventory at time T
            sample[m] = X[T]

        axes[1].set_ylim(ymin, ymax)

        axes[1].hist(sample,
                     bins=16,
                     density=True,
                     orientation='horizontal',
                     histtype='bar',
                     alpha=0.5)

        plt.show()

We can build up a clearer picture by drawing more samples:

In [6]: T = 50
M = 50_000

fig, ax = plt.subplots()

sample = np.empty(M)
for m in range(M):
    X = firm.sim_inventory_path(x_init, T+1)
    sample[m] = X[T]

ax.hist(sample,
        bins=36,
        density=True,
        histtype='bar',
        alpha=0.75)

plt.show()

Note that the distribution is bimodal:

• Most firms have restocked twice but a few have restocked only once (see the figure with paths above).
• Firms in the second category have lower inventory.
We can also approximate the distribution using a kernel density estimator.
Kernel density estimators can be thought of as smoothed histograms.
They are preferable to histograms when the distribution being estimated is likely to be
smooth.
We will use a kernel density estimator from scikit-learn:

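In [7]: from sklearn.neighbors import KernelDensity

def plot_kde(sample, ax, label=''):

    # Evaluate the density on a grid slightly wider than the sample range
    xmin, xmax = 0.9 * min(sample), 1.1 * max(sample)
    xgrid = np.linspace(xmin, xmax, 200)

    # scikit-learn expects 2D input of shape (n_samples, n_features)
    kde = KernelDensity(kernel='gaussian').fit(sample[:, None])
    # score_samples returns log densities, so exponentiate before plotting
    log_dens = kde.score_samples(xgrid[:, None])

    ax.plot(xgrid, np.exp(log_dens), label=label)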

In [8]: fig, ax = plt.subplots()
plot_kde(sample, ax)
plt.show()

The allocation of probability mass is similar to what was shown by the histogram just above.
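To make the “smoothed histogram” interpretation concrete, here is a minimal NumPy sketch of a Gaussian kernel density estimator written out by hand. It is an illustration only, not the scikit-learn implementation used above; the helper name gaussian_kde_by_hand, the bandwidth of 0.5 and the normal test data are our own assumptions. Each draw contributes a small normal bump of width bandwidth, and the estimate is the average of those bumps.

```python
import numpy as np

def gaussian_kde_by_hand(sample, xgrid, bandwidth=1.0):
    """Hypothetical helper: average of Gaussian bumps centered at each draw."""
    sample = np.asarray(sample, dtype=float)
    # Standardized distance between every grid point and every draw
    u = (xgrid[:, None] - sample[None, :]) / bandwidth
    bumps = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    # Average over draws, then rescale so the estimate integrates to one
    return bumps.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
draws = rng.normal(loc=5.0, scale=2.0, size=2_000)   # illustrative data
xgrid = np.linspace(-5.0, 15.0, 401)
density = gaussian_kde_by_hand(draws, xgrid, bandwidth=0.5)

# Riemann-sum check that the estimate is (approximately) a density
total_mass = density.sum() * (xgrid[1] - xgrid[0])
print(total_mass, xgrid[np.argmax(density)])
```

A larger bandwidth gives a smoother but more biased estimate; a smaller one tracks the sample more closely, much like a histogram with narrow bins.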

11.5 Exercises

11.5.1 Exercise 1

This model is asymptotically stationary, with a unique stationary distribution.


(See the discussion of stationarity in our lecture on AR(1) processes for background — the
fundamental concepts are the same.)
In particular, the sequence of marginal distributions {𝜓𝑡 } is converging to a unique limiting
distribution that does not depend on initial conditions.
Although we will not prove this here, we can investigate it using simulation.
Your task is to generate and plot the sequence {𝜓𝑡 } at times 𝑡 = 10, 50, 250, 500, 750 based on
the discussion above.
(The kernel density estimator is probably the best way to present each distribution.)
You should see convergence, in the sense that differences between successive distributions are
getting smaller.

Try different initial conditions to verify that, in the long run, the distribution is invariant
across initial conditions.

11.5.2 Exercise 2

Using simulation, calculate the probability that firms that start with 𝑋0 = 70 need to order
twice or more in the first 50 periods.
You will need a large sample size to get an accurate reading.

11.6 Solutions

11.6.1 Exercise 1

Below is one possible solution:


The computations involve a lot of CPU cycles so we have tried to write the code efficiently.
This meant writing a specialized function rather than using the class above.

In [9]: from numba import njit, prange  # harmless if already imported

s, S, mu, sigma = firm.s, firm.S, firm.mu, firm.sigma

@njit(parallel=True)
def shift_firms_forward(current_inventory_levels, num_periods):

    num_firms = len(current_inventory_levels)
    new_inventory_levels = np.empty(num_firms)

    # Firms evolve independently, so the loop over firms runs in parallel
    for f in prange(num_firms):
        x = current_inventory_levels[f]
        for t in range(num_periods):
            Z = np.random.randn()
            D = np.exp(mu + sigma * Z)   # lognormal demand
            if x <= s:
                x = max(S - D, 0)        # restock up to S, then meet demand
            else:
                x = max(x - D, 0)
        new_inventory_levels[f] = x

    return new_inventory_levels

In [10]: x_init = 50
num_firms = 50_000

sample_dates = 0, 10, 50, 250, 500, 750

first_diffs = np.diff(sample_dates)

fig, ax = plt.subplots()

X = np.ones(num_firms) * x_init

current_date = 0
# Advance all firms from one sample date to the next and plot each cross-section
for d in first_diffs:
    X = shift_firms_forward(X, d)
    current_date += d
    plot_kde(X, ax, label=f't = {current_date}')

ax.set_xlabel('inventory')
ax.set_ylabel('density')
ax.legend()
plt.show()


Notice that by 𝑡 = 500 or 𝑡 = 750 the densities are barely changing.


We have reached a reasonable approximation of the stationary density.
You can convince yourself that initial conditions don’t matter by testing a few of them.
For example, try rerunning the code above with all firms starting at 𝑋0 = 20 or 𝑋0 = 80.

11.6.2 Exercise 2

Here is one solution.


Again, the computations are relatively intensive, so we have written a specialized function
rather than using the class above.
We will also use parallelization across firms.

In [11]: @njit(parallel=True)
def compute_freq(sim_length=50, x_init=70, num_firms=1_000_000):

    firm_counter = 0  # Records number of firms that restock 2x or more

    for m in prange(num_firms):
        x = x_init
        restock_counter = 0  # Will record number of restocks for firm m

        for t in range(sim_length):
            Z = np.random.randn()
            D = np.exp(mu + sigma * Z)
            if x <= s:
                x = max(S - D, 0)
                restock_counter += 1
            else:
                x = max(x - D, 0)

        # Numba treats this scalar += inside prange as a parallel reduction
        if restock_counter > 1:
            firm_counter += 1

    return firm_counter / num_firms

Note the time the routine takes to run, as well as the output.

In [12]: %%time

freq = compute_freq()
print(f"Frequency of at least two stock outs = {freq}")

Frequency of at least two stock outs = 0.446995


CPU times: user 3.77 s, sys: 27.7 ms, total: 3.8 s
Wall time: 2.21 s

Try switching the parallel flag to False in the jitted function above.
Depending on your system, the difference can be substantial.
(On our desktop machine, the speed-up is a factor of 5.)
Chapter 12

Linear State Space Models

12.1 Contents

• Overview 12.2
• The Linear State Space Model 12.3
• Distributions and Moments 12.4
• Stationarity and Ergodicity 12.5
• Noisy Observations 12.6
• Prediction 12.7
• Code 12.8
• Exercises 12.9
• Solutions 12.10

“We may regard the present state of the universe as the effect of its past and the
cause of its future” – Marquis de Laplace

In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

12.2 Overview

This lecture introduces the linear state space dynamic system.


The linear state space system is a generalization of the scalar AR(1) process we studied before.
This model is a workhorse that carries a powerful theory of prediction.
Its many applications include:
• representing dynamics of higher-order linear systems
• predicting the position of a system 𝑗 steps into the future
• predicting a geometric sum of future values of a variable like
– non-financial income
– dividends on a stock
– the money supply


– a government deficit or surplus, etc.


• serving as a key ingredient of useful models, such as
– Friedman’s permanent income model of consumption smoothing.
– Barro’s model of smoothing total tax collections.
– Rational expectations version of Cagan’s model of hyperinflation.
– Sargent and Wallace’s “unpleasant monetarist arithmetic,” etc.
Let’s start with some imports:

In [2]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline
from quantecon import LinearStateSpace
from scipy.stats import norm
import random


12.3 The Linear State Space Model

The objects in play are:


• An 𝑛 × 1 vector 𝑥𝑡 denoting the state at time 𝑡 = 0, 1, 2, ….
• An IID sequence of 𝑚 × 1 random vectors 𝑤𝑡 ∼ 𝑁 (0, 𝐼).
• A 𝑘 × 1 vector 𝑦𝑡 of observations at time 𝑡 = 0, 1, 2, ….
• An 𝑛 × 𝑛 matrix 𝐴 called the transition matrix.
• An 𝑛 × 𝑚 matrix 𝐶 called the volatility matrix.
• A 𝑘 × 𝑛 matrix 𝐺 sometimes called the output matrix.
Here is the linear state-space system:

$$
\begin{aligned}
x_{t+1} &= A x_t + C w_{t+1} \\
y_t &= G x_t \\
x_0 &\sim N(\mu_0, \Sigma_0)
\end{aligned}
\tag{1}
$$

12.3.1 Primitives

The primitives of the model are:

1. the matrices 𝐴, 𝐶, 𝐺

2. the shock distribution, which we have specialized to 𝑁 (0, 𝐼)

3. the distribution of the initial condition 𝑥0 , which we have set to 𝑁 (𝜇0 , Σ0 )

Given 𝐴, 𝐶, 𝐺 and draws of 𝑥0 and 𝑤1 , 𝑤2 , …, the model (1) pins down the values of the sequences {𝑥𝑡 } and {𝑦𝑡 }.

Even without these draws, the primitives 1–3 pin down the probability distributions of {𝑥𝑡 }
and {𝑦𝑡 }.
Later we’ll see how to compute these distributions and their moments.

Martingale Difference Shocks

We’ve made the common assumption that the shocks are independent standardized normal
vectors.
But some of what we say will be valid under the assumption that {𝑤𝑡+1 } is a martingale
difference sequence.
A martingale difference sequence is a sequence that is zero mean when conditioned on past
information.
In the present case, since {𝑥𝑡 } is our state sequence, this means that it satisfies

𝔼[𝑤𝑡+1 |𝑥𝑡 , 𝑥𝑡−1 , …] = 0

This is a weaker condition than that {𝑤𝑡 } is IID with 𝑤𝑡+1 ∼ 𝑁 (0, 𝐼).

12.3.2 Examples

By appropriate choice of the primitives, a variety of dynamics can be represented in terms of the linear state space model.
The following examples help to highlight this point.
They also illustrate the wise dictum “finding the state is an art.”

Second-order Difference Equation

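Let {𝑦𝑡 } be a deterministic sequence that satisfies

$$
y_{t+1} = \phi_0 + \phi_1 y_t + \phi_2 y_{t-1} \quad \text{s.t.} \quad y_0, y_{-1} \text{ given} \tag{2}
$$

To map (2) into our state space system (1), we set

$$
x_t = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \end{bmatrix}
\qquad
A = \begin{bmatrix} 1 & 0 & 0 \\ \phi_0 & \phi_1 & \phi_2 \\ 0 & 1 & 0 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}
$$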

You can confirm that under these definitions, (1) and (2) agree.
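As a quick numerical sanity check, the sketch below iterates the recursion (2) directly and, separately, iterates 𝑥𝑡+1 = 𝐴𝑥𝑡 with the matrices above (here 𝐶 = 0), using the parameter values from the next figure. It uses plain NumPy rather than QuantEcon, and the variable names are our own.

```python
import numpy as np

phi_0, phi_1, phi_2 = 1.1, 0.8, -0.8   # values used in the next figure
A = np.array([[1.0,   0.0,   0.0],
              [phi_0, phi_1, phi_2],
              [0.0,   1.0,   0.0]])
G = np.array([0.0, 1.0, 0.0])
T = 30

# Direct recursion (2) with y_0 = y_{-1} = 1
y = [1.0, 1.0]                  # y[0] holds y_{-1}, y[1] holds y_0
for _ in range(T):
    y.append(phi_0 + phi_1 * y[-1] + phi_2 * y[-2])
y_direct = np.array(y[1:])      # y_0, ..., y_T

# State space iteration x_{t+1} = A x_t (C = 0) from x_0 = (1, y_0, y_{-1})
x = np.array([1.0, 1.0, 1.0])
y_state = [G @ x]
for _ in range(T):
    x = A @ x
    y_state.append(G @ x)
y_state = np.array(y_state)

print(np.max(np.abs(y_direct - y_state)))
```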
The next figure shows the dynamics of this process when 𝜙0 = 1.1, 𝜙1 = 0.8, 𝜙2 = −0.8, 𝑦0 =
𝑦−1 = 1.

In [3]: def plot_lss(A,
             C,
             G,
             n=3,
             ts_length=50):

    # Start the mean of x_0 at a vector of ones
    ar = LinearStateSpace(A, C, G, mu_0=np.ones(n))
    x, y = ar.simulate(ts_length)

    fig, ax = plt.subplots()
    y = y.flatten()
    ax.plot(y, 'b-', lw=2, alpha=0.7)
    ax.grid()
    ax.set_xlabel('time', fontsize=12)
    ax.set_ylabel('$y_t$', fontsize=12)
    plt.show()

In [4]: ϕ_0, ϕ_1, ϕ_2 = 1.1, 0.8, -0.8

A = [[1,   0,   0  ],
     [ϕ_0, ϕ_1, ϕ_2],
     [0,   1,   0  ]]

C = np.zeros((3, 1))
G = [0, 1, 0]

plot_lss(A, C, G)

Later you’ll be asked to recreate this figure.

Univariate Autoregressive Processes

We can use (1) to represent the model

$$
y_{t+1} = \phi_1 y_t + \phi_2 y_{t-1} + \phi_3 y_{t-2} + \phi_4 y_{t-3} + \sigma w_{t+1} \tag{3}
$$

where {𝑤𝑡 } is IID and standard normal.


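To put this in the linear state space format we take $x_t = \begin{bmatrix} y_t & y_{t-1} & y_{t-2} & y_{t-3} \end{bmatrix}'$ and

$$
A = \begin{bmatrix} \phi_1 & \phi_2 & \phi_3 & \phi_4 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\qquad
C = \begin{bmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}
$$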

The matrix 𝐴 has the form of the companion matrix to the vector [𝜙1 𝜙2 𝜙3 𝜙4 ].
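Because 𝐴 is a companion matrix, its eigenvalues are exactly the roots of λ⁴ − 𝜙1λ³ − 𝜙2λ² − 𝜙3λ − 𝜙4. The NumPy sketch below checks this for the parameter values used in the next figure; the largest modulus is what governs whether simulated paths settle down.

```python
import numpy as np

phi_1, phi_2, phi_3, phi_4 = 0.5, -0.2, 0.0, 0.5   # values used in the next figure
A = np.array([[phi_1, phi_2, phi_3, phi_4],
              [1.0,   0.0,   0.0,   0.0],
              [0.0,   1.0,   0.0,   0.0],
              [0.0,   0.0,   1.0,   0.0]])

eig_A = np.linalg.eigvals(A)
# Roots of the characteristic polynomial λ^4 - ϕ1 λ^3 - ϕ2 λ^2 - ϕ3 λ - ϕ4
poly_roots = np.roots([1.0, -phi_1, -phi_2, -phi_3, -phi_4])

# Every eigenvalue should match some polynomial root, up to rounding error
gaps = [np.min(np.abs(poly_roots - lam)) for lam in eig_A]
print(max(gaps))
print("largest modulus:", np.max(np.abs(eig_A)))
```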
The next figure shows the dynamics of this process when

𝜙1 = 0.5, 𝜙2 = −0.2, 𝜙3 = 0, 𝜙4 = 0.5, 𝜎 = 0.2, 𝑦0 = 𝑦−1 = 𝑦−2 = 𝑦−3 = 1

In [5]: ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.2

A_1 = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
       [1,   0,   0,   0  ],
       [0,   1,   0,   0  ],
       [0,   0,   1,   0  ]]

C_1 = [[σ],
       [0],
       [0],
       [0]]

G_1 = [1, 0, 0, 0]

plot_lss(A_1, C_1, G_1, n=4, ts_length=200)



Vector Autoregressions

Now suppose that


• 𝑦𝑡 is a 𝑘 × 1 vector
• 𝜙𝑗 is a 𝑘 × 𝑘 matrix and
• 𝑤𝑡 is 𝑘 × 1
Then (3) is termed a vector autoregression.
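To map this into (1), we set

$$
x_t = \begin{bmatrix} y_t \\ y_{t-1} \\ y_{t-2} \\ y_{t-3} \end{bmatrix}
\qquad
A = \begin{bmatrix} \phi_1 & \phi_2 & \phi_3 & \phi_4 \\ I & 0 & 0 & 0 \\ 0 & I & 0 & 0 \\ 0 & 0 & I & 0 \end{bmatrix}
\qquad
C = \begin{bmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} I & 0 & 0 & 0 \end{bmatrix}
$$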

where 𝐼 is the 𝑘 × 𝑘 identity matrix and 𝜎 is a 𝑘 × 𝑘 matrix.
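The same stacking works for any number of lags. The following sketch builds 𝐴, 𝐶 and 𝐺 for a VAR with 𝑝 lags of a 𝑘 × 1 vector; var_companion is a hypothetical helper written for this illustration, and the random lag matrices are placeholder values. The final check confirms that the first block of 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 reproduces the VAR while the remaining blocks just shift the lags down.

```python
import numpy as np

def var_companion(phis, sigma):
    """Hypothetical helper: stack a VAR with k x k lag matrices phis[0], ..., phis[p-1]
    and k x k shock loading sigma into state space matrices A, C, G."""
    phis = [np.asarray(phi, dtype=float) for phi in phis]
    k, p = phis[0].shape[0], len(phis)
    A = np.zeros((k * p, k * p))
    A[:k, :] = np.hstack(phis)           # first block row: ϕ_1, ..., ϕ_p
    A[k:, :-k] = np.eye(k * (p - 1))     # identity blocks shift each lag down
    C = np.zeros((k * p, k))
    C[:k, :] = sigma                     # shocks enter only the y_t block
    G = np.hstack([np.eye(k), np.zeros((k, k * (p - 1)))])
    return A, C, G

rng = np.random.default_rng(1)
k, p = 2, 4
phis = [0.3 * rng.standard_normal((k, k)) for _ in range(p)]   # placeholder values
sigma = np.array([[0.2, 0.0],
                  [0.05, 0.1]])
A, C, G = var_companion(phis, sigma)

x = rng.standard_normal(k * p)       # stacked (y_t, y_{t-1}, y_{t-2}, y_{t-3})
w = rng.standard_normal(k)           # w_{t+1}
x_next = A @ x + C @ w

# First block should equal ϕ_1 y_t + ϕ_2 y_{t-1} + ϕ_3 y_{t-2} + ϕ_4 y_{t-3} + σ w_{t+1}
y_next = sum(phis[j] @ x[j * k:(j + 1) * k] for j in range(p)) + sigma @ w
print(np.allclose(x_next[:k], y_next), np.allclose(x_next[k:], x[:-k]))
```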

Seasonals

We can use (1) to represent

1. the deterministic seasonal 𝑦𝑡 = 𝑦𝑡−4


2. the indeterministic seasonal 𝑦𝑡 = 𝜙4 𝑦𝑡−4 + 𝑤𝑡

In fact, both are special cases of (3).


With the deterministic seasonal, the transition matrix becomes

$$
A = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
$$

It is easy to check that $A^4 = I$, which implies that 𝑥𝑡 is strictly periodic with period 4:

$$
x_{t+4} = x_t
$$

Such an 𝑥𝑡 process can be used to model deterministic seasonals in quarterly time series.
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations.
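A short NumPy check of the periodicity claim, assuming nothing beyond the deterministic-seasonal matrix above (the initial quarters in x_0 are arbitrary illustrative numbers):

```python
import numpy as np

# Transition matrix for the deterministic seasonal y_t = y_{t-4}
A = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])

A4 = np.linalg.matrix_power(A, 4)
print(A4)                                 # the 4 x 4 identity matrix

x_0 = np.array([3.0, -1.0, 2.0, 0.5])     # arbitrary illustrative quarters
path = [x_0]
for _ in range(8):
    path.append(A @ path[-1])
print(path[4], path[8])                   # both equal x_0
```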

Time Trends

The model $y_t = a t + b$ is known as a linear time trend.

We can represent this model in the linear state space form by taking

$$
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} a & b \end{bmatrix}
\tag{4}
$$

and starting at initial condition $x_0 = \begin{bmatrix} 0 & 1 \end{bmatrix}'$.
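In fact, it’s possible to use the state-space system to represent polynomial trends of any order.
For instance, we can represent the model $y_t = a t^2 + b t + c$ in the linear state space form by taking

$$
A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} 2a & a + b & c \end{bmatrix}
$$

and starting at initial condition $x_0 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}'$.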
It follows that

$$
A^t = \begin{bmatrix} 1 & t & t(t-1)/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}
$$

Then $x_t' = \begin{bmatrix} t(t-1)/2 & t & 1 \end{bmatrix}$. You can now confirm that 𝑦𝑡 = 𝐺𝑥𝑡 has the correct form.
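The sketch below confirms numerically that the quadratic-trend matrices reproduce $y_t = a t^2 + b t + c$ and that $A^5$ has the stated closed form; the coefficient values are arbitrary choices for illustration.

```python
import numpy as np

a, b, c = 0.5, -2.0, 3.0            # illustrative trend coefficients
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
G = np.array([2 * a, a + b, c])
T = 20

x = np.array([0.0, 0.0, 1.0])       # x_0
y_state = []
for _ in range(T):
    y_state.append(G @ x)           # y_t = G x_t
    x = A @ x                       # x_{t+1} = A x_t  (C = 0)
y_state = np.array(y_state)

t = np.arange(T)
y_direct = a * t**2 + b * t + c
print(np.max(np.abs(y_state - y_direct)))
print(np.linalg.matrix_power(A, 5))  # entries [[1, 5, 10], [0, 1, 5], [0, 0, 1]]
```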

12.3.3 Moving Average Representations

A nonrecursive expression for 𝑥𝑡 as a function of 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 can be found by using (1) repeatedly to obtain

$$
\begin{aligned}
x_t &= A x_{t-1} + C w_t \\
    &= A^2 x_{t-2} + A C w_{t-1} + C w_t \\
    &\;\;\vdots \\
    &= \sum_{j=0}^{t-1} A^j C w_{t-j} + A^t x_0
\end{aligned}
\tag{5}
$$

Representation (5) is a moving average representation.


It expresses {𝑥𝑡 } as a linear function of

1. current and past values of the process {𝑤𝑡 } and

2. the initial condition 𝑥0
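To see that (5) is just (1) unrolled, the following sketch draws arbitrary 𝐴, 𝐶, 𝑥0 and shocks (all illustrative random values), then computes 𝑥𝑡 both recursively and from the moving average formula.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, t = 3, 2, 12
A = 0.5 * rng.standard_normal((n, n))    # arbitrary illustrative matrices
C = rng.standard_normal((n, m))
x_0 = rng.standard_normal(n)
w = rng.standard_normal((t + 1, m))      # rows w[1], ..., w[t]; row w[0] is unused

# Recursive computation: x_s = A x_{s-1} + C w_s for s = 1, ..., t
x_rec = x_0.copy()
for s in range(1, t + 1):
    x_rec = A @ x_rec + C @ w[s]

# Moving average representation (5): sum_j A^j C w_{t-j} + A^t x_0
x_ma = np.linalg.matrix_power(A, t) @ x_0
A_j = np.eye(n)                          # A^0
for j in range(t):
    x_ma = x_ma + A_j @ C @ w[t - j]
    A_j = A @ A_j                        # becomes A^{j+1}

print(np.max(np.abs(x_rec - x_ma)))
```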

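As an example of a moving average representation, let the model be

$$
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
$$

You will be able to show that $A^t = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$ and $A^j C = \begin{bmatrix} 1 & 0 \end{bmatrix}'$.
Substituting into the moving average representation (5), we obtain

$$
x_{1t} = \sum_{j=0}^{t-1} w_{t-j} + \begin{bmatrix} 1 & t \end{bmatrix} x_0
$$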

where 𝑥1𝑡 is the first entry of 𝑥𝑡 .


The first term on the right is a cumulated sum of martingale differences and is therefore a
martingale.
The second term is a translated linear function of time.
For this reason, 𝑥1𝑡 is called a martingale with drift.

12.4 Distributions and Moments

12.4.1 Unconditional Moments

Using (1), it’s easy to obtain expressions for the (unconditional) means of 𝑥𝑡 and 𝑦𝑡 .
We’ll explain what unconditional and conditional mean soon.
Letting $\mu_t := \mathbb{E}[x_t]$ and using linearity of expectations, we find that

$$
\mu_{t+1} = A \mu_t \quad \text{with } \mu_0 \text{ given} \tag{6}
$$

Here 𝜇0 is a primitive given in (1).


The variance-covariance matrix of 𝑥𝑡 is Σ𝑡 ∶= 𝔼[(𝑥𝑡 − 𝜇𝑡 )(𝑥𝑡 − 𝜇𝑡 )′ ].
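Using 𝑥𝑡+1 − 𝜇𝑡+1 = 𝐴(𝑥𝑡 − 𝜇𝑡 ) + 𝐶𝑤𝑡+1 , and noting that the cross terms vanish because 𝑤𝑡+1 has zero mean and is independent of 𝑥𝑡 , we can determine this matrix recursively via

$$
\Sigma_{t+1} = A \Sigma_t A' + C C' \quad \text{with } \Sigma_0 \text{ given} \tag{7}
$$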

As with 𝜇0 , the matrix Σ0 is a primitive given in (1).
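Recursions (6) and (7) are easy to iterate directly. The sketch below does so with a hypothetical helper moment_path (not part of QuantEcon) and checks it on the scalar AR(1) $x_{t+1} = 0.5 x_t + w_{t+1}$ started from a known value (so $\Sigma_0 = 0$), whose variance should approach $1/(1 - 0.5^2) = 4/3$.

```python
import numpy as np

def moment_path(A, C, mu_0, Sigma_0, T):
    """Hypothetical helper: iterate (6) and (7) for T periods and
    return the lists [μ_0, ..., μ_T] and [Σ_0, ..., Σ_T]."""
    A = np.atleast_2d(A).astype(float)
    C = np.atleast_2d(C).astype(float)
    mu = np.asarray(mu_0, dtype=float)
    Sigma = np.atleast_2d(Sigma_0).astype(float)
    mus, Sigmas = [mu], [Sigma]
    for _ in range(T):
        mu = A @ mu                          # (6)
        Sigma = A @ Sigma @ A.T + C @ C.T    # (7)
        mus.append(mu)
        Sigmas.append(Sigma)
    return mus, Sigmas

# Scalar AR(1): x_{t+1} = 0.5 x_t + w_{t+1}, with x_0 = 2 known for certain
mus, Sigmas = moment_path([[0.5]], [[1.0]], [2.0], [[0.0]], T=50)
print(mus[-1], Sigmas[-1])   # mean near 0, variance near 4/3
```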


As a matter of terminology, we will sometimes call
• 𝜇𝑡 the unconditional mean of 𝑥𝑡
• Σ𝑡 the unconditional variance-covariance matrix of 𝑥𝑡
This is to distinguish 𝜇𝑡 and Σ𝑡 from related objects that use conditioning information, to be
defined below.
However, you should be aware that these “unconditional” moments do depend on the initial
distribution 𝑁 (𝜇0 , Σ0 ).

Moments of the Observations

Using linearity of expectations again we have

$$
\mathbb{E}[y_t] = \mathbb{E}[G x_t] = G \mu_t \tag{8}
$$

The variance-covariance matrix of 𝑦𝑡 is easily shown to be

$$
\operatorname{Var}[y_t] = \operatorname{Var}[G x_t] = G \Sigma_t G' \tag{9}
$$

12.4.2 Distributions

In general, knowing the mean and variance-covariance matrix of a random vector is not quite
as good as knowing the full distribution.
However, there are some situations where these moments alone tell us all we need to know.
These are situations in which the mean vector and covariance matrix are sufficient statistics for the population distribution.
(Sufficient statistics form a list of objects that characterize a population distribution.)
One such situation is when the vector in question is Gaussian (i.e., normally distributed).
This is the case here, given

1. our Gaussian assumptions on the primitives

2. the fact that normality is preserved under linear operations

In fact, it’s well-known that

$$
u \sim N(\bar u, S) \quad \text{and} \quad v = a + B u \implies v \sim N(a + B \bar u, B S B') \tag{10}
$$

In particular, given our Gaussian assumptions on the primitives and the linearity of (1), we can see immediately that both 𝑥𝑡 and 𝑦𝑡 are Gaussian for all 𝑡 ≥ 0.
Since 𝑥𝑡 is Gaussian, to find the distribution, all we need to do is find its mean and variance-covariance matrix.
But in fact we’ve already done this, in (6) and (7).
Letting 𝜇𝑡 and Σ𝑡 be as defined by these equations, we have

$$
x_t \sim N(\mu_t, \Sigma_t) \tag{11}
$$

By similar reasoning combined with (8) and (9),

$$
y_t \sim N(G \mu_t, G \Sigma_t G') \tag{12}
$$
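Property (10) is what makes (11) and (12) work. Here is a Monte Carlo sketch of it using NumPy's random generator; the particular ū, 𝑆, 𝑎 and 𝐵 are arbitrary illustrative values, and the sample moments should match the formula up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)
u_bar = np.array([1.0, -1.0])                # illustrative mean of u
S = np.array([[1.0, 0.3],
              [0.3, 0.5]])                   # illustrative covariance of u
a = np.array([0.5, 0.0, 2.0])
B = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [-1.0, 1.0]])

u = rng.multivariate_normal(u_bar, S, size=200_000)
v = a + u @ B.T                              # each row is a + B u

print(v.mean(axis=0), a + B @ u_bar)         # sample mean vs a + B ū
print(np.cov(v, rowvar=False))               # sample covariance ...
print(B @ S @ B.T)                           # ... vs B S B'
```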

12.4.3 Ensemble Interpretations

How should we interpret the distributions defined by (11)–(12)?


Intuitively, the probabilities in a distribution correspond to relative frequencies in a large
population drawn from that distribution.
Let’s apply this idea to our setting, focusing on the distribution of 𝑦𝑇 for fixed 𝑇 .
We can generate independent draws of 𝑦𝑇 by repeatedly simulating the evolution of the system up to time 𝑇 , using an independent set of shocks each time.

The next figure shows 20 simulations, producing 20 time series for {𝑦𝑡 }, and hence 20 draws
of 𝑦𝑇 .
The system in question is the univariate autoregressive model (3).
The values of 𝑦𝑇 are represented by black dots in the left-hand figure:

In [6]: def cross_section_plot(A,
                       C,
                       G,
                       T=20,             # Set the time
                       ymin=-0.8,
                       ymax=1.25,
                       sample_size=20,   # 20 observations/simulations
                       n=4):             # The number of dimensions for the initial x0

    ar = LinearStateSpace(A, C, G, mu_0=np.ones(n))

    fig, axes = plt.subplots(1, 2, figsize=(16, 5))

    for ax in axes:
        ax.grid(alpha=0.4)
        ax.set_ylim(ymin, ymax)

    ax = axes[0]
    ax.set_ylim(ymin, ymax)
    ax.set_ylabel('$y_t$', fontsize=12)
    ax.set_xlabel('time', fontsize=12)
    ax.vlines((T,), -1.5, 1.5)

    ax.set_xticks((T,))
    ax.set_xticklabels(('$T$',))

    # Simulate independent paths and record the value of each one at date T
    sample = []
    for i in range(sample_size):
        rcolor = random.choice(('c', 'g', 'b', 'k'))
        x, y = ar.simulate(ts_length=T+15)
        y = y.flatten()
        ax.plot(y, color=rcolor, lw=1, alpha=0.5)
        ax.plot((T,), (y[T],), 'ko', alpha=0.5)
        sample.append(y[T])

    axes[1].set_ylim(ymin, ymax)
    axes[1].set_ylabel('$y_t$', fontsize=12)
    axes[1].set_xlabel('relative frequency', fontsize=12)
    axes[1].hist(sample, bins=16, density=True, orientation='horizontal', alpha=0.5)
    plt.show()

In [7]: ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.1

A_2 = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
       [1,   0,   0,   0],
       [0,   1,   0,   0],
       [0,   0,   1,   0]]

C_2 = [[σ], [0], [0], [0]]

G_2 = [1, 0, 0, 0]

cross_section_plot(A_2, C_2, G_2)



In the right-hand figure, these values are converted into a rotated histogram that shows relative frequencies from our sample of 20 𝑦𝑇 ’s.
Here is another figure, this time with the cross section taken at 𝑇 = 100 (the sample size is still 20):

In [8]: t = 100
cross_section_plot(A_2, C_2, G_2, T=t)

Let’s now try with 500,000 observations, showing only the histogram (without rotation):

In [9]: T = 100
ymin = -0.8
ymax = 1.25
sample_size = 500_000

ar = LinearStateSpace(A_2, C_2, G_2, mu_0=np.ones(4))

fig, ax = plt.subplots()
# One long simulated path of length sample_size
x, y = ar.simulate(sample_size)
mu_x, mu_y, Sigma_x, Sigma_y = ar.stationary_distributions()
# mu_y and Sigma_y hold a single value here; .item() extracts it as a float
f_y = norm(loc=mu_y.item(), scale=np.sqrt(Sigma_y).item())
y = y.flatten()
ygrid = np.linspace(ymin, ymax, 150)

ax.hist(y, bins=50, density=True, alpha=0.4)
ax.plot(ygrid, f_y.pdf(ygrid), 'k-', lw=2, alpha=0.8, label=r'true density')
ax.set_xlim(ymin, ymax)
ax.set_xlabel('$y_t$', fontsize=12)
ax.set_ylabel('relative frequency', fontsize=12)
ax.legend(fontsize=12)
plt.show()

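The black line is the long-run (stationary) population density of 𝑦𝑡, computed with the stationary_distributions() method; because this system forgets its initial condition quickly, it is essentially the density of 𝑦𝑇 given by (12) at 𝑇 = 100. Note also that the histogram is built from a single long simulated path rather than from independent draws of 𝑦𝑇, which anticipates the discussion of ergodicity below.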


The histogram and population distribution are close, as expected.
By looking at the figures and experimenting with parameters, you will gain a feel for how the
population distribution depends on the model primitives listed above, as intermediated by the
distribution’s sufficient statistics.

Ensemble Means

In the preceding figure, we approximated the population distribution of 𝑦𝑇 by

1. generating 𝐼 sample paths (i.e., time series) where 𝐼 is a large number

2. recording each observation $y_T^i$

3. histogramming this sample

Just as the histogram approximates the population distribution, the ensemble or cross-sectional average

$$
\bar y_T := \frac{1}{I} \sum_{i=1}^I y_T^i
$$

approximates the expectation 𝔼[𝑦𝑇 ] = 𝐺𝜇𝑇 (as implied by the law of large numbers).
Here’s a simulation comparing the ensemble averages and population means at time points
𝑡 = 0, … , 50.
The parameters are the same as for the preceding figures, and the sample size is relatively
small (𝐼 = 20).

In [10]: I = 20
T = 50
ymin = -0.5
ymax = 1.15

ar = LinearStateSpace(A_2, C_2, G_2, mu_0=np.ones(4))

fig, ax = plt.subplots()

# Accumulate the cross-sectional sum of I simulated paths
ensemble_mean = np.zeros(T)
for i in range(I):
    x, y = ar.simulate(ts_length=T)
    y = y.flatten()
    ax.plot(y, 'c-', lw=0.8, alpha=0.5)
    ensemble_mean = ensemble_mean + y

ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label='$\\bar y_t$')

# moment_sequence() is a generator yielding (μ_x, μ_y, Σ_x, Σ_y) for t = 0, 1, 2, ...
m = ar.moment_sequence()

population_means = []
for t in range(T):
    μ_x, μ_y, Σ_x, Σ_y = next(m)
    population_means.append(μ_y.item())

ax.plot(population_means, color='g', lw=2, alpha=0.8, label=r'$G\mu_t$')

ax.set_ylim(ymin, ymax)
ax.set_xlabel('time', fontsize=12)
ax.set_ylabel('$y_t$', fontsize=12)
ax.legend(ncol=2)
plt.show()

The ensemble mean for 𝑥𝑡 is

$$
\bar x_T := \frac{1}{I} \sum_{i=1}^I x_T^i \to \mu_T \qquad (I \to \infty)
$$

The limit 𝜇𝑇 is a “long-run average”.


(By long-run average we mean the average for an infinite (𝐼 = ∞) number of sample 𝑥𝑇 ’s.)
Another application of the law of large numbers assures us that

$$
\frac{1}{I} \sum_{i=1}^I (x_T^i - \bar x_T)(x_T^i - \bar x_T)' \to \Sigma_T \qquad (I \to \infty)
$$
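These two limits can be checked in a scalar example. The sketch below simulates 𝐼 independent paths of $x_{t+1} = 0.9 x_t + w_{t+1}$ from the fixed starting point $x_0 = 1$ (so $\Sigma_0 = 0$) and compares the cross-sectional mean and variance at 𝑇 = 10 with the values implied by (6) and (7); the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(123)
a, c = 0.9, 1.0              # scalar model x_{t+1} = a x_t + c w_{t+1}
mu_0, T, I = 1.0, 10, 200_000

# I independent paths, all starting at x_0 = mu_0 (so Σ_0 = 0)
x = np.full(I, mu_0)
for _ in range(T):
    x = a * x + c * rng.standard_normal(I)

# Population moments implied by (6) and (7)
mu_T = a**T * mu_0
Sigma_T = c**2 * (1 - a**(2 * T)) / (1 - a**2)

print(x.mean(), mu_T)        # ensemble mean vs μ_T
print(x.var(), Sigma_T)      # ensemble variance vs Σ_T
```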

12.4.4 Joint Distributions

In the preceding discussion, we looked at the distributions of 𝑥𝑡 and 𝑦𝑡 in isolation.


This gives us useful information but doesn’t allow us to answer questions like
• what’s the probability that 𝑥𝑡 ≥ 0 for all 𝑡?
• what’s the probability that the process {𝑦𝑡 } exceeds some value 𝑎 before falling below
𝑏?
• etc., etc.
Such questions concern the joint distributions of these sequences.
To compute the joint distribution of 𝑥0 , 𝑥1 , … , 𝑥𝑇 , recall that joint and conditional densities
are linked by the rule

𝑝(𝑥, 𝑦) = 𝑝(𝑦 | 𝑥)𝑝(𝑥) (joint = conditional × marginal)

From this rule we get 𝑝(𝑥0 , 𝑥1 ) = 𝑝(𝑥1 | 𝑥0 )𝑝(𝑥0 ).


The Markov property 𝑝(𝑥𝑡 | 𝑥𝑡−1 , … , 𝑥0 ) = 𝑝(𝑥𝑡 | 𝑥𝑡−1 ) and repeated applications of the preceding rule lead us to

$$
p(x_0, x_1, \ldots, x_T) = p(x_0) \prod_{t=0}^{T-1} p(x_{t+1} \mid x_t)
$$

The marginal 𝑝(𝑥0 ) is just the primitive 𝑁 (𝜇0 , Σ0 ).


In view of (1), the conditional densities are

𝑝(𝑥𝑡+1 | 𝑥𝑡 ) = 𝑁 (𝐴𝑥𝑡 , 𝐶𝐶 ′ )
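The factorization can be verified numerically. In the scalar sketch below (illustrative parameters), the log density of one simulated path computed from $p(x_0) \prod_t p(x_{t+1} \mid x_t)$ is compared with the log density of the same path under the joint Gaussian distribution of $(x_0, \ldots, x_T)$, whose covariances follow from $\operatorname{Cov}(x_{t+j}, x_t) = a^j \operatorname{Var}(x_t)$. It uses scipy.stats.norm and scipy.stats.multivariate_normal.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

a, c = 0.8, 0.5              # scalar model x_{t+1} = a x_t + c w_{t+1}
mu_0, Sigma_0 = 1.0, 2.0     # x_0 ~ N(mu_0, Sigma_0)
T = 5
rng = np.random.default_rng(7)

# Simulate one path x_0, ..., x_T
x = np.empty(T + 1)
x[0] = mu_0 + np.sqrt(Sigma_0) * rng.standard_normal()
for t in range(T):
    x[t + 1] = a * x[t] + c * rng.standard_normal()

# Markov factorization: log p(x_0) + sum_t log p(x_{t+1} | x_t)
log_p_factored = norm.logpdf(x[0], loc=mu_0, scale=np.sqrt(Sigma_0))
log_p_factored += norm.logpdf(x[1:], loc=a * x[:-1], scale=c).sum()

# Joint Gaussian: means a^t mu_0, variances from the scalar version of (7)
mean = mu_0 * a ** np.arange(T + 1)
var = np.empty(T + 1)
var[0] = Sigma_0
for t in range(T):
    var[t + 1] = a**2 * var[t] + c**2
i, j = np.meshgrid(np.arange(T + 1), np.arange(T + 1), indexing="ij")
cov = a ** np.abs(i - j) * var[np.minimum(i, j)]   # Cov(x_i, x_j)
log_p_joint = multivariate_normal(mean=mean, cov=cov).logpdf(x)

print(log_p_factored, log_p_joint)   # the two numbers agree
```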

Autocovariance Functions

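An important object related to the joint distribution is the autocovariance function

$$
\Sigma_{t+j,t} := \mathbb{E}[(x_{t+j} - \mu_{t+j})(x_t - \mu_t)'] \tag{13}
$$

Since $x_{t+j} - \mu_{t+j} = A^j (x_t - \mu_t) + \sum_{s=0}^{j-1} A^s C w_{t+j-s}$ and the shocks $w_{t+1}, \ldots, w_{t+j}$ are independent of $x_t$, elementary calculations show that

$$
\Sigma_{t+j,t} = A^j \Sigma_t \tag{14}
$$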

Notice that Σ𝑡+𝑗,𝑡 in general depends on both 𝑗, the gap between the two dates, and 𝑡, the
earlier date.

12.5 Stationarity and Ergodicity

Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of
linear state space models.
Let’s start with the intuition.

12.5.1 Visualizing Stability

Let’s look at some more time series from the same model that we analyzed above.
This picture shows cross-sectional distributions for 𝑦 at times 𝑇 , 𝑇 ′ , 𝑇 ″:

In [11]: def cross_plot(A,
               C,
               G,
               steady_state='False',
               T0=10,
               T1=50,
               T2=75,
               T4=100):

    ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))

    # Optionally start x_0 at the stationary distribution N(μ_∞, Σ_∞)
    if steady_state == 'True':
        μ_x, μ_y, Σ_x, Σ_y = ar.stationary_distributions()
        ar_state = LinearStateSpace(A, C, G, mu_0=μ_x, Sigma_0=Σ_x)

    ymin, ymax = -0.6, 0.6

    fig, ax = plt.subplots()
    ax.grid(alpha=0.4)
    ax.set_ylim(ymin, ymax)
    ax.set_ylabel('$y_t$', fontsize=12)
    ax.set_xlabel('time', fontsize=12)

    ax.vlines((T0, T1, T2), -1.5, 1.5)

    ax.set_xticks((T0, T1, T2))
    ax.set_xticklabels(("$T$", "$T'$", "$T''$"), fontsize=12)
    for i in range(80):
        rcolor = random.choice(('c', 'g', 'b'))

        if steady_state == 'True':
            x, y = ar_state.simulate(ts_length=T4)
        else:
            x, y = ar.simulate(ts_length=T4)

        y = y.flatten()
        ax.plot(y, color=rcolor, lw=0.8, alpha=0.5)
        ax.plot((T0, T1, T2), (y[T0], y[T1], y[T2],), 'ko', alpha=0.5)
    plt.show()

In [12]: cross_plot(A_2, C_2, G_2)



Note how the time series “settle down” in the sense that the distributions at 𝑇′ and 𝑇″ are relatively similar to each other, but unlike the distribution at 𝑇.
Apparently, the distributions of 𝑦𝑡 converge to a fixed long-run distribution as 𝑡 → ∞.
When such a distribution exists it is called a stationary distribution.

12.5.2 Stationary Distributions

In our setting, a distribution 𝜓∞ is said to be stationary for 𝑥𝑡 if

𝑥𝑡 ∼ 𝜓∞ and 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 ⟹ 𝑥𝑡+1 ∼ 𝜓∞

Since

1. in the present case, all distributions are Gaussian

2. a Gaussian distribution is pinned down by its mean and variance-covariance matrix

we can restate the definition as follows: 𝜓∞ is stationary for 𝑥𝑡 if

𝜓∞ = 𝑁 (𝜇∞ , Σ∞ )

where 𝜇∞ and Σ∞ are fixed points of (6) and (7) respectively.

12.5.3 Covariance Stationary Processes

Let’s see what happens to the preceding figure if we start 𝑥0 at the stationary distribution.

In [13]: cross_plot(A_2, C_2, G_2, steady_state='True')

Now the differences in the observed distributions at 𝑇 , 𝑇 ′ and 𝑇 ″ come entirely from random
fluctuations due to the finite sample size.
By
• our choosing 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ )
• the definitions of 𝜇∞ and Σ∞ as fixed points of (6) and (7) respectively
we’ve ensured that

𝜇𝑡 = 𝜇∞ and Σ𝑡 = Σ∞ for all 𝑡

Moreover, in view of (14), the autocovariance function takes the form Σ𝑡+𝑗,𝑡 = 𝐴𝑗 Σ∞ , which
depends on 𝑗 but not on 𝑡.
This motivates the following definition.
A process {𝑥𝑡 } is said to be covariance stationary if
• both 𝜇𝑡 and Σ𝑡 are constant in 𝑡
• Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on time 𝑡
In our setting, {𝑥𝑡 } will be covariance stationary if 𝜇0 , Σ0 , 𝐴, 𝐶 assume values that imply that
none of 𝜇𝑡 , Σ𝑡 , Σ𝑡+𝑗,𝑡 depends on 𝑡.

12.5.4 Conditions for Stationarity

The Globally Stable Case

The difference equation 𝜇𝑡+1 = 𝐴𝜇𝑡 is known to have the unique fixed point 𝜇∞ = 0 if all eigenvalues of 𝐴 have moduli strictly less than unity.
That is, if (np.absolute(np.linalg.eigvals(A)) < 1).all() evaluates to True.
The difference equation (7) also has a unique fixed point in this case, and, moreover

𝜇𝑡 → 𝜇∞ = 0 and Σ𝑡 → Σ∞ as 𝑡→∞

regardless of the initial conditions 𝜇0 and Σ0 .
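For the AR(4) example behind the figures (the A_2 and C_2 matrices), this can be checked directly. The sketch below tests the eigenvalue condition, solves the fixed-point equation for Σ∞ with SciPy's solve_discrete_lyapunov (which returns the 𝑋 solving $X = A X A' + Q$), and compares the result with brute-force iteration of (6) and (7) from an arbitrary starting point.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

phi_1, phi_2, phi_3, phi_4, sigma = 0.5, -0.2, 0.0, 0.5, 0.1   # the A_2, C_2 example
A = np.array([[phi_1, phi_2, phi_3, phi_4],
              [1.0,   0.0,   0.0,   0.0],
              [0.0,   1.0,   0.0,   0.0],
              [0.0,   0.0,   1.0,   0.0]])
C = np.array([[sigma], [0.0], [0.0], [0.0]])

# Global stability check: all eigenvalues strictly inside the unit circle
print(np.all(np.abs(np.linalg.eigvals(A)) < 1))

# Fixed point of (7): Σ_∞ = A Σ_∞ A' + C C'
Sigma_inf = solve_discrete_lyapunov(A, C @ C.T)

# Brute-force iteration of (6) and (7) from arbitrary initial conditions
mu, Sigma = np.ones(4), np.eye(4)
for _ in range(2_000):
    mu = A @ mu
    Sigma = A @ Sigma @ A.T + C @ C.T

print(np.max(np.abs(mu)), np.max(np.abs(Sigma - Sigma_inf)))
```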


This is the globally stable case; see these notes for a more theoretical treatment.
However, global stability is more than we need for stationary solutions, and often more than
we want.
To illustrate, consider our second order difference equation example.

Here the state is $x_t = \begin{bmatrix} 1 & y_t & y_{t-1} \end{bmatrix}'$.
Because of the constant first component in the state vector, we will never have 𝜇𝑡 → 0.
How can we find stationary solutions that respect a constant state component?

Processes with a Constant State Component

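To investigate such a process, suppose that 𝐴 and 𝐶 take the form

$$
A = \begin{bmatrix} A_1 & a \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} C_1 \\ 0 \end{bmatrix}
$$

where
• 𝐴1 is an (𝑛 − 1) × (𝑛 − 1) matrix
• 𝑎 is an (𝑛 − 1) × 1 column vector

Let $x_t = \begin{bmatrix} x_{1t}' & 1 \end{bmatrix}'$ where 𝑥1𝑡 is (𝑛 − 1) × 1.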
It follows that

$$
x_{1,t+1} = A_1 x_{1t} + a + C_1 w_{t+1}
$$

Let 𝜇1𝑡 = 𝔼[𝑥1𝑡 ] and take expectations on both sides of this expression to get

$$
\mu_{1,t+1} = A_1 \mu_{1,t} + a \tag{15}
$$

Assume now that the moduli of the eigenvalues of 𝐴1 are all strictly less than one.
Then (15) has a unique stationary solution, namely,

$$
\mu_{1\infty} = (I - A_1)^{-1} a
$$

The stationary value of 𝜇𝑡 itself is then $\mu_\infty := \begin{bmatrix} \mu_{1\infty}' & 1 \end{bmatrix}'$.
The stationary values of Σ𝑡 and Σ𝑡+𝑗,𝑡 satisfy

$$
\begin{aligned}
\Sigma_\infty &= A \Sigma_\infty A' + C C' \\
\Sigma_{t+j,t} &= A^j \Sigma_\infty
\end{aligned}
\tag{16}
$$

Notice that here Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on calendar time 𝑡.
In conclusion, if
• 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ ) and
• the moduli of the eigenvalues of 𝐴1 are all strictly less than unity
then the {𝑥𝑡 } process is covariance stationary, with constant state component.

Note
If the eigenvalues of 𝐴1 are less than unity in modulus, then (a) starting from any initial value, the mean and variance-covariance matrix both converge to their stationary values; and (b) iterations on (7) converge to the fixed point of the discrete Lyapunov equation in the first line of (16).
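The second-order difference equation from earlier fits this pattern once the constant is dropped from the state and handled through 𝑎. The sketch below (plain NumPy, same parameter values as the earlier figure, with $x_{1t} = (y_t, y_{t-1})$) checks the eigenvalue condition on 𝐴1, computes $\mu_{1\infty} = (I - A_1)^{-1} a$ with a linear solve, and confirms that iterating (15) converges to it. Both entries equal $\phi_0 / (1 - \phi_1 - \phi_2) = 1.1$.

```python
import numpy as np

phi_0, phi_1, phi_2 = 1.1, 0.8, -0.8    # same values as the earlier figure

# Nonconstant block x_{1t} = (y_t, y_{t-1}):  x_{1,t+1} = A_1 x_{1t} + a
A_1 = np.array([[phi_1, phi_2],
                [1.0,   0.0]])
a = np.array([phi_0, 0.0])

print(np.abs(np.linalg.eigvals(A_1)))   # both moduli below one

# Stationary mean μ_{1∞} = (I - A_1)^{-1} a
mu_1_inf = np.linalg.solve(np.eye(2) - A_1, a)

# Iterating (15) from an arbitrary starting mean converges to the same point
mu_1 = np.array([1.0, 1.0])
for _ in range(500):
    mu_1 = A_1 @ mu_1 + a

print(mu_1_inf, mu_1)
```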

12.5.5 Ergodicity

Let’s suppose that we’re working with a covariance stationary process.


In this case, we know that the ensemble mean will converge to 𝜇∞ as the sample size 𝐼 approaches infinity.

Averages over Time

Ensemble averages across simulations are interesting theoretically, but in real life, we usually observe only a single realization $\{x_t, y_t\}_{t=0}^T$.
So now let’s take a single realization and form the time-series averages

$$
\bar x := \frac{1}{T} \sum_{t=1}^T x_t
\quad \text{and} \quad
\bar y := \frac{1}{T} \sum_{t=1}^T y_t
$$

Do these time series averages converge to something interpretable in terms of our basic state-
space representation?
The answer depends on something called ergodicity.
Ergodicity is the property that time series and ensemble averages coincide.
More formally, ergodicity implies that time series sample averages converge to their expectation under the stationary distribution.
In particular,
• $\frac{1}{T} \sum_{t=1}^T x_t \to \mu_\infty$
• $\frac{1}{T} \sum_{t=1}^T (x_t - \bar x_T)(x_t - \bar x_T)' \to \Sigma_\infty$
• $\frac{1}{T} \sum_{t=1}^T (x_{t+j} - \bar x_T)(x_t - \bar x_T)' \to A^j \Sigma_\infty$

In our linear Gaussian setting, a covariance stationary process of the kind constructed above, with the eigenvalues of 𝐴1 strictly inside the unit circle, is also ergodic.
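Here is a sketch of ergodicity at work in the simplest case with a constant state component, the scalar process $x_{t+1} = b + \rho x_t + c w_{t+1}$ with $|\rho| < 1$ (so $A_1 = \rho$ and $a = b$ in the notation above). A single long path, deliberately started away from its stationary mean, is averaged over time and compared with $\mu_\infty = b/(1-\rho)$, $\Sigma_\infty = c^2/(1-\rho^2)$ and the lag-one autocovariance $\rho \Sigma_\infty$. Parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2024)
b, rho, c = 0.5, 0.8, 1.0    # x_{t+1} = b + rho x_t + c w_{t+1}, illustrative values
T = 200_000
w = rng.standard_normal(T)

x = np.empty(T)
x[0] = 0.0                   # deliberately far from the stationary mean
for t in range(T - 1):
    x[t + 1] = b + rho * x[t] + c * w[t + 1]

mu_inf = b / (1 - rho)               # (I - A_1)^{-1} a with A_1 = rho, a = b
Sigma_inf = c**2 / (1 - rho**2)      # solves Σ = rho² Σ + c²
x_bar = x.mean()
var_hat = np.mean((x - x_bar)**2)
lag1_hat = np.mean((x[1:] - x_bar) * (x[:-1] - x_bar))

print(x_bar, mu_inf)
print(var_hat, Sigma_inf)
print(lag1_hat, rho * Sigma_inf)     # lag-one autocovariance
```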

12.6 Noisy Observations

In some settings, the observation equation 𝑦𝑡 = 𝐺𝑥𝑡 is modified to include an error term.
Often this error term represents the idea that the true state can only be observed imperfectly.
To include an error term in the observation we introduce
• An IID sequence of ℓ × 1 random vectors 𝑣𝑡 ∼ 𝑁 (0, 𝐼).
• A 𝑘 × ℓ matrix 𝐻.
and extend the linear state-space system to

$$
\begin{aligned}
x_{t+1} &= A x_t + C w_{t+1} \\
y_t &= G x_t + H v_t \\
x_0 &\sim N(\mu_0, \Sigma_0)
\end{aligned}
\tag{17}
$$

The sequence {𝑣𝑡 } is assumed to be independent of {𝑤𝑡 }.


The process {𝑥𝑡 } is not modified by noise in the observation equation and its moments, distributions and stability properties remain the same.
The unconditional moments of 𝑦𝑡 from (8) and (9) now become

$$
\mathbb{E}[y_t] = \mathbb{E}[G x_t + H v_t] = G \mu_t \tag{18}
$$

The variance-covariance matrix of 𝑦𝑡 is easily shown to be

$$
\operatorname{Var}[y_t] = \operatorname{Var}[G x_t + H v_t] = G \Sigma_t G' + H H' \tag{19}
$$

The distribution of 𝑦𝑡 is therefore

𝑦𝑡 ∼ 𝑁 (𝐺𝜇𝑡 , 𝐺Σ𝑡 𝐺′ + 𝐻𝐻 ′ )
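The extra $HH'$ term is easy to see in simulation. In the scalar sketch below (illustrative values), a cross section of stationary states is observed with and without independent measurement noise, and the sample variances are compared with (9) and (19).

```python
import numpy as np

rng = np.random.default_rng(5)
rho, c, g, h = 0.7, 1.0, 2.0, 0.5    # x_{t+1} = rho x_t + c w_{t+1},  y_t = g x_t + h v_t
I = 400_000

Sigma_inf = c**2 / (1 - rho**2)                     # stationary variance of x_t
x = np.sqrt(Sigma_inf) * rng.standard_normal(I)     # cross section of stationary x_t
v = rng.standard_normal(I)                          # measurement noise, independent of x

y_clean = g * x
y_noisy = g * x + h * v

print(y_clean.var(), g**2 * Sigma_inf)              # G Σ G' as in (9)
print(y_noisy.var(), g**2 * Sigma_inf + h**2)       # G Σ G' + H H' as in (19)
```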

12.7 Prediction

The theory of prediction for linear state space systems is elegant and simple.

12.7.1 Forecasting Formulas – Conditional Means

The natural way to predict variables is to use conditional distributions.


For example, the optimal forecast of 𝑥𝑡+1 given information known at time 𝑡 is

𝔼𝑡 [𝑥𝑡+1 ] ∶= 𝔼[𝑥𝑡+1 ∣ 𝑥𝑡 , 𝑥𝑡−1 , … , 𝑥0 ] = 𝐴𝑥𝑡

The right-hand side follows from 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 and the fact that 𝑤𝑡+1 is zero mean and
independent of 𝑥𝑡 , 𝑥𝑡−1 , … , 𝑥0 .

That 𝔼𝑡 [𝑥𝑡+1 ] = 𝔼[𝑥𝑡+1 ∣ 𝑥𝑡 ] is an implication of {𝑥𝑡 } having the Markov property.


The one-step-ahead forecast error is

𝑥𝑡+1 − 𝔼𝑡 [𝑥𝑡+1 ] = 𝐶𝑤𝑡+1

The covariance matrix of the forecast error is

𝔼[(𝑥𝑡+1 − 𝔼𝑡 [𝑥𝑡+1 ])(𝑥𝑡+1 − 𝔼𝑡 [𝑥𝑡+1 ])′ ] = 𝐶𝐶 ′

More generally, we’d like to compute the 𝑗-step ahead forecasts 𝔼𝑡 [𝑥𝑡+𝑗 ] and 𝔼𝑡 [𝑦𝑡+𝑗 ].
With a bit of algebra, we obtain

$$
x_{t+j} = A^j x_t + A^{j-1} C w_{t+1} + A^{j-2} C w_{t+2} + \cdots + A^0 C w_{t+j}
$$

In view of the IID property, current and past state values provide no information about future values of the shock.
Hence 𝔼𝑡 [𝑤𝑡+𝑘 ] = 𝔼[𝑤𝑡+𝑘 ] = 0 for every 𝑘 ≥ 1.
It now follows from linearity of expectations that the 𝑗-step ahead forecast of 𝑥 is

$$
\mathbb{E}_t[x_{t+j}] = A^j x_t
$$

The 𝑗-step ahead forecast of 𝑦 is therefore

$$
\mathbb{E}_t[y_{t+j}] = \mathbb{E}_t[G x_{t+j} + H v_{t+j}] = G A^j x_t
$$
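A Monte Carlo sketch of these forecasting formulas: starting from a known 𝑥𝑡, simulate many independent continuations 𝑗 periods ahead and average them. The matrices 𝐴, 𝐶, 𝐺 and the state 𝑥𝑡 below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(11)
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
C = np.array([[0.3, 0.0],
              [0.1, 0.2]])
G = np.array([[1.0, 1.0]])
x_t = np.array([1.0, -2.0])     # current state, known at time t
j, N = 5, 100_000               # forecast horizon and number of simulated futures

# Simulate N independent continuations x_{t+1}, ..., x_{t+j} from x_t
# (each row of x is one simulated state vector)
x = np.tile(x_t, (N, 1))
for _ in range(j):
    x = x @ A.T + rng.standard_normal((N, 2)) @ C.T

A_j = np.linalg.matrix_power(A, j)
print(x.mean(axis=0), A_j @ x_t)                   # E_t[x_{t+j}] = A^j x_t
print((x @ G.T).mean(), (G @ A_j @ x_t).item())    # E_t[y_{t+j}] = G A^j x_t
```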

12.7.2 Covariance of Prediction Errors

It is useful to obtain the covariance matrix of the vector of 𝑗-step-ahead prediction errors

𝑗−1
𝑥𝑡+𝑗 − 𝔼𝑡 [𝑥𝑡+𝑗 ] = ∑ 𝐴𝑠 𝐶𝑤𝑡−𝑠+𝑗 (20)
𝑠=0

Evidently,

𝑗−1

𝑉𝑗 ∶= 𝔼𝑡 [(𝑥𝑡+𝑗 − 𝔼𝑡 [𝑥𝑡+𝑗 ])(𝑥𝑡+𝑗 − 𝔼𝑡 [𝑥𝑡+𝑗 ])′ ] = ∑ 𝐴𝑘 𝐶𝐶 ′ 𝐴𝑘 (21)
𝑘=0

𝑉𝑗 defined in (21) can be calculated recursively via 𝑉1 = 𝐶𝐶 ′ and

𝑉𝑗 = 𝐶𝐶 ′ + 𝐴𝑉𝑗−1 𝐴′ , 𝑗≥2 (22)

𝑉𝑗 is the conditional covariance matrix of the errors in forecasting 𝑥𝑡+𝑗 , conditioned on time 𝑡
information 𝑥𝑡 .
Under suitable conditions, such as the eigenvalue condition given below, 𝑉𝑗 converges to

$$V_\infty = C C' + A V_\infty A' \tag{23}$$

Equation (23) is an example of a discrete Lyapunov equation in the covariance matrix 𝑉∞ .


A sufficient condition for 𝑉𝑗 to converge is that the eigenvalues of 𝐴 be strictly less than one
in modulus.
Weaker sufficient conditions for convergence associate eigenvalues equaling or exceeding one
in modulus with elements of 𝐶 that equal 0.
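In the stable case, 𝑉∞ in (23) can be computed directly instead of by iterating (22). The sketch below is one way to do it, under the assumption that every eigenvalue of 𝐴 is strictly inside the unit circle: with numpy's row-major flattening, vec(𝐴𝑉𝐴′) = (𝐴 ⊗ 𝐴) vec(𝑉), so (23) becomes an ordinary linear system. The function name and example matrices are hypothetical; SciPy's scipy.linalg.solve_discrete_lyapunov solves the same equation.

```python
import numpy as np

def V_infinity(A, C):
    """Solve V = CC' + A V A' (equation (23)) as a linear system.

    With row-major flattening, vec(A V A') = kron(A, A) vec(V), so
    (I - kron(A, A)) vec(V) = vec(CC').  Assumes every eigenvalue of A
    has modulus strictly less than one."""
    n = A.shape[0]
    CC = C @ C.T
    lhs = np.eye(n * n) - np.kron(A, A)
    vec_V = np.linalg.solve(lhs, CC.flatten())
    return vec_V.reshape(n, n)

# Hypothetical stable example (eigenvalues of A are about 0.86 and 0.44)
A = np.array([[0.8, 0.2],
              [0.1, 0.5]])
C = np.array([[1.0],
              [0.5]])

V_inf = V_infinity(A, C)

# Compare with iterating the recursion (22) many times
V = C @ C.T
for _ in range(2000):
    V = C @ C.T + A @ V @ A.T
print(np.allclose(V, V_inf))
```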

12.8 Code

Our preceding simulations and calculations are based on code in the file lss.py from the
QuantEcon.py package.
The code implements a class for handling linear state space models (simulations, calculating
moments, etc.).
One Python construct you might not be familiar with is the use of a generator function in the
method moment_sequence().
Go back and read the relevant documentation if you’ve forgotten how generator functions
work.
Examples of usage are given in the solutions to the exercises.
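For a refresher on generators, here is a small stand-alone generator in the same spirit. It is a hypothetical sketch, not the QuantEcon implementation of moment_sequence(): it uses the moment recursions 𝜇𝑡+1 = 𝐴𝜇𝑡 and Σ𝑡+1 = 𝐴Σ𝑡𝐴′ + 𝐶𝐶′ for the state and yields (𝜇𝑡, Σ𝑡) one period at a time, computing each pair only when it is requested.

```python
import itertools
import numpy as np

def moment_sequence_sketch(A, C, mu_0, Sigma_0):
    """Hypothetical generator: yields (mu_t, Sigma_t) for t = 0, 1, 2, ...
    using mu_{t+1} = A mu_t and Sigma_{t+1} = A Sigma_t A' + C C'.
    The loop never terminates; the caller decides how many terms to take."""
    mu, Sigma = mu_0, Sigma_0
    while True:
        yield mu, Sigma
        mu = A @ mu
        Sigma = A @ Sigma @ A.T + C @ C.T

# Scalar example written with 1 x 1 matrices
A = np.array([[0.5]])
C = np.array([[1.0]])
gen = moment_sequence_sketch(A, C, np.array([2.0]), np.array([[0.0]]))

# Take the first three (mu_t, Sigma_t) pairs
first_three = list(itertools.islice(gen, 3))
for mu_t, Sigma_t in first_three:
    print(mu_t, Sigma_t)
```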

12.9 Exercises

12.9.1 Exercise 1

In several contexts, we want to compute forecasts of geometric sums of future random vari-
ables governed by the linear state-space system (1).
We want the following objects

• Forecast of a geometric sum of future 𝑥’s, or $\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j x_{t+j} \right]$.

• Forecast of a geometric sum of future 𝑦’s, or $\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \right]$.
These objects are important components of some famous and interesting dynamic models.
For example,

• if {𝑦𝑡} is a stream of dividends, then $\mathbb{E} \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \mid x_t \right]$ is a model of a stock price

• if {𝑦𝑡} is the money supply, then $\mathbb{E} \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \mid x_t \right]$ is a model of the price level
Show that:


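$$\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j x_{t+j} \right] = [I - \beta A]^{-1} x_t$$

and

$$\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \right] = G [I - \beta A]^{-1} x_t$$

What must the modulus of every eigenvalue of 𝐴 be less than for these formulas to hold?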

12.10 Solutions

12.10.1 Exercise 1
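Suppose that every eigenvalue of 𝐴 has modulus strictly less than 1/𝛽.
Then every eigenvalue of 𝛽𝐴 has modulus strictly less than one, and it follows that $I + \beta A + \beta^2 A^2 + \cdots = [I - \beta A]^{-1}$.
Combining this with $\mathbb{E}_t [x_{t+j}] = A^j x_t$ and linearity of expectations leads to our formulas:
• Forecast of a geometric sum of future 𝑥’s

$$\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j x_{t+j} \right] = [I + \beta A + \beta^2 A^2 + \cdots] x_t = [I - \beta A]^{-1} x_t$$

• Forecast of a geometric sum of future 𝑦’s

$$\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \right] = G [I + \beta A + \beta^2 A^2 + \cdots] x_t = G [I - \beta A]^{-1} x_t$$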
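The closed-form formulas can be checked numerically by truncating the infinite sum. The sketch below uses hypothetical values of 𝛽, 𝐴, 𝐺 and 𝑥𝑡 for which every eigenvalue of 𝐴 (0.9 and 0.8) has modulus below 1/𝛽 ≈ 1.05.

```python
import numpy as np

beta = 0.95
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])      # upper triangular: eigenvalues 0.9 and 0.8
G = np.array([[1.0, 2.0]])
x_t = np.array([1.0, 1.0])

# Closed forms [I - beta A]^{-1} x_t and G [I - beta A]^{-1} x_t
closed_x = np.linalg.solve(np.eye(2) - beta * A, x_t)
closed_y = G @ closed_x

# Truncated version of the sum of beta^j A^j x_t
partial_x = np.zeros(2)
term = x_t.copy()
for j in range(2000):
    partial_x += term
    term = beta * (A @ term)    # next term: beta^(j+1) A^(j+1) x_t

print(closed_x, partial_x)
print(closed_y, G @ partial_x)
```

Solving the linear system with np.linalg.solve avoids forming [𝐼 − 𝛽𝐴]⁻¹ explicitly, which is usually the numerically preferable choice.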

Footnotes
[1] The eigenvalues of 𝐴 are (1, −1, 𝑖, −𝑖).
[2] The correct way to argue this is by induction. Suppose that 𝑥𝑡 is Gaussian. Then (1) and
(10) imply that 𝑥𝑡+1 is Gaussian. Since 𝑥0 is assumed to be Gaussian, it follows that every 𝑥𝑡
is Gaussian. Evidently, this implies that each 𝑦𝑡 is Gaussian.
Chapter 13

Application: The Samuelson Multiplier-Accelerator

13.1 Contents

• Overview 13.2
• Details 13.3
• Implementation 13.4
• Stochastic Shocks 13.5
• Government Spending 13.6
• Wrapping Everything Into a Class 13.7
• Using the LinearStateSpace Class 13.8
• Pure Multiplier Model 13.9
• Summary 13.10
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

13.2 Overview

This lecture creates non-stochastic and stochastic versions of Paul Samuelson’s celebrated
multiplier accelerator model [93].
In doing so, we extend the example of the Solow model class in our second OOP lecture.
Our objectives are to
• provide a more detailed example of OOP and classes
• review a famous model
• review linear difference equations, both deterministic and stochastic
Let’s start with some standard imports:

In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

We’ll also use the following for various tasks described below:


In [3]:
from quantecon import LinearStateSpace
import cmath
import math
import sympy
from sympy import Symbol, init_printing
from cmath import sqrt


13.2.1 Samuelson’s Model

Samuelson used a second-order linear difference equation to represent a model of national output based on the following ingredients:
• a national output identity asserting that national output is the sum of consumption plus investment plus government purchases.
• a Keynesian consumption function asserting that consumption at time 𝑡 is equal to a
constant times national output at time 𝑡 − 1.
• an investment accelerator asserting that investment at time 𝑡 equals a constant called
the accelerator coefficient times the difference in output between period 𝑡 − 1 and 𝑡 − 2.
• the idea that consumption plus investment plus government purchases constitute aggre-
gate demand, which automatically calls forth an equal amount of aggregate supply.
(To read about linear difference equations see here or chapter IX of [95])
Samuelson used the model to analyze how particular values of the marginal propensity to
consume and the accelerator coefficient might give rise to transient business cycles in national
output.
Possible dynamic properties include
• smooth convergence to a constant level of output
• damped business cycles that eventually converge to a constant level of output
• persistent business cycles that neither dampen nor explode
Later we present an extension that adds a random shock to the right side of the national in-
come identity representing random fluctuations in aggregate demand.
With this modification, national output is governed by a second-order stochastic linear difference equation that, with appropriate parameter values, gives rise to recurrent irregular business cycles.
(To read about stochastic linear difference equations see chapter XI of [95])

13.3 Details

Let’s assume that


• {𝐺𝑡} is a sequence of levels of government expenditures; we’ll start by setting 𝐺𝑡 = 𝐺 for all 𝑡.
• {𝐶𝑡} is a sequence of levels of aggregate consumption expenditures, a key endogenous variable in the model.
• {𝐼𝑡} is a sequence of rates of investment, another key endogenous variable.
• {𝑌𝑡} is a sequence of levels of national income, yet another endogenous variable.
• 𝑎 is the marginal propensity to consume in the Keynesian consumption function 𝐶𝑡 = 𝑎𝑌𝑡−1 + 𝛾.
• 𝑏 is the “accelerator coefficient” in the “investment accelerator” 𝐼𝑡 = 𝑏(𝑌𝑡−1 − 𝑌𝑡−2).
• {𝜖𝑡} is an IID sequence of standard normal random variables.
• 𝜎 ≥ 0 is a “volatility” parameter; setting 𝜎 = 0 recovers the non-stochastic case that we’ll start with.
The model combines the consumption function

$$C_t = a Y_{t-1} + \gamma \tag{1}$$

with the investment accelerator

$$I_t = b (Y_{t-1} - Y_{t-2}) \tag{2}$$

and the national income identity

$$Y_t = C_t + I_t + G_t \tag{3}$$
• The parameter 𝑎 is people’s marginal propensity to consume out of income: equation (1) asserts that people consume a fraction 𝑎 ∈ (0, 1) of each additional dollar of income.
• The parameter 𝑏 > 0 is the investment accelerator coefficient - equation (2) asserts that
people invest in physical capital when income is increasing and disinvest when it is de-
creasing.
Equations (1), (2), and (3) imply the following second-order linear difference equation for na-
tional income:

$$Y_t = (a + b) Y_{t-1} - b Y_{t-2} + (\gamma + G_t)$$

or

$$Y_t = \rho_1 Y_{t-1} + \rho_2 Y_{t-2} + (\gamma + G_t) \tag{4}$$

where 𝜌1 = (𝑎 + 𝑏) and 𝜌2 = −𝑏.
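A quick way to confirm the algebra leading to (4) is to iterate the structural equations (1), (2) and (3) and the reduced form (4) side by side and check that they generate the same path. The function names, parameter values and initial conditions below are hypothetical, and 𝐺𝑡 is held constant.

```python
import numpy as np

def simulate_structural(a, b, gamma, G, Y_init, n):
    """Iterate equations (1)-(3) directly; Y_init = (Y_{-2}, Y_{-1})."""
    Y = list(Y_init)
    for _ in range(n):
        C = a * Y[-1] + gamma              # consumption function (1)
        I = b * (Y[-1] - Y[-2])            # investment accelerator (2)
        Y.append(C + I + G)                # national income identity (3)
    return np.array(Y[2:])

def simulate_reduced(a, b, gamma, G, Y_init, n):
    """Iterate the reduced form (4) with rho1 = a + b and rho2 = -b."""
    rho1, rho2 = a + b, -b
    Y = list(Y_init)
    for _ in range(n):
        Y.append(rho1 * Y[-1] + rho2 * Y[-2] + (gamma + G))
    return np.array(Y[2:])

args = dict(a=0.92, b=0.5, gamma=10.0, G=5.0, Y_init=(100.0, 80.0), n=50)
Y_structural = simulate_structural(**args)
Y_reduced = simulate_reduced(**args)
print(np.max(np.abs(Y_structural - Y_reduced)))
```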


To complete the model, we require two initial conditions.
If the model is to generate time series for 𝑡 = 0, … , 𝑇, we require initial values

$$Y_{-1} = \bar Y_{-1}, \qquad Y_{-2} = \bar Y_{-2}$$

We’ll ordinarily set the parameters (𝑎, 𝑏) so that starting from an arbitrary pair of initial conditions $(\bar Y_{-1}, \bar Y_{-2})$, national income 𝑌𝑡 converges to a constant value as 𝑡 becomes large.

We are interested in studying

• the transient fluctuations in 𝑌𝑡 as it converges to its steady state level
• the rate at which it converges to a steady state level
The deterministic version of the model described so far — meaning that no random shocks
hit aggregate demand — has only transient fluctuations.
We can convert the model to one that has persistent irregular fluctuations by adding a ran-
dom shock to aggregate demand.

13.3.1 Stochastic Version of the Model

We create a random or stochastic version of the model by adding a random process of shocks or disturbances {𝜎𝜖𝑡} to the right side of equation (4), leading to the second-order scalar linear stochastic difference equation:

$$Y_t = (a + b) Y_{t-1} - b Y_{t-2} + (\gamma + G_t) + \sigma \epsilon_t \tag{5}$$

13.3.2 Mathematical Analysis of the Model

To get started, let’s set 𝐺𝑡 ≡ 0, 𝜎 = 0, and 𝛾 = 0.


Then we can write equation (5) as

$$Y_t = \rho_1 Y_{t-1} + \rho_2 Y_{t-2}$$

or

$$Y_{t+2} - \rho_1 Y_{t+1} - \rho_2 Y_t = 0 \tag{6}$$

To discover the properties of the solution of (6), it is useful first to form the characteristic
polynomial for (6):

$$z^2 - \rho_1 z - \rho_2 \tag{7}$$

where 𝑧 is possibly a complex number.


We want to find the two zeros (a.k.a. roots) – namely 𝜆1 , 𝜆2 – of the characteristic polyno-
mial.
These are two special values of 𝑧, say 𝑧 = 𝜆1 and 𝑧 = 𝜆2 , such that if we set 𝑧 equal to one of
these values in expression (7), the characteristic polynomial (7) equals zero:

$$z^2 - \rho_1 z - \rho_2 = (z - \lambda_1)(z - \lambda_2) = 0 \tag{8}$$

Equation (8) is said to factor the characteristic polynomial.


Because 𝜌1 and 𝜌2 are real, complex roots always occur as a complex conjugate pair.
In that case it is convenient to represent them in the polar form

$$\lambda_1 = r e^{i\omega}, \qquad \lambda_2 = r e^{-i\omega}$$

where 𝑟 is the modulus (or amplitude) of the complex number and 𝜔 is its angle or phase.
These can also be represented as

$$\lambda_1 = r(\cos(\omega) + i \sin(\omega))$$

$$\lambda_2 = r(\cos(\omega) - i \sin(\omega))$$

(To read about the polar form, see here)


Given initial conditions 𝑌−1 , 𝑌−2 , we want to generate a solution of the difference equation
(6).
When the two roots are distinct, it can be represented as

$$Y_t = \lambda_1^t c_1 + \lambda_2^t c_2$$

where 𝑐1 and 𝑐2 are constants that depend on the two initial conditions and on 𝜌1 , 𝜌2 .
When the roots are complex, it is useful to pursue the following calculations.
Notice that

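$$\begin{aligned}
Y_t & = c_1 (r e^{i\omega})^t + c_2 (r e^{-i\omega})^t \\
    & = c_1 r^t e^{i\omega t} + c_2 r^t e^{-i\omega t} \\
    & = c_1 r^t [\cos(\omega t) + i \sin(\omega t)] + c_2 r^t [\cos(\omega t) - i \sin(\omega t)] \\
    & = (c_1 + c_2) r^t \cos(\omega t) + i (c_1 - c_2) r^t \sin(\omega t)
\end{aligned}$$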

The only way that 𝑌𝑡 can be a real number for each 𝑡 is if 𝑐1 + 𝑐2 is a real number and 𝑐1 − 𝑐2
is an imaginary number.
This happens only when 𝑐1 and 𝑐2 are complex conjugates, in which case they can be written
in the polar forms

$$c_1 = v e^{i\theta}, \qquad c_2 = v e^{-i\theta}$$

So we can write

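$$\begin{aligned}
Y_t & = v e^{i\theta} r^t e^{i\omega t} + v e^{-i\theta} r^t e^{-i\omega t} \\
    & = v r^t [e^{i(\omega t + \theta)} + e^{-i(\omega t + \theta)}] \\
    & = 2 v r^t \cos(\omega t + \theta)
\end{aligned}$$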

where 𝑣 and 𝜃 are constants that must be chosen to satisfy initial conditions for 𝑌−1 , 𝑌−2 .
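The complex-root representation can be checked numerically. The sketch below uses hypothetical values 𝜌1 = 1.5 and 𝜌2 = −0.9, which give complex roots because 𝜌1² + 4𝜌2 < 0. It solves for 𝑐1 and 𝑐2 from the initial conditions 𝑌−1 and 𝑌−2, confirms that they are complex conjugates, and compares 2𝑣𝑟ᵗ cos(𝜔𝑡 + 𝜃) with the path generated by iterating 𝑌𝑡 = 𝜌1𝑌𝑡−1 + 𝜌2𝑌𝑡−2.

```python
import numpy as np

rho1, rho2 = 1.5, -0.9         # hypothetical; rho1**2 + 4 * rho2 < 0
Y_m2, Y_m1 = 1.0, 2.0          # hypothetical initial conditions Y_{-2}, Y_{-1}

lam1, lam2 = np.roots([1, -rho1, -rho2])
r, omega = abs(lam1), np.angle(lam1)

# Choose c1, c2 so that Y_t = c1 lam1^t + c2 lam2^t matches Y_{-1} and Y_{-2}
M = np.array([[lam1 ** -1, lam2 ** -1],
              [lam1 ** -2, lam2 ** -2]])
c1, c2 = np.linalg.solve(M, np.array([Y_m1, Y_m2], dtype=complex))
v, theta = abs(c1), np.angle(c1)

# Closed form 2 v r^t cos(omega t + theta) for t = 0, ..., T-1
T = 30
t = np.arange(T)
Y_closed = 2 * v * r ** t * np.cos(omega * t + theta)

# The same path from the recursion Y_t = rho1 Y_{t-1} + rho2 Y_{t-2}
Y = [Y_m2, Y_m1]
for _ in range(T):
    Y.append(rho1 * Y[-1] + rho2 * Y[-2])
Y_recursive = np.array(Y[2:])

print(np.isclose(c2, np.conj(c1)), np.allclose(Y_closed, Y_recursive))
```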
This formula shows that when the roots are complex, 𝑌𝑡 displays oscillations with period $\check p = 2\pi / \omega$ and damping factor 𝑟.

We say that $\check p$ is the period because in that amount of time the cosine wave cos(𝜔𝑡 + 𝜃) goes through exactly one complete cycle.
(Draw a cosine function to convince yourself of this please)
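Going the other way, given (𝜌1, 𝜌2) with complex roots, the modulus 𝑟, the angle 𝜔 and the period 2𝜋/𝜔 can be read off the roots. A minimal sketch with a hypothetical helper name; matching coefficients in (8) gives 𝜌1 = 2𝑟 cos 𝜔 and 𝜌2 = −𝑟², and the example builds (𝜌1, 𝜌2) from 𝑟 = 0.95 and a 10-period cycle so that the function should recover those numbers.

```python
import numpy as np

def cycle_properties(rho1, rho2):
    """Return (r, omega, period) when z^2 - rho1 z - rho2 has complex roots."""
    lam = np.roots([1, -rho1, -rho2])[0]
    if abs(lam.imag) < 1e-12:
        raise ValueError("the roots are real, so there is no cycle")
    r = abs(lam)                     # modulus, i.e. the damping factor
    omega = abs(np.angle(lam))       # angle in (0, pi)
    return r, omega, 2 * np.pi / omega

# Matching coefficients in (8): rho1 = 2 r cos(omega), rho2 = -r**2
r_true, omega_true = 0.95, 2 * np.pi / 10
rho1, rho2 = 2 * r_true * np.cos(omega_true), -r_true ** 2

r, omega, period = cycle_properties(rho1, rho2)
print(r, omega, period)
```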

Remark: Following [93], we want to choose the parameters 𝑎, 𝑏 of the model so that the absolute values of the (possibly complex) roots 𝜆1, 𝜆2 of the characteristic polynomial are both strictly less than one:

$$|\lambda_j| < 1 \quad \text{for } j = 1, 2$$

Remark: When both roots 𝜆1, 𝜆2 of the characteristic polynomial have absolute values strictly less than one, the absolute value of the larger one governs the rate of convergence to the steady state of the nonstochastic version of the model.
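To illustrate this remark, the sketch below iterates (4) with 𝐺𝑡 = 𝐺 held constant, borrowing for concreteness the values 𝑎 = 0.92, 𝑏 = 0.5, 𝛾 = 10 and initial conditions 100, 80 that the "manual" root calculation later in this lecture uses. Setting 𝑌𝑡 = 𝑌̄ in (4) gives the steady state 𝑌̄ = (𝛾 + 𝐺)/(1 − 𝜌1 − 𝜌2) = (𝛾 + 𝐺)/(1 − 𝑎). The ratio of successive distances from 𝑌̄ should approach the larger root in absolute value.

```python
import numpy as np

a, b, gamma, G = 0.92, 0.5, 10.0, 0.0
rho1, rho2 = a + b, -b

# Steady state from setting Y_t = Y_{t-1} = Y_{t-2} = Y_bar in (4)
Y_bar = (gamma + G) / (1 - rho1 - rho2)     # equals (gamma + G) / (1 - a)

Y = [100.0, 80.0]                           # initial conditions
for _ in range(80):
    Y.append(rho1 * Y[-1] + rho2 * Y[-2] + gamma + G)
gaps = np.abs(np.array(Y) - Y_bar)

largest_root = max(abs(np.roots([1, -rho1, -rho2])))
print(Y_bar, largest_root, gaps[61] / gaps[60])
```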

13.3.3 Things This Lecture Does

We write a function to generate simulations of a {𝑌𝑡 } sequence as a function of time.


The function requires that we put in initial conditions for 𝑌−1 , 𝑌−2 .
The function checks that 𝑎, 𝑏 are set so that 𝜆1 , 𝜆2 are less than unity in absolute value (also
called “modulus”).
The function also tells us whether the roots are complex, and, if they are complex, returns both their real and imaginary parts.
If the roots are both real, the function returns their values.
We also use our function to simulate stochastic paths (when 𝜎 > 0).
We have written the function in a way that allows us to input {𝐺𝑡} paths of a few simple forms, e.g.,
• one-time jumps in 𝐺 at some time
• a permanent jump in 𝐺 that occurs at some time
We proceed to use the Samuelson multiplier-accelerator model as a laboratory to make a sim-
ple OOP example.
The “state” that determines next period’s 𝑌𝑡+1 is now not just the current value 𝑌𝑡 but also
the once lagged value 𝑌𝑡−1 .
This involves a little more bookkeeping than is required in the Solow model class definition.
We use the Samuelson multiplier-accelerator model as a vehicle for teaching how we can grad-
ually add more features to the class.
We want to have a method in the class that automatically generates a simulation, either non-
stochastic (𝜎 = 0) or stochastic (𝜎 > 0).
We also show how to map the Samuelson model into a simple instance of the
LinearStateSpace class described here.

We can use a LinearStateSpace instance to do various things that we did above with our
homemade function and class.
Among other things, we show by example that the eigenvalues of the matrix 𝐴 that we use to
form the instance of the LinearStateSpace class for the Samuelson model equal the roots of
the characteristic polynomial (7) for the Samuelson multiplier accelerator model.
Here is the formula for the matrix 𝐴 in the linear state space system in the case that government expenditures are a constant 𝐺:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ \gamma + G & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}$$
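The claim about eigenvalues can be checked directly. Assuming the state vector is 𝑥𝑡 = (1, 𝑌𝑡, 𝑌𝑡−1)′ (the ordering that matches the rows of 𝐴 above), the eigenvalues of 𝐴 are the two roots of (7) plus an extra unit eigenvalue contributed by the constant, since det(𝐴 − 𝜆𝐼) = (1 − 𝜆)(𝜆² − 𝜌1𝜆 − 𝜌2). The parameter values in this sketch are hypothetical.

```python
import numpy as np

a, b, gamma, G = 0.92, 0.5, 10.0, 0.0     # hypothetical parameter values
rho1, rho2 = a + b, -b

# Transition matrix for the state (1, Y_t, Y_{t-1})'
A = np.array([[1.0,       0.0,  0.0],
              [gamma + G, rho1, rho2],
              [0.0,       1.0,  0.0]])

eigenvalues = np.sort_complex(np.linalg.eigvals(A))
char_roots = np.roots([1, -rho1, -rho2])
expected = np.sort_complex(np.append(char_roots, 1.0))
print(eigenvalues)
print(expected)
```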

13.4 Implementation

We’ll start by drawing an informative graph from page 189 of [95]

In [4]:
def param_plot():

    """This function creates the graph on page 189 of
    Sargent Macroeconomic Theory, second edition, 1987.
    """

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.set_aspect('equal')

    # Set axis
    xmin, ymin = -3, -2
    xmax, ymax = -xmin, -ymin
    plt.axis([xmin, xmax, ymin, ymax])

    # Set axis labels: ρ1 is plotted horizontally and ρ2 vertically
    ax.set(xticks=[], yticks=[])
    ax.set_xlabel(r'$\rho_1$', fontsize=16)
    ax.xaxis.set_label_position('top')
    ax.set_ylabel(r'$\rho_2$', rotation=0, fontsize=16)
    ax.yaxis.set_label_position('right')

    # Draw the boundaries of the regions in the (ρ1, ρ2) plane
    ρ1 = np.linspace(-2, 2, 100)
    ax.plot(ρ1, -abs(ρ1) + 1, c='black')
    ax.plot(ρ1, np.ones_like(ρ1) * -1, c='black')
    ax.plot(ρ1, -(ρ1**2 / 4), c='black')

    # Turn normal axes off
    for spine in ['left', 'bottom', 'top', 'right']:
        ax.spines[spine].set_visible(False)

    # Add arrows to represent axes
    axes_arrows = {'arrowstyle': '<|-|>', 'lw': 1.3}
    ax.annotate('', xy=(xmin, 0), xytext=(xmax, 0), arrowprops=axes_arrows)
    ax.annotate('', xy=(0, ymin), xytext=(0, ymax), arrowprops=axes_arrows)

    # Annotate the plot with equations
    plot_arrowsl = {'arrowstyle': '-|>', 'connectionstyle': "arc3, rad=-0.2"}
    plot_arrowsr = {'arrowstyle': '-|>', 'connectionstyle': "arc3, rad=0.2"}
    ax.annotate(r'$\rho_1 + \rho_2 < 1$', xy=(0.5, 0.3), xytext=(0.8, 0.6),
                arrowprops=plot_arrowsr, fontsize='12')
    ax.annotate(r'$\rho_1 + \rho_2 = 1$', xy=(0.38, 0.6), xytext=(0.6, 0.8),
                arrowprops=plot_arrowsr, fontsize='12')
    ax.annotate(r'$\rho_2 < 1 + \rho_1$', xy=(-0.5, 0.3), xytext=(-1.3, 0.6),
                arrowprops=plot_arrowsl, fontsize='12')
    ax.annotate(r'$\rho_2 = 1 + \rho_1$', xy=(-0.38, 0.6), xytext=(-1, 0.8),
                arrowprops=plot_arrowsl, fontsize='12')
    ax.annotate(r'$\rho_2 = -1$', xy=(1.5, -1), xytext=(1.8, -1.3),
                arrowprops=plot_arrowsl, fontsize='12')
    ax.annotate(r'${\rho_1}^2 + 4\rho_2 = 0$', xy=(1.15, -0.35),
                xytext=(1.5, -0.3), arrowprops=plot_arrowsr, fontsize='12')
    ax.annotate(r'${\rho_1}^2 + 4\rho_2 < 0$', xy=(1.4, -0.7),
                xytext=(1.8, -0.6), arrowprops=plot_arrowsr, fontsize='12')

    # Label categories of solutions
    ax.text(1.5, 1, 'Explosive\n growth', ha='center', fontsize=16)
    ax.text(-1.5, 1, 'Explosive\n oscillations', ha='center', fontsize=16)
    ax.text(0.05, -1.5, 'Explosive oscillations', ha='center', fontsize=16)
    ax.text(0.09, -0.5, 'Damped oscillations', ha='center', fontsize=16)

    # Add small marker to y-axis
    ax.axhline(y=1.005, xmin=0.495, xmax=0.505, c='black')
    ax.text(-0.12, -1.12, '-1', fontsize=10)
    ax.text(-0.12, 0.98, '1', fontsize=10)

    return fig

param_plot()
plt.show()

The graph portrays regions in which the (𝜆1 , 𝜆2 ) root pairs implied by the (𝜌1 = (𝑎 + 𝑏), 𝜌2 =
−𝑏) difference equation parameter pairs in the Samuelson model are such that:
• (𝜆1 , 𝜆2 ) are complex with modulus less than 1 - in this case, the {𝑌𝑡 } sequence displays
damped oscillations.
• (𝜆1 , 𝜆2 ) are both real, but one is strictly greater than 1 - this leads to explosive growth.
• (𝜆1 , 𝜆2 ) are both real, but one is strictly less than −1 - this leads to explosive oscilla-
tions.
• (𝜆1 , 𝜆2 ) are both real and both are less than 1 in absolute value - in this case, there is
smooth convergence to the steady state without damped cycles.
Later we’ll present the graph with a red mark showing the particular point implied by the
setting of (𝑎, 𝑏).

13.4.1 Function to Describe Implications of Characteristic Polynomial

In [5]:
def categorize_solution(ρ1, ρ2):

    """This function takes values of ρ1 and ρ2 and uses them
    to classify the type of solution
    """

    discriminant = ρ1 ** 2 + 4 * ρ2
    if ρ2 > 1 + ρ1 or ρ2 < -1:
        print('Explosive oscillations')
    elif ρ1 + ρ2 > 1:
        print('Explosive growth')
    elif discriminant < 0:
        print('Roots are complex with modulus less than one; '
              'therefore damped oscillations')
    else:
        print('Roots are real and absolute values are less than one; '
              'therefore get smooth convergence to a steady state')

In [6]:
### Test the categorize_solution function

categorize_solution(1.3, -.4)

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state

13.4.2 Function for Plotting Paths

A useful function for our work below is

In [7]:
def plot_y(function=None):

    """Function plots path of Y_t"""

    plt.subplots(figsize=(10, 6))
    plt.plot(function)
    plt.xlabel('Time $t$')
    plt.ylabel('$Y_t$', rotation=0)
    plt.grid()
    plt.show()

13.4.3 Manual or “by hand” Root Calculations

The following function calculates roots of the characteristic polynomial using high school al-
gebra.
(We’ll calculate the roots in other ways later)
The function also generates a path of 𝑌𝑡 starting from initial conditions that we set, which we then plot with plot_y

In [8]:
# This is a 'manual' method

def y_nonstochastic(y_0=100, y_1=80, α=.92, β=.5, γ=10, n=80):

    """Takes values of parameters and computes the roots of characteristic
    polynomial. It tells whether they are real or complex and whether they
    are less than unity in absolute value. It also computes a simulation of
    length n starting from the two given initial conditions for national
    income
    """

    roots = []

    ρ1 = α + β
    ρ2 = -β

    print(f'ρ_1 is {ρ1}')
    print(f'ρ_2 is {ρ2}')

    discriminant = ρ1 ** 2 + 4 * ρ2

    # Roots of z**2 - ρ1 z - ρ2 are (ρ1 ± sqrt(discriminant)) / 2
    if discriminant == 0:
        roots.append(ρ1 / 2)
        print('Single real root: ')
        print(roots)
    elif discriminant > 0:
        roots.append((ρ1 + sqrt(discriminant).real) / 2)
        roots.append((ρ1 - sqrt(discriminant).real) / 2)
        print('Two real roots: ')
        print(roots)
    else:
        roots.append((ρ1 + sqrt(discriminant)) / 2)
        roots.append((ρ1 - sqrt(discriminant)) / 2)
        print('Two complex roots: ')
        print(roots)

    if all(abs(root) < 1 for root in roots):
        print('Absolute values of roots are less than one')
    else:
        print('Absolute values of roots are not less than one')

    def transition(x, t):
        return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ

    y_t = [y_0, y_1]

    for t in range(2, n):
        y_t.append(transition(y_t, t))

    return y_t

plot_y(y_nonstochastic())

ρ_1 is 1.42
ρ_2 is -0.5
Two real roots:
[0.7740312423743284, 0.6459687576256715]
Absolute values of roots are less than one

13.4.4 Reverse-Engineering Parameters to Generate Damped Cycles

The next cell writes code that takes as inputs the modulus 𝑟 and phase 𝜙 of a conjugate pair
of complex numbers in polar form

𝜆1 = 𝑟 exp(𝑖𝜙), 𝜆2 = 𝑟 exp(−𝑖𝜙)
• The code assumes that these two complex numbers are the roots of the characteristic
polynomial
• It then reverse-engineers (𝑎, 𝑏) and (𝜌1 , 𝜌2 ), pairs that would generate those roots

In [9]:
### code to reverse-engineer a cycle
### y_t = r^t (c_1 cos(ϕ t) + c_2 sin(ϕ t))
###

def f(r, ϕ):
    """
    Takes modulus r and angle ϕ of complex number r exp(j ϕ)
    and creates ρ1 and ρ2 of characteristic polynomial for which
    r exp(j ϕ) and r exp(- j ϕ) are complex roots.

    Returns ρ1, ρ2, the multiplier coefficient a and the accelerator
    coefficient b that verify those roots.
    """
    g1 = cmath.rect(r, ϕ)  # Generate two complex roots
    g2 = cmath.rect(r, -ϕ)
    ρ1 = g1 + g2           # Implied ρ1, ρ2
    ρ2 = -g1 * g2
    b = -ρ2                # Reverse-engineer a and b that validate these
    a = ρ1 - b
    return ρ1, ρ2, a, b

## Now let's use the function in an example
## Here are the example parameters

r = .95
period = 10                # Length of cycle in units of time
ϕ = 2 * math.pi/period

## Apply the function

ρ1, ρ2, a, b = f(r, ϕ)

print(f"a, b = {a}, {b}")
print(f"ρ1, ρ2 = {ρ1}, {ρ2}")

a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = (1.5371322893124+0j), (-0.9024999999999999+0j)

In [10]:
## Keep only the real components of ρ1 and ρ2

ρ1 = ρ1.real
ρ2 = ρ2.real

ρ1, ρ2

Out[10]: (1.5371322893124, -0.9024999999999999)

13.4.5 Root Finding Using Numpy

Here we’ll use numpy to compute the roots of the characteristic polynomial

In [11]:
r1, r2 = np.roots([1, -ρ1, -ρ2])

p1 = cmath.polar(r1)
p2 = cmath.polar(r2)

print(f"r, ϕ = {r}, {ϕ}")
print(f"p1, p2 = {p1}, {p2}")
print(f"a, b = {a}, {b}")
print(f"ρ1, ρ2 = {ρ1}, {ρ2}")

r, ϕ = 0.95, 0.6283185307179586
p1, p2 = (0.95, 0.6283185307179586), (0.95, -0.6283185307179586)
a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = 1.5371322893124, -0.9024999999999999

In [12]:
##=== This method uses numpy to calculate roots ===#

def y_nonstochastic(y_0=100, y_1=80, α=.9, β=.8, γ=10, n=80):

    """ Rather than computing the roots of the characteristic
    polynomial by hand as we did earlier, this function
    enlists numpy to do the work for us
    """

    # Useful constants
    ρ1 = α + β
    ρ2 = -β

    categorize_solution(ρ1, ρ2)

    # Find roots of polynomial
    roots = np.roots([1, -ρ1, -ρ2])
    print(f'Roots are {roots}')

    # Check if real or complex
    if all(isinstance(root, complex) for root in roots):
        print('Roots are complex')
    else:
        print('Roots are real')

    # Check if roots are less than one
    if all(abs(root) < 1 for root in roots):
        print('Roots are less than one')
    else:
        print('Roots are not less than one')

    # Define transition equation
    def transition(x, t):
        return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ

    # Set initial conditions
    y_t = [y_0, y_1]

    # Generate y_t series
    for t in range(2, n):
        y_t.append(transition(y_t, t))

    return y_t

plot_y(y_nonstochastic())

Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.85+0.27838822j 0.85-0.27838822j]
Roots are complex
Roots are less than one

13.4.6 Reverse-Engineered Complex Roots: Example

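The next cell studies the implications of reverse-engineered complex roots.

We’ll generate an undamped cycle of period 10.
Setting 𝑟 = 1 puts both roots exactly on the unit circle, the boundary between damped and explosive behavior, so the diagnostics printed below, which rely on strict inequalities and on floating-point roots, report damped oscillations and roots less than one even though the simulated cycle neither dampens nor explodes.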

In [13]:
r = 1         # Generates undamped, nonexplosive cycles

period = 10   # Length of cycle in units of time
ϕ = 2 * math.pi/period

## Apply the reverse-engineering function f

ρ1, ρ2, a, b = f(r, ϕ)

# Drop the imaginary part so that it is a valid input into y_nonstochastic
a = a.real
b = b.real

print(f"a, b = {a}, {b}")

ytemp = y_nonstochastic(α=a, β=b, y_0=20, y_1=30)
plot_y(ytemp)

a, b = 0.6180339887498949, 1.0
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.80901699+0.58778525j 0.80901699-0.58778525j]
Roots are complex
Roots are less than one

13.4.7 Digression: Using Sympy to Find Roots

We can also use sympy to compute analytic formulas for the roots

In [14]:
init_printing()

r1 = Symbol("ρ_1")
r2 = Symbol("ρ_2")
z = Symbol("z")

sympy.solve(z**2 - r1*z - r2, z)

Out[14]:

$$\left[ \frac{\rho_1}{2} - \frac{\sqrt{\rho_1^2 + 4\rho_2}}{2}, \; \frac{\rho_1}{2} + \frac{\sqrt{\rho_1^2 + 4\rho_2}}{2} \right]$$

In [15]:
a = Symbol("α")
b = Symbol("β")
r1 = a + b
r2 = -b

sympy.solve(z**2 - r1*z - r2, z)

Out[15]:

$$\left[ \frac{\alpha}{2} + \frac{\beta}{2} - \frac{\sqrt{\alpha^2 + 2\alpha\beta + \beta^2 - 4\beta}}{2}, \; \frac{\alpha}{2} + \frac{\beta}{2} + \frac{\sqrt{\alpha^2 + 2\alpha\beta + \beta^2 - 4\beta}}{2} \right]$$

13.5 Stochastic Shocks

Now we’ll construct some code to simulate the stochastic version of the model that emerges
when we add a random shock process to aggregate demand

In [16]:
def y_stochastic(y_0=0, y_1=0, α=0.8, β=0.2, γ=10, n=100, σ=5):

    """This function takes parameters of a stochastic version of
    the model and proceeds to analyze the roots of the characteristic
    polynomial and also generate a simulation.
    """

    # Useful constants
    ρ1 = α + β
    ρ2 = -β

    # Categorize solution
    categorize_solution(ρ1, ρ2)

    # Find roots of polynomial
    roots = np.roots([1, -ρ1, -ρ2])
    print(roots)

    # Check if real or complex
    if all(isinstance(root, complex) for root in roots):
        print('Roots are complex')
    else:
        print('Roots are real')

    # Check if roots are less than one
    if all(abs(root) < 1 for root in roots):
        print('Roots are less than one')
    else:
        print('Roots are not less than one')

    # Generate shocks
    ϵ = np.random.normal(0, 1, n)

    # Define transition equation
    def transition(x, t):
        return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + σ * ϵ[t]

    # Set initial conditions
    y_t = [y_0, y_1]

    # Generate y_t series
    for t in range(2, n):
        y_t.append(transition(y_t, t))

    return y_t

plot_y(y_stochastic())

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one

Let’s do a simulation in which there are shocks and the characteristic polynomial has complex
roots

In [17]:
r = .97

period = 10   # Length of cycle in units of time
ϕ = 2 * math.pi/period

### Apply the reverse-engineering function f

ρ1, ρ2, a, b = f(r, ϕ)

# Drop the imaginary part so that it is a valid input into y_stochastic
a = a.real
b = b.real

print(f"a, b = {a}, {b}")
plot_y(y_stochastic(y_0=40, y_1=42, α=a, β=b, σ=2, n=100))

a, b = 0.6285929690873979, 0.9409000000000001
Roots are complex with modulus less than one; therefore damped oscillations
[0.78474648+0.57015169j 0.78474648-0.57015169j]
Roots are complex
Roots are less than one

13.6 Government Spending

This function computes a response to either a permanent or one-off increase in government expenditures

In [18]:
def y_stochastic_g(y_0=20,
                   y_1=20,
                   α=0.8,
                   β=0.2,
                   γ=10,
                   n=100,
                   σ=2,
                   g=0,
                   g_t=0,
                   duration='permanent'):

    """This program computes a response to a permanent or one-off
    increase g in government expenditures that occurs at time g_t
    """

    # Useful constants
    ρ1 = α + β
    ρ2 = -β

    # Categorize solution
    categorize_solution(ρ1, ρ2)

    # Find roots of polynomial
    roots = np.roots([1, -ρ1, -ρ2])
    print(roots)

    # Check if real or complex
    if all(isinstance(root, complex) for root in roots):
        print('Roots are complex')
    else:
        print('Roots are real')

    # Check if roots are less than one
    if all(abs(root) < 1 for root in roots):
        print('Roots are less than one')
    else:
        print('Roots are not less than one')

    # Generate shocks once; when σ = 0 they have no effect
    ϵ = np.random.normal(0, 1, n)

    def transition(x, t, g=0):
        return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g + σ * ϵ[t]

    # Create list and set initial conditions
    y_t = [y_0, y_1]

    # Generate y_t series
    for t in range(2, n):

        # No government spending
        if g == 0:
            y_t.append(transition(y_t, t))

        # Constant government spending (no shock)
        elif duration is None:
            y_t.append(transition(y_t, t, g=g))

        # Permanent government spending shock
        elif duration == 'permanent':
            if t < g_t:
                y_t.append(transition(y_t, t, g=0))
            else:
                y_t.append(transition(y_t, t, g=g))

        # One-off government spending shock
        elif duration == 'one-off':
            if t == g_t:
                y_t.append(transition(y_t, t, g=g))
            else:
                y_t.append(transition(y_t, t, g=0))

        else:
            raise ValueError("duration must be None, 'permanent' or 'one-off'")

    return y_t

A permanent government spending shock can be simulated as follows

In [19]: plot_y(y_stochastic_g(g=10, g_t=20, duration='permanent'))

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one

We can also see the response to a one-time jump in government expenditures

In [20]: plot_y(y_stochastic_g(g=500, g_t=50, duration='one-off'))

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
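In the nonstochastic version (𝜎 = 0), a permanent increase 𝑔 in government spending raises the steady state from 𝛾/(1 − 𝑎) to (𝛾 + 𝑔)/(1 − 𝑎), because 1 − 𝜌1 − 𝜌2 = 1 − 𝑎; the long-run multiplier is therefore 1/(1 − 𝑎). The minimal sketch below checks this with the default parameters of y_stochastic_g (𝑎 = 0.8, 𝑏 = 0.2, 𝛾 = 10) and 𝑔 = 10 arriving at 𝑡 = 20; the horizon of 200 periods is an assumption chosen so that the path has time to settle.

```python
import numpy as np

a, b, gamma = 0.8, 0.2, 10.0      # defaults of y_stochastic_g
g, g_t, n = 10.0, 20, 200
rho1, rho2 = a + b, -b

# Nonstochastic path with a permanent jump in G at t = g_t
Y = [20.0, 20.0]
for t in range(2, n):
    G_t = g if t >= g_t else 0.0
    Y.append(rho1 * Y[-1] + rho2 * Y[-2] + gamma + G_t)

steady_before = gamma / (1 - a)           # about 50
steady_after = (gamma + g) / (1 - a)      # about 100
multiplier = 1 / (1 - a)                  # about 5
print(Y[-1], steady_after, multiplier)
```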

13.7 Wrapping Everything Into a Class

Up to now, we have written functions to do the work.


Now we’ll roll up our sleeves and write a Python class called Samuelson for the Samuelson
model

In [21]:
class Samuelson():

    r"""This class represents the Samuelson model, otherwise known as the
    multiplier-accelerator model. The model combines the Keynesian multiplier
    with the accelerator theory of investment.

    The path of output is governed by a linear second-order difference equation

    .. math::

        Y_t = \gamma + g + (\alpha + \beta) Y_{t-1} - \beta Y_{t-2} + \sigma \epsilon_t

    Parameters
    ----------
    y_0 : scalar
        Initial condition for Y_0
    y_1 : scalar
        Initial condition for Y_1
    α : scalar
        Marginal propensity to consume
    β : scalar
        Accelerator coefficient
    γ : scalar
        Constant term in the consumption function
    n : int
        Number of iterations
    σ : scalar
        Volatility parameter. It must be greater than or equal to 0. Set
        equal to 0 for a non-stochastic model.
    g : scalar
        Government spending shock
    g_t : int
        Time at which government spending shock occurs. Must be specified
        when duration != None.
    duration : {None, 'permanent', 'one-off'}
        Specifies type of government spending shock. If none, government
        spending equal to g for all t.

    """

    def __init__(self,
                 y_0=100,
                 y_1=50,
                 α=1.3,
                 β=0.2,
                 γ=10,
                 n=100,
                 σ=0,
                 g=0,
                 g_t=0,
                 duration=None):

        self.y_0, self.y_1, self.α, self.β = y_0, y_1, α, β
        self.n, self.g, self.g_t, self.duration = n, g, g_t, duration
        self.γ, self.σ = γ, σ
        self.ρ1 = α + β
        self.ρ2 = -β
        self.roots = np.roots([1, -self.ρ1, -self.ρ2])

    def root_type(self):
        if all(isinstance(root, complex) for root in self.roots):
            return 'Complex conjugate'
        elif len(self.roots) > 1:
            return 'Double real'
        else:
            return 'Single real'

    def root_less_than_one(self):
        return all(abs(root) < 1 for root in self.roots)

    def solution_type(self):
        ρ1, ρ2 = self.ρ1, self.ρ2
        discriminant = ρ1 ** 2 + 4 * ρ2
        if ρ2 >= 1 + ρ1 or ρ2 <= -1:
            return 'Explosive oscillations'
        elif ρ1 + ρ2 >= 1:
            return 'Explosive growth'
        elif discriminant < 0:
            return 'Damped oscillations'
        else:
            return 'Steady state'

    def _transition(self, x, t, g=0):

        # Deterministic part of the second-order difference equation
        y = self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g

        # Stochastic case: add a single standard normal draw scaled by σ
        if self.σ > 0:
            y += self.σ * np.random.normal()
        return y

    def generate_series(self):

        # Create list and set initial conditions
        y_t = [self.y_0, self.y_1]

        # Generate y_t series
        for t in range(2, self.n):

            # No government spending
            if self.g == 0:
                y_t.append(self._transition(y_t, t, g=0))

            # Constant government spending (no shock)
            elif self.duration is None:
                y_t.append(self._transition(y_t, t, g=self.g))

            # Permanent government spending shock
            elif self.duration == 'permanent':
                if t < self.g_t:
                    y_t.append(self._transition(y_t, t, g=0))
                else:
                    y_t.append(self._transition(y_t, t, g=self.g))

            # One-off government spending shock
            elif self.duration == 'one-off':
                if t == self.g_t:
                    y_t.append(self._transition(y_t, t, g=self.g))
                else:
                    y_t.append(self._transition(y_t, t, g=0))

            else:
                raise ValueError("duration must be None, 'permanent' or 'one-off'")

        return y_t

    def summary(self):
        print('Summary\n' + '-' * 50)
        print(f'Root type: {self.root_type()}')
        print(f'Solution type: {self.solution_type()}')
        print(f'Roots: {str(self.roots)}')

        if self.root_less_than_one():
            print('Absolute value of roots is less than one')
        else:
            print('Absolute value of roots is not less than one')

        if self.σ > 0:
            print('Stochastic series with σ = ' + str(self.σ))
        else:
            print('Non-stochastic series')

        if self.g != 0:
            print('Government spending equal to ' + str(self.g))

        if self.duration is not None:
            print(self.duration.capitalize() +
                  ' government spending shock at t = ' + str(self.g_t))

    def plot(self):
        fig, ax = plt.subplots(figsize=(10, 6))
        ax.plot(self.generate_series())
        ax.set(xlabel='Iteration', xlim=(0, self.n))
        ax.set_ylabel('$Y_t$', rotation=0)
        ax.grid()

        # Add parameter values to plot
        paramstr = (f'$\\alpha={self.α:.2f}$ \n $\\beta={self.β:.2f}$ \n '
                    f'$\\gamma={self.γ:.2f}$ \n $\\sigma={self.σ:.2f}$ \n '
                    f'$\\rho_1={self.ρ1:.2f}$ \n $\\rho_2={self.ρ2:.2f}$')
        props = dict(fc='white', pad=10, alpha=0.5)
        ax.text(0.87, 0.05, paramstr, transform=ax.transAxes,
                fontsize=12, bbox=props, va='bottom')

        return fig

    def param_plot(self):

        # Uses the param_plot() function defined earlier (it is then able
        # to be used standalone or as part of the model)
        fig = param_plot()
        ax = fig.gca()

        # Add λ values to legend
        for i, root in enumerate(self.roots):
            if isinstance(root, complex):
                # The second root's imaginary part is negative and already
                # carries its minus sign, so only the first needs a '+'
                operator = ['+', '']
                label = (rf'$\lambda_{i+1} = {self.roots[i].real:.2f} '
                         rf'{operator[i]} {self.roots[i].imag:.2f}i$')
            else:
                label = rf'$\lambda_{i+1} = {self.roots[i].real:.2f}$'
            ax.scatter(0, 0, 0, label=label)  # dummy to add to legend

        # Add ρ pair to plot
        ax.scatter(self.ρ1, self.ρ2, s=100, c='red', marker='+',
                   label=r'$(\ \rho_1, \ \rho_2 \ )$', zorder=5)

        plt.legend(fontsize=12, loc=3)

        return fig

13.7.1 Illustration of Samuelson Class

Now we’ll put our Samuelson class to work on an example:

In [22]: sam = Samuelson(α=0.8, β=0.5, σ=2, g=10, g_t=20, duration='permanent')
sam.summary()

Summary
--------------------------------------------------
Root type: Complex conjugate
Solution type: Damped oscillations
Roots: [0.65+0.27838822j 0.65-0.27838822j]
Absolute value of roots is less than one
Stochastic series with σ = 2
Government spending equal to 10
Permanent government spending shock at t = 20

In [23]: sam.plot()
plt.show()

13.7.2 Using the Graph

We’ll use our graph to show where the roots lie and how their location is consistent with the
behavior of the path just graphed.
The red + sign shows the location of the pair (𝜌1, 𝜌2), while the legend lists the roots 𝜆1 and 𝜆2:

In [24]: sam.param_plot()
plt.show()

13.8 Using the LinearStateSpace Class

It turns out that we can use the QuantEcon.py LinearStateSpace class to do much of the
work that we have done from scratch above.
Here is how we map the Samuelson model into an instance of a LinearStateSpace class:

In [25]: """This script maps the Samuelson model into the
``LinearStateSpace`` class
"""
α = 0.8
β = 0.9
ρ1 = α + β
ρ2 = -β
γ = 10
σ = 1
g = 10
n = 100

A = [[1, 0, 0],
     [γ + g, ρ1, ρ2],
     [0, 1, 0]]

G = [[γ + g, ρ1, ρ2],  # this is Y_{t+1}
     [γ, α, 0],        # this is C_{t+1}
     [0, β, -β]]       # this is I_{t+1}

μ_0 = [1, 100, 100]
C = np.zeros((3, 1))
C[1] = σ  # stochastic

sam_t = LinearStateSpace(A, C, G, mu_0=μ_0)

x, y = sam_t.simulate(ts_length=n)

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
titles = ['Output ($Y_t$)', 'Consumption ($C_t$)', 'Investment ($I_t$)']
colors = ['darkblue', 'red', 'purple']
for ax, series, title, color in zip(axes, y, titles, colors):
    ax.plot(series, color=color)
    ax.set(title=title, xlim=(0, n))
    ax.grid()

axes[-1].set_xlabel('Iteration')

plt.show()

13.8.1 Other Methods in the LinearStateSpace Class

Let’s compute impulse response functions for the instance of the Samuelson model using the impulse_response method of the LinearStateSpace class:

In [26]: imres = sam_t.impulse_response()
imres = np.asarray(imres)
y1 = imres[:, :, 0]
y2 = imres[:, :, 1]
y1.shape

Out[26]: (2, 6, 1)

Now let’s compute the zeros of the characteristic polynomial by simply calculating the eigenvalues of 𝐴:

In [27]: A = np.asarray(A)
w, v = np.linalg.eig(A)
print(w)

[0.85+0.42130749j 0.85-0.42130749j 1. +0.j ]
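As a quick consistency check (a sketch that reuses the parameters α = 0.8, β = 0.9, γ = g = 10 from the LinearStateSpace example above), the two eigenvalues of 𝐴 other than 1 should coincide with the roots of 𝜆² − 𝜌1𝜆 − 𝜌2 = 0. This works because 𝐴 is block lower triangular: the constant state contributes the unit eigenvalue, and the lower-right 2 × 2 block is the companion matrix of the characteristic polynomial.

```python
import numpy as np

α, β = 0.8, 0.9
ρ1, ρ2 = α + β, -β
γ, g = 10, 10

# Same transition matrix as above: the state is (1, Y_t, Y_{t-1})
A = np.array([[1, 0, 0],
              [γ + g, ρ1, ρ2],
              [0, 1, 0]])

eigvals = np.linalg.eigvals(A)
roots = np.roots([1, -ρ1, -ρ2])   # roots of λ² - ρ1 λ - ρ2

# Drop the unit eigenvalue contributed by the constant state
nonconstant = eigvals[~np.isclose(eigvals, 1.0)]

print(np.sort_complex(nonconstant))
print(np.sort_complex(roots))
print(np.abs(roots))              # both moduli below one: damped cycles
```

Since the roots form a complex conjugate pair, their common modulus squared equals their product, −𝜌2 = 𝛽 = 0.9, which is why the oscillations in the simulation above die out.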



13.8.2 Inheriting Methods from LinearStateSpace

We could also create a subclass of LinearStateSpace (inheriting all its methods and attributes) to add more methods of our own:

In [28]: class SamuelsonLSS(LinearStateSpace):

    """
    This subclass creates a Samuelson multiplier-accelerator model
    as a linear state space system.
    """
    def __init__(self,
                 y_0=100,
                 y_1=100,
                 α=0.8,
                 β=0.9,
                 γ=10,
                 σ=1,
                 g=10):

        self.α, self.β = α, β
        self.y_0, self.y_1, self.g = y_0, y_1, g
        self.γ, self.σ = γ, σ

        # Define initial conditions
        self.μ_0 = [1, y_0, y_1]

        self.ρ1 = α + β
        self.ρ2 = -β

        # Define transition matrix
        self.A = [[1, 0, 0],
                  [γ + g, self.ρ1, self.ρ2],
                  [0, 1, 0]]

        # Define output matrix
        self.G = [[γ + g, self.ρ1, self.ρ2],  # this is Y_{t+1}
                  [γ, α, 0],                  # this is C_{t+1}
                  [0, β, -β]]                 # this is I_{t+1}

        self.C = np.zeros((3, 1))
        self.C[1] = σ  # stochastic

        # Initialize LSS with parameters from Samuelson model
        LinearStateSpace.__init__(self, self.A, self.C, self.G, mu_0=self.μ_0)

    def plot_simulation(self, ts_length=100, stationary=True):

        # Temporarily store the original initial distribution, using the
        # attribute names that LinearStateSpace.simulate actually reads
        temp_mu = self.mu_0
        temp_Sigma = self.Sigma_0

        # Start the simulation from the stationary distribution of the
        # state x_t, if it can be computed
        if stationary:
            try:
                stationary_dists = self.stationary_distributions()
                # Entries 0 and 2 are the stationary mean and covariance of x_t
                self.mu_0 = stationary_dists[0]
                self.Sigma_0 = stationary_dists[2]
            # Exception where no convergence is achieved when
            # calculating stationary distributions
            except ValueError:
                print('Stationary distribution does not exist')

        x, y = self.simulate(ts_length)

        fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
        titles = ['Output ($Y_t$)', 'Consumption ($C_t$)', 'Investment ($I_t$)']
        colors = ['darkblue', 'red', 'purple']
        for ax, series, title, color in zip(axes, y, titles, colors):
            ax.plot(series, color=color)
            ax.set(title=title, xlim=(0, ts_length))
            ax.grid()

        axes[-1].set_xlabel('Iteration')

        # Reset the initial distribution to its original values
        self.mu_0 = temp_mu
        self.Sigma_0 = temp_Sigma

        return fig

    def plot_irf(self, j=5):

        x, y = self.impulse_response(j)

        # Reshape into a 3 x (j + 1) matrix for plotting purposes
        yimf = np.array(y).flatten().reshape(j+1, 3).T

        fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
        labels = ['$Y_t$', '$C_t$', '$I_t$']
        colors = ['darkblue', 'red', 'purple']
        for ax, series, label, color in zip(axes, yimf, labels, colors):
            ax.plot(series, color=color)
            ax.set(xlim=(0, j))
            ax.set_ylabel(label, rotation=0, fontsize=14, labelpad=10)
            ax.grid()

        axes[0].set_title('Impulse Response Functions')
        axes[-1].set_xlabel('Iteration')

        return fig

    def multipliers(self, j=5):
        # Sum the impulse responses of (Y, C, I) over horizons 0, ..., j
        x, y = self.impulse_response(j)
        return np.sum(np.array(y).flatten().reshape(j+1, 3), axis=0)

13.8.3 Illustrations

Let’s show how we can use the SamuelsonLSS class:

In [29]: samlss = SamuelsonLSS()

In [30]: samlss.plot_simulation(100, stationary=False)
plt.show()

In [31]: samlss.plot_simulation(100, stationary=True)
plt.show()

In [32]: samlss.plot_irf(100)
plt.show()

In [33]: samlss.multipliers()

Out[33]: array([7.414389, 6.835896, 0.578493])

13.9 Pure Multiplier Model

Let’s shut down the accelerator by setting 𝛽 = 0 to get a pure multiplier model:
• the absence of cycles gives an idea about why Samuelson included the accelerator

In [34]: pure_multiplier = SamuelsonLSS(α=0.95, β=0)

In [35]: pure_multiplier.plot_simulation()

Stationary distribution does not exist


In [36]: pure_multiplier = SamuelsonLSS(α=0.8, β=0)

In [37]: pure_multiplier.plot_simulation()


In [38]: pure_multiplier.plot_irf(100)


13.10 Summary

In this lecture, we wrote functions and classes to represent non-stochastic and stochastic versions of the Samuelson (1939) multiplier-accelerator model, described in [93].
We saw that different parameter values led to different output paths, which could be stationary, explosive, or oscillating.
We were also able to represent the model using the QuantEcon.py LinearStateSpace class.
Chapter 14

Kesten Processes and Firm Dynamics

14.1 Contents

• Overview 14.2
• Kesten Processes 14.3
• Heavy Tails 14.4
• Application: Firm Dynamics 14.5
• Exercises 14.6
• Solutions 14.7
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon
!pip install --upgrade yfinance

14.2 Overview

Previously we learned about linear scalar-valued stochastic processes (AR(1) models).


Now we generalize these linear models slightly by allowing the multiplicative coefficient to be
stochastic.
Such processes are known as Kesten processes after German–American mathematician Harry Kesten (1931–2019).
Although simple to write down, Kesten processes are interesting for at least two reasons:

1. A number of significant economic processes are, or can be described as, Kesten processes.

2. Kesten processes generate interesting dynamics, including, in some cases, heavy-tailed cross-sectional distributions.

We will discuss these issues as we go along.


Let’s start with some imports:


In [2]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

import quantecon as qe


The following two lines are only added to avoid a FutureWarning caused by compatibility issues between pandas and matplotlib.

In [3]: from pandas.plotting import register_matplotlib_converters


register_matplotlib_converters()

Additional technical background related to this lecture can be found in the monograph of
[17].

14.3 Kesten Processes

A Kesten process is a stochastic process of the form

$$
X_{t+1} = a_{t+1} X_t + \eta_{t+1} \tag{1}
$$

where {𝑎𝑡 }𝑡≥1 and {𝜂𝑡 }𝑡≥1 are IID sequences.


We are interested in the dynamics of {𝑋𝑡 }𝑡≥0 when 𝑋0 is given.
We will focus on the nonnegative scalar case, where 𝑋𝑡 takes values in ℝ+ .
In particular, we will assume that
• the initial condition 𝑋0 is nonnegative,
• {𝑎𝑡 }𝑡≥1 is a nonnegative IID stochastic process and
• {𝜂𝑡 }𝑡≥1 is another nonnegative IID stochastic process, independent of the first.

14.3.1 Example: GARCH Volatility

The GARCH model is common in financial applications, where time series such as asset returns exhibit time-varying volatility.
For example, consider the following plot of daily returns on the Nasdaq Composite Index for
the period 1st January 2006 to 1st November 2019.

In [4]: import yfinance as yf
import pandas as pd

# auto_adjust=False keeps the 'Adj Close' column in recent yfinance versions
s = yf.download('^IXIC', '2006-1-1', '2019-11-1', auto_adjust=False)['Adj Close']

r = s.pct_change()

fig, ax = plt.subplots()

ax.plot(r, alpha=0.7)

ax.set_ylabel('returns', fontsize=12)
ax.set_xlabel('date', fontsize=12)

plt.show()


Notice how the series exhibits bursts of volatility (high variance) and then settles down again.
GARCH models can replicate this feature.
The GARCH(1, 1) volatility process takes the form

$$
\sigma_{t+1}^2 = \alpha_0 + \sigma_t^2 (\alpha_1 \xi_{t+1}^2 + \beta) \tag{2}
$$

where {𝜉𝑡 } is IID with 𝔼𝜉𝑡² = 1 and all parameters are positive.
Returns on a given asset are then modeled as

$$
r_t = \sigma_t \zeta_t \tag{3}
$$

where {𝜁𝑡 } is again IID and independent of {𝜉𝑡 }.


The volatility sequence {𝜎𝑡²}, which drives the dynamics of returns, is a Kesten process: (2) takes the form (1) with 𝑋𝑡 = 𝜎𝑡², 𝑎𝑡+1 = 𝛼1 𝜉𝑡+1² + 𝛽 and 𝜂𝑡+1 = 𝛼0.

14.3.2 Example: Wealth Dynamics

Suppose that a given household saves a fixed fraction 𝑠 of its current wealth in every period.
The household earns labor income 𝑦𝑡 at the start of time 𝑡.
Wealth then evolves according to

$$
w_{t+1} = R_{t+1} s w_t + y_{t+1} \tag{4}
$$

where {𝑅𝑡 } is the gross rate of return on assets.


If {𝑅𝑡 } and {𝑦𝑡 } are both IID, then (4) is a Kesten process, with 𝑎𝑡+1 = 𝑠𝑅𝑡+1 and 𝜂𝑡+1 = 𝑦𝑡+1.

14.3.3 Stationarity

In earlier lectures, such as the one on AR(1) processes, we introduced the notion of a stationary distribution.
In the present context, we can define a stationary distribution as follows:
The distribution 𝐹 ∗ on ℝ is called stationary for the Kesten process (1) if

$$
X_t \sim F^* \implies a_{t+1} X_t + \eta_{t+1} \sim F^* \tag{5}
$$

In other words, if the current state 𝑋𝑡 has distribution 𝐹 ∗ , then so does the next period state
𝑋𝑡+1 .
We can write this alternatively as

$$
F^*(y) = \int \mathbb{P}\{a_{t+1} x + \eta_{t+1} \leq y\} \, F^*(dx) \quad \text{for all } y \geq 0. \tag{6}
$$

The right-hand side of (6) is the distribution of the next-period state when the current state is drawn from 𝐹 ∗.
The equality in (6) states that this distribution is unchanged, i.e., it is 𝐹 ∗ itself.

14.3.4 Cross-Sectional Interpretation

There is an important cross-sectional interpretation of stationary distributions, discussed previously but worth repeating here.
Suppose, for example, that we are interested in the wealth distribution — that is, the current
distribution of wealth across households in a given country.
Suppose further that
• the wealth of each household evolves independently according to (4),
• 𝐹 ∗ is a stationary distribution for this stochastic process and
• there are many households.
Then 𝐹 ∗ is a steady state for the cross-sectional wealth distribution in this country.
In other words, if 𝐹 ∗ is the current wealth distribution then it will remain so in subsequent
periods, ceteris paribus.

To see this, suppose that 𝐹 ∗ is the current wealth distribution.


What is the fraction of households with wealth less than 𝑦 next period?
To obtain this, we sum the probability that wealth is less than 𝑦 tomorrow, given that current
wealth is 𝑤, weighted by the fraction of households with wealth 𝑤.
Noting that the fraction of households with wealth in interval 𝑑𝑤 is 𝐹 ∗ (𝑑𝑤), we get

$$
\int \mathbb{P}\{R_{t+1} s w + y_{t+1} \leq y\} \, F^*(dw)
$$

By the definition of stationarity and the assumption that 𝐹 ∗ is stationary for the wealth process, this is just 𝐹 ∗ (𝑦).
Hence the fraction of households with wealth in [0, 𝑦] is the same next period as it is this period.
Since 𝑦 was chosen arbitrarily, the distribution is unchanged.
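To see this cross-sectional interpretation at work, here is a minimal simulation sketch. It assumes, purely for illustration, that ln 𝑅𝑡 is normal with mean 0.05 and standard deviation 0.1, that labor income 𝑦𝑡 is standard lognormal and that 𝑠 = 0.9, so that 𝔼 ln(𝑠𝑅𝑡) = 0.05 + ln 0.9 ≈ −0.055 < 0. None of these values come from the lecture. It pushes many households through (4) and checks that the cross-sectional quantiles barely move from one period to the next once the population has settled down.

```python
import numpy as np

# Hypothetical parameters, chosen only so that E ln R + ln s < 0
s = 0.9                  # fraction of wealth saved
μ_R, σ_R = 0.05, 0.1     # ln R_t ~ N(μ_R, σ_R²)
M, T = 100_000, 200      # number of households, number of periods

rng = np.random.default_rng(1234)

def update(w):
    "Apply w_{t+1} = R_{t+1} s w_t + y_{t+1} to every household at once."
    R = np.exp(μ_R + σ_R * rng.standard_normal(w.size))
    y = np.exp(rng.standard_normal(w.size))   # lognormal labor income
    return R * s * w + y

w = np.ones(M)           # every household starts with wealth 1
for t in range(T):
    w = update(w)
w_next = update(w)       # one more period

# If the cross-section is close to F*, its quantiles barely move
qs = [0.1, 0.5, 0.9]
print(np.quantile(w, qs))
print(np.quantile(w_next, qs))
```

Each household moves around individually, but the distribution across households stays put, which is exactly the steady-state property described above.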

14.3.5 Conditions for Stationarity

The Kesten process 𝑋𝑡+1 = 𝑎𝑡+1 𝑋𝑡 + 𝜂𝑡+1 does not always have a stationary distribution.
For example, if 𝑎𝑡 ≡ 𝜂𝑡 ≡ 1 for all 𝑡, then 𝑋𝑡 = 𝑋0 + 𝑡, which diverges to infinity.
To prevent this kind of divergence, we require that {𝑎𝑡 } is strictly less than 1 most of the
time.
In particular, if

$$
\mathbb{E} \ln a_t < 0 \quad \text{and} \quad \mathbb{E} \eta_t < \infty \tag{7}
$$

then a unique stationary distribution exists on ℝ+ .


• See, for example, theorem 2.1.3 of [17], which provides slightly weaker conditions.
As one application of this result, we see that the wealth process (4) will have a unique stationary distribution whenever labor income has finite mean and 𝔼 ln 𝑅𝑡 + ln 𝑠 < 0 (here 𝑎𝑡 = 𝑠𝑅𝑡, so 𝔼 ln 𝑎𝑡 = 𝔼 ln 𝑅𝑡 + ln 𝑠).
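Condition (7) can also be checked numerically. The sketch below does this by Monte Carlo for the GARCH volatility process (2), using the parameter values 𝛼0 = 0.00001, 𝛼1 = 0.1 and 𝛽 = 0.9 from Exercise 1 below. An interesting feature of these values is that 𝔼𝑎𝑡 = 𝛼1 + 𝛽 = 1 exactly, yet 𝔼 ln 𝑎𝑡 < 0 strictly by Jensen's inequality, so (7) still holds.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# GARCH(1, 1) parameters used in Exercise 1
α_0, α_1, β = 1e-5, 0.1, 0.9

# In (2), a_{t+1} = α_1 ξ_{t+1}² + β and η_{t+1} = α_0
ξ = rng.standard_normal(n)
a_draws = α_1 * ξ**2 + β

E_a = a_draws.mean()              # close to α_1 + β = 1
E_log_a = np.log(a_draws).mean()  # strictly negative by Jensen's inequality

print(f"E a ≈ {E_a:.4f}, E ln a ≈ {E_log_a:.4f}, E η = {α_0}")
```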

14.4 Heavy Tails

Under certain conditions, the stationary distribution of a Kesten process has a Pareto tail.
(See our earlier lecture on heavy-tailed distributions for background.)
This fact is significant for economics because of the prevalence of Pareto-tailed distributions.

14.4.1 The Kesten–Goldie Theorem

To state the conditions under which the stationary distribution of a Kesten process has a
Pareto tail, we first recall that a random variable is called nonarithmetic if its distribution
is not concentrated on {… , −2𝑡, −𝑡, 0, 𝑡, 2𝑡, …} for any 𝑡 ≥ 0.
For example, any random variable with a density is nonarithmetic.

The famous Kesten–Goldie Theorem (see, e.g., [17], theorem 2.4.4) states that if

1. the stationarity conditions in (7) hold,

2. the random variable 𝑎𝑡 is positive with probability one and nonarithmetic,

3. ℙ{𝑎𝑡 𝑥 + 𝜂𝑡 = 𝑥} < 1 for all 𝑥 ∈ ℝ+ and

4. there exists a positive constant 𝛼 such that

$$
\mathbb{E} a_t^\alpha = 1, \quad \mathbb{E} \eta_t^\alpha < \infty, \quad \text{and} \quad \mathbb{E}[a_t^{\alpha+1}] < \infty
$$

then the stationary distribution of the Kesten process has a Pareto tail with tail index 𝛼.
More precisely, if 𝐹 ∗ is the unique stationary distribution and 𝑋 ∗ ∼ 𝐹 ∗ , then

$$
\lim_{x \to \infty} x^\alpha \, \mathbb{P}\{X^* > x\} = c
$$

for some positive constant 𝑐.

14.4.2 Intuition

Later we will illustrate the Kesten–Goldie Theorem using rank-size plots.


Prior to doing so, we can give the following intuition for the conditions.
Two important conditions are that 𝔼 ln 𝑎𝑡 < 0, so the model is stationary, and $\mathbb{E} a_t^\alpha = 1$ for some 𝛼 > 0.
The first condition implies that the distribution of 𝑎𝑡 has a large amount of probability mass below 1.
The second condition implies that the distribution of 𝑎𝑡 has at least some probability mass at or above 1.
The first condition gives us existence of the stationary distribution.
The second condition means that the current state can be expanded by 𝑎𝑡.
If this occurs for several consecutive periods, the effects compound each other, since 𝑎𝑡 is multiplicative.
This leads to spikes in the time series, which fill out the extreme right-hand tail of the distribution.
The spikes in the time series are visible in the following simulation, which generates 10 paths when 𝑎𝑡 and 𝜂𝑡 are lognormal (in the code, 𝜂𝑡 is denoted b).

In [5]: μ = -0.5
σ = 1.0

def kesten_ts(ts_length=100):
    x = np.zeros(ts_length)
    for t in range(ts_length-1):
        a = np.exp(μ + σ * np.random.randn())
        b = np.exp(np.random.randn())
        x[t+1] = a * x[t] + b
    return x

fig, ax = plt.subplots()

num_paths = 10
np.random.seed(12)

for i in range(num_paths):
    ax.plot(kesten_ts())

ax.set(xlabel='time', ylabel='$X_t$')
plt.show()

14.5 Application: Firm Dynamics

As noted in our lecture on heavy tails, for common measures of firm size such as revenue or
employment, the US firm size distribution exhibits a Pareto tail (see, e.g., [7], [41]).
Let us try to explain this rather striking fact using the Kesten–Goldie Theorem.

14.5.1 Gibrat’s Law

It was postulated many years ago by Robert Gibrat [42] that firm size evolves according to a
simple rule whereby size next period is proportional to current size.
This is now known as Gibrat’s law of proportional growth.
We can express this idea by stating that a suitably defined measure 𝑠𝑡 of firm size obeys

$$
\frac{s_{t+1}}{s_t} = a_{t+1} \tag{8}
$$

for some positive IID sequence {𝑎𝑡 }.


One implication of Gibrat’s law is that the growth rate of individual firms does not depend
on their size.
However, over the last few decades, research contradicting Gibrat’s law has accumulated in
the literature.
For example, it is commonly found that, on average,

1. small firms grow faster than large firms (see, e.g., [35] and [46]) and

2. the growth rate of small firms is more volatile than that of large firms [32].

On the other hand, Gibrat’s law is generally found to be a reasonable approximation for large
firms [35].
We can accommodate these empirical findings by modifying (8) to

$$
s_{t+1} = a_{t+1} s_t + b_{t+1} \tag{9}
$$

where {𝑎𝑡 } and {𝑏𝑡 } are both IID and independent of each other.
In the exercises you are asked to show that (9) is more consistent with the empirical findings
presented above than Gibrat’s law in (8).

14.5.2 Heavy Tails

So what does this have to do with Pareto tails?


The answer is that (9) is a Kesten process.
If the conditions of the Kesten–Goldie Theorem are satisfied, then the firm size distribution is
predicted to have heavy tails — which is exactly what we see in the data.
In the exercises below we explore this idea further, generalizing the firm size dynamics and
examining the corresponding rank-size plots.
We also try to illustrate why the Pareto tail finding is significant for quantitative analysis.

14.6 Exercises

14.6.1 Exercise 1

Simulate and plot 15 years of daily returns (consider each year as having 250 working days)
using the GARCH(1, 1) process in (2)–(3).
Take 𝜉𝑡 and 𝜁𝑡 to be independent and standard normal.
Set 𝛼0 = 0.00001, 𝛼1 = 0.1, 𝛽 = 0.9 and 𝜎0 = 0.
Compare visually with the Nasdaq Composite Index returns shown above.

While the time path differs, you should see bursts of high volatility.

14.6.2 Exercise 2

In our discussion of firm dynamics, it was claimed that (9) is more consistent with the empirical literature than Gibrat’s law in (8).
(The empirical literature was reviewed immediately above (9).)
In what sense is this true (or false)?

14.6.3 Exercise 3

Consider an arbitrary Kesten process as given in (1).


Suppose that {𝑎𝑡 } is lognormal with parameters (𝜇, 𝜎).
In other words, each 𝑎𝑡 has the same distribution as exp(𝜇 + 𝜎𝑍) when 𝑍 is standard normal.
Suppose further that $\mathbb{E} \eta_t^r < \infty$ for every 𝑟 > 0, as would be the case if, say, 𝜂𝑡 is also lognormal.
Show that the conditions of the Kesten–Goldie theorem are satisfied if and only if 𝜇 < 0.
Obtain the value of 𝛼 that makes the Kesten–Goldie conditions hold.

14.6.4 Exercise 4

One unrealistic aspect of the firm dynamics specified in (9) is that it ignores entry and exit.
In any given period and in any given market, we observe significant numbers of firms entering
and exiting the market.
Empirical discussion of this can be found in a famous paper by Hugo Hopenhayn [57].
In the same paper, Hopenhayn builds a model of entry and exit that incorporates profit maximization by firms and market clearing quantities, wages and prices.
In his model, a stationary equilibrium occurs when the number of entrants equals the number
of exiting firms.
In this setting, firm dynamics can be expressed as

$$
s_{t+1} = e_{t+1} \mathbb{1}\{s_t < \bar{s}\} + (a_{t+1} s_t + b_{t+1}) \mathbb{1}\{s_t \geq \bar{s}\} \tag{10}
$$

Here
• the state variable 𝑠𝑡 represents productivity (which is a proxy for output and hence firm size),
• the IID sequence {𝑒𝑡 } is thought of as a productivity draw for a new entrant and
• the variable 𝑠̄ is a threshold value that we take as given, although it is determined endogenously in Hopenhayn’s model.
The idea behind (10) is that firms stay in the market as long as their productivity 𝑠𝑡 remains at or above 𝑠̄.
• In this case, their productivity updates according to (9).

Firms choose to exit when their productivity 𝑠𝑡 falls below 𝑠̄.


• In this case, they are replaced by a new firm with productivity 𝑒𝑡+1 .
What can we say about dynamics?
Although (10) is not a Kesten process, it does update in the same way as a Kesten process
when 𝑠𝑡 is large.
So perhaps its stationary distribution still has Pareto tails?
Your task is to investigate this question via simulation and rank-size plots.
The approach will be to

1. generate 𝑀 draws of 𝑠𝑇 when 𝑀 and 𝑇 are large and

2. plot the largest 1,000 of the resulting draws in a rank-size plot.

(The distribution of 𝑠𝑇 will be close to the stationary distribution when 𝑇 is large.)


In the simulation, assume that
• each of 𝑎𝑡 , 𝑏𝑡 and 𝑒𝑡 is lognormal,
• the parameters are

In [6]: μ_a = -0.5        # location parameter for a
σ_a = 0.1         # scale parameter for a
μ_b = 0.0         # location parameter for b
σ_b = 0.5         # scale parameter for b
μ_e = 0.0         # location parameter for e
σ_e = 0.5         # scale parameter for e
s_bar = 1.0       # threshold
T = 500           # sampling date
M = 1_000_000     # number of firms
s_init = 1.0      # initial condition for each firm

14.7 Solutions

14.7.1 Exercise 1

Here is one solution:

In [7]: α_0 = 1e-5
α_1 = 0.1
β = 0.9

years = 15
days = years * 250

def garch_ts(ts_length=days):
    σ2 = 0
    r = np.zeros(ts_length)
    for t in range(ts_length):
        # Return at t is driven by the current volatility, as in (3)
        r[t] = np.sqrt(σ2) * np.random.randn()
        # Update the volatility according to (2)
        ξ = np.random.randn()
        σ2 = α_0 + σ2 * (α_1 * ξ**2 + β)
    return r

fig, ax = plt.subplots()

np.random.seed(12)

ax.plot(garch_ts(), alpha=0.7)

ax.set(xlabel='time', ylabel='$r_t$')
plt.show()

14.7.2 Exercise 2

The empirical findings are that

1. small firms grow faster than large firms and

2. the growth rate of small firms is more volatile than that of large firms.

Also, Gibrat’s law is generally found to be a better approximation for large firms than for small firms.
The claim is that the dynamics in (9) are more consistent with points 1-2 than Gibrat’s law.
To see why, we rewrite (9) in terms of growth dynamics:

$$
\frac{s_{t+1}}{s_t} = a_{t+1} + \frac{b_{t+1}}{s_t} \tag{11}
$$

Taking 𝑠𝑡 = 𝑠 as given, the mean and variance of firm growth are



$$
\mathbb{E} a + \frac{\mathbb{E} b}{s} \quad \text{and} \quad \mathbb{V} a + \frac{\mathbb{V} b}{s^2}
$$
Both of these decline with firm size 𝑠, consistent with the data.
Moreover, the law of motion (11) clearly approaches Gibrat’s law (8) as 𝑠𝑡 gets large.
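These formulas are easy to check by simulation. The sketch below assumes, for illustration only, that 𝑎 and 𝑏 are independent lognormals with the parameters used in Exercise 4 (𝜇𝑎 = −0.5, 𝜎𝑎 = 0.1, 𝜇𝑏 = 0, 𝜎𝑏 = 0.5), and compares simulated moments of firm growth 𝑎 + 𝑏/𝑠 with the expressions above at three firm sizes. The helper `lognormal_moments` is our own.

```python
import numpy as np

# Lognormal parameters borrowed from Exercise 4 (an illustrative choice)
μ_a, σ_a = -0.5, 0.1
μ_b, σ_b = 0.0, 0.5

def lognormal_moments(μ, σ):
    "Mean and variance of exp(μ + σ Z) with Z standard normal."
    mean = np.exp(μ + σ**2 / 2)
    var = (np.exp(σ**2) - 1) * np.exp(2 * μ + σ**2)
    return mean, var

Ea, Va = lognormal_moments(μ_a, σ_a)
Eb, Vb = lognormal_moments(μ_b, σ_b)

rng = np.random.default_rng(0)
n = 1_000_000
a = np.exp(μ_a + σ_a * rng.standard_normal(n))
b = np.exp(μ_b + σ_b * rng.standard_normal(n))   # independent of a

results = []
for s in (1.0, 10.0, 100.0):
    growth = a + b / s          # s_{t+1} / s_t when s_t = s, as in (11)
    results.append((s, growth.mean(), Ea + Eb / s,
                    growth.var(), Va + Vb / s**2))

for s, m_sim, m_th, v_sim, v_th in results:
    print(f"s = {s:5.0f}: mean {m_sim:.4f} (formula {m_th:.4f}), "
          f"variance {v_sim:.5f} (formula {v_th:.5f})")
```

Both the mean and the variance of growth fall as 𝑠 rises, and for large 𝑠 they approach 𝔼𝑎 and 𝕍𝑎, the Gibrat's law values.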

14.7.3 Exercise 3

Since 𝑎𝑡 has a density it is nonarithmetic.


Since 𝑎𝑡 has the same density as 𝑎 = exp(𝜇 + 𝜎𝑍) when 𝑍 is standard normal, we have

𝔼 ln 𝑎𝑡 = 𝔼(𝜇 + 𝜎𝑍) = 𝜇,

and since 𝜂𝑡 has finite moments of all orders, the stationarity condition holds if and only if
𝜇 < 0.
Given the properties of the lognormal distribution (which has finite moments of all orders), the only other condition in doubt is existence of a positive constant 𝛼 such that $\mathbb{E} a_t^\alpha = 1$.

This is equivalent to the statement

$$
\exp\left(\alpha \mu + \frac{\alpha^2 \sigma^2}{2}\right) = 1.
$$

Solving for 𝛼 gives 𝛼 = 0 or 𝛼 = −2𝜇/𝜎²; the second root is positive precisely because 𝜇 < 0.
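A quick Monte Carlo sanity check of this formula: for a few lognormal parameter pairs (the first is the pair used in the simulation of Kesten paths above, the second is an arbitrary illustrative choice), compute 𝛼 = −2𝜇/𝜎² and estimate 𝔼𝑎𝛼 from draws, which should be close to 1. The function name `kesten_goldie_alpha` is hypothetical.

```python
import numpy as np

def kesten_goldie_alpha(μ, σ):
    "Positive root of α μ + α² σ² / 2 = 0, i.e. E[a^α] = 1 for a = exp(μ + σ Z)."
    return -2 * μ / σ**2

rng = np.random.default_rng(7)
Z = rng.standard_normal(1_000_000)

checks = []
for μ, σ in [(-0.5, 1.0), (-0.2, 0.5)]:
    α = kesten_goldie_alpha(μ, σ)
    a = np.exp(μ + σ * Z)              # lognormal draws of a_t
    checks.append((α, np.mean(a**α)))  # Monte Carlo estimate of E[a^α]

for α, moment in checks:
    print(f"α = {α:.2f}, estimated E[a^α] = {moment:.4f}")
```

Note that 𝛼 depends only on the ratio 𝜇/𝜎²: more volatile growth shocks, or a drift closer to zero, produce a smaller tail index and hence a heavier tail.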

14.7.4 Exercise 4

Here’s one solution. First we generate the observations:

In [8]: from numba import njit, prange
from numpy.random import randn

@njit(parallel=True)
def generate_draws(μ_a=-0.5,
                   σ_a=0.1,
                   μ_b=0.0,
                   σ_b=0.5,
                   μ_e=0.0,
                   σ_e=0.5,
                   s_bar=1.0,
                   T=500,
                   M=1_000_000,
                   s_init=1.0):

    draws = np.empty(M)
    for m in prange(M):
        s = s_init
        for t in range(T):
            if s < s_bar:
                new_s = np.exp(μ_e + σ_e * randn())
            else:
                a = np.exp(μ_a + σ_a * randn())
                b = np.exp(μ_b + σ_b * randn())
                new_s = a * s + b
            s = new_s
        draws[m] = s

    return draws

data = generate_draws()

Now we produce the rank-size plot:

In [9]: fig, ax = plt.subplots()

rank_data, size_data = qe.rank_size(data, c=0.01)


ax.loglog(rank_data, size_data, 'o', markersize=3.0, alpha=0.5)
ax.set_xlabel("log rank")
ax.set_ylabel("log size")

plt.show()

The plot produces a straight line, consistent with a Pareto tail.
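Why does a straight line indicate a Pareto tail? If $\mathbb{P}\{X > x\} \approx c x^{-\alpha}$ for large 𝑥, then the 𝑟-th largest of 𝑛 draws is roughly $(cn/r)^{1/\alpha}$, so log size is approximately a constant minus (1/𝛼) times log rank. The sketch below checks this on exact Pareto draws with tail index 𝛼 = 2, fitting the slope of the log-log rank-size relation by least squares. `rank_size_np` is a hypothetical stand-in written for this illustration, not the QuantEcon.py implementation of `qe.rank_size`.

```python
import numpy as np

def rank_size_np(data, c=0.01):
    """
    Hypothetical helper: return ranks 1..k and the k largest values in
    decreasing order, where k = int(c * len(data)).
    """
    k = int(c * len(data))
    sizes = np.sort(data)[::-1][:k]
    ranks = np.arange(1, k + 1)
    return ranks, sizes

rng = np.random.default_rng(0)
a = 2.0                                          # Pareto tail index
u = 1 - rng.uniform(size=100_000)                # uniform on (0, 1]
draws = u ** (-1 / a)                            # Pareto(a) draws on [1, ∞)

ranks, sizes = rank_size_np(draws, c=0.01)
slope, intercept = np.polyfit(np.log(ranks), np.log(sizes), 1)
print(slope)   # theory suggests a value near -1/a = -0.5
```

Reading the tail index off the slope in this way is only a rough diagnostic; least squares on rank-size data is known to be biased in small samples.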


Chapter 15

Wealth Distribution Dynamics

15.1 Contents

• Overview 15.2
• Lorenz Curves and the Gini Coefficient 15.3
• A Model of Wealth Dynamics 15.4
• Implementation 15.5
• Applications 15.6
• Exercises 15.7
• Solutions 15.8
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

15.2 Overview

This notebook gives an introduction to wealth distribution dynamics, with a focus on


• modeling and computing the wealth distribution via simulation,
• measures of inequality such as the Lorenz curve and Gini coefficient, and
• how inequality is affected by the properties of wage income and returns on assets.
One interesting property of the wealth distribution we discuss is Pareto tails.
The wealth distribution in many countries exhibits a Pareto tail
• See this lecture for a definition.
• For a review of the empirical evidence, see, for example, [9].
This is consistent with high concentration of wealth amongst the richest households.
It also gives us a way to quantify such concentration, in terms of the tail index.
One question of interest is whether or not we can replicate Pareto tails from a relatively simple model.

15.2.1 A Note on Assumptions

The evolution of wealth for any given household depends on their savings behavior.


Modeling such behavior will form an important part of this lecture series.
However, in this particular lecture, we will be content with rather ad hoc (but plausible) savings rules.
We do this to more easily explore the implications of different specifications of income dynamics and investment returns.
At the same time, all of the techniques discussed here can be plugged into models that use
optimization to obtain savings rules.
We will use the following imports.

In [2]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

import quantecon as qe
from numba import njit, float64, prange
from numba.experimental import jitclass


15.3 Lorenz Curves and the Gini Coefficient

Before we investigate wealth dynamics, we briefly review some measures of inequality.

15.3.1 Lorenz Curves

One popular graphical measure of inequality is the Lorenz curve.


The package QuantEcon.py, already imported above, contains a function to compute Lorenz
curves.
To illustrate, suppose that

In [3]: n = 10_000 # size of sample


w = np.exp(np.random.randn(n)) # lognormal draws

is data representing the wealth of 10,000 households.


We can compute and plot the Lorenz curve as follows:

In [4]: f_vals, l_vals = qe.lorenz_curve(w)

fig, ax = plt.subplots()
ax.plot(f_vals, l_vals, label='Lorenz curve, lognormal sample')
ax.plot(f_vals, f_vals, label='Lorenz curve, equality')
ax.legend()
plt.show()

This curve can be understood as follows: if point (𝑥, 𝑦) lies on the curve, it means that, collectively, the bottom (100𝑥)% of the population holds (100𝑦)% of the wealth.
The “equality” line is the 45 degree line (which might not be exactly 45 degrees in the figure,
depending on the aspect ratio).
A sample that produces this line exhibits perfect equality.
The other line in the figure is the Lorenz curve for the lognormal sample, which deviates significantly from perfect equality.
For example, the bottom 80% of the population holds around 40% of total wealth.
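To make the construction concrete, here is a minimal NumPy sketch of how a Lorenz curve can be computed from a sample: sort the data, then pair each population share 𝑖/𝑛 with the share of total wealth held by the 𝑖 poorest households. The helper `lorenz_curve_np` is hypothetical and is not claimed to be how QuantEcon.py implements `qe.lorenz_curve`.

```python
import numpy as np

def lorenz_curve_np(y):
    """
    Hypothetical helper: return (f, l) where f[i] = i / n is the population
    share and l[i] is the share of total wealth held by the i poorest.
    """
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    f = np.arange(n + 1) / n
    l = np.concatenate(([0.0], np.cumsum(y))) / y.sum()
    return f, l

# Four households with wealth 4, 1, 3, 2 (total 10):
# f = 0, 0.25, 0.5, 0.75, 1 and l = 0, 0.1, 0.3, 0.6, 1
f, l = lorenz_curve_np([4, 1, 3, 2])
print(f)
print(l)
```

With equal wealth the two arrays coincide, which is the 45 degree "equality" line in the figure.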
Here is another example, which shows how the Lorenz curve shifts as the underlying distribution changes.
We generate 10,000 observations using the Pareto distribution with a range of parameters,
and then compute the Lorenz curve corresponding to each set of observations.

In [5]: a_vals = (1, 2, 5)  # Pareto tail index
n = 10_000  # size of each sample
fig, ax = plt.subplots()
for a in a_vals:
    u = np.random.uniform(size=n)
    y = u**(-1/a)  # distributed as Pareto with tail index a
    f_vals, l_vals = qe.lorenz_curve(y)
    ax.plot(f_vals, l_vals, label=f'$a = {a}$')
ax.plot(f_vals, f_vals, label='equality')
ax.legend()
plt.show()

You can see that, as the tail parameter of the Pareto distribution increases, inequality decreases.
This is to be expected, because a higher tail index implies less weight in the tail of the Pareto
distribution.

15.3.2 The Gini Coefficient

The definition and interpretation of the Gini coefficient can be found on the corresponding
Wikipedia page.
A value of 0 indicates perfect equality (corresponding to the case where the Lorenz curve matches the 45 degree line) and a value of 1 indicates complete inequality (all wealth held by the richest household).
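One common formula for the Gini coefficient of a sample 𝑦1, … , 𝑦𝑛 is the mean absolute difference over all ordered pairs divided by twice the mean, $G = \sum_i \sum_j |y_i - y_j| / (2 n^2 \bar{y})$. The sketch below implements it directly. `gini_np` is a hypothetical helper; QuantEcon.py may compute the coefficient with a different but equivalent expression. With this normalization a sample of 𝑛 households has a maximum Gini of 1 − 1/𝑛, which approaches 1 in large populations.

```python
import numpy as np

def gini_np(y):
    """
    Hypothetical helper: G = sum_i sum_j |y_i - y_j| / (2 n² mean(y)).
    Builds an n x n matrix, so it is only suitable for modest samples.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    abs_diffs = np.abs(y[:, None] - y[None, :])
    return abs_diffs.sum() / (2 * n**2 * y.mean())

print(gini_np([1, 1, 1, 1]))   # perfect equality: 0.0
print(gini_np([0, 0, 0, 1]))   # one of four holds everything: 0.75

# Weibull sample with a = 1 (exponential); theory gives 1 - 2**(-1) = 0.5
rng = np.random.default_rng(0)
print(gini_np(rng.weibull(1.0, size=2000)))
```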
The QuantEcon.py library contains a function to calculate the Gini coefficient.
We can test it on the Weibull distribution with parameter $a$, where the Gini coefficient is known to be

$$
G = 1 - 2^{-1/a}
$$

Let’s see if the Gini coefficient computed from a simulated sample matches this at each fixed
value of 𝑎.

In [6]: a_vals = range(1, 20)
ginis = []
ginis_theoretical = []
n = 100

fig, ax = plt.subplots()
for a in a_vals:
    y = np.random.weibull(a, size=n)
    ginis.append(qe.gini_coefficient(y))
    ginis_theoretical.append(1 - 2**(-1/a))
ax.plot(a_vals, ginis, label='estimated gini coefficient')
ax.plot(a_vals, ginis_theoretical, label='theoretical gini coefficient')
ax.legend()
ax.set_xlabel("Weibull parameter $a$")
ax.set_ylabel("Gini coefficient")
plt.show()

The simulation shows that the fit is good.

15.4 A Model of Wealth Dynamics

Having discussed inequality measures, let us now turn to wealth dynamics.


The model we will study is

$$
w_{t+1} = (1 + r_{t+1}) s(w_t) + y_{t+1} \tag{1}
$$

where
• $w_t$ is wealth at time $t$ for a given household,
• $r_t$ is the rate of return of financial assets,
• $y_t$ is current non-financial (e.g., labor) income and
• $s(w_t)$ is current wealth net of consumption.
Letting $\{z_t\}$ be a correlated state process of the form

$$
z_{t+1} = a z_t + b + \sigma_z \epsilon_{t+1}
$$



we’ll assume that

$$
R_t := 1 + r_t = c_r \exp(z_t) + \exp(\mu_r + \sigma_r \xi_t)
$$

and

$$
y_t = c_y \exp(z_t) + \exp(\mu_y + \sigma_y \zeta_t)
$$

Here $\{(\epsilon_t, \xi_t, \zeta_t)\}$ is IID and standard normal in $\mathbb R^3$.


The value of $c_r$ should be close to zero, since rates of return on assets do not exhibit large trends.
When we simulate a population of households, we will assume all shocks are idiosyncratic
(i.e., specific to individual households and independent across them).
Regarding the savings function $s$, our default model will be

$$
s(w) = s_0 w \cdot \mathbb 1\{w \geq \hat w\} \tag{2}
$$

where $s_0$ is a positive constant.


Thus, for $w < \hat w$, the household saves nothing. For $w \geq \hat w$, the household saves a fraction $s_0$ of their wealth.
We are using something akin to a fixed savings rate model, while acknowledging that low-wealth households tend to save very little.
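The vectorized NumPy sketch below illustrates (1) and (2) by advancing a small cross section of households by one period. It is not the Numba implementation used below. The function name `wealth_step` is hypothetical, and the parameter values are simply the defaults adopted in the implementation section. Unlike that implementation, this sketch draws $R_{t+1}$ for every household, including those below $\hat w$. The distribution of $w_{t+1}$ is the same, because their savings are zero.

```python
import numpy as np

def wealth_step(w, z, rng, w_hat=1.0, s_0=0.75, c_y=1.0, μ_y=1.0, σ_y=0.2,
                c_r=0.05, μ_r=0.1, σ_r=0.5, a=0.5, b=0.0, σ_z=0.1):
    """Advance arrays of wealth w and persistent state z by one period."""
    n = len(w)
    zp = a * z + b + σ_z * rng.standard_normal(n)                       # z_{t+1}
    R = c_r * np.exp(zp) + np.exp(μ_r + σ_r * rng.standard_normal(n))   # R_{t+1}
    y = c_y * np.exp(zp) + np.exp(μ_y + σ_y * rng.standard_normal(n))   # y_{t+1}
    savings = s_0 * w * (w >= w_hat)      # s(w) from (2): zero below w_hat
    return R * savings + y, zp            # (1): w_{t+1} = R_{t+1} s(w_t) + y_{t+1}

rng = np.random.default_rng(0)
w = np.array([0.5, 2.0, 10.0])   # the first household is below w_hat and saves nothing
z = np.zeros(3)
w_next, z_next = wealth_step(w, z, rng)
print(w_next)
```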

15.5 Implementation

Here’s some type information to help Numba.

In [7]: wealth_dynamics_data = [
    ('w_hat',  float64),    # savings parameter
    ('s_0',    float64),    # savings parameter
    ('c_y',    float64),    # labor income parameter
    ('μ_y',    float64),    # labor income parameter
    ('σ_y',    float64),    # labor income parameter
    ('c_r',    float64),    # rate of return parameter
    ('μ_r',    float64),    # rate of return parameter
    ('σ_r',    float64),    # rate of return parameter
    ('a',      float64),    # persistent state (z) parameter
    ('b',      float64),    # persistent state (z) parameter
    ('σ_z',    float64),    # persistent state (z) parameter
    ('z_mean', float64),    # mean of z process
    ('z_var',  float64),    # variance of z process
    ('y_mean', float64),    # mean of y process
    ('R_mean', float64)     # mean of R process
]

Here’s a class that stores instance data and implements methods that update the persistent state $z$ and household wealth.

In [8]: @jitclass(wealth_dynamics_data)
class WealthDynamics:

    def __init__(self,
                 w_hat=1.0,
                 s_0=0.75,
                 c_y=1.0,
                 μ_y=1.0,
                 σ_y=0.2,
                 c_r=0.05,
                 μ_r=0.1,
                 σ_r=0.5,
                 a=0.5,
                 b=0.0,
                 σ_z=0.1):

        self.w_hat, self.s_0 = w_hat, s_0
        self.c_y, self.μ_y, self.σ_y = c_y, μ_y, σ_y
        self.c_r, self.μ_r, self.σ_r = c_r, μ_r, σ_r
        self.a, self.b, self.σ_z = a, b, σ_z

        # Record stationary moments
        self.z_mean = b / (1 - a)
        self.z_var = σ_z**2 / (1 - a**2)
        exp_z_mean = np.exp(self.z_mean + self.z_var / 2)
        self.R_mean = c_r * exp_z_mean + np.exp(μ_r + σ_r**2 / 2)
        self.y_mean = c_y * exp_z_mean + np.exp(μ_y + σ_y**2 / 2)

        # Test a stability condition that ensures wealth does not diverge
        # to infinity.
        α = self.R_mean * self.s_0
        if α >= 1:
            raise ValueError("Stability condition failed.")

    def parameters(self):
        """
        Collect and return parameters.
        """
        parameters = (self.w_hat, self.s_0,
                      self.c_y, self.μ_y, self.σ_y,
                      self.c_r, self.μ_r, self.σ_r,
                      self.a, self.b, self.σ_z)
        return parameters

    def update_states(self, w, z):
        """
        Update one period, given current wealth w and persistent
        state z.
        """

        # Simplify names
        params = self.parameters()
        w_hat, s_0, c_y, μ_y, σ_y, c_r, μ_r, σ_r, a, b, σ_z = params
        zp = a * z + b + σ_z * np.random.randn()

        # Update wealth
        y = c_y * np.exp(zp) + np.exp(μ_y + σ_y * np.random.randn())
        wp = y
        if w >= w_hat:
            R = c_r * np.exp(zp) + np.exp(μ_r + σ_r * np.random.randn())
            wp += R * s_0 * w
        return wp, zp

/usr/share/miniconda3/envs/qe-lectures/lib/python3.7/site-packages/ipykernel_launcher.py:1: NumbaDeprecationWarning: The 'numba.jitclass' decorator has moved to 'numba.experimental.jitclass' to better reflect the experimental nature of the functionality. Please update your imports to accommodate this change and see http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#change-of-jitclass-location for the time frame.
  """Entry point for launching an IPython kernel.

Here’s a function to simulate the time series of wealth for an individual household.

In [9]: @njit
def wealth_time_series(wdy, w_0, n):
    """
    Generate a single time series of length n for wealth given
    initial value w_0.

    The initial persistent state z_0 for each household is drawn from
    the stationary distribution of the AR(1) process.

        * wdy: an instance of WealthDynamics
        * w_0: scalar
        * n: int

    """
    z = wdy.z_mean + np.sqrt(wdy.z_var) * np.random.randn()
    w = np.empty(n)
    w[0] = w_0
    for t in range(n-1):
        w[t+1], z = wdy.update_states(w[t], z)
    return w

Now here’s a function to simulate a cross section of households forward in time.


Note the use of parallelization to speed up computation.

In [10]: @njit(parallel=True)
def update_cross_section(wdy, w_distribution, shift_length=500):
    """
    Shifts a cross-section of households forward in time

    * wdy: an instance of WealthDynamics
    * w_distribution: array_like, represents current cross-section

    Takes a current distribution of wealth values as w_distribution
    and updates each w_t in w_distribution to w_{t+j}, where
    j = shift_length - 1 (the number of updates performed below).

    Returns the new distribution.

    """
    new_distribution = np.empty_like(w_distribution)

    # Update each household
    for i in prange(len(new_distribution)):
        z = wdy.z_mean + np.sqrt(wdy.z_var) * np.random.randn()
        w = w_distribution[i]
        for t in range(shift_length-1):
            w, z = wdy.update_states(w, z)
        new_distribution[i] = w
    return new_distribution

Parallelization is very effective in the function above because all shocks are idiosyncratic, so the time path of each household can be calculated independently of every other household.
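The WealthDynamics constructor records stationary moments using two standard facts:
- The AR(1) process $z_{t+1} = a z_t + b + \sigma_z \epsilon_{t+1}$ with $|a| < 1$ has stationary distribution $N(b/(1-a), \sigma_z^2/(1-a^2))$.
- If $X \sim N(\mu, \sigma^2)$, then $\mathbb E \exp(X) = \exp(\mu + \sigma^2/2)$.

The plain NumPy sketch below repeats that arithmetic at the default parameters and evaluates the stability condition $\alpha = s_0 \mathbb E R < 1$. As a rough sanity check of $\mathbb E \exp(z)$, it then simulates one long AR(1) path. The path length and seed are arbitrary.

```python
import numpy as np

# Default parameters of WealthDynamics
s_0, c_y, μ_y, σ_y = 0.75, 1.0, 1.0, 0.2
c_r, μ_r, σ_r = 0.05, 0.1, 0.5
a, b, σ_z = 0.5, 0.0, 0.1

z_mean = b / (1 - a)                      # stationary mean of z
z_var = σ_z**2 / (1 - a**2)               # stationary variance of z
exp_z_mean = np.exp(z_mean + z_var / 2)   # E exp(z), lognormal mean formula
R_mean = c_r * exp_z_mean + np.exp(μ_r + σ_r**2 / 2)
y_mean = c_y * exp_z_mean + np.exp(μ_y + σ_y**2 / 2)
α = R_mean * s_0
print(f"E[R] = {R_mean:.4f}, E[y] = {y_mean:.4f}, α = {α:.4f}")

# Rough Monte Carlo check of E exp(z) along one long AR(1) path
rng = np.random.default_rng(42)
T = 200_000
z = np.empty(T)
z[0] = z_mean
for t in range(T - 1):
    z[t + 1] = a * z[t] + b + σ_z * rng.standard_normal()
print(f"E exp(z): formula {exp_z_mean:.4f}, simulated {np.exp(z).mean():.4f}")
```

At these defaults $\alpha$ comes out just below 1, so the stability check in the constructor passes, but with little room to spare.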

15.6 Applications

Let’s try simulating the model at different parameter values and investigate the implications
for the wealth distribution.

15.6.1 Time Series

Let’s look at the wealth dynamics of an individual household.

In [11]: wdy = WealthDynamics()

ts_length = 200
w = wealth_time_series(wdy, wdy.y_mean, ts_length)

fig, ax = plt.subplots()
ax.plot(w)
plt.show()

Notice the large spikes in wealth over time.


Such spikes are similar to what we observed in time series when we studied Kesten processes.

15.6.2 Inequality Measures

Let’s look at how inequality varies with returns on financial assets.


The next function generates a cross section and then computes the Lorenz curve and Gini
coefficient.

In [12]: def generate_lorenz_and_gini(wdy, num_households=100_000, T=500):
    """
    Generate the Lorenz curve data and gini coefficient corresponding to a
    WealthDynamics model by simulating num_households forward to time T.
    """
    ψ_0 = np.ones(num_households) * wdy.y_mean
    z_0 = wdy.z_mean

    ψ_star = update_cross_section(wdy, ψ_0, shift_length=T)
    return qe.gini_coefficient(ψ_star), qe.lorenz_curve(ψ_star)

Now we investigate how the Lorenz curves associated with the wealth distribution change as the return on savings varies.
The code below plots Lorenz curves for three different values of 𝜇𝑟 .
If you are running this yourself, note that it will take one or two minutes to execute.
This is unavoidable because we are executing a CPU-intensive task.
In fact, the code, which is JIT compiled and parallelized, runs extremely fast relative to the number of computations.

In [13]: fig, ax = plt.subplots()
μ_r_vals = (0.0, 0.025, 0.05)
gini_vals = []

for μ_r in μ_r_vals:
    wdy = WealthDynamics(μ_r=μ_r)
    gv, (f_vals, l_vals) = generate_lorenz_and_gini(wdy)
    # raw f-string so that \psi and \mu are passed to LaTeX unchanged
    ax.plot(f_vals, l_vals, label=rf'$\psi^*$ at $\mu_r = {μ_r:0.2}$')
    gini_vals.append(gv)

ax.plot(f_vals, f_vals, label='equality')
ax.legend(loc="upper left")
plt.show()

The Lorenz curve shifts downwards as returns on financial assets rise, indicating a rise in inequality.
We will look at this again via the Gini coefficient immediately below, but first consider the
following image of our system resources when the code above is executing:

Notice how effectively Numba has implemented multithreading for this routine: all 8 CPUs
on our workstation are running at maximum capacity (even though four of them are virtual).
Since the code is both efficiently JIT compiled and fully parallelized, it’s close to impossible
to make this sequence of tasks run faster without changing hardware.
Now let’s check the Gini coefficient.

In [14]: fig, ax = plt.subplots()
ax.plot(μ_r_vals, gini_vals, label='gini coefficient')
ax.set_xlabel(r"$\mu_r$")
ax.legend()
plt.show()

Once again, we see that inequality increases as returns on financial assets rise.
Let’s finish this section by investigating what happens when we change the volatility term 𝜎𝑟
in financial returns.

In [15]: fig, ax = plt.subplots()
σ_r_vals = (0.35, 0.45, 0.52)
gini_vals = []

for σ_r in σ_r_vals:
    wdy = WealthDynamics(σ_r=σ_r)
    gv, (f_vals, l_vals) = generate_lorenz_and_gini(wdy)
    # raw f-string so that \psi and \sigma are passed to LaTeX unchanged
    ax.plot(f_vals, l_vals, label=rf'$\psi^*$ at $\sigma_r = {σ_r:0.2}$')
    gini_vals.append(gv)

ax.plot(f_vals, f_vals, label='equality')
ax.legend(loc="upper left")
plt.show()

We see that greater volatility has the effect of increasing inequality in this model.

15.7 Exercises

15.7.1 Exercise 1

For a wealth or income distribution with Pareto tail, a higher tail index suggests lower inequality.
Indeed, it is possible to prove that the Gini coefficient of the Pareto distribution with tail index $a$ is $1/(2a - 1)$.
To the extent that you can, confirm this by simulation.
In particular, generate a plot of the Gini coefficient against the tail index using both the theoretical value just given and the value computed from a sample via qe.gini_coefficient.
For the values of the tail index, use a_vals = np.linspace(1, 10, 25).
Use a sample of size 1,000 for each $a$ and the sampling method for generating Pareto draws employed in the discussion of Lorenz curves for the Pareto distribution.
To the extent that you can, interpret the monotone relationship between the Gini index and $a$.
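For reference, here is how the theoretical value follows from the Lorenz curve. We assume $a > 1$, so that mean wealth $\mathbb E Y = a/(a-1)$ is finite. We also assume the Pareto distribution with minimum value 1 produced by the sampling method above, whose quantile function is $(1 - p)^{-1/a}$.

$$
\begin{aligned}
L(F) &= \frac{1}{\mathbb E Y} \int_0^F (1 - p)^{-1/a} \, dp = 1 - (1 - F)^{1 - 1/a}, \\
G &= 1 - 2 \int_0^1 L(F) \, dF = 1 - 2 \left( 1 - \frac{1}{2 - 1/a} \right) = \frac{1}{2a - 1}.
\end{aligned}
$$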

15.7.2 Exercise 2

The wealth process (1) is similar to a Kesten process.


This is because, according to (2), the savings rate is constant for all wealth levels above $\hat w$.
When the savings rate is constant, the wealth process has the same quasi-linear structure as a Kesten process, with multiplicative and additive shocks.

The Kesten–Goldie theorem tells us that Kesten processes have Pareto tails under a range of
parameterizations.
The theorem does not directly apply here, since the savings rate is not always constant and since the multiplicative and additive terms in (1) are not IID.
At the same time, given the similarities, perhaps Pareto tails will arise.
To test this, run a simulation that generates a cross-section of wealth and generate a rank-size
plot.
If you like, you can use the function rank_size from the quantecon library (see its documentation).
In viewing the plot, remember that Pareto tails generate a straight line. Is this what you see?
For sample size and initial conditions, use

In [16]: num_households = 250_000


T = 500 # shift forward T periods
ψ_0 = np.ones(num_households) * wdy.y_mean # initial distribution
z_0 = wdy.z_mean
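Here is why Pareto tails produce a straight line. For a Pareto tail with index $a$, the number of observations above $x$ is roughly $n x^{-a}$. So the $r$-th largest observation satisfies $\log x \approx \text{const} - (1/a) \log r$, which is a line with slope $-1/a$ on a log-log plot. The hand-rolled sketch below checks this on synthetic Pareto draws with $a = 2$, an arbitrary choice. The name `rank_size_by_hand` is hypothetical, and we do not claim it reproduces `qe.rank_size` exactly.

```python
import numpy as np

def rank_size_by_hand(data, c=0.01):
    """Return ranks and sizes of the largest fraction c of observations."""
    sizes = np.sort(np.asarray(data))[::-1]    # largest first
    k = int(len(sizes) * c)
    ranks = np.arange(1, k + 1)
    return ranks, sizes[:k]

rng = np.random.default_rng(0)
a = 2.0
sample = rng.uniform(size=100_000) ** (-1 / a)   # Pareto draws, tail index a

ranks, sizes = rank_size_by_hand(sample, c=0.01)
slope, intercept = np.polyfit(np.log(ranks), np.log(sizes), 1)
print(f"fitted slope {slope:.3f}, theory {-1 / a:.3f}")
```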

15.8 Solutions

Here is one solution, which produces a good match between theory and simulation.

15.8.1 Exercise 1

In [17]: a_vals = np.linspace(1, 10, 25)  # Pareto tail index
ginis = np.empty_like(a_vals)

n = 1000  # size of each sample
fig, ax = plt.subplots()
for i, a in enumerate(a_vals):
    y = np.random.uniform(size=n)**(-1/a)
    ginis[i] = qe.gini_coefficient(y)
ax.plot(a_vals, ginis, label='sampled')
ax.plot(a_vals, 1/(2*a_vals - 1), label='theoretical')
ax.legend()
plt.show()

In general, for a Pareto distribution, a higher tail index implies less weight in the right hand
tail.
This means less extreme values for wealth and hence more equality.
More equality translates to a lower Gini index.

15.8.2 Exercise 2

First let’s generate the distribution:

In [18]: num_households = 250_000


T = 500 # how far to shift forward in time
ψ_0 = np.ones(num_households) * wdy.y_mean
z_0 = wdy.z_mean

ψ_star = update_cross_section(wdy, ψ_0, shift_length=T)

Now let’s see the rank-size plot:

In [19]: fig, ax = plt.subplots()

rank_data, size_data = qe.rank_size(ψ_star, c=0.001)


ax.loglog(rank_data, size_data, 'o', markersize=3.0, alpha=0.5)
ax.set_xlabel("log rank")
ax.set_ylabel("log size")

plt.show()
Chapter 16

A First Look at the Kalman Filter

16.1 Contents

• Overview 16.2
• The Basic Idea 16.3
• Convergence 16.4
• Implementation 16.5
• Exercises 16.6
• Solutions 16.7
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

16.2 Overview

This lecture provides a simple and intuitive introduction to the Kalman filter, for those who
either
• have heard of the Kalman filter but don’t know how it works, or
• know the Kalman filter equations, but don’t know where they come from
For additional (more advanced) reading on the Kalman filter, see
• [72], section 2.7
• [5]
The second reference presents a comprehensive treatment of the Kalman filter.
Required knowledge: Familiarity with matrix manipulations, multivariate normal distributions, covariance matrices, etc.
We’ll need the following imports:

In [2]: from scipy import linalg
import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
%matplotlib inline
from quantecon import Kalman, LinearStateSpace
from scipy.stats import norm
from scipy.integrate import quad
from numpy.random import multivariate_normal
from scipy.linalg import eigvals

/usr/share/miniconda3/envs/qe-lectures/lib/python3.7/site-packages/numba/np/ufunc/parallel.py:355: NumbaWarning: The TBB threading layer requires TBB version 2019.5 or later i.e., TBB_INTERFACE_VERSION >= 11005. Found TBB_INTERFACE_VERSION = 9107. The TBB threading layer is disabled.
  warnings.warn(problem)

16.3 The Basic Idea

The Kalman filter has many applications in economics, but for now let’s pretend that we are
rocket scientists.
A missile has been launched from country Y and our mission is to track it.
Let $x \in \mathbb R^2$ denote the current location of the missile, a pair indicating latitude-longitude coordinates on a map.
At the present moment in time, the precise location $x$ is unknown, but we do have some beliefs about $x$.
One way to summarize our knowledge is a point prediction $\hat x$.
• But what if the President wants to know the probability that the missile is currently over the Sea of Japan?
• Then it is better to summarize our initial beliefs with a bivariate probability density $p$
  – $\int_E p(x) \, dx$ indicates the probability that we attach to the missile being in region $E$.
The density $p$ is called our prior for the random variable $x$.
To keep things tractable in our example, we assume that our prior is Gaussian.
In particular, we take

$$
p = N(\hat x, \Sigma) \tag{1}
$$

where $\hat x$ is the mean of the distribution and $\Sigma$ is a $2 \times 2$ covariance matrix. In our simulations, we will suppose that

$$
\hat x = \begin{pmatrix} 0.2 \\ -0.2 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} 0.4 & 0.3 \\ 0.3 & 0.45 \end{pmatrix} \tag{2}
$$

This density $p(x)$ is shown below as a contour map, with the center of the red ellipse being equal to $\hat x$.

In [3]: # Set up the Gaussian prior density p
Σ = [[0.4, 0.3], [0.3, 0.45]]
Σ = np.matrix(Σ)
x_hat = np.matrix([0.2, -0.2]).T
# Define the matrices G and R from the equation y = G x + N(0, R)
G = [[1, 0], [0, 1]]
G = np.matrix(G)
R = 0.5 * Σ
# The matrices A and Q
A = [[1.2, 0], [0, -0.2]]
A = np.matrix(A)
Q = 0.3 * Σ
# The observed value of y
y = np.matrix([2.3, -1.9]).T

# Set up grid for plotting
x_grid = np.linspace(-1.5, 2.9, 100)
y_grid = np.linspace(-3.1, 1.7, 100)
X, Y = np.meshgrid(x_grid, y_grid)

def bivariate_normal(x, y, σ_x=1.0, σ_y=1.0, μ_x=0.0, μ_y=0.0, σ_xy=0.0):
    """
    Compute and return the probability density function of bivariate normal
    distribution of normal random variables x and y

    Parameters
    ----------
    x : array_like(float)
        Random variable

    y : array_like(float)
        Random variable

    σ_x : array_like(float)
        Standard deviation of random variable x

    σ_y : array_like(float)
        Standard deviation of random variable y

    μ_x : scalar(float)
        Mean value of random variable x

    μ_y : scalar(float)
        Mean value of random variable y

    σ_xy : array_like(float)
        Covariance of random variables x and y

    """

    x_μ = x - μ_x
    y_μ = y - μ_y

    ρ = σ_xy / (σ_x * σ_y)
    z = x_μ**2 / σ_x**2 + y_μ**2 / σ_y**2 - 2 * ρ * x_μ * y_μ / (σ_x * σ_y)
    denom = 2 * np.pi * σ_x * σ_y * np.sqrt(1 - ρ**2)
    return np.exp(-z / (2 * (1 - ρ**2))) / denom

def gen_gaussian_plot_vals(μ, C):
    "Z values for plotting the bivariate Gaussian N(μ, C)"
    m_x, m_y = float(μ[0]), float(μ[1])
    s_x, s_y = np.sqrt(C[0, 0]), np.sqrt(C[1, 1])
    s_xy = C[0, 1]
    return bivariate_normal(X, Y, s_x, s_y, m_x, m_y, s_xy)

# Plot the figure

fig, ax = plt.subplots(figsize=(10, 8))
ax.grid()

Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)

plt.show()
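The hand-written `bivariate_normal` above implements the closed-form bivariate Gaussian density. As a cross-check, the same density can be evaluated with `scipy.stats.multivariate_normal`. The sketch below restates the function as a standalone copy and compares the two at a few arbitrary points, using the prior moments from (2). Plain arrays replace `np.matrix` here.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bivariate_normal(x, y, σ_x=1.0, σ_y=1.0, μ_x=0.0, μ_y=0.0, σ_xy=0.0):
    # Same closed form as in the lecture code
    x_μ, y_μ = x - μ_x, y - μ_y
    ρ = σ_xy / (σ_x * σ_y)
    z = x_μ**2 / σ_x**2 + y_μ**2 / σ_y**2 - 2 * ρ * x_μ * y_μ / (σ_x * σ_y)
    denom = 2 * np.pi * σ_x * σ_y * np.sqrt(1 - ρ**2)
    return np.exp(-z / (2 * (1 - ρ**2))) / denom

x_hat = np.array([0.2, -0.2])
Σ = np.array([[0.4, 0.3], [0.3, 0.45]])

points = np.array([[0.0, 0.0], [1.0, -1.0], [0.5, 0.3]])
hand = bivariate_normal(points[:, 0], points[:, 1],
                        np.sqrt(Σ[0, 0]), np.sqrt(Σ[1, 1]),
                        x_hat[0], x_hat[1], Σ[0, 1])
scipy_vals = multivariate_normal(mean=x_hat, cov=Σ).pdf(points)
print(np.allclose(hand, scipy_vals))   # True
```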

16.3.1 The Filtering Step

We are now presented with some good news and some bad news.
The good news is that the missile has been located by our sensors, which report that the current location is $y = (2.3, -1.9)$.
The next figure shows the original prior $p(x)$ and the new reported location $y$.

In [4]: fig, ax = plt.subplots(figsize=(10, 8))


ax.grid()

Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")

plt.show()

The bad news is that our sensors are imprecise.


In particular, we should interpret the output of our sensor not as $y = x$, but rather as

$$
y = G x + v, \quad \text{where} \quad v \sim N(0, R) \tag{3}
$$

Here 𝐺 and 𝑅 are 2 × 2 matrices with 𝑅 positive definite. Both are assumed known, and the
noise term 𝑣 is assumed to be independent of 𝑥.
How then should we combine our prior $p(x) = N(\hat x, \Sigma)$ and this new information $y$ to improve our understanding of the location of the missile?
As you may have guessed, the answer is to use Bayes’ theorem, which tells us to update our prior $p(x)$ to $p(x \mid y)$ via

$$
p(x \mid y) = \frac{p(y \mid x) \, p(x)}{p(y)}
$$

where $p(y) = \int p(y \mid x) \, p(x) \, dx$.


In solving for $p(x \mid y)$, we observe that
• $p(x) = N(\hat x, \Sigma)$.
• In view of (3), the conditional density $p(y \mid x)$ is $N(Gx, R)$.
• $p(y)$ does not depend on $x$, and enters into the calculations only as a normalizing constant.
Because we are in a linear and Gaussian framework, the updated density can be computed by
calculating population linear regressions.

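In particular, the solution is known (see footnote [1] at the end of this chapter) to be

$$
p(x \mid y) = N(\hat x_F, \Sigma_F)
$$

where

$$
\hat x_F := \hat x + \Sigma G' (G \Sigma G' + R)^{-1} (y - G \hat x)
\quad \text{and} \quad
\Sigma_F := \Sigma - \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma \tag{4}
$$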

Here $\Sigma G' (G \Sigma G' + R)^{-1}$ is the matrix of population regression coefficients of the hidden object $x - \hat x$ on the surprise $y - G \hat x$.
This new density $p(x \mid y) = N(\hat x_F, \Sigma_F)$ is shown in the next figure via contour lines and the color map.
The original density is left in as contour lines for comparison.

In [5]: fig, ax = plt.subplots(figsize=(10, 8))
ax.grid()

Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
new_Z = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, new_Z, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)
ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")

plt.show()

Our new density twists the prior $p(x)$ in a direction determined by the new information $y - G \hat x$.
In generating the figure, we set $G$ to the identity matrix and $R = 0.5 \Sigma$ for $\Sigma$ defined in (2).
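The sketch below computes (4) with plain NumPy arrays, using `np.linalg.solve` rather than an explicit inverse. The function name `filter_step` is our own. With $G = I$ and $R = 0.5\Sigma$, the update has a simple closed form. Since $(G \Sigma G' + R)^{-1} = (1.5 \Sigma)^{-1}$, we get $\hat x_F = \hat x + \tfrac{2}{3}(y - \hat x)$ and $\Sigma_F = \Sigma / 3$, which the code confirms.

```python
import numpy as np

def filter_step(x_hat, Σ, y, G, R):
    """Filtering update (4): return the mean and covariance of p(x | y)."""
    S = G @ Σ @ G.T + R                    # covariance of the surprise y - G x_hat
    M = np.linalg.solve(S, G @ Σ).T        # Σ G' S^{-1}, using symmetry of S and Σ
    x_hat_F = x_hat + M @ (y - G @ x_hat)
    Σ_F = Σ - M @ G @ Σ
    return x_hat_F, Σ_F

Σ = np.array([[0.4, 0.3], [0.3, 0.45]])
x_hat = np.array([0.2, -0.2])
y = np.array([2.3, -1.9])
G = np.identity(2)
R = 0.5 * Σ

x_hat_F, Σ_F = filter_step(x_hat, Σ, y, G, R)
print(x_hat_F)                    # approximately [1.6, -1.3333]
print(np.allclose(Σ_F, Σ / 3))    # True
```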

16.3.2 The Forecast Step

What have we achieved so far?


We have obtained probabilities for the current location of the state (missile) given prior and
current information.
This is called “filtering” rather than forecasting because we are filtering out noise rather than looking into the future.
• $p(x \mid y) = N(\hat x_F, \Sigma_F)$ is called the filtering distribution
But now let’s suppose that we are given another task: to predict the location of the missile
after one unit of time (whatever that may be) has elapsed.
To do this we need a model of how the state evolves.
Let’s suppose that we have one, and that it’s linear and Gaussian. In particular,

$$
x_{t+1} = A x_t + w_{t+1}, \quad \text{where} \quad w_t \sim N(0, Q) \tag{5}
$$

Our aim is to combine this law of motion and our current distribution $p(x \mid y) = N(\hat x_F, \Sigma_F)$ to come up with a new predictive distribution for the location in one unit of time.

In view of (5), all we have to do is introduce a random vector $x_F \sim N(\hat x_F, \Sigma_F)$ and work out the distribution of $A x_F + w$ where $w$ is independent of $x_F$ and has distribution $N(0, Q)$.
Since linear combinations of Gaussians are Gaussian, $A x_F + w$ is Gaussian.
Elementary calculations and the expressions in (4) tell us that

$$
\mathbb E [A x_F + w] = A \mathbb E x_F + \mathbb E w = A \hat x_F = A \hat x + A \Sigma G' (G \Sigma G' + R)^{-1} (y - G \hat x)
$$

and

$$
\operatorname{Var} [A x_F + w] = A \operatorname{Var}[x_F] A' + Q = A \Sigma_F A' + Q = A \Sigma A' - A \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A' + Q
$$

The matrix $A \Sigma G' (G \Sigma G' + R)^{-1}$ is often written as $K_\Sigma$ and called the Kalman gain.
• The subscript $\Sigma$ has been added to remind us that $K_\Sigma$ depends on $\Sigma$, but not $y$ or $\hat x$.
Using this notation, we can summarize our results as follows.
Our updated prediction is the density $N(\hat x_{new}, \Sigma_{new})$ where

$$
\begin{aligned}
\hat x_{new} &:= A \hat x + K_\Sigma (y - G \hat x) \\
\Sigma_{new} &:= A \Sigma A' - K_\Sigma G \Sigma A' + Q
\end{aligned} \tag{6}
$$

• The density $p_{new}(x) = N(\hat x_{new}, \Sigma_{new})$ is called the predictive distribution
The predictive distribution is the new density shown in the following figure, where the update has used the parameters

$$
A = \begin{pmatrix} 1.2 & 0.0 \\ 0.0 & -0.2 \end{pmatrix}, \qquad Q = 0.3 \Sigma
$$

In [6]: fig, ax = plt.subplots(figsize=(10, 8))
ax.grid()

# Density 1
Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)

# Density 2
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
Z_F = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, Z_F, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)

# Density 3
new_x_hat = A * x_hat_F
new_Σ = A * Σ_F * A.T + Q
new_Z = gen_gaussian_plot_vals(new_x_hat, new_Σ)
cs3 = ax.contour(X, Y, new_Z, 6, colors="black")
ax.clabel(cs3, inline=1, fontsize=10)
ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")

plt.show()
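The sketch below computes the Kalman gain $K_\Sigma$ and the predictive moments in one step from (6). It then checks them against the two-step route used in the figure code: filter with (4), then compute $A \hat x_F$ and $A \Sigma_F A' + Q$. The function names are hypothetical. The values of $A$, $Q$, $G$, $R$, $\hat x$, $\Sigma$ and $y$ are those given in this section.

```python
import numpy as np

def kalman_gain(A, Σ, G, R):
    """K_Σ = A Σ G' (G Σ G' + R)^{-1}."""
    S = G @ Σ @ G.T + R
    return np.linalg.solve(S, G @ Σ @ A.T).T   # uses symmetry of S and Σ

def predict_from_prior(x_hat, Σ, y, A, G, Q, R):
    """Predictive moments (6), computed in one step from the prior."""
    K = kalman_gain(A, Σ, G, R)
    x_new = A @ x_hat + K @ (y - G @ x_hat)
    Σ_new = A @ Σ @ A.T - K @ G @ Σ @ A.T + Q
    return x_new, Σ_new

Σ = np.array([[0.4, 0.3], [0.3, 0.45]])
x_hat = np.array([0.2, -0.2])
y = np.array([2.3, -1.9])
G, R = np.identity(2), 0.5 * Σ
A, Q = np.array([[1.2, 0.0], [0.0, -0.2]]), 0.3 * Σ

x_new, Σ_new = predict_from_prior(x_hat, Σ, y, A, G, Q, R)

# Two-step route: filter with (4), then push through the law of motion (5)
M = Σ @ G.T @ np.linalg.inv(G @ Σ @ G.T + R)
x_F, Σ_F = x_hat + M @ (y - G @ x_hat), Σ - M @ G @ Σ
print(np.allclose(x_new, A @ x_F), np.allclose(Σ_new, A @ Σ_F @ A.T + Q))  # True True
```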

16.3.3 The Recursive Procedure

Let’s look back at what we’ve done.


We started the current period with a prior $p(x)$ for the location $x$ of the missile.
We then used the current measurement $y$ to update to $p(x \mid y)$.
Finally, we used the law of motion (5) for $\{x_t\}$ to update to $p_{new}(x)$.
If we now step into the next period, we are ready to go round again, taking $p_{new}(x)$ as the current prior.
Swapping notation $p_t(x)$ for $p(x)$ and $p_{t+1}(x)$ for $p_{new}(x)$, the full recursive procedure is:

1. Start the current period with prior $p_t(x) = N(\hat x_t, \Sigma_t)$.

2. Observe current measurement $y_t$.

3. Compute the filtering distribution $p_t(x \mid y) = N(\hat x_t^F, \Sigma_t^F)$ from $p_t(x)$ and $y_t$, applying Bayes rule and the conditional distribution (3).

4. Compute the predictive distribution $p_{t+1}(x) = N(\hat x_{t+1}, \Sigma_{t+1})$ from the filtering distribution and (5).

5. Increment $t$ by one and go to step 1.



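Repeating (6), the dynamics for $\hat x_t$ and $\Sigma_t$ are as follows

$$
\begin{aligned}
\hat x_{t+1} &= A \hat x_t + K_{\Sigma_t} (y_t - G \hat x_t) \\
\Sigma_{t+1} &= A \Sigma_t A' - K_{\Sigma_t} G \Sigma_t A' + Q
\end{aligned} \tag{7}
$$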

These are the standard dynamic equations for the Kalman filter (see, for example, [72], page
58).
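To see the recursion (7) at work, take the scalar setting of Exercise 1 below: $A = G = 1$, $Q = 0$, $R = 1$. Then $K_{\Sigma_t} = \Sigma_t / (\Sigma_t + 1)$, and the variance recursion reduces to $\Sigma_{t+1} = \Sigma_t / (\Sigma_t + 1)$. Starting from $\Sigma_0 = 1$, this gives $\Sigma_t = 1/(t + 1)$. The sketch below iterates (7) directly. The function name, $\theta = 10$, $\hat x_0 = 8$ and the seed are illustrative choices.

```python
import numpy as np

def kalman_recursion(x_hat_0, Σ_0, ys, A, G, Q, R):
    """Iterate (7) for scalar A, G, Q, R; return paths of x_hat_t and Σ_t."""
    x_hats, Σs = [x_hat_0], [Σ_0]
    x_hat, Σ = x_hat_0, Σ_0
    for y in ys:
        K = A * Σ * G / (G * Σ * G + R)           # Kalman gain K_Σ
        x_hat = A * x_hat + K * (y - G * x_hat)   # first line of (7)
        Σ = A * Σ * A - K * G * Σ * A + Q         # second line of (7)
        x_hats.append(x_hat)
        Σs.append(Σ)
    return np.array(x_hats), np.array(Σs)

rng = np.random.default_rng(7)
θ, T = 10.0, 50
ys = θ + rng.standard_normal(T)       # y_t = θ + v_t with v_t ~ N(0, 1)
x_hats, Σs = kalman_recursion(8.0, 1.0, ys, A=1.0, G=1.0, Q=0.0, R=1.0)

print(np.allclose(Σs, 1 / np.arange(1, T + 2)))   # True: Σ_t = 1/(t + 1)
print(x_hats[-1])                                  # typically close to θ = 10
```

In this special case the gain is $1/(t + 2)$, so $\hat x_T$ is just the average of the prior mean 8 and the $T$ observations.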

16.4 Convergence

The matrix $\Sigma_t$ is a measure of the uncertainty of our prediction $\hat x_t$ of $x_t$.
Apart from special cases, this uncertainty will never be fully resolved, regardless of how much time elapses.
One reason is that our prediction $\hat x_t$ is made based on information available at $t - 1$, not $t$.
Even if we know the precise value of $x_{t-1}$ (which we don’t), the transition equation (5) implies that $x_t = A x_{t-1} + w_t$.
Since the shock $w_t$ is not observable at $t - 1$, any time $t - 1$ prediction of $x_t$ will incur some error (unless $w_t$ is degenerate).
However, it is certainly possible that $\Sigma_t$ converges to a constant matrix as $t \to \infty$.
To study this topic, let’s expand the second equation in (7):

$$
\Sigma_{t+1} = A \Sigma_t A' - A \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t A' + Q \tag{8}
$$

This is a nonlinear difference equation in $\Sigma_t$.
A fixed point of (8) is a constant matrix $\Sigma$ such that

$$
\Sigma = A \Sigma A' - A \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A' + Q \tag{9}
$$

Equation (8) is known as a discrete-time Riccati difference equation.
Equation (9) is known as a discrete-time algebraic Riccati equation.
Conditions under which a fixed point exists and the sequence $\{\Sigma_t\}$ converges to it are discussed in [6] and [5], chapter 4.
A sufficient (but not necessary) condition is that all the eigenvalues $\lambda_i$ of $A$ satisfy $|\lambda_i| < 1$ (cf. e.g., [5], p. 77).
(This strong condition assures that the unconditional distribution of $x_t$ converges as $t \to +\infty$.)
In this case, for any initial choice of $\Sigma_0$ that is both non-negative and symmetric, the sequence $\{\Sigma_t\}$ in (8) converges to a non-negative symmetric matrix $\Sigma$ that solves (9).
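The convergence result can be illustrated numerically. The sketch below iterates the Riccati difference equation (8) from $\Sigma_0 = I$ and compares the limit with SciPy's `scipy.linalg.solve_discrete_are`. It uses the parameters of Exercise 3 below: $A$ has eigenvalues 0.9 and -0.1, $G = I$, $Q = 0.3I$ and $R = 0.5I$. SciPy's solver is written for the control-form equation $X = a'Xa - a'Xb(r + b'Xb)^{-1}b'Xa + q$, so we pass $a = A'$ and $b = G'$ to obtain (9). The limit should agree with the stationary prediction error variance printed in the solution to Exercise 3.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[0.5, 0.4], [0.6, 0.3]])
G = np.identity(2)
Q = 0.3 * np.identity(2)
R = 0.5 * np.identity(2)

def riccati_map(Σ):
    """Right-hand side of (8)."""
    S = G @ Σ @ G.T + R
    K = A @ Σ @ G.T @ np.linalg.inv(S)      # Kalman gain K_Σ
    return A @ Σ @ A.T - K @ G @ Σ @ A.T + Q

Σ = np.identity(2)                           # symmetric, non-negative starting point
for _ in range(200):
    Σ = riccati_map(Σ)

Σ_are = solve_discrete_are(A.T, G.T, Q, R)   # fixed point of (9)
print(np.round(Σ, 6))
print(np.allclose(Σ, Σ_are))                 # True
```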

16.5 Implementation

The class Kalman from the QuantEcon.py package implements the Kalman filter.
• Instance data consists of:
  – the moments $(\hat x_t, \Sigma_t)$ of the current prior, and
  – an instance of the LinearStateSpace class from QuantEcon.py.
The latter represents a linear state space model of the form

$$
x_{t+1} = A x_t + C w_{t+1}, \qquad y_t = G x_t + H v_t
$$

where the shocks $w_t$ and $v_t$ are IID standard normals.
To connect this with the notation of this lecture we set

$$
Q := C C' \quad \text{and} \quad R := H H'
$$

• The class Kalman from the QuantEcon.py package has a number of methods, some of which we will wait to use until we study more advanced applications in subsequent lectures.
• Methods pertinent for this lecture are:
  – prior_to_filtered, which updates $(\hat x_t, \Sigma_t)$ to $(\hat x_t^F, \Sigma_t^F)$
  – filtered_to_forecast, which updates the filtering distribution to the predictive distribution, which becomes the new prior $(\hat x_{t+1}, \Sigma_{t+1})$
  – update, which combines the last two methods
  – stationary_values, which computes the solution to (9) and the corresponding (stationary) Kalman gain
You can view the program on GitHub.
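The self-contained NumPy sketch below mirrors the division of labor described above without depending on QuantEcon.py. It defines two standalone functions named after the methods `prior_to_filtered` and `filtered_to_forecast`. These are our own simplified stand-ins, not the library's code. They take $C$ and $H$ as in LinearStateSpace and form $Q = CC'$ and $R = HH'$ internally. Composing them reproduces the one-step form (7). The observation `y` is arbitrary.

```python
import numpy as np

def prior_to_filtered(x_hat, Σ, y, G, H):
    """Filtering step (4), with R = H H'."""
    R = H @ H.T
    M = Σ @ G.T @ np.linalg.inv(G @ Σ @ G.T + R)
    return x_hat + M @ (y - G @ x_hat), Σ - M @ G @ Σ

def filtered_to_forecast(x_hat_F, Σ_F, A, C):
    """Forecast step: push the filtering distribution through (5), Q = C C'."""
    Q = C @ C.T
    return A @ x_hat_F, A @ Σ_F @ A.T + Q

# Exercise 3 values: C = sqrt(0.3) I and H = sqrt(0.5) I give Q = 0.3 I and R = 0.5 I
A = np.array([[0.5, 0.4], [0.6, 0.3]])
G = np.identity(2)
C = np.sqrt(0.3) * np.identity(2)
H = np.sqrt(0.5) * np.identity(2)
x_hat = np.array([8.0, 8.0])
Σ = np.array([[0.9, 0.3], [0.3, 0.9]])
y = np.array([1.0, -0.5])        # an arbitrary observation for illustration

x_F, Σ_F = prior_to_filtered(x_hat, Σ, y, G, H)
x_next, Σ_next = filtered_to_forecast(x_F, Σ_F, A, C)

# Compare with the one-step recursion (7)
K = A @ Σ @ G.T @ np.linalg.inv(G @ Σ @ G.T + H @ H.T)
print(np.allclose(x_next, A @ x_hat + K @ (y - G @ x_hat)))           # True
print(np.allclose(Σ_next, A @ Σ @ A.T - K @ G @ Σ @ A.T + C @ C.T))   # True
```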

16.6 Exercises

16.6.1 Exercise 1

Consider the following simple application of the Kalman filter, loosely based on [72], section
2.9.2.
Suppose that
• all variables are scalars
• the hidden state $\{x_t\}$ is in fact constant, equal to some $\theta \in \mathbb R$ unknown to the modeler
State dynamics are therefore given by (5) with $A = 1$, $Q = 0$ and $x_0 = \theta$.
The measurement equation is $y_t = \theta + v_t$ where $v_t$ is $N(0, 1)$ and IID.
The task of this exercise is to simulate the model and, using the code from kalman.py, plot the first five predictive densities $p_t(x) = N(\hat x_t, \Sigma_t)$.
As shown in [72], sections 2.9.1–2.9.2, these distributions asymptotically put all mass on the unknown value $\theta$.
In the simulation, take $\theta = 10$, $\hat x_0 = 8$ and $\Sigma_0 = 1$.
Your figure should, modulo randomness, look something like this:

16.6.2 Exercise 2

The preceding figure gives some support to the idea that probability mass converges to $\theta$.
To get a better idea, choose a small $\epsilon > 0$ and calculate

$$
z_t := 1 - \int_{\theta - \epsilon}^{\theta + \epsilon} p_t(x) \, dx
$$

for $t = 0, 1, 2, \ldots, T$.
Plot $z_t$ against $t$, setting $\epsilon = 0.1$ and $T = 600$.
Your figure should show the error declining erratically, something like this:

16.6.3 Exercise 3

As discussed above, if the shock sequence $\{w_t\}$ is not degenerate, then it is not in general possible to predict $x_t$ without error at time $t - 1$ (and this would be the case even if we could observe $x_{t-1}$).
Let’s now compare the prediction $\hat x_t$ made by the Kalman filter against a competitor who is allowed to observe $x_{t-1}$.
This competitor will use the conditional expectation $\mathbb E[x_t \mid x_{t-1}]$, which in this case is $A x_{t-1}$.
The conditional expectation is known to be the optimal prediction method in terms of minimizing mean squared error.
(More precisely, the minimizer of $\mathbb E \| x_t - g(x_{t-1}) \|^2$ with respect to $g$ is $g^*(x_{t-1}) := \mathbb E[x_t \mid x_{t-1}]$.)
Thus we are comparing the Kalman filter against a competitor who has more information (in the sense of being able to observe the latent state) and behaves optimally in terms of minimizing squared error.
Our horse race will be assessed in terms of squared error.
In particular, your task is to generate a graph plotting observations of both $\| x_t - A x_{t-1} \|^2$ and $\| x_t - \hat x_t \|^2$ against $t$ for $t = 1, \ldots, 50$.
For the parameters, set $G = I$, $R = 0.5 I$ and $Q = 0.3 I$, where $I$ is the $2 \times 2$ identity.
Set

$$
A = \begin{pmatrix} 0.5 & 0.4 \\ 0.6 & 0.3 \end{pmatrix}
$$

To initialize the prior density, set

$$
\Sigma_0 = \begin{pmatrix} 0.9 & 0.3 \\ 0.3 & 0.9 \end{pmatrix}
$$

and $\hat x_0 = (8, 8)$.
Finally, set $x_0 = (0, 0)$.
You should end up with a figure similar to the following (modulo randomness).

Observe how, after an initial learning period, the Kalman filter performs quite well, even relative to the competitor who predicts optimally with knowledge of the latent state.

16.6.4 Exercise 4

Try varying the coefficient 0.3 in 𝑄 = 0.3𝐼 up and down.


Observe how the diagonal values in the stationary solution Σ (see (9)) increase and decrease
in line with this coefficient.
The interpretation is that more randomness in the law of motion for $x_t$ causes more (permanent) uncertainty in prediction.
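A quick way to see this pattern is to compute the stationary $\Sigma$ for several coefficients $q$ in $Q = qI$ with SciPy's `solve_discrete_are`. The sketch below does this under the parameter values of Exercise 3. We pass $A'$ and $G'$ because the solver is written for the control form of the Riccati equation. The grid of values for $q$ is arbitrary.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[0.5, 0.4], [0.6, 0.3]])
G = np.identity(2)
R = 0.5 * np.identity(2)

for q in (0.1, 0.3, 0.5, 1.0):
    Σ_star = solve_discrete_are(A.T, G.T, q * np.identity(2), R)   # solves (9)
    print(f"q = {q}: diagonal of stationary Σ = {np.diag(Σ_star).round(4)}")
```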

16.7 Solutions

16.7.1 Exercise 1

In [7]: # Parameters
θ = 10  # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)

# Set prior, initialize kalman filter
x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)

# Draw observations of y from state space model
N = 5
x, y = ss.simulate(N)
y = y.flatten()

# Set up plot
fig, ax = plt.subplots(figsize=(10,8))
xgrid = np.linspace(θ - 5, θ + 2, 200)

for i in range(N):
    # Record the current predicted mean and variance
    m, v = [float(z) for z in (kalman.x_hat, kalman.Sigma)]
    # Plot, update filter
    ax.plot(xgrid, norm.pdf(xgrid, loc=m, scale=np.sqrt(v)), label=f'$t={i}$')
    kalman.update(y[i])

ax.set_title(f'First {N} densities when $\\theta = {θ:.1f}$')
ax.legend(loc='upper left')
plt.show()

16.7.2 Exercise 2

In [8]: ϵ = 0.1
θ = 10  # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)

x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)

T = 600
z = np.empty(T)
x, y = ss.simulate(T)
y = y.flatten()

for t in range(T):
    # Record the current predicted mean and variance, then compute the
    # probability mass of p_t outside (θ - ϵ, θ + ϵ)
    m, v = [float(temp) for temp in (kalman.x_hat, kalman.Sigma)]

    f = lambda x: norm.pdf(x, loc=m, scale=np.sqrt(v))
    integral, error = quad(f, θ - ϵ, θ + ϵ)
    z[t] = 1 - integral

    kalman.update(y[t])

fig, ax = plt.subplots(figsize=(9, 7))
ax.set_ylim(0, 1)
ax.set_xlim(0, T)
ax.plot(range(T), z)
ax.fill_between(range(T), np.zeros(T), z, color="blue", alpha=0.2)
plt.show()
16.7. SOLUTIONS 287
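Each predictive density in Exercise 2 is normal, so the quantity computed with `quad` also has a closed form. With predictive mean 𝑚 and variance 𝑣, the mass outside [𝜃 − 𝜖, 𝜃 + 𝜖] is 1 − [Φ((𝜃 + 𝜖 − 𝑚)/√𝑣) − Φ((𝜃 − 𝜖 − 𝑚)/√𝑣)], where Φ is the standard normal CDF. The standard-library sketch below checks that expression against a composite Simpson's rule, using the prior 𝑚 = 8, 𝑣 = 1 together with 𝜃 = 10 and 𝜖 = 0.1. All helper names are illustrative.

```python
import math

def normal_pdf(x, m, v):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def normal_cdf(x, m, v):
    # Φ((x - m) / σ) written with the error function
    return 0.5 * (1 + math.erf((x - m) / math.sqrt(2 * v)))

def mass_outside(m, v, θ, ϵ):
    """Closed form for 1 - ∫_{θ-ϵ}^{θ+ϵ} N(x; m, v) dx."""
    return 1 - (normal_cdf(θ + ϵ, m, v) - normal_cdf(θ - ϵ, m, v))

def mass_outside_simpson(m, v, θ, ϵ, n=200):
    """Same quantity via composite Simpson's rule (n must be even)."""
    a, b = θ - ϵ, θ + ϵ
    h = (b - a) / n
    s = normal_pdf(a, m, v) + normal_pdf(b, m, v)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * normal_pdf(a + k * h, m, v)
    return 1 - s * h / 3

print(mass_outside(8, 1, 10, 0.1))
print(mass_outside_simpson(8, 1, 10, 0.1))
```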

16.7.3 Exercise 3

In [9]: import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import eigvals
from quantecon import Kalman, LinearStateSpace

# Define A, C, G, H
G = np.identity(2)
H = np.sqrt(0.5) * np.identity(2)

A = np.array([[0.5, 0.4],
              [0.6, 0.3]])
C = np.sqrt(0.3) * np.identity(2)

# Set up state space model, initial value x_0 set to zero
ss = LinearStateSpace(A, C, G, H, mu_0=np.zeros(2))

# Define the prior density
Σ = [[0.9, 0.3],
     [0.3, 0.9]]
Σ = np.array(Σ)
x_hat = np.array([8, 8])

# Initialize the Kalman filter
kn = Kalman(ss, x_hat, Σ)

# Print eigenvalues of A
print("Eigenvalues of A:")
print(eigvals(A))

# Print stationary Σ
S, K = kn.stationary_values()
print("Stationary prediction error variance:")
print(S)

# Generate the plot
T = 50
x, y = ss.simulate(T)

e1 = np.empty(T-1)
e2 = np.empty(T-1)

for t in range(1, T):
    kn.update(y[:, t])
    # Squared error of the Kalman forecast and of the conditional expectation A x_{t-1}
    e1[t-1] = np.sum((x[:, t] - kn.x_hat.flatten())**2)
    e2[t-1] = np.sum((x[:, t] - A @ x[:, t-1])**2)

fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(range(1, T), e1, 'k-', lw=2, alpha=0.6,
        label='Kalman filter error')
ax.plot(range(1, T), e2, 'g-', lw=2, alpha=0.6,
        label='Conditional expectation error')
ax.legend()
plt.show()

Eigenvalues of A:
[ 0.9+0.j -0.1+0.j]
Stationary prediction error variance:
[[0.40329108 0.1050718 ]
 [0.1050718  0.41061709]]

Footnotes
[1] See, for example, page 93 of [16]. To get from his expressions to the ones used above, you
will also need to apply the Woodbury matrix identity.
Chapter 17

Shortest Paths

17.1 Contents

• Overview 17.2
• Outline of the Problem 17.3
• Finding Least-Cost Paths 17.4
• Solving for Minimum Cost-to-Go 17.5
• Exercises 17.6
• Solutions 17.7

17.2 Overview

The shortest path problem is a classic problem in mathematics and computer science with
applications in
• Economics (sequential decision making, analysis of social networks, etc.)
• Operations research and transportation
• Robotics and artificial intelligence
• Telecommunication network design and routing
• etc., etc.
Variations of the methods we discuss in this lecture are used millions of times every day, in
applications such as
• Google Maps
• routing packets on the internet
For us, the shortest path problem also provides a nice introduction to the logic of dynamic
programming.
Dynamic programming is an extremely powerful optimization technique that we apply in
many lectures on this site.
The only scientific library we’ll need in what follows is NumPy:

In [1]: import numpy as np


17.3 Outline of the Problem

The shortest path problem is one of finding how to traverse a graph from one specified node
to another at minimum cost.
Consider the following graph.

We wish to travel from node (vertex) A to node G at minimum cost.

• Arrows (edges) indicate the movements we can take.
• Numbers on edges indicate the cost of traveling that edge.
(Graphs such as the one above are called weighted directed graphs.)
Possible interpretations of the graph include
• Minimum cost for a supplier to reach a destination.
• Routing of packets on the internet (minimize time).
• Etc., etc.
For this simple graph, a quick scan of the edges shows that the optimal paths are
• A, C, F, G at cost 8
• A, D, F, G at cost 8
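For a graph this small, the claim can also be checked by brute force: enumerate every path from A to G and add up its edge costs. The sketch below stores the graph as a dictionary of outgoing edges. That representation is chosen only for illustration, and the costs are read off the distance matrix 𝑄 given later in this lecture. Brute force is feasible here only because the graph is tiny and acyclic.

```python
# Outgoing edges of the example graph: edges[v][w] is the cost c(v, w)
edges = {
    'A': {'B': 1, 'C': 5, 'D': 3},
    'B': {'D': 9, 'E': 6},
    'C': {'F': 2},
    'D': {'F': 4, 'G': 8},
    'E': {'G': 4},
    'F': {'G': 1},
    'G': {},
}

def all_paths(node, target):
    """Yield every path from node to target (valid because the graph is acyclic)."""
    if node == target:
        yield [node]
        return
    for w in edges[node]:
        for rest in all_paths(w, target):
            yield [node] + rest

def path_cost(path):
    """Sum of edge costs along a path."""
    return sum(edges[v][w] for v, w in zip(path, path[1:]))

costs = {''.join(p): path_cost(p) for p in all_paths('A', 'G')}
best = min(costs.values())
print(costs)
print([p for p, c in costs.items() if c == best], best)
```

With these edge costs there are six paths from A to G. The minimum cost of 8 is attained by ACFG and ADFG.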

17.4 Finding Least-Cost Paths

For large graphs, we need a systematic solution.


Let 𝐽 (𝑣) denote the minimum cost-to-go from node 𝑣, understood as the total cost from 𝑣 if
we take the best route.
Suppose that we know 𝐽(𝑣) for each node 𝑣, as shown below for the graph from the preceding example (at nodes A through G the values are 8, 10, 3, 5, 4, 1, 0).

Note that 𝐽 (𝐺) = 0.


The best path can now be found as follows

1. Start at node 𝑣 = 𝐴

2. From current node 𝑣, move to any node that solves

$$\min_{w \in F_v} \{c(v, w) + J(w)\} \tag{1}$$

where
• 𝐹𝑣 is the set of nodes that can be reached from 𝑣 in one step.
• 𝑐(𝑣, 𝑤) is the cost of traveling from 𝑣 to 𝑤.
Hence, if we know the function 𝐽 , then finding the best path is almost trivial.
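Here is a minimal sketch of this forward pass. It assumes the cost-to-go values 𝐽(𝐴), …, 𝐽(𝐺) = 8, 10, 3, 5, 4, 1, 0, which are the values the iterative algorithm below computes for this graph, and the same edge costs as before. Ties are broken by taking the first minimizer in the order the successors are listed. `best_path` is a hypothetical helper name.

```python
# Cost-to-go values (assumed, see lead-in) and edge costs c(v, w)
J = {'A': 8, 'B': 10, 'C': 3, 'D': 5, 'E': 4, 'F': 1, 'G': 0}
edges = {
    'A': {'B': 1, 'C': 5, 'D': 3},
    'B': {'D': 9, 'E': 6},
    'C': {'F': 2},
    'D': {'F': 4, 'G': 8},
    'E': {'G': 4},
    'F': {'G': 1},
}

def best_path(start, target):
    """Repeatedly move to a w in F_v minimizing c(v, w) + J(w), as in (1)."""
    path, total, v = [start], 0, start
    while v != target:
        w = min(edges[v], key=lambda node: edges[v][node] + J[node])
        total += edges[v][w]
        path.append(w)
        v = w
    return path, total

print(best_path('A', 'G'))
```

At A, both C and D attain 𝑐(𝑣, 𝑤) + 𝐽(𝑤) = 8. Python's `min` keeps the first one, so this sketch returns the path A, C, F, G.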
But how can we find the cost-to-go function 𝐽 ?
Some thought will convince you that, for every node 𝑣, the function 𝐽 satisfies

$$J(v) = \min_{w \in F_v} \{c(v, w) + J(w)\} \tag{2}$$

This is known as the Bellman equation, after the mathematician Richard Bellman.
The Bellman equation can be thought of as a restriction that 𝐽 must satisfy.
What we want to do now is use this restriction to compute 𝐽 .

17.5 Solving for Minimum Cost-to-Go

Let’s look at an algorithm for computing 𝐽 and then think about how to implement it.

17.5.1 The Algorithm

The standard algorithm for finding 𝐽 is to start with an initial guess and then iterate.
This is a standard approach to solving nonlinear equations, often called the method of successive approximations.
Our initial guess will be

$$J_0(v) = 0 \text{ for all } v \tag{3}$$

Now

1. Set 𝑛 = 0

2. Set $J_{n+1}(v) = \min_{w \in F_v} \{c(v, w) + J_n(w)\}$ for all $v$

3. If $J_{n+1}$ and $J_n$ are not equal, then increment $n$ and go to 2

This sequence converges to 𝐽.

Although we omit the proof, we’ll prove similar claims in our other lectures on dynamic programming.
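Step 2 can also be written without explicit loops. With the distance matrix 𝑄 introduced below, `Q + J` adds 𝐽𝑛(𝑤) to every entry in column 𝑤 through NumPy broadcasting. A row-wise minimum then evaluates min over 𝑤 ∈ 𝐹𝑣 of {𝑄(𝑣, 𝑤) + 𝐽𝑛(𝑤)} for every 𝑣 at once. The sketch below applies this idea to the seven-node example. It is an alternative to the loop-based implementation that follows, not the lecture's own code.

```python
import numpy as np
from numpy import inf

# Distance matrix of the example graph (nodes A-G numbered 0-6)
Q = np.array([[inf, 1, 5, 3, inf, inf, inf],
              [inf, inf, inf, 9, 6, inf, inf],
              [inf, inf, inf, inf, inf, 2, inf],
              [inf, inf, inf, inf, inf, 4, 8],
              [inf, inf, inf, inf, inf, inf, 4],
              [inf, inf, inf, inf, inf, inf, 1],
              [inf, inf, inf, inf, inf, inf, 0]])

J = np.zeros(len(Q))               # J_0(v) = 0 for all v
for n in range(500):
    # Q + J adds J[w] to column w; the minimum over axis 1 minimizes over w
    next_J = np.min(Q + J, axis=1)
    if np.array_equal(next_J, J):  # step 3: stop once J_{n+1} = J_n
        break
    J = next_J

print(n, J)
```

For this graph, the iterates are (1, 6, 2, 4, 4, 1, 0), then (7, 10, 3, 5, 4, 1, 0), then (8, 10, 3, 5, 4, 1, 0). The last of these is a fixed point.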

17.5.2 Implementation

Having an algorithm is a good start, but we also need to think about how to implement it on
a computer.
First, for the cost function 𝑐, we’ll implement it as a matrix 𝑄, where a typical element is

$$Q(v, w) = \begin{cases} c(v, w) & \text{if } w \in F_v \\ +\infty & \text{otherwise} \end{cases}$$

In this context 𝑄 is usually called the distance matrix.


We’re also numbering the nodes now, with 𝐴 = 0, so, for example

𝑄(1, 2) = the cost of traveling from B to C

For example, for the simple graph above, we set

In [2]: from numpy import inf

Q = np.array([[inf, 1, 5, 3, inf, inf, inf],
              [inf, inf, inf, 9, 6, inf, inf],
              [inf, inf, inf, inf, inf, 2, inf],
              [inf, inf, inf, inf, inf, 4, 8],
              [inf, inf, inf, inf, inf, inf, 4],
              [inf, inf, inf, inf, inf, inf, 1],
              [inf, inf, inf, inf, inf, inf, 0]])

Notice that the cost of staying still (on the principal diagonal) is set to

• np.inf for non-destination nodes: moving on is required.
• 0 for the destination node: here is where we stop.
For the sequence of approximations {𝐽𝑛} of the cost-to-go functions, we can use NumPy arrays.
Let’s try with this example and see how we go:

In [3]: nodes = range(7)                          # Nodes = 0, 1, ..., 6
J = np.zeros_like(nodes, dtype=int)       # Initial guess
next_J = np.empty_like(nodes, dtype=int)  # Stores updated guess

max_iter = 500
i = 0

while i < max_iter:
    for v in nodes:
        # Minimize Q[v, w] + J[w] over all choices of w
        lowest_cost = inf
        for w in nodes:
            cost = Q[v, w] + J[w]
            if cost < lowest_cost:
                lowest_cost = cost
        next_J[v] = lowest_cost
    if np.equal(next_J, J).all():
        break
    else:
        J[:] = next_J  # Copy contents of next_J to J
        i += 1

print("The cost-to-go function is", J)

The cost-to-go function is [ 8 10  3  5  4  1  0]

This matches the numbers we obtained by inspection above.


But, importantly, we now have a methodology for tackling large graphs.

17.6 Exercises

17.6.1 Exercise 1

The text below describes a weighted directed graph.


The line node0, node1 0.04, node8 11.11, node14 72.21 means that from node0 we can go
to
• node1 at cost 0.04
• node8 at cost 11.11
• node14 at cost 72.21
No other nodes can be reached directly from node0.
Other lines have a similar interpretation.
Your task is to use the algorithm given above to find the optimal path and its cost.
Note: You will be dealing with floating point numbers now, rather than integers, so consider
replacing np.equal() with np.allclose().
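To make the format concrete, one line can be split into its source node and a dictionary of successor costs. The helper `parse_line` below is hypothetical and only illustrates the format. The solution below builds a full distance matrix instead.

```python
def parse_line(line):
    """Split 'nodeX, nodeY cost, ...' into (X, {Y: cost, ...})."""
    source, *entries = line.split(',')
    node = int(source.strip()[4:])          # 'node0' -> 0
    successors = {}
    for entry in entries:
        if entry.strip():                   # the destination line 'node99,' has no entries
            name, cost = entry.split()
            successors[int(name[4:])] = float(cost)
    return node, successors

print(parse_line("node0, node1 0.04, node8 11.11, node14 72.21"))
print(parse_line("node99,"))
```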

In [4]: %%file graph.txt
node0, node1 0.04, node8 11.11, node14 72.21
node1, node46 1247.25, node6 20.59, node13 64.94
node2, node66 54.18, node31 166.80, node45 1561.45
node3, node20 133.65, node6 2.06, node11 42.43
node4, node75 3706.67, node5 0.73, node7 1.02
node5, node45 1382.97, node7 3.33, node11 34.54
node6, node31 63.17, node9 0.72, node10 13.10
node7, node50 478.14, node9 3.15, node10 5.85
node8, node69 577.91, node11 7.45, node12 3.18
node9, node70 2454.28, node13 4.42, node20 16.53
node10, node89 5352.79, node12 1.87, node16 25.16
node11, node94 4961.32, node18 37.55, node20 65.08
node12, node84 3914.62, node24 34.32, node28 170.04
node13, node60 2135.95, node38 236.33, node40 475.33
node14, node67 1878.96, node16 2.70, node24 38.65
node15, node91 3597.11, node17 1.01, node18 2.57
node16, node36 392.92, node19 3.49, node38 278.71
node17, node76 783.29, node22 24.78, node23 26.45
node18, node91 3363.17, node23 16.23, node28 55.84
node19, node26 20.09, node20 0.24, node28 70.54
node20, node98 3523.33, node24 9.81, node33 145.80
node21, node56 626.04, node28 36.65, node31 27.06
node22, node72 1447.22, node39 136.32, node40 124.22
node23, node52 336.73, node26 2.66, node33 22.37
node24, node66 875.19, node26 1.80, node28 14.25
node25, node70 1343.63, node32 36.58, node35 45.55
node26, node47 135.78, node27 0.01, node42 122.00
node27, node65 480.55, node35 48.10, node43 246.24
node28, node82 2538.18, node34 21.79, node36 15.52
node29, node64 635.52, node32 4.22, node33 12.61
node30, node98 2616.03, node33 5.61, node35 13.95
node31, node98 3350.98, node36 20.44, node44 125.88
node32, node97 2613.92, node34 3.33, node35 1.46
node33, node81 1854.73, node41 3.23, node47 111.54
node34, node73 1075.38, node42 51.52, node48 129.45
node35, node52 17.57, node41 2.09, node50 78.81
node36, node71 1171.60, node54 101.08, node57 260.46
node37, node75 269.97, node38 0.36, node46 80.49
node38, node93 2767.85, node40 1.79, node42 8.78
node39, node50 39.88, node40 0.95, node41 1.34
node40, node75 548.68, node47 28.57, node54 53.46
node41, node53 18.23, node46 0.28, node54 162.24
node42, node59 141.86, node47 10.08, node72 437.49
node43, node98 2984.83, node54 95.06, node60 116.23
node44, node91 807.39, node46 1.56, node47 2.14
node45, node58 79.93, node47 3.68, node49 15.51
node46, node52 22.68, node57 27.50, node67 65.48
node47, node50 2.82, node56 49.31, node61 172.64
node48, node99 2564.12, node59 34.52, node60 66.44
node49, node78 53.79, node50 0.51, node56 10.89
node50, node85 251.76, node53 1.38, node55 20.10
node51, node98 2110.67, node59 23.67, node60 73.79
node52, node94 1471.80, node64 102.41, node66 123.03
node53, node72 22.85, node56 4.33, node67 88.35
node54, node88 967.59, node59 24.30, node73 238.61
node55, node84 86.09, node57 2.13, node64 60.80
node56, node76 197.03, node57 0.02, node61 11.06
node57, node86 701.09, node58 0.46, node60 7.01
node58, node83 556.70, node64 29.85, node65 34.32
node59, node90 820.66, node60 0.72, node71 0.67
node60, node76 48.03, node65 4.76, node67 1.63
node61, node98 1057.59, node63 0.95, node64 4.88
node62, node91 132.23, node64 2.94, node76 38.43
node63, node66 4.43, node72 70.08, node75 56.34
node64, node80 47.73, node65 0.30, node76 11.98
node65, node94 594.93, node66 0.64, node73 33.23
node66, node98 395.63, node68 2.66, node73 37.53
node67, node82 153.53, node68 0.09, node70 0.98
node68, node94 232.10, node70 3.35, node71 1.66
node69, node99 247.80, node70 0.06, node73 8.99
node70, node76 27.18, node72 1.50, node73 8.37
node71, node89 104.50, node74 8.86, node91 284.64
node72, node76 15.32, node84 102.77, node92 133.06
node73, node83 52.22, node76 1.40, node90 243.00
node74, node81 1.07, node76 0.52, node78 8.08
node75, node92 68.53, node76 0.81, node77 1.19
node76, node85 13.18, node77 0.45, node78 2.36
node77, node80 8.94, node78 0.98, node86 64.32
node78, node98 355.90, node81 2.59
node79, node81 0.09, node85 1.45, node91 22.35
node80, node92 121.87, node88 28.78, node98 264.34
node81, node94 99.78, node89 39.52, node92 99.89
node82, node91 47.44, node88 28.05, node93 11.99
node83, node94 114.95, node86 8.75, node88 5.78
node84, node89 19.14, node94 30.41, node98 121.05
node85, node97 94.51, node87 2.66, node89 4.90
node86, node97 85.09
node87, node88 0.21, node91 11.14, node92 21.23
node88, node93 1.31, node91 6.83, node98 6.12
node89, node97 36.97, node99 82.12
node90, node96 23.53, node94 10.47, node99 50.99
node91, node97 22.17
node92, node96 10.83, node97 11.24, node99 34.68
node93, node94 0.19, node97 6.71, node99 32.77
node94, node98 5.91, node96 2.03
node95, node98 6.17, node99 0.27
node96, node98 3.32, node97 0.43, node99 5.87
node97, node98 0.30
node98, node99 0.33
node99,

Writing graph.txt

17.7 Solutions

17.7.1 Exercise 1

First let’s write a function that reads in the graph data above and builds a distance matrix.

In [5]: num_nodes = 100
destination_node = 99

def map_graph_to_distance_matrix(in_file):

    # First let's set up the distance matrix Q with inf everywhere
    Q = np.full((num_nodes, num_nodes), np.inf)

    # Now we read in the data and modify Q
    with open(in_file) as infile:
        for line in infile:
            elements = line.split(',')
            node = elements.pop(0)
            node = int(node[4:])  # convert node description to integer
            if node != destination_node:
                for element in elements:
                    destination, cost = element.split()
                    destination = int(destination[4:])
                    Q[node, destination] = float(cost)

    # Staying put at the destination costs nothing
    Q[destination_node, destination_node] = 0
    return Q

In addition, let’s write

1. a “Bellman operator” function that takes a distance matrix and current guess of J and
returns an updated guess of J, and

2. a function that takes a distance matrix and returns a cost-to-go function.

We’ll use the algorithm described above.


For each node 𝑣, the minimization over 𝑤 is vectorized with NumPy to make it faster.

In [6]: def bellman(J, Q):
    num_nodes = Q.shape[0]
    next_J = np.empty_like(J)
    for v in range(num_nodes):
        next_J[v] = np.min(Q[v, :] + J)
    return next_J

def compute_cost_to_go(Q):
    J = np.zeros(num_nodes)       # Initial guess
    next_J = np.empty(num_nodes)  # Stores updated guess
    max_iter = 500
    i = 0

    while i < max_iter:
        next_J = bellman(J, Q)
        if np.allclose(next_J, J):
            break
        else:
            J[:] = next_J  # Copy contents of next_J to J
            i += 1

    return J

We used np.allclose() rather than testing exact equality because we are dealing with floating
point numbers now.
Finally, here’s a function that uses the cost-to-go function to obtain the optimal path (and its
cost).

In [7]: def print_best_path(J, Q):
    sum_costs = 0
    current_node = 0
    while current_node != destination_node:
        print(current_node)
        # Move to the next node and increment costs
        next_node = np.argmin(Q[current_node, :] + J)
        sum_costs += Q[current_node, next_node]
        current_node = next_node

    print(destination_node)
    print('Cost: ', sum_costs)

Now that we have the necessary functions, let’s call them to do the job we were assigned.

In [8]: Q = map_graph_to_distance_matrix('graph.txt')
J = compute_cost_to_go(Q)
print_best_path(J, Q)

0
8
11
18
23
33
41
53
56
57
60
67
70
73
76
85
87
88
93
94
96
97
98
99
Cost: 160.55000000000007
Chapter 18

Cass-Koopmans Planning Problem

18.1 Contents

• Overview 18.2
• The Model 18.3
• Planning Problem 18.4
• Shooting Algorithm 18.5
• Setting Initial Capital to Steady State Capital 18.6
• A Turnpike Property 18.7
• A Limiting Economy 18.8
• Concluding Remarks 18.9

18.2 Overview

This lecture and Cass-Koopmans Competitive Equilibrium describe a model that Tjalling Koopmans [66] and David Cass [22] used to analyze optimal growth.
The model can be viewed as an extension of the model of Robert Solow described in an earlier lecture, adapted to make the saving rate the outcome of an optimal choice.
(Solow assumed a constant saving rate determined outside the model.)
We describe two versions of the model, one in this lecture and the other in Cass-Koopmans
Competitive Equilibrium.
Together, the two lectures illustrate what is, in fact, a more general connection between a
planned economy and a decentralized economy organized as a competitive equilibrium.
This lecture is devoted to the planned economy version.
The lecture uses important ideas including
• A min-max problem for solving a planning problem.
• A shooting algorithm for solving difference equations subject to initial and terminal
conditions.
• A turnpike property that describes optimal paths for long but finite-horizon
economies.
Let’s start with some standard imports:


In [1]: from numba import njit, float64
from numba.experimental import jitclass
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

18.3 The Model

Time is discrete and takes values 𝑡 = 0, 1, … , 𝑇 where 𝑇 is finite.


(We’ll study a limiting case in which 𝑇 = +∞ before concluding.)
A single good can either be consumed or invested in physical capital.
The consumption good is not durable and depreciates completely if not consumed immediately.
The capital good is durable but partially depreciates each period.
We let 𝐶𝑡 be a nondurable consumption good at time 𝑡.
Let 𝐾𝑡 be the stock of physical capital at time 𝑡.
Let 𝐶 ⃗ = {𝐶0 , … , 𝐶𝑇 } and 𝐾⃗ = {𝐾0 , … , 𝐾𝑇 +1 }.
A representative household is endowed with one unit of labor at each 𝑡 and likes the consumption good at each 𝑡.
The representative household inelastically supplies a single unit of labor 𝑁𝑡 at each 𝑡, so that
𝑁𝑡 = 1 for all 𝑡 ∈ [0, 𝑇 ].
The representative household has preferences over consumption bundles ordered by the utility
functional:

$$U(\vec{C}) = \sum_{t=0}^{T} \beta^t \frac{C_t^{1-\gamma}}{1-\gamma} \tag{1}$$

where 𝛽 ∈ (0, 1) is a discount factor and 𝛾 > 0 governs the curvature of the one-period utility
function with larger 𝛾 implying more curvature.
Note that

$$u(C_t) = \frac{C_t^{1-\gamma}}{1-\gamma} \tag{2}$$

satisfies 𝑢′ > 0, 𝑢″ < 0.


𝑢′ > 0 asserts that the consumer prefers more to less.
𝑢″ < 0 asserts that marginal utility declines with increases in 𝐶𝑡 .
We assume that 𝐾0 > 0 is an exogenous initial capital stock.
There is an economy-wide production function

$$F(K_t, N_t) = A K_t^{\alpha} N_t^{1-\alpha} \tag{3}$$

with 0 < 𝛼 < 1, 𝐴 > 0.



A feasible allocation 𝐶⃗, 𝐾⃗ satisfies

$$C_t + K_{t+1} \le F(K_t, N_t) + (1-\delta) K_t \quad \text{for all } t \in [0, T] \tag{4}$$

where 𝛿 ∈ (0, 1) is a depreciation rate of capital.

18.4 Planning Problem

A planner chooses an allocation {𝐶⃗, 𝐾⃗} to maximize (1) subject to (4).

Let 𝜇⃗ = {𝜇0 , … , 𝜇𝑇 } be a sequence of nonnegative Lagrange multipliers.


To find an optimal allocation, form a Lagrangian

$$\mathcal{L}(\vec{C}, \vec{K}, \vec{\mu}) = \sum_{t=0}^{T} \beta^t \left\{ u(C_t) + \mu_t \left( F(K_t, 1) + (1-\delta) K_t - C_t - K_{t+1} \right) \right\}$$

and then pose the following min-max problem:

$$\min_{\vec{\mu}} \max_{\vec{C}, \vec{K}} \mathcal{L}(\vec{C}, \vec{K}, \vec{\mu}) \tag{5}$$

• Extremization means maximization with respect to 𝐶⃗, 𝐾⃗ and minimization with respect to 𝜇⃗.
• Our problem satisfies conditions that assure that required second-order conditions are
satisfied at an allocation that satisfies the first-order conditions that we are about to
compute.
Before computing first-order conditions, we present some handy formulas.

18.4.1 Useful Properties of Linearly Homogeneous Production Function

The following technicalities will help us.


Notice that

$$F(K_t, N_t) = A K_t^{\alpha} N_t^{1-\alpha} = N_t A \left(\frac{K_t}{N_t}\right)^{\alpha}$$

Define the output per-capita production function

$$\frac{F(K_t, N_t)}{N_t} \equiv f\left(\frac{K_t}{N_t}\right) = A \left(\frac{K_t}{N_t}\right)^{\alpha}$$

whose argument is capital per-capita.


It is useful to recall the following calculations for the marginal product of capital
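$$\begin{aligned}
\frac{\partial F(K_t, N_t)}{\partial K_t}
  &= \frac{\partial \left[N_t f\left(\frac{K_t}{N_t}\right)\right]}{\partial K_t} \\
  &= N_t f'\left(\frac{K_t}{N_t}\right)\frac{1}{N_t} && \text{(Chain rule)} \\
  &= f'\left(\frac{K_t}{N_t}\right)\Big|_{N_t = 1} \\
  &= f'(K_t)
\end{aligned} \tag{6}$$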

and the marginal product of labor

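$$\begin{aligned}
\frac{\partial F(K_t, N_t)}{\partial N_t}
  &= \frac{\partial \left[N_t f\left(\frac{K_t}{N_t}\right)\right]}{\partial N_t} && \text{(Product rule)} \\
  &= f\left(\frac{K_t}{N_t}\right) + N_t f'\left(\frac{K_t}{N_t}\right)\frac{-K_t}{N_t^2} && \text{(Chain rule)} \\
  &= f\left(\frac{K_t}{N_t}\right) - \frac{K_t}{N_t} f'\left(\frac{K_t}{N_t}\right)\Big|_{N_t = 1} \\
  &= f(K_t) - f'(K_t) K_t
\end{aligned}$$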
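Here is a quick numerical check of these two formulas. Central finite differences of 𝐹(𝐾, 𝑁) = 𝐴𝐾^𝛼𝑁^(1−𝛼) at 𝑁 = 1 should match 𝑓′(𝐾) and 𝑓(𝐾) − 𝑓′(𝐾)𝐾. The values 𝐴 = 1, 𝛼 = 0.33 and 𝐾 = 2 are assumptions chosen for illustration. For this production function, the marginal product of labor also simplifies to (1 − 𝛼)𝑓(𝐾).

```python
A, α = 1.0, 0.33      # illustrative parameter values

def F(K, N):
    """Aggregate production F(K, N) = A K^α N^(1-α)."""
    return A * K**α * N**(1 - α)

def f(k):
    """Output per worker f(k) = A k^α."""
    return A * k**α

def f_prime(k):
    return α * A * k**(α - 1)

K, h = 2.0, 1e-6
# Central differences of F with respect to K and to N, evaluated at N = 1
dF_dK = (F(K + h, 1.0) - F(K - h, 1.0)) / (2 * h)
dF_dN = (F(K, 1.0 + h) - F(K, 1.0 - h)) / (2 * h)

print(dF_dK, f_prime(K))                 # marginal product of capital
print(dF_dN, f(K) - f_prime(K) * K)      # marginal product of labor
```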

18.4.2 First-order necessary conditions

We now compute first-order necessary conditions for extremization of the Lagrangian:

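$$C_t: \quad u'(C_t) - \mu_t = 0 \quad \text{for all } t = 0, 1, \dots, T \tag{7}$$

$$K_t: \quad \beta \mu_t \left[(1-\delta) + f'(K_t)\right] - \mu_{t-1} = 0 \quad \text{for all } t = 1, 2, \dots, T \tag{8}$$

$$\mu_t: \quad F(K_t, 1) + (1-\delta) K_t - C_t - K_{t+1} = 0 \quad \text{for all } t = 0, 1, \dots, T \tag{9}$$

$$K_{T+1}: \quad -\mu_T \le 0, \quad \le 0 \text{ if } K_{T+1} = 0; \quad = 0 \text{ if } K_{T+1} > 0 \tag{10}$$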

In computing (8) we recognize that 𝐾𝑡 appears in both the time 𝑡 and time 𝑡 − 1 feasibility constraints.
(10) comes from differentiating with respect to 𝐾𝑇+1 and applying the following Karush-Kuhn-Tucker (KKT) condition (see Karush-Kuhn-Tucker conditions):

$$\mu_T K_{T+1} = 0 \tag{11}$$

Combining (7) and (8) gives

$$\beta u'(C_t) \left[(1-\delta) + f'(K_t)\right] - u'(C_{t-1}) = 0 \quad \text{for all } t = 1, 2, \dots, T$$

which can be rearranged to become

$$\beta u'(C_{t+1}) \left[(1-\delta) + f'(K_{t+1})\right] = u'(C_t) \quad \text{for all } t = 0, 1, \dots, T-1 \tag{12}$$

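Applying the inverse of the marginal utility function 𝑢′ to both sides of the above equation gives

$$C_{t+1} = u'^{-1}\left(\left(\frac{\beta}{u'(C_t)}\left[f'(K_{t+1}) + (1-\delta)\right]\right)^{-1}\right)$$

which for our utility function (2) becomes the consumption Euler equation

$$\begin{aligned}
C_{t+1} &= \left(\beta C_t^{\gamma}\left[f'(K_{t+1}) + (1-\delta)\right]\right)^{1/\gamma} \\
&= C_t \left(\beta\left[f'(K_{t+1}) + (1-\delta)\right]\right)^{1/\gamma}
\end{aligned}$$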
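The general expression and the CRRA closed form can be checked numerically against each other and against (12). The sketch below assumes the default parameters of `PlanningProblem` defined below (𝛾 = 2, 𝛽 = 0.95, 𝛿 = 0.02, 𝛼 = 0.33, 𝐴 = 1) and arbitrary test values for 𝐶𝑡 and 𝐾𝑡+1.

```python
γ, β, δ, α, A = 2.0, 0.95, 0.02, 0.33, 1.0   # PlanningProblem defaults

def u_prime(c):
    return c ** (-γ)

def u_prime_inv(x):
    return x ** (-1 / γ)

def f_prime(k):
    return α * A * k ** (α - 1)

C_t, K_next = 0.8, 3.0          # arbitrary test values
gross_return = f_prime(K_next) + (1 - δ)

# General form: invert marginal utility
C_next_general = u_prime_inv(u_prime(C_t) / (β * gross_return))
# CRRA closed form: C_{t+1} = C_t (β[f'(K_{t+1}) + (1 - δ)])^(1/γ)
C_next_closed = C_t * (β * gross_return) ** (1 / γ)

# Residual of (12): β u'(C_{t+1})[f'(K_{t+1}) + (1 - δ)] - u'(C_t)
residual = β * u_prime(C_next_closed) * gross_return - u_prime(C_t)
print(C_next_general, C_next_closed, residual)
```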

Below we define a jitclass that stores parameters and functions that define our economy.

In [2]: planning_data = [
    ('γ', float64),    # Coefficient of relative risk aversion
    ('β', float64),    # Discount factor
    ('δ', float64),    # Depreciation rate on capital
    ('α', float64),    # Capital share (elasticity of output with respect to capital)
    ('A', float64)     # Technology
]

In [3]: @jitclass(planning_data)
class PlanningProblem():

    def __init__(self, γ=2, β=0.95, δ=0.02, α=0.33, A=1):

        self.γ, self.β = γ, β
        self.δ, self.α, self.A = δ, α, A

    def u(self, c):
        '''
        Utility function
        ASIDE: If you have a utility function that is hard to solve by hand
        you can use automatic or symbolic differentiation
        See https://github.com/HIPS/autograd
        '''
        γ = self.γ

        return c ** (1 - γ) / (1 - γ) if γ != 1 else np.log(c)

    def u_prime(self, c):
        'Derivative of utility'
        γ = self.γ

        return c ** (-γ)

    def u_prime_inv(self, c):
        'Inverse of derivative of utility'
        γ = self.γ

        return c ** (-1 / γ)

    def f(self, k):
        'Production function'
        α, A = self.α, self.A

        return A * k ** α

    def f_prime(self, k):
        'Derivative of production function'
        α, A = self.α, self.A

        return α * A * k ** (α - 1)

    def f_prime_inv(self, k):
        'Inverse of derivative of production function'
        α, A = self.α, self.A

        return (k / (A * α)) ** (1 / (α - 1))

    def next_k_c(self, k, c):
        '''
        Given the current capital Kt and an arbitrary feasible
        consumption choice Ct, computes Kt+1 by state transition law
        and optimal Ct+1 by Euler equation.
        '''
        β, δ = self.β, self.δ
        u_prime, u_prime_inv = self.u_prime, self.u_prime_inv
        f, f_prime = self.f, self.f_prime

        k_next = f(k) + (1 - δ) * k - c
        c_next = u_prime_inv(u_prime(c) / (β * (f_prime(k_next) + (1 - δ))))

        return k_next, c_next


We can construct an economy with the Python code:

In [4]: pp = PlanningProblem()

18.5 Shooting Algorithm

We use shooting to compute an optimal allocation 𝐶⃗, 𝐾⃗ and an associated Lagrange multiplier sequence 𝜇⃗.
The first-order necessary conditions (7), (8), and (9) for the planning problem form a system
of difference equations with two boundary conditions:
• 𝐾0 is a given initial condition for capital
• 𝐾𝑇+1 = 0 is a terminal condition for capital that we deduced from the first-order necessary condition for 𝐾𝑇+1 and the KKT condition (11) (since 𝜇𝑇 = 𝑢′(𝐶𝑇) > 0, (11) forces 𝐾𝑇+1 = 0)
We have no initial condition for the Lagrange multiplier 𝜇0 .
If we did, our job would be easy:
• Given 𝜇0 and 𝐾0, we could compute 𝐶0 from equation (7) and then 𝐾1 from equation (9) and 𝜇1 from equation (8).
• We could continue in this way to compute the remaining elements of 𝐶⃗, 𝐾⃗, 𝜇⃗.
But we don’t have an initial condition for 𝜇0 , so this won’t work.
Indeed, part of our task is to compute the optimal value of 𝜇0 .
To compute 𝜇0 and the other objects we want, a simple modification of the above procedure
will work.
It is called the shooting algorithm.
It is an instance of a guess-and-verify algorithm that consists of the following steps:
• Guess an initial Lagrange multiplier 𝜇0.
• Apply the simple algorithm described above.
• Compute 𝐾𝑇+1 and check whether it equals zero.
• If 𝐾𝑇 +1 = 0, we have solved the problem.
• If 𝐾𝑇 +1 > 0, lower 𝜇0 and try again.
• If 𝐾𝑇 +1 < 0, raise 𝜇0 and try again.
The following Python code implements the shooting algorithm for the planning problem.
We actually modify the algorithm slightly by starting with a guess for 𝑐0 instead of 𝜇0 in the
following code.

In [5]: @njit
def shooting(pp, c0, k0, T=10):
    '''
    Given the initial condition of capital k0 and an initial guess
    of consumption c0, computes the whole paths of c and k
    using the state transition law and Euler equation for T periods.
    '''
    if c0 > pp.f(k0):
        print("initial consumption is not feasible")

        return None

    # initialize vectors of c and k
    c_vec = np.empty(T+1)
    k_vec = np.empty(T+2)

    c_vec[0] = c0
    k_vec[0] = k0

    for t in range(T):
        k_vec[t+1], c_vec[t+1] = pp.next_k_c(k_vec[t], c_vec[t])

    k_vec[T+1] = pp.f(k_vec[T]) + (1 - pp.δ) * k_vec[T] - c_vec[T]

    return c_vec, k_vec

We’ll start with an incorrect guess.

In [6]: paths = shooting(pp, 0.2, 0.3, T=10)

In [7]: fig, axs = plt.subplots(1, 2, figsize=(14, 5))

colors = ['blue', 'red']
titles = ['Consumption', 'Capital']
ylabels = ['$c_t$', '$k_t$']

T = paths[0].size - 1
for i in range(2):
    axs[i].plot(paths[i], c=colors[i])
    axs[i].set(xlabel='t', ylabel=ylabels[i], title=titles[i])

axs[1].scatter(T+1, 0, s=80)
axs[1].axvline(T+1, color='k', ls='--', lw=1)

plt.show()

Evidently, our initial guess for 𝜇0 is too high, so initial consumption is too low.
We know this because we miss our 𝐾𝑇+1 = 0 target on the high side.
Now we automate things with a search-for-a-good-𝜇0 algorithm that stops when we hit the target 𝐾𝑇+1 = 0.
We use a bisection method.
We make an initial guess for 𝐶0 (we can eliminate 𝜇0 because 𝐶0 is an exact function of 𝜇0).
We know that the lowest 𝐶0 can ever be is 0 and the largest it can be is initial output 𝑓(𝐾0).
Guess 𝐶0 and shoot forward to 𝑇 + 1.
If 𝐾𝑇+1 > 0, we take the current guess of 𝐶0 to be our new lower bound on 𝐶0.
If 𝐾𝑇+1 < 0, we take the current guess of 𝐶0 to be our new upper bound.
Make a new guess for 𝐶0 that is halfway between our new upper and lower bounds.
Shoot forward again, iterating on these steps until we converge.
When 𝐾𝑇+1 gets close enough to 0 (i.e., within an error tolerance), we stop.

In [8]: @njit
def bisection(pp, c0, k0, T=10, tol=1e-4, max_iter=500, k_ter=0, verbose=True):

    # initial boundaries for guess c0
    c0_upper = pp.f(k0)
    c0_lower = 0

    i = 0
    while True:
        c_vec, k_vec = shooting(pp, c0, k0, T)
        error = k_vec[-1] - k_ter

        # check if the terminal condition is satisfied
        if np.abs(error) < tol:
            if verbose:
                print('Converged successfully on iteration ', i+1)
            return c_vec, k_vec

        i += 1
        if i == max_iter:
            if verbose:
                print('Convergence failed.')
            return c_vec, k_vec

        # if iteration continues, update boundaries and guess of c0
        if error > 0:
            c0_lower = c0
        else:
            c0_upper = c0

        c0 = (c0_lower + c0_upper) / 2

In [9]: def plot_paths(pp, c0, k0, T_arr, k_ter=0, k_ss=None, axs=None):

    if axs is None:
        fig, axs = plt.subplots(1, 3, figsize=(16, 4))
    ylabels = ['$c_t$', '$k_t$', r'$\mu_t$']
    titles = ['Consumption', 'Capital', 'Lagrange Multiplier']

    c_paths = []
    k_paths = []
    for T in T_arr:
        c_vec, k_vec = bisection(pp, c0, k0, T, k_ter=k_ter, verbose=False)
        c_paths.append(c_vec)
        k_paths.append(k_vec)

        μ_vec = pp.u_prime(c_vec)
        paths = [c_vec, k_vec, μ_vec]

        for i in range(3):
            axs[i].plot(paths[i])
            axs[i].set(xlabel='t', ylabel=ylabels[i], title=titles[i])

        # Plot steady state value of capital
        if k_ss is not None:
            axs[1].axhline(k_ss, c='k', ls='--', lw=1)

        axs[1].axvline(T+1, c='k', ls='--', lw=1)
        axs[1].scatter(T+1, paths[1][-1], s=80)

    return c_paths, k_paths

Now we can solve the model and plot the paths of consumption, capital, and the Lagrange multiplier.

In [10]: plot_paths(pp, 0.3, 0.3, [10]);



18.6 Setting Initial Capital to Steady State Capital

When 𝑇 → +∞, the optimal allocation converges to steady state values of 𝐶𝑡 and 𝐾𝑡 .
It is instructive to set 𝐾0 equal to the lim𝑇 →+∞ 𝐾𝑡 , which we’ll call steady state capital.
In a steady state 𝐾𝑡+1 = 𝐾𝑡 = 𝐾̄ for all very large 𝑡.
Evaluating the feasibility constraint (4) at 𝐾̄ gives

$$f(\bar K) - \delta \bar K = \bar C \tag{13}$$

Substituting 𝐾𝑡 = 𝐾̄ and 𝐶𝑡 = 𝐶̄ for all 𝑡 into (12) gives

$$1 = \beta \frac{u'(\bar C)}{u'(\bar C)} \left[f'(\bar K) + (1-\delta)\right]$$

Defining 𝛽 = 1/(1 + 𝜌) and cancelling gives

$$1 + \rho = 1 \left[f'(\bar K) + (1-\delta)\right]$$

Simplifying gives

$$f'(\bar K) = \rho + \delta$$

and

$$\bar K = f'^{-1}(\rho + \delta)$$

For the production function (3) with 𝐴 = 1, this becomes

$$\alpha \bar K^{\alpha - 1} = \rho + \delta$$

As an example, after setting 𝛼 = 0.33, 𝜌 = 1/𝛽 − 1 = 1/(19/20) − 1 = 20/19 − 19/19 = 1/19, and 𝛿 = 1/50, we get

$$\bar K = \left(\frac{\frac{33}{100}}{\frac{1}{50} + \frac{1}{19}}\right)^{\frac{100}{67}} \approx 9.57583$$

Let’s verify this with Python and then use this steady state 𝐾̄ as our initial capital stock 𝐾0 .

In [11]: ρ = 1 / pp.β - 1
k_ss = pp.f_prime_inv(ρ+pp.δ)

print(f'steady state for capital is: {k_ss}')

steady state for capital is: 9.57583816331462

Now we plot

In [12]: plot_paths(pp, 0.3, k_ss, [150], k_ss=k_ss);

Evidently, with a large value of 𝑇 , 𝐾𝑡 stays near 𝐾0 until 𝑡 approaches 𝑇 closely.


Let’s see what the planner does when we set 𝐾0 below 𝐾.̄

In [13]: plot_paths(pp, 0.3, k_ss/3, [150], k_ss=k_ss);

Notice how the planner pushes capital toward the steady state, stays near there for a while,
then pushes 𝐾𝑡 toward the terminal value 𝐾𝑇 +1 = 0 when 𝑡 closely approaches 𝑇 .
The following graphs compare optimal outcomes as we vary 𝑇 .

In [14]: plot_paths(pp, 0.3, k_ss/3, [150, 75, 50, 25], k_ss=k_ss);

18.7 A Turnpike Property

The following calculation indicates that when 𝑇 is very large, the optimal capital stock stays
close to its steady state value most of the time.

In [15]: plot_paths(pp, 0.3, k_ss/3, [250, 150, 50, 25], k_ss=k_ss);

Different colors in the above graphs are associated with different horizons 𝑇.


Notice that as the horizon increases, the planner puts 𝐾𝑡 closer to the steady state value 𝐾̄
for longer.
This pattern reflects a turnpike property of the steady state.
A rule of thumb for the planner is
• from 𝐾0 , push 𝐾𝑡 toward the steady state and stay close to the steady state until time
approaches 𝑇 .
The planner accomplishes this by adjusting the saving rate (𝑓(𝐾𝑡) − 𝐶𝑡)/𝑓(𝐾𝑡) over time.
Let’s calculate and plot the saving rate.

In [16]: @njit
def saving_rate(pp, c_path, k_path):
    'Given paths of c and k, computes the path of saving rate.'
    production = pp.f(k_path[:-1])

    return (production - c_path) / production



In [17]: def plot_saving_rate(pp, c0, k0, T_arr, k_ter=0, k_ss=None, s_ss=None):

    fig, axs = plt.subplots(2, 2, figsize=(12, 9))

    c_paths, k_paths = plot_paths(pp, c0, k0, T_arr, k_ter=k_ter, k_ss=k_ss,
                                  axs=axs.flatten())

    for i, T in enumerate(T_arr):
        s_path = saving_rate(pp, c_paths[i], k_paths[i])
        axs[1, 1].plot(s_path)

    axs[1, 1].set(xlabel='t', ylabel='$s_t$', title='Saving rate')

    if s_ss is not None:
        axs[1, 1].hlines(s_ss, 0, np.max(T_arr), linestyle='--')

In [18]: plot_saving_rate(pp, 0.3, k_ss/3, [250, 150, 75, 50], k_ss=k_ss)

18.8 A Limiting Economy

We want to set 𝑇 = +∞.


The appropriate thing to do is to replace terminal condition (10) with

$$\lim_{T \to +\infty} \beta^T u'(C_T) K_{T+1} = 0,$$

a condition that will be satisfied by a path that converges to an optimal steady state.
We can approximate the optimal path by starting from an arbitrary initial $K_0$ and shooting towards the optimal steady state $\bar K$ at a large but finite $T + 1$.
In the following code, we do this for a large $T$ and plot consumption, capital, and the saving rate.
We know that in the steady state the saving rate is constant and that

$$ \bar s = \frac{f(\bar K) - \bar C}{f(\bar K)}. $$

From (13) the steady state saving rate equals

$$ \bar s = \frac{\delta \bar K}{f(\bar K)} $$

The steady state level of saving $\bar S = \bar s f(\bar K) = \delta \bar K$ is the amount required to offset capital depreciation each period.
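For the Cobb-Douglas technology $f(K) = AK^\alpha$ used in these lectures, $f'(K) = \alpha f(K)/K$. The steady state condition $f'(\bar K) = \rho + \delta$ therefore implies $\bar K / f(\bar K) = \alpha/(\rho + \delta)$, so the steady state saving rate has the closed form $\bar s = \delta\alpha/(\rho + \delta)$. The sketch below checks this numerically. It assumes the parameter values β = 0.95, δ = 0.02, α = 0.33 and A = 1, which are the defaults of `PlanningProblem` in the competitive equilibrium lecture. The helper functions are written out here rather than taken from `pp`.

```python
# Assumed parameter values
β, δ, α, A = 0.95, 0.02, 0.33, 1.0

def f(k):
    "Production function f(K) = A K^α."
    return A * k ** α

def f_prime(k):
    "Marginal product of capital f'(K)."
    return α * A * k ** (α - 1)

def f_prime_inv(x):
    "Inverse of f', i.e. the K that solves f'(K) = x."
    return (x / (A * α)) ** (1 / (α - 1))

ρ = 1 / β - 1
k_ss = f_prime_inv(ρ + δ)            # steady state: f'(K̄) = ρ + δ
s_ss_direct = δ * k_ss / f(k_ss)     # s̄ = δ K̄ / f(K̄)
s_ss_closed = δ * α / (ρ + δ)        # closed form for Cobb-Douglas
print(k_ss, s_ss_direct, s_ss_closed)

# Starting below the steady state, f'(K_0) exceeds ρ + δ because f'' < 0
k0 = k_ss / 3
print(f_prime(k0) > ρ + δ)           # True
```

With these assumed values, $\bar K \approx 9.58$ and $\bar s \approx 0.091$, so roughly 9 percent of steady state output is saved.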
We first study optimal capital paths that start below the steady state.

In [19]: # steady state of saving rate


s_ss = pp.δ * k_ss / pp.f(k_ss)

plot_saving_rate(pp, 0.3, k_ss/3, [130], k_ter=k_ss, k_ss=k_ss, s_ss=s_ss)

Since $K_0 < \bar K$, $f'(K_0) > \rho + \delta$.

The planner chooses a positive saving rate that is higher than the steady state saving rate.
Note that $f''(K) < 0$, so as $K$ rises, $f'(K)$ declines.

The planner slowly lowers the saving rate until reaching a steady state in which $f'(K) = \rho + \delta$.

18.8.1 Exercise

• Plot the optimal consumption, capital, and saving paths when the initial capital level
begins at 1.5 times the steady state level as we shoot towards the steady state at 𝑇 =
130.
• Why does the saving rate respond as it does?

18.8.2 Solution

In [20]: plot_saving_rate(pp, 0.3, k_ss*1.5, [130], k_ter=k_ss, k_ss=k_ss, s_ss=s_ss)

18.9 Concluding Remarks

In Cass-Koopmans Competitive Equilibrium, we study a decentralized version of an economy with exactly the same technology and preference structure as deployed here.
In that lecture, we replace the planner of this lecture with Adam Smith's invisible hand.
In place of quantity choices made by the planner, there are market prices somehow produced by the invisible hand.
Market prices must adjust to reconcile distinct decisions that are made independently by a representative household and a representative firm.

The relationship between a command economy like the one studied in this lecture and a market economy like that studied in Cass-Koopmans Competitive Equilibrium is a foundational topic in general equilibrium theory and welfare economics.
Chapter 19

Cass-Koopmans Competitive Equilibrium

19.1 Contents

• Overview 19.2
• Review of Cass-Koopmans Model 19.3
• Competitive Equilibrium 19.4
• Market Structure 19.5
• Firm Problem 19.6
• Household Problem 19.7
• Computing a Competitive Equilibrium 19.8
• Yield Curves and Hicks-Arrow Prices 19.9

19.2 Overview

This lecture continues our analysis in the lecture Cass-Koopmans Planning Model of the model that Tjalling Koopmans [66] and David Cass [22] used to study optimal growth.
This lecture illustrates what is, in fact, a more general connection between a planned economy and an economy organized as a competitive equilibrium.
The earlier lecture Cass-Koopmans Planning Model studied a planning problem and used
ideas including
• A min-max problem for solving the planning problem.
• A shooting algorithm for solving difference equations subject to initial and terminal
conditions.
• A turnpike property that describes optimal paths for long-but-finite horizon
economies.
The present lecture uses additional ideas including
• Hicks-Arrow prices named after John R. Hicks and Kenneth Arrow.
• A connection between some Lagrange multipliers in the min-max problem and the
Hicks-Arrow prices.
• A Big 𝐾 , little 𝑘 trick widely used in macroeconomic dynamics.

316 CHAPTER 19. CASS-KOOPMANS COMPETITIVE EQUILIBRIUM

• We shall encounter this trick in this lecture and also in this lecture.

• A non-stochastic version of a theory of the term structure of interest rates.


• An intimate connection between the cases for the optimality of two competing visions of good ways to organize an economy, namely:
– socialism, in which a central planner commands the allocation of resources, and
– capitalism (also known as a market economy), in which competitive equilibrium prices induce individual consumers and producers to choose a socially optimal allocation as an unintended consequence of their selfish decisions.
Let’s start with some standard imports:

In [1]: from numba import njit, float64
from numba.experimental import jitclass
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

19.3 Review of Cass-Koopmans Model

The physical setting is identical with that in Cass-Koopmans Planning Model.


Time is discrete and takes values 𝑡 = 0, 1, … , 𝑇 .
A single good can either be consumed or invested in physical capital.
The consumption good is not durable and depreciates completely if not consumed immediately.
The capital good is durable but partially depreciates each period at a constant rate.
We let $C_t$ be a nondurable consumption good at time $t$.
Let $K_t$ be the stock of physical capital at time $t$.
Let $\vec C = \{C_0, \dots, C_T\}$ and $\vec K = \{K_0, \dots, K_{T+1}\}$.
A representative household is endowed with one unit of labor at each $t$ and likes the consumption good at each $t$.
The representative household inelastically supplies a single unit of labor 𝑁𝑡 at each 𝑡, so that
𝑁𝑡 = 1 for all 𝑡 ∈ [0, 𝑇 ].
The representative household has preferences over consumption bundles ordered by the utility functional:

$$ U(\vec C) = \sum_{t=0}^{T} \beta^t \frac{C_t^{1-\gamma}}{1-\gamma} $$

where $\beta \in (0, 1)$ is a discount factor and $\gamma > 0$ governs the curvature of the one-period utility function.
We assume that 𝐾0 > 0.
There is an economy-wide production function

$$ F(K_t, N_t) = A K_t^{\alpha} N_t^{1-\alpha} $$

with $0 < \alpha < 1$, $A > 0$.


A feasible allocation $\vec C, \vec K$ satisfies

$$ C_t + K_{t+1} \leq F(K_t, N_t) + (1 - \delta) K_t, \quad \text{for all } t \in [0, T] $$

where $\delta \in (0, 1)$ is a depreciation rate of capital.

19.3.1 Planning Problem

In the lecture Cass-Koopmans Planning Model, we studied a problem in which a planner chooses an allocation $\{\vec C, \vec K\}$ to maximize (1) subject to (4).

The allocation that solves the planning problem plays an important role in a competitive
equilibrium as we shall see below.

19.4 Competitive Equilibrium

We now study a decentralized version of the economy.


It shares the same technology and preference structure as the planned economy studied in
this lecture Cass-Koopmans Planning Model.
But now there is no planner.
Market prices adjust to reconcile distinct decisions that are made separately by a representative household and a representative firm.
There is a representative consumer who has the same preferences over consumption plans as
did the consumer in the planned economy.
Instead of being told what to consume and save by a planner, the household chooses for itself, subject to a budget constraint:
• At each time 𝑡, the household receives wages and rentals of capital from a firm – these
comprise its income at time 𝑡.
• The consumer decides how much income to allocate to consumption or to savings.
• The household can save either by acquiring additional physical capital (it trades one
for one with time 𝑡 consumption) or by acquiring claims on consumption at dates other
than 𝑡.
• The household owns all physical capital and labor and rents them to the firm.
• The household consumes, supplies labor, and invests in physical capital.
• A profit-maximizing representative firm operates the production technology.
• The firm rents labor and capital each period from the representative household and sells
its output each period to the household.
• The representative household and the representative firm are both price takers who believe that prices are not affected by their choices.
Note: We can think of there being a large number $M$ of identical representative consumers and $M$ identical representative firms.

19.5 Market Structure

The representative household and the representative firm are both price takers.
The household owns both factors of production, namely, labor and physical capital.
Each period, the firm rents both factors from the household.
There is a single grand competitive market in which a household can trade date 0 goods for
goods at all other dates 𝑡 = 1, 2, … , 𝑇 .

19.5.1 Prices

There are sequences of prices $\{w_t, \eta_t\}_{t=0}^T = \{\vec w, \vec \eta\}$, where $w_t$ is a wage or rental rate for labor at time $t$ and $\eta_t$ is a rental rate for capital at time $t$.
In addition, there are intertemporal prices that work as follows.
Let $q_t^0$ be the price of a good at date $t$ relative to a good at date 0.
We call $\{q_t^0\}_{t=0}^T$ a vector of Hicks-Arrow prices, named after the 1972 economics Nobel prize winners.
Evidently,

$$ q_t^0 = \frac{\text{number of time 0 goods}}{\text{number of time } t \text{ goods}} $$

Because 𝑞𝑡0 is a relative price, the units in terms of which prices are quoted are arbitrary –
we are free to normalize them.

19.6 Firm Problem

At time $t$ a representative firm hires labor $\tilde n_t$ and capital $\tilde k_t$.

The firm's profits at time $t$ are

$$ F(\tilde k_t, \tilde n_t) - w_t \tilde n_t - \eta_t \tilde k_t $$

where $w_t$ is a wage rate at $t$ and $\eta_t$ is the rental rate on capital at $t$.

As in the planned economy model

$$ F(\tilde k_t, \tilde n_t) = A \tilde k_t^{\alpha} \tilde n_t^{1-\alpha} $$

19.6.1 Zero Profit Conditions

The zero-profit conditions for capital and labor are

$$ F_k(\tilde k_t, \tilde n_t) = \eta_t $$

and

$$ F_n(\tilde k_t, \tilde n_t) = w_t \tag{1} $$

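These conditions emerge from a no-arbitrage requirement.

To describe this no-arbitrage reasoning, we begin by applying a theorem of Euler about linearly homogeneous functions.
The theorem applies to the Cobb-Douglas production function because it is assumed to display constant returns to scale:

$$ \alpha F(\tilde k_t, \tilde n_t) = F(\alpha \tilde k_t, \alpha \tilde n_t) $$

for any $\alpha > 0$. (Here $\alpha$ is a generic scaling factor, not the capital share.)

Differentiating both sides of the above equation with respect to $\alpha$ and setting $\alpha = 1$ gives

$$ F(\tilde k_t, \tilde n_t) \overset{\text{chain rule}}{=} \frac{\partial F}{\partial \tilde k_t} \tilde k_t + \frac{\partial F}{\partial \tilde n_t} \tilde n_t $$

Rewrite the firm's profits as

$$ \frac{\partial F}{\partial \tilde k_t} \tilde k_t + \frac{\partial F}{\partial \tilde n_t} \tilde n_t - w_t \tilde n_t - \eta_t \tilde k_t $$

or

$$ \left( \frac{\partial F}{\partial \tilde k_t} - \eta_t \right) \tilde k_t + \left( \frac{\partial F}{\partial \tilde n_t} - w_t \right) \tilde n_t $$

Because $F$ is homogeneous of degree 1, it follows that $\frac{\partial F}{\partial \tilde k_t}$ and $\frac{\partial F}{\partial \tilde n_t}$ are homogeneous of degree 0. They are therefore unchanged when $\tilde k_t$ and $\tilde n_t$ are scaled up or down in the same proportion.
If $\frac{\partial F}{\partial \tilde k_t} > \eta_t$, then the firm makes positive profits on each additional unit of $\tilde k_t$, so it will want to make $\tilde k_t$ arbitrarily large.
But setting $\tilde k_t = +\infty$ is not physically feasible, so presumably equilibrium prices will assume values that present the firm with no such arbitrage opportunity.
A similar argument applies if $\frac{\partial F}{\partial \tilde n_t} > w_t$.
If $\frac{\partial F}{\partial \tilde k_t} < \eta_t$, the firm will set $\tilde k_t$ to zero, something that is not feasible.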
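The sketch below numerically checks the two facts used in this argument for the Cobb-Douglas technology $F(\tilde k, \tilde n) = A \tilde k^\alpha \tilde n^{1-\alpha}$. First, output is exhausted when each factor is paid its marginal product (Euler's theorem). Second, marginal products do not change when both inputs are scaled by the same factor. As a consequence, setting $\eta_t = F_k$ and $w_t = F_n$ yields zero profits at every scale of operation. The parameter values and input levels are arbitrary illustrative assumptions.

```python
import numpy as np

A, α = 1.0, 0.33   # illustrative technology parameters

def F(k, n):
    "Cobb-Douglas production function A k^α n^(1-α)."
    return A * k ** α * n ** (1 - α)

def F_k(k, n):
    "Marginal product of capital."
    return α * A * k ** (α - 1) * n ** (1 - α)

def F_n(k, n):
    "Marginal product of labor."
    return (1 - α) * A * k ** α * n ** (-α)

k, n = 4.0, 1.0

# Euler's theorem: F = F_k k + F_n n when F is homogeneous of degree 1
print(np.isclose(F(k, n), F_k(k, n) * k + F_n(k, n) * n))   # True

# Marginal products are homogeneous of degree 0
for scale in (0.5, 2.0, 10.0):
    print(np.isclose(F_k(scale * k, scale * n), F_k(k, n)),
          np.isclose(F_n(scale * k, scale * n), F_n(k, n)))  # True True

# With η = F_k and w = F_n, profits are zero at every scale of operation
η, w = F_k(k, n), F_n(k, n)
for scale in (0.5, 2.0, 10.0):
    profit = F(scale * k, scale * n) - w * scale * n - η * scale * k
    print(scale, profit)   # zero up to rounding error
```

This is why the firm's size is indeterminate in equilibrium. At prices equal to marginal products, any scale earns zero profit, and the scale is pinned down by what the household supplies.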

It is convenient to define $\vec w = \{w_0, \dots, w_T\}$ and $\vec \eta = \{\eta_0, \dots, \eta_T\}$.

19.7 Household Problem

A representative household lives at $t = 0, 1, \dots, T$.

At $t$, the household rents 1 unit of labor and $k_t$ units of capital to a firm and receives income

$$ w_t 1 + \eta_t k_t $$

At $t$ the household allocates its income to the following purchases

$$ \left( c_t + (k_{t+1} - (1 - \delta) k_t) \right) $$

Here $(k_{t+1} - (1 - \delta) k_t)$ is the household's net investment in physical capital and $\delta \in (0, 1)$ is again a depreciation rate of capital.
In period $t$, the household is free to purchase more goods to be consumed and invested in physical capital than its income from supplying capital and labor to the firm, provided that in some other periods its income exceeds its purchases.
A household's net excess demand for time $t$ consumption goods is the gap

$$ e_t \equiv \left( c_t + (k_{t+1} - (1 - \delta) k_t) \right) - (w_t 1 + \eta_t k_t) $$

Let $\vec c = \{c_0, \dots, c_T\}$ and let $\vec k = \{k_1, \dots, k_{T+1}\}$.

$k_0$ is given to the household.
The household faces a single budget constraint, which states that the present value of the household's net excess demands cannot be positive:

$$ \sum_{t=0}^T q_t^0 e_t \leq 0 $$

or

$$ \sum_{t=0}^T q_t^0 \left( c_t + (k_{t+1} - (1 - \delta) k_t) - (w_t 1 + \eta_t k_t) \right) \leq 0 $$

The household chooses an allocation to solve the constrained optimization problem:

$$
\begin{aligned}
& \max_{\vec c, \vec k} \sum_{t=0}^T \beta^t u(c_t) \\
& \text{subject to} \quad \sum_{t=0}^T q_t^0 \left( c_t + (k_{t+1} - (1 - \delta) k_t) - w_t - \eta_t k_t \right) \leq 0
\end{aligned}
$$
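The budget constraint is easy to evaluate numerically. The sketch below uses a hypothetical helper, `pv_excess_demand`, to compute $\sum_t q_t^0 e_t$ for given price and allocation vectors. Following the timing above, `k` holds $k_0, \dots, k_{T+1}$, while `c`, `q`, `w` and `eta` hold values for $t = 0, \dots, T$. All numbers are made up for illustration. The example shows that the household can shift consumption across dates as long as the present value of its net excess demands stays nonpositive.

```python
import numpy as np

def pv_excess_demand(q, c, k, w, eta, δ):
    """
    Present value of net excess demands, sum_t q_t e_t, where
    e_t = c_t + k_{t+1} - (1 - δ) k_t - (w_t + eta_t k_t).
    q, c, w, eta have length T+1; k has length T+2 (k_0, ..., k_{T+1}).
    """
    q, c, k, w, eta = map(np.asarray, (q, c, k, w, eta))
    e = c + k[1:] - (1 - δ) * k[:-1] - (w + eta * k[:-1])
    return np.sum(q * e)

# Illustrative prices and choices for T = 2 (made-up numbers)
δ = 0.1
q = np.array([1.0, 0.9, 0.8])
w = np.array([1.0, 1.0, 1.0])
eta = np.array([0.1, 0.1, 0.1])
k = np.array([1.0, 1.0, 1.0, 1.0])   # capital held constant

c_flat = np.array([1.0, 1.0, 1.0])   # consume income each period
c_tilt = np.array([1.8, 1.0, 0.0])   # borrow at t = 0, repay at t = 2
c_over = np.array([1.8, 1.0, 0.5])   # spends more than the budget allows

for c in (c_flat, c_tilt, c_over):
    pv = pv_excess_demand(q, c, k, w, eta, δ)
    # the budget is satisfied if pv <= 0 (allowing for rounding error)
    print(round(pv, 10), pv <= 1e-12)
```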

19.7.1 Definitions

• A price system is a sequence $\{q_t^0, \eta_t, w_t\}_{t=0}^T = \{\vec q, \vec \eta, \vec w\}$.
• An allocation is a sequence $\{c_t, k_{t+1}, n_t = 1\}_{t=0}^T = \{\vec c, \vec k, \vec n\}$.
• A competitive equilibrium is a price system and an allocation for which
– Given the price system, the allocation solves the household's problem.
– Given the price system, the allocation solves the firm's problem.

19.8 Computing a Competitive Equilibrium

We compute a competitive equilibrium by using a guess and verify approach.

• We guess equilibrium price sequences $\{\vec q, \vec \eta, \vec w\}$.
• We then verify that at those prices, the household and the firm choose the same allocation.

19.8.1 Guess for Price System

In the lecture Cass-Koopmans Planning Model, we computed an allocation $\{\vec C, \vec K, \vec N\}$ that solves the planning problem.
(This allocation will constitute the Big $K$ in the present instance of the Big $K$, little $k$ trick that we'll apply to a competitive equilibrium in the spirit of this lecture and this lecture.)
We use that allocation to construct a guess for the equilibrium price system.
In particular, we guess that for $t = 0, \dots, T$:

$$ \lambda q_t^0 = \beta^t u'(C_t) = \beta^t \mu_t \tag{2} $$

$$ w_t = f(K_t) - K_t f'(K_t) \tag{3} $$

$$ \eta_t = f'(K_t) \tag{4} $$

At these prices, let the capital chosen by the household be

$$ k_t^*(\vec q, \vec w, \vec \eta), \quad t \geq 0 \tag{5} $$

and let the allocation chosen by the firm be

$$ \tilde k_t^*(\vec q, \vec w, \vec \eta), \quad t \geq 0 $$

and so on.
If our guess for the equilibrium price system is correct, then it must be the case that

$$ k_t^* = \tilde k_t^* \tag{6} $$

$$ 1 = \tilde n_t^* \tag{7} $$

$$ c_t^* + k_{t+1}^* - (1 - \delta) k_t^* = F(\tilde k_t^*, \tilde n_t^*) $$

We shall verify that for $t = 0, \dots, T$ the allocations chosen by the household and the firm both equal the allocation that solves the planning problem:

$$ k_t^* = \tilde k_t^* = K_t, \quad \tilde n_t = 1, \quad c_t^* = C_t \tag{8} $$
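As a sketch of how the guesses (2) to (4) map an allocation into a price system, the hypothetical helper `guess_prices` below takes paths $C_t$ and $K_t$ and returns $q_t^0$, $w_t$ and $\eta_t$. By default it uses the normalization $\lambda = 1$. The sketch assumes CRRA utility $u'(c) = c^{-\gamma}$ and $f(K) = AK^\alpha$, with made-up parameter values and made-up input paths that do not solve the planning problem. As a by-product, it checks that factor payments exhaust output, $w_t + \eta_t K_t = f(K_t)$.

```python
import numpy as np

β, γ, α, A = 0.95, 2.0, 0.33, 1.0   # assumed parameter values

def u_prime(c):
    return c ** (-γ)                 # CRRA marginal utility

def f(k):
    return A * k ** α                # f(K) = F(K, 1)

def f_prime(k):
    return α * A * k ** (α - 1)

def guess_prices(C, K, λ=1.0):
    "Price system implied by guesses (2)-(4) for paths C_t, K_t, t = 0, ..., T."
    C, K = np.asarray(C, dtype=float), np.asarray(K, dtype=float)
    t = np.arange(len(C))
    q = β ** t * u_prime(C) / λ      # (2): λ q_t^0 = β^t u'(C_t)
    w = f(K) - K * f_prime(K)        # (3)
    η = f_prime(K)                   # (4)
    return q, w, η

C = np.array([0.9, 1.0, 1.1])        # illustrative consumption path
K = np.array([3.0, 3.2, 3.4])        # illustrative capital path
q, w, η = guess_prices(C, K)
print(q, w, η)
print(np.allclose(w + η * K, f(K)))  # True: factor payments exhaust output
```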



19.8.2 Verification Procedure

Our approach is to stare at first-order necessary conditions for the optimization problems of
the household and the firm.
At the price system we have guessed, we’ll then verify that both sets of first-order conditions
are satisfied at the allocation that solves the planning problem.

19.8.3 Household’s Lagrangian

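To solve the household's problem, we formulate the Lagrangian

$$ \mathcal{L}(\vec c, \vec k, \lambda) = \sum_{t=0}^T \beta^t u(c_t) + \lambda \left( \sum_{t=0}^T q_t^0 \left( ((1 - \delta) k_t + w_t) + \eta_t k_t - c_t - k_{t+1} \right) \right) $$

and attack the min-max problem:

$$ \min_{\lambda} \max_{\vec c, \vec k} \mathcal{L}(\vec c, \vec k, \lambda) $$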

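First-order conditions are

$$ c_t: \quad \beta^t u'(c_t) - \lambda q_t^0 = 0, \quad t = 0, 1, \dots, T \tag{9} $$

$$ k_t: \quad -\lambda q_t^0 [(1 - \delta) + \eta_t] + \lambda q_{t-1}^0 = 0, \quad t = 1, 2, \dots, T + 1 \tag{10} $$

$$ \lambda: \quad \left( \sum_{t=0}^T q_t^0 \left( c_t + (k_{t+1} - (1 - \delta) k_t) - w_t - \eta_t k_t \right) \right) \leq 0 \tag{11} $$

$$ k_{T+1}: \quad -\lambda q_{T+1}^0 \leq 0, \quad \leq 0 \text{ if } k_{T+1} = 0; \quad = 0 \text{ if } k_{T+1} > 0 \tag{12} $$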

Now we plug in our guesses of prices and embark on some algebra in the hope of deriving all first-order necessary conditions (7)-(10) for the planning problem from the lecture Cass-Koopmans Planning Model.
Combining (9) and (2), we get:

$$ u'(C_t) = \mu_t $$

which is (7).
Combining (10), (2), and (4) we get:

$$ -\lambda \beta^t \mu_t [(1 - \delta) + f'(K_t)] + \lambda \beta^{t-1} \mu_{t-1} = 0 \tag{13} $$

Rewriting (13) by dividing both sides by $\lambda$ (which is nonzero since $u' > 0$), we get:

$$ \beta^t \mu_t [(1 - \delta) + f'(K_t)] = \beta^{t-1} \mu_{t-1} $$

or

$$ \beta \mu_t [(1 - \delta) + f'(K_t)] = \mu_{t-1} $$

which is (8).
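Combining (11), (2), (3) and (4) after multiplying both sides of (11) by $\lambda$, we get

$$ \sum_{t=0}^T \beta^t \mu_t \left( C_t + (K_{t+1} - (1 - \delta) K_t) - f(K_t) + K_t f'(K_t) - f'(K_t) K_t \right) \leq 0 $$

which simplifies to

$$ \sum_{t=0}^T \beta^t \mu_t \left( C_t + K_{t+1} - (1 - \delta) K_t - F(K_t, 1) \right) \leq 0 $$

The planning allocation satisfies

$$ C_t + K_{t+1} - (1 - \delta) K_t - F(K_t, 1) = 0 \quad \text{for all } t \text{ in } 0, \dots, T $$

which is (9). Every term in the sum is therefore zero, and the household's budget constraint holds with equality.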
Combining (12) and (2), we get:

$$ -\beta^{T+1} \mu_{T+1} \leq 0 $$

Dividing both sides by $\beta^{T+1}$ gives

$$ -\mu_{T+1} \leq 0 $$

which is (10) for the planning problem.


Thus, at our guess of the equilibrium price system, the allocation that solves the planning
problem also solves the problem faced by a representative household living in a competitive
equilibrium.
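The sketch below gives a one-period numerical spot-check of the household part of this argument. It takes one step of the planner's law of motion and Euler equation (in the form used by `next_k_c` later in this lecture) from an arbitrary starting point $(K_0, C_0)$. It then builds the guessed prices (2) and (4) with $\lambda = 1$ and confirms that the household's capital first-order condition (10) holds at $t = 1$. The parameter values and the starting point are illustrative assumptions. The check covers a single period, not a full equilibrium path.

```python
import numpy as np

β, γ, δ, α, A = 0.95, 2.0, 0.02, 0.33, 1.0   # assumed parameter values

def u_prime(c):
    return c ** (-γ)

def u_prime_inv(x):
    return x ** (-1 / γ)

def f(k):
    return A * k ** α

def f_prime(k):
    return α * A * k ** (α - 1)

# One step of the planner's law of motion and Euler equation from (K_0, C_0)
K0, C0 = 3.0, 0.9
K1 = f(K0) + (1 - δ) * K0 - C0
C1 = u_prime_inv(u_prime(C0) / (β * (f_prime(K1) + 1 - δ)))

# Guessed prices (2) and (4) with λ = 1
q0, q1 = u_prime(C0), β * u_prime(C1)
η1 = f_prime(K1)

# Household first-order condition (10) for k_1, after dividing by λ:
# q_0 = q_1 [(1 - δ) + η_1]
print(np.isclose(q0, q1 * ((1 - δ) + η1)))   # True
```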
We now turn to the problem faced by a firm in a competitive equilibrium.
If we plug (8) into (1) for all $t$, we get

$$ \frac{\partial F(K_t, 1)}{\partial K_t} = f'(K_t) = \eta_t $$

which is (4).
If we now plug (8) into (1) for all $t$, we get:

$$ \left. \frac{\partial F(K_t, \tilde n_t)}{\partial \tilde n_t} \right|_{\tilde n_t = 1} = f(K_t) - f'(K_t) K_t = w_t $$

which is exactly (3).



So at our guess for the equilibrium price system, the allocation that solves the planning problem also solves the problem faced by a firm within a competitive equilibrium.
By (6) and (7) this allocation is identical to the one that solves the consumer's problem.
Note: Because budget sets are affected only by relative prices, $\{q_t^0\}$ is determined only up to multiplication by a positive constant.
Normalization: We are free to choose a $\{q_t^0\}$ that makes $\lambda = 1$ so that we are measuring $q_t^0$ in units of the marginal utility of time 0 goods.
We will plot $q$, $w$, $\eta$ below to show that these equilibrium prices induce the same aggregate movements that we saw earlier in the planning problem.
To proceed, we bring in Python code that Cass-Koopmans Planning Model used to solve the planning problem.
First let's define a jitclass that stores parameters and functions that characterize an economy.

In [2]: planning_data = [
    ('γ', float64),    # Coefficient of relative risk aversion
    ('β', float64),    # Discount factor
    ('δ', float64),    # Depreciation rate on capital
    ('α', float64),    # Capital share (exponent on capital in production)
    ('A', float64)     # Technology
]

In [3]: @jitclass(planning_data)
class PlanningProblem():

    def __init__(self, γ=2, β=0.95, δ=0.02, α=0.33, A=1):

        self.γ, self.β = γ, β
        self.δ, self.α, self.A = δ, α, A

    def u(self, c):
        '''
        Utility function
        ASIDE: If you have a utility function that is hard to solve by hand
        you can use automatic or symbolic differentiation
        See https://github.com/HIPS/autograd
        '''
        γ = self.γ

        return c ** (1 - γ) / (1 - γ) if γ != 1 else np.log(c)

    def u_prime(self, c):
        'Derivative of utility'
        γ = self.γ

        return c ** (-γ)

    def u_prime_inv(self, c):
        'Inverse of derivative of utility'
        γ = self.γ

        return c ** (-1 / γ)

    def f(self, k):
        'Production function'
        α, A = self.α, self.A

        return A * k ** α

    def f_prime(self, k):
        'Derivative of production function'
        α, A = self.α, self.A

        return α * A * k ** (α - 1)

    def f_prime_inv(self, k):
        'Inverse of derivative of production function'
        α, A = self.α, self.A

        return (k / (A * α)) ** (1 / (α - 1))

    def next_k_c(self, k, c):
        '''
        Given the current capital Kt and an arbitrary feasible
        consumption choice Ct, computes Kt+1 by state transition law
        and optimal Ct+1 by Euler equation.
        '''
        β, δ = self.β, self.δ
        u_prime, u_prime_inv = self.u_prime, self.u_prime_inv
        f, f_prime = self.f, self.f_prime

        k_next = f(k) + (1 - δ) * k - c
        c_next = u_prime_inv(u_prime(c) / (β * (f_prime(k_next) + (1 - δ))))

        return k_next, c_next


In [4]: @njit
def shooting(pp, c0, k0, T=10):
    '''
    Given the initial condition of capital k0 and an initial guess
    of consumption c0, computes the whole paths of c and k
    using the state transition law and Euler equation for T periods.
    '''
    if c0 > pp.f(k0):
        print("initial consumption is not feasible")

        return None

    # initialize vectors of c and k
    c_vec = np.empty(T+1)
    k_vec = np.empty(T+2)

    c_vec[0] = c0
    k_vec[0] = k0

    for t in range(T):
        k_vec[t+1], c_vec[t+1] = pp.next_k_c(k_vec[t], c_vec[t])

    k_vec[T+1] = pp.f(k_vec[T]) + (1 - pp.δ) * k_vec[T] - c_vec[T]

    return c_vec, k_vec

In [5]: @njit
def bisection(pp, c0, k0, T=10, tol=1e-4, max_iter=500, k_ter=0, verbose=True):

    # initial boundaries for guess c0
    c0_upper = pp.f(k0)
    c0_lower = 0

    i = 0
    while True:
        c_vec, k_vec = shooting(pp, c0, k0, T)
        error = k_vec[-1] - k_ter

        # check if the terminal condition is satisfied
        if np.abs(error) < tol:
            if verbose:
                print('Converged successfully on iteration ', i+1)
            return c_vec, k_vec

        i += 1
        if i == max_iter:
            if verbose:
                print('Convergence failed.')
            return c_vec, k_vec

        # if iteration continues, updates boundaries and guess of c0
        if error > 0:
            c0_lower = c0
        else:
            c0_upper = c0

        c0 = (c0_lower + c0_upper) / 2

In [6]: pp = PlanningProblem()

# Steady states
ρ = 1 / pp.β - 1
k_ss = pp.f_prime_inv(ρ+pp.δ)
c_ss = pp.f(k_ss) - pp.δ * k_ss

The above code from the lecture Cass-Koopmans Planning Model lets us compute an optimal allocation for the planning problem that turns out to be the allocation associated with a competitive equilibrium.
Now we're ready to bring in Python code that we require to compute additional objects that appear in a competitive equilibrium.

In [7]: @njit
def q(pp, c_path):
    # Normalize λ = 1, so that q_t^0 = β^t u'(c_t) for every t, i.e. prices
    # are measured in units of the marginal utility of time 0 goods
    T = len(c_path) - 1
    q_path = np.ones(T+1)
    for t in range(T+1):
        q_path[t] = pp.β ** t * pp.u_prime(c_path[t])
    return q_path

@njit
def w(pp, k_path):
    w_path = pp.f(k_path) - k_path * pp.f_prime(k_path)
    return w_path

@njit
def η(pp, k_path):
    η_path = pp.f_prime(k_path)
    return η_path

Now we calculate and plot for each $T$.

In [8]: T_arr = [250, 150, 75, 50]

fig, axs = plt.subplots(2, 3, figsize=(13, 6))

titles = ['Arrow-Hicks Prices', 'Labor Rental Rate', 'Capital Rental Rate',
          'Consumption', 'Capital', 'Lagrange Multiplier']
ylabels = ['$q_t^0$', '$w_t$', r'$\eta_t$', '$c_t$', '$k_t$', r'$\mu_t$']

for T in T_arr:
    c_path, k_path = bisection(pp, 0.3, k_ss/3, T, verbose=False)
    μ_path = pp.u_prime(c_path)

    q_path = q(pp, c_path)
    w_path = w(pp, k_path)[:-1]
    η_path = η(pp, k_path)[:-1]
    paths = [q_path, w_path, η_path, c_path, k_path, μ_path]

    for i, ax in enumerate(axs.flatten()):
        ax.plot(paths[i])
        ax.set(title=titles[i], ylabel=ylabels[i], xlabel='t')
        if titles[i] == 'Capital':
            ax.axhline(k_ss, lw=1, ls='--', c='k')
        if titles[i] == 'Consumption':
            ax.axhline(c_ss, lw=1, ls='--', c='k')

plt.tight_layout()
plt.show()

Varying Curvature

Now we see how our results change if we keep $T$ constant but allow the curvature parameter $\gamma$ to vary, starting with $K_0$ below the steady state.
We plot the results for $T = 150$.

In [9]: T = 150
γ_arr = [1.1, 4, 6, 8]

fig, axs = plt.subplots(2, 3, figsize=(13, 6))

for γ in γ_arr:
    pp_γ = PlanningProblem(γ=γ)
    c_path, k_path = bisection(pp_γ, 0.3, k_ss/3, T, verbose=False)
    μ_path = pp_γ.u_prime(c_path)

    q_path = q(pp_γ, c_path)
    w_path = w(pp_γ, k_path)[:-1]
    η_path = η(pp_γ, k_path)[:-1]
    paths = [q_path, w_path, η_path, c_path, k_path, μ_path]

    for i, ax in enumerate(axs.flatten()):
        ax.plot(paths[i], label=rf'$\gamma = {γ}$')
        ax.set(title=titles[i], ylabel=ylabels[i], xlabel='t')
        if titles[i] == 'Capital':
            ax.axhline(k_ss, lw=1, ls='--', c='k')
        if titles[i] == 'Consumption':
            ax.axhline(c_ss, lw=1, ls='--', c='k')

axs[0, 0].legend()
plt.tight_layout()
plt.show()

Adjusting $\gamma$ means adjusting how much individuals prefer to smooth consumption.

Higher $\gamma$ means individuals prefer to smooth more, resulting in slower adjustments to the steady state allocations.
Vice versa for lower $\gamma$.

19.9 Yield Curves and Hicks-Arrow Prices

We return to Hicks-Arrow prices and calculate how they are related to yields on loans of alternative maturities.
This will let us plot a yield curve that graphs yields on bonds of maturities $j = 1, 2, \dots$ against $j$.
The formulas we want are:
A yield to maturity on a loan made at time $t_0$ that matures at time $t > t_0$

$$ r_{t_0, t} = -\frac{\log q_t^{t_0}}{t - t_0} $$

A Hicks-Arrow price for a base-year $t_0 \leq t$

$$ q_t^{t_0} = \beta^{t - t_0} \frac{u'(c_t)}{u'(c_{t_0})} = \beta^{t - t_0} \frac{c_t^{-\gamma}}{c_{t_0}^{-\gamma}} $$
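Two benchmark cases make these formulas concrete. If consumption is constant, then $q_t^{t_0} = \beta^{t - t_0}$ and every maturity has yield $-\log \beta = \log(1 + \rho)$, so the yield curve is flat. If consumption grows at a constant rate $g$, the curve is again flat, at $-\log \beta + \gamma \log(1 + g)$. The sketch below checks both cases with NumPy. It uses a hypothetical helper `prices_and_yields` and assumes β = 0.95 and γ = 2.

```python
import numpy as np

β, γ = 0.95, 2.0   # assumed preference parameters

def prices_and_yields(c, t0=0):
    "Hicks-Arrow prices q_t^{t0} (t >= t0) and yields r_{t0,t} (t > t0) for CRRA utility."
    c = np.asarray(c, dtype=float)
    τ = np.arange(len(c) - t0)                  # maturities t - t0
    q = β ** τ * (c[t0:] / c[t0]) ** (-γ)       # q_t^{t0}
    r = -np.log(q[1:]) / τ[1:]                  # r_{t0,t}
    return q, r

# Constant consumption: flat yield curve at -log β
q, r = prices_and_yields(np.ones(20))
print(np.allclose(r, -np.log(β)))                        # True

# Consumption growing at rate g: flat at -log β + γ log(1 + g)
g = 0.02
c = (1 + g) ** np.arange(20)
q, r = prices_and_yields(c)
print(np.allclose(r, -np.log(β) + γ * np.log(1 + g)))    # True
```

Departures from a flat curve therefore reflect changes in the growth rate of consumption along the path. This is what the plots below display as capital converges toward its steady state.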

We redefine our function for $q$ to allow arbitrary base years, and define a new function for $r$, then plot both.
We begin by continuing to assume that $t_0 = 0$ and plot things for different maturities $t = T$, with $K_0$ below the steady state.

In [10]: @njit
def q_generic(pp, t0, c_path):
    # simplify notations
    β = pp.β
    u_prime = pp.u_prime

    T = len(c_path) - 1
    q_path = np.zeros(T+1-t0)
    q_path[0] = 1
    for t in range(t0+1, T+1):
        q_path[t-t0] = β ** (t-t0) * u_prime(c_path[t]) / u_prime(c_path[t0])
    return q_path

@njit
def r(pp, t0, q_path):
    '''Yield to maturity'''
    r_path = - np.log(q_path[1:]) / np.arange(1, len(q_path))
    return r_path

def plot_yield_curves(pp, t0, c0, k0, T_arr):

    fig, axs = plt.subplots(1, 2, figsize=(10, 5))

    for T in T_arr:
        c_path, k_path = bisection(pp, c0, k0, T, verbose=False)
        q_path = q_generic(pp, t0, c_path)
        r_path = r(pp, t0, q_path)

        axs[0].plot(range(t0, T+1), q_path)
        axs[0].set(xlabel='t', ylabel='$q_t^0$', title='Hicks-Arrow Prices')

        axs[1].plot(range(t0+1, T+1), r_path)
        axs[1].set(xlabel='t', ylabel='$r_t^0$', title='Yields')

In [11]: T_arr = [150, 75, 50]


plot_yield_curves(pp, 0, 0.3, k_ss/3, T_arr)

Now we plot when $t_0 = 20$.

In [12]: plot_yield_curves(pp, 20, 0.3, k_ss/3, T_arr)

We aim to have more to say about the term structure of interest rates in a planned lecture on
the topic.
Part III

Search

Chapter 20

Job Search I: The McCall Search Model

20.1 Contents

• Overview 20.2
• The McCall Model 20.3
• Computing the Optimal Policy: Take 1 20.4
• Computing the Optimal Policy: Take 2 20.5
• Exercises 20.6
• Solutions 20.7

“Questioning a McCall worker is like having a conversation with an out-of-work friend: ‘Maybe you are setting your sights too high’, or ‘Why did you quit your old job before you had a new one lined up?’ This is real social science: an attempt to model, to understand, human behavior by visualizing the situation people find themselves in, the options they face and the pros and cons as they themselves see them.” – Robert E. Lucas, Jr.

In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

20.2 Overview

The McCall search model [80] helped transform economists’ way of thinking about labor markets.
To clarify vague notions such as “involuntary” unemployment, McCall modeled the decision
problem of unemployed agents directly, in terms of factors such as
• current and likely future wages
• impatience
• unemployment compensation
To solve the decision problem he used dynamic programming.

334 CHAPTER 20. JOB SEARCH I: THE MCCALL SEARCH MODEL

Here we set up McCall’s model and adopt the same solution method.
As we’ll see, McCall’s model is not only interesting in its own right but also an excellent vehicle for learning dynamic programming.
Let’s start with some imports:

In [2]: import numpy as np
from numba import jit, float64
from numba.experimental import jitclass
import matplotlib.pyplot as plt
%matplotlib inline
import quantecon as qe
from quantecon.distributions import BetaBinomial


20.3 The McCall Model

An unemployed agent receives in each period a job offer at wage $w_t$.

The wage offer is a nonnegative function of some underlying state:

$$ w_t = w(s_t) \quad \text{where} \quad s_t \in \mathbb{S} $$

Here you should think of state process $\{s_t\}$ as some underlying, unspecified random factor that affects wages.
(Introducing an exogenous stochastic state process is a standard way for economists to inject randomness into their models.)
In this lecture, we adopt the following simple environment:
• $\{s_t\}$ is IID, with $q(s)$ being the probability of observing state $s$ in $\mathbb{S}$ at each point in time,
• the agent observes $s_t$ at the start of $t$ and hence knows $w_t = w(s_t)$, and
• the set $\mathbb{S}$ is finite.
(In later lectures, we will relax all of these assumptions.)
At time 𝑡, our agent has two choices:

1. Accept the offer and work permanently at constant wage 𝑤𝑡 .

2. Reject the offer, receive unemployment compensation 𝑐, and reconsider next period.

The agent is infinitely lived and aims to maximize the expected discounted sum of earnings

$$ \mathbb{E} \sum_{t=0}^{\infty} \beta^t y_t $$

The constant 𝛽 lies in (0, 1) and is called a discount factor.


The smaller is 𝛽, the more the agent discounts future utility relative to current utility.
The variable 𝑦𝑡 is income, equal to
• his/her wage 𝑤𝑡 when employed
• unemployment compensation 𝑐 when unemployed
The agent is assumed to know that {𝑠𝑡 } is IID with common distribution 𝑞 and can use this
when computing expectations.

20.3.1 A Trade-Off

The worker faces a trade-off:


• Waiting too long for a good offer is costly, since the future is discounted.
• Accepting too early is costly, since better offers might arrive in the future.
To decide optimally in the face of this trade-off, we use dynamic programming.
Dynamic programming can be thought of as a two-step procedure that

1. first assigns values to “states” and

2. then deduces optimal actions given those values

We’ll go through these steps in turn.

20.3.2 The Value Function

In order to optimally trade off current and future rewards, we need to think about two things:

1. the current payoffs we get from different choices

2. the different states that those choices will lead to in next period (in this case, either employment or unemployment)

To weigh these two aspects of the decision problem, we need to assign values to states.
To this end, let $v^*(s)$ be the total lifetime value accruing to a worker who enters the current period unemployed when the state is $s \in \mathbb{S}$.
In particular, the agent has wage offer $w(s)$ in hand.
More precisely, $v^*(s)$ denotes the value of the objective function above, the expected discounted sum of earnings, when an agent in this situation makes optimal decisions now and at all future points in time.
Of course $v^*(s)$ is not trivial to calculate because we don’t yet know what decisions are optimal and what aren’t!
But think of $v^*$ as a function that assigns to each possible state $s$ the maximal lifetime value that can be obtained with that offer in hand.
A crucial observation is that this function $v^*$ must satisfy the recursion

$$ v^*(s) = \max \left\{ \frac{w(s)}{1 - \beta}, \; c + \beta \sum_{s' \in \mathbb{S}} v^*(s') q(s') \right\} \tag{1} $$

for every possible $s$ in $\mathbb{S}$.


This important equation is a version of the Bellman equation, which is ubiquitous in economic dynamics and other fields involving planning over time.
The intuition behind it is as follows:
• the first term inside the max operation is the lifetime payoff from accepting the current offer, since

$$ \frac{w(s)}{1 - \beta} = w(s) + \beta w(s) + \beta^2 w(s) + \cdots $$

• the second term inside the max operation is the continuation value, which is the lifetime payoff from rejecting the current offer and then behaving optimally in all subsequent periods
If we optimize and pick the best of these two options, we obtain maximal lifetime value from today, given current state $s$.
But this is precisely $v^*(s)$, which is the l.h.s. of (1).
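The identity for the accept payoff is the geometric series formula. The partial sums approach $w(s)/(1 - \beta)$ as the horizon $N$ grows, because the omitted tail equals $\beta^N w(s)/(1 - \beta)$. The quick numerical check below uses truncated sums, with an illustrative wage $w(s) = 20$ and $\beta = 0.99$ (assumed values).

```python
import numpy as np

w, β = 20.0, 0.99   # illustrative wage and discount factor

closed_form = w / (1 - β)
for N in (100, 1_000, 10_000):
    partial_sum = np.sum(w * β ** np.arange(N))   # w + βw + ... + β^(N-1) w
    print(N, partial_sum, closed_form)
```

With β this close to one, the first few hundred periods account for only part of the total. This is why the discount factor matters so much for the accept/reject trade-off.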

20.3.3 The Optimal Policy

Suppose for now that we are able to solve (1) for the unknown function $v^*$.
Once we have this function in hand we can behave optimally (i.e., make the right choice between accept and reject).
All we have to do is select the maximal choice on the r.h.s. of (1).
The optimal action is best thought of as a policy, which is, in general, a map from states to actions.
Given any $s$, we can read off the corresponding best choice (accept or reject) by picking the max on the r.h.s. of (1).
Thus, we have a map from $\mathbb{S}$ to $\{0, 1\}$, with 1 meaning accept and 0 meaning reject.
We can write the policy as follows

𝑤(𝑠)
𝜎(𝑠) ∶= 1 { ≥ 𝑐 + 𝛽 ∑ 𝑣∗ (𝑠′ )𝑞(𝑠′ )}
1−𝛽 𝑠′ ∈𝕊

Here 1{𝑃 } = 1 if statement 𝑃 is true and equals 0 otherwise.


We can also write this as

$$
\sigma(s) := \mathbf{1} \{ w(s) \geq \bar w \}
$$

where

$$
\bar w := (1 - \beta) \left\{ c + \beta \sum_{s'} v^*(s') q(s') \right\} \tag{2}
$$

Here 𝑤̄ (called the reservation wage) is a constant depending on 𝛽, 𝑐 and the wage distribution.
The agent should accept if and only if the current wage offer is at least as large as the reservation wage.
In view of (2), we can compute this reservation wage if we can compute the value function.
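The two ways of writing 𝜎 agree because multiplying both sides of the inequality in the first form by 1 − 𝛽 > 0 yields the second. The following sketch checks this on a hypothetical four-state example; the function names, the toy wages, the probabilities and the vector `v` are our own choices. The equivalence holds for any vector plugged into the continuation term, although the resulting policy is optimal only when that vector is 𝑣∗.

```python
import numpy as np

def continuation_value(v, q, c, beta):
    """c + beta * sum_j v(j) q(j): value of rejecting, given a vector v."""
    return c + beta * np.dot(v, q)

def reservation_wage(v, q, c, beta):
    """Equation (2): w_bar = (1 - beta) * (c + beta * sum_j v(j) q(j))."""
    return (1 - beta) * continuation_value(v, q, c, beta)

def policy_direct(w, v, q, c, beta):
    """sigma(s) = 1{ w(s) / (1 - beta) >= continuation value }."""
    return (w / (1 - beta) >= continuation_value(v, q, c, beta)).astype(int)

def policy_threshold(w, w_bar):
    """sigma(s) = 1{ w(s) >= w_bar }."""
    return (w >= w_bar).astype(int)

# Hypothetical toy inputs: four wage states, uniform q, and an arbitrary v
w = np.array([10.0, 20.0, 30.0, 40.0])
q = np.full(4, 0.25)
c, beta = 5.0, 0.9
v = w / (1 - beta)

w_bar = reservation_wage(v, q, c, beta)
print(w_bar)                              # about 23.0
print(policy_direct(w, v, q, c, beta))    # [0 0 1 1]
print(policy_threshold(w, w_bar))         # [0 0 1 1]
```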

20.4 Computing the Optimal Policy: Take 1

To put the above ideas into action, we need to compute the value function at each possible
state 𝑠 ∈ 𝕊.
Let’s suppose that 𝕊 = {1, … , 𝑛}.
The value function is then represented by the vector 𝑣∗ = (𝑣∗(1), … , 𝑣∗(𝑛)).
In view of (1), this vector satisfies the nonlinear system of equations

$$
v^*(i) = \max \left\{ \frac{w(i)}{1 - \beta}, \; c + \beta \sum_{1 \leq j \leq n} v^*(j) q(j) \right\} \quad \text{for } i = 1, \ldots, n \tag{3}
$$

20.4.1 The Algorithm

To compute this vector, we use successive approximations:


Step 1: pick an arbitrary initial guess 𝑣 ∈ ℝⁿ.
Step 2: compute a new vector 𝑣′ ∈ ℝⁿ via

$$
v'(i) = \max \left\{ \frac{w(i)}{1 - \beta}, \; c + \beta \sum_{1 \leq j \leq n} v(j) q(j) \right\} \quad \text{for } i = 1, \ldots, n \tag{4}
$$

Step 3: calculate a measure of the deviation between 𝑣 and 𝑣′, such as maxᵢ |𝑣(𝑖) − 𝑣′(𝑖)|.
Step 4: if the deviation is larger than some fixed tolerance, set 𝑣 = 𝑣′ and go to step 2, else
continue.
Step 5: return 𝑣.
Let {𝑣ₖ} denote the sequence generated by this algorithm.
This sequence converges to the solution to (3) as 𝑘 → ∞, which is the value function 𝑣∗.
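Here’s a minimal plain-NumPy sketch of Steps 1–5 (no Numba), assuming a hypothetical wage grid of 11 evenly spaced values between 10 and 60 with uniform probabilities, 𝑐 = 25 and 𝛽 = 0.95. It is only meant to make the loop concrete; the lecture’s own Numba implementation appears below.

```python
import numpy as np

def bellman_operator(v, w, q, c, beta):
    """Right-hand side of (4): elementwise max of accept and reject values."""
    accept = w / (1 - beta)
    reject = c + beta * np.dot(v, q)   # a scalar, the same for every state
    return np.maximum(accept, reject)

def value_iteration(w, q, c, beta, tol=1e-8, max_iter=10_000):
    """Steps 1-5: iterate v <- Tv until the sup-norm deviation falls below tol."""
    v = np.zeros_like(w)                              # Step 1: arbitrary initial guess
    for _ in range(max_iter):
        v_new = bellman_operator(v, w, q, c, beta)    # Step 2
        deviation = np.max(np.abs(v_new - v))         # Step 3
        v = v_new
        if deviation < tol:                           # Step 4
            break
    return v                                          # Step 5: latest iterate

# Hypothetical example: 11 wage levels with uniform offer probabilities
w = np.linspace(10, 60, 11)
q = np.full(w.size, 1 / w.size)
c, beta = 25.0, 0.95

v_star = value_iteration(w, q, c, beta)
w_bar = (1 - beta) * (c + beta * np.dot(v_star, q))  # equation (2)
print(w_bar)
```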

20.4.2 The Fixed Point Theory

What’s the mathematics behind these ideas?


First, one defines a mapping 𝑇 from ℝⁿ to itself via

$$
(Tv)(i) = \max \left\{ \frac{w(i)}{1 - \beta}, \; c + \beta \sum_{1 \leq j \leq n} v(j) q(j) \right\} \quad \text{for } i = 1, \ldots, n \tag{5}
$$

(A new vector 𝑇𝑣 is obtained from a given vector 𝑣 by evaluating the r.h.s. at each 𝑖.)
The element 𝑣ₖ in the sequence {𝑣ₖ} of successive approximations corresponds to 𝑇ᵏ𝑣.
• This is 𝑇 applied 𝑘 times, starting at the initial guess 𝑣
One can show that the conditions of the Banach fixed point theorem are satisfied by 𝑇 on ℝⁿ.
One implication is that 𝑇 has a unique fixed point in ℝⁿ.
• That is, a unique vector 𝑣̄ such that 𝑇𝑣̄ = 𝑣̄.
Moreover, it’s immediate from the definition of 𝑇 that this fixed point is 𝑣∗.
A second implication of the Banach contraction mapping theorem is that {𝑇ᵏ𝑣} converges to the fixed point 𝑣∗ regardless of 𝑣.
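The contraction property behind this argument can be checked numerically. In the sup norm, |max{𝑎, 𝑥} − max{𝑎, 𝑦}| ≤ |𝑥 − 𝑦|, and the continuation term moves by at most 𝛽 times the largest change in 𝑣, so ‖𝑇𝑣 − 𝑇𝑢‖ ≤ 𝛽‖𝑣 − 𝑢‖. The sketch below tests this inequality on random pairs of vectors; the wage grid, the random probabilities 𝑞 and the parameter values are hypothetical.

```python
import numpy as np

def T(v, w, q, c, beta):
    """Operator (5): elementwise max of w(i)/(1-beta) and c + beta * sum_j v(j) q(j)."""
    return np.maximum(w / (1 - beta), c + beta * np.dot(v, q))

def sup_norm(x):
    """Largest absolute entry of a vector."""
    return np.max(np.abs(x))

rng = np.random.default_rng(0)
n = 20
w = np.linspace(10, 60, n)          # hypothetical wage grid
q = rng.dirichlet(np.ones(n))       # hypothetical offer probabilities (sum to one)
c, beta = 25.0, 0.99

# Ratio ||Tv - Tu|| / ||v - u|| over many random pairs; theory says it never exceeds beta
ratios = []
for _ in range(1_000):
    v = rng.uniform(0, 10_000, n)
    u = rng.uniform(0, 10_000, n)
    ratios.append(sup_norm(T(v, w, q, c, beta) - T(u, w, q, c, beta)) / sup_norm(v - u))

print(max(ratios), beta)
```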

20.4.3 Implementation

Our default for 𝑞, the distribution of the state process, will be Beta-binomial.

In [3]: n, a, b = 50, 200, 100                  # default parameters
        q_default = BetaBinomial(n, a, b).pdf()   # default choice of q

Our default set of values for wages will be

In [4]: w_min, w_max = 10, 60
        w_default = np.linspace(w_min, w_max, n+1)

Here’s a plot of the probabilities of different wage outcomes:

In [5]: fig, ax = plt.subplots()
        ax.plot(w_default, q_default, '-o', label='$q(w(i))$')
        ax.set_xlabel('wages')
        ax.set_ylabel('probabilities')

        plt.show()

We are going to use Numba to accelerate our code.
• See, in particular, the discussion of @jitclass in our lecture on Numba.
The following helps Numba by providing some type information.

In [6]: mccall_data = [
('c', float64), # unemployment compensation
('β', float64), # discount factor
('w', float64[:]), # array of wage values, w[i] = wage at state i
('q', float64[:]) # array of probabilities
]

Here’s a class that stores the data and computes the values of state-action pairs, i.e. the value
in the maximum bracket on the right hand side of the Bellman equation (4), given the current
state and an arbitrary feasible action.
Default parameter values are embedded in the class.

In [7]: @jitclass(mccall_data)
        class McCallModel:

            def __init__(self, c=25, β=0.99, w=w_default, q=q_default):

                self.c, self.β = c, β
                self.w, self.q = w, q    # store the arguments passed in

            def state_action_values(self, i, v):
                """
                The values of state-action pairs.
                """
                # Simplify names
                c, β, w, q = self.c, self.β, self.w, self.q
                # Evaluate value for each state-action pair
                # Consider action = accept or reject the current offer
                accept = w[i] / (1 - β)
                reject = c + β * np.sum(v * q)

                return np.array([accept, reject])


Based on these defaults, let’s try plotting the first few approximate value functions in the sequence {𝑇ᵏ𝑣}.
We will start from guess 𝑣 given by 𝑣(𝑖) = 𝑤(𝑖)/(1 − 𝛽), which is the value of accepting at
every given wage.
Here’s a function to implement this:

In [8]: def plot_value_function_seq(mcm, ax, num_plots=6):
            """
            Plot a sequence of value functions.

            * mcm is an instance of McCallModel
            * ax is an axes object that implements a plot method.
            """

            n = len(mcm.w)
            v = mcm.w / (1 - mcm.β)
            v_next = np.empty_like(v)
            for i in range(num_plots):
                ax.plot(mcm.w, v, '-', alpha=0.4, label=f"iterate {i}")
                # Update guess (separate index j for the states)
                for j in range(n):
                    v_next[j] = np.max(mcm.state_action_values(j, v))
                v[:] = v_next  # copy contents into v

            ax.legend(loc='lower right')

Now let’s create an instance of McCallModel and call the function:

In [9]: mcm = McCallModel()

        fig, ax = plt.subplots()
        ax.set_xlabel('wage')
        ax.set_ylabel('value')
        plot_value_function_seq(mcm, ax)
        plt.show()

You can see that convergence is occurring: successive iterates are getting closer together.
Here’s a more serious iteration effort to compute the limit, which continues until measured
deviation between successive iterates is below tol.
Once we obtain a good approximation to the limit, we will use it to calculate the reservation
wage.
We’ll be using JIT compilation via Numba to turbocharge our loops.

In [10]: @jit(nopython=True)
         def compute_reservation_wage(mcm,
                                      max_iter=500,
                                      tol=1e-6):

             # Simplify names
             c, β, w, q = mcm.c, mcm.β, mcm.w, mcm.q

             # == First compute the value function == #

             n = len(w)
             v = w / (1 - β)          # initial guess
             v_next = np.empty_like(v)
             i = 0
             error = tol + 1
             while i < max_iter and error > tol:

                 # Use j for the states so the iteration counter i is not overwritten
                 for j in range(n):
                     v_next[j] = np.max(mcm.state_action_values(j, v))

                 error = np.max(np.abs(v_next - v))
                 i += 1

                 v[:] = v_next  # copy contents into v

             # == Now compute the reservation wage == #

             return (1 - β) * (c + β * np.sum(v * q))

The next line computes the reservation wage at the default parameters

In [11]: compute_reservation_wage(mcm)

Out[11]: 47.316499710024964

20.4.4 Comparative Statics

Now that we know how to compute the reservation wage, let’s see how it varies with parameters.
In particular, let’s look at what happens when we change 𝛽 and 𝑐.

In [12]: grid_size = 25
         R = np.empty((grid_size, grid_size))

         c_vals = np.linspace(10.0, 30.0, grid_size)
         β_vals = np.linspace(0.9, 0.99, grid_size)

         for i, c in enumerate(c_vals):
             for j, β in enumerate(β_vals):
                 mcm = McCallModel(c=c, β=β)
                 R[i, j] = compute_reservation_wage(mcm)

In [13]: fig, ax = plt.subplots()

         cs1 = ax.contourf(c_vals, β_vals, R.T, alpha=0.75)
         ctr1 = ax.contour(c_vals, β_vals, R.T)

         plt.clabel(ctr1, inline=1, fontsize=13)
         plt.colorbar(cs1, ax=ax)

         ax.set_title("reservation wage")
         ax.set_xlabel("$c$", fontsize=16)
         ax.set_ylabel("$β$", fontsize=16)

         ax.ticklabel_format(useOffset=False)

         plt.show()

As expected, the reservation wage increases both with patience and with unemployment compensation.

20.5 Computing the Optimal Policy: Take 2

The approach to dynamic programming just described is very standard and broadly applicable.
For this particular problem, there’s also an easier way, which circumvents the need to compute the value function.
Let ℎ denote the continuation value:

$$
h = c + \beta \sum_{s'} v^*(s') q(s') \tag{6}
$$

The Bellman equation can now be written as

$$
v^*(s') = \max \left\{ \frac{w(s')}{1 - \beta}, \; h \right\}
$$

Substituting this last equation into (6) gives

$$
h = c + \beta \sum_{s' \in \mathbb{S}} \max \left\{ \frac{w(s')}{1 - \beta}, \; h \right\} q(s') \tag{7}
$$

This is a nonlinear equation that we can solve for ℎ.


As before, we will use successive approximations:

Step 1: pick an initial guess ℎ.
Step 2: compute the update ℎ′ via

$$
h' = c + \beta \sum_{s' \in \mathbb{S}} \max \left\{ \frac{w(s')}{1 - \beta}, \; h \right\} q(s') \tag{8}
$$

Step 3: calculate the deviation |ℎ − ℎ′|.


Step 4: if the deviation is larger than some fixed tolerance, set ℎ = ℎ′ and go to step 2, else
return ℎ.
Once again, one can use the Banach contraction mapping theorem to show that this process
always converges.
The big difference here, however, is that we’re iterating on a single number, rather than an
𝑛-vector.
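To see that iterating on the scalar ℎ via (8) delivers the same reservation wage as iterating on the full vector via (4), here’s a small plain-NumPy comparison. The wage grid, the uniform probabilities, 𝑐 = 25 and 𝛽 = 0.95 are hypothetical choices for illustration, and the function names are ours.

```python
import numpy as np

def iterate_h(w, q, c, beta, tol=1e-9, max_iter=10_000):
    """Successive approximation on (8): h' = c + beta * sum max{w/(1-beta), h} q."""
    h = c / (1 - beta)                                              # Step 1: initial guess
    for _ in range(max_iter):
        h_new = c + beta * np.dot(np.maximum(w / (1 - beta), h), q)  # Step 2
        if abs(h_new - h) < tol:                                    # Steps 3-4
            return h_new
        h = h_new
    return h

def iterate_v(w, q, c, beta, tol=1e-9, max_iter=10_000):
    """Successive approximation on (4) over the whole n-vector v."""
    v = w / (1 - beta)
    for _ in range(max_iter):
        v_new = np.maximum(w / (1 - beta), c + beta * np.dot(v, q))
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

w = np.linspace(10, 60, 11)          # hypothetical wage grid
q = np.full(w.size, 1 / w.size)      # hypothetical uniform offer distribution
c, beta = 25.0, 0.95

h = iterate_h(w, q, c, beta)
v = iterate_v(w, q, c, beta)
w_bar_scalar = (1 - beta) * h
w_bar_vector = (1 - beta) * (c + beta * np.dot(v, q))
print(w_bar_scalar, w_bar_vector)    # the two agree up to the tolerance
```

The scalar version does one dot product of length 𝑛 per step instead of building a new 𝑛-vector, and the map ℎ ↦ ℎ′ has slope at most 𝛽, so it converges at least as fast.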
Here’s an implementation:

In [14]: @jit(nopython=True)
         def compute_reservation_wage_two(mcm,
                                          max_iter=500,
                                          tol=1e-5):

             # Simplify names
             c, β, w, q = mcm.c, mcm.β, mcm.w, mcm.q

             # == First compute h == #

             h = np.sum(w * q) / (1 - β)
             i = 0
             error = tol + 1
             while i < max_iter and error > tol:

                 s = np.maximum(w / (1 - β), h)
                 h_next = c + β * np.sum(s * q)

                 error = np.abs(h_next - h)
                 i += 1

                 h = h_next

             # == Now compute the reservation wage == #

             return (1 - β) * h

You can use this code to solve the exercise below.

20.6 Exercises

20.6.1 Exercise 1

Compute the average duration of unemployment when 𝛽 = 0.99 and 𝑐 takes the following values

c_vals = np.linspace(10, 40, 25)

That is, start the agent off as unemployed, compute their reservation wage given the parameters, and then simulate to see how long it takes to accept.
Repeat a large number of times and take the average.
Plot mean unemployment duration as a function of 𝑐 in c_vals.
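A useful check on any simulation here: since offers are IID, the number of periods until the first offer with 𝑤 ≥ 𝑤̄ is geometrically distributed with success probability 𝑝 = ℙ{𝑤 ≥ 𝑤̄}, so mean unemployment duration equals 1/𝑝. The sketch below compares a simulated mean with 1/𝑝 for a hypothetical uniform wage grid and an arbitrary threshold of 47; neither comes from the lecture.

```python
import numpy as np

def simulate_duration(w, q, w_bar, rng):
    """Number of periods until the first IID offer drawn from (w, q) satisfies w >= w_bar."""
    t = 1
    while w[rng.choice(w.size, p=q)] < w_bar:
        t += 1
    return t

rng = np.random.default_rng(1234)
w = np.linspace(10, 60, 11)          # hypothetical wage grid: 10, 15, ..., 60
q = np.full(w.size, 1 / w.size)      # hypothetical uniform offer probabilities
w_bar = 47.0                         # hypothetical reservation wage

p_accept = q[w >= w_bar].sum()       # probability that a given offer is accepted
durations = [simulate_duration(w, q, w_bar, rng) for _ in range(20_000)]
sim_mean = np.mean(durations)

print(sim_mean, 1 / p_accept)        # simulated vs. theoretical mean duration
```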

20.6.2 Exercise 2

The purpose of this exercise is to show how to replace the discrete wage offer distribution
used above with a continuous distribution.
This is a significant topic because many convenient distributions are continuous (i.e., have a
density).
Fortunately, the theory changes little in our simple model.
Recall that ℎ in (6) denotes the value of not accepting a job in this period but then behaving optimally in all subsequent periods.
To shift to a continuous offer distribution, we can replace (6) by

$$
h = c + \beta \int v^*(s') q(s') \, ds' \tag{9}
$$

Equation (7) becomes

$$
h = c + \beta \int \max \left\{ \frac{w(s')}{1 - \beta}, \; h \right\} q(s') \, ds' \tag{10}
$$

The aim is to solve this nonlinear equation by iteration, and from it obtain the reservation
wage.
Try to carry this out, setting
• the state sequence {𝑠ₜ} to be IID and standard normal and
• the wage function to be 𝑤(𝑠) = exp(𝜇 + 𝜎𝑠).
You will need to implement a new version of the McCallModel class that assumes a lognormal
wage distribution.
Calculate the integral by Monte Carlo, by averaging over a large number of wage draws.
For default parameters, use c=25, β=0.99, σ=0.5, μ=2.5.
Once your code is working, investigate how the reservation wage changes with 𝑐 and 𝛽.
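Before using Monte Carlo inside the iteration, it can help to confirm that averaging over draws works for a quantity with a known answer. For 𝑤 = exp(𝜇 + 𝜎𝑠) with 𝑠 standard normal, 𝔼𝑤 = exp(𝜇 + 𝜎²/2). The sketch below uses the exercise’s default 𝜇 = 2.5 and 𝜎 = 0.5; the sample size and seed are our own choices.

```python
import numpy as np

mu, sigma = 2.5, 0.5                  # default location and scale from the exercise
rng = np.random.default_rng(1234)     # seed chosen arbitrarily
s = rng.standard_normal(1_000_000)    # IID standard normal states
w_draws = np.exp(mu + sigma * s)      # wage draws w = exp(mu + sigma * s)

mc_estimate = w_draws.mean()          # Monte Carlo estimate of E[w]
exact = np.exp(mu + sigma**2 / 2)     # lognormal mean, about 13.80
print(mc_estimate, exact)
```

The same averaging, applied to max{𝑤/(1 − 𝛽), ℎ}, approximates the integral in (10).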

20.7 Solutions

20.7.1 Exercise 1

Here’s one solution



In [15]: cdf = np.cumsum(q_default)

         @jit(nopython=True)
         def compute_stopping_time(w_bar, seed=1234):

             np.random.seed(seed)
             t = 1
             while True:
                 # Generate a wage draw
                 w = w_default[qe.random.draw(cdf)]
                 # Stop when the draw is at or above the reservation wage
                 if w >= w_bar:
                     stopping_time = t
                     break
                 else:
                     t += 1
             return stopping_time

         @jit(nopython=True)
         def compute_mean_stopping_time(w_bar, num_reps=100000):
             obs = np.empty(num_reps)
             for i in range(num_reps):
                 obs[i] = compute_stopping_time(w_bar, seed=i)
             return obs.mean()

         c_vals = np.linspace(10, 40, 25)
         stop_times = np.empty_like(c_vals)
         for i, c in enumerate(c_vals):
             mcm = McCallModel(c=c)
             w_bar = compute_reservation_wage_two(mcm)
             stop_times[i] = compute_mean_stopping_time(w_bar)

         fig, ax = plt.subplots()

         ax.plot(c_vals, stop_times, label="mean unemployment duration")
         ax.set(xlabel="unemployment compensation", ylabel="months")
         ax.legend()

         plt.show()

20.7.2 Exercise 2

In [16]: mccall_data_continuous = [
             ('c', float64),          # unemployment compensation
             ('β', float64),          # discount factor
             ('σ', float64),          # scale parameter in lognormal distribution
             ('μ', float64),          # location parameter in lognormal distribution
             ('w_draws', float64[:])  # draws of wages for Monte Carlo
         ]

         @jitclass(mccall_data_continuous)
         class McCallModelContinuous:

             def __init__(self, c=25, β=0.99, σ=0.5, μ=2.5, mc_size=1000):

                 self.c, self.β, self.σ, self.μ = c, β, σ, μ

                 # Draw and store shocks
                 np.random.seed(1234)
                 s = np.random.randn(mc_size)
                 self.w_draws = np.exp(μ + σ * s)

         @jit(nopython=True)
         def compute_reservation_wage_continuous(mcmc, max_iter=500, tol=1e-5):

             c, β, σ, μ, w_draws = mcmc.c, mcmc.β, mcmc.σ, mcmc.μ, mcmc.w_draws

             h = np.mean(w_draws) / (1 - β)  # initial guess
             i = 0
             error = tol + 1
             while i < max_iter and error > tol:

                 integral = np.mean(np.maximum(w_draws / (1 - β), h))
                 h_next = c + β * integral

                 error = np.abs(h_next - h)
                 i += 1

                 h = h_next

             # == Now compute the reservation wage == #

             return (1 - β) * h


Now we investigate how the reservation wage changes with 𝑐 and 𝛽.


We will do this using a contour plot.

In [17]: grid_size = 25
         R = np.empty((grid_size, grid_size))

         c_vals = np.linspace(10.0, 30.0, grid_size)
         β_vals = np.linspace(0.9, 0.99, grid_size)

         for i, c in enumerate(c_vals):
             for j, β in enumerate(β_vals):
                 mcmc = McCallModelContinuous(c=c, β=β)
                 R[i, j] = compute_reservation_wage_continuous(mcmc)

In [18]: fig, ax = plt.subplots()

         cs1 = ax.contourf(c_vals, β_vals, R.T, alpha=0.75)
         ctr1 = ax.contour(c_vals, β_vals, R.T)

         plt.clabel(ctr1, inline=1, fontsize=13)
         plt.colorbar(cs1, ax=ax)

         ax.set_title("reservation wage")
         ax.set_xlabel("$c$", fontsize=16)
         ax.set_ylabel("$β$", fontsize=16)

         ax.ticklabel_format(useOffset=False)

         plt.show()
Chapter 21

Job Search II: Search and Separation

21.1 Contents

• Overview 21.2
• The Model 21.3
• Solving the Model 21.4
• Implementation 21.5
• Impact of Parameters 21.6
• Exercises 21.7
• Solutions 21.8
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

21.2 Overview

Previously we looked at the McCall job search model [80] as a way of understanding unem-
ployment and worker decisions.
One unrealistic feature of the model is that every job is permanent.
In this lecture, we extend the McCall model by introducing job separation.
Once separation enters the picture, the agent comes to view
• the loss of a job as a capital loss, and
• a spell of unemployment as an investment in searching for an acceptable job
The other minor addition is that a utility function will be included to make worker prefer-
ences slightly more sophisticated.
We’ll need the following imports

In [2]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline

        from numba import njit, float64
        from numba.experimental import jitclass
        from quantecon.distributions import BetaBinomial

21.3 The Model

The model is similar to the baseline McCall job search model.


It concerns the life of an infinitely lived worker and
• the opportunities he or she (let’s say he to save one character) has to work at different
wages
• exogenous events that destroy his current job
• his decision making process while unemployed
The worker can be in one of two states: employed or unemployed.
He wants to maximize

$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(y_t) \tag{1}
$$

At this stage the only difference from the baseline model is that we’ve added some flexibility
to preferences by introducing a utility function 𝑢.
It satisfies 𝑢′ > 0 and 𝑢″ < 0.

21.3.1 The Wage Process

For now we will drop the separation of state process and wage process that we maintained for
the baseline model.
In particular, we simply suppose that wage offers {𝑤ₜ} are IID with common distribution 𝑞.
The set of possible wage values is denoted by 𝕎.
(Later we will go back to having a separate state process {𝑠ₜ} driving random outcomes, since this formulation is usually convenient in more sophisticated models.)

21.3.2 Timing and Decisions

At the start of each period, the agent can be either


• unemployed or
• employed at some existing wage level 𝑤ₑ.
At the start of a given period, the current wage offer 𝑤ₜ is observed.
If currently employed, the worker

1. receives utility 𝑢(𝑤ₑ) and

2. is fired with some (small) probability 𝛼.

If currently unemployed, the worker either accepts or rejects the current offer 𝑤𝑡 .
If he accepts, then he begins work immediately at wage 𝑤𝑡 .
If he rejects, then he receives unemployment compensation 𝑐.
The process then repeats.
(Note: we do not allow for job search while employed; this topic is taken up in a later lecture.)

21.4 Solving the Model

We drop time subscripts in what follows, and primes denote next-period values.
Let
• 𝑣(𝑤ₑ) be the total lifetime value accruing to a worker who enters the current period employed with existing wage 𝑤ₑ
• ℎ(𝑤) be the total lifetime value accruing to a worker who enters the current period unemployed and receives wage offer 𝑤.
Here value means the value of the objective function (1) when the worker makes optimal decisions at all future points in time.
Our first aim is to obtain these functions.

21.4.1 The Bellman Equations

Suppose for now that the worker can calculate the functions 𝑣 and ℎ and use them in his decision making.
Then 𝑣 and ℎ should satisfy

$$
v(w_e) = u(w_e) + \beta \left[ (1 - \alpha) v(w_e) + \alpha \sum_{w' \in \mathbb{W}} h(w') q(w') \right] \tag{2}
$$

and

$$
h(w) = \max \left\{ v(w), \; u(c) + \beta \sum_{w' \in \mathbb{W}} h(w') q(w') \right\} \tag{3}
$$

Equation (2) expresses the value of being employed at wage 𝑤ₑ in terms of
• current reward 𝑢(𝑤ₑ) plus
• discounted expected reward tomorrow, given the 𝛼 probability of being fired
Equation (3) expresses the value of being unemployed with offer 𝑤 in hand as a maximum over the value of two options: accept or reject the current offer.
Accepting transitions the worker to employment and hence yields reward 𝑣(𝑤).
Rejecting leads to unemployment compensation and unemployment tomorrow.


Equations (2) and (3) are the Bellman equations for this model.
They provide enough information to solve for both 𝑣 and ℎ.

21.4.2 A Simplifying Transformation

Rather than jumping straight into solving these equations, let’s see if we can simplify them
somewhat.
(This process will be analogous to our second pass at the plain vanilla McCall model, where
we simplified the Bellman equation.)
First, let

$$
d := \sum_{w' \in \mathbb{W}} h(w') q(w') \tag{4}
$$

be the expected value of unemployment tomorrow.


We can now write (3) as

$$
h(w) = \max \{ v(w), \; u(c) + \beta d \}
$$

or, shifting time forward one period

$$
\sum_{w' \in \mathbb{W}} h(w') q(w') = \sum_{w' \in \mathbb{W}} \max \{ v(w'), \; u(c) + \beta d \} q(w')
$$

Using (4) again now gives

$$
d = \sum_{w' \in \mathbb{W}} \max \{ v(w'), \; u(c) + \beta d \} q(w') \tag{5}
$$

Finally, (2) can now be rewritten as

$$
v(w) = u(w) + \beta \left[ (1 - \alpha) v(w) + \alpha d \right] \tag{6}
$$

In the last expression, we wrote 𝑤ₑ as 𝑤 to make the notation simpler.

21.4.3 The Reservation Wage

Suppose we can use (5) and (6) to solve for 𝑑 and 𝑣.


(We will do this soon.)
We can then determine optimal behavior for the worker.
From (3), we see that an unemployed agent accepts the current offer 𝑤 if 𝑣(𝑤) ≥ 𝑢(𝑐) + 𝛽𝑑.
This means precisely that the value of accepting is at least as high as the expected value of rejecting.
It is clear that 𝑣 is (at least weakly) increasing in 𝑤, since the agent is never made worse off by a higher wage offer.
Hence, we can express the optimal choice as accepting wage offer 𝑤 if and only if

$$
w \geq \bar w \quad \text{where } \bar w \text{ solves } v(\bar w) = u(c) + \beta d
$$
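Because (6) is linear in 𝑣(𝑤), it can be solved explicitly for any given 𝑑, which makes the reservation wage condition more concrete. The following rearrangement is our own short derivation from (6):

$$
\begin{aligned}
v(w) &= u(w) + \beta \left[ (1 - \alpha) v(w) + \alpha d \right] \\
\bigl(1 - \beta (1 - \alpha)\bigr) v(w) &= u(w) + \alpha \beta d \\
v(w) &= \frac{u(w) + \alpha \beta d}{1 - \beta (1 - \alpha)}
\end{aligned}
$$

Since 1 − 𝛽(1 − 𝛼) > 0 and 𝑢′ > 0, this shows that 𝑣 is strictly increasing in 𝑤. Imposing 𝑣(𝑤̄) = 𝑢(𝑐) + 𝛽𝑑 then gives

$$
u(\bar w) = \bigl(1 - \beta (1 - \alpha)\bigr) \bigl(u(c) + \beta d\bigr) - \alpha \beta d
$$

so, given 𝑑, the reservation wage can be read off by inverting 𝑢 (when the right-hand side lies in the range of 𝑢).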

21.4.4 Solving the Bellman Equations

We’ll use the same iterative approach to solving the Bellman equations that we adopted in
the first job search lecture.
Here this amounts to

1. make guesses for 𝑑 and 𝑣

2. plug these guesses into the right-hand sides of (5) and (6)

3. update the left-hand sides from this rule and then repeat

In other words, we are iterating using the rules

$$
d_{n+1} = \sum_{w' \in \mathbb{W}} \max \{ v_n(w'), \; u(c) + \beta d_n \} q(w') \tag{7}
$$

$$
v_{n+1}(w) = u(w) + \beta \left[ (1 - \alpha) v_n(w) + \alpha d_n \right] \tag{8}
$$

starting from some initial conditions 𝑑₀, 𝑣₀.


As before, the system always converges to the true solutions, in this case the 𝑣 and 𝑑 that solve (5) and (6).
(A proof can be obtained via the Banach contraction mapping theorem.)
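Here’s a minimal plain-NumPy sketch of the iteration (7)–(8), assuming CRRA utility with 𝜎 = 2, a hypothetical grid of 60 wages between 10 and 20 with uniform probabilities, and 𝛼 = 0.2, 𝛽 = 0.98, 𝑐 = 6 (the last three match the defaults in the implementation below). As a check, it compares the limit with 𝑣(𝑤) = (𝑢(𝑤) + 𝛼𝛽𝑑)/(1 − 𝛽(1 − 𝛼)), which follows from rearranging (6).

```python
import numpy as np

def u(x, sigma=2.0):
    """CRRA utility (x**(1 - sigma) - 1) / (1 - sigma), assuming sigma != 1."""
    return (x**(1 - sigma) - 1) / (1 - sigma)

def iterate_bellman(w, q, alpha, beta, c, tol=1e-10, max_iter=10_000):
    """Apply rules (7)-(8) from v_0 = 1, d_0 = 1 until both updates settle."""
    v, d = np.ones_like(w), 1.0
    for _ in range(max_iter):
        d_new = np.sum(np.maximum(v, u(c) + beta * d) * q)     # rule (7)
        v_new = u(w) + beta * ((1 - alpha) * v + alpha * d)    # rule (8)
        error = max(np.max(np.abs(v_new - v)), abs(d_new - d))
        v, d = v_new, d_new
        if error < tol:
            break
    return v, d

w = np.linspace(10, 20, 60)          # hypothetical wage grid
q = np.full(w.size, 1 / w.size)      # hypothetical uniform offer probabilities
alpha, beta, c = 0.2, 0.98, 6.0

v, d = iterate_bellman(w, q, alpha, beta, c)
closed_form = (u(w) + alpha * beta * d) / (1 - beta * (1 - alpha))
print(np.max(np.abs(v - closed_form)))   # small
```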

21.5 Implementation

Let’s implement this iterative process.


In the code, you’ll see that we use a class to store the various parameters and other objects
associated with a given model.
This helps to tidy up the code and provides an object that’s easy to pass to functions.
The default utility function is a CRRA utility function

In [3]: @njit
        def u(c, σ=2.0):
            # CRRA utility; requires σ != 1
            return (c**(1 - σ) - 1) / (1 - σ)

Also, here’s a default wage distribution, based around the BetaBinomial distribution:
In [4]: n = 60                               # n possible outcomes for w
        w_default = np.linspace(10, 20, n)   # wages between 10 and 20
        a, b = 600, 400                      # shape parameters
        dist = BetaBinomial(n - 1, a, b)
        q_default = dist.pdf()

Here’s our jitted class for the McCall model with separation.

In [5]: mccall_data = [
            ('α', float64),      # job separation rate
            ('β', float64),      # discount factor
            ('c', float64),      # unemployment compensation
            ('w', float64[:]),   # list of wage values
            ('q', float64[:])    # pmf of random variable w
        ]

        @jitclass(mccall_data)
        class McCallModel:
            """
            Stores the parameters and functions associated with a given model.
            """

            def __init__(self, α=0.2, β=0.98, c=6.0, w=w_default, q=q_default):

                self.α, self.β, self.c, self.w, self.q = α, β, c, w, q

            def update(self, v, d):

                α, β, c, w, q = self.α, self.β, self.c, self.w, self.q

                v_new = np.empty_like(v)

                for i in range(len(w)):
                    v_new[i] = u(w[i]) + β * ((1 - α) * v[i] + α * d)

                d_new = np.sum(np.maximum(v, u(c) + β * d) * q)

                return v_new, d_new

Now we iterate until successive iterates are closer together than some small tolerance level.
We then return the current iterate as an approximate solution.

In [6]: @njit
        def solve_model(mcm, tol=1e-5, max_iter=2000):
            """
            Iterates to convergence on the Bellman equations

            * mcm is an instance of McCallModel
            """

            v = np.ones_like(mcm.w)    # Initial guess of v
            d = 1                      # Initial guess of d
            i = 0
            error = tol + 1

            while error > tol and i < max_iter:
                v_new, d_new = mcm.update(v, d)
                error_1 = np.max(np.abs(v_new - v))
                error_2 = np.abs(d_new - d)
                error = max(error_1, error_2)
                v = v_new
                d = d_new
                i += 1

            return v, d

21.5.1 The Reservation Wage: First Pass

The optimal choice of the agent is summarized by the reservation wage.


As discussed above, the reservation wage is the 𝑤̄ that solves 𝑣(𝑤̄) = ℎ where ℎ ∶= 𝑢(𝑐) + 𝛽𝑑 is the continuation value.
Let’s compare 𝑣 and ℎ to see what they look like.
We’ll use the default parameterizations found in the code above.

In [7]: mcm = McCallModel()
        v, d = solve_model(mcm)
        h = u(mcm.c) + mcm.β * d

        fig, ax = plt.subplots()

        ax.plot(mcm.w, v, 'b-', lw=2, alpha=0.7, label='$v$')
        ax.plot(mcm.w, [h] * len(mcm.w),
                'g-', lw=2, alpha=0.7, label='$h$')
        ax.set_xlim(min(mcm.w), max(mcm.w))
        ax.legend()

        plt.show()

The value 𝑣 is increasing because higher 𝑤 generates a higher wage flow conditional on stay-
ing employed.

21.5.2 The Reservation Wage: Computation

Here’s a function compute_reservation_wage that takes an instance of McCallModel and returns the associated reservation wage.

In [8]: @njit
        def compute_reservation_wage(mcm):
            """
            Computes the reservation wage of an instance of the McCall model
            by finding the smallest w such that v(w) >= h.

            If no such w exists, then w_bar is set to np.inf.
            """

            v, d = solve_model(mcm)
            h = u(mcm.c) + mcm.β * d

            w_bar = np.inf
            for i, wage in enumerate(mcm.w):
                if v[i] >= h:
                    w_bar = wage
                    break

            return w_bar

Next we will investigate how the reservation wage varies with parameters.

21.6 Impact of Parameters

In each instance below, we’ll show you a figure and then ask you to reproduce it in the exer-
cises.

21.6.1 The Reservation Wage and Unemployment Compensation

First, let’s look at how 𝑤̄ varies with unemployment compensation.


In the figure below, we use the default parameters in the McCallModel class, apart from c
(which takes the values given on the horizontal axis)

As expected, higher unemployment compensation causes the worker to hold out for higher
wages.
In effect, the cost of continuing job search is reduced.

21.6.2 The Reservation Wage and Discounting

Next, let’s investigate how 𝑤̄ varies with the discount factor.


The next figure plots the reservation wage associated with different values of 𝛽

Again, the results are intuitive: More patient workers will hold out for higher wages.

21.6.3 The Reservation Wage and Job Destruction

Finally, let’s look at how 𝑤̄ varies with the job separation rate 𝛼.
Higher 𝛼 translates to a greater chance that a worker will face termination in each period
once employed.

Once more, the results are in line with our intuition.


If the separation rate is high, then the benefit of holding out for a higher wage falls.
Hence the reservation wage is lower.

21.7 Exercises

21.7.1 Exercise 1

Reproduce all the reservation wage figures shown above.


Regarding the values on the horizontal axis, use

In [9]: grid_size = 25
c_vals = np.linspace(2, 12, grid_size) # unemployment compensation
beta_vals = np.linspace(0.8, 0.99, grid_size) # discount factors
alpha_vals = np.linspace(0.05, 0.5, grid_size) # separation rate

21.8 Solutions

21.8.1 Exercise 1

Here’s the first figure.

In [10]: mcm = McCallModel()

         w_bar_vals = np.empty_like(c_vals)

         fig, ax = plt.subplots()

         for i, c in enumerate(c_vals):
             mcm.c = c
             w_bar = compute_reservation_wage(mcm)
             w_bar_vals[i] = w_bar

         ax.set(xlabel='unemployment compensation',
                ylabel='reservation wage')
         ax.plot(c_vals, w_bar_vals, label=r'$\bar w$ as a function of $c$')
         ax.legend()

         plt.show()

Here’s the second one.

In [11]: mcm = McCallModel()  # reset to default parameters
         fig, ax = plt.subplots()

         for i, β in enumerate(beta_vals):
             mcm.β = β
             w_bar = compute_reservation_wage(mcm)
             w_bar_vals[i] = w_bar

         ax.set(xlabel='discount factor', ylabel='reservation wage')
         ax.plot(beta_vals, w_bar_vals, label=r'$\bar w$ as a function of $\beta$')
         ax.legend()

         plt.show()

Here’s the third.

In [12]: mcm = McCallModel()  # reset to default parameters
         fig, ax = plt.subplots()

         for i, α in enumerate(alpha_vals):
             mcm.α = α
             w_bar = compute_reservation_wage(mcm)
             w_bar_vals[i] = w_bar

         ax.set(xlabel='separation rate', ylabel='reservation wage')
         ax.plot(alpha_vals, w_bar_vals, label=r'$\bar w$ as a function of $\alpha$')
         ax.legend()

         plt.show()
Chapter 22

Job Search III: Fitted Value Function Iteration

22.1 Contents

• Overview 22.2
• The Algorithm 22.3
• Implementation 22.4
• Exercises 22.5
• Solutions 22.6
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon
        !pip install interpolation

22.2 Overview

In this lecture we again study the McCall job search model with separation, but now with a
continuous wage distribution.
While we already considered continuous wage distributions briefly in the exercises of the first
job search lecture, the change was relatively trivial in that case.
This is because we were able to reduce the problem to solving for a single scalar value (the
continuation value).
Here, with separation, the change is less trivial, since a continuous wage distribution leads to
an uncountably infinite state space.
The infinite state space leads to additional challenges, particularly when it comes to applying
value function iteration (VFI).
These challenges will lead us to modify VFI by adding an interpolation step.
The combination of VFI and this interpolation step is called fitted value function itera-
tion (fitted VFI).
Fitted VFI is very common in practice, so we will take some time to work through the de-
tails.


We will use the following imports:

In [2]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline

        import quantecon as qe
        from interpolation import interp
        from numpy.random import randn
        from numba import njit, prange, float64, int32
        from numba.experimental import jitclass

22.3 The Algorithm

The model is the same as the McCall model with job separation we studied before, except
that the wage offer distribution is continuous.
We are going to start with the two Bellman equations we obtained for the model with job
separation after a simplifying transformation.
Modified to accommodate continuous wage draws, they take the following form:

$$
d = \int \max \{ v(w'), \; u(c) + \beta d \} q(w') \, dw' \tag{1}
$$

and

$$
v(w) = u(w) + \beta \left[ (1 - \alpha) v(w) + \alpha d \right] \tag{2}
$$

The unknowns here are the function 𝑣 and the scalar 𝑑.


The differences between these and the pair of Bellman equations we previously worked on are

1. in (1), what used to be a sum over a finite number of wage values is an integral over an infinite set.

2. The function 𝑣 in (2) is defined over all 𝑤 ∈ ℝ₊.

The function 𝑞 in (1) is the density of the wage offer distribution.
Its support is taken as equal to ℝ₊.

22.3.1 Value Function Iteration

In theory, we should now proceed as follows:

1. Begin with a guess 𝑣, 𝑑 for the solutions to (1)–(2).
2. Plug 𝑣, 𝑑 into the right hand side of (1)–(2) and compute the left hand side to obtain updates 𝑣′, 𝑑′.
3. Unless some stopping condition is satisfied, set (𝑣, 𝑑) = (𝑣′, 𝑑′) and go to step 2.

However, there is a problem we must confront before we implement this procedure: the iterates of the value function can neither be calculated exactly nor stored on a computer.
To see the issue, consider (2).
Even if 𝑣 is a known function, the only way to store its update 𝑣′ is to record its value 𝑣′(𝑤) for every 𝑤 ∈ ℝ₊.
Clearly, this is impossible.

22.3.2 Fitted Value Function Iteration

What we will do instead is use fitted value function iteration.

The procedure is as follows.
Let a current guess 𝑣 be given.
We record the value of the function 𝑣′ at only finitely many “grid” points 𝑤1 < 𝑤2 < ⋯ < 𝑤𝐼 and then reconstruct 𝑣′ from this information when required.
More precisely, the algorithm will be

1. Begin with an array v representing the values of an initial guess of the value function on some grid points {𝑤𝑖}.
2. Build a function 𝑣 on the state space ℝ+ by interpolation or approximation, based on v and {𝑤𝑖}.
3. Obtain and record the samples of the updated function 𝑣′(𝑤𝑖) on each grid point 𝑤𝑖.
4. Unless some stopping condition is satisfied, take this as the new array and go to step 1.
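Steps 1 to 4 can be sketched compactly for equations (1) and (2) in plain NumPy, separately from the jitted class developed below. The assumptions are ours and are made only for this illustration. `np.interp` replaces `interpolation.interp`; note that `np.interp` holds the end values constant outside the grid instead of extrapolating. The integral in (1) is replaced by a Monte Carlo average over lognormal draws, 𝑢 = log, and the parameter values and grid are placeholders. The function name `fitted_vfi` is hypothetical.

```python
import numpy as np

def fitted_vfi(c=1.0, alpha=0.1, beta=0.96, mu=2.5, sigma=0.5,
               grid_max=50.0, grid_size=100, n_draws=1000,
               tol=1e-5, max_iter=2000, seed=1234):
    """Fitted VFI for (1)-(2): a NumPy sketch with u = log and lognormal offers."""
    rng = np.random.default_rng(seed)
    w_draws = np.exp(mu + sigma * rng.standard_normal(n_draws))   # Monte Carlo offers
    w_grid = np.linspace(1e-10, grid_max, grid_size)              # w_1 < ... < w_I
    u = np.log

    v = np.ones_like(w_grid)      # step 1: array of values on the grid
    d = 1.0
    for n in range(1, max_iter + 1):
        # step 2: extend v off the grid by piecewise linear interpolation
        v_at_draws = np.interp(w_draws, w_grid, v)
        # step 3: record the updates, (1) by Monte Carlo and (2) on the grid
        d_new = np.mean(np.maximum(v_at_draws, u(c) + beta * d))
        v_new = u(w_grid) + beta * ((1 - alpha) * v + alpha * d)
        error = max(np.max(np.abs(v_new - v)), abs(d_new - d))
        v, d = v_new, d_new
        if error < tol:           # step 4: stop, or repeat with the new array
            break
    return w_grid, v, d, n

w_grid, v, d, n_iter = fitted_vfi()
```

At the fixed point, (2) gives 𝑣(𝑤) = (𝑢(𝑤) + 𝛽𝛼𝑑)/(1 − 𝛽(1 − 𝛼)), so the returned array should be close to that expression evaluated at the returned 𝑑.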

How should we go about step 2?

This is a problem of function approximation, and there are many ways to approach it.
What’s important here is that the function approximation scheme must not only produce a good approximation to each 𝑣, but also combine well with the broader iteration algorithm described above.
One good choice in both respects is continuous piecewise linear interpolation.
This method

1. combines well with value function iteration (see, e.g., [44] or [100]) and
2. preserves useful shape properties such as monotonicity and concavity/convexity.
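A quick numerical check of the shape-preservation claim, using NumPy’s `np.interp` as a stand-in piecewise linear interpolator: interpolating the increasing, concave function log on a coarse grid and evaluating on a fine grid gives an approximation whose first differences are nonnegative and whose second differences are nonpositive, up to round-off. The grid choices here are arbitrary.

```python
import numpy as np

nodes = np.linspace(0.5, 5.0, 8)                    # coarse interpolation nodes (arbitrary)
x_fine = np.linspace(0.5, 5.0, 1_000)               # evaluation points
approx = np.interp(x_fine, nodes, np.log(nodes))    # piecewise linear version of log

first_diff = np.diff(approx)
second_diff = np.diff(approx, n=2)

is_monotone = bool(np.all(first_diff >= 0))         # increasing, like log
is_concave = bool(np.all(second_diff <= 1e-12))     # concave, with a round-off tolerance
```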

Linear interpolation will be implemented using a JIT-aware Python interpolation library called interpolation.py.
The next figure illustrates piecewise linear interpolation of an arbitrary function on grid points 0, 0.2, 0.4, 0.6, 0.8, 1.
368 CHAPTER 22. JOB SEARCH III: FITTED VALUE FUNCTION ITERATION

In [3]: def f(x):
            y1 = 2 * np.cos(6 * x) + np.sin(14 * x)
            return y1 + 2.5

        c_grid = np.linspace(0, 1, 6)
        f_grid = np.linspace(0, 1, 150)

        def Af(x):
            return interp(c_grid, f(c_grid), x)

        fig, ax = plt.subplots()

        ax.plot(f_grid, f(f_grid), 'b-', label='true function')
        ax.plot(f_grid, Af(f_grid), 'g-', label='linear approximation')
        ax.vlines(c_grid, c_grid * 0, f(c_grid), linestyle='dashed', alpha=0.5)

        ax.legend(loc="upper center")

        ax.set(xlim=(0, 1), ylim=(0, 6))

        plt.show()
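For the same function f, the sketch below measures the maximum error of the piecewise linear approximation on a fine set of points as the number of nodes grows. It uses `np.interp` in place of `interpolation.interp`; the two should agree at points inside the grid. For a twice-differentiable function, the error of linear interpolation is bounded by ℎ²/8 times the largest value of |𝑓″|, where ℎ is the node spacing, so the error should shrink quickly as nodes are added.

```python
import numpy as np

def f(x):
    y1 = 2 * np.cos(6 * x) + np.sin(14 * x)
    return y1 + 2.5

x_fine = np.linspace(0, 1, 10_001)
max_errors = {}
for n_nodes in (6, 50, 200):
    nodes = np.linspace(0, 1, n_nodes)
    approx = np.interp(x_fine, nodes, f(nodes))               # linear interpolant
    max_errors[n_nodes] = np.max(np.abs(approx - f(x_fine)))  # sup-norm error on x_fine
```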

22.4 Implementation

The first step is to build a jitted class for the McCall model with separation and a continuous
wage offer distribution.
We will take the utility function to be the log function for this application, with 𝑢(𝑐) = ln 𝑐.
We will adopt the lognormal distribution for wages, with 𝑤 = exp(𝜇 + 𝜎𝑧) where 𝑧 is standard normal and 𝜇, 𝜎 are parameters.
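Under this parameterization the mean wage offer is exp(𝜇 + 𝜎²/2), not exp(𝜇); the median is exp(𝜇). The short Monte Carlo check below uses the default values 𝜇 = 2.5 and 𝜎 = 0.5 that appear in `lognormal_draws`. NumPy’s `default_rng` is used here instead of `np.random.seed`, which is our choice for the sketch.

```python
import numpy as np

mu, sigma = 2.5, 0.5                       # defaults used in lognormal_draws
rng = np.random.default_rng(1234)
w = np.exp(mu + sigma * rng.standard_normal(1_000_000))

mc_mean = w.mean()
exact_mean = np.exp(mu + sigma**2 / 2)     # E[w] for a lognormal random variable
rel_error = abs(mc_mean - exact_mean) / exact_mean
```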

In [4]: @njit
        def lognormal_draws(n=1000, μ=2.5, σ=0.5, seed=1234):

            np.random.seed(seed)
            z = np.random.randn(n)
            w_draws = np.exp(μ + σ * z)
            return w_draws

Here’s our class.

In [5]: mccall_data_continuous = [
            ('c', float64),          # unemployment compensation
            ('α', float64),          # job separation rate
            ('β', float64),          # discount factor
            ('w_grid', float64[:]),  # grid of points for fitted VFI
            ('w_draws', float64[:])  # draws of wages for Monte Carlo
        ]

        @jitclass(mccall_data_continuous)
        class McCallModelContinuous:

            def __init__(self,
                         c=1,
                         α=0.1,
                         β=0.96,
                         grid_min=1e-10,
                         grid_max=5,
                         grid_size=100,
                         w_draws=lognormal_draws()):

                self.c, self.α, self.β = c, α, β
                self.w_grid = np.linspace(grid_min, grid_max, grid_size)
                self.w_draws = w_draws

            def update(self, v, d):

                # Simplify names (the lognormal parameters live in lognormal_draws)
                c, α, β = self.c, self.α, self.β
                w = self.w_grid
                u = lambda x: np.log(x)

                # Interpolate array represented value function
                vf = lambda x: interp(w, v, x)

                # Update d using Monte Carlo to evaluate integral
                d_new = np.mean(np.maximum(vf(self.w_draws), u(c) + β * d))

                # Update v
                v_new = u(w) + β * ((1 - α) * v + α * d)

                return v_new, d_new


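The next function applies update repeatedly until neither 𝑣 nor 𝑑 changes by more than a tolerance, and then returns the current iterate as an approximate solution.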

In [6]: @njit
        def solve_model(mcm, tol=1e-5, max_iter=2000):
            """
            Iterates to convergence on the Bellman equations

            * mcm is an instance of McCallModelContinuous

            """

            v = np.ones_like(mcm.w_grid)    # Initial guess of v
            d = 1.0                         # Initial guess of d
            i = 0
            error = tol + 1

            while error > tol and i < max_iter:
                v_new, d_new = mcm.update(v, d)
                error_1 = np.max(np.abs(v_new - v))
                error_2 = np.abs(d_new - d)
                error = max(error_1, error_2)
                v = v_new
                d = d_new
                i += 1

            return v, d

Here’s a function compute_reservation_wage that takes an instance of McCallModelContinuous and returns the associated reservation wage.
Let ℎ = 𝑢(𝑐) + 𝛽𝑑 denote the value of rejecting an offer.
If 𝑣(𝑤) < ℎ for all 𝑤 on the grid, then the function returns np.inf.
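The search loop in compute_reservation_wage can also be written with NumPy array operations. The sketch below runs outside Numba and uses a hypothetical grid and value array; `reservation_wage` is our own helper name. `np.argmax` applied to a boolean mask returns the index of the first `True` entry, which is the smallest grid wage at which accepting is at least as good as rejecting.

```python
import numpy as np

def reservation_wage(w_grid, v, h):
    """Smallest grid wage with v(w) >= h, or np.inf if there is none (sketch)."""
    accept = v >= h                       # boolean mask of acceptable grid points
    if not accept.any():
        return np.inf
    return w_grid[np.argmax(accept)]      # argmax returns the first True index

w_grid = np.linspace(1.0, 5.0, 5)         # hypothetical grid [1, 2, 3, 4, 5]
v = np.log(w_grid) / (1 - 0.96)           # hypothetical increasing value function
```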

In [7]: @njit
        def compute_reservation_wage(mcm):
            """
            Computes the reservation wage of an instance of the McCall model
            by finding the smallest w such that v(w) >= h.

            If no such w exists, then w_bar is set to np.inf.

            """
            u = lambda x: np.log(x)

            v, d = solve_model(mcm)
            h = u(mcm.c) + mcm.β * d

            w_bar = np.inf
            for i, wage in enumerate(mcm.w_grid):
                if v[i] >= h:
                    w_bar = wage
                    break

            return w_bar

The exercises ask you to explore the solution and how it changes with parameters.

22.5 Exercises

22.5.1 Exercise 1

Use the code above to explore what happens to the reservation wage when the wage parameter 𝜇 changes.
Use the default parameters and 𝜇 in mu_vals = np.linspace(0.0, 2.0, 15).
Is the impact on the reservation wage what you expected?

22.5.2 Exercise 2

Let us now consider how the agent responds to an increase in volatility.

To try to understand this, compute the reservation wage when the wage offer distribution is uniform on (𝑚 − 𝑠, 𝑚 + 𝑠) and 𝑠 varies.
The idea here is that we are holding the mean constant and spreading the support.
(This is a form of mean-preserving spread.)
Use s_vals = np.linspace(1.0, 2.0, 15) and m = 2.0.
State how you expect the reservation wage to vary with 𝑠.
Now compute it. Is this as you expected?
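For a uniform distribution on (𝑚 − 𝑠, 𝑚 + 𝑠), the mean is 𝑚 for every 𝑠, while the variance is (2𝑠)²/12 = 𝑠²/3. Increasing 𝑠 therefore spreads the offers without moving their mean. The short simulation below checks this for a few values of 𝑠 in the range used in this exercise.

```python
import numpy as np

m = 2.0
rng = np.random.default_rng(0)
summary = {}
for s in (1.0, 1.5, 2.0):
    draws = rng.uniform(m - s, m + s, size=200_000)
    # (sample mean, sample variance, exact variance s**2 / 3)
    summary[s] = (draws.mean(), draws.var(), s**2 / 3)
```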

22.6 Solutions

22.6.1 Exercise 1

Here is one solution.

In [8]: mcm = McCallModelContinuous()
        mu_vals = np.linspace(0.0, 2.0, 15)
        w_bar_vals = np.empty_like(mu_vals)

        fig, ax = plt.subplots()

        for i, m in enumerate(mu_vals):
            mcm.w_draws = lognormal_draws(μ=m)
            w_bar = compute_reservation_wage(mcm)
            w_bar_vals[i] = w_bar

        ax.set(xlabel='mean', ylabel='reservation wage')
        ax.plot(mu_vals, w_bar_vals, label=r'$\bar w$ as a function of $\mu$')
        ax.legend()

        plt.show()

Not surprisingly, the agent is more inclined to wait when the distribution of offers shifts to
the right.

22.6.2 Exercise 2

Here is one solution.

In [9]: mcm = McCallModelContinuous()
        s_vals = np.linspace(1.0, 2.0, 15)
        m = 2.0
        w_bar_vals = np.empty_like(s_vals)

        fig, ax = plt.subplots()

        for i, s in enumerate(s_vals):
            a, b = m - s, m + s
            mcm.w_draws = np.random.uniform(low=a, high=b, size=10_000)
            w_bar = compute_reservation_wage(mcm)
            w_bar_vals[i] = w_bar

        ax.set(xlabel='volatility', ylabel='reservation wage')
        ax.plot(s_vals, w_bar_vals, label=r'$\bar w$ as a function of wage volatility')
        ax.legend()

        plt.show()

The reservation wage increases with volatility.

One might think that higher volatility would make the agent more inclined to take a given offer, since doing so represents certainty and waiting represents risk.
But job search is like holding an option: the worker is only exposed to upside risk (since, in a free market, no one can force them to take a bad offer).
More volatility means higher upside potential, which encourages the agent to wait.
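The option logic can be checked with a one-period calculation, which is a simplification and not the model itself. Suppose rejecting is worth a fixed amount 𝑘 and the offer 𝑊 is uniform on (𝑎, 𝑏) with 𝑎 = 𝑚 − 𝑠, 𝑏 = 𝑚 + 𝑠 and 𝑎 ≤ 𝑘 ≤ 𝑏. Then the expected payoff from being able to choose is E[max(𝑊, 𝑘)] = [𝑘(𝑘 − 𝑎) + (𝑏² − 𝑘²)/2]/(𝑏 − 𝑎). This rises with 𝑠 even though E[𝑊] = 𝑚 does not change. The value 𝑘 = 2.5 is a hypothetical choice for illustration.

```python
import numpy as np

def expected_max_uniform(k, a, b):
    """E[max(W, k)] for W ~ Uniform(a, b), valid when a <= k <= b."""
    return (k * (k - a) + (b**2 - k**2) / 2) / (b - a)

m, k = 2.0, 2.5                 # m as in Exercise 2; k is a hypothetical value of rejecting
s_vals = (1.0, 1.5, 2.0)
option_vals = [expected_max_uniform(k, m - s, m + s) for s in s_vals]
# option_vals is approximately [2.5625, 2.6667, 2.78125], while E[W] = 2.0 throughout
```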
Chapter 23

Job Search IV: Correlated Wage Offers

23.1 Contents

• Overview 23.2
• The Model 23.3
• Implementation 23.4
• Unemployment Duration 23.5
• Exercises 23.6
• Solutions 23.7
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon
        !pip install interpolation

23.2 Overview

In this lecture we solve a McCall style job search model with persistent and transitory components to wages.
In other words, we relax the unrealistic assumption that randomness in wages is independent
over time.
At the same time, we will go back to assuming that jobs are permanent and no separation
occurs.
This is to keep the model relatively simple as we study the impact of correlation.
We will use the following imports:

In [2]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline

        import quantecon as qe
        from interpolation import interp
        from numpy.random import randn
        from numba import njit, prange, float64
        from numba.experimental import jitclass


23.3 The Model

Wages at each point in time are given by

$$ w_t = \exp(z_t) + y_t $$

where

$$ y_t = \exp(\mu + s \zeta_t) \quad \text{and} \quad z_{t+1} = d + \rho z_t + \sigma \epsilon_{t+1} $$

Here {𝜁𝑡} and {𝜖𝑡} are both IID and standard normal.
The component {𝑦𝑡} is transitory and {𝑧𝑡} is persistent.
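The sketch below is a minimal simulation of this wage process. Its parameter defaults match those used for the JobSearch class below (𝜇 = 0, 𝑠 = 1, 𝑑 = 0, 𝜌 = 0.9, 𝜎 = 0.1); the function name and the starting value 𝑧0 = 0 are our choices. When |𝜌| < 1, the persistent component is a stationary AR(1) with long-run mean 𝑑/(1 − 𝜌) and long-run standard deviation 𝜎/√(1 − 𝜌²), and a long simulated path should roughly reproduce both.

```python
import numpy as np

def simulate_wages(T=100_000, mu=0.0, s=1.0, d=0.0, rho=0.9, sigma=0.1,
                   z0=0.0, seed=42):
    """Simulate w_t = exp(z_t) + y_t (sketch)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)      # shocks to the persistent component
    zeta = rng.standard_normal(T)     # shocks to the transitory component
    z = np.empty(T)
    z[0] = z0
    for t in range(T - 1):
        z[t + 1] = d + rho * z[t] + sigma * eps[t + 1]
    y = np.exp(mu + s * zeta)         # transitory component
    return np.exp(z) + y, z

w, z = simulate_wages()
z_mean_theory = 0.0 / (1 - 0.9)               # d / (1 - rho)
z_sd_theory = 0.1 / np.sqrt(1 - 0.9**2)       # sigma / sqrt(1 - rho**2)
```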
As before, the worker can either

1. accept an offer and work permanently at that wage, or

2. take unemployment compensation 𝑐 and wait till next period.

The value function satisfies the Bellman equation

$$ v^*(w, z) = \max \left\{ \frac{u(w)}{1 - \beta}, \; u(c) + \beta \, \mathbb{E}_z v^*(w', z') \right\} $$

In this expression, 𝑢 is a utility function and 𝔼𝑧 is expectation of next-period variables given current 𝑧.
The variable 𝑧 enters as a state in the Bellman equation because its current value helps predict future wages.

23.3.1 A Simplification

There is a way that we can reduce dimensionality in this problem, which greatly accelerates computation.
To start, let 𝑓∗ be the continuation value function, defined by

$$ f^*(z) := u(c) + \beta \, \mathbb{E}_z v^*(w', z') $$

The Bellman equation can now be written

$$ v^*(w, z) = \max \left\{ \frac{u(w)}{1 - \beta}, \; f^*(z) \right\} $$

Combining the last two expressions, we see that the continuation value function satisfies

$$ f^*(z) = u(c) + \beta \, \mathbb{E}_z \max \left\{ \frac{u(w')}{1 - \beta}, \; f^*(z') \right\} $$

We’ll solve this functional equation for 𝑓∗ by introducing the operator

$$ Qf(z) = u(c) + \beta \, \mathbb{E}_z \max \left\{ \frac{u(w')}{1 - \beta}, \; f(z') \right\} $$

By construction, 𝑓∗ is a fixed point of 𝑄, in the sense that 𝑄𝑓∗ = 𝑓∗.
Under mild assumptions, it can be shown that 𝑄 is a contraction mapping over a suitable space of continuous functions on ℝ.
By Banach’s contraction mapping theorem, this means that 𝑓∗ is the unique fixed point and we can calculate it by iterating with 𝑄 from any reasonable initial condition.
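The key step behind the contraction property is that |max{𝑎, 𝑥} − max{𝑎, 𝑦}| ≤ |𝑥 − 𝑦|. Averaging over next-period outcomes and multiplying by 𝛽 therefore shrinks distances by a factor of at most 𝛽. The sketch below checks this numerically for a simplified scalar version of 𝑄. It assumes, only for illustration, that wages are IID so that 𝑓 does not depend on 𝑧, and the draws standing in for 𝑢(𝑤′)/(1 − 𝛽) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, c = 0.98, 5.0
# Hypothetical draws standing in for u(w') / (1 - beta)
stop_vals = np.log(np.exp(rng.standard_normal(1_000)) + 1.0) / (1 - beta)

def Q_scalar(f):
    """Scalar analogue of Q when f does not depend on z (IID wages assumed)."""
    return np.log(c) + beta * np.mean(np.maximum(stop_vals, f))

# Ratios |Qf - Qg| / |f - g| for many random pairs (f, g)
pairs = rng.uniform(-500.0, 500.0, size=(1_000, 2))
ratios = np.array([abs(Q_scalar(f) - Q_scalar(g)) / abs(f - g) for f, g in pairs])
```

Every ratio should be at most 𝛽, with equality when both 𝑓 and 𝑔 exceed all the stopping values.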
Once we have 𝑓∗, we can solve the search problem by stopping when the reward for accepting exceeds the continuation value, or

$$ \frac{u(w)}{1 - \beta} \geq f^*(z) $$

For utility we take 𝑢(𝑐) = ln(𝑐).
The reservation wage is the wage where equality holds in the last expression.
That is,

$$ \bar w(z) := \exp\left( f^*(z) (1 - \beta) \right) \tag{1} $$
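Equation (1) follows from setting ln(𝑤)/(1 − 𝛽) = 𝑓∗(𝑧) and solving for 𝑤: ln 𝑤̄ = (1 − 𝛽)𝑓∗(𝑧). Because ln is increasing, the worker accepts exactly when 𝑤 ≥ 𝑤̄(𝑧). Here is a quick numerical check with a hypothetical value of 𝑓∗(𝑧):

```python
import numpy as np

beta = 0.98
f_star_z = 120.0                          # hypothetical value of f*(z) at some state z
w_bar = np.exp(f_star_z * (1 - beta))     # equation (1)

# At w = w_bar the worker is indifferent: u(w_bar) / (1 - beta) equals f*(z)
value_at_w_bar = np.log(w_bar) / (1 - beta)

# Offers slightly above w_bar are accepted and offers slightly below are rejected
accept_above = np.log(1.01 * w_bar) / (1 - beta) >= f_star_z
accept_below = np.log(0.99 * w_bar) / (1 - beta) >= f_star_z
```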

Our main aim is to solve for the reservation rule and study its properties and implications.

23.4 Implementation

Let 𝑓 be our initial guess of 𝑓∗.

When we iterate, we use the fitted value function iteration algorithm.
In particular, 𝑓 and all subsequent iterates are stored as a vector of values on a grid.
These values are interpolated into a function as required, using piecewise linear interpolation.
The integral in the definition of 𝑄𝑓 is calculated by Monte Carlo.
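Because the Monte Carlo expectation in 𝑄𝑓 is just a mean over draws, the operator can also be written without explicit loops. The NumPy sketch below mirrors the jitted Q defined later but is not the lecture’s implementation. It uses `np.interp`, which holds 𝑓 constant beyond the grid ends rather than extrapolating linearly, a placeholder grid on [−1, 1], and its own random draws. The name `Q_numpy` is hypothetical.

```python
import numpy as np

def Q_numpy(f, z_grid, e_draws, mu=0.0, s=1.0, d=0.0, rho=0.9, sigma=0.1,
            beta=0.98, c=5.0):
    """Apply Q on the grid, computing the expectation by Monte Carlo (sketch)."""
    e1, e2 = e_draws                                              # shocks for z' and y'
    z_next = d + rho * z_grid[:, None] + sigma * e1[None, :]      # shape (grid, draws)
    go_val = np.interp(z_next, z_grid, f)                         # f(z') by interpolation
    w_next = np.exp(z_next) + np.exp(mu + s * e2[None, :])        # w' = exp(z') + y'
    stop_val = np.log(w_next) / (1 - beta)                        # u(w') / (1 - beta), u = log
    return np.log(c) + beta * np.maximum(stop_val, go_val).mean(axis=1)

rng = np.random.default_rng(1234)
z_grid = np.linspace(-1.0, 1.0, 50)          # placeholder grid
e_draws = rng.standard_normal((2, 500))

f = np.log(5.0) * np.ones_like(z_grid)       # same initial guess as compute_fixed_point
converged = False
for _ in range(2_000):
    f_new = Q_numpy(f, z_grid, e_draws)
    if np.max(np.abs(f_new - f)) < 1e-4:
        converged = True
        break
    f = f_new
f = f_new
w_bar = np.exp(f * (1 - 0.98))               # reservation wage from equation (1)
```

Broadcasting builds a (grid points × draws) array in one step, which trades memory for the explicit double loop.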
The following list helps Numba by providing some type information about the data we will
work with.

In [3]: job_search_data = [
            ('μ', float64),             # transient shock log mean
            ('s', float64),             # transient shock log standard deviation
            ('d', float64),             # shift coefficient of persistent state
            ('ρ', float64),             # correlation coefficient of persistent state
            ('σ', float64),             # state volatility
            ('β', float64),             # discount factor
            ('c', float64),             # unemployment compensation
            ('z_grid', float64[:]),     # grid over the state space
            ('e_draws', float64[:,:])   # Monte Carlo draws for integration
        ]

Here’s a class that stores the data needed to evaluate the right-hand side of the Bellman equation: the parameters, the grid and the Monte Carlo draws.
Default parameter values are embedded in the class.

In [4]: @jitclass(job_search_data)
        class JobSearch:

            def __init__(self,
                         μ=0.0,        # transient shock log mean
                         s=1.0,        # transient shock log standard deviation
                         d=0.0,        # shift coefficient of persistent state
                         ρ=0.9,        # correlation coefficient of persistent state
                         σ=0.1,        # state volatility
                         β=0.98,       # discount factor
                         c=5,          # unemployment compensation
                         mc_size=1000,
                         grid_size=100):

                self.μ, self.s, self.d = μ, s, d
                self.ρ, self.σ, self.β, self.c = ρ, σ, β, c

                # Set up grid
                z_mean = d / (1 - ρ)
                z_sd = np.sqrt(σ / (1 - ρ**2))
                k = 3  # std devs from mean
                a, b = z_mean - k * z_sd, z_mean + k * z_sd
                self.z_grid = np.linspace(a, b, grid_size)

                # Draw and store shocks
                np.random.seed(1234)
                self.e_draws = randn(2, mc_size)

            def parameters(self):
                """
                Return all parameters as a tuple.
                """
                return self.μ, self.s, self.d, \
                       self.ρ, self.σ, self.β, self.c


Next we implement the 𝑄 operator.

In [5]: @njit(parallel=True)
        def Q(js, f_in, f_out):
            """
            Apply the operator Q.

            * js is an instance of JobSearch
            * f_in and f_out are arrays that represent f and Qf respectively

            """

            μ, s, d, ρ, σ, β, c = js.parameters()
            M = js.e_draws.shape[1]

            for i in prange(len(js.z_grid)):
                z = js.z_grid[i]
                expectation = 0.0
                for m in range(M):
                    e1, e2 = js.e_draws[:, m]
                    z_next = d + ρ * z + σ * e1
                    go_val = interp(js.z_grid, f_in, z_next)    # f(z')
                    y_next = np.exp(μ + s * e2)                  # y' draw
                    w_next = np.exp(z_next) + y_next             # w' draw
                    stop_val = np.log(w_next) / (1 - β)
                    expectation += max(stop_val, go_val)
                expectation = expectation / M
                f_out[i] = np.log(c) + β * expectation

Here’s a function to compute an approximation to the fixed point of 𝑄.

In [6]: def compute_fixed_point(js,
                                use_parallel=True,
                                tol=1e-4,
                                max_iter=1000,
                                verbose=True,
                                print_skip=25):

            f_init = np.log(js.c) * np.ones(len(js.z_grid))
            f_out = np.empty_like(f_init)

            # Set up loop
            f_in = f_init
            i = 0
            error = tol + 1

            while i < max_iter and error > tol:
                Q(js, f_in, f_out)
                error = np.max(np.abs(f_in - f_out))
                i += 1
                if verbose and i % print_skip == 0:
                    print(f"Error at iteration {i} is {error}.")
                f_in[:] = f_out

            if i == max_iter:
                print("Failed to converge!")

            if verbose and i < max_iter:
                print(f"\nConverged in {i} iterations.")

            return f_out

Let’s try generating an instance and solving the model.

In [7]: js = JobSearch()

        qe.tic()
        f_star = compute_fixed_point(js, verbose=True)
        qe.toc()

Error at iteration 25 is 0.6540143893175809.


Error at iteration 50 is 0.12643184012381425.
Error at iteration 75 is 0.030376323858035903.
Error at iteration 100 is 0.007581959253982973.
Error at iteration 125 is 0.0019085682645538782.
Error at iteration 150 is 0.00048173786846916755.
Error at iteration 175 is 0.000121400125664195.

Converged in 179 iterations.


TOC: Elapsed: 0:00:6.97

Out[7]: 6.979816436767578

Next we will compute and plot the reservation wage function defined in (1).

In [8]: res_wage_function = np.exp(f_star * (1 - js.β))

        fig, ax = plt.subplots()
        ax.plot(js.z_grid, res_wage_function, label="reservation wage given $z$")
        ax.set(xlabel="$z$", ylabel="wage")
        ax.legend()
        plt.show()

Notice that the reservation wage is increasing in the current state 𝑧.

This is because a higher state leads the agent to predict higher future wages, increasing the option value of waiting.
Let’s try changing unemployment compensation and look at its impact on the reservation wage:

In [9]: c_vals = 1, 2, 3

        fig, ax = plt.subplots()

        for c in c_vals:
            js = JobSearch(c=c)
            f_star = compute_fixed_point(js, verbose=False)
            res_wage_function = np.exp(f_star * (1 - js.β))
            ax.plot(js.z_grid, res_wage_function, label=f"$\\bar w$ at $c = {c}$")

        ax.set(xlabel="$z$", ylabel="wage")
        ax.legend()
        plt.show()

As expected, higher unemployment compensation shifts the reservation wage up at all state
values.

23.5 Unemployment Duration

Next we study how mean unemployment duration varies with unemployment compensation.
For simplicity we’ll fix the initial state at 𝑧𝑡 = 0.

In [10]: def compute_unemployment_duration(js, seed=1234):

             f_star = compute_fixed_point(js, verbose=False)
             μ, s, d, ρ, σ, β, c = js.parameters()
             z_grid = js.z_grid
             np.random.seed(seed)

             @njit
             def f_star_function(z):
                 return interp(z_grid, f_star, z)

             @njit
             def draw_tau(t_max=10_000):
                 z = 0.0
                 t = 0

                 unemployed = True
                 while unemployed and t < t_max:
                     # draw current wage
                     y = np.exp(μ + s * np.random.randn())
                     w = np.exp(z) + y
                     res_wage = np.exp(f_star_function(z) * (1 - β))
                     # if optimal to stop, t is the unemployment duration
                     if w >= res_wage:
                         unemployed = False
                     # else increment data and state
                     else:
                         z = ρ * z + d + σ * np.random.randn()
                         t += 1
                 # t equals t_max if no offer was accepted within t_max periods
                 return t

             @njit(parallel=True)
             def compute_expected_tau(num_reps=100_000):
                 sum_value = 0
                 for i in prange(num_reps):
                     sum_value += draw_tau()
                 return sum_value / num_reps

             return compute_expected_tau()

Let’s test this out with some possible values for unemployment compensation.

In [11]: c_vals = np.linspace(1.0, 10.0, 8)
         durations = np.empty_like(c_vals)
         for i, c in enumerate(c_vals):
             js = JobSearch(c=c)
             τ = compute_unemployment_duration(js)
             durations[i] = τ

Here is a plot of the results.

In [12]: fig, ax = plt.subplots()
         ax.plot(c_vals, durations)
         ax.set_xlabel("unemployment compensation")
         ax.set_ylabel("mean unemployment duration")
         plt.show()

Not surprisingly, unemployment duration increases when unemployment compensation is higher.
This is because the value of waiting increases with unemployment compensation.
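A simple benchmark makes the mechanism explicit. Suppose, as a simplifying assumption, that each period’s offer were accepted with a fixed probability 𝑝; in this lecture 𝑝 actually depends on 𝑧. Then the number of periods spent unemployed before accepting would be geometric with mean (1 − 𝑝)/𝑝. A higher reservation wage lowers 𝑝 and so raises expected duration. The acceptance probabilities below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
p_vals = [0.5, 0.2, 0.1]          # hypothetical acceptance probabilities
sim_means, exact_means = [], []
for p in p_vals:
    # NumPy's geometric counts trials up to and including the first success,
    # so subtract 1 to count rejected periods (duration 0 = accept immediately)
    durations = rng.geometric(p, size=200_000) - 1
    sim_means.append(durations.mean())
    exact_means.append((1 - p) / p)
```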

23.6 Exercises

23.6.1 Exercise 1

Investigate how mean unemployment duration varies with the discount factor 𝛽.
• What is your prior expectation?
• Do your results match up?

23.7 Solutions

23.7.1 Exercise 1

Here is one solution.

In [13]: beta_vals = np.linspace(0.94, 0.99, 8)
         durations = np.empty_like(beta_vals)
         for i, β in enumerate(beta_vals):
             js = JobSearch(β=β)
             τ = compute_unemployment_duration(js)
             durations[i] = τ

In [14]: fig, ax = plt.subplots()
         ax.plot(beta_vals, durations)
         ax.set_xlabel("$\\beta$")
         ax.set_ylabel("mean unemployment duration")
         plt.show()

The figure shows that more patient individuals tend to wait longer before accepting an offer.
Chapter 24

Job Search V: Modeling Career Choice

24.1 Contents

• Overview 24.2
• Model 24.3
• Implementation 24.4
• Exercises 24.5
• Solutions 24.6
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

24.2 Overview

Next, we study a computational problem concerning career and job choices.

The model is originally due to Derek Neal [83].
This exposition draws on the presentation in [72], section 6.5.
We begin with some imports:

In [2]: import numpy as np
        import quantecon as qe
        import matplotlib.pyplot as plt
        %matplotlib inline
        from numba import njit, prange
        from quantecon.distributions import BetaBinomial
        from scipy.special import binom, beta
        from mpl_toolkits.mplot3d.axes3d import Axes3D
        from matplotlib import cm


24.2.1 Model Features

• Career and job within career both chosen to maximize expected discounted wage flow.
• Infinite horizon dynamic programming with two state variables.

24.3 Model

In what follows we distinguish between a career and a job, where

• a career is understood to be a general field encompassing many possible jobs, and
• a job is understood to be a position with a particular firm.

For workers, wages can be decomposed into the contribution of job and career:
• 𝑤𝑡 = 𝜃𝑡 + 𝜖𝑡, where
  – 𝜃𝑡 is the contribution of career at time 𝑡
  – 𝜖𝑡 is the contribution of the job at time 𝑡
At the start of time 𝑡, a worker has the following options:
• retain a current (career, job) pair (𝜃𝑡, 𝜖𝑡), referred to hereafter as “stay put”
• retain a current career 𝜃𝑡 but redraw a job 𝜖𝑡, referred to hereafter as “new job”
• redraw both a career 𝜃𝑡 and a job 𝜖𝑡, referred to hereafter as “new life”
Draws of 𝜃 and 𝜖 are independent of each other and past values, with
• 𝜃𝑡 ∼ 𝐹
• 𝜖𝑡 ∼ 𝐺
Notice that the worker does not have the option to retain a job but redraw a career: starting a new career always requires starting a new job.
A young worker aims to maximize the expected sum of discounted wages


$$ \mathbb{E} \sum_{t=0}^{\infty} \beta^t w_t \tag{1} $$

subject to the choice restrictions specified above.

Let 𝑣(𝜃, 𝜖) denote the value function, which is the maximum of (1) over all feasible (career, job) policies, given the initial state (𝜃, 𝜖).
The value function obeys

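$$ v(\theta, \epsilon) = \max\{I, II, III\} $$

where

$$
\begin{aligned}
I   & = \theta + \epsilon + \beta v(\theta, \epsilon) \\
II  & = \theta + \int \epsilon' G(d\epsilon') + \beta \int v(\theta, \epsilon') G(d\epsilon') \\
III & = \int \theta' F(d\theta') + \int \epsilon' G(d\epsilon') + \beta \int \int v(\theta', \epsilon') G(d\epsilon') F(d\theta')
\end{aligned}
\tag{2}
$$

Evidently 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 correspond to “stay put”, “new job” and “new life”, respectively.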
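On a discrete grid, the three branches in (2) can be evaluated for every point (𝜃𝑖, 𝜖𝑗) at once. The sketch below was written for illustration and is not taken from the lecture. It computes max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼} once with explicit loops, using the same arithmetic as the operator_factory code later in the lecture, with 𝐹 and 𝐺 given as probability vectors on the grids. It then computes the same quantity with NumPy broadcasting, exploiting the fact that 𝐼𝐼𝐼 does not depend on (𝜃, 𝜖) and 𝐼𝐼 depends only on 𝜃. The uniform probabilities and random 𝑣 are assumptions for the check.

```python
import numpy as np

def T_loop(v, theta, eps, F_probs, G_probs, beta):
    """max{I, II, III} at each grid point, written with explicit loops."""
    F_mean, G_mean = theta @ F_probs, eps @ G_probs
    v_new = np.empty_like(v)
    for i in range(len(theta)):
        for j in range(len(eps)):
            I = theta[i] + eps[j] + beta * v[i, j]                   # stay put
            II = theta[i] + G_mean + beta * v[i, :] @ G_probs        # new job
            III = F_mean + G_mean + beta * F_probs @ v @ G_probs     # new life
            v_new[i, j] = max(I, II, III)
    return v_new

def T_vec(v, theta, eps, F_probs, G_probs, beta):
    """The same operator with broadcasting: II depends only on theta, III is a scalar."""
    F_mean, G_mean = theta @ F_probs, eps @ G_probs
    I = theta[:, None] + eps[None, :] + beta * v                  # shape (n, n)
    II = (theta + G_mean + beta * v @ G_probs)[:, None]           # shape (n, 1)
    III = F_mean + G_mean + beta * F_probs @ v @ G_probs          # scalar
    return np.maximum(np.maximum(I, II), III)

n, B, beta = 10, 5.0, 0.95
theta = np.linspace(0, B, n)
eps = np.linspace(0, B, n)
probs = np.full(n, 1 / n)                 # uniform F and G (an assumption for this check)
v = np.random.default_rng(0).uniform(0, 100, size=(n, n))
```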

24.3.1 Parameterization

As in [72], section 6.5, we will focus on a discrete version of the model, parameterized as follows:
• both 𝜃 and 𝜖 take values in the set np.linspace(0, B, grid_size), an even grid of points between 0 and 𝐵 inclusive
• grid_size = 50
• B = 5
• β = 0.95
The distributions 𝐹 and 𝐺 are discrete distributions generating draws from the grid points np.linspace(0, B, grid_size).

A very useful family of discrete distributions is the Beta-binomial family, with probability mass function

$$ p(k \mid n, a, b) = \binom{n}{k} \frac{B(k + a, n - k + b)}{B(a, b)}, \qquad k = 0, \ldots, n $$

where 𝐵(⋅, ⋅) here denotes the Beta function.

Interpretation:
• draw 𝑞 from a Beta distribution with shape parameters (𝑎, 𝑏)
• run 𝑛 independent binary trials, each with success probability 𝑞
• 𝑝(𝑘 | 𝑛, 𝑎, 𝑏) is the probability of 𝑘 successes in these 𝑛 trials
Nice properties:
• very flexible class of distributions, including uniform, symmetric unimodal, etc.
• only three parameters
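The pmf and the two-stage interpretation above can be checked against each other. This sketch uses only the standard library and NumPy: `math.lgamma` computes the Beta function in place of the `scipy.special.beta` used in the lecture. It compares the formula with simulated frequencies from drawing 𝑞 ∼ Beta(𝑎, 𝑏) and then 𝑘 ∼ Binomial(𝑛, 𝑞). With 𝑎 = 𝑏 = 1 the pmf reduces to the uniform distribution 1/(𝑛 + 1), which is the case used by the default CareerWorkerProblem below. The helper names are ours.

```python
import math
import numpy as np

def log_beta_fn(x, y):
    """log B(x, y) via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def beta_binomial_pmf(n, a, b):
    """p(k | n, a, b) = C(n, k) B(k + a, n - k + b) / B(a, b), for k = 0, ..., n."""
    return np.array([
        math.comb(n, k) * math.exp(log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b))
        for k in range(n + 1)
    ])

n, a, b = 10, 2.0, 3.0
pmf = beta_binomial_pmf(n, a, b)

# Two-stage interpretation: q ~ Beta(a, b), then k ~ Binomial(n, q)
rng = np.random.default_rng(0)
q = rng.beta(a, b, size=200_000)
k = rng.binomial(n, q)
freq = np.bincount(k, minlength=n + 1) / k.size
```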
Here’s a figure showing the effect on the pmf of different shape parameters when 𝑛 = 50.

In [3]: def gen_probs(n, a, b):
            probs = np.zeros(n+1)
            for k in range(n+1):
                probs[k] = binom(n, k) * beta(k + a, n - k + b) / beta(a, b)
            return probs

        n = 50
        a_vals = [0.5, 1, 100]
        b_vals = [0.5, 1, 100]
        fig, ax = plt.subplots(figsize=(10, 6))
        for a, b in zip(a_vals, b_vals):
            ab_label = f'$a = {a:.1f}$, $b = {b:.1f}$'
            ax.plot(list(range(0, n+1)), gen_probs(n, a, b), '-o', label=ab_label)
        ax.legend()
        plt.show()

24.4 Implementation

We will first create a class CareerWorkerProblem which will hold the default parameterizations
of the model and an initial guess for the value function.

In [4]: class CareerWorkerProblem:

            def __init__(self,
                         B=5.0,          # Upper bound
                         β=0.95,         # Discount factor
                         grid_size=50,   # Grid size
                         F_a=1,
                         F_b=1,
                         G_a=1,
                         G_b=1):

                self.β, self.grid_size, self.B = β, grid_size, B

                self.θ = np.linspace(0, B, grid_size)     # Set of θ values
                self.ϵ = np.linspace(0, B, grid_size)     # Set of ϵ values

                self.F_probs = BetaBinomial(grid_size - 1, F_a, F_b).pdf()
                self.G_probs = BetaBinomial(grid_size - 1, G_a, G_b).pdf()
                self.F_mean = np.sum(self.θ * self.F_probs)
                self.G_mean = np.sum(self.ϵ * self.G_probs)

                # Store these parameters for str and repr methods
                self._F_a, self._F_b = F_a, F_b
                self._G_a, self._G_b = G_a, G_b

The following function takes an instance of CareerWorkerProblem and returns the corresponding Bellman operator 𝑇 and the greedy policy function.

In this model, 𝑇 is defined by 𝑇𝑣(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}, where 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 are as given in (2).

In [5]: def operator_factory(cw, parallel_flag=True):

            """
            Returns jitted versions of the Bellman operator and the
            greedy policy function

            cw is an instance of ``CareerWorkerProblem``
            """

            θ, ϵ, β = cw.θ, cw.ϵ, cw.β
            F_probs, G_probs = cw.F_probs, cw.G_probs
            F_mean, G_mean = cw.F_mean, cw.G_mean

            @njit(parallel=parallel_flag)
            def T(v):
                "The Bellman operator"

                v_new = np.empty_like(v)

                for i in prange(len(v)):
                    for j in prange(len(v)):
                        v1 = θ[i] + ϵ[j] + β * v[i, j]                     # Stay put
                        v2 = θ[i] + G_mean + β * v[i, :] @ G_probs          # New job
                        v3 = G_mean + F_mean + β * F_probs @ v @ G_probs    # New life
                        v_new[i, j] = max(v1, v2, v3)

                return v_new

            @njit
            def get_greedy(v):
                "Computes the v-greedy policy"

                σ = np.empty(v.shape)

                for i in range(len(v)):
                    for j in range(len(v)):
                        v1 = θ[i] + ϵ[j] + β * v[i, j]
                        v2 = θ[i] + G_mean + β * v[i, :] @ G_probs
                        v3 = G_mean + F_mean + β * F_probs @ v @ G_probs
                        if v1 > max(v2, v3):
                            action = 1
                        elif v2 > max(v1, v3):
                            action = 2
                        else:
                            action = 3
                        σ[i, j] = action

                return σ

            return T, get_greedy

Lastly, solve_model will take an instance of CareerWorkerProblem and iterate using the Bellman operator to find the fixed point of the value function.

In [6]: def solve_model(cw,
                        use_parallel=True,
                        tol=1e-4,
                        max_iter=1000,
                        verbose=True,
                        print_skip=25):

            T, _ = operator_factory(cw, parallel_flag=use_parallel)

            # Set up loop
            v = np.ones((cw.grid_size, cw.grid_size)) * 100  # Initial guess
            i = 0
            error = tol + 1

            while i < max_iter and error > tol:
                v_new = T(v)
                error = np.max(np.abs(v - v_new))
                i += 1
                if verbose and i % print_skip == 0:
                    print(f"Error at iteration {i} is {error}.")
                v = v_new

            if i == max_iter:
                print("Failed to converge!")

            if verbose and i < max_iter:
                print(f"\nConverged in {i} iterations.")

            return v_new

Here’s the solution to the model, an approximate value function:

In [7]: cw = CareerWorkerProblem()
        T, get_greedy = operator_factory(cw)
        v_star = solve_model(cw, verbose=False)
        greedy_star = get_greedy(v_star)

        fig = plt.figure(figsize=(8, 6))
        ax = fig.add_subplot(111, projection='3d')
        tg, eg = np.meshgrid(cw.θ, cw.ϵ)
        ax.plot_surface(tg,
                        eg,
                        v_star.T,
                        cmap=cm.jet,
                        alpha=0.5,
                        linewidth=0.25)
        ax.set(xlabel='θ', ylabel='ϵ', zlim=(150, 200))
        ax.view_init(ax.elev, 225)
        plt.show()

And here is the optimal policy:

In [8]: fig, ax = plt.subplots(figsize=(6, 6))
        tg, eg = np.meshgrid(cw.θ, cw.ϵ)
        lvls = (0.5, 1.5, 2.5, 3.5)
        ax.contourf(tg, eg, greedy_star.T, levels=lvls, cmap=cm.winter, alpha=0.5)
        ax.contour(tg, eg, greedy_star.T, colors='k', levels=lvls, linewidths=2)
        ax.set(xlabel='θ', ylabel='ϵ')
        ax.text(1.8, 2.5, 'new life', fontsize=14)
        ax.text(4.5, 2.5, 'new job', fontsize=14, rotation='vertical')
        ax.text(4.0, 4.5, 'stay put', fontsize=14)
        plt.show()

Interpretation:
• If both job and career are poor or mediocre, the worker will experiment with a new job and new career.
• If career is sufficiently good, the worker will hold it and experiment with new jobs until a sufficiently good one is found.
• If both job and career are good, the worker will stay put.
Notice that the worker will always hold on to a sufficiently good career, but not necessarily hold on to even the best-paying job.
The reason is that high lifetime wages require both variables to be large, and the worker cannot change careers without changing jobs.
• Sometimes a good job must be sacrificed in order to change to a better career.

24.5 Exercises

24.5.1 Exercise 1

Using the default parameterization in the class CareerWorkerProblem, generate and plot typical sample paths for 𝜃 and 𝜖 when the worker follows the optimal policy.

In particular, modulo randomness, reproduce the following figure (where the horizontal axis represents time)

Hint: To generate the draws from the distributions 𝐹 and 𝐺, use quantecon.random.draw().
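`quantecon.random.draw` samples an index from a discrete distribution given its cumulative distribution function. This is why the solutions below pass np.cumsum(cw.F_probs) rather than the probabilities themselves. The underlying technique is inverse-CDF sampling. A plain NumPy version is sketched below; it is not quantecon’s code, and `draw_from_cdf` is a hypothetical name.

```python
import numpy as np

def draw_from_cdf(cdf, rng):
    """Return an index drawn according to a discrete CDF (inverse-CDF sampling)."""
    u = rng.random()                                   # uniform on [0, 1)
    idx = int(np.searchsorted(cdf, u, side="right"))   # first index with cdf[idx] > u
    return min(idx, len(cdf) - 1)                      # guard against round-off in cdf[-1]

probs = np.array([0.1, 0.2, 0.3, 0.4])
cdf = np.cumsum(probs)
rng = np.random.default_rng(0)
draws = np.array([draw_from_cdf(cdf, rng) for _ in range(100_000)])
freq = np.bincount(draws, minlength=len(probs)) / draws.size
```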

24.5.2 Exercise 2

Let’s now consider how long it takes for the worker to settle down to a permanent job, given
a starting point of (𝜃, 𝜖) = (0, 0).
In other words, we want to study the distribution of the random variable

𝑇 ∗ ∶= the first point in time from which the worker’s job no longer changes

Evidently, the worker’s job becomes permanent if and only if (𝜃𝑡 , 𝜖𝑡 ) enters the “stay put”
region of (𝜃, 𝜖) space.
Letting 𝑆 denote this region, 𝑇 ∗ can be expressed as the first passage time to 𝑆 under the
optimal policy:

𝑇 ∗ ∶= inf{𝑡 ≥ 0 | (𝜃𝑡 , 𝜖𝑡 ) ∈ 𝑆}

Collect 25,000 draws of this random variable and compute the median (which should be
about 7).
Repeat the exercise with 𝛽 = 0.99 and interpret the change.
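In the discrete model, 𝑇∗ can be simulated by following the policy until the state enters 𝑆. The plain-Python sketch below does this using `rng.choice` with probability vectors instead of `quantecon.random.draw` with CDFs. It is checked on a hypothetical 2 × 2 policy that says “stay put” everywhere except at (0, 0), where the worker takes a new life. Starting from (0, 0) with uniform draws, each redraw returns to (0, 0) with probability 1/4, so 𝑇∗ is geometric on {1, 2, …} with mean 4/3.

```python
import numpy as np

def passage_time(policy, F_probs, G_probs, rng, i=0, j=0):
    """Periods until (θ, ε) first enters the 'stay put' region (policy == 1)."""
    t = 0
    while policy[i, j] != 1:
        if policy[i, j] == 2:                        # new job: redraw ε only
            j = rng.choice(len(G_probs), p=G_probs)
        else:                                        # new life: redraw θ and ε
            i = rng.choice(len(F_probs), p=F_probs)
            j = rng.choice(len(G_probs), p=G_probs)
        t += 1
    return t

# Hypothetical 2 x 2 policy: stay put everywhere except (0, 0)
policy = np.array([[3, 1], [1, 1]])
probs = np.array([0.5, 0.5])
rng = np.random.default_rng(0)
samples = [passage_time(policy, probs, probs, rng) for _ in range(20_000)]
```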

24.5.3 Exercise 3

Set the parameterization to G_a = G_b = 100 and generate a new optimal policy figure; interpret.

24.6 Solutions

24.6.1 Exercise 1

Simulate job/career paths.

In reading the code, recall that optimal_policy[i, j] = policy at (𝜃𝑖, 𝜖𝑗) = either 1, 2 or 3, meaning ‘stay put’, ‘new job’ and ‘new life’.

In [9]: F = np.cumsum(cw.F_probs)
        G = np.cumsum(cw.G_probs)
        v_star = solve_model(cw, verbose=False)
        T, get_greedy = operator_factory(cw)
        greedy_star = get_greedy(v_star)

        def gen_path(optimal_policy, F, G, t=20):
            i = j = 0
            θ_index = []
            ϵ_index = []
            for _ in range(t):
                if optimal_policy[i, j] == 1:      # Stay put
                    pass
                elif optimal_policy[i, j] == 2:    # New job
                    j = int(qe.random.draw(G))
                else:                              # New life
                    i, j = int(qe.random.draw(F)), int(qe.random.draw(G))
                θ_index.append(i)
                ϵ_index.append(j)
            return cw.θ[θ_index], cw.ϵ[ϵ_index]

        fig, axes = plt.subplots(2, 1, figsize=(10, 8))

        for ax in axes:
            θ_path, ϵ_path = gen_path(greedy_star, F, G)
            ax.plot(ϵ_path, label='ϵ')
            ax.plot(θ_path, label='θ')
            ax.set_ylim(0, 6)

        plt.legend()
        plt.show()

24.6.2 Exercise 2

The median for the original parameterization can be computed as follows:

In [10]: cw = CareerWorkerProblem()
         F = np.cumsum(cw.F_probs)
         G = np.cumsum(cw.G_probs)
         T, get_greedy = operator_factory(cw)
         v_star = solve_model(cw, verbose=False)
         greedy_star = get_greedy(v_star)

         @njit
         def passage_time(optimal_policy, F, G):
             t = 0
             i = j = 0
             while True:
                 if optimal_policy[i, j] == 1:    # Stay put
                     return t
                 elif optimal_policy[i, j] == 2:  # New job
                     j = int(qe.random.draw(G))
                 else:                            # New life
                     i, j = int(qe.random.draw(F)), int(qe.random.draw(G))
                 t += 1

         @njit(parallel=True)
         def median_time(optimal_policy, F, G, M=25000):
             samples = np.empty(M)
             for i in prange(M):
                 samples[i] = passage_time(optimal_policy, F, G)
             return np.median(samples)

         median_time(greedy_star, F, G)

Out[10]: 7.0

To compute the median with 𝛽 = 0.99 instead of the default value 𝛽 = 0.95, replace cw = CareerWorkerProblem() with cw = CareerWorkerProblem(β=0.99).

The medians are subject to randomness but should be about 7 and 14 respectively.
Not surprisingly, more patient workers will wait longer to settle down to their final job.

24.6.3 Exercise 3

In [11]: cw = CareerWorkerProblem(G_a=100, G_b=100)


T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)

fig, ax = plt.subplots(figsize=(6, 6))


tg, eg = np.meshgrid(cw.θ, cw.ϵ)
lvls = (0.5, 1.5, 2.5, 3.5)
ax.contourf(tg, eg, greedy_star.T, levels=lvls, cmap=cm.winter, alpha=0.5)
ax.contour(tg, eg, greedy_star.T, colors='k', levels=lvls, linewidths=2)
ax.set(xlabel='θ', ylabel='ϵ')
ax.text(1.8, 2.5, 'new life', fontsize=14)
ax.text(4.5, 2.5, 'new job', fontsize=14, rotation='vertical')
ax.text(4.0, 4.5, 'stay put', fontsize=14)
plt.show()

In the new figure, you see that the region for which the worker stays put has grown because the distribution for 𝜖 has become more concentrated around the mean, making high-paying jobs less likely.
398 CHAPTER 24. JOB SEARCH V: MODELING CAREER CHOICE
Chapter 25

Job Search VI: On-the-Job Search

25.1 Contents

• Overview 25.2
• Model 25.3
• Implementation 25.4
• Solving for Policies 25.5
• Exercises 25.6
• Solutions 25.7
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon
        !pip install interpolation

25.2 Overview

In this section, we solve a simple on-the-job search model based on [72], exercise 6.18, and [62].
Let’s start with some imports:

In [2]: import numpy as np


import scipy.stats as stats
from interpolation import interp
from numba import njit, prange
import matplotlib.pyplot as plt
%matplotlib inline
from math import gamma

25.2.1 Model Features

• job-specific human capital accumulation combined with on-the-job search


• infinite-horizon dynamic programming with one state variable and two controls


25.3 Model

Let 𝑥𝑡 denote the time-𝑡 job-specific human capital of a worker employed at a given firm and
let 𝑤𝑡 denote current wages.
Let 𝑤𝑡 = 𝑥𝑡 (1 − 𝑠𝑡 − 𝜙𝑡 ), where
• 𝜙𝑡 is investment in job-specific human capital for the current role and
• 𝑠𝑡 is search effort, devoted to obtaining new offers from other firms.
For as long as the worker remains in the current job, evolution of {𝑥𝑡 } is given by 𝑥𝑡+1 =
𝑔(𝑥𝑡 , 𝜙𝑡 ).
When search effort at 𝑡 is 𝑠𝑡 , the worker receives a new job offer with probability 𝜋(𝑠𝑡 ) ∈
[0, 1].
The value of the offer, measured in job-specific human capital, is 𝑢𝑡+1 , where {𝑢𝑡 } is IID with
common distribution 𝑓.
The worker can reject the current offer and continue with the existing job.
Hence 𝑥𝑡+1 = 𝑢𝑡+1 if he/she accepts and 𝑥𝑡+1 = 𝑔(𝑥𝑡 , 𝜙𝑡 ) otherwise.
Let 𝑏𝑡+1 ∈ {0, 1} be a binary random variable, where 𝑏𝑡+1 = 1 indicates that the worker
receives an offer at the end of time 𝑡.
We can write

$$x_{t+1} = (1 - b_{t+1}) g(x_t, \phi_t) + b_{t+1} \max\{g(x_t, \phi_t), u_{t+1}\} \qquad (1)$$

Agent’s objective: maximize expected discounted sum of wages via controls {𝑠𝑡 } and {𝜙𝑡 }.
Taking the expectation of 𝑣(𝑥𝑡+1 ) and using (1), the Bellman equation for this problem can
be written as

$$v(x) = \max_{s + \phi \leq 1} \left\{ x(1 - s - \phi) + \beta (1 - \pi(s)) v[g(x, \phi)] + \beta \pi(s) \int v[g(x, \phi) \vee u] f(du) \right\} \qquad (2)$$

Here nonnegativity of 𝑠 and 𝜙 is understood, while 𝑎 ∨ 𝑏 ∶= max{𝑎, 𝑏}.

25.3.1 Parameterization

In the implementation below, we will focus on the parameterization


$$g(x, \phi) = A (x \phi)^{\alpha}, \qquad \pi(s) = \sqrt{s} \qquad \text{and} \qquad f = \text{Beta}(2, 2)$$

with default parameter values


• 𝐴 = 1.4
• 𝛼 = 0.6
• 𝛽 = 0.96
The Beta(2, 2) distribution is supported on (0, 1); it has a unimodal, symmetric density peaked at 0.5.
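To make the law of motion (1) concrete under this parameterization, here is a minimal one-step simulator. It is a sketch rather than part of the lecture's implementation: the names g, pi and next_x are hypothetical, and it assumes the default values A = 1.4 and α = 0.6 listed above.

```python
import numpy as np

# Assumed parameterization from the text: g(x, phi) = A (x phi)^alpha,
# pi(s) = sqrt(s), f = Beta(2, 2), with A = 1.4 and alpha = 0.6
A, alpha = 1.4, 0.6

def g(x, phi):
    """Next-period capital if the worker stays in the current job."""
    return A * (x * phi) ** alpha

def pi(s):
    """Probability of receiving an outside offer at search effort s."""
    return np.sqrt(s)

def next_x(x, s, phi, rng):
    """One draw of x_{t+1} from the law of motion (1), given x_t = x."""
    b = rng.uniform() < pi(s)     # b_{t+1} = 1 with probability pi(s)
    u = rng.beta(2, 2)            # offer value u_{t+1} drawn from f
    stay = g(x, phi)
    return max(stay, u) if b else stay   # accept only if the offer is better

rng = np.random.default_rng(1234)
draws = np.array([next_x(0.5, 0.3, 0.4, rng) for _ in range(10_000)])
print(draws.mean(), g(0.5, 0.4))
```

Every draw is at least $g(x, \phi)$, because the worker switches only when the offer is better; with $s = 0$ no offer arrives and next_x returns $g(x, \phi)$ exactly.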

25.3.2 Back-of-the-Envelope Calculations

Before we solve the model, let’s make some quick calculations that provide intuition on what
the solution should look like.
To begin, observe that the worker has two instruments to build capital and hence wages:

1. invest in capital specific to the current job via 𝜙

2. search for a new job with better job-specific capital match via 𝑠

Since wages are 𝑥(1 − 𝑠 − 𝜙), marginal cost of investment via either 𝜙 or 𝑠 is identical.
Our risk-neutral worker should focus on whatever instrument has the highest expected return.
The relative expected return will depend on 𝑥.
For example, suppose first that 𝑥 = 0.05
• If 𝑠 = 1 and 𝜙 = 0, then since 𝑔(𝑥, 𝜙) = 0, taking expectations of (1) gives expected
next period capital equal to 𝜋(𝑠)𝔼𝑢 = 𝔼𝑢 = 0.5.
• If 𝑠 = 0 and 𝜙 = 1, then next period capital is 𝑔(𝑥, 𝜙) = 𝑔(0.05, 1) ≈ 0.23.
Both rates of return are good, but the return from search is better.
Next, suppose that 𝑥 = 0.4
• If 𝑠 = 1 and 𝜙 = 0, then expected next period capital is again 0.5
• If 𝑠 = 0 and 𝜙 = 1, then 𝑔(𝑥, 𝜙) = 𝑔(0.4, 1) ≈ 0.8
Return from investment via 𝜙 dominates expected return from search.
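The numbers above are easy to reproduce. The snippet below is a rough check, assuming the defaults A = 1.4, α = 0.6 and f = Beta(2, 2); g is a hypothetical helper name, and SciPy's stats.beta supplies the mean of $u$.

```python
from scipy import stats

A, alpha = 1.4, 0.6

def g(x, phi):
    return A * (x * phi) ** alpha

# Pure search (s = 1, phi = 0): g(x, 0) = 0 and pi(1) = 1, so taking
# expectations of (1) gives E[max(0, u)] = E[u] with u ~ Beta(2, 2)
search_return = stats.beta(2, 2).mean()

for x in (0.05, 0.4):
    invest_return = g(x, 1.0)   # pure investment (s = 0, phi = 1)
    print(f"x = {x}: search {search_return:.2f}, invest {invest_return:.2f}")
```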
Combining these observations gives us two informal predictions:

1. At any given state 𝑥, the two controls 𝜙 and 𝑠 will function primarily as substitutes —
worker will focus on whichever instrument has the higher expected return.

2. For sufficiently small 𝑥, search will be preferable to investment in job-specific human capital. For larger 𝑥, the reverse will be true.
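Under the same one-period comparison, the switch between the two instruments happens roughly where $A x^{\alpha}$ equals $\mathbb{E} u = 0.5$. The sketch below, assuming the defaults A = 1.4 and α = 0.6, locates that point with SciPy's brentq and checks it against the closed form $(\mathbb{E} u / A)^{1/\alpha}$. It ignores continuation values and interior mixes of $s$ and $\phi$, so treat it as a heuristic rather than a property of the optimal policy.

```python
from scipy.optimize import brentq

A, alpha = 1.4, 0.6
mean_u = 0.5   # mean of Beta(2, 2)

# One-period comparison only: pure investment yields g(x, 1) = A x^alpha,
# pure search yields E[u]. The switch point solves A x^alpha = E[u].
x_switch = brentq(lambda x: A * x ** alpha - mean_u, 1e-8, 1.0)
closed_form = (mean_u / A) ** (1 / alpha)
print(x_switch, closed_form)
```

The root is about 0.18.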

Now let’s turn to implementation, and see if we can match our predictions.

25.4 Implementation

We will set up a class JVWorker that holds the parameters of the model described above

In [3]: class JVWorker:
            r"""
            A Jovanovic-type model of employment with on-the-job search.
            """

            def __init__(self,
                         A=1.4,
                         α=0.6,
                         β=0.96,        # Discount factor
                         π=np.sqrt,     # Search effort function
                         a=2,           # Parameter of f
                         b=2,           # Parameter of f
                         grid_size=50,
                         mc_size=100,
                         ɛ=1e-4):

                self.A, self.α, self.β, self.π = A, α, β, π
                self.mc_size, self.ɛ = mc_size, ɛ

                self.g = njit(lambda x, ϕ: A * (x * ϕ)**α)  # Transition function
                self.f_rvs = np.random.beta(a, b, mc_size)

                # Max of grid is the max of a large quantile value for f and the
                # fixed point y = g(y, 1)
                grid_max = max(A**(1 / (1 - α)), stats.beta(a, b).ppf(1 - ɛ))

                # Human capital
                self.x_grid = np.linspace(ɛ, grid_max, grid_size)
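The comment in JVWorker says that the fixed point of $y \mapsto g(y, 1)$ enters grid_max. Solving $y = A y^{\alpha}$ gives $y = A^{1/(1-\alpha)}$, which is the first argument to max. The sketch below, assuming the defaults A = 1.4 and α = 0.6 and a hypothetical helper g, checks this and shows that iterating the map converges to it. Because Beta(2, 2) is supported on (0, 1), its quantile is below 1, so with these defaults grid_max equals the fixed point, roughly 2.32.

```python
A, alpha = 1.4, 0.6

def g(x, phi):
    return A * (x * phi) ** alpha

y_star = A ** (1 / (1 - alpha))   # candidate fixed point of y -> g(y, 1)

# Iterating y -> g(y, 1) from a small positive start converges to y_star,
# since the slope of the map at the fixed point equals alpha < 1
y = 0.1
for _ in range(200):
    y = g(y, 1.0)

print(y_star, g(y_star, 1.0), y)
```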

The function operator_factory takes an instance of this class and returns a jitted version of the Bellman operator T, i.e.

$$Tv(x) = \max_{s + \phi \leq 1} w(s, \phi)$$

where

$$w(s, \phi) := x(1 - s - \phi) + \beta (1 - \pi(s)) v[g(x, \phi)] + \beta \pi(s) \int v[g(x, \phi) \vee u] f(du) \qquad (3)$$

When we represent 𝑣, it will be with a NumPy array v giving values on grid x_grid.
But to evaluate the right-hand side of (3), we need a function, so we replace the arrays v and
x_grid with a function v_func that gives linear interpolation of v on x_grid.

Inside the for loop, for each x in the grid over the state space, we set up the function 𝑤(𝑧) =
𝑤(𝑠, 𝜙) defined in (3).
The function is maximized over all feasible (𝑠, 𝜙) pairs.
Another function, get_greedy, returns the optimal choice of 𝑠 and 𝜙 at each 𝑥, given a value function.
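The grid search over feasible $(s, \phi)$ pairs can be sketched outside Numba to see the mechanics. This is an illustration only: it replaces the interpolated value function with a hypothetical stand-in $v(y) = y$, uses 1,000 NumPy Beta draws for the integral, and the names w and grid_argmax are not part of the lecture's code. As in the lecture, it assumes $\pi(s) = \sqrt{s}$ and a 15-point grid starting at a small positive ɛ.

```python
import numpy as np

A, alpha, beta = 1.4, 0.6, 0.96
rng = np.random.default_rng(0)
u_draws = rng.beta(2, 2, size=1_000)   # Monte Carlo draws from f

def g(x, phi):
    return A * (x * phi) ** alpha

def v(y):
    # Hypothetical stand-in for a value-function guess, for illustration only
    return y

def w(s, phi, x):
    """Right-hand side of (3) with pi(s) = sqrt(s); integral by Monte Carlo."""
    gx = g(x, phi)
    integral = np.mean(v(np.maximum(gx, u_draws)))
    q = np.sqrt(s) * integral + (1 - np.sqrt(s)) * v(gx)
    return x * (1 - s - phi) + beta * q

def grid_argmax(x, n=15, eps=1e-4):
    """Brute-force search over feasible pairs s + phi <= 1 on an n x n grid."""
    grid = np.linspace(eps, 1, n)
    best_val, best_pair = -np.inf, None
    for s in grid:
        for phi in grid:
            if s + phi > 1:        # skip infeasible pairs
                continue
            val = w(s, phi, x)
            if val > best_val:
                best_val, best_pair = val, (s, phi)
    return best_pair, best_val

pair, val = grid_argmax(0.5)
print(pair, val)
```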

In [4]: def operator_factory(jv, parallel_flag=True):
            """
            Returns a jitted version of the Bellman operator T

            jv is an instance of JVWorker

            """

            π, β = jv.π, jv.β
            x_grid, ɛ, mc_size = jv.x_grid, jv.ɛ, jv.mc_size
            f_rvs, g = jv.f_rvs, jv.g

            @njit
            def state_action_values(z, x, v):
                s, ϕ = z
                v_func = lambda x: interp(x_grid, v, x)

                # Monte Carlo approximation of the integral in (3)
                integral = 0
                for m in range(mc_size):
                    u = f_rvs[m]
                    integral += v_func(max(g(x, ϕ), u))
                integral = integral / mc_size

                q = π(s) * integral + (1 - π(s)) * v_func(g(x, ϕ))
                return x * (1 - ϕ - s) + β * q

            @njit(parallel=parallel_flag)
            def T(v):
                """
                The Bellman operator
                """

                v_new = np.empty_like(v)
                for i in prange(len(x_grid)):
                    x = x_grid[i]

                    # Search on a grid; infeasible pairs (s + ϕ > 1) get value -1
                    search_grid = np.linspace(ɛ, 1, 15)
                    max_val = -1
                    for s in search_grid:
                        for ϕ in search_grid:
                            current_val = state_action_values((s, ϕ), x, v) if s + ϕ <= 1 else -1
                            if current_val > max_val:
                                max_val = current_val
                    v_new[i] = max_val

                return v_new

            @njit
            def get_greedy(v):
                """
                Computes the v-greedy policy of a given function v
                """
                s_policy, ϕ_policy = np.empty_like(v), np.empty_like(v)

                for i in range(len(x_grid)):
                    x = x_grid[i]
                    # Search on a grid
                    search_grid = np.linspace(ɛ, 1, 15)
                    max_val = -1
                    for s in search_grid:
                        for ϕ in search_grid:
                            current_val = state_action_values((s, ϕ), x, v) if s + ϕ <= 1 else -1
                            if current_val > max_val:
                                max_val = current_val
                                max_s, max_ϕ = s, ϕ
                    s_policy[i], ϕ_policy[i] = max_s, max_ϕ
                return s_policy, ϕ_policy

            return T, get_greedy
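In state_action_values, the integral in (3) is replaced by an average over the mc_size draws stored in f_rvs. To get a feel for the Monte Carlo error, the sketch below takes $v$ to be the identity (a hypothetical choice, for illustration only) and sets $g(x, \phi) = 0.5$; the integral is then $\mathbb{E}\max\{0.5, u\}$ with $u \sim$ Beta(2, 2), which equals 0.59375. It compares that value, computed by SciPy quadrature, with Monte Carlo estimates at a few sample sizes.

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(42)
c = 0.5               # plays the role of g(x, phi)
f = stats.beta(2, 2)

# Reference value of E[max(c, u)] by quadrature (v taken to be the identity)
exact, _ = integrate.quad(lambda u: max(c, u) * f.pdf(u), 0, 1, points=[c])

for mc_size in (50, 100, 10_000):
    draws = rng.beta(2, 2, size=mc_size)
    mc = np.mean(np.maximum(c, draws))
    print(f"mc_size = {mc_size:>6}: estimate {mc:.4f}, error {abs(mc - exact):.4f}")
```

With mc_size = 100, the JVWorker default, the standard error of this particular estimate is about 0.013, since the standard deviation of $\max\{0.5, u\}$ is roughly 0.127.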

To solve the model, we will write a function that uses the Bellman operator and iterates to
find a fixed point.

In [5]: def solve_model(jv,
                        use_parallel=True,
                        tol=1e-4,
                        max_iter=1000,
                        verbose=True,
                        print_skip=25):

            """
            Solves the model by value function iteration

            * jv is an instance of JVWorker

            """

            T, _ = operator_factory(jv, parallel_flag=use_parallel)

            # Set up loop
            v = jv.x_grid * 0.5  # Initial condition
            i = 0
            error = tol + 1

            while i < max_iter and error > tol:
                v_new = T(v)
                error = np.max(np.abs(v - v_new))
                i += 1
                if verbose and i % print_skip == 0:
                    print(f"Error at iteration {i} is {error}.")
                v = v_new

            if i == max_iter:
                print("Failed to converge!")

            if verbose and i < max_iter:
                print(f"\nConverged in {i} iterations.")

            return v_new

25.5 Solving for Policies

Let’s generate the optimal policies and see what they look like.

In [6]: jv = JVWorker()
T, get_greedy = operator_factory(jv)
v_star = solve_model(jv)
s_star, ϕ_star = get_greedy(v_star)


Error at iteration 25 is 0.15111122732076243.
Error at iteration 50 is 0.054459990181117135.
Error at iteration 75 is 0.01962720166139853.
Error at iteration 100 is 0.007073579039882816.
Error at iteration 125 is 0.002549294662410162.
Error at iteration 150 is 0.0009187574266391607.
Error at iteration 175 is 0.0003311171601527718.
Error at iteration 200 is 0.00011933353741611086.

Converged in 205 iterations.

Here are the plots:

In [7]: plots = [s_star, ϕ_star, v_star]
        titles = ["s policy", "ϕ policy", "value function"]

        fig, axes = plt.subplots(3, 1, figsize=(12, 12))

        for ax, plot, title in zip(axes, plots, titles):
            ax.plot(jv.x_grid, plot)
            ax.set(title=title)
            ax.grid()

        axes[-1].set_xlabel("x")
        plt.show()

The horizontal axis is the state 𝑥, while the vertical axis gives 𝑠(𝑥) and 𝜙(𝑥).

Overall, the policies match our predictions from above well:
• The worker switches from one investment strategy to the other depending on relative return.
• For low values of 𝑥, the best option is to search for a new job.
• Once 𝑥 is larger, the worker does better by investing in human capital specific to the current position.

25.6 Exercises

25.6.1 Exercise 1

Let’s look at the dynamics for the state process {𝑥𝑡 } associated with these policies.
The dynamics are given by (1) when 𝜙𝑡 and 𝑠𝑡 are chosen according to the optimal policies,
and ℙ{𝑏𝑡+1 = 1} = 𝜋(𝑠𝑡 ).
Since the dynamics are random, analysis is a bit subtle.
One way to do it is to plot, for each 𝑥 in a relatively fine grid called plot_grid, a large number 𝐾 of realizations of 𝑥𝑡+1 given 𝑥𝑡 = 𝑥.
Plot this with one dot for each realization, in the form of a 45 degree diagram, setting

jv = JVWorker(grid_size=25, mc_size=50)
plot_grid_max, plot_grid_size = 1.2, 100
plot_grid = np.linspace(0, plot_grid_max, plot_grid_size)
fig, ax = plt.subplots()
ax.set_xlim(0, plot_grid_max)
ax.set_ylim(0, plot_grid_max)

By examining the plot, argue that under the optimal policies, the state 𝑥𝑡 will converge to a
constant value 𝑥̄ close to unity.
Argue that at the steady state, 𝑠𝑡 ≈ 0 and 𝜙𝑡 ≈ 0.6.

25.6.2 Exercise 2

In the preceding exercise, we found that 𝑠𝑡 converges to zero and 𝜙𝑡 converges to about 0.6.
Since these results were calculated at a value of 𝛽 close to one, let’s compare them to the best
choice for an infinitely patient worker.
Intuitively, an infinitely patient worker would like to maximize steady state wages, which are
a function of steady state capital.
You can take it as given—it’s certainly true—that the infinitely patient worker does not
search in the long run (i.e., 𝑠𝑡 = 0 for large 𝑡).
Thus, given 𝜙, steady state capital is the positive fixed point 𝑥∗ (𝜙) of the map 𝑥 ↦ 𝑔(𝑥, 𝜙).
Steady state wages can be written as 𝑤∗ (𝜙) = 𝑥∗ (𝜙)(1 − 𝜙).
Graph 𝑤∗ (𝜙) with respect to 𝜙, and examine the best choice of 𝜙.
Can you give a rough interpretation for the value that you see?

25.7 Solutions

25.7.1 Exercise 1

Here’s code to produce the 45 degree diagram

In [8]: jv = JVWorker(grid_size=25, mc_size=50)
        π, g, f_rvs, x_grid = jv.π, jv.g, jv.f_rvs, jv.x_grid
        T, get_greedy = operator_factory(jv)
        v_star = solve_model(jv, verbose=False)
        s_policy, ϕ_policy = get_greedy(v_star)

        # Turn the policy function arrays into actual functions
        s = lambda y: interp(x_grid, s_policy, y)
        ϕ = lambda y: interp(x_grid, ϕ_policy, y)

        def h(x, b, u):
            return (1 - b) * g(x, ϕ(x)) + b * max(g(x, ϕ(x)), u)

        plot_grid_max, plot_grid_size = 1.2, 100
        plot_grid = np.linspace(0, plot_grid_max, plot_grid_size)
        fig, ax = plt.subplots(figsize=(8, 8))
        ticks = (0.25, 0.5, 0.75, 1.0)
        ax.set(xticks=ticks, yticks=ticks,
               xlim=(0, plot_grid_max),
               ylim=(0, plot_grid_max),
               xlabel='$x_t$', ylabel='$x_{t+1}$')

        ax.plot(plot_grid, plot_grid, 'k--', alpha=0.6)  # 45 degree line

        for x in plot_grid:
            for i in range(jv.mc_size):
                b = 1 if np.random.uniform(0, 1) < π(s(x)) else 0
                u = f_rvs[i]
                y = h(x, b, u)
                ax.plot(x, y, 'go', alpha=0.25)

        plt.show()

Looking at the dynamics, we can see that
• If 𝑥𝑡 is below about 0.2, the dynamics are random, but 𝑥𝑡+1 > 𝑥𝑡 is very likely.
• As 𝑥𝑡 increases, the dynamics become deterministic, and 𝑥𝑡 converges to a steady state value close to 1.
Referring back to the policy plots in Section 25.5, we see that 𝑥𝑡 ≈ 1 means that 𝑠𝑡 = 𝑠(𝑥𝑡) ≈ 0 and 𝜙𝑡 = 𝜙(𝑥𝑡) ≈ 0.6.

25.7.2 Exercise 2

The figure can be produced as follows

In [9]: jv = JVWorker()

        def xbar(ϕ):
            A, α = jv.A, jv.α
            return (A * ϕ**α)**(1 / (1 - α))

        ϕ_grid = np.linspace(0, 1, 100)
        fig, ax = plt.subplots(figsize=(9, 7))
        ax.set(xlabel=r'$\phi$')
        ax.plot(ϕ_grid, [xbar(ϕ) * (1 - ϕ) for ϕ in ϕ_grid], label=r'$w^*(\phi)$')
        ax.legend()

        plt.show()

Observe that the maximizer is around 0.6.
This is similar to the long-run value for 𝜙 obtained in exercise 1.
Hence the behavior of the infinitely patient worker is similar to that of the worker with 𝛽 = 0.96.
This seems reasonable and helps us confirm that our dynamic programming solutions are
probably correct.
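The value 0.6 can also be pinned down analytically. Since $x^*(\phi) = (A \phi^{\alpha})^{1/(1-\alpha)}$, steady state wages are $w^*(\phi) = A^{1/(1-\alpha)} \phi^{\alpha/(1-\alpha)} (1 - \phi)$. Setting the derivative of $\log w^*$ to zero gives $\alpha / ((1-\alpha)\phi) = 1/(1-\phi)$, that is, $\phi = \alpha$. So with the default α = 0.6 the maximizer is exactly 0.6, which suggests a rough interpretation: the patient worker's long-run investment share equals the exponent α in $g$. The short sketch below, using SciPy's bounded scalar minimizer and hypothetical helper names, confirms this numerically.

```python
from scipy.optimize import minimize_scalar

A, alpha = 1.4, 0.6   # JVWorker defaults

def x_star(phi):
    # Positive fixed point of x -> A (x phi)^alpha
    return (A * phi ** alpha) ** (1 / (1 - alpha))

def w_star(phi):
    # Steady-state wage when s = 0
    return x_star(phi) * (1 - phi)

res = minimize_scalar(lambda phi: -w_star(phi), bounds=(0, 1), method='bounded')
print(res.x)
```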
Part IV

Consumption, Savings and Growth

Chapter 26

Cake Eating I: Introduction to Optimal Saving

26.1 Contents

• Overview 26.2
• The Model 26.3
• The Value Function 26.4
• The Optimal Policy 26.5
• The Euler Equation 26.6
• Exercises 26.7
• Solutions 26.8

26.2 Overview

In this lecture we introduce a simple “cake eating” problem.


The intertemporal problem is: how much to enjoy today and how much to leave for the future?
Although the topic sounds trivial, this kind of trade-off between current and future utility is
at the heart of many savings and consumption problems.
Once we master the ideas in this simple environment, we will apply them to progressively
more challenging—and useful—problems.
The main tool we will use to solve the cake eating problem is dynamic programming.
Readers might find it helpful to review the following lectures before reading this one:
• The shortest paths lecture
• The basic McCall model
• The McCall model with separation
• The McCall model with separation and a continuous wage distribution
In what follows, we require the following imports:

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline


26.3 The Model

We consider an infinite time horizon $t = 0, 1, 2, 3, \ldots$


At $t = 0$ the agent is given a complete cake with size $\bar{x}$.
Let $x_t$ denote the size of the cake at the beginning of each period, so that, in particular, $x_0 = \bar{x}$.
We choose how much of the cake to eat in any given period 𝑡.
After choosing to consume 𝑐𝑡 of the cake in period 𝑡 there is

𝑥𝑡+1 = 𝑥𝑡 − 𝑐𝑡

left in period 𝑡 + 1.
Consuming quantity 𝑐 of the cake gives current utility 𝑢(𝑐).
We adopt the CRRA utility function

$$u(c) = \frac{c^{1-\gamma}}{1-\gamma} \qquad (\gamma > 0, \ \gamma \neq 1) \qquad (1)$$

In Python this is

In [2]: def u(c, γ):
            return c**(1 - γ) / (1 - γ)

Future cake consumption utility is discounted according to 𝛽 ∈ (0, 1).


In particular, consumption of $c$ units $t$ periods hence has present value $\beta^t u(c)$.
The agent’s problem can be written as

$$\max_{\{c_t\}} \sum_{t=0}^{\infty} \beta^t u(c_t) \qquad (2)$$

subject to

$$x_{t+1} = x_t - c_t \quad \text{and} \quad 0 \leq c_t \leq x_t \qquad (3)$$

for all 𝑡.
A consumption path {𝑐𝑡 } satisfying (3) where 𝑥0 = 𝑥̄ is called feasible.
In this problem, the following terminology is standard:
• 𝑥𝑡 is called the state variable
• 𝑐𝑡 is called the control variable or the action
• 𝛽 and 𝛾 are parameters

26.3.1 Trade-Off

The key trade-off in the cake-eating problem is this:


• Delaying consumption is costly because of the discount factor.
• But delaying some consumption is also attractive because 𝑢 is concave.
The concavity of 𝑢 implies that the consumer gains value from consumption smoothing, which
means spreading consumption out over time.
This is because concavity implies diminishing marginal utility—a progressively smaller gain in
utility for each additional spoonful of cake consumed within one period.

26.3.2 Intuition

The reasoning given above suggests that the discount factor 𝛽 and the curvature parameter 𝛾
will play a key role in determining the rate of consumption.
Here’s an educated guess as to what impact these parameters will have.
First, higher 𝛽 implies less discounting, and hence the agent is more patient, which should
reduce the rate of consumption.
Second, higher 𝛾 implies that marginal utility 𝑢′ (𝑐) = 𝑐−𝛾 falls faster with 𝑐.
This suggests more smoothing, and hence a lower rate of consumption.
In summary, we expect the rate of consumption to be decreasing in both parameters.
Let’s see if this is true.

26.4 The Value Function

The first step of our dynamic programming treatment is to obtain the Bellman equation.
The next step is to use it to calculate the solution.

26.4.1 The Bellman Equation

To this end, we let 𝑣(𝑥) be maximum lifetime utility attainable from the current time when 𝑥
units of cake are left.
That is,


$$v(x) = \max \sum_{t=0}^{\infty} \beta^t u(c_t) \qquad (4)$$

where the maximization is over all paths {𝑐𝑡 } that are feasible from 𝑥0 = 𝑥.
At this point, we do not have an expression for 𝑣, but we can still make inferences about it.
For example, as was the case with the McCall model, the value function will satisfy a version
of the Bellman equation.
In the present case, this equation states that 𝑣 satisfies

$$v(x) = \max_{0 \leq c \leq x} \{u(c) + \beta v(x - c)\} \quad \text{for any given } x \geq 0. \qquad (5)$$

The intuition here is essentially the same as it was for the McCall model.
Choosing 𝑐 optimally means trading off current vs future rewards.
Current rewards from choice 𝑐 are just 𝑢(𝑐).
Future rewards given current cake size 𝑥, measured from next period and assuming optimal
behavior, are 𝑣(𝑥 − 𝑐).
These are the two terms on the right hand side of (5), after suitable discounting.
If 𝑐 is chosen optimally using this trade off strategy, then we obtain maximal lifetime rewards
from our current state 𝑥.
Hence, 𝑣(𝑥) equals the right hand side of (5), as claimed.

26.4.2 An Analytical Solution

It has been shown that, with 𝑢 as the CRRA utility function in (1), the function

$$v^*(x_t) = \left(1 - \beta^{1/\gamma}\right)^{-\gamma} u(x_t) \qquad (6)$$

solves the Bellman equation and hence is equal to the value function.
You are asked to confirm that this is true in the exercises below.
The solution (6) depends heavily on the CRRA utility function.
In fact, if we move away from CRRA utility, usually there is no analytical solution at all.
In other words, beyond CRRA utility, we know that the value function still satisfies the Bellman equation, but we do not have a way of writing it explicitly, as a function of the state variable and the parameters.
We will deal with that situation numerically when the time comes.
Here is a Python representation of the value function:

In [3]: def v_star(x, β, γ):
            return (1 - β**(1 / γ))**(-γ) * u(x, γ)

And here’s a figure showing the function for fixed parameters:

In [4]: β, γ = 0.95, 1.2


x_grid = np.linspace(0.1, 5, 100)

fig, ax = plt.subplots()

ax.plot(x_grid, v_star(x_grid, β, γ), label='value function')

ax.set_xlabel('$x$', fontsize=12)
ax.legend(fontsize=12)

plt.show()

26.5 The Optimal Policy

Now that we have the value function, it is straightforward to calculate the optimal action at
each state.
We should choose consumption to maximize the right hand side of the Bellman equation (5).

$$c^* = \arg\max_{c} \{u(c) + \beta v(x - c)\}$$

We can think of this optimal choice as a function of the state 𝑥, in which case we call it the
optimal policy.
We denote the optimal policy by 𝜎∗ , so that

$$\sigma^*(x) := \arg\max_{c} \{u(c) + \beta v(x - c)\} \quad \text{for all } x$$

If we plug the analytical expression (6) for the value function into the right hand side and
compute the optimum, we find that

$$\sigma^*(x) = \left(1 - \beta^{1/\gamma}\right) x \qquad (7)$$

Now let’s recall our intuition on the impact of parameters.


We guessed that the consumption rate would be decreasing in both parameters.
This is in fact the case, as can be seen from (7).
Here are some plots that illustrate this.

In [5]: def c_star(x, β, γ):
            return (1 - β ** (1/γ)) * x

Continuing with the values for 𝛽 and 𝛾 used above, the plot is

In [6]: fig, ax = plt.subplots()


ax.plot(x_grid, c_star(x_grid, β, γ), label='default parameters')
ax.plot(x_grid, c_star(x_grid, β + 0.02, γ), label=r'higher $\beta$')
ax.plot(x_grid, c_star(x_grid, β, γ + 0.2), label=r'higher $\gamma$')
ax.set_ylabel(r'$\sigma(x)$')
ax.set_xlabel('$x$')
ax.legend()

plt.show()
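As a numerical sanity check on (6) and (7), the sketch below maximizes the right hand side of the Bellman equation (5) at a single state with SciPy's minimize_scalar and compares the result with the analytical value function and policy. It assumes the parameter values β = 0.95 and γ = 1.2 used in the figures; the state x = 2.0 and the small offsets that keep $c$ away from 0 and $x$ are arbitrary choices.

```python
from scipy.optimize import minimize_scalar

beta, gamma = 0.95, 1.2   # the values used in the figures above

def u(c):
    return c ** (1 - gamma) / (1 - gamma)

def v_star(x):
    return (1 - beta ** (1 / gamma)) ** (-gamma) * u(x)   # expression (6)

def c_star(x):
    return (1 - beta ** (1 / gamma)) * x                 # expression (7)

x = 2.0
# Maximize the right-hand side of (5) over c by minimizing its negative
res = minimize_scalar(lambda c: -(u(c) + beta * v_star(x - c)),
                      bounds=(1e-10, x - 1e-10), method='bounded',
                      options={'xatol': 1e-10})
c_numeric, rhs_max = res.x, -res.fun
print(c_numeric, c_star(x))
print(rhs_max, v_star(x))
```

The objective is strictly concave in $c$, so the bounded scalar method, which assumes a single interior optimum, is appropriate here.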

26.6 The Euler Equation

In the discussion above we have provided a complete solution to the cake eating problem in
the case of CRRA utility.
There is in fact another way to solve for the optimal policy, based on the so-called Euler
equation.
Although we already have a complete solution, now is a good time to study the Euler equation.
This is because, for more difficult problems, this equation provides key insights that are hard
to obtain by other methods.

26.6.1 Statement and Implications

The Euler equation for the present problem can be stated as

$$u'(c_t^*) = \beta u'(c_{t+1}^*) \qquad (8)$$

This is a necessary condition for the optimal path.


It says that, along the optimal path, marginal rewards are equalized across time, after appropriate discounting.
This makes sense: optimality is obtained by smoothing consumption up to the point where no
marginal gains remain.
We can also state the Euler equation in terms of the policy function.
A feasible consumption policy is a map 𝑥 ↦ 𝜎(𝑥) satisfying 0 ≤ 𝜎(𝑥) ≤ 𝑥.
The last restriction says that we cannot consume more than the remaining quantity of cake.
A feasible consumption policy 𝜎 is said to satisfy the Euler equation if, for all 𝑥 > 0,

𝑢′ (𝜎(𝑥)) = 𝛽𝑢′ (𝜎(𝑥 − 𝜎(𝑥))) (9)

Evidently (9) is just the policy equivalent of (8).


It turns out that a feasible policy is optimal if and only if it satisfies the Euler equation.
In the exercises, you are asked to verify that the optimal policy (7) does indeed satisfy this
functional equation.

Note
A functional equation is an equation where the unknown object is a function.

For a proof of sufficiency of the Euler equation in a very general setting, see proposition 2.2 of
[75].
The following arguments focus on necessity, explaining why an optimal path or policy should
satisfy the Euler equation.
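One quick way to see the Euler equation at work is to evaluate the gap between the two sides of (9) on a grid of states. The sketch below, assuming β = 0.95 and γ = 1.2 and using a hypothetical helper euler_gap, does this for the policy (7) and, for contrast, for the arbitrary feasible policy $\sigma(x) = x/2$.

```python
import numpy as np

beta, gamma = 0.95, 1.2
theta = 1 - beta ** (1 / gamma)

def u_prime(c):
    return c ** (-gamma)

def euler_gap(sigma, x):
    """Left side minus right side of (9) for a policy sigma at states x."""
    return u_prime(sigma(x)) - beta * u_prime(sigma(x - sigma(x)))

x = np.linspace(0.1, 5, 50)
optimal_gap = euler_gap(lambda y: theta * y, x)   # policy (7)
other_gap = euler_gap(lambda y: 0.5 * y, x)       # an arbitrary feasible policy
print(np.max(np.abs(optimal_gap)), np.max(np.abs(other_gap)))
```

The gap for (7) is zero up to rounding error, while for $\sigma(x) = x/2$ it is not: there the right side is $\beta 2^{\gamma}$ times the left side.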

26.6.2 Derivation I: A Perturbation Approach

Let’s write $c$ as a shorthand for the consumption path $\{c_t\}_{t=0}^{\infty}$.

The overall cake-eating maximization problem can be written as


$$\max_{c \in F} U(c) \quad \text{where} \quad U(c) := \sum_{t=0}^{\infty} \beta^t u(c_t)$$

and 𝐹 is the set of feasible consumption paths.


We know that differentiable functions have a zero gradient at an interior maximizer.
So the optimal path $c^* := \{c_t^*\}_{t=0}^{\infty}$ must satisfy $U'(c^*) = 0$.

Note
If you want to know exactly how the derivative 𝑈 ′ (𝑐∗ ) is defined, given that the
argument 𝑐∗ is a vector of infinite length, you can start by learning about Gateaux
derivatives. However, such knowledge is not assumed in what follows.

In other words, the rate of change in 𝑈 must be zero for any infinitesimally small (and feasi-
ble) perturbation away from the optimal path.
So consider a feasible perturbation that reduces consumption at time $t$ to $c_t^* - h$ and increases it in the next period to $c_{t+1}^* + h$.
Consumption does not change in any other period.
We call this perturbed path 𝑐ℎ .
By the preceding argument about zero gradients, we have

$$\lim_{h \to 0} \frac{U(c^h) - U(c^*)}{h} = U'(c^*) = 0$$

Recalling that consumption only changes at 𝑡 and 𝑡 + 1, this becomes

$$\lim_{h \to 0} \frac{\beta^t u(c_t^* - h) + \beta^{t+1} u(c_{t+1}^* + h) - \beta^t u(c_t^*) - \beta^{t+1} u(c_{t+1}^*)}{h} = 0$$

After dividing by $\beta^t$ and rearranging, the same expression can be written as

$$\lim_{h \to 0} \frac{u(c_t^* - h) - u(c_t^*)}{h} + \beta \lim_{h \to 0} \frac{u(c_{t+1}^* + h) - u(c_{t+1}^*)}{h} = 0$$

or, taking the limit,

$$-u'(c_t^*) + \beta u'(c_{t+1}^*) = 0$$

This is just the Euler equation.
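The perturbation argument can be checked numerically. The sketch below, assuming β = 0.95, γ = 1.2 and $\bar{x} = 2$, shifts $h$ units of consumption from $t = 0$ to $t = 1$ along the path generated by (7) and prints the difference quotient $(U(c^h) - U(c^*))/h$. Only periods 0 and 1 change, so the other terms of $U$ cancel and need not be computed.

```python
beta, gamma, x_bar = 0.95, 1.2, 2.0
theta = 1 - beta ** (1 / gamma)

def u(c):
    return c ** (1 - gamma) / (1 - gamma)

c0 = theta * x_bar                 # c_0^* under policy (7)
c1 = theta * (1 - theta) * x_bar   # c_1^* under policy (7)

def delta_U(h):
    """U(c^h) - U(c^*) when h units move from t = 0 to t = 1."""
    return (u(c0 - h) - u(c0)) + beta * (u(c1 + h) - u(c1))

ratios = []
for h in (1e-2, 1e-3, 1e-4, 1e-5, 1e-6):
    ratios.append(delta_U(h) / h)
    print(f"h = {h:.0e}: (U(c^h) - U(c^*)) / h = {ratios[-1]:.2e}")
```

The quotient shrinks roughly in proportion to $h$, as expected when the derivative at $c^*$ is zero, and the change in $U$ is negative for these values of $h$, consistent with $c^*$ being a maximizer.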

26.6.3 Derivation II: Using the Bellman Equation

Another way to derive the Euler equation is to use the Bellman equation (5).
Taking the derivative on the right hand side of the Bellman equation with respect to 𝑐 and
setting it to zero, we get

𝑢′ (𝑐) = 𝛽𝑣′ (𝑥 − 𝑐) (10)

To obtain 𝑣′ (𝑥 − 𝑐), we set 𝑔(𝑐, 𝑥) = 𝑢(𝑐) + 𝛽𝑣(𝑥 − 𝑐), so that, at the optimal choice of con-
sumption,

𝑣(𝑥) = 𝑔(𝑐, 𝑥) (11)

Differentiating both sides while acknowledging that the maximizing consumption will depend
on 𝑥, we get

$$v'(x) = \frac{\partial}{\partial c} g(c, x) \frac{\partial c}{\partial x} + \frac{\partial}{\partial x} g(c, x)$$

When $g(c, x)$ is maximized at $c$, we have $\frac{\partial}{\partial c} g(c, x) = 0$.
Hence the derivative simplifies to

$$v'(x) = \frac{\partial g(c, x)}{\partial x} = \frac{\partial}{\partial x} \beta v(x - c) = \beta v'(x - c) \qquad (12)$$

(This argument is an example of the Envelope Theorem.)


But now an application of (10) gives

𝑢′ (𝑐) = 𝑣′ (𝑥) (13)

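Thus, the derivative of the value function is equal to marginal utility.

Since (13) holds at every state, it also holds at next period's state $x - c$, giving $v'(x - c) = u'(c')$, where $c'$ is next period's optimal consumption. Combining this fact with (12) yields $u'(c) = v'(x) = \beta v'(x - c) = \beta u'(c')$, which recovers the Euler equation.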
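Equation (13) can be checked against the closed forms (6) and (7): differentiating (6) gives $v^{*\prime}(x) = (1 - \beta^{1/\gamma})^{-\gamma} x^{-\gamma}$, which is exactly $u'(\sigma^*(x))$. The sketch below, assuming β = 0.95 and γ = 1.2, confirms this with a central finite difference.

```python
import numpy as np

beta, gamma = 0.95, 1.2
theta = 1 - beta ** (1 / gamma)

def u(c):
    return c ** (1 - gamma) / (1 - gamma)

def u_prime(c):
    return c ** (-gamma)

def v_star(x):
    return theta ** (-gamma) * u(x)      # expression (6)

x = np.linspace(0.5, 5, 10)
eps = 1e-5
dv = (v_star(x + eps) - v_star(x - eps)) / (2 * eps)   # central difference for v*'(x)
marginal_utility = u_prime(theta * x)                  # u'(sigma*(x)) with sigma* from (7)
print(np.max(np.abs(dv / marginal_utility - 1)))
```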

26.7 Exercises

26.7.1 Exercise 1

How does one obtain the expressions for the value function and optimal policy given in (6)
and (7) respectively?
The first step is to make a guess of the functional form for the consumption policy.
So suppose that we do not know the solutions and start with a guess that the optimal policy
is linear.
In other words, we conjecture that there exists a positive 𝜃 such that setting 𝑐𝑡∗ = 𝜃𝑥𝑡 for all 𝑡
produces an optimal path.
Starting from this conjecture, try to obtain the solutions (6) and (7).
In doing so, you will need to use the definition of the value function and the Bellman equation.

26.8 Solutions

26.8.1 Exercise 1

We start with the conjecture 𝑐𝑡∗ = 𝜃𝑥𝑡 , which leads to a path for the state variable (cake size)
given by

𝑥𝑡+1 = 𝑥𝑡 (1 − 𝜃)

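Then $x_t = x_0 (1 - \theta)^t$ and hence

$$
\begin{aligned}
v(x_0) &= \sum_{t=0}^{\infty} \beta^t u(\theta x_t) \\
&= \sum_{t=0}^{\infty} \beta^t u(\theta x_0 (1 - \theta)^t) \\
&= \sum_{t=0}^{\infty} \theta^{1-\gamma} \beta^t (1 - \theta)^{t(1-\gamma)} u(x_0) \\
&= \frac{\theta^{1-\gamma}}{1 - \beta (1 - \theta)^{1-\gamma}} u(x_0)
\end{aligned}
$$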

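From the Bellman equation, then,

$$
\begin{aligned}
v(x) &= \max_{0 \leq c \leq x} \left\{ u(c) + \beta \frac{\theta^{1-\gamma}}{1 - \beta (1 - \theta)^{1-\gamma}} \cdot u(x - c) \right\} \\
&= \max_{0 \leq c \leq x} \left\{ \frac{c^{1-\gamma}}{1-\gamma} + \beta \frac{\theta^{1-\gamma}}{1 - \beta (1 - \theta)^{1-\gamma}} \cdot \frac{(x - c)^{1-\gamma}}{1-\gamma} \right\}
\end{aligned}
$$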

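From the first order condition, we obtain

$$c^{-\gamma} + \beta \frac{\theta^{1-\gamma}}{1 - \beta (1 - \theta)^{1-\gamma}} \cdot (x - c)^{-\gamma} (-1) = 0$$

or

$$c^{-\gamma} = \beta \frac{\theta^{1-\gamma}}{1 - \beta (1 - \theta)^{1-\gamma}} \cdot (x - c)^{-\gamma}$$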

With $c = \theta x$ we get

$$(\theta x)^{-\gamma} = \beta \frac{\theta^{1-\gamma}}{1 - \beta (1 - \theta)^{1-\gamma}} \cdot (x (1 - \theta))^{-\gamma}$$

Some rearrangement produces

$$\theta = 1 - \beta^{1/\gamma}$$

This confirms our earlier expression for the optimal policy:

$$c_t^* = \left(1 - \beta^{1/\gamma}\right) x_t$$

Substituting $\theta$ into the value function above gives

$$v^*(x_t) = \frac{\left(1 - \beta^{1/\gamma}\right)^{1-\gamma}}{1 - \beta \left(\beta^{1/\gamma}\right)^{1-\gamma}} u(x_t)$$

Rearranging gives

$$v^*(x_t) = \left(1 - \beta^{1/\gamma}\right)^{-\gamma} u(x_t)$$

Our claims are now verified.
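The geometric-series step in this derivation can also be checked numerically. The sketch below, assuming β = 0.95, γ = 1.2 and $x_0 = 2$, sums $\beta^t u(\theta x_t)$ along the path $x_t = x_0 (1 - \theta)^t$ with $\theta = 1 - \beta^{1/\gamma}$, truncated after 5,000 periods, and compares the total with (6).

```python
import numpy as np

beta, gamma, x0 = 0.95, 1.2, 2.0
theta = 1 - beta ** (1 / gamma)

def u(c):
    return c ** (1 - gamma) / (1 - gamma)

t = np.arange(5_000)                  # truncation of the infinite sum
x_path = x0 * (1 - theta) ** t        # cake sizes under c_t = theta * x_t
lifetime = np.sum(beta ** t * u(theta * x_path))
closed_form = theta ** (-gamma) * u(x0)   # expression (6)
print(lifetime, closed_form)
```

The truncation is harmless here: with this $\theta$ the terms shrink by the factor $\beta (1 - \theta)^{1-\gamma} = \beta^{1/\gamma} \approx 0.958$ each period.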


Chapter 27

Cake Eating II: Numerical Methods

27.1 Contents

• Overview 27.2
• Reviewing the Model 27.3
• Value Function Iteration 27.4
• Time Iteration 27.5
• Exercises 27.6
• Solutions 27.7
In addition to what’s in Anaconda, this lecture will require the following library:

In [1]: !pip install interpolation

27.2 Overview

In this lecture we continue the study of the cake eating problem.


The aim of this lecture is to solve the problem using numerical methods.
At first this might appear unnecessary, since we already obtained the optimal policy analytically.
However, the cake eating problem is too simple to be useful without modifications, and once
we start modifying the problem, numerical methods become essential.
Hence it makes sense to introduce numerical methods now, and test them on this simple
problem.
Since we know the analytical solution, this will allow us to assess the accuracy of alternative
numerical methods.
We will use the following imports:

In [2]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

from interpolation import interp


from scipy.optimize import minimize_scalar, bisect


27.3 Reviewing the Model

You might like to review the details before we start.


Recall in particular that the Bellman equation is

$$v(x) = \max_{0 \leq c \leq x} \{u(c) + \beta v(x - c)\} \quad \text{for all } x \geq 0. \qquad (1)$$

where 𝑢 is the CRRA utility function.


The analytical solutions for the value function and optimal policy were found to be as follows.

In [3]: def c_star(x, β, γ):
            return (1 - β ** (1/γ)) * x

        def v_star(x, β, γ):
            return (1 - β**(1 / γ))**(-γ) * (x**(1-γ) / (1-γ))

Our first aim is to obtain these analytical solutions numerically.

27.4 Value Function Iteration

The first approach we will take is value function iteration.


This is a form of successive approximation, and was discussed in our lecture on job
search.
The basic idea is:

1. Take an arbitrary initial guess of $v$.

2. Obtain an update $w$ defined by

$$w(x) = \max_{0 \leq c \leq x} \{u(c) + \beta v(x - c)\}$$

3. Stop if $w$ is approximately equal to $v$, otherwise set $v = w$ and go back to step 2.

Let’s write this a bit more mathematically.

27.4.1 The Bellman Operator

We introduce the Bellman operator $T$ that takes a function $v$ as an argument and returns a new function $Tv$ defined by

$$Tv(x) = \max_{0 \leq c \leq x} \{u(c) + \beta v(x - c)\}$$

From $v$ we get $Tv$, and applying $T$ to this yields $T^2 v := T(Tv)$ and so on.


This is called iterating with the Bellman operator from initial guess $v$.
As we discuss in more detail in later lectures, one can use Banach’s contraction mapping theorem to prove that the sequence of functions $T^n v$ converges to the solution to the Bellman equation.
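The contraction property is easy to observe numerically, even for a fitted version of $T$ of the kind discussed in the next subsection. Here is a minimal sketch, not the lecture's implementation: it uses NumPy's np.interp for linear interpolation, assumes the CakeEating defaults (β = 0.96, γ = 1.5, a 120-point grid on [0.001, 2.5]), and maximizes over a hypothetical grid of consumption fractions instead of calling a scalar optimizer. Starting from two random guesses, it applies the operator repeatedly and records the ratio of sup-norm distances after and before each step.

```python
import numpy as np

beta, gamma = 0.96, 1.5                     # CakeEating defaults
x_grid = np.linspace(1e-3, 2.5, 120)        # CakeEating default grid
fractions = np.linspace(0.01, 1.0, 200)     # hypothetical choice grid: c = fraction * x

def u(c):
    return c ** (1 - gamma) / (1 - gamma)

def T_hat(v):
    """Fitted Bellman operator: interpolate v linearly, then maximize over a grid of c."""
    v_new = np.empty_like(v)
    for i, x in enumerate(x_grid):
        c = fractions * x
        # np.interp holds v constant below x_grid[0], e.g. when c = x
        values = u(c) + beta * np.interp(x - c, x_grid, v)
        v_new[i] = values.max()
    return v_new

def sup_dist(a, b):
    return np.max(np.abs(a - b))

rng = np.random.default_rng(0)
v, w = rng.normal(size=x_grid.size), rng.normal(size=x_grid.size)

ratios = []
for _ in range(5):
    Tv, Tw = T_hat(v), T_hat(w)
    ratios.append(sup_dist(Tv, Tw) / sup_dist(v, w))
    v, w = Tv, Tw
print(ratios)
```

Each ratio should be at most β: linear interpolation with np.interp's constant extension beyond the grid is a weighted average of grid values, so it cannot magnify sup-norm distances, and taking a maximum over the same choices cannot either. This illustrates the contraction property for one example; it is not a proof.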

27.4.2 Fitted Value Function Iteration

Both consumption $c$ and the state variable $x$ are continuous.


This causes complications when it comes to numerical work.
For example, we need to store each function 𝑇 𝑛 𝑣 in order to compute the next iterate 𝑇 𝑛+1 𝑣.
But this means we have to store 𝑇 𝑛 𝑣(𝑥) at infinitely many 𝑥, which is, in general, impossible.
To circumvent this issue we will use fitted value function iteration, as discussed previously in
one of the lectures on job search.
The process looks like this:

1. Begin with an array of values $\{v_0, \ldots, v_I\}$ representing the values of some initial function $v$ on the grid points $\{x_0, \ldots, x_I\}$.

2. Build a function $\hat{v}$ on the state space $\mathbb{R}_+$ by linear interpolation, based on these data points.

3. Obtain and record the value $T\hat{v}(x_i)$ on each grid point $x_i$ by repeatedly solving the maximization problem in the Bellman equation.

4. Unless some stopping condition is satisfied, set $\{v_0, \ldots, v_I\} = \{T\hat{v}(x_0), \ldots, T\hat{v}(x_I)\}$ and go to step 2.

In step 2 we’ll use continuous piecewise linear interpolation.
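Steps 2 and 3 can be illustrated with NumPy’s `np.interp`, which performs continuous piecewise linear interpolation. In this sketch the maximization is done by brute force over a grid of candidate consumption levels rather than with SciPy; the function name `fitted_bellman_step` and the candidate-grid size `n_c` are assumptions. Note that `np.interp` holds the end values constant outside the grid, so a point `x - c` below `x_grid[0]` is assigned the value 𝑣₀.

```python
import numpy as np

def fitted_bellman_step(v_vals, x_grid, beta, u, n_c=500):
    """
    One fitted VFI update (hypothetical helper): build v_hat by linear
    interpolation of (x_grid, v_vals), then maximize
    u(c) + beta * v_hat(x - c) over a grid of c values in (0, x].
    """
    Tv = np.empty_like(v_vals)
    for i, x in enumerate(x_grid):
        c = np.linspace(1e-10, x, n_c)               # candidate consumption levels
        v_hat = np.interp(x - c, x_grid, v_vals)     # step 2: interpolated values
        Tv[i] = np.max(u(c) + beta * v_hat)          # step 3: record T v_hat(x_i)
    return Tv

x_grid = np.linspace(1e-3, 2.5, 120)
v0 = np.log(x_grid)                                  # initial guess v = u with u = ln
v1 = fitted_bellman_step(v0, x_grid, beta=0.96, u=np.log)
```

A brute-force search is easy to read but scales poorly; the lecture code instead hands each one-dimensional maximization to a SciPy routine.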

27.4.3 Implementation

The maximize function below is a small helper function that converts a SciPy minimization
routine into a maximization routine.

In [4]: def maximize(g, a, b, args):
            """
            Maximize the function g over the interval [a, b].

            We use the fact that the maximizer of g on any interval is
            also the minimizer of -g. The tuple args collects any extra
            arguments to g.

            Returns the maximizer and the maximal value.
            """

            objective = lambda x: -g(x, *args)
            result = minimize_scalar(objective, bounds=(a, b), method='bounded')
            maximizer, maximum = result.x, -result.fun
            return maximizer, maximum
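A quick way to check a helper like `maximize` is to try it on a function whose maximizer is known. The block below repeats the definition so that it runs on its own; the test function 𝑔(𝑥) = −(𝑥 − 1)² is an arbitrary choice. SciPy’s `'bounded'` method stops once its bracket is small relative to its default tolerance (`xatol=1e-5`), so the maximizer is only accurate to roughly that level.

```python
from scipy.optimize import minimize_scalar

def maximize(g, a, b, args):
    """Maximize g over [a, b]; returns (maximizer, maximum)."""
    objective = lambda x: -g(x, *args)
    result = minimize_scalar(objective, bounds=(a, b), method='bounded')
    return result.x, -result.fun

# g(x; shift) = -(x - shift)^2 attains its maximum value 0 at x = shift
g = lambda x, shift: -(x - shift) ** 2
x_max, g_max = maximize(g, 0.0, 3.0, (1.0,))
```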

We’ll store the parameters 𝛽 and 𝛾 in a class called CakeEating.


The same class will also provide a method called state_action_value that returns the value
of a consumption choice given a particular state and guess of 𝑣.

In [5]: class CakeEating:

            def __init__(self,
                         β=0.96,           # discount factor
                         γ=1.5,            # degree of relative risk aversion
                         x_grid_min=1e-3,  # exclude zero for numerical stability
                         x_grid_max=2.5,   # size of cake
                         x_grid_size=120):

                self.β, self.γ = β, γ

                # Set up grid
                self.x_grid = np.linspace(x_grid_min, x_grid_max, x_grid_size)

            # Utility function
            def u(self, c):

                γ = self.γ

                if γ == 1:
                    return np.log(c)
                else:
                    return (c ** (1 - γ)) / (1 - γ)

            # First derivative of utility function
            def u_prime(self, c):

                return c ** (-self.γ)

            def state_action_value(self, c, x, v_array):
                """
                Right hand side of the Bellman equation given x and c.
                """

                u, β = self.u, self.β
                v = lambda x: interp(self.x_grid, v_array, x)

                return u(c) + β * v(x - c)

We now define the Bellman operator:

In [6]: def T(v, ce):
            """
            The Bellman operator. Updates the guess of the value function.

            * ce is an instance of CakeEating
            * v is an array representing a guess of the value function

            """
            v_new = np.empty_like(v)

            for i, x in enumerate(ce.x_grid):
                # Maximize RHS of Bellman equation at state x
                v_new[i] = maximize(ce.state_action_value, 1e-10, x, (x, v))[1]

            return v_new

After defining the Bellman operator, we are ready to solve the model.
Let’s start by creating a CakeEating instance using the default parameterization.

In [7]: ce = CakeEating()

Now let’s see the iteration of the value function in action.


We start from guess 𝑣 given by 𝑣(𝑥) = 𝑢(𝑥) for every 𝑥 grid point.

In [8]: x_grid = ce.x_grid
        v = ce.u(x_grid)       # Initial guess
        n = 12                 # Number of iterations

        fig, ax = plt.subplots()

        ax.plot(x_grid, v, color=plt.cm.jet(0),
                lw=2, alpha=0.6, label='Initial guess')

        for i in range(n):
            v = T(v, ce)  # Apply the Bellman operator
            ax.plot(x_grid, v, color=plt.cm.jet(i / n), lw=2, alpha=0.6)

        ax.legend()
        ax.set_ylabel('value', fontsize=12)
        ax.set_xlabel('cake size $x$', fontsize=12)
        ax.set_title('Value function iterations')

        plt.show()

To do this more systematically, we introduce a wrapper function called
compute_value_function that iterates until some convergence conditions are satisfied.

In [9]: def compute_value_function(ce,
                                   tol=1e-4,
                                   max_iter=1000,
                                   verbose=True,
                                   print_skip=25):

            # Set up loop
            v = np.zeros(len(ce.x_grid))  # Initial guess
            i = 0
            error = tol + 1

            while i < max_iter and error > tol:
                v_new = T(v, ce)

                error = np.max(np.abs(v - v_new))
                i += 1

                if verbose and i % print_skip == 0:
                    print(f"Error at iteration {i} is {error}.")

                v = v_new

            if i == max_iter:
                print("Failed to converge!")

            if verbose and i < max_iter:
                print(f"\nConverged in {i} iterations.")

            return v_new

Now let’s call it, noting that it takes a little while to run.

In [10]: v = compute_value_function(ce)

Error at iteration 25 is 23.8003755134813.
Error at iteration 50 is 8.577577195046615.
Error at iteration 75 is 3.091330659691039.
Error at iteration 100 is 1.1141054204751981.
Error at iteration 125 is 0.4015199357729671.
Error at iteration 150 is 0.14470646660561215.
Error at iteration 175 is 0.052151735472762084.
Error at iteration 200 is 0.018795314242879613.
Error at iteration 225 is 0.006773769545588948.
Error at iteration 250 is 0.0024412443051460286.
Error at iteration 275 is 0.000879816432870939.
Error at iteration 300 is 0.00031708295398402697.
Error at iteration 325 is 0.00011427565573285392.

Converged in 329 iterations.
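The printed errors shrink by a factor of almost exactly 𝛽²⁵ ≈ 0.360 every 25 iterations, which is what the contraction property suggests. Assuming that this geometric rate holds exactly (an approximation, not a guarantee), one can predict the iteration count from a single printed error. The helper `predicted_iterations` below is hypothetical.

```python
import math

def predicted_iterations(error_k, k, tol, beta):
    """
    Smallest n such that error_k * beta**(n - k) <= tol, assuming the
    sup-norm error falls by the factor beta at every iteration.
    """
    return k + math.ceil(math.log(tol / error_k) / math.log(beta))

# Error printed at iteration 25 in the run above, with tol=1e-4 and β=0.96
n_pred = predicted_iterations(23.8003755134813, 25, 1e-4, 0.96)
```

This reproduces the 329 iterations reported above, and it makes clear why tightening `tol` by a factor of 10 costs roughly ln(10)/|ln 𝛽| ≈ 56 extra iterations at 𝛽 = 0.96.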

Now we can plot and see what the converged value function looks like.

In [11]: fig, ax = plt.subplots()

ax.plot(x_grid, v, label='Approximate value function')


ax.set_ylabel('$V(x)$', fontsize=12)
ax.set_xlabel('$x$', fontsize=12)
ax.set_title('Value function')
ax.legend()
plt.show()

Next let’s compare it to the analytical solution.

In [12]: v_analytical = v_star(ce.x_grid, ce.β, ce.γ)

In [13]: fig, ax = plt.subplots()

ax.plot(x_grid, v_analytical, label='analytical solution')


ax.plot(x_grid, v, label='numerical solution')
ax.set_ylabel('$V(x)$', fontsize=12)
ax.set_xlabel('$x$', fontsize=12)
ax.legend()
ax.set_title('Comparison between analytical and numerical value functions')
plt.show()

The quality of approximation is reasonably good for large 𝑥, but less so near the lower
boundary.
The reason is that the utility function, and hence the value function, is very steep near the lower
boundary, which makes it hard to approximate.

27.4.4 Policy Function

Let’s see how this plays out in terms of computing the optimal policy.
In the first lecture on cake eating, the optimal consumption policy was shown to be

$$
\sigma^*(x) = (1 - \beta^{1/\gamma}) \, x
$$

Let’s see if our numerical results lead to something similar.


Our numerical strategy will be to compute

$$
\sigma(x) = \arg\max_{0 \le c \le x} \{ u(c) + \beta v(x - c) \}
$$

on a grid of 𝑥 points and then interpolate.


For 𝑣 we will use the approximation of the value function we obtained above.
Here’s the function:

In [14]: def σ(ce, v):
             """
             The optimal policy function. Given the value function,
             it finds optimal consumption in each state.

             * ce is an instance of CakeEating
             * v is a value function array

             """
             c = np.empty_like(v)

             for i in range(len(ce.x_grid)):
                 x = ce.x_grid[i]
                 # Maximize RHS of Bellman equation at state x
                 c[i] = maximize(ce.state_action_value, 1e-10, x, (x, v))[0]

             return c

Now let’s pass the approximate value function and compute optimal consumption:

In [15]: c = σ(ce, v)

Let’s plot this next to the true analytical solution.

In [16]: c_analytical = c_star(ce.x_grid, ce.β, ce.γ)

fig, ax = plt.subplots()

ax.plot(ce.x_grid, c_analytical, label='analytical')


ax.plot(ce.x_grid, c, label='numerical')
ax.set_ylabel(r'$\sigma(x)$')
ax.set_xlabel('$x$')
ax.legend()

plt.show()

The fit is reasonable but not perfect.

We can improve it by increasing the grid size or reducing the error tolerance in the value
function iteration routine.
However, both changes will lead to a longer compute time.
Another possibility is to use an alternative algorithm, which offers the possibility of faster
compute time and, at the same time, more accuracy.
We explore this next.

27.5 Time Iteration

Now let’s look at a different strategy to compute the optimal policy.


Recall that the optimal policy satisfies the Euler equation

$$
u'(\sigma(x)) = \beta u'(\sigma(x - \sigma(x))) \quad \text{for all } x > 0 \tag{2}
$$
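As a sanity check on (2), the closed-form policy 𝜎∗(𝑥) = (1 − 𝛽^{1/𝛾})𝑥 from the first cake eating lecture should satisfy it exactly when 𝑢′(𝑐) = 𝑐^{−𝛾}. The sketch below evaluates both sides on a handful of cake sizes; the parameter values mirror the `CakeEating` defaults, and the helper names are hypothetical.

```python
import numpy as np

beta, gamma = 0.96, 1.5                      # defaults used by CakeEating

def u_prime(c):
    return c ** (-gamma)                     # CRRA marginal utility

def sigma_star(x):
    return (1 - beta ** (1 / gamma)) * x     # closed-form optimal policy

x = np.linspace(0.1, 2.5, 25)
lhs = u_prime(sigma_star(x))
rhs = beta * u_prime(sigma_star(x - sigma_star(x)))
```

The algebra behind the check: 𝑥 − 𝜎∗(𝑥) = 𝛽^{1/𝛾}𝑥, so 𝜎∗(𝑥 − 𝜎∗(𝑥)) = 𝛽^{1/𝛾}𝜎∗(𝑥), and the right-hand side becomes 𝛽 · 𝛽^{−1} · 𝜎∗(𝑥)^{−𝛾}, which is the left-hand side.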

Computationally, we can start with any initial guess 𝜎₀ and then choose 𝑐 to solve

$$
u'(c) = \beta u'(\sigma_0(x - c))
$$

Choosing 𝑐 to satisfy this equation at all 𝑥 > 0 produces a function of 𝑥.

Call this new function 𝜎₁, treat it as the new guess and repeat.
This is called time iteration.
As with value function iteration, we can view the update step as the action of an operator, this
time denoted by 𝐾.
• In particular, 𝐾𝜎 is the policy updated from 𝜎 using the procedure just described.
• We will use this terminology in the exercises below.
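One application of 𝐾 can be sketched as follows: interpolate the current guess, then solve 𝑢′(𝑐) = 𝛽𝑢′(𝜎(𝑥 − 𝑐)) for 𝑐 at each grid point with a bracketing root finder. This is not the exercise solution; it uses SciPy’s `brentq` and NumPy’s `np.interp` as stand-ins, and the name `K_step` is hypothetical. Because 𝜎∗ satisfies (2), applying the update to it should return 𝜎∗ up to rounding.

```python
import numpy as np
from scipy.optimize import brentq

def K_step(sigma_vals, x_grid, beta, gamma):
    """One time-iteration update on the grid (hypothetical helper)."""
    u_prime = lambda c: c ** (-gamma)
    sigma = lambda z: np.interp(z, x_grid, sigma_vals)   # piecewise linear guess
    new = np.zeros_like(sigma_vals)
    for i, x in enumerate(x_grid):
        if x <= 0:
            continue                                      # nothing to eat at x = 0
        euler_diff = lambda c: u_prime(c) - beta * u_prime(sigma(x - c))
        # euler_diff is positive near c = 0 and negative near c = x
        new[i] = brentq(euler_diff, 1e-10, x - 1e-10)
    return new

beta, gamma = 0.96, 1.5
x_grid = np.linspace(0.0, 2.5, 50)
sigma_star = (1 - beta ** (1 / gamma)) * x_grid
sigma_next = K_step(sigma_star, x_grid, beta, gamma)
```

The root is unique because the left side of the Euler equation falls in 𝑐 while the right side rises, so a bracketing method is a natural choice.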
The main advantage of time iteration relative to value function iteration is that it operates in
policy space rather than value function space.
This is helpful because the policy function has less curvature, and hence is easier to approximate.
In the exercises you are asked to implement time iteration and compare it to value function
iteration.
You should find that the method is faster and more accurate.
This is due to

1. the curvature issue mentioned just above and


2. the fact that we are using more information — in this case, the first order conditions.

27.6 Exercises

27.6.1 Exercise 1

Try the following modification of the problem.



Instead of the cake size changing according to $x_{t+1} = x_t - c_t$, let it change according to

$$
x_{t+1} = (x_t - c_t)^{\alpha}
$$

where 𝛼 is a parameter satisfying 0 < 𝛼 < 1.


(We will see this kind of update rule when we study optimal growth models.)
Make the required changes to the value function iteration code and plot the value and policy
functions.
Try to reuse as much code as possible.

27.6.2 Exercise 2

Implement time iteration, returning to the original case (i.e., dropping the modification in the
exercise above).

27.7 Solutions

27.7.1 Exercise 1

We need to create a class to hold our primitives and return the right hand side of the Bellman
equation.
We will use inheritance to maximize code reuse.

In [17]: class OptimalGrowth(CakeEating):
             """
             A subclass of CakeEating that adds the parameter α and overrides
             the state_action_value method.
             """

             def __init__(self,
                          β=0.96,           # discount factor
                          γ=1.5,            # degree of relative risk aversion
                          α=0.4,            # productivity parameter
                          x_grid_min=1e-3,  # exclude zero for numerical stability
                          x_grid_max=2.5,   # size of cake
                          x_grid_size=120):

                 self.α = α
                 CakeEating.__init__(self, β, γ, x_grid_min, x_grid_max, x_grid_size)

             def state_action_value(self, c, x, v_array):
                 """
                 Right hand side of the Bellman equation given x and c.
                 """

                 u, β, α = self.u, self.β, self.α
                 v = lambda x: interp(self.x_grid, v_array, x)

                 return u(c) + β * v((x - c)**α)

In [18]: og = OptimalGrowth()

Here’s the computed value function.

In [19]: v = compute_value_function(og, verbose=False)

fig, ax = plt.subplots()

ax.plot(x_grid, v, lw=2, alpha=0.6)


ax.set_ylabel('value', fontsize=12)
ax.set_xlabel('state $x$', fontsize=12)

plt.show()

Here’s the computed policy, combined with the solution we derived above for the standard
cake eating case 𝛼 = 1.

In [20]: c_new = σ(og, v)

fig, ax = plt.subplots()

ax.plot(ce.x_grid, c_analytical, label=r'$\alpha=1$ solution')


ax.plot(ce.x_grid, c_new, label=fr'$\alpha={og.α}$ solution')

ax.set_ylabel('consumption', fontsize=12)
ax.set_xlabel('$x$', fontsize=12)

ax.legend(fontsize=12)

plt.show()

Consumption is higher when 𝛼 < 1 because, at least for large 𝑥, the return to savings is
lower.

27.7.2 Exercise 2

Here’s one way to implement time iteration.

In [21]: def K(σ_array, ce):
             """
             The policy function operator. Given the policy function,
             it updates the optimal consumption using the Euler equation.

             * σ_array is an array of policy function values on the grid
             * ce is an instance of CakeEating

             """

             u_prime, β, x_grid = ce.u_prime, ce.β, ce.x_grid
             σ_new = np.empty_like(σ_array)

             σ = lambda x: interp(x_grid, σ_array, x)

             def euler_diff(c, x):
                 return u_prime(c) - β * u_prime(σ(x - c))

             for i, x in enumerate(x_grid):

                 # handle small x separately --- helps numerical stability
                 if x < 1e-12:
                     σ_new[i] = 0.0

                 # handle other x
                 else:
                     σ_new[i] = bisect(euler_diff, 1e-10, x - 1e-10, x)

             return σ_new

In [22]: def iterate_euler_equation(ce,
                                    max_iter=500,
                                    tol=1e-5,
                                    verbose=True,
                                    print_skip=25):

             x_grid = ce.x_grid

             σ = np.copy(x_grid)  # initial guess

             i = 0
             error = tol + 1
             while i < max_iter and error > tol:

                 σ_new = K(σ, ce)

                 error = np.max(np.abs(σ_new - σ))
                 i += 1

                 if verbose and i % print_skip == 0:
                     print(f"Error at iteration {i} is {error}.")

                 σ = σ_new

             if i == max_iter:
                 print("Failed to converge!")

             if verbose and i < max_iter:
                 print(f"\nConverged in {i} iterations.")

             return σ

In [23]: ce = CakeEating(x_grid_min=0.0)
c_euler = iterate_euler_equation(ce)

Error at iteration 25 is 0.0036456675931543225.
Error at iteration 50 is 0.0008283185047067848.
Error at iteration 75 is 0.00030791132300957147.
Error at iteration 100 is 0.00013555502390599772.
Error at iteration 125 is 6.417740905302616e-05.
Error at iteration 150 is 3.1438019047758115e-05.
Error at iteration 175 is 1.5658492883291464e-05.

Converged in 192 iterations.

In [24]: fig, ax = plt.subplots()

ax.plot(ce.x_grid, c_analytical, label='analytical solution')


ax.plot(ce.x_grid, c_euler, label='time iteration solution')

ax.set_ylabel('consumption')
ax.set_xlabel('$x$')
ax.legend(fontsize=12)

plt.show()
Chapter 28

Optimal Growth I: The Stochastic Optimal Growth Model

28.1 Contents

• Overview 28.2
• The Model 28.3
• Computation 28.4
• Exercises 28.5
• Solutions 28.6

28.2 Overview

In this lecture, we’re going to study a simple optimal growth model with one agent.
The model is a version of the standard one sector infinite horizon growth model studied in
• [102], chapter 2
• [72], section 3.1
• EDTC, chapter 1
• [104], chapter 12
It is an extension of the simple cake eating problem we looked at earlier.
The extension involves
• nonlinear returns to saving, through a production function, and
• stochastic returns, due to shocks to production.
Despite these additions, the model is still relatively simple.
We regard it as a stepping stone to more sophisticated models.
We solve the model using dynamic programming and a range of numerical techniques.
In this first lecture on optimal growth, the solution method will be value function iteration
(VFI).
While the code in this first lecture runs slowly, we will use a variety of techniques to drastically
improve execution time over the next few lectures.


Let’s start with some imports:

In [1]: import numpy as np


import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.optimize import minimize_scalar

%matplotlib inline

28.3 The Model

Consider an agent who owns an amount $y_t \in \mathbb{R}_+ := [0, \infty)$ of a consumption good at time 𝑡.


This output can either be consumed or invested.
When the good is invested, it is transformed one-for-one into capital.
The resulting capital stock, denoted here by $k_{t+1}$, will then be used for production.
Production is stochastic, in that it also depends on a shock $\xi_{t+1}$ realized at the end of the
current period.
Next period output is

$$
y_{t+1} := f(k_{t+1}) \xi_{t+1}
$$

where $f \colon \mathbb{R}_+ \to \mathbb{R}_+$ is called the production function.


The resource constraint is

$$
k_{t+1} + c_t \le y_t \tag{1}
$$

and all variables are required to be nonnegative.

28.3.1 Assumptions and Comments

In what follows,
• The sequence {𝜉𝑡 } is assumed to be IID.
• The common distribution of each 𝜉𝑡 will be denoted by 𝜙.
• The production function 𝑓 is assumed to be increasing and continuous.
• Depreciation of capital is not made explicit but can be incorporated into the production
function.
While many other treatments of the stochastic growth model use 𝑘𝑡 as the state variable, we
will use 𝑦𝑡 .
This will allow us to treat a stochastic model while maintaining only one state variable.
We consider alternative states and timing specifications in some of our other lectures.

28.3.2 Optimization

Taking $y_0$ as given, the agent wishes to maximize

$$
\mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right] \tag{2}
$$

subject to

$$
y_{t+1} = f(y_t - c_t) \xi_{t+1}
\quad \text{and} \quad
0 \le c_t \le y_t \quad \text{for all } t \tag{3}
$$

where
• 𝑢 is a bounded, continuous and strictly increasing utility function and
• 𝛽 ∈ (0, 1) is a discount factor.
In (3) we are assuming that the resource constraint (1) holds with equality, which is reasonable
because 𝑢 is strictly increasing and no output will be wasted at the optimum.
In summary, the agent’s aim is to select a path 𝑐0 , 𝑐1 , 𝑐2 , … for consumption that is

1. nonnegative,

2. feasible in the sense of (1),

3. optimal, in the sense that it maximizes (2) relative to all other feasible consumption
sequences, and

4. adapted, in the sense that the action 𝑐𝑡 depends only on observable outcomes, not on
future outcomes such as 𝜉𝑡+1 .

In the present context


• 𝑦𝑡 is called the state variable — it summarizes the “state of the world” at the start of
each period.
• 𝑐𝑡 is called the control variable: a value chosen by the agent each period after observing
the state.

28.3.3 The Policy Function Approach

One way to think about solving this problem is to look for the best policy function.
A policy function is a map from past and present observables into current action.
We’ll be particularly interested in Markov policies, which are maps from the current state
𝑦𝑡 into a current action 𝑐𝑡 .
For dynamic programming problems such as this one (in fact for any Markov decision process),
the optimal policy is always a Markov policy.
In other words, the current state 𝑦𝑡 provides a sufficient statistic for the history in terms of
making an optimal decision today.
This is quite intuitive, but if you wish you can find proofs in texts such as [102] (section 4.1).
Hereafter we focus on finding the best Markov policy.
In our context, a Markov policy is a function $\sigma \colon \mathbb{R}_+ \to \mathbb{R}_+$, with the understanding that states
are mapped to actions via

$$
c_t = \sigma(y_t) \quad \text{for all } t
$$

In what follows, we will call 𝜎 a feasible consumption policy if it satisfies

$$
0 \le \sigma(y) \le y \quad \text{for all } y \in \mathbb{R}_+ \tag{4}
$$

In other words, a feasible consumption policy is a Markov policy that respects the resource
constraint.
The set of all feasible consumption policies will be denoted by Σ.
Each $\sigma \in \Sigma$ determines a continuous state Markov process $\{y_t\}$ for output via

$$
y_{t+1} = f(y_t - \sigma(y_t)) \xi_{t+1}, \quad y_0 \text{ given} \tag{5}
$$

This is the time path for output when we choose and stick with the policy 𝜎.
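The process (5) is easy to simulate once 𝑓, 𝜎 and the shock distribution are fixed. This sketch assumes, purely for illustration, 𝑓(𝑘) = 𝑘^𝛼, lognormal shocks 𝜉 = exp(𝜇 + 𝑠𝜁) with 𝜁 standard normal, and the constant-savings-rate policy 𝜎(𝑦) = (1 − 𝛼𝛽)𝑦; the function name `simulate_output` is hypothetical.

```python
import numpy as np

def simulate_output(sigma, f, y0, mu, s, ts_length=100, seed=0):
    """Simulate y_{t+1} = f(y_t - sigma(y_t)) * xi_{t+1} (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    y = np.empty(ts_length)
    y[0] = y0
    for t in range(ts_length - 1):
        xi = np.exp(mu + s * rng.standard_normal())   # lognormal shock
        y[t + 1] = f(y[t] - sigma(y[t])) * xi
    return y

alpha, beta = 0.4, 0.96
f = lambda k: k ** alpha                      # illustrative production function
sigma = lambda y: (1 - alpha * beta) * y      # illustrative policy

y_path = simulate_output(sigma, f, y0=1.0, mu=0.0, s=0.1)
```

With 𝑠 = 0 the path settles at the deterministic steady state ȳ = (𝛼𝛽)^{𝛼/(1−𝛼)}, since ȳ = (𝛼𝛽ȳ)^𝛼 under this policy.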
We insert this process into the objective function to get

$$
\mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right]
= \mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t u(\sigma(y_t)) \right] \tag{6}
$$

This is the total expected present value of following policy 𝜎 forever, given initial income 𝑦0 .
The aim is to select a policy that makes this number as large as possible.
The next section covers these ideas more formally.

28.3.4 Optimality

The policy value function $v_\sigma$ associated with a given policy 𝜎 is the mapping defined by

$$
v_\sigma(y) = \mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t u(\sigma(y_t)) \right] \tag{7}
$$

when $\{y_t\}$ is given by (5) with $y_0 = y$.


In other words, it is the lifetime value of following policy 𝜎 starting at initial condition 𝑦.
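Definition (7) suggests a direct, if slow, way to evaluate a given policy: simulate many paths of (5), truncate the discounted sum at a large horizon, and average. The sketch below assumes, for illustration only, 𝑓(𝑘) = 𝑘^𝛼, 𝑢 = ln, lognormal shocks 𝜉 = exp(𝜇 + 𝑠𝜁) and the policy 𝜎(𝑦) = (1 − 𝛼𝛽)𝑦; the horizon, the number of paths and the name `policy_value_mc` are arbitrary choices.

```python
import numpy as np

def policy_value_mc(sigma, f, u, y0, beta, mu, s,
                    horizon=300, num_paths=2000, seed=1234):
    """
    Monte Carlo estimate of v_sigma(y0) (hypothetical helper): the average
    across simulated paths of sum_{t < horizon} beta**t * u(sigma(y_t)).
    """
    rng = np.random.default_rng(seed)
    y = np.full(num_paths, y0, dtype=float)
    total = np.zeros(num_paths)
    for t in range(horizon):
        c = sigma(y)
        total += beta ** t * u(c)
        xi = np.exp(mu + s * rng.standard_normal(num_paths))
        y = f(y - c) * xi
    return total.mean()

alpha, beta, mu, s = 0.4, 0.96, 0.0, 0.1
v_hat = policy_value_mc(lambda y: (1 - alpha * beta) * y,
                        lambda k: k ** alpha, np.log,
                        y0=1.0, beta=beta, mu=mu, s=s)
```

For this particular specification the policy turns out to be optimal (see the closed-form solution (12) later in the lecture), so the estimate should land close to 𝑣∗(1) ≈ −27.03.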
The value function is then defined as

$$
v^*(y) := \sup_{\sigma \in \Sigma} v_\sigma(y) \tag{8}
$$

The value function gives the maximal value that can be obtained from state 𝑦, after considering
all feasible policies.
A policy 𝜎 ∈ Σ is called optimal if it attains the supremum in (8) for all 𝑦 ∈ ℝ+ .

28.3.5 The Bellman Equation

With our assumptions on utility and production functions, the value function as defined in
(8) also satisfies a Bellman equation.
For this problem, the Bellman equation takes the form

$$
v(y) = \max_{0 \le c \le y} \left\{ u(c) + \beta \int v(f(y - c) z) \phi(dz) \right\}
\qquad (y \in \mathbb{R}_+) \tag{9}
$$

This is a functional equation in 𝑣.


The term $\int v(f(y - c) z) \phi(dz)$ can be understood as the expected next period value when
• 𝑣 is used to measure value
• the state is 𝑦
• consumption is set to 𝑐
As shown in EDTC, theorem 10.1.11, and a range of other texts,

    The value function 𝑣∗ satisfies the Bellman equation.

In other words, (9) holds when 𝑣 = 𝑣∗.


The intuition is that maximal value from a given state can be obtained by optimally trading
off
• current reward from a given action, vs
• expected discounted future value of the state resulting from that action
The Bellman equation is important because it gives us more information about the value
function.
It also suggests a way of computing the value function, which we discuss below.

28.3.6 Greedy Policies

The primary importance of the value function is that we can use it to compute optimal policies.
The details are as follows.
Given a continuous function 𝑣 on ℝ+ , we say that 𝜎 ∈ Σ is 𝑣-greedy if 𝜎(𝑦) is a solution to

$$
\max_{0 \le c \le y} \left\{ u(c) + \beta \int v(f(y - c) z) \phi(dz) \right\} \tag{10}
$$

for every $y \in \mathbb{R}_+$.
In other words, 𝜎 ∈ Σ is 𝑣-greedy if it optimally trades off current and future rewards when 𝑣
is taken to be the value function.
In our setting, we have the following key result

• A feasible consumption policy is optimal if and only if it is 𝑣∗ -greedy.



The intuition is similar to the intuition for the Bellman equation, which was provided after
(9).
See, for example, theorem 10.1.11 of EDTC.
Hence, once we have a good approximation to 𝑣∗, we can compute the (approximately) optimal
policy by computing the corresponding greedy policy.
The advantage is that we are now solving a much lower dimensional optimization problem.

28.3.7 The Bellman Operator

How, then, should we compute the value function?


One way is to use the so-called Bellman operator.
(An operator is a map that sends functions into functions.)
The Bellman operator is denoted by 𝑇 and defined by

$$
Tv(y) := \max_{0 \le c \le y} \left\{ u(c) + \beta \int v(f(y - c) z) \phi(dz) \right\}
\qquad (y \in \mathbb{R}_+) \tag{11}
$$

In other words, 𝑇 sends the function 𝑣 into the new function 𝑇 𝑣 defined by (11).
By construction, the set of solutions to the Bellman equation (9) exactly coincides with the
set of fixed points of 𝑇 .
For example, if 𝑇𝑣 = 𝑣, then, for any 𝑦 ≥ 0,

$$
v(y) = Tv(y) = \max_{0 \le c \le y} \left\{ u(c) + \beta \int v(f(y - c) z) \phi(dz) \right\}
$$

which says precisely that 𝑣 is a solution to the Bellman equation.


It follows that 𝑣∗ is a fixed point of 𝑇 .

28.3.8 Review of Theoretical Results

One can also show that 𝑇 is a contraction mapping on the set of continuous bounded functions
on $\mathbb{R}_+$ under the supremum distance

$$
\rho(g, h) = \sup_{y \ge 0} |g(y) - h(y)|
$$

See EDTC, lemma 10.1.18.


Hence, it has exactly one fixed point in this set, which we know is equal to the value function.
It follows that
• The value function 𝑣∗ is bounded and continuous.
• Starting from any bounded and continuous 𝑣, the sequence $v, Tv, T^2 v, \ldots$ generated by
iteratively applying 𝑇 converges uniformly to 𝑣∗.
This iterative method is called value function iteration.
We also know that a feasible policy is optimal if and only if it is 𝑣∗ -greedy.

It’s not too hard to show that a 𝑣∗ -greedy policy exists (see EDTC, theorem 10.1.11 if you
get stuck).
Hence, at least one optimal policy exists.
Our problem now is how to compute it.

28.3.9 Unbounded Utility

The results stated above assume that the utility function is bounded.
In practice economists often work with unbounded utility functions — and so will we.
In the unbounded setting, various optimality theories exist.
Unfortunately, they tend to be case-specific, as opposed to valid for a large range of applications.
Nevertheless, their main conclusions are usually in line with those stated for the bounded case
just above (as long as we drop the word “bounded”).
Consult, for example, section 12.2 of EDTC, [64] or [78].

28.4 Computation

Let’s now look at computing the value function and the optimal policy.
Our implementation in this lecture will focus on clarity and flexibility.
Both of these things are helpful, but they do cost us some speed — as you will see when you
run the code.
Later we will sacrifice some of this clarity and flexibility in order to accelerate our code with
just-in-time (JIT) compilation.
The algorithm we will use is fitted value function iteration, which was described in earlier lectures
on the McCall model and cake eating.
The algorithm will be

1. Begin with an array of values $\{v_1, \ldots, v_I\}$ representing the values of some initial function
𝑣 on the grid points $\{y_1, \ldots, y_I\}$.

2. Build a function $\hat v$ on the state space $\mathbb{R}_+$ by linear interpolation, based on these data
points.

3. Obtain and record the value $T\hat v(y_i)$ on each grid point $y_i$ by repeatedly solving (11).

4. Unless some stopping condition is satisfied, set $\{v_1, \ldots, v_I\} = \{T\hat v(y_1), \ldots, T\hat v(y_I)\}$ and
go to step 2.

28.4.1 Scalar Maximization

To maximize the right hand side of the Bellman equation (9), we are going to use the
minimize_scalar routine from SciPy.

Since we are maximizing rather than minimizing, we will use the fact that the maximizer of 𝑔
on the interval [𝑎, 𝑏] is the minimizer of −𝑔 on the same interval.
To this end, and to keep the interface tidy, we will wrap minimize_scalar in an outer function
as follows:

In [2]: def maximize(g, a, b, args):
            """
            Maximize the function g over the interval [a, b].

            We use the fact that the maximizer of g on any interval is
            also the minimizer of -g. The tuple args collects any extra
            arguments to g.

            Returns the maximizer and the maximal value.
            """

            objective = lambda x: -g(x, *args)
            result = minimize_scalar(objective, bounds=(a, b), method='bounded')
            maximizer, maximum = result.x, -result.fun
            return maximizer, maximum

28.4.2 Optimal Growth Model

We will assume for now that 𝜙 is the distribution of 𝜉 ∶= exp(𝜇 + 𝑠𝜁) where
• 𝜁 is standard normal,
• 𝜇 is a shock location parameter and
• 𝑠 is a shock scale parameter.
We will store this and other primitives of the optimal growth model in a class.
The class, defined below, combines both parameters and a method that realizes the right
hand side of the Bellman equation (9).

In [3]: class OptimalGrowthModel:

            def __init__(self,
                         u,            # utility function
                         f,            # production function
                         β=0.96,       # discount factor
                         μ=0,          # shock location parameter
                         s=0.1,        # shock scale parameter
                         grid_max=4,
                         grid_size=120,
                         shock_size=250,
                         seed=1234):

                self.u, self.f, self.β, self.μ, self.s = u, f, β, μ, s

                # Set up grid
                self.grid = np.linspace(1e-4, grid_max, grid_size)

                # Store shocks (with a seed, so results are reproducible)
                np.random.seed(seed)
                self.shocks = np.exp(μ + s * np.random.randn(shock_size))

            def state_action_value(self, c, y, v_array):
                """
                Right hand side of the Bellman equation.
                """

                u, f, β, shocks = self.u, self.f, self.β, self.shocks

                v = interp1d(self.grid, v_array)

                return u(c) + β * np.mean(v(f(y - c) * shocks))

In the second last line we are using linear interpolation.


In the last line, the expectation in (11) is computed via Monte Carlo, using the approximation

$$
\int v(f(y - c) z) \phi(dz) \approx \frac{1}{n} \sum_{i=1}^{n} v(f(y - c) \xi_i)
$$

where $\{\xi_i\}_{i=1}^{n}$ are IID draws from 𝜙.


Monte Carlo is not always the most efficient way to compute integrals numerically but it does
have some theoretical advantages in the present setting.
(For example, it preserves the contraction mapping property of the Bellman operator — see,
e.g., [85].)
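For a concrete check of the Monte Carlo approximation, take 𝑣 = ln and 𝑓(𝑘) = 𝑘^𝛼 (illustrative choices). Then ∫ ln(𝑓(𝑘)𝑧)𝜙(𝑑𝑧) = 𝛼 ln 𝑘 + 𝜇 exactly, because 𝔼 ln 𝜉 = 𝜇 when 𝜉 = exp(𝜇 + 𝑠𝜁). The sketch compares the sample average with that value; the savings level, sample size and seed are arbitrary.

```python
import numpy as np

alpha, mu, s = 0.4, 0.0, 0.1
k = 1.7                                                  # an arbitrary savings level
rng = np.random.default_rng(42)
shocks = np.exp(mu + s * rng.standard_normal(100_000))   # IID draws of ξ

mc_estimate = np.mean(np.log(k ** alpha * shocks))       # (1/n) Σ v(f(k) ξ_i)
exact = alpha * np.log(k) + mu
```

With the 250 draws stored by `OptimalGrowthModel`, the standard error for this integrand would be about 𝑠/√250 ≈ 0.006. Holding those draws fixed across iterations keeps the approximate Bellman operator a deterministic map.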

28.4.3 The Bellman Operator

The next function implements the Bellman operator.


(We could have added it as a method to the OptimalGrowthModel class, but we prefer small
classes rather than monolithic ones for this kind of numerical work.)

In [4]: def T(v, og):
            """
            The Bellman operator. Updates the guess of the value function
            and also computes a v-greedy policy.

            * og is an instance of OptimalGrowthModel
            * v is an array representing a guess of the value function

            """
            v_new = np.empty_like(v)
            v_greedy = np.empty_like(v)

            for i in range(len(og.grid)):
                y = og.grid[i]

                # Maximize RHS of Bellman equation at state y
                c_star, v_max = maximize(og.state_action_value, 1e-10, y, (y, v))
                v_new[i] = v_max
                v_greedy[i] = c_star

            return v_greedy, v_new

28.4.4 An Example

Let’s suppose now that

$$
f(k) = k^{\alpha} \quad \text{and} \quad u(c) = \ln c
$$

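For this particular problem, an exact analytical solution is available (see [72], section 3.1.2),
with

$$
v^*(y) = \frac{\ln(1 - \alpha \beta)}{1 - \beta}
+ \frac{\mu + \alpha \ln(\alpha \beta)}{1 - \alpha}
\left[ \frac{1}{1 - \beta} - \frac{1}{1 - \alpha \beta} \right]
+ \frac{1}{1 - \alpha \beta} \ln y
\tag{12}
$$

and optimal consumption policy

$$
\sigma^*(y) = (1 - \alpha \beta) y
$$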

It is valuable to have these closed-form solutions because they let us check whether our code
works for this particular case.
In Python, the functions above can be expressed as:

In [5]: def v_star(y, α, β, μ):
            """
            True value function
            """
            c1 = np.log(1 - α * β) / (1 - β)
            c2 = (μ + α * np.log(α * β)) / (1 - α)
            c3 = 1 / (1 - β)
            c4 = 1 / (1 - α * β)
            return c1 + c2 * (c3 - c4) + c4 * np.log(y)

        def σ_star(y, α, β):
            """
            True optimal policy
            """
            return (1 - α * β) * y
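The closed forms also let us check the greedy-policy result without any Monte Carlo error. Writing (12) as 𝑣∗(𝑦) = 𝐴 + 𝐵 ln 𝑦 with 𝐵 = 1/(1 − 𝛼𝛽), the expectation in (10) becomes 𝐴 + 𝐵(𝛼 ln(𝑦 − 𝑐) + 𝜇), so a 𝑣∗-greedy choice needs one scalar maximization per state. The sketch assumes the lecture’s values 𝛼 = 0.4, 𝛽 = 0.96, 𝜇 = 0; the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

alpha, beta, mu = 0.4, 0.96, 0.0

# v*(y) = A + B ln y, with A and B read off from (12)
B = 1 / (1 - alpha * beta)
A = (np.log(1 - alpha * beta) / (1 - beta)
     + (mu + alpha * np.log(alpha * beta)) / (1 - alpha)
     * (1 / (1 - beta) - 1 / (1 - alpha * beta)))

def expected_v_star(k):
    # E[v*(k**alpha * xi)] = A + B (alpha ln k + mu), since E ln xi = mu
    return A + B * (alpha * np.log(k) + mu)

def greedy_c(y):
    """Consumption maximizing ln c + beta E v*(f(y - c) xi) over (0, y)."""
    objective = lambda c: -(np.log(c) + beta * expected_v_star(y - c))
    result = minimize_scalar(objective, bounds=(1e-10, y - 1e-10), method='bounded')
    return result.x

ys = np.array([0.5, 1.0, 2.0, 3.0])
c_greedy = np.array([greedy_c(y) for y in ys])
```

The first-order condition 1/𝑐 = 𝛼𝛽𝐵/(𝑦 − 𝑐) gives 𝑐 = (1 − 𝛼𝛽)𝑦, so the numerical maximizer should match 𝜎∗ up to the optimizer’s tolerance.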

Next let’s create an instance of the model with the above primitives and assign it to the variable
og.

In [6]: α = 0.4
        def fcd(k):
            return k**α

        og = OptimalGrowthModel(u=np.log, f=fcd)

Now let’s see what happens when we apply our Bellman operator to the exact solution 𝑣∗ in
this case.
In theory, since 𝑣∗ is a fixed point, the resulting function should again be 𝑣∗ .
In practice, we expect some small numerical error.

In [7]: grid = og.grid

        v_init = v_star(grid, α, og.β, og.μ)  # Start at the solution
        v_greedy, v = T(v_init, og)           # Apply T once

        fig, ax = plt.subplots()
        ax.set_ylim(-35, -24)
        ax.plot(grid, v, lw=2, alpha=0.6, label='$Tv^*$')
        ax.plot(grid, v_init, lw=2, alpha=0.6, label='$v^*$')
        ax.legend()
        plt.show()

The two functions are essentially indistinguishable, so we are off to a good start.
Now let’s have a look at iterating with the Bellman operator, starting from an arbitrary initial
condition.
The initial condition we’ll start with is, somewhat arbitrarily, 𝑣(𝑦) = 5 ln(𝑦).

In [8]: v = 5 * np.log(grid)  # An initial condition
        n = 35

        fig, ax = plt.subplots()

        ax.plot(grid, v, color=plt.cm.jet(0),
                lw=2, alpha=0.6, label='Initial condition')

        for i in range(n):
            v_greedy, v = T(v, og)  # Apply the Bellman operator
            ax.plot(grid, v, color=plt.cm.jet(i / n), lw=2, alpha=0.6)

        ax.plot(grid, v_star(grid, α, og.β, og.μ), 'k-', lw=2,
                alpha=0.8, label='True value function')

        ax.legend()
        ax.set(ylim=(-40, 10), xlim=(np.min(grid), np.max(grid)))
        plt.show()

The figure shows

1. the first 36 functions generated by the fitted value function iteration algorithm, with
hotter colors given to higher iterates

2. the true value function 𝑣∗ drawn in black

The sequence of iterates converges towards 𝑣∗ .


We are clearly getting closer.

28.4.5 Iterating to Convergence

We can write a function that iterates until the difference is below a particular tolerance level.

In [9]: def solve_model(og,
                        tol=1e-4,
                        max_iter=1000,
                        verbose=True,
                        print_skip=25):
            """
            Solve model by iterating with the Bellman operator.

            """

            # Set up loop
            v = og.u(og.grid)  # Initial condition
            i = 0
            error = tol + 1

            while i < max_iter and error > tol:
                v_greedy, v_new = T(v, og)
                error = np.max(np.abs(v - v_new))
                i += 1
                if verbose and i % print_skip == 0:
                    print(f"Error at iteration {i} is {error}.")
                v = v_new

            if i == max_iter:
                print("Failed to converge!")

            if verbose and i < max_iter:
                print(f"\nConverged in {i} iterations.")

            return v_greedy, v_new

Let’s use this function to compute an approximate solution at the defaults.

In [10]: v_greedy, v_solution = solve_model(og)

Error at iteration 25 is 0.40975776844490497.


Error at iteration 50 is 0.1476753540823772.
Error at iteration 75 is 0.05322171277213883.
Error at iteration 100 is 0.019180930548646558.
Error at iteration 125 is 0.006912744396029069.
Error at iteration 150 is 0.002491330384817303.
Error at iteration 175 is 0.000897867291303811.
Error at iteration 200 is 0.00032358842396718046.
Error at iteration 225 is 0.00011662020561331587.

Converged in 229 iterations.

Now we check our result by plotting it against the true value:

In [11]: fig, ax = plt.subplots()

         ax.plot(grid, v_solution, lw=2, alpha=0.6,
                 label='Approximate value function')

         ax.plot(grid, v_star(grid, α, og.β, og.μ), lw=2,
                 alpha=0.6, label='True value function')

         ax.legend()
         ax.set_ylim(-35, -24)
         plt.show()

The figure shows that we are pretty much on the money.

28.4.6 The Policy Function

The policy v_greedy computed above corresponds to an approximate optimal policy.

The next figure compares it to the exact solution, which, as mentioned above, is $\sigma(y) = (1 - \alpha\beta) y$.

In [12]:
fig, ax = plt.subplots()

ax.plot(grid, v_greedy, lw=2,
        alpha=0.6, label='approximate policy function')

ax.plot(grid, σ_star(grid, α, og.β), '--',
        lw=2, alpha=0.6, label='true policy function')

ax.legend()
plt.show()

The figure shows that we’ve done a good job in this instance of approximating the true policy.

28.5 Exercises

28.5.1 Exercise 1

A common choice of utility function in this kind of work is the CRRA specification

$$u(c) = \frac{c^{1-\gamma}}{1-\gamma}$$

Maintaining the other defaults, including the Cobb-Douglas production function, solve the optimal growth model with this utility specification.
Setting 𝛾 = 1.5, compute and plot an estimate of the optimal policy.
Time how long the solution takes to run, so you can compare it with the faster code developed in the next lecture.

28.5.2 Exercise 2

Time how long it takes to iterate with the Bellman operator 20 times, starting from initial
condition 𝑣(𝑦) = 𝑢(𝑦).
Use the model specification in the previous exercise.
(As before, we will compare this number with that for the faster code developed in the next
lecture.)

28.6 Solutions

28.6.1 Exercise 1

Here we set up the model.

In [13]:
γ = 1.5  # Preference parameter

def u_crra(c):
    return (c**(1 - γ) - 1) / (1 - γ)

og = OptimalGrowthModel(u=u_crra, f=fcd)
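A remark on this choice of utility function. The solution code uses the normalized form (c^(1−γ) − 1)/(1 − γ), not the c^(1−γ)/(1 − γ) of the exercise statement.

The two forms differ only by the constant 1/(1 − γ). That constant shifts the value function but leaves the maximizer in the Bellman equation, and hence the optimal policy, unchanged. The normalized form has a further advantage: it converges to ln c as γ → 1.

The small NumPy check below illustrates both facts. The function name crra is our own.

```python
import numpy as np

def crra(c, gamma):
    "Normalized CRRA utility (c**(1 - gamma) - 1) / (1 - gamma), gamma != 1."
    return (c**(1 - gamma) - 1) / (1 - gamma)

c = np.linspace(0.1, 4.0, 50)

# The unnormalized and normalized forms differ by the constant 1 / (1 - gamma)
gamma = 1.5
gap = c**(1 - gamma) / (1 - gamma) - crra(c, gamma)
print(np.allclose(gap, 1 / (1 - gamma)))

# As gamma -> 1 the normalized form approaches log utility on this range of c
errs = [np.max(np.abs(crra(c, g) - np.log(c))) for g in (1.1, 1.01, 1.001)]
print(errs)
```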

Now let’s run it, with a timer.

In [14]: %%time
v_greedy, v_solution = solve_model(og)

Error at iteration 25 is 0.5528151810417512.
Error at iteration 50 is 0.19923228425590978.
Error at iteration 75 is 0.07180266113800826.
Error at iteration 100 is 0.025877443335843964.
Error at iteration 125 is 0.009326145618970827.
Error at iteration 150 is 0.003361112262005861.
Error at iteration 175 is 0.0012113338243295857.
Error at iteration 200 is 0.0004365607333056687.
Error at iteration 225 is 0.00015733505506432266.

Converged in 237 iterations.

CPU times: user 50.1 s, sys: 36.1 ms, total: 50.1 s
Wall time: 50.1 s

Let’s plot the policy function just to see what it looks like:

In [15]:
fig, ax = plt.subplots()

ax.plot(grid, v_greedy, lw=2,
        alpha=0.6, label='Approximate optimal policy')

ax.legend()
plt.show()

28.6.2 Exercise 2

Let’s set up:

In [16]:
og = OptimalGrowthModel(u=u_crra, f=fcd)
v = og.u(og.grid)

Here’s the timing:

In [17]:
%%time

for i in range(20):
    v_greedy, v_new = T(v, og)
    v = v_new

CPU times: user 4.2 s, sys: 0 ns, total: 4.2 s
Wall time: 4.2 s
Chapter 29

Optimal Growth II: Accelerating the Code with Numba

29.1 Contents

• Overview 29.2
• The Model 29.3
• Computation 29.4
• Exercises 29.5
• Solutions 29.6
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]:
!conda install -y quantecon
!pip install interpolation

29.2 Overview

Previously, we studied a stochastic optimal growth model with one representative agent.
We solved the model using dynamic programming.
In writing our code, we focused on clarity and flexibility.
These are important, but there’s often a trade-off between flexibility and speed.
The reason is that, when code is less flexible, we can exploit structure more easily.
(This is true about algorithms and mathematical problems more generally: more specific
problems have more structure, which, with some thought, can be exploited for better results.)
So, in this lecture, we are going to accept less flexibility while gaining speed, using just-in-time (JIT) compilation to accelerate our code.
Let’s start with some imports:

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from interpolation import interp
from numba import jit, njit, prange, float64, int32
from numba.experimental import jitclass
from quantecon.optimize.scalar_maximization import brent_max

%matplotlib inline


We are using an interpolation function from interpolation.py because it helps us JIT-compile our code.
The function brent_max is also designed for embedding in JIT-compiled code.
These are alternatives to similar functions in SciPy (which, unfortunately, are not JIT-aware).

29.3 The Model

The model is the same as discussed in our previous lecture on optimal growth.
We will start with log utility:

𝑢(𝑐) = ln(𝑐)

We continue to assume that

• $f(k) = k^{\alpha}$
• 𝜙 is the distribution of 𝜉 ∶= exp(𝜇 + 𝑠𝜁) when 𝜁 is standard normal

We will once again use value function iteration to solve the model.
In particular, the algorithm is unchanged, and the only difference is in the implementation itself.
As before, we will be able to compare with the true solutions:

In [3]:
def v_star(y, α, β, μ):
    """
    True value function
    """
    c1 = np.log(1 - α * β) / (1 - β)
    c2 = (μ + α * np.log(α * β)) / (1 - α)
    c3 = 1 / (1 - β)
    c4 = 1 / (1 - α * β)
    return c1 + c2 * (c3 - c4) + c4 * np.log(y)

def σ_star(y, α, β):
    """
    True optimal policy
    """
    return (1 - α * β) * y
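As a sanity check on these closed forms, we can confirm that v_star satisfies the Bellman equation, with the maximum attained at σ_star.

With log utility and f(k) = k^α, v* is affine in ln y, and ln ξ has mean μ. The expectation in the Bellman equation can therefore be computed exactly: E[v*(f(k)ξ)] = v*(f(k)e^μ).

The sketch below uses this shortcut, which is specific to the log/Cobb-Douglas case. It uses illustrative parameter values α = 0.4, β = 0.96, μ = 0, and the helper bellman_rhs is our own.

```python
import numpy as np

def v_star(y, α, β, μ):
    "True value function (log utility, Cobb-Douglas production)."
    c1 = np.log(1 - α * β) / (1 - β)
    c2 = (μ + α * np.log(α * β)) / (1 - α)
    c3 = 1 / (1 - β)
    c4 = 1 / (1 - α * β)
    return c1 + c2 * (c3 - c4) + c4 * np.log(y)

def σ_star(y, α, β):
    "True optimal policy."
    return (1 - α * β) * y

def bellman_rhs(c, y, α, β, μ):
    "u(c) + β E[v*(f(y - c) ξ)], using E[v*(k**α ξ)] = v*(k**α e^μ)."
    k = y - c
    return np.log(c) + β * v_star(k**α * np.exp(μ), α, β, μ)

α, β, μ = 0.4, 0.96, 0.0
y = np.linspace(0.1, 4.0, 25)

# Left and right sides of the Bellman equation at the candidate optimum
lhs = v_star(y, α, β, μ)
rhs = bellman_rhs(σ_star(y, α, β), y, α, β, μ)
print(np.max(np.abs(lhs - rhs)))   # at the level of floating-point rounding

# Consuming 5% more or less than σ* lowers the right-hand side everywhere
for eps in (-0.05, 0.05):
    c = σ_star(y, α, β) * (1 + eps)
    print(eps, np.all(bellman_rhs(c, y, α, β, μ) < rhs))
```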

29.4 Computation

We will again store the primitives of the optimal growth model in a class.

But now we are going to use Numba’s @jitclass decorator to target our class for JIT compilation.
Because we are going to use Numba to compile our class, we need to specify the data types.
You will see this as a list called opt_growth_data above our class.
Unlike in the previous lecture, we hardwire the production and utility specifications into the
class.
This is where we sacrifice flexibility in order to gain more speed.

In [4]:
opt_growth_data = [
    ('α', float64),          # Production parameter
    ('β', float64),          # Discount factor
    ('μ', float64),          # Shock location parameter
    ('s', float64),          # Shock scale parameter
    ('grid', float64[:]),    # Grid (array)
    ('shocks', float64[:])   # Shock draws (array)
]

@jitclass(opt_growth_data)
class OptimalGrowthModel:

    def __init__(self,
                 α=0.4,
                 β=0.96,
                 μ=0,
                 s=0.1,
                 grid_max=4,
                 grid_size=120,
                 shock_size=250,
                 seed=1234):

        self.α, self.β, self.μ, self.s = α, β, μ, s

        # Set up grid
        self.grid = np.linspace(1e-5, grid_max, grid_size)

        # Store shocks (with a seed, so results are reproducible)
        np.random.seed(seed)
        self.shocks = np.exp(μ + s * np.random.randn(shock_size))

    def f(self, k):
        "The production function"
        return k**self.α

    def u(self, c):
        "The utility function"
        return np.log(c)

    def f_prime(self, k):
        "Derivative of f"
        return self.α * (k**(self.α - 1))

    def u_prime(self, c):
        "Derivative of u"
        return 1/c

    def u_prime_inv(self, c):
        "Inverse of u'"
        return 1/c


The class includes some methods such as u_prime that we do not need now but will use in
later lectures.

29.4.1 The Bellman Operator

We will use JIT compilation to accelerate the Bellman operator.

First, here’s a function that returns the value of a particular consumption choice c, given state y, as per the Bellman equation (9).

In [5]:
@njit
def state_action_value(c, y, v_array, og):
    """
    Right hand side of the Bellman equation.

     * c is consumption
     * y is income
     * og is an instance of OptimalGrowthModel
     * v_array represents a guess of the value function on the grid

    """

    u, f, β, shocks = og.u, og.f, og.β, og.shocks

    v = lambda x: interp(og.grid, v_array, x)

    return u(c) + β * np.mean(v(f(y - c) * shocks))
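For readers who want to experiment without Numba or interpolation.py, here is a pure-NumPy sketch of the same right-hand side, under these assumptions:

• np.interp stands in for interp. Note that its argument order is (x, xp, fp), and it clamps to the end values outside the grid rather than extrapolating.
• The model primitives are module-level variables rather than attributes of an OptimalGrowthModel instance.

Plugging in the true value function and maximizing by brute-force grid search over c should recover σ*(y) = (1 − αβ)y, up to small interpolation and discretization error.

```python
import numpy as np

# Illustrative stand-ins for the OptimalGrowthModel defaults (log utility)
α, β, μ, s = 0.4, 0.96, 0.0, 0.1
grid = np.linspace(1e-5, 4, 120)
rng = np.random.default_rng(1234)
shocks = np.exp(μ + s * rng.standard_normal(250))

def state_action_value(c, y, v_array):
    "u(c) + β * mean(v(f(y - c) * ξ)), with v linearly interpolated on grid."
    v = lambda x: np.interp(x, grid, v_array)   # np.interp(x, xp, fp)
    return np.log(c) + β * np.mean(v((y - c)**α * shocks))

def v_star(y):
    "True value function at the parameters above."
    c1 = np.log(1 - α * β) / (1 - β)
    c2 = (μ + α * np.log(α * β)) / (1 - α)
    c3, c4 = 1 / (1 - β), 1 / (1 - α * β)
    return c1 + c2 * (c3 - c4) + c4 * np.log(y)

# Brute-force maximization over c at a single state y
y = 2.0
v_array = v_star(grid)
c_candidates = np.linspace(1e-3, y - 1e-3, 2000)
values = [state_action_value(c, y, v_array) for c in c_candidates]
c_best = c_candidates[np.argmax(values)]
print(c_best, (1 - α * β) * y)   # both near 1.232
```

In the lecture code, brent_max replaces this grid search. The grid search is simply easier to read.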

Now we can implement the Bellman operator, which maximizes the right hand side of the
Bellman equation:

In [6]:
@jit(nopython=True)
def T(v, og):
    """
    The Bellman operator.

     * og is an instance of OptimalGrowthModel
     * v is an array representing a guess of the value function

    """

    v_new = np.empty_like(v)
    v_greedy = np.empty_like(v)

    for i in range(len(og.grid)):
        y = og.grid[i]

        # Maximize RHS of Bellman equation at state y
        result = brent_max(state_action_value, 1e-10, y, args=(y, v, og))
        v_greedy[i], v_new[i] = result[0], result[1]

    return v_greedy, v_new

We use the solve_model function to perform iteration until convergence.

In [7]:
def solve_model(og,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):
    """
    Solve model by iterating with the Bellman operator.

    """

    # Set up loop
    v = og.u(og.grid)  # Initial condition
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        v_greedy, v_new = T(v, og)
        error = np.max(np.abs(v - v_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        v = v_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return v_greedy, v_new

Let’s compute the approximate solution at the default parameters.

First we create an instance:

In [8]: og = OptimalGrowthModel()

Now we call solve_model, using the %%time magic to check how long it takes.

In [9]: %%time
v_greedy, v_solution = solve_model(og)

Error at iteration 25 is 0.41372668361362486.
Error at iteration 50 is 0.14767653072604503.
Error at iteration 75 is 0.053221715530327174.
Error at iteration 100 is 0.019180931418532055.
Error at iteration 125 is 0.006912744709513419.
Error at iteration 150 is 0.002491330497818467.
Error at iteration 175 is 0.0008978673320534369.
Error at iteration 200 is 0.0003235884386789678.
Error at iteration 225 is 0.00011662021094238639.

Converged in 229 iterations.

CPU times: user 8.08 s, sys: 19.2 ms, total: 8.1 s
Wall time: 8.11 s

You will notice that this is much faster than our original implementation.
Here is a plot of the resulting policy, compared with the true policy:

In [10]:
fig, ax = plt.subplots()

ax.plot(og.grid, v_greedy, lw=2,
        alpha=0.8, label='approximate policy function')

ax.plot(og.grid, σ_star(og.grid, og.α, og.β), 'k--',
        lw=2, alpha=0.8, label='true policy function')

ax.legend()
plt.show()

Again, the fit is excellent, as expected, since we have not changed the algorithm.
The maximal absolute deviation between the two policies is

In [11]: np.max(np.abs(v_greedy - σ_star(og.grid, og.α, og.β)))

Out[11]: 0.0010480511607799947

29.5 Exercises

29.5.1 Exercise 1

Time how long it takes to iterate with the Bellman operator 20 times, starting from initial
condition 𝑣(𝑦) = 𝑢(𝑦).
Use the default parameterization.

29.5.2 Exercise 2

Modify the optimal growth model to use the CRRA utility specification

$$u(c) = \frac{c^{1-\gamma}}{1-\gamma}$$

Set γ = 1.5 as the default value and maintain the other specifications.

(Note that jitclass currently does not support inheritance, so you will have to copy the class and change the relevant parameters and methods.)
Compute an estimate of the optimal policy, plot it and compare visually with the same plot
from the analogous exercise in the first optimal growth lecture.
Compare execution time as well.

29.5.3 Exercise 3

In this exercise we return to the original log utility specification.

Once an optimal consumption policy 𝜎 is given, income follows

$$y_{t+1} = f(y_t - \sigma(y_t)) \xi_{t+1}$$

The next figure shows a simulation of 100 elements of this sequence for three different discount factors (and hence three different policies).

In each sequence, the initial condition is $y_0 = 0.1$.

The discount factors are discount_factors = (0.8, 0.9, 0.98).
We have also dialed down the shocks a bit with s = 0.05.
Otherwise, the parameters and primitives are the same as the log-linear model discussed earlier in the lecture.
Notice that more patient agents typically have higher wealth.
Replicate the figure modulo randomness.

29.6 Solutions

29.6.1 Exercise 1

Let’s set up the initial condition.

In [12]: v = og.u(og.grid)

Here’s the timing:

In [13]:
%%time

for i in range(20):
    v_greedy, v_new = T(v, og)
    v = v_new

CPU times: user 469 ms, sys: 0 ns, total: 469 ms
Wall time: 469 ms

Compared with our timing for the non-compiled version of value function iteration (4.2 s for 20 iterations in the previous lecture, albeit with CRRA rather than log utility), the JIT-compiled code is roughly an order of magnitude faster.

29.6.2 Exercise 2

Here’s our CRRA version of OptimalGrowthModel:

In [14]:
opt_growth_data = [
    ('α', float64),          # Production parameter
    ('β', float64),          # Discount factor
    ('μ', float64),          # Shock location parameter
    ('γ', float64),          # Preference parameter
    ('s', float64),          # Shock scale parameter
    ('grid', float64[:]),    # Grid (array)
    ('shocks', float64[:])   # Shock draws (array)
]

@jitclass(opt_growth_data)
class OptimalGrowthModel_CRRA:

    def __init__(self,
                 α=0.4,
                 β=0.96,
                 μ=0,
                 s=0.1,
                 γ=1.5,
                 grid_max=4,
                 grid_size=120,
                 shock_size=250,
                 seed=1234):

        self.α, self.β, self.γ, self.μ, self.s = α, β, γ, μ, s

        # Set up grid
        self.grid = np.linspace(1e-5, grid_max, grid_size)

        # Store shocks (with a seed, so results are reproducible)
        np.random.seed(seed)
        self.shocks = np.exp(μ + s * np.random.randn(shock_size))

    def f(self, k):
        "The production function."
        return k**self.α

    def u(self, c):
        "The utility function."
        return c**(1 - self.γ) / (1 - self.γ)

    def f_prime(self, k):
        "Derivative of f."
        return self.α * (k**(self.α - 1))

    def u_prime(self, c):
        "Derivative of u."
        return c**(-self.γ)

    def u_prime_inv(self, c):
        "Inverse of u'."
        return c**(-1 / self.γ)


Let’s create an instance:

In [15]: og_crra = OptimalGrowthModel_CRRA()

Now we call solve_model, using the %%time magic to check how long it takes.

In [16]: %%time
v_greedy, v_solution = solve_model(og_crra)

Error at iteration 25 is 1.6201897527216715.
Error at iteration 50 is 0.4591060470565935.
Error at iteration 75 is 0.1654235221617455.
Error at iteration 100 is 0.05961808343499797.
Error at iteration 125 is 0.021486161531640846.
Error at iteration 150 is 0.007743542074422294.
Error at iteration 175 is 0.0027907471408923357.
Error at iteration 200 is 0.0010057761070925153.
Error at iteration 225 is 0.0003624784085332067.
Error at iteration 250 is 0.00013063602803242702.

Converged in 257 iterations.

CPU times: user 7.87 s, sys: 19.8 ms, total: 7.89 s
Wall time: 7.87 s

Here is a plot of the resulting policy:

In [17]:
fig, ax = plt.subplots()

ax.plot(og_crra.grid, v_greedy, lw=2,
        alpha=0.6, label='Approximate optimal policy')

ax.legend(loc='lower right')
plt.show()

This matches the solution that we obtained with the non-jitted code in the exercises of the previous lecture.
Execution time falls from about 50 s to under 8 s, roughly a sixfold speed-up.

29.6.3 Exercise 3

Here’s one solution:

In [18]:
def simulate_og(σ_func, og, y0=0.1, ts_length=100):
    '''
    Compute a time series given consumption policy σ.
    '''
    y = np.empty(ts_length)
    ξ = np.random.randn(ts_length-1)
    y[0] = y0
    for t in range(ts_length-1):
        y[t+1] = (y[t] - σ_func(y[t]))**og.α * np.exp(og.μ + og.s * ξ[t])
    return y

In [19]:
fig, ax = plt.subplots()

for β in (0.8, 0.9, 0.98):

    og = OptimalGrowthModel(β=β, s=0.05)

    v_greedy, v_solution = solve_model(og, verbose=False)

    # Define an optimal policy function
    σ_func = lambda x: interp(og.grid, v_greedy, x)
    y = simulate_og(σ_func, og)
    ax.plot(y, lw=2, alpha=0.6, label=rf'$\beta = {β}$')

ax.legend(loc='lower right')
plt.show()
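Why do more patient agents end up with higher income? Under log utility and Cobb-Douglas production, the optimal policy is σ*(y) = (1 − αβ)y, so savings are αβy. Taking logs of y_{t+1} = (αβ y_t)^α ξ_{t+1} shows that log income follows the AR(1) process

ln y_{t+1} = α ln(αβ) + α ln y_t + ln ξ_{t+1}.

Its stationary mean is (α ln(αβ) + μ)/(1 − α), which increases with β.

The sketch below computes this mean for the three discount factors in the exercise. It then checks the mean against a long simulation that uses the exact policy rather than the numerically computed one. The function names are our own.

```python
import numpy as np

α, μ, s = 0.4, 0.0, 0.05   # parameters from the exercise

def stationary_log_mean(β):
    "Long-run mean of ln y when σ(y) = (1 - αβ) y."
    return (α * np.log(α * β) + μ) / (1 - α)

def simulate_log_y(β, ts_length=100_000, y0=0.1, seed=0):
    "Simulate ln y_{t+1} = α ln(αβ) + α ln y_t + ln ξ_{t+1} under σ*."
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(ts_length - 1)
    log_y = np.empty(ts_length)
    log_y[0] = np.log(y0)
    for t in range(ts_length - 1):
        log_y[t+1] = α * np.log(α * β) + α * log_y[t] + μ + s * z[t]
    return log_y

for β in (0.8, 0.9, 0.98):
    sim_mean = simulate_log_y(β)[1000:].mean()   # drop a burn-in period
    print(β, stationary_log_mean(β), sim_mean)
```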
Chapter 30

Optimal Growth III: Time Iteration

30.1 Contents

• Overview 30.2
• The Euler Equation 30.3
• Implementation 30.4
• Exercises 30.5
• Solutions 30.6
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]:
!conda install -y quantecon
!pip install interpolation

30.2 Overview

In this lecture, we’ll continue our earlier study of the stochastic optimal growth model.
In that lecture, we solved the associated dynamic programming problem using value function
iteration.
The beauty of this technique is its broad applicability.
With numerical problems, however, we can often attain higher efficiency in specific applications by deriving methods that are carefully tailored to the application at hand.
The stochastic optimal growth model has plenty of structure to exploit for this purpose, especially when we adopt some concavity and smoothness assumptions on the primitives.
We’ll use this structure to obtain an Euler equation based method.
This will be an extension of the time iteration method considered in our elementary lecture
on cake eating.
In a subsequent lecture, we’ll see that time iteration can be further adjusted to obtain even
more efficiency.
Let’s start with some imports:

In [2]:
import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt
%matplotlib inline

from interpolation import interp
from quantecon.optimize import brentq
from numba import njit, float64
from numba.experimental import jitclass


30.3 The Euler Equation

Our first step is to derive the Euler equation, which is a generalization of the Euler equation
we obtained in the lecture on cake eating.
We take the model set out in the stochastic growth model lecture and add the following assumptions:

1. $u$ and $f$ are continuously differentiable and strictly concave

2. $f(0) = 0$

3. $\lim_{c \to 0} u'(c) = \infty$ and $\lim_{c \to \infty} u'(c) = 0$

4. $\lim_{k \to 0} f'(k) = \infty$ and $\lim_{k \to \infty} f'(k) = 0$

The last two conditions are usually called Inada conditions.


Recall the Bellman equation

$$v^*(y) = \max_{0 \leq c \leq y} \left\{ u(c) + \beta \int v^*(f(y - c) z) \phi(dz) \right\} \quad \text{for all } y \in \mathbb{R}_+ \tag{1}$$

Let the optimal consumption policy be denoted by 𝜎∗.

We know that 𝜎∗ is a 𝑣∗-greedy policy so that 𝜎∗(𝑦) is the maximizer in (1).
The conditions above imply that
• 𝜎∗ is the unique optimal policy for the stochastic optimal growth model
• the optimal policy is continuous, strictly increasing and also interior, in the sense that 0 < 𝜎∗(𝑦) < 𝑦 for all strictly positive 𝑦, and
• the value function is strictly concave and continuously differentiable, with

$$(v^*)'(y) = u'(\sigma^*(y)) := (u' \circ \sigma^*)(y) \tag{2}$$

The last result is called the envelope condition due to its relationship with the envelope
theorem.
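To see why (2) holds, write the Bellman equation in the equivalent form

$$v^*(y) = \max_{0 \leq k \leq y} \left\{ u(y - k) + \beta \int v^*(f(k) z) \phi(dz) \right\}$$

Differentiating with respect to 𝑦 and then evaluating at the optimum yields (2). By the envelope theorem, only the direct dependence of the objective on 𝑦, through 𝑢(𝑦 − 𝑘), matters. Hence $(v^*)'(y) = u'(y - k^*)$, where $k^* = y - \sigma^*(y)$ is optimal savings, so that $(v^*)'(y) = u'(\sigma^*(y))$.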
(Section 12.1 of EDTC contains full proofs of these results, and closely related discussions can
be found in many other texts.)
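In the log/Cobb-Douglas case from the previous lectures, the envelope condition (2) can be checked directly against the closed-form solutions. There v*(y) = constant + ln y/(1 − αβ), so (v*)'(y) = 1/((1 − αβ)y), which equals u'(σ*(y)) for u = ln.

The sketch below compares a central finite difference of v* with u' ∘ σ*. The parameter values are the lecture defaults, used purely for illustration.

```python
import numpy as np

α, β, μ = 0.4, 0.96, 0.0   # illustrative parameter values

def v_star(y):
    "Closed-form value function for log utility and f(k) = k**α."
    c1 = np.log(1 - α * β) / (1 - β)
    c2 = (μ + α * np.log(α * β)) / (1 - α)
    c3, c4 = 1 / (1 - β), 1 / (1 - α * β)
    return c1 + c2 * (c3 - c4) + c4 * np.log(y)

def σ_star(y):
    "Closed-form optimal policy."
    return (1 - α * β) * y

y = np.linspace(0.1, 4.0, 40)
h = 1e-5
dv = (v_star(y + h) - v_star(y - h)) / (2 * h)   # (v*)'(y), central difference
u_prime_sigma = 1 / σ_star(y)                     # (u' ∘ σ*)(y) with u = ln
print(np.max(np.abs(dv / u_prime_sigma - 1)))     # small relative gap
```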
Differentiability of the value function and interiority of the optimal policy imply that optimal consumption satisfies the first-order condition associated with (1), which is

$$u'(\sigma^*(y)) = \beta \int (v^*)'(f(y - \sigma^*(y)) z) f'(y - \sigma^*(y)) z \phi(dz) \tag{3}$$

Combining (2) and the first-order condition (3) gives the Euler equation

$$(u' \circ \sigma^*)(y) = \beta \int (u' \circ \sigma^*)(f(y - \sigma^*(y)) z) f'(y - \sigma^*(y)) z \phi(dz) \tag{4}$$

We can think of the Euler equation as a functional equation

$$(u' \circ \sigma)(y) = \beta \int (u' \circ \sigma)(f(y - \sigma(y)) z) f'(y - \sigma(y)) z \phi(dz) \tag{5}$$

over interior consumption policies 𝜎, one solution of which is the optimal policy 𝜎∗.
Our aim is to solve the functional equation (5) and hence obtain 𝜎∗.
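For the log/Cobb-Douglas specification, the Euler equation can be verified by hand. With σ*(y) = (1 − αβ)y, savings are k = αβy, and the integrand in (5) is

α k^(α−1) z / ((1 − αβ) k^α z) = α/((1 − αβ)k).

The shock z cancels, so the right side of (5) equals αβ · α/((1 − αβ) αβ y)... more simply, β · α/((1 − αβ)αβy) = 1/((1 − αβ)y), which is the left side.

The Monte Carlo sketch below confirms this numerically. It also shows that a non-optimal policy, σ(y) = y/2, leaves a gap. The parameters, draws and helper names are illustrative.

```python
import numpy as np

α, β, μ, s = 0.4, 0.96, 0.0, 0.1                  # illustrative parameters
rng = np.random.default_rng(42)
z = np.exp(μ + s * rng.standard_normal(10_000))   # Monte Carlo draws from φ

f = lambda k: k**α
f_prime = lambda k: α * k**(α - 1)
u_prime = lambda c: 1 / c                          # log utility

def euler_gap(σ, y):
    "Left side minus right side of (5) at y, with the integral by Monte Carlo."
    k = y - σ(y)
    rhs = β * np.mean(u_prime(σ(f(k) * z)) * f_prime(k) * z)
    return u_prime(σ(y)) - rhs

σ_star = lambda y: (1 - α * β) * y   # optimal policy
σ_half = lambda y: 0.5 * y           # consumes too little

for y in (0.5, 1.0, 2.0):
    print(y, euler_gap(σ_star, y), euler_gap(σ_half, y))
```

For σ(y) = y/2 the gap works out to 2(1 − 2αβ)/y > 0. Marginal utility today exceeds the discounted expected marginal return to saving, so this agent would gain by consuming more.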

30.3.1 The Coleman-Reffett Operator

Recall the Bellman operator

$$Tv(y) := \max_{0 \leq c \leq y} \left\{ u(c) + \beta \int v(f(y - c) z) \phi(dz) \right\} \tag{6}$$

Just as we introduced the Bellman operator to solve the Bellman equation, we will now introduce an operator over policies to help us solve the Euler equation.
This operator 𝐾 will act on the set of all 𝜎 ∈ Σ that are continuous, strictly increasing and interior.
Henceforth we denote this set of policies by 𝒫.

1. The operator 𝐾 takes as its argument a 𝜎 ∈ 𝒫 and

2. returns a new function 𝐾𝜎, where 𝐾𝜎(𝑦) is the 𝑐 ∈ (0, 𝑦) that solves

$$u'(c) = \beta \int (u' \circ \sigma)(f(y - c) z) f'(y - c) z \phi(dz) \tag{7}$$

We call this operator the Coleman-Reffett operator to acknowledge the work of [23] and
[89].

In essence, 𝐾𝜎 is the consumption policy that the Euler equation tells you to choose today
when your future consumption policy is 𝜎.
The important thing to note about 𝐾 is that, by construction, its fixed points coincide with
solutions to the functional equation (5).
In particular, the optimal policy 𝜎∗ is a fixed point.
Indeed, for fixed 𝑦, the value 𝐾𝜎∗(𝑦) is the 𝑐 that solves

$$u'(c) = \beta \int (u' \circ \sigma^*)(f(y - c) z) f'(y - c) z \phi(dz)$$

In view of the Euler equation, this is exactly 𝜎∗(𝑦).

30.3.2 Is the Coleman-Reffett Operator Well Defined?

That is, for each 𝑦 > 0, is there always a unique 𝑐 ∈ (0, 𝑦) that solves (7)?

The answer is yes, under our assumptions.
For any 𝜎 ∈ 𝒫, the right side of (7)
• is continuous and strictly increasing in 𝑐 on (0, 𝑦)
• diverges to +∞ as 𝑐 ↑ 𝑦
The left side of (7)
• is continuous and strictly decreasing in 𝑐 on (0, 𝑦)
• diverges to +∞ as 𝑐 ↓ 0
Sketching these curves and using the information above will convince you that they cross exactly once as 𝑐 ranges over (0, 𝑦).
With a bit more analysis, one can show in addition that 𝐾𝜎 ∈ 𝒫 whenever 𝜎 ∈ 𝒫.

30.3.3 Comparison with VFI (Theory)

It is possible to prove that there is a tight relationship between iterates of 𝐾 and iterates of
the Bellman operator.
Mathematically, the two operators are topologically conjugate.
Loosely speaking, this means that if iterates of one operator converge then so do iterates of
the other, and vice versa.
Moreover, there is a sense in which they converge at the same rate, at least in theory.
However, it turns out that the operator 𝐾 is more stable numerically and hence more efficient
in the applications we consider.
Examples are given below.

30.4 Implementation

As in our previous study, we continue to assume that

• 𝑢(𝑐) = ln 𝑐
• $f(k) = k^{\alpha}$
• 𝜙 is the distribution of 𝜉 ∶= exp(𝜇 + 𝑠𝜁) when 𝜁 is standard normal

This will allow us to compare our results to the analytical solutions:

In [3]:
def v_star(y, α, β, μ):
    """
    True value function
    """
    c1 = np.log(1 - α * β) / (1 - β)
    c2 = (μ + α * np.log(α * β)) / (1 - α)
    c3 = 1 / (1 - β)
    c4 = 1 / (1 - α * β)
    return c1 + c2 * (c3 - c4) + c4 * np.log(y)

def σ_star(y, α, β):
    """
    True optimal policy
    """
    return (1 - α * β) * y

As discussed above, our plan is to solve the model using time iteration, which means iterating
with the operator 𝐾.
For this we need access to the functions 𝑢′, 𝑓 and 𝑓′.
These are available in a class called OptimalGrowthModel that we constructed in an earlier
lecture.

In [4]:
opt_growth_data = [
    ('α', float64),          # Production parameter
    ('β', float64),          # Discount factor
    ('μ', float64),          # Shock location parameter
    ('s', float64),          # Shock scale parameter
    ('grid', float64[:]),    # Grid (array)
    ('shocks', float64[:])   # Shock draws (array)
]

@jitclass(opt_growth_data)
class OptimalGrowthModel:

    def __init__(self,
                 α=0.4,
                 β=0.96,
                 μ=0,
                 s=0.1,
                 grid_max=4,
                 grid_size=120,
                 shock_size=250,
                 seed=1234):

        self.α, self.β, self.μ, self.s = α, β, μ, s

        # Set up grid
        self.grid = np.linspace(1e-5, grid_max, grid_size)

        # Store shocks (with a seed, so results are reproducible)
        np.random.seed(seed)
        self.shocks = np.exp(μ + s * np.random.randn(shock_size))

    def f(self, k):
        "The production function"
        return k**self.α

    def u(self, c):
        "The utility function"
        return np.log(c)

    def f_prime(self, k):
        "Derivative of f"
        return self.α * (k**(self.α - 1))

    def u_prime(self, c):
        "Derivative of u"
        return 1/c

    def u_prime_inv(self, c):
        "Inverse of u'"
        return 1/c


Now we implement a method called euler_diff, which returns

$$u'(c) - \beta \int (u' \circ \sigma)(f(y - c) z) f'(y - c) z \phi(dz) \tag{8}$$

In [5]:
@njit
def euler_diff(c, σ, y, og):
    """
    Set up a function such that the root with respect to c,
    given y and σ, is equal to Kσ(y).

    """

    β, shocks, grid = og.β, og.shocks, og.grid
    f, f_prime, u_prime = og.f, og.f_prime, og.u_prime

    # First turn σ into a function via interpolation
    σ_func = lambda x: interp(grid, σ, x)

    # Now set up the function we need to find the root of.
    vals = u_prime(σ_func(f(y - c) * shocks)) * f_prime(y - c) * shocks
    return u_prime(c) - β * np.mean(vals)

The function euler_diff evaluates integrals by Monte Carlo and approximates functions using linear interpolation.

We will use a root-finding algorithm to solve (8) for 𝑐 given state 𝑦 and 𝜎, the current guess
of the policy.
Here’s the operator 𝐾, which implements the root-finding step.

In [6]:
@njit
def K(σ, og):
    """
    The Coleman-Reffett operator

     Here og is an instance of OptimalGrowthModel.
    """

    β = og.β
    f, f_prime, u_prime = og.f, og.f_prime, og.u_prime
    grid, shocks = og.grid, og.shocks

    σ_new = np.empty_like(σ)
    for i, y in enumerate(grid):
        # Solve for optimal c at y
        c_star = brentq(euler_diff, 1e-10, y-1e-10, args=(σ, y, og))[0]
        σ_new[i] = c_star

    return σ_new
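For readers who want to experiment without Numba, interpolation.py or QuantEcon, here is a pure-NumPy sketch of the same operator. It makes the following assumptions:

• Log utility and Cobb-Douglas production are hardwired.
• np.interp replaces interpolation.interp.
• Plain bisection replaces quantecon.optimize.brentq.

Bisection is a natural fit here. The well-definedness argument above shows that u'(c) minus the right side of (7) is strictly decreasing in c and changes sign exactly once on (0, y). All names are our own, and the code is illustrative rather than a drop-in replacement.

```python
import numpy as np

# Illustrative stand-ins for the OptimalGrowthModel defaults (log utility)
α, β, μ, s = 0.4, 0.96, 0.0, 0.1
grid = np.linspace(1e-5, 4, 120)
rng = np.random.default_rng(1234)
shocks = np.exp(μ + s * rng.standard_normal(250))

def euler_diff(c, σ, y):
    "u'(c) minus a Monte Carlo estimate of the right side of (7), u = ln."
    k = y - c
    σ_next = np.interp(k**α * shocks, grid, σ)   # σ evaluated at f(k) ξ
    return 1 / c - β * np.mean((1 / σ_next) * α * k**(α - 1) * shocks)

def bisect_decreasing(g, lo, hi, tol=1e-12, max_iter=200):
    "Root of a strictly decreasing g with g(lo) > 0 > g(hi), by bisection."
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def K(σ):
    "Coleman-Reffett operator on the grid: solve (7) for c at each y."
    σ_new = np.empty_like(σ)
    for i, y in enumerate(grid):
        σ_new[i] = bisect_decreasing(lambda c: euler_diff(c, σ, y),
                                     1e-10, y - 1e-10)
    return σ_new

# σ* is linear, so linear interpolation reproduces it exactly and
# K(σ*) should equal σ* up to the bisection tolerance
σ_star = (1 - α * β) * grid
fixed_point_error = np.max(np.abs(K(σ_star) - σ_star))
print(fixed_point_error)

# Time iteration starting from σ(y) = y
σ = grid.copy()
for _ in range(20):
    σ = K(σ)
ti_error = np.max(np.abs(σ - σ_star))
print(ti_error)
```

Bisection gains only one bit of accuracy per step, so it is slower than Brent's method. Its advantage is that it relies on nothing beyond the single sign change established above.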

30.4.1 Testing

Let’s generate an instance and plot some iterates of 𝐾, starting from 𝜎(𝑦) = 𝑦.

In [7]:
og = OptimalGrowthModel()
grid = og.grid

n = 15
σ = grid.copy()  # Set initial condition

fig, ax = plt.subplots()
lb = r'initial condition $\sigma(y) = y$'
ax.plot(grid, σ, color=plt.cm.jet(0), alpha=0.6, label=lb)

for i in range(n):
    σ = K(σ, og)
    ax.plot(grid, σ, color=plt.cm.jet(i / n), alpha=0.6)

# Update one more time and plot the last iterate in black
σ = K(σ, og)
ax.plot(grid, σ, color='k', alpha=0.8, label='last iterate')

ax.legend()

plt.show()

We see that the iteration process converges quickly to a limit that resembles the solution we
obtained in the previous lecture.
Here is a function called solve_model_time_iter that takes an instance of
OptimalGrowthModel and returns an approximation to the optimal policy, using time
iteration.

In [8]:
def solve_model_time_iter(model,    # Class with model information
                          σ,        # Initial condition
                          tol=1e-4,
                          max_iter=1000,
                          verbose=True,
                          print_skip=25):
    """
    Iterate with the Coleman-Reffett operator K until the sup-norm
    distance between successive policies falls below tol.
    """

    # Set up loop
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        σ_new = K(σ, model)
        error = np.max(np.abs(σ - σ_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        σ = σ_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return σ_new

Let’s call it:



In [9]:
σ_init = np.copy(og.grid)
σ = solve_model_time_iter(og, σ_init)

Converged in 11 iterations.

Here is a plot of the resulting policy, compared with the true policy:

In [10]:
fig, ax = plt.subplots()

ax.plot(og.grid, σ, lw=2,
        alpha=0.8, label='approximate policy function')

ax.plot(og.grid, σ_star(og.grid, og.α, og.β), 'k--',
        lw=2, alpha=0.8, label='true policy function')

ax.legend()
plt.show()

Again, the fit is excellent.

The maximal absolute deviation between the two policies is

In [11]: np.max(np.abs(σ - σ_star(og.grid, og.α, og.β)))

Out[11]: 2.5329106212446106e-05

How long does it take to converge?

In [12]:
%%timeit -n 3 -r 1
σ = solve_model_time_iter(og, σ_init, verbose=False)

181 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 3 loops each)

Convergence is very fast, even compared to our JIT-compiled value function iteration: about 181 ms here versus roughly 8 s for VFI with the same default parameters.
Overall, we find that time iteration provides a very high degree of efficiency and accuracy, at least for this model.
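Part of the speed-up has a simple analytical explanation in this log/Cobb-Douglas case. If σ(y) = θy, the shocks cancel in (7), whose root is then

Kσ(y) = θy/(θ + αβ).

Time iteration started from σ(y) = y therefore stays linear and reduces to the scalar recursion θ_{n+1} = θ_n/(θ_n + αβ), with fixed point θ = 1 − αβ. Writing e_n = θ_n − (1 − αβ), one finds e_{n+1} = αβ e_n/(1 + e_n). The error thus shrinks by roughly αβ ≈ 0.38 per iteration, compared with roughly β = 0.96 for value function iteration.

The sketch below runs this recursion with the stopping rule of solve_model_time_iter. It assumes the default grid maximum of 4, where the sup distance |θ_n − θ_{n+1}|y is attained, and tol = 1e-4.

```python
α, β = 0.4, 0.96
ab = α * β
grid_max, tol = 4.0, 1e-4    # assumed defaults of the lecture code

θ = 1.0                      # initial condition σ(y) = y
i, error = 0, tol + 1
while error > tol:
    θ_new = θ / (θ + ab)                 # K maps θ y to θ y / (θ + αβ)
    error = grid_max * abs(θ - θ_new)    # sup over the grid, attained at y = grid_max
    θ = θ_new
    i += 1

print(i, θ, 1 - ab)
```

This gives 11 iterations, matching the count reported by solve_model_time_iter above.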

30.5 Exercises

30.5.1 Exercise 1

Solve the model with CRRA utility

$$u(c) = \frac{c^{1-\gamma}}{1-\gamma}$$

Set γ = 1.5.
Compute and plot the optimal policy.

30.6 Solutions

30.6.1 Exercise 1

We use the class OptimalGrowthModel_CRRA from our VFI lecture.

In [13]:
opt_growth_data = [
    ('α', float64),          # Production parameter
    ('β', float64),          # Discount factor
    ('μ', float64),          # Shock location parameter
    ('γ', float64),          # Preference parameter
    ('s', float64),          # Shock scale parameter
    ('grid', float64[:]),    # Grid (array)
    ('shocks', float64[:])   # Shock draws (array)
]

@jitclass(opt_growth_data)
class OptimalGrowthModel_CRRA:

    def __init__(self,
                 α=0.4,
                 β=0.96,
                 μ=0,
                 s=0.1,
                 γ=1.5,
                 grid_max=4,
                 grid_size=120,
                 shock_size=250,
                 seed=1234):

        self.α, self.β, self.γ, self.μ, self.s = α, β, γ, μ, s

        # Set up grid
        self.grid = np.linspace(1e-5, grid_max, grid_size)

        # Store shocks (with a seed, so results are reproducible)
        np.random.seed(seed)
        self.shocks = np.exp(μ + s * np.random.randn(shock_size))

    def f(self, k):
        "The production function."
        return k**self.α

    def u(self, c):
        "The utility function."
        return c**(1 - self.γ) / (1 - self.γ)

    def f_prime(self, k):
        "Derivative of f."
        return self.α * (k**(self.α - 1))

    def u_prime(self, c):
        "Derivative of u."
        return c**(-self.γ)

    def u_prime_inv(self, c):
        "Inverse of u'."
        return c**(-1 / self.γ)


Let’s create an instance:

In [14]: og_crra = OptimalGrowthModel_CRRA()

Now we solve and plot the policy:

In [15]:
%%time
σ = solve_model_time_iter(og_crra, σ_init)

fig, ax = plt.subplots()

ax.plot(og_crra.grid, σ, lw=2,
        alpha=0.8, label='approximate policy function')

ax.legend()
plt.show()

Converged in 13 iterations.

CPU times: user 2.39 s, sys: 3.99 ms, total: 2.39 s
Wall time: 2.39 s
Chapter 31

Optimal Growth IV: The Endogenous Grid Method

31.1 Contents

• Overview 31.2
• Key Idea 31.3
• Implementation 31.4
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]:
!conda install -y quantecon
!pip install interpolation

31.2 Overview

Previously, we solved the stochastic optimal growth model using

1. value function iteration
2. Euler equation based time iteration

We found time iteration to be significantly more accurate and efficient.


In this lecture, we’ll look at a clever twist on time iteration called the endogenous grid
method (EGM).
EGM is a numerical method for implementing policy iteration invented by Chris Carroll.
The original reference is [21].
Let’s start with some standard imports:

In [2]: import numpy as np
        import quantecon as qe
        from interpolation import interp
        from numba import njit, float64
        from numba.experimental import jitclass  # numba.jitclass is deprecated in favor of this location
        from quantecon.optimize import brentq
        import matplotlib.pyplot as plt
        %matplotlib inline

31.3 Key Idea

Let’s start by reminding ourselves of the theory and then see how the numerics fit in.

31.3.1 Theory

Take the model set out in the time iteration lecture, following the same terminology and notation.
The Euler equation is

$$
(u' \circ \sigma^*)(y) = \beta \int (u' \circ \sigma^*)(f(y - \sigma^*(y)) z) \, f'(y - \sigma^*(y)) \, z \, \phi(dz) \tag{1}
$$

As we saw, the Coleman-Reffett operator is a nonlinear operator 𝐾 engineered so that 𝜎∗ is a fixed point of 𝐾.
It takes as its argument a continuous strictly increasing consumption policy 𝜎 ∈ Σ.
It returns a new function 𝐾𝜎, where (𝐾𝜎)(𝑦) is the 𝑐 ∈ (0, ∞) that solves

$$
u'(c) = \beta \int (u' \circ \sigma)(f(y - c) z) \, f'(y - c) \, z \, \phi(dz) \tag{2}
$$

31.3.2 Exogenous Grid

As discussed in the lecture on time iteration, to implement the method on a computer, we need a numerical approximation.
In particular, we represent a policy function by a set of values on a finite grid.
The function itself is reconstructed from this representation when necessary, using interpolation or some other method.
Previously, to obtain a finite representation of an updated consumption policy, we
• fixed a grid of income points {𝑦𝑖 }
• calculated the consumption value 𝑐𝑖 corresponding to each 𝑦𝑖 using (2) and a root-finding routine
Each 𝑐𝑖 is then interpreted as the value of the function 𝐾𝜎 at 𝑦𝑖 .
Thus, with the points {𝑦𝑖 , 𝑐𝑖 } in hand, we can reconstruct 𝐾𝜎 via approximation.
Iteration then continues…

31.3.3 Endogenous Grid

The method discussed above requires a root-finding routine to find the 𝑐𝑖 corresponding to a given income value 𝑦𝑖.
Root-finding is costly because it typically involves a significant number of function evaluations.
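To make this cost concrete, here is a minimal sketch of one exogenous-grid update under assumptions chosen only for illustration: log utility, f(k) = k^α, lognormal shocks, α = 0.4, β = 0.96 and a shock scale of 0.1. The policy being updated is the true one, σ(y) = (1 − αβ)y, so the update should return it unchanged. SciPy's `brentq` with `full_output=True` reports how many times the Monte Carlo version of the integral in (2) had to be evaluated.

```python
import numpy as np
from scipy.optimize import brentq

# Assumed illustrative primitives: u(c) = ln c, f(k) = k**α, lognormal shocks
α, β, s = 0.4, 0.96, 0.1
rng = np.random.default_rng(1234)
shocks = np.exp(s * rng.standard_normal(250))    # Monte Carlo draws of z

f = lambda k: k**α
f_prime = lambda k: α * k**(α - 1)
u_prime = lambda c: 1 / c
σ = lambda y: (1 - α * β) * y                     # policy being updated

def euler_gap(c, y):
    "Left-hand side minus right-hand side of (2) at income y."
    k = y - c
    rhs = β * np.mean(u_prime(σ(f(k) * shocks)) * f_prime(k) * shocks)
    return u_prime(c) - rhs

y_grid = np.linspace(1e-5, 4, 120)                # exogenous income grid
c_new = np.empty_like(y_grid)
total_calls = 0
for i, y in enumerate(y_grid):
    # One root-finding problem per grid point
    c_new[i], info = brentq(euler_gap, 1e-12, y - 1e-12,
                            args=(y,), full_output=True)
    total_calls += info.function_calls

print(f"{total_calls} evaluations of the integral for {len(y_grid)} grid points")
```

Each of those evaluations averages over all 250 shock draws, and the total grows with the number of grid points; EGM replaces each root-finding problem with a single direct evaluation of (3).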
As pointed out by Carroll [21], we can avoid this if 𝑦𝑖 is chosen endogenously.
The only assumption required is that 𝑢′ is invertible on (0, ∞).
Let (𝑢′ )−1 be the inverse function of 𝑢′ .
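For the CRRA utility function used in OptimalGrowthModel_CRRA above, 𝑢′(𝑐) = 𝑐^{−γ} is strictly decreasing on (0, ∞) and its inverse has the closed form (𝑢′)^{−1}(𝑥) = 𝑥^{−1/γ}; in the log utility case (the γ → 1 limit) this becomes 1/𝑥, which is what OptimalGrowthModel.u_prime_inv returns. A quick numerical round trip, with γ = 1.5 assumed for illustration:

```python
import numpy as np

γ = 1.5  # assumed preference parameter

def u_prime(c):
    "Marginal utility for CRRA preferences."
    return c**(-γ)

def u_prime_inv(x):
    "Inverse of u_prime on (0, ∞)."
    return x**(-1 / γ)

c = np.linspace(0.1, 10, 50)
print(np.max(np.abs(u_prime_inv(u_prime(c)) - c)))
```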
The idea is this:
• First, we fix an exogenous grid {𝑘𝑖 } for capital (𝑘 = 𝑦 − 𝑐).
• Then we obtain 𝑐𝑖 via

$$
c_i = (u')^{-1} \left\{ \beta \int (u' \circ \sigma)(f(k_i) z) \, f'(k_i) \, z \, \phi(dz) \right\} \tag{3}
$$

• Finally, for each 𝑐𝑖 we set 𝑦𝑖 = 𝑐𝑖 + 𝑘𝑖 .


Each (𝑦𝑖, 𝑐𝑖) pair constructed in this manner satisfies (2): applying 𝑢′ to both sides of (3) and substituting 𝑘𝑖 = 𝑦𝑖 − 𝑐𝑖 gives exactly (2) evaluated at 𝑦 = 𝑦𝑖 and 𝑐 = 𝑐𝑖.
With the points {𝑦𝑖 , 𝑐𝑖 } in hand, we can reconstruct 𝐾𝜎 via approximation as before.
The name EGM comes from the fact that the grid {𝑦𝑖 } is determined endogenously.
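The three steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used below: it assumes log utility, f(k) = k^α and lognormal shocks (the setting of the next section, with α = 0.4, β = 0.96 and shock scale 0.1 chosen here), and applies one EGM step to the true policy σ(y) = (1 − αβ)y. Because that policy is a fixed point of 𝐾, the resulting pairs (𝑦𝑖, 𝑐𝑖) should lie on the line c = (1 − αβ)y, and no root-finding is involved.

```python
import numpy as np

# Assumed illustrative primitives: u(c) = ln c, f(k) = k**α, lognormal z
α, β, s = 0.4, 0.96, 0.1
rng = np.random.default_rng(1234)
shocks = np.exp(s * rng.standard_normal(250))

f = lambda k: k**α
f_prime = lambda k: α * k**(α - 1)
u_prime = lambda c: 1 / c
u_prime_inv = lambda x: 1 / x             # inverse of u'(c) = 1/c
σ = lambda y: (1 - α * β) * y             # current policy

# Step 1: exogenous grid for capital k = y - c
k_grid = np.linspace(1e-5, 4, 120)

# Step 2: equation (3), one direct evaluation per grid point
c = np.array([u_prime_inv(β * np.mean(u_prime(σ(f(k) * shocks))
                                      * f_prime(k) * shocks))
              for k in k_grid])

# Step 3: the endogenous income grid
y = c + k_grid

print(np.max(np.abs(c - (1 - α * β) * y)))
```

Note that the income grid 𝑦 is an output of the computation rather than an input.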

31.4 Implementation

As before, we will start with a simple setting where


• 𝑢(𝑐) = ln 𝑐,
• production is Cobb-Douglas, and
• the shocks are lognormal.
This will allow us to make comparisons with the analytical solutions.

In [3]: def v_star(y, α, β, μ):
            """
            True value function
            """
            c1 = np.log(1 - α * β) / (1 - β)
            c2 = (μ + α * np.log(α * β)) / (1 - α)
            c3 = 1 / (1 - β)
            c4 = 1 / (1 - α * β)
            return c1 + c2 * (c3 - c4) + c4 * np.log(y)

        def σ_star(y, α, β):
            """
            True optimal policy
            """
            return (1 - α * β) * y

We reuse the OptimalGrowthModel class.

In [4]: opt_growth_data = [
            ('α', float64),          # Production parameter
            ('β', float64),          # Discount factor
            ('μ', float64),          # Shock location parameter
            ('s', float64),          # Shock scale parameter
            ('grid', float64[:]),    # Grid (array)
            ('shocks', float64[:])   # Shock draws (array)
        ]

        @jitclass(opt_growth_data)
        class OptimalGrowthModel:

            def __init__(self,
                         α=0.4,
                         β=0.96,
                         μ=0,
                         s=0.1,
                         grid_max=4,
                         grid_size=120,
                         shock_size=250,
                         seed=1234):

                self.α, self.β, self.μ, self.s = α, β, μ, s

                # Set up grid
                self.grid = np.linspace(1e-5, grid_max, grid_size)

                # Store shocks (with a seed, so results are reproducible)
                np.random.seed(seed)
                self.shocks = np.exp(μ + s * np.random.randn(shock_size))

            def f(self, k):
                "The production function"
                return k**self.α

            def u(self, c):
                "The utility function"
                return np.log(c)

            def f_prime(self, k):
                "Derivative of f"
                return self.α * (k**(self.α - 1))

            def u_prime(self, c):
                "Derivative of u"
                return 1/c

            def u_prime_inv(self, c):
                "Inverse of u'"
                return 1/c


31.4.1 The Operator

Here’s an implementation of 𝐾 using EGM as described above.

In [5]: @njit
        def K(σ_array, og):
            """
            The Coleman-Reffett operator using EGM

            """

            # Simplify names
            f, β = og.f, og.β
            f_prime, u_prime = og.f_prime, og.u_prime
            u_prime_inv = og.u_prime_inv
            grid, shocks = og.grid, og.shocks

            # Determine endogenous grid
            y = grid + σ_array  # y_i = k_i + c_i

            # Linear interpolation of policy using endogenous grid
            σ = lambda x: interp(y, σ_array, x)

            # Allocate memory for new consumption array
            c = np.empty_like(grid)

            # Solve for updated consumption value
            for i, k in enumerate(grid):
                vals = u_prime(σ(f(k) * shocks)) * f_prime(k) * shocks
                c[i] = u_prime_inv(β * np.mean(vals))

            return c

Note the lack of any root-finding algorithm.
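For readers without the `interpolation` package or Numba, the same operator can be sketched in plain NumPy. This variant, `K_numpy`, is a hypothetical helper of our own rather than part of the lecture code: it swaps `interpolation.interp` for `np.interp` (which holds values constant outside the grid instead of extrapolating) and replaces the loop over `k` with broadcasting over a (grid × shocks) array. Parameters are passed explicitly and assume log utility and f(k) = k^α, as in `OptimalGrowthModel`.

```python
import numpy as np

def K_numpy(σ_array, k_grid, shocks, α=0.4, β=0.96):
    """
    EGM update for log utility and f(k) = k**α (a NumPy sketch).
    σ_array[i] is consumption at the endogenous income point k_grid[i] + σ_array[i].
    """
    y = k_grid + σ_array                        # endogenous grid, y_i = k_i + c_i
    k = k_grid[:, None]                         # column vector, shape (n, 1)
    y_next = k**α * shocks[None, :]             # f(k_i) z_j, shape (n, m)
    c_next = np.interp(y_next, y, σ_array)      # σ(f(k_i) z_j) by linear interpolation
    integrand = (1 / c_next) * α * k**(α - 1) * shocks[None, :]
    return 1 / (β * integrand.mean(axis=1))     # (u')^{-1}(β E[...]) with u'(c) = 1/c

rng = np.random.default_rng(1234)
shocks = np.exp(0.1 * rng.standard_normal(250))  # assumed lognormal draws, μ = 0, s = 0.1
k_grid = np.linspace(1e-5, 4, 120)

σ = np.copy(k_grid)                              # same initial condition as the lecture's σ_init
for n_iter in range(1, 1001):
    σ_new = K_numpy(σ, k_grid, shocks)
    error = np.max(np.abs(σ - σ_new))
    σ = σ_new
    if error < 1e-4:
        break

y = k_grid + σ
print(n_iter, np.max(np.abs(σ - (1 - 0.4 * 0.96) * y)))
```

Broadcasting removes the loop over the capital grid at the cost of storing a (grid size × number of shocks) array. Because the true policy is linear and linear interpolation reproduces linear functions exactly, the remaining deviation from σ∗ should mostly reflect the stopping tolerance.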

31.4.2 Testing

First we create an instance.

In [6]: og = OptimalGrowthModel()
grid = og.grid

Here’s our solver routine:

In [7]: def solve_model_time_iter(model,    # Class with model information
                                  σ,        # Initial condition
                                  tol=1e-4,
                                  max_iter=1000,
                                  verbose=True,
                                  print_skip=25):

            # Set up loop
            i = 0
            error = tol + 1

            while i < max_iter and error > tol:
                σ_new = K(σ, model)
                error = np.max(np.abs(σ - σ_new))
                i += 1
                if verbose and i % print_skip == 0:
                    print(f"Error at iteration {i} is {error}.")
                σ = σ_new

            if i == max_iter:
                print("Failed to converge!")

            if verbose and i < max_iter:
                print(f"\nConverged in {i} iterations.")

            return σ_new

Let’s call it:

In [8]: σ_init = np.copy(grid)
        σ = solve_model_time_iter(og, σ_init)

Converged in 12 iterations.

Here is a plot of the resulting policy, compared with the true policy:

In [9]: y = grid + σ  # y_i = k_i + c_i

        fig, ax = plt.subplots()

        ax.plot(y, σ, lw=2,
                alpha=0.8, label='approximate policy function')

        ax.plot(y, σ_star(y, og.α, og.β), 'k--',
                lw=2, alpha=0.8, label='true policy function')

        ax.legend()
        plt.show()

The maximal absolute deviation between the two policies is

In [10]: np.max(np.abs(σ - σ_star(y, og.α, og.β)))

Out[10]: 1.530274914252061e-05

How long does it take to converge?

In [11]: %%timeit -n 3 -r 1
         σ = solve_model_time_iter(og, σ_init, verbose=False)

28.9 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 3 loops each)

Relative to time iteration, which we had already found to be highly efficient, EGM has managed to shave off still more run time without compromising accuracy.
This is due to the lack of a numerical root-finding step.
We can now solve the optimal growth model at given parameters extremely fast.
Chapter 32

The Income Fluctuation Problem I: Basic Model

32.1 Contents

• Overview 32.2
• The Optimal Savings Problem 32.3
• Computation 32.4
• Implementation 32.5
• Exercises 32.6
• Solutions 32.7
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon
        !pip install interpolation

32.2 Overview

In this lecture, we study an optimal savings problem for an infinitely lived consumer—the
“common ancestor” described in [72], section 1.3.
This is an essential sub-problem for many representative macroeconomic models:
• [4]
• [59]
• etc.
It is related to the decision problem in the stochastic optimal growth model and yet differs in
important ways.
For example, the choice problem for the agent includes an additive income term that leads to
an occasionally binding constraint.
Moreover, in this and the following lectures, we will inject more realistic features such as correlated shocks.
To solve the model we will use Euler equation based time iteration, which proved to be fast and accurate in our investigation of the stochastic optimal growth model.

Time iteration is globally convergent under mild assumptions, even when utility is unbounded
(both above and below).
We’ll need the following imports:

In [2]: import numpy as np
        from quantecon.optimize import brent_max, brentq
        from interpolation import interp
        from numba import njit, float64
        from numba.experimental import jitclass  # numba.jitclass is deprecated in favor of this location
        import matplotlib.pyplot as plt
        %matplotlib inline
        from quantecon import MarkovChain

32.2.1 References

Our presentation is a simplified version of [75].


Other references include [26], [28], [68], [87], [90] and [96].

32.3 The Optimal Savings Problem

Let’s write down the model and then discuss how to solve it.

32.3.1 Set-Up

Consider a household that chooses a state-contingent consumption plan {𝑐𝑡}𝑡≥0 to maximize

$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t)
$$

subject to

$$
a_{t+1} \le R(a_t - c_t) + Y_{t+1}, \quad c_t \ge 0, \quad a_t \ge 0, \qquad t = 0, 1, \ldots \tag{1}
$$

Here
• 𝛽 ∈ (0, 1) is the discount factor
• 𝑎𝑡 is asset holdings at time 𝑡, with borrowing constraint 𝑎𝑡 ≥ 0
• 𝑐𝑡 is consumption
• 𝑌𝑡 is non-capital income (wages, unemployment compensation, etc.)
• 𝑅 ∶= 1 + 𝑟, where 𝑟 > 0 is the interest rate on savings
The timing here is as follows:

1. At the start of period 𝑡, the household chooses consumption 𝑐𝑡.
2. Labor is supplied by the household throughout the period and labor income 𝑌𝑡+1 is received at the end of period 𝑡.
3. Financial income 𝑅(𝑎𝑡 − 𝑐𝑡) is received at the end of period 𝑡.
4. Time shifts to 𝑡 + 1 and the process repeats.

Non-capital income 𝑌𝑡 is given by 𝑌𝑡 = 𝑦(𝑍𝑡), where {𝑍𝑡} is an exogenous state process.
As is common in the literature, we take {𝑍𝑡} to be a finite state Markov chain taking values in Z with Markov matrix 𝑃.
We further assume that

1. 𝛽𝑅 < 1
2. 𝑢 is smooth, strictly increasing and strictly concave with $\lim_{c \to 0} u'(c) = \infty$ and $\lim_{c \to \infty} u'(c) = 0$

The asset space is ℝ+ and the state is the pair (𝑎, 𝑧) ∈ S ∶= ℝ+ × Z.


A feasible consumption path from (𝑎, 𝑧) ∈ S is a consumption sequence {𝑐𝑡 } such that {𝑐𝑡 }
and its induced asset path {𝑎𝑡 } satisfy

1. (𝑎0, 𝑧0) = (𝑎, 𝑧),
2. the feasibility constraints in (1), and
3. measurability, which means that 𝑐𝑡 is a function of random outcomes up to date 𝑡 but not after.

The meaning of the third point is just that consumption at time 𝑡 cannot be a function of outcomes that are yet to be observed.
In fact, for this problem, consumption can be chosen optimally by taking it to be contingent
only on the current state.
Optimality is defined below.

32.3.2 Value Function and Euler Equation

The value function 𝑉 ∶ S → ℝ is defined by


$$
V(a, z) := \max \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t u(c_t) \right\} \tag{2}
$$

where the maximization is over all feasible consumption paths from (𝑎, 𝑧).
An optimal consumption path from (𝑎, 𝑧) is a feasible consumption path from (𝑎, 𝑧) that attains the supremum in (2).
To pin down such paths we can use a version of the Euler equation, which in the present setting is

$$
u'(c_t) \ge \beta R \, \mathbb{E}_t u'(c_{t+1}) \tag{3}
$$

and

$$
c_t < a_t \implies u'(c_t) = \beta R \, \mathbb{E}_t u'(c_{t+1}) \tag{4}
$$

When 𝑐𝑡 = 𝑎𝑡 we obviously have 𝑢′(𝑐𝑡) = 𝑢′(𝑎𝑡).
When 𝑐𝑡 hits the upper bound 𝑎𝑡, the strict inequality 𝑢′(𝑐𝑡) > 𝛽𝑅 𝔼𝑡 𝑢′(𝑐𝑡+1) can occur because 𝑐𝑡 cannot increase sufficiently to attain equality.
(The lower boundary case 𝑐𝑡 = 0 never arises at the optimum because 𝑢′(0) = ∞.)
With some thought, one can show that (3) and (4) are equivalent to

$$
u'(c_t) = \max \left\{ \beta R \, \mathbb{E}_t u'(c_{t+1}), \; u'(a_t) \right\} \tag{5}
$$
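Because 𝑢′ is strictly decreasing and, under the limit conditions above, maps (0, ∞) onto (0, ∞), its inverse is also strictly decreasing. Taking the larger of two marginal utilities therefore amounts to taking the smaller of the two corresponding consumption levels, and applying (𝑢′)⁻¹ to both sides of (5) gives an equivalent rule stated directly in terms of consumption:

```latex
$$
c_t = (u')^{-1}\left( \max\left\{ \beta R \, \mathbb{E}_t u'(c_{t+1}), \; u'(a_t) \right\} \right)
    = \min\left\{ (u')^{-1}\left( \beta R \, \mathbb{E}_t u'(c_{t+1}) \right), \; a_t \right\}
$$
```

In words, the household consumes the amount implied by the unconstrained Euler equation unless that amount exceeds current assets, in which case it consumes all of its assets.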

32.3.3 Optimality Results

As shown in [75],

1. For each (𝑎, 𝑧) ∈ S, a unique optimal consumption path from (𝑎, 𝑧) exists.
2. This path is the unique feasible path from (𝑎, 𝑧) satisfying the Euler equality (5) and the transversality condition

$$
\lim_{t \to \infty} \beta^t \, \mathbb{E} \left[ u'(c_t) a_{t+1} \right] = 0 \tag{6}
$$

Moreover, there exists an optimal consumption function 𝜎∗ ∶ S → ℝ+ such that the path from (𝑎, 𝑧) generated by

$$
(a_0, z_0) = (a, z), \quad c_t = \sigma^*(a_t, Z_t) \quad \text{and} \quad a_{t+1} = R(a_t - c_t) + Y_{t+1}
$$

satisfies both (5) and (6), and hence is the unique optimal path from (𝑎, 𝑧).
Thus, to solve the optimization problem, we need to compute the policy 𝜎∗ .

32.4 Computation

There are two standard ways to solve for 𝜎∗:

1. time iteration using the Euler equality and

2. value function iteration.

Our investigation of the cake eating problem and stochastic optimal growth model suggests
that time iteration will be faster and more accurate.
This is the approach that we apply below.

32.4.1 Time Iteration

We can rewrite (5) to make it a statement about functions rather than random variables.
In particular, consider the functional equation

$$
(u' \circ \sigma)(a, z) = \max \left\{ \beta R \, \mathbb{E}_z (u' \circ \sigma)\left[R(a - \sigma(a, z)) + \hat{Y}, \, \hat{Z}\right], \; u'(a) \right\} \tag{7}
$$

where
• (𝑢′ ∘ 𝜎)(𝑠) ∶= 𝑢′(𝜎(𝑠)),
• 𝔼𝑧 conditions on the current state 𝑧, while 𝑋̂ denotes the next period value of a random variable 𝑋, and
• 𝜎 is the unknown function.
We need a suitable class of candidate solutions for the optimal consumption policy.
The right way to pick such a class is to consider what properties the solution is likely to have,
in order to restrict the search space and ensure that iteration is well behaved.
To this end, let 𝒞 be the space of continuous functions 𝜎 ∶ S → ℝ such that 𝜎 is increasing in
the first argument, 0 < 𝜎(𝑎, 𝑧) ≤ 𝑎 for all (𝑎, 𝑧) ∈ S, and

$$
\sup_{(a, z) \in \mathsf{S}} \left| (u' \circ \sigma)(a, z) - u'(a) \right| < \infty \tag{8}
$$

This will be our candidate class.


In addition, let 𝐾 ∶ 𝒞 → 𝒞 be defined as follows.
For given 𝜎 ∈ 𝒞, the value 𝐾𝜎(𝑎, 𝑧) is the unique 𝑐 ∈ [0, 𝑎] that solves

$$
u'(c) = \max \left\{ \beta R \, \mathbb{E}_z (u' \circ \sigma)\left[R(a - c) + \hat{Y}, \, \hat{Z}\right], \; u'(a) \right\} \tag{9}
$$

We refer to 𝐾 as the Coleman–Reffett operator.


The operator 𝐾 is constructed so that fixed points of 𝐾 coincide with solutions to the functional equation (7).
It is shown in [75] that the unique optimal policy can be computed by picking any 𝜎 ∈ 𝒞 and
iterating with the operator 𝐾 defined in (9).

32.4.2 Some Technical Details

The proof of the last statement is somewhat technical but here is a quick summary:
It is shown in [75] that 𝐾 is a contraction mapping on 𝒞 under the metric

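$$
\rho(\sigma_1, \sigma_2) := \| u' \circ \sigma_1 - u' \circ \sigma_2 \| := \sup_{s \in \mathsf{S}} \left| u'(\sigma_1(s)) - u'(\sigma_2(s)) \right| \qquad (\sigma_1, \sigma_2 \in \mathcal{C})
$$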

which evaluates the maximal difference in terms of marginal utility.


(The benefit of this measure of distance is that, while elements of 𝒞 are not generally
bounded, 𝜌 is always finite under our assumptions.)

It is also shown that the metric 𝜌 is complete on 𝒞.


In consequence, 𝐾 has a unique fixed point 𝜎∗ ∈ 𝒞 and 𝐾ⁿ𝜎 → 𝜎∗ as 𝑛 → ∞ for any 𝜎 ∈ 𝒞.
By the definition of 𝐾, the fixed points of 𝐾 in 𝒞 coincide with the solutions to (7) in 𝒞.
As a consequence, the path {𝑐𝑡} generated from (𝑎0, 𝑧0) ∈ S using policy function 𝜎∗ is the unique optimal path from (𝑎0, 𝑧0) ∈ S.

32.5 Implementation

We use the CRRA utility specification

$$
u(c) = \frac{c^{1 - \gamma}}{1 - \gamma}
$$

The exogenous state process {𝑍𝑡} defaults to a two-state Markov chain with state space {0, 1} and transition matrix 𝑃.
Here we build a class called IFP that stores the model primitives.

In [3]: ifp_data = [
            ('R', float64),              # Interest rate 1 + r
            ('β', float64),              # Discount factor
            ('γ', float64),              # Preference parameter
            ('P', float64[:, :]),        # Markov matrix for binary Z_t
            ('y', float64[:]),           # Income is Y_t = y[Z_t]
            ('asset_grid', float64[:])   # Grid (array)
        ]

        @jitclass(ifp_data)
        class IFP:

            def __init__(self,
                         r=0.01,
                         β=0.96,
                         γ=1.5,
                         P=((0.6, 0.4),
                            (0.05, 0.95)),
                         y=(0.0, 2.0),
                         grid_max=16,
                         grid_size=50):

                self.R = 1 + r
                self.β, self.γ = β, γ
                self.P, self.y = np.array(P), np.array(y)
                self.asset_grid = np.linspace(0, grid_max, grid_size)

                # Recall that we need R β < 1 for convergence.
                assert self.R * self.β < 1, "Stability condition violated."

            def u_prime(self, c):
                return c**(-self.γ)

Next we provide a function to compute the difference

$$
u'(c) - \max \left\{ \beta R \, \mathbb{E}_z (u' \circ \sigma)\left[R(a - c) + \hat{Y}, \, \hat{Z}\right], \; u'(a) \right\} \tag{10}
$$

In [4]: @njit
        def euler_diff(c, a, z, σ_vals, ifp):
            """
            The difference between the left- and right-hand side
            of the Euler Equation, given current policy σ.

                * c is the consumption choice
                * (a, z) is the state, with z in {0, 1}
                * σ_vals is a policy represented as a matrix.
                * ifp is an instance of IFP

            """

            # Simplify names
            R, P, y, β, γ = ifp.R, ifp.P, ifp.y, ifp.β, ifp.γ
            asset_grid, u_prime = ifp.asset_grid, ifp.u_prime
            n = len(P)

            # Convert policy into a function by linear interpolation
            def σ(a, z):
                return interp(asset_grid, σ_vals[:, z], a)

            # Calculate the expectation conditional on current z
            expect = 0.0
            for z_hat in range(n):
                expect += u_prime(σ(R * (a - c) + y[z_hat], z_hat)) * P[z, z_hat]

            return u_prime(c) - max(β * R * expect, u_prime(a))

Note that we use linear interpolation along the asset grid to approximate the policy function.
The next step is to obtain the root of the Euler difference.

In [5]: @njit
        def K(σ, ifp):
            """
            The operator K.

            """
            σ_new = np.empty_like(σ)
            for i, a in enumerate(ifp.asset_grid):
                for z in (0, 1):
                    result = brentq(euler_diff, 1e-8, a, args=(a, z, σ, ifp))
                    σ_new[i, z] = result.root

            return σ_new

With the operator 𝐾 in hand, we can choose an initial condition and start to iterate.
The following function iterates to convergence and returns the approximate optimal policy.

In [6]: def solve_model_time_iter(model,    # Class with model information
                                  σ,        # Initial condition
                                  tol=1e-4,
                                  max_iter=1000,
                                  verbose=True,
                                  print_skip=25):

            # Set up loop
            i = 0
            error = tol + 1

            while i < max_iter and error > tol:
                σ_new = K(σ, model)
                error = np.max(np.abs(σ - σ_new))
                i += 1
                if verbose and i % print_skip == 0:
                    print(f"Error at iteration {i} is {error}.")
                σ = σ_new

            if i == max_iter:
                print("Failed to converge!")

            if verbose and i < max_iter:
                print(f"\nConverged in {i} iterations.")

            return σ_new

Let’s carry this out using the default parameters of the IFP class:

In [7]: ifp = IFP()

        # Set up initial consumption policy of consuming all assets at all z
        z_size = len(ifp.P)
        a_grid = ifp.asset_grid
        a_size = len(a_grid)
        σ_init = np.repeat(a_grid.reshape(a_size, 1), z_size, axis=1)

        σ_star = solve_model_time_iter(ifp, σ_init)

Error at iteration 25 is 0.011629589188246303.


Error at iteration 50 is 0.0003857183099467143.

Converged in 60 iterations.

Here’s a plot of the resulting policy for each exogenous state 𝑧.

In [8]: fig, ax = plt.subplots()
        for z in range(z_size):
            label = rf'$\sigma^*(\cdot, {z})$'
            ax.plot(a_grid, σ_star[:, z], label=label)
        ax.set(xlabel='assets', ylabel='consumption')
        ax.legend()
        plt.show()

The following exercises walk you through several applications where policy functions are computed.

32.5.1 A Sanity Check

One way to check our results is to


• set labor income to zero in each state and
• set the gross interest rate 𝑅 to unity.
In this case, our income fluctuation problem is just a cake eating problem.
We know that, in this case, the value function and optimal consumption policy are given by

In [9]: def c_star(x, β, γ):
            return (1 - β ** (1/γ)) * x

        def v_star(x, β, γ):
            return (1 - β**(1 / γ))**(-γ) * (x**(1-γ) / (1-γ))
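The formula in c_star can be recovered with a guess-and-verify argument. With 𝑅 = 1 and zero labor income the constraint is 𝑎𝑡+1 = 𝑎𝑡 − 𝑐𝑡. Guess 𝑐𝑡 = θ𝑎𝑡 with θ ∈ (0, 1), so that 𝑐𝑡 < 𝑎𝑡 and the Euler equation (4) holds with equality; under CRRA utility, and with nothing random, it reads:

```latex
$$
(\theta a_t)^{-\gamma} = \beta \, (\theta a_{t+1})^{-\gamma} = \beta \, \big(\theta (1 - \theta) a_t\big)^{-\gamma}
\;\Longrightarrow\; (1 - \theta)^{\gamma} = \beta
\;\Longrightarrow\; \theta = 1 - \beta^{1/\gamma}
$$
```

Along this path 𝑎𝑡 = (1 − θ)ᵗ𝑎0, so the term inside the transversality condition (6) is proportional to [β(1 − θ)^{1−γ}]ᵗ = (β^{1/γ})ᵗ, which converges to zero because β < 1.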

Let’s see if we match up:

In [10]: ifp_cake_eating = IFP(r=0.0, y=(0.0, 0.0))

         σ_star = solve_model_time_iter(ifp_cake_eating, σ_init)

         fig, ax = plt.subplots()
         ax.plot(a_grid, σ_star[:, 0], label='numerical')
         ax.plot(a_grid, c_star(a_grid, ifp_cake_eating.β, ifp_cake_eating.γ),
                 '--', label='analytical')
         ax.set(xlabel='assets', ylabel='consumption')
         ax.legend()

         plt.show()

Error at iteration 25 is 0.023332272630545492.


Error at iteration 50 is 0.005301238424249566.
Error at iteration 75 is 0.0019706324625650695.
Error at iteration 100 is 0.0008675521337956349.
Error at iteration 125 is 0.00041073542212255454.
Error at iteration 150 is 0.00020120334010526042.
Error at iteration 175 is 0.00010021430795065234.

Converged in 176 iterations.

Success!

32.6 Exercises

32.6.1 Exercise 1

Let’s consider how the interest rate affects consumption.


Reproduce the following figure, which shows (approximately) optimal consumption policies for different interest rates:

• Other than r, all parameters are at their default values.
• r steps through np.linspace(0, 0.04, 4).
• Consumption is plotted against assets for income shock fixed at the smallest value.
The figure shows that higher interest rates boost savings and hence suppress consumption.

32.6.2 Exercise 2

Now let’s consider the long run asset levels held by households under the default parameters.
The following figure is a 45 degree diagram showing the law of motion for assets when consumption is optimal:

In [11]: ifp = IFP()

         σ_star = solve_model_time_iter(ifp, σ_init, verbose=False)
         a = ifp.asset_grid
         R, y = ifp.R, ifp.y

         fig, ax = plt.subplots()
         for z, lb in zip((0, 1), ('low income', 'high income')):
             ax.plot(a, R * (a - σ_star[:, z]) + y[z], label=lb)

         ax.plot(a, a, 'k--')
         ax.set(xlabel='current assets', ylabel='next period assets')

         ax.legend()
         plt.show()

The unbroken lines show the update function for assets at each 𝑧, which is

$$
a \mapsto R(a - \sigma^*(a, z)) + y(z)
$$

The dashed line is the 45 degree line.


We can see from the figure that the dynamics will be stable: assets do not diverge even in the highest state.
In fact there is a unique stationary distribution of assets that we can calculate by simulation.
• This can be proved via theorem 2 of [58].
• It represents the long run dispersion of assets across households when households have idiosyncratic shocks.
Ergodicity is valid here, so stationary probabilities can be calculated by averaging over a single long time series.
Hence to approximate the stationary distribution we can simulate a long time series for assets and histogram it.
Your task is to generate such a histogram.
• Use a single time series {𝑎𝑡 } of length 500,000.
• Given the length of this time series, the initial condition (𝑎0 , 𝑧0 ) will not matter.
• You might find it helpful to use the MarkovChain class from quantecon.
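As a warm-up, and as a simplification that looks only at the exogenous state rather than at assets, the sketch below illustrates the ergodicity argument for the default two-state chain of the IFP class. It simulates {𝑍𝑡} with plain NumPy (so it does not depend on the MarkovChain API) and compares the fraction of time spent in each state with the stationary distribution ψ, obtained as the left eigenvector of 𝑃 for eigenvalue 1.

```python
import numpy as np

P = np.array([[0.6, 0.4],
              [0.05, 0.95]])      # default Markov matrix in IFP

# Stationary distribution: left eigenvector of P with eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
ψ = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
ψ = ψ / ψ.sum()

# Simulate a single long path of the chain
T = 200_000
rng = np.random.default_rng(1234)
draws = rng.random(T)
z = np.empty(T, dtype=int)
z[0] = 0
for t in range(T - 1):
    # Move to state 1 if the uniform draw exceeds P[z, 0], else to state 0
    z[t + 1] = int(draws[t] > P[z[t], 0])

time_shares = np.bincount(z, minlength=2) / T
print(ψ, time_shares)
```

The same logic, applied to the joint process for (𝑎𝑡, 𝑍𝑡), is what justifies histogramming a single long asset series.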

32.6.3 Exercise 3

Following on from exercises 1 and 2, let’s look at how savings and aggregate asset holdings vary with the interest rate.

• Note: [72] section 18.6 can be consulted for more background on the topic treated in this exercise.
For a given parameterization of the model, the mean of the stationary distribution of assets can be interpreted as aggregate capital in an economy with a unit mass of ex-ante identical households facing idiosyncratic shocks.
Your task is to investigate how this measure of aggregate capital varies with the interest rate.
Following tradition, put the price (i.e., interest rate) on the vertical axis.
On the horizontal axis put aggregate capital, computed as the mean of the stationary distribution given the interest rate.

32.7 Solutions

32.7.1 Exercise 1

Here’s one solution:

In [12]: r_vals = np.linspace(0, 0.04, 4)

         fig, ax = plt.subplots()
         for r_val in r_vals:
             ifp = IFP(r=r_val)
             σ_star = solve_model_time_iter(ifp, σ_init, verbose=False)
             ax.plot(ifp.asset_grid, σ_star[:, 0], label=f'$r = {r_val:.3f}$')

         ax.set(xlabel='asset level', ylabel='consumption (low income)')
         ax.legend()
         plt.show()

32.7.2 Exercise 2

First we write a function to compute a long asset series.

In [13]: def compute_asset_series(ifp, T=500_000, seed=1234):
             """
             Simulates a time series of length T for assets, given optimal
             savings behavior.

             ifp is an instance of IFP
             """
             P, y, R = ifp.P, ifp.y, ifp.R  # Simplify names

             # Solve for the optimal policy (uses σ_init defined above)
             σ_star = solve_model_time_iter(ifp, σ_init, verbose=False)
             σ = lambda a, z: interp(ifp.asset_grid, σ_star[:, z], a)

             # Simulate the exogenous state process
             mc = MarkovChain(P)
             z_seq = mc.simulate(T, random_state=seed)

             # Simulate the asset path
             a = np.zeros(T+1)
             for t in range(T):
                 z = z_seq[t]
                 a[t+1] = R * (a[t] - σ(a[t], z)) + y[z]
             return a

Now we call the function, generate the series and then histogram it:

In [14]: ifp = IFP()
         a = compute_asset_series(ifp)

         fig, ax = plt.subplots()
         ax.hist(a, bins=20, alpha=0.5, density=True)
         ax.set(xlabel='assets')
         plt.show()

The shape of the asset distribution is unrealistic.


Here it is left skewed when in reality it has a long right tail.
In a subsequent lecture we will rectify this by adding more realistic features to the model.

32.7.3 Exercise 3

Here’s one solution

In [15]: M = 25
         r_vals = np.linspace(0, 0.02, M)
         fig, ax = plt.subplots()

         asset_mean = []
         for r in r_vals:
             print(f'Solving model at r = {r}')
             ifp = IFP(r=r)
             mean = np.mean(compute_asset_series(ifp, T=250_000))
             asset_mean.append(mean)
         ax.plot(asset_mean, r_vals)

         ax.set(xlabel='capital', ylabel='interest rate')

         plt.show()

Solving model at r = 0.0


Solving model at r = 0.0008333333333333334
Solving model at r = 0.0016666666666666668
Solving model at r = 0.0025
Solving model at r = 0.0033333333333333335
Solving model at r = 0.004166666666666667
Solving model at r = 0.005
Solving model at r = 0.005833333333333334
Solving model at r = 0.006666666666666667
Solving model at r = 0.007500000000000001
Solving model at r = 0.008333333333333333
Solving model at r = 0.009166666666666667
Solving model at r = 0.01
Solving model at r = 0.010833333333333334
Solving model at r = 0.011666666666666667
Solving model at r = 0.0125
Solving model at r = 0.013333333333333334
Solving model at r = 0.014166666666666668
Solving model at r = 0.015000000000000001
Solving model at r = 0.015833333333333335
Solving model at r = 0.016666666666666666
Solving model at r = 0.0175
Solving model at r = 0.018333333333333333
Solving model at r = 0.01916666666666667
Solving model at r = 0.02

As expected, aggregate savings increases with the interest rate.


Chapter 33

The Income Fluctuation Problem II: Stochastic Returns on Assets

33.1 Contents

• Overview 33.2
• The Savings Problem 33.3
• Solution Algorithm 33.4
• Implementation 33.5
• Exercises 33.6
• Solutions 33.7
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon
        !pip install interpolation

33.2 Overview

In this lecture, we continue our study of the income fluctuation problem.


While the interest rate was previously taken to be fixed, we now allow returns on assets to be
state-dependent.
This matches the fact that most households with a positive level of assets face some capital
income risk.
It has been argued that modeling capital income risk is essential for understanding the joint
distribution of income and wealth (see, e.g., [10] or [101]).
Theoretical properties of the household savings model presented here are analyzed in detail in
[75].
In terms of computation, we use a combination of time iteration and the endogenous grid
method to solve the model quickly and accurately.
We require the following imports:

In [2]: import numpy as np
        from quantecon.optimize import brent_max, brentq
        from interpolation import interp
        from numba import njit, float64
        from numba.experimental import jitclass  # numba.jitclass is deprecated in favor of this location
        import matplotlib.pyplot as plt
        %matplotlib inline
        from quantecon import MarkovChain

33.3 The Savings Problem

In this section we review the household problem and optimality results.

33.3.1 Set Up

A household chooses a consumption-asset path {(𝑐𝑡 , 𝑎𝑡 )} to maximize


$$
\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t u(c_t) \right\} \tag{1}
$$

subject to

$$
a_{t+1} = R_{t+1}(a_t - c_t) + Y_{t+1} \quad \text{and} \quad 0 \le c_t \le a_t, \tag{2}
$$

with initial condition (𝑎0 , 𝑍0 ) = (𝑎, 𝑧) treated as given.


Note that {𝑅𝑡 }𝑡≥1 , the gross rate of return on wealth, is allowed to be stochastic.
The sequence {𝑌𝑡 }𝑡≥1 is non-financial income.
The stochastic components of the problem obey

$$
R_t = R(Z_t, \zeta_t) \quad \text{and} \quad Y_t = Y(Z_t, \eta_t), \tag{3}
$$

where
• the maps 𝑅 and 𝑌 are time-invariant nonnegative functions,
• the innovation processes {𝜁𝑡 } and {𝜂𝑡 } are IID and independent of each other, and
• {𝑍𝑡 }𝑡≥0 is an irreducible time-homogeneous Markov chain on a finite set Z
Let 𝑃 represent the Markov matrix for the chain {𝑍𝑡 }𝑡≥0 .
Our assumptions on preferences are the same as in our previous lecture on the income fluctuation problem.
As before, 𝔼𝑧 𝑋̂ means the expectation of the next period value 𝑋̂ given the current value 𝑍 = 𝑧.

33.3.2 Assumptions

We need restrictions to ensure that the objective (1) is finite and the solution methods described below converge.
We also need to ensure that the present discounted value of wealth does not grow too quickly.
When {𝑅𝑡} was constant we required that 𝛽𝑅 < 1.
Now that it is stochastic, we require that

$$
\beta G_R < 1, \quad \text{where} \quad G_R := \lim_{n \to \infty} \left( \mathbb{E} \prod_{t=1}^{n} R_t \right)^{1/n} \tag{4}
$$

Notice that, when {𝑅𝑡} takes some constant value 𝑅, this reduces to the previous restriction 𝛽𝑅 < 1.
The value 𝐺𝑅 can be thought of as the long run (geometric) average gross rate of return.
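As a rough illustration of (4), consider a special case assumed only for this sketch: returns depend on the Markov state alone, 𝑅𝑡 = 𝑅(𝑍𝑡) with no 𝜁 shock, and 𝑍 takes two values. Summing over paths gives 𝔼[𝑅1 ⋯ 𝑅𝑛 ∣ 𝑍0 = 𝑧] = (𝐿ⁿ1)(𝑧), where 𝐿(𝑧, 𝑧̂) = 𝑃(𝑧, 𝑧̂)𝑅(𝑧̂) and 1 is a vector of ones. When 𝐿 has strictly positive entries, as here, Perron–Frobenius theory implies that the 𝑛-th root of this quantity converges to the spectral radius of 𝐿, so that is the value of 𝐺𝑅. The matrix 𝑃 and the return values below are hypothetical.

```python
import numpy as np

# Hypothetical two-state example: returns depend only on Z_t
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])         # Markov matrix for Z
R_vals = np.array([1.00, 1.06])   # gross return R(z) in each state
β = 0.96

L = P * R_vals[None, :]           # L(z, ẑ) = P(z, ẑ) R(ẑ)
G_R = np.max(np.abs(np.linalg.eigvals(L)))   # spectral radius of L

# Finite-n version of (4), conditional on each initial state Z_0 = z
n = 500
finite_n = (np.linalg.matrix_power(L, n) @ np.ones(2))**(1 / n)

print(G_R, finite_n, β * G_R < 1)
```

Here 𝐺𝑅 lies between the two state-specific returns, as a long run average should.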
More intuition behind (4) is provided in [75].
Discussion on how to check it is given below.
Finally, we impose some routine technical restrictions on non-financial income.

$$
\mathbb{E} \, Y_t < \infty \quad \text{and} \quad \mathbb{E} \, u'(Y_t) < \infty
$$

One relatively simple setting where all these restrictions are satisfied is the IID and CRRA
environment of [10].

33.3.3 Optimality

Let the class of candidate consumption policies 𝒞 be defined as before.


In [75] it is shown that, under the stated assumptions,
• any 𝜎 ∈ 𝒞 satisfying the Euler equation is an optimal policy and
• exactly one such policy exists in 𝒞.
In the present setting, the Euler equation takes the form

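$$(u' \circ \sigma)(a, z) = \max \left\{ \beta \, \mathbb{E}_z \, \hat{R} \, (u' \circ \sigma)\left[\hat{R}(a - \sigma(a, z)) + \hat{Y}, \, \hat{Z}\right], \; u'(a) \right\} \tag{5}$$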

(Intuition and derivation are similar to our earlier lecture on the income fluctuation problem.)
We again solve the Euler equation using time iteration, iterating with a Coleman–Reffett operator 𝐾 defined to match the Euler equation (5).

33.4 Solution Algorithm

33.4.1 A Time Iteration Operator

Our definition of the candidate class 𝜎 ∈ 𝒞 of consumption policies is the same as in our earlier lecture on the income fluctuation problem.
510CHAPTER 33. THE INCOME FLUCTUATION PROBLEM II: STOCHASTIC RETURNS ON ASSETS

For fixed 𝜎 ∈ 𝒞 and (𝑎, 𝑧) ∈ S, the value 𝐾𝜎(𝑎, 𝑧) of the function 𝐾𝜎 at (𝑎, 𝑧) is defined as
the 𝜉 ∈ (0, 𝑎] that solves

$$u'(\xi) = \max \left\{ \beta \, \mathbb{E}_z \, \hat{R} \, (u' \circ \sigma)\left[\hat{R}(a - \xi) + \hat{Y}, \, \hat{Z}\right], \; u'(a) \right\} \tag{6}$$

The idea behind 𝐾 is that, as can be seen from the definitions, 𝜎 ∈ 𝒞 satisfies the Euler equation if and only if 𝐾𝜎(𝑎, 𝑧) = 𝜎(𝑎, 𝑧) for all (𝑎, 𝑧) ∈ S.
This means that fixed points of 𝐾 in 𝒞 and optimal consumption policies exactly coincide
(see [75] for more details).
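The definition of $K$ can be implemented directly by root-finding, before any endogenous-grid tricks. The sketch below evaluates $K\sigma(a, z)$ at a single point by solving (6) for $\xi$ with scipy.optimize.brentq, approximating the expectation by Monte Carlo. It assumes CRRA marginal utility $u'(c) = c^{-\gamma}$ together with the lognormal returns/income specification and default parameters used later in this lecture; K_root and σ_guess are illustrative names. When $\beta \mathbb{E}_z[\cdots] \le u'(a)$ at $\xi = a$, the max in (6) is attained by $u'(a)$ and the solution is the corner $\xi = a$.

```python
import numpy as np
from scipy.optimize import brentq

# Assumed primitives, mirroring the default parameters used later in the lecture
γ, β = 1.5, 0.96
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
a_r, b_r, a_y, b_y = 0.1, 0.0, 0.2, 0.5
rng = np.random.default_rng(1234)
ζ_draws = rng.standard_normal(100)
η_draws = rng.standard_normal(100)

def u_prime(c):
    return c**(-γ)

def K_root(σ, a, z):
    """Evaluate Kσ(a, z) by solving equation (6) for ξ in (0, a] with brentq."""
    R_hat = np.exp(a_r * ζ_draws + b_r)[:, None]   # column of return draws

    def expectation(ξ):
        # Monte Carlo estimate of E_z R̂ (u' ∘ σ)(R̂(a - ξ) + Ŷ, Ẑ)
        total = 0.0
        for z_hat in range(len(P)):
            Y_hat = np.exp(a_y * η_draws + z_hat * b_y)[None, :]
            a_next = R_hat * (a - ξ) + Y_hat
            total += P[z, z_hat] * np.mean(R_hat * u_prime(σ(a_next, z_hat)))
        return total

    def h(ξ):
        # h is decreasing in ξ, positive near 0 and nonpositive at ξ = a
        return u_prime(ξ) - max(β * expectation(ξ), u_prime(a))

    if h(a) >= 0:          # the max is attained by u'(a): consume everything
        return a
    return brentq(h, 1e-10, a)

σ_guess = lambda a, z: a   # the "consume all assets" policy
print(K_root(σ_guess, 0.01, 0), K_root(σ_guess, 5.0, 0))
```

At very low assets the corner binds and the whole of $a$ is consumed; at higher assets the solution is interior.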

33.4.2 Convergence Properties

As before, we pair 𝒞 with the distance

$$\rho(c, d) := \sup_{(a, z) \in \mathsf{S}} \left| (u' \circ c)(a, z) - (u' \circ d)(a, z) \right|$$

It can be shown that

1. (𝒞, 𝜌) is a complete metric space,

2. there exists an integer 𝑛 such that 𝐾 𝑛 is a contraction mapping on (𝒞, 𝜌), and

3. the unique fixed point of 𝐾 in 𝒞 is the unique optimal policy in 𝒞.

We now have a clear path to approximating the optimal policy: choose some 𝜎 ∈ 𝒞 and then iterate with 𝐾 until convergence (as measured by the distance 𝜌).
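The distance $\rho$ compares policies through marginal utility rather than through consumption itself. A minimal sketch, assuming CRRA marginal utility $u'(c) = c^{-\gamma}$ and approximating the supremum over $\mathsf{S}$ by a maximum over a finite grid (both assumptions), is:

```python
import numpy as np

def rho(c_vals, d_vals, γ=1.5):
    """
    Approximate ρ(c, d) by the maximum of |u'(c) - u'(d)| over grid points,
    where c_vals and d_vals hold the two policies evaluated on the same grid.
    """
    return np.max(np.abs(c_vals**(-γ) - d_vals**(-γ)))

a_grid = np.linspace(0.1, 10, 50)   # strictly positive assets, since u'(0) is infinite
consume_all = a_grid                # σ(a, z) = a
consume_half = 0.5 * a_grid         # a hypothetical alternative policy
print(rho(consume_all, consume_half))
```

Because $u'$ is large near zero, differences between policies at low asset levels dominate this distance; here the maximum is attained at the smallest grid point.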

33.4.3 Using an Endogenous Grid

In our study of the optimal growth model, we found that it was possible to further accelerate time iteration via the endogenous grid method.
We will use the same method here.
The methodology is the same as it was for the optimal growth model, with the minor exception that we need to remember that consumption is not always interior.
In particular, optimal consumption can be equal to assets when the level of assets is low.

Finding Optimal Consumption

The endogenous grid method (EGM) calls for us to take a grid of savings values $s_i$, where each such $s$ is interpreted as $s = a - c$.
For the lowest grid point we take $s_0 = 0$.
For the corresponding $(a_0, c_0)$ pair we have $a_0 = c_0$.
This happens close to the origin, where assets are low and the household consumes all that it can.

Although there are many such pairs, the one we take is $a_0 = c_0 = 0$, which pins down the policy at the origin, aiding interpolation.
For $s > 0$, we have, by definition, $c < a$, and hence consumption is interior.
Hence the max component of (5) drops out, and we solve for

$$c_i = (u')^{-1} \left\{ \beta \, \mathbb{E}_z \, \hat{R} \, (u' \circ \sigma)\left[\hat{R} s_i + \hat{Y}, \, \hat{Z}\right] \right\} \tag{7}$$

at each $s_i$.

Iterating

Once we have the pairs $\{s_i, c_i\}$, the endogenous asset grid is obtained by $a_i = c_i + s_i$.
Also, we held $z \in \mathsf{Z}$ fixed in the discussion above, so we can pair it with $a_i$.
An approximation of the policy $(a, z) \mapsto \sigma(a, z)$ can be obtained by interpolating $\{a_i, c_i\}$ at each $z$.
In what follows, we use linear interpolation.
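One step of this procedure can be sketched in plain NumPy as follows. The sketch applies (7) once to the "consume everything" guess $\sigma(a, z) = a$, builds the endogenous grid $a_i = c_i + s_i$ for each $z$, and then evaluates the implied policy off the grid with np.interp. The CRRA utility, lognormal returns and income, Monte Carlo sample sizes, and the name egm_step are assumptions chosen to mirror the implementation below; unlike that implementation, this sketch does not overwrite the first grid point with the origin.

```python
import numpy as np

# Assumed primitives, mirroring the default parameters used later in the lecture
γ, β = 1.5, 0.96
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
a_r, b_r, a_y, b_y = 0.1, 0.0, 0.2, 0.5
rng = np.random.default_rng(1234)
ζ_draws = rng.standard_normal(50)
η_draws = rng.standard_normal(50)
s_grid = np.linspace(0, 10, 100)        # savings grid with s_0 = 0

u_prime = lambda c: c**(-γ)
u_prime_inv = lambda x: x**(-1 / γ)

def egm_step(σ):
    """Apply (7) at each s_i and z; return endogenous grid a[i, z] and c[i, z]."""
    n = len(P)
    R_hat = np.exp(a_r * ζ_draws + b_r)[:, None]                # shape (50, 1)
    c = np.empty((len(s_grid), n))
    for z in range(n):
        for i, s in enumerate(s_grid):
            Ez = 0.0
            for z_hat in range(n):
                Y_hat = np.exp(a_y * η_draws + z_hat * b_y)[None, :]   # shape (1, 50)
                a_next = R_hat * s + Y_hat                              # shape (50, 50)
                Ez += P[z, z_hat] * np.mean(R_hat * u_prime(σ(a_next, z_hat)))
            c[i, z] = u_prime_inv(β * Ez)     # equation (7)
    a = s_grid[:, None] + c                   # endogenous grid a_i = c_i + s_i
    return a, c

a_egm, c_egm = egm_step(lambda a, z: a)       # start from "consume all assets"
c_at_3 = np.interp(3.0, a_egm[:, 0], c_egm[:, 0])   # policy at a = 3 when z = 0
```

Note that np.interp returns the first grid value for queries below $a_0$, whereas the policy there is $c = a$; handling that region is one motivation for pinning the policy near the origin, as described above.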

33.4.4 Testing the Assumptions

Convergence of time iteration depends on the condition $\beta G_R < 1$ being satisfied.
One can check this using the fact that $G_R$ is equal to the spectral radius of the matrix $L$ defined by

$$L(z, \hat{z}) := P(z, \hat{z}) \int R(\hat{z}, x) \phi(x) \, dx$$

This identity is proved in [75], where $\phi$ is the density of the innovation $\zeta_t$ to returns on assets.
(Remember that $\mathsf{Z}$ is a finite set, so this expression defines a matrix.)
Checking the condition is even easier when $\{R_t\}$ is IID.
In that case, it is clear from the definition of $G_R$ that $G_R$ is just $\mathbb{E} R_t$.
We test the condition $\beta \mathbb{E} R_t < 1$ in the code below.
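For a Markov-modulated return process the spectral-radius characterization is easy to compute. The sketch below assumes, purely for illustration, that $R(\hat z, \zeta) = \exp(a_r \zeta + b_r(\hat z))$ with a state-dependent location $b_r(\hat z)$ and $\zeta \sim N(0, 1)$, so that $\int R(\hat z, x)\phi(x)\,dx = \exp(b_r(\hat z) + a_r^2/2)$. It builds $L$, takes its spectral radius, and cross-checks against $((L^n \mathbf{1})(z))^{1/n}$ for large $n$, which equals $(\mathbb{E}[\prod_{t=1}^n R_t \mid Z_0 = z])^{1/n}$ under this specification. The values of b_r are hypothetical; the implementation below uses returns that do not depend on $z$.

```python
import numpy as np

def G_R_spectral(P, mean_R):
    """Return the spectral radius of L(z, ẑ) = P(z, ẑ) * mean_R[ẑ], and L itself."""
    L = P * mean_R[None, :]          # scale column ẑ by ∫ R(ẑ, x) φ(x) dx
    return np.max(np.abs(np.linalg.eigvals(L))), L

P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
β, a_r = 0.96, 0.1
b_r = np.array([-0.02, 0.03])        # hypothetical state-dependent location
mean_R = np.exp(b_r + a_r**2 / 2)    # lognormal mean return in each state

G_R, L = G_R_spectral(P, mean_R)

# Cross-check: (E[R_1 ... R_n | Z_0 = 0])^(1/n) = ((L^n 1)(0))^(1/n)
n = 400
G_R_direct = (np.linalg.matrix_power(L, n) @ np.ones(2))[0] ** (1 / n)
print(G_R, G_R_direct, β * G_R < 1)
```

When the location parameter is the same in every state, $L$ is a constant multiple of the stochastic matrix $P$, and the spectral radius collapses to $\mathbb{E} R_t$, consistent with the IID case discussed above.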

33.5 Implementation

We will assume that $R_t = \exp(a_r \zeta_t + b_r)$, where $a_r, b_r$ are constants and $\{\zeta_t\}$ is IID standard normal.
We allow labor income to be serially correlated, with

$$Y_t = \exp(a_y \eta_t + Z_t b_y)$$

where $\{\eta_t\}$ is also IID standard normal and $\{Z_t\}$ is a Markov chain taking values in $\{0, 1\}$.

In [3]: ifp_data = [
            ('γ', float64),              # utility parameter
            ('β', float64),              # discount factor
            ('P', float64[:, :]),        # transition probs for z_t
            ('a_r', float64),            # scale parameter for R_t
            ('b_r', float64),            # additive parameter for R_t
            ('a_y', float64),            # scale parameter for Y_t
            ('b_y', float64),            # additive parameter for Y_t
            ('s_grid', float64[:]),      # Grid over savings
            ('η_draws', float64[:]),     # Draws of innovation η for MC
            ('ζ_draws', float64[:])      # Draws of innovation ζ for MC
        ]

In [4]: @jitclass(ifp_data)
        class IFP:
            """
            A class that stores primitives for the income fluctuation
            problem.
            """

            def __init__(self,
                         γ=1.5,
                         β=0.96,
                         P=np.array([(0.9, 0.1),
                                     (0.1, 0.9)]),
                         a_r=0.1,
                         b_r=0.0,
                         a_y=0.2,
                         b_y=0.5,
                         shock_draw_size=50,
                         grid_max=10,
                         grid_size=100,
                         seed=1234):

                np.random.seed(seed)  # arbitrary seed

                self.P, self.γ, self.β = P, γ, β

                self.a_r, self.b_r, self.a_y, self.b_y = a_r, b_r, a_y, b_y
                self.η_draws = np.random.randn(shock_draw_size)
                self.ζ_draws = np.random.randn(shock_draw_size)
                self.s_grid = np.linspace(0, grid_max, grid_size)

                # Test stability assuming {R_t} is IID and adopts the lognormal
                # specification given above. The test is then β E R_t < 1.
                ER = np.exp(b_r + a_r**2 / 2)
                assert β * ER < 1, "Stability condition failed."

            # Marginal utility
            def u_prime(self, c):
                return c**(-self.γ)

            # Inverse of marginal utility
            def u_prime_inv(self, c):
                return c**(-1/self.γ)

            def R(self, z, ζ):
                return np.exp(self.a_r * ζ + self.b_r)

            def Y(self, z, η):
                return np.exp(self.a_y * η + (z * self.b_y))


Here’s the Coleman-Reffett operator based on EGM:

In [5]: @njit
        def K(a_in, σ_in, ifp):
            """
            The Coleman--Reffett operator for the income fluctuation problem,
            using the endogenous grid method.

            * ifp is an instance of IFP
            * a_in[i, z] is an asset grid
            * σ_in[i, z] is consumption at a_in[i, z]
            """

            # Simplify names
            u_prime, u_prime_inv = ifp.u_prime, ifp.u_prime_inv
            R, Y, P, β = ifp.R, ifp.Y, ifp.P, ifp.β
            s_grid, η_draws, ζ_draws = ifp.s_grid, ifp.η_draws, ifp.ζ_draws
            n = len(P)

            # Create consumption function by linear interpolation
            σ = lambda a, z: interp(a_in[:, z], σ_in[:, z], a)

            # Allocate memory
            σ_out = np.empty_like(σ_in)

            # Obtain c_i at each s_i, z, store in σ_out[i, z], computing
            # the expectation term by Monte Carlo
            for i, s in enumerate(s_grid):
                for z in range(n):
                    # Compute expectation
                    Ez = 0.0
                    for z_hat in range(n):
                        for η in ifp.η_draws:
                            for ζ in ifp.ζ_draws:
                                R_hat = R(z_hat, ζ)
                                Y_hat = Y(z_hat, η)
                                U = u_prime(σ(R_hat * s + Y_hat, z_hat))
                                Ez += R_hat * U * P[z, z_hat]
                    Ez = Ez / (len(η_draws) * len(ζ_draws))
                    σ_out[i, z] = u_prime_inv(β * Ez)

            # Calculate endogenous asset grid
            a_out = np.empty_like(σ_out)
            for z in range(n):
                a_out[:, z] = s_grid + σ_out[:, z]

            # Fixing a consumption-asset pair at (0, 0) improves interpolation
            σ_out[0, :] = 0
            a_out[0, :] = 0

            return a_out, σ_out

The next function solves for an approximation of the optimal consumption policy via time
iteration.

In [6]: def solve_model_time_iter(model,    # Class with model information
                                  a_vec,    # Initial condition for assets
                                  σ_vec,    # Initial condition for consumption
                                  tol=1e-4,
                                  max_iter=1000,
                                  verbose=True,
                                  print_skip=25):

            # Set up loop
            i = 0
            error = tol + 1

            while i < max_iter and error > tol:
                a_new, σ_new = K(a_vec, σ_vec, model)
                error = np.max(np.abs(σ_vec - σ_new))
                i += 1
                if verbose and i % print_skip == 0:
                    print(f"Error at iteration {i} is {error}.")
                a_vec, σ_vec = np.copy(a_new), np.copy(σ_new)

            if i == max_iter:
                print("Failed to converge!")

            if verbose and i < max_iter:
                print(f"\nConverged in {i} iterations.")

            return a_new, σ_new

Now we are ready to create an instance at the default parameters.

In [7]: ifp = IFP()

Next we set up an initial condition, which corresponds to consuming all assets.

In [8]: # Initial guess of σ = consume all assets
        k = len(ifp.s_grid)
        n = len(ifp.P)
        σ_init = np.empty((k, n))
        for z in range(n):
            σ_init[:, z] = ifp.s_grid
        a_init = np.copy(σ_init)

Let’s generate an approximate solution.

In [9]: a_star, σ_star = solve_model_time_iter(ifp, a_init, σ_init, print_skip=5)

Error at iteration 5 is 0.5081944529506561.
Error at iteration 10 is 0.1057246950930697.
Error at iteration 15 is 0.03658262202883744.
Error at iteration 20 is 0.013936729965906114.
Error at iteration 25 is 0.005292165269711546.
Error at iteration 30 is 0.0019748126990770665.
Error at iteration 35 is 0.0007219210463285108.
Error at iteration 40 is 0.0002590544496094971.
Error at iteration 45 is 9.163966595426842e-05.

Converged in 45 iterations.

Here’s a plot of the resulting consumption policy.

In [10]: fig, ax = plt.subplots()
         for z in range(len(ifp.P)):
             ax.plot(a_star[:, z], σ_star[:, z], label=f"consumption when $z={z}$")

         plt.legend()
         plt.show()

Notice that we consume all assets in the lower range of the asset space.
This is because we anticipate income $Y_{t+1}$ tomorrow, which makes the need to save less urgent.
Can you explain why consuming all assets ends earlier (for lower values of assets) when $z = 0$?

33.5.1 Law of Motion

Let’s try to get some idea of what will happen to assets over the long run under this consumption policy.
As with our earlier lecture on the income fluctuation problem, we begin by producing a 45-degree diagram showing the law of motion for assets:

In [11]: # Good and bad state mean labor income
         Y_mean = [np.mean(ifp.Y(z, ifp.η_draws)) for z in (0, 1)]
         # Mean returns (R does not depend on z, so any state index will do)
         R_mean = np.mean(ifp.R(0, ifp.ζ_draws))

         a = a_star
         fig, ax = plt.subplots()
         for z, lb in zip((0, 1), ('bad state', 'good state')):
             ax.plot(a[:, z], R_mean * (a[:, z] - σ_star[:, z]) + Y_mean[z], label=lb)

         ax.plot(a[:, 0], a[:, 0], 'k--')
         ax.set(xlabel='current assets', ylabel='next period assets')

         ax.legend()
         plt.show()

The unbroken lines represent, for each $z$, an average update function for assets, given by

$$a \mapsto \bar{R} \, (a - \sigma^*(a, z)) + \bar{Y}(z)$$

Here
• $\bar{R} = \mathbb{E} R_t$ is the mean return and
• $\bar{Y}(z) = \mathbb{E}_z Y(z, \eta_t)$ is mean labor income in state $z$.
The dashed line is the 45-degree line.
We can see from the figure that the dynamics will be stable: assets do not diverge even in the highest state.

33.6 Exercises

33.6.1 Exercise 1

Let’s repeat our earlier exercise on the long-run cross-sectional distribution of assets.
In that exercise, we used a relatively simple income fluctuation model.
In the solution, we found the shape of the asset distribution to be unrealistic.

In particular, we failed to match the long right tail of the wealth distribution.
Your task is to try again, repeating the exercise, but now with our more sophisticated model.
Use the default parameters.

33.7 Solutions

33.7.1 Exercise 1

First we write a function to compute a long asset series.


Because we want to JIT-compile the function, we code the solution in a way that breaks some rules of good programming style.
For example, we will pass in the solutions a_star, σ_star along with ifp, even though it would be more natural to just pass in ifp and then solve inside the function.
We do this because solve_model_time_iter is not JIT-compiled.

In [12]: @njit
         def compute_asset_series(ifp, a_star, σ_star, z_seq, T=500_000):
             """
             Simulates a time series of length T for assets, given optimal
             savings behavior.

             * ifp is an instance of IFP
             * a_star is the endogenous grid solution
             * σ_star is optimal consumption on the grid
             * z_seq is a time path for {Z_t}

             """

             # Create consumption function by linear interpolation
             σ = lambda a, z: interp(a_star[:, z], σ_star[:, z], a)

             # Simulate the asset path
             a = np.zeros(T+1)
             for t in range(T):
                 z = z_seq[t]
                 ζ, η = np.random.randn(), np.random.randn()
                 R = ifp.R(z, ζ)
                 Y = ifp.Y(z, η)
                 a[t+1] = R * (a[t] - σ(a[t], z)) + Y
             return a

Now we call the function, generate the series and then histogram it, using the solutions computed above.

In [13]: T = 1_000_000
mc = MarkovChain(ifp.P)
z_seq = mc.simulate(T, random_state=1234)

a = compute_asset_series(ifp, a_star, σ_star, z_seq, T=T)

fig, ax = plt.subplots()
ax.hist(a, bins=40, alpha=0.5, density=True)
ax.set(xlabel='assets')
plt.show()

Now we have managed to successfully replicate the long right tail of the wealth distribution.
Here’s another view of this using a horizontal violin plot.

In [14]: fig, ax = plt.subplots()


ax.violinplot(a, vert=False, showmedians=True)
ax.set(xlabel='assets')
plt.show()
Part V

Information

Chapter 34

Job Search VII: Search with Learning

34.1 Contents

• Overview 34.2
• Model 34.3
• Take 1: Solution by VFI 34.4
• Take 2: A More Efficient Method 34.5
• Another Functional Equation 34.6
• Solving the RWFE 34.7
• Implementation 34.8
• Exercises 34.9
• Solutions 34.10
• Appendix A 34.11
• Appendix B 34.12
• Examples 34.13
In addition to what’s in Anaconda, this lecture deploys the libraries:

In [1]: !conda install -y quantecon
        !pip install interpolation

34.2 Overview

In this lecture, we consider an extension of the previously studied job search model of McCall [80].
We’ll build on a model of Bayesian learning discussed in this lecture on the topic of exchangeability and its relationship to the concept of IID (independent and identically distributed) random variables and to Bayesian updating.
In the McCall model, an unemployed worker decides when to accept a permanent job at a
specific fixed wage, given
• his or her discount factor
• the level of unemployment compensation
• the distribution from which wage offers are drawn


In the version considered below, the wage distribution is unknown and must be learned.
• The following is based on the presentation in [72], section 6.6.
Let’s start with some imports

In [2]: from numba import njit, prange, vectorize


from interpolation import mlinterp, interp
from math import gamma
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import cm
import scipy.optimize as op
from scipy.stats import cumfreq, beta

34.2.1 Model Features

• Infinite horizon dynamic programming with two states and one binary control.
• Bayesian updating to learn the unknown distribution.

34.3 Model

Let’s first review the basic McCall model [80] and then add the variation we want to consider.

34.3.1 The Basic McCall Model

Recall that, in the baseline model, an unemployed worker is presented in each period with a
permanent job offer at wage 𝑊𝑡 .
At time 𝑡, our worker either

1. accepts the offer and works permanently at constant wage 𝑊𝑡

2. rejects the offer, receives unemployment compensation 𝑐 and reconsiders next period

The wage sequence 𝑊𝑡 is IID and generated from known density 𝑞.



The worker aims to maximize the expected discounted sum of earnings $\mathbb{E} \sum_{t=0}^{\infty} \beta^t y_t$.
The function $v$ satisfies the recursion

$$v(w) = \max \left\{ \frac{w}{1 - \beta}, \; c + \beta \int v(w') q(w') \, dw' \right\} \tag{1}$$

The optimal policy has the form $\mathbb{1}\{w \ge \bar{w}\}$, where $\bar{w}$ is a constant called the reservation
wage.
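With a known offer density $q$, setting the two branches of (1) equal at $w = \bar{w}$ gives the scalar equation $\bar w = (1-\beta)c + \beta\, \mathbb{E}\max\{W, \bar w\}$, whose right-hand side is a contraction of modulus $\beta$ in $\bar w$. The sketch below solves it by successive approximation for the two Beta densities used as the baseline later in this lecture, with $\beta = 0.95$ and $c = 0.3$; the function name reservation_wage and the use of scipy.integrate.quad are illustrative choices, not the lecture's method.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta as beta_dist

def reservation_wage(a, b, β=0.95, c=0.3, tol=1e-10, max_iter=10_000):
    """Solve w̄ = (1 - β) c + β E max{W, w̄} for W ~ Beta(a, b) by iteration."""
    x = 1.0                                    # initial guess for w̄
    for _ in range(max_iter):
        # E max{W, x} = x P{W <= x} + ∫_x^1 w q(w) dw
        upper, _ = quad(lambda w: w * beta_dist.pdf(w, a, b), x, 1)
        E_max = x * beta_dist.cdf(x, a, b) + upper
        x_new = (1 - β) * c + β * E_max
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Reservation wage iteration did not converge")

w_bar_f = reservation_wage(1, 1)      # uniform offers, Beta(1, 1)
w_bar_g = reservation_wage(3, 1.2)    # Beta(3, 1.2) offers
print(w_bar_f, w_bar_g)
```

For uniform offers $\mathbb{E}\max\{W, x\} = (1 + x^2)/2$, so $\bar w$ is the smaller root of $0.475 x^2 - x + 0.49 = 0$, roughly 0.776. The Beta(3, 1.2) density puts more mass on high wages and yields a higher reservation wage.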

34.3.2 Offer Distribution Unknown

Now let’s extend the model by considering the variation presented in [72], section 6.6.
The model is as above, apart from the fact that

• the density 𝑞 is unknown


• the worker learns about 𝑞 by starting with a prior and updating based on wage offers
that he/she observes
The worker knows there are two possible distributions 𝐹 and 𝐺 — with densities 𝑓 and 𝑔.
At the start of time, “nature” selects 𝑞 to be either 𝑓 or 𝑔 — the wage distribution from
which the entire sequence 𝑊𝑡 will be drawn.
This choice is not observed by the worker, who puts prior probability 𝜋0 on 𝑓 being chosen.
Update rule: worker’s time 𝑡 estimate of the distribution is 𝜋𝑡 𝑓 + (1 − 𝜋𝑡 )𝑔, where 𝜋𝑡 updates
via

$$\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1 - \pi_t) g(w_{t+1})} \tag{2}$$

This last expression follows from Bayes’ rule, which tells us that

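$$\mathbb{P}\{q = f \mid W = w\} = \frac{\mathbb{P}\{W = w \mid q = f\} \, \mathbb{P}\{q = f\}}{\mathbb{P}\{W = w\}} \quad \text{and} \quad \mathbb{P}\{W = w\} = \sum_{\omega \in \{f, g\}} \mathbb{P}\{W = w \mid q = \omega\} \, \mathbb{P}\{q = \omega\}$$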
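The updating rule (2) is a one-line function. The sketch below assumes the baseline densities introduced in the Parameterization subsection, $f$ = Beta(1, 1) and $g$ = Beta(3, 1.2), evaluated here with scipy.stats.beta.pdf rather than the lecture's own density function; it applies the update repeatedly to offers drawn from $f$ and shows the posterior weight on $f$ approaching one.

```python
import numpy as np
from scipy.stats import beta as beta_dist

f = lambda w: beta_dist.pdf(w, 1, 1)      # density f: Beta(1, 1)
g = lambda w: beta_dist.pdf(w, 3, 1.2)    # density g: Beta(3, 1.2)

def κ(w, π):
    """Posterior probability that q = f after observing w, given prior π (equation (2))."""
    pf, pg = π * f(w), (1 - π) * g(w)
    return pf / (pf + pg)

# Sequential updating when nature has in fact selected f
rng = np.random.default_rng(0)
π = 0.5
for w in rng.beta(1, 1, size=200):
    π = κ(w, π)
print(π)
```

Note that the priors 0 and 1 are absorbing: a worker who is certain about $q$ never revises that belief.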

The fact that (2) is recursive allows us to progress to a recursive solution method.
Letting

$$q_\pi(w) := \pi f(w) + (1 - \pi) g(w) \quad \text{and} \quad \kappa(w, \pi) := \frac{\pi f(w)}{\pi f(w) + (1 - \pi) g(w)}$$

we can express the value function for the unemployed worker recursively as follows

$$v(w, \pi) = \max \left\{ \frac{w}{1 - \beta}, \; c + \beta \int v(w', \pi') \, q_\pi(w') \, dw' \right\} \quad \text{where} \quad \pi' = \kappa(w', \pi) \tag{3}$$

Notice that the current guess 𝜋 is a state variable, since it affects the worker’s perception of
probabilities for future rewards.

34.3.3 Parameterization

Following section 6.6 of [72], our baseline parameterization will be


• 𝑓 is Beta(1, 1)
• 𝑔 is Beta(3, 1.2)
• 𝛽 = 0.95 and 𝑐 = 0.3
The densities 𝑓 and 𝑔 have the following shape

In [3]: @vectorize
        def p(x, a, b):
            r = gamma(a + b) / (gamma(a) * gamma(b))
            return r * x**(a-1) * (1 - x)**(b-1)

        x_grid = np.linspace(0, 1, 100)
        f = lambda x: p(x, 1, 1)
        g = lambda x: p(x, 3, 1.2)

        fig, ax = plt.subplots(figsize=(10, 8))
        ax.plot(x_grid, f(x_grid), label='$f$', lw=2)
        ax.plot(x_grid, g(x_grid), label='$g$', lw=2)

        ax.legend()
        plt.show()

34.3.4 Looking Forward

What kind of optimal policy might result from (3) and the parameterization specified above?
Intuitively, if we accept at $w_a$ and $w_a \le w_b$, then, all other things being equal, we should also accept at $w_b$.
This suggests a policy of accepting whenever $w$ exceeds some threshold value $\bar{w}$.
But $\bar{w}$ should depend on $\pi$. In fact, it should be decreasing in $\pi$ because
• $f$ is a less attractive offer distribution than $g$
• larger $\pi$ means more weight on $f$ and less on $g$
Thus larger $\pi$ depresses the worker’s assessment of her future prospects, and relatively low current offers become more attractive.
Summary: We conjecture that the optimal policy is of the form $\mathbb{1}\{w \ge \bar{w}(\pi)\}$ for some decreasing function $\bar{w}$.

34.4 Take 1: Solution by VFI

Let’s set about solving the model and see how our results match with our intuition.
We begin by solving via value function iteration (VFI), which is natural but ultimately turns
out to be second best.
The class SearchProblem is used to store parameters and methods needed to compute optimal
actions.

In [4]: class SearchProblem:
            """
            A class to store a given parameterization of the "offer distribution
            unknown" model.

            """

            def __init__(self,
                         β=0.95,            # Discount factor
                         c=0.3,             # Unemployment compensation
                         F_a=1,
                         F_b=1,
                         G_a=3,
                         G_b=1.2,
                         w_max=1,           # Maximum wage possible
                         w_grid_size=100,
                         π_grid_size=100,
                         mc_size=500):

                self.β, self.c, self.w_max = β, c, w_max

                self.f = njit(lambda x: p(x, F_a, F_b))
                self.g = njit(lambda x: p(x, G_a, G_b))

                self.π_min, self.π_max = 1e-3, 1-1e-3    # Avoids instability
                self.w_grid = np.linspace(0, w_max, w_grid_size)
                self.π_grid = np.linspace(self.π_min, self.π_max, π_grid_size)

                self.mc_size = mc_size

                self.w_f = np.random.beta(F_a, F_b, mc_size)
                self.w_g = np.random.beta(G_a, G_b, mc_size)

The following function takes an instance of this class and returns jitted versions of the Bellman operator T, and a get_greedy() function to compute the approximate optimal policy from a guess v of the value function:

In [5]: def operator_factory(sp, parallel_flag=True):

            f, g = sp.f, sp.g
            w_f, w_g = sp.w_f, sp.w_g
            β, c = sp.β, sp.c
            mc_size = sp.mc_size
            w_grid, π_grid = sp.w_grid, sp.π_grid

            @njit
            def v_func(x, y, v):
                return mlinterp((w_grid, π_grid), v, (x, y))

            @njit
            def κ(w, π):
                """
                Updates π using Bayes' rule and the current wage observation w.
                """
                pf, pg = π * f(w), (1 - π) * g(w)
                π_new = pf / (pf + pg)

                return π_new

            @njit(parallel=parallel_flag)
            def T(v):
                """
                The Bellman operator.

                """
                v_new = np.empty_like(v)

                for i in prange(len(w_grid)):
                    for j in prange(len(π_grid)):
                        w = w_grid[i]
                        π = π_grid[j]

                        v_1 = w / (1 - β)

                        integral_f, integral_g = 0, 0
                        for m in prange(mc_size):
                            integral_f += v_func(w_f[m], κ(w_f[m], π), v)
                            integral_g += v_func(w_g[m], κ(w_g[m], π), v)
                        integral = (π * integral_f + (1 - π) * integral_g) / mc_size

                        v_2 = c + β * integral
                        v_new[i, j] = max(v_1, v_2)

                return v_new

            @njit(parallel=parallel_flag)
            def get_greedy(v):
                """
                Compute optimal actions taking v as the value function.

                """
                σ = np.empty_like(v)

                for i in prange(len(w_grid)):
                    for j in prange(len(π_grid)):
                        w = w_grid[i]
                        π = π_grid[j]

                        v_1 = w / (1 - β)

                        integral_f, integral_g = 0, 0
                        for m in prange(mc_size):
                            integral_f += v_func(w_f[m], κ(w_f[m], π), v)
                            integral_g += v_func(w_g[m], κ(w_g[m], π), v)
                        integral = (π * integral_f + (1 - π) * integral_g) / mc_size

                        v_2 = c + β * integral

                        σ[i, j] = v_1 > v_2  # Evaluates to 1 or 0

                return σ

            return T, get_greedy

We will omit a detailed discussion of the code because there is a more efficient solution
method that we will use later.
To solve the model we will use the following function that iterates using T to find a fixed
point

In [6]: def solve_model(sp,
                        use_parallel=True,
                        tol=1e-4,
                        max_iter=1000,
                        verbose=True,
                        print_skip=5):

            """
            Solves for the value function

            * sp is an instance of SearchProblem
            """

            T, _ = operator_factory(sp, use_parallel)

            # Set up loop
            i = 0
            error = tol + 1
            m, n = len(sp.w_grid), len(sp.π_grid)

            # Initialize v
            v = np.zeros((m, n)) + sp.c / (1 - sp.β)

            while i < max_iter and error > tol:
                v_new = T(v)
                error = np.max(np.abs(v - v_new))
                i += 1
                if verbose and i % print_skip == 0:
                    print(f"Error at iteration {i} is {error}.")
                v = v_new

            if i == max_iter:
                print("Failed to converge!")

            if verbose and i < max_iter:
                print(f"\nConverged in {i} iterations.")

            return v_new

Let’s look at solutions computed from value function iteration

In [7]: sp = SearchProblem()
v_star = solve_model(sp)
fig, ax = plt.subplots(figsize=(6, 6))
ax.contourf(sp.π_grid, sp.w_grid, v_star, 12, alpha=0.6, cmap=cm.jet)
cs = ax.contour(sp.π_grid, sp.w_grid, v_star, 12, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.set(xlabel=r'$\pi$', ylabel='$w$')

plt.show()


Error at iteration 5 is 0.6289767549386269.
Error at iteration 10 is 0.10437719844149562.
Error at iteration 15 is 0.022858070322698154.
Error at iteration 20 is 0.005222113379881321.
Error at iteration 25 is 0.0012097345698922624.
Error at iteration 30 is 0.00027641272673939454.

Converged in 34 iterations.

We will also plot the optimal policy

In [8]: T, get_greedy = operator_factory(sp)
        σ_star = get_greedy(v_star)

        fig, ax = plt.subplots(figsize=(6, 6))
        ax.contourf(sp.π_grid, sp.w_grid, σ_star, 1, alpha=0.6, cmap=cm.jet)
        ax.contour(sp.π_grid, sp.w_grid, σ_star, 1, colors="black")
        ax.set(xlabel=r'$\pi$', ylabel='$w$')

        ax.text(0.5, 0.6, 'reject')
        ax.text(0.7, 0.9, 'accept')

        plt.show()

The results fit well with our intuition from the section Looking Forward.
• The black line in the figure above corresponds to the function $\bar{w}(\pi)$ introduced there.
• It is decreasing as expected.

34.5 Take 2: A More Efficient Method

Let’s consider another method to solve for the optimal policy.


We will use iteration with an operator that has the same contraction rate as the Bellman operator, but is
• one-dimensional rather than two-dimensional, and
• free of any maximization step.
As a consequence, the algorithm is orders of magnitude faster than VFI.

This section illustrates the point that when it comes to programming, a bit of mathematical
analysis goes a long way.

34.6 Another Functional Equation

To begin, note that when $w = \bar{w}(\pi)$, the worker is indifferent between accepting and rejecting.
Hence the two choices on the right-hand side of (3) have equal value:

$$\frac{\bar{w}(\pi)}{1 - \beta} = c + \beta \int v(w', \pi') \, q_\pi(w') \, dw' \tag{4}$$

Together, (3) and (4) give

$$v(w, \pi) = \max \left\{ \frac{w}{1 - \beta}, \; \frac{\bar{w}(\pi)}{1 - \beta} \right\} \tag{5}$$

Combining (4) and (5), we obtain

$$\frac{\bar{w}(\pi)}{1 - \beta} = c + \beta \int \max \left\{ \frac{w'}{1 - \beta}, \; \frac{\bar{w}(\pi')}{1 - \beta} \right\} q_\pi(w') \, dw'$$

Multiplying by $1 - \beta$, substituting in $\pi' = \kappa(w', \pi)$ and using $\circ$ for composition of functions yields

$$\bar{w}(\pi) = (1 - \beta) c + \beta \int \max \left\{ w', \; \bar{w} \circ \kappa(w', \pi) \right\} q_\pi(w') \, dw' \tag{6}$$

Equation (6) can be understood as a functional equation, where 𝑤̄ is the unknown function.
• Let’s call it the reservation wage functional equation (RWFE).
• The solution 𝑤̄ to the RWFE is the object that we wish to compute.

34.7 Solving the RWFE

To solve the RWFE, we will first show that its solution is the fixed point of a contraction
mapping.
To this end, let
• $b[0, 1]$ be the bounded real-valued functions on $[0, 1]$
• $\|\omega\| := \sup_{x \in [0, 1]} |\omega(x)|$
Consider the operator 𝑄 mapping 𝜔 ∈ 𝑏[0, 1] into 𝑄𝜔 ∈ 𝑏[0, 1] via

(𝑄𝜔)(𝜋) = (1 − 𝛽)𝑐 + 𝛽 ∫ max {𝑤′ , 𝜔 ∘ 𝜅(𝑤′ , 𝜋)} 𝑞𝜋 (𝑤′ ) 𝑑𝑤′ (7)

Comparing (6) and (7), we see that the set of fixed points of 𝑄 exactly coincides with the set
of solutions to the RWFE.
• If 𝑄𝑤̄ = 𝑤̄ then 𝑤̄ solves (6) and vice versa.

Moreover, for any 𝜔, 𝜔′ ∈ 𝑏[0, 1], basic algebra and the triangle inequality for integrals tell us that

|(𝑄𝜔)(𝜋) − (𝑄𝜔′ )(𝜋)| ≤ 𝛽 ∫ |max {𝑤′ , 𝜔 ∘ 𝜅(𝑤′ , 𝜋)} − max {𝑤′ , 𝜔′ ∘ 𝜅(𝑤′ , 𝜋)}| 𝑞𝜋 (𝑤′ ) 𝑑𝑤′ (8)

Working case by case, it is easy to check that for real numbers 𝑎, 𝑏, 𝑐 we always have

| max{𝑎, 𝑏} − max{𝑎, 𝑐}| ≤ |𝑏 − 𝑐| (9)

Combining (8) and (9) yields

|(𝑄𝜔)(𝜋) − (𝑄𝜔′ )(𝜋)| ≤ 𝛽 ∫ |𝜔 ∘ 𝜅(𝑤′ , 𝜋) − 𝜔′ ∘ 𝜅(𝑤′ , 𝜋)| 𝑞𝜋 (𝑤′ ) 𝑑𝑤′ ≤ 𝛽‖𝜔 − 𝜔′ ‖ (10)

Taking the supremum over 𝜋 now gives us

‖𝑄𝜔 − 𝑄𝜔′ ‖ ≤ 𝛽‖𝜔 − 𝜔′ ‖ (11)

In other words, $Q$ is a contraction of modulus $\beta$ on the complete metric space $(b[0, 1], \| \cdot \|)$.
Hence
• A unique solution $\bar{w}$ to the RWFE exists in $b[0, 1]$.
• $Q^k \omega \to \bar{w}$ uniformly as $k \to \infty$, for any $\omega \in b[0, 1]$.
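The contraction property (11) can be checked numerically on a discretized version of $Q$. The sketch below is a plain-NumPy variant of the operator implemented in the next section: it approximates the integral in (7) by Monte Carlo draws from $f$ and $g$ (weighted by $\pi$ and $1 - \pi$), evaluates $\omega \circ \kappa$ by linear interpolation on a grid for $\pi$, and records the sup-norm change at each step. The densities, grid, number of draws and seed are assumptions matching the baseline parameterization. Since linear interpolation and averaging are both non-expansive in the sup norm, each recorded change should be at most $\beta$ times the previous one.

```python
import numpy as np
from scipy.stats import beta as beta_dist

β, c = 0.95, 0.3
F_a, F_b, G_a, G_b = 1, 1, 3, 1.2
rng = np.random.default_rng(1234)
w_f = rng.beta(F_a, F_b, 500)          # Monte Carlo draws from f
w_g = rng.beta(G_a, G_b, 500)          # Monte Carlo draws from g
π_grid = np.linspace(1e-3, 1 - 1e-3, 100)

def κ(w, π):
    """Bayes update of π after observing w, as in equation (2)."""
    pf = π * beta_dist.pdf(w, F_a, F_b)
    pg = (1 - π) * beta_dist.pdf(w, G_a, G_b)
    return pf / (pf + pg)

def Q(ω):
    """Discretized version of the operator Q in (7), for ω given on π_grid."""
    π = π_grid[:, None]                                    # shape (100, 1)
    ω_next_f = np.interp(κ(w_f[None, :], π), π_grid, ω)    # ω ∘ κ at the f draws
    ω_next_g = np.interp(κ(w_g[None, :], π), π_grid, ω)    # ω ∘ κ at the g draws
    int_f = np.maximum(w_f, ω_next_f).mean(axis=1)
    int_g = np.maximum(w_g, ω_next_g).mean(axis=1)
    return (1 - β) * c + β * (π_grid * int_f + (1 - π_grid) * int_g)

ω = np.ones_like(π_grid)               # initial guess, as in the lecture
errors = []
for k in range(50):
    ω_new = Q(ω)
    errors.append(np.max(np.abs(ω_new - ω)))
    ω = ω_new
```

The final ω approximates $\bar w$ on the grid; it is higher for $\pi$ near 0, where the worker believes offers come from the more attractive density $g$.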

34.8 Implementation

The following function takes an instance of SearchProblem and returns the operator Q

In [9]: def Q_factory(sp, parallel_flag=True):

            f, g = sp.f, sp.g
            w_f, w_g = sp.w_f, sp.w_g
            β, c = sp.β, sp.c
            mc_size = sp.mc_size
            w_grid, π_grid = sp.w_grid, sp.π_grid

            @njit
            def ω_func(p, ω):
                return interp(π_grid, ω, p)

            @njit
            def κ(w, π):
                """
                Updates π using Bayes' rule and the current wage observation w.
                """
                pf, pg = π * f(w), (1 - π) * g(w)
                π_new = pf / (pf + pg)

                return π_new

            @njit(parallel=parallel_flag)
            def Q(ω):
                """
                Updates the reservation wage function guess ω via the operator
                Q.
                """
                ω_new = np.empty_like(ω)

                for i in prange(len(π_grid)):
                    π = π_grid[i]
                    integral_f, integral_g = 0, 0

                    for m in prange(mc_size):
                        integral_f += max(w_f[m], ω_func(κ(w_f[m], π), ω))
                        integral_g += max(w_g[m], ω_func(κ(w_g[m], π), ω))
                    integral = (π * integral_f + (1 - π) * integral_g) / mc_size

                    ω_new[i] = (1 - β) * c + β * integral

                return ω_new

            return Q

In the next exercise, you are asked to compute an approximation to 𝑤.̄

34.9 Exercises

34.9.1 Exercise 1

Use the default parameters and Q_factory to compute an optimal policy.
Your result should coincide closely with the figure for the optimal policy shown above.
Try experimenting with different parameters, and confirm that the change in the optimal policy coincides with your intuition.

34.10 Solutions

34.10.1 Exercise 1

This code solves the “Offer Distribution Unknown” model by iterating on a guess of the reservation wage function.
You should find that the run time is shorter than that of the value function approach.
Similar to above, we set up a function to iterate with Q to find the fixed point

In [10]: def solve_wbar(sp,
                        use_parallel=True,
                        tol=1e-4,
                        max_iter=1000,
                        verbose=True,
                        print_skip=5):

             Q = Q_factory(sp, use_parallel)

             # Set up loop
             i = 0
             error = tol + 1
             m, n = len(sp.w_grid), len(sp.π_grid)

             # Initialize w
             w = np.ones_like(sp.π_grid)

             while i < max_iter and error > tol:
                 w_new = Q(w)
                 error = np.max(np.abs(w - w_new))
                 i += 1
                 if verbose and i % print_skip == 0:
                     print(f"Error at iteration {i} is {error}.")
                 w = w_new

             if i == max_iter:
                 print("Failed to converge!")

             if verbose and i < max_iter:
                 print(f"\nConverged in {i} iterations.")

             return w_new

The solution can be plotted as follows

In [11]: sp = SearchProblem()
w_bar = solve_wbar(sp)

fig, ax = plt.subplots(figsize=(9, 7))

ax.plot(sp.π_grid, w_bar, color='k')


ax.fill_between(sp.π_grid, 0, w_bar, color='blue', alpha=0.15)
ax.fill_between(sp.π_grid, w_bar, sp.w_max, color='green', alpha=0.15)
ax.text(0.5, 0.6, 'reject')
ax.text(0.7, 0.9, 'accept')
ax.set(xlabel=r'$\pi$', ylabel='$w$')
ax.grid()
plt.show()

Error at iteration 5 is 0.023632379416083427.
Error at iteration 10 is 0.007673315946021719.
Error at iteration 15 is 0.0017441287001457306.
Error at iteration 20 is 0.000363831145529403.
Error at iteration 25 is 7.507984503452025e-05.

Converged in 25 iterations.

34.11 Appendix A

The next piece of code runs a simulation showing how a change in the underlying wage distribution affects the unemployment rate.
At a point in the simulation, the distribution becomes significantly worse.
It takes a while for agents to learn this, and in the meantime, they are too optimistic and turn down too many jobs.
As a result, the unemployment rate spikes.

In [12]: F_a, F_b, G_a, G_b = 1, 1, 3, 1.2

         sp = SearchProblem(F_a=F_a, F_b=F_b, G_a=G_a, G_b=G_b)
         f, g = sp.f, sp.g

         # Solve for reservation wage
         w_bar = solve_wbar(sp, verbose=False)

         # Interpolate reservation wage function
         π_grid = sp.π_grid
         w_func = njit(lambda x: interp(π_grid, w_bar, x))

         @njit
         def update(a, b, e, π):
             "Update e and π by drawing wage offer from beta distribution with parameters a and b"

             if e == False:
                 w = np.random.beta(a, b)       # Draw random wage
                 if w >= w_func(π):
                     e = True                   # Take new job
                 else:
                     π = 1 / (1 + ((1 - π) * g(w)) / (π * f(w)))

             return e, π

         @njit
         def simulate_path(F_a=F_a,
                           F_b=F_b,
                           G_a=G_a,
                           G_b=G_b,
                           N=5000,       # Number of agents
                           T=600,        # Simulation length
                           d=200,        # Change date
                           s=0.025):     # Separation rate

             """Simulates path of employment for N number of workers over T periods"""

             e = np.ones((N, T+1))
             π = np.ones((N, T+1)) * 1e-3

             a, b = G_a, G_b   # Initial distribution parameters

             # Loop over t < T so that index t + 1 stays within the T + 1 columns
             for t in range(T):

                 if t == d:
                     a, b = F_a, F_b  # Change distribution parameters

                 # Update each agent
                 for n in range(N):
                     if e[n, t] == 1:                 # If agent is currently employed
                         p = np.random.uniform(0, 1)
                         if p <= s:                   # Randomly separate with probability s
                             e[n, t] = 0

                     new_e, new_π = update(a, b, e[n, t], π[n, t])
                     e[n, t+1] = new_e
                     π[n, t+1] = new_π

             return e[:, 1:]

         d = 200  # Change distribution at time d
         unemployment_rate = 1 - simulate_path(d=d).mean(axis=0)

         fig, ax = plt.subplots(figsize=(10, 6))
         ax.plot(unemployment_rate)
         ax.axvline(d, color='r', alpha=0.6, label='Change date')
         ax.set_xlabel('Time')
         ax.set_title('Unemployment rate')
         ax.legend()
         plt.show()

34.12 Appendix B

In this appendix we provide more details about how Bayes’ Law contributes to the workings
of the model.
We present some graphs that bring out additional insights about how learning works.
We build on graphs proposed in this lecture.
In particular, we’ll add actions of our searching worker to a key graph presented in that lec-
ture.
To begin, we first define two functions for computing the empirical distributions of unemploy-
ment duration and π at the time of employment.

In [13]: @njit
def empirical_dist(F_a, F_b, G_a, G_b, w_bar, π_grid,
                   N=10000, T=600):
    """
    Simulates population for computing empirical cumulative
    distribution of unemployment duration and π at time when
    the worker accepts the wage offer. For each job searching
    problem, we simulate for two cases that either f or g is
    the true offer distribution.

    Parameters
    ----------
    F_a, F_b, G_a, G_b : parameters of beta distributions F and G.
    w_bar : the reservation wage
    π_grid : grid points of π, for interpolation
    N : number of workers for simulation, optional
    T : maximum of time periods for simulation, optional

    Returns
    -------
    accept_t : 2 by N ndarray. The empirical distribution of
               unemployment duration when f or g generates offers.
    accept_π : 2 by N ndarray. The empirical distribution of
               π at the time of employment when f or g generates offers.
    """

    accept_t = np.empty((2, N))
    accept_π = np.empty((2, N))

    # f or g generates offers
    for i, (a, b) in enumerate([(F_a, F_b), (G_a, G_b)]):
        # update each agent
        for n in range(N):

            # initial prior
            π = 0.5

            for t in range(T+1):

                # Draw random wage
                w = np.random.beta(a, b)
                lw = p(w, F_a, F_b) / p(w, G_a, G_b)
                π = π * lw / (π * lw + 1 - π)

                # move to next agent if accepts
                if w >= interp(π_grid, w_bar, π):
                    break

            # record the unemployment duration
            # and π at the time of acceptance
            accept_t[i, n] = t
            accept_π[i, n] = π

    return accept_t, accept_π

def cumfreq_x(res):
    """
    A helper function for calculating the x grids of
    the cumulative frequency histogram.
    """

    cumcount = res.cumcount
    lowerlimit, binsize = res.lowerlimit, res.binsize

    x = lowerlimit + np.linspace(0, binsize*cumcount.size, cumcount.size)

    return x

Now we define a wrapper function for analyzing job search models with learning under differ-
ent parameterizations.
The wrapper takes parameters of beta distributions and unemployment compensation as in-
puts and then displays various things we want to know to interpret the solution of our search
model.
In addition, it computes empirical cumulative distributions of two key objects.

In [14]: def job_search_example(F_a=1, F_b=1, G_a=3, G_b=1.2, c=0.3):
    """
    Given the parameters that specify F and G distributions,
    calculate and display the rejection and acceptance area,
    the evolution of belief π, and the probability of accepting
    an offer at different π level, and simulate and calculate
    the empirical cumulative distribution of the duration of
    unemployment and π at the time the worker accepts the offer.
    """

    # construct a search problem
    sp = SearchProblem(F_a=F_a, F_b=F_b, G_a=G_a, G_b=G_b, c=c)
    f, g = sp.f, sp.g
    π_grid = sp.π_grid

    # Solve for reservation wage
    w_bar = solve_wbar(sp, verbose=False)

    # l(w) = f(w) / g(w)
    l = lambda w: f(w) / g(w)
    # objective function for solving l(w) = 1
    obj = lambda w: l(w) - 1.

    # the mode of beta distribution
    # use this to divide w into two intervals for root finding
    G_mode = (G_a - 1) / (G_a + G_b - 2)
    roots = np.empty(2)
    roots[0] = op.root_scalar(obj, bracket=[1e-10, G_mode]).root
    roots[1] = op.root_scalar(obj, bracket=[G_mode, 1-1e-10]).root

    fig, axs = plt.subplots(2, 2, figsize=(12, 9))

    # part 1: display the details of the model settings and some results
    w_grid = np.linspace(1e-12, 1-1e-12, 100)

    axs[0, 0].plot(l(w_grid), w_grid, label='$l$', lw=2)
    axs[0, 0].vlines(1., 0., 1., linestyle="--")
    axs[0, 0].hlines(roots, 0., 2., linestyle="--")
    axs[0, 0].set_xlim([0., 2.])
    axs[0, 0].legend(loc=4)
    axs[0, 0].set(xlabel='$l(w)=f(w)/g(w)$', ylabel='$w$')

    axs[0, 1].plot(sp.π_grid, w_bar, color='k')
    axs[0, 1].fill_between(sp.π_grid, 0, w_bar, color='blue', alpha=0.15)
    axs[0, 1].fill_between(sp.π_grid, w_bar, sp.w_max, color='green', alpha=0.15)
    axs[0, 1].text(0.5, 0.6, 'reject')
    axs[0, 1].text(0.7, 0.9, 'accept')

    W = np.arange(0.01, 0.99, 0.08)
    Π = np.arange(0.01, 0.99, 0.08)

    ΔW = np.zeros((len(W), len(Π)))
    ΔΠ = np.empty((len(W), len(Π)))
    for i, w in enumerate(W):
        for j, π in enumerate(Π):
            lw = l(w)
            ΔΠ[i, j] = π * (lw / (π * lw + 1 - π) - 1)

    q = axs[0, 1].quiver(Π, W, ΔΠ, ΔW, scale=2, color='r', alpha=0.8)

    axs[0, 1].hlines(roots, 0., 1., linestyle="--")
    axs[0, 1].set(xlabel=r'$\pi$', ylabel='$w$')
    axs[0, 1].grid()

    # plot the densities on the same wage grid used above
    axs[1, 0].plot(f(w_grid), w_grid, label='$f$', lw=2)
    axs[1, 0].plot(g(w_grid), w_grid, label='$g$', lw=2)
    axs[1, 0].vlines(1., 0., 1., linestyle="--")
    axs[1, 0].hlines(roots, 0., 2., linestyle="--")
    axs[1, 0].legend(loc=4)
    axs[1, 0].set(xlabel='$f(w), g(w)$', ylabel='$w$')

    axs[1, 1].plot(sp.π_grid, 1 - beta.cdf(w_bar, F_a, F_b), label='$f$')
    axs[1, 1].plot(sp.π_grid, 1 - beta.cdf(w_bar, G_a, G_b), label='$g$')
    axs[1, 1].set_ylim([0., 1.])
    axs[1, 1].grid()
    axs[1, 1].legend(loc=4)
    axs[1, 1].set(xlabel=r'$\pi$', ylabel=r'$\mathbb{P}\{w > \overline{w} (\pi)\}$')

    plt.show()

    # part 2: simulate empirical cumulative distribution
    accept_t, accept_π = empirical_dist(F_a, F_b, G_a, G_b, w_bar, π_grid)
    N = accept_t.shape[1]

    cfq_t_F = cumfreq(accept_t[0, :], numbins=100)
    cfq_π_F = cumfreq(accept_π[0, :], numbins=100)

    cfq_t_G = cumfreq(accept_t[1, :], numbins=100)
    cfq_π_G = cumfreq(accept_π[1, :], numbins=100)

    fig, axs = plt.subplots(2, 1, figsize=(12, 9))

    axs[0].plot(cumfreq_x(cfq_t_F), cfq_t_F.cumcount/N, label="f generates")
    axs[0].plot(cumfreq_x(cfq_t_G), cfq_t_G.cumcount/N, label="g generates")
    axs[0].grid(linestyle='--')
    axs[0].legend(loc=4)
    axs[0].title.set_text('CDF of duration of unemployment')
    axs[0].set(xlabel='time', ylabel='Prob(time)')

    axs[1].plot(cumfreq_x(cfq_π_F), cfq_π_F.cumcount/N, label="f generates")
    axs[1].plot(cumfreq_x(cfq_π_G), cfq_π_G.cumcount/N, label="g generates")
    axs[1].grid(linestyle='--')
    axs[1].legend(loc=4)
    axs[1].title.set_text('CDF of π at time worker accepts wage and leaves unemployment')
    axs[1].set(xlabel='π', ylabel='Prob(π)')

    plt.show()

We now provide some examples that provide insights about how the model works.

34.13 Examples

34.13.1 Example 1 (Baseline)

𝐹 ~ Beta(1, 1), 𝐺 ~ Beta(3, 1.2), 𝑐=0.3.


In the graphs below, the red arrows in the upper right figure show how $\pi_t$ is updated in response to the new information $w_t$.
Recall the following formula from this lecture

$$
\frac{\pi_{t+1}}{\pi_t} = \frac{l(w_{t+1})}{\pi_t l(w_{t+1}) + (1 - \pi_t)}
\begin{cases}
> 1 & \text{if } l(w_{t+1}) > 1 \\
\leq 1 & \text{if } l(w_{t+1}) \leq 1
\end{cases}
$$

The formula implies that the direction of motion of $\pi_t$ is determined by the relationship between $l(w_{t+1})$ and 1.
The magnitude of the change $\pi_{t+1} - \pi_t$ is small if
• $l(w)$ is close to 1, which means the new $w$ is not very informative for distinguishing the two distributions,
• $\pi_t$ is close to either 0 or 1, which means the prior is strong.
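To make the two conditions concrete, here is a minimal sketch (our own illustration, not code from the lecture) of the change $\pi_{t+1} - \pi_t = \pi_t \left( \frac{l}{\pi_t l + 1 - \pi_t} - 1 \right)$, which is the same quantity the quiver plot in `job_search_example` draws as `ΔΠ`. The function name `belief_change` and the sample values of $l$ and $\pi_t$ are hypothetical.

```python
def belief_change(pi, l):
    """Change pi_{t+1} - pi_t implied by Bayes' law, given l = l(w_{t+1}) = f(w) / g(w)."""
    ratio = l / (pi * l + (1 - pi))   # pi_{t+1} / pi_t
    return pi * (ratio - 1)

# Direction: the belief rises when l > 1, falls when l < 1, and is unchanged when l == 1
for l in (0.5, 1.0, 2.0):
    print(f"l = {l}: change = {belief_change(0.5, l):+.4f}")

# Magnitude: for the same informative draw (l = 2), the change shrinks as pi nears 0 or 1
for pi in (0.01, 0.5, 0.99):
    print(f"pi = {pi}: change = {belief_change(pi, 2.0):+.4f}")
```

Note that the ratio $\pi_{t+1}/\pi_t$ itself tends to $l$ as $\pi_t \to 0$; it is the absolute change that vanishes near both ends.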
Will an unemployed worker accept an offer earlier or not, when the actual ruling distribution
is 𝑔 instead of 𝑓?
Two countervailing effects are at work.
• if f generates successive wage offers, then 𝑤 is more likely to be low, but 𝜋 is moving up
toward 1, which lowers the reservation wage, i.e., the worker becomes less selective
the longer he or she remains unemployed.
• if g generates wage offers, then 𝑤 is more likely to be high, but 𝜋 is moving downward
toward 0, increasing the reservation wage, i.e., the worker becomes more selective the
longer he or she remains unemployed.
Quantitatively, the lower right figure sheds light on which effect dominates in this example.
It shows the probability that a previously unemployed worker accepts an offer at different val-
ues of 𝜋 when 𝑓 or 𝑔 generates wage offers.
That graph shows that for the particular 𝑓 and 𝑔 in this example, the worker is always more
likely to accept an offer when 𝑓 generates the data even when 𝜋 is close to zero so that the
worker believes the true distribution is 𝑔 and therefore is relatively more selective.
The empirical cumulative distribution of the duration of unemployment verifies our conjec-
ture.

In [15]: job_search_example()


34.13.2 Example 2

𝐹 ~ Beta(1, 1), 𝐺 ~ Beta(1.2, 1.2), 𝑐=0.3.


Now 𝐺 has the same mean as 𝐹 with a smaller variance.
Since the unemployment compensation 𝑐 serves as a lower bound for bad wage offers, 𝐺 is
now an “inferior” distribution to 𝐹 .
Consequently, we observe that the optimal policy 𝑤(𝜋) is increasing in 𝜋.

In [16]: job_search_example(1, 1, 1.2, 1.2, 0.3)


34.13.3 Example 3

𝐹 ~ Beta(1, 1), 𝐺 ~ Beta(2, 2), 𝑐=0.3.


If 𝐺 has an even smaller variance, 𝐺 is even more “inferior” and the slope of 𝑤(𝜋) is
larger.
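One way to see why a lower-variance $G$ is “inferior” when $c$ acts as a floor (our own gloss, using a one-period simplification rather than the model's value function): an offer below $c$ is worth no more than $c$, so a rough summary of an offer distribution is $E[\max(w, c)]$, and because $\max(w, c)$ is convex in $w$, spreading out offers around a fixed mean raises it. The sketch below assumes SciPy and uses the distributions and $c = 0.3$ from examples 2 and 3; the helper name `expected_floor_value` is hypothetical.

```python
from scipy.integrate import quad
from scipy.stats import beta

c = 0.3  # unemployment compensation used in examples 2 and 3

def expected_floor_value(a, b, c):
    """E[max(w, c)] for w ~ Beta(a, b): offers below c are replaced by c."""
    below = c * beta.cdf(c, a, b)                              # c * P{w <= c}
    above, _ = quad(lambda w: w * beta.pdf(w, a, b), c, 1)     # E[w; w > c]
    return below + above

results = {}
for a, b in [(1, 1), (1.2, 1.2), (2, 2)]:
    mean, var = beta.stats(a, b, moments='mv')
    results[(a, b)] = expected_floor_value(a, b, c)
    print(f"Beta({a}, {b}): mean={float(mean):.3f}, var={float(var):.4f}, "
          f"E[max(w, c)]={results[(a, b)]:.4f}")
```

All three distributions have mean 0.5, but $E[\max(w, c)]$ falls as the variance falls.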

In [17]: job_search_example(1, 1, 2, 2, 0.3)


34.13.4 Example 4

𝐹 ~ Beta(1, 1), 𝐺 ~ Beta(3, 1.2), and 𝑐=0.8.


In this example, we keep the parameters of the beta distributions the same as in the baseline
case but increase the unemployment compensation 𝑐.
Compared to the baseline case (example 1), in which unemployment compensation is low
(𝑐=0.3), the worker can now afford a longer learning period.
As a result, the worker tends to accept wage offers much later.
Furthermore, at the time of accepting employment, the belief 𝜋 is closer to either 0 or 1.
That means that the worker has a better idea about what the true distribution is when he
eventually chooses to accept a wage offer.

In [18]: job_search_example(1, 1, 3, 1.2, c=0.8)


34.13.5 Example 5

𝐹 ~ Beta(1, 1), 𝐺 ~ Beta(3, 1.2), and 𝑐=0.1.


As expected, a smaller 𝑐 makes an unemployed worker accept wage offers earlier after having
acquired less information about the wage distribution.

In [19]: job_search_example(1, 1, 3, 1.2, c=0.1)

Chapter 35

Likelihood Ratio Processes

35.1 Contents

• Overview 35.2
• Likelihood Ratio Process 35.3
• Nature Permanently Draws from Density g 35.4
• Nature Permanently Draws from Density f 35.5
• Likelihood Ratio Test 35.6
• Sequels 35.7

In [1]: import numpy as np


import matplotlib.pyplot as plt
from numba import vectorize, njit
from math import gamma
%matplotlib inline

35.2 Overview

This lecture describes likelihood ratio processes and some of their uses.
We’ll use the simple statistical setting also used in this lecture.
Among the things that we’ll learn about are

• A peculiar property of likelihood ratio processes
• How a likelihood ratio process is the key ingredient in frequentist hypothesis testing
• How a receiver operating characteristic curve summarizes information about a false
alarm probability and power in frequentist hypothesis testing
• How during World War II the United States Navy devised a decision rule that Captain
Garret L. Schyler challenged and asked Milton Friedman to justify to him, a topic to be
studied in this lecture

35.3 Likelihood Ratio Process

A nonnegative random variable 𝑊 has one of two probability density functions, either 𝑓 or 𝑔.

550 CHAPTER 35. LIKELIHOOD RATIO PROCESSES

Before the beginning of time, nature once and for all decides whether she will draw a se-
quence of IID draws from either 𝑓 or 𝑔.
We will sometimes let 𝑞 be the density that nature chose once and for all, so that 𝑞 is either 𝑓
or 𝑔, permanently.
Nature knows which density it permanently draws from, but we the observers do not.
We do know both 𝑓 and 𝑔 but we don’t know which density nature chose.
But we want to know.
To do that, we use observations.
We observe a sequence {𝑤𝑡 }𝑇𝑡=1 of 𝑇 IID draws from either 𝑓 or 𝑔.
We want to use these observations to infer whether nature chose 𝑓 or 𝑔.
A likelihood ratio process is a useful tool for this task.
To begin, we define the key component of a likelihood ratio process, namely, the time $t$ likelihood
ratio as the random variable

$$
\ell(w_t) = \frac{f(w_t)}{g(w_t)}, \quad t \geq 1.
$$

We assume that $f$ and $g$ both put positive probabilities on the same intervals of possible realizations of the random variable $W$.
That means that under the $g$ density, $\ell(w_t) = \frac{f(w_t)}{g(w_t)}$ is evidently a nonnegative random variable with mean 1.
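As a quick numerical check of the mean-one property, here is a sketch assuming SciPy (`ell` is our own helper name) that integrates $\ell(w) g(w)$ over $(0, 1)$ for the beta densities $f$ = Beta(1, 1) and $g$ = Beta(3, 1.2) used in the code below.

```python
from scipy.integrate import quad
from scipy.stats import beta

F_a, F_b = 1, 1
G_a, G_b = 3, 1.2

def ell(w):
    """Likelihood ratio l(w) = f(w) / g(w)."""
    return beta.pdf(w, F_a, F_b) / beta.pdf(w, G_a, G_b)

# E[l(w) | q = g] = integral over (0, 1) of l(w) g(w) dw
mean_under_g, _ = quad(lambda w: ell(w) * beta.pdf(w, G_a, G_b), 0, 1)
print(mean_under_g)
```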

A likelihood ratio process for sequence $\{w_t\}_{t=1}^{\infty}$ is defined as

$$
L(w^t) = \prod_{i=1}^{t} \ell(w_i),
$$

where $w^t = \{w_1, \dots, w_t\}$ is a history of observations up to and including time $t$.


Sometimes for shorthand we’ll write $L_t = L(w^t)$.
Notice that the likelihood ratio process satisfies the recursion or multiplicative decomposition

$$
L(w^t) = \ell(w_t) L(w^{t-1}).
$$
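The recursion is what `np.cumprod(..., axis=1)` exploits in the simulations in this lecture. A small sketch (the ratios are made-up uniform draws, not draws of $\ell(w_t)$ from the model) checks that the product definition, the recursion, and a cumulative sum of logarithms agree:

```python
import numpy as np

rng = np.random.default_rng(1)
ell = rng.uniform(0.5, 1.5, size=8)      # hypothetical one-period ratios l(w_1), ..., l(w_8)

# Definition: L(w^t) = product of l(w_i) for i = 1, ..., t
L_prod = np.cumprod(ell)

# Recursion: L(w^t) = l(w_t) * L(w^{t-1}), starting from L(w^0) = 1
L_rec = np.empty_like(ell)
prev = 1.0
for i, x in enumerate(ell):
    prev = x * prev
    L_rec[i] = prev

# Same quantity through sums of logs, which avoids underflow on long histories
L_log = np.exp(np.cumsum(np.log(ell)))

print(np.allclose(L_prod, L_rec), np.allclose(L_prod, L_log))
```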

The likelihood ratio and its logarithm are key tools for making inferences using a classic fre-
quentist approach due to Neyman and Pearson [? ].
To help us appreciate how things work, the following Python code evaluates 𝑓 and 𝑔 as two
different beta distributions, then computes and simulates an associated likelihood ratio pro-
cess by generating a sequence 𝑤𝑡 from some probability distribution, for example, a sequence
of IID draws from 𝑔.

In [2]: # Parameters in the two beta distributions.
F_a, F_b = 1, 1
G_a, G_b = 3, 1.2

@vectorize
def p(x, a, b):
    r = gamma(a + b) / (gamma(a) * gamma(b))
    return r * x**(a-1) * (1 - x)**(b-1)

# The two density functions.
f = njit(lambda x: p(x, F_a, F_b))
g = njit(lambda x: p(x, G_a, G_b))

In [3]: @njit
def simulate(a, b, T=50, N=500):
    '''
    Generate N sets of T observations of the likelihood ratio,
    return as N x T matrix.
    '''

    l_arr = np.empty((N, T))

    for i in range(N):
        for j in range(T):
            w = np.random.beta(a, b)
            l_arr[i, j] = f(w) / g(w)

    return l_arr

35.4 Nature Permanently Draws from Density g

We first simulate the likelihood ratio process when nature permanently draws from 𝑔.

In [4]: l_arr_g = simulate(G_a, G_b)


l_seq_g = np.cumprod(l_arr_g, axis=1)

In [5]: N, T = l_arr_g.shape

for i in range(N):
    plt.plot(range(T), l_seq_g[i, :], color='b', lw=0.8, alpha=0.5)

plt.ylim([0, 3])
plt.title("$L(w^{t})$ paths");

Evidently, as sample length $T$ grows, most probability mass shifts toward zero.
To see this more clearly, we plot over time the fraction of paths $L(w^t)$ that fall in
the interval $[0, 0.01]$.

In [6]: plt.plot(range(T), np.sum(l_seq_g <= 0.01, axis=0) / N)


Despite the evident convergence of most probability mass to a very small interval near 0, the
unconditional mean of $L(w^t)$ under probability density $g$ is identically 1 for all $t$.
To verify this assertion, first notice that as mentioned earlier the unconditional mean
$E_0[\ell(w_t) \mid q = g]$ is 1 for all $t$:

$$
\begin{aligned}
E_0[\ell(w_t) \mid q = g] &= \int \frac{f(w_t)}{g(w_t)} g(w_t) \, dw_t \\
&= \int f(w_t) \, dw_t \\
&= 1,
\end{aligned}
$$

which immediately implies

$$
\begin{aligned}
E_0[L(w^1) \mid q = g] &= E_0[\ell(w_1) \mid q = g] \\
&= 1.
\end{aligned}
$$

Because $L(w^t) = \ell(w_t) L(w^{t-1})$ and $\{w_t\}_{t=1}^{t}$ is an IID sequence, we have

$$
\begin{aligned}
E_0[L(w^t) \mid q = g] &= E_0[L(w^{t-1}) \ell(w_t) \mid q = g] \\
&= E_0\big[L(w^{t-1}) E[\ell(w_t) \mid q = g, w^{t-1}] \mid q = g\big] \\
&= E_0\big[L(w^{t-1}) E[\ell(w_t) \mid q = g] \mid q = g\big] \\
&= E_0[L(w^{t-1}) \mid q = g]
\end{aligned}
$$

for any $t \geq 1$.
Mathematical induction implies $E_0[L(w^t) \mid q = g] = 1$ for all $t \geq 1$.

35.4.1 Peculiar Property of Likelihood Ratio Process

How can 𝐸0 [𝐿 (𝑤𝑡 ) ∣ 𝑞 = 𝑔] = 1 possibly be true when most probability mass of the likelihood
ratio process is piling up near 0 as 𝑡 → +∞?
The answer has to be that as 𝑡 → +∞, the distribution of 𝐿𝑡 becomes more and more fat-
tailed: enough mass shifts to larger and larger values of 𝐿𝑡 to make the mean of 𝐿𝑡 continue
to be one despite most of the probability mass piling up near 0.
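The same phenomenon can be seen exactly in a toy case (a hypothetical example, not the beta densities of this lecture). Suppose that under $g$ each $\ell(w_t)$ equals 0.5 or 1.5 with probability 1/2 each, so $E[\ell(w_t) \mid q = g] = 1$. Then $L(w^t) = 0.5^{t-k} 1.5^{k}$, where $k$ is the number of 1.5 draws and is binomial$(t, 1/2)$, so the mean and the probability of $L(w^t) \le 0.01$ can be computed exactly:

```python
from math import comb

def two_point_stats(t, lo=0.5, hi=1.5, thresh=0.01):
    """Exact E[L_t] and P{L_t <= thresh} when each l_t equals lo or hi with probability 1/2."""
    mean, prob_small = 0.0, 0.0
    for k in range(t + 1):                  # k = number of 'hi' draws among t
        prob = comb(t, k) / 2**t
        L = lo**(t - k) * hi**k
        mean += prob * L
        if L <= thresh:
            prob_small += prob
    return mean, prob_small

for t in (10, 50, 100):
    mean, prob_small = two_point_stats(t)
    print(f"t={t}: E[L_t]={mean:.6f}, P(L_t <= 0.01)={prob_small:.3f}")
```

The mean stays at 1 because rare paths with many 1.5 draws become enormous, while $E[\log \ell(w_t)] = \frac{1}{2} \log 0.75 < 0$ drives typical paths toward zero.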
To illustrate this peculiar property, we simulate many paths and calculate the unconditional
mean of 𝐿 (𝑤𝑡 ) by averaging across these many paths at each 𝑡.

In [7]: l_arr_g = simulate(G_a, G_b, N=50000)


l_seq_g = np.cumprod(l_arr_g, axis=1)

The following Python code approximates unconditional means 𝐸0 [𝐿 (𝑤𝑡 )] by averaging across
sample paths.
Please notice that while sample averages hover around their population means of 1, there is
quite a bit of variability, a consequence of the fat tail of the distribution of 𝐿 (𝑤𝑡 ).

In [8]: N, T = l_arr_g.shape
plt.plot(range(T), np.mean(l_seq_g, axis=0))   # average L(w^t) across paths at each t
plt.hlines(1, 0, T, linestyle='--')

35.5 Nature Permanently Draws from Density f

Now suppose that before time 0 nature permanently decided to draw repeatedly from density
𝑓.
While the mean of the likelihood ratio ℓ (𝑤𝑡 ) under density 𝑔 is 1, its mean under the density
𝑓 exceeds one.
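To see this, we compute

$$
\begin{aligned}
E_0[\ell(w_t) \mid q = f] &= \int \frac{f(w_t)}{g(w_t)} f(w_t) \, dw_t \\
&= \int \frac{f(w_t)}{g(w_t)} \frac{f(w_t)}{g(w_t)} g(w_t) \, dw_t \\
&= \int \ell(w_t)^2 g(w_t) \, dw_t \\
&= E_0[\ell(w_t)^2 \mid q = g] \\
&= E_0[\ell(w_t) \mid q = g]^2 + \mathrm{Var}(\ell(w_t) \mid q = g) \\
&> E_0[\ell(w_t) \mid q = g]^2 = 1
\end{aligned}
$$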
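The identity $E_0[\ell \mid q = f] = E_0[\ell^2 \mid q = g] = 1 + \mathrm{Var}(\ell \mid q = g)$ can be checked exactly with a small discrete stand-in for $f$ and $g$ (the three-point probabilities below are hypothetical). A discrete example also sidesteps a subtlety of this lecture's densities: with $f$ = Beta(1, 1) and $g$ = Beta(3, 1.2), $g(w)$ vanishes like $w^2$ near 0, so $\int f^2/g \, dw$ diverges and $E_0[\ell(w_t) \mid q = f]$ is in fact infinite, which is consistent with the inequality above.

```python
import numpy as np

f = np.array([0.2, 0.3, 0.5])   # hypothetical pmf playing the role of f
g = np.array([0.4, 0.4, 0.2])   # hypothetical pmf playing the role of g
ell = f / g                     # likelihood ratio at each outcome

mean_g = np.sum(g * ell)        # E[l | q = g]
mean_f = np.sum(f * ell)        # E[l | q = f]
second_g = np.sum(g * ell**2)   # E[l^2 | q = g]
var_g = second_g - mean_g**2    # Var(l | q = g)

print(mean_g, mean_f, second_g, 1 + var_g)

# With IID draws, E[L(w^t) | q = f] = (E[l | q = f])^t grows geometrically in t
print([mean_f**t for t in (1, 5, 10)])
```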

Because the draws are IID, $E_0[L(w^t) \mid q = f] = \left(E_0[\ell(w_t) \mid q = f]\right)^t$, so the unconditional
mean of the likelihood ratio process $L(w^t)$ diverges toward $+\infty$.
Simulations below confirm this conclusion.
Please note the scale of the 𝑦 axis.

In [9]: l_arr_f = simulate(F_a, F_b, N=50000)


l_seq_f = np.cumprod(l_arr_f, axis=1)

In [10]: N, T = l_arr_f.shape
plt.plot(range(T), np.mean(l_seq_f, axis=0))


We also plot the probability that 𝐿 (𝑤𝑡 ) falls into the interval [10000, ∞) as a function of time
and watch how fast probability mass diverges to +∞.

In [11]: plt.plot(range(T), np.sum(l_seq_f > 10000, axis=0) / N)


35.6 Likelihood Ratio Test

We now describe how to employ the machinery of Neyman and Pearson [? ] to test the hy-
pothesis that history 𝑤𝑡 is generated by repeated IID draws from density 𝑔.
Denote 𝑞 as the data generating process, so that 𝑞 = 𝑓 or 𝑔.
Upon observing a sample $\{W_i\}_{i=1}^{t}$, we want to decide which one is the data generating
process by performing a (frequentist) hypothesis test.
We specify
• Null hypothesis $H_0$: $q = f$,
• Alternative hypothesis $H_1$: $q = g$.
Neyman and Pearson proved that the best way to test this hypothesis is to use a likelihood
ratio test that takes the form:
• reject $H_0$ if $L(W^t) < c$,
• accept $H_0$ otherwise.
where 𝑐 is a given discrimination threshold, to be chosen in a way we’ll soon describe.
This test is best in the sense that it is a uniformly most powerful test.
To understand what this means, we have to define probabilities of two important events that
allow us to characterize a test associated with given threshold 𝑐.
The two probabilities are:
• Probability of detection (= power = 1 minus probability of Type II error):

$$
1 - \beta \equiv \Pr\{L(w^t) < c \mid q = g\}
$$

• Probability of false alarm (= significance level = probability of Type I error):

$$
\alpha \equiv \Pr\{L(w^t) < c \mid q = f\}
$$

The Neyman-Pearson Lemma states that among all possible tests, a likelihood ratio test max-
imizes the probability of detection for a given probability of false alarm.
Another way to say the same thing is that among all possible tests, a likelihood ratio test
maximizes power for a given significance level.
To make a confident inference, we want a small probability of false alarm and a large
probability of detection.
With sample size 𝑡 fixed, we can change our two probabilities by adjusting 𝑐.
A troublesome “that’s life” fact is that these two probabilities move in the same direction as
we vary the critical value 𝑐.
Without specifying quantitative losses from making Type I and Type II errors, there is little
that we can say about how we should trade off probabilities of the two types of mistakes.
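The sketch below (assuming SciPy and NumPy's `default_rng`; the sample size $t = 10$, number of paths, and seed are arbitrary choices, and the helper names are our own) simulates $\log L(w^t)$ under each density and estimates $\alpha$ and $1 - \beta$ for several thresholds $c$. Working in logs avoids underflow and leaves the probabilities unchanged, since $L(w^t) < c$ exactly when $\log L(w^t) < \log c$. Raising $c$ raises both estimates, which is the trade-off just described.

```python
import numpy as np
from scipy.stats import beta

F_a, F_b = 1, 1      # f = Beta(1, 1)
G_a, G_b = 3, 1.2    # g = Beta(3, 1.2)

def log_L(draws):
    """log L(w^t) = sum over i of [log f(w_i) - log g(w_i)], one row per sample path."""
    log_ell = beta.logpdf(draws, F_a, F_b) - beta.logpdf(draws, G_a, G_b)
    return log_ell.sum(axis=1)

rng = np.random.default_rng(0)
t, N = 10, 20_000
log_L_f = log_L(rng.beta(F_a, F_b, size=(N, t)))   # paths when q = f (H0 true)
log_L_g = log_L(rng.beta(G_a, G_b, size=(N, t)))   # paths when q = g (H1 true)

def error_probs(c):
    """Estimated (alpha, 1 - beta) for the rule 'reject H0 if L(w^t) < c'."""
    alpha = np.mean(log_L_f < np.log(c))   # false alarm: reject H0 although q = f
    power = np.mean(log_L_g < np.log(c))   # detection: reject H0 when q = g
    return alpha, power

for c in (0.5, 1.0, 2.0):
    alpha, power = error_probs(c)
    print(f"c={c}: alpha={alpha:.3f}, power={power:.3f}")
```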
We do know that increasing sample size 𝑡 improves statistical inference.
Below we plot some informative figures that illustrate this.
We also present a classical frequentist method for choosing a sample size 𝑡.
Let’s start with a case in which we fix the threshold 𝑐 at 1.

In [12]: c = 1

Below we plot empirical distributions of logarithms of the cumulative likelihood ratios simulated above, which are generated by either $f$ or $g$.
Taking logarithms has no effect on calculating the probabilities because the log is a monotonic transformation: $\Pr\{L(w^t) < c\} = \Pr\{\log L(w^t) < \log c\}$.
As $t$ increases, the probabilities of making Type I and Type II errors both decrease, which is
good.
This is because most of the probability mass of $\log(L(w^t))$ moves toward $-\infty$ when $g$ is the
data generating process, while $\log(L(w^t))$ goes to $+\infty$ when data are generated by $f$.
This diverse behavior is what makes it possible to distinguish $q = f$ from $q = g$.
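The opposite drifts can be quantified (a sketch assuming SciPy; this is our own supplement, not code from the lecture). The expected one-period increment of $\log L(w^t)$ is $E[\log \ell(w_t) \mid q = f] = \mathrm{KL}(f \,\|\, g) > 0$ under $f$ and $E[\log \ell(w_t) \mid q = g] = -\mathrm{KL}(g \,\|\, f) < 0$ under $g$, where KL denotes the Kullback-Leibler divergence, so the mean of $\log L(w^t)$ moves linearly in $t$ in opposite directions.

```python
from scipy.integrate import quad
from scipy.stats import beta

F_a, F_b = 1, 1
G_a, G_b = 3, 1.2

def log_ell(w):
    """log l(w) = log f(w) - log g(w)."""
    return beta.logpdf(w, F_a, F_b) - beta.logpdf(w, G_a, G_b)

# Expected one-period increment of log L(w^t) under each data generating process
drift_f, _ = quad(lambda w: beta.pdf(w, F_a, F_b) * log_ell(w), 0, 1)   # = KL(f || g)
drift_g, _ = quad(lambda w: beta.pdf(w, G_a, G_b) * log_ell(w), 0, 1)   # = -KL(g || f)
print(drift_f, drift_g)

# After t periods the mean of log L(w^t) is t times the drift
for t in (1, 7, 14, 21):
    print(t, t * drift_f, t * drift_g)
```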

In [13]: fig, axs = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle('distribution of $log(L(w^t))$ under f or g', fontsize=15)

for i, t in enumerate([1, 7, 14, 21]):
    nr = i // 2
    nc = i % 2

    axs[nr, nc].axvline(np.log(c), color="k", ls="--")

    hist_f, x_f = np.histogram(np.log(l_seq_f[:, t]), 200, density=True)
    hist_g, x_g = np.histogram(np.log(l_seq_g[:, t]), 200, density=True)

    axs[nr, nc].plot(x_f[1:], hist_f, label="dist under f")
    axs[nr, nc].plot(x_g[1:], hist_g, label="dist under g")

    # shade the Type I error region under f and the Type II error region under g
    for k, (x, hist, label) in enumerate(zip([x_f, x_g], [hist_f, hist_g],
                                             ["Type I error", "Type II error"])):
        ind = x[1:] <= np.log(c) if k == 0 else x[1:] > np.log(c)
        axs[nr, nc].fill_between(x[1:][ind], hist[ind], alpha=0.5, label=label)

    axs[nr, nc].legend()
    axs[nr, nc].set_title(f"t={t}")

plt.show()

The graph below shows more clearly that, when we hold the threshold 𝑐 fixed, the probabil-
ity of detection monotonically increases with increases in 𝑡 and that the probability of a false
alarm monotonically decreases.

In [14]: PD = np.empty(T)
PFA = np.empty(T)

for t in range(T):
    PD[t] = np.sum(l_seq_g[:, t] < c) / N
    PFA[t] = np.sum(l_seq_f[:, t] < c) / N

plt.plot(range(T), PD, label="Probability of detection")
plt.plot(range(T), PFA, label="Probability of false alarm")
plt.xlabel("t")
plt.title("$c=1$")
plt.legend()
plt.show()

For a given sample size $t$, the threshold $c$ uniquely pins down probabilities of both types of
error.
If for a fixed $t$ we now free up and move $c$, we will sweep out the probability of detection as a
function of the probability of false alarm.
This produces what is called a receiver operating characteristic curve for sample size $t$.
Below, we plot receiver operating characteristic curves for different sample sizes $t$.
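The construction can be sketched as a small function (our own illustration; `roc_points` and `simulate_log_L` are hypothetical names, and the paths are simulated afresh with NumPy and SciPy rather than reusing `l_seq_f` and `l_seq_g`). For each target probability of false alarm, the threshold $\log c$ is set to that quantile of $\log L(w^t)$ under $f$, and the probability of detection is the fraction of $g$-paths below it.

```python
import numpy as np
from scipy.stats import beta

F_a, F_b = 1, 1
G_a, G_b = 3, 1.2
rng = np.random.default_rng(1)

def simulate_log_L(a, b, t, N=20_000):
    """log L(w^t) for N paths of t IID draws from Beta(a, b)."""
    w = rng.beta(a, b, size=(N, t))
    return (beta.logpdf(w, F_a, F_b) - beta.logpdf(w, G_a, G_b)).sum(axis=1)

def roc_points(log_L_f, log_L_g, pfa_grid):
    """Probability of detection at each target probability of false alarm."""
    log_c = np.quantile(log_L_f, pfa_grid)    # threshold giving each false-alarm rate under f
    return np.array([np.mean(log_L_g < lc) for lc in log_c])

pfa_grid = np.linspace(0.01, 0.99, 99)
detection = {t: roc_points(simulate_log_L(F_a, F_b, t), simulate_log_L(G_a, G_b, t), pfa_grid)
             for t in (1, 5, 9)}
```

Plotting `detection[t]` against `pfa_grid` for each `t` traces one curve per sample size.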

In [15]: PFA = np.arange(0, 100, 1)

for t in range(1, 15, 4):
    percentile = np.percentile(l_seq_f[:, t], PFA)
    PD = [np.sum(l_seq_g[:, t] < p) / N for p in percentile]

    plt.plot(PFA / 100, PD, label=f"t={t}")

plt.scatter(0, 1, label="perfect detection")
plt.plot([0, 1], [0, 1], color='k', ls='--', label="random detection")

plt.arrow(0.5, 0.5, -0.15, 0.15, head_width=0.03)
plt.text(0.35, 0.7, "better")
plt.xlabel("Probability of false alarm")
plt.ylabel("Probability of detection")
plt.legend()
plt.title("Receiver Operating Characteristic Curve")
plt.show()

Notice that as $t$ increases, we are assured a larger probability of detection and a smaller
probability of false alarm associated with a given discrimination threshold $c$.
As $t \to +\infty$, we approach the perfect detection curve that is indicated by a right angle
hinging on the dot at $(0, 1)$ labeled “perfect detection”.
For a given sample size $t$, a value of the discrimination threshold $c$ determines a point on the
receiver operating characteristic curve.
It is up to the test designer to trade off probabilities of making the two types of errors.
But we know how to choose the smallest sample size to achieve given targets for the probabil-
ities.
Typically, frequentists aim for a high probability of detection that respects an upper bound
on the probability of false alarm.
Below we show an example in which we fix the probability of false alarm at 0.05.
The required sample size for making a decision is then determined by a target probability of
detection, for example, 0.9, as depicted in the following graph.
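Reading the required sample size off such a curve is a one-line search. A sketch (the helper name and the example array are hypothetical; in the lecture's code the array would be `PD`, indexed by `t`):

```python
import numpy as np

def smallest_sample_size(pd, target=0.9):
    """First index t with pd[t] >= target, or None if the target is never reached."""
    hits = np.flatnonzero(np.asarray(pd) >= target)
    return int(hits[0]) if hits.size > 0 else None

pd_example = np.array([0.20, 0.55, 0.80, 0.91, 0.96])   # made-up detection probabilities
print(smallest_sample_size(pd_example))                   # first t whose detection probability reaches 0.9
```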

In [16]: PFA = 0.05
PD = np.empty(T)

for t in range(T):
    c = np.percentile(l_seq_f[:, t], PFA * 100)
    PD[t] = np.sum(l_seq_g[:, t] < c) / N

plt.plot(range(T), PD)
plt.axhline(0.9, color="k", ls="--")

plt.xlabel("t")
plt.ylabel("Probability of detection")
plt.title(f"Probability of false alarm={PFA}")
plt.show()

The United States Navy evidently used a procedure like this to select a sample size 𝑡 for do-
ing quality control tests during World War II.
A Navy Captain who had been ordered to perform tests of this kind had second thoughts
about it that he presented to Milton Friedman, as we describe in this lecture.

35.7 Sequels

Likelihood processes play an important role in Bayesian learning, as described in this lecture
and as applied in this lecture.
Likelihood ratio processes appear again in this lecture, which contains another illustration of
the peculiar property of likelihood ratio processes described above.
Chapter 36

A Problem that Stumped Milton Friedman

(and that Abraham Wald solved by inventing sequential analysis)

36.1 Contents

• Overview 36.2
• Origin of the Problem 36.3
• A Dynamic Programming Approach 36.4
• Implementation 36.5
• Analysis 36.6
• Comparison with Neyman-Pearson Formulation 36.7
• Sequels 36.8
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon
!pip install interpolation

36.2 Overview

This lecture describes a statistical decision problem encountered by Milton Friedman and W.
Allen Wallis during World War II when they were analysts at the U.S. Government’s Statisti-
cal Research Group at Columbia University.
This problem led Abraham Wald [109] to formulate sequential analysis, an approach to
statistical decision problems intimately related to dynamic programming.
In this lecture, we apply dynamic programming algorithms to Friedman and Wallis and
Wald’s problem.
Key ideas in play will be:
• Bayes’ Law
• Dynamic programming
• Type I and type II statistical errors
– a type I error occurs when you reject a null hypothesis that is true

564 CHAPTER 36. A PROBLEM THAT STUMPED MILTON FRIEDMAN

– a type II error is when you accept a null hypothesis that is false


• Abraham Wald’s sequential probability ratio test
• The power of a statistical test
• The critical region of a statistical test
• A uniformly most powerful test
We’ll begin with some imports:

In [2]: import numpy as np
import matplotlib.pyplot as plt
from numba import jit, prange, float64, int64
from numba.experimental import jitclass
from interpolation import interp
from math import gamma

This lecture uses ideas studied in this lecture, this lecture, and this lecture.

36.3 Origin of the Problem

On pages 137-139 of his 1998 book Two Lucky People with Rose Friedman [39], Milton Fried-
man described a problem presented to him and Allen Wallis during World War II, when they
worked at the US Government’s Statistical Research Group at Columbia University.
Let’s listen to Milton Friedman tell us what happened:

In order to understand the story, it is necessary to have an idea of a simple statistical
problem, and of the standard procedure for dealing with it. The actual problem
out of which sequential analysis grew will serve. The Navy has two alternative
designs (say A and B) for a projectile. It wants to determine which is superior. To
do so it undertakes a series of paired firings. On each round, it assigns the value
1 or 0 to A accordingly as its performance is superior or inferior to that of B and
conversely 0 or 1 to B. The Navy asks the statistician how to conduct the test and
how to analyze the results.

The standard statistical answer was to specify a number of firings (say 1,000) and
a pair of percentages (e.g., 53% and 47%) and tell the client that if A receives a 1
in more than 53% of the firings, it can be regarded as superior; if it receives a 1 in
fewer than 47%, B can be regarded as superior; if the percentage is between 47%
and 53%, neither can be so regarded.

When Allen Wallis was discussing such a problem with (Navy) Captain Garret L.
Schyler, the captain objected that such a test, to quote from Allen’s account, may
prove wasteful. If a wise and seasoned ordnance officer like Schyler were on the
premises, he would see after the first few thousand or even few hundred [rounds]
that the experiment need not be completed either because the new method is ob-
viously inferior or because it is obviously superior beyond what was hoped for ….

Friedman and Wallis struggled with the problem but, after realizing that they were not able
to solve it, described the problem to Abraham Wald.
That started Wald on the path that led him to Sequential Analysis [109].
We’ll formulate the problem using dynamic programming.

36.4 A Dynamic Programming Approach

The following presentation of the problem closely follows Dimitri Bertsekas’s treatment in
Dynamic Programming and Stochastic Control [12].
A decision-maker observes a sequence of draws of a random variable 𝑧.
He (or she) wants to know which of two probability distributions 𝑓0 or 𝑓1 governs 𝑧.
Conditional on knowing that successive observations are drawn from distribution 𝑓0 , the se-
quence of random variables is independently and identically distributed (IID).
Conditional on knowing that successive observations are drawn from distribution 𝑓1 , the se-
quence of random variables is also independently and identically distributed (IID).
But the observer does not know which of the two distributions generated the sequence.
For reasons explained in Exchangeability and Bayesian Updating, this means that the sequence
is not IID and that the observer has something to learn, even though he knows both 𝑓0 and
𝑓1 .
After a number of draws, also to be determined, he makes a decision about which of the dis-
tributions is generating the draws he observes.
He starts with prior

𝜋−1 = ℙ{𝑓 = 𝑓0 ∣ no observations} ∈ (0, 1)

After observing 𝑘 + 1 observations 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 , he updates this value to

𝜋𝑘 = ℙ{𝑓 = 𝑓0 ∣ 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 }

which is calculated recursively by applying Bayes’ law:

$$
\pi_{k+1} = \frac{\pi_k f_0(z_{k+1})}{\pi_k f_0(z_{k+1}) + (1 - \pi_k) f_1(z_{k+1})}, \quad k = -1, 0, 1, \ldots
$$
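A minimal sketch of this recursion, assuming SciPy for the beta densities and borrowing $f_0$ = Beta(1, 1) and $f_1$ = Beta(9, 9) from the figure code in this section (the function names are our own):

```python
from scipy.stats import beta

def f0(z):
    return beta.pdf(z, 1, 1)

def f1(z):
    return beta.pdf(z, 9, 9)

def update_pi(pi, z):
    """One step of Bayes' law: posterior probability that f = f0 after observing z."""
    num = pi * f0(z)
    return num / (num + (1 - pi) * f1(z))

def posterior_path(pi_init, draws):
    """Apply update_pi to a sequence of observations, starting from the prior pi_init."""
    path, pi = [], pi_init
    for z in draws:
        pi = update_pi(pi, z)
        path.append(pi)
    return path

print(posterior_path(0.5, [0.05, 0.5, 0.48, 0.52]))
```

A draw far from 0.5, where $f_1$ puts little mass, pushes $\pi$ toward 1; draws near 0.5, where $f_1 > f_0$, pull it back down.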

After observing 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 , the decision-maker believes that 𝑧𝑘+1 has probability distribu-
tion

𝑓𝜋𝑘 (𝑣) = 𝜋𝑘 𝑓0 (𝑣) + (1 − 𝜋𝑘 )𝑓1 (𝑣)

This is a mixture of distributions $f_0$ and $f_1$, with the weight on $f_0$ being the posterior probability that $f = f_0$.
To help illustrate this kind of distribution, let’s inspect some mixtures of beta distributions.
The density of a beta probability distribution with parameters $a$ and $b$ is

$$
f(z; a, b) = \frac{\Gamma(a + b) z^{a-1} (1 - z)^{b-1}}{\Gamma(a) \Gamma(b)}
\quad \text{where} \quad
\Gamma(t) := \int_{0}^{\infty} x^{t-1} e^{-x} \, dx
$$
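The formula can be checked against SciPy's implementation, and any mixture $\pi_k f_0 + (1 - \pi_k) f_1$ integrates to one like a density should. This sketch reuses the parameter pairs (1, 1) and (9, 9) from the figure code in this section; `beta_density` is a hypothetical helper name.

```python
from math import gamma
from scipy.integrate import quad
from scipy.stats import beta

def beta_density(z, a, b):
    """Beta(a, b) density written directly from the formula above."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * z**(a - 1) * (1 - z)**(b - 1)

# Compare with scipy.stats.beta.pdf at a few interior points
max_gap = max(abs(beta_density(z, a, b) - beta.pdf(z, a, b))
              for z in (0.1, 0.3, 0.5, 0.9)
              for a, b in [(1, 1), (9, 9), (3, 1.2)])

# Each mixture of f0 = Beta(1, 1) and f1 = Beta(9, 9) is itself a density
mixture_mass = {pi: quad(lambda z: pi * beta_density(z, 1, 1)
                                   + (1 - pi) * beta_density(z, 9, 9), 0, 1)[0]
                for pi in (0.25, 0.5, 0.75)}
print(max_gap, mixture_mass)
```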

The next figure shows two beta distributions in the top panel.
The bottom panel presents mixtures of these distributions, with various mixing probabilities
$\pi_k$.

In [3]: @jit(nopython=True)
def p(x, a, b):
    r = gamma(a + b) / (gamma(a) * gamma(b))
    return r * x**(a-1) * (1 - x)**(b-1)

f0 = lambda x: p(x, 1, 1)
f1 = lambda x: p(x, 9, 9)
grid = np.linspace(0, 1, 50)

fig, axes = plt.subplots(2, figsize=(10, 8))

axes[0].set_title("Original Distributions")
axes[0].plot(grid, f0(grid), lw=2, label="$f_0$")
axes[0].plot(grid, f1(grid), lw=2, label="$f_1$")

axes[1].set_title("Mixtures")
for π in 0.25, 0.5, 0.75:
    y = π * f0(grid) + (1 - π) * f1(grid)
    axes[1].plot(grid, y, lw=2, label=rf"$\pi_k$ = {π}")   # plot against z, not the array index

for ax in axes:
    ax.legend()
    ax.set(xlabel="$z$ values", ylabel="probability of $z_k$")

plt.tight_layout()
plt.show()
36.4. A DYNAMIC PROGRAMMING APPROACH 567

36.4.1 Losses and Costs

After observing 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 , the decision-maker chooses among three distinct actions:
• He decides that 𝑓 = 𝑓0 and draws no more 𝑧’s
• He decides that 𝑓 = 𝑓1 and draws no more 𝑧’s
• He postpones deciding now and instead chooses to draw a 𝑧𝑘+1
Associated with these three actions, the decision-maker can suffer three kinds of losses:
• A loss 𝐿0 if he decides 𝑓 = 𝑓0 when actually 𝑓 = 𝑓1
• A loss 𝐿1 if he decides 𝑓 = 𝑓1 when actually 𝑓 = 𝑓0
• A cost 𝑐 if he postpones deciding and chooses instead to draw another 𝑧

36.4.2 Digression on Type I and Type II Errors

If we regard 𝑓 = 𝑓0 as a null hypothesis and 𝑓 = 𝑓1 as an alternative hypothesis, then 𝐿1 and 𝐿0 are losses associated with two types of statistical errors:
• a type I error is an incorrect rejection of a true null hypothesis (a “false positive”)
• a type II error is a failure to reject a false null hypothesis (a “false negative”)
So when we treat 𝑓 = 𝑓0 as the null hypothesis
• We can think of 𝐿1 as the loss associated with a type I error.
• We can think of 𝐿0 as the loss associated with a type II error.

36.4.3 Intuition

Let’s try to guess what an optimal decision rule might look like before we go further.
Suppose at some given point in time that 𝜋 is close to 1.
Then our prior beliefs and the evidence so far point strongly to 𝑓 = 𝑓0 .
If, on the other hand, 𝜋 is close to 0, then 𝑓 = 𝑓1 is strongly favored.
Finally, if 𝜋 is in the middle of the interval [0, 1], then we have little information in either di-
rection.
This reasoning suggests a decision rule such as the one shown in the figure: accept 𝑓0 when 𝜋 is high, accept 𝑓1 when 𝜋 is low, and draw again when 𝜋 is in between.

As we’ll see, this is indeed the correct form of the decision rule.
The key problem is to determine the threshold values 𝛼, 𝛽, which will depend on the parame-
ters listed above.
You might like to pause at this point and try to predict the impact of a parameter such as 𝑐
or 𝐿0 on 𝛼 or 𝛽.

36.4.4 A Bellman Equation

Let 𝐽 (𝜋) be the total loss for a decision-maker with current belief 𝜋 who chooses optimally.
With some thought, you will agree that 𝐽 should satisfy the Bellman equation

$$J(\pi) = \min\{(1 - \pi) L_0, \; \pi L_1, \; c + \mathbb{E}[J(\pi')]\} \tag{1}$$

where 𝜋′ is the random variable defined by

$$\pi' = \kappa(z', \pi) = \frac{\pi f_0(z')}{\pi f_0(z') + (1 - \pi) f_1(z')}$$

when 𝜋 is fixed and 𝑧′ is drawn from the current best guess, which is the distribution 𝑓𝜋 defined by

𝑓𝜋 (𝑣) = 𝜋𝑓0 (𝑣) + (1 − 𝜋)𝑓1 (𝑣)

In the Bellman equation, minimization is over three actions:

1. Accept the hypothesis that 𝑓 = 𝑓0
2. Accept the hypothesis that 𝑓 = 𝑓1
3. Postpone deciding and draw again

We can represent the Bellman equation as

$$J(\pi) = \min\{(1 - \pi) L_0, \; \pi L_1, \; h(\pi)\} \tag{2}$$

where 𝜋 ∈ [0, 1] and

• (1 − 𝜋)𝐿0 is the expected loss associated with accepting 𝑓0 (i.e., the cost of making a type II error).
• 𝜋𝐿1 is the expected loss associated with accepting 𝑓1 (i.e., the cost of making a type I error).
• ℎ(𝜋) ∶= 𝑐 + 𝔼[𝐽(𝜋′)] is the continuation value; i.e., the expected cost associated with drawing one more 𝑧.
The optimal decision rule is characterized by two numbers (𝛼, 𝛽) ∈ (0, 1) × (0, 1) that satisfy

(1 − 𝜋)𝐿0 < min{𝜋𝐿1 , 𝑐 + 𝔼[𝐽 (𝜋′ )]} if 𝜋 ≥ 𝛼

and

𝜋𝐿1 < min{(1 − 𝜋)𝐿0 , 𝑐 + 𝔼[𝐽 (𝜋′ )]} if 𝜋 ≤ 𝛽

The optimal decision rule is then

accept 𝑓 = 𝑓0 if 𝜋 ≥ 𝛼
accept 𝑓 = 𝑓1 if 𝜋 ≤ 𝛽
draw another 𝑧 if 𝛽 < 𝜋 < 𝛼
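Pointwise, the rule simply picks the action that attains the minimum in (2). The sketch below makes this explicit for a single belief 𝜋; it assumes the continuation value ℎ(𝜋) has already been computed and is passed in as a number, and the function name `best_action` and the numbers in the usage loop are hypothetical.

```python
def best_action(pi, h_pi, L0, L1):
    """Return the action attaining the minimum in (2) at belief pi.

    h_pi is the continuation value h(pi), assumed to be known already.
    """
    expected_losses = {
        "accept f0": (1 - pi) * L0,   # expected loss from a type II error
        "accept f1": pi * L1,         # expected loss from a type I error
        "draw another z": h_pi,       # continuation value
    }
    return min(expected_losses, key=expected_losses.get)

# Illustrative numbers only: L0 = L1 = 25 and a hypothetical h(pi) = 8
for pi in (0.05, 0.5, 0.95):
    print(pi, best_action(pi, h_pi=8.0, L0=25, L1=25))
```

Sweeping `pi` over a grid and recording where the answer switches recovers the cutoffs 𝛽 and 𝛼.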

Our aim is to compute the value function 𝐽 , and from it the associated cutoffs 𝛼 and 𝛽.
To make our computations simpler, using (2), we can write the continuation value ℎ(𝜋) as

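$$
\begin{aligned}
h(\pi) &= c + \mathbb{E}[J(\pi')] \\
&= c + \mathbb{E}_{\pi'} \min\{(1 - \pi') L_0, \; \pi' L_1, \; h(\pi')\} \\
&= c + \int \min\{(1 - \kappa(z', \pi)) L_0, \; \kappa(z', \pi) L_1, \; h(\kappa(z', \pi))\} f_\pi(z') \, dz'
\end{aligned}
\tag{3}
$$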

The equality

$$h(\pi) = c + \int \min\{(1 - \kappa(z', \pi)) L_0, \; \kappa(z', \pi) L_1, \; h(\kappa(z', \pi))\} f_\pi(z') \, dz' \tag{4}$$

can be understood as a functional equation, where ℎ is the unknown.

Once we have solved (4) for the continuation value ℎ, we can back out optimal choices from the right-hand side of (2).
This functional equation can be solved by taking an initial guess and iterating to find the
fixed point.
In other words, we iterate with an operator 𝑄, where

$$Qh(\pi) = c + \int \min\{(1 - \kappa(z', \pi)) L_0, \; \kappa(z', \pi) L_1, \; h(\kappa(z', \pi))\} f_\pi(z') \, dz'$$
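A vectorized NumPy sketch of this fixed-point iteration may help make the operator concrete. It is not the lecture's implementation, which follows in the next section and uses Numba. The sketch assumes the default parameter values of the WaldFriedman class, approximates the integral by averaging over fixed Monte Carlo draws from 𝑓0 and 𝑓1 (weighted by 𝜋 and 1 − 𝜋, since 𝑓𝜋 is a mixture), evaluates ℎ between grid points with `np.interp`, and uses a smaller grid and sample for speed.

```python
import numpy as np
from scipy.stats import beta

# Assumed parameters, mirroring the defaults of the WaldFriedman class
c, L0, L1 = 1.25, 25.0, 25.0
f0, f1 = beta(1, 1), beta(3, 1.2)

grid = np.linspace(0, 1, 100)            # grid of beliefs π
rng = np.random.default_rng(0)
z0 = rng.beta(1, 1, size=500)            # Monte Carlo draws from f0
z1 = rng.beta(3, 1.2, size=500)          # Monte Carlo draws from f1

def kappa(z, pi):
    """Bayes update π' = κ(z, π); broadcasts over arrays."""
    num = pi * f0.pdf(z)
    return num / (num + (1 - pi) * f1.pdf(z))

# π' does not depend on h, so compute it once.
# Rows index grid points π, columns index Monte Carlo draws z'.
pi_next0 = kappa(z0[None, :], grid[:, None])
pi_next1 = kappa(z1[None, :], grid[:, None])

def Q(h):
    """Apply the operator Q to a function h stored as values on the grid."""
    def mean_min(pi_next):
        cont = np.interp(pi_next, grid, h)            # h(π') by linear interpolation
        stop = np.minimum((1 - pi_next) * L0, pi_next * L1)
        return np.minimum(stop, cont).mean(axis=1)
    # z' ~ f_π = π f0 + (1 - π) f1, so mix the two conditional averages
    return c + grid * mean_min(pi_next0) + (1 - grid) * mean_min(pi_next1)

h = np.zeros_like(grid)                  # initial guess
for i in range(2000):
    h_new = Q(h)
    error = np.max(np.abs(h_new - h))
    h = h_new
    if error < 1e-5:
        break
print(f"stopped after {i + 1} iterations, error = {error:.2e}")
```

Precomputing the updated beliefs is possible here because only ℎ changes from one iteration to the next; the Monte Carlo draws stay fixed, as in the class-based version.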

36.5 Implementation

First, we will construct a jitclass to store the parameters of the model:

In [4]: from numba import float64, int64

        wf_data = [('a0', float64),          # Parameters of beta distributions
                   ('b0', float64),
                   ('a1', float64),
                   ('b1', float64),
                   ('c', float64),           # Cost of another draw
                   ('π_grid_size', int64),
                   ('L0', float64),          # Cost of selecting f0 when f1 is true
                   ('L1', float64),          # Cost of selecting f1 when f0 is true
                   ('π_grid', float64[:]),
                   ('mc_size', int64),
                   ('z0', float64[:]),       # Monte Carlo draws from f0
                   ('z1', float64[:])]       # Monte Carlo draws from f1

In [5]: from numba.experimental import jitclass

        @jitclass(wf_data)
        class WaldFriedman:

            def __init__(self,
                         c=1.25,
                         a0=1,
                         b0=1,
                         a1=3,
                         b1=1.2,
                         L0=25,
                         L1=25,
                         π_grid_size=200,
                         mc_size=1000):

                self.a0, self.b0 = a0, b0
                self.a1, self.b1 = a1, b1
                self.c, self.π_grid_size = c, π_grid_size
                self.L0, self.L1 = L0, L1
                self.π_grid = np.linspace(0, 1, π_grid_size)
                self.mc_size = mc_size

                # Fixed Monte Carlo draws used to approximate the integral in Q
                self.z0 = np.random.beta(a0, b0, mc_size)
                self.z1 = np.random.beta(a1, b1, mc_size)

            def f0(self, x):
                return p(x, self.a0, self.b0)

            def f1(self, x):
                return p(x, self.a1, self.b1)

            def f0_rvs(self):
                return np.random.beta(self.a0, self.b0)

            def f1_rvs(self):
                return np.random.beta(self.a1, self.b1)

            def κ(self, z, π):
                """
                Updates π using Bayes' rule and the current observation z
                """
                π_f0, π_f1 = π * self.f0(z), (1 - π) * self.f1(z)
                π_new = π_f0 / (π_f0 + π_f1)
                return π_new


As in the optimal growth lecture, to approximate a continuous value function

• we iterate at a finite grid of possible values of 𝜋;
• when we evaluate 𝔼[𝐽(𝜋′)] between grid points, we use linear interpolation.

We define the operator function Q below:

In [6]: from numba import jit, prange
        from interpolation import interp

        @jit(nopython=True, parallel=True)
        def Q(h, wf):

            c, π_grid = wf.c, wf.π_grid
            L0, L1 = wf.L0, wf.L1
            z0, z1 = wf.z0, wf.z1
            mc_size = wf.mc_size

            κ = wf.κ

            h_new = np.empty_like(π_grid)
            h_func = lambda p: interp(π_grid, h, p)

            for i in prange(len(π_grid)):
                π = π_grid[i]

                # Find the expected value of J by integrating over z
                integral_f0, integral_f1 = 0.0, 0.0
                for m in range(mc_size):
                    π_0 = κ(z0[m], π)  # Draw z from f0 and update π
                    integral_f0 += min((1 - π_0) * L0, π_0 * L1, h_func(π_0))

                    π_1 = κ(z1[m], π)  # Draw z from f1 and update π
                    integral_f1 += min((1 - π_1) * L0, π_1 * L1, h_func(π_1))

                integral = (π * integral_f0 + (1 - π) * integral_f1) / mc_size

                h_new[i] = c + integral

            return h_new

To solve the model, we will iterate using Q to find the fixed point:

In [7]: @jit(nopython=True)
        def solve_model(wf, tol=1e-4, max_iter=1000):
            """
            Compute the continuation value function

            * wf is an instance of WaldFriedman
            """

            # Set up loop
            h = np.zeros(len(wf.π_grid))
            i = 0
            error = tol + 1

            while i < max_iter and error > tol:
                h_new = Q(h, wf)
                error = np.max(np.abs(h - h_new))
                i += 1
                h = h_new

            if i == max_iter:
                print("Failed to converge!")

            return h_new

36.6 Analysis

Let’s inspect the model’s solutions.
We will use the default parameterization, with the following distributions:

In [8]: wf = WaldFriedman()

        fig, ax = plt.subplots(figsize=(10, 6))

        # Plot the densities against z (the grid on [0, 1]), not against the index
        ax.plot(wf.π_grid, wf.f0(wf.π_grid), label="$f_0$")
        ax.plot(wf.π_grid, wf.f1(wf.π_grid), label="$f_1$")
        ax.set(ylabel="density of $z_k$", xlabel="$z_k$", title="Distributions")
        ax.legend()

        plt.show()

36.6.1 Value Function

To solve the model, we will call our solve_model function:

In [9]: h_star = solve_model(wf) # Solve the model


We will also set up a function to compute the cutoffs 𝛼 and 𝛽, and plot these cutoffs on the value function:

In [10]: @jit(nopython=True)
         def find_cutoff_rule(wf, h):

             """
             This function takes a continuation value function and returns the
             corresponding cutoffs of where you transition between continuing and
             choosing a specific model
             """

             π_grid = wf.π_grid
             L0, L1 = wf.L0, wf.L1

             # Evaluate cost at all points on grid for choosing a model
             payoff_f0 = (1 - π_grid) * L0
             payoff_f1 = π_grid * L1

             # The cutoff points can be found by differencing each stopping cost
             # with the minimum of the other options in the Bellman equation
             β = π_grid[np.searchsorted(
                                   payoff_f1 - np.minimum(h, payoff_f0),
                                   1e-10)
                        - 1]
             α = π_grid[np.searchsorted(
                                   np.minimum(h, payoff_f1) - payoff_f0,
                                   1e-10)
                        - 1]

             return (β, α)

         β, α = find_cutoff_rule(wf, h_star)
         cost_L0 = (1 - wf.π_grid) * wf.L0
         cost_L1 = wf.π_grid * wf.L1

         fig, ax = plt.subplots(figsize=(10, 6))

         ax.plot(wf.π_grid, h_star, label='continuation value')
         ax.plot(wf.π_grid, cost_L1, label='choose f1')
         ax.plot(wf.π_grid, cost_L0, label='choose f0')
         ax.plot(wf.π_grid,
                 np.amin(np.column_stack([h_star, cost_L0, cost_L1]), axis=1),
                 lw=15, alpha=0.1, color='b', label='minimum cost')

         ax.annotate(r"$\beta$", xy=(β + 0.01, 0.5), fontsize=14)
         ax.annotate(r"$\alpha$", xy=(α + 0.01, 0.5), fontsize=14)

         # Dashed lines from the axis up to the value function at each cutoff:
         # J(β) = β L1 and J(α) = (1 - α) L0
         plt.vlines(β, 0, β * wf.L1, linestyle="--")
         plt.vlines(α, 0, (1 - α) * wf.L0, linestyle="--")

         ax.set(xlim=(0, 1), ylim=(0, 0.5 * max(wf.L0, wf.L1)), ylabel="cost",
                xlabel=r"$\pi$", title="Value function")

         plt.legend(borderpad=1.1)
         plt.show()

The value function equals 𝜋𝐿1 for 𝜋 ≤ 𝛽, and (1 − 𝜋)𝐿0 for 𝜋 ≥ 𝛼.
The slopes of the two linear pieces of the value function are determined by 𝐿1 and −𝐿0.
The value function is smooth in the interior region, where the posterior probability assigned to 𝑓0 is in the indecisive region 𝜋 ∈ (𝛽, 𝛼).
The decision-maker continues to sample until the probability that he attaches to model 𝑓0 falls below 𝛽 or rises above 𝛼.
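The cutoffs can also be read off directly from the three cost curves with boolean masks, which some readers may find easier to follow than the `searchsorted` calls in `find_cutoff_rule`. A sketch: the function `cutoffs` is hypothetical, it assumes both stopping regions are non-empty on the grid, and the flat continuation value `h_toy` is made up for illustration; it is not the model's solution.

```python
import numpy as np

def cutoffs(grid, h, L0, L1):
    """Return (β, α) from a continuation value h tabulated on grid.

    β is the largest belief at which accepting f1 is strictly optimal;
    α is the smallest belief at which accepting f0 is strictly optimal.
    """
    cost_f0 = (1 - grid) * L0          # expected loss of accepting f0
    cost_f1 = grid * L1                # expected loss of accepting f1
    choose_f1 = cost_f1 < np.minimum(cost_f0, h)
    choose_f0 = cost_f0 < np.minimum(cost_f1, h)
    return grid[choose_f1].max(), grid[choose_f0].min()

grid = np.linspace(0, 1, 101)
h_toy = np.full_like(grid, 8.1)        # hypothetical flat continuation value
b, a = cutoffs(grid, h_toy, L0=25, L1=25)
print(b, a)
```

With a flat ℎ and equal losses the two cutoffs are symmetric around 0.5; the model's ℎ is not flat, so its cutoffs need not be.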

36.6.2 Simulations

The next figure shows the outcomes of 500 simulations of the decision process.
On the left is a histogram of the stopping times, which equal the number of draws of 𝑧𝑘 re-
quired to make a decision.
The average number of draws is around 6.6.
On the right is the fraction of correct decisions at the stopping time.
In this case, the decision-maker is correct 80% of the time.

In [11]: def simulate(wf, true_dist, h_star, π_0=0.5):

             """
             This function takes an initial condition and simulates until it
             stops (when a decision is made)
             """

             κ = wf.κ

             # Draws come from the distribution that is actually true
             if true_dist == "f0":
                 f_rvs = wf.f0_rvs
             elif true_dist == "f1":
                 f_rvs = wf.f1_rvs
             else:
                 raise ValueError("true_dist must be 'f0' or 'f1'")

             # Find cutoffs
             β, α = find_cutoff_rule(wf, h_star)

             # Initialize a couple of useful variables
             decision_made = False
             π = π_0
             t = 0

             while not decision_made:
                 z = f_rvs()
                 t = t + 1
                 π = κ(z, π)
                 if π < β:
                     decision_made = True
                     decision = 1    # choose f1
                 elif π > α:
                     decision_made = True
                     decision = 0    # choose f0

             # The decision is correct if it names the true distribution
             if true_dist == "f0":
                 correct = (decision == 0)
             else:
                 correct = (decision == 1)

             return correct, π, t

         def stopping_dist(wf, h_star, ndraws=250, true_dist="f0"):

             """
             Simulates repeatedly to get distributions of time needed to make a
             decision and how often they are correct
             """

             tdist = np.empty(ndraws, int)
             cdist = np.empty(ndraws, bool)

             for i in range(ndraws):
                 correct, π, t = simulate(wf, true_dist, h_star)
                 tdist[i] = t
                 cdist[i] = correct

             return cdist, tdist

         def simulation_plot(wf):
             h_star = solve_model(wf)
             ndraws = 500
             cdist, tdist = stopping_dist(wf, h_star, ndraws)

             fig, ax = plt.subplots(1, 2, figsize=(16, 5))

             ax[0].hist(tdist, bins=np.max(tdist))
             ax[0].set_title(f"Stopping times over {ndraws} replications")
             ax[0].set(xlabel="time", ylabel="number of stops")
             ax[0].annotate(f"mean = {np.mean(tdist):.2f}", xy=(max(tdist) / 2,
                            max(np.histogram(tdist, bins=max(tdist))[0]) / 2))

             ax[1].hist(cdist.astype(int), bins=2)
             ax[1].set_title(f"Correct decisions over {ndraws} replications")
             ax[1].annotate(f"fraction correct = {np.mean(cdist):.2f}",
                            xy=(0.05, ndraws / 2))

             plt.show()

         simulation_plot(wf)

36.6.3 Comparative Statics

Now let’s consider the following exercise.


We double the cost of drawing an additional observation.
Before you look, think about what will happen:
• Will the decision-maker be correct more or less often?
• Will he make decisions sooner or later?

In [12]: wf = WaldFriedman(c=2.5)
         simulation_plot(wf)

Increased cost per draw has induced the decision-maker to take fewer draws before deciding.
Because he decides with less information, the percentage of time he is correct drops.
This leads to a higher expected loss when he puts equal weight on both models.

36.6.4 A Notebook Implementation

To facilitate comparative statics, we provide a Jupyter notebook that generates the same
plots, but with sliders.
With these sliders, you can adjust parameters and immediately observe
• effects on the smoothness of the value function in the indecisive middle range as we in-
crease the number of grid points in the piecewise linear approximation.
• effects of different settings for the cost parameters 𝐿0 , 𝐿1 , 𝑐, the parameters of two beta
distributions 𝑓0 and 𝑓1 , and the number of points and linear functions 𝑚 to use in the
piece-wise continuous approximation to the value function.
• various simulations from 𝑓0 and associated distributions of waiting times to making a
decision.
• associated histograms of correct and incorrect decisions.

36.7 Comparison with Neyman-Pearson Formulation

For several reasons, it is useful to describe the theory underlying the test that Navy Captain
G. S. Schuyler had been told to use and that led him to approach Milton Friedman and Allan
Wallis to convey his conjecture that superior practical procedures existed.
Evidently, the Navy had told Captain Schuyler to use what it knew to be a state-of-the-art Neyman-Pearson test.
We’ll rely on Abraham Wald’s [109] elegant summary of Neyman-Pearson theory.
For our purposes, watch for these features of the setup:
• the assumption of a fixed sample size 𝑛
• the application of laws of large numbers, conditioned on alternative probability models,
to interpret the probabilities 𝛼 and 𝛽 defined in the Neyman-Pearson theory
Recall that in the sequential analytic formulation above:
• The sample size 𝑛 is not fixed but rather an object to be chosen; technically 𝑛 is a ran-
dom variable.
• The parameters 𝛽 and 𝛼 characterize cut-off rules used to determine 𝑛 as a random
variable.
• Laws of large numbers make no appearances in the sequential construction.
In chapter 1 of Sequential Analysis [109] Abraham Wald summarizes the Neyman-Pearson
approach to hypothesis testing.
Wald frames the problem as making a decision about a probability distribution that is par-
tially known.
(You have to assume that something is already known in order to state a well-posed problem; usually, something means a lot.)

By limiting what is unknown, Wald uses the following simple structure to illustrate the main
ideas:
• A decision-maker wants to decide which of two distributions 𝑓0, 𝑓1 governs an IID random variable 𝑧.
• The null hypothesis 𝐻0 is the statement that 𝑓0 governs the data.
• The alternative hypothesis 𝐻1 is the statement that 𝑓1 governs the data.
• The problem is to devise and analyze a test of hypothesis 𝐻0 against the alternative hypothesis 𝐻1 on the basis of a sample of a fixed number 𝑛 of independent observations 𝑧1, 𝑧2, …, 𝑧𝑛 of the random variable 𝑧.
To quote Abraham Wald,

A test procedure leading to the acceptance or rejection of the [null] hypothesis in


question is simply a rule specifying, for each possible sample of size 𝑛, whether the
[null] hypothesis should be accepted or rejected on the basis of the sample. This
may also be expressed as follows: A test procedure is simply a subdivision of the
totality of all possible samples of size 𝑛 into two mutually exclusive parts, say part
1 and part 2, together with the application of the rule that the [null] hypothesis
be accepted if the observed sample is contained in part 2. Part 1 is also called the
critical region. Since part 2 is the totality of all samples of size 𝑛 which are not
included in part 1, part 2 is uniquely determined by part 1. Thus, choosing a test
procedure is equivalent to determining a critical region.

Let’s listen to Wald at greater length:

As a basis for choosing among critical regions the following considerations have
been advanced by Neyman and Pearson: In accepting or rejecting 𝐻0 we may
commit errors of two kinds. We commit an error of the first kind if we reject 𝐻0
when it is true; we commit an error of the second kind if we accept 𝐻0 when 𝐻1
is true. After a particular critical region 𝑊 has been chosen, the probability of
committing an error of the first kind, as well as the probability of committing an
error of the second kind is uniquely determined. The probability of committing an
error of the first kind is equal to the probability, determined by the assumption
that 𝐻0 is true, that the observed sample will be included in the critical region 𝑊 .
The probability of committing an error of the second kind is equal to the probability, determined on the assumption that 𝐻1 is true, that the sample will fall
outside the critical region 𝑊. For any given critical region 𝑊 we shall denote the
probability of an error of the first kind by 𝛼 and the probability of an error of the
second kind by 𝛽.

Let’s listen carefully to how Wald applies the law of large numbers to interpret 𝛼 and 𝛽:

The probabilities 𝛼 and 𝛽 have the following important practical interpretation:


Suppose that we draw a large number of samples of size 𝑛. Let 𝑀 be the num-
ber of such samples drawn. Suppose that for each of these 𝑀 samples we reject
𝐻0 if the sample is included in 𝑊 and accept 𝐻0 if the sample lies outside 𝑊 . In
this way we make 𝑀 statements of rejection or acceptance. Some of these state-
ments will in general be wrong. If 𝐻0 is true and if 𝑀 is large, the probability is
nearly 1 (i.e., it is practically certain) that the proportion of wrong statements
(i.e., the number of wrong statements divided by 𝑀 ) will be approximately 𝛼. If
𝐻1 is true, the probability is nearly 1 that the proportion of wrong statements will
be approximately 𝛽. Thus, we can say that in the long run [ here Wald applies
law of large numbers by driving 𝑀 → ∞ (our comment, not Wald’s) ] the propor-
tion of wrong statements will be 𝛼 if 𝐻0 is true and 𝛽 if 𝐻1 is true.

The quantity 𝛼 is called the size of the critical region, and the quantity 1 − 𝛽 is called the
power of the critical region.
Wald notes that

one critical region 𝑊 is more desirable than another if it has smaller values of 𝛼
and 𝛽. Although either 𝛼 or 𝛽 can be made arbitrarily small by a proper choice of
the critical region 𝑊, it is not possible to make both 𝛼 and 𝛽 arbitrarily small for a
fixed value of 𝑛, i.e., a fixed sample size.

Wald summarizes Neyman and Pearson’s setup as follows:

Neyman and Pearson show that a region consisting of all samples (𝑧1 , 𝑧2 , … , 𝑧𝑛 )
which satisfy the inequality

$$\frac{f_1(z_1) \cdots f_1(z_n)}{f_0(z_1) \cdots f_0(z_n)} \geq k$$

is a most powerful critical region for testing the hypothesis 𝐻0 against the alternative hy-
pothesis 𝐻1 . The term 𝑘 on the right side is a constant chosen so that the region will have
the required size 𝛼.
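Wald's law-of-large-numbers reading of 𝛼 and 𝛽 suggests a direct Monte Carlo check: draw many samples of fixed size 𝑛 under each hypothesis and record the share of wrong decisions made by the likelihood ratio test. The sketch assumes 𝑓0 = Beta(1, 1) and 𝑓1 = Beta(3, 1.2) (the default densities of the WaldFriedman class), uses `scipy.stats.beta.logpdf`, and the function name `error_rates` is hypothetical. Because the same random draws are reused for every 𝑘, raising 𝑘 shrinks the critical region, so the estimated 𝛼 can only fall while the estimated 𝛽 can only rise, which illustrates the trade-off between the two at a fixed 𝑛.

```python
import numpy as np
from scipy.stats import beta

# Assumed densities: f0 = Beta(1, 1) under H0, f1 = Beta(3, 1.2) under H1
f0, f1 = beta(1, 1), beta(3, 1.2)

def error_rates(n, k, M=20_000, seed=0):
    """Monte Carlo estimates of (α, β) for the fixed-n likelihood ratio test.

    The critical region W is the set of samples with
    f1(z_1)...f1(z_n) / (f0(z_1)...f0(z_n)) >= k; H0 is rejected on W.
    """
    rng = np.random.default_rng(seed)
    z_H0 = rng.beta(1, 1, size=(M, n))       # M samples of size n, H0 true
    z_H1 = rng.beta(3, 1.2, size=(M, n))     # M samples of size n, H1 true

    def log_lr(z):
        return (f1.logpdf(z) - f0.logpdf(z)).sum(axis=1)

    alpha = np.mean(log_lr(z_H0) >= np.log(k))   # wrong rejections when H0 is true
    beta_ = np.mean(log_lr(z_H1) < np.log(k))    # wrong acceptances when H1 is true
    return alpha, beta_

for k in (0.5, 1.0, 2.0, 4.0):
    a_hat, b_hat = error_rates(n=10, k=k)
    print(f"k = {k}: alpha ≈ {a_hat:.3f}, beta ≈ {b_hat:.3f}")
```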
Wald goes on to discuss Neyman and Pearson’s concept of a uniformly most powerful test.
Here is how Wald introduces the notion of a sequential test:

A rule is given for making one of the following three decisions at any stage of the
experiment (at the 𝑚th trial for each integral value of 𝑚): (1) to accept the hypothesis H, (2) to reject the hypothesis H, (3) to continue the experiment by
making an additional observation. Thus, such a test procedure is carried out sequentially. On the basis of the first observation, one of the aforementioned decisions is made. If the first or second decision is made, the process is terminated. If
the third decision is made, a second trial is performed. Again, on the basis of the
first two observations, one of the three decisions is made. If the third decision is
made, a third trial is performed, and so on. The process is continued until either
the first or the second decision is made. The number n of observations required
by such a test procedure is a random variable, since the value of n depends on the
outcome of the observations.
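A minimal sketch of such a sequential procedure is Wald's sequential probability ratio test, which tracks the running likelihood ratio and stops as soon as it crosses one of two boundaries. The sketch assumes 𝑓0 = Beta(1, 1) and 𝑓1 = Beta(3, 1.2) (the default densities of the WaldFriedman class) and uses Wald's well-known approximate boundaries 𝐴 = (1 − 𝛽)/𝛼 and 𝐵 = 𝛽/(1 − 𝛼) for target error probabilities 𝛼 and 𝛽; those boundaries, the targets of 0.05, and the function name `sprt` are assumptions of this illustration, not part of the quoted text.

```python
import numpy as np
from scipy.stats import beta

# Assumed densities: f0 = Beta(1, 1) under H0, f1 = Beta(3, 1.2) under H1
f0, f1 = beta(1, 1), beta(3, 1.2)

def sprt(draw, alpha=0.05, beta_err=0.05, max_n=10_000):
    """Sequential probability ratio test (sketch).

    draw: zero-argument function returning one observation z.
    Returns (decision, number of observations used).
    """
    log_A = np.log((1 - beta_err) / alpha)    # upper boundary: reject H0
    log_B = np.log(beta_err / (1 - alpha))    # lower boundary: accept H0
    log_lr = 0.0
    for m in range(1, max_n + 1):
        z = draw()
        log_lr += f1.logpdf(z) - f0.logpdf(z)
        if log_lr >= log_A:
            return "reject H0", m             # decision (2)
        if log_lr <= log_B:
            return "accept H0", m             # decision (1)
        # otherwise decision (3): take another observation
    return "undecided", max_n

rng = np.random.default_rng(42)
results = [sprt(lambda: rng.beta(1, 1)) for _ in range(300)]   # H0 true
ns = np.array([m for _, m in results])
share_wrong = np.mean([d == "reject H0" for d, _ in results])
print(f"mean n = {ns.mean():.1f}, max n = {ns.max()}, share wrong = {share_wrong:.3f}")
```

The printed spread of `ns` shows Wald's point directly: the number of observations is a random variable rather than a fixed 𝑛.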

Footnotes
[1] The decision maker acts as if he believes that the sequence of random variables [𝑧0, 𝑧1, …] is exchangeable. See Exchangeability and Bayesian Updating and chapter 11 of [67] for discussions of exchangeability.

36.8 Sequels

We’ll dig deeper into some of the ideas used here in the following lectures:
• this lecture discusses the key concept of exchangeability that rationalizes statistical
learning
• this lecture describes likelihood ratio processes and their role in frequentist and
Bayesian statistical theories
• this lecture discusses the role of likelihood ratio processes in Bayesian learning
• this lecture returns to the subject of this lecture and studies whether the Captain’s hunch is right, i.e., whether the (frequentist) decision rule that the Navy had ordered him to use can be expected to be better or worse than the sequential rule that Abraham Wald designed
Chapter 37

Exchangeability and Bayesian Updating

37.1 Contents

• Overview 37.2
• Independently and Identically Distributed 37.3
• A Setting in Which Past Observations Are Informative 37.4
• Relationship Between IID and Exchangeable 37.5
• Exchangeability 37.6
• Bayes’ Law 37.7
• More Details about Bayesian Updating 37.8
• Appendix 37.9
• Sequels 37.10

37.2 Overview

This lecture studies an example of learning via Bayes’ Law.


We touch on foundations of Bayesian statistical inference invented by Bruno DeFinetti [25].
The relevance of DeFinetti’s work for economists is presented forcefully in chapter 11 of [67]
by David Kreps.
The example that we study in this lecture is a key component of this lecture that augments
the classic job search model of McCall [80] by presenting an unemployed worker with a statis-
tical inference problem.
Here we create graphs that illustrate the role that a likelihood ratio plays in Bayes’ Law.
We’ll use such graphs to provide insights into the mechanics driving outcomes in this lecture
about learning in an augmented McCall job search model.
Among other things, this lecture discusses connections between the statistical concepts of se-
quences of random variables that are
• independently and identically distributed
• exchangeable
Understanding the distinction between these concepts is essential for appreciating how Bayesian updating works in our example.


You can read about exchangeability here.
Below, we’ll often use
• 𝑊 to denote a random variable
• 𝑤 to denote a particular realization of a random variable 𝑊
Let’s start with some imports:

In [1]: from numba import njit, vectorize
        from math import gamma
        import scipy.optimize as op
        from scipy.integrate import quad
        import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline

37.3 Independently and Identically Distributed

We begin by looking at the notion of an independently and identically distributed sequence of random variables.
An independently and identically distributed sequence is often abbreviated as IID.
Two notions are involved, independently and identically distributed.
A sequence 𝑊0 , 𝑊1 , … is independently distributed if the joint probability density of the
sequence is the product of the densities of the components of the sequence.
The sequence 𝑊0 , 𝑊1 , … is independently and identically distributed if in addition the
marginal density of 𝑊𝑡 is the same for all 𝑡 = 0, 1, ….
For example, let 𝑝(𝑊0 , 𝑊1 , …) be the joint density of the sequence and let 𝑝(𝑊𝑡 ) be the
marginal density for a particular 𝑊𝑡 for all 𝑡 = 0, 1, ….
Then the sequence 𝑊0, 𝑊1, … is IID if

𝑝(𝑊0 , 𝑊1 , …) = 𝑝(𝑊0 )𝑝(𝑊1 ) ⋯

so that the joint density is the product of a sequence of identical marginal densities.

37.3.1 IID Means Past Observations Don’t Tell Us Anything About Future
Observations

If a sequence of random variables is IID, past information provides no information about future realizations.
In this sense, there is nothing to learn about the future from the past.
To understand these statements, let the joint distribution of a sequence of random variables $\{W_t\}_{t=0}^{T}$ that is not necessarily IID be

$$p(W_T, W_{T-1}, \ldots, W_1, W_0)$$

Using the laws of probability, we can always factor such a joint density into a product of con-
ditional densities:

$$p(W_T, W_{T-1}, \ldots, W_1, W_0) = p(W_T \mid W_{T-1}, \ldots, W_0) \, p(W_{T-1} \mid W_{T-2}, \ldots, W_0) \cdots p(W_1 \mid W_0) \, p(W_0)$$

In general,

𝑝(𝑊𝑡 |𝑊𝑡−1 , … , 𝑊0 ) ≠ 𝑝(𝑊𝑡 )

which states that the conditional density on the left side does not equal the marginal
density on the right side.
In the special IID case,

𝑝(𝑊𝑡 |𝑊𝑡−1 , … , 𝑊0 ) = 𝑝(𝑊𝑡 )

and the partial history 𝑊𝑡−1, …, 𝑊0 contains no information about the probability of 𝑊𝑡.

So in the IID case, there is nothing to learn about the densities of future random variables from past data.
In the general case, there is something to learn from past data.
We turn next to an instance of this general case in which there is something to learn from
past data.
Please keep your eye out for what there is to learn from past data.

37.4 A Setting in Which Past Observations Are Informative

We now turn to a setting in which there is something to learn.

Let $\{W_t\}_{t=0}^{\infty}$ be a sequence of nonnegative scalar random variables with a joint probability distribution constructed as follows.
There are two distinct cumulative distribution functions 𝐹 and 𝐺 — with densities 𝑓 and 𝑔
for a nonnegative scalar random variable 𝑊 .
Before the start of time, say at time 𝑡 = −1, “nature” once and for all selects either 𝑓 or 𝑔
— and thereafter at each time 𝑡 ≥ 0 draws a random 𝑊 from the selected distribution.
So the data are permanently generated as independently and identically distributed (IID)
draws from either 𝐹 or 𝐺.
We could say that objectively the probability that the data are generated as draws from 𝐹 is
either 0 or 1.
We now drop into this setting a decision maker who knows 𝐹 and 𝐺 and that nature picked
one of them once and for all and then drew an IID sequence of draws from that distribution.
But our decision maker does not know which of the two distributions nature selected.
The decision maker summarizes his ignorance about this by picking a subjective probability 𝜋̃ and reasons as if nature had selected 𝐹 with probability 𝜋̃ ∈ (0, 1) and 𝐺 with probability 1 − 𝜋̃.

Thus, we assume that the decision maker

• knows both 𝐹 and 𝐺
• doesn’t know which of these two distributions nature has drawn
• summarizes his ignorance by acting as if or thinking that nature chose distribution 𝐹 with probability 𝜋̃ ∈ (0, 1) and distribution 𝐺 with probability 1 − 𝜋̃
• at date 𝑡 ≥ 0 has observed the partial history 𝑤𝑡, 𝑤𝑡−1, …, 𝑤0 of draws from the appropriate joint density of the partial history
But what do we mean by the appropriate joint distribution?
We’ll discuss that next and in the process describe the concept of exchangeability.

37.5 Relationship Between IID and Exchangeable

Conditional on nature selecting 𝐹 , the joint density of the sequence 𝑊0 , 𝑊1 , … is

𝑓(𝑊0 )𝑓(𝑊1 ) ⋯

Conditional on nature selecting 𝐺, the joint density of the sequence 𝑊0 , 𝑊1 , … is

𝑔(𝑊0 )𝑔(𝑊1 ) ⋯

Notice that conditional on nature having selected 𝐹, the sequence 𝑊0, 𝑊1, … is independently and identically distributed.
Furthermore, conditional on nature having selected 𝐺, the sequence 𝑊0, 𝑊1, … is also independently and identically distributed.
But what about the unconditional distribution?
The unconditional distribution of 𝑊0 , 𝑊1 , … is evidently

$$h(W_0, W_1, \ldots) \equiv \tilde{\pi} [f(W_0) f(W_1) \cdots] + (1 - \tilde{\pi}) [g(W_0) g(W_1) \cdots] \tag{1}$$

Under the unconditional distribution ℎ(𝑊0 , 𝑊1 , …), the sequence 𝑊0 , 𝑊1 , … is not indepen-
dently and identically distributed.
To verify this claim, it is sufficient to notice, for example, that

$$h(w_0, w_1) = \tilde{\pi} f(w_0) f(w_1) + (1 - \tilde{\pi}) g(w_0) g(w_1) \neq (\tilde{\pi} f(w_0) + (1 - \tilde{\pi}) g(w_0))(\tilde{\pi} f(w_1) + (1 - \tilde{\pi}) g(w_1))$$

Thus, the conditional distribution

$$h(w_1 \mid w_0) \equiv \frac{h(w_0, w_1)}{\tilde{\pi} f(w_0) + (1 - \tilde{\pi}) g(w_0)} \neq \tilde{\pi} f(w_1) + (1 - \tilde{\pi}) g(w_1)$$

This means that the realization 𝑤0 contains information about 𝑤1 .


So there is something to learn.
But what and how?
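A short algebraic step shows exactly how the factorization fails. Writing 𝑚(𝑤) = 𝜋̃𝑓(𝑤) + (1 − 𝜋̃)𝑔(𝑤) for the marginal density of a single draw (𝑚 is notation introduced here), expanding both sides gives ℎ(𝑤0, 𝑤1) − 𝑚(𝑤0)𝑚(𝑤1) = 𝜋̃(1 − 𝜋̃)(𝑓(𝑤0) − 𝑔(𝑤0))(𝑓(𝑤1) − 𝑔(𝑤1)), which is nonzero whenever 𝑓 and 𝑔 differ at both points. The snippet checks this numerically; 𝑓 = Beta(1, 1), 𝑔 = Beta(3, 1.2), 𝜋̃ = 0.5 and the evaluation points are assumptions chosen only for illustration.

```python
from scipy.stats import beta

# Assumed for illustration: f = Beta(1, 1), g = Beta(3, 1.2), π̃ = 0.5
f, g = beta(1, 1).pdf, beta(3, 1.2).pdf
pi_tilde = 0.5

def h_joint(w0, w1):
    """Joint density of (W0, W1) implied by the mixture (1)."""
    return pi_tilde * f(w0) * f(w1) + (1 - pi_tilde) * g(w0) * g(w1)

def m(w):
    """Marginal density of a single draw W_t."""
    return pi_tilde * f(w) + (1 - pi_tilde) * g(w)

w0, w1 = 0.2, 0.9
gap = h_joint(w0, w1) - m(w0) * m(w1)
identity = pi_tilde * (1 - pi_tilde) * (f(w0) - g(w0)) * (f(w1) - g(w1))
print(gap, identity)                   # the two numbers agree and are nonzero
print(h_joint(w0, w1) / m(w0), m(w1))  # h(w1 | w0) versus the marginal m(w1)
```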

37.6 Exchangeability

While the sequence 𝑊0 , 𝑊1 , … is not IID, it can be verified that it is exchangeable, which
means that

ℎ(𝑤0 , 𝑤1 ) = ℎ(𝑤1 , 𝑤0 )

and so on.
More generally, a sequence of random variables is said to be exchangeable if the joint prob-
ability distribution for the sequence does not change when the positions in the sequence in
which finitely many of the random variables appear are altered.
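Exchangeability of (1) is easy to check numerically for a short history: evaluating the joint density at every reordering of the same three observations gives the same number, up to rounding. The densities, 𝜋̃ and the history below are assumptions chosen for illustration, and `h_joint` is a hypothetical helper name.

```python
from itertools import permutations
import numpy as np
from scipy.stats import beta

# Assumed for illustration: f = Beta(1, 1), g = Beta(3, 1.2), π̃ = 0.5
f, g = beta(1, 1).pdf, beta(3, 1.2).pdf
pi_tilde = 0.5

def h_joint(ws):
    """Joint density (1) evaluated at a finite history ws = (w_0, ..., w_{n-1})."""
    ws = np.asarray(ws)
    return pi_tilde * np.prod(f(ws)) + (1 - pi_tilde) * np.prod(g(ws))

history = (0.2, 0.5, 0.9)
values = [h_joint(p) for p in permutations(history)]
print(values)   # six equal values, up to floating-point rounding
```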
Equation (1) represents our instance of an exchangeable joint density over a sequence of ran-
dom variables as a mixture of two IID joint densities over a sequence of random variables.
For a Bayesian statistician, the mixing parameter 𝜋̃ ∈ (0, 1) has a special interpretation as a
prior probability that nature selected probability distribution 𝐹 .
DeFinetti [25] established a related representation of an exchangeable process created by mixing sequences of IID Bernoulli random variables with parameter 𝜃 and mixing probability 𝜋(𝜃), for a density 𝜋(𝜃) that a Bayesian statistician would interpret as a prior over the unknown Bernoulli parameter 𝜃.

37.7 Bayes’ Law

We noted above that in our example model there is something to learn about the future from past data drawn from our particular instance of a process that is exchangeable but not IID.
But how can we learn?
And about what?
The answer to the about what question is about 𝜋̃.
The answer to the how question is to use Bayes’ Law.
Another way to say use Bayes’ Law is to say compute an appropriate conditional distribution.
Let’s dive into Bayes’ Law in this context.
Let 𝑞 represent the distribution from which nature actually draws 𝑤, and let

𝜋 = ℙ{𝑞 = 𝑓}

where we regard 𝜋 as the decision maker’s subjective probability (also called a personal probability).
Suppose that at 𝑡 ≥ 0, the decision maker has observed a history $w^t \equiv [w_t, w_{t-1}, \ldots, w_0]$.
We let

$$\pi_t = \mathbb{P}\{q = f \mid w^t\}$$

where we adopt the convention


586 CHAPTER 37. EXCHANGEABILITY AND BAYESIAN UPDATING

$$\pi_{-1} = \tilde{\pi}$$

The distribution of $w_{t+1}$ conditional on $w^t$ is then

$$\pi_t f + (1 - \pi_t) g.$$

Bayes’ rule for updating 𝜋𝑡+1 is

$$\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1 - \pi_t) g(w_{t+1})} \tag{2}$$

The last expression follows from Bayes’ rule, which tells us that

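$$\mathbb{P}\{q = f \mid W = w\} = \frac{\mathbb{P}\{W = w \mid q = f\}\, \mathbb{P}\{q = f\}}{\mathbb{P}\{W = w\}} \quad \text{and} \quad \mathbb{P}\{W = w\} = \sum_{\omega \in \{f, g\}} \mathbb{P}\{W = w \mid q = \omega\}\, \mathbb{P}\{q = \omega\}$$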
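As a sanity check on formula (2), the sketch below applies the one-step update repeatedly to a short, made-up history and compares the result with Bayes’ rule applied to the whole history at once. It assumes the same illustrative densities as the rest of this lecture (𝑓 uniform, 𝑔 Beta(3, 1.2)); the helper names `update` and `batch_posterior` are introduced here for illustration only. Because each step multiplies the prior odds by 𝑓/𝑔 evaluated at the new draw, the final posterior should not depend on the order of the observations, which is consistent with exchangeability.

```python
import numpy as np
from scipy.stats import beta

f = beta(1, 1).pdf    # assumed density f (uniform)
g = beta(3, 1.2).pdf  # assumed density g

def update(pi, w):
    "One step of formula (2): posterior probability that q = f after seeing w."
    num = pi * f(w)
    return num / (num + (1 - pi) * g(w))

def batch_posterior(pi0, history):
    "Bayes' rule applied to the whole history at once, using IID likelihoods."
    like_f = np.prod(f(history))
    like_g = np.prod(g(history))
    return pi0 * like_f / (pi0 * like_f + (1 - pi0) * like_g)

history = np.array([0.3, 0.85, 0.6, 0.95])  # hypothetical draws
pi = 0.5                                    # prior π_{-1}
for w in history:
    pi = update(pi, w)

print(pi, batch_posterior(0.5, history), batch_posterior(0.5, history[::-1]))
```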

37.8 More Details about Bayesian Updating

Let’s stare at and rearrange Bayes’ Law as represented in equation (2) with the aim of understanding how the posterior $\pi_{t+1}$ is influenced by the prior $\pi_t$ and the likelihood ratio

$$l(w) = \frac{f(w)}{g(w)}$$

It is convenient for us to rewrite the updating rule (2) as

$$\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1 - \pi_t) g(w_{t+1})} = \frac{\pi_t \frac{f(w_{t+1})}{g(w_{t+1})}}{\pi_t \frac{f(w_{t+1})}{g(w_{t+1})} + (1 - \pi_t)} = \frac{\pi_t l(w_{t+1})}{\pi_t l(w_{t+1}) + (1 - \pi_t)}$$

This implies that

$$\frac{\pi_{t+1}}{\pi_t} = \frac{l(w_{t+1})}{\pi_t l(w_{t+1}) + (1 - \pi_t)} \begin{cases} > 1 & \text{if } l(w_{t+1}) > 1 \\ \le 1 & \text{if } l(w_{t+1}) \le 1 \end{cases} \tag{3}$$

Notice how the likelihood ratio and the prior interact to determine whether an observation 𝑤𝑡+1 leads the decision maker to increase or decrease the subjective probability he/she attaches to distribution 𝐹.
When the likelihood ratio 𝑙(𝑤𝑡+1) exceeds one, the observation 𝑤𝑡+1 nudges the probability 𝜋 put on distribution 𝐹 upward, and when the likelihood ratio 𝑙(𝑤𝑡+1) is less than one, the observation 𝑤𝑡+1 nudges 𝜋 downward.
Representation (3) is the foundation of the graphs that we’ll use to display the dynamics of $\{\pi_t\}_{t=0}^{\infty}$ that are induced by Bayes’ Law.
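The sign pattern in (3) is easy to check numerically. The sketch below, again assuming 𝑓 uniform and 𝑔 Beta(3, 1.2), evaluates the ratio 𝜋𝑡+1/𝜋𝑡 on a grid of priors 𝜋𝑡 ∈ (0, 1) and draws 𝑤, and confirms that the ratio exceeds one exactly when 𝑙(𝑤) exceeds one. The function name `growth` is hypothetical.

```python
import numpy as np
from scipy.stats import beta

f = beta(1, 1).pdf    # assumed f
g = beta(3, 1.2).pdf  # assumed g

def l(w):
    "Likelihood ratio l(w) = f(w) / g(w)."
    return f(w) / g(w)

def growth(pi, w):
    "π_{t+1} / π_t from representation (3)."
    lw = l(w)
    return lw / (pi * lw + (1 - pi))

pi_grid = np.linspace(0.01, 0.99, 50)
w_grid = np.linspace(0.01, 0.99, 50)
PI, W = np.meshgrid(pi_grid, w_grid)
LW = l(W)
ratio = growth(PI, W)

# Skip points where l(w) is numerically indistinguishable from 1
mask = np.abs(LW - 1) > 1e-8
print(np.all((ratio[mask] > 1) == (LW[mask] > 1)))
```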

We’ll plot 𝑙 (𝑤) as a way to enlighten us about how learning – i.e., Bayesian updating of the
probability 𝜋 that nature has chosen distribution 𝑓 – works.

To create the Python infrastructure to do our work for us, we construct a wrapper function
that displays informative graphs given parameters of 𝑓 and 𝑔.

In [2]: @vectorize
        def p(x, a, b):
            "The general beta distribution function."
            r = gamma(a + b) / (gamma(a) * gamma(b))
            return r * x ** (a-1) * (1 - x) ** (b-1)

        def learning_example(F_a=1, F_b=1, G_a=3, G_b=1.2):
            """
            A wrapper function that displays the updating rule of belief π,
            given the parameters which specify F and G distributions.
            """

            f = njit(lambda x: p(x, F_a, F_b))
            g = njit(lambda x: p(x, G_a, G_b))

            # l(w) = f(w) / g(w)
            l = lambda w: f(w) / g(w)
            # objective function for solving l(w) = 1
            obj = lambda w: l(w) - 1

            x_grid = np.linspace(0, 1, 100)
            π_grid = np.linspace(1e-3, 1-1e-3, 100)

            w_max = 1
            w_grid = np.linspace(1e-12, w_max-1e-12, 100)

            # the mode of beta distribution
            # use this to divide w into two intervals for root finding
            G_mode = (G_a - 1) / (G_a + G_b - 2)
            roots = np.empty(2)
            roots[0] = op.root_scalar(obj, bracket=[1e-10, G_mode]).root
            roots[1] = op.root_scalar(obj, bracket=[G_mode, 1-1e-10]).root

            fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 5))

            ax1.plot(l(w_grid), w_grid, label='$l$', lw=2)
            ax1.vlines(1., 0., 1., linestyle="--")
            ax1.hlines(roots, 0., 2., linestyle="--")
            ax1.set_xlim([0., 2.])
            ax1.legend(loc=4)
            ax1.set(xlabel='$l(w)=f(w)/g(w)$', ylabel='$w$')

            ax2.plot(f(x_grid), x_grid, label='$f$', lw=2)
            ax2.plot(g(x_grid), x_grid, label='$g$', lw=2)
            ax2.vlines(1., 0., 1., linestyle="--")
            ax2.hlines(roots, 0., 2., linestyle="--")
            ax2.legend(loc=4)
            ax2.set(xlabel='$f(w), g(w)$', ylabel='$w$')

            area1 = quad(f, 0, roots[0])[0]
            area2 = quad(g, roots[0], roots[1])[0]
            area3 = quad(f, roots[1], 1)[0]

            ax2.text((f(0) + f(roots[0])) / 4, roots[0] / 2, f"{area1: .3g}")
            ax2.fill_between([0, 1], 0, roots[0], color='blue', alpha=0.15)
            ax2.text(np.mean(g(roots)) / 2, np.mean(roots), f"{area2: .3g}")
            w_roots = np.linspace(roots[0], roots[1], 20)
            ax2.fill_betweenx(w_roots, 0, g(w_roots), color='orange', alpha=0.15)
            ax2.text((f(roots[1]) + f(1)) / 4, (roots[1] + 1) / 2, f"{area3: .3g}")
            ax2.fill_between([0, 1], roots[1], 1, color='blue', alpha=0.15)

            W = np.arange(0.01, 0.99, 0.08)
            Π = np.arange(0.01, 0.99, 0.08)

            ΔW = np.zeros((len(W), len(Π)))
            ΔΠ = np.empty((len(W), len(Π)))
            for i, w in enumerate(W):
                for j, π in enumerate(Π):
                    lw = l(w)
                    ΔΠ[i, j] = π * (lw / (π * lw + 1 - π) - 1)

            q = ax3.quiver(Π, W, ΔΠ, ΔW, scale=2, color='r', alpha=0.8)

            ax3.fill_between(π_grid, 0, roots[0], color='blue', alpha=0.15)
            ax3.fill_between(π_grid, roots[0], roots[1], color='green', alpha=0.15)
            ax3.fill_between(π_grid, roots[1], w_max, color='blue', alpha=0.15)
            ax3.hlines(roots, 0., 1., linestyle="--")
            ax3.set(xlabel=r'$\pi$', ylabel='$w$')
            ax3.grid()

            plt.show()

Now we’ll create a group of graphs designed to illustrate the dynamics induced by Bayes’ Law.
We’ll begin with the default values of various objects, then change them in a subsequent example.

In [3]: learning_example()

Please look at the three graphs above created for an instance in which 𝑓 is a uniform distribution on [0, 1] (i.e., a Beta distribution with parameters 𝐹𝑎 = 1, 𝐹𝑏 = 1), while 𝑔 is a Beta distribution with the default parameter values 𝐺𝑎 = 3, 𝐺𝑏 = 1.2.
The graph on the left plots the likelihood ratio 𝑙(𝑤) on the horizontal axis against 𝑤 on the vertical axis.
The middle graph plots both 𝑓(𝑤) and 𝑔(𝑤) against 𝑤, with the horizontal dashed lines showing values of 𝑤 at which the likelihood ratio equals 1.
The graph on the right side plots arrows to the right that show when Bayes’ Law makes 𝜋 increase and arrows to the left that show when Bayes’ Law makes 𝜋 decrease.
Notice how the length of the arrows, which shows the magnitude of the force from Bayes’ Law impelling 𝜋 to change, depends on both the prior probability 𝜋 on the horizontal axis and the evidence, in the form of the current draw of 𝑤, on the vertical axis.

The fractions in the colored areas of the middle graph are probabilities under 𝐹 and 𝐺, respectively, that realizations of 𝑤 fall into the interval that updates the belief 𝜋 in a correct direction (i.e., toward 0 when 𝐺 is the true distribution, and toward 1 when 𝐹 is the true distribution).
For example, under true distribution 𝐹, 𝜋 will be updated toward 0 if 𝑤 falls into the interval [0.524, 0.999], which occurs with probability of about 1 − 0.524 = 0.476 under 𝐹. But this would occur with probability 0.816 if 𝐺 were the true distribution. The fraction 0.816 in the orange region is the integral of 𝑔(𝑤) over this interval.
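The numbers quoted above can be approximately reproduced outside the plotting function. The sketch below assumes the default parameters (𝐹 = Beta(1, 1), 𝐺 = Beta(3, 1.2)), uses `scipy.optimize.brentq` in place of `root_scalar` to find the two values of 𝑤 at which 𝑙(𝑤) = 1 on either side of the mode of 𝑔, and then uses the Beta CDFs to compute the probability that 𝑤 lands between them under 𝐹 and under 𝐺.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import beta

# Default parameters of learning_example: F = Beta(1, 1), G = Beta(3, 1.2)
F = beta(1, 1)
G = beta(3, 1.2)
G_a, G_b = 3, 1.2

def obj(w):
    "l(w) - 1, whose zeros separate upward from downward updates of π."
    return F.pdf(w) / G.pdf(w) - 1

G_mode = (G_a - 1) / (G_a + G_b - 2)
r0 = brentq(obj, 1e-10, G_mode)
r1 = brentq(obj, G_mode, 1 - 1e-10)

# Probability that w falls in [r0, r1], where l(w) < 1 and π is pushed toward 0
prob_F = F.cdf(r1) - F.cdf(r0)
prob_G = G.cdf(r1) - G.cdf(r0)
print(round(r0, 3), round(r1, 3), round(prob_F, 3), round(prob_G, 3))
```

Because 𝐹 is uniform, `prob_F` is just the length of the interval, which is why it is close to 1 − 0.524.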
Next we use our code to create graphs for another instance of our model.
We keep 𝐹 the same as in the preceding instance, namely a uniform distribution, but now
assume that 𝐺 is a Beta distribution with parameters 𝐺𝑎 = 2, 𝐺𝑏 = 1.6.

In [4]: learning_example(G_a=2, G_b=1.6)

Notice how the likelihood ratio, the middle graph, and the arrows compare with the previous
instance of our example.

37.9 Appendix

37.9.1 Sample Paths of 𝜋𝑡

Now we’ll have some fun by plotting multiple realizations of sample paths of 𝜋𝑡 under two
possible assumptions about nature’s choice of distribution:
• that nature permanently draws from 𝐹
• that nature permanently draws from 𝐺
Outcomes depend on a peculiar property of likelihood ratio processes that is discussed in this lecture.
To do this, we create some Python code.

In [5]: def function_factory(F_a=1, F_b=1, G_a=3, G_b=1.2):

            # define f and g
            f = njit(lambda x: p(x, F_a, F_b))
            g = njit(lambda x: p(x, G_a, G_b))

            @njit
            def update(a, b, π):
                "Update π by drawing from beta distribution with parameters a and b"

                # Draw
                w = np.random.beta(a, b)

                # Update belief
                π = 1 / (1 + ((1 - π) * g(w)) / (π * f(w)))

                return π

            @njit
            def simulate_path(a, b, T=50):
                "Simulates a path of beliefs π with length T"

                π = np.empty(T+1)

                # initial condition
                π[0] = 0.5

                for t in range(1, T+1):
                    π[t] = update(a, b, π[t-1])

                return π

            def simulate(a=1, b=1, T=50, N=200, display=True):
                "Simulates N paths of beliefs π with length T"

                π_paths = np.empty((N, T+1))
                if display:
                    fig = plt.figure()

                for i in range(N):
                    π_paths[i] = simulate_path(a=a, b=b, T=T)
                    if display:
                        plt.plot(range(T+1), π_paths[i], color='b', lw=0.8, alpha=0.5)

                if display:
                    plt.show()

                return π_paths

            return simulate

In [6]: simulate = function_factory()

We begin by generating 𝑁 simulated {𝜋𝑡 } paths with 𝑇 periods when the sequence is truly
IID draws from 𝐹 . We set the initial prior 𝜋−1 = .5.

In [7]: T = 50

In [8]: # when nature selects F
        π_paths_F = simulate(a=1, b=1, T=T, N=1000)

In the above graph we observe that for most paths 𝜋𝑡 → 1. So Bayes’ Law evidently eventually discovers the truth for most of our paths.
Next, we generate paths with 𝑇 periods when the sequence is truly IID draws from 𝐺. Again,
we set the initial prior 𝜋−1 = .5.

In [9]: # when nature selects G
        π_paths_G = simulate(a=3, b=1.2, T=T, N=1000)

In the above graph we observe that now most paths 𝜋𝑡 → 0.



37.9.2 Rates of convergence

We study rates of convergence of 𝜋𝑡 to 1 when nature generates the data as IID draws from 𝐹
and of 𝜋𝑡 to 0 when nature generates the data as IID draws from 𝐺.
We do this by averaging across simulated paths of $\{\pi_t\}_{t=0}^{T}$.
Using 𝑁 simulated 𝜋𝑡 paths, we compute $1 - \frac{1}{N}\sum_{i=1}^{N} \pi_{i,t}$ at each 𝑡 when the data are generated as draws from 𝐹 and compute $\frac{1}{N}\sum_{i=1}^{N} \pi_{i,t}$ when the data are generated as draws from 𝐺.

In [10]: plt.plot(range(T+1), 1 - np.mean(π_paths_F, 0), label='F generates')
         plt.plot(range(T+1), np.mean(π_paths_G, 0), label='G generates')
         plt.legend()
         plt.title("convergence");

From the above graph, rates of convergence appear not to depend on whether 𝐹 or 𝐺 generates the data.
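As a cross-check on the averages plotted above, here is a minimal sketch that simulates the belief paths with plain vectorized NumPy instead of Numba loops. It assumes the default densities (𝐹 = Beta(1, 1), 𝐺 = Beta(3, 1.2)), an initial prior of 0.5, and a fixed random seed; the name `simulate_beliefs` is hypothetical. It updates beliefs in log-odds form, adding log 𝑓(𝑤) − log 𝑔(𝑤) for each draw, which is algebraically equivalent to formula (2) and avoids dividing very small numbers.

```python
import numpy as np
from scipy.stats import beta
from scipy.special import expit

F = beta(1, 1)    # assumed density f
G = beta(3, 1.2)  # assumed density g

def simulate_beliefs(dist, T=50, N=1000, pi0=0.5, seed=0):
    "Return an (N, T+1) array of belief paths π_t when draws come from `dist`."
    rng = np.random.default_rng(seed)
    w = dist.rvs(size=(N, T), random_state=rng)
    # Each draw adds log l(w) = log f(w) - log g(w) to the log odds of q = f
    log_l = F.logpdf(w) - G.logpdf(w)
    log_odds0 = np.log(pi0 / (1 - pi0))
    log_odds = log_odds0 + np.concatenate(
        [np.zeros((N, 1)), np.cumsum(log_l, axis=1)], axis=1)
    return expit(log_odds)  # π = 1 / (1 + exp(-log odds))

paths_F = simulate_beliefs(F)
paths_G = simulate_beliefs(G)
gap_F = 1 - paths_F.mean(axis=0)  # 1 - (1/N) Σ π_{i,t} when F generates data
gap_G = paths_G.mean(axis=0)      # (1/N) Σ π_{i,t} when G generates data
print(gap_F[[0, 10, 50]], gap_G[[0, 10, 50]])
```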

37.9.3 Another Graph of Population Dynamics of 𝜋𝑡

More insights about the dynamics of {𝜋𝑡} can be gleaned by computing the following conditional expectations of $\frac{\pi_{t+1}}{\pi_t}$ as functions of 𝜋𝑡 via integration with respect to the pertinent probability distribution:

$$\begin{aligned}
E\left[\frac{\pi_{t+1}}{\pi_t} \,\middle|\, q = \omega, \pi_t\right] &= E\left[\frac{l(w_{t+1})}{\pi_t l(w_{t+1}) + (1 - \pi_t)} \,\middle|\, q = \omega, \pi_t\right] \\
&= \int_0^1 \frac{l(w_{t+1})}{\pi_t l(w_{t+1}) + (1 - \pi_t)}\, \omega(w_{t+1})\, dw_{t+1}
\end{aligned}$$

where 𝜔 = 𝑓, 𝑔.
The following code approximates the integral above:

In [11]: def expected_ratio(F_a=1, F_b=1, G_a=3, G_b=1.2):

             # define f and g
             f = njit(lambda x: p(x, F_a, F_b))
             g = njit(lambda x: p(x, G_a, G_b))

             l = lambda w: f(w) / g(w)
             integrand_f = lambda w, π: f(w) * l(w) / (π * l(w) + 1 - π)
             integrand_g = lambda w, π: g(w) * l(w) / (π * l(w) + 1 - π)

             π_grid = np.linspace(0.02, 0.98, 100)

             ratio_arr = np.empty(len(π_grid))
             for q, inte in zip(["f", "g"], [integrand_f, integrand_g]):
                 for i, π in enumerate(π_grid):
                     ratio_arr[i] = quad(inte, 0, 1, args=(π,))[0]
                 plt.plot(π_grid, ratio_arr, label=f"{q} generates")

             plt.hlines(1, 0, 1, linestyle="--")
             plt.xlabel("$π_t$")
             plt.ylabel(r"$E[\pi_{t+1}/\pi_t]$")
             plt.legend()

             plt.show()

First, consider the case where 𝐹𝑎 = 𝐹𝑏 = 1 and 𝐺𝑎 = 3, 𝐺𝑏 = 1.2.

In [12]: expected_ratio()

The above graph shows that when 𝐹 generates the data, 𝜋𝑡 on average always heads north, while when 𝐺 generates the data, 𝜋𝑡 heads south.
Next, we’ll look at a degenerate case in which 𝑓 and 𝑔 are identical beta distributions, with 𝐹𝑎 = 𝐺𝑎 = 3, 𝐹𝑏 = 𝐺𝑏 = 1.2.
In a sense, here there is nothing to learn.

In [13]: expected_ratio(F_a=3, F_b=1.2)

The above graph says that 𝜋𝑡 is inert and would remain at its initial value.
Finally, let’s look at a case in which 𝑓 and 𝑔 are neither very different nor identical, in particular one in which 𝐹𝑎 = 2, 𝐹𝑏 = 1 and 𝐺𝑎 = 3, 𝐺𝑏 = 1.2.

In [14]: expected_ratio(F_a=2, F_b=1, G_a=3, G_b=1.2)
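One way to sanity-check the curves produced by `expected_ratio` is an identity that follows from Bayes’ Law: averaging over the decision maker’s own subjective mixture 𝜋𝑡 𝑓 + (1 − 𝜋𝑡) 𝑔, the expected value of 𝜋𝑡+1 equals 𝜋𝑡, so 𝜋𝑡 𝐸𝑓[𝜋𝑡+1/𝜋𝑡] + (1 − 𝜋𝑡) 𝐸𝑔[𝜋𝑡+1/𝜋𝑡] = 1. Hence the curve under 𝑓 lies above 1 exactly where the curve under 𝑔 lies below 1. The sketch below, assuming the default Beta parameters, checks this with `scipy.integrate.quad`; it writes the integrands directly in terms of 𝑓 and 𝑔 (substituting 𝑙 = 𝑓/𝑔) to avoid evaluating 𝑙(𝑤) where 𝑔 vanishes. The name `expected_growth` is hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

f = beta(1, 1).pdf    # assumed f (default F_a = F_b = 1)
g = beta(3, 1.2).pdf  # assumed g (default G_a = 3, G_b = 1.2)

def expected_growth(pi, density):
    "E[π_{t+1}/π_t | q = density, π_t = pi], with l = f/g substituted out."
    integrand = lambda w: density(w) * f(w) / (pi * f(w) + (1 - pi) * g(w))
    return quad(integrand, 0, 1)[0]

for pi in (0.1, 0.5, 0.9):
    Ef = expected_growth(pi, f)
    Eg = expected_growth(pi, g)
    print(pi, round(Ef, 4), round(Eg, 4), round(pi * Ef + (1 - pi) * Eg, 6))
```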



37.10 Sequels

We’ll dig deeper into some of the ideas used here in the following lectures:
• this lecture describes likelihood ratio processes and their role in frequentist and
Bayesian statistical theories
• this lecture returns to the subject of this lecture and studies whether the Captain’s hunch is correct, i.e., whether the (frequentist) decision rule that the Navy had ordered him to use can be expected to be better or worse than the sequential rule that Abraham Wald designed
Chapter 38

Likelihood Ratio Processes and Bayesian Learning

38.1 Contents

• Overview 38.2
• The Setting 38.3
• Likelihood Ratio Process and Bayes’ Law 38.4
• Sequels 38.5

In [1]: import numpy as np
        import matplotlib.pyplot as plt
        from numba import vectorize, njit
        from math import gamma
        %matplotlib inline

38.2 Overview

This lecture describes the role that likelihood ratio processes play in Bayesian learning.
As in this lecture, we’ll use a simple statistical setting from this lecture.
We’ll focus on how a likelihood ratio process and a prior probability determine a posterior
probability.
We’ll derive a convenient recursion for today’s posterior as a function of yesterday’s posterior and today’s multiplicative increment to a likelihood ratio process.
We’ll also present a useful generalization of that formula that represents today’s posterior in terms of an initial prior and today’s realization of the likelihood ratio process.
We’ll study how, at least in our setting, a Bayesian eventually learns the probability distribution that generates the data, an outcome that rests on the asymptotic behavior of likelihood ratio processes studied in this lecture.
This lecture provides technical results that underlie outcomes to be studied in this lecture and this lecture and this lecture.

598 CHAPTER 38. LIKELIHOOD RATIO PROCESSES AND BAYESIAN LEARNING

38.3 The Setting

We begin by reviewing the setting in this lecture, which we adopt here too.
A nonnegative random variable 𝑊 has one of two probability density functions, either 𝑓 or 𝑔.
Before the beginning of time, nature once and for all decides whether she will draw a sequence of IID draws from either 𝑓 or 𝑔.
We will sometimes let 𝑞 be the density that nature chose once and for all, so that 𝑞 is either 𝑓 or 𝑔, permanently.
Nature knows which density she permanently draws from, but we the observers do not.
We do know both 𝑓 and 𝑔, but we don’t know which density nature chose.
But we want to know.
To do that, we use observations.
We observe a sequence $\{w_t\}_{t=1}^{T}$ of 𝑇 IID draws from either 𝑓 or 𝑔.
We want to use these observations to infer whether nature chose 𝑓 or 𝑔.
A likelihood ratio process is a useful tool for this task.
To begin, we define the key component of a likelihood ratio process, namely, the time 𝑡 likelihood ratio as the random variable

$$\ell(w_t) = \frac{f(w_t)}{g(w_t)}, \quad t \ge 1.$$

We assume that 𝑓 and 𝑔 both put positive probabilities on the same intervals of possible realizations of the random variable 𝑊.
That means that under the 𝑔 density, $\ell(w_t) = \frac{f(w_t)}{g(w_t)}$ is a nonnegative random variable with mean 1, because $\int \frac{f(w)}{g(w)}\, g(w)\, dw = \int f(w)\, dw = 1$.
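A minimal numerical check of the mean-one property, assuming the Beta parameters used in the code below (𝐹𝑎 = 𝐹𝑏 = 1, 𝐺𝑎 = 3, 𝐺𝑏 = 1.2) and SciPy’s quadrature routine, is sketched here. Quadrature is used rather than Monte Carlo on purpose: with these parameters ℓ(𝑤) has infinite variance under 𝑔 (the integral of 𝑓²/𝑔 diverges near 𝑤 = 0, where 𝑔(𝑤) behaves like 𝑤²), so sample averages of ℓ converge very slowly.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

f = beta(1, 1).pdf    # f: Beta(F_a=1, F_b=1)
g = beta(3, 1.2).pdf  # g: Beta(G_a=3, G_b=1.2)

def ell(w):
    "Likelihood ratio ℓ(w) = f(w) / g(w)."
    return f(w) / g(w)

# E_g[ℓ(w)] = ∫ ℓ(w) g(w) dw over the common support (0, 1)
mean_under_g = quad(lambda w: ell(w) * g(w), 0, 1)[0]
print(mean_under_g)
```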

A likelihood ratio process for sequence $\{w_t\}_{t=1}^{\infty}$ is defined as

$$L(w^t) = \prod_{i=1}^{t} \ell(w_i),$$

where $w^t = \{w_1, \ldots, w_t\}$ is a history of observations up to and including time 𝑡.


Sometimes for shorthand we’ll write $L_t = L(w^t)$.
Notice that the likelihood ratio process satisfies the recursion or multiplicative decomposition

$$L(w^t) = \ell(w_t)\, L(w^{t-1}).$$
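The multiplicative decomposition is what `np.cumprod` exploits in the simulation code of this lecture. The sketch below uses a made-up sequence of one-period ratios ℓ(𝑤𝑡) and the convention 𝐿(𝑤⁰) = 1 (the empty product); it compares an explicit loop over the recursion with `np.cumprod` and with `exp(cumsum(log ℓ))`. The log version is a common design choice when 𝑇 is long, because products of many ratios can overflow or underflow in floating point.

```python
import numpy as np

# Hypothetical one-period likelihood ratios ℓ(w_1), ..., ℓ(w_5)
ell = np.array([0.8, 1.5, 0.3, 2.0, 1.1])

# Recursion L(w^t) = ℓ(w_t) L(w^{t-1}), starting from L(w^0) = 1
L_loop = np.empty_like(ell)
L_prev = 1.0
for t, ell_t in enumerate(ell):
    L_prev = ell_t * L_prev
    L_loop[t] = L_prev

L_cumprod = np.cumprod(ell)                  # same products, vectorized
L_logspace = np.exp(np.cumsum(np.log(ell)))  # same products, via sums of logs

print(L_loop)
```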

The likelihood ratio and its logarithm are key tools for making inferences using a classic frequentist approach due to Neyman and Pearson [? ].
We’ll again deploy the following Python code from this lecture that evaluates 𝑓 and 𝑔 as two different beta distributions, then computes and simulates an associated likelihood ratio process by generating a sequence 𝑤𝑡 from some probability distribution, for example, a sequence of IID draws from 𝑔.

In [2]: # Parameters in the two beta distributions.
        F_a, F_b = 1, 1
        G_a, G_b = 3, 1.2

        @vectorize
        def p(x, a, b):
            r = gamma(a + b) / (gamma(a) * gamma(b))
            return r * x** (a-1) * (1 - x) ** (b-1)

        # The two density functions.
        f = njit(lambda x: p(x, F_a, F_b))
        g = njit(lambda x: p(x, G_a, G_b))

In [3]: @njit
        def simulate(a, b, T=50, N=500):
            '''
            Generate N sets of T observations of the likelihood ratio,
            return as N x T matrix.
            '''

            l_arr = np.empty((N, T))

            for i in range(N):
                for j in range(T):
                    w = np.random.beta(a, b)
                    l_arr[i, j] = f(w) / g(w)

            return l_arr

We’ll also use the following Python code to prepare some informative simulations.

In [4]: l_arr_g = simulate(G_a, G_b, N=50000)
        l_seq_g = np.cumprod(l_arr_g, axis=1)

In [5]: l_arr_f = simulate(F_a, F_b, N=50000)
        l_seq_f = np.cumprod(l_arr_f, axis=1)

38.4 Likelihood Ratio Process and Bayes’ Law

Let 𝜋𝑡 be a Bayesian posterior defined as

$$\pi_t = \text{Prob}(q = f \mid w^t)$$

The likelihood ratio process is a principal actor in the formula that governs the evolution of the posterior probability 𝜋𝑡, an instance of Bayes’ Law.
Bayes’ Law implies that {𝜋𝑡} obeys the recursion

$$\pi_t = \frac{\pi_{t-1}\, \ell(w_t)}{\pi_{t-1}\, \ell(w_t) + 1 - \pi_{t-1}} \tag{1}$$

with 𝜋0 being a Bayesian prior probability that 𝑞 = 𝑓, i.e., a personal or subjective belief
about 𝑞 based on our having seen no data.

Below we define a Python function that updates belief 𝜋 using likelihood ratio ℓ according to recursion (1).

In [6]: @njit
        def update(π, l):
            "Update π using likelihood l"

            # Update belief
            π = π * l / (π * l + 1 - π)

            return π

Formula (1) can be generalized by iterating on it and thereby deriving an expression for the time 𝑡 + 1 posterior 𝜋𝑡+1 as a function of the time 0 prior 𝜋0 and the likelihood ratio process $L(w^{t+1})$ at time 𝑡 + 1.
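To begin, notice that the updating rule

$$\pi_{t+1} = \frac{\pi_t\, \ell(w_{t+1})}{\pi_t\, \ell(w_{t+1}) + (1 - \pi_t)}$$

implies

$$\begin{aligned}
\frac{1}{\pi_{t+1}} &= \frac{\pi_t\, \ell(w_{t+1}) + (1 - \pi_t)}{\pi_t\, \ell(w_{t+1})} \\
&= 1 - \frac{1}{\ell(w_{t+1})} + \frac{1}{\ell(w_{t+1})} \frac{1}{\pi_t}
\end{aligned}$$

$$\Rightarrow \quad \frac{1}{\pi_{t+1}} - 1 = \frac{1}{\ell(w_{t+1})} \left(\frac{1}{\pi_t} - 1\right).$$

Therefore

$$\frac{1}{\pi_{t+1}} - 1 = \frac{1}{\prod_{i=1}^{t+1} \ell(w_i)} \left(\frac{1}{\pi_0} - 1\right) = \frac{1}{L(w^{t+1})} \left(\frac{1}{\pi_0} - 1\right).$$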

Since $\pi_0 \in (0, 1)$ and $L(w^{t+1}) > 0$, we can verify that $\pi_{t+1} \in (0, 1)$.
After rearranging the preceding equation, we can express 𝜋𝑡+1 as a function of $L(w^{t+1})$, the likelihood ratio process at 𝑡 + 1, and the initial prior 𝜋0:

$$\pi_{t+1} = \frac{\pi_0\, L(w^{t+1})}{\pi_0\, L(w^{t+1}) + 1 - \pi_0}. \tag{2}$$

Formula (2) generalizes formula (1).
Formula (2) can be regarded as a one-step revision of prior probability 𝜋0 after seeing the batch of data $\{w_i\}_{i=1}^{t+1}$.
Formula (2) shows the key role that the likelihood ratio process $L(w^{t+1})$ plays in determining the posterior probability 𝜋𝑡+1.
Formula (2) is the foundation for the insight that, because of how the likelihood ratio process behaves as 𝑡 → +∞, the likelihood ratio process dominates the initial prior 𝜋0 in determining the limiting behavior of 𝜋𝑡.

To illustrate this insight, below we will plot graphs showing one simulated path of the likelihood ratio process 𝐿𝑡 along with two paths of 𝜋𝑡 that are associated with the same realization of the likelihood ratio process but different initial prior probabilities 𝜋0.
First, we tell Python two values of 𝜋0 .

In [7]: π1, π2 = 0.2, 0.8

Next we generate paths of the likelihood ratio process 𝐿𝑡 and the posterior 𝜋𝑡 for a history of
IID draws from density 𝑓.

In [8]: T = l_arr_f.shape[1]
        π_seq_f = np.empty((2, T+1))
        π_seq_f[:, 0] = π1, π2

        for t in range(T):
            for i in range(2):
                π_seq_f[i, t+1] = update(π_seq_f[i, t], l_arr_f[0, t])

In [9]: fig, ax1 = plt.subplots()

        for i in range(2):
            ax1.plot(range(T+1), π_seq_f[i, :], label=rf"$\pi_0$={π_seq_f[i, 0]}")

        ax1.set_ylabel(r"$\pi_t$")
        ax1.set_xlabel("t")
        ax1.legend()
        ax1.set_title("when f governs data")

        ax2 = ax1.twinx()
        ax2.plot(range(1, T+1), np.log(l_seq_f[0, :]), '--', color='b')
        ax2.set_ylabel(r"$\log(L(w^{t}))$")

        plt.show()

The dashed line in the graph above records the logarithm of the likelihood ratio process, $\log L(w^t)$.
Please note that there are two different scales on the 𝑦 axis.
Now let’s study what happens when the history consists of IID draws from density 𝑔.

In [10]: T = l_arr_g.shape[1]
         π_seq_g = np.empty((2, T+1))
         π_seq_g[:, 0] = π1, π2

         for t in range(T):
             for i in range(2):
                 π_seq_g[i, t+1] = update(π_seq_g[i, t], l_arr_g[0, t])

In [11]: fig, ax1 = plt.subplots()

         for i in range(2):
             ax1.plot(range(T+1), π_seq_g[i, :], label=rf"$\pi_0$={π_seq_g[i, 0]}")

         ax1.set_ylabel(r"$\pi_t$")
         ax1.set_xlabel("t")
         ax1.legend()
         ax1.set_title("when g governs data")

         ax2 = ax1.twinx()
         ax2.plot(range(1, T+1), np.log(l_seq_g[0, :]), '--', color='b')
         ax2.set_ylabel(r"$\log(L(w^{t}))$")

         plt.show()

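Below we offer Python code that verifies that formula (2), applied to the likelihood ratio process, reproduces the posteriors 𝜋𝑡 obtained above by iterating on recursion (1) when nature permanently draws from density 𝑓.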

In [12]: π_seq = np.empty((2, T+1))
         π_seq[:, 0] = π1, π2

         for i in range(2):
             πL = π_seq[i, 0] * l_seq_f[0, :]
             π_seq[i, 1:] = πL / (πL + 1 - π_seq[i, 0])

In [13]: np.abs(π_seq - π_seq_f).max() < 1e-10

Out[13]: True
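Formula (2) can also be read in odds form: the posterior odds 𝜋𝑡+1/(1 − 𝜋𝑡+1) equal the prior odds 𝜋0/(1 − 𝜋0) times 𝐿(𝑤ᵗ⁺¹), which is a rearrangement of the expression for 1/𝜋𝑡+1 − 1 derived above. The sketch below, with hypothetical helper names, uses that form in logs. Working with log 𝐿 instead of 𝐿 is a design choice: for long histories 𝐿(𝑤ᵗ) can underflow to 0 or overflow to inf, in which case 𝜋0𝐿/(𝜋0𝐿 + 1 − 𝜋0) returns exactly 0 or nan, while the log-odds version stays well defined.

```python
import numpy as np
from scipy.special import expit

def posterior_from_L(pi0, L):
    "Formula (2): π = π0 L / (π0 L + 1 - π0)."
    return pi0 * L / (pi0 * L + 1 - pi0)

def posterior_from_log_L(pi0, log_L):
    "Same posterior via log odds: log π/(1-π) = log π0/(1-π0) + log L."
    return expit(np.log(pi0 / (1 - pi0)) + log_L)

pi0 = 0.2
log_L = np.array([-3.0, 0.0, 2.5])   # moderate values: both forms agree
print(posterior_from_L(pi0, np.exp(log_L)), posterior_from_log_L(pi0, log_L))

big = 800.0                           # exp(800) overflows to inf
with np.errstate(over='ignore', invalid='ignore'):
    print(posterior_from_L(pi0, np.exp(big)))   # nan (inf / inf)
print(posterior_from_log_L(pi0, big))           # 1.0
```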

We thus conclude that the likelihood ratio process is a key ingredient of formula (2) for a Bayesian’s posterior probability that nature has drawn history $w^t$ as repeated draws from density 𝑓.

38.5 Sequels

This lecture has been devoted to building some useful infrastructure.


We’ll build on results highlighted in this lecture to understand inferences that are the foundations of results described in this lecture and this lecture and this lecture.
Chapter 39

Bayesian versus Frequentist Decision Rules

39.1 Contents

• Overview 39.2
• Setup 39.3
• Frequentist Decision Rule 39.4
• Bayesian Decision Rule 39.5
• Was the Navy Captain’s hunch correct? 39.6
• More details 39.7
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

In [2]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline
        from numba import njit, prange, float64, int64
        from numba.experimental import jitclass  # jitclass lives here in newer Numba releases
        from interpolation import interp
        from math import gamma
        from scipy.optimize import minimize

39.2 Overview

This lecture follows up on ideas presented in the following lectures:

• A Problem that Stumped Milton Friedman

• Exchangeability and Bayesian Updating


• Likelihood Ratio Processes
In A Problem that Stumped Milton Friedman we described a problem that a Navy Captain
presented to Milton Friedman during World War II.
The Navy had instructed the Captain to use a decision rule for quality control that the Captain suspected could be dominated by a better rule.

606 CHAPTER 39. BAYESIAN VERSUS FREQUENTIST DECISION RULES

(The Navy had ordered the Captain to use an instance of a frequentist decision rule.)
Milton Friedman recognized the Captain’s conjecture as posing a challenging statistical
problem that he and other members of the US Government’s Statistical Research Group at
Columbia University proceeded to try to solve.
One of the members of the group, the great mathematician Abraham Wald, soon solved the
problem.
A good way to formulate the problem is to use some ideas from Bayesian statistics that we describe in this lecture Exchangeability and Bayesian Updating and in this lecture Likelihood Ratio Processes, which describes the link between Bayesian updating and likelihood ratio processes.
The present lecture uses Python to generate simulations that evaluate expected losses under frequentist and Bayesian decision rules for instances of the Navy Captain’s decision problem.
The simulations validate the Navy Captain’s hunch that there is a better rule than the one
the Navy had ordered him to use.

39.3 Setup

To formalize the problem of the Navy Captain whose questions posed the problem that Milton Friedman and Allan Wallis handed over to Abraham Wald, we consider a setting with the following parts.
• Each period a decision maker draws a non-negative random variable 𝑍 from a probability distribution that he does not completely understand. He knows that two probability distributions are possible, 𝑓0 and 𝑓1, and that whichever distribution it is remains fixed over time. The decision maker believes that before the beginning of time, nature once and for all selected either 𝑓0 or 𝑓1 and that the probability that it selected 𝑓0 is 𝜋∗.
• The decision maker observes a sample $\{z_i\}_{i=0}^{t}$ from the distribution chosen by nature.
The decision maker wants to decide which distribution actually governs 𝑍 and is worried by two types of errors and the losses that they impose on him.
• a loss 𝐿̄1 from a type I error that occurs when he decides that 𝑓 = 𝑓1 when actually 𝑓 = 𝑓0
• a loss 𝐿̄0 from a type II error that occurs when he decides that 𝑓 = 𝑓0 when actually 𝑓 = 𝑓1
The decision maker pays a cost 𝑐 for drawing another 𝑧.
We mainly borrow parameters from the quantecon lecture “A Problem that Stumped Milton Friedman” except that we increase both 𝐿̄0 and 𝐿̄1 from 25 to 100 to encourage the frequentist Navy Captain to take more draws before deciding.
We set the cost 𝑐 of taking one more draw at 1.25.
We set the probability distributions 𝑓0 and 𝑓1 to be beta distributions with 𝑎0 = 𝑏0 = 1,
𝑎1 = 3, and 𝑏1 = 1.2, respectively.
Below is some Python code that sets up these objects.

In [3]: @njit
        def p(x, a, b):
            "Beta distribution."

            r = gamma(a + b) / (gamma(a) * gamma(b))

            return r * x**(a-1) * (1 - x)**(b-1)

We start by defining a jitclass that stores parameters and functions we need to solve problems for both the Bayesian and frequentist Navy Captains.

In [4]: wf_data = [
            ('c', float64),           # cost of drawing one more observation
            ('a0', float64),          # parameters of beta distribution
            ('b0', float64),
            ('a1', float64),
            ('b1', float64),
            ('L0', float64),          # cost of selecting f0 when f1 is true
            ('L1', float64),          # cost of selecting f1 when f0 is true
            ('π_grid', float64[:]),   # grid of beliefs π
            ('π_grid_size', int64),
            ('mc_size', int64),       # size of Monte Carlo simulation
            ('z0', float64[:]),       # sequence of random values
            ('z1', float64[:])        # sequence of random values
        ]

In [5]: @jitclass(wf_data)
        class WaldFriedman:

            def __init__(self,
                         c=1.25,
                         a0=1,
                         b0=1,
                         a1=3,
                         b1=1.2,
                         L0=100,
                         L1=100,
                         π_grid_size=200,
                         mc_size=1000):

                self.c, self.π_grid_size = c, π_grid_size
                self.a0, self.b0, self.a1, self.b1 = a0, b0, a1, b1
                self.L0, self.L1 = L0, L1
                self.π_grid = np.linspace(0, 1, π_grid_size)
                self.mc_size = mc_size

                self.z0 = np.random.beta(a0, b0, mc_size)
                self.z1 = np.random.beta(a1, b1, mc_size)

            def f0(self, x):

                return p(x, self.a0, self.b0)

            def f1(self, x):

                return p(x, self.a1, self.b1)

            def κ(self, z, π):
                """
                Updates π using Bayes' rule and the current observation z
                """

                a0, b0, a1, b1 = self.a0, self.b0, self.a1, self.b1

                π_f0, π_f1 = π * p(z, a0, b0), (1 - π) * p(z, a1, b1)
                π_new = π_f0 / (π_f0 + π_f1)

                return π_new


In [6]: wf = WaldFriedman()

        grid = np.linspace(0, 1, 50)

        plt.figure()

        plt.title("Two Distributions")
        plt.plot(grid, wf.f0(grid), lw=2, label="$f_0$")
        plt.plot(grid, wf.f1(grid), lw=2, label="$f_1$")

        plt.legend()
        plt.xlabel("$z$ values")
        plt.ylabel("density of $z_k$")

        plt.tight_layout()
        plt.show()

Above, we plot the two possible probability densities 𝑓0 and 𝑓1.

39.4 Frequentist Decision Rule

The Navy told the Captain to use a frequentist decision rule.


In particular, it gave him a decision rule that the Navy had designed by using frequentist statistical theory to minimize an expected loss function.
That decision rule is characterized by a sample size 𝑡 and a cutoff 𝑑 associated with a likelihood ratio.
Let $L(z^t) = \prod_{i=0}^{t} \frac{f_0(z_i)}{f_1(z_i)}$ be the likelihood ratio associated with observing the sequence $\{z_i\}_{i=0}^{t}$.
The decision rule associated with a sample size 𝑡 is:
• decide that 𝑓0 is the distribution if the likelihood ratio is greater than 𝑑
To understand how that rule was engineered, let null and alternative hypotheses be
• null: 𝐻0 : 𝑓 = 𝑓0 ,
• alternative 𝐻1 : 𝑓 = 𝑓1 .
Given sample size 𝑡 and cutoff 𝑑, under the model described above, the mathematical expectation of total loss is

$$\bar{V}_{fre}(t, d) = c t + \pi^* PFA \times \bar{L}_1 + (1 - \pi^*)(1 - PD) \times \bar{L}_0 \tag{1}$$

where

$$PFA = \Pr\{L(z^t) < d \mid q = f_0\}, \qquad PD = \Pr\{L(z^t) < d \mid q = f_1\}$$
Here
• 𝑃𝐹𝐴 denotes the probability of a false alarm, i.e., rejecting 𝐻0 when it is true
• 𝑃𝐷 denotes the probability of detection, i.e., rejecting 𝐻0 when 𝐻1 is true, so that 1 − 𝑃𝐷 is the probability of a type II error (not rejecting 𝐻0 when 𝐻1 is true)
For a given sample size 𝑡, the pairs (𝑃 𝐹 𝐴, 𝑃 𝐷) lie on a “receiver operating characteristic
curve” and can be uniquely pinned down by choosing 𝑑.
To see some receiver operating characteristic curves, please see this lecture Likelihood Ratio
Processes.
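To make equation (1) concrete, the sketch below evaluates it at one hypothetical operating point. The cost 𝑐 = 1.25, the losses 𝐿̄0 = 𝐿̄1 = 100, and 𝜋∗ = 0.5 are the values used in this lecture; the sample size 𝑡 = 8 and the pair 𝑃𝐹𝐴 = 0.1, 𝑃𝐷 = 0.8 are made up purely for illustration, and the function name `expected_loss` is hypothetical.

```python
def expected_loss(t, PFA, PD, c=1.25, pi_star=0.5, L0=100.0, L1=100.0):
    """Equation (1): c t + π* PFA L̄1 + (1 - π*)(1 - PD) L̄0.

    PFA = Pr{L(z^t) < d | f0} (false alarm, costs L̄1);
    1 - PD = Pr{L(z^t) >= d | f1} (missed detection, costs L̄0).
    """
    return c * t + pi_star * PFA * L1 + (1 - pi_star) * (1 - PD) * L0

V = expected_loss(t=8, PFA=0.1, PD=0.8)
print(round(V, 6))  # sampling cost 10 + false-alarm loss 5 + missed-detection loss 10
```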
To solve for $\bar{V}_{fre}(t, d)$ numerically, we first simulate sequences of 𝑧 when either 𝑓0 or 𝑓1 generates data.

In [7]: N = 10000
T = 100

In [8]: z0_arr = np.random.beta(wf.a0, wf.b0, (N, T))
        z1_arr = np.random.beta(wf.a1, wf.b1, (N, T))

In [9]: plt.hist(z0_arr.flatten(), bins=50, alpha=0.4, label='f0')
        plt.hist(z1_arr.flatten(), bins=50, alpha=0.4, label='f1')
        plt.legend()
        plt.show()

We can compute sequences of likelihood ratios using simulated samples.

In [10]: l = lambda z: wf.f0(z) / wf.f1(z)

In [11]: l0_arr = l(z0_arr)
         l1_arr = l(z1_arr)

         L0_arr = np.cumprod(l0_arr, 1)
         L1_arr = np.cumprod(l1_arr, 1)

With an empirical distribution of likelihood ratios in hand, we can draw “receiver operating
characteristic curves” by enumerating (𝑃 𝐹 𝐴, 𝑃 𝐷) pairs given each sample size 𝑡.

In [12]: PFA = np.arange(0, 100, 1)

         for t in range(1, 15, 4):
             percentile = np.percentile(L0_arr[:, t], PFA)
             PD = [np.sum(L1_arr[:, t] < p) / N for p in percentile]

             plt.plot(PFA / 100, PD, label=f"t={t}")

         plt.scatter(0, 1, label="perfect detection")
         plt.plot([0, 1], [0, 1], color='k', ls='--', label="random detection")

         plt.arrow(0.5, 0.5, -0.15, 0.15, head_width=0.03)
         plt.text(0.35, 0.7, "better")
         plt.xlabel("Probability of false alarm")
         plt.ylabel("Probability of detection")
         plt.legend()
         plt.title("Receiver Operating Characteristic Curve")
         plt.show()

Our frequentist minimizes the expected total loss presented in equation (1) by choosing (𝑡, 𝑑).
Doing that delivers an expected loss

$$\bar{V}_{fre} = \min_{t, d} \bar{V}_{fre}(t, d).$$

We first consider the case in which 𝜋∗ = Pr {nature selects 𝑓0 } = 0.5.


We can solve the minimization problem in two steps.
First, we fix 𝑡 and find the optimal cutoff 𝑑 and consequently the minimal $\bar{V}_{fre}(t)$; second, we minimize $\bar{V}_{fre}(t)$ over 𝑡.
Here is Python code that does that and then plots a useful graph.

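In [13]: @njit
         def V_fre_d_t(d, t, L0_arr, L1_arr, π_star, wf):
             # Expected misclassification loss for cutoff d and sample size t
             # (the sampling cost c t is added later, in compute_V_fre)

             N = L0_arr.shape[0]

             PFA = np.sum(L0_arr[:, t-1] < d) / N
             PD = np.sum(L1_arr[:, t-1] < d) / N

             V = π_star * PFA * wf.L1 + (1 - π_star) * (1 - PD) * wf.L0

             return V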

In [14]: def V_fre_t(t, L0_arr, L1_arr, π_star, wf):

             res = minimize(V_fre_d_t, 1, args=(t, L0_arr, L1_arr, π_star, wf),
                            method='Nelder-Mead')
             V = res.fun
             d = res.x

             N = L0_arr.shape[0]
             PFA = np.sum(L0_arr[:, t-1] < d) / N
             PD = np.sum(L1_arr[:, t-1] < d) / N

             return V, PFA, PD

In [15]: def compute_V_fre(L0_arr, L1_arr, π_star, wf):

             T = L0_arr.shape[1]

             V_fre_arr = np.empty(T)
             PFA_arr = np.empty(T)
             PD_arr = np.empty(T)

             for t in range(1, T+1):
                 V, PFA, PD = V_fre_t(t, L0_arr, L1_arr, π_star, wf)
                 V_fre_arr[t-1] = wf.c * t + V
                 PFA_arr[t-1] = PFA
                 PD_arr[t-1] = PD

             return V_fre_arr, PFA_arr, PD_arr

In [16]: π_star = 0.5
         V_fre_arr, PFA_arr, PD_arr = compute_V_fre(L0_arr, L1_arr, π_star, wf)

         # V_fre_arr[t-1] holds the value for sample size t
         plt.plot(range(1, T+1), V_fre_arr, label=r'$\min_{d} \overline{V}_{fre}(t,d)$')
         plt.xlabel('t')
         plt.title(r'$\pi^*=0.5$')
         plt.legend()
         plt.show()
39.4. FREQUENTIST DECISION RULE 613

In [17]: t_optimal = np.argmin(V_fre_arr) + 1

In [18]: msg = (f"The above graph indicates that minimizing over t tells the frequentist to draw "
       f"{t_optimal} observations and then decide.")
print(msg)

The above graph indicates that minimizing over t tells the frequentist to draw 8
observations and then decide.
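The two-step minimization can also be illustrated outside the lecture's setup. The sketch below is a toy version, not the lecture's code, and several of its ingredients are assumptions. It uses hypothetical Gaussian densities f0 = N(0, 1) and f1 = N(1, 1) and simulates log likelihood ratio paths directly. It replaces the Nelder-Mead search over 𝑑 with a brute-force grid search over log cutoffs. The loss parameters c, L0 and L1 are illustrative values only.

```python
import numpy as np

def log_lr_paths(mean, T, N, rng):
    """Simulate N paths of log L_t = sum_{s<=t} log[f0(z_s) / f1(z_s)], t = 1..T.

    Assumes hypothetical densities f0 = N(0, 1) and f1 = N(1, 1), for which
    log[f0(z) / f1(z)] = 0.5 - z. The draws z_s come from N(mean, 1).
    """
    z = rng.normal(mean, 1.0, size=(N, T))
    return np.cumsum(0.5 - z, axis=1)

def frequentist_losses(logL0, logL1, π_star, L0, L1, c, log_d_grid):
    """Step 1: for each t, minimize over the cutoff d; return c*t + min_d loss.

    The rule chooses f1 when log L_t < log d, so PFA and PD are the fractions
    of f0-generated and f1-generated paths below the cutoff.
    """
    T = logL0.shape[1]
    V = np.empty(T)
    for t in range(T):
        PFA = (logL0[:, t][:, None] < log_d_grid).mean(axis=0)
        PD = (logL1[:, t][:, None] < log_d_grid).mean(axis=0)
        loss = π_star * PFA * L1 + (1 - π_star) * (1 - PD) * L0
        V[t] = c * (t + 1) + loss.min()
    return V

rng = np.random.default_rng(0)
T, N = 20, 5_000
logL0 = log_lr_paths(0.0, T, N, rng)   # data generated by f0
logL1 = log_lr_paths(1.0, T, N, rng)   # data generated by f1

# Include -inf and +inf so "always choose f0" and "always choose f1" are candidates
log_d_grid = np.concatenate(([-np.inf], np.linspace(-10, 10, 401), [np.inf]))
V = frequentist_losses(logL0, logL1, 0.5, 25.0, 25.0, 0.5, log_d_grid)

# Step 2: minimize over the sample size t
t_opt = np.argmin(V) + 1
print(t_opt)
```

Because PFA and PD are step functions of 𝑑, a grid search sidesteps the flat regions that can make a derivative-free optimizer stop early. The lecture's own code instead calls `scipy.optimize.minimize` with Nelder-Mead.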

Let’s now change the value of 𝜋∗ and watch how the decision rule changes.

In [19]: n_π = 20
π_star_arr = np.linspace(0.1, 0.9, n_π)

V_fre_bar_arr = np.empty(n_π)
t_optimal_arr = np.empty(n_π)
PFA_optimal_arr = np.empty(n_π)
PD_optimal_arr = np.empty(n_π)

for i, π_star in enumerate(π_star_arr):
    V_fre_arr, PFA_arr, PD_arr = compute_V_fre(L0_arr, L1_arr, π_star, wf)
    t_idx = np.argmin(V_fre_arr)

    V_fre_bar_arr[i] = V_fre_arr[t_idx]
    t_optimal_arr[i] = t_idx + 1
    PFA_optimal_arr[i] = PFA_arr[t_idx]
    PD_optimal_arr[i] = PD_arr[t_idx]

In [20]: plt.plot(π_star_arr, V_fre_bar_arr)
plt.xlabel(r'$\pi^*$')
plt.title(r'$\overline{V}_{fre}$')

plt.show()

The following shows how the optimal sample size 𝑡 and the targeted (PFA, PD) change as 𝜋∗
varies.

In [21]: fig, axs = plt.subplots(1, 2, figsize=(14, 5))

axs[0].plot(π_star_arr, t_optimal_arr)
axs[0].set_xlabel(r'$\pi^*$')
axs[0].set_title(r'optimal sample size given $\pi^*$')

axs[1].plot(π_star_arr, PFA_optimal_arr, label=r'$PFA^*(\pi^*)$')
axs[1].plot(π_star_arr, PD_optimal_arr, label=r'$PD^*(\pi^*)$')
axs[1].set_xlabel(r'$\pi^*$')
axs[1].legend()
axs[1].set_title(r'optimal PFA and PD given $\pi^*$')

plt.show()

39.5 Bayesian Decision Rule

In the lecture A Problem that Stumped Milton Friedman, we learned how Abraham Wald
confirmed the Navy Captain’s hunch that there is a better decision rule.
We presented a Bayesian procedure that instructed the Captain to make decisions by comparing
his current Bayesian posterior probability 𝜋 with two cutoff probabilities called 𝛼 and 𝛽.
To proceed, we borrow some Python code from the quantecon lecture A Problem that
Stumped Milton Friedman that computes 𝛼 and 𝛽.

In [22]: @njit(parallel=True)
def Q(h, wf):

    c, π_grid = wf.c, wf.π_grid
    L0, L1 = wf.L0, wf.L1
    z0, z1 = wf.z0, wf.z1
    mc_size = wf.mc_size

    κ = wf.κ

    h_new = np.empty_like(π_grid)
    h_func = lambda p: interp(π_grid, h, p)

    for i in prange(len(π_grid)):
        π = π_grid[i]

        # Find the expected value of J by integrating over z
        integral_f0, integral_f1 = 0, 0
        for m in range(mc_size):
            π_0 = κ(z0[m], π)  # Draw z from f0 and update π
            integral_f0 += min((1 - π_0) * L0, π_0 * L1, h_func(π_0))

            π_1 = κ(z1[m], π)  # Draw z from f1 and update π
            integral_f1 += min((1 - π_1) * L0, π_1 * L1, h_func(π_1))

        integral = (π * integral_f0 + (1 - π) * integral_f1) / mc_size

        h_new[i] = c + integral

    return h_new

In [23]: @njit
def solve_model(wf, tol=1e-4, max_iter=1000):
    """
    Compute the continuation value function

    * wf is an instance of WaldFriedman
    """

    # Set up loop
    h = np.zeros(len(wf.π_grid))
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        h_new = Q(h, wf)
        error = np.max(np.abs(h - h_new))
        i += 1
        h = h_new

    if i == max_iter:
        print("Failed to converge!")

    return h_new

In [24]: h_star = solve_model(wf)


In [25]: @njit
def find_cutoff_rule(wf, h):

    """
    This function takes a continuation value function and returns the
    corresponding cutoffs of where you transition between continuing and
    choosing a specific model
    """

    π_grid = wf.π_grid
    L0, L1 = wf.L0, wf.L1

    # Evaluate cost at all points on grid for choosing a model
    payoff_f0 = (1 - π_grid) * L0
    payoff_f1 = π_grid * L1

    # The cutoff points can be found by differencing these costs with
    # the Bellman equation (J is always less than or equal to p_c_i)
    β = π_grid[np.searchsorted(payoff_f1 - np.minimum(h, payoff_f0), 1e-10) - 1]
    α = π_grid[np.searchsorted(np.minimum(h, payoff_f1) - payoff_f0, 1e-10) - 1]

    return (β, α)

β, α = find_cutoff_rule(wf, h_star)
cost_L0 = (1 - wf.π_grid) * wf.L0
cost_L1 = wf.π_grid * wf.L1

fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(wf.π_grid, h_star, label='continuation value')
ax.plot(wf.π_grid, cost_L1, label='choose f1')
ax.plot(wf.π_grid, cost_L0, label='choose f0')
ax.plot(wf.π_grid,
        np.amin(np.column_stack([h_star, cost_L0, cost_L1]), axis=1),
        lw=15, alpha=0.1, color='b', label='minimum cost')

ax.annotate(r"$\beta$", xy=(β + 0.01, 0.5), fontsize=14)
ax.annotate(r"$\alpha$", xy=(α + 0.01, 0.5), fontsize=14)

plt.vlines(β, 0, β * wf.L0, linestyle="--")
plt.vlines(α, 0, (1 - α) * wf.L1, linestyle="--")

ax.set(xlim=(0, 1), ylim=(0, 0.5 * max(wf.L0, wf.L1)), ylabel="cost",
       xlabel=r"$\pi$", title="Value function")

plt.legend(borderpad=1.1)
plt.show()

The above figure portrays the value function plotted against the decision maker’s Bayesian posterior.
It also shows the probabilities 𝛼 and 𝛽.
The Bayesian decision rule is:
• accept 𝐻0 if 𝜋 ≥ 𝛼
• accept 𝐻1 if 𝜋 < 𝛽
• delay deciding and draw another 𝑧 if 𝛽 ≤ 𝜋 < 𝛼
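As a minimal sketch, the rule fits in a small Python function. The name `bayes_decision` is hypothetical. Its boundary cases follow the value functions 𝑉0 and 𝑉1 defined below, where 𝛽 ≤ 𝜋 < 𝛼 means "continue".

```python
def bayes_decision(π, α, β):
    """Map the current posterior π = Pr{f0 | data} to an action.

    Assumes 0 < β < α < 1. Boundary cases: π = α accepts H0, π = β continues.
    """
    if π >= α:
        return "accept H0"    # posterior on f0 is high enough
    if π < β:
        return "accept H1"    # posterior on f0 is low enough
    return "draw again"       # β <= π < α: pay c and draw another z

print(bayes_decision(0.5, α=0.8, β=0.2))   # draw again
```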
We can calculate two “objective” loss functions for this rule, conditioning on knowing
for sure that nature has selected 𝑓0 in the first case, or 𝑓1 in the second case.

1. under 𝑓0,

$$
V^0(\pi) = \begin{cases}
0 & \text{if } \alpha \le \pi, \\
c + E V^0(\pi') & \text{if } \beta \le \pi < \alpha, \\
\bar L_1 & \text{if } \pi < \beta.
\end{cases}
$$

2. under 𝑓1,

$$
V^1(\pi) = \begin{cases}
\bar L_0 & \text{if } \alpha \le \pi, \\
c + E V^1(\pi') & \text{if } \beta \le \pi < \alpha, \\
0 & \text{if } \pi < \beta,
\end{cases}
$$

where $\pi' = \frac{\pi f_0(z')}{\pi f_0(z') + (1 - \pi) f_1(z')}$. Given a prior probability $\pi_0$, the expected loss for the Bayesian is

$$
\overline{V}_{Bayes}(\pi_0) = \pi^* V^0(\pi_0) + (1 - \pi^*) V^1(\pi_0).
$$
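The updating map 𝜋 ↦ 𝜋′ is easy to sketch. The code below assumes hypothetical Gaussian densities f0 = N(0, 1) and f1 = N(1, 1) purely for illustration; they are not the densities used in this lecture. The helper `update` is hypothetical and plays the role that `wf.κ` plays in the code above.

```python
import numpy as np

def normal_pdf(z, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated at z."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def f0(z):
    return normal_pdf(z, 0.0)   # hypothetical f0

def f1(z):
    return normal_pdf(z, 1.0)   # hypothetical f1

def update(π, z):
    """Bayes' rule: π' = π f0(z) / (π f0(z) + (1 - π) f1(z))."""
    num = π * f0(z)
    return num / (num + (1 - π) * f1(z))

# A draw halfway between the two means is uninformative because f0(0.5) = f1(0.5)
print(update(0.5, 0.5))   # 0.5
```

Applying `update` once per draw gives the same posterior as applying Bayes' rule once to the whole sample.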

Below we write some Python code that computes 𝑉 0 (𝜋) and 𝑉 1 (𝜋) numerically.

In [26]: @njit(parallel=True)
def V_q(wf, flag):
    # α and β are the cutoffs returned by find_cutoff_rule above (read as globals)
    V = np.zeros(wf.π_grid_size)
    if flag == 0:
        z_arr = wf.z0
        V[wf.π_grid < β] = wf.L1
    else:
        z_arr = wf.z1
        V[wf.π_grid >= α] = wf.L0

    V_old = np.empty_like(V)

    while True:
        V_old[:] = V[:]
        V[(β <= wf.π_grid) & (wf.π_grid < α)] = 0

        for i in prange(len(wf.π_grid)):
            π = wf.π_grid[i]

            if π >= α or π < β:
                continue

            for j in prange(len(z_arr)):
                π_next = wf.κ(z_arr[j], π)
                V[i] += wf.c + interp(wf.π_grid, V_old, π_next)

            V[i] /= wf.mc_size

        if np.abs(V - V_old).max() < 1e-5:
            break

    return V

In [27]: V0 = V_q(wf, 0)
V1 = V_q(wf, 1)

plt.plot(wf.π_grid, V0, label='$V^0$')
plt.plot(wf.π_grid, V1, label='$V^1$')
plt.vlines(β, 0, wf.L0, linestyle='--')
plt.text(β+0.01, wf.L0/2, 'β')
plt.vlines(α, 0, wf.L0, linestyle='--')
plt.text(α+0.01, wf.L0/2, 'α')
plt.xlabel(r'$\pi$')
plt.title(r'Objective value function $V(\pi)$')
plt.legend()
plt.show()

Given an assumed value for 𝜋∗ = Pr {nature selects 𝑓0 }, we can then compute $\overline{V}_{Bayes}(\pi_0)$.
Next, we determine an initial Bayesian prior 𝜋0∗ that minimizes this objective concept of
expected loss.
The figure below plots four cases corresponding to 𝜋∗ = 0.25, 0.3, 0.5, 0.7.
We observe that in each case 𝜋0∗ equals 𝜋∗.

In [28]: def compute_V_baye_bar(π_star, V0, V1, wf):

    V_baye = π_star * V0 + (1 - π_star) * V1
    π_idx = np.argmin(V_baye)
    π_optimal = wf.π_grid[π_idx]
    V_baye_bar = V_baye[π_idx]

    return V_baye, π_optimal, V_baye_bar

In [29]: π_star_arr = [0.25, 0.3, 0.5, 0.7]

fig, axs = plt.subplots(2, 2, figsize=(15, 10))

for i, π_star in enumerate(π_star_arr):
    row_i = i // 2
    col_i = i % 2

    V_baye, π_optimal, V_baye_bar = compute_V_baye_bar(π_star, V0, V1, wf)

    axs[row_i, col_i].plot(wf.π_grid, V_baye)
    axs[row_i, col_i].hlines(V_baye_bar, 0, 1, linestyle='--')
    axs[row_i, col_i].vlines(π_optimal, V_baye_bar, V_baye.max(), linestyle='--')
    axs[row_i, col_i].text(π_optimal+0.05, (V_baye_bar + V_baye.max()) / 2,
                           r'${\pi_0^*}=$' + f'{π_optimal:0.2f}')
    axs[row_i, col_i].set_xlabel(r'$\pi$')
    axs[row_i, col_i].set_ylabel(r'$\overline{V}_{baye}(\pi)$')
    axs[row_i, col_i].set_title(r'$\pi^*=$' + f'{π_star}')

fig.suptitle(r'$\overline{V}_{baye}(\pi)=\pi^*V^0(\pi) + (1-\pi^*)V^1(\pi)$',
             fontsize=16)
plt.show()

This pattern of outcomes holds more generally.
The following Python code generates a graph verifying that 𝜋0∗ equals 𝜋∗ for all values
of 𝜋∗ between 0.1 and 0.9.

In [30]: π_star_arr = np.linspace(0.1, 0.9, n_π)

V_baye_bar_arr = np.empty_like(π_star_arr)
π_optimal_arr = np.empty_like(π_star_arr)

for i, π_star in enumerate(π_star_arr):

    V_baye, π_optimal, V_baye_bar = compute_V_baye_bar(π_star, V0, V1, wf)

    V_baye_bar_arr[i] = V_baye_bar
    π_optimal_arr[i] = π_optimal

fig, axs = plt.subplots(1, 2, figsize=(14, 5))

axs[0].plot(π_star_arr, V_baye_bar_arr)
axs[0].set_xlabel(r'$\pi^*$')
axs[0].set_title(r'$\overline{V}_{baye}$')

axs[1].plot(π_star_arr, π_optimal_arr, label='optimal prior')
axs[1].plot([π_star_arr.min(), π_star_arr.max()],
            [π_star_arr.min(), π_star_arr.max()],
            c='k', linestyle='--', label='45 degree line')
axs[1].set_xlabel(r'$\pi^*$')
axs[1].set_title(r'optimal prior given $\pi^*$')
axs[1].legend()

plt.show()

39.6 Was the Navy Captain’s hunch correct?

We now compare average (i.e., frequentist) losses obtained by the frequentist and Bayesian
decision rules.
As a starting point, let’s compare average loss functions when 𝜋∗ = 0.5.

In [31]: π_star = 0.5

In [32]: # frequentist
V_fre_arr, PFA_arr, PD_arr = compute_V_fre(L0_arr, L1_arr, π_star, wf)

# bayesian: weight V0 by π* and V1 by 1 - π*
V_baye = π_star * V0 + (1 - π_star) * V1
V_baye_bar = V_baye.min()

In [33]: plt.plot(range(1, T+1), V_fre_arr, label=r'$\min_{d} \overline{V}_{fre}(t,d)$')
plt.plot([1, T], [V_baye_bar, V_baye_bar], label=r'$\overline{V}_{baye}$')
plt.xlabel('t')
plt.title(r'$\pi^*=0.5$')
plt.legend()
plt.show()

Evidently, there is no sample size 𝑡 at which the frequentist decision rule attains a lower loss
function than does the Bayesian rule.
Furthermore, the following graph indicates that the Bayesian decision rule does better on average
for all values of 𝜋∗.

In [34]: fig, axs = plt.subplots(1, 2, figsize=(14, 5))

axs[0].plot(π_star_arr, V_fre_bar_arr, label=r'$\overline{V}_{fre}$')
axs[0].plot(π_star_arr, V_baye_bar_arr, label=r'$\overline{V}_{baye}$')
axs[0].legend()
axs[0].set_xlabel(r'$\pi^*$')

axs[1].plot(π_star_arr, V_fre_bar_arr - V_baye_bar_arr, label='$diff$')
axs[1].legend()
axs[1].set_xlabel(r'$\pi^*$')

plt.show()

The right panel of the above graph plots the difference $\overline{V}_{fre} - \overline{V}_{Bayes}$.
It is always positive.

39.7 More details

We can provide more insights by focusing solely on the case in which 𝜋∗ = 0.5 = 𝜋0.

In [35]: π_star = 0.5

Recall that when 𝜋∗ = 0.5, the frequentist decision rule sets a sample size t_optimal ex ante.
For our parameter settings, we can compute its value:

In [36]: t_optimal

Out[36]: 8

For convenience, let’s define t_idx as the Python array index corresponding to t_optimal
sample size.

In [37]: t_idx = t_optimal - 1

39.7.1 Distribution of Bayesian decision rule’s times to decide

Using simulations, we compute the frequency distribution of the Bayesian decision rule’s times
to decide and compare it with the frequentist rule’s fixed 𝑡.
The following Python code creates a graph that shows the frequency distribution of the Bayesian
decision maker’s times to decide, conditional on distribution 𝑞 = 𝑓0 or 𝑞 = 𝑓1 generating the data.
The blue and red dashed lines show averages for the Bayesian decision rule, while the black
dashed line shows the frequentist optimal sample size 𝑡.
On average the Bayesian rule decides earlier than the frequentist rule when 𝑞 = 𝑓0 and later
when 𝑞 = 𝑓1.
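For a single simulated path of posteriors, the Bayesian’s time to decide is the first date at which 𝜋𝑡 leaves the continuation region. The helper below is a hypothetical pure-Python sketch of that logic for one path. It uses the same strict inequalities as `check_results` in the next cell, stopping when 𝜋𝑡 < 𝛽 or 𝜋𝑡 > 𝛼.

```python
def first_exit(π_path, α, β):
    """Return (t, choice) for the first date t = 1, 2, ... with π_t < β or π_t > α.

    choice is 0 if the rule selects f0 (π_t > α) and 1 if it selects f1 (π_t < β).
    Returns (None, None) if the path never leaves [β, α].
    """
    for t, π in enumerate(π_path, start=1):
        if π > α:
            return t, 0
        if π < β:
            return t, 1
    return None, None

# Hypothetical posterior path with α = 0.8 and β = 0.2
print(first_exit([0.55, 0.7, 0.85, 0.9], 0.8, 0.2))   # (3, 0)
```

In `check_results` the same loop runs over many paths in parallel with `prange`. A decision is scored as correct when the chosen model matches the one that generated the data.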

In [38]: @njit(parallel=True)
def check_results(L_arr, α, β, flag, π0):

    N, T = L_arr.shape

    time_arr = np.empty(N)
    correctness = np.empty(N)

    π_arr = π0 * L_arr / (π0 * L_arr + 1 - π0)

    for i in prange(N):
        for t in range(T):
            if (π_arr[i, t] < β) or (π_arr[i, t] > α):
                time_arr[i] = t + 1
                correctness[i] = ((flag == 0 and π_arr[i, t] > α)
                                  or (flag == 1 and π_arr[i, t] < β))
                break

    return time_arr, correctness

In [39]: time_arr0, correctness0 = check_results(L0_arr, α, β, 0, π_star)
time_arr1, correctness1 = check_results(L1_arr, α, β, 1, π_star)

# unconditional distribution
time_arr_u = np.concatenate((time_arr0, time_arr1))
correctness_u = np.concatenate((correctness0, correctness1))

In [40]: n1 = plt.hist(time_arr0, bins=range(1, 30), alpha=0.4, label='f0 generates')[0]
n2 = plt.hist(time_arr1, bins=range(1, 30), alpha=0.4, label='f1 generates')[0]
plt.vlines(t_optimal, 0, max(n1.max(), n2.max()), linestyle='--', label='frequentist')
plt.vlines(np.mean(time_arr0), 0, max(n1.max(), n2.max()),
           linestyle='--', color='b', label='E(t) under f0')
plt.vlines(np.mean(time_arr1), 0, max(n1.max(), n2.max()),
           linestyle='--', color='r', label='E(t) under f1')
plt.legend();

plt.xlabel('t')
plt.ylabel('n')
plt.title('Conditional frequency distribution of times')

plt.show()

Later we’ll figure out how these distributions ultimately affect objective expected values under
the two decision rules.
To begin, let’s look at simulations of the Bayesian’s beliefs over time.

We can easily compute the updated beliefs at any time 𝑡 using the one-to-one mapping from
𝐿𝑡 to 𝜋𝑡 given 𝜋0 described in this lecture Likelihood Ratio Processes.
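As a small illustration of that mapping, the sketch below applies 𝜋𝑡 = 𝜋0𝐿𝑡 / (𝜋0𝐿𝑡 + 1 − 𝜋0) elementwise to an array of likelihood-ratio paths, as the next cell does. The two paths are hypothetical numbers chosen so the results are easy to check by hand. For 0 < 𝜋0 < 1 the map is strictly increasing in 𝐿𝑡, so each likelihood ratio corresponds to exactly one posterior.

```python
import numpy as np

def beliefs_from_lr(L_arr, π0):
    """Posterior π_t = Pr{f0 | z_1..z_t} from likelihood ratios L_t and prior π0."""
    L_arr = np.asarray(L_arr, dtype=float)
    return π0 * L_arr / (π0 * L_arr + 1 - π0)

# Two hypothetical paths (rows) over three periods (columns)
L_arr = np.array([[1.0, 2.0, 4.0],
                  [1.0, 0.5, 0.25]])

# With π0 = 0.5 this is L / (L + 1): rows ≈ [0.5, 0.667, 0.8] and [0.5, 0.333, 0.2]
print(beliefs_from_lr(L_arr, 0.5))
```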

In [41]: π0_arr = π_star * L0_arr / (π_star * L0_arr + 1 - π_star)
π1_arr = π_star * L1_arr / (π_star * L1_arr + 1 - π_star)

In [42]: fig, axs = plt.subplots(1, 2, figsize=(14, 4))

axs[0].plot(np.arange(1, π0_arr.shape[1]+1), np.mean(π0_arr, 0), label='f0 generates')
axs[0].plot(np.arange(1, π1_arr.shape[1]+1), 1 - np.mean(π1_arr, 0), label='f1 generates')
axs[0].set_xlabel('t')
axs[0].set_ylabel(r'$E(\pi_t)$ or ($1 - E(\pi_t)$)')
axs[0].set_title('Expectation of beliefs after drawing t observations')
axs[0].legend()

axs[1].plot(np.arange(1, π0_arr.shape[1]+1), np.var(π0_arr, 0), label='f0 generates')
axs[1].plot(np.arange(1, π1_arr.shape[1]+1), np.var(π1_arr, 0), label='f1 generates')
axs[1].set_xlabel('t')
axs[1].set_ylabel(r'var($\pi_t$)')
axs[1].set_title('Variance of beliefs after drawing t observations')
axs[1].legend()

plt.show()

The above figures compare averages and variances of updated Bayesian posteriors after 𝑡
draws.
The left graph compares 𝐸(𝜋𝑡) under 𝑓0 to 1 − 𝐸(𝜋𝑡) under 𝑓1: they lie on top of each other.
However, as the right-hand graph shows, there is a significant difference in variances when 𝑡
is small: the variance is lower under 𝑓1.
The difference in variances is the reason that the Bayesian decision maker waits longer to decide
when 𝑓1 generates the data.
The code below plots outcomes of constructing an unconditional distribution by simply pooling
the simulated data across the two possible distributions 𝑓0 and 𝑓1.
The pooled distribution describes a sense in which on average the Bayesian decides earlier, an
outcome that seems at least partly to confirm the Navy Captain’s hunch.

In [43]: n = plt.hist(time_arr_u, bins=range(1, 30), alpha=0.4, label='bayesian')[0]
plt.vlines(np.mean(time_arr_u), 0, n.max(), linestyle='--',
           color='b', label='bayesian E(t)')
plt.vlines(t_optimal, 0, n.max(), linestyle='--', label='frequentist')
plt.legend()

plt.xlabel('t')
plt.ylabel('n')
plt.title('Unconditional distribution of times')

plt.show()

39.7.2 Probability of making correct decisions

Now we use simulations to compute the fraction of samples in which the Bayesian and the
frequentist decision rules decide correctly.
For the frequentist rule, the probability of making the correct decision under 𝑓1 is the optimal
probability of detection given 𝑡 that we defined earlier; under 𝑓0 it equals 1 minus the
optimal probability of a false alarm.
Below we plot these two probabilities for the frequentist rule, along with the conditional
probabilities that the Bayesian rule decides before 𝑡 and that the decision is correct.

In [44]: # optimal PFA and PD of frequentist with optimal sample size


V, PFA, PD = V_fre_t(t_optimal, L0_arr, L1_arr, π_star, wf)

In [45]: plt.plot([1, 20], [PD, PD], linestyle='--', label='PD: fre. chooses f1 correctly')
plt.plot([1, 20], [1-PFA, 1-PFA], linestyle='--', label='1-PFA: fre. chooses f0 correctly')
plt.vlines(t_optimal, 0, 1, linestyle='--', label='frequentist optimal sample size')

N = time_arr0.size
T_arr = np.arange(1, 21)
plt.plot(T_arr, [np.sum(correctness0[time_arr0 <= t] == 1) / N for t in T_arr],
         label='q=f0 and baye. choose f0')
plt.plot(T_arr, [np.sum(correctness1[time_arr1 <= t] == 1) / N for t in T_arr],
         label='q=f1 and baye. choose f1')
plt.legend(loc=4)

plt.xlabel('t')
plt.ylabel('Probability')
plt.title('Cond. probability of making correct decisions before t')

plt.show()

By averaging the two conditional probabilities with weights 𝜋∗ = 0.5 and 1 − 𝜋∗ = 0.5, we also plot the unconditional probability of a correct decision.

In [46]: plt.plot([1, 20], [(PD + 1 - PFA) / 2, (PD + 1 - PFA) / 2],
         linestyle='--', label='fre. makes correct decision')
plt.vlines(t_optimal, 0, 1, linestyle='--', label='frequentist optimal sample size')

N = time_arr_u.size
plt.plot(T_arr, [np.sum(correctness_u[time_arr_u <= t] == 1) / N for t in T_arr],
         label="bayesian makes correct decision")
plt.legend()

plt.xlabel('t')
plt.ylabel('Probability')
plt.title('Uncond. probability of making correct decisions before t')

plt.show()
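For a general prior 𝜋∗, the frequentist’s unconditional probability of a correct decision weights the two conditional probabilities as 𝜋∗(1 − PFA) + (1 − 𝜋∗)PD. When 𝜋∗ = 0.5 this reduces to the (PD + 1 − PFA)/2 used in the cell above. The sketch below writes that formula as a small hypothetical helper. A second helper reproduces the empirical "decided correctly by 𝑡" fraction computed in the cells above. The inputs in the usage lines are made-up numbers.

```python
import numpy as np

def freq_prob_correct(PFA, PD, π_star):
    """Pr{correct} = π* (1 - PFA) + (1 - π*) PD for a fixed-sample frequentist rule."""
    return π_star * (1 - PFA) + (1 - π_star) * PD

def frac_correct_by(times, correct, t):
    """Share of all simulated samples that stopped by date t with a correct decision."""
    times = np.asarray(times)
    correct = np.asarray(correct, dtype=bool)
    return np.sum(correct & (times <= t)) / times.size

# Made-up inputs for illustration
print(freq_prob_correct(PFA=0.1, PD=0.8, π_star=0.5))    # ≈ 0.85
print(frac_correct_by([1, 3, 5, 7], [1, 0, 1, 1], t=5))   # 0.5
```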

39.7.3 Distribution of likelihood ratios at frequentist’s 𝑡

Next we use simulations to construct distributions of likelihood ratios after 𝑡 draws.
To serve as useful reference points, we also show likelihood ratios that correspond to the
Bayesian cutoffs 𝛼 and 𝛽.
In order to exhibit the distribution more clearly, we report logarithms of likelihood ratios.
The graphs below report two distributions, one conditional on 𝑓0 generating the data, the
other conditional on 𝑓1 generating the data.
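The likelihood-ratio cutoffs used in the next cell come from inverting the one-to-one map between 𝐿𝑡 and 𝜋𝑡 given the prior 𝜋0, with 𝜋0 = 𝜋∗ here. A short derivation:

$$
\begin{aligned}
\pi = \frac{\pi_0 L}{\pi_0 L + 1 - \pi_0}
&\;\Longrightarrow\; \pi (\pi_0 L + 1 - \pi_0) = \pi_0 L \\
&\;\Longrightarrow\; \pi_0 L (1 - \pi) = \pi (1 - \pi_0) \\
&\;\Longrightarrow\; L = \frac{(1 - \pi_0)\, \pi}{\pi_0 (1 - \pi)}
\end{aligned}
$$

Setting 𝜋 = 𝛼 or 𝜋 = 𝛽 with 𝜋0 = 𝜋∗ gives $L_\alpha = \frac{(1 - \pi^*)\alpha}{\pi^* - \pi^* \alpha}$ and $L_\beta = \frac{(1 - \pi^*)\beta}{\pi^* - \pi^* \beta}$. These are the expressions for `Lα` and `Lβ` in the next cell.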

In [47]: Lα = (1 - π_star) * α / (π_star - π_star * α)
Lβ = (1 - π_star) * β / (π_star - π_star * β)

In [48]: L_min = min(L0_arr[:, t_idx].min(), L1_arr[:, t_idx].min())
L_max = max(L0_arr[:, t_idx].max(), L1_arr[:, t_idx].max())
bin_range = np.linspace(np.log(L_min), np.log(L_max), 50)
n0 = plt.hist(np.log(L0_arr[:, t_idx]), bins=bin_range, alpha=0.4, label='f0 generates')[0]
n1 = plt.hist(np.log(L1_arr[:, t_idx]), bins=bin_range, alpha=0.4, label='f1 generates')[0]

plt.vlines(np.log(Lβ), 0, max(n0.max(), n1.max()), linestyle='--', color='r',
           label='log($L_β$)')
plt.vlines(np.log(Lα), 0, max(n0.max(), n1.max()), linestyle='--', color='b',
           label='log($L_α$)')
plt.legend()

plt.xlabel('log(L)')
plt.ylabel('n')
plt.title('Cond. distribution of log likelihood ratio at frequentist t')

plt.show()

The next graph plots the unconditional distribution of the log likelihood ratio at the frequentist’s 𝑡,
constructed as earlier by pooling the two conditional distributions.

In [49]: plt.hist(np.log(np.concatenate([L0_arr[:, t_idx], L1_arr[:, t_idx]])),
         bins=50, alpha=0.4, label='unconditional dist. of log(L)')
plt.vlines(np.log(Lβ), 0, max(n0.max(), n1.max()), linestyle='--', color='r',
           label='log($L_β$)')
plt.vlines(np.log(Lα), 0, max(n0.max(), n1.max()), linestyle='--', color='b',
           label='log($L_α$)')
plt.legend()

plt.xlabel('log(L)')
plt.ylabel('n')
plt.title('Uncond. distribution of log likelihood ratio at frequentist t')

plt.show()
Part VI

LQ Control

Chapter 40

LQ Control: Foundations

40.1 Contents

• Overview 40.2
• Introduction 40.3
• Optimality – Finite Horizon 40.4
• Implementation 40.5
• Extensions and Comments 40.6
• Further Applications 40.7
• Exercises 40.8
• Solutions 40.9
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

40.2 Overview

Linear quadratic (LQ) control refers to a class of dynamic optimization problems that have
found applications in almost every scientific field.
This lecture provides an introduction to LQ control and its economic applications.
As we will see, LQ systems have a simple structure that makes them an excellent workhorse
for a wide variety of economic problems.
Moreover, while the linear-quadratic structure is restrictive, it is in fact far more flexible than
it may appear initially.
These themes appear repeatedly below.
Mathematically, LQ control problems are closely related to the Kalman filter:
• Recursive formulations of linear-quadratic control problems and Kalman filtering prob-
lems both involve matrix Riccati equations.
• Classical formulations of linear control and linear filtering problems make use of similar
matrix decompositions (see for example this lecture and this lecture).
In reading what follows, it will be useful to have some familiarity with
• matrix manipulations


• vectors of random variables


• dynamic programming and the Bellman equation (see for example this lecture and this
lecture)
For additional reading on LQ control, see, for example,
• [72], chapter 5
• [50], chapter 4
• [56], section 3.5
In order to focus on computation, we leave longer proofs to these sources (while trying to provide
as much intuition as possible).
Let’s start with some imports:

In [2]: import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from quantecon import LQ


40.3 Introduction

The “linear” part of LQ is a linear law of motion for the state, while the “quadratic” part
refers to preferences.
Let’s begin with the former, move on to the latter, and then put them together into an
optimization problem.

40.3.1 The Law of Motion

Let 𝑥𝑡 be a vector describing the state of some economic system.
Suppose that 𝑥𝑡 follows a linear law of motion given by

$$
x_{t+1} = A x_t + B u_t + C w_{t+1}, \qquad t = 0, 1, 2, \ldots \tag{1}
$$

Here
• 𝑢𝑡 is a “control” vector, incorporating choices available to a decision-maker confronting
the current state 𝑥𝑡
• {𝑤𝑡 } is an uncorrelated zero mean shock process satisfying 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼, where the right-
hand side is the identity matrix
Regarding the dimensions
• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛
• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗

Example 1

Consider a household budget constraint given by

$$
a_{t+1} + c_t = (1 + r) a_t + y_t
$$

Here 𝑎𝑡 is assets, 𝑟 is a fixed interest rate, 𝑐𝑡 is current consumption, and 𝑦𝑡 is current
non-financial income.
If we suppose that {𝑦𝑡} is serially uncorrelated and 𝑁(0, 𝜎²), then, taking {𝑤𝑡} to be
standard normal, we can write the system as

$$
a_{t+1} = (1 + r) a_t - c_t + \sigma w_{t+1}
$$

This is clearly a special case of (1), with assets being the state and consumption being the
control.

Example 2

One unrealistic feature of the previous model is that non-financial income has a zero mean
and is often negative.
This can easily be overcome by adding a sufficiently large mean.
Hence in this example, we take 𝑦𝑡 = 𝜎𝑤𝑡+1 + 𝜇 for some positive real number 𝜇.
Another alteration that’s useful to introduce (we’ll see why soon) is to change the control
variable from consumption to the deviation of consumption from some “ideal” quantity 𝑐̄.
(Most parameterizations will be such that 𝑐̄ is large relative to the amount of consumption
that is attainable in each period, and hence the household wants to increase consumption.)
For this reason, we now take our control to be 𝑢𝑡 ∶= 𝑐𝑡 − 𝑐̄.
In terms of these variables, the budget constraint 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡 becomes

$$
a_{t+1} = (1 + r) a_t - u_t - \bar c + \sigma w_{t+1} + \mu \tag{2}
$$

How can we write this new system in the form of equation (1)?
If, as in the previous example, we take 𝑎𝑡 as the state, then we run into a problem: the law of
motion contains some constant terms on the right-hand side.
This means that we are dealing with an affine function, not a linear one (recall this discussion).
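Fortunately, we can easily circumvent this problem by adding an extra state variable.
In particular, if we write

$$
\begin{pmatrix} a_{t+1} \\ 1 \end{pmatrix}
=
\begin{pmatrix} 1 + r & -\bar c + \mu \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} a_t \\ 1 \end{pmatrix}
+
\begin{pmatrix} -1 \\ 0 \end{pmatrix} u_t
+
\begin{pmatrix} \sigma \\ 0 \end{pmatrix} w_{t+1}
\tag{3}
$$

then the first row is equivalent to (2).
Moreover, the model is now linear and can be written in the form of (1) by setting

$$
x_t := \begin{pmatrix} a_t \\ 1 \end{pmatrix}, \quad
A := \begin{pmatrix} 1 + r & -\bar c + \mu \\ 0 & 1 \end{pmatrix}, \quad
B := \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \quad
C := \begin{pmatrix} \sigma \\ 0 \end{pmatrix}
\tag{4}
$$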

In effect, we’ve bought ourselves linearity by adding another state.
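As a quick numerical check of (3) and (4), the sketch below builds 𝐴, 𝐵 and 𝐶 for Example 2 with NumPy. It then confirms that one step of the augmented system reproduces the scalar budget constraint (2). The parameter values for 𝑟, 𝑐̄, 𝜇 and 𝜎 are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical parameter values, for illustration only
r, c_bar, μ, σ = 0.05, 2.0, 1.0, 0.25

A = np.array([[1 + r, -c_bar + μ],
              [0.0,   1.0]])
B = np.array([[-1.0],
              [0.0]])
C = np.array([[σ],
              [0.0]])

def step(a, u, w):
    """One step of x_{t+1} = A x_t + B u_t + C w_{t+1} with x_t = (a_t, 1)'."""
    x = np.array([[a], [1.0]])
    return A @ x + B * u + C * w

x_next = step(a=10.0, u=0.3, w=0.7)
a_direct = (1 + r) * 10.0 - 0.3 - c_bar + σ * 0.7 + μ   # right-hand side of (2)
print(x_next.ravel(), a_direct)
```

The constant second state never changes. That is what lets the affine term −𝑐̄ + 𝜇 enter through the matrix 𝐴.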

40.3.2 Preferences

In the LQ model, the aim is to minimize a flow of losses, where time-𝑡 loss is given by the
quadratic expression

$$
x_t' R x_t + u_t' Q u_t \tag{5}
$$

Here
• 𝑅 is assumed to be 𝑛 × 𝑛, symmetric and nonnegative definite.
• 𝑄 is assumed to be 𝑘 × 𝑘, symmetric and positive definite.

Note
In fact, for many economic problems, the definiteness conditions on 𝑅 and 𝑄 can
be relaxed. It is sufficient that certain submatrices of 𝑅 and 𝑄 be nonnegative
definite. See [50] for details.

Example 1

A very simple example that satisfies these assumptions is to take 𝑅 and 𝑄 to be identity matrices
so that current loss is

$$
x_t' I x_t + u_t' I u_t = \| x_t \|^2 + \| u_t \|^2
$$

Thus, for both the state and the control, loss is measured as squared distance from the origin.
(In fact, the general case (5) can also be understood in this way, but with 𝑅 and 𝑄 identifying
other, non-Euclidean, notions of “distance” from the zero vector.)
Intuitively, we can often think of the state 𝑥𝑡 as representing deviation from a target, such as
• deviation of inflation from some target level
• deviation of a firm’s capital stock from some desired quantity
The aim is to put the state close to the target, while using controls parsimoniously.
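As a minimal sketch, the time-𝑡 loss (5) is a sum of two quadratic forms. With identity matrices it reduces to squared Euclidean norms. A non-identity 𝑅 simply reweights the components of the state. The helper name `period_loss` and the numbers are hypothetical.

```python
import numpy as np

def period_loss(x, u, R, Q):
    """Time-t loss x'Rx + u'Qu for a state vector x (length n) and control u (length k)."""
    return x @ R @ x + u @ Q @ u

x = np.array([1.0, -2.0])   # state: deviations from target
u = np.array([0.5])         # control

loss_identity = period_loss(x, u, np.eye(2), np.eye(1))             # 1 + 4 + 0.25
loss_weighted = period_loss(x, u, np.diag([10.0, 1.0]), np.eye(1))  # 10*1 + 4 + 0.25
print(loss_identity, loss_weighted)   # 5.25 14.25
```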

Example 2

In the household problem studied above, setting 𝑅 = 0 and 𝑄 = 1 yields preferences

$$
x_t' R x_t + u_t' Q u_t = u_t^2 = (c_t - \bar c)^2
$$

Under this specification, the household’s current loss is the squared deviation of consumption
from the ideal level 𝑐̄.

40.4 Optimality – Finite Horizon

Let’s now be precise about the optimization problem we wish to consider, and look at how to
solve it.

40.4.1 The Objective

We will begin with the finite horizon case, with terminal time 𝑇 ∈ ℕ.
In this case, the aim is to choose a sequence of controls {𝑢0, … , 𝑢𝑇−1} to minimize the objective

$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t) + \beta^T x_T' R_f x_T \right\} \tag{6}
$$

subject to the law of motion (1) and initial state 𝑥0.


The new objects introduced here are 𝛽 and the matrix 𝑅𝑓 .
The scalar 𝛽 is the discount factor, while 𝑥′ 𝑅𝑓 𝑥 gives terminal loss associated with state 𝑥.
Comments:
• We assume 𝑅𝑓 to be 𝑛 × 𝑛, symmetric and nonnegative definite.
• We allow 𝛽 = 1, and hence include the undiscounted case.
• 𝑥0 may itself be random, in which case we require it to be independent of the shock sequence
𝑤1, … , 𝑤𝑇.

40.4.2 Information

There’s one constraint we’ve neglected to mention so far, which is that the decision-maker
who solves this LQ problem knows only the present and the past, not the future.
To clarify this point, consider the sequence of controls {𝑢0 , … , 𝑢𝑇 −1 }.
When choosing these controls, the decision-maker is permitted to take into account the effects
of the shocks {𝑤1 , … , 𝑤𝑇 } on the system.
However, it is typically assumed — and will be assumed here — that the time-𝑡 control 𝑢𝑡
can be made with knowledge of past and present shocks only.
The fancy measure-theoretic way of saying this is that 𝑢𝑡 must be measurable with respect to
the 𝜎-algebra generated by 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 .
This is in fact equivalent to stating that 𝑢𝑡 can be written in the form 𝑢𝑡 =
𝑔𝑡 (𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 ) for some Borel measurable function 𝑔𝑡 .
(Just about every function that’s useful for applications is Borel measurable, so, for the purposes
of intuition, you can read that last phrase as “for some function 𝑔𝑡”.)
Now note that 𝑥𝑡 will ultimately depend on the realizations of 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 .
In fact, it turns out that 𝑥𝑡 summarizes all the information about these historical shocks that
the decision-maker needs to set controls optimally.
More precisely, it can be shown that any optimal control 𝑢𝑡 can always be written as a function
of the current state alone.
Hence in what follows we restrict attention to control policies (i.e., functions) of the form
𝑢𝑡 = 𝑔𝑡(𝑥𝑡).
Actually, the preceding discussion applies to all standard dynamic programming problems.
What’s special about the LQ case is that, as we shall soon see, the optimal 𝑢𝑡 turns out
to be a linear function of 𝑥𝑡.

40.4.3 Solution

To solve the finite horizon LQ problem we can use a dynamic programming strategy based on
backward induction that is conceptually similar to the approach adopted in this lecture.
For reasons that will soon become clear, we first introduce the notation 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥.
Now consider the problem of the decision-maker in the second to last period.
In particular, let the time be 𝑇 − 1, and suppose that the state is 𝑥𝑇 −1 .
The decision-maker must trade off current and (discounted) final losses, and hence solves

$$
\min_u \left\{ x_{T-1}' R x_{T-1} + u' Q u + \beta \, \mathbb{E} J_T(A x_{T-1} + B u + C w_T) \right\}
$$

At this stage, it is convenient to define the function

$$
J_{T-1}(x) = \min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} J_T(A x + B u + C w_T) \right\} \tag{7}
$$

The function 𝐽𝑇 −1 will be called the 𝑇 − 1 value function, and 𝐽𝑇 −1 (𝑥) can be thought of as
representing total “loss-to-go” from state 𝑥 at time 𝑇 − 1 when the decision-maker behaves
optimally.
Now let’s step back to 𝑇 − 2.
For a decision-maker at 𝑇 − 2, the value 𝐽𝑇 −1 (𝑥) plays a role analogous to that played by the
terminal loss 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥 for the decision-maker at 𝑇 − 1.
That is, 𝐽𝑇 −1 (𝑥) summarizes the future loss associated with moving to state 𝑥.
The decision-maker chooses her control 𝑢 to trade off current loss against future loss, where
• the next period state is 𝑥𝑇 −1 = 𝐴𝑥𝑇 −2 + 𝐵𝑢 + 𝐶𝑤𝑇 −1 , and hence depends on the choice
of current control.
• the “cost” of landing in state 𝑥𝑇 −1 is 𝐽𝑇 −1 (𝑥𝑇 −1 ).
Her problem is therefore

$$
\min_u \left\{ x_{T-2}' R x_{T-2} + u' Q u + \beta \, \mathbb{E} J_{T-1}(A x_{T-2} + B u + C w_{T-1}) \right\}
$$

Letting

$$
J_{T-2}(x) = \min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} J_{T-1}(A x + B u + C w_{T-1}) \right\}
$$

the pattern for backward induction is now clear.
In particular, we define a sequence of value functions {𝐽0, … , 𝐽𝑇} via

$$
J_{t-1}(x) = \min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} J_t(A x + B u + C w_t) \right\}
\quad \text{and} \quad
J_T(x) = x' R_f x
$$

The first equality is the Bellman equation from dynamic programming theory specialized to
the finite horizon LQ problem.
Now that we have {𝐽0 , … , 𝐽𝑇 }, we can obtain the optimal controls.
As a first step, let’s find out what the value functions look like.
It turns out that every 𝐽𝑡 has the form 𝐽𝑡(𝑥) = 𝑥′𝑃𝑡𝑥 + 𝑑𝑡 where 𝑃𝑡 is an 𝑛 × 𝑛 matrix and 𝑑𝑡
is a constant.
We can show this by induction, starting from 𝑃𝑇 ∶= 𝑅𝑓 and 𝑑𝑇 = 0.
Using this notation, (7) becomes

$$
J_{T-1}(x) = \min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} (A x + B u + C w_T)' P_T (A x + B u + C w_T) \right\} \tag{8}
$$

To obtain the minimizer, we can take the derivative of the r.h.s. with respect to 𝑢 and set it
equal to zero.
Applying the relevant rules of matrix calculus, this gives

$$
u = -(Q + \beta B' P_T B)^{-1} \beta B' P_T A x \tag{9}
$$

Plugging this back into (8) and rearranging yields

$$
J_{T-1}(x) = x' P_{T-1} x + d_{T-1}
$$

where

$$
P_{T-1} = R - \beta^2 A' P_T B (Q + \beta B' P_T B)^{-1} B' P_T A + \beta A' P_T A \tag{10}
$$

and

$$
d_{T-1} := \beta \operatorname{trace}(C' P_T C) \tag{11}
$$

(The algebra is a good exercise; we'll leave it up to you.)
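The algebra behind (9), (10) and (11) can at least be spot-checked numerically. The sketch below is a minimal check, assuming NumPy and arbitrary randomly generated matrices (none of this data comes from the lecture). It evaluates the expression inside the braces of (8), replacing the expectation with a trace term under the assumption that the shocks have zero mean and identity covariance, and confirms that at the minimizer (9) its value equals 𝑥′𝑃𝑇−1𝑥 + 𝑑𝑇−1.

```python
import numpy as np

# Hypothetical problem data, chosen only for illustration
rng = np.random.default_rng(0)
n, k, j = 3, 2, 2
beta = 0.95
A = 0.5 * rng.normal(size=(n, n))
B = rng.normal(size=(n, k))
C = rng.normal(size=(n, j))
R = np.eye(n)
Q = np.eye(k)
P_T = np.eye(n)                 # terminal matrix, playing the role of R_f
x = rng.normal(size=n)

def rhs(u):
    """Expression inside the braces of (8), assuming E[w] = 0 and E[w w'] = I."""
    y = A @ x + B @ u
    return x @ R @ x + u @ Q @ u + beta * (y @ P_T @ y + np.trace(C.T @ P_T @ C))

# Minimizer from (9)
G = Q + beta * B.T @ P_T @ B
u_star = -np.linalg.solve(G, beta * B.T @ P_T @ A @ x)

# P_{T-1} from (10) and d_{T-1} from (11)
P_prev = (R - beta**2 * A.T @ P_T @ B @ np.linalg.solve(G, B.T @ P_T @ A)
          + beta * A.T @ P_T @ A)
d_prev = beta * np.trace(C.T @ P_T @ C)

print(rhs(u_star), x @ P_prev @ x + d_prev)   # the two numbers should agree
```

Perturbing `u_star` in any direction should only increase `rhs`, since the objective is a convex quadratic in 𝑢.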


If we continue working backwards in this manner, it soon becomes clear that 𝐽𝑡(𝑥) = 𝑥′𝑃𝑡𝑥 + 𝑑𝑡 as claimed, where {𝑃𝑡} and {𝑑𝑡} satisfy the recursions

$$P_{t-1} = R - \beta^2 A' P_t B (Q + \beta B' P_t B)^{-1} B' P_t A + \beta A' P_t A \quad \text{with} \quad P_T = R_f \tag{12}$$

and

$$d_{t-1} = \beta \left( d_t + \operatorname{trace}(C' P_t C) \right) \quad \text{with} \quad d_T = 0 \tag{13}$$

Recalling (9), the minimizers from these backward steps are

$$u_t = -F_t x_t \quad \text{where} \quad F_t := (Q + \beta B' P_{t+1} B)^{-1} \beta B' P_{t+1} A \tag{14}$$


640 CHAPTER 40. LQ CONTROL: FOUNDATIONS

These are the linear optimal control policies we discussed above.


In particular, the sequence of controls given by (14) and (1) solves our finite horizon LQ
problem.
Rephrasing this more precisely, the sequence 𝑢0 , … , 𝑢𝑇 −1 given by

$$u_t = -F_t x_t \quad \text{with} \quad x_{t+1} = (A - B F_t) x_t + C w_{t+1} \tag{15}$$

for 𝑡 = 0, … , 𝑇 − 1 attains the minimum of (6) subject to our constraints.
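The backward recursions (12), (13) and (14) and the forward simulation (15) take only a few lines of NumPy. What follows is a minimal sketch, not the QuantEcon.py implementation: the function names `solve_finite_lq` and `simulate` are hypothetical, shocks are assumed standard normal, and the usage example is a deterministic (𝐶 = 0) version of the savings problem treated in the application, chosen so that the terminal asset level can be checked.

```python
import numpy as np

def solve_finite_lq(A, B, C, R, Q, Rf, beta, T):
    """Backward induction via (12), (13) and (14).

    Returns lists P[t], d[t] for t = 0, ..., T and F[t] for t = 0, ..., T-1.
    """
    P = [None] * (T + 1)
    d = [None] * (T + 1)
    F = [None] * T
    P[T], d[T] = Rf, 0.0
    for t in range(T, 0, -1):
        G = Q + beta * B.T @ P[t] @ B
        F[t - 1] = np.linalg.solve(G, beta * B.T @ P[t] @ A)                  # (14)
        P[t - 1] = (R - beta**2 * A.T @ P[t] @ B @ np.linalg.solve(G, B.T @ P[t] @ A)
                    + beta * A.T @ P[t] @ A)                                   # (12)
        d[t - 1] = beta * (d[t] + np.trace(C.T @ P[t] @ C))                    # (13)
    return P, d, F

def simulate(A, B, C, F, x0, rng):
    """Simulate (15): u_t = -F_t x_t and x_{t+1} = A x_t + B u_t + C w_{t+1}."""
    T = len(F)
    x = np.empty((len(x0), T + 1))
    u = np.empty((B.shape[1], T))
    x[:, 0] = x0
    for t in range(T):
        u[:, t] = -F[t] @ x[:, t]
        w = rng.standard_normal(C.shape[1])
        x[:, t + 1] = A @ x[:, t] + B @ u[:, t] + C @ w
    return x, u

# Usage: deterministic savings problem with r = 0.05, c_bar = 2, mu = 1, T = 45, q = 1e6
r, T, q = 0.05, 45, 1e6
beta = 1 / (1 + r)
A = np.array([[1 + r, -2.0 + 1.0], [0.0, 1.0]])
B = np.array([[-1.0], [0.0]])
C = np.zeros((2, 1))
R, Q = np.zeros((2, 2)), np.eye(1)
Rf = np.array([[q, 0.0], [0.0, 0.0]])

P, d, F = solve_finite_lq(A, B, C, R, Q, Rf, beta, T)
x, u = simulate(A, B, C, F, np.array([0.0, 1.0]), np.random.default_rng(0))
print(x[0, -1])   # terminal assets: close to zero because q is large
```

Storing the whole sequences {𝑃𝑡}, {𝑑𝑡}, {𝐹𝑡} keeps the sketch close to the equations; a production implementation might instead update them in place, one step at a time.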

40.5 Implementation

We will use code from lqcontrol.py in QuantEcon.py to solve finite and infinite horizon linear
quadratic control problems.
In the module, the various updating, simulation and fixed point methods are wrapped in a
class called LQ, which includes
• Instance data:
  – The required parameters 𝑄, 𝑅, 𝐴, 𝐵 and optional parameters C, β, T, R_f, N specifying a given LQ model
    * set 𝑇 and 𝑅𝑓 to None in the infinite horizon case
    * set C = None (or zero) in the deterministic case
  – the value function and policy data
    * 𝑑𝑡, 𝑃𝑡, 𝐹𝑡 in the finite horizon case
    * 𝑑, 𝑃, 𝐹 in the infinite horizon case
• Methods:
  – update_values: shifts 𝑑𝑡, 𝑃𝑡, 𝐹𝑡 to their 𝑡 − 1 values via (12), (13) and (14)
  – stationary_values: computes 𝑃, 𝑑, 𝐹 in the infinite horizon case
  – compute_sequence: simulates the dynamics of 𝑥𝑡, 𝑢𝑡, 𝑤𝑡 given 𝑥0 and assuming standard normal shocks

40.5.1 An Application

Early Keynesian models assumed that households have a constant marginal propensity to
consume from current income.
Data contradicted the constancy of the marginal propensity to consume.
In response, Milton Friedman, Franco Modigliani and others built models based on a con-
sumer’s preference for an intertemporally smooth consumption stream.
(See, for example, [38] or [82])
One property of those models is that households purchase and sell financial assets to make
consumption streams smoother than income streams.
The household savings problem outlined above captures these ideas.
The optimization problem for the household is to choose a consumption sequence in order to minimize

$$\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar c)^2 + \beta^T q a_T^2 \right\} \tag{16}$$

subject to the sequence of budget constraints 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡, 𝑡 ≥ 0.


Here 𝑞 is a large positive constant, the role of which is to induce the consumer to target zero debt at the end of her life.
(Without such a constraint, the optimal choice is to choose 𝑐𝑡 = 𝑐̄ in each period, letting assets adjust accordingly.)
As before we set 𝑦𝑡 = 𝜎𝑤𝑡+1 + 𝜇 and 𝑢𝑡 ∶= 𝑐𝑡 − 𝑐̄, after which the constraint can be written as in (2).
We saw how this constraint could be manipulated into the LQ formulation 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 by setting 𝑥𝑡 = (𝑎𝑡 1)′ and using the definitions in (4).
To match with this state and control, the objective function (16) can be written in the form of (6) by choosing

$$Q := 1, \quad R := \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{and} \quad R_f := \begin{pmatrix} q & 0 \\ 0 & 0 \end{pmatrix}$$
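To see that these matrices really encode the savings problem, one can check numerically that the LQ law of motion reproduces the budget constraint and that the quadratic forms reproduce the period and terminal losses in (16). This is a minimal sketch assuming NumPy; the parameter values are those used for the simulation in this section, and the helper names `next_assets_direct` and `next_assets_lq` are hypothetical.

```python
import numpy as np

# Parameter values used in this application
r, c_bar, mu, sigma, q = 0.05, 2.0, 1.0, 0.25, 1e6
A = np.array([[1 + r, -c_bar + mu], [0.0, 1.0]])
B = np.array([[-1.0], [0.0]])
C = np.array([[sigma], [0.0]])
Q, R = np.eye(1), np.zeros((2, 2))
Rf = np.array([[q, 0.0], [0.0, 0.0]])

def next_assets_direct(a, c, w):
    """Budget constraint a_{t+1} = (1 + r) a_t - c_t + y_t with y_t = sigma w_{t+1} + mu."""
    return (1 + r) * a - c + sigma * w + mu

def next_assets_lq(a, c, w):
    """Same update via x_{t+1} = A x_t + B u_t + C w_{t+1}, with x = (a, 1) and u = c - c_bar."""
    x, u = np.array([a, 1.0]), np.array([c - c_bar])
    return (A @ x + B @ u + C @ np.array([w]))[0]

a, c, w = 3.0, 2.5, 0.7                      # arbitrary test values
print(next_assets_direct(a, c, w), next_assets_lq(a, c, w))   # should agree
x, u = np.array([a, 1.0]), np.array([c - c_bar])
print(x @ R @ x + u @ Q @ u, (c - c_bar) ** 2)                  # period loss
print(x @ Rf @ x, q * a ** 2)                                   # terminal loss
```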

Now that the problem is expressed in LQ form, we can proceed to the solution by applying (12) and (14).
After generating shocks 𝑤1, … , 𝑤𝑇, the dynamics for assets and consumption can be simulated via (15).
The following figure was computed using 𝑟 = 0.05, 𝛽 = 1/(1 + 𝑟), 𝑐̄ = 2, 𝜇 = 1, 𝜎 = 0.25, 𝑇 = 45 and 𝑞 = 10⁶.
The shocks {𝑤𝑡} were taken to be IID and standard normal.

In [3]: import numpy as np
import matplotlib.pyplot as plt
from quantecon import LQ

# Model parameters
r = 0.05
β = 1/(1 + r)
T = 45
c_bar = 2
σ = 0.25
μ = 1
q = 1e6

# Formulate as an LQ problem
Q = 1
R = np.zeros((2, 2))
Rf = np.zeros((2, 2))
Rf[0, 0] = q
A = [[1 + r, -c_bar + μ],
     [0, 1]]
B = [[-1],
     [ 0]]
C = [[σ],
     [0]]

# Compute solutions and simulate
lq = LQ(Q, R, A, B, C, beta=β, T=T, Rf=Rf)
x0 = (0, 1)
xp, up, wp = lq.compute_sequence(x0)

# Convert back to assets, consumption and income
assets = xp[0, :]           # a_t
c = up.flatten() + c_bar    # c_t
income = σ * wp[0, 1:] + μ  # y_t

# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))

plt.subplots_adjust(hspace=0.5)

bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}

axes[0].plot(list(range(1, T+1)), income, 'g-', label="non-financial income",
             **p_args)
axes[0].plot(list(range(T)), c, 'k-', label="consumption", **p_args)

axes[1].plot(list(range(1, T+1)), np.cumsum(income - μ), 'r-',
             label="cumulative unanticipated income", **p_args)
axes[1].plot(list(range(T+1)), assets, 'b-', label="assets", **p_args)
axes[1].plot(list(range(T)), np.zeros(T), 'k-')

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()

The top panel shows the time path of consumption 𝑐𝑡 and income 𝑦𝑡 in the simulation.
As anticipated by the discussion on consumption smoothing, the time path of consumption is
much smoother than that for income.
(But note that consumption becomes more irregular towards the end of life, when the zero
final asset requirement impinges more on consumption choices).
The second panel in the figure shows that the time path of assets 𝑎𝑡 is closely correlated with cumulative unanticipated income, where the latter is defined as

$$z_t := \sum_{j=0}^{t} \sigma w_j$$

A key message is that unanticipated windfall gains are saved rather than consumed, while
unanticipated negative shocks are met by reducing assets.
(Again, this relationship breaks down towards the end of life due to the zero final asset requirement.)
These results are relatively robust to changes in parameters.
For example, let’s increase 𝛽 from 1/(1 + 𝑟) ≈ 0.952 to 0.96 while keeping other parameters
fixed.
This consumer is slightly more patient than the last one, and hence puts relatively more
weight on later consumption values.

In [4]: # Reuses Q, R, A, B, C, T, Rf, σ, μ and c_bar from the previous cell
# Compute solutions and simulate
lq = LQ(Q, R, A, B, C, beta=0.96, T=T, Rf=Rf)
x0 = (0, 1)
xp, up, wp = lq.compute_sequence(x0)

# Convert back to assets, consumption and income
assets = xp[0, :]           # a_t
c = up.flatten() + c_bar    # c_t
income = σ * wp[0, 1:] + μ  # y_t

# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))

plt.subplots_adjust(hspace=0.5)

bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}

axes[0].plot(list(range(1, T+1)), income, 'g-', label="non-financial income",
             **p_args)
axes[0].plot(list(range(T)), c, 'k-', label="consumption", **p_args)

axes[1].plot(list(range(1, T+1)), np.cumsum(income - μ), 'r-',
             label="cumulative unanticipated income", **p_args)
axes[1].plot(list(range(T+1)), assets, 'b-', label="assets", **p_args)
axes[1].plot(list(range(T)), np.zeros(T), 'k-')

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()

We now have a slowly rising consumption stream and a hump-shaped build-up of assets in the
middle periods to fund rising consumption.
However, the essential features are the same: consumption is smooth relative to income, and
assets are strongly positively correlated with cumulative unanticipated income.

40.6 Extensions and Comments

Let’s now consider a number of standard extensions to the LQ problem treated above.

40.6.1 Time-Varying Parameters

In some settings, it can be desirable to allow 𝐴, 𝐵, 𝐶, 𝑅 and 𝑄 to depend on 𝑡.


For the sake of simplicity, we’ve chosen not to treat this extension in our implementation
given below.
However, the loss of generality is not as large as you might first imagine.
In fact, we can tackle many models with time-varying parameters by suitable choice of state
variables.
One illustration is given below.

For further examples and a more systematic treatment, see [51], section 2.4.

40.6.2 Adding a Cross-Product Term

In some LQ problems, preferences include a cross-product term 𝑢′𝑡𝑁𝑥𝑡, so that the objective function becomes

$$\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) + \beta^T x_T' R_f x_T \right\} \tag{17}$$

Our results extend to this case in a straightforward way.


The sequence {𝑃𝑡 } from (12) becomes

$$P_{t-1} = R - (\beta B' P_t A + N)' (Q + \beta B' P_t B)^{-1} (\beta B' P_t A + N) + \beta A' P_t A \quad \text{with} \quad P_T = R_f \tag{18}$$

The policies in (14) are modified to

$$u_t = -F_t x_t \quad \text{where} \quad F_t := (Q + \beta B' P_{t+1} B)^{-1} (\beta B' P_{t+1} A + N) \tag{19}$$

The sequence {𝑑𝑡 } is unchanged from (13).


We leave it to interested readers to confirm these results (the calculations are long but not overly difficult).
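One way to gain confidence in (18) and (19) without doing the full algebra is to compare a single backward step against a direct minimization of the one-period problem with the cross-product term. The sketch below assumes NumPy, uses hypothetical random data, and omits the shock term because, as stated above, {𝑑𝑡} is unchanged. The helper name `riccati_step` is illustrative only.

```python
import numpy as np

def riccati_step(P, A, B, R, Q, N, beta):
    """One backward step of (18) and (19): returns (P_{t-1}, F_{t-1}) given P_t."""
    G = Q + beta * B.T @ P @ B
    H = beta * B.T @ P @ A + N                  # the term (beta B' P_t A + N)
    F = np.linalg.solve(G, H)                   # (19)
    P_prev = R - H.T @ F + beta * A.T @ P @ A   # (18), since H' G^{-1} H = H' F
    return P_prev, F

# Illustrative check with hypothetical random data
rng = np.random.default_rng(1)
n, k, beta = 3, 2, 0.9
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, k))
R, Q, P = np.eye(n), np.eye(k), np.eye(n)
N = 0.3 * rng.normal(size=(k, n))
x = rng.normal(size=n)

def one_period(u):
    """Deterministic part of the one-step problem with the cross-product term."""
    y = A @ x + B @ u
    return x @ R @ x + u @ Q @ u + 2 * u @ N @ x + beta * y @ P @ y

P_prev, F = riccati_step(P, A, B, R, Q, N, beta)
print(one_period(-F @ x), x @ P_prev @ x)   # should agree
```

Setting `N` to zero should recover (12) and (14), which gives a second sanity check.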

40.6.3 Infinite Horizon

Finally, we consider the infinite horizon case, with cross-product term, unchanged dynamics and objective function given by

$$\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) \right\} \tag{20}$$

In the infinite horizon case, optimal policies can depend on time only if time itself is a component of the state vector 𝑥𝑡.
In other words, there exists a fixed matrix 𝐹 such that 𝑢𝑡 = −𝐹𝑥𝑡 for all 𝑡.
That decision rules are constant over time is intuitive: after all, the decision-maker faces the same infinite horizon at every stage, with only the current state changing.
Not surprisingly, 𝑃 and 𝑑 are also constant.
The stationary matrix 𝑃 is the solution to the discrete-time algebraic Riccati equation.

$$P = R - (\beta B' P A + N)' (Q + \beta B' P B)^{-1} (\beta B' P A + N) + \beta A' P A \tag{21}$$

Equation (21) is also called the LQ Bellman equation, and the map that sends a given 𝑃 into
the right-hand side of (21) is called the LQ Bellman operator.
The stationary optimal policy for this model is

$$u = -F x \quad \text{where} \quad F = (Q + \beta B' P B)^{-1} (\beta B' P A + N) \tag{22}$$

The sequence {𝑑𝑡} from (13) is replaced by the constant value

$$d := \frac{\beta}{1 - \beta} \operatorname{trace}(C' P C) \tag{23}$$

The state evolves according to the time-homogeneous process 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 )𝑥𝑡 + 𝐶𝑤𝑡+1 .
An example infinite horizon problem is treated below.
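A simple, if not the fastest, way to solve (21) is to iterate the LQ Bellman operator until it settles down, and then read off 𝐹 from (22) and 𝑑 from (23). The sketch below assumes NumPy, starts from 𝑃 = 0, and assumes the iteration converges (which holds for well-posed problems like the scalar usage example, but is not guaranteed in general). The function name `stationary_lq` is hypothetical; QuantEcon.py's own `stationary_values` method may use a different algorithm.

```python
import numpy as np

def stationary_lq(A, B, C, R, Q, N, beta, tol=1e-10, max_iter=10_000):
    """Iterate the LQ Bellman operator in (21) starting from P = 0.

    Returns P, the policy matrix F from (22) and the constant d from (23).
    """
    P = np.zeros_like(R, dtype=float)
    for _ in range(max_iter):
        G = Q + beta * B.T @ P @ B
        H = beta * B.T @ P @ A + N                      # (beta B'PA + N)
        P_new = R - H.T @ np.linalg.solve(G, H) + beta * A.T @ P @ A
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    else:
        raise RuntimeError("LQ Bellman iteration did not converge")
    F = np.linalg.solve(Q + beta * B.T @ P @ B, beta * B.T @ P @ A + N)
    d = beta / (1 - beta) * np.trace(C.T @ P @ C)
    return P, F, d

# Usage: a scalar example with A = B = R = Q = 1, N = 0 and C = 0.5
A = B = R = Q = np.array([[1.0]])
N = np.array([[0.0]])
C = np.array([[0.5]])
beta = 0.95
P, F, d = stationary_lq(A, B, C, R, Q, N, beta)
print(P, F, d)
```

In this scalar case (21) reduces to the quadratic 𝛽𝑃² + (1 − 2𝛽)𝑃 − 1 = 0, whose positive root gives an independent check on the iteration.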

40.6.4 Certainty Equivalence

Linear quadratic control problems of the class discussed above have the property of certainty
equivalence.
By this, we mean that the optimal policy 𝐹 is not affected by the parameters in 𝐶, which
specify the shock process.
This can be confirmed by inspecting (22) or (19).
It follows that we can ignore uncertainty when solving for optimal behavior, and plug it back
in when examining optimal state dynamics.
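Certainty equivalence is easy to see in code: 𝐶 never enters the recursion for 𝑃, so it cannot affect 𝐹, while it does scale the constant 𝑑 in (23). The following minimal sketch assumes NumPy, sets 𝑁 = 0, iterates (21) a fixed number of times (assumed sufficient for convergence in this stable example), and uses a hypothetical two-state model in which only the shock loading 𝐶 differs between the two runs.

```python
import numpy as np

def policy_and_constant(A, B, C, R, Q, beta, n_iter=2000):
    """Iterate (21) with N = 0; return F from (22) and d from (23)."""
    P = np.zeros_like(R, dtype=float)
    for _ in range(n_iter):
        G = Q + beta * B.T @ P @ B
        P = R - beta**2 * A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A) + beta * A.T @ P @ A
    F = np.linalg.solve(Q + beta * B.T @ P @ B, beta * B.T @ P @ A)
    d = beta / (1 - beta) * np.trace(C.T @ P @ C)
    return F, d

# Hypothetical two-state example: only the shock loading C differs across runs
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[1.0], [0.5]])
R, Q, beta = np.eye(2), np.eye(1), 0.95
F_small, d_small = policy_and_constant(A, B, 0.1 * np.eye(2), R, Q, beta)
F_large, d_large = policy_and_constant(A, B, 5.0 * np.eye(2), R, Q, beta)
print(np.allclose(F_small, F_large), d_small < d_large)   # True True
```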

40.7 Further Applications

40.7.1 Application 1: Age-Dependent Income Process

Previously we studied a permanent income model that generated consumption smoothing.


One unrealistic feature of that model is the assumption that the mean of the random income process does not depend on the consumer’s age.
A more realistic income profile is one that rises in early working life, peaks towards the middle, possibly declines toward the end of working life, and falls further during retirement.
In this section, we will model this rise and fall as a symmetric inverted “U” using a polynomial in age.
As before, the consumer seeks to minimize

$$\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar c)^2 + \beta^T q a_T^2 \right\} \tag{24}$$

subject to 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡, 𝑡 ≥ 0.


For income we now take 𝑦𝑡 = 𝑝(𝑡) + 𝜎𝑤𝑡+1 where 𝑝(𝑡) ∶= 𝑚0 + 𝑚1𝑡 + 𝑚2𝑡².
(In the next section we employ some tricks to implement a more sophisticated model.)
The coefficients 𝑚0, 𝑚1, 𝑚2 are chosen such that 𝑝(0) = 0, 𝑝(𝑇/2) = 𝜇, and 𝑝(𝑇) = 0.
You can confirm that the specification 𝑚0 = 0, 𝑚1 = 𝑇𝜇/(𝑇/2)², 𝑚2 = −𝜇/(𝑇/2)² satisfies these constraints.
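The confirmation can be done numerically in a couple of lines. This sketch assumes NumPy, plugs in 𝑇 = 50 and 𝜇 = 2 (the values from Exercise 1), and uses a hypothetical helper `income_trend_coeffs` that simply encodes the formulas above.

```python
import numpy as np

def income_trend_coeffs(T, mu):
    """m0, m1, m2 for p(t) = m0 + m1 t + m2 t^2, using the formulas in the text."""
    m0 = 0.0
    m1 = T * mu / (T / 2) ** 2
    m2 = -mu / (T / 2) ** 2
    return m0, m1, m2

T, mu = 50, 2.0
m0, m1, m2 = income_trend_coeffs(T, mu)
p = lambda t: m0 + m1 * t + m2 * t ** 2
print(p(0), p(T / 2), p(T))   # approximately 0, mu and 0
```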

To put this into an LQ setting, consider the budget constraint, which becomes

$$a_{t+1} = (1 + r) a_t - u_t - \bar c + m_1 t + m_2 t^2 + \sigma w_{t+1} \tag{25}$$

The fact that 𝑎𝑡+1 is a linear function of (𝑎𝑡, 1, 𝑡, 𝑡²) suggests taking these four variables as the state vector 𝑥𝑡.
Once a good choice of state and control (recall 𝑢𝑡 = 𝑐𝑡 − 𝑐̄) has been made, the remaining specifications fall into place relatively easily.
Thus, for the dynamics we set

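$$x_t := \begin{pmatrix} a_t \\ 1 \\ t \\ t^2 \end{pmatrix}, \quad A := \begin{pmatrix} 1 + r & -\bar c & m_1 & m_2 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 2 & 1 \end{pmatrix}, \quad B := \begin{pmatrix} -1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad C := \begin{pmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{26}$$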

If you expand the expression 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 using this specification, you will find
that assets follow (25) as desired and that the other state variables also update appropriately.
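Carrying out that expansion numerically is a quick sanity check. The sketch assumes NumPy, borrows the parameter values from Exercise 1 purely for illustration, and picks arbitrary values for 𝑎𝑡, 𝑡, 𝑢𝑡 and 𝑤𝑡+1. The last two rows of 𝐴 use (𝑡 + 1)² = 1 + 2𝑡 + 𝑡² to advance the clock.

```python
import numpy as np

# Parameters from Exercise 1 (used here only for illustration)
r, c_bar, sigma, T, mu = 0.05, 1.5, 0.15, 50, 2.0
m1, m2 = T * mu / (T / 2) ** 2, -mu / (T / 2) ** 2

A = np.array([[1 + r, -c_bar, m1, m2],
              [0, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 2, 1]])
B = np.array([[-1.0], [0.0], [0.0], [0.0]])
C = np.array([[sigma], [0.0], [0.0], [0.0]])

a, t, u, w = 1.3, 7, 0.4, -0.2          # arbitrary test values
x = np.array([a, 1.0, t, t ** 2])
x_next = A @ x + B @ np.array([u]) + C @ np.array([w])

a_next = (1 + r) * a - u - c_bar + m1 * t + m2 * t ** 2 + sigma * w   # (25)
print(np.allclose(x_next, [a_next, 1.0, t + 1, (t + 1) ** 2]))      # True
```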
To implement preference specification (24) we take

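$$Q := 1, \quad R := \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad R_f := \begin{pmatrix} q & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{27}$$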

The next figure shows a simulation of consumption and assets computed using the compute_sequence method of lqcontrol.py with initial assets set to zero.

Once again, smooth consumption is a dominant feature of the sample paths.


The asset path exhibits dynamics consistent with standard life cycle theory.
Exercise 1 gives the full set of parameters used here and asks you to replicate the figure.

40.7.2 Application 2: A Permanent Income Model with Retirement

In the previous application, we generated income dynamics with an inverted U shape using
polynomials and placed them in an LQ framework.
It is arguably the case that this income process still contains unrealistic features.
A more common earnings profile is one in which

1. income grows over working life, fluctuating around an increasing trend, with growth
flattening off in later years
2. retirement follows, with lower but relatively stable (non-financial) income

Letting 𝐾 be the retirement date, we can express these income dynamics by

$$y_t = \begin{cases} p(t) + \sigma w_{t+1} & \text{if } t \leq K \\ s & \text{otherwise} \end{cases} \tag{28}$$

Here
• 𝑝(𝑡) ∶= 𝑚1𝑡 + 𝑚2𝑡² with the coefficients 𝑚1, 𝑚2 chosen such that 𝑝(𝐾) = 𝜇 and 𝑝(0) = 𝑝(2𝐾) = 0
• 𝑠 is retirement income
We suppose that preferences are unchanged and given by (16).
The budget constraint is also unchanged and given by 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡 .
Our aim is to solve this problem and simulate paths using the LQ techniques described in this
lecture.
In fact, this is a nontrivial problem, as the kink in the dynamics (28) at 𝐾 makes it very difficult to express the law of motion as a fixed-coefficient linear system.
However, we can still use our LQ methods here by suitably linking two separate LQ problems.
These two LQ problems describe the consumer’s behavior during her working life (lq_working) and retirement (lq_retired).
(This is possible because, in the two separate periods of life, the respective income processes [polynomial trend and constant] each fit the LQ framework.)
The basic idea is that although the whole problem is not a single time-invariant LQ problem, it is still a dynamic programming problem, and hence we can use appropriate Bellman equations at every stage.
Based on this logic, we can

1. solve lq_retired by the usual backward induction procedure, iterating back to the start
of retirement.

2. take the start-of-retirement value function generated by this process, and use it as the
terminal condition 𝑅𝑓 to feed into the lq_working specification.

3. solve lq_working by backward induction from this choice of 𝑅𝑓 , iterating back to the
start of working life.

This process gives the entire life-time sequence of value functions and optimal policies.
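The three steps can be sketched with a plain backward recursion. This is a minimal sketch assuming NumPy, not the lecture's solution code (which uses the `LQ` class): the function `backward` and the names `A_ret`, `A_work`, `F_life` are hypothetical, the parameters are those from Exercise 2, and the 𝑑 recursion (13) is omitted because 𝑑 only shifts the value function by a constant and does not affect the policies.

```python
import numpy as np

def backward(A, B, R, Q, Rf, beta, n_steps):
    """Backward induction via (12) and (14); returns [P_0, ..., P_n] and [F_0, ..., F_{n-1}]."""
    P, F = [Rf], []
    for _ in range(n_steps):
        Pt = P[0]
        G = Q + beta * B.T @ Pt @ B
        F.insert(0, np.linalg.solve(G, beta * B.T @ Pt @ A))                     # (14)
        P.insert(0, R - beta**2 * A.T @ Pt @ B @ np.linalg.solve(G, B.T @ Pt @ A)
                 + beta * A.T @ Pt @ A)                                           # (12)
    return P, F

# Parameters from Exercise 2
r, c_bar, mu, K, T, s, q = 0.05, 4.0, 4.0, 40, 60, 1.0, 1e4
beta = 1 / (1 + r)
m1, m2 = 2 * mu / K, -mu / K**2

B = np.array([[-1.0], [0.0], [0.0], [0.0]])
Q, R = np.eye(1), np.zeros((4, 4))
Rf = np.zeros((4, 4))
Rf[0, 0] = q
clock = [[0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 2, 1]]     # rows that update 1, t and t^2
A_ret = np.array([[1 + r, s - c_bar, 0, 0]] + clock, dtype=float)
A_work = np.array([[1 + r, -c_bar, m1, m2]] + clock, dtype=float)

# Step 1: retirement problem, backward from T to the retirement date K
P_ret, F_ret = backward(A_ret, B, R, Q, Rf, beta, T - K)
# Steps 2 and 3: start-of-retirement P becomes the terminal matrix for working life
P_work, F_work = backward(A_work, B, R, Q, P_ret[0], beta, K)

F_life = F_work + F_ret        # policies F_0, ..., F_{T-1} over the whole life
print(len(F_life))             # 60
```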

The next figure shows one simulation based on this procedure.

The full set of parameters used in the simulation is discussed in Exercise 2, where you are
asked to replicate the figure.
Once again, the dominant feature observable in the simulation is consumption smoothing.
The asset path fits well with standard life cycle theory, with dissaving early in life followed by
later saving.
Assets peak at retirement and subsequently decline.

40.7.3 Application 3: Monopoly with Adjustment Costs

Consider a monopolist facing stochastic inverse demand function

𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡 + 𝑑𝑡

Here 𝑞𝑡 is output, and the demand shock 𝑑𝑡 follows

𝑑𝑡+1 = 𝜌𝑑𝑡 + 𝜎𝑤𝑡+1

where {𝑤𝑡 } is IID and standard normal.


The monopolist maximizes the expected discounted sum of present and future profits

$$\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t \pi_t \right\} \quad \text{where} \quad \pi_t := p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2 \tag{29}$$

Here
• 𝛾(𝑞𝑡+1 − 𝑞𝑡 )2 represents adjustment costs
• 𝑐 is average cost of production
This can be formulated as an LQ problem and then solved and simulated, but first let’s study
the problem and try to get some intuition.
One way to start thinking about the problem is to consider what would happen if 𝛾 = 0.
Without adjustment costs there is no intertemporal trade-off, so the monopolist will choose
output to maximize current profit in each period.
It’s not difficult to show that profit-maximizing output is

$$\bar q_t := \frac{a_0 - c + d_t}{2 a_1}$$
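A brute-force check of this formula: with 𝛾 = 0, current profit is (𝑎0 − 𝑎1𝑞 + 𝑑)𝑞 − 𝑐𝑞, and a fine grid search should land on 𝑞̄𝑡. The sketch assumes NumPy, uses the Exercise 3 values for 𝑎0, 𝑎1 and 𝑐, and picks an arbitrary demand shock 𝑑 = 0.3.

```python
import numpy as np

# Parameters from Exercise 3; d is an arbitrary value of the demand shock
a0, a1, c, d = 5.0, 0.5, 2.0, 0.3

def static_profit(q):
    """Current profit p q - c q when gamma = 0, with p = a0 - a1 q + d."""
    return (a0 - a1 * q + d) * q - c * q

q_bar = (a0 - c + d) / (2 * a1)
grid = np.linspace(0, 10, 100_001)
print(q_bar, grid[np.argmax(static_profit(grid))])   # both approximately 3.3
```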

In light of this discussion, what we might expect for general 𝛾 is that


• if 𝛾 is close to zero, then 𝑞𝑡 will track the time path of 𝑞𝑡̄ relatively closely.
• if 𝛾 is larger, then 𝑞𝑡 will be smoother than 𝑞̄𝑡, as the monopolist seeks to avoid adjustment costs.
This intuition turns out to be correct.
The following figures show simulations produced by solving the corresponding LQ problem.
The only difference in parameters across the figures is the size of 𝛾.

To produce these figures we converted the monopolist problem into an LQ problem.


The key to this conversion is to choose the right state — which can be a bit of an art.
Here we take 𝑥𝑡 = (𝑞𝑡̄ 𝑞𝑡 1)′ , while the control is chosen as 𝑢𝑡 = 𝑞𝑡+1 − 𝑞𝑡 .
We also manipulated the profit function slightly.
In (29), current profits are 𝜋𝑡 ∶= 𝑝𝑡 𝑞𝑡 − 𝑐𝑞𝑡 − 𝛾(𝑞𝑡+1 − 𝑞𝑡 )2 .
Let’s now replace 𝜋𝑡 in (29) with 𝜋̂𝑡 ∶= 𝜋𝑡 − 𝑎1𝑞̄𝑡².

This makes no difference to the solution, since 𝑎1𝑞̄𝑡² does not depend on the controls.
(In fact, we are just subtracting from (29) a term that the controls cannot influence, and optimizers are not affected by such terms.)
The reason for making this substitution is that, as you will be able to verify, 𝜋̂𝑡 reduces to the simple quadratic

$$\hat \pi_t = -a_1 (q_t - \bar q_t)^2 - \gamma u_t^2$$
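The verification rests on the identity 𝑎0 − 𝑐 + 𝑑𝑡 = 2𝑎1𝑞̄𝑡, which turns current profit into 2𝑎1𝑞̄𝑡𝑞𝑡 − 𝑎1𝑞𝑡² − 𝛾𝑢𝑡²; subtracting 𝑎1𝑞̄𝑡² completes the square. A numerical spot check, assuming NumPy and hypothetical values for 𝛾, 𝑑𝑡, 𝑞𝑡 and 𝑢𝑡:

```python
import numpy as np

# Hypothetical values; gamma and u enter only through the adjustment cost
a0, a1, c, gamma = 5.0, 0.5, 2.0, 10.0
rng = np.random.default_rng(42)
d, q, u = rng.normal(size=3)

p = a0 - a1 * q + d
pi = p * q - c * q - gamma * u**2               # current profit, with u = q_{t+1} - q_t
q_bar = (a0 - c + d) / (2 * a1)
pi_hat = pi - a1 * q_bar**2
print(np.isclose(pi_hat, -a1 * (q - q_bar)**2 - gamma * u**2))   # True
```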

After negation to convert to a minimization problem, the objective becomes

$$\min \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left\{ a_1 (q_t - \bar q_t)^2 + \gamma u_t^2 \right\} \tag{30}$$

It’s now relatively straightforward to find 𝑅 and 𝑄 such that (30) can be written as (20).
Furthermore, the matrices 𝐴, 𝐵 and 𝐶 from (1) can be found by writing down the dynamics
of each element of the state.
Exercise 3 asks you to complete this process, and reproduce the preceding figures.

40.8 Exercises

40.8.1 Exercise 1

Replicate the figure with polynomial income shown above.


The parameters are 𝑟 = 0.05, 𝛽 = 1/(1 + 𝑟), 𝑐̄ = 1.5, 𝜇 = 2, 𝜎 = 0.15, 𝑇 = 50 and 𝑞 = 10⁴.

40.8.2 Exercise 2

Replicate the figure on work and retirement shown above.


The parameters are 𝑟 = 0.05, 𝛽 = 1/(1 + 𝑟), 𝑐̄ = 4, 𝜇 = 4, 𝜎 = 0.35, 𝐾 = 40, 𝑇 = 60, 𝑠 = 1 and 𝑞 = 10⁴.
To understand the overall procedure, carefully read the section containing that figure.
Some hints are as follows:
First, in order to make our approach work, we must ensure that both LQ problems have the
same state variables and control.
As with previous applications, the control can be set to 𝑢𝑡 = 𝑐𝑡 − 𝑐.̄
For lq_working, 𝑥𝑡 , 𝐴, 𝐵, 𝐶 can be chosen as in (26).
• Recall that 𝑚1 , 𝑚2 are chosen so that 𝑝(𝐾) = 𝜇 and 𝑝(2𝐾) = 0.
For lq_retired, use the same definition of 𝑥𝑡 and 𝑢𝑡 , but modify 𝐴, 𝐵, 𝐶 to correspond to
constant income 𝑦𝑡 = 𝑠.
For lq_retired, set preferences as in (27).
For lq_working, preferences are the same, except that 𝑅𝑓 should be replaced by the final
value function that emerges from iterating lq_retired back to the start of retirement.

With some careful footwork, the simulation can be generated by patching together the simulations from these two separate models.

40.8.3 Exercise 3

Reproduce the figures from the monopolist application given above.


For parameters, use 𝑎0 = 5, 𝑎1 = 0.5, 𝜎 = 0.15, 𝜌 = 0.9, 𝛽 = 0.95 and 𝑐 = 2, while 𝛾 varies
between 1 and 50 (see figures).

40.9 Solutions

40.9.1 Exercise 1

Here’s one solution.


We use some fancy plot commands to get a certain style — feel free to use simpler ones.
The model is an LQ permanent income / life-cycle model with hump-shaped income

$$y_t = m_1 t + m_2 t^2 + \sigma w_{t+1}$$

where {𝑤𝑡} is IID 𝑁(0, 1) and the coefficients 𝑚1 and 𝑚2 are chosen so that 𝑝(𝑡) = 𝑚1𝑡 + 𝑚2𝑡² has an inverted U shape with
• 𝑝(0) = 0, 𝑝(𝑇/2) = 𝜇, and
• 𝑝(𝑇) = 0

In [5]: import numpy as np
import matplotlib.pyplot as plt
from quantecon import LQ

# Model parameters
r = 0.05
β = 1/(1 + r)
T = 50
c_bar = 1.5
σ = 0.15
μ = 2
q = 1e4
m1 = T * (μ/(T/2)**2)
m2 = -(μ/(T/2)**2)

# Formulate as an LQ problem
Q = 1
R = np.zeros((4, 4))
Rf = np.zeros((4, 4))
Rf[0, 0] = q
A = [[1 + r, -c_bar, m1, m2],
     [0, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 1, 2, 1]]
B = [[-1],
     [ 0],
     [ 0],
     [ 0]]
C = [[σ],
     [0],
     [0],
     [0]]

# Compute solutions and simulate
lq = LQ(Q, R, A, B, C, beta=β, T=T, Rf=Rf)
x0 = (0, 1, 0, 0)
xp, up, wp = lq.compute_sequence(x0)

# Convert results back to assets, consumption and income
ap = xp[0, :]                # Assets
c = up.flatten() + c_bar     # Consumption
time = np.arange(1, T+1)
income = σ * wp[0, 1:] + m1 * time + m2 * time**2  # Income

# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))

plt.subplots_adjust(hspace=0.5)

bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}

axes[0].plot(range(1, T+1), income, 'g-', label="non-financial income",
             **p_args)
axes[0].plot(range(T), c, 'k-', label="consumption", **p_args)

axes[1].plot(range(T+1), ap.flatten(), 'b-', label="assets", **p_args)
axes[1].plot(range(T+1), np.zeros(T+1), 'k-')

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()

40.9.2 Exercise 2

This is a permanent income / life-cycle model with polynomial growth in income over working life followed by a fixed retirement income.
The model is solved by combining two LQ programming problems as described in the lecture.

In [6]: import numpy as np
import matplotlib.pyplot as plt
from quantecon import LQ

# Model parameters
r = 0.05
β = 1/(1 + r)
T = 60
K = 40
c_bar = 4
σ = 0.35
μ = 4
q = 1e4
s = 1
m1 = 2 * μ/K
m2 = -μ/K**2

# Formulate LQ problem 1 (retirement)
Q = 1
R = np.zeros((4, 4))
Rf = np.zeros((4, 4))
Rf[0, 0] = q
A = [[1 + r, s - c_bar, 0, 0],
     [0, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 1, 2, 1]]
B = [[-1],
     [ 0],
     [ 0],
     [ 0]]
C = [[0],
     [0],
     [0],
     [0]]

# Initialize LQ instance for retired agent
lq_retired = LQ(Q, R, A, B, C, beta=β, T=T-K, Rf=Rf)
# Iterate back to start of retirement, record final value function
for i in range(T-K):
    lq_retired.update_values()
Rf2 = lq_retired.P

# Formulate LQ problem 2 (working life)
R = np.zeros((4, 4))
A = [[1 + r, -c_bar, m1, m2],
     [0, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 1, 2, 1]]
B = [[-1],
     [ 0],
     [ 0],
     [ 0]]
C = [[σ],
     [0],
     [0],
     [0]]

# Set up working life LQ instance with terminal Rf from lq_retired
lq_working = LQ(Q, R, A, B, C, beta=β, T=K, Rf=Rf2)

# Simulate working state / control paths
x0 = (0, 1, 0, 0)
xp_w, up_w, wp_w = lq_working.compute_sequence(x0)
# Simulate retirement paths (note the initial condition)
xp_r, up_r, wp_r = lq_retired.compute_sequence(xp_w[:, K])

# Convert results back to assets, consumption and income
xp = np.column_stack((xp_w, xp_r[:, 1:]))
assets = xp[0, :]  # Assets

up = np.column_stack((up_w, up_r))
c = up.flatten() + c_bar  # Consumption

time = np.arange(1, K+1)
income_w = σ * wp_w[0, 1:K+1] + m1 * time + m2 * time**2  # Income
income_r = np.ones(T-K) * s
income = np.concatenate((income_w, income_r))

# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))

plt.subplots_adjust(hspace=0.5)

bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}

axes[0].plot(range(1, T+1), income, 'g-', label="non-financial income",
             **p_args)
axes[0].plot(range(T), c, 'k-', label="consumption", **p_args)

axes[1].plot(range(T+1), assets, 'b-', label="assets", **p_args)
axes[1].plot(range(T+1), np.zeros(T+1), 'k-')

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()

40.9.3 Exercise 3

The first task is to find the matrices 𝐴, 𝐵, 𝐶, 𝑄, 𝑅 that define the LQ problem.
Recall that 𝑥𝑡 = (𝑞𝑡̄ 𝑞𝑡 1)′ , while 𝑢𝑡 = 𝑞𝑡+1 − 𝑞𝑡 .
Letting 𝑚0 ∶= (𝑎0 − 𝑐)/(2𝑎1) and 𝑚1 ∶= 1/(2𝑎1), we can write 𝑞̄𝑡 = 𝑚0 + 𝑚1𝑑𝑡, and then, with some manipulation

$$\bar q_{t+1} = m_0 (1 - \rho) + \rho \bar q_t + m_1 \sigma w_{t+1}$$
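The manipulation amounts to substituting 𝑑𝑡 = (𝑞̄𝑡 − 𝑚0)/𝑚1 into 𝑑𝑡+1 = 𝜌𝑑𝑡 + 𝜎𝑤𝑡+1. A quick simulation confirms it. This sketch assumes NumPy, uses the Exercise 3 parameters, and starts the demand shock at an arbitrary 𝑑0 = 0.

```python
import numpy as np

a0, a1, c, sigma, rho = 5.0, 0.5, 2.0, 0.15, 0.9    # Exercise 3 parameters
m0, m1 = (a0 - c) / (2 * a1), 1 / (2 * a1)
rng = np.random.default_rng(0)
T = 100
w = rng.standard_normal(T + 1)
d = np.empty(T + 1)
d[0] = 0.0
for t in range(T):
    d[t + 1] = rho * d[t] + sigma * w[t + 1]         # demand shock process
q_bar = m0 + m1 * d
rhs = m0 * (1 - rho) + rho * q_bar[:-1] + m1 * sigma * w[1:]
print(np.allclose(q_bar[1:], rhs))   # True
```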

By our definition of 𝑢𝑡 , the dynamics of 𝑞𝑡 are 𝑞𝑡+1 = 𝑞𝑡 + 𝑢𝑡 .


Using these facts you should be able to build the correct 𝐴, 𝐵, 𝐶 matrices (and then check
them against those found in the solution code below).
Suitable 𝑅, 𝑄 matrices can be found by inspecting the objective function, which we repeat
here for convenience:


$$\min \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t \left[ a_1 (q_t - \bar q_t)^2 + \gamma u_t^2 \right] \right\}$$

Our solution code is

In [7]: import numpy as np
import matplotlib.pyplot as plt
from quantecon import LQ

# Model parameters
a0 = 5
a1 = 0.5
σ = 0.15
ρ = 0.9
γ = 1
β = 0.95
c = 2
T = 120

# Useful constants
m0 = (a0-c)/(2 * a1)
m1 = 1/(2 * a1)

# Formulate LQ problem
Q = γ
R = [[ a1, -a1, 0],
     [-a1, a1, 0],
     [ 0, 0, 0]]
A = [[ρ, 0, m0 * (1 - ρ)],
     [0, 1, 0],
     [0, 0, 1]]

B = [[0],
     [1],
     [0]]
C = [[m1 * σ],
     [ 0],
     [ 0]]

lq = LQ(Q, R, A, B, C=C, beta=β)

# Simulate state / control paths
x0 = (m0, 2, 1)
xp, up, wp = lq.compute_sequence(x0, ts_length=150)
q_bar = xp[0, :]
q = xp[1, :]

# Plot simulation results
fig, ax = plt.subplots(figsize=(10, 6.5))

# Some fancy plotting stuff -- simplify if you prefer
bbox = (0., 1.01, 1., .101)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.6}

time = range(len(q))
ax.set(xlabel='Time', xlim=(0, max(time)))
ax.plot(time, q_bar, 'k-', lw=2, alpha=0.6, label=r'$\bar q_t$')
ax.plot(time, q, 'b-', lw=2, alpha=0.6, label='$q_t$')
ax.legend(ncol=2, **legend_args)
s = rf'dynamics with $\gamma = {γ}$'
ax.text(max(time) * 0.6, 1 * q_bar.max(), s, fontsize=14)
plt.show()
Chapter 41

The Permanent Income Model

41.1 Contents

• Overview 41.2
• The Savings Problem 41.3
• Alternative Representations 41.4
• Two Classic Examples 41.5
• Further Reading 41.6
• Appendix: The Euler Equation 41.7
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

41.2 Overview

This lecture describes a rational expectations version of the famous permanent income model
of Milton Friedman [38].
Robert Hall cast Friedman’s model within a linear-quadratic setting [47].
Like Hall, we formulate an infinite-horizon linear-quadratic savings problem.
We use the model as a vehicle for illustrating
• alternative formulations of the state of a dynamic system
• the idea of cointegration
• impulse response functions
• the idea that changes in consumption are useful as predictors of movements in income
Background readings on the linear-quadratic-Gaussian permanent income model are Hall’s
[47] and chapter 2 of [72].
Let’s start with some imports

In [2]: import matplotlib.pyplot as plt


%matplotlib inline
import numpy as np
import random
from numba import njit

664 CHAPTER 41. THE PERMANENT INCOME MODEL

41.3 The Savings Problem

In this section, we state and solve the savings and consumption problem faced by the con-
sumer.

41.3.1 Preliminaries

We use a class of stochastic processes called martingales.


A discrete-time martingale is a stochastic process (i.e., a sequence of random variables) {𝑋𝑡} with finite mean at each 𝑡 and satisfying

$$\mathbb{E}_t [X_{t+1}] = X_t, \quad t = 0, 1, 2, \ldots$$

Here 𝔼𝑡 ∶= 𝔼[⋅ | ℱ𝑡] is a mathematical expectation conditional on the time 𝑡 information set ℱ𝑡.
The latter is just a collection of random variables that the modeler declares to be visible at 𝑡.
• When not explicitly defined, it is usually understood that ℱ𝑡 = {𝑋𝑡, 𝑋𝑡−1, … , 𝑋0}.
Martingales have the feature that the history of past outcomes provides no predictive power
for changes between current and future outcomes.
For example, the current wealth of a gambler engaged in a “fair game” has this property.
One common class of martingales is the family of random walks.
A random walk is a stochastic process {𝑋𝑡 } that satisfies

𝑋𝑡+1 = 𝑋𝑡 + 𝑤𝑡+1

for some IID zero mean innovation sequence {𝑤𝑡 }.


Evidently, 𝑋𝑡 can also be expressed as

$$X_t = \sum_{j=1}^{t} w_j + X_0$$

Not every martingale arises as a random walk (see, for example, Wald’s martingale).
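The sum representation of a random walk given above can be checked directly by simulating the recursion and comparing it with cumulative sums of the innovations. This sketch assumes NumPy, standard normal innovations and an arbitrary starting value 𝑋0 = 2.

```python
import numpy as np

rng = np.random.default_rng(123)
T = 1_000
w = rng.standard_normal(T + 1)     # w[0] is unused; innovations are w_1, ..., w_T
X0 = 2.0

X = np.empty(T + 1)
X[0] = X0
for t in range(T):
    X[t + 1] = X[t] + w[t + 1]      # X_{t+1} = X_t + w_{t+1}

X_sum = X0 + np.concatenate(([0.0], np.cumsum(w[1:])))   # X_t = sum_{j=1}^t w_j + X_0
print(np.allclose(X, X_sum))       # True
```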

41.3.2 The Decision Problem

A consumer has preferences over consumption streams that are ordered by the utility functional

$$\mathbb{E}_0 \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right] \tag{1}$$

where
• 𝔼𝑡 is the mathematical expectation conditioned on the consumer’s time 𝑡 information
• 𝑐𝑡 is time 𝑡 consumption
• 𝑢 is a strictly concave one-period utility function
• 𝛽 ∈ (0, 1) is a discount factor
The consumer maximizes (1) by choosing a consumption, borrowing plan {𝑐𝑡, 𝑏𝑡+1}, 𝑡 = 0, 1, 2, …, subject to the sequence of budget constraints

$$c_t + b_t = \frac{1}{1 + r} b_{t+1} + y_t, \quad t \geq 0 \tag{2}$$

Here
• 𝑦𝑡 is an exogenous endowment process.
• 𝑟 > 0 is a time-invariant risk-free net interest rate.
• 𝑏𝑡 is one-period risk-free debt maturing at 𝑡.
The consumer also faces initial conditions 𝑏0 and 𝑦0 , which can be fixed or random.

41.3.3 Assumptions

For the remainder of this lecture, we follow Friedman and Hall in assuming that (1 + 𝑟)⁻¹ = 𝛽.
Regarding the endowment process, we assume it has the state-space representation

$$\begin{aligned} z_{t+1} &= A z_t + C w_{t+1} \\ y_t &= U z_t \end{aligned} \tag{3}$$

where
• {𝑤𝑡 } is an IID vector process with 𝔼𝑤𝑡 = 0 and 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼.
• The spectral radius of 𝐴 satisfies 𝜌(𝐴) < √1/𝛽.
• 𝑈 is a selection vector that pins down 𝑦𝑡 as a particular linear combination of compo-
nents of 𝑧𝑡 .
The restriction on 𝜌(𝐴) prevents income from growing so fast that discounted geometric sums
of some quadratic forms to be described below become infinite.
Regarding preferences, we assume the quadratic utility function

$$u(c_t) = -(c_t - \gamma)^2$$

where 𝛾 is a bliss level of consumption.

Note
Along with this quadratic utility specification, we allow consumption to be negative. However, by choosing parameters appropriately, we can make the probability that the model generates negative consumption paths over finite time horizons as low as desired.

Finally, we impose the no Ponzi scheme condition

$$\mathbb{E}_0 \left[ \sum_{t=0}^{\infty} \beta^t b_t^2 \right] < \infty \tag{4}$$

This condition rules out an always-borrow scheme that would allow the consumer to enjoy
bliss consumption forever.

41.3.4 First-Order Conditions

First-order conditions for maximizing (1) subject to (2) are

𝔼𝑡 [𝑢′ (𝑐𝑡+1 )] = 𝑢′ (𝑐𝑡 ), 𝑡 = 0, 1, … (5)

These optimality conditions are also known as Euler equations.


If you’re not sure where they come from, you can find a proof sketch in the appendix.
With our quadratic preference specification, (5) has the striking implication that consumption
follows a martingale:

𝔼𝑡 [𝑐𝑡+1 ] = 𝑐𝑡 (6)

(In fact, quadratic preferences are necessary for this conclusion.)
One way to interpret (6) is that consumption will change only when “new information” about
permanent income is revealed.
These ideas will be clarified below.

41.3.5 The Optimal Decision Rule

Now let’s deduce the optimal decision rule.

Note
One way to solve the consumer’s problem is to apply dynamic programming as
in this lecture. We do this later. But first we use an alternative approach that is
revealing and shows the work that dynamic programming does for us behind the
scenes.

In doing so, we need to combine

1. the optimality condition (6)

2. the period-by-period budget constraint (2), and

3. the boundary condition (4)

To accomplish this, observe first that (4) implies

$$\lim_{t \to \infty} \beta^{t/2} b_{t+1} = 0$$
Using this restriction on the debt path and solving (2) forward yields

$$b_t = \sum_{j=0}^{\infty} \beta^j (y_{t+j} - c_{t+j}) \tag{7}$$

Take conditional expectations on both sides of (7) and use the martingale property of consumption and the law of iterated expectations to deduce

$$ b_t = \sum_{j=0}^\infty \beta^j \mathbb{E}_t [y_{t+j}] - \frac{c_t}{1 - \beta} \tag{8} $$

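Expressed in terms of 𝑐𝑡 we get

$$ c_t = (1 - \beta) \left[ \sum_{j=0}^\infty \beta^j \mathbb{E}_t [y_{t+j}] - b_t \right] = \frac{r}{1 + r} \left[ \sum_{j=0}^\infty \beta^j \mathbb{E}_t [y_{t+j}] - b_t \right] \tag{9} $$

where the last equality uses (1 + 𝑟)𝛽 = 1.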


These last two equations assert that consumption equals economic income:
• financial wealth equals −𝑏𝑡
• non-financial wealth equals $\sum_{j=0}^\infty \beta^j \mathbb{E}_t [y_{t+j}]$
• total wealth equals the sum of financial and non-financial wealth
• a marginal propensity to consume out of total wealth equals the interest factor $\frac{r}{1 + r}$
• economic income equals
– a constant marginal propensity to consume times the sum of non-financial wealth and financial wealth
– the amount the consumer can consume while leaving its wealth intact

Responding to the State

The state vector confronting the consumer at 𝑡 is [𝑏𝑡 𝑧𝑡 ].


Here
• 𝑧𝑡 is an exogenous component, unaffected by consumer behavior.
• 𝑏𝑡 is an endogenous component (since it depends on the decision rule).
Note that 𝑧𝑡 contains all variables useful for forecasting the consumer’s future endowment.
It is plausible that current decisions 𝑐𝑡 and 𝑏𝑡+1 should be expressible as functions of 𝑧𝑡 and
𝑏𝑡 .
This is indeed the case.
In fact, from this discussion, we see that

$$ \sum_{j=0}^\infty \beta^j \mathbb{E}_t [y_{t+j}] = \mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \right] = U (I - \beta A)^{-1} z_t $$

Combining this with (9) gives

$$ c_t = \frac{r}{1 + r} \left[ U (I - \beta A)^{-1} z_t - b_t \right] \tag{10} $$

Using this equality to eliminate 𝑐𝑡 in the budget constraint (2) gives



$$
\begin{aligned}
b_{t+1} &= (1 + r) (b_t + c_t - y_t) \\
&= (1 + r) b_t + r [U (I - \beta A)^{-1} z_t - b_t] - (1 + r) U z_t \\
&= b_t + U [r (I - \beta A)^{-1} - (1 + r) I] z_t \\
&= b_t + U (I - \beta A)^{-1} (A - I) z_t
\end{aligned}
$$

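Getting from the second-last to the last expression in this chain of equalities is not trivial.

The key is to factor $(I - \beta A)^{-1}$ out of the bracket and use the fact that $(1 + r)\beta = 1$; recall also that $(I - \beta A)^{-1} = \sum_{j=0}^\infty \beta^j A^j$.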
We’ve now successfully written 𝑐𝑡 and 𝑏𝑡+1 as functions of 𝑏𝑡 and 𝑧𝑡 .
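The step from the second-last to the last expression rests on the matrix identity $r(I - \beta A)^{-1} - (1 + r)I = (I - \beta A)^{-1}(A - I)$, which holds whenever $(1 + r)\beta = 1$. A minimal numerical check is sketched below; the matrix `A` is an arbitrary illustrative choice (any $A$ with $I - \beta A$ invertible would do), not one used elsewhere in the lecture.

```python
import numpy as np

r = 0.05
β = 1 / (1 + r)                  # the model imposes (1 + r)β = 1

# Arbitrary illustrative matrix; only invertibility of I - βA matters
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.2],
              [0.0, 0.0, 1.0]])
I = np.eye(3)
inv = np.linalg.inv(I - β * A)

lhs = r * inv - (1 + r) * I      # bracket in the second-last expression
rhs = inv @ (A - I)              # matrix in the last expression

print(np.allclose(lhs, rhs))
```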

A State-Space Representation

We can summarize our dynamics in the form of a linear state-space system governing consumption, debt and income:

$$
\begin{aligned}
z_{t+1} &= A z_t + C w_{t+1} \\
b_{t+1} &= b_t + U [(I - \beta A)^{-1} (A - I)] z_t \\
y_t &= U z_t \\
c_t &= (1 - \beta) [U (I - \beta A)^{-1} z_t - b_t]
\end{aligned}
\tag{11}
$$

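To write this more succinctly, let

$$ x_t = \begin{bmatrix} z_t \\ b_t \end{bmatrix}, \quad \tilde A = \begin{bmatrix} A & 0 \\ U (I - \beta A)^{-1} (A - I) & 1 \end{bmatrix}, \quad \tilde C = \begin{bmatrix} C \\ 0 \end{bmatrix} $$

and

$$ \tilde U = \begin{bmatrix} U & 0 \\ (1 - \beta) U (I - \beta A)^{-1} & -(1 - \beta) \end{bmatrix}, \quad \tilde y_t = \begin{bmatrix} y_t \\ c_t \end{bmatrix} $$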

Then we can express equation (11) as

$$
\begin{aligned}
x_{t+1} &= \tilde A x_t + \tilde C w_{t+1} \\
\tilde y_t &= \tilde U x_t
\end{aligned}
\tag{12}
$$

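We can use the following formulas from linear state space models to compute the population mean $\mu_t = \mathbb{E} x_t$ and covariance $\Sigma_t := \mathbb{E}[(x_t - \mu_t)(x_t - \mu_t)']$:

$$ \mu_{t+1} = \tilde A \mu_t \quad \text{with } \mu_0 \text{ given} \tag{13} $$

$$ \Sigma_{t+1} = \tilde A \Sigma_t \tilde A' + \tilde C \tilde C' \quad \text{with } \Sigma_0 \text{ given} \tag{14} $$

We can then compute the mean and covariance of $\tilde y_t$ from

$$
\begin{aligned}
\mu_{y,t} &= \tilde U \mu_t \\
\Sigma_{y,t} &= \tilde U \Sigma_t \tilde U'
\end{aligned}
\tag{15}
$$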
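As a minimal sketch of how (13)-(15) can be iterated in practice, the code below uses the IID income example that follows ($r = 0.05$, $\mu = 1$, $\sigma = 0.15$) and starts from $x_0 = [0, 1, 0]'$ with $\Sigma_0 = 0$. The helper name `moments` is our own. In that example the mean of consumption should stay at $\mu$ while its variance grows by $(1 - \beta)^2 \sigma^2$ each period.

```python
import numpy as np

# IID income example: z_t = [z_t^1, 1], A = diag(0, 1), U = [1, μ], C = [σ, 0]'
r, μ, σ = 0.05, 1.0, 0.15
β = 1 / (1 + r)
A = np.array([[0., 0.], [0., 1.]])
C = np.array([[σ], [0.]])
U = np.array([[1., μ]])
I2 = np.eye(2)
M = U @ np.linalg.inv(I2 - β * A)            # U (I - βA)^{-1}, shape (1, 2)

# Stacked system (12) for x_t = [z_t', b_t]'
A_tilde = np.block([[A, np.zeros((2, 1))],
                    [M @ (A - I2), np.ones((1, 1))]])
C_tilde = np.vstack([C, np.zeros((1, 1))])
U_tilde = np.block([[U, np.zeros((1, 1))],
                    [(1 - β) * M, -(1 - β) * np.ones((1, 1))]])

def moments(T, μ0, Σ0):
    "Iterate (13)-(15); return the mean and variance of c_t for t = 0, ..., T."
    μt, Σt = μ0.copy(), Σ0.copy()
    c_mean, c_var = [], []
    for _ in range(T + 1):
        μy = U_tilde @ μt                    # (15), first line
        Σy = U_tilde @ Σt @ U_tilde.T        # (15), second line
        c_mean.append(μy[1, 0])
        c_var.append(Σy[1, 1])
        μt = A_tilde @ μt                                      # (13)
        Σt = A_tilde @ Σt @ A_tilde.T + C_tilde @ C_tilde.T    # (14)
    return np.array(c_mean), np.array(c_var)

# Start from z_0^1 = 0 and b_0 = 0 with no initial uncertainty
μ0 = np.array([[0.], [1.], [0.]])
Σ0 = np.zeros((3, 3))
c_mean, c_var = moments(10, μ0, Σ0)
print(c_mean)
print(c_var / ((1 - β)**2 * σ**2))
```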

A Simple Example with IID Income

To gain some preliminary intuition on the implications of (11), let’s look at a highly stylized
example where income is just IID.
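(Later examples will investigate more realistic income streams.)
In particular, let $\{w_t\}_{t=1}^\infty$ be IID and scalar standard normal, and let

$$ z_t = \begin{bmatrix} z_t^1 \\ 1 \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & \mu \end{bmatrix}, \quad C = \begin{bmatrix} \sigma \\ 0 \end{bmatrix} $$

Finally, let $b_0 = z_0^1 = 0$.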


Under these assumptions, we have $y_t = \mu + \sigma w_t \sim N(\mu, \sigma^2)$.
Further, if you work through the state space representation, you will see that

$$
\begin{aligned}
b_t &= -\sigma \sum_{j=1}^{t-1} w_j \\
c_t &= \mu + (1 - \beta) \sigma \sum_{j=1}^{t} w_j
\end{aligned}
$$

Thus income is IID and debt and consumption are both Gaussian random walks.
Defining assets as −𝑏𝑡, we see that assets are just the cumulative sum of unanticipated incomes prior to the present date.
The next figure shows a typical realization with 𝑟 = 0.05, 𝜇 = 1, and 𝜎 = 0.15.

In [3]: import numpy as np
import matplotlib.pyplot as plt
from numba import njit

r = 0.05
β = 1 / (1 + r)
σ = 0.15
μ = 1
T = 60

@njit
def time_path(T):
    w = np.random.randn(T+1)  # w_0, w_1, ..., w_T
    w[0] = 0
    b = np.zeros(T+1)
    for t in range(1, T+1):
        b[t] = w[1:t].sum()
    b = -σ * b
    c = μ + (1 - β) * (σ * w - b)
    return w, b, c

w, b, c = time_path(T)

fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(μ + σ * w, 'g-', label="Non-financial income")
ax.plot(c, 'k-', label="Consumption")
ax.plot(b, 'b-', label="Debt")
ax.legend(ncol=3, mode='expand', bbox_to_anchor=(0., 1.02, 1., .102))
ax.grid()
ax.set_xlabel('Time')

plt.show()

Observe that consumption is considerably smoother than income.


The figure below shows the consumption paths of 250 consumers with independent income streams.

In [4]: import random

fig, ax = plt.subplots(figsize=(10, 6))

for i in range(250):
    w, b, c = time_path(T)  # Generate new time path
    rcolor = random.choice(('c', 'g', 'b', 'k'))
    ax.plot(c, color=rcolor, lw=0.8, alpha=0.7)

ax.grid()
ax.set(xlabel='Time', ylabel='Consumption')

plt.show()

41.4 Alternative Representations

In this section, we shed more light on the evolution of savings, debt and consumption by representing their dynamics in several different ways.

41.4.1 Hall’s Representation

Hall [47] suggested an insightful way to summarize the implications of LQ permanent income
theory.
First, to represent the solution for 𝑏𝑡, shift (9) forward one period and eliminate 𝑏𝑡+1 by using (2) to obtain

$$ c_{t+1} = (1 - \beta) \sum_{j=0}^\infty \beta^j \mathbb{E}_{t+1} [y_{t+j+1}] - (1 - \beta) \left[ \beta^{-1} (c_t + b_t - y_t) \right] $$


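If we add and subtract $\beta^{-1} (1 - \beta) \sum_{j=0}^\infty \beta^j \mathbb{E}_t y_{t+j}$ from the right side of the preceding equation and rearrange, we obtain

$$ c_{t+1} - c_t = (1 - \beta) \sum_{j=0}^\infty \beta^j \left\{ \mathbb{E}_{t+1} [y_{t+j+1}] - \mathbb{E}_t [y_{t+j+1}] \right\} \tag{16} $$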

The right side is the time 𝑡 + 1 innovation to the expected present value of the endowment
process {𝑦𝑡 }.
We can represent the optimal decision rule for (𝑐𝑡 , 𝑏𝑡+1 ) in the form of (16) and (8), which we
repeat:


$$ b_t = \sum_{j=0}^\infty \beta^j \mathbb{E}_t [y_{t+j}] - \frac{1}{1 - \beta} c_t \tag{17} $$

Equation (17) asserts that the consumer’s debt due at 𝑡 equals the expected present value of
its endowment minus the expected present value of its consumption stream.
A high debt thus indicates a large expected present value of surpluses 𝑦𝑡 − 𝑐𝑡 .
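Recalling again our discussion on forecasting geometric sums, we have

$$
\begin{aligned}
\mathbb{E}_t \sum_{j=0}^\infty \beta^j y_{t+j} &= U (I - \beta A)^{-1} z_t \\
\mathbb{E}_{t+1} \sum_{j=0}^\infty \beta^j y_{t+j+1} &= U (I - \beta A)^{-1} z_{t+1} \\
\mathbb{E}_t \sum_{j=0}^\infty \beta^j y_{t+j+1} &= U (I - \beta A)^{-1} A z_t
\end{aligned}
$$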

Using these formulas together with (3) and substituting into (16) and (17) gives the following representation for the consumer’s optimum decision rule:

$$
\begin{aligned}
c_{t+1} &= c_t + (1 - \beta) U (I - \beta A)^{-1} C w_{t+1} \\
b_t &= U (I - \beta A)^{-1} z_t - \frac{1}{1 - \beta} c_t \\
y_t &= U z_t \\
z_{t+1} &= A z_t + C w_{t+1}
\end{aligned}
\tag{18}
$$

Representation (18) makes clear that


• The state can be taken as (𝑐𝑡 , 𝑧𝑡 ).
– The endogenous part is 𝑐𝑡 and the exogenous part is 𝑧𝑡 .
– Debt 𝑏𝑡 has disappeared as a component of the state because it is encoded in 𝑐𝑡 .
• Consumption is a random walk with innovation (1 − 𝛽)𝑈 (𝐼 − 𝛽𝐴)−1 𝐶𝑤𝑡+1 .
– This is a more explicit representation of the martingale result in (6).
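To see that (18) and (11) describe the same consumption and debt paths, one can feed the same shock sequence through both. The sketch below assumes an illustrative AR(1) income specification, $z_t = [z_t^1, 1]'$ with $A = \operatorname{diag}(\rho, 1)$, $U = [1 \;\; \mu]$ and $C = [\sigma \;\; 0]'$; the parameter values and seed are arbitrary. Debt is then recovered from $(c_t, z_t)$ alone, illustrating that $b_t$ is encoded in $c_t$.

```python
import numpy as np

# Illustrative AR(1) specification (assumed): z_t = [z_t^1, 1], y_t = z_t^1 + μ
r, μ, σ, ρ = 0.05, 1.0, 0.15, 0.8
β = 1 / (1 + r)
A = np.array([[ρ, 0.], [0., 1.]])
C = np.array([σ, 0.])
U = np.array([1., μ])
I = np.eye(2)
M = U @ np.linalg.inv(I - β * A)             # U (I - βA)^{-1}

T = 50
rng = np.random.default_rng(0)
w = rng.standard_normal(T + 1)               # w[t] is the shock dated t (w[0] unused)

# Representation (11): state (z_t, b_t), consumption from the c_t equation
z = np.zeros((T + 1, 2))
z[0] = [0., 1.]
b = np.zeros(T + 1)
c11 = np.zeros(T + 1)
for t in range(T + 1):
    c11[t] = (1 - β) * (M @ z[t] - b[t])
    if t < T:
        z[t + 1] = A @ z[t] + C * w[t + 1]
        b[t + 1] = b[t] + M @ (A - I) @ z[t]

# Hall's representation (18): consumption is a random walk driven by w_{t+1}
c18 = np.zeros(T + 1)
c18[0] = c11[0]
for t in range(T):
    c18[t + 1] = c18[t] + (1 - β) * (M @ C) * w[t + 1]

# Debt recovered from (c_t, z_t) via the second line of (18)
b18 = z @ M - c18 / (1 - β)

print(np.max(np.abs(c11 - c18)), np.max(np.abs(b - b18)))
```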

41.4.2 Cointegration

Representation (18) reveals that the joint process {𝑐𝑡 , 𝑏𝑡 } possesses the property that Engle
and Granger [33] called cointegration.
Cointegration is a tool that allows us to apply powerful results from the theory of stationary
stochastic processes to (certain transformations of) nonstationary models.
To apply cointegration in the present context, suppose that 𝑧𝑡 is asymptotically stationary (see footnote [4]).
Despite this, both 𝑐𝑡 and 𝑏𝑡 will be non-stationary because they have unit roots (see (11) for 𝑏𝑡).
Nevertheless, there is a linear combination of 𝑐𝑡 , 𝑏𝑡 that is asymptotically stationary.
In particular, from the second equality in (18) we have

$$ (1 - \beta) b_t + c_t = (1 - \beta) U (I - \beta A)^{-1} z_t \tag{19} $$

Hence the linear combination (1 − 𝛽)𝑏𝑡 + 𝑐𝑡 is asymptotically stationary.


Accordingly, Granger and Engle would call $\begin{bmatrix} (1 - \beta) & 1 \end{bmatrix}$ a cointegrating vector for the state.

When applied to the nonstationary vector process $\begin{bmatrix} b_t & c_t \end{bmatrix}'$, it yields a process that is asymptotically stationary.
Equation (19) can be rearranged to take the form

$$ (1 - \beta) b_t + c_t = (1 - \beta) \mathbb{E}_t \sum_{j=0}^\infty \beta^j y_{t+j} \tag{20} $$

Equation (20) asserts that the cointegrating residual on the left side equals the conditional expectation of the geometric sum of future incomes on the right (see footnote [6]).
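A small cross-section simulation makes the cointegration result concrete: across many consumers the variance of $c_t$ keeps growing, while the residual $(1 - \beta) b_t + c_t$ stays bounded. This sketch assumes an illustrative AR(1) income specification, $z_t = [z_t^1, 1]'$ with $A = \operatorname{diag}(\rho, 1)$, $U = [1 \;\; \mu]$ and $C = [\sigma \;\; 0]'$, and simulates system (11) for consumers who all start at $z_t^1 = 0$, $b_0 = 0$.

```python
import numpy as np

# Illustrative AR(1) specification (assumed): z_t = [z_t^1, 1], y_t = z_t^1 + μ
r, μ, σ, ρ = 0.05, 1.0, 0.15, 0.8
β = 1 / (1 + r)
A = np.array([[ρ, 0.], [0., 1.]])
C = np.array([σ, 0.])
M = np.array([1., μ]) @ np.linalg.inv(np.eye(2) - β * A)   # U (I - βA)^{-1}
D = (A - np.eye(2)).T @ M        # z_t @ D equals U (I - βA)^{-1} (A - I) z_t

N, T = 5_000, 200                # consumers, periods
rng = np.random.default_rng(1)
z = np.tile([0., 1.], (N, 1))    # row i holds consumer i's z_t
b = np.zeros(N)
for t in range(T):
    b = b + z @ D                                        # debt law of motion in (11)
    z = z @ A.T + np.outer(rng.standard_normal(N), C)    # z_{t+1} = A z_t + C w_{t+1}

c = (1 - β) * (z @ M - b)        # consumption rule in (11)
resid = (1 - β) * b + c          # cointegrating combination in (19)

print(f"cross-section Var[c_T]: {c.var():.4f}, Var[residual]: {resid.var():.4f}")
```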

41.4.3 Cross-Sectional Implications

Consider again (18), this time in light of our discussion of distribution dynamics in the lecture on linear systems.
The dynamics of 𝑐𝑡 are given by

$$ c_{t+1} = c_t + (1 - \beta) U (I - \beta A)^{-1} C w_{t+1} \tag{21} $$

or

$$ c_t = c_0 + \sum_{j=1}^t \hat w_j \quad \text{for} \quad \hat w_{t+1} := (1 - \beta) U (I - \beta A)^{-1} C w_{t+1} $$

The unit root affecting 𝑐𝑡 causes the time 𝑡 variance of 𝑐𝑡 to grow linearly with 𝑡.
In particular, since {𝑤̂ 𝑡 } is IID, we have

$$ \operatorname{Var}[c_t] = \operatorname{Var}[c_0] + t \hat \sigma^2 \tag{22} $$

where

$$ \hat \sigma^2 := (1 - \beta)^2 U (I - \beta A)^{-1} C C' (I - \beta A')^{-1} U' $$

When 𝜎̂ > 0, {𝑐𝑡 } has no asymptotic distribution.


Let’s consider what this means for a cross-section of ex-ante identical consumers born at time
0.
Let the distribution of 𝑐0 represent the cross-section of initial consumption values.
Equation (22) tells us that the variance of 𝑐𝑡 increases over time at a rate proportional to 𝑡.
A number of different studies have investigated this prediction and found some support for it
(see, e.g., [27], [103]).
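The linear growth in (22) can be checked by Monte Carlo. The sketch below assumes an illustrative AR(1) income specification ($A = \operatorname{diag}(\rho, 1)$, $U = [1 \;\; \mu]$, $C = [\sigma \;\; 0]'$), computes $\hat \sigma^2$ from the matrix formula above, and compares $T \hat \sigma^2$ with the cross-sectional variance of $c_T$ for ex-ante identical consumers simulated through system (11). Sample sizes and the seed are arbitrary.

```python
import numpy as np

# Illustrative AR(1) specification (assumed): z_t = [z_t^1, 1], y_t = z_t^1 + μ
r, μ, σ, ρ = 0.05, 1.0, 0.15, 0.8
β = 1 / (1 + r)
A = np.array([[ρ, 0.], [0., 1.]])
C = np.array([[σ], [0.]])
U = np.array([[1., μ]])
inv = np.linalg.inv(np.eye(2) - β * A)

# σ̂² from the formula below (22); inv.T equals (I - βA')^{-1}
σ_hat2 = ((1 - β)**2 * U @ inv @ C @ C.T @ inv.T @ U.T).item()

# Cross-section of N identical consumers (common z_0, b_0 = 0), simulated via (11)
N, T = 20_000, 100
rng = np.random.default_rng(2)
M = (U @ inv).ravel()                    # U (I - βA)^{-1} as a 1-D array
D = (A - np.eye(2)).T @ M                # z_t @ D = U (I - βA)^{-1} (A - I) z_t
z = np.tile([0., 1.], (N, 1))
b = np.zeros(N)
for t in range(T):
    b = b + z @ D
    z = z @ A.T + np.outer(rng.standard_normal(N), C.ravel())
c_T = (1 - β) * (z @ M - b)

print(f"simulated Var[c_T] = {c_T.var():.4f}, T * σ̂² = {T * σ_hat2:.4f}")
```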

41.4.4 Impulse Response Functions

Impulse response functions measure responses to various impulses (i.e., temporary shocks).
The impulse response function of {𝑐𝑡 } to the innovation {𝑤𝑡 } is a box.
In particular, the response of $c_{t+j}$ to a unit increase in the innovation $w_{t+1}$ is $(1 - \beta) U (I - \beta A)^{-1} C$ for all $j \geq 1$.
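The box shape can be seen by simulating (11) with a single unit shock $w_1 = 1$ and all other shocks set to zero. This sketch assumes an illustrative AR(1) income specification ($A = \operatorname{diag}(\rho, 1)$, $U = [1 \;\; \mu]$, $C = [\sigma \;\; 0]'$); the helper `impulse_response` is our own. For contrast it also records the income response, which decays geometrically.

```python
import numpy as np

# Illustrative AR(1) specification (assumed): z_t = [z_t^1, 1], y_t = z_t^1 + μ
r, μ, σ, ρ = 0.05, 1.0, 0.15, 0.8
β = 1 / (1 + r)
A = np.array([[ρ, 0.], [0., 1.]])
C = np.array([σ, 0.])
U = np.array([1., μ])
M = U @ np.linalg.inv(np.eye(2) - β * A)      # U (I - βA)^{-1}

def impulse_response(H):
    """
    Simulate (11) from z_0 = [0, 1], b_0 = 0 with w_1 = 1 and all other
    shocks zero; return deviations of c_t and y_t from their t = 0 values.
    """
    z, b = np.array([0., 1.]), 0.0
    c_path, y_path = [], []
    for t in range(H + 1):
        c_path.append((1 - β) * (M @ z - b))
        y_path.append(U @ z)
        w_next = 1.0 if t == 0 else 0.0       # the only nonzero shock is w_1
        b = b + M @ (A - np.eye(2)) @ z       # debt update uses z_t
        z = A @ z + C * w_next
    c_path, y_path = np.array(c_path), np.array(y_path)
    return c_path - c_path[0], y_path - y_path[0]

dc, dy = impulse_response(10)
print(np.round(dc, 4))
print(np.round(dy, 4))
```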

41.4.5 Moving Average Representation

It’s useful to express the innovation to the expected present value of the endowment process
in terms of a moving average representation for income 𝑦𝑡 .
The endowment process defined by (3) has the moving average representation

$$ y_{t+1} = d(L) w_{t+1} \tag{23} $$

where

• $d(L) = \sum_{j=0}^\infty d_j L^j$ for some sequence $d_j$, where $L$ is the lag operator (see footnote [3])
• at time 𝑡, the consumer has an information set $w^t = [w_t, w_{t-1}, \ldots]$ (see footnote [5])
Notice that

$$ y_{t+j} - \mathbb{E}_t [y_{t+j}] = d_0 w_{t+j} + d_1 w_{t+j-1} + \cdots + d_{j-1} w_{t+1} $$

It follows that

$$ \mathbb{E}_{t+1} [y_{t+j}] - \mathbb{E}_t [y_{t+j}] = d_{j-1} w_{t+1} \tag{24} $$

Using (24) in (16) gives

$$ c_{t+1} - c_t = (1 - \beta) d(\beta) w_{t+1} \tag{25} $$

The object $d(\beta)$ is the present value of the moving average coefficients in the representation for the endowment process $y_t$.
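Footnote [3] notes that representation (3) implies $d(L) = U(I - AL)^{-1}C$, so that $d_j = U A^j C$ and $d(\beta) = U(I - \beta A)^{-1} C$. The sketch below checks this numerically by truncating the sum $\sum_j \beta^j d_j$; it assumes an illustrative AR(1) income specification, and the truncation length `J` is arbitrary.

```python
import numpy as np

# Illustrative AR(1) specification (assumed), written as a state-space system (3)
r, σ, ρ = 0.05, 0.15, 0.8
β = 1 / (1 + r)
A = np.array([[ρ, 0.], [0., 1.]])
C = np.array([σ, 0.])
U = np.array([1., 1.])

# Moving-average coefficients d_j = U A^j C, truncated at J terms
J = 500
d = np.array([U @ np.linalg.matrix_power(A, j) @ C for j in range(J)])

d_beta_sum = np.sum(β ** np.arange(J) * d)                 # truncated sum of β^j d_j
d_beta_closed = U @ np.linalg.inv(np.eye(2) - β * A) @ C   # U (I - βA)^{-1} C

# (1 - β) d(β) is the loading of c_{t+1} - c_t on w_{t+1} in (25)
print(d_beta_sum, d_beta_closed, (1 - β) * d_beta_closed)
```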

41.5 Two Classic Examples

We illustrate some of the preceding ideas with two examples.


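In both examples, the endowment follows the process $y_t = z_{1t} + z_{2t}$ where

$$ \begin{bmatrix} z_{1t+1} \\ z_{2t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} w_{1t+1} \\ w_{2t+1} \end{bmatrix} $$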

Here
• 𝑤𝑡+1 is an IID 2 × 1 process distributed as 𝑁 (0, 𝐼).
• 𝑧1𝑡 is a permanent component of 𝑦𝑡 .
• 𝑧2𝑡 is a purely transitory component of 𝑦𝑡 .

41.5.1 Example 1

Assume as before that the consumer observes the state 𝑧𝑡 at time 𝑡.


In view of (18) we have

$$ c_{t+1} - c_t = \sigma_1 w_{1t+1} + (1 - \beta) \sigma_2 w_{2t+1} \tag{26} $$

Formula (26) shows how an increment 𝜎1 𝑤1𝑡+1 to the permanent component of income 𝑧1𝑡+1
leads to
• a permanent one-for-one increase in consumption and
• no increase in savings −𝑏𝑡+1
But the purely transitory component of income 𝜎2 𝑤2𝑡+1 leads to a permanent increment in
consumption by a fraction 1 − 𝛽 of transitory income.
The remaining fraction 𝛽 is saved, leading to a permanent increment in −𝑏𝑡+1 .
Application of the formula for debt in (11) to this example shows that

$$ b_{t+1} - b_t = -z_{2t} = -\sigma_2 w_{2t} \tag{27} $$

This confirms that none of 𝜎1 𝑤1𝑡 is saved, while all of 𝜎2 𝑤2𝑡 is saved.
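Both (26) and (27) can be recovered mechanically from the general formulas by plugging in the two-component matrices of this example. In the minimal sketch below the values $\sigma_1 = \sigma_2 = 0.15$ and $r = 0.05$ are illustrative; the consumption loadings should be $[\sigma_1, (1 - \beta)\sigma_2]$ and the debt loadings on $(z_{1t}, z_{2t})$ should be $[0, -1]$.

```python
import numpy as np

r, σ1, σ2 = 0.05, 0.15, 0.15
β = 1 / (1 + r)
A = np.array([[1., 0.], [0., 0.]])     # z_1 is permanent, z_2 is purely transitory
C = np.diag([σ1, σ2])
U = np.array([1., 1.])                 # y_t = z_1t + z_2t
M = U @ np.linalg.inv(np.eye(2) - β * A)

# Loadings of c_{t+1} - c_t on (w_1,t+1, w_2,t+1), from the first line of (18)
c_loadings = (1 - β) * M @ C
# Loadings of b_{t+1} - b_t on (z_1t, z_2t), from the debt equation in (11)
b_loadings = M @ (A - np.eye(2))
print(c_loadings, b_loadings)
```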
The next figure illustrates these very different reactions to transitory and permanent income shocks using impulse-response functions.

In [5]: r = 0.05
β = 1 / (1 + r)
S = 5  # Impulse date
σ1 = σ2 = 0.15

@njit
def time_path(T, permanent=False):
    "Time path of consumption and debt given shock sequence"
    w1 = np.zeros(T+1)
    w2 = np.zeros(T+1)
    b = np.zeros(T+1)
    c = np.zeros(T+1)
    if permanent:
        w1[S+1] = 1.0
    else:
        w2[S+1] = 1.0
    for t in range(1, T):
        b[t+1] = b[t] - σ2 * w2[t]
        c[t+1] = c[t] + σ1 * w1[t+1] + (1 - β) * σ2 * w2[t+1]
    return b, c

fig, axes = plt.subplots(2, 1, figsize=(10, 8))
titles = ['permanent', 'transitory']  # order matches (True, False) below

L = 0.175

for ax, truefalse, title in zip(axes, (True, False), titles):
    b, c = time_path(T=20, permanent=truefalse)
    ax.set_title(f'Impulse response: {title} income shock')
    ax.plot(c, 'g-', label="consumption")
    ax.plot(b, 'b-', label="debt")
    ax.plot((S, S), (-L, L), 'k-', lw=0.5)
    ax.grid(alpha=0.5)
    ax.set(xlabel=r'Time', ylim=(-L, L))

axes[0].legend(loc='lower right')

plt.tight_layout()
plt.show()

41.5.2 Example 2

Assume now that at time 𝑡 the consumer observes 𝑦𝑡 , and its history up to 𝑡, but not 𝑧𝑡 .
Under this assumption, it is appropriate to use an innovation representation to form 𝐴, 𝐶, 𝑈
in (18).
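The discussion in sections 2.9.1 and 2.11.3 of [72] shows that the pertinent state space representation for 𝑦𝑡 is

$$
\begin{aligned}
\begin{bmatrix} y_{t+1} \\ a_{t+1} \end{bmatrix} &= \begin{bmatrix} 1 & -(1 - K) \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ a_t \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix} a_{t+1} \\
y_t &= \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ a_t \end{bmatrix}
\end{aligned}
$$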

where

• $K :=$ the stationary Kalman gain
• $a_t := y_t - E[y_t \mid y_{t-1}, \ldots, y_0]$
In the same discussion in [72] it is shown that 𝐾 ∈ [0, 1] and that 𝐾 increases as 𝜎1 /𝜎2 does.
In other words, 𝐾 increases as the ratio of the standard deviation of the permanent shock to
that of the transitory shock increases.
For background, please see the lecture A First Look at the Kalman Filter.
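For the two-shock specification above, $z_{1t}$ is a random walk observed with white noise, so the stationary gain solves a scalar Riccati equation. The sketch below is our own derivation for this case, not code from [72]: `P` is the variance of the one-step-ahead forecast error for $z_{1t}$, the innovation $a_t$ has variance $P + \sigma_2^2$, and $K = P/(P + \sigma_2^2)$. It illustrates that $K \in [0, 1]$ rises with $\sigma_1/\sigma_2$, and checks that the moving average in (29) reproduces $\operatorname{Var}(y_{t+1} - y_t) = \sigma_1^2 + 2\sigma_2^2$. The helper name `kalman_gain` and the parameter values are illustrative.

```python
import numpy as np

def kalman_gain(σ1, σ2, tol=1e-14, max_iter=100_000):
    """
    Stationary Kalman gain for the permanent/transitory model
        z_1,t+1 = z_1t + σ1 w_1,t+1,    y_t = z_1t + σ2 w_2t.
    P is the variance of the one-step-ahead forecast error for z_1t;
    we iterate its scalar Riccati equation to a fixed point.
    """
    P = σ1**2 + σ2**2                      # any positive starting value works
    for _ in range(max_iter):
        P_new = P - P**2 / (P + σ2**2) + σ1**2
        if abs(P_new - P) < tol:
            P = P_new
            break
        P = P_new
    return P / (P + σ2**2), P              # (K, P)

σ2 = 0.15
gains = [kalman_gain(σ1, σ2)[0] for σ1 in (0.05, 0.15, 0.45)]
print(np.round(gains, 3))

# Cross-checks for σ1 = σ2: closed-form P, and the innovations in (29)
# reproducing Var(y_{t+1} - y_t) = σ1² + 2 σ2²
σ1 = 0.15
K, P = kalman_gain(σ1, σ2)
P_closed = (σ1**2 + np.sqrt(σ1**4 + 4 * σ1**2 * σ2**2)) / 2
var_dy = (P + σ2**2) * (1 + (1 - K)**2)
```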
Applying formulas (18) implies

$$ c_{t+1} - c_t = [1 - \beta (1 - K)] a_{t+1} \tag{28} $$

where the endowment process can now be represented in terms of the univariate innovation to
𝑦𝑡 as

$$ y_{t+1} - y_t = a_{t+1} - (1 - K) a_t \tag{29} $$

Equation (29) indicates that the consumer regards


• fraction 𝐾 of an innovation 𝑎𝑡+1 to 𝑦𝑡+1 as permanent
• fraction 1 − 𝐾 as purely transitory
The consumer permanently increases his consumption by the full amount of his estimate of
the permanent part of 𝑎𝑡+1 , but by only (1 − 𝛽) times his estimate of the purely transitory
part of 𝑎𝑡+1 .
Therefore, in total, he permanently increments his consumption by a fraction $K + (1 - \beta)(1 - K) = 1 - \beta(1 - K)$ of $a_{t+1}$.
He saves the remaining fraction $\beta(1 - K)$.
According to equation (29), the first difference of income is a first-order moving average.
Equation (28) asserts that the first difference of consumption is IID.
Application of the formula for debt in (11) to this example shows that

$$ b_{t+1} - b_t = (K - 1) a_t \tag{30} $$

This indicates how the fraction 𝐾 of the innovation to 𝑦𝑡 that is regarded as permanent influences the fraction of the innovation that is saved.
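Equations (28) and (30) follow from the general formulas once the innovation representation supplies $A$, $C$, $U$ for the state $[y_t, a_t]'$. The minimal check below uses an illustrative gain $K = 0.4$ and $\beta = 1/1.05$: the consumption loading should equal $1 - \beta(1 - K)$ and the debt loadings on $[y_t, a_t]$ should be $[0, K - 1]$.

```python
import numpy as np

β = 1 / 1.05
K = 0.4                                    # illustrative Kalman gain in [0, 1]
A = np.array([[1., -(1 - K)], [0., 0.]])   # state [y_t, a_t]
C = np.array([1., 1.])
U = np.array([1., 0.])
M = U @ np.linalg.inv(np.eye(2) - β * A)   # U (I - βA)^{-1}

c_loading = (1 - β) * M @ C                # coefficient on a_{t+1} in c_{t+1} - c_t
b_loadings = M @ (A - np.eye(2))           # coefficients on [y_t, a_t] in b_{t+1} - b_t
print(c_loading, 1 - β * (1 - K), b_loadings)
```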

41.6 Further Reading

The model described above significantly changed how economists think about consumption.
While Hall’s model does a remarkably good job as a first approximation to consumption data,
it’s widely believed that it doesn’t capture important aspects of some consumption/savings
data.
For example, liquidity constraints and precautionary savings appear to be present sometimes.
Further discussion can be found in, e.g., [48], [86], [26], [20].

41.7 Appendix: The Euler Equation

Where does the first-order condition (5) come from?


Here we’ll give a proof for the two-period case, which is representative of the general argument.
The finite horizon equivalent of the no-Ponzi condition is that the agent cannot end her life in
debt, so 𝑏2 = 0.
From the budget constraint (2) we then have

$$ c_0 = \frac{b_1}{1 + r} - b_0 + y_0 \quad \text{and} \quad c_1 = y_1 - b_1 $$

Here 𝑏0 and 𝑦0 are given constants.


Substituting these constraints into our two-period objective $u(c_0) + \beta \mathbb{E}_0 [u(c_1)]$ gives

$$ \max_{b_1} \left\{ u \left( \frac{b_1}{R} - b_0 + y_0 \right) + \beta \, \mathbb{E}_0 [u(y_1 - b_1)] \right\} $$

where $R := 1 + r$.
You will be able to verify that the first-order condition is

$$ u'(c_0) = \beta R \, \mathbb{E}_0 [u'(c_1)] $$

Using $\beta R = 1$ gives (5) in the two-period case.


The proof for the general case is similar.
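The two-period argument can also be checked numerically. In the sketch below all numbers are illustrative assumptions (bliss level $\gamma = 10$, $b_0 = 0$, $y_0 = 2$, and $y_1$ equal to 0.5 or 1.5 with equal probability); we maximize the two-period objective directly over $b_1$ with a golden-section search, a generic technique chosen here only for self-containment, and confirm that the maximizer satisfies $u'(c_0) = \beta R \, \mathbb{E}_0[u'(c_1)]$. With $\beta R = 1$ and quadratic utility this forces $c_0 = \mathbb{E}_0 c_1$, which pins down $b_1 = -R/(R + 1)$ for these numbers.

```python
r = 0.05
R = 1 + r
β = 1 / R                   # βR = 1
γ = 10.0                    # bliss level (illustrative)
b0, y0 = 0.0, 2.0
y1_states = [0.5, 1.5]      # equally likely values of y1 (illustrative)
probs = [0.5, 0.5]

def u(c):
    return -(c - γ) ** 2

def u_prime(c):
    return -2 * (c - γ)

def objective(b1):
    c0 = b1 / R - b0 + y0
    return u(c0) + β * sum(p * u(y1 - b1) for p, y1 in zip(probs, y1_states))

def golden_max(f, lo, hi, tol=1e-10):
    "Maximize a unimodal function f on [lo, hi] by golden-section search."
    g = (5 ** 0.5 - 1) / 2          # inverse golden ratio, about 0.618
    while hi - lo > tol:
        x1 = hi - g * (hi - lo)
        x2 = lo + g * (hi - lo)
        if f(x1) < f(x2):
            lo = x1                 # maximum lies in [x1, hi]
        else:
            hi = x2                 # maximum lies in [lo, x2]
    return (lo + hi) / 2

b1_star = golden_max(objective, -10.0, 10.0)
c0 = b1_star / R - b0 + y0
lhs = u_prime(c0)
rhs = β * R * sum(p * u_prime(y1 - b1_star) for p, y1 in zip(probs, y1_states))
print(b1_star, lhs, rhs)
```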
Footnotes
[1] A linear marginal utility is essential for deriving (6) from (5). Suppose instead that we had imposed the following more standard assumptions on the utility function: 𝑢′(𝑐) > 0, 𝑢″(𝑐) < 0, 𝑢‴(𝑐) > 0 and required that 𝑐 ≥ 0. The Euler equation remains (5). But the fact that 𝑢‴ > 0 implies via Jensen’s inequality that 𝔼𝑡[𝑢′(𝑐𝑡+1)] > 𝑢′(𝔼𝑡[𝑐𝑡+1]). This inequality together with (5) implies that 𝔼𝑡[𝑐𝑡+1] > 𝑐𝑡 (consumption is said to be a ‘submartingale’), so that consumption stochastically diverges to +∞. The consumer’s savings also diverge to +∞.
[2] An optimal decision rule is a map from the current state into current actions—in this
case, consumption.
[3] Representation (3) implies that 𝑑(𝐿) = 𝑈 (𝐼 − 𝐴𝐿)−1 𝐶.
[4] This would be the case if, for example, the spectral radius of 𝐴 is strictly less than one.
[5] A moving average representation for a process 𝑦𝑡 is said to be fundamental if the linear space spanned by $y^t$ is equal to the linear space spanned by $w^t$. A time-invariant innovations representation, attained via the Kalman filter, is by construction fundamental.
[6] See [61], [69], [70] for interesting applications of related ideas.
Chapter 42

Permanent Income II: LQ Techniques

42.1 Contents

• Overview 42.2
• Setup 42.3
• The LQ Approach 42.4
• Implementation 42.5
• Two Example Economies 42.6
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

42.2 Overview

This lecture continues our analysis of the linear-quadratic (LQ) permanent income model of
savings and consumption.
As we saw in our previous lecture on this topic, Robert Hall [47] used the LQ permanent income model to restrict and interpret intertemporal comovements of nondurable consumption, nonfinancial income, and financial wealth.
nonfinancial income, and financial wealth.
For example, we saw how the model asserts that for any covariance stationary process for
nonfinancial income
• consumption is a random walk
• financial wealth has a unit root and is cointegrated with consumption
Other applications use the same LQ framework.
For example, a model isomorphic to the LQ permanent income model has been used by
Robert Barro [8] to interpret intertemporal comovements of a government’s tax collections,
its expenditures net of debt service, and its public debt.
This isomorphism means that in analyzing the LQ permanent income model, we are in effect
also analyzing the Barro tax smoothing model.
It is just a matter of appropriately relabeling the variables in Hall’s model.


In this lecture, we’ll


• show how the solution to the LQ permanent income model can be obtained using LQ
control methods.
• represent the model as a linear state space system as in this lecture.
• apply QuantEcon’s LinearStateSpace class to characterize statistical features of the consumer’s optimal consumption and borrowing plans.
We’ll then use these characterizations to construct a simple model of cross-section wealth and
consumption dynamics in the spirit of Truman Bewley [14].
(Later we’ll study other Bewley models—see this lecture)
The model will prove useful for illustrating concepts such as
• stationarity
• ergodicity
• ensemble moments and cross-section observations
Let’s start with some imports:

In [2]: import quantecon as qe
import numpy as np
import scipy.linalg as la
import matplotlib.pyplot as plt
%matplotlib inline


42.3 Setup

Let’s recall the basic features of the model discussed in the permanent income model.
Consumer preferences are ordered by


$$ E_0 \sum_{t=0}^\infty \beta^t u(c_t) \tag{1} $$

where $u(c) = -(c - \gamma)^2$.

The consumer maximizes (1) by choosing a consumption, borrowing plan $\{c_t, b_{t+1}\}_{t=0}^\infty$ subject to the sequence of budget constraints

$$ c_t + b_t = \frac{1}{1 + r} b_{t+1} + y_t, \quad t \geq 0 \tag{2} $$

and the no-Ponzi condition

$$ E_0 \sum_{t=0}^\infty \beta^t b_t^2 < \infty \tag{3} $$

The interpretation of all variables and parameters is the same as in the previous lecture.
We continue to assume that (1 + 𝑟)𝛽 = 1.
The dynamics of {𝑦𝑡 } again follow the linear state space model

$$
\begin{aligned}
z_{t+1} &= A z_t + C w_{t+1} \\
y_t &= U z_t
\end{aligned}
\tag{4}
$$

The restrictions on the shock process and parameters are the same as in our previous lecture.

42.3.1 Digression on a Useful Isomorphism

The LQ permanent income model of consumption is mathematically isomorphic with a version of Barro’s [8] model of tax smoothing.
In the LQ permanent income model
• the household faces an exogenous process of nonfinancial income
• the household wants to smooth consumption across states and time
In the Barro tax smoothing model
• a government faces an exogenous sequence of government purchases (net of interest payments on its debt)
• a government wants to smooth tax collections across states and time
If we set
• 𝑇𝑡, total tax collections in Barro’s model, to consumption 𝑐𝑡 in the LQ permanent income model.
• 𝐺𝑡, exogenous government expenditures in Barro’s model, to nonfinancial income 𝑦𝑡 in the permanent income model.
• 𝐵𝑡, government risk-free one-period assets falling due in Barro’s model, to risk-free one-period consumer debt 𝑏𝑡 falling due in the LQ permanent income model.
• 𝑅, the gross rate of return on risk-free one-period government debt in Barro’s model, to the gross rate of return 1 + 𝑟 on financial assets in the permanent income model of consumption.
then the two models are mathematically equivalent.
All characterizations of a {𝑐𝑡 , 𝑦𝑡 , 𝑏𝑡 } in the LQ permanent income model automatically apply
to a {𝑇𝑡 , 𝐺𝑡 , 𝐵𝑡 } process in the Barro model of tax smoothing.
See consumption and tax smoothing models for further exploitation of an isomorphism between consumption and tax smoothing models.

42.3.2 A Specification of the Nonfinancial Income Process

For the purposes of this lecture, let’s assume {𝑦𝑡 } is a second-order univariate autoregressive
process:

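$$ y_{t+1} = \alpha + \rho_1 y_t + \rho_2 y_{t-1} + \sigma w_{t+1} $$

We can map this into the linear state space framework in (4), as discussed in our lecture on linear models.
To do so we take

$$ z_t = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 0 & 0 \\ \alpha & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 0 \\ \sigma \\ 0 \end{bmatrix}, \quad \text{and} \quad U = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} $$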
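A quick way to confirm the mapping is to drive the AR(2) recursion and the state-space recursion $z_{t+1} = A z_t + C w_{t+1}$, $y_t = U z_t$ with the same shocks and compare the resulting income paths. The parameter values, initial conditions $y_0 = y_{-1} = 0$ and seed below are illustrative assumptions.

```python
import numpy as np

α, ρ1, ρ2, σ = 1.0, 0.6, 0.2, 1.0       # illustrative AR(2) parameters
A = np.array([[1., 0., 0.],
              [α, ρ1, ρ2],
              [0., 1., 0.]])
C = np.array([0., σ, 0.])
U = np.array([0., 1., 0.])

T = 20
rng = np.random.default_rng(3)
w = rng.standard_normal(T + 1)          # w[t] is the shock dated t

# Direct AR(2) recursion with y_0 = y_{-1} = 0; y_direct[k] holds y_{k-1}
y_direct = np.zeros(T + 2)
for t in range(T):
    y_direct[t + 2] = α + ρ1 * y_direct[t + 1] + ρ2 * y_direct[t] + σ * w[t + 1]

# State-space recursion from z_0 = [1, y_0, y_{-1}]
z = np.array([1., 0., 0.])
y_ss = [U @ z]
for t in range(T):
    z = A @ z + C * w[t + 1]
    y_ss.append(U @ z)

print(np.max(np.abs(y_direct[1:] - np.array(y_ss))))
```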

42.4 The LQ Approach

Previously we solved the permanent income model by solving a system of linear expectational
difference equations subject to two boundary conditions.
Here we solve the same model using LQ methods based on dynamic programming.
After confirming that answers produced by the two methods agree, we apply QuantEcon’s
LinearStateSpace class to illustrate features of the model.
Why solve a model in two distinct ways?
Because by doing so we gather insights about the structure of the model.
Our earlier approach based on solving a system of expectational difference equations brought
to the fore the role of the consumer’s expectations about future nonfinancial income.
On the other hand, formulating the model in terms of an LQ dynamic programming problem
reminds us that
• finding the state (of a dynamic programming problem) is an art, and
• iterations on a Bellman equation implicitly jointly solve both a forecasting problem and
a control problem

42.4.1 The LQ Problem

Recall from our lecture on LQ theory that the optimal linear regulator problem is to choose a
decision rule for 𝑢𝑡 to minimize


$$ \mathbb{E} \sum_{t=0}^\infty \beta^t \{ x_t' R x_t + u_t' Q u_t \}, $$

subject to $x_0$ given and the law of motion

$$ x_{t+1} = \tilde A x_t + \tilde B u_t + \tilde C w_{t+1}, \qquad t \geq 0, \tag{5} $$

where 𝑤𝑡+1 is IID with mean vector zero and 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼.

The tildes in $\tilde A, \tilde B, \tilde C$ are to avoid clashing with notation in (4).
The value function for this problem is $v(x) = -x' P x - d$, where
• $P$ is the unique positive semidefinite solution of the corresponding matrix Riccati equation.
• The scalar $d$ is given by $d = \beta (1 - \beta)^{-1} \operatorname{trace}(P \tilde C \tilde C')$.

The optimal policy is $u_t = -F x_t$, where $F := \beta (Q + \beta \tilde B' P \tilde B)^{-1} \tilde B' P \tilde A$.

Under an optimal decision rule $F$, the state vector $x_t$ evolves according to $x_{t+1} = (\tilde A - \tilde B F) x_t + \tilde C w_{t+1}$.
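The Riccati fixed point and the formula for $F$ can be sketched with a simple iteration on $P = R + \beta \tilde A' P \tilde A - \beta^2 \tilde A' P \tilde B (Q + \beta \tilde B' P \tilde B)^{-1} \tilde B' P \tilde A$, starting from $P = 0$. This is a minimal illustration under our own naming: `solve_discounted_lq` is a hypothetical helper, not a QuantEcon function (QuantEcon's `LQ` class and its `stationary_values` method, used later in this lecture, do this job in practice). The noise matrix $\tilde C$ does not enter $P$ or $F$; it only affects $d$. The scalar test problem below is illustrative, and its solution can be checked against the root of $0.95 P^2 - 0.9 P - 1 = 0$.

```python
import numpy as np

def solve_discounted_lq(A, B, R, Q, β, tol=1e-10, max_iter=100_000):
    """
    Hypothetical helper: iterate the discounted Riccati equation
        P = R + β A'PA - β² A'PB (Q + β B'PB)^{-1} B'PA
    from P = 0 until convergence, then return P and
        F = β (Q + β B'PB)^{-1} B'PA,
    so that the optimal policy is u_t = -F x_t.
    """
    P = np.zeros(R.shape)
    for _ in range(max_iter):
        F = np.linalg.solve(Q + β * B.T @ P @ B, β * B.T @ P @ A)
        P_new = R + β * A.T @ P @ A - β * A.T @ P @ B @ F
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    F = np.linalg.solve(Q + β * B.T @ P @ B, β * B.T @ P @ A)
    return P, F

# Scalar example (illustrative): x_{t+1} = x_t + u_t + w_{t+1}, loss x_t² + u_t²
A = np.array([[1.0]])
B = np.array([[1.0]])
R = np.array([[1.0]])
Q = np.array([[1.0]])
P, F = solve_discounted_lq(A, B, R, Q, β=0.95)
print(P, F, A - B @ F)   # A - BF is the closed-loop transition matrix
```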

42.4.2 Mapping into the LQ Framework

To map into the LQ framework, we’ll use

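$$ x_t := \begin{bmatrix} z_t \\ b_t \end{bmatrix} = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \\ b_t \end{bmatrix} $$

as the state vector and $u_t := c_t - \gamma$ as the control.

With this notation and $U_\gamma := \begin{bmatrix} \gamma & 0 & 0 \end{bmatrix}$, we can write the state dynamics as in (5) when

$$ \tilde A := \begin{bmatrix} A & 0 \\ (1 + r)(U_\gamma - U) & 1 + r \end{bmatrix}, \quad \tilde B := \begin{bmatrix} 0 \\ 1 + r \end{bmatrix}, \quad \text{and} \quad \tilde C := \begin{bmatrix} C \\ 0 \end{bmatrix} $$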

Please confirm for yourself that, with these definitions, the LQ dynamics (5) match the dynamics of 𝑧𝑡 and 𝑏𝑡 described above.
To map utility into the quadratic form 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 we can set
• 𝑄 ∶= 1 (remember that we are minimizing) and
• 𝑅 ∶= a 4 × 4 matrix of zeros
However, there is one problem remaining.
We have no direct way to capture the non-recursive restriction (3) on the debt sequence {𝑏𝑡 }
from within the LQ framework.
To try to enforce it, we’re going to use a trick: put a small penalty on $b_t^2$ in the criterion function.
In the present setting, this means adding a small entry 𝜖 > 0 in the (4, 4) position of 𝑅.
That will induce a (hopefully) small approximation error in the decision rule.
We’ll check whether it really is small numerically soon.
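The mapping into the LQ framework can be checked mechanically: one step of the law of motion (5) with these $\tilde A$, $\tilde B$, $\tilde C$ should reproduce the budget constraint $b_{t+1} = (1 + r)(c_t + b_t - y_t)$ and the income dynamics (4). In the sketch below the AR(2) parameters match those used in the implementation section, while $\gamma$, the state $x_t$, consumption $c_t$ and the shock are arbitrary illustrative numbers.

```python
import numpy as np

# γ and the state below are illustrative; AR(2) values as in the implementation section
r, γ = 0.05, 1.0
α, ρ1, ρ2, σ = 10.0, 0.9, 0.0, 1.0
A = np.array([[1., 0., 0.], [α, ρ1, ρ2], [0., 1., 0.]])
C = np.array([[0.], [σ], [0.]])
U = np.array([[0., 1., 0.]])
U_γ = np.array([[γ, 0., 0.]])

# Tilde matrices from the text
A_tilde = np.block([[A, np.zeros((3, 1))],
                    [(1 + r) * (U_γ - U), np.array([[1 + r]])]])
B_tilde = np.vstack([np.zeros((3, 1)), np.array([[1 + r]])])
C_tilde = np.vstack([C, np.zeros((1, 1))])

# One step of (5) from x_t = [1, y_t, y_{t-1}, b_t] with control u_t = c_t - γ
x = np.array([[1.], [12.], [11.], [3.]])
c = 9.5
w = np.array([[0.7]])
x_next = A_tilde @ x + B_tilde * (c - γ) + C_tilde @ w

y, b = x[1, 0], x[3, 0]
print(x_next[3, 0], (1 + r) * (c + b - y))   # both sides of the budget constraint
```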

42.5 Implementation

Let’s write some code to solve the model.


One comment before we start is that the bliss level of consumption 𝛾 in the utility function
has no effect on the optimal decision rule.
We saw this in the previous lecture on the permanent income model.
The reason is that it drops out of the Euler equation for consumption.
In what follows we set it equal to zero; this is why the first entry of the last row of ALQ below is zero.

42.5.1 The Exogenous Nonfinancial Income Process

First, we create the objects for the optimal linear regulator

In [3]: # Set parameters
α, β, ρ1, ρ2, σ = 10.0, 0.95, 0.9, 0.0, 1.0

R = 1 / β
A = np.array([[1., 0., 0.],
              [α, ρ1, ρ2],
              [0., 1., 0.]])
C = np.array([[0.], [σ], [0.]])
G = np.array([[0., 1., 0.]])

# Form LinearStateSpace system and pull off steady state moments
μ_z0 = np.array([[1.0], [0.0], [0.0]])
Σ_z0 = np.zeros((3, 3))
Lz = qe.LinearStateSpace(A, C, G, mu_0=μ_z0, Sigma_0=Σ_z0)
μ_z, μ_y, Σ_z, Σ_y = Lz.stationary_distributions()

# Mean vector of state for the savings problem
mxo = np.vstack([μ_z, 0.0])

# Create stationary covariance matrix of x -- start everyone off at b=0
a1 = np.zeros((3, 1))
aa = np.hstack([Σ_z, a1])
bb = np.zeros((1, 4))
sxo = np.vstack([aa, bb])

# These choices will initialize the state vector of an individual at zero
# debt and the ergodic distribution of the endowment process. Use these to
# create the Bewley economy.
mxbewley = mxo
sxbewley = sxo

The next step is to create the matrices for the LQ system

In [4]: A12 = np.zeros((3,1))
ALQ_l = np.hstack([A, A12])
ALQ_r = np.array([[0, -R, 0, R]])
ALQ = np.vstack([ALQ_l, ALQ_r])

RLQ = np.array([[0., 0., 0., 0.],
                [0., 0., 0., 0.],
                [0., 0., 0., 0.],
                [0., 0., 0., 1e-9]])

QLQ = np.array([1.0])
BLQ = np.array([0., 0., 0., R]).reshape(4,1)
CLQ = np.array([0., σ, 0., 0.]).reshape(4,1)
β_LQ = β

Let’s print these out and have a look at them.

In [5]: print(f"A = \n {ALQ}")
print(f"B = \n {BLQ}")
print(f"R = \n {RLQ}")
print(f"Q = \n {QLQ}")

A =
[[ 1. 0. 0. 0. ]
[10. 0.9 0. 0. ]
[ 0. 1. 0. 0. ]
[ 0. -1.05263158 0. 1.05263158]]
B =
[[0. ]
[0. ]
[0. ]
[1.05263158]]
R =
[[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 1.e-09]]
Q =
[1.]

Now create the appropriate instance of an LQ model

In [6]: lqpi = qe.LQ(QLQ, RLQ, ALQ, BLQ, C=CLQ, beta=β_LQ)

We’ll save the implied optimal policy function and soon compare it with what we get by employing an alternative solution method.

In [7]: P, F, d = lqpi.stationary_values()  # Compute value function and decision rule
ABF = ALQ - BLQ @ F  # Form closed loop system

42.5.2 Comparison with the Difference Equation Approach

In our first lecture on the infinite horizon permanent income problem we used a different solution method.
The method was based around
• deducing the Euler equations that are the first-order conditions with respect to consumption and savings.
• using the budget constraints and boundary condition to complete a system of expectational linear difference equations.
• solving those equations to obtain the solution.
Expressed in state space notation, the solution took the form

$$
\begin{aligned}
z_{t+1} &= A z_t + C w_{t+1} \\
b_{t+1} &= b_t + U [(I - \beta A)^{-1} (A - I)] z_t \\
y_t &= U z_t \\
c_t &= (1 - \beta) [U (I - \beta A)^{-1} z_t - b_t]
\end{aligned}
$$

Now we’ll apply the formulas in this system.

In [8]: # Use the above formulas to create the optimal policies for b_{t+1} and c_t
b_pol = G @ la.inv(np.eye(3, 3) - β * A) @ (A - np.eye(3, 3))
c_pol = (1 - β) * G @ la.inv(np.eye(3, 3) - β * A)

# Create the A matrix for a LinearStateSpace instance
A_LSS1 = np.vstack([A, b_pol])
A_LSS2 = np.eye(4, 1, -3)
A_LSS = np.hstack([A_LSS1, A_LSS2])

# Create the C matrix for LSS methods
C_LSS = np.vstack([C, np.zeros(1)])

# Create the G matrix for LSS methods
G_LSS1 = np.vstack([G, c_pol])
G_LSS2 = np.vstack([np.zeros(1), -(1 - β)])
G_LSS = np.hstack([G_LSS1, G_LSS2])

# Use the following values to start everyone off at b=0, initial incomes zero
μ_0 = np.array([1., 0., 0., 0.])
Σ_0 = np.zeros((4, 4))

A_LSS calculated as we have here should equal ABF calculated above using the LQ model.

In [9]: ABF - A_LSS

Out[9]: array([[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
                 0.00000000e+00],
               [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
                 0.00000000e+00],
               [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
                 0.00000000e+00],
               [-9.51248418e-06, 9.51247728e-08, 0.00000000e+00,
                -1.99999901e-08]])

Now compare pertinent elements of c_pol and F.

In [10]: print(c_pol, "\n", -F)

[[65.51724138 0.34482759 0. ]]
[[ 6.55172323e+01 3.44827677e-01 -0.00000000e+00 -5.00000190e-02]]

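We have verified that the two methods give the same solution, up to the small approximation error induced by the penalty 𝜖 on $b_t^2$ (the discrepancies above are of order $10^{-5}$ or smaller).
Now let’s create instances of the LinearStateSpace class and use it to do some interesting experiments.
To do this, we’ll use the outcomes from our second method.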

42.6 Two Example Economies

In the spirit of Bewley models [14], we’ll generate panels of consumers.


The examples differ only in the initial states with which we endow the consumers.
All other parameter values are kept the same in the two examples
• In the first example, all consumers begin with zero nonfinancial income and zero debt.
– The consumers are thus ex-ante identical.

• In the second example, while all begin with zero debt, we draw their initial income levels from the invariant distribution of nonfinancial income.
– Consumers are ex-ante heterogeneous.
In the first example, consumers’ nonfinancial income paths display pronounced transients
early in the sample
• these will affect outcomes in striking ways
Those transient effects will not be present in the second example.
We use methods affiliated with the LinearStateSpace class to simulate the model.

42.6.1 First Set of Initial Conditions

We generate 25 paths of the exogenous non-financial income process and the associated optimal consumption and debt paths.
In the first set of graphs, darker lines depict a particular sample path, while the lighter lines describe 24 other paths.
A second graph plots a collection of simulations against the population distribution that we extract from the LinearStateSpace instance LSS.
Comparing sample paths with population distributions at each date 𝑡 is a useful exercise; see our discussion of the laws of large numbers.

In [11]: lss = qe.LinearStateSpace(A_LSS, C_LSS, G_LSS, mu_0=μ_0, Sigma_0=Σ_0)

42.6.2 Population and Sample Panels

In the code below, we use the LinearStateSpace class to


• compute and plot population quantiles of the distributions of consumption and debt for a population of consumers.
• simulate a group of 25 consumers and plot sample paths on the same graph as the population distribution.

In [12]: def income_consumption_debt_series(A, C, G, μ_0, Σ_0, T=150, npaths=25):
    """
    This function takes initial conditions (μ_0, Σ_0) and uses the
    LinearStateSpace class from QuantEcon to simulate an economy
    npaths times for T periods. It returns the simulated paths and the
    population moments used to generate the graphs discussed below.
    """
    lss = qe.LinearStateSpace(A, C, G, mu_0=μ_0, Sigma_0=Σ_0)

    # Simulation/Moment Parameters
    moment_generator = lss.moment_sequence()

    # Simulate various paths
    bsim = np.empty((npaths, T))
    csim = np.empty((npaths, T))
    ysim = np.empty((npaths, T))

    for i in range(npaths):
        sims = lss.simulate(T)
        bsim[i, :] = sims[0][-1, :]
        csim[i, :] = sims[1][1, :]
        ysim[i, :] = sims[1][0, :]

    # Get the moments
    cons_mean = np.empty(T)
    cons_var = np.empty(T)
    debt_mean = np.empty(T)
    debt_var = np.empty(T)
    for t in range(T):
        μ_x, μ_y, Σ_x, Σ_y = next(moment_generator)
        cons_mean[t], cons_var[t] = μ_y[1], Σ_y[1, 1]
        debt_mean[t], debt_var[t] = μ_x[3], Σ_x[3, 3]

    return bsim, csim, ysim, cons_mean, cons_var, debt_mean, debt_var

def consumption_income_debt_figure(bsim, csim, ysim):

    # Get T
    T = bsim.shape[1]

    # Create the first figure
    fig, ax = plt.subplots(2, 1, figsize=(10, 8))
    xvals = np.arange(T)

    # Plot consumption and income
    ax[0].plot(csim[0, :], label="c", color="b")
    ax[0].plot(ysim[0, :], label="y", color="g")
    ax[0].plot(csim.T, alpha=.1, color="b")
    ax[0].plot(ysim.T, alpha=.1, color="g")
    ax[0].legend(loc=4)
    ax[0].set(title="Nonfinancial Income, Consumption, and Debt",
              xlabel="t", ylabel="y and c")

    # Plot debt
    ax[1].plot(bsim[0, :], label="b", color="r")
    ax[1].plot(bsim.T, alpha=.1, color="r")
    ax[1].legend(loc=4)
    ax[1].set(xlabel="t", ylabel="debt")

    fig.tight_layout()
    return fig

def consumption_debt_fanchart(csim, cons_mean, cons_var,
                              bsim, debt_mean, debt_var):
    # Get T
    T = bsim.shape[1]

    # Create percentiles of cross-section distributions of consumption
    cmean = np.mean(cons_mean)
    c90 = 1.65 * np.sqrt(cons_var)
    c95 = 1.96 * np.sqrt(cons_var)
    c_perc_95p, c_perc_95m = cons_mean + c95, cons_mean - c95
    c_perc_90p, c_perc_90m = cons_mean + c90, cons_mean - c90

    # Create percentiles of cross-section distributions of debt
    dmean = np.mean(debt_mean)
    d90 = 1.65 * np.sqrt(debt_var)
    d95 = 1.96 * np.sqrt(debt_var)
    d_perc_95p, d_perc_95m = debt_mean + d95, debt_mean - d95
    d_perc_90p, d_perc_90m = debt_mean + d90, debt_mean - d90

    # Create second figure
    fig, ax = plt.subplots(2, 1, figsize=(10, 8))
    xvals = np.arange(T)

    # Consumption fan
    ax[0].plot(xvals, cons_mean, color="k")
    ax[0].plot(csim.T, color="k", alpha=.25)
    ax[0].fill_between(xvals, c_perc_95m, c_perc_95p, alpha=.25, color="b")
    ax[0].fill_between(xvals, c_perc_90m, c_perc_90p, alpha=.25, color="r")
    ax[0].set(title="Consumption/Debt over time",
              ylim=(cmean-15, cmean+15), ylabel="consumption")

    # Debt fan
    ax[1].plot(xvals, debt_mean, color="k")
    ax[1].plot(bsim.T, color="k", alpha=.25)
    ax[1].fill_between(xvals, d_perc_95m, d_perc_95p, alpha=.25, color="b")
    ax[1].fill_between(xvals, d_perc_90m, d_perc_90p, alpha=.25, color="r")
    ax[1].set(xlabel="t", ylabel="debt")

    fig.tight_layout()
    return fig
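The fan chart draws bands of 1.65 and 1.96 population standard deviations around the population mean. Reading those bands as 90% and 95% intervals presumes the cross-sectional distributions are normal, which is what a linear state-space model driven by Gaussian shocks from a Gaussian initial distribution delivers. The check below is a small sketch using only the standard library's `statistics.NormalDist` (our choice; the lecture itself simply hardcodes the multipliers):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(z):
    """Probability that a normal draw lies within z standard deviations of its mean."""
    return 2 * std_normal.cdf(z) - 1

# Exact two-sided multipliers for 90% and 95% bands
z90 = std_normal.inv_cdf(0.95)
z95 = std_normal.inv_cdf(0.975)

for z in (1.65, 1.96):
    print(f"mean +/- {z} sd covers {coverage(z):.4f} of the distribution")
print(f"exact multipliers: 90% -> {z90:.4f}, 95% -> {z95:.4f}")
```

So 1.65 is a rounded version of the exact 90% multiplier, and 1.96 matches the 95% multiplier to two decimals.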

Now let’s create figures with initial conditions of zero for 𝑦0 and 𝑏0

In [13]: out = income_consumption_debt_series(A_LSS, C_LSS, G_LSS, μ_0, Σ_0)


bsim0, csim0, ysim0 = out[:3]
cons_mean0, cons_var0, debt_mean0, debt_var0 = out[3:]

consumption_income_debt_figure(bsim0, csim0, ysim0)

plt.show()
690 CHAPTER 42. PERMANENT INCOME II: LQ TECHNIQUES

In [14]: consumption_debt_fanchart(csim0, cons_mean0, cons_var0,


bsim0, debt_mean0, debt_var0)

plt.show()

Here is what is going on in the above graphs.


For our simulation, we have set initial conditions 𝑏0 = 𝑦−1 = 𝑦−2 = 0.
Because 𝑦−1 = 𝑦−2 = 0, nonfinancial income 𝑦𝑡 starts far below its stationary mean 𝜇𝑦,∞ and
rises early in each simulation.
Recall from the previous lecture that we can represent the optimal decision rule for consumption in terms of the cointegrating relationship

$$(1 - \beta) b_t + c_t = (1 - \beta) E_t \sum_{j=0}^\infty \beta^j y_{t+j} \tag{6}$$

So at time 0, because $b_0 = 0$, we have

$$c_0 = (1 - \beta) E_0 \sum_{t=0}^\infty \beta^t y_t$$

This tells us that consumption starts at the income that would be paid by an annuity whose
value equals the expected discounted value of nonfinancial income at time 𝑡 = 0.
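When income is a linear function of a state with a linear law of motion, this annuity value has a closed form: if $y_t = g x_t$ and $E_t x_{t+j} = A^j x_t$, then $E_t \sum_j \beta^j y_{t+j} = g (I - \beta A)^{-1} x_t$, provided the eigenvalues of $\beta A$ lie inside the unit circle. The sketch below uses a hypothetical income process ($\alpha = 10$, $\rho_1 = 0.9$, $\rho_2 = 0$, $\beta = 0.95$), not the lecture's calibration, and compares the closed form with a long truncated sum. Starting from zero income, the annuity value (and hence $c_0$ when $b_0 = 0$) lies strictly between current income and long-run mean income, which is why the consumer borrows early.

```python
import numpy as np

# Hypothetical income process (not the lecture's calibration):
# y_{t+1} = alpha + rho1 * y_t + rho2 * y_{t-1} + noise
beta = 0.95
alpha, rho1, rho2 = 10.0, 0.9, 0.0

# State x_t = [1, y_t, y_{t-1}], so E_t x_{t+j} = A^j x_t and y_t = g @ x_t
A = np.array([[1.0, 0.0, 0.0],
              [alpha, rho1, rho2],
              [0.0, 1.0, 0.0]])
g = np.array([0.0, 1.0, 0.0])

def annuity_value(x):
    """(1 - beta) * E_t sum_j beta^j y_{t+j} = (1 - beta) g (I - beta A)^{-1} x."""
    return (1 - beta) * g @ np.linalg.solve(np.eye(3) - beta * A, x)

def annuity_value_truncated(x, J=2000):
    """Same quantity computed by summing the first J discounted forecasts."""
    total, xj = 0.0, np.asarray(x, dtype=float)
    for j in range(J):
        total += beta**j * (g @ xj)
        xj = A @ xj
    return (1 - beta) * total

x_zero = np.array([1.0, 0.0, 0.0])   # y_t = y_{t-1} = 0, as in the first example
c0 = annuity_value(x_zero)           # consumption at t = 0 when b_0 = 0
print(c0, annuity_value_truncated(x_zero))
```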

To support that level of consumption, the consumer borrows a lot early and consequently builds up substantial debt.
In fact, the consumer incurs so much debt that eventually, in the stochastic steady state, he or she consumes less each period than his or her nonfinancial income.
The consumer uses the gap between consumption and nonfinancial income mostly to service the interest payments due on the debt.
Thus, when we look at the panel of debt in the accompanying graph, we see that this is a
group of ex-ante identical people each of whom starts with zero debt.
All of them accumulate debt in anticipation of rising nonfinancial income.
They expect their nonfinancial income to rise toward the invariant distribution of income, a
consequence of our having started them at 𝑦−1 = 𝑦−2 = 0.

Cointegration Residual

The following figure plots realizations of the left side of (6), which, as discussed in our last lecture, is called the cointegrating residual.
As mentioned above, the right side can be thought of as an annuity payment on the expected present value of future income $E_t \sum_{j=0}^\infty \beta^j y_{t+j}$.
Early along a realization, $c_t$ is approximately constant while $(1 - \beta) b_t$ and $(1 - \beta) E_t \sum_{j=0}^\infty \beta^j y_{t+j}$ both rise markedly as the household’s present value of income and borrowing rise pretty much together.
This example illustrates the following point: the definition of cointegration implies that the cointegrating residual is asymptotically covariance stationary, not covariance stationary.
The cointegrating residual for the specification with zero income and zero debt initially has a notable transient component that dominates its behavior early in the sample.
By altering initial conditions, we shall remove this transient in our second example, presented below.

In [15]: def cointegration_figure(bsim, csim):
    """
    Plots the cointegrating residual (1 - β) b_t + c_t for each simulated path
    """
    # Create figure
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.plot((1 - β) * bsim[0, :] + csim[0, :], color="k")
    ax.plot((1 - β) * bsim.T + csim.T, color="k", alpha=.1)

    ax.set(title="Cointegration of Assets and Consumption", xlabel="t")

    return fig

In [16]: cointegration_figure(bsim0, csim0)


plt.show()

42.6.3 A “Borrowers and Lenders” Closed Economy

When we set 𝑦−1 = 𝑦−2 = 0 and 𝑏0 = 0 in the preceding exercise, we make debt “head north” early in the sample.
Average debt in the cross-section rises and approaches an asymptote.
We can regard these as outcomes of a “small open economy” that borrows from abroad at the fixed gross interest rate 𝑅 = 𝑟 + 1 in anticipation of rising incomes.
So with the economic primitives set as above, the economy converges to a steady state in which there is an excess aggregate supply of risk-free loans at a gross interest rate of 𝑅.
This excess supply is filled by “foreign lenders” willing to make those loans.
We can use virtually the same code to rig a “poor man’s Bewley [14] model” in the following way:
• as before, we start everyone at 𝑏0 = 0.
• But instead of starting everyone at 𝑦−1 = 𝑦−2 = 0, we draw $\begin{bmatrix} y_{-1} \\ y_{-2} \end{bmatrix}$ from the invariant distribution of the $\{y_t\}$ process.
This rigs a closed economy in which people are borrowing and lending with each other at a
gross risk-free interest rate of 𝑅 = 𝛽 −1 .
Across the group of people being analyzed, risk-free loans are in zero excess supply.

We have arranged primitives so that 𝑅 = 𝛽 −1 clears the market for risk-free loans at zero
aggregate excess supply.
So the risk-free loans are being made from one person to another within our closed set of agents.
There is no need for foreigners to lend to our group.
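One way to draw initial conditions from an invariant distribution, sketched here under assumptions of our own: take a stationary Gaussian AR(2) for nonfinancial income with hypothetical parameters, compute the invariant mean and covariance of the pair $(y_{-1}, y_{-2})$ by solving the discrete Lyapunov equation $\Sigma = A \Sigma A' + C C'$ via the Kronecker-product formula, and sample from $N(\mu, \Sigma)$. We do not reproduce how the lecture builds `mxbewley` and `sxbewley`; the parameter values below are illustrative only.

```python
import numpy as np

# Hypothetical stationary AR(2) for nonfinancial income (illustrative values only):
# y_t = alpha + rho1 * y_{t-1} + rho2 * y_{t-2} + sigma * w_t,   w_t ~ N(0, 1)
alpha, rho1, rho2, sigma = 10.0, 1.2, -0.3, 1.0

# Companion form for the pair s_t = [y_t, y_{t-1}]' in deviations from the mean
A = np.array([[rho1, rho2],
              [1.0, 0.0]])
C = np.array([[sigma],
              [0.0]])

# Invariant mean: both components equal alpha / (1 - rho1 - rho2)
mu = np.full(2, alpha / (1 - rho1 - rho2))

# Invariant covariance solves Sigma = A Sigma A' + C C'.
# With row-major vectorization, vec(A Sigma A') = (A kron A) vec(Sigma).
n = A.shape[0]
vec_sigma = np.linalg.solve(np.eye(n * n) - np.kron(A, A), (C @ C.T).ravel())
Sigma = vec_sigma.reshape(n, n)
Sigma = (Sigma + Sigma.T) / 2          # remove round-off asymmetry

# Draw [y_{-1}, y_{-2}] for a panel of 25 consumers from N(mu, Sigma)
rng = np.random.default_rng(1234)
initial_y = rng.multivariate_normal(mu, Sigma, size=25)
print(mu, Sigma, initial_y.mean(axis=0), sep="\n")
```

Because every consumer's income pair comes from the same invariant distribution, no one expects income to trend up on average, which is what lets average debt stay near zero.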
Let’s have a look at the corresponding figures

In [17]: out = income_consumption_debt_series(A_LSS, C_LSS, G_LSS, mxbewley, sxbewley)


bsimb, csimb, ysimb = out[:3]
cons_meanb, cons_varb, debt_meanb, debt_varb = out[3:]

consumption_income_debt_figure(bsimb, csimb, ysimb)

plt.show()

In [18]: consumption_debt_fanchart(csimb, cons_meanb, cons_varb,


bsimb, debt_meanb, debt_varb)

plt.show()

The graphs confirm the following outcomes:


• As before, the consumption distribution spreads out over time.
But now there is some initial dispersion because there is ex-ante heterogeneity in the initial draws of $\begin{bmatrix} y_{-1} \\ y_{-2} \end{bmatrix}$.
• As before, the cross-section distribution of debt spreads out over time.
• Unlike before, the average level of debt stays at zero, confirming that this is a closed
borrower-and-lender economy.
• Now the cointegrating residual seems stationary, and not just asymptotically stationary.
Let’s have a look at the cointegration figure

In [19]: cointegration_figure(bsimb, csimb)


plt.show()
Chapter 43

Production Smoothing via Inventories

43.1 Contents

• Overview 43.2
• Example 1 43.3
• Inventories Not Useful 43.4
• Inventories Useful but are Hardwired to be Zero Always 43.5
• Example 2 43.6
• Example 3 43.7
• Example 4 43.8
• Example 5 43.9
• Example 6 43.10
• Exercises 43.11
In addition to what’s in Anaconda, this lecture employs the following library:

In [1]: !conda install -y quantecon

43.2 Overview

This lecture can be viewed as an application of the quantecon lecture.


It formulates a discounted dynamic program for a firm that chooses a production schedule to
balance
• minimizing costs of production across time, against
• keeping costs of holding inventories low
In the tradition of a classic book by Holt, Modigliani, Muth, and Simon [? ], we simplify the firm’s problem by formulating it as a linear quadratic discounted dynamic programming problem of the type studied in this QuantEcon lecture.
Because its costs of production are increasing and quadratic in production, the firm wants to smooth production across time provided that holding inventories is not too costly.
But the firm also prefers to sell out of existing inventories, a preference that we represent by a cost that is quadratic in the difference between sales in a period and the firm’s beginning-of-period inventories.
We compute examples designed to indicate how the firm optimally chooses to smooth production and manage inventories while keeping inventories close to sales.
To introduce components of the model, let
• $S_t$ be sales at time $t$
• $Q_t$ be production at time $t$
• $I_t$ be inventories at the beginning of time $t$
• $\beta \in (0, 1)$ be a discount factor
• $c(Q_t) = c_1 Q_t + c_2 Q_t^2$ be a cost-of-production function, where $c_1 > 0$, $c_2 > 0$
• $d(I_t, S_t) = d_1 I_t + d_2 (S_t - I_t)^2$, where $d_1 > 0$, $d_2 > 0$, be a cost-of-holding-inventories function, consisting of two components:
– a cost $d_1 I_t$ of carrying inventories, and
– a cost $d_2 (S_t - I_t)^2$ of having inventories deviate from sales
• $p_t = a_0 - a_1 S_t + v_t$ be an inverse demand function for a firm’s product, where $a_0 > 0$, $a_1 > 0$ and $v_t$ is a demand shock at time $t$
• $\pi_t = p_t S_t - c(Q_t) - d(I_t, S_t)$ be the firm’s profits at time $t$
• $\sum_{t=0}^\infty \beta^t \pi_t$ be the present value of the firm’s profits at time 0
• $I_{t+1} = I_t + Q_t - S_t$ be the law of motion of inventories
• $z_{t+1} = A_{22} z_t + C_2 \epsilon_{t+1}$ be the law of motion for an exogenous state vector $z_t$ that contains time $t$ information useful for predicting the demand shock $v_t$
• $v_t = G z_t$ link the demand shock to the information set $z_t$
• the constant 1 be the first component of $z_t$
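To map our problem into a linear-quadratic discounted dynamic programming problem (also known as an optimal linear regulator), we define the state vector at time $t$ as

$$x_t = \begin{bmatrix} I_t \\ z_t \end{bmatrix}$$

and the control vector as

$$u_t = \begin{bmatrix} Q_t \\ S_t \end{bmatrix}$$

The law of motion for the state vector $x_t$ is evidently

$$\begin{bmatrix} I_{t+1} \\ z_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & A_{22} \end{bmatrix} \begin{bmatrix} I_t \\ z_t \end{bmatrix} + \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_t \\ S_t \end{bmatrix} + \begin{bmatrix} 0 \\ C_2 \end{bmatrix} \epsilon_{t+1}$$

or

$$x_{t+1} = A x_t + B u_t + C \epsilon_{t+1}$$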
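To see that the stacked matrices reproduce both the inventory law and the $z$ process, the sketch below builds $A$, $B$, $C$ for the default $A_{22}$ and $C_2$ of the lecture's `SmoothingExample` class and takes one step from hypothetical values $I_t = 2$, $v_t = 0.5$, $Q_t = 3$, $S_t = 4$, $\epsilon_{t+1} = 0.25$. The first component of the result should be $I_t + Q_t - S_t = 1$.

```python
import numpy as np

# Default z-process from the lecture's SmoothingExample class
A22 = np.array([[1.0, 0.0],
                [1.0, 0.9]])
C2 = np.array([[0.0],
               [1.0]])
k, j = C2.shape

# Stack the inventory equation on top of the z-process
A = np.zeros((k + 1, k + 1))
A[0, 0] = 1.0
A[1:, 1:] = A22
B = np.zeros((k + 1, 2))
B[0, :] = [1.0, -1.0]          # u_t = [Q_t, S_t] enters only the inventory row
C = np.zeros((k + 1, j))
C[1:, :] = C2

def step(x, u, eps):
    """One step of x_{t+1} = A x_t + B u_t + C eps_{t+1}."""
    return A @ x + B @ u + C @ eps

x = np.array([2.0, 1.0, 0.5])  # [I_t, 1, v_t]
u = np.array([3.0, 4.0])       # [Q_t, S_t]
eps = np.array([0.25])
x_next = step(x, u, eps)
print(x_next)
```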

(At this point, we ask that you please forgive us for using $Q_t$ to denote the firm’s production at time $t$, while below we use $Q$ as the matrix in the quadratic form $u_t' Q u_t$ that appears in the firm’s one-period profit function.)
We can express the firm’s profit as a function of states and controls as
43.2. OVERVIEW 699

𝜋𝑡 = −(𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 + 2𝑢′𝑡 𝐻𝑥𝑡 )

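To form the matrices $R, Q, H$, we note that the firm’s profit function at time $t$ can be expressed as

$$
\begin{aligned}
\pi_t &= p_t S_t - c(Q_t) - d(I_t, S_t) \\
&= (a_0 - a_1 S_t + v_t) S_t - c_1 Q_t - c_2 Q_t^2 - d_1 I_t - d_2 (S_t - I_t)^2 \\
&= a_0 S_t - a_1 S_t^2 + G z_t S_t - c_1 Q_t - c_2 Q_t^2 - d_1 I_t - d_2 S_t^2 - d_2 I_t^2 + 2 d_2 S_t I_t \\
&= -\Big( \underbrace{d_1 I_t + d_2 I_t^2}_{x_t' R x_t} + \underbrace{a_1 S_t^2 + d_2 S_t^2 + c_2 Q_t^2}_{u_t' Q u_t} \underbrace{- a_0 S_t - G z_t S_t + c_1 Q_t - 2 d_2 S_t I_t}_{2 u_t' H x_t} \Big) \\
&= -\left( \begin{bmatrix} I_t & z_t' \end{bmatrix} \underbrace{\begin{bmatrix} d_2 & \frac{d_1}{2} S_c \\ \frac{d_1}{2} S_c' & 0 \end{bmatrix}}_{\equiv R} \begin{bmatrix} I_t \\ z_t \end{bmatrix} + \begin{bmatrix} Q_t & S_t \end{bmatrix} \underbrace{\begin{bmatrix} c_2 & 0 \\ 0 & a_1 + d_2 \end{bmatrix}}_{\equiv Q} \begin{bmatrix} Q_t \\ S_t \end{bmatrix} + 2 \begin{bmatrix} Q_t & S_t \end{bmatrix} \underbrace{\begin{bmatrix} 0 & \frac{c_1}{2} S_c \\ -d_2 & -\frac{a_0}{2} S_c - \frac{G}{2} \end{bmatrix}}_{\equiv H} \begin{bmatrix} I_t \\ z_t \end{bmatrix} \right)
\end{aligned}
\tag{43.1}
$$

where $S_c = [1, 0]$ selects the first component of $z_t$, the constant 1.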
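Because (43.1) is easy to get wrong by a sign or a factor of 2, a quick numerical check helps. The sketch below uses the default parameter values of the lecture's `SmoothingExample` class and a two-component $z_t = [1, v_t]'$ with $G = [0, 1]$. It evaluates profits term by term and through the quadratic form at random (hypothetical) values of $I_t, v_t, Q_t, S_t$; the two should agree exactly up to rounding.

```python
import numpy as np

# Default parameter values (as in the lecture's SmoothingExample class)
a0, a1, c1, c2, d1, d2 = 10.0, 1.0, 1.0, 1.0, 1.0, 1.0
G = np.array([0.0, 1.0])    # v_t = G z_t, with z_t = [1, v_t]
Sc = np.array([1.0, 0.0])   # picks the constant out of z_t

# R, Q and H from (43.1); Q is called Qmat to avoid a clash with production Q_t
R = np.zeros((3, 3))
R[0, 0] = d2
R[0, 1:] = d1 / 2 * Sc
R[1:, 0] = d1 / 2 * Sc
Qmat = np.diag([c2, a1 + d2])
H = np.zeros((2, 3))
H[0, 1:] = c1 / 2 * Sc
H[1, 0] = -d2
H[1, 1:] = -a0 / 2 * Sc - G / 2

def profit_direct(I, z, Q, S):
    """pi_t = p_t S_t - c(Q_t) - d(I_t, S_t), computed term by term."""
    p = a0 - a1 * S + G @ z
    return p * S - (c1 * Q + c2 * Q**2) - (d1 * I + d2 * (S - I)**2)

def profit_quadratic(I, z, Q, S):
    """pi_t = -(x'Rx + u'Qu + 2u'Hx) with x = [I, z], u = [Q, S]."""
    x = np.concatenate(([I], z))
    u = np.array([Q, S])
    return -(x @ R @ x + u @ Qmat @ u + 2 * u @ H @ x)

rng = np.random.default_rng(0)
I, v, Q, S = rng.normal(size=4)
z = np.array([1.0, v])
print(profit_direct(I, z, Q, S), profit_quadratic(I, z, Q, S))
```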


Remark on notation: The notation for the cross-product term in the QuantEcon library is $N$ instead of $H$.
The firm’s optimal decision rule takes the form

𝑢𝑡 = −𝐹 𝑥𝑡

and the evolution of the state under the optimal decision rule is

𝑥𝑡+1 = (𝐴 − 𝐵𝐹 )𝑥𝑡 + 𝐶𝜖𝑡+1
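For readers curious about where $F$ comes from, here is a minimal sketch of one standard route: iterate on the discounted Riccati equation for the problem of minimizing $E \sum_t \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' H x_t)$, which is the same as maximizing discounted profits. The first-order condition for $u$ gives $F = (Q + \beta B' P B)^{-1} (\beta B' P A + H)$. The lecture's code delegates this step to `qe.LQ.stationary_values`, which uses its own solver rather than this plain iteration; the function name `solve_discounted_lq` is ours.

```python
import numpy as np

def solve_discounted_lq(A, B, Q, R, H, beta, tol=1e-10, max_iter=10_000):
    """
    Iterate on the discounted Riccati equation for
        min E sum_t beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' H x_t)
        s.t. x_{t+1} = A x_t + B u_t + C eps_{t+1}.
    The shock loading C only adds a constant to the value function,
    so it does not enter P or F. Returns P and F with u_t = -F x_t.
    """
    P = np.zeros_like(A)
    for _ in range(max_iter):
        K = beta * B.T @ P @ A + H          # m x n
        S = Q + beta * B.T @ P @ B          # m x m
        P_new = R + beta * A.T @ P @ A - K.T @ np.linalg.solve(S, K)
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    K = beta * B.T @ P @ A + H
    S = Q + beta * B.T @ P @ B
    F = np.linalg.solve(S, K)
    return P, F

# Matrices for the default parameters, built as in (43.1)
beta, a0, a1, c1, c2, d1, d2 = 0.96, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0
A22 = np.array([[1.0, 0.0], [1.0, 0.9]])
G = np.array([0.0, 1.0])
Sc = np.array([1.0, 0.0])

A = np.zeros((3, 3))
A[0, 0] = 1.0
A[1:, 1:] = A22
B = np.zeros((3, 2))
B[0, :] = [1.0, -1.0]
R = np.zeros((3, 3))
R[0, 0] = d2
R[0, 1:] = d1 / 2 * Sc
R[1:, 0] = d1 / 2 * Sc
Q = np.diag([c2, a1 + d2])
H = np.zeros((2, 3))
H[0, 1:] = c1 / 2 * Sc
H[1, 0] = -d2
H[1, 1:] = -a0 / 2 * Sc - G / 2

P, F = solve_discounted_lq(A, B, Q, R, H, beta)
print("F =", F)
```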

Here is code for computing an optimal decision rule and for analyzing its consequences.

In [2]: import numpy as np


import quantecon as qe
import matplotlib.pyplot as plt
%matplotlib inline


In [3]: class SmoothingExample:
    """
    Class for constructing, solving, and plotting results for
    inventories and sales smoothing problem.
    """

    def __init__(self,
                 β=0.96,           # Discount factor
                 c1=1,             # Cost-of-production
                 c2=1,
                 d1=1,             # Cost-of-holding inventories
                 d2=1,
                 a0=10,            # Inverse demand function
                 a1=1,
                 A22=[[1, 0],      # z process
                      [1, 0.9]],
                 C2=[[0], [1]],
                 G=[0, 1]):

        self.β = β
        self.c1, self.c2 = c1, c2
        self.d1, self.d2 = d1, d2
        self.a0, self.a1 = a0, a1
        self.A22 = np.atleast_2d(A22)
        self.C2 = np.atleast_2d(C2)
        self.G = np.atleast_2d(G)

        # Dimensions
        k, j = self.C2.shape        # Dimensions for randomness part
        n = k + 1                   # Number of states
        m = 2                       # Number of controls

        Sc = np.zeros(k)
        Sc[0] = 1

        # Construct matrices of transition law
        A = np.zeros((n, n))
        A[0, 0] = 1
        A[1:, 1:] = self.A22

        B = np.zeros((n, m))
        B[0, :] = 1, -1

        C = np.zeros((n, j))
        C[1:, :] = self.C2

        self.A, self.B, self.C = A, B, C

        # Construct matrices of one period profit function
        R = np.zeros((n, n))
        R[0, 0] = d2
        R[1:, 0] = d1 / 2 * Sc
        R[0, 1:] = d1 / 2 * Sc

        Q = np.zeros((m, m))
        Q[0, 0] = c2
        Q[1, 1] = a1 + d2

        N = np.zeros((m, n))
        N[1, 0] = - d2
        N[0, 1:] = c1 / 2 * Sc
        N[1, 1:] = - a0 / 2 * Sc - self.G / 2

        self.R, self.Q, self.N = R, Q, N

        # Construct LQ instance
        self.LQ = qe.LQ(Q, R, A, B, C, N, beta=β)
        self.LQ.stationary_values()

    def simulate(self, x0, T=100):

        c1, c2 = self.c1, self.c2
        d1, d2 = self.d1, self.d2
        a0, a1 = self.a0, self.a1
        G = self.G

        x_path, u_path, w_path = self.LQ.compute_sequence(x0, ts_length=T)

        I_path = x_path[0, :-1]
        z_path = x_path[1:, :-1]
        ν_path = (G @ z_path)[0, :]

        Q_path = u_path[0, :]
        S_path = u_path[1, :]

        revenue = (a0 - a1 * S_path + ν_path) * S_path
        cost_production = c1 * Q_path + c2 * Q_path ** 2
        cost_inventories = d1 * I_path + d2 * (S_path - I_path) ** 2

        Q_no_inventory = (a0 + ν_path - c1) / (2 * (a1 + c2))
        Q_hardwired = (a0 + ν_path - c1) / (2 * (a1 + c2 + d2))

        fig, ax = plt.subplots(2, 2, figsize=(15, 10))

        ax[0, 0].plot(range(T), I_path, label="inventories")
        ax[0, 0].plot(range(T), S_path, label="sales")
        ax[0, 0].plot(range(T), Q_path, label="production")
        ax[0, 0].legend(loc=1)
        ax[0, 0].set_title("inventories, sales, and production")

        ax[0, 1].plot(range(T), (Q_path - S_path), color='b')
        ax[0, 1].set_ylabel("change in inventories", color='b')
        span = max(abs(Q_path - S_path))
        ax[0, 1].set_ylim(0-span*1.1, 0+span*1.1)
        ax[0, 1].set_title("demand shock and change in inventories")

        ax1_ = ax[0, 1].twinx()
        ax1_.plot(range(T), ν_path, color='r')
        ax1_.set_ylabel("demand shock", color='r')
        span = max(abs(ν_path))
        ax1_.set_ylim(0-span*1.1, 0+span*1.1)

        ax1_.plot([0, T], [0, 0], '--', color='k')

        ax[1, 0].plot(range(T), revenue, label="revenue")
        ax[1, 0].plot(range(T), cost_production, label="cost_production")
        ax[1, 0].plot(range(T), cost_inventories, label="cost_inventories")
        ax[1, 0].legend(loc=1)
        ax[1, 0].set_title("profits decomposition")

        ax[1, 1].plot(range(T), Q_path, label="production")
        ax[1, 1].plot(range(T), Q_hardwired,
                      label='production when $I_t$ forced to be zero')
        ax[1, 1].plot(range(T), Q_no_inventory,
                      label='production when inventories not useful')
        ax[1, 1].legend(loc=1)
        ax[1, 1].set_title('three production concepts')

        plt.show()

Notice that the above code sets parameters at the following default values:

• discount factor β = 0.96
• inverse demand function: 𝑎0 = 10, 𝑎1 = 1
• cost of production 𝑐1 = 1, 𝑐2 = 1
• costs of holding inventories 𝑑1 = 1, 𝑑2 = 1
• demand shock process (through A22, C2, and G): 𝛼 = 1, 𝜌 = 0.9
In the examples below, we alter some or all of these parameter values.

43.3 Example 1

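In this example, the demand shock follows an AR(1) process:

$$\nu_t = \alpha + \rho \nu_{t-1} + \epsilon_t,$$

which implies

$$z_{t+1} = \begin{bmatrix} 1 \\ \nu_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \alpha & \rho \end{bmatrix} \underbrace{\begin{bmatrix} 1 \\ \nu_t \end{bmatrix}}_{z_t} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \epsilon_{t+1}.$$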

We set 𝛼 = 1 and 𝜌 = 0.9, their default values.
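A minimal sketch of the $z$ process on its own, outside the `SmoothingExample` class: it iterates $z_{t+1} = A_{22} z_t + C_2 \epsilon_{t+1}$ from $z_0 = [1, 0]'$ (the $z$ part of the `x0 = [0, 1, 0]` used below). With the shocks switched off (`shocks=False`, a flag of our own), $\nu_t$ converges to the stationary mean $\alpha / (1 - \rho) = 10$, while the first component stays at the constant 1 in every period.

```python
import numpy as np

alpha, rho = 1.0, 0.9
A22 = np.array([[1.0, 0.0],
                [alpha, rho]])
C2 = np.array([[0.0],
               [1.0]])

def simulate_z(z0, T, shocks=True, seed=0):
    """Iterate z_{t+1} = A22 z_t + C2 eps_{t+1} for T periods (standard normal eps)."""
    rng = np.random.default_rng(seed)
    z = np.empty((2, T + 1))
    z[:, 0] = z0
    for t in range(T):
        eps = rng.standard_normal(1) if shocks else np.zeros(1)
        z[:, t + 1] = A22 @ z[:, t] + C2 @ eps
    return z

z_random = simulate_z([1.0, 0.0], T=100)                # with demand shocks
z_smooth = simulate_z([1.0, 0.0], T=200, shocks=False)  # shocks switched off
print(z_random[1, :5], z_smooth[1, -1])
```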


We’ll calculate and display outcomes, then discuss them below the pertinent figures.

In [4]: ex1 = SmoothingExample()

x0 = [0, 1, 0]
ex1.simulate(x0)

The figures above illustrate various features of an optimal production plan.


43.4. INVENTORIES NOT USEFUL 703

Starting from zero inventories, the firm builds up a stock of inventories and uses them to
smooth costly production in the face of demand shocks.
Optimal decisions evidently respond to demand shocks.
Inventories are always less than sales, so some sales come from current production, a consequence of the cost $d_1 I_t$ of holding inventories.
The lower right panel shows differences between optimal production and two alternative production concepts that come from altering the firm’s cost structure, i.e., its technology.
These two concepts correspond to these distinct altered firm problems:
• a setting in which inventories are not needed
• a setting in which they are needed but we arbitrarily prevent the firm from holding inventories by forcing it to set $I_t = 0$ always
We use these two alternative production concepts in order to shed light on the baseline model.

43.4 Inventories Not Useful

Let’s turn first to the setting in which inventories aren’t needed.


In this problem, the firm forms an output plan that maximizes the expected value of

$$\sum_{t=0}^\infty \beta^t \{ p_t Q_t - c(Q_t) \}$$

When inventories aren’t required or used, sales always equal production, so the inverse demand function becomes $p_t = a_0 - a_1 Q_t + \nu_t$.
It turns out that the optimal plan for $Q_t$ for this problem also solves a sequence of static problems $\max_{Q_t} \{ p_t Q_t - c(Q_t) \}$.
The first-order condition $a_0 - 2 a_1 Q_t + \nu_t - c_1 - 2 c_2 Q_t = 0$ gives the optimal decision rule

$$Q^{ni}_t = \frac{a_0 + \nu_t - c_1}{2 (c_2 + a_1)}.$$

43.5 Inventories Useful but are Hardwired to be Zero Always

Next, we turn to a distinct problem in which inventories are useful – meaning that there are
costs of 𝑑2 (𝐼𝑡 − 𝑆𝑡 )2 associated with having sales not equal to inventories – but we arbitrarily
impose on the firm the costly restriction that it never hold inventories.
Here the firm’s maximization problem is

$$\max_{\{I_t, Q_t, S_t\}} \sum_{t=0}^\infty \beta^t \{ p_t S_t - c(Q_t) - d(I_t, S_t) \}$$

subject to the restrictions that $I_t = 0$ for all $t$ and that $I_{t+1} = I_t + Q_t - S_t$.


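The restriction that $I_t = 0$ implies that $Q_t = S_t$ and that the maximization problem reduces to

$$\max_{\{Q_t\}} \sum_{t=0}^\infty \beta^t \{ p_t Q_t - c(Q_t) - d(0, Q_t) \}$$

Since $d(0, Q_t) = d_2 Q_t^2$, this problem again separates into static problems, and the optimal production plan is

$$Q^h_t = \frac{a_0 + \nu_t - c_1}{2 (c_2 + a_1 + d_2)}.$$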
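Both benchmark rules come from one-period problems, so they can be checked by brute force. The sketch below uses the default parameters and compares the closed forms $(a_0 + \nu_t - c_1) / (2(a_1 + c_2))$ and $(a_0 + \nu_t - c_1) / (2(a_1 + c_2 + d_2))$ (the same expressions as `Q_no_inventory` and `Q_hardwired` in the `SmoothingExample` code) with a grid search over $Q$ for a few hypothetical demand-shock values.

```python
import numpy as np

# Default parameter values from the SmoothingExample class
a0, a1, c1, c2, d2 = 10.0, 1.0, 1.0, 1.0, 1.0

def q_no_inventory(nu):
    return (a0 + nu - c1) / (2 * (a1 + c2))

def q_hardwired(nu):
    return (a0 + nu - c1) / (2 * (a1 + c2 + d2))

def static_profit(Q, nu, hardwired):
    """One-period profit when S = Q; the hardwired case also pays d(0, Q) = d2 * Q**2."""
    p = a0 - a1 * Q + nu
    cost = c1 * Q + c2 * Q**2
    if hardwired:
        cost = cost + d2 * Q**2
    return p * Q - cost

grid = np.linspace(0, 20, 200_001)   # step of 1e-4
for nu in (0.0, 5.0, 10.0):
    q_ni_grid = grid[np.argmax(static_profit(grid, nu, False))]
    q_h_grid = grid[np.argmax(static_profit(grid, nu, True))]
    print(nu, q_no_inventory(nu), q_ni_grid, q_hardwired(nu), q_h_grid)
```

The extra $d_2$ in the denominator is why the hardwired-to-zero firm always produces less than the firm for which inventories are not useful.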

We introduce this “$I_t$ is hardwired to zero” specification in order to shed light on the role that inventories play by comparing outcomes with those under our two other versions of the problem.
The bottom right panel displays an optimal production path for the original problem that we are interested in (the blue line), as well as an optimal production path for the model in which inventories are not useful (the green line), and also for the model in which, although inventories are useful, they are hardwired to zero and the firm pays the cost $d(0, Q_t)$ because sales $S_t = Q_t$ deviate from zero inventories (the orange line).
Notice that it is typically optimal for the firm to produce more when inventories aren’t useful.
Here there is no requirement to sell out of inventories and no costs from having sales deviate
from inventories.
But “typical” does not mean “always”.
Thus, if we look closely, we notice that for small 𝑡, the green “production when inventories
aren’t useful” line in the lower right panel is below optimal production in the original model.
High optimal production in the original model early on occurs because the firm wants to accumulate inventories quickly in order to acquire high inventories for use in later periods.
But how the green line compares to the blue line early on depends on the evolution of the demand shock, as we will see in a deterministically seasonal demand shock example to be analyzed below.
In that example, the original firm optimally accumulates inventories slowly because the next
positive demand shock is in the distant future.
To make the green-blue model production comparison easier to see, let’s confine the graphs to
the first 10 periods:

In [5]: ex1.simulate(x0, T=10)


43.6. EXAMPLE 2 705

43.6 Example 2

Next, we shut down randomness in demand and assume that the demand shock 𝜈𝑡 follows a
deterministic path:

𝜈𝑡 = 𝛼 + 𝜌𝜈𝑡−1

Again, we’ll compute and display outcomes in some figures

In [6]: ex2 = SmoothingExample(C2=[[0], [0]])

x0 = [0, 1, 0]
ex2.simulate(x0)
706 CHAPTER 43. PRODUCTION SMOOTHING VIA INVENTORIES

43.7 Example 3

Now we’ll put randomness back into the demand shock process and also assume that there are zero costs of carrying inventories.
In particular, we’ll look at a situation in which $d_1 = 0$ but $d_2 > 0$.
Now it becomes optimal to set sales approximately equal to inventories and to use inventories
to smooth production quite well, as the following figures confirm

In [7]: ex3 = SmoothingExample(d1=0)

x0 = [0, 1, 0]
ex3.simulate(x0)
43.8. EXAMPLE 4 707

43.8 Example 4

To bring out some features of the optimal policy that are related to some technical issues in
linear control theory, we’ll now temporarily assume that it is costless to hold inventories.
When we completely shut down the cost of holding inventories by setting 𝑑1 = 0 and 𝑑2 = 0,
something absurd happens (because the Bellman equation is opportunistic and very smart).
(Technically, we have set parameters that end up violating conditions needed to assure stability of the optimally controlled state.)
The firm finds it optimal to set $Q_t \equiv Q^* = \frac{-c_1}{2 c_2}$, an output level that minimizes the costs of production (when $c_1 > 0$, as it is with our default settings, then it is optimal to set production negative, whatever that means!).
Recall the law of motion for inventories

𝐼𝑡+1 = 𝐼𝑡 + 𝑄𝑡 − 𝑆𝑡

So when $d_1 = d_2 = 0$, so that the firm finds it optimal to set $Q_t = \frac{-c_1}{2 c_2}$ for all $t$, then

$$I_{t+1} - I_t = \frac{-c_1}{2 c_2} - S_t < 0$$

for almost all values of $S_t$ under our default parameters, which keep demand positive almost all of the time.
The dynamic program instructs the firm to minimize production costs and to run a Ponzi scheme by running inventories down forever.

(We can interpret this as the firm somehow going short in or borrowing inventories.)
The following figures confirm that inventories head south without limit

In [8]: ex4 = SmoothingExample(d1=0, d2=0)

x0 = [0, 1, 0]
ex4.simulate(x0)

Let’s shorten the time span displayed in order to highlight what is going on.
We’ll set the horizon 𝑇 = 30 with the following code

In [9]: # shorter period


ex4.simulate(x0, T=30)
43.9. EXAMPLE 5 709

43.9 Example 5

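Now we’ll assume that the demand shock follows a linear time trend

$$v_t = b + a t, \quad a > 0, \quad b > 0$$

To represent this, we set $C_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and

$$A_{22} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \quad z_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad G = \begin{bmatrix} b & a \end{bmatrix}$$

(so that, with zero initial inventories, $x_0 = [0, 1, 0]'$).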
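A quick check that this representation produces a linear trend, using the values $a = 0.5$ and $b = 3$ that the code below sets: iterating $z_{t+1} = A_{22} z_t$ from $z_0 = [1, 0]'$ gives $z_t = [1, t]'$, so $v_t = G z_t = b + a t$.

```python
import numpy as np

a, b = 0.5, 3.0                      # values set in the lecture code below
A22 = np.array([[1.0, 0.0],
                [1.0, 1.0]])
G = np.array([b, a])

z = np.array([1.0, 0.0])             # z_0 = [1, 0]'
v = []
for t in range(10):
    v.append(G @ z)                  # v_t = G z_t
    z = A22 @ z                      # z_{t+1} = [1, t + 1]'
v = np.array(v)
print(v)
```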

In [10]: # Set parameters


a = 0.5
b = 3.

In [11]: ex5 = SmoothingExample(A22=[[1, 0], [1, 1]], C2=[[0], [0]], G=[b, a])

x0 = [0, 1, 0] # set the initial inventory as 0


ex5.simulate(x0, T=10)

43.10 Example 6

Now we’ll assume a deterministically seasonal demand shock.


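To represent this we’ll set

$$A_{22} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad G' = \begin{bmatrix} b \\ a \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

where $a > 0$, $b > 0$ and

$$z_0 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}$$

(so that, with zero initial inventories, $x_0 = [0, z_0']'$).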
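To see why this $A_{22}$ is seasonal: its last four rows permute a one-hot "season" indicator cyclically, so $v_t = G z_t$ equals $b + a$ once every four periods and $b$ otherwise. The sketch below (reusing $a = 0.5$, $b = 3$ from Example 5; the helper `demand_path` is ours) iterates from $z_0$ and confirms that, from this starting season, the high-demand period first arrives at $t = 3$ and then repeats every four periods.

```python
import numpy as np

a, b = 0.5, 3.0   # as in Example 5
A22 = np.array([[1, 0, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 1, 0, 0, 0],
                [0, 0, 1, 0, 0],
                [0, 0, 0, 1, 0]], dtype=float)
G = np.array([b, a, 0, 0, 0])

def demand_path(z0, T):
    """Return v_t = G z_t for t = 0, ..., T - 1 under z_{t+1} = A22 z_t."""
    z, v = np.asarray(z0, dtype=float), []
    for _ in range(T):
        v.append(G @ z)
        z = A22 @ z
    return np.array(v)

v = demand_path([1, 0, 1, 0, 0], T=12)
print(v)
```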

In [12]: ex6 = SmoothingExample(A22=[[1, 0, 0, 0, 0],


[0, 0, 0, 0, 1],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0]],
C2=[[0], [0], [0], [0], [0]],
G=[b, a, 0, 0, 0])
43.10. EXAMPLE 6 711

x00 = [0, 1, 0, 1, 0, 0] # Set the initial inventory as 0


ex6.simulate(x00, T=20)

Now we’ll generate some more examples that differ only in the season of the year in which we begin the demand shock.

In [13]: x01 = [0, 1, 1, 0, 0, 0]


ex6.simulate(x01, T=20)

In [14]: x02 = [0, 1, 0, 0, 1, 0]


ex6.simulate(x02, T=20)

In [15]: x03 = [0, 1, 0, 0, 0, 1]


ex6.simulate(x03, T=20)
43.11. EXERCISES 713

43.11 Exercises

Please try to analyze some inventory sales smoothing problems using the SmoothingExample
class.

43.11.1 Exercise 1

Assume that the demand shock follows the AR(2) process below:

$$\nu_t = \alpha + \rho_1 \nu_{t-1} + \rho_2 \nu_{t-2} + \epsilon_t,$$

where $\alpha = 1$, $\rho_1 = 1.2$, and $\rho_2 = -0.3$. You need to construct the $A_{22}$, $C_2$, and $G$ matrices properly and then input them as keyword arguments of the SmoothingExample class. Simulate paths starting from the initial condition $x_0 = [0, 1, 0, 0]'$.
After this, try to construct a very similar SmoothingExample with the same demand shock process but without the randomness $\epsilon_t$. Compute the stationary states $\bar{x}$ by simulating for a long period. Then try to add shocks of different magnitudes to $\bar{\nu}_t$ and simulate paths. You should see how firms respond differently by staring at the production plans.

43.11.2 Exercise 2

Change parameters of $c(Q_t)$ and $d(I_t, S_t)$.

1. Make production more costly, by setting 𝑐2 = 5.

2. Increase the cost of having inventories deviate from sales, by setting 𝑑2 = 5.

43.11.3 Solution 1

In [16]: # set parameters


α = 1
ρ1 = 1.2
ρ2 = -.3

In [17]: # construct matrices


A22 =[[1, 0, 0],
[1, ρ1, ρ2],
[0, 1, 0]]
C2 = [[0], [1], [0]]
G = [0, 1, 0]

In [18]: ex1 = SmoothingExample(A22=A22, C2=C2, G=G)

x0 = [0, 1, 0, 0] # initial condition


ex1.simulate(x0)

In [19]: # now silence the noise


ex1_no_noise = SmoothingExample(A22=A22, C2=[[0], [0], [0]], G=G)

# initial condition
x0 = [0, 1, 0, 0]

# compute stationary states


x_bar = ex1_no_noise.LQ.compute_sequence(x0, ts_length=250)[0][:, -1]
x_bar

Out[19]: array([ 3.69387755, 1. , 10. , 10. ])
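The last two entries of $\bar{x}$ (the stationary $\nu_t$ and $\nu_{t-1}$) can be checked by hand. The AR(2) is stable because both eigenvalues of its companion matrix lie inside the unit circle, and its deterministic fixed point is $\alpha / (1 - \rho_1 - \rho_2) = 1 / 0.1 = 10$. A small sketch:

```python
import numpy as np

alpha, rho1, rho2 = 1.0, 1.2, -0.3

# Companion matrix of nu_t = alpha + rho1 nu_{t-1} + rho2 nu_{t-2} + eps_t
companion = np.array([[rho1, rho2],
                      [1.0, 0.0]])
eigenvalues = np.linalg.eigvals(companion)
stable = np.max(np.abs(eigenvalues)) < 1

# Fixed point of the deterministic recursion
nu_bar = alpha / (1 - rho1 - rho2)
print(eigenvalues, stable, nu_bar)
```

The largest eigenvalue, about 0.845, also governs how quickly the added shocks below die out.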

In the following, we add small and large shocks to $\bar{\nu}_t$ and compare how differently the firm responds in quantity. As the shock is not very persistent under the parameterization we are using, we focus on a short-period response.

In [20]: T = 40

In [21]: # small shock


x_bar1 = x_bar.copy()
x_bar1[2] += 2
ex1_no_noise.simulate(x_bar1, T=T)

In [22]: # large shock


x_bar1 = x_bar.copy()
x_bar1[2] += 10
ex1_no_noise.simulate(x_bar1, T=T)

43.11.4 Solution 2

In [23]: x0 = [0, 1, 0]

In [24]: SmoothingExample(c2=5).simulate(x0)

In [25]: SmoothingExample(d2=5).simulate(x0)
Part VII

Multiple Agent Models

Chapter 44

Schelling’s Segregation Model

44.1 Contents

• Outline 44.2
• The Model 44.3
• Results 44.4
• Exercises 44.5
• Solutions 44.6

44.2 Outline

In 1969, Thomas C. Schelling developed a simple but striking model of racial segregation [98].
His model studies the dynamics of racially mixed neighborhoods.
Like much of Schelling’s work, the model shows how local interactions can lead to surprising
aggregate structure.
In particular, it shows that relatively mild preference for neighbors of similar race can lead in
aggregate to the collapse of mixed neighborhoods, and high levels of segregation.
In recognition of this and other research, Schelling was awarded the 2005 Nobel Prize in Eco-
nomic Sciences (joint with Robert Aumann).
In this lecture, we (in fact you) will build and run a version of Schelling’s model.
Let’s start with some imports:

In [1]: from random import uniform, seed


from math import sqrt
import matplotlib.pyplot as plt
%matplotlib inline

44.3 The Model

We will cover a variation of Schelling’s model that is easy to program and captures the main
idea.

720 CHAPTER 44. SCHELLING’S SEGREGATION MODEL

44.3.1 Set-Up

Suppose we have two types of people: orange people and green people.
For the purpose of this lecture, we will assume there are 250 of each type.
These agents all live on a single unit square.
The location of an agent is just a point (𝑥, 𝑦), where 0 < 𝑥, 𝑦 < 1.

44.3.2 Preferences

We will say that an agent is happy if half or more of her 10 nearest neighbors are of the same
type.
Here ‘nearest’ is in terms of Euclidean distance.
An agent who is not happy is called unhappy.
An important point here is that agents are not averse to living in mixed areas.
They are perfectly happy if half their neighbors are of the other color.
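The happiness rule is easy to state in vectorized form. A minimal sketch with NumPy, assuming 10 nearest neighbors and a threshold of 5 ("half or more"); the function name `happy_mask` is hypothetical and is not part of the lecture's solution, which loops over agents in pure Python. Computing all pairwise distances needs O(n²) memory, which is fine for 500 agents.

```python
import numpy as np

def happy_mask(locations, types, num_neighbors=10, require_same_type=5):
    """Return a boolean array: agent i is happy if at least require_same_type
    of its num_neighbors nearest neighbors (Euclidean distance) share its type."""
    diffs = locations[:, None, :] - locations[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)                       # an agent is not its own neighbor
    nearest = np.argsort(dists, axis=1)[:, :num_neighbors]
    same = (types[nearest] == types[:, None]).sum(axis=1)
    return same >= require_same_type

# Usage: one orange agent (type 0) at the center of a ring of 10 green agents (type 1)
angles = np.linspace(0, 2 * np.pi, 10, endpoint=False)
ring = 0.5 + 0.1 * np.column_stack((np.cos(angles), np.sin(angles)))
locations = np.vstack(([0.5, 0.5], ring))
types = np.array([0] + [1] * 10)
mask = happy_mask(locations, types)
print(mask)
```

In the usage example the orange agent has no neighbors of its own type and is unhappy, while each green agent has 9 green neighbors out of 10 and is happy.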

44.3.3 Behavior

Initially, agents are mixed together (integrated).


In particular, the initial location of each agent is an independent draw from a bivariate uniform distribution on $S = (0, 1)^2$.
Now, cycling through the set of all agents, each agent is given the chance to stay or move.
We assume that each agent will stay put if they are happy and move if unhappy.
The algorithm for moving is as follows

1. Draw a random location in 𝑆

2. If happy at new location, move there

3. Else, go to step 1

In this way, we cycle continuously through the agents, moving as required.


We continue to cycle until no one wishes to move.

44.4 Results

Let’s have a look at the results we got when we coded and ran this model.
As discussed above, agents are initially mixed randomly together.
44.4. RESULTS 721

But after several cycles, they become segregated into distinct regions.

In this instance, the program terminated after 4 cycles through the set of agents, indicating
that all agents had reached a state of happiness.
What is striking about the pictures is how rapidly racial integration breaks down.
This is despite the fact that people in the model don’t actually mind living mixed with the
other type.
Even with these preferences, the outcome is a high degree of segregation.

44.5 Exercises

44.5.1 Exercise 1

Implement and run this simulation for yourself.


Consider the following structure for your program.
Agents can be modeled as objects.
Here’s an indication of how they might look

* Data:
    * type (green or orange)
    * location
* Methods:
    * determine whether happy or not given locations of other agents
    * If not happy, move
        * find a new location where happy

And here’s some pseudocode for the main loop

while agents are still moving
    for agent in agents
        give agent the opportunity to move

Use 250 agents of each type.

44.6 Solutions

44.6.1 Exercise 1

Here’s one solution that does the job we want.


If you feel like a further exercise, you can probably speed up some of the computations and
then increase the number of agents.

In [2]: from math import sqrt
from random import uniform, seed
import matplotlib.pyplot as plt

seed(10)  # For reproducible random numbers

class Agent:

    def __init__(self, type):
        self.type = type
        self.draw_location()

    def draw_location(self):
        self.location = uniform(0, 1), uniform(0, 1)

    def get_distance(self, other):
        "Computes the euclidean distance between self and other agent."
        a = (self.location[0] - other.location[0])**2
        b = (self.location[1] - other.location[1])**2
        return sqrt(a + b)

    def happy(self, agents):
        "True if sufficient number of nearest neighbors are of the same type."
        distances = []
        # distances is a list of pairs (d, agent), where d is distance from
        # agent to self
        for agent in agents:
            if self != agent:
                distance = self.get_distance(agent)
                distances.append((distance, agent))
        # == Sort from smallest to largest, according to distance == #
        # (sort on d only, so that ties never try to compare Agent objects)
        distances.sort(key=lambda pair: pair[0])
        # == Extract the neighboring agents == #
        neighbors = [agent for d, agent in distances[:num_neighbors]]
        # == Count how many neighbors have the same type as self == #
        num_same_type = sum(self.type == agent.type for agent in neighbors)
        return num_same_type >= require_same_type

    def update(self, agents):
        "If not happy, then randomly choose new locations until happy."
        while not self.happy(agents):
            self.draw_location()

def plot_distribution(agents, cycle_num):
    "Plot the distribution of agents after cycle_num rounds of the loop."
    x_values_0, y_values_0 = [], []
    x_values_1, y_values_1 = [], []
    # == Obtain locations of each type == #
    for agent in agents:
        x, y = agent.location
        if agent.type == 0:
            x_values_0.append(x)
            y_values_0.append(y)
        else:
            x_values_1.append(x)
            y_values_1.append(y)
    fig, ax = plt.subplots(figsize=(8, 8))
    plot_args = {'markersize': 8, 'alpha': 0.6}
    ax.set_facecolor('azure')
    ax.plot(x_values_0, y_values_0, 'o', markerfacecolor='orange', **plot_args)
    ax.plot(x_values_1, y_values_1, 'o', markerfacecolor='green', **plot_args)
    ax.set_title(f'Cycle {cycle_num-1}')
    plt.show()

# == Main == #

num_of_type_0 = 250
num_of_type_1 = 250
num_neighbors = 10      # Number of agents regarded as neighbors
require_same_type = 5   # Want at least this many neighbors to be same type

# == Create a list of agents == #
agents = [Agent(0) for i in range(num_of_type_0)]
agents.extend(Agent(1) for i in range(num_of_type_1))

count = 1
# == Loop until none wishes to move == #
while True:
    print('Entering loop ', count)
    plot_distribution(agents, count)
    count += 1
    no_one_moved = True
    for agent in agents:
        old_location = agent.location
        agent.update(agents)
        if agent.location != old_location:
            no_one_moved = False
    if no_one_moved:
        break

print('Converged, terminating.')

Entering loop 1
Entering loop 2
Entering loop 3
Entering loop 4
Converged, terminating.
Chapter 45

A Lake Model of Employment and Unemployment

45.1 Contents

• Overview 45.2
• The Model 45.3
• Implementation 45.4
• Dynamics of an Individual Worker 45.5
• Endogenous Job Finding Rate 45.6
• Exercises 45.7
• Solutions 45.8
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

45.2 Overview

This lecture describes what has come to be called a lake model.


The lake model is a basic tool for modeling unemployment.
It allows us to analyze
• flows between unemployment and employment.
• how these flows influence steady state employment and unemployment rates.
It is a good model for interpreting monthly labor department reports on gross and net jobs
created and jobs destroyed.
The “lakes” in the model are the pools of employed and unemployed.
The “flows” between the lakes are caused by
• firing and hiring
• entry and exit from the labor force
For the first part of this lecture, the parameters governing transitions into and out of unemployment and employment are exogenous.

Later, we’ll determine some of these transition rates endogenously using the McCall search model.
We’ll also use some nifty concepts like ergodicity, which provides a fundamental link between cross-sectional and long-run time series distributions.
These concepts will help us build an equilibrium model of ex ante homogeneous workers whose different luck generates variations in their ex post experiences.
Let’s start with some imports:
Let’s start with some imports:

In [2]: import numpy as np
from quantecon import MarkovChain
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import norm
from scipy.optimize import brentq
from quantecon.distributions import BetaBinomial
from numba import jit


45.2.1 Prerequisites

Before working through what follows, we recommend you read the lecture on finite Markov
chains.
You will also need some basic linear algebra and probability.

45.3 The Model

The economy is inhabited by a very large number of ex ante identical workers.

The workers live forever, spending their lives moving between unemployment and employment.
Their rates of transition between employment and unemployment are governed by the following parameters:
• 𝜆, the job finding rate for currently unemployed workers
• 𝛼, the dismissal rate for currently employed workers
• 𝑏, the entry rate into the labor force
• 𝑑, the exit rate from the labor force
The growth rate of the labor force evidently equals 𝑔 = 𝑏 − 𝑑.

45.3.1 Aggregate Variables

We want to derive the dynamics of the following aggregates


• 𝐸𝑡 , the total number of employed workers at date 𝑡

• 𝑈𝑡 , the total number of unemployed workers at 𝑡


• 𝑁𝑡 , the number of workers in the labor force at 𝑡
We also want to know the values of the following objects
• The employment rate 𝑒𝑡 ∶= 𝐸𝑡 /𝑁𝑡 .
• The unemployment rate 𝑢𝑡 ∶= 𝑈𝑡 /𝑁𝑡 .
(Here and below, capital letters represent aggregates and lowercase letters represent rates)

45.3.2 Laws of Motion for Stock Variables

We begin by constructing laws of motion for the aggregate variables 𝐸𝑡 , 𝑈𝑡 , 𝑁𝑡 .


Of the mass of workers 𝐸𝑡 who are employed at date 𝑡,
• (1 − 𝑑)𝐸𝑡 will remain in the labor force
• of these, (1 − 𝛼)(1 − 𝑑)𝐸𝑡 will remain employed
Of the mass of workers 𝑈𝑡 who are currently unemployed,
• (1 − 𝑑)𝑈𝑡 will remain in the labor force
• of these, (1 − 𝑑)𝜆𝑈𝑡 will become employed
Therefore, the number of workers who will be employed at date 𝑡 + 1 will be

$$ E_{t+1} = (1-d)(1-\alpha) E_t + (1-d)\lambda U_t $$

A similar analysis implies

$$ U_{t+1} = (1-d)\alpha E_t + (1-d)(1-\lambda) U_t + b(E_t + U_t) $$

The value 𝑏(𝐸𝑡 + 𝑈𝑡) is the mass of new workers entering the labor force unemployed.
The total stock of workers 𝑁𝑡 = 𝐸𝑡 + 𝑈𝑡 evolves as

$$ N_{t+1} = (1 + b - d) N_t = (1 + g) N_t $$

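Letting $X_t := \begin{pmatrix} U_t \\ E_t \end{pmatrix}$, the law of motion for $X$ is

$$ X_{t+1} = A X_t \quad \text{where} \quad A := \begin{pmatrix} (1-d)(1-\lambda) + b & (1-d)\alpha + b \\ (1-d)\lambda & (1-d)(1-\alpha) \end{pmatrix} $$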

This law tells us how total employment and unemployment evolve over time.
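As a quick numerical check of these laws of motion, the sketch below builds 𝐴 from the default parameters used later in this lecture (𝜆 = 0.283, 𝛼 = 0.013, 𝑏 = 0.0124, 𝑑 = 0.00822). It applies one step from an assumed starting point (𝑈₀, 𝐸₀) = (12, 138), then confirms that the matrix form reproduces the scalar equations and that the labor force grows by the factor 1 + 𝑔. The helper `build_A` is a hypothetical name.

```python
import numpy as np

def build_A(λ, α, b, d):
    """Transition matrix for the stock vector X_t = (U_t, E_t)."""
    return np.array([[(1 - d) * (1 - λ) + b, (1 - d) * α + b],
                     [(1 - d) * λ,           (1 - d) * (1 - α)]])

λ, α, b, d = 0.283, 0.013, 0.0124, 0.00822
g = b - d
U, E = 12.0, 138.0
A = build_A(λ, α, b, d)

U_next, E_next = A @ np.array([U, E])

# Scalar laws of motion from the text
E_check = (1 - d) * (1 - α) * E + (1 - d) * λ * U
U_check = (1 - d) * α * E + (1 - d) * (1 - λ) * U + b * (E + U)

print(U_next, E_next)
print(np.isclose(U_next, U_check), np.isclose(E_next, E_check))
print(np.isclose(U_next + E_next, (1 + g) * (U + E)))  # N_{t+1} = (1 + g) N_t
```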

45.3.3 Laws of Motion for Rates

Now let’s derive the law of motion for rates.


To get these we can divide both sides of $X_{t+1} = A X_t$ by $N_{t+1}$ to get

$$ \begin{pmatrix} U_{t+1}/N_{t+1} \\ E_{t+1}/N_{t+1} \end{pmatrix} = \frac{1}{1+g} A \begin{pmatrix} U_t/N_t \\ E_t/N_t \end{pmatrix} $$

Letting

$$ x_t := \begin{pmatrix} u_t \\ e_t \end{pmatrix} = \begin{pmatrix} U_t/N_t \\ E_t/N_t \end{pmatrix} $$

we can also write this as

$$ x_{t+1} = \hat A x_t \quad \text{where} \quad \hat A := \frac{1}{1+g} A $$

You can check that $e_t + u_t = 1$ implies that $e_{t+1} + u_{t+1} = 1$.

This follows from the fact that the columns of $\hat A$ sum to 1: each column of $A$ sums to $(1 - d) + b = 1 + g$, and $\hat A = A/(1+g)$.
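The column-sum argument can be checked numerically. The sketch below assumes the default parameters used later in the lecture and an arbitrary starting rate vector (𝑢, 𝑒) = (0.08, 0.92).

```python
import numpy as np

λ, α, b, d = 0.283, 0.013, 0.0124, 0.00822   # default parameters
g = b - d
A = np.array([[(1 - d) * (1 - λ) + b, (1 - d) * α + b],
              [(1 - d) * λ,           (1 - d) * (1 - α)]])
A_hat = A / (1 + g)

col_sums_A = A.sum(axis=0)          # each entry equals 1 - d + b = 1 + g
col_sums_A_hat = A_hat.sum(axis=0)  # each entry equals 1

x = np.array([0.08, 0.92])          # (u_t, e_t), sums to 1
x_next = A_hat @ x                  # (u_{t+1}, e_{t+1}) also sums to 1
print(col_sums_A, col_sums_A_hat, x_next.sum())
```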

45.4 Implementation

Let’s code up these equations.


To do this we’re going to use a class that we’ll call LakeModel.
This class will

1. store the primitives 𝛼, 𝜆, 𝑏, 𝑑

2. compute and store the implied objects 𝑔, 𝐴, 𝐴̂

3. provide methods to simulate dynamics of the stocks and rates

4. provide a method to compute the steady state of the rates

Please be careful: the implied objects 𝑔, 𝐴, 𝐴̂ are computed once, when an instance is created, so they will not change if you only change a primitive of an existing instance (for example, by setting lm.α = 0.03).
Instead, if you would like to update a primitive like 𝛼 = 0.03, you need to create a new instance with lm = LakeModel(α=0.03).
In the exercises, we show how to avoid this issue by using getter and setter methods.
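The pitfall can be reproduced with a hypothetical stripped-down class, `TinyLakeModel`, that (like LakeModel) computes its derived objects only inside `__init__`. It is an illustration, not the lecture's class.

```python
import numpy as np

class TinyLakeModel:
    """Hypothetical minimal class: derived objects are computed once, in __init__."""

    def __init__(self, λ=0.283, α=0.013, b=0.0124, d=0.00822):
        self.λ, self.α, self.b, self.d = λ, α, b, d
        self.g = b - d
        self.A = np.array([[(1 - d) * (1 - λ) + b, (1 - d) * α + b],
                           [(1 - d) * λ,           (1 - d) * (1 - α)]])

lm = TinyLakeModel()
A_before = lm.A.copy()

lm.α = 0.03                               # changes the primitive only
stale = np.array_equal(lm.A, A_before)    # True: A was not recomputed

lm = TinyLakeModel(α=0.03)                # a fresh instance recomputes A
fresh = not np.array_equal(lm.A, A_before)
print(stale, fresh)
```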

In [3]: class LakeModel:
    """
    Solves the lake model and computes dynamics of unemployment stocks and
    rates.

    Parameters:
    ------------
    λ : scalar
        The job finding rate for currently unemployed workers
    α : scalar
        The dismissal rate for currently employed workers
    b : scalar
        Entry rate into the labor force
    d : scalar
        Exit rate from the labor force

    """
    def __init__(self, λ=0.283, α=0.013, b=0.0124, d=0.00822):
        self.λ, self.α, self.b, self.d = λ, α, b, d

        λ, α, b, d = self.λ, self.α, self.b, self.d
        self.g = b - d
        self.A = np.array([[(1-d) * (1-λ) + b,      (1 - d) * α + b],
                           [        (1-d) * λ,   (1 - d) * (1 - α)]])

        self.A_hat = self.A / (1 + self.g)

    def rate_steady_state(self, tol=1e-6):
        r"""
        Finds the steady state of the system :math:`x_{t+1} = \hat A x_{t}`

        Returns
        --------
        xbar : steady state vector of unemployment and employment rates
        """
        x = 0.5 * np.ones(2)
        error = tol + 1
        while error > tol:
            new_x = self.A_hat @ x
            error = np.max(np.abs(new_x - x))
            x = new_x
        return x

    def simulate_stock_path(self, X0, T):
        """
        Simulates the sequence of unemployment and employment stocks

        Parameters
        ------------
        X0 : array
            Contains initial values (U0, E0)
        T : int
            Number of periods to simulate

        Returns
        ---------
        X : iterator
            Contains sequence of unemployment and employment stocks
        """

        X = np.atleast_1d(X0)  # Recast as array just in case
        for t in range(T):
            yield X
            X = self.A @ X

    def simulate_rate_path(self, x0, T):
        """
        Simulates the sequence of unemployment and employment rates

        Parameters
        ------------
        x0 : array
            Contains initial values (u0, e0)
        T : int
            Number of periods to simulate

        Returns
        ---------
        x : iterator
            Contains sequence of unemployment and employment rates

        """
        x = np.atleast_1d(x0)  # Recast as array just in case
        for t in range(T):
            yield x
            x = self.A_hat @ x

As explained, if we create a new instance with lm = LakeModel(α=0.03), derived objects like 𝐴 will also change.

In [4]: lm = LakeModel()
lm.α

Out[4]: 0.013

In [5]: lm.A

Out[5]: array([[0.72350626, 0.02529314],
               [0.28067374, 0.97888686]])

In [6]: lm = LakeModel(α = 0.03)
lm.A

Out[6]: array([[0.72350626, 0.0421534 ],
               [0.28067374, 0.9620266 ]])

45.4.1 Aggregate Dynamics

Let’s run a simulation under the default parameters (see above) starting from 𝑋0 = (𝑈0, 𝐸0) = (12, 138):

In [7]: lm = LakeModel()
N_0 = 150      # Population
e_0 = 0.92     # Initial employment rate
u_0 = 1 - e_0  # Initial unemployment rate
T = 50         # Simulation length

U_0 = u_0 * N_0
E_0 = e_0 * N_0

fig, axes = plt.subplots(3, 1, figsize=(10, 8))
X_0 = (U_0, E_0)
X_path = np.vstack(tuple(lm.simulate_stock_path(X_0, T)))

axes[0].plot(X_path[:, 0], lw=2)
axes[0].set_title('Unemployment')

axes[1].plot(X_path[:, 1], lw=2)
axes[1].set_title('Employment')

axes[2].plot(X_path.sum(1), lw=2)
axes[2].set_title('Labor force')

for ax in axes:
    ax.grid()

plt.tight_layout()
plt.show()

The aggregates 𝐸𝑡 and 𝑈𝑡 don’t converge because their sum 𝐸𝑡 + 𝑈𝑡 grows at rate 𝑔.
On the other hand, the vector of employment and unemployment rates 𝑥𝑡 can be in a steady state 𝑥̄ if there exists an 𝑥̄ such that
• $\bar x = \hat A \bar x$
• the components satisfy $\bar e + \bar u = 1$
The first condition tells us that a steady state level 𝑥̄ is an eigenvector of 𝐴̂ associated with a unit eigenvalue.
We also have 𝑥𝑡 → 𝑥̄ as 𝑡 → ∞ provided that the remaining eigenvalue of 𝐴̂ has modulus less than 1.
This is the case for our default parameters:

In [8]: lm = LakeModel()
e, f = np.linalg.eigvals(lm.A_hat)
abs(e), abs(f)

Out[8]: (0.6953067378358462, 1.0)
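The eigenvector characterization can also be used directly. The sketch below is an alternative to the iterative `rate_steady_state` method and assumes the default parameters. It picks the eigenvector of 𝐴̂ whose eigenvalue is closest to 1, rescales it so that its entries sum to 1, and compares the unemployment rate with the closed form ū = (𝑏 + 𝛼(1 − 𝑑)) / (𝑏 + (𝛼 + 𝜆)(1 − 𝑑)). That closed form comes from substituting ē = 1 − ū into the first row of 𝑥̄ = 𝐴̂𝑥̄ and solving for ū.

```python
import numpy as np

λ, α, b, d = 0.283, 0.013, 0.0124, 0.00822
g = b - d
A = np.array([[(1 - d) * (1 - λ) + b, (1 - d) * α + b],
              [(1 - d) * λ,           (1 - d) * (1 - α)]])
A_hat = A / (1 + g)

eigvals, eigvecs = np.linalg.eig(A_hat)
k = np.argmin(np.abs(eigvals - 1))      # index of the unit eigenvalue
xbar = np.real(eigvecs[:, k])
xbar = xbar / xbar.sum()                # normalize so that u + e = 1

u_closed = (b + α * (1 - d)) / (b + (α + λ) * (1 - d))
other = eigvals[1 - k]                  # the remaining eigenvalue

print(xbar, u_closed, abs(other))
```

Unlike the iteration in `rate_steady_state`, which stops once successive iterates differ by less than `tol`, the eigenvector is exact up to floating-point error.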

Let’s look at the convergence of the unemployment and employment rates to their steady state levels (dashed red line):

In [9]: lm = LakeModel()
e_0 = 0.92     # Initial employment rate
u_0 = 1 - e_0  # Initial unemployment rate
T = 50         # Simulation length

xbar = lm.rate_steady_state()

fig, axes = plt.subplots(2, 1, figsize=(10, 8))
x_0 = (u_0, e_0)
x_path = np.vstack(tuple(lm.simulate_rate_path(x_0, T)))

titles = ['Unemployment rate', 'Employment rate']

for i, title in enumerate(titles):
    axes[i].plot(x_path[:, i], lw=2, alpha=0.5)
    axes[i].hlines(xbar[i], 0, T, 'r', '--')
    axes[i].set_title(title)
    axes[i].grid()

plt.tight_layout()
plt.show()

45.5 Dynamics of an Individual Worker

An individual worker’s employment dynamics are governed by a finite state Markov process.
The worker can be in one of two states:
• 𝑠𝑡 = 0 means unemployed
• 𝑠𝑡 = 1 means employed
Let’s start off under the assumption that 𝑏 = 𝑑 = 0.

The associated transition matrix is then

$$ P = \begin{pmatrix} 1-\lambda & \lambda \\ \alpha & 1-\alpha \end{pmatrix} $$

Let 𝜓𝑡 denote the marginal distribution over employment/unemployment states for the worker at time 𝑡.
As usual, we regard it as a row vector.
We know from an earlier discussion that 𝜓𝑡 follows the law of motion

$$ \psi_{t+1} = \psi_t P $$

We also know from the lecture on finite Markov chains that if 𝛼 ∈ (0, 1) and 𝜆 ∈ (0, 1), then 𝑃 has a unique stationary distribution, denoted here by 𝜓∗.
The unique stationary distribution satisfies

$$ \psi^*[0] = \frac{\alpha}{\alpha + \lambda} $$

Not surprisingly, probability mass on the unemployment state increases with the dismissal
rate and falls with the job finding rate.
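The formula can be confirmed numerically. A stationary distribution is a left eigenvector of 𝑃 with eigenvalue 1. The sketch below, assuming the default values 𝛼 = 0.013 and 𝜆 = 0.283, finds it as a right eigenvector of 𝑃ᵀ.

```python
import numpy as np

α, λ = 0.013, 0.283          # default dismissal and job finding rates
P = np.array([[1 - λ, λ],
              [α, 1 - α]])

# ψ* P = ψ*  is equivalent to  P.T ψ*' = ψ*'
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1))
ψ_star = np.real(eigvecs[:, k])
ψ_star = ψ_star / ψ_star.sum()   # rescale to a probability distribution

print(ψ_star, α / (α + λ))
```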

45.5.1 Ergodicity

Let’s look at a typical lifetime of employment-unemployment spells.


We want to compute the average amounts of time an infinitely lived worker would spend employed and unemployed.
Let

$$ \bar s_{u,T} := \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\{s_t = 0\} $$

and

$$ \bar s_{e,T} := \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\{s_t = 1\} $$

(As usual, 𝟙{𝑄} = 1 if statement 𝑄 is true and 0 otherwise)

These are the fractions of time a worker spends unemployed and employed, respectively, up until period 𝑇.
If 𝛼 ∈ (0, 1) and 𝜆 ∈ (0, 1), then 𝑃 is ergodic, and hence we have

$$ \lim_{T \to \infty} \bar s_{u,T} = \psi^*[0] \quad \text{and} \quad \lim_{T \to \infty} \bar s_{e,T} = \psi^*[1] $$

with probability one.

Inspection tells us that 𝑃 is exactly the transpose of 𝐴̂ under the assumption 𝑏 = 𝑑 = 0.
Thus, the percentages of time that an infinitely lived worker spends employed and unemployed equal the fractions of workers employed and unemployed in the steady state distribution.
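Both claims can be sketched without QuantEcon.py. The code below assumes the default 𝛼 and 𝜆 and a seed of our choosing. It checks that 𝑃 equals the transpose of 𝐴̂ when 𝑏 = 𝑑 = 0, then simulates one worker for 200,000 periods using uniform draws and compares the share of time unemployed with 𝛼/(𝛼 + 𝜆).

```python
import numpy as np

α, λ = 0.013, 0.283
b = d = 0.0

A = np.array([[(1 - d) * (1 - λ) + b, (1 - d) * α + b],
              [(1 - d) * λ,           (1 - d) * (1 - α)]])
A_hat = A / (1 + b - d)
P = np.array([[1 - λ, λ],
              [α, 1 - α]])
same = np.allclose(P, A_hat.T)

# Simulate one worker: state 0 = unemployed, 1 = employed
rng = np.random.default_rng(1234)
T = 200_000
draws = rng.random(T)
s = 1                          # start employed
unemployed_periods = 0
for t in range(T):
    unemployed_periods += (s == 0)
    if s == 0:
        s = 1 if draws[t] < λ else 0   # find a job with probability λ
    else:
        s = 0 if draws[t] < α else 1   # lose the job with probability α
s_bar_u = unemployed_periods / T

print(same, s_bar_u, α / (α + λ))
```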

45.5.2 Convergence Rate

How long does it take for time series sample averages to converge to cross-sectional averages?
We can use QuantEcon.py’s MarkovChain class to investigate this.
Let’s plot the path of the sample averages over 5,000 periods:

In [10]: lm = LakeModel(d=0, b=0)
T = 5000  # Simulation length

α, λ = lm.α, lm.λ

P = [[1 - λ,     λ],
     [    α, 1 - α]]

mc = MarkovChain(P)

xbar = lm.rate_steady_state()

fig, axes = plt.subplots(2, 1, figsize=(10, 8))
s_path = mc.simulate(T, init=1)
s_bar_e = s_path.cumsum() / np.arange(1, T+1)
s_bar_u = 1 - s_bar_e

to_plot = [s_bar_u, s_bar_e]
titles = ['Percent of time unemployed', 'Percent of time employed']

for i, plot in enumerate(to_plot):
    axes[i].plot(plot, lw=2, alpha=0.5)
    axes[i].hlines(xbar[i], 0, T, 'r', '--')
    axes[i].set_title(titles[i])
    axes[i].grid()

plt.tight_layout()
plt.show()

The stationary probabilities are given by the dashed red line.


In this case it takes much of the sample for the time averages to converge to the stationary probabilities.
This is largely due to the high persistence in the Markov chain.
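One way to quantify this persistence is through spell lengths. Under 𝑃, each period of employment ends with probability 𝛼 and each period of unemployment ends with probability 𝜆, so spell lengths are geometrically distributed with means 1/𝛼 and 1/𝜆. A minimal computation with the default parameters:

```python
α, λ = 0.013, 0.283

mean_employment_spell = 1 / α     # expected periods until a dismissal
mean_unemployment_spell = 1 / λ   # expected periods until a job is found
print(round(mean_employment_spell, 1), round(mean_unemployment_spell, 1))
```

An employment spell therefore lasts about 77 periods on average, versus about 3.5 for unemployment. A sample of 5,000 periods contains only around 60 complete employment-unemployment cycles, which is why the sample averages settle slowly.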

45.6 Endogenous Job Finding Rate

We now make the hiring rate endogenous.


The transition rate from unemployment to employment will be determined by the McCall
search model [80].
All details relevant to the following discussion can be found in our treatment of that model.

45.6.1 Reservation Wage

The most important thing to remember about the model is that optimal decisions are characterized by a reservation wage 𝑤̄:
• If the wage offer 𝑤 in hand is greater than or equal to 𝑤̄, then the worker accepts.
• Otherwise, the worker rejects.
As we saw in our discussion of the model, the reservation wage depends on the wage offer distribution and the parameters
• 𝛼, the separation rate
• 𝛽, the discount factor
• 𝛾, the offer arrival rate
• 𝑐, unemployment compensation

45.6.2 Linking the McCall Search Model to the Lake Model

Suppose that all workers inside a lake model behave according to the McCall search model.
The exogenous probability of leaving employment remains 𝛼.
But their optimal decision rules determine the probability 𝜆 of leaving unemployment.
This is now

$$ \lambda = \gamma \, \mathbb{P}\{w_t \geq \bar w\} = \gamma \sum_{w' \geq \bar w} p(w') \tag{1} $$
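Equation (1) is a sum over the accepted part of the offer distribution. The minimal sketch below uses a hypothetical five-point wage distribution, a hypothetical reservation wage and 𝛾 = 0.7. None of these values come from the lecture's calibration, and `job_finding_rate` is a hypothetical helper name.

```python
import numpy as np

def job_finding_rate(γ, w_vec, p_vec, w_bar):
    """λ = γ times the sum of p(w') over offers w' >= w_bar, as in equation (1)."""
    accepted = w_vec >= w_bar
    return γ * np.sum(p_vec[accepted])

w_vec = np.array([10.0, 12.0, 14.0, 16.0, 18.0])   # hypothetical wage grid
p_vec = np.array([0.1, 0.2, 0.4, 0.2, 0.1])        # hypothetical offer probabilities
λ = job_finding_rate(γ=0.7, w_vec=w_vec, p_vec=p_vec, w_bar=14.0)
print(λ)   # 0.7 * (0.4 + 0.2 + 0.1), up to rounding
```

The fiscal policy code later in the lecture computes 𝜆 with a strict inequality (`w_vec - τ > w_bar`), so the grid point at the reservation wage itself is excluded there.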

45.6.3 Fiscal Policy

We can use the McCall search version of the Lake Model to find an optimal level of unemployment insurance.
We assume that the government sets unemployment compensation 𝑐.
The government imposes a lump-sum tax 𝜏 sufficient to finance total unemployment payments.
To attain a balanced budget at a steady state, taxes, the steady state unemployment rate 𝑢, and the unemployment compensation rate must satisfy

$$ \tau = u c $$

The lump-sum tax applies to everyone, including unemployed workers.

Thus, the post-tax income of an employed worker with wage 𝑤 is 𝑤 − 𝜏.
The post-tax income of an unemployed worker is 𝑐 − 𝜏.
For each specification (𝑐, 𝜏) of government policy, we can solve for the worker’s optimal reservation wage.
This determines 𝜆 via (1) evaluated at post-tax wages, which in turn determines a steady state unemployment rate 𝑢(𝑐, 𝜏).
For a given level of unemployment benefit 𝑐, we can solve for a tax that balances the budget in the steady state

$$ \tau = u(c, \tau) c $$
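Finding 𝜏 means finding a root of 𝜏 − 𝑢(𝑐, 𝜏)𝑐. The lecture's code does this with SciPy's `brentq`. The sketch below swaps in plain bisection instead. It also replaces the McCall-based 𝑢(𝑐, 𝜏) with a made-up `toy_unemployment_rate` function, so the example runs on its own; its coefficients are purely illustrative.

```python
def toy_unemployment_rate(c, τ):
    """Hypothetical stand-in for u(c, τ): rises with benefits, falls with taxes."""
    return 0.05 + 0.002 * c - 0.001 * τ

def balanced_budget_tax(c, u_func, lo=0.0, hi=None, tol=1e-10):
    """Bisection on f(τ) = τ - u(c, τ) c over [lo, hi]."""
    hi = 0.9 * c if hi is None else hi       # same upper bracket as the lecture's code
    f = lambda τ: τ - u_func(c, τ) * c
    if f(lo) * f(hi) > 0:
        raise ValueError("the root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid                          # root lies in [lo, mid]
        else:
            lo = mid                          # root lies in [mid, hi]
    return 0.5 * (lo + hi)

c = 20.0
τ = balanced_budget_tax(c, toy_unemployment_rate)
print(τ, toy_unemployment_rate(c, τ) * c)     # tax revenue equals benefit payments
```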

To evaluate alternative government tax-unemployment compensation pairs, we require a welfare criterion.
We use a steady state welfare criterion

$$ W := e \, \mathbb{E}[V \mid \text{employed}] + u U $$

where the notation 𝑉 and 𝑈 is as defined in the McCall search model lecture.
The wage offer distribution will be a discretized version of the lognormal distribution 𝐿𝑁(log(20), 1), as shown in the next figure.

We take a period to be a month.

We set 𝑏 and 𝑑 to match monthly birth and death rates, respectively, in the U.S. population:
• 𝑏 = 0.0124
• 𝑑 = 0.00822
Following [24], we set 𝛼, the hazard rate of leaving employment, to
• 𝛼 = 0.013
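The fiscal policy code below passes α_q = 1 − (1 − α)³ to the McCall model, with a comment noting that α is monthly and α_q quarterly. Assuming independent monthly separation draws, α_q is the probability of at least one separation over three months. The following Monte Carlo check uses a seed and sample size of our choosing.

```python
import numpy as np

α = 0.013                      # monthly separation rate
α_q = 1 - (1 - α)**3           # P(at least one separation in three months)

rng = np.random.default_rng(0)
separations = rng.random((200_000, 3)) < α        # one Bernoulli(α) draw per month
share_separated = separations.any(axis=1).mean()  # at least one separation per quarter
print(α_q, share_separated)
```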

45.6.4 Fiscal Policy Code

We will make use of techniques from the McCall model lecture.

The first piece of code implements value function iteration:

In [11]: # A default utility function

@jit
def u(c, σ):
    if c > 0:
        return (c**(1 - σ) - 1) / (1 - σ)
    else:
        return -10e6

class McCallModel:
    """
    Stores the parameters and functions associated with a given model.
    """

    def __init__(self,
                 α=0.2,        # Job separation rate
                 β=0.98,       # Discount factor
                 γ=0.7,        # Job offer rate
                 c=6.0,        # Unemployment compensation
                 σ=2.0,        # Utility parameter
                 w_vec=None,   # Possible wage values
                 p_vec=None):  # Probabilities over w_vec

        self.α, self.β, self.γ, self.c = α, β, γ, c
        self.σ = σ

        # Add a default wage vector and probabilities over the vector using
        # the beta-binomial distribution
        if w_vec is None:
            n = 60  # Number of possible outcomes for wage
            # Wages between 10 and 20
            self.w_vec = np.linspace(10, 20, n)
            a, b = 600, 400  # Shape parameters
            dist = BetaBinomial(n-1, a, b)
            self.p_vec = dist.pdf()
        else:
            self.w_vec = w_vec
            self.p_vec = p_vec

@jit
def _update_bellman(α, β, γ, c, σ, w_vec, p_vec, V, V_new, U):
    """
    A jitted function to update the Bellman equations. Note that V_new is
    modified in place (i.e, modified by this function). The new value of U
    is returned.

    """
    for w_idx, w in enumerate(w_vec):
        # w_idx indexes the vector of possible wages
        V_new[w_idx] = u(w, σ) + β * ((1 - α) * V[w_idx] + α * U)

    U_new = u(c, σ) + β * (1 - γ) * U + \
            β * γ * np.sum(np.maximum(U, V) * p_vec)

    return U_new

def solve_mccall_model(mcm, tol=1e-5, max_iter=2000):
    """
    Iterates to convergence on the Bellman equations

    Parameters
    ----------
    mcm : an instance of McCallModel
    tol : float
        error tolerance
    max_iter : int
        the maximum number of iterations
    """

    V = np.ones(len(mcm.w_vec))  # Initial guess of V
    V_new = np.empty_like(V)     # To store updates to V
    U = 1                        # Initial guess of U
    i = 0
    error = tol + 1

    while error > tol and i < max_iter:
        U_new = _update_bellman(mcm.α, mcm.β, mcm.γ,
                mcm.c, mcm.σ, mcm.w_vec, mcm.p_vec, V, V_new, U)
        error_1 = np.max(np.abs(V_new - V))
        error_2 = np.abs(U_new - U)
        error = max(error_1, error_2)
        V[:] = V_new
        U = U_new
        i += 1

    return V, U

The second piece of code is used to compute the reservation wage:

In [12]: def compute_reservation_wage(mcm, return_values=False):
    """
    Computes the reservation wage of an instance of the McCall model
    by finding the smallest w such that V(w) >= U.

    If V(w) >= U for all w, then the reservation wage w_bar is set to
    the lowest wage in mcm.w_vec.

    If V(w) < U for all w, then w_bar is set to np.inf.

    Parameters
    ----------
    mcm : an instance of McCallModel
    return_values : bool (optional, default=False)
        Return the value functions as well

    Returns
    -------
    w_bar : scalar
        The reservation wage

    """

    V, U = solve_mccall_model(mcm)
    # V is increasing in w, so V - U is sorted and searchsorted returns
    # the first index at which V - U >= 0
    w_idx = np.searchsorted(V - U, 0)

    if w_idx == len(V):
        w_bar = np.inf
    else:
        w_bar = mcm.w_vec[w_idx]

    if not return_values:
        return w_bar
    else:
        return w_bar, V, U
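`np.searchsorted(V - U, 0)` returns the first index at which V − U ≥ 0. This is only meaningful because V is increasing in the wage, so V − U is sorted in ascending order. Here is a small illustration with made-up values for V and U. They are hypothetical, not output from `solve_mccall_model`.

```python
import numpy as np

w_vec = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
V = np.array([90.0, 95.0, 100.0, 105.0, 110.0])   # hypothetical, increasing in w
U = 98.0                                          # hypothetical value of unemployment

w_idx = np.searchsorted(V - U, 0)   # V - U = [-8, -3, 2, 7, 12], so w_idx = 2
w_bar = w_vec[w_idx] if w_idx < len(V) else np.inf
print(w_idx, w_bar)
```

If U exceeded every entry of V, `searchsorted` would return `len(V)`, which is the case that `compute_reservation_wage` maps to `np.inf`.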

Now let’s compute and plot welfare, employment, unemployment, and tax revenue as a function of the unemployment compensation rate:

In [13]: # Some global variables that will stay constant
α = 0.013
α_q = (1-(1-α)**3)   # Quarterly (α is monthly)
b = 0.0124
d = 0.00822
β = 0.98
γ = 1.0
σ = 2.0

# The default wage distribution --- a discretized lognormal
log_wage_mean, wage_grid_size, max_wage = 20, 200, 170
logw_dist = norm(np.log(log_wage_mean), 1)
w_vec = np.linspace(1e-8, max_wage, wage_grid_size + 1)
cdf = logw_dist.cdf(np.log(w_vec))
pdf = cdf[1:] - cdf[:-1]
p_vec = pdf / pdf.sum()
w_vec = (w_vec[1:] + w_vec[:-1]) / 2

def compute_optimal_quantities(c, τ):
    """
    Compute the reservation wage, job finding rate and value functions
    of the workers given c and τ.

    """

    mcm = McCallModel(α=α_q,
                      β=β,
                      γ=γ,
                      c=c-τ,          # Post tax compensation
                      σ=σ,
                      w_vec=w_vec-τ,  # Post tax wages
                      p_vec=p_vec)

    w_bar, V, U = compute_reservation_wage(mcm, return_values=True)
    λ = γ * np.sum(p_vec[w_vec - τ > w_bar])
    return w_bar, λ, V, U

def compute_steady_state_quantities(c, τ):
    """
    Compute the steady state unemployment rate given c and τ using optimal
    quantities from the McCall model and computing corresponding steady
    state quantities

    """
    w_bar, λ, V, U = compute_optimal_quantities(c, τ)

    # Compute steady state employment and unemployment rates
    lm = LakeModel(α=α_q, λ=λ, b=b, d=d)
    x = lm.rate_steady_state()
    u, e = x

    # Compute steady state welfare
    w = np.sum(V * p_vec * (w_vec - τ > w_bar)) / np.sum(p_vec * (w_vec -
            τ > w_bar))
    welfare = e * w + u * U

    return e, u, welfare

def find_balanced_budget_tax(c):
    """
    Find the tax level that will induce a balanced budget.

    """
    def steady_state_budget(t):
        e, u, w = compute_steady_state_quantities(c, t)
        return t - u * c

    τ = brentq(steady_state_budget, 0.0, 0.9 * c)
    return τ

# Levels of unemployment insurance we wish to study
c_vec = np.linspace(5, 140, 60)

tax_vec = []
unempl_vec = []
empl_vec = []
welfare_vec = []

for c in c_vec:
    t = find_balanced_budget_tax(c)
    e_rate, u_rate, welfare = compute_steady_state_quantities(c, t)
    tax_vec.append(t)
    unempl_vec.append(u_rate)
    empl_vec.append(e_rate)
    welfare_vec.append(welfare)

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

plots = [unempl_vec, empl_vec, tax_vec, welfare_vec]
titles = ['Unemployment', 'Employment', 'Tax', 'Welfare']

for ax, plot, title in zip(axes.flatten(), plots, titles):
    ax.plot(c_vec, plot, lw=2, alpha=0.7)
    ax.set_title(title)
    ax.grid()

plt.tight_layout()
plt.show()

Welfare first increases and then decreases as unemployment benefits rise.



The level that maximizes steady state welfare is approximately 62.

45.7 Exercises

45.7.1 Exercise 1

In the Lake Model, there is derived data such as 𝐴 which depends on primitives like 𝛼 and 𝜆.
So, when a user alters these primitives, we need the derived data to update automatically.
(For example, if a user changes the value of 𝑏 for a given instance of the class, we would like
𝑔 = 𝑏 − 𝑑 to update automatically)
In the code above, we took care of this issue by creating new instances every time we wanted
to change parameters.
That way the derived data is always matched to current parameter values.
However, we can use descriptors instead, so that derived data is updated whenever parameters are changed.
This is safer and means we don’t need to create a fresh instance for every new parameterization.
(On the other hand, the code becomes denser, which is why we don’t always use the descriptor approach in our lectures.)
In this exercise, your task is to rewrite the LakeModel class using descriptors and decorators such as @property.
(If you need to refresh your understanding of how these work, consult this lecture.)

45.7.2 Exercise 2

Consider an economy with an initial stock of workers 𝑁0 = 100 at the steady state level of
employment in the baseline parameterization
• 𝛼 = 0.013
• 𝜆 = 0.283
• 𝑏 = 0.0124
• 𝑑 = 0.00822
(The values for 𝛼 and 𝜆 follow [24])
Suppose that in response to new legislation the hiring rate reduces to 𝜆 = 0.2.
Plot the transition dynamics of the unemployment and employment stocks for 50 periods.
Plot the transition dynamics for the rates.
How long does the economy take to converge to its new steady state?
What is the new steady state level of employment?
Note: it may be easier to use the class created in exercise 1 to help with changing variables.

45.7.3 Exercise 3

Consider an economy with an initial stock of workers 𝑁0 = 100 at the steady state level of
employment in the baseline parameterization.
Suppose that for 20 periods the birth rate was temporarily high (𝑏 = 0.025) and then returned to its original level.
Plot the transition dynamics of the unemployment and employment stocks for 50 periods.
Plot the transition dynamics for the rates.
How long does the economy take to return to its original steady state?

45.8 Solutions

45.8.1 Exercise 1

In [14]: class LakeModelModified:
    """
    Solves the lake model and computes dynamics of unemployment stocks and
    rates.

    Parameters:
    ------------
    λ : scalar
        The job finding rate for currently unemployed workers
    α : scalar
        The dismissal rate for currently employed workers
    b : scalar
        Entry rate into the labor force
    d : scalar
        Exit rate from the labor force

    """
    def __init__(self, λ=0.283, α=0.013, b=0.0124, d=0.00822):
        self._λ, self._α, self._b, self._d = λ, α, b, d
        self.compute_derived_values()

    def compute_derived_values(self):
        # Unpack names to simplify expression
        λ, α, b, d = self._λ, self._α, self._b, self._d

        self._g = b - d
        self._A = np.array([[(1-d) * (1-λ) + b,      (1 - d) * α + b],
                            [        (1-d) * λ,   (1 - d) * (1 - α)]])

        self._A_hat = self._A / (1 + self._g)

    @property
    def g(self):
        return self._g

    @property
    def A(self):
        return self._A

    @property
    def A_hat(self):
        return self._A_hat

    @property
    def λ(self):
        return self._λ

    @λ.setter
    def λ(self, new_value):
        self._λ = new_value
        self.compute_derived_values()

    @property
    def α(self):
        return self._α

    @α.setter
    def α(self, new_value):
        self._α = new_value
        self.compute_derived_values()

    @property
    def b(self):
        return self._b

    @b.setter
    def b(self, new_value):
        self._b = new_value
        self.compute_derived_values()

    @property
    def d(self):
        return self._d

    @d.setter
    def d(self, new_value):
        self._d = new_value
        self.compute_derived_values()

    def rate_steady_state(self, tol=1e-6):
        r"""
        Finds the steady state of the system :math:`x_{t+1} = \hat A x_{t}`

        Returns
        --------
        xbar : steady state vector of unemployment and employment rates
        """
        x = 0.5 * np.ones(2)
        error = tol + 1
        while error > tol:
            new_x = self.A_hat @ x
            error = np.max(np.abs(new_x - x))
            x = new_x
        return x

    def simulate_stock_path(self, X0, T):
        """
        Simulates the sequence of unemployment and employment stocks

        Parameters
        ------------
        X0 : array
            Contains initial values (U0, E0)
        T : int
            Number of periods to simulate

        Returns
        ---------
        X : iterator
            Contains sequence of unemployment and employment stocks
        """

        X = np.atleast_1d(X0)  # Recast as array just in case
        for t in range(T):
            yield X
            X = self.A @ X

    def simulate_rate_path(self, x0, T):
        """
        Simulates the sequence of unemployment and employment rates

        Parameters
        ------------
        x0 : array
            Contains initial values (u0, e0)
        T : int
            Number of periods to simulate

        Returns
        ---------
        x : iterator
            Contains sequence of unemployment and employment rates

        """
        x = np.atleast_1d(x0)  # Recast as array just in case
        for t in range(T):
            yield x
            x = self.A_hat @ x

45.8.2 Exercise 2

We begin by constructing the class containing the default parameters and assigning the steady state values to x0:

In [15]: lm = LakeModelModified()
x0 = lm.rate_steady_state()
print(f"Initial Steady State: {x0}")

Initial Steady State: [0.08266806 0.91733194]

Initialize the simulation values:

In [16]: N0 = 100
T = 50

New legislation changes 𝜆 to 0.2:

In [17]: lm.λ = 0.2

xbar = lm.rate_steady_state()  # new steady state
X_path = np.vstack(tuple(lm.simulate_stock_path(x0 * N0, T)))
x_path = np.vstack(tuple(lm.simulate_rate_path(x0, T)))
print(f"New Steady State: {xbar}")

New Steady State: [0.11309573 0.88690427]

Now plot stocks:

In [18]: fig, axes = plt.subplots(3, 1, figsize=[10, 9])

axes[0].plot(X_path[:, 0])
axes[0].set_title('Unemployment')

axes[1].plot(X_path[:, 1])
axes[1].set_title('Employment')

axes[2].plot(X_path.sum(1))
axes[2].set_title('Labor force')

for ax in axes:
    ax.grid()

plt.tight_layout()
plt.show()

And how the rates evolve:

In [19]: fig, axes = plt.subplots(2, 1, figsize=(10, 8))

titles = ['Unemployment rate', 'Employment rate']

for i, title in enumerate(titles):
    axes[i].plot(x_path[:, i])
    axes[i].hlines(xbar[i], 0, T, 'r', '--')
    axes[i].set_title(title)
    axes[i].grid()

plt.tight_layout()
plt.show()

We see that it takes 20 periods for the economy to converge to its new steady state levels.
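The speed of this convergence is governed by the second eigenvalue of 𝐴̂: the deviation of 𝑥𝑡 from 𝑥̄ shrinks in proportion to its modulus raised to the power 𝑡. The minimal sketch below uses the new parameterization (𝜆 = 0.2, other parameters at their defaults). It computes this modulus, the implied half-life of a deviation, and the first period at which the unemployment rate is within an assumed tolerance of 10⁻⁴ of its new steady state. The helper names `A_hat_matrix` and `steady_state` are hypothetical.

```python
import numpy as np

def A_hat_matrix(λ, α=0.013, b=0.0124, d=0.00822):
    """Normalized transition matrix for the rates x_t = (u_t, e_t)."""
    A = np.array([[(1 - d) * (1 - λ) + b, (1 - d) * α + b],
                  [(1 - d) * λ,           (1 - d) * (1 - α)]])
    return A / (1 + b - d)

def steady_state(A_hat):
    """Eigenvector of A_hat for the unit eigenvalue, scaled to sum to one."""
    vals, vecs = np.linalg.eig(A_hat)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

x0 = steady_state(A_hat_matrix(0.283))    # old steady state (initial condition)
A_new = A_hat_matrix(0.2)                 # after the legislation
xbar = steady_state(A_new)

rho = np.min(np.abs(np.linalg.eigvals(A_new)))   # modulus of the non-unit eigenvalue
half_life = np.log(0.5) / np.log(rho)            # periods for a deviation to halve

tol = 1e-4                                # assumed convergence tolerance
x, t = x0.copy(), 0
while abs(x[0] - xbar[0]) >= tol:
    x = A_new @ x
    t += 1
print(rho, half_life, t)
```

With these numbers a deviation halves in under three periods, which is consistent with the roughly 20 periods visible in the figure.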

45.8.3 Exercise 3

This next exercise has the economy experiencing a boom in entrances to the labor market and
then later returning to the original levels.
For 20 periods the economy has a new entry rate into the labor market.
Let’s start off at the baseline parameterization and record the steady state:

In [20]: lm = LakeModelModified()
x0 = lm.rate_steady_state()

Here are the other parameters:

In [21]: b_hat = 0.025
T_hat = 20

Let’s increase 𝑏 to the new value and simulate for 20 periods:

In [22]: lm.b = b_hat

# Simulate stocks
X_path1 = np.vstack(tuple(lm.simulate_stock_path(x0 * N0, T_hat)))
# Simulate rates
x_path1 = np.vstack(tuple(lm.simulate_rate_path(x0, T_hat)))

Now we reset 𝑏 to the original value and then, using the state after 20 periods for the new initial conditions, we simulate for the additional 30 periods:

In [23]: lm.b = 0.0124

# Simulate stocks
X_path2 = np.vstack(tuple(lm.simulate_stock_path(X_path1[-1, :2], T-T_hat+1)))
# Simulate rates
x_path2 = np.vstack(tuple(lm.simulate_rate_path(x_path1[-1, :2], T-T_hat+1)))

Finally, we combine these two paths and plot

In [24]: # note [1:] to avoid doubling period 20


x_path = np.vstack([x_path1, x_path2[1:]])
X_path = np.vstack([X_path1, X_path2[1:]])

fig, axes = plt.subplots(3, 1, figsize=[10, 9])

axes[0].plot(X_path[:, 0])
axes[0].set_title('Unemployment')

axes[1].plot(X_path[:, 1])
axes[1].set_title('Employment')

axes[2].plot(X_path.sum(1))
axes[2].set_title('Labor force')

for ax in axes:
    ax.grid()

plt.tight_layout()
plt.show()

And the rates

In [25]: fig, axes = plt.subplots(2, 1, figsize=[10, 6])

titles = ['Unemployment rate', 'Employment rate']

for i, title in enumerate(titles):
    axes[i].plot(x_path[:, i])
    axes[i].hlines(x0[i], 0, T, 'r', '--')
    axes[i].set_title(title)
    axes[i].grid()

plt.tight_layout()
plt.show()
Chapter 46

Rational Expectations Equilibrium

46.1 Contents

• Overview 46.2
• Defining Rational Expectations Equilibrium 46.3
• Computation of an Equilibrium 46.4
• Exercises 46.5
• Solutions 46.6

“If you’re so smart, why aren’t you rich?”

In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

46.2 Overview

This lecture introduces the concept of rational expectations equilibrium.


To illustrate it, we describe a linear quadratic version of a famous and important model due
to Lucas and Prescott [74].
This 1971 paper is one of a small number of research articles that kicked off the rational expectations revolution.
We follow Lucas and Prescott by employing a setting that is readily “Bellmanized” (i.e., capable of being formulated in terms of dynamic programming problems).
Because we use linear quadratic setups for demand and costs, we can adapt the LQ programming techniques described in this lecture.
We will learn about how a representative agent’s problem differs from a planner’s, and how a
planning problem can be used to compute rational expectations quantities.
We will also learn about how a rational expectations equilibrium can be characterized as a
fixed point of a mapping from a perceived law of motion to an actual law of motion.
Equality between a perceived and an actual law of motion for endogenous market-wide objects captures in a nutshell what the rational expectations equilibrium concept is all about.


Finally, we will learn about the important “Big 𝐾, little 𝑘” trick, a modeling device widely
used in macroeconomics.
Except that for us
• Instead of “Big 𝐾” it will be “Big 𝑌 ”.
• Instead of “little 𝑘” it will be “little 𝑦”.
Let’s start with some standard imports:

In [2]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

We’ll also use the LQ class from QuantEcon.py.

In [3]: from quantecon import LQ


46.2.1 The Big Y, Little y Trick

This widely used method applies in contexts in which a “representative firm” or agent is a
“price taker” operating within a competitive equilibrium.
We want to impose that
• The representative firm or individual takes aggregate 𝑌 as given when it chooses individual 𝑦, but ….
• At the end of the day, 𝑌 = 𝑦, so that the representative firm is indeed representative.
The Big 𝑌, little 𝑦 trick accomplishes these two goals by
• Taking 𝑌 as beyond the control of the agent who chooses 𝑦 when posing that agent’s choice problem; but ….
• Imposing 𝑌 = 𝑦 after having solved the individual’s optimization problem.
Please watch for how this strategy is applied as the lecture unfolds.
We begin by applying the Big 𝑌 , little 𝑦 trick in a very simple static context.

A Simple Static Example of the Big Y, Little y Trick

Consider a static model in which a collection of 𝑛 firms produce a homogeneous good that is
sold in a competitive market.
Each of these 𝑛 firms sells output 𝑦.
The price 𝑝 of the good lies on an inverse demand curve

$$p = a_0 - a_1 Y \tag{1}$$

where

• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌 = 𝑛𝑦 is the market-wide level of output
Each firm has a total cost function

$$c(y) = c_1 y + 0.5 c_2 y^2, \qquad c_i > 0 \text{ for } i = 1, 2$$

The profits of a representative firm are 𝑝𝑦 − 𝑐(𝑦).


Using (1), we can express the problem of the representative firm as

$$\max_{y} \left[ (a_0 - a_1 Y) y - c_1 y - 0.5 c_2 y^2 \right] \tag{2}$$

In posing problem (2), we want the firm to be a price taker.


We do that by regarding 𝑝 and therefore 𝑌 as exogenous to the firm.
The essence of the Big 𝑌, little 𝑦 trick is not to set 𝑌 = 𝑛𝑦 before taking the first-order condition with respect to 𝑦 in problem (2).
This assures that the firm is a price taker.
The first-order condition for problem (2) is

$$a_0 - a_1 Y - c_1 - c_2 y = 0 \tag{3}$$

At this point, but not before, we substitute 𝑌 = 𝑛𝑦 into (3) to obtain the following linear
equation

$$a_0 - c_1 - (a_1 + n^{-1} c_2) Y = 0 \tag{4}$$

to be solved for the competitive equilibrium market-wide output 𝑌 .


After solving for 𝑌, we can compute the competitive equilibrium price 𝑝 from the inverse demand curve (1).
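As a minimal sketch of these two steps, the Python function below solves (4) for market-wide output $Y$ and then reads the price off the inverse demand curve (1). The function name and the parameter values are hypothetical, chosen only for illustration. The last line checks that the equilibrium price equals each firm's marginal cost $c_1 + c_2 y$, which is just the first-order condition (3) evaluated at $Y = ny$.

```python
def static_equilibrium(a0, a1, c1, c2, n):
    """Big Y, little y solution of the static model (hypothetical helper).

    The firm's first-order condition (3) is derived with Y held fixed;
    only afterwards do we impose Y = n y, which gives equation (4):
        a0 - c1 - (a1 + c2 / n) Y = 0
    """
    Y = (a0 - c1) / (a1 + c2 / n)   # market-wide output solving (4)
    y = Y / n                        # output of each (representative) firm
    p = a0 - a1 * Y                  # price from the inverse demand curve (1)
    return Y, y, p


# Hypothetical parameter values, for illustration only
a0, a1, c1, c2, n = 10.0, 1.0, 1.0, 1.0, 10

Y, y, p = static_equilibrium(a0, a1, c1, c2, n)
print(f"Y = {Y:.4f}, y = {y:.4f}, p = {p:.4f}")

# Price-taking condition (3): price equals marginal cost c1 + c2 * y
print(abs(p - (c1 + c2 * y)) < 1e-12)
```

Raising $n$ with the other parameters fixed raises $Y$, because the term $c_2/n$ in (4) shrinks.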

46.2.2 Further Reading

References for this lecture include


• [74]
• [95], chapter XIV
• [72], chapter 7

46.3 Defining Rational Expectations Equilibrium

Our first illustration of a rational expectations equilibrium involves a market with 𝑛 firms,
each of which seeks to maximize the discounted present value of profits in the face of adjustment costs.
The adjustment costs induce the firms to make gradual adjustments, which in turn requires
consideration of future prices.

Individual firms understand that, via the inverse demand curve, the price is determined by
the amounts supplied by other firms.
Hence each firm wants to forecast future total industry supplies.
In our context, a forecast is generated by a belief about the law of motion for the aggregate
state.
Rational expectations equilibrium prevails when this belief coincides with the actual law of
motion generated by production choices induced by this belief.
We formulate a rational expectations equilibrium in terms of a fixed point of an operator that
maps beliefs into optimal beliefs.

46.3.1 Competitive Equilibrium with Adjustment Costs

To illustrate, consider a collection of 𝑛 firms producing a homogeneous good that is sold in a competitive market.
Each of these 𝑛 firms sells output 𝑦𝑡 .
The price 𝑝𝑡 of the good lies on the inverse demand curve

$$p_t = a_0 - a_1 Y_t \tag{5}$$

where
• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌𝑡 = 𝑛𝑦𝑡 is the market-wide level of output

The Firm’s Problem

Each firm is a price taker.


While it faces no uncertainty, it does face adjustment costs.
In particular, it chooses a production plan to maximize


$$\sum_{t=0}^{\infty} \beta^t r_t \tag{6}$$

where

$$r_t := p_t y_t - \frac{\gamma (y_{t+1} - y_t)^2}{2}, \qquad y_0 \text{ given} \tag{7}$$

Regarding the parameters,


• 𝛽 ∈ (0, 1) is a discount factor
• 𝛾 > 0 measures the cost of adjusting the rate of output
Regarding timing, the firm observes 𝑝𝑡 and 𝑦𝑡 when it chooses 𝑦𝑡+1 at time 𝑡.
To state the firm’s optimization problem completely requires that we specify dynamics for all
state variables.

This includes ones that the firm cares about but does not control like 𝑝𝑡 .
We turn to this problem now.

Prices and Aggregate Output

In view of (5), the firm’s incentive to forecast the market price translates into an incentive to
forecast aggregate output 𝑌𝑡 .
Aggregate output depends on the choices of other firms.
We assume that 𝑛 is such a large number that the output of any single firm has a negligible
effect on aggregate output.
That justifies firms in regarding their forecasts of aggregate output as being unaffected by
their own output decisions.

The Firm’s Beliefs

We suppose the firm believes that market-wide output 𝑌𝑡 follows the law of motion

$$Y_{t+1} = H(Y_t) \tag{8}$$

where 𝑌0 is a known initial condition.


The belief function 𝐻 is an equilibrium object, and hence remains to be determined.

Optimal Behavior Given Beliefs

For now, let’s fix a particular belief 𝐻 in (8) and investigate the firm’s response to it.
Let 𝑣 be the optimal value function for the firm’s problem given 𝐻.
The value function satisfies the Bellman equation

$$v(y, Y) = \max_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \tag{9}$$

Let’s denote the firm’s optimal policy function by ℎ, so that

$$y_{t+1} = h(y_t, Y_t) \tag{10}$$

where

$$h(y, Y) := \operatorname*{argmax}_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \tag{11}$$

Evidently 𝑣 and ℎ both depend on 𝐻.



A First-Order Characterization

In what follows it will be helpful to have a second characterization of ℎ, based on first-order


conditions.
The first-order necessary condition for choosing 𝑦′ is

$$-\gamma (y' - y) + \beta v_y(y', H(Y)) = 0 \tag{12}$$

An important and useful envelope result of Benveniste and Scheinkman [11] implies that, to differentiate 𝑣 with respect to 𝑦, we can naively differentiate the right side of (9), giving

$$v_y(y, Y) = a_0 - a_1 Y + \gamma (y' - y)$$

Substituting this equation into (12) gives the Euler equation

$$-\gamma (y_{t+1} - y_t) + \beta \left[ a_0 - a_1 Y_{t+1} + \gamma (y_{t+2} - y_{t+1}) \right] = 0 \tag{13}$$

The firm optimally sets an output path that satisfies (13), taking (8) as given, and subject to
• the initial conditions for (𝑦0 , 𝑌0 ).
• the terminal condition $\lim_{t \to \infty} \beta^t y_t v_y(y_t, Y_t) = 0$.
This last condition is called the transversality condition, and acts as a first-order necessary
condition “at infinity”.
The firm’s decision rule solves the difference equation (13) subject to the given initial condition 𝑦0 and the transversality condition.
Note that solving the Bellman equation (9) for 𝑣 and then ℎ in (11) yields a decision rule that
automatically imposes both the Euler equation (13) and the transversality condition.

The Actual Law of Motion for Output

As we’ve seen, a given belief translates into a particular decision rule ℎ.


Recalling that 𝑌𝑡 = 𝑛𝑦𝑡 , the actual law of motion for market-wide output is then

$$Y_{t+1} = n h(Y_t / n, Y_t) \tag{14}$$

Thus, when firms believe that the law of motion for market-wide output is (8), their optimiz-
ing behavior makes the actual law of motion be (14).

46.3.2 Definition of Rational Expectations Equilibrium

A rational expectations equilibrium or recursive competitive equilibrium of the model with adjustment costs is a decision rule ℎ and an aggregate law of motion 𝐻 such that

1. Given belief 𝐻, the map ℎ is the firm’s optimal policy function.

2. The law of motion 𝐻 satisfies 𝐻(𝑌 ) = 𝑛ℎ(𝑌 /𝑛, 𝑌 ) for all 𝑌 .



Thus, a rational expectations equilibrium equates the perceived and actual laws of motion (8)
and (14).

Fixed Point Characterization

As we’ve seen, the firm’s optimization problem induces a mapping Φ from a perceived law of motion 𝐻 for market-wide output to an actual law of motion Φ(𝐻).
The mapping Φ is the composition of two operations, taking a perceived law of motion into a
decision rule via (9)–(11), and a decision rule into an actual law via (14).
The 𝐻 component of a rational expectations equilibrium is a fixed point of Φ.

46.4 Computation of an Equilibrium

Now let’s consider the problem of computing the rational expectations equilibrium.

46.4.1 Failure of Contractivity

Readers accustomed to dynamic programming arguments might try to address this problem
by choosing some guess 𝐻0 for the aggregate law of motion and then iterating with Φ.
Unfortunately, the mapping Φ is not a contraction.
In particular, there is no guarantee that direct iterations on Φ converge [1].
Furthermore, there are examples in which these iterations diverge.
Fortunately, there is another method that works here.
The method exploits a connection between equilibrium and Pareto optimality expressed in
the fundamental theorems of welfare economics (see, e.g., [79]).
Lucas and Prescott [74] used this method to construct a rational expectations equilibrium.
The details follow.

46.4.2 A Planning Problem Approach

Our plan of attack is to match the Euler equations of the market problem with those for a
single-agent choice problem.
As we’ll see, this planning problem can be solved by LQ control (linear regulator).
The optimal quantities from the planning problem are rational expectations equilibrium
quantities.
The rational expectations equilibrium price can be obtained as a shadow price in the planning
problem.
For convenience, in this section, we set 𝑛 = 1.
We first compute a sum of consumer and producer surplus at time 𝑡

$$s(Y_t, Y_{t+1}) := \int_0^{Y_t} (a_0 - a_1 x) \, dx - \frac{\gamma (Y_{t+1} - Y_t)^2}{2} \tag{15}$$

The first term is the area under the demand curve, while the second measures the social costs
of changing output.
The planning problem is to choose a production plan {𝑌𝑡 } to maximize


$$\sum_{t=0}^{\infty} \beta^t s(Y_t, Y_{t+1})$$

subject to an initial condition for 𝑌0 .

46.4.3 Solution of the Planning Problem

Evaluating the integral in (15) yields the quadratic form $a_0 Y_t - a_1 Y_t^2 / 2$.
As a result, the Bellman equation for the planning problem is

$$V(Y) = \max_{Y'} \left\{ a_0 Y - \frac{a_1}{2} Y^2 - \frac{\gamma (Y' - Y)^2}{2} + \beta V(Y') \right\} \tag{16}$$

The associated first-order condition is

$$-\gamma (Y' - Y) + \beta V'(Y') = 0 \tag{17}$$

Applying the same Benveniste-Scheinkman formula gives

$$V'(Y) = a_0 - a_1 Y + \gamma (Y' - Y)$$

Substituting this into equation (17) and rearranging leads to the Euler equation

$$\beta a_0 + \gamma Y_t - [\beta a_1 + \gamma (1 + \beta)] Y_{t+1} + \gamma \beta Y_{t+2} = 0 \tag{18}$$
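Equation (18) already pins down a linear law of motion, so it can serve as a cross-check on the LQ computations later in the lecture. The sketch below assumes a law of the form $Y_{t+1} = \kappa_0 + \kappa_1 Y_t$ (the form the LQ solution delivers). Substituting it into (18) requires $\kappa_1$ to be a root of $\gamma\beta z^2 - [\beta a_1 + \gamma(1+\beta)] z + \gamma = 0$; we keep the root inside the unit circle, since the other root makes output explode. The steady state of (18) is $\bar Y = a_0/a_1$, which gives $\kappa_0 = (1 - \kappa_1)\bar Y$. The parameter values are those used in the exercises below.

```python
import numpy as np

# Parameter values used in the exercises of this lecture
a0, a1, β, γ = 100, 0.05, 0.95, 10.0

# Guessing Y_{t+1} = κ0 + κ1 Y_t in the Euler equation (18) gives the
# characteristic polynomial γβ z^2 - [β a1 + γ(1 + β)] z + γ = 0
roots = np.roots([γ * β, -(β * a1 + γ * (1 + β)), γ])

# Keep the stable root; the other root implies explosive output
κ1 = roots[np.abs(roots) < 1][0].real

# The steady state of (18) solves β a0 - β a1 Y = 0, so Ybar = a0 / a1
Y_bar = a0 / a1
κ0 = (1 - κ1) * Y_bar

print(κ0, κ1)
```

For these parameters the stable root is about 0.9525 and $\kappa_0$ is about 95.08, which can be compared with the LQ solution of the planning problem in exercise 3.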

46.4.4 The Key Insight

Return to equation (13) and set 𝑦𝑡 = 𝑌𝑡 for all 𝑡.


(Recall that for this section we’ve set 𝑛 = 1 to simplify the calculations.)
A small amount of algebra will convince you that when 𝑦𝑡 = 𝑌𝑡 , equations (18) and (13) are identical.
Thus, the Euler equation for the planning problem matches the second-order difference equation that we derived by

1. finding the Euler equation of the representative firm and

2. substituting into it the expression 𝑌𝑡 = 𝑛𝑦𝑡 that “makes the representative firm be representative”.
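The “small amount of algebra” can be handed to SymPy. The sketch below writes the left side of the firm's Euler equation (13) with $y_t = Y_t$ imposed, writes the left side of the planner's Euler equation (18), and expands their difference. The symbol names are our own choices.

```python
import sympy as sp

a0, a1, β, γ = sp.symbols('a0 a1 beta gamma', positive=True)
Y0, Y1, Y2 = sp.symbols('Y_t Y_t1 Y_t2')   # stand for Y_t, Y_{t+1}, Y_{t+2}

# Firm's Euler equation (13) after imposing y_t = Y_t for all t
firm = -γ * (Y1 - Y0) + β * (a0 - a1 * Y1 + γ * (Y2 - Y1))

# Planner's Euler equation (18)
planner = β * a0 + γ * Y0 - (β * a1 + γ * (1 + β)) * Y1 + γ * β * Y2

# The two left sides agree term by term, so the difference expands to 0
print(sp.expand(firm - planner))
```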

If it is appropriate to apply the same terminal conditions for these two difference equations,
which it is, then we have verified that a solution of the planning problem is also a rational
expectations equilibrium quantity sequence.
It follows that for this example we can compute equilibrium quantities by forming the optimal
linear regulator problem corresponding to the Bellman equation (16).
The optimal policy function for the planning problem is the aggregate law of motion 𝐻 that
the representative firm faces within a rational expectations equilibrium.

Structure of the Law of Motion

As you are asked to show in the exercises, the fact that the planner’s problem is an LQ problem implies an optimal policy (and hence an aggregate law of motion) taking the form

$$Y_{t+1} = \kappa_0 + \kappa_1 Y_t \tag{19}$$

for some parameter pair 𝜅0 , 𝜅1 .


Now that we know the aggregate law of motion is linear, we can see from the firm’s Bellman
equation (9) that the firm’s problem can also be framed as an LQ problem.
As you’re asked to show in the exercises, the LQ formulation of the firm’s problem implies a
law of motion that looks as follows

$$y_{t+1} = h_0 + h_1 y_t + h_2 Y_t \tag{20}$$

Hence a rational expectations equilibrium will be defined by the parameters (𝜅0 , 𝜅1 , ℎ0 , ℎ1 , ℎ2 ) in (19)–(20).

46.5 Exercises

46.5.1 Exercise 1

Consider the firm problem described above.


Let the firm’s belief function 𝐻 be as given in (19).
Formulate the firm’s problem as a discounted optimal linear regulator problem, being careful
to describe all of the objects needed.
Use the class LQ from the QuantEcon.py package to solve the firm’s problem for the following
parameter values:

𝑎0 = 100, 𝑎1 = 0.05, 𝛽 = 0.95, 𝛾 = 10, 𝜅0 = 95.5, 𝜅1 = 0.95

Express the solution of the firm’s problem in the form (20) and give the values for each ℎ𝑗 .
If there were 𝑛 identical competitive firms all behaving according to (20), what would (20) imply for the actual law of motion (8) for market supply?

46.5.2 Exercise 2

Consider the following 𝜅0 , 𝜅1 pairs as candidates for the aggregate law of motion component
of a rational expectations equilibrium (see (19)).
Extending the program that you wrote for exercise 1, determine which if any satisfy the definition of a rational expectations equilibrium:
• (94.0886298678, 0.923409232937)
• (93.2119845412, 0.984323478873)
• (95.0818452486, 0.952459076301)
Describe an iterative algorithm that uses the program that you wrote for exercise 1 to compute a rational expectations equilibrium.
(You are not being asked actually to use the algorithm you are suggesting)

46.5.3 Exercise 3

Recall the planner’s problem described above:

1. Formulate the planner’s problem as an LQ problem.

2. Solve it using the same parameter values as in exercise 1:
• 𝑎0 = 100, 𝑎1 = 0.05, 𝛽 = 0.95, 𝛾 = 10

3. Represent the solution in the form 𝑌𝑡+1 = 𝜅0 + 𝜅1 𝑌𝑡 .

4. Compare your answer with the results from exercise 2.

46.5.4 Exercise 4

A monopolist faces the industry demand curve (5) and chooses $\{Y_t\}$ to maximize $\sum_{t=0}^{\infty} \beta^t r_t$ where

$$r_t = p_t Y_t - \frac{\gamma (Y_{t+1} - Y_t)^2}{2}$$

Formulate this problem as an LQ problem.
Compute the optimal policy using the same parameters as the previous exercise.
In particular, solve for the parameters in

𝑌𝑡+1 = 𝑚0 + 𝑚1 𝑌𝑡

Compare your results with the previous exercise – comment.

46.6 Solutions

46.6.1 Exercise 1

To map a problem into a discounted optimal linear control problem, we need to define

• state vector 𝑥𝑡 and control vector 𝑢𝑡


• matrices 𝐴, 𝐵, 𝑄, 𝑅 that define preferences and the law of motion for the state
For the state and control vectors, we choose

$$x_t = \begin{bmatrix} y_t \\ Y_t \\ 1 \end{bmatrix}, \qquad u_t = y_{t+1} - y_t$$

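For 𝐴, 𝐵, 𝑄, 𝑅 we set

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \kappa_1 & \kappa_0 \\ 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad R = \begin{bmatrix} 0 & a_1/2 & -a_0/2 \\ a_1/2 & 0 & 0 \\ -a_0/2 & 0 & 0 \end{bmatrix}, \quad Q = \gamma / 2$$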

By multiplying out you can confirm that


• 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 = −𝑟𝑡
• 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡
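For readers who prefer to let the computer do the multiplying out, here is a small numerical check of both identities. It uses the parameter values of exercise 1 and an arbitrary, randomly drawn state and control; it is only a sanity check on the matrices above, not part of the solution.

```python
import numpy as np

a0, a1, β, γ = 100, 0.05, 0.95, 10.0
κ0, κ1 = 95.5, 0.95

A = np.array([[1, 0, 0], [0, κ1, κ0], [0, 0, 1]])
B = np.array([[1], [0], [0]])
R = np.array([[0, a1 / 2, -a0 / 2], [a1 / 2, 0, 0], [-a0 / 2, 0, 0]])
Q = γ / 2

rng = np.random.default_rng(0)
y, Y, u = rng.uniform(0, 100, size=3)   # arbitrary y_t, Y_t and u_t
x = np.array([y, Y, 1.0])

# Period payoff r_t = p_t y_t - γ u_t^2 / 2 with p_t = a0 - a1 Y_t
r = (a0 - a1 * Y) * y - γ * u**2 / 2
loss = x @ R @ x + Q * u**2
print(np.isclose(loss, -r))             # x'Rx + u'Qu = -r_t

# Law of motion: y_{t+1} = y_t + u_t and Y_{t+1} = κ0 + κ1 Y_t
x_next = A @ x + (B * u).flatten()
print(np.allclose(x_next, [y + u, κ0 + κ1 * Y, 1.0]))
```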
We’ll use the module lqcontrol.py to solve the firm’s problem at the stated parameter values.
This will return an LQ policy 𝐹 with the interpretation 𝑢𝑡 = −𝐹 𝑥𝑡 , or

𝑦𝑡+1 − 𝑦𝑡 = −𝐹0 𝑦𝑡 − 𝐹1 𝑌𝑡 − 𝐹2

Matching parameters with 𝑦𝑡+1 = ℎ0 + ℎ1 𝑦𝑡 + ℎ2 𝑌𝑡 leads to

$$h_0 = -F_2, \qquad h_1 = 1 - F_0, \qquad h_2 = -F_1$$

Here’s our solution

In [4]: # Model parameters

a0 = 100
a1 = 0.05
β = 0.95
γ = 10.0

# Beliefs

κ0 = 95.5
κ1 = 0.95

# Formulate the LQ problem

A = np.array([[1, 0, 0], [0, κ1, κ0], [0, 0, 1]])


B = np.array([1, 0, 0])
B.shape = 3, 1
R = np.array([[0, a1/2, -a0/2], [a1/2, 0, 0], [-a0/2, 0, 0]])
Q = 0.5 * γ

# Solve for the optimal policy

lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()

F = F.flatten()
out1 = f"F = [{F[0]:.3f}, {F[1]:.3f}, {F[2]:.3f}]"
h0, h1, h2 = -F[2], 1 - F[0], -F[1]
out2 = f"(h0, h1, h2) = ({h0:.3f}, {h1:.3f}, {h2:.3f})"

print(out1)
print(out2)

F = [-0.000, 0.046, -96.949]
(h0, h1, h2) = (96.949, 1.000, -0.046)

The implication is that

$$y_{t+1} = 96.949 + y_t - 0.046 \, Y_t$$

For the case 𝑛 > 1, recall that 𝑌𝑡 = 𝑛𝑦𝑡 , which, combined with the previous equation, yields

$$Y_{t+1} = n (96.949 + y_t - 0.046 \, Y_t) = 96.949 \, n + (1 - 0.046 \, n) Y_t$$

46.6.2 Exercise 2

To determine whether a 𝜅0 , 𝜅1 pair forms the aggregate law of motion component of a rational expectations equilibrium, we can proceed as follows:
• Determine the corresponding firm law of motion 𝑦𝑡+1 = ℎ0 + ℎ1 𝑦𝑡 + ℎ2 𝑌𝑡 .
• Test whether the associated aggregate law 𝑌𝑡+1 = 𝑛ℎ(𝑌𝑡 /𝑛, 𝑌𝑡 ) evaluates to 𝑌𝑡+1 = 𝜅0 + 𝜅1 𝑌𝑡 .
In the second step, since 𝑛 = 1, we can use 𝑌𝑡 = 𝑛𝑦𝑡 = 𝑦𝑡 , so that 𝑌𝑡+1 = 𝑛ℎ(𝑌𝑡 /𝑛, 𝑌𝑡 ) becomes

$$Y_{t+1} = h(Y_t, Y_t) = h_0 + (h_1 + h_2) Y_t$$

Hence to test the second step we can test 𝜅0 = ℎ0 and 𝜅1 = ℎ1 + ℎ2 .


The following code implements this test

In [5]: candidates = ((94.0886298678, 0.923409232937),
                      (93.2119845412, 0.984323478873),
                      (95.0818452486, 0.952459076301))

for κ0, κ1 in candidates:

    # Form the associated law of motion
    A = np.array([[1, 0, 0], [0, κ1, κ0], [0, 0, 1]])

    # Solve the LQ problem for the firm
    lq = LQ(Q, R, A, B, beta=β)
    P, F, d = lq.stationary_values()
    F = F.flatten()
    h0, h1, h2 = -F[2], 1 - F[0], -F[1]

    # Test the equilibrium condition
    if np.allclose((κ0, κ1), (h0, h1 + h2)):
        print(f'Equilibrium pair = {κ0}, {κ1}')
        print(f'(h0, h1, h2) = {h0:.4f}, {h1:.4f}, {h2:.4f}')
        break

Equilibrium pair = 95.0818452486, 0.952459076301
(h0, h1, h2) = 95.0819, 1.0000, -0.0475

The output tells us that the answer is pair (iii), which implies (ℎ0 , ℎ1 , ℎ2 ) =
(95.0819, 1.0000, −.0475).
(Notice we use np.allclose to test equality of floating-point numbers, since exact equality is
too strict).
Regarding the iterative algorithm, one could loop from a given (𝜅0 , 𝜅1 ) pair to the associated
firm law and then to a new (𝜅0 , 𝜅1 ) pair.
This amounts to implementing the operator Φ described in the lecture.
(There is in general no guarantee that this iterative process will converge to a rational expectations equilibrium.)
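Because the perceived aggregate law is linear, the firm's rule can also be obtained without an LQ solver, by applying the method of undetermined coefficients to the Euler equation (13). This is a sketch based on our own derivation, not the lecture's method. With $Y_{t+1} = \kappa_0 + \kappa_1 Y_t$, the terms in $y$ alone have characteristic roots $1$ and $1/\beta$; keeping the root $1$ (which rules out the explosive root $1/\beta$ and matches $h_1 = 1.000$ in the LQ output above) and matching coefficients on $Y_t$ and on the constant gives $h_2 = \beta a_1 \kappa_1 / (\gamma(\beta\kappa_1 - 1))$ and $h_0 = \beta(a_0 - a_1\kappa_0 + \gamma h_2 \kappa_0)/(\gamma(1-\beta))$. The helpers `firm_rule` and `Φ` are hypothetical names; `Φ` is the operator from perceived to actual laws of motion, restricted to linear laws.

```python
import numpy as np

a0, a1, β, γ = 100, 0.05, 0.95, 10.0

def firm_rule(κ0, κ1):
    """Hypothetical helper: coefficients (h0, h1, h2) of the firm's rule (20)
    implied by the Euler equation (13) when Y_{t+1} = κ0 + κ1 Y_t."""
    h1 = 1.0                                              # non-explosive root
    h2 = β * a1 * κ1 / (γ * (β * κ1 - 1))                 # match terms in Y_t
    h0 = β * (a0 - a1 * κ0 + γ * h2 * κ0) / (γ * (1 - β))  # match constants
    return h0, h1, h2

def Φ(κ0, κ1, n=1):
    """Map a perceived law (κ0, κ1) into the actual law implied by (14)."""
    h0, h1, h2 = firm_rule(κ0, κ1)
    # n h(Y/n, Y) = n h0 + (h1 + n h2) Y
    return n * h0, h1 + n * h2

candidates = ((94.0886298678, 0.923409232937),
              (93.2119845412, 0.984323478873),
              (95.0818452486, 0.952459076301))

# A candidate is an equilibrium when it is a fixed point of Φ
for κ in candidates:
    print(κ, np.allclose(κ, Φ(*κ)))
```

For the exercise 1 beliefs, `firm_rule(95.5, 0.95)` gives roughly (96.949, 1.0, -0.0463), in line with the LQ solution above, and only the third candidate is flagged as a fixed point.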

46.6.3 Exercise 3

We are asked to write the planner problem as an LQ problem.


For the state and control vectors, we choose

$$x_t = \begin{bmatrix} Y_t \\ 1 \end{bmatrix}, \qquad u_t = Y_{t+1} - Y_t$$

For the LQ matrices, we set

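$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad R = \begin{bmatrix} a_1/2 & -a_0/2 \\ -a_0/2 & 0 \end{bmatrix}, \quad Q = \gamma / 2$$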

By multiplying out you can confirm that


• 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 = −𝑠(𝑌𝑡 , 𝑌𝑡+1 )
• 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡
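A numerical version of this check for the planner's matrices: pick an arbitrary $Y_t$ and $u_t$, evaluate the surplus (15) with the integral worked out, and compare both sides. The particular numbers below are arbitrary.

```python
import numpy as np

a0, a1, γ = 100, 0.05, 10.0

A = np.array([[1, 0], [0, 1]])
B = np.array([[1], [0]])
R = np.array([[a1 / 2, -a0 / 2], [-a0 / 2, 0]])
Q = γ / 2

Y, u = 1500.0, -20.0          # arbitrary Y_t and u_t = Y_{t+1} - Y_t
x = np.array([Y, 1.0])

# Surplus (15) after evaluating the integral: a0 Y - a1 Y^2 / 2 - γ u^2 / 2
s = a0 * Y - a1 * Y**2 / 2 - γ * u**2 / 2

print(np.isclose(x @ R @ x + Q * u**2, -s))                # x'Rx + u'Qu = -s
print(np.allclose(A @ x + (B * u).flatten(), [Y + u, 1.0]))  # law of motion
```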
By obtaining the optimal policy and using 𝑢𝑡 = −𝐹 𝑥𝑡 or

𝑌𝑡+1 − 𝑌𝑡 = −𝐹0 𝑌𝑡 − 𝐹1

we can obtain the implied aggregate law of motion via 𝜅0 = −𝐹1 and 𝜅1 = 1 − 𝐹0 .
The Python code to solve this problem is below:

In [6]: # Formulate the planner's LQ problem

A = np.array([[1, 0], [0, 1]])


B = np.array([[1], [0]])
R = np.array([[a1 / 2, -a0 / 2], [-a0 / 2, 0]])
Q = γ / 2

# Solve for the optimal policy

lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()

# Print the results

F = F.flatten()
κ0, κ1 = -F[1], 1 - F[0]
print(κ0, κ1)

95.08187459214828 0.9524590627039239

The output yields the same (𝜅0 , 𝜅1 ) pair obtained as an equilibrium from the previous exercise.

46.6.4 Exercise 4

The monopolist’s LQ problem is almost identical to the planner’s problem from the previous
exercise, except that

$$R = \begin{bmatrix} a_1 & -a_0/2 \\ -a_0/2 & 0 \end{bmatrix}$$

The problem can be solved as follows

In [7]: A = np.array([[1, 0], [0, 1]])


B = np.array([[1], [0]])
R = np.array([[a1, -a0 / 2], [-a0 / 2, 0]])
Q = γ / 2

lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()

F = F.flatten()
m0, m1 = -F[1], 1 - F[0]
print(m0, m1)

73.47294403502859 0.9265270559649703

We see that the law of motion for the monopolist is approximately 𝑌𝑡+1 = 73.4729 + 0.9265𝑌𝑡 .
In the rational expectations case, the law of motion was approximately 𝑌𝑡+1 = 95.0818 +
0.9525𝑌𝑡 .
One way to compare these two laws of motion is by their fixed points, which give long-run
equilibrium output in each case.
For laws of the form 𝑌𝑡+1 = 𝑐0 + 𝑐1 𝑌𝑡 , the fixed point is 𝑐0 /(1 − 𝑐1 ).
If you crunch the numbers, you will see that the monopolist adopts a lower long-run quantity
than obtained by the competitive market, implying a higher market price.
This is analogous to the elementary static-case results.
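Crunching the numbers takes only a few lines. The sketch below plugs the printed coefficients of the two laws of motion into $c_0/(1 - c_1)$ and then uses the inverse demand curve (5), with $a_0 = 100$ and $a_1 = 0.05$, to get the corresponding long-run prices.

```python
a0, a1 = 100, 0.05

# (c0, c1) coefficients of Y_{t+1} = c0 + c1 Y_t printed in exercises 3 and 4
laws = {
    "competitive": (95.08187459214828, 0.9524590627039239),
    "monopoly": (73.47294403502859, 0.9265270559649703),
}

for name, (c0, c1) in laws.items():
    Y_bar = c0 / (1 - c1)      # fixed point of Y_{t+1} = c0 + c1 Y_t
    p_bar = a0 - a1 * Y_bar    # long-run price from the inverse demand curve
    print(f"{name:12s} Y = {Y_bar:8.2f}   p = {p_bar:6.2f}")
```

With these coefficients the long-run output is roughly 2000 under competition and roughly 1000 under monopoly, so the monopoly price is higher; for these parameters the two quantities coincide with $a_0/a_1$ and $a_0/(2a_1)$.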
Footnotes
[1] A literature that studies whether models populated with agents who learn can converge
to rational expectations equilibria features iterations on a modification of the mapping Φ that
can be approximated as 𝛾Φ + (1 − 𝛾)𝐼. Here 𝐼 is the identity operator and 𝛾 ∈ (0, 1) is a relaxation parameter. See [77] and [36] for statements and applications of this approach to establish conditions under which collections of adaptive agents who use least squares learning converge to a rational expectations equilibrium.
Chapter 47

Stability in Linear Rational Expectations Models

47.1 Contents

• Overview 47.2
• Linear difference equations 47.3
• Illustration: Cagan’s Model 47.4
• Some Python code 47.5
• Alternative code 47.6
• Another perspective 47.7
• Log money supply feeds back on log price level 47.8
• Big 𝑃 , little 𝑝 interpretation 47.9
• Fun with Sympy code 47.10
In addition to what’s in Anaconda, this lecture deploys the following libraries:

In [1]: !conda install -y quantecon

In [2]: import numpy as np


import quantecon as qe
import matplotlib.pyplot as plt
%matplotlib inline
from sympy import *
init_printing()


47.2 Overview

This lecture studies stability in the context of an elementary rational expectations model.
We study a rational expectations version of Philip Cagan’s model [18] linking the price level
to the money supply.


Cagan did not use a rational expectations version of his model, but Sargent [94] did.
We study this model because it is intrinsically interesting and also because it has a mathematical structure that also appears in virtually all linear rational expectations models, namely, that a key endogenous variable equals a mathematical expectation of a geometric sum of future values of another variable.
In a rational expectations version of Cagan’s model, the endogenous variable is the price level
or rate of inflation and the other variable is the money supply or the rate of change in the
money supply.
In this lecture, we’ll encounter:
• a convenient formula for the expectation of geometric sum of future values of a variable
• a way of solving an expectational difference equation by mapping it into a vector first-order difference equation and appropriately manipulating an eigendecomposition of the transition matrix in order to impose stability
• a way to use a Big 𝐾, little 𝑘 argument to allow apparent feedback from endogenous to exogenous variables within a rational expectations equilibrium
• a use of eigenvector decompositions of matrices that allowed Blanchard and Kahn (1980) [? ] and Whiteman (1983) [110] to solve a class of linear rational expectations models
Cagan’s model with rational expectations is formulated as an expectational difference
equation whose solution is a rational expectations equilibrium.
We’ll start this lecture with a quick review of deterministic (i.e., non-random) first-order and
second-order linear difference equations.

47.3 Linear difference equations

In this quick review of linear difference equations, we’ll use the backward shift or lag operator 𝐿.
The lag operator 𝐿 maps a sequence $\{x_t\}_{t=0}^{\infty}$ into the sequence $\{x_{t-1}\}_{t=0}^{\infty}$.
We can use 𝐿 in linear difference equations by using the equality 𝐿𝑥𝑡 ≡ 𝑥𝑡−1 in algebraic expressions.
Further, the inverse 𝐿−1 of the lag operator is the forward shift operator.
In linear difference equations, we’ll often use the equality 𝐿−1 𝑥𝑡 ≡ 𝑥𝑡+1 in the algebra below.
The algebra of lag and forward shift operators often simplifies formulas for linear difference equations and their solutions.

47.3.1 First order

We want to solve a linear first-order scalar difference equation.


First, let |𝜆| < 1, and let $\{u_t\}_{t=-\infty}^{\infty}$ be a bounded sequence of scalar real numbers.

Let 𝐿 be the lag operator defined by 𝐿𝑥𝑡 ≡ 𝑥𝑡−1 and let 𝐿−1 be the forward shift operator
defined by 𝐿−1 𝑥𝑡 ≡ 𝑥𝑡+1 .
Then

$$(1 - \lambda L) y_t = u_t, \quad \forall t \tag{1}$$

has solutions

$$y_t = (1 - \lambda L)^{-1} u_t + k \lambda^t \tag{2}$$

or


$$y_t = \sum_{j=0}^{\infty} \lambda^j u_{t-j} + k \lambda^t$$

for any real number 𝑘.


You can verify this fact by applying (1 − 𝜆𝐿) to both sides of equation (2) and noting that
(1 − 𝜆𝐿)𝜆𝑡 = 0.
To pin down 𝑘 we need one condition imposed from outside (e.g., an initial or terminal condition) on the path of 𝑦.
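A quick numerical check of the backward solution, as a sketch: to work with a finite sample we assume, for illustration only, that $u_t = 0$ for $t < 0$, which turns the distributed lag into a finite sum. The values of $\lambda$, $k$ and the random draws of $u_t$ are arbitrary.

```python
import numpy as np

λ, k = 0.8, 2.5                                   # |λ| < 1; k is arbitrary
T = 50
u = np.random.default_rng(1).uniform(-1, 1, T)   # bounded u_0, ..., u_{T-1}

# y_t = sum_{j=0}^{t} λ^j u_{t-j} + k λ^t, using u_t = 0 for t < 0
y = np.array([sum(λ**j * u[t - j] for j in range(t + 1)) + k * λ**t
              for t in range(T)])

# Applying (1 - λL), i.e. y_t - λ y_{t-1}, should give back u_t for t >= 1;
# the k λ^t term drops out because (1 - λL) λ^t = 0
print(np.allclose(y[1:] - λ * y[:-1], u[1:]))
```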
Now let |𝜆| > 1.
Rewrite equation (1) as

$$y_{t-1} = \lambda^{-1} y_t - \lambda^{-1} u_t, \quad \forall t \tag{3}$$

or

$$(1 - \lambda^{-1} L^{-1}) y_t = -\lambda^{-1} u_{t+1} \tag{4}$$

A solution is

$$y_t = -\lambda^{-1} \left( \frac{1}{1 - \lambda^{-1} L^{-1}} \right) u_{t+1} + k \lambda^t \tag{5}$$

for any 𝑘.
To verify that this is a solution, check the consequences of operating on both sides of equation
(5) by (1 − 𝜆𝐿) and compare to equation (1).
Solution (2) exists for |𝜆| < 1 because the distributed lag in 𝑢 converges.
Solution (5) exists when |𝜆| > 1 because the distributed lead in 𝑢 converges.
When |𝜆| > 1, the distributed lag in 𝑢 in (2) may diverge, so that a solution of this form does
not exist.
The distributed lead in 𝑢 in (5) need not converge when |𝜆| < 1.
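As a sketch, the forward solution (5) can be checked numerically when $|\lambda| > 1$. To keep the distributed lead finite we assume, for illustration only, that $u_t = 0$ from some date on, and we keep only the bounded part of (5) by setting $k = 0$. The value of $\lambda$ and the draws of $u_t$ are arbitrary.

```python
import numpy as np

λ = 1.5                                            # |λ| > 1
T = 60
u = np.zeros(T + 1)
u[:40] = np.random.default_rng(2).uniform(-1, 1, 40)   # u_t = 0 for t >= 40

# y_t = -λ^{-1} sum_{j>=0} λ^{-j} u_{t+j+1}, the k = 0 part of (5);
# the sum stops at u_T, beyond which every u is zero anyway
y = np.array([-sum(λ**(-j - 1) * u[t + j + 1] for j in range(T - t))
              for t in range(T)])

# Applying (1 - λL) should give back u_t
print(np.allclose(y[1:] - λ * y[:-1], u[1:T]))
```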

47.3.2 Second order

Now consider the second order difference equation

$$(1 - \lambda_1 L)(1 - \lambda_2 L) y_{t+1} = u_t \tag{6}$$



where {𝑢𝑡 } is a bounded sequence, 𝑦0 is an initial condition, |𝜆1 | < 1 and |𝜆2 | > 1.
We seek a bounded sequence $\{y_t\}_{t=0}^{\infty}$ that satisfies (6). Using insights from our analysis of the first-order equation, operate on both sides of (6) by the forward inverse of (1 − 𝜆2 𝐿) to rewrite equation (6) as

$$(1 - \lambda_1 L) y_{t+1} = -\frac{\lambda_2^{-1}}{1 - \lambda_2^{-1} L^{-1}} u_{t+1}$$

or


$$y_{t+1} = \lambda_1 y_t - \lambda_2^{-1} \sum_{j=0}^{\infty} \lambda_2^{-j} u_{t+j+1} \tag{7}$$

Thus, we obtained equation (7) by solving stable roots (in this case 𝜆1 ) backward, and unstable roots (in this case 𝜆2 ) forward.
Equation (7) has a form that we shall encounter often.
$\lambda_1 y_t$ is called the feedback part and $-\frac{\lambda_2^{-1}}{1 - \lambda_2^{-1} L^{-1}} u_{t+1}$ is called the feedforward part of the solution.
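As a sketch, the following code generates a path from (7) and checks that it satisfies the original equation (6), which after multiplying out the lag polynomial reads $y_{t+1} - (\lambda_1 + \lambda_2) y_t + \lambda_1 \lambda_2 y_{t-1} = u_t$. We assume, for illustration only, that $u_t = 0$ beyond some date so that the feedforward sum is finite; $\lambda_1$, $\lambda_2$, $y_0$ and the draws of $u_t$ are arbitrary.

```python
import numpy as np

λ1, λ2 = 0.6, 1.8                                  # |λ1| < 1 < |λ2|
T = 80
u = np.zeros(T + 1)
u[:50] = np.random.default_rng(3).uniform(-1, 1, 50)   # u_t = 0 for t >= 50

def feedforward(t):
    """-λ2^{-1} sum_{j>=0} λ2^{-j} u_{t+j+1}, truncated where u is zero."""
    return -sum(λ2**(-j - 1) * u[t + j + 1] for j in range(T - t))

# Iterate (7) forward from the initial condition y_0
y = np.empty(T)
y[0] = 1.0
for t in range(T - 1):
    y[t + 1] = λ1 * y[t] + feedforward(t)          # feedback + feedforward

# Check (6): y_{t+1} - (λ1 + λ2) y_t + λ1 λ2 y_{t-1} = u_t for t = 1, ..., T-2
lhs = y[2:] - (λ1 + λ2) * y[1:-1] + λ1 * λ2 * y[:-2]
print(np.allclose(lhs, u[1:T - 1]))
```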

47.4 Illustration: Cagan’s Model

Let
• 𝑚𝑑𝑡 be the log of the demand for money
• 𝑚𝑡 be the log of the supply of money
• 𝑝𝑡 be the log of the price level
It follows that 𝑝𝑡+1 − 𝑝𝑡 is the rate of inflation.
The logarithm of the demand for real money balances 𝑚𝑑𝑡 − 𝑝𝑡 is an inverse function of the
expected rate of inflation 𝑝𝑡+1 − 𝑝𝑡 for 𝑡 ≥ 0:

$$m_t^d - p_t = -\beta (p_{t+1} - p_t), \qquad \beta > 0$$

Equate the demand for log money 𝑚𝑑𝑡 to the supply of log money 𝑚𝑡 in the above equation
and rearrange to deduce that the logarithm of the price level 𝑝𝑡 is related to the logarithm of
the money supply 𝑚𝑡 by

$$p_t = (1 - \lambda) m_t + \lambda p_{t+1} \tag{8}$$

where $\lambda \equiv \frac{\beta}{1 + \beta} \in (0, 1)$.
Solving the first order difference equation (8) forward gives


$$p_t = (1 - \lambda) \sum_{j=0}^{\infty} \lambda^j m_{t+j}, \tag{9}$$

which is the unique stable solution of difference equation (8) among a class of more general
solutions


$$p_t = (1 - \lambda) \sum_{j=0}^{\infty} \lambda^j m_{t+j} + c \lambda^{-t} \tag{10}$$

that is indexed by the real number 𝑐 ∈ R.


Because we want to focus on stable solutions, we set 𝑐 = 0.
We begin by assuming that the log of the money supply is exogenous in the sense that it is
an autonomous process that does not feed back on the log of the price level.
In particular, we assume that the log of the money supply is described by the linear state
space system

$$\begin{aligned} m_t &= G x_t \\ x_{t+1} &= A x_t \end{aligned} \tag{11}$$

where 𝑥𝑡 is an 𝑛 × 1 vector that does not include 𝑝𝑡 or lags of 𝑝𝑡 , 𝐴 is an 𝑛 × 𝑛 matrix with


eigenvalues that are less than 𝜆−1 in absolute values, and 𝐺 is a 1 × 𝑛 selector matrix.
Variables appearing in the vector 𝑥𝑡 contain information that might help predict future values
of the money supply.
We’ll take an example in which 𝑥𝑡 includes only 𝑚𝑡 , possibly lagged values of 𝑚, and a constant.
An example of such an {𝑚𝑡 } process that fits into state space system (11) is one that satisfies the second order linear difference equation

$$m_{t+1} = \alpha + \rho_1 m_t + \rho_2 m_{t-1}$$

where the zeros of the characteristic polynomial $(1 - \rho_1 z - \rho_2 z^2)$ are strictly greater than 1 in modulus.
We seek a stable or non-explosive solution of the difference equation (8) that obeys the system comprised of (8)-(11).
By stable or non-explosive, we mean that neither 𝑚𝑡 nor 𝑝𝑡 diverges as 𝑡 → +∞.
This means that we are shutting down the term 𝑐𝜆−𝑡 in equation (10) above by setting 𝑐 = 0.
The solution we are after is

$$p_t = F x_t \tag{12}$$

where

$$F = (1 - \lambda) G (I - \lambda A)^{-1} \tag{13}$$

Note: As mentioned above, an explosive solution of difference equation (8) can be constructed by adding to the right-hand side of (12) a sequence 𝑐𝜆−𝑡 where 𝑐 is an arbitrary positive constant.
778 CHAPTER 47. STABILITY IN LINEAR RATIONAL EXPECTATIONS MODELS

47.5 Some Python code

We’ll construct examples that illustrate (11).


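Our first example takes as the law of motion for the log money supply the second order difference equation

$$
m_{t+1} = \alpha + \rho_1 m_t + \rho_2 m_{t-1} \tag{14}
$$

that is parameterized by $\rho_1, \rho_2, \alpha$.
To capture this parameterization with system (11) we set

$$
x_t = \begin{bmatrix} 1 \\ m_t \\ m_{t-1} \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 0 & 0 \\ \alpha & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}, \quad
G = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}
$$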

Here is Python code

In [3]: λ = .9

α = 0
ρ1 = .9
ρ2 = .05

A = np.array([[1, 0, 0],
[α, ρ1, ρ2],
[0, 1, 0]])
G = np.array([[0, 1, 0]])

The matrix 𝐴 has one eigenvalue equal to unity that is associated with the 𝐴11 component
that captures a constant component of the state 𝑥𝑡 .
We can verify that the two eigenvalues of 𝐴 not associated with the constant in the state 𝑥𝑡
are strictly less than unity in modulus.

In [4]: eigvals = np.linalg.eigvals(A)
print(eigvals)

[-0.05249378 0.95249378 1. ]

In [5]: (abs(eigvals) <= 1).all()

Out[5]: True

Now let’s compute $F$ in formulas (12) and (13):

In [6]: # compute the solution, i.e., formula (13)
F = (1 - λ) * G @ np.linalg.inv(np.eye(A.shape[0]) - λ * A)
print("F= ", F)

F= [[0. 0.66889632 0.03010033]]


Now let’s simulate paths of $m_t$ and $p_t$ starting from an initial value $x_0$.

In [7]: # set the initial state
x0 = np.array([1, 1, 0])

T = 100  # length of simulation

m_seq = np.empty(T+1)
p_seq = np.empty(T+1)

# G @ x0 and F @ x0 are length-1 arrays; .item() extracts the scalar
m_seq[0] = (G @ x0).item()
p_seq[0] = (F @ x0).item()

# simulate for T periods
x_old = x0
for t in range(T):
    x = A @ x_old
    m_seq[t+1] = (G @ x).item()
    p_seq[t+1] = (F @ x).item()
    x_old = x

In [8]: plt.figure()
plt.plot(range(T+1), m_seq, label='$m_t$')
plt.plot(range(T+1), p_seq, label='$p_t$')
plt.xlabel('t')
plt.title(f'λ={λ}, α={α}, $ρ_1$={ρ1}, $ρ_2$={ρ2}')
plt.legend()
plt.show()

In the above graph, why is the log of the price level always less than the log of the money supply?
The answer is that
• according to equation (9), $p_t$ is a geometric weighted average of current and future values of $m_t$, and
• it happens that in this example future $m$’s are always less than the current $m$.
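Both bullet points can be checked directly. The weights $(1-\lambda)\lambda^j$ in (9) are positive and sum to one, so $p_t$ is a weighted average of current and future $m$’s, and it lies below $m_t$ whenever those future values are smaller. A minimal sketch that re-creates the simulation above (same $\lambda$, $A$, $G$, and $x_0$) and checks both facts:

```python
import numpy as np

# Same values as in the simulation above
lam = 0.9
A = np.array([[1, 0, 0],
              [0, 0.9, 0.05],
              [0, 1, 0]])
G = np.array([[0, 1, 0]])
F = (1 - lam) * G @ np.linalg.inv(np.eye(3) - lam * A)

# The weights (1 - lam) lam^j sum to one (up to a negligible truncation)
weights = (1 - lam) * lam ** np.arange(1000)
print(weights.sum())

# Re-simulate m_t and p_t from x_0 = (1, 1, 0)
T = 100
x = np.array([1.0, 1.0, 0.0])
m_seq, p_seq = np.empty(T + 1), np.empty(T + 1)
for t in range(T + 1):
    m_seq[t] = (G @ x).item()
    p_seq[t] = (F @ x).item()
    x = A @ x

# m_t falls every period, and p_t stays below m_t
print(np.all(np.diff(m_seq) < 0), np.all(p_seq < m_seq))
```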

47.6 Alternative code

We could also have run the simulation using the quantecon LinearStateSpace code.
The following code block performs the calculation with that code.

In [9]: # construct a LinearStateSpace instance

# stack G and F
G_ext = np.vstack([G, F])

C = np.zeros((A.shape[0], 1))

ss = qe.LinearStateSpace(A, C, G_ext, mu_0=x0)

In [10]: T = 100

# simulate using LinearStateSpace


x, y = ss.simulate(ts_length=T)

# plot
plt.figure()
plt.plot(range(T), y[0,:], label='$m_t$')
plt.plot(range(T), y[1,:], label='$p_t$')
plt.xlabel('t')
plt.title(f'λ={λ}, α={α}, $ρ_1$={ρ1}, $ρ_2$={ρ2}')
plt.legend()
plt.show()

47.6.1 Special case

To simplify our presentation in ways that will let us focus on an important idea, in the above second-order difference equation (14) that governs $m_t$, we now set $\alpha = 0$, $\rho_1 = \rho \in (-1, 1)$, and $\rho_2 = 0$ so that the law of motion for $m_t$ becomes

$$
m_{t+1} = \rho m_t \tag{15}
$$

and the state $x_t$ becomes

$$
x_t = m_t .
$$

Consequently, we can set $G = 1$, $A = \rho$, making our formula (13) for $F$ become

$$
F = (1 - \lambda)(1 - \lambda \rho)^{-1}
$$

and the log price level satisfies

$$
p_t = F m_t .
$$

Please keep these formulas in mind as we investigate an alternative route to and interpretation of the formula for $F$.
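A quick numerical look at this special case: a minimal sketch with hypothetical values $\lambda = 0.5$ and $\rho = 0.9$ (the values used later in this lecture). The helper names `F_scalar` and `F_matrix` are ours. It checks that the $1 \times 1$ version of (13) reproduces $F = (1-\lambda)/(1-\lambda\rho)$, that $F < 1$ whenever $\rho < 1$, and that $F = 1$ when money is constant ($\rho = 1$).

```python
import numpy as np

def F_scalar(lam, rho):
    """F = (1 - lam) / (1 - lam * rho) for the special case G = 1, A = rho."""
    return (1 - lam) / (1 - lam * rho)

def F_matrix(lam, rho):
    """The same object from the general formula (13), using 1 x 1 arrays."""
    A = np.array([[rho]])
    G = np.array([[1.0]])
    return ((1 - lam) * G @ np.linalg.inv(np.eye(1) - lam * A)).item()

lam, rho = 0.5, 0.9
print(F_scalar(lam, rho), F_matrix(lam, rho))

# F < 1 for every rho in (-1, 1)
rho_grid = np.linspace(-0.99, 0.99, 199)
print(np.all(F_scalar(lam, rho_grid) < 1))

# With constant money (rho = 1) the weighted average equals m_t, so F = 1
print(F_scalar(lam, 1.0))
```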

47.7 Another perspective

Above, we imposed stability or non-explosiveness on the solution of the key difference equation (8) in Cagan’s model by solving the unstable root $\lambda^{-1}$ forward.
To shed light on the mechanics involved in imposing stability on a solution of a potentially
unstable system of linear difference equations and to prepare the way for generalizations of
our model in which the money supply is allowed to feed back on the price level itself, we stack
equations (8) and (15) to form the system

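$$
\begin{bmatrix} m_{t+1} \\ p_{t+1} \end{bmatrix} =
\begin{bmatrix} \rho & 0 \\ -(1-\lambda)/\lambda & \lambda^{-1} \end{bmatrix}
\begin{bmatrix} m_t \\ p_t \end{bmatrix} \tag{16}
$$

or

$$
y_{t+1} = H y_t, \quad t \geq 0 \tag{17}
$$

where

$$
H = \begin{bmatrix} \rho & 0 \\ -(1-\lambda)/\lambda & \lambda^{-1} \end{bmatrix}. \tag{18}
$$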

Because the transition matrix $H$ is lower triangular, its eigenvalues are its diagonal entries $\rho \in (0, 1)$ and $\lambda^{-1} > 1$.


Because an eigenvalue of $H$ exceeds unity, if we iterate on equation (17) starting from an arbitrary initial vector $y_0 = \begin{bmatrix} m_0 \\ p_0 \end{bmatrix}$, we discover that in general $y_t$ diverges toward $\pm\infty$ as $t \to +\infty$.
To substantiate this claim, we can use the eigenvector matrix decomposition of $H$ that is available to us because the eigenvalues of $H$ are distinct:

$$
H = Q \Lambda Q^{-1} .
$$

Here $\Lambda$ is a diagonal matrix of eigenvalues of $H$ and $Q$ is a matrix whose columns are eigenvectors associated with the corresponding eigenvalues.
Note that

$$
H^t = Q \Lambda^t Q^{-1}
$$

so that

$$
y_t = Q \Lambda^t Q^{-1} y_0 .
$$

For almost all initial vectors $y_0$, the presence of the eigenvalue $\lambda^{-1} > 1$ causes $y_t$ to diverge as $t \to +\infty$.
To explore this outcome in more detail, we use the transformation

$$
y_t^* = Q^{-1} y_t
$$

that allows us to represent the dynamics in a way that isolates the source of the propensity of paths to diverge:

$$
y_{t+1}^* = \Lambda y_t^* , \quad \text{so that} \quad y_t^* = \Lambda^t y_0^* .
$$

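Staring at this equation indicates that, with the eigenvalues ordered so that $\Lambda = \operatorname{diag}(\rho, \lambda^{-1})$, unless

$$
y_0^* = \begin{bmatrix} y_{1,0}^* \\ 0 \end{bmatrix}, \tag{19}
$$

the path of $y_t^*$ and therefore the path of $y_t = Q y_t^*$ will diverge in absolute value as $t \to +\infty$. (We say that the paths explode.)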
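The divergence can be seen numerically. A minimal sketch, assuming hypothetical values $\lambda = 0.5$, $\rho = 0.9$ and an arbitrary initial vector $y_0 = (1, 0.3)'$: it iterates $y_{t+1} = H y_t$ directly, confirms that $Q^{-1} y_t$ coincides with $\Lambda^t Q^{-1} y_0$, and shows the explosion. In this case the first row of $H$ is $(\rho, 0)$, so $m_t = \rho^t m_0$ is unaffected and the explosion shows up in $p_t$.

```python
import numpy as np

lam, rho = 0.5, 0.9                  # hypothetical values for illustration
H = np.array([[rho, 0.0],
              [-(1 - lam) / lam, 1 / lam]])

eigvals, Q = np.linalg.eig(H)        # columns of Q are eigenvectors of H
Q_inv = np.linalg.inv(Q)
Lam = np.diag(eigvals)

y0 = np.array([1.0, 0.3])            # arbitrary (m_0, p_0)
T = 20
y = y0.copy()
for _ in range(T):
    y = H @ y                        # iterate y_{t+1} = H y_t

# Transformed state: Q^{-1} y_T should equal Lam^T Q^{-1} y_0
y_star_direct = Q_inv @ y
y_star_formula = np.linalg.matrix_power(Lam, T) @ (Q_inv @ y0)

print(eigvals)                       # rho and 1/lam, in some order
print(y_star_direct, y_star_formula)
print(y)                             # m_T = rho^T m_0 stays small; p_T is large in absolute value
```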
Equation (19) also leads us to conclude that there is a unique setting for the initial vector $y_0$ for which $y_t$ does not diverge.
The required setting of $y_0$ must evidently have the property that

$$
Q^{-1} y_0 = y_0^* = \begin{bmatrix} y_{1,0}^* \\ 0 \end{bmatrix}.
$$

But note that since $y_0 = \begin{bmatrix} m_0 \\ p_0 \end{bmatrix}$ and $m_0$ is given to us as an initial condition, it has to be $p_0$ that does all the adjusting to satisfy this equation.
Sometimes this situation is described by saying that while 𝑚0 is truly a state variable, 𝑝0 is a
jump variable that is free to adjust at 𝑡 = 0 in order to satisfy the equation.
Thus, in a nutshell the unique value of the vector 𝑦0 for which the paths of 𝑦𝑡 do not diverge
must have second component 𝑝0 that verifies equality (19) by setting the second component of
𝑦0∗ equal to zero.
The component $p_0$ of the initial vector $y_0 = \begin{bmatrix} m_0 \\ p_0 \end{bmatrix}$ must evidently satisfy

$$
Q^{\{2\}} y_0 = 0
$$

where $Q^{\{2\}}$ denotes the second row of $Q^{-1}$, a restriction that is equivalent to

$$
Q^{21} m_0 + Q^{22} p_0 = 0 \tag{20}
$$

where $Q^{ij}$ denotes the $(i, j)$ component of $Q^{-1}$.

Solving this equation for $p_0$ we find

$$
p_0 = -(Q^{22})^{-1} Q^{21} m_0 . \tag{21}
$$

This is the unique stabilizing value of $p_0$ as a function of $m_0$.
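A minimal sketch of (21) in action, assuming hypothetical values $\lambda = 0.5$, $\rho = 0.9$, $m_0 = 1$ (the helper `simulate` is ours). Because `np.linalg.eig` does not promise any ordering, the sketch reorders the eigenpairs so that the first column of $Q$ belongs to $\rho$. It then checks that (21) delivers $p_0 = (1-\lambda)(1-\lambda\rho)^{-1} m_0$, and that nudging $p_0$ by $10^{-3}$ produces an exploding path while the stabilizing $p_0$ does not.

```python
import numpy as np

lam, rho, m0 = 0.5, 0.9, 1.0          # hypothetical values for illustration
H = np.array([[rho, 0.0],
              [-(1 - lam) / lam, 1 / lam]])

eigvals, Q = np.linalg.eig(H)
order = np.argsort(np.abs(eigvals))   # stable eigenvalue first
eigvals, Q = eigvals[order], Q[:, order]
Q_inv = np.linalg.inv(Q)

# Formula (21): p_0 = -(Q^{22})^{-1} Q^{21} m_0, where Q^{ij} come from Q^{-1}
p0 = -Q_inv[1, 0] / Q_inv[1, 1] * m0

def simulate(p_init, T=40):
    """Iterate y_{t+1} = H y_t from y_0 = (m0, p_init); return the (T+1) x 2 path."""
    path = np.empty((T + 1, 2))
    path[0] = m0, p_init
    for t in range(T):
        path[t + 1] = H @ path[t]
    return path

stable_path = simulate(p0)
perturbed_path = simulate(p0 + 1e-3)

print(p0, (1 - lam) / (1 - lam * rho) * m0)   # the two numbers agree
print(np.abs(stable_path).max(), np.abs(perturbed_path[-1, 1]))
```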

47.7.1 Refining the formula

We can get an even more convenient formula for $p_0$ that is cast in terms of components of $Q$ instead of components of $Q^{-1}$.

To get this formula, first note that because $(Q^{21}\ Q^{22})$ is the second row of the inverse of $Q$ and because $Q^{-1} Q = I$, it follows that

$$
\begin{bmatrix} Q^{21} & Q^{22} \end{bmatrix} \begin{bmatrix} Q_{11} \\ Q_{21} \end{bmatrix} = 0
$$

which implies that

$$
Q^{21} Q_{11} + Q^{22} Q_{21} = 0 .
$$

Therefore,

$$
-(Q^{22})^{-1} Q^{21} = Q_{21} Q_{11}^{-1} .
$$

So we can write

$$
p_0 = Q_{21} Q_{11}^{-1} m_0 . \tag{22}
$$

It can be verified that this formula replicates itself over time so that

$$
p_t = Q_{21} Q_{11}^{-1} m_t . \tag{23}
$$

To implement formula (23), we want to compute $Q_1$, the eigenvector of $H$ associated with the stable eigenvalue $\rho$ of $H$.
By hand it can be verified that the eigenvector associated with the stable eigenvalue $\rho$ is proportional to

$$
Q_1 = \begin{bmatrix} 1 - \lambda \rho \\ 1 - \lambda \end{bmatrix}.
$$

Notice that if we set $A = \rho$ and $G = 1$ in our earlier formula for $p_t$ we get

$$
p_t = (1 - \lambda) G (I - \lambda A)^{-1} m_t = (1 - \lambda)(1 - \lambda \rho)^{-1} m_t ,
$$

a formula that is equivalent to

$$
p_t = Q_{21} Q_{11}^{-1} m_t ,
$$

where

$$
Q_1 = \begin{bmatrix} Q_{11} \\ Q_{21} \end{bmatrix}.
$$
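The hand calculation and the equivalence can be confirmed numerically. A minimal sketch with hypothetical values $\lambda = 0.5$, $\rho = 0.9$: it verifies $H Q_1 = \rho Q_1$ for $Q_1 = (1-\lambda\rho,\ 1-\lambda)'$, and that the ratio $Q_{21}/Q_{11}$ matches $(1-\lambda)(1-\lambda\rho)^{-1}$ whether it is computed from this $Q_1$ or from the unit-length eigenvector returned by `np.linalg.eig`.

```python
import numpy as np

lam, rho = 0.5, 0.9     # hypothetical values for illustration
H = np.array([[rho, 0.0],
              [-(1 - lam) / lam, 1 / lam]])

Q1 = np.array([1 - lam * rho, 1 - lam])   # candidate stable eigenvector
print(H @ Q1, rho * Q1)                    # H Q1 versus rho Q1

# The ratio Q_21 / Q_11 does not depend on how the eigenvector is scaled
ratio = Q1[1] / Q1[0]
print(ratio, (1 - lam) / (1 - lam * rho))

# numpy normalizes eigenvectors to unit length, but the ratio is the same
eigvals, Q = np.linalg.eig(H)
i = np.argmin(np.abs(eigvals - rho))
print(Q[1, i] / Q[0, i])
```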

47.7.2 Some remarks about feedback

We have expressed (16) in what superficially appears to be a form in which $y_{t+1}$ feeds back on $y_t$, even though what we actually want to represent is that the component $p_t$ feeds forward on $p_{t+1}$, and through it, on future $m_{t+j}$, $j = 0, 1, 2, \ldots$.

A tell-tale sign that we should look beyond its superficial “feedback” form is that $\lambda^{-1} > 1$, so that the matrix $H$ in (16) is unstable:
• it has one eigenvalue $\rho$ that is less than one in modulus that does not imperil stability, but …
• it has a second eigenvalue $\lambda^{-1}$ that exceeds one in modulus and that makes $H$ an unstable matrix
We’ll keep these observations in mind as we turn now to a case in which the log money sup-
ply actually does feed back on the log of the price level.

47.8 Log money supply feeds back on log price level

The same pattern of eigenvalues splitting around unity, with one being below unity and another greater than unity, sometimes continues to prevail when there is feedback from the log price level to the log money supply.
Let the feedback rule be

$$
m_{t+1} = \rho m_t + \delta p_t \tag{24}
$$

where $\rho \in (0, 1)$ as before and where we shall now allow $\delta \neq 0$.


However, $\delta$ cannot be too large if things are to fit together as we wish to deliver a stable system for some initial value $p_0$ that we want to determine uniquely.
The forward-looking equation (8) continues to describe equality between the demand and
supply of money.
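We assume that equations (8) and (24) govern $y_t \equiv \begin{bmatrix} m_t \\ p_t \end{bmatrix}$ for $t \geq 0$.
The transition matrix $H$ in the law of motion

$$
y_{t+1} = H y_t
$$

now becomes

$$
H = \begin{bmatrix} \rho & \delta \\ -(1-\lambda)/\lambda & \lambda^{-1} \end{bmatrix}.
$$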

We take $m_0$ as a given initial condition and as before seek an initial value $p_0$ that stabilizes the system in the sense that $y_t$ converges as $t \to +\infty$.
Our approach is identical to that followed above and is based on an eigenvalue decomposition in which (we cross our fingers) one eigenvalue exceeds unity and the other is less than unity in absolute value.
When $\delta \neq 0$ as we now assume, the eigenvalues of $H$ are no longer $\rho \in (0, 1)$ and $\lambda^{-1} > 1$.
We’ll just calculate them and apply the same algorithm that we used above.
That algorithm remains valid so long as the eigenvalues split around unity as before.
Again we assume that $m_0$ is an initial condition, but that $p_0$ is not given and is to be solved for.

Let’s write and execute some Python code that will let us explore how outcomes depend on 𝛿.

In [11]: def construct_H(ρ, λ, δ):
    "construct matrix H given parameters."

    H = np.empty((2, 2))
    H[0, :] = ρ, δ
    H[1, :] = - (1 - λ) / λ, 1 / λ

    return H

def H_eigvals(ρ=.9, λ=.5, δ=0):
    "compute the eigenvalues of matrix H given parameters."

    # construct H matrix
    H = construct_H(ρ, λ, δ)

    # compute eigenvalues
    eigvals = np.linalg.eigvals(H)

    return eigvals

In [12]: H_eigvals()

Out[12]: array([2. , 0.9])

Notice that a negative δ leaves one eigenvalue of $H$ below unity and the other above it, even for a value as large in absolute value as δ = −1.5.

In [13]: # small negative δ
H_eigvals(δ=-0.05)

Out[13]: array([0.8562829, 2.0437171])

In [14]: # large negative δ
H_eigvals(δ=-1.5)

Out[14]: array([0.10742784, 2.79257216])

A sufficiently small positive δ also causes no problem.

In [15]: # sufficiently small positive δ


H_eigvals(δ=0.05)

Out[15]: array([0.94750622, 1.95249378])

But a large enough positive δ makes both eigenvalues of $H$ strictly greater than unity in modulus.
For example,

In [16]: H_eigvals(δ=0.2)

Out[16]: array([1.12984379, 1.77015621])
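Where exactly does the split break down? A short calculation gives $\det(H - I) = \frac{1-\lambda}{\lambda}(\rho - 1 + \delta)$, so an eigenvalue of $H$ equals one exactly when $\delta = 1 - \rho$, whatever $\lambda \in (0,1)$; similarly $\det(H + I) = \frac{1+\lambda}{\lambda}(1+\rho) + \frac{1-\lambda}{\lambda}\delta$ vanishes at $\delta = -(1+\rho)(1+\lambda)/(1-\lambda)$, where an eigenvalue crosses $-1$. A minimal sketch (the helper `n_stable` is ours) that counts eigenvalues inside the unit circle on either side of both boundaries for the defaults $\rho = 0.9$, $\lambda = 0.5$ of `H_eigvals`:

```python
import numpy as np

def n_stable(rho, lam, delta):
    """Number of eigenvalues of H with modulus strictly below one."""
    H = np.array([[rho, delta],
                  [-(1 - lam) / lam, 1 / lam]])
    return int(np.sum(np.abs(np.linalg.eigvals(H)) < 1))

rho, lam = 0.9, 0.5
upper = 1 - rho                              # an eigenvalue crosses +1 here
lower = -(1 + rho) * (1 + lam) / (1 - lam)   # an eigenvalue crosses -1 here
print(upper, lower)

for delta in (lower - 0.1, lower + 0.1, upper - 0.01, upper + 0.01):
    print(round(delta, 3), n_stable(rho, lam, delta))
```

With these defaults the split holds for δ between roughly −5.7 and 0.1, consistent with the outputs above for δ = −1.5, 0.05, and 0.2.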



We want to study systems in which one eigenvalue exceeds unity in modulus while the other is less than unity in modulus, so we avoid values of δ that are too large.

In [17]: def magic_p0(m0, ρ=.9, λ=.5, δ=0):
    """
    Use the magic formula (22) to compute the level of p0
    that makes the system stable.
    """

    H = construct_H(ρ, λ, δ)
    eigvals, Q = np.linalg.eig(H)

    # find the index of the eigenvalue that is smaller in modulus
    ind = 0 if abs(eigvals[0]) < abs(eigvals[1]) else 1

    # verify that this eigenvalue is less than unity in modulus
    if abs(eigvals[ind]) > 1:

        print("both eigenvalues exceed unity in modulus")

        return None

    p0 = Q[1, ind] / Q[0, ind] * m0

    return p0

Let’s plot how the solution 𝑝0 changes as 𝑚0 changes for different settings of 𝛿.

In [18]: m_range = np.arange(0.1, 2., 0.1)

for δ in [-0.05, 0, 0.05]:
    plt.plot(m_range, [magic_p0(m0, δ=δ) for m0 in m_range], label=f"δ={δ}")
plt.legend()

plt.xlabel("$m_0$")
plt.ylabel("$p_0$")
plt.show()

To look at things from a different angle, we can fix the initial value 𝑚0 and see how 𝑝0
changes as 𝛿 changes.

In [19]: m0 = 1

δ_range = np.linspace(-0.05, 0.05, 100)

plt.plot(δ_range, [magic_p0(m0, δ=δ) for δ in δ_range])
plt.xlabel(r'$\delta$')
plt.ylabel('$p_0$')
plt.title(f'$m_0$={m0}')
plt.show()

Notice that when δ is large enough, both eigenvalues exceed unity in modulus, causing a stabilizing value of $p_0$ not to exist.

In [20]: magic_p0(1, δ=0.2)

both eigenvalues exceed unity in modulus
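When the split does hold, the $p_0$ delivered by `magic_p0` should start the system on a non-explosive path. A minimal sketch for δ = 0.05 (the case used in the next section) that re-creates $H$, applies the same formula $p_0 = Q_{21} Q_{11}^{-1} m_0$, and iterates $y_{t+1} = H y_t$ for 20 periods. The horizon is kept short on purpose: rounding error in $p_0$ loads on the unstable eigenvalue (about 1.95) and eventually dominates any finite-precision path.

```python
import numpy as np

rho, lam, delta, m0 = 0.9, 0.5, 0.05, 1.0
H = np.array([[rho, delta],
              [-(1 - lam) / lam, 1 / lam]])

eigvals, Q = np.linalg.eig(H)
ind = np.argmin(np.abs(eigvals))        # index of the stable eigenvalue
p0 = Q[1, ind] / Q[0, ind] * m0         # same formula as magic_p0

T = 20                                  # short horizon, see the note above
path = np.empty((T + 1, 2))
path[0] = m0, p0
for t in range(T):
    path[t + 1] = H @ path[t]

# Along the stabilized path both components shrink and p_t / m_t stays at p0 / m0
print(p0)
print(path[-1], path[-1, 1] / path[-1, 0])
```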

47.9 Big 𝑃 , little 𝑝 interpretation

It is helpful to view our solutions with feedback from the price level or inflation to money or the rate of money creation in terms of the Big $K$, little $k$ idea discussed in Rational Expectations Models.
This will help us sort out what is taken as given by the decision makers who use the difference equation (9) to determine $p_t$ as a function of their forecasts of future values of $m_t$.

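Let’s write the stabilizing solution that we have computed using the eigenvector decomposition of $H$ as $P_t = F^* m_t$, where

$$
F^* = Q_{21} Q_{11}^{-1} .
$$

Then from $P_{t+1} = F^* m_{t+1}$ and $m_{t+1} = \rho m_t + \delta P_t$ we can deduce the recursion $P_{t+1} = F^* \rho m_t + F^* \delta P_t$ and create the stacked system

$$
\begin{bmatrix} m_{t+1} \\ P_{t+1} \end{bmatrix} =
\begin{bmatrix} \rho & \delta \\ F^* \rho & F^* \delta \end{bmatrix}
\begin{bmatrix} m_t \\ P_t \end{bmatrix}
$$

or

$$
x_{t+1} = A x_t
$$

where $x_t = \begin{bmatrix} m_t \\ P_t \end{bmatrix}$.
Then apply formula (13) for $F$ to deduce that

$$
p_t = F \begin{bmatrix} m_t \\ P_t \end{bmatrix} = F \begin{bmatrix} m_t \\ F^* m_t \end{bmatrix}
$$

which implies that

$$
p_t = \begin{bmatrix} F_1 & F_2 \end{bmatrix} \begin{bmatrix} m_t \\ F^* m_t \end{bmatrix} = F_1 m_t + F_2 F^* m_t
$$

so that we expect to have

$$
F^* = F_1 + F_2 F^* .
$$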

We verify this equality in the next block of Python code that implements the following computations.

1. For the system with $\delta \neq 0$ so that there is feedback, we compute the stabilizing solution for $p_t$ in the form $p_t = F^* m_t$ where $F^* = Q_{21} Q_{11}^{-1}$ as above.

2. Recalling the system (11), (12), and (13) above, we define $x_t = \begin{bmatrix} m_t \\ P_t \end{bmatrix}$ and notice that it is Big $P_t$ and not little $p_t$ here. Then we form $A$ and $G$ as $A = \begin{bmatrix} \rho & \delta \\ F^* \rho & F^* \delta \end{bmatrix}$ and $G = \begin{bmatrix} 1 & 0 \end{bmatrix}$ and we compute $\begin{bmatrix} F_1 & F_2 \end{bmatrix} \equiv F$ from equation (13) above.

3. We compute $F_1 + F_2 F^*$ and compare it with $F^*$ and verify equality.

In [21]: # set parameters
ρ = .9
λ = .5
δ = .05

In [22]: # solve for F_star


H = construct_H(ρ, λ, δ)
eigvals, Q = np.linalg.eig(H)

ind = 0 if eigvals[0] < eigvals[1] else 1


F_star = Q[1, ind] / Q[0, ind]
F_star

Out[22]: 0.9501243788791095

In [23]: # solve for F_check
A = np.empty((2, 2))
A[0, :] = ρ, δ
A[1, :] = F_star * A[0, :]

G = np.array([1, 0])

F_check = (1 - λ) * G @ np.linalg.inv(np.eye(2) - λ * A)
F_check

Out[23]: array([0.92755597, 0.02375311])

Compare 𝐹 ∗ with 𝐹1 + 𝐹2 𝐹 ∗

In [24]: F_check[0] + F_check[1] * F_star, F_star

Out[24]: (0.9501243788791097, 0.9501243788791095)

47.10 Fun with Sympy code

This section is a small gift for readers who have made it this far.
It puts Sympy to work on our model.
Specifically, we use Sympy to compute some of the key objects comprising the eigenvector decomposition of $H$.
$H$ with nonzero $\delta$.

In [25]: λ, δ, ρ = symbols('λ, δ, ρ')

In [26]: H1 = Matrix([[ρ, δ], [- (1 - λ) / λ, λ ** -1]])

In [27]: H1

Out[27]:
$$
\begin{bmatrix} \rho & \delta \\ \frac{\lambda - 1}{\lambda} & \frac{1}{\lambda} \end{bmatrix}
$$

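In [28]: H1.eigenvals()

Out[28]:
$$
\left\{ \frac{\lambda\rho + 1}{2\lambda} - \frac{\sqrt{4\delta\lambda^2 - 4\delta\lambda + \lambda^2\rho^2 - 2\lambda\rho + 1}}{2\lambda} : 1, \;
\frac{\lambda\rho + 1}{2\lambda} + \frac{\sqrt{4\delta\lambda^2 - 4\delta\lambda + \lambda^2\rho^2 - 2\lambda\rho + 1}}{2\lambda} : 1 \right\}
$$

In [29]: H1.eigenvects()

Out[29]:
$$
\left[ \left( \frac{\lambda\rho + 1}{2\lambda} - \frac{\sqrt{4\delta\lambda^2 - 4\delta\lambda + \lambda^2\rho^2 - 2\lambda\rho + 1}}{2\lambda}, \; 1, \;
\left[ \begin{bmatrix} -\frac{2\delta\lambda}{\lambda\rho + \sqrt{4\delta\lambda^2 - 4\delta\lambda + \lambda^2\rho^2 - 2\lambda\rho + 1} - 1} \\ 1 \end{bmatrix} \right] \right), \;
\left( \frac{\lambda\rho + 1}{2\lambda} + \ldots \right) \ldots \right]
$$

$H$ with $\delta$ being zero.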

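In [30]: H2 = Matrix([[ρ, 0], [- (1 - λ) / λ, λ ** -1]])

In [31]: H2

Out[31]:
$$
\begin{bmatrix} \rho & 0 \\ \frac{\lambda - 1}{\lambda} & \frac{1}{\lambda} \end{bmatrix}
$$

In [32]: H2.eigenvals()

Out[32]:
$$
\left\{ \frac{1}{\lambda} : 1, \; \rho : 1 \right\}
$$

In [33]: H2.eigenvects()

Out[33]:
$$
\left[ \left( \frac{1}{\lambda}, \; 1, \; \left[ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right] \right), \;
\left( \rho, \; 1, \; \left[ \begin{bmatrix} \frac{\lambda\rho - 1}{\lambda - 1} \\ 1 \end{bmatrix} \right] \right) \right]
$$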

Below we induce sympy to do the following fun things for us analytically:

1. We compute the matrix $Q$ whose first column is the eigenvector associated with $\rho$ and whose second column is the eigenvector associated with $\lambda^{-1}$.

2. We use sympy to compute the inverse $Q^{-1}$ of $Q$ (both in symbols).

3. We use sympy to compute $Q_{21} Q_{11}^{-1}$ (in symbols).

4. Where $Q^{ij}$ denotes the $(i, j)$ component of $Q^{-1}$, we use sympy to compute $-(Q^{22})^{-1} Q^{21}$ (again in symbols).

In [34]: # construct Q
vec = []
for i, (eigval, _, eigvec) in enumerate(H2.eigenvects()):

    vec.append(eigvec[0])

    if eigval == ρ:
        ind = i

Q = vec[ind].col_insert(1, vec[1-ind])

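In [35]: Q

Out[35]:
$$
\begin{bmatrix} \frac{\lambda\rho - 1}{\lambda - 1} & 0 \\ 1 & 1 \end{bmatrix}
$$

$Q^{-1}$

In [36]: Q_inv = Q ** (-1)
Q_inv

Out[36]:
$$
\begin{bmatrix} \frac{\lambda - 1}{\lambda\rho - 1} & 0 \\ -\frac{\lambda - 1}{\lambda\rho - 1} & 1 \end{bmatrix}
$$

$Q_{21} Q_{11}^{-1}$

In [37]: Q[1, 0] / Q[0, 0]

Out[37]:
$$
\frac{\lambda - 1}{\lambda\rho - 1}
$$

$-(Q^{22})^{-1} Q^{21}$

In [38]: - Q_inv[1, 0] / Q_inv[1, 1]

Out[38]:
$$
\frac{\lambda - 1}{\lambda\rho - 1}
$$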
Chapter 48

Markov Perfect Equilibrium

48.1 Contents

• Overview 48.2
• Background 48.3
• Linear Markov Perfect Equilibria 48.4
• Application 48.5
• Exercises 48.6
• Solutions 48.7
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

48.2 Overview

This lecture describes the concept of Markov perfect equilibrium.


Markov perfect equilibrium is a key notion for analyzing economic problems involving dy-
namic strategic interaction, and a cornerstone of applied game theory.
In this lecture, we teach Markov perfect equilibrium by example.
We will focus on settings with
• two players
• quadratic payoff functions
• linear transition rules for the state
Other references include chapter 7 of [72].
Let’s start with some standard imports:

In [2]: import numpy as np


import quantecon as qe
import matplotlib.pyplot as plt
%matplotlib inline


48.3 Background

Markov perfect equilibrium is a refinement of the concept of Nash equilibrium.


It is used to study settings where multiple decision-makers interact non-cooperatively over
time, each pursuing its own objective.
The agents in the model face a common state vector, the time path of which is influenced by
– and influences – their decisions.
In particular, the transition law for the state that confronts each agent is affected by decision
rules of other agents.
Individual payoff maximization requires that each agent solve a dynamic programming prob-
lem that includes this transition law.
Markov perfect equilibrium prevails when no agent wishes to revise its policy, taking as given
the policies of all other agents.
Well known examples include
• Choice of price, output, location or capacity for firms in an industry (e.g., [34], [92],
[29]).
• Rate of extraction from a shared natural resource, such as a fishery (e.g., [71], [107]).
Let’s examine a model of the first type.

48.3.1 Example: A Duopoly Model

Two firms are the only producers of a good, the demand for which is governed by a linear in-
verse demand function

$$
p = a_0 - a_1 (q_1 + q_2) \tag{1}
$$

Here 𝑝 = 𝑝𝑡 is the price of the good, 𝑞𝑖 = 𝑞𝑖𝑡 is the output of firm 𝑖 = 1, 2 at time 𝑡 and
𝑎0 > 0, 𝑎1 > 0.
In (1) and what follows,
• the time subscript is suppressed when possible to simplify notation
• 𝑥̂ denotes a next period value of variable 𝑥
Each firm recognizes that its output affects total output and therefore the market price.
The one-period payoff function of firm 𝑖 is price times quantity minus adjustment costs:

$$
\pi_i = p q_i - \gamma (\hat q_i - q_i)^2, \quad \gamma > 0, \tag{2}
$$

Substituting the inverse demand curve (1) into (2) lets us express the one-period payoff as

$$
\pi_i(q_i, q_{-i}, \hat q_i) = a_0 q_i - a_1 q_i^2 - a_1 q_i q_{-i} - \gamma (\hat q_i - q_i)^2 , \tag{3}
$$

where 𝑞−𝑖 denotes the output of the firm other than 𝑖.



The objective of the firm is to maximize $\sum_{t=0}^{\infty} \beta^t \pi_{it}$.
Firm 𝑖 chooses a decision rule that sets next period quantity 𝑞𝑖̂ as a function 𝑓𝑖 of the current
state (𝑞𝑖 , 𝑞−𝑖 ).
An essential aspect of a Markov perfect equilibrium is that each firm takes the decision rule
of the other firm as known and given.
Given 𝑓−𝑖 , the Bellman equation of firm 𝑖 is

$$
v_i(q_i, q_{-i}) = \max_{\hat q_i} \left\{ \pi_i(q_i, q_{-i}, \hat q_i) + \beta v_i(\hat q_i, f_{-i}(q_{-i}, q_i)) \right\} \tag{4}
$$

Definition A Markov perfect equilibrium of the duopoly model is a pair of value functions
(𝑣1 , 𝑣2 ) and a pair of policy functions (𝑓1 , 𝑓2 ) such that, for each 𝑖 ∈ {1, 2} and each possible
state,
• The value function 𝑣𝑖 satisfies Bellman equation (4).
• The maximizer on the right side of (4) equals 𝑓𝑖 (𝑞𝑖 , 𝑞−𝑖 ).
The adjective “Markov” denotes that the equilibrium decision rules depend only on the current values of the state variables, not other parts of their histories.
“Perfect” means complete, in the sense that the equilibrium is constructed by backward induction and hence builds in optimizing behavior for each firm at all possible future states.

• These include many states that will not be reached when we iterate forward on the pair of equilibrium strategies $f_i$ starting from a given initial state.

48.3.2 Computation

One strategy for computing a Markov perfect equilibrium is iterating to convergence on pairs
of Bellman equations and decision rules.
In particular, let $v_i^j, f_i^j$ be the value function and policy function for firm $i$ at the $j$-th iteration.
Imagine constructing the iterates

$$
v_i^{j+1}(q_i, q_{-i}) = \max_{\hat q_i} \left\{ \pi_i(q_i, q_{-i}, \hat q_i) + \beta v_i^j(\hat q_i, f_{-i}(q_{-i}, q_i)) \right\} \tag{5}
$$

These iterations can be challenging to implement computationally.


However, they simplify for the case in which one-period payoff functions are quadratic and
transition laws are linear — which takes us to our next topic.

48.4 Linear Markov Perfect Equilibria

As we saw in the duopoly example, the study of Markov perfect equilibria in games with two
players leads us to an interrelated pair of Bellman equations.
In linear-quadratic dynamic games, these “stacked Bellman equations” become “stacked Riccati equations” with a tractable mathematical structure.
We’ll lay out that structure in a general setup and then apply it to some simple problems.

48.4.1 Coupled Linear Regulator Problems

We consider a general linear-quadratic regulator game with two players.


For convenience, we’ll start with a finite horizon formulation, where 𝑡0 is the initial date and
𝑡1 is the common terminal date.
Player 𝑖 takes {𝑢−𝑖𝑡 } as given and minimizes

$$
\sum_{t=t_0}^{t_1-1} \beta^{t-t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} \right\} \tag{6}
$$

while the state evolves according to

$$
x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t} \tag{7}
$$

Here
• 𝑥𝑡 is an 𝑛 × 1 state vector and 𝑢𝑖𝑡 is a 𝑘𝑖 × 1 vector of controls for player 𝑖
• 𝑅𝑖 is 𝑛 × 𝑛
• 𝑆𝑖 is 𝑘−𝑖 × 𝑘−𝑖
• 𝑄𝑖 is 𝑘𝑖 × 𝑘𝑖
• 𝑊𝑖 is 𝑛 × 𝑘𝑖
• 𝑀𝑖 is 𝑘−𝑖 × 𝑘𝑖
• 𝐴 is 𝑛 × 𝑛
• 𝐵𝑖 is 𝑛 × 𝑘𝑖

48.4.2 Computing Equilibrium

We formulate a linear Markov perfect equilibrium as follows.


Player 𝑖 employs linear decision rules 𝑢𝑖𝑡 = −𝐹𝑖𝑡 𝑥𝑡 , where 𝐹𝑖𝑡 is a 𝑘𝑖 × 𝑛 matrix.
A Markov perfect equilibrium is a pair of sequences {𝐹1𝑡 , 𝐹2𝑡 } over 𝑡 = 𝑡0 , … , 𝑡1 − 1 such that
• {𝐹1𝑡 } solves player 1’s problem, taking {𝐹2𝑡 } as given, and
• {𝐹2𝑡 } solves player 2’s problem, taking {𝐹1𝑡 } as given
If we take 𝑢2𝑡 = −𝐹2𝑡 𝑥𝑡 and substitute it into (6) and (7), then player 1’s problem becomes
minimization of

$$
\sum_{t=t_0}^{t_1-1} \beta^{t-t_0} \left\{ x_t' \Pi_{1t} x_t + u_{1t}' Q_1 u_{1t} + 2 u_{1t}' \Gamma_{1t} x_t \right\} \tag{8}
$$

subject to

$$
x_{t+1} = \Lambda_{1t} x_t + B_1 u_{1t}, \tag{9}
$$

where
• $\Lambda_{it} := A - B_{-i} F_{-it}$
• $\Pi_{it} := R_i + F_{-it}' S_i F_{-it}$
• $\Gamma_{it} := W_i' - M_i' F_{-it}$
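These definitions can be verified by brute force. A minimal sketch with randomly drawn matrices of hypothetical sizes $n = 3$, $k_i = 2$, $k_{-i} = 1$ (the suffix `_j` stands for player $-i$): substituting $u_{-it} = -F_{-it} x_t$ into the period term of (6) should reproduce $x' \Pi x + u' Q u + 2 u' \Gamma x$, and into (7) should reproduce $\Lambda x + B_i u$. The sketch forms $\Pi$ and $\Gamma$ with the transposes $F_{-it}'$ and $W_i'$ that the matrix dimensions require.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k_i, k_j = 3, 2, 1                 # hypothetical sizes

# Random problem matrices (R, Q, S symmetric, as in a quadratic form)
R = rng.standard_normal((n, n))
R = R + R.T
Q = rng.standard_normal((k_i, k_i))
Q = Q + Q.T
S = rng.standard_normal((k_j, k_j))
S = S + S.T
W = rng.standard_normal((n, k_i))
M = rng.standard_normal((k_j, k_i))
A = rng.standard_normal((n, n))
B_i = rng.standard_normal((n, k_i))
B_j = rng.standard_normal((n, k_j))
F_j = rng.standard_normal((k_j, n))   # the other player's decision rule

x = rng.standard_normal(n)
u = rng.standard_normal(k_i)
u_j = -F_j @ x                        # u_{-it} = -F_{-it} x_t

# Period term of (6) and the transition (7)
loss_6 = (x @ R @ x + u @ Q @ u + u_j @ S @ u_j
          + 2 * x @ W @ u + 2 * u_j @ M @ u)
next_7 = A @ x + B_i @ u + B_j @ u_j

# Reduced objects, period term of (8) and transition (9)
Lam = A - B_j @ F_j
Pi = R + F_j.T @ S @ F_j
Gamma = W.T - M.T @ F_j
loss_8 = x @ Pi @ x + u @ Q @ u + 2 * u @ Gamma @ x
next_9 = Lam @ x + B_i @ u

print(np.isclose(loss_6, loss_8), np.allclose(next_7, next_9))
```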

This is an LQ dynamic programming problem that can be solved by working backwards.


Decision rules that solve this problem are

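$$
F_{1t} = (Q_1 + \beta B_1' P_{1t+1} B_1)^{-1} (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t}) \tag{10}
$$

where $P_{1t}$ solves the matrix Riccati difference equation

$$
P_{1t} = \Pi_{1t} - (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t})' (Q_1 + \beta B_1' P_{1t+1} B_1)^{-1} (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t}) + \beta \Lambda_{1t}' P_{1t+1} \Lambda_{1t} \tag{11}
$$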

Similarly, decision rules that solve player 2’s problem are

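$$
F_{2t} = (Q_2 + \beta B_2' P_{2t+1} B_2)^{-1} (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t}) \tag{12}
$$

where $P_{2t}$ solves

$$
P_{2t} = \Pi_{2t} - (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t})' (Q_2 + \beta B_2' P_{2t+1} B_2)^{-1} (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t}) + \beta \Lambda_{2t}' P_{2t+1} \Lambda_{2t} \tag{13}
$$

Here, in all cases $t = t_0, \ldots, t_1 - 1$ and the terminal conditions are $P_{it_1} = 0$.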


The solution procedure is to use equations (10), (11), (12), and (13), and “work backwards”
from time 𝑡1 − 1.
Since we’re working backward, 𝑃1𝑡+1 and 𝑃2𝑡+1 are taken as given at each stage.
Moreover, since
• some terms on the right-hand side of (10) contain 𝐹2𝑡
• some terms on the right-hand side of (12) contain 𝐹1𝑡
we need to solve these 𝑘1 + 𝑘2 equations simultaneously.

Key Insight

A key insight is that equations (10) and (12) are linear in 𝐹1𝑡 and 𝐹2𝑡 .
After these equations have been solved, we can take 𝐹𝑖𝑡 and solve for 𝑃𝑖𝑡 in (11) and (13).

Infinite Horizon

We often want to compute the solutions of such games for infinite horizons, in the hope that
the decision rules 𝐹𝑖𝑡 settle down to be time-invariant as 𝑡1 → +∞.
In practice, we usually fix $t_1$ and compute the equilibrium of an infinite horizon game by driving $t_0 \to -\infty$.
This is the approach we adopt in the next section.
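A minimal sketch of this backward iteration, written from equations (10) through (13) rather than taken from QuantEcon.py (the function name `mpe_backward` is hypothetical). It starts from $P_1 = P_2 = 0$, solves (10) and (12) jointly as one linear system in $(F_1, F_2)$ at each step, updates $P_1$ and $P_2$ with (11) and (13), and stops when the decision rules and value matrices stop changing. Applied to the duopoly parameters of the application below, it should reproduce the `nnash` policies reported there, up to the convergence tolerance.

```python
import numpy as np

def mpe_backward(A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2,
                 beta=1.0, tol=1e-9, max_iter=10_000):
    """Iterate (10)-(13) backward from P1 = P2 = 0 until F1, F2, P1, P2 settle."""
    n, k1, k2 = A.shape[0], B1.shape[1], B2.shape[1]
    P1, P2 = np.zeros((n, n)), np.zeros((n, n))
    F1, F2 = np.zeros((k1, n)), np.zeros((k2, n))
    for _ in range(max_iter):
        # (10) and (12) rearranged: linear in (F1, F2) given P1, P2
        lhs = np.block([
            [Q1 + beta * B1.T @ P1 @ B1, beta * B1.T @ P1 @ B2 + M1.T],
            [beta * B2.T @ P2 @ B1 + M2.T, Q2 + beta * B2.T @ P2 @ B2]])
        rhs = np.vstack([beta * B1.T @ P1 @ A + W1.T,
                         beta * B2.T @ P2 @ A + W2.T])
        F = np.linalg.solve(lhs, rhs)
        F1_new, F2_new = F[:k1], F[k1:]

        # Riccati updates (11) and (13) evaluated at the new rules
        Lam1, Lam2 = A - B2 @ F2_new, A - B1 @ F1_new
        Pi1 = R1 + F2_new.T @ S1 @ F2_new
        Pi2 = R2 + F1_new.T @ S2 @ F1_new
        Gam1 = W1.T - M1.T @ F2_new
        Gam2 = W2.T - M2.T @ F1_new
        P1_new = (Pi1 - (beta * B1.T @ P1 @ Lam1 + Gam1).T @ F1_new
                  + beta * Lam1.T @ P1 @ Lam1)
        P2_new = (Pi2 - (beta * B2.T @ P2 @ Lam2 + Gam2).T @ F2_new
                  + beta * Lam2.T @ P2 @ Lam2)

        diff = max(np.max(np.abs(F1_new - F1)), np.max(np.abs(F2_new - F2)),
                   np.max(np.abs(P1_new - P1)), np.max(np.abs(P2_new - P2)))
        F1, F2, P1, P2 = F1_new, F2_new, P1_new, P2_new
        if diff < tol:
            break
    return F1, F2, P1, P2

# Duopoly parameters from the application below (S, W, M are zero there)
a0, a1, beta, gamma = 10.0, 2.0, 0.96, 12.0
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
R1 = np.array([[0., -a0 / 2, 0.], [-a0 / 2, a1, a1 / 2], [0., a1 / 2, 0.]])
R2 = np.array([[0., 0., -a0 / 2], [0., 0., a1 / 2], [-a0 / 2, a1 / 2, a1]])
Q1 = Q2 = np.array([[gamma]])
S1 = S2 = M1 = M2 = np.zeros((1, 1))
W1 = W2 = np.zeros((3, 1))

F1, F2, P1, P2 = mpe_backward(A, B1, B2, R1, R2, Q1, Q2,
                              S1, S2, W1, W2, M1, M2, beta=beta)
print(F1)
print(F2)
```

Stopping on both $F$ and $P$ matters: on the first pass $P = 0$ gives $F = 0$, so a test on $F$ alone would stop immediately.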

48.4.3 Implementation

We use the function nnash from QuantEcon.py that computes a Markov perfect equilibrium
of the infinite horizon linear-quadratic dynamic game in the manner described above.

48.5 Application

Let’s use these procedures to treat some applications, starting with the duopoly model.

48.5.1 A Duopoly Model

To map the duopoly model into coupled linear-quadratic dynamic programming problems,
define the state and controls as

$$
x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix} \quad \text{and} \quad u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2
$$

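If we write

$$
x_t' R_i x_t + u_{it}' Q_i u_{it}
$$

where $Q_1 = Q_2 = \gamma$,

$$
R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix}
\quad \text{and} \quad
R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}
$$

then we recover the one-period payoffs in expression (3) up to sign: because the LQ formulation minimizes losses, $x_t' R_i x_t + u_{it}' Q_i u_{it} = -\pi_i$.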
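A minimal numerical check of this sign convention for firm 1, at a few randomly drawn hypothetical points $(q_1, q_2, \hat q_1)$: $x_t' R_1 x_t + \gamma u_{1t}^2$ should equal minus the payoff (3).

```python
import numpy as np

a0, a1, gamma = 10.0, 2.0, 12.0
R1 = np.array([[0., -a0 / 2, 0.],
               [-a0 / 2, a1, a1 / 2],
               [0., a1 / 2, 0.]])

rng = np.random.default_rng(1)
checks = []
for _ in range(5):
    q1, q2, q1_next = rng.uniform(0, 5, size=3)   # hypothetical test points
    x = np.array([1.0, q1, q2])
    u = q1_next - q1
    lq_loss = x @ R1 @ x + gamma * u**2
    payoff_3 = a0 * q1 - a1 * q1**2 - a1 * q1 * q2 - gamma * (q1_next - q1)**2
    checks.append(bool(np.isclose(lq_loss, -payoff_3)))

print(checks)
```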


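The law of motion for the state $x_t$ is $x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t}$ where

$$
A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$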

The optimal decision rule of firm $i$ will take the form $u_{it} = -F_i x_t$, inducing the following closed-loop system for the evolution of $x$ in the Markov perfect equilibrium:

$$
x_{t+1} = (A - B_1 F_1 - B_2 F_2) x_t \tag{14}
$$

48.5.2 Parameters and Solution

Consider the previously presented duopoly model with parameter values of:
• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12

From these, we compute the infinite horizon MPE using the following code:

In [3]: import numpy as np
import quantecon as qe

# Parameters
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0

# In LQ form
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])

R1 = [[      0.,   -a0 / 2,        0.],
      [-a0 / 2.,        a1,   a1 / 2.],
      [       0,   a1 / 2.,        0.]]

R2 = [[     0.,         0.,   -a0 / 2],
      [     0.,         0.,   a1 / 2.],
      [-a0 / 2,    a1 / 2.,        a1]]

Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

# Solve using QE's nnash function
F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
                          Q2, S1, S2, W1, W2, M1,
                          M2, beta=β)

# Display policies
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
print("\n")

Computed policies for firm 1 and firm 2:

F1 = [[-0.66846615 0.29512482 0.07584666]]
F2 = [[-0.66846615 0.07584666 0.29512482]]

Running the code produces the output shown above.


One way to see that $F_1$ is indeed optimal for firm 1 taking $F_2$ as given is to use QuantEcon.py’s LQ class.
In particular, let’s take F2 as computed above, plug it into (8) and (9) to get firm 1’s problem and solve it using LQ.
We hope that the resulting policy will agree with F1 as computed above.

In [4]: Λ1 = A - B2 @ F2
lq1 = qe.LQ(Q1, R1, Λ1, B1, beta=β)
P1_ih, F1_ih, d = lq1.stationary_values()
F1_ih

Out[4]: array([[-0.66846613, 0.29512482, 0.07584666]])

This is close enough for rock and roll, as they say in the trade.
Indeed, np.allclose agrees with our assessment:

In [5]: np.allclose(F1, F1_ih)

Out[5]: True

48.5.3 Dynamics

Let’s now investigate the dynamics of price and output in this simple duopoly model under
the MPE policies.
Given our optimal policies F1 and F2, the state evolves according to (14).
The following program
• imports F1 and F2 from the previous program along with all parameters.
• computes the evolution of $x_t$ using (14).
• extracts and plots industry output $q_t = q_{1t} + q_{2t}$ and price $p_t = a_0 - a_1 q_t$.

In [6]: AF = A - B1 @ F1 - B2 @ F2
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
    x[:, t+1] = AF @ x[:, t]
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2       # Total output, MPE
p = a0 - a1 * q   # Price, MPE

fig, ax = plt.subplots(figsize=(9, 5.8))
ax.plot(q, 'b-', lw=2, alpha=0.75, label='total output')
ax.plot(p, 'g-', lw=2, alpha=0.75, label='price')
ax.set_title('Output and prices, duopoly MPE')
ax.legend(frameon=False)
plt.show()

Note that the initial condition has been set to 𝑞10 = 𝑞20 = 1.0.
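Where does this path settle? A minimal sketch that takes the decision rules printed above (rounded to eight decimals), forms the closed-loop matrix in (14), and solves for the steady state. The rows for $q_{1t}$ and $q_{2t}$ read $q_{t+1} = c + M q_t$, so the steady state solves $(I - M) q = c$.

```python
import numpy as np

# Decision rules as printed by nnash above (rounded to eight decimals)
F1 = np.array([[-0.66846615, 0.29512482, 0.07584666]])
F2 = np.array([[-0.66846615, 0.07584666, 0.29512482]])
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])

AF = A - B1 @ F1 - B2 @ F2        # closed-loop matrix in (14)
print(np.sort(np.linalg.eigvals(AF).real))

# Output rows: q_{t+1} = c + M q_t, where c multiplies the constant state
M = AF[1:, 1:]
c = AF[1:, 0]
q_ss = np.linalg.solve(np.eye(2) - M, c)
print(q_ss, q_ss.sum())
```

Both firms converge to the same output, about 1.80 each, and the two non-unit eigenvalues of the closed-loop matrix (about 0.63 and 0.78) govern how quickly.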
To gain some perspective we can compare this to what happens in the monopoly case.
The first panel in the next figure compares output of the monopolist and industry output under the MPE, as a function of time.

The second panel shows analogous curves for price.

Here parameters are the same as above for both the MPE and monopoly solutions.
The monopolist initial condition is 𝑞0 = 2.0 to mimic the industry initial condition 𝑞10 =
𝑞20 = 1.0 in the MPE case.
As expected, output is higher and prices are lower under duopoly than monopoly.

48.6 Exercises

48.6.1 Exercise 1

Replicate the pair of figures showing the comparison of output and prices for the monopolist
and duopoly under MPE.
Parameters are as in duopoly_mpe.py and you can use that code to compute MPE policies
under duopoly.
The optimal policy in the monopolist case can be computed using QuantEcon.py’s LQ class.

48.6.2 Exercise 2

In this exercise, we consider a slightly more sophisticated duopoly problem.


It takes the form of an infinite horizon linear-quadratic game proposed by Judd [63].
Two firms set prices and quantities of two goods interrelated through their demand curves.
Relevant variables are defined as follows:
• 𝐼𝑖𝑡 = inventories of firm 𝑖 at beginning of 𝑡
• 𝑞𝑖𝑡 = production of firm 𝑖 during period 𝑡
• 𝑝𝑖𝑡 = price charged by firm 𝑖 during period 𝑡
• 𝑆𝑖𝑡 = sales made by firm 𝑖 during period 𝑡
• 𝐸𝑖𝑡 = costs of production of firm 𝑖 during period 𝑡
• 𝐶𝑖𝑡 = costs of carrying inventories for firm 𝑖 during 𝑡
The firms’ cost functions are
• $C_{it} = c_{i1} + c_{i2} I_{it} + 0.5 c_{i3} I_{it}^2$
• $E_{it} = e_{i1} + e_{i2} q_{it} + 0.5 e_{i3} q_{it}^2$, where $e_{ij}, c_{ij}$ are positive scalars
Inventories obey the laws of motion

𝐼𝑖,𝑡+1 = (1 − 𝛿)𝐼𝑖𝑡 + 𝑞𝑖𝑡 − 𝑆𝑖𝑡

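Demand is governed by the linear schedule

$$
S_t = D p_t + b
$$

where

• $S_t = \begin{bmatrix} S_{1t} & S_{2t} \end{bmatrix}'$ and $p_t = \begin{bmatrix} p_{1t} & p_{2t} \end{bmatrix}'$
• $D$ is a $2 \times 2$ negative definite matrix and
• $b$ is a vector of constants
Firm $i$ maximizes the undiscounted average

$$
\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} \left( p_{it} S_{it} - E_{it} - C_{it} \right)
$$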

We can convert this to a linear-quadratic problem by taking

$$
u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix} \quad \text{and} \quad x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}
$$

Decision rules for price and quantity take the form 𝑢𝑖𝑡 = −𝐹𝑖 𝑥𝑡 .
The Markov perfect equilibrium of Judd’s model can be computed by filling in the matrices
appropriately.
The exercise is to calculate these matrices and compute the following figures.
The first figure shows the dynamics of inventories for each firm when the parameters are

In [7]: δ = 0.02
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])
c1 = c2 = np.array([1, -2, 1])
e1 = e2 = np.array([10, 10, 3])

Inventories trend to a common steady state.


If we increase the depreciation rate to 𝛿 = 0.05, then we expect steady state inventories to
fall.
This is indeed the case, as the next figure shows

48.7 Solutions

48.7.1 Exercise 1

First, let’s compute the duopoly MPE under the stated parameters

In [8]: # == Parameters == #
        a0 = 10.0
        a1 = 2.0
        β = 0.96
        γ = 12.0

        # == In LQ form == #
        A = np.eye(3)
        B1 = np.array([[0.], [1.], [0.]])
        B2 = np.array([[0.], [0.], [1.]])
        R1 = [[      0.,  -a0 / 2,      0.],
              [-a0 / 2.,       a1, a1 / 2.],
              [       0,  a1 / 2.,      0.]]

        R2 = [[     0.,      0., -a0 / 2],
              [     0.,      0., a1 / 2.],
              [-a0 / 2, a1 / 2.,      a1]]

        Q1 = Q2 = γ
        S1 = S2 = W1 = W2 = M1 = M2 = 0.0

        # == Solve using QE's nnash function == #
        F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
                                  Q2, S1, S2, W1, W2, M1,
                                  M2, beta=β)

Now we evaluate the time path of industry output and prices given initial condition 𝑞10 =
𝑞20 = 1.

In [9]: AF = A - B1 @ F1 - B2 @ F2
        n = 20
        x = np.empty((3, n))
        x[:, 0] = 1, 1, 1
        for t in range(n-1):
            x[:, t+1] = AF @ x[:, t]
        q1 = x[1, :]
        q2 = x[2, :]
        q = q1 + q2        # Total output, MPE
        p = a0 - a1 * q    # Price, MPE

Next, let’s have a look at the monopoly solution.


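For the state and control, we take

$$x_t = q_t - \bar q \quad \text{and} \quad u_t = q_{t+1} - q_t$$

where $\bar q = a_0 / (2 a_1)$.

To convert to an LQ problem we set

$$R = a_1 \quad \text{and} \quad Q = \gamma$$

in the payoff function $x_t' R x_t + u_t' Q u_t$ and

$$A = B = 1$$

in the law of motion $x_{t+1} = A x_t + B u_t$.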


We solve for the optimal policy 𝑢𝑡 = −𝐹 𝑥𝑡 and track the resulting dynamics of {𝑞𝑡 }, starting
at 𝑞0 = 2.0.

In [10]: R = a1
         Q = γ
         A = B = 1
         lq_alt = qe.LQ(Q, R, A, B, beta=β)
         P, F, d = lq_alt.stationary_values()
         q_bar = a0 / (2.0 * a1)
         qm = np.empty(n)
         qm[0] = 2
         x0 = qm[0] - q_bar
         x = x0
         for i in range(1, n):
             x = A * x - B * F * x
             # F is a 1x1 array, so squeeze x down to a scalar before converting
             qm[i] = float(np.squeeze(x)) + q_bar
         pm = a0 - a1 * qm
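For intuition about what `stationary_values` returns in this scalar problem, the sketch below iterates on the discounted Riccati equation by hand. It assumes QuantEcon's LQ convention (minimize $\sum_t \beta^t (R x_t^2 + Q u_t^2)$ subject to $x_{t+1} = A x_t + B u_t$, with no cross-product term). The function name `scalar_lq` is hypothetical; it is a check on the logic, not a replacement for `qe.LQ`.

```python
import numpy as np

def scalar_lq(R, Q, A, B, beta, tol=1e-12, max_iter=10_000):
    """
    Iterate on the scalar discounted Riccati equation
        P = R + beta*A^2*P - (beta*B*P*A)^2 / (Q + beta*B^2*P)
    and return (P, F), where the optimal policy is u_t = -F x_t.
    """
    P = 0.0
    for _ in range(max_iter):
        P_new = R + beta * A**2 * P - (beta * B * P * A)**2 / (Q + beta * B**2 * P)
        if abs(P_new - P) < tol:
            P = P_new
            break
        P = P_new
    F = beta * B * P * A / (Q + beta * B**2 * P)
    return P, F

# Monopolist problem from above: R = a1, Q = γ, A = B = 1
a0, a1, β, γ = 10.0, 2.0, 0.96, 12.0
P, F = scalar_lq(R=a1, Q=γ, A=1.0, B=1.0, beta=β)

# Closed loop: x_{t+1} = (1 - F) x_t, so q_t approaches q_bar geometrically
q_bar = a0 / (2.0 * a1)
print(P, F, 1 - F)
```

Since $0 < F < 1$ here, the gap $q_t - \bar q$ shrinks by the factor $1 - F$ each period, so monopolist output moves gradually toward $\bar q$ rather than jumping there.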

Let’s have a look at the different time paths

In [11]: fig, axes = plt.subplots(2, 1, figsize=(9, 9))

         ax = axes[0]
         ax.plot(qm, 'b-', lw=2, alpha=0.75, label='monopolist output')
         ax.plot(q, 'g-', lw=2, alpha=0.75, label='MPE total output')
         ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
         ax.legend(loc='upper left', frameon=0)

         ax = axes[1]
         ax.plot(pm, 'b-', lw=2, alpha=0.75, label='monopolist price')
         ax.plot(p, 'g-', lw=2, alpha=0.75, label='MPE price')
         ax.set(ylabel="price", xlabel="time")
         ax.legend(loc='upper right', frameon=0)
         plt.show()

48.7.2 Exercise 2

We treat the case 𝛿 = 0.02

In [12]: δ = 0.02
         D = np.array([[-1, 0.5], [0.5, -1]])
         b = np.array([25, 25])
         c1 = c2 = np.array([1, -2, 1])
         e1 = e2 = np.array([10, 10, 3])

         δ_1 = 1 - δ

Recalling that the control and state are

$$u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix} \quad \text{and} \quad x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}$$

we set up the matrices as follows:

In [13]: # == Create matrices needed to compute the Nash feedback equilibrium == #

         A = np.array([[δ_1,   0, -δ_1 * b[0]],
                       [  0, δ_1, -δ_1 * b[1]],
                       [  0,   0,           1]])

         B1 = δ_1 * np.array([[1, -D[0, 0]],
                              [0, -D[1, 0]],
                              [0,        0]])
         B2 = δ_1 * np.array([[0, -D[0, 1]],
                              [1, -D[1, 1]],
                              [0,        0]])

         R1 = -np.array([[0.5 * c1[2], 0, 0.5 * c1[1]],
                         [          0, 0,           0],
                         [0.5 * c1[1], 0,       c1[0]]])
         R2 = -np.array([[0,           0,           0],
                         [0, 0.5 * c2[2], 0.5 * c2[1]],
                         [0, 0.5 * c2[1],       c2[0]]])

         Q1 = np.array([[-0.5 * e1[2], 0], [0, D[0, 0]]])
         Q2 = np.array([[-0.5 * e2[2], 0], [0, D[1, 1]]])

         S1 = np.zeros((2, 2))
         S2 = np.copy(S1)

         W1 = np.array([[           0,         0],
                        [           0,         0],
                        [-0.5 * e1[1], b[0] / 2.]])
         W2 = np.array([[           0,         0],
                        [           0,         0],
                        [-0.5 * e2[1], b[1] / 2.]])

         M1 = np.array([[0, 0], [0, D[0, 1] / 2.]])
         M2 = np.copy(M1)

We can now compute the equilibrium using qe.nnash

In [14]: F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1,
                                   R2, Q1, Q2, S1,
                                   S2, W1, W2, M1, M2)

         print("\nFirm 1's feedback rule:\n")
         print(F1)

         print("\nFirm 2's feedback rule:\n")
         print(F2)

Firm 1's feedback rule:

[[ 2.43666582e-01  2.72360627e-02 -6.82788293e+00]
 [ 3.92370734e-01  1.39696451e-01 -3.77341073e+01]]

Firm 2's feedback rule:

[[ 2.72360627e-02  2.43666582e-01 -6.82788293e+00]
 [ 1.39696451e-01  3.92370734e-01 -3.77341073e+01]]
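To read these feedback rules, recall that firm $i$ sets $u_{it} = (p_{it}, q_{it})' = -F_i x_t$. The sketch below copies the printed (rounded) values of `F1` and evaluates firm 1's price and quantity at the state $x = (2, 0, 1)'$, the initial condition used in the inventory simulation; because the inputs are rounded, the numbers are approximate.

```python
import numpy as np

# Firm 1's feedback rule as printed above (rounded)
F1 = np.array([[2.43666582e-01, 2.72360627e-02, -6.82788293e+00],
               [3.92370734e-01, 1.39696451e-01, -3.77341073e+01]])

# State x_t = (I_1t, I_2t, 1)': firm 1 holds 2 units, firm 2 holds none
x0 = np.array([2.0, 0.0, 1.0])

# u_1t = -F1 x_t gives (price, quantity) for firm 1
p1, q1 = -F1 @ x0
print(f"price = {p1:.4f}, quantity = {q1:.4f}")
```

The coefficients on $I_{1t}$ in $-F_1$ are negative in both rows, so in this equilibrium a larger own inventory lowers both firm 1's price and its production.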

Now let’s look at the dynamics of inventories, and reproduce the graph corresponding to 𝛿 = 0.02

In [15]: AF = A - B1 @ F1 - B2 @ F2
         n = 25
         x = np.empty((3, n))
         x[:, 0] = 2, 0, 1
         for t in range(n-1):
             x[:, t+1] = AF @ x[:, t]
         I1 = x[0, :]
         I2 = x[1, :]
         fig, ax = plt.subplots(figsize=(9, 5))
         ax.plot(I1, 'b-', lw=2, alpha=0.75, label='inventories, firm 1')
         ax.plot(I2, 'g-', lw=2, alpha=0.75, label='inventories, firm 2')
         ax.set_title(rf'$\delta = {δ}$')
         ax.legend()
         plt.show()
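The common steady state that the inventory paths approach can also be computed directly. Because the last state component is the constant 1, a steady state $\bar x = (\bar I_1, \bar I_2, 1)'$ of $x_{t+1} = A_F x_t$ solves $(I - A_F^{11}) \bar I = a_F$, where $A_F^{11}$ is the upper-left $2 \times 2$ block of $A_F$ and $a_F$ holds the first two entries of its last column. The sketch below rebuilds `A`, `B1`, `B2` as in the solution code and plugs in the printed (rounded) `F1`, `F2`, so it runs without QuantEcon; results are approximate for that reason.

```python
import numpy as np

δ = 0.02
δ_1 = 1 - δ
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])

# Law of motion matrices, as in the solution code
A = np.array([[δ_1, 0, -δ_1 * b[0]],
              [0, δ_1, -δ_1 * b[1]],
              [0, 0, 1]])
B1 = δ_1 * np.array([[1, -D[0, 0]], [0, -D[1, 0]], [0, 0]])
B2 = δ_1 * np.array([[0, -D[0, 1]], [1, -D[1, 1]], [0, 0]])

# Printed (rounded) equilibrium feedback rules
F1 = np.array([[2.43666582e-01, 2.72360627e-02, -6.82788293e+00],
               [3.92370734e-01, 1.39696451e-01, -3.77341073e+01]])
F2 = np.array([[2.72360627e-02, 2.43666582e-01, -6.82788293e+00],
               [1.39696451e-01, 3.92370734e-01, -3.77341073e+01]])

AF = A - B1 @ F1 - B2 @ F2

# Stability: eigenvalues of the inventory block should lie inside the unit circle
print(np.abs(np.linalg.eigvals(AF[:2, :2])))

# Steady-state inventories solve (I - AF11) I_bar = aF
I_bar = np.linalg.solve(np.eye(2) - AF[:2, :2], AF[:2, 2])
print(I_bar)
```

With these inputs both firms' steady-state inventories come out at roughly 1.25 units, consistent with the observation that inventories trend to a common steady state.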
Chapter 49

Uncertainty Traps

49.1 Contents

• Overview 49.2
• The Model 49.3
• Implementation 49.4
• Results 49.5
• Exercises 49.6
• Solutions 49.7

49.2 Overview

In this lecture, we study a simplified version of an uncertainty traps model of Fajgelbaum, Schaal and Taschereau-Dumouchel [37].
The model features self-reinforcing uncertainty that has large effects on economic activity.
In the model,
• Fundamentals vary stochastically and are not fully observable.
• At any moment there are both active and inactive entrepreneurs; only active entrepreneurs produce.
• Agents – active and inactive entrepreneurs – have beliefs about the fundamentals expressed as probability distributions.
• Greater uncertainty means greater dispersion of these distributions.
• Entrepreneurs are risk-averse and hence less inclined to be active when uncertainty is
high.
• The output of active entrepreneurs is observable, supplying a noisy signal that helps
everyone inside the model infer fundamentals.
• Entrepreneurs update their beliefs about fundamentals using Bayes’ Law, implemented
via Kalman filtering.
Uncertainty traps emerge because:
• High uncertainty discourages entrepreneurs from becoming active.
• A low level of participation – i.e., a smaller number of active entrepreneurs – diminishes
the flow of information about fundamentals.
• Less information translates to higher uncertainty, further discouraging entrepreneurs from choosing to be active, and so on.


Uncertainty traps stem from a positive externality: high levels of aggregate economic activity generate valuable information.
Let’s start with some standard imports:

In [1]: import matplotlib.pyplot as plt
        %matplotlib inline
        import numpy as np
        import itertools

49.3 The Model

The original model described in [37] has many interesting moving parts.
Here we examine a simplified version that nonetheless captures many of the key ideas.

49.3.1 Fundamentals

The evolution of the fundamental process {𝜃𝑡 } is given by

𝜃𝑡+1 = 𝜌𝜃𝑡 + 𝜎𝜃 𝑤𝑡+1

where
• 𝜎𝜃 > 0 and 0 < 𝜌 < 1
• {𝑤𝑡 } is IID and standard normal
The random variable 𝜃𝑡 is not observable at any time.

49.3.2 Output

There is a total of 𝑀̄ risk-averse entrepreneurs.
Output of the 𝑚-th entrepreneur, conditional on being active in the market at time 𝑡, is equal to

$$x_m = \theta + \epsilon_m \quad \text{where} \quad \epsilon_m \sim N(0, \gamma_x^{-1}) \tag{1}$$

Here the time subscript has been dropped to simplify notation.


The inverse of the shock variance, 𝛾𝑥 , is called the shock’s precision.
The higher is the precision, the more informative 𝑥𝑚 is about the fundamental.
Output shocks are independent across time and firms.

49.3.3 Information and Beliefs

All entrepreneurs start with identical beliefs about 𝜃0.
Signals are publicly observable and hence all agents always have identical beliefs.
Dropping time subscripts, beliefs for current 𝜃 are represented by the normal distribution $N(\mu, \gamma^{-1})$.
Here 𝛾 is the precision of beliefs; its inverse is the degree of uncertainty.
These parameters are updated by Kalman filtering.
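Let
• $\mathbb{M} \subset \{1, \ldots, \bar M\}$ denote the set of currently active firms.
• $M := |\mathbb{M}|$ denote the number of currently active firms.
• $X$ be the average output $\frac{1}{M} \sum_{m \in \mathbb{M}} x_m$ of the active firms.
With this notation and primes for next period values, we can write the updating of the mean and precision via

$$\mu' = \rho \, \frac{\gamma \mu + M \gamma_x X}{\gamma + M \gamma_x} \tag{2}$$

$$\gamma' = \left( \frac{\rho^2}{\gamma + M \gamma_x} + \sigma_\theta^2 \right)^{-1} \tag{3}$$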

These are standard Kalman filtering results applied to the current setting.
Exercise 1 provides more details on how (2) and (3) are derived and then asks you to fill in
remaining steps.
The next figure plots the law of motion for the precision in (3) as a 45 degree diagram, with one curve for each 𝑀 ∈ {0, … , 6}.
The other parameter values are 𝜌 = 0.99, 𝛾𝑥 = 0.5, 𝜎𝜃 = 0.5.

Points where the curves hit the 45 degree line are long-run steady states for precision for different values of 𝑀.
Thus, if one of these values for 𝑀 remains fixed, a corresponding steady state is the equilibrium level of precision:
• high values of 𝑀 correspond to greater information about the fundamental, and hence more precision in steady state
• low values of 𝑀 correspond to less information and more uncertainty in steady state
In practice, as we’ll see, the number of active firms fluctuates stochastically.
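The steady states in the figure can also be located numerically. The map in (3) is increasing in $\gamma$, and its slope at a fixed point is $\rho^2 \gamma^2 / (\gamma + M\gamma_x)^2 \le \rho^2 < 1$, so plain fixed-point iteration converges. A minimal sketch using the parameter values quoted above (the function name `steady_state_precision` is hypothetical):

```python
import numpy as np

ρ, γ_x, σ_θ = 0.99, 0.5, 0.5

def steady_state_precision(M, γ=1.0, tol=1e-12, max_iter=100_000):
    """Iterate γ' = (ρ²/(γ + M γ_x) + σ_θ²)^(-1) from γ until it settles."""
    for _ in range(max_iter):
        γ_new = 1 / (ρ**2 / (γ + M * γ_x) + σ_θ**2)
        if abs(γ_new - γ) < tol:
            return γ_new
        γ = γ_new
    return γ

ss = [steady_state_precision(M) for M in range(7)]
for M, g in enumerate(ss):
    print(f"M = {M}: steady-state precision = {g:.4f}")
```

For $M = 0$ the fixed point has the closed form $(1 - \rho^2)/\sigma_\theta^2$, which gives a quick sanity check, and the computed steady states rise with $M$, matching the ordering of the curves described above.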

49.3.4 Participation

Omitting time subscripts once more, entrepreneurs enter the market in the current period if

𝔼[𝑢(𝑥𝑚 − 𝐹𝑚 )] > 𝑐 (4)

Here
• the mathematical expectation of 𝑥𝑚 is based on (1) and beliefs 𝑁 (𝜇, 𝛾 −1 ) for 𝜃
• 𝐹𝑚 is a stochastic but pre-visible fixed cost, independent across time and firms
• 𝑐 is a constant reflecting opportunity costs
The statement that 𝐹𝑚 is pre-visible means that it is realized at the start of the period and treated as a constant in (4).


The utility function has the constant absolute risk aversion form

$$u(x) = \frac{1}{a} \left( 1 - \exp(-a x) \right) \tag{5}$$

where 𝑎 is a positive parameter.
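Combining (4) and (5), entrepreneur 𝑚 participates in the market (or is said to be active) when

$$\frac{1}{a} \left\{ 1 - \mathbb{E}\left[ \exp\left( -a(\theta + \epsilon_m - F_m) \right) \right] \right\} > c$$

Using standard formulas for expectations of lognormal random variables, this is equivalent to the condition

$$\psi(\mu, \gamma, F_m) := \frac{1}{a} \left( 1 - \exp\left( -a\mu + a F_m + \frac{a^2 \left( \frac{1}{\gamma} + \frac{1}{\gamma_x} \right)}{2} \right) \right) - c > 0 \tag{6}$$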
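The step to (6) uses the fact that, under beliefs $\theta \sim N(\mu, \gamma^{-1})$ and an independent $\epsilon_m \sim N(0, \gamma_x^{-1})$, the sum $\theta + \epsilon_m$ is normal with mean $\mu$ and variance $1/\gamma + 1/\gamma_x$, so the expectation is the mean of a lognormal variable. A quick Monte Carlo check follows; the parameter values are illustrative assumptions chosen to keep simulation noise small, not the lecture's defaults.

```python
import numpy as np

rng = np.random.default_rng(1234)

# Illustrative values (assumptions, not the lecture's defaults)
a, μ, γ, γ_x, F = 0.5, 1.0, 4.0, 2.0, 0.3
n = 1_000_000

# Draw θ from beliefs N(μ, 1/γ) and ε from N(0, 1/γ_x)
θ = μ + rng.standard_normal(n) / np.sqrt(γ)
ε = rng.standard_normal(n) / np.sqrt(γ_x)

mc = np.mean(np.exp(-a * (θ + ε - F)))
closed_form = np.exp(-a * μ + a * F + a**2 * (1/γ + 1/γ_x) / 2)
print(mc, closed_form)
```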

49.4 Implementation

We want to simulate this economy.


As a first step, let’s put together a class that bundles
• the parameters, the current value of 𝜃 and the current values of the two belief parameters 𝜇 and 𝛾
• methods to update 𝜃, 𝜇 and 𝛾, as well as to determine the number of active firms and their outputs
The updating methods follow the laws of motion for 𝜃, 𝜇 and 𝛾 given above.
The method to evaluate the number of active firms generates 𝐹1 , … , 𝐹𝑀̄ and tests condition
(6) for each firm.
The __init__ method encodes, as default values, the parameters we’ll use in the simulations below.

In [2]: class UncertaintyTrapEcon:

            def __init__(self,
                         a=1.5,          # Risk aversion
                         γ_x=0.5,        # Production shock precision
                         ρ=0.99,         # Correlation coefficient for θ
                         σ_θ=0.5,        # Standard dev of θ shock
                         num_firms=100,  # Number of firms
                         σ_F=1.5,        # Standard dev of fixed costs
                         c=-420,         # External opportunity cost
                         μ_init=0,       # Initial value for μ
                         γ_init=4,       # Initial value for γ
                         θ_init=0):      # Initial value for θ

                # == Record values == #
                self.a, self.γ_x, self.ρ, self.σ_θ = a, γ_x, ρ, σ_θ
                self.num_firms, self.σ_F, self.c = num_firms, σ_F, c
                self.σ_x = np.sqrt(1/γ_x)

                # == Initialize states == #
                self.γ, self.μ, self.θ = γ_init, μ_init, θ_init

            def ψ(self, F):
                temp1 = -self.a * (self.μ - F)
                temp2 = self.a**2 * (1/self.γ + 1/self.γ_x) / 2
                return (1 / self.a) * (1 - np.exp(temp1 + temp2)) - self.c

            def update_beliefs(self, X, M):
                """
                Update beliefs (μ, γ) based on aggregates X and M.
                """
                # Simplify names
                γ_x, ρ, σ_θ = self.γ_x, self.ρ, self.σ_θ
                # Update μ
                temp1 = ρ * (self.γ * self.μ + M * γ_x * X)
                temp2 = self.γ + M * γ_x
                self.μ = temp1 / temp2
                # Update γ
                self.γ = 1 / (ρ**2 / (self.γ + M * γ_x) + σ_θ**2)

            def update_θ(self, w):
                """
                Update the fundamental state θ given shock w.
                """
                self.θ = self.ρ * self.θ + self.σ_θ * w

            def gen_aggregates(self):
                """
                Generate aggregates based on current beliefs (μ, γ). This
                is a simulation step that depends on the draws for F.
                """
                F_vals = self.σ_F * np.random.randn(self.num_firms)
                M = np.sum(self.ψ(F_vals) > 0)  # Counts number of active firms
                if M > 0:
                    x_vals = self.θ + self.σ_x * np.random.randn(M)
                    X = x_vals.mean()
                else:
                    X = 0
                return X, M

In the results below we use this code to simulate time series for the major variables.

49.5 Results

Let’s look first at the dynamics of 𝜇, which the agents use to track 𝜃.

We see that 𝜇 tracks 𝜃 well when there are sufficient firms in the market.
However, there are times when 𝜇 tracks 𝜃 poorly due to insufficient information.
These are episodes where the uncertainty traps take hold.
During these episodes
• precision is low and uncertainty is high
• few firms are in the market
To get a clearer idea of the dynamics, let’s look at all the main time series at once, for a given set of shocks.

Notice how the traps only take hold after a sequence of bad draws for the fundamental.
Thus, the model gives us a propagation mechanism that maps bad random draws into long
downturns in economic activity.

49.6 Exercises

49.6.1 Exercise 1

Fill in the details behind (2) and (3) based on the following standard result (see, e.g., p. 24 of [112]).

Fact. Let $\mathbf{x} = (x_1, \ldots, x_M)$ be a vector of IID draws from the common distribution $N(\theta, 1/\gamma_x)$ and let $\bar x$ be the sample mean. If $\gamma_x$ is known and the prior for $\theta$ is $N(\mu, 1/\gamma)$, then the posterior distribution of $\theta$ given $\mathbf{x}$ is

$$\pi(\theta \mid \mathbf{x}) = N(\mu_0, 1/\gamma_0)$$

where

$$\mu_0 = \frac{\mu \gamma + M \bar x \gamma_x}{\gamma + M \gamma_x} \quad \text{and} \quad \gamma_0 = \gamma + M \gamma_x$$
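The Fact can be checked numerically without any algebra: multiply the prior density by the likelihood of the draws on a fine grid, normalize, and compare the resulting mean and precision with $\mu_0$ and $\gamma_0$. The sample `x` and the prior parameters below are arbitrary illustrative choices.

```python
import numpy as np

μ, γ, γ_x = 0.5, 2.0, 0.8              # prior mean, prior precision, signal precision
x = np.array([0.3, -0.1, 0.8, 1.2])    # illustrative observations
M, x_bar = len(x), x.mean()

# Closed-form posterior from the Fact
μ0 = (μ * γ + M * x_bar * γ_x) / (γ + M * γ_x)
γ0 = γ + M * γ_x

# Grid-based posterior: log prior + log likelihood, up to an additive constant
θ_grid = np.linspace(-10, 10, 200_001)
log_post = -0.5 * γ * (θ_grid - μ)**2
log_post += -0.5 * γ_x * ((x[:, None] - θ_grid[None, :])**2).sum(axis=0)
w = np.exp(log_post - log_post.max())
w /= w.sum()

post_mean = np.sum(w * θ_grid)
post_var = np.sum(w * (θ_grid - post_mean)**2)
print(μ0, post_mean)
print(γ0, 1 / post_var)
```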

49.6.2 Exercise 2

Modulo randomness, replicate the simulation figures shown above.
• Use the parameter values listed as defaults in the __init__ method of the UncertaintyTrapEcon class.

49.7 Solutions

49.7.1 Exercise 1

This exercise asked you to validate the laws of motion for 𝛾 and 𝜇 given in the lecture, based
on the stated result about Bayesian updating in a scalar Gaussian setting. The stated result
tells us that after observing average output 𝑋 of the 𝑀 firms, our posterior beliefs will be

𝑁 (𝜇0 , 1/𝛾0 )

where

$$\mu_0 = \frac{\mu \gamma + M X \gamma_x}{\gamma + M \gamma_x} \quad \text{and} \quad \gamma_0 = \gamma + M \gamma_x$$

If we take a random variable 𝜃 with this distribution and then evaluate the distribution of 𝜌𝜃 + 𝜎𝜃 𝑤, where 𝑤 is independent and standard normal, we get the expressions for 𝜇′ and 𝛾′ given in the lecture.
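Written out, this final step is a one-line calculation. If $\theta \sim N(\mu_0, 1/\gamma_0)$ and $w \sim N(0, 1)$ is independent of $\theta$, then $\rho\theta + \sigma_\theta w$ is normal with

```latex
\begin{aligned}
\mu' &= \mathbb{E}[\rho\theta + \sigma_\theta w] = \rho \mu_0
      = \rho \, \frac{\gamma \mu + M \gamma_x X}{\gamma + M \gamma_x}, \\
\frac{1}{\gamma'} &= \operatorname{Var}[\rho\theta + \sigma_\theta w]
      = \frac{\rho^2}{\gamma_0} + \sigma_\theta^2
      = \frac{\rho^2}{\gamma + M \gamma_x} + \sigma_\theta^2,
\end{aligned}
```

which are exactly (2) and (3).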

49.7.2 Exercise 2

First, let’s replicate the plot that illustrates the law of motion for precision, which is

$$\gamma_{t+1} = \left( \frac{\rho^2}{\gamma_t + M \gamma_x} + \sigma_\theta^2 \right)^{-1}$$

Here 𝑀 is the number of active firms. The next figure plots 𝛾𝑡+1 against 𝛾𝑡 on a 45 degree diagram for different values of 𝑀.

In [3]: econ = UncertaintyTrapEcon()

        ρ, σ_θ, γ_x = econ.ρ, econ.σ_θ, econ.γ_x    # Simplify names
        γ = np.linspace(1e-10, 3, 200)               # γ grid
        fig, ax = plt.subplots(figsize=(9, 9))
        ax.plot(γ, γ, 'k-')                          # 45 degree line

        for M in range(7):
            γ_next = 1 / (ρ**2 / (γ + M * γ_x) + σ_θ**2)
            label_string = f"$M = {M}$"
            ax.plot(γ, γ_next, lw=2, label=label_string)
        ax.legend(loc='lower right', fontsize=14)
        ax.set_xlabel(r'$\gamma$', fontsize=16)
        ax.set_ylabel(r"$\gamma'$", fontsize=16)
        ax.grid()
        plt.show()

The points where the curves hit the 45 degree line are the long-run steady states corresponding to each 𝑀, if that value of 𝑀 were to remain fixed. As the number of firms falls, so does the long-run steady state of precision.
Next let’s generate time series for beliefs and the aggregates – that is, the number of active firms and average output.

In [4]: sim_length = 2000

        μ_vec = np.empty(sim_length)
        θ_vec = np.empty(sim_length)
        γ_vec = np.empty(sim_length)
        X_vec = np.empty(sim_length)
        M_vec = np.empty(sim_length)

        μ_vec[0] = econ.μ
        γ_vec[0] = econ.γ
        θ_vec[0] = 0

        w_shocks = np.random.randn(sim_length)

        for t in range(sim_length-1):
            X, M = econ.gen_aggregates()
            X_vec[t] = X
            M_vec[t] = M

            econ.update_beliefs(X, M)
            econ.update_θ(w_shocks[t])

            μ_vec[t+1] = econ.μ
            γ_vec[t+1] = econ.γ
            θ_vec[t+1] = econ.θ

        # Record final values of aggregates
        X, M = econ.gen_aggregates()
        X_vec[-1] = X
        M_vec[-1] = M

First, let’s see how well 𝜇 tracks 𝜃 in these simulations

In [5]: fig, ax = plt.subplots(figsize=(9, 6))
        ax.plot(range(sim_length), θ_vec, alpha=0.6, lw=2, label=r"$\theta$")
        ax.plot(range(sim_length), μ_vec, alpha=0.6, lw=2, label=r"$\mu$")
        ax.legend(fontsize=16)
        ax.grid()
        plt.show()

Now let’s plot the whole thing together

In [6]: fig, axes = plt.subplots(4, 1, figsize=(12, 20))
        # Add some spacing
        fig.subplots_adjust(hspace=0.3)

        series = (θ_vec, μ_vec, γ_vec, M_vec)
        names = r'$\theta$', r'$\mu$', r'$\gamma$', r'$M$'

        for ax, vals, name in zip(axes, series, names):
            # Determine suitable y limits
            s_max, s_min = max(vals), min(vals)
            s_range = s_max - s_min
            y_max = s_max + s_range * 0.1
            y_min = s_min - s_range * 0.1
            ax.set_ylim(y_min, y_max)
            # Plot series
            ax.plot(range(sim_length), vals, alpha=0.6, lw=2)
            ax.set_title(f"time series for {name}", fontsize=16)
            ax.grid()

        plt.show()

If you run the code above you’ll get different plots, of course, since the shocks are drawn at random.
Try experimenting with different parameters to see the effects on the time series.
(It would also be interesting to experiment with non-Gaussian distributions for the shocks, but this is a big exercise since it takes us outside the world of the standard Kalman filter.)
Chapter 50

The Aiyagari Model

50.1 Contents

• Overview 50.2
• The Economy 50.3
• Firms 50.4
• Code 50.5
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

50.2 Overview

In this lecture, we describe the structure of a class of models that build on work by Truman
Bewley [13].
We begin by discussing an example of a Bewley model due to Rao Aiyagari.
The model features
• Heterogeneous agents
• A single exogenous vehicle for borrowing and lending
• Limits on amounts individual agents may borrow
The Aiyagari model has been used to investigate many topics, including
• precautionary savings and the effect of liquidity constraints [4]
• risk sharing and asset pricing [55]
• the shape of the wealth distribution [10]
• etc., etc., etc.
Let’s start with some imports:

In [2]: import numpy as np
        import quantecon as qe
        import matplotlib.pyplot as plt
        %matplotlib inline
        from quantecon.markov import DiscreteDP
        from numba import jit


50.2.1 References

The primary reference for this lecture is [4].


A textbook treatment is available in chapter 18 of [72].
A continuous time version of the model by SeHyoun Ahn and Benjamin Moll can be found
here.

50.3 The Economy

50.3.1 Households

Infinitely lived households / consumers face idiosyncratic income shocks.


A unit interval of ex-ante identical households face a common borrowing constraint.
The savings problem faced by a typical household is

$$\max \; \mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t)$$

subject to

$$a_{t+1} + c_t \leq w z_t + (1 + r) a_t, \quad c_t \geq 0, \quad \text{and} \quad a_t \geq -B$$

where
• 𝑐𝑡 is current consumption
• 𝑎𝑡 is assets
• 𝑧𝑡 is an exogenous component of labor income capturing stochastic unemployment risk,
etc.
• 𝑤 is a wage rate
• 𝑟 is a net interest rate
• 𝐵 is the maximum amount that the agent is allowed to borrow
The exogenous process {𝑧𝑡 } follows a finite state Markov chain with given stochastic matrix
𝑃.
The wage and interest rate are fixed over time.
In this simple version of the model, households supply labor inelastically because they do not
value leisure.

50.4 Firms

Firms produce output by hiring capital and labor.
Firms act competitively and face constant returns to scale.
Since returns to scale are constant, the number of firms does not matter.
Hence we can consider a single (but nonetheless competitive) representative firm.
The firm’s output is

$$Y_t = A K_t^{\alpha} N^{1 - \alpha}$$

where
• 𝐴 and 𝛼 are parameters with 𝐴 > 0 and 𝛼 ∈ (0, 1)
• 𝐾𝑡 is aggregate capital
• 𝑁 is total labor supply (which is constant in this simple version of the model)
The firm’s problem is

$$\max_{K, N} \left\{ A K^{\alpha} N^{1 - \alpha} - (r + \delta) K - w N \right\}$$

The parameter 𝛿 is the depreciation rate.
From the first-order condition with respect to capital, the firm’s inverse demand for capital is

$$r = A \alpha \left( \frac{N}{K} \right)^{1 - \alpha} - \delta \tag{1}$$

Using this expression and the firm’s first-order condition for labor, we can pin down the equilibrium wage rate as a function of 𝑟 as

$$w(r) = A (1 - \alpha) \left( \frac{A \alpha}{r + \delta} \right)^{\alpha / (1 - \alpha)} \tag{2}$$
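Equation (2) follows by solving (1) for the capital-labor ratio, $K/N = (A\alpha/(r+\delta))^{1/(1-\alpha)}$, and substituting it into the first-order condition for labor, $w = A(1-\alpha)(K/N)^{\alpha}$. The sketch below checks this numerically with the parameter values used in the lecture's equilibrium code ($A = N = 1$, $\alpha = 0.33$, $\delta = 0.05$); the helper `capital_from_r` is hypothetical.

```python
import numpy as np

A, N, α, δ = 1.0, 1.0, 0.33, 0.05

def rd(K):
    """Inverse demand for capital, equation (1)."""
    return A * α * (N / K)**(1 - α) - δ

def r_to_w(r):
    """Wage as a function of the interest rate, equation (2)."""
    return A * (1 - α) * (A * α / (r + δ))**(α / (1 - α))

def capital_from_r(r):
    """Hypothetical helper: invert (1) to get the capital demanded at rate r."""
    return N * (A * α / (r + δ))**(1 / (1 - α))

r = 0.03
K = capital_from_r(r)
w_direct = A * (1 - α) * (K / N)**α   # marginal product of labor at this K
print(rd(K), r)                        # should agree
print(w_direct, r_to_w(r))             # should agree
```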

50.4.1 Equilibrium

We construct a stationary rational expectations equilibrium (SREE).


In such an equilibrium
• prices induce behavior that generates aggregate quantities consistent with the prices
• aggregate quantities and prices are constant over time
In more detail, an SREE lists a set of prices, savings and production policies such that
• households want to choose the specified savings policies taking the prices as given
• firms maximize profits taking the same prices as given
• the resulting aggregate quantities are consistent with the prices; in particular, the demand for capital equals the supply
• aggregate quantities (defined as cross-sectional averages) are constant
In practice, once parameter values are set, we can check for an SREE by the following steps

1. pick a proposed quantity 𝐾 for aggregate capital

2. determine corresponding prices, with interest rate 𝑟 determined by (1) and a wage rate 𝑤(𝑟) as given in (2)

3. determine the common optimal savings policy of the households given these prices

4. compute aggregate capital as the mean of steady state capital given this savings policy

If this final quantity agrees with 𝐾 then we have an SREE.
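These steps define a fixed-point problem. One common way to solve it (an assumption here; the code below in this lecture instead plots supply against demand) is bisection on the interest rate: at each trial $r$, compare household capital supply with the firm demand implied by (1), and shrink the bracket until the two agree. In the sketch, `supply_stub` is a hypothetical, increasing stand-in for steps 2 to 4, used only so the code runs on its own; in practice it would be replaced by a function such as `prices_to_capital_stock`.

```python
import numpy as np

A, N, α, δ = 1.0, 1.0, 0.33, 0.05

def capital_demand(r):
    """Firm capital demand: invert equation (1)."""
    return N * (A * α / (r + δ))**(1 / (1 - α))

def supply_stub(r):
    """Hypothetical stand-in for household capital supply (increasing in r)."""
    return 2.0 + 150.0 * r

def find_equilibrium_r(supply, lo=0.0, hi=0.04, tol=1e-10, max_iter=200):
    """Bisection on excess demand for capital, demand(r) - supply(r)."""
    def excess(r):
        return capital_demand(r) - supply(r)

    assert excess(lo) > 0 > excess(hi), "bracket must straddle the equilibrium"
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid        # demand exceeds supply, so r must rise
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

r_star = find_equilibrium_r(supply_stub)
print(r_star, capital_demand(r_star), supply_stub(r_star))
```

Bisection only needs excess demand to change sign once on the bracket, which holds here because demand falls and the stand-in supply rises with $r$.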

50.5 Code

Let’s look at how we might compute such an equilibrium in practice.


To solve the household’s dynamic programming problem we’ll use the DiscreteDP class from
QuantEcon.py.
Our first task is the least exciting one: write code that maps parameters for a household
problem into the R and Q matrices needed to generate an instance of DiscreteDP.
Below is a piece of boilerplate code that does just this.
In reading the code, the following information will be helpful
• R needs to be a matrix where R[s, a] is the reward at state s under action a.
• Q needs to be a three-dimensional array where Q[s, a, s'] is the probability of transitioning to state s' when the current state is s and the current action is a.
(A more detailed discussion of DiscreteDP is available in the Discrete State Dynamic Programming lecture in the Advanced Quantitative Economics with Python lecture series.)
Here we take the state to be 𝑠𝑡 ∶= (𝑎𝑡 , 𝑧𝑡 ), where 𝑎𝑡 is assets and 𝑧𝑡 is the shock.
The action is the choice of next period asset level 𝑎𝑡+1 .
We use Numba to speed up the loops so we can update the matrices efficiently when the parameters change.
The class also includes a default set of parameters that we’ll adopt unless otherwise specified.

In [3]: class Household:
            """
            This class takes the parameters that define a household asset accumulation
            problem and computes the corresponding reward and transition matrices R
            and Q required to generate an instance of DiscreteDP, and thereby solve
            for the optimal policy.

            Comments on indexing: We need to enumerate the state space S as a sequence
            S = {0, ..., n-1}. To this end, (a_i, z_i) index pairs are mapped to s_i
            indices according to the rule

                s_i = a_i * z_size + z_i

            To invert this map, use

                a_i = s_i // z_size  (integer division)
                z_i = s_i % z_size

            """

            def __init__(self,
                         r=0.01,                      # Interest rate
                         w=1.0,                       # Wages
                         β=0.96,                      # Discount factor
                         a_min=1e-10,
                         Π=[[0.9, 0.1], [0.1, 0.9]],  # Markov chain
                         z_vals=[0.1, 1.0],           # Exogenous states
                         a_max=18,
                         a_size=200):

                # Store values, set up grids over a and z
                self.r, self.w, self.β = r, w, β
                self.a_min, self.a_max, self.a_size = a_min, a_max, a_size

                self.Π = np.asarray(Π)
                self.z_vals = np.asarray(z_vals)
                self.z_size = len(z_vals)

                self.a_vals = np.linspace(a_min, a_max, a_size)
                self.n = a_size * self.z_size

                # Build the array Q
                self.Q = np.zeros((self.n, a_size, self.n))
                self.build_Q()

                # Build the array R
                self.R = np.empty((self.n, a_size))
                self.build_R()

            def set_prices(self, r, w):
                """
                Use this method to reset prices. Calling the method will trigger a
                re-build of R.
                """
                self.r, self.w = r, w
                self.build_R()

            def build_Q(self):
                populate_Q(self.Q, self.a_size, self.z_size, self.Π)

            def build_R(self):
                self.R.fill(-np.inf)
                populate_R(self.R,
                           self.a_size,
                           self.z_size,
                           self.a_vals,
                           self.z_vals,
                           self.r,
                           self.w)

        # Do the hard work using JIT-ed functions

        @jit(nopython=True)
        def populate_R(R, a_size, z_size, a_vals, z_vals, r, w):
            n = a_size * z_size
            for s_i in range(n):
                a_i = s_i // z_size
                z_i = s_i % z_size
                a = a_vals[a_i]
                z = z_vals[z_i]
                for new_a_i in range(a_size):
                    a_new = a_vals[new_a_i]
                    c = w * z + (1 + r) * a - a_new
                    if c > 0:
                        R[s_i, new_a_i] = np.log(c)  # Utility

        @jit(nopython=True)
        def populate_Q(Q, a_size, z_size, Π):
            n = a_size * z_size
            for s_i in range(n):
                z_i = s_i % z_size
                for a_i in range(a_size):
                    for next_z_i in range(z_size):
                        Q[s_i, a_i, a_i*z_size + next_z_i] = Π[z_i, next_z_i]

        @jit(nopython=True)
        def asset_marginal(s_probs, a_size, z_size):
            a_probs = np.zeros(a_size)
            for a_i in range(a_size):
                for z_i in range(z_size):
                    a_probs[a_i] += s_probs[a_i*z_size + z_i]
            return a_probs
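The flat indexing rule described in the docstring can be checked in isolation. The sketch below (plain Python and NumPy, no Numba, with small illustrative grid sizes) confirms that `s_i = a_i * z_size + z_i` and its inverse are consistent, and that each (state, action) slice of a transition array built the same way as in `populate_Q` is a probability distribution over next states.

```python
import numpy as np

a_size, z_size = 4, 2                      # small illustrative grids
Π = np.array([[0.9, 0.1], [0.1, 0.9]])     # same Markov chain as the default
n = a_size * z_size

# Round trip between (a_i, z_i) pairs and flat state indices s_i
for a_i in range(a_size):
    for z_i in range(z_size):
        s_i = a_i * z_size + z_i
        assert (s_i // z_size, s_i % z_size) == (a_i, z_i)

# Build Q the same way populate_Q does, but without JIT compilation
Q = np.zeros((n, a_size, n))
for s_i in range(n):
    z_i = s_i % z_size
    for a_i in range(a_size):          # action: next-period asset index a_i
        for next_z_i in range(z_size):
            Q[s_i, a_i, a_i * z_size + next_z_i] = Π[z_i, next_z_i]

# Every (state, action) pair should induce a probability distribution
print(np.allclose(Q.sum(axis=2), 1.0))
```

Because the chosen action fixes next-period assets exactly, each slice `Q[s_i, a_i, :]` puts all its mass on the `z_size` states that share asset index `a_i`, with weights given by the row of `Π` for the current shock.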

As a first example of what we can do, let’s compute and plot an optimal accumulation policy
at fixed prices.

In [4]: # Example prices
        r = 0.03
        w = 0.956

        # Create an instance of Household
        am = Household(a_max=20, r=r, w=w)

        # Use the instance to build a discrete dynamic program
        am_ddp = DiscreteDP(am.R, am.Q, am.β)

        # Solve using policy function iteration
        results = am_ddp.solve(method='policy_iteration')

        # Simplify names
        z_size, a_size = am.z_size, am.a_size
        z_vals, a_vals = am.z_vals, am.a_vals
        n = a_size * z_size

        # Get all optimal actions across the set of a indices with z fixed in each row
        a_star = np.empty((z_size, a_size))
        for s_i in range(n):
            a_i = s_i // z_size
            z_i = s_i % z_size
            a_star[z_i, a_i] = a_vals[results.sigma[s_i]]

        fig, ax = plt.subplots(figsize=(9, 9))
        ax.plot(a_vals, a_vals, 'k--')  # 45 degrees
        for i in range(z_size):
            lb = f'$z = {z_vals[i]:.2}$'
            ax.plot(a_vals, a_star[i, :], lw=2, alpha=0.6, label=lb)
        ax.set_xlabel('current assets')
        ax.set_ylabel('next period assets')
        ax.legend(loc='upper left')

        plt.show()

The plot shows asset accumulation policies at different values of the exogenous state.
Now we want to calculate the equilibrium.
Let’s do this visually as a first pass.
The following code draws aggregate supply and demand curves.
The intersection gives equilibrium interest rates and capital.

In [5]: A = 1.0
        N = 1.0
        α = 0.33
        β = 0.96
        δ = 0.05

        def r_to_w(r):
            """
            Equilibrium wages associated with a given interest rate r.
            """
            return A * (1 - α) * (A * α / (r + δ))**(α / (1 - α))

        def rd(K):
            """
            Inverse demand curve for capital. The interest rate associated with a
            given demand for capital K.
            """
            return A * α * (N / K)**(1 - α) - δ

        def prices_to_capital_stock(am, r):
            """
            Map prices to the induced level of capital stock.

            Parameters:
            ----------

            am : Household
                An instance of an aiyagari_household.Household
            r : float
                The interest rate
            """
            w = r_to_w(r)
            am.set_prices(r, w)
            aiyagari_ddp = DiscreteDP(am.R, am.Q, β)
            # Compute the optimal policy
            results = aiyagari_ddp.solve(method='policy_iteration')
            # Compute the stationary distribution
            stationary_probs = results.mc.stationary_distributions[0]
            # Extract the marginal distribution for assets
            asset_probs = asset_marginal(stationary_probs, am.a_size, am.z_size)
            # Return K
            return np.sum(asset_probs * am.a_vals)

        # Create an instance of Household
        am = Household(a_max=20)

        # Use the instance to build a discrete dynamic program
        am_ddp = DiscreteDP(am.R, am.Q, am.β)

        # Create a grid of r values at which to compute demand and supply of capital
        num_points = 20
        r_vals = np.linspace(0.005, 0.04, num_points)

        # Compute supply of capital
        k_vals = np.empty(num_points)
        for i, r in enumerate(r_vals):
            k_vals[i] = prices_to_capital_stock(am, r)

        # Plot against demand for capital by firms
        fig, ax = plt.subplots(figsize=(11, 8))
        ax.plot(k_vals, r_vals, lw=2, alpha=0.6, label='supply of capital')
        ax.plot(k_vals, rd(k_vals), lw=2, alpha=0.6, label='demand for capital')
        ax.grid()
        ax.set_xlabel('capital')
        ax.set_ylabel('interest rate')
        ax.legend(loc='upper right')

        plt.show()
Part VIII

Asset Pricing and Finance

Chapter 51

Asset Pricing: Finite State Models

51.1 Contents

• Overview 51.2
• Pricing Models 51.3
• Prices in the Risk-Neutral Case 51.4
• Asset Prices under Risk Aversion 51.5
• Exercises 51.6
• Solutions 51.7

“A little knowledge of geometric series goes a long way” – Robert E. Lucas, Jr.

“Asset pricing is all about covariances” – Lars Peter Hansen

In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

51.2 Overview

An asset is a claim on one or more future payoffs.


The spot price of an asset depends primarily on
• the anticipated dynamics for the stream of income accruing to the owners
• attitudes to risk
• rates of time preference
In this lecture, we consider some standard pricing models and dividend stream specifications.
We study how prices and dividend-price ratios respond in these different scenarios.
We also look at creating and pricing derivative assets by repackaging income streams.
Key tools for the lecture are
• formulas for predicting future values of functions of a Markov state
• a formula for predicting the discounted sum of future values of a Markov state


Let’s start with some imports:

In [2]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline
        import quantecon as qe
        from numpy.linalg import eigvals, solve


51.3 Pricing Models

In what follows let {𝑑𝑡 }𝑡≥0 be a stream of dividends


• A time-𝑡 cum-dividend asset is a claim to the stream 𝑑𝑡 , 𝑑𝑡+1 , ….
• A time-𝑡 ex-dividend asset is a claim to the stream 𝑑𝑡+1 , 𝑑𝑡+2 , ….
Let’s look at some equations that we expect to hold for prices of assets under ex-dividend
contracts (we will consider cum-dividend pricing in the exercises).

51.3.1 Risk-Neutral Pricing

Our first scenario is risk-neutral pricing.


Let 𝛽 = 1/(1 + 𝜌) be an intertemporal discount factor, where 𝜌 is the rate at which agents
discount the future.
The basic risk-neutral asset pricing equation for pricing one unit of an ex-dividend asset is

𝑝𝑡 = 𝛽𝔼𝑡 [𝑑𝑡+1 + 𝑝𝑡+1 ] (1)

This is a simple “cost equals expected benefit” relationship.


Here 𝔼𝑡 [𝑦] denotes the best forecast of 𝑦, conditioned on information available at time 𝑡.

51.3.2 Pricing with Random Discount Factor

What happens if for some reason traders discount payouts differently depending on the state
of the world?
Michael Harrison and David Kreps [54] and Lars Peter Hansen and Scott Richard [52] showed
that in quite general settings the price of an ex-dividend asset obeys

𝑝𝑡 = 𝔼𝑡 [𝑚𝑡+1 (𝑑𝑡+1 + 𝑝𝑡+1 )] (2)

for some stochastic discount factor 𝑚𝑡+1 .


The fixed discount factor 𝛽 in (1) has been replaced by the random variable 𝑚𝑡+1 .
The way anticipated future payoffs are evaluated can now depend on various random outcomes.
One example of this idea is that assets that tend to have good payoffs in bad states of the
world might be regarded as more valuable.
This is because they pay well when the funds are more urgently needed.
We give examples of how the stochastic discount factor has been modeled below.

51.3.3 Asset Pricing and Covariances

Recall that, from the definition of a conditional covariance cov𝑡 (𝑥𝑡+1 , 𝑦𝑡+1 ), we have

$$ \mathbb{E}_t (x_{t+1} y_{t+1}) = \operatorname{cov}_t (x_{t+1}, y_{t+1}) + \mathbb{E}_t x_{t+1} \, \mathbb{E}_t y_{t+1} \tag{3} $$

If we apply this definition to the asset pricing equation (2) we obtain

$$ p_t = \mathbb{E}_t m_{t+1} \, \mathbb{E}_t (d_{t+1} + p_{t+1}) + \operatorname{cov}_t (m_{t+1}, d_{t+1} + p_{t+1}) \tag{4} $$

It is useful to regard equation (4) as a generalization of equation (1):


• In equation (1), the stochastic discount factor 𝑚𝑡+1 = 𝛽, a constant.
• In equation (1), the covariance term cov𝑡 (𝑚𝑡+1 , 𝑑𝑡+1 + 𝑝𝑡+1 ) is zero because 𝑚𝑡+1 = 𝛽.
Equation (4) asserts that the covariance of the stochastic discount factor with the one-period payout 𝑑𝑡+1 + 𝑝𝑡+1 is an important determinant of the price 𝑝𝑡.
Later in this lecture, and in a subsequent lecture, we give examples of some models of stochastic discount factors that have been proposed.
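As a quick numerical illustration of (3) and (4), the sketch below uses a made-up three-state distribution (the probabilities, discount factor values and payouts are assumptions chosen for illustration, not taken from the lecture) and checks that pricing directly with (2) agrees with the covariance decomposition.

```python
import numpy as np

# Hypothetical three-state example (illustrative numbers only)
probs = np.array([0.2, 0.5, 0.3])      # probabilities of the three states
m = np.array([1.10, 0.95, 0.80])       # stochastic discount factor m_{t+1}
payout = np.array([0.9, 1.0, 1.2])     # one-period payout d_{t+1} + p_{t+1}

def expect(z):
    """Expectation of z under the probabilities above."""
    return probs @ z

price_direct = expect(m * payout)                      # equation (2)
cov = expect((m - expect(m)) * (payout - expect(payout)))
price_decomposed = expect(m) * expect(payout) + cov    # equation (4)

print(price_direct, price_decomposed, cov)
```

In this example the discount factor is high exactly when the payout is low, so the covariance is negative and the asset is priced below 𝔼𝑚𝑡+1 · 𝔼(𝑑𝑡+1 + 𝑝𝑡+1).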

51.3.4 The Price-Dividend Ratio

Aside from prices, another quantity of interest is the price-dividend ratio 𝑣𝑡 ∶= 𝑝𝑡 /𝑑𝑡 .
Let’s write down an expression that this ratio should satisfy.
We can divide both sides of (2) by 𝑑𝑡 to get

$$ v_t = \mathbb{E}_t \left[ m_{t+1} \frac{d_{t+1}}{d_t} (1 + v_{t+1}) \right] \tag{5} $$

Below we’ll discuss the implications of this equation.

51.4 Prices in the Risk-Neutral Case

What can we say about price dynamics on the basis of the models described above?
The answer to this question depends on

1. the process we specify for dividends

2. the stochastic discount factor and how it correlates with dividends


840 CHAPTER 51. ASSET PRICING: FINITE STATE MODELS

For now let’s focus on the risk-neutral case, where the stochastic discount factor is constant,
and study how prices depend on the dividend process.

51.4.1 Example 1: Constant Dividends

The simplest case is risk-neutral pricing in the face of a constant, non-random dividend
stream 𝑑𝑡 = 𝑑 > 0.
Removing the expectation from (1) and iterating forward gives

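$$
\begin{aligned}
p_t &= \beta (d + p_{t+1}) \\
    &= \beta (d + \beta (d + p_{t+2})) \\
    &\;\;\vdots \\
    &= \beta (d + \beta d + \beta^2 d + \cdots + \beta^{k-1} d + \beta^{k-1} p_{t+k})
\end{aligned}
$$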

Unless prices explode in the future, this sequence converges to

$$ \bar p := \frac{\beta d}{1 - \beta} \tag{6} $$

This price is the equilibrium price in the constant dividend case.


Indeed, simple algebra shows that setting 𝑝𝑡 = 𝑝̄ for all 𝑡 satisfies the equilibrium condition
𝑝𝑡 = 𝛽(𝑑 + 𝑝𝑡+1 ).
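A minimal check of (6), using 𝛽 = 0.95 and 𝑑 = 1 as illustrative values: iterating 𝑝𝑡 = 𝛽(𝑑 + 𝑝𝑡+1) backward from a terminal price of zero (that is, ruling out an exploding component) converges to 𝑝̄, and 𝑝̄ satisfies the equilibrium condition.

```python
β, d = 0.95, 1.0             # illustrative values, not from the lecture

p_bar = β * d / (1 - β)      # equation (6)

# Start from a terminal price of zero and apply p = β(d + p') repeatedly
p = 0.0
for _ in range(1000):
    p = β * (d + p)

residual = p_bar - β * (d + p_bar)   # equilibrium condition evaluated at p_bar
print(p_bar, p, residual)
```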

51.4.2 Example 2: Dividends with Deterministic Growth Paths

Consider a growing, non-random dividend process 𝑑𝑡+1 = 𝑔𝑑𝑡 where 0 < 𝑔𝛽 < 1.
While prices are not usually constant when dividends grow over time, the price-dividend ratio might be.
If we guess this, substituting 𝑣𝑡 = 𝑣 into (5) together with our other assumptions gives 𝑣 = 𝛽𝑔(1 + 𝑣).
Since 𝛽𝑔 < 1, we have a unique positive solution:

$$ v = \frac{\beta g}{1 - \beta g} $$

The price is then

$$ p_t = \frac{\beta g}{1 - \beta g} d_t $$

If, in this example, we take 𝑔 = 1 + 𝜅 and let 𝜌 ∶= 1/𝛽 − 1, then the price becomes

$$ p_t = \frac{1 + \kappa}{\rho - \kappa} d_t $$

This is called the Gordon formula.
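The Gordon formula is a reparameterization of 𝑣 = 𝛽𝑔/(1 − 𝛽𝑔). A quick check with illustrative values 𝛽 = 0.96 and 𝑔 = 1.02 (chosen so that 𝛽𝑔 < 1):

```python
β, g = 0.96, 1.02            # illustrative values with βg < 1
ρ, κ = 1 / β - 1, g - 1      # discount rate and dividend growth rate

v = β * g / (1 - β * g)            # constant price-dividend ratio
v_gordon = (1 + κ) / (ρ - κ)       # Gordon formula, p_t = v d_t

fixed_point_gap = v - β * g * (1 + v)   # v should solve v = βg(1 + v)
print(v, v_gordon, fixed_point_gap)
```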



51.4.3 Example 3: Markov Growth, Risk-Neutral Pricing

Next, we consider a dividend process

$$ d_{t+1} = g_{t+1} d_t \tag{7} $$

The stochastic growth factor {𝑔𝑡 } is given by

$$ g_t = g(X_t), \quad t = 1, 2, \ldots $$

where

1. {𝑋𝑡} is a finite Markov chain with state space 𝑆 and transition probabilities

$$ P(x, y) := \mathbb{P}\{X_{t+1} = y \mid X_t = x\} \qquad (x, y \in S) $$

2. 𝑔 is a given function on 𝑆 taking positive values

You can think of


• 𝑆 as 𝑛 possible “states of the world” and 𝑋𝑡 as the current state.
• 𝑔 as a function that maps a given state 𝑋𝑡 into a growth factor 𝑔𝑡 = 𝑔(𝑋𝑡) for the endowment.
• ln 𝑔𝑡 = ln(𝑑𝑡+1 /𝑑𝑡 ) is the growth rate of dividends.
(For a refresher on notation and theory for finite Markov chains, see this lecture.)
The next figure shows a simulation, where
• {𝑋𝑡} evolves as a discretized AR(1) process produced using Tauchen’s method.
• 𝑔𝑡 = exp(𝑋𝑡), so that ln 𝑔𝑡 = 𝑋𝑡 is the growth rate.

In [3]: mc = qe.tauchen(0.96, 0.25, n=25)
sim_length = 80

# Simulate the state, starting from the median state value
x_series = mc.simulate(sim_length, init=np.median(mc.state_values))
g_series = np.exp(x_series)
d_series = np.cumprod(g_series)  # Assumes d_0 = 1

series = [x_series, g_series, d_series, np.log(d_series)]
labels = ['$X_t$', '$g_t$', '$d_t$', r'$\log \, d_t$']

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

for ax, s, label in zip(axes.flatten(), series, labels):
    ax.plot(s, 'b-', lw=2, label=label)
    ax.legend(loc='upper left', frameon=False)
plt.tight_layout()
plt.show()

Pricing

To obtain asset prices in this setting, let’s adapt our analysis from the case of deterministic
growth.
In that case, we found that 𝑣 is constant.
This encourages us to guess that, in the current case, 𝑣𝑡 is constant given the state 𝑋𝑡 .
In other words, we are looking for a fixed function 𝑣 such that the price-dividend ratio satisfies 𝑣𝑡 = 𝑣(𝑋𝑡).
We can substitute this guess into (5) to get

$$ v(X_t) = \beta \mathbb{E}_t [g(X_{t+1}) (1 + v(X_{t+1}))] $$

If we condition on 𝑋𝑡 = 𝑥, this becomes

$$ v(x) = \beta \sum_{y \in S} g(y) (1 + v(y)) P(x, y) $$

or

$$ v(x) = \beta \sum_{y \in S} K(x, y) (1 + v(y)) \quad \text{where} \quad K(x, y) := g(y) P(x, y) \tag{8} $$

Suppose that there are 𝑛 possible states 𝑥1 , … , 𝑥𝑛 .


We can then think of (8) as 𝑛 stacked equations, one for each state, and write it in matrix
form as

$$ v = \beta K (\mathbb{1} + v) \tag{9} $$

Here
• 𝑣 is understood to be the column vector (𝑣(𝑥1 ), … , 𝑣(𝑥𝑛 ))′ .
• 𝐾 is the matrix (𝐾(𝑥𝑖 , 𝑥𝑗 ))1≤𝑖,𝑗≤𝑛 .
• 𝟙 is a column vector of ones.
When does (9) have a unique solution?
From the Neumann series lemma and Gelfand’s formula, this will be the case if 𝛽𝐾 has spectral radius strictly less than one.
In other words, we require that the eigenvalues of 𝐾 be strictly less than 1/𝛽 in modulus.
The solution is then

$$ v = (I - \beta K)^{-1} \beta K \mathbb{1} \tag{10} $$
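To see (9) and (10) at work, the sketch below builds 𝐾 for a hypothetical three-state chain (the transition matrix and growth factors are assumptions for illustration), checks the spectral radius condition, and compares the direct solution (10) with a truncated Neumann series, the sum of (𝛽𝐾)ⁱ𝟙 over 𝑖 ≥ 1. The expression `P * g` relies on NumPy broadcasting to multiply column 𝑦 of 𝑃 by 𝑔(𝑦), which is exactly 𝐾(𝑥, 𝑦) = 𝑔(𝑦)𝑃(𝑥, 𝑦).

```python
import numpy as np

# Hypothetical 3-state chain and growth factors (illustrative only)
β = 0.9
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
g = np.array([0.98, 1.00, 1.02])
K = P * g                      # K(x, y) = g(y) P(x, y) via broadcasting

spectral_radius = np.max(np.abs(np.linalg.eigvals(β * K)))
assert spectral_radius < 1, "Spectral radius condition fails"

ones = np.ones(3)
v_solve = np.linalg.solve(np.eye(3) - β * K, β * K @ ones)   # equation (10)

# Truncated Neumann series: v = sum over i >= 1 of (βK)^i 1
v_series = np.zeros(3)
term = β * K @ ones
for _ in range(500):
    v_series += term
    term = β * K @ term

print(v_solve, v_series)
```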

51.4.4 Code

Let’s calculate and plot the price-dividend ratio at a set of parameters.


As before, we’ll generate {𝑋𝑡} as a discretized AR(1) process and set 𝑔𝑡 = exp(𝑋𝑡).
Here’s the code, including a test of the spectral radius condition:

In [4]: n = 25  # Size of state space
β = 0.9
mc = qe.tauchen(0.96, 0.02, n=n)

# K(x, y) = g(y) P(x, y) with g = exp; broadcasting scales column y by g(y)
K = mc.P * np.exp(mc.state_values)

warning_message = "Spectral radius condition fails"
assert np.max(np.abs(eigvals(K))) < 1 / β, warning_message

I = np.identity(n)
v = solve(I - β * K, β * K @ np.ones(n))

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(mc.state_values, v, 'g-o', lw=2, alpha=0.7, label='$v$')
ax.set_ylabel("price-dividend ratio")
ax.set_xlabel("state")
ax.legend(loc='upper left')
plt.show()

Why does the price-dividend ratio increase with the state?


The reason is that this Markov process is positively correlated, so high current states suggest
high future states.
Moreover, dividend growth is increasing in the state.
The anticipation of high future dividend growth leads to a high price-dividend ratio.

51.5 Asset Prices under Risk Aversion

Now let’s turn to the case where agents are risk averse.
We’ll price several distinct assets, including:
• A claim on an endowment stream
• A consol (a type of bond issued by the UK government in the 19th century)
• Call options on a consol

51.5.1 Pricing a Lucas Tree

Let’s start with a version of the celebrated asset pricing model of Robert E. Lucas, Jr. [73].
As in [73], suppose that the stochastic discount factor takes the form

$$ m_{t+1} = \beta \frac{u'(c_{t+1})}{u'(c_t)} \tag{11} $$

where 𝑢 is a concave utility function and 𝑐𝑡 is time 𝑡 consumption of a representative consumer.
51.5. ASSET PRICES UNDER RISK AVERSION 845

(A derivation of this expression is given in a later lecture)


Assume the existence of an endowment that follows (7).
The asset being priced is a claim on the endowment process.
Following [73], suppose further that in equilibrium, consumption is equal to the endowment,
so that 𝑑𝑡 = 𝑐𝑡 for all 𝑡.
For utility, we’ll assume the constant relative risk aversion (CRRA) specification

$$ u(c) = \frac{c^{1-\gamma}}{1 - \gamma} \quad \text{with} \quad \gamma > 0 \tag{12} $$

When 𝛾 = 1 we let 𝑢(𝑐) = ln 𝑐.


Inserting the CRRA specification into (11) and using 𝑐𝑡 = 𝑑𝑡 gives

$$ m_{t+1} = \beta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} = \beta g_{t+1}^{-\gamma} \tag{13} $$

Substituting this into (5) gives the price-dividend ratio formula

$$ v(X_t) = \beta \mathbb{E}_t [g(X_{t+1})^{1-\gamma} (1 + v(X_{t+1}))] $$

Conditioning on 𝑋𝑡 = 𝑥, we can write this as

$$ v(x) = \beta \sum_{y \in S} g(y)^{1-\gamma} (1 + v(y)) P(x, y) $$

If we let

$$ J(x, y) := g(y)^{1-\gamma} P(x, y) $$

then we can rewrite in vector form as

$$ v = \beta J (\mathbb{1} + v) $$

Assuming that the spectral radius of 𝐽 is strictly less than 1/𝛽, this equation has the unique solution

$$ v = (I - \beta J)^{-1} \beta J \mathbb{1} \tag{14} $$

We will define a function tree_price to solve for 𝑣 given parameters stored in the class AssetPriceModel:

In [5]: class AssetPriceModel:
    """
    A class that stores the primitives of the asset pricing model.

    Parameters
    ----------
    β : scalar(float)
        Discount factor
    mc : MarkovChain
        Contains the transition matrix and set of state values for the state
        process
    γ : scalar(float)
        Coefficient of risk aversion
    g : callable
        The function mapping states to growth rates

    """
    def __init__(self, β=0.96, mc=None, γ=2.0, g=np.exp):
        self.β, self.γ = β, γ
        self.g = g

        # A default process for the Markov chain
        if mc is None:
            self.ρ = 0.9
            self.σ = 0.02
            self.mc = qe.tauchen(self.ρ, self.σ, n=25)
        else:
            self.mc = mc

        self.n = self.mc.P.shape[0]

    def test_stability(self, Q):
        """
        Stability test for a given matrix Q.
        """
        sr = np.max(np.abs(eigvals(Q)))
        if not sr < 1 / self.β:
            msg = f"Spectral radius condition failed with radius = {sr}"
            raise ValueError(msg)

def tree_price(ap):
    """
    Computes the price-dividend ratio of the Lucas tree.

    Parameters
    ----------
    ap: AssetPriceModel
        An instance of AssetPriceModel containing primitives

    Returns
    -------
    v : array_like(float)
        Lucas tree price-dividend ratio

    """
    # Simplify names, set up matrices
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    J = P * ap.g(y)**(1 - γ)

    # Make sure that a unique solution exists
    ap.test_stability(J)

    # Compute v
    I = np.identity(ap.n)
    Ones = np.ones(ap.n)
    v = solve(I - β * J, β * J @ Ones)

    return v

Here’s a plot of 𝑣 as a function of the state for several values of 𝛾, with a positively correlated Markov process and 𝑔(𝑥) = exp(𝑥):

In [6]: γs = [1.2, 1.4, 1.6, 1.8, 2.0]
ap = AssetPriceModel()
states = ap.mc.state_values

fig, ax = plt.subplots(figsize=(12, 8))

for γ in γs:
    ap.γ = γ
    v = tree_price(ap)
    ax.plot(states, v, lw=2, alpha=0.6, label=rf"$\gamma = {γ}$")

ax.set_title('Price-dividend ratio as a function of the state')
ax.set_ylabel("price-dividend ratio")
ax.set_xlabel("state")
ax.legend(loc='upper right')
plt.show()

Notice that 𝑣 is decreasing in each case.


This is because, with a positively correlated state process, higher states suggest higher future
consumption growth.
In the stochastic discount factor (13), higher growth decreases the discount factor, lowering
the weight placed on future returns.

Special Cases

In the special case 𝛾 = 1, we have 𝐽 = 𝑃 .



Recalling that 𝑃ⁱ𝟙 = 𝟙 for all 𝑖 and applying Neumann’s geometric series lemma, we are led to

$$ v = \beta (I - \beta P)^{-1} \mathbb{1} = \beta \sum_{i=0}^{\infty} \beta^i P^i \mathbb{1} = \beta \frac{1}{1 - \beta} \mathbb{1} $$

Thus, with log preferences, the price-dividend ratio for a Lucas tree is constant.
Alternatively, if 𝛾 = 0, then 𝐽 = 𝐾 and we recover the risk-neutral solution (10).
This is as expected, since 𝛾 = 0 implies 𝑢(𝑐) = 𝑐 (and hence agents are risk-neutral).
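Both special cases are easy to confirm numerically. The sketch below uses a hypothetical two-state chain and growth factors (assumptions for illustration) and solves 𝑣 = 𝛽𝐽(𝟙 + 𝑣) directly: with 𝛾 = 1 every entry of 𝑣 equals 𝛽/(1 − 𝛽), and with 𝛾 = 0 the solution coincides with the risk-neutral formula (10).

```python
import numpy as np

β = 0.96
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # hypothetical transition matrix
g = np.array([0.97, 1.03])        # hypothetical growth factors g(x)
ones = np.ones(2)

def pd_ratio(γ):
    """Solve v = βJ(1 + v) with J(x, y) = g(y)^(1 - γ) P(x, y)."""
    J = P * g**(1 - γ)
    return np.linalg.solve(np.eye(2) - β * J, β * J @ ones)

v_log = pd_ratio(1.0)       # log utility: J = P
v_neutral = pd_ratio(0.0)   # risk neutrality: J = K

K = P * g
v_K = np.linalg.solve(np.eye(2) - β * K, β * K @ ones)   # equation (10)
print(v_log, β / (1 - β), v_neutral, v_K)
```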

51.5.2 A Risk-Free Consol

Consider the same pure exchange representative agent economy.


A risk-free consol promises to pay a constant amount 𝜁 > 0 each period.
Recycling notation, let 𝑝𝑡 now be the price of an ex-coupon claim to the consol.
An ex-coupon claim to the consol entitles the owner at the end of period 𝑡 to
• 𝜁 in period 𝑡 + 1, plus
• the right to sell the claim for 𝑝𝑡+1 next period
The price satisfies (2) with 𝑑𝑡 = 𝜁, or

$$ p_t = \mathbb{E}_t [m_{t+1} (\zeta + p_{t+1})] $$

We maintain the stochastic discount factor (13), so this becomes

$$ p_t = \mathbb{E}_t [\beta g_{t+1}^{-\gamma} (\zeta + p_{t+1})] \tag{15} $$

Guessing a solution of the form 𝑝𝑡 = 𝑝(𝑋𝑡 ) and conditioning on 𝑋𝑡 = 𝑥, we get

$$ p(x) = \beta \sum_{y \in S} g(y)^{-\gamma} (\zeta + p(y)) P(x, y) $$

Letting 𝑀(𝑥, 𝑦) = 𝑃(𝑥, 𝑦)𝑔(𝑦)⁻ᵞ and rewriting in vector notation yields the solution

$$ p = (I - \beta M)^{-1} \beta M \zeta \mathbb{1} \tag{16} $$

The above is implemented in the function consol_price.

In [7]: def consol_price(ap, ζ):
    """
    Computes price of a consol bond with payoff ζ

    Parameters
    ----------
    ap: AssetPriceModel
        An instance of AssetPriceModel containing primitives

    ζ : scalar(float)
        Coupon of the consol

    Returns
    -------
    p : array_like(float)
        Consol bond prices

    """
    # Simplify names, set up matrices
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    M = P * ap.g(y)**(-γ)

    # Make sure that a unique solution exists
    ap.test_stability(M)

    # Compute price
    I = np.identity(ap.n)
    Ones = np.ones(ap.n)
    p = solve(I - β * M, β * ζ * M @ Ones)

    return p
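As a sanity check on (16), the standalone sketch below (the helper `consol_price_sketch`, the transition matrix and the growth factors are all hypothetical) confirms that with no growth, 𝑔 ≡ 1, we have 𝑀 = 𝑃 and the consol is priced like the constant-dividend asset in (6), at 𝛽𝜁/(1 − 𝛽) in every state. With growth, the persistent low-growth state has 𝑔(𝑦)⁻ᵞ > 1, so the consol is worth more there.

```python
import numpy as np

def consol_price_sketch(P, g_vals, β, γ, ζ):
    """Solve p = (I - βM)^{-1} βMζ1 with M(x, y) = P(x, y) g(y)^(-γ)."""
    n = P.shape[0]
    M = P * g_vals**(-γ)
    return np.linalg.solve(np.eye(n) - β * M, β * ζ * M @ np.ones(n))

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # hypothetical transition matrix
β, γ, ζ = 0.9, 2.0, 1.0

# No growth: M = P, so every state gives βζ / (1 - β)
p_flat = consol_price_sketch(P, np.ones(2), β, γ, ζ)

# Hypothetical growth factors: state 0 is the low-growth state
p_growth = consol_price_sketch(P, np.array([0.98, 1.02]), β, γ, ζ)
print(p_flat, p_growth)
```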

51.5.3 Pricing an Option to Purchase the Consol

Let’s now price options of varying maturity that give the right to purchase a consol at a price
𝑝𝑆 .

An Infinite Horizon Call Option

We want to price an infinite horizon option to purchase a consol at a price 𝑝𝑆 .


The option entitles the owner at the beginning of a period either to

1. purchase the bond at price 𝑝𝑆 now, or

2. not exercise the option now but retain the right to exercise it later

Thus, the owner either exercises the option now or chooses not to exercise and wait until next
period.
This is termed an infinite-horizon call option with strike price 𝑝𝑆 .
The owner of the option is entitled to purchase the consol at the price 𝑝𝑆 at the beginning of
any period, after the coupon has been paid to the previous owner of the bond.
The fundamentals of the economy are identical to those above, including the stochastic discount factor and the process for consumption.
Let 𝑤(𝑋𝑡 , 𝑝𝑆 ) be the value of the option when the time 𝑡 growth state is known to be 𝑋𝑡 but
before the owner has decided whether or not to exercise the option at time 𝑡 (i.e., today).
Recalling that 𝑝(𝑋𝑡 ) is the value of the consol when the initial growth state is 𝑋𝑡 , the value
of the option satisfies

$$ w(X_t, p_S) = \max \left\{ \beta \, \mathbb{E}_t \frac{u'(c_{t+1})}{u'(c_t)} w(X_{t+1}, p_S), \; p(X_t) - p_S \right\} $$

The first term on the right is the value of waiting, while the second is the value of exercising
now.
We can also write this as

$$ w(x, p_S) = \max \left\{ \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} w(y, p_S), \; p(x) - p_S \right\} \tag{17} $$

With 𝑀(𝑥, 𝑦) = 𝑃(𝑥, 𝑦)𝑔(𝑦)⁻ᵞ and 𝑤 the vector of values (𝑤(𝑥𝑖, 𝑝𝑆)) for 𝑖 = 1, …, 𝑛, we can express (17) as the nonlinear vector equation

$$ w = \max\{\beta M w, \; p - p_S \mathbb{1}\} \tag{18} $$

To solve (18), form the operator 𝑇 mapping vector 𝑤 into vector 𝑇 𝑤 via

$$ T w = \max\{\beta M w, \; p - p_S \mathbb{1}\} $$

Start at some initial 𝑤 and iterate to convergence with 𝑇 .


We can find the solution with the following function call_option:

In [8]: def call_option(ap, ζ, p_s, ϵ=1e-7):
    """
    Computes price of a call option on a consol bond.

    Parameters
    ----------
    ap: AssetPriceModel
        An instance of AssetPriceModel containing primitives

    ζ : scalar(float)
        Coupon of the consol

    p_s : scalar(float)
        Strike price

    ϵ : scalar(float), optional(default=1e-7)
        Tolerance for infinite horizon problem

    Returns
    -------
    w : array_like(float)
        Infinite horizon call option prices

    """
    # Simplify names, set up matrices
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    M = P * ap.g(y)**(-γ)

    # Make sure that a unique consol price exists
    ap.test_stability(M)

    # Compute option price
    p = consol_price(ap, ζ)
    w = np.zeros(ap.n)
    error = ϵ + 1
    while error > ϵ:
        # Apply T: elementwise max of the value of waiting and of exercising
        w_new = np.maximum(β * M @ w, p - p_s)
        # Find maximal difference of each component and update
        error = np.amax(np.abs(w - w_new))
        w = w_new

    return w

Here’s a plot of 𝑤 compared to the consol price when 𝑝𝑆 = 40:

In [9]: ap = AssetPriceModel(β=0.9)
ζ = 1.0
strike_price = 40

x = ap.mc.state_values
p = consol_price(ap, ζ)
w = call_option(ap, ζ, strike_price)

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, p, 'b-', lw=2, label='consol price')
ax.plot(x, w, 'g-', lw=2, label='value of call option')
ax.set_xlabel("state")
ax.legend(loc='upper right')
plt.show()

In large states, the value of the option is close to zero.


This is despite the fact that the Markov chain is irreducible and low states, where the consol prices are high, will eventually be visited.
The reason is that 𝛽 = 0.9, so the future is discounted relatively rapidly.

51.5.4 Risk-Free Rates

Let’s look at risk-free interest rates over different periods.

The One-period Risk-free Interest Rate


As before, the stochastic discount factor is 𝑚𝑡+1 = 𝛽𝑔𝑡+1⁻ᵞ.
It follows that the reciprocal 𝑅𝑡⁻¹ of the gross risk-free interest rate 𝑅𝑡 in state 𝑥 is

$$ \mathbb{E}_t m_{t+1} = \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} $$

We can write this as

$$ m_1 = \beta M \mathbb{1} $$

where the 𝑖-th element of 𝑚1 is the reciprocal of the one-period gross risk-free interest rate in
state 𝑥𝑖 .

Other Terms

Let 𝑚𝑗 be an 𝑛 × 1 vector whose 𝑖th component is the reciprocal of the 𝑗-period gross risk-free interest rate in state 𝑥𝑖.
Then 𝑚1 = 𝛽𝑀𝟙, and 𝑚𝑗+1 = 𝛽𝑀𝑚𝑗 for 𝑗 ≥ 1.
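Compounding the one-period factor 𝛽𝑔(𝑦)⁻ᵞ over 𝑗 periods gives 𝑚𝑗 = (𝛽𝑀)ʲ𝟙, which is what a recursion of the form 𝑚𝑗+1 = 𝛽𝑀𝑚𝑗 computes. The sketch below (hypothetical two-state inputs; `bond_prices` is an illustrative name) builds 𝑚1, …, 𝑚10 and converts each into a gross per-period yield, the 𝑗-th root of 1/𝑚𝑗.

```python
import numpy as np

β, γ = 0.96, 2.0
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # hypothetical transition matrix
g = np.array([0.98, 1.02])        # hypothetical growth factors
M = P * g**(-γ)
ones = np.ones(2)

def bond_prices(horizon):
    """Return [m_1, ..., m_horizon], reciprocals of j-period gross risk-free rates."""
    m = [β * M @ ones]                 # one-period term
    for _ in range(horizon - 1):
        m.append(β * M @ m[-1])        # compound one more period
    return m

m = bond_prices(10)
R1 = 1 / m[0]                          # one-period gross rates, state by state
yields = [m_j ** (-1 / j) for j, m_j in enumerate(m, start=1)]
print(R1, yields[-1])
```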

51.6 Exercises

51.6.1 Exercise 1

In the lecture, we considered ex-dividend assets.


A cum-dividend asset is a claim to the stream 𝑑𝑡 , 𝑑𝑡+1 , ….
Following (1), find the risk-neutral asset pricing equation for one unit of a cum-dividend asset.
With a constant, non-random dividend stream 𝑑𝑡 = 𝑑 > 0, what is the equilibrium price of a cum-dividend asset?
With a growing, non-random dividend process 𝑑𝑡+1 = 𝑔𝑑𝑡 where 0 < 𝑔𝛽 < 1, what is the equilibrium price of a cum-dividend asset?

51.6.2 Exercise 2

Consider the following primitives:

In [10]: n = 5
P = 0.0125 * np.ones((n, n))
P += np.diag(0.95 - 0.0125 * np.ones(n))

# State values of the Markov chain
s = np.array([0.95, 0.975, 1.0, 1.025, 1.05])
γ = 2.0
β = 0.94

Let 𝑔 be defined by 𝑔(𝑥) = 𝑥 (that is, 𝑔 is the identity map).


Compute the price of the Lucas tree.
Do the same for:
• the price of the risk-free consol when 𝜁 = 1
• the call option on the consol when 𝜁 = 1 and 𝑝𝑆 = 150.0

51.6.3 Exercise 3

Let’s consider finite horizon call options, which are more common than the infinite horizon
variety.
Finite horizon options obey functional equations closely related to (17).
A 𝑘 period option expires after 𝑘 periods.
If we view today as date zero, a 𝑘 period option gives the owner the right to exercise the option to purchase the risk-free consol at the strike price 𝑝𝑆 at dates 0, 1, … , 𝑘 − 1.
The option expires at time 𝑘.
Thus, for 𝑘 = 1, 2, …, let 𝑤(𝑥, 𝑘) be the value of a 𝑘-period option.
It obeys

$$ w(x, k) = \max \left\{ \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} w(y, k - 1), \; p(x) - p_S \right\} $$

where 𝑤(𝑥, 0) = 0 for all 𝑥.


We can express the preceding as the sequence of nonlinear vector equations

$$ w_k = \max\{\beta M w_{k-1}, \; p - p_S \mathbb{1}\}, \quad k = 1, 2, \ldots \quad \text{with} \quad w_0 = 0 $$

Write a function that computes 𝑤𝑘 for any given 𝑘.


Compute the value of the option with k = 5 and k = 25 using parameter values as in Exercise 2.
Is one higher than the other? Can you give intuition?

51.7 Solutions

51.7.1 Exercise 1

For a cum-dividend asset, the basic risk-neutral asset pricing equation is

$$ p_t = d_t + \beta \mathbb{E}_t [p_{t+1}] $$

With constant dividends, the equilibrium price is

$$ p_t = \frac{1}{1 - \beta} d_t $$

With a growing, non-random dividend process, the equilibrium price is

$$ p_t = \frac{1}{1 - \beta g} d_t $$
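A quick consistency check, with illustrative values 𝛽 = 0.95, 𝑔 = 1.02 and 𝑑𝑡 = 1: a cum-dividend claim is today's dividend plus an ex-dividend claim, so its price should equal 𝑑𝑡 plus the ex-dividend price 𝛽𝑔𝑑𝑡/(1 − 𝛽𝑔) from Example 2, and it should satisfy 𝑝𝑡 = 𝑑𝑡 + 𝛽𝑝𝑡+1.

```python
β, g, d = 0.95, 1.02, 1.0          # illustrative values with βg < 1

ex_price = β * g / (1 - β * g) * d     # ex-dividend price with growing dividends
cum_price = d / (1 - β * g)            # cum-dividend price from the solution

# Next period's cum-dividend price, since d_{t+1} = g d_t
cum_next = g * d / (1 - β * g)

gap_decomposition = cum_price - (d + ex_price)
gap_recursion = cum_price - (d + β * cum_next)
print(ex_price, cum_price, gap_decomposition, gap_recursion)
```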

51.7.2 Exercise 2

First, let’s enter the parameters:

In [11]: n = 5
P = 0.0125 * np.ones((n, n))
P += np.diag(0.95 - 0.0125 * np.ones(n))
s = np.array([0.95, 0.975, 1.0, 1.025, 1.05])  # State values
mc = qe.MarkovChain(P, state_values=s)

γ = 2.0
β = 0.94
ζ = 1.0
p_s = 150.0

Next, we’ll create an instance of AssetPriceModel to feed into the functions:

In [12]: apm = AssetPriceModel(β=β, mc=mc, γ=γ, g=lambda x: x)

Now we just need to call the relevant functions on the data:

In [13]: tree_price(apm)

Out[13]: array([29.47401578, 21.93570661, 17.57142236, 14.72515002, 12.72221763])

In [14]: consol_price(apm, ζ)

Out[14]: array([753.87100476, 242.55144082, 148.67554548, 109.25108965,
                 87.56860139])

In [15]: call_option(apm, ζ, p_s)

Out[15]: array([603.87100476, 176.8393343 , 108.67734499, 80.05179254,
                 64.30843748])

Let’s show the last two results as a plot:

In [16]: fig, ax = plt.subplots()


ax.plot(s, consol_price(apm, ζ), label='consol')
ax.plot(s, call_option(apm, ζ, p_s), label='call option')
ax.legend()
plt.show()
51.7. SOLUTIONS 855

51.7.3 Exercise 3

Here’s a suitable function:

In [17]: def finite_horizon_call_option(ap, ζ, p_s, k):
    """
    Computes k period option value.
    """
    # Simplify names, set up matrices
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    M = P * ap.g(y)**(-γ)

    # Make sure that a unique solution exists
    ap.test_stability(M)

    # Compute option price by applying w_k = max{βM w_{k-1}, p - p_s} k times
    p = consol_price(ap, ζ)
    w = np.zeros(ap.n)
    for i in range(k):
        # Elementwise max of the value of waiting and of exercising
        w = np.maximum(β * M @ w, p - p_s)

    return w

Now let’s compute the option values at k=5 and k=25

In [18]: fig, ax = plt.subplots()

for k in [5, 25]:
    w = finite_horizon_call_option(apm, ζ, p_s, k)
    ax.plot(s, w, label=rf'$k = {k}$')
ax.legend()
plt.show()

Not surprisingly, the option has greater value with larger 𝑘.


This is because the owner has a longer time horizon over which he or she may exercise the
option.
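The comparison also follows from a short argument: 𝛽𝑀 has nonnegative entries, so the map 𝑤 ↦ max{𝛽𝑀𝑤, 𝑝 − 𝑝𝑆𝟙} is monotone, and 𝑤1 = max{0, 𝑝 − 𝑝𝑆𝟙} ≥ 𝑤0 = 0; by induction 𝑤𝑘 is nondecreasing in 𝑘. The sketch below checks this with hypothetical inputs (the matrix 𝑀 and the consol prices 𝑝 are made up rather than derived from the model).

```python
import numpy as np

# Hypothetical inputs (not derived from the model)
β = 0.9
M = np.array([[0.92, 0.10],
              [0.20, 0.78]])      # nonnegative discounting matrix
p = np.array([12.0, 9.0])         # hypothetical consol prices
p_s = 10.0                        # strike price

w = np.zeros(2)                   # w_0 = 0
values = [w]
for k in range(1, 31):
    w = np.maximum(β * M @ w, p - p_s)   # w_k = max{βM w_{k-1}, p - p_S 1}
    values.append(w)

increments = [values[k + 1] - values[k] for k in range(30)]
print(values[5], values[25])
```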
Chapter 52

Asset Pricing with Incomplete Markets

52.1 Contents

• Overview 52.2
• Structure of the Model 52.3
• Solving the Model 52.4
• Exercises 52.5
• Solutions 52.6
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !conda install -y quantecon

52.2 Overview

This lecture describes a version of a model of Harrison and Kreps [53].


The model determines the price of a dividend-yielding asset that is traded by two types of
self-interested investors.
The model features
• heterogeneous beliefs
• incomplete markets
• short sales constraints, and possibly …
• (leverage) limits on an investor’s ability to borrow in order to finance purchases of a
risky asset
Let’s start with some standard imports:

In [2]: import numpy as np


import quantecon as qe
import scipy.linalg as la


52.2.1 References

Prior to reading the following, you might like to review our lectures on
• Markov chains
• Asset pricing with finite state space

52.2.2 Bubbles

Economists differ in how they define a bubble.


The Harrison-Kreps model illustrates the following notion of a bubble that attracts many
economists:

A component of an asset price can be interpreted as a bubble when all investors agree that the current price of the asset exceeds what they believe the asset’s underlying dividend stream justifies.

52.3 Structure of the Model

The model simplifies by ignoring alterations in the distribution of wealth among investors
having different beliefs about the fundamentals that determine asset payouts.
There is a fixed number 𝐴 of shares of an asset.
Each share entitles its owner to a stream of dividends {𝑑𝑡} governed by a Markov chain defined on a state space 𝑆 = {0, 1}.
The dividend obeys

$$ d_t = \begin{cases} 0 & \text{if } s_t = 0 \\ 1 & \text{if } s_t = 1 \end{cases} $$

The owner of a share at the beginning of time 𝑡 is entitled to the dividend paid at time 𝑡.
The owner of the share at the beginning of time 𝑡 is also entitled to sell the share to another
investor during time 𝑡.
Two types ℎ = 𝑎, 𝑏 of investors differ only in their beliefs about a Markov transition matrix 𝑃
with typical element

$$ P(i, j) = \mathbb{P}\{s_{t+1} = j \mid s_t = i\} $$

Investors of type 𝑎 believe the transition matrix

$$ P_a = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{2}{3} & \frac{1}{3} \end{bmatrix} $$
52.3. STRUCTURE OF THE MODEL 859

Investors of type 𝑏 think the transition matrix is

$$ P_b = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{4} & \frac{3}{4} \end{bmatrix} $$

The stationary (i.e., invariant) distributions of these two matrices can be calculated as follows:

In [3]: qa = np.array([[1/2, 1/2], [2/3, 1/3]])


qb = np.array([[2/3, 1/3], [1/4, 3/4]])
mca = qe.MarkovChain(qa)
mcb = qe.MarkovChain(qb)
mca.stationary_distributions

Out[3]: array([[0.57142857, 0.42857143]])

In [4]: mcb.stationary_distributions

Out[4]: array([[0.42857143, 0.57142857]])

The stationary distribution of 𝑃𝑎 is approximately 𝜋𝐴 = [.57 .43].


The stationary distribution of 𝑃𝑏 is approximately 𝜋𝐵 = [.43 .57].
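The same numbers can be obtained without QuantEcon. The sketch below (the helper `stationary` is hypothetical) takes the left eigenvector of each matrix associated with eigenvalue one and normalizes it to sum to one, which assumes, as holds here, that the stationary distribution is unique; for these matrices the exact answers are (4/7, 3/7) and (3/7, 4/7).

```python
import numpy as np

P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])

def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to one."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    i = np.argmin(np.abs(eigvals - 1))   # column closest to eigenvalue 1
    π = np.real(eigvecs[:, i])
    return π / π.sum()

π_a, π_b = stationary(P_a), stationary(P_b)
print(π_a, π_b)
```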

52.3.1 Ownership Rights

An owner of the asset at the end of time 𝑡 is entitled to the dividend at time 𝑡 + 1 and also
has the right to sell the asset at time 𝑡 + 1.
Both types of investors are risk-neutral and both have the same fixed discount factor 𝛽 ∈
(0, 1).
In our numerical example, we’ll set 𝛽 = .75, just as Harrison and Kreps did.
We’ll eventually study the consequences of two different assumptions about the number of
shares 𝐴 relative to the resources that our two types of investors can invest in the stock.

1. Both types of investors have enough resources (either wealth or the capacity to borrow) so that they can purchase the entire available stock of the asset.
2. No single type of investor has sufficient resources to purchase the entire stock.

Case 1 is the case studied in Harrison and Kreps.


In case 2, both types of investors always hold at least some of the asset.

52.3.2 Short Sales Prohibited

No short sales are allowed.


This matters because it prevents pessimists from fully expressing their opinions.
• They can express their views by selling their shares.
• They cannot express their pessimism more loudly by artificially “manufacturing shares”, that is, they cannot borrow shares from more optimistic investors and sell them immediately.

52.3.3 Optimism and Pessimism

The above specifications of the perceived transition matrices 𝑃𝑎 and 𝑃𝑏 , taken directly from
Harrison and Kreps, build in stochastically alternating temporary optimism and pessimism.
Remember that state 1 is the high dividend state.
• In state 0, a type 𝑎 agent is more optimistic about next period’s dividend than a type 𝑏
agent.
• In state 1, a type 𝑏 agent is more optimistic about next period’s dividend.
However, the stationary distributions 𝜋𝐴 = [.57 .43] and 𝜋𝐵 = [.43 .57] tell us that a type 𝑏 person is more optimistic about the dividend process in the long run than is a type 𝑎 person.
Transition matrices for the temporarily optimistic and pessimistic investors are constructed as
follows.
Temporarily optimistic investors (i.e., the investor with the most optimistic beliefs in each
state) believe the transition matrix

$$ P_o = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{4} & \frac{3}{4} \end{bmatrix} $$

Temporarily pessimistic investors believe the transition matrix

$$ P_p = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} \end{bmatrix} $$
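The construction can be made mechanical: in each current state, take the row of whichever of 𝑃𝑎 and 𝑃𝑏 puts the most (for 𝑃𝑜) or least (for 𝑃𝑝) probability on the high-dividend state 1. The sketch below does this and then prices the asset as if a single investor held each set of beliefs, using 𝛽 = .75 and dividends (0, 1); the names are illustrative. The resulting prices, (24/13, 27/13) ≈ (1.85, 2.08) and (1, 1), match the 𝑝𝑜 and 𝑝𝑝 rows of the table in the next section.

```python
import numpy as np

P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])
β = 0.75
d = np.array([0.0, 1.0])                 # dividends in states 0 and 1

beliefs = np.stack([P_a, P_b])           # shape (type, current state, next state)

# In each current state s, pick the type with the highest / lowest P(s, 1)
P_o = np.array([beliefs[np.argmax(beliefs[:, s, 1]), s] for s in range(2)])
P_p = np.array([beliefs[np.argmin(beliefs[:, s, 1]), s] for s in range(2)])

def price_given_beliefs(P):
    """Price vector p = β(I - βP)^{-1} P d under a single transition matrix."""
    return β * np.linalg.solve(np.eye(2) - β * P, P @ d)

print(P_o, P_p, price_given_beliefs(P_o), price_given_beliefs(P_p))
```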

We’ll return to these matrices and their significance in the exercise.

52.3.4 Information

Investors know a price function mapping the state 𝑠𝑡 at 𝑡 into the equilibrium price 𝑝(𝑠𝑡 ) that
prevails in that state.
This price function is endogenous and to be determined below.
When investors choose whether to purchase or sell the asset at 𝑡, they also know 𝑠𝑡 .

52.4 Solving the Model

Now let’s turn to solving the model.


This amounts to determining equilibrium prices under the different possible specifications of
beliefs and constraints listed above.
In particular, we compare equilibrium price functions under the following alternative assump-
tions about beliefs:

1. There is only one type of agent, either 𝑎 or 𝑏.

2. There are two types of agents differentiated only by their beliefs. Each type of agent
has sufficient resources to purchase all of the asset (Harrison and Kreps’s setting).
52.4. SOLVING THE MODEL 861

3. There are two types of agents with different beliefs, but because of limited wealth
and/or limited leverage, both types of investors hold the asset each period.

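$$
\begin{array}{c|cc}
s_t & 0 & 1 \\ \hline
p_a & 1.33 & 1.22 \\
p_b & 1.45 & 1.91 \\
p_o & 1.85 & 2.08 \\
p_p & 1 & 1 \\
\hat{p}_a & 1.85 & 1.69 \\
\hat{p}_b & 1.69 & 2.08
\end{array}
$$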

Here
• 𝑝𝑎 is the equilibrium price function under homogeneous beliefs 𝑃𝑎
• 𝑝𝑏 is the equilibrium price function under homogeneous beliefs 𝑃𝑏
• 𝑝𝑜 is the equilibrium price function under heterogeneous beliefs with optimistic marginal
investors
• 𝑝𝑝 is the equilibrium price function under heterogeneous beliefs with pessimistic
marginal investors
• 𝑝𝑎̂ is the amount type 𝑎 investors are willing to pay for the asset
• 𝑝𝑏̂ is the amount type 𝑏 investors are willing to pay for the asset
We’ll explain these values and how they are calculated one row at a time.

52.4.1 Single Belief Prices

We’ll start by pricing the asset under homogeneous beliefs.


(This is the case treated in the lecture on asset pricing with finite Markov states)
Suppose that there is only one type of investor, either of type 𝑎 or 𝑏, and that this investor
always “prices the asset”.
Let 𝑝ℎ = (𝑝ℎ(0), 𝑝ℎ(1))′ be the equilibrium price vector when all investors are of type ℎ.
The price today equals the expected discounted value of tomorrow’s dividend and tomorrow’s
price of the asset:

$$ p_h(s) = \beta \left( P_h(s, 0) (0 + p_h(0)) + P_h(s, 1) (1 + p_h(1)) \right), \quad s = 0, 1 $$

These equations imply that the equilibrium price vector is

$$ \begin{bmatrix} p_h(0) \\ p_h(1) \end{bmatrix} = \beta [I - \beta P_h]^{-1} P_h \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{1} $$

The first two rows of the table report 𝑝𝑎 (𝑠) and 𝑝𝑏 (𝑠).
Here’s a function that can be used to compute these values:

In [5]: def price_single_beliefs(transition, dividend_payoff, β=.75):
    """
    Function to Solve Single Beliefs
    """
    # First compute inverse piece
    imbq_inv = la.inv(np.eye(transition.shape[0]) - β * transition)

    # Next compute prices
    prices = β * imbq_inv @ transition @ dividend_payoff

    return prices
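For comparison, here is a standalone sketch (the name `single_belief_price` is hypothetical) that solves the same linear system with `np.linalg.solve` instead of forming the inverse explicitly, which is usually preferred numerically. With 𝛽 = .75 and dividends (0, 1) it reproduces the first two rows of the table: exactly (4/3, 11/9) under 𝑃𝑎 and (16/11, 21/11) under 𝑃𝑏.

```python
import numpy as np

β = 0.75
P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])
d = np.array([0.0, 1.0])                  # dividends in states 0 and 1

def single_belief_price(P):
    """Solve (I - βP) p = βPd, the fixed point of p = βP(d + p)."""
    return np.linalg.solve(np.eye(2) - β * P, β * P @ d)

p_a, p_b = single_belief_price(P_a), single_belief_price(P_b)
print(np.round(p_a, 2), np.round(p_b, 2))
```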

Single Belief Prices as Benchmarks

These equilibrium prices under homogeneous beliefs are important benchmarks for the subsequent analysis.
• 𝑝ℎ(𝑠) tells what investor ℎ thinks is the “fundamental value” of the asset.
• Here “fundamental value” means the expected discounted present value of future dividends.
We will compare these fundamental values of the asset with equilibrium values when traders
have different beliefs.

52.4.2 Pricing under Heterogeneous Beliefs

There are several cases to consider.


The first is when both types of agents have sufficient wealth to purchase all of the asset themselves.
In this case, the marginal investor who prices the asset is the more optimistic type so that the
equilibrium price 𝑝̄ satisfies Harrison and Kreps’s key equation:

$$ \bar p(s) = \beta \max \left\{ P_a(s, 0) \bar p(0) + P_a(s, 1) (1 + \bar p(1)), \; P_b(s, 0) \bar p(0) + P_b(s, 1) (1 + \bar p(1)) \right\} \tag{2} $$

for 𝑠 = 0, 1.
The marginal investor who prices the asset in state 𝑠 is of type 𝑎 if

$$ P_a(s, 0) \bar p(0) + P_a(s, 1) (1 + \bar p(1)) > P_b(s, 0) \bar p(0) + P_b(s, 1) (1 + \bar p(1)) $$

The marginal investor is of type 𝑏 if

$$ P_a(s, 0) \bar p(0) + P_a(s, 1) (1 + \bar p(1)) < P_b(s, 0) \bar p(0) + P_b(s, 1) (1 + \bar p(1)) $$

Thus the marginal investor is the (temporarily) optimistic type.


Equation (2) is a functional equation that, like a Bellman equation, can be solved by
• starting with a guess for the price vector 𝑝̄ and
• iterating to convergence on the operator that maps a guess 𝑝̄𝑗 into an updated guess
𝑝̄𝑗+1 defined by the right side of (2), namely

$$ \bar p_{j+1}(s) = \beta \max \left\{ P_a(s, 0) \bar p_j(0) + P_a(s, 1) (1 + \bar p_j(1)), \; P_b(s, 0) \bar p_j(0) + P_b(s, 1) (1 + \bar p_j(1)) \right\} \tag{3} $$

for 𝑠 = 0, 1.
The third row of the table reports equilibrium prices that solve the functional equation when 𝛽 = .75.
Here the type that is optimistic about 𝑠𝑡+1 prices the asset in state 𝑠𝑡.
It is instructive to compare these prices with the equilibrium prices of the homogeneous belief economies with beliefs 𝑃𝑎 and 𝑃𝑏.
Equilibrium prices 𝑝̄ in the heterogeneous beliefs economy exceed what any prospective investor regards as the fundamental value of the asset in each possible state.
Nevertheless, the economy recurrently visits a state that makes each investor want to purchase the asset for more than he believes its future dividends are worth.
The reason is that he expects to have the option to sell the asset later to another investor
who will value the asset more highly than he will.
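• Investors of type $a$ are willing to pay the following price for the asset

$$
\hat p_a(s) = \begin{cases} \bar p(0) & \text{if } s_t = 0 \\ \beta\left(P_a(1,0)\bar p(0) + P_a(1,1)(1 + \bar p(1))\right) & \text{if } s_t = 1 \end{cases}
$$

• Investors of type $b$ are willing to pay the following price for the asset

$$
\hat p_b(s) = \begin{cases} \beta\left(P_b(0,0)\bar p(0) + P_b(0,1)(1 + \bar p(1))\right) & \text{if } s_t = 0 \\ \bar p(1) & \text{if } s_t = 1 \end{cases}
$$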

Evidently, $\hat p_a(1) < \bar p(1)$ and $\hat p_b(0) < \bar p(0)$.
Investors of type $a$ want to sell the asset in state 1 while investors of type $b$ want to sell it in state 0.
• The asset changes hands whenever the state changes from 0 to 1 or from 1 to 0.
• The valuations $\hat p_a(s)$ and $\hat p_b(s)$ are displayed in the fourth and fifth rows of the table.
• Even the pessimistic investors who don’t buy the asset think that it is worth more than
they think future dividends are worth.
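Once we know which type is marginal in each state (type $a$ in state 0 and type $b$ in state 1, as the discussion above implies for these beliefs), equation (2) is linear in $\bar p$, and the reservation prices $\hat p_a$ and $\hat p_b$ follow directly from the formulas above. A minimal sketch, assuming that pattern of marginal investors and the same $P_a$, $P_b$, $\beta = 0.75$ and $d = (0, 1)$ as in the Solutions section:

```python
import numpy as np

P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])
d = np.array([0.0, 1.0])
beta = 0.75

# Assumed marginal beliefs: row 0 from type a (optimist in state 0),
# row 1 from type b (optimist in state 1)
Q = np.vstack([P_a[0], P_b[1]])

# With Q fixed, (2) reads p = beta * Q @ (p + d), a linear system in p
p_bar = np.linalg.solve(np.eye(2) - beta * Q, beta * Q @ d)

# Each type pays p_bar where it is marginal and its own one-step
# valuation where it is not
p_hat_a = np.array([p_bar[0], beta * P_a[1] @ (p_bar + d)])
p_hat_b = np.array([beta * P_b[0] @ (p_bar + d), p_bar[1]])

print(np.round(p_bar, 2))    # → [1.85 2.08]
print(np.round(p_hat_a, 2))  # → [1.85 1.69]
print(np.round(p_hat_b, 2))  # → [1.69 2.08]
```

This shortcut only works once the pattern of marginal investors is known; the iterative method below does not need that information.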
Here’s code to solve for $\bar p$, $\hat p_a$ and $\hat p_b$ using the iterative method described above

In [6]: import numpy as np

        def price_optimistic_beliefs(transitions, dividend_payoff, β=.75,
                                     max_iter=50000, tol=1e-16):
            """
            Function to Solve Optimistic Beliefs
            """
            # We will guess an initial price vector of [0, 0]
            p_new = np.array([[0], [0]])
            p_old = np.array([[10.], [10.]])

            # We know this is a contraction mapping, so we can iterate to convergence
            for i in range(max_iter):
                p_old = p_new
                # Stack each type's (2, 1) valuation and take the maximum
                # across types (axis 0), state by state, as in (3)
                p_new = β * np.max([q @ p_old
                                    + q @ dividend_payoff for q in transitions],
                                   axis=0)

                # If we succeed in converging, break out of for loop
                if np.max(np.abs(p_new - p_old)) < tol:
                    break

            # Lowest valuation across types in each state
            ptwiddle = β * np.min([q @ p_old
                                   + q @ dividend_payoff for q in transitions],
                                  axis=0)

            phat_a = np.array([p_new[0], ptwiddle[1]])
            phat_b = np.array([ptwiddle[0], p_new[1]])

            return p_new, phat_a, phat_b

52.4.3 Insufficient Funds

Outcomes differ when the more optimistic type of investor has insufficient wealth, or insufficient ability to borrow enough, to hold the entire stock of the asset.
In this case, the asset price must adjust to attract pessimistic investors.
Instead of equation (2), the equilibrium price satisfies

$$
\check p(s) = \beta \min \left\{ P_a(s,0)\check p(0) + P_a(s,1)(1 + \check p(1)),\ P_b(s,0)\check p(0) + P_b(s,1)(1 + \check p(1)) \right\} \qquad (4)
$$

and the marginal investor who prices the asset is always the one that values it less highly
than does the other type.
Now the marginal investor is always the (temporarily) pessimistic type.
Notice from the sixth row of the table that the pessimistic price $\check p$ is lower than the homogeneous belief prices $p_a$ and $p_b$ in both states.
When pessimistic investors price the asset according to (4), optimistic investors think that
the asset is underpriced.
If they could, optimistic investors would willingly borrow at the one-period gross interest rate $\beta^{-1}$ to purchase more of the asset.
Implicit constraints on leverage prohibit them from doing so.
When optimistic investors price the asset as in equation (2), pessimistic investors think that
the asset is overpriced and would like to sell the asset short.
Constraints on short sales prevent that.
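A minimal sketch of the underpricing argument, assuming the same $P_a$, $P_b$, $\beta = 0.75$ and $d = (0, 1)$ as in the Solutions section: iterate on the right side of (4) to get $\check p$, then evaluate each type's one-step valuation at $\check p$. The name valuations is hypothetical.

```python
import numpy as np

P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])
d = np.array([0.0, 1.0])
beta = 0.75

def valuations(p):
    """One-step valuations beta * P_h @ (p + d); row 0 is type a, row 1 is type b."""
    return beta * np.vstack([P_a @ (p + d), P_b @ (p + d)])

# Iterate on the right side of (4): state-by-state minimum over types
p_check = np.zeros(2)
for _ in range(10_000):
    p_next = valuations(p_check).min(axis=0)
    converged = np.max(np.abs(p_next - p_check)) < 1e-12
    p_check = p_next
    if converged:
        break

# The temporarily optimistic type's valuation in each state, at price p_check
optimist_value = valuations(p_check).max(axis=0)
# p_check is approximately (1, 1); optimist_value is approximately (1.125, 1.3125)
```

In both states the optimist's valuation exceeds $\check p$, which is exactly the sense in which optimistic investors regard the asset as underpriced.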
Here’s code to solve for $\check p$ using iteration

In [7]: def price_pessimistic_beliefs(transitions, dividend_payoff, β=.75,
                                      max_iter=50000, tol=1e-16):
            """
            Function to Solve Pessimistic Beliefs
            """
            # We will guess an initial price vector of [0, 0]
            p_new = np.array([[0], [0]])
            p_old = np.array([[10.], [10.]])

            # We know this is a contraction mapping, so we can iterate to convergence
            for i in range(max_iter):
                p_old = p_new
                # Minimum across types (axis 0), state by state, as in (4)
                p_new = β * np.min([q @ p_old
                                    + q @ dividend_payoff for q in transitions],
                                   axis=0)

                # If we succeed in converging, break out of for loop
                if np.max(np.abs(p_new - p_old)) < tol:
                    break

            return p_new

52.4.4 Further Interpretation

[97] interprets the Harrison-Kreps model as a model of a bubble: a situation in which an asset price exceeds what every investor thinks is merited by the asset’s underlying dividend stream.
Scheinkman stresses these features of the Harrison-Kreps model:
• Compared with the homogeneous beliefs setting, high volume occurs when the Harrison-Kreps pricing formula (2) prevails.
Type 𝑎 investors sell the entire stock of the asset to type 𝑏 investors every time the state
switches from 𝑠𝑡 = 0 to 𝑠𝑡 = 1.
Type 𝑏 investors sell the asset to type 𝑎 investors every time the state switches from 𝑠𝑡 = 1 to
𝑠𝑡 = 0.
Scheinkman takes this as a strength of the model because he observes high volume during
famous bubbles.
• If the supply of the asset is increased sufficiently either physically (more “houses” are
built) or artificially (ways are invented to short sell “houses”), bubbles end when the
supply has grown enough to outstrip optimistic investors’ resources for purchasing the
asset.
• If optimistic investors finance purchases by borrowing, tightening leverage constraints
can extinguish a bubble.
Scheinkman extracts insights about the effects of financial regulations on bubbles.
He emphasizes how limiting short sales and limiting leverage have opposite effects.

52.5 Exercises

52.5.1 Exercise 1

Recreate the summary table using the functions we have built above.

s_t      0      1
p_a     1.33   1.22
p_b     1.45   1.91
p_o     1.85   2.08
p_p     1      1
p̂_a     1.85   1.69
p̂_b     1.69   2.08

You will first need to define the transition matrices and dividend payoff vector.

52.6 Solutions

52.6.1 Exercise 1

First, we will obtain equilibrium price vectors with homogeneous beliefs, including when all
investors are optimistic or pessimistic.

In [8]: qa = np.array([[1/2, 1/2], [2/3, 1/3]])    # Type a transition matrix
        qb = np.array([[2/3, 1/3], [1/4, 3/4]])    # Type b transition matrix
        # Optimistic investor transition matrix
        qopt = np.array([[1/2, 1/2], [1/4, 3/4]])
        # Pessimistic investor transition matrix
        qpess = np.array([[2/3, 1/3], [2/3, 1/3]])

        dividendreturn = np.array([[0], [1]])

        transitions = [qa, qb, qopt, qpess]
        labels = ['p_a', 'p_b', 'p_optimistic', 'p_pessimistic']

        for transition, label in zip(transitions, labels):
            print(label)
            print("=" * 20)
            s0, s1 = np.round(price_single_beliefs(transition, dividendreturn), 2)
            print(f"State 0: {s0}")
            print(f"State 1: {s1}")
            print("-" * 20)

p_a
====================
State 0: [1.33]
State 1: [1.22]
--------------------
p_b
====================
State 0: [1.45]
State 1: [1.91]
--------------------
p_optimistic
====================
State 0: [1.85]
State 1: [2.08]
--------------------
p_pessimistic
====================
State 0: [1.]
State 1: [1.]
--------------------

We will use the price_optimistic_beliefs function to find the price under heterogeneous beliefs.

In [9]: opt_beliefs = price_optimistic_beliefs([qa, qb], dividendreturn)
        labels = ['p_optimistic', 'p_hat_a', 'p_hat_b']

        for p, label in zip(opt_beliefs, labels):
            print(label)
            print("=" * 20)
            s0, s1 = np.round(p, 2)
            print(f"State 0: {s0}")
            print(f"State 1: {s1}")
            print("-" * 20)

p_optimistic
====================
State 0: [1.85]
State 1: [2.08]
--------------------
p_hat_a
====================
State 0: [1.85]
State 1: [1.69]
--------------------
p_hat_b
====================
State 0: [1.69]
State 1: [2.08]
--------------------

Notice that the equilibrium price with heterogeneous beliefs equals the price computed when all investors hold the optimistic transition matrix qopt; this is because the marginal investor is always the temporarily optimistic type.
Footnotes
[1] By assuming that both types of agents always have “deep enough pockets” to purchase
all of the asset, the model takes wealth dynamics off the table. The Harrison-Kreps model
generates high trading volume when the state changes either from 0 to 1 or from 1 to 0.
Part IX

Data and Empirics

Chapter 53

Pandas for Panel Data

53.1 Contents

• Overview 53.2
• Slicing and Reshaping Data 53.3
• Merging Dataframes and Filling NaNs 53.4
• Grouping and Summarizing Data 53.5
• Final Remarks 53.6
• Exercises 53.7
• Solutions 53.8

53.2 Overview

In an earlier lecture on pandas, we looked at working with simple data sets.


Econometricians often need to work with more complex data sets, such as panels.
Common tasks include
• Importing data, cleaning it and reshaping it across several axes.
• Selecting a time series or cross-section from a panel.
• Grouping and summarizing data.
pandas (derived from ‘panel’ and ‘data’) contains powerful and easy-to-use tools for solving
exactly these kinds of problems.
In what follows, we will use a panel data set of real minimum wages from the OECD to create:
• summary statistics over multiple dimensions of our data
• a time series of the average minimum wage of countries in the dataset
• kernel density estimates of wages by continent
We will begin by reading in our long format panel data from a CSV file and reshaping the
resulting DataFrame with pivot_table to build a MultiIndex.
Additional detail will be added to our DataFrame using pandas’ merge function, and data will
be summarized with the groupby function.


53.3 Slicing and Reshaping Data

We will read in a dataset from the OECD of real minimum wages in 32 countries and assign
it to realwage.
The dataset can be accessed with the following link:

In [1]: url1 = 'https://raw.githubusercontent.com/QuantEcon/lecture-python/master/source/_static/lecture_specific/pandas_panel/realwage.csv'

In [2]: import pandas as pd

# Display 6 columns for viewing purposes


pd.set_option('display.max_columns', 6)

# Reduce decimal points to 2


pd.options.display.float_format = '{:,.2f}'.format

realwage = pd.read_csv(url1)

Let’s have a look at what we’ve got to work with

In [3]: realwage.head() # Show first 5 rows

Out[3]: Unnamed: 0 Time Country Series \


0 0 2006-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
1 1 2007-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
2 2 2008-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
3 3 2009-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
4 4 2010-01-01 Ireland In 2015 constant prices at 2015 USD PPPs

Pay period value


0 Annual 17,132.44
1 Annual 18,100.92
2 Annual 17,747.41
3 Annual 18,580.14
4 Annual 18,755.83

The data is currently in long format, which is difficult to analyze when there are several dimensions to the data.
We will use pivot_table to create a wide format panel, with a MultiIndex to handle higher
dimensional data.
pivot_table arguments should specify the data (values), the index, and the columns we want
in our resulting dataframe.
By passing a list in columns, we can create a MultiIndex in our column axis

In [4]: realwage = realwage.pivot_table(values='value',


index='Time',
columns=['Country', 'Series', 'Pay period'])
realwage.head()

Out[4]: Country Australia \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Annual Hourly
Time
2006-01-01 20,410.65 10.33
2007-01-01 21,087.57 10.67
2008-01-01 20,718.24 10.48
2009-01-01 20,984.77 10.62
2010-01-01 20,879.33 10.57

Country … \
Series In 2015 constant prices at 2015 USD exchange rates …
Pay period Annual …
Time …
2006-01-01 23,826.64 …
2007-01-01 24,616.84 …
2008-01-01 24,185.70 …
2009-01-01 24,496.84 …
2010-01-01 24,373.76 …

Country United States \
Series In 2015 constant prices at 2015 USD PPPs
Pay period Hourly
Time
2006-01-01 6.05
2007-01-01 6.24
2008-01-01 6.78
2009-01-01 7.58
2010-01-01 7.88

Country
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

[5 rows x 128 columns]
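To see what pivot_table does with a list passed to columns, here is a minimal sketch on a tiny long-format frame. The countries A and B and all values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Time, Country, Pay period)
long_df = pd.DataFrame({
    'Time': ['2006-01-01'] * 4 + ['2007-01-01'] * 4,
    'Country': ['A', 'A', 'B', 'B'] * 2,
    'Pay period': ['Annual', 'Hourly'] * 4,
    'value': [100.0, 5.0, 80.0, 4.0, 110.0, 5.5, 85.0, 4.2],
})

# Passing a list to `columns` builds a two-level column MultiIndex;
# each (Time, Country, Pay period) cell holds a single value here
wide = long_df.pivot_table(values='value', index='Time',
                           columns=['Country', 'Pay period'])
print(wide)
```

Each distinct value of Time becomes a row, and each (Country, Pay period) pair becomes a column, which is the same reshaping applied to realwage above.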

To make it easier to filter our time series data later on, we will convert the index into a DatetimeIndex

In [5]: realwage.index = pd.to_datetime(realwage.index)


type(realwage.index)

Out[5]: pandas.core.indexes.datetimes.DatetimeIndex

The columns contain multiple levels of indexing, known as a MultiIndex, with levels being
ordered hierarchically (Country > Series > Pay period).
A MultiIndex is the simplest and most flexible way to manage panel data in pandas

In [6]: type(realwage.columns)

Out[6]: pandas.core.indexes.multi.MultiIndex

In [7]: realwage.columns.names

Out[7]: FrozenList(['Country', 'Series', 'Pay period'])



Like before, we can select the country (the top level of our MultiIndex)

In [8]: realwage['United States'].head()

Out[8]: Series In 2015 constant prices at 2015 USD PPPs \


Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

Stacking and unstacking levels of the MultiIndex will be used throughout this lecture to reshape our dataframe into the format we need.
.stack() rotates the lowest level of the column MultiIndex to the row index (.unstack() works in the opposite direction; try it out)

In [9]: realwage.stack().head()

Out[9]: Country Australia \


Series In 2015 constant prices at 2015 USD PPPs
Time Pay period
2006-01-01 Annual 20,410.65
Hourly 10.33
2007-01-01 Annual 21,087.57
Hourly 10.67
2008-01-01 Annual 20,718.24

Country \
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 23,826.64
Hourly 12.06
2007-01-01 Annual 24,616.84
Hourly 12.46
2008-01-01 Annual 24,185.70

Country Belgium … \
Series In 2015 constant prices at 2015 USD PPPs …
Time Pay period …
2006-01-01 Annual 21,042.28 …
Hourly 10.09 …
2007-01-01 Annual 21,310.05 …
Hourly 10.22 …
2008-01-01 Annual 21,416.96 …

Country United Kingdom \
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 20,376.32
Hourly 9.81
2007-01-01 Annual 20,954.13
Hourly 10.07
2008-01-01 Annual 20,902.87

Country United States \
Series In 2015 constant prices at 2015 USD PPPs
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05
2007-01-01 Annual 12,974.40
Hourly 6.24
2008-01-01 Annual 14,097.56

Country
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05
2007-01-01 Annual 12,974.40
Hourly 6.24
2008-01-01 Annual 14,097.56

[5 rows x 64 columns]
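The effect of .stack(), .stack(level=...) and .unstack() is easier to see on a small frame. A minimal sketch with made-up numbers and a two-level column MultiIndex (Country, Pay period), mimicking the structure of realwage:

```python
import numpy as np
import pandas as pd

# Small wide frame with a two-level column MultiIndex (made-up numbers)
cols = pd.MultiIndex.from_product([['A', 'B'], ['Annual', 'Hourly']],
                                  names=['Country', 'Pay period'])
wide = pd.DataFrame(np.arange(8.0).reshape(2, 4),
                    index=pd.Index(['2006', '2007'], name='Time'),
                    columns=cols)

# .stack() moves the innermost column level ('Pay period') into the row index
tall = wide.stack()
# .stack(level='Country') moves the outer level instead
by_country = wide.stack(level='Country')
# .unstack() moves the innermost row level back into the columns
back = tall.unstack()
```

Here tall has rows indexed by (Time, Pay period) and one column per country, while back reproduces the original wide frame.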

We can also pass in an argument to select the level we would like to stack

In [10]: realwage.stack(level='Country').head()

Out[10]: Series In 2015 constant prices at 2015 USD PPPs \


Pay period Annual Hourly
Time Country
2006-01-01 Australia 20,410.65 10.33
Belgium 21,042.28 10.09
Brazil 3,310.51 1.41
Canada 13,649.69 6.56
Chile 5,201.65 2.22

Series In 2015 constant prices at 2015 USD exchange rates


Pay period Annual Hourly
Time Country
2006-01-01 Australia 23,826.64 12.06
Belgium 20,228.74 9.70
Brazil 2,032.87 0.87
Canada 14,335.12 6.89
Chile 3,333.76 1.42

Using a DatetimeIndex makes it easy to select a particular time period.


Selecting one year and stacking the two lower levels of the MultiIndex creates a cross-section
of our panel data

In [11]: realwage['2015'].stack(level=(1, 2)).transpose().head()

Out[11]: Time 2015-01-01 \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Annual Hourly
Country
Australia 21,715.53 10.99
Belgium 21,588.12 10.35
Brazil 4,628.63 2.00
Canada 16,536.83 7.95
Chile 6,633.56 2.80

Time
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Country
Australia 25,349.90 12.83
Belgium 20,753.48 9.95
Brazil 2,842.28 1.21
Canada 17,367.24 8.35
Chile 4,251.49 1.81

For the rest of the lecture, we will work with a dataframe of the hourly real minimum wages across countries and time, measured in 2015 US dollars.
To create our filtered dataframe (realwage_f), we can use the xs method to select values at lower levels in the MultiIndex, while keeping the higher levels (countries in this case)

In [12]: realwage_f = realwage.xs(('Hourly', 'In 2015 constant prices at 2015 USD exchange rates'),
                                  level=('Pay period', 'Series'), axis=1)
         realwage_f.head()

Out[12]: Country Australia Belgium Brazil … Turkey United Kingdom \


Time …
2006-01-01 12.06 9.70 0.87 … 2.27 9.81
2007-01-01 12.46 9.82 0.92 … 2.26 10.07
2008-01-01 12.24 9.87 0.96 … 2.22 10.04
2009-01-01 12.40 10.21 1.03 … 2.28 10.15
2010-01-01 12.34 10.05 1.08 … 2.30 9.96

Country United States
Time
2006-01-01 6.05
2007-01-01 6.24
2008-01-01 6.78
2009-01-01 7.58
2010-01-01 7.88

[5 rows x 32 columns]
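A minimal sketch of what xs does on a three-level column MultiIndex like the one in realwage. The level values and numbers are made up; the point is that fixing the two lower levels leaves only the Country level in the columns.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product(
    [['A', 'B'], ['Exchange rate', 'PPP'], ['Annual', 'Hourly']],
    names=['Country', 'Series', 'Pay period'])
df = pd.DataFrame(np.arange(16.0).reshape(2, 8), columns=cols)

# Fix the 'Pay period' and 'Series' levels; the remaining 'Country'
# level becomes the (single-level) column index of the result
hourly_xr = df.xs(('Hourly', 'Exchange rate'),
                  level=['Pay period', 'Series'], axis=1)
print(hourly_xr)
```

Because xs drops the levels it selects on by default, the result has one column per country, just as realwage_f does.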

53.4 Merging Dataframes and Filling NaNs

Much like a relational database queried with SQL, pandas has built-in methods to merge datasets together.
Using country information from WorldData.info, we’ll add the continent of each country to
realwage_f with the merge function.

The dataset can be accessed with the following link:

In [13]: url2 = 'https://raw.githubusercontent.com/QuantEcon/lecture-python/master/source/_static/lecture_specific/pandas_panel/countries.csv'

In [14]: worlddata = pd.read_csv(url2, sep=';')


worlddata.head()

Out[14]: Country (en) Country (de) Country (local) … Deathrate \


0 Afghanistan Afghanistan Afganistan/Afqanestan … 13.70
1 Egypt Ägypten Misr … 4.70
2 Åland Islands Ålandinseln Åland … 0.00
3 Albania Albanien Shqipëria … 6.70
4 Algeria Algerien Al-Jaza’ir/Algérie … 4.30

Life expectancy Url


0 51.30 https://www.laenderdaten.info/Asien/Afghanista…
1 72.70 https://www.laenderdaten.info/Afrika/Aegypten/…
2 0.00 https://www.laenderdaten.info/Europa/Aland/ind…
3 78.30 https://www.laenderdaten.info/Europa/Albanien/…
4 76.80 https://www.laenderdaten.info/Afrika/Algerien/…

[5 rows x 17 columns]

First, we’ll select just the country and continent variables from worlddata and rename the
column to ‘Country’

In [15]: worlddata = worlddata[['Country (en)', 'Continent']]


worlddata = worlddata.rename(columns={'Country (en)': 'Country'})
worlddata.head()

Out[15]: Country Continent


0 Afghanistan Asia
1 Egypt Africa
2 Åland Islands Europe
3 Albania Europe
4 Algeria Africa

We want to merge our new dataframe, worlddata, with realwage_f.


The pandas merge function allows dataframes to be joined together by rows.
Our dataframes will be merged using country names, requiring us to use the transpose of
realwage_f so that rows correspond to country names in both dataframes

In [16]: realwage_f.transpose().head()

Out[16]: Time 2006-01-01 2007-01-01 2008-01-01 … 2014-01-01 2015-01-01 \


Country …
Australia 12.06 12.46 12.24 … 12.67 12.83
Belgium 9.70 9.82 9.87 … 10.01 9.95
Brazil 0.87 0.92 0.96 … 1.21 1.21
Canada 6.89 6.96 7.24 … 8.22 8.35
Chile 1.42 1.45 1.44 … 1.76 1.81

Time 2016-01-01
Country
Australia 12.98
Belgium 9.76
Brazil 1.24
Canada 8.48
Chile 1.91

[5 rows x 11 columns]

We can use a left, right, inner, or outer join to merge our datasets:
• left join keeps all countries from the left dataset, and only those
• right join keeps all countries from the right dataset, and only those
• outer join includes countries that are in either the left or the right dataset
• inner join includes only countries common to both the left and right datasets
By default, merge will use an inner join.
Here we will pass how='left' to keep all countries in realwage_f, but discard countries in worlddata that do not have a corresponding data entry in realwage_f.

This is illustrated by the red shading in the following diagram

We will also need to specify where the country name is located in each dataframe, which will
be the key that is used to merge the dataframes ‘on’.
Our ‘left’ dataframe (realwage_f.transpose()) contains countries in the index, so we set
left_index=True.

Our ‘right’ dataframe (worlddata) contains countries in the ‘Country’ column, so we set
right_on='Country'

In [17]: merged = pd.merge(realwage_f.transpose(), worlddata,


how='left', left_index=True, right_on='Country')
merged.head()

Out[17]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 … \


17.00 12.06 12.46 12.24 …
23.00 9.70 9.82 9.87 …
32.00 0.87 0.92 0.96 …
100.00 6.89 6.96 7.24 …
38.00 1.42 1.45 1.44 …

2016-01-01 00:00:00 Country Continent


17.00 12.98 Australia Australia
23.00 9.76 Belgium Europe
32.00 1.24 Brazil South America


100.00 8.48 Canada North America
38.00 1.91 Chile South America

[5 rows x 13 columns]
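A minimal sketch of how the join type changes the result, on two tiny frames whose values are taken from the tables above (the 2016 wages for Australia, Belgium and Korea, and a continent lookup in which Korea is deliberately missing). Passing indicator=True to pd.merge adds a _merge column recording whether each row matched.

```python
import pandas as pd

# Left frame: countries in the index (like realwage_f.transpose())
wages = pd.DataFrame({'2016': [12.98, 9.76, 5.28]},
                     index=['Australia', 'Belgium', 'Korea'])
# Right frame: countries in a column (like worlddata); no entry for Korea
lookup = pd.DataFrame({'Country': ['Australia', 'Belgium', 'Chile'],
                       'Continent': ['Australia', 'Europe', 'South America']})

# Left join keeps every country in `wages`; Korea gets NaN for Continent
left = pd.merge(wages, lookup, how='left', left_index=True,
                right_on='Country', indicator=True)
# Inner join keeps only countries found in both frames
inner = pd.merge(wages, lookup, how='inner', left_index=True,
                 right_on='Country')
print(left[['Country', 'Continent', '_merge']])
```

Chile, which appears only in the right frame, is dropped by both joins, mirroring how countries in worlddata without wage data are discarded above.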

Countries that appeared in realwage_f but not in worlddata will have NaN in the Continent
column.
To check whether this has occurred, we can use .isnull() on the continent column and filter
the merged dataframe

In [18]: merged[merged['Continent'].isnull()]

Out[18]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 … \


nan 3.42 3.74 3.87 …
nan 0.23 0.45 0.39 …
nan 1.50 1.64 1.71 …

2016-01-01 00:00:00 Country Continent


nan 5.28 Korea NaN
nan 0.55 Russian Federation NaN
nan 2.08 Slovak Republic NaN

[3 rows x 13 columns]

We have three missing values!


One option to deal with NaN values is to create a dictionary containing these countries and
their respective continents.
.map() will match countries in merged['Country'] with their continent from the dictionary.
Notice how countries not in our dictionary are mapped to NaN

In [19]: missing_continents = {'Korea': 'Asia',


'Russian Federation': 'Europe',
'Slovak Republic': 'Europe'}

merged['Country'].map(missing_continents)

Out[19]: 17.00 NaN


23.00 NaN
32.00 NaN
100.00 NaN
38.00 NaN
108.00 NaN
41.00 NaN
225.00 NaN
53.00 NaN
58.00 NaN
45.00 NaN
68.00 NaN
233.00 NaN
86.00 NaN
88.00 NaN
91.00 NaN
nan Asia
117.00 NaN

122.00 NaN
123.00 NaN
138.00 NaN
153.00 NaN
151.00 NaN
174.00 NaN
175.00 NaN
nan Europe
nan Europe
198.00 NaN
200.00 NaN
227.00 NaN
241.00 NaN
240.00 NaN
Name: Country, dtype: object

We don’t want to overwrite the entire series with this mapping.


.fillna() only fills in NaN values in merged['Continent'] with the mapping, while leaving
other values in the column unchanged

In [20]: merged['Continent'] = merged['Continent'].fillna(merged['Country'].map(missing_continents))

         # Check for whether continents were correctly mapped
         merged[merged['Country'] == 'Korea']

Out[20]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 … \


nan 3.42 3.74 3.87 …

2016-01-01 00:00:00 Country Continent


nan 5.28 Korea Asia

[1 rows x 13 columns]
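The map-then-fillna pattern is easier to see on a three-row frame. A minimal sketch with made-up rows: .map() returns NaN for any country missing from the dictionary, and .fillna() uses the mapped values only where Continent is already missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'Korea', 'Slovak Republic'],
                   'Continent': ['Europe', np.nan, np.nan]})
missing_continents = {'Korea': 'Asia', 'Slovak Republic': 'Europe'}

# Belgium is not in the dictionary, so it maps to NaN ...
mapped = df['Country'].map(missing_continents)
# ... but its existing Continent value is kept, because fillna only fills gaps
df['Continent'] = df['Continent'].fillna(mapped)
print(df)
```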

We will also combine the Americas into a single continent; this will make our visualization nicer later on.
To do this, we will use .replace() and loop through a list of the continent values we want to replace

In [21]: replace = ['Central America', 'North America', 'South America']

         # Replace each American sub-region with 'America'
         for continent in replace:
             merged['Continent'].replace(to_replace=continent,
                                         value='America',
                                         inplace=True)
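The same recoding can also be done in a single call by passing a dictionary to .replace() and assigning the result back, rather than modifying a column in place through chained indexing (a pattern that, to our understanding, recent pandas releases warn about). A minimal sketch on a made-up Series of continent labels:

```python
import pandas as pd

continent = pd.Series(['South America', 'North America', 'Europe',
                       'Central America', 'Asia'])

# Map every American sub-region to 'America' in one call and assign the result
americas = ['Central America', 'North America', 'South America']
continent = continent.replace({name: 'America' for name in americas})
print(continent.tolist())
```

Applied to the lecture's data, the equivalent would be assigning merged['Continent'].replace(...) back to merged['Continent'].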

Now that we have all the data we want in a single DataFrame, we will reshape it back into
panel form with a MultiIndex.
We should also make sure to sort the index using .sort_index() so that we can efficiently filter our dataframe later on.
By default, levels will be sorted top-down

In [22]: merged = merged.set_index(['Continent', 'Country']).sort_index()


merged.head()

Out[22]: 2006-01-01 2007-01-01 2008-01-01 … 2014-01-01 \


Continent Country …
America Brazil 0.87 0.92 0.96 … 1.21
Canada 6.89 6.96 7.24 … 8.22
Chile 1.42 1.45 1.44 … 1.76
Colombia 1.01 1.02 1.01 … 1.13
Costa Rica nan nan nan … 2.41

2015-01-01 2016-01-01
Continent Country
America Brazil 1.21 1.24
Canada 8.35 8.48
Chile 1.81 1.91
Colombia 1.13 1.12
Costa Rica 2.56 2.63

[5 rows x 11 columns]

While merging, we lost our DatetimeIndex, as we merged columns that were not in datetime
format

In [23]: merged.columns

Out[23]: Index([2006-01-01 00:00:00, 2007-01-01 00:00:00, 2008-01-01 00:00:00,
                2009-01-01 00:00:00, 2010-01-01 00:00:00, 2011-01-01 00:00:00,
                2012-01-01 00:00:00, 2013-01-01 00:00:00, 2014-01-01 00:00:00,
                2015-01-01 00:00:00, 2016-01-01 00:00:00],
               dtype='object')

Now that we have set the merged columns as the index, we can recreate a DatetimeIndex using .to_datetime()

In [24]: merged.columns = pd.to_datetime(merged.columns)


merged.columns = merged.columns.rename('Time')
merged.columns

Out[24]: DatetimeIndex(['2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01',
                        '2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
                        '2014-01-01', '2015-01-01', '2016-01-01'],
                       dtype='datetime64[ns]', name='Time', freq=None)

The DatetimeIndex tends to work more smoothly in the row axis, so we will go ahead and
transpose merged

In [25]: merged = merged.transpose()


merged.head()

Out[25]: Continent America … Europe


Country Brazil Canada Chile … Slovenia Spain United Kingdom
Time …
2006-01-01 0.87 6.89 1.42 … 3.92 3.99 9.81
2007-01-01 0.92 6.96 1.45 … 3.88 4.10 10.07
2008-01-01 0.96 7.24 1.44 … 3.96 4.14 10.04
2009-01-01 1.03 7.67 1.52 … 4.08 4.32 10.15
2010-01-01 1.08 7.94 1.56 … 4.81 4.30 9.96

[5 rows x 32 columns]
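One payoff of having a DatetimeIndex in the row axis is partial-string selection by year. A minimal sketch using the first four Brazil values from the table above:

```python
import pandas as pd

idx = pd.to_datetime(['2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01'])
s = pd.Series([0.87, 0.92, 0.96, 1.03], index=idx, name='Brazil')

# A year-only string slice is inclusive at both ends: all of 2007 through 2008
subset = s.loc['2007':'2008']
# A single year string selects every row falling in that year
one_year = s.loc['2009']
print(subset)
```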

53.5 Grouping and Summarizing Data

Grouping and summarizing data can be particularly useful for understanding large panel
datasets.
A simple way to summarize data is to call an aggregation method on the dataframe, such as
.mean() or .max().

For example, we can calculate the average real minimum wage for each country over the period 2006 to 2016 (the default is to aggregate over rows)

In [26]: merged.mean().head(10)

Out[26]: Continent Country


America Brazil 1.09
Canada 7.82
Chile 1.62
Colombia 1.07
Costa Rica 2.53
Mexico 0.53
United States 7.15
Asia Israel 5.95
Japan 6.18
Korea 4.22
dtype: float64

Using this series, we can plot the average real minimum wage over the past decade for each
country in our data set

In [27]: import matplotlib.pyplot as plt
         %matplotlib inline
         import matplotlib
         matplotlib.style.use('seaborn')

         merged.mean().sort_values(ascending=False).plot(kind='bar',
                                                         title="Average real minimum wage 2006 - 2016")

         # Set country labels
         country_labels = merged.mean().sort_values(ascending=False).index.get_level_values('Country').tolist()
         plt.xticks(range(0, len(country_labels)), country_labels)
         plt.xlabel('Country')

         plt.show()

Passing in axis=1 to .mean() will aggregate over columns (giving the average minimum wage
for all countries over time)

In [28]: merged.mean(axis=1).head()

Out[28]: Time
2006-01-01 4.69
2007-01-01 4.84
2008-01-01 4.90
2009-01-01 5.08
2010-01-01 5.11
dtype: float64

We can plot this time series as a line graph

In [29]: merged.mean(axis=1).plot()
plt.title('Average real minimum wage 2006 - 2016')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

We can also specify a level of the MultiIndex (in the column axis) to aggregate over

In [30]: merged.mean(level='Continent', axis=1).head()

Out[30]: Continent America Asia Australia Europe


Time
2006-01-01 2.80 4.29 10.25 4.80
2007-01-01 2.85 4.44 10.73 4.94
2008-01-01 2.99 4.45 10.76 4.99
2009-01-01 3.23 4.53 10.97 5.16
2010-01-01 3.34 4.53 10.95 5.17

We can plot the average minimum wages in each continent as a time series

In [31]: merged.mean(level='Continent', axis=1).plot()


plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

We will drop Australia as a continent for plotting purposes

In [32]: merged = merged.drop('Australia', level='Continent', axis=1)


merged.mean(level='Continent', axis=1).plot()
plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

.describe() is useful for quickly retrieving a number of common summary statistics

In [33]: merged.stack().describe()

Out[33]: Continent America Asia Europe


count 69.00 44.00 200.00
mean 3.19 4.70 5.15
std 3.02 1.56 3.82
min 0.52 2.22 0.23
25% 1.03 3.37 2.02
50% 1.44 5.48 3.54
75% 6.96 5.95 9.70
max 8.48 6.65 12.39

Stacking the Country level and calling .describe() in this way is a simplified form of grouping by continent.


Using groupby generally follows a ‘split-apply-combine’ process:
• split: data is grouped based on one or more keys
• apply: a function is called on each group independently
• combine: the results of the function calls are combined into a new data structure
The groupby method achieves the first step of this process, creating a new DataFrameGroupBy
object with data split into groups.
Let’s split merged by continent again, this time using the groupby function, and name the resulting object grouped

In [34]: grouped = merged.groupby(level='Continent', axis=1)


grouped

Out[34]: <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f4f7c4dbd10>



Calling an aggregation method on the object applies the function to each group, the results of
which are combined in a new data structure.
For example, we can return the number of countries in our dataset for each continent using
.size().

In this case, our new data structure is a Series

In [35]: grouped.size()

Out[35]: Continent
America 7
Asia 4
Europe 19
dtype: int64
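The split-apply-combine steps can be sketched on a toy frame whose columns are indexed by (Continent, Country), using a few values from the tables above. Because grouping along the column axis with groupby(..., axis=1) is deprecated in recent pandas releases, this sketch transposes first so the Continent level sits in the rows, then transposes the combined result back.

```python
import pandas as pd

# Columns indexed by (Continent, Country), as in `merged`
cols = pd.MultiIndex.from_tuples(
    [('America', 'Brazil'), ('America', 'Canada'), ('Europe', 'Spain')],
    names=['Continent', 'Country'])
df = pd.DataFrame([[0.87, 6.89, 3.99], [0.92, 6.96, 4.10]],
                  index=pd.to_datetime(['2006-01-01', '2007-01-01']),
                  columns=cols)

# Split on the 'Continent' level, apply mean to each group, combine, transpose back
by_continent = df.T.groupby(level='Continent').mean().T
# Number of countries in each group
counts = df.T.groupby(level='Continent').size()
print(by_continent)
```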

Calling .get_group() to return just the countries in a single group, we can create a kernel density estimate of the distribution of real minimum wages in 2015 for each continent.
grouped.groups.keys() will return the keys from the groupby object

In [36]: import seaborn as sns

         continents = grouped.groups.keys()

         for continent in continents:
             sns.kdeplot(grouped.get_group(continent)['2015'].unstack(), label=continent,
                         shade=True)

         plt.title('Real minimum wages in 2015')
         plt.xlabel('US dollars')
         plt.show()

53.6 Final Remarks

This lecture has provided an introduction to some of pandas’ more advanced features, including MultiIndexes, merging, grouping and plotting.
Other tools that may be useful in panel data analysis include xarray, a Python package that extends pandas to N-dimensional data structures.

53.7 Exercises

53.7.1 Exercise 1

In these exercises, you’ll work with a dataset of employment rates in Europe by age and sex
from Eurostat.
The dataset can be accessed with the following link:

In [37]: url3 = 'https://raw.githubusercontent.com/QuantEcon/lecture-python/master/source/_static/lecture_specific/pandas_panel/employ.csv'

Reading in the CSV file returns a panel dataset in long format. Use .pivot_table() to construct a wide format dataframe with a MultiIndex in the columns.
Start off by exploring the dataframe and the variables available in the MultiIndex levels.
Write a program that quickly returns all values in the MultiIndex.

53.7.2 Exercise 2

Filter the above dataframe to only include employment as a percentage of ‘active population’.
Create a grouped boxplot using seaborn of employment rates in 2015 by age group and sex.
Hint: GEO includes both areas and countries.

53.8 Solutions

53.8.1 Exercise 1

In [38]: employ = pd.read_csv(url3)


employ = employ.pivot_table(values='Value',
index=['DATE'],
columns=['UNIT','AGE', 'SEX', 'INDIC_EM', 'GEO'])
employ.index = pd.to_datetime(employ.index) # ensure that dates are datetime format
employ.head()

Out[38]: UNIT Percentage of total population … \


AGE From 15 to 24 years …
SEX Females …
INDIC_EM Active population …


GEO Austria Belgium Bulgaria …
DATE …
2007­01­01 56.00 31.60 26.00 …
2008­01­01 56.20 30.80 26.10 …
2009­01­01 56.20 29.90 24.80 …
2010­01­01 54.00 29.80 26.60 …
2011­01­01 54.80 29.80 24.80 …

UNIT Thousand persons \


AGE From 55 to 64 years
SEX Total
INDIC_EM Total employment (resident population concept ­ LFS)
GEO Switzerland Turkey
DATE
2007­01­01 nan 1,282.00
2008­01­01 nan 1,354.00
2009­01­01 nan 1,449.00
2010­01­01 640.00 1,583.00
2011­01­01 661.00 1,760.00

UNIT
AGE
SEX
INDIC_EM
GEO United Kingdom
DATE
2007­01­01 4,131.00
2008­01­01 4,204.00
2009­01­01 4,193.00
2010­01­01 4,186.00
2011­01­01 4,164.00

[5 rows x 1440 columns]

This is a large dataset, so it is useful to explore the levels and variables available

In [39]: employ.columns.names

Out[39]: FrozenList(['UNIT', 'AGE', 'SEX', 'INDIC_EM', 'GEO'])

Variables within levels can be quickly retrieved with a loop

In [40]: for name in employ.columns.names:
             print(name, employ.columns.get_level_values(name).unique())

UNIT Index(['Percentage of total population', 'Thousand persons'], dtype='object',
      name='UNIT')
AGE Index(['From 15 to 24 years', 'From 25 to 54 years', 'From 55 to 64 years'],
      dtype='object', name='AGE')
SEX Index(['Females', 'Males', 'Total'], dtype='object', name='SEX')
INDIC_EM Index(['Active population', 'Total employment (resident population concept - LFS)'],
      dtype='object', name='INDIC_EM')
GEO Index(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',
       'Denmark', 'Estonia', 'Euro area (17 countries)',
       'Euro area (18 countries)', 'Euro area (19 countries)',
       'European Union (15 countries)', 'European Union (27 countries)',
       'European Union (28 countries)', 'Finland',
       'Former Yugoslav Republic of Macedonia, the', 'France',
       'France (metropolitan)',
       'Germany (until 1990 former territory of the FRG)', 'Greece', 'Hungary',
       'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg',
       'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania',
       'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',
       'United Kingdom'],
      dtype='object', name='GEO')

53.8.2 Exercise 2

To easily filter by country, swap GEO to the top level and sort the MultiIndex

In [41]: employ.columns = employ.columns.swaplevel(0, -1)
         employ = employ.sort_index(axis=1)
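A toy illustration of `swaplevel(0, -1)` followed by `sort_index(axis=1)`, using a hypothetical three-level column index in place of the real `employ` columns (labels and values are placeholders). After the swap, the last level (GEO) becomes the outermost one, so a single label such as `'Austria'` selects every column for that country; sorting keeps each country's columns together.

```python
import pandas as pd

# Hypothetical columns with three levels; GEO starts as the last (innermost) level
cols = pd.MultiIndex.from_product(
    [['Percentage'], ['Females', 'Males'], ['Austria', 'Belgium']],
    names=['UNIT', 'SEX', 'GEO'])
df = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

# swaplevel(0, -1) exchanges the first level (UNIT) with the last level (GEO)
df.columns = df.columns.swaplevel(0, -1)
# Sorting the columns groups all entries for each country together
df = df.sort_index(axis=1)

print(list(df.columns.names))
print(df['Austria'])   # every column whose outer label is 'Austria'
```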

We need to get rid of a few items in GEO which are not countries.
A fast way to get rid of the EU areas is to use a list comprehension to find the level values in
GEO that begin with ‘Euro’

In [42]: geo_list = employ.columns.get_level_values('GEO').unique().tolist()
         countries = [x for x in geo_list if not x.startswith('Euro')]
         employ = employ[countries]
         employ.columns.get_level_values('GEO').unique()

Out[42]: Index(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',
                'Denmark', 'Estonia', 'Finland',
                'Former Yugoslav Republic of Macedonia, the', 'France',
                'France (metropolitan)',
                'Germany (until 1990 former territory of the FRG)', 'Greece', 'Hungary',
                'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg',
                'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania',
                'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',
                'United Kingdom'],
               dtype='object', name='GEO')

Select only percentage employed in the active population from the dataframe

In [43]: employ_f = employ.xs(('Percentage of total population', 'Active population'),
                              level=('UNIT', 'INDIC_EM'),
                              axis=1)
         employ_f.head()

Out[43]: GEO        Austria                     … United Kingdom        \
         AGE        From 15 to 24 years         … From 55 to 64 years
         SEX        Females  Males   Total      … Females  Males
         DATE                                   …
         2007-01-01 56.00    62.90   59.40      … 49.90    68.90
         2008-01-01 56.20    62.90   59.50      … 50.20    69.80
         2009-01-01 56.20    62.90   59.50      … 50.60    70.30
         2010-01-01 54.00    62.60   58.30      … 51.10    69.20
         2011-01-01 54.80    63.60   59.20      … 51.30    68.40

         GEO
         AGE
         SEX        Total
         DATE
         2007-01-01 59.30
         2008-01-01 59.80
         2009-01-01 60.30
         2010-01-01 60.00
         2011-01-01 59.70

         [5 rows x 306 columns]
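`DataFrame.xs` can fix several levels at once when the key and `level` are given as matching sequences; by default (`drop_level=True`) the fixed levels are dropped from the result. A minimal sketch on a hypothetical four-level column index, with placeholder labels and values rather than the Eurostat data:

```python
import pandas as pd

# Hypothetical frame whose columns carry four levels
cols = pd.MultiIndex.from_product(
    [['Austria'], ['Percentage', 'Thousand'], ['Active', 'Total'], ['Females', 'Males']],
    names=['GEO', 'UNIT', 'INDIC_EM', 'SEX'])
df = pd.DataFrame([list(range(8))], columns=cols)

# Fix UNIT and INDIC_EM simultaneously; only GEO and SEX remain as column levels
sub = df.xs(('Percentage', 'Active'), level=['UNIT', 'INDIC_EM'], axis=1)
print(sub)
```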

Drop the ‘Total’ value before creating the grouped boxplot

In [44]: employ_f = employ_f.drop('Total', level='SEX', axis=1)

In [45]: # .loc selects the 2015 rows; plain df['2015'] row slicing on a
         # DatetimeIndex is deprecated in recent pandas
         box = employ_f.loc['2015'].unstack().reset_index()
         sns.boxplot(x="AGE", y=0, hue="SEX", data=box, palette="husl", showfliers=False)
         plt.xlabel('')
         plt.xticks(rotation=35)
         plt.ylabel('Percentage of population (%)')
         plt.title('Employment in Europe (2015)')
         plt.legend(bbox_to_anchor=(1, 0.5))
         plt.show()
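The `y=0` in the boxplot call relies on how `.unstack()` and `.reset_index()` reshape the data. A small sketch with a hypothetical one-row frame: when the row index is not a MultiIndex, `DataFrame.unstack()` returns a Series indexed by the column levels followed by the row index, and `reset_index()` turns that unnamed Series into a long frame whose value column is named `0`.

```python
import pandas as pd

# Hypothetical one-row frame: a single year, columns indexed by (AGE, SEX)
cols = pd.MultiIndex.from_product(
    [['15-24', '25-54'], ['Females', 'Males']], names=['AGE', 'SEX'])
df = pd.DataFrame([[10.0, 20.0, 30.0, 40.0]],
                  index=pd.Index(['2015'], name='DATE'), columns=cols)

# unstack() moves every column level into the index and returns a Series;
# reset_index() turns the index levels back into ordinary columns
long_form = df.unstack().reset_index()
print(long_form)
```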
Chapter 54

Linear Regression in Python

54.1 Contents

• Overview 54.2
• Simple Linear Regression 54.3
• Extending the Linear Regression Model 54.4
• Endogeneity 54.5
• Summary 54.6
• Exercises 54.7
• Solutions 54.8
In addition to what’s in Anaconda, this lecture will need the following libraries:

In [1]: !pip install linearmodels

54.2 Overview

Linear regression is a standard tool for analyzing the relationship between two or more variables.
In this lecture, we’ll use the Python package statsmodels to estimate, interpret, and visualize
linear regression models.
Along the way, we’ll discuss a variety of topics, including
• simple and multivariate linear regression
• visualization
• endogeneity and omitted variable bias
• two-stage least squares
As an example, we will replicate results from Acemoglu, Johnson and Robinson’s seminal paper [1].
In the paper, the authors emphasize the importance of institutions in economic development.
The main contribution is the use of settler mortality rates as a source of exogenous variation
in institutional differences.


Such variation is needed to determine whether it is institutions that give rise to greater economic growth, rather than the other way around.
Let’s start with some imports:

In [2]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline
        import pandas as pd
        import statsmodels.api as sm
        from statsmodels.iolib.summary2 import summary_col
        from linearmodels.iv import IV2SLS

54.2.1 Prerequisites

This lecture assumes you are familiar with basic econometrics.


For an introductory text covering these topics, see, for example, [111].

54.3 Simple Linear Regression

[1] wish to determine whether or not differences in institutions can help to explain observed
economic outcomes.
How do we measure institutional differences and economic outcomes?
In this paper,
• economic outcomes are proxied by log GDP per capita in 1995, adjusted for exchange
rates.
• institutional differences are proxied by an index of protection against expropriation on
average over 1985-95, constructed by the Political Risk Services Group.
These variables and other data used in the paper are available for download on Daron Acemoglu’s webpage.
We will use pandas’ .read_stata() function to read in data contained in the .dta files to
dataframes

In [3]: df1 = pd.read_stata('https://github.com/QuantEcon/lecture-python/blob/master/source/_static/lecture_specific/ols/maketable1.dta?raw=true')
        df1.head()

Out[3]:   shortnam   euro1900  excolony    avexpr  logpgp95  cons1  cons90  democ00a  \
        0      AFG   0.000000       1.0       NaN       NaN    1.0     2.0       1.0
        1      AGO   8.000000       1.0  5.363636  7.770645    3.0     3.0       0.0
        2      ARE   0.000000       1.0  7.181818  9.804219    NaN     NaN       NaN
        3      ARG  60.000004       1.0  6.386364  9.133459    1.0     6.0       3.0
        4      ARM   0.000000       0.0       NaN  7.682482    NaN     NaN       NaN

           cons00a    extmort4    logem4  loghjypl  baseco
        0      1.0   93.699997  4.540098       NaN     NaN
        1      1.0  280.000000  5.634789 -3.411248     1.0
        2      NaN         NaN       NaN       NaN     NaN
        3      3.0   68.900002  4.232656 -0.872274     1.0
        4      NaN         NaN       NaN       NaN     NaN

Let’s use a scatterplot to see whether any obvious relationship exists between GDP per capita
and the protection against expropriation index

In [4]: plt.style.use('seaborn')  # named 'seaborn-v0_8' in matplotlib >= 3.6
        df1.plot(x='avexpr', y='logpgp95', kind='scatter')
        plt.show()

The plot shows a fairly strong positive relationship between protection against expropriation
and log GDP per capita.
Specifically, if higher protection against expropriation is a measure of institutional quality,
then better institutions appear to be positively correlated with better economic outcomes
(higher GDP per capita).
Given the plot, choosing a linear model to describe this relationship seems like a reasonable
assumption.
We can write our model as

$$
logpgp95_i = \beta_0 + \beta_1 avexpr_i + u_i
$$

where:
• $\beta_0$ is the intercept of the linear trend line on the y-axis
• $\beta_1$ is the slope of the linear trend line, representing the marginal effect of protection against expropriation risk on log GDP per capita
• $u_i$ is a random error term (deviations of observations from the linear trend due to factors not included in the model)

Visually, this linear model involves choosing a straight line that best fits the data, as in the
following plot (Figure 2 in [1])

In [5]: # Dropping NA's is required to use numpy's polyfit
        df1_subset = df1.dropna(subset=['logpgp95', 'avexpr'])

        # Use only 'base sample' for plotting purposes
        df1_subset = df1_subset[df1_subset['baseco'] == 1]

        X = df1_subset['avexpr']
        y = df1_subset['logpgp95']
        labels = df1_subset['shortnam']

        # Replace markers with country labels
        fig, ax = plt.subplots()
        ax.scatter(X, y, marker='')

        for i, label in enumerate(labels):
            ax.annotate(label, (X.iloc[i], y.iloc[i]))

        # Fit a linear trend line
        ax.plot(np.unique(X),
                np.poly1d(np.polyfit(X, y, 1))(np.unique(X)),
                color='black')

        ax.set_xlim([3.3, 10.5])
        ax.set_ylim([4, 10.5])
        ax.set_xlabel('Average Expropriation Risk 1985-95')
        ax.set_ylabel('Log GDP per capita, PPP, 1995')
        ax.set_title('Figure 2: OLS relationship between expropriation '
                     'risk and income')
        plt.show()

The most common technique to estimate the parameters (𝛽’s) of the linear model is Ordinary
Least Squares (OLS).
As the name implies, an OLS model is solved by finding the parameters that minimize the sum of squared residuals, i.e.

$$
\min_{\hat{\beta}} \sum_{i=1}^{N} \hat{u}_i^2
$$

where $\hat{u}_i$ is the difference between the observation and the predicted value of the dependent variable.
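A minimal numpy sketch of this minimization, on simulated data rather than the AJR sample (the intercept 4.6, slope 0.5 and noise scale are arbitrary assumptions). `np.linalg.lstsq` returns the coefficient vector with the smallest sum of squared residuals, so nudging it in any direction should increase that sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data from y = 4.6 + 0.5 x + u (not the AJR data)
n = 200
x = rng.uniform(3, 10, size=n)
y = 4.6 + 0.5 * x + rng.normal(scale=0.7, size=n)
X = np.column_stack([np.ones(n), x])

# Least-squares fit: beta_hat minimizes the sum of squared residuals
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def ssr(beta):
    """Sum of squared residuals for a candidate coefficient vector."""
    u_hat = y - X @ beta
    return u_hat @ u_hat

# Any perturbation of beta_hat raises the SSR (the objective is strictly convex)
for delta in ([0.1, 0.0], [0.0, -0.05], [-0.2, 0.03]):
    assert ssr(beta_hat + np.array(delta)) > ssr(beta_hat)

print(beta_hat)
```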
To estimate the constant term $\beta_0$, we need to add a column of 1’s to our dataset (consider the equation if $\beta_0$ were replaced with $\beta_0 x_i$ and $x_i = 1$)
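To see why the column of ones matters, the hypothetical simulation below fits the same data with and without it (the data-generating values are arbitrary assumptions). Without the constant column the fitted line is forced through the origin and the slope absorbs the missing intercept; with it, the coefficient on the ones column estimates $\beta_0$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data with a large intercept (4.6) and slope 0.5
x = rng.uniform(3, 10, size=500)
y = 4.6 + 0.5 * x + rng.normal(scale=0.5, size=500)

# Without a column of ones the fitted line must pass through the origin
slope_only, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

# With a column of ones, its coefficient plays the role of the intercept beta_0
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(slope_only)   # slope distorted upward by the omitted intercept
print(beta)         # intercept and slope near 4.6 and 0.5
```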

In [6]: df1['const'] = 1

Now we can construct our model in statsmodels using the OLS function.
We will use pandas dataframes with statsmodels; however, standard arrays can also be used as arguments

In [7]: reg1 = sm.OLS(endog=df1['logpgp95'], exog=df1[['const', 'avexpr']],
                      missing='drop')
        type(reg1)

Out[7]: statsmodels.regression.linear_model.OLS

So far we have simply constructed our model.
We need to use .fit() to obtain parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$

In [8]: results = reg1.fit()
        type(results)

Out[8]: statsmodels.regression.linear_model.RegressionResultsWrapper

We now have the fitted regression model stored in results.


To view the OLS regression results, we can call the .summary() method.
Note that an observation was mistakenly dropped from the results in the original paper (see
the note located in maketable2.do from Acemoglu’s webpage), and thus the coefficients differ
slightly.

In [9]: print(results.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:               logpgp95   R-squared:                       0.611
Model:                            OLS   Adj. R-squared:                  0.608
Method:                 Least Squares   F-statistic:                     171.4
Date:                Fri, 07 Aug 2020   Prob (F-statistic):           4.16e-24
Time:                        00:19:38   Log-Likelihood:                -119.71
No. Observations:                 111   AIC:                             243.4
Df Residuals:                     109   BIC:                             248.8
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          4.6261      0.301     15.391      0.000       4.030       5.222
avexpr         0.5319      0.041     13.093      0.000       0.451       0.612
==============================================================================
Omnibus:                        9.251   Durbin-Watson:                   1.689
Prob(Omnibus):                  0.010   Jarque-Bera (JB):                9.170
Skew:                          -0.680   Prob(JB):                       0.0102
Kurtosis:                       3.362   Cond. No.                         33.2
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.

From our results, we see that
• The intercept $\hat{\beta}_0 = 4.63$.
• The slope $\hat{\beta}_1 = 0.53$.
• The positive $\hat{\beta}_1$ parameter estimate implies that institutional quality is positively associated with economic outcomes, as we saw in the figure.
• The p-value of 0.000 for $\hat{\beta}_1$ implies that the relationship between institutions and GDP is statistically significant (using p < 0.05 as a rejection rule).
• The R-squared value of 0.611 indicates that around 61% of variation in log GDP per capita is explained by protection against expropriation.
Using our parameter estimates, we can now write our estimated relationship as

$$
\widehat{logpgp95}_i = 4.63 + 0.53 \, avexpr_i
$$

This equation describes the line that best fits our data, as shown in Figure 2.
We can use this equation to predict the level of log GDP per capita for a value of the index of
expropriation protection.
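For example, for a country with an index value of 7.07, we find that its predicted level of log GDP per capita in 1995 is 4.63 + 0.53 × 7.07 ≈ 8.38.
Note that 7.07 is not the mean of avexpr in the base sample used for Figure 2; as computed below, that mean is about 6.52.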

In [10]: mean_expr = np.mean(df1_subset['avexpr'])
         mean_expr

Out[10]: 6.515625

In [11]: predicted_logpgp95 = 4.63 + 0.53 * 7.07
         predicted_logpgp95

Out[11]: 8.3771

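An easier (and more accurate, since it uses the unrounded coefficients) way to obtain a prediction is to use .predict() and set $const = 1$ and $avexpr_i = mean\_expr$; the result below is the prediction at the base-sample mean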

In [12]: results.predict(exog=[1, mean_expr])

Out[12]: array([8.09156367])

We can obtain an array of predicted $logpgp95_i$ for every value of $avexpr_i$ in our dataset by calling .predict() on our results.
Plotting the predicted values against $avexpr_i$ shows that the predicted values lie along the OLS regression line estimated above.
The observed values of $logpgp95_i$ are also plotted for comparison purposes

In [13]: # Drop missing observations from whole sample
         df1_plot = df1.dropna(subset=['logpgp95', 'avexpr'])

         # Plot predicted values
         fig, ax = plt.subplots()
         ax.scatter(df1_plot['avexpr'], results.predict(), alpha=0.5,
                    label='predicted')

         # Plot observed values
         ax.scatter(df1_plot['avexpr'], df1_plot['logpgp95'], alpha=0.5,
                    label='observed')

         ax.legend()
         ax.set_title('OLS predicted values')
         ax.set_xlabel('avexpr')
         ax.set_ylabel('logpgp95')
         plt.show()

54.4 Extending the Linear Regression Model

So far we have only accounted for institutions affecting economic performance; almost certainly there are numerous other factors affecting GDP that are not included in our model.
Leaving out variables that affect $logpgp95_i$ and are also correlated with $avexpr_i$ will result in omitted variable bias, yielding biased and inconsistent parameter estimates.
We can extend our bivariate regression model to a multivariate regression model by adding in other factors that may affect $logpgp95_i$.
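A small simulation with a made-up data-generating process illustrates omitted variable bias. Here z affects y and is correlated with x, so leaving z out shifts the slope on x by roughly (effect of z on y) × Cov(x, z) / Var(x), which with these assumed values is 1 × 0.8 / 1.64 ≈ 0.49; including z removes the bias.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical data: z affects y and is also correlated with x
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.0 + 1.0 * x + 1.0 * z + rng.normal(size=n)   # true slope on x is 1.0

def ols(y, *regressors):
    """OLS coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

short_fit = ols(y, x)      # omits z: slope on x picks up part of z's effect
long_fit = ols(y, x, z)    # controls for z: slope on x near 1.0

print(short_fit[1], long_fit[1])
```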
[1] consider other factors such as:
• the effect of climate on economic outcomes; latitude is used to proxy this
• differences that affect both economic performance and institutions, e.g., cultural, historical, etc.; controlled for with the use of continent dummies
Let’s estimate some of the extended models considered in the paper (Table 2) using data from maketable2.dta

In [14]: df2 = pd.read_stata('https://github.com/QuantEcon/lecture-python/blob/master/source/_static/lecture_specific/ols/maketable2.dta?raw=true')

         # Add constant term to dataset
         df2['const'] = 1

         # Create lists of variables to be used in each regression
         X1 = ['const', 'avexpr']
         X2 = ['const', 'avexpr', 'lat_abst']
         X3 = ['const', 'avexpr', 'lat_abst', 'asia', 'africa', 'other']

         # Estimate an OLS regression for each set of variables
         reg1 = sm.OLS(df2['logpgp95'], df2[X1], missing='drop').fit()
         reg2 = sm.OLS(df2['logpgp95'], df2[X2], missing='drop').fit()
         reg3 = sm.OLS(df2['logpgp95'], df2[X3], missing='drop').fit()

Now that we have fitted our models, we will use summary_col to display the results in a single table (model numbers correspond to those in the paper)

In [15]: info_dict = {'R-squared': lambda x: f"{x.rsquared:.2f}",
                      'No. observations': lambda x: f"{int(x.nobs):d}"}

         results_table = summary_col(results=[reg1, reg2, reg3],
                                     float_format='%0.2f',
                                     stars=True,
                                     model_names=['Model 1',
                                                  'Model 3',
                                                  'Model 4'],
                                     info_dict=info_dict,
                                     regressor_order=['const',
                                                      'avexpr',
                                                      'lat_abst',
                                                      'asia',
                                                      'africa'])

         results_table.add_title('Table 2 - OLS Regressions')

         print(results_table)

        Table 2 - OLS Regressions
=========================================
                 Model 1 Model 3 Model 4
-----------------------------------------
const            4.63*** 4.87*** 5.85***
                 (0.30)  (0.33)  (0.34)
avexpr           0.53*** 0.46*** 0.39***
                 (0.04)  (0.06)  (0.05)
lat_abst                 0.87*   0.33
                         (0.49)  (0.45)
asia                             -0.15
                                 (0.15)
africa                           -0.92***
                                 (0.17)
R-squared        0.61    0.62    0.70
                 0.61    0.62    0.72
other                            0.30
                                 (0.37)
R-squared        0.61    0.62    0.72
No. observations 111     111     111
=========================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

54.5 Endogeneity

As [1] discuss, the OLS models likely suffer from endogeneity issues, resulting in biased and
inconsistent model estimates.
Namely, there is likely a two-way relationship between institutions and economic outcomes, and $avexpr_i$ may be correlated with the error term for other reasons as well:
• richer countries may be able to afford or prefer better institutions
• variables that affect income may also be correlated with institutional differences
• the construction of the index may be biased; analysts may be biased towards seeing
countries with higher income having better institutions
To deal with endogeneity, we can use two-stage least squares (2SLS) regression, which
is an extension of OLS regression.
This method requires replacing the endogenous variable $avexpr_i$ with a variable that is:

1. correlated with $avexpr_i$
2. not correlated with the error term (i.e., it should not directly affect the dependent variable, otherwise it would be correlated with $u_i$ due to omitted variable bias)
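The logic of 2SLS can be sketched in numpy on simulated data. This is a hypothetical data-generating process, not the AJR data: x depends on the error u, so OLS is inconsistent, while the instrument z shifts x but is independent of u. The first stage projects the regressors on the instruments; the second stage regresses y on those fitted values. With the assumed parameters, the OLS slope should settle near 1 + 0.8 / 2 = 1.4, and the 2SLS slope near the true value of 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Hypothetical data-generating process (not the AJR data)
z = rng.normal(size=n)                              # instrument
u = rng.normal(size=n)                              # structural error
x = 1.0 + 0.6 * z + 0.8 * u + rng.normal(size=n)    # endogenous: Cov(x, u) = 0.8
y = 2.0 + 1.0 * x + u                               # true slope is 1.0

ones = np.ones(n)
X = np.column_stack([ones, x])    # regressors
Z = np.column_stack([ones, z])    # instruments: the exogenous constant plus z

# OLS is inconsistent here because x is correlated with u
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# 2SLS, stage 1: fitted values of the regressors from a regression on Z
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
# 2SLS, stage 2: regress y on the fitted regressors
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

print(beta_ols[1], beta_2sls[1])
```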

The new set of regressors is called an instrument, which aims to remove endogeneity in our
proxy of institutional differences.
The main contribution of [1] is the use of settler mortality rates to instrument for institutional differences.
They hypothesize that higher mortality rates of colonizers led to the establishment of institutions that were more extractive in nature (less protection against expropriation), and these institutions still persist today.

Using a scatterplot (Figure 3 in [1]), we can see protection against expropriation is negatively
correlated with settler mortality rates, coinciding with the authors’ hypothesis and satisfying
the first condition of a valid instrument.

In [16]: # Dropping NA's is required to use numpy's polyfit
         df1_subset2 = df1.dropna(subset=['logem4', 'avexpr'])

         X = df1_subset2['logem4']
         y = df1_subset2['avexpr']
         labels = df1_subset2['shortnam']

         # Replace markers with country labels
         fig, ax = plt.subplots()
         ax.scatter(X, y, marker='')

         for i, label in enumerate(labels):
             ax.annotate(label, (X.iloc[i], y.iloc[i]))

         # Fit a linear trend line
         ax.plot(np.unique(X),
                 np.poly1d(np.polyfit(X, y, 1))(np.unique(X)),
                 color='black')

         ax.set_xlim([1.8, 8.4])
         ax.set_ylim([3.3, 10.4])
         ax.set_xlabel('Log of Settler Mortality')
         ax.set_ylabel('Average Expropriation Risk 1985-95')
         ax.set_title('Figure 3: First-stage relationship between settler mortality '
                      'and expropriation risk')
         plt.show()

The second condition may not be satisfied if settler mortality rates in the 17th to 19th centuries have a direct effect on current GDP (in addition to their indirect effect through institutions).
For example, settler mortality rates may be related to the current disease environment in a
country, which could affect current economic performance.
[1] argue this is unlikely because:
• The majority of settler deaths were due to malaria and yellow fever and had a limited
effect on local people.
• The disease burden on local people in Africa or India, for example, did not appear to
be higher than average, supported by relatively high population densities in these areas
before colonization.
As we appear to have a valid instrument, we can use 2SLS regression to obtain consistent parameter estimates (2SLS is consistent, although it is generally biased in finite samples).
First stage
The first stage involves regressing the endogenous variable ($avexpr_i$) on the instrument.
The instrument is the set of all exogenous variables in our model (and not just the variable we have replaced).
Using model 1 as an example, our instrument is simply a constant and settler mortality rates $logem4_i$.
Therefore, we will estimate the first-stage regression as

$$
avexpr_i = \delta_0 + \delta_1 logem4_i + v_i
$$

The data we need to estimate this equation is located in maketable4.dta (only complete data,
indicated by baseco = 1, is used for estimation)

In [17]: # Import and select the data
         df4 = pd.read_stata('https://github.com/QuantEcon/lecture-python/blob/master/source/_static/lecture_specific/ols/maketable4.dta?raw=true')
         # .copy() avoids a SettingWithCopyWarning when adding 'const' below
         df4 = df4[df4['baseco'] == 1].copy()

         # Add a constant variable
         df4['const'] = 1

         # Fit the first stage regression and print summary
         results_fs = sm.OLS(df4['avexpr'],
                             df4[['const', 'logem4']],
                             missing='drop').fit()
         print(results_fs.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                 avexpr   R-squared:                       0.270
Model:                            OLS   Adj. R-squared:                  0.258
Method:                 Least Squares   F-statistic:                     22.95
Date:                Fri, 07 Aug 2020   Prob (F-statistic):           1.08e-05
Time:                        00:19:40   Log-Likelihood:                -104.83
No. Observations:                  64   AIC:                             213.7
Df Residuals:                      62   BIC:                             218.0
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          9.3414      0.611     15.296      0.000       8.121      10.562
logem4        -0.6068      0.127     -4.790      0.000      -0.860      -0.354
==============================================================================
Omnibus:                        0.035   Durbin-Watson:                   2.003
Prob(Omnibus):                  0.983   Jarque-Bera (JB):                0.172
Skew:                           0.045   Prob(JB):                        0.918
Kurtosis:                       2.763   Cond. No.                         19.4
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.

Second stage
We need to retrieve the predicted values of $avexpr_i$ using .predict().
We then replace the endogenous variable $avexpr_i$ with the predicted values $\widehat{avexpr}_i$ in the original linear model.
Our second stage regression is thus

$$
logpgp95_i = \beta_0 + \beta_1 \widehat{avexpr}_i + u_i
$$

In [18]: df4['predicted_avexpr'] = results_fs.predict()

results_ss = sm.OLS(df4['logpgp95'],
df4[['const', 'predicted_avexpr']]).fit()
print(results_ss.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:               logpgp95   R-squared:                       0.477
Model:                            OLS   Adj. R-squared:                  0.469
Method:                 Least Squares   F-statistic:                     56.60
Date:                Fri, 07 Aug 2020   Prob (F-statistic):           2.66e-10
Time:                        00:19:40   Log-Likelihood:                -72.268
No. Observations:                  64   AIC:                             148.5
Df Residuals:                      62   BIC:                             152.9
Df Model:                           1
Covariance Type:            nonrobust
====================================================================================
                       coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------
const                1.9097      0.823      2.320      0.024       0.264       3.555
predicted_avexpr     0.9443      0.126      7.523      0.000       0.693       1.195
==============================================================================
Omnibus:                       10.547   Durbin-Watson:                   2.137
Prob(Omnibus):                  0.005   Jarque-Bera (JB):               11.010
Skew:                          -0.790   Prob(JB):                      0.00407
Kurtosis:                       4.277   Cond. No.                         58.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.

The second-stage regression results give us a consistent estimate of the effect of institutions on economic outcomes.
The result suggests a stronger positive relationship than what the OLS results indicated.
Note that while our parameter estimates are correct, our standard errors are not, because the second-stage residuals are computed with $\widehat{avexpr}_i$ instead of the actual $avexpr_i$; for this reason, computing 2SLS ‘manually’ (in stages with OLS) is not recommended.
We can correctly estimate a 2SLS regression in one step using the linearmodels package, an extension of statsmodels.
Note that when using IV2SLS, the exogenous and instrument variables are split up in the function arguments (whereas before the instrument included exogenous variables).
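The difference between the manual and correct standard errors can be sketched in numpy on simulated data (a hypothetical setup with homoskedastic errors assumed). A naive second-stage OLS computes residuals as $y - \hat{X}\hat{\beta}$, whereas the 2SLS variance estimate should use $y - X\hat{\beta}$, built from the actual regressor. Both versions below share the same $(\hat{X}'\hat{X})^{-1}$ matrix and differ only in the residuals. This sketch divides by n - k; the IV2SLS summary above reports `Debiased: False`, so its exact numbers need not follow the same convention.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

# Hypothetical simple IV model (homoskedastic errors assumed)
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.6 * z + 0.8 * u + rng.normal(size=n)
y = 2.0 + 1.0 * x + u

ones = np.ones(n)
X = np.column_stack([ones, x])
Z = np.column_stack([ones, z])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]     # 2SLS point estimates

k = X.shape[1]
XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)

def classical_se(resid):
    """sqrt of diag(s^2 (X_hat'X_hat)^{-1}), with s^2 = e'e / (n - k)."""
    s2 = resid @ resid / (n - k)
    return np.sqrt(np.diag(s2 * XhXh_inv))

se_manual = classical_se(y - X_hat @ beta)   # residuals a naive second-stage OLS uses
se_2sls = classical_se(y - X @ beta)         # residuals built from the actual x
print(se_manual, se_2sls)
```

The point estimates are identical in both cases; only the residual variance, and hence every standard error, changes.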

In [19]: iv = IV2SLS(dependent=df4['logpgp95'],
exog=df4['const'],
endog=df4['avexpr'],
instruments=df4['logem4']).fit(cov_type='unadjusted')

print(iv.summary)

                          IV-2SLS Estimation Summary
==============================================================================
Dep. Variable:               logpgp95   R-squared:                      0.1870
Estimator:                    IV-2SLS   Adj. R-squared:                 0.1739
No. Observations:                  64   F-statistic:                    37.568
Date:                Fri, Aug 07 2020   P-value (F-stat)                0.0000
Time:                        00:19:40   Distribution:                  chi2(1)
Cov. Estimator:            unadjusted

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const          1.9097     1.0106     1.8897     0.0588     -0.0710      3.8903
avexpr         0.9443     0.1541     6.1293     0.0000      0.6423      1.2462
==============================================================================

Endogenous: avexpr
Instruments: logem4
Unadjusted Covariance (Homoskedastic)
Debiased: False


Given that we now have consistent estimates, we can infer from the model we have estimated that institutional differences (stemming from institutions set up during colonization) can help to explain differences in income levels across countries today.
[1] use a marginal effect of 0.94 to calculate that the difference in the index between Chile and Nigeria (i.e., in institutional quality) implies up to a 7-fold difference in income, emphasizing the significance of institutions in economic development.

54.6 Summary

We have demonstrated basic OLS and 2SLS regression in statsmodels and linearmodels.
If you are familiar with R, you may want to use the formula interface to statsmodels, or consider using rpy2 to call R from within Python.

54.7 Exercises

54.7.1 Exercise 1

In the lecture, we think the original model suffers from endogeneity bias due to the likely effect income has on institutional development.
Although endogeneity is often best identified by thinking about the data and model, we can formally test for endogeneity using the Hausman test.
We want to test for correlation between the endogenous variable, $avexpr_i$, and the errors, $u_i$

$$
\begin{aligned}
H_0 &: Cov(avexpr_i, u_i) = 0 \quad (\text{no endogeneity}) \\
H_1 &: Cov(avexpr_i, u_i) \neq 0 \quad (\text{endogeneity})
\end{aligned}
$$

This test is run in two stages.
First, we regress $avexpr_i$ on the instrument, $logem4_i$

$$
avexpr_i = \pi_0 + \pi_1 logem4_i + \upsilon_i
$$

Second, we retrieve the residuals $\hat{\upsilon}_i$ and include them in the original equation

$$
logpgp95_i = \beta_0 + \beta_1 avexpr_i + \alpha \hat{\upsilon}_i + u_i
$$

If $\alpha$ is statistically significant (with a p-value < 0.05), then we reject the null hypothesis and conclude that $avexpr_i$ is endogenous.
Using the above information, estimate a Hausman test and interpret your results.
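A sketch of this regression-based test, written with numpy on simulated data rather than the AJR sample. The data-generating processes are hypothetical, and `control_function_t` is a helper name made up for this illustration. It returns the classical t-statistic on $\hat{\upsilon}_i$ in the augmented regression: in the endogenous design the statistic should be large, while in the exogenous design it should look like a draw from a standard normal.

```python
import numpy as np

def control_function_t(y, x, z):
    """t-statistic on the first-stage residual in y = b0 + b1 x + alpha v_hat + e."""
    n = len(y)
    ones = np.ones(n)
    Z = np.column_stack([ones, z])
    v_hat = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage residuals
    W = np.column_stack([ones, x, v_hat])                  # augmented regressors
    coef = np.linalg.lstsq(W, y, rcond=None)[0]
    e = y - W @ coef
    s2 = e @ e / (n - W.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(W.T @ W)))
    return coef[2] / se[2]

rng = np.random.default_rng(5)
n = 2_000
z, u, e = rng.normal(size=(3, n))

x_endog = 0.6 * z + 0.8 * u + e    # Cov(x, u) != 0
x_exog = 0.6 * z + e               # Cov(x, u) == 0

t_endog = control_function_t(2 + x_endog + u, x_endog, z)
t_exog = control_function_t(2 + x_exog + u, x_exog, z)
print(t_endog, t_exog)
```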

54.7.2 Exercise 2

The OLS parameter $\beta$ can also be estimated using matrix algebra and numpy (you may need to review the numpy lecture to complete this exercise).
The linear equation we want to estimate is (written in matrix form)

$$
y = X\beta + u
$$

To solve for the unknown parameter $\beta$, we want to minimize the sum of squared residuals

$$
\min_{\hat{\beta}} \hat{u}'\hat{u}
$$

Rearranging the first equation and substituting into the second equation, we can write

$$
\min_{\hat{\beta}} (y - X\hat{\beta})'(y - X\hat{\beta})
$$

Solving this optimization problem gives the solution for the $\hat{\beta}$ coefficients

$$
\hat{\beta} = (X'X)^{-1}X'y
$$

Using the above information, compute $\hat{\beta}$ from model 1 using numpy; your results should be the same as those in the statsmodels output from earlier in the lecture.
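One way to fill in the step from the minimization problem to the formula for $\hat{\beta}$: expand the objective, differentiate with respect to $\hat{\beta}$, and set the result to zero. The two cross terms combine because $\hat{\beta}'X'y$ and $y'X\hat{\beta}$ are the same scalar. This assumes $X$ has full column rank, so that $X'X$ is invertible; the second derivative $2X'X$ is then positive definite, so the stationary point is the minimum.

```latex
\begin{aligned}
S(\hat{\beta}) &= (y - X\hat{\beta})'(y - X\hat{\beta})
               = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta} \\
\frac{\partial S}{\partial \hat{\beta}} &= -2X'y + 2X'X\hat{\beta} = 0 \\
\Longrightarrow \quad X'X\hat{\beta} &= X'y
\quad \Longrightarrow \quad \hat{\beta} = (X'X)^{-1}X'y
\end{aligned}
```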

54.8 Solutions

54.8.1 Exercise 1

In [20]: # Load in data
         df4 = pd.read_stata('https://github.com/QuantEcon/lecture-python/blob/master/source/_static/lecture_specific/ols/maketable4.dta?raw=true')

         # Add a constant term
         df4['const'] = 1

         # Estimate the first stage regression
         reg1 = sm.OLS(endog=df4['avexpr'],
                       exog=df4[['const', 'logem4']],
                       missing='drop').fit()

         # Retrieve the residuals
         df4['resid'] = reg1.resid

         # Estimate the second stage regression, including the first-stage residuals
         reg2 = sm.OLS(endog=df4['logpgp95'],
                       exog=df4[['const', 'avexpr', 'resid']],
                       missing='drop').fit()

         print(reg2.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:               logpgp95   R-squared:                       0.689
Model:                            OLS   Adj. R-squared:                  0.679
Method:                 Least Squares   F-statistic:                     74.05
Date:                Fri, 07 Aug 2020   Prob (F-statistic):           1.07e-17
Time:                        00:19:40   Log-Likelihood:                -62.031
No. Observations:                  70   AIC:                             130.1
Df Residuals:                      67   BIC:                             136.8
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.4782      0.547      4.530      0.000       1.386       3.570
avexpr         0.8564      0.082     10.406      0.000       0.692       1.021
resid         -0.4951      0.099     -5.017      0.000      -0.692      -0.298
==============================================================================
Omnibus:                       17.597   Durbin-Watson:                   2.086
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               23.194
Skew:                          -1.054   Prob(JB):                     9.19e-06
Kurtosis:                       4.873   Cond. No.                         53.8
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.

The output shows that the coefficient on the residuals is statistically significant, indicating $avexpr_i$ is endogenous.

54.8.2 Exercise 2

In [21]: # Load in data
         df1 = pd.read_stata('https://github.com/QuantEcon/lecture-python/blob/master/source/_static/lecture_specific/ols/maketable1.dta?raw=true')
         df1 = df1.dropna(subset=['logpgp95', 'avexpr'])

         # Add a constant term
         df1['const'] = 1

         # Define the X and y variables
         y = np.asarray(df1['logpgp95'])
         X = np.asarray(df1[['const', 'avexpr']])

         # Compute β_hat
         β_hat = np.linalg.solve(X.T @ X, X.T @ y)

         # Print out the results from the 2 x 1 vector β_hat
         print(f'β_0 = {β_hat[0]:.2}')
         print(f'β_1 = {β_hat[1]:.2}')

β_0 = 4.6
β_1 = 0.53

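It is also possible to use np.linalg.inv(X.T @ X) @ X.T @ y to solve for $\beta$; however, .solve() is preferred, as it solves the linear system directly instead of forming the inverse, which requires more computation and is less numerically accurate.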
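A quick numerical comparison on a hypothetical simulated design: solving the normal equations, multiplying by an explicit inverse, and `np.linalg.lstsq` (which works on X directly via an SVD and never forms X'X) all agree on a well-conditioned problem like this one. Differences appear when the columns of X are nearly collinear, because the condition number of X'X is the square of that of X, as the last line checks.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(3, 10, size=n)])   # hypothetical design
y = 4.6 + 0.5 * X[:, 1] + rng.normal(scale=0.7, size=n)

b_solve = np.linalg.solve(X.T @ X, X.T @ y)       # solve the normal equations
b_inv = np.linalg.inv(X.T @ X) @ X.T @ y           # explicit inverse
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]     # SVD-based, never forms X'X

print(b_solve, b_inv, b_lstsq)
print(np.linalg.cond(X) ** 2, np.linalg.cond(X.T @ X))   # equal up to rounding
```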
Chapter 55

Maximum Likelihood Estimation

55.1 Contents

• Overview 55.2
• Set Up and Assumptions 55.3
• Conditional Distributions 55.4
• Maximum Likelihood Estimation 55.5
• MLE with Numerical Methods 55.6
• Maximum Likelihood Estimation with statsmodels 55.7
• Summary 55.8
• Exercises 55.9
• Solutions 55.10

55.2 Overview

In a previous lecture, we estimated the relationship between dependent and explanatory variables using linear regression.
But what if a linear relationship is not an appropriate assumption for our model?
One widely used alternative is maximum likelihood estimation, which involves specifying a
class of distributions, indexed by unknown parameters, and then using the data to pin down
these parameter values.
The benefit relative to linear regression is that it allows more flexibility in the probabilistic
relationships between variables.
Here we illustrate maximum likelihood by replicating Daniel Treisman’s (2016) paper, Rus-
sia’s Billionaires, which connects the number of billionaires in a country to its economic char-
acteristics.
The paper concludes that Russia has a higher number of billionaires than economic factors
such as market size and tax rate predict.
We’ll require the following imports:

In [1]: import numpy as np
from numpy import exp
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.special import factorial
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D
import statsmodels.api as sm
from statsmodels.api import Poisson
from scipy import stats
from scipy.stats import norm
from statsmodels.iolib.summary2 import summary_col

55.2.1 Prerequisites

We assume familiarity with basic probability and multivariate calculus.

55.3 Set Up and Assumptions

Let’s consider the steps we need to go through in maximum likelihood estimation and how
they pertain to this study.

55.3.1 Flow of Ideas

The first step with maximum likelihood estimation is to choose the probability distribution
believed to be generating the data.
More precisely, we need to make an assumption as to which parametric class of distributions
is generating the data.
• e.g., the class of all normal distributions, or the class of all gamma distributions.
Each such class is a family of distributions indexed by a finite number of parameters.
• e.g., the class of normal distributions is a family of distributions indexed by its mean
𝜇 ∈ (−∞, ∞) and standard deviation 𝜎 ∈ (0, ∞).
We’ll let the data pick out a particular element of the class by pinning down the parameters.
The parameter estimates so produced will be called maximum likelihood estimates.
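For the normal family this step has a closed form: the maximum likelihood estimates are the sample mean and the standard deviation computed with divisor n (not n − 1). A minimal sketch on simulated data, using `scipy.stats.norm.fit` (which returns the MLE of `loc` and `scale`); the true values 2 and 3 are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
data = rng.normal(loc=2.0, scale=3.0, size=10_000)   # draws from N(2, 3^2)

# norm.fit returns the maximum likelihood estimates (μ̂, σ̂)
μ_hat, σ_hat = norm.fit(data)

# Closed-form MLEs: sample mean and std with divisor n (ddof=0)
print(μ_hat, data.mean())
print(σ_hat, data.std(ddof=0))
```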

55.3.2 Counting Billionaires

Treisman [106] is interested in estimating the number of billionaires in different countries.


The number of billionaires is integer-valued.
Hence we consider distributions that take values only in the nonnegative integers.
(This is one reason least squares regression is not the best tool for the present problem, since the dependent variable in linear regression is not restricted to integer values.)
One integer-valued distribution is the Poisson distribution, whose probability mass function (pmf) is

$$f(y) = \frac{\mu^{y}}{y!} e^{-\mu}, \qquad y = 0, 1, 2, \ldots, \infty$$
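As a sanity check on the formula, the sketch below codes the pmf directly and compares it with `scipy.stats.poisson.pmf`; it also confirms numerically that the probabilities sum to (nearly) one and that the mean of the distribution equals μ. The value μ = 5 and the truncation of the infinite support at y = 100 are arbitrary choices.

```python
import numpy as np
from scipy.special import factorial
from scipy.stats import poisson

def poisson_pmf(y, μ):
    """Poisson pmf f(y) = μ^y e^{-μ} / y!"""
    return μ**y / factorial(y) * np.exp(-μ)

μ = 5.0
y = np.arange(0, 101)          # truncate the infinite support at 100

manual = poisson_pmf(y, μ)
reference = poisson.pmf(y, μ)

print(np.max(np.abs(manual - reference)))   # agreement with scipy
print(manual.sum())                          # total probability, close to 1
print((y * manual).sum())                    # mean, close to μ
```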

We can plot the Poisson distribution over $y$ for different values of $\mu$ as follows:

In [2]: poisson_pmf = lambda y, μ: μ**y / factorial(y) * exp(-μ)

y_values = range(0, 25)

fig, ax = plt.subplots(figsize=(12, 8))

for μ in [1, 5, 10]:
    distribution = []
    for y_i in y_values:
        distribution.append(poisson_pmf(y_i, μ))
    ax.plot(y_values,
            distribution,
            label=rf'$\mu$={μ}',
            alpha=0.5,
            marker='o',
            markersize=8)

ax.grid()
ax.set_xlabel('$y$', fontsize=14)
ax.set_ylabel(r'$f(y \mid \mu)$', fontsize=14)
ax.axis(xmin=0, ymin=0)
ax.legend(fontsize=14)

plt.show()

Notice that the Poisson distribution begins to resemble a normal distribution as the mean of
𝑦 increases.
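One way to make this precise: a Poisson(μ) variable has mean and variance both equal to μ, so a natural comparison is with the N(μ, μ) density evaluated at the integers. The sketch below (the grid of μ values and the range of y are arbitrary choices) computes the largest gap between the two and shows it shrinking as μ grows.

```python
import numpy as np
from scipy.stats import norm, poisson

def max_gap(μ):
    """Largest |Poisson(μ) pmf - N(μ, μ) pdf| over the integers 0, ..., μ + 10√μ."""
    y = np.arange(0, int(μ + 10 * np.sqrt(μ)) + 1)
    pmf = poisson.pmf(y, μ)
    approx = norm.pdf(y, loc=μ, scale=np.sqrt(μ))   # normal with mean μ, variance μ
    return np.max(np.abs(pmf - approx))

for μ in [1, 5, 10, 50, 100]:
    print(f'μ = {μ:>3}: max gap = {max_gap(μ):.4f}')
```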
Let’s have a look at the distribution of the data we’ll be working with in this lecture.
Treisman’s main source of data is Forbes’ annual rankings of billionaires and their estimated
net worth.
The dataset mle/fp.dta can be downloaded here or from its AER page.

In [3]: pd.options.display.max_columns = 10

# Load in data and view
df = pd.read_stata('https://github.com/QuantEcon/lecture-python/blob/master/source/_static/lecture_specific/mle/fp.dta?raw=true')
df.head()

Out[3]:          country  ccode    year    cyear  numbil  …   topint08     rintr  \
0  United States    2.0  1990.0  21990.0     NaN  …  39.799999  4.988405
1  United States    2.0  1991.0  21991.0     NaN  …  39.799999  4.988405
2  United States    2.0  1992.0  21992.0     NaN  …  39.799999  4.988405
3  United States    2.0  1993.0  21993.0     NaN  …  39.799999  4.988405
4  United States    2.0  1994.0  21994.0     NaN  …  39.799999  4.988405

   noyrs  roflaw  nrrents
0   20.0    1.61      NaN
1   20.0    1.61      NaN
2   20.0    1.61      NaN
3   20.0    1.61      NaN
4   20.0    1.61      NaN

[5 rows x 36 columns]

Using a histogram, we can view the distribution of the number of billionaires per country, numbil0, in 2008 (the United States is dropped for plotting purposes):

In [4]: numbil0_2008 = df[(df['year'] == 2008) & (
    df['country'] != 'United States')].loc[:, 'numbil0']

plt.subplots(figsize=(12, 8))
plt.hist(numbil0_2008, bins=30)
plt.xlim(left=0)
plt.grid()
plt.xlabel('Number of billionaires in 2008')
plt.ylabel('Count')
plt.show()

From the histogram, it appears that the Poisson assumption is not unreasonable (albeit with
a very low 𝜇 and some outliers).

55.4 Conditional Distributions

In Treisman’s paper, the dependent variable (the number of billionaires $y_i$ in country $i$) is modeled as a function of GDP per capita, population size, and years of membership in GATT and WTO.
Hence, the distribution of $y_i$ needs to be conditioned on the vector of explanatory variables $\mathbf{x}_i$.
The standard formulation, the so-called Poisson regression model, is as follows:

$$f(y_i \mid \mathbf{x}_i) = \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}; \qquad y_i = 0, 1, 2, \ldots, \infty. \tag{1}$$

where $\mu_i = \exp(\mathbf{x}_i'\beta) = \exp(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_k x_{ik})$

To illustrate the idea that the distribution of $y_i$ depends on $\mathbf{x}_i$, let’s run a simple simulation.
We use our poisson_pmf function from above and arbitrary values for $\beta$ and $\mathbf{x}_i$:

In [5]: y_values = range(0, 20)

# Define a parameter vector with estimates
β = np.array([0.26, 0.18, 0.25, -0.1, -0.22])

# Create some observations X
datasets = [np.array([0, 1, 1, 1, 2]),
            np.array([2, 3, 2, 4, 0]),
            np.array([3, 4, 5, 3, 2]),
            np.array([6, 5, 4, 4, 7])]

fig, ax = plt.subplots(figsize=(12, 8))

for X in datasets:
    μ = exp(X @ β)
    distribution = []
    for y_i in y_values:
        distribution.append(poisson_pmf(y_i, μ))
    ax.plot(y_values,
            distribution,
            label=rf'$\mu_i$={μ:.1}',
            marker='o',
            markersize=8,
            alpha=0.5)

ax.grid()
ax.legend()
ax.set_xlabel(r'$y \mid x_i$')
ax.set_ylabel(r'$f(y \mid x_i; \beta )$')
ax.axis(xmin=0, ymin=0)
plt.show()

We can see that the distribution of 𝑦𝑖 is conditional on x𝑖 (𝜇𝑖 is no longer constant).

55.5 Maximum Likelihood Estimation

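In our model for number of billionaires, the conditional distribution contains 4 parameters ($\beta_0, \ldots, \beta_3$, i.e. $k = 3$ regressors plus a constant) that we need to estimate.
We will label our entire parameter vector as $\beta$ where

$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}$$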

To estimate the model using MLE, we want to find the parameter value $\hat{\beta}$ under which the observed data are most likely.
Intuitively, we want to find the $\hat{\beta}$ that best fits our data.
First, we need to construct the likelihood function $\mathcal{L}(\beta)$, which has the same form as the joint probability mass function of the data.
Assume we have some data $\{y_1, y_2\}$ with each $y_i \sim f(y_i)$.
If $y_1$ and $y_2$ are independent, the joint pmf of these data is $f(y_1, y_2) = f(y_1) \cdot f(y_2)$.
If $y_i$ follows a Poisson distribution with $\mu = 7$, we can visualize the joint pmf like so:

In [6]: def plot_joint_poisson(μ=7, y_n=20):
    yi_values = np.arange(0, y_n, 1)

    # Create coordinate points of X and Y
    X, Y = np.meshgrid(yi_values, yi_values)

    # Multiply distributions together
    Z = poisson_pmf(X, μ) * poisson_pmf(Y, μ)

    fig = plt.figure(figsize=(12, 8))
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, Z.T, cmap='terrain', alpha=0.6)
    ax.scatter(X, Y, Z.T, color='black', alpha=0.5, linewidths=1)
    ax.set(xlabel='$y_1$', ylabel='$y_2$')
    ax.set_zlabel('$f(y_1, y_2)$', labelpad=10)
    plt.show()

plot_joint_poisson(μ=7, y_n=20)
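Numerically, independence means the joint pmf on a grid is just the outer product of the two marginal pmfs. The sketch below reuses μ = 7 and the grid 0 to 19 from the plot, with `scipy.stats.poisson.pmf` standing in for `poisson_pmf`; the total falls slightly short of one because the support is truncated at 19.

```python
import numpy as np
from scipy.stats import poisson

μ = 7
y_vals = np.arange(0, 20)
p = poisson.pmf(y_vals, μ)       # marginal pmf of y_1 (and of y_2)

# f(y1, y2) = f(y1) * f(y2) for independent draws
joint = np.outer(p, p)

print(joint.shape)               # (20, 20)
print(joint.sum())               # slightly below 1: mass above 19 is truncated
print(joint[7, 7], p[7] ** 2)    # the same number computed two ways
```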

Similarly, the joint pmf of our data (which is distributed as a conditional Poisson distribution) can be written as

$$f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta) = \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}$$

Here $y_i$ is conditional on both the values of $\mathbf{x}_i$ and the parameters $\beta$.

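The likelihood function is the same as the joint pmf, but treats the parameter $\beta$ as the argument of the function and takes the observations $(y_i, \mathbf{x}_i)$ as given:

$$\begin{aligned}
\mathcal{L}(\beta \mid y_1, y_2, \ldots, y_n; \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n) &= \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i} \\
&= f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta)
\end{aligned}$$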

Now that we have our likelihood function, we want to find the $\hat{\beta}$ that yields the maximum likelihood value

$$\max_{\beta} \mathcal{L}(\beta)$$

In doing so, it is generally easier to maximize the log-likelihood (consider differentiating $f(x) = x \exp(x)$ vs. $f(x) = \log(x) + x$).
Given that taking a logarithm is a monotone increasing transformation, a maximizer of the likelihood function will also be a maximizer of the log-likelihood function.
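A quick numerical illustration of this point, using made-up counts y = (2, 0, 3, 1) and an intercept-only model in which every observation has the same mean μ = exp(β): on a fine grid, the likelihood and the log-likelihood peak at the same β, and both agree with the closed form $\hat{\beta} = \log(\bar{y})$ that holds for this special case.

```python
import numpy as np
from scipy.special import gammaln

y = np.array([2, 0, 3, 1])             # made-up counts, for illustration only
β_grid = np.linspace(-2, 2, 4001)      # candidate values of the single parameter

def log_likelihood(β):
    μ = np.exp(β)                       # common mean for every observation
    # log y! computed as gammaln(y + 1)
    return np.sum(y * np.log(μ) - μ - gammaln(y + 1))

logL = np.array([log_likelihood(b) for b in β_grid])
L = np.exp(logL)                        # the likelihood itself

print(β_grid[np.argmax(L)], β_grid[np.argmax(logL)], np.log(y.mean()))
```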
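In our case the log-likelihood is

$$\begin{aligned}
\log \mathcal{L}(\beta) &= \log \left( f(y_1; \beta) \cdot f(y_2; \beta) \cdot \ldots \cdot f(y_n; \beta) \right) \\
&= \sum_{i=1}^{n} \log f(y_i; \beta) \\
&= \sum_{i=1}^{n} \log \left( \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i} \right) \\
&= \sum_{i=1}^{n} y_i \log \mu_i - \sum_{i=1}^{n} \mu_i - \sum_{i=1}^{n} \log y_i!
\end{aligned}$$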

The MLE $\hat{\beta}$ of the Poisson regression model can be obtained by solving

$$\max_{\beta} \left( \sum_{i=1}^{n} y_i \log \mu_i - \sum_{i=1}^{n} \mu_i - \sum_{i=1}^{n} \log y_i! \right)$$

However, no analytical solution exists to the above problem – to find the MLE we need to use
numerical methods.
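Although the maximizer has no closed form, the objective itself is easy to evaluate in vectorized form. A minimal sketch, assuming a design matrix `X` whose first column is a constant; the helper name `poisson_loglike` is hypothetical, and `gammaln(y + 1)` is used for $\log y_i!$ to avoid overflow in the factorial. As a cross-check it is compared with the sum of `scipy.stats.poisson.logpmf`, on the 5-observation example used later in this lecture.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

def poisson_loglike(β, y, X):
    """Poisson regression log-likelihood: Σ y_i log μ_i - Σ μ_i - Σ log y_i!"""
    η = X @ β                 # linear index x_i'β, so log μ_i = η
    μ = np.exp(η)
    return np.sum(y * η - μ - gammaln(y + 1))

X = np.array([[1, 2, 5],
              [1, 1, 3],
              [1, 4, 2],
              [1, 5, 2],
              [1, 3, 1]])
y = np.array([1, 0, 1, 1, 0])
β = np.array([0.1, 0.1, 0.1])

print(poisson_loglike(β, y, X))
print(poisson.logpmf(y, np.exp(X @ β)).sum())   # should match
```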

55.6 MLE with Numerical Methods

For many models, the maximum likelihood problem has no closed-form analytical solution, so numerical methods are required to find the parameter estimates.
One such numerical method is the Newton-Raphson algorithm.
Our goal is to find the maximum likelihood estimate $\hat{\beta}$.
At $\hat{\beta}$, the first derivative of the log-likelihood function will be equal to 0.

Let’s illustrate this by supposing

$$\log \mathcal{L}(\beta) = -(\beta - 10)^2 - 10$$

In [7]: β = np.linspace(1, 20)
logL = -(β - 10) ** 2 - 10
dlogL = -2 * β + 20

fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(12, 8))

ax1.plot(β, logL, lw=2)
ax2.plot(β, dlogL, lw=2)

ax1.set_ylabel(r'$\log \mathcal{L}(\beta)$',
               rotation=0,
               labelpad=35,
               fontsize=15)
ax2.set_ylabel(r'$\frac{d \log \mathcal{L}(\beta)}{d \beta}$ ',
               rotation=0,
               labelpad=35,
               fontsize=19)
ax2.set_xlabel(r'$\beta$', fontsize=15)
ax1.grid()
ax2.grid()
plt.axhline(c='black')
plt.show()


The plot shows that the maximum likelihood value (the top plot) occurs when $\frac{d \log \mathcal{L}(\beta)}{d \beta} = 0$ (the bottom plot).
Therefore, the likelihood is maximized when $\beta = 10$.
We can also ensure that this value is a maximum (as opposed to a minimum) by checking that the second derivative (slope of the bottom plot) is negative.
The Newton-Raphson algorithm finds a point where the first derivative is 0.
To use the algorithm, we take an initial guess at the maximum value, $\beta^{(0)}$ (the OLS parameter estimates might be a reasonable guess), then

1. Use the updating rule to iterate the algorithm

$$\beta^{(k+1)} = \beta^{(k)} - H^{-1}(\beta^{(k)}) G(\beta^{(k)})$$

where:

$$G(\beta^{(k)}) = \frac{d \log \mathcal{L}(\beta^{(k)})}{d \beta^{(k)}}, \qquad H(\beta^{(k)}) = \frac{d^2 \log \mathcal{L}(\beta^{(k)})}{d \beta^{(k)} \, d \beta^{(k)\prime}}$$

2. Check whether $|\beta^{(k+1)} - \beta^{(k)}| < tol$

• If true, then stop iterating and set $\hat{\beta} = \beta^{(k+1)}$
• If false, then update $\beta^{(k+1)}$

As can be seen from the updating equation, $\beta^{(k+1)} = \beta^{(k)}$ only when $G(\beta^{(k)}) = 0$, i.e. where the first derivative is equal to 0.
(In practice, we stop iterating when the difference is below a small tolerance threshold.)
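To see the updating rule in action, here is a minimal one-dimensional sketch applied to the illustrative log-likelihood $\log \mathcal{L}(\beta) = -(\beta - 10)^2 - 10$ from above, with $G(\beta) = -2\beta + 20$ and $H(\beta) = -2$ coded by hand and convergence judged on the absolute size of the step. Because this log-likelihood is exactly quadratic, a single Newton step lands on $\beta = 10$ from any starting point; the second iteration only confirms that the step has shrunk to zero. The starting value 3.0 and the tolerance are arbitrary.

```python
def G(β):
    return -2 * β + 20      # first derivative of -(β - 10)**2 - 10

def H(β):
    return -2.0             # second derivative (negative, so a maximum)

def newton_1d(β, tol=1e-8, max_iter=100):
    for k in range(max_iter):
        β_new = β - G(β) / H(β)          # β^(k+1) = β^(k) - H^{-1} G
        if abs(β_new - β) < tol:         # stop when the step is tiny
            return β_new, k + 1
        β = β_new
    return β, max_iter

β_hat, iterations = newton_1d(3.0)
print(β_hat, iterations)   # 10.0 2
```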
Let’s have a go at implementing the Newton-Raphson algorithm.
First, we’ll create a class called PoissonRegression so we can easily recompute the values of the log likelihood, gradient and Hessian for every iteration:

In [8]: class PoissonRegression:

    def __init__(self, y, X, β):
        self.X = X
        self.n, self.k = X.shape
        # Reshape y as a n_by_1 column vector
        self.y = y.reshape(self.n, 1)
        # Reshape β as a k_by_1 column vector
        self.β = β.reshape(self.k, 1)

    def μ(self):
        return np.exp(self.X @ self.β)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) - μ - np.log(factorial(y)))

    def G(self):
        X = self.X
        y = self.y
        μ = self.μ()
        return X.T @ (y - μ)

    def H(self):
        X = self.X
        μ = self.μ()
        return -(X.T @ (μ * X))
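The expressions coded in `G` and `H` follow from differentiating the log-likelihood derived earlier. Since $\log \mu_i = \mathbf{x}_i'\beta$, we have $\partial \mu_i / \partial \beta = \mu_i \mathbf{x}_i$, so:

$$\begin{aligned}
\log \mathcal{L}(\beta) &= \sum_{i=1}^{n} \left( y_i \mathbf{x}_i'\beta - \exp(\mathbf{x}_i'\beta) - \log y_i! \right) \\
G(\beta) = \frac{\partial \log \mathcal{L}}{\partial \beta} &= \sum_{i=1}^{n} (y_i - \mu_i) \, \mathbf{x}_i = X'(y - \mu) \\
H(\beta) = \frac{\partial^2 \log \mathcal{L}}{\partial \beta \, \partial \beta'} &= -\sum_{i=1}^{n} \mu_i \, \mathbf{x}_i \mathbf{x}_i' = -X' \operatorname{diag}(\mu) X
\end{aligned}$$

In the code, `μ * X` scales row $i$ of $X$ by $\mu_i$, which is exactly $\operatorname{diag}(\mu) X$. Because every $\mu_i > 0$, $H$ is negative semidefinite, so $\log \mathcal{L}$ is concave in $\beta$ and any point where $G = 0$ is a global maximum.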

Our function newton_raphson will take a PoissonRegression object that has an initial guess of the parameter vector $\beta^{(0)}$.
The algorithm will update the parameter vector according to the updating rule, and recalculate the gradient and Hessian matrices at the new parameter estimates.
Iteration will end when either:
• The difference between the parameter and the updated parameter is below a tolerance level.
• The maximum number of iterations has been reached (meaning convergence is not achieved).
So that we can follow what the algorithm is doing while it runs, an option display=True is added to print out values at each iteration.

In [9]: def newton_raphson(model, tol=1e-3, max_iter=1000, display=True):

    i = 0
    error = 100  # Initial error value

    # Print header of output
    if display:
        header = f'{"Iteration_k":<13}{"Log-likelihood":<16}{"θ":<60}'
        print(header)
        print("-" * len(header))

    # While loop runs while any value in error is greater
    # than the tolerance until max iterations are reached
    while np.any(error > tol) and i < max_iter:
        H, G = model.H(), model.G()
        β_new = model.β - (np.linalg.inv(H) @ G)
        error = β_new - model.β
        model.β = β_new

        # Print iterations
        if display:
            β_list = [f'{t:.3}' for t in list(model.β.flatten())]
            update = f'{i:<13}{model.logL():<16.8}{β_list}'
            print(update)

        i += 1

    print(f'Number of iterations: {i}')
    print(f'β_hat = {model.β.flatten()}')

    # Return a flat array for β (instead of a k_by_1 column vector)
    return model.β.flatten()

Let’s try out our algorithm with a small dataset of 5 observations and 3 variables in X.

In [10]: X = np.array([[1, 2, 5],
               [1, 1, 3],
               [1, 4, 2],
               [1, 5, 2],
               [1, 3, 1]])

y = np.array([1, 0, 1, 1, 0])

# Take a guess at initial βs
init_β = np.array([0.1, 0.1, 0.1])

# Create an object with Poisson model values
poi = PoissonRegression(y, X, β=init_β)

# Use newton_raphson to find the MLE
β_hat = newton_raphson(poi, display=True)

Iteration_k  Log-likelihood  θ
-----------------------------------------------------------------------------------------
0            -4.3447622      ['-1.49', '0.265', '0.244']
1            -3.5742413      ['-3.38', '0.528', '0.474']
2            -3.3999526      ['-5.06', '0.782', '0.702']
3            -3.3788646      ['-5.92', '0.909', '0.82']
4            -3.3783559      ['-6.07', '0.933', '0.843']
5            -3.3783555      ['-6.08', '0.933', '0.843']
Number of iterations: 6
β_hat = [-6.07848205  0.93340226  0.84329625]

As this was a simple model with few observations, the algorithm achieved convergence in only
6 iterations.
You can see that with each iteration, the log-likelihood value increased.
Remember, our objective was to maximize the log-likelihood function, which the algorithm
has worked to achieve.
Also, note that the increase in $\log \mathcal{L}(\beta^{(k)})$ becomes smaller with each iteration.
This is because the gradient is approaching 0 as we reach the maximum, and therefore the step $H^{-1}(\beta^{(k)}) G(\beta^{(k)})$ in our updating equation is becoming smaller.
The gradient vector should be close to 0 at $\hat{\beta}$:

In [11]: poi.G()

Out[11]: array([[-3.95169228e-07],
       [-1.00114805e-06],
       [-7.73114562e-07]])
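One way to double-check the hand-derived gradient is to compare it with a central finite-difference approximation of the log-likelihood. The sketch below is self-contained: it re-creates the 5-observation data and codes the log-likelihood and analytic gradient as plain functions (`loglike`, `grad` and `numerical_grad` are hypothetical helpers, not the class methods). The evaluation point β = (0.1, 0.1, 0.1) and the step size `h` are arbitrary.

```python
import numpy as np
from scipy.special import gammaln

X = np.array([[1, 2, 5],
              [1, 1, 3],
              [1, 4, 2],
              [1, 5, 2],
              [1, 3, 1]], dtype=float)
y = np.array([1, 0, 1, 1, 0], dtype=float)

def loglike(β):
    η = X @ β
    return np.sum(y * η - np.exp(η) - gammaln(y + 1))

def grad(β):
    return X.T @ (y - np.exp(X @ β))      # analytic gradient X'(y - μ)

def numerical_grad(f, β, h=1e-6):
    g = np.zeros_like(β)
    for j in range(len(β)):
        e = np.zeros_like(β)
        e[j] = h
        g[j] = (f(β + e) - f(β - e)) / (2 * h)   # central difference
    return g

β = np.array([0.1, 0.1, 0.1])
print(grad(β))
print(numerical_grad(loglike, β))
```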

The iterative process can be visualized in the following diagram, where the maximum is found at $\beta = 10$:

In [12]: logL = lambda x: -(x - 10) ** 2 - 10

def find_tangent(β, a=0.01):
    y1 = logL(β)
    y2 = logL(β+a)
    x = np.array([[β, 1], [β+a, 1]])
    m, c = np.linalg.lstsq(x, np.array([y1, y2]), rcond=None)[0]
    return m, c

β = np.linspace(2, 18)
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(β, logL(β), lw=2, c='black')

for β in [7, 8.5, 9.5, 10]:
    β_line = np.linspace(β-2, β+2)
    m, c = find_tangent(β)
    y = m * β_line + c
    ax.plot(β_line, y, '-', c='purple', alpha=0.8)
    ax.text(β+2.05, y[-1], f'$G({β}) = {abs(m):.0f}$', fontsize=12)
    ax.vlines(β, -24, logL(β), linestyles='--', alpha=0.5)
    ax.hlines(logL(β), 6, β, linestyles='--', alpha=0.5)

ax.set(ylim=(-24, -4), xlim=(6, 13))
ax.set_xlabel(r'$\beta$', fontsize=15)
ax.set_ylabel(r'$\log \mathcal{L}(\beta)$',
              rotation=0,
              labelpad=25,
              fontsize=15)
ax.grid(alpha=0.3)
plt.show()


Note that our implementation of the Newton-Raphson algorithm is rather basic; for more robust implementations see, for example, scipy.optimize.
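For comparison, here is a sketch that hands the same 5-observation problem to `scipy.optimize.minimize`. Since `minimize` minimizes, it is given the negative log-likelihood and its gradient; the choice of BFGS and the tight `gtol` are assumptions, not the only reasonable settings. The estimates should agree with the Newton-Raphson output above (about −6.078, 0.933, 0.843) up to the optimizer's tolerance.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

X = np.array([[1, 2, 5],
              [1, 1, 3],
              [1, 4, 2],
              [1, 5, 2],
              [1, 3, 1]], dtype=float)
y = np.array([1, 0, 1, 1, 0], dtype=float)

def neg_loglike(β):
    η = X @ β
    return -np.sum(y * η - np.exp(η) - gammaln(y + 1))

def neg_grad(β):
    return -(X.T @ (y - np.exp(X @ β)))   # minus the Poisson gradient X'(y - μ)

res = minimize(neg_loglike, x0=np.array([0.1, 0.1, 0.1]), jac=neg_grad,
               method='BFGS', options={'gtol': 1e-8})
print(res.x, -res.fun)
```

A quasi-Newton method such as BFGS builds up its own Hessian approximation, so it only needs the gradient, which makes it a convenient fallback when the analytic Hessian is tedious to derive.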

55.7 Maximum Likelihood Estimation with statsmodels

Now that we know what’s going on under the hood, we can apply MLE to an interesting application.
We’ll use the Poisson regression model in statsmodels to obtain a richer output with standard errors, test values, and more.
statsmodels uses the same algorithm as above to find the maximum likelihood estimates.
Before we begin, let’s re-estimate our simple model with statsmodels to confirm we obtain the same coefficients and log-likelihood value.

In [13]: X = np.array([[1, 2, 5],
               [1, 1, 3],
               [1, 4, 2],
               [1, 5, 2],
               [1, 3, 1]])

y = np.array([1, 0, 1, 1, 0])

stats_poisson = Poisson(y, X).fit()
print(stats_poisson.summary())

Optimization terminated successfully.
         Current function value: 0.675671
         Iterations 7
                          Poisson Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                    5
Model:                        Poisson   Df Residuals:                        2
Method:                           MLE   Df Model:                            2
Date:                Fri, 07 Aug 2020   Pseudo R-squ.:                  0.2546
Time:                        00:18:41   Log-Likelihood:                -3.3784
converged:                       True   LL-Null:                       -4.5325
Covariance Type:            nonrobust   LLR p-value:                    0.3153
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -6.0785      5.279     -1.151      0.250     -16.425       4.268
x1             0.9334      0.829      1.126      0.260      -0.691       2.558
x2             0.8433      0.798      1.057      0.291      -0.720       2.407
==============================================================================

Now let’s replicate results from Daniel Treisman’s paper, Russia’s Billionaires, mentioned earlier in the lecture.
Treisman starts by estimating equation (1), where:
• $y_i$ is $\text{number of billionaires}_i$
• $x_{i1}$ is $\log \text{GDP per capita}_i$
• $x_{i2}$ is $\log \text{population}_i$
• $x_{i3}$ is $\text{years in GATT}_i$, the years of membership in GATT and WTO (to proxy access to international markets)
The paper only considers the year 2008 for estimation.
We will set up our variables for estimation like so (you should have the data assigned to df from earlier in the lecture):

In [14]: # Keep only year 2008 (copy so that adding columns does not modify a view)
df = df[df['year'] == 2008].copy()

# Add a constant
df['const'] = 1

# Variable sets
reg1 = ['const', 'lngdppc', 'lnpop', 'gattwto08']
reg2 = ['const', 'lngdppc', 'lnpop',
        'gattwto08', 'lnmcap08', 'rintr', 'topint08']
reg3 = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08',
        'rintr', 'topint08', 'nrrents', 'roflaw']

Then we can use the Poisson function from statsmodels to fit the model.
We’ll use robust standard errors as in the author’s paper:

In [15]: # Specify model
poisson_reg = sm.Poisson(df[['numbil0']], df[reg1],
                         missing='drop').fit(cov_type='HC0')
print(poisson_reg.summary())

Optimization terminated successfully.
         Current function value: 2.226090
         Iterations 9
                          Poisson Regression Results
==============================================================================
Dep. Variable:                numbil0   No. Observations:                  197
Model:                        Poisson   Df Residuals:                      193
Method:                           MLE   Df Model:                            3
Date:                Fri, 07 Aug 2020   Pseudo R-squ.:                  0.8574
Time:                        00:18:41   Log-Likelihood:                -438.54
converged:                       True   LL-Null:                       -3074.7
Covariance Type:                  HC0   LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        -29.0495      2.578    -11.268      0.000     -34.103     -23.997
lngdppc        1.0839      0.138      7.834      0.000       0.813       1.355
lnpop          1.1714      0.097     12.024      0.000       0.980       1.362
gattwto08      0.0060      0.007      0.868      0.386      -0.008       0.019
==============================================================================

Success! The algorithm was able to achieve convergence in 9 iterations.


Our output indicates that GDP per capita and population are positively and significantly related to the number of billionaires a country has, as expected. The coefficient on years of membership in the General Agreement on Tariffs and Trade (GATT) is also positive, but it is not statistically significant (p = 0.386).
Let’s also estimate the author’s more full-featured models and display them in a single table:

In [16]: regs = [reg1, reg2, reg3]
reg_names = ['Model 1', 'Model 2', 'Model 3']
info_dict = {'Pseudo R-squared': lambda x: f"{x.prsquared:.2f}",
             'No. observations': lambda x: f"{int(x.nobs):d}"}
regressor_order = ['const',
                   'lngdppc',
                   'lnpop',
                   'gattwto08',
                   'lnmcap08',
                   'rintr',
                   'topint08',
                   'nrrents',
                   'roflaw']
results = []

for reg in regs:
    result = sm.Poisson(df[['numbil0']], df[reg],
                        missing='drop').fit(cov_type='HC0',
                                            maxiter=100, disp=0)
    results.append(result)

results_table = summary_col(results=results,
                            float_format='%0.3f',
                            stars=True,
                            model_names=reg_names,
                            info_dict=info_dict,
                            regressor_order=regressor_order)
results_table.add_title('Table 1 - Explaining the Number of Billionaires in 2008')
print(results_table)

Table 1 - Explaining the Number of Billionaires in 2008
=================================================
                  Model 1    Model 2    Model 3
-------------------------------------------------
const            -29.050*** -19.444*** -20.858***
                 (2.578)    (4.820)    (4.255)
lngdppc          1.084***   0.717***   0.737***
                 (0.138)    (0.244)    (0.233)
lnpop            1.171***   0.806***   0.929***
                 (0.097)    (0.213)    (0.195)
gattwto08        0.006      0.007      0.004
                 (0.007)    (0.006)    (0.006)
lnmcap08                    0.399**    0.286*
                            (0.172)    (0.167)
rintr                       -0.010     -0.009
                            (0.010)    (0.010)
topint08                    -0.051***  -0.058***
                            (0.011)    (0.012)
nrrents                                -0.005
                                       (0.010)
roflaw                                 0.203
                                       (0.372)
Pseudo R-squared 0.86       0.90       0.90
No. observations 197        131        131
=================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

The output suggests that the frequency of billionaires is positively correlated with GDP per capita, population size, and stock market capitalization, and negatively correlated with the top marginal income tax rate.
To analyze our results by country, we can compute the difference between the actual and predicted values, sort from highest to lowest, and plot the first 15:

In [17]: data = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08', 'rintr',
        'topint08', 'nrrents', 'roflaw', 'numbil0', 'country']
results_df = df[data].dropna().copy()

# Use last model (model 3)
results_df['prediction'] = results[-1].predict()

# Calculate difference
results_df['difference'] = results_df['numbil0'] - results_df['prediction']

# Sort in descending order
results_df.sort_values('difference', ascending=False, inplace=True)

# Plot the first 15 data points
results_df[:15].plot('country', 'difference', kind='bar',
                     figsize=(12,8), legend=False)
plt.ylabel('Number of billionaires above predicted level')
plt.xlabel('Country')
plt.show()

As we can see, Russia has by far the highest number of billionaires in excess of what is predicted by the model (around 50 more than expected).
Treisman uses this empirical result to discuss possible reasons for Russia’s excess of billionaires, including the origination of wealth in Russia, the political climate, and the history of privatization in the years after the USSR.

55.8 Summary

In this lecture, we used Maximum Likelihood Estimation to estimate the parameters of a Poisson model.
statsmodels contains other built-in likelihood models such as Probit and Logit.
For further flexibility, statsmodels provides a way to specify the distribution manually using the GenericLikelihoodModel class; an example notebook can be found here.
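As a rough sketch of how this might look, the class below re-implements the Poisson regression by overriding `loglike` in a subclass of `GenericLikelihoodModel` (the class name `MyPoisson` is hypothetical). statsmodels then handles the optimization and, since no analytic score or Hessian is supplied, falls back on numerical derivatives; the optimizer choice (`method='bfgs'`) and the starting values are assumptions. On the 5-observation data, the estimates should be close to those from `Poisson` above.

```python
import numpy as np
from scipy.special import gammaln
from statsmodels.base.model import GenericLikelihoodModel

class MyPoisson(GenericLikelihoodModel):
    """Hypothetical user-defined Poisson regression via a custom log-likelihood."""

    def loglike(self, params):
        η = self.exog @ params                 # linear index x_i'β
        return np.sum(self.endog * η - np.exp(η) - gammaln(self.endog + 1))

X = np.array([[1, 2, 5],
              [1, 1, 3],
              [1, 4, 2],
              [1, 5, 2],
              [1, 3, 1]], dtype=float)
y = np.array([1, 0, 1, 1, 0], dtype=float)

result = MyPoisson(y, X).fit(start_params=np.array([0.1, 0.1, 0.1]),
                             method='bfgs', maxiter=1000, disp=0)
print(result.params)
```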

55.9 Exercises

55.9.1 Exercise 1

Suppose we wanted to estimate the probability of an event $y_i$ occurring, given some observations.
We could use a probit regression model, where the pmf of $y_i$ is

$$f(y_i; \beta) = \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}, \qquad y_i = 0, 1$$

where $\mu_i = \Phi(\mathbf{x}_i'\beta)$.
$\Phi$ represents the cumulative standard normal distribution function and constrains the predicted $y_i$ to be between 0 and 1 (as required for a probability).
$\beta$ is a vector of coefficients.
Following the example in the lecture, write a class to represent the Probit model.
To begin, find the log-likelihood function and derive the gradient and Hessian.
The scipy module stats.norm contains the functions needed to compute the cdf and pdf of the normal distribution.

55.9.2 Exercise 2

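Use the following dataset and initial values of $\beta$ to estimate the MLE with the Newton-Raphson algorithm developed earlier in the lecture:

$$\mathbf{X} = \begin{bmatrix} 1 & 2 & 4 \\ 1 & 1 & 1 \\ 1 & 4 & 3 \\ 1 & 5 & 6 \\ 1 & 3 & 5 \end{bmatrix} \quad y = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} \quad \beta^{(0)} = \begin{bmatrix} 0.1 \\ 0.1 \\ 0.1 \end{bmatrix}$$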

Verify your results with statsmodels; you can import the Probit function with the following import statement:

In [18]: from statsmodels.discrete.discrete_model import Probit

Note that the simple Newton-Raphson algorithm developed in this lecture is very sensitive to
initial values, and therefore you may fail to achieve convergence with different starting values.

55.10 Solutions

55.10.1 Exercise 1

The log-likelihood can be written as

$$\log \mathcal{L} = \sum_{i=1}^{n} \left[ y_i \log \Phi(\mathbf{x}_i'\beta) + (1 - y_i) \log \left(1 - \Phi(\mathbf{x}_i'\beta)\right) \right]$$

Using the fundamental theorem of calculus, the derivative of a cumulative distribution function is its probability density function:

$$\frac{\partial}{\partial s} \Phi(s) = \phi(s)$$

where $\phi$ is the standard normal probability density function.


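The gradient vector of the Probit model is

$$\frac{\partial \log \mathcal{L}}{\partial \beta} = \sum_{i=1}^{n} \left[ y_i \frac{\phi(\mathbf{x}_i'\beta)}{\Phi(\mathbf{x}_i'\beta)} - (1 - y_i) \frac{\phi(\mathbf{x}_i'\beta)}{1 - \Phi(\mathbf{x}_i'\beta)} \right] \mathbf{x}_i$$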
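Before moving on to the Hessian, the gradient formula can be sanity-checked numerically. The self-contained sketch below (with hypothetical helper names `probit_loglike`, `probit_grad` and `central_diff`) evaluates it on the Exercise 2 data at β = (0.1, 0.1, 0.1) and compares it with a central finite-difference derivative of the log-likelihood.

```python
import numpy as np
from scipy.stats import norm

X = np.array([[1, 2, 4],
              [1, 1, 1],
              [1, 4, 3],
              [1, 5, 6],
              [1, 3, 5]], dtype=float)
y = np.array([1, 0, 1, 1, 0], dtype=float)

def probit_loglike(β):
    Φ = norm.cdf(X @ β)
    return np.sum(y * np.log(Φ) + (1 - y) * np.log(1 - Φ))

def probit_grad(β):
    z = X @ β
    Φ, ϕ = norm.cdf(z), norm.pdf(z)
    weights = y * ϕ / Φ - (1 - y) * ϕ / (1 - Φ)   # bracketed term in the formula
    return X.T @ weights                            # Σ_i weight_i x_i

def central_diff(f, β, h=1e-6):
    # One central difference per coordinate direction e_j
    return np.array([(f(β + h * e) - f(β - h * e)) / (2 * h)
                     for e in np.eye(len(β))])

β = np.array([0.1, 0.1, 0.1])
print(probit_grad(β))
print(central_diff(probit_loglike, β))
```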

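The Hessian of the Probit model is

$$\frac{\partial^2 \log \mathcal{L}}{\partial \beta \, \partial \beta'} = -\sum_{i=1}^{n} \phi(\mathbf{x}_i'\beta) \left[ y_i \frac{\phi(\mathbf{x}_i'\beta) + \mathbf{x}_i'\beta \, \Phi(\mathbf{x}_i'\beta)}{[\Phi(\mathbf{x}_i'\beta)]^2} + (1 - y_i) \frac{\phi(\mathbf{x}_i'\beta) - \mathbf{x}_i'\beta \, (1 - \Phi(\mathbf{x}_i'\beta))}{[1 - \Phi(\mathbf{x}_i'\beta)]^2} \right] \mathbf{x}_i \mathbf{x}_i'$$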

Using these results, we can write a class for the Probit model as follows:

In [19]: class ProbitRegression:

    def __init__(self, y, X, β):
        self.X, self.y, self.β = X, y, β
        self.n, self.k = X.shape

    def μ(self):
        return norm.cdf(self.X @ self.β.T)

    def ϕ(self):
        return norm.pdf(self.X @ self.β.T)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) + (1 - y) * np.log(1 - μ))

    def G(self):
        X, y = self.X, self.y
        μ = self.μ()
        ϕ = self.ϕ()
        return np.sum((X.T * y * ϕ / μ - X.T * (1 - y) * ϕ / (1 - μ)),
                      axis=1)

    def H(self):
        X, y = self.X, self.y
        β = self.β
        μ = self.μ()
        ϕ = self.ϕ()
        a = (ϕ + (X @ β.T) * μ) / μ**2
        b = (ϕ - (X @ β.T) * (1 - μ)) / (1 - μ)**2
        return -(ϕ * (y * a + (1 - y) * b) * X.T) @ X

55.10.2 Exercise 2

In [20]: X = np.array([[1, 2, 4],
               [1, 1, 1],
               [1, 4, 3],
               [1, 5, 6],
               [1, 3, 5]])

y = np.array([1, 0, 1, 1, 0])

# Take a guess at initial βs
β = np.array([0.1, 0.1, 0.1])

# Create instance of Probit regression class
prob = ProbitRegression(y, X, β)

# Run Newton-Raphson algorithm
newton_raphson(prob)

Iteration_k  Log-likelihood  θ
-----------------------------------------------------------------------------------------
0            -2.3796884      ['-1.34', '0.775', '-0.157']
1            -2.3687526      ['-1.53', '0.775', '-0.0981']
2            -2.3687294      ['-1.55', '0.778', '-0.0971']
3            -2.3687294      ['-1.55', '0.778', '-0.0971']
Number of iterations: 4
β_hat = [-1.54625858  0.77778952 -0.09709757]

Out[20]: array([-1.54625858,  0.77778952, -0.09709757])

In [21]: # Use statsmodels to verify results
print(Probit(y, X).fit().summary())

Optimization terminated successfully.
         Current function value: 0.473746
         Iterations 6
                          Probit Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                    5
Model:                         Probit   Df Residuals:                        2
Method:                           MLE   Df Model:                            2
Date:                Fri, 07 Aug 2020   Pseudo R-squ.:                  0.2961
Time:                        00:18:41   Log-Likelihood:                -2.3687
converged:                       True   LL-Null:                       -3.3651
Covariance Type:            nonrobust   LLR p-value:                    0.3692
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -1.5463      1.866     -0.829      0.407      -5.204       2.111
x1             0.7778      0.788      0.986      0.324      -0.768       2.323
x2            -0.0971      0.590     -0.165      0.869      -1.254       1.060
==============================================================================
Bibliography

[1] Daron Acemoglu, Simon Johnson, and James A Robinson. The colonial origins of comparative development: An empirical investigation. The American Economic Review, 91(5):1369–1401, 2001.

[2] Daron Acemoglu and James A. Robinson. The political economy of the Kuznets curve. Review of Development Economics, 6(2):183–203, 2002.

[3] SeHyoun Ahn, Greg Kaplan, Benjamin Moll, Thomas Winberry, and Christian Wolf. When inequality matters for macro and macro matters for inequality. NBER Macroeconomics Annual, 32(1):1–75, 2018.

[4] S Rao Aiyagari. Uninsured Idiosyncratic Risk and Aggregate Saving. The Quarterly Journal of Economics, 109(3):659–684, 1994.

[5] B. D. O. Anderson and J. B. Moore. Optimal Filtering. Dover Publications, 2005.

[6] E. W. Anderson, L. P. Hansen, E. R. McGrattan, and T. J. Sargent. Mechanics of Forming and Estimating Dynamic Linear Economies. In Handbook of Computational Economics. Elsevier, vol 1 edition, 1996.

[7] Robert L Axtell. Zipf distribution of US firm sizes. Science, 293(5536):1818–1820, 2001.

[8] Robert J Barro. On the Determination of the Public Debt. Journal of Political Economy, 87(5):940–971, 1979.

[9] Jess Benhabib and Alberto Bisin. Skewed wealth distributions: Theory and empirics. Journal of Economic Literature, 56(4):1261–91, 2018.

[10] Jess Benhabib, Alberto Bisin, and Shenghao Zhu. The wealth distribution in Bewley economies with capital income risk. Journal of Economic Theory, 159:489–515, 2015.

[11] L M Benveniste and J A Scheinkman. On the Differentiability of the Value Function in Dynamic Models of Economics. Econometrica, 47(3):727–732, 1979.

[12] Dmitri Bertsekas. Dynamic Programming and Stochastic Control. Academic Press, New York, 1975.

[13] Truman Bewley. The permanent income hypothesis: A theoretical formulation. Journal of Economic Theory, 16(2):252–292, 1977.

[14] Truman F Bewley. Stationary monetary equilibrium with a continuum of independently fluctuating consumers. In Werner Hildenbrand and Andreu Mas-Colell, editors, Contributions to Mathematical Economics in Honor of Gerard Debreu, pages 27–102. North-Holland, Amsterdam, 1986.

931
932 BIBLIOGRAPHY

[15] Anmol Bhandari, David Evans, Mikhail Golosov, and Thomas J Sargent. Inequality,
business cycles, and monetary-fiscal policy. Technical report, National Bureau of Eco-
nomic Research, 2018.

[16] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[17] Dariusz Buraczewski, Ewa Damek, Thomas Mikosch, et al. Stochastic models with
power-law tails. Springer, 2016.

[18] Philip Cagan. The monetary dynamics of hyperinflation. In Milton Friedman, editor,
Studies in the Quantity Theory of Money, pages 25–117. University of Chicago Press,
Chicago, 1956.

[19] Andrew S Caplin. The variability of aggregate demand with (S, s) inventory policies.
Econometrica, pages 1395–1409, 1985.

[20] Christopher D Carroll. A Theory of the Consumption Function, with and without
Liquidity Constraints. Journal of Economic Perspectives, 15(3):23–45, 2001.

[21] Christopher D Carroll. The method of endogenous gridpoints for solving dynamic
stochastic optimization problems. Economics Letters, 91(3):312–320, 2006.

[22] David Cass. Optimum growth in an aggregative model of capital accumulation. Review
of Economic Studies, 32(3):233–240, 1965.

[23] Wilbur John Coleman. Solving the Stochastic Growth Model by Policy-Function
Iteration. Journal of Business & Economic Statistics, 8(1):27–29, 1990.

[24] Steven J Davis, R Jason Faberman, and John Haltiwanger. The flow approach to labor
markets: New data sources, micro-macro links and the recent downturn. Journal of
Economic Perspectives, 2006.

[25] Bruno de Finetti. La prévision: ses lois logiques, ses sources subjectives. Annales de
l’Institut Henri Poincaré, 7:1–68, 1937. English translation in Kyburg and Smokler
(eds.), Studies in Subjective Probability, Wiley, New York, 1964.

[26] Angus Deaton. Saving and Liquidity Constraints. Econometrica, 59(5):1221–1248, 1991.

[27] Angus Deaton and Christina Paxson. Intertemporal Choice and Inequality. Journal of
Political Economy, 102(3):437–467, 1994.

[28] Wouter J Den Haan. Comparison of solutions to the incomplete markets model with
aggregate uncertainty. Journal of Economic Dynamics and Control, 34(1):4–27, 2010.

[29] Ulrich Doraszelski and Mark Satterthwaite. Computable Markov-perfect industry
dynamics. The RAND Journal of Economics, 41(2):215–243, 2010.

[30] Ye Du, Ehud Lehrer, and Ady Pauzner. Competitive economy as a ranking device
over networks. Submitted, 2013.

[31] R M Dudley. Real Analysis and Probability. Cambridge Studies in Advanced
Mathematics. Cambridge University Press, 2002.

[32] Timothy Dunne, Mark J Roberts, and Larry Samuelson. The growth and failure of U.S.
manufacturing plants. The Quarterly Journal of Economics, 104(4):671–698, 1989.

[33] Robert F Engle and Clive W J Granger. Co-integration and Error Correction:
Representation, Estimation, and Testing. Econometrica, 55(2):251–276, 1987.

[34] Richard Ericson and Ariel Pakes. Markov-perfect industry dynamics: A framework for
empirical work. The Review of Economic Studies, 62(1):53–82, 1995.

[35] David S Evans. The relationship between firm growth, size, and age: Estimates for 100
manufacturing industries. The Journal of Industrial Economics, pages 567–581, 1987.

[36] G W Evans and S Honkapohja. Learning and Expectations in Macroeconomics.
Frontiers of Economic Research. Princeton University Press, 2001.

[37] Pablo Fajgelbaum, Edouard Schaal, and Mathieu Taschereau-Dumouchel. Uncertainty
traps. Technical report, National Bureau of Economic Research, 2015.

[38] M. Friedman. A Theory of the Consumption Function. Princeton University Press, 1956.

[39] Milton Friedman and Rose D Friedman. Two Lucky People. University of Chicago
Press, 1998.

[40] Yoshi Fujiwara, Corrado Di Guilmi, Hideaki Aoyama, Mauro Gallegati, and Wataru
Souma. Do Pareto–Zipf and Gibrat laws hold true? An analysis with European firms.
Physica A: Statistical Mechanics and its Applications, 335(1-2):197–216, 2004.

[41] Xavier Gabaix. Power laws in economics: An introduction. Journal of Economic
Perspectives, 30(1):185–206, 2016.

[42] Robert Gibrat. Les inégalités économiques: Applications d’une loi nouvelle, la loi de
l’effet proportionnel. PhD thesis, Recueil Sirey, 1931.

[43] Edward Glaeser, Jose Scheinkman, and Andrei Shleifer. The injustice of inequality.
Journal of Monetary Economics, 50(1):199–222, 2003.

[44] Geoffrey J Gordon. Stable function approximation in dynamic programming. In
Machine Learning Proceedings 1995, pages 261–268. Elsevier, 1995.

[45] Olle Häggström. Finite Markov chains and algorithmic applications, volume 52.
Cambridge University Press, 2002.

[46] Bronwyn H Hall. The relationship between firm size and firm growth in the U.S.
manufacturing sector. The Journal of Industrial Economics, pages 583–606, 1987.

[47] Robert E Hall. Stochastic Implications of the Life Cycle-Permanent Income Hypothesis:
Theory and Evidence. Journal of Political Economy, 86(6):971–987, 1978.

[48] Robert E Hall and Frederic S Mishkin. The Sensitivity of Consumption to Transitory
Income: Estimates from Panel Data on Households. National Bureau of Economic
Research Working Paper Series, No. 505, 1982.

[49] James D Hamilton. What’s real about the business cycle? Federal Reserve Bank of St.
Louis Review, (July-August):435–452, 2005.

[50] L P Hansen and T J Sargent. Robustness. Princeton University Press, 2008.

[51] L P Hansen and T J Sargent. Recursive Models of Dynamic Linear Economies. The
Gorman Lectures in Economics. Princeton University Press, 2013.

[52] Lars Peter Hansen and Scott F Richard. The Role of Conditioning Information in
Deducing Testable. Econometrica, 55(3):587–613, May 1987.

[53] J. Michael Harrison and David M. Kreps. Speculative investor behavior in a stock
market with heterogeneous expectations. The Quarterly Journal of Economics,
92(2):323–336, 1978.

[54] J. Michael Harrison and David M. Kreps. Martingales and arbitrage in multiperiod
securities markets. Journal of Economic Theory, 20(3):381–408, June 1979.

[55] John Heaton and Deborah J Lucas. Evaluating the effects of incomplete markets on risk
sharing and asset pricing. Journal of Political Economy, pages 443–487, 1996.

[56] O Hernandez-Lerma and J B Lasserre. Discrete-Time Markov Control Processes: Basic
Optimality Criteria. Number Vol 1 in Applications of Mathematics Stochastic Modelling
and Applied Probability. Springer, 1996.

[57] Hugo A Hopenhayn. Entry, exit, and firm dynamics in long run equilibrium.
Econometrica: Journal of the Econometric Society, pages 1127–1150, 1992.

[58] Hugo A Hopenhayn and Edward C Prescott. Stochastic Monotonicity and Stationary
Distributions for Dynamic Economies. Econometrica, 60(6):1387–1406, 1992.

[59] Mark Huggett. The risk-free rate in heterogeneous-agent incomplete-insurance
economies. Journal of Economic Dynamics and Control, 17(5-6):953–969, 1993.

[60] K Jänich. Linear Algebra. Springer Undergraduate Texts in Mathematics and
Technology. Springer, 1994.

[61] John Y. Campbell and Robert J. Shiller. The Dividend-Price Ratio and Expectations of
Future Dividends and Discount Factors. Review of Financial Studies, 1(3):195–228,
1988.

[62] Boyan Jovanovic. Firm-specific capital and turnover. Journal of Political Economy,
87(6):1246–1260, 1979.

[63] K L Judd. Cournot versus Bertrand: A dynamic resolution. Technical report, Hoover
Institution, Stanford University, 1990.

[64] Takashi Kamihigashi. Elementary results on solutions to the Bellman equation of
dynamic programming: existence, uniqueness, and convergence. Technical report, Kobe
University, 2012.

[65] Illenin Kondo, Logan T Lewis, and Andrea Stella. On the U.S. firm and establishment
size distributions. Technical report, SSRN, 2018.

[66] Tjalling C. Koopmans. On the concept of optimal economic growth. In Tjalling C.
Koopmans, editor, The Econometric Approach to Development Planning, pages 225–287.
Chicago, 1965.

[67] David M. Kreps. Notes on the Theory of Choice. Westview Press, Boulder, Colorado,
1988.

[68] Moritz Kuhn. Recursive Equilibria In An Aiyagari-Style Economy With Permanent
Income Shocks. International Economic Review, 54:807–835, 2013.

[69] Martin Lettau and Sydney Ludvigson. Consumption, Aggregate Wealth, and Expected
Stock Returns. Journal of Finance, 56(3):815–849, June 2001.

[70] Martin Lettau and Sydney C. Ludvigson. Understanding Trend and Cycle in Asset
Values: Reevaluating the Wealth Effect on Consumption. American Economic Review,
94(1):276–299, March 2004.

[71] David Levhari and Leonard J Mirman. The great fish war: an example using a dynamic
Cournot–Nash solution. The Bell Journal of Economics, pages 322–334, 1980.

[72] L Ljungqvist and T J Sargent. Recursive Macroeconomic Theory. MIT Press, 4th edition,
2018.

[73] Robert E Lucas, Jr. Asset prices in an exchange economy. Econometrica: Journal of the
Econometric Society, 46(6):1429–1445, 1978.

[74] Robert E Lucas, Jr. and Edward C Prescott. Investment under uncertainty.
Econometrica: Journal of the Econometric Society, pages 659–681, 1971.

[75] Qingyin Ma, John Stachurski, and Alexis Akira Toda. The income fluctuation problem
and the evolution of wealth. Journal of Economic Theory, 187:105003, 2020.

[76] Benoit Mandelbrot. The variation of certain speculative prices. The Journal of
Business, 36(4):394–419, 1963.

[77] Albert Marcet and Thomas J Sargent. Convergence of Least-Squares Learning in
Environments with Hidden State Variables and Private Information. Journal of Political
Economy, 97(6):1306–1322, 1989.

[78] V Filipe Martins-da Rocha and Yiannis Vailakis. Existence and Uniqueness of a Fixed
Point for Local Contractions. Econometrica, 78(3):1127–1141, 2010.

[79] A Mas-Colell, M D Whinston, and J R Green. Microeconomic Theory, volume 1. Oxford
University Press, 1995.

[80] J J McCall. Economics of Information and Job Search. The Quarterly Journal of
Economics, 84(1):113–126, 1970.

[81] S P Meyn and R L Tweedie. Markov Chains and Stochastic Stability. Cambridge
University Press, 2009.

[82] F. Modigliani and R. Brumberg. Utility analysis and the consumption function: An
interpretation of cross-section data. In K. K. Kurihara, editor, Post-Keynesian
Economics. 1954.

[83] Derek Neal. The Complexity of Job Mobility among Young Men. Journal of Labor
Economics, 17(2):237–261, 1999.

[84] Y Nishiyama, S Osada, and K Morimune. Estimation and testing for rank size rule
regression under pareto distribution. In Proceedings of the International Environmental
Modelling and Software Society iEMSs 2004 International Conference. Citeseer, 2004.

[85] Jenő Pál and John Stachurski. Fitted value function iteration with probability one
contractions. Journal of Economic Dynamics and Control, 37(1):251–264, 2013.

[86] Jonathan A Parker. The Reaction of Household Consumption to Predictable Changes
in Social Security Taxes. American Economic Review, 89(4):959–973, 1999.

[87] Guillaume Rabault. When do borrowing constraints bind? Some new results on the
income fluctuation problem. Journal of Economic Dynamics and Control,
26(2):217–245, 2002.

[88] Svetlozar Todorov Rachev. Handbook of heavy tailed distributions in finance: Handbooks
in finance, volume 1. Elsevier, 2003.

[89] Kevin L Reffett. Production-based asset pricing in monetary economies with
transactions costs. Economica, pages 427–443, 1996.

[90] Michael Reiter. Solving heterogeneous-agent models by projection and perturbation.
Journal of Economic Dynamics and Control, 33(3):649–665, 2009.

[91] Hernán D Rozenfeld, Diego Rybski, Xavier Gabaix, and Hernán A Makse. The area
and population of cities: New insights from a different perspective on cities. American
Economic Review, 101(5):2205–25, 2011.

[92] Stephen P Ryan. The costs of environmental regulation in a concentrated industry.
Econometrica, 80(3):1019–1061, 2012.

[93] Paul A. Samuelson. Interactions between the multiplier analysis and the principle of
acceleration. Review of Economics and Statistics, 21(2):75–78, 1939.

[94] Thomas J Sargent. The Demand for Money During Hyperinflations under Rational
Expectations: I. International Economic Review, 18(1):59–82, February 1977.

[95] Thomas J Sargent. Macroeconomic Theory. Academic Press, New York, 2nd edition,
1987.

[96] Jack Schechtman and Vera L S Escudero. Some results on an income fluctuation
problem. Journal of Economic Theory, 16(2):151–166, 1977.

[97] Jose A. Scheinkman. Speculation, Trading, and Bubbles. Columbia University Press,
New York, 2014.

[98] Thomas C Schelling. Models of Segregation. American Economic Review,
59(2):488–493, 1969.

[99] Christian Schluter and Mark Trede. Size distributions reconsidered. Econometric
Reviews, 38(6):695–710, 2019.

[100] John Stachurski. Continuous state dynamic programming via nonexpansive
approximation. Computational Economics, 31(2):141–160, 2008.

[101] John Stachurski and Alexis Akira Toda. An impossibility theorem for wealth in
heterogeneous-agent models with limited heterogeneity. Journal of Economic Theory,
182:1–24, 2019.

[102] N L Stokey, R E Lucas, and E C Prescott. Recursive Methods in Economic Dynamics.
Harvard University Press, 1989.

[103] Kjetil Storesletten, Christopher I Telmer, and Amir Yaron. Consumption and risk
sharing over the life cycle. Journal of Monetary Economics, 51(3):609–633, 2004.

[104] R K Sundaram. A First Course in Optimization Theory. Cambridge University Press,
1996.

[105] George Tauchen. Finite state Markov-chain approximations to univariate and vector
autoregressions. Economics Letters, 20(2):177–181, 1986.

[106] Daniel Treisman. Russia’s billionaires. The American Economic Review, 106(5):236–241,
2016.

[107] Ngo Van Long. Dynamic games in the economics of natural resources: a survey.
Dynamic Games and Applications, 1(1):115–148, 2011.

[108] Vilfredo Pareto. Cours d’économie politique. Rouge, Lausanne, 2, 1896.

[109] Abraham Wald. Sequential Analysis. John Wiley and Sons, New York, 1947.

[110] Charles Whiteman. Linear Rational Expectations Models: A User’s Guide. University of
Minnesota Press, Minneapolis, Minnesota, 1983.

[111] Jeffrey M Wooldridge. Introductory econometrics: A modern approach. Nelson
Education, 2015.

[112] G Alastair Young and Richard L Smith. Essentials of statistical inference. Cambridge
University Press, 2005.
