Notes on GMM Estimation in Dynamic General Equilibrium Models
Overview
- Last week, we discussed a method of estimating dynamic models: maximum likelihood estimation.
- To write out the likelihood function, we needed to:
  - solve the model
  - impose strong parametric restrictions on the exogenous processes
- Today we discuss a complementary procedure
  - Also aimed at estimating parameters of preferences, production functions, etc. ...
  - ... but it doesn't necessarily require us to provide a solution for the model.
- Outline for today:
  - GMM mechanics
  - Applications to macroeconomics
GMM Mechanics
Notation/Problem Set-up
- $\theta$ is some parameter vector we want to estimate; $\theta_0$ is its true value.
- $q \equiv \dim(\theta)$
- $z_t$: vector of data, $Z \equiv \{z_1, \ldots, z_T\}$
- Let $f$ be a function such that $E[f(z_t; \theta_0)] = 0$.
- $r = \dim(f)$
- We use
  $$g(Z; \theta) \equiv \frac{1}{T}\sum_{t=1}^{T} f(z_t; \theta)$$
- The main idea: estimate $\theta$ by finding the value which makes $g$ "as close to 0 as possible."
- Asymptotic variance of the sample mean:
  $$\Omega \equiv \lim_{T\to\infty} T\, E\left\{[g(Z;\theta)]\,[g(Z;\theta)]'\right\}$$
- We will mainly work through examples where $q \le r$.
Method of Moments Estimation: An Example
- Suppose we have a sample of observations, $z_1, \ldots, z_T$, drawn from a t-distribution with $v$ degrees of freedom.
- How would you estimate $v$ using MLE?

[Figure: Student's t-distribution density]

- A t-distribution with $v \to \infty$ degrees of freedom is a $N(0,1)$.
- $E[z^2] = \frac{v}{v-2}$; $\; E[z^4] = \frac{3v^2}{(v-2)(v-4)}$
Method of Moments Estimation: An Example
- Suppose we have a sample of observations, $z_1, \ldots, z_T$, drawn from a t-distribution with $v$ degrees of freedom.

[Figure: Student's t-distribution density]

- We'll work through an estimation of a simulated dataset of 40 thousand observations, with $v = 8$.
How can we estimate v from data?
- One idea: match the second moment:
  $$\hat{\mu}_{2,T} \equiv \frac{1}{T}\sum_{t=1}^{T}(z_t)^2 = \frac{v}{v-2} \;\Rightarrow\; \hat{v} = \frac{2\hat{\mu}_{2,T}}{\hat{\mu}_{2,T} - 1}$$
- A second idea: match the fourth moment:
  $$\hat{\mu}_{4,T} \equiv \frac{1}{T}\sum_{t=1}^{T}(z_t)^4 = \frac{3v^2}{(v-2)(v-4)} \;\Rightarrow\; \hat{v} = \frac{3\hat{\mu}_{4,T} + \sqrt{24\hat{\mu}_{4,T} + (\hat{\mu}_{4,T})^2}}{\hat{\mu}_{4,T} - 3}$$
- In general, there are many ways to estimate $v$ (see the sketch below).
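A minimal sketch of these two estimators in MATLAB, assuming the Statistics Toolbox for `trnd` (the variable names are illustrative, not from the slides):

```matlab
% Simulate a t(v) sample and invert the second- and fourth-moment formulas above.
T = 40000; v_true = 8;
z = trnd(v_true, T, 1);        % simulated t-distributed sample

mu2 = mean(z.^2);              % sample second moment
mu4 = mean(z.^4);              % sample fourth moment

v_hat_2 = 2*mu2/(mu2 - 1);                               % from E[z^2] = v/(v-2)
v_hat_4 = (3*mu4 + sqrt(24*mu4 + mu4^2))/(mu4 - 3);      % from E[z^4] = 3v^2/((v-2)(v-4))
```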
GMM Formulation
- Let $z_t$ denote an $a \times 1$ vector of observables (in our example $a = 1$).
- Let $f(z_t, \theta)$ denote an $r \times 1$ vector-valued function. In our example, $r = 2$ and
  $$f(z_t; \theta) = \begin{pmatrix} z_t^2 - \frac{\theta}{\theta - 2} \\[4pt] z_t^4 - \frac{3\theta^2}{(\theta-2)(\theta-4)} \end{pmatrix}$$
  where, when evaluated at the true value of $\theta$, $\theta_0 = v$,
  $$E[f(z_t; \theta_0)] = 0 \qquad (1)$$
  and
  $$E[f(z_t; \theta)] \ne 0 \quad\text{for } \theta \ne \theta_0$$
- How do we find $\theta_0$? We choose $\theta$ so that
  $$g(Z;\theta) = \frac{1}{T}\sum_{t=1}^{T} f(z_t;\theta)$$
  is as close to 0 as possible.
- $g$ is an $r$ (= 2)-dimensional function. What does "close" mean?
- This is a common issue when $q \equiv \dim(\theta)$ is less than $r$.
GMM Formulation
$$\hat{\theta}_{GMM} = \arg\min_\theta \Phi(\theta), \quad\text{where}\quad \Phi(\theta) = g(Z;\theta)'\, W\, g(Z;\theta)$$

Here $W$ is an $r \times r$ "weighting matrix."

- Two examples:
  $$W_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{or}\quad W_2 = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$$
- Then (a coded version of this objective is sketched below):
  $$\Phi_1(\theta) = \left[\frac{1}{T}\sum_{t=1}^{T}\left(z_t^2 - \frac{\theta}{\theta-2}\right)\right]^2 + \left[\frac{1}{T}\sum_{t=1}^{T}\left(z_t^4 - \frac{3\theta^2}{(\theta-2)(\theta-4)}\right)\right]^2$$
  $$\Phi_2(\theta) = 2\left[\frac{1}{T}\sum_{t=1}^{T}\left(z_t^2 - \frac{\theta}{\theta-2}\right)\right]^2 + 3\left[\frac{1}{T}\sum_{t=1}^{T}\left(z_t^4 - \frac{3\theta^2}{(\theta-2)(\theta-4)}\right)\right]^2 + 2\left[\frac{1}{T}\sum_{t=1}^{T}\left(z_t^2 - \frac{\theta}{\theta-2}\right)\right]\left[\frac{1}{T}\sum_{t=1}^{T}\left(z_t^4 - \frac{3\theta^2}{(\theta-2)(\theta-4)}\right)\right]$$
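An illustrative sketch of this objective in MATLAB (not code from the slides), reusing the simulated sample `z` from the earlier sketch:

```matlab
% Sample moment vector g(Z;theta) for the two t-distribution moments,
% and the GMM objective Phi(theta) = g' * W * g for a given weighting matrix W.
g = @(theta, z) [mean(z.^2 - theta./(theta-2)); ...
                 mean(z.^4 - 3*theta.^2./((theta-2).*(theta-4)))];
gmm_obj = @(theta, z, W) g(theta, z)' * W * g(theta, z);

W1 = eye(2);                 % identity weighting matrix
W2 = [2 1; 1 3];             % alternative weighting matrix
theta_hat_1 = fminsearch(@(theta) gmm_obj(theta, z, W1), 10);   % initial guess v = 10
theta_hat_2 = fminsearch(@(theta) gmm_obj(theta, z, W2), 10);
```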
GMM Results
For any positive definite, symmetric $W$:

1. $\hat{\theta}_{GMM}$ is a consistent estimator of $\theta_0$.
2. Our $g$ function converges in distribution:
   $$\sqrt{T}\, g(Z;\theta_0) \to N(0, \Omega)$$
   where $\Omega = \lim_{T\to\infty} T\, E\left[g(Z;\theta_0)\, g(Z;\theta_0)'\right]$.
GMM Results

Can we pick $W$ smartly? Pick $W = \Omega^{-1}$: this $W$ minimizes the asymptotic variance.

- Why? It increases the influence on $\hat{\theta}$ of the elements of $g$ with precise estimates.

3. For this optimal $W$:
   $$\sqrt{T}\left(\hat{\theta} - \theta_0\right) \to N(0, V), \quad\text{where}\quad V = \left(D\,\Omega^{-1}D'\right)^{-1} \;\text{and}\; D' \equiv \frac{\partial g(Z;\theta)}{\partial \theta'}$$
4. Under the null hypothesis that the moment conditions are consistent with the data generating process for $z$,
   $$J = T\, g(Z;\hat{\theta})'\, \hat{\Omega}^{-1}\, g(Z;\hat{\theta})$$
   is distributed according to a $\chi^2$ distribution with degrees of freedom equal to $r - q$, the number of moment conditions minus the number of estimated parameters (the Sargan-Hansen test; a sketch of the test appears below).
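As an illustrative sketch (not from the slides), the test can be computed in MATLAB as follows, assuming `g_hat` holds the $r \times 1$ vector of sample moments at the estimate, `Omega_hat` its estimated covariance, `T` the sample size, and `r`, `q` the number of moments and parameters (all hypothetical names):

```matlab
% Sargan-Hansen overidentification test: J ~ chi-squared(r - q) under the null.
J    = T * (g_hat' * inv(Omega_hat) * g_hat);   % test statistic
pval = 1 - chi2cdf(J, r - q);                   % small p-value: reject the joint restrictions
```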
Iterative GMM
- Problem: the formula $W = \Omega^{-1}$ relies on an unknown. (We can compute $\hat{\Omega}$, but we don't know its true value, $\Omega$.)
- $\hat{\Omega}$, our estimate of $\Omega$,
  $$\hat{\Omega} \equiv \frac{1}{T}\left[\sum_t f(z_t;\theta)\right]\left[\sum_\tau f(z_\tau;\theta)\right]',$$
  will depend on $\hat{\theta}$.
- And $\hat{\theta}$ depends on $W$.
- Iterative procedure (see the sketch below). Start with some $\hat{\Omega}$ (say the identity matrix $I$).
  1. Find the $\hat{\theta}$ which minimizes $g(Z;\theta)'\,\hat{\Omega}^{-1}\,g(Z;\theta)$.
  2. Compute $\hat{\Omega}$ as $\frac{1}{T}\sum_t f(z_t;\hat{\theta})\, f(z_t;\hat{\theta})'$.
  3. Repeat steps 1-2 until our estimate of $\theta$ doesn't change much.
- $\hat{\Omega}$ is easy to compute because the moments are independent across $t$:
  - the product of the sums simplifies to the sum of the products.
  - In the asset pricing example, things are not so simple.
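A minimal sketch of this loop for the t-distribution example, reusing `z` and `gmm_obj` from the earlier sketches (variable names are illustrative):

```matlab
% Iterative GMM: alternate between estimating theta and updating the weighting matrix.
f = @(theta, z) [z.^2 - theta/(theta-2), ...
                 z.^4 - 3*theta^2/((theta-2)*(theta-4))];   % T x 2 matrix of moments

W = eye(2);                                   % start from the identity weighting matrix
theta_hat = 10;                               % initial guess
for iter = 1:20
    theta_old = theta_hat;
    theta_hat = fminsearch(@(theta) gmm_obj(theta, z, W), theta_old);
    F = f(theta_hat, z);                      % one row of moments per observation
    Omega_hat = (F' * F) / length(z);         % (1/T) * sum_t f_t * f_t'
    W = inv(Omega_hat);                       % next-iteration weighting matrix
    if abs(theta_hat - theta_old) < 1e-6, break; end
end
```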
A random sample of T=40000 observations with v=8
[Figure: histogram of the simulated sample]

- If we used each moment individually:
  $$\hat{\mu}_{2,T} = 1.33 \;\Rightarrow\; \frac{2\hat{\mu}_{2,T}}{\hat{\mu}_{2,T} - 1} = 8.06$$
  $$\hat{\mu}_{4,T} = 7.71 \;\Rightarrow\; \frac{3\hat{\mu}_{4,T} + \sqrt{24\hat{\mu}_{4,T} + (\hat{\mu}_{4,T})^2}}{\hat{\mu}_{4,T} - 3} = 8.23$$
Starting our iterative procedure
- $W_0 = I$:
  $$\Phi(\theta) = g(Z;\theta)'\, W_0\, g(Z;\theta) = \left[\frac{1}{T}\sum_{t=1}^{T}\left(z_t^2 - \frac{\theta}{\theta-2}\right)\right]^2 + \left[\frac{1}{T}\sum_{t=1}^{T}\left(z_t^4 - \frac{3\theta^2}{(\theta-2)(\theta-4)}\right)\right]^2$$
  $$\Rightarrow \hat{\theta}_1 = 8.23$$
- Now figure out the next-iteration weighting matrix:
  $$\hat{\Omega}_1 = \frac{1}{T}\sum_t f(z_t;\hat{\theta})\, f(z_t;\hat{\theta})' = \begin{pmatrix} 5.9 & 103.2 \\ 103.2 & 3181.5 \end{pmatrix} \;\Rightarrow\; W_1 = \hat{\Omega}_1^{-1} = \begin{pmatrix} 0.3861 & -0.0125 \\ -0.0125 & 0.0007 \end{pmatrix}$$
- Working through that iterative procedure, we eventually get $\hat{\theta}_{iterative} = 8.14$.
- And our J statistic is
  $$J = T\, g\left(Z;\hat{\theta}_{iterative}\right)'\, W_{iterative}\, g\left(Z;\hat{\theta}_{iterative}\right),$$
  which in this case is equal to 1.38.
- Now, suppose we did this for 500 random samples.
- Recall:
  $$\sqrt{T}\left(\hat{\theta} - \theta_0\right) \to N(0, V), \quad\text{where}\quad V = \left(D\,\Omega^{-1}D'\right)^{-1} \;\text{and}\; D' \equiv \frac{\partial g(Z;\theta)}{\partial \theta'}$$
- In the code, $D'$ is approximated by a forward difference (step size 0.01), and the standard error is:

```matlab
% dmatrix approximates D' = dg/dtheta' at theta = v (forward difference, step 0.01);
% std_err = sqrt(V/T), with V = inv(dmatrix' * W * dmatrix) and W = inv(Omega_hat).
dmatrix = ([mean(sampl.^2) - (v + .01)/(v + .01 - 2); ...
            mean(sampl.^4) - 3*(v + .01)^2/(v + .01 - 2)/(v + .01 - 4)] ...
         - [mean(sampl.^2) - v/(v - 2); ...
            mean(sampl.^4) - 3*v^2/(v - 2)/(v - 4)]) / 0.01;
std_err = sqrt(1/periods * inv(dmatrix' * W * dmatrix));
```
[Figure: distribution of the estimated $v$ across the 500 simulated samples]
Are the J statistics $\chi^2(1)$ distributed?

```matlab
% For each simulated sample, scale the minimized GMM objective (fval) by the
% sample size to get the J statistic; then compare the empirical CDF of the
% J statistics with the chi-squared(1) CDF.
for iter_idx = 1:iterations
    % [[stuff to compute the gmm objective, fval]]
    J_store(iter_idx) = fval * periods;
end
plot(sort(J_store), (1:iterations)/iterations, ...
     chi2inv((1:iterations)/iterations, 1), ...
     (1:iterations)/iterations)
```
[Figure: empirical CDF of the simulated J statistics vs. the $\chi^2$(df = 1) CDF]
Applications to
Macroeconomics
Asset Pricing
- Suppose we have households with preferences
  $$E_0 \sum_{t=0}^{\infty} \beta^t \frac{c_t^{1-\gamma}}{1-\gamma}, \quad\text{where } \gamma > 0 \text{ and } \beta \in (0,1)$$
- Let $r^e_{t:t+1}$ and $r^f_{t:t+1}$ be the returns on risky and risk-free assets between periods $t$ and $t+1$.
SDF
- So:
  $$E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} r^i_{t:t+1} - 1\right] = 0 \qquad (2)$$
  where $r^i_{t:t+1}$ is the vector of asset payouts.
- Also, subtracting the FOCs for risky vs. risk-free assets:
  $$E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\left(r^e_{t:t+1} - r^f_{t:t+1}\right)\right] = 0 \qquad (3)$$
- From the last slide:
  $$E_t\begin{bmatrix} \beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} r^e_{t:t+1} - 1 \\[4pt] \beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\left(r^e_{t:t+1} - r^f_{t:t+1}\right) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
- Two moments for two parameters: $\beta$ and $\gamma$.
- Our model generates (many) extra moment conditions. For any variable $X_t$ that is known at $t$:
  $$E_t\begin{bmatrix} \left(\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} r^e_{t:t+1} - 1\right)X_t \\[4pt] \beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\left(r^e_{t:t+1} - r^f_{t:t+1}\right)X_t \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
- This $X_t$ could be:
  - a constant
  - the return of any asset from $t-m$ to $t$
  - the change in consumption between periods $t-m$ and $t$.
- Suppose our "instruments," $X_t$, are $1$, $\frac{c_t}{c_{t-1}}$, and $r^e_{t-1:t}$. Then our moment condition is
  $$E_t\begin{bmatrix}
  \beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} r^e_{t:t+1} - 1 \\[4pt]
  \beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\left(r^e_{t:t+1} - r^f_{t:t+1}\right) \\[4pt]
  \left(\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} r^e_{t:t+1} - 1\right)\frac{c_t}{c_{t-1}} \\[4pt]
  \beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\left(r^e_{t:t+1} - r^f_{t:t+1}\right)\frac{c_t}{c_{t-1}} \\[4pt]
  \left(\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} r^e_{t:t+1} - 1\right)r^e_{t-1:t} \\[4pt]
  \beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\left(r^e_{t:t+1} - r^f_{t:t+1}\right)r^e_{t-1:t}
  \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
- Key point: $c$ and $r$ are directly observable!
- Define the function $f$ so that the left-hand side of the previous equation is
  $$E[f(z_t, \theta_0)] = 0, \quad\text{where}\quad \theta_0 \equiv (\gamma_0, \beta_0)'$$
  are the true values of the risk-aversion parameter and discount factor (a coded version of this $f$ is sketched below).
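An illustrative sketch of these six moments in MATLAB (not the slides' code), assuming column vectors `c`, `re`, and `rf` aligned so that row $t$ holds $c_t$, $r^e_{t:t+1}$, and $r^f_{t:t+1}$ (hypothetical names):

```matlab
% 6 x 1 vector of sample moments for theta = [gamma; beta], using the
% instruments 1, c_t/c_{t-1}, and r^e_{t-1:t}.
function g = euler_moments(theta, c, re, rf)
    gamma = theta(1); beta = theta(2);
    T  = length(c) - 1;
    t  = (2:T)';                                % need c_{t-1} and r^e_{t-1:t}, so start at t = 2
    m1 = beta*(c(t+1)./c(t)).^(-gamma).*re(t) - 1;            % levels Euler equation
    m2 = beta*(c(t+1)./c(t)).^(-gamma).*(re(t) - rf(t));      % excess-return condition
    X1 = c(t)./c(t-1);                          % lagged consumption growth instrument
    X2 = re(t-1);                               % lagged equity return instrument
    g  = [mean(m1); mean(m2); mean(m1.*X1); mean(m2.*X1); mean(m1.*X2); mean(m2.*X2)];
end
```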
Data choices (monthly data from 1971-2004):
- $r^e$: the (dividend-inclusive) return on the S&P 500
- $r^f$: the return on a 3-month T-bill
- $c$: the growth rate of nondurable goods + services

[Figure: monthly growth rates of consumption, the 3-month T-bill return, and the S&P 500 return, 1975-2000]

- Corr($r^f_{t:t+1}$, $r^f_{t-1:t}$) = 0.90
- When constructing
  $$\hat{\Omega} \equiv \frac{1}{T}\left[\sum_t f(z_t;\hat{\theta})\right]\left[\sum_\tau f(z_\tau;\hat{\theta})\right]' = \frac{1}{T}\sum_t\sum_\tau f(z_t;\hat{\theta})\, f(z_\tau;\hat{\theta})',$$
  we need to worry about the correlation between $z_t$ and $z_\tau$ for $t \ne \tau$.
- Because the sample is of finite length, $\hat{\Omega}$ (taken from the whole sample) may not be positive definite and symmetric.
- Instead we apply the following (Newey-West) estimator:
  $$\hat{\Omega} = \hat{\Omega}_0 + \sum_{v=1}^{s}\left(1 - \frac{v}{s+1}\right)\left(\hat{\Omega}_v + \hat{\Omega}_v'\right), \quad\text{where}\quad \hat{\Omega}_v = \frac{1}{T}\sum_{t=v+1}^{T} f(z_t;\hat{\theta})\, f(z_{t-v};\hat{\theta})'$$
- With no serial correlation, we're left with $\hat{\Omega} = \hat{\Omega}_0$ (a sketch of this estimator follows).
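A minimal sketch of this estimator, assuming a $T \times r$ matrix `F` whose row $t$ is $f(z_t;\hat{\theta})'$ and a lag truncation `s` (hypothetical names):

```matlab
% Newey-West estimate of Omega with Bartlett weights 1 - v/(s+1).
[T, r] = size(F);
Omega_hat = (F' * F) / T;                              % Omega_0: contemporaneous term
for v = 1:s
    Omega_v   = (F(v+1:T, :)' * F(1:T-v, :)) / T;      % (1/T) * sum_t f_t * f_{t-v}'
    Omega_hat = Omega_hat + (1 - v/(s+1)) * (Omega_v + Omega_v');
end
```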
Estimates by instrument set (standard errors in parentheses):

Instruments:              $1$         $1,\, c_t/c_{t-1}$    $1,\, r^e_{t-1:t}$    $1,\, c_t/c_{t-1},\, r^e_{t-1:t}$
$\hat{\beta}$             0.987       0.985                 0.989                 0.983
                          (0.151)     (0.042)               (0.016)               (0.015)
$\hat{\gamma}$            163.4       -6.0                  -3.9                  -6.8
                          (105.7)     (30.0)                (10.6)                (10.0)
J statistic               2.1         6.4                   6.6                   7.4
$\chi^2_{0.05}$ crit. value   n/a     6.0                   6.0                   9.5
Burnside and Eichenbaum
- Production:
  $$Y_t = (K_t U_t)^{1-\alpha}\left(N_t f W_t X_t\right)^{\alpha}$$
- Preferences:
  $$\log C_t + \theta N_t \log(T - W_t f) + \theta(1 - N_t)\log T$$
- Capital evolution:
  $$K_{t+1} = \left(1 - \delta U_t^{\phi}\right)K_t + I_t$$
- Market clearing:
  $$Y_t = C_t + G_t + I_t$$
- Exogenous processes:
  $$\log X_t = \log X_{t-1} + \mu + v_t$$
  $$\log G_t = \log X_t + g_t, \quad\text{where}\quad g_t = (1-\rho)\bar{g} + \rho g_{t-1} + \eta_t$$
How are the parameters estimated?
Set $T = 1369$, $f = 324.8$, and one additional constant to 60. Other parameters:

Name            Description                                                                         Value
$\phi$          relationship between utilization and depreciation: $\delta_t = \delta U_t^{\phi}$   1.56
$\theta$        disutility of supplying labor                                                       estimated
$K_1$           "true" capital at the beginning of the sample                                       estimated
$\alpha$        share of labor in the production function                                           estimated
$\bar{g}$       mean of $g = \frac{G}{X}$                                                           estimated
$g/y$           ratio of $g$ to $y$                                                                 estimated
$\delta$        average depreciation                                                                estimated
$\sigma_v$      volatility of productivity shocks                                                   estimated
$\sigma_\eta$   volatility of gov't spending shocks                                                 estimated
$\rho$          persistence of government spending                                                  estimated
$\mu$           trend productivity growth                                                           estimated

Collect the parameters that we need to estimate in a vector:
$$\Psi_1 = \{\theta, \alpha, \delta, \mu, \bar{g}, g/y, \rho, \sigma_v, \sigma_\eta, K_1\}$$
Data: K̃, C, G, I, Y, H
[Figure: indexed time series of $C$, $I$, $G$, $Y$, $H$, and $\tilde{K}$, 1955-1985 (index: 1965 = 1)]
Moment Conditions (1)
- According to our model,
  $$K_{t+1} = K_t - \delta_t K_t + I_t$$
- Inverting this equation:
  $$\delta_t = 1 - \frac{K_{t+1} - I_t}{K_t}$$
- But the data series, $\tilde{K}_t$, was constructed by assuming that the depreciation rate was constant.
- In any period, $\tilde{K}_t$ will not match up with $K_t$.
- Burnside and Eichenbaum's assumption: depreciation rates match up on average.
- Moment condition #1 (its sample analogue is sketched below):
  $$M_1(H_t;\Psi_1):\quad E\left[\ln\delta - \ln\left(1 - \frac{\tilde{K}_{t+1} - I_t}{\tilde{K}_t}\right)\right]$$
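An illustrative sketch of the sample analogue of $M_1$ in MATLAB, assuming equal-length column vectors `Ktilde` and `I` and a candidate value `delta` (hypothetical names):

```matlab
% Implied period-by-period depreciation rates from measured capital and investment,
% and the sample analogue of moment condition M1.
delta_implied = 1 - (Ktilde(2:end) - I(1:end-1)) ./ Ktilde(1:end-1);
M1 = log(delta) - mean(log(delta_implied));    % roughly zero at the true delta
```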
Evolution of Capital
- From the last slide:
  $$K_{t+1} = K_t - \delta_t K_t + I_t$$
- From the first-order condition for utilization:
  $$\delta_t = \frac{1-\alpha}{\phi}\frac{Y_t}{K_t}$$
- So:
  $$K_{t+1} = K_t - \frac{1-\alpha}{\phi}Y_t + I_t$$
- Given $K_1$, data on $Y_t$ and $I_t$, along with $\alpha$ and $\phi$, can be used to impute $K_t$ (see the sketch below).
- Moment condition #2:
  $$M_2(Y_t, I_t;\Psi_1):\quad E\left[\ln\delta - \ln\left(\frac{1-\alpha}{\phi}\frac{Y_t}{K_t(\Psi_1)}\right)\right]$$
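A minimal sketch of the imputation step, assuming data vectors `Y` and `I` and candidate values `K1`, `alpha`, and `phi` (hypothetical names):

```matlab
% Impute the model-consistent capital stock from K_{t+1} = K_t - (1-alpha)/phi * Y_t + I_t.
K = zeros(length(Y) + 1, 1);
K(1) = K1;
for t = 1:length(Y)
    K(t+1) = K(t) - (1 - alpha)/phi * Y(t) + I(t);
end
```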
Moment Conditions (2)
- Intuitively, higher $\theta$ → lower $N$, on average.
- Production function (in steady state, detrended):
  $$y = (kU)^{1-\alpha}(NfW)^{\alpha}\exp\{(\alpha - 1)\mu\}$$
- FOCs for capital utilization, labor effort, and hours:
  $$(1-\alpha)\,y = \delta\phi\,U^{\phi}\,k\,\exp(-\mu)$$
  $$\theta\frac{Nf}{T - Wf} = \alpha\frac{y}{cW}$$
  $$\theta\left[\log T - \log(T - Wf)\right] = \alpha\frac{y}{cN}$$
- Euler equation and market clearing:
  $$1 = \beta\left[(1-\alpha)\frac{y}{k} + \left(1 - \delta U^{\phi}\right)\exp\{-\mu\}\right]$$
  $$y = c + g + k - \left(1 - \delta U^{\phi}\right)k\,\exp(-\mu)$$
Moment Conditions (2)
- From the last slide: six equations for the steady-state values of six endogenous objects ($y$, $W$, $N$, $c$, $U$, $k$).
- Steady-state $N$ is pinned down by the parameters.
- Moment condition #3:
  $$M_3(H_t;\Psi_1):\quad E[\log H_t] - \log\left[f\,N(\Psi_1)\right]$$
Moment Conditions (3)
- From the Euler equation:
  $$0 = E\left[\beta\frac{C_t}{C_{t+1}}\left((1-\alpha)\frac{Y_{t+1}}{K_{t+1}(\Psi_1)} + 1 - \delta U_{t+1}^{\phi}\right) - 1\right]
    = E\left[\beta\frac{C_t}{C_{t+1}}\left((1-\alpha)\frac{Y_{t+1}}{K_{t+1}(\Psi_1)}\left(1 - \frac{1}{\phi}\right) + 1\right) - 1\right]$$
- Moment condition #4:
  $$M_4(H_t;\Psi_1):\quad E\left[\beta\frac{C_t}{C_{t+1}}\left((1-\alpha)\frac{Y_{t+1}}{K_{t+1}(\Psi_1)}\left(1 - \frac{1}{\phi}\right) + 1\right) - 1\right]$$
Moment Conditions (4)
- From our FOCs:
  $$U_t = \left[\frac{(1-\alpha)}{\delta\phi}\frac{Y_t}{K_t(\Psi_1)}\right]^{1/\phi} \quad\text{and}\quad \theta\frac{H_t}{T - W_t f} = \alpha\frac{Y_t}{C_t W_t}$$
- Since
  $$\log Y_t = (1-\alpha)\log\left(K_t(\Psi_1)\,U_t\right) + \alpha\log\left(X_t(\Psi_1)\,W_t H_t\right),$$
  we have
  $$\log X_t(\Psi_1) = \frac{1}{\alpha}\log Y_t - \frac{(1-\alpha)}{\alpha}\log\left(K_t(\Psi_1)\,U_t\right) - \log(H_t W_t),$$
  so we can back out the time series $X_t(\Psi_1)$ from observables.
- This leads to the following moment conditions:
  $$M_5(H_t, Y_t, C_t, \tilde{K}_t, I_t, G_t;\Psi_1):\quad E\left[\Delta\log X_t(\Psi_1)\right] - \mu$$
  $$M_6(H_t, Y_t, C_t, \tilde{K}_t, I_t, G_t;\Psi_1):\quad E\left[\left(\Delta\log X_t(\Psi_1) - \mu\right)^2\right] - \sigma_v^2$$
Moment Conditions (5)
- We already solved for $\log X_t$ in terms of observable data.
- We have data on $\log G_t$.
- So we know $g_t \equiv \log\frac{G_t}{X_t}$ in terms of observables.
- This leads to the following additional moment conditions:
  $$M_7(H_t, Y_t, C_t, \tilde{K}_t, I_t, G_t;\Psi_1):\quad E\left[g_t(\Psi_1)\right] - \bar{g}$$
  $$M_8(H_t, Y_t, C_t, \tilde{K}_t, I_t, G_t;\Psi_1):\quad E\left[g_t(\Psi_1) - \rho\,g_{t-1}(\Psi_1) - (1-\rho)\bar{g}\right]$$
  $$M_9(H_t, Y_t, C_t, \tilde{K}_t, I_t, G_t;\Psi_1):\quad E\left[\left(g_t(\Psi_1) - \rho\,g_{t-1}(\Psi_1) - (1-\rho)\bar{g}\right)^2\right] - \sigma_\eta^2$$
  $$M_{10}(H_t, Y_t, C_t, \tilde{K}_t, I_t, G_t;\Psi_1):\quad E\left[\log G_t - \log Y_t\right] - \log(g/y)$$
Estimation
- Store the 10 moment conditions in a vector:
  $$M(H_t, G_t, Y_t, C_t, K_t, I_t;\Psi_1)$$
- The parameters $\Psi_1$ minimize
  $$M'\,\hat{W}\,M$$
- To perform the estimation, we only need to solve for the model's steady-state ratios (a schematic sketch of the estimation step follows).
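A purely schematic sketch of that step (not the authors' code), assuming a hypothetical function `be_moments(psi, data)` that returns a $T \times 10$ matrix whose row $t$ stacks $M_1, \ldots, M_{10}$ at observation $t$, plus a starting guess `psi0`:

```matlab
% Minimize the GMM quadratic form over the 10-element parameter vector Psi_1.
What    = eye(10);                                    % weighting matrix (identity as a placeholder)
gbar    = @(psi) mean(be_moments(psi, data), 1)';     % 10 x 1 vector of sample moments
obj     = @(psi) gbar(psi)' * What * gbar(psi);       % GMM objective
psi_hat = fminsearch(obj, psi0);                      % psi0: initial parameter guess
```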
Summary
- In the asset pricing example: with more moments than parameters, we can apply tests of the model's joint restrictions.
- One way to think about (just-identified) GMM in Burnside and Eichenbaum:
  - like calibration, but acknowledging statistical uncertainty about the values of the parameters (Hansen and Heckman 1996).
- In both examples: we didn't need to specify or solve the full model in order to estimate the parameters.
- Other potentially useful references:
  - Hamilton (1994), Chapter 14 (Sections 14.1 and 14.2, on the course website)
  - Hansen (2001): http://home.uchicago.edu/~lhansen/time_series_perspective.pdf