


A New Perspective on Gaussian Dynamic Term Structure Models

Scott Joslin
MIT Sloan School of Management

Kenneth J. Singleton
Graduate School of Business, Stanford University, and NBER

Downloaded from http://rfs.oxfordjournals.org/ at University of Southern California on January 17, 2012

Haoxiang Zhu
Graduate School of Business, Stanford University

In any canonical Gaussian dynamic term structure model (GDTSM), the conditional forecasts of the pricing factors are invariant to the imposition of no-arbitrage restrictions. This invariance is maintained even in the presence of a variety of restrictions on the factor structure of bond yields. To establish these results, we develop a novel canonical GDTSM in which the pricing factors are observable portfolios of yields. For our normalization, standard maximum likelihood algorithms converge to the global optimum almost instantaneously. We present empirical estimates and out-of-sample forecasts for several GDTSMs using data on U.S. Treasury bond yields. (JEL E43, G12, C13)

Dynamic models of the term structure often posit a linear factor structure for a collection of yields, with these yields related to underlying factors $\mathcal{P}$ through a no-arbitrage relationship. Does the imposition of no-arbitrage in a Gaussian dynamic term structure model (GDTSM) improve the out-of-sample forecasts of yields relative to those from the unconstrained factor model, or sharpen model-implied estimates of expected excess returns? In practice, the answers to these questions are obscured by the imposition of over-identifying restrictions on the risk-neutral ($\mathbb{Q}$) or historical ($\mathbb{P}$) distributions of the risk factors, or on their market prices of risk, in addition to the cross-maturity restrictions implied by no-arbitrage.¹

We are grateful for helpful comments from Greg Duffee, James Hamilton, Monika Piazzesi, Pietro Veronesi
(the Editor), an anonymous referee, and seminar participants at the AFA annual meeting, MIT, the New York
Federal Reserve Bank, and Stanford. Send correspondence to Scott Joslin, Assistant Professor of Finance,
MIT Sloan School of Management E62-639, Cambridge, MA 02142-1347; telephone: (617) 324-3901.
E-mail: [email protected].
1 Recent studies that explore the forecasting performance of GDTSMs include Duffee (2002), Ang and Piazzesi
(2003), Christensen, Diebold, and Rudebusch (2007), Chernov and Mueller (2008), and Jardet, Monfort, and
Pegoraro (2009), among many others.

© The Author 2011. Published by Oxford University Press on behalf of The Society for Financial Studies.
All rights reserved. For Permissions, please e-mail: [email protected].
doi:10.1093/rfs/hhq128 Advance Access publication January 4, 2011
A New Perspective on Gaussian Dynamic Term Structure Models

We show that, within any canonical GDTSM and for any sample of bond yields, imposing no-arbitrage does not affect the conditional $\mathbb{P}$ expectation of $\mathcal{P}$, $E^{\mathbb{P}}[\mathcal{P}_t \mid \mathcal{P}_{t-1}]$. GDTSM-implied forecasts of $\mathcal{P}$ are thus identical to those from the unrestricted vector-autoregressive (VAR) model for $\mathcal{P}$. To establish these results, we develop an all-encompassing canonical model in which the pricing factors $\mathcal{P}$ are linear combinations of the collection of yields $y$ (such as the first $N$ principal components (PCs))² and in which these “yield factors” follow an unrestricted VAR. Within our canonical GDTSM, as long as $\mathcal{P}$ is measured without error, unconstrained ordinary least squares (OLS) gives the maximum likelihood (ML) estimates of $E^{\mathbb{P}}[\mathcal{P}_t \mid \mathcal{P}_{t-1}]$. Therefore, enforcing no-arbitrage has no effect on out-of-sample forecasts of $\mathcal{P}$. This result holds for any other canonical GDTSM, owing to observational equivalence (Dai and Singleton 2000), and, as such, is a generic feature of GDTSMs.
Heuristically, under the assumption that the yield factors $\mathcal{P}$ are observed without error, these propositions follow from the factorization of the conditional density of $y$ into the product of the conditional $\mathbb{P}$ density of $\mathcal{P}$ times the conditional density of measurement errors.³ The density of $\mathcal{P}$ is determined by parameters controlling its conditional mean and its innovation covariance matrix. The measurement error density is determined by the “no-arbitrage” cross-sectional relationship among the yields. We show that GDTSMs can be parameterized so that the parameters governing the $\mathbb{P}$ forecasts of $\mathcal{P}$ do not appear in the measurement-error density. Given this separation, the only link between the conditional $\mathbb{P}$ density and the measurement density is the covariance of the innovations. However, a classic result of Zellner (1962) implies that, because every equation of the VAR for $\mathcal{P}_t$ has the same regressors (a constant and $\mathcal{P}_{t-1}$), the ML estimates of $E^{\mathbb{P}}[\mathcal{P}_t \mid \mathcal{P}_{t-1}]$ are independent of this covariance. Consequently, OLS recovers the ML estimates of $E^{\mathbb{P}}[\mathcal{P}_t \mid \mathcal{P}_{t-1}]$, and the no-arbitrage restriction is irrelevant for the conditional $\mathbb{P}$ forecast of $\mathcal{P}$.
Key to seeing this irrelevance is our choice of canonical form.⁴ For any $N$-factor model with portfolios of yields $\mathcal{P}$ as factors, bond prices depend on the $N(N+1)$ parameters governing the risk-neutral conditional mean of $\mathcal{P}$ and the $(N+1)$ parameters linking the short rate to $\mathcal{P}$, for a total of $(N+1)^2$ parameters. Not all of these parameters are free, however, because internal consistency requires that the model-implied yields reproduce the yield factors $\mathcal{P}$. We show that, given the $N$ yield factors, the entire time-$t$ yield curve can be constructed by specifying (a) $r^{\mathbb{Q}}_\infty$, the long-run mean of the short rate under $\mathbb{Q}$; (b) $\lambda^{\mathbb{Q}}$, the speeds of mean reversion of the yield factors under $\mathbb{Q}$; and (c) $\Sigma_{\mathcal{P}}$, the

2 Although standard formulations of affine term structure models use latent (unobservable) risk factors (e.g., Dai and Singleton 2000, Duffee 2002), following Duffie and Kan (1996) we are free to normalize a model so that the factors are portfolios of bond yields; we choose PCs.
3 See, for example, Chen and Scott (1993) and Pearson and Sun (1994).

4 To emphasize, our canonical form is key to seeing the result; due to observational equivalence, the result holds
for any canonical form.

The Review of Financial Studies / v 24 n 3 2011

conditional covariance matrix of yield factors from the VAR. That is, given $\Sigma_{\mathcal{P}}$, the entire cross-section of bond yields in an $N$-factor GDTSM is fully determined by only the $N+1$ parameters $r^{\mathbb{Q}}_\infty$ and $\lambda^{\mathbb{Q}}$. Moreover, $(r^{\mathbb{Q}}_\infty, \lambda^{\mathbb{Q}}, \Sigma_{\mathcal{P}})$ can be efficiently estimated independently of the $\mathbb{P}$ conditional mean of $\mathcal{P}_t$, rendering no-arbitrage irrelevant for forecasting $\mathcal{P}$.
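For concreteness, the parameter count described above works out as follows in the three-factor case studied later (simple arithmetic from the text, not an additional result):

$$
\underbrace{N(N+1)}_{\mathbb{Q}\text{ drift of } \mathcal{P}} + \underbrace{(N+1)}_{\text{short rate}} = (N+1)^2 \;\overset{N=3}{=}\; 12 + 4 = 16,
\qquad
\underbrace{N+1}_{(r^{\mathbb{Q}}_\infty,\ \lambda^{\mathbb{Q}})} \;\overset{N=3}{=}\; 4 \ \text{ free, given } \Sigma_{\mathcal{P}}.
$$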
With these results in place, we proceed to show that the conditional forecast $E^{\mathbb{P}}[\mathcal{P}_t \mid \mathcal{P}_{t-1}]$ from a no-arbitrage GDTSM remains identical to its counterpart from an unrestricted VAR even in the presence of a large class of over-identifying restrictions on the factor structure of $y$. In particular, regardless of the constraints imposed on the risk-neutral distribution of the yield factors $\mathcal{P}$, the GDTSM- and VAR-implied forecasts of these factors are identical. Put differently, OLS recovers the conditional forecasts of the yield factors even in the presence of further cross-sectional restrictions on the shape of the yield curve beyond no-arbitrage.
When does the structure of a GDTSM improve out-of-sample forecasts of $\mathcal{P}$? We show that if constraints are imposed directly on the $\mathbb{P}$ distribution of $\mathcal{P}$ within a no-arbitrage GDTSM, then the ML estimate of $E^{\mathbb{P}}[\mathcal{P}_t \mid \mathcal{P}_{t-1}]$ is more efficient than its OLS counterpart from a VAR. Thus, our theoretical results, as well as the subsequent empirical illustrations, show that gains from forecasting using a GDTSM, if any, must come from auxiliary constraints on the $\mathbb{P}$ distribution of $\mathcal{P}$, and not from the no-arbitrage restriction per se.⁵
An important example of such auxiliary constraints is the number of risk factors that determine risk premiums. Motivated by the descriptive analysis of Cochrane and Piazzesi (2005, 2008) and Duffee (2008), we develop methods for restricting expected excess returns to lie in a space of dimension $L$ ($< N$), without restricting a priori which of the $N$ factors $\mathcal{P}_t$ represent priced risks. If $L < N$, then there are necessarily restrictions linking the historical and risk-neutral drifts of $\mathcal{P}_t$. In this case, the forecasts of future yields implied by a GDTSM are in principle different from those from an unrestricted VAR, and we investigate the empirical relevance of these constraints within three-factor ($N = 3$) GDTSMs.
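One way to see why $L < N$ ties the two drifts together: the gap between the $\mathbb{P}$ and $\mathbb{Q}$ conditional means of $\Delta\mathcal{P}_t$ is affine in $\mathcal{P}_{t-1}$, with intercept $K^{\mathbb{P}}_{0\mathcal{P}} - K^{\mathbb{Q}}_{0\mathcal{P}}$ and slope $K^{\mathbb{P}}_{1\mathcal{P}} - K^{\mathbb{Q}}_{1\mathcal{P}}$ (notation of Theorem 1 below). The sketch below assumes, for illustration only, that the dimension restriction is expressed as a rank-$L$ condition on the stacked matrix $[K^{\mathbb{P}}_{0} - K^{\mathbb{Q}}_{0},\ K^{\mathbb{P}}_{1} - K^{\mathbb{Q}}_{1}]$; the paper's own parametrization may differ. Helper names and numbers are hypothetical.

```python
def matrix_rank(rows, tol=1e-10):
    """Numerical rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in r] for r in rows]
    rank = 0
    for c in range(len(m[0])):
        if rank == len(m):
            break
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            f = m[r][c] / m[rank][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def drift_gap(K0P, K1P, K0Q, K1Q):
    """Rows of [K0P - K0Q, K1P - K1Q]: intercept and slopes of E^P - E^Q for each factor."""
    return [[k0p - k0q] + [p - q for p, q in zip(row_p, row_q)]
            for k0p, k0q, row_p, row_q in zip(K0P, K0Q, K1P, K1Q)]


# Hypothetical N = 3 example with a single priced risk (L = 1).
K0Q = [0.0001, 0.0, 0.0]
K1Q = [[-0.002, 0.0, 0.0], [0.0, -0.02, 0.0], [0.0, 0.0, -0.2]]
a = [0.5, -0.3, 0.1]              # exposure of each factor's premium to the priced risk
b = [0.0002, -0.01, 0.03, 0.0]    # intercept and slopes of that priced risk
K0P = [K0Q[i] + a[i] * b[0] for i in range(3)]
K1P = [[K1Q[i][j] + a[i] * b[j + 1] for j in range(3)] for i in range(3)]

print(matrix_rank(drift_gap(K0P, K1P, K0Q, K1Q)))  # 1
```

An unrestricted $K^{\mathbb{P}}$ would generically give rank $N$; it is the reduced rank that links the $\mathbb{P}$ drift to the $\mathbb{Q}$ drift.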
Additionally, we show that our canonical form allows for the computationally efficient estimation of GDTSMs. The conditional density of observed yields is fully characterized by $r^{\mathbb{Q}}_\infty$ and $\lambda^{\mathbb{Q}}$, as well as the parameters controlling any measurement errors in yields. Importantly, $(r^{\mathbb{Q}}_\infty, \lambda^{\mathbb{Q}})$ constitutes a low-dimensional, rotation-invariant (and thus economically meaningful) parameter space. Using standard search algorithms, we obtain near-instantaneous convergence to the global optimum of the likelihood function. Convergence is

5 Though one might conclude from reading the recent literature that enforcing no-arbitrage improves out-of-sample forecasts of bond yields, our theorems show that this is not the case. What underlies any documented forecast gains in these studies from using GDTSMs is the combined structure of no-arbitrage and the auxiliary restrictions they impose on the $\mathbb{P}$ distribution of $y$.


fast regardless of the number of risk factors or bond yields used in estimation, or whether the pricing factors $\mathcal{P}$ are measured with error.⁶
The rapid convergence to global optima using our canonical GDTSM makes it feasible to explore rolling out-of-sample forecasts. For a variety of GDTSMs (with and without measurement error in yield factors, and with and without constraints on the dimensionality $L$ of risk premia), we compare the out-of-sample forecasting performance relative to a benchmark unconstrained VAR, and confirm our theoretical predictions in the data.


1. A Canonical GDTSM with Observable Risk Factors
In this section, we develop our “JSZ” canonical representation of GDTSMs. Toward this end, we start with a generic representation of a GDTSM, in which the discrete-time evolution of the risk factors (state vector) $X_t \in \mathbb{R}^N$ is governed by the following equations:⁷

$$\Delta X_t = K^{\mathbb{P}}_{0X} + K^{\mathbb{P}}_{1X} X_{t-1} + \Sigma_X \epsilon^{\mathbb{P}}_t, \tag{1}$$

$$\Delta X_t = K^{\mathbb{Q}}_{0X} + K^{\mathbb{Q}}_{1X} X_{t-1} + \Sigma_X \epsilon^{\mathbb{Q}}_t, \tag{2}$$

$$r_t = \rho_{0X} + \rho_{1X} \cdot X_t, \tag{3}$$

where $r_t$ is the one-period spot interest rate, $\Sigma_X \Sigma_X'$ is the conditional covariance matrix of $X_t$, and $\epsilon^{\mathbb{P}}_t, \epsilon^{\mathbb{Q}}_t \sim N(0, I_N)$. A canonical GDTSM is one that is maximally flexible in its parameterization of both the $\mathbb{Q}$ and $\mathbb{P}$ distributions of $X_t$, subject only to normalizations that ensure econometric identification.
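To make the timing in (1)–(3) concrete, here is a minimal pure-Python simulation sketch. The function name and the one-factor parameter values are hypothetical; pass the $\mathbb{P}$ parameters to simulate (1) or the $\mathbb{Q}$ parameters to simulate (2).

```python
import random


def simulate_gdtsm(K0, K1, Sigma, rho0, rho1, x0, T, seed=None):
    """Simulate Delta X_t = K0 + K1 X_{t-1} + Sigma eps_t and r_t = rho0 + rho1 . X_t.

    Plain lists stand in for vectors and matrices; eps_t ~ N(0, I_N) via random.gauss.
    """
    rng = random.Random(seed)
    n = len(x0)
    X = [[float(v) for v in x0]]
    for _ in range(T):
        prev = X[-1]
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        X.append([
            prev[i] + K0[i]
            + sum(K1[i][j] * prev[j] for j in range(n))
            + sum(Sigma[i][j] * eps[j] for j in range(n))
            for i in range(n)
        ])
    r = [rho0 + sum(w * x for w, x in zip(rho1, state)) for state in X]
    return X, r


# One-factor illustration: with Sigma = 0 the path converges to -K0/K1 = 0.001 / 0.1 = 0.01.
X, r = simulate_gdtsm([0.001], [[-0.1]], [[0.0]], 0.0, [1.0], [0.0], T=200)
print(round(r[-1], 6))  # 0.01
```

Because the state enters (1) and (2) through the same $\Sigma_X$, only the drifts differ between the two measures; the simulation makes that explicit by reusing one function for both.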
Before formally deriving our canonical GDTSM, we briefly outline the basic
idea. Variations of our canonical form, as well as some of its key implications
for model specification and analysis, are discussed subsequently.
Suppose that $N$ zero-coupon bond yields or $N$ linear combinations of such yields, $\mathcal{P}_t$, are priced perfectly by the model (subsequently we relax this assumption). By a slight abuse of nomenclature, we will refer to these linear combinations of yields as portfolios of yields. Applying invariant transformations,⁸ we show that (i) the pricing factors $X_t$ in (3) can be replaced by the

6 To put this computational advantage into perspective, one needs to read no further than Duffee and Stanton (2007) and Duffee (2009), who highlight numerous computational challenges and multiple local optima associated with their likelihood functions. For example, Duffee reports that each optimization for his parametrization of a three-factor model takes about two days. In contrast, for the GDTSM(3) models examined in this article, convergence to the global optimum of the likelihood function was typically achieved in about ten seconds, even though there are three times as many observations in our sample.
7 All of our results apply equally to a continuous-time Gaussian model. Also, we assume that the risk factors, and hence the yield curve $y_t$, are first-order Markov. See the supplement to this article (Joslin, Singleton, and Zhu 2010) and Joslin, Le, and Singleton (2010) for relaxations of this assumption.
8 Invariant transforms (Dai and Singleton 2000) involve rotating, scaling, and translating the state and parameter vectors to keep the short rate and bond prices unchanged (invariant), usually by mapping $Y_t = AX_t + b$, where $A$ is an invertible matrix. The transformed parameters are outlined in Appendix B.


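observable $\mathcal{P}_t$; and (ii) the $\mathbb{Q}$ distribution of $\mathcal{P}_t$ can be fully characterized by the parameters $\Theta^{\mathbb{Q}}_{\mathcal{P}} \equiv (k^{\mathbb{Q}}_\infty, \lambda^{\mathbb{Q}}, \Sigma_{\mathcal{P}})$, where $\lambda^{\mathbb{Q}}$ is the vector of eigenvalues of $K^{\mathbb{Q}}_{1X}$ and $\Sigma_{\mathcal{P}} \Sigma_{\mathcal{P}}'$ is the covariance of innovations to the portfolios of yields.⁹ When the model is stationary under $\mathbb{Q}$, $k^{\mathbb{Q}}_\infty$ is proportional to the risk-neutral long-run mean of the short rate $r^{\mathbb{Q}}_\infty$, and a GDTSM can be equivalently parameterized in terms of either parameter (see below).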
The prices of all coupon bonds (as well as interest rate derivatives) are determined as functions of these observable pricing factors through no-arbitrage. Importantly, though the pricing factors are now observable, the underlying parameter space of the $\mathbb{Q}$ distribution of $\mathcal{P}$ is still fully characterized by $\Theta^{\mathbb{Q}}_{\mathcal{P}}$. Moreover, the parameters of the $\mathbb{P}$ distribution of the (newly rotated and observable) state vector $\mathcal{P}_t$ are $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ along with $\Sigma_{\mathcal{P}}$. The remainder of this section fleshes out these ideas.
The model-implied yield on a zero-coupon bond of maturity $m$ is an affine function of the state $X_t$ (Duffie and Kan 1996):

$$y_{t,m} = A_m(\Theta^{\mathbb{Q}}_X) + B_m(\Theta^{\mathbb{Q}}_X) \cdot X_t, \tag{4}$$

where $(A_m, B_m)$ satisfy well-known Riccati difference equations (see Appendix A for a summary), and $\Theta^{\mathbb{Q}}_X = (K^{\mathbb{Q}}_{0X}, K^{\mathbb{Q}}_{1X}, \Sigma_X, \rho_{0X}, \rho_{1X})$ is the vector of parameters from (2–3) relevant for pricing. We let $(m_1, m_2, \ldots, m_J)$ be the set of maturities (in years) of the bonds used in estimation of a GDTSM, $J > N$, and $y_t' = (y_{t,m_1}, \ldots, y_{t,m_J}) \in \mathbb{R}^J$ be the corresponding set of model-implied yields.
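Appendix A is not reproduced in this excerpt. As a hedged sketch only, the standard discrete-time recursion implied by (2)–(3) is shown below, writing the log price of an $m$-period zero-coupon bond as $a_m + b_m \cdot X_t$ with $a_0 = 0$, $b_0 = 0$, and counting maturity in model periods (the paper's Appendix A may use different timing or scaling conventions):

$$
\begin{aligned}
e^{a_{m+1} + b_{m+1}\cdot X_t} &= E^{\mathbb{Q}}_t\!\left[e^{-r_t}\, e^{a_m + b_m \cdot X_{t+1}}\right],\\
a_{m+1} &= a_m + b_m \cdot K^{\mathbb{Q}}_{0X} + \tfrac{1}{2}\, b_m' \Sigma_X \Sigma_X' b_m - \rho_{0X},\\
b_{m+1} &= \left(I + K^{\mathbb{Q}}_{1X}\right)' b_m - \rho_{1X},\\
A_m &= -a_m/m, \qquad B_m = -b_m/m.
\end{aligned}
$$

The two recursions follow by substituting (2) for $X_{t+1}$ and (3) for $r_t$, using $E[e^{c'\epsilon}] = e^{c'c/2}$ for $\epsilon \sim N(0, I_N)$, and matching the constant and the coefficient on $X_t$. Note that $b_m$, and hence $B_m$, involves only $(K^{\mathbb{Q}}_{1X}, \rho_{1X})$.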
In general, (4) may be violated in the data due to market effects (e.g., bid-ask spreads or repo specials), violations of no-arbitrage, or measurement errors. We will collectively refer to all of these possibilities simply as measurement or pricing errors. To distinguish between model-implied and observed yields in the presence of pricing errors, we let $y^o_{t,m}$ denote the yields that are observed with measurement error. To be consistent with the data, we must impose auxiliary structure on a GDTSM, beyond no-arbitrage, in the form of a parametric distributional assumption for the measurement errors. We let $\{P_{\theta_m}\}_{\theta_m \in \Theta_m}$ denote the family of measures that describe the conditional distribution of $y_t - y^o_t$.

9 Duffie and Kan (1996) and Cochrane and Piazzesi (2005) also propose to use an identification scheme where
the yields themselves are factors. Adrian and Moench (2008) explore a setting where the pricing factors are the
portfolios themselves; however, they do not impose the internal consistency condition to make the factors equal
to their no-arbitrage equivalents and instead focus on the measurement errors. Our formulation offers an analytic
parametrization and additionally makes transparent our subsequent results.


For any full-rank portfolio matrix $W \in \mathbb{R}^{N \times J}$, we let $\mathcal{P}_t \equiv W y_t$ denote the associated $N$-dimensional set of portfolios of yields, where the $i$th portfolio puts weight $W_{i,j}$ on the yield for maturity $m_j$. Applying (4), we obtain

$$\mathcal{P}_t = A_W(\Theta^{\mathbb{Q}}_X) + B_W(\Theta^{\mathbb{Q}}_X)' X_t, \tag{5}$$

where $A_W = W[A_{m_1}, \ldots, A_{m_J}]'$ and $B_W = [B_{m_1}, \ldots, B_{m_J}] W'$. Note that $B_W(K^{\mathbb{Q}}_{1X}, \rho_1)$ depends only on the subset $(K^{\mathbb{Q}}_{1X}, \rho_1)$ of $\Theta^{\mathbb{Q}}_X$ (see (A3) in Appendix A).


Initially, we assume that there exist portfolios for which the no-arbitrage pricing relations hold exactly:

Case P: There are $N$ portfolios of bond yields $\mathcal{P}_t$, constructed with weights $W$, that are priced perfectly by the GDTSM: $\mathcal{P}^o_t = \mathcal{P}_t$.

We refer to the case where each portfolio consists of a single bond, so that $N$ yields are priced perfectly, as Case Y. We defer until Section 6 the case where all bonds are measured with errors and estimation is accomplished by Kalman filtering.

We now state our main result for Case P:

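Theorem 1. Suppose that Case P holds for given fixed portfolio weights $W$. Then, any canonical GDTSM is observationally equivalent to a unique GDTSM whose pricing factors $\mathcal{P}_t$ are the portfolios of yields $W y_t = W y^o_t$. Moreover, the $\mathbb{Q}$ distribution of $\mathcal{P}_t$ is uniquely determined by $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}})$, where $\lambda^{\mathbb{Q}}$ is ordered.¹⁰ That is,

$$\Delta \mathcal{P}_t = K^{\mathbb{P}}_{0\mathcal{P}} + K^{\mathbb{P}}_{1\mathcal{P}} \mathcal{P}_{t-1} + \Sigma_{\mathcal{P}} \epsilon^{\mathbb{P}}_t, \tag{6}$$

$$\Delta \mathcal{P}_t = K^{\mathbb{Q}}_{0\mathcal{P}} + K^{\mathbb{Q}}_{1\mathcal{P}} \mathcal{P}_{t-1} + \Sigma_{\mathcal{P}} \epsilon^{\mathbb{Q}}_t, \tag{7}$$

$$r_t = \rho_{0\mathcal{P}} + \rho_{1\mathcal{P}} \cdot \mathcal{P}_t \tag{8}$$

is a canonical GDTSM, where $K^{\mathbb{Q}}_{0\mathcal{P}}$, $K^{\mathbb{Q}}_{1\mathcal{P}}$, $\rho_{0\mathcal{P}}$, and $\rho_{1\mathcal{P}}$ are explicit functions of $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}})$. Our canonical form is parametrized by $\Theta_{\mathcal{P}} = (\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}, \Sigma_{\mathcal{P}})$.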

We refer to the GDTSM in Theorem 1 as the JSZ canonical form parametrized by $\Theta_{\mathcal{P}}$. Before formally proving Theorem 1, we outline the main steps. First, we want to show that any GDTSM is observationally equivalent to a model where the states are the observed bond portfolios $\mathcal{P}_t$ (with corresponding weights $W$). Thus, for $\mathcal{G} = \{(K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1, \rho_0, \rho_1, K^{\mathbb{P}}_0, K^{\mathbb{P}}_1, \Sigma)\}$, the set of all

10 We fix an arbitrary ordering on the complex numbers such that 0 is the smallest number.


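possible GDTSMs,¹¹ we want to show that every $\Theta \in \mathcal{G}$ is observationally equivalent to some $\Theta_{\mathcal{P}} \in \mathcal{G}^W_{\mathcal{P}}$, where

$$\mathcal{G}^W_{\mathcal{P}} = \{(K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1, \rho_0, \rho_1, K^{\mathbb{P}}_0, K^{\mathbb{P}}_1, \Sigma) : \text{the factors are portfolios with weights } W\}.$$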

This first step is easily established: for any GDTSM with latent state $X_t$, $\mathcal{P}_t$ satisfies (5). Following Dai and Singleton (2000) (DS), we can, by applying the change of variables outlined in Appendix B, compute the dynamics (under both $\mathbb{P}$ and $\mathbb{Q}$) of $\mathcal{P}_t$ and express $r_t$ as an affine function of $\mathcal{P}_t$. The parameters after this change of variables give an observationally equivalent model where the states are the portfolios of yields.

Second, we establish uniqueness by showing that no two GDTSMs in $\mathcal{G}^W_{\mathcal{P}}$ are observationally equivalent. Clearly, if two GDTSMs are observationally equivalent and have the same observable factors, it must be that $(K^{\mathbb{P}}_0, K^{\mathbb{P}}_1, \Sigma)$ are the same. Intuitively, if the parameters $(K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1, \rho_0, \rho_1)$ are not the same, the price of some bonds would depend differently on the factors, a contradiction. In the second step, we formalize this intuition. Moreover, we show that for given $\lambda^{\mathbb{Q}}$ and $k^{\mathbb{Q}}_\infty$, there exists a unique $(K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1, \rho_0, \rho_1)$ consistent with no-arbitrage and the states being the portfolios of yields $\mathcal{P}_t$. In the third and final step, we reparametrize $\mathcal{G}^W_{\mathcal{P}}$ in terms of the free parameters $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}})$.
In the second step of our proof of Theorem 1, we will use the following
analogue of the canonical form in Joslin (2007), proved in Appendix C.

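Proposition 1. Every canonical GDTSM is observationally equivalent to the canonical GDTSM with $r_t = \iota \cdot X_t$,

$$\Delta X_t = K^{\mathbb{Q}}_{0X} + K^{\mathbb{Q}}_{1X} X_{t-1} + \Sigma_X \epsilon^{\mathbb{Q}}_t, \tag{9}$$

$$\Delta X_t = K^{\mathbb{P}}_{0X} + K^{\mathbb{P}}_{1X} X_{t-1} + \Sigma_X \epsilon^{\mathbb{P}}_t, \tag{10}$$

where $\iota$ is a vector of ones, $\Sigma_X$ is lower triangular (with positive diagonal), $K^{\mathbb{Q}}_{1X}$ is in ordered real Jordan form, $K^{\mathbb{Q}}_{0X,1} = k^{\mathbb{Q}}_\infty$ and $K^{\mathbb{Q}}_{0X,i} = 0$ for $i \neq 1$, and $\epsilon^{\mathbb{Q}}_t, \epsilon^{\mathbb{P}}_t \sim N(0, I_N)$.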

11 More formally, we think of the set of GDTSMs as a set of stochastic processes for the yield curve rather than as a set of parameters governing the stochastic process of the yield curve. To see the correspondence, we define on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ (with associated filtration $\{\mathcal{F}_t\}$) the processes $y : \Omega \times \mathbb{N} \to \mathbb{R}^{\mathbb{N}_+}$. Here, $y^m_t(\omega)$ is the $m$-period yield at time $t$ when the state is $\omega \in \Omega$. When our additional assumptions that $y$ is a Gaussian Markov process and that no-arbitrage holds are maintained (with risk premia at time $t$ depending only on $\mathcal{F}_t$), these processes take the form of (1–3) and (4) for some parameters. In this way, we define a surjective map from the set of GDTSM parameters $(K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1, \rho_0, \rho_1, K^{\mathbb{P}}_0, K^{\mathbb{P}}_1, \Sigma)$ to the set of GDTSM stochastic processes. With this association, two GDTSMs are observationally equivalent when the corresponding stochastic processes have the same finite-dimensional distributions.


Here, we specify the Jordan form with each eigenvalue associated with a single Jordan block (that is, each eigenvalue has a geometric multiplicity of one). Thus, when the eigenvalues are all real, $K^{\mathbb{Q}}_{1X}$ takes the form $K^{\mathbb{Q}}_{1X} = J(\lambda^{\mathbb{Q}}) \equiv \operatorname{diag}(J^{\mathbb{Q}}_1, J^{\mathbb{Q}}_2, \ldots, J^{\mathbb{Q}}_m)$, where each

$$J^{\mathbb{Q}}_i = \begin{pmatrix} \lambda^{\mathbb{Q}}_i & 1 & \cdots & 0 \\ 0 & \lambda^{\mathbb{Q}}_i & \cdots & 0 \\ \vdots & \vdots & \ddots & 1 \\ 0 & \cdots & 0 & \lambda^{\mathbb{Q}}_i \end{pmatrix},$$

and where the blocks are in order of the eigenvalues. (See Appendix C for the real Jordan form when the eigenvalues are complex.) We refer to the set of Jordan canonical GDTSMs as $\mathcal{G}^J$, and it is parametrized by $\Theta^J = (\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, K^{\mathbb{P}}_{0X}, K^{\mathbb{P}}_{1X}, \Sigma_X)$. The eigenvalues $\lambda^{\mathbb{Q}}$ may not be distinct and may be complex. We explore these possibilities empirically in Section 5.
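For the all-real case just described, $J(\lambda^{\mathbb{Q}})$ can be assembled mechanically. The pure-Python sketch below (hypothetical helper names) takes eigenvalues as (value, multiplicity) pairs in the order the caller supplies; it does not implement the paper's ordering convention or the complex real-Jordan blocks of Appendix C.

```python
def jordan_block(lam, m):
    """m x m Jordan block: lam on the diagonal, ones on the superdiagonal."""
    return [[lam if j == i else (1.0 if j == i + 1 else 0.0) for j in range(m)]
            for i in range(m)]


def real_jordan_form(eigs):
    """Block-diagonal J(lambda^Q) from (value, multiplicity) pairs of real eigenvalues.

    One block per eigenvalue (geometric multiplicity one); blocks appear in the given order.
    """
    n = sum(m for _, m in eigs)
    J = [[0.0] * n for _ in range(n)]
    offset = 0
    for lam, m in eigs:
        block = jordan_block(lam, m)
        for i in range(m):
            for j in range(m):
                J[offset + i][offset + j] = block[i][j]
        offset += m
    return J


# Hypothetical example: a repeated root (m_1 = 2) followed by a simple root.
for row in real_jordan_form([(-0.01, 2), (-0.2, 1)]):
    print(row)
# [-0.01, 1.0, 0.0]
# [0.0, -0.01, 0.0]
# [0.0, 0.0, -0.2]
```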
Proof of Theorem 1: Having already established that we can rotate any model to one with $\mathcal{P}_t$ as the observed states, we proceed to prove the second step. Suppose that $\Theta_1, \Theta_2 \in \mathcal{G}^W_{\mathcal{P}}$ index two observationally equivalent canonical models. By the existence result in Proposition 1, each $\Theta_i$ is observationally equivalent to a GDTSM, $\Theta^J_i$, which is in real ordered Jordan canonical form. Since

$$\mathcal{P}_t = A_W(\Theta^J_i) + B_W(\Theta^J_i)' X^{J}_{it}, \tag{11}$$

where $X^J_{it}$ is the latent state for model $\Theta^J_i$, it must be that

$$\Theta_i = A_W(\Theta^J_i) + B_W(\Theta^J_i)' \Theta^J_i. \tag{12}$$

Here, we use the notation that for a GDTSM with parameter vector $\Theta$ and state $X_t$, the observationally equivalent GDTSM with latent state $\hat{X}_t = C + D X_t$ has parameter vector $\hat{\Theta} = C + D\Theta$, as computed in Appendix B. Since observational equivalence is transitive, $\Theta^J_1$ is observationally equivalent to $\Theta^J_2$; the uniqueness result in Proposition 1 implies that $\Theta^J_1 = \Theta^J_2$. The equality in (12) then gives $\Theta_1 = \Theta_2$, which establishes our second step.

To establish the reparametrization in the third step, we focus on (11) and (12). The key is to show explicitly how, given $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty)$ (from $\Theta^J_i$), we can (i) choose the parameters $(K^{\mathbb{P}}_{0J}, K^{\mathbb{P}}_{1J}, \Sigma_J)$ to get any desired $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}, \Sigma_{\mathcal{P}})$; and (ii) construct the $(K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1, \rho_0, \rho_1)$ consistent with the factors being $\mathcal{P}_t$. Details are provided in Appendix D.


For reference, we summarize the transformations computed in the last step as follows.

Proposition 2. Any canonical GDTSM with $\mathbb{Q}$ parameters $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}})$ has the JSZ representation in Theorem 1 with

$$K^{\mathbb{Q}}_{1\mathcal{P}} = B J(\lambda^{\mathbb{Q}}) B^{-1}, \tag{13}$$

$$K^{\mathbb{Q}}_{0\mathcal{P}} = k^{\mathbb{Q}}_\infty B e_{m_1} - K^{\mathbb{Q}}_{1\mathcal{P}} A, \tag{14}$$

$$\rho_{1\mathcal{P}} = (B^{-1})' \iota, \tag{15}$$

$$\rho_{0\mathcal{P}} = -A \cdot \rho_{1\mathcal{P}}, \tag{16}$$

where $e_{m_1}$ is a vector with all zeros except in the $m_1$th entry, which is 1 ($m_1$ is the multiplicity of $\lambda^{\mathbb{Q}}_1$), and $B = B_W(J(\lambda^{\mathbb{Q}}), \iota)'$, $A = A_W(k^{\mathbb{Q}}_\infty e_{m_1}, J(\lambda^{\mathbb{Q}}), B^{-1}\Sigma_{\mathcal{P}}, 0, \iota)$, where $(A_W, B_W)$ are defined in (5) and (A2–A3).
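To make Proposition 2 concrete, here is a pure-Python sketch under stated assumptions: the $\mathbb{Q}$ eigenvalues are real and distinct (so $J(\lambda^{\mathbb{Q}})$ is diagonal and $m_1 = 1$), maturities are counted in model periods, and yield loadings come from the standard discrete-time recursion (Appendix A is not reproduced here, so its exact conventions are an assumption). Helper names, portfolio weights, and parameter values are hypothetical. The final check confirms the internal-consistency property behind the construction: pricing with the returned parameters reproduces the factors, i.e., $A_W = 0$ and $B_W = I_N$.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices only)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def yield_loadings(K0, K1, Sigma, rho0, rho1, maturities):
    """(A_m, B_m) with y_m = A_m + B_m . X, from the assumed recursion on log prices."""
    n = len(rho1)
    cov = matmul(Sigma, transpose(Sigma))
    a, b, out = 0.0, [0.0] * n, {}
    for m in range(1, max(maturities) + 1):
        a = (a + sum(bi * k for bi, k in zip(b, K0))
             + 0.5 * sum(b[i] * cov[i][j] * b[j] for i in range(n) for j in range(n)) - rho0)
        b = [b[j] + sum(K1[i][j] * b[i] for i in range(n)) - rho1[j] for j in range(n)]
        out[m] = (-a / m, [-bj / m for bj in b])
    return [out[m][0] for m in maturities], [out[m][1] for m in maturities]


def jsz_q_parameters(lam, k_inf, Sigma_P, W, maturities):
    """Eqs. (13)-(16) for distinct real eigenvalues (J diagonal, m_1 = 1)."""
    n = len(lam)
    J = [[lam[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    iota, zeros = [1.0] * n, [[0.0] * n for _ in range(n)]
    _, BJ = yield_loadings([0.0] * n, J, zeros, 0.0, iota, maturities)  # B_m needs only (K1, rho1)
    B = matmul(W, BJ)                       # P_t = A + B X_t
    B_inv = inverse(B)
    K0_J = [k_inf] + [0.0] * (n - 1)        # k_inf * e_{m_1}
    AJ, _ = yield_loadings(K0_J, J, matmul(B_inv, Sigma_P), 0.0, iota, maturities)
    A = [sum(w * v for w, v in zip(row, AJ)) for row in W]
    K1Q = matmul(matmul(B, J), B_inv)                                                     # (13)
    K0Q = [k_inf * B[i][0] - sum(K1Q[i][j] * A[j] for j in range(n)) for i in range(n)]  # (14)
    rho1 = [sum(B_inv[i][j] for i in range(n)) for j in range(n)]                         # (15)
    rho0 = -sum(x * y for x, y in zip(A, rho1))                                           # (16)
    return K0Q, K1Q, rho0, rho1


mats = [6, 12, 24, 60, 120]
W = [[0.2] * 5, [-1.0, 0.0, 0.0, 0.0, 1.0], [1.0, 0.0, -2.0, 0.0, 1.0]]
Sigma_P = [[5e-4, 0.0, 0.0], [2e-4, 4e-4, 0.0], [-1e-4, 1e-4, 3e-4]]
K0Q, K1Q, rho0, rho1 = jsz_q_parameters([-0.002, -0.02, -0.2], 8e-6, Sigma_P, W, mats)

A_P, B_P = yield_loadings(K0Q, K1Q, Sigma_P, rho0, rho1, mats)
AW = [sum(w * v for w, v in zip(row, A_P)) for row in W]
BW = matmul(W, B_P)
print(max(abs(x) for x in AW) < 1e-8,
      all(abs(BW[i][j] - (i == j)) < 1e-8 for i in range(3) for j in range(3)))  # True True
```

The check works because the rotation $X_t = B^{-1}(\mathcal{P}_t - A)$ leaves bond prices unchanged, so the portfolio loadings of the rotated model are exactly $W$ applied to themselves.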

Before proceeding, we discuss the interpretation of the parameter $k^{\mathbb{Q}}_\infty$. If $X$ is stationary under $\mathbb{Q}$, then $k^{\mathbb{Q}}_\infty$ and $r^{\mathbb{Q}}_\infty$ (the long-run $\mathbb{Q}$ mean of the short rate) are related according to $r^{\mathbb{Q}}_\infty = k^{\mathbb{Q}}_\infty \sum_{i=1}^{m_1} (-\lambda^{\mathbb{Q}}_1)^{-i}$, where $m_1$ is the dimension of the first Jordan block $J^{\mathbb{Q}}_1$ of $K^{\mathbb{Q}}_{1X}$. Thus, if $\lambda^{\mathbb{Q}}_1$ is not a repeated root ($m_1 = 1$), $r^{\mathbb{Q}}_\infty$ is simply $-k^{\mathbb{Q}}_\infty/\lambda^{\mathbb{Q}}_1$ in stationary models. This is the case in our subsequent empirical illustrations, where we express our normalization in terms of the parameter $r^{\mathbb{Q}}_\infty$ owing to its natural economic interpretation.
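The relation between $k^{\mathbb{Q}}_\infty$ and $r^{\mathbb{Q}}_\infty$ can be checked numerically. The sketch below (hypothetical helper names and values) compares the stated formula with the long-run $\mathbb{Q}$ mean of $r_t = \iota \cdot X_t$ obtained by solving for the stationary point of the first Jordan block, assuming $k^{\mathbb{Q}}_\infty$ enters the $m_1$th entry of the intercept, as in the $e_{m_1}$ of Proposition 2.

```python
import math


def r_inf_from_k_inf(k_inf, lam1, m1):
    """r_inf^Q = k_inf^Q * sum_{i=1}^{m1} (-lam1)^(-i), the relation stated in the text."""
    return k_inf * sum((-lam1) ** (-i) for i in range(1, m1 + 1))


def r_inf_by_fixed_point(k_inf, lam1, m1):
    """Long-run Q mean of r_t = iota . X_t contributed by the first Jordan block.

    Solves 0 = K0 + J_1 x by back substitution (J_1 is upper bidiagonal), with
    K0 = k_inf * e_{m1}. Other blocks are assumed to have zero intercept and nonzero
    eigenvalues, so their long-run means are zero and do not contribute.
    """
    x = [0.0] * m1
    below = 0.0  # x_{i+1}; zero past the end of the block
    for i in reversed(range(m1)):
        k0_i = k_inf if i == m1 - 1 else 0.0
        x[i] = -(k0_i + below) / lam1
        below = x[i]
    return sum(x)


print(round(r_inf_from_k_inf(1e-5, -0.01, 1), 10))  # 0.001, i.e. -k_inf / lam1
for m1 in (1, 2, 3):
    print(m1, math.isclose(r_inf_from_k_inf(1e-5, -0.01, m1),
                           r_inf_by_fixed_point(1e-5, -0.01, m1), rel_tol=1e-12))
# 1 True
# 2 True
# 3 True
```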
That $k^{\mathbb{Q}}_\infty$ and $r^{\mathbb{Q}}_\infty$ are not always interchangeable in defining a proper canonical form for the set of all GDTSMs of form (1–3) can be seen as follows. In proceeding to the normalization of Proposition 1, a model with the factors normalized so that $r_t = \rho_0 + \iota \cdot X_t$ is further normalized by a level translation ($X_t \mapsto X_t - \alpha$). Such level translations can always be used to enforce $\rho_0 = 0$, but they can be used to enforce $K^{\mathbb{Q}}_{0X} = 0$ only in the case that $K^{\mathbb{Q}}_{1X}$ is invertible (i.e., there are no zero eigenvalues).¹² When $m_1 = 1$ and there are no zero eigenvalues, the following two normalizations of $(K^{\mathbb{Q}}_{0X}, \rho_0)$ are equivalent:

$$K^{\mathbb{Q}}_{0X} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \text{ and } \rho_0 = \frac{-k^{\mathbb{Q}}_\infty}{\lambda^{\mathbb{Q}}_1} \quad \text{or} \quad K^{\mathbb{Q}}_{0X} = \begin{pmatrix} k^{\mathbb{Q}}_\infty \\ 0 \\ \vdots \\ 0 \end{pmatrix} \text{ and } \rho_0 = 0. \tag{17}$$

Theorem 1 uses the form with $k^{\mathbb{Q}}_\infty$, and always applies regardless of the eigenvalues of $K^{\mathbb{Q}}_{1X}$.

12 One implication of this observation is that setting both $k^{\mathbb{Q}}_\infty$ and $r^{\mathbb{Q}}_\infty$ to zero in the presence of a $\mathbb{Q}$-nonstationary risk factor, as was done by Christensen, Diebold, and Rudebusch (2007, 2009) in defining their arbitrage-free Nelson-Siegel model, amounts to imposing an over-identifying restriction on the drift of $X_{1t}$.


2. P Dynamics and Maximum Likelihood Estimation

Rather than defining latent states indirectly through a normalization on parameters governing the dynamics (under $\mathbb{P}$ or $\mathbb{Q}$) of latent states, the JSZ normalization has instead prescribed observable yield portfolios $\mathcal{P}$ and parametrized their $\mathbb{Q}$ distribution in a maximally flexible way consistent with no-arbitrage. A distinctive feature of our normalization is that, in estimation, there is an inherent separation between the parameters of the $\mathbb{P}$ and $\mathbb{Q}$ distributions of $\mathcal{P}_t$. In contrast, when the risk factors are latent, estimates of the parameters governing the $\mathbb{P}$ distribution necessarily depend on those of the $\mathbb{Q}$ distribution of the state, since the pricing model is required to either invert the model for the fitted states (when $N$ bonds are priced perfectly) or filter for the unobserved states (when all bonds are measured with errors). This section formalizes this “separation property” of the JSZ normalization.
By Theorem 1, we can, without loss of generality, use $N$ portfolios of the yields, $\mathcal{P}_t = \mathcal{P}^o_t \in \mathbb{R}^N$, as observed factors. Suppose that the individual bond yields, $y_t$, are to be used in estimation and that their associated measurement errors, $y^o_t - y_t$, have the conditional distribution $P_{\theta_m}$, for some $\theta_m \in \Theta_m$. We require only that, for any $P_{\theta_m}$, these errors are conditionally independent of lagged values of the measurement errors and satisfy the consistency condition $P(W y^o_t = \mathcal{P}_t \mid \mathcal{P}_t) = 1$.¹³ Then, the conditional likelihood function (under $\mathbb{P}$) of the observed data $(y^o_t)$ is

$$f(y^o_t \mid y^o_{t-1}; \Theta) = f(y^o_t \mid \mathcal{P}_t; \lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}}, P_{\theta_m}) \times f(\mathcal{P}_t \mid \mathcal{P}_{t-1}; K^{\mathbb{P}}_{1\mathcal{P}}, K^{\mathbb{P}}_{0\mathcal{P}}, \Sigma_{\mathcal{P}}). \tag{18}$$
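Notice the convenient separation of parameters in the likelihood function. The conditional distribution of the yields measured with errors depends only on $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}}, P_{\theta_m})$ and not on $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$. In contrast, the conditional $\mathbb{P}$-density of the pricing factors $\mathcal{P}_t$ depends only on $(K^{\mathbb{P}}_{1\mathcal{P}}, K^{\mathbb{P}}_{0\mathcal{P}}, \Sigma_{\mathcal{P}})$, and not on $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty)$. Using the assumption that $\mathcal{P}_t$ is conditionally Gaussian, the second term in (18) can be expressed as

$$f(\mathcal{P}_t \mid \mathcal{P}_{t-1}; K^{\mathbb{P}}_{1\mathcal{P}}, K^{\mathbb{P}}_{0\mathcal{P}}, \Sigma_{\mathcal{P}}) = (2\pi)^{-N/2} |\Sigma_{\mathcal{P}}|^{-1} \exp\left(-\tfrac{1}{2} \left\| \Sigma_{\mathcal{P}}^{-1} \left(\mathcal{P}_t - E_{t-1}[\mathcal{P}_t]\right) \right\|^2\right), \tag{19}$$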

13 Implicit in this formulation is the possibility that $\mathrm{Cov}(y^o_t \mid \mathcal{P}_t; \lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}})$ is singular. This would be true in Case Y, where some yields are measured without errors, or when certain portfolios of $y^o_t$ are priced perfectly, as with the use of principal components as observable factors or as in Chen and Scott (1995), who use different portfolios of yields as their factors. This setup also accommodates the case where both $\mathcal{P}$ and some of the individual components of $y^o_t$ are priced perfectly by the GDTSM. Furthermore, the errors may be correlated, non-normal, or have time-varying conditional moments depending on $\mathcal{P}_t$. In practice, it has typically been assumed that the pricing errors are normally distributed.


where $E_{t-1}[\mathcal{P}_t] = K^{\mathbb{P}}_{0\mathcal{P}} + (I + K^{\mathbb{P}}_{1\mathcal{P}}) \mathcal{P}_{t-1}$ and where, for a vector $x$, $\|x\|^2$ denotes the squared Euclidean norm, $\sum_i x_i^2$. The parameters $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ that maximize the likelihood function $f$ (conditional on $t = 0$ information), namely

$$(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}) = \arg\max \sum_{t=1}^{T} \log f(y^o_t \mid y^o_{t-1}; K^{\mathbb{P}}_{1\mathcal{P}}, K^{\mathbb{P}}_{0\mathcal{P}}, \Sigma_{\mathcal{P}}) = \arg\min \sum_{t=1}^{T} \left\| \Sigma_{\mathcal{P}}^{-1} \left(\mathcal{P}^o_t - E_{t-1}[\mathcal{P}^o_t]\right) \right\|^2, \tag{20}$$

are the sample ordinary least squares (OLS) estimates, independent of $\Sigma_{\mathcal{P}}$ (Zellner 1962). Summarizing these observations:

Proposition 3. Under Case P, the ML estimates of the $\mathbb{P}$ parameters $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ are given by the OLS estimates of the conditional mean of $\mathcal{P}_t$.
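Proposition 3 can be checked numerically. The pure-Python sketch below (helper names, the simulated two-factor parameters, and the sample size are hypothetical) fits $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ by equation-by-equation OLS of $\Delta\mathcal{P}_t$ on $(1, \mathcal{P}_{t-1})$, then evaluates the log of (19) summed over the sample for several choices of $\Sigma_{\mathcal{P}}$. Because every equation shares the same regressors, the OLS coefficients beat a perturbed alternative under each covariance, which is the Zellner (1962) result behind (20).

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols_var(P):
    """Equation-by-equation OLS of Delta P_t on (1, P_{t-1}); returns (K0, K1)."""
    n = len(P[0])
    Z = [[1.0] + list(p) for p in P[:-1]]
    dP = [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(P[:-1], P[1:])]
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(n + 1)] for i in range(n + 1)]
    K0, K1 = [], []
    for k in range(n):
        coef = solve(ZtZ, [sum(z[i] * d[k] for z, d in zip(Z, dP)) for i in range(n + 1)])
        K0.append(coef[0])
        K1.append(coef[1:])
    return K0, K1


def log_lik(P, K0, K1, L):
    """Sum over t of log f(P_t | P_{t-1}) from (19); L is a lower-triangular Sigma_P."""
    n = len(K0)
    log_det = sum(math.log(abs(L[i][i])) for i in range(n))
    total = 0.0
    for p0, p1 in zip(P[:-1], P[1:]):
        mean = [p0[i] + K0[i] + sum(K1[i][j] * p0[j] for j in range(n)) for i in range(n)]
        u = []
        for i in range(n):  # forward substitution: u = L^{-1} (P_t - E_{t-1}[P_t])
            u.append((p1[i] - mean[i] - sum(L[i][k] * u[k] for k in range(i))) / L[i][i])
        total += -0.5 * n * math.log(2 * math.pi) - log_det - 0.5 * sum(v * v for v in u)
    return total


rng = random.Random(0)
P = [[0.5, 0.1]]
for _ in range(300):
    p = P[-1]
    P.append([p[0] + 0.01 - 0.05 * p[0] + 0.02 * p[1] + 0.05 * rng.gauss(0, 1),
              p[1] - 0.02 + 0.01 * p[0] - 0.10 * p[1] + 0.05 * rng.gauss(0, 1)])

K0_hat, K1_hat = ols_var(P)
for L in ([[1.0, 0.0], [0.0, 1.0]], [[0.05, 0.0], [0.04, 0.01]]):
    perturbed = log_lik(P, [K0_hat[0] + 0.01, K0_hat[1]], K1_hat, L)
    print(log_lik(P, K0_hat, K1_hat, L) > perturbed)  # True for both choices of Sigma_P
```

Forecasts then follow from $E_{t-1}[\mathcal{P}_t] = K^{\mathbb{P}}_{0\mathcal{P}} + (I + K^{\mathbb{P}}_{1\mathcal{P}})\mathcal{P}_{t-1}$, exactly as in an unrestricted VAR.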

Absent constraints linking the $\mathbb{P}$ and $\mathbb{Q}$ dynamics, one can effectively separate the time-series properties ($\mathbb{P}$) of $\mathcal{P}_t$ from the cross-sectional constraints imposed by no-arbitrage ($\mathbb{Q}$). The parameters governing the $\mathbb{P}$ forecast distribution can thus be estimated from the time series alone, regardless of the cross-sectional restrictions. Furthermore, independent of $(\lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}}, \Sigma_{\mathcal{P}})$, the OLS estimates of $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$ are by construction globally optimal. With $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$ in hand, we use the sample conditional variance of $\mathcal{P}_t$, $\hat\Sigma_{\mathcal{P}}\hat\Sigma_{\mathcal{P}}'$, computed from the OLS innovations, as the starting value for the population variance $\Sigma_{\mathcal{P}}\Sigma_{\mathcal{P}}'$. Given $(\lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}})$, this starting value for $\Sigma_{\mathcal{P}}\Sigma_{\mathcal{P}}'$ is again by construction close to the global optimum. We have therefore greatly reduced the number of parameters to be estimated. For instance, in a GDTSM(3) model, the maximum number of parameters, excluding those governing $\mathbb{P}_m^{\theta}$, is 22 (3 for $\lambda^{\mathbb{Q}}$, 1 for $k_\infty^{\mathbb{Q}}$, 6 for $\Sigma_{\mathcal{P}}$, 3 for $K_{0\mathcal{P}}^{\mathbb{P}}$, and 9 for $K_{1\mathcal{P}}^{\mathbb{P}}$). With our normalization, one can focus on only the 4 parameters $(\lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}})$. This underlies the substantial improvement in estimation speed of the JSZ normalization over other canonical forms.
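The parameter count above can be tallied mechanically, and the OLS-based starting value for $\Sigma_{\mathcal{P}}$ takes only a few lines. The sketch below assumes NumPy, assumes $\Sigma_{\mathcal{P}}$ is parametrized as a lower-triangular (Cholesky) factor, consistent with the count of 6 for $N = 3$, and uses hypothetical function names.

```python
import numpy as np

def gdtsm_param_count(N):
    """Parameter count of a GDTSM(N), excluding measurement-error parameters."""
    counts = {
        "lambda_Q": N,                 # eigenvalues of the Q feedback matrix
        "k_inf_Q": 1,
        "Sigma_P": N * (N + 1) // 2,   # lower-triangular volatility matrix (assumption)
        "K0_P": N,
        "K1_P": N * N,
    }
    total = sum(counts.values())
    # (K0_P, K1_P) come from OLS and Sigma_P starts at its OLS value,
    # leaving (lambda_Q, k_inf_Q) as the parameters without a closed-form value
    searched = counts["lambda_Q"] + counts["k_inf_Q"]
    return total, searched

def sigma_start(P):
    """Cholesky factor of the OLS innovation covariance: a starting value for Sigma_P."""
    X = np.column_stack([np.ones(len(P) - 1), P[:-1]])
    Y = np.diff(P, axis=0)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    cov = resid.T @ resid / len(resid)  # ML (divide-by-T) covariance estimate
    return np.linalg.cholesky(cov)      # lower triangular, with cov = L @ L.T

print(gdtsm_param_count(3))  # (22, 4)
```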
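Key to our argument is the fact that we can parametrize the conditional distribution of the yields measured with error independently of the parameters governing the $\mathbb{P}$-conditional mean of $\mathcal{P}$, in the sense of the factorization (18). For any $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}}, \Sigma_{\mathcal{P}}, \lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}})$, we have

$$\prod_{t=1}^{T} f(y_t^o \mid \mathcal{P}_t; \lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}}, \Sigma_{\mathcal{P}}) \, f(\mathcal{P}_t \mid \mathcal{P}_{t-1}; K_{1\mathcal{P}}^{\mathbb{P}}, K_{0\mathcal{P}}^{\mathbb{P}}, \Sigma_{\mathcal{P}}) \le \prod_{t=1}^{T} f(y_t^o \mid \mathcal{P}_t; \lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}}, \Sigma_{\mathcal{P}}) \, f(\mathcal{P}_t \mid \mathcal{P}_{t-1}; K_{1\mathcal{P}}^{\mathbb{P},OLS}, K_{0\mathcal{P}}^{\mathbb{P},OLS}, \Sigma_{\mathcal{P}}), \tag{21}$$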


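where we suppress the dependence on $\mathbb{P}_m^{\theta}$. This inequality follows from the observations that $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$ has no effect on $f(y_t^o \mid \mathcal{P}_t)$ and that, for any $\Sigma_{\mathcal{P}}$, replacing $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$ by its OLS estimate cannot decrease the sample likelihood of $\mathcal{P}_t$ given $\mathcal{P}_{t-1}$.$^{14}$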
It is instructive to compare (18) with the likelihood function that arises in models with observable factors that parameterize the $\mathbb{P}$ distribution of $\mathcal{P}$ and the market prices of these risks. In this case, the parameters are $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$ and $(\rho_0, \rho_1, \Lambda_0, \Lambda_1, \Sigma_{\mathcal{P}})$, where $E_t^{\mathbb{P}}[\mathcal{P}_{t+1}] = E_t^{\mathbb{Q}}[\mathcal{P}_{t+1}] + \Sigma_{\mathcal{P}}(\Lambda_0 + \Lambda_1\mathcal{P}_t)$, for state-dependent market prices of risk $\Lambda_0 + \Lambda_1\mathcal{P}_t$. These parameters are subject to the internal consistency constraints $A_W = 0$ and $B_W = I_N$ that ensure that the model replicates the portfolios of yields $\mathcal{P}$. Moreover, analogous to (18), the factorization of the likelihood function takes the form

$$f(y_t^o \mid y_{t-1}^o; \Theta) = f(y_t^o \mid \mathcal{P}_t; K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}}, \Sigma_{\mathcal{P}}, \rho_0, \rho_1, \Lambda_0, \Lambda_1) \times f(\mathcal{P}_t \mid \mathcal{P}_{t-1}; K_{1\mathcal{P}}^{\mathbb{P}}, K_{0\mathcal{P}}^{\mathbb{P}}, \Sigma_{\mathcal{P}}). \tag{22}$$

Replacing $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$ with $(K_{0\mathcal{P}}^{\mathbb{P},OLS}, K_{1\mathcal{P}}^{\mathbb{P},OLS})$ again increases the second term, but now the first term is affected as well. Thus, within this parameterization, the fact that OLS recovers the ML estimates is completely obscured.$^{15}$

3. On the Relevance of No-arbitrage for Forecasting

The decomposition of the conditional likelihood function of the data in (18) leads immediately to several important insights about the potential roles of no-arbitrage restrictions in out-of-sample forecasting. First, Proposition 3 gives a striking general property of GDTSMs under Case P: the no-arbitrage feature of a GDTSM has no effect on the ML estimates of $K_{0\mathcal{P}}^{\mathbb{P}}$ and $K_{1\mathcal{P}}^{\mathbb{P}}$. This, in turn, implies that forecasts of future values of $\mathcal{P}$ are identical to those from an unconstrained VAR(1) model for $\mathcal{P}_t$.$^{16}$ This result sharpens Duffee's (2009) finding that the restrictions on a VAR implied by an arbitrage-free GDTSM cannot be rejected against the alternative of an unrestricted VAR.$^{17}$ For forecasts of the $N$ portfolios of yields $\mathcal{P}_t$, Proposition 3 shows theoretically that a similar result must hold insofar as Case P is (approximately) valid.

14 The last step requires observable factors, another important element of our argument. See Section 3 and (23).

15 In fact, within a macro-GDTSM with a similar parametrization of internally consistent market prices of risk and observable factors, Ang, Piazzesi, and Wei (2003) report that OLS estimates of $E^{\mathbb{P}}[\mathcal{P}_{t+1} \mid \mathcal{P}_t]$ are (slightly) different from their ML estimates. Our analysis generalizes to macro-GDTSMs (see Joslin, Le, and Singleton 2010), so in fact the OLS estimates are the (conditional) ML estimates.

16 Note that, in principle, enforcing no-arbitrage restrictions may be relevant for the construction of forecast confidence intervals through the dependence on $\Sigma_{\mathcal{P}}$. Empirically, however, this effect is likely to be small.

17 Duffee (2009) also shows theoretically that no-arbitrage is cross-sectionally irrelevant in any affine model under the stochastically singular condition of no measurement errors. That is, if the model fits the data exactly, without measurement errors, the cross-sectional loadings $(A, B)$ of (4) are determined without reference to solving the Riccati difference equations (A2–A3). Duffee does not theoretically explore the time-series implications of the no-measurement-error assumption. In this case, not only would Proposition 3 apply (since Case P is a weaker assumption), so that the OLS estimates are the ML estimates of $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$, but $\Sigma_{\mathcal{P}}$ could also be inferred from a sufficiently large cross-section of bond prices.


The JSZ normalization makes these observations particularly transparent. In contrast, in the (observationally equivalent) specification in (1–3), portfolio yield forecasts are

$$
\begin{aligned}
E_t[\mathcal{P}_{t+1}] - \mathcal{P}_t &= B_W(\Theta^{\mathbb{Q}})\big(E_t[X_{t+1}] - X_t\big) = B_W(\Theta^{\mathbb{Q}})\big(K_{0X}^{\mathbb{P}} + K_{1X}^{\mathbb{P}} X_t\big) \\
&= B_W(\Theta^{\mathbb{Q}})\Big(K_{0X}^{\mathbb{P}} + K_{1X}^{\mathbb{P}} B_W(\Theta^{\mathbb{Q}})^{-1}\big(\mathcal{P}_t - A_W(\Theta^{\mathbb{Q}})\big)\Big).
\end{aligned}
\tag{23}
$$

Thus, with latent states, the portfolio forecasts are expressed in terms of both the $\mathbb{P}$ and $\mathbb{Q}$ parameters of the model. From (23), it is not obvious that OLS recovers the ML estimates of $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$. The JSZ normalization makes the implicit cancellations in (23) explicit.
Second, the structure of the likelihood function reveals that, in contrast to the pricing factors, no-arbitrage restrictions are potentially relevant for forecasting individual yields that are measured with error. The conditional density of $y_t^o$ given $\mathcal{P}_t$ depends on the parameters of the risk-neutral distribution, and these are revealed through the cross-maturity restrictions implied by no-arbitrage. In addition, diffusion invariance implies that $\Sigma_{\mathcal{P}}$ enters both terms of the likelihood function, so efficient estimation of these parameters comes from imposing the structure of a GDTSM.
Finally, the structure of the density $f(y_t^o \mid \mathcal{P}_t)$ also reveals the natural alternative model for assessing gains in forecast precision from imposing no-arbitrage restrictions. The state-space representation of this unconstrained model reflects the presumption that bond yields have a low-dimensional factor structure, but it does not impose the restrictions implied by a no-arbitrage DTSM. Specifically, under Case P, where $\mathcal{P}_t$ is priced perfectly by the GDTSM, the state equation is

$$\Delta X_{t+1} = K_{0X} + K_{1X} X_t + \epsilon_{t+1}, \qquad \epsilon_{t+1} \sim N(0, \Sigma_X) \text{ i.i.d.}, \tag{24}$$

and the observation equation is

$$\begin{pmatrix} \mathcal{P}_t \\ y_t^o \end{pmatrix} = C + D X_t + \begin{pmatrix} 0 \\ e_{mt} \end{pmatrix}, \qquad e_{mt} \sim \mathbb{P}_m^{\theta} \text{ i.i.d.} \tag{25}$$

The parameter set is $\Theta_{SS} = \{(K_{0X}, K_{1X}, \Sigma_X, C, D, \mathbb{P}_m^{\theta})\}$, where $\mathbb{P}_m^{\theta}$ is an observation error distribution that is consistent with Case P.
No-arbitrage requires that the observation equation parameters $(C, D)$ be of the form (4); that is, the dynamics are Gaussian under $\mathbb{Q}$. Additionally, no-arbitrage enforces a link between the possible $(C, D)$ and $\Sigma_X$ (diffusion invariance). Since the parameters are not identified, one also imposes normalizations to achieve a just-identified model. Importantly, the choice of normalizations will in general affect the ML estimates of the parameters $\Theta_{SS}$, but will not affect the distribution of bond yields implied by the state-space model (either in the cross-section or in the time series). For example, one could impose the identification scheme in Dai and Singleton (2000) under either the $\mathbb{P}$ or the $\mathbb{Q}$ measure. The estimates of $(K_{0X}, K_{1X})$ and $(C, D)$ will be choice-specific, but these differences will be offset by changes in the latent states, so that the fits to bond yields will be identical.
Notably, the unconstrained state-space representation (24)–(25) with parameter set $\Theta_{SS}$ is not the unconstrained $J$-dimensional VAR representation of $y_t$. The latter relaxes both the no-arbitrage restrictions (and any over-identifying restrictions) enforced in the GDTSM and the assumed factor structure of bond yields (the dimension of $X_t$ is less than the dimension of $y_t^o$). Consequently, gains in forecasting an individual yield using a GDTSM, relative to the forecasts from an unconstrained VAR model of $y_t$, may be due to the VAR being over-parametrized relative to the unconstrained factor model, to the imposition of no-arbitrage restrictions within the GDTSM, or to both. The role of no-arbitrage restrictions is an empirical issue that can be addressed by comparing the constrained and unconstrained versions of (24)–(25).

4. Irrelevance of Factor Structure for Forecasting

The DTSM literature considers a number of further constraints on the factor
structure of a GDTSM, beyond those implied by the absence of arbitrage. In
addition to making different identification assumptions, one can form a parsi-
monious model by restricting the distribution of certain variables (under either
P or Q) or by restricting the structure of risk premia. We first extend the re-
sults of Section 3 to characterize when this irrelevancy result does (and does
not) hold in more general GDTSMs, and then we discuss the connection of our
results to specific over-identified GDTSMs in the literature.
Within the state-space model (24–25), the parameters $(C, D)$ control the cross-sectional relationship among the yields, while $\mathbb{P}_m^{\theta}$ controls the distribution of the measurement errors. The covariance matrix $\Sigma_X$ of the innovations of the latent states is linked to $\Sigma_{\mathcal{P}}$ through the factor loadings $(C, D)$. The restriction of no-arbitrage, for example, says both that only certain types of loadings $(C, D)$ are feasible (those given by (4)) and that this feasible set depends on the particular value of $\Sigma_X$. Thus, no-arbitrage is a cross-parameter restriction on the feasible set of $(C, D, \Sigma_X)$ in the general state-space model. More generally, one might be interested in restrictions on a particular subset of the parameters $\eta \equiv (C, D, \mathbb{P}_m^{\theta}, \Sigma_X)$, examples of which we discuss in subsequent subsections. The following theorem says that even if restrictions are imposed on $\eta$, as long as $(K_{0X}, K_{1X})$ are unrestricted, OLS will recover the ML estimates of $(K_{0\mathcal{P}}, K_{1\mathcal{P}})$. In general, $(K_{0X}, K_{1X})$ will change with the restrictions imposed on $\eta$, but only through an affine transformation of the latent states.


Theorem 2. Given the state-space model (24–25) and the portfolio matrix $W$ determining the factors $\mathcal{P}_t$, let $H$ be a subset of the admissible set of $\eta$ such that, for any $(C, D, \Sigma_X, \mathbb{P}_m^{\theta}) \in H$, the $N \times N$ upper-left block of $D$ has full rank. Consider the ML problem with $\eta$ constrained to lie in the subspace $H$:

$$(K_{0X}^H, K_{1X}^H, \eta^H) \in \arg\max_{K_{0X}, K_{1X};\ \eta \in H} f(\mathcal{P}_T, y_T, \ldots, \mathcal{P}_1, y_1 \mid \mathcal{P}_0, y_0).$$

Then $(K_{0X}^H, K_{1X}^H, \eta^H)$ are such that

$$K_{0\mathcal{P}} = D_{\mathcal{P}}^H K_{0X}^H - D_{\mathcal{P}}^H K_{1X}^H (D_{\mathcal{P}}^H)^{-1} C_{\mathcal{P}}^H, \tag{26}$$

$$K_{1\mathcal{P}} = D_{\mathcal{P}}^H K_{1X}^H (D_{\mathcal{P}}^H)^{-1}, \tag{27}$$

where $C_{\mathcal{P}}^H$ is the first $N$ elements of $C^H$, $D_{\mathcal{P}}^H$ is the upper-left $N \times N$ block of $D^H$, and $(K_{0\mathcal{P}}, K_{1\mathcal{P}})$ are the OLS estimates of the regression

$$\Delta\mathcal{P}_{t+1} = K_{0\mathcal{P}} + K_{1\mathcal{P}}\mathcal{P}_t + \epsilon_{t+1}^{\mathcal{P}}.$$

The proof is similar to that of Proposition 3, though notationally more abstract, and is presented in Appendix E.
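The mapping (26)–(27) is linear algebra on $\mathcal{P}_t = C_{\mathcal{P}} + D_{\mathcal{P}} X_t$. The sketch below (NumPy; randomly drawn, purely illustrative matrices; hypothetical function names) checks two consequences: the implied $(K_{0\mathcal{P}}, K_{1\mathcal{P}})$ do not depend on which affine normalization $Z_t = M X_t + m$ of the latent states is used, and the one-step forecast of $\mathcal{P}_t$ computed through $X_t$ equals the one computed directly from the VAR in $\mathcal{P}_t$.

```python
import numpy as np

def factor_drift_from_latent(K0X, K1X, C_P, D_P):
    """Drift of P_t = C_P + D_P X_t implied by Delta X_{t+1} = K0X + K1X X_t + eps, eqs. (26)-(27)."""
    D_inv = np.linalg.inv(D_P)
    K1P = D_P @ K1X @ D_inv
    K0P = D_P @ K0X - D_P @ K1X @ D_inv @ C_P
    return K0P, K1P

def renormalize(K0X, K1X, C_P, D_P, M, m):
    """Re-express the same model in terms of Z_t = M X_t + m (a different normalization)."""
    M_inv = np.linalg.inv(M)
    K1Z = M @ K1X @ M_inv
    K0Z = M @ K0X - K1Z @ m
    return K0Z, K1Z, C_P - D_P @ M_inv @ m, D_P @ M_inv

rng = np.random.default_rng(3)
N = 3
K0X, K1X = rng.normal(size=N), -0.1 * np.eye(N) + 0.05 * rng.normal(size=(N, N))
C_P, D_P = rng.normal(size=N), np.eye(N) + 0.2 * rng.normal(size=(N, N))
M, m = 2.0 * np.eye(N) + 0.3 * rng.normal(size=(N, N)), rng.normal(size=N)

K0P, K1P = factor_drift_from_latent(K0X, K1X, C_P, D_P)
K0P_alt, K1P_alt = factor_drift_from_latent(*renormalize(K0X, K1X, C_P, D_P, M, m))
print(np.allclose(K0P, K0P_alt), np.allclose(K1P, K1P_alt))

# One-step forecast of P_t computed through the latent state and directly from the P-VAR
X_t = rng.normal(size=N)
P_t = C_P + D_P @ X_t
print(np.allclose(C_P + D_P @ (X_t + K0X + K1X @ X_t), P_t + K0P + K1P @ P_t))
```

Because any normalization change is absorbed in this way, restrictions that only move $(C, D, \Sigma_X)$ and the latent states leave the OLS drift of $\mathcal{P}_t$ untouched.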
Using this result, we first illustrate the estimation of the general state-space model (24–25) when the possibility of arbitrage is not precluded. We then explore the implications of restrictions on the $\mathbb{Q}$ and $\mathbb{P}$ distributions, as well as on risk premia, for the conditional distribution of $\mathcal{P}_t$.

4.1 Factor Structure in Arbitrage Models

The factor model (24–25) is not necessarily consistent with the absence of arbitrage, because the loadings in (25) need not come from the solution of (4) for a given choice of $\Theta_X^{\mathbb{Q}}$. Nevertheless, this model is still of interest, as it provides a baseline "factor structure" for the yield curve (cf. Duffee 2009). Theorem 2 implies that, under Case P, the OLS estimates of the parameters governing (24) are identical to their counterparts from system ML estimation of (24–25) when the factors $\mathcal{P}_t$ are observed portfolios of bond yields.

Moreover, if, in addition to Case P, the state-space model has temporally i.i.d. normal pricing errors in (25), and these errors are orthogonal to the portfolio matrix $W$, then the OLS regression of the observed yields onto the factors $\mathcal{P}$ gives the ML estimates of the unconstrained ("with arbitrage") cross-sectional loadings $(C, D)$ in (25). In this case, the OLS regression estimates of $\Sigma_{\mathcal{P}}$ must also correspond (through the invariant transformation given in Theorem 2) to the ML estimates of $\Sigma_X$ for the factor model. Taken together, these procedures provide a simple prescription for constructing alternative reference models (to arbitrage-free GDTSMs) that maintain the factor structure but do not impose no-arbitrage. In the empirical analysis in Section 5, we focus on comparisons of OLS forecasts of PCs with their forecasts from a variety of arbitrage-free models. These "with arbitrage" factor models provide a natural reference model when one is interested in forecasting yields.
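A minimal sketch of such a "with arbitrage" reference model, assuming NumPy, observed factors $\mathcal{P}_t$ stored as a T × N array, yields stored as a T × J array, and hypothetical function names: the VAR(1) for $\mathcal{P}_t$ and the cross-sectional loadings are both fitted by OLS, and yield forecasts are obtained by iterating the VAR and applying the loadings. It illustrates the prescription in the text and is not the code behind the empirical results.

```python
import numpy as np

def fit_reference_model(P, Y):
    """OLS 'with arbitrage' factor model: VAR(1) for P_t plus a regression of yields Y_t on P_t."""
    T = len(P)
    Xv = np.column_stack([np.ones(T - 1), P[:-1]])
    coef_var, *_ = np.linalg.lstsq(Xv, np.diff(P, axis=0), rcond=None)
    K0, K1 = coef_var[0], coef_var[1:].T         # Delta P_{t+1} = K0 + K1 P_t + eps
    Xc = np.column_stack([np.ones(T), P])
    coef_cs, *_ = np.linalg.lstsq(Xc, Y, rcond=None)
    c, d = coef_cs[0], coef_cs[1:].T             # Y_t = c + d P_t + e_t
    return K0, K1, c, d

def forecast_yields(P_t, K0, K1, c, d, h=1):
    """h-step forecast of the yields: iterate the VAR for P, then apply the loadings."""
    P = np.array(P_t, dtype=float)
    for _ in range(h):
        P = P + K0 + K1 @ P
    return c + d @ P

# Illustrative synthetic data (not the paper's data set)
rng = np.random.default_rng(4)
T, N, J = 250, 3, 4
P = np.cumsum(0.1 * rng.standard_normal((T, N)), axis=0)
c_true, d_true = rng.normal(size=J), rng.normal(size=(J, N))
Y = c_true + P @ d_true.T + 0.01 * rng.standard_normal((T, J))
K0, K1, c, d = fit_reference_model(P, Y)
print(forecast_yields(P[-1], K0, K1, c, d, h=12))
```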

4.2 Irrelevance of Constraints on the Q Distribution of Yields

The JSZ normalization characterizes the state in terms of an observable portfolio of zero-coupon yields. The conditional $\mathbb{Q}$ distribution of $\mathcal{P}_{t+\tau}$ (as a function of $\mathcal{P}_t$) is expressed in (7), which we have shown can be parametrized by $(\lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}}, \Sigma_{\mathcal{P}})$. Within the model (that is, without measurement errors), $\mathcal{P}$ is informative about the entire yield curve. Thus, one type of restriction a researcher may be interested in imposing is on the conditional $\mathbb{Q}$ distribution of $\mathcal{P}_{t+\tau}$ (or $y_{t+\tau}$) as a function of $\mathcal{P}_t$ (or $y_t$).$^{18}$ Such constraints further restrict (beyond the no-arbitrage restrictions) the cross-sectional loadings $(C, D)$ in the general state-space model, as well as which innovation covariances are possible. Theorem 2 shows that restrictions on the $\mathbb{Q}$ distribution of $y_{t+\tau}$, as a function of $y_t$, are irrelevant for forecasting $\mathcal{P}_t$. Put differently, in the JSZ-normalized GDTSM, restrictions that affect only the parameters of the $\mathbb{Q}$ distribution of $\mathcal{P}_t$ ($\lambda^{\mathbb{Q}}$, $k_\infty^{\mathbb{Q}}$, as well as $\Sigma_{\mathcal{P}}$) are irrelevant for forecasting the portfolios of yields $\mathcal{P}_t$. Though latent-factor representations like (23) suggest that the $\mathbb{Q}$ parameters enter into $E_t^{\mathbb{P}}[\mathcal{P}_{t+1}]$, in fact, absent restrictions across the $\mathbb{P}$ and $\mathbb{Q}$ parameters of the model, any $\mathbb{Q}$ restrictions must affect $(K_{0X}^{\mathbb{P}}, K_{1X}^{\mathbb{P}})$ in a manner that "cancels" their impact on $E_t^{\mathbb{P}}[\mathcal{P}_{t+1}]$.
One example of such a constraint in the literature is the arbitrage-free Nelson-Siegel (AFNS) model of Christensen, Diebold, and Rudebusch (2007). The AFNS model allows for a dynamically consistent GDTSM where, except for a convexity-induced intercept, the factor loadings correspond to those of Nelson and Siegel (1987). Since the AFNS model is the constrained special case of the JSZ normalization with $\lambda^{\mathbb{Q}} = (0, \lambda, \lambda)$ and $k_\infty^{\mathbb{Q}} = 0$,$^{19}$ an immediate implication of this observation is that forecasts of $\mathcal{P}$ using an AFNS model are equivalent to forecasts based on an unconstrained VAR(1) representation of $\mathcal{P}$. Proposition 3 implies that these restrictions do not affect the ML estimates of $K_{0\mathcal{P}}^{\mathbb{P}}$ and $K_{1\mathcal{P}}^{\mathbb{P}}$ and, hence, that they cannot improve the forecasts of $\mathcal{P}$ relative to an unconstrained VAR(1). Thus, the forecast gains that Christensen, Diebold, and Rudebusch (2007) attribute to the structure of their AFNS pricing model are, instead, a consequence of the joint imposition of no-arbitrage and their constraints on the $\mathbb{P}$ distribution of bond yields.
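For reference, the standard Nelson and Siegel (1987) level, slope, and curvature loadings that the AFNS model reproduces (up to its convexity intercept) can be evaluated as below. This is a sketch assuming NumPy and a hypothetical function name; the decay parameter `lam` plays the role of $\lambda$, but the exact sign and scale mapping to the entries of $\lambda^{\mathbb{Q}}$, and the AFNS convexity term, are not reproduced here.

```python
import numpy as np

def nelson_siegel_loadings(tau, lam):
    """Level, slope, and curvature loadings at maturities tau for decay parameter lam."""
    tau = np.asarray(tau, dtype=float)
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x            # tends to 1 at short maturities, 0 at long ones
    curvature = slope - np.exp(-x)            # hump-shaped, 0 at both ends
    level = np.ones_like(tau)
    return np.column_stack([level, slope, curvature])

print(nelson_siegel_loadings([0.5, 2.0, 10.0], 0.6))
```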

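18 More precisely, under $\mathbb{Q}$, $y_{t+\tau} \mid \mathcal{F}_t \sim N(\mu_t^\tau, \Sigma^\tau)$. If we express $\mu_t^\tau = \mu^\tau(y_t)$, restrictions on $\Sigma^\tau$ or on the functional form $\mu^\tau$ are irrelevant. More generally, since $E_t^{\mathbb{P}}[y_{t+s}] \in \mathcal{F}_t = \sigma(y_t)$, restrictions of the form $E_t^{\mathbb{Q}}[y_{t+\tau}] = g(E_t^{\mathbb{P}}[y_{t+\tau}])$ may affect forecasts.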
19 We show this formally in Joslin, Singleton, and Zhu (2010).


4.3 Conditions for Irrelevance of Constraints on Latent Factors

A conclusion of Section 4.2 is that restrictions on the parameters governing the $\mathbb{Q}$ distribution of yield factors are irrelevant for forecasts. In this section, we address the more general question of whether a parameter constraint on "$\mathbb{Q}$ parameters" within an identified GDTSM with latent factors affects forecasts. For example, a researcher may consider the following procedure. Begin with a GDTSM with the normalizations of Dai and Singleton (2000) (DS) applied under $\mathbb{Q}$: $(K_{0X}^{\mathbb{P}}, K_{1X}^{\mathbb{P}})$ are free, while $\Sigma_X = I$, $K_{0X}^{\mathbb{Q}} = 0$, and $K_{1X}^{\mathbb{Q}}$ is (ordered) lower triangular (or real Schur, to accommodate complex eigenvalues). After estimation, a more parsimonious model is obtained by setting to zero any coefficients in $K_{1X}^{\mathbb{Q}}$ that are insignificantly different from zero (or by using an iterative AIC- or BIC-type procedure). A similar procedure is followed in, for example, Dai and Singleton (2002).
When $K_{0X}^{\mathbb{P}}$ and $K_{1X}^{\mathbb{P}}$ are unconstrained, constraints such as these on $\mathbb{Q}$-identified parameters are joint constraints on the cross-sectional properties of the yield curve and the covariance of innovations. To see this, one can invert the latent factors into the observable factors and observe that the constraints become non-linear constraints on $(\lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}}, \Sigma_{\mathcal{P}})$ within the JSZ normalization. Theorem 2, however, directly shows that the resulting forecasts for $\mathcal{P}_t$ will be identical whether the constraints are imposed or not. The constraints will in general change the estimated $K_{0X}^{\mathbb{P}}$ and $K_{1X}^{\mathbb{P}}$, but they will also change the loadings and the latent states, so that the forecasts of $\mathcal{P}_t$ do not change.
Alternatively, one could first apply a normalization under $\mathbb{P}$ and then restrict the parameters governing the $\mathbb{Q}$-conditional distribution of the implied latent states. For example, as above, one could apply the DS normalization under $\mathbb{P}$, where $(K_{0X}^{\mathbb{P}}, K_{1X}^{\mathbb{P}})$ are restricted while $(K_{0X}^{\mathbb{Q}}, K_{1X}^{\mathbb{Q}})$ are free. Duffee and Stanton (2007), for example, apply such a normalization. With this type of $\mathbb{P}$ identification, Theorem 2 no longer applies, and it is easy to see that restrictions on the $\mathbb{Q}$ parameters (i.e., on the $\mathbb{Q}$-conditional distribution of the latent factors as a function of the latent factors) will in general affect the forecasts of $\mathcal{P}_t$.

4.4 Relevance of Constraints on the Structure of Excess Returns

Central to the preceding irrelevance results is the absence of restrictions across the parameters of the $\mathbb{P}$ and $\mathbb{Q}$ distributions of $\mathcal{P}_t$. Such constraints would arise in practice if, for instance, the GDTSM-implied expected excess returns on bonds of different maturities lie in a space of dimension $L$ less than $\dim(\mathcal{P}_t) = N$. Put another way, some risks in the economy may have either zero or constant risk premia. When $L < N$, it also follows that time variation in risk premia depends only on an $L$-dimensional state variable. Cochrane and Piazzesi (2005, 2008) conclude that $L = 1$ when conditioning risk premiums only on yield curve information. Joslin, Priebsch, and Singleton (2010) find that $L$ is at least two when expected excess returns are conditioned on $\mathcal{P}_t$, inflation, and output growth. We explore the relevance, for forecasting bond yields, of imposing this dimension-$L$ constraint within GDTSMs that condition risk premiums on the pricing factors $\mathcal{P}$. When this constraint is (approximately) valid, improved forecasts of $y_t$ may arise from the associated reduction in the dimensionality of the parameter space.
To interpret this constraint, note from Cox and Huang (1989) and Joslin, Priebsch, and Singleton (2010) that one-period expected excess returns on portfolios of bonds with payoffs that track the pricing factors $\mathcal{P}_t$, say $xr_t^{\mathcal{P}}$, are given by the components of

$$xr_t^{\mathcal{P}} = K_{0\mathcal{P}}^{\mathbb{P}} - K_{0\mathcal{P}}^{\mathbb{Q}} + (K_{1\mathcal{P}}^{\mathbb{P}} - K_{1\mathcal{P}}^{\mathbb{Q}})\mathcal{P}_t. \tag{28}$$

That is, the $i$th component of $(K_{1\mathcal{P}}^{\mathbb{P}} - K_{1\mathcal{P}}^{\mathbb{Q}})\mathcal{P}_t$ is the source of the risk premium for pure exposures to the $i$th component of $\mathcal{P}_t$. Therefore, the constraint that the one-period expected excess returns on bond portfolios are driven by $L$ linear combinations of the pricing factors $\mathcal{P}$ amounts to the constraint that the rank of $A_{RRP} = K_{1\mathcal{P}}^{\mathbb{P}} - K_{1\mathcal{P}}^{\mathbb{Q}}$ is $L$.$^{20}$
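The rank restriction on $A_{RRP}$ can be illustrated numerically. In the sketch below (NumPy; all parameter values are made up; the function name is hypothetical), $K_{1\mathcal{P}}^{\mathbb{P}}$ is built as $K_{1\mathcal{P}}^{\mathbb{Q}}$ plus a rank-one matrix, so $L = 1$, and the demeaned expected excess returns from (28), evaluated at random states, then vary along a single direction.

```python
import numpy as np

def expected_excess_returns(P_t, K0P, K1P, K0Q, K1Q):
    """One-period expected excess returns on the factor-mimicking portfolios, eq. (28)."""
    return (K0P - K0Q) + (K1P - K1Q) @ P_t

rng = np.random.default_rng(5)
N, L = 3, 1
K1Q = np.diag([-0.01, -0.1, -0.4])
a, b = rng.normal(size=(N, L)), rng.normal(size=(N, L))
A_RRP = 0.05 * a @ b.T                    # rank L by construction
K1P = K1Q + A_RRP
K0Q, K0P = np.zeros(N), 0.01 * rng.normal(size=N)

# An explicit tolerance guards against floating-point noise in the rank computation
print(np.linalg.matrix_rank(K1P - K1Q, tol=1e-10))

# Time variation in xr_t lies in the L-dimensional column space of A_RRP
xr = np.array([expected_excess_returns(rng.normal(size=N), K0P, K1P, K0Q, K1Q) for _ in range(50)])
print(np.linalg.matrix_rank(xr - xr.mean(axis=0), tol=1e-10))
```

Both printed ranks should equal `L`: although all three portfolio risk premia move over time, they move in lockstep.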
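The reduced-rank risk premium GDTSMs can be estimated through a concentration of the likelihood in the same spirit as (18). Given $(\lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}}, \Sigma_{\mathcal{P}}, \mathbb{P}_m^{\theta})$, the ML estimates of $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$ can be computed as follows. First, compute $(\alpha, \beta)$ from the regression

$$\Delta\mathcal{P}_{t+1} - (K_{0\mathcal{P}}^{\mathbb{Q}} + K_{1\mathcal{P}}^{\mathbb{Q}}\mathcal{P}_t) = \alpha + \beta\mathcal{P}_t + \epsilon_{t+1}^{\mathcal{P}}, \tag{29}$$

where we fix the volatility matrix $\Sigma_{\mathcal{P}}$ of the errors $\epsilon_{t+1}^{\mathcal{P}}$ and impose the constraint that $\beta$ has rank $L$. We show in Appendix F how one can compute the ML estimates of this constrained regression in closed form. For a given $(\lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}}, \Sigma_{\mathcal{P}}, \mathbb{P}_m^{\theta})$, the ML estimates of the $\mathbb{P}$ parameters are then given by

$$K_{0\mathcal{P}}^{\mathbb{P}} = K_{0\mathcal{P}}^{\mathbb{Q}} + \hat\alpha, \qquad K_{1\mathcal{P}}^{\mathbb{P}} = K_{1\mathcal{P}}^{\mathbb{Q}} + \hat\beta. \tag{30}$$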
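Appendix F is not reproduced here, but one standard way to solve the rank-constrained regression (29) in closed form with $\Sigma_{\mathcal{P}}$ fixed is to whiten by $\Sigma_{\mathcal{P}}^{-1}$, fit the unrestricted regression by OLS, and replace the fitted values by their best rank-$L$ approximation (the Eckart-Young theorem). The sketch below does exactly that. It assumes NumPy, takes $(K_{0\mathcal{P}}^{\mathbb{Q}}, K_{1\mathcal{P}}^{\mathbb{Q}})$ as given (in the JSZ form they would be computed from $(\lambda^{\mathbb{Q}}, k_\infty^{\mathbb{Q}}, \Sigma_{\mathcal{P}})$, which is not shown), uses a hypothetical function name and simulated data, and may differ in presentation from the authors' derivation.

```python
import numpy as np

def reduced_rank_drift(P, K0Q, K1Q, Sigma, L):
    """ML estimates of (K0P^P, K1P^P) when rank(K1P^P - K1P^Q) = L, for fixed Q drift and Sigma_P.

    Regression (29): Delta P_{t+1} - (K0Q + K1Q P_t) = alpha + beta P_t + eps_{t+1},
    with Cov(eps) = Sigma Sigma' held fixed and rank(beta) = L.
    """
    Y = np.diff(P, axis=0) - (K0Q + P[:-1] @ K1Q.T)     # left-hand side of (29), shape (T, N)
    X = P[:-1]
    Winv = np.linalg.inv(Sigma)                          # whitening: e -> Sigma^{-1} e
    Yw = Y @ Winv.T
    Xc, Ywc = X - X.mean(axis=0), Yw - Yw.mean(axis=0)   # demeaning concentrates out alpha
    B, *_ = np.linalg.lstsq(Xc, Ywc, rcond=None)        # unrestricted fit: Ywc ~ Xc @ B
    _, _, Vt = np.linalg.svd(Xc @ B, full_matrices=False)
    V_L = Vt[:L].T                                       # top-L right singular vectors of the fit
    beta_w = V_L @ V_L.T @ B.T                           # best rank-L whitened slope
    beta = Sigma @ beta_w                                # undo the whitening
    alpha = Y.mean(axis=0) - beta @ X.mean(axis=0)
    return K0Q + alpha, K1Q + beta                       # eq. (30)

# Illustrative simulated data (all values are made up)
rng = np.random.default_rng(6)
N, T = 3, 400
K0Q, K1Q = np.zeros(N), np.diag([-0.01, -0.1, -0.4])
Sigma = np.array([[0.2, 0.0, 0.0], [0.05, 0.1, 0.0], [0.02, 0.03, 0.05]])
P = np.zeros((T + 1, N))
for t in range(T):
    P[t + 1] = P[t] + 0.01 + (K1Q - 0.02 * np.eye(N)) @ P[t] + Sigma @ rng.standard_normal(N)

K0_1, K1_1 = reduced_rank_drift(P, K0Q, K1Q, Sigma, L=1)
K0_3, K1_3 = reduced_rank_drift(P, K0Q, K1Q, Sigma, L=3)  # rank constraint not binding
print(np.linalg.matrix_rank(K1_1 - K1Q, tol=1e-10))
```

With `L = N` the constraint is slack and the estimates collapse to OLS, consistent with Proposition 3; with `L < N` they depend on both $\Sigma_{\mathcal{P}}$ and the $\mathbb{Q}$ drift, which is the cross-measure link discussed next.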

In comparison to the setting underlying Proposition 3 and Theorem 2, reduced-rank risk premia enforce constraints across the parameters of the $\mathbb{P}$ and $\mathbb{Q}$ distributions. Consequently, the ML estimates of the $\mathbb{P}$ parameters are no longer given by their OLS counterparts. This, in turn, means that the implications of Proposition 3 discussed in Section 4.2 will, in general, no longer apply. Under the reduced-rank restrictions, any further assumptions on the $\mathbb{Q}$ parameters (such as the constraints of the AFNS model) will directly affect the estimated $\mathbb{P}$ parameters, as there is a link between the cross-section and time-series properties of yields. We explore the empirical implications of these observations in Section 5.

20 Alternatively, we could restrict the rank of $[K_{0\mathcal{P}}^{\mathbb{P}} - K_{0\mathcal{P}}^{\mathbb{Q}},\; K_{1\mathcal{P}}^{\mathbb{P}} - K_{1\mathcal{P}}^{\mathbb{Q}}]$ to $L$. This would enforce the stronger restriction that only $L$ linear combinations of the factors have non-zero expected excess returns.

4.5 Relevance of Constraints on the P Distribution of Yields

So far, we have demonstrated that neither the imposition of no-arbitrage nor restrictions on the $\mathbb{Q}$ dynamics have any effect on the ML estimates of $K_{0\mathcal{P}}^{\mathbb{P}}$ and $K_{1\mathcal{P}}^{\mathbb{P}}$. However, restrictions on risk premia, such as the reduced-rank assumption, link $\mathbb{P}$ and $\mathbb{Q}$ and interact with no-arbitrage to affect estimates of $K_{0\mathcal{P}}^{\mathbb{P}}$ and $K_{1\mathcal{P}}^{\mathbb{P}}$. We now complete this discussion by examining whether no-arbitrage affects the distribution of bond yields when one also imposes stand-alone restrictions on the $\mathbb{P}$ distribution of yields that do not impinge on the $\mathbb{Q}$ distribution, either directly or indirectly through risk premiums. Examples of such restrictions are that the yield portfolios are cointegrated or that the conditional mean of each portfolio yield does not depend on the other portfolio yields.$^{21}$ One can impose such restrictions without reference to a no-arbitrage model.
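In these examples, OLS no longer recovers the ML estimates of the parameters, because the restrictions break the structure (identical, unrestricted regressors in every equation) under which OLS and GLS coincide; rather, to obtain efficient estimates given $\Sigma_{\mathcal{P}}$, one must implement generalized least squares (GLS). Let $(K_0^{c*}(\Sigma_{\mathcal{P}}), K_1^{c*}(\Sigma_{\mathcal{P}}))$ denote the GLS estimates of $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$ given $\Sigma_{\mathcal{P}}$:

$$\big(K_0^{c*}(\Sigma_{\mathcal{P}}), K_1^{c*}(\Sigma_{\mathcal{P}})\big) = \arg\max_{K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}}} \sum_{t=1}^{T} \log f(\mathcal{P}_t^o \mid \mathcal{P}_{t-1}^o; K_{1\mathcal{P}}^{\mathbb{P}}, K_{0\mathcal{P}}^{\mathbb{P}}, \Sigma_{\mathcal{P}}), \tag{31}$$

where the arg max is taken over $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$ satisfying the appropriate restriction on the $\mathbb{P}$ dynamics. In the presence of such restrictions, $(K_0^{c*}, K_1^{c*})$ depends non-trivially on $\Sigma_{\mathcal{P}}$. This dependence means that no-arbitrage (which links $\Sigma_{\mathcal{P}}$ across $\mathbb{P}$ and $\mathbb{Q}$) affects the ML estimates of $(K_{0\mathcal{P}}^{\mathbb{P}}, K_{1\mathcal{P}}^{\mathbb{P}})$.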
We explore the empirical implications of two types of restrictions on the $\mathbb{P}$ distribution of yields in Section 5: (1) a model with $K_{1\mathcal{P}}^{\mathbb{P}}$ constrained to be diagonal; and (2) a model in which the $\mathcal{P}_t$ are cointegrated (with one unit root and no trend).
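The dependence of the restricted estimates on $\Sigma_{\mathcal{P}}$ is easy to see in the diagonal-$K_{1\mathcal{P}}^{\mathbb{P}}$ case: each equation now has its own regressor, so GLS (seemingly unrelated regressions) no longer collapses to OLS. The sketch below (NumPy, simulated data with made-up parameters, hypothetical function name) estimates the restricted model under two covariance matrices; with the identity it reproduces equation-by-equation OLS, while a correlated covariance generally moves the estimates.

```python
import numpy as np

def gls_diagonal_K1(P, Omega):
    """GLS estimate of (K0, K1) with K1 restricted to be diagonal, given innovation covariance Omega."""
    Y, X = np.diff(P, axis=0), P[:-1]
    T, N = Y.shape
    Z = np.zeros((N * T, 2 * N))          # equation i uses only [1, P_{i,t}] as regressors
    for i in range(N):
        Z[i * T:(i + 1) * T, 2 * i] = 1.0
        Z[i * T:(i + 1) * T, 2 * i + 1] = X[:, i]
    W = np.kron(np.linalg.inv(Omega), np.eye(T))
    y = Y.T.reshape(-1)                   # stack the data equation by equation
    b = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
    return b[0::2], np.diag(b[1::2])

rng = np.random.default_rng(7)
N, T = 3, 300
K1_true = np.array([[-0.05, 0.02, 0.0], [0.01, -0.1, 0.03], [0.0, 0.02, -0.3]])
chol = np.linalg.cholesky(np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]]))
P = np.zeros((T + 1, N))
for t in range(T):
    P[t + 1] = P[t] + K1_true @ P[t] + 0.1 * chol @ rng.standard_normal(N)

_, K1_a = gls_diagonal_K1(P, np.eye(N))      # identity covariance: equation-by-equation OLS
_, K1_b = gls_diagonal_K1(P, chol @ chol.T)  # correlated covariance
print(np.diag(K1_a), np.diag(K1_b))          # generally different
```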

4.6 Comparing the JSZ Normalization to Other Canonical Models

The normalizations adopted by DS and Joslin (2007) preserve the latent factor structure in (9–10), in contrast to the rotation to observable pricing factors in the JSZ normalization. To our knowledge, the only other normalization that has an "observable" state vector is the one explored by Collin-Dufresne, Goldstein, and Jones (2008) (CGJ). All three of these canonical models (DS, Joslin, and CGJ) are observationally equivalent.$^{22}$

21 See Campbell and Shiller (1991) (among others) for empirical evidence on cointegration among bond yields. Diebold and Li (2006) adopt an assumption very similar to the second example.
In the constant-volatility subcase of the CGJ setup, the state vector $X_t$ is completely defined by $r_t$ and its first $N - 1$ moments under $\mathbb{Q}$:

$$X_t = (r_t, \mu_{1t}, \mu_{2t}, \ldots, \mu_{N-1,t})', \tag{32}$$

where

$$\mu_{1t} = \frac{1}{dt}E^{\mathbb{Q}}(dr_t), \qquad \mu_{k+1,t} = \frac{1}{dt}E^{\mathbb{Q}}(d\mu_{kt}), \quad k = 1, \ldots, N-2. \tag{33}$$

Under $\mathbb{Q}$, $X_t$ follows

$$dX_t = \big(K_{0,CGJ}^{\mathbb{Q}} + K_{1,CGJ}^{\mathbb{Q}} X_t\big)\,dt + \Sigma_X\, dZ_t^{\mathbb{Q}}, \tag{34}$$

where $\Sigma_X$ is lower triangular, $K_{0,CGJ}^{\mathbb{Q}} = (0, 0, \ldots, 0, \gamma)'$, and $Z_t^{\mathbb{Q}}$ is a standard Brownian motion. By construction, the matrix $K_{1,CGJ}^{\mathbb{Q}}$ is the companion matrix factorization of the feedback matrix $K_{1X}^{\mathbb{Q}}$ in (9).
The sense in which $X_t$ is observable in the CGJ normalization is quite different from that in the JSZ normalization, and these differences may have practical relevance. First, it will not always be convenient to assume that the one-period short rate $r_t$ is observable. Duffee (1996) highlights various liquidity and "money-market" effects that might distort yields on short-term bonds relative to what is implied by a GDTSM. The true short rate, the one that implicitly underlies the pricing of long-term bonds, will not literally be observable absent an explicit model of these money-market effects. Second, actions by monetary authorities might necessitate the inclusion of additional risk factors, or jumps in these factors, when short rates are explicitly included in the analysis of a DTSM (Piazzesi 2005). Within the JSZ normalization, one is free to define the portfolio matrix $W$ so as to focus on segments of the yield curve away from the very short end, while preserving a fully observable $\mathcal{P}$.

22 Different choices of normalizations, associated with different unique matrix factorizations of the feedback matrix $K_{1X}^{\mathbb{Q}}$, give rise to observationally equivalent models, though models with different structures to their parameter sets. The JSZ normalization is based on the real Jordan factorization used in Proposition 1. CGJ adopt the companion factorization. For any monic polynomial $p(x) = x^n - \mu_{n-1}x^{n-1} - \cdots - \mu_1 x - \mu_0$, the companion matrix is

$$C(p) = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \mu_0 & \mu_1 & \mu_2 & \cdots & \mu_{n-1} \end{pmatrix}.$$

Given any matrix $K$, its monic characteristic polynomial is unique, and, provided $K$ is non-derogatory (for example, when its eigenvalues are distinct), $K$ is similar to its companion matrix $C(p(K))$.
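A short numerical check of the companion construction in this footnote, assuming NumPy (`np.poly` returns the characteristic-polynomial coefficients of a square matrix), a hypothetical function name, and an illustrative $K$ with distinct eigenvalues, so that $K$ and $C(p(K))$ are similar.

```python
import numpy as np

def companion_matrix(K):
    """Companion matrix C(p(K)) in the layout of footnote 22: ones on the superdiagonal,
    (mu_0, ..., mu_{n-1}) in the last row, where p(x) = x^n - mu_{n-1} x^{n-1} - ... - mu_0."""
    n = K.shape[0]
    c = np.poly(K)                 # det(xI - K) = x^n + c[1] x^{n-1} + ... + c[n]
    mu = -c[::-1][:n]              # mu_k = -c[n - k], ordered mu_0, ..., mu_{n-1}
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)
    C[-1, :] = mu
    return C

K = np.array([[-0.1, 0.5, 0.2], [0.0, -0.5, 0.3], [0.0, 0.0, -1.2]])
C = companion_matrix(K)
print(C)
print(np.sort(np.linalg.eigvals(C).real))   # approximately -1.2, -0.5, -0.1, the eigenvalues of K
```

Only the characteristic polynomial of $K$ enters $C$, which is why the CGJ parameters are tied to the $\mathbb{Q}$ eigenvalues rather than to a particular rotation of the states.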


More subtly, the construction of the state vector in the CGJ normalization requires the parameters of the $\mathbb{Q}$ distribution. Therefore, any change in the implementation of a GDTSM that changes the implied $\mathbb{Q}$ parameters will necessarily change the observed pricing factors under the CGJ normalization. Fitting the same model to two overlapping sample periods could, for example, give rise to different values of the observed state variables during the overlapping period. In contrast, under the JSZ normalization, we are led to identical values of $\mathcal{P}$ for all overlapping sample periods.
Full separation of the $\mathbb{P}$ and $\mathbb{Q}$ sides of the unrestricted model appears to be a unique feature of the JSZ normalization. It is this separation that clarifies the role of no-arbitrage restrictions in GDTSMs and gives rise to the enormous computational advantages of our normalization relative to the DS, Joslin, and CGJ canonical models.

5. Empirical Results
We estimate the three-factor GDTSMs summarized in Table 1 by ML using the JSZ canonical form and the methods outlined in Section 3.$^{23}$ As all of our estimated models are stationary under $\mathbb{Q}$, we report our results in terms of $r_\infty^{\mathbb{Q}}$ instead of $k_\infty^{\mathbb{Q}}$. The data are end-of-month Constant Maturity Treasury (CMT) yields from the Federal Reserve's H.15 release over the period from January 1990 to December 2007 (216 observations). The maturities considered are 6 months and 1, 2, 3, 5, 7, and 10 years. From these coupon yields we bootstrap a zero-coupon curve assuming constant forward rates between maturities. Within Case P, we consider several subcases. With distinct real eigenvalues, we assume that either the first three principal components (PCs) are measured without error (RPC) or the 0.5-, 2-, and 10-year zero-coupon yields are measured without error (RY). Additionally, we estimate models that price the first three PCs of

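Table 1
Summary of Model Specifications

Model    Specification
RPC      Real $\lambda^{\mathbb{Q}\prime} = (\lambda_1^{\mathbb{Q}}, \lambda_2^{\mathbb{Q}}, \lambda_3^{\mathbb{Q}})$; PC1, PC2, PC3 priced exactly
RY       Real $\lambda^{\mathbb{Q}\prime} = (\lambda_1^{\mathbb{Q}}, \lambda_2^{\mathbb{Q}}, \lambda_3^{\mathbb{Q}})$; 0.5-, 2-, and 10-year zeros priced exactly
CPC      Complex $\lambda^{\mathbb{Q}\prime} = (\lambda_1^{\mathbb{Q}}, \lambda_2^{\mathbb{Q}}, \bar\lambda_2^{\mathbb{Q}})$; PC1, PC2, PC3 priced exactly
JPC      Real repeated $\lambda^{\mathbb{Q}\prime} = (\lambda_1^{\mathbb{Q}}, \lambda_2^{\mathbb{Q}}, \lambda_2^{\mathbb{Q}})$; PC1, PC2, PC3 priced exactly
RPC1     RPC and rank 1 risk premia
RY1      RY and rank 1 risk premia
RCMT1    RCMT and rank 1 risk premia
JPC1     JPC and rank 1 risk premia
RKF      Real distinct $\lambda^{\mathbb{Q}}$; all yields measured with error
RCMT     Real $\lambda^{\mathbb{Q}\prime} = (\lambda_1^{\mathbb{Q}}, \lambda_2^{\mathbb{Q}}, \lambda_3^{\mathbb{Q}})$; 0.5-, 2-, and 10-year CMTs priced exactly

23 $\bar\lambda_i^{\mathbb{Q}}$ denotes the complex conjugate of the $i$th element of $\lambda^{\mathbb{Q}}$. Also, we defer discussion of case RKF, in which all yields are measured with error and Kalman filtering is applied, until Section 6.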


the zero curve exactly under the constraints of repeated eigenvalues (JPC) and complex eigenvalues (CPC). Model JPC imposes the eigenvalue constraint of the AFNS model examined by Christensen, Diebold, and Rudebusch (2009). Finally, a subscript of "1" indicates the case of reduced-rank risk premiums ($L = 1$), with the one-period expected excess returns being perfectly correlated across bonds. In all cases, except as noted, the components of the measurement errors orthogonal to $W$ are assumed to be normally distributed.$^{24}$ Although we derive portfolios from the principal components, one could also use portfolio loadings from various parametric splines for yields, such as Nelson-Siegel loadings or polynomial loadings.
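A PC-based portfolio matrix $W$, as used in models such as RPC, can be obtained from an eigendecomposition of the sample covariance of yields. The sketch below is an illustration under stated assumptions (NumPy, synthetic yields standing in for the bootstrapped zero curve, a hypothetical function name, and a sign convention chosen here for definiteness); it is not necessarily the exact construction used in the paper, for example with respect to scaling or sign.

```python
import numpy as np

def pc_portfolio_matrix(Y, N=3):
    """Rows of W are the loadings of the first N principal components of the yield panel Y (T x J)."""
    cov = np.cov(Y, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)           # eigh returns eigenvalues in ascending order
    W = eigvec[:, np.argsort(eigval)[::-1][:N]].T  # (N, J), largest-variance components first
    # Sign convention (an assumption): make the largest-magnitude loading of each PC positive
    idx = np.argmax(np.abs(W), axis=1)
    W *= np.sign(W[np.arange(N), idx])[:, None]
    return W

# Synthetic level/slope yield panel for illustration only
rng = np.random.default_rng(8)
T, J = 216, 7
mats = np.array([0.5, 1, 2, 3, 5, 7, 10])
level = np.cumsum(0.002 * rng.standard_normal(T))
slope = np.cumsum(0.001 * rng.standard_normal(T))
Y = 0.05 + level[:, None] + slope[:, None] * (mats / 10) + 0.0005 * rng.standard_normal((T, J))

W = pc_portfolio_matrix(Y)
P = Y @ W.T                                   # P_t = W y_t, the pricing factors
print(np.round(np.cov(P, rowvar=False), 8))   # (approximately) diagonal
```

The rows of `W` are orthonormal and the implied factors are uncorrelated in sample, which is what makes PC portfolios a convenient choice of observable state.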
An alternative measurement error structure arises when one supposes that coupon bonds are measured without error. In this case, portfolios of zero-coupon bond yields will necessarily incorporate measurement error. To that end, we consider

Case C: N coupon bonds are priced exactly, and J − N coupon bonds are
measured with normally distributed errors in the GDTSM.

In implementing Case C with coupon-bond data, one can still select N portfolios of zero-coupon yields and construct the rotation in which these portfolios comprise the state vector. Even though such yields may not be observed, this rotation is still valuable because the portfolios of model-implied zero yields $P_t$ can be approximated from the observed data. For example, one could bootstrap or spline an approximate zero-coupon yield curve from the observed coupon bond prices and form an approximation of $P_t$, call it $P_t^a$. Importantly, the projection of $P_t^a$ onto its own lag will recover reliable starting values for $K_{0P}^P$ and $K_{1P}^P$. However, because coupon bond yields are nonlinear functions of $P_t$, the irrelevance propositions discussed in Section 3 do not apply to Case C.
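The starting-value step just described can be sketched in a few lines, under stated assumptions: the approximate factors $P_t^a$ are taken as given in a $T \times N$ array (the bootstrapping or splining that produces them is not shown), and $\Delta P_t^a$ is regressed on a constant and $P_{t-1}^a$ by ordinary least squares. The function name `ols_var_start` and the simulated data are hypothetical and serve only as an illustration.

```python
import numpy as np

def ols_var_start(P_a):
    """Least-squares starting values for (K0, K1) in
    Delta P_t = K0 + K1 P_{t-1} + e_t, given a (T, N) array P_a of
    approximate pricing factors (rows are dates)."""
    dP = np.diff(P_a, axis=0)                            # (T-1, N) changes
    X = np.column_stack([np.ones(len(dP)), P_a[:-1]])    # constant and lagged level
    coef = np.linalg.lstsq(X, dP, rcond=None)[0]         # (N+1, N), one column per equation
    K0 = coef[0]                                         # intercepts, shape (N,)
    K1 = coef[1:].T                                      # K1[i, j]: effect of P_{j,t-1} on Delta P_{i,t}
    return K0, K1

# Illustration on data simulated from a known VAR (parameter values are arbitrary).
rng = np.random.default_rng(0)
K0_true = np.array([0.01, 0.0, -0.005])
K1_true = np.diag([-0.02, -0.1, -0.3])
P_a = np.zeros((5000, 3))
for t in range(1, len(P_a)):
    P_a[t] = P_a[t - 1] + K0_true + K1_true @ P_a[t - 1] + 0.01 * rng.standard_normal(3)
K0_start, K1_start = ols_var_start(P_a)
```

In practice the resulting $(K_0, K_1)$ would only seed the optimizer; under Case C the likelihood is not concentrated exactly, so they are starting values rather than final estimates.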
In our empirical implementation, we consider the case of the 0.5-, 2-, and 10-year CMT yields measured without error, and the 1-, 3-, 5-, and 7-year par coupon yields measured with error (RCMT). Throughout, we report asymptotic standard errors for the maximum likelihood estimates, computed using the outer product of the first derivatives (scores) of the likelihood function to estimate the information matrix (see Berndt et al. 1974).
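As a generic sketch of the outer-product-of-gradients estimate mentioned above: stack the per-observation scores (obtained here by central finite differences, an assumption made for brevity), sum their outer products, and invert. A Gaussian AR(1) likelihood stands in for the GDTSM likelihood, and all function names are hypothetical.

```python
import numpy as np

def opg_standard_errors(loglik_obs, theta_hat, h=1e-5):
    """Outer-product-of-gradients (BHHH) standard errors.

    loglik_obs(theta) returns the length-T array of per-observation
    log-likelihood contributions at the parameter vector theta.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    k = theta_hat.size
    scores = []
    for j in range(k):
        step = np.zeros(k)
        step[j] = h
        # Central finite difference of every observation's contribution.
        scores.append((loglik_obs(theta_hat + step) - loglik_obs(theta_hat - step)) / (2 * h))
    G = np.column_stack(scores)          # (T, k) matrix of scores
    info = G.T @ G                       # sum over t of g_t g_t'
    return np.sqrt(np.diag(np.linalg.inv(info)))

# Stand-in model (not a GDTSM): x_t = c + phi x_{t-1} + sigma e_t, conditional on x_0.
rng = np.random.default_rng(1)
x = np.zeros(2001)
for t in range(1, x.size):
    x[t] = 0.1 + 0.8 * x[t - 1] + 0.5 * rng.standard_normal()

def ar1_loglik_obs(theta):
    c, phi, log_sigma = theta
    resid = x[1:] - c - phi * x[:-1]
    sigma2 = np.exp(2.0 * log_sigma)
    return -0.5 * (np.log(2.0 * np.pi * sigma2) + resid**2 / sigma2)

# ML estimates: OLS for (c, phi) and the residual standard deviation for sigma.
X = np.column_stack([np.ones(x.size - 1), x[:-1]])
b = np.linalg.lstsq(X, x[1:], rcond=None)[0]
sigma_ml = np.std(x[1:] - X @ b)
theta_hat = np.array([b[0], b[1], np.log(sigma_ml)])
se = opg_standard_errors(ar1_loglik_obs, theta_hat)
```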

24 In Case Y, this assumption amounts to yield measurement errors being distributed i.i.d. $N(0, \sigma_p^2)$. When $W$ comes from the principal components, the assumption is equivalent to the higher-order PCs ($n > N$) being distributed $N(0, \sigma_p^2)$. In both of these cases, we can concentrate $\sigma_p$ out of the likelihood (conditional on $t = 1$ information) through
$$\hat{\sigma}_p^2 = \frac{\sum_{t=2}^{T}\sum_{m}\left(y^o_{t,m} - y_{t,m}\right)^2}{(T-1)(J-N)},$$
where $y_{t,m}$ are the model yields, which depend on all the other parameters. To be more precise about the error assumption, let $W_\perp \in \mathbb{R}^{(J-N)\times J}$ be a basis for the orthogonal complement of the row span of $W$. Then, since $W$ has orthonormal rows, we can express $y_t^o$ in terms of its projections onto $W$ and onto the orthogonal complement of $W$ as $y_t^o = W'Wy_t^o + W_\perp'W_\perp y_t^o = W'P_t + W_\perp'W_\perp y_t^o$. We assume $y_t^o - y_t \mid P_t$ has the degenerate distribution $N(W'P_t, \sigma_p^2 W_\perp'W_\perp)$ (which is rotation invariant in the sense that the likelihood is the same for alternative choices of basis for the orthogonal complement of $W$). Equivalently, the projection of $y_t^o$ onto $W_\perp$, expressed in the coordinates $W_\perp$, is i.i.d. normal: $W_\perp y_t^o \sim N(0, \sigma_p^2 I_{J-N})$. This distribution satisfies $\Pr(Wy_t^o = P_t \mid P_t) = 1$.
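The objects in footnote 24 can be illustrated numerically, assuming that $W$ holds the first $N$ principal-component loadings of the observed yields: the remaining eigenvectors supply an orthonormal $W_\perp$, the decomposition $y_t^o = W'Wy_t^o + W_\perp'W_\perp y_t^o$ can be checked directly, and $\hat{\sigma}_p^2$ is the pooled mean squared error. The yield panel is simulated and `y_model` is a placeholder for GDTSM-implied yields, so this is a sketch rather than the estimation code.

```python
import numpy as np

rng = np.random.default_rng(2)
T, J, N = 216, 7, 3
# Placeholder yield panel: a 3-factor structure plus small noise (not real data).
loadings = rng.standard_normal((J, N))
y_obs = rng.standard_normal((T, N)) @ loadings.T + 0.01 * rng.standard_normal((T, J))

# PC weights: eigenvectors of the sample covariance, largest eigenvalues first.
eigval, eigvec = np.linalg.eigh(np.cov(y_obs, rowvar=False))
order = np.argsort(eigval)[::-1]
W = eigvec[:, order[:N]].T          # (N, J), orthonormal rows
W_perp = eigvec[:, order[N:]].T     # (J - N, J), basis of the orthogonal complement

P = y_obs @ W.T                     # P_t = W y_t^o, one row per date
# y^o = W'W y^o + W_perp' W_perp y^o, applied to every date at once.
recon = y_obs @ W.T @ W + y_obs @ W_perp.T @ W_perp

def sigma2_hat(y_obs, y_model, N):
    """Concentrated variance: squared errors summed over t = 2..T and all
    maturities, divided by (T - 1)(J - N)."""
    T, J = y_obs.shape
    resid = y_obs[1:] - y_model[1:]     # drop t = 1 (the conditioning date)
    return np.sum(resid**2) / ((T - 1) * (J - N))

# Placeholder for model-implied yields: the projection W'P_t (a fitted GDTSM
# would supply A + B P_t instead).
y_model = P @ W
s2 = sigma2_hat(y_obs, y_model, N)
```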

The Review of Financial Studies / v 24 n 3 2011

Table 2
ML estimates of the risk-neutral parameters of the model-implied principal components

Model    $\lambda_1^Q$   $\lambda_2^Q$   $\lambda_3^Q$ / Im($\lambda_2^Q$)   $r_\infty^Q$
RPC      −0.0024         −0.0481         −0.0713                             8.61
         (0.000566)      (0.0083)        (0.0133)                            (0.73)
RY       −0.00196        −0.0404         −0.0897                             9.37
         (0.000378)      (0.00274)       (0.0073)                            (0.789)
RKF      −0.00245        −0.0472         −0.0739                             8.45
         (0.000567)      (0.00724)       (0.0125)                            (0.678)
RCMT     −0.00178        −0.0372         −0.103                              11.2
         (7e-05)         (0.000819)      (0.0029)                            (0.346)
JPC      −0.00225        −0.0582         −0.0582                             8.87
         (0.000409)      (0.00123)       (0.00123)                           (0.536)
CPC      −0.00225        −0.0582         −0.0582                             8.87
         (0.000409)      (0.00123)       (0.00123)                           (0.536)
RPC1     −0.00241        −0.0477         −0.0721                             8.61
         (0.000559)      (0.00766)       (0.0126)                            (0.715)
RY1      −0.00197        −0.0403         −0.0902                             9.37
         (0.000373)      (0.00269)       (0.00723)                           (0.775)
RCMT1    −0.00178        −0.0371         −0.103                              11.2
         (6.92e-05)      (0.000828)      (0.003)                             (0.345)
JPC1     −0.00224        −0.0583         −0.0583                             8.9
         (0.000405)      (0.00122)       (0.00122)                           (0.54)

$r_\infty^Q$ is normalized to percent per annum (by multiplying by 12 × 100). Asymptotic standard errors are given in parentheses.

In order to facilitate comparison of the estimates across models with different pricing factors, all of our results are presented in terms of the implied P distribution of the first three PCs of the zero yields.25 Table 2 shows that the risk-neutral parameters are largely invariant to (i) assumptions about the distribution of measurement errors; (ii) restrictions on the Q dynamics through restrictions on $\lambda^Q$; and (iii) restrictions on the relation between the Q and P dynamics through the reduced-rank assumption. The only mild exception is that model RCMT has a higher $r_\infty^Q$, which is compensated for by slightly smaller (in absolute value) $\lambda_1^Q$ and $\lambda_2^Q$. The close alignment of results shows that the cross-section of bond yields provides a rich information set from which to extract the four relevant Q parameters, $r_\infty^Q$ and $\lambda^Q$.
Another notable feature of these estimates is that the results for model CPC
are the same as those for model JPC. This is because, in the limit, as the com-
plex part of the eigenvalues approaches zero, the complex model approaches
the Jordan model (see Appendix C). Thus we see that, for our dataset, complex
eigenvalues are not preferred over real eigenvalues.
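One way to see the limiting argument (a sketch under our own parameterization, not necessarily the construction in Appendix C): the real matrix $M_b = \begin{pmatrix} a & 1 \\ -b^2 & a \end{pmatrix}$ has eigenvalues $a \pm ib$ and, for $b \neq 0$, is similar to the usual rotation form of a complex pair, while at $b = 0$ it is exactly the Jordan block with repeated eigenvalue $a$. Since $\exp(\tau M_b)$ is continuous in $b$, the complex model's dynamics approach the Jordan model's as $b \to 0$. The snippet checks this numerically; the value of $a$ merely mirrors the JPC estimate in Table 2.

```python
import numpy as np

a = -0.0582                         # illustrative Q eigenvalue (real part)
taus = np.arange(1, 121)            # horizons in months

def complex_block(a, b):
    """Real 2x2 matrix with eigenvalues a +/- i*b; equals [[a, 1], [0, a]] at b = 0."""
    return np.array([[a, 1.0], [-b**2, a]])

def expm_eig(M, tau):
    """exp(tau * M) via eigendecomposition (M must be diagonalizable, i.e. b != 0)."""
    lam, V = np.linalg.eig(M)
    return (V @ np.diag(np.exp(lam * tau)) @ np.linalg.inv(V)).real

def jordan_exp(a, tau):
    """exp(tau * [[a, 1], [0, a]]) = e^{a tau} [[1, tau], [0, 1]]."""
    return np.exp(a * tau) * np.array([[1.0, tau], [0.0, 1.0]])

def max_gap(b):
    """Largest entrywise gap between the two matrix exponentials over all horizons."""
    M = complex_block(a, b)
    return max(np.max(np.abs(expm_eig(M, tau) - jordan_exp(a, tau))) for tau in taus)

gaps = [max_gap(b) for b in (1e-1, 1e-2, 1e-3)]   # shrinks roughly like b**2 for small b
```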
Tables 3 and 4 present the parameters of the P distribution of P. The final
row presents parameters from a VAR (with no pricing involved) of the PCs.

25 That is, under Case Y or when the CMT yields are priced perfectly by the GDTSM, after estimation, we impose
the JSZ normalization based on the PCs of zero yields as the state variables.

Table 3
ML estimates of the physical parameters of the model-implied principal components

Model   $K^P_{1,11}$  $K^P_{1,12}$  $K^P_{1,13}$  $K^P_{1,21}$  $K^P_{1,22}$  $K^P_{1,23}$  $K^P_{1,31}$  $K^P_{1,32}$  $K^P_{1,33}$  $\theta_1^P$  $\theta_2^P$  $\theta_3^P$
RPC     −0.25    0.16     5.2     0.032     −0.32    4.2      −0.03     −0.028    −1.8     −0.11     0.025     0.0063
        (0.16)   (0.54)   (2.8)   (0.054)   (0.24)   (1.2)    (0.023)   (0.088)   (0.46)   (0.028)   (0.0075)  (0.00035)
RY      −0.25    0.11     5.5     0.037     −0.31    4.1      −0.03     −0.034    −1.8     −0.11     0.026     0.0061
        (0.15)   (0.55)   (2.7)   (0.054)   (0.22)   (1.2)    (0.023)   (0.091)   (0.47)   (0.027)   (0.0075)  (0.00035)
RKF     −0.12    0.33     6.7     0.0078    −0.35    4.7      −0.021    −0.007    −1.2     −0.14     0.026     0.0063
        (0.13)   (0.52)   (2.9)   (0.052)   (0.22)   (1.2)    (0.018)   (0.075)   (0.42)   (0.029)   (0.0055)  (0.00029)
RCMT    −0.25    0.11     5.7     0.037     −0.32    4.1      −0.031    −0.032    −1.8     −0.11     0.026     0.0062
        (0.15)   (0.55)   (2.6)   (0.056)   (0.23)   (1)      (0.02)    (0.071)   (0.43)   (0.044)   (0.0093)  (0.00052)
JPC     −0.25    0.16     5.2     0.032     −0.32    4.2      −0.03     −0.028    −1.8     −0.11     0.025     0.0063
        (0.15)   (0.54)   (2.7)   (0.054)   (0.24)   (1.2)    (0.023)   (0.087)   (0.46)   (0.027)   (0.0074)  (0.00035)
CPC     −0.25    0.16     5.2     0.032     −0.32    4.2      −0.03     −0.028    −1.8     −0.11     0.025     0.0063
        (0.15)   (0.55)   (2.7)   (0.052)   (0.24)   (1.2)    (0.023)   (0.092)   (0.46)   (0.099)   (0.088)   (0.014)
RPC1    −0.24    −0.16    7.4     0.031     −0.14    3.3      −0.025    −0.061    −1.5     −0.11     0.025     0.0063
        (0.13)   (0.37)   (2)     (0.04)    (0.14)   (0.72)   (0.016)   (0.057)   (0.3)    (0.035)   (0.012)   (0.00039)
RY1     −0.24    −0.14    7.3     0.038     −0.17    3.3      −0.026    −0.055    −1.6     −0.11     0.026     0.0061
        (0.13)   (0.38)   (1.8)   (0.04)    (0.14)   (0.64)   (0.018)   (0.062)   (0.29)   (0.03)    (0.011)   (0.00037)
RCMT1   −0.25    −0.11    7.1     0.042     −0.18    3.3      −0.029    −0.045    −1.7     −0.11     0.025     0.0062
        (0.15)   (0.55)   (2.6)   (0.057)   (0.23)   (1.1)    (0.02)    (0.072)   (0.42)   (0.04)    (0.013)   (0.0005)
JPC1    −0.23    −0.18    7.4     0.03      −0.14    3.3      −0.025    −0.064    −1.5     −0.11     0.025     0.0063
        (0.13)   (0.36)   (1.9)   (0.04)    (0.14)   (0.74)   (0.016)   (0.056)   (0.31)   (0.036)   (0.012)   (0.00039)

The long-run P mean of $P_t$ is defined by $\theta^P = -(K_{1P}^P)^{-1}K_{0P}^P$. $K_{1P}^P$ is annualized (by multiplying by 12). Asymptotic standard errors are given in parentheses.

Table 4
ML estimates of the conditional covariance of the model-implied principal components

Model   $\sigma_1$   $\sigma_2$   $\sigma_3$   $\rho_{12}$   $\rho_{13}$   $\rho_{23}$
RPC     2.2          0.884        0.373        −0.569        0.584         −0.422
        (0.126)      (0.0408)     (0.0164)     (0.0415)      (0.0485)      (0.0611)
RY      2.2          0.871        0.386        −0.566        0.57          −0.393
        (0.125)      (0.0426)     (0.0174)     (0.042)       (0.0502)      (0.0626)
RKF     2.21         0.837        0.313        −0.603        0.725         −0.631
        (0.127)      (0.0423)     (0.0205)     (0.044)       (0.0493)      (0.0668)
RCMT    2.23         0.73         0.316        −0.591        0.541         −0.362
        (0.423)      (0.0215)     (0.0278)     (0.0325)      (0.108)       (0.0504)
JPC     2.2          0.884        0.373        −0.569        0.584         −0.421
        (0.124)      (0.0408)     (0.0163)     (0.0413)      (0.0485)      (0.0605)
CPC     2.2          0.883        0.373        −0.569        0.581         −0.421
        (0.0407)     (0.0398)     (0.0152)     (0.0316)      (0.0401)      (0.0589)
RPC1    2.21         0.888        0.374        −0.572        0.586         −0.424
        (0.123)      (0.0403)     (0.0155)     (0.0403)      (0.0479)      (0.0584)
RY1     2.2          0.873        0.386        −0.568        0.571         −0.394
        (0.121)      (0.0419)     (0.0165)     (0.0411)      (0.0496)      (0.0608)
RCMT1   2.23         0.731        0.316        −0.593        0.541         −0.362
        (0.424)      (0.0215)     (0.0278)     (0.0324)      (0.108)       (0.0507)
JPC1    2.21         0.888        0.374        −0.572        0.586         −0.424
        (0.122)      (0.0402)     (0.0154)     (0.0402)      (0.0479)      (0.0579)

$\rho_{ij}$ is the conditional correlation of the $i$th and $j$th components of the factors $P_t$. Volatility estimates $\sigma_1$, $\sigma_2$, $\sigma_3$ are normalized to percent per annum (by multiplying by $100 \times \sqrt{12}$). Asymptotic standard errors are given in parentheses.

Table 4 reveals that initializing $\Sigma_P$ using OLS residuals leads to very accurate starting values. By way of contrast, if we had instead used the Dai and Singleton (2000) (DS) canonical form, an accurate initialization of $\Sigma_X$ would require a reliable initial value for $K_1^Q$. The JSZ canonical form allows us to avoid this interplay between the values of $\Sigma_X$ and $K_1^Q$ by applying no-arbitrage constraints to determine $K_{1P}^Q$ independently of $\Sigma_P$.
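A minimal sketch of that initialization, assuming monthly factor data in a $T \times N$ array: estimate the VAR(1) by OLS, form the residual covariance, and take its lower-triangular Cholesky factor as the starting value for $\Sigma_P$. The scaling by $100 \times \sqrt{12}$ mirrors the Table 4 units only under the monthly-data assumption, and the function name and simulated parameter values are illustrative.

```python
import numpy as np

def sigma_start(P):
    """Starting value for a lower-triangular Sigma_P from OLS VAR(1) residuals.

    Returns the Cholesky factor L (L @ L.T equals the residual covariance)
    together with the implied volatilities and correlations.
    """
    dP = np.diff(P, axis=0)
    X = np.column_stack([np.ones(len(dP)), P[:-1]])
    coef = np.linalg.lstsq(X, dP, rcond=None)[0]
    resid = dP - X @ coef
    omega = resid.T @ resid / len(resid)     # ML (not degrees-of-freedom adjusted)
    L = np.linalg.cholesky(omega)            # lower triangular
    vol = np.sqrt(np.diag(omega))
    corr = omega / np.outer(vol, vol)
    return L, vol, corr

# Illustration with monthly data simulated from illustrative parameter values.
rng = np.random.default_rng(3)
L_true = np.array([[0.0064, 0.0, 0.0],
                   [-0.0015, 0.0021, 0.0],
                   [0.0006, -0.0003, 0.0008]])
P_sim = np.zeros((2000, 3))
for t in range(1, len(P_sim)):
    P_sim[t] = 0.95 * P_sim[t - 1] + L_true @ rng.standard_normal(3)
L_hat, vol, corr = sigma_start(P_sim)
vol_pct_pa = vol * 100 * np.sqrt(12)         # Table 4 units, assuming monthly data
```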
Across all specifications, the parameters are very comparable. Partly this is a consequence of Proposition 3: whether $\lambda^Q$ comprises distinct real eigenvalues (RPC), complex eigenvalues (CPC), or repeated eigenvalues (JPC), the estimates of $K_{1P}^P$ and $K_{0P}^P$ are equal to each other and to the OLS estimates. However, stepping beyond this proposition, when we change whether it is PCs or individual yields (e.g., RPC versus RY) that are priced perfectly by the GDTSM under Case P, the parameters of the corresponding P distributions remain very similar. Imposing the reduced-rank risk premium constraint L = 1 leads to generally similar results, although for some parameters there are measurable differences in estimates across corresponding models, particularly for some of the elements of $K_{1P}^P$ (for example, $K^P_{1,13}$ rises from 5.2 in RPC to 7.4 in RPC1, and $K^P_{1,12}$ changes sign).
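Why the ML estimates of $K_{0P}^P$ and $K_{1P}^P$ reduce to OLS can be checked numerically: when every equation of a Gaussian VAR has the same regressors (a constant and $P_{t-1}$) and there are no cross-equation restrictions, GLS with any positive-definite innovation covariance reproduces equation-by-equation OLS, the standard seemingly-unrelated-regressions result. The sketch below compares the two estimators on simulated data with an arbitrary covariance; it illustrates the mechanism and is not a restatement of the proof of Proposition 3.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 300, 3
P = np.cumsum(0.1 * rng.standard_normal((T, N)), axis=0)   # any regressors will do
dP = np.diff(P, axis=0)
X = np.column_stack([np.ones(T - 1), P[:-1]])              # identical in every equation

# Equation-by-equation OLS, coefficients stacked equation by equation.
b_ols = np.linalg.lstsq(X, dP, rcond=None)[0].T.ravel()

# Stacked GLS: vec(dP) = (I_N kron X) b + u with Cov(u) = Omega kron I_{T-1}.
A = rng.standard_normal((N, N))
Omega = A @ A.T + N * np.eye(N)                            # arbitrary positive-definite covariance
Z = np.kron(np.eye(N), X)
y = dP.T.ravel()                                           # all of equation 1, then equation 2, ...
W_inv = np.kron(np.linalg.inv(Omega), np.eye(T - 1))
b_gls = np.linalg.solve(Z.T @ W_inv @ Z, Z.T @ W_inv @ y)
```

Once the regressors differ across equations (as with a diagonal $K_{1P}^P$), this equivalence breaks down and the choice of $\Omega$ matters.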
Regarding the computational efficiency obtained using the JSZ normalization, we stress that the only parameters that need to be estimated are $(r_\infty^Q, \lambda^Q, \Sigma_P)$ since, as discussed in Section 3, $(K_{0P}^P, K_{1P}^P)$ are determined by concentrating the likelihood and $(K_{0P}^Q, K_{1P}^Q)$ are determined by no-arbitrage.26 The models were estimated using sequential quadratic programming, as implemented in Matlab's fmincon. Estimation under Case P using an informed guess of the Q eigenvalues took approximately 1.2 seconds.27 Furthermore, more than 99% of the searches converged to the same likelihood value (to within the tolerance) with very similar parameter estimates.28 These computational advantages become even more important in the case where all yields are measured with error, which we consider in Section 6.
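For concreteness, a small count under the assumptions that $N = 3$ and that $\Sigma_P$ is parameterized as lower triangular: the numerical search covers $r_\infty^Q$ (one parameter), $\lambda^Q$ ($N$), and $\Sigma_P$ ($N(N+1)/2$), while the $N + N^2$ elements of $K_{0P}^P$ and $K_{1P}^P$ are concentrated out. The function name is hypothetical.

```python
def jsz_case_p_parameter_count(N):
    """Count parameters in a Case P likelihood under the JSZ normalization
    (sketch; the pricing-error volatility is excluded because it is
    concentrated out as well)."""
    searched = 1 + N + N * (N + 1) // 2   # r_inf^Q, lambda^Q, lower-triangular Sigma_P
    concentrated = N + N * N              # K0P^P and K1P^P, recovered by OLS
    return searched, concentrated

searched, concentrated = jsz_case_p_parameter_count(3)
print(searched, concentrated)  # prints "10 12"
```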

5.1 Statistical Inference Within the JSZ Canonical Form

There are two null hypotheses that are of particular interest given our observa-
tions in Section 3. The first test addresses the algebraic multiplicity of eigen-
values in the GDTSM(3) model. As previously stated, the AFNS model of
Christensen, Diebold, and Rudebusch (2007) is equivalent to the JSZ canonical form with three extra constraints, including a repeated eigenvalue of $K_1^Q$. To assess the validity of the null hypothesis $\lambda_2^Q = \lambda_3^Q$ under the JSZ normalization, we perform a likelihood ratio (LR) test against the alternative that $\lambda^Q$ is unconstrained. With this one linear constraint, the LR test statistic has an asymptotic $\chi^2$ distribution with one degree of freedom, $\chi^2(1)$.

26 The standard deviation of the pricing errors, $\sigma_{\text{pricing}}$, can be concentrated out as well, both when L equals 1 and when it equals 3.
27 The computations were performed using a single-threaded application on a 2.4 GHz Intel Q6600 processor.
28 An exception here is the Jordan form, where typically there were two local extrema, with either the smaller or the larger eigenvalue repeated. Another general consideration is that one must either optimize over $k_\infty^Q$ or alternatively impose Q stationarity on the model if one desires to use $r_\infty^Q$ in estimation. In fact, for estimation purposes, the issue of using $k_\infty^Q$ versus $r_\infty^Q$ is largely obviated by results in Joslin, Le, and Singleton (2010), who show how one can concentrate out $k_\infty^Q$ under Case P.
The second test of interest is the dimensionality of the one-period risk premium which, as discussed in Section 4.4, is captured by the rank of $A_{RRP} = K_{1P}^P - K_{1P}^Q$. To impose the constraint that L = 1, we start with the singular value decomposition of $A_{RRP}$, $UDV'$, where $U$ and $V$ are unitary matrices and $D$ is diagonal with the diagonal sorted in decreasing order. The null hypothesis of interest (that $A_{RRP}$ has rank 1) is therefore imposed by setting $D_{22}$ and $D_{33}$ to zero. To translate this representation into constraints on the parameter space, note that, for an N-factor GDTSM with L = 1,
$$DV'P_t = D_{11}\sum_{j=1}^{N} V_{j1}P_{jt}. \tag{35}$$
Therefore, the expected excess returns $xr_t^P$ (see Section 4.4) are given by
$$xr_t^P = K_{0P}^P - K_{0P}^Q + U_{\bullet 1}\left(D_{11}\sum_{j=1}^{N} V_{j1}P_{jt}\right), \tag{36}$$
where $U_{\bullet 1}$ is the first column of $U$. The second term on the right-hand side of (36) expresses the time-varying components of $xr_t^P$ in terms of a common linear combination $V_{\bullet 1}'P_t$ of the pricing factors. All of the parameters in (36) are econometrically identified by virtue of the facts that $V_{\bullet 1}'V_{\bullet 1} = 1$ (which identifies $D_{11}$) and $U_{\bullet 1}'U_{\bullet 1} = 1$ (which identifies the weights on $D_{11}V_{\bullet 1}'P_t$). Furthermore, given N, (36) implies $(N-1)^2$ cross-equation restrictions on the parameters of the conditional expectation $xr_t^P$: an unrestricted $N \times N$ matrix $A_{RRP}$ has $N^2$ free elements, whereas a rank-1 matrix has $2N - 1$. In our case, N = 3, so there are 4 cross-equation restrictions.
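The restriction can be sketched directly with an SVD, assuming a given risk-premium matrix $A_{RRP}$: keep the largest singular value, and the resulting rank-1 matrix $D_{11}U_{\bullet 1}V_{\bullet 1}'$ reproduces the time-varying term in (36). The restriction count follows because a rank-1 matrix is described by $D_{11}$ and two unit-norm $N$-vectors, that is, $2N - 1$ free parameters against $N^2$. The matrix and factor values below are random placeholders.

```python
import numpy as np

def rank_one_part(A):
    """Rank-1 version of A: keep D_11 and set D_22, ..., D_NN to zero in A = U D V'."""
    U, d, Vt = np.linalg.svd(A)          # singular values d are sorted in decreasing order
    return d[0] * np.outer(U[:, 0], Vt[0])

def n_rank_one_restrictions(N):
    """Free elements of an N x N matrix minus those of a rank-1 matrix."""
    free_full = N * N
    free_rank_one = 2 * N - 1            # D_11 plus two unit-norm N-vectors
    return free_full - free_rank_one     # equals (N - 1)**2

rng = np.random.default_rng(5)
A_rrp = rng.standard_normal((3, 3))      # hypothetical K_{1P}^P - K_{1P}^Q
A_1 = rank_one_part(A_rrp)
P_t = np.array([0.05, 0.01, 0.002])      # hypothetical factor values
# Time-varying part of (36): U_{.1} * (D_11 * V_{.1}' P_t); V_{.1} is the first row of Vt.
U, d, Vt = np.linalg.svd(A_rrp)
xr_varying = U[:, 0] * (d[0] * (Vt[0] @ P_t))
```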
Tests for the equality of two eigenvalues are reported in the top panel of Table 5, where a leading J means that the model was estimated under the constraint that $\lambda_2^Q = \lambda_3^Q$ (consistent with the specifications of AFNS models). In the PC-based models, this null hypothesis is not rejected, while for the yield-based models (RY, RY1, and RCMT) it is rejected at conventional significance levels. To interpret this difference across choices of risk factors, we note from Table 2 that the estimated $|\lambda_2^Q - \lambda_3^Q|$ is larger in model RY than in model RPC (about 0.049 versus 0.023), with most of this difference being attributable to the larger value of $|\lambda_3^Q|$ in model RY. The eigenvalue $\lambda_3^Q$ governs the relatively high-frequency Q variation in yields and, thus, is particularly relevant for the behavior of the short end of the yield curve. Introducing the six-month yield directly as a pricing factor overweights the short end of the yield curve relative to having the PCs as pricing factors, as the latter are portfolios of yields along the entire maturity spectrum.


Table 5
Likelihood ratio tests

$H_0: \lambda_2^Q = \lambda_3^Q$

H0       log L0    Ha      log La    LR stat   p-value, $\chi^2(1)$
JPC      38.3912   RPC     38.3921   0.375     0.540
JPC1     38.3865   RPC1    38.3876   0.463     0.496
JY       38.1679   RY      38.1863   7.906     0.005
JY1      38.1638   RY1     38.183    8.266     0.004
JRCMT    39.0123   RCMT    39.0414   12.513    0.000

$H_0: \mathrm{rank}(K_{1P}^P - K_{1P}^Q) = 1$

H0       log L0    Ha      log La    LR stat   p-value, $\chi^2(4)$
RPC1     38.3876   RPC     38.3921   1.9475    0.745
JPC1     38.3865   JPC     38.3912   2.0358    0.729
RY1      38.1830   RY      38.1863   1.4217    0.840
JY1      38.1638   JY      38.1679   1.7819    0.776
RCMT1    39.0387   RCMT    39.0414   1.161     0.884

The top panel reports tests for equality of two eigenvalues, and the bottom panel reports tests for a rank-1 risk premium. The likelihood-ratio statistics are computed as LR = −2(T − 1)(log L0 − log La), where T = 216 is the sample size and log L0 and log La are the log-likelihoods under the null and alternative, respectively. All log-likelihoods are conditional on t = 1 and are time-series averages across the T − 1 observations.
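The statistics in Table 5 can be recomputed from the reported log-likelihoods with the formula in the note. The sketch below uses closed-form $\chi^2$ tail probabilities (available for one degree of freedom and for even degrees of freedom), so it needs only the standard library, and it also computes the 1% critical value with six degrees of freedom quoted in footnote 29. Because the log-likelihoods are rounded to four decimals, the recomputed statistic for JPC versus RPC is about 0.387 rather than the reported 0.375; the p-values below are evaluated at the reported statistics. Function names are hypothetical.

```python
import math

def lr_stat(loglik0, loglika, T=216):
    """LR = -2 (T - 1) (log L0 - log La) for time-averaged log-likelihoods."""
    return -2.0 * (T - 1) * (loglik0 - loglika)

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable.
    Closed forms: df = 1 via erfc; even df via the Poisson-sum identity."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        k = df // 2
        return math.exp(-x / 2.0) * sum((x / 2.0) ** i / math.factorial(i) for i in range(k))
    raise ValueError("only df = 1 or even df are supported in this sketch")

def chi2_critical(alpha, df, lo=0.0, hi=200.0):
    """(1 - alpha) quantile by bisection on the decreasing function chi2_sf."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_jpc = chi2_sf(0.375, 1)                 # JPC vs RPC
p_jy = chi2_sf(7.906, 1)                  # JY vs RY
p_rank = chi2_sf(1.9475, 4)               # RPC1 vs RPC
lr_rounded = lr_stat(38.3912, 38.3921)    # from the rounded log-likelihoods
crit_6 = chi2_critical(0.01, 6)           # 1% critical value, six restrictions
```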

In the bottom panel, we report tests of the reduced-rank risk premium hypothesis that L = 1. Under all model specifications, this hypothesis cannot be rejected. This finding is consistent with the conclusions reached by Cochrane and Piazzesi (2005), though they effectively considered models with N = 5, as they examined PC1 through PC5.

5.2 Empirical Relevance of Constraints on P Distribution of Yields

In Section 4.5, we demonstrated that imposing no-arbitrage in addition to constraints on the P distribution of yields affects the forecasts of yields. We now empirically explore the magnitude of the effect of the interaction of no-arbitrage with (i) imposing that $K_{1P}^P$ is diagonal; and (ii) imposing that the elements of $P_t$ are cointegrated (with one unit root and no trend). In both cases, we assume that risk premia have full rank and that the Q distribution of yields is unconstrained.
Table 6 presents the estimates of the conditional mean parameters, with asymptotic standard errors, under the constraint that $K_{1P}^P$ is diagonal, both for the no-arbitrage model and for the reference VAR. When the constraint of diagonal $K_{1P}^P$ is imposed, no-arbitrage has almost no effect on the parameters.29 The differences are small both in magnitude and relative to the standard errors.
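Because the diagonal restriction gives each equation its own regressor (a constant and that factor's own lag), equation-by-equation OLS is no longer efficient once the innovations are correlated, which is why the reference VAR in Table 6 is estimated by GLS. Below is a sketch of a feasible GLS (seemingly unrelated regressions) estimator under that restriction; the number of iterations, the function name, and the simulated parameters are assumptions, and this is not necessarily the exact estimator used for Table 6.

```python
import numpy as np

def diag_var_fgls(P, iters=2):
    """Feasible GLS (SUR) for Delta P_{i,t} = k0_i + k1_i P_{i,t-1} + u_{i,t},
    i.e. a VAR(1) in changes with K1 restricted to be diagonal and an
    unrestricted innovation covariance Omega."""
    dP = np.diff(P, axis=0)
    n, N = dP.shape
    # Equation i uses only a constant and its own lagged level.
    Xs = [np.column_stack([np.ones(n), P[:-1, i]]) for i in range(N)]
    # Step 0: equation-by-equation OLS.
    b = [np.linalg.lstsq(Xs[i], dP[:, i], rcond=None)[0] for i in range(N)]
    for _ in range(iters):
        U = np.column_stack([dP[:, i] - Xs[i] @ b[i] for i in range(N)])
        Oinv = np.linalg.inv(U.T @ U / n)          # inverse residual covariance
        # GLS normal equations assembled block by block: block (i, j) is
        # Oinv[i, j] * X_i' X_j, so the (N*n x N*n) weight matrix is never formed.
        lhs = np.block([[Oinv[i, j] * (Xs[i].T @ Xs[j]) for j in range(N)]
                        for i in range(N)])
        rhs = np.concatenate([sum(Oinv[i, j] * (Xs[i].T @ dP[:, j]) for j in range(N))
                              for i in range(N)])
        sol = np.linalg.solve(lhs, rhs)
        b = [sol[2 * i:2 * i + 2] for i in range(N)]
    k0 = np.array([bi[0] for bi in b])
    k1_diag = np.array([bi[1] for bi in b])
    return k0, k1_diag

# Illustration on simulated data with correlated innovations.
rng = np.random.default_rng(6)
k0_true = np.array([0.001, 0.0005, -0.0002])
k1_true = np.array([-0.02, -0.05, -0.15])
C = 0.005 * np.array([[1.0, 0.0, 0.0], [-0.6, 0.8, 0.0], [0.5, -0.3, 0.8]])
P_sim = np.zeros((3000, 3))
for t in range(1, len(P_sim)):
    P_sim[t] = P_sim[t - 1] + k0_true + k1_true * P_sim[t - 1] + C @ rng.standard_normal(3)
k0_hat, k1_hat = diag_var_fgls(P_sim)
```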
Table 7 presents the estimation results for the VAR and no-arbitrage mod-
els when cointegration (without a trend) is imposed. Here, we present standard

29 The average log-likelihood (across t) for the unconstrained no-arbitrage model was 38.392, while for the diagonal-constrained model it was 38.291. The corresponding likelihood ratio test statistic is 44.0, far exceeding the 1% critical value of 16.8 (from a $\chi^2(6)$ distribution, one degree of freedom for each of the six off-diagonal elements of $K_{1P}^P$ set to zero), indicating a very strong rejection of this constraint.


Table 6
The conditional mean parameters for the model with $K_{1P}^P$ constrained to be diagonal

        With no arbitrage                    Without no arbitrage
        $K_{0P}^P$    $K_{1P}^P$ (diagonal)   $K_{0P}^P$    $K_{1P}^P$ (diagonal)
PC1     −0.0129       −0.151                  −0.0129       −0.151
        (0.0193)      (0.135)                 (0.0188)      (0.131)
PC2     0.00754       −0.286                  0.00761       −0.289
        (0.00636)     (0.202)                 (0.00635)     (0.201)
PC3     0.013         −1.97                   0.0129        −1.95
        (0.00292)     (0.423)                 (0.00292)     (0.421)

$K_{1P}^P$ is annualized by multiplying by 12. The left panel imposes no-arbitrage and uses yield data for all maturities. The right panel does not use no-arbitrage and simply computes the estimates of a VAR of $P_t$ with $K_{1P}^P$ constrained to be diagonal through GLS.

Table 7
The conditional mean parameters for the model with cointegration (no trend and one unit root) imposed

        With no arbitrage                                Without no arbitrage
        $K_{0P}^P$    $K_{1P}^P$                          $K_{0P}^P$    $K_{1P}^P$
PC1     −0.0644       −0.258     0.113      5.22          −0.0668       −0.24      0.266      5.29
        (0.0602)      (0.336)    (0.733)    (3.17)        (0.218)       (0.225)    (0.792)    (2.67)
PC2     −0.0189       0.0495     −0.112     4.32          −0.0172       0.0519     −0.168     4.32
        (0.0236)      (0.124)    (0.288)    (1.28)        (0.0827)      (0.0824)   (0.31)     (1.03)
PC3     0.007         −0.0241    0.0482     −1.73         0.00713       −0.0184    0.0632     −1.71
        (0.0105)      (0.0562)   (0.117)    (0.565)       (0.0326)      (0.0362)   (0.126)    (0.471)

The left panel imposes no-arbitrage and uses yield data for all maturities. The right panel does not use no-arbitrage and simply computes the estimates of a VAR of $P_t$ with cointegration imposed so that $[K_{0P}^P, K_{1P}^P]$ has rank 2.

errors computed by a parametric bootstrap because of the well-known nonstandard asymptotics and small-sample bias associated with unit roots. The method that we used to bootstrap the standard errors is as follows. We randomly choose a date $t \in \{1, 2, \ldots, 216\}$ and initialize the state at the value of $P_t$ on this date. Then, using the maximum likelihood estimates of the parameters, we simulate a path of the term structure for a sample size of 216 months and estimate the model on these simulated data. These steps are repeated 1,000 times. Although the no-arbitrage assumption has a somewhat larger effect than in the diagonal case, the differences are again generally small. Taken together, these results suggest that although, in theory, the no-arbitrage model may offer improved inference over the simple VAR model when stand-alone P constraints are imposed, such differences appear to be small in practice.
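The bootstrap steps above can be sketched for the simplest case, a Gaussian VAR(1) for $P_t$ estimated by OLS, assuming no further restrictions (the unit-root constraint of Table 7 is not imposed here). Each replication draws a random observed date as the initial state, simulates a 216-month path from the point estimates, and re-estimates; the standard error is the standard deviation of the re-estimates. The function names, the reduced number of replications in the example, and the simulated "observed" sample are illustrative.

```python
import numpy as np

def fit_var1(P):
    """OLS for Delta P_t = K0 + K1 P_{t-1} + L e_t; returns (K0, K1, L),
    with L the Cholesky factor of the ML residual covariance."""
    dP = np.diff(P, axis=0)
    X = np.column_stack([np.ones(len(dP)), P[:-1]])
    coef = np.linalg.lstsq(X, dP, rcond=None)[0]
    resid = dP - X @ coef
    L = np.linalg.cholesky(resid.T @ resid / len(resid))
    return coef[0], coef[1:].T, L

def simulate_var1(K0, K1, L, P0, T, rng):
    """Simulate T months of the VAR starting from the state P0."""
    P = np.empty((T, len(P0)))
    P[0] = P0
    for t in range(1, T):
        P[t] = P[t - 1] + K0 + K1 @ P[t - 1] + L @ rng.standard_normal(len(P0))
    return P

def bootstrap_se_K1(P_obs, n_boot=1000, seed=0):
    """Parametric-bootstrap standard errors for the entries of K1."""
    rng = np.random.default_rng(seed)
    K0, K1, L = fit_var1(P_obs)                # point estimates used to simulate
    T = len(P_obs)
    draws = np.empty((n_boot,) + K1.shape)
    for b in range(n_boot):
        P0 = P_obs[rng.integers(T)]            # initial state: a randomly chosen observed date
        P_sim = simulate_var1(K0, K1, L, P0, T, rng)
        draws[b] = fit_var1(P_sim)[1]          # re-estimate on the simulated sample
    return draws.std(axis=0, ddof=1)

# Illustration: a 216-month "observed" sample simulated from a known VAR.
rng = np.random.default_rng(7)
P_obs = simulate_var1(np.zeros(2), np.diag([-0.05, -0.2]), 0.01 * np.eye(2),
                      np.zeros(2), 216, rng)
se_K1 = bootstrap_se_K1(P_obs, n_boot=200)
```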

5.3 Small-sample standard errors

Another feature of our normalization is that it facilitates the computation of
small-sample standard errors that can be compared to the asymptotic standard


Table 8
The standard errors of the parameter estimates computed both by the asymptotic method and using a bootstrap method

Parameter        Estimate     Asymptotic S.E.   Bootstrap S.E.
$K^P_{1,11}$     −0.2543      (0.1551)          (0.2733)
$K^P_{1,12}$     0.1595       (0.5428)          (0.8277)
$K^P_{1,13}$     5.235        (2.761)           (3.1)
$K^P_{1,21}$     0.03235      (0.05425)         (0.1057)
$K^P_{1,22}$     −0.3153      (0.2359)          (0.3187)
$K^P_{1,23}$     4.239        (1.212)           (1.233)
$K^P_{1,31}$     −0.03047     (0.02263)         (0.04143)
$K^P_{1,32}$     −0.02772     (0.08759)         (0.1314)
$K^P_{1,33}$     −1.755       (0.4638)          (0.5337)
$\theta_1^P$     −0.1109      (0.02762)         (0.02496)
$\theta_2^P$     0.02539      (0.007469)        (0.00731)
$\theta_3^P$     0.00631      (0.0003512)       (0.0003162)
$\lambda_1^Q$    −0.002403    (0.0005662)       (0.0006167)
$\lambda_2^Q$    −0.04813     (0.008296)        (0.007395)
$\lambda_3^Q$    −0.07127     (0.0133)          (0.01162)
$r_\infty^Q$     0.08606      (0.007302)        (0.01067)
$\sigma_1$       0.02205      (0.00126)         (0.001337)
$\sigma_2$       0.008838     (0.0004084)       (0.001508)
$\sigma_3$       0.003735     (0.0001643)       (0.0002803)
$\rho_{21}$      −0.5694      (0.04155)         (0.2268)
$\rho_{31}$      0.5842       (0.0485)          (0.1161)
$\rho_{32}$      −0.4218      (0.06114)         (0.156)

Here, $\theta^P = -(K_{1P}^P)^{-1}K_{0P}^P$ and $\rho_{ij}$ is the conditional correlation between the $i$th and $j$th components of $P_t$.

errors using the outer product of the first derivatives of the likelihood function. We compare these results to bootstrapped standard errors computed with the procedure given in Section 5.2.
Table 8 presents the results for model RPC. The asymptotic standard errors tend to overstate the precision with which we measure the effect of the level PC on the conditional means of the PCs ($K^P_{1,11}$, $K^P_{1,21}$, $K^P_{1,31}$) by a factor of about two. These effects on the standard errors for $K_{1P}^P$ and $\theta^P$ are necessarily due to the small-sample properties of the OLS estimates in the VAR for $P_t$ since, by Proposition 3, the full-information ML estimates in the GDTSM agree with the OLS estimates. Additionally, the asymptotic method overstates the precision of the Q parameter $r_\infty^Q$ by about 50%, whereas for $\lambda^Q$ the asymptotic and bootstrapped standard errors are close. Overall, though, the asymptotic standard errors line up rather well with the bootstrapped standard errors.


5.4 Out-of-sample Forecasting Results

An interesting question at this juncture is whether differences in parameter estimates translate into differences in the out-of-sample forecasting performance of these GDTSMs. We re-estimate each model recursively, on an expanding window of data from months t = 1, ..., T (T = 61, ..., 215), and use the model to predict, out of sample, the changes in the principal components over the next 1-, 3-, 6-, and 12-month periods. As a benchmark, we use the corresponding forecasts from an unconstrained VAR. As we noted in Section 3, in theory the forecasts of $P_t$ are the same across all models that assume these PCs are measured without error and that differ only in the constraints they impose on the Q distribution of $P_t$. In particular, with L = 3, whether we assume distinct real eigenvalues, complex eigenvalues, or repeated eigenvalues (as in the AFNS model), the forecasts of $P_t$ are all exactly the same as those from an unconstrained VAR. This explains the rows of zeros in Table 9.
Under the constraint L = 1 (constrained risk premiums), there is an implicit constraint on $K_{1P}^P$ and, hence, enforcing the no-arbitrage constraints may improve forecasts. From Table 9, we see that there is a moderate improvement in forecasts for PC1 and PC2, one that grows with the horizon for PC1. Models RPC1 and JPC1 have different predictions (though only slightly). This is because the differences under Q implied by the repeated root assumption now propagate to the P dynamics through the restriction relating the P and Q drifts.
As further evidence on the empirical relevance of constraints on the P distribution of $P_t$ for forecasting, we pursue the examples of Section 5.2: constraining $K_{1P}^P$ to be diagonal (Table 6) or constraining $P_t$ to have a common unit root (the cointegration example of Table 7).30 The last four rows of Table 9 present the relative forecasting accuracy of VAR models with these constraints imposed, as well as that of their no-arbitrage counterparts, with RPC being the unconstrained GDTSM. The constrained model VAR + diag($K_{1P}^P$) shows notable improvements in out-of-sample forecast accuracy for the first and third PCs, particularly over longer horizons, but, interestingly, there is a deterioration in the forecast quality for PC2. This suggests that feedback from (PC1, PC3) to PC2 is consequential for forecasting the slope of the yield curve. Imposing the cointegration constraint improves the forecasts of PC1 and, unlike in the prior example, also the forecasts of PC2.
Of most interest for our analysis is the finding that starting from either of the constrained VARs and then imposing the no-arbitrage restrictions has virtually no incremental effect on forecast performance. Even though no-arbitrage restrictions can, in principle, improve out-of-sample forecasts in these cases, in practice they have virtually no effect on the results in our data. The improvements in forecasting with either model RPC + diag($K_{1P}^P$) or RPC + 1UR[$K_{0P}^P$, $K_{1P}^P$] are entirely a consequence of imposing restrictions on the VAR model for $P_t$.

30 For the cointegration example, we enforce the constraint that $[K_{0P}^P, K_{1P}^P]$ has a zero eigenvalue or, equivalently, that there is a common unit root and no trend.

Table 9
The improvement in out-of-sample forecast accuracy relative to the forecasts from a VAR(1)

Forecast error relative to unconstrained VAR(1) (%)

                                     PC1                          PC2                          PC3
Model                                1m     3m     6m     12m     1m     3m     6m     12m     1m     3m     6m     12m
RPC                                  0.0    0.0    0.0    0.0     0.0    0.0    0.0    0.0     0.0    0.0    0.0    0.0
RY                                   −0.3   −0.5   −0.8   −0.7    0.2    0.4    0.4    0.0     0.1    0.8    1.3    0.8
RKF                                  0.9    3.0    5.9    12.9    −1.7   −4.7   −7.7   −10.0   1.2    3.3    7.7    10.6
JPC                                  0.0    0.0    0.0    0.0     0.0    0.0    0.0    0.0     0.0    −0.0   −0.0   −0.0
CPC                                  0.0    0.0    0.0    0.0     0.0    0.0    0.0    0.0     0.0    −0.0   −0.0   −0.0
RPC1                                 −2.1   −4.3   −6.2   −7.1    −2.0   −3.8   −3.8   −1.6    −1.5   −2.7   −2.5   0.2
RY1                                  −2.2   −4.8   −7.3   −8.8    −1.9   −3.9   −3.9   −1.8    −1.6   −2.7   −2.5   −1.0
JPC1                                 −2.3   −4.7   −6.7   −8.2    −1.9   −3.7   −4.2   −2.7    −1.5   −2.6   −1.9   0.6
VAR + diag($K_{1P}^P$)               −5.3   −12.1  −18.6  −21.6   0.7    6.3    11.6   5.7     −2.4   −5.4   −9.1   −13.0
RPC + diag($K_{1P}^P$)               −5.3   −12.1  −18.6  −21.6   0.7    6.3    11.6   5.6     −2.4   −5.4   −9.1   −13.0
VAR + 1UR[$K_{0P}^P$, $K_{1P}^P$]    −5.3   −10.0  −12.9  −13.5   −2.3   −6.4   −8.9   −6.2    −1.0   −1.6   −1.7   −0.7
RPC + 1UR[$K_{0P}^P$, $K_{1P}^P$]    −5.3   −10.0  −13.0  −13.6   −2.3   −6.3   −8.8   −6.0    −1.0   −1.6   −1.8   −0.9

Forecast error is measured by the out-of-sample root-mean-squared error
$$\mathrm{RMSE}_i = \sqrt{\frac{1}{T-59}\sum_{t=60}^{T}\left(\Delta PC_{i,t+1} - E_t[\Delta PC_{i,t+1}]\right)^2},$$
where the expectation, $E_t$, is computed using the model estimated with data from time $\tau = 1, \ldots, t$. Entries are percentage differences relative to the RMSE of the unrestricted VAR(1). For example, a number of −5 implies that the model has a 5% smaller out-of-sample RMSE than the unrestricted VAR(1).
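The accuracy measure in the note to Table 9 can be sketched as an expanding-window exercise: at each date $t$, each model is re-estimated on data up to $t$, the $h$-month change in every factor is forecast, and the RMSE is reported as $100 \times (\mathrm{RMSE}_{\text{model}}/\mathrm{RMSE}_{\text{VAR}} - 1)$. For brevity, the diagonal-$K_{1P}^P$ competitor is fit by equation-by-equation OLS rather than by GLS, and the function names and simulated data are hypothetical. Comparing the VAR with itself returns zeros, mirroring the logic behind the RPC row.

```python
import numpy as np

def fit_var1(P, diagonal=False):
    """OLS for Delta P_t = K0 + K1 P_{t-1} + u_t; optionally force K1 to be
    diagonal (own-lag OLS per equation, a simplification of GLS)."""
    dP = np.diff(P, axis=0)
    N = P.shape[1]
    if not diagonal:
        X = np.column_stack([np.ones(len(dP)), P[:-1]])
        coef = np.linalg.lstsq(X, dP, rcond=None)[0]
        return coef[0], coef[1:].T
    K0, K1 = np.zeros(N), np.zeros((N, N))
    for i in range(N):
        X = np.column_stack([np.ones(len(dP)), P[:-1, i]])
        K0[i], K1[i, i] = np.linalg.lstsq(X, dP[:, i], rcond=None)[0]
    return K0, K1

def forecast_change(K0, K1, P_t, h):
    """E_t[P_{t+h} - P_t] obtained by iterating P_{s+1} = K0 + (I + K1) P_s."""
    P = P_t.copy()
    for _ in range(h):
        P = K0 + P + K1 @ P
    return P - P_t

def relative_rmse(P, h, first=60, **model_kw):
    """Percent difference in out-of-sample RMSE versus the unconstrained VAR(1)."""
    err_m, err_v = [], []
    for t in range(first, len(P) - h):
        window = P[: t + 1]                     # data from the first date through t
        actual = P[t + h] - P[t]
        err_m.append(actual - forecast_change(*fit_var1(window, **model_kw), P[t], h))
        err_v.append(actual - forecast_change(*fit_var1(window), P[t], h))
    rmse_m = np.sqrt(np.mean(np.square(err_m), axis=0))
    rmse_v = np.sqrt(np.mean(np.square(err_v), axis=0))
    return 100.0 * (rmse_m / rmse_v - 1.0)

rng = np.random.default_rng(8)
K1_true = np.diag([-0.02, -0.1, -0.3])
P = np.zeros((216, 3))
for t in range(1, 216):
    P[t] = P[t - 1] + K1_true @ P[t - 1] + 0.01 * rng.standard_normal(3)
rel_same = relative_rmse(P, h=6)                  # VAR against itself: all zeros
rel_diag = relative_rmse(P, h=6, diagonal=True)   # diagonal-K1 restriction
```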


It is instructive to place the findings of Christensen, Diebold, and Rudebusch (2007) for the AFNS model in the context of these results. They compare the forecast performance of an AFNS model with both $K_{1X}^P$ and $\Sigma_X$ in (1) constrained to be diagonal to that of Duffee's (2002) canonical GDTSM based on the DS normalization (which is equivalent to our RPC model).31 As with our examples, forcing $K_{1X}^P$ to be diagonal is a direct constraint on the P distribution of $P_t$ and, as such, may lead to more reliable forecasts than those from an unconstrained VAR model for $P_t$. In fact, they report that their constrained AFNS model does outperform Duffee's model in forecasting bond yields, also with larger improvements over longer horizons. However, the results in Table 9 suggest that this improvement comes from the restrictions they imposed on the VAR model for $P_t$ and not from the use of an AFNS pricing model.

6. Observable Factors with Measurement Errors

Up to this point we have assumed that N portfolios of yields are priced per-
fectly by the GDTSM. We turn next to the case where all of the zero-coupon
yields used in estimation equal their GDTSM-implied values plus measurement
errors. Under the assumption that the measurement errors are jointly normal,
this is a Kalman filtering problem.

Case F: The yields on J (> N) zero-coupon bonds equal their GDTSM-implied values plus mean-zero, normally distributed errors, $y_t^o - y_t$.

A number of researchers (see, e.g., Duffee and Stanton 2007 and Duffee 2009) have emphasized the computational challenges of estimation under Case F. Under the normalization of Dai and Singleton (2000) (DS), a researcher must estimate $(K_{1X}^Q, K_{0X}^P, K_{1X}^P, \rho_0, \rho_1)$, where $K_{1X}^Q$ is lower triangular. In this parametrization, a researcher would likely have a diffuse prior on all of the parameters. Moreover, the states of the model depend on the parameters, so they too are unknown. We now show that our JSZ canonical representation extends to the setting of Case F and demonstrate its benefits both for interpretation and for estimation of GDTSMs.
Theorem 1 shows that any GDTSM is observationally equivalent to a model
where the latent states are a given set of portfolios of yields, purged of measure-
ment errors. In Case P, when the portfolios are assumed to be observed without
measurement errors, this means the states are simply these portfolios of yields.
In Case F, we can maintain the interpretation that the latent states are portfo-
lios of yields with known portfolio matrix W , though now constructed with the
model-implied (measurement-error free) yields yt . Equivalently, under Case F,

31 Christensen, Diebold, and Rudebusch (2007) assume that all yields are measured with additive measurement
errors, the case we turn to in Section 6. However, three-factor models price bonds quite accurately over the
maturity range that they and we consider, so Theorem 2 should be informative about their findings.


one can view $P_t = Wy_t$ as the "true" value of the pricing factors and $P_t^o = Wy_t^o$ as its observed counterpart.32
To set up the Kalman filtering problem for Case F, we start with a given set of portfolio weights $W \in \mathbb{R}^{N \times J}$. From $W$ and $(\lambda^Q, r_\infty^Q, \Sigma_{\mathcal{P}})$, we construct $(K_0^Q, K_1^Q, \rho_0, \rho_1)$ as prescribed in Proposition 2. From the no-arbitrage relation (A2–A3) we then construct $A \in \mathbb{R}^J$ and $B \in \mathbb{R}^{J \times N}$ with $y_t = A + B\mathcal{P}_t$, and thus the relations

$$\Delta\mathcal{P}_t = K_{0\mathcal{P}}^P + K_{1\mathcal{P}}^P \mathcal{P}_{t-1} + \Sigma_{\mathcal{P}}\epsilon_t^P, \qquad (37)$$

$$y_t^o = A + B\mathcal{P}_t + \Sigma_Y \epsilon_t^m, \qquad (38)$$

where $\epsilon_t^P \sim N(0, I_N)$ are the innovations to the pricing factors and $\epsilon_t^m \sim N(0, I_M)$ are the measurement errors. Researchers have considered several parameterizations of the volatility matrix $\Sigma_Y$ for $\epsilon_t^m$. In our subsequent empirical examples, we examine the cases of independent (diagonal $\Sigma_Y$) errors with distinct or common volatilities. These relations give the usual state (37) and observation (38) equations of the Kalman filter, and they fully characterize the conditional distribution of the yield curve in terms of rotation-invariant parameters.
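To make the filtering problem concrete, the state equation (37) and observation equation (38) can be run through a standard Kalman filter that accumulates the Gaussian log-likelihood. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' code: the parameters $(K_{0\mathcal{P}}^P, K_{1\mathcal{P}}^P, \Sigma_{\mathcal{P}}, A, B, \Sigma_Y)$ are taken as given, and the initial state mean and covariance (`P0_mean`, `P0_cov`) are user-supplied because this section does not say how the filter is initialized. The function name `kalman_loglik` is hypothetical.

```python
import numpy as np


def kalman_loglik(y_obs, K0, K1, Sigma_P, A, B, Sigma_Y, P0_mean, P0_cov):
    """Kalman-filter log-likelihood for the linear Gaussian system
        state, eq. (37) rearranged:  P_t   = K0 + (I + K1) P_{t-1} + Sigma_P eps_t^P
        observation, eq. (38):       y_t^o = A + B P_t + Sigma_Y eps_t^m
    y_obs is (T x J); returns (loglik, filtered means E_t[P_t] as a T x N array)."""
    T, J = y_obs.shape
    N = K0.shape[0]
    F = np.eye(N) + K1                 # transition matrix for P_t in levels
    Q = Sigma_P @ Sigma_P.T            # covariance of state innovations
    R = Sigma_Y @ Sigma_Y.T            # covariance of measurement errors
    m = np.array(P0_mean, dtype=float)
    V = np.array(P0_cov, dtype=float)
    loglik = 0.0
    filtered = np.empty((T, N))
    for t in range(T):
        # Prediction step: moments of P_t given yields through t-1.
        m = K0 + F @ m
        V = F @ V @ F.T + Q
        # Update step with the observed yields y_t^o.
        v = y_obs[t] - (A + B @ m)          # prediction error
        S = B @ V @ B.T + R                 # covariance of the prediction error
        G = np.linalg.solve(S, B @ V).T     # Kalman gain V B' S^{-1}
        m = m + G @ v
        V = V - G @ B @ V
        _, logdet = np.linalg.slogdet(S)
        loglik -= 0.5 * (J * np.log(2.0 * np.pi) + logdet + v @ np.linalg.solve(S, v))
        filtered[t] = m
    return loglik, filtered
```

The filtered means returned here correspond to $E_t[\mathcal{P}_t]$; maximizing `loglik` over the rotation-invariant parameters is the Case F estimation problem described in the text.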
The computational benefits from using the JSZ normalization in Case F arise, in part, from the observation that the least-squares projection of $\mathcal{P}_t^o$ onto $\mathcal{P}_{t-1}^o$ will nearly recover the ML estimates of $K_{0\mathcal{P}}^P$ and $K_{1\mathcal{P}}^P$ to the extent that $\mathcal{P}_t^o \approx \mathcal{P}_t$ (and we can choose portfolios, such as the principal components, to make these errors small).33 Additionally, although not exact, we have nearly concentrated the likelihood in that the optimal P parameters will typically have weak dependence on the Q parameters owing to the fact that, as the Q parameters vary, the filtered states largely do not change.34
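The least-squares projection mentioned above is cheap to compute and, under the approximation $\mathcal{P}_t^o \approx \mathcal{P}_t$, gives starting values for $(K_{0\mathcal{P}}^P, K_{1\mathcal{P}}^P, \Sigma_{\mathcal{P}})$ in the Case F likelihood search. The sketch below assumes the observed portfolios $\mathcal{P}_t^o = W y_t^o$ have already been formed and regresses $\Delta\mathcal{P}_t^o$ on a constant and $\mathcal{P}_{t-1}^o$, matching the form of (37); the function name is hypothetical and the divide-by-T residual covariance is the Gaussian ML choice conditional on the first observation.

```python
import numpy as np


def ols_var1_starting_values(P_obs):
    """Least-squares projection of Delta P_t^o on a constant and P_{t-1}^o.

    P_obs is a (T x N) array of observed portfolios P_t^o = W y_t^o.
    Returns (K0, K1, L) with Delta P_t^o ~ K0 + K1 P_{t-1}^o, where L is the
    lower-triangular Cholesky factor of the (divide-by-T) residual covariance."""
    dP = np.diff(P_obs, axis=0)                           # Delta P_t^o, t = 2..T
    Z = np.column_stack([np.ones(len(dP)), P_obs[:-1]])   # regressors (1, P_{t-1}^o)
    coef, *_ = np.linalg.lstsq(Z, dP, rcond=None)         # shape (N + 1) x N
    K0 = coef[0]
    K1 = coef[1:].T                                       # row i: equation for factor i
    resid = dP - Z @ coef
    cov = resid.T @ resid / len(dP)
    return K0, K1, np.linalg.cholesky(cov)
```

Because every equation shares the same regressors, equation-by-equation OLS here is the same as estimating the system jointly, which is what makes these values such good starting points.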
With the JSZ normalization, the parameter estimates are directly compara-
ble across distributional assumptions on the measurement errors. That is, in
analogy to Section 3, by fixing the yield portfolios, both measured with and
without error, the P parameters are now directly comparable regardless of the
Q structure. The parameters are also directly comparable across sample peri-
ods. When the P parameters are defined indirectly through a Q normalization,
such comparisons will in general not be possible.

6.1 Empirical Implication

To illustrate Case F, we estimate model RKF in which all J zero-coupon bonds used in estimation are measured with errors, and the eigenvalues of $K_1^Q$ are all

32 In fact, an equivalent characterization of the JSZ normalization is that, for a given portfolio matrix $W$, $A_W(\Theta^Q) = 0$ and $B_W(\Theta^Q) = I_N$.
33 This approximation can be verified empirically by comparing $\mathcal{P}_t^o$ to $E_t^P[\mathcal{P}_t]$ or $E_T^P[\mathcal{P}_t]$.
34 This is in contrast to, for example, the rotation of DS where, as the lower-triangular $K_1^Q$ is changed, the latent states vary as well. Thus, necessarily, so do the optimal P parameters given the specified Q parameters.

The Review of Financial Studies / v 24 n 3 2011

real. From Table 2, it is seen that the estimates of the Q parameters for model RKF are similar to those for models RPC and RY that fit with N portfolios of yields priced exactly by the GDTSM(3). Similarly, from Tables 3 and 4, we see that the P parameters also generally match up across the models with and without filtering. An exception is the P distribution of PC3: When filtering, the volatility of PC3 is reduced by about 10%, and PC3 has a larger effect on the conditional mean of PC1 and PC2 (higher $K_{1,13}^P$, $K_{1,23}^P$). That is, PC3 becomes a bit smoother, and the model attributes a slightly greater effect to PC3 on forecasts of changes in the level and slope of the yield curve.


For out-of-sample forecasts using model RKF, Table 9 shows that PC1 is better predicted by a simple VAR, while model RKF predicts PC2 better than a VAR does (though the differences are modest).
Also of interest in the presence of filtering are comparisons of the model-implied PCs with their corresponding sample estimates that, by assumption, are contaminated by measurement errors. Figure 1 plots the time series of the PCs computed from data against those from models RCMT, RY, and RKF. For model RKF, we plot the model-implied filtered $PC_{it}^f = E_t[PC_{it}]$. For all three models, the $PC_i^o$ are nearly identical to their model-implied counterparts. This is not surprising: If the model is accurately pricing the cross-section of bonds, then it is almost a necessity that it will accurately match level, slope, and curvature. $PC3^f$ deviates slightly from $PC3^o$, and this is the source of the small differences seen in Figure 1.
A quite different picture emerges when we increase the number of pricing factors to four or five using the JSZ normalization under Case F. For $i = 1, 2, 3$, $PCi^f$ lines up well with $PCi^o$, as before. However, from Figure 2, it is seen that $(PC4^f, PC5^f)$ appear to be smoothed versions of $(PC4^o, PC5^o)$, with the differences being substantial during some periods. To interpret these patterns, we note that the likelihood function, through the Kalman filter, attempts to match both the cross-sectional pricing relationships and the time-series variation in excess returns. The higher-order PC4 and PC5 have only small impacts on pricing, since a three-factor model already prices the cross-section of bonds well, but they do contain information about time variation in expected returns.35
Further insight into how ML addresses this dual objective is revealed by the estimated half-lives of the pricing factors under Q (computed from the estimated $\lambda^Q$). In the five-factor GDTSM, the Q half-lives of $\mathcal{P}_t$ are (in years) (15, 8.4, 2.4, 0.13, 0.08), whereas they are (24, 1.2, 0.78) in the three-factor model. The presence of a factor with a very short half-life induces large movements in the short rate (the one-month rate in our discrete-time formulation).

35 Cochrane and Piazzesi (2005, 2008) find that a portfolio of smoothed forward rates that is correlated with PC4
predicts bond returns. Joslin, Priebsch, and Singleton (2010) find that smoothed growth in industrial production,
which is also correlated with PC4, is an important determinant of excess returns for level and slope portfolios.


Figure 1
This figure plots the PCs implied by models RCMT, RY, and RKF against the estimated PCs from the data.
All three models imply PC1 and PC2 that are almost indistinguishable from the data and from each other. The
models imply slightly different PC3, but the difference is very small.

Moreover, the sample average short rate is 23%, which also results in large,
wildly oscillating Sharpe ratios.
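The half-lives quoted above can be mapped to and from the Q eigenvalues. This section does not spell out the formula, so the sketch below assumes the monthly discrete-time convention $\Delta X_t = K_1^Q X_{t-1} + \ldots$: a factor associated with a real eigenvalue $\lambda \in (-1, 0)$ then decays by the factor $(1 + \lambda)$ each month, and its half-life $h$ in months solves $(1 + \lambda)^h = 1/2$. Both function names are hypothetical.

```python
import math


def half_life_years(lam, periods_per_year=12):
    """Half-life in years of a factor whose deviation from its mean evolves as
    x_t = (1 + lam) x_{t-1}, i.e. Delta x_t = lam x_{t-1} + noise."""
    if not -1.0 < lam < 0.0:
        raise ValueError("need -1 < lam < 0 for monotone mean reversion")
    periods = math.log(0.5) / math.log1p(lam)   # solves (1 + lam)**h = 1/2
    return periods / periods_per_year


def eigenvalue_from_half_life(years, periods_per_year=12):
    """Inverse map: the eigenvalue lam implied by a half-life given in years."""
    return 0.5 ** (1.0 / (years * periods_per_year)) - 1.0
```

Under this convention, a half-life of 0.08 years (just under one month) corresponds to an eigenvalue below $-0.5$, so such a factor sheds more than half of any deviation within a single month, which helps illustrate why it moves the one-month rate so sharply.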
It is not the need to filter per se that gives rise to these fitting problems with
a 5-factor model. When the first five PCs are priced perfectly by the GDTSM
(Model RPC), the properties of the short rate are now more plausible (see
Table 10). However, the model-implied yields on bonds with maturities beyond
those included in estimation are now wildly implausible. Furthermore, imposing the reduced-rank restriction (Model RPC1) does not materially improve the
fit with five factors. For all of these error specifications with five factors, the
Sharpe ratios for the higher-order PCs show substantial variation.36 In contrast,

36 See Duffee (2010) for a more extensive empirical evaluation of the properties of Sharpe ratios in GDTSMs.
Joslin, Priebsch, and Singleton (2010) also investigate maximal Sharpe ratio variation within the context of
macro-GDTSMs.


Figure 2
This figure plots the model-implied and sample principal components for the fourth and fifth PCs when all PCs are assumed to be measured with normally distributed errors. The higher-order PCs implied by the models are visibly different from the data.

Table 10
Sample moments for three-factor and five-factor GDTSMs

                                3-factor models            5-factor models
                                RPC     RPC1    RKF        RPC     RPC1    RKF
Mean 1-month rate               4.2%    4.2%    4.2%       4.3%    4.3%    23%
Mean 30-year rate               5.8%    5.8%    5.9%       −31%    −39%    0.63%
PC4 Sharpe ratio mean           0.096   0.095   0.032      0.031   0.076   30
PC4 Sharpe ratio volatility     0.086   0.018   0.088      0.31    0.2     25
PC5 Sharpe ratio mean           0.096   0.095   0.032      0.031   0.076   30
PC5 Sharpe ratio volatility     0.086   0.018   0.088      0.31    0.2     25

the 3-factor specifications produce plausible values for these moments. We interpret this evidence as being symptomatic of overfitting, that is, of having too many pricing factors.
Does the accommodation of filtering substantially increase the computational complexity of estimation using the JSZ normalization? The parameters $(K_{0\mathcal{P}}^P, K_{1\mathcal{P}}^P)$ and $\sigma_{\text{pricing}}$ are now included as part of the parameter search.


As we argued for $\Sigma_{\mathcal{P}}$ in Case RP, we obtain very accurate starting points for $(K_{0\mathcal{P}}^P, K_{1\mathcal{P}}^P)$ irrespective of any inaccuracies in $(r_\infty^Q, \lambda^Q)$. The additional cost of computing the Kalman filter, as well as the lack of concentration of the likelihood function, results in estimation times of approximately 10.4 seconds, and, as without filtering, virtually all local optima are identical to within set tolerances. Using the results of the Case P estimation as a starting point for the Case F estimation decreased the estimation time to approximately 8.7 seconds. Thus, under the JSZ normalization, the estimation remains very fast even when all yields are measured with errors.


7. Conclusion
We derive a new canonical form for Gaussian dynamic term structure models.
This canonical form allows for (essentially) arbitrary observable portfolios of
zero-coupon yields to serve as the state variable. This allows us to characterize
the properties of a GDTSM in terms of salient observables rather than latent
states. Additionally, the risk-neutral distribution is parsimoniously characterized by the eigenvalues, $\lambda^Q$, of the drift matrix and a constant that, under Q stationarity, is proportional to the long-run mean of the short rate, $r_\infty^Q$. Our canonical form reveals that simple OLS regression gives the maximum likelihood estimates of the parameters governing the physical distribution of bond yields. This result remains true even if additional restrictions of several types, such as restrictions on the risk-neutral conditional distribution of yields, are imposed. An immediate implication of this result is that constraints such as imposing the arbitrage-free Nelson-Siegel model or imposing complex Q eigenvalues are irrelevant for forecasting bond yields. However, when one imposes
structure on risk premia, such as the reduced-rank risk premium, a wedge from
the unconstrained OLS estimates arises. Our canonical form allows us to eas-
ily overcome the challenge of empirical estimation of GDTSMs in the case
of filtering. The empirical results suggest that either some caution should be
exercised in interpreting a higher-dimensional model or, alternatively (perhaps
preferably), care should be taken to avoid highly overparametrized models with
implausible implications for either pricing or bond risk premia. Taken together,
our results shed new light on estimation and interpretation of GDTSMs, and
the effects of different specifications of the risk premiums and the risk-neutral
distribution of bond yields on the observed dynamics of the yield curve.

Appendices

A. Bond Pricing in GDTSMs

Under (1–3), the price of an m-year zero-coupon bond is given by

$$D_{t,m} = E_t^Q\left[e^{-\sum_{i=0}^{m-1} r_{t+i}}\right] = e^{\mathcal{A}_m + \mathcal{B}_m \cdot X_t}, \qquad (A1)$$


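where $(\mathcal{A}_m, \mathcal{B}_m)$ solve the first-order difference equations

$$\mathcal{A}_{m+1} - \mathcal{A}_m = K_0^{Q\prime}\mathcal{B}_m + \tfrac{1}{2}\mathcal{B}_m' H_0 \mathcal{B}_m - \rho_0, \qquad (A2)$$

$$\mathcal{B}_{m+1} - \mathcal{B}_m = K_1^{Q\prime}\mathcal{B}_m - \rho_1, \qquad (A3)$$

subject to the initial conditions $\mathcal{A}_0 = 0$, $\mathcal{B}_0 = 0$. See, for example, Dai and Singleton (2003). The loadings for the corresponding bond yield are $A_m = -\mathcal{A}_m/m$ and $B_m = -\mathcal{B}_m/m$.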
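The recursion (A2)–(A3) is straightforward to iterate numerically. The sketch below is a minimal NumPy version, assuming $H_0 = \Sigma\Sigma'$ is supplied directly and that maturity $m$ is counted in model periods (months in the paper's discrete-time setup); `bond_loadings` is a hypothetical name. Model-implied yields then follow as $y_{t,m} = A_m + B_m \cdot X_t$.

```python
import numpy as np


def bond_loadings(K0Q, K1Q, H0, rho0, rho1, max_m):
    """Iterate (A2)-(A3) from calA_0 = 0, calB_0 = 0 and return yield loadings.

    Price loadings: calA_{m+1} = calA_m + K0Q' calB_m + 0.5 calB_m' H0 calB_m - rho0
                    calB_{m+1} = calB_m + K1Q' calB_m - rho1
    Yield loadings: A_m = -calA_m / m and B_m = -calB_m / m for m = 1..max_m."""
    N = len(rho1)
    calA, calB = 0.0, np.zeros(N)
    A = np.empty(max_m)
    B = np.empty((max_m, N))
    for m in range(1, max_m + 1):
        # Simultaneous update: both right-hand sides use the previous calB.
        calA, calB = (calA + K0Q @ calB + 0.5 * calB @ H0 @ calB - rho0,
                      calB + K1Q.T @ calB - rho1)
        A[m - 1] = -calA / m
        B[m - 1] = -calB / m
    return A, B
```

A useful sanity check is that the one-period loadings reproduce the short rate, $A_1 = \rho_0$ and $B_1 = \rho_1$, since $\mathcal{A}_1 = -\rho_0$ and $\mathcal{B}_1 = -\rho_1$.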

B. Invariant Transformations of GDTSMs


As in DS, given the GDTSM with parameters as in (1–3) and latent state $X_t$, if we apply the invariant transformation $\hat X_t = C + D X_t$, we then have an observationally equivalent GDTSM with latent state $\hat X_t$ and parameters given by

$$K_{0\hat X}^Q = D K_{0X}^Q - D K_{1X}^Q D^{-1} C, \qquad (A4)$$

$$K_{1\hat X}^Q = D K_{1X}^Q D^{-1}, \qquad (A5)$$

$$\rho_{0\hat X} = \rho_{0X} - \rho_{1X}' D^{-1} C, \qquad (A6)$$

$$\rho_{1\hat X} = (D^{-1})' \rho_{1X}, \qquad (A7)$$

$$K_{0\hat X}^P = D K_{0X}^P - D K_{1X}^P D^{-1} C, \qquad (A8)$$

$$K_{1\hat X}^P = D K_{1X}^P D^{-1}, \qquad (A9)$$

$$H_{0\hat X} = D H_{0X} D'. \qquad (A10)$$

Given a parameter vector $\Theta$, we denote the parameter vector of $\hat X_t$ as $C + D\Theta$.
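The mapping (A4)–(A10) can be checked numerically: after rotating the parameters, the short rate and the conditional mean of the state change must agree with those of the original model evaluated at $X_t = D^{-1}(\hat X_t - C)$. The minimal sketch below assumes a nonsingular $D$; the same drift formulas serve for the Q pair (A4)–(A5) and the P pair (A8)–(A9). The function name `rotate_params` is hypothetical.

```python
import numpy as np


def rotate_params(C, D, K0, K1, rho0, rho1, H0):
    """Parameters for X_hat_t = C + D X_t, following (A4)-(A10).

    (K0, K1) may be either the Q drift pair or the P drift pair."""
    Dinv = np.linalg.inv(D)
    K0_hat = D @ K0 - D @ K1 @ Dinv @ C       # (A4) / (A8)
    K1_hat = D @ K1 @ Dinv                    # (A5) / (A9)
    rho0_hat = rho0 - rho1 @ Dinv @ C         # (A6)
    rho1_hat = Dinv.T @ rho1                  # (A7)
    H0_hat = D @ H0 @ D.T                     # (A10)
    return K0_hat, K1_hat, rho0_hat, rho1_hat, H0_hat
```

The check follows directly from substitution: $\hat K_0 + \hat K_1 \hat X_t = D(K_0 + K_1 X_t)$ and $\hat\rho_0 + \hat\rho_1' \hat X_t = \rho_0 + \rho_1' X_t$.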

C. Proof of Proposition 1
We require a slight variation of the standard Jordan canonical form of a square matrix that main-
tains all real entries and bears a similar relation to the real Schur decomposition and the Schur
decomposition.

Definition 1. We refer to the real ordered Jordan form of a square matrix $A \in \mathbb{R}^{n\times n}$ with eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_m)$ with corresponding algebraic multiplicities $(m_1, m_2, \ldots, m_m)$ as

$$A = J(\lambda) \equiv \mathrm{diag}(J_1, J_2, \ldots, J_m),$$

where if $\lambda_i$ is real, $J_i$ is the $(m_i \times m_i)$ matrix

$$J_i = \begin{pmatrix} \lambda_i & 1 & \cdots & 0 \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & & \ddots & 1 \\ 0 & \cdots & 0 & \lambda_i \end{pmatrix},$$

and if $|\mathrm{imag}(\lambda_i)| > 0$, $J_i$ is the $(2m_i \times 2m_i)$ matrix

$$J_i = \begin{pmatrix} R & I_2 & \cdots & 0 \\ 0 & R & \ddots & \vdots \\ \vdots & & \ddots & I_2 \\ 0 & \cdots & 0 & R \end{pmatrix} \quad \text{with} \quad R = \begin{pmatrix} \mathrm{real}(\lambda_i) & -|\mathrm{imag}(\lambda_i)| \\ |\mathrm{imag}(\lambda_i)| & \mathrm{real}(\lambda_i) \end{pmatrix},$$

and otherwise the block is empty. Additionally, we apply an arbitrary ordering on $\mathbb{C}$ to order the blocks by their eigenvalues. In case there exist eigenvalues with a geometric multiplicity greater than one, we also order the blocks by size.
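Definition 1 can be turned into a small constructor for $J(\lambda)$, which is the form taken by $K_1^Q$ in the canonical model. The sketch below is illustrative only: it takes a list of (eigenvalue, multiplicity) pairs and, as an assumption of this sketch, expects each complex-conjugate pair to be listed once, since a single $(2m_i \times 2m_i)$ block already carries both members of the pair. The function names are hypothetical.

```python
import numpy as np


def real_jordan_block(lam, mult):
    """One block of the real ordered Jordan form (Definition 1).

    Real lam: (mult x mult) block, lam on the diagonal and ones on the superdiagonal.
    Complex lam: (2 mult x 2 mult) block with R on the diagonal blocks and I_2 on the
    block superdiagonal, where R = [[Re(lam), -|Im(lam)|], [|Im(lam)|, Re(lam)]]."""
    if np.isreal(lam):
        return np.real(lam) * np.eye(mult) + np.eye(mult, k=1)
    a, b = np.real(lam), abs(np.imag(lam))
    R = np.array([[a, -b], [b, a]])
    return np.kron(np.eye(mult), R) + np.kron(np.eye(mult, k=1), np.eye(2))


def real_ordered_jordan(blocks):
    """Block-diagonal J(lambda) from (eigenvalue, multiplicity) pairs, in the given order.

    Assumption: each complex-conjugate pair appears once in `blocks`."""
    mats = [real_jordan_block(lam, m) for lam, m in blocks]
    n = sum(M.shape[0] for M in mats)
    J = np.zeros((n, n))
    i = 0
    for M in mats:
        k = M.shape[0]
        J[i:i + k, i:i + k] = M
        i += k
    return J
```

For example, one real eigenvalue and one complex pair give a $3 \times 3$ real matrix whose eigenvalues are the real eigenvalue and both members of the conjugate pair.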


Proof of Proposition 1: We first prove the existence by showing that a latent factor $X_t$ with arbitrary Q dynamics

$$\Delta X_t = K_{0X}^Q + K_{1X}^Q X_{t-1} + \Sigma_X \epsilon_t^Q$$

can be transformed to our desired form. By standard linear algebra, there exists a matrix $U$ so that $U K_{1X}^Q U^{-1}$ is in the standard Jordan normal form. By Lemma 1 of the supplement to this article (see Joslin, Singleton, and Zhu 2010), we can further transform to have the real ordered form of Definition 1. Note that by Joslin (2007), each eigenvalue has a geometric multiplicity of one and thus is associated with only one block due to the Markovian assumption. Now we separately consider the cases of real and complex Jordan blocks and show that we may transform the latent state to have $\rho_1 = \iota$.
1. A Jordan block $J_i$ corresponds to a real eigenvalue with algebraic multiplicity $m_i$ ($m_i$ could be 1). Then, $J_i$ is the $m_i \times m_i$ matrix

$$J_i = \begin{pmatrix} \lambda_i & 1 & \cdots & 0 \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & & \ddots & 1 \\ 0 & \cdots & 0 & \lambda_i \end{pmatrix}.$$

Let $\rho_{1i} = (\rho_{1i}^{(1)}, \ldots, \rho_{1i}^{(m_i)})$ be the components of $\rho_1$ that correspond to the Jordan block $J_i$. We observe that $\rho_{1i}^{(1)} \neq 0$, for otherwise we can do without state variable $X_t^i$, contradicting our assumption of an N-factor model. One can check that $B_i J_i B_i^{-1} = J_i$ if and only if $B_i$ has the form

$$B_i = \begin{pmatrix} b_i^{(1)} & b_i^{(2)} & \cdots & b_i^{(m_i)} \\ 0 & b_i^{(1)} & \cdots & b_i^{(m_i-1)} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & b_i^{(1)} \end{pmatrix}. \qquad (A11)$$

In particular, we can verify that the matrix

$$B_i = \begin{pmatrix} \rho_{1i}^{(1)} & \rho_{1i}^{(2)} - \rho_{1i}^{(1)} & \cdots & \rho_{1i}^{(m_i)} - \rho_{1i}^{(m_i-1)} \\ 0 & \rho_{1i}^{(1)} & \cdots & \rho_{1i}^{(m_i-1)} - \rho_{1i}^{(m_i-2)} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \rho_{1i}^{(1)} \end{pmatrix}$$

satisfies $B_i J_i B_i^{-1} = J_i$ and $(B_i^{-1})'\rho_{1i} = \iota$.


2. A Jordan block $J_i$ corresponds to a complex eigenvalue with multiplicity $m_i$. Then, $J_i$ is the $2m_i \times 2m_i$ matrix defined by

$$J_i = \begin{pmatrix} R & I_2 & \cdots & 0 \\ 0 & R & \ddots & \vdots \\ \vdots & & \ddots & I_2 \\ 0 & \cdots & 0 & R \end{pmatrix} \quad \text{with} \quad R = \begin{pmatrix} \mathrm{real}(\lambda_i) & -|\mathrm{imag}(\lambda_i)| \\ |\mathrm{imag}(\lambda_i)| & \mathrm{real}(\lambda_i) \end{pmatrix}.$$

The proof is analogous to the real case, as the individual steps are the same but require lemmas to verify that the intuitive steps hold with $(2\times 2)$ block matrices replacing scalars. The details of the proof and subsequent steps for this case are available in Joslin, Singleton, and Zhu (2010).
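The real-eigenvalue construction in case 1 is easy to confirm numerically: the upper-triangular Toeplitz matrix built from the differences of $\rho_{1i}$ commutes with $J_i$, so the Jordan block is preserved, and it rotates the short-rate loadings to $\iota$. The sketch below is a hypothetical check, not part of the original proof; it relies on $\rho_{1i}^{(1)} \neq 0$ so that $B_i$ is invertible.

```python
import numpy as np


def jordan_block(lam, m):
    """Real Jordan block J_i: lam on the diagonal, ones on the superdiagonal."""
    return lam * np.eye(m) + np.eye(m, k=1)


def normalizing_B(rho):
    """Upper-triangular Toeplitz matrix of form (A11) with first row
    (rho^(1), rho^(2) - rho^(1), ..., rho^(m) - rho^(m-1))."""
    rho = np.asarray(rho, dtype=float)
    m = len(rho)
    first_row = np.concatenate([[rho[0]], np.diff(rho)])
    B = np.zeros((m, m))
    for k in range(m):
        B += first_row[k] * np.eye(m, k=k)   # k-th superdiagonal holds b^(k+1)
    return B
```

The identity $(B_i^{-1})'\rho_{1i} = \iota$ is equivalent to $B_i'\iota = \rho_{1i}$, and the $k$-th entry of $B_i'\iota$ is the telescoping sum $\rho_{1i}^{(1)} + \sum_{j=2}^{k}(\rho_{1i}^{(j)} - \rho_{1i}^{(j-1)}) = \rho_{1i}^{(k)}$.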
We obtain the correct form of $K_{0X}^Q$ as follows. We can demean the components of $X$ corresponding to nonsingular Jordan blocks by transforming $\hat X_t^b = X_t^b + (K_{1X}^{Q,b})^{-1} K_{0X}^{Q,b}$. There can be at most one block corresponding to a zero eigenvalue (which by our ordering would be the first), and the first $m_1 - 1$ entries of $K_{0X}^Q$ can then be set to zero by translating to $\hat X_t^b = X_t^b - (K_{0X,2}^{Q,b}, K_{0X,3}^{Q,b}, \ldots, K_{0X,m_1-1}^{Q,b}, 0)'$. Finally, $\rho_0$ can then be set to zero by the translation $\hat X_{m_1,t} = X_{m_1,t} - \rho_0$.
The uniqueness of the canonical GDTSM stated in Proposition 1 follows from the uniqueness of an ordered Jordan decomposition and the fact that (i) the Jordan decomposition is maintained only by a block matrix $B$ of the form (A11); and (ii) the only such $B$ that satisfies $B'\iota = \iota$ is $B = I$. Furthermore, for $\theta \in \Theta^{JSZ}$ and any vector of parameters $a \neq 0$, either translating by $a$ violates the form of $K_{0X}^Q$ (which happens if any state besides the last zero-eigenvalue state, if one exists, is translated) or translating violates $\rho_0 = 0$ (which happens if there is a zero eigenvalue and only the last such state is translated). This establishes the uniqueness and completes the proof of Proposition 1.

D. Details of Step 3 in the Proof of Theorem 1

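We have established that every GDTSM is observationally equivalent to a Jordan-normalized model, and the transformation relating the two models is found by computing the associated portfolio loadings:

$$\mathcal{G}_{\mathcal{P}}^P = \{A_W(\Theta_J) + B_W(\Theta_J)'\Theta_J : \Theta_J \in \mathcal{G}_J\}. \qquad (A12)$$

Observe that since $\rho_{1J} = \iota$, $B_W(\Theta_J)$ depends only on $\lambda^Q$; let us denote $B_{\lambda^Q} \equiv B_W(\Theta_J)'$. Similarly, let us denote $A_{\lambda^Q,\rho_0,\Sigma} \equiv A_W(\Theta_J)$. Since, for any $\lambda^Q$, the map $s_{\lambda^Q}(\Sigma) = B_{\lambda^Q}^{-1}\Sigma$ is a bijection,37 we can reparametrize the conditional volatility by

$$\mathcal{G}_{\mathcal{P}}^P = \{A_{\Theta_J} + B_{\Theta_J}\Theta_J : \Theta_J = (k_\infty^Q e_{m_1}, J(\lambda^Q), 0, \iota, K_{0J}^P, K_{1J}^P, s_{\lambda^Q}(\Sigma_{\mathcal{P}}))\}. \qquad (A13)$$

Here, we use $\Sigma_{\mathcal{P}}$ to denote the parameterization since, for $\Theta_J = (k_\infty^Q e_{m_1}, J(\lambda^Q), 0, \iota, K_{0J}^P, K_{1J}^P, B_{\lambda^Q}^{-1}\Sigma_{\mathcal{P}})$, the transformed model $A_{\Theta_J} + B_{\Theta_J}\Theta_J$ (which has $\mathcal{P}_t$ as the factors since it is in $\mathcal{G}_{\mathcal{P}}^P$) has innovation volatility of $B_{\lambda^Q}B_{\lambda^Q}^{-1}\Sigma_{\mathcal{P}} = \Sigma_{\mathcal{P}}$.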

37 For simplicity, we denote the Cholesky factorization, $\Sigma$, but we have in mind the covariance $\Sigma\Sigma'$.


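Define the bijective map $k$ on $\mathbb{R}^N \times \mathbb{R}^{N\times N}$ by

$$k_{\lambda^Q,k_\infty^Q,\Sigma_{\mathcal{P}}}(K_0, K_1) = \left(B_{\lambda^Q}K_0 - B_{\lambda^Q}K_1B_{\lambda^Q}^{-1}A_{\lambda^Q,k_\infty^Q,\Sigma_{\mathcal{P}}},\ B_{\lambda^Q}K_1B_{\lambda^Q}^{-1}\right). \qquad (A14)$$

The function $k$ maps $(K_0, K_1)$ under the change of variables $X_t \mapsto A_{\lambda^Q,k_\infty^Q,\Sigma_{\mathcal{P}}} + B_{\lambda^Q}X_t$. Using $k$, we further reparametrize $\mathcal{G}_{\mathcal{P}}^P$ by

$$\mathcal{G}_{\mathcal{P}}^P = \{A_{\Theta_J} + B_{\Theta_J}\Theta_J : \Theta_J = (k_\infty^Q e_{m_1}, J(\lambda^Q), 0, \iota, k^{-1}_{\lambda^Q,k_\infty^Q,\Sigma_{\mathcal{P}}}(K_{0\mathcal{P}}^P, K_{1\mathcal{P}}^P), s_{\lambda^Q}(\Sigma_{\mathcal{P}}))\}. \qquad (A15)$$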


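This gives our desired reparameterization of $\mathcal{G}_{\mathcal{P}}^P$ by $\Theta_{JSZ} = (\lambda^Q, k_\infty^Q, \Sigma_{\mathcal{P}}, K_{0\mathcal{P}}^P, K_{1\mathcal{P}}^P)$. This is because, for $\Theta_J = \left(k_\infty^Q e_{m_1}, J(\lambda^Q), 0, \iota, k^{-1}_{\lambda^Q,k_\infty^Q,\Sigma_{\mathcal{P}}}(K_{0\mathcal{P}}^P, K_{1\mathcal{P}}^P), s_{\lambda^Q}(\Sigma_{\mathcal{P}})\right)$,

$$\Theta_{\mathcal{P}} = A_{\Theta_J} + B_{\Theta_J}\Theta_J = \left(k_{\lambda^Q,k_\infty^Q,\Sigma_{\mathcal{P}}}(0, J(\lambda^Q)),\ r_{\lambda^Q,k_\infty^Q,\Sigma_{\mathcal{P}}}(k_\infty^Q, \iota),\ K_{0\mathcal{P}}^P, K_{1\mathcal{P}}^P, \Sigma_{\mathcal{P}}\right), \qquad (A16)$$

where $r$ maps $(\rho_0, \rho_1)$ under the change of variables $X_t \mapsto A_{\lambda^Q,k_\infty^Q,\Sigma_{\mathcal{P}}} + B_{\lambda^Q}X_t$:

$$r_{\lambda^Q,k_\infty^Q,\Sigma_{\mathcal{P}}}(\rho_0, \rho_1) = \left(\rho_0 - \rho_1' B_{\lambda^Q}^{-1}A_{\lambda^Q,k_\infty^Q,\Sigma_{\mathcal{P}}},\ (B_{\lambda^Q}^{-1})'\rho_1\right). \qquad (A17)$$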

E. Proof of Theorem 2
We first prove that (26–27) hold when $H_0 = \{\eta_0 = (C^0, D^0, \Sigma_X^0, \theta_m^P)\}$. Let

$$(K_{0X}^{\eta_0}, K_{1X}^{\eta_0}) = \arg\max_{K_{0X}, K_{1X}} f(\mathcal{P}_T, y_T, \ldots, \mathcal{P}_1, y_1 \mid \mathcal{P}_0, y_0; \eta_0),$$

which we subsequently show is uniquely maximized.

Let $(C_{\mathcal{P}}^0, D_{\mathcal{P}}^0)$ denote the first $N$ elements of $C^0$ and the upper-left $N \times N$ block of $D^0$, respectively. By our assumption of invertibility of $D_{\mathcal{P}}^0$, we have that $X_t = (D_{\mathcal{P}}^0)^{-1}(\mathcal{P}_t - C_{\mathcal{P}}^0)$. Thus, by our assumptions on the measurement errors,

$$f(\mathcal{P}_T, y_T, \ldots, \mathcal{P}_1, y_1 \mid \mathcal{P}_0, y_0; \eta_0, K_{0X}, K_{1X}) = f(\mathcal{P}_T, \ldots, \mathcal{P}_1 \mid \mathcal{P}_0; \eta_0, K_{0X}, K_{1X}) \times \prod_{t=1}^{T} f(e_t^m \mid \mathcal{P}_t; \eta_0),$$

and so

$$(K_{0X}^{\eta_0}, K_{1X}^{\eta_0}) = \arg\max_{K_{0X}, K_{1X}} f(\mathcal{P}_T, \ldots, \mathcal{P}_1 \mid \mathcal{P}_0; \eta_0). \qquad (A18)$$

Furthermore, substituting into (24) we have

$$\Delta\mathcal{P}_t = D_{\mathcal{P}}^0 K_{1X} (D_{\mathcal{P}}^0)^{-1}\mathcal{P}_{t-1} + D_{\mathcal{P}}^0 K_{0X} - D_{\mathcal{P}}^0 K_{1X} (D_{\mathcal{P}}^0)^{-1} C_{\mathcal{P}}^0 + D_{\mathcal{P}}^0\epsilon_t, \qquad \epsilon_t \sim N(0, \Sigma_X).$$

It follows that the maximum value in (A18) is at most equal to the value of the likelihood corresponding to the OLS estimate. Note that although the value of the maximum likelihood depends on $D$, the argument that maximizes the value does not depend on $D$ by the classic Zellner (1962) result. The OLS likelihood value is achieved by choosing $(K_{0X}, K_{1X})$ to satisfy (26–27), which is feasible by the assumption that $(K_{0X}, K_{1X})$ is unconstrained and $D_{\mathcal{P}}^0$ is full rank.

This proves our result since $(K_{0X}^H, K_{1X}^H) = (K_{0X}^{\eta_0}, K_{1X}^{\eta_0})$ for some $\eta_0$, and we have shown that (26–27) hold for any $\eta_0$. Note that in the case that the parameters are underidentified, there will not be a unique maximum likelihood estimate in the sense that several $\eta_0$ may give the same likelihood, but (26–27) will hold for all possible choices. For some $H$, there may not exist a maximizer, in which case the result holds vacuously. However, standard conditions and arguments, such as compactness, provide for the existence of a maximizer.
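The Zellner (1962) step can be illustrated directly: when every equation of a system has the same regressors (here, a constant and $\mathcal{P}_{t-1}$), GLS on the stacked system gives the same coefficients as equation-by-equation OLS for any nonsingular error covariance, so the maximizing drift parameters do not depend on the covariance. The sketch below assumes a known covariance `Sigma` and uses the column-stacking vec convention; the function names are hypothetical.

```python
import numpy as np


def ols(Y, X):
    """Equation-by-equation OLS: (X'X)^{-1} X'Y, shape (K x N)."""
    return np.linalg.solve(X.T @ X, X.T @ Y)


def sur_gls(Y, X, Sigma):
    """GLS for the stacked system vec(Y) = (I_N kron X) vec(B) + vec(E),
    with Cov(vec(E)) = Sigma kron I_T and vec() stacking columns."""
    T, N = Y.shape
    Z = np.kron(np.eye(N), X)                          # (T N) x (K N) regressors
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))
    y = Y.flatten(order="F")                           # vec(Y)
    b = np.linalg.solve(Z.T @ Omega_inv @ Z, Z.T @ Omega_inv @ y)
    return b.reshape((X.shape[1], N), order="F")       # undo vec()
```

Algebraically, $(\Sigma^{-1} \otimes X'X)^{-1}(\Sigma^{-1} \otimes X') = I_N \otimes (X'X)^{-1}X'$, so $\Sigma$ cancels out of the estimator; only the likelihood value, not its argmax, depends on it.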


F. ML Estimation of Reduced-rank Regressions
Consider the regression as in (29) of the general form $Y_t = \alpha + \beta X_t + \epsilon_t$ subject to the constraint that $\beta$ has rank $r$ and where $\epsilon_t \sim N(0, \Sigma)$ i.i.d. with $\Sigma$ known. That is, we wish to solve the program

$$(\alpha, \beta) = \arg\min_{\mathrm{rank}(\beta) = r} \sum_t (Y_t - (\alpha + \beta X_t))'\Sigma^{-1}(Y_t - (\alpha + \beta X_t)).$$

It is easy to verify that by first demeaning the variables we may assume without loss of generality that $\alpha \equiv 0$. Furthermore, by transforming the variables, we may assume again without loss of generality that $\Sigma = I$ and $\sum_t X_t X_t' = I$. Under these assumptions, we wish to solve

$$\begin{aligned}
\beta &= \arg\min_{\mathrm{rank}(\beta)=r} \mathrm{trace}\left((Y - X\beta')(Y - X\beta')'\right) \\
&= \arg\min_{\mathrm{rank}(\beta)=r} \Big[\mathrm{trace}\left((Y - X\beta_{OLS}')(Y - X\beta_{OLS}')'\right) - 2\,\mathrm{trace}\left(X'(Y - X\beta_{OLS}')(\beta - \beta_{OLS})\right) \\
&\qquad\qquad + \mathrm{trace}\left(X'X(\beta' - \beta_{OLS}')(\beta - \beta_{OLS})\right)\Big] \\
&= \arg\min_{\mathrm{rank}(\beta)=r} \|\beta - \beta_{OLS}\|_F,
\end{aligned}$$

where $Y$ and $X$ are $(T \times N)$ and $(T \times M)$ matrices with the time series stacked vertically, $\beta_{OLS}' = (X'X)^{-1}X'Y$, and $\|\cdot\|_F$ denotes the Frobenius norm: $\|A\|_F^2 = \sum_{i,j}|A_{i,j}|^2$. The above equalities repeatedly use the identity $\mathrm{trace}(AB) = \mathrm{trace}(BA)$; in the last step, the cross term vanishes by the OLS normal equations $X'(Y - X\beta_{OLS}') = 0$, and the final trace equals $\|\beta - \beta_{OLS}\|_F^2$ because $X'X = I$. As in Keller (1962), this minimization problem has solution $\beta^* = U D_r^* V'$, where $UDV'$ gives the singular value decomposition of $\beta_{OLS}$ and $D_r^*$ is the same as $D$ except that all singular values beyond the $r$ largest are set to 0. This same proof applies again in the case where $\beta$ is not square, which would be the case where one assumes that only a single risk is priced (i.e., $[K_0^P, K_1^P] - [K_0^Q, K_1^Q]$ has reduced rank) rather than that only a single risk has a time-varying price of risk, as we do here.
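The argument above translates into a short algorithm: whiten so that $\Sigma = I$ and $\sum_t X_t X_t' = I$, truncate the SVD of the transformed OLS coefficient matrix (the Eckart–Young step credited here to Keller 1962), and map back. The sketch below assumes demeaned data (so $\alpha = 0$), a known positive-definite $\Sigma$, and symmetric matrix square roots for the whitening; the function names are hypothetical and the code is an illustration, not the authors' implementation.

```python
import numpy as np


def sym_sqrt(A):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(w)) @ V.T


def reduced_rank_regression(Y, X, Sigma, r):
    """Rank-r ML estimate of beta in Y_t = beta X_t + e_t, e_t ~ N(0, Sigma).

    Y is (T x N), X is (T x M), both demeaned. Returns beta with shape (N x M)."""
    S = X.T @ X                                   # sum_t X_t X_t'
    beta_ols = np.linalg.solve(S, X.T @ Y).T      # (N x M); beta_ols' = (X'X)^{-1} X'Y
    Sig_half = sym_sqrt(Sigma)
    S_half = sym_sqrt(S)
    # Transformed coefficients: Sigma^{-1/2} beta S^{1/2}; rank is unchanged.
    beta_tilde = np.linalg.solve(Sig_half, beta_ols) @ S_half
    U, d, Vt = np.linalg.svd(beta_tilde, full_matrices=False)
    d[r:] = 0.0                                   # keep only the r largest singular values
    beta_tilde_r = U @ np.diag(d) @ Vt
    # Undo the transformation: beta = Sigma^{1/2} beta_tilde S^{-1/2}.
    return Sig_half @ beta_tilde_r @ np.linalg.inv(S_half)
```

The whitening matters: truncating the SVD of the untransformed $\beta_{OLS}$ is generally not the ML solution when $\Sigma \neq I$ or $X'X \neq I$, which is why the proof first reduces to that normalized case.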

References
Adrian, T., and E. Moench. 2008. Pricing the Term Structure with Linear Regressions. Staff Report No. 340,
Federal Reserve Bank of New York. http://www.ny.frb.org/research/staff reports/sr340.pdf (accessed October
25, 2010).

Ang, A., and M. Piazzesi. 2003. A No-arbitrage Vector Autoregression of Term Structure Dynamics with
Macroeconomic and Latent Variables. Journal of Monetary Economics 50:745–87.

Ang, A., M. Piazzesi, and M. Wei. 2003. What Does the Yield Curve Tell Us About GDP Growth? Working
Paper, Columbia University.

Berndt, E., B. Hall, R. Hall, and J. Hausman. 1974. Estimation and Inference in Nonlinear Structural Models. Annals of Economic and Social Measurement 3:653–65.

Campbell, J., and R. Shiller. 1991. Yield Spreads and Interest Rate Movements: A Bird’s-eye View. Review of
Economic Studies 58:495–514.


Chen, R., and L. Scott. 1993. Maximum Likelihood Estimation for a Multifactor Equilibrium Model of the Term
Structure of Interest Rates. Journal of Fixed Income 3:14–31.

. 1995. Interest Rate Options in Multifactor Cox-Ingersoll-Ross Models of the Term Structure. Journal
of Fixed Income (Winter) 53–72.

Chernov, M., and P. Mueller. 2008. The Term Structure of Inflation Expectations. Working Paper, London Busi-
ness School.

Christensen, J. H., F. X. Diebold, and G. D. Rudebusch. 2007. The Affine Arbitrage-free Class of Nelson-Siegel Term Structure Models. Working Paper, Federal Reserve Bank of San Francisco.

. 2009. An Arbitrage-free Generalized Nelson-Siegel Term Structure Model. Econometrics Journal 12:C33–C64.

Cochrane, J., and M. Piazzesi. 2005. Bond Risk Premia. American Economic Review 95:138–60.

. 2008. Decomposing the Yield Curve. Working Paper, Stanford University.

Collin-Dufresne, P., R. Goldstein, and C. Jones. 2008. Identification of Maximal Affine Term Structure Models.
Journal of Finance 63:743–95.

Cox, J. C., and C. Huang. 1989. Optimum Consumption and Portfolio Policies When Asset Prices Follow a
Diffusion Process. Journal of Economic Theory 49:33–83.

Dai, Q., and K. Singleton. 2000. Specification Analysis of Affine Term Structure Models. Journal of Finance
55:1943–78.

. 2002. Expectations Puzzles, Time-varying Risk Premia, and Affine Models of the Term Structure.
Journal of Financial Economics 63:415–41.

. 2003. Term Structure Dynamics in Theory and Reality. Review of Financial Studies 16:631–78.

Diebold, F., and C. Li. 2006. Forecasting the Term Structure of Government Bond Yields. Journal of Economet-
rics 130:337–64.

Duffee, G. 1996. Idiosyncratic Variation in Treasury Bill Yields. Journal of Finance 51:527–52.

. 2002. Term Premia and Interest Rate Forecasts in Affine Models. Journal of Finance 57:405–43.

. 2008. Information in (and Not in) the Term Structure. Working Paper, Johns Hopkins University.

. 2009. Forecasting with the Term Structure: The Role of No-arbitrage. Working Paper, University of
California-Berkeley.

. 2010. Sharpe Ratios in Term Structure Models. Working Paper, Johns Hopkins University.

Duffee, G., and R. Stanton. 2007. Evidence on Simulation Inference for Near Unit-root Processes with Implica-
tions for Term Structure Estimation. Journal of Financial Econometrics 6:108–42.

Duffie, D., and R. Kan. 1996. A Yield-factor Model of Interest Rates. Mathematical Finance 6:379–406.

Jardet, C., A. Monfort, and F. Pegoraro. 2009. No-arbitrage Near-cointegrated VAR(p) Term Structure Models,
Term Premiums, and GDP Growth. Working Paper, Banque de France.

Joslin, S. 2007. Pricing and Hedging Volatility in Fixed Income Markets. Working Paper, MIT.

Joslin, S., A. Le, and K. Singleton. 2010. The Conditional Distribution of Bond Yields Implied by Gaussian
Macro-finance Term Structure Models. Working Paper, Sloan School, MIT.

Joslin, S., M. Priebsch, and K. Singleton. 2010. Risk Premiums in Dynamic Term Structure Models with Un-
spanned Macro Risks. Working Paper, Stanford University.

Joslin, S., K. Singleton, and H. Zhu. 2010. Supplement to “A New Perspective on Gaussian DTSMs.” Working
Paper, Sloan School, MIT.


Keller, J. B. 1962. Factorization of Matrices by Least-squares. Biometrika 49:239–42.

Nelson, C., and A. Siegel. 1987. Parsimonious Modelling of Yield Curves. Journal of Business 60:473–89.

Pearson, N. D., and T. Sun. 1994. Exploiting the Conditional Density in Estimating the Term Structure: An
Application to the Cox, Ingersoll, and Ross Model. Journal of Finance 49:1279–304.

Piazzesi, M. 2005. Bond Yields and the Federal Reserve. Journal of Political Economy 113:311–44.

Zellner, A. 1962. An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggrega-
tion Bias. Journal of the American Statistical Association 57:348–68.

25 pages
NoVaS Hstep
No ratings yet
NoVaS Hstep
24 pages
Fundamental Theorem of Asset Pricing
No ratings yet
Fundamental Theorem of Asset Pricing
18 pages
Lecture 10 Nonlinear Regression
No ratings yet
Lecture 10 Nonlinear Regression
10 pages
BK 1980
No ratings yet
BK 1980
8 pages
2021-Huang Z-Fitting Yield Curve With Dynamic Nelson-Siegel Models
No ratings yet
2021-Huang Z-Fitting Yield Curve With Dynamic Nelson-Siegel Models
40 pages
(2022) Fast Exact Joint S&P 500-VIX Smile Calibration in Discrete and Continuous Time (Bourgey, Guyon)
No ratings yet
(2022) Fast Exact Joint S&P 500-VIX Smile Calibration in Discrete and Continuous Time (Bourgey, Guyon)
19 pages
Machine Learning and The Yield Curve Tree-Based Macroeconomic Regime Switching
No ratings yet
Machine Learning and The Yield Curve Tree-Based Macroeconomic Regime Switching
41 pages
Efficient Tests of Stock Return Predictability: Please Share
No ratings yet
Efficient Tests of Stock Return Predictability: Please Share
57 pages
A TWO-FACTOR MODEL of The German Term Structure of Interest Rates
No ratings yet
A TWO-FACTOR MODEL of The German Term Structure of Interest Rates
64 pages
Advanced Financial Models
No ratings yet
Advanced Financial Models
91 pages
Fi Yield Curve Final
No ratings yet
Fi Yield Curve Final
8 pages
The Dynamic, The Static, and The Weak Factor Models and The Analysis of High-Dimensional Time Series
No ratings yet
The Dynamic, The Static, and The Weak Factor Models and The Analysis of High-Dimensional Time Series
25 pages
Blundell Bond ER
No ratings yet
Blundell Bond ER
20 pages
Blundell and Bond (1999 WP) GMM Estimation With Persistent Panel Data An Application To Production Functions
No ratings yet
Blundell and Bond (1999 WP) GMM Estimation With Persistent Panel Data An Application To Production Functions
24 pages
DP677 2011 SecondOrder
No ratings yet
DP677 2011 SecondOrder
25 pages
A Portfolio-Based Evaluation of Affine Term Structure Models
No ratings yet
A Portfolio-Based Evaluation of Affine Term Structure Models
31 pages
JPM 1989 409199
No ratings yet
JPM 1989 409199
7 pages
EarningsInsight 032124
No ratings yet
EarningsInsight 032124
35 pages
Ufaj 79 1 Online PDF
No ratings yet
Ufaj 79 1 Online PDF
124 pages
Time Series Analysis With MATLAB and Econometrics Toolbox
No ratings yet
Time Series Analysis With MATLAB and Econometrics Toolbox
2 pages
Master Viitanen Jonne 2015
No ratings yet
Master Viitanen Jonne 2015
57 pages
Successful Meetings Trainers Guide
100% (2)
Successful Meetings Trainers Guide
58 pages
View - Approch Affine Model
No ratings yet
View - Approch Affine Model
216 pages
Bec H Writing
No ratings yet
Bec H Writing
9 pages
Bec H Listening
No ratings yet
Bec H Listening
7 pages
Davison Hinkley Bootstrap Methods and Their Application
No ratings yet
Davison Hinkley Bootstrap Methods and Their Application
596 pages
Regression Model For Survival Data - The Generalized Time-Dependent Logistic Family - Mackenzie - 1996
No ratings yet
Regression Model For Survival Data - The Generalized Time-Dependent Logistic Family - Mackenzie - 1996
15 pages
The Philosophy of Statistics
No ratings yet
The Philosophy of Statistics
46 pages
Mdasc Reg Syl 2024-25 s56 424
No ratings yet
Mdasc Reg Syl 2024-25 s56 424
12 pages
Bayesian Methods For The Physical Sciences - Learning From Examples in Astronomy and Physics
No ratings yet
Bayesian Methods For The Physical Sciences - Learning From Examples in Astronomy and Physics
245 pages
SIDI2
No ratings yet
SIDI2
28 pages
Machine Learning Homework Guide
No ratings yet
Machine Learning Homework Guide
9 pages
SKL Pattern
No ratings yet
SKL Pattern
66 pages
Probit Model
No ratings yet
Probit Model
29 pages
Edge Co-Occurrence in Natural Images Predicts Contour Grouping Performance
No ratings yet
Edge Co-Occurrence in Natural Images Predicts Contour Grouping Performance
14 pages
Pedersen Et Al 2019 - GAMS, Pacote MGCV
No ratings yet
Pedersen Et Al 2019 - GAMS, Pacote MGCV
42 pages
CO2 Corrosion Guidelines 2009
100% (1)
CO2 Corrosion Guidelines 2009
19 pages
Statistical Parameters, Estimation, Confidence Region
No ratings yet
Statistical Parameters, Estimation, Confidence Region
11 pages
Golden Parachutes PDF
No ratings yet
Golden Parachutes PDF
44 pages
Identifying Initiating Event Frequency: 5.1. Purpose
No ratings yet
Identifying Initiating Event Frequency: 5.1. Purpose
12 pages
Instant Download Primer of Applied Regression & Analysis of Variance 3rd Edition Edition Stanton A. Glantz PDF All Chapter
100% (4)
Instant Download Primer of Applied Regression & Analysis of Variance 3rd Edition Edition Stanton A. Glantz PDF All Chapter
66 pages
Biometric ID Analysis with Beta-Binomial
No ratings yet
Biometric ID Analysis with Beta-Binomial
8 pages
Concepts - of - Machine - Learning (Minor)
No ratings yet
Concepts - of - Machine - Learning (Minor)
14 pages
Modeling Behavior and Population Dynamics - Jim M Cushing
No ratings yet
Modeling Behavior and Population Dynamics - Jim M Cushing
363 pages
Medical Statistics at A Glance 3rd Edition Aviva Petrie Full Chapters Included
100% (1)
Medical Statistics at A Glance 3rd Edition Aviva Petrie Full Chapters Included
142 pages
Longitudinal Trajectories of Sustained Attention Development
No ratings yet
Longitudinal Trajectories of Sustained Attention Development
14 pages
Bundle Adjustment - A Modern Synthesis: Bill - Triggs@
No ratings yet
Bundle Adjustment - A Modern Synthesis: Bill - Triggs@
71 pages
Quiet-STaR: LMs Learn to Reason
No ratings yet
Quiet-STaR: LMs Learn to Reason
26 pages
Youth, Social Media & Politics
No ratings yet
Youth, Social Media & Politics
18 pages
Board Composition Grey Directors and Corporate Failure
No ratings yet
Board Composition Grey Directors and Corporate Failure
13 pages
Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks
No ratings yet
Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks
15 pages
Nelson ConditionalHeteroskedasticityAsset 1991
No ratings yet
Nelson ConditionalHeteroskedasticityAsset 1991
25 pages
Understanding Diffusion Models by Feynman's Path Integral
No ratings yet
Understanding Diffusion Models by Feynman's Path Integral
27 pages
A Tutorial On MM Algorithms
No ratings yet
A Tutorial On MM Algorithms
28 pages
COVID-19 Impact on Filipino Medics
No ratings yet
COVID-19 Impact on Filipino Medics
46 pages

Advanced Bond Yield Forecasting

Uploaded by

Advanced Bond Yield Forecasting

Uploaded by


$$r_t = \rho_{0X} + \rho_{1X} \cdot X_t, \tag{3}$$

where $r_t$ is the one-period spot interest rate, $\Sigma_X \Sigma_X'$ is the conditional covariance

observable $P_t$; and (ii) the $\mathbb{Q}$ distribution of $P_t$ can be fully characterized by

where $(A_m, B_m)$ satisfy well-known Riccati difference equations (see

For any full-rank portfolio matrix $W \in \mathbb{R}^{N \times J}$, we let $P_t \equiv W y_t$ denote the

where $A_W = W[A_{m_1}, \ldots, A_{m_J}]'$ and $B_W = [B_{m_1}, \ldots, B_{m_J}]W'$. Note that

Case P: There are $N$ portfolios of bond yields $P_t$, constructed with weights

We now state our main result for Case P:

$$\Delta P_t = K_{0P}^{\mathbb{P}} + K_{1P}^{\mathbb{P}} P_{t-1} + \Sigma_P \epsilon_t^{\mathbb{P}}, \tag{6}$$
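Because (6) is a first-order VAR in changes and the shock has mean zero, conditional forecasts of the portfolios at any horizon follow by iterating $E_t[P_{t+1}] = K_{0P}^{\mathbb{P}} + (I + K_{1P}^{\mathbb{P}}) P_t$. The Python/NumPy sketch below illustrates this; the function name `forecast_portfolios` and all parameter values are hypothetical placeholders, not estimates from the paper.

```python
import numpy as np

def forecast_portfolios(K0, K1, P_t, h):
    """Conditional means E_t[P_{t+1}], ..., E_t[P_{t+h}] under
    Delta P_{t+1} = K0 + K1 P_t + Sigma eps_{t+1} (the mean-zero shock drops out)."""
    K0 = np.asarray(K0, dtype=float)
    Phi = np.eye(K0.size) + np.asarray(K1, dtype=float)  # levels form: P_{t+1} = K0 + Phi P_t + shock
    P = np.asarray(P_t, dtype=float)
    path = []
    for _ in range(h):
        P = K0 + Phi @ P
        path.append(P)
    return np.array(path)  # row j holds E_t[P_{t+j+1}]

# Hypothetical 3-factor example (level, slope, curvature portfolios, in percent)
K0 = np.array([0.05, 0.01, 0.0])
K1 = np.diag([-0.01, -0.05, -0.20])
P_now = np.array([4.0, 1.5, 0.2])
print(forecast_portfolios(K0, K1, P_now, h=12)[-1])  # 12-step-ahead forecast
```

When the eigenvalues of $I + K_{1P}^{\mathbb{P}}$ lie inside the unit circle, long-horizon forecasts converge to $-(K_{1P}^{\mathbb{P}})^{-1} K_{0P}^{\mathbb{P}}$.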

We refer to the GDTSM in Theorem 1 as the JSZ canonical form parame-

possible GDTSMs,$^{11}$ we want to show that every $\Theta \in \mathcal{G}$ is observationally

are observationally equivalent. Clearly, if two GDTSMs are observationally

Proposition 1. Every canonical GDTSM is observationally equivalent to the

where $\iota$ is a vector of ones, $\Sigma_X$ is lower triangular (with positive diagonal),

models. By the existence result in Proposition 1, each $\Theta_i$ is observationally

$$\Theta_i = A_W(\Theta_{iJ}) + B_W(\Theta_{iJ})' \, \Theta_{iJ}. \tag{12}$$

For reference, we summarize the transformations computed in the last

where $e_{m_1}$ is a vector with all zeros except in the $m_1^{\text{th}}$ entry, which is 1 ($m_1$

$B^{-1}\Sigma_P$, 0, $\iota$), where $(A_W, B_W)$ are defined in (5) and (A2–A3).
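The yield loadings $(A_m, B_m)$ come from Riccati difference equations, and the portfolio loadings $(A_W, B_W)$ follow by applying $W$, so that $P_t = A_W + B_W' X_t$. The Python/NumPy sketch below makes this concrete under stated assumptions: discrete-time Gaussian $\mathbb{Q}$ dynamics $\Delta X_{t+1} = K_{0X}^{\mathbb{Q}} + K_{1X}^{\mathbb{Q}} X_t + \Sigma_X \epsilon_{t+1}^{\mathbb{Q}}$, the short rate (3), and zero-coupon prices $\exp(a_m + b_m' X_t)$. The function name, parameter values, and weights are hypothetical; this is an illustration, not the paper's code.

```python
import numpy as np

def yield_loadings(K0Q, K1Q, Sigma, rho0, rho1, maturities):
    """Affine yield loadings: y^m_t = A[i] + B[:, i] @ X_t for m = maturities[i].

    Assumes Delta X_{t+1} = K0Q + K1Q X_t + Sigma eps^Q_{t+1} under Q, r_t = rho0 + rho1 . X_t,
    and zero-coupon prices exp(a_n + b_n . X_t), which gives the recursion
        a_{n+1} = a_n - rho0 + b_n . K0Q + 0.5 * b_n' Sigma Sigma' b_n
        b_{n+1} = -rho1 + (I + K1Q)' b_n,   with a_0 = 0, b_0 = 0,
    and yield loadings A_n = -a_n / n, B_n = -b_n / n.
    """
    K0Q, K1Q = np.asarray(K0Q, float), np.asarray(K1Q, float)
    Sigma, rho1 = np.asarray(Sigma, float), np.asarray(rho1, float)
    n_x = rho1.size
    Omega = Sigma @ Sigma.T
    a, b = 0.0, np.zeros(n_x)
    A_n, B_n = {}, {}
    for n in range(1, max(maturities) + 1):
        a = a - rho0 + b @ K0Q + 0.5 * b @ Omega @ b  # uses b_n before it is updated
        b = -rho1 + (np.eye(n_x) + K1Q).T @ b
        A_n[n], B_n[n] = -a / n, -b / n
    A = np.array([A_n[m] for m in maturities])         # length J
    B = np.column_stack([B_n[m] for m in maturities])  # n_x x J
    return A, B

# Hypothetical monthly Q parameters for a 3-factor model (illustrative only)
K0Q = np.zeros(3)
K1Q = np.diag([-0.002, -0.02, -0.1])
Sigma = np.array([[5e-4, 0.0, 0.0], [2e-4, 4e-4, 0.0], [1e-4, 1e-4, 3e-4]])
rho0, rho1 = 0.003, np.ones(3)
maturities = [1, 3, 12, 24, 60, 120]

A, B = yield_loadings(K0Q, K1Q, Sigma, rho0, rho1, maturities)

# Hypothetical portfolio weights W (N x J): level, slope, curvature
W = np.array([np.full(6, 1 / 6),
              [0, -1, 0, 0, 0, 1],
              [0, -1, 0, 2, 0, -1]], dtype=float)
A_W = W @ A    # A_W = W [A_{m_1}, ..., A_{m_J}]'
B_W = B @ W.T  # B_W = [B_{m_1}, ..., B_{m_J}] W'

X_t = np.array([0.001, -0.0005, 0.0002])
P_t = A_W + B_W.T @ X_t  # portfolios implied by the latent state
print(P_t)
```

A quick sanity check on the recursion is that the one-period yield reproduces the short rate, i.e. $A_1 = \rho_{0X}$ and $B_1 = \rho_{1X}$.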

2. $\mathbb{P}$ Dynamics and Maximum Likelihood Estimation

$f(P_t \mid P_{t-1}; K_{1P}^{\mathbb{P}}, K_{0P}^{\mathbb{P}}, \Sigma_P) = (2\pi)^{-N/2} |\Sigma_P|^{-1}$

are the sample ordinary least squares (OLS) estimates, independent of $\Sigma_P$

Proposition 3. Under Case P, the ML estimates of the $\mathbb{P}$ parameters ($K_{0P}^{\mathbb{P}}$,

where we suppress the dependence on ${}^{\mathbb{P}}\theta^{m}$. This inequality follows from the

3. On the Relevance of No-arbitrage for Forecasting

The JSZ normalization makes these observations particularly transparent.
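Proposition 3's statement that the ML estimates of $K_{0P}^{\mathbb{P}}$ and $K_{1P}^{\mathbb{P}}$ are the OLS estimates, independent of $\Sigma_P$, reflects a standard property of Gaussian VARs: every equation in (6) has the same regressors $(1, P_{t-1})$, and in such a system GLS with any error covariance (the ML estimator of the conditional mean for that covariance) collapses to equation-by-equation OLS. The Python/NumPy sketch below simulates from (6) with hypothetical parameters and checks this numerically. It illustrates the seemingly-unrelated-regressions result; the helper names are hypothetical and this is not the authors' estimation code.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_var(K0, K1, Sigma, P0, T, rng):
    """Simulate Delta P_t = K0 + K1 P_{t-1} + Sigma eps_t with eps_t ~ N(0, I)."""
    N = len(K0)
    P = np.empty((T + 1, N))
    P[0] = P0
    for t in range(1, T + 1):
        P[t] = P[t - 1] + K0 + K1 @ P[t - 1] + Sigma @ rng.standard_normal(N)
    return P

def design(P):
    """Stack Y = Delta P_t (T x N) and regressors X = (1, P_{t-1}) (T x (N+1))."""
    Y = np.diff(P, axis=0)
    X = np.column_stack([np.ones(len(Y)), P[:-1]])
    return Y, X

def ols_var(P):
    """Equation-by-equation OLS; returns (K0_hat, K1_hat)."""
    Y, X = design(P)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)  # (N+1) x N
    return coef[0], coef[1:].T

def gls_var(P, Omega):
    """System GLS with error covariance Omega, using vec(Y) = (I_N kron X) vec(coef) + vec(E)."""
    Y, X = design(P)
    Oi = np.linalg.inv(Omega)
    lhs = np.kron(Oi, X.T @ X)
    rhs = np.kron(Oi, X.T) @ Y.flatten(order="F")  # vec(Y), column-major
    coef = np.linalg.solve(lhs, rhs).reshape(X.shape[1], -1, order="F")
    return coef[0], coef[1:].T

# Hypothetical parameters for a stationary 3-factor system
K0 = np.array([0.05, 0.01, 0.0])
K1 = np.diag([-0.02, -0.1, -0.3])
Sigma = np.array([[0.3, 0.0, 0.0], [0.1, 0.2, 0.0], [0.0, 0.05, 0.1]])
P = simulate_var(K0, K1, Sigma, np.array([4.0, 1.5, 0.2]), 300, rng)

K0_ols, K1_ols = ols_var(P)
for Omega in (np.eye(3), Sigma @ Sigma.T, np.diag([1.0, 4.0, 9.0])):
    K0_gls, K1_gls = gls_var(P, Omega)
    print(np.allclose(K0_gls, K0_ols), np.allclose(K1_gls, K1_ols))
```

Because the estimates of the conditional mean do not move with $\Omega$, the implied forecasts of $P_t$ coincide with those of an unrestricted VAR estimated by OLS.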

1X t+1 = K 0X + K 1X X t + t , t ∼ N (0, Σ X ) i.i.d., (24)

and the observation equation

The parameter set is $\Theta_{SS} = \{(K_{0X}, K_{1X}, \Sigma_X, C, D, {}^{\mathbb{P}}\theta^{m})\}$, where ${}^{\mathbb{P}}\theta^{m}$ is an

normalizations will in general affect the ML estimates of the parameters, $\Theta_{SS}$,

4. Irrelevance of Factor Structure for Forecasting

${}^{H}$, $K^{H}$, $\eta^{H}$) are such that

${}^{H}$ is the first $N$ elements of $C^{H}$, $D^{H}$ is the upper left $N \times N$ block of

4.1 Factor Structure in Arbitrage Models

comparisons of OLS forecasts of PCs with their forecasts from a variety of

4.2 Irrelevance of Constraints on the Q Distribution of Yields

18 More precisely, under $\mathbb{Q}$, $y^{\tau}$

4.3 Conditions for Irrelevance of Constraints on Latent Factors

4.4 Relevance of Constraints on the Structure of Excess Returns

In comparison to the setting underlying Proposition 3 and Theorem 2,

time-series properties of yields. We explore the empirical implications of these

4.5 Relevance of Constraints on the P Distribution of Yields

4.6 Comparing the JSZ Normalization to Other Canonical Models

$$X_t = (r_t, \mu_{1t}, \mu_{2t}, \ldots, \mu_{N-1,t})', \tag{32}$$

Model Name Specification

JPC −0.00225 −0.0582 −0.0582 8.87

In order to facilitate comparison of the estimates across models with dif-

−0.25 −0.32 −0.03 −0.028 −1.8 −0.11

RPC 2.2 0.884 0.373 −0.569 0.584 −0.422

RCMT 2.23 0.73 0.316 −0.591 0.541 −0.362

RPC1 2.21 0.888 0.374 −0.572 0.586 −0.424

$$\Delta P_t = K_{0P}^{\mathbb{P}} + K_{1P}^{\mathbb{P}} P_t + \Sigma_P \epsilon_t^{\mathbb{P}}, \tag{37}$$