Advanced Bond Yield Forecasting
Kenneth J. Singleton
Graduate School of Business, Stanford University, and NBER
In any canonical Gaussian dynamic term structure model (GDTSM), the conditional forecasts of the pricing factors are invariant to the imposition of no-arbitrage restrictions. This invariance is maintained even in the presence of a variety of restrictions on the factor structure of bond yields. To establish these results, we develop a novel canonical GDTSM in which the pricing factors are observable portfolios of yields. For our normalization, standard maximum likelihood algorithms converge to the global optimum almost instantaneously. We present empirical estimates and out-of-sample forecasts for several GDTSMs using data on U.S. Treasury bond yields. (JEL E43, G12, C13)
Dynamic models of the term structure often posit a linear factor structure for a collection of yields, with these yields related to underlying factors $\mathcal{P}$ through a no-arbitrage relationship. Does the imposition of no-arbitrage in a Gaussian dynamic term structure model (GDTSM) improve the out-of-sample forecasts of yields relative to those from the unconstrained factor model, or sharpen model-implied estimates of expected excess returns? In practice, the answers to these questions are obscured by the imposition of over-identifying restrictions on the risk-neutral ($\mathbb{Q}$) or historical ($\mathbb{P}$) distributions of the risk factors, or on their market prices of risk, in addition to the cross-maturity restrictions implied by no-arbitrage.1
We are grateful for helpful comments from Greg Duffee, James Hamilton, Monika Piazzesi, Pietro Veronesi
(the Editor), an anonymous referee, and seminar participants at the AFA annual meeting, MIT, the New York
Federal Reserve Bank, and Stanford. Send correspondence to Scott Joslin, Assistant Professor of Finance,
MIT Sloan School of Management E62-639, Cambridge, MA 02142-1347; telephone: (617) 324-3901.
1 Recent studies that explore the forecasting performance of GDTSMs include Duffee (2002), Ang and Piazzesi
(2003), Christensen, Diebold, and Rudebusch (2007), Chernov and Mueller (2008), and Jardet, Monfort, and
Pegoraro (2009), among many others.
© The Author 2011. Published by Oxford University Press on behalf of The Society for Financial Studies. All rights reserved.
doi:10.1093/rfs/hhq128. Advance Access publication January 4, 2011.
A New Perspective on Gaussian Dynamic Term Structure Models
We show that, within any canonical GDTSM and for any sample of bond yields, imposing no-arbitrage does not affect the conditional $\mathbb{P}$ expectation of $\mathcal{P}$, $E^{\mathbb{P}}[\mathcal{P}_t \mid \mathcal{P}_{t-1}]$. GDTSM-implied forecasts of $\mathcal{P}$ are thus identical to those from the unrestricted vector-autoregressive (VAR) model for $\mathcal{P}$. To establish these results, we develop an all-encompassing canonical model in which the pricing factors $\mathcal{P}$ are linear combinations of the collection of yields $y$ (such as the first $N$ principal components (PCs))2 and in which these "yield factors" follow an unrestricted VAR. Within our canonical GDTSM, as long as $\mathcal{P}$ is measured without error, unconstrained ordinary least squares (OLS) gives the maximum likelihood (ML) estimates of $E^{\mathbb{P}}[\mathcal{P}_t \mid \mathcal{P}_{t-1}]$. Therefore, enforcing
2 Although standard formulations of affine term structure models use latent (unobservable) risk factors (e.g., Dai and Singleton 2000, Duffee 2002), by the results of Duffie and Kan (1996) we are free to normalize a model so that the factors are portfolios of bond yields; we choose PCs.
3 See, for example, Chen and Scott (1993) and Pearson and Sun (1994).
4 To emphasize, our canonical form is key to seeing the result; due to observational equivalence, the result holds
for any canonical form.
The Review of Financial Studies / v 24 n 3 2011
conditional covariance matrix of yield factors from the VAR. That is, given $\Sigma_{\mathcal{P}}$, the entire cross-section of bond yields in an $N$-factor GDTSM is fully determined by only the $N + 1$ parameters $r^{\mathbb{Q}}_\infty$ and $\lambda^{\mathbb{Q}}$. Moreover, $(r^{\mathbb{Q}}_\infty, \lambda^{\mathbb{Q}}, \Sigma_{\mathcal{P}})$ can be efficiently estimated independently of the $\mathbb{P}$ conditional mean of $\mathcal{P}_t$, rendering no-arbitrage irrelevant for forecasting $\mathcal{P}$.
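A minimal Python sketch of the unrestricted VAR estimation referred to here. It assumes the first-difference form $\Delta\mathcal{P}_t = K_0 + K_1\mathcal{P}_{t-1} + \Sigma_{\mathcal{P}}\epsilon_t$, so that $E_{t-1}[\mathcal{P}_t] = K_0 + (I + K_1)\mathcal{P}_{t-1}$, as in (20). The helper names `ols_var1` and `forecast` are hypothetical. Dividing the residual cross-product by $T$ (the Gaussian ML scaling) is also an assumption. The returned Cholesky factor is the kind of OLS-based starting value for $\Sigma_{\mathcal{P}}$ used later in estimation.

```python
import numpy as np


def ols_var1(P):
    """OLS estimates of dP_t = K0 + K1 @ P_{t-1} + eps_t.

    P : (T + 1, N) array of observed yield factors (e.g. principal components).
    Returns K0 (N,), K1 (N, N), and the lower-triangular Cholesky factor of
    the residual covariance (divided by T, the Gaussian ML scaling).
    """
    P = np.asarray(P, dtype=float)
    lagged, dP = P[:-1], np.diff(P, axis=0)
    # Every equation has the same regressors [1, P_{t-1}], so a single
    # least-squares call estimates all N equations at once.
    X = np.column_stack([np.ones(len(lagged)), lagged])
    coef, *_ = np.linalg.lstsq(X, dP, rcond=None)
    K0, K1 = coef[0], coef[1:].T
    resid = dP - X @ coef
    chol_sigma = np.linalg.cholesky(resid.T @ resid / len(resid))
    return K0, K1, chol_sigma


def forecast(P_last, K0, K1):
    """One-step conditional mean E_t[P_{t+1}] = K0 + (I + K1) @ P_t."""
    return K0 + P_last + K1 @ P_last
```

By the result stated above, when $\mathcal{P}$ is measured without error these OLS forecasts are also the ML forecasts from any canonical GDTSM. No term-structure parameters enter the calculation.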
With these results in place, we proceed to show that the conditional forecast $E^{\mathbb{P}}[\mathcal{P}_t \mid \mathcal{P}_{t-1}]$ from a no-arbitrage GDTSM remains identical to its counterpart from an unrestricted VAR even in the presence of a large class of over-identifying restrictions on the factor structure of $y$. In particular, regard-
5 Though one might conclude from reading the recent literature that enforcing no-arbitrage improves out-of-sample forecasts of bond yields, our theorems show that this is not the case. What underlies any documented forecast gains in these studies from using GDTSMs is the combined structure of no-arbitrage and the auxiliary restrictions they impose on the $\mathbb{P}$ distribution of $y$.
fast regardless of the number of risk factors or bond yields used in estimation,
or whether the pricing factors P are measured with error.6
The rapid convergence to global optima using our canonical GDTSM makes it feasible to explore rolling out-of-sample forecasts. For a variety of GDTSMs (with and without measurement error in yield factors, and with and without constraints on the dimensionality $L$ of risk premia), we compare out-of-sample forecasting performance with that of a benchmark unconstrained VAR and confirm our theoretical predictions in the data.
6 To put this computational advantage into perspective, one needs to read no further than Duffee and Stanton
(2007) and Duffee (2009), who highlight numerous computational challenges and multiple local optima associ-
ated with their likelihood functions. For example, Duffee reports that each optimization for his parametrization
of a three-factor model takes about two days. In contrast, for the GDTSM(3) models examined in this article,
convergence to the global optimum of the likelihood function was typically achieved in about ten seconds, even
though there are three times as many observations in our sample.
7 All of our results apply equally to a continuous-time Gaussian model. Also, we assume that the risk factors, and
hence the yield curve yt , are first-order Markov. See the supplement to this article (Joslin, Singleton, and Zhu
2010) and Joslin, Le, and Singleton (2010) for relaxations of this assumption.
8 Invariant transforms (Dai and Singleton 2000) involve rotating, scaling, and translating the state and parameter vectors so that the short rate and bond prices are unchanged (invariant), usually through a mapping $Y_t = A X_t + b$, where $A$ is an invertible matrix. The transformed parameters are outlined in Appendix B.
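The following is a worked illustration of such a transform. It is a sketch, not a transcription of Appendix B. It assumes the latent state follows $\Delta X_t = K_{0X} + K_{1X}X_{t-1} + \Sigma_X\epsilon_t$ (the first-difference form used for $\mathcal{P}_t$ elsewhere in the paper), with short rate $r_t = \rho_0 + \rho_1 \cdot X_t$. Substituting $X_t = A^{-1}(Y_t - b)$ gives

```latex
\begin{aligned}
\Delta Y_t &= A\,\Delta X_t
  = A K_{0X} + A K_{1X} A^{-1}(Y_{t-1} - b) + A\Sigma_X \epsilon_t,\\
K_{0Y} &= A K_{0X} - A K_{1X} A^{-1} b, \qquad
K_{1Y} = A K_{1X} A^{-1}, \qquad \Sigma_Y = A\Sigma_X,\\
r_t &= \rho_0 + \rho_1 \cdot A^{-1}(Y_t - b)
  \;\Longrightarrow\; \rho_{0Y} = \rho_0 - \rho_1' A^{-1} b, \qquad
  \rho_{1Y} = (A^{-1})'\rho_1 .
\end{aligned}
```

The same substitution applies to the $\mathbb{Q}$ drift $(K^{\mathbb{Q}}_{0X}, K^{\mathbb{Q}}_{1X})$. Because the short-rate process and its $\mathbb{Q}$ dynamics are unchanged, so are bond prices. If a lower-triangular volatility matrix is required, $A\Sigma_X$ can be replaced by the Cholesky factor of $A\Sigma_X\Sigma_X'A'$, which leaves the distribution unchanged.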
$$y_{t,m} = A_m(\Theta^{\mathbb{Q}}_X) + B_m(\Theta^{\mathbb{Q}}_X) \cdot X_t, \qquad (4)$$
with measurement error. To be consistent with the data, we must impose auxiliary structure on a GDTSM, beyond no-arbitrage, in the form of a parametric distributional assumption for the measurement errors. We let $\{P^{\theta_m}\}_{\theta_m \in \Theta_m}$ denote the family of measures that describe the conditional distribution of $y_t - y^o_t$.
9 Duffie and Kan (1996) and Cochrane and Piazzesi (2005) also propose to use an identification scheme where
the yields themselves are factors. Adrian and Moench (2008) explore a setting where the pricing factors are the
portfolios themselves; however, they do not impose the internal consistency condition to make the factors equal
to their no-arbitrage equivalents and instead focus on the measurement errors. Our formulation offers an analytic
parametrization and additionally makes transparent our subsequent results.
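Theorem 1. Suppose that Case P holds for given fixed portfolio weights $W$. Then, any canonical GDTSM is observationally equivalent to a unique GDTSM whose pricing factors $\mathcal{P}_t$ are the portfolios of yields $W y_t = W y^o_t$. Moreover, the $\mathbb{Q}$ distribution of $\mathcal{P}_t$ is uniquely determined by $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}})$, where $\lambda^{\mathbb{Q}}$ is ordered.10 That is,
is a canonical GDTSM, where $K^{\mathbb{Q}}_{0\mathcal{P}}$, $K^{\mathbb{Q}}_{1\mathcal{P}}$, $\rho_{0\mathcal{P}}$, and $\rho_{1\mathcal{P}}$ are explicit functions of $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}})$. Our canonical form is parametrized by $\Theta_{\mathcal{P}} = (\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}, \Sigma_{\mathcal{P}})$.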
10 We fix an arbitrary ordering on the complex numbers such that 0 is the smallest number.
$$G^W_{\mathcal{P}} = \{(K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1, \rho_0, \rho_1, K^{\mathbb{P}}_0, K^{\mathbb{P}}_1, \Sigma) : \text{the factors are portfolios with weights } W\}.$$
This first step is easily established: for any GDTSM with latent state $X_t$, $\mathcal{P}_t$ satisfies (5). Following Dai and Singleton (2000) (DS), we can, by applying the change of variables outlined in Appendix B, compute the dynamics (under
11 More formally, we think of the set of GDTSMs as a set of stochastic processes for the yield curve rather than as a set of parameters governing the stochastic process of the yield curve. To see the correspondence, we define on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ (with associated filtration $\{\mathcal{F}_t\}$) the processes $y : \Omega \times \mathbb{N} \to \mathbb{R}^{\mathbb{N}_+}$. Here, $y^m_t(\omega)$ is the $m$-period yield at time $t$ when the state is $\omega \in \Omega$. When our additional assumptions that $y$ is a Gaussian Markov process and that no-arbitrage holds are maintained (with risk premia at time $t$ depending only on $\mathcal{F}_t$), these processes take the form of (1–3) and (4) for some parameters. In this way, we define a surjective map from the set of GDTSM parameters $(K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1, \rho_0, \rho_1, K^{\mathbb{P}}_0, K^{\mathbb{P}}_1, \Sigma)$ to the set of GDTSM stochastic processes. With this association, two GDTSMs are observationally equivalent when the corresponding stochastic processes have the same finite-dimensional distributions.
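Here, we specify the Jordan form with each eigenvalue associated with a single Jordan block (that is, each eigenvalue has a geometric multiplicity of one). Thus, when the eigenvalues are all real, $K^{\mathbb{Q}}_{1X}$ takes the form $K^{\mathbb{Q}}_{1X} = J(\lambda^{\mathbb{Q}}) \equiv \mathrm{diag}(J^{\mathbb{Q}}_1, J^{\mathbb{Q}}_2, \ldots, J^{\mathbb{Q}}_{m^{\mathbb{Q}}})$, where each
$$J^{\mathbb{Q}}_i = \begin{pmatrix} \lambda^{\mathbb{Q}}_i & 1 & \cdots & 0 \\ 0 & \lambda^{\mathbb{Q}}_i & \ddots & \vdots \\ \vdots & & \ddots & 1 \\ 0 & \cdots & 0 & \lambda^{\mathbb{Q}}_i \end{pmatrix},$$
and where the blocks are in order of the eigenvalues. (See Appendix C for the real Jordan form when the eigenvalues are complex.) We refer to the set of Jordan canonical GDTSMs as $G^J$; it is parametrized by $\Theta^J = (\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, K^{\mathbb{P}}_{0X}, K^{\mathbb{P}}_{1X}, \Sigma_X)$. The elements of $\lambda^{\mathbb{Q}}$ need not be distinct and may be complex. We explore these possibilities empirically in Section 5.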
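For concreteness, here is a minimal Python sketch that assembles $J(\lambda^{\mathbb{Q}})$ from a list of (eigenvalue, block size) pairs. It covers only the real-eigenvalue case described above. The function name `jordan_matrix` is hypothetical, and the complex (real Jordan) case of Appendix C is not handled.

```python
import numpy as np
from scipy.linalg import block_diag


def jordan_matrix(blocks):
    """Block-diagonal Jordan matrix diag(J_1, ..., J_m) for real eigenvalues.

    blocks : sequence of (eigenvalue, size) pairs, already in the chosen order.
    Each J_i has the eigenvalue on its diagonal and ones on the superdiagonal,
    so each eigenvalue is tied to a single block (geometric multiplicity one).
    """
    mats = [lam * np.eye(size) + np.eye(size, k=1) for lam, size in blocks]
    return block_diag(*mats)
```

For example, `jordan_matrix([(-0.002, 1), (-0.05, 2)])` returns a 3×3 matrix whose repeated eigenvalue sits in one 2×2 block. This is the type of structure behind a repeated-eigenvalue specification; the numerical values are illustrative only.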
Proof of Theorem 1: Having already established that we can rotate any model to one with $\mathcal{P}_t$ as the observed states, we proceed to prove the second step. Suppose that $\Theta_1, \Theta_2 \in G^W_{\mathcal{P}}$ index two observationally equivalent canonical
where $X^{iJ}_t$ is the latent state for model $\Theta^J_i$, it must be that
Here, we use the notation that for a GDTSM with parameter vector $\Theta$ and state $X_t$, the observationally equivalent GDTSM with latent state $\hat{X}_t = C + D X_t$ has parameter vector $\hat{\Theta} = C + D\Theta$, as computed in Appendix B. Since observational equivalence is transitive, $\Theta^J_1$ is observationally equivalent to $\Theta^J_2$; the uniqueness result in Proposition 1 implies that $\Theta^J_1 = \Theta^J_2$. The equality in (12) then gives $\Theta_1 = \Theta_2$, which establishes our second step.
To establish the reparametrization in the third step, we focus on (11) and (12). The key is to show explicitly how, given $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty)$ (from $\Theta^J_i$), we can (i) choose the parameters $(K^{\mathbb{P}}_{0J}, K^{\mathbb{P}}_{1J}, \Sigma_J)$ to obtain any desired $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}, \Sigma_{\mathcal{P}})$; and (ii) construct the $(K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1, \rho_0, \rho_1)$ consistent with the factors being $\mathcal{P}_t$. Details are provided in Appendix D.
12 One implication of this observation is that setting both $k^{\mathbb{Q}}_\infty$ and $r^{\mathbb{Q}}_\infty$ to zero in the presence of a $\mathbb{Q}$-nonstationary risk factor, as was done by Christensen, Diebold, and Rudebusch (2007, 2009) in defining their arbitrage-free Nelson-Siegel model, amounts to imposing an over-identifying restriction on the drift of $X_{1t}$.
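$$f(y^o_t \mid y^o_{t-1}; \Theta) = f(y^o_t \mid \mathcal{P}_t; \lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}}, P^{\theta_m}) \times f(\mathcal{P}_t \mid \mathcal{P}_{t-1}; K^{\mathbb{P}}_{1\mathcal{P}}, K^{\mathbb{P}}_{0\mathcal{P}}, \Sigma_{\mathcal{P}}). \qquad (18)$$
Notice the convenient separation of parameters in the likelihood function. The conditional distribution of the yields measured with errors depends only on $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}}, P^{\theta_m})$ and not on $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$. In contrast, the conditional $\mathbb{P}$-density of the pricing factors $\mathcal{P}_t$ depends only on $(K^{\mathbb{P}}_{1\mathcal{P}}, K^{\mathbb{P}}_{0\mathcal{P}}, \Sigma_{\mathcal{P}})$, and not on $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty)$. Using the assumption that $\mathcal{P}_t$ is conditionally Gaussian, the second term in (18) can be expressed as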
13 Implicit in this formulation is the possibility that $\mathrm{Cov}(y^o_t \mid \mathcal{P}_t; \lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}})$ is singular. This would be true in
Case Y, where some yields are measured without errors, or when certain portfolios of yto are priced perfectly,
as with the use of principal components as observable factors or as in Chen and Scott (1995), who use different
portfolios of yields as their factors. This setup also accommodates the case where both P and some of the
individual components of yto are priced perfectly by the GDTSM. Furthermore, the errors may be correlated,
non-normal, or have time-varying conditional moments depending on Pt . In practice, it has typically been
assumed that the pricing errors are normally distributed.
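where $E_{t-1}[\mathcal{P}_t] = K^{\mathbb{P}}_{0\mathcal{P}} + (I + K^{\mathbb{P}}_{1\mathcal{P}})\mathcal{P}_{t-1}$ and where, for a vector $x$, $\|x\|^2 = \sum_i x_i^2$ denotes the squared Euclidean norm. The parameters $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ that maximize the likelihood function $f$ (conditional on $t = 0$ information), namely
$$\begin{aligned}(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}) &= \arg\max \sum_{t=1}^{T} \log f(y^o_t \mid y^o_{t-1}; K^{\mathbb{P}}_{1\mathcal{P}}, K^{\mathbb{P}}_{0\mathcal{P}}, \Sigma_{\mathcal{P}}) \\ &= \arg\min \sum_{t=1}^{T} \big\| \Sigma_{\mathcal{P}}^{-1}\big(\mathcal{P}^o_t - E_{t-1}[\mathcal{P}^o_t]\big) \big\|^2, \end{aligned} \qquad (20)$$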
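Why OLS attains the minimum in (20) for every $\Sigma_{\mathcal{P}}$ is the familiar seemingly-unrelated-regressions result. When every equation has the same regressors $[1, \mathcal{P}_{t-1}]$, generalized least squares with weight $(\Sigma_{\mathcal{P}}\Sigma_{\mathcal{P}}')^{-1}$ collapses to equation-by-equation OLS. The Python check below uses simulated data, and the helper name `weighted_ssr` is hypothetical. It evaluates the criterion in (20) for two very different $\Sigma_{\mathcal{P}}$ and confirms that random perturbations of the OLS coefficients never lower it.

```python
import numpy as np


def weighted_ssr(coef, X, dP, chol_sigma):
    """Criterion in (20): sum_t || Sigma_P^{-1} (dP_t - coef' x_t) ||^2."""
    resid = dP - X @ coef
    # Solve Sigma_P z_t = resid_t for all t instead of inverting Sigma_P.
    z = np.linalg.solve(chol_sigma, resid.T)
    return float(np.sum(z ** 2))


# Simulated factor data (illustrative only).
rng = np.random.default_rng(1)
T, N = 400, 3
P = 0.05 * rng.standard_normal((T + 1, N)).cumsum(axis=0)
X = np.column_stack([np.ones(T), P[:-1]])   # common regressors [1, P_{t-1}]
dP = np.diff(P, axis=0)
coef_ols, *_ = np.linalg.lstsq(X, dP, rcond=None)

# Two very different volatility matrices; OLS minimizes (20) for both.
sigmas = [np.eye(N),
          np.array([[1.0, 0.0, 0.0], [0.9, 0.3, 0.0], [-0.5, 0.4, 0.2]])]
for chol_sigma in sigmas:
    best = weighted_ssr(coef_ols, X, dP, chol_sigma)
    for _ in range(200):
        trial = coef_ols + 0.01 * rng.standard_normal(coef_ols.shape)
        assert weighted_ssr(trial, X, dP, chol_sigma) >= best
```

The cross term between the OLS residuals and any coefficient perturbation vanishes because the residuals are orthogonal to the shared regressors. The criterion therefore splits into its value at OLS plus a nonnegative quadratic form, whatever $\Sigma_{\mathcal{P}}$ is.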
Absent constraints linking the $\mathbb{P}$ and $\mathbb{Q}$ dynamics, one can effectively separate the time-series properties ($\mathbb{P}$) of $\mathcal{P}_t$ from the cross-sectional constraints imposed by no-arbitrage ($\mathbb{Q}$). The parameters governing the $\mathbb{P}$ distribution used for forecasting can thus be estimated from the time series alone, regardless of the cross-sectional restrictions. Furthermore, independent of $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}})$, the OLS estimates of $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ are by construction globally optimal. With $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ in hand, we use the sample conditional variance of $\mathcal{P}_t$, $\hat{\Sigma}_{\mathcal{P}}\hat{\Sigma}_{\mathcal{P}}'$, computed from the OLS innovations, as the starting value for the population variance $\Sigma_{\mathcal{P}}\Sigma_{\mathcal{P}}'$. Given $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty)$, this starting value for $\Sigma_{\mathcal{P}}\Sigma_{\mathcal{P}}'$ is again, by construction, close to the global optimum. We have therefore greatly reduced the number of parameters that require numerical optimization. For instance, in a GDTSM(3) model, the maximum number of parameters, excluding those governing $P^{\theta_m}$, is 22 (3 for $\lambda^{\mathbb{Q}}$, 1 for $k^{\mathbb{Q}}_\infty$, 6 for $\Sigma_{\mathcal{P}}$, 3 for $K^{\mathbb{P}}_{0\mathcal{P}}$, and 9 for $K^{\mathbb{P}}_{1\mathcal{P}}$). With our normalization, one can focus on only the 4 parameters $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty)$. This underlies the substantial improvement in estimation speed for the JSZ normalization over other canonical forms.
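The parameter arithmetic above generalizes to $N$ factors: $N + 1 + N(N+1)/2 + N + N^2$ parameters in total. Once OLS supplies $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ and a starting value for $\Sigma_{\mathcal{P}}$, the search can focus on the $N + 1$ elements of $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty)$. A trivial Python helper (hypothetical name) makes the bookkeeping explicit:

```python
def gdtsm_param_counts(N):
    """Parameter counts for an N-factor GDTSM in the JSZ normalization,
    excluding the measurement-error parameters theta_m."""
    counts = {
        "lambda_Q": N,                  # risk-neutral eigenvalues
        "k_inf_Q": 1,                   # long-run Q parameter
        "Sigma_P": N * (N + 1) // 2,    # lower-triangular volatility matrix
        "K0P_P": N,                     # P intercepts
        "K1P_P": N * N,                 # P feedback matrix
    }
    total = sum(counts.values())
    # OLS delivers K0P_P and K1P_P, and Sigma_P starts at the OLS residual
    # covariance, so the numerical search concentrates on (lambda_Q, k_inf_Q).
    searched = counts["lambda_Q"] + counts["k_inf_Q"]
    return counts, total, searched
```

`gdtsm_param_counts(3)` reproduces the 22 and 4 quoted above. For $N = 4$ the totals are 35 and 5.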
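Key to our argument is the fact that we can parametrize the conditional distribution of the yields measured with error independently of the parameters governing the $\mathbb{P}$-conditional mean of $\mathcal{P}$, in the sense of the factorization (18). For any $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}, \Sigma_{\mathcal{P}}, \lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty)$, we have
$$\begin{aligned} &\prod_{t=1}^{T} f(y^o_t \mid \mathcal{P}_t; \lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}}) \times f(\mathcal{P}_t \mid \mathcal{P}_{t-1}; K^{\mathbb{P}}_{1\mathcal{P}}, K^{\mathbb{P}}_{0\mathcal{P}}, \Sigma_{\mathcal{P}}) \\ &\quad\le \prod_{t=1}^{T} f(y^o_t \mid \mathcal{P}_t; \lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}}) \times f(\mathcal{P}_t \mid \mathcal{P}_{t-1}; K^{\mathbb{P},OLS}_{1\mathcal{P}}, K^{\mathbb{P},OLS}_{0\mathcal{P}}, \Sigma_{\mathcal{P}}), \end{aligned} \qquad (21)$$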
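$$f(y^o_t \mid y^o_{t-1}; \Theta) = f(y^o_t \mid \mathcal{P}_t; K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}, \Sigma_{\mathcal{P}}, \rho_0, \rho_1, \Lambda_0, \Lambda_1) \times f(\mathcal{P}_t \mid \mathcal{P}_{t-1}; K^{\mathbb{P}}_{1\mathcal{P}}, K^{\mathbb{P}}_{0\mathcal{P}}, \Sigma_{\mathcal{P}}). \qquad (22)$$
Replacing $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ with $(K^{\mathbb{P},OLS}_{0\mathcal{P}}, K^{\mathbb{P},OLS}_{1\mathcal{P}})$ again increases the second term, but now the first term is affected as well. Thus, within this parameterization, the fact that OLS recovers the ML estimates is completely obscured.15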
14 The last step requires observable factors, another important element of our argument. See Section 3 and (23).
15 In fact, within a macro-GDTSM with a similar parametrization of internally consistent market prices of risk
and observable factors, Ang, Piazzesi, and Wei (2003) report that OLS estimates of E P [Pt+1 |Pt ] are (slightly)
different from their ML estimates. Our analysis generalizes to macro-GDTSMs (see Joslin, Le, and Singleton
2010) and so, in fact, the OLS estimates are the (conditional) ML estimates.
16 Note that, in principle, enforcing no-arbitrage restrictions may be relevant for the construction of forecast confidence intervals through the dependence on $\Sigma_{\mathcal{P}}$. However, empirically this effect is likely to be small.
17 Duffee (2009) also shows theoretically that no-arbitrage is cross-sectionally irrelevant in any affine model under the stochastically singular condition of no measurement errors. That is, if the model exactly fits the data without measurement errors, the cross-sectional loadings $(A, B)$ of (4) are determined without reference to solving the Riccati difference equations (A2–A3). Duffee does not theoretically explore the time-series implications of the no-measurement-error assumption. In this case, not only would Proposition 3 apply (since Case P is a weaker assumption), so that the OLS estimates are the ML estimates of $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$, but $\Sigma_{\mathcal{P}}$ could also be inferred from a sufficiently large cross-section of bond prices.
Theorem 2. Given the state-space model (24–25) and the portfolio matrix $W$ determining the factors $\mathcal{P}_t$, let $H$ be a subset of the admissible set of $\eta$ where, for any $(C, D, \Sigma_X, P^{\theta_m}) \in H$, the $N \times N$ upper-left block of $D$ is full rank. Consider the ML problem with $\eta$ constrained to lie in the subspace $H$:
$$(K^H_{0X}, K^H_{1X}, \eta^H) \in \arg\max_{K_{0X}, K_{1X};\, \eta \in H} f(\mathcal{P}_T, y_T, \ldots, \mathcal{P}_1, y_1 \mid \mathcal{P}_0, y_0).$$
$$\Delta\mathcal{P}_t = K_{0\mathcal{P}} + K_{1\mathcal{P}}\mathcal{P}_{t-1} + \epsilon^{\mathcal{P}}_t.$$
The proof is similar to, though notationally more abstract than, the proof of Proposition 3 and is presented in Appendix E.
Using this result, we first illustrate the estimation of the general state-space
model of (24–25) when the possibility of arbitrage is not precluded. We next
explore the implications of restrictions on the Q and P distributions, as well as
on risk premia, for the conditional distribution of Pt .
and output growth. We explore the relevance for forecasting bond yields of
imposing the constraint L within GDTSMs that condition risk premiums on
the pricing factors P. When this constraint is (approximately) valid, improved
forecasts of yt may arise from the associated reduction in the dimensionality
of the parameter space.
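To interpret this constraint, note from Cox and Huang (1989) and Joslin, Priebsch, and Singleton (2010) that one-period expected excess returns on portfolios of bonds with payoffs that track the pricing factors $\mathcal{P}_t$, say $xr^{\mathcal{P}}_t$, are given by the components of
$$(K^{\mathbb{P}}_{0\mathcal{P}} - K^{\mathbb{Q}}_{0\mathcal{P}}) + (K^{\mathbb{P}}_{1\mathcal{P}} - K^{\mathbb{Q}}_{1\mathcal{P}})\mathcal{P}_t.$$
That is, the $i$th component of $(K^{\mathbb{P}}_{1\mathcal{P}} - K^{\mathbb{Q}}_{1\mathcal{P}})\mathcal{P}_t$ is the source of the risk premium for pure exposures to the $i$th component of $\mathcal{P}_t$. Therefore, the constraint that the one-period expected excess returns on bond portfolios are driven by $L$ linear combinations of the pricing factors $\mathcal{P}$ amounts to the constraint that the rank of $A_{RRP} = K^{\mathbb{P}}_{1\mathcal{P}} - K^{\mathbb{Q}}_{1\mathcal{P}}$ is $L$.20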
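A minimal Python sketch of how the rank $L$ of $A_{RRP}$ shapes expected excess returns. It uses illustrative, made-up values for $K^{\mathbb{P}}_{1\mathcal{P}}$ and $K^{\mathbb{Q}}_{1\mathcal{P}}$, and the function name is hypothetical. With a rank-one $A_{RRP} = uv'$ and equal intercepts, every vector $xr^{\mathcal{P}}_t$ is a multiple of $u$, driven by the single combination $v'\mathcal{P}_t$.

```python
import numpy as np


def expected_excess_returns(P_t, K0_P, K1_P, K0_Q, K1_Q):
    """xr_t = (K0_P - K0_Q) + (K1_P - K1_Q) @ P_t for the factor-tracking portfolios."""
    return (K0_P - K0_Q) + (K1_P - K1_Q) @ P_t


# Illustrative (made-up) parameters with a rank-one A_RRP = u v'.
u = np.array([0.5, -0.2, 0.1])
v = np.array([0.3, 1.0, -0.4])
K1_Q = np.diag([-0.002, -0.05, -0.08])
K1_P = K1_Q + np.outer(u, v)
K0_P = np.array([0.001, 0.0, -0.001])
K0_Q = K0_P.copy()   # equal intercepts isolate the time-varying part

A_rrp = K1_P - K1_Q
L = np.linalg.matrix_rank(A_rrp)   # number of combinations driving xr_t
```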
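The reduced-rank risk premium GDTSMs can be estimated through a concentration of the likelihood in the same spirit as (18). Given $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}}, P^{\theta_m})$, the ML estimates of $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ can be computed as follows. First, compute $(\alpha, \beta)$ from the regression
$$\Delta\mathcal{P}_{t+1} - (K^{\mathbb{Q}}_{0\mathcal{P}} + K^{\mathbb{Q}}_{1\mathcal{P}}\mathcal{P}_t) = \alpha + \beta\mathcal{P}_t + \epsilon^{\mathcal{P}}_{t+1}, \qquad (29)$$
where we fix the volatility matrix $\Sigma_{\mathcal{P}}$ of the errors $\epsilon^{\mathcal{P}}_{t+1}$ and impose the constraint that $\beta$ has rank $L$. We show in Appendix F how one can compute the ML estimates of this constrained regression in closed form. For a given $(\lambda^{\mathbb{Q}}, k^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}}, P^{\theta_m})$, the ML estimates of the $\mathbb{P}$ parameters are then given by
$$K^{\mathbb{P}}_{0\mathcal{P}} = K^{\mathbb{Q}}_{0\mathcal{P}} + \hat{\alpha}, \qquad K^{\mathbb{P}}_{1\mathcal{P}} = K^{\mathbb{Q}}_{1\mathcal{P}} + \hat{\beta}. \qquad (30)$$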
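The closed form in Appendix F is not reproduced here. As a hedged stand-in, the Python sketch below applies the textbook reduced-rank regression estimator to (29). With $\Sigma_{\mathcal{P}}$ held fixed, it whitens the left-hand side by $\Sigma_{\mathcal{P}}$ and computes the unrestricted OLS fit. It then projects the slope onto the $L$ leading singular directions of the fitted values (an Eckart-Young truncation). The function name and argument layout are assumptions: `Y` stands for the left-hand side of (29) and `X` for $\mathcal{P}_t$.

```python
import numpy as np


def reduced_rank_ml(Y, X, chol_sigma, L):
    """ML estimates of alpha, beta in Y_t = alpha + beta @ X_t + eps_t,
    eps_t ~ N(0, chol_sigma @ chol_sigma.T) held fixed, with rank(beta) <= L.

    Y : (T, N) left-hand side of (29); X : (T, N) regressors P_t.
    """
    # Whitening with the fixed covariance turns the weighted problem into a
    # plain Frobenius-norm problem; it does not change the rank of beta.
    Yw = np.linalg.solve(chol_sigma, Y.T).T
    Ybar, Xbar = Yw.mean(axis=0), X.mean(axis=0)
    Yc, Xc = Yw - Ybar, X - Xbar
    B_ols = np.linalg.lstsq(Xc, Yc, rcond=None)[0].T   # unrestricted slope
    fitted = Xc @ B_ols.T
    # Keep the L leading right singular vectors of the fitted values.
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_L = Vt[:L].T
    B_rr = V_L @ V_L.T @ B_ols
    beta = chol_sigma @ B_rr                 # undo the whitening
    alpha = chol_sigma @ (Ybar - B_rr @ Xbar)
    return alpha, beta
```

Adding back $(K^{\mathbb{Q}}_{0\mathcal{P}}, K^{\mathbb{Q}}_{1\mathcal{P}})$ as in (30) then gives the $\mathbb{P}$ parameters. With $L = N$ the projection is the identity and the estimator reduces to OLS.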
20 Alternatively, we could restrict the rank of $[K^{\mathbb{P}}_{0\mathcal{P}} - K^{\mathbb{Q}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}} - K^{\mathbb{Q}}_{1\mathcal{P}}]$ to $L$. This would enforce the stronger restriction that only $L$ linear combinations of the factors have non-zero expected excess returns.
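$$(K^{c*}_0(\Sigma_{\mathcal{P}}), K^{c*}_1(\Sigma_{\mathcal{P}})) = \arg\max_{K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}} \sum_{t=1}^{T} \log f(\mathcal{P}^o_t \mid \mathcal{P}^o_{t-1}; K^{\mathbb{P}}_{1\mathcal{P}}, K^{\mathbb{P}}_{0\mathcal{P}}, \Sigma_{\mathcal{P}}), \qquad (31)$$
where the arg max is taken over $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ satisfying the appropriate restriction on the $\mathbb{P}$ dynamics. In the presence of such restrictions, there is a non-degenerate dependence of $(K^{c*}_0, K^{c*}_1)$ on $\Sigma_{\mathcal{P}}$. This dependence means that no-arbitrage (which links $\Sigma_{\mathcal{P}}$ across $\mathbb{P}$ and $\mathbb{Q}$) affects the ML estimates of $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$.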
We explore the empirical implications of two types of restrictions on the P
distribution of yields in Section 5: (1) a model with K 1PP constrained to be
diagonal; and (2) a model in which the Pt are cointegrated (with one unit root
and no trend).
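Restriction (1) shows concretely why such restrictions make the estimates depend on $\Sigma_{\mathcal{P}}$. With a diagonal $K^{\mathbb{P}}_{1\mathcal{P}}$, equation $i$ has regressors $[1, \mathcal{P}_{i,t-1}]$. The regressors therefore differ across equations, and GLS no longer collapses to OLS. The Python sketch below computes the ML (GLS) estimates for a fixed $\Sigma_{\mathcal{P}}$; the function name is hypothetical and the usage check runs on simulated data.

```python
import numpy as np


def gls_diagonal_var(P, chol_sigma):
    """ML (GLS) estimates of dP_t = K0 + diag(k) @ P_{t-1} + eps_t,
    eps_t ~ N(0, chol_sigma @ chol_sigma.T) with the covariance held fixed.

    Returns the intercepts K0 (N,) and the diagonal feedback vector k (N,).
    """
    P = np.asarray(P, dtype=float)
    lagged, dP = P[:-1], np.diff(P, axis=0)
    N = P.shape[1]
    omega_inv = np.linalg.inv(chol_sigma @ chol_sigma.T)
    A = np.zeros((2 * N, 2 * N))
    b = np.zeros(2 * N)
    for x, y in zip(lagged, dP):
        # dP_t = Z_t @ theta + eps_t with theta = [K0; k]; equation i uses
        # only its own lagged factor, so Z_t differs across equations.
        Z = np.hstack([np.eye(N), np.diag(x)])
        A += Z.T @ omega_inv @ Z
        b += Z.T @ omega_inv @ y
    theta = np.linalg.solve(A, b)
    return theta[:N], theta[N:]
```

With $\Sigma_{\mathcal{P}} = I$ the estimator reproduces equation-by-equation OLS. With correlated shocks the estimates move, which is exactly the channel through which no-arbitrage can affect forecasts once the $\mathbb{P}$ dynamics are restricted.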
21 See Campbell and Shiller (1991) (among others) for empirical evidence on cointegration among bond yields.
Diebold and Li (2006) adopt an assumption very similar to the second example.
and Jones (2008) (CGJ). All three of these canonical models—DS, Joslin, and
CGJ—are observationally equivalent.22
In the constant volatility subcase of the CGJ setup, the state vector $X_t$ is completely defined by $r_t$ and its first $N - 1$ moments under $\mathbb{Q}$:
where
$$\mu_{1t} = \frac{1}{dt}E^{\mathbb{Q}}(dr_t), \qquad \mu_{k+1,t} = \frac{1}{dt}E^{\mathbb{Q}}(d\mu_{kt}), \quad k = 1, \ldots, N - 2. \qquad (33)$$
where $\Sigma_X$ is lower triangular, $K^{\mathbb{Q}}_{0,CGJ} = (0, 0, \ldots, 0, \gamma)'$, and $Z_t$ is a standard Brownian motion. By construction, the matrix $K^{\mathbb{Q}}_{1,CGJ}$ is the companion matrix factorization of the feedback matrix $K^{\mathbb{Q}}_{1X}$ in (9).
The sense in which $X_t$ is observable in the CGJ normalization is quite different from that in the JSZ normalization, and these differences may have practical relevance. First, it will not always be convenient to assume that the one-period short rate $r_t$ is observable. Duffee (1996) highlights various liquidity and "money-market" effects that might distort yields on short-term bonds relative to what is implied by a GDTSM. The true short rate (the one that implicitly underlies the pricing of long-term bonds) will not literally be observable absent an explicit model of these money-market effects. Second, actions by monetary authorities might necessitate the inclusion of additional risk factors, or jumps in these factors, when explicitly including short rates in the analysis of a DTSM (Piazzesi 2005). Within the JSZ normalization, one is free to define the portfolio matrix $W$ so as to focus on segments of the yield curve away from the very short end, while preserving fully observable $\mathcal{P}$.
22 Different choices of normalizations, associated with different, unique matrix factorizations of the feedback matrix $K^{\mathbb{Q}}_{1X}$, give rise to observationally equivalent models, though models with different structure to their parameter sets. The JSZ normalization is based on the real Jordan factorization used in Proposition 1. CGJ adopt the companion factorization. For any monic polynomial $p(x) = x^n - \mu_{n-1}x^{n-1} - \cdots - \mu_1 x - \mu_0$, the companion matrix is
$$C(p) = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \mu_0 & \mu_1 & \mu_2 & \cdots & \mu_{n-1} \end{pmatrix}.$$
Given any matrix $K$, its monic characteristic polynomial is unique, and, when each eigenvalue of $K$ has geometric multiplicity one (as assumed here), $K$ is similar to its companion matrix $C(p(K))$.
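A short Python sketch of the companion factorization follows; the function name is hypothetical. It builds $C(p)$ with the layout above from the characteristic polynomial of a feedback matrix $K$. It then checks that $C(p(K))$ and $K$ share that polynomial, and hence their eigenvalues.

```python
import numpy as np


def companion(mu):
    """Companion matrix of p(x) = x^n - mu_{n-1} x^{n-1} - ... - mu_1 x - mu_0:
    ones on the superdiagonal and (mu_0, ..., mu_{n-1}) in the last row."""
    mu = np.asarray(mu, dtype=float)
    C = np.eye(mu.size, k=1)
    C[-1, :] = mu
    return C


K = np.array([[-0.1, 0.2, 0.0], [0.0, -0.3, 0.1], [0.05, 0.0, -0.6]])
coeffs = np.poly(K)            # [1, c_1, ..., c_n] of det(xI - K)
mu = -coeffs[1:][::-1]         # map to (mu_0, ..., mu_{n-1}) in the sign convention above
C = companion(mu)
```

`np.poly` applied to a square array returns the coefficients of its characteristic polynomial. The sign flip and reversal translate them into the $\mu$ convention of this footnote.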
More subtly, the construction of the state vector in the CGJ normalization requires the parameters of the $\mathbb{Q}$ distribution. Therefore, any change in the implementation of a GDTSM that changes the implied $\mathbb{Q}$ parameters will necessarily change the observed pricing factors under the CGJ normalization. Fitting the same model to two overlapping sample periods could, for example, give rise to different values of the observed state variables during the overlapping period. In contrast, under the JSZ normalization, we are led to identical values of $\mathcal{P}$ for all overlapping sample periods.
Full separation of the P and Q sides of the unrestricted model appears to be
5. Empirical Results
We estimate the three-factor GDTSMs summarized in Table 1 by ML using the JSZ canonical form and the methods outlined in Section 3 (footnote 23). As all of our estimated models are stationary under $\mathbb{Q}$, we report our results in terms of $r^{\mathbb{Q}}_\infty$ instead of $k^{\mathbb{Q}}_\infty$. The data are end-of-month Constant Maturity Treasury (CMT) yields from the Federal Reserve's H.15 release over the period from January 1990 to December 2007 (216 observations). The maturities considered are 6 months and 1, 2, 3, 5, 7, and 10 years. From these coupon yields we bootstrap a zero-coupon curve, assuming constant forward rates between maturities. Within Case P, we consider several subcases. With distinct real eigenvalues, we assume either that the first three principal components (PCs) are measured without error (RPC) or that the 0.5-, 2-, and 10-year zero-coupon yields are measured without error (RY). Additionally, we estimate models that price the first three PCs of
Table 1
Summary of Model Specifications
23 $\bar{\lambda}^{\mathbb{Q}}_i$ denotes the complex conjugate of the $i$th element of $\lambda^{\mathbb{Q}}$. Also, we defer discussion of case RKF, in which all yields are measured with error and Kalman filtering is applied, until Section 6.
the zero curve exactly under the constraints of repeated eigenvalues (JPC) and complex eigenvalues (CPC). Model JPC imposes the eigenvalue constraint of the AFNS model examined by Christensen, Diebold, and Rudebusch (2009). Finally, a subscript of "1" indicates the case of reduced-rank risk premiums ($L = 1$), with the one-period expected excess returns being perfectly correlated across bonds. In all cases, except as noted, the component of measurement errors orthogonal to $W$ is assumed to be normally distributed.24 Although we derive portfolios from the principal components, one could also use portfolio loadings from various parametric splines for yields such as Nelson-Siegel
Case C: N coupon bonds are priced exactly, and J − N coupon bonds are
measured with normally distributed errors in the GDTSM.
In implementing Case C with coupon-bond data, one can still select $N$ portfolios of zero-coupon yields and construct the rotation in which these portfolios comprise the state vector. Even though such yields may not be observed, this rotation is still valuable because the portfolios of model-implied zero yields $\mathcal{P}_t$ can be approximated from the observed data. For example, one could bootstrap or spline an approximate zero-coupon yield curve from the observed coupon bond prices and from it construct an approximation of $\mathcal{P}_t$, call it $\mathcal{P}^a_t$. Importantly, the projection of $\mathcal{P}^a_t$ onto its own lag will recover reliable starting values for $K^{\mathbb{P}}_{0\mathcal{P}}$ and $K^{\mathbb{P}}_{1\mathcal{P}}$. However, because coupon bond yields are nonlinear functions of $\mathcal{P}$, the irrelevance propositions discussed in Section 3 do not apply to Case C.
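The bootstrap step mentioned above can be sketched in Python. The same step is used to build the zero curve from CMT yields, with forward rates held constant between maturities. Several conventions are assumptions not taken from the text: par yields with semiannual coupons, maturities that are multiples of the coupon period, continuously compounded zero yields, day-count details ignored, and a root-finding bracket of -50% to 100% for each forward rate. The names `discount` and `bootstrap_par_curve` are hypothetical.

```python
import math

from scipy.optimize import brentq


def discount(t, segments):
    """exp(-integral of the piecewise-constant forward rate from 0 to t)."""
    integral = 0.0
    for start, end, fwd in segments:
        if t <= start:
            break
        integral += fwd * (min(t, end) - start)
    return math.exp(-integral)


def bootstrap_par_curve(maturities, par_yields, freq=2):
    """Bootstrap zero yields from par yields, holding the forward rate
    constant between consecutive maturities."""
    segments, prev = [], 0.0
    for m, c in zip(maturities, par_yields):
        pay_times = [j / freq for j in range(1, int(round(m * freq)) + 1)]

        def pricing_error(fwd):
            trial = segments + [(prev, m, fwd)]
            coupons = sum(c / freq * discount(t, trial) for t in pay_times)
            return coupons + discount(m, trial) - 1.0   # a par bond prices at 1

        segments.append((prev, m, brentq(pricing_error, -0.5, 1.0)))
        prev = m
    zeros = [-math.log(discount(m, segments)) / m for m in maturities]
    return segments, zeros
```

A flat 5% semiannual par curve should map to a flat continuously compounded zero curve at $2\ln(1.025)$, which makes a convenient sanity check.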
In our empirical implementation, we consider the case of the 0.5-, 2-, and 10-year CMT yields measured without error, and the 1-, 3-, 5-, and 7-year par coupon yields measured with errors (RCMT). Throughout, we report asymptotic standard errors for the maximum likelihood estimates, computed using the outer product of the first derivatives (scores) of the log-likelihood function to estimate the information matrix (see Berndt et al. 1974).
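The outer-product-of-gradients (BHHH) estimate of the information matrix can be sketched as follows. The sketch assumes the per-observation score vectors have already been evaluated at the ML estimate, analytically or by finite differences; the function name is hypothetical.

```python
import numpy as np


def opg_standard_errors(scores):
    """BHHH standard errors from per-observation scores.

    scores : (T, K) array; row t is the gradient of the period-t
             log-likelihood contribution at the ML estimate.
    """
    scores = np.asarray(scores, dtype=float)
    info = scores.T @ scores          # sum_t s_t s_t'
    return np.sqrt(np.diag(np.linalg.inv(info)))
```

As a check, for a sample from $N(\mu, 1)$ the score of observation $t$ at $\hat{\mu}$ is $y_t - \hat{\mu}$. The OPG standard error is then $1/\sqrt{\sum_t (y_t - \bar{y})^2}$.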
24 In Case Y, this assumption amounts to yield measurement errors being distributed i.i.d. $N(0, \sigma_p^2)$. When $W$ comes from the principal components, the assumption is equivalent to the higher-order PCs ($n > N$) being distributed $N(0, \sigma_p^2)$. In both of these cases, we can concentrate $\sigma_p$ out of the likelihood (conditional on $t = 1$ information) through $\hat{\sigma}_p^2 = \sum_{t=2}^{T}\sum_m (y^o_{t,m} - y_{t,m})^2 / ((T - 1) \times (J - N))$, where $y_{t,m}$ are the model yields that depend on all the other parameters. To be more precise about the error assumption, let $W_\perp \in \mathbb{R}^{(J-N)\times J}$ be a basis for the orthogonal complement of the row span of $W$. Then, since $W$ has orthonormal rows, we can express $y^o_t$ in terms of its projection onto $W$ and the orthogonal complement to $W$ as $y^o_t = W'Wy^o_t + W_\perp'W_\perp y^o_t = W'\mathcal{P}_t + W_\perp'W_\perp y^o_t$. We assume $y^o_t - y_t \mid \mathcal{P}_t$ has the degenerate distribution $N(W'\mathcal{P}_t, \sigma_p^2 W_\perp'W_\perp)$ (which is rotation invariant in the sense that the likelihood is the same for alternative choices of basis for the orthogonal complement to $W$). Equivalently, the projection of $y^o_t$ onto $W_\perp$, expressed in the coordinates $W_\perp$, is i.i.d. normal: $W_\perp y^o_t \sim N(0, \sigma_p^2 I_{J-N})$. This distribution satisfies $P(W y^o_t = \mathcal{P}_t \mid \mathcal{P}_t) = 1$.
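A minimal Python sketch of the objects in this footnote: $W$ from the first $N$ principal components of a yield panel, its orthogonal complement $W_\perp$, and the concentrated $\hat{\sigma}_p^2$. Computing the PCs from the sample covariance matrix is an assumption, since `np.cov` demeans the data and the text does not specify this. The function names are hypothetical.

```python
import numpy as np


def pc_weights(yields, N):
    """Split eigenvectors of the yield covariance into W (first N PCs)
    and W_perp (remaining J - N), both with orthonormal rows."""
    cov = np.cov(np.asarray(yields, dtype=float), rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)            # ascending eigenvalues
    loadings = eigvec[:, np.argsort(eigval)[::-1]].T
    return loadings[:N], loadings[N:]


def sigma_p_hat(y_obs, y_model, N):
    """Concentrated sigma_p^2: squared errors summed over dates t = 2..T and
    all J maturities, divided by (T - 1)(J - N) free error dimensions."""
    err = np.asarray(y_obs, dtype=float)[1:] - np.asarray(y_model, dtype=float)[1:]
    n_dates, J = err.shape
    return float(np.sum(err ** 2) / (n_dates * (J - N)))
```

The divisor uses $J - N$ rather than $J$ because, when $\mathcal{P}$ is priced exactly, only $J - N$ error dimensions per date are free.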
Table 2
ML estimates of the risk-neutral parameters of the model-implied principal components
Model   Parameter estimates
        λ^Q_1        λ^Q_2       λ^Q_3 / Im(λ^Q_2)   r^Q_∞
RPC     −0.0024      −0.0481     −0.0713             8.61
        (0.000566)   (0.0083)    (0.0133)            (0.73)
RY      −0.00196     −0.0404     −0.0897             9.37
        (0.000378)   (0.00274)   (0.0073)            (0.789)
RKF     −0.00245     −0.0472     −0.0739             8.45
        (0.000567)   (0.00724)   (0.0125)            (0.678)
RCMT    −0.00178     −0.0372     −0.103              11.2
$r^{\mathbb{Q}}_\infty$ is normalized to percent per annum (by multiplying by 12 × 100). Asymptotic standard errors are given in parentheses.
25 That is, under Case Y or when the CMT yields are priced perfectly by the GDTSM, after estimation, we impose
the JSZ normalization based on the PCs of zero yields as the state variables.
Table 3
ML estimates of the physical parameters of the model-implied principal components
Model   Parameter estimates
        K^P_1,11  K^P_1,12  K^P_1,13  K^P_1,21  K^P_1,22  K^P_1,23  K^P_1,31  K^P_1,32  K^P_1,33  θ^P_1  θ^P_2  θ^P_3
RPC −0.25 0.16 5.2 0.032 −0.32 4.2 −0.03 −0.028 −1.8 −0.11 0.025 0.0063
(0.16) (0.54) (2.8) (0.054) (0.24) (1.2) (0.023) (0.088) (0.46) (0.028) (0.0075) (0.00035)
RY −0.25 0.11 5.5 0.037 −0.31 4.1 −0.03 −0.034 −1.8 −0.11 0.026 0.0061
(0.15) (0.55) (2.7) (0.054) (0.22) (1.2) (0.023) (0.091) (0.47) (0.027) (0.0075) (0.00035)
RKF −0.12 0.33 6.7 0.0078 −0.35 4.7 −0.021 −0.007 −1.2 −0.14 0.026 0.0063
(0.13) (0.52) (2.9) (0.052) (0.22) (1.2) (0.018) (0.075) (0.42) (0.029) (0.0055) (0.00029)
RCMT −0.25 0.11 5.7 0.037 −0.32 4.1 −0.031 −0.032 −1.8 −0.11 0.026 0.0062
(0.15) (0.55) (2.6) (0.056) (0.23) (1) (0.02) (0.071) (0.43) (0.044) (0.0093) (0.00052)
JPC −0.25 0.16 5.2 0.032 −0.32 4.2 −0.03 −0.028 −1.8 −0.11 0.025 0.0063
(0.15) (0.54) (2.7) (0.054) (0.24) (1.2) (0.023) (0.087) (0.46) (0.027) (0.0074) (0.00035)
CPC 0.16 5.2 0.032 4.2 0.025 0.0063
RPC1 −0.24 −0.16 7.4 0.031 −0.14 3.3 −0.025 −0.061 −1.5 −0.11 0.025 0.0063
(0.13) (0.37) (2) (0.04) (0.14) (0.72) (0.016) (0.057) (0.3) (0.035) (0.012) (0.00039)
RY1 −0.24 −0.14 7.3 0.038 −0.17 3.3 −0.026 −0.055 −1.6 −0.11 0.026 0.0061
(0.13) (0.38) (1.8) (0.04) (0.14) (0.64) (0.018) (0.062) (0.29) (0.03) (0.011) (0.00037)
RCMT1 −0.25 −0.11 7.1 0.042 −0.18 3.3 −0.029 −0.045 −1.7 −0.11 0.025 0.0062
(0.15) (0.55) (2.6) (0.057) (0.23) (1.1) (0.02) (0.072) (0.42) (0.04) (0.013) (0.0005)
JPC1 −0.23 −0.18 7.4 0.03 −0.14 3.3 −0.025 −0.064 −1.5 −0.11 0.025 0.0063
(0.13) (0.36) (1.9) (0.04) (0.14) (0.74) (0.016) (0.056) (0.31) (0.036) (0.012) (0.00039)
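The long-run $\mathbb{P}$ mean of $\mathcal{P}$ is defined by $\theta^{\mathbb{P}} = -(K^{\mathbb{P}}_{1\mathcal{P}})^{-1}K^{\mathbb{P}}_{0\mathcal{P}}$. $K^{\mathbb{P}}_{1\mathcal{P}}$ is annualized (by multiplying by 12). Asymptotic standard errors are given in parentheses.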
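The long-run mean in this note solves $K^{\mathbb{P}}_{0\mathcal{P}} + K^{\mathbb{P}}_{1\mathcal{P}}\theta^{\mathbb{P}} = 0$. It is therefore unaffected when both matrices are scaled by the same factor, such as 12 for annualization. A small Python check with illustrative numbers (hypothetical function name):

```python
import numpy as np


def long_run_mean(K0, K1):
    """theta_P = -(K1)^{-1} K0 for dP_t = K0 + K1 @ P_{t-1} + eps_t."""
    # Solve the linear system instead of forming the inverse explicitly.
    return -np.linalg.solve(K1, K0)


K1 = np.array([[-0.02, 0.01, 0.0], [0.0, -0.2, 0.05], [0.0, 0.0, -0.5]])
K0 = np.array([0.001, -0.002, 0.0005])
theta = long_run_mean(K0, K1)
```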
Table 4
ML estimates of the conditional covariance of the model-implied principal components
Model Parameter Estimate
σ1 σ2 σ3 ρ12 ρ13 ρ23
Table 4 reveals that initializing $\Sigma_{\mathcal{P}}$ using OLS residuals leads to very accurate starting values. By way of contrast, if we had instead used the Dai and Singleton (2000) (DS) canonical form, an accurate initialization of $\Sigma_X$ would require a reliable initial value for $K^{\mathbb{Q}}_1$. The JSZ canonical form allows us to avoid this interplay between the values of $\Sigma_X$ and $K^{\mathbb{Q}}_1$ by applying no-arbitrage constraints to determine $K^{\mathbb{Q}}_{1\mathcal{P}}$ independently of $\Sigma_{\mathcal{P}}$.
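Because, by Proposition 3, the ML estimates of $K^{\mathbb{P}}_{0\mathcal{P}}$ and $K^{\mathbb{P}}_{1\mathcal{P}}$ coincide with OLS estimates of a VAR(1) for $\mathcal{P}_t$, and the OLS residual covariance supplies the starting value for $\Sigma_{\mathcal{P}}$, the whole $\mathbb{P}$ block can be initialized with one least-squares regression. The following is a minimal NumPy sketch, not the authors' code. It assumes the difference form $\Delta\mathcal{P}_t = K^{\mathbb{P}}_{0\mathcal{P}} + K^{\mathbb{P}}_{1\mathcal{P}}\mathcal{P}_{t-1} + \Sigma_{\mathcal{P}}\epsilon_t$ (consistent with $\theta^{\mathbb{P}} = -(K^{\mathbb{P}}_{1\mathcal{P}})^{-1}K^{\mathbb{P}}_{0\mathcal{P}}$ being the long-run mean); the function name `ols_p_dynamics` and all parameter values are hypothetical.

```python
import numpy as np

def ols_p_dynamics(P):
    """OLS fit of dP_t = K0 + K1 P_{t-1} + Sigma_P eps_t for a (T, N) array P.

    Returns K0 (N,), K1 (N, N), the lower-triangular Cholesky factor of the
    residual covariance, and the implied long-run mean theta = -K1^{-1} K0.
    """
    dP = np.diff(P, axis=0)                               # (T-1, N) monthly changes
    X = np.column_stack([np.ones(len(dP)), P[:-1]])       # regressors [1, P_{t-1}]
    coef, *_ = np.linalg.lstsq(X, dP, rcond=None)         # (1+N, N)
    K0, K1 = coef[0], coef[1:].T                          # K1[i, j]: effect of P_j on dP_i
    resid = dP - X @ coef
    chol = np.linalg.cholesky(resid.T @ resid / len(resid))  # ML (divide-by-T) covariance
    theta = -np.linalg.solve(K1, K0)
    return K0, K1, chol, theta

# Hypothetical parameters, used only to simulate data with a known answer.
rng = np.random.default_rng(42)
K1_true = np.array([[-0.05, 0.00, 0.00],
                    [0.01, -0.10, 0.00],
                    [0.00, 0.02, -0.20]])
theta_true = np.array([5.0, -1.0, 0.2])
K0_true = -K1_true @ theta_true
L_true = 0.1 * np.eye(3)
T = 40_000
shocks = rng.standard_normal((T, 3)) @ L_true.T
P = np.empty((T, 3))
P[0] = theta_true
for t in range(1, T):
    P[t] = P[t - 1] + K0_true + K1_true @ P[t - 1] + shocks[t]

K0_hat, K1_hat, Sigma_hat, theta_hat = ols_p_dynamics(P)
K1_annualized = 12 * K1_hat   # the tables report K1 multiplied by 12
```

With a long simulated sample the OLS estimates sit close to the true values; with the 216 monthly observations used in the paper they are only starting values, which the likelihood search then refines.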
Across all specifications, the parameters are very comparable. Partly this is a
consequence of Proposition 3: whether λQ comprises distinct real eigenvalues
(RPC), complex eigenvalues (CPC), or repeated eigenvalues (JPC), the esti-
28 An exception here is the Jordan form, where typically there were two local extrema with either the smaller or the larger eigenvalue repeated. Another general consideration is that one must either optimize over $k^{\mathbb{Q}}_\infty$ or alternatively impose Q stationarity on the model if one desires to use $r^{\mathbb{Q}}_\infty$ in estimation. In fact, for estimation purposes, the issue of using $k^{\mathbb{Q}}_\infty$ versus $r^{\mathbb{Q}}_\infty$ is largely obviated by results in Joslin, Le, and Singleton (2010), who show how one can concentrate out $k^{\mathbb{Q}}_\infty$ under Case P.
The Review of Financial Studies / v 24 n 3 2011
form with three extra constraints, including a repeated eigenvalue of $K^{\mathbb{Q}}_1$. To assess the validity of the null hypothesis $\lambda^{\mathbb{Q}}_2 = \lambda^{\mathbb{Q}}_3$ under the JSZ normalization, we perform a likelihood ratio (LR) test against the alternative that $\lambda^{\mathbb{Q}}$ is unconstrained. With this one linear constraint, the LR test statistic has an asymptotic $\chi^2$ distribution with one degree of freedom, $\chi^2(1)$.
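The LR statistics in Table 5 use time-series averages of conditional log-likelihoods, so the familiar $-2(\log L_0 - \log L_a)$ is scaled by $T - 1$. The dependency-free sketch below (helper names are hypothetical) computes that statistic and a $\chi^2$ tail probability from closed forms valid for integer degrees of freedom. Applied to the rounded averages quoted in footnote 29 it gives about 43.4 rather than the reported 44.0, presumably because the unrounded log-likelihoods were used. Treating that diagonal-$K^{\mathbb{P}}_{1\mathcal{P}}$ test as $\chi^2(6)$ (six restricted off-diagonal elements of a $3 \times 3$ matrix) is an assumption, consistent with the quoted critical value of 16.8.

```python
import math

def lr_statistic(avg_loglik_null, avg_loglik_alt, T):
    """LR = -2 (T - 1) (log L0 - log La), where the log-likelihoods are
    time-series averages over the T - 1 conditional observations."""
    return -2.0 * (T - 1) * (avg_loglik_null - avg_loglik_alt)

def chi2_sf(x, df):
    """P(chi2(df) > x) for integer df >= 1, using closed forms (no SciPy)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{k < df/2} h^k / k!
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{k=1}^{(df-1)/2} h^(k - 1/2) / Gamma(k + 1/2)
    tail = sum(h ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail

# Footnote 29 inputs: diagonal-K1 null vs. unconstrained alternative, T = 216.
stat = lr_statistic(38.291, 38.392, T=216)
p_value = chi2_sf(stat, df=6)
```

For the eigenvalue-equality test in the top panel of Table 5, the same functions apply with `df=1`.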
The second test of interest is the dimensionality of the one-period risk premium which, as discussed in Section 4.4, is captured by the rank of $A_{RRP} = K^{\mathbb{P}}_{1\mathcal{P}} - K^{\mathbb{Q}}_{1\mathcal{P}}$. To impose the constraint that $L = 1$, we start with the singular value decomposition of $A_{RRP}$, $UDV'$, where $U$ and $V$ are unitary matrices and
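Therefore, the expected excess returns $xr^{\mathbb{P}}_t$ (see Section 4.4) are given by
$$xr^{\mathbb{P}}_t = K^{\mathbb{P}}_{0\mathcal{P}} - K^{\mathbb{Q}}_{0\mathcal{P}} + U_{\bullet 1} D_{11} \sum_{j=1}^{N} V_{j1}\,\mathcal{P}_{jt}, \qquad (36)$$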
where $U_{\bullet 1}$ is the first column of $U$. The second term on the right-hand side of (36) expresses the time-varying components of $xr^{\mathbb{P}}_t$ in terms of a common linear combination $V_{\bullet 1}'\mathcal{P}_t$ of the pricing factors. All of the parameters in (36) are econometrically identified by virtue of the facts that $V_{\bullet 1}'V_{\bullet 1} = 1$ (which identifies $D_{11}$) and $U_{\bullet 1}'U_{\bullet 1} = 1$ (which identifies the weights on $D_{11}V_{\bullet 1}'\mathcal{P}_t$). Furthermore, given $N$, (36) implies $(N-1)^2$ cross-equation restrictions on the parameters of the conditional expectation $xr^{\mathbb{P}}_t$. In our case, $N = 3$, so there are 4 cross-equation restrictions.
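Equation (36) can be checked numerically: the leading singular triple of $A_{RRP}$ gives the loading vector $U_{\bullet 1}$, the scale $D_{11}$, and the weights $V_{\bullet 1}$ of the single priced factor, with both vectors normalized to unit length. The NumPy sketch below uses a hypothetical $A_{RRP}$ of exact rank one, and counts the $(N-1)^2$ restrictions as $N^2 - (2N - 1)$, the difference between the free parameters of an unrestricted $N \times N$ matrix and of a rank-one matrix $D_{11}U_{\bullet 1}V_{\bullet 1}'$. Function names and numbers are illustrative only.

```python
import numpy as np

def rank_one_risk_premium(K1_P, K1_Q):
    """SVD of A_RRP = K1^P - K1^Q; returns (u1, d11, v1) with unit-norm u1, v1."""
    U, d, Vt = np.linalg.svd(K1_P - K1_Q)
    return U[:, 0], d[0], Vt[0]

def expected_excess_return(K0_P, K0_Q, u1, d11, v1, P_t):
    """Equation (36): a constant plus one common time-varying factor v1' P_t."""
    return (K0_P - K0_Q) + u1 * d11 * (v1 @ P_t)

def n_cross_equation_restrictions(N):
    """N^2 unrestricted entries minus 2N - 1 free parameters of a rank-one matrix."""
    return N * N - (2 * N - 1)        # equals (N - 1)^2

# Hypothetical 3-factor example in which A_RRP has rank one by construction.
K1_Q = np.diag([-0.002, -0.03, -0.2])
a = np.array([0.5, -0.2, 0.1])
b = np.array([0.01, 0.03, -0.02])
K1_P = K1_Q + np.outer(a, b)
u1, d11, v1 = rank_one_risk_premium(K1_P, K1_Q)
K0_P, K0_Q = np.array([0.01, 0.0, -0.005]), np.zeros(3)
P_t = np.array([5.0, -1.0, 0.3])
xr = expected_excess_return(K0_P, K0_Q, u1, d11, v1, P_t)
```

The signs of `u1` and `v1` returned by the SVD are determined only jointly, which leaves (36) unchanged.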
Tests for the equality of two eigenvalues are reported in the top panel of Table 5, where a leading J means that the model was estimated under the constraint that $\lambda^{\mathbb{Q}}_2 = \lambda^{\mathbb{Q}}_3$ (consistent with the specifications of AFNS models). In the PC-based models, this null hypothesis is not rejected, while for the yield-based models it is rejected at conventional significance levels. To interpret this difference across choices of risk factors, we note from Table 2 that the estimated $|\lambda^{\mathbb{Q}}_2 - \lambda^{\mathbb{Q}}_3|$ is larger in model RY than in model RPC, with most of this difference attributable to the larger value of $|\lambda^{\mathbb{Q}}_3|$ in model RY. The eigenvalue $\lambda^{\mathbb{Q}}_3$ governs the relatively high-frequency Q variation in yields and is therefore particularly relevant for the behavior of the short end of the yield curve. Introducing the six-month yield directly as a pricing factor overweights the short end of the yield curve relative to using the PCs as pricing factors, because the PCs are portfolios of yields along the entire maturity spectrum.
Table 5
Likelihood ratio tests
$H_0: \lambda^{\mathbb{Q}}_2 = \lambda^{\mathbb{Q}}_3$
The top panel reports tests for the equality of two eigenvalues, and the bottom panel reports tests for a rank-1 risk premium. The likelihood ratio statistics are computed as $LR = -2(T-1)(\log L_0 - \log L_a)$, where $T = 216$ is the sample size and $\log L_0$ and $\log L_a$ are the log-likelihoods under the null and alternative, respectively. All log-likelihoods are conditional on $t = 1$ and are time-series averages across the $T - 1$ observations.
In the bottom panel, we report tests of the reduced-rank, risk premium hy-
pothesis that L = 1. Under all model specifications, this hypothesis cannot be
rejected. This finding is consistent with the conclusions reached by Cochrane
and Piazzesi (2005), though they effectively considered models with N = 5 as
they examined PC1 through PC5.
29 The average log-likelihood (across $t$) for the unconstrained no-arbitrage model was 38.392, while for the diagonal-constrained model it was 38.291. The corresponding likelihood ratio test statistic is 44.0, far exceeding the 1% critical value of 16.8 (the 99th percentile of the $\chi^2(6)$ distribution, one degree of freedom per restricted off-diagonal element of $K^{\mathbb{P}}_{1\mathcal{P}}$), indicating a very strong rejection of this constraint.
Table 6
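The conditional mean parameters for the model with $K^{\mathbb{P}}_{1\mathcal{P}}$ constrained to be diagonal
With No Arbitrage Without No Arbitrage
$K^{\mathbb{P}}_{0\mathcal{P}}$ $K^{\mathbb{P}}_{1\mathcal{P}}$ $K^{\mathbb{P}}_{0\mathcal{P}}$ $K^{\mathbb{P}}_{1\mathcal{P}}$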
−0.0129 −0.151 −0.0129 −0.151
(0.0193) (0.135) (0.0188) (0.131)
0.00754 −0.286 0.00761 −0.289
(0.00636) (0.202) (0.00635) (0.201)
0.013 −1.97 0.0129 −1.95
(0.00292) (0.423) (0.00292) (0.421)
$K^{\mathbb{P}}_{1\mathcal{P}}$ is annualized by multiplying by 12. The left panel imposed no-arbitrage and uses yield data for all matu-
Table 7
The conditional mean parameters for the model with cointegration (no trend and one unit root) imposed
With No Arbitrage Without No Arbitrage
$K^{\mathbb{P}}_{0\mathcal{P}}$ $K^{\mathbb{P}}_{1\mathcal{P}}$ $K^{\mathbb{P}}_{0\mathcal{P}}$ $K^{\mathbb{P}}_{1\mathcal{P}}$
The left panel imposed no-arbitrage and uses yield data for all maturities. The right panel does not use no-arbitrage and simply computes the estimates of a VAR of $\mathcal{P}_t$ with cointegration imposed so that $[K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}]$ has rank 2.
Table 8
The standard errors of the parameter estimates computed both by the asymptotic method and using a
bootstrap method
Parameter Estimate Asymptotic S.E. Bootstrap S.E.
$K^{\mathbb{P}}_{1,11}$ −0.2543 (0.1551) (0.2733)
$K^{\mathbb{P}}_{1,12}$ 0.1595 (0.5428) (0.8277)
$K^{\mathbb{P}}_{1,13}$ 5.235 (2.761) (3.1)
$K^{\mathbb{P}}_{1,21}$ 0.03235 (0.05425) (0.1057)
$K^{\mathbb{P}}_{1,22}$ −0.3153 (0.2359) (0.3187)
Here, $\theta^{\mathbb{P}} = -(K^{\mathbb{P}}_{1\mathcal{P}})^{-1}K^{\mathbb{P}}_{0\mathcal{P}}$ and $\rho_{ij}$ is the conditional correlation between the $i$th and $j$th components of $\mathcal{P}_t$.
errors using the outer product of the first derivative of the likelihood function.
We compare these results to bootstrapped standard errors computed with the
procedure given in Section 5.2.
Table 8 presents the results for model RPC. The asymptotic standard errors tend to overstate the precision with which we measure the effect of the level PC on the conditional means of the PCs ($K^{\mathbb{P}}_{1,11}$, $K^{\mathbb{P}}_{1,21}$, $K^{\mathbb{P}}_{1,31}$) by a factor of about two. These effects on the standard errors for $K^{\mathbb{P}}_{1\mathcal{P}}$ and $\theta^{\mathbb{P}}$ are necessarily due to the small-sample properties of the OLS estimates in the VAR for $\mathcal{P}_t$ since, by Proposition 3, the full-information ML estimates in the GDTSM agree with the OLS estimates. Additionally, the asymptotic method overstates the precision with which we estimate the Q parameters by about 50%. Overall, though, the asymptotic standard errors line up rather well with the bootstrapped standard errors.
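The gap between asymptotic and bootstrap standard errors for $K^{\mathbb{P}}_{1\mathcal{P}}$ reflects the small-sample behavior of OLS in the VAR for $\mathcal{P}_t$. The procedure of Section 5.2 is not reproduced here; the sketch below is a generic residual bootstrap (an assumption, not necessarily the authors' scheme) on simulated data, set against conventional homoskedastic OLS standard errors rather than the outer-product-of-gradient estimator used in the text. Function names and parameter values are hypothetical.

```python
import numpy as np

def ols_var1(P):
    """OLS of dP_t on [1, P_{t-1}]; returns coefficients (1+N, N), residuals, regressors."""
    dP = np.diff(P, axis=0)
    X = np.column_stack([np.ones(len(dP)), P[:-1]])
    coef, *_ = np.linalg.lstsq(X, dP, rcond=None)
    return coef, dP - X @ coef, X

def asymptotic_se_K1(P):
    """Conventional OLS standard errors for K1[i, j] (homoskedastic errors)."""
    _, resid, X = ols_var1(P)
    s2 = np.diag(resid.T @ resid) / len(resid)          # residual variance per equation i
    xtx_inv = np.diag(np.linalg.inv(X.T @ X))[1:]        # regressor j, intercept dropped
    return np.sqrt(np.outer(s2, xtx_inv))

def bootstrap_se_K1(P, n_boot=200, seed=0):
    """Residual bootstrap: rebuild paths from resampled residuals and re-fit by OLS."""
    rng = np.random.default_rng(seed)
    coef, resid, _ = ols_var1(P)
    K0, K1 = coef[0], coef[1:].T
    draws = []
    for _ in range(n_boot):
        e = resid[rng.integers(0, len(resid), size=len(resid))]
        Pb = np.empty_like(P)
        Pb[0] = P[0]
        for t in range(1, len(P)):
            Pb[t] = Pb[t - 1] + K0 + K1 @ Pb[t - 1] + e[t - 1]
        draws.append(ols_var1(Pb)[0][1:].T)               # bootstrap K1 draw
    return np.std(np.array(draws), axis=0, ddof=1)

# Hypothetical monthly data with one highly persistent factor, T = 216.
rng = np.random.default_rng(7)
K1_true = np.diag([-0.02, -0.1, -0.3])
P = np.zeros((216, 3))
for t in range(1, 216):
    P[t] = P[t - 1] + K1_true @ P[t - 1] + 0.1 * rng.standard_normal(3)
se_asym = asymptotic_se_K1(P)
se_boot = bootstrap_se_K1(P)
```

Comparing `se_boot / se_asym` entry by entry is the analogue of the two standard-error columns in Table 8; with a near-unit-root first factor the ratio for the first column is typically the one that departs most from one.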
30 For the cointegration example, we enforce the constraint that $[K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}]$ has a zero eigenvalue or, equivalently, there is a common unit root and no trend.
Table 9
The improvement in out-of-sample forecast accuracy relative to the forecasts from a VAR(1)
Forecast Error Relative to Unconstrained VAR(1) (%)
PC1 PC2 PC3
1m 3m 6m 12m 1m 3m 6m 12m 1m 3m 6m 12m
RPC 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
RY −0.3 −0.5 −0.8 −0.7 0.2 0.4 0.4 0.0 0.1 0.8 1.3 0.8
RKF 0.9 3.0 5.9 12.9 −1.7 −4.7 −7.7 −10.0 1.2 3.3 7.7 10.6
JPC 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 −0.0 −0.0 −0.0
CPC 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 −0.0 −0.0 −0.0
RPC1 −2.1 −4.3 −6.2 −7.1 −2.0 −3.8 −3.8 −1.6 −1.5 −2.7 −2.5 0.2
RY1 −2.2 −4.8 −7.3 −8.8 −1.9 −3.9 −3.9 −1.8 −1.6 −2.7 −2.5 −1.0
JPC1 −2.3 −4.7 −6.7 −8.2 −1.9 −3.7 −4.2 −2.7 −1.5 −2.6 −1.9 0.6
VAR + diag$(K^{\mathbb{P}}_{1\mathcal{P}})$ −5.3 −12.1 −18.6 −21.6 0.7 6.3 11.6 5.7 −2.4 −5.4 −9.1 −13.0
RPC + diag$(K^{\mathbb{P}}_{1\mathcal{P}})$ 0.7 6.3 11.6 5.6
$$\mathrm{RMSE}_i = \sqrt{\frac{1}{T-59}\sum_{t=60}^{T}\left(\Delta PC_{i,t+1} - E_t[\Delta PC_{i,t+1}]\right)^2},$$
where the expectation, $E_t$, is computed using the model estimated with data from time $\tau = 1, \ldots, t$. For example, a number of −5 implies that the model has a 5% smaller out-of-sample RMSE than the unrestricted VAR(1).
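The expanding-window evaluation behind Table 9 can be sketched for the one-month horizon: at each origin $t \ge 60$ the model is re-estimated on observations $1, \ldots, t$ and used to forecast $\Delta\mathcal{P}_{t+1}$. The NumPy sketch below compares an unrestricted VAR(1) with a VAR whose $K^{\mathbb{P}}_{1\mathcal{P}}$ is diagonal and reports $100 \times (\mathrm{RMSE}_{\text{model}}/\mathrm{RMSE}_{\text{VAR}} - 1)$, which is our reading of the caption's example. Multi-month horizons, the no-arbitrage models, and the simulated data are not from the paper, and all helper names are hypothetical.

```python
import numpy as np

def fit_var1(P):
    """Unconstrained OLS of dP_t on [1, P_{t-1}]; returns K0 (N,), K1 (N, N)."""
    X = np.column_stack([np.ones(len(P) - 1), P[:-1]])
    coef, *_ = np.linalg.lstsq(X, np.diff(P, axis=0), rcond=None)
    return coef[0], coef[1:].T

def fit_diag_var1(P):
    """Same regression, but each dP_i uses only its own lag (diagonal K1)."""
    N = P.shape[1]
    K0, K1 = np.zeros(N), np.zeros((N, N))
    for i in range(N):
        X = np.column_stack([np.ones(len(P) - 1), P[:-1, i]])
        (c, k), *_ = np.linalg.lstsq(X, np.diff(P[:, i]), rcond=None)
        K0[i], K1[i, i] = c, k
    return K0, K1

def oos_rmse(P, fit, first=60):
    """Expanding-window one-step RMSE: fit on the first t rows, forecast the next change."""
    errors = []
    for t in range(first, len(P)):
        K0, K1 = fit(P[:t])
        errors.append((P[t] - P[t - 1]) - (K0 + K1 @ P[t - 1]))
    return np.sqrt(np.mean(np.square(errors), axis=0))

def pct_vs_var(rmse_model, rmse_var):
    """Table 9 convention: -5 means a 5% smaller RMSE than the unrestricted VAR(1)."""
    return 100.0 * (rmse_model / rmse_var - 1.0)

# Hypothetical simulated monthly factors, T = 216.
rng = np.random.default_rng(1)
K1_true = np.diag([-0.02, -0.1, -0.3])
P = np.zeros((216, 3))
for t in range(1, 216):
    P[t] = P[t - 1] + K1_true @ P[t - 1] + 0.1 * rng.standard_normal(3)
rmse_var = oos_rmse(P, fit_var1)
rmse_diag = oos_rmse(P, fit_diag_var1)
print(np.round(pct_vs_var(rmse_diag, rmse_var), 1))
```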
31 Christensen, Diebold, and Rudebusch (2007) assume that all yields are measured with additive measurement
errors, the case we turn to in Section 6. However, three-factor models price bonds quite accurately over the
maturity range that they and we consider, so Theorem 2 should be informative about their findings.
one can view $\mathcal{P}_t = W y_t$ as the "true" values of the pricing factors and view $\mathcal{P}^o_t = W y^o_t$ as its observed counterpart.32
To set up the Kalman filtering problem for Case F, we start with a given set of portfolio weights $W \in \mathbb{R}^{N \times J}$. From $W$ and $(\lambda^{\mathbb{Q}}, r^{\mathbb{Q}}_\infty, \Sigma_{\mathcal{P}})$, we construct $(K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1, \rho_0, \rho_1)$ as prescribed in Proposition 2. From the no-arbitrage relation (A2–A3) we then construct $A \in \mathbb{R}^J$ and $B \in \mathbb{R}^{J \times N}$ with $y_t = A + B\mathcal{P}_t$ and thus the relations
where $\epsilon^{\mathbb{P}}_t \sim N(0, I_N)$ are the state innovations and $\epsilon^m_t \sim N(0, I_M)$ are the measurement errors. Researchers have considered several parameterizations of the volatility matrix $\Sigma_Y$ for $\epsilon^m_t$. In our subsequent empirical examples, we examine the cases of independent (diagonal $\Sigma_Y$) errors with distinct or common volatilities. These relations give the usual observation and state equations of the Kalman filter, and they fully characterize the conditional distribution of the yield curve in terms of rotation-invariant parameters.
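The state and observation equations for Case F did not survive extraction above, so the following is a minimal NumPy sketch of a standard Kalman filter under stated assumptions: the state follows $\mathcal{P}_t = K^{\mathbb{P}}_{0\mathcal{P}} + (I + K^{\mathbb{P}}_{1\mathcal{P}})\mathcal{P}_{t-1} + \Sigma_{\mathcal{P}}\epsilon^{\mathbb{P}}_t$, observed yields satisfy $y^o_t = A + B\mathcal{P}_t + \Sigma_Y\epsilon^m_t$ for all $J$ maturities, and $\Sigma_Y$ is diagonal with either a common or distinct volatilities. `kalman_loglik` and the simulated inputs are hypothetical; in an actual application $A$ and $B$ would come from the recursions (A2–A3).

```python
import numpy as np

def kalman_loglik(y, K0, K1, Sigma_P, A, B, sigma_y, x0, V0):
    """Kalman filter for (assumed) state and observation equations
        P_t = K0 + (I + K1) P_{t-1} + Sigma_P eps_t
        y_t = A + B P_t + diag(sigma_y) e_t
    sigma_y may be a scalar (common volatility) or a length-J vector (distinct).
    Returns the total Gaussian log-likelihood and the filtered means of P_t."""
    N, J = len(K0), B.shape[0]
    Phi = np.eye(N) + K1
    Q = Sigma_P @ Sigma_P.T
    R = np.diag(np.broadcast_to(np.asarray(sigma_y, dtype=float), (J,)) ** 2)
    x, V = np.asarray(x0, dtype=float), np.asarray(V0, dtype=float)
    loglik, filtered = 0.0, []
    for yt in y:
        x = K0 + Phi @ x                      # predicted state mean
        V = Phi @ V @ Phi.T + Q               # predicted state covariance
        v = yt - (A + B @ x)                  # innovation
        F = B @ V @ B.T + R                   # innovation covariance
        _, logdet = np.linalg.slogdet(F)
        loglik -= 0.5 * (J * np.log(2 * np.pi) + logdet + v @ np.linalg.solve(F, v))
        G = np.linalg.solve(F, B @ V).T       # Kalman gain V B' F^{-1}
        x = x + G @ v                         # filtered mean
        V = V - G @ B @ V                     # filtered covariance
        filtered.append(x)
    return loglik, np.array(filtered)

# Hypothetical example: N = 2 factors, J = 4 yields, tiny measurement error.
rng = np.random.default_rng(0)
N, J, T = 2, 4, 120
K0, K1 = np.array([0.01, -0.02]), np.diag([-0.05, -0.2])
Sigma_P = np.diag([0.1, 0.05])
A, B = np.zeros(J), rng.standard_normal((J, N))
P = np.zeros((T, N))
prev = np.zeros(N)
for t in range(T):
    prev = K0 + (np.eye(N) + K1) @ prev + Sigma_P @ rng.standard_normal(N)
    P[t] = prev
y = A + P @ B.T + 1e-4 * rng.standard_normal((T, J))
ll, P_filt = kalman_loglik(y, K0, K1, Sigma_P, A, B, 1e-4, np.zeros(N), np.eye(N))
```

With very small measurement errors the filtered states track the true factors closely, the situation in which $\mathcal{P}_t \approx \mathcal{P}^o_t$.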
The computational benefits from using the JSZ normalization in Case F arise, in part, from the observation that the least-squares projection of $\mathcal{P}^o_t$ onto $\mathcal{P}^o_{t-1}$ will nearly recover the ML estimates of $K^{\mathbb{P}}_{0\mathcal{P}}$ and $K^{\mathbb{P}}_{1\mathcal{P}}$ to the extent that $\mathcal{P}_t \approx \mathcal{P}^o_t$ (and we can choose portfolios, such as the principal components, to make these errors small).33 Additionally, although not exactly, we have nearly concentrated the likelihood, in that the optimal P parameters will typically depend only weakly on the Q parameters: as the Q parameters vary, the filtered states largely do not change.34
With the JSZ normalization, the parameter estimates are directly compara-
ble across distributional assumptions on the measurement errors. That is, in
analogy to Section 3, by fixing the yield portfolios, both measured with and
without error, the P parameters are now directly comparable regardless of the
Q structure. The parameters are also directly comparable across sample peri-
ods. When the P parameters are defined indirectly through a Q normalization,
such comparisons will in general not be possible.
32 In fact, an equivalent characterization of the JSZ normalization is that, for a given portfolio matrix $W$, $A_W(\Theta^{\mathbb{Q}}) = 0$ and $B_W(\Theta^{\mathbb{Q}}) = I_N$.
33 This approximation can be verified empirically by comparing $\mathcal{P}^o_t$ to $E^{\mathbb{P}}_t[\mathcal{P}_t]$ or $E^{\mathbb{P}}_T[\mathcal{P}_t]$.
34 This is in contrast to, for example, the rotation of DS where, as the lower triangular $K^{\mathbb{Q}}_1$ is changed, the latent states vary as well. Thus, necessarily, so do the optimal P parameters given the specified Q parameters.
real. From Table 2, it is seen that the estimates of the Q parameters for model RKF are similar to those for models RPC and RY that fit with N portfolios of yields priced exactly by the GDTSM(3). Similarly, from Tables 3 and 4, we see that the P parameters also generally match up across the models with and without filtering. An exception is the P distribution of PC3: When filtering, the volatility of PC3 is reduced by about 10%, and PC3 has a larger effect on the conditional means of PC1 and PC2 (higher $K^{\mathbb{P}}_{1,13}$, $K^{\mathbb{P}}_{1,23}$). That is, PC3 becomes a bit smoother, and the model attributes a slightly greater effect of PC3 on forecasts of changes in the level and slope of the yield curve.
35 Cochrane and Piazzesi (2005, 2008) find that a portfolio of smoothed forward rates, that is correlated with PC4,
predicts bond returns. Joslin, Priebsch, and Singleton (2010) find that smoothed growth in industrial production,
which is also correlated with PC4, is an important determinant of excess returns for level and slope portfolios.
Moreover, the sample average short rate is 23%, which also results in large,
wildly oscillating Sharpe ratios.
It is not the need to filter per se that gives rise to these fitting problems with
a 5-factor model. When the first five PCs are priced perfectly by the GDTSM
(Model RPC), the properties of the short rate are now more plausible (see
Table 10). However, the model-implied yields on bonds with maturities beyond
those included in estimation are now wildly implausible. Furthermore, impos-
ing the reduced rank restriction (Model RPC1 ) does not materially improve the
fit with five factors. For all of these error specifications with five factors, the
Sharpe ratios for the higher-order PCs show substantial variation.36 In contrast,
36 See Duffee (2010) for a more extensive empirical evaluation of the properties of Sharpe ratios in GDTSMs.
Joslin, Priebsch, and Singleton (2010) also investigate maximal Sharpe ratio variation within the context of
macro-GDTSMs.
Table 10
Sample moments for three-factor and five-factor GDTSMs
the 3-factor specifications produce plausible values for these moments. We in-
terpret this evidence as being symptomatic of over-fitting, of having too many
pricing factors.
Does the accommodation of filtering substantially increase the computational complexity of estimation using the JSZ normalization? The parameters $(K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}})$ and $\sigma_{\text{pricing}}$ are now included as part of the parameter search.
Appendices
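$$\mathcal{A}_{m+1} - \mathcal{A}_m = K_0^{\mathbb{Q}\prime}\mathcal{B}_m + \tfrac{1}{2}\mathcal{B}_m' H_0 \mathcal{B}_m - \rho_0 \qquad \text{(A2)}$$
$$\mathcal{B}_{m+1} - \mathcal{B}_m = K_1^{\mathbb{Q}\prime}\mathcal{B}_m - \rho_1 \qquad \text{(A3)}$$
subject to the initial conditions $\mathcal{A}_0 = 0$, $\mathcal{B}_0 = 0$. See, for example, Dai and Singleton (2003). The loadings for the corresponding bond yield are $A_m = -\mathcal{A}_m/m$ and $B_m = -\mathcal{B}_m/m$.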
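The recursions (A2) and (A3) are straightforward to iterate. The sketch below, with hypothetical monthly inputs in the JSZ style ($K^{\mathbb{Q}}_1 = \mathrm{diag}(\lambda^{\mathbb{Q}})$, $\rho_1 = \iota$, $K^{\mathbb{Q}}_0 = 0$, $\rho_0 = 0$), returns the yield loadings $A_m$ and $B_m$. With diagonal $K^{\mathbb{Q}}_1$, the loading on factor $j$ reduces to $\frac{1}{m}\sum_{i=0}^{m-1}(1+\lambda^{\mathbb{Q}}_j)^i$, so a more negative eigenvalue produces loadings that die out quickly with maturity, one way to see why $\lambda^{\mathbb{Q}}_3$ matters mostly for the short end of the curve. The function name and parameter values are illustrative.

```python
import numpy as np

def yield_loadings(K0Q, K1Q, H0, rho0, rho1, max_m):
    """Iterate (A2)-(A3) from A_0 = 0, B_0 = 0 (bond-price loadings) and return
    yield loadings a_m = -A_m / m, shape (max_m,), and b_m = -B_m / m, shape (max_m, N)."""
    A, B = 0.0, np.zeros(len(rho1))
    a, b = [], []
    for m in range(1, max_m + 1):
        # Both updates use the previous B_m (the right-hand side is evaluated first).
        A, B = A + K0Q @ B + 0.5 * B @ H0 @ B - rho0, B + K1Q.T @ B - rho1
        a.append(-A / m)
        b.append(-B / m)
    return np.array(a), np.array(b)

# Hypothetical monthly JSZ-style inputs.
lam = np.array([-0.002, -0.03, -0.2])
a, b = yield_loadings(np.zeros(3), np.diag(lam), np.zeros((3, 3)), 0.0, np.ones(3), 120)
```

Row `b[m - 1]` holds the loadings of the $m$-month yield; `b[0]` equals $\rho_1 = \iota$ because the one-period yield is the short rate.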
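$$K^{\mathbb{Q}}_{0\hat{X}} = D K^{\mathbb{Q}}_{0X} - D K^{\mathbb{Q}}_{1X} D^{-1} C, \qquad \text{(A4)}$$
$$K^{\mathbb{Q}}_{1\hat{X}} = D K^{\mathbb{Q}}_{1X} D^{-1}, \qquad \text{(A5)}$$
$$\rho_{0\hat{X}} = \rho_{0X} - \rho_{1X}' D^{-1} C, \qquad \text{(A6)}$$
$$\rho_{1\hat{X}} = (D^{-1})' \rho_{1X}, \qquad \text{(A7)}$$
$$K^{\mathbb{P}}_{0\hat{X}} = D K^{\mathbb{P}}_{0X} - D K^{\mathbb{P}}_{1X} D^{-1} C, \qquad \text{(A8)}$$
$$K^{\mathbb{P}}_{1\hat{X}} = D K^{\mathbb{P}}_{1X} D^{-1}, \qquad \text{(A9)}$$
$$H_{0\hat{X}} = D H_{0X} D'. \qquad \text{(A10)}$$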
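The rotation formulas can be checked numerically. Under (A5) and (A7), iterating (A3) for the rotated model gives $\hat{\mathcal{B}}_m = (D^{-1})'\mathcal{B}_m$, so $\hat{\mathcal{B}}_m'\hat{X}_t = \mathcal{B}_m'X_t + \mathcal{B}_m'D^{-1}C$: prices load on the same underlying risks, and the constant is absorbed through (A4) and (A6). The sketch checks only the $\mathcal{B}$ part, with hypothetical $K^{\mathbb{Q}}_{1X}$, $\rho_{1X}$, and an invertible $D$.

```python
import numpy as np

def price_loadings_B(K1Q, rho1, max_m):
    """Iterate (A3), B_{m+1} = B_m + K1Q' B_m - rho1, from B_0 = 0; rows are B_1', ..., B_max_m'."""
    B, rows = np.zeros(len(rho1)), []
    for _ in range(max_m):
        B = B + K1Q.T @ B - rho1
        rows.append(B)
    return np.array(rows)

# Hypothetical parameters for the original factors X and an invertible rotation X_hat = C + D X.
K1Q = np.array([[-0.01, 0.002, 0.0],
                [0.0, -0.05, 0.01],
                [0.0, 0.0, -0.3]])
rho1 = np.array([1.0, 0.5, -0.2])
D = np.array([[1.0, 0.5, 0.0],
              [0.2, 2.0, 0.1],
              [0.0, -0.3, 0.5]])
D_inv = np.linalg.inv(D)

B_X = price_loadings_B(K1Q, rho1, 120)
B_Xhat = price_loadings_B(D @ K1Q @ D_inv, D_inv.T @ rho1, 120)   # (A5) and (A7)
```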
C. Proof of Proposition 1
We require a slight variation of the standard Jordan canonical form of a square matrix that main-
tains all real entries and bears a similar relation to the real Schur decomposition and the Schur
decomposition.
Definition 1. We refer to the real ordered Jordan form of a square matrix $A \in \mathbb{R}^{n \times n}$ with eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_m)$ with corresponding algebraic multiplicities $(m_1, m_2, \ldots, m_m)$ as
$A = J(\lambda) \equiv \mathrm{diag}(J_1, J_2, \ldots, J_m)$,
and otherwise the block is empty. Additionally, we apply an arbitrary ordering on C to order the
blocks by their eigenvalues. In case there exist eigenvalues with a geometric multiplicity greater
than one, we also order the blocks by size.
The proof is analogous to the real case, as the individual steps are the same but require
P = {A J J Q Q P P
GP Θ J + BΘ J Θ : Θ = (k∞ em 1 , J (λ ), 0, ι, K 0J , K 1J , sλQ (ΣP ))}. (A13)
Q P ,
Here, we use ΣP to denote the parameterization since, for Θ J = (k∞ em 1 , J (λQ ), 0, ι, K 0J
P , B −1 Σ ), the transformed model A J
K 1J
λQ P Θ J + BΘ J Θ (which has Pt as the factors since it is in
−1
GP ) has innovation volatility of BλQ B Q ΣP = ΣP .
λ
37 For simplicity, we denote the Cholesky factorization, $\Sigma$, but we have in mind the covariance $\Sigma\Sigma'$.
P = {A J J Q Q −1 P P
GP Θ J + BΘ J Θ : Θ = (k∞ em 1 , J (λ ), 0, ι, kλQ ,k Q ,Σ (K 0P , K 1P ), sλQ (ΣP ))}.
∞ P
(A15)
ΘP = AΘ J + BΘ J Θ J
(A16)
Q
= k Q Q (0, J (λQ )), r Q (k∞ , ι), K 0PP , K 1PP , ΣP ,
λ ,k∞ ,ΣP λQ ,k∞ ,ΣP
E. Proof of Theorem 2
0
We first prove that (26–27) holds when H0 = {η0 = (C 0 , D 0 , Σ0X , P θm )}. Let
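$$(K^{\eta_0}_{0X}, K^{\eta_0}_{1X}) = \arg\max_{K_{0X}, K_{1X}} f(\mathcal{P}_T, y_T, \ldots, \mathcal{P}_1, y_1 \mid \mathcal{P}_0, y_0; \eta_0),$$
and so
$$(K^{\eta_0}_{0X}, K^{\eta_0}_{1X}) = \arg\max_{K_{0X}, K_{1X}} f(\mathcal{P}_T, \ldots, \mathcal{P}_1 \mid \mathcal{P}_0; \eta_0). \qquad \text{(A18)}$$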
It follows that the maximum value in (A18) is at most equal to the value of the likelihood corre-
sponding to the OLS estimate. Note that although the value of the maximum likelihood depends
on $D$, the argument that maximizes the value does not depend on $D$ by the classic Zellner (1962) result. The OLS likelihood value is achieved by choosing $(K_{0X}, K_{1X})$ to satisfy (26–27), which is feasible by the assumption that $(K_{0X}, K_{1X})$ is unconstrained and $D_{\mathcal{P}}'$ is full rank.
This proves our result since $(K^{H}_{0X}, K^{H}_{1X}) = (K^{\eta_0}_{0X}, K^{\eta_0}_{1X})$ for some $\eta_0$ and we have shown that (26–27) hold for any $\eta_0$. Note that in the case that the parameters are under-identified, there will not be a unique maximum likelihood estimate in the sense that several $\eta_0$ may give the same likelihood, but (26–27) will hold for all possible choices. For some $H$, there may not exist a maximizer, in which case the result holds vacuously. However, standard conditions and arguments, such as compactness, provide for the existence of a maximizer.
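The step that invokes Zellner (1962) rests on a standard fact: when every equation has the same regressors, GLS on the stacked system, which is the Gaussian ML estimator for a given error covariance, coincides with equation-by-equation OLS whatever that covariance is. A small NumPy check on simulated data, with an arbitrary positive-definite $\Sigma$ chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T, M, N = 120, 4, 3
X = np.column_stack([np.ones(T), rng.standard_normal((T, M - 1))])   # identical regressors
Y = X @ rng.standard_normal((M, N)) + rng.standard_normal((T, N))

# Equation-by-equation OLS.
B_ols = np.linalg.solve(X.T @ X, X.T @ Y)                               # (M, N)

# GLS on the stacked system vec(Y) = (I_N kron X) vec(B) + e, Cov(e) = Sigma kron I_T.
Sigma = np.array([[2.0, 0.6, 0.1],
                  [0.6, 1.0, 0.3],
                  [0.1, 0.3, 0.5]])
Z = np.kron(np.eye(N), X)
W = np.kron(np.linalg.inv(Sigma), np.eye(T))
y = Y.reshape(-1, order="F")                                            # stack columns
b_gls = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
B_gls = b_gls.reshape((M, N), order="F")
```

Algebraically, $Z'WZ = \Sigma^{-1} \otimes X'X$ and $Z'Wy = (\Sigma^{-1} \otimes X')\,y$, so $\Sigma$ cancels from the GLS solution.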
It is easy to verify that by first demeaning the variables we may assume without loss of generality that $\alpha \equiv 0$. Furthermore, by transforming the variables, we may assume again without loss of generality that $\Sigma = I$ and $\sum_t X_t X_t' = I$. Under these assumptions, we wish to solve
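$$\beta^* = \arg\min_{\mathrm{rank}(\beta)=r} \operatorname{trace}\left[(Y - X\beta')(Y - X\beta')'\right]$$
$$= \arg\min_{\mathrm{rank}(\beta)=r} \Big\{\operatorname{trace}\left[(Y - X\beta_{OLS}')(Y - X\beta_{OLS}')'\right] - 2\operatorname{trace}\left[X'(Y - X\beta_{OLS}')(\beta - \beta_{OLS})\right] + \operatorname{trace}\left[X'X(\beta' - \beta_{OLS}')(\beta - \beta_{OLS})\right]\Big\}$$
$$= \arg\min_{\mathrm{rank}(\beta)=r} \|\beta - \beta_{OLS}\|_F,$$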
where $Y$ and $X$ are $(T \times N)$ and $(T \times M)$ matrices with the time series stacked vertically, $\beta_{OLS}' = (X'X)^{-1}X'Y$, and $F$ denotes the Frobenius norm: $\|A\|_F^2 = \sum_{i,j} |A_{i,j}|^2$. The above equalities repeatedly use the identity $\operatorname{trace}(AB) = \operatorname{trace}(BA)$, together with the OLS normal equations $X'(Y - X\beta_{OLS}') = 0$ (which eliminate the cross term) and the normalization $X'X = I$. As in Keller (1962), this minimization problem has solution $\beta^* = U D_r^* V'$, where $UDV'$ gives the singular value decomposition of $\beta_{OLS}$ and $D_r^*$ is the same as $D$ except that all of the singular values for $n > r$ are set to 0. This same proof applies again in the case where $\beta$ is not square, which would be the case where one assumes that only a single risk is priced (i.e., $[K^{\mathbb{P}}_{0\mathcal{P}}, K^{\mathbb{P}}_{1\mathcal{P}}] - [K^{\mathbb{Q}}_0, K^{\mathbb{Q}}_1]$ has reduced rank) rather than that only a single risk has a time-varying price, as we do here.
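The reduced-rank argument can be verified numerically under the appendix normalization $\sum_t X_tX_t' = I$: truncating the singular value decomposition of $\beta_{OLS}$ gives a rank-$r$ $\beta^*$ whose sum of squared residuals exceeds the OLS value by exactly the sum of the discarded squared singular values, the classical Eckart–Young low-rank approximation result. This is a NumPy sketch on simulated data; `reduced_rank` and `ssr` are hypothetical helpers.

```python
import numpy as np

def reduced_rank(beta_ols, r):
    """Keep the r largest singular values of beta_ols and zero the rest."""
    U, d, Vt = np.linalg.svd(beta_ols, full_matrices=False)
    d_r = np.where(np.arange(d.size) < r, d, 0.0)
    return (U * d_r) @ Vt, d

rng = np.random.default_rng(5)
T, M, N, r = 200, 3, 3, 1
X, _ = np.linalg.qr(rng.standard_normal((T, M)))      # normalized so that X'X = I
Y = rng.standard_normal((T, N))

def ssr(beta):
    """trace[(Y - X beta')(Y - X beta')'] for beta of shape (N, M)."""
    E = Y - X @ beta.T
    return np.sum(E * E)

beta_ols = (X.T @ Y).T     # (X'X)^{-1} X'Y with X'X = I, transposed to shape (N, M)
beta_r, d = reduced_rank(beta_ols, r)
```

Any other rank-$r$ candidate produces a larger sum of squared residuals, because the objective equals the OLS value plus $\|\beta - \beta_{OLS}\|_F^2$.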
References
Adrian, T., and E. Moench. 2008. Pricing the Term Structure with Linear Regressions. Staff Report No. 340,
Federal Reserve Bank of New York. http://www.ny.frb.org/research/staff reports/sr340.pdf (accessed October
25, 2010).
Ang, A., and M. Piazzesi. 2003. A No-arbitrage Vector Autoregression of Term Structure Dynamics with
Macroeconomic and Latent Variables. Journal of Monetary Economics 50:745–87.
Ang, A., M. Piazzesi, and M. Wei. 2003. What Does the Yield Curve Tell Us About GDP Growth? Working
Paper, Columbia University.
Berndt, E., B. Hall, R. Hall, and J. Hausman. 1974. Estimation and Inference in Nonlinear Structural Models. Annals of Economic and Social Measurement 3:653–65.
Campbell, J., and R. Shiller. 1991. Yield Spreads and Interest Rate Movements: A Bird’s-eye View. Review of
Economic Studies 58:495–514.
Chen, R., and L. Scott. 1993. Maximum Likelihood Estimation for a Multifactor Equilibrium Model of the Term
Structure of Interest Rates. Journal of Fixed Income 3:14–31.
. 1995. Interest Rate Options in Multifactor Cox-Ingersoll-Ross Models of the Term Structure. Journal
of Fixed Income (Winter) 53–72.
Chernov, M., and P. Mueller. 2008. The Term Structure of Inflation Expectations. Working Paper, London Busi-
ness School.
Christensen, J. H., F. X. Diebold, and G. D. Rudebusch. 2007. The Affine Arbitrage-free Class of Nelson-Siegel Term Structure Models. Working Paper, Federal Reserve Bank of San Francisco.
. 2009. An Arbitrage-free Generalized Nelson-Siegel Term Structure Model. Econometrics Journal 12:C33–
Cochrane, J., and M. Piazzesi. 2005. Bond Risk Premia. American Economic Review 95:138–60.
Collin-Dufresne, P., R. Goldstein, and C. Jones. 2008. Identification of Maximal Affine Term Structure Models.
Journal of Finance 63:743–95.
Cox, J. C., and C. Huang. 1989. Optimum Consumption and Portfolio Policies When Asset Prices Follow a
Diffusion Process. Journal of Economic Theory 49:33–83.
Dai, Q., and K. Singleton. 2000. Specification Analysis of Affine Term Structure Models. Journal of Finance
55:1943–78.
. 2002. Expectations Puzzles, Time-varying Risk Premia, and Affine Models of the Term Structure.
Journal of Financial Economics 63:415–41.
. 2003. Term Structure Dynamics in Theory and Reality. Review of Financial Studies 16:631–78.
Diebold, F., and C. Li. 2006. Forecasting the Term Structure of Government Bond Yields. Journal of Economet-
rics 130:337–64.
Duffee, G. 1996. Idiosyncratic Variation in Treasury Bill Yields. Journal of Finance 51:527–52.
. 2002. Term Premia and Interest Rate Forecasts in Affine Models. Journal of Finance 57:405–43.
. 2008. Information in (and Not in) the Term Structure. Working Paper, Johns Hopkins University.
. 2009. Forecasting with the Term Structure: The Role of No-arbitrage. Working Paper, University of
California-Berkeley.
. 2010. Sharpe Ratios in Term Structure Models. Working Paper, Johns Hopkins University.
Duffee, G., and R. Stanton. 2007. Evidence on Simulation Inference for Near Unit-root Processes with Implica-
tions for Term Structure Estimation. Journal of Financial Econometrics 6:108–42.
Duffie, D., and R. Kan. 1996. A Yield-factor Model of Interest Rates. Mathematical Finance 6:379–406.
Jardet, C., A. Monfort, and F. Pegoraro. 2009. No-arbitrage Near-cointegrated VAR(p) Term Structure Models,
Term Premiums, and GDP Growth. Working Paper, Banque de France.
Joslin, S. 2007. Pricing and Hedging Volatility in Fixed Income Markets. Working Paper, MIT.
Joslin, S., A. Le, and K. Singleton. 2010. The Conditional Distribution of Bond Yields Implied by Gaussian
Macro-finance Term Structure Models. Working Paper, Sloan School, MIT.
Joslin, S., M. Priebsch, and K. Singleton. 2010. Risk Premiums in Dynamic Term Structure Models with Un-
spanned Macro Risks. Working Paper, Stanford University.
Joslin, S., K. Singleton, and H. Zhu. 2010. Supplement to “A New Perspective on Gaussian DTSMs.” Working
Paper, Sloan School, MIT.
Nelson, C., and A. Siegel. 1987. Parsimonious Modelling of Yield Curves. Journal of Business 60:473–89.
Pearson, N. D., and T. Sun. 1994. Exploiting the Conditional Density in Estimating the Term Structure: An
Application to the Cox, Ingersoll, and Ross Model. Journal of Finance 49:1279–304.
Piazzesi, M. 2005. Bond Yields and the Federal Reserve. Journal of Political Economy 113:311–44.
Zellner, A. 1962. An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggrega-
tion Bias. Journal of the American Statistical Association 57:348–68.