Dynamic Econometric Models: Time Series Econometrics for Microeconometricians, 2011
Walter Beckert
Department of Economics
Birkbeck College, University of London
Institute for Fiscal Studies
1 Introduction
1.1 Overview
The course surveys linear and nonlinear econometric models and estimation tech-
niques, presenting them in a method of moments framework. While the models are
applicable under general assumptions on the data generating process, the emphasis
will be on applications in time series analysis.
The first part of the course treats single equation models, while the second part
is devoted to systems of equations. Starting from a review of the linear regression
model (OLS, GLS, FGLS), the course revisits basic properties of stochastic processes
and their implications for time-series regressions, cast in the form of general
autoregressive distributed lag (ARDL) and error correction model (ECM) representations.
These notes are intended as a reference guide to the material covered in the course.
The lectures will follow the notes closely, but will focus on the main principles
and results, omitting much of the intervening algebra. The presentation of the
course material rests on the kind of mathematical and statistical tools and the styles
of argument that microeconometricians are typically familiar with. The primary
objective is to provide an approach to econometric concepts in time series analysis
that appeals to the intuitive understanding of microeconometricians, not a fully
rigorous delineation of results.
2 Generalized Method of Moments Estimation
This section provides a basic review of Method of Moments estimation in the familiar
context of the linear regression model. It sets up the general framework and notation
in which the remainder of the course and these notes will proceed.
(i) EY|X [yt |xt ] = g(xt ; θ0 ) a.s. for all t, where θ0 ∈ Θ ⊂ Rk is an unknown
parameter vector, and the function g is possibly nonlinear in θ0 ; in the
special case of linearity, EY|X [yt |xt ] = x′t θ0 a.s., the linear regression
model; in the latter case, this is equivalent to EY|X [yt − x′t θ0 |xt ] = 0 a.s.
for all t;
(ii) continuing with the linear model, E_{Y|X}[(y_t − x_t′θ0)² | x_t] = σ0² > 0 a.s. for
all t, which is referred to as conditional homoskedasticity.
Note: (i) by itself does not identify θ0, unless k = 1; (ii) identifies σ0². Based on (i),
unconditional moment conditions can be derived by iterated expectations:
(i′) E_{YX}[x_t (y_t − x_t′θ0)] = 0 for all t,
i.e. k unconditional moments, which can identify θ0. Note also that (ii) holds
unconditionally as well: E_{YX}[(y_t − x_t′θ0)²] = σ0² for all t.
The idea behind Method of Moments (MOM) estimation of θ0 and σ02 is to replace
population moments by sample analogues (empirical moments, sample averages):
For any θ ∈ Θ,
moments in (i′): E_T[x_t (y_t − x_t′θ)] = (1/T) ∑_{t=1}^T x_t (y_t − x_t′θ) =: m_T(y, X; θ),
moments in (ii): E_T[(y_t − x_t′θ)²] = (1/T) ∑_{t=1}^T (y_t − x_t′θ)².
The MOM estimators θ̂_T and σ̂_T² solve the empirical analogues to (i′) and (ii):
(iii) m_T(y, X; θ̂_T) = 0,
(iv) σ̂_T² = E_T[(y_t − x_t′θ̂_T)²].
In this linear model, the MOM estimator for θ0 is equivalent to the familiar OLS
estimator: (iii) implies
(1/T) ∑_{t=1}^T x_t (y_t − x_t′θ̂_T) = 0
((1/T) ∑_t x_t x_t′) θ̂_T = (1/T) ∑_t x_t y_t
E_T[x_t x_t′] θ̂_T = E_T[x_t y_t]
θ̂_T = (E_T[x_t x_t′])^{−1} E_T[x_t y_t] = (∑_t x_t x_t′)^{−1} ∑_t x_t y_t = (X′X)^{−1} X′y = θ̂_OLS,
provided rk(X′X) = k. Hence, θ̂_T is conditionally unbiased: E[θ̂_T | X] = θ0. Its con-
ditional variance is var(θ̂T |X) = (X′ X)−1 X′ var(y|X)X(X′ X)−1 ; provided that the
yt are conditionally independent across t, i.e. that var(y|X) = σ02 IT , the conditional
variance of the MOM estimator reduces to var(θ̂T |X) = σ02 (X′ X)−1 . In this case,
the MOM estimator enjoys all the properties of the OLS estimator, a direct conse-
quence of the Gauss-Markov Theorem, which rests entirely on conditional moment
assumptions: Suppose E_{Y|X}[y|X] = Xθ0 = [x_t′θ0]_{t=1,··· ,T}, and var(y|X) = σ0² I_T,
σ02 > 0; then θ̂T is the best linear unbiased estimator (BLUE), i.e. it is efficient (in
the sense of having minimum variance among all linear, unbiased estimators of θ0 ).
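As a concrete illustration, the following is a minimal sketch in Python/NumPy (the language used for all code sketches in these notes) of the MOM/OLS computation on simulated data; the design and all names are illustrative, not part of the course material.

    import numpy as np

    rng = np.random.default_rng(0)
    T, k = 200, 3
    X = rng.normal(size=(T, k))            # regressors x_t stacked into X
    theta0 = np.array([1.0, -0.5, 2.0])    # true parameter vector
    y = X @ theta0 + rng.normal(size=T)    # conditional homoskedasticity, sigma_0^2 = 1

    # MOM/OLS: solve the k empirical moment conditions E_T[x_t (y_t - x_t' theta)] = 0
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    resid = y - X @ theta_hat
    s2 = resid @ resid / (T - k)             # unbiased estimator of sigma_0^2
    var_theta = s2 * np.linalg.inv(X.T @ X)  # estimated conditional variance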
The MOM estimator of the variance satisfies σ̂_T² = ((T − k)/T) s_T², where
s_T² = (y − Xθ̂_T)′(y − Xθ̂_T)/(T − k) is the (unbiased) OLS estimator of σ0². This
implies that the MOM estimator σ̂_T² is biased in small samples (finite T).
Suppose that the previous population orthogonality conditions between x_t and the resid-
uals y_t − x_t′θ0 do not hold, but that, for some vector of instruments z_t with dim(z_t) = m ≥ k,
E_{YX}[z_t (y_t − x_t′θ0)] = 0 for all t,   (1)
where Z = (z_1, · · · , z_T)′. If m > k, the matrix Z′X is
not square. Let P_Z = Z(Z′Z)^{−1}Z′, the orthogonal projector onto the column space of
Z, col(Z); recall that orthogonal projectors are idempotent and symmetric. Then,
orthogonality of y − Xθ0 and Z according to (1) implies orthogonality of y − Xθ0
and P_ZX, so that
θ̂_2SLS = (X′P_ZX)^{−1} X′P_Zy = (X̂′X̂)^{−1} X̂′y,
where X̂ = P_ZX are the fitted values from the regression of the columns of X
onto Z, i.e. only those components of X which are orthogonal to y − Xθ0 accord-
ing to (1).
ing to (1). The conditional variance of the 2SLS estimator is var(θ̂2SLS |X, Z) =
σ0² (X′P_ZX)^{−1}. Note that, if dim(z_t) = k and rk(Z′X) = k, then (X′P_ZX)^{−1} =
(X′Z(Z′Z)^{−1}Z′X)^{−1} = (Z′X)^{−1} Z′Z (X′Z)^{−1} = (Z′X)^{−1} Z′Z ((Z′X)^{−1})′, i.e. the con-
ditional variance collapses to that of the IV estimator. Note also, for future
reference, that the conditional variance of the IV moment functions is
var(Z′(y − Xθ0) | X, Z) = σ0² Z′Z.
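A corresponding sketch of the 2SLS computation, under an assumed (purely illustrative) endogeneity structure in simulated data:

    import numpy as np

    rng = np.random.default_rng(1)
    T, m = 500, 2
    Z = rng.normal(size=(T, m))              # instruments, here m > k = 1
    v = rng.normal(size=T)
    x = Z @ np.array([1.0, 0.5]) + v         # regressor correlated with the error via v
    u = 0.8 * v + rng.normal(size=T)
    y = 2.0 * x + u
    X = x.reshape(-1, 1)

    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # orthogonal projector onto col(Z)
    Xhat = PZ @ X                            # first-stage fitted values
    theta_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)   # (X'PZ X)^{-1} X'PZ y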
Suppose, as at the outset, that E_{YX}[x_t (y_t − x_t′θ0)] = 0 for all t, but var(y|X) = Ω,
a positive definite, symmetric T × T matrix. This change in the second moment as-
sumptions can be expected to affect the second moment properties of the OLS/MOM
estimator θ̂T , i.e. its conditional variance-covariance matrix and, thereby, its effi-
ciency.
As before, the moment conditions involving the first moments yield the OLS/MOM
estimator for θ0, θ̂_T = (X′X)^{−1}X′y. The second moment assumptions, however, now
imply
var(θ̂_T | X) = (X′X)^{−1} X′ΩX (X′X)^{−1}.
The Gauss-Markov Theorem implies that, while θ̂_T is still conditionally unbiased, it
is no longer efficient. Note also: the conditional variance of the moment functions
is now var(X′(y − Xθ0) | X) = X′ΩX.
Weighting the moment conditions by the inverse of Ω yields the GLS estimator,
θ̂_GLS = (X′Ω^{−1}X)^{−1} X′Ω^{−1}y, with var(θ̂_GLS | X) = (X′Ω^{−1}X)^{−1}.
The GLS estimator above is only feasible if Ω is known. If it is not known, it
needs to be estimated, based on first-stage residuals obtained from consistent, but
inefficient, OLS estimation of θ0. Once a consistent estimator Ω̂_T is obtained, θ0 can
be re-estimated in a second step, using Ω̂_T in lieu of Ω:
θ̂_FGLS = (X′Ω̂_T^{−1}X)^{−1} X′Ω̂_T^{−1}y.
This line of reasoning suggests that it is generally beneficial (in the sense of efficiency)
to weight moment functions by the inverses of their conditional variances. The Generalized
Method of Moments (GMM) proceeds in this fashion.[1]
To illustrate this, re-consider the instrumental variable set-up above, with dim(z_t) =
m ≥ k and var(y|X, Z) = σ0² I_T. In this case, as shown above, the moment functions
z_t (y_t − x_t′θ0) have conditional variance var(Z′(y − Xθ0) | X, Z) = σ0² (Z′Z). For a
positive definite, symmetric weighting matrix Σ, define
θ̂_GMM = arg min_{θ∈Θ} E_T[Z′(y − Xθ)]′ Σ E_T[Z′(y − Xθ)].
The first-order conditions of the minimization problem define the GMM estimator
θ̂_GMM; in this case,
(y − Xθ̂_GMM)′ Z Σ Z′X = 0 ⇒ θ̂_GMM = (X′ZΣZ′X)^{−1} X′ZΣZ′y,
[1] Hansen, L.P. (1982): “Large Sample Properties of Generalized Method of Moments Estimators”, Econometrica, 50(4), 1029-1054; and Hansen, L.P. and K.J. Singleton (1982): “Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models”, Econometrica, 50(5), 1269-1286.
with conditional variance
var(θ̂_GMM | X, Z; Σ) = (X′ZΣZ′X)^{−1} X′ZΣZ′ (σ0² I_T) ZΣZ′X (X′ZΣZ′X)^{−1}
= σ0² (X′ZΣZ′X)^{−1} X′ZΣZ′ZΣZ′X (X′ZΣZ′X)^{−1}.
Setting Σ = (Z′Z)^{−1}, the inverse of the conditional variance of the moment functions
(up to the factor σ0²), this variance reduces to σ0² (X′P_ZX)^{−1}, the 2SLS variance.
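A sketch of the GMM computation for a given weighting matrix Σ; with Σ = (Z′Z)^{−1} it coincides with 2SLS, consistent with the variance result above. The simulated design is again purely illustrative.

    import numpy as np

    rng = np.random.default_rng(2)
    T, m = 400, 3
    Z = rng.normal(size=(T, m))
    x = Z @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=T)
    y = 2.0 * x + rng.normal(size=T)
    X = x.reshape(-1, 1)

    # theta_GMM = (X'Z Sigma Z'X)^{-1} X'Z Sigma Z'y
    Sigma = np.linalg.inv(Z.T @ Z)   # optimal weight under homoskedasticity (up to sigma_0^2)
    A = X.T @ Z @ Sigma @ Z.T
    theta_gmm = np.linalg.solve(A @ X, A @ y)   # identical to 2SLS for this Sigma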
The Hausman test examines the consistency of MOM estimators in the face of
possible failures of moment conditions.[2]
Suppose θ̃_T and θ̂_T are two estimators of θ0, obtained on the basis of different
assumptions about valid moment restrictions; e.g. θ̃_T uses moments beyond those
used by θ̂_T. The null hypothesis H0 is that both θ̂_T and θ̃_T are √T-consistent; i.e.,
in the example, that the additional moments are valid, so that θ̃_T is relatively more
efficient than θ̂_T. Under H0,
√T(θ̂_T − θ̃_T) →d N(0, V_D),
for some asymptotic variance-covariance matrix V_D, which may be singular. The
alternative hypothesis HA implies that lim_{T→∞} Pr(|θ̂_T − θ̃_T| > ϵ) > 0 for some ϵ > 0.
The Hausman test statistic takes the usual quadratic form
H_T = T (θ̃_T − θ̂_T)′ V̂_D^− (θ̃_T − θ̂_T),
where V̂_D^− is a consistent estimator of the (generalized) inverse of V_D. Under the
null hypothesis, its asymptotic distribution is χ² with degrees of freedom equal to
the number of restrictions imposed by the null hypothesis; with k testable moment
conditions, the null hypothesis of their validity is rejected at the α-level if
H_T > χ²_k(1 − α). Note that it follows from the orthogonality of relatively
efficient estimators that, under the null hypothesis,
cov(β̂_OLS, β̂_OLS − β̂_IV/2SLS) = 0
⇒ var(β̂_OLS) = cov(β̂_OLS, β̂_IV/2SLS).
Hence,
V_D = var(β̂_OLS − β̂_IV/2SLS) = var(β̂_IV/2SLS) − var(β̂_OLS) = σ0² [(X′P_ZX)^{−1} − (X′X)^{−1}].
[2] Hausman, J.A. (1978): “Specification Tests in Econometrics”, Econometrica, 46(6), 1251-1271.
A convenient fact often facilitates the computation of the Hausman test statistic
H_T. A consequence of the (conditional) orthogonality between a relatively efficient
estimator and its difference from other consistent, but inefficient, estimators is that
the (conditional) covariance between such estimators equals the variance of the
efficient estimator. Hence, if θ̃_T is efficient relative to θ̂_T, then
V_D = avar(√T(θ̂_T − θ̃_T)) = avar(√T(θ̂_T − θ0)) − avar(√T(θ̃_T − θ0)).
As an example, consider the linear, homoskedastic model and let X = [X1, X2],
where X1 consists of exogenous covariates, while X2 is suspected of lack of exo-
geneity. In other words, the validity of the set of unconditional moment conditions
E[X2′(y − X1θ1 − X2θ2)] = 0 is in doubt. Let W be an array of instruments for
X2 in case X2 is endogenous, and let Z = [X1, W] denote the array of all in-
struments (i.e. the columns of X1 act as instruments for themselves). Then, the
null hypothesis H0 is that X2 is exogenous, while the alternative hypothesis HA
is that X2 is not exogenous. Under H0, the Gauss-Markov Theorem implies that
the OLS estimator for θ0 = (θ1′, θ2′)′, θ̂_OLS, is efficient; its asymptotic distribution,
conditional on X, is √T(θ̂_OLS − θ0) →d N(0, σ0² (X′X)^{−1}). Under HA, θ̂_OLS is in-
consistent, but the 2SLS estimator θ̂_2SLS is consistent; its asymptotic distribution,
conditional on X and Z, is √T(θ̂_2SLS − θ0) →d N(0, σ0² (X′P_ZX)^{−1}). Since the OLS
and 2SLS estimators are (conditionally) orthogonal under the null hypothesis, their
conditional covariance matrix is zero under H0. Hence, conditional on X and Z,
√T(θ̂_OLS − θ̂_2SLS) →d N(0, σ0² ((X′P_ZX)^{−1} − (X′X)^{−1})),
and the Hausman-Wu test statistic takes the form
H_T = (θ̂_OLS − θ̂_2SLS)′ [σ̂_T² ((X′P_ZX)^{−1} − (X′X)^{−1})]^− (θ̂_OLS − θ̂_2SLS),
where σ̂_T² is an estimator of σ0² based on either the OLS or 2SLS regression residuals.
This test is referred to as the Hausman-Wu Exogeneity Test.
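A sketch of the Hausman-Wu computation, following the formulas above; the pseudo-inverse accommodates the possible singularity of the variance of the contrast.

    import numpy as np

    def hausman_wu(y, X, Z):
        th_ols = np.linalg.solve(X.T @ X, X.T @ y)
        PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
        th_2sls = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)
        u = y - X @ th_ols        # sigma_0^2 from OLS residuals (2SLS residuals work too)
        s2 = u @ u / len(y)
        V = s2 * (np.linalg.inv(X.T @ PZ @ X) - np.linalg.inv(X.T @ X))
        d = th_2sls - th_ols
        return d @ np.linalg.pinv(V) @ d   # compare to chi^2 critical values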
An equivalent regression-based implementation augments the structural equation
with the reduced-form residuals,
y = X1θ1 + X2θ2 + ûγ + e, H0: γ = 0,
where û is the set of the vectors of fitted residuals from the reduced form regressions
of the hypothesized endogenous RHS variables onto all exogenous variables. This
hypothesis can be tested using a t-test if X2 ∈ R (i.e. dim(col(X2)) = 1), and an
F-test otherwise.
Another test of the validity of moment conditions can be based on the GMM criterion
function. When the parameter vector of interest θ0 is exactly identified under the
alternative hypothesis and over-identified under the null hypothesis, GMM
moment tests are called tests of over-identifying restrictions. Let E_{YX}[m(y_t, x_t; θ0)] =
0 denote the r population moment conditions under the null hypothesis, where
dim(θ0) = k and r > k, i.e. there are r − k over-identifying restrictions. The
empirical analogues to the population moment functions are E_T[m(y_t, x_t; θ0)]. Let
Σ̂⋆_T be (a consistent estimator of) the (optimal) GMM weighting matrix Σ⋆, and let
θ̂_GMM be the GMM estimator of θ0. The minimized, second round GMM criterion
function
J_T = T E_T[m(y_t, x_t; θ̂_GMM)]′ Σ̂⋆_T E_T[m(y_t, x_t; θ̂_GMM)]
then serves as a test statistic for the validity of the over-identifying moment con-
ditions. This particular test statistic is referred to as the Sargan-Hansen (1982)
J-test.[4] Its asymptotic distribution, as T → ∞, is χ²_{r−k}, and the test rejects the null
hypothesis when the statistic exceeds the critical value of a χ²_{r−k} random variable
for the appropriate test size. This does not permit any inference about which of the
moment conditions is invalid, however.
In the case of the example in the preceding subsection, the Sargan-Hansen J test
statistic of the null hypothesis that Z is a valid array of instruments is
J_T = (y − Xθ̂_2SLS)′ Z [Z′(I − P_X)Z]^{−1} Z′(y − Xθ̂_2SLS) / σ̂_T²,
and its asymptotic distribution under the null hypothesis is also χ2r−k ; see Appendix
for details. Note that, in general, the Hausman-Wu test requires estimation under
both the null and the alternative hypotheses, while the Sargan-Hansen J test only
requires estimation under the null hypothesis.
[4] Hansen, L.P. (1982): “Large Sample Properties of Generalized Method of Moments Estimators”, Econometrica, 50(4), 1029-1054.
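For the linear IV model, a standard and numerically simple form of the J statistic is J_T = û′P_Zû/σ̂_T², computed from the 2SLS residuals; the sketch below uses this form, which serves the same purpose as the expression above.

    import numpy as np

    def sargan_j(y, X, Z):
        PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
        theta = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)
        u = y - X @ theta             # 2SLS residuals
        s2 = u @ u / len(y)
        return (u @ PZ @ u) / s2      # asymptotically chi^2 with r - k d.o.f. under H0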
Broadly speaking, the case of weak instruments refers to a situation in which the
correlation between the endogenous variable and its instrument(s) is low. The treat-
ment of situations with weak instruments is an area of active current research.[5] In
the case of a single endogenous variable x2, a test for the weakness of instruments,
due to Bound et al. (1995),[6] is a partial R², denoted by R_p², that isolates the impact
of the instruments on the endogenous variable, after eliminating the effect of the
other exogenous variables on the latter. The statistic Rp2 is given by the R2 of the
regression
x2 − x̂2 = (z − ẑ)′ δ + ν,
where x̂2 = PX1 x2 and ẑ = PX1 z. When Rp2 is low, then z is considered an array of
weak instruments for x2 .
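A sketch of the partial R² computation; X1 collects the included exogenous variables (it should contain the constant) and Z_excl the excluded instruments. All names are illustrative.

    import numpy as np

    def partial_r2(x2, Z_excl, X1):
        P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
        x_res = x2 - P1 @ x2           # x2 - x2_hat: purge the other exogenous variables
        Z_res = Z_excl - P1 @ Z_excl   # z - z_hat
        delta = np.linalg.lstsq(Z_res, x_res, rcond=None)[0]
        e = x_res - Z_res @ delta
        return 1 - (e @ e) / (x_res @ x_res)   # low values indicate weak instruments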
[5] For a recent survey, see Stock and Yogo (2002), NBER Technical Working Paper 284.
[6] Bound, J., Jaeger, D.A., and R.M. Baker (1995): “Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable Is Weak”, Journal of the American Statistical Association, 90(430), 443-450.
Various tests for model selection have been proposed in the literature, but none
is entirely satisfactory. In regression models, the regression R² = 1 − û′û/y′y is
often considered, where û is the vector of fitted residuals. Superior models exhibit
larger R² (equivalently, smaller residual sums of squares). This measure does not
require distributional assumptions and, hence, is embedded in the method of
moments framework. Alternatively, under distri-
butional assumptions, measures based on the log-likelihood are available and have
some information theoretic interpretation. The Akaike information criterion (AIC)
adjusts the sample log-likelihood at the MLE θ̂ for model j, lT (θ̂(j) ), for the number
of estimated parameters, k_j = dim(θ^{(j)}), so that AIC_j = −2l_T(θ̂^{(j)}) + 2k_j. Under
normality assumptions, the AIC reduces to AIC_j = 2k_j + T ln(û_j′û_j/T), where û_j
is the vector of fitted residuals of model j. The Schwarz Bayesian information or
posterior odds criterion (SBC), in addition, accounts for sample size T and is de-
fined as SBC_j = −2l_T(θ̂^{(j)}) + k_j ln(T). The SBC is a closely related variant of the
Bayes Information Criterion (BIC), defined here as BIC = SBC/T. Under nor-
mality assumptions, the BIC reduces to BIC_j = ln(û_j′û_j/T) + k_j ln(T)/T. Models
with lower information criteria are deemed superior. In comparison to AIC, the
SBC/BIC criterion tends to choose more parsimonious models. Many practitioners
also test the goodness-of-fit in terms of the accuracy of out-of-sample prediction.
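A sketch of the normality-based information criteria defined above, computed from the fitted residuals of a candidate model j:

    import numpy as np

    def info_criteria(resid, k, T):
        s = np.log(resid @ resid / T)   # ln(u'u/T)
        aic = 2 * k + T * s             # AIC_j
        sbc = k * np.log(T) + T * s     # SBC_j
        return aic, sbc, sbc / T        # BIC = SBC/T; lower values are preferred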
1. Structural Stability: Tests for structural stability examine whether the pa-
rameters to be estimated are constant over the sampling period (the null hy-
pothesis). Considering a simple linear regression model, under the alternative
hypothesis,
y_t = x_t′θ1 + ϵ_t, t = 1, · · · , T1,
y_t = x_t′θ2 + ϵ_t, t = T1 + 1, · · · , T.
Denoting by ϵ̂ the residuals of the pooled regression and by ϵ̂1, ϵ̂2 those of the
two subperiod regressions, the Chow breakpoint test statistic is
C_T = [(ϵ̂′ϵ̂ − (ϵ̂1′ϵ̂1 + ϵ̂2′ϵ̂2))/k] / [(ϵ̂1′ϵ̂1 + ϵ̂2′ϵ̂2)/(T − 2k)] ∼ F_{k,T−2k}.
This test requires that the variances of the residuals ϵ_t be the same in both
subperiods. This can be tested using the Goldfeld-Quandt test
GQ_T = s1²/s2² = [ϵ̂1′ϵ̂1/(T1 − k)] / [ϵ̂2′ϵ̂2/(T2 − k)] ∼ F_{T1−k,T2−k},
where the larger variance estimate should form the numerator so that the
statistic is greater than unity. Chow also suggested a test for predictive failure
for the case when T2 < k,
C̃_T = [(ϵ̂′ϵ̂ − ϵ̂1′ϵ̂1)/T2] / [ϵ̂1′ϵ̂1/(T1 − k)] ∼ F_{T2,T1−k}.
2. Functional Form: The Ramsey RESET test amounts to running a second-stage
regression of ϵ̂_t on x_t and the squared predicted dependent variable ŷ_t², and
testing whether the coefficient on ŷ_t² is zero, using a t-test. A numerically
equivalent implementation of the test uses y_t in lieu of ϵ̂_t in the second-stage
regression. Higher powers of ŷ_t can be included to test for further degrees of
curvature, using F-tests.
4. Serial Correlation: Consider the model
y_t = x_t′θ0 + ϵ_t,
ϵ_t = ρϵ_{t−1} + ν_t,
where ν_t is white noise, i.e. serially uncorrelated with mean zero and
constant variance. If θ0 were estimated by OLS, then the estimated residuals
would be
ϵ̂_t = y_t − x_t′θ̂ = x_t′(θ0 − θ̂) + ρϵ_{t−1} + ν_t.
The hypothesis H0: ρ = 0 can then be examined by regressing ϵ̂_t on ϵ̂_{t−1}
(and x_t) and testing the significance of the coefficient on ϵ̂_{t−1},
using a t-test. Testing against the alternative hypothesis of higher-order serial
correlation in the process for ϵ_t can be done analogously by including further
lags of ϵ̂_t and testing that their coefficients are jointly equal to zero, using an
F-test.
It should be noted that cases (3.) and (4.) do not impede the usual first-
moment properties of the OLS estimator for θ0 (unbiasedness, consistency),
because they pertain to second-moment assumptions. But the conditional
variance-covariance matrix of θ̂ is no longer σ0²(X′X)^{−1}; it is estimated by the
sandwich formula
var̂(θ̂_OLS | X) = (X′X)^{−1} X′Ω̂X (X′X)^{−1},
where Ω̂ is an estimator of var(y|X) that accommodates the heteroskedasticity
and/or serial correlation.
5. Influential Observations: An influential observation is a data point that is
crucial to inferences drawn from the data. While the various approaches
described here provide quantitative measures of the statistical influence of an
observation, it is important to keep in mind, however, that only knowledge of
the subject matter and the data itself can determine whether this influence is
substantively informative or merely due to data reporting error.
Consider the linear regression model in which the k × k matrix X′X has full
rank. Define the orthogonal projection matrix H = X(X′X)^{−1}X′. Then,
Ŷ_t = H_tt Y_t + ∑_{s≠t} H_ts Y_s,
so that the leverage H_tt measures the weight of observation t in its own fitted
value; inspecting the leverages can be highly effective for picking up single
outliers or influential observations. Another jackknife[7] measure of the influence
of an observation on the joint inference regarding θ0 is given by Cook's distances
CD_t = (θ̂_{−t} − θ̂)′ (X′X) (θ̂_{−t} − θ̂) / (k s²), t = 1, · · · , T,
which can be compared to an F-distribution to estimate the percentage influ-
ence of Y_t on θ̂.
[7] The idea of the jackknife is due to Tukey. Based on the “leave one out” estimates θ̂_{−t}, t = 1, · · · , T, the random variables T θ̂ − (T − 1)θ̂_{−t} may be treated as i.i.d. estimates of θ0. They provide an effective way to obtain a sampling distribution of θ̂ without recourse to asymptotic arguments and as an alternative to the bootstrap.
Best econometric practice usually derives an estimable statistical model from an
underlying economic model or theory that rationalizes the data generating process.
It is important to recognize that, while the various goodness-of-fit measures and
diagnostic tests may be generally useful statistical tools for specification testing
and model selection, when they fail to support the estimated model they do not
provide any guidance as to how to adjust the model because they are not linked
to the economic model. Failures of these tests, therefore, may be indicative of
a misspecified economic model and suggest a re-examination at that level of the
econometric analysis.
Let E[y_t − x_t′θ0 | x_t] = E[ϵ_t | x_t] = 0 for all t, but suppose that
ϵ_t = y_t − x_t′θ0 = u_t + αu_{t−1},
where
u_{t−s} | x_t ∼ i.i.d., E[u_{t−s} | x_t] = 0, E[u_{t−s}² | x_t] = σ0², s = 0, 1, · · · ,
so that
E[ϵ_t ϵ_s | X] = σ0² ((1 + α²) 1{t = s} + α 1{|t − s| = 1})
for any t, where 1{A} is an indicator function taking value 1 if the event A occurs,
and zero otherwise. Hence, the conditional second moment matrix of the residuals
is tridiagonal,
var(y − Xθ0 | X) = σ0² ·
[ 1 + α²   α        0        · · ·
  α        1 + α²   α        · · ·
  0        α        1 + α²   ⋱
  ⋮        · · ·    ⋱        ⋱ ] =: Ω.
3.1.2 Estimation
Let X = y_− = (y0, · · · , y_{T−1})′. Note that y − Xθ0 = y − y_−ρ0 = [y_t −
ρ0 y_{t−1}]_{t=1,··· ,T}. While y − Xθ0 | X involved T random variables with non-degenerate
distributions, its analogue in the AR(1) model is the vector y − y_−ρ0 | y_−; but this
involves T − 1 constants (since it is conditioned on y_−), and only y_T − ρ0 y_{T−1} | y_− =^d
y_T − ρ0 y_{T−1} | y_{T−1} has a non-degenerate distribution. Therefore, in the case of au-
toregressive processes, the joint distribution of the vector y conditional on initial
conditions (i.e. on y0 in the case of an AR(1); on (y0, · · · , y_{−p+1}) in the case of an
AR(p), for integer p) needs to be determined.
By recursive substitution,
y_t = ρ0 y_{t−1} + ϵ_t = ρ0 (ρ0 y_{t−2} + ϵ_{t−1}) + ϵ_t = · · · = ρ0^t y0 + ∑_{s=0}^{t−1} ρ0^s ϵ_{t−s}.
From this representation, E[y_t | y0] = ρ0^t y0, and
cov(y_t, y_s | y0) = σ0² ρ0^{|t−s|} ∑_{τ=0}^{min{t,s}−1} ρ0^{2τ} = σ0² ρ0^{|t−s|} (1 − ρ0^{2 min{t,s}}) / (1 − ρ0²).
Note that both first and second conditional moments depend on t. Without
further restrictions, this would imply that any MOM estimator of ρ0 (OLS, FGLS)
would depend on t as well, which is inconsistent with the notion of ρ0 being a
time-invariant population parameter. This problem could only be overcome if the
unconditional moments did not depend on t. Regarding the first unconditional
moments, by iterated expectations,
E[y_t] = ρ0^t E[y0],
which is independent of t if, and only if, E[y0] = E[y_t] = 0 for all t. Regarding the
second unconditional moments,
var(y_t) = ρ0^{2t} var(y0) + σ0² (1 − ρ0^{2t})/(1 − ρ0²),
which is independent of t if, and only if, var(y0) = σ0²/(1 − ρ0²), requiring |ρ0| < 1.
Assuming (covariance) stationarity, i.e. |ρ0| < 1, the above results on the mo-
ments of the stationary distribution can now be obtained more easily: discarding
the trivial case ρ0 = 0, for the first moments, for any t, E[y_t] = ρ0 E[y_{t−1}] = ρ0 E[y_t],
so that E[y_t] = 0; for the second moments, var(y_t) = ρ0² var(y_t) + σ0², so that
var(y_t) = σ0²/(1 − ρ0²).
3.2.2 Estimation
Denote the characteristic polynomial in the lag operator L of the AR(1) process by
Φ(L) = 1 − ρ0 L, so that Φ(L)y_t = ϵ_t.[8] It is necessary and sufficient for the AR(1) to
be stationary that the root z of the characteristic equation Φ(z) = 0 lie outside
the unit circle, i.e. that |z| = 1/|ρ0| > 1, which is equivalent to the previous condition
for covariance stationarity. In the case ρ0 = 1, the process has a unit root,
y_t = y_{t−1} + ϵ_t;
if ϵ_t is also i.i.d., then this is a random walk.
Notice that its first difference, y_t − y_{t−1} = ϵ_t, is stationary. Hence, in the case of
ρ0 = 1, the process {y_t, t ≥ 0} is said to be difference-stationary, or integrated of
order 1, denoted by I(1). In this notation, the covariance stationary case is denoted
by I(0).
[8] The lag operator L is defined by Ly_t = y_{t−1}.
W.l.o.g. let y0 = 0 a.s. for the remainder of this section. Then,
ρ̂_T − 1 = (∑_{t=1}^T y_{t−1}²)^{−1} ∑_{t=1}^T y_{t−1} ϵ_t.
Under the unit root null,
E[y_{t−1}² | y0 = 0] = ∑_{s=1}^{t−1} E[ϵ_s²] = (t − 1)σ0² a.s.,
so that a.s.[9]
E[∑_{t=1}^T y_{t−1}² | y0 = 0] = ∑_{t=1}^T (t − 1)σ0² ≈ σ0² ∫_1^T (t − 1) dt ∝ σ0² T²,
i.e. ∑_{t=1}^T y_{t−1}² = Op(T²).
Similarly, E[y_{t−1}ϵ_t] = E[y_{t−1} E[ϵ_t | y_{t−1}]] = 0, and, since y_{t−1}ϵ_t = ½(y_t² − y_{t−1}² − ϵ_t²),
E[∑_{t=1}^T y_{t−1}ϵ_t | y0 = 0] = E[½(y_T² − y0²) − ½ ∑_{t=1}^T ϵ_t² | y0 = 0]
= E[½((∑_{t=1}^T ϵ_t)² − ∑_{t=1}^T ϵ_t²) | y0 = 0]
= E[∑_{s<t} ϵ_s ϵ_t | y0 = 0] = 0 a.s.,
so that a.s.
E[(∑_{t=1}^T y_{t−1}ϵ_t)² | y0 = 0] = E[(∑_{s<t} ϵ_s ϵ_t)² | y0 = 0] = ½ T(T − 1)σ0⁴ = Op(T²).
[9] This section uses the Mann-Wald notation: a random variable w_T = Op(T^α) if, for any δ > 0, there exists M > 0 such that Pr(|T^{−α}w_T| > M) < δ for all T; w_T = op(T^α) if Pr(|T^{−α}w_T| > δ) → 0 for every δ > 0, as T → ∞.
This suggests that
ρ̂_T − 1 = Op(T²)^{−1} Op(T) = Op(T^{−1}),
i.e. that, in the unit root case, T(ρ̂_T − 1) = Op(1). This is to be compared to
the stationary case, in which √T(ρ̂_T − ρ0) = Op(1), with asymptotic distribution
N(0, 1 − ρ0²). The preceding argument makes clear why the asymptotic variance
of this distribution collapses in the unit root case as ρ0 → 1: in the unit root
case, ρ̂_T converges to ρ0 = 1 at rate T^{−1}, i.e. faster than T^{−1/2}, the reason being
that ∑_t y_{t−1}² = Op(T²) in the non-stationary case, while ∑_t y_{t−1}² = Op(T) in the
stationary case. In the stationary case, |ρ0| < 1,
ρ̂_T − ρ0 = ((1/T) ∑_{t=1}^T y_{t−1}²)^{−1} (1/T) ∑_{t=1}^T y_{t−1}ϵ_t;
it follows from a LLN that (1/T) ∑_{t=1}^T y_{t−1}² → E[y_t²] = σ0²/(1 − ρ0²), while
var((1/T) ∑_{t=1}^T y_{t−1}ϵ_t) = (1/T) σ0⁴/(1 − ρ0²). Therefore, by a CLT,
(1/√T) ∑_{t=1}^T y_{t−1}ϵ_t →d N(0, σ0⁴/(1 − ρ0²)),
so that
√T(ρ̂_T − ρ0) →d N(0, 1 − ρ0²),
i.e. √T(ρ̂_T − ρ0) = Op(1).
The OLS estimator of β0 = ρ0 − 1 in the re-parameterized regression
∆y_t = β0 y_{t−1} + ϵ_t
is
β̂_T = β0 + (∑_t y_{t−1}²)^{−1} ∑_t y_{t−1}ϵ_t.
The preceding discussion shows that this estimator converges to β0 < 0 at rate √T
if the process {y_t, t ≥ 0} is I(0), and it converges to β0 = 0 at rate T if {y_t, t ≥ 0}
is I(1). This is the basis for the Dickey-Fuller unit root test, which tests the null
hypothesis of a unit root (equivalent to β0 = 0) against the alternative hypothesis
of stationarity (equivalent to β0 < 0). The Dickey-Fuller test statistic[10] is
DF_T = β̂_T / se(β̂_T).
The Dickey-Fuller test statistic has a non-standard (Dickey-Fuller) distribution un-
der the null hypothesis. This distribution depends both on the estimated model and
the true data generating process; e.g. the critical value of this test in this model
with 5 percent probability of rejecting a true null hypothesis is approximately -2.9,
while it would be around -2 for a standard one-sided t-test.
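A sketch of the Dickey-Fuller computation for the no-constant, no-trend specification above, on a simulated random walk so that the null is true. In applied work, DF_T must be compared to tabulated Dickey-Fuller critical values for the chosen specification, e.g. as reported by statsmodels' adfuller.

    import numpy as np

    rng = np.random.default_rng(3)
    T = 250
    y = np.cumsum(rng.normal(size=T))   # random walk: beta_0 = 0

    dy, ylag = np.diff(y), y[:-1]
    beta = (ylag @ dy) / (ylag @ ylag)  # OLS in  dy_t = beta_0 y_{t-1} + eps_t
    e = dy - beta * ylag
    se = np.sqrt((e @ e / (len(dy) - 1)) / (ylag @ ylag))
    DF = beta / se      # compare to Dickey-Fuller, not standard normal, critical values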
3.2.4 Extensions
Consider the AR(1) model with deterministic trend,
y_t = α0 + ρ0 y_{t−1} + γ0 t + ϵ_t,
where ϵ_t is white noise, i.e. i.i.d. across t with mean zero and constant variance.
The model can be re-parameterized as before, for β0 = ρ0 − 1,
∆y_t = α0 + β0 (y_{t−1} − δ0 t) + ϵ_t,
where δ0 = γ0/(1 − ρ0). In practice, the unrestricted regression
∆y_t = α + βy_{t−1} + γt + ϵ_t
is run, and the Dickey-Fuller test statistic is, as before, DF_T = β̂_T/se(β̂_T); but the
distribution of this test statistic differs from the one above, because a deterministic
trend is included in the regression.
[10] Dickey, D.A. and W.A. Fuller (1979): “Distribution of the Estimators for Autoregressive Time Series with a Unit Root”, Journal of the American Statistical Association, 74, 427-431.
Consider next an AR(2) with deterministic trend, y_t = α0 + ρ01 y_{t−1} + ρ02 y_{t−2} +
γ0 t + ϵ_t. In this model, the characteristic polynomial in the lag operator is Φ(L) = 1 − ρ01 L −
ρ02 L², and stationarity requires that the roots of Φ(z) = 1 − ρ01 z − ρ02 z² = 0 lie
outside the unit circle. Conversely, the process has a unit root if the characteristic
equation permits z = 1 as a solution, i.e. if 1 − ρ01 − ρ02 = 0. In this case, a
re-parametrization suitable for testing the hypothesis of a unit root is
∆y_t = α0 + β0 (y_{t−1} − δ0 t) − ρ02 ∆y_{t−1} + ϵ_t,
where β0 = ρ01 + ρ02 − 1 is zero under the null hypothesis, and δ0 = γ0/(1 − ρ01 − ρ02).
Running this regression and testing H0: β0 = 0 yields an Augmented Dickey-Fuller
(ADF) test. Again, the Dickey-Fuller test statistic has a different distribution under the
null hypothesis, because of the presence of the lagged difference ∆y_{t−1}. Notice that,
if the AR(2) process is the true data generating process, but ∆y_{t−1} were omitted
in the Dickey-Fuller regression, then this omission would induce serial correlation
in the estimated residuals: the regression residuals in the mis-specified regression
estimate −ρ02 ∆y_{t−1} + ϵ_t, and these terms are serially correlated, because the y_t s are correlated.
All of this generalizes to AR(p) processes, with and without deterministic trend,
where p is a positive integer. The relevant re-parametrization of an AR(p), without
deterministic trend, becomes
∆y_t = α0 + β0 y_{t−1} + ∑_{s=1}^{p−1} δ0s ∆y_{t−s} + ϵ_t,
where
β0 = ρ01 + · · · + ρ0p − 1,
δ0s = −(ρ0,s+1 + · · · + ρ0p), for s = 1, · · · , p − 1.
To see this, define ρ(L) = ρ01 L + · · · + ρ0p L^p − L and δ(L) = δ01 L + · · · + δ0,p−1 L^{p−1},
and notice that
∆y_t = α0 + ρ(L)y_t + ϵ_t = α0 + (β0 L + δ(L)(1 − L)) y_t + ϵ_t,
because matching the coefficients on L, L², · · · , L^p in ρ(L) = β0 L + δ(L)(1 − L)
yields exactly the expressions for β0 and the δ0s above.
The issues discussed above remain essentially the same when contemporaneous and
lagged xt s are re-introduced. Such models are called autoregressive distributed lag
(ARDL) models. The easiest version is an ARDL(1,1), in which xt is a scalar
covariate which appears next to lagged yt (the AR(1) part) contemporaneously and
with one lag (the DL(1) part),
yt = α0 + α1 yt−1 + β0 xt + β1 xt−1 + ϵt .
The implicit assumption in this model is that the process {xt , t ≥ 0} is weakly
exogenous, i.e. the parameters of its marginal distribution are not linked with the
parameters of the conditional distribution of yt , given xt and the past.
Re-writing the model as ∆y_t = α0 + (α1 − 1)y_{t−1} + β0 x_t + β1 x_{t−1} + ϵ_t,
this balances LHS and RHS in terms of order of integration if x_t is I(0) and α1 = 1.
If x_t itself also is I(1), this is no longer sufficient,
and in order to balance LHS and RHS in terms of order of integration, either
(i) β0 + β1 = 0 and α1 = 1, or
(ii) |α1| < 1, with
y_t = α0/(1 − α1) + ((β0 + β1)/(1 − α1)) x_t + ν_t,
y⋆_t = E[y_t | x_t] = α0/(1 − α1) + ((β0 + β1)/(1 − α1)) x_t,
where ν_t is I(0). Case (i) yields a model in first differences, ∆y_t = α0 + β0 ∆x_t + ϵ_t.
In Case (ii), with both y_t and x_t being I(1) processes
(so that ∆y_t and ∆x_t are I(0)), but a particular linear combination of y_t and x_t,
y_t − α0/(1 − α1) − ((β0 + β1)/(1 − α1)) x_t,
being I(0), the two stochastic processes are said to be co-
integrated. Note that this co-integration relationship has the interpretation of a
stable long-run equilibrium relationship between y_t and x_t, i.e. it is implied by the
original model if y_t = y_{t−1} and x_t = x_{t−1}. This permits the model to be re-cast in
its error correction model (ECM) representation
∆y_t = β0 ∆x_t + (α1 − 1)(y_{t−1} − y⋆_{t−1}) + ϵ_t.
Since α1 − 1 < 0, this says that y_t adjusts downwards (upwards) if y_{t−1} is above (below)
its long-run equilibrium level y⋆_{t−1}, and that it adjusts upwards (downwards) if the
long-run equilibrium level increases (decreases).
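A sketch of the ECM estimated in one step from simulated co-integrated data; the coefficient on y_{t−1} estimates α1 − 1, and the implied long-run coefficient is recovered as shown. The design is illustrative.

    import numpy as np

    rng = np.random.default_rng(4)
    T = 500
    x = np.cumsum(rng.normal(size=T))        # x_t is I(1)
    y = 0.5 + 2.0 * x + rng.normal(size=T)   # y_t co-integrated with x_t

    # dy_t on (const, dx_t, y_{t-1}, x_{t-1}): unrestricted ECM form of the ARDL(1,1)
    dy, dx = np.diff(y), np.diff(x)
    W = np.column_stack([np.ones(T - 1), dx, y[:-1], x[:-1]])
    b = np.linalg.lstsq(W, dy, rcond=None)[0]
    speed = b[2]            # estimate of alpha_1 - 1 (negative if y error-corrects)
    longrun = -b[3] / b[2]  # implied (beta_0 + beta_1)/(1 - alpha_1)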
Special cases are of interest. A static regression with AR(1) errors,
y_t = α + βx_t + ν_t,
ν_t = ρν_{t−1} + ϵ_t,
corresponds to the ARDL(1,1) model with α1 = ρ and the common factor restriction
β1 = −α1β0 imposed. A model with unit long-run coefficient would impose the
restriction (β0 + β1)/(1 − α1) = 1. A random walk with drift requires α1 = 1
and β0 = β1 = 0.
3.4 Spurious Regression
Granger and Newbold (1974)[12] and Phillips (1986)[13] were the first to identify the
issue of spurious regressions. An example common in applied work, and used here to
illustrate the issues involved, might consider the monthly price of a good or service
provided by a firm (y_t) as a function of monthly trading volume or sales (x_t).[14] The
question of interest is whether a change in industry structure, such as for example
the merger of the firm with another firm in the same industry at time T0, translated
into latent synergies that were passed on to consumers in the form of lower prices.
Let δ_t = 1{t ≥ T0} denote a binary variable that takes on value 1 after the merger was
completed. The proposed model is
y_t = α0 δ_t + β0 x_t + u_t.
Suppose that, in fact, y_t is a random walk, independent of x_t. The last property,
independence of y_t and x_t, implies that β0 = 0; in this case, if the merger also has
no effect on prices (α0 = 0), then u_t = y_t = y_{t−1} + ϵ_t, where ϵ_t is white noise.
The OLS estimator of α0 then satisfies, by partitioned regression,
α̂_T = α0 + ((T − T0) − (∑_{t≥T0} x_t)²/∑_t x_t²)^{−1} (∑_{t≥T0} u_t − (∑_{t≥T0} x_t/∑_t x_t²) ∑_t x_t u_t).
[12] Granger, C.W.J. and P. Newbold (1974): “Spurious Regressions in Econometrics”, Journal of Econometrics, 2, 111-120.
[13] Phillips, P.C.B. (1986): “Understanding Spurious Regressions in Econometrics”, Journal of Econometrics, 33, 311-340.
[14] The additional issue of endogeneity of x_t is ignored in the discussion of this section.
The individual components of this expression can be expected to have the following
asymptotic properties: with probability one,
α̂_T = α0 + (Op(T) − Op(T²)/Op(T²))^{−1} (Op(T) − Op(T²)/Op(T²)) = α0 + Op(1),
i.e. lim_{T→∞} Pr(|α̂_T − α0| > ϵ) > 0 for any ϵ > 0. In other words, if α0 = 0, then a
conventional t-test will erroneously reject this hypothesis with positive probability.
There are two features to note about this. First, non-stationarity of a regressor
(x_t) can spill over, in the sense of having an impact on statistical properties of
coefficient estimates of other regressors, not just on its own coefficient. Second,
if Case (ii) in the preceding section were true, i.e. y_t and x_t were co-integrated,
then √T-consistency would be preserved; in this case, a linear combination of I(1)
variables is stationary (I(0)), and this renders the regression residuals I(0). This also
suggests one (single equation based) test for co-integration: first, the individual
variables are tested for unit roots; second, if unit roots are not rejected, a linear
regression model of one variable onto the others is estimated, and the estimated
regression residuals are tested for a unit root, using an ADF test (again with different
critical values). This is the original Engle-Granger procedure. It suffers from
inherent problems, however: the assignment of the variables to LHS and RHS
is arbitrary, and it implicitly assumes weak exogeneity of the RHS variables. The
conclusion from this is that all variables should be treated equally and symmetrically,
in some sense, i.e. in a system based, multivariate, rather than a single equation based,
univariate approach.
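A sketch of the Engle-Granger two-step procedure; the second-step statistic must be compared with Engle-Granger (not Dickey-Fuller) critical values, which the sketch does not provide.

    import numpy as np

    def engle_granger_stat(y, x):
        # Step 1: levels regression of y on (1, x)
        W = np.column_stack([np.ones(len(x)), x])
        b = np.linalg.lstsq(W, y, rcond=None)[0]
        u = y - W @ b
        # Step 2: DF-type regression on the residuals, du_t = beta u_{t-1} + e_t
        du, ulag = np.diff(u), u[:-1]
        beta = (ulag @ du) / (ulag @ ulag)
        e = du - beta * ulag
        se = np.sqrt((e @ e / (len(du) - 1)) / (ulag @ ulag))
        return beta / se    # very negative values reject a unit root in u_t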
Consider the m-variate VAR(p) model
y_t = A0 + A1 y_{t−1} + · · · + A_p y_{t−p} + ϵ_t,
with lag polynomial
A(L) = A1 L + · · · + A_p L^p.
Each equation of the VAR can be estimated by OLS, and the innovation variance-
covariance matrix Σ can be estimated from the regression residuals {ϵ̂_t, t = p + 1, · · · , T}
in the usual way, i.e. the (i, j) element is
Σ̂_ij = (1/(T − p)) ∑_{t=p+1}^T ϵ̂_it ϵ̂_jt, for i, j = 1, · · · , m.
Provided the VAR is stationary, it has the MA(∞) representation
y_t = (I − A(L))^{−1} (A0 + ϵ_t) = (I − A(1))^{−1} A0 + ∑_{i=0}^∞ ψ_i ϵ_{t−i},
where the convention is adopted that ψ0 = I. The leading constant follows from
E[y_t] = A0 + ∑_{i=1}^p A_i E[y_t] = A0 + A(1)E[y_t], so that E[y_t] = (I − A(1))^{−1} A0.
The MA coefficient matrices follow from
(I − A1 L − · · · − A_p L^p)^{−1} = I + ψ1 L + ψ2 L² + · · · ,
which is equivalent to
(I − A1 L − · · · − A_p L^p)(I + ψ1 L + ψ2 L² + · · · ) = I.
[16] Granger, C.W.J. (1969): “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods”, Econometrica, 37(3), 424-438; also, Sims, C.A. (1972): “Money, Income and Causality”, American Economic Review, 62(4), 540-552.
[17] Hamilton, J.D. (1994): Time Series Analysis, Princeton: Princeton University Press.
Hence, matching coefficients on L, L², · · · ,
−A1 + ψ1 = 0 ⇒ ψ1 = A1,
−A2 + ψ2 − A1 ψ1 = 0 ⇒ ψ2 = A1 ψ1 + A2 = A1² + A2,
and, in general, ψ_s = A1 ψ_{s−1} + A2 ψ_{s−2} + · · · + A_p ψ_{s−p}, s = 1, 2, · · · ,
with ψ0 = I and ψ_s = 0 for s < 0.
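A sketch of this recursion for the MA coefficient matrices:

    import numpy as np

    def ma_coefficients(A_list, s_max):
        # A_list = [A_1, ..., A_p]; psi_0 = I, psi_s = 0 for s < 0
        m, p = A_list[0].shape[0], len(A_list)
        psi = [np.eye(m)]
        for s in range(1, s_max + 1):
            acc = np.zeros((m, m))
            for i, Ai in enumerate(A_list, start=1):
                if s - i >= 0:
                    acc += Ai @ psi[s - i]   # psi_s = A_1 psi_{s-1} + ... + A_p psi_{s-p}
            psi.append(acc)
        return psi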
In the context of modelling multivariate series and estimation of such models, essen-
tially the same issues arise as in the univariate setting, as discussed above. Hence,
in a multivariate context, error correction representations of VAR(p)s, called Vector
ECMs (VECMs), are useful for the same reasons given before.
The VAR(p) can be re-written as
y_t = A0 + Φ y_{t−1} + ∑_{i=1}^{p−1} Γ_i ∆y_{t−i} + ϵ_t,
which is equivalent to
[(I − ΦL) − (∑_{i=1}^{p−1} Γ_i L^i)(I − L)] y_t = A0 + ϵ_t,
where
Φ = A(1) = A1 + · · · + A_p,
Γ_i = −[A_{i+1} + · · · + A_p], i = 1, 2, · · · , p − 1.
To see this, note that
(I − ΦL) − (∑_{i=1}^{p−1} Γ_i L^i)(I − L)
= I − ΦL − Γ1 L + Γ1 L² − Γ2 L² + Γ2 L³ − · · · − Γ_{p−1} L^{p−1} + Γ_{p−1} L^p
= I − (Φ + Γ1) L − (Γ2 − Γ1) L² − · · · − (Γ_{p−1} − Γ_{p−2}) L^{p−1} + Γ_{p−1} L^p
= I − A1 L − · · · − A_p L^p.
This is referred to as the Sims, Stock and Watson (1990) canonical representa-
tion, originally due to Fuller (1976).[18] Notice that this is yet again simply a re-
parametrization, and there exists a one-to-one mapping between the coefficient ma-
trices of the VAR and the VECM, sketched below. The VECM can be estimated by OLS, and the VAR
coefficients can be determined via the above formulae.
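A sketch of the one-to-one mapping between the VAR and VECM coefficient matrices:

    import numpy as np

    def var_to_vecm(A_list):
        # Phi = A_1 + ... + A_p;  Gamma_i = -(A_{i+1} + ... + A_p)
        p = len(A_list)
        Phi = sum(A_list)
        Gammas = [-sum(A_list[i:]) for i in range(1, p)]
        return Phi, Gammas

    def vecm_to_var(Phi, Gammas):
        # A_1 = Phi + Gamma_1;  A_i = Gamma_i - Gamma_{i-1};  A_p = -Gamma_{p-1}
        if not Gammas:
            return [Phi]
        A = [Phi + Gammas[0]]
        for i in range(1, len(Gammas)):
            A.append(Gammas[i] - Gammas[i - 1])
        A.append(-Gammas[-1])
        return A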
To see why the rank of Π = A(1) − I is informative, consider the VECM for p = 1
and suppose that rk(Π) < m, so that there exists a vector α ≠ 0 with Π′α = 0. Then
∆y_t = A0 + Πy_{t−1} + ϵ_t
⇒ α′∆y_t = α′A0 + α′Πy_{t−1} + α′ϵ_t
⇒ α′∆y_t = α′A0 + α′ϵ_t,
i.e. α′y_t is I(1), a contradiction to the hypothesis that y_t is I(0). Hence, full rank
of Π is equivalent to all components of y_t being covariance stationary.
[18] Sims, C.A., Stock, J.H. and M.W. Watson (1990): “Inference in Linear Time Series Models with Some Unit Roots”, Econometrica, 58(1), 113-144; Fuller, W.A. (1976): Introduction to Statistical Time Series, New York: Wiley.
Noting that each equation in a VECM looks just like a univariate ARDL model
in which xt represents another component of the vector yt , one might expect the
matrix Π to be informative about co-integrating relationships as well, because Πyt−1
is just a collection of m linear combinations of the elements of yt−1 . In order to then
balance the order of integration of the LHS and RHS, it must be the case that Π,
in a sense that will be made precise below, contains all coefficients of co-integrating
relationships among the elements of yt , i.e. all co-integrating vectors that induce
linear combinations of the elements of yt which are I(0). It follows from the preceding
two paragraphs that the case of co-integration among the component series of yt
corresponds to 0 < rk(Π) = r < m. In this case, it is said that there exist r distinct
co-integrating relationships between the m elements of yt , each corresponding to a
co-integrating vector βj so that βj′ yt is I(0), j = 1, · · · , r. In terms of the solutions to
the determinantal equation, the case of r co-integration relationships between the m
elements of yt is equivalent to m − r solutions (out of mp solutions of |I − A(z)| = 0)
that lie on the unit circle, with a real part equal to unity, while all other solutions
lie outside the unit circle and correspond to the co-integrating relationships and
higher-order dynamics.
Granger Representation Theorem (in part). Suppose the m components of the VAR(p)
y_t = A(L)y_t + ϵ_t = ∑_{i=1}^p A_i y_{t−i} + ϵ_t
are I(1), with r distinct co-integrating relationships collected in the m × r matrix β
of co-integrating vectors, so that z_t = β′y_t is I(0). Then, among other results:
(3) Π = A(1) − I has rk(Π) = r, and there exists an m × r matrix α, such that
Π = αβ′;
(4) there exists a VECM: ∆y_t = αz_{t−1} + ∑_{i=1}^{p−1} γ_i ∆y_{t−i} + ϵ_t.
The last assertion of part (2) is not critical for the understanding of the further
development; its proof is given in an appendix.
If some of the series in the VAR are subject to a deterministic time trend - which,
if present, in the case of economic series is typically linear - then it can be included
in the co-integrating relationship, in analogy to Section 3.2.4 above.[19] Formally, in
terms of the formalism of the preceding Theorem, if the original VAR(p) is of the
form
y_t = A(L)y_t + γt + ϵ_t = ∑_{i=1}^p A_i y_{t−i} + γt + ϵ_t,
then a trend coefficient, δ say, enters the co-integrating relationship analogously.
It is important to note that α and β are not uniquely determined, since for any
non-singular r × r matrix Q, Π = αβ ′ = αQQ−1 β ′ = α̃β̃ ′ , where α̃ = αQ and
β̃ = β(Q−1 )′ . The same argument applies to δ. The appropriate choice of Q is
usually guided by economic theory and equivalent to imposing r2 restrictions on the
elements of Q.
[19] If it were included without being restricted to be part of the co-integrating relationship, then this might imply a quadratic trend in the respective original series.
4.3 Johansen Co-integration Tests
The Johansen procedure tests hypotheses about the rank of Π. The leading cases
are: Case (I), H0: rk(Π) = 0 against HA: rk(Π) = m; Case (II), H0: rk(Π) = 0
against HA: rk(Π) = 1; Case (III), H0: rk(Π) = r against HA: rk(Π) > r; and
Case (IV), H0: rk(Π) = r against HA: rk(Π) = r + 1. Cases (I) and (II) are
considered here in turn. As in the case of testing for unit roots in univariate
stochastic processes, there are further test variants when deterministic trends are
included in the model.
Consider the model ∆y_t = Πy_{t−1} + v_t; here, the intercept vector and the lagged
differences ∆y_{t−s}, s = 1, · · · , p − 1, are omitted, as they are irrelevant to the under-
standing of the underlying principles of the test procedure. Stack up the T systems
∆y_t′ = y_{t−1}′Π′ + v_t′ to form
∆Y = Y_{−1}Π′ + v,
where ∆Y and Y_{−1} are T × m matrices. If the v_t are serially independent and
jointly normally distributed with
contemporaneous variance-covariance matrix Ω, then the
joint probability density of this model, or the likelihood function of the parameters
Π and Ω, given the data, is
∏_{t=1}^T f(v_t; Π, Ω) ∝ |Ω|^{−T/2} exp(−(1/2) ∑_t v_t′Ω^{−1}v_t)
= |Ω|^{−T/2} exp(−(1/2) tr(∑_t v_t′Ω^{−1}v_t))
= |Ω|^{−T/2} exp(−(1/2) ∑_t tr(v_t′Ω^{−1}v_t))
= |Ω|^{−T/2} exp(−(1/2) ∑_t tr(Ω^{−1}v_tv_t′))
= |Ω|^{−T/2} exp(−(1/2) tr(Ω^{−1} ∑_t v_tv_t′))
= |Ω|^{−T/2} exp(−(1/2) tr(Ω^{−1}V′V)),
where V = ∆Y − Y_{−1}Π′.[22] Given Π, Ω can be concentrated out in the usual way,
i.e. by choosing Ω = (1/T)V′V.[23] Then, the concentrated likelihood function is
∏_{t=1}^T f(v_t; Π) ∝ |(1/T)V′V|^{−T/2} exp(−(1/2) tr(((1/T)V′V)^{−1}V′V))
∝ |(1/T)V′V|^{−T/2}   (since the trace term equals the constant Tm)
= |(1/T)(∆Y − Y_{−1}Π′)′(∆Y − Y_{−1}Π′)|^{−T/2} → max_Π
⇔ min_Π (T/2) ln |(∆Y − Y_{−1}Π′)′(∆Y − Y_{−1}Π′)|.
Imposing the null hypothesis of Case (I), i.e. the m² restrictions Π = 0, yields
(T/2) ln(|∆Y′∆Y|), which is proportional to the log-likelihood function under the null
hypothesis.
[22] Strictly speaking, the preceding expression is the conditional density of Y, given y0.
[23] Appendix B.3 is a brief review of concentrating out parameters from a likelihood function.
Under the alternative hypothesis, the unrestricted estimator of Π is the OLS es-
timator (on each equation), and the log-likelihood function, evaluated at the estima-
tor, is proportional to the logarithm of the residual sum of squares, i.e. proportional
to (T/2) ln(|∆Y′M_{Y−1}∆Y|), where the T × T matrix M_{Y−1} = I − Y_{−1}(Y_{−1}′Y_{−1})^{−1}Y_{−1}′
is the orthogonal projector onto the space orthogonal to the column space of Y_{−1}.
The likelihood ratio test statistic for Case (I) is then, as usual, twice the difference
between the log-likelihoods of the unrestricted and restricted models, i.e.
LR_T = −T ln(|∆Y′M_{Y−1}∆Y| / |∆Y′∆Y|) ∼ χ²_{m²},
and the null hypothesis is rejected when this statistic exceeds the critical value of a
χ²_{m²} distribution for the appropriate size of the test.
Equivalently, using linear algebra results provided in Appendix B.1.1 and B.1.2,
LR_T = −T ln |Q| = −T ∑_{i=1}^m ln(µ_i),
where {µ_i, i = 1, · · · , m} are the characteristic roots (eigen-
values) of the matrix
Q = I − (∆Y′∆Y)^{−1/2} ∆Y′Y_{−1}(Y_{−1}′Y_{−1})^{−1}Y_{−1}′∆Y (∆Y′∆Y)^{−1/2}.
In Case (II), since the null hypothesis is the same as in Case (I), the denominator
of the test statistic (the likelihood under the null hypothesis) remains
the same as before. The numerator is proportional to the logarithm of the sum of
squared residuals when the restriction Π′ = αβ′ is imposed, where α, β ∈ R^m, and
the model is
∆Y = Y_{−1}Π′ + v = Y_{−1}αβ′ + v.
Let z = Y_{−1}α, which is stationary under the alternative hypothesis, with co-
efficient vector β; i.e. α is the single co-integrating vector under the alternative
hypothesis. Cast in this form, the model under the alternative hypothesis amounts
to m LHS variables collected in ∆y_t and a single RHS variable z_t, which enters each
equation with an individual coefficient β_i, i = 1, · · · , m:
∆y_{i,t} = β_i z_t + v_{i,t}, i = 1, · · · , m; t = 1, · · · , T.
Concentrating the β_i out of the likelihood leaves a criterion which is a ratio of
quadratic forms in α, i.e. of the form (T/2) ln(α′Aα/α′Bα), where A = Y_{−1}′M_{∆Y}Y_{−1}
and B = Y_{−1}′Y_{−1}, which is p.d.s. The FOCs of this minimization problem yield
0 = (α̂′Bα̂)^{−2} (α̂′Bα̂ · 2Aα̂ − α̂′Aα̂ · 2Bα̂)
⇒ 0 = (A − (α̂′Aα̂/α̂′Bα̂) B) α̂ = (A − r̂B) α̂
⇔ 0 = (B^{−1/2}AB^{−1/2} − r̂I) γ̂, with γ̂ = B^{1/2}α̂.
The solutions are m pairs of eigenvalues r̂ and associated eigenvectors γ̂ (equivalently,
α̂). Minimization with respect to α̂ leads to choosing the smallest eigenvalue, r̂_min.
Hence, the log-likelihood under the alternative hypothesis is proportional to
(T/2) ln(|∆Y′∆Y| r̂_min), so that the Johansen likelihood ratio test statistic for Case (II) is
LR_T = −T ln(r̂_min).
The eigenvalues r̂_i are related to the matrix
(∆Y′∆Y)^{−1/2} ∆Y′M_{Y−1}∆Y (∆Y′∆Y)^{−1/2},
and satisfy r̂_i = 1 − µ_i, i = 1, · · · , m (see Appendix B.1.4), where the µ_i are the
eigenvalues of the matrix Q encountered in Case (I). Hence, the Johansen likelihood
ratio test statistic can also be expressed as
LR_T = −T ln(1 − µ_max).
Using the same principles as in the preceding two subsections, the Johansen likeli-
hood ratio test statistics for the remaining two test cases can be deduced. For Case
(III), H0: rk(Π) = r against HA: rk(Π) > r, the test statistic is
LR_T = −T ∑_{i=r+1}^m ln(µ_(i)),
where µ_(1) < · · · < µ_(m) are the ordered eigenvalues of the matrix Q obtained in
Case (I). Similarly, for Case (IV), H0: rk(Π) = r against HA: rk(Π) = r + 1,
LR_T = −T ln(1 − µ_(m−r)) = −T ln(r̂_(r+1)),
where r̂_(1) < · · · < r̂_(m) are the ordered eigenvalues of I − Q. The critical values
depend on m and r and are provided in tables or by statistical software.
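A sketch of the eigenvalue computation behind the Johansen statistics, for the stripped-down model ∆y_t = Πy_{t−1} + v_t used above (no constant, no lagged differences); critical values must come from tables or software (e.g. statsmodels' coint_johansen).

    import numpy as np

    def johansen_mu(Y):
        # Y: T x m array of levels
        dY, Ylag = np.diff(Y, axis=0), Y[:-1]
        S00, S01, S11 = dY.T @ dY, dY.T @ Ylag, Ylag.T @ Ylag
        # eigenvalues of S00^{-1} S01 S11^{-1} S01' equal those of I - Q; mu_i = 1 - eig
        M = np.linalg.solve(S00, S01 @ np.linalg.solve(S11, S01.T))
        mu = np.sort(1 - np.real(np.linalg.eigvals(M)))
        return mu     # e.g. Case (I): LR_T = -T * np.sum(np.log(mu))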
5 Supplement: Time Series Models of Heteroskedasticity
Up to this point, it was assumed that the stochastic processes being modelled are
propelled by innovations that have constant variances and covariances over time.
This assumption impedes the analysis of potential volatility in the series, i.e. chang-
ing or heteroskedastic variances (and covariances) over time. Time series models
of heteroskedasticity have important applications as a useful tool to capture the
volatility of a stochastic process, notably in empirical finance. Recent experience in
financial markets shows that - beyond the theory of efficient financial markets which
predicts no autocorrelation in asset returns - squared returns vary widely and, to
some extent, predictably depend on the past. This suggests that conditional vari-
ances may follow a time series process as well, and sometimes this process may be
characterized by a distribution with thick tails.
For the purpose of illustration, consider the univariate stationary AR(p) process
y_t = c + ∑_{i=1}^p φ_i y_{t−i} + u_t, where u_t is assumed to be white noise, i.e. u_t is i.i.d. with
E[u_t] = 0 and E[u_tu_s] = σ² 1{t=s}, σ² > 0. The white noise assumption implies that
the process' unconditional variance is constant. This does not preclude that the
conditional variance may vary over time. One way to model this is as a stationary
AR(m) for {u_t², t = 1, · · · }:
u_t² = ξ + ∑_{j=1}^m α_j u_{t−j}² + ω_t,
where ω_t is white noise, i.e. ω_t is i.i.d. with E[ω_t] = 0 and E[ω_tω_s] = λ² 1{t=s},
λ² > 0, for all t. Since E[u_t | u_{t−s}, s = 1, 2, · · · ] = 0, this implies for the conditional
variance of u_t, given the past,
E[u_t² | u_{t−s}², s = 1, 2, · · · ] = ξ + ∑_{j=1}^m α_j u_{t−j}².
This is the autoregressive conditionally heteroskedastic (ARCH) model of order m
(Engle (1982)[24]). For u_t² to be stationary, it is required that ξ > 0, α_j ≥ 0, and
that the roots of 1 − α(z) = 0, with α(L) = ∑_{j=1}^m α_j L^j, lie outside
the unit circle. Provided these conditions hold, the unconditional variance of u_t can
be expressed in terms of the ARCH model parameters as
σ² = ξ/(1 − α(1)).
Further restrictions are required if the model is designed to limit thick tails,
i.e. to control higher-order moments. To see this, consider the alternative represen-
tation of the innovations u_t = √h_t v_t, h_t = ξ + ∑_{j=1}^m α_j u_{t−j}², so that v_t = u_t/√h_t has
the interpretation of the standardized innovation of the primary process y_t, satisfying
E[v_t] = 0 and var(v_t) = 1.
The thickness of the tails of the distribution of v_t is governed by its fourth moment,
E[(v_t² − 1)²]. Since u_t² = h_tv_t² = h_t + ω_t, it follows that ω_t = h_t(v_t² − 1), so that
E[ω_t²] = λ² = E[h_t²(v_t² − 1)²] = E[h_t²]E[(v_t² − 1)²], because v_t is independent of h_t.
Consider, for simplicity, the case of an ARCH(1) model, for which h_t = ξ + α1 u_{t−1}². Then, the
for simplicity, the case of an ARCH(1) model, for which ht = ξ + α1 u2t−1 . Then, the
24
Engle, R.F. (1982): “Autoregressive Conditional Heteroscedasticity with Estimates of the
Variance of United Kingdom Inflation”, Econometrica, 50(4), 987-1009.
44
unconditional expectation of h_t² is
E[h_t²] = E[(ξ + α1 u_{t−1}²)²]
= ξ² + α1² E[u_{t−1}⁴] + 2α1ξ E[u_{t−1}²]
= ξ² + α1² (var(u_{t−1}²) + (E[u_{t−1}²])²) + 2α1ξ · ξ/(1 − α1)
= ξ² + α1² (λ²/(1 − α1²) + ξ²/(1 − α1)²) + 2α1ξ²/(1 − α1)
= ξ²/(1 − α1)² + α1²λ²/(1 − α1²).
Therefore,
E[(v_t² − 1)²] = λ² / (ξ²/(1 − α1)² + α1²λ²/(1 − α1²)).
If v_t is standard normal, then E[(v_t² − 1)²] = 2, and solving for λ² yields
λ²(1 − 3α1²)/(1 − α1²) = 2ξ²/(1 − α1)².
The right-hand side is positive. Therefore, for the left-hand side to be positive, it is
required that α1 < 1/√3.
Empirically, for financial time series, such restrictions on the tails of their distri-
butions are typically rejected. Researchers, therefore, often maintain distributional
assumptions that allow for thicker tails, e.g. t-distribution instead of normality.
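A simulation sketch of an ARCH(1) process with standard normal standardized innovations, illustrating the unconditional variance formula and the excess kurtosis discussed above; parameter values are illustrative.

    import numpy as np

    rng = np.random.default_rng(5)
    T, xi, a1 = 100_000, 0.5, 0.4       # a1 < 1/sqrt(3): finite fourth moment
    u, h = np.zeros(T), np.zeros(T)
    h[0] = xi / (1 - a1)                # start at the unconditional variance
    for t in range(1, T):
        h[t] = xi + a1 * u[t - 1] ** 2  # ARCH(1) conditional variance
        u[t] = np.sqrt(h[t]) * rng.normal()

    print(u.var(), xi / (1 - a1))       # sample vs. theoretical unconditional variance
    print(np.mean(u**4) / np.mean(u**2) ** 2)   # kurtosis > 3: thicker tails than normal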
Let Y_t = (y_t, y_{t−1}, · · · ). Then, the conditional density of y_t, given the past, is
f(y_t | Y_{t−1}; θ) = (2πh_t)^{−1/2} exp(−((1 − φ(L))y_t − c)² / (2h_t)).
5.3 Extensions
Consider the model h_t = ξ + π(L)u_t², where π(L) = ∑_{j=1}^∞ π_j L^j is an infinite polyno-
mial in the lag operator L and the u_t are white noise, as above. Parameterize π(L)
as the ratio of two finite order polynomials in L:
π(L) = α(L)/(1 − δ(L)),
where
α(L) = ∑_{j=1}^m α_j L^j,
δ(L) = ∑_{k=1}^r δ_k L^k,
and where it is assumed that 1 − δ(z) = 0 has all roots outside the unit circle.
This yields
h_t = ξ + (α(L)/(1 − δ(L))) u_t²,
from which it follows that
(1 − δ(L)) h_t = (1 − δ(1)) ξ + α(L) u_t²,
which is equivalent to (absorbing the constant (1 − δ(1))ξ into ξ)
h_t = ξ + ∑_{i=1}^r δ_i h_{t−i} + ∑_{j=1}^m α_j u_{t−j}²,
a generalized ARCH, GARCH(r, m), model.
Then,
h_t + u_t² = ξ − δ1(u_{t−1}² − h_{t−1}) − · · · − δ_r(u_{t−r}² − h_{t−r}) + ∑_{i=1}^r δ_i u_{t−i}² + ∑_{j=1}^m α_j u_{t−j}² + u_t².
Defining the martingale difference sequence ω_t = u_t² − h_t, which satisfies E[ω_t | past] =
0, and p = max{r, m} (with δ_s = 0 for s > r and α_s = 0 for s > m), this model is
equivalent to
u_t² = ξ + ∑_{s=1}^p (δ_s + α_s) u_{t−s}² + ω_t − ∑_{k=1}^r δ_k ω_{t−k},
i.e. u_t² follows an ARMA(p, r) process.
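A simulation sketch of a GARCH(1,1) process, illustrating the implied autocorrelation in the squared innovations (their ARMA structure); parameter values are illustrative.

    import numpy as np

    rng = np.random.default_rng(6)
    T, xi, a1, d1 = 100_000, 0.1, 0.1, 0.85   # a1 + d1 < 1
    u, h = np.zeros(T), np.zeros(T)
    h[0] = xi / (1 - a1 - d1)
    u[0] = np.sqrt(h[0]) * rng.normal()
    for t in range(1, T):
        h[t] = xi + d1 * h[t - 1] + a1 * u[t - 1] ** 2   # GARCH(1,1) recursion
        u[t] = np.sqrt(h[t]) * rng.normal()

    u2 = u ** 2   # u2_t = xi + (d1 + a1) u2_{t-1} + w_t - d1 w_{t-1}
    print(np.corrcoef(u2[1:], u2[:-1])[0, 1])   # positive first-order autocorrelation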
A Granger Representation Theorem, part (2)

B.1 Linear Algebra Results
The following results are useful for the development of the Johansen tests for
the number of co-integrating vectors.
1.1 The eigenvalues λ_i of an n × n matrix A and the associated eigenvectors a_i,
i = 1, . . . , n, solve
det(A − λ_i I_n) = 0,
(A − λ_i I_n) a_i = 0.
1.2 The collection of eigenvalues of A, λ(A) = {λ_i, i = 1, . . . , n}, is called
the spectrum of A and satisfies
|A| = ∏_{i=1}^n λ_i.
1.3 Let Ã and B̃ be n × n matrices, and consider (1) the eigenvalues λ_i of Ã′B̃B̃′Ã
and (2) the eigenvalues µ_i of B̃′ÃÃ′B̃. (1) is equivalent
to |λ_i I_n − Ã′B̃B̃′Ã| = 0, while (2) is equivalent to |µ_i I_n − B̃′ÃÃ′B̃| = 0.
Letting C = Ã′B̃, (1) is equivalent to |λ_i I_n − CC′| = 0, while (2) is
|µ_i I_n − C′C| = 0. Denoting the corresponding characteristic vectors by
x_i and z_i,
C′C x_i = µ_i x_i,
CC′ z_i = λ_i z_i,
implying CC′ (C x_i) = µ_i (C x_i), i.e. C x_i is a characteristic vector of CC′,
so that µ_i = λ_i.
1.4 Let λ_i, i = 1, . . . , n, be the eigenvalues of A. Then, γ_i = 1 − λ_i, i =
1, . . . , n, are the eigenvalues of I_n − A.
Proof: This follows immediately from the definition of λ_i, since
det((I_n − A) − γ_i I_n) = det((1 − γ_i) I_n − A) = 0 exactly when 1 − γ_i is an
eigenvalue of A.
2. Let W = [U, V], where the matrices U and V have dimensions n × a and
n × b, respectively. Let M_U = I_n − U(U′U)^{−1}U′, and analogously for M_V.
Then,
W′W = [ U′U   U′V
        V′U   V′V ].
For the case a = b = 1, U and V are column vectors, hence their inner
products are scalars, and so it can readily be verified that
|W′W| = (U′U)(V′M_U V) = (V′V)(U′M_V U).
3. Concentrating out parameters. Consider the Gaussian linear regression model
with log-likelihood (up to constants)
L(β, σ²; y, X) = −(N/2) ln(σ²) − (1/(2σ²)) ∑_{n=1}^N (y_n − x_n′β)².
Note that the order of maximization is immaterial. For any value of β, max-
imization with respect to σ² yields the solution σ²(β) = (1/N) ∑_{n=1}^N (y_n − x_n′β)².
Hence,
max_{β,σ²} L(β, σ²; y, X) ⇔ max_β L(β, σ²(β); y, X)
⇔ max_β −(N/2) ln(σ²(β)) − N/2
⇔ max_β −(N/2) ln(u(β)′u(β)), where u(β) = y − Xβ.
It is straightforward to check that this results in the well-known MLE for β0 ,
β̂, equivalent to the OLS estimator, and in the MLE for σ02 , σ̂ 2 = σ 2 (β̂).
Finally,
var(Z′(y − Xβ̂_2SLS) | X, Z) = σ² Z′(I − X(X′P_ZX)^{−1}X′)Z
= σ² (Z′Z − Z′X(X′P_ZX)^{−1}X′Z).