Open Macroeconomics
Roberto Rigobon
MIT
Fall 2010
Contents
iv CONTENTS
I started writing these notes while I was visiting the Graduate Institute of International Studies in Genève in 2004. I started with the section on identification in macroeconomics; after that, I had far too much free time (they gave me tenure), and I have continued them throughout the years while visiting PUC in Rio, the University of Indiana, the University of Wisconsin at Madison, the Bank of England, the European Central Bank, the Inter-American Development Bank, Universidad Los Andes in Bogota, and now the Kiel Institute. I thank all of them for their tremendous hospitality and for motivating me to organize my thoughts in this area.
Before starting, it is absolutely crucial to begin with a disclaimer. Most people have disclaimers and I have never been able to write one. So, here it goes... Although I do not work at a Central Bank or at a multilateral organization such as the IDB, IMF, or WB, my opinions do not reflect the views of those organizations, nor their board members, nor their staff members, nor their respective significant others, nor their pets either. Just in case you were wondering.
Now turning to more serious issues, there are three important characteristics that define these notes. First, there is, probably, a continuum of mistakes, especially because they were written in Spanish and then translated into English by someone who has a very limited knowledge of both languages (i.e., me). Second, these notes try to summarize an extremely extensive literature. I cannot cite everybody who deserves to be cited. The main reason is that I cannot type all those citations in BibTeX. That is quite embarrassing, because a lot of them are actually colleagues who are still kicking around. If by any chance you think that I have forgotten to cite 24 of your papers, I apologize, and I can only offer you these words of profound sympathy: “you are in very good company”. I promise you, though, that I will not forget a single one of my papers. So, at least someone will be well represented.
Chapter 5
Identification in Macroeconomics: Problem
The problem of identification in macroeconomics is one of the most studied issues in theoretical and applied work. Problems of simultaneous equations, omitted variables, and errors in variables have motivated a large econometric literature. In these notes, my objective is to describe how these problems affect the estimation of macro-models and to study some of the new methodologies that have been developed to solve them. We, as a profession, are still far from having a satisfactory answer, but we are clearly moving in the right direction.
This chapter describes the three problems we are interested in analyzing. First, we discuss the biases that arise in each case and their properties. Second, we reinterpret the biases by relating these three problems to the problem of recovering the “true” coefficients from the data (i.e., the lack of identification). This puts all the problems within the same framework. The third section analyzes the standard solutions that the literature has offered to these problems. The purpose of that section is to provide a concise summary of the “favorite” techniques within a single framework. It is by no means intended as a survey of the literature.
aggregation is also crucial. In any case, the choice of topics reflects my preferences, and not the aggregate opinion of the profession.
96 CHAPTER 5. IDENTIFICATION IN MACROECONOMICS: PROBLEM
The problem of simultaneous equations is perhaps one of the most common issues we face in applied work. In fact, it is the favorite objection of any referee who wants to protect his/her personal agenda and reject any (of my) papers. In any case, it is also common in practice. For instance, estimating the slope of the demand curve when the researcher does not know whether the observed price-quantity pairs are the result of shifts in the supply schedule or in the demand curve is one of the benchmark models in most econometrics classes. The problem is more general than this. For example, estimating the Central Bank reaction function, the fiscal policy reaction function, savings and investment behavior, the linkages among asset prices or among countries (contagion), the choice of education and wages, labor market participation and taxes, the impact of the quality of institutions on income, or Q theory are just a few of the many questions where endogeneity is a crucial issue.
In this sub-section, we study the general problem of simultaneous equations in the standard supply and demand framework. Assume that we are interested in estimating the following relationship:
$$y_t = \alpha x_t + \varepsilon_t \tag{5.1}$$
For simplicity, let's assume that the two variables have mean zero (so there is no constant in the regression), and that both are univariate with dimensions $T \times 1$. It is well known that the OLS estimated coefficient takes the form
$$\hat{\alpha}_{OLS} = \left(x_t' x_t\right)^{-1}\left(x_t' y_t\right).$$
The problem of simultaneous equations, however, is that the variable x also depends on y. Assume that
they satisfy the following relationship:
$$x_t = \beta y_t + \eta_t. \tag{5.2}$$
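Equations (5.1) and (5.2) form a system of equations that is known as the structural model:
$$y_t = \alpha x_t + \varepsilon_t, \qquad x_t = \beta y_t + \eta_t,$$
where $\varepsilon_t$ and $\eta_t$ are known as the structural shocks. In most macro applications the following moments are usually assumed: the shocks have mean zero,
$$E(\varepsilon_t) = 0, \quad E(\eta_t) = 0,$$
finite variance,
$$E(\varepsilon_t'\varepsilon_t) = \sigma_\varepsilon^2, \quad E(\eta_t'\eta_t) = \sigma_\eta^2,$$
and are uncorrelated,
$$E(\varepsilon_t'\eta_t) = 0.$$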
The assumptions imply that unconditionally the errors have mean zero, that their variances are finite, and that their covariance is zero. This last assumption is not required, but it is used in most macroeconomic models. The main reason is that we would like to think of the structural shocks as economically meaningful innovations, such as demand versus supply shocks, nominal versus real shocks, or permanent versus transitory shocks. In general, it is easier to understand the implications of these shocks when they are independent or orthogonal. As will become clear below, we will relax this assumption. For the moment, the assumption is innocuous for our discussion, and we keep it because it tremendously simplifies the algebra.
5.1. PROBLEMS AND BIASES 97
Additionally, this covariance assumption implies that all the joint co-movement between the observed
variables (x and y) is explained by the endogenous coefficients (α and β) and not by the correlation in their
disturbances.2
The structural model implies that the observed variables are given by what is known as the reduced form
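$$y_t = \frac{1}{1-\alpha\beta}\left(\alpha\eta_t + \varepsilon_t\right), \qquad x_t = \frac{1}{1-\alpha\beta}\left(\eta_t + \beta\varepsilon_t\right) \qquad \text{(Reduced Form Model)}$$
where, in order to ensure that the observed variables have finite variance, we impose the following assumption:
$$|\alpha\beta| < 1.$$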
Under Assumptions 10 and 11 there are only three relevant moments that can be estimated in the sample: the variance of $y$, the variance of $x$, and their covariance. If the distributions are not normal, there are other relevant higher moments that can also be estimated in the sample, but we leave those issues for later, mainly because when the distributions are not normal, identification becomes a much easier problem to solve. We would like to put ourselves in the toughest of all positions and discuss how to solve the problem there. Furthermore, assuming that the variables are normal, so that their sum is also normal, is standard in macro applications. The moments are:
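$$\operatorname{Var}(y_t) = \frac{1}{(1-\alpha\beta)^2}\left(\alpha^2\sigma_\eta^2 + \sigma_\varepsilon^2\right) \tag{5.3a}$$
$$\operatorname{Covar}(x_t, y_t) = \frac{1}{(1-\alpha\beta)^2}\left(\alpha\sigma_\eta^2 + \beta\sigma_\varepsilon^2\right) \tag{5.3b}$$
$$\operatorname{Var}(x_t) = \frac{1}{(1-\alpha\beta)^2}\left(\sigma_\eta^2 + \beta^2\sigma_\varepsilon^2\right) \tag{5.3c}$$
$$\hat{\alpha}_{OLS} = \frac{\operatorname{Covar}(x_t, y_t)}{\operatorname{Var}(x_t)} = \alpha + \beta(1-\alpha\beta)\frac{\sigma_\varepsilon^2}{\sigma_\eta^2 + \beta^2\sigma_\varepsilon^2} \tag{5.4}$$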
Equation (5.4) shows that the OLS estimate has an additional term, which is the bias introduced by simultaneous equations. Some properties of this bias are worth discussing. First, under the assumption that $|\alpha\beta| < 1$, the sign of the bias is the sign of $\beta$. So, if $x$ is a decreasing function of $y$, the OLS estimate is smaller (downward biased) than the true one, while the converse occurs if the coefficient is positive. Notice that nothing prevents the bias from reversing the sign of the coefficient. In other words, the fact that $\alpha$ is positive (for instance) does not necessarily force the OLS estimate to be positive. We will see that this is not the case with some of the other problems discussed below.
2 This does not imply that the disturbances are not correlated in most applications. But any of such correlation can be
Second, the bias is exactly zero if $\beta$ is zero. Obviously, assuming that $\beta$ is zero and that the covariance of the structural shocks is also zero eliminates the problem of simultaneous equations: it just assumes the problem away.3 In any case, it is important to highlight this case, because some of the solutions that are widely used in the literature are in fact making this assumption.
Third, notice that there is another condition under which the bias goes to zero:
$$\frac{\sigma_\eta^2}{\sigma_\varepsilon^2} \to \infty,$$
which can happen if the innovations to the first equation are zero ($\sigma_\varepsilon^2 = 0$) or if the innovations to the second equation are infinitely large ($\sigma_\eta^2 \to \infty$), which is the case when the variables are integrated but also cointegrated.
Finally, the bias is small (and goes toward zero) when $\sigma_\eta^2 \gg \sigma_\varepsilon^2$. This is known in the literature as near identification, and we will return to this issue in the next chapter.
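The bias formula (5.4) and its near-identification limit can be checked numerically. The following is a minimal Python/NumPy sketch, assuming normal shocks and hypothetical parameter values ($\alpha = 0.5$, $\beta = -0.8$, unit shock variances) chosen only for illustration; the helper names are illustrative as well. It simulates the reduced form, runs OLS without a constant, and compares the estimate with (5.4). With these values the bias is large enough to flip the sign of the OLS estimate.

```python
import numpy as np


def simulate_simultaneous(alpha, beta, sigma_eps, sigma_eta, T, seed=0):
    """Draw (y, x) from the reduced form of y = alpha*x + eps, x = beta*y + eta."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma_eps, T)
    eta = rng.normal(0.0, sigma_eta, T)
    y = (alpha * eta + eps) / (1.0 - alpha * beta)
    x = (eta + beta * eps) / (1.0 - alpha * beta)
    return y, x


def ols_slope(y, x):
    """OLS slope without a constant: (x'x)^{-1} x'y."""
    return (x @ y) / (x @ x)


def bias_formula(alpha, beta, s2_eps, s2_eta):
    """Population bias of OLS implied by equation (5.4)."""
    return beta * (1.0 - alpha * beta) * s2_eps / (s2_eta + beta**2 * s2_eps)


# Hypothetical parameter values, chosen only for illustration.
alpha, beta = 0.5, -0.8
sigma_eps, sigma_eta = 1.0, 1.0

y, x = simulate_simultaneous(alpha, beta, sigma_eps, sigma_eta, T=200_000)
a_hat = ols_slope(y, x)
a_theory = alpha + bias_formula(alpha, beta, sigma_eps**2, sigma_eta**2)
print(f"OLS: {a_hat:.3f}  formula (5.4): {a_theory:.3f}  true alpha: {alpha}")

# Near identification: the bias shrinks as sigma_eta^2 / sigma_eps^2 grows.
near_id = {r: bias_formula(alpha, beta, 1.0, r) for r in [0.1, 1.0, 10.0, 100.0]}
for ratio, b in near_id.items():
    print(f"sigma_eta^2/sigma_eps^2 = {ratio:>6}: bias = {b:+.4f}")
```

Only the ratio of the two shock variances matters for the bias, which is why the loop fixes $\sigma_\varepsilon^2 = 1$ and varies $\sigma_\eta^2$.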
Omitted variable bias is perhaps the second most important issue afflicting applied macro work. The fact that it is almost impossible to control for every relevant variable implies that most of our specifications have some degree of misspecification. Obviously, this should not be considered a justification for never doing applied work. On the contrary, as in the case of endogeneity, this problem should just make our claims less ambitious.
One of the most important and most studied omitted variable problems is the estimation of the return to one more year of schooling. The idea is that there exists an unobservable variable, the individual's ability, that is correlated both with the decision to stay in school and with the wage received. It could be argued that an individual of higher ability would be willing to study more years and, for the same level of education, might receive a higher wage.
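As before, we study a simplified model to highlight the problems of estimation. Assume that we are interested in estimating the following relationship:
$$y_t = \alpha x_t + \varepsilon_t$$
but, in this case, the true model is the following:
$$y_t = \alpha x_t + \gamma z_t + \varepsilon_t, \qquad x_t = z_t + \eta_t \qquad \text{(Omitted Variable Model)}$$
where $\varepsilon_t$ and $\eta_t$ are the structural shocks, and $z_t$ is an unobservable omitted variable. The following moments are usually assumed: mean zero,
$$E(\varepsilon_t) = 0, \quad E(\eta_t) = 0, \quad E(z_t) = 0,$$
finite variance,
$$E(\varepsilon_t'\varepsilon_t) = \sigma_\varepsilon^2, \quad E(\eta_t'\eta_t) = \sigma_\eta^2, \quad E(z_t'z_t) = \sigma_z^2,$$
and mutually uncorrelated,
$$E(\varepsilon_t'\eta_t) = 0, \quad E(\varepsilon_t'z_t) = 0, \quad E(\eta_t'z_t) = 0.$$
These are equivalent to the assumptions made before (Assumption 10). The reduced form is the following:
$$y_t = (\alpha + \gamma) z_t + \alpha\eta_t + \varepsilon_t, \qquad x_t = z_t + \eta_t$$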
As before, there are only three relevant moments that can be estimated in the data:
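$$\operatorname{Var}(y_t) = (\alpha + \gamma)^2\sigma_z^2 + \alpha^2\sigma_\eta^2 + \sigma_\varepsilon^2 \tag{5.5a}$$
$$\operatorname{Covar}(x_t, y_t) = (\alpha + \gamma)\sigma_z^2 + \alpha\sigma_\eta^2 \tag{5.5b}$$
$$\operatorname{Var}(x_t) = \sigma_z^2 + \sigma_\eta^2 \tag{5.5c}$$
The OLS estimate is therefore
$$\hat{\alpha}_{OLS} = \frac{\operatorname{Covar}(x_t, y_t)}{\operatorname{Var}(x_t)} = \alpha + \gamma\frac{\sigma_z^2}{\sigma_z^2 + \sigma_\eta^2} \tag{5.6}$$
Equation (5.6) shows the bias introduced by omitted variables. Notice that the same remarks apply as in the simultaneous equations problem. First, the sign of the bias is the sign of $\gamma$. As before, nothing prevents the bias from reversing the sign of the coefficient, and if $\gamma$ is zero, the omitted variable does not enter the $y$ equation, and hence the bias disappears.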
Second, if
$$\frac{\sigma_\eta^2}{\sigma_z^2} \to \infty$$
the bias goes to zero. Finally, the bias is small when $\sigma_\eta^2 \gg \sigma_z^2$, which is exactly the same condition as before for near identification.
This parallel will continue to be present, and it is part of the purpose of this section to show that these different problems are indeed related.
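A similar sketch for the omitted variable model, again with hypothetical values ($\alpha = 1$, $\gamma = 0.5$, $\sigma_z^2 = 1$, $\sigma_\eta^2 = 0.5$, $\sigma_\varepsilon^2 = 1$) chosen only for illustration. The short regression of $y_t$ on $x_t$ should converge to $\alpha + \gamma\sigma_z^2/(\sigma_z^2 + \sigma_\eta^2)$, the value implied by the moments (5.5), while the long regression that also includes $z_t$ (feasible only in a simulation, where $z_t$ is known) recovers $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000

# Hypothetical parameter values, chosen only for illustration.
alpha, gamma = 1.0, 0.5
s2_z, s2_eta, s2_eps = 1.0, 0.5, 1.0

z = rng.normal(0.0, np.sqrt(s2_z), T)        # unobserved variable (e.g. ability)
eta = rng.normal(0.0, np.sqrt(s2_eta), T)
eps = rng.normal(0.0, np.sqrt(s2_eps), T)
x = z + eta
y = alpha * x + gamma * z + eps

# Short regression: z is omitted.
a_short = (x @ y) / (x @ x)
a_theory = alpha + gamma * s2_z / (s2_z + s2_eta)

# Long regression: only feasible in a simulation, where z is observed.
coef_long, *_ = np.linalg.lstsq(np.column_stack([x, z]), y, rcond=None)
a_long = coef_long[0]
print(f"short: {a_short:.3f}  theory: {a_theory:.3f}  long: {a_long:.3f}")
```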
5.1.3 Error-in-variables
Finally, let's study the problem of errors in variables. Assume we are interested in estimating the exact same relationship, but that the true model is
$$y_t = \alpha x_t^* + \varepsilon_t, \qquad x_t = x_t^* + \eta_t,$$
where $x_t^*$ is the true variable, but one that cannot be observed. We only observe a noisy and unbiased measure of it ($x_t$). As before, $\varepsilon_t$ and $\eta_t$ are the structural shocks, and the following moments are usually assumed:
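$$E(\varepsilon_t) = 0, \quad E(\eta_t) = 0, \quad E(x_t^*) = 0,$$
finite variance,
$$E(\varepsilon_t'\varepsilon_t) = \sigma_\varepsilon^2, \quad E(\eta_t'\eta_t) = \sigma_\eta^2, \quad E(x_t^{*\prime}x_t^*) = \sigma_{x^*}^2,$$
and mutually uncorrelated,
$$E(\varepsilon_t'\eta_t) = 0, \quad E(\varepsilon_t'x_t^*) = 0, \quad E(\eta_t'x_t^*) = 0.$$
These are the conditions that make this a “classical” error-in-variables problem. Non-classical error-in-variables problems have implications different from the ones discussed here. These are important extensions, but beyond our scope.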
Assumption 13 is equivalent to the assumptions made in the previous two sub-sections. As before, there are only three relevant moments that can be estimated in the data:
$$\operatorname{Var}(y_t) = \alpha^2\sigma_{x^*}^2 + \sigma_\varepsilon^2 \tag{5.7a}$$
$$\operatorname{Covar}(x_t, y_t) = \alpha\sigma_{x^*}^2 \tag{5.7b}$$
$$\operatorname{Var}(x_t) = \sigma_{x^*}^2 + \sigma_\eta^2 \tag{5.7c}$$
so that
$$\hat{\alpha}_{OLS} = \frac{\operatorname{Covar}(x_t, y_t)}{\operatorname{Var}(x_t)} = \alpha - \alpha\frac{\sigma_\eta^2}{\sigma_{x^*}^2 + \sigma_\eta^2} \tag{5.8}$$
Equation (5.8) shows the bias introduced by error-in-variables. Although the form of equation (5.8) is similar to (5.6), their properties are not exactly the same. First, the sign of the bias depends on the coefficient in the equation to be estimated: the bias is negative if the coefficient is positive, and positive if the coefficient is negative. Second, because the ratio of the variances in the right term is always smaller than one, the bias is always smaller than $\alpha$ in absolute terms. This implies that the bias (in this case) can never change the sign of the coefficient: it moves the coefficient toward zero but never reaches it.4
The only circumstance in which the bias is zero is when5
$$\frac{\sigma_{x^*}^2}{\sigma_\eta^2} \to \infty.$$
Finally, as before, the bias is small when $\sigma_{x^*}^2 \gg \sigma_\eta^2$, which is exactly the same condition implied by near identification.
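For the classical error-in-variables case, here is a minimal sketch under hypothetical values ($\alpha = -2$, $\sigma_{x^*}^2 = 1$, $\sigma_\eta^2 = 0.5$, $\sigma_\varepsilon^2 = 1$). The OLS estimate should approach $\alpha\,\sigma_{x^*}^2/(\sigma_{x^*}^2 + \sigma_\eta^2)$, which keeps the sign of $\alpha$ but is closer to zero. A negative $\alpha$ is used so that the bias comes out positive, as described above.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200_000

# Hypothetical parameter values; alpha < 0 so the attenuation bias is positive.
alpha = -2.0
s2_xstar, s2_eta, s2_eps = 1.0, 0.5, 1.0

x_star = rng.normal(0.0, np.sqrt(s2_xstar), T)   # true regressor, unobserved
eta = rng.normal(0.0, np.sqrt(s2_eta), T)        # classical measurement error
eps = rng.normal(0.0, np.sqrt(s2_eps), T)
y = alpha * x_star + eps
x = x_star + eta                                  # the noisy measure we observe

a_hat = (x @ y) / (x @ x)
reliability = s2_xstar / (s2_xstar + s2_eta)      # always strictly between 0 and 1
a_theory = alpha * reliability
print(f"OLS: {a_hat:.3f}  theory: {a_theory:.3f}  true alpha: {alpha}")
```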
in the linear bivariate setting (as the one described here). If the model is non-linear or there are more regressors, the bias can go in any direction.
5 There is another circumstance: α = 0, but this is not an interesting case.
5.2. LACK OF IDENTIFICATION 101
of x, and their covariance. These moments are given in equations (5.3), (5.5), and (5.7) for the cases of simultaneous equations, omitted variables, and error-in-variables problems, respectively.
In the case of simultaneous equations there are four coefficients, $\alpha$, $\beta$, $\sigma_\varepsilon^2$, and $\sigma_\eta^2$: three equations and four unknowns. For omitted variables we have five parameters, $\alpha$, $\gamma$, $\sigma_\varepsilon^2$, $\sigma_\eta^2$, and $\sigma_z^2$: three equations and five unknowns. Finally, for the error-in-variables problem we have four parameters, $\alpha$, $\sigma_\varepsilon^2$, $\sigma_\eta^2$, and $\sigma_{x^*}^2$: three equations and four unknowns. In all three problems the number of coefficients or parameters to be estimated is larger than the number of equations. Furthermore, not only is the number of equations smaller than the number of unknowns, but there is no linear or non-linear combination of the equations that can solve for any of the parameters, especially the parameter of interest, $\alpha$.
Therefore, without further assumptions there is a continuum of solutions that satisfy the sample moments. In other words, we cannot recover the true parameters from the data, which is known as an identification problem.
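The continuum of solutions can be made concrete for the simultaneous equations case. The sketch below (Python/NumPy; the parameter values and the helper names `implied_omega` and `equivalent_structure` are illustrative, not from the text) takes the covariance matrix implied by one structure $(\alpha, \beta, \sigma_\varepsilon^2, \sigma_\eta^2)$, picks a different $\alpha$, and solves for the $\beta$ and shock variances that make the new structural shocks uncorrelated. Both structures imply exactly the same three moments (5.3), so the data cannot distinguish between them.

```python
import numpy as np


def implied_omega(alpha, beta, s2_eps, s2_eta):
    """Covariance matrix of (y, x) implied by the moments (5.3)."""
    k = 1.0 / (1.0 - alpha * beta) ** 2
    var_y = k * (alpha**2 * s2_eta + s2_eps)
    cov_xy = k * (alpha * s2_eta + beta * s2_eps)
    var_x = k * (s2_eta + beta**2 * s2_eps)
    return np.array([[var_y, cov_xy], [cov_xy, var_x]])


def equivalent_structure(omega, alpha_new):
    """For a given Omega and any alpha_new, find beta, s2_eps, s2_eta reproducing Omega."""
    w_yy, w_xy, w_xx = omega[0, 0], omega[0, 1], omega[1, 1]
    # beta such that eps = y - alpha_new*x and eta = x - beta*y are uncorrelated.
    beta_new = (alpha_new * w_xx - w_xy) / (alpha_new * w_xy - w_yy)
    s2_eps = w_yy - 2 * alpha_new * w_xy + alpha_new**2 * w_xx   # Var(y - alpha_new*x)
    s2_eta = w_xx - 2 * beta_new * w_xy + beta_new**2 * w_yy     # Var(x - beta_new*y)
    return beta_new, s2_eps, s2_eta


# Hypothetical "true" structure.
omega_true = implied_omega(0.5, -0.8, 1.0, 2.0)

# A different slope that fits the same moments.
beta2, s2e2, s2n2 = equivalent_structure(omega_true, alpha_new=0.3)
omega_alt = implied_omega(0.3, beta2, s2e2, s2n2)
print("same Omega:", np.allclose(omega_true, omega_alt))
print(f"alternative structure: alpha = 0.3, beta = {beta2:.4f}, "
      f"s2_eps = {s2e2:.4f}, s2_eta = {s2n2:.4f}")
```

Any other value of `alpha_new` (with $\alpha\beta \neq 1$) works the same way, which is the continuum referred to above.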
The problem of identification described above can be generalized. In this section we discuss the exact same issues and introduce the standard terminology of systems of equations, in particular the rank and order conditions. We will come back to these concepts several times, so this is a good time to develop them.
Assume the model to be estimated is
$$y_t = \alpha x_t + \varepsilon_t, \qquad E(\varepsilon_t' x_t) \neq 0,$$
so that
$$\hat{\alpha}_{OLS} = \alpha + \frac{E(\varepsilon_t' x_t)}{\operatorname{Var}(x_t)},$$
which again indicates that the bias comes from the fact that the right-hand-side variable is correlated with the residual.
In this model, the identification problem is due to the same aspect as in the previous examples. In the data we can only compute three moments, $\operatorname{var}(y_t)$, $\operatorname{var}(x_t)$, and $\operatorname{covar}(y_t, x_t)$, but there are four parameters: $\alpha$, $\sigma_\varepsilon^2$, $\operatorname{var}(x_t)$, and $E(\varepsilon_t' x_t)$.
In the standard literature on systems of equations, when the number of equations is smaller than the number of unknowns, the system is said not to satisfy the order condition. As should be expected, a system of equations that does not satisfy the order condition has no hope of recovering the parameters without further assumptions (or equations).
It is important to highlight that satisfying the order condition does not guarantee that the system of equations has a solution. In other words, we can have enough equations, but they may not be independent. The independence of the equations is a condition known as the rank condition. It states that the number of independent equations has to be at least as large as the number of unknowns. The name “rank” condition comes from the literature on linear systems of equations, where the independence of the equations is determined by the rank of the matrix describing the system. Most of the systems of equations we face when estimating parameters involve non-linear relationships, and checking their independence is much harder than just calculating the column rank of a matrix. Nevertheless, the econometric literature has adopted these definitions since the seminal contribution by (Fisher 1976).
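For non-linear moment conditions, the rank condition can be checked locally through the Jacobian of the map from structural parameters to moments. A minimal sketch, assuming a hypothetical parameter point, the simultaneous equations moments (5.3), and central finite differences (the helper names are illustrative): without restrictions the Jacobian is $3 \times 4$, so its rank is at most 3 and the order condition fails; imposing the exclusion restriction $\beta = 0$ leaves three parameters, and the $3 \times 3$ Jacobian has full rank, so the rank condition holds at that point.

```python
import numpy as np


def moments(theta):
    """Map (alpha, beta, s2_eps, s2_eta) to (Var y, Covar(x, y), Var x) as in (5.3)."""
    alpha, beta, s2_eps, s2_eta = theta
    k = 1.0 / (1.0 - alpha * beta) ** 2
    return k * np.array([alpha**2 * s2_eta + s2_eps,
                         alpha * s2_eta + beta * s2_eps,
                         s2_eta + beta**2 * s2_eps])


def jacobian(f, theta, h=1e-6):
    """Central finite-difference Jacobian of f at theta (one column per parameter)."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        cols.append((f(theta + step) - f(theta - step)) / (2 * h))
    return np.column_stack(cols)


def moments_beta_zero(t):
    """Moments under the exclusion restriction beta = 0; t = (alpha, s2_eps, s2_eta)."""
    return moments(np.array([t[0], 0.0, t[1], t[2]]))


# Unrestricted: 3 moments, 4 unknowns.
J_full = jacobian(moments, [0.5, -0.8, 1.0, 2.0])
rank_full = np.linalg.matrix_rank(J_full)

# Restricted (beta = 0): 3 moments, 3 unknowns.
J_restricted = jacobian(moments_beta_zero, [0.5, 1.0, 2.0])
rank_restricted = np.linalg.matrix_rank(J_restricted)

print(f"unrestricted: rank {rank_full} with {J_full.shape[1]} unknowns")
print(f"beta = 0:     rank {rank_restricted} with {J_restricted.shape[1]} unknowns")
```

A full-rank Jacobian only establishes local identification at the chosen point; it does not rule out other, distant parameter values with the same moments.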
The purpose of this section is to show (and try to convince the reader) that all these problems can be described as part of a more general issue in which the number of coefficients that have to be estimated is larger than the number of equations or moments that can be computed in the data. The next section deals with the methods that have been proposed in the literature to solve this problem.
$$y_t = \alpha x_t + \varepsilon_t, \tag{5.9}$$
$$x_t = \beta y_t + \eta_t, \tag{5.10}$$
where (5.9) is the demand equation, (5.10) is the supply equation, $y_t$ and $x_t$ are the observed price and quantity, and $\varepsilon_t$ and $\eta_t$ are the structural shocks. The parameters of interest are $\alpha$, $\beta$, and the variances of the shocks, $\sigma_\varepsilon^2$ and $\sigma_\eta^2$. For the moment, assume that the structural shocks are uncorrelated: $\sigma_{\varepsilon\eta} = 0$. This assumption is relaxed below.
It is well known that if $\alpha$ and $\beta$ are different from zero, equations (5.9) and (5.10) cannot be consistently estimated without further information. Actually, one can only estimate the covariance matrix of the reduced form ($\hat{\Omega}$), given by
$$\hat{\Omega} = \frac{1}{(1-\alpha\beta)^2}\begin{pmatrix} \alpha^2\sigma_\eta^2 + \sigma_\varepsilon^2 & \alpha\sigma_\eta^2 + \beta\sigma_\varepsilon^2 \\ \cdot & \sigma_\eta^2 + \beta^2\sigma_\varepsilon^2 \end{pmatrix}.$$
The problem of identification is that the covariance matrix only provides three moments (the variance of $y_t$, the variance of $x_t$, and the covariance between $y_t$ and $x_t$), while there are four unknowns: $\alpha$, $\beta$, $\sigma_\eta^2$, $\sigma_\varepsilon^2$.
The literature has solved the problem of identification by imposing additional parameter constraints. This amounts to creating or assuming additional equations for the system of equations we are studying. These restrictions can be divided into the following classes: parameter restrictions, variance restrictions, sign restrictions, and reverse regressions. In this section we summarize the implications of these assumptions.
The objective of this section, therefore, is to describe briefly most of the assumptions that have been used in the literature. This is by no means an exhaustive survey; it is just a summary of some of the most used techniques. As will become clear, I will oversimplify what each of the methodologies does, and indeed, I will present a critical perspective on all of them. It is important to mention that, even though I will look at the methods through this “critical lens”, these assumptions have proven to be extremely useful in applied work. We have learned a great deal by using them, and several of the points on which the profession agrees are the outcome of empirical studies using one or more of these techniques. There are other economic problems, however, in which none of them can be rationalized, and we are still in search of the answers.
By far the most extensively used assumption in the literature is parameter restrictions, in the form of exclusion restrictions or long-run restrictions. For instance, (i) when we estimate VARs and compute a Cholesky decomposition to estimate the structural equations, we are indeed using an exclusion restriction
5.3. STANDARD SOLUTIONS 103
that is implied by the ordering in the VAR; (ii) when we solve the problem by using instrumental variables, we are imposing an exclusion restriction; (iii) when we solve the problem of error-in-variables by using lags, we are using an exclusion restriction; etc.
5.3.1.1.1 Contemporaneous coefficients: Assuming the problem away
The first type of exclusion restriction is one in which we assume that either $\beta = 0$ or $\alpha = 0$. In my view, this is just assuming the problem of endogeneity or omitted variables away. Put this way, the assumption does not sound that reasonable, does it? But this is exactly the implicit assumption we make when we use the triangular decomposition (or Cholesky decomposition) in a VAR! It is also exactly the assumption implied when we claim that a certain variable is a valid instrument, etc.
The assumption indicated here implies that (let's assume we concentrate on $\beta = 0$)
$$y_t = \alpha x_t + \varepsilon_t, \qquad x_t = \eta_t,$$
which implies that $x_t$ is orthogonal to $\varepsilon_t$, and we can run OLS on the first equation to recover the true coefficient.
For the multivariate setup the assumptions are very similar. Assume that there are $N$ endogenous variables, and that their contemporaneous relationship is described by the matrix $A$:
$$A X_t = \varepsilon_t$$
where $\varepsilon_t$ are the structural shocks, assumed to be uncorrelated with covariance matrix $\Sigma$, and where $A$ is a dense matrix with ones on the diagonal. For example, in the bivariate case, with $X_t = (y_t, x_t)'$, the matrix is
$$A = \begin{pmatrix} 1 & -\alpha \\ -\beta & 1 \end{pmatrix}$$
and the structural shocks covariance matrix is
$$\Sigma = \begin{pmatrix} \sigma_\varepsilon^2 & 0 \\ 0 & \sigma_\eta^2 \end{pmatrix}.$$
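To see why the Cholesky (triangular) decomposition amounts to imposing $\beta = 0$, here is a small sketch with hypothetical values ($\alpha = 0.7$, $\sigma_\varepsilon = 1$, $\sigma_\eta = 2$); the helper name `cholesky_alpha` is illustrative. Ordering $x_t$ first, the ratio of the off-diagonal element to the first diagonal element of the lower Cholesky factor of the reduced-form covariance equals $\operatorname{Covar}(x_t, y_t)/\operatorname{Var}(x_t)$, i.e. the OLS slope of $y_t$ on $x_t$. It recovers $\alpha$ when $\beta = 0$ is true and inherits the simultaneity bias (5.4) when it is not.

```python
import numpy as np


def cholesky_alpha(alpha, beta, s_eps, s_eta, T=200_000, seed=3):
    """Estimate alpha from a Cholesky factorization with x ordered first."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, s_eps, T)
    eta = rng.normal(0.0, s_eta, T)
    y = (alpha * eta + eps) / (1.0 - alpha * beta)
    x = (eta + beta * eps) / (1.0 - alpha * beta)
    omega = np.cov(np.vstack([x, y]))     # 2 x 2 covariance, x first, then y
    L = np.linalg.cholesky(omega)         # lower triangular, omega = L @ L.T
    return L[1, 0] / L[0, 0]              # equals Covar(x, y) / Var(x)


# Hypothetical values: alpha = 0.7, sigma_eps = 1, sigma_eta = 2.
alpha_ok = cholesky_alpha(0.7, 0.0, 1.0, 2.0)    # ordering assumption (beta = 0) is true
alpha_bad = cholesky_alpha(0.7, 0.5, 1.0, 2.0)   # true beta = 0.5, ordering is wrong
bias_54 = 0.5 * (1 - 0.7 * 0.5) * 1.0 / (4.0 + 0.5**2 * 1.0)   # bias from (5.4)
print(f"beta = 0: {alpha_ok:.3f}   beta = 0.5: {alpha_bad:.3f} "
      f"(0.7 + bias {bias_54:.3f})")
```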
5.3.1.1.2 Exogenous Variables: Indirect Least Squares
A different set of exclusion restrictions appears when the excluded variable is exogenous rather than endogenous. Assume the model is the following:
$$y_t = \alpha x_t + \pi w_t + \varepsilon_t, \qquad x_t = \beta y_t + \gamma w_t + \eta_t,$$
where $w_t$ is observed. We still make the same Assumption 10, in addition to
Assumption 14 Assume that the observed variable ($w_t$) has mean zero,
$$E(w_t) = 0,$$
finite variance,
$$E(w_t' w_t) = \sigma_w^2,$$
and is uncorrelated with all the other shocks,
$$E(w_t'\varepsilon_t) = 0, \quad E(w_t'\eta_t) = 0.$$
The zero-mean assumption is innocuous: we need it in this setup only to ensure that we do not have to estimate a constant term. This is obviously a simplification. The reduced form model is
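$$y_t = \frac{1}{1-\alpha\beta}\left((\alpha\gamma + \pi) w_t + \varepsilon_t + \alpha\eta_t\right)$$
$$x_t = \frac{1}{1-\alpha\beta}\left((\gamma + \beta\pi) w_t + \beta\varepsilon_t + \eta_t\right)$$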
Although in this model we can compute six moments (three variances, one for each observable variable, and three covariances), there are seven unknowns: three variances ($\sigma_\varepsilon^2$, $\sigma_\eta^2$, and $\sigma_w^2$) and four coefficients ($\alpha$, $\beta$, $\gamma$, and $\pi$). Furthermore, there is no way of recovering even some of the coefficients.
However, it is easy to show that one exclusion restriction is enough to solve the problem of identification. In this case, we assume that the variable $w_t$ enters the second equation but does not enter the first one ($\pi = 0$). The reduced form is
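$$y_t = \frac{1}{1-\alpha\beta}\left(\alpha\gamma w_t + \varepsilon_t + \alpha\eta_t\right)$$
$$x_t = \frac{1}{1-\alpha\beta}\left(\gamma w_t + \beta\varepsilon_t + \eta_t\right)$$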
Notice that the ratio between the coefficients on the exogenous variable identifies $\alpha$. The regression coefficient of $y_t$ on $w_t$ is $\frac{\alpha\gamma}{1-\alpha\beta}$, while the coefficient of $x_t$ on $w_t$ is $\frac{\gamma}{1-\alpha\beta}$. As long as $\gamma \neq 0$, the ratio is exactly $\alpha$. This methodology was developed by (Haavelmo 1947).
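Indirect least squares in a minimal simulated example, with hypothetical values ($\alpha = -0.5$, $\beta = 1$, $\gamma = 0.8$, unit variances) and the exclusion restriction $\pi = 0$ imposed in the data-generating process: regress each endogenous variable on $w_t$ and take the ratio of the two reduced-form coefficients. OLS of $y_t$ on $x_t$ is shown for contrast.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200_000

# Hypothetical parameter values; pi = 0 (w is excluded from the y equation).
alpha, beta, gamma = -0.5, 1.0, 0.8
w = rng.normal(0.0, 1.0, T)      # observed exogenous variable
eps = rng.normal(0.0, 1.0, T)
eta = rng.normal(0.0, 1.0, T)

den = 1.0 - alpha * beta
y = (alpha * gamma * w + eps + alpha * eta) / den   # reduced form with pi = 0
x = (gamma * w + beta * eps + eta) / den

pi_y = (w @ y) / (w @ w)         # reduced-form coefficient on w in the y equation
pi_x = (w @ x) / (w @ w)         # reduced-form coefficient on w in the x equation
alpha_ils = pi_y / pi_x          # the ratio identifies alpha (requires gamma != 0)
alpha_ols = (x @ y) / (x @ x)    # biased by simultaneity
print(f"ILS: {alpha_ils:.3f}  OLS: {alpha_ols:.3f}  true alpha: {alpha}")
```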
5.3.1.1.3 Instrumental Variables
Instrumental variables is similar to the indirect least squares approach we have seen, but the required assumptions are weaker, which also explains why instrumental variables has been used so much in the literature. The setup is the following:
$$y_t = \alpha x_t + \pi w_t + \varepsilon_t, \qquad x_t = \beta y_t + \gamma w_t + \eta_t,$$
where $w_t$ is observed and from now on will be denoted as the “instrument”. We change Assumption 10 to the following:
Assumption 16 Assume that the shocks and the observed variable ($w_t$) have mean zero,
$$E(\varepsilon_t) = 0, \quad E(\eta_t) = 0, \quad E(w_t) = 0,$$
finite variance,
$$E(\varepsilon_t'\varepsilon_t) = \sigma_\varepsilon^2, \quad E(\eta_t'\eta_t) = \sigma_\eta^2, \quad E(w_t'w_t) = \sigma_w^2,$$
and that the instrument is uncorrelated with the residual in the first equation:
$$E(w_t'\varepsilon_t) = 0.$$
Lemma 17 The coefficient $\alpha$ can be estimated consistently if and only if the shocks satisfy Assumption 16 and the exclusion restriction
$$\pi = 0$$
is imposed. Furthermore, one of the possible ways to estimate $\alpha$ is the following:
$$\hat{\alpha}_{IV} = \left(w_t' x_t\right)^{-1}\left(w_t' y_t\right)$$
Notice that in this case the structural shocks are not required to be uncorrelated: $E(\eta_t'\varepsilon_t) \neq 0$ is allowed. Moreover, the instrument can be correlated with the residual in the second equation, $E(w_t'\eta_t) \neq 0$. In these circumstances, even though there are fewer equations than unknowns, we can still solve the problem of estimating $\alpha$. This is, by all means, the beauty of the instrumental variables approach.
First, let's make clear that the number of equations is, in principle, not enough to solve the problem. This means that even though one of the coefficients is identified, the other coefficients cannot be recovered without further assumptions. Second, we derive the instrumental variable estimates. The reduced form is
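$$y_t = \frac{1}{1-\alpha\beta}\left(\alpha\gamma w_t + \varepsilon_t + \alpha\eta_t\right)$$
$$x_t = \frac{1}{1-\alpha\beta}\left(\gamma w_t + \beta\varepsilon_t + \eta_t\right)$$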
In this model there are six moments that can be computed in the sample, but there are eight unknowns: three variances ($\sigma_\varepsilon^2$, $\sigma_\eta^2$, and $\sigma_w^2$), three coefficients ($\alpha$, $\beta$, and $\gamma$), and two covariances of the structural shocks ($E(\eta_t'\varepsilon_t)$ and $E(w_t'\eta_t)$). This means that not all the coefficients can be recovered from the data.
However, the amazing implication of instrumental variables is that even though this system is underidentified, in the sense that the total number of equations is smaller than the total number of unknowns, one of the parameters (actually, the parameter of interest) can still be recovered from the moments.
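Notice that
$$\begin{aligned}
\operatorname{plim}\frac{1}{T}w_t'x_t &= \frac{\gamma}{1-\alpha\beta}\sigma_w^2 + \frac{\beta}{1-\alpha\beta}\operatorname{plim}\frac{1}{T}w_t'\varepsilon_t + \frac{1}{1-\alpha\beta}\operatorname{plim}\frac{1}{T}w_t'\eta_t \\
&= \frac{1}{1-\alpha\beta}\left(\gamma\sigma_w^2 + \operatorname{plim}\frac{1}{T}w_t'\eta_t\right)
\end{aligned}$$
and
$$\begin{aligned}
\operatorname{plim}\frac{1}{T}w_t'y_t &= \frac{\alpha\gamma}{1-\alpha\beta}\sigma_w^2 + \frac{1}{1-\alpha\beta}\operatorname{plim}\frac{1}{T}w_t'\varepsilon_t + \frac{\alpha}{1-\alpha\beta}\operatorname{plim}\frac{1}{T}w_t'\eta_t \\
&= \frac{\alpha}{1-\alpha\beta}\left(\gamma\sigma_w^2 + \operatorname{plim}\frac{1}{T}w_t'\eta_t\right)
\end{aligned}$$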
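which means that even when $\operatorname{plim}\frac{1}{T}w_t'\eta_t \neq 0$, the ratio between these two plims is still $\alpha$, provided the common factor $\gamma\sigma_w^2 + \operatorname{plim}\frac{1}{T}w_t'\eta_t$ is different from zero.6 These assumptions are much weaker than the ones required by ILS; no wonder IV has had such an incredible impact on our profession, while the impact of ILS has been significantly smaller.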
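The following sketch illustrates the derivation above with hypothetical values ($\alpha = -0.5$, $\beta = 1$, $\gamma = 0.8$): the structural shocks are correlated, and the instrument is correlated with $\eta_t$ but not with $\varepsilon_t$, with $\pi = 0$ imposed. The IV ratio $(w_t'x_t)^{-1}(w_t'y_t)$ still recovers $\alpha$, while OLS does not.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200_000

# Hypothetical parameter values; pi = 0 (w excluded from the y equation).
alpha, beta, gamma = -0.5, 1.0, 0.8
w = rng.normal(0.0, 1.0, T)                             # instrument
eps = rng.normal(0.0, 1.0, T)
eta = 0.5 * eps + 0.6 * w + rng.normal(0.0, 1.0, T)    # correlated with eps AND with w

den = 1.0 - alpha * beta
y = (alpha * gamma * w + eps + alpha * eta) / den
x = (gamma * w + beta * eps + eta) / den

alpha_iv = (w @ y) / (w @ x)     # (w'x)^{-1} (w'y)
alpha_ols = (x @ y) / (x @ x)
print(f"IV: {alpha_iv:.3f}  OLS: {alpha_ols:.3f}  true alpha: {alpha}")
```

Because $E(w_t'\eta_t) = 0.6$ here, the ILS and IV assumptions differ: Assumption 14 is violated, yet the ratio of the two plims is unaffected.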
Before turning our attention to the next subject, it is important to remember the implicit assumptions of IV in a much more general setup, one that allows for random coefficients, for example. We will use this in future chapters, and it is worth introducing these concepts right away.
Assume we are interested in estimating
$$y_t = \alpha_t x_t + \varepsilon_t$$
where
$$\alpha_t = \bar{\alpha} + \eta_t$$
and where
$$E(x_t'\varepsilon_t) \neq 0.$$
Assume we have an instrumental variable, denoted $w_t$, that satisfies the following assumptions:
Assumption 18 The instrumental variable is correlated with the right-hand-side variable,
$$E(w_t'x_t) \neq 0,$$
but uncorrelated with the residual in the first equation, as well as with the random coefficient:
$$E(w_t'\varepsilon_t) = 0, \qquad E(w_t'\eta_t) = 0.$$
Then the average of the random coefficients can be recovered by using the standard instrumental variable estimator.
It is important to spell out what these assumptions state: the instrument affects both endogenous variables, but its effect on $y_t$ operates entirely through its impact on $x_t$. This means that the residuals in the first equation, as well as the coefficients, are unaffected by the instrument. Under these circumstances, IV is a consistent estimate of the average effect ($\bar{\alpha}$).
One of the most used restrictions in VAR’s is the one that was popularized by (Blanchard and Quah 1989).
If it is known that one shock does not have permanent effects, then, under some conditions, it is possible
to obtain identification. For example, assume that nominal shocks are short lived, while real shocks are
permanent. Imposing this constraint (Blanchard and Quah 1989) and (Shapiro and Watson 1988) were able
to estimate the effects of aggregate shocks on aggregate activity and unemployment.
The idea is that we can impose that the long-run effect of some shock is zero, creating one additional equation in the system and achieving identification. Obviously, this assumption can be used only when the system includes lagged dependent variables; otherwise it is equivalent to an exclusion restriction.
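A minimal sketch of the long-run identification step for a bivariate VAR(1), assuming the reduced-form coefficient matrix $A_1$ and the residual covariance have already been estimated (the numbers below are hypothetical). If the first variable enters in first differences (e.g., output growth), the cumulative response matrix $(I - A_1)^{-1} B_0$ gives the long-run effects on its level; choosing this matrix to be lower triangular imposes that the second shock has no long-run effect on the level of the first variable. This is one standard way to implement the restriction, not necessarily the exact procedure of the cited papers.

```python
import numpy as np

# Hypothetical reduced-form VAR(1): X_t = A1 X_{t-1} + u_t,
# with X_t = (output growth, unemployment).
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
Sigma_u = np.array([[1.0, 0.3],
                    [0.3, 0.5]])       # covariance of the reduced-form residuals u_t

I = np.eye(2)
C1 = np.linalg.inv(I - A1)             # long-run (cumulative) multiplier of u_t

# Long-run covariance C1 Sigma_u C1'; its lower Cholesky factor is the long-run
# impact matrix, so element (0, 1), shock 2 on variable 1, is zero by construction.
LR = np.linalg.cholesky(C1 @ Sigma_u @ C1.T)

# Structural impact matrix: u_t = B0 e_t with E[e_t e_t'] = I.
B0 = (I - A1) @ LR

print("impact matrix B0:\n", B0)
print("long-run effects C1 @ B0:\n", C1 @ B0)
```

The zero long-run restriction supplies the one extra equation: $B_0 B_0' = \Sigma_u$ gives three equations for the four elements of $B_0$.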
6 Usually it is assumed that $\operatorname{plim}\frac{1}{T}w_t'\eta_t = 0$ and that $\operatorname{plim}\frac{1}{T}\varepsilon_t'\eta_t = 0$, but as this derivation has shown, this is not a requirement.
Finally, there are constraints on the variances,7 for example, that $\sigma_\eta^2/\sigma_\varepsilon^2$ is equal to some constant, or to infinity. The case in which the relative variance is restricted to be equal to a constant has not been used (frequently) in applied work, while the assumption that the ratio goes to zero or to infinity is one of the underlying assumptions of most event studies.
Near identification refers to the case in which one of the variances is infinitely large in comparison to the others. In that case, as discussed earlier in this chapter, the problem of identification is solved.
Most event studies indeed appeal to this assumption. For example, in corporate finance, when we evaluate the impact of earnings announcements on stock prices, the idea is to pool all earnings announcements together on a single day, and the argument is that this averaging process makes all other shocks in the economy (such as changes in the risk premium, interest rates, confidence, etc.) smaller. Therefore, it is possible to measure the impact of the earnings exclusively.
This is the original intuition developed by (Wright 1928) to solve the problem of identification. See
(Fisher 1976) for a general discussion.
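The event-study logic can be illustrated with the simultaneous equations model (5.1)-(5.2), using hypothetical values ($\alpha = 0.5$, $\beta = -0.8$). On “normal” days both shocks have unit variance; on “event” days the variance of $\eta_t$, the shock that moves $x_t$, is assumed to be 100 times larger. OLS on the event-day sample is close to $\alpha$, as the near-identification limit of (5.4) suggests, while OLS on normal days is heavily biased.

```python
import numpy as np


def draw(alpha, beta, s2_eps, s2_eta, T, rng):
    """Simulate (y, x) from the reduced form of y = alpha*x + eps, x = beta*y + eta."""
    eps = rng.normal(0.0, np.sqrt(s2_eps), T)
    eta = rng.normal(0.0, np.sqrt(s2_eta), T)
    y = (alpha * eta + eps) / (1.0 - alpha * beta)
    x = (eta + beta * eps) / (1.0 - alpha * beta)
    return y, x


rng = np.random.default_rng(7)
alpha, beta = 0.5, -0.8                                  # hypothetical values

y_n, x_n = draw(alpha, beta, 1.0, 1.0, 100_000, rng)     # normal days
y_e, x_e = draw(alpha, beta, 1.0, 100.0, 100_000, rng)   # event days: eta dominates

a_normal = (x_n @ y_n) / (x_n @ x_n)
a_event = (x_e @ y_e) / (x_e @ x_e)
print(f"normal days: {a_normal:.3f}  event days: {a_event:.3f}  true alpha: {alpha}")
```

The event-day estimate is still slightly biased: (5.4) implies a bias of about $-0.011$ at a variance ratio of 100, which only vanishes in the limit.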
Setting
$$\sigma_\eta^2/\sigma_\varepsilon^2 = \lambda$$
solves the problem of identification in the simultaneous equations problem. In general, this assumption is hard to justify, and therefore it has not received a lot of attention in applied work. However, it is important to highlight that, in principle, this assumption is as hard to justify as those based on exclusion restrictions.
Sign restrictions: constraining the signs of the slopes of the structural equations can achieve partial identification, because the two inequalities imply a region of admissible parameters. Even though a unique estimate cannot be obtained, at least an admissible interval is derived. See (Fisher 1976) and (Blanchard and Diamond 1989).
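A minimal sketch of partial identification by sign restrictions, assuming, as is natural when $y_t$ is the price and $x_t$ the quantity, that the demand slope in (5.9) is negative ($\alpha < 0$) and the supply slope in (5.10) is positive ($\beta > 0$). The hypothetical “true” structure is used only to generate $\Omega$; the code then scans candidate values of $\alpha$, computes for each the $\beta$ that reproduces $\Omega$ with uncorrelated shocks, and keeps the candidates with $\beta > 0$. In this example, where the covariance of $x_t$ and $y_t$ is negative, the admissible set turns out to be an interval whose end points coincide with the direct and reverse OLS regressions.

```python
import numpy as np


def implied_omega(alpha, beta, s2_eps, s2_eta):
    """Reduced-form covariance of (y, x) implied by the structural model."""
    k = 1.0 / (1.0 - alpha * beta) ** 2
    cov = alpha * s2_eta + beta * s2_eps
    return k * np.array([[alpha**2 * s2_eta + s2_eps, cov],
                         [cov, s2_eta + beta**2 * s2_eps]])


# Hypothetical "true" structure, used only to generate Omega.
omega = implied_omega(alpha=-1.0, beta=0.5, s2_eps=1.0, s2_eta=1.0)
w_yy, w_xy, w_xx = omega[0, 0], omega[0, 1], omega[1, 1]

# Candidate slopes satisfying the first sign restriction, alpha < 0.
alpha_grid = np.linspace(-5.0, -0.001, 20_001)
# For each candidate, the beta that makes the implied shocks uncorrelated.
beta_grid = (alpha_grid * w_xx - w_xy) / (alpha_grid * w_xy - w_yy)
# Second sign restriction, beta > 0. Omega is positive definite, so the implied
# shock variances are automatically positive and need no separate check.
admissible = alpha_grid[beta_grid > 0]
lo, hi = admissible.min(), admissible.max()

ols_direct = w_xy / w_xx       # slope of the regression of y on x
ols_reverse = w_yy / w_xy      # inverse of the slope of the regression of x on y
print(f"admissible alpha in [{lo:.3f}, {hi:.3f}]; "
      f"direct OLS {ols_direct:.3f}, reverse OLS {ols_reverse:.3f}")
```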
[to be completed XXXX]
In the standard simultaneous equations problem, it is possible to determine, under certain conditions, the range to which the true coefficients belong. The method was developed by Gini (1926) and was later recovered by (?) and (?).8 The purpose of the bounds is to show the extent of the misspecification and to offer a range of coefficients that is consistent with any possible identification scheme. Regressions in which the bounds are tight imply that the biases introduced by simultaneous equations are small.9
7 See (Rothenberg and Ruud 1990) for a detailed study where covariance restrictions are imposed in linear simultaneous equation models.
8 See (?) for a discussion along the same lines as here.
This method was developed for the general problem of misspecification. Assume we are interested in estimating the simple relationship
$$y_t = \alpha x_t + \nu_{y,t} \qquad (5.11)$$
where the right-hand-side variable is correlated with the residual because of simultaneity. Notice that this is exactly the first equation in our system of equations. As is well known, and as we have already argued, in the presence of misspecification we cannot estimate α consistently. Indeed, because regression (5.11) is misspecified, it is important to realize that there are two ways of estimating α: regressing $y_t$ on $x_t$, which is equation (5.12), or regressing $x_t$ on $y_t$, which is equation (5.13) and whose coefficient estimates $1/\alpha$. The probability limit of the OLS estimate in equation (5.12) is
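$$\hat\alpha_{eq\text{-}5.12} = \alpha + \beta(1-\alpha\beta)\,\frac{\sigma^2_\varepsilon}{\sigma^2_\eta + \beta^2\sigma^2_\varepsilon}$$
while the estimate of $1/\alpha$ in equation (5.13) is (note that the two expressions are similar):
$$\widehat{(1/\alpha)}_{eq\text{-}5.13} = \frac{1}{\alpha} - \frac{1}{\alpha}(1-\alpha\beta)\,\frac{\sigma^2_\varepsilon}{\alpha^2\sigma^2_\eta + \sigma^2_\varepsilon}$$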
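The two probability limits can be checked numerically. The sketch below assumes, as above, the system $y_t = \alpha x_t + \varepsilon_t$, $x_t = \beta y_t + \eta_t$ with uncorrelated shocks. In population, the slope of $y$ on $x$ is $\omega_{12}/\omega_{22}$ and the slope of $x$ on $y$ is $\omega_{12}/\omega_{11}$, and both should match the closed forms. The parameter values are arbitrary.

```python
import math


def reduced_form_cov(alpha, beta, s2_eta, s2_eps):
    """Population (Var y, Cov(y, x), Var x) for y = alpha*x + eps, x = beta*y + eta."""
    d = (1.0 - alpha * beta) ** 2
    return ((alpha**2 * s2_eta + s2_eps) / d,
            (alpha * s2_eta + beta * s2_eps) / d,
            (s2_eta + beta**2 * s2_eps) / d)


def ols_limits(alpha, beta, s2_eta, s2_eps):
    """Population OLS slopes: y on x (equation 5.12) and x on y (equation 5.13)."""
    w11, w12, w22 = reduced_form_cov(alpha, beta, s2_eta, s2_eps)
    return w12 / w22, w12 / w11


def closed_forms(alpha, beta, s2_eta, s2_eps):
    """The two bias formulas from the text."""
    a_12 = alpha + beta * (1 - alpha * beta) * s2_eps / (s2_eta + beta**2 * s2_eps)
    inv_13 = 1 / alpha - (1 / alpha) * (1 - alpha * beta) * s2_eps / (alpha**2 * s2_eta + s2_eps)
    return a_12, inv_13


for params in [(0.5, -0.8, 1.0, 2.0), (-1.2, 0.3, 0.5, 1.5), (0.7, 0.4, 2.0, 1.0)]:
    direct, reverse = ols_limits(*params)
    a_12, inv_13 = closed_forms(*params)
    assert math.isclose(direct, a_12) and math.isclose(reverse, inv_13)
    print(params, direct, reverse)
```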
We are interested in the estimation of α; hence, we solve for α in the second equation. In fact, we can use both estimates to compute the range in which the true coefficient α must lie if the model is correct. To illustrate the range, consider two cases: α and β have opposite signs, or α and β have the same sign.
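If α and β have opposite signs, the bias in equation (5.12) makes the OLS coefficient smaller (in absolute value) than the true one, provided the bias is not large enough to reverse the sign of the estimate (the OLS coefficient is a weighted average of α and 1/β). In other words,
$$|\hat\alpha_{eq\text{-}5.12}| < |\alpha|.$$
Similarly, under the same conditions, the estimate in equation (5.13) is also biased toward zero. Hence we can write
$$\left|\widehat{(1/\alpha)}_{eq\text{-}5.13}\right| < \left|\frac{1}{\alpha}\right|.$$
Therefore,
$$|\hat\alpha_{eq\text{-}5.12}| < |\alpha| < \frac{1}{\left|\widehat{(1/\alpha)}_{eq\text{-}5.13}\right|}.$$
In other words, if the two schedules have slopes of opposite sign, then the true coefficient lies between these two estimates.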
The intuition of this result is very simple. First, it is important to realize that equation (5.12) is the OLS regression run in one direction, while equation (5.13) is the OLS regression run in the other direction. If the schedules have slopes of opposite sign, simultaneity biases the OLS coefficients toward zero, because each OLS coefficient is a weighted combination of the two slopes: one positive and the other negative. Hence the OLS coefficients in both regressions are smaller in absolute value than the true ones. However, the coefficient in the first equation (5.12) estimates α, while the coefficient in the second equation (5.13) estimates 1/α. This is what determines the range.
9 Although the bounds were developed for the general misspecification problem, here we concentrate on the simultaneous equations case.
When the two schedules have slopes of the same sign, the range of coefficients is different. In this case (assuming αβ < 1), the bias in equation (5.12) is away from zero: if both slopes are positive, the OLS estimate is larger than α, and if both are negative, it is smaller than α. The estimate of 1/α in equation (5.13) is still biased toward zero, so its inverse also overstates |α|. This means that, in absolute value, the true coefficient has to satisfy
$$|\alpha| < \min\left\{|\hat\alpha_{eq\text{-}5.12}|,\ \frac{1}{\left|\widehat{(1/\alpha)}_{eq\text{-}5.13}\right|}\right\}.$$
Again, this implies an admissible range of coefficients. The intuition follows the same reasoning as before. The difference is that now both implied estimates of α are larger in absolute value than the true coefficient.
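Both cases can be checked on a grid of hypothetical parameter values using population moments. The sketch below assumes the same system, $y_t = \alpha x_t + \varepsilon_t$ and $x_t = \beta y_t + \eta_t$. For opposite signs it keeps only the cases in which the OLS estimates retain the sign of α. For equal signs, every grid point satisfies αβ < 1. The last call is a case where the bias flips the sign of the OLS estimate, and the bracketing fails.

```python
def reduced_form_cov(alpha, beta, s2_eta, s2_eps):
    """Population (Var y, Cov(y, x), Var x) for y = alpha*x + eps, x = beta*y + eta."""
    d = (1.0 - alpha * beta) ** 2
    return ((alpha**2 * s2_eta + s2_eps) / d,
            (alpha * s2_eta + beta * s2_eps) / d,
            (s2_eta + beta**2 * s2_eps) / d)


def ols_bounds(alpha, beta, s2_eta, s2_eps):
    """Return (direct, reverse): plim of OLS of y on x (5.12) and the
    inverse of the plim of OLS of x on y (5.13)."""
    w11, w12, w22 = reduced_form_cov(alpha, beta, s2_eta, s2_eps)
    return w12 / w22, w11 / w12


def bounds_hold(alpha, beta, s2_eta, s2_eps):
    direct, reverse = ols_bounds(alpha, beta, s2_eta, s2_eps)
    if alpha * beta < 0:                                   # opposite signs: bracket
        return abs(direct) < abs(alpha) < abs(reverse)
    return abs(alpha) < min(abs(direct), abs(reverse))     # same signs: upper bound


checked = 0
for alpha in (-2.0, -0.5, 0.5, 2.0):
    for beta in (-0.4, -0.1, 0.1, 0.4):                    # |alpha*beta| <= 0.8 < 1
        for s2_eta in (0.5, 1.0, 4.0):
            for s2_eps in (0.5, 1.0, 4.0):
                w12 = reduced_form_cov(alpha, beta, s2_eta, s2_eps)[1]
                if alpha * beta < 0 and w12 * alpha <= 0:
                    continue                               # bias flipped the OLS sign
                assert bounds_hold(alpha, beta, s2_eta, s2_eps)
                checked += 1

print(checked, bounds_hold(1.0, -0.5, 0.01, 1.0))         # sign-flip case: bracket fails
```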
These bounds have been extended to the multivariate case (here we have discussed only the bivariate case) and to forms of misspecification other than simultaneous equations.
Bibliography
Blanchard, O., and D. Quah (1989): “The Dynamic Effects of Aggregate Demand and Aggregate Supply Disturbances,” American Economic Review, 79, 655–73.
Blanchard, O. J., and P. Diamond (1989): “The Beveridge Curve,” Brookings Papers on Economic Activity, 1, 1–76.
Fisher, F. M. (1976): The Identification Problem in Econometrics. Robert E. Krieger Publishing Co., New York, second edn.
Haavelmo, T. (1947): “Methods of Measuring the Marginal Propensity to Consume,” Journal of the American Statistical Association, 42, 105–122.
Rothenberg, T. J., and P. A. Ruud (1990): “Simultaneous Equations with Covariance Restrictions,” Journal of Econometrics, 44(1-2), 25–39.
Shapiro, M. D., and M. W. Watson (1988): “Sources of Business Cycle Fluctuations,” in NBER Macroeconomics Annual 1988, ed. by S. Fischer. MIT Press, Cambridge, Mass.
Wright, P. G. (1928): The Tariff on Animal and Vegetable Oils. The Institute of Economics, The Macmillan Company, New York.
Chapter 6
Identification through Heteroskedasticity: Theory.
The question of identification when the model includes endogenous variables has been studied for several decades now.1 The problem arises when the structural form cannot be estimated directly and the parameters must be recovered from the reduced form, which has fewer equations than unknowns. Thus, more information is required to solve for the original parameters. The typical solution is to impose additional constraints based on economic knowledge about the particular model being estimated. Indeed, as discussed in the previous chapter, assumptions such as exclusion, sign, long-run, and covariance restrictions have been very useful in numerous applied problems. However, they cannot always be justified.
In this chapter we present an alternative method to solve the identification problem that is based on
the heteroskedasticity that exists in the data. I show that if the structural shocks have a known correlation
(usually zero), the identification problem can be solved by simply appealing to the heteroskedasticity of the
structural shocks. For simplicity, I begin with a case in which there are two endogenous variables and two
regimes. Subsequently, I study the cases in which there are more than two regimes, when there are multiple
endogenous variables, and when common unobservable shocks are present.
The chapter is organized as follows. In section 6.1, we discuss the preliminary intuition behind identification based on heteroskedasticity. In section 6.2, the typical identification problem is specified in the bivariate setting, and the methodology based on heteroskedasticity is studied both when the data exhibit two regimes and when they exhibit more than two regimes. A GMM interpretation of the estimation problem is also developed. In section 6.3, necessary conditions for identification are derived for multivariate processes with unobservable common shocks. In section 6.4, consistency under misspecification of the heteroskedasticity is explored in the bivariate setup. Two cases are studied: first, when the number of regimes is correctly specified but their timing (the windows) is not, and second, when the number of regimes used is smaller than the actual number of regimes exhibited by the data.
are the outcomes of shocks to both the supply and the demand schedules, so the OLS estimates would be biased. The instrumental variable approach solves the identification problem by finding a variable that shifts the supply schedule without affecting the demand curve, thus measuring the slope of the demand curve. The heteroskedasticity of the structural shocks works in a similar fashion.
The simplest intuition can be developed by looking at a special case: split the sample in two and assume that in the second sub-sample the supply shocks are more volatile than in the first sub-sample, while the demand shocks have a constant variance across the two sub-samples. This increase in the variance of the supply shocks implies that the “cloud” of realizations stretches along the demand schedule, as shown in the second panel of Figure 6.1. The residuals are distributed along an ellipse, and the shift in the variance rotates the ellipse toward the demand curve. From the instrumental variables point of view, this is equivalent to having a “probabilistic” instrument: we cannot be sure that the supply curve shifts (as in the standard IV approach), but in the second sample shocks to supply are more likely to occur. Thus, the joint behavior more closely approximates the demand schedule.
In the limit, if the variance of the supply shocks goes to infinity, the ellipse collapses onto the demand curve, and the slope of the demand can be estimated by OLS. This intuition was put forward by (Wright 1928). The approach here extends the original methodology to the case in which the shifts in the variances are finite and the form of the heteroskedasticity is unknown. In fact, if the structural shocks are not correlated, the system is identified just by knowing that there is a change in the relative variance of the shocks. In particular, if both variances shift by the same proportion, then the two ellipses are proportional and the system is not identified. On the other hand, if the relative importance of the shocks changes, then the system is identified by the rotation of the ellipse.
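The small Monte Carlo sketch below illustrates Wright's limiting argument. It assumes the system $y_t = \alpha x_t + \varepsilon_t$, read as the demand schedule, and $x_t = \beta y_t + \eta_t$, read as the supply schedule, so η plays the role of the supply shock. These labels and the parameter values are hypothetical. As the variance of η grows while the variance of ε stays fixed, the OLS slope of y on x moves toward α. In population, this slope is a weighted average of α and 1/β, with weight $\sigma^2_\eta/(\sigma^2_\eta + \beta^2\sigma^2_\varepsilon)$ on α. With the values below it is therefore about 0 when $\sigma^2_\eta = 1$ and about −0.998 when $\sigma^2_\eta = 1000$.

```python
import random


def ols_slope_y_on_x(alpha, beta, s2_eta, s2_eps, n=20000, seed=0):
    """Simulate y = alpha*x + eps, x = beta*y + eta and return the OLS slope of y on x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, s2_eps ** 0.5)
        eta = rng.gauss(0.0, s2_eta ** 0.5)
        ys.append((alpha * eta + eps) / (1.0 - alpha * beta))   # reduced form
        xs.append((eta + beta * eps) / (1.0 - alpha * beta))
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


alpha, beta = -1.0, 1.0        # hypothetical "demand" and "supply" slopes
for s2_eta in (1.0, 10.0, 1000.0):
    print(s2_eta, ols_slope_y_on_x(alpha, beta, s2_eta, 1.0))
```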
6.2 Identification
Assume there are two regimes in the variances of the structural shocks: high and low volatility. Additionally,
assume that the structural parameters are stable across the regimes. Under these assumptions the two
reduced form covariance matrices have the same structure as before:
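$$\hat\Omega_s \equiv \begin{pmatrix}\omega_{11,s} & \omega_{12,s}\\ \cdot & \omega_{22,s}\end{pmatrix} = \frac{1}{(1-\alpha\beta)^2}\begin{pmatrix}\alpha^2\sigma^2_{\eta,s}+\sigma^2_{\varepsilon,s} & \alpha\sigma^2_{\eta,s}+\beta\sigma^2_{\varepsilon,s}\\ \cdot & \sigma^2_{\eta,s}+\beta^2\sigma^2_{\varepsilon,s}\end{pmatrix}, \qquad s\in\{1,2\}, \qquad (6.1)$$
where each regime is denoted by s ∈ {1, 2}, the variances of the structural shocks in regime s are $\sigma^2_{\varepsilon,s}$ and $\sigma^2_{\eta,s}$, and $\hat\Omega_s$ is the reduced form covariance matrix in regime s. In this new system of equations there are six unknowns (α, β, $\sigma^2_{\eta,1}$, $\sigma^2_{\varepsilon,1}$, $\sigma^2_{\eta,2}$, and $\sigma^2_{\varepsilon,2}$) and two covariance matrices that provide six equations! If the equations are independent, the identification problem has been solved.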
It is essential to restate the assumptions that lead to the identification of the system: (i) the parameters
are stable across the heteroskedasticity regimes, and (ii) the structural shocks are not correlated. These
assumptions are implicit in much of the applied macro work and are further discussed below.
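Eliminating the variances from equation (6.1), α and β satisfy the following non-linear system of equations:
$$\alpha = \frac{\omega_{12,s}-\beta\,\omega_{11,s}}{\omega_{22,s}-\beta\,\omega_{12,s}}, \qquad s\in\{1,2\}. \qquad (6.2)$$
Equating the two expressions for α in (6.2) and cross-multiplying yields a quadratic equation in β:
$$[\omega_{11,1}\omega_{12,2}-\omega_{12,1}\omega_{11,2}]\,\beta^2 - [\omega_{11,1}\omega_{22,2}-\omega_{22,1}\omega_{11,2}]\,\beta + [\omega_{12,1}\omega_{22,2}-\omega_{22,1}\omega_{12,2}] = 0 \qquad (6.3)$$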
There are two solutions to the quadratic equation. It is easy to show that if (α, β) is one solution to the system of equations, then α* = 1/β, β* = 1/α is the other. Indeed, the two solutions correspond to the two possible ways in which the structural form can be written. In other words, the system is identified up to a row permutation of the original model.
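The sketch below computes both roots of (6.3) from two regime covariance matrices and the α implied by (6.2) for each root. It uses population covariances generated under the assumed system $y_t = \alpha x_t + \varepsilon_t$, $x_t = \beta y_t + \eta_t$, with hypothetical values α = 0.5 and β = −0.8 and with only the variance of η changing across regimes. One recovered pair should be (0.5, −0.8), and the other should be its permutation (1/β, 1/α) = (−1.25, 2.0).

```python
import math


def reduced_form_cov(alpha, beta, s2_eta, s2_eps):
    """Population (w11, w12, w22) as in equation (6.1)."""
    d = (1.0 - alpha * beta) ** 2
    return ((alpha**2 * s2_eta + s2_eps) / d,
            (alpha * s2_eta + beta * s2_eps) / d,
            (s2_eta + beta**2 * s2_eps) / d)


def solve_two_regimes(om1, om2, tol=1e-12):
    """Roots of the quadratic (6.3) and the alpha implied by (6.2) for each."""
    w11a, w12a, w22a = om1
    w11b, w12b, w22b = om2
    qa = w11a * w12b - w12a * w11b
    qb = -(w11a * w22b - w22a * w11b)
    qc = w12a * w22b - w22a * w12b
    if abs(qa) < tol:
        raise ValueError("leading coefficient is zero: check the rank condition")
    root = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    pairs = []
    for beta in ((-qb + root) / (2.0 * qa), (-qb - root) / (2.0 * qa)):
        alpha = (w12a - beta * w11a) / (w22a - beta * w12a)   # (6.2), regime 1
        pairs.append((alpha, beta))
    return pairs


om1 = reduced_form_cov(0.5, -0.8, 1.0, 1.0)
om2 = reduced_form_cov(0.5, -0.8, 3.0, 1.0)
print(solve_two_regimes(om1, om2))
```

The estimator cannot tell the two pairs apart, which is the row-permutation indeterminacy described above. Choosing between them requires outside information, such as the expected sign of a slope.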
Proposition 19 Let $y_t$ and $x_t$ be described by equations (5.9) and (5.10), where the parameters determining the law of motion (α and β) are stable and where the disturbances have finite variance, are not correlated, and exhibit heteroskedasticity that can be described by two regimes. Then, if the covariance matrices satisfy
$$\det\left(\hat\Omega_2 - \frac{\omega_{11,2}}{\omega_{11,1}}\,\hat\Omega_1\right) \neq 0, \qquad (6.4)$$
the structural form is just identified: α and β are consistently estimated from the two estimable covariance matrices.
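Proof. Identification is achieved if equation (6.3) has real solutions. Two distinct real roots require
$$[\omega_{11,1}\omega_{22,2}-\omega_{22,1}\omega_{11,2}]^2 - 4\,[\omega_{11,1}\omega_{12,2}-\omega_{12,1}\omega_{11,2}]\,[\omega_{12,1}\omega_{22,2}-\omega_{22,1}\omega_{12,2}] > 0.$$
This condition holds given the structure of the covariance matrices in (6.1). Because the roots of (6.3) are β and 1/α, the quadratic factors as $c(\hat\beta-\beta)(\hat\beta-1/\alpha)$, where c is its leading coefficient. The discriminant therefore equals $c^2(\beta-1/\alpha)^2$, which is positive whenever c ≠ 0 and αβ ≠ 1. Therefore, if the coefficients of the quadratic equation are different from zero, the two roots are real.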
The last requirement is to determine when the quadratic equation does not have infinitely many solutions. This requires that either
$$\omega_{11,1}\omega_{22,2}-\omega_{22,1}\omega_{11,2} \neq 0 \quad\text{or}\quad \omega_{11,1}\omega_{12,2}-\omega_{12,1}\omega_{11,2} \neq 0.$$
Given the model generating the data, both conditions fail only if the heteroskedasticity implies a proportional change in the variances of both structural shocks, that is, when Ω2 = aΩ1 for some scalar a. This is the only case in which the quadratic equation (6.3) has infinitely many solutions.
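Note that if Ω2 = aΩ1, then det[Ω2 − aΩ1] = 0, which can be tested by checking whether
$$\det\left(\Omega_2 - \frac{\omega_{11,2}}{\omega_{11,1}}\,\Omega_1\right) \stackrel{?}{=} 0.$$
By construction, the (1,1) element of this matrix difference is zero, so its determinant equals $-(\omega_{12,2} - \omega_{11,2}\,\omega_{12,1}/\omega_{11,1})^2$. Hence the test is equivalent to asking whether the covariance of the normalized difference is zero:
$$\omega_{11,1}\omega_{12,2} - \omega_{11,2}\omega_{12,1} \stackrel{?}{=} 0.$$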
The small sample properties of this statistic are better behaved than the ones from the determinant, and in
the empirical section this is what is implemented to check the rank condition.
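The sketch below computes both versions of the rank check on population covariances from the assumed system $y_t = \alpha x_t + \varepsilon_t$, $x_t = \beta y_t + \eta_t$, with hypothetical parameter values. When both variances double, both statistics are numerically zero. When only the variance of η changes, both are non-zero, and the determinant equals minus the square of the covariance statistic divided by $\omega_{11,1}$.

```python
def reduced_form_cov(alpha, beta, s2_eta, s2_eps):
    """Population (w11, w12, w22) as in equation (6.1)."""
    d = (1.0 - alpha * beta) ** 2
    return ((alpha**2 * s2_eta + s2_eps) / d,
            (alpha * s2_eta + beta * s2_eps) / d,
            (s2_eta + beta**2 * s2_eps) / d)


def rank_statistics(om1, om2):
    """Return det(Omega2 - c*Omega1) with c = w11,2 / w11,1, and the
    covariance statistic w11,1*w12,2 - w11,2*w12,1."""
    w11a, w12a, w22a = om1
    w11b, w12b, w22b = om2
    c = w11b / w11a
    d11, d12, d22 = w11b - c * w11a, w12b - c * w12a, w22b - c * w22a
    det = d11 * d22 - d12 * d12          # d11 is zero up to rounding
    cov_stat = w11a * w12b - w11b * w12a
    return det, cov_stat


base = reduced_form_cov(0.5, -0.8, 1.0, 1.0)
proportional = reduced_form_cov(0.5, -0.8, 2.0, 2.0)   # both variances doubled
rotated = reduced_form_cov(0.5, -0.8, 3.0, 1.0)        # relative variance shifts
print(rank_statistics(base, proportional))
print(rank_statistics(base, rotated))
```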
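Consistent estimates of both covariance matrices imply that the estimate of β solves the quadratic equation
$$[\omega_{11,1}\omega_{12,2}-\omega_{12,1}\omega_{11,2}]\,\beta^2 - [\omega_{11,1}\omega_{22,2}-\omega_{22,1}\omega_{11,2}]\,\beta + [\omega_{12,1}\omega_{22,2}-\omega_{22,1}\omega_{12,2}] = 0,$$
where
$$\omega_{11,1}\omega_{12,2}-\omega_{12,1}\omega_{11,2} = \frac{1}{(1-\alpha\beta)^4}\left[(\alpha^2\sigma^2_{\eta,1}+\sigma^2_{\varepsilon,1})(\alpha\sigma^2_{\eta,2}+\beta\sigma^2_{\varepsilon,2}) - (\alpha\sigma^2_{\eta,1}+\beta\sigma^2_{\varepsilon,1})(\alpha^2\sigma^2_{\eta,2}+\sigma^2_{\varepsilon,2})\right]$$
$$\omega_{11,1}\omega_{22,2}-\omega_{22,1}\omega_{11,2} = \frac{1}{(1-\alpha\beta)^4}\left[(\alpha^2\sigma^2_{\eta,1}+\sigma^2_{\varepsilon,1})(\sigma^2_{\eta,2}+\beta^2\sigma^2_{\varepsilon,2}) - (\sigma^2_{\eta,1}+\beta^2\sigma^2_{\varepsilon,1})(\alpha^2\sigma^2_{\eta,2}+\sigma^2_{\varepsilon,2})\right]$$
$$\omega_{12,1}\omega_{22,2}-\omega_{22,1}\omega_{12,2} = \frac{1}{(1-\alpha\beta)^4}\left[(\alpha\sigma^2_{\eta,1}+\beta\sigma^2_{\varepsilon,1})(\sigma^2_{\eta,2}+\beta^2\sigma^2_{\varepsilon,2}) - (\sigma^2_{\eta,1}+\beta^2\sigma^2_{\varepsilon,1})(\alpha\sigma^2_{\eta,2}+\beta\sigma^2_{\varepsilon,2})\right],$$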
where, under the assumption that the rank condition is satisfied (equations (6.4) or (6.5)), the solutions of the system of equations are
$$\hat\beta = \frac{(1+\alpha\beta) \pm (1-\alpha\beta)}{2\alpha}.$$
One root is $\hat\beta = \beta$ and the other is $\hat\beta = 1/\alpha$; these are the two permutations of the system of equations. Thus, if $\sigma^2_{\eta,1}$, $\sigma^2_{\varepsilon,1}$, $\sigma^2_{\eta,2}$, and $\sigma^2_{\varepsilon,2}$ are consistently estimated from the data, the consistency of β is assured. But consistent estimates of the structural variances are indeed obtained from consistent estimates of the reduced form covariance matrices if the system is linear, the parameters are stable, and the residuals have finite variances.
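Consistency can be illustrated by simulation. The sketch below draws two regimes from the assumed system $y_t = \alpha x_t + \varepsilon_t$, $x_t = \beta y_t + \eta_t$ with Gaussian shocks and replaces the population covariances with sample covariances. It then solves (6.3) and (6.2). The sample size, seed, and parameter values are arbitrary choices for illustration. With 100,000 draws per regime, one of the two recovered pairs should lie close to the true (0.5, −0.8).

```python
import math
import random


def sample_cov(alpha, beta, s2_eta, s2_eps, n, rng):
    """Simulate one regime of y = alpha*x + eps, x = beta*y + eta and
    return the sample (Var y, Cov(y, x), Var x)."""
    ys, xs = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, math.sqrt(s2_eps))
        eta = rng.gauss(0.0, math.sqrt(s2_eta))
        ys.append((alpha * eta + eps) / (1.0 - alpha * beta))
        xs.append((eta + beta * eps) / (1.0 - alpha * beta))
    my, mx = sum(ys) / n, sum(xs) / n
    w11 = sum((y - my) ** 2 for y in ys) / n
    w12 = sum((y - my) * (x - mx) for y, x in zip(ys, xs)) / n
    w22 = sum((x - mx) ** 2 for x in xs) / n
    return w11, w12, w22


def solve_two_regimes(om1, om2, tol=1e-12):
    """Roots of (6.3) and the alpha implied by (6.2) for each root."""
    w11a, w12a, w22a = om1
    w11b, w12b, w22b = om2
    qa = w11a * w12b - w12a * w11b
    qb = -(w11a * w22b - w22a * w11b)
    qc = w12a * w22b - w22a * w12b
    if abs(qa) < tol:
        raise ValueError("rank condition fails")
    root = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    return [((w12a - b * w11a) / (w22a - b * w12a), b)
            for b in ((-qb + root) / (2.0 * qa), (-qb - root) / (2.0 * qa))]


rng = random.Random(12345)
om1 = sample_cov(0.5, -0.8, 1.0, 1.0, 100_000, rng)
om2 = sample_cov(0.5, -0.8, 4.0, 1.0, 100_000, rng)
print(solve_two_regimes(om1, om2))
```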
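Furthermore, observe that β is consistent if the relative variances of the structural shocks shift:
$$-\sigma^2_{\eta,1}\sigma^2_{\varepsilon,2} + \sigma^2_{\varepsilon,1}\sigma^2_{\eta,2} \neq 0 \;\Rightarrow\; \frac{\sigma^2_{\eta,1}}{\sigma^2_{\varepsilon,1}} \neq \frac{\sigma^2_{\eta,2}}{\sigma^2_{\varepsilon,2}},$$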
of unknowns. The rank condition requires the number of linearly independent equations to be equal to or
larger than the number of unknowns. In linear systems of equations, this is done by computing the rank
of the matrix. In the case studied here, the system is non-linear, and the rank condition takes the form of
equation (6.4).
Equation (6.4) fails if the two covariance matrices are proportional; i.e., the heteroskedasticity does not
identify the system if the relative variances are constant across regimes. Returning to the intuition given
in the introduction, imagine that the variance of both shocks doubles; then the shape of the ellipse across
the two regimes is the same, and nothing can be learned about the original system. Technically, this is the
case in which we have six equations and six unknowns, but the equations are not independent. On the other
hand, when the relative ratio of the variances shifts, then the heteroskedasticity changes the region in which
the errors are distributed, enlarging the ellipse along one of the structural equations. This rotation in the
ellipse can be estimated from the reduced form covariances allowing us to obtain the slope of the schedules.
The simplest intuition of how identification is achieved can be developed by first analyzing the case in
which the variance changes for only one shock. Assume that it is known that at some point in time there is
an increase in the variance of the supply shocks. During that period, the “cloud” of realizations is going to
widen along the demand curve as depicted in Figure 6.1. Comparing how the ellipse of the realizations has
changed across the two samples allows one to determine the slope of the demand curve. In this particular
case, because it has been assumed that the structural shocks have zero correlation, this is enough to estimate
the slope of the supply curve, too. Moreover, this explanation has an instrumental variable interpretation.
A valid instrument to estimate the demand schedule is one that moves the supply without affecting the
demand. In this example, the rise in the variance of the supply shocks becomes a probabilistic instrument
precisely because it increases the likelihood that the supply equation “moves”.
Finally, when both variances shift, there is an expansion along both schedules. So it is not necessary to know which shock becomes more important across the regimes. It is enough that the relative variances shift: equation (6.4) is then satisfied and both schedules are identified.
It is easy to extend the previous results to the case with more than two regimes. Assume that the data exhibit a finite number of heteroskedasticity regimes indexed by s ∈ {1, ..., S}. For each regime, the covariance matrix is
$$\hat\Omega_s \equiv \begin{pmatrix}\omega_{11,s} & \omega_{12,s}\\ \cdot & \omega_{22,s}\end{pmatrix} = \frac{1}{(1-\alpha\beta)^2}\begin{pmatrix}\alpha^2\sigma^2_{\eta,s}+\sigma^2_{\varepsilon,s} & \alpha\sigma^2_{\eta,s}+\beta\sigma^2_{\varepsilon,s}\\ \cdot & \sigma^2_{\eta,s}+\beta^2\sigma^2_{\varepsilon,s}\end{pmatrix}. \qquad (6.6)$$
This system has 3S equations (three distinct moments in each regime's covariance matrix) and 2S + 2 unknowns: two structural variances for each of the S regimes, plus the two parameters α and β.
The order condition will be satisfied for any S larger than or equal to two. The rank condition takes the
same form as equations (6.4) and (6.5) for any pair of regimes. Indeed, the system is overidentified if there
are at least three regimes that satisfy the rank condition for all combinations.
Appealing to the probabilistic IV interpretation used before, each new heteroskedastic regime is a valid
instrument if and only if it satisfies the rank condition with respect to all the previous regimes. In this case,
each new covariance matrix adds three equations and only two unknowns. Otherwise, the new heteroskedastic
regime does not increase the number of restrictions on the structural coefficients. Hence, for S larger than
two, and for all covariance matrices satisfying the rank condition, the system of equations is overidentified, and the underlying assumptions, such as the stability of α and β over time, can be tested. The estimation has a minimum distance interpretation in which each heteroskedastic regime is equivalent to one instrument.2
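The sketch below is a simplified stand-in for the minimum-distance estimator with S = 3 regimes. From each regime it stacks the zero-covariance moment $\mathrm{Cov}(y_t - \alpha x_t,\ x_t - \beta y_t) = 0$, which is equivalent to (6.2), and weights the moments equally. It minimizes the sum of squared moments by concentrating out α (each moment is linear in α) and searching over β in an interval assumed to exclude the permuted root 1/α. The system, the parameter values, the interval, and the identity weighting are all illustrative assumptions. An efficient GMM estimator would instead weight the moments by their sampling covariance. With stable parameters the minimized distance is essentially zero. When one regime comes from a different α, it stays positive, which is the idea behind the overidentification test.

```python
import math


def reduced_form_cov(alpha, beta, s2_eta, s2_eps):
    """Population (w11, w12, w22) as in equation (6.6)."""
    d = (1.0 - alpha * beta) ** 2
    return ((alpha**2 * s2_eta + s2_eps) / d,
            (alpha * s2_eta + beta * s2_eps) / d,
            (s2_eta + beta**2 * s2_eps) / d)


def moments(alpha, beta, covs):
    """g_s = Cov(y - alpha*x, x - beta*y) in each regime."""
    return [alpha * (beta * w12 - w22) + (w12 - beta * w11) for w11, w12, w22 in covs]


def best_alpha(beta, covs):
    """Each g_s is linear in alpha, so the least-squares alpha has a closed form."""
    num = sum((beta * w12 - w22) * (w12 - beta * w11) for w11, w12, w22 in covs)
    den = sum((beta * w12 - w22) ** 2 for w11, w12, w22 in covs)
    return -num / den


def distance(beta, covs):
    return sum(g * g for g in moments(best_alpha(beta, covs), beta, covs))


def min_distance(covs, beta_lo, beta_hi, grid=2000, iters=200):
    """Coarse grid over beta, then golden-section refinement around the best point."""
    step = (beta_hi - beta_lo) / grid
    i_best = min(range(grid + 1), key=lambda i: distance(beta_lo + i * step, covs))
    a = max(beta_lo, beta_lo + (i_best - 1) * step)
    b = min(beta_hi, beta_lo + (i_best + 1) * step)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if distance(c, covs) < distance(d, covs):
            b = d
        else:
            a = c
    beta = 0.5 * (a + b)
    return best_alpha(beta, covs), beta, distance(beta, covs)


stable = [reduced_form_cov(0.5, -0.8, v_eta, v_eps)
          for v_eta, v_eps in [(1.0, 1.0), (3.0, 1.0), (1.0, 2.5)]]
print(min_distance(stable, beta_lo=-3.0, beta_hi=0.0))
```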
2 The additional equations can also be interpreted as a factor regression model - where the left hand side variables of equation
(ii) if there is a minimum number of endogenous variables (or maximum number of common shocks) that
satisfies
$$N^2 - N - 2K > 0. \qquad (6.12)$$
(iii) and if the covariance matrices constitute a system of equations that is linearly independent.
Proof. Note that the proposition states a necessary condition, but not a sufficient one; thus, it is an order condition. From equation (6.7), the number of equations is given by the covariance matrix in each regime, which provides N(N + 1)/2 equations per state. The unknowns are as follows: the matrix $A_{N\times N}$ has N(N − 1) parameters; the matrix $\Gamma_{N\times K}$ has K(N − 1) parameters; the variances of the common shocks number K · S (K variances times S regimes); and the variances of the structural shocks number N · S (N variances times S regimes). Identification, then, requires
$$S\cdot\frac{N(N+1)}{2} \geq N(N-1) + K(N-1) + S\cdot K + S\cdot N,$$
which, when $N^2 - N - 2K > 0$, can be rearranged as
$$S \geq \frac{2(N+K)(N-1)}{N^2 - N - 2K}.$$
Inequality (6.11) indicates the minimum number of states required to obtain identification. Finally, for (6.11) to make sense, there must be a minimum number of endogenous variables, given by
$$N^2 - N - 2K > 0.$$
Equation (6.12) is the “catch up” constraint. It indicates the conditions under which one additional regime in the variance-covariance matrix adds more equations than unknowns. In the example that motivated this section, N = 2 and K = 1, so the inequality is not satisfied and no further information is obtained from the heteroskedasticity. Moreover, if the common shocks are interpreted as the sources of correlation between the structural shocks, then this constraint indicates that some of the covariances of the structural shocks must be restricted to be constant or zero. Solving for K, identification requires K < N(N − 1)/2, where the right-hand side of this inequality is exactly the number of all possible contemporaneous correlations among structural shocks.
There are two main implications of Proposition 20. First, in the absence of common shocks only two states
are required to achieve identification, independently of the number of endogenous variables N . Second, if
K > 0 and N is finite, the number of states required to achieve identification is always larger than two.
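The counting argument in the proof of Proposition 20 can be checked mechanically. The helper names below are hypothetical. The counts follow the proof: N(N + 1)/2 moments per regime against N(N − 1) + K(N − 1) + S·K + S·N unknowns. For N = 2 and K = 0 they reduce to the 3S equations and 2S + 2 unknowns of the bivariate case.

```python
def order_condition_counts(N, K, S):
    """(equations, unknowns) for N endogenous variables, K common shocks, S regimes."""
    equations = S * N * (N + 1) // 2              # N(N+1) is always even
    unknowns = N * (N - 1) + K * (N - 1) + S * K + S * N
    return equations, unknowns


def min_regimes(N, K):
    """Smallest S satisfying the order condition, or None if (6.12) fails."""
    gain = N * N - N - 2 * K                      # twice the net gain per extra regime
    if gain <= 0:
        return None                               # an extra regime never catches up
    need = 2 * (N + K) * (N - 1)
    return max(2, -(-need // gain))               # ceiling of need / gain


for N, K in [(2, 0), (2, 1), (3, 1), (3, 2), (4, 1)]:
    print(N, K, min_regimes(N, K))
```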
The estimation of this model is performed by GMM where the moment conditions are
where Ωs is the covariance matrix that can be estimated in the data from the observed variables (xt ) in
regime s, Ωz,s is the covariance matrix of the common unobservable shocks in regime s, which, given the
assumptions in equation (6.8), is a diagonal matrix, and Ωε,s is the covariance matrix of the structural shocks
in regime s, which given the assumptions in equation (6.8), is also diagonal. The parameters of interest are
A and Γ.
As I hope is clear, the assumptions required to identify the model when there are common shocks are much harder to satisfy than in the case in which the covariance assumption on the structural shocks can be imposed directly. In what follows I discuss two methodologies that Brian Sack and I have used in other papers to deal with the presence of common shocks. This is an extremely important problem when dealing with macro asset pricing.
At this point it is useful to discuss the relationship between this methodology and the literature on identification using heteroskedasticity. As mentioned before, the use of second moments as a source of identification was first introduced by Philip Wright (Wright 1928). He indicated that an increase in the variance of the shocks in one equation reduces the bias that simultaneity introduces into the OLS estimate of the other equation. Taking the variance to infinity implies that OLS estimates the coefficients consistently. More recent research has extended the original intuition (i) to non-linear models, (ii) to models with parametric representations of the heteroskedasticity (such as ARCH or GARCH models), and (iii) to models that are partially identified.
(Klein and Vella 2000b) and (Klein and Vella 2000a) discuss identification and estimation in a binary endogenous model when exclusion restrictions (or any other parameter restrictions) are not available, and in the triangular model, respectively. They estimate the heteroskedasticity semiparametrically and use the residual from the second equation as an additional regressor in the first equation, where it serves as the instrument.4
(Sentana 1992) and (Sentana and Fiorentini 2001) study estimation in factor regressions when there is conditional heteroskedasticity. The simple case developed in this section (Proposition 19) is a special case of their Proposition 3. They study the conditions under which identification is achieved in a non-triangular system when the common latent factors exhibit heteroskedasticity.
There are important differences between those papers and the approach developed here. First, the procedure developed here requires only the knowledge that a shift in the relative variances has occurred. That is, the regime shift comes from economic events, such as crises or policy shifts, or from other characteristics of the data, such as heteroskedasticity across regions, time, or other cross-sectional characteristics. The ARCH specification, instead, uses the time series heteroskedasticity in the data as a statistical vehicle to achieve identification. Second, the procedure described here allows us to test some of the underlying assumptions, such as parameter stability, because the system is overidentified when there are more than two regimes. The techniques based on conditional heteroskedasticity are unable to provide this test. Third, as is shown below, if the heteroskedasticity is misspecified in this model, the coefficients are still consistent, provided the misspecified regimes satisfy the rank condition. This is not the case when the heteroskedasticity is modeled parametrically; misspecification in those cases could bias the contemporaneous coefficients as well. Furthermore, if the data exhibit conditional heteroskedasticity and the procedure described here is implemented, the coefficients are still consistent. Fourth, models that rely on conditional heteroskedasticity to achieve identification require the number of heteroskedastic shocks to be smaller than, or equal to, the number of endogenous variables. As is shown in Section 6.3, this is not the case in the present procedure. If there are more than two regime shifts, there exist conditions under which it is possible to have more latent factors than endogenous variables and still identify the structural system.
Though the estimation procedures among all these papers are very different, they share the same intuition
for solving the problem of endogenous variables: the heteroskedasticity adds equations to the system after
some covariance restrictions have been imposed. It is important to mention that these procedures require
that the system of equations be linear, or in other words, that the coefficients be stable to changes in the
volatility. Future research should consider extending the methodology to non-linear specifications.
Finally, in addition to the papers mentioned above, some applied papers have already used heteroskedasticity to identify a system of equations. In the context of conditional heteroskedasticity, see (Caporale,
Cipollini, and Spagnolo 2002), (Dungey and Martin 2001), (King, Sentana, and Wadhwani 1994), and
(Rigobon 2002). In these papers a structural conditionally heteroskedastic model is estimated from a re-
duced form GARCH model. In the context of regime switches see (?), and (Rigobon and Sack 2003) and
4 See also (Chen and Khan 1999) for a general solution of the problem of identification in sample selection models when the
Assume the system is described by equations (5.9) and (5.10), and that the data exhibit heteroskedasticity with only two regimes. If the windows are misspecified, the computed covariance matrices are linear combinations of the true underlying covariance matrices. Denote
$$\Omega_{r1} = \lambda_{r1}\Omega_1 + (1-\lambda_{r1})\,\Omega_2, \qquad \Omega_{r2} = (1-\lambda_{r2})\,\Omega_1 + \lambda_{r2}\Omega_2,$$
where Ω1 and Ω2 are the true covariance matrices describing the heteroskedasticity, Ωr1 and Ωr2 are the estimated covariance matrices, and λr1 and λr2 are weights indicating how “correct” the windows are. When both weights are equal to one, the windows coincide with the true regimes.
Proposition 21 Assume the original system satisfies the rank condition (6.4). If the misspecified het-
eroskedasticity satisfies the rank condition (6.4), then the model is identified and its estimators are consis-
tent.
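Proof. After some algebra, the two covariance matrices can be written in terms of the underlying variances:
$$\Omega_{r1} = \frac{1}{(1-\alpha\beta)^2}\begin{pmatrix}\alpha^2\sigma^2_{\eta,r1}+\sigma^2_{\varepsilon,r1} & \alpha\sigma^2_{\eta,r1}+\beta\sigma^2_{\varepsilon,r1}\\ \cdot & \sigma^2_{\eta,r1}+\beta^2\sigma^2_{\varepsilon,r1}\end{pmatrix},$$
$$\Omega_{r2} = \frac{1}{(1-\alpha\beta)^2}\begin{pmatrix}\alpha^2\sigma^2_{\eta,r2}+\sigma^2_{\varepsilon,r2} & \alpha\sigma^2_{\eta,r2}+\beta\sigma^2_{\varepsilon,r2}\\ \cdot & \sigma^2_{\eta,r2}+\beta^2\sigma^2_{\varepsilon,r2}\end{pmatrix},$$
where
$$\sigma^2_{\eta,r1} = \lambda_{r1}\sigma^2_{\eta,1} + (1-\lambda_{r1})\,\sigma^2_{\eta,2} \quad\text{and}\quad \sigma^2_{\varepsilon,r1} = \lambda_{r1}\sigma^2_{\varepsilon,1} + (1-\lambda_{r1})\,\sigma^2_{\varepsilon,2} \qquad (6.14)$$
$$\sigma^2_{\eta,r2} = (1-\lambda_{r2})\,\sigma^2_{\eta,1} + \lambda_{r2}\sigma^2_{\eta,2} \quad\text{and}\quad \sigma^2_{\varepsilon,r2} = (1-\lambda_{r2})\,\sigma^2_{\varepsilon,1} + \lambda_{r2}\sigma^2_{\varepsilon,2}. \qquad (6.15)$$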
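Given that the original heteroskedasticity satisfies the rank condition ($\sigma^2_{\eta,1}\sigma^2_{\varepsilon,2} - \sigma^2_{\eta,2}\sigma^2_{\varepsilon,1} \neq 0$), there are two questions to answer: (i) under which circumstances the misspecified model satisfies the rank condition, and (ii) under which circumstances the estimates are consistent. After some algebra, Ωr1 and Ωr2 satisfy equation (6.4) if and only if
$$\sigma^2_{\eta,r1}\sigma^2_{\varepsilon,r2} \neq \sigma^2_{\eta,r2}\sigma^2_{\varepsilon,r1}.$$
Substituting the definitions of the variances (equations (6.14) and (6.15)) gives
$$\sigma^2_{\eta,r1}\sigma^2_{\varepsilon,r2} - \sigma^2_{\eta,r2}\sigma^2_{\varepsilon,r1} = \left(\sigma^2_{\eta,1}\sigma^2_{\varepsilon,2} - \sigma^2_{\eta,2}\sigma^2_{\varepsilon,1}\right)(\lambda_{r1} + \lambda_{r2} - 1),$$
so the rank condition is not satisfied if and only if
$$\lambda_{r1} = 1 - \lambda_{r2}.$$
In other words, the rank condition fails if the windows are so badly specified that they place the same weights on the true regimes. In that case the two computed matrices are identical.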
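Assume the rank condition is satisfied; then the question is whether the solution of the new system of equations is consistent. Substituting equations (6.14) and (6.15) into equation (6.3), the estimated $\hat\beta$ solves
$$\frac{\Phi\,\alpha}{(1-\alpha\beta)^3}\left[\hat\beta^2 - \left(\frac{1}{\alpha}+\beta\right)\hat\beta + \frac{\beta}{\alpha}\right] = 0, \qquad (6.16)$$
where
$$\Phi = \left(\sigma^2_{\eta,1}\sigma^2_{\varepsilon,2} - \sigma^2_{\eta,2}\sigma^2_{\varepsilon,1}\right)(1 - \lambda_{r1} - \lambda_{r2}).$$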
Note that if the original heteroskedasticity satisfies the rank condition and λr1 ≠ 1 − λr2, then Φ is different from zero. Hence, equation (6.16) has the same roots as the quadratic equation of the well-specified model, and consistency is assured if the covariance matrices are consistently estimated. The two solutions are β and 1/α. Therefore, if the regimes are misspecified but the system satisfies the rank condition, the estimates are consistent.
In other words, if the computed covariance matrices satisfy the rank condition, then the estimates are
consistent even if the regimes have been slightly misspecified. On the other hand, if the misspecification is so
large that the system fails the rank condition, then the coefficients are not identified. Hence, the estimated
coefficients should be consistent for small perturbations of the regime definitions.
Remember that the equivalent rank condition is testable. Therefore, the degree of misspecification can
be detected in the applications.
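The sketch below illustrates Proposition 21 with population covariances from the assumed system $y_t = \alpha x_t + \varepsilon_t$, $x_t = \beta y_t + \eta_t$. It blends the two true regime matrices with hypothetical window weights λr1 = 0.8 and λr2 = 0.7 and solves the quadratic (6.3) on the blended matrices. The roots are still β and 1/α. With weights satisfying λr1 = 1 − λr2, the blended matrices coincide and the rank condition fails. The names `mix` and `beta_roots` are illustrative.

```python
import math


def reduced_form_cov(alpha, beta, s2_eta, s2_eps):
    """Population (w11, w12, w22) as in equation (6.1)."""
    d = (1.0 - alpha * beta) ** 2
    return ((alpha**2 * s2_eta + s2_eps) / d,
            (alpha * s2_eta + beta * s2_eps) / d,
            (s2_eta + beta**2 * s2_eps) / d)


def mix(om1, om2, weight):
    """Elementwise weight*om1 + (1 - weight)*om2."""
    return tuple(weight * a + (1.0 - weight) * b for a, b in zip(om1, om2))


def beta_roots(om1, om2, tol=1e-12):
    """Sorted roots in beta of the quadratic (6.3)."""
    w11a, w12a, w22a = om1
    w11b, w12b, w22b = om2
    qa = w11a * w12b - w12a * w11b
    qb = -(w11a * w22b - w22a * w11b)
    qc = w12a * w22b - w22a * w12b
    if abs(qa) < tol:
        raise ValueError("rank condition fails")
    root = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    return sorted(((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)))


alpha, beta = 0.5, -0.8
om1 = reduced_form_cov(alpha, beta, 1.0, 1.0)
om2 = reduced_form_cov(alpha, beta, 3.0, 1.0)
lam_r1, lam_r2 = 0.8, 0.7                 # imperfect windows
r1 = mix(om1, om2, lam_r1)                # lam_r1*Omega1 + (1 - lam_r1)*Omega2
r2 = mix(om1, om2, 1.0 - lam_r2)          # (1 - lam_r2)*Omega1 + lam_r2*Omega2
print(beta_roots(r1, r2))
```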
Assume the system is described by equations (5.9) and (5.10), and that the data exhibit heteroskedasticity with S* regimes, with no restrictions on the form of the heteroskedasticity. For simplicity, denote the variances of the structural shocks in each regime as follows:
$$\sigma^2_{\eta,s} = (1+\delta_{\eta,s})\,\sigma^2_{\eta,0}, \qquad \sigma^2_{\varepsilon,s} = (1+\delta_{\varepsilon,s})\,\sigma^2_{\varepsilon,0}, \qquad \forall s \neq 0,$$
where σ 2η,s and σ 2ε,s represent the variances of the idiosyncratic shocks in regime s, and δ η,s and δ ε,s are the
changes of those variances relative to the variances from regime s = 0.
Assume that only two regimes are used in the estimation. Without loss of generality assume that the
first window corresponds to the first set of ŝ < S ∗ regimes and that the second window corresponds to the
second set of S ∗ − ŝ regimes. The covariance matrices of each of the misspecified periods are given by:
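$$\Omega_{r1} = \frac{1}{(1-\alpha\beta)^2}\begin{pmatrix} \alpha^2\frac{1}{\hat s}\sum_{s<\hat s}\sigma^2_{\eta,s} + \frac{1}{\hat s}\sum_{s<\hat s}\sigma^2_{\varepsilon,s} & \alpha\frac{1}{\hat s}\sum_{s<\hat s}\sigma^2_{\eta,s} + \beta\frac{1}{\hat s}\sum_{s<\hat s}\sigma^2_{\varepsilon,s} \\ \cdot & \frac{1}{\hat s}\sum_{s<\hat s}\sigma^2_{\eta,s} + \beta^2\frac{1}{\hat s}\sum_{s<\hat s}\sigma^2_{\varepsilon,s} \end{pmatrix},$$
$$\Omega_{r2} = \frac{1}{(1-\alpha\beta)^2}\begin{pmatrix} \alpha^2\frac{1}{S^*-\hat s}\sum_{s>\hat s}\sigma^2_{\eta,s} + \frac{1}{S^*-\hat s}\sum_{s>\hat s}\sigma^2_{\varepsilon,s} & \alpha\frac{1}{S^*-\hat s}\sum_{s>\hat s}\sigma^2_{\eta,s} + \beta\frac{1}{S^*-\hat s}\sum_{s>\hat s}\sigma^2_{\varepsilon,s} \\ \cdot & \frac{1}{S^*-\hat s}\sum_{s>\hat s}\sigma^2_{\eta,s} + \beta^2\frac{1}{S^*-\hat s}\sum_{s>\hat s}\sigma^2_{\varepsilon,s} \end{pmatrix},$$
where
$$\delta_{\eta,r1} = \frac{1}{\hat s}\sum_{s<\hat s}\delta_{\eta,s} \quad\text{and}\quad \delta_{\eta,r2} = \frac{1}{S^*-\hat s}\sum_{s>\hat s}\delta_{\eta,s} \qquad (6.17)$$
$$\delta_{\varepsilon,r1} = \frac{1}{\hat s}\sum_{s<\hat s}\delta_{\varepsilon,s} \quad\text{and}\quad \delta_{\varepsilon,r2} = \frac{1}{S^*-\hat s}\sum_{s>\hat s}\delta_{\varepsilon,s}. \qquad (6.18)$$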
Proposition 22 Assume the true heteroskedasticity is described by S ∗ regimes and that those covariance
matrices satisfy the rank condition (6.4). Assume that only two regimes have been used in the estimation;
then, if the following conditions are satisfied, the system is identified and its estimates are consistent.
Proof. The first assumption in the proposition is to guarantee that the original system can be identified
if the heteroskedasticity is well specified. In the ill-specified model, identification is achieved if the relative
volatilities change. This is equivalent to
Equation (6.19) indeed guarantees that the two estimated covariance matrices are different. In other words,
it guarantees that the order condition will be satisfied; there is heteroskedasticity in the estimated model.
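The next question is, as before, what the conditions for consistency are. Substituting the computed covariance matrices (Ωr1 and Ωr2) into equation (6.3), the estimated $\hat\beta$ satisfies
$$\frac{\sigma^2_{\eta,0}\,\sigma^2_{\varepsilon,0}\,\Phi\,\alpha}{(1-\alpha\beta)^3}\left[\hat\beta^2 - \left(\frac{1}{\alpha}+\beta\right)\hat\beta + \frac{\beta}{\alpha}\right] = 0, \qquad (6.20)$$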
where
Φ = (1 + δ ε,r1 ) (1 + δ η,r2 ) − (1 + δ ε,r2 ) (1 + δ η,r1 ) .
Note that if Φ is different from zero, then β solves the same quadratic equation as the original model. Φ is
different from zero if condition (6.19) is satisfied, and
δ η,r1 δ ε,r1
6= . (6.21)
δ η,r2 δ ε,r2
Condition (6.21) indicates that the changes in the variances across the misspecified regimes cannot be proportional; in other words, it is equivalent to the rank condition discussed before. Again, the two roots of equation (6.20) are β and 1/α.
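Equation (6.3) is not shown in this section, so the sketch below does not reproduce it. Instead it uses a condition implied by the matrix layout above: each window matrix can be written as $\frac{1}{(1-\alpha\beta)^2} M D_r M'$ with $M = \begin{pmatrix}\alpha & 1\\ 1 & \beta\end{pmatrix}$ and $D_r$ diagonal, and since $(1,-\alpha)M = (0,\,1-\alpha\beta)$ and $(-\beta,1)M = (1-\alpha\beta,\,0)$, it follows that $(1,-\alpha)\,\Omega_r\,(-\beta,1)' = 0$ in each window. Eliminating α across the two windows gives a quadratic in the estimate. With the hypothetical numbers below, its leading coefficient equals that of (6.20) and its roots are β and 1/α.

```python
import numpy as np

def window_cov(alpha, beta, s2_eta, s2_eps):
    """Covariance of one estimation window, laid out like Omega_r1 / Omega_r2."""
    k = 1.0 / (1.0 - alpha * beta) ** 2
    return k * np.array([
        [alpha**2 * s2_eta + s2_eps, alpha * s2_eta + beta * s2_eps],
        [alpha * s2_eta + beta * s2_eps, s2_eta + beta**2 * s2_eps],
    ])

alpha, beta = 0.5, 0.3                      # illustrative true parameters
s2_eta0, s2_eps0 = 2.0, 1.0                 # baseline variances
d_eta_r1, d_eta_r2 = 0.25, 1.50             # window averages, eq. (6.17)
d_eps_r1, d_eps_r2 = 0.10, 1.55             # window averages, eq. (6.18)

om1 = window_cov(alpha, beta, s2_eta0 * (1 + d_eta_r1), s2_eps0 * (1 + d_eps_r1))
om2 = window_cov(alpha, beta, s2_eta0 * (1 + d_eta_r2), s2_eps0 * (1 + d_eps_r2))

# In each window (1, -a) Om (-b, 1)' = 0, i.e. a * (Om22 - b*Om12) = Om12 - b*Om11.
# Equating a across windows and cross-multiplying gives a2*b^2 + a1*b + a0 = 0:
a2 = om1[0, 0] * om2[0, 1] - om2[0, 0] * om1[0, 1]
a1 = om2[0, 0] * om1[1, 1] - om1[0, 0] * om2[1, 1]
a0 = om1[0, 1] * om2[1, 1] - om2[0, 1] * om1[1, 1]

# Leading coefficient of (6.20), for comparison.
phi = (1 + d_eps_r1) * (1 + d_eta_r2) - (1 + d_eps_r2) * (1 + d_eta_r1)
lead_620 = s2_eta0 * s2_eps0 * phi * alpha / (1 - alpha * beta) ** 3

roots = np.sort(np.real(np.roots([a2, a1, a0])))
print(np.isclose(a2, lead_620))             # True
print(roots)                                # approximately 0.3 and 2.0, i.e. beta and 1/alpha
```

If instead both variances are scaled by the same factor in the second window (for example all first-window shifts zero and `d_eta_r2 = d_eps_r2 = 1.0`), then `om2` is a multiple of `om1`, so `a2`, `a1`, and `a0` are all zero up to rounding and every value solves the equation.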
In summary, even though the assumed form of the heteroskedasticity implies a smaller number of regimes
than those exhibited in the data, the system is identified and its estimates are consistent if and only if the
order and rank conditions are satisfied by the misspecified matrices.
It is important to mention that if the number of true regimes is smaller than the number of regimes used in the estimation, the system of equations does not satisfy the rank condition: there are not enough independent equations to identify the system. In those cases the estimates are inconsistent and the confidence intervals are infinitely large.
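One way to see the rank problem, sketched with hypothetical numbers: suppose there are only two true regimes but three estimation windows. Assuming zero-mean data, so that a window's covariance is the share-weighted average of the regime covariances it contains, every window matrix is a combination of the same two true matrices. Stacking the distinct elements of the three window matrices therefore gives rank two rather than three. The helpers `regime_cov` and `vech` and the shares are illustrative.

```python
import numpy as np

def regime_cov(alpha, beta, s2_eta, s2_eps):
    """Reduced-form covariance with the layout of Omega_r1 / Omega_r2 (hypothetical helper)."""
    k = 1.0 / (1.0 - alpha * beta) ** 2
    return k * np.array([
        [alpha**2 * s2_eta + s2_eps, alpha * s2_eta + beta * s2_eps],
        [alpha * s2_eta + beta * s2_eps, s2_eta + beta**2 * s2_eps],
    ])

def vech(m):
    """Distinct elements of a symmetric 2x2 matrix."""
    return np.array([m[0, 0], m[0, 1], m[1, 1]])

alpha, beta = 0.5, 0.3
om_a = regime_cov(alpha, beta, 1.0, 1.0)     # true regime A
om_b = regime_cov(alpha, beta, 3.0, 1.5)     # true regime B

# Share of regime A inside each of three estimation windows (hypothetical).
shares = [1.0, 0.5, 0.2]
windows = [lam * om_a + (1 - lam) * om_b for lam in shares]

stacked = np.vstack([vech(w) for w in windows])
print(np.linalg.matrix_rank(stacked))        # 2: only two independent windows
```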
The two cases analyzed in this section are probably the most common forms of misspecification. However,
they are not exhaustive. Depending on the particular application in which the identification is used, and the
possible misspecification problems that could be encountered, the consistency of the methodology should be
explored further.
Bibliography
Caporale, G. M., A. Cipollini, and N. Spagnolo (2002): “Testing for Contagion: A Conditional Correlation Analysis,” CEMFE Mimeo.
Chen, S., and S. Khan (1999): “√n-Consistent Estimation of Heteroskedastic Sample Selection Models,” University of Rochester, Mimeo.
Dungey, M., and V. L. Martin (2001): “Contagion Across Financial Markets: An Empirical Assessment,”
Australian National University Mimeo.
Fisher, F. M. (1976): The Identification Problem in Econometrics. Robert E. Krieger Publishing Co., New
York, second edn.
Haavelmo, T. (1947): “Methods of Measuring the Marginal Propensity to Consume,” Journal of the
American Statistical Association, 42, 105–122.
Hogan, V., and R. Rigobon (2003): “Using Unobserved Supply Shocks to Estimate the Returns to
Education,” NBER working paper 9145.
King, M., E. Sentana, and S. Wadhwani (1994): “Volatility and Links Between National Stock Markets,”
Econometrica, 62, 901–33.
Klein, R., and F. Vella (2000a): “Employing Heteroskedasticity to Identify and Estimate Triangular
Semiparametric Models,” Rutgers mimeo.
Klein, R., and F. Vella (2000b): “Identification and Estimation of the Binary Treatment Model Under Heteroskedasticity,” Rutgers mimeo.
Koopmans, T., H. Rubin, and R. Leipnik (1950): “Measuring the Equation Systems of Dynamic Economics,” in Statistical Inference in Dynamic Economic Models, Cowles Commission for Research in Economics, chap. II, pp. 53–237. John Wiley and Sons, New York.
Lee, H. Y., L. Ricci, and R. Rigobon (2004): “Once Again, is Account Openness Good for Growth?,”
Journal of Development Economics, 75(2), 451–472.
Rigobon, R. (2000): “A Simple Test for the Stability of Linear Models under Heteroskedasticity, Omitted Variable, and Endogenous Variable Problems,” MIT Mimeo: http://web.mit.edu/rigobon/www/.
Rigobon, R. (2002): “The Curse of Non-Investment Grade Countries,” Journal of Development Economics, 69(2), 423–449.
Rigobon, R. (2003): “On the Measurement of the International Propagation of Shocks: Is the Transmission Stable?,” Journal of International Economics, 61, 261–283.
Rigobon, R., and D. Rodrik (2005): “Rule of Law, Democracy, Openness, and Income: Estimating the
Interrelationships,” The Economics of Transition, 13(3), 533–64.
Rigobon, R., and B. Sack (2003): “Measuring the Reaction of Monetary Policy to the Stock Market,”
Quarterly Journal of Economics, 118, 639–669.
Rigobon, R., and B. Sack (2004): “The Impact of Monetary Policy on Asset Prices,” Journal of Monetary Economics, 51, 1553–75.
Sentana, E. (1992): “Identification of Multivariate Conditionally Heteroskedastic Factor Models,” LSE,
FMG Discussion Paper, 139.
Sentana, E., and G. Fiorentini (2001): “Identification, Estimation and Testing of Conditional Heteroskedastic Factor Models,” Journal of Econometrics, 102(2), 143–164.
Wright, P. G. (1928): The Tariff on Animal and Vegetable Oils. The Institute of Economics, The Macmillan Company, New York.