Valérie Mignon
Principles of Econometrics
Theory and Applications
Classroom Companion: Economics
The Classroom Companion series in Economics includes undergraduate and grad-
uate textbooks alike. It welcomes fundamental textbooks aimed at introducing
students to the core concepts, empirical methods, theories and tools of the field, as
well as advanced textbooks written for students at the Master and PhD level seeking
a deeper understanding of economic theory, mathematical tools and quantitative
methods.
Valérie Mignon
EconomiX-CNRS
University of Paris Nanterre
Nanterre Cedex, France
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Work with econometric content has developed substantially during the twentieth
century, as demonstrated by the large number of journals on econometrics.1
Examples include: Biometrika, Econometrica, Econometric Theory, Econometric
Reviews, Journal of Econometrics, Journal of the American Statistical Association,
Journal of Time Series Analysis, and Quantitative Economics. There are also
journals with more applied content such as Empirical Economics, International
Journal of Forecasting, Journal of Applied Econometrics, Journal of Business and
Economic Statistics, and Journal of Financial Econometrics. In addition, many gen-
eral economic journals publish articles with strong econometric content: American
Economic Review, Economics Letters, European Economic Review, International
Economic Review, International Economics, Journal of the European Economic
Association, Quarterly Journal of Economics, and Review of Economic Studies.
The rise of econometrics can also be illustrated by the fact that recent Nobel
Prizes in economics have been awarded to econometricians. James Heckman and
Daniel McFadden received the Nobel Prize in Economics in 2000 for their work
on theories and methods for the analysis of selective samples and on discrete
choice models. Similarly, in 2003, the Nobel Prize in Economics was awarded to
Robert Engle and Clive Granger for their work on methods of analyzing economic
time series with (i) time-varying volatility (R. Engle) and (ii) common trends (C.
Granger), which has contributed to improved forecasts of economic growth, interest
rates, and stock prices. The Prize was also awarded to Christopher Sims and Thomas
Sargent in 2011 for their empirical work on cause and effect in the macroeconomy,
and to Eugene Fama, Lars Peter Hansen, and Robert Shiller in 2013 for their
empirical analysis of asset prices.
These different points testify that econometrics is a discipline in its own right and
a fundamental branch of economics.
This book aims to provide readers with the basics of econometrics. It is composed
of eight chapters. The first, introductory chapter recalls some essential concepts in
statistics and econometrics. Chapter 2 deals with the simple regression model. Chap-
ter 3 generalizes the previous chapter to the case of the multiple regression model, in
which more than one explanatory variable is included. In Chap. 4, the fundamental
themes of heteroskedasticity and autocorrelation of errors are addressed in detail.
Chapter 5 brings together a set of problems related to explanatory variables. It deals
successively with dependence between explanatory variables and the error term, the
problem of multicollinearity, and the question of stability of the estimated models.
Chapter 6 introduces dynamics into the models and presents distributed lag models.
Chapter 7 extends the previous chapter by presenting time series models, a branch
of econometrics that has undergone numerous developments over the last 40 years.
Finally, Chap. 8 deals with structural models by studying simultaneous equations
models.
1 Pirotte’s (2004) book gives a history of econometrics, from the origins of the discipline to its
recent developments. See also Morgan (1990) and Hendry and Morgan (1995).
Bringing together theory and practice, this book presents the basics of econometrics
in a clear and pedagogical way. It focuses on the acquisition of the methods
and skills that are essential for all students wishing to succeed in their studies
and for all practitioners wishing to apply econometric techniques. The approach
adopted in this textbook is resolutely applied. Through this book, the author
aims to meet a pedagogical and operational need to quickly put into practice
the various concepts presented (statistics, tests, methods, etc.). This is why, after
each theoretical presentation, numerous examples are given, as well as empirical
applications carried out on the computer using existing econometric and statistical
software.
This textbook is primarily intended for students of bachelor’s and master’s
degrees in Economics, Management, and Mathematics and Computer Sciences, as
well as for students of Engineering and Business schools. It will also be useful for
professionals who will find practical solutions to the various problems they face.
Contents
1 Introductory Developments
   1.1 What Is Econometrics? Some Introductory Examples
      1.1.1 Answers to Many Questions
      1.1.2 The Example of Consumption and Income
      1.1.3 The Answers to the Other Questions Asked
   1.2 Model and Variable
      1.2.1 The Concept of Model
      1.2.2 Different Types of Data
      1.2.3 Explained Variable/Explanatory Variable
      1.2.4 Error Term
   1.3 Statistics Reminders
      1.3.1 Mean
      1.3.2 Variance, Standard Deviation, and Covariance
      1.3.3 Linear Correlation Coefficient
      1.3.4 Empirical Application
   1.4 A Brief Introduction to the Concept of Stationarity
      1.4.1 Stationarity in the Mean
      1.4.2 Stationarity in the Variance
      1.4.3 Empirical Application: A Study of the Nikkei Index
   1.5 Databases and Software
      1.5.1 Databases
      1.5.2 Econometric Software
   Conclusion
   The Gist of the Chapter
   Further Reading
2 The Simple Regression Model
   2.1 General
      2.1.1 The Linearity Assumption
      2.1.2 Specification of the Simple Regression Model and Properties of the Error Term
      2.1.3 Summary: Specification of the Simple Regression Model
   2.2 The Ordinary Least Squares (OLS) Method
      2.2.1 Objective and Reminder of Hypotheses
1 Introductory Developments
After defining the concepts of model and variable, this chapter offers some statistical
reminders about the mean, variance, standard deviation, covariance, and linear
correlation coefficient. A brief introduction to the concept of stationarity is also
provided. Finally, this chapter lists the main databases in economics and finance,
as well as the most commonly used software packages. Beforehand, we give some
introductory examples to illustrate in a simple way what econometrics can do.
– Are the terms of trade a determinant of the value of exchange rates? Do other
economic variables have more impact?
– Is the purchasing power parity theory empirically verified?
– Do rising oil prices have a significant impact on car sales?
– Is the depreciation of the dollar compatible with rising oil prices?
– Is the euro overvalued? If so, by how much? In other words, what is the
equilibrium value of the euro?
– Are international financial markets integrated?
To answer these questions, the econometrician must build a model to relate the
variables of interest. Consider, for example, the question “What is the impact of an
increase of 10 monetary units in income on household consumption?”
To answer this question, two variables need to be taken into account: household
consumption and household income (gross disposable income). To relate these two
variables, we write an equation of the following type:
$CONS = \alpha + \beta \times INC$  (1.1)
1 The series are expressed in real terms, i.e., they are deflated by the consumer price index of each
country.
2 The data are extracted from the national statistical institutes of the two countries: Statistics
[Figure: household gross disposable income and consumption, 1995–2020; left panel: Finland (INC_FIN, CONS_FIN); right panel: Italy (INC_ITA, CONS_ITA)]
it should therefore be positive. In other words, we expect the value obtained for
the coefficient .β to be positive. More specifically, if we estimate model (1.1), we
obtain the following values for the coefficient .β associated with income: 0.690 for
Finland and 0.721 for Italy. These values are positive, which means that an increase
in income is accompanied by an increase in consumption in both countries, all other
things being equal. We can also quantify this increase: according to these estimates, an increase of 10 monetary units in income is accompanied, on average, by an increase of 6.90 monetary units in consumption in Finland and 7.21 monetary units in Italy.
Although different, these two values are quite close, which means that household
consumption behavior, in relation to the change in income, is similar in Finland and
Italy, even though the economic characteristics of the two countries differ. In the
rest of this book, we will see that it is possible to refine these comments by studying
whether or not the values obtained are significantly different. This will be done
using statistical tests.
To conduct their analysis, econometricians have to find the data they need. In the
case of the example previously studied, the following series are needed: household
consumption, household gross disposable income, and the consumer price indexes
for Finland and Italy, i.e., a total of six series. For this purpose, econometricians
need access to databases. Nowadays, there are many such databases, some of which
are freely accessible. A non-exhaustive list of the main economic and financial
databases is given at the end of this chapter. Once the data have been collected,
it is possible to proceed with the study in question.
Let us now consider the various questions posed in Sect. 1.1.1 and give some
possible answers.
– Are the terms of trade a determinant of the value of exchange rates? Do other
economic variables have more impact?
The following data are required for the country under consideration: export
prices, import prices, and the exchange rate, the ratio between export prices and
import prices being used to measure the terms of trade. To assess whether the
terms of trade are a determinant of the exchange rate, it is necessary to estimate a
model that relates the exchange rate and the terms of trade and to test whether the
coefficient associated with the variable “terms of trade” is significantly different
from zero. To determine whether other economic variables have more impact, we
need to add them to the previous model and study their statistical significance.
Other potential determinants include the country’s net foreign asset position,
productivity, interest rate differential, etc.
– Is the purchasing power parity theory empirically confirmed?
According to the purchasing power parity (PPP) theory, each country’s currency
provides the same purchasing power in all countries. In other words, if the
products traded are physically identical (without transport costs), the nominal
exchange rate (indirect quote) is determined by the relative price of the good,
i.e., $Q_t = P_t / P_t^*$, which can be written in logarithmic form: $q_t = p_t - p_t^*$, where the lowercase variables are the logarithms of the uppercase variables, $Q_t$ is the nominal exchange rate, $P_t$ is the domestic consumer price index, and $P_t^*$ is the foreign consumer price index. In order to grasp the empirical validity of PPP, we can estimate a relationship of the type $q_t = \alpha + \beta_1 p_t - \beta_2 p_t^*$ and check that $\alpha = 0$ and $\beta_1 = \beta_2 = 1$. This is done by statistically testing that the coefficients
returns $R_{t-1}$ is zero or not. If it is zero, the efficient capital market hypothesis
is not called into question, since past values of returns do not provide any
information to explain the current change in returns.
– Is there international convergence in GDP per capita?
Analyzing the convergence of GDP per capita is fundamental to studying
inequalities between nations. In particular, this question raises the issue of
poor countries catching up with rich ones. If we are interested in conditional
convergence, the Solow model can be used. In this model, the growth rate of a
country’s per capita income depends on the level at which this income is situated
in relation to the long-run equilibrium path of the economy. It is then possible
to estimate a relationship to explain the GDP growth rate between the current
date and the initial date by the level of GDP at the initial date. If the coefficient
assigned to the level of GDP is zero, this indicates an absence of convergence.
– What is the impact of the 35-hour work week on unemployment?
There are several ways to approach this question. One is to estimate a relationship
to explain the unemployment rate by working hours, by varying those working
hours. If the impact of the 35-hour work week on the unemployment rate
is neutral, the coefficient assigned to the duration variable should be similar,
whether the duration is 35 or 39 hours.
– Can higher inflation reduce unemployment?
This question is linked to a relationship that is widely studied in macroe-
conomics, namely, the Phillips curve, according to which there is a negative
relationship between the unemployment rate and the inflation rate. This rela-
tionship will be studied in Chap. 2 in order to determine whether inflation has a
beneficial effect on unemployment.
– Does their parents’ socio-occupational category have an impact on children’s
level of education?
Such a question can again be addressed by estimating a relationship between
children’s level of education and their parents’ socio-occupational category
(SOC). If the coefficient assigned to SOC differs with the SOC, this indicates
an impact of SOC considered on children’s level of education.
– Does air pollution have an impact on children’s health?
Answering this question first requires some way of measuring air pollution and
children’s health. Once these two measures have been established, the analysis
is carried out in a standard way, by estimating a relationship linking children’s
health to air pollution.
– What are the effects of global warming on economic growth?
As before, once the means of measuring global warming (e.g., greenhouse gas
emissions) has been found, a relationship between economic growth and this
variable must be estimated.
Having presented these examples and introductory points, let us formalize the
various concepts, such as the notions of model and variable in more detail.
1.2 Model and Variable

1.2.1 The Concept of Model
$C = f(Y)$  (1.2)

where f is such that $f' > 0$. However, three types of functions, or models, are compatible with the fundamental psychological law:

$C = cY + C_0$  (1.3)
where .Ĉ designates the estimated consumption.4 By virtue of Eq. (1.4), it appears
that the estimated value of c is positive: the relationship between C and Y is indeed
increasing. Furthermore, the value 0.86 of the marginal propensity to consume
allows us to write that, all other things being equal, an increase of one monetary
unit in income Y is accompanied by an average increase of 0.86 monetary units in
consumption C.
Remark 1.1 The model (1.3) has only one equation describing the relationship
between consumption and income. This is a behavioral equation in the sense that
behavior, i.e., household consumption decisions, depends on changes in income.
The models may also contain technological relationships: these arise, for example,
from constraints imposed by existing technology, or from constraints due to
limited budgetary resources. In addition to these two types of relationships—
behavioral and technological relationships—models frequently include identities,
i.e., technological accounting relationships between variables. For example, the
relationship .Y = C + I + G, where Y denotes output, C consumption expenditure,
I investment expenditure, and G government spending, frequently used in economic
models, is an identity. No parameter needs to be estimated.
3 Strictly speaking, a reading of the General Theory suggests that the concave function seems
closest to Keynes’ words; the affine form, however, is the most frequently chosen for practical
reasons.
4 The circumflex (or hat) notation is a simple convention indicating that this is an estimate (and not
Having specified the model and in order to estimate it, it is necessary to have
data representative of the economic phenomena being analyzed. In the case of the
Keynesian consumption function, we need the consumption and income data for the
households studied. The main types of data are:
– Time series are variables observed at regular time intervals. For example, the
quarterly series of consumption of French households over the period 1970–2022
constitutes a time series in the sense that an observation of French household
consumption is available for each quarter between 1970 and 2022. The regularity
of observations is called the frequency. In our example, the frequency of the
series is quarterly. A time series can also be observed at annual, monthly, weekly,
daily, intra-daily, etc. frequency.
– Cross-sectional data are variables observed at the same moment in time and
which concern a specific group of individuals (in the statistical sense of the
term).5 An example would be a data set composed of the consumption of
French households in 2022, the consumption of German households in 2022,
the consumption of Spanish households in 2022, etc.
– Panel data are variables that concern a specific group of individuals and are
measured at regular time intervals. An example would be a data set composed
of the consumption of French households over the period 1970–2022, the con-
sumption of German households over the period 1970–2022, the consumption
of Spanish households over the period 1970–2022, etc. Panel data thus have a
double dimension: individual and temporal.
In the model representing the Keynesian consumption function, two variables are
involved: consumption and income. In accordance with relationship (1.3), income
appears to be the determinant of consumption. In other words, income explains
consumption. We then say that income is an explanatory variable and consumption
is an explained variable.
More generally, the variable we are trying to explain is called the explained
variable or endogenous variable or dependent variable. The explanatory variable
or exogenous variable or independent variable is the variable that explains the
endogenous variable. The values of the explained variable thus depend on the values
of the explanatory variable.
If the model consists of a single equation, there is only one dependent variable.
On the other hand, there may be several explanatory variables. For example,
household consumption can be explained not only by income, but also by the
$C = cY + aU + C_0$  (1.5)

$C_t = cY_t + C_0$  (1.6)
where t denotes time. Such a model relates variables located at the same moment in
time. However, it is possible to introduce dynamics into the models. Let us consider,
for example, the following model:
$C_t = cY_t + \alpha C_{t-1} + C_0$  (1.7)
Past consumption (i.e., consumption at date .t −1) acts as an explanatory variable for
current consumption (i.e., consumption at date t). The explanatory variable .Ct−1 is
also called the lagged endogenous variable. The coefficient .α represents the degree
of inertia of consumption. Assuming that .α < 1, the closer .α is to 1, the greater the
degree of consumption inertia. In other words, a value of .α close to 1 means that
past consumption has a strong influence on current consumption. We also speak of
persistence.
In the model (1.3), it has been assumed that consumption is explained solely by
income. If such a relationship is true, it is straightforward to obtain the values of the
parameters c and .C0 : it suffices to have two observations and join them by a straight
line, the other observations lying on this same line. However, such a relationship
is not representative of economic reality. The fact that income alone is used as an
explanatory variable in the model may indeed seem very restrictive, as it is highly
likely that other variables contribute to explaining consumption. We therefore add a
term .ε which represents all other explanatory variables not included in the model.
The model is written:
$C = cY + C_0 + \varepsilon$  (1.8)
The term .ε is a random variable called the error or disturbance. It is the error
in the specification of the model, in that it collects all the variables, other than
income, that have been ignored in explaining consumption. The error term thus
provides a measure of the difference between the observed values of consumption
and those that would be observed if the model were correctly specified. The error
term includes not only the model specification error, but it can also represent a
measurement error due to problems in measuring the variables under consideration.
1.3 Statistics Reminders

The purpose of this section is to recall the definition of some basic statistical concepts that will be used in the remainder of the book: mean, variance, standard deviation, covariance, and linear correlation coefficient.
1.3.1 Mean
The (arithmetic) mean of a variable is equal to the sum of the values taken by
this variable, divided by the number of observations. Consider a variable X with
T observations: .X1 , X2 , . . . , XT . The (empirical) mean of this series, noted .X̄, is
given by:
$\bar{X} = \frac{1}{T}(X_1 + X_2 + \ldots + X_T) = \frac{1}{T} \sum_{t=1}^{T} X_t$  (1.9)
Example 1.1 The six employees of a small company received the following wages X (in euros): 1,200, 1,200, 1,300, 1,500, 1,500, and 2,500. The mean wage $\bar{X}$ is therefore: $\bar{X} = \frac{1}{6}(1,200 + 1,200 + 1,300 + 1,500 + 1,500 + 2,500) = 1,533.33$ euros. The mean could also have been calculated by weighting the wages by the number of employees, i.e.: $\bar{X} = \frac{1}{6}(1,200 \times 2 + 1,300 \times 1 + 1,500 \times 2 + 2,500 \times 1) = 1,533.33$ euros. This is a weighted arithmetic mean.
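The calculation of Example 1.1 can be reproduced in a few lines of code. The sketch below uses Python and NumPy (a choice made here purely for illustration; the book itself relies on dedicated econometric software) to compute the simple and weighted arithmetic means of the six wages.

```python
import numpy as np

# Wages of the six employees (in euros), as in Example 1.1
wages = np.array([1200, 1200, 1300, 1500, 1500, 2500], dtype=float)

# Simple arithmetic mean: sum of the values divided by the number of observations
mean_wage = wages.sum() / wages.size          # equivalently wages.mean()

# Weighted arithmetic mean: distinct wage levels weighted by their frequencies
levels  = np.array([1200, 1300, 1500, 2500], dtype=float)
weights = np.array([2, 1, 2, 1], dtype=float)
weighted_mean = (levels * weights).sum() / weights.sum()

print(mean_wage, weighted_mean)  # both equal to 1533.33 (rounded)
```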
1.3.2 Variance, Standard Deviation, and Covariance

The variance $V(X)$ of a variable X is equal to the average of the squares of the deviations from the mean:

$V(X) = \frac{1}{T} \left[ (X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \ldots + (X_T - \bar{X})^2 \right] = \frac{1}{T} \sum_{t=1}^{T} (X_t - \bar{X})^2$  (1.10)
The standard deviation, noted $\sigma_X$, is the square root of the variance, i.e.:

$\sigma_X = \sqrt{\frac{1}{T} \sum_{t=1}^{T} (X_t - \bar{X})^2}$  (1.11)

The variance can also be written as:

$V(X) = \frac{1}{T} \sum_{t=1}^{T} X_t^2 - \bar{X}^2$  (1.12)
The use of this formula simplifies the calculations in that it is no longer necessary
to calculate deviations from the mean.
The relationships (1.10), (1.11), and (1.12) are valid when studying a popula-
tion.6 In practice, the study of a population is rare, and we are often limited to
studying a sub-part of the population, i.e., a sample. In this case, a slightly different
measure of variance is used, called the empirical variance, which is given by:7
$s_X^2 = \frac{1}{T-1} \sum_{t=1}^{T} (X_t - \bar{X})^2$  (1.13)

or:

$s_X^2 = \frac{1}{T-1} \sum_{t=1}^{T} X_t^2 - \frac{T}{T-1} \bar{X}^2$  (1.14)
The empirical standard deviation is the square root of the empirical variance: $s_X = \sqrt{s_X^2}$  (1.15). The covariance between two variables X and Y is given by:

$Cov(X, Y) = \frac{1}{T} \sum_{t=1}^{T} (X_t - \bar{X})(Y_t - \bar{Y}) = \frac{1}{T} \sum_{t=1}^{T} X_t Y_t - \bar{X} \bar{Y}$  (1.16)
6 A population is a set of elements, called statistical units or individuals, that we wish to study.
7 The division by $(T-1)$ instead of T comes from the loss of one degree of freedom since the empirical mean (and not the true population mean) is used in calculating the variance.
1.3.3 Linear Correlation Coefficient

The correlation coefficient is an indicator of the link between two variables.8 Thus, when two variables move together, i.e., vary in the same direction, they are said to be correlated.
Consider two variables X and Y. The linear correlation coefficient between these two variables, noted $r_{XY}$, is given by:

$r_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$  (1.17)
or:

$r_{XY} = \frac{\sum_{t=1}^{T} (X_t - \bar{X})(Y_t - \bar{Y})}{\sqrt{\sum_{t=1}^{T} (X_t - \bar{X})^2} \sqrt{\sum_{t=1}^{T} (Y_t - \bar{Y})^2}}$  (1.18)

or alternatively:

$r_{XY} = \frac{T \sum_{t=1}^{T} X_t Y_t - \sum_{t=1}^{T} X_t \sum_{t=1}^{T} Y_t}{\sqrt{T \sum_{t=1}^{T} X_t^2 - \left(\sum_{t=1}^{T} X_t\right)^2} \sqrt{T \sum_{t=1}^{T} Y_t^2 - \left(\sum_{t=1}^{T} Y_t\right)^2}}$  (1.19)

The linear correlation coefficient always lies between $-1$ and 1:

$-1 \le r_{XY} \le 1$  (1.20)
8 If more than two variables are studied, the concept of multiple correlation must be used (see below).
Remark 1.3 So far, we have considered a linear correlation between two variables
X and Y : the values of the pair .(X, Y ) appear to lie on a straight line (see Figs. 1.3
and 1.4). When these values are no longer on a straight line, but on a curve of
any shape, we speak of nonlinear correlation. Positive and negative nonlinear
correlations are illustrated in Figs. 1.6 and 1.7.
1.3.4 Empirical Application

Consider the following two annual series (see Table 1.1): the household consumption series (noted C) and the household gross disposable income series (noted Y)
for France over the period 1990–2019. These two series are expressed in real terms,
i.e., they have been deflated by the French consumer price index. The number of
observations is 30.
From the values in Table 1.1, it is possible to calculate the following quantities,
which are necessary to determine the statistics presented above:
– $\sum_{t=1}^{30} C_t = 30,455,596.93$
– $\sum_{t=1}^{30} Y_t = 34,519,740.64$
– $\sum_{t=1}^{30} C_t^2 = 3.14 \times 10^{13}$
Table 1.1 Consumption and gross disposable income of households in France (in e million).
Annual data, 1990–2019
Year C Y Year C Y
1990 870,338.41 830,572.81 2005 1,187,709.02 1,049,755.63
1991 868,923.49 832,883.96 2006 1,228,476.20 1,081,354.36
1992 913,134.25 844,679.04 2007 1,260,465.07 1,103,446.70
1993 943,367.50 840,328.00 2008 1,286,487.36 1,128,089.30
1994 950,773.58 847,173.42 2009 1,278,688.70 1,102,747.33
1995 971,315.23 852,111.23 2010 1,292,234.68 1,115,634.36
1996 987,452.68 866,004.67 2011 1,284,970.35 1,114,405.93
1997 982,724.94 867,822.35 2012 1,281,460.14 1,109,228.05
1998 1,018,159.61 902,064.20 2013 1,267,030.06 1,116,070.14
1999 1,039,090.56 916,606.04 2014 1,282,764.87 1,125,426.60
2000 1,083,578.97 957,094.57 2015 1,295,592.76 1,142,198.12
2001 1,126,231.00 985,857.04 2016 1,311,829.11 1,156,441.13
2002 1,146,598.73 991,312.79 2017 1,328,847.32 1,171,560.14
2003 1,148,028.86 1,002,269.15 2018 1,345,881.90 1,183,006.67
2004 1,171,763.22 1,021,751.72 2019 1,365,822.06 1,197,701.47
Data sources: Insee for the consumption and consumer price index series, European Commission
for the gross disposable income series
– $\sum_{t=1}^{30} Y_t^2 = 4.04 \times 10^{13}$
– $\sum_{t=1}^{30} C_t Y_t = 3.56 \times 10^{13}$
– The mean of the consumption and income series:

$\bar{C} = \frac{1}{30} \times 30,455,596.93 = 1,015,186.56$  (1.21)

$\bar{Y} = \frac{1}{30} \times 34,519,740.64 = 1,150,658.02$  (1.22)

– The standard deviation of the consumption and income series:

$s_C = \sqrt{\frac{1}{29} \times 3.14 \times 10^{13} - \frac{30}{29} \times (1,015,186.56)^2} = 125,970.16$  (1.23)

and:

$\sigma_C = 123,852.87$  (1.24)

$s_Y = \sqrt{\frac{1}{29} \times 4.04 \times 10^{13} - \frac{30}{29} \times (1,150,658.02)^2} = 157,952.45$  (1.25)

and:

$\sigma_Y = 155,297.60$  (1.26)

– The covariance between the consumption and income series:

$Cov(C, Y) = \frac{1}{30} \times 3.56 \times 10^{13} - 1,015,186.56 \times 1,150,658.02 = 19,084,753,775.26$  (1.27)

– The linear correlation coefficient between consumption and income:

$r_{CY} = \frac{Cov(C, Y)}{\sigma_C \sigma_Y} = \frac{19,084,753,775.26}{123,852.87 \times 155,297.60} = 0.9922$  (1.28)
We can see that the linear correlation coefficient is positive and very close to 1.
This indicates a strong positive correlation between consumption and income: the
two series move in the same direction.
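These calculations can also be reproduced programmatically. The sketch below (Python; the language and variable names are our own choices) works from the rounded sums reported above rather than the full data of Table 1.1, so the results are only approximate; in particular, the correlation comes out around 0.97 instead of the 0.9922 obtained from the exact series.

```python
# A minimal sketch reproducing the calculations of this empirical application
# from the (rounded) sums reported above; T = 30 annual observations, 1990-2019.
from math import sqrt

T      = 30
sum_C  = 30_455_596.93
sum_Y  = 34_519_740.64
sum_C2 = 3.14e13          # rounded to three significant digits in the text
sum_Y2 = 4.04e13
sum_CY = 3.56e13

C_bar = sum_C / T                              # mean of C, Eq. (1.21)
Y_bar = sum_Y / T                              # mean of Y, Eq. (1.22)

var_C = sum_C2 / T - C_bar**2                  # population-formula variance, Eq. (1.12)
var_Y = sum_Y2 / T - Y_bar**2
sigma_C, sigma_Y = sqrt(var_C), sqrt(var_Y)    # standard deviations
s_C = sqrt(T / (T - 1) * var_C)                # empirical std dev, Eq. (1.13)-(1.14)
s_Y = sqrt(T / (T - 1) * var_Y)

cov_CY = sum_CY / T - C_bar * Y_bar            # covariance, Eq. (1.16)
r_CY   = cov_CY / (sigma_C * sigma_Y)          # correlation, Eq. (1.17)

print(round(r_CY, 4))  # about 0.97 with rounded inputs; the full data give 0.9922
```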
This result can be illustrated graphically. Figure 1.8 clearly shows that the series
move together; they share a common trend. Figure 1.9 shows the values of the pair (C, Y). These values are well represented by a straight line, illustrating the fact that the correlation between consumption and income is indeed linear and positive.

1.4 A Brief Introduction to the Concept of Stationarity

When working on time series, one must be careful to ensure that they are stationary
over time. The methods described in this book, particularly the ordinary least
squares method, are valid only if the time series are stationary. Only a graphical
intuition of the concept of stationarity will be given here; for more details, readers
can refer to Chap. 7. We distinguish between stationarity in the mean and stationarity
in the variance.
[Fig. 1.8 Consumption (C) and gross disposable income (Y) series of French households (euros)]
[Fig. 1.9 Values of the pair (C, Y): consumption C on the horizontal axis, income Y on the vertical axis]
1.4.1 Stationarity in the Mean

A time series is stationary in the mean if its mean remains stable over time. As an
illustration, we have reproduced in a very schematic way a nonstationary series in
Fig. 1.10. We can see that the mean, represented by the dotted line, increases over
time.
In Fig. 1.11, the mean of the series is now represented by a straight line parallel
to the x-axis: the mean is stable over time, suggesting that the series is stationary in
the mean. Of course, this intuition must be verified statistically by applying specific
tests, called unit root tests (see Chap. 7).
In order to apply the usual econometric methods, the series studied must be
mean stationary. Otherwise, it is necessary to stationarize the series, i.e., to make
it stationary. The technique commonly used in practice consists in differentiating
the nonstationary series .Xt , i.e., in applying the first difference operator .Δ:
$\Delta X_t = X_t - X_{t-1}$
1.4.2 Stationarity in the Variance

A time series that is stationary in the variance is such that its variance is constant over time.
It is also possible to graphically apprehend the concept of stationarity in the
variance. The series shown in Fig. 1.12 is nonstationary in the variance: graphically,
we can see a “funnel-like phenomenon,” indicating that the variance of the series
tends to increase over time. In order to reduce the variability of a series, the
logarithmic transformation is frequently used.9 The logarithm allows the series to
be framed between two lines, i.e., to eliminate the funneling phenomenon, as shown
schematically in Fig. 1.13.
Remark 1.4 In practice, when we want to make a series stationary in both the mean
and the variance, we must first make it stationary in the variance and, then, in the
mean. The result is a series in logarithmic difference. This logarithmic difference is written:
9 The logarithmic transformation is a special case of the Box-Cox transformation used to reduce the variability of a time series (see Box and Cox 1964, and Chap. 2 below).
$Y_t = \Delta \log X_t = \log X_t - \log X_{t-1} = \log \frac{X_t}{X_{t-1}} = \log \left( 1 + \frac{X_t - X_{t-1}}{X_{t-1}} \right) \simeq \frac{X_t - X_{t-1}}{X_{t-1}}$  (1.29)
1.4.3 Empirical Application: A Study of the Nikkei Index

To illustrate the concept of stationarity, let us consider the Japanese stock market
index series: the Nikkei 225 index. This series, extracted from the Macrobond
database, has a quarterly frequency and covers the period from the third quarter
of 1949 to the second quarter of 2021 (1949.3–2021.2). The Nikkei index series
is reproduced in Fig. 1.14, whereas Fig. 1.15 represents the dynamics of this same
series in logarithms. These graphs highlight an upward trend in the first half of the
sample, followed by a general downward trend, and then an increasing trend from
the early 2010s. The mean therefore changes over time, reflecting that the Japanese
stock market index series seems nonstationary in the mean.
[Fig. 1.14 The Nikkei 225 stock index, quarterly data, 1949–2021]

[Fig. 1.15 The Nikkei 225 stock index in logarithms, quarterly data, 1949–2021]
Faced with the apparent non-stationarity (in the mean) of the Nikkei index series,
we differentiate it by applying the first difference operator. We then obtain the series
of returns .Rt of the Nikkei index:
$R_t = \Delta \log X_t = \log X_t - \log X_{t-1} = \log \frac{X_t}{X_{t-1}} \simeq \frac{X_t - X_{t-1}}{X_{t-1}}$  (1.30)
[Fig. 1.16 Returns of the Nikkei 225 index, 1949–2021]
where .Xt denotes the Nikkei 225 stock index. The series of returns is displayed
in Fig. 1.16. As shown, the upward trend in the mean has been suppressed by
the differentiation operation, suggesting that the returns series is a priori mean
stationary.
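This stationarization step can be illustrated with a short piece of code. The sketch below (Python/NumPy, applied to a simulated price series rather than the Nikkei data used in the text) computes the logarithmic returns of Eq. (1.30) and checks that they are close to the simple percentage changes.

```python
import numpy as np

# A minimal sketch: stationarizing a trending series by taking log differences,
# as in Eqs. (1.29)-(1.30). The series is simulated for illustration only.
rng = np.random.default_rng(0)
T = 200
# Exponential trend plus accumulated shocks: nonstationary in the mean,
# with the typical "funnel" effect in the levels
x = 100 * np.exp(0.01 * np.arange(T) + 0.02 * rng.standard_normal(T).cumsum())

log_x   = np.log(x)               # the log reduces the variability of the series
returns = np.diff(log_x)          # R_t = log X_t - log X_{t-1}
approx  = np.diff(x) / x[:-1]     # (X_t - X_{t-1}) / X_{t-1}

print(np.abs(returns - approx).max())   # the two measures are very close
print(returns.mean(), returns.std())    # roughly constant mean and variance
```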
1.5 Databases and Software

As we have already mentioned, there are many databases in the field of economics
and finance, which have expanded considerably in recent decades. The aim here
is not to give an exhaustive list, but to provide some reference points concerning
a number of frequently used databases. Similarly, we will mention some of the
econometric software that practitioners often use.
1.5.1 Databases
– Bank for International Settlements (open access): financial and monetary data
– Banque de France (free access): economic, monetary, banking, and financial data
for France and the eurozone
– British Petroleum (open access): energy data (oil, gas, electricity, biofuels, coal,
nuclear, etc.)
– CEPII (open access): databases in international macroeconomics and interna-
tional trade
– Datastream/Eikon: economic and financial database with many series for all
countries
– DB.nomics (open access): many economic data sets provided by national and
international institutions for most countries
– ECONDATA (free access): server on databases available online
– Economagic (free access): numerous macroeconomic and financial series, on the
United States, the eurozone, and Japan
– Euronext (free access): data and statistics on stock markets
– European Central Bank (ECB Statistical Data Warehouse, open access): eco-
nomic and financial data for Europe
– Eurostat (free access): socio-economic indicators for European countries, aggre-
gated by theme, country, region, or sector
– Eurozone Statistics (ESCB, free access): eurozone and national central bank
statistics
– FAO (Food and Agriculture Organization of the United Nations, FAOSTAT, open
access): food and agricultural data for most countries
– Insee (free access): statistics and data series for the French economy, quarterly
national accounts
– International Monetary Fund (IMF, partly open access): numerous databases,
including International Financial Statistics (IFS) and World Economic Outlook
(WEO) covering most countries
– Macrobond: economic and financial database with a wide range of series for all
countries
– National Bureau of Economic Research (NBER, open access): various macroe-
conomic, sectoral, and international series
– OECD (open access): statistics and data at national and sectoral levels for OECD
countries, China, India, Indonesia, Russia, and South Africa
– Penn World Table (free access): annual national accounts series for many
countries
– UN (open access): macroeconomic and demographic series and statistics
– UNCTAD (open access): data on international trade, foreign direct investments,
commodity prices, population, macroeconomic indicators, etc.
– WebEc World Wide Web Resources in Economics (free access): server on
economics and econometrics resources
– World Bank, World Development Indicators (WDI, free access): annual macroeconomic and financial series for most countries, numerous economic development indicators
– World Inequality Database (WID, open access): database on global inequalities
1.5.2 Econometric Software

Most of the applications presented in this book have been processed with EViews
software, this choice being here guided by pedagogical considerations. Of course,
there are many other econometric and statistical software packages, some of which
are freely available. We mention a few of them below, in alphabetical order, empha-
sizing once again that these lists—one of which concerns commercial software, the
other open-source software—are by no means intended to be exhaustive.
Let us start by mentioning some software packages that require a paid license:
– EViews: econometric software, more particularly adapted for time series analysis
– GAUSS: programming language widely used in statistics and econometrics
– LIMDEP and NLOGIT: econometric software adapted for panel data, discrete
choice, and multinomial choice models
– Matlab: programming language for data analysis, modeling, and algorithmic
programming
– RATS: econometric software, more particularly adapted for time series analysis
– S: statistical programming language; an open-access version of which is R (see
below)
– SAS: statistical and econometric software, allowing the processing of very large
databases
– SPAD: software for data analysis, statistics, data mining, and textual data analysis
– SPSS: statistical software for advanced analysis
– Stata: general statistical and econometric software, widely used, especially in
panel data econometrics
Conclusion
This introductory chapter has recalled some basic concepts in statistics and econo-
metrics. In particular, it has highlighted the importance of the correlation coefficient
in determining whether two variables move together. The next chapter extends this
with a detailed presentation of the basic econometric model: the simple regression
model. This model links the behavior of two variables, in the sense that one of them
explains the other. The notion of correlation is thus deepened, as we study not only
whether two variables move together, but also whether one of them has explanatory
power over the other.
The Gist of the Chapter

– Mean: $\bar{X} = \frac{1}{T} \sum_{t=1}^{T} X_t$
– Variance: $V(X) = \frac{1}{T} \sum_{t=1}^{T} (X_t - \bar{X})^2$
– Standard deviation: $\sigma_X = \sqrt{V(X)}$
– Empirical variance: $s_X^2 = \frac{1}{T-1} \sum_{t=1}^{T} (X_t - \bar{X})^2$
– Empirical standard deviation: $s_X = \sqrt{s_X^2}$
– Covariance: $Cov(X, Y) = \frac{1}{T} \sum_{t=1}^{T} (X_t - \bar{X})(Y_t - \bar{Y})$
– Correlation coefficient: $r_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$, with $-1 \le r_{XY} \le 1$
Further Reading
2 The Simple Regression Model

2.1 General
2.1.1 The Linearity Assumption
$Y = f(X)$  (2.1)
where Y is the dependent variable and X the explanatory variable. The function
f is said to be linear in X if the power of X is equal to unity and if X is not
multiplied or divided by another variable. In other words, Y is linearly related to X
if the derivative of Y with respect to X—i.e., the slope of the regression line—is
independent of X.
As an example, the model:
$Y = 3X$  (2.2)

is linear since $\frac{dY}{dX} = 3$: the derivative of Y with respect to X is independent of X. More generally, the model:

$Y = \alpha + \beta X$  (2.3)
This model is not linear with respect to X and Y, but it is linear with respect to $\log X$ and $\log Y$. Similarly, the model:

$\log Y = \alpha + \beta X$  (2.5)

$\log Y = \alpha + \beta \frac{1}{X}$  (2.7)

which is a linear model in $1/X$ and $\log Y$.
$Y = \beta X^2$  (2.8)
Taking the logarithm of both sides of (2.8) gives $\log Y = \log \beta + 2 \log X$  (2.9). The model (2.9) thus becomes a linear model in $\log X$ and $\log Y$.
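To illustrate this linearization, the sketch below (Python/NumPy, on simulated data; the parameter values are arbitrary and not taken from the book) fits a straight line to the logarithms of Y and X and recovers the exponent 2 and the coefficient $\beta$.

```python
import numpy as np

# A minimal sketch: the model Y = beta * X^2 of Eq. (2.8) is nonlinear in X but
# linear after taking logs, log Y = log beta + 2 log X, so a straight-line fit
# on the transformed variables recovers its parameters.
rng = np.random.default_rng(1)
beta = 3.0
X = rng.uniform(1.0, 10.0, size=200)
Y = beta * X**2 * np.exp(0.05 * rng.standard_normal(200))  # multiplicative noise

slope, intercept = np.polyfit(np.log(X), np.log(Y), deg=1)
print(slope)              # close to 2, the power of X
print(np.exp(intercept))  # close to beta = 3
```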
For example, the models:

$Y = \alpha + \beta X$  (2.10)

and:

$Y = \alpha + \beta X^2$  (2.11)

are linear in the parameters, whereas the models:

$Y = \alpha + \beta^2 X$  (2.12)

and:

$Y = \alpha + \frac{\beta}{\alpha} X$  (2.13)

are not linear in the parameters.
Linear Model
We wrote in the introduction to this chapter that the simple regression model is a
linear model. The linearity discussed here is the linearity in the parameters. The
methods described in this chapter therefore apply to models that are linear in the
parameters. Of course, the model under study can also be linear in the variables, but
this is not necessary in the sense that it is sufficient that the model can be linearized.
In other words, the model can be linear in X or in any transformation of X.
$Y = \alpha + \beta X + \varepsilon$  (2.14)
where Y is the dependent variable, X is the explanatory variable, and .ε is the error
term (or disturbance). The parameters (or coefficients) of the model are .α and .β.
It is assumed that the variable X is observed without error, i.e., that X is a certain
variable. Therefore, the variable X is independent of the error term .ε. The variable
Y is a random variable, its random nature coming from the presence of the error
term in the model.
Suppose that the variables X and Y each include T observations: we note $X_t$, $t = 1, \ldots, T$, and $Y_t$, $t = 1, \ldots, T$. The simple regression model is then written:

$Y_t = \alpha + \beta X_t + \varepsilon_t$  (2.15)
t may designate a date, in the case of time series data, or an individual (a household, a firm, a country, etc.), in the case of cross-sectional data.
The error term cannot be predicted for every observation, but a number of
assumptions can be made, which are described below.
$E(\varepsilon_t) = 0 \quad \forall t$  (2.16)
This assumption means that, on average, the model is correctly specified and
therefore that, on average, the error is zero.
A further assumption concerns the variance of the error term: it is assumed that the variance of the error term is constant over time. In the case of a cross-sectional model, this refers to
the fact that the variance does not differ between individuals. The constant variance
assumption is the homoskedasticity hypothesis. A series whose variance is constant
is said to be homoskedastic.1 Mathematically, this hypothesis is written as follows:
$E(\varepsilon_t^2) = \sigma_\varepsilon^2 \quad \forall t$  (2.18)

$E(\varepsilon_t) = 0 \quad \forall t$  (2.20)

$E(\varepsilon_t \varepsilon_{t'}) = \begin{cases} 0 & \forall t \neq t' \\ \sigma_\varepsilon^2 & \forall t = t' \end{cases}$  (2.21)
1 Conversely, a series whose variance evolves over time (for a time series model) or differs between individuals (for a cross-sectional model) is called a heteroskedastic series.
2 Central limit theorem: let $X_1, X_2, \ldots, X_n$ be n independent random variables with the same probability density function of mean m and variance $\sigma^2$. When n tends to infinity, the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ tends towards a normal distribution with mean m and variance $\sigma^2/n$.
Appendix 2.2 for a detailed presentation of the normal distribution). We thus add
the assumption of normality of the distribution of the error term to the assumptions
of nullity of the expectation (Eq. (2.16)) and of homoskedasticity (Eq. (2.18)), which
can be written as follows:
$\varepsilon_t \sim N\left( 0, \sigma_\varepsilon^2 \right)$  (2.23)
where N denotes the normal distribution, and the sign “.∼” means “follow the law.”
Remark 2.3 The assumption that the errors follow a normal distribution with zero
expectation and constant variance and that they are not autocorrelated can also be
formulated by writing that the errors are normally and independently distributed
(Nid), which is noted:
$\varepsilon_t \sim Nid\left( 0, \sigma_\varepsilon^2 \right)$  (2.24)
If the errors follow the same distribution other than the normal distribution, we
speak of identically and independently distributed (iid) errors, which is noted:
$\varepsilon_t \sim iid\left( 0, \sigma_\varepsilon^2 \right)$  (2.25)
Remark 2.4 The assumption of normality of the errors is not necessary to establish
the results of the regression model. However, it does allow us to derive statistical
results and construct test statistics (see below).
2.1.3 Summary: Specification of the Simple Regression Model

The complete specification of the simple regression model studied in this chapter is
written as:
$Y_t = \alpha + \beta X_t + \varepsilon_t$  (2.26)

with:

$E(\varepsilon_t) = 0 \quad \forall t$  (2.27)

$E(\varepsilon_t \varepsilon_{t'}) = \begin{cases} 0 & \forall t \neq t' \\ \sigma_\varepsilon^2 & \forall t = t' \end{cases}$  (2.28)

and:

$\varepsilon_t \sim N\left( 0, \sigma_\varepsilon^2 \right)$  (2.29)
We can also write the complete specification of the simple regression model by
combining the relations (2.27), (2.28), and (2.29):
$Y_t = \alpha + \beta X_t + \varepsilon_t$  (2.30)

with:

$\varepsilon_t \sim Nid\left( 0, \sigma_\varepsilon^2 \right)$  (2.31)
2.2 The Ordinary Least Squares (OLS) Method

2.2.1 Objective and Reminder of Hypotheses

The parameters $\alpha$ and $\beta$ of the simple regression model between X and Y are
unknown. If we wish to quantify this relationship between X and Y , we need to
estimate these parameters. This is our objective.
More precisely, from the observed values of the series .Xt and .Yt , the aim is to
find the quantified relationship between these two variables, i.e.:
$\hat{Y}_t = \hat{\alpha} + \hat{\beta} X_t$  (2.32)

where $\hat{\alpha}$ and $\hat{\beta}$ are the estimators of the parameters $\alpha$ and $\beta$. $\hat{Y}_t$ is the estimated (or adjusted or fitted) value of $Y_t$. The most frequently used method for estimating the parameters $\alpha$ and $\beta$ is the ordinary least squares (OLS) method.
The implementation of the OLS method requires a certain number of assumptions
set out previously and recalled below:
3 Assuming that the variable .Xt is nonrandom simplifies the analysis in the sense that it allows us
to use mathematical statistical results by considering .Xt as a known variable for the probability
distribution of the variable .Yt . However, such an assumption is sometimes difficult to maintain in
practice, and the fundamental assumption is, in fact, the absence of correlation between the variable
.Xt and the error term.
[Fig. 2.1 Scatter plot of the pair $(X_t, Y_t)$ and the OLS regression line $\hat{Y}_t = \hat{\alpha} + \hat{\beta} X_t$, with residual $e_t = Y_t - \hat{Y}_t$]
Figure 2.1 plots the values of the pair $(X_t, Y_t)$ for $t = 1, \ldots, T$. We obtain a scatter plot that we try to fit with a line. Any line drawn through this scatter plot may be considered as an estimate of the linear relationship under consideration:

$Y_t = \alpha + \beta X_t + \varepsilon_t$  (2.33)

The equation of such a line, called the regression line or OLS line, is:

$\hat{Y}_t = \hat{\alpha} + \hat{\beta} X_t$  (2.34)
where $\hat{\alpha}$ and $\hat{\beta}$ are the estimators of the parameters $\alpha$ and $\beta$. The estimated value $\hat{Y}_t$ of $Y_t$ is the ordinate of a point on the line whose abscissa is $X_t$. As shown in Fig. 2.1, some points of the pair $(X_t, Y_t)$ lie above the line (2.34), and others lie below it. There are therefore deviations, noted $e_t$, from this line:

$e_t = Y_t - \hat{Y}_t = Y_t - \hat{\alpha} - \hat{\beta} X_t$  (2.35)
Minimizing these deviations, called the residuals, amounts to minimizing the sum of squared residuals. The OLS principle can then be stated:
$OLS \iff \min \sum_{t=1}^{T} e_t^2$  (2.36)

The objective is to find $\hat{\alpha}$ and $\hat{\beta}$ such that the sum of squared residuals is minimal. The OLS estimators are given by:

$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$  (2.37)

and:

$\hat{\beta} = \frac{Cov(X_t, Y_t)}{V(X_t)}$  (2.38)
Let us demonstrate these formulas. Using Eq. (2.35), we can write the sum of
squared residuals as:
$\sum_{t=1}^{T} e_t^2 = \sum_{t=1}^{T} \left( Y_t - \hat{\alpha} - \hat{\beta} X_t \right)^2$  (2.39)

To obtain the estimators $\hat{\alpha}$ and $\hat{\beta}$, we have to minimize this expression with respect to the parameters $\hat{\alpha}$ and $\hat{\beta}$. We are therefore looking for the values $\hat{\alpha}$ and $\hat{\beta}$ such that:

$\frac{\partial \sum_{t=1}^{T} e_t^2}{\partial \hat{\alpha}} = \frac{\partial \sum_{t=1}^{T} e_t^2}{\partial \hat{\beta}} = 0$  (2.40)

First, let us calculate the derivative of the sum of squared residuals with respect to $\hat{\alpha}$:

$\frac{\partial \sum_{t=1}^{T} e_t^2}{\partial \hat{\alpha}} = \frac{\partial \sum_{t=1}^{T} \left( Y_t - \hat{\alpha} - \hat{\beta} X_t \right)^2}{\partial \hat{\alpha}} = 0$  (2.41)

That is:

$-2 \sum_{t=1}^{T} \left( Y_t - \hat{\alpha} - \hat{\beta} X_t \right) = 0$  (2.42)

Hence:

$\sum_{t=1}^{T} \left( Y_t - \hat{\alpha} - \hat{\beta} X_t \right) = 0$  (2.43)

Noting that $\sum_{t=1}^{T} \hat{\alpha} = T \hat{\alpha}$, we deduce:

$\sum_{t=1}^{T} Y_t = T \hat{\alpha} + \hat{\beta} \sum_{t=1}^{T} X_t$  (2.44)
Now let us determine the derivative of the sum of squared residuals with respect to $\hat{\beta}$:

$\frac{\partial \sum_{t=1}^{T} e_t^2}{\partial \hat{\beta}} = \frac{\partial \sum_{t=1}^{T} \left( Y_t - \hat{\alpha} - \hat{\beta} X_t \right)^2}{\partial \hat{\beta}} = 0$  (2.45)

That is:

$-2 \sum_{t=1}^{T} \left( Y_t - \hat{\alpha} - \hat{\beta} X_t \right) X_t = 0$  (2.46)

Hence:

$\sum_{t=1}^{T} \left( Y_t - \hat{\alpha} - \hat{\beta} X_t \right) X_t = 0$  (2.47)

That is:

$\sum_{t=1}^{T} X_t Y_t = \hat{\alpha} \sum_{t=1}^{T} X_t + \hat{\beta} \sum_{t=1}^{T} X_t^2$  (2.48)

Dividing Eq. (2.44) by T, we obtain:

$\frac{1}{T} \sum_{t=1}^{T} Y_t = \hat{\alpha} + \hat{\beta} \frac{1}{T} \sum_{t=1}^{T} X_t$  (2.49)

Hence:

$\bar{Y} = \hat{\alpha} + \hat{\beta} \bar{X} \iff \hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$  (2.50)

Equation (2.50) gives us the OLS estimator $\hat{\alpha}$ of $\alpha$ and states that the regression line passes through the mean point $(\bar{X}, \bar{Y})$.
Let us now determine the expression of the OLS estimator $\hat{\beta}$ of $\beta$. For this purpose, we replace $\hat{\alpha}$ by its value given in (2.50) in Eq. (2.48):

$\sum_{t=1}^{T} X_t Y_t = \left( \bar{Y} - \hat{\beta} \bar{X} \right) \sum_{t=1}^{T} X_t + \hat{\beta} \sum_{t=1}^{T} X_t^2$  (2.51)

That is:

$\sum_{t=1}^{T} X_t Y_t = \hat{\beta} \left( \sum_{t=1}^{T} X_t^2 - \bar{X} \sum_{t=1}^{T} X_t \right) + \bar{Y} \sum_{t=1}^{T} X_t$  (2.52)

We deduce:

$\hat{\beta} \left( \sum_{t=1}^{T} X_t^2 - \bar{X} \sum_{t=1}^{T} X_t \right) = \sum_{t=1}^{T} X_t Y_t - \bar{Y} \sum_{t=1}^{T} X_t$  (2.53)

Hence:

$\hat{\beta} = \frac{\sum_{t=1}^{T} X_t Y_t - \frac{1}{T} \sum_{t=1}^{T} X_t \sum_{t=1}^{T} Y_t}{\sum_{t=1}^{T} X_t^2 - \frac{1}{T} \left( \sum_{t=1}^{T} X_t \right)^2}$  (2.54)
We have:

$V(X_t) = \frac{1}{T} \sum_{t=1}^{T} \left( X_t - \bar{X} \right)^2 = \frac{1}{T} \sum_{t=1}^{T} X_t^2 - \left( \frac{1}{T} \sum_{t=1}^{T} X_t \right)^2$  (2.55)

Hence:

$T\,V(X_t) = \sum_{t=1}^{T} X_t^2 - \frac{1}{T} \left( \sum_{t=1}^{T} X_t \right)^2$  (2.56)

Similarly, we have:

$Cov(X_t, Y_t) = \frac{1}{T} \sum_{t=1}^{T} X_t Y_t - \bar{X} \bar{Y}$  (2.57)

That is:

$Cov(X_t, Y_t) = \frac{1}{T} \sum_{t=1}^{T} X_t Y_t - \frac{1}{T} \sum_{t=1}^{T} X_t \, \frac{1}{T} \sum_{t=1}^{T} Y_t$  (2.58)

Hence:

$T\,Cov(X_t, Y_t) = \sum_{t=1}^{T} X_t Y_t - \frac{1}{T} \sum_{t=1}^{T} X_t \sum_{t=1}^{T} Y_t$  (2.59)

Substituting (2.56) and (2.59) into (2.54), we obtain:

$\hat{\beta} = \frac{T\,Cov(X_t, Y_t)}{T\,V(X_t)}$  (2.60)

That is:

$\hat{\beta} = \frac{Cov(X_t, Y_t)}{V(X_t)}$  (2.61)
Remark 2.5 (Case of Centered Variables) When the variables are centered, i.e., when observations are centered on their mean:

$x_t = X_t - \bar{X} \quad \text{and} \quad y_t = Y_t - \bar{Y}$  (2.62)

the OLS estimators $\hat{\alpha}$ and $\hat{\beta}$ are, respectively, given by:

$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$  (2.63)

and:

$\hat{\beta} = \frac{\sum_{t=1}^{T} x_t y_t}{\sum_{t=1}^{T} x_t^2}$  (2.64)
Remark 2.6 Here we have focused on estimating the regression model using the
OLS method. Another estimation method is the maximum likelihood procedure.
This method is presented in the appendix to this chapter. It leads to the same
estimators of the coefficients .α and .β as the OLS method. However, the maximum
likelihood estimator of the error variance is biased (see Appendix 2.3).
$\pi_t - E\left[ \pi_t | I_{t-1} \right] = \gamma \left( u_t - u^* \right) + \varepsilon_t$  (2.65)
where $\pi_t$ is the inflation rate (measured as the growth rate of the consumer price index) at date t, $E\left[ \pi_t | I_{t-1} \right]$ is the expectation (made at date $t-1$) for the inflation rate $\pi_t$ given the set of information I available at date $(t-1)$, $u_t$ is the unemployment rate at date t, and $u^*$ is the natural rate of unemployment. In order to
make this model operational, we need to make an assumption about the formation of
expectations. Let us assume that the expected inflation rate is equal to the inflation
rate of the previous period, i.e.:
$E\left[ \pi_t | I_{t-1} \right] = \pi_{t-1}$  (2.66)

Substituting this expression into Eq. (2.65), the model becomes:

$\pi_t - \pi_{t-1} = \alpha + \beta u_t + \varepsilon_t$  (2.67)
where $\beta = \gamma$ and $\alpha = -\gamma u^*$. This equation shows that the variation in the inflation rate between t and $t-1$ is a function of the unemployment rate at date t. It is also possible to deduce an estimate of the natural rate of unemployment:
4 The original version related the rate of change of nominal wages to the unemployment rate. Let
us recall that this was originally a relationship estimated by Phillips (1958) for the British economy
for the period 1861–1957.
$u^* = -\frac{\hat{\alpha}}{\hat{\beta}}$  (2.68)
Equation (2.67) is a simple regression model since it explains the variation in the
inflation rate by a single explanatory variable, the unemployment rate. To illustrate
this, let us consider annual data for the inflation rate and the unemployment rate in
the United States over the period 1956–2020. Of course, calculating the change in
the inflation rate at t requires the value of the inflation rate at .(t − 1) to be known.
Given that this series only begins in 1957, the estimation of Eq. (2.67) will therefore
cover the period 1957–2020. Table 2.1 shows the first and last values of each series.
Before proceeding with the estimation, let us graphically represent the series
in order to get a first idea of the potential relationship between the two variables.
Figure 2.2 reproduces the dynamics of the unemployment rate (denoted UNEMP) and the variation in the inflation rate (denoted DINF) over the period 1957–2020.
Generally, this graph shows that there seems to be a negative relationship between
the two variables, in the sense that periods of rising unemployment are frequently
associated with periods of falling inflation and vice versa. We would therefore
expect to find a negative relationship between the two variables.
To extend this intuition, we can graphically represent the scatter plot, i.e., the
values of the pair (unemployment rate, change in the inflation rate). Figure 2.3
shows that the scatter plot appears to be concentrated around a line with a generally
decreasing trend, confirming the negative nature of the relationship between the two
variables. Let us now proceed to the OLS estimation of the relationship between the
two variables to confirm these intuitions.
Fig. 2.2 Unemployment rate (UNEMP) and change in the inflation rate (DINF), United States, 1957–2020
[Fig. 2.3 Scatter plot of the pair (UNEMP, DINF): unemployment rate on the horizontal axis, change in the inflation rate on the vertical axis]
Estimating Eq. (2.67) performed over the period 1957–2020 leads to the follow-
ing result:
$\pi_t - \pi_{t-1} = 2.70 - 0.46\,u_t$  (2.69)
This model shows us that the coefficient assigned to the unemployment rate is
negative: there is indeed a decreasing relationship between the unemployment rate
[Figure: scatter plot of household consumption (CONSUMPTION, horizontal axis) and gross disposable income (INCOME, vertical axis) for the 43 countries in 2004]
and the change in the inflation rate. The estimated value, .−0.46, also allows us to
write that if the unemployment rate falls by 1 point, the change in the inflation rate
increases by 0.46 points on average. The ratio 2.70/0.46 gives us the estimated value
of the natural unemployment rate, i.e., 5.87. Over the period under consideration, the
natural unemployment rate is therefore equal to 5.87%. Note in particular that, while
between 2014 and 2019 the observed unemployment rate was lower than its natural
level, this was no longer the case in 2020—a result that may well be explained by
the effects of the Covid-19 pandemic.
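The OLS formulas (2.37) and (2.38) are straightforward to apply directly. The sketch below (Python/NumPy) does so on simulated data mimicking Eq. (2.67); the U.S. series used in the text are not reproduced here, and the "true" coefficients are simply set to the estimates reported in Eq. (2.69), so the output should come out close to them.

```python
import numpy as np

# A minimal sketch of the OLS formulas: beta_hat = Cov(X, Y) / V(X) and
# alpha_hat = Y_bar - beta_hat * X_bar, applied to simulated Phillips-curve-type
# data. The data below are simulated, not the actual U.S. series.
rng = np.random.default_rng(7)
T = 64                                      # 1957-2020: 64 annual observations
unemp = rng.uniform(3.0, 11.0, size=T)      # unemployment rate (X_t)
dinf = 2.70 - 0.46 * unemp + rng.normal(0.0, 1.0, size=T)  # change in inflation (Y_t)

beta_hat = np.cov(unemp, dinf, bias=True)[0, 1] / np.var(unemp)  # Eq. (2.38)
alpha_hat = dinf.mean() - beta_hat * unemp.mean()                # Eq. (2.37)
u_star = -alpha_hat / beta_hat              # natural unemployment rate, Eq. (2.68)

print(alpha_hat, beta_hat, u_star)          # close to 2.70, -0.46, and 5.87
```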
5 The series were deflated by the consumer price index of each country.
6 The data are from the World Bank. The 43 countries considered are Albania, Armenia, Austria,
Azerbaijan, Belarus, Belgium, Bulgaria, Canada, Croatia, Czech Republic, Denmark, Estonia, Fin-
land, France, Georgia, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kazakhstan, Kyrgyzstan,
Latvia, Lithuania, Luxembourg, Macedonia, Moldova, Netherlands, Norway, Poland, Portugal,
Romania, Russia, Serbia and Montenegro, Slovakia, Slovenia, Spain, Sweden, Switzerland,
Turkey, Ukraine, and United Kingdom.
Estimating the consumption-income relationship over the cross-section of 43 countries for the year 2004 yields:

$CONSUMPTION_{2004} = 3.98 \times 10^{9} + 0.61\,INCOME_{2004}$  (2.70)
This estimation shows that the relationship between consumption and income is
indeed increasing, since the value of the coefficient assigned to income is positive.
This coefficient represents the marginal propensity to consume: an increase of 10
monetary units in gross disposable income in 2004 leads, all other things being
equal, to an increase of 6.1 monetary units in consumption the same year.
In summary, the OLS estimators of the parameters $\alpha$ and $\beta$ are:

$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$  (2.71)

$\hat{\beta} = \frac{Cov(X_t, Y_t)}{V(X_t)}$  (2.72)

The expression:

$\hat{Y}_t = \hat{\alpha} + \hat{\beta} X_t$  (2.73)
is the regression line or OLS line. $\hat{\beta}$ is the slope of the regression line. The variable $\hat{Y}_t$ is the estimated variable (or adjusted or fitted variable). The difference between the observed value and the estimated value of the dependent variable is called the residual:

$e_t = Y_t - \hat{Y}_t = Y_t - \hat{\alpha} - \hat{\beta} X_t$  (2.74)
Property 2.1 The regression line passes through the mean point $(\bar{X}, \bar{Y})$.
This property, as we have seen, is derived from the relationship $\bar{Y} = \hat{\alpha} + \hat{\beta} \bar{X}$.
Furthermore, knowing that the regression line is given by:
Ŷt = α̂ + β̂Xt
. (2.75)
44 2 The Simple Regression Model
we deduce:
Ŷ = α̂ + β̂ X̄ = Ȳ
. (2.76)
Property 2.2 The observed .Yt and estimated .Ŷt variables have the same mean:
Ŷ = Ȳ .
.
Knowing that the residuals are given by the difference between the observed and
estimated variables, i.e., .et = Yt − Ŷt , we have:
ē = Ȳ − Ŷ
. (2.77)
ē = 0
. (2.78)
T
. et = 0 (2.79)
t=1
Property 2.4 The covariance between the residuals and the explanatory variable .Xt
is zero, as is the covariance between the residuals and the estimated variable .Ŷt :
Cov (Xt , et ) = 0 and Cov Ŷt , et = 0
. (2.80)
Moreover:
.Cov Xt , Ŷt = Cov Xt , α̂ + β̂Xt = Cov Xt , β̂Xt = β̂Cov (Xt , Xt )
Hence:
Cov Xt , Ŷt = Cov(Xt , Yt )
. (2.84)
Cov (Xt , et ) = 0
. (2.85)
stipulating the absence of correlation between the explanatory variable and the
residuals.
Let us now show that .Cov Ŷt , et = 0. We have:
Cov Ŷt , et = Cov α̂ + β̂Xt , et = Cov β̂Xt , et = β̂Cov (Xt , et )
. (2.86)
which means that the estimated variable and the residuals are not correlated.
.V (X + Y ) = V (X) + V (Y ) + 2Cov(X, Y )
.V (X − Y ) = V (X) + V (Y ) − 2Cov(X, Y )
.V (aX) = a V (X)
2
.V (a + X) = V (X)
.Cov(X, X) = V (X)
.Cov(aX, bY ) = abCov(X, Y )
.Cov(a + X, b + Y ) = Cov(X, Y )
46 2 The Simple Regression Model
Property 2.5 A change of origin does not modify the parameter .β̂.
. Wt = Xt + a and Zt = Yt + b (2.88)
where a and b are constants. The regression model .Yt = α +βXt +εt is then written
as:
Zt − b = α + β (Wt − a) + εt
. (2.89)
Hence:
Zt = α + b − βa + βWt + εt
. (2.90)
Zt = α ' + βWt + εt
. (2.91)
It appears that the intercept is modified, but not the parameter .β. We can also
note that:
Cov(Wt , Zt ) Cov(Xt + a, Yt + b) Cov(Xt , Yt )
β̂ =
. = = (2.92)
V (Wt ) V (Xt + a) V (Xt )
where a and b are constants. The regression model .Yt = α + βXt + εt is then
written:
Zt Wt
. =α+β + εt (2.94)
b a
Hence:
Wt
. Zt = bα + bβ + bεt (2.95)
a
The OLS estimators .α̂ and .β̂ of the parameters .α and .β are:
– Linear estimators; in other words, they are functions of the dependent variable
.Yt .
– Unbiased estimators; this means that .E α̂ = α and .E β̂ = β: the bias of
each of the estimators (.Bias α̂ = E α̂ − α and .Bias β̂ = E β̂ − β) is
zero.
– Minimum variance estimators. The estimators .α̂ and .β̂ are the unbiased
estimators with the lowest variance among all the possible linear unbiased
estimators.
The OLS estimators .α̂ and .β̂ are therefore BLUE (the best linear unbiased
estimators). Let us now demonstrate each of these properties.
Linear Estimators
Consider the centered variables .xt = Xt − X̄, .yt = Yt − Ȳ , and let .wt be defined as:
xt
.wt = (2.98)
T
xt2
t=1
T
xt Yt
t=1
T
β̂ =
. = wt Yt (2.99)
T
xt2 t=1
t=1
and:
T
1
α̂ =
. − X̄wt Yt (2.100)
T
t=1
48 2 The Simple Regression Model
The expression (2.99) reflects the fact that .β̂ is a linear estimator of .β: .β̂ indeed
appears as a linear function of the dependent variable .Yt . It is the same for .α̂ which
is expressed as a linear function of .Yt according to Eq. (2.100): .α̂ is thus a linear
estimator of .α.
Let us summarize this first result concerning the properties of the OLS estimators
as follows.
Property 2.7 The OLS estimators .α̂ and .β̂ are linear estimators of the parameters .α
and .β.
Unbiased Estimators
Starting
from the linearity property of estimators, it is possible to show that
.E β̂ = β and .E α̂ = α, leading to the following property (the proof is given
in Appendix 2.1.2).
Property 2.8 The OLS estimators .α̂ and .β̂ are unbiased estimators of the parameters
α and .β:
.
E α̂ = α
. (2.101)
E β̂ = β
. (2.102)
σε2 σε2
V (β̂) =
. = (2.103)
T T V (Xt )
xt2
t=1
and:
⎛ ⎞
T T
⎜1 2 ⎟ xt2 + T X̄2 Xt2
2⎜ X̄ ⎟ 2 t=1 2 t=1
.V (α̂) = σε ⎜ + ⎟ = σε = σε 2 (2.104)
⎝T
T
2
⎠ T
2
T V (Xt )
xt T xt
t=1 t=1
Property 2.9 The OLS estimators .α̂ and .β̂ are consistent estimators of the parame-
ters .α and .β:
It can also be shown that the OLS estimators .α̂ and .β̂ are estimators of
minimum variance among the class of linear unbiased estimators (see demonstration
in Appendix 2.1.3).
Property 2.10 In the class of linear unbiased estimators, the OLS estimators .α̂ and
β̂ are of minimum variance.
.
Putting together all the properties of the OLS estimators presented in this section,
we can finally state the following fundamental property.
Property 2.11 The OLS estimators .α̂ and .β̂ are the best linear unbiased estimators
of the parameters .α and .β: they are BLUE.
It is because of this property that the OLS method is very frequently used.
that is:
et = εt − α̂ − α − β̂ − β Xt
. (2.107)
1 2
T
.σ̂ε2 = et (2.108)
T −2
t=1
σ̂ε2 σ̂ε2
V
.(β̂) = = (2.109)
T T V (Xt )
xt2
t=1
Similarly, from Eq. (2.104), we have the estimator of the variance of .α̂:
T
T
Xt2 Xt2
V
t=1 t=1
.(α̂) = σ̂ε2 = σ̂ε2 (2.110)
T T 2 V (Xt )
T xt2
t=1
To illustrate the OLS method, let us consider the following two series:
– The series of returns of the US Dow Jones Industrial Average index, denoted
RDJ
– The series of returns of the Euro Stoxx 50, i.e., the European stock market index,
denoted REURO
These two series, taken from the Macrobond database, have a quarterly frequency
over the period from the second quarter of 1987 to the second quarter of 2021, i.e.,
a total of 137 observations.
Figure 2.5 shows that the returns series move in much the same way, which is not
surprising given the international integration of financial markets. Figure 2.6 further
shows that the scatter plot can be reasonably adjusted by a regression line of the
type:
REU
. ROt = α̂ + β̂RDJt (2.111)
We assume here that the dependent variable corresponds to the returns of the
European index, the explanatory variable being the returns of the US index. This
choice can be justified by the fact that it is frequently admitted that the US stock
market has an influence on all the other international stock markets.
Our purpose is to obtain the estimated values .α̂ and .β̂ by applying the OLS
method:
α̂ = REU RO − β̂RDJ
. (2.112)
2.2 The Ordinary Least Squares (OLS) Method 51
.3
.2
.1
.0
-.1
-.2
-.3
-.4
1990 1995 2000 2005 2010 2015 2020
REURO RDJ
.3
.2
.1
.0
REURO
-.1
-.2
-.3
-.4
-.32 -.28 -.24 -.20 -.16 -.12 -.08 -.04 .00 .04 .08 .12 .16 .20
RDJ
and:
Cov(RDJ, REU RO)
β̂ =
. (2.113)
V (RDJ )
Table 2.2 presents the calculations required to determine the estimators .α̂ and .β̂.
We thus have:
1
RDJ =
. 2.7061 = 0.0196 (2.114)
137
1
REU RO =
. 1.5421 = 0.0113 (2.115)
137
1
V (RDJ ) =
. 0.9080 − (0.0196)2 = 0.0062 (2.116)
137
1
Cov (RDJ, REU RO) =
. 1.0183 − 0.0196 × 0.0113 = 0.0072 (2.117)
137
From these calculations, we derive the values of the estimators .α̂ and .β̂:
0.0072
β̂ =
. = 1.1559 (2.118)
0.0062
and:
REU
. ROt = −0.0116 + 1.1559RDJt (2.120)
So far, the assumption that the error term follows a normal distribution has not
been made, since it was not necessary to establish the main results of the regression
analysis. This assumption can now be introduced to determine the distribution
followed by the estimators .α̂ and .β̂, as well as by the estimator .σ̂ε2 of the variance
of the error term.
Since .α̂ and .β̂ are linear functions of the error term .ε, they are also normally
distributed. The expectation and variance of these two normal distributions still have
to be specified.
that .α̂ and .β̂ are unbiased estimators of .α and .β, that is: .E α̂ = α
We know
and .E β̂ = β. Moreover, we have shown that the variances of the two estimators
are given by (Eqs. (2.104) and (2.103)):
⎛ ⎞
T T
⎜1 2 ⎟ xt2 + T X̄2 Xt2
2⎜ X̄ ⎟ t=1 t=1
.V (α̂) = σε ⎜ + ⎟ = σε2 = σε2 2 (2.121)
⎝T
T ⎠ T T V (Xt )
xt2 T xt2
t=1 t=1
and:
σε2
V (β̂) =
. (2.122)
T
xt2
t=1
54 2 The Simple Regression Model
We deduce the distributions followed by the two estimators .α̂ and .β̂:
⎛ ⎛ ⎞⎞
⎜ ⎜1 X̄2 ⎟ ⎟
⎜ ⎜ ⎟⎟
. α̂ ∼ N ⎜α, σε2 ⎜ + ⎟⎟ (2.123)
⎝ ⎝T
T ⎠⎠
xt2
t=1
and:
⎛ ⎞
⎜ σ2 ⎟
⎜ ⎟
. β̂ ∼ N ⎜β, ε ⎟ (2.124)
⎝ T ⎠
xt2
t=1
These expressions are a function of .σε2 which is unknown. In order to make them
operational, it is necessary to replace .σε2 by its estimator .σ̂ε2 given by (Eq. (2.108)):
1 2
T
. σ̂ε2 = et (2.125)
T −2
t=1
σ̂ε2
. (T − 2) ∼ χT2 −2 (2.126)
σε2
T
et2
t=1
. ∼ χT2 −2 (2.127)
σε2
(continued)
2.3 Tests on the Regression Parameters 55
√
z r
t= √
. (2.129)
v
The statistics:
w/s
F =
. (2.132)
v/r
F ∼ F (s, r)
. (2.133)
α̂ − α
. ∼ N (0, 1) (2.134)
1
σε
T + T
X̄2
2 xt
t=1
56 2 The Simple Regression Model
and:
β̂ − β
. ∼ N (0, 1) (2.135)
T
σε / xt2
t=1
Let us examine what happens to these expressions when we replace .σε by its
estimator .σ̂ε . Using the results given in Box 2.2, let us posit:
⎛ ⎞
⎜ ⎟
T
⎜ ⎟ et2 /σε
⎜ α̂ − α ⎟ t=1
.t = ⎜ ⎟/
⎜ ⎟ √
T −2
(2.136)
⎜ σ 1 + X̄2 ⎟
⎝ ε T T ⎠
xt2
t=1
Hence:
α̂ − α
. ∼ t (T − 2) (2.137)
1
σ̂ε T + X̄2
T
2 xt
t=1
we deduce that:
⎛ ⎞
T
⎜ ⎟ et2 /σε
⎜ β̂ − β ⎟ t=1
.t = ⎜ ⎟/ √ (2.139)
⎜ ⎟ T −2
⎝
T ⎠
σε / xt2
t=1
Equations (2.137) and (2.139) highlight the fact that replacing .σε2 by its estimator
2
.σ̂εamounts to replacing a normal distribution by a Student’s t distribution. When
the sample size T is sufficiently large, the Student’s t distribution tends to a standard
normal distribution. In practice, when the number of observations exceeds 30
2.3 Tests on the Regression Parameters 57
(T > 30), we consider that the Student’s t distribution in Eqs. (2.137) and (2.139)
.
We present the tests on the two parameters .α and .β, even if the tests on .β are in
practice more frequently used.
Test on α
By virtue of (2.137), it is possible to construct a .100(1 − p)% confidence interval
for .α, that is:
1 X̄2
.α̂ ± tp/2 σ̂ε
T + T
(2.140)
2
xt
t=1
where .tp/2 is the value obtained from the Student’s t distribution for the .100 (p/2)%
significance level. This value is called the critical value of the Student’s t law at the
.100(p/2)% significance level. We often use .p = 0.05, which corresponds to a 95%
confidence interval.
Remark 2.7 The significance level corresponds to the probability of rejecting the
null hypothesis when it is true. It is also called the size of the test.
Remark 2.8 The confidence interval (2.140) can also be written as:
⎡ ⎤
⎢ 1 1 ⎥
⎢ − + X̄2
< α < α̂ + tp/2 σ̂ε X̄2 ⎥ = 100(1 −
.P rob
⎣ α̂ t p/2 σ̂ε T
T T +
T ⎦
xt2 2
xt
t=1 t=1
p)%
It is then possible to test the null hypothesis that the coefficient .α is equal to a
given value .α0 :
H0 : α = α0
. (2.141)
H1 : α /= α0
. (2.142)
58 2 The Simple Regression Model
α̂ − α0
. ∼ t (T − 2) (2.143)
1
σ̂ε T + X̄2
T
2 xt
t=1
Test on β
By virtue of (2.139), we can construct a .100(1 − p)% confidence interval for .β, that
is:
T
.β̂ ± tp/2 σ̂ε /
xt2 (2.144)
t=1
As for .α, it is possible to test the null hypothesis that the coefficient .β is equal to
a given value .β0 :
H0 : β = β0
. (2.145)
H1 : β /= β0
. (2.146)
β̂ − β0
. ∼ t (T − 2) (2.147)
T
σ̂ε / xt2
t=1
2.3 Tests on the Regression Parameters 59
H0 : β = 0
. (2.148)
H0 : β /= 0
. (2.149)
This is a test of coefficient significance, also called the t-test. Thus, under the
null hypothesis, the coefficient associated with the variable .Xt is not significant: .Xt
plays no role in determining the dependent variable .Yt . The test is performed by
replacing .β0 by 0 in (2.147). The test statistic is then given by:
β̂
. (2.150)
T
σ̂ε / xt2
t=1
This expression corresponds to the ratio of the estimated coefficient .β̂ on its
estimated standard deviation .σβ̂ , which is noted .tβ̂ . The quantity:
β̂
tβ̂ =
. (2.151)
σβ̂
This t-test is widely used in practice. It can of course be applied in a similar way
to the coefficient .α.
Test on σε2
It is also possible to construct a test on the variance of the error term from the
equation:
σ̂ε2
. (T − 2) ∼ χT2 −2 (2.152)
σε2
or:
! "
(T − 2) σ̂ε2 (T − 2) σ̂ε2
.P rob
2
< σε2 < 2
= 100(1 − p)% (2.154)
χ1−p/2 χp/2
H0 : σε2 = σ02
. (2.155)
2.3 Tests on the Regression Parameters 61
Let us go back to the previous example linking the following two series:
– The series of returns of the Dow Jones Industrial Average index, RDJ
– The series of returns of the Euro Stoxx 50 index, REU RO
REU
. ROt = −0.0116 + 1.1559RDJt (2.156)
We can now ask whether or not the constant and the coefficient of the slope of
the regression line are significantly different from zero. To this end, let us calculate
the t-statistics of these two coefficients:
α̂ β̂
tα̂ =
. and tβ̂ = (2.157)
σα̂ σβ̂
T
RDJt2
V
.(α̂) = 2 t=1
σ̂ε 2 (2.158)
T V (RDJt )
and:
σ̂ε2
V
.(β̂) = (2.159)
T V (RDJt )
1 2
T
σ̂ε2 =
. et (2.160)
T −2
t=1
.
et = REU ROt − REU ROt (2.161)
Table 2.3 presents the calculations needed to obtain the residuals and the sum of
squared residuals.
62 2 The Simple Regression Model
The estimated values .REU ROt of .REU ROt are determined as follows:
It can be seen from Table 2.3 that the sum of the values of .REU ROt is equal
to the sum of the values of .REU ROt , illustrating that the observed series and the
estimated series have the same mean.
We derive the values of the residuals:
137
We find that . et2 = 0.4322. Hence:
t=1
1
.σ̂ε2 = 0.4322 = 0.0032 (2.162)
137 − 2
0.9080
.V
(α̂) = 0.0032 = 2.4828.10−5 (2.163)
1372 × 0.0062
2.3 Tests on the Regression Parameters 63
Hence:
σ̂α̂ = 0.0050
. (2.164)
So finally:
−0.0116
tα̂ =
. = −2.3232 (2.165)
0.0050
V
0.0032
.(β̂) = = 0.0037 (2.166)
137 × 0.0062
and:
1.1559
tβ̂ = √
. = 18.8861 (2.167)
0.0037
Having determined the t-statistics of the coefficients .α̂ and .β̂, given by
Eqs. (2.165) and (2.167), we can perform the significance tests:
. H0 : α = 0 against H1 : α /= 0 (2.168)
and:
H0 : β = 0 against H1 : β /= 0
. (2.169)
The number of observations is .T = 137. Recall that, under the null hypothesis,
the .tα̂ and .tβ̂ statistics follow Student’s t distributions with .(T − 2) degrees of
freedom. Reading the Student’s t table, for a number of degrees of freedom equal to
135 and for a 5% significance level, gives us the critical value: .t0.025 (135) = 1.96.
It can be seen that:
– .|tα̂ | = 2.3232 > 1.96: we reject the null hypothesis that .α = 0. The constant
term is therefore significantly different from zero.
– .tβ̂ = 18.8861 > 1.96: we reject the null hypothesis that .β = 0. The
slope coefficient of the regression is therefore significantly different from zero,
indicating that the variable RDJ contributes to explaining the variable REU RO.
– The 95% confidence interval for .α is given by .α̂ ± t0.025 × σα̂ , or .−0.0116 ±
1.96 × 0.0050, which corresponds to the interval .[−0.0214; −0.0018] . We can
64 2 The Simple Regression Model
see that 0 does not belong to this interval, thus confirming the rejection of the
null hypothesis for the coefficient .α.
– The 95% confidence interval for .β is given by .β̂ ± t0.025 × σβ̂ , or .1.1559 ±
1.96 × 0.0612, which corresponds to the interval .[1.0359; 1.2759] . We can see
that 0 does not belong to this interval, thus confirming the rejection of the null
hypothesis for the coefficient .β.
Once the regression parameters have been estimated and tested for statistical
significance, the goodness of fit remains to be assessed. In other words, it is
necessary to study whether the observed scatter plot is concentrated or, on the
contrary, dispersed around the regression line. For this purpose, the analysis of the
variance (analysis of variance [ANOVA]) of the regression is performed and the
coefficient of determination is calculated.
et = Yt − Ŷt = Yt − α̂ − β̂Xt
. (2.170)
we have:
Yt = Ŷt + et
. (2.171)
This equation can also be expressed in terms of sums of squares by replacing the
variances by their definitions:
T T
2
T
2
. Yt − Ȳ = Ŷt − Ŷ + (et − ē)2 (2.174)
t=1 t=1 t=1
2.4 Analysis of Variance and Coefficient of Determination 65
which can also be written, noting that .ē = 0 and .Y = Ȳ (see Property 2.2):
T T
2
T
2
. Yt − Ȳ = Ŷt − Ȳ + et2 (2.175)
t=1 t=1 t=1
– The explained
variance,
which corresponds to the variance of the estimated
variable . V Ŷt : this is the variance explained by the model, i.e., by the
explanatory variable .Xt .
– The variance of the residuals, called residual variance .(V (et )). This is the
variance that is not explained by the model.
– The sum of the squares of the deviations of the explained variable from its mean,
known as the total sum of squares, noted T SS
– The explained sum of squares, noted ESS
– The residual sum of squares (also called sum of squared residuals), noted RSS
T SS = ESS + RSS
. (2.176)
Example 2.1 Let us take the example of the relationship between the returns of the
Dow Jones Industrial Average index (RDJ ) and the returns of the Euro Stoxx 50
index (REU RO). We have already calculated the residual variance, i.e., .V (et ) =
0.0032. Furthermore, we have .V (REU RO) = 0.0115 and .V REU RO =
0.0083. We can therefore write the ANOVA equation:
We deduce that the part of the variation of REU RO explained by the model is
given by:
V REU RO 0.0083
. = ≃ 0.7254 (2.178)
V (REU RO) 0.0115
The ANOVA equation enables us to judge the quality of a regression. The closer
the explained variance is to the total variance, i.e., the lower the residual variance,
the better the regression. In order to quantify this, we calculate the ratio between
the explained variance and the total variance, which is called the coefficient of
determination denoted as .R 2 (R-squared):
T
2
T
V Ŷt Ŷt − Ȳ et2
t=1 t=1
.R =2
= =1− (2.179)
V (Yt )
T 2
T 2
Yt − Ȳ Yt − Ȳ
t=1 t=1
or:
ESS RSS
R2 =
. =1− (2.180)
T SS T SS
The coefficient of determination thus measures the proportion of the variance of
Yt explained by the model. By definition, we have:
.
0 ≤ R2 ≤ 1
. (2.181)
X
2.4 Analysis of Variance and Coefficient of Determination 67
X
68 2 The Simple Regression Model
Remark 2.9 Since .V Ŷt = V α̂ + β̂Xt = β̂ 2 V (Xt ), the coefficient of
determination can be written:
β̂ 2 V (Xt )
R2 =
. (2.182)
V (Yt )
Example 2.2 Let us go back to our example relating to the regression of REU RO
on RDJ and a constant, i.e.:
REU
. ROt = −0.0116 + 1.1559RDJt (2.184)
(1.1559)2 × 0.0062
R2 =
. ≃ 0.7254 (2.186)
0.0115
or Eq. (2.183):
[0.0072]2
R2 =
. ≃ 0.7254 (2.187)
0.0062 × 0.0115
it can be deduced that the selected model explains about 72.5% of the variation of
REU RO.
Remark 2.10 The coefficient of determination can be used to compare the quality
of models having the same dependent variable. On the other hand, it cannot be used
to compare models with different dependent variables. For example, the coefficient
of determination can be used to compare the models:
where .Zt is an explanatory variable (other than .Xt ) and .ut an error term, but it
cannot be used to compare:
Thus, if we take the models in Eq. (2.188) and if the coefficient of determination
of the model .Yt = a + bZt + ut is higher than that of the model .Yt = α + βXt + εt ,
the model .Yt = a + bZt + ut is preferred to the model .Yt = α + βXt + εt .
On the other hand, if the coefficient of determination associated with the model
.log Yt = a + bXt + ut is greater than that of the model .Yt = α + βXt + εt , we
cannot conclude that the model .Yt = a + bXt + ut is better, because the dependent
variable is not the same in the two models.
The significance test of the coefficient .β, that is, the test of the null hypothesis
H0 : β = 0, can be approached in the ANOVA framework. Recall that we have
.
(Eq. (2.135)):
β̂ − β
. ∼ N (0, 1) (2.190)
T
2
σε / xt
t=1
Furthermore, by virtue of the property that the sum of the squares of the terms
of a normally distributed series follows a Chi-squared distribution, we can write by
squaring the previous expression (see Box 2.2):
2
β̂ − β
. ∼ χ12 (2.191)
T
σε2 / xt2
t=1
T
et2
t=1
. ∼ χT2 −2 (2.192)
σε2
70 2 The Simple Regression Model
T
β̂ 2 xt2
t=1
F =
. ∼ F (1, T − 2) (2.194)
T
et2 /(T − 2)
t=1
T T
2
T
2
. Yt − Ȳ = Ŷt − Ȳ + et2 (2.195)
t=1 t=1 t=1
T
T
T
T
T
. yt2 = ŷt2 + et2 = β̂ 2 xt2 + et2 (2.196)
t=1 t=1 t=1 t=1 t=1
T
T
Thus, we have .β̂ 2 xt2 = ESS and . et2 = RSS, and Eq. (2.194) becomes:
t=1 t=1
ESS
F =
. ∼ F (1, T − 2) (2.197)
RSS/(T − 2)
This statistic can be used to perform a test of significance of the coefficient .β:
R2
F =
. ∼ F (1, T − 2) (2.198)
1 − R 2 /(T − 2)
Let us go back to our example linking the returns of the European stock index
(REU RO) and the returns of the US stock index (RDJ ). The purpose is to apply
the tests of significance of .β and of the R-squared based on Fisher statistics.
Table 2.4 presents the calculations required to determine the explained sum of
squares (ESS) and the sum of squared residuals (RSS), the latter having already
been calculated.
The explained sum of squares is equal to .ESS = 1.1418 and the sum of squared
residuals is given by .RSS = 0.4322. The application of the formula (2.197) leads
to the following result:
1.1418
F =
. ≃ 356.68 (2.199)
0.4322/135
At the 5% significance level, the value of the Fisher distribution .F (1135) read
from the table is .3.842. Thus, we have .F ≃ 356.68 > 3.842, which means that we
reject the null hypothesis that .β = 0. The variable RDJ contributes significantly to
explaining REU RO, which of course confirms the results previously obtained.
It is also possible to calculate the F statistic from expression (2.198). We have
previously shown that .R 2 ≃ 0.7254. We thus have:
0.7254
F =
. ≃ 356.68 (2.200)
(1 − 0.7254) /135
We obviously obtain the same value as with Eq. (2.197). Comparing, as before,
this value to the critical value at the 5% significance level, i.e., .F (1135) = 3.842,
we have .356.68 > 3.842. We therefore reject the null hypothesis of nonsignificance
of the coefficient of determination. The coefficient of determination is significant,
which is equivalent to concluding that the variable RDJ matters in the explanation
of REU RO, since our model contains only one explanatory variable.
2.5 Prediction
Once the model has been estimated by the OLS method, it is possible to predict
the dependent variable. Suppose that the following model has been estimated for
.t = 1, . . . , T :
Yt = α + βXt + εt
. (2.201)
that is:
for .t = 1, . . . , T .
We seek to determine the forecast of the dependent variable for a horizon h,
i.e., .ŶT +h . Assuming that the relationship generating the explained variable remains
identical and the value of the explanatory variable is known in .T + h, we have:
In order to show that the forecast given by Eq. (2.203) is unbiased, let us calculate
the expectation of the expression (2.205):
.E (eT +h ) = E εT +h − α̂ − α − β̂ − β XT +h (2.206)
Since .α̂ and .β̂ are unbiased estimators of .α and .β and given that .E (εT +h ) = 0,
we have:
E (eT +h ) = 0
. (2.207)
The forecast given by Eq. (2.203) is therefore unbiased. The prediction interval
is given by:
. α̂ + β̂XT +h ± σeT +h (2.208)
where .σeT +h designates the standard deviation of the forecast error. After calculating
this standard deviation (see Appendix 2.1.5), we can can write the .100(1 − p)%
prediction interval7 for .YT +h :
1 XT +h − X̄
2
. α̂ + β̂XT +h ± tp/2 σ̂ε 1 + + (2.209)
T
T
xt2
t=1
Remark 2.12 The purpose may be not to predict the precise value of .YT +h , but its
average value instead. We then consider:
E (YT +h ) = α + βXT +h
. (2.210)
The .100(1 − p)% prediction interval for .E (YT +h ) is therefore given by:
1 2
+ XT +h − X̄
. (α + βXT +h ) ± tp/2 σ̂ε (2.213)
T
T
xt2
t=1
Example 2.3 Consider our example relating the returns of the European stock
index (REU RO) and the returns of the US stock market index (RDJ ) over the
period from the second quarter of 1987 to the second quarter of 2021. For this
period, we estimated the following relationship:
REU
. ROt = −0.0116 + 1.1559RDJt (2.214)
Assume that the returns on the US stock index increase by 2% in the third quarter
of 2021 compared to the previous quarter. Given that .RDJ2021.2 = 0.0451, we
deduce: .RDJ2021.3 = 0.0451 × 1.02 = 0.0460. Therefore, we can write:
REU
. RO2021.3 = −0.0116 + 1.1559 × 0.0460 = 0.0416 (2.215)
137 2
We know that .RDJ = 0.0196 and that . RDJt − RDJ = 0.8546.
√ t=1
Moreover, we have already calculated .σ̂ε = 0.0032. Knowing that .t0.025 (135) =
1.96, we have:
√
. (−0.0116 + 1.1559 × 0.0460) ± 1.96 × 0.0032
1 (0.0460 − 0.0196)2
× 1+ + (2.217)
137 0.8546
which corresponds to the interval .[−0.0698; 0.1529]. If the value taken by REU RO
in the third quarter of 2021 does not lie within this interval, the forecast is incorrect.
This may be the case, for example, if the estimated model, valid until the second
quarter of 2021, is no longer valid for the third quarter of the same year. In other
words, such a situation may arise if the structure of the model has changed.
Yt = α + βXt + εt
. (2.218)
which is linear with respect to the parameters .α and .β, but also with respect to
the variables .Yt and .Xt . We now propose to briefly study models frequently used
in economics, which can be nonlinear with respect to the variables .Yt and .Xt , but
linear with respect to the parameters, or can become so after certain appropriate
transformations of the variables. As an example, the model:
Zt = α + βXt + εt
. (2.220)
The log-linear model, also known as log-log model or double-log model, is given
by:
This model is linear in the parameters .α0 and .β. Furthermore, let us posit:
which is a linear model in the variables .Yt∗ and .Xt∗ and in the parameters .α0 and .β.
It is then possible to apply to this model the methodology presented in this chapter
in order to estimate the parameters .α0 and .β by OLS.
One of the interests of the log-log model is that the coefficient .β measures the
elasticity of .Yt with respect to .Xt , i.e., the percentage change in .Yt for a given
percentage of variation in .Xt . It is thus a constant elasticity model.
For example, if .Yt denotes the quantity of a given good and .Xt the unit price of
this good, the coefficient .β represents the price elasticity of demand. Similarly, if
.Yt designates household consumption and .Xt the income of these same households,
Example 2.4 Let us take the example of the consumption and gross disposable
income series of French households already studied in Chap. 1 and consider the
following model:
where .Ct denotes consumption and .Yt income. The data are annual and the study
period runs from 1990 to 2019. In order to estimate this model, we simply take the
logarithm of the raw consumption and income data and apply the OLS method to
the transformed model. The estimation leads to the following results:
log
. Ct = 1.5552 + 0.8796 log Yt (2.226)
(4.67) (36.87)
2.6 Some Extensions of the Simple Regression Model 77
Remark 2.13 The log-log model can be understood from the Box-Cox transfor-
mation (see Box and Cox, 1964). For a variable .Yt , this transformation is given
by:
%
Ytλ −1
.Yt
(λ)
= λ if λ /= 0 (2.227)
log Yt if λ = 0
(λ)
where .Yt is the transformed variable. The Box-Cox transformation thus depends
on a single parameter, noted .λ.
(λ ) (λ )
Let .Yt Y be the transformation of the variable .Yt and let .Xt X be the
transformation of the variable .Xt :
% λY
Yt −1
.Yt
(λY )
= λYif λY =
/ 0 (2.228)
log Yt if λY = 0
% λ
Xt X −1
(λX )
.Xt = λX if λX /= 0 (2.229)
log Xt if λX = 0
This is a linear model with respect to the parameters .α and .β and with respect to
the variables .log Yt and .Xt . The special feature of this model lies in the fact that only
the dependent variable is in logarithms. After transforming the endogenous variable
into a logarithm, it is possible to apply to this model the methodology presented in
this chapter to estimate the parameters .α and .β by OLS.
In the semi-log model, the coefficient .β measures the rate of change of .Yt
relative to the variation of .Xt ; this rate of change being constant. In other words,
the coefficient .β is equal to the ratio between the relative variation of .Yt and the
absolute variation of .Xt . .β is the semielasticity of .Yt with respect to .Xt .
If the explanatory variable is time, the model is written:
. log Yt = α + βt + εt (2.231)
78 2 The Simple Regression Model
Yt = exp (α + βt)
. (2.232)
This model describes the evolution of the variable .Yt , having a constant growth
rate if .β > 0, or constant decrease if .β < 0. Let us explain this. The model (2.232)
describes an evolution in continuous time and can be written:
where .Y0 = exp(α) is the value of .Yt at date .t = 0. The coefficient .β is thus equal
to:
1 dYt
β=
. (2.234)
Yt dt
Yt = Y0 (1 + g)t
. (2.235)
where g is the growth rate of Y . Transforming this expression into logarithmic terms
gives:
By positing .log Y0 = α and .log (1 + g) = β and adding the error term, we find
model (2.231). The relationship:
. log (1 + g) = β (2.237)
Example 2.5 Let us take the example of the French household consumption series
(Ct ) over the period 1990–2019 at annual frequency and consider the following
.
model:
. log Ct = α + βt + εt (2.238)
2.6 Some Extensions of the Simple Regression Model 79
where t denotes time, i.e., .t = 0, 1, 2, . . . , 29. The OLS estimation of this model
leads to the following results:
log
. Ct = 13.6194 + 0.0140t (2.239)
(1292.53) (22.49)
From this estimation, we deduce that .log 1 + ĝ = 0.0140 where .ĝ is the
estimated growth rate. Hence, .ĝ = 0.0141. Over the period 1990–2019, French
household consumption increased annually at a rate of 1.41%.
Remark 2.14 The semi-log model can be understood from the Box-Cox transfor-
mation, noting that .λY = 0 and .λX = 1.
This model is linear with respect to the parameters .α and .β. Such a model can
be estimated by OLS following the methodology described in this chapter and after
transforming the variable .Xt into its inverse.
According to this model, when the variable .Xt tends to infinity, the term .β X1t
tends to zero and .α is therefore the asymptotic limit of .Yt when .Xt tends to infinity.
In addition, the slope of the model (2.240) is given by:
dYt 1
. = −β (2.241)
dXt Xt2
Therefore, if .β > 0, the slope is always negative, and if .β < 0, the slope is
always positive.
This type of model, represented in Fig. 2.11 for .β > 0, can be illustrated by the
Phillips curve. This curve originally related the growth rate of nominal wages to the
unemployment rate. It was subsequently transformed into a relationship between
the inflation rate and the unemployment rate. This Phillips curve can be estimated
by regressing the inflation rate on the inverse of the unemployment rate, with the
inflation rate tending asymptotically towards the estimated value of .α.
80 2 The Simple Regression Model
β>0
Example 2.6 For example, suppose that, for a given country, the regression of
the inflation rate .(πt ) on the inverse of the unemployment rate .(ut ) leads to the
following results:
1
t = −2.3030 + 20.0103
.π (2.242)
ut
These results show that even if the unemployment rate rises indefinitely, the
largest change in prices will be a drop in the inflation rate of about 2.30 points.
Remark 2.15 The reciprocal model corresponds to the case where .λY = 1 and
λX = −1 in the Box-Cox transformation.
.
exp(α)
0,135 exp(α)
As shown in Fig. 2.12, we see that, initially, .Yt grows at an increasing rate (the
curve is convex), then, after the inflection point, the variable grows at a decreasing
rate.
Remark 2.16 The log-reciprocal model corresponds to the case where .λY = 0 and
λX = −1 in the Box-Cox transformation.
.
Conclusion
This chapter has presented the basic model of econometrics, namely, the simple
regression model. In this model, only one explanatory variable is introduced. In
practice, however, it is rare that a single variable can explain the behavior of the
dependent variable. It is possible, then, to refine the study of the dynamics of the
dependent variable by adding explanatory variables to the model. This is known as
a multiple regression model. This model is the subject of the next chapter.
82 2 The Simple Regression Model
1 2
T
σ̂ε2 = T −2 et
t=1
β̂
t-Statistic tβ̂ = &
σβ̂
T
2
T
V Ŷt Ŷt −Ȳ et2
Coefficient of determination R2 = V (Yt ) = t=1
T
=1−
T
t=1
,
2 2
(Yt −Ȳ ) (Yt −Ȳ )
t=1 t=1
0 ≤ R2 ≤ 1
Further Reading
Developments on the linear regression model and the ordinary least squares method
can be found in any econometrics textbook (see the references cited at the end of the
book), including Johnston and Dinardo (1996), Davidson and MacKinnon (1993),
or Greene (2020). For a more mathematical presentation, see, for example, Florens
et al. (2007).
For further developments related to tests and laws, readers may refer to Lehnan
(1959), Rao (1965), Kmenta (1971), Mood et al. (1974), or Hurlin and Mignon
(2022).
For extensions of the linear regression model, interested readers can refer to
Davidson and MacKinnon (1993) or Gujarati et al. (2017). Nonlinear regression
models are discussed in Goldfeld and Quandt (1972), Gallant (1987), Pindyck and
Rubinfeld (1991), Davidson and MacKinnon (1993), or Gujarati et al. (2017).
Appendix 2.1: Demonstrations 83
In order to demonstrate the linearity of the OLS estimators and in particular of .β̂,
let us consider the centered variables:
xt = Xt − X̄
. (2.248)
and
. yt = Yt − Ȳ (2.249)
T
T
T
T
xt yt xt (Yt − Ȳ ) xt Yt xt
t=1 t=1 t=1 t=1
. β̂ = = = − Ȳ × (2.250)
T
T T T
xt2 xt2 xt2 xt2
t=1 t=1 t=1 t=1
Thus:
T
T
T
. xt = Xt − X̄ = Xt − T X̄ = 0
t=1 t=1 t=1
Hence:8
T
xt Yt
t=1
T
. β̂ = = wt Yt (2.251)
T
xt2 t=1
t=1
with:
xt
wt =
. (2.252)
T
xt2
t=1
The expression (2.251) reflects the fact that .β̂ is a linear estimator of .β: .β̂
appears as a linear function of the dependent variable .Yt . We can also highlight
Property 2.12 By virtue of the definition of .wt (Eq. (2.252)), we can write:
T
xt
T
t=1
. wt = =0 (2.253)
T
t=1 xt2
t=1
In addition:
T
T
T
T
T
. wt xt = wt Xt − X̄ = wt Xt − X̄ wt = wt Xt (2.254)
t=1 t=1 t=1 t=1 t=1
And:
T
xt2
T
T
xt t=1
. wt xt = xt = =1 (2.255)
T T
t=1 t=1 xt2 xt2
t=1 t=1
So:
T
T
. wt xt = wt Xt = 1 (2.256)
t=1 t=1
We also have:
⎛ ⎞2
T T ⎜ ⎟ T
⎜ xt ⎟ xt2 1
. wt2 = ⎜ T ⎟ = 2 = (2.257)
⎝ 2⎠ 2
T T
t=1 t=1 xt t=1
xt xt2
t=1 t=1 t=1
The linearity of the estimator .α̂ can also be demonstrated by noting that:
1
T T
. α̂ = Ȳ − β̂ X̄ = Yt − X̄ wt Yt (2.258)
T
t=1 t=1
Appendix 2.1: Demonstrations 85
T
1
. α̂ = − X̄wt Yt (2.259)
T
t=1
which shows that .α̂ is a linear function of .Yt : .α̂ is a linear estimator of .α.
T
T
T
T
T
. β̂ = wt Yt = wt (α + βXt + εt ) = α wt + β wt Xt + wt εt
t=1 t=1 t=1 t=1 t=1
(2.260)
T
. β̂ = β + wt εt (2.261)
t=1
T
T
.E β̂ = E β+ wt εt =β +E wt εt (2.262)
t=1 t=1
T
E β̂ = β +
. wt E (εt ) (2.263)
t=1
In order to show that .α̂ is also an unbiased estimator of .α, let us start again from
the linearity property:
T
1
. α̂ = − X̄wt Yt (2.265)
T
t=1
that is:
T
1
. α̂ = − X̄wt (α + βXt + εt ) (2.266)
T
t=1
T
T
T T
Xt 1
. α̂ = α − X̄α wt + β − X̄β wt Xt + − X̄wt εt (2.267)
T T
t=1 t=1 t=1 t=1
T
T
Given that, in accordance with Property 2.12, . wt = 0 and . wt Xt = 1, we
t=1 t=1
deduce:
T
1
. α̂ = α + − X̄wt εt (2.268)
T
t=1
T
1
E α̂ = E α +
. − X̄wt εt (2.269)
T
t=1
E α̂ = α
. (2.271)
Appendix 2.1: Demonstrations 87
Let us start by showing that the OLS estimators .α̂ and .β̂ are consistent estimators,
that is, their variance tends to zero when T tends to infinity, i.e.:
2
T
T
V (β̂) = E
. wt εt =E wt2 εt2 + 2 wt wt ' εt εt ' (2.274)
t=1 t=1 t<t '
T
V (β̂) =
. wt2 E εt2 + 2 wt wt ' E (εt εt ' ) (2.275)
t=1 t<t '
We know that:
E εt2 = σε2
. (2.276)
We deduce:
T
V (β̂) = σε2
. wt2 (2.278)
t=1
σε2
V (β̂) =
. (2.279)
T
xt2
t=1
88 2 The Simple Regression Model
T
T 2
Given that . xt2 = Xt − X̄ = T V (Xt ), we deduce the following
t=1 t=1
relationship:
σε2
V (β̂) =
. (2.280)
T V (Xt )
T
2
1
V (α̂) = E
. − X̄wt εt (2.282)
T
t=1
T
2 1
1 1
V (α̂) = E
. − X̄wt εt2 +2 − X̄wt − X̄wt ' εt εt '
T T T
t=1 t<t '
(2.283)
Or:
T
2 1
1 1
V (α̂) =
. − X̄wt E εt2 + 2 − X̄wt − X̄wt ' E (εt εt ' )
T '
T T
t=1 t<t
(2.284)
Hence:
T
2 T
1 1 1
V (α̂) = σε2
. − X̄wt = σε2 − 2 X̄w t + X̄ 2 2
w t (2.285)
T T2 T
t=1 t=1
1 T
1
T
V (α̂) = σε2
. + X̄2 wt2 − 2 X̄ wt (2.286)
T T
t=1 t=1
Appendix 2.1: Demonstrations 89
T
So, using (2.257) and noting that . xt2 = T V (Xt ):
t=1
⎛ ⎞
T
T
⎜1 xt2 + T X̄2 Xt2
⎜ X̄2 ⎟ ⎟ t=1 2 t=1
.V (α̂) = σε2 ⎜ + ⎟ = σε2 = σε 2 (2.287)
⎝T
T ⎠
T T V (Xt )
xt2 T xt2
t=1 t=1
T
β∗ =
. γt Y t (2.288)
t=1
where the .γt are weighting coefficients that must be determined. Given that .Yt =
α + βXt + εt , we have:
T
β∗ =
. γt (α + βXt + εt ) (2.289)
t=1
that is:
T
T
T
β∗ = α
. γt + β γt Xt + γt εt (2.290)
t=1 t=1 t=1
T
T
T
E β∗ = E α
. γt + β γt Xt + γt εt (2.291)
t=1 t=1 t=1
By distributing the expectation operator and using the fact that .E (εt ) = 0, we
get:
T
T
E β∗ = α
. γt + β γt Xt (2.292)
t=1 t=1
90 2 The Simple Regression Model
T
. γt = 0 (2.293)
t=1
and
T
. γt Xt = 1 (2.294)
t=1
T
. β∗ = β + γt εt (2.295)
t=1
2
T
∗ ∗ 2
V β
. =E β −β =E γt εt (2.296)
t=1
T
V (β ∗ ) = σε2
. γt2 (2.297)
t=1
We must therefore compare this variance with that of the OLS estimator, i.e.,
V (β̂) given by:
.
T
V (β̂) = σε2
. wt2 (2.298)
t=1
γt = wt + (γt − wt )
. (2.299)
We have:
T
T
T
T
T
. γt2 = (wt + (γt − wt ))2 = wt2 + (γt − wt )2 + 2 wt (γt − wt )
t=1 t=1 t=1 t=1 t=1
(2.300)
Appendix 2.1: Demonstrations 91
According to (2.257):
T
1
. wt2 = (2.301)
T
t=1 xt2
t=1
T
T
and, in line with (2.252) and using the fact that . γt Xt = γt xt = 1:
t=1 t=1
T
xt γt
T
t=1 1
. wt γt = = (2.302)
T
T
t=1 xt2 xt2
t=1 t=1
T
T
T
T
2
. wt (γt − wt ) = γt2 − wt2 − (γt − wt )2 (2.303)
t=1 t=1 t=1 t=1
T
T
= −2 wt2 + 2 wt γt
t=1 t=1
Hence:
T
1 1
. wt (γt − wt ) = − + =0 (2.304)
T
T
t=1 xt2 xt2
t=1 t=1
We have:
T
T
T
T
V (β ∗ ) = σε2
. γt2 = σε2 (wt + (γt − wt ))2 = σε2 wt2 + (γt − wt )2
t=1 t=1 t=1 t=1
(2.305)
T
.V (β ∗ ) = V (β̂) + σε2 (γt − wt )2 (2.306)
t=1
92 2 The Simple Regression Model
T
Since . (γt − wt )2 ≥ 0, we have:
t=1
V (β ∗ ) ≥ V (β̂)
. (2.307)
It follows that, among the class of unbiased estimators, the OLS estimator .β̂ is
the one with the lowest variance.
By applying similar reasoning, it is possible to show that the OLS estimator .α̂
also satisfies the same property.
We seek to determine an estimator .σ̂ε2 of the variance of the error term .σε2 . By
definition, the residuals are given by:
We also have:
1 1
T T
. Ȳ = Yt = (α + βXt + εt ) = α + β X̄ + ε̄ (2.310)
T T
t=1 t=1
and:
hence:
Given that:
we deduce:
et = (εt − ε̄) − β̂ − β xt
. (2.314)
Appendix 2.1: Demonstrations 93
which gives:
T
T 2
T
T
. et2 = (εt − ε̄)2 + β̂ − β xt2 − 2 β̂ − β (εt − ε̄) xt (2.316)
t=1 t=1 t=1 t=1
T
T
. E (εt − ε̄)2 =E εt2 − T ε̄2 (2.317)
t=1 t=1
2
T 1
T
= E εt2 − εt
T
t=1 t=1
⎛ ⎞2
T 1 T
= E εt2 − ⎝ E εt2 + 2 E (εt εt ' )⎠
T '
t=1 t=1 t/=t
T
E
. (εt − ε̄)2 = (T − 1) σε2 (2.318)
t=1
2
T
T 2
E
. β̂ − β xt2 = xt2 E β̂ − β (2.319)
t=1 t=1
We know that:
2 σε2
E β̂ − β = V (β̂) =
. (2.320)
T
xt2
t=1
94 2 The Simple Regression Model
Hence:
2
T
. E β̂ − β xt2 = σε2 (2.321)
t=1
In addition:
T
T
T
T
. β̂ − β (εt − ε̄) xt = wt εt εt xt − ε̄ xt (2.322)
t=1 t=1 t=1 t=1
T
because .β̂ − β = wt εt according to (2.261). Furthermore, by virtue of (2.252)
t=1
T
and since . xt = 0, we have:
t=1
2
T
εt xt
T T
xt εt
T
t=1
. β̂ − β (εt − ε̄) xt = εt xt = (2.323)
T
2
T
t=1 t=1 xt t=1 xt2
t=1 t=1
T
T
T
Noting that .E εt2 xt2 = xt2 E εt2 = σε2 xt2 , we deduce:
t=1 t=1 t=1
T
E
. β̂ − β (εt − ε̄) xt = σε2 (2.325)
t=1
Appendix 2.1: Demonstrations 95
If we take Eq. (2.316), using relations (2.318), (2.321), and (2.325), we obtain:
T
E
. et2 = (T − 1) σε2 + σε2 − 2σε2 = (T − 2) σε2 (2.326)
t=1
We finally deduce the estimator .σ̂ε2 of the variance of the error term:
1 2
T
. σ̂ε2 = et (2.327)
T −2
t=1
In order to determine a prediction interval, the variance of the forecast error must
be calculated. We have:
2
. V (eT +h ) = E εT +h − α̂ − α − β̂ − β XT +h (2.328)
# $2
= E εT +h − α̂ − α + β̂ − β XT +h
# $
V (eT +h ) = E (εT +h )2 − 2E εT +h α̂ − α + β̂ − β XT +h
. (2.329)
2
+E α̂ − α + β̂ − β XT +h
2
2
Knowing that .E (εT +h XT +h ) = 0, .E α̂ − α = V α̂ , .E β̂ − β = V β̂
and .E α̂ − α β̂ − β = E α̂ − E α̂ β̂ − E β̂ = cov α̂, β̂ , we
have:
.V (eT +h ) = V (εT +h ) + V α̂ + XT +h V β̂ + 2XT +h cov α̂, β̂
2
(2.330)
(2.331)
96 2 The Simple Regression Model
that is:
.cov α̂, β̂ = E ε̄ − β̂ − β X̄ β̂ − β (2.332)
Since .E ε̄ β̂ − β = 0, we have:
2
. cov α̂, β̂ = −X̄E β̂ − β (2.333)
Hence:
σ2
cov α̂, β̂ = −X̄ ε
. (2.335)
T
xt2
t=1
or:
⎛ ⎞ ⎛ ⎞
⎜ XT +h − X̄
2⎟ ⎜ XT +h − X̄
2⎟
⎜ 1 ⎟ ⎜ 1 ⎟
V (eT +h ) =
. σε2 ⎜1 + + ⎟ = σε2 ⎜1 + + T ⎟
⎝ T
T ⎠ ⎝ T 2⎠
xt2 Xt − X̄
t=1 t=1
(2.338)
Appendix 2.2: Normal Distribution and Normality Test 97
The relationship (2.338) shows that the variance of the forecast error is an
increasing function of the squared deviation between the value of the explanatory
variable in .T + h and its mean. In other words, the larger the variance, i.e., the
more the value of X in .T + h deviates from the mean, the higher the variance of
the forecast error. The forecast error being a linear function of variables following
normal distributions (relation (2.205)), it is normally distributed:
eT +h
. ∼ N(0, 1) (2.339)
2
σε
X +h −X̄ )
1 +
1
T + ( T
T
xt2
t=1
that is:
YT +h − ŶT +h
. ∼ t (T − 2) (2.341)
2
σ̂ε 1 + T +
1 ( XT +h −X̄ )
T
xt2
t=1
with .ŶT +h = α̂ + β̂XT +h . We deduce a .100(1 − p)% prediction interval for .YT +h :
1 XT +h − X̄
2
. α̂ + β̂XT +h ± tp/2 σ̂ε 1 + + (2.342)
T
T
xt2
t=1
The normal distribution is the most widespread statistical distribution. The density
function of a random variable x following a general normal distribution is given by:
2
1 1 x−m
p(x) = √
. exp − (2.343)
2π σ 2 σ
98 2 The Simple Regression Model
where exp is the exponential, m is the mean of the variable x, and .σ its standard
deviation. We note:
.x ∼ N m, σ
2
(2.344)
z ∼ N (0, 1)
. (2.346)
T 4
1
T Xt − X̄
t=1 μ4
K=
.
2
= (2.348)
T 2 μ22
1
T Xt − X̄
t=1
where .X̄ is the mean of the series .Xt , .t = 1, . . . , T , and the .μi are the centered
moments of order i. For a normal distribution, we have:
μ3 = 0
. (2.349)
μ4 = 3μ22
Appendix 2.2: Normal Distribution and Normality Test 99
that is:
S=0
. (2.350)
K=3
Under the null hypothesis of normality, the test statistic follows a Chi-squared
distribution with 2 degrees of freedom. Therefore, if the calculated value of the J B
test statistic is lower than the theoretical value of the Chi-squared distribution with
2 degrees of freedom, the null hypothesis of normality is not rejected. On the other
hand, if J B is greater than the critical value, the null hypothesis of normality is
rejected.
This Jarque and Bera test is used to test the normality of the residuals.
.Yt is a linear function of the error term. Consequently, .Yt is also normally
E(Yt ) = α + βXt
. (2.353)
Appendix 2.3: The Maximum Likelihood Method 101
and variance:
V (Yt ) = σε2
. (2.354)
We note .f Y1 , Y2 , . . . , YT α + βXt , σε2 the joint probability density func-
tion of .(Y1 , Y2 , . . . , YT ). Since the .Yt are supposed to be independent, we can write:
f Y1 , Y2 , . . . , YT α + βXt , σε2 = f Y1 α + βXt , σε2 × f Y2 α + βXt , σε2
.
× . . . × f YT α + βXt , σε2 (2.355)
with:
2
1 1 Yt − α − βXt
.f (Yt ) = √ exp − (2.356)
2π σε 2 σε
Assuming that the .(Y1 , Y2 , . . . , YT ) are known, expression (2.357) is called the
likelihood function. It is noted:
T
1 1 Yt − α − βXt 2
L α, β, σε2 = √ T
. exp − (2.358)
2 σε
2π σεT t=1
T
T 1 Yt − α − βXt 2
. ln L = −T ln σε − ln (2π) − (2.359)
2 2 σε
t=1
102 2 The Simple Regression Model
T
T T 1 Yt − α − βXt 2
. ln L = − ln σε2 − ln (2π ) − (2.360)
2 2 2 σε
t=1
T
∂ ln L Yt − α − βXt
. =− (−1) (2.361)
∂α σε2
t=1
T
∂ ln L Yt − α − βXt
. =− (−Xt ) (2.362)
∂β σε2
t=1
T
∂ ln L T 1 Yt − α − βXt 2
. =− 2 + (2.363)
∂σε2 2σε 2 σε2
t=1
T
. Yt − α̂ML − β̂ML Xt = 0 (2.364)
t=1
T
. Yt − α̂ML − β̂ML Xt Xt = 0 (2.365)
t=1
T
2
1
. −T + 2
Yt − α̂ML − β̂ML Xt = 0 (2.366)
σ̂ε,ML t=1
T
T
. Yt = T α̂ML + β̂ML Xt (2.367)
t=1 t=1
T
T
T
. Xt Yt = α̂ML Xt + β̂ML Xt2 (2.368)
t=1 t=1 t=1
Appendix 2.3: The Maximum Likelihood Method 103
Equations (2.367) and (2.368) correspond exactly to Eqs. (2.44) and (2.48). The
maximum likelihood estimators of the coefficients .α and .β are therefore identical to
the OLS estimators.
Let us now determine the estimator of the variance of the error term:
1 2
T
.
2
σ̂ε,ML = Yt − α̂ML − β̂ML Xt (2.369)
T
t=1
1 2
T
= Yt − α̂ − β̂Xt
T
t=1
1 2
T
= et
T
t=1
1 2
T
. σ̂ε2 = et (2.370)
T −2
t=1
Since the OLS estimator is an unbiased estimator, it follows that the maximum
likelihood estimator is a biased estimator. This is a consistent estimator and the bias
decreases as the sample size increases.
The Multiple Regression Model
3
The simple regression model studied in Chap. 2 had only one explanatory variable.
In practice, however, it is common for an (explained) variable to depend on several
explanatory variables.
As in the previous chapter, let us take a few examples to illustrate the questions to
be answered in this chapter. Does a family’s consumption expenditure depend more
on its income or its size? To what extent is the quantity demanded of a particular
good a function of the price of that good, the price of other goods, and consumer
income? Do wages depend more on the level of education or work experience?
In these different cases where several explanatory variables come into play, we
speak of a multiple regression model. This chapter proposes an in-depth study of
this model.1
where .t = 1, . . . , T , .Yt is the explained (or dependent) variable, .X1t , X2t , . . . , Xkt
are the k explanatory variables, and .εt is the error term.
This model thus corresponds to an extension of the simple regression model to
the case of k explanatory variables with .k > 1.
The coefficients .β1 , . . . , βk are called partial regression coefficients or partial
slope coefficients. .β1 measures the change in the mean value of Y , with the value
of the other explanatory variables remaining constant (i.e., all other things being
1 This chapter calls upon various notions of matrix algebra. In Appendix 3.1, readers will find the
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 105
V. Mignon, Principles of Econometrics, Classroom Companion: Economics,
https://doi.org/10.1007/978-3-031-52535-3_3
106 3 The Multiple Regression Model
equal). The regression coefficient .β1 therefore measures the effect of a 1-point
variation in .X1t on the mean value of .Yt , this effect being net of any potential
influence of the other explanatory variables on the mean value of .Yt . The same
type of reasoning obviously applies to the other regression coefficients.
We can write Eq. (3.1) for each value of .t, t = 1, . . . , T :
. Y = X β + ε (3.4)
(T ,1) (T ,k+1)(k+1,1) (T ,1)
where:
2 Matrices and vectors are written in bold characters. This notation convention will be used
throughout the book.
3.2 The OLS Estimators 107
The first column of the matrix .X contains only 1s in order to take into account
the constant .α. This allows us to keep a compact matrix form, making it easier to
present the developments linked to the multiple regression model.
As in the case of the simple regression model, the aim is to obtain an estimate
of the parameter vector .β. To this end, the ordinary least squares (OLS) method is
applied.
Like the simple regression model, the multiple regression model is based on a
number of assumptions, which we present below.
nonrandom. This assumption may seem strong,3 but it amounts to assuming that the
explanatory variables are controlled, which considerably simplifies the derivation
of some fundamental statistical results. It is thus a purely technical assumption,
allowing us to consider each vector of the matrix .X as a known constant for the
probability distribution of .Yt .
It is possible to relax this assumption about the nonrandom character of the
matrix .X and assume that it is independent of each value of the error term. Such
an assumption can be written as:
⎛ ⎞
E (ε1 |X )
⎜ E (ε2 |X ) ⎟
⎜ ⎟
.E (εt |X ) = ⎜ .. ⎟=0 (3.5)
⎝ . ⎠
E (εT |X )
Rank(X) = k + 1
. (3.6)
This assumption states that the explanatory variables are linearly independent.4
Such an assumption of independence among the explanatory variables is necessary
for estimating the parameter vector .β.
If the number of observations T is less than .k + 1, then the matrix .X cannot be
of full rank. For this reason, we assume that the number of observations is greater
than the number of explanatory variables, i.e.:
T >k+1
. (3.7)
3 In the sense that the matrix of explanatory variables is assumed to be unchanged whatever the
sample of observations.
4 We will see later that such an assumption implies that there is no collinearity between the
explanatory variables.
3.2 The OLS Estimators 109
negative ones. We deduce that the mathematical expectation of the errors is zero:
⎛ ⎞
E (ε1 )
⎜ E (ε2 ) ⎟
⎜ ⎟
.E (ε) = ⎜ . ⎟=0 (3.8)
⎝ .. ⎠
E (εT )
Hence:
.E (Y ) = Xβ (3.9)
where .I denotes the identity matrix and .σε2 the variance of the error term.
To understand expression (3.10), let us write the variance-covariance matrix of
the error term:
⎛ ⎞
V (ε1 ) Cov(ε1 , ε2 ) · · · Cov(ε1 , εT )
' ⎜ Cov(ε2 , ε1 ) V (ε2 ) · · · Cov(ε2 , εT )⎟
⎜ ⎟
.E εε =⎜ .. .. .. .. ⎟ (3.11)
⎝ . . . . ⎠
Cov(εT , ε1 ) Cov(εT , ε2 ) · · · V (εT )
⎛ ⎞
E(ε12 ) E(ε1 ε2 ) · · · E(ε1 εT )
⎜ E(ε2 ε1 ) E(ε2 ) · · · E(ε2 εT )⎟
⎜ 2 ⎟
=⎜ .. .. .. .. ⎟
⎝ . . . . ⎠
E(εT ε1 ) E(εT ε2 ) · · · E(εT2 )
Under the assumption of no autocorrelation, all terms off the diagonal are zero.
In accordance with the assumption of homoskedasticity, the terms on the diagonal
are constant and equal to .σε2 . We therefore have:
⎛ 2 ⎞ ⎛ ⎞
σε 0 ··· 0 1 0 ··· 0
' ⎜ σε2 ··· 0⎟ ⎜0 1 ··· 0⎟
⎜0 ⎟ 2⎜ ⎟
.E εε =⎜ . .. .. .. ⎟ = σε ⎜ .. .. . . .. ⎟ = σε I
2
(3.12)
⎝ .. . . . ⎠ ⎝. . . .⎠
0 0 · · · σε2 0 0 ··· 1
.ε ∼ N 0, σε2 I (3.13)
As in the case of the simple regression model, the normality assumption is not
necessary to establish the results of the multiple regression model. However, it
allows us to derive statistical results and construct test statistics (see below).
. Y = Xβ + ε (3.14)
Our objective is to estimate the vector .β of parameters by the OLS method. This
vector .β̂ of estimated parameters is given by:
−1
. β̂ = X' X X' Y (3.15)
e = Y − X β̂
. (3.16)
T
Min
. εt2 ≡ Min e' e (3.17)
t=1
'
e' e = Y − X β̂
. Y − Xβ̂ (3.18)
' '
= Y ' Y − β̂ X' Y − Y ' X β̂ + β̂ X' Xβ̂
'
. β̂ X' Y and .Y ' X β̂ are scalars. Knowing that a scalar is equal to its transpose:
we deduce:
' '
e' e = Y ' Y − 2β̂ X' Y + β̂ X' Xβ̂
. (3.20)
' '
∂ e' e ∂ β̂ X' Y ∂ β̂ X' Xβ̂
. = −2 + (3.22)
∂ β̂ ∂ β̂ ∂ β̂
We have:
'
∂ β̂ X' Y
. = X' Y (3.23)
∂ β̂
∂ A' B ∂ B'A
because . (∂B ) = (∂B ) = A where .A and .B denote vectors (of dimension .(k +
1, 1) in our case).
Moreover:
'
∂ β̂ X' Xβ̂
. = 2 X' X β̂ (3.24)
∂ β̂
∂ B ' CB
since . ( ∂B ) = 2CB where .C denotes a symmetric matrix and .B a vector.
Relationship (3.22) is therefore written as:
∂ e' e
. = −2X ' Y + 2 X' X β̂ = 0 (3.25)
∂ β̂
Hence:
'
. X X β̂ = X' Y (3.26)
Insofar as the matrix .X is of full rank, it follows that the matrix . X' X is also of
rank .k +1, implying that its inverse exists. Consequently, we deduce from Eq. (3.26)
the expression giving the vector .β̂ of parameters estimated by OLS:
−1
β̂ = X' X
. X' Y (3.27)
112 3 The Multiple Regression Model
Remark 3.1 As in the case of the simple regression model, it is possible to estimate
the multiple regression model by the maximum likelihood method. It can be shown
that the maximum likelihood estimator of the parameter vector .β is identical to that
obtained with OLS. Furthermore, as in the simple regression model, the maximum
likelihood estimator of the error variance is biased, unlike that of OLS. For a detailed
presentation of the maximum likelihood method, see Greene (2020).
that is:
−1 −1
β̂ = X' X
. X' Xβ + X' X X' ε (3.29)
Hence:
−1
β̂ = β + X' X
. X' ε (3.30)
Unbiased Estimator
From Eq. (3.30), we can write:
−1
E β̂ = E (β) + X ' X
. X' E (ε) (3.31)
E β̂ = β
. (3.32)
that is:
⎛ ⎞
V (α̂) Cov(α̂, β̂1 ) Cov(α̂, β̂2 ) · · · Cov(α̂, β̂k )
⎜Cov(β̂1 , α̂) V (β̂1 ) Cov(β̂1 , β̂2 ) · · · Cov(β̂1 , β̂k )⎟
⎜ ⎟
⎜ .. ⎟
.Ω = ⎜Cov(β̂2 , α̂) Cov(β̂2 , β̂1 ) V ( β̂ ) · · · . ⎟ (3.34)
β̂ ⎜ 2 ⎟
⎜ . . . . . ⎟
⎝ .. .. .. .. .. ⎠
Cov(β̂k , α̂) ··· ··· ··· V (β̂k )
This matrix is symmetric. Furthermore, using relation (3.30), we can write (3.33)
as follows:
' −1 ' ' −1 ' '
.Ω = E XX Xε XX Xε (3.35)
β̂
or:
−1 −1
.Ωβ̂ = E X' X X' εε ' X X' X (3.36)
−1 −1 ' −1
because . X' X is a symmetric matrix, implying: . X' X = X' X .
Knowing that .E εε ' = σε2 I , we deduce:
−1 −1
Ωβ̂ = X' X
. X' σε2 I X X' X (3.37)
Hence:
−1
.Ωβ̂ = σε2 X ' X (3.38)
Property 3.1 The OLS estimator .β̂ of .β is the best linear unbiased estimator
(BLUE).
114 3 The Multiple Regression Model
The error is, of course, unknown in the model. In order to estimate the variance .σε2
of the errors, we need to use the residuals .e:
e = Y − X β̂
. (3.39)
From this expression and after calculating .E e' e , we can show that the
estimator .σ̂ε2 of the error variance is written (see Appendix 3.2.2):
T
e' e 1
. σ̂ε2 = ≡ et2 (3.40)
T −k−1 T −k−1
t=1
3.2.5 Example
Consider the following example comprising one explained variable .Yt and three
explanatory variables .X1t , .X2t , and .X3t for .t = 1, . . . , 6 (see Table 3.1). This
example is only given for illustrative purposes to put into practice the concepts
previously presented, since it obviously makes little sense to carry out a 6-point
regression.
The model is written as:
. Y = Xβ + ε (3.42)
with: ⎛ ⎞ ⎛ ⎞ ⎛ ⎞
4 1354 ε1
⎜2⎟ ⎜1 5 6 8⎟ ⎛ ⎞ ⎜ ⎟
⎜ ⎟ ⎜ ⎟ α ⎜ε2 ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜1⎟ ⎜1 7 3 9⎟ β1 ⎟ ⎜ε3 ⎟
.Y = ⎜ ⎟ , X = ⎜ ⎟,β = ⎜
⎝β2 ⎠ and .ε = ⎜ ⎟
⎜3⎟ ⎜1 2 2 5⎟ ⎜ε4 ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎝6⎠ ⎝1 9 1 2⎠ β3 ⎝ε5 ⎠
8 1473 ε6
Finally:
⎞ ⎛ ⎞
⎛
5.57 α̂
' −1 ' ⎜ 0.21 ⎟ ⎜β̂1 ⎟
. X X XY =⎜ ⎟ ⎜ ⎟
⎝ 0.47 ⎠ = ⎝β̂2 ⎠ (3.47)
−0.87 β̂3
Practical Calculation
In practice, it is possible to use certain matrix algebra results to simplify calcula-
tions. Thus, in a model
with three explanatory variables, such as the one in this
example, the matrix . X' X is given by:
⎛ ⎞
T X X2t X3t
1t
⎜ X1t X1t2 X X X X ⎟
.X X = ⎜ 1t 2 2t 1t 3t ⎟
'
⎝ X2t (3.49)
X1t X2t X2t X X ⎠
2t 2 3t
X3t X1t X3t X2t X3t X3t
and:
⎛ ⎞
24
⎜121⎟
.X Y = ⎜ ⎟
'
⎝103⎠ (3.52)
92
3.3 Tests on the Regression Coefficients 117
ε ∼ N 0, σε2 I
. (3.53)
So we have:
−1
. β̂ ∼ N β,σε2 X' X (3.56)
Consider the coefficient .β̂i associated with the ith explanatory variable .Xit . We
have:
−1
where .ai+1,i+1 denotes the .(i + 1)th element5 of the diagonal of . X ' X . We can
write:
β̂i − βi
. √ ∼ N(0, 1) (3.58)
σε ai+1,i+1
e' e
. σ̂ε2 = (3.59)
T −k−1
−1
5 We consider the .(i + 1)th element and not the ith since the first element of the matrix . X' X
relates to the constant term. Obviously, if the variables are centered, it is appropriate to choose the
ith element.
118 3 The Multiple Regression Model
'
Using the result that if .w ∼ N 0, σw2 I , . wσ 2w follows a Chi-squared distribution,
w
we have:
e' e
. ∼ χT2 −k−1 (3.60)
σε2
that is:
(T − k − 1)σ̂ε2
. ∼ χT2 −k−1 (3.61)
σε2
Remembering the property that (see Box 2.2 in Chap. 2) if .z ∼ N (0, 1) and
v ∼ χr2 , the quantity:
.
√
z r
.t = √ (3.62)
v
β̂i −βi √
√
σε ai+1,i+1 T −k−1
t=
. (3.63)
(T −k−1)σ̂ε2
σε2
or finally:
β̂i − βi
t=
. √ ∼ t (T − k − 1) (3.64)
σ̂ε ai+1,i+1
From relationship (3.64), we can construct a .100(1 − p)% confidence interval for
βi , i.e.:
.
√
.β̂i ± tp/2 σ̂ε ai+1,i+1 (3.65)
As in the case of the simple regression model, it is possible to test the null
hypothesis that .βi is equal to a certain value .β0 , i.e.:
H0 : βi = β0
. (3.66)
3.3 Tests on the Regression Coefficients 119
H1 : βi /= β0
. (3.67)
β̂i − β0
. √ ∼ t (T − k − 1) (3.68)
σ̂ε ai+1,i+1
In practice, the most commonly used test consists in testing the null hypothesis:
H0 : βi = 0
. (3.69)
H0 : βi /= 0
. (3.70)
This is a coefficient significance test (or t-test): under the null hypothesis, the
coefficient associated with the variable .Xit is not significant, i.e., this variable plays
no role in determining the dependent variable .Yt . In practical terms, we proceed as
in the case of the simple regression model, i.e., we calculate the t-statistic of the
coefficient .β̂i :
β̂i
tβ̂i =
. (3.71)
σβ̂i
where .σ
β̂i denotes the estimate of the standard deviation of the coefficient .β̂i , i.e.:
√
σ
.
β̂i = σ̂ε ai+1,i+1 (3.72)
120 3 The Multiple Regression Model
The decision rule associated with the significance test of the coefficient .βi is
written:
– If .tβ̂i ≤ tp/2 : the null hypothesis is not rejected at the 100p% significance level,
so.βi = 0—the variable .Xit does not contribute to the explanation of .Yt .
– If .tβ̂i > tp/2 : the null hypothesis is rejected at the 100p% significance level, so
.βi /= 0—the coefficient associated with the variable .Xit is significantly different
from zero, indicating that .Xit contributes to the explanation of the dependent
variable .Yt .
We have previously presented the t-test, which allows us to test the significance of
each of the regression coefficients in isolation. It is also possible to simultaneously
test the significance of several or even all the coefficients of the estimated model.
For this purpose, we use a Fisher test.
Assume that the elements of the vector β are subject to q constraints:

. Rβ = r   (3.73)

For example:

. R = [0 · · · 0 1 0 · · · 0] and r = 0   (3.74)

The matrix R contains a 1 in the (i + 1)th place⁶ and r is null, which amounts to testing the significance of the coefficient βi. If we set r = β0, we find the test of equality of βi at a certain value β0 presented earlier.
⁶ Recall that the first element of the matrix corresponds to the constant term, which explains why we consider the (i + 1)th element and not the ith.
Similarly, the choice:

. R = [0 0 1 0 −1 0 · · · 0] and r = 0   (3.75)

allows us to test:⁷

. β2 − β4 = 0   (3.76)

that is:

. β2 = β4   (3.77)

Likewise:

. R = [0 0 1 0 1 0 · · · 0] and r = 0   (3.78)

amounts to testing:

. β2 + β4 = 0 ⟺ β2 = −β4   (3.79)

To test the joint significance of all the coefficients β1, . . . , βk, R is the (k, k + 1) matrix obtained by appending a first column of zeros to the identity matrix and r is the null vector. This gives:

. \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & 0 & 1 \end{pmatrix} \begin{pmatrix} α \\ β_1 \\ \vdots \\ β_k \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}   (3.81)

Finally, to test the significance of a subset of coefficients, we can write:

. R = [0  I_s] and r = 0   (3.82)

where r is a column vector with s elements and 0 is a null matrix of size (s, k + 1 − s). We test β_{k−s+1} = β_{k−s+2} = · · · = βk = 0. This involves testing the null hypothesis that the last s elements of the vector β are insignificant.
We can thus see that expression (3.73) brings together a large number of tests,
which are presented in detail in Appendix 3.2.3. We now propose a synthesis.
Synthesis
The various tests mentioned above are Fisher tests. They consist in considering
two regression models: an unconstrained model and a constrained model. The
unconstrained model involves regressing the dependent variable on all the explana-
tory variables. The constrained model involves regressing the dependent variable
on just some of the explanatory variables. It is called a constrained model insofar
as a constraint is imposed on one or more coefficients included in the regression.
Generally speaking, Fisher tests are written:
. F = [(RSS_c − RSS_nc) / q] / [RSS_nc / (T − k − 1)] ∼ F(q, T − k − 1)   (3.83)

where RSS_nc is the residual sum of squares of the unconstrained model and RSS_c denotes the residual sum of squares of the constrained model, q being the number of constraints.
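As an illustration, a minimal NumPy sketch of this Fisher test could read as follows; the helper names and the way the two regressor matrices are passed are assumptions made for the example only.

import numpy as np

def rss(X, Y):
    """Residual sum of squares of the OLS regression of Y on X (constant included)."""
    Xc = np.column_stack([np.ones(len(Y)), X])
    e = Y - Xc @ np.linalg.lstsq(Xc, Y, rcond=None)[0]
    return e @ e

def fisher_test(X_unconstrained, X_constrained, Y, q):
    """F statistic of Eq. (3.83); q is the number of constraints."""
    T, k = X_unconstrained.shape           # k explanatory variables in the unconstrained model
    rss_nc = rss(X_unconstrained, Y)
    rss_c = rss(X_constrained, Y)
    return ((rss_c - rss_nc) / q) / (rss_nc / (T - k - 1))

# The statistic is then compared with the critical value of the F(q, T-k-1) distribution.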
Remark 3.2 The t-test, which consists in testing the significance of a single coefficient, can also be interpreted as a Fisher test. Indeed, the test of the null hypothesis βi = 0 amounts to a Fisher test with q = 1 constraint, whose statistic is equal to the square of the corresponding t-statistic and follows an F(1, T − k − 1) distribution (see Appendix 3.2).

3.4 Analysis of Variance (ANOVA) and Adjusted Coefficient of Determination
After estimating the regression parameters and testing the significance of the
coefficients, it is necessary—as in the case of the simple regression model—to
assess the goodness of fit. In other words, we study whether the scatter plot is well
represented by the regression line, by analyzing whether the scatter is concentrated
or, on the contrary, dispersed around the line. This can be done by using the analysis
of variance of the regression and calculating the coefficient of determination.
Consider the model written in terms of centered variables:

. y = xβ + ε   (3.85)

where:
– y = (y_1, . . . , y_T)' is the vector of the centered dependent variable
– x = \begin{pmatrix} x_{11} & \cdots & x_{k1} \\ \vdots & \ddots & \vdots \\ x_{1T} & \cdots & x_{kT} \end{pmatrix} is the (T, k) matrix of the centered explanatory variables
– β = (β_1, . . . , β_k)'
– ε = (ε_1, . . . , ε_T)'
As in the case of the simple regression model, we will establish the analysis-of-
variance equation by starting from the expression of the residuals:
. e = y − xβ̂   (3.86)

Hence:

. e'e = (y − xβ̂)'(y − xβ̂) = y'y − 2β̂'x'y + β̂'x'xβ̂   (3.87)

Knowing that x'y = x'xβ̂, this relationship can be written as:

. e'e = y'y − β̂'x'xβ̂   (3.88)

that is:

. y'y = β̂'x'xβ̂ + e'e   (3.89)

or:

. y'y = β̂'x'y + e'e   (3.90)
The same decomposition holds for the non-centered variables. Knowing that X'Y = X'Xβ̂, the relationship can still be written as:

. Y'Y = β̂'X'Xβ̂ + e'e   (3.92)

or also:

. Y'Y = β̂'X'Y + e'e   (3.93)
If we compare this equation with (3.90), we see that the residual sum of squares is the same, but the left-hand side is different. Specifically:
– In (3.93), the left-hand side is given by: Y'Y = Σ_t Yt².
– In (3.90), the left-hand side is written: y'y = Σ_t yt² = Σ_t (Yt − Ȳ)² = Σ_t Yt² − T Ȳ².
The coefficient of determination is then defined as:

. R² = ESS / TSS = 1 − RSS / TSS   (3.95)

That is, in the case of centered variables:

. R² = β̂'x'y / y'y = 1 − e'e / y'y   (3.96)

with:

. 0 ≤ R² ≤ 1   (3.98)

Remark 3.3 √R² is called the multiple correlation coefficient. However, it is not used in practice.
The test of the overall significance of the regression can also be expressed in terms of the coefficient of determination:

. F = [R² / k] / [(1 − R²) / (T − k − 1)] ∼ F(k, T − k − 1)   (3.100)

The adjusted (or corrected) coefficient of determination is given by:

. R̄² = 1 − [e'e / (T − k − 1)] / [y'y / (T − 1)]   (3.101)

that is:

. R̄² = 1 − (1 − R²) (T − 1) / (T − k − 1)   (3.102)
Whenever the models to be compared involve the same number of explanatory variables, we can use the R². On the other hand, whenever the models to be compared differ in the number of explanatory variables introduced, the R̄² should be used. The model with the highest coefficient of determination—or adjusted coefficient of determination—is then selected.
We saw in the first chapter that the correlation coefficient is an indicator of the
link between two variables. For a model with k explanatory variables, it is possible
to calculate several correlation coefficients. For example, if the model has two
explanatory variables .X1 and .X2 , three correlation coefficients can be calculated:
.rY X1 , .rY X2 , and .rX1 X2 .
However, it is questionable whether .rY X1 measures the true link between Y and
.X1 in the presence of .X2 . For this to be the case, it is necessary to calculate a
correlation coefficient .rY X1 that is independent of the influence that .X2 may have
on Y and on .X1 . Such a coefficient is called the partial correlation coefficient. It
is denoted .rY X1 ,X2 and is given by:
. r_{Y X1,X2} = (r_{Y X1} − r_{Y X2} r_{X1 X2}) / √[(1 − r²_{Y X2})(1 − r²_{X1 X2})]   (3.103)

. r_{Y X2,X1} = (r_{Y X2} − r_{Y X1} r_{X1 X2}) / √[(1 − r²_{Y X1})(1 − r²_{X1 X2})]   (3.104)

and:

. r_{X1 X2,Y} = (r_{X1 X2} − r_{Y X1} r_{Y X2}) / √[(1 − r²_{Y X1})(1 − r²_{Y X2})]   (3.105)
r_{Y X1,X2} (respectively r_{Y X2,X1}) is the partial correlation coefficient between Y and X1 (respectively between Y and X2), the influence of X2 (respectively X1) having been removed. Similarly, r_{X1 X2,Y} is the partial correlation coefficient between the two explanatory variables X1 and X2, the influence of Y having been removed. A partial correlation coefficient therefore measures the link between two variables, the influence of one or more other explanatory variables having been removed. The three partial correlation coefficients presented above are first-order coefficients in the sense that only the influence of one variable is removed.
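A minimal sketch of the first-order formula, written as a Python function whose arguments are the three simple correlation coefficients (the function name is illustrative):

import numpy as np

def partial_corr(r_yx1, r_yx2, r_x1x2):
    """First-order partial correlation r_{YX1,X2}, Eq. (3.103)."""
    return (r_yx1 - r_yx2 * r_x1x2) / np.sqrt((1 - r_yx2**2) * (1 - r_x1x2**2))

# Example: partial_corr(0.8, 0.6, 0.5) gives the link between Y and X1 once X2 is removed.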
It is also possible to calculate second-order partial correlation coefficients.
Consider, for example, a model with three explanatory variables: .X1 , .X2 , and .X3 .
3.4.5 Example
Let us return to the previous simple example to illustrate the various concepts
presented (see Sect. 3.2.5). Recall that we had a model with three explanatory
variables whose values are shown in Table 3.2.
We have seen that the application of the OLS method led to the estimated model whose coefficients are given by Eq. (3.47): β̂1 = 0.21, β̂2 = 0.47, and β̂3 = −0.87.
– R² = ESS/TSS = 0.92 or R² = 1 − RSS/TSS = 0.92
– R̄² = 1 − (1 − R²)(T − 1)/(T − k − 1) = 1 − (1 − 0.92) × 5/2 = 0.80

We can see that we have R̄² ≤ R². The model explains around 90% of the variance of Yt according to R² and 80% according to R̄².
Significance Test of a Coefficient For example, let us look at the test of the null
hypothesis:
.H0 : β2 = 0 (3.111)
We know that:

. β̂2 / (σ̂ε √a_{33}) ∼ t(6 − 3 − 1)   (3.112)

where a_{33} is the third element of the diagonal of the matrix (X'X)^{-1}. Recall that we consider the third element and not the second insofar as the matrix (X'X)^{-1} takes the constant into account. We have seen that a_{33} = 0.04, so √a_{33} = 0.2. Furthermore, we know that Σ_t et² = 2.56, so: σ̂ε² = 2.56/(6 − 3 − 1) = 1.28. The test statistic is given by:

. t = 0.47 / (√1.28 × 0.2) = 2.07   (3.113)

Since the critical value read from the Student's t table at the 5% significance level is t(2) = 4.303 and 2.07 < 4.303, the null hypothesis β2 = 0 is not rejected: the variable X2t is not significant. The same hypothesis can also be tested by means of a Fisher test:
. F = [(RSS_c − RSS_nc) / q] / [RSS_nc / (T − k − 1)] ∼ F(q, T − k − 1)   (3.114)
where .RSSnc is the sum of squared residuals of the unconstrained model and .RSSc
denotes the sum of squared residuals of the constrained model, q being the number
of constraints.
We know that: .RSSnc = 2.56. The constrained model consists of the regression
of .Yt on .X1t and .X3t . The estimation of this model leads to a residual sum of squares
equal to .RSSc = 7.60. The Fisher test statistic is written as:
. F = [(7.60 − 2.56) / 1] / [2.56 / (6 − 3 − 1)] = 3.93 ∼ F(q, T − k − 1)   (3.115)

At the 5% significance level, the Fisher table gives F(1, 2) = 18.51. Since 3.93 < 18.51, the null hypothesis β2 = 0 is not rejected, which is consistent with the conclusion of the t-test.
Significance Test of the Whole Regression Let us now consider the test of the null
hypothesis:
H0 : β1 = β2 = β3 = 0
. (3.116)
. F = [ESS / k] / [RSS / (T − k − 1)] = (31.44 / 3) / (2.56 / 2) = 8.18   (3.117)
The Fisher table gives us .F (3, 2) = 19.164 at the 5% significance level. Since
the calculated value of the F statistic is lower than the critical value, we do not
reject the null hypothesis that .β1 = β2 = β3 = 0. The fact that the coefficient
of determination is large even though the variables are not significant may be due
to the small number of observations. Remember that this example is for illustrative
purposes only.
We can also use the relationship involving the coefficient of determination:
. F = [R² / k] / [(1 − R²) / (T − k − 1)]   (3.118)

which gives:

. F = (0.92 / 3) / [(1 − 0.92) / (6 − 3 − 1)] = 8.18   (3.119)
The interpretation is similar to the previous one, namely, that the explanatory
variables do not contribute to the explanation of the dependent variable.
. H0 : β2 = β3 = 0   (3.120)

This involves testing whether or not the coefficients associated with the variables X2t and X3t are significant. We perform a Fisher test:
– The unconstrained model has already been estimated. The corresponding sum of
squared residuals is: .RSSnc = 2.56.
– The constrained model is estimated by regressing .Yt on .X1t . The estimation of
this model leads to a sum of squared residuals equal to: .RSSc = 33.97.
– We calculate the test statistic:

. F = [(33.97 − 2.56) / 2] / [2.56 / (6 − 3 − 1)] = 12.27   (3.121)
– The Fisher table gives, at the 5% significance level: .F (2, 2) = 19. Since the
calculated value of the statistic is lower than the critical value, we cannot reject
the null hypothesis that the coefficients associated with the variables .X2t and .X3t
are not significant.
Let us, for example, calculate the following two partial correlation coefficients:
– r_{Y X1,X2}, measuring the influence of X1 on Y, the influence of X2 having been removed
– r_{Y X3,X1X2}, measuring the influence of X3 on Y, the influences of X1 and X2 having been removed

The partial correlation coefficient r_{Y X1,X2} is equal to the correlation coefficient between e1 and e2, where e1 and e2 denote the residuals of the regressions of Y on X2 and of X1 on X2, respectively; hence:
The correlation between Y and .X1 is equal to 0.14, when the influence of .X2 is
removed.
To determine .rY X3 ,X1 X2 , we first regress Y on .X1 and .X2 , i.e.:
The correlation between Y and X3 is −0.96, when the influence of the variables X1 and X2 is removed.
As in the previous chapter, we offer a few examples from the literature to highlight
the utility of the multiple regression model for cross-sectional studies.
– The aggregate rate of violent property crimes: sum of theft, armed robbery, and
auto theft
– The aggregate rate of violent crimes against persons: sum of attempted homi-
cides, homicides, and assaults
The second example falls within the field of health econometrics. It is based on
the work of Thuilliez (2007) and highlights the relationship between malaria and
primary education. Primary education is apprehended in terms of school results
through repetition and completion rates in primary school. With regard to malaria
(MALA), the author uses an index measured as the proportion of the population “at
risk” (i.e., likely to contract malaria) in a country in 1994. The analysis covers a set
of 80 countries. Thuilliez (2007) also considers the following explanatory variables:
– Per capita income (I N C): this variable takes into account the fact that countries
with higher income levels offer a better quality of education than others. Per
capita income is measured by GDP per capita at the purchasing power parity
level (in logarithms).
– Level of urbanization (U RB): this variable is expected to have a positive impact
on educational outcomes. The variable is expressed in logarithms.
– Public expenditure on primary education, expressed as a percentage of GDP
(EXP ) and in logarithms.
– Public expenditure management efficiency (GEI ). The measure used is the
government effectiveness index proposed by Kaufmann et al. (2006).
– Geographical location variables: percentage of regions belonging to tropical
zones (T ROP ), on the one hand, and subtropical zones
(SU BT ROP ), on the other hand.
– Infant mortality rate (MOR), in logarithms.
The results obtained by Thuilliez (2007) are displayed in Table 3.7. They show
a relationship between malaria and primary school repetition and completion rates.
Concerning the repetition rate, the coefficient assigned to the variable MALA is
positive, meaning that malaria tends to increase the repetition rate, all other things
being equal. The value of 0.096 shows that children living in malaria-risk countries
have repetition rates 9.6% higher than those living in noninfested countries, all other
things being equal. Similarly, if we consider the regression with completion rate
as the explained variable, it appears that the coefficient is negative: malaria has a
negative impact on the completion rate. The estimated value of the coefficient also
shows that high-risk countries have primary school completion rates 29.5% lower
than those of non-risk countries, all other things being equal. On the other hand, we
find that per capita income has no significant impact on repetition and completion
rates, nor do geographic location variables and public expenditure variables. The
fact that public expenditure does not appear to be significant simply illustrates that
increasing school resources does not imply children get better results. Only the
infant mortality rate has a positive effect on the primary school repetition rate. In
summary, this study by Thuilliez (2007) shows that malaria has a negative impact
on children’s school performance, which is in line with expectations.
The third example comes from the work of Bénassy-Quéré and Salins (2005) and
examines the impact of financial openness on income inequalities for 42 developing
and emerging countries in 2001. Four types of inequalities are distinguished: inter-
individual inequalities, geographical disparities, urban/rural disparities, and regional
disparities. The financial openness variable (denoted OP ENF I ) is defined as
the sum of the absolute values of the country’s capital inflows and outflows, as
a percentage of its GDP. The explanatory variables selected, which are control
variables, differ according to the type of inequality considered.
For the study of inter-individual income inequalities, three control variables are
used:
– Social mobility (SOCMOB): this variable can take four values, ranging from 1
(for the worst institutional environment) to 4 (for the best environment). Social
mobility corresponds to recruitment and promotion in the public and private
sectors. A value of 1 corresponds to recruitment or promotion based on social
position, and a value of 4 corresponds to recruitment or promotion based on
merit.
– Trade openness (T RADE): this represents the scale of tariff and nontariff
barriers. It ranges from 1 (high barriers) to 4 (low barriers).
– The scale of structural reforms undertaken in the country under consideration
following financial and trade openness (REF ). The variable ranges from 1 (no
reform) to 4 (very extensive reforms).
The results concerning the effect of financial openness on income equality are
given in the equation below:
. EQUALITY = 1.730 − 0.226 OPENFI + 0.436 SOCMOB   (3.132)
             (3.43)    (0.43)          (2.19)
increase regional inequalities, while the level of GDP per capita tends to reduce
them.
. PARTY = 33.43 + 0.47 SORT + 6.10 PRES − 12.24 MERGER   (3.133)
          (6.23)   (6.21)      (4.19)       (−3.25)
Among the political variables, the incumbent’s bonus has a positive and highly
significant effect. Thus, all other things being equal, almost half of the votes
obtained by the party in the previous elections are transferred to the explained
variable. The variable P RES also has a significant positive impact, illustrating
significant regional and national interactions. Finally, the variable MERGER has
a negative sign, indicating that the merger of the incumbent party with another list
between the two rounds of voting has a negative effect on the incumbent party. Also
noteworthy is the significant impact of the variable P OP showing that the size of
the municipality has an unfavorable effect on the incumbent party.
Turning now to the ecological variables, their impact is negative. The variable
OZONE has a significant and negative coefficient: air pollution has a negative
impact on the election of the incumbent mayor. The same is true for the variable
P OLL, indicating that voters tend to punish incumbent mayors from municipalities
with the most polluted sites and soils. Overall, this study shows that ecological
inequalities, as reflected in environmental variables, have a significant impact on
electoral behavior.
3.6 Prediction
One of the practical interests of the regression model lies in forecasting. Thus, once
the model has been estimated, it can be used to predict the evolution of the dependent
variable.
Once the model has been estimated over the period t = 1, . . . , T, we seek to determine the forecast of the dependent variable for a horizon h, i.e., Ŷ_{T+h}, as well as the associated prediction interval. The latter is given by:

. Ŷ_{T+h} ± t_{p/2} σ̂ε √(R (X'X)^{-1} R' + 1)   (3.136)

where R is the matrix (row vector) containing the values of the explanatory variables at date T + h and whose first element is 1.
Let us explain this expression and the value taken by .ŶT +h . Assuming that the
relationship generating the explained variable remains identical and that the values
of the explanatory variables are known in .T + h, we have:
. e_{T+h} = Y_{T+h} − Ŷ_{T+h} = ε_{T+h} − (α̂ − α) − (β̂1 − β1) X_{1T+h} − . . . − (β̂k − βk) X_{kT+h}   (3.138)
Since the OLS estimators of the coefficients are unbiased estimators and given
that .E (εT +h ) = 0, we deduce:
. E(e_{T+h}) = 0   (3.139)

In matrix form, we have:

. Y_{T+h} = Rβ + ε_{T+h}   (3.140)

where, as before, R is the matrix (row vector) containing the values of the explanatory variables at date T + h and whose first element is 1. The forecast is then given by:

. Ŷ_{T+h} = R β̂   (3.141)

The forecast error is written:

. e_{T+h} = Y_{T+h} − Ŷ_{T+h} = ε_{T+h} − R(β̂ − β)   (3.142)

and:

. E(e_{T+h}) = 0   (3.143)
. (Y_{T+h} − Ŷ_{T+h}) / [σ̂ε √(R (X'X)^{-1} R' + 1)] ∼ t(T − k − 1)   (3.148)

Taking the usual value p = 5%, the 95% interval is thus written:

. Ŷ_{T+h} ± t_{0.025} σ̂ε √(R (X'X)^{-1} R' + 1)   (3.150)
3.6.2 Example
Let us take the example studied throughout this chapter (see Sect. 3.2.5) and suppose
that we wish to predict the value of Y for the date .t = 7. Also assume that the values
of the three explanatory variables are known in .t = 7 and are given by: .X17 = 6,
X27 = 8, and X37 = 1. The matrix R is written as:

. R = (1  6  8  1)   (3.151)

Applying (3.141), we obtain:

. Ŷ7 = 9.72   (3.154)
Now let us determine the 95% prediction interval for .Y7 . We had found:
. (X'X)^{-1} = \begin{pmatrix} 2.87 & −0.24 & −0.24 & −0.11 \\ −0.24 & 0.04 & 0.02 & −0.002 \\ −0.24 & 0.02 & 0.04 & −0.004 \\ −0.11 & −0.002 & −0.004 & 0.03 \end{pmatrix}   (3.155)
Therefore:
. R (X'X)^{-1} R' = (1  6  8  1) \begin{pmatrix} 2.87 & −0.24 & −0.24 & −0.11 \\ −0.24 & 0.04 & 0.02 & −0.002 \\ −0.24 & 0.02 & 0.04 & −0.004 \\ −0.11 & −0.002 & −0.004 & 0.03 \end{pmatrix} \begin{pmatrix} 1 \\ 6 \\ 8 \\ 1 \end{pmatrix}   (3.156)

Hence:

. R (X'X)^{-1} R' = 1.79   (3.157)
We had also obtained: .σ̂ε2 = 1.28, i.e., .σ̂ε = 1.13. Using (3.150) and knowing
that .t (T − k − 1) = t (2) = 4.303, the 95% prediction interval for .Y7 is:
. 9.72 ± 4.303 × 1.13 × √(1.79 + 1) = 9.72 ± 8.12   (3.158)
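As an illustration, the following NumPy sketch computes the forecast and the prediction interval of Eq. (3.150) for arbitrary data; the function name, arguments, and the way the critical value is supplied are assumptions made for the example.

import numpy as np

def forecast_interval(X, Y, x_new, t_crit):
    """Point forecast and prediction interval, Eqs. (3.141) and (3.150).
    x_new: values of the explanatory variables at T+h (without the constant);
    t_crit: Student critical value t_{p/2}(T-k-1), e.g. 4.303 for 2 degrees of freedom at 5%."""
    T, k = X.shape
    Xc = np.column_stack([np.ones(T), X])
    XtX_inv = np.linalg.inv(Xc.T @ Xc)
    beta_hat = XtX_inv @ Xc.T @ Y
    e = Y - Xc @ beta_hat
    sigma_hat = np.sqrt((e @ e) / (T - k - 1))
    R = np.concatenate(([1.0], x_new))                       # row vector R, first element equal to 1
    y_fore = R @ beta_hat                                    # Eq. (3.141)
    half_width = t_crit * sigma_hat * np.sqrt(R @ XtX_inv @ R + 1)
    return y_fore, y_fore - half_width, y_fore + half_width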
3.7 Model Comparison Criteria

The usual and adjusted coefficients of determination have already been presented. Let us just recall some essential points.
The coefficient of determination lies between 0 and 1; the closer it is to 1, the
better the quality of the fit. It measures the quality of the fit within the sample
and is therefore a measure of a model’s explanatory power. It is not because a
model has a high coefficient of determination that it will perform well in out-of-
sample forecasting; the coefficient of determination is not a measure of a model’s
predictive power. If several models—with the same explained variable and the same
number of explanatory variables—are compared on the basis of the coefficient of
determination, the one with the highest .R 2 value should be selected.
It is important to remember, however, that there is a nondecreasing relationship
between the value of the coefficient of determination and the number of explanatory
variables introduced into a model. For this reason, the adjusted (or corrected)
coefficient of determination has been proposed and can be used to compare models
with a different number of explanatory variables; the best model being the one
with the highest adjusted coefficient of determination. Like the usual coefficient
of determination, the adjusted coefficient of determination only allows us to judge
the explanatory power of a model, not its predictive power.
The information criteria are based on information theory and are intended to assess
the loss of information—called the amount of Kullback information8 —when an
estimated model is thought to represent the true data-generating process. Since the
aim is to minimize this loss of information, the model to be preferred, among all the
models estimated, is the one that will minimize the information criteria.
The information criteria (I C) are based on the use of the maximum likelihood
method (see Appendix 2.3 to Chap. 2). They are of the form:
. IC = −2𝓁/T + p(T)/T   (3.159)

where 𝓁 is the log-likelihood function:

. 𝓁 = −(T/2) [1 + log(2π) + log(e'e/T)]   (3.160)
8 Strictly speaking, this is known as Kullback-Leibler information (see Kullback and Leibler,
1951).
and .p(T ) is a penalty function that increases with the model’s complexity, i.e.,
with the number of explanatory variables introduced into the model. In other words,
the information criteria penalize the addition of variables to guard against the risk of
over-fitting or over-parameterization. The various information criteria proposed in
the literature are distinguished by the penalty function adopted. The most frequently
used criteria are those introduced by Akaike (1973),9 Schwarz (1978), and, to a
lesser extent, Hannan and Quinn (1979), which we present below.
The criterion proposed by Akaike (1973) is obtained by setting p(T) = 2(k + 1):

. AIC = −2𝓁/T + 2(k + 1)/T   (3.161)

which can still be expressed, under the assumption of error normality,¹⁰ as follows:

. AIC = log σ̂ε² + 2(k + 1)/T   (3.162)
where .σ̂ε2 is the error variance estimator, k is the number of explanatory variables
included in the model, and T is the number of observations. If several models are
compared on the basis of this criterion, the model with the lowest AIC will be
selected. This is thus a criterion to be minimized. Unlike the usual and adjusted
coefficients of determination, the AIC can be used to assess the explanatory power
of a model, but also its predictive power.
Remark 3.6 When the number of explanatory variables k is large relative to the
number of observations—which may happen in the case of a small sample—it is
possible to use the corrected AIC (see Hurvich and Tsai, 1989), denoted .AI Cc ,
given by:
. AIC_c = AIC + 2(k + 1)(k + 2) / (T − k − 2)   (3.163)
The criterion proposed by Schwarz (1978) corresponds to the penalty p(T) = (k + 1) log(T):

. SIC = −2𝓁/T + (k + 1) log(T)/T   (3.164)

or, under the assumption of error normality:

. SIC = log σ̂ε² + (k + 1) log(T)/T   (3.165)
where .σ̂ε2 is the error variance estimator, k is the number of explanatory variables
included in the model, and T is the number of observations. As with the AIC, the
SIC must be minimized. Thus, the best model will be the one with the lowest
SIC value. Like the AIC, the SIC criterion can be used to compare predictive
performance both within and out of the sample.
The SIC is more parsimonious than the AIC, as it penalizes the number of
variables in the model more heavily. In other words, it penalizes more the over-
parameterization. As a result, the SIC tends to select models with either the same
number of variables or fewer variables than those selected by the AIC.
The criterion suggested by Hannan and Quinn (1979) is written:

. HQ = −2𝓁/T + 2c (k + 1) log(log T)/T   (3.166)

or, under the assumption of error normality:

. HQ = log σ̂ε² + 2c (k + 1) log(log T)/T   (3.167)
where .σ̂ε2 is the error variance estimator, k is the number of explanatory variables
included in the model, T is the number of observations, and c is a constant term for
which a value of 1 is very frequently used. Like the other two information criteria,
the HQ criterion must be minimized.
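A minimal sketch of these three criteria in Python follows; it is assumed here that the variance entering the normality-based formulas is the maximum likelihood estimator e'e/T, and the function name and arguments are illustrative.

import numpy as np

def information_criteria(e, k, c=1.0):
    """AIC, SIC and HQ under error normality, Eqs. (3.162), (3.165) and (3.167).
    e: residual vector; k: number of explanatory variables (constant excluded)."""
    T = len(e)
    log_sig2 = np.log(e @ e / T)                         # log of e'e/T
    aic = log_sig2 + 2 * (k + 1) / T
    sic = log_sig2 + (k + 1) * np.log(T) / T
    hq = log_sig2 + 2 * c * (k + 1) * np.log(np.log(T)) / T
    return aic, sic, hq

# The preferred model is the one minimizing the chosen criterion.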
A further comparison criterion, computed for a model including h explanatory variables with residual sum of squares RSS_h, is:

. C_h = RSS_h / σ̂ε² + (2(h + 1) − T)   (3.168)

For a well-specified model, we have:

. E(C_h) ≃ h   (3.169)

In choosing a model based on this statistic, we should keep the model whose statistic C_h is the closest to h.
We propose to study the relationship between the following three series of stock
market returns:
The data are shown in Table 3.9 and are taken from the Macrobond database. The series are monthly and cover the period from February 1984 to June 2021, i.e., a number of observations T = 449. Suppose we wish to explain the returns of the UK index by the returns of the Japanese and US stock indexes. The dependent variable is therefore RFTSE and the explanatory variables are RNIKKEI and RDJIND. We seek to estimate the following model:
– Σ_{t=1}^{449} RFTSE_t × RNIKKEI_t = 0.5788
– Σ_{t=1}^{449} RFTSE_t × RDJIND_t = 0.6902
– Σ_{t=1}^{449} RNIKKEI_t × RDJIND_t = 0.6178
– Σ_{t=1}^{449} RFTSE_t² = 0.8937
– Σ_{t=1}^{449} RNIKKEI_t² = 1.5665
– Σ_{t=1}^{449} RDJIND_t² = 0.8890
and:

. X'Y = \begin{pmatrix} 1.8902 \\ 0.5788 \\ 0.6902 \end{pmatrix}   (3.175)

Calculating the inverse of X'X gives:

. (X'X)^{-1} = \begin{pmatrix} 0.0023 & 0.0026 & −0.0104 \\ 0.0026 & 0.8823 & −0.6230 \\ −0.0104 & −0.6230 & 1.5971 \end{pmatrix}   (3.176)

Hence:

. β̂ = \begin{pmatrix} 0.0023 & 0.0026 & −0.0104 \\ 0.0026 & 0.8823 & −0.6230 \\ −0.0104 & −0.6230 & 1.5971 \end{pmatrix} \begin{pmatrix} 1.8902 \\ 0.5788 \\ 0.6902 \end{pmatrix}   (3.177)

or finally:

. β̂ = \begin{pmatrix} −0.0014 \\ 0.0856 \\ 0.7221 \end{pmatrix}   (3.178)
. \widehat{RFTSE}_t = −0.0014 + 0.0856 RNIKKEI_t + 0.7221 RDJIND_t   (3.179)
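This computation can be reproduced with a few lines of NumPy, using the rounded cross-product matrices reported above (so small discrepancies in the last digit may appear); the array names are illustrative.

import numpy as np

# (X'X)^{-1} and X'Y as reported in Eqs. (3.176) and (3.175).
XtX_inv = np.array([[ 0.0023,  0.0026, -0.0104],
                    [ 0.0026,  0.8823, -0.6230],
                    [-0.0104, -0.6230,  1.5971]])
XtY = np.array([1.8902, 0.5788, 0.6902])

beta_hat = XtX_inv @ XtY          # Eq. (3.177): (X'X)^{-1} X'Y
print(beta_hat)                   # approximately (-0.0014, 0.0856, 0.7221), Eq. (3.178)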
Using Eviews software leads to the results shown in Table 3.10. In addition to the
estimation of the three coefficients, we have several results.
In particular, Table 3.10 gives us the standard deviations and t-statistics of the
estimated coefficients. We thus have:
. t_α̂ = α̂ / σ̂_α̂ = −0.0014 / 0.0013 = −1.0166   (3.180)

. t_β̂1 = β̂1 / σ̂_β̂1 = 0.0856 / 0.0263 = 3.2608   (3.181)

. t_β̂2 = β̂2 / σ̂_β̂2 = 0.7221 / 0.0353 = 20.4435   (3.182)
– .|tα̂ | = 1.0166 < 1.96: we do not reject the null hypothesis that .α = 0. The
constant
is not significantly different from zero.
– .tβ̂1 = 3.2608 > 1.96: we reject the null hypothesis that .β1 = 0. The coefficient
associated with the Japanese variable is significant, meaning that RNI KKEI
contributes to the explanation of RF T SE.
– .tβ̂2 = 20.4435 > 1.96: we reject the null hypothesis that .β2 = 0.
The coefficient associated with the US variable is significant, meaning that
RDJ I ND contributes to the explanation of RF T SE.
Furthermore, Table 3.10 gives us the value of the F-statistic allowing us to test the significance of the regression as a whole, i.e., to test the null hypothesis that β1 = β2 = 0. The statistic F follows a Fisher distribution with (q, T − k − 1) = (2, 446) degrees of freedom.
Since the two estimated models (Tables 3.10 and 3.11) have the same dependent
variable and cover the same period, they can be compared using the adjusted
coefficient of determination. We see that: .0.5964 < 0.6049. The first model,
which also incorporates Japanese returns, is to be preferred, as it explains a higher
percentage of the variation in RF T SE. We can also note that, as expected, the sum
of squared residuals of the first model is lower than that associated with the second
model: .0.3484 < 0.3567, corroborating the superiority of the first regression. These
results are confirmed by the values taken by the Akaike, Schwarz, and Hannan-
Quinn information criteria. Indeed, the model minimizing these three criteria is
the one that includes the returns of the Japanese stock index. These results were
expected since, even if the explanatory power of Japanese returns is lower than that
of US returns, the Japanese variable contributes to the explanation of the UK series.
The model in Table 3.11 can be considered a constrained model, the unconstrained model being given in Table 3.10. The constrained model is such that β1 = 0, meaning that the Japanese returns are not significant. It is then possible to carry out the corresponding Fisher test:
. F = [(RSS_c − RSS_nc) / q] / [RSS_nc / (T − k − 1)] ∼ F(q, T − k − 1)   (3.183)

which gives:

. F = [(0.356678 − 0.348373) / 1] / [0.348373 / (449 − 2 − 1)] = 10.6324   (3.184)
This statistic follows a Fisher distribution with (1, 446) degrees of freedom. At the 5% significance level, the Fisher table gives us F(1, 446) = 3.842. Since 10.6324 >
3.842, we reject the null hypothesis that the coefficient associated with the Japanese
variable is not significant. The Japanese returns contribute to the explanation of the
UK returns, which is of course consistent with the result derived from the t-test on
the coefficient associated with this same variable.
Conclusion
This chapter has provided a detailed presentation of the multiple regression model.
It should be recalled that the model is based on a number of hypotheses relating
to the explanatory variables and the error term. These include the fundamental
assumptions of no autocorrelation and homoskedasticity of errors. Since, in practice,
one or both of these assumptions are often not met, Chap. 4 presents the procedure
to be followed when autocorrelation and/or heteroskedasticity of errors occur.
Further Reading

Appendix 3.1: Elements of Matrix Algebra

This appendix presents the main matrix algebra concepts used in this chapter.
General
aij is the element corresponding to the ith row and the j th column of the matrix .A.
.
The matrix .A has n rows and p columns. The size (or the dimension) of the matrix
is said to be .n × p (which is also noted as .(n, p)).
A row vector is a matrix containing only one row. A column vector is a matrix
with only one column. A matrix can therefore be thought of as a set of row vectors
or column vectors.
When the number of rows is equal to the number of columns, i.e., .n = p, we say
that .A is a square matrix. Frequently used square matrices include:
– Symmetric matrix: it is such that a_ij = a_ji for all i and j.
– Diagonal matrix: this is a matrix whose elements off the diagonal are zero:

. A = \begin{pmatrix} α_1 & 0 & 0 & \cdots & 0 \\ 0 & α_2 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & α_p \end{pmatrix}   (3.186)

– Scalar matrix: this is a diagonal matrix whose elements on the diagonal are all identical:

. A = \begin{pmatrix} α & 0 & 0 & \cdots & 0 \\ 0 & α & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & α \end{pmatrix}   (3.187)

– Identity matrix: this is a scalar matrix, noted I, whose elements on the diagonal are all equal to 1:

. I = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}   (3.188)
Equality
The matrices .A and .B are equal if they are of the same size and if .aij = bij for all
i and j .
Transposition
The transpose .A' of the matrix .A is the matrix whose j th row corresponds to the
j th column of the matrix .A. Since the size of matrix .A is .n × p, the size of matrix
'
.A is .p × n. Thus, we have:
. A' = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & & & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{np} \end{pmatrix}   (3.190)
. (A')' = A   (3.191)
Similarly, we have:
D = A − B = aij − bij
. (3.194)
Note that:

. a'b = a_1 b_1 + a_2 b_2 + . . . + a_n b_n   (3.196)

that is:

. a'b = b'a = Σ_{i=1}^{n} a_i b_i   (3.197)
Now consider two matrices A and B and assume that A is of size n × p and B is of size p × q. The matrix C resulting from the product of these two matrices is a matrix of size n × q, that is:

. C = A B, with C of size (n, q), A of size (n, p), and B of size (p, q)   (3.198)

The element c_ij of C is obtained as the product of the ith row of A and the jth column of B:

. c_ij = a'_i b_j   (3.199)
Matrix multiplication is only possible if the number of columns in the first matrix
(matrix .A) is equal to the number of rows in the second matrix (matrix .B). In this
case, we speak about matrices that are compatible for multiplication.
. AI = I A = A (3.201)
A (B + C) = AB + AC
. (3.205)
Idempotent Matrix
An idempotent matrix .A is a matrix verifying: .AA = A. In other words, an
idempotent matrix is equal to its square. Furthermore, if .A is a symmetric
idempotent matrix, then .A' A = A.
r=s
. (3.206)
We can deduce from this last equation that if .A is a matrix of size .n × p and .B
a square matrix of size .n × n, then:
BB −1 = B −1 B = I
. (3.212)
When the rank of the matrix .B is less than n, the matrix .B is said to be singular
and has no inverse.
Trace of a Matrix
The trace of a square matrix .A of size .n × n, denoted .T r(A), is the sum of its
diagonal elements:
. Tr(A) = Σ_{i=1}^{n} a_ii   (3.213)

Furthermore:

. Tr(A) = Tr(A')   (3.214)

and:

. Tr(AB) = Tr(BA)   (3.216)
Determinant of a Matrix
The determinant of a matrix is defined for square matrices only.
As an introductory example, consider a matrix A of size 2 × 2:

. A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}   (3.217)

The determinant of the matrix A, denoted det(A) or |A|, is given by:

. det(A) = |A| = \begin{vmatrix} a & c \\ b & d \end{vmatrix} = ad − bc   (3.218)

where A_ij is the matrix obtained from matrix A by deleting row i and column j. |A_ij| is called a minor and the term:

. C_ij = (−1)^{i+j} |A_ij|   (3.220)

is called a cofactor. We have the following property:

. |A| = |A'|   (3.221)
Inverse Matrix
For a matrix to be invertible, it must be nonsingular. Conversely, a matrix is
nonsingular if and only if its inverse exists.
or, equivalently:
for .i, j = 1, . . . , n.
The inverse of the matrix A, denoted A^{-1}, is defined by:

. A^{-1} = (1/|A|) \begin{pmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{pmatrix}   (3.225)

We also have the following properties:

. (A^{-1})^{-1} = A   (3.227)

. (A^{-1})' = (A')^{-1}   (3.228)
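By way of illustration, these operations can be checked numerically; a minimal NumPy sketch with an arbitrary 2 × 2 matrix follows (the matrix values are chosen only for the example).

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

print(A.T)                   # transpose A'
print(np.trace(A))           # trace: sum of the diagonal elements, Eq. (3.213)
print(np.linalg.det(A))      # determinant: 2*3 - 1*1 = 5, Eq. (3.218)
A_inv = np.linalg.inv(A)     # inverse, defined because det(A) is nonzero
print(A @ A_inv)             # A A^{-1} = I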
Appendix 3.2: Demonstrations

In order to show that β̂ is a minimum variance estimator, suppose there exists another linear estimator β̆ of β:

. β̆ = M Y, where β̆ is of size (k + 1, 1), M of size (k + 1, T), and Y of size (T, 1)   (3.231)
For β̆ to be unbiased, we must have:

. E(β̆) = β   (3.234)

Furthermore:

. MX = I   (3.236)

Replacing M with (X'X)^{-1} X' + N, we have:

. [(X'X)^{-1} X' + N] X = I   (3.237)

Knowing that (X'X)^{-1} X'X = I, we get:

. I + NX = I   (3.238)

Hence:

. NX = 0   (3.239)

We can then write:

. β̆ = β + Mε = E(β̆) + Mε   (3.240)

Hence:

. β̆ − β = β̆ − E(β̆) = Mε   (3.241)
So we have:

. Ω_β̆ = σε² [(X'X)^{-1} + NN']   (3.246)
For β̆ to have minimal variance and knowing that the variances lie on the diagonal of Ω_β̆, we need to minimize the diagonal elements of Ω_β̆. Since the diagonal elements of (X'X)^{-1} are constants, the diagonal elements of the matrix NN' must be minimized. If we denote n_ij the elements of the matrix N, where i stands for the row and j for the column, the diagonal elements of the matrix NN' are given by Σ_j n_ij². These elements are minimal if Σ_j n_ij² = 0, i.e., n_ij = 0 ∀i, ∀j. We deduce:

. N = 0   (3.247)

Therefore:

. β̆ = β̂   (3.248)
In order to estimate the variance .σε2 of the errors, we need to use the residuals .e:
e = Y − Xβ̂
. (3.249)
We have:
. e = Xβ + ε − X(X'X)^{-1} X'Y = Xβ + ε − X(X'X)^{-1} X'(Xβ + ε)   (3.250)

Hence:

. e = ε − X(X'X)^{-1} X'ε   (3.251)

Noting P = I − X(X'X)^{-1} X', we can write:

. e = Pε   (3.252)
We have:

. E(e'e) = E[Tr(Pεε')]   (3.256)

using the fact that Tr(AB) = Tr(BA) with A = ε' and B = Pε. Hence:

. E(e'e) = σε² Tr(P)   (3.257)

It can be shown that:

. Tr(P) = T − k − 1   (3.259)

Finally:

. E(e'e) = σε² (T − k − 1)   (3.260)
It follows that the estimator σ̂ε² of the error variance is therefore written as:

. σ̂ε² = e'e / (T − k − 1) ≡ [1/(T − k − 1)] Σ_{t=1}^{T} et²   (3.261)
In order to derive the various significance tests, we need to determine the distribution followed by Rβ. β being unknown, let us replace it by its estimator:

. Rβ̂ = r   (3.262)

and determine the distribution followed by Rβ̂. Knowing that β̂ is an unbiased estimator of β, we can write:

. E(Rβ̂) = Rβ   (3.263)
Furthermore:

. V(Rβ̂) = E[R(β̂ − β)(β̂ − β)' R']   (3.264)

Hence:

. V(Rβ̂) = σε² R(X'X)^{-1} R'   (3.265)
We know that β̂ follows a normal distribution with (k + 1) dimensions, therefore:

. Rβ̂ ∼ N(Rβ, σε² R(X'X)^{-1} R')   (3.266)

and:

. Rβ̂ − Rβ ∼ N(0, σε² R(X'X)^{-1} R')   (3.267)

Using the result that if w ∼ N(0, Σ) where Σ is of size (K, K), we have w'Σ^{-1}w ∼ χ²_K, then, under the null hypothesis Rβ = r:

. (Rβ̂ − r)' [σε² R(X'X)^{-1} R']^{-1} (Rβ̂ − r) ∼ χ²_q   (3.269)
Knowing that:

. e'e / σε² ∼ χ²_{T−k−1}   (3.270)

and using the result (see Box 2.2 in Chap. 2) that if w ∼ χ²_s and v ∼ χ²_r, the statistic F = (w/s)/(v/r) follows a Fisher distribution with (s, r) degrees of freedom, we deduce:

. F = [(Rβ̂ − r)' [R(X'X)^{-1} R']^{-1} (Rβ̂ − r) / q] / [e'e / (T − k − 1)] ∼ F(q, T − k − 1)   (3.271)
Let us now return to the three special cases studied—test on a single coefficient,
test on all coefficients, and test on a subset of coefficients—to specify the expression
of the test in each of these cases.
– Test on a particular regression coefficient βi. This case corresponds to the null hypothesis βi = 0, i.e.:

. R = [0 · · · 0 1 0 · · · 0] and r = 0   (3.272)

We then have Rβ̂ − r = β̂i, and the quadratic form R(X'X)^{-1} R' is equal to the (i + 1)th element of the diagonal of the matrix (X'X)^{-1}, i.e., a_{i+1,i+1}. The test statistic given in (3.271) becomes:

. F = [β̂i² (a_{i+1,i+1})^{-1} / 1] / σ̂ε² ∼ F(1, T − k − 1)   (3.273)

that is finally:

. F = β̂i² / (σ̂ε² a_{i+1,i+1}) ∼ F(1, T − k − 1)   (3.274)
– Test on all the regression coefficients. In this case, the quadratic form R(X'X)^{-1} R' is the submatrix of size (k, k) obtained by deleting the first row and column of the matrix (X'X)^{-1}. To clarify the expression of this submatrix, let us decompose the matrix X into two blocks:

. X = (x̄  X̄)   (3.276)

where x̄ denotes a column vector composed of 1 and X̄ is the matrix of size (T, k) comprising the values of the k explanatory variables. We then have:

. X'X = \begin{pmatrix} T & x̄'X̄ \\ X̄'x̄ & X̄'X̄ \end{pmatrix}   (3.277)

The calculation of (X'X)^{-1} shows us that the submatrix of size (k, k) that interests us here is written as (X̄'ZX̄)^{-1}, with:

. Z = I − T^{-1} x̄ x̄'   (3.279)

Denoting β̄ the vector of the k slope coefficients, the test statistic is then written:

. F = [β̄' X̄' Z X̄ β̄ / q] / [e'e / (T − k − 1)]   (3.280)

that is, with q = k constraints:

. F = [β̄' X̄' Z X̄ β̄ / k] / [e'e / (T − k − 1)]   (3.281)
. R = [0  I_s] and r = 0   (3.282)

Let us decompose the matrix X and the vector β into blocks so that:

. Y = (X_r  X_s) \begin{pmatrix} β̂_r \\ β̂_s \end{pmatrix} + e = X_r β̂_r + X_s β̂_s + e   (3.283)

where the matrix X_r is formed by the (k + 1 − s) first columns of X and X_s is formed by the s remaining columns of the matrix X. We then have: Rβ̂ − r = β̂_s. Furthermore, the matrix R(X'X)^{-1} R' involved in calculating the test statistic (Eq. (3.271)) is equal to the submatrix of order s obtained by deleting the (k + 1 − s) first rows and columns of the matrix (X'X)^{-1}. Let us explain the form of this submatrix. We have:

. X'X = \begin{pmatrix} X_r'X_r & X_r'X_s \\ X_s'X_r & X_s'X_s \end{pmatrix}   (3.284)

The calculation of (X'X)^{-1} shows us that the submatrix we are interested in here is written as:

. [X_s'X_s − X_s'X_r (X_r'X_r)^{-1} X_r'X_s]^{-1} = [X_s' (I − X_r (X_r'X_r)^{-1} X_r') X_s]^{-1} = (X_s' Z_r X_s)^{-1}   (3.285)

where Z_r = I − X_r (X_r'X_r)^{-1} X_r'.
Let us now explain the expression of the numerator. To do this, consider the regression of Y on the explanatory variables listed in X_r and note e_r the residuals resulting from this regression:

. e_r = Y − X_r β̂_r = Y − X_r (X_r'X_r)^{-1} X_r'Y = Z_r Y   (3.288)
Premultiplying the regression of Y on all the explanatory variables by Z_r, we get:

. Z_r Y = Z_r X_r β̂_r + Z_r X_s β̂_s + Z_r e   (3.289)

We have:
– Z_r X_r = X_r − X_r (X_r'X_r)^{-1} X_r'X_r = 0
– Z_r' = Z_r and Z_r² = Z_r (idempotent and symmetric matrix)
– Z_r e = e because:

. Z_r e = [I − X_r (X_r'X_r)^{-1} X_r'] (Y − Xβ̂)   (3.290)
        = Y − Xβ̂ − X_r (X_r'X_r)^{-1} X_r'Y + X_r (X_r'X_r)^{-1} X_r'Xβ̂
        = Y − Xβ̂ − X_r β̂_r + X_r (X_r'X_r)^{-1} X_r'(Y − e)

and, since the residuals are orthogonal to the explanatory variables (X_r'e = 0), the last term equals X_r β̂_r. Finally, we have:

. Z_r e = Y − Xβ̂ − X_r β̂_r + X_r β̂_r = e

Hence:

. Z_r Y = Z_r X_s β̂_s + e   (3.293)
This test, which is very frequently employed, can be used to test the significance
of a subset of explanatory variables .Xs . In practice, it consists in running two
regressions:
– A regression of .Y on the set of explanatory variables, .e' e being the corre-
sponding sum of squared residuals
– A regression of .Y on the subset of explanatory variables .Xr (i.e., on variables
other than .Xs ), .e'r er being the corresponding sum of squared residuals
The decision rule is as follows:
– If .F ≤ F (s, T − k − 1), the null hypothesis that the variables .Xs are not
significant is not rejected.
– If .F > F (s, T − k − 1), the null hypothesis is rejected.
4 Heteroskedasticity and Autocorrelation of Errors
The multiple regression model studied in the previous chapter relies on the following assumptions concerning the error term:
– E(ε) = 0.
– E(εε') = σε² I, where I denotes the identity matrix and σε² the variance of the error term.
– ε ∼ N(0, σε² I): this normality assumption is not necessary to establish the results of the multiple regression model, but it does allow statistical results to be derived and test statistics to be constructed.

This chapter focuses on the case where the hypothesis of sphericity of errors is not verified. We concentrate on the problems of autocorrelation and heteroskedasticity of errors by seeking to answer the following questions:
If the errors are autocorrelated, the terms off the diagonal are not all zero.
Similarly, if the errors are heteroskedastic, the terms on the diagonal are not all
identical.
. Y = Xβ + ε   (4.3)

where Y is of size (T, 1), X of size (T, k + 1), β of size (k + 1, 1), and ε of size (T, 1), with:
– E(ε) = 0,
– E(εε') = Ωε

where Ωε ≠ σε² I denotes the variance-covariance matrix of the errors. The fact that Ωε ≠ σε² I means there is autocorrelation and/or heteroskedasticity of the errors.
E β̂ = β
. (4.5)
that is:

. Ω_β̂ = (X'X)^{-1} X'Ωε X (X'X)^{-1}   (4.8)

This expression is different from that obtained when there is neither autocorrelation nor heteroskedasticity, i.e., σε² (X'X)^{-1}. It follows that autocorrelation
and/or heteroskedasticity implies that the OLS estimators are no longer of minimum
variance. It is therefore necessary to define other estimators: the generalized least
squares estimators.
. Y = Xβ + ε (4.9)
MY = MXβ + Mε
. (4.10)
M𝚪ε M ' = I
. (4.12)
we could apply OLS to model (4.10). The resulting estimators would then have the
same properties as in the usual case.
We saw in Chap. 3 that if 𝚪ε is a positive definite symmetric matrix, there exists a nonsingular—and therefore invertible—matrix P such that 𝚪ε = PP'. We can then set:

. M = P^{-1}   (4.15)

Furthermore:

. 𝚪ε^{-1} = (P')^{-1} P^{-1} = M'M   (4.16)

thus:

. β̃ = (X'𝚪ε^{-1} X)^{-1} X'𝚪ε^{-1} Y   (4.18)

or:

. β̃ = (X'Ωε^{-1} X)^{-1} X'Ωε^{-1} Y   (4.19)
The estimator β̃ given by Eq. (4.19) is called the generalized least squares (GLS) estimator (or Aitken estimator). Since it was obtained from expressions (4.11) and (4.12) and since the model (4.10) satisfies the assumptions required for the OLS, the GLS estimator β̃ is the best linear unbiased estimator of β in the model Y = Xβ + ε with E(εε') = Ωε. β̃ is therefore a BLUE estimator.
Let us determine the variance-covariance matrix Ω_β̃ of β̃:

. Ω_β̃ = E[(β̃ − β)(β̃ − β)'] = E[Mεε'M'] = MΩε M'   (4.20)
So:

. Ω_β̃ = (X'Ωε^{-1} X)^{-1} X'Ωε^{-1} Ωε Ωε^{-1} X (X'Ωε^{-1} X)^{-1}   (4.21)

that is:

. Ω_β̃ = (X'Ωε^{-1} X)^{-1}   (4.22)
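A minimal NumPy sketch of the GLS formulas (4.19) and (4.22) follows; it assumes that X already contains the column of ones and that the error variance-covariance matrix Omega is supplied by the user, and the function name is illustrative.

import numpy as np

def gls(X, Y, Omega):
    """Generalized least squares estimator, Eq. (4.19), and its covariance, Eq. (4.22)."""
    Omega_inv = np.linalg.inv(Omega)
    XtOX_inv = np.linalg.inv(X.T @ Omega_inv @ X)
    beta_tilde = XtOX_inv @ X.T @ Omega_inv @ Y
    return beta_tilde, XtOX_inv      # coefficients and variance-covariance matrix of beta_tilde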
We saw in Chap. 3 that the OLS estimator of the error variance was given by (see Eq. (3.261)):

. σ̂ε² = e'e / (T − k − 1)   (4.23)

Similarly, the GLS estimator of the error variance is written as:

. σ̃ε² = e'e / (T − k − 1)   (4.24)

where:

. e = MY − MXβ̃   (4.25)

that is:

. σ̃ε² = (Y − Xβ̃)' M'M (Y − Xβ̃) / (T − k − 1)   (4.27)

or:

. σ̃ε² = [Y'𝚪ε^{-1} Y − β̃' X'𝚪ε^{-1} Y] / (T − k − 1)   (4.29)
Therefore, assuming that the error term is normally distributed, all the tests
developed in the previous chapters can be applied here.
For the various formulas given above to be operational, the matrix .Ωε must be
known. In practice, this is not the case. Thus, to determine the matrix .Ωε , we need
to specify the analytical form of autocorrelation of errors and/or heteroskedasticity.
Given that the errors are unknown, it is from the residuals that we will look for such
analytic forms. We start by dealing with the problem of heteroskedasticity before
turning to that of autocorrelation.
Recall that heteroskedasticity is present when the terms on the diagonal of the error variance-covariance matrix are not identical:¹

. E(εε') = \begin{pmatrix} σ²_{ε1} & 0 & \cdots & 0 \\ 0 & σ²_{ε2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & σ²_{εT} \end{pmatrix}   (4.30)

We then have E(εε') = Ωε ≠ σε² I, which can be written as:

. E(εt²) = σ²_{εt}   (4.31)
for .t = 1, . . . , T . Note that .σε2 has been indexed by t, meaning that the variance
varies with t.
Heteroskedasticity can have several sources, including:
– The heterogeneity of the sample under consideration. This is the case, for
example, if the sample studied comprises a large number of countries, bringing
together developed countries and emerging or developing countries.
– The omission of an explanatory variable from the model.
with V(εt) = σ²_{εt}. Assume that the values of σ²_{εt} are known, for t = 1, . . . , T. We can then transform model (4.32) by dividing it by σ_{εt}:

. Yt/σ_{εt} = α/σ_{εt} + β1 X1t/σ_{εt} + β2 X2t/σ_{εt} + . . . + βk Xkt/σ_{εt} + εt/σ_{εt}   (4.33)

or, noting X̃it = Xit/σ_{εt}, for i = 1, . . . , k, Ỹt = Yt/σ_{εt} and ε̃t = εt/σ_{εt}:

. Ỹt = α/σ_{εt} + β1 X̃1t + β2 X̃2t + . . . + βk X̃kt + ε̃t   (4.34)
The point in transforming the original model is that the variance of the error term is now constant. Indeed:

. V(ε̃t) = V(εt/σ_{εt}) = (1/σ²_{εt}) V(εt) = σ²_{εt}/σ²_{εt} = 1   (4.35)
The error term of the transformed model is therefore homoskedastic, and it is then possible to apply the OLS technique to model (4.33). The GLS method thus consists in applying the OLS method to the transformed model. Note that this technique amounts to minimizing the residual sum of squares of the transformed model, i.e.:

. Min Σ_t ẽt² = Min Σ_t (et/σ_{εt})² = Min Σ_t ωt et²   (4.36)

with ẽt = Ỹt − α̃ − β̃1 X̃1t − β̃2 X̃2t − . . . − β̃k X̃kt and ωt = 1/σ²_{εt}. The factors ωt play the role of weights, and the GLS method involves minimizing a weighted residual sum of squares. For this reason, this technique is also called the weighted least squares method (WLS), WLS being only a special case of GLS in which the transformation matrix M is given by:

. M = \begin{pmatrix} 1/σ_{ε1} & 0 & \cdots & 0 \\ 0 & 1/σ_{ε2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/σ_{εT} \end{pmatrix}   (4.37)
In this example, we have assumed that .V (εt ) is known, which is generally not
the case in practice. We will see later what to do when the variance of the error is
unknown.
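A minimal sketch of this weighted least squares idea, assuming the vector of error standard deviations is known and passed as an argument (the function and argument names are illustrative):

import numpy as np

def wls(X, Y, sigma):
    """OLS applied to the model divided by sigma_t, in the spirit of Eq. (4.34).
    sigma: vector of known error standard deviations sigma_{eps,t}."""
    T = len(Y)
    Xc = np.column_stack([np.ones(T), X])
    Xt = Xc / sigma[:, None]        # each row divided by sigma_t (the constant becomes 1/sigma_t)
    Yt = Y / sigma
    return np.linalg.lstsq(Xt, Yt, rcond=None)[0]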
Various tests can be used to address the issue of heteroskedasticity. Before pre-
senting them, let us mention that a first intuition can be provided graphically. The
technique consists first in estimating the model considered by OLS as if there were
no heteroskedasticity. We then graphically represent the estimated values .Ŷt of .Yt
(on the x-axis) as a function of the series of squared residuals .et2 (on the y-axis).
Some examples are given in Figs. 4.1, 4.2, 4.3, 4.4 and 4.5.
These graphs allow us to detect whether the estimated mean value of the
dependent variable is systematically related to the squared residuals. If this is the
case, there is a presumption of heteroskedasticity. Figure 4.1 illustrates the absence
of heteroskedasticity in the sense that no particular relationship appears between the
squared residuals and the estimated variable. On the contrary, Figs. 4.2, 4.3, 4.4 and
4.5 highlight the existence of a relationship between the two variables, suggesting
there is heteroskedasticity: a linear relationship for Fig. 4.3, a quadratic relationship
according to Fig. 4.4, and a positive nonlinear relationship for Fig. 4.5.
It is also possible to produce graphs with the values of one of the explanatory
variables instead of the estimated values of the dependent variable on the x-axis,
with the squared residuals still shown on the y-axis. If the explanatory variable under
consideration and the squared residuals appear to be related, this is an indication in
favor of heteroskedasticity.
In addition to these graphical methods, there are a number of tests that we now
present.
The Goldfeld and Quandt test assumes that the variance of the error increases with one of the explanatory variables, for example, Xj. We then have a relationship of the type:

. σ²_{εt} = a X²_{jt}   (4.38)
where a is a positive constant. Such a relationship means that the greater the
values of .Xj , the greater .σε2t . If this is the case, it is an indication that there is
heteroskedasticity. More generally, the test is based on the idea that if we divide the
sample into two subsamples, then, under the assumption of homoskedasticity, the
error variances should be identical in both groups. Under the alternative assumption
of heteroskedasticity, they are different. To capture this, Goldfeld and Quandt
suggest a five-step test:
. GQ = RSS2 / RSS1   (4.39)
The power of the Goldfeld and Quandt test depends on the choice of m. Harvey
and Phillips (1973) suggest choosing a value of m close to .T /3.
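A minimal sketch of the mechanics of this test, under the usual reading of the procedure (sort the observations by the suspect variable, drop the m central observations, compare the two residual sums of squares); all names and the splitting convention are assumptions made for the example.

import numpy as np

def rss(X, Y):
    Xc = np.column_stack([np.ones(len(Y)), X])
    e = Y - Xc @ np.linalg.lstsq(Xc, Y, rcond=None)[0]
    return e @ e

def goldfeld_quandt(X, Y, order_by, m):
    """GQ statistic of Eq. (4.39), to be compared with a Fisher critical value."""
    idx = np.argsort(order_by)           # sort by the variable suspected of driving the variance
    X, Y = X[idx], Y[idx]
    n1 = (len(Y) - m) // 2               # size of each subsample after dropping m central points
    rss_a = rss(X[:n1], Y[:n1])
    rss_b = rss(X[-n1:], Y[-n1:])
    return max(rss_a, rss_b) / min(rss_a, rss_b)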
Remark 4.2 The Glejser test was criticized by Goldfeld and Quandt (1972), who pointed out that the error term ut in the various regressions did not have the right statistical properties, which implies that the conditions necessary for implementing the usual t-test of significance are not satisfied.
as it is based on the residuals from the OLS estimation of the regression model.
In the multiple regression model:
where f is any function, the coefficients .ai , .i = 1, . . . , p, are not related to the
coefficients of the regression model (4.41), and .Z1t , . . . , Zpt are variables likely
to be the source of heteroskedasticity. Some or all of these variables may be
explanatory variables in the regression model (4.41).
Testing the null hypothesis of homoskedasticity is equivalent to testing:
H0 : a1 = a2 = . . . = ap = 0
. (4.43)
σε2t = f (a0 )
. (4.44)
– Step 1. Regression (4.41) is estimated by OLS, and the residual series .et , t =
1, . . . , T is deduced.
– Step 2. The following quantity:

. σ̂²_{ML} = (1/T) Σ_{t=1}^{T} et²   (4.45)

is calculated; it is the maximum likelihood estimator of the error variance.
– Step 3. The series:

. ht = et² / σ̂²_{ML}   (4.46)

is computed for t = 1, . . . , T.
– Step 4. After specifying the variables .Z1t , . . . , Zpt , we regress .ht on these
variables:
ht = a0 + a1 Z1t + . . . + ap Zpt + ut
. (4.47)
where .ut is an error term. The explained sum of squares (ESS) of this regression
is calculated.
– Step 5. The quantity:

. BP = ESS / 2   (4.48)

is computed, which, under the null hypothesis of homoskedasticity, has a Chi-squared distribution with p degrees of freedom, i.e.:

. BP ∼ χ²_p   (4.49)
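A minimal sketch of these steps in NumPy follows; the function name is illustrative and the auxiliary regression is run on a user-supplied matrix Z of candidate variables.

import numpy as np

def breusch_pagan(e, Z):
    """BP statistic of Eq. (4.48): half the explained sum of squares of the
    auxiliary regression of h_t = e_t^2 / sigma2_ML on the variables Z (constant added)."""
    T = len(e)
    sigma2_ml = (e @ e) / T                      # Eq. (4.45)
    h = e**2 / sigma2_ml                         # Eq. (4.46)
    Zc = np.column_stack([np.ones(T), Z])
    h_fit = Zc @ np.linalg.lstsq(Zc, h, rcond=None)[0]
    ess = np.sum((h_fit - h.mean())**2)          # explained sum of squares of the auxiliary regression
    return ess / 2                               # compare with chi2(p), p = number of Z variables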
where .ut is an error term. It is also possible to add interaction terms such
as .X1t X2t to this regression. We calculate the coefficient of determination .R 2
associated with this auxiliary regression.
– Step 3. We test the null hypothesis of homoskedasticity:

. H0 : a1 = b1 = a2 = b2 = . . . = ak = bk = 0   (4.52)

Under the null hypothesis, the statistic TR² follows a Chi-squared distribution with 2k degrees of freedom:

. T R² ∼ χ²_{2k}   (4.53)
Remark 4.3 The White test can also be used as a model misspecification test.
Under the null hypothesis, the White test assumes that the errors are not only
homoskedastic, but also uncorrelated with the regressors and that the linear specifi-
cation of the model is correct. If one of these conditions is violated, the test statistic
is above the critical value. On the contrary, if the value of the test statistic is below
the critical value, this indicates that none of these three conditions is violated.
ARCH Test
ARCH (autoregressive conditionally heteroskedastic) processes were intro-
duced by Engle (1982) and are used to model series whose variance—also called
volatility—in t depends on its past values. This is therefore a particular form of
heteroskedasticity, called conditional heteroskedasticity. The test procedure can
be outlined in four steps:
. et² = a0 + Σ_{i=1}^{𝓁} ai e²_{t−i}   (4.55)

We test the null hypothesis:

. H0 : a1 = a2 = . . . = a𝓁 = 0   (4.56)

Under the null hypothesis of homoskedasticity, the statistic TR², where R² is the coefficient of determination of the regression (4.55), follows a Chi-squared distribution with 𝓁 degrees of freedom:

. T R² ∼ χ²_𝓁   (4.57)
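A minimal sketch of this LM-type computation in NumPy (the function name is illustrative, and the effective number of observations of the auxiliary regression is used in place of T, a common practical convention):

import numpy as np

def arch_test(e, lags):
    """ARCH statistic: (number of observations) * R^2 of the regression of e_t^2 on its lags, Eq. (4.57)."""
    e2 = e**2
    Y = e2[lags:]
    X = np.column_stack([e2[lags - i: len(e2) - i] for i in range(1, lags + 1)])
    Xc = np.column_stack([np.ones(len(Y)), X])
    Y_fit = Xc @ np.linalg.lstsq(Xc, Y, rcond=None)[0]
    r2 = 1 - np.sum((Y - Y_fit)**2) / np.sum((Y - Y.mean())**2)
    return len(Y) * r2            # compare with chi2(lags)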
Consider the regression model:

. Y = Xβ + ε   (4.58)

whose tth observation is written:

. Yt = X't β + εt   (4.59)

where X't denotes a column vector equal to the transpose of the tth row of the matrix X.
The estimator Ω̂^W_ε of the variance-covariance matrix proposed by White (1980) is written as:

. Ω̂^W_ε = [T/(T − k − 1)] (X'X)^{-1} [Σ_{t=1}^{T} et² Xt X't] (X'X)^{-1}   (4.60)

with:
. Σ̂^{NW} = [T/(T − k − 1)] { Σ_{t=1}^{T} et² Xt X't + Σ_{j=1}^{q} [1 − j/(q + 1)] Σ_{t=j+1}^{T} (Xt et e_{t−j} X'_{t−j} + X_{t−j} e_{t−j} et X't) }   (4.62)
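As an illustration, a minimal NumPy sketch of a heteroskedasticity-robust covariance of the OLS estimator, in the spirit of Eq. (4.60), is given below (the Newey-West version would add the lagged cross-product terms of Eq. (4.62)); X is assumed to already contain the column of ones and the function name is illustrative.

import numpy as np

def white_cov(X, Y):
    """Heteroskedasticity-consistent covariance of the OLS estimator, sandwich form of Eq. (4.60)."""
    T, K = X.shape                                   # K = k + 1 parameters
    XtX_inv = np.linalg.inv(X.T @ X)
    e = Y - X @ XtX_inv @ X.T @ Y                    # OLS residuals
    meat = (X * e[:, None]**2).T @ X                 # sum_t e_t^2 X_t X_t'
    cov = T / (T - K) * XtX_inv @ meat @ XtX_inv     # degrees-of-freedom correction T/(T-k-1)
    return cov, np.sqrt(np.diag(cov))                # covariance matrix and robust standard errors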
Assume that heteroskedasticity is such that the variance of the error term is
proportional to .Xj2t where .Xj t is one of the explanatory variables of the regression
model (4.64), i.e.:
σε2t = aXj2t
. (4.65)
The transformed model is obtained by dividing the original model by Xjt; its error term is ut = εt/Xjt, whose variance is given by:

. E(ut²) = E[(εt/Xjt)²] = (1/X²_{jt}) E(εt²)   (4.67)

According to (4.65), E(εt²) = a X²_{jt}, hence:

. E(ut²) = a   (4.68)
The variance of the transformed error term .ut is constant and it is, therefore,
possible to apply OLS to the transformed model (4.66).
Thus, assume now that the variance of the error term is proportional to Xjt, with a being a constant. The transformed model is then obtained by dividing the original model by √Xjt. The variance of the transformed error term ut = εt/√Xjt is given by:

. E(ut²) = E[(εt/√Xjt)²] = (1/Xjt) E(εt²) = a   (4.71)
The reduction in heteroskedasticity comes from the fact that the logarithmic
transformation “compresses” the scales on which the variables are measured.
Let us use the series of stock market returns studied in the previous chapter and
consider the following model at monthly frequency over the period from February
1984 to June 2021 (449 observations):
where RFTSE denotes the returns of the FTSE 100 index of the London Stock Exchange and RDJIND the returns of the Dow Jones Industrial Average index of the New York Stock Exchange (see Table 4.1). The series are extracted from the Macrobond database.
The OLS estimation of model (4.73) leads to the results reported in Table 4.2.
Let us apply the various homoskedasticity tests previously presented.
. GQ = RSS2 / RSS1   (4.74)
where .RSS1 and .RSS2 are the sums of squares of the residuals from each of the
two regressions, with .RSS2 > RSS1 . We see that the residual sum of squares
corresponding to the model estimated on the first 150 observations (Table 4.4)
is greater than that relating to the model estimated on the last 150 observations
(Table 4.5). We therefore have .RSS1 = 0.105055 and .RSS2 = 0.127747. So:
GQ = 1.2160
. (4.75)
is rejected.
We then calculate the series of residuals in absolute values .|et | . We regress .|et | on
the explanatory variable RDJ I ND or on various transformations of this variable.
Consider, for example, the following two models:
– .|et | = â0 + â1 RDJ I NDt OLS estimation of this model leads to the following
results:
– .|et | = â0 + â1 (RDJ I NDt )−1 OLS estimation of this regression yields:
In both models, the values in parentheses are the t-statistics associated with the
estimated coefficients.
We proceed to test the null hypothesis of homoskedasticity according to which
.a1 = 0. To test this hypothesis, we compare the absolute values of the t-statistics of
the coefficient .a1 with the value read from the Student’s t table (1.96 at the 5%
significance level). We deduce that such a hypothesis is not rejected by models
(4.77) and (4.78).
We calculate:
. σ̂²_{ML} = (1/T) Σ_{t=1}^{T} et²   (4.80)

that is:

. σ̂²_{ML} = 7.9438 × 10^{-4}   (4.81)

We then compute the series:

. ht = et² / σ̂²_{ML}   (4.82)

and regress it on a constant and the explanatory variable RDJIND. We calculate the explained sum of squares of this regression, i.e., ESS = 10.80, and the Breusch-Pagan statistic:

. BP = ESS / 2 = 5.40   (4.84)
Under the null hypothesis of homoskedasticity, the BP statistic follows a Chi-
squared distribution with 1 degree of freedom (since only one explanatory variable
is included in the regression (4.83)). At the 5% significance level, the critical value
read from the Chi-squared table is equal to 3.841. Consequently, .5.40 > 3.841,
which means we reject the null hypothesis of homoskedasticity.
which gives:
ARCH Test
For the estimation of model (4.73) giving the series of residuals .et , we derive the
series of squared residuals .et2 . The .et2 series is then regressed on a constant and its
past values. Using three lags we obtain the following results:
Heteroskedasticity-Corrected Estimations
Tables 4.6 and 4.7 report the results from the OLS estimation of relationship (4.73),
the variance-covariance matrix estimators being given by White (Table 4.6) and by
Newey and West (Table 4.7). These two techniques allow for heteroskedasticity by
correcting the estimators of the variances and covariances of the OLS estimators.
Thus, the estimated values of the coefficients are identical to those shown in
Table 4.2, but the standard deviations of the coefficients (and therefore t-statistics)
are different. The coefficient associated with the RDJ I N D variable remains
significantly different from zero, both with the White correction and with the
Newey-West correction.
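A minimal sketch of how such corrected estimations can be obtained with statsmodels: the coefficient estimates are those of OLS, and only the covariance matrix (hence the standard deviations and t-statistics) changes. The simulated series merely stand in for the actual returns:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 449
x = rng.normal(size=T)                          # stand-in for RDJIND returns
y = 0.01 + 0.6 * x + rng.normal(scale=0.03 * (1 + np.abs(x)), size=T)
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                        # classical OLS covariance
white = sm.OLS(y, X).fit(cov_type="HC0")        # White heteroskedasticity-consistent covariance
nw = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # Newey-West (HAC) covariance

for name, res in [("OLS", ols), ("White", white), ("Newey-West", nw)]:
    print(name, "std. errors:", np.round(res.bse, 4), "t-stats:", np.round(res.tvalues, 2))
```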
There is autocorrelation when the terms off the diagonal of the variance-covariance
matrix of the errors are not all zero:3
           ⎛ σ_ε²           Cov(ε₁, ε₂)    ⋯   Cov(ε₁, ε_T) ⎞
E(εε′) =   ⎜ Cov(ε₂, ε₁)    σ_ε²           ⋯   Cov(ε₂, ε_T) ⎟      (4.88)
           ⎜ ⋮              ⋮              ⋱   ⋮            ⎟
           ⎝ Cov(ε_T, ε₁)   Cov(ε_T, ε₂)   ⋯   σ_ε²         ⎠
γ₀ = E(ε_t²) = σ_ε²   (4.90)

and, more generally, the autocorrelation coefficients of the errors are ρ_h = γ_h / γ₀, where γ_h = Cov(ε_t, ε_{t−h}), for h = 0, ±1, ±2, …. The variance-covariance matrix of the errors is written as:
                 ⎛ 1        ρ₁       ⋯   ρ_{h−1} ⎞
E(εε′) = σ_ε²    ⎜ ρ₁       1        ⋯   ρ_{h−2} ⎟      (4.93)
                 ⎜ ⋮        ⋮        ⋱   ⋮       ⎟
                 ⎝ ρ_{h−1}  ρ_{h−2}  ⋯   1       ⎠
We then have E(εε′) = Ω_ε ≠ σ_ε² I.
Remark 4.4 When working with time series, the autocorrelation in question is
of a temporal type: the error term at a given date depends on the same error
term at another date. When working with cross-sectional data, we speak of spatial
autocorrelation; the correlation being in space rather than time.
– the omission of one or more important explanatory variables from the model.
It should be remembered that the error term can be interpreted as a set of
explanatory variables not included in the model. Consequently, omitting one
or more explanatory variables may result in autocorrelation of the error term,
particularly if the omitted explanatory variables are themselves autocorrelated.
– The first-differencing of the model. If the original model is:

Y_t = α + βX_t + ε_t   (4.94)

taking first differences, where Δ is the first-difference operator such that ΔZ_t = Z_t − Z_{t−1}, leads to:

ΔY_t = βΔX_t + Δε_t   (4.96)

Such a first-difference transformation can produce autocorrelation of the errors in that, if the error term of (4.94) is non-autocorrelated, the error term Δε_t of (4.96) exhibits autocorrelation.
– The nonstationarity of the series considered. If the regression model selected
involves nonstationary series and the error term is itself nonstationary, the latter
will be characterized by the presence of autocorrelation (see, for example, Lardic
and Mignon, 2002 for details).
Positive autocorrelation is present when the error term moves either upwards or downwards over a fairly long period.
Conversely, negative autocorrelation occurs when a positive value of the error term
is followed by a negative value, then a positive value, and so on.
Consider the simple regression model:

Y_t = α + βX_t + ε_t   (4.97)

with:

ε_t = ρε_{t−1} + u_t   (4.98)

where |ρ| < 1, E(ε) = 0, and E(εε′) ≠ σ_ε² I. The term u_t is assumed to be white noise.
The process (4.98) is called a first-order autoregressive process, denoted AR(1): the error term at t is a function of itself at t − 1. In other words, this process has a memory of one period, each error depending on the error of the immediately preceding date.
Let us make Ω_ε explicit. To this end, we iterate Equation (4.98), which gives:

ε_t = ρ(ρε_{t−2} + u_{t−1}) + u_t = ρ²ε_{t−2} + ρu_{t−1} + u_t

Continuing, we have:

ε_t = ρ³ε_{t−3} + ρ²u_{t−2} + ρu_{t−1} + u_t

that is, finally:

ε_t = u_t + ρu_{t−1} + ρ²u_{t−2} + …   (4.102)
Let us now compute the variance of the error term. To do this, let us square both sides of Eq. (4.102) and take the expectation:

E(ε_t²) = E[(u_t + ρu_{t−1} + ρ²u_{t−2} + …)²]   (4.104)

that is, since u_t is white noise:

E(ε_t²) = σ_ε² = σ_u² / (1 − ρ²)   (4.105)
Furthermore, we have:

E(ε_t ε_{t−1}) = ρσ_ε²

and:

E(ε_t ε_{t−2}) = ρ²σ_ε²

hence, more generally:

E(ε_t ε_{t−h}) = ρ^h σ_ε²

that is:
             ⎛ 1         ρ         ρ²        ⋯   ρ^{T−1} ⎞
             ⎜ ρ         1         ρ         ⋯   ρ^{T−2} ⎟
Ω_ε = σ_ε²   ⎜ ρ²        ρ         1         ⋯   ρ^{T−3} ⎟      (4.112)
             ⎜ ⋮         ⋮         ⋮         ⋱   ⋮       ⎟
             ⎝ ρ^{T−1}   ρ^{T−2}   ρ^{T−3}   ⋯   1       ⎠

with σ_ε² = σ_u² / (1 − ρ²).
Transferring this expression to (4.99), we can show that the variance of the OLS
estimator is written as:
V(β̂_OLS) = (σ_ε² / Σ_{t=1}^{T} x_t²) [ 1 + 2ρ (Σ_{t=2}^{T} x_t x_{t−1} / Σ_{t=1}^{T} x_t²) + 2ρ² (Σ_{t=3}^{T} x_t x_{t−2} / Σ_{t=1}^{T} x_t²) + … + 2ρ^{T−1} (x₁ x_T / Σ_{t=1}^{T} x_t²) ]   (4.113)
When Ω_ε is known, the GLS estimator can be used:

β̃ = (X′Ω_ε⁻¹X)⁻¹ X′Ω_ε⁻¹Y   (4.114)

In this case, we know that the variance of the estimator is written as:

Ω_β̃ = (X′Ω_ε⁻¹X)⁻¹   (4.115)
Recall that, in the absence of autocorrelation, the variance of the OLS estimator of the slope coefficient in the simple regression model is:

V(β̂) = σ_ε² / Σ_{t=1}^{T} x_t²   (4.117)
As we can see from expression (4.113), this is no longer the true variance when
there is error autocorrelation. Consequently, using the usual OLS formulas when
there is autocorrelation leads to an erroneous evaluation of the variance of the
estimators and, consequently, of their t-statistics. The results of the significance
tests can then be significantly affected, leading to incorrect interpretations.
All in all, it should be recalled that when there is autocorrelation of errors, the
GLS method should be used. However, as in the case of heteroskedasticity, GLS can
only be used if the variance-covariance matrix of the errors .Ωε is known. In practice,
this is generally not the case, which is why we will present operational estimation
procedures in the following.
Consider, for example, the following sequence of signs of the residuals:

−−−−−−− +++++++++++++ −−−−−−−−−−
Here we have three runs: a negative run of length 7, a positive run of length
13, and then a negative run of length 10. We wonder whether these three runs are
from a purely random series of 30 observations. Intuitively, we might think that if
the number of runs is very large, the residuals frequently change sign, which is an
indication in favor of a negative autocorrelation of the residuals. Similarly, if the
number of runs is very small, indicating that residuals rarely change sign, this may
indicate a positive autocorrelation of residuals.
Under the null hypothesis of independence of the observations (here, of the residuals), and denoting T₁ and T₂ the numbers of positive and negative residuals (T = T₁ + T₂), with T₁ > 10 and T₂ > 10, the number of runs follows a normal distribution with mean:
E(R) = 1 + 2T₁T₂ / T   (4.118)
and variance:
σ_R² = 2T₁T₂(2T₁T₂ − T) / [T²(T − 1)]   (4.119)
At the 5% significance level, the confidence interval for the number of runs is:

E(R) ± 1.96σ_R   (4.120)
– If the number of runs R lies within the confidence interval, the null hypothesis is
not rejected.
– If the number of runs R lies outside the confidence interval, the null hypothesis
is rejected.
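Following the decision rule above, here is a minimal sketch of the runs test on a residual series (the residuals are simulated; only the counting and the normal approximation of Eqs. (4.118)–(4.120) are shown):

```python
import numpy as np

rng = np.random.default_rng(4)
e = rng.normal(size=200)                      # stand-in for the regression residuals

signs = np.sign(e)
signs[signs == 0] = 1                         # treat exact zeros as positive
T1 = int((signs > 0).sum())                   # number of positive residuals
T2 = int((signs < 0).sum())                   # number of negative residuals
T = T1 + T2
R = 1 + int((np.diff(signs) != 0).sum())      # number of runs = 1 + number of sign changes

ER = 1 + 2 * T1 * T2 / T                      # Eq. (4.118)
VR = 2 * T1 * T2 * (2 * T1 * T2 - T) / (T ** 2 * (T - 1))   # Eq. (4.119)
low, high = ER - 1.96 * np.sqrt(VR), ER + 1.96 * np.sqrt(VR)
print(f"R = {R}, 95% interval under independence: [{low:.1f}, {high:.1f}]")
print("independence not rejected" if low <= R <= high else "independence rejected")
```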
The Durbin-Watson test statistic is written as:

DW = Σ_{t=2}^{T} (e_t − e_{t−1})² / Σ_{t=1}^{T} e_t²   (4.121)
where .et are the residuals resulting from the estimation of the regression model
(simple or multiple). It allows us to test the null hypothesis of no first-order
autocorrelation of the residuals against the alternative hypothesis of first-order
autocorrelation of the residuals. If we assume that the error term follows a first-
order autoregressive process:
ε_t = ρε_{t−1} + u_t   (4.122)

the hypotheses of the test are:

H₀ : ρ = 0   (4.123)

H₁ : ρ ≠ 0   (4.124)
Expanding the numerator of (4.121), we can write:

DW = [Σ_{t=2}^{T} e_t² + Σ_{t=2}^{T} e_{t−1}² − 2 Σ_{t=2}^{T} e_t e_{t−1}] / Σ_{t=1}^{T} e_t²   (4.125)
For T sufficiently large, the sums Σ_{t=2}^{T} e_{t−1}² and Σ_{t=2}^{T} e_t² are both close to Σ_{t=1}^{T} e_t², so that:

DW ≃ [2 Σ_{t=2}^{T} e_t² − 2 Σ_{t=2}^{T} e_t e_{t−1}] / Σ_{t=1}^{T} e_t²   (4.126)
that is:

DW ≃ 2 [1 − Σ_{t=2}^{T} e_t e_{t−1} / Σ_{t=1}^{T} e_t²]   (4.127)
Given that E(e_t) = 0, the term Σ_{t=2}^{T} e_t e_{t−1} / Σ_{t=1}^{T} e_t² represents the estimate of the first-order autocorrelation coefficient of the residual series. In other words, this is the estimate
of the coefficient .ρ in the regression of .et on .et−1 . Let us denote this estimated
coefficient .ρ̂. We can write:
DW ≃ 2(1 − ρ̂)   (4.128)
The expression (4.128) shows that there is a relationship between the statistic DW
and the first-order autocorrelation coefficient of the residuals. Furthermore, this
relationship allows us to highlight various characteristics of the statistic DW :
– Given that a coefficient of autocorrelation varies between .−1 and 1, the statistic
DW varies between 0 and 4. It is 0 when there is perfect positive autocorrelation
(.ρ̂ = 1) and 4 when there is perfect negative autocorrelation (.ρ̂ = −1).
– When .DW ≃ 2, the residuals are not autocorrelated (.ρ̂ = 0).
– When .DW > 2, the autocorrelation of the residuals is negative.
– When .DW < 2, the autocorrelation of the residuals is positive.
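A minimal sketch of this computation on a simulated AR(1) residual series, showing DW (Eq. (4.121)) and the implied ρ̂ of (4.128):

```python
import numpy as np

rng = np.random.default_rng(5)
T, rho = 400, 0.5
u = rng.normal(size=T)
e = np.zeros(T)                               # simulated AR(1) residuals: e_t = rho*e_{t-1} + u_t
for t in range(1, T):
    e[t] = rho * e[t - 1] + u[t]

DW = np.sum(np.diff(e) ** 2) / np.sum(e ** 2) # Eq. (4.121)
rho_hat = 1 - DW / 2                          # Eq. (4.128)
print(f"DW = {DW:.3f}, implied rho_hat = {rho_hat:.3f}")
```

The same statistic is also available in statsmodels as statsmodels.stats.stattools.durbin_watson.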
To carry out the test, Durbin and Watson tabulated a lower bound and an upper
bound for the critical values of the statistic DW as a function of the number of
observations and the number of explanatory variables k included in the model under
consideration. The table thus gives two values .d1 (lower bound) and .d2 (upper
bound) allowing us to perform the test according to the table below.
– If 0 ≤ DW < d₁: rejection of H₀ (positive autocorrelation, ρ > 0).
– If d₁ ≤ DW < d₂: region of doubt (inconclusive).
– If d₂ ≤ DW < 4 − d₂: non-rejection of H₀ (ρ = 0).
– If 4 − d₂ ≤ DW < 4 − d₁: region of doubt (inconclusive).
– If 4 − d₁ ≤ DW ≤ 4: rejection of H₀ (negative autocorrelation, ρ < 0).
We can see that there are two regions of “doubt” or indecision. In practice, if
we find ourselves in one of these regions, we tend to reject the null hypothesis
of no autocorrelation. This is because the consequences of not rejecting the null
hypothesis of no autocorrelation even though it is false are considered more “severe”
than the consequences of wrongly assuming the absence of autocorrelation. Thus, in
practice, when in a region of doubt, we use the upper bound .d2 as if it were a usual
critical value: we reject the null hypothesis of no autocorrelation if .DW < d2 . The
region of doubt decreases as the sample size increases.
The Durbin-Watson test is very frequently used. However, it is important to
specify certain conditions of use:
– The regression model must include a constant term. The critical values given
in the tables of Durbin and Watson have indeed been tabulated assuming the
presence of a constant in the regression model.4
4 Let us mention, however, that Farebrother (1980) tabulated the critical values of the statistic DW
in the absence of a constant term.
With quarterly data, it may be relevant to test for fourth-order autocorrelation of the residuals using the statistic:

DW₄ = Σ_{t=5}^{T} (e_t − e_{t−4})² / Σ_{t=1}^{T} e_t²   (4.129)

which corresponds to an error term of the form:

ε_t = ρ₄ε_{t−4} + u_t   (4.130)
Wallis (1972) derived tables of critical values including an upper and lower
bound for .DW4 (see also Giles and King, 1978).
When the lagged dependent variable appears among the explanatory variables (model (4.131)), the Durbin-Watson statistic is biased towards 2 and Durbin's h test should be used instead, the error term being assumed to follow:

ε_t = ρε_{t−1} + u_t   (4.132)

The test statistic is written as:

h = ρ̂ √[T / (1 − T·V(φ̂₁))]   (4.133)

where V(φ̂₁) denotes the estimated variance of the coefficient associated with Y_{t−1} in regression (4.131) and ρ̂ is the estimator of the first-order autoregressive coefficient:

ρ̂ = Σ_{t=2}^{T} e_t e_{t−1} / Σ_{t=1}^{T} e_t²   (4.134)
Under the null hypothesis of no autocorrelation at order 1 .(ρ = 0), the Durbin
statistic has a standard normal distribution, i.e.:
h ∼ N(0, 1)   (4.135)
– If .|h| < 1.96, the null hypothesis of no autocorrelation at order 1 is not rejected.
– If .|h| > 1.96, the null hypothesis of no autocorrelation at order 1 is rejected.
When T·V(φ̂₁) > 1, the statistic h cannot be computed. Durbin then proposed an alternative, asymptotically equivalent procedure:
– Estimate model (4.131) by OLS and derive the residual series e_t.
– Estimate by OLS the regression of .et on .et−1 , Yt−1 , . . . , Yt−p , X1t , . . . ,
Xkt .
– Perform a test of significance (t-test) on the coefficient associated with .et−1 . If
this coefficient is significantly different from zero, the residuals are autocorre-
lated to order 1.
The Breusch-Godfrey test is a Lagrange multiplier test based on the search for a
relationship between the errors εt , t = 1, . . . , T .
Suppose that the error term ε_t of the multiple regression model follows an autoregressive process of order p (Eq. (4.137)). The null hypothesis of no autocorrelation is then:

H₀ : φ₁ = φ₂ = … = φ_p = 0   (4.138)

The test is based on the auxiliary regression of the residuals e_t on the explanatory variables of the model and on e_{t−1}, …, e_{t−p}. Denoting R² the coefficient of determination of this auxiliary regression, the test statistic is:

BG = (T − p)R²   (4.140)

Under the null hypothesis, BG follows a Chi-squared distribution with p degrees of freedom.
Remark 4.7 In the previous developments, it has been assumed that the error term ε_t of the multiple regression model follows an autoregressive process of order p (Eq. (4.137)). The Breusch-Godfrey test can also be applied in the case where the error process follows a moving average process of order p, denoted MA(p).
The Box-Pierce test examines the null hypothesis that the first H autocorrelation coefficients of the residuals are all equal to zero, against the alternative hypothesis that there is at least one coefficient ρ_h(e_t) significantly different from zero.
The test statistic is written as:

BP(H) = T Σ_{h=1}^{H} ρ̂²_h(e_t)   (4.143)

where ρ̂_h denotes the estimated autocorrelation coefficient of order h of the residuals:

ρ̂_h = Σ_{t=h+1}^{T} e_t e_{t−h} / Σ_{t=1}^{T} e_t²   (4.144)
5 It is assumed here that the lagged dependent variable is not among the explanatory variables. We will come back to the Box-Pierce test in Chap. 7.
The Ljung-Box test is a variant whose distribution is closer to that of the Chi-squared in small samples than is that of the Box-Pierce test. The test statistic is written as:

LB(H) = T(T + 2) Σ_{h=1}^{H} ρ̂²_h(e_t) / (T − h)   (4.146)
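A minimal sketch of how these autocorrelation tests can be run in practice: the Breusch-Godfrey test via the statsmodels helper acorr_breusch_godfrey (whose statistic may differ slightly from (4.140) in its small-sample scaling), and the Ljung-Box statistic computed directly from (4.146). The data are simulated:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey
from scipy import stats

rng = np.random.default_rng(6)
T = 300
x = rng.normal(size=T)
eps = np.zeros(T)
u = rng.normal(scale=0.5, size=T)
for t in range(1, T):                          # AR(1) errors to make autocorrelation visible
    eps[t] = 0.6 * eps[t - 1] + u[t]
y = 1.0 + 0.8 * x + eps

res = sm.OLS(y, sm.add_constant(x)).fit()

# Breusch-Godfrey test with p = 2 lags
lm, lm_pval, fstat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(f"Breusch-Godfrey LM = {lm:.2f}, p-value = {lm_pval:.4f}")

# Ljung-Box statistic computed from Eq. (4.146) with H = 12
e = res.resid
H = 12
rho = np.array([np.sum(e[h:] * e[:-h]) / np.sum(e ** 2) for h in range(1, H + 1)])
LB = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, H + 1)))
print(f"Ljung-Box LB({H}) = {LB:.2f}, chi2({H}) 5% critical value = {stats.chi2.ppf(0.95, H):.2f}")
```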
In the presence of error autocorrelation, the OLS estimators remain unbiased, but
are no longer of minimum variance. As in the case of heteroskedasticity, this has the
consequence of affecting the precision of the tests. So how can we correct for error
autocorrelation?
To answer this question, we need to distinguish between cases where the variance
of the error term is known and those where it is unknown. When the variance of the
error term is known, we have seen (see Sect. 4.1.2) that the GLS method should be
applied in the presence of autocorrelation. When the variance of the error term is
unknown, various methods are available, which we describe below.
Case Where the Variance of the Error Term Is Known: General Principle
of GLS
Consider the multiple regression model:
Y = Xβ + ε   (4.148)

with E(εε′) = Ω_ε.
As we have seen previously (see Sect. 4.1.2), the GLS method can be applied provided we find a transformation matrix M of known parameters such that:

M′M = Γ_ε⁻¹   (4.149)

with:

Γ_ε = (1/σ_ε²) Ω_ε   (4.150)
It is then sufficient to apply OLS to the transformed variables .MY and .MX. To
get a clearer picture, let us consider the simple regression model:
Y_t = α + βX_t + ε_t   (4.151)

and assume that the error term follows a first-order autoregressive process (AR(1)), i.e.:

ε_t = ρε_{t−1} + u_t   (4.152)

with σ_ε² = σ_u² / (1 − ρ²).
If ρ is known, the GLS estimator:

β̃ = (X′Ω_ε⁻¹X)⁻¹ X′Ω_ε⁻¹Y   (4.154)

can be computed. The matrix M is here the quasi-differencing matrix of dimension (T − 1, T), whose t-th row contains −ρ in position t, 1 in position t + 1, and zeros elsewhere.
Then we have:
        ⎛ ρ²    −ρ      0      0    ⋯   0   ⎞
        ⎜ −ρ    1+ρ²    −ρ     0    ⋯   0   ⎟
M′M =   ⎜ 0     −ρ      1+ρ²   −ρ   ⋯   0   ⎟      (4.157)
        ⎜ ⋮             ⋱      ⋱    ⋱   ⋮   ⎟
        ⎜ 0     0       ⋯      −ρ  1+ρ² −ρ  ⎟
        ⎝ 0     0       ⋯      0    −ρ   1  ⎠

M′M is identical to σ_u²Ω_ε⁻¹, except for the first element of the diagonal (ρ² instead of 1).
By applying the matrix M to model (4.151), we obtain the transformed variables:

       ⎛ Y₂ − ρY₁      ⎞
MY =   ⎜ Y₃ − ρY₂      ⎟      (4.158)
       ⎜ ⋮             ⎟
       ⎝ Y_T − ρY_{T−1} ⎠

and

       ⎛ 1   X₂ − ρX₁      ⎞
MX =   ⎜ 1   X₃ − ρX₂      ⎟      (4.159)
       ⎜ ⋮   ⋮             ⎟
       ⎝ 1   X_T − ρX_{T−1} ⎠
The GLS method amounts to applying the OLS to the regression model formed
by the .(T − 1) transformed observations .MY and .MX :
⎛ Y₂ − ρY₁      ⎞     ⎛ 1   X₂ − ρX₁      ⎞                  ⎛ u₂ ⎞
⎜ Y₃ − ρY₂      ⎟  =  ⎜ 1   X₃ − ρX₂      ⎟  ⎛ α(1 − ρ) ⎞  + ⎜ u₃ ⎟      (4.160)
⎜ ⋮             ⎟     ⎜ ⋮   ⋮             ⎟  ⎝ β        ⎠    ⎜ ⋮  ⎟
⎝ Y_T − ρY_{T−1} ⎠    ⎝ 1   X_T − ρX_{T−1} ⎠                 ⎝ u_T ⎠
Remark 4.8 In order not to lose the first observation, we can add a first row to the matrix M. This first row is such that all its elements are zero, except the first one, which is equal to √(1 − ρ²).
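A minimal sketch of this transformation when ρ is known: quasi-differencing the data and applying OLS as in (4.160), together with the Prais-Winsten treatment of the first observation mentioned in Remark 4.8. The data are simulated:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
T, alpha, beta, rho = 300, 1.0, 0.8, 0.6
x = rng.normal(size=T)
eps = np.zeros(T)
u = rng.normal(scale=0.5, size=T)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]
y = alpha + beta * x + eps

# Quasi-differenced variables (Eq. 4.160), dropping the first observation
y_star = y[1:] - rho * y[:-1]
x_star = x[1:] - rho * x[:-1]
gls = sm.OLS(y_star, sm.add_constant(x_star)).fit()
alpha_hat = gls.params[0] / (1 - rho)          # the intercept of (4.160) is alpha*(1 - rho)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {gls.params[1]:.3f}")

# Prais-Winsten: keep the first observation, rescaled by sqrt(1 - rho^2) (Remark 4.8)
y_pw = np.r_[np.sqrt(1 - rho ** 2) * y[0], y_star]
x_pw = np.r_[np.sqrt(1 - rho ** 2) * x[0], x_star]
const_pw = np.r_[np.sqrt(1 - rho ** 2), np.full(T - 1, 1 - rho)]
pw = sm.OLS(y_pw, np.column_stack([const_pw, x_pw])).fit()
print(f"Prais-Winsten: alpha_hat = {pw.params[0]:.3f}, beta_hat = {pw.params[1]:.3f}")
```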
When ρ is unknown, it must first be estimated. Once this estimation has been made, the method previously described can be applied by replacing ρ by its estimator ρ̂ in the transformed model. Various methods are available for this purpose and are discussed below.
Case Where the Variance of the Error Term Is Unknown: Pseudo GLS
Methods
We can distinguish iterative methods from other techniques. These different meth-
ods are called pseudo GLS methods. Generally speaking, they consist of estimating
the parameters of the residuals’ generating model, transforming the variables of
the model using these parameters, and applying OLS to the model formed by the
variables thus transformed.
Non-iterative Methods
Among the non-iterative methods, it is possible to find the estimator .ρ̂ of the
coefficient .ρ in two different ways: by relying on the Durbin-Watson statistic or
by performing regressions using residuals.
The first approach relies on the approximate relationship (4.128), DW ≃ 2(1 − ρ̂), where ρ̂ denotes the estimate of ρ in the regression of the residuals e_t on e_{t−1}. Using this expression leads directly to the estimator:

ρ̂ ≃ 1 − DW/2   (4.162)

Once this estimator has been obtained, we transform the variables accordingly (Y_t − ρ̂Y_{t−1}, X_t − ρ̂X_{t−1}).
The second approach consists in estimating ρ directly by regressing the residuals e_t on e_{t−1}, which gives:

ρ̂ = Σ_{t=2}^{T} e_t e_{t−1} / Σ_{t=1}^{T} e_t²   (4.164)
It then remains for us to transform the variables and apply OLS to the transformed
model.6
Iterative Methods
Various iterative pseudo GLS techniques are available to estimate the coefficient
.ρ. The best known are those of Cochrane and Orcutt (1949) and Hildreth and Lu
(1960).
– Step 1. The regression model under consideration is estimated and the residuals
.et are deduced. An initial estimate .ρ̂0 of .ρ is obtained:
ρ̂₀ = Σ_{t=2}^{T} e_t e_{t−1} / Σ_{t=1}^{T} e_t²   (4.165)
– Step 2. The transformed variables .Yt − ρ̂0 Yt−1 and .Xit − ρ̂0 Xit−1 are constructed
for .i = 1, . . . , k, with k denoting the number of explanatory variables.
– Step 3. OLS is applied to the model in quasi-differences:
Y_t − ρ̂₀Y_{t−1} = α(1 − ρ̂₀) + β₁(X_{1t} − ρ̂₀X_{1,t−1}) + … + β_k(X_{kt} − ρ̂₀X_{k,t−1}) + u_t   (4.166)
– Step 4. From the new estimation residuals e_t⁽¹⁾, a new estimate ρ̂₁ of ρ is computed:

ρ̂₁ = Σ_{t=2}^{T} e_t⁽¹⁾ e_{t−1}⁽¹⁾ / Σ_{t=1}^{T} (e_t⁽¹⁾)²   (4.167)
– Step 5. We construct the transformed variables .Yt − ρ̂1 Yt−1 and .Xit − ρ̂1 Xit−1
and apply the OLS to the model in quasi-differences:
Y_t − ρ̂₁Y_{t−1} = α(1 − ρ̂₁) + β₁(X_{1t} − ρ̂₁X_{1,t−1}) + … + β_k(X_{kt} − ρ̂₁X_{k,t−1}) + u_t   (4.168)
6 It is unnecessary to introduce a constant term in the regression of e_t on e_{t−1} since the mean of the residuals is zero.
A new set of residuals e_t⁽²⁾ is deduced, from which a new estimate ρ̂₂ of ρ is obtained, and so on, until the successive estimates of ρ become stable.
Remark 4.9 We previously noted (see Remark 4.8) that it was possible not to omit
the first observation during the variable transformation step. When this observation
is not omitted, the method of Cochrane-Orcutt is slightly modified and is called the
Prais-Winsten method (see Prais and Winsten, 1954).
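A minimal sketch of the Cochrane-Orcutt iterations on simulated data; the loop simply follows the steps listed above (statsmodels also offers the GLSAR class for this purpose):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
T, alpha, beta, rho_true = 300, 1.0, 0.8, 0.6
x = rng.normal(size=T)
eps = np.zeros(T)
u = rng.normal(scale=0.5, size=T)
for t in range(1, T):
    eps[t] = rho_true * eps[t - 1] + u[t]
y = alpha + beta * x + eps

res = sm.OLS(y, sm.add_constant(x)).fit()             # Step 1: initial OLS estimation
rho = 0.0
for it in range(20):
    e = res.resid
    rho_new = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2) # estimate of rho from the residuals
    # Steps 2-3: quasi-difference the variables and re-estimate by OLS
    y_star = y[1:] - rho_new * y[:-1]
    x_star = x[1:] - rho_new * x[:-1]
    res = sm.OLS(y_star, sm.add_constant(x_star)).fit()
    if abs(rho_new - rho) < 1e-6:                     # stop when rho stabilizes
        break
    rho = rho_new

alpha_hat = res.params[0] / (1 - rho_new)             # intercept of the quasi-differenced model
print(f"rho_hat = {rho_new:.3f}, alpha_hat = {alpha_hat:.3f}, beta_hat = {res.params[1]:.3f}")
```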
The Hildreth and Lu (1960) method is a grid-search procedure based on the model in quasi-differences written for a candidate value ρ̂ (relationship (4.169)). It proceeds as follows:
– Step 1. We give ourselves a grid of possible values for .ρ̂, between .−1 and 1. For
example, we can set a step size of 0.1 and consider the values .−0.9, .−0.8, . . . ,
0.8, 0.9.
– Step 2. Relationship (4.169) is estimated for each of the previously fixed values
of .ρ̂. The value of .ρ̂ that minimizes the sum of squared residuals is retained.
– Step 3. To refine the estimates, we repeat the previous two steps, setting a smaller
step size (e.g., 0.01) and so on.
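A minimal sketch of this grid search (coarse grid, then refinement around the best value), reusing the simulated-data pattern of the previous sketch:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 300
x = rng.normal(size=T)
eps = np.zeros(T)
u = rng.normal(scale=0.5, size=T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + u[t]
y = 1.0 + 0.8 * x + eps

def ssr_for(rho):
    """Sum of squared residuals of the quasi-differenced regression for a given rho."""
    y_star = y[1:] - rho * y[:-1]
    x_star = x[1:] - rho * x[:-1]
    return sm.OLS(y_star, sm.add_constant(x_star)).fit().ssr

grid = np.arange(-0.9, 0.91, 0.1)                       # Step 1: coarse grid
best = min(grid, key=ssr_for)                           # Step 2: value minimizing the SSR
fine = np.arange(best - 0.09, best + 0.091, 0.01)       # Step 3: refined grid around it
best = min(fine, key=ssr_for)
print(f"Hildreth-Lu estimate of rho: {best:.2f}")
```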
Other Methods
Two other techniques can also be implemented to account for autocorrelation.
The first technique involves applying the maximum likelihood method to the
regression model. This method simultaneously estimates the usual parameters of
the regression model as well as the value of .ρ (see Beach and MacKinnon, 1978).
The second technique has already been discussed in the treatment of het-
eroskedasticity. This is the correction proposed by Newey and West (1987). Recall
that this technique allows us to apply OLS to the regression model, despite the
presence of error autocorrelation, and to correct the standard deviations of the
estimated coefficients. We do not describe this technique again, since it has already
been outlined (see Sect. 4.2.4).
Consider the simple regression model:

Y_t = α + βX_t + ε_t   (4.170)

and assume that the error term follows a first-order autoregressive process, ε_t = ρε_{t−1} + u_t. Substituting, the model can be written:

Y_t = α + βX_t + ρε_{t−1} + u_t   (4.172)
Thus, compared with the forecast from the usual regression model without error autocorrelation, the term ρε̂_T is added.
Let us consider our monthly-frequency model over the period February 1984–June
2021 linking the returns of the RF T SE London Stock Exchange index to the returns
of the RDJ I ND New York Stock Exchange index (see Table 4.8):
The OLS estimation of model (4.174) leads to the results shown in Table 4.9.
The residuals resulting from the estimation of this model are plotted in Fig. 4.11. In
order to determine whether or not they are autocorrelated, let us apply the tests of
absence of autocorrelation.
The value of the Durbin-Watson test statistic is given in Table 4.9: .DW =
2.2247. At the 5% significance level, the reading of the Durbin-Watson table in
the case where only one exogenous variable appears in the model gives .d1 = 1.65
and .d2 = 1.69. Since .d2 < DW < 4 − d2 , we do not reject the null hypothesis of
absence of first-order autocorrelation of the residuals.
[Fig. 4.11: RFTSE residuals from the estimation of model (4.174), monthly data, 1985–2020]
We then construct the quasi-differenced series DRFTSE_t and DRDJIND_t and regress DRFTSE_t on a constant and DRDJIND_t. The results are shown in Table 4.12. The constant term of the original model can then be recovered by dividing the estimated intercept by (1 − ρ̂).
Conclusion
The next chapter turns to the problems raised by the explanatory variables themselves: when they are random, when they are not independent of each other (collinearity), and when there is some instability in the estimated model.
Further Reading
This chapter includes a large number of references related to methods for detecting
heteroskedasticity and autocorrelation, as well as the solutions provided. In addition
to these references, most econometric textbooks contain developments on het-
eroskedasticity and autocorrelation problems. In particular, the books by Dhrymes
(1978), Judge et al. (1985, 1988), Davidson and MacKinnon (1993), Hendry (1995),
Wooldridge (2012), Gujarati et al. (2017), or Greene (2020) can be recommended.
5 Problems with Explanatory Variables: Random Variables, Collinearity, and Instability
As we saw in the third chapter, the multiple regression model is based on a number of assumptions. Here, we focus more specifically on the first two assumptions, which relate to the explanatory variables:
– the explanatory variables are independent of the error term;
– the explanatory variables are linearly independent of one another.
In this chapter, we look at what happens when these assumptions do not hold. If
the first assumption is violated, the implication is that the explanatory variables
are dependent on the error term. Under these conditions, the OLS estimators
are no longer consistent and it is necessary to use another estimator called the
instrumental variables estimator. This is the subject of the first section of the
chapter.
The consequence of violating the second assumption is that the explanatory
variables are not linearly independent. In other words, they are collinear. This issue
of multicollinearity is addressed in the second section of the chapter.
Finally, we turn our attention to the third problem related to the explanatory
variables, namely, the question of the stability of the estimated model.
The aim of this section is to find an estimator that remains valid in the presence
of correlation between the explanatory variables and the error term. We know that
when the independence assumption between the matrix of explanatory variables
and the error term is violated, the OLS estimator is no longer consistent: even if
we increase the sample size, the estimator does not tend towards its true value.
It is therefore necessary to find another estimator that does not suffer from this
consistency problem. This is precisely the purpose of the instrumental variables
method, which consists in finding a set of variables that are uncorrelated with the
error term but that are correlated with the explanatory variables, in order to represent
them correctly. Applying this method yields an estimator, called the instrumental
variables estimator, which remains valid in the presence of correlation between the
explanatory variables and the error term.
If the explanatory variables are random and correlated with the error term, it can
be shown that the OLS estimator is no longer consistent (see in particular Greene
2020). In other words, even if the sample size grows indefinitely, the OLS estimators
.β̂ do not approach their true values .β:
Plim β̂ ≠ β   (5.1)

Consider the multiple regression model:

Y = Xβ + ε   (5.2)

The instrumental variables method consists in finding a matrix Z of variables such that:

Cov(Z′ε) = 0   (5.3)

In other words, the aim is to find a matrix Z of variables that are uncorrelated at each period with the error term, i.e.:

E(Z_{it}ε_t) = 0   (5.4)
for .i = 1, . . . , k and .t = 1, . . . , T .
Let us premultiply the model (5.2) by Z′:

Z′Y = Z′Xβ + Z′ε   (5.5)

Assuming, by analogy with the OLS method, that Z′ε = 0, we can write:

Z′Y = Z′Xβ   (5.6)
Under the assumption that the matrix . Z ' X is non-singular, we obtain the
instrumental variables estimator, denoted .β̂ I V , defined by:
β̂_IV = (Z′X)⁻¹ Z′Y   (5.7)
It can be shown (see in particular Johnston and Dinardo, 1996) that the estimator
of instrumental variables is a consistent estimator of .β, i.e.:
Plim β̂_IV = β   (5.8)
The variables that appear in the matrix .Z are called instrumental variables
or instruments. Some of these variables may be variables that are present in
the original explanatory variables matrix .X. The instrumental variables must be
correlated with the explanatory variables, that is:
Cov(Z′X) ≠ 0   (5.9)
Otherwise, the matrix . Z ' X would indeed be zero, and the procedure could not
be applied.
By setting X̂ = Z(Z′Z)⁻¹Z′X, we can also write the instrumental variables estimator as follows:

β̂_IV = (X̂′X)⁻¹ X̂′Y = (X̂′X̂)⁻¹ X̂′Y   (5.10)

because X̂′X = X̂′X̂.
Employing a technique similar to that used in Chap. 3 for the OLS estimator, it
can easily be shown that the variance-covariance matrix .Ωβ̂I V of the instrumental
variables estimator is given by:
' −1
Ωβ̂I V = σε2 X̂ X̂
. (5.11)
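A minimal sketch of the instrumental variables estimator (5.7) and its variance (5.11) on simulated data in which the regressor is correlated with the error term and an instrument z is available:

```python
import numpy as np

rng = np.random.default_rng(10)
T, beta = 500, 1.5
z = rng.normal(size=T)                           # instrument: correlated with x, not with eps
eps = rng.normal(size=T)
x = 0.8 * z + 0.5 * eps + rng.normal(size=T)     # regressor correlated with the error term
y = beta * x + eps

X = np.column_stack([np.ones(T), x])
Z = np.column_stack([np.ones(T), z])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)        # OLS: inconsistent here
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)         # IV estimator (5.7): (Z'X)^{-1} Z'Y

# Variance of the IV estimator (5.11), with X_hat = Z (Z'Z)^{-1} Z'X
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
e = y - X @ b_iv
sigma2 = e @ e / (T - X.shape[1])                # error variance estimate, as in (5.13)
V_iv = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
print("OLS:", np.round(b_ols, 3), " IV:", np.round(b_iv, 3))
print("IV std. errors:", np.round(np.sqrt(np.diag(V_iv)), 3))
```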
It now remains for us to find a procedure to assess whether or not the explanatory
variables are correlated with the error term in order to determine which estimator to
choose between the OLS estimator and the instrumental variables estimator. To this
end, the Hausman (1978) specification test is used.
When the explanatory variables are not correlated with the error term, it is preferable
to use the OLS estimator rather than the instrumental variables estimator, as the OLS
estimator is more accurate (for demonstrations, see in particular Greene, 2020). It
is therefore important to have a test that can be used to determine whether or not
there is a correlation between the explanatory variables and the error term. This is
the purpose of the Hausman test (Hausman, 1978).
This test consists of testing the null hypothesis that the explanatory variables and
the error term are uncorrelated, against the alternative hypothesis that the correlation
between the two types of variables is non-zero. Under the null hypothesis, the
OLS and instrumental variables estimators are consistent, but the OLS estimator
is more accurate. Under the alternative hypothesis, the OLS estimator is no longer
consistent, unlike the instrumental variables estimator.
The idea behind the Hausman test is to test the significance of the difference
between the two estimators. If the difference is not significant, the null hypothesis
is not rejected. On the other hand, if the difference is significant, the null hypothesis
is rejected and the instrumental variables estimator should be used. We calculate the
following statistic, known as the Wald statistic (Eq. (5.12)), which measures the significance of the difference between the two estimators and involves σ̂_ε², the estimator of the variance of the error term σ_ε², i.e. (see Chap. 3):

σ̂_ε² = e′e / (T − k − 1)   (5.13)
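The exact expression of the statistic used in the text (Eq. (5.12)) is not reproduced here. The sketch below uses one common form of the Hausman contrast, based on the difference between the two estimators and the difference of their covariance matrices; this is an assumption about the formula, not the author's exact expression:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
T = 500
z = rng.normal(size=T)
eps = rng.normal(size=T)
x = 0.8 * z + 0.5 * eps + rng.normal(size=T)      # x is correlated with eps
y = 1.5 * x + eps

X = np.column_stack([np.ones(T), x])
Z = np.column_stack([np.ones(T), z])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

e_iv = y - X @ b_iv
sigma2 = e_iv @ e_iv / (T - 2)                    # error variance estimate, as in (5.13)
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
V_iv = sigma2 * np.linalg.inv(X_hat.T @ X_hat)    # variance of the IV estimator
V_ols = sigma2 * np.linalg.inv(X.T @ X)           # variance of the OLS estimator

d = b_iv - b_ols
H = d @ np.linalg.pinv(V_iv - V_ols) @ d          # Hausman contrast (one common form)
dof = 1                                           # number of potentially endogenous regressors
print(f"Hausman statistic = {H:.2f}, chi2({dof}) 5% critical value = {stats.chi2.ppf(0.95, dof):.2f}")
```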
Consider the following model:

y_t = βx_t* + ε_t   (5.14)

and assume that the observations x_t available are not a perfect measure of x_t*. In other words, the observed variable x_t is subject to measurement errors, i.e.:

x_t = x_t* + μ_t   (5.15)
where .μt is an error term that follows a normal distribution of zero mean
and variance .σμ2 . It is further assumed that the two error terms .εt and .μt are
independent. Such a model can, for example, be representative of the link between
consumption and permanent income, where .yt denotes current consumption and .xt∗
permanent income. Permanent income is not observable, only current income .xt
being observable. .μt thus denotes the measurement error on permanent income .xt∗ .
We can rewrite the model as follows:

y_t = βx_t − βμ_t + ε_t   (5.16)

By setting:

η_t = −βμ_t + ε_t   (5.17)

we then have:

y_t = βx_t + η_t   (5.18)

Since the covariance between x_t and η_t is non-zero (indeed, Cov(x_t, η_t) = E[(x_t* + μ_t)(−βμ_t + ε_t)] = −βσ_μ²), it follows that the OLS estimator is biased1 and is not consistent. Thus, when there is a measurement error on the explanatory variable, the OLS estimator is no longer consistent and the instrumental variables estimator should be used.
1 In the case where it is the explained variable that is observed with error, the OLS estimator remains consistent and unbiased; only its precision is affected.
There is perfect collinearity when one explanatory variable is an exact multiple of another, for example:

X_{2t} = λX_{1t}   (5.20)

More generally, there is perfect multicollinearity between the k explanatory variables if:

λ₁X_{1t} + λ₂X_{2t} + … + λ_kX_{kt} = 0   (5.21)

where λ₁, …, λ_k are constants that are not all zero simultaneously.
In these cases of perfect collinearity or multicollinearity, the rank of the matrix X is less than k + 1, which means that the assumption of linear independence between the columns of X no longer holds. It follows that the rank of X′X is also less than k + 1. It is therefore theoretically impossible to invert the matrix X′X, as the latter is singular (its determinant is zero). The regression coefficients are then indeterminate.
Cases of perfect collinearity and multicollinearity are rare. In practice, explanatory variables frequently exhibit strong, but not perfect, multicollinearity. We then speak of quasi-multicollinearity or, more simply, multicollinearity. There is multicollinearity if, in a model with k explanatory variables, we have the following relationship:

λ₁X_{1t} + λ₂X_{2t} + … + λ_kX_{kt} + v_t = 0   (5.22)

where v_t is an error term.
Multicollinearity has several effects. Firstly, the variances and covariances of the
estimators tend to increase. Let us explain this point.
We demonstrated in Chap. 3 that the variance-covariance matrix .Ωβ̂ of the OLS
coefficients is given by:
Ω_β̂ = σ_ε² (X′X)⁻¹   (5.23)
We have also shown that the variance of the OLS coefficient .β̂i associated with
the .ith explanatory variable .Xit is written as:
V(β̂_i) = σ_ε² a_{i+1,i+1}   (5.24)

where a_{i+1,i+1} denotes the (i + 1)-th element of the diagonal of (X′X)⁻¹. It is possible to show that:

a_{i+1,i+1} = 1 / (1 − R_i²) = VIF_i   (5.25)
where .V I Fi is the variance inflation factor and .Ri2 is the coefficient of deter-
mination associated with the regression of the variable .Xit on the .(k − 1) other
explanatory variables.
The statistic VIF_i indicates how much the variance of an estimator increases when there is multicollinearity. When multicollinearity is strong, R_i² tends to 1 and a_{i+1,i+1} tends to infinity. It follows that the variance V(β̂_i) also tends to infinity. Multicollinearity therefore increases the variance of the estimators.
The second effect of multicollinearity is that the OLS estimators are highly
sensitive to small changes in the data. A small change in one observation or in the
number of observations can result in a large change in the estimated values of the
coefficients.
Let us take an example.2 From the data in Table 5.1, we estimate the following
models by OLS:
Y_t = α + β₁X_{1t} + β₂X_{2t} + ε_t   (5.26)
2 Of course, this example is purely illustrative in the sense that only six observations are considered.
and

Y_t = α + β₁X_{1t} + β₂X_{3t} + ε_t   (5.27)
The variables .X2t and .X3t differ only in the final observation (Table 5.1). Look-
ing at the results in Table 5.2, we see that this small change in the data significantly
alters the estimates. Although not significant, the values of the coefficients of the
explanatory variables differ markedly between the two regressions; the same is true
for their standard deviations.
This example also highlights the first mentioned effect of multicollinearity,
namely, the high value of the standard deviations of the estimated coefficients.
There are also other effects of multicollinearity. These include the following
consequences:
– Because of the high value of the variances of the estimators, the t-statistics
associated with certain coefficients can be very low, even though the values taken
by the coefficients are high.
– Despite the non-significance of one or more explanatory variables, the coefficient
of determination of the regression can be very high. This is frequently considered
to be one of the most visible symptoms of multicollinearity. Thus, if the
coefficient of determination is very high, the Fisher test tends to reject the null
hypothesis of non-significance of the regression as a whole, even though the t-
statistics of several coefficients indicate that the latter are not significant.
– Some variables are sensitive to the exclusion or inclusion of other explanatory
variables.
Consider now a model with three explanatory variables X_{1t}, X_{2t}, and X_{3t}, and suppose that X_{3t} is a linear combination of the other two explanatory variables:

X_{3t} = λ₁X_{1t} + λ₂X_{2t}

where λ₁ and λ₂ are constants that are not simultaneously zero. Because of the existence of this linear combination, the coefficient of determination R² from the regression of X_{3t} on X_{1t} and X_{2t} is equal to 1. By virtue of relationship (3.106) from Chap. 3, we can write:

R² = (r²_{X₃X₁} + r²_{X₃X₂} − 2 r_{X₃X₁} r_{X₃X₂} r_{X₁X₂}) / (1 − r²_{X₁X₂}) = 1
The previous relationship is satisfied for .rX3 X1 = rX3 X2 = 0.6 and .rX1 X2 =
−0.28. It is worth mentioning that these values are not very high even though there
is multicollinearity.
Consequently, in a model with more than two explanatory variables, care must
be taken when interpreting the values of the correlation coefficients.
The underlying idea is that, if the explanatory variables are perfectly correlated, the determinant of the matrix of their correlation coefficients is zero. Let us take an example to illustrate this property.
Example 5.2 Consider a model with two explanatory variables X1t and X2t . The
determinant D of the correlation coefficient matrix is given by:
      | 1          r_{X₁X₂} |
D =   | r_{X₂X₁}   1        |  = 1 − r²_{X₁X₂}      (5.32)
The determinant of the correlation coefficient matrix is zero when the variables
are perfectly correlated.
Conversely, when the explanatory variables are orthogonal, rX1 X2 = 0 and the
determinant of the correlation coefficient matrix is 1.
Under the null hypothesis of orthogonality of the explanatory variables, the Farrar-Glauber statistic follows a Chi-squared distribution:

FG ∼ χ²_{k(k+1)/2}   (5.35)
Another technique relies on the condition number 𝜘, computed from the eigenvalues of the matrix X′X. If the matrix X has been normalized, so that the length of each
of its columns is 1, then the .𝜘 statistic is equal to 1 when the columns are orthogonal
and greater than 1 when the columns exhibit multicollinearity. This technique is not
a statistical test as such, but it is frequently considered that values of .𝜘 between 10
and 30 correspond to a situation of moderate multicollinearity, and that values above
30 are an indication in favor of strong multicollinearity.
The variance inflation factor associated with the explanatory variable X_{it} is defined as:

VIF_i = 1 / (1 − R_i²)   (5.37)
where .Ri2 is the coefficient of determination relating to the regression of the variable
.Xit on the .(k − 1) other explanatory variables. Obviously, the value of .V I Fi is
higher the closer .Ri2 is to 1. Consequently, the higher .V I Fi is, the more collinear
the variable .Xit is.
Comparing the values of the VIF_i statistics across explanatory variables makes it possible to identify the highest values and, thus, the collinear variables. But, if the differences between the VIF_i statistics for the different explanatory variables are small, it is impossible to detect the variables responsible for multicollinearity. In practice, if the value of the VIF_i statistic is greater than 10, which corresponds to the case where R_i² > 0.9, the variable X_{it} is considered to be strongly collinear.
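A minimal sketch of the VIF calculation, either directly from the auxiliary R_i² or with the variance_inflation_factor helper of statsmodels, on simulated collinear regressors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(12)
T = 137
x1 = rng.normal(size=T)
x2 = 0.9 * x1 + 0.4 * rng.normal(size=T)        # strongly correlated with x1
x3 = rng.normal(size=T)
X = np.column_stack([x1, x2, x3])

# Direct computation: regress each variable on the others and use VIF_i = 1/(1 - R_i^2)
for i in range(X.shape[1]):
    others = sm.add_constant(np.delete(X, i, axis=1))
    r2 = sm.OLS(X[:, i], others).fit().rsquared
    print(f"X{i+1}: R_i^2 = {r2:.3f}, VIF = {1 / (1 - r2):.2f}")

# Same computation with statsmodels (a constant is added so the auxiliary R^2 are centered)
Xc = sm.add_constant(X)
print([round(variance_inflation_factor(Xc, i), 2) for i in range(1, Xc.shape[1])])
```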
Empirical Application
Consider the following model:

REURO_t = α + β₁RDJIND_t + β₂RFTSE_t + β₃RNIKKEI_t + ε_t   (5.38)
where:
– REU RO denotes the series of returns of the European stock market index, Euro
Stoxx 50.
– RDJ I N D is the series of returns of the Dow Jones Industrial Average index.
– RF T SE is the series of returns of the UK stock market index, F T SE 100.
– RNI KKEI is the series of returns of the NI KKEI index of the Tokyo Stock
Exchange.
The data, taken from the Macrobond database, are quarterly and cover the period
from the second quarter of 1987 to the second quarter of 2021 (.T = 137).
We are interested in the possible multicollinearity between the three explanatory
variables under consideration. Let us start by calculating the matrix of correlation
coefficients among the explanatory variables:
⎛ 1                    r_{RDJIND,RFTSE}     r_{RDJIND,RNIKKEI} ⎞     ⎛ 1        0.8562   0.6059 ⎞
⎜ r_{RFTSE,RDJIND}     1                    r_{RFTSE,RNIKKEI}  ⎟  =  ⎜ 0.8562   1        0.5675 ⎟      (5.39)
⎝ r_{RNIKKEI,RDJIND}   r_{RNIKKEI,RFTSE}    1                  ⎠     ⎝ 0.6059   0.5675   1      ⎠

The correlation between RDJIND and RFTSE is high (0.8562), which suggests possible multicollinearity between these two explanatory variables.
Table 5.3 Estimation of the relationship between the series of stock market returns
Variable Coefficient Std. Error t-Statistic Prob.
C −0.005134 0.004447 −1.154642 0.2503
RDJIND 0.525822 0.107257 4.902431 0.0000
RFTSE 0.633759 0.101216 6.261419 0.0000
RNIKKEI 0.084954 0.046478 1.827827 0.0698
R-squared 0.795886 Mean dependent var 0.011257
Adjusted R-squared 0.791282 S.D. dependent var 0.107580
S.E. of regression 0.049149 Akaike info criterion .−3.159171
To investigate whether the Farrar and Glauber test leads to the same conclusion,
let us calculate the determinant D of the matrix of correlations between the
explanatory variables. We obtain:
D = 0.1666   (5.40)
The value read from the table of the Chi-squared distribution is equal to .χ62 =
12.592 at the 5% significance level. As the calculated value is higher than the critical
value, the null hypothesis of orthogonality between the explanatory variables is
rejected and the presumption of multicollinearity is confirmed.
Let us now apply the technique based on the calculation of the variance inflation
factors (V I F ). To do this, we regress each of the explanatory variables on the
other two and calculate the coefficient of determination associated with each
regression. The results are reported in Table 5.4. The values of the V I F statistics
are relatively low (less than 10), suggesting that multicollinearity, if present, is not
very strong. This is consistent with the fact that the coefficients of determination .Ri2
associated with each of the three regressions are lower than the overall coefficient
of determination ascertained by estimating the model (5.38).
A first remedy consists in using external information, for example estimates obtained in a previous study. Partition the model as Y = X_rβ_r + X_sβ_s + ε, where X_r is the submatrix of size (T, r) formed by the first r columns of X and X_s is the submatrix composed of the s = k + 1 − r remaining columns.
Suppose that, in a previous study, the coefficient β̂_s was obtained and that it is an unbiased estimator of β_s. It then remains for us to estimate β_r. To do this, we start by calculating a new dependent variable Ỹ, which consists in correcting the dependent variable for the contribution of the variables X_s whose coefficients are already available:
Ỹ = Y − X_sβ̂_s   (5.43)

We then regress Ỹ on the explanatory variables appearing in X_r and obtain the following OLS estimator β̂_r:

β̂_r = (X′_rX_r)⁻¹ X′_r Ỹ   (5.44)
Given that:

Y = Xβ + ε = X_rβ_r + X_sβ_s + ε   (5.45)

we can write:

β̂_r = (X′_rX_r)⁻¹ X′_r (X_rβ_r + X_sβ_s + ε − X_sβ̂_s)   (5.46)

Hence:

β̂_r = β_r + (X′_rX_r)⁻¹ X′_rX_s (β_s − β̂_s) + (X′_rX_r)⁻¹ X′_rε   (5.47)

Knowing that E(ε) = 0 and E(β̂_s) = β_s, we deduce:

E(β̂_r) = β_r   (5.48)
Remark 5.1 A technique similar to this is to combine time series and cross-
sectional data (see in particular Tobin, 1950).
Another possibility is ridge regression (Hoerl and Kennard, 1970a,b), which consists in adding a positive constant c to each diagonal element of the matrix X′X, leading to the estimator β̂_R = (X′X + cI)⁻¹X′Y. The ridge estimator β̂_R is therefore a biased estimator of β. However, Schmidt (1976) showed that the variances of the elements of β̂_R are lower than those associated with the elements of the vector of OLS estimators.
The difficulty inherent in the ridge regression lies in the choice of the value of
c. Hoerl and Kennard (1970a,b) suggest estimating using several values for c in
order to study the stability of .β̂ R . The technique, known as ridge trace, consists in
plotting the different values of .β̂ R on the y-axis for various values of c on the x-axis.
The value of c is then selected as the one for which the estimators .β̂ R are stable.
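A minimal sketch of the ridge estimator and of the ridge trace (coefficients simply printed for a few values of c rather than plotted), on simulated collinear data:

```python
import numpy as np

rng = np.random.default_rng(13)
T = 100
x1 = rng.normal(size=T)
x2 = 0.95 * x1 + 0.2 * rng.normal(size=T)         # nearly collinear with x1
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=T)

X = np.column_stack([np.ones(T), x1, x2])
k = X.shape[1]

def ridge(c):
    """Ridge estimator: (X'X + cI)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X + c * np.eye(k), X.T @ y)

for c in [0.0, 0.1, 0.5, 1.0, 5.0]:                # ridge trace: inspect stability across c
    print(f"c = {c:>4}: coefficients = {np.round(ridge(c), 3)}")
```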
Remark 5.2 The ridge regression method can be generalized to the case where a value different from c is added to each of the elements of the diagonal of the matrix X′X. This technique is called generalized ridge regression.
Other Techniques
There are other procedures for dealing with the multicollinearity problem; in particular, one may remove or add explanatory variables, which raises the issue of variable selection addressed below.
In addition to the model comparison criteria presented in Chap. 3, which may also
be useful here, there are various methods for selecting explanatory variables. These
techniques can guide us in choosing which variables to remove or add to a model.
Empirical Application
Consider the previous empirical application aimed at explaining the returns REURO of the European stock index (Euro Stoxx 50) by three explanatory variables: RDJIND, RFTSE, and RNIKKEI.
Applying the forward selection method, we first estimate the three models with a single explanatory variable; RFTSE, whose coefficient has the highest t-statistic, is selected first. We then estimate the models with two explanatory variables: (RFTSE and RDJIND) and (RFTSE and RNIKKEI). These are models (4) and (6), respectively. In each of these models, the added variable has a coefficient significantly different from zero. Since the coefficient associated with RDJIND has a higher t-statistic than that of RNIKKEI, the second explanatory variable selected is RDJIND. Finally, we estimate
the model with three explanatory variables, model (7), which is the model that we
select, if we consider a 10% significance level, since the coefficients of the three
variables are significant. If the usual 5% significance level is used, model (4) should
be selected.
The application of the stepwise method is identical to the previous case, and
the same model is selected, with the three explanatory variables having significant
coefficients at the 10% significance level—model (4) being chosen if a 5%
significance level is considered.
The focus here is on studying the stability of the estimated model. When estimating
a model over a certain period of time, it is possible that a structural change
may appear in the relationship between the dependent variable and the explanatory
variables. It is thus possible that the values of the estimated parameters do not
remain identical over the entire period studied. In some cases, the introduction of
indicator variables allows us to take account of these possible structural changes.
We also present various stability tests of the estimated coefficients. Beforehand, we
outline the constrained least squares method consisting in estimating a model
under constraints.
In Chap. 3, we presented various tests of the hypothesis that the parameter vector .β
is subject to the existence of q constraints:
H₀ : Rβ = r   (5.52)

We now seek the estimator β̂₀ that minimizes the sum of squared residuals subject to the constraint:

Rβ̂₀ = r   (5.53)

This estimator, called the constrained least squares estimator, is given by:3

β̂₀ = β̂ + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ̂)   (5.54)
The null hypothesis H₀ : Rβ = r can be tested using a Fisher test (see Chap. 3):

F = [(RSS_c − RSS_nc)/q] / [RSS_nc/(T − k − 1)] ∼ F(q, T − k − 1)   (5.55)
where .RSSnc is the sum of the squared residuals of the unconstrained model (i.e.,
that associated with the vector .β̂) and .RSSc denotes the sum of the squares of the
residuals of the constrained model (i.e., that associated with the vector .β̂ 0 ), q being
the number of constraints and k the number of explanatory variables included in the
model. As we will see later in this chapter, such a test can also be used to assess the
possibility of structural changes.
Example 5.3 In simple cases, CLS are reduced to OLS on a previously transformed
model. Consider the following model:
Y_t = α + β₁X_{1t} + β₂X_{2t} + ε_t   (5.56)

with β₁ + β₂ = 1. This is a model with two explanatory variables (k = 2) and one constraint (q = 1), so we have q < k. Noting that β₂ = 1 − β₁, we can write the model as follows:

Y_t − X_{2t} = α + β₁(X_{1t} − X_{2t}) + ε_t   (5.57)

that is:

Z_t = α + β₁W_t + ε_t   (5.58)

with Z_t = Y_t − X_{2t} and W_t = X_{1t} − X_{2t}. It is then possible to apply the OLS method to Eq. (5.58) to obtain α̂ and β̂₁. We then deduce β̂₂ = 1 − β̂₁.
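A minimal sketch of the two equivalent routes of Example 5.3: OLS on the transformed model (5.58), and the general constrained least squares formula (5.54) with R = (0 1 1) and r = 1, on simulated data:

```python
import numpy as np

rng = np.random.default_rng(14)
T = 200
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
y = 0.5 + 0.3 * x1 + 0.7 * x2 + rng.normal(scale=0.5, size=T)   # true coefficients sum to 1

# Route 1: OLS on the transformed model Z = alpha + beta1 * W + eps  (Eq. 5.58)
Z, W = y - x2, x1 - x2
Wc = np.column_stack([np.ones(T), W])
a_hat, b1_hat = np.linalg.solve(Wc.T @ Wc, Wc.T @ Z)
print(f"transformed model: alpha = {a_hat:.3f}, beta1 = {b1_hat:.3f}, beta2 = {1 - b1_hat:.3f}")

# Route 2: constrained least squares formula (5.54) with R*beta = r, R = [0 1 1], r = 1
X = np.column_stack([np.ones(T), x1, x2])
R = np.array([[0.0, 1.0, 1.0]])
r = np.array([1.0])
XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y
b_cls = b_ols + XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, r - R @ b_ols)
print("constrained LS:", np.round(b_cls, 3))
```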
Thus, dummy variables are introduced into a regression model when we wish to
take a binary explanatory factor into account among the explanatory variables. As
an example, such a factor could be:
– The phenomenon either takes place or does not; the dummy variable is then 1 if
the phenomenon takes place, 0 otherwise.
– The male or female factor; the dummy variable is equal to 1 if the person is a
man, 0 if it is a woman (or vice versa).
– The place of residence: urban or rural; the dummy variable is equal to 1 if the
person lives in an urban zone, 0 if in a rural area (or vice versa).
– etc.
The dummy variables enable data to be classified into subgroups based on various characteristics or attributes. Such variables can be introduced into a regression model as explanatory variables.
Introductory Examples
One frequent use of dummy variables is to take account of an exceptional or
even aberrant phenomena. Examples include the following: German reunification in
1991, the launch of the euro in 1999, the September 11, 2001 attacks in the United
States, the winter 1995 strikes in France, the December 1999 storm in France, the
October 1987 stock market crash, the Covid-19 pandemic that broke out at the end
of 2019, etc.
Consider, for example, the following regression model:
Y_t = α + β₁X_t + ε_t   (5.59)

To account for an exceptional event occurring at date t₀, a dummy variable D_t is introduced:

Y_t = α + β₁X_t + β₂D_t + ε_t   (5.60)
with:

D_t = 1 if t = t₀, and D_t = 0 otherwise   (5.61)

The model can thus be written:

Y_t = (α + β₂) + β₁X_t + ε_t   if t = t₀   (5.62)

and:

Y_t = α + β₁X_t + ε_t   if t ≠ t₀   (5.63)
The two models differ only in the value of the intercept: a perturbation taken into
account via a dummy variable affects only the intercept of the model.
There are, however, cases where the perturbation also impacts the slope of the
regression model:
Y_t = α + β₁X_t + β₂D_t + β₃(X_t × D_t) + ε_t   (5.64)
The model then becomes:

Y_t = (α + β₂) + (β₁ + β₃)X_t + ε_t   if t = t₀   (5.65)

and

Y_t = α + β₁X_t + ε_t   if t ≠ t₀   (5.66)
In this example, the intercept and the slope are simultaneously modified.
The choice between the specifications (5.60) and (5.64) can be guided by
theoretical considerations. It is also possible to carry out a posteriori tests in order
to make this choice. To this end, we start by estimating the model without dummy
variables:
Yt = α + β1 Xt + εt
. (5.67)
Y_t = α′ + β₁′X_t + ε_t   (5.68)
with .α ' = (α + β2 ) and .β1' = β1 in the case of model (5.60) and .β1' = β1 + β3 in
the case of model (5.64).
We then perform tests comparing the coefficients of the two models.
Consider, for example, the following model:

Y_i = α + β₁D_{1i} + β₂D_{2i} + ε_i   (5.69)

where:
– Y_i denotes the average consumption expenditure on the good B in the subregion i.
– D_{1i} = 1 if the subregion is located in the North, and 0 otherwise.
– D_{2i} = 1 if the subregion is located in the Southeast, and 0 otherwise.
D_{1i} and D_{2i} are two dummy variables representing a qualitative variable. The qualitative variable here is the region to which the subregion belongs, and each of the dummy variables represents one of the modalities associated with this variable. The average consumption expenditure on the good B in the North corresponds to the case where D_{1i} = 1 and D_{2i} = 0 and is given by the model:

Y_i = α + β₁   (5.70)

Similarly, the average consumption expenditure in the Southeast (D_{1i} = 0, D_{2i} = 1) is:

Y_i = α + β₂   (5.71)

Finally, for the Southwest (D_{1i} = D_{2i} = 0), which is the reference situation:

Y_i = α   (5.72)
Remark 5.4 In the example studied here, we have considered a single qualitative
variable comprising three attributes (North, Southeast, and Southwest). It is possible
to introduce more than one qualitative variable into a model. This is the case, for
example, with the following model:
Yi = α + β1 D1i + β2 D2i + εi
. (5.74)
where .Yi denotes the consumption expenditure on the good B in the subregion i, .D1i
denotes gender (.D1i = 1 if the person is male, 0 if female), and .D2i is the region
to which the subregion belongs (.D2i = 1 if the subregion is in the South region, 0
otherwise). The estimation of the coefficient .α thus gives the average consumption
expenditure on the good B by a woman living in a subregion that is not located in
the South. This situation is the reference situation to which the other cases will be
compared.
Consider, for example, the following semi-logarithmic model:

log Y_t = α + βD_t + ε_t   (5.75)

where Y_t denotes the average hourly wage in euros and D_t is a dummy variable equal to 1 for women and 0 for men. For men, the model is given by log Y_t = α + ε_t and for women by log Y_t = (α + β) + ε_t. Therefore, α denotes the logarithm of the average hourly wage for men and β is the difference between the logarithms of the average hourly wages of women and men. The anti-log of α is interpreted as the median (not average) hourly wage for men. Similarly, the anti-log of (α + β) is the median hourly wage for women.
4 This is only possible if the model does not have a constant term.
Y_i = α + β₁D_{1i} + β₂D_{2i} + β₃X_i + ε_i   (5.76)

where:
– Y_i denotes the average consumption expenditure on a good B in the subregion i.
– D_{1i} = 1 if the subregion is located in the North, and 0 otherwise.
– D_{2i} = 1 if the subregion is located in the Southeast, and 0 otherwise.
– X_i designates the average wage.
Interactions
To illustrate the problem of interactions between variables, let us consider the
following model:
Y_t = α + β₁D_{1t} + β₂D_{2t} + β₃X_t + ε_t   (5.77)

where Y_t denotes the hourly wage, D_{1t} = 1 if the person is a woman and 0 otherwise, D_{2t} = 1 if the person works in the public sector and 0 otherwise, and X_t is a quantitative explanatory variable. This specification implies that if hourly wages are higher for men than for women, they are higher whether or not they work in the public
sector. Similarly, if the hourly wage of people working in the public sector is lower
than that of people working in the private sector, it is so whether they are men
or women. There is therefore no interaction between the two qualitative variables.
Such an assumption may seem highly restrictive, and we need to take into account
the possible interactions between the variables. For example, a woman working in
the public sector may earn less than a man working in the same sector. We can thus
extend model (5.77) by adding an interaction term:

Y_t = α + β₁D_{1t} + β₂D_{2t} + β₃X_t + β₄(D_{1t} × D_{2t}) + ε_t   (5.78)

For a woman (D_{1t} = 1) working in the public sector (D_{2t} = 1), the model is:

Y_t = (α + β₁ + β₂ + β₄) + β₃X_t + ε_t   (5.79)
This model indicates that, all other things being equal, the average hourly wage of women is €3.4 lower than that of men, and the average hourly wage of people working in the public sector is €2.7 lower than that of people working in the private sector.
Let us now assume that the estimation of the model (5.78) has led to the following
results:
All other things being equal, the average hourly wage of women working in the public sector is €3 lower (−3.4 − 2.7 + 3.1 = −3), which lies between the values −3.4 (gender difference alone) and −2.7 (employment sector difference alone).
Remark 5.6 The method described above is only valid if the series under con-
sideration can be decomposed in an additive way, i.e., if it can be written in
the form: .Y = T + C + S + ε where T designates the trend, C the cyclical
component, S the seasonal component, and .ε the residual component. This is
known as an additive decomposition scheme. But, if the components enter
multiplicatively (multiplicative decomposition scheme), i.e., .Y = T × C × S × ε,
the deseasonalization method presented above is inappropriate.
Empirical Application
Consider the series of returns of the Dow Jones Industrial Average US stock index
(RDJ I N D) over the period from the second quarter of 1970 to the second quarter
of 2021 (source: Macrobond). We are interested in the relationship between the
present value of returns and their first-lagged value. The study period includes the
stock market crash of October 19, 1987, corresponding to the 71st observation.
To take into account this exceptional event, let us consider the following dummy
variable:
D_t = 1 if t = 71, and D_t = 0 otherwise   (5.84)
5 Note that a dummy variable is assigned to each quarter, which requires us not to introduce a
constant term into the regression. We could also have written the model by introducing a constant
term and only three dummy variables.
RDJIND_t = 0.0206 + 0.0069 RDJIND_{t−1} − 0.3131 D_t   (5.85)
           (3.6585)  (0.1028)             (−3.9772)
where the values in parentheses are the t-statistics of the estimated coefficients.
All else being equal, the stock market crash of October 1987 reduced the average value of the Dow Jones index returns by 0.3131. This decrease is significant insofar as the coefficient assigned to the dummy variable is significantly different from zero.
It is often useful to assess the robustness of the estimated model over the entire
study period, i.e., to test its stability. There may in fact be a structural change
or break in the relationship between the dependent variable and the explanatory
variables, resulting in instability of the coefficients of the model estimated over the
entire period under consideration. Several causes can produce a structural change,
such as the transition to the single currency, a change in exchange rate regime (from
a fixed to a flexible exchange rate regime), the 1973 oil shock, World War II, the
1987 stock market crash, the Covid-19 pandemic, etc.
There are various methods for assessing the stability of the estimated coefficients
of a regression model, and we present them below.
Recursive Residuals
Consider the usual regression model:
Y = Xβ + ε
. (5.86)
Let us denote x_t the vector formed by the constant and the k explanatory variables for the t-th observation:

x_t = (1, X_{1t}, …, X_{kt})′   (5.87)

Let X_{t−1} be the matrix formed by the first (t − 1) rows of X. This matrix can be used to estimate β. Let β̂_{t−1} be the estimator thus obtained:

β̂_{t−1} = (X′_{t−1}X_{t−1})⁻¹ X′_{t−1}Y_{t−1}   (5.88)
The forecast error for the t-th observation is:

e_t = Y_t − x′_tβ̂_{t−1}   (5.89)

and we define:

w_t = (Y_t − x′_tβ̂_{t−1}) / √[1 + x′_t(X′_{t−1}X_{t−1})⁻¹x_t]   (5.91)
with .wt ∼ N(0, σε2 ). The recursive residuals are defined as the normalized forecast
errors. Furthermore, the recursive residuals are a set of residuals which, if the
disturbance terms are independent and of the same law, are themselves independent
and of the same law. The recursive residuals thus are normally distributed since they
are defined as a linear function of normal variables and the forecast given by OLS
is unbiased.
To generate a sequence of recursive residuals, we proceed as follows:
– We choose a starting set of .τ observations, with .τ < T . These may be, for
example, the first .τ observations of the sample (case of a forward regression).
Having estimated .β̂ τ , the corresponding recursive residuals are determined:
w_{τ+1} = (Y_{τ+1} − x′_{τ+1}β̂_τ) / √[1 + x′_{τ+1}(X′_τX_τ)⁻¹x_{τ+1}]   (5.92)
– The next observation is then added, the estimator β̂_{τ+1} is computed, and the corresponding recursive residual is:

w_{τ+2} = (Y_{τ+2} − x′_{τ+2}β̂_{τ+1}) / √[1 + x′_{τ+2}(X′_{τ+1}X_{τ+1})⁻¹x_{τ+2}]   (5.93)

and so on until the end of the sample.
The CUSUM test, based on the recursive residuals, examines the null hypothesis of stability of the coefficients over the whole period:

H₀ : β₁ = β₂ = … = β_T = β   (5.94)

with:

σ²_{ε1} = σ²_{ε2} = … = σ²_{εT} = σ_ε²   (5.95)
The test statistic is:

W_t = (1/σ̂_w) Σ_{j=τ+1}^{t} w_j   (5.96)

where t = τ + 1, …, T and:

σ̂_w² = [1/(T − τ)] Σ_{j=τ+1}^{T} w_j²   (5.97)
.Wt is thus a cumulative sum that varies with t. As long as the vectors .β are
constant, the average of .Wt is zero. If they vary, .Wt tends to deviate from the straight
line representing the null expectation. More specifically, under the null hypothesis
of stability of the coefficients, .Wt must lie within the interval .[−Lt , Lt ] where:
L_t = a(2t + T − 3τ) / √(T − τ)

where a is a constant that depends on the chosen significance level.
The CUSUM of squares test relies on the statistic:

s_t = Σ_{j=τ+1}^{t} w_j² / Σ_{j=τ+1}^{T} w_j² ,   t = τ + 1, …, T   (5.98)

The line representing the expectation of the test statistic under the null hypothesis of stability is given by:

E(s_t) = (t − τ) / (T − τ)   (5.99)
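A minimal sketch of the recursive residuals, the CUSUM statistic W_t, and the CUSUM of squares statistic s_t, on simulated data with a mid-sample break; the bound parameter a = 0.948 is a standard 5% value and is an assumption, not a figure taken from the text:

```python
import numpy as np

rng = np.random.default_rng(15)
T, tau = 120, 20
x = rng.normal(size=T)
beta = np.where(np.arange(T) < 60, 0.5, 1.5)          # structural break at t = 60
y = 1.0 + beta * x + rng.normal(scale=0.5, size=T)
X = np.column_stack([np.ones(T), x])

w = []
for t in range(tau, T):                               # recursive residuals (Eq. 5.91)
    Xp, yp = X[:t], y[:t]
    b = np.linalg.solve(Xp.T @ Xp, Xp.T @ yp)
    xt = X[t]
    denom = np.sqrt(1 + xt @ np.linalg.solve(Xp.T @ Xp, xt))
    w.append((y[t] - xt @ b) / denom)
w = np.array(w)

sigma_w = np.sqrt(np.sum(w ** 2) / (T - tau))
W = np.cumsum(w) / sigma_w                            # CUSUM statistic (Eq. 5.96)
a = 0.948                                             # 5% bound parameter (standard value, assumed)
t_idx = np.arange(tau + 1, T + 1)
L = a * (2 * t_idx + T - 3 * tau) / np.sqrt(T - tau)  # bounds [-L_t, L_t]
s = np.cumsum(w ** 2) / np.sum(w ** 2)                # CUSUM of squares (Eq. 5.98)

print("CUSUM outside bounds:", bool(np.any(np.abs(W) > L)))
print("max |s_t - E(s_t)|:", round(float(np.max(np.abs(s - (t_idx - tau) / (T - tau)))), 3))
```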
Consider the following model estimated over the entire period:

Y_t = α + βX_t + ε_t,   t = 1, …, T   (5.100)

Suppose we divide the sample into two sub-samples and estimate the following models:

Y_t = α₁ + β₁X_t + ε_{1t},   for t = 1, …, τ   (5.101)

and:

Y_t = α₂ + β₂X_t + ε_{2t},   for t = τ + 1, …, T   (5.102)
The relationship (5.100) is based on the absence of structural change over the
entire period under consideration. In other words, there is no difference between the
two periods .t = 1, . . . , τ and .t = τ + 1, . . . , T : the constant term and the slope
coefficient remain identical. If this is indeed the case, we should have:
α = α₁ = α₂   and   β = β₁ = β₂   (5.103)
Assuming that .ε1t and .ε2t are independent and both have normal distributions of
zero mean and same variance, the Chow test is implemented as follows:
– The model (5.100) is estimated and the corresponding residual sum of squares is
noted .RSS0 .
– The model (5.101) is estimated and the corresponding residual sum of squares is
noted .RSS1 .
– The model (5.102) is estimated and the corresponding residual sum of squares is
noted .RSS2 .
– .RSSa = RSS1 + RSS2 is calculated.
– We calculate the test statistic:
F = [(RSS₀ − RSS_a)/(k + 1)] / [RSS_a/(T − 2(k + 1))]   (5.106)

Under the null hypothesis of stability of the coefficients:

F ∼ F(k + 1, T − 2(k + 1))   (5.107)
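A minimal sketch of these Chow test steps on simulated data with a known break date; the critical value is read from the Fisher distribution as in (5.107):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(16)
T, tau, k = 173, 38, 1                                  # break after observation tau
x = rng.normal(size=T)
beta = np.where(np.arange(T) < tau, 0.4, 0.9)
y = 0.02 + beta * x + rng.normal(scale=0.1, size=T)
X = sm.add_constant(x)

rss0 = sm.OLS(y, X).fit().ssr                           # whole-period regression
rss1 = sm.OLS(y[:tau], X[:tau]).fit().ssr               # first sub-period
rss2 = sm.OLS(y[tau:], X[tau:]).fit().ssr               # second sub-period
rssa = rss1 + rss2

F = ((rss0 - rssa) / (k + 1)) / (rssa / (T - 2 * (k + 1)))   # Eq. (5.106)
crit = stats.f.ppf(0.95, k + 1, T - 2 * (k + 1))
print(f"Chow F = {F:.3f}, 5% critical value F({k+1},{T-2*(k+1)}) = {crit:.3f}")
print("stability rejected" if F > crit else "stability not rejected")
```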
Remark 5.7 The Chow test can be easily generalized to the existence of more than
one structural break. Thus, if we wish to test for the existence of two breaks, we
will split the period into three sub-periods, the principle of the test remaining the
same (the sum of the squared residuals .RSSa then being equal to the sum of the
sums of the squared residuals of the three regressions corresponding to the three
sub-periods).
The Chow test assumes that the date at which the structural break(s) occurs is
known. Otherwise, it is possible to perform rolling regressions and to calculate the
Chow test statistic for each of these regressions. The break point we are looking for
then corresponds to the value for which the Chow statistic is maximum.
Empirical Application
Consider the relationship between the returns of the US Dow Jones Industrial
Average index (RDJ I ND) and the Japanese Nikkei index (RNI KKEI ). The data,
taken from the Macrobond database, are quarterly over the period from the second
quarter of 1978 to the second quarter of 2021 (.T = 173). The OLS estimation of
the relationship between the two series of returns over the whole period yields (t-statistics in parentheses):

RNIKKEI_t = −0.0079 + 0.7958 RDJIND_t   (5.109)
            (−1.1688)  (9.5019)
Rolling Regressions
To get a rough idea of the stability of the estimated coefficients, we perform rolling regressions by adding one observation at a time. We then graphically represent the estimated coefficients and the associated stability diagnostics over the successive samples.
[Figures: rolling-regression coefficient estimates, recursive residuals, CUSUM statistic with its 5% significance bounds, and CUSUM of squares statistic, 1980–2020]
The statistic moves outside the interval delimited by the two lines around the 1987 stock market crash, indicating some instability (random break) in the parameters or the variance.
Chow Test
To investigate whether the stock market crash of October 1987 caused a structural
break in the relationship between the returns of the two indices under consideration,
let us apply the Chow test. To this end, we estimate two regressions: a regression
over the period 1978.2–1987.3 (before the crash) and a regression over the period
1987.4–2021.2 (after the crash). The results are given below.
Over the period 1978.2–1987.3, i.e., .t = 1, . . . , 70:
RNI
. KKEIt = 0.0279 + 0.4072RDJ I N Dt (5.110)
(3.3720) (3.8941)
RNI
. KKEIt = − 0.0159 + 0.8728RDJ I NDt (5.111)
(−1.9608) (8.7515)
(1.2654 − 1.2040) / (1 + 1)
F =
. = 4.3128 (5.112)
1.2040/(173 − 2 (1 + 1))
The Fisher table gives us, at the 5% significance level: .F (2, 169) = 2.997. The
calculated value of the test statistic being higher than the critical value, the null
hypothesis of stability of the estimated coefficients is rejected at the 5% significance
level. There is indeed a break in the fourth quarter of 1987. This result was expected
in view of the differences obtained in the estimates over the two sub-periods: the
constant term is positive in the first sub-period and negative in the second, and the
slope coefficient is more than twice as high in the second sub-period as in the first.
It is possible to recover the results of the Chow test by introducing a dummy
variable and running a single regression. Consider the following model:
RN I KKEIt = (α + γ ) + (β + δ) × RDJ I N Dt + εt
. (5.115)
In Eq. (5.113), the coefficient .δ indicates how much the slope coefficient of the
second period differs from that of the first period. Estimating this relationship yields:
RN
. I KKEIt = 0.0279 + 0.4072RDJ I NDt − 0.0439 Dt
(1.8619) (2.1502) (−2.6195)
All coefficients are significantly different from zero (at the 10% significance level
for the constant term), suggesting that the relationship between the two series of
returns is different over the two sub-periods. From this estimation, we deduce the
relationship over the 1978.2–1987.3 period:
RNI
. KKEIt = 0.0279 + 0.4072 × RDJ I N Dt (5.117)
.RNI
KKEIt = (0.0279 − 0.0439) + (0.4072 + 0.4656) RDJ I NDt (5.118)
= −0.0160 + 0.8728RDJ I NDt
We naturally find the results obtained when implementing the Chow test. We see
that the coefficients .γ and .δ are significantly different from zero. We deduce that the
regressions over the two sub-periods differ not only in the constant term but also in
the slope coefficient. The findings therefore confirm the results of the Chow test.
Conclusion
In this chapter, we have considered that two of the assumptions of the regression
model concerning the explanatory variables are violated: the assumption of indepen-
dence between the explanatory variables and the error term, on the one hand, and
the assumption of independence between the explanatory variables, on the other.
We have also studied third problem relating to the explanatory variables, namely,
the question of the instability of the estimated model. So far, we have considered
models in which the dependent variable is a function of one or more explanatory
variables at the same date, i.e., at the same moment in time. Frequently, however, the
explanatory variables include lagged variables or the lagged endogenous variable.
These are referred to as dynamic models, as opposed to static models. These models
are the subject of the next two chapters.
262 5 Problems with Explanatory Variables
Further Reading
∂L
. = −2X ' Y + 2X' Xβ̂ 0 − 2R ' λ (5.121)
∂ β̂ 0
and:
∂L
. = −2 R β̂ 0 − r (5.122)
∂λ
Canceling these partial derivatives, we have:
and:
R β̂ 0 − r = 0
. (5.124)
−1
Let us multiply each member of (5.123) by .R X ' X :
−1 ' −1 '
R β̂ 0 − R X' X
. X Y − R X' X Rλ=0 (5.125)
Hence:
−1 ' −1
λ = R X' X
. R r − R β̂ (5.126)
264 Problems related to explanatory variables
−1 '
with .β̂ = X' X X Y denoting the OLS estimator of the unconstrained model. It
is then sufficient to replace .λ by its value in (5.123):
Hence:
−1 ' ' −1 ' −1
β̂ 0 = β̂ + X' X
. R R XX R r − R β̂ (5.128)
– Models including present and lagged values of explanatory variables; these are
distributed lag models.
– Models in which the lagged values of the dependent variable intervene among
the explanatory variables; in this case, we speak of autoregressive models.1
In economics, the present value of the dependent variable often depends on the past
values of the explanatory variables. In other words, the influence of the explanatory
variables is only exerted after a certain lag. Let us take a few examples to illustrate
this.
models when only the lagged values of the dependent variable are present as explanatory variables.
We speak of autoregressive distributed lag (ARDL) models when the lagged values of the
dependent variable are among the explanatory variables in addition to the lagged values of the
usual explanatory variables.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 265
V. Mignon, Principles of Econometrics, Classroom Companion: Economics,
https://doi.org/10.1007/978-3-031-52535-3_6
266 6 Distributed Lag Models
Ct = α + β1 Rt + φ1 Ct−1
. (6.1)
we can write:
Ct = α + β1 Rt + φ1 (α + β1 Rt−1 + φ1 Ct−2 )
. (6.3)
that is:
Thus, in addition to income of the current period, all past income has an influence
on present consumption. The reaction of consumption to a change in income is
therefore spread, i.e., staggered, over time. The slower the reaction, the closer
the coefficient .φ1 is to 1. This coefficient represents the degree of inertia of
consumption. The model (6.5) is a distributed lag model in the sense that the
explanatory variable (income) has a distributed impact over time on the dependent
variable (consumption).
Another possible illustration, again in the field of consumption, is provided by
Friedman’s (1957) permanent income model. According to this theory, consumption
in a given period depends not just on income of the same period, but on all income
anticipated in future periods. Since future incomes are unknown, they need to be
2 We ignore the error term here to simplify the notations and calculations to follow.
6.1 Why Introduce Lags? Some Examples 267
Ct = μ + δ0 Rt + δ1 Rt−1 + δ2 Rt−2 + εt
. (6.6)
where R denotes income and C consumption. In this model, the present and
lagged values of one and two periods of income are involved in explaining present
consumption, meaning that an increase in income is spread, or distributed, over three
periods. The model (6.6) is called a distributed lag model because the explanatory
variable exerts a time-distributed influence on the dependent variable.
A second example illustrating the spread over time of the influence of explana-
tory variables is given by the investment function. In line with the accelerator
model, investment reacts immediately to changes in demand, i.e.:
It = νΔYt
. (6.7)
where .It denotes investment at date t and .ΔYt = Yt − Yt−1 represents the change in
output perceived as the variation in demand, .ν being the acceleration coefficient. In
line with this formulation, a change in demand generates an immediate increase
in investment: there is no lag between the change in demand and the reaction
of investment. Such a formulation is too restrictive in the sense that it leads to
too abrupt variations in investment, and that there are lags in the adjustment of
investment to changes in demand. These limitations led to the flexible accelerator
model in which the capital stock K is linked to a weighted average of current and
past output, with the weight assigned to past output decreasing over time:
Kt = φ (1 − λ) Yt + λYt−1 + λ2 Yt−2 + . . . + λh Yt−h + . . .
. (6.8)
where the weight .λ is between 0 and 1. After a few simple calculations3 and remem-
bering that investment is equal to the change in the capital stock .(It = Kt − Kt−1 ),
the accelerator model can be written:
∞
.It = λν (1 − λ)i ΔYt−i (6.9)
i=0
This shows that investment reacts in a distributed way to changes in demand, not
adjusting immediately as was the case in the simple accelerator model. It is therefore
a distributed lag model.
These examples illustrate that a variety of factors can justify the existence of lags
and the use of distributed lag models. Lags can have a number of causes, including
but not limited to:
The number of lags h can be finite or infinite. An infinite lag model is used when
the lagged effects of the explanatory variables are likely to be very long-lasting.
Finite lag models are preferred when the effect of a change in X no longer has an
influence on Y after a relatively small number of periods.
To simplify the notations, let us introduce the lag operator L such that:
LXt = Xt−1
. (6.11)
The lag operator thus transforms a variable into its past value. More generally,
we have:
Li Xt = Xt−i
. (6.12)
D(L) = δ0 + δ1 L + . . . + δh Lh
. (6.13)
6.2 General Formulation and Definitions of DistributedLag Models 269
Yt = μ + D(L)Xt + εt
. (6.14)
The coefficient .δ0 measures the variation of .Yt following the variation of .Xt :
ΔYt
δ0 =
. (6.15)
ΔXt
.δ0 is called the short-term multiplier or impact multiplier of X. The partial sums
D(1) = δ0 + δ1 + . . . + δh
. (6.16)
equal to the sum of all coefficients .δi , i = 1, . . . , h, measures the effect, in the long
term, of a variation in X on the value of Y . .D(1) is called the long-term multiplier
or equilibrium multiplier.
It is possible to normalize the coefficients .δi , i = 1, . . . , h, by dividing them by
their sum .D(1). The partial sums of these normalized .δi coefficients measure the
proportion of the total effect of a change in X reached after a certain period.
Let us consider a numerical example to illustrate this. Consider the model (6.6),
by giving values to the coefficients:
4 The concepts of median and mean lags only really make sense if the coefficients are of the same
sign.
270 6 Distributed Lag Models
coefficients, i.e.:
h
iδi
i=0 δ1 + 2δ2 + . . . + hδh D ' (1)
.D̄ = = = (6.18)
h δ0 + δ1 + δ2 + . . . + δh D(1)
δi
i=0
There are several procedures for determining the number of lags h in a distributed
lag model:
Yt = μ + δ0 Xt + δ1 Xt−1 + . . . + δh Xt−h + εt
. (6.19)
RSSh 2h
AI C(h) = log
. + (6.20)
T T
RSSh h log T
SI C(h) = log
. + (6.21)
T T
RSSh h log(log T )
H Q(h) = log
. +2 (6.22)
T T
where .RSSh denotes the sum of squared residuals of the model with h lags and
T is the number of observations.5
5 It has been assumed here that the constant c is equal to 1 in the expression of the HQ criterion.
6.4 Finite Distributed Lag Models: Almon Lag Models 271
Finite distributed lag models are polynomial distributed lag (PDL) models, also
known as Almon lag models (see Almon, 1962).
Almon’s technique avoids directly estimating the coefficients .δi , since it consists
in assuming that the true lag distribution can be approximated by a polynomial of
order q:
q
δi = α0 + α1 i + α2 i 2 + . . . + αq i q =
. αj i j (6.23)
j =0
with .h > q.
Consider, as an example, that the polynomial is of second order .(q = 2). Then
we have:
– .δ0 = α0
– .δ1 = α0 + α1 + α2
– .δ2 = α0 + 2α1 + 4α2
– .. . .
.δh = α0 + hα1 + h α2
– 2
(6.24)
272 6 Distributed Lag Models
that is:
The “new” explanatory variables are linear combinations of the lagged explana-
tory variables. Thus, a regression of Y on these “new” explanatory variables yields
estimates of the coefficients .α, which, in turn, allows us to determine the coefficients
.δ.
More generally, in matrix form, we can write for h lags and a polynomial of
degree q:
⎛ ⎞ ⎛ ⎞⎛ ⎞
δ0 1 0 0 ··· ··· 0 α0
⎜δ1 ⎟ ⎜1 1 1 · · · · · · 1 ⎟ ⎜α1 ⎟
⎜ ⎟ ⎜ ⎟⎜ ⎟
⎜ ⎟ ⎜ q⎟⎜ ⎟
. ⎜ δ2 ⎟ = ⎜1 2 2 · · · · · · 2 ⎟ ⎜ α2 ⎟
2
(6.26)
⎜ . ⎟ ⎜. ⎟⎜ . ⎟
⎝ .. ⎠ ⎝ .. ⎠ ⎝ .. ⎠
δh 1 h h ··· ··· h
2 q αh
Y = I μ + Xδ + ε
. (6.28)
we can write:
Y = I μ + XW α + ε
. (6.29)
⎛ ⎞ ⎛ ⎞
δ0 α0
⎜ δ1 ⎟ ⎜α1 ⎟
⎜ ⎟ ⎜ ⎟
⎜ ⎟ ⎜ ⎟
where .δ = ⎜δ2 ⎟ and .α = ⎜α2 ⎟ .
⎜.⎟ ⎜.⎟
⎝ .. ⎠ ⎝ .. ⎠
δh αh
6.5 Infinite Distributed Lag Models 273
It is then possible to estimate the regression (6.29) by OLS to obtain the estimator
α̂ of .α and to deduce the estimator .δ̂ of .δ from (6.26).
.
The method just described assumes that the degree q of the polynomial used
for the approximation is known. In practice, this is not the case and q needs to be
determined. One possible technique is to start with a high value, .q = h − 1, and
test the significance of the associated coefficient (.αh−1 ) by means of a t-test. The
degree of the polynomial is then progressively reduced until a significant coefficient
appears.
In infinite distributed lag models, the effect of the explanatory variable is unlimited
in time. It is assumed, however, that the recent past has more influence than the
distant past, and that the weight of past observations tends to decrease steadily over
time.
Generally speaking, an infinite distributed lag model is written as:
∞
Yt = μ +
. δi Xt−i + εt (6.30)
i=0
or:
∞
Yt = μ +
. δi Li Xt + εt (6.31)
i=0
δi = λ i δ0
. (6.32)
higher weights than past observations. The closer .λ is to 1, the slower the rate of
decrease of the coefficients, and the closer .λ is to 0, the faster that rate.
Substituting (6.32) into (6.30), we have:
or:
Yt = μ + δ0 Xt + λXt−1 + λ2 Xt−2 + . . . + λi Xt−i + . . . + εt
. (6.34)
D(L) = δ0 + λδ0 L + λ2 δ0 L2 + . . . + λi δ0 Li + . . .
. (6.35)
Yt = μ + D(L)Xt + εt
. (6.36)
or:
Knowing that:
D(L) = δ0 1 + λL + λ2 L2 + . . . + λi Li + . . .
. (6.38)
δ0
D(L) =
. (6.39)
(1 − λL)
and therefore:
(1 − λL)
D(L)−1 =
. (6.40)
δ0
That is:
Yt − λYt−1 = (1 − λ) μ + δ0 Xt + εt − λεt−1
. (6.42)
6.5 Infinite Distributed Lag Models 275
Hence:
Yt = λYt−1 + (1 − λ) μ + δ0 Xt + εt − λεt−1
. (6.43)
eliminated.
A few remarks are in order. Firstly, the Koyck transformation shows that we can
move from a distributed lag model to an autoregressive model. The endogenous
lagged variable, .Yt−1 , now appears as an explanatory variable of .Yt , which has
important implications in terms of estimation. We know that one of the basic
assumptions of the OLS method is that the matrix of explanatory variables is non-
random. Such an assumption is violated here since .Yt−1 , like .Yt , is a random
variable. However, this assumption can be reformulated by writing that the matrix
of explanatory variables can contain random variables, provided that they are not
correlated with the error term (see Chap. 3). It will therefore be necessary to check
this characteristic during the estimation phase; we will return to this point when
discussing estimation methods (see below).
Secondly, the error term of the model (6.43) is .εt − λεt−1 , and no longer only
.εt as was the case in the original model (6.30). Let us posit .ηt = εt − λεt−1 . It
appears that while the .εt are indeed non-autocorrelated, this is not the case for the
.ηt , a characteristic which must be taken into account during the estimation phase
(see below).
Thirdly, it is possible to define median and mean lags in the Koyck approach,
which makes it possible to quantify the speed with which the dependent variable
.Yt responds to a unit variation in the explanatory variable .Xt . The median lag
corresponds to the number of periods required for 50% of the total effect of a unit
change in the explanatory variable .Xt on .Yt to be reached. It can be shown that, in
the Koyck model, the median lag is given by .log 2/ log λ. Thus, the higher the value
of .λ, the greater the median lag and the lower the speed of adjustment. On the other
hand, the mean lag is defined by:
h
iδi
i=0
D̄ =
. (6.44)
h
δi
i=0
276 6 Distributed Lag Models
λ
D̄ =
. (6.45)
1−λ
The median and mean lags can thus be used to assess the speed with which .Yt
adjusts following a unit variation in .Xt .
If we wish to apply OLS to the Koyck model, we need to ensure that the lagged
endogenous variable .Yt−1 is independent of the error term .ηt . However, such an
assumption does not hold. Indeed, in accordance with (6.43), .εt−1 has an impact on
.εt . Similarly, if we write Eq. (6.43) in .t − 1, it is clear that .εt−1 has an impact on
−1
β̂ I V = Z ' X
. Z'Y (6.46)
Yt = λYt−1 + (1 − λ) μ + δ0 Xt + εt − λεt−1
. (6.47)
only one instrument needs to be found, since only the variable .Yt−1 needs to
be instrumented (the variable .Xt is indeed independent of the error term, by
assumption). We frequently use .Xt−1 as the instrument of .Yt−1 . We then have the
following matrix .Z:
⎛ ⎞
1 X1 X0
⎜1 X2 X1 ⎟
⎜ ⎟
.Z = ⎜ . .. .. ⎟ (6.48)
⎝ .. . . ⎠
1 XT XT −1
6.5 Infinite Distributed Lag Models 277
Remark 6.1 It is not always easy to find the “right” instrumental variables. In
these circumstances, the instrumental variables method may be of limited practical
interest, and it is preferable to resort to the maximum likelihood method. In the
case of the Koyck model, the essential role of the method of instrumental variables
is to obtain a consistent estimator of .β to serve as the initial value of an iterative
procedure, such as the maximum likelihood method.
Remark 6.2 (The Sargan Test) Sargan (1964) developed a test of instrument
validity. The test can be described sequentially as follows:
– Split the variables appearing in the regression model into two groups: the group
of variables independent of the error term (noted .X1 , .X2 , . . . ., .Xk1 ) and the group
of variables that are not independent of the error term (noted .W1 , .W2 , . . . , .Wk2 ).
– Note .Z1 , .Z2 , . . . , .Zk3 the instruments chosen for the variables W , with .k3 ≥ k2.
– Estimate the parameters of the model by the instrumental variables method, i.e.,
' −1 '
.β̂ I V = Z X Z Y , and deduce the estimated series of residuals .et .
– Regress the residuals .et on a constant, the variables X and the variables Z.
Determine the coefficient of determination .R 2 of the estimated regression.
– Calculate the Sargan test statistic:
S = (T − k − 1)R 2
. (6.50)
Yt∗ = α + βXt + εt
. (6.51)
where .Yt∗ denotes the desired level of the dependent variable .Yt and .Xt is an
explanatory variable. As the variable .Yt∗ is unobservable, we express it as a function
of .Yt by using a partial adjustment mechanism of the type:
that is:
This partial adjustment model has a similar structure to the Koyck model, the
error term being simpler since it is only multiplied by the constant .λ.
. Yt = α + βXt∗ + εt (6.55)
where .Xt∗ denotes the expected value of the explanatory variable .Xt . As the variable
∗
.Xt is generally not directly observable, we assume an adaptive training process for
Xt∗ − Xt−1
.
∗ ∗
= λ Xt − Xt−1 (6.56)
6.5 Infinite Distributed Lag Models 279
This model can be reduced to a Koyck model. Let us write the model (6.55) in
(t − 1) and multiply each member by .(1 − λ). This gives us:
.
∗
. (1 − λ) Yt−1 = (1 − λ) α + (1 − λ) βXt−1 + (1 − λ) εt−1 (6.59)
that is:
low, increase until they reach a maximum, and then decrease (a kind of bell curve).
With this approach, the coefficients .δi are distributed as follows:
δi = (1 − λ)r+1 Cr+i
.
i
λi (6.62)
i
where .Cr+i is the coefficient of Newton’s binomial, .0 ≤ λ ≤ 1 and .r ∈ N.
The Pascal approach is a generalization of the Koyck approach. If we posit .r = 0,
we find the geometric distribution of Koyck.
Using Eq. (6.30), the distributed lag model is expressed as follows:
∞
.Yt = μ + (1 − λ)r+1 Cr+i
i
λi Xt−i + εt (6.63)
i=0
The associated .D(L) polynomial is written as:
∞
D(L) = (1 − λ)r+1
.
i
Cr+i λ i Li (6.64)
i=0
Yt = μ + D(L)Xt + εt
. (6.66)
or:
δ0
D(L) =
. (6.68)
(1 − λL)
δ0
D(L) =
. (6.69)
(1 − λL)2
or:
(1 − λL)2
.D(L)−1 = (6.70)
δ0
6.6 Autoregressive Distributed Lag Models 281
(6.72)
(1 − λL)3
D(L)−1 =
. (6.73)
δ0
Substituting in (6.67), we get:
Yt = 3λYt−1 − 3λ2 Yt−2 + λ3 Yt−3 + 1 − 3λ + 3λ2 − λ3 μ
. (6.74)
Generally speaking, the autoregressive form associated with the distributed lag
model in which the coefficients are distributed according to (6.62) has .(r +1) lagged
endogenous variables whose associated coefficients are a function of .λ.
Remark 6.4 In order to determine the value of r, Maddala and Rao (1971) suggest
adopting a sweeping approach: we give ourselves a set of possible values for r and
select the value that maximizes the adjusted coefficient of determination.
In autoregressive distributed lag (ARDL) models, the lagged values of the depen-
dent variable are added to the present and past values of the “usual” explanatory
variables in the set of explanatory variables.7
Generally speaking, an autoregressive distributed lag model is written:
7 We will not deal in detail with ARDL models in this book. For a more exhaustive presentation,
readers can refer to Greene (2020).
282 6 Distributed Lag Models
that is:
p
h
Yt = μ +
. φi Yt−i + δj Xt−j + εt (6.76)
i=1 j =0
Ф(L)Yt = μ + D(L)Xt + εt
. (6.77)
error term .εt is assumed to have the “good” statistical properties. Because of this
characteristic, the OLS estimator is an efficient estimator.
Let us write the distributed lag form of the ARDL model (6.77). To do this, divide
each term of (6.77) by the autoregressive lag polynomial .Ф(L):
μ D(L) εt
Yt =
. + Xt + (6.78)
Ф(L) Ф(L) Ф(L)
∞ ∞
μ
Yt =
. + αj Xt−j + θl εt−l (6.79)
1 − φ1 − . . . − φp
j =0 l=0
where the coefficients .αj , j = 0, 1, . . . , ∞, are the terms associated with the ratio
of the polynomials .D(L) and .Ф(L). Thus, .α0 is the coefficient of 1 in . D(L)
Ф(L) , .α1 is
the coefficient of L in . D(L) 2 D(L)
Ф(L) , .α2 is the coefficient of .L in . Ф(L) , and so on. Similarly,
the coefficients .θl , l = 0, 1, . . . , ∞, are the terms associated with the ratio . Ф(L)
1
.
The model (6.79) has a very general lag structure and is referred to as a rational
lag model by Jorgenson (1966). The long-term multiplier associated with such a
model is given by:
∞
D(1)
. αj = (6.80)
Ф(1)
j =0
6.7 Empirical Application 283
– The returns of the Hang Seng Index of the Hong Kong Stock Exchange: RH K
– The returns of the Japanese index NI KKEI 225: RNI KKEI
The data are weekly and cover the period from the week of December 1, 1969,
to that of July 5, 2021, i.e., a number of observations .T = 2 693 (data source:
Macrobond). Suppose we wish to explain the returns of the Hang Seng Index by
the present and lagged returns of the Japanese index. The dependent variable is
therefore RH K and the explanatory variables are the present and lagged values of
RNI KKEI . We seek to estimate the following distributed lag model:
(6.81)
Let us start by determining the number of lags to take into account. To do this, we
estimate the model (6.81) for various values of h and select the one that minimizes
the information criteria. Table 6.1 shows the values taken by the three criteria AIC,
SIC, and Hannan-Quinn (HQ) for values of h ranging from 1 to 6. These results lead
us to select a number of lags h equal to 1 according to the SIC and HQ criteria and
2 for the AIC criterion. For reasons of parsimony, and given that two out of three
criteria favor a number of lags equal to 1, we choose .h = 1.8
Let us assume a geometric distribution for the lags (Koyck model). We thus seek
to estimate the following model:
8 Note further that the values taken by the AIC criterion for .h = 1 and .h = 2 are almost identical.
284 6 Distributed Lag Models
We have .λ̂ = 0.1134. This value is small, which means that the decay rate of
the coefficients of the distributed lag model is rapid. In other words, the influence
of past values of RN I KKEI on RH K decreases rapidly. The model can also be
written as:
0.0013
μ̂ =
. = 0.0014 (6.84)
1 − 0.1134
.
RH K t = 0.0014 + 0.4846RNI KKEIt + 0.1134 × 0.4846RNI KKEIt−1
+ 0.11342 × 0.4846RNI KKEIt−2 + . . . (6.85)
that is:
.
RH K t = 0.0014 + 0.4846RNI KKEIt + 0.0549RNI KKEIt−1
+ 0.0062RNI KKEIt−2 + . . . (6.86)
We can see that the value of the coefficients associated with the variable
RNI KKEI decreases rapidly as the number of lags increases. We can calculate
the median lag, given by .log 2/ log λ̂, i.e., 0.3184: following a unit variation of
6.7 Empirical Application 285
RNI KKEI , 50% of the total variation of RH K is achieved in just over a day
and a half. As the value of .λ̂ is small, so is the median lag, highlighting a rapid
adjustment. It is also possible to calculate the mean lag:
λ̂
. D̄ = = 0.1278 (6.87)
1 − λ̂
The mean lag is around 0.13: it takes around half a day for the effect of a variation
in RNI KKEI to be reflected in RH K, which is rapid.
Conclusion
This chapter has introduced a first category of dynamic models: distributed lag
models. There is a second category of dynamic models, generally referred to as time
series models, in which the lagged endogenous variable is one of the explanatory
variables. These are the subject of the next chapter, which presents the basics of
time series econometrics.
ΔYt
Short-term multiplier δ0 = ΔXt
Lag form
q
Almon δi = α0 + α1 i + α2 i 2 + . . . + αq i q = j =0 αj i
j, h>q
p
ARDL model Yt = μ + i=1 φi Yt−i + hj =0 δj Xt−j + εt
Lag operator Li Xt = Xt−i
Further Reading
In addition to the references cited in the chapter, readers interested in distributed lag
models can consult Nerlove (1958) and Griliches (1967). A detailed presentation
can also be found in Davidson and MacKinnon (1993) and Gujarati et al. (2017).
An Introduction to Time Series Models
7
1 This chapter takes up a number of developments appearing in the work by Lardic and Mignon
here.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 287
V. Mignon, Principles of Econometrics, Classroom Companion: Economics,
https://doi.org/10.1007/978-3-031-52535-3_7
288 7 An Introduction to Time Series Models
5,000
4,000
3,000
2,000
1,000
0
1980 1985 1990 1995 2000 2005 2010 2015 2020
Fig. 7.1 Standard and Poor’s 500 stock index series, 1980.01–2021.06
from January 1980 to June 2021. The first and last values of this series are given in
Table 7.1: for each month, we have a value of the stock market index.
As the class of random processes is very large, time series analysis initially
focused on a particular class of processes: stationary random processes. These
processes are characterized by the fact that their statistical properties do not change
over time.
7.1 Some Definitions 289
The notion of stationarity of a time series was briefly discussed in the first chapter.
We have seen that, when working with time series, it is necessary to study their
characteristics in terms of stationarity before analyzing and attempting to model
them. Here we present only the concept of second-order stationarity or weak
stationarity, which is the notion of stationarity usually retained in time series
econometrics.3
Condition (1) means that the process is of second order: second-order moments,
such as variance, are finite and independent of time. Condition (2) means that the
expectation of the process is constant over time (mean stationarity). Condition (3)
reflects the fact that the covariance between two periods t and .t + h is solely a
function of the time difference, h. Note that the variance .σY2 = Cov (Yt , Yt ) = γ0
is also independent of time. The fact that the variance is constant over time reflects
the property of homoskedasticity.
In the remainder of the chapter, the term stationary will refer to the concept of
second-order stationarity.
Definition 7.2 Let .Yt be a random process with finite variance. The autovariance
function .γh of .Yt is defined as:
The autocovariance function measures the covariance between two values of the
same series .Yt separated by a certain time h.
3 Fora more detailed study of stationarity and a definition of the various concepts, see in particular
Lardic and Mignon (2002).
290 7 An Introduction to Time Series Models
Theorem 7.1 The autocovariance function of a stationary process .Yt has the
following properties:
– .γ0 = Cov (Yt , Yt ) = E [Yt − E (Yt )]2 = V (Yt ) = σY2 ≥ 0
– .|γh | ≤ γ0
– .γh = γ−h : the autocovariance function is an even function,
.Cov (Yt , Yt+h ) = Cov (Yt , Yt−h ) .
Remark 7.1 We restrict ourselves here to the analysis of series in the time domain.
However, it is possible to study a series in the spectral or frequency domain. The
analog of the autocovariance function in the spectral domain is called the spectral
density. This book does not deal with spectral analysis. Interested readers should
refer to Hamilton (1994) or Greene (2020).
Definition 7.3 Let .Yt be a stationary process. The autocorrelation function .ρh is
defined as:
γh
ρh =
. ,h ∈ Z (7.2)
γ0
The autocorrelation function measures the temporal links between the various
components of the series .Yt . Specifically:
T
−h
(Yt − Ȳ )(Yt+h − Ȳ )
t=1
.ρh = (7.4)
T
−h T
−h
(Yt − Ȳ )2 (Yt−h − Ȳ )2
t=1 t=1
where .Ȳ is the mean of the series .Yt calculated on .(T − h) observations:
T −h
1
Ȳ =
. Yt (7.5)
T −h
t=1
function) as follows:
T
−h
(Yt − Ȳ )(Yt+h − Ȳ )
t=1
.ρ̂h = (7.6)
T
(Yt − Ȳ )2
t=1
where .Ȳ represents the mean of the series .Yt calculated over T observations:
1
T
Ȳ =
. Yt (7.7)
T
t=1
Remark 7.2 The graph of the sampling autocorrelation function is called a correl-
ogram. An example is shown in Fig. 7.2, with the number of lags on the x-axis and
the value of the autocorrelation function on the y-axis.
Theorem 7.2 The autocorrelation function of a stationary process .Yt has the
following properties:
– .ρ0 = 1
– .|ρh | ≤ ρ0
– .ρh = ρ−h : even function.
0
1 2 3 4 5 6
h
−1
292 7 An Introduction to Time Series Models
Let .ρh and .φhh be the autocorrelation and partial autocorrelation functions of
.Yt , respectively. Let .Ph be the symmetric matrix formed by the .(h − 1) first
autocorrelations of .Yt :
⎡ ⎤
1 ρ1 . . . ρh−1
⎢ ⎥
⎢ . 1 ⎥
⎢ ⎥
⎢ . . ⎥
.Ph = ⎢ ⎥ (7.8)
⎢ . . ⎥
⎢ ⎥
⎣ . . ⎦
ρh−1 1
where .|Ph | is the determinant of the matrix .Ph . The matrix .Ph∗ is given by:
⎡ ⎤
1 ρ1 . . ρh−2 ρ1
⎢ . ⎥
⎢ . 1 ⎥
⎢ ⎥
∗ ⎢ . . . ⎥
.Ph = ⎢ ⎥ (7.10)
⎢ . . . ⎥
⎢ ⎥
⎣ . 1 . ⎦
ρh−1 ρh
.Ph∗ is thus the matrix .Ph in which the last column has been replaced by the vector
'
.[ρ1 ....ρh ] .
and
In addition to the graphical representation of the series itself, a first idea concerning
the stationarity or not of a series can be provided by the autocorrelation function.
We know that the autocorrelation function of a stationary time series decreases
very rapidly. If no autocorrelation coefficient is significantly different from zero,
we say that the process has no memory. It is therefore stationary, as in the case of
white noise. If, for example, only the first-order autocorrelation is significant, the
process is said to have a short memory. Conversely, the autocorrelation function of
a non-stationary time series decreases very slowly, indicating a strong dependence
between observations.
Figures 7.3 and 7.4 represent the correlogram of a stationary series. It can be seen
that the autocorrelation function decreases very rapidly (here it is cancelled out from
the fourth lag). Similarly, Fig. 7.5 relates to a stationary series: the autocorrelation
function decreases sinusoidally, but the decay of the envelope curve is exponential,
testifying to a very rapid decrease in the autocorrelation function. Conversely, the
correlograms in Figs. 7.6 and 7.7 relate to a non-stationary series insofar as it
appears that the autocorrelation function decreases very slowly.
0
1 2 3 4
h
−1
294 7 An Introduction to Time Series Models
1 2 3 4
0 h
−1
0
1 2 3 4 5 6 8
h
−1
0
1 2 3 4 5 6 7 8
h
−1
1 2 3 4 5 6 7 8
0 h
−1
.RSPt represents the series of returns of the US stock index over the period from
8.4
8.0
7.6
7.2
6.8
6.4
6.0
5.6
1980 1985 1990 1995 2000 2005 2010 2015 2020
Fig. 7.8 Logarithm of Standard and Poor’s 500 stock index, 1980.01–2021.06
.12
.08
.04
.00
-.04
-.08
-.12
-.16
-.20
-.24
1980 1985 1990 1995 2000 2005 2010 2015 2020
gives the values of the Ljung-Box statistic used to test the null hypothesis of no
autocorrelation (see Chap. 4) for a number of lags ranging from 1 to 20. We see that
the value of this statistic for 20 lags is 8 434,5, which is higher than the critical
value of the Chi-squared distribution with 20 degrees of freedom (31.41 at the
5% significance level): the null hypothesis of no autocorrelation is consequently
rejected. These elements confirm the intuition about the non-stationary nature of the
series LSP . On the other hand, we notice that the autocorrelation function of RSP
no longer shows any particular structure, which pleads in favor of the stationarity of
the series. Of course, this intuition must be confirmed by the application of unit root
tests (see below). However, the Ljung-Box statistic for 20 lags is 36.589, which is
slightly higher than the critical value (31.41 at the 5% significance level), leading to
the rejection of the null hypothesis of no autocorrelation.
Economic and financial series are very often non-stationary series. We are interested
here in non-stationarity in the mean. We have seen that non-stationarity can be
identified graphically through the graph of the series and the correlogram. Since
298 7 An Introduction to Time Series Models
Nelson and Plosser (1982), cases of non-stationarity in the mean have been analyzed
using two types of processes:
Characteristics of TS Processes
Generally speaking, a TS process .Yt can be written:
Yt = ft + εt
. (7.14)
7.2 Stationarity: Autocorrelation Function and Unit Root Test 299
where .ft is a deterministic function of time and .εt is a stationary process. In the
simple case where .ft is a polynomial function of order 1, we have:
Yt = γ + t β + εt
. (7.15)
E[Yt ] = E [ γ + t β + εt ]
. (7.16)
E[Yt ] = γ + t β
. (7.17)
Hence:
V [Yt ] = σε2
. (7.19)
Hence:
. Cov[Yt , Ys ] = 0 ∀ t /= s (7.21)
Remark 7.3 A TS process is a process that can be made stationary (i.e., detrended)
by a regression on a deterministic trend.
300 7 An Introduction to Time Series Models
Characteristics of DS Processes
A DS process is a non-stationary process that can be stationarized by applying a
difference filter .Δ = (1 − L)d where L is the lag operator and d is a positive integer
called the differentiation or integration parameter:
. (1 − L)d Yt = β + εt (7.22)
where .εt is a stationary process. Often .d = 1 and the DS process is written as:
Yt − Yt−1 = β + εt
. (7.23)
stationary.
Yt = Yt−1 + β + εt
. (7.24)
Y1 = Y0 + β + ε1
. (7.25)
Y2 = Y1 + β + ε2 = Y0 + 2β + ε1 + ε2
. (7.26)
t
Yt = Y0 + t β +
. εj (7.27)
j =1
Hence:
E[Yt ] = Y0 + t β
. (7.29)
that is:
⎡ ⎤
t
.V [Yt ] = V ⎣ εj ⎦ (7.31)
j =1
So we have:
V [Yt ] = t σε2
. (7.32)
Hence:
To determine whether a series is stationary or not, unit root tests are applied. There
are numerous unit root tests (see in particular Lardic and Mignon, 2002). We present
here only the test of Dickey and Fuller (1979, 1981) aimed at testing the null
hypothesis of non-stationarity against the alternative hypothesis of stationarity. We
thus test:
– .H0 : the series is non-stationary, i.e., it has at least one unit root.
– .H1 : the series is stationary, i.e., it has no unit root.
(1 − ρL) Yt = εt
. (7.35)
that is:
Yt = ρYt−1 + εt
. (7.36)
that is:
Yt = ρYt−1 + μ (1 − ρ) εt
. (7.38)
that is:
Yt − α − βt − ρYt−1 + αρ + β (t − 1) = εt
. (7.40)
7.2 Stationarity: Autocorrelation Function and Unit Root Test 303
hence:
Yt = ρYt−1 + α (1 − ρ) + βρ + β (1 − ρ) t + εt
. (7.41)
– Model [1]:
H0 : ρ = 1 ⇔ Yt = Yt−1 + εt
. (7.42)
H1 : |ρ| < 1 ⇔ Yt = ρYt−1 + εt
Under the null hypothesis, Yt follows a random walk process without drift.
Under the alternative hypothesis, Yt follows an autoregressive process of order 1
(AR(1)).
– Model [2]:
H0 : ρ = 1 ⇔ Yt = Yt−1 + εt
. (7.43)
H1 : |ρ| < 1 ⇔ Yt = ρYt−1 + γ + εt with γ = μ(1 − ρ)
Under the null hypothesis, Yt follows a random walk with drift. Under the
alternative hypothesis, Yt is a TS process. It can be made stationary by calculating
the deviations from the trend estimated by OLS.
To facilitate the application of the test, models [1], [2], and [3] are in practice
estimated in the following form:5
5 The first-difference models allow us to reduce to usual tests of significance of the coefficients,
the critical values being tabulated by Dickey and Fuller (see below).
304 7 An Introduction to Time Series Models
– Model [1]:
ΔYt = φ Yt−1 + εt
. (7.45)
– Model [2]:
Δ Yt = γ + φ Yt−1 + εt
. (7.46)
– Model [3]:
ΔYt = λ + δt + φ Yt−1 + εt
. (7.47)
– If the calculated value of the t-statistic associated with φ is lower than the critical
value, the null hypothesis is rejected, the series is stationary.
– If the calculated value of the t-statistic associated with φ is higher than the critical
value, the null hypothesis is not rejected, the series is non-stationary.
The models used in the DF test are restrictive in that εt is assumed to be white
noise. However, this assumption is very often questioned due to autocorrelation
and/or heteroskedasticity. To solve this problem, Dickey and Fuller proposed a
parametric correction leading to the augmented Dickey-Fuller test.
– Model [1]:
p
ΔYt = φ Yt−1 +
. φj ΔYt−j + εt (7.48)
j =1
– Model [2]:
p
Δ Yt = γ + φ Yt−1 +
. φj ΔYt−j + εt (7.49)
j =1
– Model [3]:
p
ΔYt = λ + δt + φ Yt−1 +
. φj ΔYt−j + εt (7.50)
j =1
Again, we test the null hypothesis .φ = 0 against the alternative hypothesis .φ < 0.
The t-statistic of the coefficient .φ is compared to the critical values tabulated by
Dickey and Fuller (see Table 7.2). The null hypothesis of unit root is rejected if the
calculated value is less than the critical value.
It should be noted that the application of the ADF test requires us to choose
the number of lags p – called the truncation parameter of the ADF test – to
6 One of the causes of error autocorrelation lies in the omission of explanatory variables. The
correction provided by Dickey and Fuller thus consists in adding explanatory variables represented
by the lagged values of the endogenous variable.
306 7 An Introduction to Time Series Models
be introduced so that the residuals are indeed white noise. Several methods are
available for making this choice, including:
– The study of partial autocorrelations of the series .ΔYt . We select for p the lag
corresponding to the last partial autocorrelation significantly different from zero.
– The estimation of several processes for different values of p. We retain the model
that minimizes the information criteria of Akaike, Schwarz, or Hannan-Quinn.
– The use of the procedure suggested by Campbell and Perron (1991) consisting in
setting a maximum value for p, noted .pmax . We then estimate the regression
model of the ADF test and test the significance of the coefficient associated
with the term .ΔYt−pmax . If this coefficient is significant, we select this value
.pmax for p. If the coefficient associated with .ΔYt−pmax is not significant, we re-
estimate the ADF regression model for a value of p equal to .pmax − 1 and test
the significance of the coefficient relating to the term .ΔYt−pmax −1 and so on.
p
ΔYt = α + βt + φ Yt−1 +
. φj ΔYt−j + εt (7.51)
j =1
p
ΔYt = α + φ Yt−1 +
. φj ΔYt−j + εt (7.52)
j =1
and begin by testing the significance of the constant by referring to the Dickey-
Fuller tables (see Table 7.3):
– If the constant is not significant, we go to Step 3.
– If the constant is significant, we test the null hypothesis of unit root by
comparing the t-statistic of .φ with the values tabulated by Dickey and Fuller
(see Table 7.2). We then have two possibilities:
– If we do not reject the null hypothesis, .Yt is non-stationary. In this case, it
must be differentiated and the test procedure must be repeated on the series
in first difference.
– If the null hypothesis is rejected, .Yt is stationary. In this case, the test
procedure stops and we can work directly on the series .Yt .
– Step 3. This step should only be applied if the constant in the previous model is
not significant. We estimate model [1]:
p
ΔYt = φ Yt−1 +
. φj ΔYt−j + εt (7.53)
j =1
and test the null hypothesis of unit root using Dickey-Fuller critical values (see
Table 7.2):
– If the null hypothesis is not rejected, .Yt is non-stationary. In this case, it must
be differentiated and the test procedure must be repeated on the series in first
difference.
– If the null hypothesis is rejected, .Yt is stationary. In this case, the test
procedure stops and we can work directly on the series .Yt .
Remark 7.5 If, after applying this procedure, we find that .Yt is non-stationary, this
means that the series contains at least one unit root. In this case, we should repeat
the Dickey-Fuller tests on the series in first difference. If .ΔYt is found to be non-
stationary, the procedure should be applied again on the series in second difference
and so on.
Empirical Application
Consider the series SP of Standard and Poor’s 500 stock index over the period
from January 1980 to June 2021. The logarithmic series is denoted LSP , with RSP
standing for the series of returns. Our aim is to apply the Dickey-Fuller test strategy.
Let us first study the stationarity of the series LSP . We test the null hypothesis
of non-stationarity of the series LSP (presence of unit root) against the alternative
hypothesis of stationarity (absence of unit root). To this end, we begin by estimating
the model with constant and trend:
p
ΔLSPt = RSPt = λ + δt + φ LSPt−1 +
. φj RSPt−j + εt (7.54)
j =1
Estimating this model involves determining the value of the truncation parameter
p. As previously mentioned, this choice can be guided by the graph of the partial
autocorrelation function of the series RSP (Fig. 7.11). As shown, only the first
partial autocorrelation lies outside the confidence interval. In other words, only the
first partial autocorrelation is significantly different from zero, which leads us to take
a value of p equal to 1. Another technique involves estimating the model (7.54) for
different values of p and selecting the value that minimizes the information criteria.
Table 7.4 shows the values taken by the AIC, SIC, and HQ information criteria for
values of p ranging from 1 to 12. Minimizing the SIC and HQ criteria leads us
to choose .p = 1, while the AIC criterion tends to select .p = 2. For reasons of
parsimony, and insofar as two out of three criteria favor a value of p equal to 1, we
choose .p = 1.7
As a result, we estimate the following model:
The results are set out in Table 7.5. We start by testing the significance of the
trend (noted .@T REND(“1980M01”)) by referring to the Dickey-Fuller tables. The
critical value of the trend in a model with constant and trend for 500 observations
being 2.78 (see Table 7.3), we have .1.9612 < 2.78: we do not reject the null
hypothesis of non-significance of the trend. We then proceed to the next step, which
7 Forrobustness, we also conducted the analysis with two lags. The results are identical to those
presented here.
7.2 Stationarity: Autocorrelation Function and Unit Root Test 309
Table 7.5 ADF test on LSP . Model with constant and trend
Null hypothesis: LSP has a unit root
Exogenous: constant, linear trend
Lag length: 1 (automatic – based on SIC, maxlag .= 17)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic .−2.037919 0.5786
Test critical values 1%level .−3.976591
5% level .−3.418870
The results are given in Table 7.6. We test the significance of the constant. The
critical value, at the 5% significance level, of the constant in a model with constant
310 7 An Introduction to Time Series Models
Table 7.6 ADF test on LSP . Model with constant, without trend
Null hypothesis: LSP has a unit root
Exogenous: constant
Lag length: 1 (automatic – based on SIC, maxlag .= 17)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic .−0.584842 0.8709
Test critical values: 1%level .−3.443254
5% level .−2.867124
without trend is 2.52 (see Table 7.3). Since .0.7951 < 2.52, we do not reject the null
hypothesis that the constant is insignificant. Finally, we estimate the model without
constant or trend:
The results in Table 7.7 allow us to proceed with the unit root test, i.e., the
test of the null hypothesis .φ = 0 against the alternative hypothesis .φ < 0. The
calculated value of the ADF statistic is 2.2448 and the critical value is .−1.95 at
the 5% significance level (Table 7.2). Since .2.2448 > −1.95, we do not reject the
null hypothesis of non-stationarity of the series LSP . We deduce that LSP is non-
stationary and characterized by the presence of at least one unit root.
To determine the order of integration of LSP , we differentiate it:
5% level .−1.941460
and we perform the ADF test on the series RSP . The null hypothesis that RSP is
non-stationary is tested against the alternative hypothesis of stationarity. We adopt
the same sequential strategy as before, first estimating the model with constant and
trend:
p
.ΔRSPt = λ + δt + φ RSPt−1 + φj ΔRSPt−j + εt (7.59)
j =1
The endogenous variable is the series of changes in returns, in other words, the
second difference of the LSP series. In order to determine the truncation parameter
p, we have estimated this model for various values of p and selected the one that
minimizes the information criteria. The application of this methodology leads us to
choose a number of lags p equal to 0, which corresponds to the case of a simple
Dickey-Fuller test. Consequently, we estimate the following model:
ΔRSPt = λ + δt + φ RSPt−1 + εt
. (7.60)
312 7 An Introduction to Time Series Models
5% level .−1.941460
and start by testing the significance of the trend. The results (not reported here) give
us a calculated t-statistic associated with the trend equal to 0.1925. As this value
is lower than the critical value of 2.78, we do not reject the null hypothesis that
the trend is not significant. We therefore estimate the model with constant, without
trend. The results lead to a t-statistic associated with the constant equal to 2.3093,
below the critical value of 2.52. We finally estimate the model with no constant or
trend, the results of which are shown in Table 7.8.
The calculated value of the ADF statistic being equal to .−17.4823 and the critical
value at the 5% significance level being .−1.95, we have: .−17.4823 < −1.95. We
therefore reject the null hypothesis of non-stationarity of the series RSP . We deduce
that RSP is stationary, i.e., integrated of order 0. It follows that the series LSP is
integrated of order 1, since it has to be differentiated once to make it stationary.
7.3.1 Definitions
Autoregressive Processes
Definitions
Definition 7.5 An autoregressive process of order p, denoted AR(p), is a
stationary process Yt verifying a relation of the type:
By introducing the lag operator L, the relation (7.61) can also be written as:
. (1 − φ1 L − ··· − φp Lp ) Yt = εt (7.62)
or:
. Ф(L) Yt = εt (7.63)
Remark 7.7 In time series models, the error term εt is often called innovation.
This name derives from the fact that it is the only new information involved in the
process at date t.
1 1
. E[Yt Yt−h ] − φ1 E[Yt−1 Yt−h ] − ··· − φp E[Yt−p Yt−h ] = E[εt Yt−h ]
γ0 γ0
(7.64)
1
. γh − φ1 γh−1 − . . . − φp γh−p = 0 (7.65)
γ0
314 7 An Introduction to Time Series Models
γh
Hence, noting .ρh = γ0 the autocorrelation function:
ρh − φ1 ρh−1 − . . . − φp ρh−p = 0
. (7.66)
p
ρh =
. φi ρh−i ∀h>0 (7.67)
i=1
⎛ ⎞ ⎛ 1 ρ ρ ... ρ ⎞⎛ ⎞
ρ1 1 2 p−1 φ1
⎜ ρ2 ⎟ ⎜ .. ⎟ ⎜ ⎟
⎜ ⎟ ⎜ ρ1 1 . ⎟
⎟⎜
φ2 ⎟
.⎜ . ⎟ = ⎜ ⎜ . ⎟ (7.68)
. ⎜
⎝ . ⎠ ⎝ . . ⎟ ⎝ .. ⎠
. ρ1 ⎠
ρp ρp−1 1 φp
Partial Autocorrelations
It is possible to calculate the partial autocorrelations of the AR process from the
Yule-Walker equations and the autocorrelations. For this, we use the algorithm of
Durbin (1960):
⎧
⎪
⎪ φ11 = ρ1 algorithm initialization
⎪
⎪
h−1
⎪
⎨ ρh − φh−1,j ρh−j
j =1
. φhh = for h = 2, 3, . . . (7.69)
⎪
⎪
h−1
⎪
⎪
1− φh−1,j ρj
⎪
⎩ j =1
φhj = φh−1,j − φhh φh−1,h−j for h = 2, 3 . . . and j = 1, . . . , h − 1
Property 7.1 For a process . AR(p), . φhh = 0 ∀ h > p. In other words, for a
process .AR(p), the partial autocorrelations cancel out from rank .p + 1.
Moving-Average Processes
Definitions
Definition 7.6 A moving-average process of order q, denoted MA(q) , is a
stationary process Yt verifying a relationship of the type:
Yt = εt − θ1 εt−1 − · · · − θq εt−q
. (7.70)
Yt = (1 − θ1 L − · · · − θq Lq ) εt
. (7.71)
or:
with Θ(L) = 1 − θ1 L − · · · θq Lq .
γh = E[Yt Yt−h ]
. (7.73)
= E (εt − θ1 εt−1 − · · · − θq εt−q )(εt−h − · · · − θq εt−h−q )
γh
We deduce the autocorrelation function .ρh = γ0 :
Property 7.2 For a process .MA(q), .ρh = 0 for .h > q. In other words, the
autocorrelations cancel from rank .q + 1, when the true data generating process
is a .MA(q).
316 7 An Introduction to Time Series Models
Partial Autocorrelations
In order to calculate the partial autocorrelations of an MA process, we use
the Durbin algorithm. However, the partial autocorrelation function of a process
. MA(q) has no particular property and its expression is relatively complicated.
Definitions
Definition 7.7 A stationary process Yt follows an ARMA(p, q) process if:
Autocorrelations
To calculate the autocorrelations of an ARMA process, we proceed as in the case of
AR processes. We obtain the following expression:
p
ρh =
. φi ρh−i ∀h>q (7.79)
i=1
Partial Autocorrelations
The partial autocorrelation function of ARMA processes has no simple expression.
It depends on the order of each part (p and q) and the value of the parameters. It
is most frequently characterized either by a decreasing exponential form or by a
damped oscillatory form.
7.3 ARMA Processes 317
In order to determine the appropriate ARMA process for modeling the time
series under consideration, Box and Jenkins suggested a four-step methodology:
identification, estimation, validation, and forecasting. Let us briefly review these
different steps.
Autocorrelation Function
We start by calculating the autocorrelation coefficients from the expression (7.6):
T
−h
(Yt − Ȳ )(Yt+h − Ȳ )
t=1
ρ̂h =
. (7.80)
T
(Yt − Ȳ )2
t=1
– If . |tρ̂h | < t (T − l), we do not reject the null hypothesis: .ρh is not significant.
– If . |tρ̂h | ≥ t (T − l), we reject the null hypothesis: .ρh is significantly different
from zero,
!1/2
h−1
8 Bartlett showed that the standard deviation is given by .σ̂ ρ̂h = 1
T 1+2 ρ̂i2 .
i=1
318 7 An Introduction to Time Series Models
Example 7.1 Suppose that the application of the t-test on autocorrelations yields
ρ1 /= 0 and .ρ2 = . . . = ρH = 0. The process identified is then an .MA(1) since the
.
– If . |tφ̂hh | < t (T − l), we do not reject the null hypothesis: .φhh is not significantly
different from zero.
– If . |tφ̂hh | ≥ t (T − l), we reject the null hypothesis: .φhh is significantly different
from zero.,
Example 7.2 Suppose that the application of the t-test on partial autocorrelations
yields .φ11 /= 0 and .φ22 = . . . = φH H = 0. The process identified is then an .AR(1)
since the partial autocorrelations cancel out from rank .p + 1, with .p = 1.
At the end of this identification stage, one or more models have been selected. It
is now necessary to estimate each selected model, which is the object of the second
step of the Box and Jenkins procedure.
– With regard to the coefficients, these are the usual significance tests (t-tests). As
these tests are identical to those presented in the previous chapters, we will not
repeat them here. Let us simply note that if some of the estimated coefficients
are not significant, the estimation must be repeated by deleting the variable(s)
associated with the non-significant coefficients.
– With regard to the residuals, the aim is to test whether they have the “good”
statistical properties. In particular, we need to test whether the residuals are
homoskedastic and not autocorrelated.
If several models are validated, the validation step should continue with a
comparison between these models.
Tests on Residuals
"(L)
Ф
The purpose of these tests is to verify that the residuals .et = Θ "(L) Yt do follow
a white noise process. To this end, we apply tests of absence of autocorrelation
and tests of homoskedasticity. These various tests have already been presented in
detail in Chap. 4 and remain valid in the context of ARMA processes. Thus, in
order to test the null hypothesis of no autocorrelation, the Breusch-Godfrey, Box-
Pierce, or Ljung-Box tests can be applied. Similarly, to test the null hypothesis of
homoskedasticity, the tests of Goldfeld and Quandt, Glejser, Breusch-Pagan, White,
or the ARCH test can be implemented.
The tests most commonly used in time series econometrics are the Box-Pierce
or Ljung-Box tests with regard to absence of autocorrelation, and the ARCH test
with regard to homoskedasticity. It is worth clarifying the number of degrees
of freedom associated with the Box-Pierce and Ljung-Box tests. Under the null
hypothesis of no autocorrelation, these two statistics have a Chi-squared distribution
with .(H − p − q) degrees of freedom, where H is the maximum number of lags
considered for calculating autocorrelations, p is the order of the autoregressive part,
and q is the order of the moving-average part.
Once the various tests have been applied, several models can be validated. It
remains for us to compare them in an attempt to select the most “adequate” model.
To this end, various model selection criteria can be used.
– Standard criteria: they are based on the calculation of the forecast error that we
seek to minimize. In this context, the most frequently used criteria are:
– The mean absolute error:
1
MAE =
. |et | (7.81)
T t
320 7 An Introduction to Time Series Models
where T is the number of observations in the series .Yt studied and .et are the
residuals.
The lower the value taken by these criteria, the closer the estimated model is
to the observations.
– Information criteria: we have already presented them in Chap. 3. The most
widely used criteria are those of Akaike, Schwarz, and, to a lesser extent,
Hannan-Quinn:
– The Akaike information criterion (1969):9
2(p + q)
AI C = log "
. σε2 + (7.84)
T
– The Schwarz information criterion (1978):
log T
SI C = log "
. σε2 + (p + q) (7.85)
T
log(log T )
H Q = log "
. σε2 + 2(p + q) (7.86)
T
Ф(L) Yt = Θ(L) εt
. (7.87)
"t+h denote the forecast made at t for the date .t + h, with h denoting the
and let .Y
forecast horizon. By definition, we have the following expression:
with . |φ1 | < 1 and . |θ1 | < 1 . Let us calculate the forecasts for various horizons.
– .Yt+1 = φ1 Yt + εt+1 − θ1 εt
"t+1 = E[Yt+1 |It ] = φ1 Yt − θ1 εt
.Y
"t+h = φ1 Y
Y
. "t+h−1 ∀h>1 (7.90)
"t+h ± u × σet+h
Y
. (7.91)
assuming that the residuals follow a Gaussian white noise process, with u being the
value of the standard normal distribution at the selected significance level (at the
5% level, .u = 1.96). It is then possible to impart a certain degree of confidence to
the forecast if the value of the dependent variable, for the horizon considered, lies
within the prediction interval.
Consider again the series RSP of the returns of Standard and Poor’s stock index
at monthly frequency over the period from February 1980 to June 2021. As we
have previously shown, this series is stationary and can, therefore, be modeled by
an ARMA-type process. To this end, let us take up the four steps of the Box and
Jenkins methodology.
322 7 An Introduction to Time Series Models
Step 1: Identification
In order to identify the orders p and q, let us consider the graph of autocorrelations
and partial autocorrelations of the series RSP . Examining Fig. 7.12 shows that:
– The first autocorrelation falls outside the confidence interval, being significantly
different from zero. From order 2 onwards, the autocorrelations cancel out. We
deduce .q = 1.
– The first partial autocorrelation lies outside the confidence interval, and is sig-
nificantly different from zero. From order 2 onwards, the partial autocorrelations
cancel out. We deduce .p = 1.
At the end of this step, we identify three processes: .AR(1), .MA(1), and
ARMA(1, 1). We can now estimate each of these models.
.
Step 2: Estimation
We estimate the three processes identified: .AR(1) (Table 7.9), .MA(1) (Table 7.10),
and .ARMA(1, 1) (Table 7.11).
7.3 ARMA Processes 323
Step 3: Validation
Tests of Significance of Coefficients
Let us first proceed to the significance of the coefficients in each of the three
estimated models:
At the end of this first phase of the validation stage, two processes are candidates
for modeling the series RSP : the .AR(1) and the .MA(1) processes.
Tests on Residuals
We now apply the tests to the residuals of the .AR(1) and .MA(1) models. We start
with the Ljung-Box test of absence of autocorrelation. The results are shown in
Figs. 7.13 and 7.14. These figures first show that the autocorrelations of the residuals
lie within the confidence interval for each of the two models, suggesting the absence
of autocorrelation. Let us calculate the Ljung-Box statistic for a maximum number
of lags H of 20:
– For the residuals of the .AR(1) model, we have .LB(20) = 14.333. Under
the null hypothesis of no autocorrelation, this statistic follows a Chi-squared
distribution with .(H − p − q) = (20 − 1 − 0) = 19 degrees of freedom. At
the 5% significance level, the critical value of the Chi-squared distribution with
19 degrees of freedom is .30.144. Since .14.333 < 30.144, we do not reject the
null hypothesis of no autocorrelation of residuals. The model .AR(1) therefore
remains a candidate.
– For the residuals of the MA(1) model, we have LB(20) = 11.388. Under the null
hypothesis of no autocorrelation, this statistic has a Chi-squared distribution with
(H − p − q) = (20 − 0 − 1) = 19 degrees of freedom, the corresponding critical
value being 30.144 at the 5% significance level. We find that 11.388 < 30.144,
so the null hypothesis of no autocorrelation of the residuals is not rejected and the
MA(1) model also remains a candidate.
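The Ljung-Box statistics reported above can be computed along the following lines; `fits` refers to the estimation sketch given earlier, and model_df corrects the degrees of freedom for the p + q estimated ARMA parameters, giving H − p − q = 19 here.

```python
# Sketch of the Ljung-Box test of no residual autocorrelation with H = 20 lags.
from statsmodels.stats.diagnostic import acorr_ljungbox

lb_ar1 = acorr_ljungbox(fits["AR(1)"].resid, lags=[20], model_df=1)
print(lb_ar1)   # compare lb_stat with the chi-squared critical value with 19 df
```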
Let us now apply the ARCH test to check that the residuals of both models
are indeed homoskedastic. This test involves regressing the squared residuals on
a constant and their .𝓁 past values:
e_t² = a_0 + Σ_{i=1}^{ℓ} a_i e_{t−i}²    (7.92)

and testing the null hypothesis of homoskedasticity:

H_0 : a_1 = a_2 = · · · = a_ℓ = 0    (7.93)
We have estimated the relationship (7.92) on the squares of the residuals of each
of the two models considered, using a number of lags .𝓁 = 1. The results are shown
in Table 7.12. For both models, the critical value to which the .T R 2 test statistic must
be compared is that of the Chi-squared distribution with 1 degree of freedom, i.e.,
3.841 at the 5% significance level. It can be seen that, for both models, the calculated
value of the test statistic is higher than the critical value. The null hypothesis of
homoskedasticity is consequently rejected at the 5% significance level.
In summary, the residuals of the .AR(1) and .MA(1) models are not autocorre-
lated, but (slightly) heteroskedastic. Both models therefore pass the validation stage
from the point of view of the absence of autocorrelation, but not from the point of
view of the homoskedasticity property. This result is not surprising insofar as the
study concerns financial series that are known to exhibit heteroskedasticity due to
their time-varying volatility.
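The ARCH test of relation (7.92) can be sketched as follows, both through the explicit auxiliary regression and with the het_arch function of statsmodels; `fits` again refers to the earlier estimation sketch.

```python
# Sketch of the ARCH test with one lag: regress squared residuals on a constant
# and their first lag, then compare T*R^2 with the chi-squared(1) critical value.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2
from statsmodels.stats.diagnostic import het_arch

e2 = np.asarray(fits["AR(1)"].resid) ** 2
ols = sm.OLS(e2[1:], sm.add_constant(e2[:-1])).fit()
lm_stat = len(e2[1:]) * ols.rsquared
print("T*R^2 =", round(lm_stat, 3), "| 5% critical value:", round(chi2.ppf(0.95, 1), 3))

# statsmodels provides the same test directly:
lm, lm_pval, _, _ = het_arch(fits["AR(1)"].resid, nlags=1)
print("het_arch LM =", round(lm, 3), "p-value =", round(lm_pval, 3))
```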
7.4 Extension to the Multivariate Case: VAR Processes
Consider two variables Y_{1t} and Y_{2t} and suppose that the number of lags retained is p = 4. The VAR(4) model describing these two variables is written as:

Y_{1t} = a_1 + Σ_{i=1}^{4} b_{1i} Y_{1,t−i} + Σ_{j=1}^{4} c_{1j} Y_{2,t−j} − d_1 Y_{2t} + ε_{1t}
Y_{2t} = a_2 + Σ_{i=1}^{4} b_{2i} Y_{1,t−i} + Σ_{j=1}^{4} c_{2j} Y_{2,t−j} − d_2 Y_{1t} + ε_{2t}    (7.94)
where .ε1t and . ε2t are two uncorrelated white noise processes.
This model involves estimating 20 coefficients. The number of parameters to be
estimated grows rapidly with the number of lags, as .pN 2 , where p is the number of
lags and N the number of variables in the model.
In matrix form, the VAR(4) process is written:

B Y_t = Φ_0 + Σ_{i=1}^{4} Φ_i Y_{t−i} + ε_t    (7.95)

with:

B = [1  d_1 ; d_2  1],   Φ_0 = [a_1 ; a_2],   Y_t = [Y_{1t} ; Y_{2t}]    (7.96)

Φ_i = [b_{1i}  c_{1i} ; b_{2i}  c_{2i}],   ε_t = [ε_{1t} ; ε_{2t}]
General Formulation
We generalize the previous example to the case where .Y t contains N variables and
for any order of lags p. A .V AR(p) process with N variables is written in matrix
form:
Y_t = Φ_0 + Φ_1 Y_{t−1} + · · · + Φ_p Y_{t−p} + ε_t    (7.97)

with:

Y_t = (Y_{1t}, . . . , Y_{Nt})′,   ε_t = (ε_{1t}, . . . , ε_{Nt})′,   Φ_0 = (a_1^0, . . . , a_N^0)′    (7.98)

and, for each lag, an (N, N) coefficient matrix:

Φ_p = [ a_{1p}^1  a_{1p}^2  · · ·  a_{1p}^N ; . . . ; a_{Np}^1  a_{Np}^2  · · ·  a_{Np}^N ]
(I − Φ_1 L − Φ_2 L² − · · · − Φ_p L^p) Y_t = Φ_0 + ε_t    (7.99)

or:

Φ(L) Y_t = Φ_0 + ε_t    (7.100)

with Φ(L) = I − Σ_{i=1}^{p} Φ_i L^i.
More formally, the following definition is used.
Definition 7.8  Y_t follows a VAR(p) process if and only if there exist a white noise
ε_t (ε_t ∼ WN(0, Σ_ε)), Φ_0 ∈ R^N, and p matrices Φ_1, . . . , Φ_p such that:

Y_t − Σ_{i=1}^{p} Φ_i Y_{t−i} = Φ_0 + ε_t    (7.101)

or:

Φ(L) Y_t = Φ_0 + ε_t    (7.102)

where:

Φ(L) = I − Σ_{i=1}^{p} Φ_i L^i    (7.103)
The parameters of the VAR process can only be estimated on stationary time
series.11 Two estimation techniques are possible: estimation of each equation of
the VAR model by OLS or estimation by the maximum likelihood technique. The
estimation of a VAR model involves choosing the number of lags p. To determine
this value, the information criteria can be used. The procedure consists in estimating
a number of VAR models for an order p ranging from 0 to h where h is the maximum
lag. We select the lag p that minimizes the information criteria AIC, SIC, and HQ12
defined as follows:
AIC = log det Σ̂_ε + 2N²p/T    (7.104)

SIC = log det Σ̂_ε + N²p (log T)/T    (7.105)

HQ = log det Σ̂_ε + 2N²p (log(log T))/T    (7.106)
where N is the number of variables in the system, T is the number of observations,
and .Σ̂ ε is an estimator of the variance-covariance matrix of the residuals, det
denoting its determinant.
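As an illustration of lag selection by information criteria, the following sketch estimates VAR(p) models for p = 0, . . . , 12 and reports the criteria; the simulated two-variable system is only a placeholder for actual data.

```python
# Sketch of VAR lag selection with the AIC, SIC (BIC), and HQ criteria.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)
T = 400
y = np.zeros((T, 2))
e = rng.normal(size=(T, 2))
for t in range(1, T):                       # placeholder VAR(1)-type system
    y[t, 0] = 0.5 * y[t - 1, 0] + 0.2 * y[t - 1, 1] + e[t, 0]
    y[t, 1] = 0.3 * y[t - 1, 1] + e[t, 1]
data = pd.DataFrame(y, columns=["Y1", "Y2"])

model = VAR(data)
sel = model.select_order(maxlags=12)        # estimates VAR(p) for p = 0, ..., 12
print(sel.summary())                        # AIC, BIC (SIC), HQIC, FPE by lag
print(sel.selected_orders)                  # lag minimizing each criterion

res = model.fit(sel.selected_orders["bic"]) # keep, e.g., the SIC/BIC choice
print(res.summary())
```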
Remark 7.9 In the case of AR processes, in addition to the tests on the parameters,
tests on the residuals are performed in order to validate the process. In the case of
VAR processes, these tests are not very powerful, and we prefer to use a graph
of the residuals. Residuals should be examined carefully especially when using
VAR models for impulse response analysis, where the absence of correlation of
the innovations is crucial for the interpretation.
Y_t = Φ_1 Y_{t−1} + · · · + Φ_p Y_{t−p} + ε_t    (7.107)
12 We have assumed here that the constant c involved in the expression of the HQ criterion is equal
to 1.
It is assumed that p has been chosen, that the .Фi have been estimated, and that
the variance-covariance matrix associated with .ε t has been estimated.
Under certain conditions, the prediction in .(T + 1) of the process is:
E[Y_{T+1} | I_T] = Φ̂_1 Y_T + · · · + Φ̂_p Y_{T−p+1}    (7.108)
VAR models also provide a natural framework for testing causality in the sense of
Granger (1969). As an example, consider the following VAR(p) process with two
variables Y_{1t} and Y_{2t}:
[Y_{1t} ; Y_{2t}] = [a_0 ; b_0] + [a_{11}  a_{12} ; b_{11}  b_{12}] [Y_{1,t−1} ; Y_{2,t−1}] + · · ·
+ [a_{p1}  a_{p2} ; b_{p1}  b_{p2}] [Y_{1,t−p} ; Y_{2,t−p}] + [ε_{1t} ; ε_{2t}]    (7.109)
Testing for the absence of causality from Y_{1t} to Y_{2t} is equivalent to performing
a restriction test on the coefficients of the lagged values of Y_{1t} in the VAR representation.
Specifically:
– . Y1t does not cause . Y2t if the following null hypothesis is not rejected: .H0 : b11 =
b21 = · · · = bp1 = 0.
– Y_{2t} does not cause Y_{1t} if the following null hypothesis is not rejected: H_0 : a_{12} =
a_{22} = · · · = a_{p2} = 0.
These are classic Fisher tests. They are performed either equation by equation, or
directly by comparison between a constrained . V AR model and an unconstrained
. V AR model. In the latter case, we can also perform a maximum likelihood ratio
test.
In the case of Fisher tests, the strategy is as follows for a test of absence of
causality from .Y1t to .Y2t :
– We regress .Y2t on its p past values and on the p past values of .Y1t . This is the
unconstrained model and we note .RSSnc the sum of squared residuals associated
with this model.
– We regress Y_{2t} on its p past values and note the sum of squared residuals
RSS_c. This is the constrained model, in which we have imposed the nullity of the
coefficients of the lagged values of Y_{1t}.
– We then compute the test statistic:

F = [(RSS_c − RSS_nc)/r] / [RSS_nc/(T − k − 1)]    (7.110)

where r is the number of constraints, i.e., the number of coefficients being tested
for nullity, and k is the number of estimated parameters (excluding the constant)
involved in the unconstrained model. Under the null hypothesis of no causality,
this statistic has a Fisher distribution with .(r, T − k − 1) degrees of freedom.
In the case of a maximum likelihood ratio test, we calculate the test statistic:
C = T log( det Σ̂_ε^c / det Σ̂_ε^nc )    (7.111)

where Σ̂_ε^c (respectively Σ̂_ε^nc) denotes the estimator of the variance-covariance
matrix of the residuals of the constrained (respectively unconstrained) model, det
being the determinant. Under the null hypothesis of no causality, this statistic
follows a Chi-squared distribution with a number of degrees of freedom equal to the number of restrictions tested (here, p).
Remark 7.10 If we reject the two null hypotheses (absence of causality from Y_1 to
Y_2 and absence of causality from Y_2 to Y_1), we have a bi-directional causality; we
then speak of a feedback effect.
Remark 7.11 One of the practical applications of VAR models lies in the calcu-
lation of the impulse response function. The latter makes it possible to assess the
effect of a random shock on the variables and can therefore be useful for analyzing
the effects of an economic policy. This analysis is beyond the scope of this book
and we refer the reader to Hamilton (1994), Lardic and Mignon (2002), or Greene
(2020).
Consider Standard and Poor’s 500 (SP 500) US stock index series and the associated
dividend series over the period from January 1871 to June 2021. Since the data are
monthly, the number of observations is 1 806. The series are expressed in real terms,
i.e., they have been deflated by the consumer price index.13
1
80 90 00 10 20 30 40 50 60 70 80 90 00 10 20
LSP LDIV
We denote LSP the logarithm of the SP 500 index and LDI V the dividend series
in logarithms. We are interested in the relationship between the two series, seeking
to estimate a VAR-type model. Let us start by studying the characteristics of the two
variables in terms of stationarity.
The two series are shown in Fig. 7.15 and appear to exhibit an overall upward
trend, suggesting that they are non-stationary in the mean. In order to confirm this
intuition, we perform the Dickey-Fuller unit root tests. To do this, we follow the
sequential strategy presented earlier. First, we estimate the model with constant and
trend. If the trend is not significant, we estimate the model with constant. Finally, if
the constant is not significant, we estimate the model without constant or trend. The
implementation of this strategy leads us to select the appropriate specification for each series.
The results obtained for the value of the ADF statistic are shown in Table 7.14.
It can be seen that the null hypothesis of unit root cannot be rejected for the
two series considered LSP and LDI V . The application of the Dickey-Fuller tests
on the series in first difference (denoted DLSP and DLDI V ) indicates that they
are stationary. In other words, the differenced series are integrated of order 0,
implying that the series LSP and LDI V are integrated of order 1.
The VAR model is then estimated on the series DLSP and DLDI V , i.e., on the
stationary series. We start by looking for the order p of the VAR process. To this
end, we estimate the VAR process for values of p ranging from 1 to 12 and report
the values taken by the AIC, SIC, and HQ criteria (see Table 7.15). The SIC and HQ
criteria lead us to select a .V AR(2) process, whereas, according to the AIC criterion,
we should select a .V AR(12) process. For the sake of parsimony, we continue the
study with the .V AR(2) process.
The results from the estimation of the .V AR(2) process are shown in Table 7.16;
the values in square brackets represent the t-statistics of the estimated coefficients.
It can be seen that the SP returns are a function of themselves lagged by one and
two periods and of dividends lagged by two periods. The logarithmic changes in
dividends are a function of their one- and two-period lagged values and of the 1-
month lagged values of the SP returns.
Table 7.16 (excerpt) Estimation of the VAR(2) process on DLSP and DLDIV; t-statistics in square brackets:

                 DLSP equation            DLDIV equation
DLSP(−1)         [11.9821]                [−4.49688]
DLSP(−2)         −0.073616 [−3.10058]     0.010414 [1.46603]
DLDIV(−1)        −0.087335 [−1.11822]     0.459320 [19.6562]
DLDIV(−2)        0.229337 [2.94504]       0.165131 [7.08752]
C                0.001479 [1.58671]       0.000546 [1.95616]
R-squared        0.078311                 0.319494
Adj. R-squared   0.076260                 0.317980
Sum sq. resids   2.778588                 0.248729
S.E. equation    0.039311                 0.011762
F-statistic      38.19147                 211.0376
Log likelihood   3279.105                 5454.724
Akaike AIC       −3.631841                −6.045174

Let us now perform the Granger causality test and start by implementing Fisher
tests. First, let us test the null hypothesis that the dividend growth rate does not cause
the returns of the SP index. We estimate two models:

– the unconstrained model, in which DLSP is regressed on its own two lagged values and on the two lagged values of DLDIV (sum of squared residuals RSS_nc);
– the constrained model, in which DLSP is regressed on its own two lagged values only (sum of squared residuals RSS_c).

The calculated value of the F-statistic is 4.5023.
The number of constraints is 2 (we are testing the nullity of the two coefficients
associated with the lagged dividend growth rate), the number of observations is
1 805, and the number of estimated parameters (excluding the constant) in the
unconstrained model is 4. Under the null hypothesis, the F -statistic follows a Fisher
distribution with (2,1800) degrees of freedom. At the 5% significance level, the
critical value is 2.997. Since .4.5023 > 2.997 we reject the null hypothesis of no
causality of the dividend growth rate towards stock market returns.
Let us now consider the test of the null hypothesis that stock market returns do
not cause the dividend growth rate. We proceed in the same way, estimating the
unconstrained model (DLDIV regressed on its own two lagged values and on the two
lagged values of DLSP) and the constrained model (DLDIV regressed on its own two
lagged values only). The calculated F-statistic exceeds the critical value 2.997, so we
reject the null hypothesis of no causality of stock market returns towards the dividend
growth rate.
We can also perform a Chi-squared test, calculating the test statistic C. The
calculation of this statistic gives us:
– For the test of the null hypothesis that the dividend growth rate does not cause
returns: .C = 9.0191
– For the test of the null hypothesis that returns do not cause the dividend growth
rate: .C = 20.2855
In both cases, the statistic C is higher than the critical value of the Chi-squared
distribution at the 5% significance level. The null hypothesis is rejected. There is
therefore a two-way causality between stock market returns and the dividend growth
rate, testifying to the presence of a feedback effect.
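The causality tests of this application can be reproduced along the following lines; it is assumed that a DataFrame `data` contains the stationary series in columns named "DLSP" and "DLDIV".

```python
# Sketch of the Granger causality tests on a fitted VAR(2).
from statsmodels.tsa.api import VAR

res = VAR(data[["DLSP", "DLDIV"]]).fit(2)

# H0: DLDIV does not Granger-cause DLSP (Fisher-type test)
print(res.test_causality("DLSP", ["DLDIV"], kind="f").summary())
# H0: DLSP does not Granger-cause DLDIV
print(res.test_causality("DLDIV", ["DLSP"], kind="f").summary())
```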
– Example 1: regression of the infant mortality rate in Egypt (MOR) on the income
of US farmers (I N C) and on the money supply in Honduras (M), annual data
1971–1990:
EXP_t = −2943 + 45.79 LIFE_t    (7.115)
        (−16.70)   (17.76)

POP_t = 21698.7 + 111.58 RD_t    (7.116)
        (59.44)    (26.40)
These three examples illustrate regressions that make no sense, since it is obvious
that there is no link between the explanatory variables and the variable being
explained in each of the three cases considered. Thus, if we take the third example,
it goes without saying that finding that R&D spending in the United States has an
impact on the population in South Africa makes little sense. These examples are
illustrative of spurious regressions, i.e., regressions that are meaningless. This is
due to the non-stationarity of the different series involved in the regressions.
Two features are common to all three regressions: firstly, the coefficient of
determination is very high (above .0.9 in our examples), and, secondly, the value
of the Durbin-Watson statistic is low. These two characteristics are symptomatic of
spurious regressions.
14 These examples are taken from the website of J. Gonzalo, Universidad Carlos III, Madrid.
If . Xt and . Yt are two series . I (d), then in general the linear combination . zt :
z_t = Y_t − βX_t    (7.117)

is also I(d).
However, it is possible that . zt is not . I (d) but . I (d − b) where . b is a positive
integer .(d ≥ b > 0). In other words, .zt is integrated of an order lower than the order
of integration of the two variables under consideration. In this case, the series . Xt
and . Yt are said to be cointegrated, which is noted:
X_t , Y_t ∼ CI(d, b)    (7.118)

β is the cointegration parameter and the vector [1, −β] is the cointegrating vector.
The most studied case corresponds to: . d = b = 1. Thus, two non-stationary
series .(I (1)) are cointegrated if there exists a stationary linear combination .(I (0))
of these two series.
The underlying idea is that, in the short term, . Xt and . Yt may diverge (they
are both non-stationary), but they will move in unison in the long term. There is
therefore a stable long-term relationship between . Xt and . Yt . This relationship is
called cointegration (or cointegrating) relationship or the long-term relation-
ship. It is given by . Yt = βXt (i.e., zt = 0).15 In the long term, similar movements
of .Xt and . Yt tend to offset each other yielding a stationary series. .zt measures
the extent of the imbalance between . Xt and . Yt and is called the equilibrium
error. Examples corresponding to such a situation are numerous in economics: the
relationship between consumption and income, the relationship between short- and
long-term interest rates, the relationship between international stock market indices,
and so on.
15 Note that the cointegrating relationship can include a constant term, for example: Y_t = α + βX_t.
Remark 7.12 For the sake of simplification, we have considered here the case
of two variables. The notion of cointegration can be generalized to the case of N
variables. We will not deal with this generalization in the context of this textbook
and refer readers to Engle and Granger (1991), Hamilton (1994), or Lardic and
Mignon (2002).
One of the fundamental properties of cointegrated series is that they can be modeled
as an error-correction model. This result was demonstrated in the Granger
representation theorem (Granger, 1981), valid for series .CI (1, 1). Such models
allow us to model the adjustments that lead to a long-term equilibrium situation.
They are dynamic models, incorporating both short-term and long-term changes in
variables.
Let . Xt and . Yt be two .CI (1, 1) variables. Assuming that .Yt is the endogenous
variable and .Xt is the explanatory variable, the error-correction model is written:
ΔY_t = γ ẑ_{t−1} + Σ_i β_i ΔX_{t−i} + Σ_j δ_j ΔY_{t−j} + d(L) ε_t    (7.119)
where .εt is white noise. .ẑt = Yt − β̂Xt is the residual from the estimation of the
cointegration relationship between . Xt and . Yt . . d(L) is a finite polynomial in . L. In
practice, we frequently have .d(L) = 1 and the error-correction model is written
more simply:
ΔY_t = γ ẑ_{t−1} + Σ_i β_i ΔX_{t−i} + Σ_j δ_j ΔY_{t−j} + ε_t    (7.120)
The Engle and Granger (1987) two-step method proceeds as follows.
First step: Estimation of the long-term relationship by OLS:

Y_t = α + βX_t + z_t    (7.121)
where .zt is the error term. If the variables are cointegrated, we proceed to the second
step.
Second step: Estimation of the error-correction model.
The error-correction model is estimated by OLS:
ΔY_t = γ ẑ_{t−1} + Σ_i β_i ΔX_{t−i} + Σ_j δ_j ΔY_{t−j} + ε_t    (7.122)

where ε_t ∼ WN and ẑ_{t−1} is the residual from the estimation of the one-period-lagged long-term relationship: ẑ_{t−1} = Y_{t−1} − α̂ − β̂X_{t−1}.
In the first step of the Engle and Granger (1987) estimation method, it is
necessary to check that the series .Xt and .Yt are cointegrated, i.e., that the residuals
of the long-term relationship are stationary (.I (0)). It is important to remember that
if .ẑt is not stationary, i.e., if the variables .Xt and .Yt are not cointegrated, then the
relationship (7.121) is a spurious regression. On the other hand, if .ẑt is stationary, the
relationship (7.121) is a cointegrating relationship. To test whether the residual term
of the long-term relationship is stationary or not, cointegration tests are performed.
There are several such tests (see in particular Engle and Granger, 1987; Johansen,
1988 and 1991); here we propose the Dickey-Fuller test.
The test is applied to the residual series from the estimation of the long-term relationship:

ẑ_t = Y_t − α̂ − β̂X_t    (7.123)

and is based on one of the following regressions:

Δẑ_t = φ ẑ_{t−1} + u_t    (7.124)

Δẑ_t = φ ẑ_{t−1} + Σ_{i=1}^{p} φ_i Δẑ_{t−i} + u_t    (7.125)
– If t_φ̂ is lower than the critical value, we reject H_0: the series X_t and Y_t are cointegrated.
– If t_φ̂ is higher than the critical value, we do not reject H_0: the variables X_t and Y_t are not cointegrated.

17 In the MacKinnon table, critical values are distinguished according to whether or not a trend is included in the cointegration relationship.
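The two steps of the Engle-Granger procedure, together with the unit root test on the residuals, can be sketched as follows on simulated I(1) series; the lag structure of the error-correction model (one lag of each differenced series) is an illustrative choice.

```python
# Sketch of the Engle-Granger two-step procedure on simulated cointegrated series.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
T = 500
x = pd.Series(np.cumsum(rng.normal(size=T)))   # an I(1) series
y = 0.5 * x + pd.Series(rng.normal(size=T))    # cointegrated with x

# Step 1: long-term relationship Y_t = alpha + beta X_t + z_t, estimated by OLS.
long_run = sm.OLS(y, sm.add_constant(x)).fit()
z_hat = long_run.resid

# ADF-type regression on the residuals (model without constant or trend); the
# usual Dickey-Fuller critical values are not valid here, MacKinnon's
# cointegration critical values should be used instead.
adf_stat = adfuller(z_hat, regression="n", maxlag=1, autolag=None)[0]
print("ADF statistic on the residuals:", round(adf_stat, 3))

# Step 2: error-correction model (7.122) with one lag of each differenced series.
dy, dx = y.diff(), x.diff()
ecm = pd.DataFrame({"dy": dy, "z_lag": z_hat.shift(1),
                    "dx_lag": dx.shift(1), "dy_lag": dy.shift(1)}).dropna()
ecm_res = sm.OLS(ecm["dy"], ecm[["z_lag", "dx_lag", "dy_lag"]]).fit()
print(ecm_res.params)   # the coefficient on z_lag is the adjustment coefficient
```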
Remark 7.13 The method of Engle and Granger (1987) provides us with a
simple way to test the hypothesis of no cointegration and to estimate an error-
correction model in two steps. The disadvantage of this approach is that it does
not allow multiple cointegration vectors to be distinguished. This is problematic
when we study N variables simultaneously, with .N > 2, or, if preferred, when
we have more than one explanatory variable .(k > 1). Indeed, we know that
if we analyze the behavior of N variables (with .N > 2), we can have up to
(N − 1) cointegration relationships, whereas the Engle-Granger approach allows us to estimate only one of them.
Fig. 7.16 Evolution of stock prices and dividends, United States, 1871.01–2021.06

18 www.econ.yale.edu/~shiller.
Fig. 7.17 Residuals of the long-term relationship between prices and dividends
Figure 7.16 shows that prices and dividends follow a common trend, even though prices vary much more
than dividends. This is representative of the well-known phenomenon of excessive
stock price volatility.
In any case, and having confirmed that the two series under consideration are
indeed non-stationary and integrated of the same order (order 1), it is legitimate to
address the question of cointegration between the two variables. To this end, we
regress prices on dividends and study the stationarity of the residuals resulting from
the estimation of this relationship.
Figure 7.17 plots the pattern of this residual series. No particular structure
emerges, suggesting that the residuals appear stationary. Let us check this intuition
by applying the augmented Dickey-Fuller test to the residual series. We select a
number of lags equal to 1 and obtain a calculated value of the ADF statistic equal
to .−4.5805. The 5% critical value for 2 variables, more than 200 observations,
and zero lags is equal to .−3.37. Since .−4.5805 < −3.37, the null hypothesis of
non-stationarity of the residual series is rejected. It follows that the null hypothesis
of no cointegration between prices and dividends is rejected. Prices and dividends
are therefore cointegrated: there is a stable long-term relationship between the two
series, which is consistent with the efficient capital market hypothesis for the United
States over the period 1871–2021.
We consider the long-term (10-year) interest rate series for Germany (GER) and
Austria (AU T ) at daily frequency over the period from January 2, 1986, to July
13, 2021, i.e., a total of 9 269 observations. These series are extracted from the
Macrobond database. Since the Engle-Granger approach applies for .CI (1, 1) series,
we first implement the ADF unit root test on the series GER and AU T . To this
end, we follow the previously presented strategy, consisting in starting from the
estimation of a model with trend and constant, then estimating a model with constant
without trend if the latter is not significant, and finally a model without constant or
trend if neither of them proves to be significant. The application of this strategy
leads to the results shown in Table 7.20. We have chosen a model without constant
or trend for both series. Since the calculated value of the ADF statistic for the series
GER and AU T is higher than the critical value, we do not reject the null hypothesis
of unit root at the 5% significance level. To determine the order of integration of the
two series, we differentiate them and apply the test procedure on the series in first-
difference DGER and DAU T . In both cases, a model without constant or trend
is used. It appears that the calculated value of the ADF statistic is lower than the
critical value at the 5% significance level: the null hypothesis of unit root is therefore
rejected. In other words, DGER and DAU T are integrated of order zero, implying
that GER and AU T are integrated of order 1.
The two series are integrated of the same order (order 1), which is a necessary
condition for implementing the Engle-Granger method.
Figure 7.18 representing the joint evolution of the two series further indicates
that GER and AU T are characterized by a common trend over the entire period.
Thus, since GER and AU T are non-stationary and integrated of the same order,
and follow a similar pattern, it is legitimate to ask whether the two variables are
cointegrated.
We begin by estimating the static relationship between GER and AU T , i.e.:
AUT_t = α + β GER_t + z_t    (7.126)
The results from estimating this relationship allow us to deduce the residual series
ẑ_t = AUT_t − α̂ − β̂ GER_t, whose stationarity must now be tested.

Fig. 7.18 10-year interest rates, Germany (GER) and Austria (AUT), January 2, 1986–July 13, 2021

(ADF test, 5% critical value: −1.940858)
Conclusion
This chapter has introduced the basics of time series econometrics, a branch of
econometrics that is still undergoing many developments. In addition to univariate
time series models, we have dealt with multivariate analysis through VAR processes.
In these processes, all variables have the same status, in the sense that no distinction
is made between endogenous and exogenous variables. An alternative to VAR
processes is provided by the simultaneous equations models, which are discussed in the next
chapter. Unlike VAR models, which have no theoretical content, simultaneous
equations models are structural macroeconomic models.
Stationarity
  Definition: E[Y_t²] < ∞ ∀t ∈ Z; E(Y_t) = m ∀t ∈ Z; Cov(Y_t, Y_{t+h}) = γ_h ∀t, h ∈ Z (γ: autocovariance function)
  Unit root test: Dickey-Fuller tests
Functions
  Autocovariance: γ_h = Cov(Y_t, Y_{t+h}), h ∈ Z
  Autocorrelation: ρ_h = γ_h/γ_0, h ∈ Z
  Partial autocorrelation: φ_hh, calculated using the Durbin algorithm
Process
  AR(p): Y_t − φ_1 Y_{t−1} − · · · − φ_p Y_{t−p} = ε_t, with φ_hh = 0 ∀h > p
  MA(q): Y_t = ε_t − θ_1 ε_{t−1} − · · · − θ_q ε_{t−q}, with ρ_h = 0 ∀h > q
  ARMA(p, q): Y_t − φ_1 Y_{t−1} − · · · − φ_p Y_{t−p} = ε_t − θ_1 ε_{t−1} − · · · − θ_q ε_{t−q}
Information criteria
  Akaike: AIC = log σ̂_ε² + 2(p + q)/T
  Schwarz: SIC = log σ̂_ε² + (p + q) (log T)/T
Further Reading
There are many textbooks on time series econometrics. In addition to the pioneering
work by Box and Jenkins (1970), let us mention the textbooks by Harvey (1990),
Mills (1990), Hamilton (1994), Gouriéroux and Monfort (1996), or Brockwell and
Davis (1998); Hamilton’s (1994) work in particular includes numerous develop-
ments on multivariate models.
On the econometrics of non-stationary time series, in addition to the textbooks
cited above and the many references included in this chapter, readers may usefully
consult Engle and Granger (1991), Banerjee et al. (1993), Johansen (1995), as well
as Maddala and Kim (1998).
As mentioned in this chapter, time series econometrics has undergone, and
continues to undergo, many developments. There are therefore references specific
to certain fields.

8 Simultaneous Equations Models

So far, with the exception of the VAR models presented in the previous chapter, we
have considered models with only one equation. However, many economic theories
are based on models with several equations, i.e., on systems of equations. Since
these equations are not independent of each other, the interaction of the different
variables has important consequences for the estimation of each equation and for
the system as a whole.
We start by outlining the analytical framework before turning to the possibility
or not of estimating the parameters of the model, known as identification. We
then present the estimation methods relating to simultaneous equations models,
as well as the specification test proposed by Hausman (1978). We conclude with an
empirical application.
In the single-equation models we have studied so far, there is only one endogenous
variable, the latter being explained by one or more exogenous variables. If a causal
relationship exists, it runs from the exogenous variables to the endogenous variable.
In a simultaneous equations model, each equation is relative to an endogenous
variable, and it is very common for an explained variable in one equation to become
an explanatory variable in another equation of the model. The distinction between
endogenous and exogenous variables is therefore no longer as marked as in the case
of single-equation models and, in a simultaneous equations model, the variables
are determined simultaneously. This dual status of the variables appearing in a
simultaneous equations model means that it is impossible to estimate the parameters
of one equation without taking into account the information provided by the other
equations in the system. In particular, the OLS estimators are biased and non-
consistent, in the sense that they do not converge to their true values when the sample
size tends to infinity.
As an illustration, consider the following two-equation model:

Y_t = α + βX_t + ε_t    (8.1)

X_t = Y_t + Z_t    (8.2)

Substituting (8.2) into (8.1), the model can be written in the following form:

Y_t = α + βX_t + ε_t = α + β(Y_t + Z_t) + ε_t    (8.3)

hence:

Y_t = α/(1 − β) + [β/(1 − β)] Z_t + [1/(1 − β)] ε_t    (8.4)

We deduce:

X_t = α/(1 − β) + [β/(1 − β)] Z_t + [1/(1 − β)] ε_t + Z_t    (8.5)

hence:

X_t = α/(1 − β) + [1/(1 − β)] Z_t + [1/(1 − β)] ε_t    (8.6)

Setting μ_t = ε_t/(1 − β), the reduced form of the model is finally written:

Y_t = α/(1 − β) + [β/(1 − β)] Z_t + μ_t    (8.7)

X_t = α/(1 − β) + [1/(1 − β)] Z_t + μ_t    (8.8)
Consider now the following supply and demand model:

q_t^d = α_1 p_t + α_2 y_t + ε_t^d    (8.9)

q_t^s = β_1 p_t + ε_t^s    (8.10)

q_t^d = q_t^s = q_t    (8.11)
where Eq. (8.9) is the demand equation, .qtd denoting the quantity demanded of
any good, .pt the price of that good, and .yt income. Equation (8.10) is the supply
equation, q_t^s denoting the quantity supplied of the good under consideration. ε_t^d and ε_t^s
are error terms, also known as disturbances. The demand and supply equations are
behavioral equations. Finally, Eq. (8.11) is called the equilibrium equation: it is
the equilibrium condition represented by the equality between demand and supply.
Equilibrium equations contain no error term.
The equations of this system, derived from economic theory, are called struc-
tural equations. This is referred to as a model expressed in structural form. In
this system, price and quantity variables are interdependent, so they are mutually
dependent or endogenous. Income .yt is an exogenous variable, in the sense that it is
determined outside the system.
Since the system incorporates a demand equation, a supply equation, and an
equilibrium condition, it is referred to as a complete system in the sense that it
has as many equations as there are endogenous variables.
Let us express each of the endogenous variables in terms of the exogenous
variable and the error terms .εtd and .εts . From Eq. (8.10), we can write:
p_t = (1/β_1) q_t − (1/β_1) ε_t^s    (8.12)

Hence:

q_t = [α_2 β_1/(β_1 − α_1)] y_t + [1/(β_1 − α_1)] (β_1 ε_t^d − α_1 ε_t^s)    (8.14)

Positing:

γ_1 = α_2 β_1/(β_1 − α_1)    (8.15)

and:

μ_{1t} = (β_1 ε_t^d − α_1 ε_t^s)/(β_1 − α_1)    (8.16)

we obtain:

q_t = γ_1 y_t + μ_{1t}    (8.17)

Similarly, for the price variable we have:

p_t = [α_2/(β_1 − α_1)] y_t + [1/(β_1 − α_1)] (ε_t^d − ε_t^s)    (8.19)

By positing:

γ_2 = α_2/(β_1 − α_1)    (8.20)

and:

μ_{2t} = (ε_t^d − ε_t^s)/(β_1 − α_1)    (8.21)

we obtain:

p_t = γ_2 y_t + μ_{2t}    (8.22)

Putting together Eqs. (8.17) and (8.22), the system of equations is finally written as:

q_t = γ_1 y_t + μ_{1t}    (8.23)

p_t = γ_2 y_t + μ_{2t}    (8.24)
In this system, the endogenous variables are correlated with the error terms, with
the result that the OLS estimators are no longer consistent. As we will see later, it
is possible to use an instrumental variables estimator or a two-stage least squares
estimator.
Remark 8.1 When a model includes lagged endogenous variables, these are
referred to as predetermined variables. As an example, consider the following
model:
C_t = α_0 + α_1 Y_t + α_2 C_{t−1} + ε_{1t}    (8.25)

Y_t = C_t + I_t + G_t    (8.27)

In addition to the exogenous variables, this model contains two lagged endogenous
variables (C_{t−1} and Y_{t−1}). The latter two variables are said to be predetermined
in the sense that they are considered to be already determined with respect to the
current values of the endogenous variables.
More generally, variables that are independent of all future error terms of the
structural form are called predetermined variables.
In the general case, the structural form of the simultaneous equations model is
written:
β11 Y1t + β12 Y2t + . . . + β1M YMt + γ11 X1t + γ12 X2t + . . . + γ1k Xkt = ε1t
β21 Y1t + β22 Y2t + . . . + β2M YMt + γ21 X1t + γ22 X2t + . . . + γ2k Xkt = ε2t
.
...
βM1 Y1t + βM2 Y2t + . . . + βMM YMt + γM1 X1t + γM2 X2t + . . . + γMk Xkt = εMt
(8.28)
1 The predetermined variables can thus be divided into two categories: exogenous variables and
lagged endogenous variables.
356 8 Simultaneous Equations Models
B Y + Γ X = ε    (8.29)

where B is of dimension (M, M), Y of dimension (M, 1), Γ of dimension (M, k), X of dimension (k, 1), and ε of dimension (M, 1), with:

B = [ β_{11}  β_{12}  · · ·  β_{1M} ; β_{21}  β_{22}  · · ·  β_{2M} ; . . . ; β_{M1}  β_{M2}  · · ·  β_{MM} ]    (8.30)

Y = (Y_{1t}, Y_{2t}, . . . , Y_{Mt})′    (8.31)

Γ = [ γ_{11}  γ_{12}  · · ·  γ_{1k} ; γ_{21}  γ_{22}  · · ·  γ_{2k} ; . . . ; γ_{M1}  γ_{M2}  · · ·  γ_{Mk} ]    (8.32)

X = (X_{1t}, X_{2t}, . . . , X_{kt})′    (8.33)

and:

ε = (ε_{1t}, ε_{2t}, . . . , ε_{Mt})′    (8.34)
In each equation, one of the endogenous variables has its coefficient equal to
1: this is the dependent variable. There is therefore one dependent variable per
equation. In other words, in the matrix .B, each column has at least one value equal
to 1. This is known as normalization. On the other hand, equations in which all
coefficients are equal to 1 and involve no disturbance are equilibrium equations.
Provided that the matrix B is invertible, the reduced form of the model is obtained as:

Y = −B⁻¹ Γ X + B⁻¹ ε    (8.35)
Remark 8.2 If the matrix B is an upper triangular matrix, the system is described
as triangular or recursive.
The question posed here is whether it is possible to derive estimators of the structural
form parameters from estimators of the reduced-form parameters. The problem
arises from the fact that several structural coefficient estimates can be compatible
with the same data sets. In other words, one reduced-form equation may correspond
to several structural equations.
The identification conditions are determined equation by equation. Several cases
may arise: the equation may be underidentified, exactly identified, or overidentified.
Recall the structural form:

B Y + Γ X = ε    (8.37)

and the reduced form:

Y = −B⁻¹ Γ X + B⁻¹ ε    (8.38)

or:

Y = Π X + υ    (8.39)

with Π = −B⁻¹ Γ and υ = B⁻¹ ε.
Thus, three types of structural parameters are unknown: the elements of B, the elements of Γ, and the elements of the variance-covariance matrix of the disturbances.
Restrictions
– Normalization. As previously mentioned, in each equation, one of the endoge-
nous variables has its coefficient equal to 1: this is the dependent variable.
There is one such dependent variable per equation. Imposing a value of 1 on
a coefficient is called normalization. This operation reduces the number of
unknown elements in the matrix B, since we then have M(M − 1) and no longer
M 2 undetermined elements.
– Identities. We know that a model can contain behavioral relations and equilib-
rium relations or accounting identities. These equilibrium relations and account-
ing identities do not have to be identified: the coefficients associated with the
variables in these relations are in fact known and are frequently equal to 1. In the
introductory example we studied earlier, Eq. (8.11) is the equilibrium condition
and does not have to be identified.
– Exclusion relations. Not introducing a variable into one of the equations of
the system is considered as an exclusion relation. In effect, this amounts to
assigning a zero coefficient to the variable in question. In other words, it consists
in placing zeros in the elements of the matrices B and/or 𝚪. Such a procedure
obviously reduces the number of unknown parameters and thus provides an aid
to identification.
– Linear restrictions. In line with economic theory, some models contain variables
with identical coefficients. Imposing such restrictions on parameters facilitates
the estimation procedure by reducing the number of unknown parameters.
– Restrictions on the variance-covariance matrix of the disturbances. Such restric-
tions are similar to those imposed on the model parameters. They consist, for
example, in introducing zeros for certain elements of the variance-covariance
matrix when imposing the absence of correlation between the structural distur-
bances of several equations.
Note that, if M_j and M_j^* denote the numbers of endogenous variables respectively included in and excluded from equation j, and k_j and k_j^* the numbers of exogenous variables respectively included in and excluded from equation j, we have:

M = M_j + M_j^* + 1    (8.40)

k = k_j + k_j^*    (8.41)

Since the number of equations must be at least equal to the number of unknowns,
we deduce the order condition for the identification of the equation j:

k_j^* ≥ M_j    (8.42)
According to this condition, the number of variables excluded from the equation
j must be at least equal to the number of endogenous variables included in this
same equation j . The order condition is a necessary condition for identification, but
not a sufficient one. In other words, it ensures that the j -th equation of the reduced
form admits a solution, but we do not know whether or not it is unique. In order
to guarantee the uniqueness of the solution, the rank condition is necessary. This
condition (see Greene, 2020) imposes a restriction on the submatrix of the reduced-
form coefficient matrix and ensures that there is a unique solution for the structural
parameters given the parameters of the reduced form. Combining the order and rank conditions, the following cases can be distinguished:
– If .kj∗ < Mj , or if the rank condition is not verified, the model is underidentified.
– If .kj∗ = Mj , and the rank condition is verified, the model is exactly identified.
– If .kj∗ > Mj , and the rank condition is verified, the model is overidentified (there
are more restrictions than those necessary for identification).
When restrictions other than the exclusion restrictions are imposed on the parameters, the order condition becomes:

r_j + k_j^* ≥ M_j    (8.43)
where .rj denotes the number of restrictions other than the exclusion restrictions.
It is possible to reformulate this order condition by taking into account both the
exclusion relations and the linear restrictions. By noting .sj the total number of
restrictions, i.e.:
s_j = r_j + k_j^* + M_j^*    (8.44)

the order condition can also be written:

s_j ≥ M − 1    (8.45)
– If .rj + kj∗ < Mj , or if the rank condition does not hold, the model is
underidentified.
– If r_j + k_j^* = M_j, and the rank condition is verified, the model is exactly
identified.
– If r_j + k_j^* > M_j, and the rank condition is verified, the model is overidentified.
On this point, reference can be made to Johnston and Dinardo (1996) and Greene
(2020).
2 However, the OLS method can be applied in the case of triangular (or recursive) systems.
8.3 Estimation Methods
This estimation method applies only to equations that are exactly identified. Gen-
erally speaking, the principle of indirect least squares (ILS) consists in estimating
the parameters of the reduced form by OLS and deducing the structural coefficients
by an appropriate transformation of the reduced form coefficients. This technique
can be described in three steps:
– The first step is to write the model in reduced form. This involves expressing the
dependent variable of each equation as a function of the predetermined variables
(exogenous and lagged endogenous variables) and the disturbances.
– The second step aims to estimate the parameters of each of the reduced-form
equations by OLS. The application of OLS is made possible by the fact that the
explanatory variables (predetermined variables) of the reduced-form equations
are no longer correlated with the disturbances.
– The purpose of the third step is to deduce the parameters of the structural
form from the estimated parameters of the reduced form. This determination is
made using the algebraic relations linking the structural and the reduced form
coefficients. The solution is unique since the model is exactly identifiable: there
is thus a one-to-one correspondence between the structural coefficients and those
of the reduced form.
The ILS estimator of the reduced form—which is therefore the OLS estimator—
is a BLUE estimator. In contrast, the ILS estimator of the structural form coefficients
is a biased estimator in the case of small samples. In addition, since the reduced form
of a model is not always easy to establish—especially when the model comprises a
large number of equations—and the existence of an exactly identified relationship is
quite rare, the ILS method is not often used in practice. The two-stage least squares
method is used more frequently.
The two-stage least squares (2SLS) method is the most widely used estimation
method for simultaneous equations models. This estimation procedure was intro-
duced by Theil (1953) and Basmann (1957) and applies to models that are exactly
identifiable or overidentifiable.
As the name suggests, this technique involves applying the OLS method twice.
Consider the simultaneous equations model with M endogenous variables and k
predetermined variables:
Y1t = β12 Y2t + . . . + β1M YMt + γ11 X1t + γ12 X2t + . . . + γ1k Xkt + ε1t
Y2t = β21 Y1t + . . . + β2M YMt + γ21 X1t + γ22 X2t + . . . + γ2k Xkt + ε2t
.
...
YMt = βM1 Y1t + . . . + βMM YMt + γM1 X1t + γM2 X2t + . . . + γMk Xkt + εMt
(8.46)
The first stage consists in regressing each of the endogenous variables on all the
predetermined variables of the model, the aim being to remove the correlation between endogenous variables and
disturbances. We thus have the following system:

Y_{it} = π_{i1} X_{1t} + π_{i2} X_{2t} + · · · + π_{ik} X_{kt} + u_{it},   i = 1, . . . , M    (8.47)

The terms (u_{1t}, u_{2t}, . . . , u_{Mt}) denote the error terms associated with each of
the equations in this system. This system corresponds to a reduced form system
insofar as no endogenous variables appear on the right-hand side of the various
equations. We deduce
from the estimation of these equations the estimated values
. Ŷ1t , Ŷ2t , . . . , ŶMt :
The second step consists in replacing the endogenous variables appearing on the
right-hand side of the structural equations with their values estimated in the first
step, i.e.:
Y1t = β12 Ŷ2t + . . . + β1M ŶMt + γ11 X1t + γ12 X2t + . . . + γ1k Xkt + v1t
Y2t = β21 Ŷ1t + . . . + β2M ŶMt + γ21 X1t + γ22 X2t + . . . + γ2k Xkt + v2t
.
...
YMt = βM1 Ŷ1t + . . . + βMM ŶMt + γM1 X1t + γM2 X2t + . . . + γMk Xkt + vMt
(8.49)
where the terms .(v1t , v2t , . . . , vMt ) designate the disturbances associated with the
equations of the latter system.
The two-stage least squares estimator can be interpreted as an instrumental
variables estimator where the instruments used are the estimated values of the
endogenous variables (for an in-depth description, see in particular Johnston
and Dinardo, 1996). It can be shown that in the absence of autocorrelation
and heteroskedasticity, the two-stage least squares estimator is the most efficient
instrumental variables estimator.
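The two stages can be made explicit in the following sketch on simulated data; the variable names and the data-generating process are illustrative assumptions.

```python
# Sketch of two-stage least squares on a single structural equation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)     # predetermined variables
u = rng.normal(size=n)
y2 = 0.5 * x1 + 1.0 * x2 + u + rng.normal(size=n)   # endogenous regressor
y1 = 1.0 + 2.0 * y2 + 0.8 * x1 + u                  # structural equation of interest

# First stage: regress y2 on all predetermined variables.
Z = sm.add_constant(np.column_stack([x1, x2]))
y2_hat = sm.OLS(y2, Z).fit().fittedvalues

# Second stage: replace y2 by its fitted values in the structural equation.
X2 = sm.add_constant(np.column_stack([y2_hat, x1]))
second = sm.OLS(y1, X2).fit()
print(second.params)   # note: the naive OLS standard errors of this second
                       # stage are not the correct 2SLS standard errors
```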
If the first-stage regressions have high explanatory power, the estimated values
of the endogenous variables (Ŷ_{1t}, Ŷ_{2t}, . . . , Ŷ_{Mt}) are close to the
true values (Y_{1t}, Y_{2t}, . . . , Y_{Mt}). As a result, the estimators in the second step will be
very close to those in the first step. Conversely, if the coefficients of determination
associated with the reduced-form equations of the first stage are low, the
regressions
are poorly explanatory, and the estimated values . Ŷ1t , Ŷ2t , . . . , ŶMt used in the
second stage will be largely composed of the errors of the first-stage regressions.
The significance of the two-stage least squares estimators is then greatly reduced.
Remark 8.5 When the model is exactly identified, the indirect least squares and
two-stage least squares methods lead to identical results.
Remark 8.6 There are other limited-information methods for estimating simulta-
neous equations models: the generalized moments estimator (used when there is
a presumption of heteroskedasticity), the limited-information maximum likelihood
estimator, or K-class estimators. For a presentation of these various techniques, see
Theil (1971), Davidson and MacKinnon (1993), Florens et al. (2007), or Greene
(2020).
We will not develop these techniques in this book but refer readers instead to Zellner
and Theil (1962), Theil (1971), Johnston and Dinardo (1996), or Greene (2020). Let
us simply mention that these procedures consist in estimating the M equations of
the system simultaneously. Thus, all the information about the set of the structural
equations is taken into account during the estimation. In this framework, the most
commonly used methods are:
– The three-stage least squares method, due to Zellner and Theil (1962).
Heuristically, this technique involves (i) estimating the reduced form coefficients
by OLS, (ii) determining the two-stage least squares estimators for each equation,
and (iii) calculating the GLS estimator. The three-stage least squares estimator
is an asymptotically efficient instrumental variables estimator. It is particularly
appropriate when the disturbances are heteroskedastic and correlated with each
other.
– The full-information maximum likelihood method. Like the previous one,
this technique considers all the equations and all the model parameters jointly.
It is based on the assumption that the disturbances are normally distributed
and consists in maximizing the log likelihood associated with the model. In
addition to Theil (1971), Dhrymes (1973) and Hausman (1975, 1983) can also
be consulted on this technique.
– The system generalized method of moments. This method is mainly used in the
presence of heteroskedasticity. If the disturbances are homoskedastic, this leads
to results asymptotically equivalent to those derived from the three-stage least
squares method.
Remark 8.7 The three-stage least squares method can be seen as a two-stage least
squares version of the SUR (seemingly unrelated regressions) method of Zellner
(1962). A SUR model is a system composed of M equations and T observations of the type:

Y_1 = X_1 β_1 + ε_1
Y_2 = X_2 β_2 + ε_2
. . .
Y_M = X_M β_M + ε_M    (8.50)

or, for each equation:

Y_i = X_i β_i + ε_i ,   i = 1, . . . , M    (8.51)

with ε = [ε_1, ε_2, . . . , ε_M]′, E[ε | X_1, . . . , X_M] = 0, and E[εε′ | X_1, . . . , X_M] = Ω_ε.
.
Remark 8.8 The three-stage least squares and full-information maximum likeli-
hood estimators are instrumental variables estimators. Both estimators have the
same asymptotic variance-covariance matrix. Therefore, under the assumption
of normality of the disturbances, the two estimators have the same asymptotic
distribution. The three-stage least squares estimator is, however, easier to calculate
than the full-information maximum likelihood estimator.
Remark 8.9 One may ask under what conditions the three-stage least squares
method is more efficient than the two-stage least squares method. Generally speak-
ing, a full-information method is more efficient than a limited-information method
if the model specification is correct. This is a very strong condition, especially in the
case of large models. A misspecification in the model structure will affect the whole
system with full-information three-stage least squares and maximum likelihood
methods, whereas limited-information methods generally restrict the problem to
the equation affected by the misspecification. Furthermore, if the disturbances of
the structural equations are not correlated with each other, the two-stage and three-
stage least squares methods yield identical results. Similarly, both techniques lead
to identical results if the model equations are exactly identified.
8.4 Specification Test
We have seen that OLS estimators are not consistent in the case of simultaneous
equations. In the presence of simultaneity, it is appropriate to use other estimation
techniques that we presented in the previous section (instrumental variables meth-
ods). However, if simultaneity does not exist, the instrumental variables techniques
lead to consistent but inefficient estimators. The question of simultaneity is
therefore crucial. It arises insofar as the endogenous variables appear among the
regressors of a simultaneous equations model and insofar as such variables are likely
to be correlated with the disturbances. Testing simultaneity therefore amounts to
testing the correlation between an endogenous regressor and the error term. If the
test concludes that simultaneity is present, it is appropriate to use the techniques
presented in the previous section, i.e., the instrumental variables methods. On the
other hand, in the absence of simultaneity, OLS should be used.
The test proposed by Hausman (1978) provides a way of dealing with the
simultaneity problem. The general principle of the test is to compare two sets
of estimators: (i) a set of estimators assumed to be consistent under the null
hypothesis (absence of simultaneity) and under the alternative hypothesis (presence
of simultaneity) and (ii) a set of estimators assumed to be consistent only under the
null hypothesis. To illustrate this test, consider the following example, inspired by
Pindyck and Rubinfeld (1991). The model consists of a demand equation:
Q_t = α_0 + α_1 P_t + α_2 Y_t + α_3 R_t + ε_{1t}    (8.52)

and a supply equation:

Q_t = β_0 + β_1 P_t + ε_{2t}    (8.53)
The first step consists in estimating the reduced-form equations, regressing each endogenous variable on the exogenous variables Y_t and R_t:

Q_t = a_0 + a_1 Y_t + a_2 R_t + u_{1t}    (8.54)

P_t = b_0 + b_1 Y_t + b_2 R_t + u_{2t}    (8.55)

The estimation of (8.55) by OLS provides the fitted values:

P̂_t = b̂_0 + b̂_1 Y_t + b̂_2 R_t    (8.56)

from which we deduce the residuals:

û_{2t} = P_t − P̂_t    (8.57)
Since P_t = P̂_t + û_{2t}, the supply equation (8.53) can be rewritten as:

Q_t = β_0 + β_1 P̂_t + β_1 û_{2t} + ε_{2t}    (8.58)

Under the null hypothesis of no simultaneity, the correlation between û_{2t} and ε_{2t}
is zero.
The second step is to estimate the relationship (8.58) and perform a significance
test (usual t-test) of the coefficient assigned to .û2t . If this coefficient is not
significantly different from zero, the null hypothesis is not rejected and there is
no simultaneity problem: the OLS method can be applied. On the other hand, if it
is significantly different from zero, the instrumental variables methods presented in
the previous section should be preferred.
Remark 8.10 In Eq. (8.58), Pindyck and Rubinfeld (1991) suggest regressing .Qt
on .Pt (instead of .P̂t ) and .û2t .
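The two-step test described above can be sketched as follows on simulated data; the data-generating process, which builds in simultaneity between price and quantity, is an illustrative assumption.

```python
# Sketch of the regression-based Hausman simultaneity test (Pindyck-Rubinfeld form).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
y_inc = rng.normal(size=n)                    # income (exogenous)
r = rng.normal(size=n)                        # other exogenous variable
eps_d, eps_s = rng.normal(size=n), rng.normal(size=n)
p = 0.6 * y_inc + 0.4 * r + eps_d - eps_s     # price, determined within the system
q = 1.0 + 0.8 * p + eps_s                     # quantity (supply equation)

# Step 1: reduced-form regression of the price on the exogenous variables.
Z = sm.add_constant(np.column_stack([y_inc, r]))
first = sm.OLS(p, Z).fit()
p_hat, u2_hat = first.fittedvalues, first.resid

# Step 2: regress q on p_hat and u2_hat; a significant coefficient on u2_hat
# points to simultaneity, so instrumental variables methods should be preferred.
X = sm.add_constant(np.column_stack([p_hat, u2_hat]))
second = sm.OLS(q, X).fit()
print("t-statistic on u2_hat:", round(second.tvalues[2], 2))
```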
To illustrate the various concepts presented in this chapter, let us consider Klein’s
(1950) model of the US economy over the period 1920–1941.
C_t + I_t + G_t = Y_t    (8.62)

π_t = Y_t − W_{1t} − W_{2t} − T_t    (8.63)

K_t − K_{t−1} = I_t    (8.64)
where C denotes consumption (in constant dollars), π profits (in constant dollars),
W_1 private sector wages, W_2 government wage payments (public sector wages), I
net investment (in constant dollars), K_{t−1} the capital stock at the beginning of the
year, Y output (in constant dollars), G government expenditures, T taxes on profits,
and t a time trend.
If we add the constant term present in each of the first three equations, the number
of exogenous variables k is equal to 8.
Recall the order conditions (8.42) and (8.43):

k_j^* ≥ M_j    (8.65)

r_j + k_j^* ≥ M_j    (8.66)

as well as the relation:

k = k_j + k_j^*    (8.67)
where .kj is the number of exogenous variables in the equation j under considera-
tion, with k denoting the total number of exogenous variables in the model.
With these points in mind, let us study the identification conditions equation by
equation:
– For Eq. (8.59), we have: .Mj = 3 (three endogenous variables) and .kj = 3 (two
exogenous variables plus the constant term). A linear restriction is also imposed
on the parameters, since the coefficients associated with .W1 and .W2 are assumed
to be identical. We thus have .rj = 1. We therefore use the order condition (8.66)
with .kj∗ = k − kj = 8 − 3 = 5. We have: .rj + kj∗ = 1 + 5 = 6 which is greater
than .Mj = 3. We deduce that Eq. (8.59) is overidentified.
– In Eq. (8.60), we have: .Mj = 2 and .kj = 3 (two exogenous variables plus the
constant term). No restriction is imposed on the parameters . rj = 0 and we then
use the order condition (8.65). .kj∗ = 8 − 3 = 5 is greater than .Mj = 2, implying
that Eq. (8.60) is also overidentified.
– Finally, Eq. (8.61) is such that .Mj = 2 and .kj = 3 (two exogenous variables plus
the constant term). Due to the absence of restrictions on the parameters . rj = 0 ,
using the order condition (8.65) gives us: .kj∗ > Mj . Consequently, Eq. (8.61) is
also overidentified.
All three equations of the Klein model are overidentified. The model can then be
estimated.
8.5.3 Data
The data concern the United States over the period 1920–1941 and are annual.
Table 8.1 gives the values taken by the various variables used in the model.
In order to estimate the Klein model, instrumental variables methods must be used.
We propose to apply one limited-information method (two-stage least squares)
and two full-information methods (three-stage least squares and full-information
maximum likelihood). First, we estimate each of the equations using OLS.
Table 8.1 Data for Klein's (1950) model

Year   C     π     W1    W2   K_{t−1}  Y     G     I     T
1920   39.8  12.7  28.8  2.2  180.1    44.9  2.4   2.7   3.4
1921 41.9 12.4 25.5 2.7 182.8 45.6 3.9 .−0.2 7.7
1922 45 16.9 29.3 2.9 182.6 50.1 3.2 1.9 3.9
1923 49.2 18.4 34.1 2.9 184.5 57.2 2.8 5.2 4.7
1924 50.6 19.4 33.9 3.1 189.7 57.1 3.5 3 3.8
1925 52.6 20.1 35.4 3.2 192.7 61 3.3 5.1 5.5
1926 55.1 19.6 37.4 3.3 197.8 64 3.3 5.6 7
1927 56.2 19.8 37.9 3.6 203.4 64.4 4 4.2 6.7
1928 57.3 21.1 39.2 3.7 207.6 64.5 4.2 3 4.2
1929 57.8 21.7 41.3 4 210.6 67 4.1 5.1 4
1930 55 15.6 37.9 4.2 215.7 61.2 5.2 1 7.7
1931 50.9 11.4 34.5 4.8 216.7 53.4 5.9 .−3.4 7.5
1932 45.6 7 29 5.3 213.3 44.3 4.9 .−6.2 8.3
1933 46.5 11.2 28.5 5.6 207.1 45.1 3.7 .−5.1 5.4
1934 48.7 12.3 30.6 6 202 49.7 4 .−3 6.8
1935 51.3 14 33.2 6.1 199 54.4 4.4 .−1.3 7.2
1936 57.7 17.6 36.8 7.4 197.7 62.7 2.9 2.1 8.3
1937 58.7 17.3 41 6.7 199.8 65 4.3 2 6.7
1938 57.5 15.3 38.2 7.7 201.8 60.9 5.3 .−1.9 7.4
1939 61.6 19 41.6 7.8 199.9 69.5 6.6 1.3 8.9
1940 65 21.1 45 8 201.2 75.7 7.4 3.3 9.6
1941 69.7 23.5 53.3 8.5 204.5 88.4 13.8 4.9 11.6
Source: Klein (1950)
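As a sketch of how the consumption equation could be estimated by two-stage least squares with the linearmodels package: the specification below is the standard consumption function of Klein's Model I (consumption explained by current profits, lagged profits, and total wages), and the file name and column labels are assumptions, since the behavioral equations (8.59)-(8.61) are not reproduced above.

```python
# Sketch of 2SLS estimation of the Klein consumption equation (assumed specification).
import pandas as pd
from linearmodels.iv import IV2SLS

df = pd.read_csv("klein.csv")        # Table 8.1 with columns C, PI, W1, W2, K1, Y, G, I, T
df["W"] = df["W1"] + df["W2"]        # total wages
df["PI_1"] = df["PI"].shift(1)       # lagged profits
df["Y_1"] = df["Y"].shift(1)         # lagged output
df["TREND"] = range(len(df))         # time trend
df["const"] = 1.0
df = df.dropna()

# Endogenous regressors: current profits and total wages; instruments: the
# predetermined variables excluded from the equation.
res = IV2SLS(dependent=df["C"],
             exog=df[["const", "PI_1"]],
             endog=df[["PI", "W"]],
             instruments=df[["G", "T", "W2", "TREND", "K1", "Y_1"]]).fit(cov_type="unadjusted")
print(res.summary)
```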
Comparing the two-stage and three-stage least squares results, the coefficients are always assigned the same signs, but the orders of magnitude vary slightly.
However, even if the value taken by the coefficients is sometimes different, the
weight of the variables is not modified in the sense that a variable that was not
significant with the two-stage least squares method is also not significant with the
three-stage least squares method. The same applies to significant variables.
The results obtained with the full-information maximum likelihood method are relatively close to those obtained with the other techniques (two-stage and three-stage least squares). In the consumption
equation, the values taken by the coefficients of the two profit variables differ from
those obtained by three-stage least squares, but remain insignificant. Conversely, in
the investment equation, the variables that were significant with three-stage least
squares are no longer significant with the maximum likelihood method. Finally,
the results concerning the last equation of the Klein model remain similar to those
obtained with the three-stage least squares technique.
Conclusion
This chapter has gone beyond the univariate framework by presenting multi-
equation models, i.e., systems of equations. Simultaneous equations models, the
subject of this chapter, are based on economic foundations and are therefore
an alternative to VAR models (presented in the previous chapter), which are a-
theoretical. We have seen that a prerequisite for estimating simultaneous equations
models is identification: we need to check that the available data contain sufficient
information for the parameters to be estimated. Once identification has been
carried out, it is possible to proceed with estimation. Several procedures have been
presented and/or applied, including indirect least squares, two-stage least squares,
three-stage least squares, and full-information maximum likelihood.
Further Reading

Appendix: Statistical Tables

Standard normal distribution N(z). The table below shows the values for z positive. For z negative, the value is N(z) = 1 − N(−z).
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.500000 0.503989 0.507978 0.511966 0.515953 0.519939 0.523922 0.527903 0.531881 0.535856
0.1 0.539828 0.543795 0.547758 0.551717 0.555670 0.559618 0.563559 0.567495 0.571424 0.575345
0.2 0.579260 0.583166 0.587064 0.590954 0.594835 0.598706 0.602568 0.606420 0.610261 0.614092
0.3 0.617911 0.621720 0.625516 0.629300 0.633072 0.636831 0.640576 0.644309 0.648027 0.651732
0.4 0.655422 0.659097 0.662757 0.666402 0.670031 0.673645 0.677242 0.680822 0.684386 0.687933
0.5 0.691462 0.694974 0.698468 0.701944 0.705401 0.708840 0.712260 0.715661 0.719043 0.722405
0.6 0.725747 0.729069 0.732371 0.735653 0.738914 0.742154 0.745373 0.748571 0.751748 0.754903
0.7 0.758036 0.761148 0.764238 0.767305 0.770350 0.773373 0.776373 0.779350 0.782305 0.785236
0.8 0.788145 0.791030 0.793892 0.796731 0.799546 0.802337 0.805105 0.807850 0.810570 0.813267
0.9 0.815940 0.818589 0.821214 0.823814 0.826391 0.828944 0.831472 0.833977 0.836457 0.838913
1.0 0.841345 0.843752 0.846136 0.848495 0.850830 0.853141 0.855428 0.857690 0.859929 0.862143
1.1 0.864334 0.866500 0.868643 0.870762 0.872857 0.874928 0.876976 0.879000 0.881000 0.882977
1.2 0.884930 0.886861 0.888768 0.890651 0.892512 0.894350 0.896165 0.897958 0.899727 0.901475
1.3 0.903200 0.904902 0.906582 0.908241 0.909877 0.911492 0.913085 0.914657 0.916207 0.917736
1.4 0.919243 0.920730 0.922196 0.923641 0.925066 0.926471 0.927855 0.929219 0.930563 0.931888
1.5 0.933193 0.934478 0.935745 0.936992 0.938220 0.939429 0.940620 0.941792 0.942947 0.944083
1.6 0.945201 0.946301 0.947384 0.948449 0.949497 0.950529 0.951543 0.952540 0.953521 0.954486
1.7 0.955435 0.956367 0.957284 0.958185 0.959070 0.959941 0.960796 0.961636 0.962462 0.963273
1.8 0.964070 0.964852 0.965620 0.966375 0.967116 0.967843 0.968557 0.969258 0.969946 0.970621
1.9 0.971283 0.971933 0.972571 0.973197 0.973810 0.974412 0.975002 0.975581 0.976148 0.976705
2.0 0.977250 0.977784 0.978308 0.978822 0.979325 0.979818 0.980301 0.980774 0.981237 0.981691
2.1 0.982136 0.982571 0.982997 0.983414 0.983823 0.984222 0.984614 0.984997 0.985371 0.985738
2.2 0.986097 0.986447 0.986791 0.987126 0.987455 0.987776 0.988089 0.988396 0.988696 0.988989
2.3 0.989276 0.989556 0.989830 0.990097 0.990358 0.990613 0.990863 0.991106 0.991344 0.991576
2.4 0.991802 0.992024 0.992240 0.992451 0.992656 0.992857 0.993053 0.993244 0.993431 0.993613
2.5 0.993790 0.993963 0.994132 0.994297 0.994457 0.994614 0.994766 0.994915 0.995060 0.995201
2.6 0.995339 0.995473 0.995604 0.995731 0.995855 0.995975 0.996093 0.996207 0.996319 0.996427
2.7 0.996533 0.996636 0.996736 0.996833 0.996928 0.997020 0.997110 0.997197 0.997282 0.997365
2.8 0.997445 0.997523 0.997599 0.997673 0.997744 0.997814 0.997882 0.997948 0.998012 0.998074
2.9 0.998134 0.998193 0.998250 0.998305 0.998359 0.998411 0.998462 0.998511 0.998559 0.998605
z 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.8 4.0 4.5
N(z) 0.998650 0.999032 0.999313 0.999517 0.999663 0.999767 0.999841 0.999928 0.999968 0.999997
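The tabulated values can be checked numerically. A minimal sketch, assuming Python with SciPy is available, reproduces a few entries and illustrates the relation N(z) = 1 − N(−z) used for negative z:

from scipy.stats import norm

# Reproduce a few entries of the standard normal distribution function table.
for z in (0.00, 1.00, 1.96):
    print(z, round(norm.cdf(z), 6))   # 0.500000, 0.841345, 0.975002

# For negative z, use the symmetry N(z) = 1 - N(-z).
z = -1.64
print(round(norm.cdf(z), 6), round(1 - norm.cdf(-z), 6))   # both values coincide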
Student t distribution (r = degrees of freedom, P = two-tailed probability)
r P = 0.90 P = 0.80 P = 0.70 P = 0.60 P = 0.50 P = 0.40 P = 0.30 P = 0.20 P = 0.10 P = 0.05 P = 0.01 P = 0.005
6 0.131 0.265 0.404 0.553 0.718 0.906 1.134 1.440 1.943 2.447 3.707 4.317
7 0.130 0.263 0.402 0.549 0.711 0.896 1.119 1.415 1.895 2.365 3.499 4.029
8 0.130 0.262 0.399 0.546 0.706 0.889 1.108 1.397 1.860 2.306 3.355 3.833
9 0.129 0.261 0.398 0.543 0.703 0.883 1.100 1.383 1.833 2.262 3.250 3.690
10 0.129 0.260 0.397 0.542 0.700 0.879 1.093 1.372 1.812 2.228 3.169 3.581
11 0.129 0.260 0.396 0.540 0.697 0.876 1.088 1.363 1.796 2.201 3.106 3.497
12 0.128 0.259 0.395 0.539 0.695 0.873 1.083 1.356 1.782 2.179 3.055 3.428
13 0.128 0.259 0.394 0.538 0.694 0.870 1.079 1.350 1.771 2.160 3.012 3.372
14 0.128 0.258 0.393 0.537 0.692 0.868 1.076 1.345 1.761 2.145 2.977 3.326
15 0.128 0.258 0.393 0.536 0.691 0.866 1.074 1.341 1.753 2.131 2.947 3.286
16 0.128 0.258 0.392 0.535 0.690 0.865 1.071 1.337 1.746 2.120 2.921 3.252
17 0.128 0.257 0.392 0.534 0.689 0.863 1.069 1.333 1.740 2.110 2.898 3.222
18 0.127 0.257 0.392 0.534 0.688 0.862 1.067 1.330 1.734 2.101 2.878 3.197
19 0.127 0.257 0.391 0.533 0.688 0.861 1.066 1.328 1.729 2.093 2.861 3.174
20 0.127 0.257 0.391 0.533 0.687 0.860 1.064 1.325 1.725 2.086 2.845 3.153
21 0.127 0.257 0.391 0.532 0.686 0.859 1.063 1.323 1.721 2.080 2.831 3.135
22 0.127 0.256 0.390 0.532 0.686 0.858 1.061 1.321 1.717 2.074 2.819 3.119
23 0.127 0.256 0.390 0.532 0.685 0.858 1.060 1.319 1.714 2.069 2.807 3.104
24 0.127 0.256 0.390 0.531 0.685 0.857 1.059 1.318 1.711 2.064 2.797 3.091
25 0.127 0.256 0.390 0.531 0.684 0.856 1.058 1.316 1.708 2.060 2.787 3.078
26 0.127 0.256 0.390 0.531 0.684 0.856 1.058 1.315 1.706 2.056 2.779 3.067
27 0.127 0.256 0.389 0.531 0.684 0.855 1.057 1.314 1.703 2.052 2.771 3.057
28 0.127 0.256 0.389 0.530 0.683 0.855 1.056 1.313 1.701 2.048 2.763 3.047
29 0.127 0.256 0.389 0.530 0.683 0.854 1.055 1.311 1.699 2.045 2.756 3.038
30 0.127 0.256 0.389 0.530 0.683 0.854 1.055 1.310 1.697 2.042 2.750 3.030
40 0.126 0.255 0.388 0.529 0.681 0.851 1.050 1.303 1.684 2.021 2.704 2.971
80 0.126 0.254 0.387 0.526 0.678 0.846 1.043 1.292 1.664 1.990 2.639 2.887
120 0.126 0.254 0.386 0.526 0.677 0.845 1.041 1.289 1.658 1.980 2.617 2.860
∞ 0.126 0.253 0.385 0.524 0.675 0.842 1.036 1.282 1.645 1.960 2.576 2.808
Chi-squared distribution (r = degrees of freedom, P = upper-tail probability)
r P = 0.990 P = 0.975 P = 0.950 P = 0.900 P = 0.800 P = 0.700 P = 0.500 P = 0.300 P = 0.200 P = 0.100 P = 0.010 P = 0.005 P = 0.001
1 0.000 0.001 0.004 0.016 0.064 0.148 0.455 1.074 1.642 2.706 6.635 7.879 10.828
2 0.020 0.051 0.103 0.211 0.446 0.713 1.386 2.408 3.219 4.605 9.210 10.597 13.816
3 0.115 0.216 0.352 0.584 1.005 1.424 2.366 3.665 4.642 6.251 11.345 12.838 16.266
4 0.297 0.484 0.711 1.064 1.649 2.195 3.357 4.878 5.989 7.779 13.277 14.860 18.467
5 0.554 0.831 1.145 1.610 2.343 3.000 4.351 6.064 7.289 9.236 15.086 16.750 20.515
6 0.872 1.237 1.635 2.204 3.070 3.828 5.348 7.231 8.558 10.645 16.812 18.548 22.458
7 1.239 1.690 2.167 2.833 3.822 4.671 6.346 8.383 9.803 12.017 18.475 20.278 24.322
8 1.646 2.180 2.733 3.490 4.594 5.527 7.344 9.524 11.030 13.362 20.090 21.955 26.124
9 2.088 2.700 3.325 4.168 5.380 6.393 8.343 10.656 12.242 14.684 21.666 23.589 27.877
10 2.558 3.247 3.940 4.865 6.179 7.267 9.342 11.781 13.442 15.987 23.209 25.188 29.588
11 3.053 3.816 4.575 5.578 6.989 8.148 10.341 12.899 14.631 17.275 24.725 26.757 31.264
12 3.571 4.404 5.226 6.304 7.807 9.034 11.340 14.011 15.812 18.549 26.217 28.300 32.909
13 4.107 5.009 5.892 7.042 8.634 9.926 12.340 15.119 16.985 19.812 27.688 29.819 34.528
14 4.660 5.629 6.571 7.790 9.467 10.821 13.339 16.222 18.151 21.064 29.141 31.319 36.123
15 5.229 6.262 7.261 8.547 10.307 11.721 14.339 17.322 19.311 22.307 30.578 32.801 37.697
16 5.812 6.908 7.962 9.312 11.152 12.624 15.338 18.418 20.465 23.542 32.000 34.267 39.252
17 6.408 7.564 8.672 10.085 12.002 13.531 16.338 19.511 21.615 24.769 33.409 35.718 40.790
18 7.015 8.231 9.390 10.865 12.857 14.440 17.338 20.601 22.760 25.989 34.805 37.156 42.312
19 7.633 8.907 10.117 11.651 13.716 15.352 18.338 21.689 23.900 27.204 36.191 38.582 43.820
20 8.260 9.591 10.851 12.443 14.578 16.266 19.337 22.775 25.038 28.412 37.566 39.997 45.315
21 8.897 10.283 11.591 13.240 15.445 17.182 20.337 23.858 26.171 29.615 38.932 41.401 46.797
22 9.542 10.982 12.338 14.041 16.314 18.101 21.337 24.939 27.301 30.813 40.289 42.796 48.268
23 10.196 11.689 13.091 14.848 17.187 19.021 22.337 26.018 28.429 32.007 41.638 44.181 49.728
24 10.856 12.401 13.848 15.659 18.062 19.943 23.337 27.096 29.553 33.196 42.980 45.559 51.179
25 11.524 13.120 14.611 16.473 18.940 20.867 24.337 28.172 30.675 34.382 44.314 46.928 52.620
26 12.198 13.844 15.379 17.292 19.820 21.792 25.336 29.246 31.795 35.563 45.642 48.290 54.052
27 12.879 14.573 16.151 18.114 20.703 22.719 26.336 30.319 32.912 36.741 46.963 49.645 55.476
28 13.565 15.308 16.928 18.939 21.588 23.647 27.336 31.391 34.027 37.916 48.278 50.993 56.892
29 14.256 16.047 17.708 19.768 22.475 24.577 28.336 32.461 35.139 39.087 49.588 52.336 58.301
30 14.953 16.791 18.493 20.599 23.364 25.508 29.336 33.530 36.250 40.256 50.892 53.672 59.703
40 22.164 24.433 26.509 29.051 32.345 34.872 39.335 44.165 47.269 51.805 63.691 66.766 73.402
80 53.540 57.153 60.391 64.278 69.207 72.915 79.334 86.120 90.405 96.578 112.329 116.321 124.839
120 86.923 91.573 95.705 100.624 106.806 111.419 119.334 127.616 132.806 140.233 158.950 163.648 173.617
Fisher distribution (v1 = numerator and v2 = denominator degrees of freedom, P = upper-tail probability)
v1 = 1 v1 = 2 v1 = 3 v1 = 4 v1 = 5 v1 = 6
v2 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01
5 6.608 16.258 5.786 13.274 5.409 12.060 5.192 11.392 5.050 10.967 4.950 10.672
6 5.987 13.745 5.143 10.925 4.757 9.780 4.534 9.148 4.387 8.746 4.284 8.466
7 5.591 12.246 4.737 9.547 4.347 8.451 4.120 7.847 3.972 7.460 3.866 7.191
8 5.318 11.259 4.459 8.649 4.066 7.591 3.838 7.006 3.687 6.632 3.581 6.371
9 5.117 10.561 4.256 8.022 3.863 6.992 3.633 6.422 3.482 6.057 3.374 5.802
10 4.965 10.044 4.103 7.559 3.708 6.552 3.478 5.994 3.326 5.636 3.217 5.386
11 4.844 9.646 3.982 7.206 3.587 6.217 3.357 5.668 3.204 5.361 3.095 5.069
12 4.747 9.330 3.885 6.927 3.490 5.953 3.259 5.412 3.106 5.064 2.996 4.821
13 4.667 9.074 3.806 6.701 3.411 5.739 3.179 5.205 3.025 4.862 2.915 4.620
14 4.600 8.862 3.739 6.515 3.344 5.564 3.112 5.035 2.958 4.695 2.848 4.456
15 4.543 8.683 3.682 6.359 3.287 5.417 3.056 4.893 2.901 4.556 2.790 4.318
16 4.494 8.531 3.634 6.226 3.239 5.292 3.007 4.773 2.852 4.437 2.741 4.202
17 4.451 8.400 3.592 6.112 3.197 5.185 2.965 4.669 2.810 4.336 2.699 4.102
18 4.414 8.285 3.555 6.013 3.160 5.092 2.928 4.579 2.773 4.248 2.661 4.015
19 4.381 8.185 3.522 5.926 3.127 5.010 2.895 4.500 2.740 4.171 2.628 3.939
20 4.351 8.096 3.493 5.849 3.098 4.938 2.866 4.431 2.711 4.103 2.599 3.871
21 4.325 8.017 3.467 5.780 3.072 4.874 2.840 4.369 2.685 4.042 2.573 3.812
22 4.301 7.945 3.443 5.719 3.049 4.817 2.817 4.313 2.661 3.988 2.549 3.758
23 4.279 7.881 3.422 5.664 3.028 4.765 2.796 4.264 2.640 3.939 2.528 3.710
24 4.260 7.823 3.403 5.614 3.009 4.718 2.776 4.218 2.621 3.895 2.508 3.667
25 4.242 7.770 3.385 5.568 2.991 4.675 2.759 4.177 2.603 3.855 2.490 3.627
26 4.225 7.721 3.369 5.526 2.975 4.637 2.743 4.140 2.587 3.818 2.474 3.591
27 4.210 7.677 3.354 5.488 2.960 4.601 2.728 4.106 2.572 3.785 2.459 3.558
28 4.196 7.636 3.340 5.453 2.947 4.568 2.714 4.074 2.558 3.754 2.445 3.528
29 4.183 7.598 3.328 5.420 2.934 4.538 2.701 4.045 2.545 3.725 2.432 3.499
30 4.171 7.562 3.316 5.390 2.922 4.510 2.690 4.018 2.534 3.699 2.421 3.473
40 4.085 7.314 3.232 5.179 2.839 4.313 2.606 3.828 2.449 3.514 2.336 3.291
80 3.960 6.963 3.111 4.881 2.719 4.036 2.486 3.563 2.329 3.255 2.214 3.036
120 3.920 6.851 3.072 4.787 2.680 3.949 2.447 3.480 2.290 3.174 2.175 2.956
∞ 3.842 6.637 2.997 4.607 2.606 3.784 2.373 3.321 2.215 3.019 2.099 2.804
v1 = 8 v1 = 10 v1 = 12 v1 = 24 v1 = 48 v1 = ∞
v2 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01
1 238.883 5981.070 241.882 6055.847 243.906 6106.321 249.052 6234.631 251.669 6299.892 254.314 6365.861
2 19.371 99.374 19.396 99.399 19.413 99.416 19.454 99.458 19.475 99.478 19.496 99.499
3 8.845 27.489 8.786 27.229 8.745 27.052 8.639 26.598 8.583 26.364 8.526 26.125
4 6.041 14.799 5.964 14.546 5.912 14.374 5.774 13.929 5.702 13.699 5.628 13.463
5 4.818 10.289 4.735 10.051 4.678 9.888 4.527 9.466 4.448 9.247 4.365 9.020
6 4.147 8.102 4.060 7.874 4.000 7.718 3.841 7.313 3.757 7.100 3.669 6.880
7 3.726 6.840 3.637 6.620 3.575 6.469 3.410 6.074 3.322 5.866 3.230 5.650
8 3.438 6.029 3.347 5.814 3.284 5.667 3.115 5.279 3.024 5.074 2.928 4.859
9 3.230 5.467 3.137 5.257 3.073 5.111 2.900 4.729 2.807 4.525 2.707 4.311
10 3.072 5.057 2.978 4.849 2.913 4.706 2.737 4.327 2.641 4.124 2.538 3.909
11 2.948 4.744 2.854 4.539 2.788 4.397 2.609 4.021 2.511 3.818 2.404 3.602
12 2.849 4.499 2.753 4.296 2.687 4.155 2.505 3.780 2.405 3.578 2.296 3.361
13 2.767 4.302 2.671 4.100 2.604 3.960 2.420 3.587 2.318 3.384 2.206 3.165
14 2.699 4.140 2.602 3.939 2.534 3.800 2.349 3.427 2.245 3.224 2.131 3.004
15 2.641 4.004 2.544 3.805 2.475 3.666 2.288 3.294 2.182 3.090 2.066 2.868
16 2.591 3.890 2.494 3.691 2.425 3.553 2.235 3.181 2.128 2.976 2.010 2.753
17 2.548 3.791 2.450 3.593 2.381 3.455 2.190 3.084 2.081 2.878 1.960 2.653
18 2.510 3.705 2.412 3.508 2.342 3.371 2.150 2.999 2.040 2.793 1.917 2.566
19 2.477 3.631 2.378 3.434 2.308 3.297 2.114 2.925 2.003 2.718 1.878 2.489
20 2.447 3.564 2.348 3.368 2.278 3.231 2.082 2.859 1.970 2.652 1.843 2.421
21 2.420 3.506 2.321 3.310 2.250 3.173 2.054 2.801 1.941 2.593 1.812 2.360
22 2.397 3.453 2.297 3.258 2.226 3.121 2.028 2.749 1.914 2.540 1.783 2.305
23 2.375 3.406 2.275 3.211 2.204 3.074 2.005 2.702 1.890 2.492 1.757 2.256
24 2.355 3.363 2.255 3.168 2.183 3.032 1.984 2.659 1.868 2.448 1.733 2.211
25 2.337 3.324 2.236 3.129 2.165 2.993 1.964 2.620 1.847 2.409 1.711 2.169
26 2.321 3.288 2.220 3.094 2.148 2.958 1.946 2.585 1.828 2.373 1.691 2.131
27 2.305 3.256 2.204 3.062 2.132 2.926 1.930 2.552 1.811 2.339 1.672 2.097
28 2.291 3.226 2.190 3.032 2.118 2.896 1.915 2.522 1.795 2.309 1.654 2.064
29 2.278 3.198 2.177 3.005 2.104 2.868 1.901 2.495 1.780 2.280 1.638 2.034
30 2.266 3.173 2.165 2.979 2.092 2.843 1.887 2.469 1.766 2.254 1.622 2.006
40 2.180 2.993 2.077 2.801 2.003 2.665 1.793 2.288 1.666 2.068 1.509 1.805
80 2.056 2.742 1.951 2.551 1.875 2.415 1.654 2.032 1.514 1.799 1.325 1.494
120 2.016 2.663 1.910 2.472 1.834 2.336 1.608 1.950 1.463 1.711 1.254 1.381
∞ 1.939 2.513 1.832 2.323 1.753 2.187 1.518 1.793 1.359 1.537 1.000 1.000
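The critical values of the Student, chi-squared, and Fisher distributions tabulated above can be recomputed in the same way. A minimal sketch, again assuming SciPy is available, reproduces one entry from each table:

from scipy.stats import t, chi2, f

# Student t: two-tailed probability P = 0.05 with r = 20 degrees of freedom (table: 2.086).
print(round(t.ppf(1 - 0.05 / 2, df=20), 3))

# Chi-squared: upper-tail probability P = 0.100 with r = 10 degrees of freedom (table: 15.987).
print(round(chi2.ppf(1 - 0.100, df=10), 3))

# Fisher: P = 0.05 with v1 = 5 and v2 = 10 degrees of freedom (table: 3.326).
print(round(f.ppf(1 - 0.05, dfn=5, dfd=10), 3))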
Significance level = 5%
k is the number of exogenous variables, and T is the sample size.
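The definition of k and T above corresponds to the usual presentation of the Durbin–Watson bounds. Whichever table of critical values is used, the statistic itself is computed from the regression residuals; a minimal sketch, assuming Python with statsmodels is available and using purely illustrative data, is given below.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Illustrative data: the variable names and data-generating process are assumptions.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 0.5 * x + rng.normal(size=100)

# OLS regression, then the Durbin-Watson statistic computed from its residuals.
res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))   # compare with the tabulated bounds for T = 100, k = 1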
References
Akaike, H. (1969), “Fitting Autoregressive Models for Prediction”, Annals of the Institute of
Statistical Mathematics, 21, pp. 243–247.
Akaike, H. (1973), “Information theory and an extension of the maximum likelihood principle”,
Second International Symposium on Information Theory, pp. 261–281.
Akaike, H. (1974), “A new look at the statistical model identification”, IEEE Transactions on
Automatic Control, 19(6), pp. 716–723.
Almon, S. (1962), “The Distributed Lag between Capital Appropriations and Expenditures”,
Econometrica, 30, pp. 407–423.
Baltagi, B.H. (2021), Econometric Analysis of Panel Data, 6th edition, John Wiley & Sons.
Banerjee, A., Dolado, J., Galbraith, J.W. and D.F. Hendry (1993), Cointegration, Error-Correction,
and the Analysis of Nonstationary Data, Oxford University Press.
Basmann, R.L. (1957), “Generalized Classical Method of Linear Estimation of Coefficients in a
Structural Equation”, Econometrica, 25, pp. 77–83.
Bauwens, L., Hafner, C. and S. Laurent (2012), “Volatility models”, in Bauwens, L., Hafner, C.
and S. Laurent (eds), Handbook of Volatility Models and their Applications, John Wiley & Sons,
Inc.
Beach, C.M. and J.G. MacKinnon (1978), “A Maximum Likelihood Procedure for Regression with
Autocorrelated Errors”, Econometrica, 46, pp. 51–58.
Belsley, D.A., Kuh, E. and R.E. Welsch (1980), Regression Diagnostics: Identifying Influential
Data and Sources of Collinearity, John Wiley & Sons, New York.
Bénassy-Quéré, A. and V. Salins (2005), “Impact de l’ouverture financière sur les inégalités
internes dans les pays émergents”, Working Paper CEPII, 2005–11.
Beran, J. (1994), Statistics for Long Memory Processes, Chapman & Hall.
Blanchard, O. and S. Fischer (1989), Lectures on Macroeconomics, The MIT Press.
Bollerslev, T. (2008), “Glossary to ARCH (GARCH)”, CREATES Research Paper, 2008–49.
Bollerslev, T., Chou, R.Y. and K.F. Kroner (1992), “ARCH modeling in finance: A review of the
theory and empirical evidence”, Journal of Econometrics, 52(1–2), pp. 5–59.
Bollerslev, T., Engle, R.F. and D.B. Nelson (1994), “ARCH Models”, in Engle R.F. and D.L.
McFadden (eds), Handbook of Econometrics, Vol. IV, pp. 2959–3038, Elsevier Science.
Box, G.E.P. and D.R. Cox (1964), “An Analysis of Transformations”, Journal of the Royal
Statistical Society, Series B, 26, pp. 211–243.
Box, G.E.P. and G.M. Jenkins (1970), Time Series Analysis: Forecasting and Control, Holden Day,
San Francisco.
Box, G.E.P. and D.A. Pierce (1970), “Distribution of Residual Autocorrelation in ARIMA Time
Series Models”, Journal of the American Statistical Association, 65, pp. 1509–1526.
Breusch, T.S. (1978), “Testing for Autocorrelation in Dynamic Linear Models”, Australian
Economic Papers, 17, pp. 334–355.
Breusch, T.S. and A.R. Pagan (1979), “A Simple Test for Heteroscedasticity and Random
Coefficient Variation”, Econometrica, 47, pp. 1287–1294.
Brockwell, P.J. and R.A. Davis (1998), Time Series. Theory and Methods, 2nd edition, Springer
Verlag.
Brown, R.L., Durbin, J. and J.M. Evans (1975), “Techniques for Testing the Constancy of
Regression Relationship Over Time”, Journal of the Royal Statistical Society, 37, pp. 149–192.
Campbell, J.Y. and P. Perron (1991), “Pitfalls and Opportunities: What Macroeconomists Should
Know about Unit Roots”, in Fisher, S. (ed.), NBER Macroeconomic Annual, MIT Press,
pp. 141–201.
Chow, G.C. (1960), “Tests of Equality Between Sets of Coefficients in two Linear Regressions”,
Econometrica, 28, pp. 591–605.
Cochrane, D. and G.H. Orcutt (1949), “Application of Least Squares Regressions to Relationships
Containing Autocorrelated Error Terms”, Journal of the American Statistical Association, 44,
pp. 32–61.
Davidson, R. and J.G. MacKinnon (1993), Estimation and Inference in Econometrics, Oxford
University Press.
Dhrymes, P. (1973), “Restricted and Unrestricted Reduced Forms”, Econometrica, 41, pp. 119–
134.
Dhrymes, P. (1978), Introductory Econometrics, Springer Verlag.
Dickey, D.A. and W.A. Fuller (1979), “Distribution of the Estimators for Autoregressive Time
Series With a Unit Root”, Journal of the American Statistical Association, 74, pp. 427–431.
Dickey, D.A. and W.A. Fuller (1981), “Likelihood Ratio Statistics for Autoregressive Time Series
With a Unit Root”, Econometrica, 49, pp. 1057–1072.
Diebold, F.X. (2012), Elements of Forecasting, 4th edition, South Western Publishers.
Dowrick S., Pitchford R. and S.J. Turnovsky (2008), Economic Growth and Macroeconomic
Dynamics: Recent Developments in Economic Theory, Cambridge University Press.
Duesenberry, J. (1949), Income, Saving and the Theory of Consumer Behavior, Harvard University
Press.
Dufrénot, G. and V. Mignon (2002a), “La cointégration non linéaire : une note méthodologique”,
Économie et Prévision, no. 155, pp. 117–137.
Dufrénot, G. and V. Mignon (2002b), Recent Developments in Nonlinear Cointegration with
Applications to Macroeconomics and Finance, Kluwer Academic Publishers.
Durbin, J. (1960), “The Fitting of Time Series Models”, Review of the International Statistical
Institute, 28, pp. 233–244.
Durbin, J. (1970), “Testing for Serial Correlation in Least Squares Regression When some of the
Regressors are Lagged Dependent Variables”, Econometrica, 38, pp. 410–421.
Durbin, J. and G.S. Watson (1950), “Testing for Serial Correlation in Least Squares Regression I”,
Biometrika, 37, pp. 409–428.
Durbin, J. and G.S. Watson (1951), “Testing for Serial Correlation in Least Squares Regression
II”, Biometrika, 38, pp. 159–178.
Elhorst, J-P. (2014), Spatial Econometrics, Springer.
Engle, R.F. (1982), “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance
of United Kingdom Inflation”, Econometrica, 50(4), pp. 987–1007.
Engle, R.F. and C.W.J. Granger (1987), “Cointegration and Error Correction: Representation,
Estimation and Testing”, Econometrica, 55, pp. 251–276.
Engle, R.F. and C.W.J. Granger (1991), Long Run Economic Relationships. Readings in Cointe-
gration, Oxford University Press.
Engle, R.F. and S. Yoo (1987), “Forecasting and Testing in Cointegrated Systems”, Journal of
Econometrics, 35, pp. 143–159.
Farebrother, R.W. (1980), “The Durbin-Watson Test for Serial Correlation when There Is No
Intercept in the Regression”, Econometrica, 48, pp. 1553–1563.
Farrar, D.E. and R.R. Glauber (1967), “Multicollinearity in Regression Analysis: The Problem
Revisited”, The Review of Economics and Statistics, 49, pp. 92–107.
Farvaque, E., Jean, N. and B. Zuindeau (2007), “Inégalités écologiques et comportement électoral :
le cas des élections municipales françaises de 2001”, Développement Durable et Territoires,
Dossier 9.
Feldstein, M. and C. Horioka (1980), “Domestic Saving and International Capital Flows”,
Economic Journal, 90, pp. 314–329.
Florens, J.P., Marimoutou, V. and A. Péguin-Feissolle (2007), Econometric Modeling and Infer-
ence, Cambridge University Press.
Fox, J. (1997), Applied Regression Analysis, Linear Models, and Related Methods, Sage Publica-
tions.
Friedman, M. (1957), A Theory of the Consumption Function, New York.
Frisch, R.A.K. (1933), Editorial, Econometrica, 1, pp. 1–4.
Gallant, A.R. (1987), Nonlinear Statistical Models, John Wiley & Sons.
Geary, R.C. (1970), “Relative Efficiency of Count Sign Changes for Assessing Residual Autore-
gression in Least Squares Regression”, Biometrika, 57, pp. 123–127.
Giles, D.E.A. and M.L. King (1978), “Fourth Order Autocorrelation: Further Significance Points
for the Wallis Test”, Journal of Econometrics, 8, pp. 255–259.
Glejser, H. (1969), “A New Test for Heteroscedasticity”, Journal of the American Statistical
Association, 64, pp. 316–323.
Godfrey, L.G. (1978), “Testing Against Autoregressive and Moving Average Error Models when
the Regressors Include Lagged Dependent Variables”, Econometrica, 46, pp. 1293–1302.
Goldfeld, S.M. and R.E. Quandt (1965), “Some Tests for Homoskedasticity”, Journal of the
American Statistical Association, 60, pp. 539–547.
Goldfeld, S.M. and R.E. Quandt (1972), Nonlinear Econometric Methods, North-Holland, Ams-
terdam.
Gouriéroux, C. (1997), ARCH Models and Financial Applications, Springer Series in Statistics.
Gouriéroux, C. (2000), Econometrics of Qualitative Dependent Variables, Cambridge University
Press.
Gouriéroux, C. and A. Monfort (1996), Time Series and Dynamic Models, Cambridge University
Press.
Gouriéroux, C. and A. Monfort (2008), Statistics and Econometric Models, Cambridge University
Press.
Granger, C.W.J. (1969), “Investigating Causal Relations by Econometric Models and Cross-
Spectral Methods”, Econometrica, 36, pp. 424–438.
Granger, C.W.J. (1981), “Some Properties of Time Series Data and their Use in Econometric Model
Specification”, Journal of Econometrics, 16, pp. 121–130.
Granger, C.W.J. and P. Newbold (1974), “Spurious Regressions in Econometrics”, Journal of
Econometrics, 2, pp. 111–120.
Granger, C.W.J. and T. Teräsvirta (1993), Modelling Nonlinear Economic Relationships, Oxford
University Press.
Greene, W. (2020), Econometric Analysis, 8th edition, Pearson.
Griliches, Z. (1967), “Distributed Lags: A Survey”, Econometrica, 36, pp. 16–49.
Griliches, Z. and M. Intriligator (1983), Handbook of Econometrics, Vol. 1, Elsevier.
Gujarati, D.N., Porter, D.C. and S. Gunasekar (2017), Basic Econometrics, McGraw Hill.
Hamilton, J.D. (1994), Time Series Analysis, Princeton University Press.
Hannan, E.J. and B.G. Quinn (1979), “The Determination of the Order of an Autoregression”,
Journal of the Royal Statistical Society, Series B, 41, pp. 190–195.
Harvey, A.C. (1990), The Econometric Analysis of Time Series, MIT Press.
Harvey, A.C. and G.D.A. Phillips (1973), “A Comparison of the Power of Some Tests for
Heteroscedasticity in the General Linear Model”, Journal of Econometrics, 2, pp. 307–316.
Hausman, J. (1975), “An Instrumental Variable Approach to Full-Information Estimators for Linear
and Certain Nonlinear Models”, Econometrica, 43, pp. 727–738.
Hausman, J. (1978), “Specification Tests in Econometrics”, Econometrica, 46, pp. 1251–1271.
Hausman, J. (1983), “Specification and Estimation of Simultaneous Equation Models”, in
Griliches, Z. and M. Intriligator (eds), Handbook of Econometrics, North-Holland, Amsterdam.
Hendry, D.F. (1995), Dynamic Econometrics, Oxford University Press.
Hendry, D.F. and Morgan, M.S. (eds) (1995), The Foundations of Econometric Analysis, Cam-
bridge University Press.
LeSage, J. and R.K. Pace (2008), Introduction to Spatial Econometrics, Chapman & Hall.
Ljung, G.M. and G.E.P. Box (1978), “On a Measure of Lack of Fit in Time Series Models”,
Biometrika, 65, pp. 297–303.
MacKinnon, J.G. (1991), “Critical Values for Cointegration Tests”, in Engle, R.F. and C.W.J.
Granger (eds), Long-Run Economic Relationships, Oxford University Press, pp. 267–276.
Maddala, G.S. and I.-M. Kim (1998), Unit Roots, Cointegration, and Structural Change, Cam-
bridge University Press.
Maddala, G.S. and A.S. Rao (1971), “Maximum Likelihood Estimation of Solow’s and Jorgenson’s
Distributed Lag Models”, The Review of Economics and Statistics, 53(1), pp. 80–89.
Matyas, L. and P. Sevestre (2008), The Econometrics of Panel Data. Fundamentals and Recent
Developments in Theory and Practice, 3rd edition, Springer.
Mills, T.C. (1990), Time Series Techniques for Economists, Cambridge University Press.
Mittelhammer, R.C., Judge, G.G. and D.J. Miller (2000), Econometric Foundations, Cambridge
University Press, New York.
Mood, A.M., Graybill, F.A. and D.C. Boes (1974), Introduction to the Theory of Statistics,
McGraw-Hill.
Morgan, M.S. (1990), The History of Econometric Ideas (Historical Perspectives on Modern
Economics), Cambridge University Press.
Morgenstern, O. (1963), The Accuracy of Economic Observations, Princeton University Press.
Nelson, C.R. and C. Plosser (1982), “Trends and Random Walks in Macroeconomic Time Series:
Some Evidence and Implications”, Journal of Monetary Economics, 10, pp. 139–162.
Nerlove, M. (1958), Distributed Lags and Demand Analysis for Agricultural and Other Commodi-
ties, Agricultural Handbook 141, US Department of Agriculture.
Newbold, P. (1984), Statistics for Business and Economics, Prentice Hall.
Newey, W.K. and K.D. West (1987), “A Simple, Positive Semi-Definite, Heteroskedasticity and
Autocorrelation Consistent Covariance Matrix”, Econometrica, 55, pp. 703–708.
Palm, F.C. (1996), “GARCH Models of Volatility”, in Maddala G.S. and C.R. Rao (eds), Handbook
of Statistics, Vol. 14, pp. 209–240, Elsevier Science.
Phillips, A.W. (1958), “The Relationship between Unemployment and the Rate of Change of
Money Wage Rates in the United Kingdom, 1861–1957”, Economica, 25 (100), pp. 283–299.
Pindyck, R.S. and D.L. Rubinfeld (1991), Econometric Models and Economic Forecasts, McGraw-
Hill.
Pirotte, A. (2004), L’économétrie. Des origines aux développements récents, CNRS Éditions.
Prais, S.J. and C.B. Winsten (1954), “Trend Estimators and Serial Correlation”, Cowles Commis-
sion Discussion Paper, no. 383, Chicago.
Puech, F. (2005), Analyse des déterminants de la criminalité dans les pays en développement,
Thèse pour le doctorat de Sciences Économiques, Université d’Auvergne-Clermont I.
Rao, C.R. (1965), Linear Statistical Inference and Its Applications, John Wiley & Sons.
Sargan, J.D. (1964), “Wages and Prices in the United Kingdom: A Study in Econometric
Methodology”, in Hart, P.E., Mills, G. and J.K. Whitaker (eds), Econometric Analysis for
National Economic Planning, Butterworths, London.
Schmidt, P. (1976), Econometrics, Marcel Dekker, New York.
Schwarz, G. (1978), “Estimating the Dimension of a Model”, The Annals of Statistics, 6, pp. 461–
464.
Sims, C.A. (1980), “Macroeconomics and Reality”, Econometrica, 48, pp. 1–48.
Solow, R.M. (1960), “On a Family of Lag Distributions”, Econometrica, 28, pp. 393–406.
Spanos, A. (1999), Probability Theory and Statistical Inference: Econometric Modeling with
Observational Data, Cambridge University Press.
Swamy, P.A.V.B. (1971), Statistical Inference in Random Coefficient Regression Models, Springer
Verlag.
Teräsvirta, T., Tjøstheim, D. and C.W.J. Granger (2010), Modelling Nonlinear Economic Time
Series, Oxford University Press.
Theil, H. (1953), “Repeated Least Squares Applied to Complete Equation Systems”, Central
Planning Bureau, The Hague, Netherlands.
Theil, H. (1971), Principles of Econometrics, John Wiley & Sons, New York.
Theil, H. (1978), Introduction to Econometrics, Prentice Hall.
Thuilliez, J. (2007), “Malaria and Primary Education: A Cross-Country Analysis on Primary
Repetition and Completion Rates”, Working Paper Centre d’Économie de la Sorbonne, 2007–
13.
Tobin, J. (1950), “A Statistical Demand Function for Food in the USA”, Journal of the Royal
Statistical Society, Series A, pp. 113–141.
Wallis, K.F. (1972), “Testing for Fourth-Order Autocorrelation in Quarterly Regression Equa-
tions”, Econometrica, 40, pp. 617–636.
White, H. (1980), “A Heteroscedasticity Consistent Covariance Matrix Estimator and a Direct Test
of Heteroscedasticity”, Econometrica, 48, pp. 817–838.
Wooldridge, J.M. (2010), Econometric Analysis of Cross Section and Panel Data, 2nd edition, MIT
Press.
Wooldridge, J.M. (2012), Introductory Econometrics: A Modern Approach, 5th edition, South
Western Publishing Co.
Zellner, A. (1962), “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests
of Aggregation Bias”, Journal of the American Statistical Association, 57, pp. 500–509.
Zellner, A. and H. Theil (1962), “Three Stage Least Squares: Simultaneous Estimation of
Simultaneous Equations”, Econometrica, 30, pp. 63–68.
Bringing together theory and practice, this book presents the basics of economet-
rics in a clear and pedagogical way. It focuses on the acquisition of the methods and
skills that are essential for all students wishing to succeed in their studies and for
all practitioners wishing to apply econometric techniques. The approach adopted
in this textbook is resolutely applied: the author aims to meet a pedagogical and operational
need by quickly putting into practice the various concepts presented (statistics, tests, methods,
etc.). To this end, each theoretical presentation is followed by numerous examples and by
empirical applications carried out on the computer using standard econometric and statistical
software.
Index
Cox, D.R., 20, 77 prediction, 73, 74, 97, 141, 252, 299, 301,
Critical value, 57 319, 321
Estimator
BLUE, 47, 49, 113, 174, 363
D consistent, 49, 87–89, 103, 225, 277
Data linear, 47–49, 83, 85, 89, 92, 112, 113, 160,
cross-sectional, 9, 196 174
panel, vii, 9 minimum variance, 47, 49, 89, 160
Davidson, R., 82, 153, 221, 285, 365 unbiased, 47–49, 73, 85–90, 95, 103, 112,
Davis, R.A., 349 114, 141, 161, 164, 173, 236, 237
Determinant, 157, 159, 160 Evans, J.M., 253, 254
Dhrymes, P., 221, 365 Exogeneity, 327
Dickey, D.A., see Test, Dickey-Fuller Explanatory power, 143–145
Diebold, F.X., 262
Distribution
Chi-squared, 54, 55 F
Fisher, 55 Farebrother, R.W., 206
normal, 31, 97 Farrar, D.E., see Test, Farrar-Glauber
standard, 54, 98 Farvaque, E., 139
student, 55 Feedback effect, 332, 336
Disturbance(s), 30, 353, 364 Florens, J.P., vii, 82, 377
structural, 355, 358, 359 Form
Dowrick, S., 267 reduced, 354, 357–360, 362–365, 367
DS process, 297, 300–302 structural, 353, 355, 357–359, 362, 363,
Duesenberry, J., 266 366
Dufrénot, G., 349 Fox, J., 262
Dummy, see Variable, dummy Frequency, 9
Durbin, J., 204, 206–208, 214, 217, 219, 253, Friedman, M., 266, 279
254, 293, 314, 316 Frisch, R.A.K., v
Durbin algorithm, 293, 314, 316 Fuller, W.A., see Test, Dickey-Fuller
Function
autocorrelation, 289–293, 296, 297,
E 314–316
Elasticity, 76, 77, 247 autocovariance, 289, 290, 299, 301, 315
Elhorst, J.-P., vii impulse response, 332
Engle, R.F., vi, 185, 339–342, 345, 346, 349 joint probability density, 101
Equation(s) likelihood, 101
behavioral, 8, 353 partial autocorrelation, 289, 292, 308, 316
equilibrium, 353, 356
reduced form, 354
simultaneous, vi, 327, 351, 355, 360, 362, G
363, 365–367 Gallant, A.R., 82, 153
structural, 353, 357, 358, 364–366 Geary, R.C., see Test, Geary
variance analysis, 65, 66, 70, 124, 125, 129, Giles, D.E.A., 207
130, 167 Glauber, R.R., see Test, Farrar-Glauber
Yule-Walker, 293, 313, 314 Glejser, H., see Test, Glejser
Error(s), 30, 105 GLS, see Method, generalized least squares
equilibrium, 338 Godfrey, L.G., 207–209, 218, 219, 319
identically and independently distributed, Goldfeld, S.M., 82, 262
32 See also Test, Goldfeld-Quandt
mean absolute, 319, 320 Gouriéroux, C., vii, 262, 318, 349
mean absolute percent, 320 Granger, C.W.J., vi, 331, 336, 339, 340, 342,
measurement, 227 345, 349
normally and independently distributed, 32 Granger representation theorem, 339
Greene, W., vii, 82, 112, 153, 221, 262, 281, Johansen, S., 340, 342, 349
290, 318, 327, 332, 360, 362, 365, Johnston, J., 82, 153, 201, 225, 262, 362, 364,
377 365
Griliches, Z., 262, 285 Jorgenson, D., 282
Gujarati, D.N., 82, 221, 285, 377 Judge, G.G., 153, 221, 262
Juselius, K., 342
H
Hamilton, J.D., 290, 327, 332, 339, 342, 349 K
Harvey, A.C., 182, 190, 349 Kaufmann, D., 136
Hausman, J., 226, 351, 365, 367 Kennard, R.W., 237
Heckman, J., vi Kennedy, P., 262
Hendry, D.F., vi, 26, 221 Kim, I.-M., 349
Heterogeneity, 176 Klein, L.R., 231, 234, 368, 370, 376
Heteroskedastic, see Heteroskedasticity Kmenta, J., 82
Heteroskedasticity, vi, 31, 171–173, 176–180, Koyck, L.M., 273, 275–280, 282, 283
182–189, 194, 195, 201, 211, 216, Kuh, E., 233
325, 364–366, 373 Kullback, S., 144
conditional, 185, 186, 325 Kurtosis, see Coefficient, kurtosis
Hildreth, C., see Method, Hildreth-Lu
Hoel, P.G., 26
Hoerl, A.E., 237 L
Homoskedastic, see Homoskedasticity Lag, 265
Homoskedasticity, 30, 31, 109, 171, 181–186, mean, 269, 275, 285
190, 192–194, 289, 319, 325, 326 median, 269, 275, 284
Hurlin, C., 82, 349 Lagrange multiplier statistic, 185, 186
Lardic, S., 197, 287, 289, 302, 327, 332, 339,
342, 349
I Leamer, E.E., 262
Identification, 317, 318, 351, 357–361, 369 Lehmann, E.L., 82
Identification problem, 357 Leptokurtic, see Coefficient, kurtosis
ILS, see Method, indirect least squares LeSage, J., vii
Inertia degree, 10 Ljung, G.M., see Test, Ljung-Box
Information criteria Logarithmic difference, 20
Akaike, 145, 146, 152, 306, 320, 330, 334 Loglikelihood, 102, 365
Akaike corrected, 145 Log-reciprocal, see Model, reciprocal
Hannan-Quinn, 145, 146, 152, 306, 320, Lu, J., see Method, Hildreth-Lu
330, 334
Schwarz, 145, 146, 152, 306, 320, 330, 334
Innovation, 313 M
Integration, 300, 308, 310, 338, 345 MA, see Model, moving average
Interpolation, 197 MacKinnon, J.G., 82, 153, 216, 221, 285, 341,
Interval 346, 365
confidence, 57, 58, 60, 63, 64, 95, 118, 204, Macrobond, 21, 50, 147, 190, 234, 250, 256,
295, 296, 324 283, 345
prediction, 73, 74, 97, 140–143, 321 Maddala, G.S., 281, 349
Intriligator, M.D., 26, 262 Mallows criterion, 146
Marimoutou, V., vii
Matrix
J diagonal, 154
Jarque, C.M., see Test, Jarque-Bera full rank, 108, 111, 120, 158, 159, 242
Jean, E., 139 idempotent, 157, 163
Jenkins, G.M., 287, 312, 313, 317, 318, 320, identity, 157
321, 349 inverse, 157, 158
T V
Teräsvirta, T., 349 Variable
Test binary, 243
Augmented Dickey-Fuller, 305, 306, 341, centered, 47, 70, 83, 123, 125, 128, 228
346 control, 248
Box-Pierce, 210 dependent, 9
Chow, 254–256, 260, 261 dummy, 243–247, 249–251, 260
coefficient significance, 59, 119 endogenous, 9
CUSUM, 253, 254, 257 exogenous, 9
CUSUM of squares, 254, 257 explained, 9, 27
Dickey-Fuller, 287, 302–308, 333, 334, explanatory, 9, 27
340, 341, 346 independent, 9
Durbin, 204–208, 214, 217 indicator, 241, 243, 245, 249
Durbin-Watson, 204–207, 214, 217, 219, instrumental, 223–225, 276, 277, 355, 362,
337 364–368, 370
Farrar-Glauber, 232, 235 lagged endogenous, 10, 207, 208, 275, 276
Fisher, 120, 122, 126, 131, 132, 152, 230, predetermined, 355, 363, 364, 369
242, 270, 335, 336 qualitative, vii, 246–250
Geary, 201 Variance, 11, 45
Glejser, 182, 186, 188, 192, 194, 319 empirical, 12
Goldfeld-Quandt, 179, 181, 182, 190, 319 explained, 65, 66
Hausman, 226 residual, 65, 66
instruments validity, 277 Variance inflation factor, 229, 233
Jarque-Bera, 99 Vector
Ljung-Box, 207, 210, 219, 297, 319, 324 cointegration, 338
portmanteau (see Test, Box-Pierce) column, 154, 156
regression significance, 121, 151 line, 154, 156
regression significance (see Test, regression VIF, see Variance inflation factor
significance) Volatility, 185
Sargan (see Test, instruments validity)
significance, 59, 60, 69–71, 119–121, 151,
182, 208, 309, 368 W
student, 120, 166, 208, 273, 368 Walker, see Equation(s), Yule-Walker
unit root, 293, 297, 302, 306, 310, 333, 345 Wallis, K.F., 207
Test size, 57 Watson, G.S., see Test, Durbin-Watson
Theil, H., 363, 365, 377 Weak, see Stationarity, second-order
Three-stage least squares, see Method, Welsch, R.E., 233
three-stage least squares West, K.D., see Method, Newey-West
Thuilliez, J., 135 White, H., 184–187, 193, 194, 319
Time series econometrics, 287, 319 White noise, 31, 319
Tobin, J., 237 Winsten, C.B., 216
Trace, 157, 158, 163, 164 WLS, see Method, weighted least squares
Transformation Wooldridge, J.M., vii, 221
Box-Cox, 77, 79–81
Koyck, 273, 275
logarithmic, 20, 28, 29, 189 Y
TS process, 298–300, 303 Yoo, S., 341, 342, 346
Two-stage least squares, see Method, two-stage Yule, see Equation(s), Yule-Walker
least squares
Z
U Zellner, A., 365, 366
Underidentified, 358, 361, 362, 369 Zuindeau, B., 139
Unit root, 293, 297, 300, 302, 303, 305–308,
310, 333, 334, 340, 345