
Classroom Companion: Economics

Valérie Mignon

Principles
of Econometrics
Theory and Applications
Classroom Companion: Economics
The Classroom Companion series in Economics includes undergraduate and grad-
uate textbooks alike. It welcomes fundamental textbooks aimed at introducing
students to the core concepts, empirical methods, theories and tools of the field, as
well as advanced textbooks written for students at the Master and PhD level seeking
a deeper understanding of economic theory, mathematical tools and quantitative
methods.
Valérie Mignon

Principles of Econometrics
Theory and Applications
Valérie Mignon
EconomiX-CNRS
University of Paris Nanterre
Nanterre Cedex, France

ISSN 2662-2882 ISSN 2662-2890 (electronic)


Classroom Companion: Economics
ISBN 978-3-031-52534-6 ISBN 978-3-031-52535-3 (eBook)
https://doi.org/10.1007/978-3-031-52535-3
The translation was done with the help of an artificial intelligence machine translation tool. A subsequent
human revision was done primarily in terms of content.
Translation from the French language edition: “Économétrie - Théorie et applications - 2e éd.” by Valérie
Mignon, © 2022. Published by Economica. All Rights Reserved.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.


Preface

Econometrics is the study and measurement of economic phenomena based on
the statistical observation of relevant quantities describing them. Econometrics is
a branch of economic science that draws jointly on economic theory, statistics,
mathematics, and computer science. In particular, it is used to analyze and verify,
i.e., to test, economic phenomena and theories.
Econometrics, as a discipline, was born in 1930 with the creation of the
Econometric Society by Ragnar Frisch, Charles Roos, and Irving Fisher. Frisch
(1933) defines econometrics as follows: “econometrics is by no means the same
as economic statistics. Nor is it identical with what we call general economic
theory, although a considerable portion of this theory has a definitely quantitative
character. Nor should econometrics be taken as synonymous with the application
of mathematics to economics. Experience has shown that each of these three
viewpoints, that of statistics, economic theory, and mathematics, is a necessary,
but not by itself a sufficient, condition for a real understanding of the quantitative
relations in modern economic life. It is the unification of all three that is powerful.
And it is this unification that constitutes econometrics.”
The development of databases—particularly at a very fine level and at high
frequency—combined with the development of computer tools has enabled this
unification of economic theory, statistics, and mathematics. Moreover, as Pirotte
(2004) reminds us “econometrics provides economists with a fundamental basis
for studying the prospects and consequences of economic policies that can be
applied. More specifically, it is the only method that provides both quantitative
and qualitative information.” Thus, through macroeconometric models in particular,
econometrics is characterized by a high level of operational content, especially
for macroeconomists, economic analysts, and policymakers. Macroeconometric
models, the aim of which is to describe economic activity, are used as a simulation
tool and thus provide an aid to policy decision-making. Similarly, in the field of
finance, econometrics has undergone considerable developments, enabling us to
better understand the dynamics of financial markets.


Work with econometric content has developed substantially during the twentieth
century, as demonstrated by the large number of journals on econometrics.1
Examples include: Biometrika, Econometrica, Econometric Theory, Econometric
Reviews, Journal of Econometrics, Journal of the American Statistical Association,
Journal of Time Series Analysis, and Quantitative Economics. There are also
journals with more applied content such as Empirical Economics, International
Journal of Forecasting, Journal of Applied Econometrics, Journal of Business and
Economic Statistics, and Journal of Financial Econometrics. In addition, many gen-
eral economic journals publish articles with strong econometric content: American
Economic Review, Economics Letters, European Economic Review, International
Economic Review, International Economics, Journal of the European Economic
Association, Quarterly Journal of Economics, and Review of Economic Studies.
The rise of econometrics can also be illustrated by the fact that recent Nobel
Prizes in economics have been awarded to econometricians. James Heckman and
Daniel McFadden received the Nobel Prize in Economics in 2000 for their work
on theories and methods for the analysis of selective samples and on discrete
choice models. Similarly, in 2003, the Nobel Prize in Economics was awarded to
Robert Engle and Clive Granger for their work on methods of analyzing economic
time series with (i) time-varying volatility (R. Engle) and (ii) common trends (C.
Granger), which has contributed to improved forecasts of economic growth, interest
rates, and stock prices. The Prize was also awarded to Christopher Sims and Thomas
Sargent in 2011 for their empirical work on cause and effect in the macroeconomy,
and to Eugene Fama, Lars Peter Hansen, and Robert Shiller in 2013 for their
empirical analysis of asset prices.
These different points testify that econometrics is a discipline in its own right and
a fundamental branch of economics.
This book aims to provide readers with the basics of econometrics. It is composed
of eight chapters. The first, introductory chapter recalls some essential concepts in
statistics and econometrics. Chapter 2 deals with the simple regression model. Chap-
ter 3 generalizes the previous chapter to the case of the multiple regression model, in
which more than one explanatory variable is included. In Chap. 4, the fundamental
themes of heteroskedasticity and autocorrelation of errors are addressed in detail.
Chapter 5 brings together a set of problems related to explanatory variables. It deals
successively with dependence between explanatory variables and the error term, the
problem of multicollinearity, and the question of stability of the estimated models.
Chapter 6 introduces dynamics into the models and presents distributed lag models.
Chapter 7 extends the previous chapter by presenting time series models, a branch
of econometrics that has undergone numerous developments over the last 40 years.
Finally, Chap. 8 deals with structural models by studying simultaneous equations
models.

1 Pirotte’s (2004) book gives a history of econometrics, from the origins of the discipline to its
recent developments. See also Morgan (1990) and Hendry and Morgan (1995).

While providing a detailed introduction to econometrics, this book also focuses
on some recent developments in the discipline, particularly in time series econo-
metrics. The choice to focus on contemporary advances means that some topics
have been deliberately omitted. This is notably the case for panel data econometrics
(Matyas and Sevestre, 2008; Wooldridge, 2010; Baltagi, 2021), spatial econometrics
(LeSage and Pace, 2008; Elhorst, 2014), econometrics of qualitative variables
(Gouriéroux, 2000; Greene, 2020), models with unobservable variables (Florens,
Marimoutou, and Péguin-Feissolle, 2007), and nonlinear models (see in
particular Florens et al., 2007; Greene, 2020).
All the theoretical developments in this book are illustrated by numerous
applications to macroeconomics and finance. Each chapter contains several concrete
empirical applications, using Eviews software. This permanent combination of
theoretical and applied aspects will allow readers to quickly put into practice the
different concepts presented.
This book is the fruit of various econometrics courses taught by the author at the
University of Paris Nanterre in France. It is primarily intended for undergraduates
and graduates in economics, management, and in mathematics and computer science
applied to the social sciences, as well as for students at business and engineering
schools. It will also be useful for professionals who work with econometric
techniques. They will find in it practical solutions to the various problems they face.
I would like to thank Agnès Bénassy-Quéré, Hubert Kempf, and Jean Pavlevski
for encouraging me to write this textbook, the first edition of which was published
in French in 2008. I am particularly indebted to Hubert Kempf for prompting me to
write this new edition in English, and to my publisher, Springer. I would also like
to thank Emmanuel Dubois for his constant support and for the help he gave me in
formatting this book.
To Tania and Emmanuel

Paris, France Valérie Mignon


About This Book

Bringing together theory and practice, this book presents the basics of econometrics
in a clear and pedagogical way. It focuses on the acquisition of the methods
and skills that are essential for all students wishing to succeed in their studies
and for all practitioners wishing to apply econometric techniques. The approach
adopted in this textbook is resolutely applied. Through this book, the author
aims to meet a pedagogical and operational need to quickly put into practice
the various concepts presented (statistics, tests, methods, etc.). This is why, after
each theoretical presentation, numerous examples are given, as well as empirical
applications carried out on the computer using existing econometric and statistical
software.
This textbook is primarily intended for students of bachelor’s and master’s
degrees in Economics, Management, and Mathematics and Computer Sciences, as
well as for students of Engineering and Business schools. It will also be useful for
professionals who will find practical solutions to the various problems they face.

Contents

1 Introductory Developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 What Is Econometrics? Some Introductory Examples . . . . . . . . . . . . . . . . 1
1.1.1 Answers to Many Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 The Example of Consumption and Income . . . . . . . . . . . . . . . . . . . . 2
1.1.3 The Answers to the Other Questions Asked . . . . . . . . . . . . . . . . . . . 4
1.2 Model and Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1 The Concept of Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2 Different Types of Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.3 Explained Variable/Explanatory Variable . . . . . . . . . . . . . . . . . . . . . 9
1.2.4 Error Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Statistics Reminders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.1 Mean. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.2 Variance, Standard Deviation, and Covariance . . . . . . . . . . . . . . . . 11
1.3.3 Linear Correlation Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.4 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 A Brief Introduction to the Concept of Stationarity . . . . . . . . . . . . . . . . . . . 17
1.4.1 Stationarity in the Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.2 Stationarity in the Variance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4.3 Empirical Application: A Study of the Nikkei Index . . . . . . . . . 21
1.5 Databases and Software. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.5.1 Databases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.5.2 Econometric Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
The Gist of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2 The Simple Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1.1 The Linearity Assumption. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1.2 Specification of the Simple Regression Model and
Properties of the Error Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1.3 Summary: Specification of the Simple Regression Model. . . . 32
2.2 The Ordinary Least Squares (OLS) Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2.1 Objective and Reminder of Hypotheses . . . . . . . . . . . . . . . . . . . . . . . 33


2.2.2 The OLS Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34


2.2.3 The OLS Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2.4 Properties of OLS Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2.5 OLS Estimator of the Variance of the Error Term. . . . . . . . . . . . . 49
2.2.6 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.3 Tests on the Regression Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.3.1 Determining the Distributions Followed by the OLS
Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.3.2 Tests on the Regression Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.3.3 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.4 Analysis of Variance and Coefficient of Determination . . . . . . . . . . . . . . . 64
2.4.1 Analysis of Variance (ANOVA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.4.2 Coefficient of Determination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.4.3 Analysis of Variance and Significance Test of the
Coefficient β . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.4.4 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.5 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.6 Some Extensions of the Simple Regression Model . . . . . . . . . . . . . . . . . . . 75
2.6.1 Log-Linear Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.6.2 Semi-Log Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.6.3 Reciprocal Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.6.4 Log-Inverse or Log-Reciprocal Model . . . . . . . . . . . . . . . . . . . . . . . . 80
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
The Gist of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Appendix 2.1: Demonstrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Appendix 2.2: Normal Distribution and Normality Test . . . . . . . . . . . . . . . . . . . . 97
Appendix 2.3: The Maximum Likelihood Method . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3 The Multiple Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.1 Writing the Model in Matrix Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.2 The OLS Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.2.1 Assumptions of the Multiple Regression Model . . . . . . . . . . . . . 107
3.2.2 Estimation of Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.2.3 Properties of OLS Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.2.4 Error Variance Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.2.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.3 Tests on the Regression Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.3.1 Distribution of Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.3.2 Tests on a Regression Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.3.3 Significance Tests of Several Coefficients . . . . . . . . . . . . . . . . . . . . . 120
3.4 Analysis of Variance (ANOVA) and Adjusted Coefficient
of Determination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.4.1 Analysis-of-Variance Equation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.4.2 Coefficient of Determination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

3.4.3 Adjusted Coefficient of Determination . . . . . . . . . . . . . . . . . . . . . . . . 126


3.4.4 Partial Correlation Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.4.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.5 Some Examples of Cross-Sectional Applications . . . . . . . . . . . . . . . . . . . . . 134
3.5.1 Determinants of Crime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
3.5.2 Health Econometrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
3.5.3 Inequalities and Financial Openness . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
3.5.4 Inequality and Voting Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
3.6 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
3.6.1 Determination of Predicted Value and Prediction Interval . . . . 140
3.6.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
3.7 Model Comparison Criteria. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
3.7.1 Explanatory Power/Predictive Power of a Model . . . . . . . . . . . . . 143
3.7.2 Coefficient of Determination and Adjusted Coefficient
of Determination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
3.7.3 Information Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
3.7.4 The Mallows Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
3.8 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
3.8.1 Practical Calculation of the OLS Estimators . . . . . . . . . . . . . . . . . 148
3.8.2 Software Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
The Gist of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Appendix 3.1: Elements of Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Appendix 3.2: Demonstrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
4 Heteroskedasticity and Autocorrelation of Errors . . . . . . . . . . . . . . . . . . . . . . . 171
4.1 The Generalized Least Squares (GLS) Estimators . . . . . . . . . . . . . . . . . . . . 172
4.1.1 Properties of OLS Estimators in the Presence
of Autocorrelation and/or Heteroskedasticity . . . . . . . . . . . . . . . . . 172
4.1.2 The Generalized Least Squares (GLS) Method . . . . . . . . . . . . . . . 173
4.1.3 Estimation of the Variance of the Errors . . . . . . . . . . . . . . . . . . . . . . . 175
4.2 Heteroskedasticity of Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.2.1 The Sources of Heteroskedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.2.2 Estimation When There Is Heteroskedasticity . . . . . . . . . . . . . . . . 177
4.2.3 Detecting Heteroskedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.2.4 Estimation Procedures When There Is Heteroskedasticity . . . 186
4.2.5 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
4.3 Autocorrelation of Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4.3.1 Sources of Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4.3.2 Estimation When There Is Autocorrelation . . . . . . . . . . . . . . . . . . . 198
4.3.3 Detecting Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
4.3.4 Estimation Procedures in the Presence of Error
Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
4.3.5 Prediction in the Presence of Error Autocorrelation . . . . . . . . . . 216

4.3.6 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217


Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
The Gist of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
5 Problems with Explanatory Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
5.1 Random Explanatory Variables and the Instrumental Variables
Method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
5.1.1 Instrumental Variables Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.1.2 The Hausman (1978) Specification Test . . . . . . . . . . . . . . . . . . . . . . . 226
5.1.3 Application Example: Measurement Error . . . . . . . . . . . . . . . . . . . . 226
5.2 Multicollinearity and Variable Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
5.2.1 Presentation of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
5.2.2 The Effects of Multicollinearity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
5.2.3 Detecting Multicollinearity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
5.2.4 Solutions to Multicollinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
5.2.5 Variable Selection Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
5.3 Structural Changes and Indicator Variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
5.3.1 The Constrained Least Squares Method . . . . . . . . . . . . . . . . . . . . . . . 242
5.3.2 The Introduction of Indicator Variables . . . . . . . . . . . . . . . . . . . . . . . 243
5.3.3 Coefficient Stability Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
The Gist of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Appendix: Demonstration of the Formula for Constrained Least
Squares Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
6 Distributed Lag Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
6.1 Why Introduce Lags? Some Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
6.2 General Formulation and Definitions of Distributed
Lag Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
6.3 Determination of the Number of Lags and Estimation . . . . . . . . . . . . . . . . 270
6.3.1 Determination of the Number of Lags . . . . . . . . . . . . . . . . . . . . . . . . . 270
6.3.2 The Question of Estimating Distributed Lag Models . . . . . . . . . 271
6.4 Finite Distributed Lag Models: Almon Lag Models . . . . . . . . . . . . . . . . . . 271
6.5 Infinite Distributed Lag Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
6.5.1 The Koyck Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
6.5.2 The Pascal Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
6.6 Autoregressive Distributed Lag Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
6.6.1 Writing the ARDL Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
6.6.2 Calculation of ARDL Model Weights . . . . . . . . . . . . . . . . . . . . . . . . 282
6.7 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
The Gist of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285

7 An Introduction to Time Series Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287


7.1 Some Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
7.1.1 Time Series. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
7.1.2 Second-Order Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
7.1.3 Autocovariance Function, Autocorrelation Function,
and Partial Autocorrelation Function . . . . . . . . . . . . . . . . . . . . . . . . . . 289
7.2 Stationarity: Autocorrelation Function and Unit Root Test . . . . . . . . . . . 293
7.2.1 Study of the Autocorrelation Function. . . . . . . . . . . . . . . . . . . . . . . . . 293
7.2.2 TS and DS Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
7.2.3 The Dickey-Fuller Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
7.3 ARMA Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
7.3.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
7.3.2 The Box and Jenkins Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
7.3.3 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
7.4 Extension to the Multivariate Case: VAR Processes . . . . . . . . . . . . . . . . . . 327
7.4.1 Writing the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
7.4.2 Estimation of the Parameters of a VAR(p) Process
and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
7.4.3 Forecasting VAR Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
7.4.4 Granger Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
7.4.5 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
7.5 Cointegration and Error-Correction Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
7.5.1 The Problem of Spurious Regressions . . . . . . . . . . . . . . . . . . . . . . . . . 336
7.5.2 The Concept of Cointegration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
7.5.3 Error-Correction Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
7.5.4 Estimation of Error-Correction Models and
Cointegration Tests: The Engle and Granger (1987)
Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
7.5.5 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
The Gist of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
8 Simultaneous Equations Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
8.1 The Analytical Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
8.1.1 Introductory Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
8.1.2 General Form of Simultaneous Equations Models . . . . . . . . . . . 355
8.2 The Identification Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
8.2.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
8.2.2 Rank and Order Conditions for Identification . . . . . . . . . . . . . . . . . 358
8.3 Estimation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
8.3.1 Indirect Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
8.3.2 Two-Stage Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
8.3.3 Full-Information Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
8.4 Specification Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367

8.5 Empirical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368


8.5.1 Writing the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
8.5.2 Conditions for Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
8.5.3 Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
8.5.4 Model Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
The Gist of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377

Appendix: Statistical Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379


References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
About the Author

Valérie Mignon is Professor of Economics at the University of Paris Nanterre
(France), Member of the EconomiX-CNRS research center, and Scientific Advisor
to the leading French center for research and expertise on the world economy,
CEPII (Paris, France). She teaches econometrics at undergraduate and graduate
levels. Her econometric research focuses mainly on macroeconomics, finance,
international macroeconomics and finance, and energy, fields in which she has
published numerous articles and books.

1 Introductory Developments

After defining the concepts of model and variable, this chapter offers some statistical
reminders about the mean, variance, standard deviation, covariance, and linear
correlation coefficient. A brief introduction to the concept of stationarity is also
provided. Finally, this chapter lists the main databases in economics and finance,
as well as the most commonly used software packages. Beforehand, we give some
introductory examples to illustrate in a simple way what econometrics can do.

1.1 What Is Econometrics? Some Introductory Examples

Econometrics is a discipline with a strong operational content. It enables us to
quantify a phenomenon, establish a relationship between several variables, validate
or invalidate a theory, evaluate the effects of an economic policy measure, etc.

1.1.1 Answers to Many Questions

Econometrics provides answers to a wide range of questions. Let us take some
simple examples.

– Are the terms of trade a determinant of the value of exchange rates? Do other
economic variables have more impact?
– Is the purchasing power parity theory empirically verified?
– Do rising oil prices have a significant impact on car sales?
– Is the depreciation of the dollar compatible with rising oil prices?
– Is the euro overvalued? If so, by how much? In other words, what is the
equilibrium value of the euro?
– Are international financial markets integrated?

– Is the efficient capital market hypothesis confirmed?


– Is there international convergence in GDP per capita?
– What is the impact of the 35-hour work week on unemployment?
– Does higher inflation reduce unemployment?
– Does their parents’ socio-occupational category have an impact on children’s
level of education?
– What is the impact of air pollution on children’s health?
– What are the effects of global warming on economic growth?
– etc.

To answer these questions, the econometrician must build a model to relate the
variables of interest. Consider, for example, the question “What is the impact of an
increase of 10 monetary units in income on household consumption?”

1.1.2 The Example of Consumption and Income

To answer this question, two variables need to be taken into account: household
consumption and household income (gross disposable income). To relate these two
variables, we write an equation of the following type:

CONS = α + β × INC    (1.1)

where CONS denotes consumption and INC income. The impact of a variation
in income on consumption is taken into account by the parameter β. To quantify
this impact, it is necessary to have a numerical value for the coefficient β. To this
end, an estimation of the model is performed: estimating a model thus amounts to
quantifying it, i.e., quantifying the relationship between two or more variables. In
the following, we will detail the methods available for estimating a model. For the
moment, let us restrict ourselves to a few illustrations.
Consider two countries: Finland and Italy. For each of the two countries, we
want to assess the impact of a 10-unit increase in the gross disposable income of
Finnish (resp. Italian) households on their consumption. Figures 1.1 and 1.2 show
the evolution of real consumption (CON S) and income (I N C) of households for
each of the two countries.1 The data are annual and cover the period from 1995
to 2020.2 Regardless of which figure we look at, we see that the series move in the
same direction: consumption and income show an overall upward trend in the case of
Finland, and the two series move in tandem, alternating between bullish and bearish
phases, in the case of Italy. If there is a relationship between the two variables,

1 The series are expressed in real terms, i.e., they are deflated by the consumer price index of each country.
2 The data are extracted from the national statistical institutes of the two countries: Statistics Finland and the Italian National Institute of Statistics (Istat).



Fig. 1.1 Evolution of consumption and gross disposable income of Finnish households (euros), 1995–2020 (line chart of the series INC_FIN and CONS_FIN)

Fig. 1.2 Evolution of consumption and gross disposable income of Italian households (euros), 1995–2020 (line chart of the series INC_ITA and CONS_ITA)

it should therefore be positive. In other words, we expect the value obtained for
the coefficient β to be positive. More specifically, if we estimate model (1.1), we
obtain the following values for the coefficient β associated with income: 0.690 for
Finland and 0.721 for Italy. These values are positive, which means that an increase
in income is accompanied by an increase in consumption in both countries, all other
things being equal. We can also quantify this increase:

– A €10 increase in income in Finland translates into a €6.90 increase in
consumption of Finnish households, all other things being equal.
– A €10 increase in income in Italy generates, all other things being equal, an
increase in consumption of Italian households of around €7.21.

Although different, these two values are quite close, which means that household
consumption behavior, in relation to the change in income, is similar in Finland and
Italy, even though the economic characteristics of the two countries differ. In the
rest of this book, we will see that it is possible to refine these comments by studying
whether or not the values obtained are significantly different. This will be done
using statistical tests.
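
To make the estimation step concrete, the short sketch below shows how a model such as (1.1) can be estimated by ordinary least squares in Python. It is only an illustrative sketch using simulated data: the coefficients 0.690 and 0.721 reported above were obtained from the actual Finnish and Italian series (with Eviews), not from this code.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative sketch only: the income/consumption figures are simulated,
# not the actual Finnish or Italian national accounts series used in the text.
rng = np.random.default_rng(0)
inc = np.linspace(8e10, 1.6e11, 26)                 # 26 annual observations (1995-2020)
cons = 5e9 + 0.69 * inc + rng.normal(0.0, 2e9, inc.size)

X = sm.add_constant(inc)                            # adds the intercept alpha
fit = sm.OLS(cons, X).fit()                         # estimates CONS = alpha + beta * INC

alpha_hat, beta_hat = fit.params
print(f"alpha estimate: {alpha_hat:.3e}")
print(f"beta estimate (marginal effect of income): {beta_hat:.3f}")
# A 10-unit rise in income raises predicted consumption by roughly 10 * beta_hat units.
```

Any econometric package (Eviews, R, Stata, etc.) returns the same kind of output, namely point estimates for α and β, to be interpreted exactly as in the Finnish and Italian examples above.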

1.1.3 The Answers to the Other Questions Asked

To conduct their analysis, econometricians have to find the data they need. In the
case of the example previously studied, the following series are needed: household
consumption, household gross disposable income, and the consumer price indexes
for Finland and Italy, i.e., a total of six series. For this purpose, econometricians
need access to databases. Nowadays, there are many such databases, some of which
are freely accessible. A non-exhaustive list of the main economic and financial
databases is given at the end of this chapter. Once the data have been collected,
it is possible to proceed with the study in question.
Let us now consider the various questions posed in Sect. 1.1.1 and give some
possible answers.

– Are the terms of trade a determinant of the value of exchange rates? Do other
economic variables have more impact?
The following data are required for the country under consideration: export
prices, import prices, and the exchange rate, the ratio between export prices and
import prices being used to measure the terms of trade. To assess whether the
terms of trade are a determinant of the exchange rate, it is necessary to estimate a
model that relates the exchange rate and the terms of trade and to test whether the
coefficient associated with the variable “terms of trade” is significantly different
from zero. To determine whether other economic variables have more impact, we
need to add them to the previous model and study their statistical significance.
Other potential determinants include the country’s net foreign asset position,
productivity, interest rate differential, etc.
– Is the purchasing power parity theory empirically confirmed?
According to the purchasing power parity (PPP) theory, each country’s currency
provides the same purchasing power in all countries. In other words, if the
products traded are physically identical (without transport costs), the nominal
exchange rate (indirect quote) is determined by the relative price of the good,
i.e., Qt = Pt/Pt∗, which can be written in logarithmic form: qt = pt − pt∗, where
the lowercase variables are the logarithms of the uppercase variables, Qt is the
nominal exchange rate, Pt is the domestic consumer price index, and Pt∗ is the
foreign consumer price index. In order to grasp the empirical validity of PPP, we
can estimate a relationship of the type qt = α + β1 pt − β2 pt∗ and check that
α = 0 and β1 = β2 = 1. This is done by statistically testing that the coefficients
take certain specific values.



– Do rising oil prices have a significant impact on car sales?
This question can be answered by estimating an equation linking car sales to oil
prices. The value obtained for the coefficient assigned to oil prices will quantify
the effect of their increase on car sales. If a significant impact is detected, it is
expected to be negative, as higher oil prices generate an additional cost.
– Is the depreciation of the dollar compatible with rising oil prices?
This question about the link between oil prices and the dollar exchange rate
is essential because oil prices are denominated in dollars. Traditionally, it is
assumed that there is a positive relationship between the two variables, in the
sense that a rise in oil prices is generally accompanied by an appreciation of the
US currency. To understand the link between the two variables, it is necessary
to estimate a relationship explaining the dollar exchange rate by oil prices. The
coefficient assigned to the oil price variable should therefore be positive, and its
value makes it possible to quantify the impact of oil prices on the dollar.
– Is the euro overvalued? If so, by how much? In other words, what is the
equilibrium value of the euro?
To answer these questions, we need to define a “standard” corresponding to the
equilibrium value of the euro. Among the theories for determining equilibrium
exchange rates is the BEER (behavioral equilibrium exchange rate) framework.
By this approach, the exchange rate is linked in the long term to a set of economic
fundamentals, such as the net foreign asset position, the relative price level or any
other measure of productivity, the terms of trade, and the interest rate differential.
Estimating an equation that explains the euro exchange rate by these different
fundamentals allows us to define the equilibrium value of the European currency.
The question of overvaluation is then addressed by comparing the observed value
of the euro with its estimated equilibrium value. In Chap. 7, we will see that
estimating an equilibrium relationship, or long-term relationship, is based on
cointegration theory.
– Are international financial markets integrated?
There are, of course, many ways of approaching this fundamental question.
One possible approach is to adopt the work of Feldstein and Horioka (1980):
if financial markets are perfectly integrated, then capital is perfectly mobile,
which implies that capital should move to wherever the rate of return is
highest. Consequently, for a given country, the investment rate should be totally
uncorrelated with its savings rate. To understand this hypothesis, we need to
estimate a relationship linking the investment rate to the savings rate and to
consider the value of the coefficient assigned to the savings rate. The farther
from 1, the weaker the correlation and the more this suggests a high degree of
financial integration.
– Is the efficient capital market hypothesis confirmed?
In line with the weak form of informational efficiency, prices observed on a
market follow a random walk. In other words, price changes, or so-called returns,
are unpredictable in the sense that it is impossible to predict future returns from
past returns. A simple way to test this hypothesis is to estimate a relationship of
the type .Rt = α + βRt−1 and test whether the coefficient .β assigned to past
6 1 Introductory Developments

returns .Rt−1 is zero or not. If it is zero, the efficient capital market hypothesis
is not called into question, since past values of returns do not provide any
information to explain the current change in returns.
– Is there international convergence in GDP per capita?
Analyzing the convergence of GDP per capita is fundamental to studying
inequalities between nations. In particular, this question raises the issue of
poor countries catching up with rich ones. If we are interested in conditional
convergence, the Solow model can be used. In this model, the growth rate of a
country’s per capita income depends on the level at which this income is situated
in relation to the long-run equilibrium path of the economy. It is then possible
to estimate a relationship to explain the GDP growth rate between the current
date and the initial date by the level of GDP at the initial date. If the coefficient
assigned to the level of GDP is zero, this indicates an absence of convergence.
– What is the impact of the 35-hour work week on unemployment?
There are several ways to approach this question. One is to estimate a relationship
to explain the unemployment rate by working hours, by varying those working
hours. If the impact of the 35-hour work week on the unemployment rate
is neutral, the coefficient assigned to the duration variable should be similar,
whether the duration is 35 or 39 hours.
– Can higher inflation reduce unemployment?
This question is linked to a relationship that is widely studied in macroe-
conomics, namely, the Phillips curve, according to which there is a negative
relationship between the unemployment rate and the inflation rate. This rela-
tionship will be studied in Chap. 2 in order to determine whether inflation has a
beneficial effect on unemployment.
– Does their parents’ socio-occupational category have an impact on children’s
level of education?
Such a question can again be addressed by estimating a relationship between
children’s level of education and their parents’ socio-occupational category
(SOC). If the coefficient assigned to SOC differs with the SOC, this indicates
an impact of SOC considered on children’s level of education.
– Does air pollution have an impact on children’s health?
Answering this question first requires some way of measuring air pollution and
children’s health. Once these two measures have been established, the analysis
is carried out in a standard way, by estimating a relationship linking children’s
health to air pollution.
– What are the effects of global warming on economic growth?
As before, once the means of measuring global warming (e.g., greenhouse gas
emissions) has been found, a relationship between economic growth and this
variable must be estimated.
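
As an illustration of one of the answers above, the market efficiency question, here is a minimal Python sketch of the regression Rt = α + βRt−1 referred to in the corresponding item of the list. The returns are simulated purely for the purpose of the example; with real data, the observed return series would be used instead.

```python
import numpy as np
import statsmodels.api as sm

# Sketch of the weak-form efficiency check: R_t = alpha + beta * R_{t-1}.
# The returns below are simulated (i.i.d.), so beta should not be significant.
rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.01, 500)

y = returns[1:]                          # R_t
X = sm.add_constant(returns[:-1])        # constant + R_{t-1}
fit = sm.OLS(y, X).fit()

beta_hat, p_value = fit.params[1], fit.pvalues[1]
print(f"beta estimate: {beta_hat:.4f}, p-value for H0: beta = 0: {p_value:.3f}")
# A large p-value means past returns do not help explain current returns,
# which is consistent with the weak form of the efficient market hypothesis.
```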

Having presented these examples and introductory points, let us formalize the
various concepts, such as the notions of model and variable, in more detail.

1.2 Model and Variable

An essential part of econometrics is the construction and estimation of models.
A model relates various variables, which are often economic quantities. It is a
formalized representation of a phenomenon or theory in the form of equations. We
speak of modeling, its aims being to understand, explain, and possibly predict the
phenomenon under study.
First of all, it is necessary to define the concept of model, as well as the types of
variables that can be involved in it.

1.2.1 The Concept of Model

A model is a simplified representation of reality which consists in representing a
phenomenon in the form of one or more equations. It makes it possible to specify
relationships between variables and to explain the way in which certain variables are
determined by others. Consider, for example, the Keynesian consumption function.
In accordance with Keynes’ (1936) “fundamental psychological law,” “men are
disposed, as a rule and on average, to increase their consumption as their income
increases, but not as much as the increase in their income.” According to this law,
consumption is an increasing function of income. By noting C consumption and Y
income, we have:

C = f(Y)    (1.2)

where f is such that f′ > 0. However, three types of functions, or models, are
compatible with the fundamental psychological law:

– A linear proportional model: C = cY, with 0 < c < 1. The parameter c
designates the average propensity to consume (c = C/Y), but also the marginal
propensity to consume, since dC/dY = c. In line with this formulation, the variation
of the marginal propensity to consume as a function of income is zero: d²C/dY² = 0;
– A linear affine model: C = cY + C0, with 0 < c < 1 and C0 > 0. The average
propensity to consume is now given by c + C0/Y, while the marginal propensity
remains equal to c. Furthermore, as before, we have: d²C/dY² = 0;
– A concave function: C = f(Y) with f″ < 0. Under these conditions, the
marginal propensity to consume (f′) is lower than the average propensity (C/Y).
Because the function is concave, we have d²C/dY² < 0, reflecting the fact that
the variation in the marginal propensity to consume as a function of income is
negative.

As an approximation,3 the affine linear model is frequently used as a representation
of the Keynesian consumption function. The model:

C = cY + C0    (1.3)

thus represents the consumption behavior of agents from a Keynesian perspective. c
and C0 are parameters (or coefficients) that must be estimated. In the next chapter,
we will see that the ordinary least squares (OLS) method is used to estimate these
coefficients; its purpose is to attribute values to the coefficients, i.e., to quantify the
relationship between consumption and income. As an example, suppose that the
application of this method yields the following estimates: 0.86 for the estimated
value of c and 200,000 for the estimated value of C0. We then have:

Ĉ = 0.86Y + 200,000    (1.4)

where Ĉ designates the estimated consumption.4 By virtue of Eq. (1.4), it appears
that the estimated value of c is positive: the relationship between C and Y is indeed
increasing. Furthermore, the value 0.86 of the marginal propensity to consume
allows us to write that, all other things being equal, an increase of one monetary
unit in income Y is accompanied by an average increase of 0.86 monetary units in
consumption C.

Remark 1.1 The model (1.3) has only one equation describing the relationship
between consumption and income. This is a behavioral equation in the sense that
behavior, i.e., household consumption decisions, depends on changes in income.
The models may also contain technological relationships: these arise, for example,
from constraints imposed by existing technology, or from constraints due to
limited budgetary resources. In addition to these two types of relationships—
behavioral and technological relationships—models frequently include identities,
i.e., accounting relationships between variables. For example, the
relationship .Y = C + I + G, where Y denotes output, C consumption expenditure,
I investment expenditure, and G government spending, frequently used in economic
models, is an identity. No parameter needs to be estimated.

3 Strictly speaking, a reading of the General Theory suggests that the concave function seems
closest to Keynes’ words; the affine form, however, is the most frequently chosen for practical
reasons.
4 The circumflex (or hat) notation is a simple convention indicating that this is an estimate (and not

an observed value). This convention will be adopted throughout the book.



1.2.2 Different Types of Data

Having specified the model and in order to estimate it, it is necessary to have
data representative of the economic phenomena being analyzed. In the case of the
Keynesian consumption function, we need the consumption and income data for the
households studied. The main types of data are:

– Time series are variables observed at regular time intervals. For example, the
quarterly series of consumption of French households over the period 1970–2022
constitutes a time series in the sense that an observation of French household
consumption is available for each quarter between 1970 and 2022. The regularity
of observations is called the frequency. In our example, the frequency of the
series is quarterly. A time series can also be observed at annual, monthly, weekly,
daily, intra-daily, etc. frequency.
– Cross-sectional data are variables observed at the same moment in time and
which concern a specific group of individuals (in the statistical sense of the
term).5 An example would be a data set composed of the consumption of
French households in 2022, the consumption of German households in 2022,
the consumption of Spanish households in 2022, etc.
– Panel data are variables that concern a specific group of individuals and are
measured at regular time intervals. An example would be a data set composed
of the consumption of French households over the period 1970–2022, the con-
sumption of German households over the period 1970–2022, the consumption
of Spanish households over the period 1970–2022, etc. Panel data thus have a
double dimension: individual and temporal.

1.2.3 Explained Variable/Explanatory Variable

In the model representing the Keynesian consumption function, two variables are
involved: consumption and income. In accordance with relationship (1.3), income
appears to be the determinant of consumption. In other words, income explains
consumption. We then say that income is an explanatory variable and consumption
is an explained variable.
More generally, the variable we are trying to explain is called the explained
variable or endogenous variable or dependent variable. The explanatory variable
or exogenous variable or independent variable is the variable that explains the
endogenous variable. The values of the explained variable thus depend on the values
of the explanatory variable.
If the model consists of a single equation, there is only one dependent variable.
On the other hand, there may be several explanatory variables. For example,
household consumption can be explained not only by income, but also by the

5 Remember that an individual, or a statistical unit, is an element of the population studied.



unemployment rate. We can write the following model:

C = cY + aU + C0
. (1.5)

where U is the unemployment rate and a is a parameter. In this model, the


dependent variable is consumption C, the explanatory variables are income Y and
the unemployment rate U .

Remark 1.2 In the model .C = cY + C0 , time is not explicitly involved. Suppose


that the consumption and income data are time series. If we assume that income at
date t explains consumption at the same date, then we have:

Ct = cYt + C0
. (1.6)

where t denotes time. Such a model relates variables located at the same moment in
time. However, it is possible to introduce dynamics into the models. Let us consider,
for example, the following model:

Ct = cYt + αCt−1 + C0
. (1.7)

Past consumption (i.e., consumption at date .t −1) acts as an explanatory variable for
current consumption (i.e., consumption at date t). The explanatory variable .Ct−1 is
also called the lagged endogenous variable. The coefficient .α represents the degree
of inertia of consumption. Assuming that .α < 1, the closer .α is to 1, the greater the
degree of consumption inertia. In other words, a value of .α close to 1 means that
past consumption has a strong influence on current consumption. We also speak of
persistence.

1.2.4 Error Term

In the model (1.3), it has been assumed that consumption is explained solely by
income. If such a relationship is true, it is straightforward to obtain the values of the
parameters c and .C0 : it suffices to have two observations and join them by a straight
line, the other observations lying on this same line. However, such a relationship
is not representative of economic reality. The fact that income alone is used as an
explanatory variable in the model may indeed seem very restrictive, as it is highly
likely that other variables contribute to explaining consumption. We therefore add a
term .ε which represents all other explanatory variables not included in the model.
The model is written:

C = cY + C0 + ε
. (1.8)

The term .ε is a random variable called the error or disturbance. It is the error
in the specification of the model, in that it collects all the variables, other than
income, that have been ignored in explaining consumption. The error term thus
provides a measure of the difference between the observed values of consumption
and those that would be observed if the model were correctly specified. The error
term includes not only the model specification error, but it can also represent a
measurement error due to problems in measuring the variables under consideration.

1.3 Statistics Reminders

The purpose of this section is to recall the definition of some basic statistical
concepts that will be used in the remainder of the book: mean, variance, standard
deviation, covariance, and linear correlation coefficient.

1.3.1 Mean

The (arithmetic) mean of a variable is equal to the sum of the values taken by
this variable, divided by the number of observations. Consider a variable X with
T observations: .X1 , X2 , . . . , XT . The (empirical) mean of this series, noted .X̄, is
given by:

X̄ = (1/T)(X1 + X2 + ... + XT) = (1/T) Σ_{t=1}^{T} Xt    (1.9)

Example 1.1 The six employees of a small company received the following wages X (in euros): 1,200, 1,200, 1,300, 1,500, 1,500, and 2,500. The mean wage X̄ is therefore: X̄ = (1/6)(1,200 + 1,200 + 1,300 + 1,500 + 1,500 + 2,500) = 1,533.33 euros. The mean could also have been calculated by weighting the wages by the number of employees, i.e.: X̄ = (1/6)(1,200×2 + 1,300×1 + 1,500×2 + 2,500×1) = 1,533.33 euros. This is a weighted arithmetic mean.
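As a quick illustration (not part of the original text), the following Python sketch reproduces the calculation of Example 1.1 with NumPy; the wage values are those given above.

```python
import numpy as np

# Wages of the six employees (in euros), as in Example 1.1
wages = np.array([1200, 1200, 1300, 1500, 1500, 2500], dtype=float)

# Arithmetic mean: sum of the observed values divided by the number of observations
mean_wage = wages.sum() / wages.size
print(round(mean_wage, 2))  # 1533.33

# Weighted arithmetic mean: distinct wages weighted by the number of employees
values = np.array([1200, 1300, 1500, 2500], dtype=float)
counts = np.array([2, 1, 2, 1])
weighted_mean = (values * counts).sum() / counts.sum()
print(round(weighted_mean, 2))  # 1533.33, the same result
```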

1.3.2 Variance, Standard Deviation, and Covariance

The variance .V (X) of a variable X is equal to the average of the squares of the
deviations from the mean:

V(X) = (1/T)[(X1 − X̄)² + (X2 − X̄)² + ... + (XT − X̄)²] = (1/T) Σ_{t=1}^{T} (Xt − X̄)²    (1.10)

The standard deviation, noted .σX , is the square root of the variance, i.e.:


σX = √[(1/T) Σ_{t=1}^{T} (Xt − X̄)²]    (1.11)

In practice, we often use the following formula, obtained by expanding (1.10):

V(X) = (1/T) Σ_{t=1}^{T} Xt² − X̄²    (1.12)

The use of this formula simplifies the calculations in that it is no longer necessary
to calculate deviations from the mean.
The relationships (1.10), (1.11), and (1.12) are valid when studying a popula-
tion.6 In practice, the study of a population is rare, and we are often limited to
studying a sub-part of the population, i.e., a sample. In this case, a slightly different
measure of variance is used, called the empirical variance, which is given by:7

s²X = (1/(T−1)) Σ_{t=1}^{T} (Xt − X̄)²    (1.13)

or:

s²X = (1/(T−1)) Σ_{t=1}^{T} Xt² − (T/(T−1)) X̄²    (1.14)

The empirical standard deviation is then:




sX = √[(1/(T−1)) Σ_{t=1}^{T} (Xt − X̄)²]    (1.15)

Consider two variables X and Y each comprising T observations. The covari-


ance between these two variables, noted .Cov(X, Y ), is given by:

Cov(X, Y) = (1/T) Σ_{t=1}^{T} (Xt − X̄)(Yt − Ȳ) = (1/T) Σ_{t=1}^{T} Xt Yt − X̄Ȳ    (1.16)

6A population is a set of elements, called statistical units or individuals, that we wish to study.
7 The division by .(T − 1) instead of T comes from the loss of one degree of freedom since the
empirical mean (and not the true population mean) is used in calculating the variance.
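To make the distinction between the population formulas (division by T) and the empirical formulas (division by T − 1) concrete, here is a minimal Python sketch; the two series are made-up values used only for illustration.

```python
import numpy as np

# Made-up series, used only to illustrate the formulas above
X = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])
Y = np.array([1.0, 3.0, 2.0, 5.0, 6.0, 8.0])
T = X.size
X_bar, Y_bar = X.mean(), Y.mean()

var_pop = ((X - X_bar) ** 2).sum() / T        # V(X), Eq. (1.10): division by T
var_emp = ((X - X_bar) ** 2).sum() / (T - 1)  # s²_X, Eq. (1.13): division by T - 1
sigma_X = np.sqrt(var_pop)                    # standard deviation, Eq. (1.11)
s_X = np.sqrt(var_emp)                        # empirical standard deviation, Eq. (1.15)
cov_XY = ((X - X_bar) * (Y - Y_bar)).sum() / T  # Cov(X, Y), Eq. (1.16)

# NumPy shortcuts: np.var divides by T by default, and by T - 1 with ddof=1
assert np.isclose(var_pop, np.var(X))
assert np.isclose(var_emp, np.var(X, ddof=1))
print(var_pop, var_emp, sigma_X, s_X, cov_XY)
```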

1.3.3 Linear Correlation Coefficient

The correlation coefficient is an indicator of the link between two variables.8 Thus,
when two variables move together, i.e., vary in the same direction, they are said to
be correlated.
Consider two variables X and Y . The linear correlation coefficient between these
two variables, noted .rXY , is given by:

rXY = Cov(X, Y)/(σX σY)    (1.17)
or:
rXY = Σ_{t=1}^{T} (Xt − X̄)(Yt − Ȳ) / √[Σ_{t=1}^{T} (Xt − X̄)² Σ_{t=1}^{T} (Yt − Ȳ)²]    (1.18)

or alternatively:

rXY = [T Σ_{t=1}^{T} Xt Yt − Σ_{t=1}^{T} Xt Σ_{t=1}^{T} Yt] / √{[T Σ_{t=1}^{T} Xt² − (Σ_{t=1}^{T} Xt)²][T Σ_{t=1}^{T} Yt² − (Σ_{t=1}^{T} Yt)²]}    (1.19)

The linear correlation coefficient is such that:

. − 1 ≤ rXY ≤ 1 (1.20)

Thus, the linear correlation coefficient can be positive, negative, or zero. If it


is positive, it means that the variables X and Y move in the same direction: both
variables increase (or decrease) simultaneously. If it is negative, the two variables
move in opposite directions: if one variable increases (respectively decreases), the
other variable decreases (respectively increases). Finally, if it is zero, the covariance
between X and Y equals zero, and the variables are not correlated: there is no linear
relationship between X and Y . More precisely, if the linear correlation coefficient
is close to 1, the variables are strongly positively correlated, and if it is close to .−1,
the variables are strongly negatively correlated.
Figures 1.3, 1.4, and 1.5 schematically illustrate the cases of positive, negative,
and zero linear correlation between two variables X and .Y.

8 Ifmore than two variables are studied, the concept of multiple correlation must be used (see
below).
Fig. 1.3 Positive linear correlation
Fig. 1.4 Negative linear correlation
Fig. 1.5 No linear correlation

Remark 1.3 So far, we have considered a linear correlation between two variables
X and Y : the values of the pair .(X, Y ) appear to lie on a straight line (see Figs. 1.3
and 1.4). When these values are no longer on a straight line, but on a curve of
any shape, we speak of nonlinear correlation. Positive and negative nonlinear
correlations are illustrated in Figs. 1.6 and 1.7.
Fig. 1.6 Positive nonlinear correlation
Fig. 1.7 Negative nonlinear correlation

1.3.4 Empirical Application

Consider the following two annual series (see Table 1.1): the household consump-
tion series (noted C) and the household gross disposable income series (noted Y )
for France over the period 1990–2019. These two series are expressed in real terms,
i.e., they have been deflated by the French consumer price index. The number of
observations is 30.
From the values in Table 1.1, it is possible to calculate the following quantities,
which are necessary to determine the statistics presented above:

– Σ_{t=1}^{30} Ct = 30,455,596.93
– Σ_{t=1}^{30} Yt = 34,519,740.64
– Σ_{t=1}^{30} Ct² = 3.14 × 10^13

Table 1.1 Consumption and gross disposable income of households in France (in e million).
Annual data, 1990–2019
C Y C Y
1990 870,338.41 830,572.81 2005 1,187,709.02 1,049,755.63
1991 868,923.49 832,883.96 2006 1,228,476.20 1,081,354.36
1992 913,134.25 844,679.04 2007 1,260,465.07 1,103,446.70
1993 943,367.50 840,328.00 2008 1,286,487.36 1,128,089.30
1994 950,773.58 847,173.42 2009 1,278,688.70 1,102,747.33
1995 971,315.23 852,111.23 2010 1,292,234.68 1,115,634.36
1996 987,452.68 866,004.67 2011 1,284,970.35 1,114,405.93
1997 982,724.94 867,822.35 2012 1,281,460.14 1,109,228.05
1998 1,018,159.61 902,064.20 2013 1,267,030.06 1,116,070.14
1999 1,039,090.56 916,606.04 2014 1,282,764.87 1,125,426.60
2000 1,083,578.97 957,094.57 2015 1,295,592.76 1,142,198.12
2001 1,126,231.00 985,857.04 2016 1,311,829.11 1,156,441.13
2002 1,146,598.73 991,312.79 2017 1,328,847.32 1,171,560.14
2003 1,148,028.86 1,002,269.15 2018 1,345,881.90 1,183,006.67
2004 1,171,763.22 1,021,751.72 2019 1,365,822.06 1,197,701.47
Data sources: Insee for the consumption and consumer price index series, European Commission
for the gross disposable income series

– Σ_{t=1}^{30} Yt² = 4.04 × 10^13
– Σ_{t=1}^{30} Ct Yt = 3.56 × 10^13

From these preliminary calculations, we deduce:

– The mean of consumption and income series:

C̄ = (1/30) × 30,455,596.93 = 1,015,186.56    (1.21)

Ȳ = (1/30) × 34,519,740.64 = 1,150,658.02    (1.22)
– The standard deviation of the consumption and income series:

sC = √[(1/29) × 3.14 × 10^13 − (30/29) × (1,015,186.56)²] = 125,970.16    (1.23)

and:

σC = 123,852.87    (1.24)

sY = √[(1/29) × 4.04 × 10^13 − (30/29) × (1,150,658.02)²] = 157,952.45    (1.25)

and:

σY = 155,297.60    (1.26)

– The covariance between consumption and income series:

Cov(C, Y) = (1/30) × 3.56 × 10^13 − 1,015,186.56 × 1,150,658.02 = 19,084,753,775.26    (1.27)

By calculating the covariance, we can determine the linear correlation coefficient


between the consumption and income series:

rCY = Cov(C, Y)/(σC σY) = 19,084,753,775.26/(123,852.87 × 155,297.60) = 0.9922    (1.28)

We can see that the linear correlation coefficient is positive and very close to 1.
This indicates a strong positive correlation between consumption and income: the
two series move in the same direction.
This result can be illustrated graphically. Figure 1.8 clearly shows that the series
move together; they share a common trend. Figure 1.9 shows the values of the pair
.(C, Y ). These values are well represented by a straight line, illustrating the fact that

the linear correlation coefficient is very close to 1.
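The calculations of this empirical application can easily be reproduced numerically. The short Python sketch below (not part of the original text) plugs in the aggregate quantities reported above; with the full data of Table 1.1, the same quantities would of course be computed directly from the series.

```python
import numpy as np

T = 30

# Aggregate sums reported in the text
sum_C = 30_455_596.93
sum_Y = 34_519_740.64

# Means, Eqs. (1.21)-(1.22)
C_bar = sum_C / T
Y_bar = sum_Y / T

# Standard deviations and covariance as reported in Eqs. (1.24), (1.26), and (1.27)
sigma_C = 123_852.87
sigma_Y = 155_297.60
cov_CY = 19_084_753_775.26

# Linear correlation coefficient, Eq. (1.28)
r_CY = cov_CY / (sigma_C * sigma_Y)
print(round(C_bar, 2), round(Y_bar, 2), round(r_CY, 4))  # 1015186.56 1150658.02 0.9922
```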

1.4 A Brief Introduction to the Concept of Stationarity

When working on time series, one must be careful to ensure that they are stationary
over time. The methods described in this book, particularly the ordinary least
squares method, are valid only if the time series are stationary. Only a graphical
intuition of the concept of stationarity will be given here; for more details, readers
can refer to Chap. 7. We distinguish between stationarity in the mean and stationarity
in the variance.

Fig. 1.8 Consumption (C) and gross disposable income (Y ) series of French households (euros)


Fig. 1.9 Representation of the values of the pair (consumption, income)

1.4.1 Stationarity in the Mean

A time series is stationary in the mean if its mean remains stable over time. As an
illustration, we have reproduced in a very schematic way a nonstationary series in
Fig. 1.10. We can see that the mean, represented by the dotted line, increases over
time.
In Fig. 1.11, the mean of the series is now represented by a straight line parallel
to the x-axis: the mean is stable over time, suggesting that the series is stationary in
Fig. 1.10 Nonstationary series in the mean
Fig. 1.11 Stationary series in the mean

the mean. Of course, this intuition must be verified statistically by applying specific
tests, called unit root tests (see Chap. 7).
In order to apply the usual econometric methods, the series studied must be
mean stationary. Otherwise, it is necessary to stationarize the series, i.e., to make
it stationary. The technique commonly used in practice consists in differentiating
the nonstationary series .Xt , i.e., in applying the first difference operator .Δ:

ΔXt = Xt − Xt−1
.
Fig. 1.12 Nonstationary series in the variance

Thus, very often, to make a series stationary in the mean, it is sufficient to


differentiate it. Here again, the stationarity of the differentiated series must be
verified by applying unit root tests.

1.4.2 Stationarity in the Variance

A stationary time series in the variance is such that its variance is constant over time.
It is also possible to graphically apprehend the concept of stationarity in the
variance. The series shown in Fig. 1.12 is nonstationary in the variance: graphically,
we can see a “funnel-like phenomenon,” indicating that the variance of the series
tends to increase over time. In order to reduce the variability of a series, the
logarithmic transformation is frequently used.9 The logarithm allows the series to
be framed between two lines, i.e., to eliminate the funneling phenomenon, as shown
schematically in Fig. 1.13.

Remark 1.4 In practice, when we want to make a series stationary in both the mean
and the variance, we must first make it stationary in the variance and, then, in the
mean. The result is a series in logarithmic difference. This logarithmic difference

9 The logarithmic transformation is a special case of the Box-Cox transformation used to reduce
the variability of a time series (see Box and Cox 1964, and Chap. 2 below).
Fig. 1.13 Stationary series in the variance

also has an economic interpretation:

Yt = Δ log Xt = log Xt − log Xt−1 = log(Xt/Xt−1) = log(1 + (Xt − Xt−1)/Xt−1) ≅ (Xt − Xt−1)/Xt−1    (1.29)

because log(1 + x) ≅ x for x small compared to 1; log denotes the Napierian (natural) logarithm. The logarithmic difference can be interpreted as a growth rate. If Xt is a stock price, Yt can be interpreted as stock returns.
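A short Python sketch may help fix ideas; the price series below is purely illustrative (hypothetical values), and np.log is the natural logarithm.

```python
import numpy as np

# Hypothetical price series, used only to illustrate the transformations
X = np.array([100.0, 102.0, 101.0, 105.0, 110.0])

# First difference, ΔX_t = X_t - X_{t-1}: the usual way to remove a trend in the mean
dX = np.diff(X)

# Logarithmic difference, Δlog X_t = log X_t - log X_{t-1}, as in Eq. (1.29)
dlogX = np.diff(np.log(X))

# Exact growth rate (X_t - X_{t-1}) / X_{t-1}, which the log difference approximates
growth = dX / X[:-1]

print(dlogX)
print(growth)  # close to dlogX when period-to-period changes are small
```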

1.4.3 Empirical Application: A Study of the Nikkei Index

To illustrate the concept of stationarity, let us consider the Japanese stock market
index series: the Nikkei 225 index. This series, extracted from the Macrobond
database, has a quarterly frequency and covers the period from the third quarter
of 1949 to the second quarter of 2021 (1949.3–2021.2). The Nikkei index series
is reproduced in Fig. 1.14, whereas Fig. 1.15 represents the dynamics of this same
series in logarithms. These graphs highlight an upward trend in the first half of the
sample, followed by a general downward trend, and then an increasing trend from
the early 2010s. The mean therefore changes over time, reflecting that the Japanese
stock market index series seems nonstationary in the mean.

Fig. 1.14 Nikkei 225 index, 1949.3–2021.2


Fig. 1.15 Nikkei 225 index in logarithms, 1949.3–2021.2

Faced with the apparent non-stationarity (in the mean) of the Nikkei index series,
we differentiate it by applying the first difference operator. We then obtain the series
of returns .Rt of the Nikkei index:

Rt = Δ log Xt = log Xt − log Xt−1 = log(Xt/Xt−1) ≅ (Xt − Xt−1)/Xt−1    (1.30)

Fig. 1.16 Nikkei 225 returns, 1949.4–2021.2

where .Xt denotes the Nikkei 225 stock index. The series of returns is displayed
in Fig. 1.16. As shown, the upward trend in the mean has been suppressed by
the differentiation operation, suggesting that the returns series is a priori mean
stationary.

1.5 Databases and Software

As we have already mentioned, there are many databases in the field of economics
and finance, which have expanded considerably in recent decades. The aim here
is not to give an exhaustive list, but to provide some reference points concerning
a number of frequently used databases. Similarly, we will mention some of the
econometric software that practitioners often use.

1.5.1 Databases

We provide below some indications concerning various databases frequently used


in economics and finance, remembering that this list—arranged alphabetically—is
by no means exhaustive:

– Bank for International Settlements (open access): financial and monetary data
– Banque de France (free access): economic, monetary, banking, and financial data
for France and the eurozone

– British Petroleum (open access): energy data (oil, gas, electricity, biofuels, coal,
nuclear, etc.)
– CEPII (open access): databases in international macroeconomics and interna-
tional trade
– Datastream/Eikon: economic and financial database with many series for all
countries
– DB.nomics (open access): many economic data sets provided by national and
international institutions for most countries
– ECONDATA (free access): server on databases available online
– Economagic (free access): numerous macroeconomic and financial series, on the
United States, the eurozone, and Japan
– Euronext (free access): data and statistics on stock markets
– European Central Bank (ECB Statistical Data Warehouse, open access): eco-
nomic and financial data for Europe
– Eurostat (free access): socio-economic indicators for European countries, aggre-
gated by theme, country, region, or sector
– Eurozone Statistics (ESCB, free access): eurozone and national central bank
statistics
– FAO (Food and Agriculture Organization of the United Nations, FAOSTAT, open
access): food and agricultural data for most countries
– Insee (free access): statistics and data series for the French economy, quarterly
national accounts
– International Monetary Fund (IMF, partly open access): numerous databases,
including International Financial Statistics (IFS) and World Economic Outlook
(WEO) covering most countries
– Macrobond: economic and financial database with a wide range of series for all
countries
– National Bureau of Economic Research (NBER, open access): various macroe-
conomic, sectoral, and international series
– OECD (open access): statistics and data at national and sectoral levels for OECD
countries, China, India, Indonesia, Russia, and South Africa
– Penn World Table (free access): annual national accounts series for many
countries
– UN (open access): macroeconomic and demographic series and statistics
– UNCTAD (open access): data on international trade, foreign direct investments,
commodity prices, population, macroeconomic indicators, etc.
– WebEc World Wide Web Resources in Economics (free access): server on
economics and econometrics resources
– World Bank, World Development Indicators (WDI, free access): annual macroeco-
nomic and financial series for most countries, numerous economic development
indicators
– World Inequality Database (WID, open access): database on global inequalities

Many other databases are available for macroeconomic, socio-economic, microe-


conomic, and financial data, and it is, of course, impossible to list them all here.

1.5.2 Econometric Software

Most of the applications presented in this book have been processed with Eviews
software, this choice being here guided by pedagogical considerations. Of course,
there are many other econometric and statistical software packages, some of which
are freely available. We mention a few of them below, in alphabetical order, empha-
sizing once again that these lists—one of which concerns commercial software, the
other open-source software—are by no means intended to be exhaustive.
Let us start by mentioning some software packages that require a paid license:

– EViews: econometric software, more particularly adapted for time series analysis
– GAUSS: programming language widely used in statistics and econometrics
– LIMDEP and NLOGIT: econometric software adapted for panel data, discrete
choice, and multinomial choice models
– Matlab: programming language for data analysis, modeling, and algorithmic
programming
– RATS: econometric software, more particularly adapted for time series analysis
– S: statistical programming language; an open-access version of which is R (see
below)
– SAS: statistical and econometric software, allowing the processing of very large
databases
– SPAD: software for data analysis, statistics, data mining, and textual data analysis
– SPSS: statistical software for advanced analysis
– Stata: general statistical and econometric software, widely used, especially in
panel data econometrics

Open-source software includes:

– Gretl (Gnu Regression, Econometrics and Time-Series Library): general econo-


metric software.
– Grocer: library of econometric programs, developed from Scilab and Matlab
software and languages.
– JMulTi: econometric software, specialized in the analysis of univariate and
multivariate time series, including in the nonlinear domain.
– Ox: programming language used in econometrics and matrix calculation.
– Python: a general-purpose programming language, widely used in econometrics
and in the field of big data thanks to its complementary modules like NumPy,
Pandas, StatsModels, etc. Also worth mentioning is the Jupyter application,
mainly based on the Python language, which is part of the reproducible research
field.
– R: this is the open-access version of the S language. R is widely used and
has become a reference language in statistics and econometrics, with the
development of many packages in all fields of econometrics.

– RunMyCode: a user-friendly platform allowing authors to make their data and


codes (programs) freely available to everyone to promote reproducible research.

Conclusion

This introductory chapter has recalled some basic concepts in statistics and econo-
metrics. In particular, it has highlighted the importance of the correlation coefficient
in determining whether two variables move together. The next chapter extends this
with a detailed presentation of the basic econometric model: the simple regression
model. This model links the behavior of two variables, in the sense that one of them
explains the other. The notion of correlation is thus deepened, as we study not only
whether two variables move together, but also whether one of them has explanatory
power over the other.

The Gist of the Chapter

Let X and Y be two variables with T observations.

Mean: X̄ = (1/T) Σ_{t=1}^{T} Xt
Variance: V(X) = (1/T) Σ_{t=1}^{T} (Xt − X̄)²
Standard deviation: σX = √V(X)
Empirical variance: s²X = (1/(T−1)) Σ_{t=1}^{T} (Xt − X̄)²
Empirical standard deviation: sX = √(s²X)
Covariance: Cov(X, Y) = (1/T) Σ_{t=1}^{T} (Xt − X̄)(Yt − Ȳ)
Correlation coefficient: rXY = Cov(X, Y)/(σX σY), with −1 ≤ rXY ≤ 1

Further Reading

For further information on econometric methodology, see Hendry (1995) or Spanos


(1999). For a simple presentation of the different types of data, refer to Intriligator
(1978), and for a critical review of the content and accuracy of economic data, see
Morgenstern (1963).
As for statistics, there are many books available. Among the works in English,
readers can refer to Newbold (1984) for a simple and applied presentation of
statistics, or to Hoel (1974) for an introduction. Mood et al. (1974) also provide
a fairly comprehensive introduction to statistical methods.
2 The Simple Regression Model

Regression analysis consists in studying the dependence of a variable (the explained


variable) on one or more other variables (the explanatory variables).
Let us look at some examples. When a company, or a brand owner, advertises
one of its products, does it increase its sales? In other words, is there a relationship
between product sales and advertising expenditure? Does a family’s consumer
spending depend on its size? To what extent does an increase in household income
affect consumption? Is there a link between mortality or morbidity rates and
the number of cigarettes consumed? Are children’s school results dependent on
parental income? All these questions, and others of the kind, can be answered using
regression analysis.
When only one explanatory variable is considered, we speak of simple regres-
sion. When there are several explanatory variables, we speak of multiple regres-
sion. The simple regression model is thus a linear model comprising a single
equation linking an explained variable to an explanatory variable. It is therefore
a bivariate model.
The simple regression model is a random model in the sense that an error term is
included in the equation linking the dependent variable to the explanatory variable.
It should be recalled that this error term allows us to take into account discrepancies
between the explanation given by the model and reality.

2.1 General
2.1.1 The Linearity Assumption

Consider two variables X and Y . We distinguish between linearity in the variables


and linearity in the parameters.


Linearity in the Variables


Let f be a function such that:

Y = f (X)
. (2.1)

where Y is the dependent variable and X the explanatory variable. The function
f is said to be linear in X if the power of X is equal to unity and if X is not
multiplied or divided by another variable. In other words, Y is linearly related to X
if the derivative of Y with respect to X—i.e., the slope of the regression line—is
independent of X.
As an example, the model:

Y = 3X
. (2.2)

is linear since dY/dX = 3: the derivative of Y with respect to X is independent of X.
More generally, the model:

Y = α + βX
. (2.3)

is a linear model with respect to X and Y .


Now consider the following model:

. log Y = α + β log X (2.4)

This model is not linear with respect to X and Y , but it is linear with respect to
log X and .log Y . Similarly the model:
.

. log Y = α + βX (2.5)

is linear with respect to X and .log Y . The model:


 
Y = exp(α + β(1/X))    (2.6)

can also be written using the logarithmic transformation:

log Y = α + β(1/X)    (2.7)
which is a linear model in .1/X and .log Y .

Remark 2.1 Some models can be linearized. The model:

Y = βX2
. (2.8)

is not linear in X because X is assigned a power of 2. This model can, however, be


linearized by applying the logarithmic transformation:
 
log Y = log(βX²) = log β + 2 log X    (2.9)

The model (2.9) thus becomes a linear model in .log X and .log Y .

Linearity in the Parameters


A function is said to be linear in the parameters if they are assigned a power equal
to unity and are not multiplied or divided by one or more other parameters. Thus,
the model:

Y = α + βX
. (2.10)

is linear in the parameters .α and .β. Similarly, the model:

Y = α + βX2
. (2.11)

is also linear in the parameters. In contrast, the models:

Y = α + β²X    (2.12)

and:

Y = α + (β/α)X    (2.13)
are not linear in the parameters.

Linear Model
We wrote in the introduction to this chapter that the simple regression model is a
linear model. The linearity discussed here is the linearity in the parameters. The
methods described in this chapter therefore apply to models that are linear in the
parameters. Of course, the model under study can also be linear in the variables, but
this is not necessary in the sense that it is sufficient that the model can be linearized.
In other words, the model can be linear in X or in any transformation of X.

2.1.2 Specification of the Simple Regression Model and Properties


of the Error Term

The simple regression model studied in this chapter is written as:

Y = α + βX + ε
. (2.14)

where Y is the dependent variable, X is the explanatory variable, and .ε is the error
term (or disturbance). The parameters (or coefficients) of the model are .α and .β.
It is assumed that the variable X is observed without error, i.e., that X is a certain
variable. Therefore, the variable X is independent of the error term .ε. The variable
Y is a random variable, its random nature coming from the presence of the error
term in the model.
Suppose that the variables X and Y each include T observations: we note .Xt , t =
1, . . . , T , and .Yt , t = 1, . . . , T . The simple regression model is then written:

Yt = α + βXt + εt
. (2.15)

t may designate:

– Time: in which case, we speak of a time series model


– An individual: in which case, we speak of a cross-sectional model with the
number of observations T representing the number of individuals

The error term cannot be predicted for every observation, but a number of
assumptions can be made, which are described below.

The Nullity of the Mean Error


First, the error term can take on negative and positive values. There is no reason
for positive (respectively negative) values to be higher or lower than negative
(respectively positive) values. In other words, there is no bias in favor of positive
values, nor in favor of negative values. We deduce that the mathematical expectation
E of the error is zero, i.e.:

E (εt ) = 0 ∀t
. (2.16)

This assumption means that, on average, the model is correctly specified and
therefore that, on average, the error is zero.

The Absence of Autocorrelation in Errors


Second, it is assumed that the error term is not autocorrelated: the value in t does
not depend on the value in .t ' for .t /= t ' . In other words, if we consider a time
series model, this means that the error made at one date t is not correlated with
the error made at another date. For example, if the error made at t is positive, the
probability of observing a positive error at .t + 1 is neither increased nor decreased.
This hypothesis of uncorrelated errors is written as follows:

E (εt εt ' ) = 0 ∀t /= t '


. (2.17)

The Homoskedasticity of Errors


Third, it is assumed that the variance of the error term is constant regardless of
the sample. If we consider a time series model, this means that the variance of the

error term is constant over time. In the case of a cross-sectional model, this refers to
the fact that the variance does not differ between individuals. The constant variance
assumption is the homoskedasticity hypothesis. A series whose variance is constant
is said to be homoskedastic.1 Mathematically, this hypothesis is written as follows:
 
E εt2 = σε2 ∀t
. (2.18)

where .σε2 represents the variance of the error term.

Remark 2.2 The assumptions of no autocorrelation and homoskedasticity of errors


can be gathered under the following expression:

E(εt εt') = 0 if t ≠ t', and E(εt εt') = σε² if t = t'    (2.19)

The errors that simultaneously satisfy the assumptions of homoskedasticity and


no autocorrelation are called spherical errors.
In addition, a series .εt verifying the relationships (2.16) and (2.19) is called white
noise. More generally, the following definition can be used.

Definition 2.1 A stationary process .εt is white noise if:

E (εt ) = 0 ∀t
. (2.20)

E(εt εt') = 0 if t ≠ t', and E(εt εt') = σε² if t = t'    (2.21)

White noise is thus a zero mean, constant variance, and non-autocorrelated


process. We note:
 
εt ∼ WN(0, σε²)    (2.22)
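As an illustration of this definition, the following sketch simulates a Gaussian white noise and checks its defining properties empirically (the seed and the variance are arbitrary choices, not taken from the text).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
sigma_eps = 1.0                 # arbitrary standard deviation
T = 10_000

# Simulated Gaussian white noise: zero mean, constant variance, no autocorrelation
eps = rng.normal(loc=0.0, scale=sigma_eps, size=T)

print(eps.mean())   # close to 0
print(eps.var())    # close to sigma_eps ** 2
# First-order empirical autocorrelation, close to 0 for a white noise
rho_1 = np.corrcoef(eps[:-1], eps[1:])[0, 1]
print(rho_1)
```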

The Normality of Errors


Under the central limit theorem,2 it is assumed that the error term follows a
normal distribution with zero mean (or expectation) and constant variance (see

1 The hypothesis of homoskedasticity is opposed to that of heteroskedasticity. A series whose

variance evolves over time (for a time series model) or differs between individuals (for a cross-
sectional model) is called a heteroskedastic series.
2 Central limit theorem: let X1, X2, ..., Xn be n independent random variables with the same probability density function with mean m and variance σ². When n tends to infinity, the sample mean X̄ = (1/n) Σ_{i=1}^{n} Xi tends towards a normal distribution with mean m and variance σ²/n.

Appendix 2.2 for a detailed presentation of the normal distribution). We thus add
the assumption of normality of the distribution of the error term to the assumptions
of nullity of the expectation (Eq. (2.16)) and of homoskedasticity (Eq. (2.18)), which
can be written as follows:
 
εt ∼ N(0, σε²)    (2.23)

where N denotes the normal distribution, and the sign “.∼” means “follow the law.”

Remark 2.3 The assumption that the errors follow a normal distribution with zero
expectation and constant variance and that they are not autocorrelated can also be
formulated by writing that the errors are normally and independently distributed
(Nid), which is noted:
 
εt ∼ Nid 0, σε2
. (2.24)

If the errors follow the same distribution other than the normal distribution, we
speak of identically and independently distributed (iid) errors, which is noted:
 
εt ∼ iid 0, σε2
. (2.25)

Remark 2.4 The assumption of normality of the errors is not necessary to establish
the results of the regression model. However, it does allow us to derive statistical
results and construct test statistics (see below).

2.1.3 Summary: Specification of the Simple Regression Model

The complete specification of the simple regression model studied in this chapter is
written as:

Yt = α + βXt + εt
. (2.26)

with:

E (εt ) = 0 ∀t
. (2.27)

E(εt εt') = 0 if t ≠ t', and E(εt εt') = σε² if t = t'    (2.28)

and:
 
εt ∼ N 0, σε2
. (2.29)

We can also write the complete specification of the simple regression model by
combining the relations (2.27), (2.28), and (2.29):

Yt = α + βXt + εt
. (2.30)

with:
 
εt ∼ Nid 0, σε2
. (2.31)

2.2 The Ordinary Least Squares (OLS) Method


2.2.1 Objective and Reminder of Hypotheses

The parameters .α and .β of the simple regression model between X and Y are
unknown. If we wish to quantify this relationship between X and Y , we need to
estimate these parameters. This is our objective.
More precisely, from the observed values of the series .Xt and .Yt , the aim is to
find the quantified relationship between these two variables, i.e.:

Ŷt = α̂ + β̂Xt
. (2.32)

where .α̂ and .β̂ are the estimators of the parameters .α and .β. .Ŷt is the estimated (or
adjusted or fitted) value of .Yt . The most frequently used method for estimating the
parameters .α and .β is the ordinary least squares (OLS) method.
The implementation of the OLS method requires a certain number of assumptions
set out previously and recalled below:

– The variable .Xt is observed without error and is generated by a mechanism


unrelated to the error term .εt . In other words, the correlation between .Xt and
.εt is zero, i.e.: .Cov (Xt , εt ) = 0 .∀t.
3

– The expectation of the error term is zero: .E (εt ) = 0 .∀t.


– The
 errors are homoskedastic and not autocorrelated, i.e., .E (εt εt ' ) =
0 ∀t /= t '
.
σε2 ∀t = t '

3 Assuming that the variable .Xt is nonrandom simplifies the analysis in the sense that it allows us
to use mathematical statistical results by considering .Xt as a known variable for the probability
distribution of the variable .Yt . However, such an assumption is sometimes difficult to maintain in
practice, and the fundamental assumption is, in fact, the absence of correlation between the variable
.Xt and the error term.
Fig. 2.1 The OLS principle

2.2.2 The OLS Principle

Figure 2.1 plots the values of the pair .(Xt , Yt ) for .t = 1, . . . , T . We obtain a scatter
plot that we try to fit with a line. Any line drawn through this scatter plot may be
considered as an estimate of the linear relationship under consideration:

Yt = α + βXt + εt
. (2.33)

The equation of such a line, called the regression line or OLS line, is:

Ŷt = α̂ + β̂Xt
. (2.34)

where .α̂ and .β̂ are the estimators of the parameters .α and .β. The estimated value
Ŷt of .Yt is the ordinate of a point on the line whose abscissa is .Xt . As shown in
.

Fig. 2.1, some points of the pair .(Xt , Yt ) lie above the line (2.34), and others lie
below it. There are therefore deviations, noted .et , from this line:

et = Yt − Ŷt = Yt − α̂ − β̂Xt
. (2.35)

for .t = 1, . . . , T . These deviations are called residuals.


Intuitively, it seems logical to think that the better a line fits the scatter plot, the
smaller the deviations .et . The OLS method thus consists in finding the estimators .α̂
and .β̂ such that the sum of the squares of the differences between the values of .Yt
and those of .Ŷt is minimal. In other words, the method consists in minimizing the
squared distance between each observation and the line (2.34), which is equivalent

to minimizing the sum of squared residuals. The OLS principle can then be stated:


OLS ⟺ Min Σ_{t=1}^{T} et²    (2.36)

The objective is to find .α̂ and .β̂ such that the sum of squared residuals is minimal.

2.2.3 The OLS Estimators


Searching for Estimators
The OLS estimators .α̂ and .β̂ of the parameters .α and .β are given by:

. α̂ = Ȳ − β̂ X̄ (2.37)

and:
β̂ = Cov(Xt, Yt)/V(Xt)    (2.38)

Let us demonstrate these formulas. Using Eq. (2.35), we can write the sum of
squared residuals as:


Σ_{t=1}^{T} et² = Σ_{t=1}^{T} (Yt − α̂ − β̂Xt)²    (2.39)

To obtain the estimators .α̂ and .β̂, we have to minimize this expression with
respect to the parameters .α̂ and .β̂. We are therefore looking for the values .α̂ and
.β̂ such that:

   

∂(Σ_{t=1}^{T} et²)/∂α̂ = ∂(Σ_{t=1}^{T} et²)/∂β̂ = 0    (2.40)

First, let us calculate the derivative of the sum of squared residuals with respect
to .α̂:
∂(Σ_{t=1}^{T} et²)/∂α̂ = ∂(Σ_{t=1}^{T} (Yt − α̂ − β̂Xt)²)/∂α̂ = 0    (2.41)

That is:

−2 Σ_{t=1}^{T} (Yt − α̂ − β̂Xt) = 0    (2.42)

Hence:

Σ_{t=1}^{T} (Yt − α̂ − β̂Xt) = 0    (2.43)

Noting that Σ_{t=1}^{T} α̂ = T α̂, we deduce:

Σ_{t=1}^{T} Yt = T α̂ + β̂ Σ_{t=1}^{T} Xt    (2.44)

Now let us determine the derivative of the sum of squared residuals with respect
to .β̂:
∂(Σ_{t=1}^{T} et²)/∂β̂ = ∂(Σ_{t=1}^{T} (Yt − α̂ − β̂Xt)²)/∂β̂ = 0    (2.45)

That is:

−2 Σ_{t=1}^{T} (Yt − α̂ − β̂Xt) Xt = 0    (2.46)

Hence:

Σ_{t=1}^{T} (Yt − α̂ − β̂Xt) Xt = 0    (2.47)

Expanding this expression, we obtain:

Σ_{t=1}^{T} Xt Yt = α̂ Σ_{t=1}^{T} Xt + β̂ Σ_{t=1}^{T} Xt²    (2.48)

Equations (2.44) and (2.48), called the normal equations, form a system of two equations with two unknowns (α̂ and β̂) that we have to solve. By dividing Eq. (2.44) by T, we get:

(1/T) Σ_{t=1}^{T} Yt = α̂ + β̂ (1/T) Σ_{t=1}^{T} Xt    (2.49)

Hence:

. Ȳ = α̂ + β̂ X̄ ⇐⇒ α̂ = Ȳ − β̂ X̄ (2.50)

Equation (2.50) gives us the OLS estimator .α̂ of .α and states that the regression
line passes through the mean point . X̄, Ȳ .
Let us now determine the expression of the OLS estimator .β̂ of .β. For this
purpose, we replace .α̂ by its value given in (2.50) in Eq. (2.48):


Σ_{t=1}^{T} Xt Yt = (Ȳ − β̂X̄) Σ_{t=1}^{T} Xt + β̂ Σ_{t=1}^{T} Xt²    (2.51)

That is:


Σ_{t=1}^{T} Xt Yt = β̂ (Σ_{t=1}^{T} Xt² − X̄ Σ_{t=1}^{T} Xt) + Ȳ Σ_{t=1}^{T} Xt    (2.52)

We deduce:

β̂ (Σ_{t=1}^{T} Xt² − X̄ Σ_{t=1}^{T} Xt) = Σ_{t=1}^{T} Xt Yt − Ȳ Σ_{t=1}^{T} Xt    (2.53)

Hence:

β̂ = [Σ_{t=1}^{T} Xt Yt − (1/T) Σ_{t=1}^{T} Xt Σ_{t=1}^{T} Yt] / [Σ_{t=1}^{T} Xt² − (1/T)(Σ_{t=1}^{T} Xt)²]    (2.54)

We have:
V(Xt) = (1/T) Σ_{t=1}^{T} (Xt − X̄)² = (1/T) Σ_{t=1}^{T} Xt² − [(1/T) Σ_{t=1}^{T} Xt]²    (2.55)

Hence:

T V(Xt) = Σ_{t=1}^{T} Xt² − (1/T)(Σ_{t=1}^{T} Xt)²    (2.56)

We deduce that the denominator of (2.54) is equal to .T V (Xt ).


It is also known that the covariance between .Xt and .Yt is given by:

Cov(Xt, Yt) = (1/T) Σ_{t=1}^{T} Xt Yt − X̄Ȳ    (2.57)

That is:

Cov(Xt, Yt) = (1/T) Σ_{t=1}^{T} Xt Yt − (1/T) Σ_{t=1}^{T} Xt × (1/T) Σ_{t=1}^{T} Yt    (2.58)

Hence:

T Cov(Xt, Yt) = Σ_{t=1}^{T} Xt Yt − (1/T) Σ_{t=1}^{T} Xt Σ_{t=1}^{T} Yt    (2.59)

We deduce that the numerator of (2.54) is .T Cov(Xt , Yt ). Therefore, we have:

β̂ = T Cov(Xt, Yt)/(T V(Xt))    (2.60)

Finally, the OLS estimator .β̂ of .β is given by:

β̂ = Cov(Xt, Yt)/V(Xt)    (2.61)
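The estimators (2.37) and (2.61) are straightforward to compute. The following Python sketch applies them to a small made-up data set (the values are purely illustrative) and cross-checks the result with NumPy's built-in least squares fit.

```python
import numpy as np

# Made-up observations, used only to illustrate formulas (2.37) and (2.61)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

X_bar, Y_bar = X.mean(), Y.mean()

# beta_hat = Cov(X, Y) / V(X); the 1/T factors cancel in the ratio
beta_hat = ((X - X_bar) * (Y - Y_bar)).sum() / ((X - X_bar) ** 2).sum()
# alpha_hat = Y_bar - beta_hat * X_bar, Eq. (2.37)
alpha_hat = Y_bar - beta_hat * X_bar

# Cross-check with NumPy's degree-1 polynomial fit (slope first, then intercept)
slope, intercept = np.polyfit(X, Y, deg=1)
print(beta_hat, alpha_hat)
print(slope, intercept)  # identical values
```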

Remark 2.5 (Case of Centered Variables) When the variables are centered, i.e.,
when observations are centered on their mean:

xt = Xt − X̄ and yt = Yt − Ȳ
. (2.62)

the OLS estimators .α̂ and .β̂ are, respectively, given by:

α̂ = Ȳ − β̂ X̄
. (2.63)
and:

β̂ = Σ_{t=1}^{T} xt yt / Σ_{t=1}^{T} xt²    (2.64)

Remark 2.6 Here we have focused on estimating the regression model using the
OLS method. Another estimation method is the maximum likelihood procedure.
This method is presented in the appendix to this chapter. It leads to the same
estimators of the coefficients .α and .β as the OLS method. However, the maximum
likelihood estimator of the error variance is biased (see Appendix 2.3).

Example: The Phillips Curve and the Natural Unemployment Rate


The Phillips curve is one of the most widely studied relationships in macroeco-
nomics. According to the modified version4 of the Phillips curve, there is a negative
relationship between the inflation rate and the unemployment rate. Taking into
account inflation expectations, this relationship can be written in the following form:

πt − E[πt |It−1] = γ(ut − u∗) + εt    (2.65)

where .πt is the inflation rate (measured as the growth rate of the consumer price
index) at date t, .E [πt |It−1 ] is the expectation (made at date .t − 1) for the
inflation rate .πt given the set of information I available at date .(t − 1), .ut is the
unemployment rate at date t, and .u∗ is the natural rate of unemployment. In order to
make this model operational, we need to make an assumption about the formation of
expectations. Let us assume that the expected inflation rate is equal to the inflation
rate of the previous period, i.e.:

E [πt |It−1 ] = πt−1


. (2.66)

The model to be estimated can therefore be written:

πt − πt−1 = α + βut + εt
. (2.67)

where .β = γ and .α = −γ u∗ . This equation shows that the variation in the inflation
rate between t and .t − 1 is a function of the unemployment rate at date t. It is also

4 The original version related the rate of change of nominal wages to the unemployment rate. Let

us recall that this was originally a relationship estimated by Phillips (1958) for the British economy
for the period 1861–1957.
Table 2.1 US inflation and unemployment rates, 1957–2020

t       πt        πt − πt−1    ut
1957    2.8986    −0.0865      5.2
1958    1.7606    −1.1380      6.2
1959    1.7301    −0.0305      5.3
1960    1.3605    −0.3696      6.6
...     ...       ...          ...
2017    2.1091    0.0345       4.1
2018    1.9102    −0.1989      3.9
2019    2.2851    0.3750       3.6
2020    1.3620    −0.9231      6.7
Data sources: US Bureau of Labor
Statistics (BLS) for the unemploy-
ment rate (noted .ut ) and IMF, Inter-
national Financial Statistics, for the
inflation rate (noted .πt )

possible to calculate the natural rate of unemployment:

u∗ = α̂/β̂    (2.68)

Equation (2.67) is a simple regression model since it explains the variation in the
inflation rate by a single explanatory variable, the unemployment rate. To illustrate
this, let us consider annual data for the inflation rate and the unemployment rate in
the United States over the period 1956–2020. Of course, calculating the change in
the inflation rate at t requires the value of the inflation rate at .(t − 1) to be known.
Given that this series only begins in 1957, the estimation of Eq. (2.67) will therefore
cover the period 1957–2020. Table 2.1 shows the first and last values of each series.
Before proceeding with the estimation, let us graphically represent the series
in order to get a first idea of the potential relationship between the two variables.
Figure 2.2 reproduces the dynamics of the unemployment rate (denoted U NEMP )
and the variation in the inflation rate (denoted DI N F ) over the period 1957–2020.
Generally, this graph shows that there seems to be a negative relationship between
the two variables, in the sense that periods of rising unemployment are frequently
associated with periods of falling inflation and vice versa. We would therefore
expect to find a negative relationship between the two variables.
To extend this intuition, we can graphically represent the scatter plot, i.e., the
values of the pair (unemployment rate, change in the inflation rate). Figure 2.3
shows that the scatter plot appears to be concentrated around a line with a generally
decreasing trend, confirming the negative nature of the relationship between the two
variables. Let us now proceed to the OLS estimation of the relationship between the
two variables to confirm these intuitions.
Fig. 2.2 Unemployment rate (UNEMP) and change in the inflation rate (DINF), United States, 1957–2020

Fig. 2.3 Values of the pair (UNEMP, DINF)

The estimation of Eq. (2.67) over the period 1957–2020 leads to the following result:

\widehat{πt − πt−1} = 2.70 − 0.46ut    (2.69)

This model shows us that the coefficient assigned to the unemployment rate is
negative: there is indeed a decreasing relationship between the unemployment rate
Fig. 2.4 Scatter plot, household consumption and income series

and the change in the inflation rate. The estimated value, .−0.46, also allows us to
write that if the unemployment rate falls by 1 point, the change in the inflation rate
increases by 0.46 points on average. The ratio 2.70/0.46 gives us the estimated value
of the natural unemployment rate, i.e., 5.87. Over the period under consideration, the
natural unemployment rate is therefore equal to 5.87%. Note in particular that, while
between 2014 and 2019 the observed unemployment rate was lower than its natural
level, this was no longer the case in 2020—a result that may well be explained by
the effects of the Covid-19 pandemic.
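The natural unemployment rate reported above follows directly from Eq. (2.68) applied to the estimated coefficients of Eq. (2.69), as the following small sketch shows.

```python
# Estimated coefficients of Eq. (2.69)
alpha_hat = 2.70
slope = 0.46  # absolute value of the coefficient on the unemployment rate

# Natural unemployment rate, Eq. (2.68): ratio of the intercept to the slope
u_star = alpha_hat / slope
print(round(u_star, 2))  # about 5.87 (%)

# Fitted change in the inflation rate at various unemployment rates:
# positive below u_star, zero at u_star, negative above
for u in (4.0, u_star, 8.0):
    print(u, round(alpha_hat - slope * u, 2))
```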

A Cross-Sectional Example: The Consumption-Income Relationship


To illustrate that the OLS method also applies to cross-sectional data, consider
household consumption and gross disposable income data for various countries for
the year 2004. The data are expressed in real terms5 and converted to dollars for
consistency. Figure 2.4 shows the scatter plot for the 43 countries considered.6 It is
clear that the points are distributed around a straight line, suggesting the existence
of a linear relationship between the two variables for all countries. Furthermore, the
relationship is increasing, showing that when income increases, consumption tends
to follow a similar upward trend.

5 The series were deflated by the consumer price index of each country.
6 The data are from the World Bank. The 43 countries considered are Albania, Armenia, Austria,
Azerbaijan, Belarus, Belgium, Bulgaria, Canada, Croatia, Czech Republic, Denmark, Estonia, Fin-
land, France, Georgia, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kazakhstan, Kyrgyzstan,
Latvia, Lithuania, Luxembourg, Macedonia, Moldova, Netherlands, Norway, Poland, Portugal,
Romania, Russia, Serbia and Montenegro, Slovakia, Slovenia, Spain, Sweden, Switzerland,
Turkey, Ukraine, and United Kingdom.

These intuitions can be confirmed by estimating the regression of consumption


on income for households in the 43 countries studied. The OLS estimation leads to
the following relationship:

\widehat{CONSUMPTION}_{2004} = 3.98 × 10^9 + 0.61 INCOME_{2004}    (2.70)

This estimation shows that the relationship between consumption and income is
indeed increasing, since the value of the coefficient assigned to income is positive.
This coefficient represents the marginal propensity to consume: an increase of 10
monetary units in gross disposable income in 2004 leads, all other things being
equal, to an increase of 6.1 monetary units in consumption the same year.

Summary and Properties


Let us summarize the main results obtained so far. According to the previous
developments, the OLS estimators .α̂ and .β̂ of the parameters .α and .β are given
by:

α̂ = Ȳ − β̂ X̄
. (2.71)

β̂ = Cov(Xt, Yt)/V(Xt)    (2.72)

The expression:

Ŷt = α̂ + β̂Xt
. (2.73)

is the regression line or OLS line. .β̂ is the slope of the regression line. The variable
Ŷt is the estimated variable (or adjusted or fitted variable). The difference between
.

the observed value and the estimated value of the dependent variable is called the
residual:

et = Yt − Ŷt = Yt − α̂ − β̂Xt
. (2.74)

for .t = 1, . . . , T , and is a measure of the error .εt .


We have also highlighted some properties of the linear regression, which we
summarize below.

Property 2.1 The regression line passes through the mean point . X̄, Ȳ .

This property, as we have seen, is derived from the relationship .Ȳ = α̂ + β̂ X̄.
Furthermore, knowing that the regression line is given by:

Ŷt = α̂ + β̂Xt
. (2.75)

we deduce:

Ŷ = α̂ + β̂ X̄ = Ȳ
. (2.76)

which can be formulated by the following property.

Property 2.2 The observed .Yt and estimated .Ŷt variables have the same mean:
Ŷ = Ȳ .
.

Knowing that the residuals are given by the difference between the observed and
estimated variables, i.e., .et = Yt − Ŷt , we have:

ē = Ȳ − Ŷ
. (2.77)

By virtue of Property 2.2, we deduce that .ē = 0, which is expressed by the


following property.

Property 2.3 On average, the residuals are zero:

ē = 0
. (2.78)

i.e., the sum of residuals is zero:


T
. et = 0 (2.79)
t=1

This property means that, on average, the model is correctly estimated.

Property 2.4 The covariance between the residuals and the explanatory variable .Xt
is zero, as is the covariance between the residuals and the estimated variable .Ŷt :
 
Cov (Xt , et ) = 0 and Cov Ŷt , et = 0
. (2.80)

Let us prove this property. We have (see Box 2.1):


   
.Cov (Xt , et ) = Cov Xt , Yt − Ŷt = Cov (Xt , Yt ) − Cov Xt , Ŷt (2.81)

Moreover:
     
.Cov Xt , Ŷt = Cov Xt , α̂ + β̂Xt = Cov Xt , β̂Xt = β̂Cov (Xt , Xt )

= β̂V (Xt ) (2.82)



According to the expression of .β̂ (Eq. (2.72)), we have:

Cov(Xt , Yt ) = β̂V (Xt )


. (2.83)

Hence:
 
Cov Xt , Ŷt = Cov(Xt , Yt )
. (2.84)

Equation (2.81) therefore gives us the following result:

Cov (Xt , et ) = 0
. (2.85)

stipulating the absence of correlation between the explanatory variable and the
residuals.  
Let us now show that .Cov Ŷt , et = 0. We have:
     
Cov Ŷt , et = Cov α̂ + β̂Xt , et = Cov β̂Xt , et = β̂Cov (Xt , et )
. (2.86)

Using Eq. (2.85), we deduce:


 
Cov Ŷt , et = 0
. (2.87)

which means that the estimated variable and the residuals are not correlated.

Box 2.1 Properties of the variance and the covariance


Consider two variables X and Y and two constants a and b:

V(X + Y) = V(X) + V(Y) + 2Cov(X, Y)
V(X − Y) = V(X) + V(Y) − 2Cov(X, Y)
V(aX) = a²V(X)
V(a + X) = V(X)
V(aX + bY) = a²V(X) + b²V(Y) + 2abCov(X, Y)
V(aX − bY) = a²V(X) + b²V(Y) − 2abCov(X, Y)
Cov(X, X) = V(X)
Cov(aX, bY) = abCov(X, Y)
Cov(a + X, b + Y) = Cov(X, Y)

Property 2.5 A change of origin does not modify the parameter .β̂.

To demonstrate this property, let us perform the following change of origin:

. Wt = Xt + a and Zt = Yt + b (2.88)

where a and b are constants. The regression model .Yt = α +βXt +εt is then written
as:

Zt − b = α + β (Wt − a) + εt
. (2.89)

Hence:

Zt = α + b − βa + βWt + εt
. (2.90)

Let us note .α ' = α + b − βa. We have:

Zt = α ' + βWt + εt
. (2.91)

It appears that the intercept is modified, but not the parameter .β. We can also
note that:
β̂ = Cov(Wt, Zt)/V(Wt) = Cov(Xt + a, Yt + b)/V(Xt + a) = Cov(Xt, Yt)/V(Xt)    (2.92)

Property 2.6 A change of scale generally modifies the parameter .β̂.

Consider the following two variables:

Wt = aXt and Zt = bYt


. (2.93)

where a and b are constants. The regression model .Yt = α + βXt + εt is then
written:
Zt/b = α + β(Wt/a) + εt    (2.94)

Hence:

Zt = bα + bβ(Wt/a) + bεt    (2.95)

Or again, by noting α' = bα and β' = bβ/a:

Zt = α ' + β ' Wt + bεt


. (2.96)
2.2 The Ordinary Least Squares (OLS) Method 47

The estimator .β̂ ' of .β ' is thus given by:

β̂' = Cov(W_t, Z_t)/V(W_t) = Cov(aX_t, bY_t)/V(aX_t) = abCov(X_t, Y_t)/(a²V(X_t)) = (b/a)β̂    (2.97)

As shown, .β̂ ' differs from .β̂ if .a /= b.

2.2.4 Properties of OLS Estimators

The OLS estimators .α̂ and .β̂ of the parameters .α and .β are:

– Linear estimators; in other words, they are functions of the dependent variable
.Yt .
 
– Unbiased estimators; this means that .E α̂ = α and .E β̂ = β: the bias of
   
each of the estimators (.Bias α̂ = E α̂ − α and .Bias β̂ = E β̂ − β) is
zero.
– Minimum variance estimators. The estimators .α̂ and .β̂ are the unbiased
estimators with the lowest variance among all the possible linear unbiased
estimators.

The OLS estimators .α̂ and .β̂ are therefore BLUE (the best linear unbiased
estimators). Let us now demonstrate each of these properties.

Linear Estimators
Consider the centered variables .xt = Xt − X̄, .yt = Yt − Ȳ , and let .wt be defined as:
w_t = x_t / Σ_{t=1}^{T} x_t²    (2.98)

It can then be shown (see Appendix 2.1.1) that:


β̂ = Σ_{t=1}^{T} x_t Y_t / Σ_{t=1}^{T} x_t² = Σ_{t=1}^{T} w_t Y_t    (2.99)

and:
α̂ = Σ_{t=1}^{T} (1/T − X̄ w_t) Y_t    (2.100)

The expression (2.99) reflects the fact that .β̂ is a linear estimator of .β: .β̂ indeed
appears as a linear function of the dependent variable .Yt . It is the same for .α̂ which
is expressed as a linear function of .Yt according to Eq. (2.100): .α̂ is thus a linear
estimator of .α.
Let us summarize this first result concerning the properties of the OLS estimators
as follows.

Property 2.7 The OLS estimators .α̂ and .β̂ are linear estimators of the parameters .α
and .β.

Unbiased Estimators
Starting
  from the linearity property of estimators, it is possible to show that
.E β̂ = β and .E α̂ = α, leading to the following property (the proof is given
in Appendix 2.1.2).

Property 2.8 The OLS estimators α̂ and β̂ are unbiased estimators of the parameters α and β:

E(α̂) = α    (2.101)

E(β̂) = β    (2.102)

Consistent and Minimum Variance Estimators


Starting from the formulas of the variances of the OLS estimators (see demonstra-
tion of the formulas in Appendix 2.1.3):

V(β̂) = σε² / Σ_{t=1}^{T} x_t² = σε² / (T·V(X_t))    (2.103)

and:
V(α̂) = σε² (1/T + X̄²/Σ_{t=1}^{T} x_t²) = σε² (Σ_{t=1}^{T} x_t² + T X̄²)/(T Σ_{t=1}^{T} x_t²) = σε² Σ_{t=1}^{T} X_t²/(T² V(X_t))    (2.104)

we notice that if .T → ∞, then .V (β̂) → 0 and .V (α̂) → 0 (see Appendix 2.1.3),


which can be summarized as follows.

Property 2.9 The OLS estimators .α̂ and .β̂ are consistent estimators of the parame-
ters .α and .β:

lim_{T→∞} V(α̂) = 0   and   lim_{T→∞} V(β̂) = 0    (2.105)

It can also be shown that the OLS estimators .α̂ and .β̂ are estimators of
minimum variance among the class of linear unbiased estimators (see demonstration
in Appendix 2.1.3).

Property 2.10 In the class of linear unbiased estimators, the OLS estimators .α̂ and
β̂ are of minimum variance.
.

Putting together all the properties of the OLS estimators presented in this section,
we can finally state the following fundamental property.

Property 2.11 The OLS estimators .α̂ and .β̂ are the best linear unbiased estimators
of the parameters .α and .β: they are BLUE.

It is because of this property that the OLS method is very frequently used.
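To make this property concrete, the short simulation below can be used; it is a sketch, not part of the original text, and the data-generating values, sample size, and variable names are illustrative assumptions. It draws many samples from Y_t = α + βX_t + ε_t with a fixed regressor and checks that the OLS estimates average out to the true parameters.

```python
import numpy as np

# Minimal Monte Carlo sketch illustrating the unbiasedness of the OLS estimators.
# All numerical values below are illustrative assumptions, not taken from the book.
rng = np.random.default_rng(seed=42)
alpha_true, beta_true, sigma_eps, T = 2.0, 0.5, 1.0, 100

X = rng.uniform(0.0, 10.0, size=T)      # regressor kept fixed across replications
draws = []
for _ in range(5000):
    eps = rng.normal(0.0, sigma_eps, size=T)
    Y = alpha_true + beta_true * X + eps
    beta_hat = np.cov(X, Y, ddof=0)[0, 1] / np.var(X)   # Cov(X, Y) / V(X)
    alpha_hat = Y.mean() - beta_hat * X.mean()           # Ybar - beta_hat * Xbar
    draws.append((alpha_hat, beta_hat))

draws = np.array(draws)
print("average alpha_hat:", draws[:, 0].mean())   # close to alpha_true = 2.0
print("average beta_hat :", draws[:, 1].mean())   # close to beta_true = 0.5
```

Increasing T in this sketch shrinks the dispersion of the estimates around the true values, which illustrates the consistency result of Property 2.9.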

2.2.5 OLS Estimator of the Variance of the Error Term


Finding the Estimator of the Error Variance
We now seek to determine an estimator .σ̂ε2 of the error variance .σε2 . Starting from
the definition of the residuals:

.et = Yt − Ŷt = α + βXt + εt − α̂ − β̂Xt (2.106)

that is:
 
et = εt − α̂ − α − β̂ − β Xt
. (2.107)

we can show that such an estimator is written (see Appendix 2.1.4):

σ̂ε² = (1/(T − 2)) Σ_{t=1}^{T} e_t²    (2.108)

This is an unbiased estimator of .σε2 .

Estimation of the Variances of the OLS Estimators


Determining the estimator .σ̂ε2 of the variance of the error term (Eq. (2.108)) allows
us to give the estimates of the variances of the OLS estimators .α̂ and .β̂. Using

Eq. (2.103), the estimator of the variance of .β̂ is written:

V̂(β̂) = σ̂ε² / Σ_{t=1}^{T} x_t² = σ̂ε² / (T·V(X_t))    (2.109)

Similarly, from Eq. (2.104), we have the estimator of the variance of .α̂:


V̂(α̂) = σ̂ε² Σ_{t=1}^{T} X_t² / (T Σ_{t=1}^{T} x_t²) = σ̂ε² Σ_{t=1}^{T} X_t² / (T² V(X_t))    (2.110)

Calculating these expressions allows us to assess the precision of the estimators.

2.2.6 Empirical Application

To illustrate the OLS method, let us consider the following two series:

– The series of returns of the US Dow Jones Industrial Average index, denoted
RDJ
– The series of returns of the Euro Stoxx 50, i.e., the European stock market index,
denoted REURO

These two series, taken from the Macrobond database, have a quarterly frequency
over the period from the second quarter of 1987 to the second quarter of 2021, i.e.,
a total of 137 observations.
Figure 2.5 shows that the returns series move in much the same way, which is not
surprising given the international integration of financial markets. Figure 2.6 further
shows that the scatter plot can be reasonably adjusted by a regression line of the
type:


REÛRO_t = α̂ + β̂ RDJ_t    (2.111)

We assume here that the dependent variable corresponds to the returns of the
European index, the explanatory variable being the returns of the US index. This
choice can be justified by the fact that it is frequently admitted that the US stock
market has an influence on all the other international stock markets.
Our purpose is to obtain the estimated values .α̂ and .β̂ by applying the OLS
method:

α̂ = REU RO − β̂RDJ
. (2.112)


Fig. 2.5 Dow Jones and Euro Stoxx 50 returns, 1987.2–2021.2


Fig. 2.6 Representation of the values of the pair (RDJ,REURO)



Table 2.2 OLS estimation of the relationship between REURO and RDJ

         RDJ       REURO     (RDJ)²    RDJ × REURO
1987.2   0.0482    0.0404    0.0023    0.0019
1987.3 0.0709 0.0277 0.0050 0.0020
1987.4 .−0.2920 .−0.3619 0.0853 0.1057
1988.1 0.0251 0.0792 0.0006 0.0020
1988.2 0.0744 0.0807 0.0055 0.0060
... ... ... ... ...
2020.2 0.1636 0.1488 0.0268 0.0243
2020.3 0.0735 .−0.0126 0.0054 .−0.0009
2020.4 0.0968 0.1065 0.0094 0.0103
2021.1 0.0747 0.0982 0.0056 0.0073
2021.2 0.0451 0.0364 0.0020 0.0016
Sum 2.7061 1.5421 0.9080 1.0183

and:
β̂ = Cov(RDJ, REURO) / V(RDJ)    (2.113)

Table 2.2 presents the calculations required to determine the estimators .α̂ and .β̂.
We thus have:
RDJ̄ = (1/137) × 2.7061 = 0.0196    (2.114)

REURŌ = (1/137) × 1.5421 = 0.0113    (2.115)

V(RDJ) = (1/137) × 0.9080 − (0.0196)² = 0.0062    (2.116)

Cov(RDJ, REURO) = (1/137) × 1.0183 − 0.0196 × 0.0113 = 0.0072    (2.117)

From these calculations, we derive the values of the estimators .α̂ and .β̂:

β̂ = 0.0072 / 0.0062 = 1.1559    (2.118)
and:

α̂ = 0.0113 − 1.1559 × 0.0196 = −0.0116    (2.119)

The equation of the regression line is therefore given by:


REÛRO_t = −0.0116 + 1.1559 RDJ_t    (2.120)

By virtue of (2.120), we find that there is a positive relationship between the US and European stock returns insofar as β̂ > 0. More precisely, we note that a 1-point increase in the Dow Jones returns translates, all other things being equal, into a 1.1559-point increase in the returns of the Euro Stoxx index.
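As a complement, the computations above can be scripted in a few lines. The sketch below is not the book's code: the arrays rdj and reuro are assumed to hold the 137 quarterly returns, which are not reproduced here.

```python
import numpy as np

def ols_simple(x, y):
    """OLS estimates of Y_t = alpha + beta * X_t + eps_t."""
    beta_hat = np.cov(x, y, ddof=0)[0, 1] / np.var(x)   # beta_hat = Cov(X, Y) / V(X)
    alpha_hat = y.mean() - beta_hat * x.mean()           # alpha_hat = Ybar - beta_hat * Xbar
    return alpha_hat, beta_hat

# Assuming rdj and reuro are numpy arrays with the 137 observations:
# alpha_hat, beta_hat = ols_simple(rdj, reuro)
# With the book's data this should give approximately -0.0116 and 1.1559.
```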

2.3 Tests on the Regression Parameters

So far, the assumption that the error term follows a normal distribution has not
been made, since it was not necessary to establish the main results of the regression
analysis. This assumption can now be introduced to determine the distribution
followed by the estimators .α̂ and .β̂, as well as by the estimator .σ̂ε2 of the variance
of the error term.

2.3.1 Determining the Distributions Followed by the OLS


Estimators

Since .α̂ and .β̂ are linear functions of the error term .ε, they are also normally
distributed. The expectation and variance of these two normal distributions still have
to be specified.
We know that α̂ and β̂ are unbiased estimators of α and β, that is: E(α̂) = α and E(β̂) = β. Moreover, we have shown that the variances of the two estimators
are given by (Eqs. (2.104) and (2.103)):
V(α̂) = σε² (1/T + X̄²/Σ_{t=1}^{T} x_t²) = σε² (Σ_{t=1}^{T} x_t² + T X̄²)/(T Σ_{t=1}^{T} x_t²) = σε² Σ_{t=1}^{T} X_t²/(T² V(X_t))    (2.121)

and:

V(β̂) = σε² / Σ_{t=1}^{T} x_t²    (2.122)

We deduce the distributions followed by the two estimators α̂ and β̂:

α̂ ∼ N(α, σε² (1/T + X̄²/Σ_{t=1}^{T} x_t²))    (2.123)

and:

β̂ ∼ N(β, σε² / Σ_{t=1}^{T} x_t²)    (2.124)

These expressions are a function of .σε2 which is unknown. In order to make them
operational, it is necessary to replace .σε2 by its estimator .σ̂ε2 given by (Eq. (2.108)):

σ̂ε² = (1/(T − 2)) Σ_{t=1}^{T} e_t²    (2.125)

However, such an operation requires knowledge of the distribution followed by


. σ̂ε2 to deduce the distributions followed by the estimators .α̂ and .β̂. Since the error
term .εt is normally distributed, we have (see Box 2.2):

(T − 2) σ̂ε²/σε² ∼ χ²_{T−2}    (2.126)

where .χx2 designates the Chi-squared distribution with x degrees of freedom. It


follows that:


Σ_{t=1}^{T} e_t² / σε² ∼ χ²_{T−2}    (2.127)

Box 2.2 Relationships between the normal, Chi-squared, and Student’s t


distributions
Consider a random variable z following a standard normal distribution, that
is: z ∼ N(0, 1). Let z₁, z₂, . . . , z_T be T independent random draws of this
variable, which can be likened to T observations of the variable z. The sum of

(continued)

Box 2.2 (continued)


the squares of the .zi , .i = 1, . . . , T , follows a Chi-squared distribution with
T degrees of freedom, i.e.:
 
z₁² + z₂² + . . . + z_T² ∼ χ²_T    (2.128)

When the number of degrees of freedom T tends to infinity, the Chi-


squared distribution tends to a normal distribution. Let us now consider two
independent random variables z and v. Assume that z has a standard normal
distribution and v a Chi-squared distribution with r degrees of freedom:
z ∼ N(0, 1) and v ∼ χ²_r. Under these conditions, the quantity:

t = z√r / √v    (2.129)

follows a Student's t distribution with r degrees of freedom, i.e.:

t = z√r / √v ∼ t(r)    (2.130)

Consider two random variables w and v each following a Chi-squared


distribution with s and r degrees of freedom, respectively, and suppose that
these two distributions are independent, i.e.:

w ∼ χ²_s and v ∼ χ²_r    (2.131)

The statistic:

F = (w/s) / (v/r)    (2.132)

follows a Fisher distribution with (s, r) degrees of freedom, i.e.:

F ∼ F(s, r)    (2.133)

According to Eqs. (2.123) and (2.124), we can write:

(α̂ − α) / (σε √(1/T + X̄²/Σ_{t=1}^{T} x_t²)) ∼ N(0, 1)    (2.134)

and:

(β̂ − β) / (σε / √(Σ_{t=1}^{T} x_t²)) ∼ N(0, 1)    (2.135)

Let us examine what happens to these expressions when we replace .σε by its
estimator .σ̂ε . Using the results given in Box 2.2, let us posit:
t = [(α̂ − α) / (σε √(1/T + X̄²/Σ_{t=1}^{T} x_t²))] ÷ [√(Σ_{t=1}^{T} e_t²/σε²) / √(T − 2)]    (2.136)

Hence:

(α̂ − α) / (σ̂ε √(1/T + X̄²/Σ_{t=1}^{T} x_t²)) ∼ t(T − 2)    (2.137)

Let us apply the same reasoning to .β̂. Thus, by positing:


t = [(β̂ − β) / (σε / √(Σ_{t=1}^{T} x_t²))] ÷ [√(Σ_{t=1}^{T} e_t²/σε²) / √(T − 2)]    (2.138)

we deduce that:
(β̂ − β) / (σ̂ε / √(Σ_{t=1}^{T} x_t²)) ∼ t(T − 2)    (2.139)

Equations (2.137) and (2.139) highlight the fact that replacing .σε2 by its estimator
2
.σ̂εamounts to replacing a normal distribution by a Student’s t distribution. When
the sample size T is sufficiently large, the Student’s t distribution tends to a standard
normal distribution. In practice, when the number of observations exceeds 30

(T > 30), we consider that the Student’s t distribution in Eqs. (2.137) and (2.139)
.

can be replaced by a standard normal distribution. From expressions (2.137) and


(2.139), it is possible to derive statistical tests on the regression coefficients.

2.3.2 Tests on the Regression Coefficients

We present the tests on the two parameters .α and .β, even if the tests on .β are in
practice more frequently used.

Test on α
By virtue of (2.137), it is possible to construct a .100(1 − p)% confidence interval
for .α, that is:


α̂ ± t_{p/2} σ̂ε √(1/T + X̄²/Σ_{t=1}^{T} x_t²)    (2.140)

where .tp/2 is the value obtained from the Student’s t distribution for the .100 (p/2)%
significance level. This value is called the critical value of the Student’s t law at the
.100(p/2)% significance level. We often use .p = 0.05, which corresponds to a 95%

confidence interval.

Remark 2.7 The significance level corresponds to the probability of rejecting the
null hypothesis when it is true. It is also called the size of the test.

Remark 2.8 The confidence interval (2.140) can also be written as:
Prob[α̂ − t_{p/2} σ̂ε √(1/T + X̄²/Σ_{t=1}^{T} x_t²) < α < α̂ + t_{p/2} σ̂ε √(1/T + X̄²/Σ_{t=1}^{T} x_t²)] = 100(1 − p)%

It is then possible to test the null hypothesis that the coefficient .α is equal to a
given value .α0 :

H0 : α = α0
. (2.141)

against the alternative hypothesis:

H1 : α /= α0
. (2.142)

If the null hypothesis is true, then:

(α̂ − α₀) / (σ̂ε √(1/T + X̄²/Σ_{t=1}^{T} x_t²)) ∼ t(T − 2)    (2.143)

The decision rule is:

– If |α̂ − α₀| / (σ̂ε √(1/T + X̄²/Σ_{t=1}^{T} x_t²)) ≤ t_{p/2}: the null hypothesis is not rejected at the 100p% significance level; therefore, α = α₀.
– If |α̂ − α₀| / (σ̂ε √(1/T + X̄²/Σ_{t=1}^{T} x_t²)) > t_{p/2}: the null hypothesis is rejected at the 100p% significance level; therefore, α ≠ α₀.

Test on β
By virtue of (2.139), we can construct a .100(1 − p)% confidence interval for .β, that
is:

β̂ ± t_{p/2} σ̂ε / √(Σ_{t=1}^{T} x_t²)    (2.144)

As for .α, it is possible to test the null hypothesis that the coefficient .β is equal to
a given value .β0 :

H0 : β = β0
. (2.145)

against the alternative hypothesis:

H1 : β /= β0
. (2.146)

If the null hypothesis is true, then:

(β̂ − β₀) / (σ̂ε / √(Σ_{t=1}^{T} x_t²)) ∼ t(T − 2)    (2.147)

The decision rule is given by:

– If |β̂ − β₀| / (σ̂ε / √(Σ_{t=1}^{T} x_t²)) ≤ t_{p/2}: the null hypothesis is not rejected at the 100p% significance level; therefore, β = β₀.
– If |β̂ − β₀| / (σ̂ε / √(Σ_{t=1}^{T} x_t²)) > t_{p/2}: the null hypothesis is rejected at the 100p% significance level; therefore, β ≠ β₀.

The commonest practice is to test the null hypothesis:

H0 : β = 0
. (2.148)

against the alternative hypothesis:

H₁ : β ≠ 0    (2.149)

This is a test of coefficient significance, also called the t-test. Thus, under the
null hypothesis, the coefficient associated with the variable .Xt is not significant: .Xt
plays no role in determining the dependent variable .Yt . The test is performed by
replacing .β0 by 0 in (2.147). The test statistic is then given by:

β̂ / (σ̂ε / √(Σ_{t=1}^{T} x_t²))    (2.150)

This expression corresponds to the ratio of the estimated coefficient β̂ to its estimated standard deviation σ_β̂, and is denoted t_β̂. The quantity:

t_β̂ = β̂ / σ_β̂    (2.151)

is the calculated t-statistic of the coefficient .β̂.



The decision rule of the significance test of the coefficient .β is:


 
 
– If |t_β̂| ≤ t_{p/2}: the null hypothesis is not rejected at the 100p% significance level; therefore, β = 0: the coefficient associated with the variable X_t is not significant and X_t does not contribute to explaining Y_t.
– If |t_β̂| > t_{p/2}: the null hypothesis is rejected at the 100p% significance level; therefore, β ≠ 0: the coefficient associated with the variable X_t is significant, meaning that X_t contributes to explaining the dependent variable Y_t.

As said, it is very common to use p = 0.05. For a sufficiently large number of observations, the value of the Student's t distribution at the 5% significance level is 1.96. Consequently:

– If |t_β̂| ≤ 1.96: the null hypothesis β = 0 is not rejected at the 5% significance level.
– If |t_β̂| > 1.96: the null hypothesis β = 0 is rejected at the 5% significance level.

This t-test is widely used in practice. It can of course be applied in a similar way
to the coefficient .α.

Test on σε2
It is also possible to construct a test on the variance of the error term from the
equation:

(T − 2) σ̂ε²/σε² ∼ χ²_{T−2}    (2.152)

The confidence interval is given by:



Prob[χ²_{p/2} < (T − 2) σ̂ε²/σε² < χ²_{1−p/2}] = 100(1 − p)%    (2.153)

or:

Prob[(T − 2) σ̂ε²/χ²_{1−p/2} < σε² < (T − 2) σ̂ε²/χ²_{p/2}] = 100(1 − p)%    (2.154)

It is then possible to carry out a test of the type:

H0 : σε2 = σ02
. (2.155)

2.3.3 Empirical Application

Let us go back to the previous example linking the following two series:

– The series of returns of the Dow Jones Industrial Average index, RDJ
– The series of returns of the Euro Stoxx 50 index, REU RO

We obtained the following estimated relationship:


REÛRO_t = −0.0116 + 1.1559 RDJ_t    (2.156)

We can now ask whether or not the constant and the coefficient of the slope of
the regression line are significantly different from zero. To this end, let us calculate
the t-statistics of these two coefficients:

t_α̂ = α̂ / σ_α̂   and   t_β̂ = β̂ / σ_β̂    (2.157)

First, we need to determine the standard deviations of the estimated coefficients.


We have seen that:


V̂(α̂) = σ̂ε² Σ_{t=1}^{T} RDJ_t² / (T² V(RDJ_t))    (2.158)

and:

V̂(β̂) = σ̂ε² / (T V(RDJ_t))    (2.159)

It is therefore necessary to determine .σ̂ε2 :

σ̂ε² = (1/(T − 2)) Σ_{t=1}^{T} e_t²    (2.160)

Calculating .σ̂ε2 first involves determining the residuals .et , .t = 1, . . . , T :

e_t = REURO_t − REÛRO_t    (2.161)

Table 2.3 presents the calculations needed to obtain the residuals and the sum of
squared residuals.

Table 2.3 Calculation of the residuals

         REURO     REÛRO_t    e_t       e_t²
1987.2   0.0404    0.0442     −0.0037   1.3978E-05
1987.3   0.0277    0.0704     −0.0427   1.8251E-03
1987.4   −0.3619   −0.3491    −0.0128   1.6271E-04
1988.1   0.0792    0.0174     0.0618    3.8201E-03
1988.2   0.0807    0.0745     0.0062    3.8997E-05
... ... ... ... ...
2020.2 0.1488 0.1775 .−0.0287 8.2524E-04
2020.3 .−0.0126 0.0734 .−0.0860 7.3924E-03
2020.4 0.1065 0.1004 0.0062 3.8212E-05
2021.1 0.0982 0.0748 0.0234 5.4685E-04
2021.2 0.0364 0.0405 .−0.0042 1.7524E-05
Sum 1.5421 1.5421 0.0000 0.4322


The estimated values REÛRO_t of REURO_t are determined as follows:

– REÛRO_{1987.2} = −0.0116 + 1.1559 × 0.0482 = 0.0442
– REÛRO_{1987.3} = −0.0116 + 1.1559 × 0.0709 = 0.0704
– ...
– REÛRO_{2021.2} = −0.0116 + 1.1559 × 0.0451 = 0.0405

It can be seen from Table 2.3 that the sum of the values of REURO_t is equal to the sum of the values of REÛRO_t, illustrating that the observed series and the estimated series have the same mean.
We derive the values of the residuals:

– .e1987.2 = 0.0404 − 0.0442 = −0.0037


– .e1987.3 = 0.0277 − 0.0704 = −0.0427
– ...
– .e2021.2 = 0.0364 − 0.0405 = −0.0042


We find that Σ_{t=1}^{137} e_t² = 0.4322. Hence:

σ̂ε² = (1/(137 − 2)) × 0.4322 = 0.0032    (2.162)

Moreover, we had previously calculated the variance of RDJ, i.e., V(RDJ) = 0.0062, and Σ_{t=1}^{137} RDJ_t² = 0.9080. According to (2.158), we therefore have:

V̂(α̂) = 0.0032 × 0.9080 / (137² × 0.0062) = 2.4828 × 10⁻⁵    (2.163)

Hence:

σ̂α̂ = 0.0050
. (2.164)

So finally:

t_α̂ = −0.0116 / 0.0050 = −2.3232    (2.165)

Similarly, we determine V̂(β̂):

V̂(β̂) = 0.0032 / (137 × 0.0062) = 0.0037    (2.166)

and:
t_β̂ = 1.1559 / √0.0037 = 18.8861    (2.167)

Having determined the t-statistics of the coefficients .α̂ and .β̂, given by
Eqs. (2.165) and (2.167), we can perform the significance tests:

. H0 : α = 0 against H1 : α /= 0 (2.168)

and:

H0 : β = 0 against H1 : β /= 0
. (2.169)

The number of observations is .T = 137. Recall that, under the null hypothesis,
the .tα̂ and .tβ̂ statistics follow Student’s t distributions with .(T − 2) degrees of
freedom. Reading the Student’s t table, for a number of degrees of freedom equal to
135 and for a 5% significance level, gives us the critical value: .t0.025 (135) = 1.96.
It can be seen that:

– .|tα̂ | = 2.3232 > 1.96: we reject the null hypothesis that .α = 0. The constant
term is therefore significantly different from zero.
 
– .tβ̂  = 18.8861 > 1.96: we reject the null hypothesis that .β = 0. The
slope coefficient of the regression is therefore significantly different from zero,
indicating that the variable RDJ contributes to explaining the variable REU RO.

It is possible to construct confidence intervals for .α and .β:

– The 95% confidence interval for .α is given by .α̂ ± t0.025 × σα̂ , or .−0.0116 ±
1.96 × 0.0050, which corresponds to the interval .[−0.0214; −0.0018] . We can

see that 0 does not belong to this interval, thus confirming the rejection of the
null hypothesis for the coefficient .α.
– The 95% confidence interval for .β is given by .β̂ ± t0.025 × σβ̂ , or .1.1559 ±
1.96 × 0.0612, which corresponds to the interval .[1.0359; 1.2759] . We can see
that 0 does not belong to this interval, thus confirming the rejection of the null
hypothesis for the coefficient .β.
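The significance tests above can be scripted from the formulas already derived. The sketch below is an illustration under the same assumptions as before (x and y stand for the RDJ and REURO arrays and alpha_hat, beta_hat for the estimated coefficients); 1.96 is used as the critical value, as in the text.

```python
import numpy as np

def significance_tests(x, y, alpha_hat, beta_hat, t_crit=1.96):
    T = len(y)
    resid = y - (alpha_hat + beta_hat * x)
    sig2 = np.sum(resid**2) / (T - 2)            # unbiased estimator of the error variance
    sxx = np.sum((x - x.mean())**2)              # sum of squared centered X
    var_beta = sig2 / sxx                        # V(beta_hat)
    var_alpha = sig2 * np.sum(x**2) / (T * sxx)  # V(alpha_hat)
    se_alpha, se_beta = np.sqrt(var_alpha), np.sqrt(var_beta)
    t_alpha, t_beta = alpha_hat / se_alpha, beta_hat / se_beta
    ci_alpha = (alpha_hat - t_crit * se_alpha, alpha_hat + t_crit * se_alpha)
    ci_beta = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
    return t_alpha, t_beta, ci_alpha, ci_beta

# With the book's data: t_alpha ≈ -2.32, t_beta ≈ 18.89,
# and the 95% intervals exclude zero for both coefficients.
```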

2.4 Analysis of Variance and Coefficient of Determination

Once the regression parameters have been estimated and tested for statistical
significance, the goodness of fit remains to be assessed. In other words, it is
necessary to study whether the observed scatter plot is concentrated or, on the
contrary, dispersed around the regression line. For this purpose, the analysis of the
variance (analysis of variance [ANOVA]) of the regression is performed and the
coefficient of determination is calculated.

2.4.1 Analysis of Variance (ANOVA)

From the definition of residuals:

et = Yt − Ŷt = Yt − α̂ − β̂Xt
. (2.170)

we have:

Yt = Ŷt + et
. (2.171)

This relationship can be written in terms of variance:


     
V (Yt ) = V Ŷt + et = V Ŷt + V (et ) + 2cov Ŷt , et
. (2.172)
 
By virtue of Property 2.4, we know that: .cov Ŷt , et = 0. So, we have:
 
V (Yt ) = V Ŷt + V (et )
. (2.173)

This equation can also be expressed in terms of sums of squares by replacing the
variances by their definitions:


Σ_{t=1}^{T} (Y_t − Ȳ)² = Σ_{t=1}^{T} (Ŷ_t − Ŷ̄)² + Σ_{t=1}^{T} (e_t − ē)²    (2.174)

which can also be written, noting that ē = 0 and Ŷ̄ = Ȳ (see Property 2.2):

Σ_{t=1}^{T} (Y_t − Ȳ)² = Σ_{t=1}^{T} (Ŷ_t − Ȳ)² + Σ_{t=1}^{T} e_t²    (2.175)

Equation (2.173) or (2.175) is called the analysis-of-variance (ANOVA) equa-


tion. In accordance with Eq. (2.173), we see that the total variance .V (Yt ) can be
expressed as the sum of two terms:

– The explained
  variance,
 which corresponds to the variance of the estimated
variable . V Ŷt : this is the variance explained by the model, i.e., by the
explanatory variable .Xt .
– The variance of the residuals, called residual variance .(V (et )). This is the
variance that is not explained by the model.

In a similar way, Eq. (2.175) involves three terms:

– The sum of the squares of the deviations of the explained variable from its mean,
known as the total sum of squares, noted T SS
– The explained sum of squares, noted ESS
– The residual sum of squares (also called sum of squared residuals), noted RSS

Equation (2.175) can thus be schematically written as follows:

T SS = ESS + RSS
. (2.176)

Example 2.1 Let us take the example of the relationship between the returns of the
Dow Jones Industrial Average index (RDJ ) and the returns of the Euro Stoxx 50
index (REU RO). We have already calculated the residual variance, i.e., .V (et ) =
0.0032. Furthermore, we have V(REURO) = 0.0115 and V(REÛRO) = 0.0083. We can therefore write the ANOVA equation:

.0.0115 = 0.0083 + 0.0032 (2.177)

We deduce that the part of the variation of REU RO explained by the model is
given by:
 

V(REÛRO) / V(REURO) = 0.0083 / 0.0115 ≃ 0.7254    (2.178)

Thus, 72.54% of the variation in REU RO is explained by the model.



2.4.2 Coefficient of Determination

The ANOVA equation enables us to judge the quality of a regression. The closer
the explained variance is to the total variance, i.e., the lower the residual variance,
the better the regression. In order to quantify this, we calculate the ratio between
the explained variance and the total variance, which is called the coefficient of
determination denoted as .R 2 (R-squared):

R² = V(Ŷ_t) / V(Y_t) = Σ_{t=1}^{T} (Ŷ_t − Ȳ)² / Σ_{t=1}^{T} (Y_t − Ȳ)² = 1 − Σ_{t=1}^{T} e_t² / Σ_{t=1}^{T} (Y_t − Ȳ)²    (2.179)

or:
R² = ESS / TSS = 1 − RSS / TSS    (2.180)
The coefficient of determination thus measures the proportion of the variance of
Yt explained by the model. By definition, we have:
.

0 ≤ R2 ≤ 1
. (2.181)

The closer the coefficient of determination is to 1, the better the model. A


coefficient of determination equal to 1 indicates a perfect fit: .Ŷt = Yt .∀t. A
coefficient of determination of zero indicates that there is no relationship between
the dependent variable and the explanatory variable: .β̂ = 0. In the latter case, the
best estimate of .Yt is equal to its mean value, i.e., .Ŷt = α̂ = Ȳ . Figures 2.7, 2.8, 2.9,
and 2.10 illustrate schematically the case of a coefficient of determination starting
from zero and tending towards 1.

Fig. 2.7 Coefficient of determination close to zero

Fig. 2.8 Coefficient of determination moving away from zero

Fig. 2.9 Increasing coefficient of determination

Fig. 2.10 Coefficient of determination close to 1

   
Remark 2.9 Since V(Ŷ_t) = V(α̂ + β̂X_t) = β̂² V(X_t), the coefficient of determination can be written:

R² = β̂² V(X_t) / V(Y_t)    (2.182)

Furthermore, since β̂ = Cov(X_t, Y_t) / V(X_t), we can also give the following expression for the coefficient of determination:

R² = [Cov(X_t, Y_t)]² / (V(X_t) V(Y_t))    (2.183)

Example 2.2 Let us go back to our example relating to the regression of REU RO
on RDJ and a constant, i.e.:


REÛRO_t = −0.0116 + 1.1559 RDJ_t    (2.184)

Let us determine the coefficient of determination of this regression. We have


already calculated:
 

R² = V(REÛRO) / V(REURO) ≃ 0.7254    (2.185)

We can also use Eq. (2.182):

R² = (1.1559)² × 0.0062 / 0.0115 ≃ 0.7254    (2.186)
or Eq. (2.183):

R² = (0.0072)² / (0.0062 × 0.0115) ≃ 0.7254    (2.187)

It can be deduced that the selected model explains about 72.5% of the variation of REURO.

Remark 2.10 The coefficient of determination can be used to compare the quality
of models having the same dependent variable. On the other hand, it cannot be used
to compare models with different dependent variables. For example, the coefficient
of determination can be used to compare the models:

Yt = α + βXt + εt and Yt = a + bZt + ut


. (2.188)

where .Zt is an explanatory variable (other than .Xt ) and .ut an error term, but it
cannot be used to compare:

Yt = α + βXt + εt and log Yt = a + bXt + ut


. (2.189)

Thus, if we take the models in Eq. (2.188) and if the coefficient of determination
of the model .Yt = a + bZt + ut is higher than that of the model .Yt = α + βXt + εt ,
the model .Yt = a + bZt + ut is preferred to the model .Yt = α + βXt + εt .
On the other hand, if the coefficient of determination associated with the model
.log Yt = a + bXt + ut is greater than that of the model .Yt = α + βXt + εt , we

cannot conclude that the model .log Yt = a + bXt + ut is better, because the dependent
variable is not the same in the two models.

2.4.3 Analysis of Variance and Significance Test of the Coefficient β

The significance test of the coefficient .β, that is, the test of the null hypothesis
H0 : β = 0, can be approached in the ANOVA framework. Recall that we have
.

(Eq. (2.135)):

(β̂ − β) / (σε / √(Σ_{t=1}^{T} x_t²)) ∼ N(0, 1)    (2.190)

Furthermore, by virtue of the property that the sum of the squares of the terms
of a normally distributed series follows a Chi-squared distribution, we can write by
squaring the previous expression (see Box 2.2):
(β̂ − β)² / (σε² / Σ_{t=1}^{T} x_t²) ∼ χ²₁    (2.191)

We also know from Eq. (2.127) that:


Σ_{t=1}^{T} e_t² / σε² ∼ χ²_{T−2}    (2.192)

By relating Eqs. (2.191) and (2.192), we obtain:


F = [(β̂ − β)² Σ_{t=1}^{T} x_t²] / [Σ_{t=1}^{T} e_t² / (T − 2)] ∼ F(1, T − 2)    (2.193)

where .F (1, T − 2) denotes a Fisher distribution with .(1, T − 2) degrees of free-


dom. This result arises because the ratio of two independent Chi-squared distribu-
tions, divided by their number of degrees of freedom, follows a Fisher distribution
(see Box 2.2).
We can then proceed to the significance test on the coefficient .β. Under the null
hypothesis, .H0 : β = 0, we can write:


F = β̂² Σ_{t=1}^{T} x_t² / [Σ_{t=1}^{T} e_t² / (T − 2)] ∼ F(1, T − 2)    (2.194)

Let us consider the analysis-of-variance Eq. (2.175):


Σ_{t=1}^{T} (Y_t − Ȳ)² = Σ_{t=1}^{T} (Ŷ_t − Ȳ)² + Σ_{t=1}^{T} e_t²    (2.195)

which can also be written using the centered variables:


Σ_{t=1}^{T} y_t² = Σ_{t=1}^{T} ŷ_t² + Σ_{t=1}^{T} e_t² = β̂² Σ_{t=1}^{T} x_t² + Σ_{t=1}^{T} e_t²    (2.196)


Thus, we have β̂² Σ_{t=1}^{T} x_t² = ESS and Σ_{t=1}^{T} e_t² = RSS, and Eq. (2.194) becomes:

F = ESS / [RSS / (T − 2)] ∼ F(1, T − 2)    (2.197)

This statistic can be used to perform a test of significance of the coefficient .β:

– If F ≤ F(1, T − 2), the null hypothesis is not rejected, i.e., β = 0: the coefficient associated with the variable X_t is not significant, indicating that X_t does not contribute to the explanation of the dependent variable.
– If F > F(1, T − 2), the null hypothesis is rejected. We deduce that β is significantly different from 0, which implies that the variable X_t contributes to explaining Y_t.

Remark 2.11 By virtue of the definition of the coefficient of determination, it is


also possible to write Eq. (2.197) as follows:

F = R² / [(1 − R²) / (T − 2)] ∼ F(1, T − 2)    (2.198)

This test can then be used as a test of significance of the coefficient of


determination, i.e., as a test of the null hypothesis .H0 : R 2 = 0. Of course, since
the simple regression model has only one explanatory variable—the variable .Xt —
testing the significance of the coefficient of determination amounts to testing the
significance of the coefficient .β assigned to .Xt .

2.4.4 Empirical Application

Let us go back to our example linking the returns of the European stock index
(REU RO) and the returns of the US stock index (RDJ ). The purpose is to apply
the tests of significance of .β and of the R-squared based on Fisher statistics.
Table 2.4 presents the calculations required to determine the explained sum of
squares (ESS) and the sum of squared residuals (RSS), the latter having already
been calculated.

Table 2.4 Fisher test

         REURO_t   (REÛRO_t − REURŌ)²   e_t       e_t²
1987.2 0.0404 0.0011 .−0.0037 1.3978E-05


1987.3 0.0277 0.0035 .−0.0427 1.8251E-03
1987.4 .−0.3619 0.1299 .−0.0128 1.6271E-04
1988.1 0.0792 0.0000 0.0618 3.8201E-03
1988.2 0.0807 0.0040 0.0062 3.8997E-05
... ... ... ... ...
2020.2 0.1488 0.0276 .−0.0287 8.2524E-04
2020.3 .−0.0126 0.0039 .−0.0860 7.3924E-03
2020.4 0.1065 0.0079 0.0062 3.8212E-05
2021.1 0.0982 0.0040 0.0234 5.4685E-04
2021.2 0.0364 0.0009 .−0.0042 1.7524E-05
Sum 1.5421 1.1418 0.0000 0.4322

The explained sum of squares is equal to .ESS = 1.1418 and the sum of squared
residuals is given by .RSS = 0.4322. The application of the formula (2.197) leads
to the following result:

F = 1.1418 / (0.4322/135) ≃ 356.68    (2.199)

At the 5% significance level, the value of the Fisher distribution F(1, 135) read
from the table is .3.842. Thus, we have .F ≃ 356.68 > 3.842, which means that we
reject the null hypothesis that .β = 0. The variable RDJ contributes significantly to
explaining REU RO, which of course confirms the results previously obtained.
It is also possible to calculate the F statistic from expression (2.198). We have
previously shown that .R 2 ≃ 0.7254. We thus have:

F = 0.7254 / [(1 − 0.7254)/135] ≃ 356.68    (2.200)

We obviously obtain the same value as with Eq. (2.197). Comparing, as before,
this value to the critical value at the 5% significance level, i.e., F(1, 135) = 3.842,
we have .356.68 > 3.842. We therefore reject the null hypothesis of nonsignificance
of the coefficient of determination. The coefficient of determination is significant,
which is equivalent to concluding that the variable RDJ matters in the explanation
of REU RO, since our model contains only one explanatory variable.
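The analysis of variance and the Fisher test can be coded directly from the definitions. The sketch below is an illustration (y and y_hat are assumed to hold the observed and fitted REURO series), not the book's own code.

```python
import numpy as np

def anova_table(y, y_hat):
    """Returns TSS, ESS, RSS, the R-squared and the F statistic to compare with F(1, T-2)."""
    T = len(y)
    resid = y - y_hat
    tss = np.sum((y - y.mean())**2)          # total sum of squares
    ess = np.sum((y_hat - y.mean())**2)      # explained sum of squares
    rss = np.sum(resid**2)                   # residual sum of squares
    r2 = 1.0 - rss / tss                     # equivalently ess / tss
    F = ess / (rss / (T - 2))                # F statistic under H0: beta = 0
    return tss, ess, rss, r2, F

# With the book's data: ESS ≈ 1.1418, RSS ≈ 0.4322, R² ≈ 0.7254, F ≈ 356.68.
```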

2.5 Prediction

Once the model has been estimated by the OLS method, it is possible to predict
the dependent variable. Suppose that the following model has been estimated for
.t = 1, . . . , T :

Yt = α + βXt + εt
. (2.201)

that is:

.Ŷt = α̂ + β̂Xt (2.202)

for .t = 1, . . . , T .
We seek to determine the forecast of the dependent variable for a horizon h,
i.e., .ŶT +h . Assuming that the relationship generating the explained variable remains
identical and the value of the explanatory variable is known in .T + h, we have:

.ŶT +h = α̂ + β̂XT +h (2.203)



It is possible to define the forecast error, noted .eT +h , by:

.eT +h = YT +h − ŶT +h = α + βXT +h + εT +h − α̂ − β̂XT +h (2.204)

which can also be expressed as:


 
e_{T+h} = ε_{T+h} − (α̂ − α) − (β̂ − β) X_{T+h}    (2.205)

In order to show that the forecast given by Eq. (2.203) is unbiased, let us calculate
the expectation of the expression (2.205):
   
E(e_{T+h}) = E[ε_{T+h} − (α̂ − α) − (β̂ − β) X_{T+h}]    (2.206)

Since .α̂ and .β̂ are unbiased estimators of .α and .β and given that .E (εT +h ) = 0,
we have:

E (eT +h ) = 0
. (2.207)

The forecast given by Eq. (2.203) is therefore unbiased. The prediction interval
is given by:
 
(α̂ + β̂X_{T+h}) ± σ_{e_{T+h}}    (2.208)

where σ_{e_{T+h}} designates the standard deviation of the forecast error. After calculating this standard deviation (see Appendix 2.1.5), we can write the 100(1 − p)% prediction interval⁷ for Y_{T+h}:

(α̂ + β̂X_{T+h}) ± t_{p/2} σ̂ε √(1 + 1/T + (X_{T+h} − X̄)² / Σ_{t=1}^{T} x_t²)    (2.209)

It is then possible to give a certain degree of confidence to the forecast if the


value of the dependent variable, for the considered horizon, lies within the prediction
interval. The length of this interval is not constant: the more the value of .XT +h
deviates from the mean .X̄ of the sample under consideration, the wider the interval.

Remark 2.12 The purpose may be not to predict the precise value of .YT +h , but its
average value instead. We then consider:

E (YT +h ) = α + βXT +h
. (2.210)

7 The demonstration is given in Appendix 2.1.5.



The forecast error is written:


e_{T+h} = E(Y_{T+h}) − Ŷ_{T+h} = −[(α̂ − α) + (β̂ − β) X_{T+h}]    (2.211)

and its variance is given by:


V(e_{T+h}) = σε² (1/T + (X_{T+h} − X̄)² / Σ_{t=1}^{T} x_t²) = σε² (1/T + (X_{T+h} − X̄)² / Σ_{t=1}^{T} (X_t − X̄)²)    (2.212)

The .100(1 − p)% prediction interval for .E (YT +h ) is therefore given by:


(α̂ + β̂X_{T+h}) ± t_{p/2} σ̂ε √(1/T + (X_{T+h} − X̄)² / Σ_{t=1}^{T} x_t²)    (2.213)

Example 2.3 Consider our example relating the returns of the European stock
index (REU RO) and the returns of the US stock market index (RDJ ) over the
period from the second quarter of 1987 to the second quarter of 2021. For this
period, we estimated the following relationship:


REÛRO_t = −0.0116 + 1.1559 RDJ_t    (2.214)

Assume that the returns on the US stock index increase by 2% in the third quarter
of 2021 compared to the previous quarter. Given that .RDJ2021.2 = 0.0451, we
deduce: .RDJ2021.3 = 0.0451 × 1.02 = 0.0460. Therefore, we can write:

REÛRO_{2021.3} = −0.0116 + 1.1559 × 0.0460 = 0.0416    (2.215)

Let us now determine the 95% prediction interval:



  
(α̂ + β̂ RDJ_{2021.3}) ± t_{0.025} σ̂ε √(1 + 1/137 + (RDJ_{2021.3} − RDJ̄)² / Σ_{t=1}^{137} (RDJ_t − RDJ̄)²)    (2.216)


We know that RDJ̄ = 0.0196 and that Σ_{t=1}^{137} (RDJ_t − RDJ̄)² = 0.8546. Moreover, we have already calculated σ̂ε = √0.0032. Knowing that t_{0.025}(135) = 1.96, we have:

(−0.0116 + 1.1559 × 0.0460) ± 1.96 × √0.0032 × √(1 + 1/137 + (0.0460 − 0.0196)²/0.8546)    (2.217)

which corresponds to the interval .[−0.0698; 0.1529]. If the value taken by REU RO
in the third quarter of 2021 does not lie within this interval, the forecast is incorrect.
This may be the case, for example, if the estimated model, valid until the second
quarter of 2021, is no longer valid for the third quarter of the same year. In other
words, such a situation may arise if the structure of the model has changed.
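The point forecast and the 95% prediction interval of Eq. (2.209) can be computed as in the following sketch (illustrative, not from the book; x is assumed to contain the in-sample RDJ returns, sig2 the estimated error variance, and x_new the assumed value of the regressor for the forecast quarter).

```python
import numpy as np

def forecast_with_interval(x, alpha_hat, beta_hat, sig2, x_new, t_crit=1.96):
    """h-step forecast of Y and its prediction interval, following Eq. (2.209)."""
    T = len(x)
    sxx = np.sum((x - x.mean())**2)
    y_fcst = alpha_hat + beta_hat * x_new
    half_width = t_crit * np.sqrt(sig2) * np.sqrt(1.0 + 1.0/T + (x_new - x.mean())**2 / sxx)
    return y_fcst, (y_fcst - half_width, y_fcst + half_width)

# With the book's numbers (x_new = 0.0460, sig2 = 0.0032), the point forecast is
# about 0.0416 and the interval close to [-0.0698, 0.1529].
```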

2.6 Some Extensions of the Simple Regression Model

So far, we have focused on the model:

Yt = α + βXt + εt
. (2.218)

which is linear with respect to the parameters .α and .β, but also with respect to
the variables .Yt and .Xt . We now propose to briefly study models frequently used
in economics, which can be nonlinear with respect to the variables .Yt and .Xt , but
linear with respect to the parameters, or can become so after certain appropriate
transformations of the variables. As an example, the model:

. log Yt = α + βXt + εt (2.219)

can also be written:

Zt = α + βXt + εt
. (2.220)

with .Zt = log Yt .


The model (2.219) is nonlinear with respect to the variables .Yt and .Xt , but it
is linear with respect to the variables .Zt and .Xt . The model (2.220) can then be
studied using the methodology presented in this chapter. The transformation of the
variable .Yt into the variable .Zt has allowed us to obtain, from a nonlinear model with
respect to the variables, a linear model with respect to the transformed variables. In
our example, only one of the two variables has been transformed, but there are also
cases where both variables must undergo transformations in order to obtain a linear
model.

2.6.1 Log-Linear Model

The log-linear model, also known as log-log model or double-log model, is given
by:

. log Yt = log α + β log Xt + εt (2.221)

By noting .α0 = log α, we get:

. log Yt = α0 + β log Xt + εt (2.222)

This model is linear in the parameters .α0 and .β. Furthermore, let us posit:

. Yt∗ = log Yt and Xt∗ = log Xt (2.223)

The model (2.221) can therefore be written:

.Yt∗ = α0 + βXt∗ + εt (2.224)

which is a linear model in the variables .Yt∗ and .Xt∗ and in the parameters .α0 and .β.
It is then possible to apply to this model the methodology presented in this chapter
in order to estimate the parameters .α0 and .β by OLS.
One of the interests of the log-log model is that the coefficient .β measures the
elasticity of .Yt with respect to .Xt , i.e., the percentage change in .Yt for a given
percentage of variation in .Xt . It is thus a constant elasticity model.
For example, if .Yt denotes the quantity of a given good and .Xt the unit price of
this good, the coefficient .β represents the price elasticity of demand. Similarly, if
.Yt designates household consumption and .Xt the income of these same households,

the coefficient .β measures the elasticity of consumption with respect to income:


estimating this coefficient allows us to determine how much consumption varies in
response to a certain change in income.

Example 2.4 Let us take the example of the consumption and gross disposable
income series of French households already studied in Chap. 1 and consider the
following model:

. log Ct = log α + β log Yt + εt (2.225)

where .Ct denotes consumption and .Yt income. The data are annual and the study
period runs from 1990 to 2019. In order to estimate this model, we simply take the
logarithm of the raw consumption and income data and apply the OLS method to
the transformed model. The estimation leads to the following results:


log Ĉ_t = 1.5552 + 0.8796 log Y_t    (2.226)
          (4.67)    (36.87)

where the numbers in parentheses correspond to the t-statistics of the estimated


coefficients. These results show that if income increases by 1% on average,
consumption increases by 0.88%.
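The log-log regression of Example 2.4 amounts to applying OLS to the log-transformed series. The following sketch is illustrative (cons and income are assumed to be numpy arrays containing the raw annual series, which are not reproduced here).

```python
import numpy as np

def loglog_elasticity(cons, income):
    """Estimates log C_t = alpha0 + beta * log Y_t; beta is the income elasticity."""
    y, x = np.log(cons), np.log(income)
    beta_hat = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
    alpha0_hat = y.mean() - beta_hat * x.mean()
    return alpha0_hat, beta_hat

# With the book's data, beta_hat should be close to 0.88: a 1% rise in income
# is associated with a rise in consumption of about 0.88%.
```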

Remark 2.13 The log-log model can be understood from the Box-Cox transfor-
mation (see Box and Cox, 1964). For a variable .Yt , this transformation is given
by:
Y_t^(λ) = (Y_t^λ − 1)/λ  if λ ≠ 0,   and   Y_t^(λ) = log Y_t  if λ = 0    (2.227)

where Y_t^(λ) is the transformed variable. The Box-Cox transformation thus depends on a single parameter, noted λ.
Let Y_t^(λ_Y) be the transformation of the variable Y_t and let X_t^(λ_X) be the transformation of the variable X_t:

Y_t^(λ_Y) = (Y_t^{λ_Y} − 1)/λ_Y  if λ_Y ≠ 0,   and   Y_t^(λ_Y) = log Y_t  if λ_Y = 0    (2.228)

X_t^(λ_X) = (X_t^{λ_X} − 1)/λ_X  if λ_X ≠ 0,   and   X_t^(λ_X) = log X_t  if λ_X = 0    (2.229)

The log-log model corresponds to the case where λ_Y = λ_X = 0.
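As a small illustration of Eq. (2.227), the function below applies the Box-Cox transformation for a given λ; λ = 0 returns the logarithm used by the log-log model. It is a sketch, not code from the book.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transformation of a positive series y for parameter lam (Eq. (2.227))."""
    y = np.asarray(y, dtype=float)
    if lam == 0:                 # limiting case: log transformation
        return np.log(y)
    return (y**lam - 1.0) / lam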

2.6.2 Semi-Log Model

The semi-log model is given by:

. log Yt = α + βXt + εt (2.230)

This is a linear model with respect to the parameters .α and .β and with respect to
the variables .log Yt and .Xt . The special feature of this model lies in the fact that only
the dependent variable is in logarithms. After transforming the endogenous variable
into a logarithm, it is possible to apply to this model the methodology presented in
this chapter to estimate the parameters .α and .β by OLS.
In the semi-log model, the coefficient .β measures the rate of change of .Yt
relative to the variation of .Xt ; this rate of change being constant. In other words,
the coefficient .β is equal to the ratio between the relative variation of .Yt and the
absolute variation of .Xt . .β is the semielasticity of .Yt with respect to .Xt .
If the explanatory variable is time, the model is written:

. log Yt = α + βt + εt (2.231)

or, leaving aside the error term:

Yt = exp (α + βt)
. (2.232)

This model describes the evolution of the variable .Yt , having a constant growth
rate if .β > 0, or constant decrease if .β < 0. Let us explain this. The model (2.232)
describes an evolution in continuous time and can be written:

Yt = exp (α + βt) = Y0 exp (βt)


. (2.233)

where .Y0 = exp(α) is the value of .Yt at date .t = 0. The coefficient .β is thus equal
to:
β = (1/Y_t) (dY_t/dt)    (2.234)

Consequently, the coefficient .β represents the instantaneous growth rate of Y at


date t.
If we now consider discrete time, assuming that t denotes, for example, months,
quarters, or years, we can write:

Yt = Y0 (1 + g)t
. (2.235)

where g is the growth rate of Y . Transforming this expression into logarithmic terms
gives:

. log Yt = log Y0 + t log (1 + g) (2.236)

By positing .log Y0 = α and .log (1 + g) = β and adding the error term, we find
model (2.231). The relationship:

. log (1 + g) = β (2.237)

allows us to obtain an estimate of the growth rate g from the estimate of the coefficient β. The coefficient β is interpreted as the continuous growth rate that would give,
at the end of a period, the same result as a single increase at the rate g.

Example 2.5 Let us take the example of the French household consumption series
(Ct ) over the period 1990–2019 at annual frequency and consider the following
.

model:

. log Ct = α + βt + εt (2.238)

where t denotes time, i.e., .t = 0, 1, 2, . . . , 29. The OLS estimation of this model
leads to the following results:


log Ĉ_t = 13.6194 + 0.0140 t    (2.239)
          (1292.53)  (22.49)

From this estimation, we deduce that log(1 + ĝ) = 0.0140, where ĝ is the
estimated growth rate. Hence, .ĝ = 0.0141. Over the period 1990–2019, French
household consumption increased annually at a rate of 1.41%.
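The semi-log trend regression of Example 2.5 and the implied growth rate can be obtained as in the sketch below (illustrative, not the book's code; cons is assumed to be the annual consumption series).

```python
import numpy as np

def trend_growth_rate(cons):
    """Estimates log C_t = alpha + beta*t and recovers g from log(1 + g) = beta."""
    y = np.log(cons)
    t = np.arange(len(y), dtype=float)
    beta_hat = np.cov(t, y, ddof=0)[0, 1] / np.var(t)
    alpha_hat = y.mean() - beta_hat * t.mean()
    g_hat = np.exp(beta_hat) - 1.0
    return alpha_hat, beta_hat, g_hat

# With the book's data: beta_hat ≈ 0.0140 and g_hat ≈ 0.0141, i.e. about 1.41% per year.
```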

Remark 2.14 The semi-log model can be understood from the Box-Cox transfor-
mation, noting that .λY = 0 and .λX = 1.

2.6.3 Reciprocal Model

The reciprocal model is written:


 
Y_t = α + β (1/X_t) + ε_t    (2.240)

This model is linear with respect to the parameters .α and .β. Such a model can
be estimated by OLS following the methodology described in this chapter and after
transforming the variable X_t into its inverse.
According to this model, when the variable X_t tends to infinity, the term β(1/X_t) tends to zero and α is therefore the asymptotic limit of Y_t when X_t tends to infinity.
In addition, the slope of the model (2.240) is given by:
 
dY_t/dX_t = −β (1/X_t²)    (2.241)

Therefore, if .β > 0, the slope is always negative, and if .β < 0, the slope is
always positive.
This type of model, represented in Fig. 2.11 for .β > 0, can be illustrated by the
Phillips curve. This curve originally related the growth rate of nominal wages to the
unemployment rate. It was subsequently transformed into a relationship between
the inflation rate and the unemployment rate. This Phillips curve can be estimated
by regressing the inflation rate on the inverse of the unemployment rate, with the
inflation rate tending asymptotically towards the estimated value of .α.

Fig. 2.11 Reciprocal model (β > 0)

Example 2.6 For example, suppose that, for a given country, the regression of
the inflation rate .(πt ) on the inverse of the unemployment rate .(ut ) leads to the
following results:
 
π̂_t = −2.3030 + 20.0103 (1/u_t)    (2.242)

These results show that even if the unemployment rate rises indefinitely, the
largest change in prices will be a drop in the inflation rate of about 2.30 points.
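Estimating a reciprocal model such as this Phillips curve only requires transforming the regressor into its inverse before applying OLS, as in the sketch below (illustrative; infl and unemp are assumed arrays of inflation and unemployment rates, not the data behind Eq. (2.242)).

```python
import numpy as np

def reciprocal_fit(infl, unemp):
    """Fits pi_t = alpha + beta * (1 / u_t); alpha is the asymptotic inflation level."""
    z = 1.0 / unemp
    beta_hat = np.cov(z, infl, ddof=0)[0, 1] / np.var(z)
    alpha_hat = infl.mean() - beta_hat * z.mean()
    return alpha_hat, beta_hat
```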

Remark 2.15 The reciprocal model corresponds to the case where .λY = 1 and
λX = −1 in the Box-Cox transformation.
.

2.6.4 Log-Inverse or Log-Reciprocal Model

The log-inverse or log-reciprocal model is given by:


 
log Y_t = α − β (1/X_t) + ε_t    (2.243)

Ignoring the error term, this model can still be written:


  
Y_t = exp(α − β (1/X_t))    (2.244)

When .Xt tends to zero by positive values . Xt → 0+ , .Yt tends to zero.


Furthermore, the slope is given by:
    
dY_t/dX_t = (β/X_t²) exp(α − β (1/X_t))    (2.245)

Fig. 2.12 Log-inverse model

It is positive if .β > 0. Moreover, the second derivative is written:


    
d²Y_t/dX_t² = (β²/X_t⁴ − 2β/X_t³) exp(α − β (1/X_t))    (2.246)

The cancellation of this second derivative, i.e., β²/X_t⁴ − 2β/X_t³ = 0, or equivalently β² − 2βX_t = 0, shows that there is an inflection point at X_t = β/2. Moreover, when X_t tends to infinity, Y_t tends to exp(α) by virtue of (2.244). By replacing X_t with β/2 in Eq. (2.244), the value of Y_t at the inflection point is given by:

Yt = exp (α − 2) = 0.135 exp (α)


. (2.247)

As shown in Fig. 2.12, we see that, initially, .Yt grows at an increasing rate (the
curve is convex), then, after the inflection point, the variable grows at a decreasing
rate.

Remark 2.16 The log-reciprocal model corresponds to the case where .λY = 0 and
λX = −1 in the Box-Cox transformation.
.

Conclusion

This chapter has presented the basic model of econometrics, namely, the simple
regression model. In this model, only one explanatory variable is introduced. In
practice, however, it is rare that a single variable can explain the behavior of the
dependent variable. It is possible, then, to refine the study of the dynamics of the
dependent variable by adding explanatory variables to the model. This is known as
a multiple regression model. This model is the subject of the next chapter.

The Gist of the Chapter

Simple regression model        Y_t = α + βX_t + ε_t
Variables                      Explained (dependent) variable: Y_t
                               Explanatory (independent) variable: X_t
                               Error: ε_t
Hypotheses                     Zero mean error: E(ε_t) = 0 ∀t
                               Non-autocorrelation and homoskedasticity:
                               E(ε_t ε_t') = 0 for t ≠ t', and E(ε_t²) = σε²
                               Normality: ε_t ∼ N(0, σε²)
Regression line                Ŷ_t = α̂ + β̂X_t
Residuals                      e_t = Y_t − Ŷ_t = Y_t − α̂ − β̂X_t
OLS estimators                 α̂ = Ȳ − β̂X̄
                               β̂ = Cov(X_t, Y_t) / V(X_t)
                               σ̂ε² = (1/(T − 2)) Σ_{t=1}^{T} e_t²
t-Statistic                    t_β̂ = β̂ / σ_β̂
Coefficient of determination   R² = V(Ŷ_t)/V(Y_t) = Σ_{t=1}^{T}(Ŷ_t − Ȳ)² / Σ_{t=1}^{T}(Y_t − Ȳ)²
                               = 1 − Σ_{t=1}^{T} e_t² / Σ_{t=1}^{T}(Y_t − Ȳ)², with 0 ≤ R² ≤ 1

Further Reading

Developments on the linear regression model and the ordinary least squares method
can be found in any econometrics textbook (see the references cited at the end of the
book), including Johnston and Dinardo (1996), Davidson and MacKinnon (1993),
or Greene (2020). For a more mathematical presentation, see, for example, Florens
et al. (2007).
For further developments related to tests and laws, readers may refer to Lehmann
(1959), Rao (1965), Kmenta (1971), Mood et al. (1974), or Hurlin and Mignon
(2022).
For extensions of the linear regression model, interested readers can refer to
Davidson and MacKinnon (1993) or Gujarati et al. (2017). Nonlinear regression
models are discussed in Goldfeld and Quandt (1972), Gallant (1987), Pindyck and
Rubinfeld (1991), Davidson and MacKinnon (1993), or Gujarati et al. (2017).

Appendix 2.1: Demonstrations


Appendix 2.1.1: Demonstration of the Linearity of the OLS
Estimators

In order to demonstrate the linearity of the OLS estimators and in particular of .β̂,
let us consider the centered variables:

xt = Xt − X̄
. (2.248)

and

. yt = Yt − Ȳ (2.249)

In this case, the estimator .β̂ is given by:


β̂ = Σ_{t=1}^{T} x_t y_t / Σ_{t=1}^{T} x_t² = Σ_{t=1}^{T} x_t (Y_t − Ȳ) / Σ_{t=1}^{T} x_t² = Σ_{t=1}^{T} x_t Y_t / Σ_{t=1}^{T} x_t² − Ȳ × Σ_{t=1}^{T} x_t / Σ_{t=1}^{T} x_t²    (2.250)

Thus:


Σ_{t=1}^{T} x_t = Σ_{t=1}^{T} (X_t − X̄) = Σ_{t=1}^{T} X_t − T X̄ = 0

Hence:⁸

β̂ = Σ_{t=1}^{T} x_t Y_t / Σ_{t=1}^{T} x_t² = Σ_{t=1}^{T} w_t Y_t    (2.251)

with:
w_t = x_t / Σ_{t=1}^{T} x_t²    (2.252)

The expression (2.251) reflects the fact that .β̂ is a linear estimator of .β: .β̂
appears as a linear function of the dependent variable .Yt . We can also highlight

⁸ Since X_t is nonrandom, so is x_t.

a certain number of characteristics of the weighting coefficients .wt that can be


grouped under the following property.

Property 2.12 By virtue of the definition of .wt (Eq. (2.252)), we can write:


Σ_{t=1}^{T} w_t = Σ_{t=1}^{T} x_t / Σ_{t=1}^{T} x_t² = 0    (2.253)

In addition:


Σ_{t=1}^{T} w_t x_t = Σ_{t=1}^{T} w_t (X_t − X̄) = Σ_{t=1}^{T} w_t X_t − X̄ Σ_{t=1}^{T} w_t = Σ_{t=1}^{T} w_t X_t    (2.254)

And:


Σ_{t=1}^{T} w_t x_t = Σ_{t=1}^{T} (x_t / Σ_{t=1}^{T} x_t²) x_t = Σ_{t=1}^{T} x_t² / Σ_{t=1}^{T} x_t² = 1    (2.255)

So:


Σ_{t=1}^{T} w_t x_t = Σ_{t=1}^{T} w_t X_t = 1    (2.256)

We also have:
Σ_{t=1}^{T} w_t² = Σ_{t=1}^{T} (x_t / Σ_{t=1}^{T} x_t²)² = Σ_{t=1}^{T} x_t² / (Σ_{t=1}^{T} x_t²)² = 1 / Σ_{t=1}^{T} x_t²    (2.257)

The linearity of the estimator .α̂ can also be demonstrated by noting that:

α̂ = Ȳ − β̂X̄ = (1/T) Σ_{t=1}^{T} Y_t − X̄ Σ_{t=1}^{T} w_t Y_t    (2.258)

and using relation (2.251). We can therefore write:

α̂ = Σ_{t=1}^{T} (1/T − X̄ w_t) Y_t    (2.259)

which shows that .α̂ is a linear function of .Yt : .α̂ is a linear estimator of .α.

Appendix 2.1.2: Demonstration of the Unbiasedness Property of the


OLS Estimators
 
Let us prove that .β̂ is an unbiased estimator of .β, that is, .E β̂ = β. From
Eq. (2.251), we have:


β̂ = Σ_{t=1}^{T} w_t Y_t = Σ_{t=1}^{T} w_t (α + βX_t + ε_t) = α Σ_{t=1}^{T} w_t + β Σ_{t=1}^{T} w_t X_t + Σ_{t=1}^{T} w_t ε_t    (2.260)

By virtue of Eqs. (2.253) and (2.256), we can write:


β̂ = β + Σ_{t=1}^{T} w_t ε_t    (2.261)

Let us calculate the mathematical expectation of this expression:

  
E(β̂) = E(β + Σ_{t=1}^{T} w_t ε_t) = β + E(Σ_{t=1}^{T} w_t ε_t)    (2.262)

This can also be written, noting that .wt is nonrandom:

  
E(β̂) = β + Σ_{t=1}^{T} w_t E(ε_t)    (2.263)

Since .E (εt ) = 0, we deduce:


 
E(β̂) = β    (2.264)

It follows that .β̂ is an unbiased estimator of .β.



In order to show that .α̂ is also an unbiased estimator of .α, let us start again from
the linearity property:

α̂ = Σ_{t=1}^{T} (1/T − X̄ w_t) Y_t    (2.265)

that is:
α̂ = Σ_{t=1}^{T} (1/T − X̄ w_t) (α + βX_t + ε_t)    (2.266)


α̂ = α − X̄ α Σ_{t=1}^{T} w_t + β Σ_{t=1}^{T} X_t/T − X̄ β Σ_{t=1}^{T} w_t X_t + Σ_{t=1}^{T} (1/T − X̄ w_t) ε_t    (2.267)


Given that, in accordance with Property 2.12, Σ_{t=1}^{T} w_t = 0 and Σ_{t=1}^{T} w_t X_t = 1, we deduce:

α̂ = α + Σ_{t=1}^{T} (1/T − X̄ w_t) ε_t    (2.268)

Let us take the mathematical expectation of this expression:

E(α̂) = E(α + Σ_{t=1}^{T} (1/T − X̄ w_t) ε_t)    (2.269)

which can also be written as:


E(α̂) = α + Σ_{t=1}^{T} (1/T − X̄ w_t) E(ε_t)    (2.270)

Since .E (εt ) = 0, we obtain the result we are looking for:

E(α̂) = α    (2.271)

Appendix 2.1.3: Demonstration of the Consistency and Minimum


Variance Property of the OLS Estimators

Let us start by showing that the OLS estimators .α̂ and .β̂ are consistent estimators,
that is, their variance tends to zero when T tends to infinity, i.e.:

lim_{T→∞} V(α̂) = 0   and   lim_{T→∞} V(β̂) = 0    (2.272)

Let us calculate .V (β̂). We have, by definition:


V(β̂) = E[β̂ − E(β̂)]² = E[β̂ − β]²    (2.273)

since .β̂ is an unbiased estimator of .β. Using (2.262), we can write:

V(β̂) = E(Σ_{t=1}^{T} w_t ε_t)² = E(Σ_{t=1}^{T} w_t² ε_t² + 2 Σ_{t<t'} w_t w_{t'} ε_t ε_{t'})    (2.274)

V(β̂) = Σ_{t=1}^{T} w_t² E(ε_t²) + 2 Σ_{t<t'} w_t w_{t'} E(ε_t ε_{t'})    (2.275)

We know that:
 
E εt2 = σε2
. (2.276)

where .σε2 denotes the variance of the error, and:

E (εt εt ' ) = 0 ∀t /= t '


. (2.277)

We deduce:


V(β̂) = σε² Σ_{t=1}^{T} w_t²    (2.278)

which can be written using (2.257):

V(β̂) = σε² / Σ_{t=1}^{T} x_t²    (2.279)


Given that Σ_{t=1}^{T} x_t² = Σ_{t=1}^{T} (X_t − X̄)² = T·V(X_t), we deduce the following relationship:

V(β̂) = σε² / (T·V(X_t))    (2.280)

Thus, if .T → ∞, then .V (β̂) → 0, which implies that .β̂ is a consistent estimator.


We also note that the variance of the estimator .β̂ is smaller when the variance of the
explanatory variable is larger.
Let us now calculate .V (α̂) to show that .α̂ is also a consistent estimator. As before,
using the fact that .α̂ is an unbiased estimator, we have by definition:
V(α̂) = E[α̂ − E(α̂)]² = E[α̂ − α]²    (2.281)

Using (2.268), we can write:

V(α̂) = E[Σ_{t=1}^{T} (1/T − X̄ w_t) ε_t]²    (2.282)

T 
 2  1  
1 1
V (α̂) = E
. − X̄wt εt2 +2 − X̄wt − X̄wt ' εt εt '
T T T
t=1 t<t '
(2.283)

Or:
T 
 2    1  
1 1
V (α̂) =
. − X̄wt E εt2 + 2 − X̄wt − X̄wt ' E (εt εt ' )
T '
T T
t=1 t<t
(2.284)

Hence:
T 
 2 T 
 
1 1 1
V (α̂) = σε2
. − X̄wt = σε2 − 2 X̄w t + X̄ 2 2
w t (2.285)
T T2 T
t=1 t=1

1  T
1 
T
V (α̂) = σε2
. + X̄2 wt2 − 2 X̄ wt (2.286)
T T
t=1 t=1


So, using (2.257) and noting that Σ_{t=1}^{T} x_t² = T·V(X_t):

V(α̂) = σε² (1/T + X̄²/Σ_{t=1}^{T} x_t²) = σε² (Σ_{t=1}^{T} x_t² + T X̄²)/(T Σ_{t=1}^{T} x_t²) = σε² Σ_{t=1}^{T} X_t²/(T² V(X_t))    (2.287)

Thus, if .T → ∞, then .V (α̂) → 0, which implies that .α̂ is a consistent estimator.


It remains to be shown that the OLS estimators .α̂ and .β̂ are estimators of
minimum variance among the class of linear unbiased estimators. Let us start by
treating the case of the estimator .β̂. In general and by definition, a linear estimator
of .β can be written as:


T
β∗ =
. γt Y t (2.288)
t=1

where the .γt are weighting coefficients that must be determined. Given that .Yt =
α + βXt + εt , we have:


T
β∗ =
. γt (α + βXt + εt ) (2.289)
t=1

that is:


T 
T 
T
β∗ = α
. γt + β γt Xt + γt εt (2.290)
t=1 t=1 t=1

Let us determine the mathematical expectation of this expression:


T 
T 
T
E β∗ = E α
. γt + β γt Xt + γt εt (2.291)
t=1 t=1 t=1

By distributing the expectation operator and using the fact that .E (εt ) = 0, we
get:


T 
T
E β∗ = α
. γt + β γt Xt (2.292)
t=1 t=1

β ∗ is an unbiased estimator of .β if .E (β ∗ ) = β, i.e., if:


.


T
. γt = 0 (2.293)
t=1

and


T
. γt Xt = 1 (2.294)
t=1

If these two conditions hold, we have according to (2.290):


T
. β∗ = β + γt εt (2.295)
t=1

and the variance is given by:

2

T
∗ ∗ 2
V β
. =E β −β =E γt εt (2.296)
t=1

By applying the same reasoning as that used to demonstrate the consistency of


the estimator .β̂, we obtain:


T
V (β ∗ ) = σε2
. γt2 (2.297)
t=1

We must therefore compare this variance with that of the OLS estimator, i.e.,
V (β̂) given by:
.


T
V (β̂) = σε2
. wt2 (2.298)
t=1

To this end, let us posit:

γt = wt + (γt − wt )
. (2.299)

We have:


T 
T 
T 
T 
T
. γt2 = (wt + (γt − wt ))2 = wt2 + (γt − wt )2 + 2 wt (γt − wt )
t=1 t=1 t=1 t=1 t=1
(2.300)

According to (2.257):


T
1
. wt2 = (2.301)

T
t=1 xt2
t=1


T 
T
and, in line with (2.252) and using the fact that . γt Xt = γt xt = 1:
t=1 t=1


T
xt γt

T
t=1 1
. wt γt = = (2.302)
T 
T
t=1 xt2 xt2
t=1 t=1

According to (2.300), we have:


T 
T 
T 
T
2
. wt (γt − wt ) = γt2 − wt2 − (γt − wt )2 (2.303)
t=1 t=1 t=1 t=1


T 
T
= −2 wt2 + 2 wt γt
t=1 t=1

Hence:


T
1 1
. wt (γt − wt ) = − + =0 (2.304)

T 
T
t=1 xt2 xt2
t=1 t=1

We have:


T 
T 
T 
T
V (β ∗ ) = σε2
. γt2 = σε2 (wt + (γt − wt ))2 = σε2 wt2 + (γt − wt )2
t=1 t=1 t=1 t=1
(2.305)

So, using (2.278):


V(β*) = V(β̂) + σε² Σ_{t=1}^{T} (γ_t − w_t)²    (2.306)


Since Σ_{t=1}^{T} (γ_t − w_t)² ≥ 0, we have:

V(β*) ≥ V(β̂)    (2.307)

It follows that, among the class of unbiased estimators, the OLS estimator .β̂ is
the one with the lowest variance.
By applying similar reasoning, it is possible to show that the OLS estimator .α̂
also satisfies the same property.

Appendix 2.1.4: Calculation of the Estimator of the Variance of the Error Term

We seek to determine an estimator .σ̂ε2 of the variance of the error term .σε2 . By
definition, the residuals are given by:

e_t = Y_t − Ŷ_t = α + βX_t + ε_t − α̂ − β̂X_t   (2.308)

e_t = ε_t − (α̂ − α) − (β̂ − β) X_t   (2.309)

We also have:

Ȳ = (1/T) Σ_{t=1}^T Y_t = (1/T) Σ_{t=1}^T (α + βX_t + ε_t) = α + βX̄ + ε̄   (2.310)

and:

y_t = Y_t − Ȳ = α + βX_t + ε_t − (α + βX̄ + ε̄) = β(X_t − X̄) + (ε_t − ε̄)   (2.311)

hence:

y_t = βx_t + (ε_t − ε̄)   (2.312)

Given that:

e_t = y_t − ŷ_t = y_t − β̂x_t = βx_t + (ε_t − ε̄) − β̂x_t   (2.313)

we deduce:

e_t = (ε_t − ε̄) − (β̂ − β) x_t   (2.314)

In order to introduce the sum of squared residuals, which is necessary to calculate


the estimator of the variance of the error term, this expression is squared and
summed:
Σ_{t=1}^T e_t² = Σ_{t=1}^T [ (ε_t − ε̄) − (β̂ − β) x_t ]²   (2.315)

which gives:

Σ_{t=1}^T e_t² = Σ_{t=1}^T (ε_t − ε̄)² + (β̂ − β)² Σ_{t=1}^T x_t² − 2 (β̂ − β) Σ_{t=1}^T (ε_t − ε̄) x_t   (2.316)

Let us calculate the mathematical expectation of the different terms of this


equation:


E[ Σ_{t=1}^T (ε_t − ε̄)² ] = E[ Σ_{t=1}^T ε_t² − T ε̄² ]
                           = Σ_{t=1}^T E(ε_t²) − (1/T) E[ ( Σ_{t=1}^T ε_t )² ]
                           = Σ_{t=1}^T E(ε_t²) − (1/T) [ Σ_{t=1}^T E(ε_t²) + 2 Σ_{t≠t'} E(ε_t ε_t') ]   (2.317)

Given that E(ε_t ε_t') = 0 for t ≠ t', we have:

E[ Σ_{t=1}^T (ε_t − ε̄)² ] = (T − 1) σ_ε²   (2.318)

Now consider the second term of (2.316):

E[ (β̂ − β)² Σ_{t=1}^T x_t² ] = Σ_{t=1}^T x_t² E(β̂ − β)²   (2.319)

We know that:

E(β̂ − β)² = V(β̂) = σ_ε² / Σ_{t=1}^T x_t²   (2.320)

Hence:

E[ (β̂ − β)² Σ_{t=1}^T x_t² ] = σ_ε²   (2.321)

In addition:

(β̂ − β) Σ_{t=1}^T (ε_t − ε̄) x_t = ( Σ_{t=1}^T w_t ε_t ) ( Σ_{t=1}^T ε_t x_t − ε̄ Σ_{t=1}^T x_t )   (2.322)

because β̂ − β = Σ_{t=1}^T w_t ε_t according to (2.261). Furthermore, by virtue of (2.252) and since Σ_{t=1}^T x_t = 0, we have:

(β̂ − β) Σ_{t=1}^T (ε_t − ε̄) x_t = ( Σ_{t=1}^T x_t ε_t / Σ_{t=1}^T x_t² ) Σ_{t=1}^T ε_t x_t = ( Σ_{t=1}^T ε_t x_t )² / Σ_{t=1}^T x_t²   (2.323)

Let us take the expectation of this expression:

E[ (β̂ − β) Σ_{t=1}^T (ε_t − ε̄) x_t ] = E[ ( Σ_{t=1}^T ε_t x_t )² / Σ_{t=1}^T x_t² ]
                                      = ( 1 / Σ_{t=1}^T x_t² ) E[ Σ_{t=1}^T ε_t² x_t² + 2 Σ_{t≠t'} ε_t ε_t' x_t x_t' ]   (2.324)

Noting that E[ Σ_{t=1}^T ε_t² x_t² ] = Σ_{t=1}^T x_t² E(ε_t²) = σ_ε² Σ_{t=1}^T x_t², we deduce:

E[ (β̂ − β) Σ_{t=1}^T (ε_t − ε̄) x_t ] = σ_ε²   (2.325)

If we take Eq. (2.316), using relations (2.318), (2.321), and (2.325), we obtain:


E[ Σ_{t=1}^T e_t² ] = (T − 1) σ_ε² + σ_ε² − 2σ_ε² = (T − 2) σ_ε²   (2.326)

We finally deduce the estimator σ̂_ε² of the variance of the error term:

σ̂_ε² = ( 1 / (T − 2) ) Σ_{t=1}^T e_t²   (2.327)

This is an unbiased estimator of .σε2 .
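As a quick numerical illustration, the following minimal Python sketch (the simulated sample size, parameter values, and number of replications are arbitrary assumptions) checks by Monte Carlo that dividing the residual sum of squares by T − 2 gives an approximately unbiased estimate of σ_ε²:

import numpy as np

# Simulated simple regression Y_t = alpha + beta*X_t + eps_t (hypothetical values)
rng = np.random.default_rng(0)
T, alpha, beta, sigma = 50, 1.0, 0.5, 2.0
X = rng.normal(size=T)

estimates = []
for _ in range(5000):
    eps = rng.normal(scale=sigma, size=T)
    Y = alpha + beta * X + eps
    x, y = X - X.mean(), Y - Y.mean()
    beta_hat = np.sum(x * y) / np.sum(x**2)      # OLS slope
    alpha_hat = Y.mean() - beta_hat * X.mean()   # OLS intercept
    e = Y - alpha_hat - beta_hat * X             # residuals
    estimates.append(np.sum(e**2) / (T - 2))     # estimator of Eq. (2.327)

print(np.mean(estimates))  # close to sigma**2 = 4

The average of the estimates over the replications should be close to the true error variance, in line with the unbiasedness result.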

Appendix 2.1.5: Calculation of the Standard Deviation of the Forecast Error and Prediction Interval

In order to determine a prediction interval, the variance of the forecast error must
be calculated. We have:
V(e_{T+h}) = E[ ε_{T+h} − (α̂ − α) − (β̂ − β) X_{T+h} ]²   (2.328)
           = E[ ε_{T+h} − ( (α̂ − α) + (β̂ − β) X_{T+h} ) ]²

V(e_{T+h}) = E(ε_{T+h})² − 2 E[ ε_{T+h} ( (α̂ − α) + (β̂ − β) X_{T+h} ) ] + E[ (α̂ − α) + (β̂ − β) X_{T+h} ]²   (2.329)

Knowing that E(ε_{T+h} X_{T+h}) = 0, E(α̂ − α)² = V(α̂), E(β̂ − β)² = V(β̂), and E[(α̂ − α)(β̂ − β)] = E[(α̂ − E(α̂))(β̂ − E(β̂))] = cov(α̂, β̂), we have:

V(e_{T+h}) = V(ε_{T+h}) + V(α̂) + X²_{T+h} V(β̂) + 2 X_{T+h} cov(α̂, β̂)   (2.330)

Let us calculate the covariance between α̂ and β̂:

cov(α̂, β̂) = E[(α̂ − α)(β̂ − β)] = E[ (Ȳ − β̂X̄ − Ȳ + βX̄ + ε̄)(β̂ − β) ]   (2.331)

that is:

cov(α̂, β̂) = E[ ( ε̄ − (β̂ − β) X̄ )(β̂ − β) ]   (2.332)

Since E[ ε̄ (β̂ − β) ] = 0, we have:

cov(α̂, β̂) = −X̄ E(β̂ − β)²   (2.333)

From (2.279), we know that:

E(β̂ − β)² = V(β̂) = σ_ε² / Σ_{t=1}^T x_t²   (2.334)

Hence:

cov(α̂, β̂) = −X̄ σ_ε² / Σ_{t=1}^T x_t²   (2.335)

Furthermore, according to (2.287):

V(α̂) = σ_ε² ( 1/T + X̄² / Σ_{t=1}^T x_t² )   (2.336)

Transferring these various expressions into (2.330), we obtain:

V(e_{T+h}) = σ_ε² [ 1 + 1/T + X̄² / Σ_{t=1}^T x_t² + X²_{T+h} / Σ_{t=1}^T x_t² − 2 X_{T+h} X̄ / Σ_{t=1}^T x_t² ]   (2.337)

or:

V(e_{T+h}) = σ_ε² [ 1 + 1/T + (X_{T+h} − X̄)² / Σ_{t=1}^T x_t² ] = σ_ε² [ 1 + 1/T + (X_{T+h} − X̄)² / Σ_{t=1}^T (X_t − X̄)² ]   (2.338)

The relationship (2.338) shows that the variance of the forecast error is an increasing function of the squared deviation between the value of the explanatory variable in T + h and its mean: the more the value of X in T + h deviates from the mean, the higher the variance of the forecast error. The forecast error being a linear function of variables following normal distributions (relation (2.205)), it is normally distributed:

e_{T+h} / [ σ_ε √( 1 + 1/T + (X_{T+h} − X̄)² / Σ_{t=1}^T x_t² ) ] ∼ N(0, 1)   (2.339)

By replacing σ_ε with its estimator σ̂_ε, we obtain:

e_{T+h} / [ σ̂_ε √( 1 + 1/T + (X_{T+h} − X̄)² / Σ_{t=1}^T x_t² ) ] ∼ t(T − 2)   (2.340)

that is:

(Y_{T+h} − Ŷ_{T+h}) / [ σ̂_ε √( 1 + 1/T + (X_{T+h} − X̄)² / Σ_{t=1}^T x_t² ) ] ∼ t(T − 2)   (2.341)

with Ŷ_{T+h} = α̂ + β̂ X_{T+h}. We deduce a 100(1 − p)% prediction interval for Y_{T+h}:

α̂ + β̂ X_{T+h} ± t_{p/2} σ̂_ε √( 1 + 1/T + (X_{T+h} − X̄)² / Σ_{t=1}^T x_t² )   (2.342)
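To make the computation concrete, here is a minimal Python sketch (the function name and the data it would receive are hypothetical) that returns the point forecast and the 100(1 − p)% prediction interval of relation (2.342):

import numpy as np
from scipy import stats

def forecast_interval(X, Y, X_new, p=0.05):
    """Point forecast and 100(1-p)% prediction interval, as in Eq. (2.342)."""
    T = len(Y)
    x = X - X.mean()
    beta_hat = np.sum(x * (Y - Y.mean())) / np.sum(x**2)   # OLS slope
    alpha_hat = Y.mean() - beta_hat * X.mean()             # OLS intercept
    e = Y - alpha_hat - beta_hat * X
    sigma_hat = np.sqrt(np.sum(e**2) / (T - 2))            # Eq. (2.327)
    y_hat = alpha_hat + beta_hat * X_new
    half = stats.t.ppf(1 - p / 2, T - 2) * sigma_hat * np.sqrt(
        1 + 1 / T + (X_new - X.mean())**2 / np.sum(x**2))
    return y_hat - half, y_hat, y_hat + half

The interval widens as X_new moves away from the sample mean of X, reflecting relation (2.338).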

Appendix 2.2: Normal Distribution and Normality Test

The normal distribution is the most widespread statistical distribution. The density
function of a random variable x following a general normal distribution is given by:
p(x) = ( 1 / (√(2π) σ) ) exp[ −(1/2) ((x − m)/σ)² ]   (2.343)

where exp is the exponential, m is the mean of the variable x, and σ its standard deviation. We note:

x ∼ N(m, σ²)   (2.344)

The “bell curve” that represents it passes through a maximum at x = m and is symmetrical with respect to the vertical line passing through this point. In practice, a particularly important case is that of the standard normal distribution. The density function of a random variable z following a standard normal distribution is given by:

p(z) = ( 1 / √(2π) ) exp( −z²/2 )   (2.345)

It is always possible to return to the case of the standard normal distribution by transforming the variable: z = (x − m)/σ. We note:

z ∼ N(0, 1)   (2.346)

Consider a series X_t, t = 1, . . . , T. The normal distribution is characterized by a skewness coefficient equal to zero and a kurtosis coefficient of 3. Skewness is a measure of the asymmetry of a distribution, and kurtosis is a measure that quantifies the shape of a distribution by providing information about its tails and peakedness compared to a normal distribution. The skewness coefficient, noted S, associated with the series X_t is given by:

S = [ (1/T) Σ_{t=1}^T (X_t − X̄)³ ] / [ (1/T) Σ_{t=1}^T (X_t − X̄)² ]^{3/2} = μ_3 / μ_2^{3/2}   (2.347)

and the kurtosis coefficient, denoted K, is written:

K = [ (1/T) Σ_{t=1}^T (X_t − X̄)⁴ ] / [ (1/T) Σ_{t=1}^T (X_t − X̄)² ]² = μ_4 / μ_2²   (2.348)

where X̄ is the mean of the series X_t, t = 1, . . . , T, and the μ_i are the centered moments of order i. For a normal distribution, we have:

μ_3 = 0  and  μ_4 = 3μ_2²   (2.349)
Fig. 2.13 Density function of the normal distribution
Fig. 2.14 Asymmetric distribution (spread to the left)
Fig. 2.15 Asymmetric distribution (spread to the right)

that is:

S = 0  and  K = 3   (2.350)

When S ≠ 0, the distribution of the series is said to be asymmetric. More specifically, when S < 0, the distribution is skewed to the right (or spread to the left), and when S > 0 it is skewed to the left (or spread to the right). Figures 2.13, 2.14, and 2.15 represent the case of a series following a normal distribution (symmetric distribution), a series whose distribution is skewed to the right, and a series with a left-skewed distribution, respectively. On these last two graphs, the dotted curve is the one relating to the normal distribution.
Concerning the kurtosis coefficient, if .K < 3, the distribution is said to be
flattened or platykurtic. If .K > 3, the distribution presents an excess of kurtosis;
we speak about leptokurtic distribution. Figure 2.13 shows the normal distribution
(.K = 3), Fig. 2.16 the case where .K < 3, and Fig. 2.17 the case where .K > 3. On
these last two graphs, the dotted curve represents the normal distribution.
Fig. 2.16 Platykurtic distribution
Fig. 2.17 Leptokurtic distribution

The Jarque and Bera test (Jarque and Bera, 1980) is based on the definition of skewness and kurtosis coefficients and allows us to test the null hypothesis of normality of a distribution. The test statistic, denoted JB, is written as:



JB = (T/6) [ S² + (1/4)(K − 3)² ]   (2.351)

Under the null hypothesis of normality, the test statistic follows a Chi-squared
distribution with 2 degrees of freedom. Therefore, if the calculated value of the J B
test statistic is lower than the theoretical value of the Chi-squared distribution with
2 degrees of freedom, the null hypothesis of normality is not rejected. On the other
hand, if J B is greater than the critical value, the null hypothesis of normality is
rejected.
This Jarque and Bera test is used to test the normality of the residuals.
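The skewness and kurtosis coefficients and the JB statistic are straightforward to compute. The following minimal Python sketch (the function name is hypothetical) applies Eqs. (2.347), (2.348), and (2.351) to a series, typically the regression residuals:

import numpy as np
from scipy import stats

def jarque_bera(series):
    """Skewness, kurtosis, and Jarque-Bera statistic with its Chi2(2) p-value."""
    x = np.asarray(series, dtype=float)
    T = len(x)
    m = x - x.mean()
    mu2 = np.mean(m**2)
    S = np.mean(m**3) / mu2**1.5           # skewness, Eq. (2.347)
    K = np.mean(m**4) / mu2**2             # kurtosis, Eq. (2.348)
    JB = T / 6 * (S**2 + (K - 3)**2 / 4)   # Eq. (2.351)
    p_value = 1 - stats.chi2.cdf(JB, df=2)
    return S, K, JB, p_value

A p-value below the chosen significance level leads to rejecting the null hypothesis of normality.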

Appendix 2.3: The Maximum Likelihood Method

The maximum likelihood method is another technique for estimating a regression


model. It is assumed that the error .εt is normally distributed with mean zero and
variance .σε2 . In the regression model:

.Yt = α + βXt + εt (2.352)

.Yt is a linear function of the error term. Consequently, .Yt is also normally

distributed with mean:

E(Yt ) = α + βXt
. (2.353)

and variance:

V (Yt ) = σε2
. (2.354)

We note f(Y_1, Y_2, . . . , Y_T | α + βX_t, σ_ε²) the joint probability density function of (Y_1, Y_2, . . . , Y_T). Since the Y_t are supposed to be independent, we can write:

f(Y_1, Y_2, . . . , Y_T | α + βX_t, σ_ε²) = f(Y_1 | α + βX_t, σ_ε²) × f(Y_2 | α + βX_t, σ_ε²) × . . . × f(Y_T | α + βX_t, σ_ε²)   (2.355)

with:

f(Y_t) = ( 1 / (√(2π) σ_ε) ) exp[ −(1/2) ((Y_t − α − βX_t)/σ_ε)² ]   (2.356)

which is the density function of the general normal distribution.
By substituting (2.356) into (2.355), we get:

f(Y_1, Y_2, . . . , Y_T | α + βX_t, σ_ε²) = ( 1 / ((√(2π))^T σ_ε^T) ) exp[ −(1/2) Σ_{t=1}^T ((Y_t − α − βX_t)/σ_ε)² ]   (2.357)

Assuming that the (Y_1, Y_2, . . . , Y_T) are known, expression (2.357) is called the likelihood function. It is noted:

L(α, β, σ_ε²) = ( 1 / ((√(2π))^T σ_ε^T) ) exp[ −(1/2) Σ_{t=1}^T ((Y_t − α − βX_t)/σ_ε)² ]   (2.358)

The maximum likelihood method consists in maximizing the latter expression in order to obtain the estimators of α, β, and σ_ε². To this end, we take the logarithm of the likelihood function:

ln L = −T ln σ_ε − (T/2) ln(2π) − (1/2) Σ_{t=1}^T ((Y_t − α − βX_t)/σ_ε)²   (2.359)

which can be expressed as follows:

ln L = −(T/2) ln σ_ε² − (T/2) ln(2π) − (1/2) Σ_{t=1}^T ((Y_t − α − βX_t)/σ_ε)²   (2.360)

This expression is called the log-likelihood. We differentiate Eq. (2.360) with respect to α, β, and σ_ε², i.e.:

∂ln L/∂α = − Σ_{t=1}^T [ (Y_t − α − βX_t)/σ_ε² ] (−1)   (2.361)

∂ln L/∂β = − Σ_{t=1}^T [ (Y_t − α − βX_t)/σ_ε² ] (−X_t)   (2.362)

∂ln L/∂σ_ε² = −T/(2σ_ε²) + (1/2) Σ_{t=1}^T [ (Y_t − α − βX_t)/σ_ε² ]²   (2.363)

Setting each of these equations to zero (first-order optimization condition) and denoting by ML the maximum likelihood estimators, we obtain:

Σ_{t=1}^T (Y_t − α̂_ML − β̂_ML X_t) = 0   (2.364)

Σ_{t=1}^T (Y_t − α̂_ML − β̂_ML X_t) X_t = 0   (2.365)

−T + ( 1/σ̂²_{ε,ML} ) Σ_{t=1}^T (Y_t − α̂_ML − β̂_ML X_t)² = 0   (2.366)

Equations (2.364) and (2.365) can be written:

Σ_{t=1}^T Y_t = T α̂_ML + β̂_ML Σ_{t=1}^T X_t   (2.367)

Σ_{t=1}^T X_t Y_t = α̂_ML Σ_{t=1}^T X_t + β̂_ML Σ_{t=1}^T X_t²   (2.368)

Equations (2.367) and (2.368) correspond exactly to Eqs. (2.44) and (2.48). The
maximum likelihood estimators of the coefficients .α and .β are therefore identical to
the OLS estimators.
Let us now determine the estimator of the variance of the error term:

σ̂²_{ε,ML} = (1/T) Σ_{t=1}^T (Y_t − α̂_ML − β̂_ML X_t)² = (1/T) Σ_{t=1}^T (Y_t − α̂ − β̂X_t)² = (1/T) Σ_{t=1}^T e_t²   (2.369)

The maximum likelihood estimator of the variance of the error is therefore different from that obtained by OLS (Eq. (2.327)):

σ̂_ε² = ( 1 / (T − 2) ) Σ_{t=1}^T e_t²   (2.370)

Since the OLS estimator is an unbiased estimator, it follows that the maximum
likelihood estimator is a biased estimator. This is a consistent estimator and the bias
decreases as the sample size increases.
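To illustrate the difference between the two estimators of the error variance, here is a minimal Python sketch (the function name and the data it would receive are hypothetical) computing both the unbiased OLS estimator (2.370) and the biased maximum likelihood estimator (2.369) from the same residuals:

import numpy as np

def error_variance_estimators(X, Y):
    """Unbiased OLS estimator and biased ML estimator of the error variance."""
    T = len(Y)
    x = X - X.mean()
    beta_hat = np.sum(x * (Y - Y.mean())) / np.sum(x**2)   # OLS slope
    alpha_hat = Y.mean() - beta_hat * X.mean()             # OLS intercept
    rss = np.sum((Y - alpha_hat - beta_hat * X)**2)        # residual sum of squares
    return rss / (T - 2), rss / T                          # Eq. (2.370) and Eq. (2.369)

As T grows, the two values converge, in line with the vanishing bias of the maximum likelihood estimator.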
3 The Multiple Regression Model

The simple regression model studied in Chap. 2 had only one explanatory variable.
In practice, however, it is common for an (explained) variable to depend on several
explanatory variables.
As in the previous chapter, let us take a few examples to illustrate the questions to
be answered in this chapter. Does a family’s consumption expenditure depend more
on its income or its size? To what extent is the quantity demanded of a particular
good a function of the price of that good, the price of other goods, and consumer
income? Do wages depend more on the level of education or work experience?
In these different cases where several explanatory variables come into play, we
speak of a multiple regression model. This chapter proposes an in-depth study of
this model.1

3.1 Writing the Model in Matrix Form

The multiple regression model with k explanatory variables is written as:

Yt = α + β1 X1t + β2 X2t + . . . + βk Xkt + εt


. (3.1)

where .t = 1, . . . , T , .Yt is the explained (or dependent) variable, .X1t , X2t , . . . , Xkt
are the k explanatory variables, and .εt is the error term.
This model thus corresponds to an extension of the simple regression model to
the case of k explanatory variables with .k > 1.
The coefficients .β1 , . . . , βk are called partial regression coefficients or partial
slope coefficients. .β1 measures the change in the mean value of Y , with the value
of the other explanatory variables remaining constant (i.e., all other things being

1 This chapter calls upon various notions of matrix algebra. In Appendix 3.1, readers will find the

elements of matrix algebra necessary to understand the various developments here.


equal). The regression coefficient .β1 therefore measures the effect of a 1-point
variation in .X1t on the mean value of .Yt , this effect being net of any potential
influence of the other explanatory variables on the mean value of .Yt . The same
type of reasoning obviously applies to the other regression coefficients.
We can write Eq. (3.1) for each value of .t, t = 1, . . . , T :

Y1 = α + β1 X11 + β2 X21 + . . . + βk Xk1 + ε1


Y2 = α + β1 X12 + β2 X22 + . . . + βk Xk2 + ε2
..
.
. (3.2)
Yt = α + β1 X1t + β2 X2t + . . . + βk Xkt + εt
..
.
YT = α + β1 X1T + β2 X2T + . . . + βk XkT + εT

Or, in matrix form:

[ Y_1 ]   [ 1  X_11  X_21  ...  X_k1 ] [ α   ]   [ ε_1 ]
[ Y_2 ]   [ 1  X_12  X_22  ...  X_k2 ] [ β_1 ]   [ ε_2 ]
[  :  ] = [ :    :     :   ...    :  ] [ β_2 ] + [  :  ]   (3.3)
[ Y_t ]   [ 1  X_1t  X_2t  ...  X_kt ] [  :  ]   [ ε_t ]
[  :  ]   [ :    :     :   ...    :  ] [ β_k ]   [  :  ]
[ Y_T ]   [ 1  X_1T  X_2T  ...  X_kT ]           [ ε_T ]

The multiple regression model can then be written:2

Y = X β + ε   (3.4)

where Y is of dimension (T, 1), X of dimension (T, k + 1), β of dimension (k + 1, 1), ε of dimension (T, 1), and where:

– Y is the column vector containing the T values of the dependent variable: Y = (Y_1, Y_2, . . . , Y_t, . . . , Y_T)'.
– X is the matrix whose first column is composed of 1s and each of whose subsequent columns contains the T values of one of the k explanatory variables, as displayed in (3.3). Each explanatory variable X_it, i = 1, . . . , k, is thus represented as a column vector of T observations of the variable X_i.
– β is the vector of parameters to be estimated: β = (α, β_1, β_2, . . . , β_k)'.
– ε is the column vector containing the values of the error term: ε = (ε_1, ε_2, . . . , ε_t, . . . , ε_T)'.

2 Matrices and vectors are written in bold characters. This notation convention will be used
throughout the book.

The first column of the matrix .X contains only 1s in order to take into account
the constant .α. This allows us to keep a compact matrix form, making it easier to
present the developments linked to the multiple regression model.
As in the case of the simple regression model, the aim is to obtain an estimate
of the parameter vector .β. To this end, the ordinary least squares (OLS) method is
applied.

3.2 The OLS Estimators


3.2.1 Assumptions of the Multiple Regression Model

Like the simple regression model, the multiple regression model is based on a
number of assumptions, which we present below.

Hypothesis 1: The Matrix X Is Nonrandom


For the sake of simplification, it is frequently assumed that the values of the
variables in the matrix .X are observed without error, i.e., that the matrix .X is

nonrandom. This assumption may seem strong,3 but it amounts to assuming that the
explanatory variables are controlled, which considerably simplifies the derivation
of some fundamental statistical results. It is thus a purely technical assumption,
allowing us to consider each vector of the matrix .X as a known constant for the
probability distribution of .Yt .
It is possible to relax this assumption about the nonrandom character of the
matrix .X and assume that it is independent of each value of the error term. Such
an assumption can be written as:
E(ε |X) = [ E(ε_1 |X), E(ε_2 |X), . . . , E(ε_T |X) ]' = 0   (3.5)

where the notation .E (a |b ) denotes the expectation of a conditional on b. Thus, the


expectation of the error term conditional on the values of the explanatory variables
is zero. This is equivalent to assuming that the explanatory variables do not provide
any information about the mean value of the error term. They are not involved in
predicting it.

Hypothesis 2: The Matrix X Is of Full Rank


Assuming that the matrix .X is of full rank is equivalent to writing:

Rank(X) = k + 1
. (3.6)

This assumption states that the explanatory variables are linearly independent.4
Such an assumption of independence among the explanatory variables is necessary
for estimating the parameter vector .β.
If the number of observations T is less than .k + 1, then the matrix .X cannot be
of full rank. For this reason, we assume that the number of observations is greater
than the number of explanatory variables, i.e.:

T >k+1
. (3.7)

Hypothesis 3: The Expectation of the Error Term Is Zero


As in the case of the simple regression model, the error term is assumed to have zero
expectation. The reasoning is the same: the error term can take on both negative
and positive values, and there is no bias in favor of positive values, nor in favor of

3 In the sense that the matrix of explanatory variables is assumed to be unchanged whatever the

sample of observations.
4 We will see later that such an assumption implies that there is no collinearity between the

explanatory variables.

negative ones. We deduce that the mathematical expectation of the errors is zero:
E(ε) = [ E(ε_1), E(ε_2), . . . , E(ε_T) ]' = 0   (3.8)

Hence:

.E (Y ) = Xβ (3.9)

Hypothesis 4: Homoskedasticity and the Absence of Autocorrelation of Errors

Let E(εε') be the variance-covariance matrix of the error term, with ε' denoting the
transpose of ε. The assumption of homoskedasticity—i.e., constancy of variance—
and of absence of autocorrelation is written:

E(εε') = σ_ε² I   (3.10)

where .I denotes the identity matrix and .σε2 the variance of the error term.
To understand expression (3.10), let us write the variance-covariance matrix of
the error term:
E(εε') = [ V(ε_1)         Cov(ε_1, ε_2)  ...  Cov(ε_1, ε_T) ]
         [ Cov(ε_2, ε_1)  V(ε_2)         ...  Cov(ε_2, ε_T) ]
         [     :               :         ...       :        ]   (3.11)
         [ Cov(ε_T, ε_1)  Cov(ε_T, ε_2)  ...  V(ε_T)        ]

       = [ E(ε_1²)     E(ε_1 ε_2)  ...  E(ε_1 ε_T) ]
         [ E(ε_2 ε_1)  E(ε_2²)     ...  E(ε_2 ε_T) ]
         [     :            :      ...       :     ]
         [ E(ε_T ε_1)  E(ε_T ε_2)  ...  E(ε_T²)    ]

Under the assumption of no autocorrelation, all terms off the diagonal are zero.
In accordance with the assumption of homoskedasticity, the terms on the diagonal
are constant and equal to σ_ε². We therefore have:

E(εε') = [ σ_ε²  0     ...  0    ]         [ 1  0  ...  0 ]
         [ 0     σ_ε²  ...  0    ]  = σ_ε² [ 0  1  ...  0 ]  = σ_ε² I   (3.12)
         [ :     :     ...  :    ]         [ :  :  ...  : ]
         [ 0     0     ...  σ_ε² ]         [ 0  0  ...  1 ]

Recall that non-autocorrelated and homoskedastic errors are said to be spherical.



Hypothesis 5: Normality of Errors


Errors are assumed to follow a normal distribution of zero mean and constant
variance, i.e.:

ε ∼ N(0, σ_ε² I)   (3.13)

As in the case of the simple regression model, the normality assumption is not
necessary to establish the results of the multiple regression model. However, it
allows us to derive statistical results and construct test statistics (see below).

3.2.2 Estimation of Coefficients

Consider the multiple regression model:

. Y = Xβ + ε (3.14)

Our objective is to estimate the vector .β of parameters by the OLS method. This
vector .β̂ of estimated parameters is given by:
β̂ = (X'X)⁻¹ X'Y   (3.15)

Let us demonstrate this relationship. Consider the vector .e of residuals:

e = Y − X β̂
. (3.16)

where .β̂ denotes the vector of parameters estimated by OLS.


Recall that the OLS method consists in minimizing the sum of squared errors.
Since the errors are not observable, we minimize the sum of squared residuals:

Min Σ_{t=1}^T ε_t² ≡ Min e'e   (3.17)

e'e = (Y − Xβ̂)'(Y − Xβ̂) = Y'Y − β̂'X'Y − Y'Xβ̂ + β̂'X'Xβ̂   (3.18)

β̂'X'Y and Y'Xβ̂ are scalars. Knowing that a scalar is equal to its transpose:

β̂'X'Y = (β̂'X'Y)' = Y'Xβ̂   (3.19)

we deduce:

e'e = Y'Y − 2β̂'X'Y + β̂'X'Xβ̂   (3.20)

We seek to minimize this expression, i.e.:

∂(e'e)/∂β̂ = 0   (3.21)

∂(e'e)/∂β̂ = −2 ∂(β̂'X'Y)/∂β̂ + ∂(β̂'X'Xβ̂)/∂β̂   (3.22)

We have:

∂(β̂'X'Y)/∂β̂ = X'Y   (3.23)

because ∂(A'B)/∂B = ∂(B'A)/∂B = A, where A and B denote vectors (of dimension (k + 1, 1) in our case).
Moreover:

∂(β̂'X'Xβ̂)/∂β̂ = 2(X'X)β̂   (3.24)

since ∂(B'CB)/∂B = 2CB, where C denotes a symmetric matrix and B a vector.
Relationship (3.22) is therefore written as:

∂(e'e)/∂β̂ = −2X'Y + 2(X'X)β̂ = 0   (3.25)

Hence:

(X'X)β̂ = X'Y   (3.26)

Insofar as the matrix X is of full rank, it follows that the matrix X'X is also of
rank k + 1, implying that its inverse exists. Consequently, we deduce from Eq. (3.26)
the expression giving the vector β̂ of parameters estimated by OLS:

β̂ = (X'X)⁻¹ X'Y   (3.27)

Remark 3.1 As in the case of the simple regression model, it is possible to estimate
the multiple regression model by the maximum likelihood method. It can be shown
that the maximum likelihood estimator of the parameter vector .β is identical to that
obtained with OLS. Furthermore, as in the simple regression model, the maximum
likelihood estimator of the error variance is biased, unlike that of OLS. For a detailed
presentation of the maximum likelihood method, see Greene (2020).

3.2.3 Properties of OLS Estimators


Linearity of the Estimator
We have, according to relationship (3.27):
β̂ = (X'X)⁻¹ X'Y = (X'X)⁻¹ X'(Xβ + ε)   (3.28)

that is:

β̂ = (X'X)⁻¹ X'Xβ + (X'X)⁻¹ X'ε   (3.29)

Hence:

β̂ = β + (X'X)⁻¹ X'ε   (3.30)

This last equation shows that β̂ is a linear estimator of β.

Unbiased Estimator
From Eq. (3.30), we can write:

E(β̂) = E(β) + (X'X)⁻¹ X' E(ε)   (3.31)

Given that E(β) = β and E(ε) = 0, we deduce:

E(β̂) = β   (3.32)

which means that .β̂ is an unbiased estimator of .β.

Variance-Covariance Matrix of Coefficients


Let .Ωβ̂ be the variance-covariance matrix of OLS coefficients, i.e., of .β̂, given by:

Ω_β̂ = E[ (β̂ − β)(β̂ − β)' ]   (3.33)

that is:

Ω_β̂ = [ V(α̂)          Cov(α̂, β̂_1)     Cov(α̂, β̂_2)    ...  Cov(α̂, β̂_k)   ]
      [ Cov(β̂_1, α̂)   V(β̂_1)          Cov(β̂_1, β̂_2)  ...  Cov(β̂_1, β̂_k) ]
      [ Cov(β̂_2, α̂)   Cov(β̂_2, β̂_1)  V(β̂_2)          ...        :         ]   (3.34)
      [      :               :                :           ...        :         ]
      [ Cov(β̂_k, α̂)        ...              ...          ...   V(β̂_k)        ]

This matrix is symmetric. Furthermore, using relation (3.30), we can write (3.33)
as follows:

Ω_β̂ = E[ ( (X'X)⁻¹ X'ε ) ( (X'X)⁻¹ X'ε )' ]   (3.35)

or:

Ω_β̂ = E[ (X'X)⁻¹ X' εε' X (X'X)⁻¹ ]   (3.36)

because (X'X)⁻¹ is a symmetric matrix, implying: ((X'X)⁻¹)' = (X'X)⁻¹.
Knowing that E(εε') = σ_ε² I, we deduce:

Ω_β̂ = (X'X)⁻¹ X' σ_ε² I X (X'X)⁻¹   (3.37)

Hence:

Ω_β̂ = σ_ε² (X'X)⁻¹   (3.38)

This determination of the variance-covariance matrix will allow us to show that


β̂ is of minimum variance.
.

Minimum Variance Estimator


It can be shown (see Appendix 3.2.1) that, among the class of linear unbiased
estimators, the OLS estimator .β̂ has the smallest variance. We can therefore state
the following property, as in the case of the simple regression model.

Property 3.1 The OLS estimator .β̂ of .β is the best linear unbiased estimator
(BLUE).

3.2.4 Error Variance Estimation

The error is, of course, unknown in the model. In order to estimate the variance .σε2
of the errors, we need to use the residuals .e:

e = Y − Xβ̂   (3.39)

From this expression and after calculating E(e'e), we can show that the
estimator σ̂_ε² of the error variance is written (see Appendix 3.2.2):

σ̂_ε² = e'e / (T − k − 1) ≡ ( 1/(T − k − 1) ) Σ_{t=1}^T e_t²   (3.40)

This is an unbiased estimator of .σε2 .

3.2.5 Example

Consider the following example comprising one explained variable .Yt and three
explanatory variables .X1t , .X2t , and .X3t for .t = 1, . . . , 6 (see Table 3.1). This
example is only given for illustrative purposes to put into practice the concepts
previously presented, since it obviously makes little sense to carry out a 6-point
regression.
The model is written as:

Yt = α + β1 X1t + β2 X2t + β3 X3t + εt


. (3.41)

or, in matrix form:

. Y = Xβ + ε (3.42)

Table 3.1 Illustrative example

t     Yt   X1t  X2t  X3t
1     4    3    5    4
2     2    5    6    8
3     1    7    3    9
4     3    2    2    5
5     6    9    1    2
6     8    4    7    3
Sum   24   30   24   31

with:

Y = [4, 2, 1, 3, 6, 8]',   β = [α, β_1, β_2, β_3]',   ε = [ε_1, ε_2, ε_3, ε_4, ε_5, ε_6]'

and

    [ 1  3  5  4 ]
    [ 1  5  6  8 ]
X = [ 1  7  3  9 ]
    [ 1  2  2  5 ]
    [ 1  9  1  2 ]
    [ 1  4  7  3 ]

Determination of OLS Estimators

By writing the transpose of X:

X' = [ 1  1  1  1  1  1 ]
     [ 3  5  7  2  9  4 ]   (3.43)
     [ 5  6  3  2  1  7 ]
     [ 4  8  9  5  2  3 ]

we can calculate the matrix product X'X:

X'X = [  6   30   24   31 ]
      [ 30  184  107  155 ]   (3.44)
      [ 24  107  124  128 ]
      [ 31  155  128  199 ]

We deduce the inverse matrix:

(X'X)⁻¹ = [  2.87   −0.24   −0.24   −0.11  ]
          [ −0.24    0.04    0.02   −0.002 ]   (3.45)
          [ −0.24    0.02    0.04   −0.004 ]
          [ −0.11   −0.002  −0.004   0.03  ]

Calculating the product X'Y gives:

X'Y = [ 24  121  103  92 ]'   (3.46)

Finally:

(X'X)⁻¹ X'Y = [ 5.57, 0.21, 0.47, −0.87 ]' = [ α̂, β̂_1, β̂_2, β̂_3 ]'   (3.47)

The estimated model is therefore:

Ŷ_t = 5.57 + 0.21 X_1t + 0.47 X_2t − 0.87 X_3t   (3.48)
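The whole computation can be reproduced numerically. The following minimal Python sketch (using NumPy; variable names are ours) builds the matrices of the example and applies Eq. (3.27):

import numpy as np

# Data of Table 3.1
Y = np.array([4, 2, 1, 3, 6, 8], dtype=float)
X = np.column_stack([
    np.ones(6),
    [3, 5, 7, 2, 9, 4],   # X1
    [5, 6, 3, 2, 1, 7],   # X2
    [4, 8, 9, 5, 2, 3],   # X3
])

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # OLS estimator, Eq. (3.27)
print(beta_hat.round(2))                       # approximately [ 5.57  0.21  0.47 -0.87]

e = Y - X @ beta_hat                           # residuals
sigma2_hat = e @ e / (6 - 3 - 1)               # error variance estimator, Eq. (3.40)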

Practical Calculation
In practice, it is possible to use certain matrix algebra results to simplify calculations.
Thus, in a model with three explanatory variables, such as the one in this
example, the matrix X'X is given by:

X'X = [ T      ΣX_1t       ΣX_2t       ΣX_3t      ]
      [ ΣX_1t  ΣX_1t²      ΣX_1t X_2t  ΣX_1t X_3t ]   (3.49)
      [ ΣX_2t  ΣX_1t X_2t  ΣX_2t²      ΣX_2t X_3t ]
      [ ΣX_3t  ΣX_1t X_3t  ΣX_2t X_3t  ΣX_3t²     ]

and the matrix X'Y is written as:

X'Y = [ ΣY_t,  ΣY_t X_1t,  ΣY_t X_2t,  ΣY_t X_3t ]'   (3.50)

Calculating these various sums gives the following matrices:

X'X = [  6   30   24   31 ]
      [ 30  184  107  155 ]   (3.51)
      [ 24  107  124  128 ]
      [ 31  155  128  199 ]

and:

X'Y = [ 24  121  103  92 ]'   (3.52)

3.3 Tests on the Regression Coefficients


3.3.1 Distribution of Estimators

Assume that the errors are normally distributed:

ε ∼ N 0, σε2 I
. (3.53)

By virtue of Relationship (3.30):


β̂ = β + (X'X)⁻¹ X'ε   (3.54)

β̂ is a linear function of ε and therefore also follows a normal distribution
with (k + 1) dimensions. Furthermore, we have shown that the variance-covariance
matrix is given by:

Ω_β̂ = σ_ε² (X'X)⁻¹   (3.55)

So we have:

β̂ ∼ N( β, σ_ε² (X'X)⁻¹ )   (3.56)

Consider the coefficient .β̂i associated with the ith explanatory variable .Xit . We
have:

β̂_i ∼ N( β_i, σ_ε² a_{i+1,i+1} )   (3.57)

where a_{i+1,i+1} denotes the (i + 1)th element5 of the diagonal of (X'X)⁻¹. We can
write:

(β̂_i − β_i) / ( σ_ε √a_{i+1,i+1} ) ∼ N(0, 1)   (3.58)

We have also seen that:

σ̂_ε² = e'e / (T − k − 1)   (3.59)

5 We consider the (i + 1)th element and not the ith since the first element of the matrix (X'X)⁻¹
relates to the constant term. Obviously, if the variables are centered, it is appropriate to choose the
ith element.

Using the result that if w ∼ N(0, σ_w² I), then w'w/σ_w² follows a Chi-squared distribution,
we have:

e'e / σ_ε² ∼ χ²(T − k − 1)   (3.60)

that is:

(T − k − 1) σ̂_ε² / σ_ε² ∼ χ²(T − k − 1)   (3.61)

Remembering the property that (see Box 2.2 in Chap. 2) if z ∼ N(0, 1) and
v ∼ χ²(r), the quantity:

t = z √r / √v   (3.62)

follows a Student’s t distribution with r degrees of freedom, we have:

t = [ (β̂_i − β_i) / ( σ_ε √a_{i+1,i+1} ) ] √(T − k − 1) / √( (T − k − 1) σ̂_ε² / σ_ε² )   (3.63)

or finally:

t = (β̂_i − β_i) / ( σ̂_ε √a_{i+1,i+1} ) ∼ t(T − k − 1)   (3.64)

for .i = 1, . . . , k + 1. This result allows us to proceed with tests on .βi .

3.3.2 Tests on a Regression Coefficient

From relationship (3.64), we can construct a .100(1 − p)% confidence interval for
β_i, i.e.:

β̂_i ± t_{p/2} σ̂_ε √a_{i+1,i+1}   (3.65)

As in the case of the simple regression model, it is possible to test the null
hypothesis that .βi is equal to a certain value .β0 , i.e.:

H0 : βi = β0
. (3.66)

against the alternative hypothesis:

H_1: β_i ≠ β_0   (3.67)

If the null hypothesis is true, then:

(β̂_i − β_0) / ( σ̂_ε √a_{i+1,i+1} ) ∼ t(T − k − 1)   (3.68)

and the decision rule is given by:

– If | (β̂_i − β_0) / ( σ̂_ε √a_{i+1,i+1} ) | ≤ t_{p/2}: the null hypothesis is not rejected at the 100p% significance level, so β_i = β_0.
– If | (β̂_i − β_0) / ( σ̂_ε √a_{i+1,i+1} ) | > t_{p/2}: the null hypothesis is rejected at the 100p% significance level, so β_i ≠ β_0.

In practice, the most commonly used test consists in testing the null hypothesis:

H_0: β_i = 0   (3.69)

against the alternative hypothesis:

H_1: β_i ≠ 0   (3.70)

This is a coefficient significance test (or t-test): under the null hypothesis, the
coefficient associated with the variable .Xit is not significant, i.e., this variable plays
no role in determining the dependent variable .Yt . In practical terms, we proceed as
in the case of the simple regression model, i.e., we calculate the t-statistic of the
coefficient .β̂i :

t_β̂i = β̂_i / σ_β̂i   (3.71)

where σ_β̂i denotes the estimate of the standard deviation of the coefficient β̂_i, i.e.:

σ_β̂i = σ̂_ε √a_{i+1,i+1}   (3.72)

The decision rule associated with the significance test of the coefficient β_i is written:

– If | t_β̂i | ≤ t_{p/2}: the null hypothesis is not rejected at the 100p% significance level, so β_i = 0—the variable X_it does not contribute to the explanation of Y_t.
– If | t_β̂i | > t_{p/2}: the null hypothesis is rejected at the 100p% significance level, so β_i ≠ 0—the coefficient associated with the variable X_it is significantly different from zero, indicating that X_it contributes to the explanation of the dependent variable Y_t.
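In matrix terms, the t-statistics of all coefficients can be obtained at once. The following minimal Python sketch (the function name is hypothetical) computes β̂, the standard errors σ̂_ε √a_{i+1,i+1}, the t-statistics, and two-sided p-values:

import numpy as np
from scipy import stats

def t_statistics(Y, X):
    """OLS estimates, standard errors, t-statistics, and two-sided p-values."""
    T, k1 = X.shape                              # k1 = k + 1 (constant included)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    e = Y - X @ beta_hat
    sigma2_hat = e @ e / (T - k1)                # Eq. (3.40)
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))  # sigma_hat_eps * sqrt(a_{i+1,i+1})
    t_stats = beta_hat / se                      # Eq. (3.71)
    p_values = 2 * (1 - stats.t.cdf(np.abs(t_stats), T - k1))
    return beta_hat, se, t_stats, p_values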

3.3.3 Significance Tests of Several Coefficients

We have previously presented the t-test, which allows us to test the significance of
each of the regression coefficients in isolation. It is also possible to simultaneously
test the significance of several or even all the coefficients of the estimated model.
For this purpose, we use a Fisher test.
Assume that the elements of the vector .β are subject to q constraints:

Rβ = r
. (3.73)

where .R is a given matrix of size .(q, k + 1) and .r is the vector of constraints of


dimension .(q, 1). It is further assumed that .q ≤ k + 1 and that the matrix .R is of
full rank, meaning that the q constraints are linearly independent.
Equation (3.73) is very general. We will show that various tests can be carried
out based on this expression.

Test on a Particular Regression Coefficient


The case previously presented of the test of significance of one of the coefficients
.βi corresponds to:

R = [0 · · · 0 1 0 · · · 0] and r = 0
. (3.74)

The matrix .R contains a 1 in the .(i + 1)th place6 and .r is null, which amounts
to testing the significance of the coefficient .βi . If we set .r = β0 , we find the test of
equality of .βi at a certain value .β0 presented earlier.

6 Recallthat the first element of the matrix corresponds to the constant term, which explains why
we consider the .(i + 1)th element and not the ith.

Test of Equality of Coefficients


From Eq. (3.73), it is possible to test the equality of two (or more) coefficients. For
example, writing:

R = [0 0 1 0 − 1 0 · · · 0] and r = 0
. (3.75)

allows us to test:7

β2 − β4 = 0
. (3.76)

that is:

β2 = β4
. (3.77)

Similarly, the relationship:

R = [0 0 1 0 1 0 · · · 0] and r = 0
. (3.78)

is equivalent to performing the test:

. β2 + β4 = 0 ⇐⇒ β2 = −β4 (3.79)

Significance Test for All Coefficients


It is also possible, starting from (3.73), to perform a test of significance of all the
coefficients of the explanatory variables of the regression. Such a test is called test
of regression significance. It corresponds to the following case:
    [ 0  1  0  ...  0 ]          [ 0 ]
    [ 0  0  1  ...  0 ]          [ 0 ]
R = [ :  :  :  ...  : ]  and r = [ : ]   (3.80)
    [ 0  0  0  ...  1 ]          [ 0 ]

This gives:

[ 0  1  0  ...  0 ] [ α   ]   [ 0 ]
[ 0  0  1  ...  0 ] [ β_1 ]   [ 0 ]
[ :  :  :  ...  : ] [ β_2 ] = [ : ]   (3.81)
[ 0  0  0  ...  1 ] [  :  ]   [ 0 ]
                    [ β_k ]

7 The first element of the matrix corresponding to the constant .α.



Here we test .β1 = β2 = · · · = βk = 0. This involves testing the null


hypothesis that no coefficient is significant, i.e., that none of the k explanatory
variables contributes to explaining the dependent variable .Yt . This test is particularly
helpful and frequently used.

Significance Test of a Subset of Coefficients


It is possible to test the significance of a subset of explanatory variables, which
corresponds to the following case:

R = [ 0 I s ] and r = 0
. (3.82)

where .r is a column vector with s elements and .0 is a null matrix of size .(s, k+1−s).
We test .βk−s+2 = βk−s+3 = · · · = βk = 0. This involves testing the null hypothesis
that the last s elements of the vector .β are insignificant.
We can thus see that expression (3.73) brings together a large number of tests,
which are presented in detail in Appendix 3.2.3. We now propose a synthesis.

Synthesis
The various tests mentioned above are Fisher tests. They consist in considering
two regression models: an unconstrained model and a constrained model. The
unconstrained model involves regressing the dependent variable on all the explana-
tory variables. The constrained model involves regressing the dependent variable
on just some of the explanatory variables. It is called a constrained model insofar
as a constraint is imposed on one or more coefficients included in the regression.
Generally speaking, Fisher tests are written:

F = [ (RSS_c − RSS_nc) / q ] / [ RSS_nc / (T − k − 1) ] ∼ F(q, T − k − 1)   (3.83)

where .RSSnc is the residual sum of squares of the unconstrained model and .RSSc
denotes the residual sum of squares of the constrained model, q being the number
of constraints.
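The statistic (3.83) only requires the residual sums of squares of the two regressions. A minimal Python sketch (the function names are hypothetical) could look as follows:

import numpy as np
from scipy import stats

def rss(Y, X):
    """Residual sum of squares of the OLS regression of Y on X."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ beta_hat
    return e @ e

def fisher_test(Y, X_unconstrained, X_constrained, q):
    """F statistic of Eq. (3.83) and its p-value under F(q, T - k - 1)."""
    T, k1 = X_unconstrained.shape          # k1 = k + 1 regressors incl. the constant
    rss_nc = rss(Y, X_unconstrained)
    rss_c = rss(Y, X_constrained)
    F = ((rss_c - rss_nc) / q) / (rss_nc / (T - k1))
    p_value = 1 - stats.f.cdf(F, q, T - k1)
    return F, p_value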

Remark 3.2 The t-test, which consists in testing the significance of a single
coefficient, can also be interpreted as a Fisher test. Indeed, the test of the null
hypothesis .βi = 0 amounts to:

– Regressing the dependent variable on all the explanatory variables: unconstrained model
– Regressing the dependent variable on all the explanatory variables except X_i: constrained model

3.4 Analysis of Variance (ANOVA) and Adjusted Coefficient of Determination

After estimating the regression parameters and testing the significance of the
coefficients, it is necessary—as in the case of the simple regression model—to
assess the goodness of fit. In other words, we study whether the scatter plot is well
represented by the regression line, by analyzing whether the scatter is concentrated
or, on the contrary, dispersed around the line. This can be done by using the analysis
of variance of the regression and calculating the coefficient of determination.

3.4.1 Analysis-of-Variance Equation


Case of Centered Variables
Consider the centered variables:

y_t = Y_t − Ȳ
x_1t = X_1t − X̄_1
  ...
x_it = X_it − X̄_i   (3.84)
  ...
x_kt = X_kt − X̄_k

The multiple regression model is then written as:

y = xβ + ε   (3.85)

where:

– y = (y_1, . . . , y_T)' is the column vector of the centered dependent variable
– x is the (T, k) matrix of the centered explanatory variables, with generic element x_it (first column x_1t, . . . , last column x_kt)
– β = (β_1, . . . , β_k)' is the vector of slope coefficients
– ε = (ε_1, . . . , ε_T)' is the vector of errors

As in the case of the simple regression model, we will establish the analysis-of-
variance equation by starting from the expression of the residuals:

e = y − x β̂
. (3.86)

Hence:
e'e = (y − xβ̂)'(y − xβ̂) = y'y − 2β̂'x'y + β̂'x'xβ̂   (3.87)

Knowing that x'y = x'xβ̂, this relationship can be written as:

e'e = y'y − β̂'x'xβ̂   (3.88)

that is:

y'y = β̂'x'xβ̂ + e'e   (3.89)

or:

y'y = β̂'x'y + e'e   (3.90)

Equation (3.90) is the analysis-of-variance equation, according to which the total
sum of squares is equal to the sum of the explained sum of squares and the residual
sum of squares, where:

– The total sum of squares (TSS) is given by: y'y = Σ_t y_t² = Σ_t (Y_t − Ȳ)²
– The explained sum of squares (ESS) is given by: β̂'x'y or β̂'x'xβ̂
– The residual sum of squares (RSS) is given by: e'e = Σ_t e_t²

Case of Noncentered Variables


Of course, it is also possible to determine the analysis-of-variance equation in the
case where the variables are not centered. We proceed in the same way, starting from
the expression:
e'e = (Y − Xβ̂)'(Y − Xβ̂) = Y'Y − 2β̂'X'Y + β̂'X'Xβ̂   (3.91)

Knowing that .X' Y = X' Xβ̂, this relationship can still be written as:
Y'Y = β̂'X'Xβ̂ + e'e   (3.92)

or also:
Y'Y = β̂'X'Y + e'e   (3.93)

If we compare this equation with (3.90), we see that the residual sum of squares
is the same, but the left-hand side is different. Specifically:

– In (3.93), the left-hand side is given by: Y'Y = Σ_t Y_t².
– In (3.90), the left-hand side is written: y'y = Σ_t y_t² = Σ_t (Y_t − Ȳ)² = Σ_t Y_t² − T Ȳ².

We can therefore write the analysis-of-variance Eq. (3.93) as follows:

Y'Y − T Ȳ²  =  ( β̂'X'Y − T Ȳ² )  +  e'e   (3.94)
   TSS               ESS            RSS
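A minimal Python sketch of this decomposition for noncentered data (the function name is hypothetical):

import numpy as np

def anova_decomposition(Y, X):
    """Total, explained, and residual sums of squares as in Eq. (3.94)."""
    T = len(Y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ beta_hat
    rss = e @ e
    tss = Y @ Y - T * Y.mean()**2
    ess = tss - rss            # equivalently beta_hat' X'Y - T*Ybar^2
    return tss, ess, rss

With the data of Table 3.1, it returns the values used in the example of Sect. 3.4.5, i.e., roughly (34, 31.44, 2.56) up to rounding.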

It is then possible to give the definition of the coefficient of determination.

3.4.2 Coefficient of Determination

The coefficient of determination, denoted .R 2 (R-squared), of the multiple regres-


sion model is written as in the case of the simple regression model:

R² = ESS / TSS = 1 − RSS / TSS   (3.95)
That is, in the case of centered variables:
R² = β̂'x'y / y'y = 1 − e'e / y'y   (3.96)

and, in the case of noncentered variables:

R² = ( β̂'X'Y − T Ȳ² ) / ( Y'Y − T Ȳ² ) = 1 − e'e / ( Y'Y − T Ȳ² )   (3.97)

As in the case of the simple regression model, we have:

0 ≤ R2 ≤ 1
. (3.98)

Remark 3.3 √R² is called the multiple correlation coefficient. However, it is not
used in practice.

The interpretation of the coefficient of determination is similar to that presented


for the simple regression model. It measures the percentage of the total variance
explained by the explanatory variables and thus enables us to judge the quality of
the model’s fit.

Remark 3.4 (Coefficient of Determination and Fisher Test) Considering the


Fisher test of regression significance, it is possible to rewrite expression (3.83) as
follows:
F = ( ESS / k ) / ( RSS / (T − k − 1) )   (3.99)

or, alternatively, by introducing the coefficient of determination:

F = ( R² / k ) / [ (1 − R²) / (T − k − 1) ] ∼ F(k, T − k − 1)   (3.100)

The latter expression is used to test the significance of the coefficient of


determination, i.e., the null hypothesis that .R 2 = 0.

3.4.3 Adjusted Coefficient of Determination

The coefficient of determination is a nondecreasing function of the number of


explanatory variables included in the model. Thus, when the number of explanatory
variables in the model increases, the value of the coefficient of determination also
increases. This is because introducing an additional explanatory variable cannot
increase the residual sum of squares, so it cannot decrease the value of the coefficient
of determination. To overcome this problem, the number of explanatory variables
in the model must be taken into account when calculating the coefficient of
determination. In other words, the coefficient of determination must be corrected
by the number of degrees of freedom. This is done by calculating the adjusted (or
corrected) coefficient of determination. The latter is noted .R̄ 2 and given by:

R̄² = 1 − [ e'e / (T − k − 1) ] / [ y'y / (T − 1) ]   (3.101)

that is:

R̄² = 1 − (1 − R²) (T − 1) / (T − k − 1)   (3.102)
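A minimal Python sketch (the function name is hypothetical) computing both coefficients from a regression including a constant:

import numpy as np

def r_squared(Y, X):
    """Coefficient of determination (3.95) and adjusted version (3.102)."""
    T, k1 = X.shape                        # k1 = k + 1 columns, constant included
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ beta_hat
    rss = e @ e
    tss = np.sum((Y - Y.mean())**2)
    r2 = 1 - rss / tss
    r2_adj = 1 - (1 - r2) * (T - 1) / (T - k1)
    return r2, r2_adj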

Remark 3.5 The determination and adjusted determination coefficients make it


possible to compare models. Of course, these models must have the same dependent
variable and the same number of observations. When the models to be compared

involve the same number of explanatory variables, we can use the .R 2 . On the other
hand, whenever the models to be compared differ in the number of explanatory
variables introduced, the .R̄ 2 should be used. The model with the highest coefficient
of determination—or adjusted coefficient of determination—is then selected.

3.4.4 Partial Correlation Coefficient

We saw in the first chapter that the correlation coefficient is an indicator of the
link between two variables. For a model with k explanatory variables, it is possible
to calculate several correlation coefficients. For example, if the model has two
explanatory variables .X1 and .X2 , three correlation coefficients can be calculated:
.rY X1 , .rY X2 , and .rX1 X2 .

However, it is questionable whether .rY X1 measures the true link between Y and
.X1 in the presence of .X2 . For this to be the case, it is necessary to calculate a

correlation coefficient .rY X1 that is independent of the influence that .X2 may have
on Y and on .X1 . Such a coefficient is called the partial correlation coefficient. It
is denoted .rY X1 ,X2 and is given by:

r_{Y X1, X2} = ( r_{Y X1} − r_{Y X2} r_{X1 X2} ) / √[ (1 − r²_{Y X2})(1 − r²_{X1 X2}) ]   (3.103)

Similarly, we can define:

r_{Y X2, X1} = ( r_{Y X2} − r_{Y X1} r_{X1 X2} ) / √[ (1 − r²_{Y X1})(1 − r²_{X1 X2}) ]   (3.104)

and:

r_{X1 X2, Y} = ( r_{X1 X2} − r_{Y X1} r_{Y X2} ) / √[ (1 − r²_{Y X1})(1 − r²_{Y X2}) ]   (3.105)

rY X1 ,X2 (respectively .rY X2 ,X1 ) is the partial correlation coefficient between Y and
.

X1 (respectively between Y and .X2 ), the influence of .X2 (respectively .X1 ) having
.

been removed. Similarly, .rX1 X2 ,Y is the partial correlation coefficient between the
two explanatory variables .X1 and .X2 , the influence of Y having been removed. A
partial correlation coefficient therefore measures the link between two variables,
the influence of one or more other explanatory variables having been removed. The
three partial correlation coefficients presented above are first-order coefficients in
the sense that only the influence of one variable is removed.
It is also possible to calculate second-order partial correlation coefficients.
Consider, for example, a model with three explanatory variables: .X1 , .X2 , and .X3 .

Three second-order partial correlation coefficients can be calculated: .rY X1 ,X2 X3 ,


rY X2 ,X1 X3 , and .rY X3 ,X1 X2 . In a model with four explanatory variables, third-order
.

partial correlation coefficients can be calculated, and so on.


The coefficient .rY2 X1 ,X2 is sometimes called the partial coefficient of determina-
tion: it measures the proportion of the variation in Y , which is not explained by .X2 ,
but by .X1 . The notion of partial correlation is thus very important as it enables us to
judge the relevance of including one (or more) explanatory variable(s) in a model.
The correlation and partial correlation coefficients are also linked to the coefficient
of determination through the following relationships:

R² = ( r²_{Y X1} + r²_{Y X2} − 2 r_{Y X1} r_{Y X2} r_{X1 X2} ) / ( 1 − r²_{X1 X2} )   (3.106)

R² = r²_{Y X1} + ( 1 − r²_{Y X1} ) r²_{Y X2, X1}   (3.107)

R² = r²_{Y X2} + ( 1 − r²_{Y X2} ) r²_{Y X1, X2}   (3.108)

3.4.5 Example

Let us return to the previous simple example to illustrate the various concepts
presented (see Sect. 3.2.5). Recall that we had a model with three explanatory
variables whose values are shown in Table 3.2.
We have seen that the application of the OLS method led to the following
estimated model:

. Ŷt = 5.57 + 0.21X1t + 0.47X2t − 0.87X3t (3.109)

Analysis-of-Variance Equation: Case of Centered Variables


To determine the centered variables, we first calculate the means of each of the
variables: .Ȳ = 4, .X̄1 = 5, .X̄2 = 4, and .X̄3 = 5.17. The values of the centered
variables are shown in Table 3.3. The necessary calculations are shown in Table 3.4.

Table 3.2 Illustrative example

t    Yt   X1t  X2t  X3t
1    4    3    5    4
2    2    5    6    8
3    1    7    3    9
4    3    2    2    5
5    6    9    1    2
6    8    4    7    3

Table 3.3 Illustrative example: centered variables

t     yt    x1t   x2t   x3t
1     0     −2    1     −1.17
2     −2    0     2     2.83
3     −3    2     −1    3.83
4     −1    −3    −2    −0.17
5     2     4     −3    −3.17
6     4     −1    3     −2.17
Sum   0     0     0     0
Mean  0     0     0     0

Table 3.4 Illustrative example: calculations

t     yt²   ŷt     ŷt²    et     et²
1     0     1.07   1.15   −1.07  1.15
2     4     −1.53  2.34   −0.47  0.25
3     9     −3.40  11.54  0.40   0.16
4     1     −1.43  2.04   0.43   0.18
5     4     2.19   4.78   −0.19  0.03
6     16    3.10   9.59   0.90   0.81
Sum   34    0      31.44  0      2.56
Mean  5.67  0      5.24   0

These calculations give us:

– The total sum of squares: .T SS = 34


– The explained sum of squares: .ESS = 31.44
– The sum of squared residuals: .RSS = 2.56

The analysis-of-variance equation is therefore written: .34 = 31.44 + 2.56. It is


then possible to determine the value of the coefficients of determination and adjusted
determination:

– R² = ESS/TSS = 0.92, or R² = 1 − RSS/TSS = 0.92
– R̄² = 1 − [(6 − 1)/(6 − 3 − 1)] (1 − 0.92) = 0.81

We can see that we have .R̄ 2 ≤ R 2 . The model explains around 90% of the
variance of .Yt according to .R 2 and 80% according to .R̄ 2 .

Analysis-of-Variance Equation: Case of Noncentered Variables


Let us check that we obtain the same results by reasoning directly from the raw data.
The necessary calculations are presented in Table 3.5.

Table 3.5 Illustrative example: calculations

t     Yt²    Ŷt     Ŷt²     et     et²
1     16     5.07   25.71   −1.07  1.14
2     4      2.47   6.11    −0.48  0.23
3     1      0.60   0.36    0.38   0.14
4     9      2.57   6.61    0.42   0.18
5     36     6.19   38.28   −0.19  0.04
6     64     7.10   50.37   0.91   0.83
Sum   130    24     127.44  0      2.56
Mean  21.67  4      21.24   0

These calculations give us:

– The total sum of squares: TSS = 130 − 6 × 4² = 34
– The explained sum of squares: ESS = 127.44 − 6 × 4² = 31.44
– The sum of squared residuals: RSS = 2.56

The analysis-of-variance equation is of course similar to that obtained in the case
of centered variables: 34 = 31.44 + 2.56. As before, let us calculate the value of the
coefficients of determination and adjusted determination:

– R² = ESS/TSS = 0.92, or R² = 1 − RSS/TSS = 0.92
– R̄² = 1 − [(6 − 1)/(6 − 3 − 1)] (1 − 0.92) = 0.81
2 6−1

We find the result that the model explains about 90% of the variance of .Yt
according to .R 2 and 80% according to .R̄ 2 .

Tests on the Regression Coefficients


Recall that we found:
[ α̂, β̂_1, β̂_2, β̂_3 ]' = [ 5.57, 0.21, 0.47, −0.87 ]'   (3.110)

The significance of the coefficients can then be tested.

Significance Test of a Coefficient For example, let us look at the test of the null
hypothesis:

.H0 : β2 = 0 (3.111)

We know that:

β̂_2 / ( σ̂_ε √a_33 ) ∼ t(6 − 3 − 1)   (3.112)

where a_33 is the third element of the diagonal of the matrix (X'X)⁻¹. Recall that
we consider the third element and not the second insofar as the matrix (X'X)⁻¹
takes the constant into account. We have seen that a_33 = 0.04, so √a_33 = 0.2.
Furthermore, we know that Σ_t e_t² = 2.56, so σ̂_ε² = 2.56 / (6 − 3 − 1) = 1.28. The test
statistic is given by:

t = 0.47 / ( √1.28 × 0.2 ) = 2.07   (3.113)

This statistic follows a Student’s t distribution with 2 degrees of freedom. The


Student’s law table for a 5% significance level gives us .t (2) = 4.303. Since .2.07 <
4.303, we do not reject the null hypothesis of nonsignificance of the coefficient .β2 :
the variable .X2t does not contribute to the explanation of the dependent variable .Yt .
We previously mentioned that the t-test can also be interpreted as a Fisher test:

F = [ (RSS_c − RSS_nc) / q ] / [ RSS_nc / (T − k − 1) ] ∼ F(q, T − k − 1)   (3.114)

where .RSSnc is the sum of squared residuals of the unconstrained model and .RSSc
denotes the sum of squared residuals of the constrained model, q being the number
of constraints.
We know that: .RSSnc = 2.56. The constrained model consists of the regression
of .Yt on .X1t and .X3t . The estimation of this model leads to a residual sum of squares
equal to .RSSc = 7.60. The Fisher test statistic is written as:

F = [ (7.60 − 2.56) / 1 ] / [ 2.56 / (6 − 3 − 1) ] = 3.93 ∼ F(q, T − k − 1)   (3.115)

Under the null hypothesis, .F ∼ F (1, 2) = 18.513 at the 5% significance level.


The calculated value of the statistic being lower than the critical value, we do not
reject the null hypothesis. We obtain the same result as with the Student’s t-test.

Significance Test of the Whole Regression Let us now consider the test of the null
hypothesis:

H0 : β1 = β2 = β3 = 0
. (3.116)

This is a Fisher test. We can use relationship (3.99):

F = ( ESS / k ) / ( RSS / (T − k − 1) ) = ( 31.44 / 3 ) / ( 2.56 / 2 ) = 8.18   (3.117)

The Fisher table gives us .F (3, 2) = 19.164 at the 5% significance level. Since
the calculated value of the F statistic is lower than the critical value, we do not
reject the null hypothesis that .β1 = β2 = β3 = 0. The fact that the coefficient
of determination is large even though the variables are not significant may be due
to the small number of observations. Remember that this example is for illustrative
purposes only.
We can also use the relationship involving the coefficient of determination:

F = ( R² / k ) / [ (1 − R²) / (T − k − 1) ]   (3.118)

which gives:

F = ( 0.92 / 3 ) / [ (1 − 0.92) / (6 − 3 − 1) ] = 8.18   (3.119)

The interpretation is similar to the previous one, namely, that the explanatory
variables do not contribute to the explanation of the dependent variable.

Significance Test of a Subset of Explanatory Variables Consider the test of the


null hypothesis:

H0 : β2 = β3 = 0
. (3.120)

This involves testing whether or not the coefficients associated with the variables
X2t and .X3t are significant. We perform a Fisher test:
.

– The unconstrained model has already been estimated. The corresponding sum of
squared residuals is: .RSSnc = 2.56.
– The constrained model is estimated by regressing .Yt on .X1t . The estimation of
this model leads to a sum of squared residuals equal to: .RSSc = 33.97.
– We calculate the test statistic:
  F = [ (33.97 − 2.56) / 2 ] / [ 2.56 / (6 − 3 − 1) ] = 12.27   (3.121)

– The Fisher table gives, at the 5% significance level: .F (2, 2) = 19. Since the
calculated value of the statistic is lower than the critical value, we cannot reject
the null hypothesis that the coefficients associated with the variables .X2t and .X3t
are not significant.

Calculation of the Partial Correlation Coefficients


Suppose we want to calculate the following two partial correlation coefficients:

– .rY X1 ,X2 , measuring the influence of .X1 on Y , the influence of .X2 having been
removed
– .rY X3 ,X1 X2 , measuring the influence of .X3 on Y , the influences of .X1 and .X2
having been removed

In order to calculate .rY X1 ,X2 , we start by regressing Y on a constant and .X2 ,


which gives:

Ŷ_t = 3.00 + 0.25 X_2t   (3.122)
      (1.23)  (0.46)

We derive the residual series:

e_1t = Y_t − 3.00 − 0.25 X_2t   (3.123)

We then regress .X1 on a constant and .X2 :

X̂_1t = 6.86 − 0.46 X_2t   (3.124)
       (3.02)  (−0.93)

and we deduce the residual series:

e_2t = X_1t − 6.86 + 0.46 X_2t   (3.125)

The partial correlation coefficient .rY X1 ,X2 is equal to the correlation coefficient
between .e1 and .e2 ; hence:

r_{Y X1, X2} = r_{e1 e2} = 0.14   (3.126)

The correlation between Y and .X1 is equal to 0.14, when the influence of .X2 is
removed.
To determine .rY X3 ,X1 X2 , we first regress Y on .X1 and .X2 , i.e.:

Ŷ_t = 1.95 + 0.15 X_1t + 0.32 X_2t   (3.127)
      (0.38)  (0.25)    (0.47)

and we note .e3t the residual series:

e_3t = Y_t − 1.95 − 0.15 X_1t − 0.32 X_2t   (3.128)

We then regress .X3 on .X1 and .X2 and obtain:

X̂_3t = 4.14 + 0.07 X_1t + 0.17 X_2t   (3.129)
       (0.75)  (0.09)    (0.23)
134 3 The Multiple Regression Model

from which we deduce the residual series:

.e4t = X̂3t − 4.14 − 0.07X1t − 0.17X2t (3.130)

We then obtain the desired result:

r_{Y X3, X1 X2} = r_{e3 e4} = −0.96   (3.131)

The correlation between Y and X_3 is −0.96 when the influence of the variables
X_1 and X_2 is removed.
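The two-step procedure used above (regress each variable on the controls, then correlate the two residual series) can be written compactly. The following minimal Python sketch (the function name is hypothetical) computes a partial correlation coefficient of any order:

import numpy as np

def partial_correlation(y, x, controls):
    """Partial correlation between y and x, the influence of the controls removed."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    Z = np.column_stack([np.ones(len(y)), controls])   # constant + control variables
    def residuals(v):
        b = np.linalg.lstsq(Z, v, rcond=None)[0]       # auxiliary OLS regression
        return v - Z @ b
    e1, e2 = residuals(y), residuals(x)
    return np.corrcoef(e1, e2)[0, 1]                   # correlation of the residuals

Passing a single control series gives a first-order coefficient such as (3.126); passing two controls gives a second-order coefficient such as (3.131).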

3.5 Some Examples of Cross-Sectional Applications

As in the previous chapter, we offer a few examples from the literature to highlight
the utility of the multiple regression model for cross-sectional studies.

3.5.1 Determinants of Crime

The first example concerns the determinants of crime in developing countries. In


this context, the study by Puech (2005) aims to investigate the explanatory factors
of violent crime rates in 2000 for 723 municipalities in the Brazilian state of Minas
Gerais. Two categories of crimes are considered: crimes against property and crimes
against persons. The explanatory variables selected by Puech (2005) include:

– Level of development (DEV ), measured through the municipality’s average


income
– Income inequality, measured by the Gini coefficient of income distribution
(GI NI )
– Changes in economic conditions, captured by the income growth rate between
1991 and 2000 (GROW T H )
– Demographic and sociological variables such as:
– The number of police officers per capita (P OL)
– The urbanization rate (U RB)
– The proportion of young people aged between 15 and 24 in the municipality’s
population (Y OU NG)
– Family instability, measured by the proportion of households headed by single
women (W OMEN )

As previously mentioned, two dependent variables are considered:

– The aggregate rate of violent property crimes: sum of theft, armed robbery, and
auto theft

Table 3.6 Determinants of crime

              Property           Persons
DEV           9.12 (2.43)        −0.03 (−0.10)
DEV²          −0.68 (−1.91)
GINI          3.13 (3.84)        2.12 (2.70)
GROWTH        −0.50 (−1.99)      0.20 (0.82)
POL           75.01 (1.54)       −41.61 (−0.89)
URB           −0.18 (−0.35)      0.53 (1.07)
YOUNG         39.29 (5.04)       25.11 (3.39)
WOMEN         14.71 (2.40)       20.51 (3.49)
Constant      −33.13 (−3.19)     −6.64 (−2.24)

Source: Puech (2005). Figures in parentheses are t-statistics of the estimated
coefficients. In the column “Property” (resp. “Persons”), the explained variable is
the rate of crimes against property (resp. persons).

– The aggregate rate of violent crimes against persons: sum of attempted homi-
cides, homicides, and assaults

Table 3.6 reports Puech (2005) estimation results. Schematically, if we consider


property crimes, it emerges that both average income and income inequality have a
positive impact on this type of crime. We also note that property crimes increase with
the proportion of young people and vary with family vulnerability in the expected
direction. Turning now to crimes against persons, we see that average income has
no impact, while inequality has an expected positive effect. Like property crimes,
crimes against people also vary positively with the proportion of young people in
the population and with family instability. In summary, the three main determinants
of crime in Minas Gerais are income inequality, the proportion of young people in
the municipality’s population, and family instability.

3.5.2 Health Econometrics

The second example falls within the field of health econometrics. It is based on
the work of Thuilliez (2007) and highlights the relationship between malaria and
primary education. Primary education is apprehended in terms of school results
through repetition and completion rates in primary school. With regard to malaria
(MALA), the author uses an index measured as the proportion of the population “at
risk” (i.e., likely to contract malaria) in a country in 1994. The analysis covers a set
of 80 countries. Thuilliez (2007) also considers the following explanatory variables:

– Per capita income (INC): this variable takes into account the fact that countries with higher income levels offer a better quality of education than others. Per capita income is measured by GDP per capita at the purchasing power parity level (in logarithms).
– Level of urbanization (URB): this variable is expected to have a positive impact on educational outcomes. The variable is expressed in logarithms.
– Public expenditure on primary education, expressed as a percentage of GDP (EXP) and in logarithms.
– Public expenditure management efficiency (GEI). The measure used is the government effectiveness index proposed by Kaufmann et al. (2006).
– Geographical location variables: percentage of regions belonging to tropical zones (TROP), on the one hand, and subtropical zones (SUBTROP), on the other hand.
– Infant mortality rate (MOR), in logarithms.

Table 3.7 Education and malaria

                Repetition           Completion
MALA              0.096   (2.46)      −0.295  (−3.40)
INC               0.023   (1.53)      −0.003  (−0.06)
URB               0.031   (0.97)       0.072   (1.36)
EXP              −0.019  (−1.36)       0.018   (0.60)
GEI              −0.001  (−0.08)      −0.016  (−0.42)
TROP              0.017   (0.39)      −0.031  (−0.40)
SUBTROP           0.038   (1.31)       0.008   (0.10)
MOR               0.053   (3.12)      −0.095  (−1.51)
Constant         −0.530  (−2.83)       0.991   (1.77)

Source: Thuilliez (2007). Numbers in parentheses are t-statistics of the estimated coefficients. In the "Repetition" (resp. "Completion") column, the dependent variable is the repetition (resp. completion) rate in primary school.

The results obtained by Thuilliez (2007) are displayed in Table 3.7. They show a relationship between malaria and primary school repetition and completion rates. Concerning the repetition rate, the coefficient assigned to the variable MALA is positive, meaning that malaria tends to increase the repetition rate, all other things being equal. The value of 0.096 shows that children living in malaria-risk countries
have repetition rates 9.6% higher than those living in noninfested countries, all other
things being equal. Similarly, if we consider the regression with completion rate
as the explained variable, it appears that the coefficient is negative: malaria has a
negative impact on the completion rate. The estimated value of the coefficient also
shows that high-risk countries have primary school completion rates 29.5% lower
than those of non-risk countries, all other things being equal. On the other hand, we
find that per capita income has no significant impact on repetition and completion
rates, nor do geographic location variables and public expenditure variables. The
fact that public expenditure does not appear to be significant simply illustrates that
increasing school resources does not imply children get better results. Only the
infant mortality rate has a positive effect on the primary school repetition rate. In
summary, this study by Thuilliez (2007) shows that malaria has a negative impact
on children’s school performance, which is in line with expectations.

3.5.3 Inequalities and Financial Openness

The third example comes from the work of Bénassy-Quéré and Salins (2005) and
examines the impact of financial openness on income inequalities for 42 developing
and emerging countries in 2001. Four types of inequalities are distinguished: inter-
individual inequalities, geographical disparities, urban/rural disparities, and regional
disparities. The financial openness variable (denoted OPENFI) is defined as
the sum of the absolute values of the country’s capital inflows and outflows, as
a percentage of its GDP. The explanatory variables selected, which are control
variables, differ according to the type of inequality considered.
For the study of inter-individual income inequalities, three control variables are
used:

– Social mobility (SOCMOB): this variable can take four values, ranging from 1
(for the worst institutional environment) to 4 (for the best environment). Social
mobility corresponds to recruitment and promotion in the public and private
sectors. A value of 1 corresponds to recruitment or promotion based on social
position, and a value of 4 corresponds to recruitment or promotion based on
merit.
– Trade openness (TRADE): this represents the scale of tariff and nontariff
barriers. It ranges from 1 (high barriers) to 4 (low barriers).
– The scale of structural reforms undertaken in the country under consideration
following financial and trade openness (REF). The variable ranges from 1 (no
reform) to 4 (very extensive reforms).

Concerning geographical disparities, the following three control variables are considered:

– GDP per capita (in logarithms), denoted GDP.
– The level of infrastructure development (INFRA). The variable is 1 when the level of infrastructure is low and 4 when it is high.
– The geographical mobility of a country's inhabitants (GEOMOB). It varies from 1 (low mobility) to 4 (high mobility).

The results concerning the effect of financial openness on income equality are
given in the equation below:

$$\widehat{EQUALITY} = \underset{(3.43)}{1.730} - \underset{(0.43)}{0.226}\, OPENFI + \underset{(2.19)}{0.436}\, SOCMOB - \underset{(-2.37)}{0.403}\, TRADE + \underset{(1.89)}{0.262}\, REF \qquad (3.132)$$

The dependent variable is an indicator of inter-individual income equality within a given population. Consequently, when an explanatory variable has a positive coefficient, it tends to increase equality and therefore reduce income inequality. The figures in parentheses below the values of the estimated coefficients are t-statistics of these coefficients. This estimation shows that financial openness has no significant impact on income inequalities: the t-statistic assigned to the coefficient associated with the variable OPENFI is indeed lower than the critical value at the 5% significance level, implying that OPENFI does not contribute to the explanation of income inequalities. Concerning the other three variables, it turns out that social mobility and reforms reduce inequalities, since their associated coefficients are positive, whereas trade openness tends to increase them.
Table 3.8 presents the results relating to geographical disparities. As shown, financial openness tends to reduce geographical income inequalities, as well as urban-rural inequalities. However, the impact of financial openness on regional inequalities is not significant. The level of infrastructure development tends to increase regional inequalities, while the level of GDP per capita tends to reduce them.

Table 3.8 Impact of financial openness on geographic disparities

             Geographic equality   Urban-rural equality   Regional equality
Constant       0.524  (0.46)         0.720  (0.47)          0.251  (0.27)
OPENFI         1.146  (4.92)         2.172  (6.19)         −0.283 (−0.78)
GDP            0.130  (0.98)         0.089  (0.51)          0.186  (1.71)
INFRA         −0.198 (−2.33)        −0.049 (−0.37)         −0.406 (−2.90)
GEOMOB         0.033  (0.39)        −0.036 (−0.32)          0.128  (1.36)

Source: Bénassy-Quéré and Salins (2005). Figures in parentheses are t-statistics of the estimated coefficients.

3.5.4 Inequality and Voting Behavior

The last cross-sectional example we present is based on the article by Farvaque et al. (2007), and concerns the impact of environmental variables on the electoral behavior of residents in France. The sample consists of 560 French municipalities with more than 10,000 inhabitants and covers the 2001 municipal elections.

The dependent variable is the outcome of the incumbent mayor's party (PARTY), expressed as a percentage of total votes cast. The explanatory variables are:

– Incumbent bonus (SORT): this is the percentage of votes obtained by the incumbent party in the previous municipal elections.
– A binary variable that is 1 if the incumbent party is the same as the one to which the President of the Republic belongs (Jacques Chirac in 2001). This variable, noted PRES, equals 0 when the party is different.
– A variable representing the merger (MERGER) of lists between the two rounds of voting. The value of this variable is 1 when a merger occurs between the two rounds of voting, 0 otherwise.
– Two ecological variables are also used:
  – An air pollution indicator representing the ozone concentration (OZONE) for each region considered
  – A soil pollution indicator representing the number of polluted sites and soils in the county (POLL)
– Finally, a control variable, noted POP, is introduced to take into account the size of the cities: the variable POP is 1 if the city counts more than 100,000 inhabitants, 0 otherwise.

The results obtained by Farvaque et al. (2007) are:

$$\widehat{PARTY} = \underset{(6.23)}{33.43} + \underset{(6.21)}{0.47}\, SORT + \underset{(4.19)}{6.10}\, PRES - \underset{(-3.25)}{12.24}\, MERGER - \underset{(-2.82)}{0.02}\, OZONE - \underset{(-2.75)}{0.02}\, POLL - \underset{(-3.51)}{4.40}\, POP \qquad (3.133)$$

Among the political variables, the incumbent's bonus has a positive and highly significant effect. Thus, all other things being equal, almost half of the votes obtained by the party in the previous elections are carried over to the incumbent's current score. The variable PRES also has a significant positive impact, illustrating significant regional and national interactions. Finally, the variable MERGER has a negative sign, indicating that the merger of the incumbent party with another list between the two rounds of voting has a negative effect on the incumbent party. Also noteworthy is the significant impact of the variable POP showing that the size of the municipality has an unfavorable effect on the incumbent party.

Turning now to the ecological variables, their impact is negative. The variable OZONE has a significant and negative coefficient: air pollution has a negative impact on the election of the incumbent mayor. The same is true for the variable POLL, indicating that voters tend to punish incumbent mayors from municipalities with the most polluted sites and soils. Overall, this study shows that ecological inequalities, as reflected in environmental variables, have a significant impact on electoral behavior.

3.6 Prediction

One of the practical interests of the regression model lies in forecasting. Thus, once
the model has been estimated, it can be used to predict the evolution of the dependent
variable.

3.6.1 Determination of Predicted Value and Prediction Interval

The following model was estimated:

$$Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \ldots + \beta_k X_{kt} + \varepsilon_t \qquad (3.134)$$

for $t = 1, \ldots, T$, i.e.:

$$\hat{Y}_t = \hat{\alpha} + \hat{\beta}_1 X_{1t} + \hat{\beta}_2 X_{2t} + \ldots + \hat{\beta}_k X_{kt} \qquad (3.135)$$

We seek to determine the forecast of the dependent variable for a horizon h, i.e., $\hat{Y}_{T+h}$, as well as the associated prediction interval. The latter is given by:

$$\hat{Y}_{T+h} \pm t_{p/2}\, \hat{\sigma}_\varepsilon \sqrt{R\left(X'X\right)^{-1}R' + 1} \qquad (3.136)$$

where $R$ is the matrix (row vector) containing the values of the explanatory variables at date $T+h$ and whose first element is 1.

Let us explain this expression and the value taken by $\hat{Y}_{T+h}$. Assuming that the relationship generating the explained variable remains identical and that the values of the explanatory variables are known in $T+h$, we have:

$$\hat{Y}_{T+h} = \hat{\alpha} + \hat{\beta}_1 X_{1,T+h} + \hat{\beta}_2 X_{2,T+h} + \ldots + \hat{\beta}_k X_{k,T+h} \qquad (3.137)$$

The prediction error is:

$$e_{T+h} = Y_{T+h} - \hat{Y}_{T+h} = \varepsilon_{T+h} - (\hat{\alpha} - \alpha) - (\hat{\beta}_1 - \beta_1)X_{1,T+h} - \ldots - (\hat{\beta}_k - \beta_k)X_{k,T+h} \qquad (3.138)$$

Since the OLS estimators of the coefficients are unbiased estimators and given that $E(\varepsilon_{T+h}) = 0$, we deduce:

$$E(e_{T+h}) = 0 \qquad (3.139)$$

In order to determine a prediction interval, we need to calculate the variance of the forecast error. To do this, let us express the forecast in matrix form. We can write:

$$Y_{T+h} = R\beta + \varepsilon_{T+h} \qquad (3.140)$$

where, as before, $R$ is the matrix (row vector) containing the values of the explanatory variables at date $T+h$ and whose first element is 1. The forecast is then given by:

$$\hat{Y}_{T+h} = R\hat{\beta} \qquad (3.141)$$

and the forecast error is written as:

$$e_{T+h} = Y_{T+h} - \hat{Y}_{T+h} = \varepsilon_{T+h} - R\left(\hat{\beta} - \beta\right) \qquad (3.142)$$

Knowing that $E(\hat{\beta}) = \beta$ and $E(\varepsilon_{T+h}) = 0$, we have:

$$E(e_{T+h}) = 0 \qquad (3.143)$$

The variance of the forecast error is:

$$V(e_{T+h}) = E\left[\left(\varepsilon_{T+h} - R\left(\hat{\beta} - \beta\right)\right)\left(\varepsilon_{T+h} - R\left(\hat{\beta} - \beta\right)\right)'\right] \qquad (3.144)$$

Using (3.265), we obtain:

$$V(e_{T+h}) = \sigma_\varepsilon^2\left[R\left(X'X\right)^{-1}R' + 1\right] \qquad (3.145)$$

Knowing that the forecast error is normally distributed, we have:

$$\frac{e_{T+h}}{\sigma_\varepsilon\sqrt{R\left(X'X\right)^{-1}R' + 1}} \sim N(0, 1) \qquad (3.146)$$

Replacing $\sigma_\varepsilon$ by its estimator (see Eq. (3.40)):

$$\hat{\sigma}_\varepsilon = \sqrt{\frac{e'e}{T-k-1}} \qquad (3.147)$$

we get:

$$\frac{Y_{T+h} - \hat{Y}_{T+h}}{\hat{\sigma}_\varepsilon\sqrt{R\left(X'X\right)^{-1}R' + 1}} \sim t(T-k-1) \qquad (3.148)$$

It is then possible to construct a $100(1-p)\%$ prediction interval for $Y_{T+h}$:

$$\hat{Y}_{T+h} \pm t_{p/2}\, \hat{\sigma}_\varepsilon\sqrt{R\left(X'X\right)^{-1}R' + 1} \qquad (3.149)$$

Taking the usual value $p = 5\%$, the 95% interval is thus written:

$$\hat{Y}_{T+h} \pm t_{0.025}\, \hat{\sigma}_\varepsilon\sqrt{R\left(X'X\right)^{-1}R' + 1} \qquad (3.150)$$

3.6.2 Example

Let us take the example studied throughout this chapter (see Sect. 3.2.5) and suppose that we wish to predict the value of Y for the date $t = 7$. Also assume that the values of the three explanatory variables are known in $t = 7$ and are given by: $X_{17} = 6$, $X_{27} = 8$, and $X_{37} = 1$. The matrix $R$ is written as:

$$R = \begin{pmatrix} 1 & 6 & 8 & 1 \end{pmatrix} \qquad (3.151)$$

The expected value of $Y_7$ is given by:

$$\hat{Y}_7 = \hat{\alpha} + \hat{\beta}_1 X_{17} + \hat{\beta}_2 X_{27} + \hat{\beta}_3 X_{37} \qquad (3.152)$$

that is:

$$\hat{Y}_7 = 5.57 + 0.21 \times 6 + 0.47 \times 8 - 0.87 \times 1 \qquad (3.153)$$

Hence:

$$\hat{Y}_7 = 9.72 \qquad (3.154)$$

Now let us determine the 95% prediction interval for $Y_7$. We had found:

$$\left(X'X\right)^{-1} = \begin{pmatrix} 2.87 & -0.24 & -0.24 & -0.11 \\ -0.24 & 0.04 & 0.02 & -0.002 \\ -0.24 & 0.02 & 0.04 & -0.004 \\ -0.11 & -0.002 & -0.004 & 0.03 \end{pmatrix} \qquad (3.155)$$

Therefore:

$$R\left(X'X\right)^{-1}R' = \begin{pmatrix} 1 & 6 & 8 & 1 \end{pmatrix}\begin{pmatrix} 2.87 & -0.24 & -0.24 & -0.11 \\ -0.24 & 0.04 & 0.02 & -0.002 \\ -0.24 & 0.02 & 0.04 & -0.004 \\ -0.11 & -0.002 & -0.004 & 0.03 \end{pmatrix}\begin{pmatrix} 1 \\ 6 \\ 8 \\ 1 \end{pmatrix} \qquad (3.156)$$

Hence:

$$R\left(X'X\right)^{-1}R' = 1.79 \qquad (3.157)$$

We had also obtained $\hat{\sigma}_\varepsilon^2 = 1.28$, i.e., $\hat{\sigma}_\varepsilon = 1.13$. Using (3.150) and knowing that $t(T-k-1) = t(2) = 4.303$, the 95% prediction interval for $Y_7$ is:

$$9.72 \pm 4.303 \times 1.13 \times \sqrt{1.79 + 1} = 9.72 \pm 8.12 \qquad (3.158)$$

which corresponds to an interval ranging from 1.60 to 17.84.
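The same computation can be scripted. The sketch below (assuming numpy and scipy are available; the variable names are ours) reuses the rounded quantities of the worked example, so its output matches the interval above up to rounding.

```python
import numpy as np
from scipy import stats

# Quantities taken from the worked example above (rounded values)
XtX_inv = np.array([[ 2.87, -0.24, -0.24, -0.11 ],
                    [-0.24,  0.04,  0.02, -0.002],
                    [-0.24,  0.02,  0.04, -0.004],
                    [-0.11, -0.002, -0.004, 0.03]])
beta_hat = np.array([5.57, 0.21, 0.47, -0.87])   # (alpha, beta1, beta2, beta3)
sigma_hat = np.sqrt(1.28)                        # residual standard error
df = 2                                           # T - k - 1

R = np.array([1.0, 6.0, 8.0, 1.0])               # regressor values at t = 7
y_hat = R @ beta_hat                             # point forecast, about 9.72
half_width = stats.t.ppf(0.975, df) * sigma_hat * np.sqrt(R @ XtX_inv @ R + 1)
print(y_hat, y_hat - half_width, y_hat + half_width)   # roughly 9.72, 1.60, 17.84
```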

3.7 Model Comparison Criteria

We have previously presented the coefficient of determination and the adjusted coefficient of determination. These two statistics can be used to compare models.
There are also other criteria, which we describe below.

3.7.1 Explanatory Power/Predictive Power of a Model

It is worth distinguishing the explanatory power of a model from its predictive power, or, in other words, in-sample prediction from out-of-sample prediction.
In-sample prediction indicates how the estimated model fits the observations in
a given sample. It thus concerns the explanatory power of the model. Out-of-
sample prediction refers to how an estimated model predicts the future values of
the explained variable, given the future values of the explanatory variables. Out-of-
sample forecasting therefore concerns the predictive power of a model.

3.7.2 Coefficient of Determination and Adjusted Coefficient of Determination

These coefficients have already been presented. Let us just recall some essential
points.
The coefficient of determination lies between 0 and 1; the closer it is to 1, the
better the quality of the fit. It measures the quality of the fit within the sample
and is therefore a measure of a model's explanatory power. A model with a high coefficient of determination will not necessarily perform well in out-of-sample forecasting; the coefficient of determination is not a measure of a model's predictive power. If several models (with the same explained variable and the same number of explanatory variables) are compared on the basis of the coefficient of determination, the one with the highest $R^2$ value should be selected.
It is important to remember, however, that there is a nondecreasing relationship
between the value of the coefficient of determination and the number of explanatory
variables introduced into a model. For this reason, the adjusted (or corrected)
coefficient of determination has been proposed and can be used to compare models
with a different number of explanatory variables; the best model being the one
with the highest adjusted coefficient of determination. Like the usual coefficient
of determination, the adjusted coefficient of determination only allows us to judge
the explanatory power of a model, not its predictive power.

3.7.3 Information Criteria

The information criteria are based on information theory and are intended to assess
the loss of information—called the amount of Kullback information8 —when an
estimated model is thought to represent the true data-generating process. Since the
aim is to minimize this loss of information, the model to be preferred, among all the
models estimated, is the one that will minimize the information criteria.
The information criteria (IC) are based on the use of the maximum likelihood
method (see Appendix 2.3 to Chap. 2). They are of the form:

$$IC = -\frac{2\ell}{T} + \frac{p(T)}{T} \qquad (3.159)$$

where $\ell$ is the log-likelihood function:

$$\ell = -\frac{T}{2}\left[1 + \log(2\pi) + \log\left(\frac{e'e}{T}\right)\right] \qquad (3.160)$$

8 Strictly speaking, this is known as Kullback-Leibler information (see Kullback and Leibler, 1951).
and $p(T)$ is a penalty function that increases with the model's complexity, i.e.,
with the number of explanatory variables introduced into the model. In other words,
the information criteria penalize the addition of variables to guard against the risk of
over-fitting or over-parameterization. The various information criteria proposed in
the literature are distinguished by the penalty function adopted. The most frequently
used criteria are those introduced by Akaike (1973),9 Schwarz (1978), and, to a
lesser extent, Hannan and Quinn (1979), which we present below.

Akaike Information Criterion (AIC)

The Akaike information criterion is written:

$$AIC = \frac{-2\ell}{T} + \frac{2(k+1)}{T} \qquad (3.161)$$

which can still be expressed, under the assumption of error normality,10 as follows:

$$AIC = \log\hat{\sigma}_\varepsilon^2 + \frac{2(k+1)}{T} \qquad (3.162)$$

where $\hat{\sigma}_\varepsilon^2$ is the error variance estimator, k is the number of explanatory variables
included in the model, and T is the number of observations. If several models are
compared on the basis of this criterion, the model with the lowest AIC will be
selected. This is thus a criterion to be minimized. Unlike the usual and adjusted
coefficients of determination, the AIC can be used to assess the explanatory power
of a model, but also its predictive power.

Remark 3.6 When the number of explanatory variables k is large relative to the number of observations—which may happen in the case of a small sample—it is possible to use the corrected AIC (see Hurvich and Tsai, 1989), denoted $AIC_c$, given by:

$$AIC_c = AIC + \frac{2(k+1)(k+2)}{T-k-2} \qquad (3.163)$$

Schwarz Information Criterion (SIC)

The SIC is given by:

$$SIC = \frac{-2\ell}{T} + \frac{(k+1)}{T}\log(T) \qquad (3.164)$$

or, under the assumption of error normality:

$$SIC = \log\hat{\sigma}_\varepsilon^2 + \frac{(k+1)}{T}\log T \qquad (3.165)$$

where $\hat{\sigma}_\varepsilon^2$ is the error variance estimator, k is the number of explanatory variables included in the model, and T is the number of observations. As with the AIC, the SIC must be minimized. Thus, the best model will be the one with the lowest SIC value. Like the AIC, the SIC criterion can be used to compare predictive performance both within and out of the sample.

The SIC is more parsimonious than the AIC, as it penalizes the number of variables in the model more heavily. In other words, it penalizes over-parameterization more strongly. As a result, the SIC tends to select models with either the same number of variables or fewer variables than those selected by the AIC.

9 See also Akaike (1969, 1974).
10 In this case, the OLS and maximum likelihood estimators are equivalent.

Hannan-Quinn Information Criterion (HQ)

The Hannan-Quinn information criterion, also used to assess the explanatory and predictive power of a model, is given by:

$$HQ = \frac{-2\ell}{T} + 2c\,\frac{(k+1)\log(\log T)}{T} \qquad (3.166)$$

or, under the assumption of error normality:

$$HQ = \log\hat{\sigma}_\varepsilon^2 + 2c\,\frac{(k+1)\log(\log T)}{T} \qquad (3.167)$$

where $\hat{\sigma}_\varepsilon^2$ is the error variance estimator, k is the number of explanatory variables included in the model, T is the number of observations, and c is a constant term for which a value of 1 is very frequently used. Like the other two information criteria, the HQ criterion must be minimized.
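As an illustration, the three criteria under error normality can be computed from the residuals of a fitted model as follows. This is a minimal Python sketch using the formulas as written above, with $\hat{\sigma}_\varepsilon^2 = e'e/(T-k-1)$ as defined in this chapter; econometric software may add constant terms to these expressions, so only comparisons across models, not absolute levels, are meaningful.

```python
import numpy as np

def information_criteria(residuals, k):
    """AIC, SIC, and HQ (with c = 1) computed from OLS residuals under error
    normality, using sigma2_hat = e'e/(T - k - 1) as in this chapter."""
    residuals = np.asarray(residuals)
    T = len(residuals)
    log_s2 = np.log(residuals @ residuals / (T - k - 1))
    aic = log_s2 + 2 * (k + 1) / T
    sic = log_s2 + (k + 1) * np.log(T) / T
    hq = log_s2 + 2 * (k + 1) * np.log(np.log(T)) / T
    return aic, sic, hq

# When comparing candidate models, keep the one with the smallest criterion values.
```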

3.7.4 The Mallows Criterion

Suppose we want to compare a model with k explanatory variables to a model with h explanatory variables, with $h \leq k$. We denote $RSS_h$ the sum of squared residuals associated with the model with h regressors and $\hat{\sigma}_\varepsilon^2$ the error variance estimate associated with the model with k regressors. The Mallows statistic is given by:

$$C_h = \frac{RSS_h}{\hat{\sigma}_\varepsilon^2} + \left(2(h+1) - T\right) \qquad (3.168)$$

It can be shown that:

$$E(C_h) \simeq h \qquad (3.169)$$

In choosing a model based on this statistic, we should keep the model whose statistic $C_h$ is the closest to h.
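A hypothetical numerical illustration of this selection rule is sketched below; the residual sums of squares are invented for the example, and the function name is ours.

```python
import numpy as np

def mallows_c(rss_h, sigma2_full, h, T):
    """Mallows statistic C_h = RSS_h / sigma2_hat + (2(h+1) - T),
    where sigma2_hat comes from the largest (k-regressor) model."""
    return rss_h / sigma2_full + (2 * (h + 1) - T)

# keep the sub-model whose C_h is closest to h (illustrative numbers only)
candidates = {1: 35.2, 2: 28.4, 3: 27.9}        # h -> RSS of the sub-model
T, k = 40, 3
sigma2_full = 27.9 / (T - k - 1)                # error variance from the full model
for h, rss in candidates.items():
    print(h, mallows_c(rss, sigma2_full, h, T))
```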

3.8 Empirical Application

We propose to study the relationship between the following three series of stock
market returns:

– Returns of the US Dow Jones Industrial Average index: RDJIND
– Returns of the UK FTSE 100 index: RFTSE
– Returns of the Japanese Nikkei 225 index: RNIKKEI

The data are shown in Table 3.9 and are taken from the Macrobond database. The series are monthly and cover the period from February 1984 to June 2021, i.e., a number of observations $T = 449$. Suppose we wish to explain the returns of the UK index by the returns of the Japanese and US stock indexes. The dependent variable is therefore RFTSE and the explanatory variables are RNIKKEI and RDJIND. We seek to estimate the following model:

$$RFTSE_t = \alpha + \beta_1 RNIKKEI_t + \beta_2 RDJIND_t + \varepsilon_t \qquad (3.170)$$

Table 3.9 Data on stock market returns

t           RFTSE      RNIKKEI    RDJIND
1984.02    −0.0216    −0.0164    −0.0555
1984.03     0.0671     0.0858     0.0088
1984.04     0.0229     0.0048     0.0050
···         ···        ···        ···
2021.04     0.0374    −0.0126     0.0267
2021.05     0.0075     0.0016     0.0191
2021.06     0.0021    −0.0024    −0.0008
Sum         1.8902     1.0381     3.3417

Data source: Macrobond

3.8.1 Practical Calculation of the OLS Estimators


We aim to estimate the vector $\beta = \begin{pmatrix} \alpha \\ \beta_1 \\ \beta_2 \end{pmatrix}$ using the OLS method. The matrix $X$ is written as:

$$X = \begin{pmatrix} 1 & -0.0164 & -0.0555 \\ 1 & 0.0858 & 0.0088 \\ 1 & 0.0048 & 0.0050 \\ \vdots & \vdots & \vdots \\ 1 & -0.0126 & 0.0267 \\ 1 & 0.0016 & 0.0191 \\ 1 & -0.0024 & -0.0008 \end{pmatrix} \qquad (3.171)$$

In practice, as we have previously seen, we can calculate the matrix $X'X$ as follows:

$$X'X = \begin{pmatrix} T & \sum RNIKKEI & \sum RDJIND \\ \sum RNIKKEI & \sum RNIKKEI^2 & \sum RNIKKEI \times RDJIND \\ \sum RDJIND & \sum RNIKKEI \times RDJIND & \sum RDJIND^2 \end{pmatrix} \qquad (3.172)$$

and the matrix $X'Y$ as follows:

$$X'Y = \begin{pmatrix} \sum RFTSE \\ \sum RFTSE \times RNIKKEI \\ \sum RFTSE \times RDJIND \end{pmatrix} \qquad (3.173)$$

Calculating the products between the variables gives:

– $\sum_{t=1}^{449} RFTSE_t \times RNIKKEI_t = 0.5788$
– $\sum_{t=1}^{449} RFTSE_t \times RDJIND_t = 0.6902$
– $\sum_{t=1}^{449} RNIKKEI_t \times RDJIND_t = 0.6178$

and calculating the sums of squares leads to:

– $\sum_{t=1}^{449} RFTSE_t^2 = 0.8937$
– $\sum_{t=1}^{449} RNIKKEI_t^2 = 1.5665$
– $\sum_{t=1}^{449} RDJIND_t^2 = 0.8890$

We deduce the matrices:

$$X'X = \begin{pmatrix} 449 & 1.0381 & 3.3417 \\ 1.0381 & 1.5665 & 0.6178 \\ 3.3417 & 0.6178 & 0.8890 \end{pmatrix} \qquad (3.174)$$

and:

$$X'Y = \begin{pmatrix} 1.8902 \\ 0.5788 \\ 0.6902 \end{pmatrix} \qquad (3.175)$$

Calculating the inverse of $X'X$ gives:

$$\left(X'X\right)^{-1} = \begin{pmatrix} 0.0023 & 0.0026 & -0.0104 \\ 0.0026 & 0.8823 & -0.6230 \\ -0.0104 & -0.6230 & 1.5971 \end{pmatrix} \qquad (3.176)$$

Hence:

$$\hat{\beta} = \begin{pmatrix} 0.0023 & 0.0026 & -0.0104 \\ 0.0026 & 0.8823 & -0.6230 \\ -0.0104 & -0.6230 & 1.5971 \end{pmatrix}\begin{pmatrix} 1.8902 \\ 0.5788 \\ 0.6902 \end{pmatrix} \qquad (3.177)$$

or finally:

$$\hat{\beta} = \begin{pmatrix} -0.0014 \\ 0.0856 \\ 0.7221 \end{pmatrix} \qquad (3.178)$$

The estimated model is therefore:

$$\widehat{RFTSE}_t = -0.0014 + 0.0856\, RNIKKEI_t + 0.7221\, RDJIND_t \qquad (3.179)$$
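As a cross-check, the normal-equations calculation can be reproduced in a few lines of Python. This is only a sketch based on the rounded cross-products reported above, so the result matches the text up to rounding; with the full data matrix one would instead solve the least squares problem directly.

```python
import numpy as np

# Cross-product matrices reported above (rounded values from the text)
XtX = np.array([[449.0,  1.0381, 3.3417],
                [1.0381, 1.5665, 0.6178],
                [3.3417, 0.6178, 0.8890]])
XtY = np.array([1.8902, 0.5788, 0.6902])

# OLS estimator beta_hat = (X'X)^{-1} X'Y; solving the linear system is
# numerically preferable to forming the explicit inverse
beta_hat = np.linalg.solve(XtX, XtY)
print(beta_hat)   # approximately [-0.0014, 0.0856, 0.7221]
```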

Table 3.10 Series of stock market returns. OLS estimation

Dependent variable: RFTSE
Variable              Coefficient    Std. error    t-Statistic    Prob.
C                     −0.001362      0.001340      −1.016558      0.3099
RNIKKEI                0.085603      0.026253       3.260758      0.0012
RDJIND                 0.722065      0.035320      20.44346       0.0000

R-squared             0.606707       Mean dependent var       0.004210
Adjusted R-squared    0.604944       S.D. dependent var       0.044466
S.E. of regression    0.027948       Akaike info criterion   −4.310264
Sum squared resid     0.348373       Schwarz criterion       −4.282823
Log likelihood        970.6542       Hannan-Quinn criterion  −4.299447
F-statistic           344.0076       Durbin-Watson stat       2.246332
Prob(F-statistic)     0.000000

3.8.2 Software Estimation

Using Eviews software leads to the results shown in Table 3.10. In addition to the
estimation of the three coefficients, we have several results.
In particular, Table 3.10 gives us the standard deviations and t-statistics of the
estimated coefficients. We thus have:

$$t_{\hat{\alpha}} = \frac{\hat{\alpha}}{\sigma_{\hat{\alpha}}} = \frac{-0.0014}{0.0013} = -1.0166 \qquad (3.180)$$

$$t_{\hat{\beta}_1} = \frac{\hat{\beta}_1}{\sigma_{\hat{\beta}_1}} = \frac{0.0856}{0.0263} = 3.2608 \qquad (3.181)$$

$$t_{\hat{\beta}_2} = \frac{\hat{\beta}_2}{\sigma_{\hat{\beta}_2}} = \frac{0.7221}{0.0353} = 20.4435 \qquad (3.182)$$

It is then possible to perform significance tests on each of these coefficients. Each of the t-statistics has a Student's t distribution with $(T - k - 1) = (449 - 2 - 1) = 446$ degrees of freedom. At the 5% significance level, the critical value is equal to 1.96. It can be seen that:

– $|t_{\hat{\alpha}}| = 1.0166 < 1.96$: we do not reject the null hypothesis that $\alpha = 0$. The constant is not significantly different from zero.
– $|t_{\hat{\beta}_1}| = 3.2608 > 1.96$: we reject the null hypothesis that $\beta_1 = 0$. The coefficient associated with the Japanese variable is significant, meaning that RNIKKEI contributes to the explanation of RFTSE.
– $|t_{\hat{\beta}_2}| = 20.4435 > 1.96$: we reject the null hypothesis that $\beta_2 = 0$. The coefficient associated with the US variable is significant, meaning that RDJIND contributes to the explanation of RFTSE.

Furthermore, Table 3.10 gives us the value of the F-statistic allowing us to test the significance of the regression as a whole, i.e., to test the null hypothesis that $\beta_1 = \beta_2 = 0$. The statistic F follows a Fisher distribution with $(q, T - k - 1) = (2, 449 - 2 - 1) = (2, 446)$ degrees of freedom. The Fisher table, for a 5% significance level, gives us: $F(2, 446) = 2.997$. Given that $344.0076 > 2.997$, the null hypothesis that the coefficients are both equal to zero is rejected. This conclusion was expected in view of the results from the t-tests.

Table 3.10 provides the value of the coefficient of determination ($R^2 = 0.6067$) and the adjusted coefficient of determination ($\bar{R}^2 = 0.6049$). Thus, about 60% of the variation in UK returns is explained by the model, i.e., by Japanese and US returns.

If we compare the t-statistics of the coefficients assigned to RNIKKEI and RDJIND, we can see that $|t_{\hat{\beta}_2}| > |t_{\hat{\beta}_1}|$. Therefore, the variable RDJIND has more influence than RNIKKEI on RFTSE. We can then look at the simpler model consisting of regressing UK returns on US returns only. The results of this estimation are given in Table 3.11.

Table 3.11 Regression of RFTSE on RDJIND

Dependent variable: RFTSE
Variable              Coefficient    Std. error    t-Statistic    Prob.
C                     −0.001614      0.001352      −1.193763      0.2332
RDJIND                 0.782505      0.030388      25.75060       0.0000

R-squared             0.597331       Mean dependent var       0.004210
Adjusted R-squared    0.596430       S.D. dependent var       0.044466
S.E. of regression    0.028248       Akaike info criterion   −4.291158
Sum squared resid     0.356678       Schwarz criterion       −4.272864
Log likelihood        965.3650       Hannan-Quinn criterion  −4.283947
F-statistic           663.0935       Durbin-Watson stat       2.224700
Prob(F-statistic)     0.0000

Since the two estimated models (Tables 3.10 and 3.11) have the same dependent variable and cover the same period, they can be compared using the adjusted coefficient of determination. We see that $0.5964 < 0.6049$. The first model, which also incorporates Japanese returns, is to be preferred, as it explains a higher percentage of the variation in RFTSE. We can also note that, as expected, the sum of squared residuals of the first model is lower than that associated with the second model: $0.3484 < 0.3567$, corroborating the superiority of the first regression. These results are confirmed by the values taken by the Akaike, Schwarz, and Hannan-Quinn information criteria. Indeed, the model minimizing these three criteria is the one that includes the returns of the Japanese stock index. These results were expected since, even if the explanatory power of Japanese returns is lower than that of US returns, the Japanese variable contributes to the explanation of the UK series.

The model in Table 3.11 can be considered a constrained model, the unconstrained model being given in Table 3.10. The constrained model is such that $\beta_1 = 0$, meaning that the Japanese returns are not significant. It is then possible to perform a Fisher test on this hypothesis:

$$F = \frac{(RSS_c - RSS_{nc})/q}{RSS_{nc}/(T-k-1)} \sim F(q, T-k-1) \qquad (3.183)$$

with $RSS_{nc} = 0.348373$, $RSS_c = 0.356678$, and $q = 1$. So we have:

$$F = \frac{(0.356678 - 0.348373)/1}{0.348373/(449-2-1)} = 10.6324 \qquad (3.184)$$

This statistic follows a Fisher distribution with $(1, 446)$ degrees of freedom. At the 5% significance level, the Fisher table gives us $F(1, 446) = 3.842$. Since $10.6324 > 3.842$, we reject the null hypothesis that the coefficient associated with the Japanese variable is not significant. The Japanese returns contribute to the explanation of the UK returns, which is of course consistent with the result derived from the t-test on the coefficient associated with this same variable.
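The same constrained/unconstrained comparison can be coded directly from the two sums of squared residuals. The sketch below (assuming scipy is available; the function name is ours) reproduces the test of Eq. (3.183) with the figures of this application.

```python
from scipy import stats

def nested_f_test(rss_constrained, rss_unconstrained, q, T, k):
    """F test of q linear restrictions:
    F = [(RSS_c - RSS_nc)/q] / [RSS_nc/(T - k - 1)]."""
    f_stat = ((rss_constrained - rss_unconstrained) / q) / (rss_unconstrained / (T - k - 1))
    p_value = stats.f.sf(f_stat, q, T - k - 1)
    return f_stat, p_value

# Numbers from the stock-return example above
print(nested_f_test(0.356678, 0.348373, q=1, T=449, k=2))   # F about 10.63, p < 0.01
```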

Conclusion

This chapter has provided a detailed presentation of the multiple regression model.
It should be recalled that the model is based on a number of hypotheses relating
to the explanatory variables and the error term. These include the fundamental
assumptions of no autocorrelation and homoskedasticity of errors. Since, in practice,
one or both of these assumptions are often not met, Chap. 4 presents the procedure
to be followed when autocorrelation and/or heteroskedasticity of errors occur.

The Gist of the Chapter

Multiple regression model: $\underset{(T,1)}{Y} = \underset{(T,k+1)}{X}\,\underset{(k+1,1)}{\beta} + \underset{(T,1)}{\varepsilon}$

Matrices: vector of explained variable values $Y$; matrix of k explanatory variables $X$; vector of error term values $\varepsilon$

Assumptions: $X$ is nonrandom: $E(\varepsilon_t | X) = 0$; $X$ is of full rank: $\mathrm{Rank}(X) = k+1$; zero mean error: $E(\varepsilon) = 0$; non-autocorrelation and homoskedasticity: $E(\varepsilon\varepsilon') = \sigma_\varepsilon^2 I$; normality: $\varepsilon \sim N(0, \sigma_\varepsilon^2 I)$

Residuals: $e = Y - X\hat{\beta}$

OLS estimators: $\hat{\beta} = (X'X)^{-1}X'Y$; $\hat{\sigma}_\varepsilon^2 = \frac{e'e}{T-k-1} \equiv \frac{1}{T-k-1}\sum_{t=1}^{T} e_t^2$

Adjusted coefficient of determination: $\bar{R}^2 = 1 - \frac{T-1}{T-k-1}\left(1 - R^2\right)$

Information criteria: Akaike: $AIC = \log\hat{\sigma}_\varepsilon^2 + \frac{2(k+1)}{T}$; Schwarz: $SIC = \log\hat{\sigma}_\varepsilon^2 + \frac{(k+1)}{T}\log T$; Hannan-Quinn (with $c = 1$): $HQ = \log\hat{\sigma}_\varepsilon^2 + 2\,\frac{(k+1)\log(\log T)}{T}$

Further Reading

As in the previous chapter, developments on the multiple regression model can be


found in all econometrics textbooks (see the references cited in the reference section
at the end of the book). Worth mentioning are Judge et al. (1985, 1988), Johnston
and Dinardo (1996), Davidson and MacKinnon (1993), and Greene (2020). For
nonlinear systems of equations, interested readers may consult Gallant (1987).

Appendix 3.1: Elements of Matrix Algebra

This appendix presents the main matrix algebra concepts used in this chapter.

General

A matrix is an ordered array of elements (or entries):


⎛ ⎞
a11 a12 · · · a1p
 ⎜a21 a22 · · · a2p ⎟
A = aij
. =⎜


⎠ (3.185)
an1 an2 anp

aij is the element corresponding to the ith row and the j th column of the matrix .A.
.

The matrix .A has n rows and p columns. The size (or the dimension) of the matrix
is said to be .n × p (which is also noted as .(n, p)).
A row vector is a matrix containing only one row. A column vector is a matrix
with only one column. A matrix can therefore be thought of as a set of row vectors
or column vectors.
When the number of rows is equal to the number of columns, i.e., .n = p, we say
that .A is a square matrix. Frequently used square matrices include:
 
– Symmetric matrix: it is such that . aij = aj i for all i and j .
– Diagonal matrix: this is a matrix whose elements off the diagonal are zero:
⎛ ⎞
α1 0 0 · · · 0
⎜ 0 α2 0 · · · 0 ⎟
⎜ ⎟
⎜ .. ⎟
.A = ⎜ · · ·⎟
⎜· · · . ⎟ (3.186)
⎜ . ⎟
⎝· · · . . 0⎠
0 0 · · · 0 αp

– Scalar matrix: this is a diagonal matrix whose elements on the diagonal are all identical:

$$A = \begin{pmatrix} \alpha & 0 & \cdots & 0 \\ 0 & \alpha & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha \end{pmatrix} \qquad (3.187)$$

– Identity matrix: this is a scalar matrix, noted $I$, whose elements on the diagonal are all equal to 1:

$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \qquad (3.188)$$

– Triangular matrix (lower or upper): this is a matrix whose elements above or below the diagonal are all zero.

Main Matrix Operations

Let $B$ be a matrix such that:

$$B = \left(b_{ij}\right) = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{pmatrix} \qquad (3.189)$$

Equality
The matrices $A$ and $B$ are equal if they are of the same size and if $a_{ij} = b_{ij}$ for all i and j.

Transposition
The transpose $A'$ of the matrix $A$ is the matrix whose jth row corresponds to the jth column of the matrix $A$. Since the size of matrix $A$ is $n \times p$, the size of matrix $A'$ is $p \times n$. Thus, we have:

$$A' = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{np} \end{pmatrix} \qquad (3.190)$$

A symmetric matrix is therefore, by definition, a matrix such that:

$$A = A' \qquad (3.191)$$

The transpose of the transpose is equal to the original matrix, i.e.:

$$\left(A'\right)' = A \qquad (3.192)$$
Addition and Subtraction

Two matrices $A$ and $B$ can be added only if they are of the same dimensions, the matrix $C$ resulting from this sum also having this dimension:

$$C = A + B = \left(a_{ij} + b_{ij}\right) \qquad (3.193)$$

Similarly, we have:

$$D = A - B = \left(a_{ij} - b_{ij}\right) \qquad (3.194)$$

Note that:

$$(A + B)' = A' + B' \qquad (3.195)$$

i.e., the transpose of a sum is equal to the sum of the transposes.

Matrix Multiplication and Scalar Product

The scalar product of a row vector $a$ with n elements and a column vector $b$ with n elements is a scalar:

$$a'b = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n \qquad (3.196)$$

that is:

$$a'b = b'a = \sum_{i=1}^{n} a_i b_i \qquad (3.197)$$

Now consider two matrices $A$ and $B$ and assume that $A$ is of size $n \times p$ and $B$ is of size $p \times q$. The matrix $C$ resulting from the product of these two matrices is a matrix of size $n \times q$, that is:

$$\underset{(n \times q)}{C} = \underset{(n \times p)}{A}\,\underset{(p \times q)}{B} \qquad (3.198)$$

By noting $a_i$ $(i = 1, 2, \ldots, n)$ the rows of $A$ and $b_j$ $(j = 1, 2, \ldots, q)$ the columns of $B$, each element of $C$ is the scalar product of a row vector of $A$ and a column vector of $B$. Noting $c_{ij}$ the ijth element of the matrix $C$, we thus have:

$$c_{ij} = a_i' b_j \qquad (3.199)$$

Matrix multiplication is only possible if the number of columns in the first matrix (matrix $A$) is equal to the number of rows in the second matrix (matrix $B$). In this case, we speak about matrices that are compatible for multiplication.

The scalar multiplication of a matrix is the multiplication of each element of that matrix by a given scalar. Thus, for a scalar c and a matrix $A$, we have:

$$cA = \left(c\,a_{ij}\right) \qquad (3.200)$$

We also note the following results:

– Multiplication by the identity matrix:

$$AI = IA = A \qquad (3.201)$$

– Transpose of a product of two matrices:

$$(AB)' = B'A' \qquad (3.202)$$

– Transpose of a product of more than two matrices:

$$(ABC)' = C'B'A' \qquad (3.203)$$

– Multiplication of matrices is associative:

$$(AB)C = A(BC) \qquad (3.204)$$

– The sum and multiplication of matrices are distributive:

$$A(B + C) = AB + AC \qquad (3.205)$$

Idempotent Matrix
An idempotent matrix $A$ is a matrix verifying $AA = A$. In other words, an idempotent matrix is equal to its square. Furthermore, if $A$ is a symmetric idempotent matrix, then $A'A = A$.

Rank, Trace, Determinant, and Inverse Matrix


Rank of a Matrix
Consider a matrix $A$ of size $n \times p$. The rows of $A$ constitute n vectors, while the columns of $A$ represent p vectors. Let r denote the maximum number of linearly independent rows and s the maximum number of linearly independent columns. We can demonstrate that, for any matrix $A$ of size $n \times p$, we have:

$$r = s \qquad (3.206)$$

This maximum number of linearly independent rows or columns is called the rank of the matrix $A$. The maximum number of linearly independent rows is called the row rank and the maximum number of linearly independent columns is called the column rank. The row rank and the column rank of a matrix are therefore equal. A matrix whose rank is equal to the number of its columns is called a full rank matrix.

The rank of a matrix is therefore necessarily less than or equal to the number of its rows or columns, i.e.:

$$\mathrm{Rank}(A) \leq \min(n, p) \qquad (3.207)$$

Furthermore, we have the following properties:

$$\mathrm{Rank}(A) = \mathrm{Rank}(A') \qquad (3.208)$$

$$\mathrm{Rank}(A) = \mathrm{Rank}(A'A) = \mathrm{Rank}(AA') \qquad (3.209)$$

$$\mathrm{Rank}(AB) \leq \min\left(\mathrm{Rank}(A), \mathrm{Rank}(B)\right) \qquad (3.210)$$

We can deduce from this last equation that if $A$ is a matrix of size $n \times p$ and $B$ a nonsingular square matrix of size $p \times p$, then:

$$\mathrm{Rank}(AB) = \mathrm{Rank}(A) \qquad (3.211)$$

If $B$ is a square matrix of size n and rank n, it is said to be nonsingular. In this case, it admits a unique inverse matrix (see below) noted $B^{-1}$ such that:

$$BB^{-1} = B^{-1}B = I \qquad (3.212)$$

When the rank of the matrix $B$ is less than n, the matrix $B$ is said to be singular and has no inverse.

Trace of a Matrix
The trace of a square matrix $A$ of size $n \times n$, denoted $Tr(A)$, is the sum of its diagonal elements:

$$Tr(A) = \sum_{i=1}^{n} a_{ii} \qquad (3.213)$$

Furthermore:

$$Tr(A) = Tr(A') \qquad (3.214)$$

$$Tr(A + B) = Tr(A) + Tr(B) \qquad (3.215)$$

and:

$$Tr(AB) = Tr(BA) \qquad (3.216)$$

Determinant of a Matrix
The determinant of a matrix is defined for square matrices only.
As an introductory example, consider a matrix $A$ of size $2 \times 2$:

$$A = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \qquad (3.217)$$

The determinant of the matrix $A$, denoted $\det(A)$ or $|A|$, is given by:

$$\det(A) = |A| = \begin{vmatrix} a & c \\ b & d \end{vmatrix} = ad - bc \qquad (3.218)$$

More generally, for matrices of size $n \times n$, we use the cofactor expansion:

$$\det(A) = |A| = \sum_{j=1}^{n} a_{ij}(-1)^{i+j}\left|A_{ij}\right| \qquad (3.219)$$

where $A_{ij}$ is the matrix obtained from matrix $A$ by deleting row i and column j. $\left|A_{ij}\right|$ is called a minor and the term:

$$C_{ij} = (-1)^{i+j}\left|A_{ij}\right| \qquad (3.220)$$

is called a cofactor.

We have the following property:

$$|A| = \left|A'\right| \qquad (3.221)$$

If $A$ and $B$ are two square matrices, we have:

$$|AB| = |A| \times |B| \qquad (3.222)$$

Moreover, the determinant of a matrix is nonzero if and only if that matrix is of full rank. This last property thus provides a way to determine whether or not a matrix is of full rank (this is only operational if the matrix is not too large).

Inverse Matrix
For a matrix to be invertible, it must be nonsingular. Conversely, a matrix is nonsingular if and only if its inverse exists.

Consider a matrix $A$ of size $n \times n$. We can write the determinant of this matrix as a function of the cofactors as follows:

$$|A| = a_{i1}C_{i1} + a_{i2}C_{i2} + \ldots + a_{in}C_{in} \qquad (3.223)$$

or, equivalently:

$$|A| = a_{1j}C_{1j} + a_{2j}C_{2j} + \ldots + a_{nj}C_{nj} \qquad (3.224)$$

for $i, j = 1, \ldots, n$.

The inverse of the matrix $A$, denoted $A^{-1}$, is defined by:

$$A^{-1} = \frac{1}{|A|}\begin{pmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{pmatrix} \qquad (3.225)$$

Let us mention the following properties of inverse matrices:

$$\left|A^{-1}\right| = \frac{1}{|A|} \qquad (3.226)$$

$$\left(A^{-1}\right)^{-1} = A \qquad (3.227)$$

$$\left(A^{-1}\right)' = \left(A'\right)^{-1} \qquad (3.228)$$

$$(AB)^{-1} = B^{-1}A^{-1} \qquad (3.229)$$

$$(ABC)^{-1} = C^{-1}B^{-1}A^{-1} \qquad (3.230)$$
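For experimentation, the notions reviewed in this appendix map directly onto standard numerical routines. The following minimal sketch (an arbitrary numerical example, not taken from the text) uses numpy to compute the rank, trace, determinant, and inverse of a small matrix and to check the property $(AB)^{-1} = B^{-1}A^{-1}$.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 4.0], [0.0, 2.0]])

print(np.linalg.matrix_rank(A))   # rank
print(np.trace(A))                # trace: sum of the diagonal elements
print(np.linalg.det(A))           # determinant: 2*3 - 1*1 = 5
print(np.linalg.inv(A))           # inverse, defined since det(A) != 0

# check the property (AB)^{-1} = B^{-1} A^{-1}
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))      # True
```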

Appendix 3.2: Demonstrations


Appendix 3.2.1: Demonstration of the Minimum Variance Property
of OLS Estimators

In order to show that $\hat{\beta}$ is a minimum variance estimator, suppose there exists another linear estimator $\breve{\beta}$ of $\beta$:

$$\underset{(k+1,1)}{\breve{\beta}} = \underset{(k+1,T)}{M}\,\underset{(T,1)}{Y} \qquad (3.231)$$

We can then write:

$$\breve{\beta} = M(X\beta + \varepsilon) = MX\beta + M\varepsilon \qquad (3.232)$$

If $\breve{\beta}$ is the desired estimator, by virtue of (3.29), it is necessary that, in the expression of $M$:

$$M = \left(X'X\right)^{-1}X' + N \qquad (3.233)$$

we have $N = 0$, where $N$ is of dimension $(k+1, T)$. Let us show that $N = 0$.

Knowing that $\breve{\beta}$ must be an unbiased estimator of $\beta$, we have:

$$E\left(\breve{\beta}\right) = \beta \qquad (3.234)$$

Furthermore:

$$E\left(\breve{\beta}\right) = E(MX\beta + M\varepsilon) = MX\beta \qquad (3.235)$$

because $E(\varepsilon) = 0$. We deduce that $E\left(\breve{\beta}\right) = \beta$ if:

$$MX = I \qquad (3.236)$$

Replacing $M$ with $\left(X'X\right)^{-1}X' + N$, we have:

$$\left[\left(X'X\right)^{-1}X' + N\right]X = I \qquad (3.237)$$

Knowing that $\left(X'X\right)^{-1}X'X = I$, we get:

$$I + NX = I \qquad (3.238)$$

Hence:

$$NX = 0 \qquad (3.239)$$

Replacing $MX$ with $I$ in (3.232), we have:

$$\breve{\beta} = \beta + M\varepsilon = E\left(\breve{\beta}\right) + M\varepsilon \qquad (3.240)$$

Hence:

$$\breve{\beta} - \beta = \breve{\beta} - E\left(\breve{\beta}\right) = M\varepsilon \qquad (3.241)$$
Let us now determine the variance-covariance matrix $\Omega_{\breve{\beta}}$ of $\breve{\beta}$:

$$\Omega_{\breve{\beta}} = E\left[\left(\breve{\beta} - \beta\right)\left(\breve{\beta} - \beta\right)'\right] = E\left[(M\varepsilon)(M\varepsilon)'\right] = E\left(M\varepsilon\varepsilon'M'\right) \qquad (3.242)$$

Hence:

$$\Omega_{\breve{\beta}} = \sigma_\varepsilon^2 MM' \qquad (3.243)$$

Let us determine the matrix product $MM'$:

$$MM' = \left[\left(X'X\right)^{-1}X' + N\right]\left[\left(X'X\right)^{-1}X' + N\right]' \qquad (3.244)$$

$$MM' = \left[\left(X'X\right)^{-1}X' + N\right]\left[X\left(X'X\right)^{-1} + N'\right]$$

$$MM' = \left(X'X\right)^{-1}X'X\left(X'X\right)^{-1} + \left(X'X\right)^{-1}X'N' + NX\left(X'X\right)^{-1} + NN'$$

According to (3.239), $NX = X'N' = 0$, hence:

$$MM' = \left(X'X\right)^{-1} + NN' \qquad (3.245)$$

So we have:

$$\Omega_{\breve{\beta}} = \sigma_\varepsilon^2\left[\left(X'X\right)^{-1} + NN'\right] \qquad (3.246)$$

For $\breve{\beta}$ to have minimal variance and knowing that the variances lie on the diagonal of $\Omega_{\breve{\beta}}$, we need to minimize the diagonal elements of $\Omega_{\breve{\beta}}$. Since the diagonal elements of $\left(X'X\right)^{-1}$ are constants, the diagonal elements of the matrix $NN'$ must be minimized. If we denote $n_{ij}$ the elements of the matrix $N$, where i stands for the row and j for the column, the diagonal elements of the matrix $NN'$ are given by $\sum_j n_{ij}^2$. These elements are minimal if $\sum_j n_{ij}^2 = 0$, i.e., $n_{ij} = 0$ $\forall i, \forall j$. We deduce:

$$N = 0 \qquad (3.247)$$

Therefore:

$$\breve{\beta} = \hat{\beta} \qquad (3.248)$$

It follows that the OLS estimator $\hat{\beta}$ is of minimum variance.
Appendix 3.2: Demonstrations 163

Appendix 3.2.2: Calculation of the Error Variance

In order to estimate the variance $\sigma_\varepsilon^2$ of the errors, we need to use the residuals $e$:

$$e = Y - X\hat{\beta} \qquad (3.249)$$

We have:

$$e = X\beta + \varepsilon - X\left(X'X\right)^{-1}X'Y = X\beta + \varepsilon - X\left(X'X\right)^{-1}X'(X\beta + \varepsilon) \qquad (3.250)$$

Hence:

$$e = \varepsilon - X\left(X'X\right)^{-1}X'\varepsilon \qquad (3.251)$$

Noting $P = I - X\left(X'X\right)^{-1}X'$, we can write:

$$e = P\varepsilon \qquad (3.252)$$

Let us study the properties of the matrix $P$.

– $P' = \left[I - X\left(X'X\right)^{-1}X'\right]' = I - X\left(X'X\right)^{-1}X' = P$. $P$ is therefore a symmetric matrix.
– $P^2 = \left[I - X\left(X'X\right)^{-1}X'\right]\left[I - X\left(X'X\right)^{-1}X'\right] = P$. $P$ is therefore an idempotent matrix.

We have:

$$e'e = \varepsilon'P'P\varepsilon = \varepsilon'P\varepsilon \qquad (3.253)$$

by virtue of the idempotency and symmetry properties of the matrix $P$.

Let us now determine the mathematical expectation of $e'e$:

$$E\left(e'e\right) = E\left(\varepsilon'P\varepsilon\right) \qquad (3.254)$$

Since $\varepsilon'P\varepsilon$ is a scalar and noting $Tr$ the trace, we have:

$$E\left(e'e\right) = E\left[Tr\left(\varepsilon'P\varepsilon\right)\right] \qquad (3.255)$$

or:

$$E\left(e'e\right) = E\left[Tr\left(P\varepsilon\varepsilon'\right)\right] \qquad (3.256)$$

using the fact that $Tr(AB) = Tr(BA)$ with $A = \varepsilon'$ and $B = P\varepsilon$. Hence:

$$E\left(e'e\right) = \sigma_\varepsilon^2\,Tr(P) \qquad (3.257)$$

It remains for us to determine the trace of the matrix $P$:

$$Tr(P) = Tr\left[I - X\left(X'X\right)^{-1}X'\right] = Tr\,I - Tr\left[X\left(X'X\right)^{-1}X'\right] = Tr\,I - Tr\left[\left(X'X\right)^{-1}X'X\right] \qquad (3.258)$$

$Tr\,I = T$ and $Tr\left[\left(X'X\right)^{-1}X'X\right] = k + 1$ since the matrix $X'X$ is of size $(k+1, k+1)$. We deduce:

$$Tr(P) = T - k - 1 \qquad (3.259)$$

Finally:

$$E\left(e'e\right) = \sigma_\varepsilon^2(T - k - 1) \qquad (3.260)$$

It follows that the estimator $\hat{\sigma}_\varepsilon^2$ of the error variance is therefore written as:

$$\hat{\sigma}_\varepsilon^2 = \frac{e'e}{T-k-1} \equiv \frac{1}{T-k-1}\sum_{t=1}^{T} e_t^2 \qquad (3.261)$$

This is an unbiased estimator of $\sigma_\varepsilon^2$.

Appendix 3.2.3: Significance Tests of Several Coefficients

In order to derive the various significance tests, we need to determine the distribution followed by $R\beta$. $\beta$ being unknown, let us replace it by its estimator:

$$R\hat{\beta} = r \qquad (3.262)$$

and determine the distribution followed by $R\hat{\beta}$. Knowing that $\hat{\beta}$ is an unbiased estimator of $\beta$, we can write:

$$E\left(R\hat{\beta}\right) = R\beta \qquad (3.263)$$

Furthermore:

$$V\left(R\hat{\beta}\right) = E\left[R\left(\hat{\beta} - \beta\right)\left(\hat{\beta} - \beta\right)'R'\right] \qquad (3.264)$$

Hence:

$$V\left(R\hat{\beta}\right) = \sigma_\varepsilon^2 R\left(X'X\right)^{-1}R' \qquad (3.265)$$

We know that $\hat{\beta}$ follows a normal distribution with $(k+1)$ dimensions, therefore:

$$R\hat{\beta} \sim N\left(R\beta,\ \sigma_\varepsilon^2 R\left(X'X\right)^{-1}R'\right) \qquad (3.266)$$

and:

$$R\left(\hat{\beta} - \beta\right) \sim N\left(0,\ \sigma_\varepsilon^2 R\left(X'X\right)^{-1}R'\right) \qquad (3.267)$$

Under the null hypothesis $R\beta = r$, we therefore have:

$$R\hat{\beta} - r \sim N\left(0,\ \sigma_\varepsilon^2 R\left(X'X\right)^{-1}R'\right) \qquad (3.268)$$

Using the result that if $w \sim N(0, \Sigma)$ where $\Sigma$ is of size $(K, K)$, we have $w'\Sigma^{-1}w \sim \chi_K^2$; then:

$$\left(R\hat{\beta} - r\right)'\left[\sigma_\varepsilon^2 R\left(X'X\right)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \sim \chi_q^2 \qquad (3.269)$$

Knowing that:

$$\frac{e'e}{\sigma_\varepsilon^2} \sim \chi_{T-k-1}^2 \qquad (3.270)$$

and using the result (see Box 2.2 in Chap. 2) that if $w \sim \chi_s^2$ and $v \sim \chi_r^2$, the statistic $F = \frac{w/s}{v/r}$ follows a Fisher distribution with $(s, r)$ degrees of freedom, we deduce:

$$F = \frac{\left(R\hat{\beta} - r\right)'\left[R\left(X'X\right)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)/q}{e'e/(T-k-1)} \sim F(q, T-k-1) \qquad (3.271)$$
We then have the following decision rule:

– If $F \leq F(q, T-k-1)$, the null hypothesis is not rejected, i.e.: $R\beta = r$.
– If $F > F(q, T-k-1)$, the null hypothesis is rejected.

Let us now return to the three special cases studied—test on a single coefficient, test on all coefficients, and test on a subset of coefficients—to specify the expression of the test in each of these cases.

– Test on a particular regression coefficient $\beta_i$. This case corresponds to the null hypothesis $\beta_i = 0$, i.e.:

$$R = [0 \cdots 0\ 1\ 0 \cdots 0] \quad \text{and} \quad r = 0 \qquad (3.272)$$

We then have $R\hat{\beta} - r = \hat{\beta}_i$ and the quadratic form $R\left(X'X\right)^{-1}R'$ is equal to the $(i+1)$th element of the diagonal of the matrix $\left(X'X\right)^{-1}$, i.e., $a_{i+1,i+1}$. The test statistic given in (3.271) becomes:

$$F = \frac{\hat{\beta}_i^2\left(a_{i+1,i+1}\right)^{-1}/1}{\hat{\sigma}_\varepsilon^2} \sim F(1, T-k-1) \qquad (3.273)$$

that is finally:

$$F = \frac{\hat{\beta}_i^2}{\hat{\sigma}_\varepsilon^2\, a_{i+1,i+1}} \sim F(1, T-k-1) \qquad (3.274)$$

This test is equivalent to a Student's significance test on a single coefficient since $[t(T-k-1)]^2 = F(1, T-k-1)$. The same reasoning applies to the test of the null hypothesis $\beta_i = \beta_0$.


– Test of significance of all coefficients. This case corresponds to the null hypothesis $\beta_1 = \beta_2 = \cdots = \beta_k = 0$, i.e.:

$$R = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & 0 & 1 \end{pmatrix} \quad \text{and} \quad r = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \qquad (3.275)$$

We then have $R\hat{\beta} - r = \bar{\beta}$ with $\bar{\beta} = \left(\hat{\beta}_1\ \hat{\beta}_2 \cdots \hat{\beta}_k\right)'$, i.e., $\bar{\beta}$ is the vector of OLS coefficients without the constant term. Furthermore, the matrix $R\left(X'X\right)^{-1}R'$ involved in calculating the test statistic (Eq. (3.271)) is equal to the submatrix of size $(k, k)$ obtained by deleting the first row and column of the matrix $\left(X'X\right)^{-1}$. To clarify the expression of this submatrix, let us decompose the matrix $X$ into two blocks:

$$X = \left(\bar{x}\ \ \bar{X}\right) \qquad (3.276)$$

where $\bar{x}$ denotes a column vector composed of 1 and $\bar{X}$ is the matrix of size $(T, k)$ comprising the values of the k explanatory variables. We then have:

$$X'X = \begin{pmatrix} T & \bar{x}'\bar{X} \\ \bar{X}'\bar{x} & \bar{X}'\bar{X} \end{pmatrix} \qquad (3.277)$$

The calculation of $\left(X'X\right)^{-1}$ shows us that the submatrix of size $(k, k)$ that interests us here is written as:

$$\left(\bar{X}'\bar{X} - \bar{X}'\bar{x}\,T^{-1}\bar{x}'\bar{X}\right)^{-1} = \left(\bar{X}'Z\bar{X}\right)^{-1} \qquad (3.278)$$

where $Z$ is the transformation matrix given by:

$$Z = I - T^{-1}\bar{x}\bar{x}' \qquad (3.279)$$

Relationship (3.271) then becomes:

$$F = \frac{\bar{\beta}'\bar{X}'Z\bar{X}\bar{\beta}/q}{e'e/(T-k-1)} \qquad (3.280)$$

or, knowing that $q = k$:

$$F = \frac{\bar{\beta}'\bar{X}'Z\bar{X}\bar{\beta}/k}{e'e/(T-k-1)} \qquad (3.281)$$

The decision rule is given by:

– If $F \leq F(q, T-k-1)$, the null hypothesis that all explanatory variables are not significant is not rejected.
– If $F > F(q, T-k-1)$, the null hypothesis is rejected.

As we have seen in the chapter, this test can also be apprehended through the analysis-of-variance equation.
– Test of significance of a subset of coefficients. This case corresponds to the null hypothesis $\beta_{k-s+1} = \beta_{k-s+2} = \cdots = \beta_k = 0$, i.e.:

$$R = \left[\,0\ \ I_s\,\right] \quad \text{and} \quad r = 0 \qquad (3.282)$$

Let us decompose the matrix $X$ and the vector $\beta$ into blocks so that:

$$Y = \left(X_r\ \ X_s\right)\begin{pmatrix} \hat{\beta}_r \\ \hat{\beta}_s \end{pmatrix} + e = X_r\hat{\beta}_r + X_s\hat{\beta}_s + e \qquad (3.283)$$

where the matrix $X_r$ is formed by the $(k+1-s)$ first columns of $X$ and $X_s$ is formed by the s remaining columns of the matrix $X$. We then have: $R\hat{\beta} - r = \hat{\beta}_s$. Furthermore, the matrix $R\left(X'X\right)^{-1}R'$ involved in calculating the test statistic (Eq. (3.271)) is equal to the submatrix of order s obtained by deleting the $(k+1-s)$ first rows and columns of the matrix $\left(X'X\right)^{-1}$. Let us explain the form of this submatrix. We have:

$$X'X = \begin{pmatrix} X_r'X_r & X_r'X_s \\ X_s'X_r & X_s'X_s \end{pmatrix} \qquad (3.284)$$

The calculation of $\left(X'X\right)^{-1}$ shows us that the submatrix we are interested in here is written as:

$$\left(X_s'X_s - X_s'X_r\left(X_r'X_r\right)^{-1}X_r'X_s\right)^{-1} = \left(X_s'\left[I - X_r\left(X_r'X_r\right)^{-1}X_r'\right]X_s\right)^{-1} = \left(X_s'Z_rX_s\right)^{-1} \qquad (3.285)$$

where $Z_r$ is the transformation matrix given by:

$$Z_r = I - X_r\left(X_r'X_r\right)^{-1}X_r' \qquad (3.286)$$

Relationship (3.271) then becomes:

$$F = \frac{\hat{\beta}_s'\left(X_s'Z_rX_s\right)\hat{\beta}_s/s}{e'e/(T-k-1)} \qquad (3.287)$$

Let us now explain the expression of the numerator. To do this, consider the regression of $Y$ on the explanatory variables listed in $X_r$ and note $e_r$ the residuals resulting from this regression:

$$e_r = Y - X_r\hat{\beta}_r = Y - X_r\left(X_r'X_r\right)^{-1}X_r'Y = Z_rY \qquad (3.288)$$

Let us multiply each member of (3.283) by the matrix $Z_r$:

$$Z_rY = Z_rX_r\hat{\beta}_r + Z_rX_s\hat{\beta}_s + Z_re \qquad (3.289)$$

We have:

– $Z_rX_r = X_r - X_r\left(X_r'X_r\right)^{-1}X_r'X_r = 0$
– $Z_r' = Z_r^2 = Z_r$ (idempotent and symmetric matrix)
– $Z_re = e$ because:

$$Z_re = \left[I - X_r\left(X_r'X_r\right)^{-1}X_r'\right]\left(Y - X\hat{\beta}\right) = Y - X\hat{\beta} - X_r\left(X_r'X_r\right)^{-1}X_r'Y + X_r\left(X_r'X_r\right)^{-1}X_r'X\hat{\beta} \qquad (3.290)$$

Let us find the value of $X_r'e$. We know that:

$$\left(X'X\right)\hat{\beta} = X'Y = X'\left(X\hat{\beta} + e\right) \qquad (3.291)$$

We therefore deduce that:

$$X'e = \begin{pmatrix} X_r'e \\ X_s'e \end{pmatrix} = 0 \qquad (3.292)$$

Finally, using $X_r'e = 0$ (so that $X_r'X\hat{\beta} = X_r'(Y - e) = X_r'Y$), we have:

$$Z_re = Y - X\hat{\beta} = e$$

Hence:

$$Z_rY = Z_rX_s\hat{\beta}_s + e \qquad (3.293)$$

Let us multiply each member of this relation by its transpose:

$$Y'Z_r'Z_rY = \hat{\beta}_s'X_s'Z_r'Z_rX_s\hat{\beta}_s + e'e \qquad (3.294)$$

Knowing that $Z_r = Z_r'$, we get:

$$Y'Z_rY = \hat{\beta}_s'X_s'Z_rX_s\hat{\beta}_s + e'e \qquad (3.295)$$

Furthermore, since $e_r = Z_rY$, we have: $e_r'e_r = Y'Z_r'Z_rY = Y'Z_rY$. We then obtain the following result:

$$\hat{\beta}_s'X_s'Z_rX_s\hat{\beta}_s = e_r'e_r - e'e \qquad (3.296)$$

Relationship (3.271) is finally written as:

$$F = \frac{\left(e_r'e_r - e'e\right)/s}{e'e/(T-k-1)} \sim F(s, T-k-1) \qquad (3.297)$$

This test, which is very frequently employed, can be used to test the significance of a subset of explanatory variables $X_s$. In practice, it consists in running two regressions:

– A regression of $Y$ on the set of explanatory variables, $e'e$ being the corresponding sum of squared residuals
– A regression of $Y$ on the subset of explanatory variables $X_r$ (i.e., on variables other than $X_s$), $e_r'e_r$ being the corresponding sum of squared residuals

The decision rule is as follows:

– If $F \leq F(s, T-k-1)$, the null hypothesis that the variables $X_s$ are not significant is not rejected.
– If $F > F(s, T-k-1)$, the null hypothesis is rejected.
4 Heteroskedasticity and Autocorrelation of Errors

The regression models studied in previous chapters were based on a number of assumptions:

– The nonrandom nature of the matrix of explanatory variables $X$.
– $\mathrm{Rank}(X) = k + 1$, k denoting the number of explanatory variables.
– $E(\varepsilon) = 0$.
– $E\left(\varepsilon\varepsilon'\right) = \sigma_\varepsilon^2 I$ where $I$ denotes the identity matrix and $\sigma_\varepsilon^2$ the variance of the error term.
– $\varepsilon \sim N\left(0, \sigma_\varepsilon^2 I\right)$: this normality assumption is not necessary to establish the results of the multiple regression model, but it does allow statistical results to be derived and test statistics to be constructed.
We are interested here in the sphericity of errors hypothesis:

$$E\left(\varepsilon\varepsilon'\right) = \sigma_\varepsilon^2 I \qquad (4.1)$$

This hypothesis combines two characteristics:

– The absence of autocorrelation of errors.
– The homoskedasticity of errors, i.e., the fact that the variance of the errors is constant. When such an assumption is violated, we speak of heteroskedasticity: the variance of the errors is no longer constant. It varies over time or with the observations of one or more explanatory variables.

This chapter focuses on the case where the hypothesis of sphericity of errors
is not verified. We concentrate on the problems of autocorrelation and heteroskedasticity of errors by seeking to answer the following questions:


– How can we estimate the parameters of a regression model in the presence of autocorrelation and/or heteroskedasticity of errors?
– What are the sources of autocorrelation and heteroskedasticity?
– How can we detect autocorrelation and heteroskedasticity?
– What are the solutions to autocorrelation and heteroskedasticity?

4.1 The Generalized Least Squares (GLS) Estimators

The variance-covariance matrix of the error term is written:

$$E\left(\varepsilon\varepsilon'\right) = \begin{pmatrix} V(\varepsilon_1) & Cov(\varepsilon_1, \varepsilon_2) & \cdots & Cov(\varepsilon_1, \varepsilon_T) \\ Cov(\varepsilon_2, \varepsilon_1) & V(\varepsilon_2) & \cdots & Cov(\varepsilon_2, \varepsilon_T) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(\varepsilon_T, \varepsilon_1) & Cov(\varepsilon_T, \varepsilon_2) & \cdots & V(\varepsilon_T) \end{pmatrix} \qquad (4.2)$$

If the errors are autocorrelated, the terms off the diagonal are not all zero.
Similarly, if the errors are heteroskedastic, the terms on the diagonal are not all
identical.

4.1.1 Properties of OLS Estimators in the Presence of Autocorrelation and/or Heteroskedasticity

Consider the following model:

$$\underset{(T,1)}{Y} = \underset{(T,k+1)}{X}\,\underset{(k+1,1)}{\beta} + \underset{(T,1)}{\varepsilon} \qquad (4.3)$$

where $X$ is a nonrandom matrix of full rank, with:

– $E(\varepsilon) = 0$,
– $E\left(\varepsilon\varepsilon'\right) = \Omega_\varepsilon$

where $\Omega_\varepsilon \neq \sigma_\varepsilon^2 I$ denotes the variance-covariance matrix of the errors. The fact that $\Omega_\varepsilon \neq \sigma_\varepsilon^2 I$ means there is autocorrelation and/or heteroskedasticity of the errors.

The OLS estimator $\hat{\beta}$ of $\beta$ is given by:

$$\hat{\beta} = \beta + \left(X'X\right)^{-1}X'\varepsilon \qquad (4.4)$$

by virtue of Eq. (3.30) in Chap. 3. Given that $E(\varepsilon) = 0$, we have:

$$E\left(\hat{\beta}\right) = \beta \qquad (4.5)$$

Thus, when there is autocorrelation and/or heteroskedasticity of errors, the OLS estimator remains an unbiased estimator of $\beta$.

Let us now examine whether the estimator is still of minimum variance. We have:

$$\Omega_{\hat{\beta}} = E\left[\left(\hat{\beta} - \beta\right)\left(\hat{\beta} - \beta\right)'\right] \qquad (4.6)$$

which can also be written (see Chap. 3):

$$\Omega_{\hat{\beta}} = \left(X'X\right)^{-1}X'E\left(\varepsilon\varepsilon'\right)X\left(X'X\right)^{-1} \qquad (4.7)$$

that is:

$$\Omega_{\hat{\beta}} = \left(X'X\right)^{-1}X'\Omega_\varepsilon X\left(X'X\right)^{-1} \qquad (4.8)$$

This expression is different from that obtained when there is neither autocorrelation nor heteroskedasticity, i.e., $\sigma_\varepsilon^2\left(X'X\right)^{-1}$. It follows that autocorrelation and/or heteroskedasticity implies that the OLS estimators are no longer of minimum variance. It is therefore necessary to define other estimators: the generalized least squares estimators.

4.1.2 The Generalized Least Squares (GLS) Method

Consider the multiple regression model:

. Y = Xβ + ε (4.9)

and multiply each term by a nonsingular transformation matrix .M of size .(T , T ),


i.e.:

MY = MXβ + Mε
. (4.10)

Let us determine the variance-covariance matrix of the error term .(Mε):


 
.E Mεε' M ' = MΩε M ' = σε2 M𝚪ε M ' (4.11)
 −1
with .𝚪ε = σε2 Ωε . If there were a matrix .M such as:

M𝚪ε M ' = I
. (4.12)

we could apply OLS to model (4.10). The resulting estimators would then have the
same properties as in the usual case.
174 4 Heteroskedasticity and Autocorrelation of Errors

We saw in Chap. 3 that if .𝚪ε is a positive definite symmetric matrix, there exists
a nonsingular—and therefore invertible—matrix .P such that:

.𝚪ε = P P ' (4.13)

We can then write:


 −1  −1
P −1 𝚪ε P '
. = P −1 P P ' P ' =I (4.14)

Comparing (4.14) with (4.12), we get:

. M = P −1 (4.15)

Furthermore:
 −1 −1
𝚪ε−1 = P '
. P = M 'M (4.16)

If we apply OLS to model (4.10), we obtain the following estimator .β̃:


 −1 ' '
β̃ = X' M ' MX
. X M MY (4.17)

thus:
−1
β̃ = X' 𝚪ε−1 X
. X' 𝚪ε−1 Y (4.18)

or:
−1
. β̃ = X' Ω−1
ε X X' Ω−1
ε Y (4.19)

The estimator $\tilde{\beta}$ given by Eq. (4.19) is called the generalized least squares (GLS) estimator (or Aitken estimator). Since it was obtained from expressions (4.11) and (4.12) and since the model (4.10) satisfies the assumptions required for OLS, the GLS estimator $\tilde{\beta}$ is the best linear unbiased estimator of $\beta$ in the model $Y = X\beta + \varepsilon$ with $E\left(\varepsilon\varepsilon'\right) = \Omega_\varepsilon$. $\tilde{\beta}$ is therefore a BLUE estimator.
Let us determine the variance-covariance matrix $\Omega_{\tilde{\beta}}$ of $\tilde{\beta}$:

$\Omega_{\tilde{\beta}} = E\left[\left(\tilde{\beta}-\beta\right)\left(\tilde{\beta}-\beta\right)'\right]$   (4.20)

where, for the transformed model, the variance-covariance matrix of the error term is $E\left(M\varepsilon\varepsilon'M'\right) = M\Omega_\varepsilon M'$.

So:

$\Omega_{\tilde{\beta}} = \left(X'\Omega_\varepsilon^{-1}X\right)^{-1}X'\Omega_\varepsilon^{-1}\Omega_\varepsilon\Omega_\varepsilon^{-1}X\left(X'\Omega_\varepsilon^{-1}X\right)^{-1}$   (4.21)

Hence, the expression for the variance-covariance matrix of $\tilde{\beta}$:

$\Omega_{\tilde{\beta}} = \left(X'\Omega_\varepsilon^{-1}X\right)^{-1}$   (4.22)
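To fix ideas, Eqs. (4.19) and (4.22) translate directly into matrix code. The following sketch (in Python, using NumPy) is purely illustrative and assumes that the matrix $\Omega_\varepsilon$ is known; the function and variable names (gls, X, Y, omega_eps) are ours.

import numpy as np

def gls(X, Y, omega_eps):
    """Generalized least squares estimator (Eq. 4.19) and its
    variance-covariance matrix (Eq. 4.22), for a known omega_eps."""
    omega_inv = np.linalg.inv(omega_eps)
    XtOX = X.T @ omega_inv @ X
    beta_gls = np.linalg.solve(XtOX, X.T @ omega_inv @ Y)  # (X'Ω⁻¹X)⁻¹ X'Ω⁻¹Y
    cov_beta = np.linalg.inv(XtOX)                          # (X'Ω⁻¹X)⁻¹
    return beta_gls, cov_beta

In practice, $\Omega_\varepsilon$ must first be specified or estimated, which is precisely the issue addressed in the remainder of this chapter.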

4.1.3 Estimation of the Variance of the Errors

We saw in Chap. 3 that the OLS estimator of the error variance was given by (see Eq. (3.261)):

$\hat{\sigma}_\varepsilon^2 = \frac{e'e}{T-k-1}$   (4.23)

Applying OLS to model (4.10) we get:

$\tilde{\sigma}_\varepsilon^2 = \frac{e'e}{T-k-1}$   (4.24)

where:

$e = MY - MX\tilde{\beta}$   (4.25)

Equation (4.24) then becomes:

$\tilde{\sigma}_\varepsilon^2 = \frac{\left(MY-MX\tilde{\beta}\right)'\left(MY-MX\tilde{\beta}\right)}{T-k-1}$   (4.26)

that is:

$\tilde{\sigma}_\varepsilon^2 = \frac{\left(Y-X\tilde{\beta}\right)'M'M\left(Y-X\tilde{\beta}\right)}{T-k-1}$   (4.27)

Given that $M'M = \Gamma_\varepsilon^{-1}$, we have:

$\tilde{\sigma}_\varepsilon^2 = \frac{\left(Y-X\tilde{\beta}\right)'\Gamma_\varepsilon^{-1}\left(Y-X\tilde{\beta}\right)}{T-k-1}$   (4.28)

or:

$\tilde{\sigma}_\varepsilon^2 = \frac{Y'\Gamma_\varepsilon^{-1}Y - \tilde{\beta}'X'\Gamma_\varepsilon^{-1}Y}{T-k-1}$   (4.29)

Therefore, assuming that the error term is normally distributed, all the tests
developed in the previous chapters can be applied here.
For the various formulas given above to be operational, the matrix .Ωε must be
known. In practice, this is not the case. Thus, to determine the matrix .Ωε , we need
to specify the analytical form of autocorrelation of errors and/or heteroskedasticity.
Given that the errors are unknown, it is from the residuals that we will look for such
analytic forms. We start by dealing with the problem of heteroskedasticity before
turning to that of autocorrelation.

4.2 Heteroskedasticity of Errors


4.2.1 The Sources of Heteroskedasticity

Recall that heteroskedasticity is present when the terms on the diagonal of the error variance-covariance matrix are not identical:1

$E\left(\varepsilon\varepsilon'\right) = \begin{pmatrix} \sigma_{\varepsilon_1}^2 & 0 & \cdots & 0 \\ 0 & \sigma_{\varepsilon_2}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{\varepsilon_T}^2 \end{pmatrix}$   (4.30)

We then have $E\left(\varepsilon\varepsilon'\right) = \Omega_\varepsilon \neq \sigma_\varepsilon^2 I$, which can be written as:

$E\left(\varepsilon_t^2\right) = \sigma_{\varepsilon_t}^2$   (4.31)

for .t = 1, . . . , T . Note that .σε2 has been indexed by t, meaning that the variance
varies with t.
Heteroskedasticity can have several sources, including:

– The heterogeneity of the sample under consideration. This is the case, for
example, if the sample studied comprises a large number of countries, bringing
together developed countries and emerging or developing countries.
– The omission of an explanatory variable from the model.

1 It is assumed here that there is no autocorrelation of errors.



– Asymmetry in the distribution of certain explanatory variables. As an example,


the distribution of a variable such as income is unequal, in the sense that the
richest individuals hold the lion’s share of income.
– A poor variable transformation and/or a poor functional form. Heteroskedasticity
may arise from the fact that the variables have not been correctly transformed
(e.g., transformation of raw series into first difference or growth rates) or from the
fact that the model is functionally misspecified (e.g., linear specification when
the correct specification would be log linear).
– The nature of the data. If the data studied are averages of observations from
samples of different sizes, this can produce heteroskedasticity.

Remark 4.1 Heteroskedasticity is a problem frequently encountered when work-


ing with cross-sectional data. Regarding time series, the most frequent cases of
heteroskedasticity concern financial series, which are generally characterized by a
variance varying over time.

4.2.2 Estimation When There Is Heteroskedasticity

As previously mentioned, the appropriate estimation method when there is het-


eroskedasticity is the generalized least squares (GLS) procedure. Let us take an
illustrative example. Consider the following model with k explanatory variables:

.Yt = α + β1 X1t + β2 X2t + . . . + βk Xkt + εt (4.32)

with .V (εt ) = σε2t . Assume that the values of .σε2t are known, for .t = 1, . . . , T . We
can then transform model (4.32) as follows:

$\frac{Y_t}{\sigma_{\varepsilon_t}} = \frac{\alpha}{\sigma_{\varepsilon_t}} + \beta_1\frac{X_{1t}}{\sigma_{\varepsilon_t}} + \beta_2\frac{X_{2t}}{\sigma_{\varepsilon_t}} + \ldots + \beta_k\frac{X_{kt}}{\sigma_{\varepsilon_t}} + \frac{\varepsilon_t}{\sigma_{\varepsilon_t}}$   (4.33)

or, noting $\tilde{X}_{it} = \frac{X_{it}}{\sigma_{\varepsilon_t}}$ for $i = 1, \ldots, k$, $\tilde{Y}_t = \frac{Y_t}{\sigma_{\varepsilon_t}}$ and $\tilde{\varepsilon}_t = \frac{\varepsilon_t}{\sigma_{\varepsilon_t}}$:

$\tilde{Y}_t = \frac{\alpha}{\sigma_{\varepsilon_t}} + \beta_1\tilde{X}_{1t} + \beta_2\tilde{X}_{2t} + \ldots + \beta_k\tilde{X}_{kt} + \tilde{\varepsilon}_t$   (4.34)

The point in transforming the original model is that the variance of the error term is now constant. Indeed:

$V\left(\tilde{\varepsilon}_t\right) = V\left(\frac{\varepsilon_t}{\sigma_{\varepsilon_t}}\right) = \frac{1}{\sigma_{\varepsilon_t}^2}V\left(\varepsilon_t\right) = \frac{\sigma_{\varepsilon_t}^2}{\sigma_{\varepsilon_t}^2} = 1$   (4.35)

The error term of the transformed model is therefore homoskedastic, and it is then
possible to apply the OLS technique to model (4.33). The GLS method thus consists
in applying the OLS method to the transformed model. Note that this technique
amounts to minimizing the residual sum of squares of the transformed model, i.e.:

$\operatorname{Min}\sum_t \tilde{e}_t^2 = \operatorname{Min}\sum_t\left(\frac{e_t}{\sigma_{\varepsilon_t}}\right)^2 = \operatorname{Min}\sum_t \omega_t e_t^2$   (4.36)

with $\tilde{e}_t = \tilde{Y}_t - \tilde{\alpha} - \tilde{\beta}_1\tilde{X}_{1t} - \tilde{\beta}_2\tilde{X}_{2t} - \ldots - \tilde{\beta}_k\tilde{X}_{kt}$ and $\omega_t = \frac{1}{\sigma_{\varepsilon_t}^2}$. The factors $\omega_t$ play the role of weights, and the GLS method involves minimizing a weighted residual sum of squares. For this reason, this technique is also called the weighted least squares method (WLS), WLS being only a special case of GLS in which the transformation matrix $M$ is given by:

$M = \begin{pmatrix} \frac{1}{\sigma_{\varepsilon_1}} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_{\varepsilon_2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_{\varepsilon_T}} \end{pmatrix}$   (4.37)

In this example, we have assumed that .V (εt ) is known, which is generally not
the case in practice. We will see later what to do when the variance of the error is
unknown.
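As an illustration of the preceding discussion, the WLS computation amounts to rescaling each observation by $1/\sigma_{\varepsilon_t}$ and applying OLS to the transformed data. The Python sketch below assumes that the standard deviations sigma_t are known, which is precisely the restrictive assumption just mentioned; the names are ours.

import numpy as np

def wls(X, y, sigma_t):
    """Weighted least squares: OLS applied to the model divided by sigma_t,
    i.e., minimization of the weighted residual sum of squares (Eq. 4.36).
    X must already contain a column of ones for the constant."""
    w = 1.0 / sigma_t                  # weights 1/σ_εt
    Xw = X * w[:, None]                # each row rescaled by 1/σ_εt
    yw = y * w
    beta_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta_wls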

4.2.3 Detecting Heteroskedasticity

Various tests can be used to address the issue of heteroskedasticity. Before pre-
senting them, let us mention that a first intuition can be provided graphically. The
technique consists first in estimating the model considered by OLS as if there were
no heteroskedasticity. We then graphically represent the estimated values .Ŷt of .Yt
(on the x-axis) as a function of the series of squared residuals .et2 (on the y-axis).
Some examples are given in Figs. 4.1, 4.2, 4.3, 4.4 and 4.5.
These graphs allow us to detect whether the estimated mean value of the dependent variable is systematically related to the squared residuals. If this is the case, there is a presumption of heteroskedasticity. Figure 4.1 illustrates the absence of heteroskedasticity in the sense that no particular relationship appears between the squared residuals and the estimated variable. On the contrary, Figs. 4.2, 4.3, 4.4 and 4.5 highlight the existence of a relationship between the two variables, suggesting there is heteroskedasticity: a linear relationship for Fig. 4.3, a quadratic relationship according to Fig. 4.4, and a positive nonlinear relationship for Fig. 4.5.

Fig. 4.1 Absence of heteroskedasticity (squared residuals $e_t^2$ plotted against fitted values $\hat{Y}_t$)
Fig. 4.2 Heteroskedasticity
Fig. 4.3 Heteroskedasticity (linear relationship)
Fig. 4.4 Heteroskedasticity (quadratic relationship)
Fig. 4.5 Heteroskedasticity (positive nonlinear relationship)
It is also possible to produce graphs with the values of one of the explanatory
variables instead of the estimated values of the dependent variable on the x-axis,
with the squared residuals still shown on the y-axis. If the explanatory variable under
consideration and the squared residuals appear to be related, this is an indication in
favor of heteroskedasticity.
In addition to these graphical methods, there are a number of tests that we now
present.

The Goldfeld and Quandt Test (1965)


The Goldfeld and Quandt test applies in the case where one of the explanatory
variables is the cause of heteroskedasticity. It is thus assumed that the variance of
the error increases with one of the explanatory variables, for example, .Xj . We then
have a relationship of the type:

$\sigma_{\varepsilon_t}^2 = aX_{jt}^2$   (4.38)

where a is a positive constant. Such a relationship means that the greater the
values of .Xj , the greater .σε2t . If this is the case, it is an indication that there is
heteroskedasticity. More generally, the test is based on the idea that if we divide the
sample into two subsamples, then, under the assumption of homoskedasticity, the
error variances should be identical in both groups. Under the alternative assumption
of heteroskedasticity, they are different. To capture this, Goldfeld and Quandt
suggest a five-step test:

– Step 1. Observations of the different variables are ranked according to increasing


values of .Xj .
– Step 2. The m central values of the resulting sample are disregarded. We obtain
two subsamples, each comprising .(T − m) /2 observations: one corresponding
to the low values of .Xj , the other to the high values of this same variable.
– Step 3. The model is estimated by OLS on each of the two subsamples, both
composed of .(T − m) /2 observations.
– Step 4. The sum of squared residuals corresponding to each of the two regressions
is calculated. We denote .RSS1 and .RSS2 the sums of squared residuals with
.RSS2 > RSS1 , i.e., .RSS2 is associated with the regression for which the sum of

squared residuals is higher.


– Step 5. The test statistic given by the ratio:

$GQ = \frac{RSS_2}{RSS_1}$   (4.39)

is calculated. Under the null hypothesis of homoskedasticity:

$GQ \sim F\left(\frac{T-m-2(k+1)}{2}, \frac{T-m-2(k+1)}{2}\right)$   (4.40)

The decision rule is as follows:

– If $GQ < F\left(\frac{T-m-2(k+1)}{2}, \frac{T-m-2(k+1)}{2}\right)$, the null hypothesis of homoskedasticity is not rejected.
– If $GQ > F\left(\frac{T-m-2(k+1)}{2}, \frac{T-m-2(k+1)}{2}\right)$, there is heteroskedasticity.

The power of the Goldfeld and Quandt test depends on the choice of m. Harvey
and Phillips (1973) suggest choosing a value of m close to .T /3.
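The five steps can be sketched in Python as follows; the choice of the suspected variable (column index j) and of m is left to the user, and the function name is ours.

import numpy as np
from scipy import stats

def goldfeld_quandt(X, y, j, m):
    """Goldfeld-Quandt test: sort by X[:, j], drop m central observations,
    run OLS on both subsamples and compare residual sums of squares."""
    T, K = X.shape                       # K = k + 1 if X includes the constant
    order = np.argsort(X[:, j])
    Xs, ys = X[order], y[order]
    n = (T - m) // 2
    def rss(Xi, yi):
        b, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        e = yi - Xi @ b
        return e @ e
    rss_low = rss(Xs[:n], ys[:n])
    rss_high = rss(Xs[-n:], ys[-n:])
    gq = max(rss_low, rss_high) / min(rss_low, rss_high)   # RSS2 / RSS1
    df = (T - m - 2 * K) / 2
    p_value = 1 - stats.f.cdf(gq, df, df)
    return gq, p_value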

The Glejser Test (1969)


The Glejser test is intended not only to detect possible heteroskedasticity, but also
to specify its form. As before, this test is based on the assumption that only one
variable is the cause of heteroskedasticity. Let us assume that it is the variable .Xj .
The test can be presented in three steps:

– Step 1. The regression of the dependent variable on the explanatory variables is


estimated by OLS. The residual series .et is deduced.
– Step 2. The absolute value of the residuals $e_t$ is regressed on the variable $X_j$ or on transformations of this variable. Glejser proposes various functional forms for this regression:
  - $|e_t| = a_0 + a_1 X_{jt} + u_t$
  - $|e_t| = a_0 + a_1 \sqrt{X_{jt}} + u_t$
  - $|e_t| = a_0 + a_1 \frac{1}{X_{jt}} + u_t$
  - $|e_t| = a_0 + a_1 \frac{1}{\sqrt{X_{jt}}} + u_t$
  - $|e_t| = \sqrt{a_0 + a_1 X_{jt}} + u_t$
  - $|e_t| = \sqrt{a_0 + a_1 X_{jt}^2} + u_t$
  where $u_t$ is the error term.
– Step 3. The null hypothesis that .a1 = 0 is tested by means of a usual
significance test. If this hypothesis is not rejected, we conclude in favor of
homoskedasticity. On the other hand, if the null hypothesis is rejected, the
hypothesis of heteroskedasticity is adopted.

Remark 4.2 The Glejser test was criticized by Goldfeld and Quandt (1972), who pointed out that the error term $u_t$ in the various regressions does not have the right statistical properties, which implies that the conditions necessary for implementing the usual t-test of significance are not met.
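As an illustration, the sketch below implements one of the Glejser regressions (the linear form in $X_{jt}$); the other functional forms are obtained by transforming the series xj before building the regressor matrix. The usual t-test caveat of Remark 4.2 applies, and the names are ours.

import numpy as np

def glejser_linear(e, xj):
    """Regress |e_t| on a constant and X_jt and return the t-statistic of a1."""
    T = len(e)
    Z = np.column_stack([np.ones(T), xj])
    a, *_ = np.linalg.lstsq(Z, np.abs(e), rcond=None)
    u = np.abs(e) - Z @ a
    s2 = (u @ u) / (T - 2)
    var_a = s2 * np.linalg.inv(Z.T @ Z)
    t_a1 = a[1] / np.sqrt(var_a[1, 1])
    return a, t_a1   # reject homoskedasticity if |t_a1| exceeds the critical value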

The Breusch-Pagan Test (1979)


This is a general test in the sense that it covers a large number of cases of
heteroskedasticity. Since it is an asymptotic test, it is only valid for sufficiently large
sample sizes. In addition, it overcomes the difficulties of the Goldfeld and Quandt
test residing in the choice of the parameter m and in the identification of the variable
$X_j$ at the source of heteroskedasticity. An additional advantage lies in its simplicity, as it is based on the residuals from the OLS estimation of the regression model.
In the multiple regression model:

$Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \ldots + \beta_k X_{kt} + \varepsilon_t$   (4.41)

the error term is assumed to have a normal distribution with variance:

$\sigma_{\varepsilon_t}^2 = f\left(a_0 + a_1 Z_{1t} + \ldots + a_p Z_{pt}\right)$   (4.42)

where f is any function, the coefficients .ai , .i = 1, . . . , p, are not related to the
coefficients of the regression model (4.41), and .Z1t , . . . , Zpt are variables likely
to be the source of heteroskedasticity. Some or all of these variables may be
explanatory variables in the regression model (4.41).
Testing the null hypothesis of homoskedasticity is equivalent to testing:

$H_0: a_1 = a_2 = \ldots = a_p = 0$   (4.43)

since, in this case, we have:

$\sigma_{\varepsilon_t}^2 = f(a_0)$   (4.44)

which is constant, whatever t. The test can be implemented in five steps:

– Step 1. Regression (4.41) is estimated by OLS, and the residual series .et , t =
1, . . . , T is deduced.
– Step 2. The following quantity:

$\hat{\sigma}_{ML}^2 = \frac{1}{T}\sum_{t=1}^{T} e_t^2$   (4.45)

is calculated, which is the maximum likelihood estimator of the variance of the


error term (see Appendix “The Maximum Likelihood Method” in Chap. 2).
– Step 3. The quantity:

$h_t = \frac{e_t^2}{\hat{\sigma}_{ML}^2}$   (4.46)

is computed for $t = 1, \ldots, T$.
– Step 4. After specifying the variables .Z1t , . . . , Zpt , we regress .ht on these
variables:

$h_t = a_0 + a_1 Z_{1t} + \ldots + a_p Z_{pt} + u_t$   (4.47)

where .ut is an error term. The explained sum of squares (ESS) of this regression
is calculated.
– Step 5. The quantity:

$BP = \frac{1}{2}ESS$   (4.48)

is computed, which, under the null hypothesis of homoskedasticity, has a Chi-squared distribution with p degrees of freedom, i.e.:

$BP \sim \chi_p^2$   (4.49)

The decision rule is as follows:


- If $BP < \chi_p^2$, the null hypothesis of homoskedasticity is not rejected.
- If $BP > \chi_p^2$, the null hypothesis of homoskedasticity is rejected.
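A minimal Python sketch of the five steps is given below (variable names are ours); note that statsmodels offers het_breuschpagan, which implements a closely related, studentized version of the test.

import numpy as np
from scipy import stats

def breusch_pagan(e, Z):
    """Breusch-Pagan test following Eqs. (4.45)-(4.49).
    e: OLS residuals; Z: (T, p) matrix of candidate variables (no constant)."""
    T = len(e)
    sigma2_ml = (e @ e) / T                      # Eq. (4.45)
    h = e**2 / sigma2_ml                         # Eq. (4.46)
    Zc = np.column_stack([np.ones(T), Z])
    a, *_ = np.linalg.lstsq(Zc, h, rcond=None)   # Eq. (4.47)
    ess = np.sum((Zc @ a - h.mean())**2)         # explained sum of squares
    bp = 0.5 * ess                               # Eq. (4.48)
    p = Z.shape[1]
    return bp, 1 - stats.chi2.cdf(bp, p)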

The White Test (1980)


This is a very general test which has the advantage of not relying on the assumption
of normality of the error term. It consists in testing the null hypothesis of
homoskedasticity against the alternative hypothesis of heteroskedasticity; the form
of heteroskedasticity not being specified. The test procedure can be described in
three steps:

– Step 1. The multiple regression model is estimated:

$Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \ldots + \beta_k X_{kt} + \varepsilon_t$   (4.50)

and we deduce the series of residuals .et , .t = 1, . . . , T .


– Step 2. The following auxiliary regression is estimated, reflecting the existence
of a relationship between the squared residuals from the previous regression and
one or more explanatory variables:

$e_t^2 = a_0 + a_1 X_{1t} + b_1 X_{1t}^2 + a_2 X_{2t} + b_2 X_{2t}^2 + \ldots + a_k X_{kt} + b_k X_{kt}^2 + u_t$   (4.51)

where .ut is an error term. It is also possible to add interaction terms such
as .X1t X2t to this regression. We calculate the coefficient of determination .R 2
associated with this auxiliary regression.
– Step 3. We test the null hypothesis of homoskedasticity:

$H_0: a_1 = b_1 = a_2 = b_2 = \ldots = a_k = b_k = 0$   (4.52)

If at least one coefficient of the auxiliary regression is significantly different from


zero, the null hypothesis is rejected. To perform this test, we calculate the White
test statistic given by the product between the number of observations T and

the coefficient of determination of the auxiliary regression,2 or $TR^2$. Under the null hypothesis, this statistic has a Chi-squared distribution whose number of degrees of freedom is equal to the number of estimated parameters (excluding the constant) in the auxiliary regression. Thus, in the case of model (4.51), we have:

$TR^2 \sim \chi_{2k}^2$   (4.53)

The decision rule is then:

- If $TR^2 < \chi_{2k}^2$, the null hypothesis of homoskedasticity is not rejected.
- If $TR^2 > \chi_{2k}^2$, the null hypothesis of homoskedasticity is rejected.

Remark 4.3 The White test can also be used as a model misspecification test.
Under the null hypothesis, the White test assumes that the errors are not only
homoskedastic, but also uncorrelated with the regressors and that the linear specifi-
cation of the model is correct. If one of these conditions is violated, the test statistic
is above the critical value. On the contrary, if the value of the test statistic is below
the critical value, this indicates that none of these three conditions is violated.
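As an illustration, the auxiliary regression (4.51) and the statistic $TR^2$ can be computed as follows; the sketch omits the optional interaction terms, which can be appended to the matrix of auxiliary regressors, and the names are ours.

import numpy as np
from scipy import stats

def white_test(e, X):
    """White test: regress e² on the regressors and their squares (Eq. 4.51)
    and compute TR² (Eq. 4.53). X: (T, k) explanatory variables, no constant."""
    T, k = X.shape
    aux = np.column_stack([np.ones(T), X, X**2])  # constant, levels, squares
    y = e**2
    coef, *_ = np.linalg.lstsq(aux, y, rcond=None)
    fitted = aux @ coef
    r2 = 1 - np.sum((y - fitted)**2) / np.sum((y - y.mean())**2)
    stat = T * r2
    df = 2 * k                                    # number of slopes in (4.51)
    return stat, 1 - stats.chi2.cdf(stat, df)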

ARCH Test
ARCH (autoregressive conditionally heteroskedastic) processes were intro-
duced by Engle (1982) and are used to model series whose variance—also called
volatility—in t depends on its past values. This is therefore a particular form of
heteroskedasticity, called conditional heteroskedasticity. The test procedure can
be outlined in four steps:

– Step 1. The multiple regression model:

$Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \ldots + \beta_k X_{kt} + \varepsilon_t$   (4.54)

is estimated, and the series of residuals .et , .t = 1, . . . , T , is calculated.


– Step 2. The squared residual series $e_t^2$, $t = 1, \ldots, T$, is computed.
– Step 3. The following regression is estimated, consisting of regressing the
squared residual series on its $\ell$ past values and a constant:

$e_t^2 = a_0 + \sum_{i=1}^{\ell} a_i e_{t-i}^2$   (4.55)

2 Such a statistic is called a Lagrange multiplier statistic.



The coefficient of determination .R 2 associated with this regression is calculated.


– Step 4. The null hypothesis of homoskedasticity:

$H_0: a_1 = a_2 = \ldots = a_\ell = 0$   (4.56)

is tested against the alternative hypothesis of conditional heteroskedasticity


stipulating that at least one of the coefficients .ai , .i = 1, . . . , 𝓁, is significantly
different from zero. In order to implement this test, we calculate the test
statistic—which is a Lagrange multiplier statistic—consisting in computing
the product between the number of observations T and the coefficient of
determination of the regression estimated in Step 3. Under the null hypothesis
of homoskedasticity, we have:

$TR^2 \sim \chi_\ell^2$   (4.57)

The decision rule is as follows:


- If $TR^2 < \chi_\ell^2$, the null hypothesis of homoskedasticity is not rejected.
- If $TR^2 > \chi_\ell^2$, the null hypothesis of homoskedasticity is rejected in favor of the alternative hypothesis of conditional heteroskedasticity.
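The four steps can be sketched as follows in Python; the number of lags $\ell$ (argument n_lags) is chosen by the user, as in the text, and the function name is ours.

import numpy as np
from scipy import stats

def arch_lm_test(e, n_lags):
    """ARCH test: regress e_t² on a constant and its n_lags past values
    (Eq. 4.55), then compute TR², chi-squared with n_lags df under H0."""
    e2 = e**2
    y = e2[n_lags:]
    Z = np.column_stack([np.ones(len(y))] +
                        [e2[n_lags - i:-i] for i in range(1, n_lags + 1)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ coef
    r2 = 1 - np.sum((y - fitted)**2) / np.sum((y - y.mean())**2)
    stat = len(y) * r2          # T is here the number of usable observations
    return stat, 1 - stats.chi2.cdf(stat, n_lags)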

4.2.4 Estimation Procedures When There Is Heteroskedasticity

The presence of heteroskedasticity results in OLS estimators remaining unbiased,


but no longer of minimum variance. This poses a problem, notably because the
precision of the tests is affected. It is therefore relevant to seek to correct for
heteroskedasticity.
We must distinguish between cases where the variance of the error term is known
and those where it is unknown. When the variance of the error term is known, the
GLS method should be applied when there is heteroskedasticity. The example given
in Sect. 4.2.2 relating to the weighted least squares (WLS) method was based on the
knowledge of the variance of the error term. In practice, this is not the case, and it is
impossible to obtain efficient estimators of the parameters by WLS when the form
of heteroskedasticity is unknown.
In these cases where the variance of the error term is unknown, however, it is
possible to obtain estimators of the variances and covariances of the WLS estimators
corrected for heteroskedasticity. For this purpose, we can use the corrections
suggested by White (1980) and Newey and West (1987). These two techniques
do not modify the OLS-estimated values of the coefficients, but only change the
estimated standard deviations of these coefficients (and therefore their t-statistics).
We can also use the results of certain tests, such as the Glejser test, to correct for
heteroskedasticity.

The White Estimator of the Variance-Covariance Matrix


Consider the multiple regression model:

Y = Xβ + ε
. (4.58)

which can still be written for an observation t:

Y t = X't β + ε t
. (4.59)

where $X_t'$ denotes a column vector equal to the transpose of the $t$-th row of the matrix $X$.
The estimator $\hat{\Omega}_\varepsilon^W$ of the variance-covariance matrix proposed by White (1980) is written as:

$\hat{\Omega}_\varepsilon^W = \frac{T}{T-k-1}\left(X'X\right)^{-1}\left(\sum_{t=1}^{T} e_t^2 X_t X_t'\right)\left(X'X\right)^{-1}$   (4.60)

where T is the number of observations, k is the number of explanatory variables,


and .et are the residuals from the OLS estimation of the multiple regression model.
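A direct NumPy transcription of Eq. (4.60) might look as follows; this is only an illustrative sketch, robust covariance estimators being available in most econometric packages.

import numpy as np

def white_cov(X, e):
    """White heteroskedasticity-consistent covariance matrix of the OLS
    coefficients, following Eq. (4.60). X: (T, k+1) regressor matrix
    (including the constant); e: OLS residuals."""
    T, K = X.shape                       # K = k + 1
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (e**2)[:, None]).T @ X   # sum over t of e_t² X_t X_t'
    return T / (T - K) * XtX_inv @ meat @ XtX_inv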

The Newey and West Estimator of the Variance-Covariance Matrix


The variance-covariance matrix suggested by White is based on the assumption that the residuals of the estimated model are not autocorrelated. Newey and West (1987) propose a more general estimator of the variance-covariance matrix that is valid when there are both autocorrelation and heteroskedasticity of unknown form. The estimator $\hat{\Omega}_\varepsilon^{NW}$ of the variance-covariance matrix proposed by Newey and West is written as:

$\hat{\Omega}_\varepsilon^{NW} = \frac{T}{T-k-1}\left(X'X\right)^{-1}\hat{\Sigma}^{NW}\left(X'X\right)^{-1}$   (4.61)

with:

$\hat{\Sigma}^{NW} = \frac{T}{T-k-1}\left[\sum_{t=1}^{T} e_t^2 X_t X_t' + \sum_{j=1}^{q}\left(1-\frac{j}{q+1}\right)\sum_{t=j+1}^{T}\left(X_t e_t e_{t-j} X_{t-j}' + X_{t-j} e_{t-j} e_t X_t'\right)\right]$   (4.62)

In this expression, q designates the truncation parameter. It represents the number


of autocorrelations taken into account in the dynamics of the residuals .et . Newey
and West have suggested the following value for this parameter:

$q = \operatorname{int}\left[4\left(\frac{T}{100}\right)^{2/9}\right]$   (4.63)

where int denotes the integer part.
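In practice, these corrections are rarely coded by hand. With the statsmodels library in Python, for instance, White-type and Newey-West-type standard errors can be requested when fitting the OLS regression; the snippet below is a usage sketch in which y and X are assumed to be NumPy arrays already containing the data.

import statsmodels.api as sm

X_const = sm.add_constant(X)            # add the intercept column

# White-type correction (heteroskedasticity-consistent standard errors)
ols_white = sm.OLS(y, X_const).fit(cov_type='HC1')

# Newey-West correction, with the truncation rule of Eq. (4.63)
q = int(4 * (len(y) / 100) ** (2 / 9))
ols_nw = sm.OLS(y, X_const).fit(cov_type='HAC', cov_kwds={'maxlags': q})

print(ols_white.bse)   # corrected standard errors of the coefficients
print(ols_nw.bse)

As in Tables 4.6 and 4.7, the coefficient estimates are unchanged; only the standard errors (and hence the t-statistics) are affected.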

Hypotheses About the Form of Heteroskedasticity


As previously mentioned, it is also possible to use the results of the Glejser test
regarding the form of heteroskedasticity or to make certain assumptions about this
form.
Consider the multiple regression model:

$Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \ldots + \beta_j X_{jt} + \ldots + \beta_k X_{kt} + \varepsilon_t$   (4.64)

Assume that heteroskedasticity is such that the variance of the error term is
proportional to .Xj2t where .Xj t is one of the explanatory variables of the regression
model (4.64), i.e.:

$\sigma_{\varepsilon_t}^2 = aX_{jt}^2$   (4.65)

where a is a constant. It is possible to transform model (4.64) by dividing each of the terms by $X_{jt}$:

$\frac{Y_t}{X_{jt}} = \alpha\frac{1}{X_{jt}} + \beta_1\frac{X_{1t}}{X_{jt}} + \beta_2\frac{X_{2t}}{X_{jt}} + \ldots + \beta_j + \ldots + \beta_k\frac{X_{kt}}{X_{jt}} + \frac{\varepsilon_t}{X_{jt}}$   (4.66)

Denote by $u_t = \frac{\varepsilon_t}{X_{jt}}$ the transformed error term. Let us determine its variance:

$E\left(u_t^2\right) = E\left(\frac{\varepsilon_t}{X_{jt}}\right)^2 = \frac{1}{X_{jt}^2}E\left(\varepsilon_t^2\right)$   (4.67)

According to (4.65), $E\left(\varepsilon_t^2\right) = aX_{jt}^2$, hence:

$E\left(u_t^2\right) = a$   (4.68)

The variance of the transformed error term .ut is constant and it is, therefore,
possible to apply OLS to the transformed model (4.66).

The same type of reasoning can, of course, be applied to other forms of


heteroskedasticity. As an example, suppose the form of heteroskedasticity is such
that:

$\sigma_{\varepsilon_t}^2 = aX_{jt}$   (4.69)

Thus, the variance of the error term is now assumed to be proportional to $X_{jt}$, with a being a constant. The transformed model is then written as:

$\frac{Y_t}{\sqrt{X_{jt}}} = \alpha\frac{1}{\sqrt{X_{jt}}} + \beta_1\frac{X_{1t}}{\sqrt{X_{jt}}} + \beta_2\frac{X_{2t}}{\sqrt{X_{jt}}} + \ldots + \beta_j\sqrt{X_{jt}} + \ldots + \beta_k\frac{X_{kt}}{\sqrt{X_{jt}}} + \frac{\varepsilon_t}{\sqrt{X_{jt}}}$   (4.70)

The variance of the transformed error term $u_t = \frac{\varepsilon_t}{\sqrt{X_{jt}}}$ is given by:

$E\left(u_t^2\right) = E\left(\frac{\varepsilon_t}{\sqrt{X_{jt}}}\right)^2 = \frac{1}{X_{jt}}E\left(\varepsilon_t^2\right) = a$   (4.71)

which is indeed constant. It is therefore possible to apply OLS to the transformed


model (4.70).

Note on the Logarithmic Transformation


The logarithmic transformation, giving rise to a log-linear model (see Chap. 2), can
also be used to reduce heteroskedasticity. The log-linear multiple regression model
is written:

. log Yt = α + β1 log X1t + β2 log X2t + . . . + βk log Xkt + εt (4.72)

The reduction in heteroskedasticity comes from the fact that the logarithmic
transformation “compresses” the scales on which the variables are measured.

4.2.5 Empirical Application

Let us use the series of stock market returns studied in the previous chapter and
consider the following model at monthly frequency over the period from February
1984 to June 2021 (449 observations):

RF T SEt = α + βRDJ I NDt + εt


. (4.73)

where RF T SE denotes the returns of the F T SE 100 index of the London Stock
Exchange and RDJ I ND the returns of the Dow Jones Industrial Average index of

Table 4.1 FTSE and Dow Jones industrial average returns

            RFTSE     RDJIND
1984.02    −0.0216   −0.0555
1984.03     0.0671    0.0088
1984.04     0.0229    0.0050
...         ...       ...
2021.04     0.0374    0.0267
2021.05     0.0075    0.0191
2021.06     0.0021   −0.0008

Data source: Macrobond

Table 4.2 OLS estimation results


Dependent variable: RFTSE
Variable Coefficient Std. error t-Statistic Prob.
C .−0.001614 0.001352 .−1.193763 0.2332
RDJIND 0.782505 0.030388 25.75060 0.0000
R-Squared 0.597331 Mean dependent var 0.004210
Adjusted R-squared 0.596430 S.D. dependent var 0.044466
S.E. of regression 0.028248 Akaike info criterion .−4.291158

Sum squared resid 0.356678 Schwarz criterion .−4.272864


Log likelihood 965.3650 Hannan-Quinn criterion .−4.283947
F-statistic 663.0935 Durbin-Watson stat 2.224700
Prob(F-statistic) 0.0000

Table 4.3 Reordered returns for the FTSE and Dow Jones industrial average indexes

            RFTSE     RDJIND
1987.10    −0.3017   −0.2642
1998.08    −0.1061   −0.1641
2008.10    −0.1133   −0.1515
...         ...       ...
2020.04     0.0396    0.1051
2020.11     0.1165    0.1119
1987.01     0.0742    0.1295

the New York Stock Exchange (see Table 4.1). The series are extracted from the
Macrobond database.
The OLS estimation of model (4.73) leads to the results reported in Table 4.2.
Let us apply the various homoskedasticity tests previously presented.

The Goldfeld and Quandt Test


Implementing the Goldfeld and Quandt test first involves ordering the observations
according to the increasing values of the explanatory variable RDJ I ND. Table 4.3
reports the observations thus reordered.
We then ignore a number m of central observations. Let us take, as suggested
by Harvey and Phillips (1973), .m = T /3 = 449/3, which leads us to omit 149

Table 4.4 Regression on the first 150 observations


Dependent variable: RFTSE
Variable Coefficient Std. error t-Statistic Prob.
C 0.005742 0.003463 1.658159 0.0994
RDJIND 0.959400 0.065123 14.73205 0.0000
R-Squared 0.594557 Mean dependent var .−0.031056

Adjusted R-squared 0.591818 S.D. dependent var 0.045985


S.E. of regression 0.029379 Akaike info criterion .−4.203797
Sum squared resid 0.127747 Schwarz criterion .−4.163655

Log likelihood 317.2848 Hannan-Quinn criterion .−4.187489


F-statistic 217.0332 Durbin-Watson stat 1.908283
Prob(F-statistic) 0.000000

Table 4.5 Regression on the last 150 observations


Dependent variable: RFTSE
Variable Coefficient Std. error t-Statistic Prob.
C 0.001252 0.005647 0.221775 0.8248
RDJIND 0.664592 0.104489 6.360391 0.0000
R-Squared 0.214665 Mean dependent var 0.034396
Adjusted R-squared 0.209359 S.D. dependent var 0.029963
S.E. of regression 0.026643 Akaike info criterion .−4.399360

Sum squared resid 0.105055 Schwarz criterion .−4.359218


Log likelihood 331.9520 Hannan-Quinn criterion .−4.383051
F-statistic 40.45457 Durbin-Watson stat 2.399228
Prob(F-statistic) 0.000000

observations. We obtain two subsamples, each composed of 150 observations. The


OLS estimates of the two resulting regressions are shown in Tables 4.4 and 4.5.
The test statistic GQ is given by:

$GQ = \frac{RSS_2}{RSS_1}$   (4.74)

where .RSS1 and .RSS2 are the sums of squares of the residuals from each of the
two regressions, with .RSS2 > RSS1 . We see that the residual sum of squares
corresponding to the model estimated on the first 150 observations (Table 4.4)
is greater than that relating to the model estimated on the last 150 observations
(Table 4.5). We therefore have .RSS1 = 0.105055 and .RSS2 = 0.127747. So:

$GQ = 1.2160$   (4.75)

Under the null hypothesis of homoskedasticity, the GQ statistic has a Fisher distribution with $\left(\frac{T-m-2(k+1)}{2}, \frac{T-m-2(k+1)}{2}\right) = \left(\frac{449-149-2(1+1)}{2}, \frac{449-149-2(1+1)}{2}\right)$ degrees of freedom. The Fisher table at the 5% significance level gives us: $F(148, 148) = 1.00$. Since $1.2160 > 1.00$, the null hypothesis of homoskedasticity is rejected.

The Glejser Test


The first step in the Glejser test is to estimate model (4.73). The results of this
estimation are reported in Table 4.2. We derive the residual series:

et = RF T SEt + 0.0016 − 0.7825RDJ I NDt


. (4.76)

We then calculate the series of residuals in absolute values .|et | . We regress .|et | on
the explanatory variable RDJ I ND or on various transformations of this variable.
Consider, for example, the following two models:

– $|e_t| = \hat{a}_0 + \hat{a}_1 RDJIND_t$. OLS estimation of this model leads to the following results:

$|e_t| = 0.0221 - 0.0275\, RDJIND_t$   (4.77)
         (25.9251)  (−1.4395)

– $|e_t| = \hat{a}_0 + \hat{a}_1 (RDJIND_t)^{-1}$. OLS estimation of this regression yields:

$|e_t| = 0.0219 - 8.24\times 10^{-7}\, (RDJIND_t)^{-1}$   (4.78)
         (26.0510)  (−1.1275)

In both models, the values in parentheses are the t-statistics associated with the
estimated coefficients.
We proceed to test the null hypothesis of homoskedasticity according to which
.a1 = 0. To test this hypothesis, we compare the absolute values of the t-statistics of

the coefficient .a1 with the value read from the Student’s t table (1.96 at the 5%
significance level). We deduce that such a hypothesis is not rejected by models
(4.77) and (4.78).

The Breusch-Pagan Test


Consider the residuals from the regression shown in Table 4.2:

et = RF T SEt + 0.0016 − 0.7825RDJ I NDt


. (4.79)

We calculate:

$\hat{\sigma}_{ML}^2 = \frac{1}{T}\sum_{t=1}^{T} e_t^2$   (4.80)

that is:

$\hat{\sigma}_{ML}^2 = 7.9438\times 10^{-4}$   (4.81)

This gives the series:

$h_t = \frac{e_t^2}{\hat{\sigma}_{ML}^2}$   (4.82)

for $t = 1, \ldots, 449$. We then regress $h_t$ on $RDJIND_t$ and obtain:

$h_t = 1.0263 - 3.5352\, RDJIND_t$   (4.83)
        (13.4166)  (−2.0564)

We calculate the explained sum of squares of this regression, i.e., $ESS = 10.80$, and the Breusch-Pagan statistic:

$BP = \frac{1}{2}ESS = 5.40$   (4.84)
Under the null hypothesis of homoskedasticity, the BP statistic follows a Chi-
squared distribution with 1 degree of freedom (since only one explanatory variable
is included in the regression (4.83)). At the 5% significance level, the critical value
read from the Chi-squared table is equal to 3.841. Consequently, .5.40 > 3.841,
which means we reject the null hypothesis of homoskedasticity.

The White Test


For the estimation of model (4.73) giving the series of residuals .et , we deduce the
series of squared residuals in order to estimate the auxiliary regression:

$e_t^2 = a_0 + a_1 RDJIND_t + b_1 RDJIND_t^2 + u_t$   (4.85)

which gives:

$e_t^2 = 0.0007 - 0.0012\, RDJIND_t + 0.0528\, RDJIND_t^2$   (4.86)
         (10.3910)  (−0.8232)  (3.8127)

The coefficient of determination associated with this auxiliary regression is equal


to .R 2 = 0.0406. The number of observations being equal to 449, we deduce
the value of the test statistic: .T R 2 = 18.2478. Under the null hypothesis of
homoskedasticity, this statistic follows a Chi-squared distribution with 2 degrees of
freedom (since two variables appear in the auxiliary regression). The Chi-squared
table gives us a critical value equal to 5.991 at the 5% significance level. Given that
.18.2478 > 5.991, the null hypothesis of homoskedasticity is rejected.

ARCH Test
For the estimation of model (4.73) giving the series of residuals .et , we derive the
series of squared residuals .et2 . The .et2 series is then regressed on a constant and its
past values. Using three lags we obtain the following results:

$e_t^2 = 0.0006 + 0.0670\, e_{t-1}^2 + 0.11936\, e_{t-2}^2 + 0.1008\, e_{t-3}^2$   (4.87)
         (6.7282)  (1.4162)   (2.5508)   (2.1429)

The coefficient of determination associated with this regression is .R 2 = 0.0345.


We test the null hypothesis of homoskedasticity by calculating the test statistic
.T R
2 = 446 × 0.0345 = 15.3946. The number of observations equals 446, not

449, corresponding to the number of observations used in the regression (4.87):


since this regression involves three lags on the squared residual variable, three
observations are “lost.” Under the null hypothesis, the test statistic has a Chi-
squared distribution with three degrees of freedom (since three variables appear
in the regression (4.87)). The Chi-squared table gives us a critical value equal to
7.815 at the 5% significance level. Since .15.3946 > 7.815, it follows that the null
hypothesis of homoskedasticity is rejected.
With the exception of Glejser’s test, all the tests conclude that the null hypothesis
of homoskedasticity is rejected and that heteroskedasticity is present. Such a result
is not surprising since our example concerns financial series. These series are
frequently heteroskedastic since their variance changes over time. The occurrence
of heteroskedasticity leads us to correct the OLS estimators of the variance of the
coefficients. To this end, let us apply the methods of White and Newey and West.

Heteroskedasticity-Corrected Estimations
Tables 4.6 and 4.7 report the results from the OLS estimation of relationship (4.73),
the variance-covariance matrix estimators being given by White (Table 4.6) and by
Newey and West (Table 4.7). These two techniques allow for heteroskedasticity by
correcting the estimators of the variances and covariances of the OLS estimators.
Thus, the estimated values of the coefficients are identical to those shown in
Table 4.2, but the standard deviations of the coefficients (and therefore t-statistics)
are different. The coefficient associated with the RDJ I N D variable remains
significantly different from zero, both with the White correction and with the
Newey-West correction.

4.3 Autocorrelation of Errors


4.3.1 Sources of Autocorrelation

There is autocorrelation when the terms off the diagonal of the variance-covariance
matrix of the errors are not all zero:3

3 It is assumed here that the errors are homoskedastic.



Table 4.6 White heteroskedasticity consistent estimations


Dependent variable: RFTSE
White heteroskedasticity-consistent standard errors and Covariance
Variable Coefficient Std. error t-Statistic Prob.
C .−0.001614 0.001401 .−1.152289 0.2498
RDJIND 0.782505 0.040668 19.24132 0.0000
R-Squared 0.597331 Mean dependent var 0.004210
Adjusted R-squared 0.596430 S.D. dependent var 0.044466
S.E. of regression 0.028248 Akaike info criterion .−4.291158

Sum squared resid 0.356678 Schwarz criterion .−4.272864


Log likelihood 965.3650 Hannan-Quinn criterion .−4.283947
F-statistic 663.0935 Durbin-Watson stat 2.224700
Prob(F-statistic) 0.000000 Wald F-statistic 370.2283
Prob(Wald F-statistic) 0.000000

Table 4.7 Newey-West heteroskedasticity and autocorrelation consistent estimation


Dependent variable: RFTSE
Newey-West HAC standard errors and covariance (bandwidth .= 6)
Variable Coefficient Std. error t-Statistic Prob.
C .−0.001614 0.001165 .−1.385650 0.1665
RDJIND 0.782505 0.039089 20.01861 0.0000
R-Squared 0.597331 Mean dependent var 0.004210
Adjusted R-squared 0.596430 S.D. dependent var 0.044466
S.E. of regression 0.028248 Akaike info criterion .−4.291158

Sum squared resid 0.356678 Schwarz criterion .−4.272864


Log likelihood 965.3650 Hannan-Quinn criterion .−4.283947
F-statistic 663.0935 Durbin-Watson stat 2.224700
Prob(F-statistic) 0.000000 Wald F-statistic 400.7447
Prob(Wald F-statistic) 0.000000

$E\left(\varepsilon\varepsilon'\right) = \begin{pmatrix} \sigma_\varepsilon^2 & \mathrm{Cov}(\varepsilon_1,\varepsilon_2) & \cdots & \mathrm{Cov}(\varepsilon_1,\varepsilon_T) \\ \mathrm{Cov}(\varepsilon_2,\varepsilon_1) & \sigma_\varepsilon^2 & \cdots & \mathrm{Cov}(\varepsilon_2,\varepsilon_T) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(\varepsilon_T,\varepsilon_1) & \mathrm{Cov}(\varepsilon_T,\varepsilon_2) & \cdots & \sigma_\varepsilon^2 \end{pmatrix}$   (4.88)

Thus, in the presence of autocorrelation of errors, the regression model assumption that $E(\varepsilon_t\varepsilon_{t'}) = 0$ for $t \neq t'$ is violated. This results in the error term in t being related to the error term in $t'$.

The autocovariance of order h, denoted $\gamma_h$, is defined as:

$\gamma_h = E\left(\varepsilon_t\varepsilon_{t-h}\right) = \mathrm{Cov}\left(\varepsilon_t,\varepsilon_{t-h}\right)$   (4.89)

for $h = 0, \pm 1, \pm 2, \ldots$

When $h = 0$, the autocovariance is equal to the variance, i.e.:

$\gamma_0 = E\left(\varepsilon_t^2\right) = \sigma_\varepsilon^2$   (4.90)

The variance-covariance matrix can be written as:

$E\left(\varepsilon\varepsilon'\right) = \begin{pmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{h-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{h-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{h-1} & \gamma_{h-2} & \cdots & \gamma_0 \end{pmatrix}$   (4.91)

Let $\rho_h$ be the autocorrelation coefficient of order h:

$\rho_h = \frac{\gamma_h}{\gamma_0} = \frac{\gamma_h}{\sigma_\varepsilon^2}$   (4.92)

for $h = 0, \pm 1, \pm 2, \ldots$ The variance-covariance matrix of the errors is written as:

$E\left(\varepsilon\varepsilon'\right) = \sigma_\varepsilon^2\begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{h-1} \\ \rho_1 & 1 & \cdots & \rho_{h-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{h-1} & \rho_{h-2} & \cdots & 1 \end{pmatrix}$   (4.93)
 
We then have $E\left(\varepsilon\varepsilon'\right) = \Omega_\varepsilon \neq \sigma_\varepsilon^2 I$.

Remark 4.4 When working with time series, the autocorrelation in question is
of a temporal type: the error term at a given date depends on the same error
term at another date. When working with cross-sectional data, we speak of spatial
autocorrelation; the correlation being in space rather than time.

Autocorrelation can have several sources, including:

– the omission of one or more important explanatory variables from the model.
It should be remembered that the error term can be interpreted as a set of
explanatory variables not included in the model. Consequently, omitting one
or more explanatory variables may result in autocorrelation of the error term,
particularly if the omitted explanatory variables are themselves autocorrelated.

– Misspecification of the model. The functional form chosen is incorrect; for


example, the specification chosen is linear when the model is in fact nonlinear.
– The measurement error of the dependent variable. For example, some data, such
as macroeconomic data, are revised regularly, leading to the publication of values
that are slightly different from those previously published. These measurement
errors can produce autocorrelation in the error term.
– The use of nonseasonally adjusted time series. The use of nonseasonally adjusted
series can produce cycles that generate autocorrelation of errors;
– Data transformations:
- A first example is provided by averages or moving averages. For example,
if we want to work at a quarterly frequency and have monthly series, a
commonly used method is to average the values over three months. In this way,
a certain regularity is introduced into the data by smoothing them: the change
to quarterly data attenuates the fluctuations observed at monthly frequency.
Such smoothing can lead to autocorrelation of errors.
- A second example is given by data interpolation or extrapolation techniques.
Suppose that data are available every ten years and that we wish to obtain
values for all years in between. The standard practice is to interpolate the
series on the basis of some ad hoc assumptions. This interpolation can impose
a systematic pattern in the errors, giving rise to autocorrelation.
- A third example is given by the use—which is very frequent in practice—of
the first difference operator. Consider, for example, the following model, in
which the error term is non-autocorrelated:

Yt = α + βXt + εt
. (4.94)

We can write this model at date .t − 1:

Yt−1 = α + βXt−1 + εt−1


. (4.95)

Taking the difference between (4.94) and (4.95), we obtain:

ΔYt = βΔXt + Δεt


. (4.96)

where .Δ is the first difference operator, such that: .ΔZt = Zt − Zt−1 . Such
a first-difference transformation can produce autocorrelation of errors in that,
if the error term of (4.94) is non-autocorrelated, the error term .Δεt of (4.96)
exhibits autocorrelation.
– The nonstationarity of the series considered. If the regression model selected
involves nonstationary series and the error term is itself nonstationary, the latter
will be characterized by the presence of autocorrelation (see, for example, Lardic
and Mignon, 2002 for details).

As shown, the sources of autocorrelation can be very diverse. Finally, autocor-


relation can be positive or negative. We speak of positive autocorrelation when

the error term moves either upwards or downwards over a fairly long period.
Conversely, negative autocorrelation occurs when a positive value of the error term
is followed by a negative value, then a positive value, and so on.

4.3.2 Estimation When There Is Autocorrelation

As previously mentioned, in the presence of error autocorrelation, the OLS esti-


mators remain unbiased but are no longer of minimum variance. In other words, the
estimators are not efficient and the usual testing procedures no longer hold. To study
this property in more detail, let us consider the following example:

Yt = α + βXt + εt
. (4.97)

with:

εt = ρεt−1 + ut
. (4.98)
 
where $|\rho| < 1$, $E(\varepsilon) = 0$, and $E\left(\varepsilon\varepsilon'\right) \neq \sigma_\varepsilon^2 I$. The term $u_t$ is assumed to be white noise.
The process (4.98) is called a first-order autoregressive process, denoted
.AR(1): the error term at t is a function of itself at .t − 1. In other words, this process

illustrates a first-order autocorrelation of the error term. The coefficient .ρ is the


first-order autocorrelation coefficient.
The application of OLS to the model formed by Eqs. (4.97) and (4.98) leads to the following result (see Eq. (4.8)):

$\Omega_{\hat{\beta}} = \left(X'X\right)^{-1}X'\Omega_\varepsilon X\left(X'X\right)^{-1}$   (4.99)

Let us explain .Ωε . To this end, we iterate Equation (4.98), which gives:

εt = ρ (ρεt−2 + ut−1 ) + ut = ρ 2 εt−2 + ρut−1 + ut


. (4.100)

Continuing, we have:

εt = ρ 2 (ρεt−3 + ut−2 ) + ρut−1 + ut = ρ 3 εt−3 + ρ 2 ut−2 + ρut−1 + ut


. (4.101)

that is finally:

εt = ut + ρut−1 + ρ 2 ut−2 + . . .
. (4.102)

We seek to write the variance-covariance matrix of the error term:

$\Omega_\varepsilon = E\left(\varepsilon\varepsilon'\right) = \begin{pmatrix} V(\varepsilon_1) & \mathrm{Cov}(\varepsilon_1,\varepsilon_2) & \cdots & \mathrm{Cov}(\varepsilon_1,\varepsilon_T) \\ \mathrm{Cov}(\varepsilon_2,\varepsilon_1) & V(\varepsilon_2) & \cdots & \mathrm{Cov}(\varepsilon_2,\varepsilon_T) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(\varepsilon_T,\varepsilon_1) & \mathrm{Cov}(\varepsilon_T,\varepsilon_2) & \cdots & V(\varepsilon_T) \end{pmatrix}$   (4.103)

To do this, let us square the two sides of Eq. (4.102) and consider the expectation:

$E\left(\varepsilon_t^2\right) = E\left(u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \ldots\right)^2$   (4.104)

that is:

$E\left(\varepsilon_t^2\right) = \sigma_\varepsilon^2 = \frac{\sigma_u^2}{1-\rho^2}$   (4.105)

Furthermore, we have:

E (εt εt−1 ) = E ((ρεt−1 + ut ) εt−1 ) = ρσε2


. (4.106)

since .E (εt−1 εt−1 ) = σε2 and .E (ut εt−1 ) = 0. In addition:

.E (εt εt−2 ) = E ((ρεt−1 + ut ) εt−2 ) = ρE (εt−1 εt−2 ) (4.107)

and:

E (εt−1 εt−2 ) = E ((ρεt−2 + ut−1 ) εt−2 ) = ρσε2


. (4.108)

hence:

E (εt εt−2 ) = ρ 2 σε2


. (4.109)

More generally, we have:

E (εt εt−i ) = ρ i σε2


. (4.110)

It is then possible to write the variance-covariance matrix:

$\Omega_\varepsilon = E\left(\varepsilon\varepsilon'\right) = \begin{pmatrix} \sigma_\varepsilon^2 & \rho\sigma_\varepsilon^2 & \cdots & \rho^{T-1}\sigma_\varepsilon^2 \\ \rho\sigma_\varepsilon^2 & \sigma_\varepsilon^2 & \cdots & \rho^{T-2}\sigma_\varepsilon^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1}\sigma_\varepsilon^2 & \rho^{T-2}\sigma_\varepsilon^2 & \cdots & \sigma_\varepsilon^2 \end{pmatrix}$   (4.111)

that is:

$\Omega_\varepsilon = \sigma_\varepsilon^2\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{pmatrix}$   (4.112)

with $\sigma_\varepsilon^2 = \frac{\sigma_u^2}{1-\rho^2}$.
Transferring this expression to (4.99), we can show that the variance of the OLS estimator is written as:

$V\left(\hat{\beta}_{OLS}\right) = \frac{\sigma_\varepsilon^2}{\sum_{t=1}^{T}x_t^2}\left[1 + 2\rho\frac{\sum_{t=2}^{T}x_t x_{t-1}}{\sum_{t=1}^{T}x_t^2} + 2\rho^2\frac{\sum_{t=3}^{T}x_t x_{t-2}}{\sum_{t=1}^{T}x_t^2} + \ldots + 2\rho^{T-1}\frac{x_1 x_T}{\sum_{t=1}^{T}x_t^2}\right]$   (4.113)

where $x_t = X_t - \bar{X}$.


We have seen that when there is autocorrelation, it is appropriate to use the generalized least squares (GLS) estimator given by:

$\tilde{\beta} = \left(X'\Omega_\varepsilon^{-1}X\right)^{-1}X'\Omega_\varepsilon^{-1}Y$   (4.114)

In this case, we know that the variance of the estimator is written as:

$\Omega_{\tilde{\beta}} = \left(X'\Omega_\varepsilon^{-1}X\right)^{-1}$   (4.115)

If we replace the matrix $\Omega_\varepsilon$ by its expression (4.112) in (4.115), we obtain:

$V\left(\tilde{\beta}_{GLS}\right) = \frac{\sigma_\varepsilon^2}{\sum_{t=1}^{T}x_t^2}\left[\frac{1-\rho^2}{1+\rho^2-2\rho\dfrac{\sum_{t=2}^{T}x_t x_{t-1}}{\sum_{t=1}^{T}x_t^2}}\right]$   (4.116)

where $x_t = X_t - \bar{X}$.


To assess the effectiveness of the OLS estimator, we compare the bracketed terms
in relationships (4.113) and (4.116). We see that these terms depend on the variable

xt and the first-order autocorrelation coefficient .ρ. Obviously, in the absence of


.

autocorrelation of errors, i.e., if .ρ = 0, the two expressions are identical. When


there is autocorrelation, the efficiency of the OLS estimator depends on the values
taken by the coefficient .ρ and the nature of .xt . Calculations by Johnston and Dinardo
(1996) show that, if we assume that .xt also follows an autoregressive process of
order 1, the efficiency of the OLS estimator decreases from 90 to 10% when the
first-order autocorrelation coefficient .ρ increases from 0.2 to 0.9. These calculations
are only illustrative, but they show that the effectiveness of the OLS estimator is
undermined if OLS is applied to a model with an autocorrelated error term.
Furthermore, we know that the variance of the OLS estimator when the error term satisfies the good statistical properties is given by:

$V\left(\hat{\beta}\right) = \frac{\sigma_\varepsilon^2}{\sum_{t=1}^{T}x_t^2}$   (4.117)

As we can see from expression (4.113), this is no longer the true variance when
there is error autocorrelation. Consequently, using the usual OLS formulas when
there is autocorrelation leads to an erroneous evaluation of the variance of the
estimators and, consequently, of their t-statistics. The results of the significance
tests can then be significantly affected, leading to incorrect interpretations.
All in all, it should be recalled that when there is autocorrelation of errors, the
GLS method should be used. However, as in the case of heteroskedasticity, GLS can
only be used if the variance-covariance matrix of the errors .Ωε is known. In practice,
this is generally not the case, which is why we will present operational estimation
procedures in the following.

4.3.3 Detecting Autocorrelation

Since the error term is unobservable, autocorrelation is detected by studying the


residuals that are deduced from estimating the regression model. There are various
tests for detecting autocorrelation. Before presenting them, let us mention that a
first intuition can be provided graphically. Figures 4.6, 4.7, 4.8 and 4.9 reproduce
various cases of autocorrelation of residuals, with Fig. 4.10 representing the case of
absence of autocorrelation. Figure 4.6 illustrates positive autocorrelation, Fig. 4.7
an increasing trend in the residuals, Fig. 4.8 a decreasing trend, and Fig. 4.9 a case
of negative autocorrelation of the residuals. In Fig. 4.10, no particular pattern stands
out, illustrating the case of no autocorrelation in the residuals.

Fig. 4.6 Positive autocorrelation of residuals ($e_t$ plotted against t)
Fig. 4.7 Increasing trend in residuals
Fig. 4.8 Decreasing trend in residuals
Fig. 4.9 Negative autocorrelation of residuals
Fig. 4.10 Absence of autocorrelation of residuals

The Geary Test (1970)

This is a nonparametric test in the sense that no hypothesis is made about the probability distribution from which the residuals are derived. This test is based on the study of the sign of the residuals or, more precisely, the change in sign of the residuals. Let us define a run (or sequence) of length i as a sequence of i consecutive


values of the same sign. For example, let us denote positive residuals by + and
negative residuals by .−. Suppose that the estimation of a regression model on 30
observations leads to the following results for the residuals:

. −−−−−−−+++++++++++++−−−−−−−−−−

Here we have three runs: a negative run of length 7, a positive run of length
13, and then a negative run of length 10. We wonder whether these three runs are
from a purely random series of 30 observations. Intuitively, we might think that if
the number of runs is very large, the residuals frequently change sign, which is an
indication in favor of a negative autocorrelation of the residuals. Similarly, if the
number of runs is very small, indicating that residuals rarely change sign, this may
indicate a positive autocorrelation of residuals.

To test these assumptions, let us note T the total number of observations,


.T1 the number of positive residuals, and .T2 the number of negative residuals
.(T = T1 + T2 ). Let R be the number of runs. Under the null hypothesis of

independence of observations (here of residuals) and assuming that .T1 > 10 and
.T2 > 10, the number of runs follows a normal distribution with mean:

2T1 T2
E (R) = 1 +
. (4.118)
T
and variance:
2T1 T2 (2T1 T2 − 1)
.σR2 = (4.119)
(T − 1) T 2

Under the null hypothesis of independence, we can construct a 95% confidence


interval:

E(R) ± 1.96σR
. (4.120)

The decision rule is as follows:

– If the number of runs R lies within the confidence interval, the null hypothesis is
not rejected.
– If the number of runs R lies outside the confidence interval, the null hypothesis
is rejected.

Returning to our example, we have: .T = 30, T1 = 13, T2 = 17, and .R = 3.


We deduce: .E(R) = 15.7333 and .σR = 2.7328. The 95% confidence interval is
then given by: .[10.3770; 21.0896]. The number of runs, 3, does not lie within this
interval. The null hypothesis is rejected, reflecting the presence of autocorrelation
at the 5% significance level.
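The computations of this example can be reproduced with a few lines of Python, using formulas (4.118)-(4.120) as printed above; the function name is ours, and residuals exactly equal to zero are simply discarded.

import numpy as np

def geary_runs_test(e):
    """Runs test on the signs of the residuals, Eqs. (4.118)-(4.120)."""
    signs = np.sign(e)
    signs = signs[signs != 0]                       # drop exact zeros
    T1 = np.sum(signs > 0)
    T2 = np.sum(signs < 0)
    T = T1 + T2
    R = 1 + np.sum(signs[1:] != signs[:-1])         # number of runs
    mean_R = 1 + 2 * T1 * T2 / T                    # Eq. (4.118)
    var_R = 2 * T1 * T2 * (2 * T1 * T2 - 1) / ((T - 1) * T**2)   # Eq. (4.119)
    lower = mean_R - 1.96 * np.sqrt(var_R)
    upper = mean_R + 1.96 * np.sqrt(var_R)
    return R, (lower, upper)   # reject independence if R lies outside the interval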

The Durbin and Watson Test (1950, 1951)


Unlike the previous test, the Durbin-Watson test is a parametric test in the sense
that it relies on an assumption about the probability distribution of residuals. It is the
best-known test for detecting the presence of autocorrelation, is included in almost
all econometric software, and is widely used.
The Durbin-Watson test is based on the calculation of the following statistic:

$DW = \frac{\sum_{t=2}^{T}\left(e_t - e_{t-1}\right)^2}{\sum_{t=1}^{T}e_t^2}$   (4.121)

where .et are the residuals resulting from the estimation of the regression model
(simple or multiple). It allows us to test the null hypothesis of no first-order
autocorrelation of the residuals against the alternative hypothesis of first-order
autocorrelation of the residuals. If we assume that the error term follows a first-
order autoregressive process:

εt = ρεt−1 + ut
. (4.122)

the Durbin-Watson test consists in testing the null hypothesis:

H0 : ρ = 0
. (4.123)

against the alternative hypothesis:

$H_1: \rho \neq 0$   (4.124)

The statistic DW is thus used to measure the magnitude of the first-order autocorrelation. To show some of the characteristics of this test, let us expand the numerator of (4.121):

$DW = \frac{\sum_{t=2}^{T}e_t^2 + \sum_{t=2}^{T}e_{t-1}^2 - 2\sum_{t=2}^{T}e_t e_{t-1}}{\sum_{t=1}^{T}e_t^2}$   (4.125)

For T sufficiently large, we can write:

$DW \simeq \frac{2\sum_{t=2}^{T}e_t^2 - 2\sum_{t=2}^{T}e_t e_{t-1}}{\sum_{t=1}^{T}e_t^2}$   (4.126)

that is:

$DW \simeq 2\left(1 - \frac{\sum_{t=2}^{T}e_t e_{t-1}}{\sum_{t=1}^{T}e_t^2}\right)$   (4.127)

Given that $E(e_t) = 0$, the term $\sum_{t=2}^{T}e_t e_{t-1}\Big/\sum_{t=1}^{T}e_t^2$ represents the estimate of the first-order autocorrelation coefficient of the residual series. In other words, this is the estimate of the coefficient $\rho$ in the regression of $e_t$ on $e_{t-1}$. Let us denote this estimated coefficient $\hat{\rho}$. We can write:

$DW \simeq 2\left(1 - \hat{\rho}\right)$   (4.128)

The expression (4.128) shows that there is a relationship between the statistic DW
and the first-order autocorrelation coefficient of the residuals. Furthermore, this
relationship allows us to highlight various characteristics of the statistic DW :

– Given that a coefficient of autocorrelation varies between .−1 and 1, the statistic
DW varies between 0 and 4. It is 0 when there is perfect positive autocorrelation
(.ρ̂ = 1) and 4 when there is perfect negative autocorrelation (.ρ̂ = −1).
– When .DW ≃ 2, the residuals are not autocorrelated (.ρ̂ = 0).
– When .DW > 2, the autocorrelation of the residuals is negative.
– When .DW < 2, the autocorrelation of the residuals is positive.

To carry out the test, Durbin and Watson tabulated a lower bound and an upper
bound for the critical values of the statistic DW as a function of the number of
observations and the number of explanatory variables k included in the model under
consideration. The table thus gives two values .d1 (lower bound) and .d2 (upper
bound) allowing us to perform the test according to the table below.

DW between 0 and $d_1$: rejection of $H_0$ ($\rho > 0$)
DW between $d_1$ and $d_2$: inconclusive region (?)
DW between $d_2$ and $4 - d_2$: non-rejection of $H_0$ ($\rho = 0$)
DW between $4 - d_2$ and $4 - d_1$: inconclusive region (?)
DW between $4 - d_1$ and 4: rejection of $H_0$ ($\rho < 0$)

We can see that there are two regions of “doubt” or indecision. In practice, if
we find ourselves in one of these regions, we tend to reject the null hypothesis
of no autocorrelation. This is because the consequences of not rejecting the null
hypothesis of no autocorrelation even though it is false are considered more “severe”
than the consequences of wrongly assuming the absence of autocorrelation. Thus, in
practice, when in a region of doubt, we use the upper bound .d2 as if it were a usual
critical value: we reject the null hypothesis of no autocorrelation if .DW < d2 . The
region of doubt decreases as the sample size increases.
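Computing the statistic itself is straightforward (the bounds $d_1$ and $d_2$ are then read from the Durbin-Watson tables); a short Python sketch follows. The durbin_watson function of statsmodels performs the same computation.

import numpy as np

def durbin_watson_stat(e):
    """Durbin-Watson statistic, Eq. (4.121)."""
    return np.sum(np.diff(e)**2) / np.sum(e**2)

# Approximate link with the first-order autocorrelation of the residuals
# (Eq. 4.128): DW ≈ 2(1 - rho_hat), so a value close to 2 points to the
# absence of first-order autocorrelation.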
The Durbin-Watson test is very frequently used. However, it is important to
specify certain conditions of use:

– The regression model must include a constant term. The critical values given
in the tables of Durbin and Watson have indeed been tabulated assuming the
presence of a constant in the regression model.4

4 Let us mention, however, that Farebrother (1980) tabulated the critical values of the statistic DW
in the absence of a constant term.

– The explanatory variables must be nonrandom. Consequently, the Durbin-


Watson test cannot be used when the model includes the lagged endogenous
variable among the explanatory variables. In this case, the Durbin test or the
Breusch-Godfrey test can be used (see below).
– The error term is assumed to follow a normal distribution (see Mittelhammer
et al., 2000 for a study of the consequences of not observing the normality
assumption).
– The Durbin-Watson test only detects the first-order autocorrelation of residuals.
In other words, it is not appropriate if the residuals exhibit higher-order
autocorrelation. It is then possible to use the Wallis test or more general tests
such as the Breusch-Godfrey, Box-Pierce, or Ljung-Box tests.

Remark 4.5 (The Wallis Test (1972)) As mentioned by Wallis (1972), it is


common, when working with quarterly frequency series, to expect a fourth-order
error autocorrelation. Wallis proposes a modification of the Durbin-Watson statistic:

$DW_4 = \frac{\sum_{t=5}^{T}\left(e_t - e_{t-4}\right)^2}{\sum_{t=1}^{T}e_t^2}$   (4.129)

allowing us to test the null hypothesis of absence of autocorrelation at order 4 of the


residuals, i.e., .ρ4 = 0 in the equation:

εt = ρ4 εt−4 + ut
. (4.130)

Wallis (1972) derived tables of critical values including an upper and lower
bound for .DW4 (see also Giles and King, 1978).

The Durbin Test (1970)


As we have previously pointed out, the Durbin-Watson test cannot be applied if
the regression model under consideration includes the lagged endogenous variable
among the explanatory variables. The test proposed by Durbin (1970) can be
implemented in such a case.
Consider the following regression model:

Yt = α + φ1 Yt−1 + . . . + φp Yt−p + β1 X1t + . . . + βk Xkt + εt


. (4.131)

with:

εt = ρεt−1 + ut
. (4.132)

The Durbin test allows us to detect the presence of first-order autocorrelation of


the residuals in a model with the lagged endogenous variable among the explanatory
variables.
The test statistic proposed by Durbin is written as:

$h = \hat{\rho}\sqrt{\frac{T}{1 - T\,V\left(\hat{\phi}_1\right)}}$   (4.133)

where $V\left(\hat{\phi}_1\right)$ denotes the estimated variance of the coefficient associated with $Y_{t-1}$ in regression (4.131) and $\hat{\rho}$ is the estimator of the first-order autoregressive coefficient of the residuals, obtained from the regression of $e_t$ on $e_{t-1}$:

$\hat{\rho} = \frac{\sum_{t=2}^{T}e_t e_{t-1}}{\sum_{t=1}^{T}e_t^2}$   (4.134)

et denoting residuals from the OLS estimation of regression (4.131).


.

Under the null hypothesis of no autocorrelation at order 1 .(ρ = 0), the Durbin
statistic has a standard normal distribution, i.e.:

h ∼ N(0, 1)
. (4.135)

At the 5% significance level, the decision rule is:

– If .|h| < 1.96, the null hypothesis of no autocorrelation at order 1 is not rejected.
– If .|h| > 1.96, the null hypothesis of no autocorrelation at order 1 is rejected.

Remark 4.6 If .T V φ̂1 ≥ 1, the formula (4.133) cannot be applied. In such a


case, Durbin suggests proceeding as follows:

– Estimate model (4.131) by OLS and derive the residual series .et .
– Estimate by OLS the regression of .et on .et−1 , Yt−1 , . . . , Yt−p , X1t , . . . ,
Xkt .
– Perform a test of significance (t-test) on the coefficient associated with .et−1 . If
this coefficient is significantly different from zero, the residuals are autocorre-
lated to order 1.

The Breusch-Godfrey Test


Breusch (1978) and Godfrey (1978) proposed—independently—a test to detect the
presence of autocorrelation of order greater than 1 that remains valid when the
model includes the lagged endogenous variable among the explanatory variables.

The Breusch-Godfrey test is a Lagrange multiplier test based on the search for a
relationship between the errors εt , t = 1, . . . , T .
Suppose that the error term εt of the multiple regression model:

Yt = α + β1 X1t + β2 X2t + . . . + βj Xj t + . . . + βk Xkt + εt


. (4.136)

follows an autoregressive process of order p, which we note AR(p), or:

εt = φ1 εt−1 + φ2 εt−2 + . . . + φp εt−p + ut


. (4.137)

where ut is white noise.


The Breusch-Godfrey test consists of testing the null hypothesis of no autocorre-
lation of errors, i.e.:

. H0 : φ1 = φ2 = . . . = φp = 0 (4.138)

The test procedure can be described in three steps:

– The multiple regression model (4.136) is estimated and the residuals et , t =


1, . . . , T , are deduced. The test concerns the residuals since the errors are
obviously unknown.
– We regress the residuals et on their p past values as well as on the k explanatory
variables, i.e.:

et = α + β1 X1t + β2 X2t + . . . + βj Xj t + . . . + βk Xkt


. (4.139)
+ φ1 et−1 + φ2 et−2 + . . . + φp et−p + ut

and we calculate the coefficient of determination R 2 associated with this


regression.
– We calculate the test statistic:

BG = (T − p) R 2
. (4.140)

Under the null hypothesis, BG ∼ χp2 .


The decision rule is then:
- If BG < χp2 , the null hypothesis of no autocorrelation is not rejected.
- If BG > χp2 , the null hypothesis of no autocorrelation is rejected: at least one
of the coefficients φi , i = 1, . . . , p, is significantly different from zero.
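To fix ideas, the three steps can be sketched as follows with Python and numpy only; the function breusch_godfrey is ours, and the p missing lagged residuals at the beginning of the sample are simply set to zero, a common convention.

```python
import numpy as np

def breusch_godfrey(y, X, p):
    """Breusch-Godfrey LM statistic following the three steps described above.

    y : (T,) dependent variable; X : (T, k) explanatory variables (without constant);
    p : order of autocorrelation tested. Missing lagged residuals are set to zero.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T = len(y)
    Xc = np.column_stack([np.ones(T), X])                      # constant + regressors
    e = y - Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]         # step 1: OLS residuals
    lags = np.column_stack([np.concatenate([np.zeros(h), e[:-h]])
                            for h in range(1, p + 1)])         # e_{t-1}, ..., e_{t-p}
    Z = np.column_stack([Xc, lags])                            # step 2: auxiliary regression
    u = e - Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
    R2 = 1 - (u @ u) / (e @ e)                                 # e has zero mean
    return (T - p) * R2                                        # step 3: BG ~ chi2(p) under H0
```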

Remark 4.7 In the previous developments, it has been assumed that the error term
εt of the multiple regression model follows an autoregressive process of order p
(Eq. (4.137)). The Breusch-Godfrey test can also be applied in the case where the
error process follows a moving average process of order p, which is noted MA(p),
i.e.:

.εt = θ1 ut−1 + θ2 ut−2 + . . . + θp ut−p (4.141)

where ut is white noise.


The test procedure is exactly the same as described above.

The Box-Pierce (1970) and Ljung-Box (1978) Tests


The Box-Pierce test, also known as the “portmanteau” test, is designed to
test the non-autocorrelated nature of residuals. Noting .ρh(et ) the coefficient of
autocorrelation of order h of the residuals, .h = 1, . . . , H , the test consists in testing
the null hypothesis:

ρ1(et ) = ρ2(et ) = · · · = ρh(et ) = · · · = ρH (et ) = 0


. (4.142)

against the alternative hypothesis that there is at least one coefficient .ρh(et )
significantly different from zero.
The test statistic is written as:

BP(H) = T \sum_{h=1}^{H} ρ̂_{h(e_t)}^2    (4.143)

where ρ̂_{h(e_t)} is the estimator of the autocorrelation coefficient of order h of the
residuals:

ρ̂_h = \frac{\sum_{t=h+1}^{T} e_t e_{t-h}}{\sum_{t=1}^{T} e_t^2}    (4.144)

and .H is the maximum number of lags.


Under the null hypothesis of no autocorrelation:

ρ1(et ) = ρ2(et ) = · · · = ρH (et ) = 0


. (4.145)

the statistic .BP (H ) follows a Chi-squared distribution with H degrees of freedom.5


Ljung and Box (1978) suggest an improvement to the Box-Pierce test when the
sample size is small. The distribution of the Ljung-Box test statistic is indeed closer

5 It is assumed here that the lagged dependent variable is not among the explanatory variables. We
will come back to the Box-Pierce test in Chap. 7.

to that of the Chi-squared in small samples than is that of the Box-Pierce test. The
test statistic is written as:


LB(H) = T(T + 2) \sum_{h=1}^{H} \frac{ρ̂_{h(e_t)}^2}{T - h}    (4.146)

Under the null hypothesis of no autocorrelation:

ρ1(et ) = ρ2(et ) = · · · = ρH (et ) = 0


. (4.147)

the statistic .LB(H ) has a Chi-squared distribution with H degrees of freedom.
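A minimal sketch of both statistics, following Eqs. (4.143), (4.144), and (4.146); the function names are ours and the code uses numpy only.

```python
import numpy as np

def rho_hat(e, h):
    """Eq. (4.144): autocorrelation of order h of the residuals."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[h:] * e[:-h]) / np.sum(e ** 2)

def box_pierce(e, H):
    """Eq. (4.143)."""
    T = len(e)
    return T * sum(rho_hat(e, h) ** 2 for h in range(1, H + 1))

def ljung_box(e, H):
    """Eq. (4.146)."""
    T = len(e)
    return T * (T + 2) * sum(rho_hat(e, h) ** 2 / (T - h) for h in range(1, H + 1))

# Both statistics are compared with the chi-squared critical value with H degrees of freedom.
```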

4.3.4 Estimation Procedures in the Presence of Error Autocorrelation

In the presence of error autocorrelation, the OLS estimators remain unbiased, but
are no longer of minimum variance. As in the case of heteroskedasticity, this has the
consequence of affecting the precision of the tests. So how can we correct for error
autocorrelation?
To answer this question, we need to distinguish between cases where the variance
of the error term is known and those where it is unknown. When the variance of the
error term is known, we have seen (see Sect. 4.1.2) that the GLS method should be
applied in the presence of autocorrelation. When the variance of the error term is
unknown, various methods are available, which we describe below.

Case Where the Variance of the Error Term Is Known: General Principle
of GLS
Consider the multiple regression model:

Y = Xβ + ε
. (4.148)
 
with .E εε' = Ωε .
As we have seen previously (see Sect. 4.1.2), the GLS method can be applied
provided we find a transformation matrix .M of known parameters, such as:

M ' M = 𝚪ε−1
. (4.149)

with:
Γ_ε = σ_ε^{-2} Ω_ε    (4.150)

It is then sufficient to apply OLS to the transformed variables .MY and .MX. To
get a clearer picture, let us consider the simple regression model:

Yt = α + βXt + εt
. (4.151)

and assume that the error term follows a first-order autoregressive process (.AR(1)),
i.e.:

εt = ρεt−1 + ut
. (4.152)

where .|ρ| < 1 and .ut is white noise.


As previously shown, the variance-covariance matrix of the error term is given
by:

Ω_ε = σ_ε^2 \begin{pmatrix}
1 & ρ & \cdots & ρ^{T-1} \\
ρ & 1 & \cdots & ρ^{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
ρ^{T-1} & ρ^{T-2} & \cdots & 1
\end{pmatrix}    (4.153)

with σ_ε^2 = σ_u^2 / (1 − ρ^2).
If ρ is known, the GLS estimator:

β̃ = (X' Ω_ε^{-1} X)^{-1} X' Ω_ε^{-1} Y    (4.154)

can be obtained with:

Ω_ε^{-1} = \frac{1}{σ_u^2} \begin{pmatrix}
1 & -ρ & 0 & 0 & \cdots & 0 \\
-ρ & 1+ρ^2 & -ρ & 0 & \cdots & 0 \\
0 & -ρ & 1+ρ^2 & -ρ & & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & -ρ & 1+ρ^2 & -ρ \\
0 & \cdots & 0 & 0 & -ρ & 1
\end{pmatrix}    (4.155)

Consider the transformation matrix M such that:

M = \begin{pmatrix}
-ρ & 1 & 0 & 0 & \cdots & 0 \\
0 & -ρ & 1 & 0 & \cdots & 0 \\
0 & 0 & -ρ & 1 & & 0 \\
\vdots & & & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & -ρ & 1
\end{pmatrix}    (4.156)

Then we have:

M'M = \begin{pmatrix}
ρ^2 & -ρ & 0 & 0 & \cdots & 0 \\
-ρ & 1+ρ^2 & -ρ & 0 & \cdots & 0 \\
0 & -ρ & 1+ρ^2 & -ρ & & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1+ρ^2 & -ρ \\
0 & 0 & \cdots & 0 & -ρ & 1
\end{pmatrix}    (4.157)

M'M is identical to σ_u^2 Ω_ε^{-1}, except for the first element of the diagonal (ρ^2
instead of 1).
By applying the matrix M to model (4.151), we obtain the transformed variables:

MY = \begin{pmatrix} Y_2 - ρY_1 \\ Y_3 - ρY_2 \\ \vdots \\ Y_T - ρY_{T-1} \end{pmatrix}    (4.158)

and

MX = \begin{pmatrix} 1 & X_2 - ρX_1 \\ 1 & X_3 - ρX_2 \\ \vdots & \vdots \\ 1 & X_T - ρX_{T-1} \end{pmatrix}    (4.159)

The GLS method amounts to applying OLS to the regression model formed by the
(T − 1) transformed observations MY and MX:

\begin{pmatrix} Y_2 - ρY_1 \\ Y_3 - ρY_2 \\ \vdots \\ Y_T - ρY_{T-1} \end{pmatrix} =
\begin{pmatrix} 1 & X_2 - ρX_1 \\ 1 & X_3 - ρX_2 \\ \vdots & \vdots \\ 1 & X_T - ρX_{T-1} \end{pmatrix}
\begin{pmatrix} α(1 - ρ) \\ β \end{pmatrix} +
\begin{pmatrix} u_2 \\ u_3 \\ \vdots \\ u_T \end{pmatrix}    (4.160)

The variables thus transformed are said to be expressed in quasi-first differences.
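To illustrate, a minimal sketch of the GLS estimation by quasi-first differences when ρ is known, for the bivariate model (4.151); the code uses Python with numpy only, and the function name is ours.

```python
import numpy as np

def gls_quasi_differences(y, x, rho):
    """OLS on the (T - 1) quasi-differenced observations of Eq. (4.160), rho known."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    y_star = y[1:] - rho * y[:-1]
    x_star = x[1:] - rho * x[:-1]
    Z = np.column_stack([np.ones(len(y_star)), x_star])
    coef, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    alpha_hat = coef[0] / (1 - rho)     # the estimated intercept equals alpha * (1 - rho)
    beta_hat = coef[1]
    return alpha_hat, beta_hat
```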

Remark 4.8 In order not to lose the first observation, we can add a first row to the
matrix M. This first row is such that all the elements are zero, except the first one,
which is equal to \sqrt{1 − ρ^2}.

However, such a method is only applicable if the coefficient .ρ is known, which


is rarely the case in practice. It is a parameter that has to be estimated. Once
this estimation has been made, the method previously described can be applied
by replacing .ρ by its estimator .ρ̂ in the transformed model. Various methods are
available for this purpose and are discussed below.

Case Where the Variance of the Error Term Is Unknown: Pseudo GLS
Methods
We can distinguish iterative methods from other techniques. These different meth-
ods are called pseudo GLS methods. Generally speaking, they consist of estimating
the parameters of the residuals’ generating model, transforming the variables of
the model using these parameters, and applying OLS to the model formed by the
variables thus transformed.

Non-iterative Methods
Among the non-iterative methods, it is possible to find the estimator .ρ̂ of the
coefficient .ρ in two different ways: by relying on the Durbin-Watson statistic or
by performing regressions using residuals.

The Use of the Durbin-Watson Test


We know that (see Eq. (4.128)):

DW ≃ 2(1 − ρ̂)    (4.161)

where ρ̂ denotes the estimate of ρ in the regression of the residuals e_t on e_{t−1}. Using
this expression leads directly to the estimator:

ρ̂ ≃ 1 − \frac{DW}{2}    (4.162)
Once this estimator has been obtained, we transform the variables as follows:

Yt − ρ̂Yt−1 and Xit − ρ̂Xit−1


. (4.163)

where .i = 1, . . . , k, k denotes the number of explanatory variables, and apply OLS


to the transformed model.

Method Based on Residuals


This technique consists in regressing e_t on e_{t−1} and deducing the estimator ρ̂ of ρ:

ρ̂ = \frac{\sum_{t=2}^{T} e_t e_{t-1}}{\sum_{t=1}^{T} e_t^2}    (4.164)

It then remains for us to transform the variables and apply OLS to the transformed
model.6

Iterative Methods
Various iterative pseudo GLS techniques are available to estimate the coefficient
.ρ. The best known are those of Cochrane and Orcutt (1949) and Hildreth and Lu

(1960).

The Cochrane-Orcutt Method


This is the most popular iterative technique. It can be described in five steps.

– Step 1. The regression model under consideration is estimated and the residuals
.et are deduced. An initial estimate .ρ̂0 of .ρ is obtained:


  ρ̂_0 = \frac{\sum_{t=2}^{T} e_t e_{t-1}}{\sum_{t=1}^{T} e_t^2}    (4.165)

– Step 2. The transformed variables .Yt − ρ̂0 Yt−1 and .Xit − ρ̂0 Xit−1 are constructed
for .i = 1, . . . , k, with k denoting the number of explanatory variables.
– Step 3. OLS is applied to the model in quasi-differences:

  Y_t − ρ̂_0 Y_{t−1} = α(1 − ρ̂_0) + β_1 (X_{1t} − ρ̂_0 X_{1t−1}) + . . . + β_k (X_{kt} − ρ̂_0 X_{kt−1}) + u_t    (4.166)

– Step 4. From the new estimation residuals e_t^{(1)}, a new estimation ρ̂_1 of ρ is
  performed:

  ρ̂_1 = \frac{\sum_{t=2}^{T} e_t^{(1)} e_{t-1}^{(1)}}{\sum_{t=1}^{T} (e_t^{(1)})^2}    (4.167)

– Step 5. We construct the transformed variables Y_t − ρ̂_1 Y_{t−1} and X_{it} − ρ̂_1 X_{it−1}
  and apply OLS to the model in quasi-differences:

  Y_t − ρ̂_1 Y_{t−1} = α(1 − ρ̂_1) + β_1 (X_{1t} − ρ̂_1 X_{1t−1}) + . . . + β_k (X_{kt} − ρ̂_1 X_{kt−1}) + u_t    (4.168)

6 It is unnecessary to introduce a constant term in the regression of e_t on e_{t−1} since the mean of
the residuals is zero.

A new set of residuals e_t^{(2)} is deduced, from which a new estimate ρ̂_2 of ρ is
obtained, and so on.

These calculations are continued until the estimated regression coefficients
β̂_1, . . . , β̂_k and α(1 − ρ̂_0) are stable.
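A compact sketch of the iterative procedure, assuming the explanatory variables are stored in a (T, k) numpy array; the function cochrane_orcutt and its stopping rule (a tolerance on the change in the estimated coefficients) are ours and purely illustrative.

```python
import numpy as np

def cochrane_orcutt(y, X, max_iter=50, tol=1e-6):
    """Iterative Cochrane-Orcutt procedure (steps 1 to 5 above)."""
    T, k = X.shape
    Z = np.column_stack([np.ones(T), X])
    e = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]          # step 1: OLS residuals
    rho = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)             # initial estimate rho_0
    coef_old = np.full(k + 1, np.inf)
    for _ in range(max_iter):
        y_star = y[1:] - rho * y[:-1]                         # step 2: quasi-differences
        X_star = X[1:] - rho * X[:-1]
        Zs = np.column_stack([np.ones(T - 1), X_star])
        coef = np.linalg.lstsq(Zs, y_star, rcond=None)[0]     # step 3: OLS
        if np.max(np.abs(coef - coef_old)) < tol:             # stop when coefficients are stable
            break
        coef_old = coef
        e = y_star - Zs @ coef                                # step 4: new residuals
        rho = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)         # new estimate of rho
    alpha = coef[0] / (1 - rho)                               # intercept of the original model
    return rho, alpha, coef[1:]
```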

Remark 4.9 We previously noted (see Remark 4.8) that it was possible not to omit
the first observation during the variable transformation step. When this observation
is not omitted, the method of Cochrane-Orcutt is slightly modified and is called the
Prais-Winsten method (see Prais and Winsten, 1954).

The Hildreth-Lu Method


Consider the following quasi-difference model:

Y_t − ρ̂Y_{t−1} = α(1 − ρ̂) + β_1 (X_{1t} − ρ̂X_{1t−1}) + . . . + β_k (X_{kt} − ρ̂X_{kt−1}) + u_t    (4.169)

The procedure can be described in three steps.

– Step 1. We specify a grid of possible values for ρ̂ between −1 and 1. For
  example, we can set a step size of 0.1 and consider the values −0.9, −0.8, . . .,
  0.8, 0.9.
– Step 2. Relationship (4.169) is estimated for each of the previously fixed values
of .ρ̂. The value of .ρ̂ that minimizes the sum of squared residuals is retained.
– Step 3. To refine the estimates, we repeat the previous two steps, setting a smaller
step size (e.g., 0.01) and so on.
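The grid search can be sketched as follows in Python; the function name and the default grid are ours, and the refinement step (a finer grid around the retained value) is left as indicated in the comment.

```python
import numpy as np

def hildreth_lu(y, X, grid=np.arange(-0.9, 0.91, 0.1)):
    """Grid search over rho: keep the value minimizing the sum of squared
    residuals of the quasi-differenced regression (4.169)."""
    best = (None, np.inf, None)
    for rho in grid:
        y_star = y[1:] - rho * y[:-1]
        Z = np.column_stack([np.ones(len(y_star)), X[1:] - rho * X[:-1]])
        coef, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
        ssr = np.sum((y_star - Z @ coef) ** 2)
        if ssr < best[1]:
            best = (rho, ssr, coef)
    return best  # refine afterwards with a finer grid around the retained rho
```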

Other Methods
Two other techniques can also be implemented to account for autocorrelation.
The first technique involves applying the maximum likelihood method to the
regression model. This method simultaneously estimates the usual parameters of
the regression model as well as the value of .ρ (see Beach and MacKinnon, 1978).
The second technique has already been discussed in the treatment of het-
eroskedasticity. This is the correction proposed by Newey and West (1987). Recall
that this technique allows us to apply OLS to the regression model, despite the
presence of error autocorrelation, and to correct the standard deviations of the
estimated coefficients. We do not describe this technique again, since it has already
been outlined (see Sect. 4.2.4).

4.3.5 Prediction in the Presence of Error Autocorrelation

Let us consider the bivariate regression model:

Yt = α + βXt + εt
. (4.170)

and assume that the error term follows a first-order autoregressive process, i.e.:

.εt = ρεt−1 + ut (4.171)

where .|ρ| < 1 and .ut is white noise.


We can then write:

Yt = α + βXt + ρεt−1 + ut
. (4.172)

The prediction of Y for the date .T + 1 is given by:

. ŶT +1 = α̂ + β̂XT +1 + ρ ε̂T (4.173)

Thus, compared with the usual regression model without error autocorrelation,
the term .ρ ε̂T is added.
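In code, the forecast simply adds this correction term to the usual OLS forecast; the small helper below is purely illustrative and its name is ours.

```python
def forecast_ar1_errors(alpha_hat, beta_hat, rho_hat, x_next, last_residual):
    """Eq. (4.173): usual forecast plus the correction term rho * last residual."""
    return alpha_hat + beta_hat * x_next + rho_hat * last_residual
```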

4.3.6 Empirical Application

Let us consider our monthly-frequency model over the period February 1984–June
2021 linking the returns of the RFTSE London Stock Exchange index to the returns
of the RDJIND New York Stock Exchange index (see Table 4.8):

RFTSE_t = α + β RDJIND_t + ε_t    (4.174)

The OLS estimation of model (4.174) leads to the results shown in Table 4.9.
The residuals resulting from the estimation of this model are plotted in Fig. 4.11. In
order to determine whether or not they are autocorrelated, let us apply the tests of
absence of autocorrelation.
The value of the Durbin-Watson test statistic is given in Table 4.9: .DW =
2.2247. At the 5% significance level, the reading of the Durbin-Watson table in
the case where only one exogenous variable appears in the model gives .d1 = 1.65
and .d2 = 1.69. Since .d2 < DW < 4 − d2 , we do not reject the null hypothesis of
absence of first-order autocorrelation of the residuals.

Table 4.8 FTSE and Dow Jones industrial returns
            RFTSE      RDJIND
1984.02     −0.0216    −0.0555
1984.03     0.0671     0.0088
1984.04     0.0229     0.0050
...         ...        ...
2021.04     0.0374     0.0267
2021.05     0.0075     0.0191
2021.06     0.0021     −0.0008
Data source: Macrobond



Table 4.9 OLS estimation


Dependent variable: RFTSE
Variable Coefficient Std. error t-Statistic Prob.
C .−0.001614 0.001352 .−1.193763 0.2332
RDJIND 0.782505 0.030388 25.75060 0.0000
R-Squared 0.597331 Mean dependent var 0.004210
Adjusted R-squared 0.596430 S.D. dependent var 0.044466
S.E. of regression 0.028248 Akaike info criterion .−4.291158
Sum squared resid 0.356678 Schwarz criterion .−4.272864

Log likelihood 965.3650 Hannan-Quinn criterion .−4.283947


F-statistic 663.0935 Durbin-Watson stat 2.224700
Prob(F-statistic) 0.0000

Fig. 4.11 Graphical representation of residuals (RFTSE residuals, 1985–2020)

In order to apply the Breusch-Godfrey test, we regress the residuals (denoted


RESI D) obtained from the estimation of model (4.174) on the explanatory variable
RDJ I ND and on the one- and two-period lagged residuals. The results of this
estimation are shown in Table 4.10.
The test statistic is given by:

.BG = (T − p) R 2 = (449 − 2) × 0.0208 = 9.2824 (4.175)

Under the null hypothesis of no autocorrelation, the statistic BG follows a


Chi-squared distribution with .p = 2 degrees of freedom. At the 5% significance
level, the critical value is 5.991. Given that .9.2824 > 5.991, the null hypothesis

Table 4.10 The Breusch-Godfrey test


Dependent variable: RESID
Variable Coefficient Std. error t-Statistic Prob.
C .−5.17E–05 0.001341 .−0.038557 0.9693
RDJIND 0.006827 0.030235 0.225801 0.8215
RESID(-1) .−0.123734 0.047264 .−2.617914 0.0091
RESID(-2) .−0.089929 0.047333 .−1.899944 0.0581
R-Squared 0.020766 Mean dependent var .−6.99E–19
Adjusted R-squared 0.014164 S.D. dependent var 0.028216
S.E. of regression 0.028016 Akaike info criterion .−4.303234
Sum squared resid 0.349272 Schwarz criterion .−4.266646
Log likelihood 970.0760 Hannan-Quinn criterion .−4.288812

F-statistic 3.145607 Durbin-Watson stat 2.016886


Prob(F-statistic) 0.024998

Table 4.11 The Ljung-Box test
H    LB(H)     χ²_H     | H     LB(H)     χ²_H
1    5.787     3.841    | 7     11.81     14.067
2    8.351     5.991    | 8     11.811    15.507
3    10.79     7.815    | 12    13.055    21.026
4    10.99     9.488    | 18    16.307    28.869
5    10.993    11.07    | 24    21.142    36.415
6    11.77     12.592   | 30    31.654    43.773

of no autocorrelation is rejected. The residuals are therefore autocorrelated. This


conclusion was to be expected from the results shown in Table 4.10 since the
coefficient associated with the one-period lagged residuals is significantly different
from zero and that associated with the two-period lagged residuals is significant at
the 10% level.
Table 4.11 reports the calculated values of the Ljung-Box statistic and the critical
value given by the Chi-squared distribution at the 5% significance level for a number
of lags H ranging from 1 to 30. We find that, for values of H ranging from 1 to 4,
the calculated value of the Ljung-Box statistic is higher than the critical value. The
null hypothesis of no autocorrelation is therefore rejected. For higher values of H
.(H ≥ 5), the null hypothesis is no longer rejected at the 5% significance level (it is

rejected up to 6 lags if the 10% level is used).


While the Durbin-Watson test concludes in the absence of first-order autocor-
relation of the residuals, the Breusch-Godfrey and Ljung-Box tests reject the null
hypothesis of no autocorrelation—particularly for higher orders. To take account
of this feature, it is possible to reestimate model (4.174) by OLS by applying the
correction suggested by Newey-West. The results have already been presented in
the study of heteroskedasticity and are shown in Table 4.7.
For illustrative purposes, let us also apply the Cochrane-Orcutt method. The first
step is to obtain an initial estimate .ρ̂0 of .ρ. To do this, we regress the residuals

Table 4.12 The Cochrane-Orcutt procedure


Dependent variable: DRFTSE
Variable Coefficient Std. error t-Statistic Prob.
C .−0.001929 0.001350 .−1.429024 0.1537
DRDJIND 0.791900 0.030019 26.37990 0.0000
R-Squared 0.609423 Mean dependent var 0.004744
Adjusted R-squared 0.608547 S.D. dependent var 0.044869
S.E. of regression 0.028073 Akaike info criterion .−4.303587
Sum squared resid 0.351481 Schwarz criterion .−4.285262

Log likelihood 966.0034 Hannan-Quinn criterion .−4.296363


F-statistic 695.8990 Durbin-Watson stat 2.020324
Prob(F-statistic) 0.0000

obtained from the estimation of model (4.174) on the first-lagged residuals. We


obtain:

RESI Dt = −0.1132 × RESI Dt−1


. (4.176)

We thus have: .ρ̂0 = −0.1132. We construct the variables in quasi-differences:

DRF T SEt = RF T SEt + 0.1132 × RF T SEt−1


. (4.177)

and:

DRDJ I NDt = RDJ I NDt + 0.1132 × RDJ I NDt−1


. (4.178)

We then regress .DRF T SEt on a constant and .DRDJ I N Dt . The results are
shown in Table 4.12.
We can calculate the constant term:

. α̂ = −0.0019/ (1 + 0.1132) = −0.0017 (4.179)

The procedure can then be continued by estimating a new value of .ρ based on


the residuals from the estimation of the quasi-difference model (Table 4.12).

Conclusion

This chapter has focused on error-related problems, namely, autocorrelation and


heteroskedasticity. The next chapter is still concerned with the violation of the
assumptions of the regression model, but now focuses on problems related to
the explanatory variables. It specifies the procedure to follow when the matrix of
explanatory variables can no longer be treated as non-random, when the explanatory variables are not
independent of each other (collinearity), and when there is some instability in the
estimated model.

The Gist of the Chapter

Multiple regression model: Y = Xβ + ε, with Y of dimension (T, 1), X of dimension (T, k + 1), β of dimension (k + 1, 1), and ε of dimension (T, 1)
Heteroskedasticity and/or autocorrelation: E(εε') = Ω_ε ≠ σ_ε^2 I
GLS estimators: β̃ = (X' Ω_ε^{-1} X)^{-1} X' Ω_ε^{-1} Y
Tests of homoskedasticity:
  Goldfeld and Quandt (1965)
  Glejser (1969)
  Breusch and Pagan (1979)
  White (1980)
  ARCH
Tests for absence of autocorrelation:
  Geary (1970)
  Durbin and Watson (1950, 1951): DW = \sum_{t=2}^{T}(e_t − e_{t−1})^2 / \sum_{t=1}^{T} e_t^2, e: residual; DW ≃ 2: no autocorrelation
  Durbin (1970)
  Breusch (1978) and Godfrey (1978)
  Box and Pierce (1970)
  Ljung and Box (1978)

Further Reading

This chapter includes a large number of references related to methods for detecting
heteroskedasticity and autocorrelation, as well as the solutions provided. In addition
to these references, most econometric textbooks contain developments on het-
eroskedasticity and autocorrelation problems. In particular, the books by Dhrymes
(1978), Judge et al. (1985, 1988), Davidson and MacKinnon (1993), Hendry (1995),
Wooldridge (2012), Gujarati et al. (2017), or Greene (2020) can be recommended.
5 Problems with Explanatory Variables: Random Variables, Collinearity, and Instability

As we saw in the third chapter, the multiple regression model is based on a number
of assumptions. Here, we focus more specifically on the first two assumptions,
which relate to explanatory variables:

– The matrix .X of explanatory variables is non-random. This hypothesis amounts


to assuming that the matrix .X is independent of the error term.
– The matrix .X is of full rank. In other words, the explanatory variables in the
matrix .X are linearly independent.

In this chapter, we look at what happens when these assumptions do not hold. If
the first assumption is violated, the implication is that the explanatory variables
are dependent on the error term. Under these conditions, the OLS estimators
are no longer consistent and it is necessary to use another estimator called the
instrumental variables estimator. This is the subject of the first section of the
chapter.
The consequence of violating the second assumption is that the explanatory
variables are not linearly independent. In other words, they are collinear. This issue
of multicollinearity is addressed in the second section of the chapter.
Finally, we turn our attention to the third problem related to the explanatory
variables, namely, the question of the stability of the estimated model.

5.1 Random Explanatory Variables and the Instrumental


Variables Method

The aim of this section is to find an estimator that remains valid in the presence
of correlation between the explanatory variables and the error term. We know that
when the independence assumption between the matrix of explanatory variables
and the error term is violated, the OLS estimator is no longer consistent: even if


we increase the sample size, the estimator does not tend towards its true value.
It is therefore necessary to find another estimator that does not suffer from this
consistency problem. This is precisely the purpose of the instrumental variables
method, which consists in finding a set of variables that are uncorrelated with the
error term but that are correlated with the explanatory variables, in order to represent
them correctly. Applying this method yields an estimator, called the instrumental
variables estimator, which remains valid in the presence of correlation between the
explanatory variables and the error term.

5.1.1 Instrumental Variables Estimator

If the explanatory variables are random and correlated with the error term, it can
be shown that the OLS estimator is no longer consistent (see in particular Greene
2020). In other words, even if the sample size grows indefinitely, the OLS estimators
.β̂ do not approach their true values .β:

P lim β̂ /= β
. (5.1)

where .P lim denotes the probability limit (or convergence in probability).


The problem is then to find a consistent estimator of .β. To this end, we use the
instrumental variables method.
Consider the following general linear model:

Y = Xβ + ε
. (5.2)

The purpose of the instrumental variables method is to find a set of k variables


Z1t , .Z2t , . . . , .Zkt that are uncorrelated with the error term. By noting .Z the matrix
.

composed of these k variables .(Z = (Z1 , Z2 , . . . , Zk )), we seek to obtain .Z such


that:

Cov(Z ' ε) = 0
. (5.3)

In other words, the aim is to find a matrix .Z of variables that are uncorrelated at
each period with the error term, i.e.:

E (Zit εt ) = 0
. (5.4)

for .i = 1, . . . , k and .t = 1, . . . , T .
Let us premultiply the model (5.2) by .Z ' :

Z ' Y = Z ' Xβ + Z ' ε


. (5.5)

Assuming, by analogy with the OLS method, that Z'ε = 0, we can write:

Z'Y = Z'Xβ    (5.6)

Under the assumption that the matrix (Z'X) is non-singular, we obtain the
instrumental variables estimator, denoted β̂_IV, defined by:

β̂_IV = (Z'X)^{-1} Z'Y    (5.7)

It can be shown (see in particular Johnston and Dinardo, 1996) that the estimator
of instrumental variables is a consistent estimator of .β, i.e.:

. P lim β̂ I V = β (5.8)

The variables that appear in the matrix .Z are called instrumental variables
or instruments. Some of these variables may be variables that are present in
the original explanatory variables matrix .X. The instrumental variables must be
correlated with the explanatory variables, that is:

Cov(Z ' X) /= 0
. (5.9)
 
Otherwise, the matrix . Z ' X would indeed be zero, and the procedure could not
be applied.
By positing X̂ = Z(Z'Z)^{-1}Z'X, we can also write the instrumental variables
estimator as follows:

β̂_IV = (X̂'X)^{-1} X̂'Y = (X̂'X̂)^{-1} X̂'Y    (5.10)

because X̂'X = X̂'X̂.
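As an illustration, both expressions of the estimator can be computed as follows with numpy; the function iv_estimator is ours, and X and Z are assumed to contain the constant term among their columns.

```python
import numpy as np

def iv_estimator(y, X, Z):
    """Instrumental variables estimator, Eqs. (5.7) and (5.10).

    X : (T, k+1) matrix of explanatory variables (including the constant)
    Z : (T, k+1) matrix of instruments (including the constant)
    """
    # direct form (5.7): (Z'X)^{-1} Z'Y
    beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
    # equivalent two-stage form (5.10): regress X on Z, then Y on the fitted X_hat
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta_iv_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
    return beta_iv, beta_iv_2sls
```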
Employing a technique similar to that used in Chap. 3 for the OLS estimator, it
can easily be shown that the variance-covariance matrix Ω_{β̂_IV} of the instrumental
variables estimator is given by:

Ω_{β̂_IV} = σ_ε^2 (X̂'X̂)^{-1}    (5.11)

It now remains for us to find a procedure to assess whether or not the explanatory
variables are correlated with the error term in order to determine which estimator to
choose between the OLS estimator and the instrumental variables estimator. To this
end, the Hausman (1978) specification test is used.

5.1.2 The Hausman (1978) Specification Test

When the explanatory variables are not correlated with the error term, it is preferable
to use the OLS estimator rather than the instrumental variables estimator, as the OLS
estimator is more accurate (for demonstrations, see in particular Greene, 2020). It
is therefore important to have a test that can be used to determine whether or not
there is a correlation between the explanatory variables and the error term. This is
the purpose of the Hausman test (Hausman, 1978).
This test consists of testing the null hypothesis that the explanatory variables and
the error term are uncorrelated, against the alternative hypothesis that the correlation
between the two types of variables is non-zero. Under the null hypothesis, the
OLS and instrumental variables estimators are consistent, but the OLS estimator
is more accurate. Under the alternative hypothesis, the OLS estimator is no longer
consistent, unlike the instrumental variables estimator.
The idea behind the Hausman test is to test the significance of the difference
between the two estimators. If the difference is not significant, the null hypothesis
is not rejected. On the other hand, if the difference is significant, the null hypothesis
is rejected and the instrumental variables estimator should be used. We calculate the
following statistic, known as the Wald statistic:

H = (β̂_IV − β̂)' \left[ σ̂_ε^2 \left( (X̂'X̂)^{-1} − (X'X)^{-1} \right) \right]^{-1} (β̂_IV − β̂)    (5.12)

where σ̂_ε^2 denotes the estimator of the variance of the error term σ_ε^2, i.e. (see
Chap. 3):

σ̂_ε^2 = \frac{e'e}{T − k − 1}    (5.13)

Under the null hypothesis, the statistic H follows a Chi-squared distribution


whose number of degrees of freedom depends on the context studied:

– If the matrices X and Z have no common variables, the number of degrees of
  freedom is (k + 1).
– If the matrices X and Z have k' common variables, the number of degrees of
  freedom is (k − k').
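A sketch of the computation of the statistic (5.12), under the same assumption that X and Z include the constant; the function name is ours, and a pseudo-inverse is used as a safeguard in case the difference of the two matrices is numerically singular.

```python
import numpy as np

def hausman_statistic(y, X, Z):
    """Hausman statistic (5.12) comparing the OLS and IV estimators."""
    T, K = X.shape                        # K = k + 1 (constant included)
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta_iv = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
    e = y - X @ beta_ols
    sigma2 = e @ e / (T - K)              # Eq. (5.13), since T - K = T - k - 1
    diff = beta_iv - beta_ols
    V = sigma2 * (np.linalg.inv(X_hat.T @ X_hat) - np.linalg.inv(X.T @ X))
    return diff @ np.linalg.pinv(V) @ diff
    # compare with the chi-squared critical value whose degrees of freedom are set as above
```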

5.1.3 Application Example: Measurement Error

The data used are, of course, assumed to be accurate measurements of their


theoretical equivalents. So far, we have supposed that the variables (dependent and
explanatory) are measured without error. In practice, however, this is not always
the case. For example, survey data (obtained by sampling), aggregate data (GDP,

household consumption, investment, etc.), and so on do not always represent exact


measures of the theoretical variables. In this case, we speak of measurement error
on the variables. This may arise from errors in data reporting, calculation errors, etc.
Consider the following model with centered variables:

yt = βxt∗ + εt
. (5.14)

and assume that the observations .xt available are not a perfect measure of .xt∗ . In
other words, the observed variable .xt is subject to measurement errors, i.e.:

xt = xt∗ + μt
. (5.15)

where .μt is an error term that follows a normal distribution of zero mean
and variance .σμ2 . It is further assumed that the two error terms .εt and .μt are
independent. Such a model can, for example, be representative of the link between
consumption and permanent income, where .yt denotes current consumption and .xt∗
permanent income. Permanent income is not observable, only current income .xt
being observable. .μt thus denotes the measurement error on permanent income .xt∗ .
We can rewrite the model as follows:

yt = βxt − βμt + εt
. (5.16)

To simplify the notations, let us posit:

ηt = −βμt + εt
. (5.17)

Then we have:

. yt = βxt + ηt (5.18)

Let us calculate the covariance between .xt and .ηt :


 
Cov (xt , ηt ) = Cov xt∗ + μt , −βμt + εt = −βσμ2 /= 0
. (5.19)

Since the covariance between .xt and .ηt is non-zero, it follows that the OLS
estimator is biased1 and is not consistent. Thus, when there is a measurement error
on the explanatory variable, the OLS estimator is no longer consistent and the
instrumental variables estimator should be used.

1 In the case where it is the explained variable that is observed with error, the OLS estimator
remains consistent and unbiased.

5.2 Multicollinearity and Variable Selection

As we previously recalled, one of the basic assumptions of the multiple regression


model is that the rank of the matrix .X is equal to .k + 1, i.e., to the number of
explanatory variables plus the constant. This assumption means that the explanatory
variables are linearly independent, or orthogonal. In other words, there is no
multicollinearity between the explanatory variables.
In this section, we study what happens when such an assumption is violated. In
practice, it is quite common for the explanatory variables to be more or less related
to each other.

5.2.1 Presentation of the Problem

We speak of perfect (or exact) collinearity between two explanatory variables if


they are perfectly dependent on each other. Thus, two variables .X1t and .X2t are
perfectly collinear if:

X2t = λX1t
. (5.20)

where .λ is a non-zero constant.


We speak of perfect (or exact) multicollinearity when an explanatory variable
is the result of a linear combination of several other explanatory variables. In this
case, the coefficient of determination is equal to one.
In a model comprising k explanatory variables, we speak of perfect multi-
collinearity if there is a linear combination:

λ1 X1t + λ2 X2t + . . . + λk Xkt = 0


. (5.21)

where .λ1 , . . . , λk are constants that are not all zero simultaneously.
In these cases of perfect collinearity or multicollinearity, the rank of the matrix .X
is less than .k + 1, which means that the assumption of linear independence
  between
the columns of .X no longer holds. It follows that the rank of . X' X is alsoless
than .k + 1. It is therefore theoretically impossible to invert the matrix . X' X , as
the latter is singular (its determinant is zero). The regression coefficients are then
indeterminate.
Cases of perfect collinearity and multicollinearity are rare. In practice, explana-
tory variables frequently exhibit strong, but not perfect, multicollinearity. We
then speak of quasi-multicollinearity or, more simply, multicollinearity. There is
multicollinearity if, in a model with k explanatory variables, we have the following
relationship:

λ1 X1t + λ2 X2t + . . . + λk Xkt + ut = 0


. (5.22)

where .ut is an error term.



As we will see later, when there is multicollinearity, the regression coefficients


can be estimated—they are determined—but their standard deviations are very high,
making the estimation very imprecise.

5.2.2 The Effects of Multicollinearity

Multicollinearity has several effects. Firstly, the variances and covariances of the
estimators tend to increase. Let us explain this point.
We demonstrated in Chap. 3 that the variance-covariance matrix Ω_β̂ of the OLS
coefficients is given by:

Ω_β̂ = σ_ε^2 (X'X)^{-1}    (5.23)

We have also shown that the variance of the OLS coefficient β̂_i associated with
the ith explanatory variable X_it is written as:

V(β̂_i) = σ_ε^2 a_{i+1,i+1}    (5.24)

where a_{i+1,i+1} denotes the (i + 1)th element of the diagonal of (X'X)^{-1}. It is
possible to show that:

a_{i+1,i+1} = \frac{1}{1 − R_i^2} = VIF_i    (5.25)

where VIF_i is the variance inflation factor and R_i^2 is the coefficient of deter-
mination associated with the regression of the variable X_it on the (k − 1) other
explanatory variables.
The statistic .V I Fi indicates how the variance of an estimator increases when
there is multicollinearity. In this
  case, .Ri2 tends to 1 and .ai+1,i+1 tends to infinity.
It follows that the variance .V β̂i also tends to infinity. Multicollinearity therefore
increases the variance of the estimators.
The second effect of multicollinearity is that the OLS estimators are highly
sensitive to small changes in the data. A small change in one observation or in the
number of observations can result in a large change in the estimated values of the
coefficients.
Let us take an example.2 From the data in Table 5.1, we estimate the following
models by OLS:

Yt = α + β1 X1t + β2 X2t + εt
. (5.26)

2 Of course, this example is purely illustrative in the sense that only six observations are considered.

Table 5.1 Example of multicollinearity
t    Y     X1    X2    X3
1    3     4     8     8
2    5     11    22    22
3    12    6     12    12
4    8     9     19    19
5    9     7     14    14
6    4     3     6     7

Table 5.2 Example of multicollinearity. Estimation results
Model (5.26)           α        β1       β2
Coefficient            5.09     −1.20    0.72
Standard deviation     4.67     10.41    5.07
t-statistic            1.09     −0.11    0.14
Model (5.27)           α'       β1'      β2'
Coefficient            5.50     2.25     −1.00
Standard deviation     4.95     7.37     3.73
t-statistic            −1.11    0.30     0.27

and

Yt = α ' + β1' X1t + β2' X3t + εt'


. (5.27)

The variables .X2t and .X3t differ only in the final observation (Table 5.1). Look-
ing at the results in Table 5.2, we see that this small change in the data significantly
alters the estimates. Although not significant, the values of the coefficients of the
explanatory variables differ markedly between the two regressions; the same is true
for their standard deviations.
This example also highlights the first mentioned effect of multicollinearity,
namely, the high value of the standard deviations of the estimated coefficients.
There are also other effects of multicollinearity. These include the following
consequences:

– Because of the high value of the variances of the estimators, the t-statistics
associated with certain coefficients can be very low, even though the values taken
by the coefficients are high.
– Despite the non-significance of one or more explanatory variables, the coefficient
of determination of the regression can be very high. This is frequently considered
to be one of the most visible symptoms of multicollinearity. Thus, if the
coefficient of determination is very high, the Fisher test tends to reject the null
hypothesis of non-significance of the regression as a whole, even though the t-
statistics of several coefficients indicate that the latter are not significant.
– Some variables are sensitive to the exclusion or inclusion of other explanatory
variables.

– It is difficult, if not impossible, to distinguish between the effects of different


explanatory variables on the variable being explained.

5.2.3 Detecting Multicollinearity

Strictly speaking, there is no test of multicollinearity as such. However, several


techniques can be used to detect it and assess how important it is.

Correlation Between Explanatory Variables


A simple method is to calculate the linear correlation coefficients between the
explanatory variables. If the latter are strongly correlated, this is an indication
in favor of multicollinearity. The presence of strong correlations is not, however,
necessary to observe multicollinearity. Multicollinearity can occur even when the
correlation coefficients are relatively low (e.g., below 0.5). Let us take an example
to illustrate this.

Example 5.1 Consider the following model:

Yt = α + β1 X1t + β2 X2t + β3 X3t + εt


. (5.28)

and suppose that .X3t is a linear combination of the other two explanatory variables:

X3t = λ1 X1t + λ2 X2t


. (5.29)

where .λ1 and .λ2 are simultaneously non-zero constants. Because of the existence of
this linear combination, the coefficient of determination .R 2 from the regression of
.X3t on .X1t and .X2t is equal to 1. By virtue of the relationship (3.106) from Chap. 3,

we can write:

rX2 3 X1 + rX2 3 X2 − 2rX3 X1 rX3 X2 rX1 X2


.R2 = =1 (5.30)
1 − rX2 1 X2

The previous relationship is satisfied for .rX3 X1 = rX3 X2 = 0.6 and .rX1 X2 =
−0.28. It is worth mentioning that these values are not very high even though there
is multicollinearity.

Consequently, in a model with more than two explanatory variables, care must
be taken when interpreting the values of the correlation coefficients.

The Klein Test (1962)


This is not strictly speaking a statistical test. The method proposed by Klein
(1962) consists in calculating the linear correlation coefficients between the different
explanatory variables: .rXi Xj for .i /= j and comparing these values with the

coefficient of determination .R 2 associated with the regression of .Yt on all the k


explanatory variables. If .rX2 i Xj > R 2 , there is a presumption of multicollinearity.

The Farrar and Glauber Test (1967)


Farrar and Glauber (1967) proposed a technique for detecting multicollinearity
based on the matrix of correlation coefficients between the explanatory variables:
\begin{pmatrix}
1 & r_{X_1X_2} & r_{X_1X_3} & \cdots & r_{X_1X_k} \\
r_{X_2X_1} & 1 & r_{X_2X_3} & \cdots & r_{X_2X_k} \\
\vdots & \cdots & \cdots & \cdots & \vdots \\
r_{X_kX_1} & r_{X_kX_2} & r_{X_kX_3} & \cdots & 1
\end{pmatrix}    (5.31)

The underlying idea is that, if the variables are perfectly correlated, the determi-
nant of this matrix is zero. Let us take an example to visualize this property.

Example 5.2 Consider a model with two explanatory variables X1t and X2t . The
determinant D of the correlation coefficient matrix is given by:
 
D = \begin{vmatrix} 1 & r_{X_1X_2} \\ r_{X_2X_1} & 1 \end{vmatrix}    (5.32)

If the variables are perfectly correlated, r_{X_1X_2} = 1. Therefore:

D = \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} = 0    (5.33)

The determinant of the correlation coefficient matrix is zero when the variables
are perfectly correlated.
Conversely, when the explanatory variables are orthogonal, rX1 X2 = 0 and the
determinant of the correlation coefficient matrix is 1.

The method proposed by Farrar and Glauber (1967) consists of investigating


whether or not the determinant of the correlation coefficient matrix between the
explanatory variables is close to 0. If this is the case, multicollinearity is presumed.
The authors suggested a Chi-squared test to test the null hypothesis that the
determinant of the correlation coefficient matrix is 1, meaning that the variables
are orthogonal, against the alternative hypothesis that the determinant is less than 1,
indicating that the variables are dependent. The test statistic is given by:

FG = -\left[ T - 1 - \frac{1}{6}\,(2(k+1)+5) \right] \log D    (5.34)

where D denotes the determinant of the correlation coefficient matrix of the k
explanatory variables. Under the null hypothesis:

FG \sim \chi^2_{\frac{1}{2}k(k+1)}    (5.35)

The decision rule is written:

– If FG < \chi^2_{\frac{1}{2}k(k+1)}, the orthogonality hypothesis is not rejected.
– If FG > \chi^2_{\frac{1}{2}k(k+1)}, the orthogonality hypothesis is rejected.
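A sketch of the computation of the FG statistic (5.34) and its degrees of freedom from the matrix of explanatory variables; the function name is ours and the code uses numpy only.

```python
import numpy as np

def farrar_glauber(X):
    """Farrar-Glauber statistic computed from the correlation matrix of the
    k explanatory variables in X (T rows, k columns, no constant)."""
    T, k = X.shape
    R = np.corrcoef(X, rowvar=False)             # correlation matrix (5.31)
    D = np.linalg.det(R)
    fg = -(T - 1 - (2 * (k + 1) + 5) / 6) * np.log(D)
    df = k * (k + 1) / 2                         # chi-squared degrees of freedom
    return fg, df
```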

The Eigenvalue Method  


This technique is based on the calculation of the eigenvalues of the matrix . X' X
or, similarly, of the matrix of correlation coefficients
 of the explanatory variables.
Knowing that the determinant of the matrix . X' X is equal to the product of
the eigenvalues, a low value of the determinant means that one or more of these
eigenvalues are low. There is then a presumption of multicollinearity.
Belsley et al. (1980) suggest calculating the following statistic, called the condition
indicator:

𝜘 = \sqrt{\frac{\lambda_{max}}{\lambda_{min}}}    (5.36)

where λ_max (respectively λ_min) denotes the largest (respectively smallest) eigenvalue
of the matrix (X'X). If the matrix X has been normalized, so that the length of each
of its columns is 1, then the .𝜘 statistic is equal to 1 when the columns are orthogonal
and greater than 1 when the columns exhibit multicollinearity. This technique is not
a statistical test as such, but it is frequently considered that values of .𝜘 between 10
and 30 correspond to a situation of moderate multicollinearity, and that values above
30 are an indication in favor of strong multicollinearity.
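A sketch of the computation of the condition indicator; the function name is ours, and the columns of X are normalized to unit length as indicated above.

```python
import numpy as np

def condition_indicator(X):
    """Condition indicator (5.36): sqrt(lambda_max / lambda_min) of X'X,
    with each column of X normalized to unit length."""
    Xn = X / np.linalg.norm(X, axis=0)           # columns of length 1
    eigvals = np.linalg.eigvalsh(Xn.T @ Xn)      # eigenvalues of the symmetric matrix X'X
    return np.sqrt(eigvals.max() / eigvals.min())
```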

Variance Inflation Factors


Variance inflation factors (V I F ) can be used as indicators of multicollinearity.
Again, it is worth mentioning that this is not a statistical test per se.
We have previously seen (see Eq. (5.25)) that the variance inflation factor
associated with the .ith explanatory variable is written as:

1
V I Fi =
. (5.37)
1 − Ri2

where .Ri2 is the coefficient of determination relating to the regression of the variable
.Xit on the .(k − 1) other explanatory variables. Obviously, the value of .V I Fi is

higher the closer .Ri2 is to 1. Consequently, the higher .V I Fi is, the more collinear
the variable .Xit is.

The .V I Fi statistics can be calculated for the various explanatory variables


(i = 1, . . . , k). When the differences between the .V I Fi statistics are large, it is
.

possible to identify the highest values and, thus, identify collinear variables. But, if
the differences between the .V I Fi statistics for the different explanatory variables
are small, it is impossible to detect the variables responsible for multicollinearity.
In practice, if the value of the .V I Fi statistic is greater than 10, which corresponds
to the case where .Ri2 > 0.9, the variable .Xit is considered to be strongly collinear.
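A sketch of the computation of the VIF_i statistics by running the k auxiliary regressions; the function name is ours and only numpy is used.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing X_i on the
    other explanatory variables (a constant is included in each auxiliary regression)."""
    T, k = X.shape
    vifs = []
    for i in range(k):
        yi = X[:, i]
        Zi = np.column_stack([np.ones(T), np.delete(X, i, axis=1)])
        resid = yi - Zi @ np.linalg.lstsq(Zi, yi, rcond=None)[0]
        r2 = 1 - np.sum(resid ** 2) / np.sum((yi - yi.mean()) ** 2)
        vifs.append(1 / (1 - r2))
    return np.array(vifs)
```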

Empirical Application
Consider the following model:

REURO_t = α + β_1 RDJIND_t + β_2 RFTSE_t + β_3 RNIKKEI_t + ε_t    (5.38)

where:

– REU RO denotes the series of returns of the European stock market index, Euro
Stoxx 50.
– RDJ I N D is the series of returns of the Dow Jones Industrial Average index.
– RF T SE is the series of returns of the UK stock market index, F T SE 100.
– RNI KKEI is the series of returns of the NI KKEI index of the Tokyo Stock
Exchange.

The data, taken from the Macrobond database, are quarterly and cover the period
from the second quarter of 1987 to the second quarter of 2021 (.T = 137).
We are interested in the possible multicollinearity between the three explanatory
variables under consideration. Let us start by calculating the matrix of correlation
coefficients among the explanatory variables:
\begin{pmatrix}
1 & r_{RDJIND,RFTSE} & r_{RDJIND,RNIKKEI} \\
r_{RFTSE,RDJIND} & 1 & r_{RFTSE,RNIKKEI} \\
r_{RNIKKEI,RDJIND} & r_{RNIKKEI,RFTSE} & 1
\end{pmatrix}
=
\begin{pmatrix}
1 & 0.8562 & 0.6059 \\
0.8562 & 1 & 0.5675 \\
0.6059 & 0.5675 & 1
\end{pmatrix}    (5.39)

It appears that the most strongly correlated explanatory variables are RF T SE


and RDJ I ND.
The estimation of the model (5.38) leads to the results in Table 5.3.
If we refer to the method proposed by Klein, we find that the coefficient of deter-
mination of the model (5.38), equal to 0.7959, is higher than the linear correlation
coefficients between RDJ I ND and RNI KKEI (0.6059) and between RF T SE
and RNI KKEI (0.5675), but is lower than the linear correlation coefficient
between RDJ I ND and RF T SE (0.8562), suggesting there is collinearity.

Table 5.3 Estimation of the relationship between the series of stock market returns
Variable Coefficient Std. Error t-Statistic Prob.
C −0.005134 0.004447 −1.154642 0.2503
RDJIND 0.525822 0.107257 4.902431 0.0000
RFTSE 0.633759 0.101216 6.261419 0.0000
RNIKKEI 0.084954 0.046478 1.827827 0.0698
R-squared 0.795886 Mean dependent var 0.011257
Adjusted R-squared 0.791282 S.D. dependent var 0.107580
S.E. of regression 0.049149 Akaike info criterion .−3.159171

Sum squared resid 0.321274 Schwarz criterion .−3.073916


Log likelihood 220.4032 Hannan-Quinn criterion .−3.124526
F-statistic 172.8658 Durbin-Watson stat 1.945250
Prob(F-statistic) 0.0000

To investigate whether the Farrar and Glauber test leads to the same conclusion,
let us calculate the determinant D of the matrix of correlations between the
explanatory variables. We obtain:

D = 0.1666
. (5.40)

This determinant being closer to 0 than to 1, the presumption of multicollinearity


remains valid. Let us calculate the test statistic:

FG = -\left[ 137 - 1 - \frac{1}{6}\,(2(3+1)+5) \right] \log(0.1666) = 239.8618    (5.41)

The value read from the table of the Chi-squared distribution is equal to .χ62 =
12.592 at the 5% significance level. As the calculated value is higher than the critical
value, the null hypothesis of orthogonality between the explanatory variables is
rejected and the presumption of multicollinearity is confirmed.
Let us now apply the technique based on the calculation of the variance inflation
factors (V I F ). To do this, we regress each of the explanatory variables on the
other two and calculate the coefficient of determination associated with each
regression. The results are reported in Table 5.4. The values of the V I F statistics
are relatively low (less than 10), suggesting that multicollinearity, if present, is not
very strong. This is consistent with the fact that the coefficients of determination .Ri2

Table 5.4 Calculation of VIF
i            R_i^2     VIF_i
RDJIND       0.7543    4.0698
RFTSE        0.7368    3.7994
RNIKKEI      0.3760    1.6025

associated with each of the three regressions are lower than the overall coefficient
of determination ascertained by estimating the model (5.38).

5.2.4 Solutions to Multicollinearity

A frequently used technique is to increase the number of observations in order to


increase the sample size. Such a procedure is useless if the data added are the same
as those already in the sample. In such a case, multicollinearity will effectively be
repeated. Other techniques have therefore been proposed.

Use of Preliminary Estimates


This first, sequential technique consists in using estimation results from a previous
study. To do this, we decompose the matrix .X of explanatory variables and the
vectors of coefficients as follows:

X = (X_r  X_s),  β = \begin{pmatrix} β_r \\ β_s \end{pmatrix}  and  β̂ = \begin{pmatrix} β̂_r \\ β̂_s \end{pmatrix}    (5.42)

where .Xr is the submatrix of size .(T , r) formed by the first r columns of .X and .X s
is the submatrix composed of the .s = k + 1 − r remaining columns.
Suppose that, in a previous study, the coefficient .β̂ s was obtained and that it is
an unbiased estimator of .β s . It then remains for us to estimate .β r . To do this, we
start by calculating a new dependent variable Ỹ, which corrects the dependent
variable for the contribution of the variables already used, X_s:

Ỹ = Y − Xs β̂ s
. (5.43)

We then regress Ỹ on the explanatory variables appearing in X_r and obtain the
following OLS estimator β̂_r:

β̂_r = (X_r' X_r)^{-1} X_r' Ỹ    (5.44)

Given that:

Y = Xβ + ε = X_r β_r + X_s β_s + ε    (5.45)

we can write:

β̂_r = (X_r' X_r)^{-1} X_r' (X_r β_r + X_s β_s + ε − X_s β̂_s)    (5.46)

Hence:

β̂_r = β_r + (X_r' X_r)^{-1} X_r' X_s (β_s − β̂_s) + (X_r' X_r)^{-1} X_r' ε    (5.47)
 
Knowing that E(ε) = 0 and E(β̂_s) = β_s, we deduce:

E(β̂_r) = β_r    (5.48)

meaning that β̂_r is an unbiased estimator of β_r.

Remark 5.1 A technique similar to this is to combine time series and cross-
sectional data (see in particular Tobin, 1950).

The Ridge Regression


This is a technique proposed by Hoerl and Kennard (1970a,b) involving a
mechanical numerical treatment of multicollinearity. The underlying idea is simple.
The problem arises because multicollinearity means there is a column of the matrix
(X'X) that is, up to an error term, a linear combination of the other columns.
Hoerl and Kennard (1970a,b) suggest destroying this linear combination by adding
a constant to the diagonal elements of the matrix (X'X). The principle of the ridge
regression is to define the ridge estimator:

β̂_R = (X'X + cI)^{-1} X'Y    (5.49)

where c > 0 is an arbitrary constant.


The ridge estimator can be expressed in terms of the usual OLS estimator β̂:

β̂_R = \left( I + c\,(X'X)^{-1} \right)^{-1} β̂    (5.50)

Furthermore, replacing Y by Xβ + ε in (5.49) and taking the expectation, we
have:

E(β̂_R) = (X'X + cI)^{-1} X'X β    (5.51)

The ridge estimator .β̂ R is therefore a biased estimator of .β. However, Schmidt
(1976) showed that the variances of the elements of .β̂ R are lower than those
associated with the elements of the vector of OLS estimators.
The difficulty inherent in the ridge regression lies in the choice of the value of
c. Hoerl and Kennard (1970a,b) suggest estimating using several values for c in
order to study the stability of .β̂ R . The technique, known as ridge trace, consists in

plotting the different values of .β̂ R on the y-axis for various values of c on the x-axis.
The value of c is then selected as the one for which the estimators .β̂ R are stable.
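A sketch of the ridge estimator (5.49) and of the ridge trace idea; the function name and the grid of values for c are ours and purely illustrative.

```python
import numpy as np

def ridge_estimator(y, X, c):
    """Ridge estimator (5.49): (X'X + cI)^{-1} X'Y."""
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + c * np.eye(K), X.T @ y)

# Ridge trace: compute beta_R over a grid of c values and retain a value of c
# beyond which the estimated coefficients barely change, e.g.:
# cs = np.linspace(0.0, 1.0, 21)
# trace = np.array([ridge_estimator(y, X, c) for c in cs])
```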

Remark 5.2 The ridge regression method can be generalized to the case where a
value
 ' different from c is added to each of the elements of the diagonal of the matrix
. X X . This technique is called generalized ridge regression.

Other Techniques
There are other procedures for dealing with the multicollinearity problem, which we
briefly mention below:

– The method of Marquardt generalized  inverses.


 The underlying idea is to
calculate the inverse of the matrix . X' X without needing to compute the
determinant of the same matrix. Since the determinant is equal to the product of
the eigenvalues, the technique consists in calculating the inverse of the eigenvalue
matrix.
– Principal component analysis. We place ourselves in a space where the axes
represent the k variables and the points stand for time (or individuals). The
technique consists in determining in this space k new axes possessing the
property of orthogonality. Factor analysis can also be used.
– The transformation of variables. Rather than carrying out a regression on the
raw series, i.e., on the series in levels, the regression is estimated on the
series in first differences. Such a technique frequently reduces the extent of
multicollinearity because, although series in levels may be highly correlated,
there is a priori no reason for first-differenced series to be so too. However, this
procedure is not without its critics, especially because of the possible occurrence
of autocorrelation in the error term of the first-difference regression.
– Elimination of explanatory variables. The underlying idea is simple: remove the
variable(s) that cause multicollinearity. However, such a procedure can lead to
model misspecification. For example, if a significant variable is removed, the
OLS estimators become biased and the variance of the error term can no longer be
correctly estimated. If a non-significant variable is selected, the OLS estimators
remain unbiased and the variance of the error term can be correctly estimated.
However, selecting or including insignificant variables in a regression reduces the
precision of the coefficient estimates of the significant variables. In the following
section, we present various methods for selecting explanatory variables.

5.2.5 Variable Selection Methods

In addition to the model comparison criteria presented in Chap. 3, which may also
be useful here, there are various methods for selecting explanatory variables. These
techniques can guide us in choosing which variables to remove or add to a model.

The Method of All Possible Regressions


This technique consists in estimating all possible regressions. Thus, from k explana-
tory variables, 2^k − 1 regressions are to be estimated. We then select the model
that maximizes the adjusted coefficient of determination (or that minimizes the
information criteria if we wish to use criteria other than the adjusted coefficient of
determination). Obviously, this method is easily applicable for small values of k. For
a high number of explanatory variables, it becomes difficult to use. For example, if
we have 10 explanatory variables, the number of regressions to be estimated is equal
to 1023.

Backward Elimination of Explanatory Variables


This technique involves first estimating the complete model, i.e., the one including
the k explanatory variables. We then eliminate the explanatory variable whose
estimated coefficient has the lowest t-statistic. We then re-estimate the model on
the remaining .k − 1 explanatory variables and again eliminate the variable whose
coefficient has the lowest t-statistic. We reiterate the procedure.
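A sketch of this backward procedure; the function name, the array-based interface, and the 1.96 critical value (a 5% level with a normal approximation) are ours.

```python
import numpy as np

def backward_elimination(y, X, names, t_crit=1.96):
    """Drop, one at a time, the regressor whose coefficient has the smallest |t|,
    until all remaining |t|-statistics exceed t_crit. The constant is always kept."""
    names = list(names)
    while X.shape[1] > 0:
        T = len(y)
        Z = np.column_stack([np.ones(T), X])
        coef = np.linalg.lstsq(Z, y, rcond=None)[0]
        e = y - Z @ coef
        sigma2 = e @ e / (T - Z.shape[1])
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
        tstats = np.abs(coef[1:] / se[1:])          # ignore the constant
        if tstats.min() >= t_crit:
            break
        worst = int(np.argmin(tstats))
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names
```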

Forward Selection of Explanatory Variables


This is the symmetrical technique to the previous one. We begin by calculating the
correlation coefficients between the dependent variable and each of the explanatory
variables. We select the explanatory variable most strongly correlated with the
dependent variable. Let us note this variable .Xi . We then calculate the partial
correlation coefficients .rY Xj ,Xi , for .i /= j , i.e., the correlation coefficients between
the dependent variable and each of the .k − 1 other explanatory variables .Xj ,
the influence of the variable .Xi having been removed. We select the explanatory
variable for which the partial correlation coefficient is highest. We continue in this
way. We stop when the t-statistics of the coefficients of the explanatory variables
are below the selected critical value, or when the gain measured by the adjusted
coefficient of determination is below a certain threshold that we set.

Remark 5.3 A variant of this technique is the stagewise procedure. As in the


forward selection method, we begin by selecting the explanatory variable .Xi that
is most highly correlated with the dependent variable. We then determine the
residual series resulting from the regression of Y on .Xi . We calculate the correlation
coefficient between this residual series and each of the .k − 1 other explanatory
variables. We then select the explanatory variable .Xj for which the coefficient of
correlation is the highest. The next step is to determine the residual series from
the regression of Y on .Xi and .Xj . We again calculate the correlation coefficients
between this residual series and each of the .k − 2 other remaining explanatory
variables. The one with the highest correlation coefficient is selected, and so on.
The procedure stops when the correlation coefficients are no longer significantly
different from zero.

The Stepwise Method


This is an extension of the previous method and is a technique of progressive
selection of explanatory variables with the possibility of elimination. Thus, the
stepwise method is based on the same principle as the forward selection method,
except that each time an explanatory variable is introduced, we examine the
coefficients’ t-statistics of each of the previously selected variables and eliminate
the one(s) whose associated coefficients are not significantly different from zero.
These various methods are not based on any economic considerations and should
therefore be used with caution. The most frequently used technique is the stepwise
method.

Empirical Application
Consider the previous empirical application aimed at explaining the returns
REU RO of the European stock index (Euro Stoxx 50) by three explanatory
variables:

– RDJ I N D: series of returns of the Dow Jones Industrial Average index


– RF T SE: series of returns of the UK stock market index F T SE 100
– RNI KKEI : series of returns of the NI KKEI index of the Tokyo Stock
Exchange

Let us apply each of the methods presented above.


The technique of all possible regressions involves estimating the following $2^3 - 1 = 7$ models:

– Model (1): Regression of REU RO on RDJ I ND


– Model (2): Regression of REU RO on RF T SE
– Model (3): Regression of REU RO on RNI KKEI
– Model (4): Regression of REU RO on RDJ I ND and RF T SE
– Model (5): Regression of REU RO on RDJ I ND and RN I KKEI
– Model (6): Regression of REU RO on RF T SE and RNI KKEI
– Model (7): Regression of REU RO on RDJ I ND, RF T SE, and RNI KKEI

The results are summarized in Table 5.5.


As shown, except for the constant in some cases, all the variables have significant
coefficients in each of the 7 regressions estimated at the 5% significance level—at
the 10% level in the case of the coefficient associated with the variable RNI KKEI
in model (7). If we choose the model that maximizes the adjusted coefficient of
determination, model (7) must be selected. No explanatory variable is therefore
eliminated.
Applying the backward method consists in (i) starting from model (7) and
(ii) eliminating the variable whose coefficient has the lowest t-statistic. Using a
5% significance level, this technique leads to the elimination of the RNI KKEI
variable.

Table 5.5 Estimation of all possible regressions

Model   Constant             RDJIND              RFTSE               RNIKKEI            Adjusted R²
(1)     −0.0116 (−2.3232)    1.1559 (18.8861)                                           0.7234
(2)      0.0008 (0.1617)                         1.1416 (19.6947)                       0.7399
(3)      0.0101 (1.3610)                                             0.5592 (8.6398)    0.3513
(4)     −0.0062 (−1.4065)    0.5811 (5.5986)     0.6557 (6.4680)                        0.7876
(5)     −0.0098 (−1.9629)    1.0511 (13.8745)                        0.1195 (2.2836)    0.7318
(6)      0.0015 (0.3380)                         1.0219 (14.9681)    0.1492 (3.0913)    0.7554
(7)     −0.0051 (−1.1546)    0.5258 (4.9024)     0.6338 (6.2614)     0.0850 (1.8278)    0.7913

t-statistics of the estimated coefficients are in parentheses

The implementation of the forward selection method involves calculating the


correlation coefficients between the dependent variable and each of the three explanatory variables: $r_{REURO,RDJIND} = 0.8517$, $r_{REURO,RFTSE} = 0.8613$, and $r_{REURO,RNIKKEI} = 0.5967$. The first variable selected is therefore RFTSE. We

then estimate the models with two explanatory variables: (RF T SE and RDJ I ND)
and (RF T SE and RNI KKEI ). These are models (4) and (6), respectively. In each
of these models, the new variable has a coefficient significantly different from zero.
Since the coefficient associated with RDJ I ND has a higher t-statistic than that for
RNI KKEI , the second explanatory variable is RDJ I ND. Finally, we estimate
the model with three explanatory variables, model (7), which is the model that we
select, if we consider a 10% significance level, since the coefficients of the three
variables are significant. If the usual 5% significance level is used, model (4) should
be selected.
The application of the stepwise method is identical to the previous case, and
the same model is selected, with the three explanatory variables having significant
coefficients at the 10% significance level—model (4) being chosen if a 5%
significance level is considered.

5.3 Structural Changes and Indicator Variables

The focus here is on studying the stability of the estimated model. When estimating
a model over a certain period of time, it is possible that a structural change
may appear in the relationship between the dependent variable and the explanatory
variables. It is thus possible that the values of the estimated parameters do not
remain identical over the entire period studied. In some cases, the introduction of
indicator variables allows us to take account of these possible structural changes.
We also present various stability tests of the estimated coefficients. Beforehand, we
outline the constrained least squares method consisting in estimating a model
under constraints.

5.3.1 The Constrained Least Squares Method

In Chap. 3, we presented various tests of the hypothesis that the parameter vector $\beta$ is subject to the existence of q constraints:

$$H_0: R\beta = r \quad (5.52)$$

where .R is a given matrix of size .(q, k + 1) and .r is the vector of constraints of


dimension .(q, 1). It is further assumed that .q ≤ k + 1 and the matrix .R is of full
rank, meaning that the q constraints are linearly independent.
If the null hypothesis is not rejected and we wish to re-estimate the model taking
the constraints into account, we should apply the constrained least squares method
(CLS). We then obtain an estimator .β̂ 0 verifying the relationship:

$$R\hat{\beta}_0 = r \quad (5.53)$$

This estimator, called the constrained least squares estimator, is given by:3

$$\hat{\beta}_0 = \hat{\beta} + (X'X)^{-1} R' \left[ R(X'X)^{-1}R' \right]^{-1} \left( r - R\hat{\beta} \right) \quad (5.54)$$

The null hypothesis $H_0: R\beta = r$ can be tested using a Fisher test (see Chap. 3):

$$F = \frac{(RSS_c - RSS_{nc})/q}{RSS_{nc}/(T - k - 1)} \sim F(q, T - k - 1) \quad (5.55)$$

where .RSSnc is the sum of the squared residuals of the unconstrained model (i.e.,
that associated with the vector .β̂) and .RSSc denotes the sum of the squares of the
residuals of the constrained model (i.e., that associated with the vector .β̂ 0 ), q being
the number of constraints and k the number of explanatory variables included in the
model. As we will see later in this chapter, such a test can also be used to assess the
possibility of structural changes.

Example 5.3 In simple cases, CLS reduces to OLS on a previously transformed model. Consider the following model:

$$Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \varepsilon_t \quad (5.56)$$

3 The demonstration is given in the appendix to this chapter.



with $\beta_1 + \beta_2 = 1$. This is a model with two explanatory variables ($k = 2$) and one constraint ($q = 1$), so we have $q < k$. Noting that $\beta_2 = 1 - \beta_1$, we can write the model as follows:

$$Y_t = \alpha + \beta_1 X_{1t} + X_{2t} - \beta_1 X_{2t} + \varepsilon_t \quad (5.57)$$

that is:

$$Z_t = \alpha + \beta_1 W_t + \varepsilon_t \quad (5.58)$$

with $Z_t = Y_t - X_{2t}$ and $W_t = X_{1t} - X_{2t}$. It is then possible to apply the OLS method to Eq. (5.58) to obtain $\hat{\alpha}$ and $\hat{\beta}_1$. We then deduce $\hat{\beta}_2 = 1 - \hat{\beta}_1$.
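As an illustration of formula (5.54), a minimal numpy sketch of the constrained least squares estimator might look as follows; the function name and the example constraint are illustrative only.

import numpy as np

def constrained_ls(X, y, R, r):
    """Constrained least squares estimator of Eq. (5.54):
    beta0 = beta_ols + (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (r - R beta_ols)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ y
    middle = np.linalg.inv(R @ XtX_inv @ R.T)
    return beta_ols + XtX_inv @ R.T @ middle @ (r - R @ beta_ols)

# Example 5.3: Y = alpha + b1*X1 + b2*X2 + e under the constraint b1 + b2 = 1.
# With design columns [constant, X1, X2], take R = [[0, 1, 1]] and r = [1].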

5.3.2 The Introduction of Indicator Variables


Definition
Indicator variables, also known as dummy variables, are binary variables
composed of 0 and 1. They are used to reflect the presence or absence of a
phenomenon or characteristic. Introducing dummy variables in regressions makes it
possible to answer various questions, such as:

– Is the crime rate higher in urban or rural areas?


– Does women’s labor supply depend on there being children in the household?
– Is there gender discrimination in hiring?
– For the same level of education, do men’s and women’s hiring salaries differ? If
so, by how much?
– Does location (urban/rural) have an impact on educational attainment?
– Do terrorist attacks change tourist behavior?
– Etc.

Thus, dummy variables are introduced into a regression model when we wish to
take a binary explanatory factor into account among the explanatory variables. As
an example, such a factor could be:

– The phenomenon either takes place or does not; the dummy variable is then 1 if
the phenomenon takes place, 0 otherwise.
– The male or female factor; the dummy variable is equal to 1 if the person is a
man, 0 if it is a woman (or vice versa).
– The place of residence: urban or rural; the dummy variable is equal to 1 if the
person lives in an urban zone, 0 if in a rural area (or vice versa).
– etc.

The dummy variables enable data to be classified into subgroups based on various
characteristics or attributes. Such variables can be introduced into a regression

model in the same way as “traditional” explanatory variables. A regression model


can simultaneously contain “traditional” explanatory variables and dummy vari-
ables, but it can also contain dummy variables only. From a theoretical viewpoint,
the introduction of dummy variables into a regression model does not change the
estimation method, nor the tests to be implemented.

Introductory Examples
One frequent use of dummy variables is to take account of an exceptional or even aberrant phenomenon. Examples include the following: German reunification in
1991, the launch of the euro in 1999, the September 11, 2001 attacks in the United
States, the winter 1995 strikes in France, the December 1999 storm in France, the
October 1987 stock market crash, the Covid-19 pandemic that broke out at the end
of 2019, etc.
Consider, for example, the following regression model:

$$Y_t = \alpha + \beta_1 X_t + \varepsilon_t \quad (5.59)$$

for $t = 1, \ldots, T$. Suppose that at a date $t_0$, between 1 and T, a disturbance or a shock of any origin affects the variable $X_t$ so that this value is considered an outlier in the regression. We can write the regression model:

$$Y_t = \alpha + \beta_1 X_t + \beta_2 D_t + \varepsilon_t \quad (5.60)$$

with:

$$D_t = \begin{cases} 1 & \text{if } t = t_0 \\ 0 & \text{otherwise} \end{cases} \quad (5.61)$$

The model (5.60) is then written:

$$Y_t = (\alpha + \beta_2) + \beta_1 X_t + \varepsilon_t \quad \text{if } t = t_0 \quad (5.62)$$

and:

$$Y_t = \alpha + \beta_1 X_t + \varepsilon_t \quad \text{if } t \neq t_0 \quad (5.63)$$

The two models differ only in the value of the intercept: a perturbation taken into
account via a dummy variable affects only the intercept of the model.
There are, however, cases where the perturbation also impacts the slope of the
regression model:

$$Y_t = \alpha + \beta_1 X_t + \beta_2 D_t + \beta_3 X_t D_t + \varepsilon_t \quad (5.64)$$

This model can thus be written as:

$$Y_t = (\alpha + \beta_2) + (\beta_1 + \beta_3) X_t + \varepsilon_t \quad \text{if } t = t_0 \quad (5.65)$$

and:

$$Y_t = \alpha + \beta_1 X_t + \varepsilon_t \quad \text{if } t \neq t_0 \quad (5.66)$$

In this example, the intercept and the slope are simultaneously modified.
The choice between the specifications (5.60) and (5.64) can be guided by
theoretical considerations. It is also possible to carry out a posteriori tests in order
to make this choice. To this end, we start by estimating the model without dummy
variables:

Yt = α + β1 Xt + εt
. (5.67)

We then estimate the two models incorporating the dummy variables:

$$Y_t = \alpha' + \beta_1' X_t + \varepsilon_t \quad (5.68)$$

with $\alpha' = \alpha + \beta_2$ and $\beta_1' = \beta_1$ in the case of model (5.60), and $\beta_1' = \beta_1 + \beta_3$ in the case of model (5.64).

We then perform coefficient comparison tests:

– If $\alpha' \neq \alpha$ and $\beta_1' = \beta_1$: we are in the case of specification (5.60).
– If $\alpha' \neq \alpha$ and $\beta_1' \neq \beta_1$: we are in the case of specification (5.64).

Model Containing Only Indicator Variables


Models whose regressors consist solely of dummy variables are sometimes called
analysis of variance (ANOVA) models. They are frequently used to compare
differences in means between two or more categories of individuals. Let us take
an example. Consider consumer spending on any good B in a country by subregion:

Yi = α + β1 D1i + β2 D2i + εi
. (5.69)

where:

– $Y_i$ denotes the average consumption expenditure of the good B in subregion $i$.
– $D_{1i} = \begin{cases} 1 & \text{if the subregion is located in the North} \\ 0 & \text{otherwise} \end{cases}$
– $D_{2i} = \begin{cases} 1 & \text{if the subregion is located in the Southeast} \\ 0 & \text{otherwise} \end{cases}$

.D1i and .D2i are two dummy variables representing a qualitative variable. The

qualitative variable here is the region to which the subregion belongs, and each of
the dummy variables represents one of the modalities associated with this variable.
The average consumption expenditure of the good B in the North corresponds to the
case where .D1i = 1 and .D2i = 0 and is given by the model:

Yi = α + β1
. (5.70)

Similarly, the average consumption expenditure of the good B in the Southeast


is such that .D1i = 0 and .D2i = 1 and is given by:

Yi = α + β2
. (5.71)

We deduce that the average consumption expenditure of the good B in the


Southwest corresponds to the case where .D1i = D2i = 0 and is given by:

Yi = α
. (5.72)

Thus, the average consumption expenditure of the good B in the Southwest is


given by the value of the intercept, the slope coefficients .β1 and .β2 indicating,
respectively, the magnitude of the difference between the average consumption
expenditure in the North and the Southwest, and the value of the difference between
the average consumption expenditure in the Southeast and the Southwest.
To illustrate this, let us assign values to the different coefficients. Suppose that
the estimation of the model (5.69) has led to the following results:

$$\hat{Y}_i = \underset{(24.20)}{350} - \underset{(-1.54)}{30}\,D_{1i} - \underset{(-3.20)}{60}\,D_{2i} \quad (5.73)$$

where the values in parentheses correspond to the t-statistics of the estimated coefficients. This example shows that the average consumption expenditure on the good B in the Southwest is €350, those in the Northern subregions are €30 lower, and those in subregions located in the Southeast are €60 lower. The average consumption expenditure on the good B in the North is therefore €350 − 30 = €320 and €350 − 60 = €290 in the Southeast.
Let us now look at the significance of the coefficients. We see that the coefficient for subregions in the North is not significant, while that for subregions in the Southeast is. As a result, the average consumption expenditure on the good B is not significantly different between subregions located in the North and in the Southwest. On the other hand, there is a significant difference of €60 between the average consumption expenditure on the good B in the Southeast and in the Southwest.
This very simple example highlighted that, to distinguish the three regions
(North, Southeast, and Southwest), only two dummy variables were introduced.
It is very important to note that the introduction of three dummy variables, each

representing a region, is impossible.4 This would lead to a situation of perfect


collinearity, i.e., in a case where the three variables would be perfectly dependent
(see above). In other words, the sum of the three dummy variables would be equal
to the unit vector, which would be, by construction,
  collinear with the unit vector
of the constant term, making the matrix . X' X non-invertible, with .X denoting the
matrix of explanatory variables. Consequently, when a variable has m categories or
attributes, it is appropriate to introduce .(m − 1) dummy variables. In our example,
we had three regions, .m = 3, so only two dummy variables should be included in
the model. Similarly, if we want to study the consumption expenditure on the good
B by gender, only one dummy variable should be introduced taking the value 1 for
men and 0 for women (or vice versa).
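As a practical illustration of the (m − 1) rule, here is a small numpy sketch of how such dummy variables can be constructed for a qualitative variable with m categories. The function name, the category labels, and the choice of baseline are purely illustrative.

import numpy as np

def category_dummies(values, baseline):
    """Build (m - 1) dummy variables for a qualitative variable with m categories,
    leaving the baseline category out to avoid perfect collinearity with the constant."""
    values = np.asarray(values)
    categories = [c for c in np.unique(values) if c != baseline]
    D = np.column_stack([(values == c).astype(float) for c in categories])
    return D, categories

# Example with the three regions of the text:
# D, cats = category_dummies(["North", "Southeast", "Southwest", "North"], baseline="Southwest")
# D has two columns (North, Southeast); the Southwest is the reference category.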

Remark 5.4 In the example studied here, we have considered a single qualitative
variable comprising three attributes (North, Southeast, and Southwest). It is possible
to introduce more than one qualitative variable into a model. This is the case, for
example, with the following model:

Yi = α + β1 D1i + β2 D2i + εi
. (5.74)

where .Yi denotes the consumption expenditure on the good B in the subregion i, .D1i
denotes gender (.D1i = 1 if the person is male, 0 if female), and .D2i is the region
to which the subregion belongs (.D2i = 1 if the subregion is in the South region, 0
otherwise). The estimation of the coefficient .α thus gives the average consumption
expenditure on the good B by a woman living in a subregion that is not located in
the South. This situation is the reference situation to which the other cases will be
compared.

Remark 5.5 In Chap. 2, we presented semi-log models of the type .log Yt = α +


βXt +εt . We have seen that the coefficient .β can be interpreted as the semi-elasticity
of .Yt with respect to .Xt . What happens to this when we are dealing with a dummy
variable and not a usual quantitative variable? Consider the following model:

. log Yt = α + βDt + εt (5.75)

where .Yt denotes the average hourly wage in euros and .Dt is a dummy variable
equal to 1 for women and 0 for men. For men, the model is given by .log Yt = α + εt
and for women by .log Yt = (α + β) + εt . Therefore, .α denotes the logarithm of
the average hourly wage and .β is the difference in the logarithm of the average
hourly wage for men and women. The anti-log of .α is interpreted as the median
(not average) hourly wage for men. Similarly, the anti-log of .(α + β) is the median
hourly wage for women.

4 This is only possible if the model does not have a constant term.

Model Containing Indicator and Usual Explanatory Variables


In most cases, the regression model contains not only qualitative variables but also
traditional explanatory variables, i.e., quantitative variables. Such models are called
analysis of covariance (ANCOVA) models. They are an extension of ANOVA
models in the sense that they control for the effects of quantitative variables. For this
reason, the quantitative variables in ANCOVA models are called control variables.
If we take the previous example of the consumption expenditure on a good B,
we can consider the following model:

Yi = α + β1 D1i + β2 D2i + β3 Xi + εi
. (5.76)

where:

– $Y_i$ denotes the average consumption expenditure on the good B in subregion $i$.
– $D_{1i} = \begin{cases} 1 & \text{if the subregion is located in the North} \\ 0 & \text{otherwise} \end{cases}$
– $D_{2i} = \begin{cases} 1 & \text{if the subregion is located in the Southeast} \\ 0 & \text{otherwise} \end{cases}$
– $X_i$ denotes the average wage.

The average wage is here a control variable.

Interactions
To illustrate the problem of interactions between variables, let us consider the
following model:

Yt = α + β1 D1t + β2 D2t + β3 Xt + εt
. (5.77)

where:

– $Y_t$ is the hourly wage in euros.
– $D_{1t}$ is such that: $D_{1t} = \begin{cases} 1 & \text{if the person is a woman} \\ 0 & \text{otherwise} \end{cases}$
– $D_{2t}$ is such that: $D_{2t} = \begin{cases} 1 & \text{if the person works in the public sector} \\ 0 & \text{if the person works in the private sector} \end{cases}$
– $X_t$ is the level of education (in number of years).

In this model, gender (represented by .D1t ) and employment sector (represented


by .D2t ) are qualitative variables. It is implicitly assumed that the differential effect
of each of these two variables is constant, regardless of the value of the other
variables. In other words, the differential effect of the variable .D1t is assumed to
be constant in both employment sectors, and the differential effect of the variable
.D2t is also assumed to be constant in both genders. Thus, if hourly wages are higher

for men than for women, they are higher whether or not they work in the public

sector. Similarly, if the hourly wage of people working in the public sector is lower
than that of people working in the private sector, it is so whether they are men
or women. There is therefore no interaction between the two qualitative variables.
Such an assumption may seem highly restrictive, and we need to take into account
the possible interactions between the variables. For example, a woman working in
the public sector may earn less than a man working in the same sector. We can thus
write the model (5.77):

.Yt = α + β1 D1t + β2 D2t + β3 Xt + β4 (D1t D2t ) + εt (5.78)

For a woman .(D1t = 1) working in the public sector .(D2t = 1), the model is:

Yt = (α + β1 + β2 + β4 ) + β3 Xt + εt
. (5.79)

The interpretation of the coefficients is then:

– .β1 is the differential effect of being a woman.


– .β2 is the differential effect of working in the public sector.
– .β4 is the differential effect of being a woman working in the public sector.

Let us consider a numerical example. Suppose that the estimation of the


model (5.77) leads to the following results:

$$\hat{Y}_t = -0.6 - 3.4\,D_{1t} - 2.7\,D_{2t} + 0.7\,X_t \quad (5.80)$$

This model indicates that, all other things being equal, the average hourly wage of women is €3.4 lower than that of men, and the average hourly wage of people working in the public sector is €2.7 lower than that of people working in the private sector.
Let us now assume that the estimation of the model (5.78) has led to the following
results:

$$\hat{Y}_t = -0.6 - 3.4\,D_{1t} - 2.7\,D_{2t} + 0.7\,X_t + 3.1\,D_{1t} D_{2t} \quad (5.81)$$

All other things being equal, the average hourly wage of women working in the public sector is €3 lower ($-3.4 - 2.7 + 3.1 = -3$), which lies between the values $-3.4$ (gender difference alone) and $-2.7$ (employment sector difference alone).

Use of Indicator Variables for Deseasonalization


We have seen that dummy variables can be used in a variety of cases, including:

– To take account of a temporary event or exceptional phenomenon (e.g., German


reunification, World War I, strikes, particular climatic phenomena, etc.)
– To take account of spatial effects (e.g., living in an urban zone or a rural area,
region to which subregions belong, etc.)

– To account for characteristics (modalities or categories) of qualitative variables


(e.g., gender, employment sector, political affiliation, religion, etc.)

Dummy variables can also be used to deseasonalize a series. To illustrate this


procedure, consider a series .Yt with a quarterly frequency. To test for the existence
of a seasonal effect, we run the following regression:5

$$Y_t = \beta_1 D_{1t} + \beta_2 D_{2t} + \beta_3 D_{3t} + \beta_4 D_{4t} + \varepsilon_t \quad (5.82)$$

where .Dit = 1 for quarter i, 0 otherwise, with .i = 1, 2, 3, 4. If there is a seasonal


effect for a particular quarter, the coefficient of the dummy variable corresponding
to that quarter will be significantly different from zero. As an example, if .Yt denotes
a series of toy sales, it is highly likely that the coefficient .β4 assigned to the fourth
quarter of the year will be significantly different from zero, reflecting the increase
in toy sales at Christmas.
The deseasonalization method based on dummy variables is straightforward. It
consists first in estimating the model (5.82):

$$\hat{Y}_t = \hat{\beta}_1 D_{1t} + \hat{\beta}_2 D_{2t} + \hat{\beta}_3 D_{3t} + \hat{\beta}_4 D_{4t} \quad (5.83)$$

and then calculating the series $\left( Y_t - \hat{Y}_t \right)$, which is the residual series: this series is
the seasonally adjusted series.

Remark 5.6 The method described above is only valid if the series under con-
sideration can be decomposed in an additive way, i.e., if it can be written in
the form: .Y = T + C + S + ε where T designates the trend, C the cyclical
component, S the seasonal component, and .ε the residual component. This is
known as an additive decomposition scheme. But, if the components enter
multiplicatively (multiplicative decomposition scheme), i.e., .Y = T × C × S × ε,
the deseasonalization method presented above is inappropriate.
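As an illustration of the additive scheme, here is a minimal numpy sketch of the dummy-variable deseasonalization described in (5.82)–(5.83). It assumes a quarterly series whose first observation falls in the first quarter; the function name is illustrative and this is not the EViews routine used in the applications.

import numpy as np

def deseasonalize_quarterly(y):
    """Regress y on four quarterly dummies with no constant, Eq. (5.82),
    and return the residual series, i.e., the seasonally adjusted series of (5.83)."""
    T = len(y)
    quarters = np.arange(T) % 4                 # 0, 1, 2, 3, 0, 1, ...
    D = np.zeros((T, 4))
    D[np.arange(T), quarters] = 1.0             # D[t, i] = 1 for quarter i
    beta = np.linalg.lstsq(D, y, rcond=None)[0]
    return y - D @ beta, beta                   # adjusted series, quarterly means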

Empirical Application
Consider the series of returns of the Dow Jones Industrial Average US stock index
(RDJ I N D) over the period from the second quarter of 1970 to the second quarter
of 2021 (source: Macrobond). We are interested in the relationship between the
present value of returns and their first-lagged value. The study period includes the
stock market crash of October 19, 1987, corresponding to the 71st observation.
To take into account this exceptional event, let us consider the following dummy
variable:

$$D_t = \begin{cases} 1 & \text{if } t = 71 \\ 0 & \text{otherwise} \end{cases} \quad (5.84)$$

5 Note that a dummy variable is assigned to each quarter, which requires us not to introduce a
constant term into the regression. We could also have written the model by introducing a constant
term and only three dummy variables.

Estimating the model with the dummy variable gives:

$$\widehat{RDJIND}_t = \underset{(3.6585)}{0.0206} + \underset{(0.1028)}{0.0069}\,RDJIND_{t-1} - \underset{(-3.9772)}{0.3131}\,D_t \quad (5.85)$$

where the values in parentheses are the t-statistics of the estimated coefficients. All else being equal, the stock market crash of October 1987 reduced the average value of US index returns by 0.3131. This decrease is significant insofar as the coefficient assigned to the dummy variable is significantly different from zero.

5.3.3 Coefficient Stability Tests

It is often useful to assess the robustness of the estimated model over the entire
study period, i.e., to test its stability. There may in fact be a structural change
or break in the relationship between the dependent variable and the explanatory
variables, resulting in instability of the coefficients of the model estimated over the
entire period under consideration. Several causes can produce a structural change,
such as the transition to the single currency, a change in exchange rate regime (from
a fixed to a flexible exchange rate regime), the 1973 oil shock, World War II, the
1987 stock market crash, the Covid-19 pandemic, etc.
There are various methods for assessing the stability of the estimated coefficients
of a regression model, and we present them below.

Rolling Regressions and Recursive Residuals


General Principle
The rolling regression technique is very intuitive. It involves estimating the
parameters of successive models in the following way. The first method consists in
estimating successive models by adding one or more observations each time, starting
from the beginning and going towards the end of the period. This is known as
forward regression. The second method also involves estimating successive models
by adding one or more observations each time, but starting from the end of the period
and moving towards the beginning. This is known as backward regression.
Several graphs are then plotted to assess the stability of the various characteristics
of the estimated regression, for example:

– Graph of the estimated coefficients


– Graph of the t-statistics of the estimated coefficients
– Graph of the coefficients of determination of the estimated models

It is then a matter of identifying a possible break in these graphs in order to


detect a structural change. This technique is a graphical method and therefore not a
statistical test in the strict sense of the term.
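A compact numpy sketch of such forward rolling (recursive) regressions might look as follows, assuming the matrix X already contains the constant column; the function name and the starting window are illustrative assumptions.

import numpy as np

def recursive_estimates(y, X, tau):
    """Forward rolling regressions: re-estimate the model on the first t observations
    for t = tau, ..., T and store the successive coefficient vectors."""
    T = X.shape[0]
    paths = []
    for t in range(tau, T + 1):
        beta = np.linalg.lstsq(X[:t], y[:t], rcond=None)[0]
        paths.append(beta)
    return np.array(paths)   # one row of estimated coefficients per sample size

The resulting coefficient paths (and, similarly, t-statistics or R² computed at each step) can then be plotted to look for breaks.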

Recursive Residuals
Consider the usual regression model:

Y = Xβ + ε
. (5.86)

Let us denote $x_t$ the vector of the k explanatory variables plus the constant for the t-th observation:

$$x_t = (1, X_{1t}, \ldots, X_{kt})' \quad (5.87)$$

Let $X_{t-1}$ be the matrix formed by the first $(t-1)$ rows of $X$. This matrix can be used to estimate $\beta$. Let $\hat{\beta}_{t-1}$ be the estimator thus obtained:

$$\hat{\beta}_{t-1} = \left( X_{t-1}' X_{t-1} \right)^{-1} X_{t-1}' Y_{t-1} \quad (5.88)$$

where $Y_{t-1}$ is the subvector of the first $(t-1)$ elements of $Y$.

It is then possible to calculate the forecast error associated with the t-th observation, denoted $e_t$:

$$e_t = Y_t - x_t' \hat{\beta}_{t-1} \quad (5.89)$$

The variance of this forecast error is given by (see Chap. 3):

$$V(e_t) = \sigma_\varepsilon^2 \left[ 1 + x_t' \left( X_{t-1}' X_{t-1} \right)^{-1} x_t \right] \quad (5.90)$$

Recursive residuals can be defined as follows:

$$w_t = \frac{Y_t - x_t' \hat{\beta}_{t-1}}{\sqrt{1 + x_t' \left( X_{t-1}' X_{t-1} \right)^{-1} x_t}} \quad (5.91)$$

with .wt ∼ N(0, σε2 ). The recursive residuals are defined as the normalized forecast
errors. Furthermore, the recursive residuals are a set of residuals which, if the
disturbance terms are independent and of the same law, are themselves independent
and of the same law. The recursive residuals thus are normally distributed since they
are defined as a linear function of normal variables and the forecast given by OLS
is unbiased.
To generate a sequence of recursive residuals, we proceed as follows:

– We choose a starting set of $\tau$ observations, with $\tau < T$. These may be, for example, the first $\tau$ observations of the sample (case of a forward regression). Having estimated $\hat{\beta}_\tau$, the corresponding recursive residual is determined:

$$w_{\tau+1} = \frac{Y_{\tau+1} - x_{\tau+1}' \hat{\beta}_\tau}{\sqrt{1 + x_{\tau+1}' \left( X_\tau' X_\tau \right)^{-1} x_{\tau+1}}} \quad (5.92)$$

– We increase the number of observations by one: we therefore consider the first $\tau + 1$ observations of the sample. We estimate $\hat{\beta}_{\tau+1}$ and determine the corresponding recursive residual:

$$w_{\tau+2} = \frac{Y_{\tau+2} - x_{\tau+2}' \hat{\beta}_{\tau+1}}{\sqrt{1 + x_{\tau+2}' \left( X_{\tau+1}' X_{\tau+1} \right)^{-1} x_{\tau+2}}} \quad (5.93)$$

– We repeat the previous step, each time including an additional observation.

We obtain a series of $T - \tau$ recursive residuals, defined by (5.91) for $t = \tau + 1, \ldots, T$. From these recursive residuals, Brown et al. (1975) proposed the CUSUM and CUSUM of squares tests, which allow us to test the stability of the estimated coefficients in a model. These tests are designed to test the null hypothesis of parameter stability, i.e.:

$$H_0: \beta_1 = \beta_2 = \ldots = \beta_T = \beta \quad (5.94)$$

with:

$$\sigma_{\varepsilon 1}^2 = \sigma_{\varepsilon 2}^2 = \ldots = \sigma_{\varepsilon T}^2 = \sigma_\varepsilon^2 \quad (5.95)$$

where the coefficients $\beta_t$, $t = 1, \ldots, T$, are the vectors of the regression coefficients for period t and $\sigma_{\varepsilon t}^2$ denotes the variance of the errors for this same period.

The CUSUM Test


The first test proposed by Brown et al. (1975) is called the CUSUM test (CUmulative SUM) and is based on the cumulative sum defined by:

$$W_t = \sum_{j=\tau+1}^{t} \frac{w_j}{\hat{\sigma}_w} \quad (5.96)$$

where $t = \tau + 1, \ldots, T$ and:

$$\hat{\sigma}_w^2 = \frac{1}{T - \tau} \sum_{j=\tau+1}^{T} w_j^2 \quad (5.97)$$

$W_t$ is thus a cumulative sum that varies with t. As long as the vectors $\beta$ are constant, the mean of $W_t$ is zero. If they vary, $W_t$ tends to deviate from the straight line representing the null expectation. More specifically, under the null hypothesis of stability of the coefficients, $W_t$ must lie within the interval $[-L_t, L_t]$ where:

$$L_t = \frac{a\,(2t + T - 3\tau)}{\sqrt{T - \tau}}$$

with .a = 1.143 at the 1% significance level, .a = 0.948 at the 5% significance level,


and .a = 0.850 at the 10% significance level.
The null hypothesis of stability is rejected if .Wt cuts .Lt or .−Lt . This means
that if the coefficients are not constant, there may be a disproportionate number of
recursive residuals .wt of the same sign that “push” .Wt out of the interval.
The CUSUM test is generally used to detect possible systematic movements in
the coefficient values reflecting possible structural instability. If a break is found,
the chosen specification is rejected over the whole period. On the other hand, if we
wish to detect random movements (and not movements necessarily resulting from a
structural modification of the coefficients), we use the CUSUM of squares test.
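For illustration, a minimal numpy sketch of the recursive residuals (5.91) and of the CUSUM statistic (5.96)–(5.97) could be written as follows. It assumes X already contains the constant column; the names are illustrative and this is not the EViews implementation used in the empirical applications.

import numpy as np

def cusum_statistics(y, X, tau):
    """Recursive residuals w_t (Eq. 5.91) and CUSUM statistics W_t (Eq. 5.96)."""
    T = X.shape[0]
    w = []
    for t in range(tau, T):                 # t indexes the forecasted observation
        Xt, yt = X[:t], y[:t]
        beta = np.linalg.lstsq(Xt, yt, rcond=None)[0]
        xt = X[t]
        denom = np.sqrt(1.0 + xt @ np.linalg.inv(Xt.T @ Xt) @ xt)
        w.append((y[t] - xt @ beta) / denom)
    w = np.array(w)
    sigma_w = np.sqrt((w ** 2).mean())      # Eq. (5.97), denominator T - tau
    return w, np.cumsum(w) / sigma_w        # recursive residuals and W_t

The CUSUM of squares statistic of the next subsection can be obtained from the same residuals as np.cumsum(w ** 2) / np.sum(w ** 2).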

The CUSUM of Squares Test


The second test proposed by Brown et al. (1975) is the CUSUM of squares test, based on the cumulative sums of the squares of the recursive residuals, i.e.:

$$s_t = \frac{\sum_{j=\tau+1}^{t} w_j^2}{\sum_{j=\tau+1}^{T} w_j^2}, \quad t = \tau + 1, \ldots, T \quad (5.98)$$

The line representing the expectation of the test statistic under the null hypothesis of stability is given by:

$$E(s_t) = \frac{t - \tau}{T - \tau} \quad (5.99)$$

This expression varies from 0 (for .t = τ ) to 1 (for .t = T ). The idea is then


to study the significance of the difference between .st and .E(st ). To this end, we
draw a pair of reference lines parallel to the line .E(st ), one lying above it, the other
below it, at a distance C. Brown et al. (1975) tabulated the values of C for various
sample sizes and significance levels. The estimated coefficients are unstable if the
graph of .st intersects the previously defined reference lines. More precisely, under
the null hypothesis of stability of the coefficients, .st has a beta distribution with
mean .(t − τ ) /(T − τ ) and is framed by the interval .±C + (t − τ ) /(T − τ ). If .st
leaves this interval at period .t = i, this means that there is a random break that
reflects the instability of the regression coefficients for this period .i.

The Chow Test (1960)


The Chow test is very frequently used. Consider the following regression model,
for .t = 1, . . . , T :

.Yt = α + βXt + εt (5.100)



Suppose we divide the sample into two sub-samples and estimate the following
models:

Yt = α1 + β1 Xt + ε1t , for t = 1, . . . , τ
. (5.101)

and:

Yt = α2 + β2 Xt + ε2t , for t = τ + 1, . . . , T
. (5.102)

The relationship (5.100) is based on the absence of structural change over the
entire period under consideration. In other words, there is no difference between the
two periods .t = 1, . . . , τ and .t = τ + 1, . . . , T : the constant term and the slope
coefficient remain identical. If this is indeed the case, we should have:

α = α1 = α2 and β = β1 = β2
. (5.103)

The Chow test consists in testing the null hypothesis:

$$H_0: \begin{cases} \alpha_1 = \alpha_2 \\ \beta_1 = \beta_2 \end{cases} \quad (5.104)$$

against the alternative hypothesis:

$$H_1: \begin{cases} \alpha_1 \neq \alpha_2 \\ \beta_1 \neq \beta_2 \end{cases} \quad (5.105)$$

Assuming that .ε1t and .ε2t are independent and both have normal distributions of
zero mean and same variance, the Chow test is implemented as follows:

– The model (5.100) is estimated and the corresponding residual sum of squares is
noted .RSS0 .
– The model (5.101) is estimated and the corresponding residual sum of squares is
noted .RSS1 .
– The model (5.102) is estimated and the corresponding residual sum of squares is
noted .RSS2 .
– .RSSa = RSS1 + RSS2 is calculated.
– We calculate the test statistic:

$$F = \frac{(RSS_0 - RSS_a)/(k+1)}{RSS_a/(T - 2(k+1))} \quad (5.106)$$

where k is the number of explanatory variables (1 in our case).



Under the null hypothesis of no structural change, we have:

F ∼ F (k + 1, T − 2(k + 1))
. (5.107)

The decision rule is written:

– If .F < F (k + 1, T − 2(k + 1)), we do not reject the null hypothesis of stability


of the coefficients. There is no structural change.
– If .F > F (k + 1, T − 2(k + 1)), we reject the null hypothesis of stability of the
coefficients, indicating the presence of a structural change.

Remark 5.7 The Chow test can be easily generalized to the existence of more than
one structural break. Thus, if we wish to test for the existence of two breaks, we
will split the period into three sub-periods, the principle of the test remaining the
same (the sum of the squared residuals .RSSa then being equal to the sum of the
sums of the squared residuals of the three regressions corresponding to the three
sub-periods).

The Chow test assumes that the date at which the structural break(s) occurs is
known. Otherwise, it is possible to perform rolling regressions and to calculate the
Chow test statistic for each of these regressions. The break point we are looking for
then corresponds to the value for which the Chow statistic is maximum.
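As an illustration, the Chow statistic of (5.106) can be computed with a few lines of numpy. The sketch below assumes X contains the constant column and that the break date, i.e., the index of the first observation of the second sub-period, is known; names are illustrative.

import numpy as np

def chow_test(y, X, split):
    """Chow test statistic of Eq. (5.106); split is the index of the first
    observation of the second sub-period."""
    def rss(yy, XX):
        beta = np.linalg.lstsq(XX, yy, rcond=None)[0]
        resid = yy - XX @ beta
        return resid @ resid
    T, k1 = X.shape                       # k1 = k + 1 (constant included)
    rss0 = rss(y, X)
    rss_a = rss(y[:split], X[:split]) + rss(y[split:], X[split:])
    return ((rss0 - rss_a) / k1) / (rss_a / (T - 2 * k1))
    # compare the result with the critical value of F(k1, T - 2*k1)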

Empirical Application
Consider the relationship between the returns of the US Dow Jones Industrial
Average index (RDJ I ND) and the Japanese Nikkei index (RNI KKEI ). The data,
taken from the Macrobond database, are quarterly over the period from the second
quarter of 1978 to the second quarter of 2021 (.T = 173). The OLS estimation of
the relationship:

RNI KKEIt = α + β × RDJ I N Dt + εt


. (5.108)

over the whole period gives:

$$\widehat{RNIKKEI}_t = \underset{(-1.1688)}{-0.0079} + \underset{(9.5019)}{0.7958}\,RDJIND_t \quad (5.109)$$

where the figures in parentheses correspond to the t-statistics of the estimated coefficients. In addition, we have the following statistics: $R_0^2 = 0.3455$ and $RSS_0 = 1.2654$.
As our study period includes the stock market crash of October 19, 1987, it is
pertinent to question the stability of the estimated relationship.

Rolling Regressions
To get a rough idea of the stability of the estimated coefficients, we perform rolling
regressions by adding an observation each time. We then graphically represent the
estimated coefficients corresponding to each of the regressions estimated.

Fig. 5.1 Rolling regressions. Change in the slope coefficient (recursive C(2) estimates with ± 2 S.E. bands)

Figure 5.1 shows the change in the slope coefficient. The dotted curves correspond to plus or
minus twice the standard deviation of the estimated coefficient. The graph shows
some instability of the slope coefficient, which is more marked in the first part of
the sample.
The same type of analysis can be carried out for the constant term. Figure 5.2
also shows some instability of the coefficient, with, in particular, a change in sign
over the study period.

CUSUM and CUSUM of Squares Tests


To assess the stability of the estimated relationship, let us calculate the recursive
residuals using the method previously presented. Figure 5.3 plots the series of
the recursive residuals, along with two curves representing more or less twice the
standard deviation of recursive residuals at each date. If some residuals lie outside
the band formed by these two curves, this is an indication of instability. We observe
that such a phenomenon appears on several occasions, particularly in the first part
of the sample.
Figure 5.4 corresponds to the application of the CUSUM test for the 5%
significance level. It can be seen that the series of the cumulative sum of recursive
residuals remains within the interval formed by the two lines, suggesting there is no
structural instability in the relationship over the period under consideration.
Figure 5.5 corresponds to the application of the CUSUM of squares test at the 5% significance level. This graph highlights that the cumulative sum of squares falls outside the interval delimited by the two lines around the 1987 stock market crash, indicating some instability (random break) in the parameters or variance.

Fig. 5.2 Rolling regressions. Change in the constant term (recursive C(1) estimates with ± 2 S.E. bands)
Fig. 5.3 Recursive residuals (± 2 S.E. bands)
Fig. 5.4 CUSUM test (5% significance level)
Fig. 5.5 CUSUM of squares test (5% significance level)

Chow Test
To investigate whether the stock market crash of October 1987 caused a structural
break in the relationship between the returns of the two indices under consideration,
let us apply the Chow test. To this end, we estimate two regressions: a regression
over the period 1978.2–1987.3 (before the crash) and a regression over the period
1987.4–2021.2 (after the crash). The results are given below.
Over the period 1978.2–1987.3, i.e., .t = 1, . . . , 70:

$$\widehat{RNIKKEI}_t = \underset{(3.3720)}{0.0279} + \underset{(3.8941)}{0.4072}\,RDJIND_t \quad (5.110)$$

with $R_1^2 = 0.2964$ and $RSS_1 = 0.0782$.

Over the period 1987.4–2021.2, i.e., $t = 71, \ldots, 173$:

$$\widehat{RNIKKEI}_t = \underset{(-1.9608)}{-0.0159} + \underset{(8.7515)}{0.8728}\,RDJIND_t \quad (5.111)$$

with $R_2^2 = 0.3654$ and $RSS_2 = 1.1258$.

So we have: $RSS_a = 0.0782 + 1.1258 = 1.2040$.
It is possible to calculate the Chow test statistic:

$$F = \frac{(1.2654 - 1.2040)/(1 + 1)}{1.2040/(173 - 2(1 + 1))} = 4.3128 \quad (5.112)$$

The Fisher table gives us, at the 5% significance level: .F (2, 169) = 2.997. The
calculated value of the test statistic being higher than the critical value, the null
hypothesis of stability of the estimated coefficients is rejected at the 5% significance
level. There is indeed a break in the fourth quarter of 1987. This result was expected
in view of the differences obtained in the estimates over the two sub-periods: the
constant term is positive in the first sub-period and negative in the second, and the
slope coefficient is more than twice as high in the second sub-period as in the first.
It is possible to recover the results of the Chow test by introducing a dummy
variable and running a single regression. Consider the following model:

$$RNIKKEI_t = \alpha + \beta \times RDJIND_t + \gamma D_t + \delta\,(D_t \times RDJIND_t) + \varepsilon_t \quad (5.113)$$

with:

$$D_t = \begin{cases} 0 & \text{over the period 1978.2–1987.3} \\ 1 & \text{over the period 1987.4–2021.2} \end{cases}$$
Thus, over the period 1978.2–1987.3, the model is written:

$$RNIKKEI_t = \alpha + \beta \times RDJIND_t + \varepsilon_t \quad (5.114)$$

and over the period 1987.4–2021.2:

$$RNIKKEI_t = (\alpha + \gamma) + (\beta + \delta) \times RDJIND_t + \varepsilon_t \quad (5.115)$$

In Eq. (5.113), the coefficient .δ indicates how much the slope coefficient of the
second period differs from that of the first period. Estimating this relationship yields:

$$\widehat{RNIKKEI}_t = \underset{(1.8619)}{0.0279} + \underset{(2.1502)}{0.4072}\,RDJIND_t - \underset{(-2.6195)}{0.0439}\,D_t + \underset{(2.2140)}{0.4656}\,(D_t \times RDJIND_t) \quad (5.116)$$

All coefficients are significantly different from zero (at the 10% significance level
for the constant term), suggesting that the relationship between the two series of
returns is different over the two sub-periods. From this estimation, we deduce the
relationship over the 1978.2–1987.3 period:

$$\widehat{RNIKKEI}_t = 0.0279 + 0.4072 \times RDJIND_t \quad (5.117)$$

and the relationship over the period 1987.4–2021.2:

$$\widehat{RNIKKEI}_t = (0.0279 - 0.0439) + (0.4072 + 0.4656)\,RDJIND_t = -0.0160 + 0.8728\,RDJIND_t \quad (5.118)$$

We naturally find the results obtained when implementing the Chow test. We see
that the coefficients .γ and .δ are significantly different from zero. We deduce that the
regressions over the two sub-periods differ not only in the constant term but also in
the slope coefficient. The findings therefore confirm the results of the Chow test.

Conclusion

In this chapter, we have considered that two of the assumptions of the regression
model concerning the explanatory variables are violated: the assumption of indepen-
dence between the explanatory variables and the error term, on the one hand, and
the assumption of independence between the explanatory variables, on the other.
We have also studied a third problem relating to the explanatory variables, namely,
the question of the instability of the estimated model. So far, we have considered
models in which the dependent variable is a function of one or more explanatory
variables at the same date, i.e., at the same moment in time. Frequently, however, the
explanatory variables include lagged variables or the lagged endogenous variable.
These are referred to as dynamic models, as opposed to static models. These models
are the subject of the next two chapters.

The Gist of the Chapter

Random explanatory variables. Estimation method: instrumental variables (IV); IV estimator $\hat{\beta}_{IV} = (Z'X)^{-1} Z'Y$, where $Z$ is the matrix of instrumental variables.
Multicollinearity. Explanatory variables not independent of each other. Detection: calculation of correlation coefficients, Klein (1962) test, Farrar and Glauber (1967) test, eigenvalue method, calculation of variance inflation factors (VIF). Solution: ridge regression.
Structural changes. Constrained estimation: constrained least squares. Consideration: indicator (dummy) variables. Tests: CUSUM and CUSUM of squares (recursive residuals), Chow (1960).

Further Reading

In addition to the references cited in this chapter concerning, in particular, collinear-


ity detection techniques or tests to detect breaks, the readers can extend their
knowledge through the selected reading below.
For developments relating to multicollinearity and model selection, see the
chapter by Leamer (1983) in the book edited by Griliches and Intriligator (1983).
Readers may also refer to Belsley et al. (1980) and to various econometric textbooks
such as Judge et al. (1988).
Concerning the constrained least squares method, interested readers may refer to
the following textbooks: Judge et al. (1985, 1988), Gouriéroux and Monfort (2008),
and Greene (2020). For more information on the use of dummy variables, see Fox
(1997) and Kennedy (2008). Regarding seasonal adjustment methods, an interesting
reference is provided by Diebold (2012). Readers can also consult Johnston and
Dinardo (1996) for further discussion on recursive residuals. We have not dealt
with models with random coefficients in this book; interested readers may consult
Swamy (1971). Similarly, a key reference on regime-switching models is Goldfeld
and Quandt (1972).

Appendix: Demonstration of the Formula for Constrained Least Squares Estimators

In order to determine the constrained least squares estimator, we need to solve a program minimizing the sum of squared residuals:

$$\min \left( Y - X\hat{\beta}_0 \right)' \left( Y - X\hat{\beta}_0 \right) = \min e'e \quad (5.119)$$

under the constraint $R\hat{\beta}_0 = r$.

We define the Lagrange function:

$$L = \left( Y - X\hat{\beta}_0 \right)' \left( Y - X\hat{\beta}_0 \right) - 2\lambda' \left( R\hat{\beta}_0 - r \right) \quad (5.120)$$

where $\lambda$ is a column vector formed by the q Lagrange multipliers. We calculate the partial derivatives:

$$\frac{\partial L}{\partial \hat{\beta}_0} = -2X'Y + 2X'X\hat{\beta}_0 - 2R'\lambda \quad (5.121)$$

and:

$$\frac{\partial L}{\partial \lambda} = -2\left( R\hat{\beta}_0 - r \right) \quad (5.122)$$

Setting these partial derivatives to zero, we have:

$$X'X\hat{\beta}_0 - X'Y - R'\lambda = 0 \quad (5.123)$$

and:

$$R\hat{\beta}_0 - r = 0 \quad (5.124)$$

Let us multiply each member of (5.123) by $R(X'X)^{-1}$:

$$R\hat{\beta}_0 - R(X'X)^{-1}X'Y - R(X'X)^{-1}R'\lambda = 0 \quad (5.125)$$

Hence:

$$\lambda = \left[ R(X'X)^{-1}R' \right]^{-1} \left( r - R\hat{\beta} \right) \quad (5.126)$$

with $\hat{\beta} = (X'X)^{-1}X'Y$ denoting the OLS estimator of the unconstrained model. It is then sufficient to replace $\lambda$ by its value in (5.123):

$$\hat{\beta}_0 = (X'X)^{-1}X'Y + (X'X)^{-1}R' \left[ R(X'X)^{-1}R' \right]^{-1} \left( r - R\hat{\beta} \right) \quad (5.127)$$

Hence:

$$\hat{\beta}_0 = \hat{\beta} + (X'X)^{-1}R' \left[ R(X'X)^{-1}R' \right]^{-1} \left( r - R\hat{\beta} \right) \quad (5.128)$$

which defines the constrained least squares estimator.


6 Distributed Lag Models

In the previous chapters, we essentially considered models in which the variables


were all expressed at the same instant of time. However, it is common for models to
include lagged variables, i.e., variables that are not all expressed at the same period.
These are known as dynamic models. There are two main categories:

– Models including present and lagged values of explanatory variables; these are
distributed lag models.
– Models in which the lagged values of the dependent variable intervene among
the explanatory variables; in this case, we speak of autoregressive models.1

This chapter proposes a study of the first category of models. Autoregressive


models will be treated in depth in the following chapter dealing with time series
models. We have thus chosen to divide the presentation of dynamic models into two
chapters, the distinction residing in whether or not the lagged dependent variable is
among the explanatory variables.

6.1 Why Introduce Lags? Some Examples

In economics, the present value of the dependent variable often depends on the past
values of the explanatory variables. In other words, the influence of the explanatory
variables is only exerted after a certain lag. Let us take a few examples to illustrate
this.

1 It is possible to introduce a nuance in the terminology. We generally speak of autoregressive

models when only the lagged values of the dependent variable are present as explanatory variables.
We speak of autoregressive distributed lag (ARDL) models when the lagged values of the
dependent variable are among the explanatory variables in addition to the lagged values of the
usual explanatory variables.


Consider, as a first example, the consumption function. A simple way of


illustrating the consideration of lags is to refer to Duesenberry’s (1949) ratchet
effect. According to this approach, consumption depends on income of the same
period, but also on the highest income achieved in the past. This introduces an
irreversibility of consumption decisions over time, in the sense that the attainment of
a higher income permanently modifies consumption habits. Another way of taking
into account the influence of the past is to explain consumption not only by income
in the same period, but also by lagged consumption. Such a formulation, widely used
in econometric studies on consumption, makes it possible to model consumption
habits. We can write, noting income R and consumption C:2

Ct = α + β1 Rt + φ1 Ct−1
. (6.1)

By replacing .Ct−1 with:

Ct−1 = α + β1 Rt−1 + φ1 Ct−2


. (6.2)

we can write:

Ct = α + β1 Rt + φ1 (α + β1 Rt−1 + φ1 Ct−2 )
. (6.3)

that is:

Ct = α (1 + φ1 ) + β1 (Rt + φ1 Rt−1 ) + φ12 Ct−2


. (6.4)

Replacing .Ct−2 with .α + β1 Rt−2 + φ1 Ct−3 and so on, we get:



 α
Ct = β1
. φ1i Rt−i + (6.5)
1 − φ1
i=0

Thus, in addition to income of the current period, all past income has an influence
on present consumption. The reaction of consumption to a change in income is
therefore spread, i.e., staggered, over time. The slower the reaction, the closer
the coefficient .φ1 is to 1. This coefficient represents the degree of inertia of
consumption. The model (6.5) is a distributed lag model in the sense that the
explanatory variable (income) has a distributed impact over time on the dependent
variable (consumption).
Another possible illustration, again in the field of consumption, is provided by
Friedman’s (1957) permanent income model. According to this theory, consumption
in a given period depends not just on income of the same period, but on all income
anticipated in future periods. Since future incomes are unknown, they need to be

2 We ignore the error term here to simplify the notations and calculations to follow.

approximated in order to estimate such a model. Friedman proposes to approximate


permanent income by current income and all past income, with observed income
assigned a decreasing weight over time. Under these conditions, an increase in an
individual’s permanent income affects consumption over time. In other words, an
increase in income is not immediately reflected in consumption. A model describing
such a situation can, for example, be written as:

Ct = μ + δ0 Rt + δ1 Rt−1 + δ2 Rt−2 + εt
. (6.6)

where R denotes income and C consumption. In this model, the present and
lagged values of one and two periods of income are involved in explaining present
consumption, meaning that an increase in income is spread, or distributed, over three
periods. The model (6.6) is called a distributed lag model because the explanatory
variable exerts a time-distributed influence on the dependent variable.
A second example illustrating the spread over time of the influence of explana-
tory variables is given by the investment function. In line with the accelerator
model, investment reacts immediately to changes in demand, i.e.:

$$I_t = \nu \Delta Y_t \quad (6.7)$$

where .It denotes investment at date t and .ΔYt = Yt − Yt−1 represents the change in
output perceived as the variation in demand, .ν being the acceleration coefficient. In
line with this formulation, a change in demand generates an immediate increase
in investment: there is no lag between the change in demand and the reaction
of investment. Such a formulation is too restrictive in the sense that it leads to
too abrupt variations in investment, and that there are lags in the adjustment of
investment to changes in demand. These limitations led to the flexible accelerator
model in which the capital stock K is linked to a weighted average of current and
past output, with the weight assigned to past output decreasing over time:
 
$$K_t = \phi \left[ (1 - \lambda) Y_t + \lambda Y_{t-1} + \lambda^2 Y_{t-2} + \ldots + \lambda^h Y_{t-h} + \ldots \right] \quad (6.8)$$

where the weight .λ is between 0 and 1. After a few simple calculations3 and remem-
bering that investment is equal to the change in the capital stock .(It = Kt − Kt−1 ),
the accelerator model can be written:


$$I_t = \lambda \nu \sum_{i=0}^{\infty} (1 - \lambda)^i \Delta Y_{t-i} \quad (6.9)$$

3 Seeclassic textbooks on macroeconomics or economic dynamics, for example, Blanchard and


Fischer (1989) and Dowrick et al. (2008).

This shows that investment reacts in a distributed way to changes in demand, not
adjusting immediately as was the case in the simple accelerator model. It is therefore
a distributed lag model.
These examples illustrate that a variety of factors can justify the existence of lags
and the use of distributed lag models. Lags can have a number of causes, including
but not limited to:

– The existence of memory or inertia phenomena. To take the example of


consumption, agents do not immediately modify their consumption following
an increase in income. There is inertia due, for example, to consumption habits.
– Technological or technical reasons. An increase in capital expenditure may have
staggered effects on investment, due in particular to the existence of production
delays. Similarly, the reaction of a variable to an economic policy is often spread
over several periods or only appears after a certain time lag.
– Institutional or political reasons. As an example, certain contractual obligations
may contribute to the occurrence of lags.
– One of the main reasons lies in expectations. Variables are often a function of
agents’ expectations, which are themselves frequently based on the past.

6.2 General Formulation and Definitions of Distributed Lag Models

Noting h the number of lags, a distributed lag model is generally written as follows:

$$Y_t = \mu + \delta_0 X_t + \delta_1 X_{t-1} + \ldots + \delta_h X_{t-h} + \varepsilon_t \quad (6.10)$$

The number of lags h can be finite or infinite. An infinite lag model is used when
the lagged effects of the explanatory variables are likely to be very long-lasting.
Finite lag models are preferred when the effect of a change in X no longer has an
influence on Y after a relatively small number of periods.
To simplify the notations, let us introduce the lag operator L such that:

LXt = Xt−1
. (6.11)

The lag operator thus transforms a variable into its past value. More generally,
we have:

Li Xt = Xt−i
. (6.12)

Let us define the lag polynomial .D(L) of degree h such that:

D(L) = δ0 + δ1 L + . . . + δh Lh
. (6.13)

The distributed lag model (6.10) is then written as:

$$Y_t = \mu + D(L) X_t + \varepsilon_t \quad (6.14)$$

The coefficient $\delta_0$ measures the variation of $Y_t$ following the variation of $X_t$:

$$\delta_0 = \frac{\Delta Y_t}{\Delta X_t} \quad (6.15)$$

$\delta_0$ is called the short-term multiplier or impact multiplier of X. The partial sums of the coefficients $\delta_i$, $i = 0, \ldots, h$, define the cumulative multipliers. Thus, the cumulative effect $\tau$ periods after a shock occurring at period t is given by $\sum_{i=0}^{\tau} \delta_i$. The polynomial:

$$D(1) = \delta_0 + \delta_1 + \ldots + \delta_h \quad (6.16)$$

equal to the sum of all coefficients $\delta_i$, $i = 0, \ldots, h$, measures the effect, in the long term, of a variation in X on the value of Y. $D(1)$ is called the long-term multiplier or equilibrium multiplier.

It is possible to normalize the coefficients $\delta_i$, $i = 0, \ldots, h$, by dividing them by their sum $D(1)$. The partial sums of these normalized coefficients measure the proportion of the total effect of a change in X reached after a certain period.
Let us consider a numerical example to illustrate this. Consider the model (6.6),
by giving values to the coefficients:

$$C_t = \mu + 0.4 R_t + 0.2 R_{t-1} + 0.1 R_{t-2} \quad (6.17)$$

The short-term multiplier is 0.4: following a one-unit increase in income,


individuals increase their consumption in the same period by 0.4 units. The long-
term multiplier is .0.4+0.2+0.1 = 0.7: following a one-unit increase in income, the
individual increases consumption by 0.4 units in the same period, by 0.2 units in the
following period, and by 0.1 units in the period after that. In the long term, the total
effect of a one-unit increase in income is an increase in consumption of 0.7 units.
Let us now calculate the standardized coefficients, dividing each coefficient by 0.7.
We obtain 0.57, 0.29, and 0.14, respectively. This means that 57% of the total effect
of a change in income is felt in the same period, 86% after one period, and 100%
after two periods.
Another useful concept is that of median lag: this is the number of periods
required for 50% of the total effect to be reached. The notion of mean lag, on
the other hand, allows us to grasp the time period corresponding to the mean value
of the coefficients4 .δi , i = 1, . . . , h. It is defined by the weighted average of the

4 The concepts of median and mean lags only really make sense if the coefficients are of the same
sign.

coefficients, i.e.:

$$\bar{D} = \frac{\sum_{i=0}^{h} i\,\delta_i}{\sum_{i=0}^{h} \delta_i} = \frac{\delta_1 + 2\delta_2 + \ldots + h\delta_h}{\delta_0 + \delta_1 + \delta_2 + \ldots + \delta_h} = \frac{D'(1)}{D(1)} \quad (6.18)$$

where $D'$ denotes the derivative of D.
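As an illustration, the various multipliers and the mean lag can be computed directly from the coefficients. The following numpy sketch, with illustrative names, simply applies these definitions.

import numpy as np

def lag_multipliers(delta):
    """Impact, cumulative, and long-run multipliers and the mean lag for a
    distributed lag model with coefficients delta = [d0, d1, ..., dh]."""
    delta = np.asarray(delta, dtype=float)
    long_run = delta.sum()                                        # D(1)
    cumulative = np.cumsum(delta)                                 # cumulative multipliers
    mean_lag = (np.arange(len(delta)) * delta).sum() / long_run   # D'(1)/D(1)
    return delta[0], cumulative, long_run, mean_lag

# Example (6.17): lag_multipliers([0.4, 0.2, 0.1]) gives an impact multiplier of 0.4,
# a long-term multiplier of 0.7, and a mean lag of about 0.57 periods.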

6.3 Determination of the Number of Lags and Estimation


6.3.1 Determination of the Number of Lags

There are several procedures for determining the number of lags h in a distributed
lag model:

$$Y_t = \mu + \delta_0 X_t + \delta_1 X_{t-1} + \ldots + \delta_h X_{t-h} + \varepsilon_t \quad (6.19)$$

– A first technique is to perform significance tests on the coefficients. For example,


we can perform a Fisher test, testing the nullity of coefficients associated with
lags of order greater than h.
– A second technique relies on the use of various criteria: the adjusted coefficient
of determination, the Akaike information criterion (AIC), Schwarz information
criterion (SIC), Hannan-Quinn information criterion (HQ), etc. We select the
value of h that maximizes the adjusted coefficient of determination or the one
that minimizes the AIC, SIC, and HQ criteria:

$$AIC(h) = \log \frac{RSS_h}{T} + \frac{2h}{T} \quad (6.20)$$

$$SIC(h) = \log \frac{RSS_h}{T} + \frac{h \log T}{T} \quad (6.21)$$

$$HQ(h) = \log \frac{RSS_h}{T} + \frac{2h \log(\log T)}{T} \quad (6.22)$$

where $RSS_h$ denotes the sum of squared residuals of the model with h lags and T is the number of observations.5

5 It has been assumed here that the constant c is equal to 1 in the expression of the HQ criterion.

Of course, each technique has its advantages and disadvantages. In particular,


we know that the AIC criterion tends to overestimate the value of h, while the SIC
criterion is more parsimonious.
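To illustrate the second technique, the following Python sketch estimates the distributed lag model (6.19) by OLS for h = 1, . . . , h_max on a common sample and computes the criteria (6.20)–(6.22) as written above. This is only an illustrative sketch; the function name and the NumPy-based implementation are our own choices.

import numpy as np

def lag_criteria(y, x, h_max):
    # Information criteria (6.20)-(6.22) of the distributed lag model (6.19)
    # for h = 1,...,h_max, computed on a common sample of T = len(y) - h_max
    # observations so that the criteria are comparable across h.
    n = len(y)
    T = n - h_max
    results = {}
    for h in range(1, h_max + 1):
        # Regressors: constant, x_t, x_{t-1}, ..., x_{t-h}
        X = np.column_stack([np.ones(T)] +
                            [x[h_max - j:n - j] for j in range(h + 1)])
        yv = y[h_max:]
        beta, *_ = np.linalg.lstsq(X, yv, rcond=None)
        rss = np.sum((yv - X @ beta) ** 2)
        aic = np.log(rss / T) + 2 * h / T
        sic = np.log(rss / T) + h * np.log(T) / T
        hq = np.log(rss / T) + 2 * h * np.log(np.log(T)) / T
        results[h] = (aic, sic, hq)
    return results   # pick the h minimizing each criterion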

6.3.2 The Question of Estimating Distributed Lag Models

In addition to determining the number of lags, estimating a distributed lag model


poses a second problem. It is theoretically possible to estimate such a model by
OLS if the explanatory variable is assumed to be non-random. However, the greater
the number of lags, the higher the risk of multicollinearity between the lagged
explanatory variables. Under these conditions, it is known that the estimation of
the coefficients is imprecise, as coefficient standard deviations tend to be too high.
To overcome this limitation, assumptions are made about the structure of the lags
in order to reduce the number of parameters to be estimated. A distinction is made
between models with a finite number of distributed lags and models with an infinite
number of distributed lags.

6.4 Finite Distributed Lag Models: Almon Lag Models

Finite distributed lag models are polynomial distributed lag (PDL) models, also
known as Almon lag models (see Almon, 1962).
Almon’s technique avoids directly estimating the coefficients .δi , since it consists
in assuming that the true lag distribution can be approximated by a polynomial of
order q:


δi = α0 + α1 i + α2 i² + . . . + αq i^q = Σ_{j=0}^{q} αj i^j        (6.23)

with .h > q.
Consider, as an example, that the polynomial is of second order .(q = 2). Then
we have:

– δ0 = α0
– δ1 = α0 + α1 + α2
– δ2 = α0 + 2α1 + 4α2
– δ3 = α0 + 3α1 + 9α2
– . . .
– δh = α0 + hα1 + h²α2

Let us plug these values into (6.19):

Yt = μ + α0 Xt + (α0 + α1 + α2) Xt−1 + . . . + (α0 + hα1 + h²α2) Xt−h + εt   (6.24)

that is:

Yt = μ + α0 (Xt + Xt−1 + . . . + Xt−h)
       + α1 (Xt−1 + 2Xt−2 + . . . + hXt−h)
       + α2 (Xt−1 + 4Xt−2 + . . . + h²Xt−h) + εt                    (6.25)

The “new” explanatory variables are linear combinations of the lagged explana-
tory variables. Thus, a regression of Y on these “new” explanatory variables yields
estimates of the coefficients .α, which, in turn, allows us to determine the coefficients
.δ.

More generally, in matrix form, we can write for h lags and a polynomial of
degree q:

⎛ δ0 ⎞   ⎛ 1  0  0   · · ·  0   ⎞ ⎛ α0 ⎞
⎜ δ1 ⎟   ⎜ 1  1  1   · · ·  1   ⎟ ⎜ α1 ⎟
⎜ δ2 ⎟ = ⎜ 1  2  2²  · · ·  2^q ⎟ ⎜ α2 ⎟                            (6.26)
⎜ ..  ⎟   ⎜ ..                   ⎟ ⎜ ..  ⎟
⎝ δh ⎠   ⎝ 1  h  h²  · · ·  h^q ⎠ ⎝ αq ⎠

Let us note W the matrix:

    ⎛ 1  0  0   · · ·  0   ⎞
    ⎜ 1  1  1   · · ·  1   ⎟
W = ⎜ 1  2  2²  · · ·  2^q ⎟                                        (6.27)
    ⎜ ..                   ⎟
    ⎝ 1  h  h²  · · ·  h^q ⎠

The matrix form of (6.19) being given by:

Y = Iμ + Xδ + ε                                                     (6.28)

we can write:

Y = Iμ + XWα + ε                                                    (6.29)

where δ = (δ0, δ1, δ2, . . . , δh)' and α = (α0, α1, α2, . . . , αq)'.

It is then possible to estimate the regression (6.29) by OLS to obtain the estimator
α̂ of α and to deduce the estimator δ̂ of δ from (6.26).

The method just described assumes that the degree q of the polynomial used
for the approximation is known. In practice, this is not the case and q needs to be
determined. One possible technique is to start with a high value, .q = h − 1, and
test the significance of the associated coefficient (.αh−1 ) by means of a t-test. The
degree of the polynomial is then progressively reduced until a significant coefficient
appears.
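A minimal sketch of Almon's procedure, assuming h and q are already chosen, might look as follows in Python (the function name and the NumPy-based implementation are ours): it builds the matrix W of (6.27), estimates α by OLS as in (6.29), and recovers δ̂ = W α̂ as in (6.26).

import numpy as np

def almon_pdl(y, x, h, q):
    # Almon polynomial distributed lag: delta_i approximated by a polynomial
    # of degree q (Eq. (6.23)). Estimates mu and alpha by OLS on the
    # transformed regressors (Eq. (6.29)) and recovers delta = W alpha (6.26).
    n = len(y)
    T = n - h
    X = np.column_stack([x[h - j:n - j] for j in range(h + 1)])   # x_t,...,x_{t-h}
    W = np.vander(np.arange(h + 1), N=q + 1, increasing=True).astype(float)
    Z = np.column_stack([np.ones(T), X @ W])                      # constant + X W
    coefs, *_ = np.linalg.lstsq(Z, y[h:], rcond=None)
    mu_hat, alpha_hat = coefs[0], coefs[1:]
    delta_hat = W @ alpha_hat                                     # implied lag weights
    return mu_hat, alpha_hat, delta_hat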

6.5 Infinite Distributed Lag Models

In infinite distributed lag models, the effect of the explanatory variable is unlimited
in time. It is assumed, however, that the recent past has more influence than the
distant past, and that the weight of past observations tends to decrease steadily over
time.
Generally speaking, an infinite distributed lag model is written as:

Yt = μ + Σ_{i=0}^{∞} δi Xt−i + εt                                   (6.30)

or:

Yt = μ + Σ_{i=0}^{∞} δi L^i Xt + εt                                 (6.31)

In order to estimate the model, it is necessary to reduce it to a model with a finite


number of parameters to be estimated. To this end, a particular form is imposed on
the structure of the coefficients .δi . The two most commonly used forms are based
on the Koyck approach and the Pascal approach.

6.5.1 The Koyck Approach


The Koyck Transformation
Under the assumption that the coefficients .δi are of the same sign, Koyck (1954)
assumes that the lags decrease geometrically:

δi = λ i δ0
. (6.32)

where .i = 0, 1, 2, . . . and .0 < λ < 1.


Since .λ < 1, the relationship (6.32) expresses the fact that the coefficients .δi
decrease as we move further into the past: more recent observations are assigned

higher weights than past observations. The closer .λ is to 1, the slower the rate of
decrease of the coefficients, and the closer .λ is to 0, the faster that rate.
Substituting (6.32) into (6.30), we have:

. Yt = μ + δ0 Xt + λδ0 Xt−1 + λ2 δ0 Xt−2 + . . . + λi δ0 Xt−i + . . . + εt (6.33)

or:
 
Yt = μ + δ0 Xt + λXt−1 + λ2 Xt−2 + . . . + λi Xt−i + . . . + εt
. (6.34)

The associated polynomial .D(L) is written as:

D(L) = δ0 + λδ0 L + λ2 δ0 L2 + . . . + λi δ0 Li + . . .
. (6.35)

We can rewrite (6.34) as follows:

Yt = μ + D(L)Xt + εt
. (6.36)

or:

D(L)−1 Yt = D(L)−1 μ + D(L)−1 D(L)Xt + D(L)−1 εt


. (6.37)

Knowing that:
 
D(L) = δ0 1 + λL + λ2 L2 + . . . + λi Li + . . .
. (6.38)

represents the sum of the terms of a geometric sequence, we have:

D(L) = δ0 / (1 − λL)                                                (6.39)

and therefore:

D(L)⁻¹ = (1 − λL) / δ0                                              (6.40)

Substituting into (6.37), we get:

. (1 − λL) Yt = (1 − λL) μ + δ0 Xt + (1 − λL) εt (6.41)

That is:

Yt − λYt−1 = (1 − λ) μ + δ0 Xt + εt − λεt−1
. (6.42)

Hence:

Yt = λYt−1 + (1 − λ) μ + δ0 Xt + εt − λεt−1
. (6.43)

This gives an autoregressive model with autocorrelated errors of order 1. This


transformation, from a distributed lag model (Eq. (6.30)) to an autoregressive
model (Eq. (6.43)), is called the Koyck transformation. It significantly reduces the
number of parameters to be estimated. Indeed, if we compare Eqs. (6.30) and (6.43),
it appears that, instead of estimating the constant term .μ and an infinite number of
parameters .δi , we now only need to estimate three parameters: the constant term
.μ, .δ0 , and .λ. The risk of multicollinearity is consequently greatly reduced, if not

eliminated.
A few remarks are in order. Firstly, the Koyck transformation shows that we can
move from a distributed lag model to an autoregressive model. The endogenous
lagged variable, .Yt−1 , now appears as an explanatory variable of .Yt , which has
important implications in terms of estimation. We know that one of the basic
assumptions of the OLS method is that the matrix of explanatory variables is non-
random. Such an assumption is violated here since .Yt−1 , like .Yt , is a random
variable. However, this assumption can be reformulated by writing that the matrix
of explanatory variables can contain random variables, provided that they are not
correlated with the error term (see Chap. 3). It will therefore be necessary to check
this characteristic during the estimation phase; we will return to this point when
discussing estimation methods (see below).
Secondly, the error term of the model (6.43) is .εt − λεt−1 , and no longer only
.εt as was the case in the original model (6.30). Let us posit .ηt = εt − λεt−1 . It

appears that while the .εt are indeed non-autocorrelated, this is not the case for the
.ηt , a characteristic which must be taken into account during the estimation phase

(see below).
Thirdly, it is possible to define median and mean lags in the Koyck approach,
which makes it possible to quantify the speed with which the dependent variable
.Yt responds to a unit variation in the explanatory variable .Xt . The median lag

corresponds to the number of periods required for 50% of the total effect of a unit
change in the explanatory variable .Xt on .Yt to be reached. It can be shown that, in
the Koyck model, the median lag is given by .log 2/ log λ. Thus, the higher the value
of .λ, the greater the median lag and the lower the speed of adjustment. On the other
hand, the mean lag is defined by:


D̄ = Σ_{i=0}^{h} i·δi / Σ_{i=0}^{h} δi                              (6.44)

or, in the case of the Koyck model:

D̄ = λ / (1 − λ)                                                     (6.45)

The median and mean lags can thus be used to assess the speed with which .Yt
adjusts following a unit variation in .Xt .

Estimation: The Instrumental Variables Method


As previously mentioned, it is not possible to apply the OLS method directly to
estimate the Koyck model. This stems from two reasons:

– The lagged endogenous variable is among the explanatory variables, so the


matrix of explanatory variables is not non-random.
– The error term .ηt = εt − λεt−1 of the Koyck model exhibits autocorrelation.

If we wish to apply OLS to the Koyck model, we need to ensure that the lagged
endogenous variable .Yt−1 is independent of the error term .ηt . However, such an
assumption does not hold. Indeed, in accordance with (6.43), .εt−1 has an impact on
.εt . Similarly, if we write Eq. (6.43) in .t − 1, it is clear that .εt−1 has an impact on

.Yt−1 . There is thus a link between .εt and .Yt−1 .

The consequence of this dependence between the lagged endogenous variable


and the error term .εt is that the OLS estimators are no longer consistent. In
other words, even if the sample size grows indefinitely, the OLS estimators do not
approach their true population values. We know that, in such a case, it is possible to
use the instrumental variables method whose estimator is given by (see Chap. 5):

β̂_IV = (Z'X)⁻¹ Z'Y                                                  (6.46)

where Z is the instrument matrix.


In the case of the Koyck model (Eq. (6.43)):

Yt = λYt−1 + (1 − λ) μ + δ0 Xt + εt − λεt−1
. (6.47)

only one instrument needs to be found, since only the variable .Yt−1 needs to
be instrumented (the variable .Xt is indeed independent of the error term, by
assumption). We frequently use .Xt−1 as the instrument of .Yt−1 . We then have the
following matrix .Z:
    ⎛ 1   X1   X0   ⎞
    ⎜ 1   X2   X1   ⎟
Z = ⎜ .   ..   ..   ⎟                                               (6.48)
    ⎝ 1   XT   XT−1 ⎠

and the estimator of the instrumental variables is written:6


        ⎛ T        ΣXt         ΣYt−1       ⎞⁻¹ ⎛ ΣYt       ⎞
β̂_IV =  ⎜ ΣXt      ΣXt²        ΣXt Yt−1    ⎟   ⎜ ΣXt Yt    ⎟        (6.49)
        ⎝ ΣXt−1    ΣXt−1 Xt    ΣXt−1 Yt−1  ⎠   ⎝ ΣXt−1 Yt  ⎠
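For illustration, a minimal Python sketch of this instrumental variables estimation, with X_{t−1} instrumenting Y_{t−1} as above, could be written as follows (the function name, the NumPy implementation, and the ordering of the regressors are our own choices).

import numpy as np

def koyck_iv(y, x):
    # Instrumental variables estimation of the Koyck model (6.43),
    # with X_{t-1} used as instrument for Y_{t-1} (matrix Z of Eq. (6.48)).
    Y, Ylag = y[1:], y[:-1]
    Xt, Xlag = x[1:], x[:-1]
    n = len(Y)
    X_mat = np.column_stack([np.ones(n), Xt, Ylag])   # explanatory variables
    Z_mat = np.column_stack([np.ones(n), Xt, Xlag])   # instruments
    # beta_IV = (Z'X)^(-1) Z'Y, Eq. (6.46)
    beta_iv = np.linalg.solve(Z_mat.T @ X_mat, Z_mat.T @ Y)
    return beta_iv   # [(1 - lambda) * mu, delta_0, lambda]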

Remark 6.1 It is not always easy to find the “right” instrumental variables. In
these circumstances, the instrumental variables method may be of limited practical
interest, and it is preferable to resort to the maximum likelihood method. In the
case of the Koyck model, the essential role of the method of instrumental variables
is to obtain a consistent estimator of .β to serve as the initial value of an iterative
procedure, such as the maximum likelihood method.

Remark 6.2 (The Sargan Test) Sargan (1964) developed a test of instrument
validity. The test can be described sequentially as follows:

– Split the variables appearing in the regression model into two groups: the group
of variables independent of the error term (noted .X1 , .X2 , . . . ., .Xk1 ) and the group
of variables that are not independent of the error term (noted .W1 , .W2 , . . . , .Wk2 ).
– Note .Z1 , .Z2 , . . . , .Zk3 the instruments chosen for the variables W , with .k3 ≥ k2.
– Estimate the parameters of the model by the instrumental variables method, i.e.,
β̂_IV = (Z'X)⁻¹ Z'Y, and deduce the estimated series of residuals et.
– Regress the residuals .et on a constant, the variables X and the variables Z.
Determine the coefficient of determination .R 2 of the estimated regression.
– Calculate the Sargan test statistic:

S = (T − k − 1)R²                                                   (6.50)

where T is the number of observations and k is the number of variables in the


original regression model.
– Under the null hypothesis of validity of all instruments, the statistic S follows
a Chi-squared distribution with r degrees of freedom, where .r = k3 − k2. If
the calculated value of the statistic S is less than the theoretical Chi-squared
value, the null hypothesis of instrument validity is not rejected. If the calculated
value of the statistic S is greater than the theoretical Chi-squared value, the null
hypothesis is rejected, meaning that at least one instrument is not valid in the
sense that it is not independent of the error term. In the latter case, the estimators
of the instrumental variables are not valid.
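A sketch of the Sargan test along the steps above could look as follows in Python (the function and argument names are ours; SciPy is assumed to be available for the Chi-squared distribution, and the matrices X_exog, W_endog, and Z_instr are assumed to be two-dimensional arrays).

import numpy as np
from scipy import stats

def sargan_test(y, X_exog, W_endog, Z_instr):
    # Sargan test following the steps above. X_exog: regressors independent of
    # the error term, W_endog: regressors that are not, Z_instr: instruments
    # chosen for W_endog (k3 >= k2). Two-stage least squares is used for the
    # IV step, which reduces to (Z'X)^(-1) Z'Y in the just-identified case.
    n = len(y)
    X_all = np.column_stack([np.ones(n), X_exog, W_endog])   # original regressors
    Z_all = np.column_stack([np.ones(n), X_exog, Z_instr])   # instrument matrix
    X_hat = Z_all @ np.linalg.lstsq(Z_all, X_all, rcond=None)[0]
    beta_iv = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    e = y - X_all @ beta_iv                                  # IV residuals
    coef = np.linalg.lstsq(Z_all, e, rcond=None)[0]          # e on constant, X, Z
    r2 = 1 - np.sum((e - Z_all @ coef) ** 2) / np.sum((e - e.mean()) ** 2)
    k = X_exog.shape[1] + W_endog.shape[1]                   # regressors in the model
    S = (n - k - 1) * r2                                     # statistic (6.50)
    r = Z_instr.shape[1] - W_endog.shape[1]                  # degrees of freedom
    return S, r, 1 - stats.chi2.cdf(S, r)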

6 The sums run from 1 to T .



The Partial Adjustment Model


The partial adjustment model is an example of an application of the Koyck model
(see in particular Nerlove, 1958). This model includes lagged endogenous variables
among the explanatory variables. The underlying idea is that, due to the presence
of rigidities or various constraints, the dependent variable cannot reach the desired
value in a single period. In other words, the adjustment to the desired value takes
some time. Generally speaking, the partial adjustment model is written:

Yt∗ = α + βXt + εt
. (6.51)

where .Yt∗ denotes the desired level of the dependent variable .Yt and .Xt is an
explanatory variable. As the variable .Yt∗ is unobservable, we express it as a function
of .Yt by using a partial adjustment mechanism of the type:

Yt − Yt−1 = λ (Yt∗ − Yt−1)                                          (6.52)

where .0 ≤ λ ≤ 1 is called the adjustment coefficient. The variation .(Yt − Yt−1 )


corresponds to the observed variation, . Yt∗ − Yt−1 being the desired variation.
Substituting (6.51) into (6.52) gives:

Yt − Yt−1 = λ (α + βXt + εt − Yt−1 )


. (6.53)

that is:

Yt = (1 − λ) Yt−1 + λα + λβXt + λεt


. (6.54)

This partial adjustment model has a similar structure to the Koyck model, the
error term being simpler since it is only multiplied by the constant .λ.

The Adaptive Expectations Model


The adaptive expectations model is another example of an application of the
Koyck model. In this type of model, the values of the explained variable are
a function, not of the observed values of the explanatory variables, but of the
anticipated or expected values. Generally speaking, we can write an adaptive
expectations model as follows:

. Yt = α + βXt∗ + εt (6.55)

where .Xt∗ denotes the expected value of the explanatory variable .Xt . As the variable

.Xt is generally not directly observable, we assume an adaptive training process for

expectations of the type:

Xt∗ − X∗t−1 = λ (Xt − X∗t−1)                                         (6.56)

with 0 ≤ λ ≤ 1, λ is called the expectation coefficient. If λ = 0, then Xt∗ = X∗t−1,
which means that expectations remain identical from period to period (static
expectations). If λ = 1, then Xt∗ = Xt, which implies that the anticipated value is
equal to the observed value (naive expectations).
In line with the adaptive expectations hypothesis, expectations are revised each
period according to the information provided by the last value actually taken by
the variable. Low values of λ indicate small, slow adjustments in expectations, while
high values imply rapid changes.
We can rewrite (6.56) as follows:

Xt∗ = λXt + (1 − λ) X∗t−1                                            (6.57)

Substituting (6.57) into (6.55), we obtain:


Yt = α + β [λXt + (1 − λ) X∗t−1] + εt                                (6.58)

This model can be reduced to a Koyck model. Let us write the model (6.55) in
(t − 1) and multiply each member by .(1 − λ). This gives us:
.


(1 − λ) Yt−1 = (1 − λ) α + (1 − λ) βX∗t−1 + (1 − λ) εt−1             (6.59)

By subtracting Eqs. (6.58) and (6.59), we get:

Yt − (1 − λ) Yt−1 = α − (1 − λ) α + λβXt + εt − (1 − λ) εt−1


. (6.60)

that is:

Yt = λα + λβXt + (1 − λ) Yt−1 + εt − (1 − λ) εt−1


. (6.61)

This gives us a structure similar to that of the Koyck model.

Remark 6.3 It is possible to combine partial adjustment and adaptive expectations


models. The dependent variable is then the desired level of the variable .Yt , the
explanatory variable being the expected value of the variable .Xt . The result is a
model in which the endogenous variable lagged by one period, but also by two
periods, is included among the explanatory variables. An economic illustration of
such a model is provided by Friedman’s permanent income model.

6.5.2 The Pascal Approach

The Pascal approach is another technique aimed at imposing a particular form on


the structure of the coefficients .δi in order to obtain a model with a finite number
of parameters to be estimated. Such an approach was adopted by Solow (1960) and
makes it possible to account for a distribution such that the coefficients are initially

low, increase until they reach a maximum, and then decrease (a kind of bell curve).
With this approach, the coefficients .δi are distributed as follows:

δi = (1 − λ)^{r+1} C^i_{r+i} λ^i                                     (6.62)

where C^i_{r+i} is the coefficient of Newton's binomial, 0 ≤ λ ≤ 1 and r ∈ N.
The Pascal approach is a generalization of the Koyck approach. If we posit .r = 0,
we find the geometric distribution of Koyck.
Using Eq. (6.30), the distributed lag model is expressed as follows:

Yt = μ + (1 − λ)^{r+1} Σ_{i=0}^{∞} C^i_{r+i} λ^i Xt−i + εt           (6.63)

The associated D(L) polynomial is written as:

D(L) = (1 − λ)^{r+1} Σ_{i=0}^{∞} C^i_{r+i} λ^i L^i                   (6.64)

which can also be expressed as:


D(L) = δ0 / (1 − λL)^{r+1}                                           (6.65)
The model (6.63) becomes:

Yt = μ + D(L)Xt + εt
. (6.66)

or:

D(L)−1 Yt = D(L)−1 μ + D(L)−1 D(L)Xt + D(L)−1 εt


. (6.67)

– For r = 0, we have:

  D(L) = δ0 / (1 − λL)                                               (6.68)

  and we find the Koyck model.

– For r = 1, we have:

  D(L) = δ0 / (1 − λL)²                                              (6.69)

  or:

  D(L)⁻¹ = (1 − λL)² / δ0                                            (6.70)

Substituting in (6.67), we get:

. (1 − λL)2 Yt = (1 − λL)2 μ + δ0 Xt + (1 − λL)2 εt (6.71)

Noting that .(1 − λL)2 = 1 − 2λL + λ2 L2 , we get:


 
Yt = 2λYt−1 − λ²Yt−2 + (1 − 2λ + λ²) μ + δ0 Xt + εt − 2λεt−1 + λ²εt−2   (6.72)

which corresponds to a second-order autoregressive model.


– For r = 2, we have:

  D(L)⁻¹ = (1 − λL)³ / δ0                                            (6.73)

  Substituting in (6.67), we get:

  Yt = 3λYt−1 − 3λ²Yt−2 + λ³Yt−3 + (1 − 3λ + 3λ² − λ³) μ
       + δ0 Xt + εt − 3λεt−1 + 3λ²εt−2 − λ³εt−3                      (6.74)

which corresponds to an autoregressive model of order 3.

Generally speaking, the autoregressive form associated with the distributed lag
model in which the coefficients are distributed according to (6.62) has .(r +1) lagged
endogenous variables whose associated coefficients are a function of .λ.

Remark 6.4 In order to determine the value of r, Maddala and Rao (1971) suggest
adopting a sweeping approach: we give ourselves a set of possible values for r and
select the value that maximizes the adjusted coefficient of determination.

6.6 Autoregressive Distributed Lag Models


6.6.1 Writing the ARDL Model

In autoregressive distributed lag (ARDL) models, the lagged values of the depen-
dent variable are added to the present and past values of the “usual” explanatory
variables in the set of explanatory variables.7
Generally speaking, an autoregressive distributed lag model is written:

Yt = μ + φ1 Yt−1 + . . . + φp Yt−p + δ0 Xt + δ1 Xt−1 + . . . + δh Xt−h + εt


. (6.75)

7 We will not deal in detail with ARDL models in this book. For a more exhaustive presentation,
readers can refer to Greene (2020).

that is:


Yt = μ + Σ_{i=1}^{p} φi Yt−i + Σ_{j=0}^{h} δj Xt−j + εt              (6.76)

where .εt is a non-autocorrelated homoskedastic process. By introducing the lag


operator L, we can write:

Ф(L)Yt = μ + D(L)Xt + εt
. (6.77)

with .Ф(L) = 1 − φ1 L − φ2 L2 − . . . − φp Lp and .D(L) = δ0 + δ1 L + . . . + δh Lh .


Such an autoregressive distributed lag model is denoted .ARDL(p, h). We
observe that the Koyck model is a special case of the .ARDL(p, h) model in which
.p = 1 and .h = 0.

.ARDL(p, h) models can be estimated by the OLS method as long as the

error term .εt is assumed to have the “good” statistical properties. Because of this
characteristic, the OLS estimator is an efficient estimator.

6.6.2 Calculation of ARDL Model Weights

Let us write the distributed lag form of the ARDL model (6.77). To do this, divide
each term of (6.77) by the autoregressive lag polynomial .Ф(L):

Yt = μ/Ф(L) + [D(L)/Ф(L)] Xt + εt/Ф(L)                               (6.78)

which can also be written:

Yt = μ/(1 − φ1 − . . . − φp) + Σ_{j=0}^{∞} αj Xt−j + Σ_{l=0}^{∞} θl εt−l   (6.79)

where the coefficients αj, j = 0, 1, . . . , ∞, are the terms associated with the ratio
of the polynomials D(L) and Ф(L). Thus, α0 is the coefficient of 1 in D(L)/Ф(L), α1 is
the coefficient of L in D(L)/Ф(L), α2 is the coefficient of L² in D(L)/Ф(L), and so on. Similarly,
the coefficients θl, l = 0, 1, . . . , ∞, are the terms associated with the ratio 1/Ф(L).
The model (6.79) has a very general lag structure and is referred to as a rational
lag model by Jorgenson (1966). The long-term multiplier associated with such a
model is given by:

Σ_{j=0}^{∞} αj = D(1)/Ф(1)                                           (6.80)
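In practice, the weights αj can be obtained recursively from Ф(L)α(L) = D(L), i.e. αj = δj + φ1 αj−1 + . . . + φp αj−p (with δj = 0 for j > h). A minimal Python sketch of this recursion (function name and implementation ours) is given below; for the Koyck special case ARDL(1, 0) it reproduces the geometric weights λ^j δ0.

import numpy as np

def ardl_weights(phi, delta, n_weights=20):
    # Distributed lag weights alpha_j of an ARDL(p, h) model (Eq. (6.79)),
    # from the expansion of D(L)/Phi(L).
    # phi: [phi_1, ..., phi_p], delta: [delta_0, ..., delta_h].
    alpha = np.zeros(n_weights)
    for j in range(n_weights):
        d_j = delta[j] if j < len(delta) else 0.0
        # alpha_j = delta_j + phi_1*alpha_{j-1} + ... + phi_p*alpha_{j-p}
        alpha[j] = d_j + sum(phi[i] * alpha[j - 1 - i]
                             for i in range(len(phi)) if j - 1 - i >= 0)
    return alpha

# Koyck special case ARDL(1, 0): phi = [lambda], delta = [delta_0]
print(ardl_weights([0.5], [0.4], 5))   # [0.4, 0.2, 0.1, 0.05, 0.025]

The long-term multiplier (6.80) is then simply the sum of these weights, or equivalently D(1)/Ф(1).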

6.7 Empirical Application

Consider the following two series:

– The returns of the Hang Seng Index of the Hong Kong Stock Exchange: RH K
– The returns of the Japanese index NI KKEI 225: RNI KKEI

The data are weekly and cover the period from the week of December 1, 1969,
to that of July 5, 2021, i.e., a number of observations .T = 2 693 (data source:
Macrobond). Suppose we wish to explain the returns of the Hang Seng Index by
the present and lagged returns of the Japanese index. The dependent variable is
therefore RH K and the explanatory variables are the present and lagged values of
RNI KKEI . We seek to estimate the following distributed lag model:

RHKt = μ + δ0 RNIKKEIt + δ1 RNIKKEIt−1 + . . . + δh RNIKKEIt−h + εt   (6.81)

Let us start by determining the number of lags to take into account. To do this, we
estimate the model (6.81) for various values of h and select the one that minimizes
the information criteria. Table 6.1 shows the values taken by the three criteria AIC,
SIC, and Hannan-Quinn (HQ) for values of h ranging from 1 to 6. These results lead
us to select a number of lags h equal to 1 according to the SIC and HQ criteria and
2 for the AIC criterion. For reasons of parsimony, and given that two out of three
criteria favor a number of lags equal to 1, we choose .h = 1.8
Let us assume a geometric distribution for the lags (Koyck model). We thus seek
to estimate the following model:

RH Kt = λRH Kt−1 + (1 − λ) μ + δ0 RNI KKEIt + εt − λεt−1


. (6.82)

Since errors are autocorrelated in this model, we estimate it by applying the


Newey-West correction (see Chap. 4). The results obtained are shown in Table 6.2.

Table 6.1 Determining the number of lags

h    AIC        SIC        HQ
1    −3.7672    −3.7607    −3.7649
2    −3.7673    −3.7585    −3.7641
3    −3.7666    −3.7556    −3.7626
4    −3.7655    −3.7524    −3.7608
5    −3.7649    −3.7495    −3.7593
6    −3.7656    −3.7480    −3.7593

Values in bold correspond to values minimizing information criteria

8 Note further that the values taken by the AIC criterion for .h = 1 and .h = 2 are almost identical.

Table 6.2 OLS model estimation with the Newey-West correction


Dependent variable: RHK
Variable Coefficient Std. error t-Statistic Prob.
C 0.001262 0.000742 1.700514 0.0891
RHK(-1) 0.113356 0.024923 4.548237 0.0000
RNIKKEI 0.484588 0.034404 14.08505 0.0000
R-squared 0.126952 Mean dependent var 0.001940
Adjusted R-squared 0.126303 S.D. dependent var 0.039058
S.E. of regression 0.036508 Akaike info criterion .−3.781467

Sum squared resid 3.583954 Schwarz criterion .−3.774894


Log likelihood 5092.855 Hannan-Quinn criterion .−3.779090
F-statistic 195.5068 Durbin-Watson stat 1.998281
Prob(F-statistic) 0.000000 Wald F-statistic 117.1121
Prob(Wald F-statistic) 0.000000

We have .λ̂ = 0.1134. This value is small, which means that the decay rate of
the coefficients of the distributed lag model is rapid. In other words, the influence
of past values of RN I KKEI on RH K decreases rapidly. The model can also be
written as:

RHKt = μ + δ0 RNIKKEIt + λδ0 RNIKKEIt−1
     + λ²δ0 RNIKKEIt−2 + . . . + λ^i δ0 RNIKKEIt−i + . . . + εt       (6.83)
 
Knowing that (1 − λ̂) μ̂ = 0.0013, we deduce:

μ̂ = 0.0013 / (1 − 0.1134) = 0.0014                                   (6.84)

The estimation of the model (6.83) is therefore given by:

R̂HKt = 0.0014 + 0.4846 RNIKKEIt + 0.1134 × 0.4846 RNIKKEIt−1
      + 0.1134² × 0.4846 RNIKKEIt−2 + . . .                           (6.85)

that is:

R̂HKt = 0.0014 + 0.4846 RNIKKEIt + 0.0549 RNIKKEIt−1
      + 0.0062 RNIKKEIt−2 + . . .                                     (6.86)

We can see that the value of the coefficients associated with the variable
RNI KKEI decreases rapidly as the number of lags increases. We can calculate
the median lag, given by .log 2/ log λ̂, i.e., 0.3184: following a unit variation of

RNI KKEI , 50% of the total variation of RH K is achieved in just over a day
and a half. As the value of .λ̂ is small, so is the median lag, highlighting a rapid
adjustment. It is also possible to calculate the mean lag:

D̄ = λ̂ / (1 − λ̂) = 0.1278                                            (6.87)

The mean lag is around 0.13: it takes around half a day for the effect of a variation
in RNI KKEI to be reflected in RH K, which is rapid.

Conclusion

This chapter has introduced a first category of dynamic models: distributed lag
models. There is a second category of dynamic models, generally referred to as time
series models, in which the lagged endogenous variable is one of the explanatory
variables. These are the subject of the next chapter, which presents the basics of
time series econometrics.

The Gist of the Chapter

Distributed lag model
  Definition:             Yt = μ + δ0 Xt + δ1 Xt−1 + . . . + δh Xt−h + εt
  Short-term multiplier:  δ0 = ΔYt/ΔXt
  Long-term multiplier:   D(1) = δ0 + δ1 + . . . + δh

Lag form
  Almon:   δi = α0 + α1 i + α2 i² + . . . + αq i^q = Σ_{j=0}^{q} αj i^j, h > q
  Koyck:   δi = λ^i δ0, i = 0, 1, 2, . . . and 0 < λ < 1
  Pascal:  δi = (1 − λ)^{r+1} C^i_{r+i} λ^i, where 0 ≤ λ ≤ 1 and r ∈ N

ARDL model:    Yt = μ + Σ_{i=1}^{p} φi Yt−i + Σ_{j=0}^{h} δj Xt−j + εt
Lag operator:  L^i Xt = Xt−i

Further Reading

In addition to the references cited in the chapter, readers interested in distributed lag
models can consult Nerlove (1958) and Griliches (1967). A detailed presentation
can also be found in Davidson and MacKinnon (1993) and Gujarati et al. (2017).
7 An Introduction to Time Series Models

Time series econometrics is a branch of econometrics that has undergone many


developments over the last 40 years.1 We offer here an introduction to time series
models. After laying down a number of definitions, we focus on the essential
concept of stationarity. We present the Dickey-Fuller unit root test for testing the
non-stationary nature of a time series. We then expose the basic models of time
series – the autoregressive moving-average models (ARMA models) – and the
related Box and Jenkins methodology. A multivariate extension is proposed through
the presentation of VAR (vector autoregressive) models. Finally, we present the
concepts of non-stationary time series econometrics by studying the notions of
cointegration and error-correction models.

7.1 Some Definitions


7.1.1 Time Series

A time series is a sequence of real numbers, indexed by relative integers such as


time. For each instant of time, the value of the quantity under study .Yt is called a
random variable.2 The set of values .Yt when t varies is called a random process
(or stochastic process): .{Yt , t ∈ Z}. A time series is thus the realization of a random
process.
An example of a time series is provided by Fig. 7.1 which represents Standard
and Poor’s 500 stock index (denoted SP ) at monthly frequency over the period

1 This chapter takes up a number of developments appearing in the work by Lardic and Mignon

(2002), which interested readers may refer to for further details.


2 There are continuous and discrete random variables. We are only interested in discrete variables

here.


Fig. 7.1 Standard and Poor’s 500 stock index series, 1980.01–2021.06

Table 7.1 Standard and Poor's 500 stock index series

            SP
1980.01     386.0129
1980.02     395.7329
1980.03     353.9680
1980.04     344.3516
...         ...
2021.03     3997.9635
2021.04     4199.2766
2021.05     4192.7108
2021.06     4246.8837

Data source: Robert Shiller's website (www.econ.yale.edu/~shiller)

from January 1980 to June 2021. The first and last values of this series are given in
Table 7.1: for each month, we have a value of the stock market index.
As the class of random processes is very large, time series analysis initially
focused on a particular class of processes: stationary random processes. These
processes are characterized by the fact that their statistical properties do not change
over time.

7.1.2 Second-Order Stationarity

The notion of stationarity of a time series was briefly discussed in the first chapter.
We have seen that, when working with time series, it is necessary to study their
characteristics in terms of stationarity before analyzing and attempting to model
them. Here we present only the concept of second-order stationarity or weak
stationarity, which is the notion of stationarity usually retained in time series
econometrics.3

Definition 7.1 A process Yt is second-order stationary if:

– (1) E(Yt²) < ∞ ∀t ∈ Z.
– (2) E(Yt) = m ∀t ∈ Z.
– (3) Cov(Yt, Yt+h) = γh, ∀t, h ∈ Z, where γ is the autocovariance function of
the process.

Condition (1) means that the process is of second order: second-order moments,
such as variance, are finite and independent of time. Condition (2) means that the
expectation of the process is constant over time (mean stationarity). Condition (3)
reflects the fact that the covariance between two periods t and .t + h is solely a
function of the time difference, h. Note that the variance .σY2 = Cov (Yt , Yt ) = γ0
is also independent of time. The fact that the variance is constant over time reflects
the property of homoskedasticity.
In the remainder of the chapter, the term stationary will refer to the concept of
second-order stationarity.

7.1.3 Autocovariance Function, Autocorrelation Function, and Partial Autocorrelation Function

We have already mentioned the notions of autocovariance function and autocorrela-


tion function in Chap. 4. Let us recall here the definitions of these central concepts
for the study of time series.

Definition 7.2 Let Yt be a random process with finite variance. The autocovariance
function .γh of .Yt is defined as:

γh = Cov (Yt , Yt+h ) = E [[Yt − E (Yt )] [Yt+h − E (Yt+h )]]


. (7.1)

The autocovariance function measures the covariance between two values of the
same series .Yt separated by a certain time h.

3 Fora more detailed study of stationarity and a definition of the various concepts, see in particular
Lardic and Mignon (2002).

Theorem 7.1 The autocovariance function of a stationary process .Yt has the
following properties:
 
– .γ0 = Cov (Yt , Yt ) = E [Yt − E (Yt )]2 = V (Yt ) = σY2 ≥ 0
– .|γh | ≤ γ0
– .γh = γ−h : the autocovariance function is an even function,
.Cov (Yt , Yt+h ) = Cov (Yt , Yt−h ) .

Remark 7.1 We restrict ourselves here to the analysis of series in the time domain.
However, it is possible to study a series in the spectral or frequency domain. The
analog of the autocovariance function in the spectral domain is called the spectral
density. This book does not deal with spectral analysis. Interested readers should
refer to Hamilton (1994) or Greene (2020).

Definition 7.3 Let .Yt be a stationary process. The autocorrelation function .ρh is
defined as:
ρh = γh / γ0,  h ∈ Z                                                  (7.2)

The autocorrelation function measures the temporal links between the various
components of the series .Yt . Specifically:

ρh = Cov(Yt, Yt+h) / (σYt σYt+h) = γh / (√γ0 √γ0) = γh / γ0           (7.3)

By virtue of the definitions of covariance and standard deviations, we can write:

ρh = [Σ_{t=1}^{T−h} (Yt − Ȳ)(Yt+h − Ȳ)] / [√(Σ_{t=1}^{T−h} (Yt − Ȳ)²) √(Σ_{t=1}^{T−h} (Yt−h − Ȳ)²)]   (7.4)

where Ȳ is the mean of the series Yt calculated on (T − h) observations:

Ȳ = (1/(T − h)) Σ_{t=1}^{T−h} Yt                                      (7.5)

To simplify the calculations, and since in practice only a sample is available, we


can define the sampling autocorrelation function (or estimated autocorrelation

function) as follows:

ρ̂h = [Σ_{t=1}^{T−h} (Yt − Ȳ)(Yt+h − Ȳ)] / [Σ_{t=1}^{T} (Yt − Ȳ)²]    (7.6)

where Ȳ represents the mean of the series Yt calculated over T observations:

Ȳ = (1/T) Σ_{t=1}^{T} Yt                                              (7.7)

For a sufficiently large number of observations T , expressions (7.4) and (7.6)


give very similar results.
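A minimal Python sketch of the sampling autocorrelation function (7.6) — the function name and implementation are ours — is given below.

import numpy as np

def sample_acf(y, max_lag):
    # Sampling autocorrelation function rho_hat_h of Eq. (7.6): the numerator
    # runs over T - h products, the denominator over the full sample of T
    # squared deviations from the overall mean.
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    denom = np.sum((y - ybar) ** 2)
    acf = [np.sum((y[:len(y) - h] - ybar) * (y[h:] - ybar)) / denom
           for h in range(max_lag + 1)]
    return np.array(acf)   # acf[0] = 1 by construction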

Remark 7.2 The graph of the sampling autocorrelation function is called a correl-
ogram. An example is shown in Fig. 7.2, with the number of lags on the x-axis and
the value of the autocorrelation function on the y-axis.

Theorem 7.2 The autocorrelation function of a stationary process .Yt has the
following properties:

– .ρ0 = 1
– .|ρh | ≤ ρ0
– .ρh = ρ−h : even function.

Fig. 7.2 Example of a correlogram

The practical interest of the autocorrelation function can be found in particular


in the study of ARMA processes (see below). Another fundamental function in
the study of time series is the partial autocorrelation function. We have already
mentioned the notion of partial correlation coefficient in Chap. 3.
The partial autocorrelation function measures the correlation between .Yt and
.Yt−h , the influence of the variables .Yt−h+i (for .i < h) having been removed.

Let .ρh and .φhh be the autocorrelation and partial autocorrelation functions of
.Yt , respectively. Let .Ph be the symmetric matrix formed by the .(h − 1) first

autocorrelations of .Yt :
     ⎡ 1      ρ1     · · ·   ρh−1 ⎤
     ⎢ ρ1     1               ..  ⎥
Ph = ⎢ ..             ..      ..  ⎥                                   (7.8)
     ⎣ ρh−1   · · ·           1   ⎦

The partial autocorrelation function is given by:


φhh = |Ph*| / |Ph|                                                    (7.9)

where |Ph| is the determinant of the matrix Ph. The matrix Ph* is given by:

      ⎡ 1      ρ1    · ·   ρh−2   ρ1 ⎤
      ⎢ ρ1     1                  ρ2 ⎥
Ph* = ⎢ ..            ..          .. ⎥                                (7.10)
      ⎣ ρh−1   · ·           1    ρh ⎦

Ph* is thus the matrix Ph in which the last column has been replaced by the vector
[ρ1 . . . ρh]'.

The partial autocorrelation function is written as:




        ⎧ ρ1                                                                   if i = 1
φii =   ⎨
        ⎩ [ρi − Σ_{j=1}^{i−1} φi−1,j ρi−j] / [1 − Σ_{j=1}^{i−1} φi−1,j ρj]     for i = 2, . . . , h
                                                                      (7.11)

and

φij = φi−1,j − φii φi−1,i−j   for i = 2, . . . , h and j = 1, . . . , i − 1.   (7.12)

This algorithm is known as the Durbin algorithm (Durbin, 1960). It is based


on the Yule-Walker equations (see below). The partial autocorrelation coefficients
are given by the autocorrelation coefficients and by a set of recursive equations.
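A minimal Python sketch of the Durbin recursion (7.11)–(7.12), taking the autocorrelations ρ1, . . . , ρh as input (the function name and array layout are ours), is given below.

import numpy as np

def pacf_durbin(rho):
    # Partial autocorrelations phi_hh from rho = [rho_1, ..., rho_h]
    # using the recursion (7.11)-(7.12).
    h = len(rho)
    phi = np.zeros((h + 1, h + 1))      # phi[i, j] stores phi_{ij}
    pacf = np.zeros(h + 1)
    phi[1, 1] = pacf[1] = rho[0]
    for i in range(2, h + 1):
        num = rho[i - 1] - sum(phi[i - 1, j] * rho[i - 1 - j] for j in range(1, i))
        den = 1 - sum(phi[i - 1, j] * rho[j - 1] for j in range(1, i))
        phi[i, i] = pacf[i] = num / den
        for j in range(1, i):
            phi[i, j] = phi[i - 1, j] - phi[i, i] * phi[i - 1, i - j]
    return pacf[1:]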

7.2 Stationarity: Study of the Autocorrelation Function and Unit Root Test

7.2.1 Study of the Autocorrelation Function

In addition to the graphical representation of the series itself, a first idea concerning
the stationarity or not of a series can be provided by the autocorrelation function.
We know that the autocorrelation function of a stationary time series decreases
very rapidly. If no autocorrelation coefficient is significantly different from zero,
we say that the process has no memory. It is therefore stationary, as in the case of
white noise. If, for example, only the first-order autocorrelation is significant, the
process is said to have a short memory. Conversely, the autocorrelation function of
a non-stationary time series decreases very slowly, indicating a strong dependence
between observations.
Figures 7.3 and 7.4 represent the correlogram of a stationary series. It can be seen
that the autocorrelation function decreases very rapidly (here it is cancelled out from
the fourth lag). Similarly, Fig. 7.5 relates to a stationary series: the autocorrelation
function decreases sinusoidally, but the decay of the envelope curve is exponential,
testifying to a very rapid decrease in the autocorrelation function. Conversely, the
correlograms in Figs. 7.6 and 7.7 relate to a non-stationary series insofar as it
appears that the autocorrelation function decreases very slowly.

Fig. 7.3 Correlogram of a stationary series

Fig. 7.4 Correlogram of a stationary series

Fig. 7.5 Correlogram of a stationary series

As an illustration, consider Standard and Poor’s 500 stock index (denoted SP )


monthly over the period from January 1980 to June 2021 (Fig. 7.1). Figure 7.8
reproduces the dynamics of this same series in logarithms, noted LSP . These graphs
highlight the existence of an overall upward trend illustrating that the mean of
the series varies over time: the US stock market index series appears to be non-
stationary in the mean.
Let us differentiate the series LSP by applying the first-difference operator:

ΔLSPt = LSPt − LSPt−1 = RSPt


. (7.13)

Fig. 7.6 Correlogram of a non-stationary series

Fig. 7.7 Correlogram of a non-stationary series

.RSPt represents the series of returns of the US stock index over the period from

February 1980 to June 2021 (one observation, corresponding to January 1980, is


lost at the beginning of the period due to the differentiation operation). This series
is shown in Fig. 7.9: the upward trend in the mean has been suppressed by the
differentiation operation, indicating that the series of returns is a priori stationary
in the mean.
Let us confirm these intuitions by examining the correlograms of the LSP and
RSP series. The correlogram of LSP is plotted in Fig. 7.10 and that of RSP
in Fig. 7.11. The vertical dotted lines on the graphs of the autocorrelation and
partial autocorrelation functions define the bounds of the confidence interval. Each


Fig. 7.8 Logarithm of Standard and Poor’s 500 stock index, 1980.01–2021.06


Fig. 7.9 Standard and Poor’s 500 returns, 1980.02–2021.06

value (autocorrelation or partial autocorrelation) that falls outside this confidence


interval is significantly different from zero. We can see from Fig. 7.10 that the
autocorrelation function of the series LSP decreases very slowly (the values
taken by the autocorrelation function are given in column AC for lags ranging
from 1 to 20). All the values of the autocorrelation function are also outside the
confidence interval; they are significantly different from zero. The column .Q − Stat

Autocorrelation Partial Correlation AC PAC Q-Stat Prob

1 0.992 0.992 492.59 0.000


2 0.983 -0.038 977.26 0.000
3 0.973 -0.039 1453.5 0.000
4 0.964 0.006 1921.4 0.000
5 0.954 -0.003 2381.2 0.000
6 0.945 -0.003 2832.9 0.000
7 0.936 0.019 3276.9 0.000
8 0.927 0.010 3713.6 0.000
9 0.919 0.003 4143.2 0.000
10 0.910 0.001 4565.8 0.000
11 0.902 0.001 4981.6 0.000
12 0.894 0.002 5390.7 0.000
13 0.885 0.001 5793.3 0.000
14 0.878 0.005 6189.4 0.000
15 0.870 0.029 6579.8 0.000
16 0.863 0.012 6964.5 0.000
17 0.854 -0.099 7342.5 0.000
18 0.846 -0.004 7713.5 0.000
19 0.837 -0.008 8077.4 0.000
20 0.828 0.000 8434.5 0.000

Fig. 7.10 Correlogram of the series LSP

gives the values of the Ljung-Box statistic used to test the null hypothesis of no
autocorrelation (see Chap. 4) for a number of lags ranging from 1 to 20. We see that
the value of this statistic for 20 lags is 8,434.5, which is higher than the critical
value of the Chi-squared distribution with 20 degrees of freedom (31.41 at the
5% significance level): the null hypothesis of no autocorrelation is consequently
rejected. These elements confirm the intuition about the non-stationary nature of the
series LSP . On the other hand, we notice that the autocorrelation function of RSP
no longer shows any particular structure, which pleads in favor of the stationarity of
the series. Of course, this intuition must be confirmed by the application of unit root
tests (see below). However, the Ljung-Box statistic for 20 lags is 36.589, which is
slightly higher than the critical value (31.41 at the 5% significance level), leading to
the rejection of the null hypothesis of no autocorrelation.

7.2.2 TS and DS Processes

Economic and financial series are very often non-stationary series. We are interested
here in non-stationarity in the mean. We have seen that non-stationarity can be
identified graphically through the graph of the series and the correlogram. Since

Autocorrelation Partial Correlation AC PAC Q-Stat Prob

1 0.225 0.225 25.466 0.000


2 -0.022 -0.077 25.712 0.000
3 -0.020 0.003 25.918 0.000
4 0.036 0.041 26.564 0.000
5 0.090 0.075 30.672 0.000
6 -0.027 -0.067 31.046 0.000
7 0.010 0.043 31.092 0.000
8 0.030 0.017 31.544 0.000
9 0.010 -0.007 31.590 0.000
10 -0.010 -0.013 31.640 0.000
11 0.005 0.020 31.650 0.001
12 -0.014 -0.031 31.755 0.002
13 -0.038 -0.032 32.491 0.002
14 -0.034 -0.017 33.092 0.003
15 -0.017 -0.008 33.245 0.004
16 0.030 0.031 33.723 0.006
17 0.001 -0.010 33.723 0.009
18 0.034 0.048 34.339 0.011
19 -0.007 -0.026 34.361 0.017
20 -0.065 -0.058 36.589 0.013

Fig. 7.11 Correlogram of the series RSP

Nelson and Plosser (1982), cases of non-stationarity in the mean have been analyzed
using two types of processes:

– TS (trend stationary) processes which are characterized by non-stationarity of


a deterministic nature
– DS (difference stationary) processes whose non-stationarity is stochastic (ran-
dom) in nature

Non-stationarity has fundamental consequences for econometrics. If it is stochas-


tic in nature, the usual asymptotic properties of estimators are no longer valid, and
it is necessary to develop a particular asymptotic theory. Moreover, in a multivariate
framework, applying the usual econometric methods to non-stationary series can
lead to the estimation of regressions that seem statistically correct, but which in
reality make no sense at all. In other words, the links highlighted between the
variables appearing in these regressions are spurious; this is the classic problem
of spurious regressions (see below).

Characteristics of TS Processes
Generally speaking, a TS process .Yt can be written:

Yt = ft + εt
. (7.14)

where .ft is a deterministic function of time and .εt is a stationary process. In the
simple case where .ft is a polynomial function of order 1, we have:

Yt = γ + t β + εt
. (7.15)

where t denotes time.  


For simplicity, further assume that εt ∼ WN(0, σε²). Let us determine the
expectation, variance, and autocovariance function of this process in order to
identify its characteristics. Calculating the expectation yields:

E[Yt ] = E [ γ + t β + εt ]
. (7.16)

Hence, since .E[εt ] = 0:

E[Yt ] = γ + t β
. (7.17)

Now let us calculate the variance:

.V [Yt ] = E [Yt − E [Yt ]]2 = E [εt ]2 = V [εt ] (7.18)

Hence:

V [Yt ] = σε2
. (7.19)

Finally, let us determine the autocovariance function of the process .Yt :

Cov[Yt , Ys ] = E[(Yt − E[Yt ])(Ys − E[Ys ])] = E[εt εs ]


. (7.20)

Hence:

. Cov[Yt , Ys ] = 0 ∀ t /= s (7.21)

Thus, the expectation of a TS process exhibits a deterministic trend: the process is


non stationary in the mean, the non-stationarity being of a deterministic type. On the
other hand, its variance is constant over time, showing that a TS process is stationary
in variance. Finally, its autocovariance function is independent of time. Thus, by
virtue of (7.19), the long-term forecast error has a finite variance .σε2 . In other words,
the long-term behavior of .Yt is deterministic, which is the main characteristic of TS
processes. In this type of modeling, the effects of a shock on .Yt are transitory (.εt
being assumed stationary and invertible): following a shock, the series returns to its
long-term level represented here by the trend.

Remark 7.3 A TS process is a process that can be made stationary (i.e., detrended)
by a regression on a deterministic trend.

Characteristics of DS Processes
A DS process is a non-stationary process that can be stationarized by applying a
difference filter .Δ = (1 − L)d where L is the lag operator and d is a positive integer
called the differentiation or integration parameter:

. (1 − L)d Yt = β + εt (7.22)

where .εt is a stationary process. Often .d = 1 and the DS process is written as:

Yt − Yt−1 = β + εt
. (7.23)

.Yt − Yt−1 = ΔYt is stationary: in a DS process, the difference of the series is

stationary.

Remark 7.4 If .εt is white noise, the process:

Yt = Yt−1 + β + εt
. (7.24)

is known as a random walk with drift if .β /= 0. If .β = 0, it is referred to as a


random walk without drift. A random walk is thus characterized by the presence
of a unit root (the coefficient assigned to .Yt−1 is equal to 1)4 and by the fact that .εt
is white noise.

In order to highlight the main characteristics of a DS process, let us reason by


recurrence:

Y1 = Y0 + β + ε1
. (7.25)

Y2 = Y1 + β + ε2 = Y0 + 2β + ε1 + ε2
. (7.26)

Proceeding in this way, we have:


Yt = Y0 + t β + Σ_{j=1}^{t} εj                                        (7.27)

where .Y0 denotes the first term of the series .Yt .


Unlike the error term in Eq. (7.15) of a TS process, the error term in the DS
process (Eq. (7.27)) corresponds to an accumulation of random shocks Σ_{j=1}^{t} εj.
This remark is fundamental as it means that a shock at a given date has permanent
consequences.

4 Byintroducing the lag operator L, we can write .(1 − L) Yt = β + εt . If we posit .1 − L = 0, we


deduce .L = 1, hence the name of unit root.

Let us examine the statistical characteristics of DS processes, assuming that .εt is


a white noise process. Consider the calculation of the expectation:
E[Yt] = E[Y0 + t β + Σ_{j=1}^{t} εj]                                  (7.28)

Hence:

E[Yt ] = Y0 + t β
. (7.29)

Now let us determine the variance of the process:


V[Yt] = E[Yt − E[Yt]]² = E[Y0 + t β + Σ_{j=1}^{t} εj − Y0 − t β]²     (7.30)

that is:

V[Yt] = V[Σ_{j=1}^{t} εj]                                             (7.31)

So we have:

V [Yt ] = t σε2
. (7.32)

Finally, let us calculate the autocovariance function:


Cov[Yt, Ys] = E[(Yt − E[Yt])(Ys − E[Ys])] = E[(Σ_{j=1}^{t} εj)(Σ_{j=1}^{s} εj)]   (7.33)

Hence:

.Cov[Yt , Ys ] = min(t, s) σε2 s /= t (7.34)

The expectation and variance of a DS process are time-dependent. The DS


process is thus characterized by non-stationarity of a deterministic nature via
the expectation but also by a non-stationarity of a stochastic nature through the
disturbances whose variance follows a linear trend. For a DS process, the variance of
the forecast error is not constant, but increases with the horizon. Thus, each random
shock has a lasting effect on the behavior of the series.

Because of their very different characteristics, it is crucial to be able to


distinguish between the two types of processes, TS and DS. This distinction can be
made by means of unit root tests, such as the Dickey-Fuller test, which we present
below.
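Before turning to the test itself, a small simulation helps visualize the difference. The following Python sketch (the parameter values are arbitrary and chosen only for illustration) generates a TS process (7.15) and a DS process (7.24) built from the same shocks; the deviations from the deterministic trend have a roughly constant variance in the first case and a variance that grows with t in the second, consistent with Eq. (7.32).

import numpy as np

rng = np.random.default_rng(0)
T = 500
beta, gamma = 0.05, 1.0                    # arbitrary illustrative values
eps = rng.normal(0.0, 1.0, T)
t = np.arange(1, T + 1)

y_ts = gamma + beta * t + eps              # TS process (7.15): shocks are transitory
y_ds = gamma + beta * t + np.cumsum(eps)   # DS process (7.27) with Y_0 = gamma

# Deviations from the deterministic trend: bounded for TS, accumulating for DS
dev_ts = y_ts - (gamma + beta * t)
dev_ds = y_ds - (gamma + beta * t)
for m in (100, 300, 500):
    print(m, round(float(dev_ts[:m].var()), 2), round(float(dev_ds[:m].var()), 2))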

7.2.3 The Dickey-Fuller Test

To determine whether a series is stationary or not, unit root tests are applied. There
are numerous unit root tests (see in particular Lardic and Mignon, 2002). We present
here only the test of Dickey and Fuller (1979, 1981) aimed at testing the null
hypothesis of non-stationarity against the alternative hypothesis of stationarity. We
thus test:

– .H0 : the series is non-stationary, i.e., it has at least one unit root.
– .H1 : the series is stationary, i.e., it has no unit root.

Simple Dickey-Fuller (DF) Test


Dickey and Fuller (1979) consider three basic models for the series Yt , t =
1, . . . , T :

– Model [1]: model without constant or deterministic trend:

(1 − ρL) Yt = εt
. (7.35)

that is:

Yt = ρYt−1 + εt
. (7.36)

– Model [2]: model with constant without deterministic trend:

. (1 − ρL) (Yt − μ) = εt (7.37)

that is:

Yt = ρYt−1 + μ (1 − ρ) + εt                                           (7.38)

– Model [3]: model with constant and deterministic trend:

(1 − ρL) (Yt − α − βt) = εt


. (7.39)

that is:

Yt − α − βt − ρYt−1 + αρ + β (t − 1) = εt
. (7.40)

hence:

Yt = ρYt−1 + α (1 − ρ) + βρ + β (1 − ρ) t + εt
. (7.41)

In each of the three models, it is assumed that εt ∼ W N(0, σε2 ).


If ρ = 1, this means that one of the roots of the lag polynomial is equal to 1. In
this case, there is a unit root and Yt is a non-stationary process.
We test the null hypothesis of non-stationarity, i.e., the presence of a unit root
(ρ = 1), against the alternative hypothesis of no unit root (|ρ| < 1).
Let us write more precisely the null and alternative hypotheses for each of the
three models considered:

– Model [1]:

H0 : ρ = 1 ⇔ Yt = Yt−1 + εt
. (7.42)
H1 : |ρ| < 1 ⇔ Yt = ρYt−1 + εt

Under the null hypothesis, Yt follows a random walk process without drift.
Under the alternative hypothesis, Yt follows an autoregressive process of order 1
(AR(1)).
– Model [2]:

H0 : ρ = 1 ⇔ Yt = Yt−1 + εt
. (7.43)
H1 : |ρ| < 1 ⇔ Yt = ρYt−1 + γ + εt with γ = μ(1 − ρ)

The null hypothesis corresponds to a random walk process without drift.


Under the alternative hypothesis, Yt follows an AR(1) process with drift.
– Model [3]:

⎨ H0 : ρ = 1 ⇔ Yt = Yt−1 + β + εt
. H : |ρ| < 1 ⇔ Yt = ρYt−1 + λ + δt + εt (7.44)
⎩ 1
with λ = α(1 − ρ) + ρβ and δ = β(1 − ρ)

Under the null hypothesis, Yt follows a random walk with drift. Under the
alternative hypothesis, Yt is a TS process. It can be made stationary by calculating
the deviations from the trend estimated by OLS.

To facilitate the application of the test, models [1], [2], and [3] are in practice
estimated in the following form:5

5 The first-difference models allow us to reduce to usual tests of significance of the coefficients,
the critical values being tabulated by Dickey and Fuller (see below).

– Model [1]:

ΔYt = φ Yt−1 + εt
. (7.45)

– Model [2]:

Δ Yt = γ + φ Yt−1 + εt
. (7.46)

– Model [3]:

ΔYt = λ + δt + φ Yt−1 + εt
. (7.47)

with φ = ρ − 1 and εt is white noise. We test the null hypothesis φ = 0 (non-


stationarity) against the alternative hypothesis φ < 0 (stationarity). To this end,
the t-statistic for the coefficient φ is calculated. This statistic is compared with the
values tabulated by Dickey and Fuller (see Table 7.2). As the critical values are
negative, the decision rule is reversed:

– If the calculated value of the t-statistic associated with φ is lower than the critical
value, the null hypothesis is rejected, the series is stationary.

Table 7.2 Critical values of the Dickey-Fuller test for ρ = 1

            T      1%       5%       10%
Model [1]   100    −2.60    −1.95    −1.61
            250    −2.58    −1.95    −1.62
            500    −2.58    −1.95    −1.62
            ∞      −2.58    −1.95    −1.62
Model [2]   100    −3.51    −2.89    −2.58
            250    −3.46    −2.88    −2.57
            500    −3.44    −2.87    −2.57
            ∞      −3.43    −2.86    −2.57
Model [3]   100    −4.04    −3.45    −3.15
            250    −3.99    −3.43    −3.13
            500    −3.98    −3.42    −3.13
            ∞      −3.96    −3.41    −3.12

Model [1]: model without constant or deterministic trend. Model [2]: model with
constant, without trend. Model [3]: model with constant and trend

– If the calculated value of the t-statistic associated with φ is higher than the critical
value, the null hypothesis is not rejected, the series is non-stationary.

The models used in the DF test are restrictive in that εt is assumed to be white
noise. However, this assumption is very often questioned due to autocorrelation
and/or heteroskedasticity. To solve this problem, Dickey and Fuller proposed a
parametric correction leading to the augmented Dickey-Fuller test.

Augmented Dickey-Fuller (ADF) Test


To account for possible autocorrelation of errors, lags are introduced on the
endogenous variable.6 As before, three models are distinguished:

– Model [1]:

  ΔYt = φ Yt−1 + Σ_{j=1}^{p} φj ΔYt−j + εt                            (7.48)

– Model [2]:

  ΔYt = γ + φ Yt−1 + Σ_{j=1}^{p} φj ΔYt−j + εt                        (7.49)

– Model [3]:

  ΔYt = λ + δt + φ Yt−1 + Σ_{j=1}^{p} φj ΔYt−j + εt                   (7.50)

Again, we test the null hypothesis .φ = 0 against the alternative hypothesis .φ < 0.
The t-statistic of the coefficient .φ is compared to the critical values tabulated by
Dickey and Fuller (see Table 7.2). The null hypothesis of unit root is rejected if the
calculated value is less than the critical value.
It should be noted that the application of the ADF test requires us to choose
the number of lags p – called the truncation parameter of the ADF test – to

6 One of the causes of error autocorrelation lies in the omission of explanatory variables. The

correction provided by Dickey and Fuller thus consists in adding explanatory variables represented
by the lagged values of the endogenous variable.

be introduced so that the residuals are indeed white noise. Several methods are
available for making this choice, including:

– The study of partial autocorrelations of the series .ΔYt . We select for p the lag
corresponding to the last partial autocorrelation significantly different from zero.
– The estimation of several processes for different values of p. We retain the model
that minimizes the information criteria of Akaike, Schwarz, or Hannan-Quinn.
– The use of the procedure suggested by Campbell and Perron (1991) consisting in
setting a maximum value for p, noted .pmax . We then estimate the regression
model of the ADF test and test the significance of the coefficient associated
with the term .ΔYt−pmax . If this coefficient is significant, we select this value
.pmax for p. If the coefficient associated with .ΔYt−pmax is not significant, we re-

estimate the ADF regression model for a value of p equal to .pmax − 1 and test
the significance of the coefficient relating to the term .ΔYt−pmax −1 and so on.

Sequential Testing Strategy


It is fundamental to note that the unit root test should not be performed on all three
models. Instead, the Dickey-Fuller test should be applied to just one of the three
models. In practice, we adopt a three-step sequential strategy.

– Step 1. We estimate the general model with constant and trend:


ΔYt = α + βt + φ Yt−1 + Σ_{j=1}^{p} φj ΔYt−j + εt                     (7.51)

We start by testing the significance of the trend by referring to the Dickey-


Fuller tables (see Table 7.3). Two cases may arise:
– If the trend is not significant, we go on to Step 2.
– If the trend is significant, we keep the model and test the null hypothesis of
unit root by comparing the t-statistic of .φ with the values tabulated by Dickey
and Fuller (see Table 7.2). We then have two possibilities:

Table 7.3 Critical values of constant and trend, Dickey-Fuller tests

       Model [2]             Model [3]
       Constant              Constant              Trend
T      1%    5%    10%       1%    5%    10%       1%    5%    10%
100    3.22  2.54  2.17      3.78  3.11  2.73      3.53  2.79  2.38
250    3.19  2.53  2.16      3.74  3.09  2.73      3.49  2.79  2.38
500    3.18  2.52  2.16      3.72  3.08  2.72      3.48  2.78  2.38
∞      3.18  2.52  2.16      3.71  3.08  2.72      3.46  2.78  2.38

Model [2]: model with constant, without deterministic trend. Model [3]: model
with constant and trend

– If we do not reject the null hypothesis, .Yt is non-stationary. In this case, it


must be differentiated and the test procedure must be repeated on the series
in first difference.
– If the null hypothesis is rejected, .Yt is stationary. In this case, the test
procedure stops and we can work directly on the series .Yt .
– Step 2. This step should only be applied if the trend in the previous model is not
significant. We estimate model [2]:


ΔYt = α + φ Yt−1 + Σ_{j=1}^{p} φj ΔYt−j + εt                          (7.52)

and begin by testing the significance of the constant by referring to the Dickey-
Fuller tables (see Table 7.3):
– If the constant is not significant, we go to Step 3.
– If the constant is significant, we test the null hypothesis of unit root by
comparing the t-statistic of .φ with the values tabulated by Dickey and Fuller
(see Table 7.2). We then have two possibilities:
– If we do not reject the null hypothesis, .Yt is non-stationary. In this case, it
must be differentiated and the test procedure must be repeated on the series
in first difference.
– If the null hypothesis is rejected, .Yt is stationary. In this case, the test
procedure stops and we can work directly on the series .Yt .
– Step 3. This step should only be applied if the constant in the previous model is
not significant. We estimate model [1]:


      ΔYt = φ Yt−1 + Σ_{j=1}^{p} φj ΔYt−j + εt                                   (7.53)

and test the null hypothesis of unit root using Dickey-Fuller critical values (see
Table 7.2):
– If the null hypothesis is not rejected, .Yt is non-stationary. In this case, it must
be differentiated and the test procedure must be repeated on the series in first
difference.
– If the null hypothesis is rejected, .Yt is stationary. In this case, the test
procedure stops and we can work directly on the series .Yt .
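
The sketch below shows how Step 1 of this strategy can be coded outside EViews. It is
only an illustration written in Python with statsmodels: the series y and the lag order p
are assumed to be available, models [2] and [1] are obtained by dropping the trend and
then the constant, and the t-statistics must still be compared with the Dickey-Fuller
critical values of Tables 7.2 and 7.3.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    def adf_model(y, p, deterministic="ct"):
        """OLS estimation of ΔY_t = [α + βt] + φ Y_{t-1} + Σ φ_j ΔY_{t-j} + ε_t.
        deterministic: "ct" (constant and trend), "c" (constant) or "n" (none)."""
        y = pd.Series(y, dtype=float)
        dy = y.diff()
        X = pd.DataFrame({"y_lag1": y.shift(1)})
        for j in range(1, p + 1):
            X[f"dy_lag{j}"] = dy.shift(j)
        if deterministic in ("c", "ct"):
            X["const"] = 1.0
        if deterministic == "ct":
            X["trend"] = np.arange(1, len(y) + 1, dtype=float)
        data = pd.concat([dy.rename("dy"), X], axis=1).dropna()
        return sm.OLS(data["dy"], data[X.columns]).fit()

    # Step 1: general model with constant and trend (model [3])
    res = adf_model(y, p=1, deterministic="ct")
    print("t-statistic of the trend:", res.tvalues["trend"])   # compare with Table 7.3
    print("t-statistic of phi:      ", res.tvalues["y_lag1"])  # compare with Table 7.2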

Remark 7.5 If, after applying this procedure, we find that .Yt is non-stationary, this
means that the series contains at least one unit root. In this case, we should repeat
the Dickey-Fuller tests on the series in first difference. If .ΔYt is found to be non-
stationary, the procedure should be applied again on the series in second difference
and so on.

Remark 7.6 A non-stationary series is also called an integrated series. For


example, if .Yt is non-stationary and .ΔYt is stationary, then .Yt is integrated of order

1: it must be differentiated once to make it stationary. .ΔYt is integrated of order 0:


there is no need to differentiate it to make it stationary. An integrated series of order
0 is thus a stationary series.

Definition 7.4 A series .Yt is integrated of order . d, which we note . Yt ∼ I (d), if it is


necessary to differentiate it . d times to make it stationary. In other words, .Yt ∼ I (d)
if and only if .(1 − L)d Yt ∼ I (0). d is called the integration parameter.

Empirical Application
Consider the series SP of Standard and Poor’s 500 stock index over the period
from January 1980 to June 2021. The logarithmic series is denoted LSP , with RSP
standing for the series of returns. Our aim is to apply the Dickey-Fuller test strategy.
Let us first study the stationarity of the series LSP . We test the null hypothesis
of non-stationarity of the series LSP (presence of unit root) against the alternative
hypothesis of stationarity (absence of unit root). To this end, we begin by estimating
the model with constant and trend:


      ΔLSPt = RSPt = λ + δt + φ LSPt−1 + Σ_{j=1}^{p} φj RSPt−j + εt              (7.54)

Estimating this model involves determining the value of the truncation parameter
p. As previously mentioned, this choice can be guided by the graph of the partial
autocorrelation function of the series RSP (Fig. 7.11). As shown, only the first
partial autocorrelation lies outside the confidence interval. In other words, only the
first partial autocorrelation is significantly different from zero, which leads us to take
a value of p equal to 1. Another technique involves estimating the model (7.54) for
different values of p and selecting the value that minimizes the information criteria.
Table 7.4 shows the values taken by the AIC, SIC, and HQ information criteria for
values of p ranging from 1 to 12. Minimizing the SIC and HQ criteria leads us
to choose .p = 1, while the AIC criterion tends to select .p = 2. For reasons of
parsimony, and insofar as two out of three criteria favor a value of p equal to 1, we
choose .p = 1.7
As a result, we estimate the following model:

      RSPt = λ + δt + φ LSPt−1 + φ1 RSPt−1 + εt                                  (7.55)

The results are set out in Table 7.5. We start by testing the significance of the
trend (noted .@T REND(“1980M01”)) by referring to the Dickey-Fuller tables. The
critical value of the trend in a model with constant and trend for 500 observations
being 2.78 (see Table 7.3), we have .1.9612 < 2.78: we do not reject the null
hypothesis of non-significance of the trend. We then proceed to the next step, which

7 For robustness, we also conducted the analysis with two lags. The results are identical to those
presented here.

Table 7.4 Choosing the truncation parameter p

  p     AIC          SIC          HQ
  1     −3.814483    −3.780663    −3.80121
  2     −3.815117    −3.772842    −3.798526
  3     −3.811125    −3.760395    −3.791216
  4     −3.809856    −3.750671    −3.786628
  8     −3.808701    −3.715695    −3.772199
  12    −3.79428     −3.667454    −3.744505

Values in bold correspond to values minimizing information criteria

Table 7.5 ADF test on LSP . Model with constant and trend
Null hypothesis: LSP has a unit root
Exogenous: constant, linear trend
Lag length: 1 (automatic – based on SIC, maxlag .= 17)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic .−2.037919 0.5786
Test critical values 1%level .−3.976591

5% level .−3.418870

10% level .−3.131976


*MacKinnon (1996) one-sided p-values
Augmented Dickey-Fuller test equation
Dependent variable: D(LSP)
Method: least squares
Sample: 1980M01 2021M06
Included observations: 498
Variable Coefficient Std. error t-Statistic Prob.
LSP(.−1) .−0.013371 0.006561 .−2.037919 0.0421
D(LSP(.−1)) 0.232493 0.043764 5.312378 0.0000
C 0.084074 0.039810 2.111899 0.0352
@TREND(“1980M01”) 5.82E-05 2.97E-05 1.961221 0.0504
R-squared 0.058922 Mean dependent var 0.004844
Adjusted R-squared 0.053207 S.D. dependent var 0.036778
S.E. of regression 0.035787 Akaike info criterion .−3.814483
Sum squared resid 0.632659 Schwarz criterion .−3.780663

Log likelihood 953.8063 Hannan-Quinn criterion .−3.801210


F-statistic 10.31000 Durbin-Watson stat 1.968341
Prob(F-statistic) 0.000001

consists in estimating the model with constant, without trend:

      RSPt = λ + φ LSPt−1 + φ1 RSPt−1 + εt                                       (7.56)

The results are given in Table 7.6. We test the significance of the constant. The
critical value, at the 5% significance level, of the constant in a model with constant

Table 7.6 ADF test on LSP . Model with constant, without trend
Null hypothesis: LSP has a unit root
Exogenous: constant
Lag length: 1 (automatic – based on SIC, maxlag .= 17)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic .−0.584842 0.8709
Test critical values: 1%level .−3.443254

5% level .−2.867124

10% level .−2.569806

*MacKinnon (1996) one-sided p-values


Augmented Dickey-Fuller test equation
Dependent variable: D(LSP)
Method: least squares
Sample: 1980M01 2021M06
Included observations: 498
Variable Coefficient Std. error t-Statistic Prob.
LSP(.−1) .−0.001445 0.002472 .−0.584842 0.5589
D(LSP(.−1)) 0.226533 0.043784 5.173865 0.0000
C 0.013992 0.017597 0.795109 0.4269
R-squared 0.051595 Mean dependent var 0.004844
Adjusted R-squared 0.047763 S.D. dependent var 0.036778
S.E. of regression 0.035889 Akaike info criterion .−3.810743
Sum squared resid 0.637585 Schwarz criterion .−3.785378

Log likelihood 951.8751 Hannan-Quinn criterion .−3.800788


F-statistic 13.46438 Durbin-Watson stat 1.965871
Prob(F-statistic) 0.000002

without trend is 2.52 (see Table 7.3). Since .0.7951 < 2.52, we do not reject the null
hypothesis that the constant is insignificant. Finally, we estimate the model without
constant or trend:

      RSPt = φ LSPt−1 + φ1 RSPt−1 + εt                                           (7.57)

The results in Table 7.7 allow us to proceed with the unit root test, i.e., the
test of the null hypothesis .φ = 0 against the alternative hypothesis .φ < 0. The
calculated value of the ADF statistic is 2.2448 and the critical value is .−1.95 at
the 5% significance level (Table 7.2). Since .2.2448 > −1.95, we do not reject the
null hypothesis of non-stationarity of the series LSP . We deduce that LSP is non-
stationary and characterized by the presence of at least one unit root.
To determine the order of integration of LSP , we differentiate it:

      ΔLSPt = LSPt − LSPt−1 = RSPt                                               (7.58)

Table 7.7 ADF test on LSP . Model without constant or trend


Null hypothesis: LSP has a unit root
Exogenous: none
Lag length: 1 (automatic – based on SIC, maxlag .= 17)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic 2.244829 0.9944
Test critical values: 1%level .−2.569614

5% level .−1.941460

10% level .−1.616272

*MacKinnon (1996) one-sided p-values


Augmented Dickey-Fuller test equation
Dependent variable: D(LSP)
Method: least squares
Sample: 1980M01 2021M06
Included observations: 498
Variable Coefficient Std. error t-Statistic Prob.
LSP(.−1) 0.000511 0.000228 2.244829 0.0252
D(LSP(.−1)) 0.225714 0.043756 5.158492 0.0000
R-squared 0.050383 Mean dependent var 0.004844
Adjusted R-squared 0.048469 S.D. dependent var 0.036778
S.E. of regression 0.035876 Akaike info criterion .−3.813483
Sum squared resid 0.638399 Schwarz criterion .−3.796573
Log likelihood 951.5572 Hannan-Quinn criterion .−3.806846

Durbin-Watson stat 1.965761

and we perform the ADF test on the series RSP . The null hypothesis that RSP is
non-stationary is tested against the alternative hypothesis of stationarity. We adopt
the same sequential strategy as before, first estimating the model with constant and
trend:


      ΔRSPt = λ + δt + φ RSPt−1 + Σ_{j=1}^{p} φj ΔRSPt−j + εt                    (7.59)

The endogenous variable is the series of changes in returns, in other words, the
second difference of the LSP series. In order to determine the truncation parameter
p, we have estimated this model for various values of p and selected the one that
minimizes the information criteria. The application of this methodology leads us to
choose a number of lags p equal to 0, which corresponds to the case of a simple
Dickey-Fuller test. Consequently, we estimate the following model:

      ΔRSPt = λ + δt + φ RSPt−1 + εt                                             (7.60)

Table 7.8 ADF test on RSP . Model without constant or trend


Null hypothesis: RSP has a unit root
Exogenous: none
Lag length: 0 (automatic – based on SIC, maxlag .= 17)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic .−17.48229 0.0000
Test critical values: 1%level .−2.569614

5% level .−1.941460

10% level .−1.616272

*MacKinnon (1996) one-sided p-values


Augmented Dickey-Fuller test equation
Dependent variable: D(RSP)
Method: least squares
Sample: 1980M01 2021M06
Included observations: 498
Variable Coefficient Std. error t-Statistic Prob.
D(LSP(.−1)) .−0.761108 0.043536 .−17.48229 0.0000
R-squared 0.380786 Mean dependent var .−3.10E-05
Adjusted R-squared 0.380786 S.D. dependent var 0.045776
S.E. of regression 0.036022 Akaike info criterion .−3.807390

Sum squared resid 0.644885 Schwarz criterion .−3.798935


Log likelihood 949.0402 Hannan-Quinn criterion .−3.804072
Durbin-Watson stat 1.969349

and start by testing the significance of the trend. The results (not reported here) give
us a calculated t-statistic associated with the trend equal to 0.1925. As this value
is lower than the critical value of 2.78, we do not reject the null hypothesis that
the trend is not significant. We therefore estimate the model with constant, without
trend. The results lead to a t-statistic associated with the constant equal to 2.3093,
below the critical value of 2.52. We finally estimate the model with no constant or
trend, the results of which are shown in Table 7.8.
The calculated value of the ADF statistic being equal to .−17.4823 and the critical
value at the 5% significance level being .−1.95, we have: .−17.4823 < −1.95. We
therefore reject the null hypothesis of non-stationarity of the series RSP . We deduce
that RSP is stationary, i.e., integrated of order 0. It follows that the series LSP is
integrated of order 1, since it has to be differentiated once to make it stationary.

7.3 ARMA Processes

ARMA (autoregressive moving-average) processes were introduced by Box and


Jenkins (1970). Such processes are sometimes referred to as a-theoretical in that
their purpose is to model a time series in terms of its past values and the present
and past values of the error term (noise). In other words, they do not refer to

any underlying economic theory. We begin by presenting the definition of ARMA


processes before describing the four-step methodology of Box and Jenkins.

7.3.1 Definitions
Autoregressive Processes
Definitions
Definition 7.5 An autoregressive process of order p, denoted AR(p), is a
stationary process Yt verifying a relation of the type:

. Yt − φ1 Yt−1 − ··· − φp Yt−p = εt (7.61)

where φi (i = 1, . . . , p) are real numbers and εt ∼ W N(0, σε2 ).

By introducing the lag operator L, the relation (7.61) can also be written as:

. (1 − φ1 L − ··· − φp Lp ) Yt = εt (7.62)

or:

. Ф(L) Yt = εt (7.63)

with: Ф(L) = 1 − φ1 L − ··· − φ p Lp .

Remark 7.7 In time series models, the error term εt is often called innovation.
This name derives from the fact that it is the only new information involved in the
process at date t.

Autocorrelations and Yule-Walker Equations


The autocorrelations of a process .AR(p) can be calculated by multiplying each
member of Eq. (7.61) by .Yt−h .(h > 0). Then taking the expectation of the variables
and dividing by .γ0 , we obtain the following relationship:

      (1/γ0) (E[Yt Yt−h] − φ1 E[Yt−1 Yt−h] − ··· − φp E[Yt−p Yt−h]) = (1/γ0) E[εt Yt−h]    (7.64)

Since εt is white noise, we have E[εt Yt−h] = 0. We deduce that:

      (1/γ0) (γh − φ1 γh−1 − ··· − φp γh−p) = 0                                  (7.65)

Hence, denoting ρh = γh/γ0 the autocorrelation function:

      ρh − φ1 ρh−1 − ··· − φp ρh−p = 0                                           (7.66)

The autocorrelation function of a process AR(p) is finally given by:

      ρh = Σ_{i=1}^{p} φi ρh−i    ∀ h > 0                                        (7.67)

The autocorrelations of a process AR(p) are thus described by a linear recurrence
equation of order p. Writing this relation for different values of h (h = 1, 2, ..., p),
we obtain the Yule-Walker equations:

      ⎛ ρ1 ⎞   ⎛  1      ρ1      ρ2     ···    ρp−1 ⎞ ⎛ φ1 ⎞
      ⎜ ρ2 ⎟   ⎜  ρ1     1                      ⋮   ⎟ ⎜ φ2 ⎟
      ⎜  ⋮ ⎟ = ⎜  ⋮              ⋱              ρ1  ⎟ ⎜  ⋮ ⎟                     (7.68)
      ⎝ ρp ⎠   ⎝ ρp−1            ···     ρ1     1   ⎠ ⎝ φp ⎠

These equations allow us to obtain the autocorrelation coefficients as a function


of the autoregressive coefficients and vice versa.

Partial Autocorrelations
It is possible to calculate the partial autocorrelations of the AR process from the
Yule-Walker equations and the autocorrelations. For this, we use the algorithm of
Durbin (1960):


      φ11 = ρ1                                                (algorithm initialization)

      φhh = ( ρh − Σ_{j=1}^{h−1} φh−1,j ρh−j ) / ( 1 − Σ_{j=1}^{h−1} φh−1,j ρj )    for h = 2, 3, ...    (7.69)

      φhj = φh−1,j − φhh φh−1,h−j    for h = 2, 3, ... and j = 1, ..., h − 1
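
The recursion (7.69) is easy to implement. The sketch below (Python, purely illustrative)
computes the partial autocorrelations φhh from a vector of autocorrelations
(ρ0 = 1, ρ1, ..., ρH):

    import numpy as np

    def durbin_pacf(rho):
        """Durbin's algorithm: rho[h] is the autocorrelation at lag h, rho[0] = 1.
        Returns the partial autocorrelations phi_hh for h = 1, ..., H."""
        H = len(rho) - 1
        phi = np.zeros((H + 1, H + 1))
        pacf = np.zeros(H + 1)
        phi[1, 1] = pacf[1] = rho[1]                     # initialization: phi_11 = rho_1
        for h in range(2, H + 1):
            num = rho[h] - sum(phi[h - 1, j] * rho[h - j] for j in range(1, h))
            den = 1.0 - sum(phi[h - 1, j] * rho[j] for j in range(1, h))
            phi[h, h] = pacf[h] = num / den
            for j in range(1, h):                        # update of the phi_hj
                phi[h, j] = phi[h - 1, j] - phi[h, h] * phi[h - 1, h - j]
        return pacf[1:]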

Property 7.1 For a process . AR(p), . φhh = 0 ∀ h > p. In other words, for a
process .AR(p), the partial autocorrelations cancel out from rank .p + 1.

This property is fundamental in that it allows us to identify the order p of AR


processes (see below).

Moving-Average Processes
Definitions
Definition 7.6 A moving-average process of order q, denoted MA(q) , is a
stationary process Yt verifying a relationship of the type:

Yt = εt − θ1 εt−1 − · · · − θq εt−q
. (7.70)

where the θi (i = 1, . . . , q) are real numbers and εt ∼ W N(0, σε2 ).

By introducing the lag operator L, the relation (7.70) can be written:

Yt = (1 − θ1 L − · · · − θq Lq ) εt
. (7.71)

or:

.Yt = Θ(L)εt (7.72)

with Θ(L) = 1 − θ1 L − ··· − θq Lq .

Autocovariances and Autocorrelations


The autocovariance function of a process MA(q) is given by:

      γh = E[Yt Yt−h]
         = E[(εt − θ1 εt−1 − ··· − θq εt−q)(εt−h − θ1 εt−h−1 − ··· − θq εt−h−q)]        (7.73)

Some simple calculations lead to the following expression:

      γh = (−θh + θ1 θh+1 + ··· + θq−h θq) σε²    if h = 1, ..., q
      γh = 0                                      if h > q                               (7.74)

If h = 0, we obtain the variance of the process:

      γ0 = σY² = (1 + θ1² + ··· + θq²) σε²                                               (7.75)

We deduce the autocorrelation function ρh = γh/γ0:

      ρh = (−θh + θ1 θh+1 + ··· + θq−h θq) / (1 + θ1² + ··· + θq²)    if 1 ≤ h ≤ q
      ρh = 0                                                          if h > q           (7.76)

Property 7.2 For a process .MA(q), .ρh = 0 for .h > q. In other words, the
autocorrelations cancel from rank .q + 1, when the true data generating process
is a .MA(q).

This fundamental property allows us to identify the order q of MA processes.
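
A quick simulation illustrates this property. In the sketch below (Python, illustrative,
not from the book's applications), an MA(1) with θ1 = 0.5 is generated: the empirical
autocorrelations are close to ρ1 = −θ1/(1 + θ1²) = −0.4 and approximately zero beyond
lag 1.

    import numpy as np
    from statsmodels.tsa.stattools import acf

    rng = np.random.default_rng(0)
    eps = rng.standard_normal(5000)          # white noise
    theta1 = 0.5
    y = eps[1:] - theta1 * eps[:-1]          # Y_t = eps_t - theta1 * eps_{t-1}
    print(acf(y, nlags=3))                   # approximately [1.0, -0.4, 0.0, 0.0]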

Partial Autocorrelations
In order to calculate the partial autocorrelations of an MA process, we use
the Durbin algorithm. However, the partial autocorrelation function of a process
. MA(q) has no particular property and its expression is relatively complicated.

Autoregressive Moving-Average Processes: ARMA(p,q)


These processes are a natural extension of AR and MA processes. They are
mixed processes – in the sense that they simultaneously incorporate AR and MA
components – which allows for a more parsimonious description of the data.

Definitions
Definition 7.7 A stationary process Yt follows an ARMA(p, q) process if:

      Yt − φ1 Yt−1 − ··· − φp Yt−p = εt − θ1 εt−1 − ··· − θq εt−q                (7.77)

where the coefficients φi (i = 1, . . . , p) and θj (j = 1, . . . , q) are real numbers and


εt ∼ W N(0, σε2 ).

By introducing the lag operator L, the relation (7.77) is written as:

. Ф(L) Yt = Θ(L) εt (7.78)

with Ф(L) = 1 − φ1 L − ··· − φp Lp and Θ(L) = 1 − θ1 L − ··· − θq Lq .

Autocorrelations
To calculate the autocorrelations of an ARMA process, we proceed as in the case of
AR processes. We obtain the following expression:


      ρh = Σ_{i=1}^{p} φi ρh−i    ∀ h > q                                        (7.79)

The autocorrelation function of ARMA processes satisfies the same difference


equation as that of AR processes.

Partial Autocorrelations
The partial autocorrelation function of ARMA processes has no simple expression.
It depends on the order of each part (p and q) and the value of the parameters. It
is most frequently characterized either by a decreasing exponential form or by a
damped oscillatory form.

7.3.2 The Box and Jenkins Methodology

In order to determine the appropriate ARMA process for modeling the time
series under consideration, Box and Jenkins suggested a four-step methodology:
identification, estimation, validation, and forecasting. Let us briefly review these
different steps.

Step 1: Identification of ARMA Processes


The purpose of this first step is to find the values of the parameters p and q of the
ARMA processes. To this end, we rely on the study of the autocorrelation and partial
autocorrelation functions.

Autocorrelation Function
We start by calculating the autocorrelation coefficients from the expression (7.6):

      ρ̂h = Σ_{t=1}^{T−h} (Yt − Ȳ)(Yt+h − Ȳ) / Σ_{t=1}^{T} (Yt − Ȳ)²              (7.80)

for various values of h: h = 1, 2, ..., H. Box and Jenkins suggest retaining a
maximum number of lags H = T/4, where T is the number of observations in
the series. After evaluating the function ρ̂h, we test the statistical significance of
each autocorrelation coefficient using Bartlett’s result that ρ̂h follows a normal
distribution. Thus, to test the null hypothesis that the autocorrelations are not
significantly different from zero, i.e., ρh = 0, we calculate the value of the
t-statistic8 tρ̂h = ρ̂h / σ̂(ρ̂h), which we compare with the critical value read from the
Student’s t distribution table. The decision rule is:

– If . |tρ̂h | < t (T − l), we do not reject the null hypothesis: .ρh is not significant.
– If . |tρ̂h | ≥ t (T − l), we reject the null hypothesis: .ρh is significantly different
from zero,

where .t (T − l) is the value of the Student’s t distribution with .(T − l) degrees of


freedom, l being the number of estimated parameters.
This test enables us to identify the order q of the MA processes, since we know
that the autocorrelations of a process MA(q) cancel out from rank q + 1.

8 Bartlett showed that the standard deviation is given by σ̂(ρ̂h) = [(1/T)(1 + 2 Σ_{i=1}^{h−1} ρ̂i²)]^{1/2}.

Example 7.1 Suppose that the application of the t-test on autocorrelations yields
ρ1 ≠ 0 and ρ2 = ... = ρH = 0. The process identified is then an MA(1) since the
autocorrelations cancel out from rank q + 1, with q = 1.

Partial Autocorrelation Function


It is also possible to construct a test of the null hypothesis that the partial autocorre-
lations are not significantly different from zero, i.e., .φhh = 0. For large samples, the
partial autocorrelations follow a normal distribution with mean zero and variance
1/T. To test the null hypothesis of nullity of the partial autocorrelations, we
calculate the test statistic tφ̂hh = φ̂hh / √(1/T). The value obtained is compared to the
critical value read from the Student’s t distribution table. The decision rule is:

– If . |tφ̂hh | < t (T − l), we do not reject the null hypothesis: .φhh is not significantly
different from zero.
– If . |tφ̂hh | ≥ t (T − l), we reject the null hypothesis: .φhh is significantly different
from zero,

where .t (T − l) is the value of the Student’s t distribution with .(T − l) degrees of


freedom, l being the number of estimated parameters.
This test enables us to identify the order p of the AR processes, since we know
that the partial autocorrelations of a process .AR(p) cancel out from rank .p + 1.

Example 7.2 Suppose that the application of the t-test on partial autocorrelations
yields .φ11 /= 0 and .φ22 = . . . = φH H = 0. The process identified is then an .AR(1)
since the partial autocorrelations cancel out from rank .p + 1, with .p = 1.
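
In practice, both tests are carried out from the estimated autocorrelation and partial
autocorrelation functions. The sketch below (Python with statsmodels; the stationary
series y is an assumption) computes the two functions and flags the partial
autocorrelations lying outside the approximate 95% band:

    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf

    T = len(y)
    H = T // 4                                  # maximum lag suggested by Box and Jenkins
    rho_hat = acf(y, nlags=H)                   # autocorrelations
    phi_hat = pacf(y, nlags=H, method="ywm")    # partial autocorrelations
    band = 1.96 / np.sqrt(T)                    # approximate bound under the null
    print("Significant PACF lags:",
          [h for h in range(1, H + 1) if abs(phi_hat[h]) > band])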

At the end of this identification stage, one or more models have been selected. It
is now necessary to estimate each selected model, which is the object of the second
step of the Box and Jenkins procedure.

Step 2: Estimation of ARMA Processes


After identifying the values p and q of one or more ARMA processes, the next
step is to estimate the coefficients associated with the autoregressive and moving-
average terms. In some cases, notably for .AR(p) processes with no autocorrelation
of the errors, it is possible to apply the OLS method. More generally, we use the
maximum likelihood method or nonlinear least squares. We will not describe these
estimation techniques here and refer readers to Gouriéroux and Monfort (2008) or
Greene (2020).

Step 3: Validation of ARMA Processes


At the beginning of this step, we have several ARMA processes whose parameters
have been estimated. We now need to validate these models in order to distinguish
between them. To do this, we apply tests on the coefficients and on the residuals:

– With regard to the coefficients, these are the usual significance tests (t-tests). As
these tests are identical to those presented in the previous chapters, we will not
repeat them here. Let us simply note that if some of the estimated coefficients
are not significant, the estimation must be repeated by deleting the variable(s)
associated with the non-significant coefficients.
– With regard to the residuals, the aim is to test whether they have the “good”
statistical properties. In particular, we need to test whether the residuals are
homoskedastic and not autocorrelated.

If several models are validated, the validation step should continue with a
comparison between these models.

Tests on Residuals
The purpose of these tests is to verify that the residuals et = (Ф̂(L)/Θ̂(L)) Yt do follow
a white noise process. To this end, we apply tests of absence of autocorrelation
and tests of homoskedasticity. These various tests have already been presented in
detail in Chap. 4 and remain valid in the context of ARMA processes. Thus, in
order to test the null hypothesis of no autocorrelation, the Breusch-Godfrey, Box-
Pierce, or Ljung-Box tests can be applied. Similarly, to test the null hypothesis of
homoskedasticity, the tests of Goldfeld and Quandt, Glejser, Breusch-Pagan, White,
or the ARCH test can be implemented.
The tests most commonly used in time series econometrics are the Box-Pierce
or Ljung-Box tests with regard to absence of autocorrelation, and the ARCH test
with regard to homoskedasticity. It is worth clarifying the number of degrees
of freedom associated with the Box-Pierce and Ljung-Box tests. Under the null
hypothesis of no autocorrelation, these two statistics have a Chi-squared distribution
with .(H − p − q) degrees of freedom, where H is the maximum number of lags
considered for calculating autocorrelations, p is the order of the autoregressive part,
and q is the order of the moving-average part.
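
Both families of tests are available in standard software. The following sketch (Python
with statsmodels; the residual series e and the orders p and q are assumptions) computes
the Ljung-Box statistic, with H − p − q degrees of freedom, and the ARCH LM statistic:

    from statsmodels.stats.diagnostic import acorr_ljungbox, het_arch

    H, p, q = 20, 1, 0                          # e.g. residuals of an AR(1)
    lb = acorr_ljungbox(e, lags=[H], model_df=p + q)
    print(lb)                                   # Q-statistic and p-value (chi2, H - p - q df)

    lm_stat, lm_pvalue, f_stat, f_pvalue = het_arch(e, nlags=1)
    print(f"ARCH LM statistic = {lm_stat:.3f} (p-value = {lm_pvalue:.4f})")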
Once the various tests have been applied, several models can be validated. It
remains for us to compare them in an attempt to select the most “adequate” model.
To this end, various model selection criteria can be used.

Model Selection Criteria


There are several types of criteria that can be used to compare validated models:

– Standard criteria: they are based on the calculation of the forecast error that we
seek to minimize. In this context, the most frequently used criteria are:
– The mean absolute error:

      MAE = (1/T) Σ_t |et|                                                       (7.81)

– The root mean squared error:

      RMSE = [ (1/T) Σ_t et² ]^{1/2}                                             (7.82)

– The mean absolute percent error:

      MAPE = (100/T) Σ_t |et / Yt|                                               (7.83)

where T is the number of observations in the series .Yt studied and .et are the
residuals.
The lower the value taken by these criteria, the closer the estimated model is
to the observations.
– Information criteria: we have already presented them in Chap. 3. The most
widely used criteria are those of Akaike, Schwarz, and, to a lesser extent,
Hannan-Quinn:
– The Akaike information criterion (1969):9

      AIC = log σ̂ε² + 2(p + q)/T                                                 (7.84)

– The Schwarz information criterion (1978):

      SIC = log σ̂ε² + (p + q) log T / T                                          (7.85)

– The Hannan-Quinn information criterion (1979):10

      HQ = log σ̂ε² + 2(p + q) log(log T)/T                                       (7.86)

We seek to minimize these various criteria. Their application allows us to select


a model among the various validated ARMA processes.

Step 4: Prediction of ARMA Processes


The final step in the Box and Jenkins methodology is the prediction step. Consider
a process .ARMA(p, q):

Ф(L) Yt = Θ(L) εt
. (7.87)

9 See also Akaike (1969, 1974).


10 It is assumed here that the constant c in the expression of the HQ criterion is equal to 1.

"t+h denote the forecast made at t for the date .t + h, with h denoting the
and let .Y
forecast horizon. By definition, we have the following expression:

."t+h = E[Yt+h |It ]


Y (7.88)

where .It is the set of information available at date t, i.e., . It = (Y1 , Y2 , . . . ,


Yt , ε1 , ε2 , . . . , εt ). The expectation here is taken in the sense of conditional
expectation: it represents the best forecast of the series Y conditionally on the set of
available information. In the linear case, it is a regression function.
Let us take the example of a process ARMA(1, 1):

      Yt = φ1 Yt−1 + εt − θ1 εt−1                                                (7.89)

with |φ1| < 1 and |θ1| < 1. Let us calculate the forecasts for various horizons.

– Yt+1 = φ1 Yt + εt+1 − θ1 εt, hence Ŷt+1 = E[Yt+1 | It] = φ1 Yt − θ1 εt
– Yt+2 = φ1 Yt+1 + εt+2 − θ1 εt+1, hence Ŷt+2 = E[Yt+2 | It] = φ1 Ŷt+1

We deduce the following relationship giving the series of recursive forecasts:

      Ŷt+h = φ1 Ŷt+h−1    ∀ h > 1                                                (7.90)

"t+h , and construct a


We can calculate the forecast error, .et+h = Yt+h − Y
prediction interval:

"t+h ± u × σet+h
Y
. (7.91)

assuming that the residuals follow a Gaussian white noise process, with u being the
value of the standard normal distribution at the selected significance level (at the
5% level, .u = 1.96). It is then possible to impart a certain degree of confidence to
the forecast if the value of the dependent variable, for the horizon considered, lies
within the prediction interval.
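
In practice, the recursive forecasts and the interval (7.91) are produced directly by the
estimation software. The sketch below (Python with statsmodels; the stationary series y
is an assumption) does so for an ARMA(1,1):

    from statsmodels.tsa.arima.model import ARIMA

    res = ARIMA(y, order=(1, 0, 1)).fit()       # ARMA(1,1) with a constant
    fc = res.get_forecast(steps=12)
    print(fc.predicted_mean)                    # point forecasts for h = 1, ..., 12
    print(fc.conf_int(alpha=0.05))              # 95% prediction intervals, as in (7.91)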

7.3.3 Empirical Application

Consider again the series RSP of the returns of Standard and Poor’s stock index
at monthly frequency over the period from February 1980 to June 2021. As we
have previously shown, this series is stationary and can, therefore, be modeled by
an ARMA-type process. To this end, let us take up the four steps of the Box and
Jenkins methodology.

Lag      AC       PAC      Q-Stat     Prob

1 0.225 0.225 25.466 0.000


2 -0.022 -0.077 25.712 0.000
3 -0.020 0.003 25.918 0.000
4 0.036 0.041 26.564 0.000
5 0.090 0.075 30.672 0.000
6 -0.027 -0.067 31.046 0.000
7 0.010 0.043 31.092 0.000
8 0.030 0.017 31.544 0.000
9 0.010 -0.007 31.590 0.000
10 -0.010 -0.013 31.640 0.000
11 0.005 0.020 31.650 0.001
12 -0.014 -0.031 31.755 0.002
13 -0.038 -0.032 32.491 0.002
14 -0.034 -0.017 33.092 0.003
15 -0.017 -0.008 33.245 0.004
16 0.030 0.031 33.723 0.006
17 0.001 -0.010 33.723 0.009
18 0.034 0.048 34.339 0.011
19 -0.007 -0.026 34.361 0.017
20 -0.065 -0.058 36.589 0.013

Fig. 7.12 Correlogram of the series RSP

Step 1: Identification
In order to identify the orders p and q, let us consider the graph of autocorrelations
and partial autocorrelations of the series RSP . Examining Fig. 7.12 shows that:

– The first autocorrelation falls outside the confidence interval, being significantly
different from zero. From order 2 onwards, the autocorrelations cancel out. We
deduce .q = 1.
– The first partial autocorrelation lies outside the confidence interval, and is sig-
nificantly different from zero. From order 2 onwards, the partial autocorrelations
cancel out. We deduce .p = 1.

At the end of this step, we identify three processes: .AR(1), .MA(1), and
ARMA(1, 1). We can now estimate each of these models.
.

Step 2: Estimation
We estimate the three processes identified: .AR(1) (Table 7.9), .MA(1) (Table 7.10),
and .ARMA(1, 1) (Table 7.11).

Table 7.9 Estimation of the process AR(1)


Dependent variable: RSP
Sample: 1980M01 2021M06
Included observations: 498
Variable Coefficient Std. error t-Statistic Prob.
C 0.004854 0.002402 2.020679 0.0439
AR(1) 0.225054 0.036249 6.208603 0.0000
R-squared 0.050841 Mean dependent var 0.004844
Adjusted R-squared 0.047006 S.D. dependent var 0.036778
S.E. of regression 0.035904 Akaike info criterion −3.809845
Sum squared resid 0.638091 Schwarz criterion −3.784480
Log likelihood 951.6514 Hannan-Quinn criterion −3.799890
F-statistic 13.25728 Durbin-Watson stat 1.964289
Prob(F-statistic) 0.000002

Table 7.10 Estimation of the process MA(1)


Dependent variable: RSP
Sample: 1980M01 2021M06
Included observations: 498
Variable Coefficient Std. error t-Statistic Prob.
C 0.004849 0.002277 2.129807 0.0337
MA(1) 0.245678 0.035607 6.899709 0.0000
R-squared 0.055787 Mean dependent var 0.004844
Adjusted R-squared 0.051972 S.D. dependent var 0.036778
S.E. of regression 0.035810 Akaike info criterion −3.815048
Sum squared resid 0.634766 Schwarz criterion −3.789683
Log likelihood 952.9471 Hannan-Quinn criterion −3.805094
F-statistic 14.62309 Durbin-Watson stat 2.004813
Prob(F-statistic) 0.000001

Step 3: Validation
Tests of Significance of Coefficients
Let us first test the significance of the coefficients in each of the three
estimated models:

– .AR(1) process: the first-order autoregressive coefficient is significantly different


from zero as its t-statistic 6.2086 is higher than the critical value 1.96 at the 5%
significance level. The .AR(1) model is therefore a candidate for the modeling of
RSP .
– .MA(1) process: the first-order moving-average coefficient is significantly differ-
ent from zero as its t-statistic 6.8997 is higher than the critical value 1.96 at the
5% significance level. The .MA(1) model is therefore a candidate for modeling
RSP .

Table 7.11 Estimation of the process ARMA(1,1)


Dependent variable: RSP
Sample: 1980M01 2021M06
Included observations: 498
Variable Coefficient Std. error t-Statistic Prob.
C 0.004848 0.002286 2.120727 0.0344
AR(1) −0.049048 0.180326 −0.271997 0.7857
MA(1) 0.291703 0.175861 1.658713 0.0978
R-squared 0.055927 Mean dependent var 0.004844
Adjusted R-squared 0.050194 S.D. dependent var 0.036778
S.E. of regression 0.035844 Akaike info criterion −3.811180
Sum squared resid 0.634672 Schwarz criterion −3.777360
Log likelihood 952.9839 Hannan-Quinn criterion −3.797907
F-statistic 9.754938 Durbin-Watson stat 1.999172
Prob(F-statistic) 0.000003

– .ARMA(1, 1) process: the t-statistic associated with the autoregressive and


moving-average coefficients being less than 1.96 in absolute value, none of
the coefficients is significantly different from zero. We can therefore reject the
.ARMA(1, 1) model.

At the end of this first phase of the validation stage, two processes are candidates
for modeling the series RSP : the .AR(1) and the .MA(1) processes.

Tests on Residuals
We now apply the tests to the residuals of the .AR(1) and .MA(1) models. We start
with the Ljung-Box test of absence of autocorrelation. The results are shown in
Figs. 7.13 and 7.14. These figures first show that the autocorrelations of the residuals
lie within the confidence interval for each of the two models, suggesting the absence
of autocorrelation. Let us calculate the Ljung-Box statistic for a maximum number
of lags H of 20:

– For the residuals of the .AR(1) model, we have .LB(20) = 14.333. Under
the null hypothesis of no autocorrelation, this statistic follows a Chi-squared
distribution with .(H − p − q) = (20 − 1 − 0) = 19 degrees of freedom. At
the 5% significance level, the critical value of the Chi-squared distribution with
19 degrees of freedom is .30.144. Since .14.333 < 30.144, we do not reject the
null hypothesis of no autocorrelation of residuals. The model .AR(1) therefore
remains a candidate.
– For the residuals of the .MA(1) model, we have .LB(20) = 11.388. Under the null
hypothesis of no autocorrelation, this statistic has a Chi-squared distribution with
.(H − p − q) = (20−0−1) = 19 degrees of freedom, the corresponding critical

value being .30.144 at the 5% significance level. We find that .11.388 < 30.144,

Q-statistic probabilities adjusted for 1 ARMA term

Lag      AC       PAC      Q-Stat     Prob

1 0.018 0.018 0.1574


2 -0.073 -0.073 2.8443 0.092
3 -0.026 -0.023 3.1718 0.205
4 0.023 0.019 3.4385 0.329
5 0.098 0.094 8.2691 0.082
6 -0.054 -0.055 9.7296 0.083
7 0.010 0.027 9.7806 0.134
8 0.029 0.024 10.195 0.178
9 0.006 0.001 10.213 0.250
10 -0.014 -0.018 10.317 0.325
11 0.011 0.024 10.377 0.408
12 -0.008 -0.019 10.410 0.494
13 -0.030 -0.032 10.879 0.539
14 -0.025 -0.022 11.192 0.595
15 -0.018 -0.020 11.359 0.658
16 0.037 0.028 12.063 0.674
17 -0.014 -0.014 12.169 0.732
18 0.040 0.051 12.980 0.738
19 0.000 -0.001 12.980 0.793
20 -0.051 -0.045 14.333 0.764

Fig. 7.13 Correlogram of the residuals of the AR(1) model

implying that the null hypothesis of no autocorrelation of residuals is not rejected.


The .MA(1) model therefore remains a candidate.

Let us now apply the ARCH test to check that the residuals of both models
are indeed homoskedastic. This test involves regressing the squared residuals on
a constant and their .𝓁 past values:


      et² = a0 + Σ_{i=1}^{𝓁} ai et−i²                                            (7.92)

and testing the null hypothesis of homoskedasticity:

H0 : a1 = a2 = . . . = a𝓁 = 0
. (7.93)

against the alternative hypothesis of conditional heteroskedasticity, which states that


at least one of the coefficients .ai , .i = 1, . . . , 𝓁, is significantly different from zero.
Under the null hypothesis of homoskedasticity, the test statistic .T R 2 , where T is the
number of observations and .R 2 is the coefficient of determination associated with
the regression (7.92), follows a Chi-squared distribution with .𝓁 degrees of freedom.

Q-statistic probabilities adjusted for 1 ARMA term

Lag      AC       PAC      Q-Stat     Prob

1 -0.003 -0.003 0.0034


2 -0.017 -0.017 0.1453 0.703
3 -0.020 -0.020 0.3502 0.839
4 0.018 0.017 0.5058 0.918
5 0.098 0.098 5.4056 0.248
6 -0.054 -0.053 6.8662 0.231
7 0.016 0.020 7.0039 0.320
8 0.024 0.027 7.3081 0.398
9 0.007 0.002 7.3338 0.501
10 -0.014 -0.021 7.4324 0.592
11 0.010 0.022 7.4854 0.679
12 -0.010 -0.018 7.5323 0.754
13 -0.030 -0.034 8.0006 0.785
14 -0.022 -0.020 8.2520 0.827
15 -0.021 -0.020 8.4747 0.863
16 0.038 0.031 9.2276 0.865
17 -0.017 -0.013 9.3782 0.897
18 0.040 0.047 10.189 0.895
19 -0.005 -0.002 10.200 0.925
20 -0.048 -0.046 11.388 0.910

Fig. 7.14 Correlogram of the residuals of the MA(1) model

Table 7.12 ARCH test results

        AR(1)              MA(1)
a0      0.0012 (6.9184)    0.0012 (6.9287)
a1      0.0891 (1.9891)    0.0941 (2.1032)
TR²     3.9413             4.4021

Values in parentheses are t-statistics of the estimated coefficients

We have estimated the relationship (7.92) on the squares of the residuals of each
of the two models considered, using a number of lags .𝓁 = 1. The results are shown
in Table 7.12. For both models, the critical value to which the .T R 2 test statistic must
be compared is that of the Chi-squared distribution with 1 degree of freedom, i.e.,
3.841 at the 5% significance level. It can be seen that, for both models, the calculated
value of the test statistic is higher than the critical value. The null hypothesis of
homoskedasticity is consequently rejected at the 5% significance level.
In summary, the residuals of the .AR(1) and .MA(1) models are not autocorre-
lated, but (slightly) heteroskedastic. Both models therefore pass the validation stage
from the point of view of the absence of autocorrelation, but not from the point of
view of the homoskedasticity property. This result is not surprising insofar as the
study concerns financial series that are known to exhibit heteroskedasticity due to
their time-varying volatility.

Table 7.13 Model comparison criteria

        AR(1)        MA(1)
RMSE    0.035794     0.035702
MAE     0.025478     0.025390
MAPE    357.1195     397.9688
AIC     −3.809845    −3.815048
SIC     −3.784480    −3.789683
HQ      −3.799890    −3.805094

Values in bold correspond to values minimizing model selection criteria

Model Selection Criteria


To conclude the validation step, let us compare the two models using model
selection criteria. Table 7.13 summarizes the values obtained. These results show
that the .AR(1) model minimizes only the MAPE criterion, the other five criteria
being minimized by the .MA(1) model. If one of the two models has to be chosen,
the .MA(1) model should be preferred for the returns of the US stock market index.
It is then possible to forecast returns based on the .MA(1) process. Strictly speaking,
neither model should be retained insofar as they do not pass the validation step due
to heteroskedasticity in the errors.
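
The estimation and comparison phase of this application can be summarized in a few lines
of code. The sketch below (Python with statsmodels) is only an illustration: rsp stands
for an assumed series of monthly returns, and the information criteria reported by
statsmodels are computed from the likelihood rather than from (7.84)–(7.86), so only
their ranking across models should be compared.

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    candidates = {"AR(1)": (1, 0, 0), "MA(1)": (0, 0, 1), "ARMA(1,1)": (1, 0, 1)}
    rows = {}
    for name, order in candidates.items():
        fit = ARIMA(rsp, order=order).fit()
        rows[name] = {"AIC": fit.aic, "SIC/BIC": fit.bic, "HQ": fit.hqic}
    print(pd.DataFrame(rows).T)                 # compare the criteria across the three models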

7.4 Extension to the Multivariate Case: VAR Processes

The .V AR(p) (vector autoregressive) processes are a generalization of the autore-


gressive processes to the multivariate case. They were introduced by Sims (1980) as
an alternative to structural macroeconometric models, i.e., simultaneous equations
models (see Chap. 8). According to Sims (1980), these macroeconometric models
can be criticized on several points, including the existence of a priori restrictions on
parameters that are too strong in relation to what the theory predicts, the simultaneity
of the relationships, the assumed exogeneity of certain variables, and poor predictive
quality. VAR models have been developed in response to these various criticisms.
Their essential characteristic is that they no longer distinguish between endogenous
and exogenous variables, in the sense that all the variables in the model have the
same status.
VAR models have been the subject of many developments, and we will only
present their general structure here (for more details, readers can refer to Hamilton,
1994; Lardic and Mignon, 2002; or Greene, 2020).

7.4.1 Writing the Model


Introductory Example
Consider two stationary variables .Y1t and .Y2t . Each variable is a function of its own
past values, but also of the past and present values of the other variables. Suppose

we have .p = 4. The .V AR(4) model describing these two variables is written as:


      Y1t = a1 + Σ_{i=1}^{4} b1i Y1t−i + Σ_{j=1}^{4} c1j Y2t−j − d1 Y2t + ε1t
                                                                                 (7.94)
      Y2t = a2 + Σ_{i=1}^{4} b2i Y1t−i + Σ_{j=1}^{4} c2j Y2t−j − d2 Y1t + ε2t

where .ε1t and . ε2t are two uncorrelated white noise processes.
This model involves estimating 20 coefficients. The number of parameters to be
estimated grows rapidly with the number of lags, as .pN 2 , where p is the number of
lags and N the number of variables in the model.
In matrix form, the .V AR(4) process is written:


      B Yt = Ф0 + Σ_{i=1}^{4} Фi Yt−i + εt                                       (7.95)

with:

      B = ⎛ 1    d1 ⎞    Ф0 = ⎛ a1 ⎞    Yt = ⎛ Y1t ⎞
          ⎝ d2   1  ⎠         ⎝ a2 ⎠         ⎝ Y2t ⎠
                                                                                 (7.96)
      Фi = ⎛ b1i   c1i ⎞    εt = ⎛ ε1t ⎞
           ⎝ b2i   c2i ⎠         ⎝ ε2t ⎠

Then we simply multiply each term of (7.95) by .B −1 , assuming .B invertible, to


obtain the usual form of the VAR model.

General Formulation
We generalize the previous example to the case where .Y t contains N variables and
for any order of lags p. A .V AR(p) process with N variables is written in matrix
form:

      Yt = Ф0 + Ф1 Yt−1 + ··· + Фp Yt−p + εt                                     (7.97)

with:

      Yt = (Y1t, ..., YNt)',   εt = (ε1t, ..., εNt)',   Ф0 = (a1^0, ..., aN^0)'        (7.98)

and Фp the (N × N) matrix whose element in row i and column j is aip^j,

where .ε t is white noise with variance-covariance matrix .Σ ε .



We can also write:

      (I − Ф1 L − Ф2 L^2 − ··· − Фp L^p) Yt = Ф0 + εt                            (7.99)

or:

      Ф(L) Yt = Ф0 + εt                                                          (7.100)

with Ф(L) = I − Σ_{i=1}^{p} Фi L^i.
More formally, the following definition is used.

Definition 7.8 Yt follows a V AR(p) process if and only if there exist white noise
εt (εt ∼ WN(0, Σε)), Ф0 ∈ R^N, and p matrices Ф1, ..., Фp such that:

      Yt − Σ_{i=1}^{p} Фi Yt−i = Ф0 + εt                                         (7.101)

or:

      Ф(L) Yt = Ф0 + εt                                                          (7.102)

where I is the identity matrix and:

      Ф(L) = I − Σ_{i=1}^{p} Фi L^i                                              (7.103)

7.4.2 Estimation of the Parameters of a V AR(p) Process


and Validation

The parameters of the VAR process can only be estimated on stationary time
series.11 Two estimation techniques are possible: estimation of each equation of
the VAR model by OLS or estimation by the maximum likelihood technique. The
estimation of a VAR model involves choosing the number of lags p. To determine
this value, the information criteria can be used. The procedure consists in estimating
a number of VAR models for an order p ranging from 0 to h where h is the maximum

11 Strictly speaking, it is possible to estimate VAR processes in which non-stationary variables


are involved using OLS. In this case, the estimators are super-consistent, but they are no longer
asymptotically normal, which poses a problem for statistical inference since the usual tests can no
longer be implemented.

lag. We select the lag p that minimizes the information criteria AIC, SIC, and HQ12
defined as follows:

      AIC = log det Σ̂ε + 2N²p/T                                                  (7.104)

      SIC = log det Σ̂ε + N²p log T / T                                           (7.105)

      HQ = log det Σ̂ε + 2N²p log(log T)/T                                        (7.106)
where N is the number of variables in the system, T is the number of observations,
and .Σ̂ ε is an estimator of the variance-covariance matrix of the residuals, det
denoting its determinant.

Remark 7.8 It is also possible to perform maximum likelihood ratio tests to


validate the number of lags p selected. Generally speaking, we test:
.H0 : Фp+1 = 0: process .V AR(p)
.H1 : Фp+1 /= 0: process .V AR(p + 1)
The technique involves estimating a constrained model .(V AR(p)) and an
unconstrained model .(V AR(p + 1)) and performing the log-likelihood ratio test.
If the null hypothesis is not rejected, we continue the procedure by testing:
.H0 : Фp = 0: process .V AR(p − 1)
.H1 : Фp /= 0: process .V AR(p)
We thus have a sequence of nested tests whose goal is to determine the order p
of the VAR process.

Remark 7.9 In the case of AR processes, in addition to the tests on the parameters,
tests on the residuals are performed in order to validate the process. In the case of
VAR processes, these tests are not very powerful, and we prefer to use a graph
of the residuals. Residuals should be examined carefully especially when using
VAR models for impulse response analysis, where the absence of correlation of
the innovations is crucial for the interpretation.

7.4.3 Forecasting VAR Processes

Consider a process .V AR(p):

Y t = Ф1 Y t−1 + . . . + Фp Y t−p + ε t
. (7.107)

12 We have assumed here that the constant c involved in the expression of the HQ criterion is equal
to 1.

It is assumed that p has been chosen, that the .Фi have been estimated, and that
the variance-covariance matrix associated with .ε t has been estimated.
Under certain conditions, the prediction in .(T + 1) of the process is:
  
      E[YT+1 | ȲT] = Ф̂1 YT + ··· + Ф̂p YT−p+1                                    (7.108)

where ȲT denotes the past of Y up to and including the date T.

7.4.4 Granger Causality

The notion of causality plays a very important role in economics insofar as it


enables us to better understand the relationships between variables. To introduce
this notion, consider two variables .Y1 and .Y2 . Heuristically, we say that .Y1 Granger
causes .Y2 if the prediction of .Y2 based on the knowledge of the joint pasts of .Y1 and
.Y2 is better than the forecast based on the knowledge of the past of .Y2 alone (see

Granger, 1969). As an example, consider the following .V AR(p) process with two
variables .Y1t and .Y2t :
      Y1t = a0 + a11 Y1t−1 + a12 Y2t−1 + ··· + ap1 Y1t−p + ap2 Y2t−p + ε1t
                                                                                 (7.109)
      Y2t = b0 + b11 Y1t−1 + b12 Y2t−1 + ··· + bp1 Y1t−p + bp2 Y2t−p + ε2t

Testing for the absence of causality from .Y1t to .Y2t is equivalent to performing
a restriction test on the coefficients of the variables . Y1t of the VAR representation.
Specifically:

– . Y1t does not cause . Y2t if the following null hypothesis is not rejected: .H0 : b11 =
b21 = · · · = bp1 = 0.
– .Y2t does not cause . Y1t if the following null hypothesis is not rejected.: H0 : a12 =
a22 = · · · = ap2 = 0.

These are classic Fisher tests. They are performed either equation by equation, or
directly by comparison between a constrained . V AR model and an unconstrained
. V AR model. In the latter case, we can also perform a maximum likelihood ratio

test.
In the case of Fisher tests, the strategy is as follows for a test of absence of
causality from .Y1t to .Y2t :

– We regress .Y2t on its p past values and on the p past values of .Y1t . This is the
unconstrained model and we note .RSSnc the sum of squared residuals associated
with this model.

– We regress .Y2t on its p past values and note the sum of squared residuals
.RSSc . This is the constrained model in that we have imposed the nullity of the

coefficients associated with the p values of .Y1t .


– We calculate the test statistic:
      F = [(RSSc − RSSnc) / r] / [RSSnc / (T − k − 1)]                           (7.110)

where r is the number of constraints, i.e., the number of coefficients being tested
for nullity, and k is the number of estimated parameters (excluding the constant)
involved in the unconstrained model. Under the null hypothesis of no causality,
this statistic has a Fisher distribution with .(r, T − k − 1) degrees of freedom.

In the case of a maximum likelihood ratio test, we calculate the test statistic:

      C = T log( det Σ̂ε^c / det Σ̂ε^nc )                                          (7.111)

where Σ̂ε^c (respectively Σ̂ε^nc) denotes the estimator of the variance-covariance
matrix of the residuals of the constrained (respectively unconstrained) model, det
being the determinant. Under the null hypothesis of no causality, this statistic
follows a Chi-squared distribution with 2p degrees of freedom.
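
These tests are implemented in most econometric software. As an illustration, the sketch
below (Python with statsmodels; the DataFrame data is an assumption) performs the two
Granger non-causality tests in a VAR(2) using an F-type test:

    from statsmodels.tsa.api import VAR

    res = VAR(data[["DLSP", "DLDIV"]]).fit(2)
    # H0: DLDIV does not Granger cause DLSP
    print(res.test_causality("DLSP", ["DLDIV"], kind="f").summary())
    # H0: DLSP does not Granger cause DLDIV
    print(res.test_causality("DLDIV", ["DLSP"], kind="f").summary())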

Remark 7.10 If we reject the two null hypotheses (absence of causality from .Y1 to
Y2 and absence of causality from .Y2 to .Y1 ), we have a bi-directional causality; we
.

speak of feedback loop (feedback effect).

Remark 7.11 One of the practical applications of VAR models lies in the calcu-
lation of the impulse response function. The latter makes it possible to assess the
effect of a random shock on the variables and can therefore be useful for analyzing
the effects of an economic policy. This analysis is beyond the scope of this book
and we refer the reader to Hamilton (1994), Lardic and Mignon (2002), or Greene
(2020).

7.4.5 Empirical Application

Consider Standard and Poor’s 500 (SP 500) US stock index series and the associated
dividend series over the period from January 1871 to June 2021. Since the data are
monthly, the number of observations is 1 806. The series are expressed in real terms,
i.e., they have been deflated by the consumer price index.13

13 The data come from Robert Shiller’s website: http://www.econ.yale.edu/~shiller/data.htm.




Fig. 7.15 Series LSP and LDIV, 1871.01–2021.06

Table 7.14 ADF test results

          ADF          CV at 5%    CV at 1%
LSP       1.7308       −1.95       −2.58
LDIV      −3.2391      −3.41       −3.96
DLSP      −32.3319     −1.95       −2.58
DLDIV     −13.0135     −1.95       −2.58

CV: critical value

We denote LSP the logarithm of the SP 500 index and LDI V the dividend series
in logarithms. We are interested in the relationship between the two series, seeking
to estimate a VAR-type model. Let us start by studying the characteristics of the two
variables in terms of stationarity.
The two series are shown in Fig. 7.15 and appear to exhibit an overall upward
trend, suggesting that they are non-stationary in the mean. In order to confirm this
intuition, we perform the Dickey-Fuller unit root tests. To do this, we follow the
sequential strategy presented earlier. First, we estimate the model with constant and
trend. If the trend is not significant, we estimate the model with constant. Finally, if
the constant is not significant, we estimate the model without constant or trend. The
implementation of this strategy leads us to select:

– A model without constant or trend for the series LSP


– A model with constant and trend for the series LDI V

The results obtained for the value of the ADF statistic are shown in Table 7.14.

Table 7.15 Choice of p, VAR estimation

  p     AIC          SIC          HQ
0 −9.230339 −9.224213 −9.228077
1 −9.664253 −9.645876 −9.657468
2 −9.700005 −9.669376 −9.688696
3 −9.702788 −9.659908 −9.686956
4 −9.709008 −9.653877 −9.688653
5 −9.712631 −9.645249 −9.687753
6 −9.712146 −9.632512 −9.682744
7 −9.711237 −9.619352 −9.677312
8 −9.710607 −9.606471 −9.672159
9 −9.709459 −9.593072 −9.666488
10 −9.708673 −9.580035 −9.661179
11 −9.706098 −9.565208 −9.65408
12 −9.713336 −9.560195 −9.656794
Values in bold correspond to values minimizing
information criteria

It can be seen that the null hypothesis of unit root cannot be rejected for the
two series considered LSP and LDI V . The application of the Dickey-Fuller tests
on the series in first difference (denoted DLSP and DLDI V ) indicates that they
are stationary. In other words, the differentiated series are integrated of order 0,
implying that the series LSP and LDI V are integrated of order 1.
The VAR model is then estimated on the series DLSP and DLDI V , i.e., on the
stationary series. We start by looking for the order p of the VAR process. To this
end, we estimate the VAR process for values of p ranging from 1 to 12 and report
the values taken by the AIC, SIC, and HQ criteria (see Table 7.15). The SIC and HQ
criteria lead us to select a .V AR(2) process, whereas, according to the AIC criterion,
we should select a .V AR(12) process. For the sake of parsimony, we continue the
study with the .V AR(2) process.
The results from the estimation of the .V AR(2) process are shown in Table 7.16;
the values in square brackets represent the t-statistics of the estimated coefficients.
It can be seen that the SP returns are a function of themselves lagged by one and
two periods and of dividends lagged by two periods. The logarithmic changes in
dividends are a function of their one- and two-period lagged values and of the 1-
month lagged values of the SP returns.
Let us now perform the Granger causality test and start by implementing Fisher
tests. First, let us test the null hypothesis that the dividend growth rate does not cause
the returns of the SP index. We estimate two models:

– The constrained model consisting in regressing DLSP on a constant and on its


values lagged by 1 and 2 months. We obtain a sum of squared residuals equal to
.RSSc = 2.7925;

– The unconstrained model consisting in regressing DLSP on a constant, its


lagged values by 1 and 2 months, and the lagged values (1 and 2 months) of
DLDI V . We obtain a sum of squared residuals equal to .RSSnc = 2.7786.

Table 7.16 Estimation of the VAR(2) process


D(LSP) D(LDIV)
DLSP(.−1) 0.283078 .−0.031786

[ 11.9821] [.−4.49688]
DLSP(.−2) .−0.073616 0.010414
[.−3.10058] [ 1.46603]
DLDIV(.−1) .−0.087335 0.459320
[.−1.11822] [ 19.6562]
DLDIV(.−2) 0.229337 0.165131
[ 2.94504] [ 7.08752]
C 0.001479 0.000546
[ 1.58671] [ 1.95616]
R-squared 0.078311 0.319494
Adj. R-squared 0.076260 0.317980
Sum sq. resids 2.778588 0.248729
S.E. equation 0.039311 0.011762
F-statistic 38.19147 211.0376
Log likelihood 3279.105 5454.724
Akaike AIC .−3.631841 .−6.045174

Schwarz SC .−3.616596 .−6.029929


Mean dependent 0.002103 0.001316
S.D. dependent 0.040902 0.014242
Determinant resid covariance (dof adj.) 2.11E-07
Determinant resid covariance 2.10E-07
Log likelihood 8743.940
Akaike information criterion .−9.688231
Schwarz criterion .−9.657742
Number of coefficients 10

The Fisher test statistic is:


      F = [(2.7925 − 2.7786)/2] / [2.7786/(1805 − 4 − 1)] = 4.5023               (7.112)

The number of constraints is 2 (we are testing the nullity of the two coefficients
associated with the lagged dividend growth rate), the number of observations is
1 805, and the number of estimated parameters (excluding the constant) in the
unconstrained model is 4. Under the null hypothesis, the F -statistic follows a Fisher
distribution with (2,1800) degrees of freedom. At the 5% significance level, the
critical value is 2.997. Since .4.5023 > 2.997 we reject the null hypothesis of no
causality of the dividend growth rate towards stock market returns.

Let us now consider the test of the null hypothesis that stock market returns do
not cause the dividend growth rate. We estimate two models:

– The constrained model consisting in regressing DLDI V on a constant and on


its values lagged by one and two periods. We obtain a sum of squared residuals
equal to .RSSc = 0.2515.
– The unconstrained model consisting in regressing DLDI V on a constant, its
lagged values by 1 and 2 months, and the lagged values (one and two periods) of
DLSP . We obtain a sum of squared residuals equal to .RSSnc = 0.2487.

The Fisher test statistic is:


      F = [(0.2515 − 0.2487)/2] / [0.2487/(1805 − 4 − 1)] = 10.1327              (7.113)

If we compare this value with the critical value 2.997, we reject the null
hypothesis of no causality of stock market returns towards the dividend growth rate.
We can also perform a Chi-squared test, calculating the test statistic C. The
calculation of this statistic gives us:

– For the test of the null hypothesis that the dividend growth rate does not cause
returns: .C = 9.0191
– For the test of the null hypothesis that returns do not cause the dividend growth
rate: .C = 20.2855

In both cases, the statistic C is higher than the critical value of the Chi-squared
distribution at the 5% significance level. The null hypothesis is rejected. There is
therefore a two-way causality between stock market returns and the dividend growth
rate, testifying to the presence of a feedback effect.

7.5 Cointegration and Error-Correction Models


7.5.1 The Problem of Spurious Regressions

The theory of cointegration was introduced by Granger (1981) to study non-


stationary time series. This theory is widely used in economic and financial
applications, since many macroeconomic and financial series are non-stationary.
However, if we apply the usual econometric methods to non-stationary series,
several problems arise, including the famous problem of spurious regressions
addressed by Granger and Newbold (1974). Heuristically, consider two time series
.Xt and . Yt integrated of order 1 and without any link between them. If we run the

regression . Yt = α + βXt + εt , we should get . β = 0. Granger and Newbold


(1974) show that . β is significantly different from zero, meaning that .Xt is an
explanatory variable for . Yt , which makes no sense since, by assumption, the

two series are independent. The consequence of non-stationarity is that classical


inference procedures are no longer valid.
To illustrate this fundamental issue, let us give some examples of spurious
regressions.14 We also report for each estimated regression the value of the
coefficient of determination .R 2 and the Durbin-Watson statistic (DW ). Following
the usual notations, the numbers in parentheses below the estimated values of the
coefficients are their t-statistics.

– Example 1: regression of the infant mortality rate in Egypt (MOR) on the income
of US farmers (I N C) and on the money supply in Honduras (M), annual data
1971–1990:

      M̂ORt = 179.9 − 0.29 INCt − 0.04 Mt                                         (7.114)
              (16.63)  (−2.32)    (−4.26)

with .R 2 = 0.918 and .DW = 0.47.


– Example 2: regression of US exports (EXP ) on Australian male life expectancy
(LI F E), annual data 1960–1990:

ÊXPt = −2943 + 45.79 LIFEt                                                     (7.115)
        (−16.70)  (17.76)

with .R 2 = 0.916 and .DW = 0.36.


– Example 3: regression of South African population (P OP ) on US research and
development expenditure (RD), annual data 1971–1990:


P̂OPt = 21698.7 + 111.58 RDt                                                    (7.116)
        (59.44)    (26.40)

with .R 2 = 0.974 and .DW = 0.30.

These three examples illustrate regressions that make no sense, since it is obvious
that there is no link between the explanatory variables and the variable being
explained in each of the three cases considered. Thus, if we take the third example,
it goes without saying that finding that R&D spending in the United States has an
impact on the population in South Africa makes little sense. These examples are
illustrative of spurious regressions, i.e., regressions that are meaningless. This is
due to the non-stationarity of the different series involved in the regressions.
Two features are common to all three regressions: firstly, the coefficient of
determination is very high (above .0.9 in our examples), and, secondly, the value
of the Durbin-Watson statistic is low. These two characteristics are symptomatic of
spurious regressions.

14 These examples are taken from the website of J. Gonzalo, Universidad Carlos III, Madrid.
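
The phenomenon is easy to reproduce by simulation. The sketch below (in Python, purely illustrative) regresses one simulated random walk on another, independent one; the slope typically appears "significant", the coefficient of determination can be sizeable, and the Durbin-Watson statistic is very low, which is the pattern just described.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Two independent random walks: both series are I(1) and unrelated by construction.
rng = np.random.default_rng(42)
T = 500
x = np.cumsum(rng.normal(size=T))
y = np.cumsum(rng.normal(size=T))

# OLS regression of y on a constant and x.
results = sm.OLS(y, sm.add_constant(x)).fit()
print(results.params, results.tvalues)        # the slope often looks "significant"
print("R2:", results.rsquared)
print("DW:", durbin_watson(results.resid))    # typically far below 2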

A procedure frequently used to avoid the problem of spurious regressions is


to differentiate non-stationary series in order to stationarize them and apply the
usual econometric methods. However, the main limitation of this differentiation
operation is that it masks the long-term properties of the series studied, since
the relationships between the levels of the variables are no longer considered.
The theory of cointegration alleviates this problem by offering the possibility of
specifying stable long-term relationships while jointly analyzing the short-term
dynamics of the variables under consideration.

7.5.2 The Concept of Cointegration

If Xt and Yt are two series I(d), then in general the linear combination zt:

zt = Yt − βXt                                                                  (7.117)

is also I(d).
However, it is possible that . zt is not . I (d) but . I (d − b) where . b is a positive
integer .(d ≥ b > 0). In other words, .zt is integrated of an order lower than the order
of integration of the two variables under consideration. In this case, the series . Xt
and . Yt are said to be cointegrated, which is noted:

. (Xt , Yt ) ∼ CI (d, b) (7.118)

.β is the cointegration parameter and the vector . [1, −β] is the cointegrating

vector.
The most studied case corresponds to: . d = b = 1. Thus, two non-stationary
series .(I (1)) are cointegrated if there exists a stationary linear combination .(I (0))
of these two series.
The underlying idea is that, in the short term, . Xt and . Yt may diverge (they
are both non-stationary), but they will move in unison in the long term. There is
therefore a stable long-term relationship between . Xt and . Yt . This relationship is
called cointegration (or cointegrating) relationship or the long-term relation-
ship. It is given by . Yt = βXt (i.e., zt = 0).15 In the long term, similar movements
of .Xt and . Yt tend to offset each other yielding a stationary series. .zt measures
the extent of the imbalance between . Xt and . Yt and is called the equilibrium
error. Examples corresponding to such a situation are numerous in economics: the
relationship between consumption and income, the relationship between short- and
long-term interest rates, the relationship between international stock market indices,
and so on.

15 Note that the cointegrating relationship can include a constant term, for example: Yt = α + βXt.

Remark 7.12 For the sake of simplification, we have considered here the case
of two variables. The notion of cointegration can be generalized to the case of N
variables. We will not deal with this generalization in the context of this textbook
and refer readers to Engle and Granger (1991), Hamilton (1994), or Lardic and
Mignon (2002).

7.5.3 Error-Correction Models

One of the fundamental properties of cointegrated series is that they can be modeled
as an error-correction model. This result was demonstrated in the Granger
representation theorem (Granger, 1981), valid for series .CI (1, 1). Such models
allow us to model the adjustments that lead to a long-term equilibrium situation.
They are dynamic models, incorporating both short-term and long-term changes in
variables.
Let . Xt and . Yt be two .CI (1, 1) variables. Assuming that .Yt is the endogenous
variable and .Xt is the explanatory variable, the error-correction model is written:
 
ΔYt = γ ẑt−1 + Σi βi ΔXt−i + Σj δj ΔYt−j + d(L) εt                             (7.119)

where .εt is white noise. .ẑt = Yt − β̂Xt is the residual from the estimation of the
cointegration relationship between . Xt and . Yt . . d(L) is a finite polynomial in . L. In
practice, we frequently have .d(L) = 1 and the error-correction model is written
more simply:
 
ΔYt = γ ẑt−1 + Σi βi ΔXt−i + Σj δj ΔYt−j + εt                                  (7.120)

The coefficient .γ associated with .ẑt−1 is the error-correction coefficient. It


provides a measure of the speed of adjustment towards the long-term target, given by
the cointegration relationship. The coefficient .γ must be significantly non-zero and
negative for the error-correction mechanism to be present. Otherwise, there is no
return-to-equilibrium phenomenon. The error-correction model allows short-term
fluctuations to be accounted for around the long-term equilibrium (given by the
cointegration relationship). It thus describes an adjustment process and combines
two types of variables:

– First-difference (stationary) variables representing short-term fluctuations


– Variables in levels, here a variable . ẑt which is a stationary linear combination of
non-stationary variables, which ensures that the long term is taken into account

7.5.4 Estimation of Error-Correction Models and Cointegration


Tests: The Engle and Granger (1987) Approach
Two-Step Estimation Method
The two-step estimation method, valid for .CI (1, 1) series, was proposed by Engle
and Granger (1987).
First step: Estimation of the long-term relationship.
The long-term relationship is estimated by OLS:16

Yt = α + βXt + zt
. (7.121)

where .zt is the error term. If the variables are cointegrated, we proceed to the second
step.
Second step: Estimation of the error-correction model.
The error-correction model is estimated by OLS:
 
ΔYt = γ ẑt−1 + Σi βi ΔXt−i + Σj δj ΔYt−j + εt                                  (7.122)

where εt ∼ WN and ẑt−1 is the residual from the estimation of the one-period-lagged
long-term relationship: ẑt−1 = Yt−1 − α̂ − β̂Xt−1.

In the first step of the Engle and Granger (1987) estimation method, it is
necessary to check that the series .Xt and .Yt are cointegrated, i.e., that the residuals
of the long-term relationship are stationary (.I (0)). It is important to remember that
if .ẑt is not stationary, i.e., if the variables .Xt and .Yt are not cointegrated, then the
relationship (7.121) is a spurious regression. On the other hand, if .ẑt is stationary, the
relationship (7.121) is a cointegrating relationship. To test whether the residual term
of the long-term relationship is stationary or not, cointegration tests are performed.
There are several such tests (see in particular Engle and Granger, 1987; Johansen,
1988 and 1991); here we propose the Dickey-Fuller test.

Dickey-Fuller Test of No Cointegration


The Dickey-Fuller (DF) and augmented Dickey-Fuller (ADF) tests allow us to
test the null hypothesis of no cointegration against the alternative hypothesis that
the series under consideration are cointegrated. Their purpose is thus to test the
existence of a unit root in the residuals ẑt derived from the estimation of the long-term
relationship:

ẑt = Yt − α̂ − β̂Xt                                                             (7.123)

16 We assume here that the long-term relationship includes a constant term.



Table 7.17 Engle and Yoo's (1987) critical values for the DF test of no cointegration (p = 0)

         T      1%       5%       10%
N = 2    50    −4.32    −3.67    −3.28
         100   −4.07    −3.37    −3.03
         200   −4.00    −3.37    −3.02
N = 3    50    −4.84    −4.11    −3.73
         100   −4.45    −3.93    −3.59
         200   −4.35    −3.78    −3.47
N = 4    50    −4.94    −4.35    −4.02
         100   −4.75    −4.22    −3.89
         200   −4.70    −4.18    −3.89
N = 5    50    −5.41    −4.76    −4.42
         100   −5.18    −4.58    −4.26
         200   −5.02    −4.48    −4.18

– In the case of the DF test, we estimate the relationship:

Δẑt = φ ẑt−1 + ut                                                              (7.124)

– In the case of the ADF test, we estimate the relationship:

Δẑt = φ ẑt−1 + Σ(i=1 to p) φi Δẑt−i + ut                                       (7.125)

with, in both cases, ut ∼ WN.


We test the null hypothesis H0: ẑt non-stationary (φ = 0), reflecting the fact that
the variables Xt and Yt are not cointegrated, against the alternative hypothesis H1:
ẑt stationary (φ < 0), indicating that the series Xt and Yt are cointegrated.

It is important to stress that this test of no cointegration is based on the estimated


residuals .ẑt and not on the true values .zt . The consequence is that the critical values
tabulated by Dickey and Fuller are no longer valid. It is therefore appropriate to
use the critical values tabulated by Engle and Yoo (1987) (Tables 7.17 and 7.18) or
by MacKinnon (1991) (Table 7.19).17 In these tables, N designates the number of
variables considered and T the number of observations.
Since the critical values are negative, the decision rule is as follows (noting tφ̂
the value of the t-statistic associated with the estimated coefficient φ̂):

– If . tφ̂ is lower than the critical value, we reject .H0 : the series .Xt and .Yt are
cointegrated.

17 In the MacKinnon table, critical values are distinguished according to whether or not a trend is
included in the cointegration relationship.

Table 7.18 Engle and Yoo's (1987) critical values for the ADF test of no cointegration with p = 4

         T      1%       5%       10%
N = 2    50    −4.12    −3.29    −2.90
         100   −3.73    −3.17    −2.91
         200   −3.78    −3.25    −2.98
N = 3    50    −4.45    −3.75    −3.36
         100   −4.22    −3.62    −3.32
         200   −4.34    −3.78    −3.51
N = 4    50    −4.61    −3.98    −3.67
         100   −4.61    −4.02    −3.71
         200   −4.72    −4.13    −3.83
N = 5    50    −4.80    −4.15    −3.85
         100   −4.98    −4.36    −4.06
         200   −4.97    −4.43    −4.14

Table 7.19 MacKinnon's (1991) critical values for the ADF test of no cointegration

                          1%       5%       10%
N = 2   Without trend    −3.90    −3.34    −3.04
        With trend       −4.32    −3.78    −3.50
N = 3   Without trend    −4.30    −3.74    −3.45
        With trend       −4.67    −4.12    −3.84
N = 4   Without trend    −4.65    −4.10    −3.81
        With trend       −4.97    −4.43    −4.15
N = 5   Without trend    −4.96    −4.41    −4.13
        With trend       −5.25    −4.72    −4.44
N = 6   Without trend    −5.24    −4.71    −4.42
        With trend       −5.51    −4.98    −4.70

– If . tφ̂ is higher than the critical value, we do not reject .H0 : the variables .Xt and
.Yt are not cointegrated.
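
As an illustration, the two steps and the no-cointegration test can be sketched as follows in Python with statsmodels (whose coint function implements an Engle-Granger-type test with MacKinnon p-values). The simulated data, the single lag retained in the error-correction model, and the variable names are assumptions made for the example only, not the book's Eviews workflow.

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

# Simulate two cointegrated I(1) series: X is a random walk and Y = 0.5 + X + noise.
rng = np.random.default_rng(1)
T = 500
x = np.cumsum(rng.normal(size=T))
y = 0.5 + 1.0 * x + rng.normal(size=T)

# Step 1: estimate the long-term (static) relationship Y_t = a + b X_t + z_t by OLS.
longrun = sm.OLS(y, sm.add_constant(x)).fit()
z_hat = longrun.resid                          # estimated equilibrium error

# Engle-Granger-type test of the null hypothesis of no cointegration.
eg_stat, eg_pvalue, eg_crit = coint(y, x)

# Step 2: error-correction model with one lag of each differenced variable:
#   dY_t = c + gamma * z_hat_{t-1} + beta * dX_{t-1} + delta * dY_{t-1} + e_t
dy, dx = np.diff(y), np.diff(x)
regressors = sm.add_constant(np.column_stack([z_hat[1:-1], dx[:-1], dy[:-1]]))
ecm = sm.OLS(dy[1:], regressors).fit()
# The coefficient on z_hat_{t-1} (gamma) should be negative and significant
# for an error-correction mechanism to be at work.
print(ecm.params)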

Remark 7.13 The method of Engle and Granger (1987) provides us with a
simple way to test the hypothesis of no cointegration and to estimate an error-
correction model in two steps. The disadvantage of this approach is that it does
not allow multiple cointegration vectors to be distinguished. This is problematic
when we study N variables simultaneously, with .N > 2, or, if preferred, when
we have more than one explanatory variable .(k > 1). Indeed, we know that
if we analyze the behavior of N variables (with .N > 2), we can have up to
.(N − 1) cointegration relationships, the Engle-Granger approach allowing us to

obtain only one cointegration relationship. To overcome this difficulty, Johansen


(1988) proposed a multivariate approach to cointegration based on the maximum
likelihood method (see also Johansen and Juselius, 1990 and Johansen, 1991). A
presentation of this approach is beyond the scope of this book, and readers can
consult Engle and Granger (1991), Hamilton (1994), or Lardic and Mignon (2002).

Fig. 7.16 Evolution of stock prices (SP) and dividends (DIV), United States, 1871.01–2021.06

Example: The Relationship Between Prices and Dividends


The efficient capital market theory forms the core of modern financial theory.
This theory assumes that every asset has a “fundamental value,” reflecting the
underlying economic fundamentals. More precisely, in line with the dividend
discount model, the fundamental value of a stock or stock index is defined as the
discounted sum of future dividends rationally anticipated by agents. We deduce
that, based on this approach, prices and dividends are linked through a stable long-
term relationship: prices and dividends must be cointegrated. Indeed, if prices and
dividends are not cointegrated, i.e., if the residuals of the relationship between prices
and dividends are non-stationary, then there is a long-lasting deviation between
the price and the fundamental value under the dividend discount model. The price
does not return to the fundamental value, which can be interpreted as evidence of
informational inefficiency. Conversely, if prices and dividends are cointegrated, the
residuals are stationary, and there is no lasting deviation between the price and the
fundamental value, which is consistent with the discount model and therefore with
the informational efficiency of the market under consideration.
In order to grasp this issue, consider the Shiller data set18 relating to the US
market over the period January 1871 to June 2021 (monthly data). Figure 7.16 plots
the dynamics of the SP 500 index (SP ) of the New York Stock Exchange as well
as the corresponding dividends (DI V ), both variables being expressed in real terms
(i.e., deflated by the consumer price index). Looking at this figure, we can see that

18 www.econ.yale.edu/~shiller.

Fig. 7.17 Residuals of the long-term relationship between prices and dividends

prices and dividends follow a common trend, even though prices vary much more
than dividends. This is representative of the well-known phenomenon of excessive
stock price volatility.
In any case, and having confirmed that the two series under consideration are
indeed non-stationary and integrated of the same order (order 1), it is legitimate to
address the question of cointegration between the two variables. To this end, we
regress prices on dividends and study the stationarity of the residuals resulting from
the estimation of this relationship.
Figure 7.17 plots the pattern of this residual series. No particular structure
emerges, suggesting that the residuals appear stationary. Let us check this intuition
by applying the augmented Dickey-Fuller test to the residual series. We select a
number of lags equal to 1 and obtain a calculated value of the ADF statistic equal
to .−4.5805. The 5% critical value for 2 variables, more than 200 observations,
and zero lags is equal to .−3.37. Since .−4.5805 < −3.37, the null hypothesis of
non-stationarity of the residual series is rejected. It follows that the null hypothesis
of no cointegration between prices and dividends is rejected. Prices and dividends
are therefore cointegrated: there is a stable long-term relationship between the two
series, which is consistent with the efficient capital market hypothesis for the United
States over the period 1871–2021.

7.5.5 Empirical Application

We consider the long-term (10-year) interest rate series for Germany (GER) and
Austria (AU T ) at daily frequency over the period from January 2, 1986, to July

Table 7.20 ADF test results

         ADF          CV at 5%
GER      −1.4007      −1.95
AUT      −1.8859      −1.95
DGER     −74.7409     −1.95
DAUT     −98.4491     −1.95
CV: critical value

13, 2021, i.e., a total of 9,269 observations. These series are extracted from the
Macrobond database. Since the Engle-Granger approach applies for .CI (1, 1) series,
we first implement the ADF unit root test on the series GER and AU T . To this
end, we follow the previously presented strategy, consisting in starting from the
estimation of a model with trend and constant, then estimating a model with constant
without trend if the latter is not significant, and finally a model without constant or
trend if neither of them proves to be significant. The application of this strategy
leads to the results shown in Table 7.20. We have chosen a model without constant
or trend for both series. Since the calculated value of the ADF statistic for the series
GER and AU T is higher than the critical value, we do not reject the null hypothesis
of unit root at the 5% significance level. To determine the order of integration of the
two series, we differentiate them and apply the test procedure on the series in first-
difference DGER and DAU T . In both cases, a model without constant or trend
is used. It appears that the calculated value of the ADF statistic is lower than the
critical value at the 5% significance level: the null hypothesis of unit root is therefore
rejected. In other words, DGER and DAU T are integrated of order zero, implying
that GER and AU T are integrated of order 1.
The two series are integrated of the same order (order 1), which is a necessary
condition for implementing the Engle-Granger method.
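
A minimal sketch of this order-of-integration check (in Python, on a simulated random walk standing in for the interest rate series, since the Macrobond data are not reproduced here) could look as follows; the option regression="n" is the statsmodels setting corresponding to the model without constant or trend retained above (older versions use "nc").

import numpy as np
from statsmodels.tsa.stattools import adfuller

# Stand-in I(1) series; in practice GER and AUT would be loaded from the data source.
rng = np.random.default_rng(0)
ger = np.cumsum(rng.normal(size=1000))

# ADF test on the level: the unit root null should not be rejected for an I(1) series.
adf_level = adfuller(ger, regression="n")
# ADF test on the first difference: the null should now be rejected, so the series is I(1).
adf_diff = adfuller(np.diff(ger), regression="n")

print("level:", adf_level[0], adf_level[1])    # test statistic and p-value
print("diff :", adf_diff[0], adf_diff[1])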
Figure 7.18 representing the joint evolution of the two series further indicates
that GER and AU T are characterized by a common trend over the entire period.
Thus, since GER and AU T are non-stationary and integrated of the same order,
and follow a similar pattern, it is legitimate to ask whether the two variables are
cointegrated.
We begin by estimating the static relationship between GER and AU T , i.e.:

AU Tt = α + βGERt + zt
. (7.126)

The results from estimating this relationship allow us to deduce the residual
series:

. ẑt = AU Tt − 0.2237 − 1.0031GERt (7.127)

Recall that:

– If the residuals are non-stationary, the estimated static relationship is a spurious


regression.

Fig. 7.18 10-year interest rates, Germany (GER) and Austria (AUT), January 2, 1986–July 13, 2021

– If the residuals are stationary, the estimated static relationship is a cointegrating


relationship.

To discriminate between these two possibilities, we apply the ADF test of no


cointegration. Table 7.21 shows the results of the ADF test on the residual series .ẑt
(noted RESI DS in the table).
The calculated value of the ADF statistic should be compared with the critical
values of Engle and Yoo (1987) or MacKinnon (1991) (see Tables 7.17, 7.18,
and 7.19). A number of lags .p = 4 in the implementation of the ADF test have been
selected here. The critical value for .p = 4 is equal to .−3.25 at the 5% significance
level. The calculated value .−4.9615 being lower than the 5% critical value, the null
hypothesis of non-stationarity of the residuals is rejected. It follows that the series
GER and AU T are cointegrated and the estimated static relationship is indeed a
cointegrating relationship. It is then possible to estimate an error-correction model;
the results are shown in Table 7.22.
The results in Table 7.22 show that the coefficient associated with the one-period
lagged residual term is negative .(−0.0039) and significantly different from zero (its
t-statistic is higher than 1.96 in absolute value). There is thus an error-correction
mechanism: in the long term, the differences (or imbalances) between the two series
tend to offset each other, leading the variables to evolve in a similar way. We also
note that, in the short term, the change in the Austrian interest rate is a function of
itself, lagged by one period, and of the variation in the German interest rate, also
lagged by one period.

Table 7.21 ADF test on residuals


Null hypothesis: RESIDS has a unit root
Exogenous: none
Lag length: 4 (automatic – based on SIC, maxlag .= 37)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic .−4.961462 0.0000
Test critical values: 1% level −2.565212

5% level .−1.940858

10% level .−1.616677

*MacKinnon (1996) one-sided p-values


Augmented Dickey-Fuller test equation
Dependent variable: D(RESIDS)
Method: least squares
Sample (adjusted): 1/09/1986 7/13/2021
Included observations: 9264 after adjustments
Variable Coefficient Std. error t-Statistic Prob.
RESIDS(.−1) .−0.007200 0.001451 .−4.961462 0.0000
D(RESIDS(.−1)) .−0.311232 0.010406 .−29.90853 0.0000
D(RESIDS(.−2)) .−0.157829 0.010850 .−14.54651 0.0000
D(RESIDS(.−3)) .−0.092181 0.010841 .−8.502916 0.0000
D(RESIDS(.−4)) .−0.043617 0.010374 .−4.204419 0.0000
R-squared 0.097161 Mean dependent var .−0.000137
Adjusted R-squared 0.096771 S.D. dependent var 0.049113
S.E. of regression 0.046676 Akaike info criterion .−3.290644
Sum squared resid 20.17191 Schwarz criterion .−3.286793
Log likelihood 15247.26 Hannan-Quinn criterion .−3.289335

Durbin-Watson stat 2.000780

Table 7.22 Estimation of the error-correction model


Dependent variable: D(AUT)
Variable Coefficient Std. error t-Statistic Prob.
C .−0.000848 0.000421 .−2.015442 0.0439
RESIDS(.−1) .−0.003948 0.001252 .−3.152072 0.0016
D(AUT(.−1)) .−0.074325 0.011648 .−6.380673 0.0000
D(GER(.−1)) 0.088085 0.009155 9.621747 0.0000
R-squared 0.011960 Mean dependent var .−0.000847

Adjusted R-squared 0.011640 S.D. dependent var 0.040712


S.E. of regression 0.040474 Akaike info criterion .−3.575884
Sum squared resid 15.17411 Schwarz criterion .−3.572805

Log likelihood 16572.86 Hannan-Quinn criterion .−3.574838


F-statistic 37.37579 Durbin-Watson stat 2.002893
Prob(F-statistic) 0.000000

Conclusion

This chapter has introduced the basics of time series econometrics, a branch of
econometrics that is still undergoing many developments. In addition to univariate
time series models, we have dealt with multivariate analysis through VAR processes.
In these processes, all variables have the same status, in the sense that no distinction
is made between endogenous and exogenous variables. An alternative to VAR
processes are the simultaneous equations models which are discussed in the next
chapter. Unlike VAR models, which have no theoretical content, simultaneous
equations models are structural macroeconomic models.

The Gist of the Chapter

Stationarity
  Definition               E(Yt²) < ∞ ∀t ∈ Z; E(Yt) = m ∀t ∈ Z;
                           Cov(Yt, Yt+h) = γh ∀t, h ∈ Z, γ: autocovariance function
  Unit root test           Dickey-Fuller tests
Functions
  Autocovariance           γh = Cov(Yt, Yt+h), h ∈ Z
  Autocorrelation          ρh = γh/γ0, h ∈ Z
  Partial autocorrelation  φhh: calculation using the Durbin algorithm
Processes
  AR(p)                    Yt − φ1Yt−1 − ··· − φpYt−p = εt, with φhh = 0 ∀h > p
  MA(q)                    Yt = εt − θ1εt−1 − ··· − θqεt−q, with ρh = 0 ∀h > q
  ARMA(p, q)               Yt − φ1Yt−1 − ··· − φpYt−p = εt − θ1εt−1 − ··· − θqεt−q
Information criteria
  Akaike                   AIC = log σ̂ε² + 2(p + q)/T
  Schwarz                  SIC = log σ̂ε² + (p + q)(log T)/T
  Hannan-Quinn (c = 1)     HQ = log σ̂ε² + 2(p + q) log(log T)/T
VAR(p)                     Yt − Σ(i=1 to p) Φi Yt−i = Φ0 + εt
Cointegration              (Xt, Yt) ∼ CI(d, b) if zt = Yt − βXt ∼ I(d − b),
                           with Xt and Yt ∼ I(d), d ≥ b > 0
Error-correction model     ΔYt = γ ẑt−1 + Σi βi ΔXt−i + Σj δj ΔYt−j + εt,
                           γ: speed of adjustment to the long-term target

Further Reading

There are many textbooks on time series econometrics. In addition to the pioneering
work by Box and Jenkins (1970), let us mention the manuals by Harvey (1990),
Mills (1990), Hamilton (1994), Gouriéroux and Monfort (1996), or Brockwell and
Davis (1998); Hamilton’s (1994) work in particular includes numerous develop-
ments on multivariate models.
On the econometrics of non-stationary time series, in addition to the textbooks
cited above and the many references included in this chapter, readers may usefully
consult Engle and Granger (1991), Banerjee et al. (1993), Johansen (1995), as well
as Maddala and Kim (1998).
As mentioned in this chapter, time series econometrics has undergone, and
continues to undergo, many developments. There are therefore references specific
to certain fields:

– For developments relating to nonlinear time series econometrics, readers can


refer to Granger and Teräsvirta (1993), Lardic and Mignon (2002), or Teräsvirta
et al. (2010). A particular category of these processes concerns processes
with nonlinearities in variance (ARCH-type models), which are widely used in
finance. For a presentation of these models, in addition to the previously cited
references in nonlinear time series econometrics, see also the literature reviews
by Bollerslev et al. (1992, 1994), Palm (1996), Gouriéroux (1997), Bollerslev
(2008), and Bauwens et al. (2012).
– Readers interested in the econometrics of long-memory processes may refer to
Beran (1994) and Lardic and Mignon (1999, 2002).
– Concerning extensions of the notion of cointegration, let us mention the work
of Dufrénot and Mignon (2002a,b) on nonlinear cointegration and that of Lardic
and Mignon (2002) and Lardic et al. (2005) on fractional cointegration.
– Finally, let us mention a field whose development has been particularly notable
in recent years: non-stationary panel data econometrics. For pedagogical pre-
sentations in French, interested readers may refer to Hurlin and Mignon (2005,
2007).
8 Simultaneous Equations Models

So far, with the exception of the VAR models presented in the previous chapter, we
have considered models with only one equation. However, many economic theories
are based on models with several equations, i.e., on systems of equations. Since
these equations are not independent of each other, the interaction of the different
variables has important consequences for the estimation of each equation and for
the system as a whole.
We start by outlining the analytical framework before turning to the possibility
or not of estimating the parameters of the model, known as identification. We
then present the estimation methods relating to simultaneous equations models,
as well as the specification test proposed by Hausman (1978). We conclude with an
empirical application.

8.1 The Analytical Framework

In the single-equation models we have studied so far, there is only one endogenous
variable, the latter being explained by one or more exogenous variables. If a causal
relationship exists, it runs from the exogenous variables to the endogenous variable.
In a simultaneous equations model, each equation is relative to an endogenous
variable, and it is very common for an explained variable in one equation to become
an explanatory variable in another equation of the model. The distinction between
endogenous and exogenous variables is therefore no longer as marked as in the case
of single-equation models and, in a simultaneous equations model, the variables
are determined simultaneously. This dual status of the variables appearing in a
simultaneous equations model means that it is impossible to estimate the parameters
of one equation without taking into account the information provided by the other
equations in the system. In particular, the OLS estimators are biased and non-
consistent, in the sense that they do not converge to their true values when the sample


size increases. As an example, consider the following system:

Yt = α + βXt + εt
. (8.1)

Xt = Yt + Zt
. (8.2)

where .εt is an error term. In Eq. (8.1), the variable Y is explained by X; Y is


therefore an endogenous variable. Equation (8.2) shows that the variable X is in
turn explained by Y and Z. All in all, in this system, Y and X are endogenous
variables and Z is an exogenous variable.
Suppose that εt follows a normal distribution of zero mean and constant variance σε²,
and that εt and Zt are independent. We can rewrite the system in the following

form:

Yt = α + βXt + εt = α + β (Yt + Zt ) + εt
. (8.3)

hence:
Yt = α/(1 − β) + [β/(1 − β)] Zt + [1/(1 − β)] εt                               (8.4)

We deduce:
Xt = α/(1 − β) + [β/(1 − β)] Zt + [1/(1 − β)] εt + Zt                          (8.5)

hence:
Xt = α/(1 − β) + [1/(1 − β)] Zt + [1/(1 − β)] εt                               (8.6)

The system is finally written:

Yt = α/(1 − β) + [β/(1 − β)] Zt + μt                                           (8.7)

Xt = α/(1 − β) + [1/(1 − β)] Zt + μt                                           (8.8)

with μt = [1/(1 − β)] εt.
Equation (8.8) shows that Xt is influenced by μt and, consequently, by εt. It
follows that Cov(Xt, εt) ≠ 0, implying that the OLS estimator is not consistent.
In order to introduce some concepts relating to simultaneous equations models,
let us start with an introductory example.

8.1.1 Introductory Example

Consider the following three-equation system composed of centered variables:

qtd = α1 pt + α2 yt + εtd
. (8.9)

qts = β1 pt + εts
. (8.10)

qtd = qts = qt
. (8.11)

where Eq. (8.9) is the demand equation, .qtd denoting the quantity demanded of
any good, .pt the price of that good, and .yt income. Equation (8.10) is the supply
equation, .qts denoting the quantity offered of the good under consideration. .εtd and .εts
are error terms, also known as disturbances. The demand and supply equations are
behavioral equations. Finally, Eq. (8.11) is called the equilibrium equation: it is
the equilibrium condition represented by the equality between demand and supply.
Equilibrium equations contain no error term.
The equations of this system, derived from economic theory, are called struc-
tural equations. This is referred to as a model expressed in structural form. In
this system, price and quantity variables are interdependent, so they are mutually
dependent or endogenous. Income .yt is an exogenous variable, in the sense that it is
determined outside the system.
Since the system incorporates a demand equation, a supply equation, and an
equilibrium condition, it is referred to as a complete system in the sense that it
has as many equations as there are endogenous variables.
Let us express each of the endogenous variables in terms of the exogenous
variable and the error terms .εtd and .εts . From Eq. (8.10), we can write:

pt = (1/β1) qt − (1/β1) εts                                                    (8.12)

We transfer this expression into (8.9), which gives:


 
qt = α1 [(1/β1) qt − (1/β1) εts] + α2 yt + εtd                                 (8.13)

Hence:
qt = [α2β1/(β1 − α1)] yt + [1/(β1 − α1)] (β1 εtd − α1 εts)                     (8.14)

Positing:

γ1 = α2β1/(β1 − α1)                                                            (8.15)

and:

μ1t = [1/(β1 − α1)] (β1 εtd − α1 εts)                                          (8.16)

we can rewrite Eq. (8.14) as follows:

. qt = γ1 yt + μ1t (8.17)

Now we transfer the expression (8.14) into (8.12), and we obtain:


  
pt = (1/β1) {[α2β1/(β1 − α1)] yt + [1/(β1 − α1)] (β1 εtd − α1 εts)} − (1/β1) εts    (8.18)

that is:

pt = [α2/(β1 − α1)] yt + [1/(β1 − α1)] (εtd − εts)                             (8.19)

By positing:

γ2 = α2/(β1 − α1)                                                              (8.20)

and:

μ2t = [1/(β1 − α1)] (εtd − εts)                                                (8.21)

this last equation can be rewritten as:

.pt = γ2 yt + μ2t (8.22)

Putting together Eqs. (8.17) and (8.22), the system of equations is finally written
as:

. qt = γ1 yt + μ1t (8.23)

.pt = γ2 yt + μ2t (8.24)

Each of the endogenous variables is expressed as a function of the exogenous


variable and a random error term. This is known as the reduced form of the model
(none of the endogenous variables is any longer expressed as a function of the
other endogenous variables). Equations (8.17) and (8.22) are called reduced-form
equations.

In this system, the endogenous variables are correlated with the error terms, with
the result that the OLS estimators are no longer consistent. As we will see later, it
is possible to use an instrumental variables estimator or a two-stage least squares
estimator.

Remark 8.1 When a model includes lagged endogenous variables, these are
referred to as predetermined variables. As an example, consider the following
model:

Ct = α0 + α1 Yt + α2 Ct−1 + ε1t
. (8.25)

It = β0 + β1 Rt + β2 (Yt − Yt−1 ) + ε2t


. (8.26)

Yt = Ct + It + Gt
. (8.27)

Equation (8.25) is the consumption equation, (8.26) is the investment equation,


and (8.27) is the equilibrium condition. This model has three endogenous variables
.(Ct , It , and Yt ), two exogenous variables .(Rt and Gt ), and two lagged endogenous

variables .(Ct−1 and Yt−1 ). The latter two variables are said to be predetermined
in the sense that they are considered to be already determined with respect to the
current values of the endogenous variables.
More generally, variables that are independent of all future error terms of the
structural form are called predetermined variables.

8.1.2 General Form of Simultaneous Equations Models

In the general case, the structural form of the simultaneous equations model is
written:

β11 Y1t + β12 Y2t + . . . + β1M YMt + γ11 X1t + γ12 X2t + . . . + γ1k Xkt = ε1t
β21 Y1t + β22 Y2t + . . . + β2M YMt + γ21 X1t + γ22 X2t + . . . + γ2k Xkt = ε2t
.
...
βM1 Y1t + βM2 Y2t + . . . + βMM YMt + γM1 X1t + γM2 X2t + . . . + γMk Xkt = εMt
(8.28)

This model includes M equations and M endogenous variables .(Y1t , Y2t , . . . ,


YMt ). It comprises k exogenous variables .(X1t , X2t , . . . , Xkt ) which may also
contain predetermined values of the endogenous variables.1 One of the variables
may consist of 1 in order to account for the constant term in each of the equations.
The error terms .(ε1t , ε2t , . . . , εkt ) are called structural disturbances.

1 The predetermined variables can thus be divided into two categories: exogenous variables and
lagged endogenous variables.

This model can also be written in matrix form:

B Y + 𝚪 X = ε                                                                  (8.29)

where the dimensions are B (M × M), Y (M × 1), 𝚪 (M × k), X (k × 1), and ε (M × 1),

with:
⎛ ⎞
β11 β12 · · · β1M
⎜ β21 β22 · · · β2M ⎟
⎜ ⎟
.B = ⎜ .. ⎟ (8.30)
⎝ . ⎠
βM1 βM2 · · · βMM
⎛ ⎞
Y1t
⎜ Y2t ⎟
⎜ ⎟
.Y = ⎜ . ⎟ (8.31)
⎝ .. ⎠
YMt
⎛ ⎞
γ11 γ12 · · · γ1k
⎜ γ21 γ22 · · · γ2k ⎟
⎜ ⎟
.𝚪 = ⎜ .. ⎟ (8.32)
⎝ . ⎠
γM1 γM2 · · · γMk
⎛ ⎞
X1t
⎜X2t ⎟
⎜ ⎟
.X = ⎜ . ⎟ (8.33)
⎝ .. ⎠
Xkt

and:
⎛ ⎞
ε1t
⎜ ε2t ⎟
⎜ ⎟
.ε = ⎜ . ⎟ (8.34)
⎝ .. ⎠
εMt

In each equation, one of the endogenous variables has its coefficient equal to
1: this is the dependent variable. There is therefore one dependent variable per
equation. In other words, in the matrix .B, each column has at least one value equal
to 1. This is known as normalization. On the other hand, equations in which all
coefficients are equal to 1 and involve no disturbance are equilibrium equations.

If the matrix .B is non-singular, it is invertible and it is possible to derive the


reduced form of the model allowing the matrix .Y to be expressed in terms of the
matrix .X:

Y = −B −1 𝚪X + B −1 ε
. (8.35)

The condition that the matrix .B is non-singular is called the completeness


condition. The reduced form allows each endogenous variable to be expressed in
terms of the exogenous or predetermined variables and the disturbances.
The reduced-form equations can be estimated by OLS. In these equations, the
endogenous variables are expressed as a function of the exogenous or predetermined
variables, assumed to be uncorrelated with the error terms. After estimating the
parameters of the reduced-form equations, it is possible to determine the parameters
of the structural equations by applying the indirect least squares method (see
below).
While the transition from the structural form to the reduced form seems easy in
theory, it is not the same in practice. In the reduced form, knowing the elements of
the matrix . B −1 𝚪 does not allow us to determine, i.e., to identify, the matrices .B
and .𝚪 separately. This is known as the identification problem: we have a system
of .(M × k) equations with .(M × M) + (M × k) unknowns, which therefore cannot
be solved without imposing certain restrictions.

Remark 8.2 If the matrix .B is an upper triangular matrix, the system is described
as triangular or recursive. Its form is as follows:

Y1t = f1 (X1t , X2t , . . . , Xkt ) + ε1t


Y2t = f2 (Y1t , X1t , X2t , . . . , Xkt ) + ε2t
. (8.36)
···
YMt = fM (Y1t , Y2t , . . . , YM−1t , X1t , X2t , . . . , Xkt ) + εMt

Each endogenous variable is determined sequentially or recursively. The first


equation contains no endogenous variables and is entirely determined by the
exogenous variables. In the second equation, the explanatory variables include the
endogenous variable from the first equation, and so on. In a triangular system of the
kind, the OLS method can be applied equation by equation, since the endogenous
variables do not depend on the disturbances.

8.2 The Identification Problem


8.2.1 Problem Description

The question posed here is whether it is possible to derive estimators of the structural
form parameters from estimators of the reduced-form parameters. The problem
arises from the fact that several structural coefficient estimates can be compatible

with the same data sets. In other words, one reduced-form equation may correspond
to several structural equations.
The identification conditions are determined equation by equation. Several cases
may arise:

– If it is impossible to obtain the estimators of the structural form parameters


from the estimators of the reduced form, the model is said to be unidentified or
underidentified. Thus, a model is underidentified if one equation of the model
is underidentifiable. This means that the number of equations is smaller than
the number of parameters to be identified in the structural form, and it is then
impossible to solve the system.
– If it is possible to obtain the estimators of the parameters of the structural form
from the estimators of the reduced form, the model is said to be identified. There
are two possible scenarios here:
– The model is exactly (or fully or strictly) identified if all its equations are
strictly identifiable, i.e., if unique values of the structural parameters can be
obtained.
– The model is overidentified if the equations are overidentifiable, i.e., if
several values correspond to the structural parameters.

We will come back to these various cases later.

8.2.2 Rank and Order Conditions for Identification

Recall that the structural form is given by:

.BY + 𝚪X = ε (8.37)

and the reduced form by:

Y = −B −1 𝚪X + B −1 ε
. (8.38)

or:

Y = ΠX + υ
. (8.39)

with .Π = −B −1 𝚪 and .υ = B −1 ε.
Thus, three types of structural parameters are unknown:

– The matrix .B which is a non-singular matrix of size .(M × M)


– The parameter matrix .𝚪 of size .(M × k)
– The variance-covariance matrix of structural disturbances, denoted .Ωε

The reduced form includes the following known parameters:

– The matrix of coefficients of the reduced form .Π of size .(M × k)


– The variance-covariance matrix of the disturbances of the reduced form, noted
.Ωυ

In other words, the number of structural parameters is equal to M² + Mk + M(M + 1)/2
and the number of parameters of the reduced form is given by Mk + M(M + 1)/2. The
difference between the number of structural parameters and that of
the reduced form is therefore equal to M², which corresponds to the number of
unknown elements in the matrix .B. Consequently, if no additional information
is available, identification is impossible. The additional information can be of
several types, depending on the nature of the restrictions or constraints imposed
on the coefficients of the structural form: normalization, identities, exclusion
relations, linear restrictions, or even restrictions on the variance-covariance matrix
of disturbances. Let us consider each of these five points in turn.

Restrictions
– Normalization. As previously mentioned, in each equation, one of the endoge-
nous variables has its coefficient equal to 1: this is the dependent variable.
There is one such dependent variable per equation. Imposing a value of 1 on
a coefficient is called normalization. This operation reduces the number of
unknown elements in the matrix B, since we then have M(M − 1) and no longer
M 2 undetermined elements.
– Identities. We know that a model can contain behavioral relations and equilib-
rium relations or accounting identities. These equilibrium relations and account-
ing identities do not have to be identified: the coefficients associated with the
variables in these relations are in fact known and are frequently equal to 1. In the
introductory example we studied earlier, Eq. (8.11) is the equilibrium condition
and does not have to be identified.
– Exclusion relations. Not introducing a variable into one of the equations of
the system is considered as an exclusion relation. In effect, this amounts to
assigning a zero coefficient to the variable in question. In other words, it consists
in placing zeros in the elements of the matrices B and/or 𝚪. Such a procedure
obviously reduces the number of unknown parameters and thus provides an aid
to identification.
– Linear restrictions. In line with economic theory, some models contain variables
with identical coefficients. Imposing such restrictions on parameters facilitates
the estimation procedure by reducing the number of unknown parameters.
– Restrictions on the variance-covariance matrix of the disturbances. Such restric-
tions are similar to those imposed on the model parameters. They consist, for
example, in introducing zeros for certain elements of the variance-covariance
matrix when imposing the absence of correlation between the structural distur-
bances of several equations.

Conditions for Identification


Let us first introduce some notations. Consider a particular equation j of the model
with M simultaneous equations. The coefficients associated with this equation
appear accordingly in the j -th columns of the matrices .B and/or .𝚪. It is further
assumed that:

– In this equation, one of the elements of the matrix .B is equal to 1 (normalization).


– Some variables appearing in other equations are excluded from this equation
(exclusion relations).

Note:

– M the number of endogenous variables in the model, i.e., the number of


equations in the model,
– k the number of exogenous variables introduced into the model,
– .Mj the number of endogenous variables included in the equation j considered,

.M designating the number of endogenous variables excluded from the equa-
j
tion j ,
– .kj the number of exogenous variables present in the equation j under considera-
tion, .kj∗ denoting the number of exogenous variables excluded from the equation
j.

The number of equations in the model M is therefore given by:

M = Mj + Mj∗ + 1
. (8.40)

and the number of exogenous variables k is equal to:

.k = kj + kj∗ (8.41)

Since the number of equations must be at least equal to the number of unknowns,
we deduce the order condition for the identification of the equation j :

kj∗ ≥ Mj
. (8.42)

According to this condition, the number of variables excluded from the equation
j must be at least equal to the number of endogenous variables included in this
same equation j . The order condition is a necessary condition for identification, but
not a sufficient one. In other words, it ensures that the j -th equation of the reduced
form admits a solution, but we do not know whether or not it is unique. In order
to guarantee the uniqueness of the solution, the rank condition is necessary. This
condition (see Greene, 2020) imposes a restriction on the submatrix of the reduced-
form coefficient matrix and ensures that there is a unique solution for the structural
parameters given the parameters of the reduced form. This rank condition can be

expressed as follows: the equation j is identified if and only if it is possible to obtain


at least one non-zero determinant of order .(M − 1, M − 1) from the coefficients of
the variables excluded from the equation j , but included in the other equations of
the model. In large models, only the order condition is used, as it is very difficult, if
not impossible, to apply the rank condition.
Three cases are then possible, as discussed at the beginning of this section:

– If .kj∗ < Mj , or if the rank condition is not verified, the model is underidentified.
– If .kj∗ = Mj , and the rank condition is verified, the model is exactly identified.
– If .kj∗ > Mj , and the rank condition is verified, the model is overidentified (there
are more restrictions than those necessary for identification).
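
To make the order condition concrete, consider again the introductory example of Sect. 8.1.1: after substituting the equilibrium condition, the model has M = 2 endogenous variables (qt and pt) and k = 1 exogenous variable (yt). In the demand equation (8.9), one endogenous variable appears as a regressor (pt, so Mj = 1) and no exogenous variable is excluded (kj∗ = 0); since kj∗ < Mj, the demand equation is underidentified. In the supply equation (8.10), pt again appears as a regressor (Mj = 1) while yt is excluded (kj∗ = 1); since kj∗ = Mj, the supply equation is exactly identified, provided the rank condition also holds.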

To establish these identification conditions, we have considered only exclusion


relations. If we also have linear restrictions on the parameters, the order condition
becomes:

rj + kj∗ ≥ Mj
. (8.43)

where .rj denotes the number of restrictions other than the exclusion restrictions.
It is possible to reformulate this order condition by taking into account both the
exclusion relations and the linear restrictions. By noting .sj the total number of
restrictions, i.e.:

sj = rj + kj∗ + Mj∗
. (8.44)

we can write the order condition as:

sj ≥ M − 1
. (8.45)

Knowing that M − 1 = Mj + Mj∗, we have, by transferring (8.44) into (8.45):
rj + kj∗ + Mj∗ ≥ Mj + Mj∗, and we find Eq. (8.43). We then obtain the three cases
previously presented:

– If rj + kj∗ < Mj, or if the rank condition does not hold, the model is
underidentified.
– If rj + kj∗ = Mj, and the rank condition holds, the model is exactly
identified.
– If rj + kj∗ > Mj, and the rank condition holds, the model is overidentified.

Remark 8.3 It is also possible to use restrictions on the variance-covariance matrix


of the disturbances. We know that Ωυ = B⁻¹ Ωε (B⁻¹)′. If restrictions are imposed
on .Ωυ , more information than necessary will be available for estimating .Ωυ . As a
result, it is possible to use the additional information to identify the elements of .B.
Thus, imposing zero covariances between disturbances can help in the identification.

On this point, reference can be made to Johnston and Dinardo (1996) and Greene
(2020).

8.3 Estimation Methods

Identification is a prerequisite for estimating a simultaneous equations model, since


if the model is underidentified, it cannot be estimated. Only exactly identified or
overidentified models are estimable.
We have seen that while the reduced form can be estimated by OLS, this is not
the case for the structural form. The OLS estimators of the structural parameters
are not consistent insofar as the endogenous variables in each of the equations are
correlated with the disturbances.2
Methods for estimating simultaneous equations models are for the most part
instrumental variables methods (see Chap. 5) and can be classified into two broad
categories:

– Limited-information estimation methods: each equation is estimated sepa-


rately.
– Full-information estimation methods: the system as a whole is estimated, i.e.,
the M equations of the model are estimated simultaneously.

In limited-information estimation methods, the information contained in the


other equations is ignored, hence the name given to these techniques. This category
includes the methods of indirect least squares, two-stage least squares, generalized
moments, limited-information maximum likelihood, and K-class estimators. On the
contrary, in the full-information methods, all the information contained in the set of
M equations is used, hence their name. This category includes the three-stage least
squares method, the full-information maximum likelihood method, or the system
generalized method of moments. Logically, full-information methods are expected
to perform better than limited-information methods, as the joint estimation should
lead to efficiency gains. Despite this advantage, these methods tend to be less widely
used in practice than limited-information methods, for essentially three reasons:
computational complexity, existence of nonlinear solutions on the parameters, and
sensitivity to specification errors.
We focus here essentially on two limited-information estimation methods:
indirect least squares and two-stage least squares.

2 However, the OLS method can be applied in the case of triangular (or recursive) systems.

8.3.1 Indirect Least Squares

This estimation method applies only to equations that are exactly identified. Gen-
erally speaking, the principle of indirect least squares (ILS) consists in estimating
the parameters of the reduced form by OLS and deducing the structural coefficients
by an appropriate transformation of the reduced form coefficients. This technique
can be described in three steps:

– The first step is to write the model in reduced form. This involves expressing the
dependent variable of each equation as a function of the predetermined variables
(exogenous and lagged endogenous variables) and the disturbances.
– The second step aims to estimate the parameters of each of the reduced-form
equations by OLS. The application of OLS is made possible by the fact that the
explanatory variables (predetermined variables) of the reduced-form equations
are no longer correlated with the disturbances.
– The purpose of the third step is to deduce the parameters of the structural
form from the estimated parameters of the reduced form. This determination is
made using the algebraic relations linking the structural and the reduced form
coefficients. The solution is unique since the model is exactly identifiable: there
is thus a one-to-one correspondence between the structural coefficients and those
of the reduced form.

The ILS estimator of the reduced form—which is therefore the OLS estimator—
is a BLUE estimator. In contrast, the ILS estimator of the structural form coefficients
is a biased estimator in the case of small samples. In addition, since the reduced form
of a model is not always easy to establish—especially when the model comprises a
large number of equations—and the existence of an exactly identified relationship is
quite rare, the ILS method is not often used in practice. The two-stage least squares
method is used more frequently.

8.3.2 Two-Stage Least Squares

The two-stage least squares (2SLS) method is the most widely used estimation
method for simultaneous equations models. This estimation procedure was intro-
duced by Theil (1953) and Basmann (1957) and applies to models that are exactly
identifiable or overidentifiable.
As the name suggests, this technique involves applying the OLS method twice.
Consider the simultaneous equations model with M endogenous variables and k
predetermined variables:

Y1t = β12 Y2t + . . . + β1M YMt + γ11 X1t + γ12 X2t + . . . + γ1k Xkt + ε1t
Y2t = β21 Y1t + . . . + β2M YMt + γ21 X1t + γ22 X2t + . . . + γ2k Xkt + ε2t
.
...
YMt = βM1 Y1t + . . . + βMM YMt + γM1 X1t + γM2 X2t + . . . + γMk Xkt + εMt
(8.46)

The first step consists of regressing each of the endogenous variables


(Y1t , Y2t , . . . , YMt ) on the set of predetermined variables .(X1t , X2t , . . . , Xkt )—
.

the aim being to remove the correlation between endogenous variables and
disturbances. We thus have the following system:

Y1t = α11 X1t + α12 X2t + . . . + α1k Xkt + u1t


Y2t = α21 X1t + α22 X2t + . . . + α2k Xkt + u2t
. (8.47)
...
YMt = αM1 X1t + αM2 X2t + . . . + αMk Xkt + uMt

The terms .(u1t , u2t , . . . , uMt ) denote the error terms associated with each of
the equations in this system. This system corresponds to a reduced form system
insofar as no endogenous variables appear on the right-hand side of the various
equations. We deduce
  from the estimation of these equations the estimated values
. Ŷ1t , Ŷ2t , . . . , ŶMt :

Ŷ1t = α̂11 X1t + α̂12 X2t + . . . + α̂1k Xkt


Ŷ2t = α̂21 X1t + α̂22 X2t + . . . + α̂2k Xkt
. (8.48)
...
ŶMt = α̂M1 X1t + α̂M2 X2t + . . . + α̂Mk Xkt

The second step consists in replacing the endogenous variables appearing on the
right-hand side of the structural equations with their values estimated in the first
step, i.e.:

Y1t = β12 Ŷ2t + . . . + β1M ŶMt + γ11 X1t + γ12 X2t + . . . + γ1k Xkt + v1t
Y2t = β21 Ŷ1t + . . . + β2M ŶMt + γ21 X1t + γ22 X2t + . . . + γ2k Xkt + v2t
.
...
YMt = βM1 Ŷ1t + . . . + βMM ŶMt + γM1 X1t + γM2 X2t + . . . + γMk Xkt + vMt
(8.49)

where the terms .(v1t , v2t , . . . , vMt ) designate the disturbances associated with the
equations of the latter system.
The two-stage least squares estimator can be interpreted as an instrumental
variables estimator where the instruments used are the estimated values of the
endogenous variables (for an in-depth description, see in particular Johnston
and Dinardo, 1996). It can be shown that in the absence of autocorrelation
and heteroskedasticity, the two-stage least squares estimator is the most efficient
instrumental variables estimator.
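
The two stages are easy to reproduce by hand. The sketch below (in Python, on simulated data with purely illustrative names and coefficient values) estimates a single structural equation in which Y2 is endogenous and X2 serves as the excluded instrument; note that the standard errors produced by the naive second-stage OLS are not the correct 2SLS standard errors, which is one reason dedicated routines are preferred in practice.

import numpy as np
import statsmodels.api as sm

# Simulated system: Y2 is endogenous (its error is correlated with the structural
# error of the Y1 equation), while X1 and X2 are predetermined.
rng = np.random.default_rng(0)
T = 1000
x1, x2 = rng.normal(size=T), rng.normal(size=T)
e1, e2 = rng.normal(size=T), rng.normal(size=T)
y2 = 1.0 + 0.8 * x1 + 1.2 * x2 + e2 + 0.5 * e1    # endogenous regressor
y1 = 2.0 + 0.5 * y2 + 0.3 * x1 + e1               # structural equation of interest

# Stage 1: regress the endogenous regressor on ALL predetermined variables.
Z = sm.add_constant(np.column_stack([x1, x2]))
y2_hat = sm.OLS(y2, Z).fit().fittedvalues

# Stage 2: replace Y2 by its fitted values and apply OLS again.
W = sm.add_constant(np.column_stack([y2_hat, x1]))
tsls = sm.OLS(y1, W).fit()
print(tsls.params)     # should be close to the true values (2.0, 0.5, 0.3)

# For comparison, naive OLS on the structural equation is biased here.
ols = sm.OLS(y1, sm.add_constant(np.column_stack([y2, x1]))).fit()
print(ols.params)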

Remark 8.4 If the coefficients of determination associated with the reduced-form


equations of the first stage are very high, the OLS and two-stage least squares
estimators will be similar. Indeed, if the coefficient of determination is large, the

 
estimated values of the endogenous variables . Ŷ1t , Ŷ2t , . . . , ŶMt are close to the
true values .(Y1t , Y2t , . . . , YMt ). As a result, the estimators in the second step will be
very close to those in the first step. Conversely, if the coefficients of determination
associated with the reduced-form equations of the first stage are low, the regressions
are poorly explanatory, and the estimated values (Ŷ1t, Ŷ2t, . . . , ŶMt) used in the
second stage will be largely composed of the errors of the first-stage regressions.
The significance of the two-stage least squares estimators is then greatly reduced.

Remark 8.5 When the model is exactly identified, the indirect least squares and
two-stage least squares methods lead to identical results.

Remark 8.6 There are other limited-information methods for estimating simulta-
neous equations models: the generalized moments estimator (used when there is
a presumption of heteroskedasticity), the limited-information maximum likelihood
estimator, or K-class estimators. For a presentation of these various techniques, see
Theil (1971), Davidson and MacKinnon (1993), Florens et al. (2007), or Greene
(2020).

8.3.3 Full-Information Methods

We will not develop these techniques in this book but refer readers instead to Zellner
and Theil (1962), Theil (1971), Johnston and Dinardo (1996), or Greene (2020). Let
us simply mention that these procedures consist in estimating the M equations of
the system simultaneously. Thus, all the information about the set of the structural
equations is taken into account during the estimation. In this framework, the most
commonly used methods are:

– The three-stage least squares method, due to Zellner and Theil (1962).
Heuristically, this technique involves (i) estimating the reduced form coefficients
by OLS, (ii) determining the two-stage least squares estimators for each equation,
and (iii) calculating the GLS estimator. The three-stage least squares estimator
is an asymptotically efficient instrumental variables estimator. It is particularly
appropriate when the disturbances are heteroskedastic and correlated with each
other.
– The full-information maximum likelihood method. Like the previous one,
this technique considers all the equations and all the model parameters jointly.
It is based on the assumption that the disturbances are normally distributed
and consists in maximizing the log likelihood associated with the model. In
addition to Theil (1971), Dhrymes (1973) and Hausman (1975, 1983) can also
be consulted on this technique.
– The system generalized method of moments. This method is mainly used in the
presence of heteroskedasticity. If the disturbances are homoskedastic, this leads
to results asymptotically equivalent to those derived from the three-stage least
squares method.

Remark 8.7 The three-stage least squares method can be seen as a two-stage least
squares version of the SUR (seemingly unrelated regressions) method of Zellner
(1962). A SUR model is a system composed of M equations and T observations of
the type:

Y1 = X1β1 + ε1
Y2 = X2β2 + ε2
...                                                                            (8.50)
YM = XMβM + εM

which can be written as:

Yi = Xiβi + εi                                                                 (8.51)

with i = 1, . . . , M, ε = [ε1, ε2, . . . , εM]′, E[ε | X1, . . . , XM] = 0, and
E[εε′ | X1, . . . , XM] = Ωε.
.

This model is called a seemingly unrelated regressions model because the


equations are linked only by their disturbances. The SUR method, which consists
in applying the GLS technique to the system of M equations, is appropriate if all
the variables on the right-hand side of the equations are exogenous (which is not
the case in the structural form of simultaneous equations models) and enables the
parameters of a system to be estimated, taking into account heteroskedasticity and
correlation between the error terms of the different equations.
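The same stacked-system machinery as in the three-stage least squares sketch above, with plain OLS in place of the first-stage projection on the instruments, gives a minimal sketch of the SUR estimator by feasible GLS (the function name sur_fgls and the data layout are, again, illustrative assumptions):

import numpy as np

def sur_fgls(y_list, X_list):
    """Illustrative sketch of the SUR estimator by feasible GLS."""
    T, M = len(y_list[0]), len(y_list)

    # Step 1: OLS equation by equation, to estimate the cross-equation covariance matrix
    resid = []
    for y, X in zip(y_list, X_list):
        beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
        resid.append(y - X @ beta_ols)
    E = np.column_stack(resid)
    Sigma = E.T @ E / T

    # Step 2: GLS on the stacked system (block-diagonal regressor matrix)
    Xblk = np.zeros((M * T, sum(X.shape[1] for X in X_list)))
    col = 0
    for j, X in enumerate(X_list):
        Xblk[j * T:(j + 1) * T, col:col + X.shape[1]] = X
        col += X.shape[1]
    y_stack = np.concatenate(y_list)
    W = np.kron(np.linalg.inv(Sigma), np.eye(T))
    return np.linalg.solve(Xblk.T @ W @ Xblk, Xblk.T @ W @ y_stack)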

Remark 8.8 The three-stage least squares and full-information maximum likeli-
hood estimators are instrumental variables estimators. Both estimators have the
same asymptotic variance-covariance matrix. Therefore, under the assumption
of normality of the disturbances, the two estimators have the same asymptotic
distribution. The three-stage least squares estimator is, however, easier to calculate
than the full-information maximum likelihood estimator.

Remark 8.9 One may ask under what conditions the three-stage least squares
method is more efficient than the two-stage least squares method. Generally speak-
ing, a full-information method is more efficient than a limited-information method
if the model specification is correct. This is a very strong condition, especially in the
case of large models. A misspecification in the model structure will affect the whole
system with full-information three-stage least squares and maximum likelihood
methods, whereas limited-information methods generally restrict the problem to
the equation affected by the misspecification. Furthermore, if the disturbances of
the structural equations are not correlated with each other, the two-stage and three-
stage least squares methods yield identical results. Similarly, both techniques lead
to identical results if the model equations are exactly identified.

8.4 Specification Test

We have seen that OLS estimators are not consistent in the case of simultaneous
equations. In the presence of simultaneity, it is appropriate to use other estimation
techniques that we presented in the previous section (instrumental variables meth-
ods). However, if simultaneity does not exist, the instrumental variables techniques
lead to consistent, but inefficient estimators. The question of simultaneity is
therefore crucial. It arises insofar as the endogenous variables appear among the
regressors of a simultaneous equations model and insofar as such variables are likely
to be correlated with the disturbances. Testing simultaneity therefore amounts to
testing the correlation between an endogenous regressor and the error term. If the
test concludes that simultaneity is present, it is appropriate to use the techniques
presented in the previous section, i.e., the instrumental variables methods. On the
other hand, in the absence of simultaneity, OLS should be used.
The test proposed by Hausman (1978) provides a way of dealing with the
simultaneity problem. The general principle of the test is to compare two sets
of estimators: (i) a set of estimators assumed to be consistent under the null
hypothesis (absence of simultaneity) and under the alternative hypothesis (presence
of simultaneity) and (ii) a set of estimators assumed to be consistent only under the
null hypothesis. To illustrate this test, consider the following example, inspired by
Pindyck and Rubinfeld (1991). The model consists of a demand equation:

Qt = α0 + α1 Pt + α2 Yt + α3 Rt + ε1t        (8.52)

and a supply equation:

Qt = β0 + β1 Pt + ε2t        (8.53)

where Q denotes quantity, P price, Y income, and R wealth. It is assumed that Y


and R are exogenous, with P and Q being endogenous. To determine whether there
is a simultaneity problem between P and Q, we proceed as follows.
In the first step, from the structural model formed by Eqs. (8.52) and (8.53), we
deduce the reduced form, which can be expressed in a general way as follows:

Qt = a0 + a1 Yt + a2 Rt + u1t        (8.54)

Pt = b0 + b1 Yt + b2 Rt + u2t        (8.55)

We estimate (8.55) by OLS, which gives:

P̂t = b̂0 + b̂1 Yt + b̂2 Rt        (8.56)

from which we derive the residuals:

û2t = Pt − P̂t        (8.57)

Replacing Pt by P̂t + û2t in (8.53), we obtain:

Qt = β0 + β1 P̂t + β1 û2t + ε2t        (8.58)

Under the null hypothesis of no simultaneity, the correlation between û2t and ε2t
is zero.
The second step is to estimate the relationship (8.58) and perform a significance
test (usual t-test) of the coefficient assigned to û2t. If this coefficient is not
significantly different from zero, the null hypothesis is not rejected and there is
no simultaneity problem: the OLS method can be applied. On the other hand, if it
is significantly different from zero, the instrumental variables methods presented in
the previous section should be preferred.

Remark 8.10 In Eq. (8.58), Pindyck and Rubinfeld (1991) suggest regressing Qt
on Pt (instead of P̂t) and û2t.
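As an illustration, here is a minimal sketch of this two-step procedure on simulated data (the series and coefficient values are artificial, chosen only so that a simultaneity problem is present; statsmodels is used for convenience):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 200
Y = rng.normal(size=T)                     # income (exogenous)
R = rng.normal(size=T)                     # wealth (exogenous)
eps2 = rng.normal(size=T)                  # supply disturbance
# Price is built to be correlated with the supply disturbance: simultaneity holds here
P = 1.0 + 0.5 * Y + 0.3 * R + 0.8 * eps2 + rng.normal(size=T)
Q = 2.0 + 0.8 * P + eps2                   # supply equation (8.53)

# Step 1: OLS on the reduced-form price equation (8.55), keep the residuals
u2_hat = sm.OLS(P, sm.add_constant(np.column_stack([Y, R]))).fit().resid

# Step 2: augment the supply equation with these residuals and test their coefficient
# (Pindyck-Rubinfeld variant of Remark 8.10: Q is regressed on P and u2_hat)
res = sm.OLS(Q, sm.add_constant(np.column_stack([P, u2_hat]))).fit()
print("t-stat on u2_hat:", res.tvalues[-1], "p-value:", res.pvalues[-1])
# A significantly non-zero coefficient signals simultaneity: instrumental variables
# methods should then be preferred to OLS.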

8.5 Empirical Application

To illustrate the various concepts presented in this chapter, let us consider Klein’s
(1950) model of the US economy over the period 1920–1941.

8.5.1 Writing the Model

This model is composed of the following six equations:

Ct = α0 + α1 πt + α2 πt−1 + α3 (W1t + W2t) + ε1t        (8.59)

It = β0 + β1 πt + β2 πt−1 + β3 Kt−1 + ε2t               (8.60)

W1t = γ0 + γ1 Yt + γ2 Yt−1 + γ3 t + ε3t                 (8.61)

Ct + It + Gt = Yt                                        (8.62)

πt = Yt − W1t − W2t − Tt                                 (8.63)

Kt − Kt−1 = It                                           (8.64)

where C denotes consumption (in constant dollars), π profits (in constant dollars),
W1 private sector wages, W2 government wage payments (public sector wages), I
net investment (in constant dollars), Kt−1 is the capital stock at the beginning of the
year, Y output (in constant dollars), G government expenditures, T taxes on profits,
and t a time trend.

Equation (8.59) is the consumption equation, Eq. (8.60) is the investment


equation, and Eq. (8.61) describes the demand for labor. The last three equations
are identities. Equation (8.62) stresses that output is equal to the sum of consumer
demand for goods, firm investment, and government spending. According to Eq.
(8.63), output, i.e., income, is equal to the sum of profits, taxes on profits, and wages.
Finally, Eq. (8.64) defines investment as the change in the capital stock.
The endogenous variables of the system are consumption, investment, private
sector wages, output, profits, and the capital stock. With our notations, we therefore
have M = 6. For the predetermined variables, we distinguish:

– Lagged variables: πt−1, Kt−1, Yt−1
– Exogenous variables: W2t, Tt, Gt, and the trend t

If we add the constant term present in each of the first three equations, the number
of exogenous variables k is equal to 8.

8.5.2 Conditions for Identification

Prior to estimation, it is necessary to check that the model is not underidentified, in


which case estimation is impossible. The identification condition (order condition)
established previously is written:

kj∗ ≥ Mj        (8.65)

or, where there are linear restrictions on the parameters:

rj + kj∗ ≥ Mj        (8.66)

where Mj is the number of endogenous variables included in the equation j
considered, kj∗ is the number of exogenous variables excluded from the equation
j, and rj designates the number of restrictions other than those of exclusion. Recall
further that we have:

k = kj + kj∗        (8.67)

where kj is the number of exogenous variables in the equation j under considera-
tion, with k denoting the total number of exogenous variables in the model.
With these points in mind, let us study the identification conditions equation by
equation:

– For Eq. (8.59), we have: Mj = 3 (three endogenous variables) and kj = 3 (two
exogenous variables plus the constant term). A linear restriction is also imposed
on the parameters, since the coefficients associated with W1 and W2 are assumed
to be identical. We thus have rj = 1. We therefore use the order condition (8.66)
with kj∗ = k − kj = 8 − 3 = 5. We have: rj + kj∗ = 1 + 5 = 6, which is greater
than Mj = 3. We deduce that Eq. (8.59) is overidentified.
– In Eq. (8.60), we have: Mj = 2 and kj = 3 (two exogenous variables plus the
constant term). No restriction is imposed on the parameters (rj = 0) and we then
use the order condition (8.65). kj∗ = 8 − 3 = 5 is greater than Mj = 2, implying
that Eq. (8.60) is also overidentified.
– Finally, Eq. (8.61) is such that Mj = 2 and kj = 3 (two exogenous variables plus
the constant term). Due to the absence of restrictions on the parameters (rj = 0),
using the order condition (8.65) gives us: kj∗ > Mj. Consequently, Eq. (8.61) is
also overidentified.

All three equations of the Klein model are overidentified. The model can then be
estimated.
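The order-condition arithmetic above is easily automated. The short snippet below merely reproduces the counts used in the text (the dictionary layout is an illustrative choice):

# Order condition with linear restrictions: r_j + k_j* >= M_j (Eqs. (8.65)-(8.66))
k = 8                                   # total number of exogenous variables, constant included
equations = {                           # (M_j, k_j, r_j) as counted in the text
    "consumption (8.59)":  (3, 3, 1),
    "investment (8.60)":   (2, 3, 0),
    "labor demand (8.61)": (2, 3, 0),
}
for name, (Mj, kj, rj) in equations.items():
    lhs = rj + (k - kj)                 # r_j + k_j*
    verdict = ("overidentified" if lhs > Mj
               else "exactly identified" if lhs == Mj else "underidentified")
    print(f"{name}: r_j + k_j* = {lhs} vs M_j = {Mj} -> {verdict}")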

8.5.3 Data

The data concern the United States over the period 1920–1941 and are annual.
Table 8.1 gives the values taken by the various variables used in the model.

8.5.4 Model Estimation

In order to estimate the Klein model, instrumental variables methods must be used.
We propose to apply one limited-information method (two-stage least squares)
and two full-information methods (three-stage least squares and full-information
maximum likelihood). First, we estimate each of the equations using OLS.

OLS Estimation Equation by Equation


As previously mentioned, OLS estimators are not consistent when there is interde-
pendence between endogenous variables, which is the case here. However, we apply
this estimation procedure for comparison with the results obtained by instrumental
variables methods. The results of the OLS estimation of each equation are reported
in Tables 8.2 (consumption equation), 8.3 (investment equation), and 8.4 (labor-
demand equation).

Two-Stage Least Squares Estimation


We now propose to apply the two-stage least squares method to the three equations
of the Klein model. This method is a priori appropriate insofar as the model is
overidentified (we cannot apply the indirect least squares method, which requires
the model to be exactly identified).
Applying this procedure involves selecting a certain number of instruments. We
have chosen the same instruments for each of the equations, i.e., the set of exogenous
variables: the constant term, one-period lagged profits, one-period lagged capital

Table 8.1 Data from Klein’s model


Ct πt W1t W2t Kt−1 Yt Gt It Tt

1920 39.8 12.7 28.8 2.2 180.1 44.9 2.4 2.7 3.4
1921 41.9 12.4 25.5 2.7 182.8 45.6 3.9 −0.2 7.7
1922 45 16.9 29.3 2.9 182.6 50.1 3.2 1.9 3.9
1923 49.2 18.4 34.1 2.9 184.5 57.2 2.8 5.2 4.7
1924 50.6 19.4 33.9 3.1 189.7 57.1 3.5 3 3.8
1925 52.6 20.1 35.4 3.2 192.7 61 3.3 5.1 5.5
1926 55.1 19.6 37.4 3.3 197.8 64 3.3 5.6 7
1927 56.2 19.8 37.9 3.6 203.4 64.4 4 4.2 6.7
1928 57.3 21.1 39.2 3.7 207.6 64.5 4.2 3 4.2
1929 57.8 21.7 41.3 4 210.6 67 4.1 5.1 4
1930 55 15.6 37.9 4.2 215.7 61.2 5.2 1 7.7
1931 50.9 11.4 34.5 4.8 216.7 53.4 5.9 −3.4 7.5
1932 45.6 7 29 5.3 213.3 44.3 4.9 −6.2 8.3
1933 46.5 11.2 28.5 5.6 207.1 45.1 3.7 −5.1 5.4
1934 48.7 12.3 30.6 6 202 49.7 4 −3 6.8
1935 51.3 14 33.2 6.1 199 54.4 4.4 −1.3 7.2
1936 57.7 17.6 36.8 7.4 197.7 62.7 2.9 2.1 8.3
1937 58.7 17.3 41 6.7 199.8 65 4.3 2 6.7
1938 57.5 15.3 38.2 7.7 201.8 60.9 5.3 −1.9 7.4
1939 61.6 19 41.6 7.8 199.9 69.5 6.6 1.3 8.9
1940 65 21.1 45 8 201.2 75.7 7.4 3.3 9.6
1941 69.7 23.5 53.3 8.5 204.5 88.4 13.8 4.9 11.6
Source: Klein (1950)

Table 8.2 OLS estimation of the consumption equation


Dependent variable: C
Method: least squares
Variable Coefficient Std. error t-Statistic Prob.
Constant 16.23660 1.302698 12.46382 0.0000
π 0.192934 0.091210 2.115273 0.0495
π(−1) 0.089885 0.090648 0.991582 0.3353
(W1 + W2) 0.796219 0.039944 19.93342 0.0000
R-squared 0.981008 Mean dependent var 53.99524
Adjusted R-squared 0.977657 S.D. dependent var 6.860866
S.E. of regression 1.025540 Akaike info criterion 3.057959
Sum squared resid 17.87945 Schwarz criterion 3.256916
Log likelihood −28.10857 F-statistic 292.7076
Durbin-Watson stat 1.367474 Prob(F-statistic) 0.000000

Table 8.3 OLS estimation of the investment equation


Dependent variable: I
Method: least squares
Variable Coefficient Std. error t-Statistic Prob.
Constant 10.12579 5.465547 1.852658 0.0814
π 0.479636 0.097115 4.938864 0.0001
π(−1) 0.333039 0.100859 3.302015 0.0042
K(−1) −0.111795 0.026728 −4.182749 0.0006
R-squared 0.931348 Mean dependent var 1.266667
Adjusted R-squared 0.919233 S.D. dependent var 3.551948
S.E. of regression 1.009447 Akaike info criterion 3.026325
Sum squared resid 17.32270 Schwarz criterion 3.225282
Log likelihood −27.77641 F-statistic 76.87537
Durbin-Watson stat 1.810184 Prob(F-statistic) 0.000000

Table 8.4 OLS estimation of the labor-demand equation


Dependent variable: W1
Method: least squares
Variable Coefficient Std. error t-Statistic Prob.
Constant 0.064346 1.151797 0.055866 0.9561
Y 0.439477 0.032408 13.56093 0.0000
Y(−1) 0.146090 0.037423 3.903734 0.0011
@TREND 0.130245 0.031910 4.081604 0.0008
R-squared 0.987414 Mean dependent var 36.36190
Adjusted R-squared 0.985193 S.D. dependent var 6.304401
S.E. of regression 0.767147 Akaike info criterion 2.477367
Sum squared resid 10.00475 Schwarz criterion 2.676324
Log likelihood −22.01235 F-statistic 444.5682
Durbin-Watson stat 1.958434 Prob(F-statistic) 0.000000

stock, one-period lagged output, public sector wages, government expenditure,


taxes on profits, and the trend. The results from applying the two-stage least squares
method to each of the equations are given in Tables 8.5 (consumption equation), 8.6
(investment equation), and 8.7 (labor-demand equation).
If we compare the results obtained with the two-stage least squares technique
with those obtained with the OLS method, we see that the coefficients still have the
same signs, but their orders of magnitude are different. This is particularly notice-
able for profits and one-period lagged profits in the consumption and investment
equations. In particular, the OLS method gives more weight to current profits, unlike
the two-stage least squares procedure, which gives greater weight to one-period
lagged profits. However, the sum of the coefficients associated with .πt and .πt−1
is similar for both methods. With regard to the labor-demand equation, the results
obtained by the two methods are very similar.
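To make the comparison concrete, the two estimators can be computed directly for the consumption equation (8.59). The sketch below assumes that the series of Table 8.1 have been loaded into numpy arrays of equal length covering 1921–1941 (the 1920 observation being used only to form the one-period lags); the array names C, PI, PI_L1, W1, W2, K_L1, Y_L1, G, TAX, and TREND are illustrative:

import numpy as np

const = np.ones(len(C))
Z = np.column_stack([const, PI, PI_L1, W1 + W2])                    # regressors of Eq. (8.59)
X = np.column_stack([const, PI_L1, K_L1, Y_L1, W2, TREND, G, TAX])  # instrument list of the text

# OLS, reported for comparison only (inconsistent in the presence of simultaneity)
beta_ols = np.linalg.lstsq(Z, C, rcond=None)[0]

# Two-stage least squares: project the regressors on the instrument space, then regress C on the fit
PX = X @ np.linalg.solve(X.T @ X, X.T)
Zhat = PX @ Z
beta_2sls = np.linalg.solve(Zhat.T @ Z, Zhat.T @ C)
print("OLS: ", beta_ols)
print("2SLS:", beta_2sls)

Up to numerical rounding, the second set of coefficients should match those reported in Table 8.5.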

Table 8.5 Two-stage least squares estimation of the consumption equation


Dependent variable: C
Method: two-stage least squares
Instrument list: constant π(−1) K(−1) Y(−1) W2 @TREND G T
Variable Coefficient Std. error t-Statistic Prob.
Constant 16.55476 1.467979 11.27725 0.0000
π 0.017302 0.131205 0.131872 0.8966
π(−1) 0.216234 0.119222 1.813714 0.0874
(W1 + W2) 0.810183 0.044735 18.11069 0.0000
R-squared 0.976711 Mean dependent var 53.99524
Adjusted R-squared 0.972601 S.D. dependent var 6.860866
S.E. of regression 1.135659 F-statistic 225.9334
Sum squared resid 21.92525 Prob(F-statistic) 0.0000
Durbin-Watson stat 1.485072

Table 8.6 Two-stage least squares estimation of the investment equation


Dependent variable: I
Method: two-stage least squares
Instrument list: constant π(−1) K(−1) Y(−1) W2 @TREND G T
Variable Coefficient Std. error t-Statistic Prob.
Constant 20.27821 8.383249 2.418896 0.0271
π 0.150222 0.192534 0.780237 0.4460
π(−1) 0.615944 0.180926 3.404398 0.0034
K(−1) −0.157788 0.040152 −3.929751 0.0011
R-squared 0.884884 Mean dependent var 1.266667
Adjusted R-squared 0.864569 S.D. dependent var 3.551948
S.E. of regression 1.307149 F-statistic 41.20019
Sum squared resid 29.04686 Prob(F-statistic) 0.000000
Durbin-Watson stat 1.810184

Three-Stage Least Squares Estimation


The three-stage least squares method is a full-information estimation method, since
we estimate the model as a whole, i.e., all the equations simultaneously. All the
information contained in the system is thus taken into account. Such a technique is
particularly appropriate in the presence of heteroskedasticity and cross-correlation
between the disturbances.
The list of instruments used for the estimation is identical to the previous one,
namely: the constant, one-period lagged profits, one-period lagged capital stock,
one-period lagged output, public sector wages, government expenditure, taxes on
profits, and the trend. The results from the estimation are shown in Table 8.8.
Table 8.9 sets out the estimation statistics for each of the three equations.
Table 8.8 shows that the results obtained by three-stage least squares are similar
to those obtained by applying the two-stage least squares method. The coefficients

Table 8.7 Two-stage least squares estimation of the labor-demand equation


Dependent variable: W1
Method: two-stage least squares
Instrument list: constant π(−1) K(−1) Y(−1) W2 @TREND G T
Variable Coefficient Std. error t-Statistic Prob.
Constant 0.065944 1.153313 0.057178 0.9551
Y 0.438859 0.039603 11.08155 0.0000
Y(−1) 0.146674 0.043164 3.398063 0.0034
@TREND 0.130396 0.032388 4.026001 0.0009
R-squared 0.987414 Mean dependent var 36.36190
Adjusted R-squared 0.985193 S.D. dependent var 6.304401
S.E. of regression 0.767155 F-statistic 424.1940
Sum squared resid 10.00496 Prob(F-statistic) 0.000000
Durbin-Watson stat 1.963416

Table 8.8 Three-stage least squares estimation of the Klein model


Estimation method: three-stage least squares
Coefficient Std. error t-Statistic Prob.
C = C(1) + C(2) ∗ π + C(3) ∗ π(−1) + C(4) ∗ (W1 + W2)
C(1) 16.44079 1.304549 12.60266 0.0000
C(2) 0.124890 0.108129 1.155013 0.2535
C(3) 0.163144 0.100438 1.624323 0.1105
C(4) 0.790081 0.037938 20.82563 0.0000
I = C(5) + C(6) ∗ π + C(7) ∗ π(−1) + C(8) ∗ K(−1)

C(5) 28.17785 6.793770 4.147601 0.0001


C(6) −0.013079 0.161896 −0.080787 0.9359
C(7) 0.755724 0.152933 4.941532 0.0000
C(8) −0.194848 0.032531 −5.989674 0.0000
W1 = C(9) + C(10) ∗ Y + C(11) ∗ Y(−1) + C(12) ∗ TREND

C(9) 0.150802 1.014983 0.148576 0.8825


C(10) 0.400492 0.031813 12.58877 0.0000
C(11) 0.181291 0.034159 5.307304 0.0000
C(12) 0.149674 0.027935 5.357897 0.0000
Determinant residual covariance: 0.282997

are always assigned the same signs, but the orders of magnitude vary slightly.
However, even if the value taken by the coefficients is sometimes different, the
weight of the variables is not modified in the sense that a variable that was not
significant with the two-stage least squares method is also not significant with the
three-stage least squares method. The same applies to significant variables.

Table 8.9 Estimation statistics, three-stage least squares method


Consumption equation
R-squared 0.980108 Mean dependent var 53.99524
Adjusted R-squared 0.976598 S.D. dependent var 6.860866
S.E. of regression 1.049565 Sum squared resid 18.72696
Durbin-Watson stat 1.424939
Investment equation
R-squared 0.825805 Mean dependent var 1.266667
Adjusted R-squared 0.795065 S.D. dependent var 3.551948
S.E. of regression 1.607958 Sum squared resid 43.95398
Durbin-Watson stat 1.995884
Labor-demand equation
R-squared 0.986262 Mean dependent var 36.36190
Adjusted R-squared 0.983838 S.D. dependent var 6.304401
S.E. of regression 0.801490 Sum squared resid 10.92056
Durbin-Watson stat 2.155046

Table 8.10 Full-information maximum likelihood estimation of the Klein model


Estimation method: full-information maximum likelihood (Marquardt)
Coefficient Std. error z-Statistic Prob.
C = C(1) + C(2) ∗ π + C(3) ∗ π(−1) + C(4) ∗ (W1 + W2)
C(1) 15.83177 4.111036 3.851040 0.0001
C(2) 0.299937 0.412579 0.726980 0.4672
C(3) 0.042552 0.166547 0.255499 0.7983
C(4) 0.781083 0.078554 9.943317 0.0000
I = C(5) + C(6) ∗ π + C(7) ∗ π(−1) + C(8) ∗ K(−1)
C(5) 15.59875 14.40899 1.082571 0.2790
C(6) 0.382663 0.341197 1.121533 0.2621
C(7) 0.409364 0.248292 1.648721 0.0992
C(8) −0.137156 0.071673 −1.913642 0.0557
W1 = C(9) + C(10) ∗ Y + C(11) ∗ Y(−1) + C(12) ∗ TREND

C(9) 0.036159 4.171662 0.008668 0.9931


C(10) 0.370776 0.128994 2.874372 0.0040
C(11) 0.207497 0.090315 2.297480 0.0216
C(12) 0.184179 0.101391 1.816528 0.0693
Log likelihood: −69.25950
Determinant residual covariance: 0.146976

Full-Information Maximum Likelihood Estimation


Implementing the full-information maximum likelihood procedure involves assum-
ing that the error terms are normally distributed. The results obtained are shown in
Tables 8.10 and 8.11. In general, it can be seen that the t-statistics of the coefficients
are significantly lower than those associated with the coefficients estimated by

Table 8.11 Estimation statistics, full-information maximum likelihood method


Consumption equation
R-squared 0.979294 Mean dependent var 53.99524
Adjusted R-squared 0.975640 S.D. dependent var 6.860866
S.E. of regression 1.070813 Sum squared resid 19.49287
Durbin-Watson stat 1.260803
Investment equation
R-squared 0.926089 Mean dependent var 1.266667
Adjusted R-squared 0.913046 S.D. dependent var 3.551948
S.E. of regression 1.047396 Sum squared resid 18.64964
Durbin-Watson stat 1.895880
Labor-demand equation
R-squared 0.982884 Mean dependent var 36.36190
Adjusted R-squared 0.979864 S.D. dependent var 6.304401
S.E. of regression 0.894610 Sum squared resid 13.60556
Durbin-Watson stat 2.024727

the other techniques (two-stage and three-stage least squares). In the consumption
equation, the values taken by the coefficients of the two profit variables differ from
those obtained by three-stage least squares, but remain insignificant. Conversely, in
the investment equation, the variables that were significant with three-stage least
squares are no longer significant with the maximum likelihood method. Finally,
the results concerning the last equation of the Klein model remain similar to those
obtained with the three-stage least squares technique.

Conclusion

This chapter has gone beyond the univariate framework by presenting multi-
equation models, i.e., systems of equations. Simultaneous equations models, the
subject of this chapter, are based on economic foundations and are therefore
an alternative to VAR models (presented in the previous chapter), which are a-
theoretical. We have seen that a prerequisite for estimating simultaneous equations
models is identification: we need to check that the available data contain sufficient
information for the parameters to be estimated. Once identification has been
carried out, it is possible to proceed with estimation. Several procedures have been
presented and/or applied, including indirect least squares, two-stage least squares,
three-stage least squares, and full-information maximum likelihood.

The Gist of the Chapter

Simultaneous equations model B Y + 𝚪 X = ε, with B of dimension (M, M), Y (M, 1), 𝚪 (M, k), X (k, 1), and ε (M, 1)
Y : vector containing the M endogenous variables
X: vector containing the k exogenous variables
ε: vector of structural disturbances
Identification Order condition (identification)
Rank condition (uniqueness of the solution)
Estimation
Limited information Indirect least squares
methods Two-stage least squares
Full-information Three-stage least squares
methods Full-information maximum likelihood
Specification Hausman (1978) test

Further Reading

Developments on simultaneous equations models can be found in the textbooks


by Gujarati et al. (2017) and Greene (2020). Theil (1978), Pindyck and Rubinfeld
(1991), and Florens et al. (2007) will also prove useful.
Appendix: Statistical Tables

Standard Normal Distribution

The table below shows the values for z positive. For z negative, the value is
N(z) = 1 − N(−z).


z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.500000 0.503989 0.507978 0.511966 0.515953 0.519939 0.523922 0.527903 0.531881 0.535856
0.1 0.539828 0.543795 0.547758 0.551717 0.555670 0.559618 0.563559 0.567495 0.571424 0.575345
0.2 0.579260 0.583166 0.587064 0.590954 0.594835 0.598706 0.602568 0.606420 0.610261 0.614092
0.3 0.617911 0.621720 0.625516 0.629300 0.633072 0.636831 0.640576 0.644309 0.648027 0.651732
0.4 0.655422 0.659097 0.662757 0.666402 0.670031 0.673645 0.677242 0.680822 0.684386 0.687933
0.5 0.691462 0.694974 0.698468 0.701944 0.705401 0.708840 0.712260 0.715661 0.719043 0.722405
0.6 0.725747 0.729069 0.732371 0.735653 0.738914 0.742154 0.745373 0.748571 0.751748 0.754903
0.7 0.758036 0.761148 0.764238 0.767305 0.770350 0.773373 0.776373 0.779350 0.782305 0.785236
0.8 0.788145 0.791030 0.793892 0.796731 0.799546 0.802337 0.805105 0.807850 0.810570 0.813267
0.9 0.815940 0.818589 0.821214 0.823814 0.826391 0.828944 0.831472 0.833977 0.836457 0.838913
1.0 0.841345 0.843752 0.846136 0.848495 0.850830 0.853141 0.855428 0.857690 0.859929 0.862143
1.1 0.864334 0.866500 0.868643 0.870762 0.872857 0.874928 0.876976 0.879000 0.881000 0.882977
1.2 0.884930 0.886861 0.888768 0.890651 0.892512 0.894350 0.896165 0.897958 0.899727 0.901475
1.3 0.903200 0.904902 0.906582 0.908241 0.909877 0.911492 0.913085 0.914657 0.916207 0.917736
1.4 0.919243 0.920730 0.922196 0.923641 0.925066 0.926471 0.927855 0.929219 0.930563 0.931888
1.5 0.933193 0.934478 0.935745 0.936992 0.938220 0.939429 0.940620 0.941792 0.942947 0.944083
1.6 0.945201 0.946301 0.947384 0.948449 0.949497 0.950529 0.951543 0.952540 0.953521 0.954486
1.7 0.955435 0.956367 0.957284 0.958185 0.959070 0.959941 0.960796 0.961636 0.962462 0.963273
1.8 0.964070 0.964852 0.965620 0.966375 0.967116 0.967843 0.968557 0.969258 0.969946 0.970621
1.9 0.971283 0.971933 0.972571 0.973197 0.973810 0.974412 0.975002 0.975581 0.976148 0.976705
2.0 0.977250 0.977784 0.978308 0.978822 0.979325 0.979818 0.980301 0.980774 0.981237 0.981691
2.1 0.982136 0.982571 0.982997 0.983414 0.983823 0.984222 0.984614 0.984997 0.985371 0.985738
2.2 0.986097 0.986447 0.986791 0.987126 0.987455 0.987776 0.988089 0.988396 0.988696 0.988989
2.3 0.989276 0.989556 0.989830 0.990097 0.990358 0.990613 0.990863 0.991106 0.991344 0.991576
2.4 0.991802 0.992024 0.992240 0.992451 0.992656 0.992857 0.993053 0.993244 0.993431 0.993613

2.5 0.993790 0.993963 0.994132 0.994297 0.994457 0.994614 0.994766 0.994915 0.995060 0.995201
2.6 0.995339 0.995473 0.995604 0.995731 0.995855 0.995975 0.996093 0.996207 0.996319 0.996427
2.7 0.996533 0.996636 0.996736 0.996833 0.996928 0.997020 0.997110 0.997197 0.997282 0.997365
2.8 0.997445 0.997523 0.997599 0.997673 0.997744 0.997814 0.997882 0.997948 0.998012 0.998074
2.9 0.998134 0.998193 0.998250 0.998305 0.998359 0.998411 0.998462 0.998511 0.998559 0.998605

For values of z higher than 3:

z 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.8 4.0 4.5
N(z) 0.998650 0.999032 0.999313 0.999517 0.999663 0.999767 0.999841 0.999928 0.999968 0.999997

Student t Distribution: Critical Values of t


r P = 0.90 P = 0.80 P = 0.70 P = 0.60 P = 0.50 P = 0.40 P = 0.30 P = 0.20 P = 0.10 P = 0.05 P = 0.01 P = 0.005
1 0.158 0.325 0.510 0.727 1.000 1.376 1.963 3.078 6.314 12.706 63.657 127.321
2 0.142 0.289 0.445 0.617 0.816 1.061 1.386 1.886 2.920 4.303 9.925 14.089
3 0.137 0.277 0.424 0.584 0.765 0.978 1.250 1.638 2.353 3.182 5.841 7.453
4 0.134 0.271 0.414 0.569 0.741 0.941 1.190 1.533 2.132 2.776 4.604 5.598
5 0.132 0.267 0.408 0.559 0.727 0.920 1.156 1.476 2.015 2.571 4.032 4.773

6 0.131 0.265 0.404 0.553 0.718 0.906 1.134 1.440 1.943 2.447 3.707 4.317
7 0.130 0.263 0.402 0.549 0.711 0.896 1.119 1.415 1.895 2.365 3.499 4.029
8 0.130 0.262 0.399 0.546 0.706 0.889 1.108 1.397 1.860 2.306 3.355 3.833
9 0.129 0.261 0.398 0.543 0.703 0.883 1.100 1.383 1.833 2.262 3.250 3.690
10 0.129 0.260 0.397 0.542 0.700 0.879 1.093 1.372 1.812 2.228 3.169 3.581
11 0.129 0.260 0.396 0.540 0.697 0.876 1.088 1.363 1.796 2.201 3.106 3.497
12 0.128 0.259 0.395 0.539 0.695 0.873 1.083 1.356 1.782 2.179 3.055 3.428
13 0.128 0.259 0.394 0.538 0.694 0.870 1.079 1.350 1.771 2.160 3.012 3.372
14 0.128 0.258 0.393 0.537 0.692 0.868 1.076 1.345 1.761 2.145 2.977 3.326
15 0.128 0.258 0.393 0.536 0.691 0.866 1.074 1.341 1.753 2.131 2.947 3.286
16 0.128 0.258 0.392 0.535 0.690 0.865 1.071 1.337 1.746 2.120 2.921 3.252
17 0.128 0.257 0.392 0.534 0.689 0.863 1.069 1.333 1.740 2.110 2.898 3.222
18 0.127 0.257 0.392 0.534 0.688 0.862 1.067 1.330 1.734 2.101 2.878 3.197
19 0.127 0.257 0.391 0.533 0.688 0.861 1.066 1.328 1.729 2.093 2.861 3.174
20 0.127 0.257 0.391 0.533 0.687 0.860 1.064 1.325 1.725 2.086 2.845 3.153
21 0.127 0.257 0.391 0.532 0.686 0.859 1.063 1.323 1.721 2.080 2.831 3.135
22 0.127 0.256 0.390 0.532 0.686 0.858 1.061 1.321 1.717 2.074 2.819 3.119
23 0.127 0.256 0.390 0.532 0.685 0.858 1.060 1.319 1.714 2.069 2.807 3.104
24 0.127 0.256 0.390 0.531 0.685 0.857 1.059 1.318 1.711 2.064 2.797 3.091
25 0.127 0.256 0.390 0.531 0.684 0.856 1.058 1.316 1.708 2.060 2.787 3.078
26 0.127 0.256 0.390 0.531 0.684 0.856 1.058 1.315 1.706 2.056 2.779 3.067
27 0.127 0.256 0.389 0.531 0.684 0.855 1.057 1.314 1.703 2.052 2.771 3.057
28 0.127 0.256 0.389 0.530 0.683 0.855 1.056 1.313 1.701 2.048 2.763 3.047
29 0.127 0.256 0.389 0.530 0.683 0.854 1.055 1.311 1.699 2.045 2.756 3.038
30 0.127 0.256 0.389 0.530 0.683 0.854 1.055 1.310 1.697 2.042 2.750 3.030
40 0.126 0.255 0.388 0.529 0.681 0.851 1.050 1.303 1.684 2.021 2.704 2.971
80 0.126 0.254 0.387 0.526 0.678 0.846 1.043 1.292 1.664 1.990 2.639 2.887
120 0.126 0.254 0.386 0.526 0.677 0.845 1.041 1.289 1.658 1.980 2.617 2.860
∞ 0.126 0.253 0.385 0.524 0.675 0.842 1.036 1.282 1.645 1.960 2.576 2.808

Chi-Squared Distribution: Critical Values of c



r P = 0.990 P = 0.975 P = 0.950 P = 0.900 P = 0.800 P = 0.700 P = 0.500 P = 0.300 P = 0.200 P = 0.100 P = 0.010 P = 0.005 P = 0.001
1 0.000 0.001 0.004 0.016 0.064 0.148 0.455 1.074 1.642 2.706 6.635 7.879 10.828
2 0.200 0.051 0.103 0.211 0.446 0.713 1.386 2.408 3.219 4.605 9.210 10.597 13.816
3 0.115 0.216 0.352 0.584 1.005 1.424 2.366 3.665 4.642 6.251 11.345 12.838 16.266
4 0.297 0.484 0.711 1.064 1.649 2.195 3.357 4.878 5.989 7.779 13.277 14.860 18.467
5 0.554 0.831 1.145 1.610 2.343 3.000 4.351 6.064 7.289 9.236 15.086 16.750 20.515
6 0.872 1.237 1.635 2.204 3.070 3.828 5.348 7.231 8.558 10.645 16.812 18.548 22.458
7 1.239 1.690 2.167 2.833 3.822 4.671 6.346 8.383 9.803 12.017 18.475 20.278 24.322
8 1.646 2.180 2.733 3.490 4.594 5.527 7.344 9.524 11.030 13.362 20.090 21.955 26.124
9 2.088 2.700 3.325 4.168 5.380 6.393 8.343 10.656 12.242 14.684 21.666 23.589 27.877
10 2.558 3.247 3.940 4.865 6.179 7.267 9.342 11.781 13.442 15.987 23.209 25.188 29.588
11 3.053 3.816 4.575 5.578 6.989 8.148 10.341 12.899 14.631 17.275 24.725 26.757 31.264
12 3.571 4.404 5.226 6.304 7.807 9.034 11.340 14.011 15.812 18.549 26.217 28.300 32.909
13 4.107 5.009 5.892 7.042 8.634 9.926 12.340 15.119 16.985 19.812 27.688 29.819 34.528
14 4.660 5.629 6.571 7.790 9.467 10.821 13.339 16.222 18.151 21.064 29.141 31.319 36.123
15 5.229 6.262 7.261 8.547 10.307 11.721 14.339 17.322 19.311 22.307 30.578 32.801 37.697
16 5.812 6.908 7.962 0.312 11.152 12.624 15.338 18.418 20.465 23.542 32.000 34.267 39.252
17 6.408 7.564 8.672 10.085 12.002 13.531 16.338 19.511 21.615 24.769 33.409 35.718 40.790
18 7.015 8.231 9.390 10.865 12.857 14.440 17.338 20.601 22.760 25.989 34.805 37.156 42.312
19 7.633 8.907 10.117 11.651 13.716 15.352 18.338 21.689 23.900 27.204 36.191 38.582 43.820
20 8.260 9.591 10.851 12.443 14.578 16.266 19.337 22.775 25.038 28.412 37.566 39.997 45.315
21 8.897 10.283 11.591 13.240 15.445 17.182 20.337 23.858 26.171 29.615 38.932 41.401 46.797
22 9.542 10.982 12.338 14.041 16.314 18.101 21.337 24.939 27.301 30.813 40.289 42.796 48.268
23 10.196 11.689 13.091 14.848 17.187 19.021 22.337 26.018 28.429 32.007 41.638 44.181 49.728
24 10.856 12.401 13.848 15.659 18.062 19.943 23.337 27.096 29.553 33.196 42.980 45.559 51.179
25 11.524 13.120 14.611 16.473 18.940 20.867 24.337 28.172 30.675 34.382 44.314 46.928 52.620
26 12.198 13.844 15.379 17.292 19.820 21.792 25.336 29.246 31.795 35.563 45.642 48.290 54.052
27 12.879 14.573 16.151 18.114 20.703 22.719 26.336 30.319 32.912 36.741 46.963 49.645 55.476

28 13.565 15.308 16.928 18.939 21.588 23.647 27.336 31.391 34.027 37.916 48.278 50.993 56.892
29 14.256 16.047 17.708 19.768 22.475 24.577 28.336 32.461 35.139 39.087 49.588 52.336 58.301
30 14.953 16.791 18.493 20.599 23.364 25.508 29.336 33.530 36.250 40.256 50.892 53.672 59.703
40 22.164 24.433 26.509 29.051 32.345 34.872 39.335 44.165 47.269 51.805 63.691 66.766 73.402
80 53.540 57.153 60.391 64.278 69.207 72.915 79.334 86.120 90.405 96.578 112.329 116.321 124.839
120 86.923 91.573 95.705 100.624 106.806 111.419 119.334 127.616 132.806 140.233 158.950 163.648 173.617

Fisher–Snedecor Distribution: Critical Values of F


v1 = 1 v1 = 2 v1 = 3 v1 = 4 v1 = 5 v1 = 6
v2 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01
1 161.448 4052.181 199.500 4999.500 215.707 5403.352 224.583 5624.583 230.162 5763.650 233.986 5858.986
2 18.513 98.503 19.000 99.000 19.164 99.166 19.247 99.249 19.296 99.299 19.330 99.333
3 10.128 34.116 9.552 30.817 9.277 29.457 9.117 28.710 9.013 28.237 8.941 27.911
4 7.709 21.198 6.944 18.000 6.591 16.694 6.388 15.977 6.256 15.522 6.163 15.207

5 6.608 16.258 5.786 13.274 5.409 12.060 5.192 11.392 5.050 10.967 4.950 10.672
6 5.987 13.745 5.143 10.925 4.757 9.780 4.534 9.148 4.387 8.746 4.284 8.466
7 5.591 12.246 4.737 9.547 4.347 8.451 4.120 7.847 3.972 7.460 3.866 7.191
8 5.318 11.259 4.459 8.649 4.066 7.591 3.838 7.006 3.687 6.632 3.581 6.371
9 5.117 10.561 4.256 8.022 3.863 6.992 3.633 6.422 3.482 6.057 3.374 5.802
10 4.965 10.044 4.103 7.559 3.708 6.552 3.478 5.994 3.326 5.636 3.217 5.386
11 4.844 9.646 3.982 7.206 3.587 6.217 3.357 5.668 3.204 5.361 3.095 5.069
12 4.747 9.330 3.885 6.927 3.490 5.953 3.259 5.412 3.106 5.064 2.996 4.821
13 4.667 9.074 3.806 6.701 3.411 4.739 3.179 5.205 3.025 4.862 2.915 4.620
14 4.600 8.862 3.739 6.515 3.344 5.564 3.112 5.035 2.958 4.695 2.848 4.456
15 4.543 8.683 3.682 6.359 3.287 5.417 3.056 4.893 2.901 4.556 2.790 4.318
16 4.494 8.531 3.634 6.226 3.239 5.292 3.007 4.773 2.852 4.437 2.741 4.202
17 4.451 8.400 3.592 6.112 3.197 5.185 2.965 4.669 2.810 4.336 2.699 4.102
18 4.414 8.285 3.555 6.013 3.160 5.092 2.928 4.579 2.773 4.248 2.661 4.015
19 4.381 8.185 3.522 5.926 3.127 5.010 2.895 4.500 2.740 4.171 2.628 3.939
20 4.351 8.096 3.493 5.849 3.098 4.938 2.866 4.431 2.711 4.103 2.599 3.871
21 4.325 8.017 3.467 5.780 3.072 4.874 2.840 4.369 2.685 4.042 2.573 3.812
22 4.301 7.945 3.443 5.719 3.049 4.817 2.817 4.313 2.661 3.988 2.549 3.758
23 4.279 7.881 3.422 5.664 3.028 4.765 2.796 4.264 2.640 3.939 2.528 3.710
24 4.260 7.823 3.403 5.614 3.009 4.718 2.776 4.218 2.621 3.895 2.508 3.667
25 4.242 7.770 3.385 5.568 2.991 4.675 2.759 4.177 2.603 3.855 2.490 3.627
26 4.225 7.721 3.369 5.526 2.975 4.637 2.743 4.140 2.587 3.818 2.474 3.591
27 4.210 7.677 3.354 5.488 2.960 4.601 2.728 4.106 2.572 3.785 2.459 3.558
28 4.196 7.636 3.340 5.453 2.947 4.568 2.714 4.074 2.558 3.754 2.445 3.528
29 4.183 7.598 3.328 5.420 2.934 4.538 2.701 4.045 2.545 3.725 2.432 3.499
30 4.171 7.562 3.316 5.390 2.922 4.510 2.690 4.018 2.534 3.699 2.421 3.473
40 4.085 7.314 3.232 5.179 2.839 4.131 2.606 3.828 2.449 3.514 2.336 3.291
80 3.960 6.963 3.111 4.881 2.719 4.036 2.486 3.563 2.329 3.255 2.214 3.036
120 3.920 6.851 3.072 4.787 2.680 3.949 2.447 3.480 2.290 3.174 2.175 2.956
∞ 3.842 6.637 2.997 4.607 2.606 3.784 2.373 3.321 2.215 3.019 2.099 2.804
v1 = 8 v1 = 10 v1 = 12 v1 = 24 v1 = 48 v1 = ∞
v2 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01 P = 0.05 P = 0.01
1 238.883 5981.070 241.882 6055.847 243.906 6106.321 249.052 6234.631 251.669 6299.892 254.314 6365.861
2 19.371 99.374 19.396 99.399 19.413 99.416 19.454 99.458 19.475 99.478 19.496 99.499
3 8.845 27.489 8.786 27.229 8.745 27.052 8.639 26.598 8.583 26.364 8.526 26.125
4 6.041 14.799 5.964 14.546 5.912 14.374 5.774 13.929 5.702 13.699 5.628 13.463

5 4.818 10.289 4.735 10.051 4.678 9.888 4.527 9.466 4.448 9.247 4.365 9.020
6 4.147 8.102 4.060 7.874 4.000 7.718 3.841 7.313 3.757 7.100 3.669 6.880
7 3.726 6.840 3.637 6.620 3.575 6.469 3.410 6.074 3.322 5.866 3.230 5.650
8 3.438 6.029 3.347 5.814 3.284 5.667 3.115 5.279 3.024 5.074 2.928 4.859
9 3.230 5.467 3.137 5.257 3.073 5.111 2.900 4.729 2.807 4.525 2.707 4.311
10 3.072 5.057 2.978 4.849 2.913 4.706 2.737 4.327 2.641 4.124 2.538 3.909
11 2.948 4.744 2.854 4.539 2.788 4.397 2.609 4.021 2.511 3.818 2.404 3.602
12 2.849 4.499 2.753 4.296 2.687 4.155 2.505 3.780 2.405 3.578 2.296 3.361
13 2.767 4.302 2.671 4.100 2.604 3.960 2.420 3.587 2.318 3.384 2.206 3.165
14 2.699 4.140 2.602 3.939 2.534 3.800 2.349 3.427 2.245 3.224 2.131 3.004
15 2.641 4.004 2.544 3.805 2.475 3.666 2.288 3.294 2.182 3.090 2.066 2.868
16 2.591 3.890 2.494 3.691 2.425 3.553 2.235 3.181 2.128 2.976 2.010 2.753
17 2.548 3.791 2.450 3.593 2.381 3.455 2.190 3.084 2.081 2.878 1.960 2.653
18 2.510 3.705 2.412 3.508 2.342 3.371 2.150 2.999 2.040 2.793 1.917 2.566
19 2.477 3.631 2.378 3.434 2.308 3.297 2.114 2.925 2.003 2.718 1.878 2.489
20 2.447 3.564 2.348 3.368 2.278 3.231 2.082 2.859 1.970 2.652 1.843 2.421
21 2.420 3.506 2.321 3.310 2.250 3.173 2.054 2.801 1.941 2.593 1.812 2.360
22 2.397 3.453 2.297 3.258 2.226 3.121 2.028 2.749 1.914 2.540 1.783 2.305
23 2.375 3.406 2.275 3.211 2.204 3.074 2.005 2.702 1.890 2.492 1.757 2.256
24 2.355 3.363 2.255 3.168 2.183 3.032 1.984 2.659 1.868 2.448 1.733 2.211
25 2.337 3.324 2.236 3.129 2.165 2.993 1.964 2.620 1.847 2.409 1.711 2.169
26 2.321 3.288 2.220 3.094 2.148 2.958 1.946 2.585 1.828 2.373 1.691 2.131
27 2.305 3.256 2.204 3.062 2.132 2.926 1.930 2.552 1.811 2.339 1.672 2.097
28 2.291 3.226 2.190 3.032 2.118 2.896 1.915 2.522 1.795 2.309 1.654 2.064
29 2.278 3.198 2.177 3.005 2.104 2.868 1.901 2.495 1.780 2.280 1.638 2.034
30 2.266 3.173 2.165 2.979 2.092 2.843 1.887 2.469 1.766 2.254 1.622 2.006
40 2.180 2.993 2.077 2.801 2.003 2.665 1.793 2.288 1.666 2.068 1.509 1.805
80 2.056 2.742 1.951 2.551 1.875 2.415 1.654 2.032 1.514 1.799 1.325 1.494
120 2.016 2.663 1.910 2.472 1.834 2.336 1.608 1.950 1.463 1.711 1.254 1.381
∞ 1.939 2.513 1.832 2.323 1.753 2.187 1.518 1.793 1.359 1.537 1.000 1.000

Durbin–Watson Critical Values

Significance level = 5%
k is the number of exogenous variables, and T is the sample size.

k=1 k=2 k=3 k=4 k=5


T d1 d2 d1 d2 d1 d2 d1 d2 d1 d2
15 1.08 1.36 0.95 1.54 0.82 1.75 0.69 1.97 0.56 2.21
16 1.10 1.37 0.98 1.54 0.86 1.73 0.74 1.93 0.62 2.15
17 1.13 1.38 1.02 1.54 0.90 1.71 0.78 1.90 0.67 2.10
18 1.16 1.39 1.05 1.53 0.93 1.69 0.82 1.87 0.71 2.06
19 1.18 1.40 1.08 1.53 0.97 1.68 0.86 1.85 0.75 2.02
20 1.20 1.41 1.10 1.54 1.00 1.68 0.90 1.83 0.79 1.99
21 1.22 1.42 1.13 1.54 1.03 1.67 0.93 1.81 0.83 1.96
22 1.24 1.43 1.15 1.54 1.05 1.66 0.96 1.80 0.86 1.94
23 1.26 1.44 1.17 1.54 1.08 1.66 0.99 1.79 0.90 1.92
24 1.27 1.45 1.19 1.55 1.10 1.66 1.01 1.78 0.93 1.90
25 1.29 1.45 1.21 1.55 1.12 1.66 1.04 1.77 0.95 1.89
26 1.30 1.46 1.22 1.55 1.14 1.65 1.06 1.76 0.98 1.88
27 1.32 1.47 1.24 1.56 1.16 1.65 1.08 1.76 1.01 1.86
28 1.33 1.48 1.26 1.56 1.18 1.65 1.10 1.75 1.03 1.85
29 1.34 1.48 1.27 1.56 1.20 1.65 1.12 1.74 1.05 1.84
30 1.35 1.49 1.28 1.57 1.21 1.65 1.14 1.74 1.07 1.83
31 1.36 1.50 1.30 1.57 1.23 1.65 1.16 1.74 1.09 1.83
32 1.37 1.50 1.31 1.57 1.24 1.65 1.18 1.73 1.11 1.82
33 1.38 1.51 1.32 1.58 1.26 1.65 1.19 1.73 1.13 1.81
34 1.39 1.51 1.33 1.58 1.27 1.65 1.21 1.73 1.15 1.81
35 1.40 1.52 1.34 1.58 1.28 1.65 1.22 1.73 1.16 1.80
36 1.41 1.52 1.35 1.59 1.29 1.65 1.24 1.73 1.18 1.80
37 1.42 1.53 1.36 1.59 1.31 1.66 1.25 1.72 1.19 1.80
38 1.43 1.54 1.37 1.59 1.32 1.66 1.26 1.72 1.21 1.79
39 1.43 1.54 1.38 1.60 1.33 1.66 1.27 1.72 1.22 1.79
40 1.44 1.54 1.39 1.60 1.34 1.66 1.29 1.72 1.23 1.79
45 1.48 1.57 1.43 1.62 1.38 1.67 1.34 1.72 1.29 1.78
50 1.50 1.59 1.46 1.63 1.42 1.67 1.38 1.72 1.34 1.77
55 1.53 1.60 1.49 1.64 1.45 1.68 1.41 1.72 1.38 1.77
60 1.55 1.62 1.51 1.65 1.48 1.69 1.44 1.73 1.41 1.77
65 1.57 1.63 1.54 1.66 1.50 1.70 1.47 1.73 1.44 1.77
70 1.58 1.64 1.55 1.67 1.52 1.70 1.49 1.74 1.46 1.77
75 1.60 1.65 1.57 1.68 1.54 1.71 1.51 1.74 1.49 1.77
80 1.61 1.66 1.59 1.69 1.56 1.72 1.53 1.74 1.51 1.77
85 1.62 1.67 1.60 1.70 1.57 1.72 1.55 1.75 1.52 1.77
90 1.63 1.68 1.61 1.70 1.59 1.73 1.57 1.75 1.54 1.78
95 1.64 1.69 1.62 1.71 1.60 1.73 1.58 1.75 1.56 1.78
100 1.650 1.69 1.63 1.72 1.61 1.74 1.59 1.76 1.57 1.78
References

Akaike, H. (1969), “Fitting Autoregressive Models for Prediction”, Annals of the Institute of
Statistical Mathematics, 21, pp. 243–247.
Akaike, H. (1973), “Information theory and an extension of maximum likelihood principle”,
Second International Symposium on Information Theory, pp. 261–281.
Akaike, H. (1974), “A new look at the statistical model identification”, IEEE Transactions on
Automatic Control, 19(6), pp. 716–723.
Almon, S. (1962), “The Distributed Lag between Capital Appropriations and Expenditures”,
Econometrica, 30, pp. 407–423.
Baltagi, B.H. (2021), Econometric Analysis of Panel Data, 6th edition, John Wiley & Sons.
Banerjee, A., Dolado, J., Galbraith, J.W. and D.F. Hendry (1993), Cointegration, Error-Correction,
and the Analysis of Nonstationary Data, Oxford University Press.
Basmann, R.L. (1957), “Generalized Classical Method of Linear Estimation of Coefficients in a
Structural Equation”, Econometrica, 25, pp. 77–83.
Bauwens, L., Hafner, C. and S. Laurent (2012), “Volatility models”, in Bauwens, L., Hafner, C.
and S. Laurent (eds), Handbook of Volatility Models and their Applications, John Wiley & Sons,
Inc.
Beach, C.M. and J.G. MacKinnon (1978), “A Maximum Likelihood Procedure for Regression with
Autocorrelated Errors”, Econometrica, 46, pp. 51–58.
Belsley, D.A., Kuh, E. and R.E. Welsch (1980), Regression Diagnostics: Identifying Influential
Data and Sources of Collinearity, John Wiley & Sons, New York.
Bénassy-Quéré, A. and V. Salins (2005), “Impact de l’ouverture financière sur les inégalités
internes dans les pays émergents”, Working Paper CEPII, 2005–11.
Beran, J. (1994), Statistics for Long Memory Processes, Chapman & Hall.
Blanchard, O. and S. Fischer (1989), Lectures on Macroeconomics, The MIT Press.
Bollerslev, T. (2008), “Glossary to ARCH (GARCH)”, CREATES Research Paper, 2008–49.
Bollerslev, T., Chou, R.Y. and K.F. Kroner (1992), “ARCH modeling in finance: A review of the
theory and empirical evidence”, Journal of Econometrics, 52(1–2), pp. 5–59.
Bollerslev, T., Engle, R.F. and D.B. Nelson (1994), “ARCH Models”, in Engle R.F. and D.L.
McFadden (eds), Handbook of Econometrics, Vol. IV, pp. 2959–3038, Elsevier Science.
Box, G.E.P. and D.R. Cox (1964), “An Analysis of Transformations”, Journal of the Royal
Statistical Society, Series B, 26, pp. 211–243.
Box, G.E.P. and G.M. Jenkins (1970), Time Series Analysis: Forecasting and Control, Holden Day,
San Francisco.
Box, G.E.P. and D.A. Pierce (1970), “Distribution of Residual Autocorrelation in ARIMA Time
Series Models”, Journal of the American Statistical Association, 65, pp. 1509–1526.
Breusch, T.S. (1978), “Testing for Autocorrelation in Dynamic Linear Models”, Australian
Economic Papers, 17, pp. 334–335.
Breusch, T.S. and A.R. Pagan (1979), “A Simple Test for Heteroscedasticity and Random
Coefficient Variation”, Econometrica, 47, pp. 1287–1294.


Brockwell, P.J. and R.A. Davis (1998), Time Series. Theory and Methods, 2nd edition, Springer
Verlag.
Brown, R.L., Durbin, J. and J.M. Evans (1975), “Techniques for Testing the Constancy of
Regression Relationship Over Time”, Journal of the Royal Statistical Society, 37, pp. 149–192.
Campbell, J.Y. and P. Perron (1991), “Pitfalls and Opportunities: What Macroeconomists Should
Know about Unit Roots”, in Fisher, S. (ed.), NBER Macroeconomic Annual, MIT Press,
pp. 141–201.
Chow, G.C. (1960), “Tests of Equality Between Sets of Coefficients in two Linear Regressions”,
Econometrica, 28, pp. 591–605.
Cochrane, D. and G.H. Orcutt (1949), “Application of Least Squares Regressions to Relationships
Containing Autocorrelated Error Terms”, Journal of the American Statistical Association, 44,
pp. 32–61.
Davidson, R. and J.G. MacKinnon (1993), Estimation and Inference in Econometrics, Oxford
University Press.
Dhrymes, P. (1973), “Restricted and Unrestricted Reduced Forms”, Econometrica, 41, pp. 119–
134.
Dhrymes, P. (1978), Introductory Econometrics, Springer Verlag.
Dickey, D.A. and W.A. Fuller (1979), “Distribution of the Estimators for Autoregressive Time
Series With a Unit Root”, Journal of the American Statistical Association, 74, pp. 427–431.
Dickey, D.A. and W.A. Fuller (1981), “Likelihood Ratio Statistics for Autoregressive Time Series
With a Unit Root”, Econometrica, 49, pp. 1057–1072.
Diebold, F.X. (2012), Elements of Forecasting, 4th edition, South Western Publishers.
Dowrick S., Pitchford R. and S.J. Turnovsky (2008), Economic Growth and Macroeconomic
Dynamics: Recent Developments in Economic Theory, Cambridge University Press.
Duesenberry, J. (1949), Income, Saving and the Theory of Consumer Behavior, Harvard University
Press.
Dufrénot, G. and V. Mignon (2002a), “La cointégration non linéaire : une note méthodologique”,
Économie et Prévision, no. 155, pp. 117–137.
Dufrénot, G. and V. Mignon (2002b), Recent Developments in Nonlinear Cointegration with
Applications to Macroeconomics and Finance, Kluwer Academic Publishers.
Durbin, J. (1960), “The Fitting of Time Series Models”, Review of the International Statistical
Institute, 28, pp. 233–244.
Durbin, J. (1970), “Testing for Serial Correlation in Least Squares Regression When some of the
Regressors are Lagged Dependent Variables”, Econometrica, 38, pp. 410–421.
Durbin, J. and G.S. Watson (1950), “Testing for Serial Correlation in Least Squares Regression I”,
Biometrika, 37, pp. 409–428.
Durbin, J. and G.S. Watson (1951), “Testing for Serial Correlation in Least Squares Regression
II”, Biometrika, 38, pp. 159–178.
Elhorst, J-P. (2014), Spatial Econometrics, Springer.
Engle, R.F. (1982), “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance
of United Kingdom Inflation”, Econometrica, 50(4), pp. 987–1007.
Engle, R.F. and C.W.J. Granger (1987), “Cointegration and Error Correction: Representation,
Estimation and Testing”, Econometrica, 55, pp. 251–276.
Engle, R.F. and C.W.J. Granger (1991), Long Run Economic Relationships. Readings in Cointe-
gration, Oxford University Press.
Engle, R.F. and S. Yoo (1987), “Forecasting and Testing in Cointegrated Systems”, Journal of
Econometrics, 35, pp. 143–159.
Farebrother, R.W. (1980), “The Durbin-Watson Test for Serial Correlation when There Is No
Intercept in the Regression”, Econometrica, 48, pp. 1553–1563.
Farrar, D.E. and R.R. Glauber (1967), “Multicollinearity in Regression Analysis: The Problem
Revisited”, The Review of Economics and Statistics, 49, pp. 92–107.
Farvaque, E., Jean, N. and B. Zuindeau (2007), “Inégalités écologiques et comportement électoral
: le cas des élections municipales françaises de 2001”, Développement Durable et Territoires,
Dossier 9.

Feldstein, M. and C. Horioka (1980), “Domestic Saving and International Capital Flows”,
Economic Journal, 90, pp. 314–329.
Florens, J.P., Marimoutou, V. and A. Péguin-Feissolle (2007), Econometric Modeling and Infer-
ence, Cambridge University Press.
Fox, J. (1997), Applied Regression Analysis, Linear Models, and Related Methods, Sage Publica-
tions.
Friedman, M. (1957), A Theory of the Consumption Function, New York.
Frisch, R.A.K. (1933), Editorial, Econometrica, 1, pp. 1–4.
Gallant, A.R. (1987), Nonlinear Statistical Models, John Wiley & Sons.
Geary, R.C. (1970), “Relative Efficiency of Count Sign Changes for Assessing Residual Autore-
gression in Least Squares Regression”, Biometrika, 57, pp. 123–127.
Giles, D.E.A. and M.L. King (1978), “Fourth Order Autocorrelation: Further Significance Points
for the Wallis Test”, Journal of Econometrics, 8, pp. 255–259.
Glejser, H. (1969), “A New Test for Heteroscedasticity”, Journal of the American Statistical
Association, 64, pp. 316–323.
Godfrey, L.G. (1978), “Testing Against Autoregressive and Moving Average Error Models when
the Regressors Include Lagged Dependent Variables”, Econometrica, 46, pp. 1293–1302.
Goldfeld, S.M. and R.E. Quandt (1965), “Some Tests for Homoskedasticity”, Journal of the
American Statistical Association, 60, pp. 539–547.
Goldfeld, S.M. and R.E. Quandt (1972), Nonlinear Econometric Methods, North-Holland, Ams-
terdam.
Gouriéroux, C. (1997), ARCH Models and Financial Applications, Springer Series in Statistics.
Gouriéroux, C. (2000), Econometrics of Qualitative Dependent Variables, Cambridge University
Press.
Gouriéroux, C. and A. Monfort (1996), Time Series and Dynamic Models, Cambridge University
Press.
Gouriéroux, C. and A. Monfort (2008), Statistics and Econometric Models, Cambridge University
Press.
Granger, C.W.J. (1969), “Investigating Causal Relations by Econometric Models and Cross-
Spectral Methods”, Econometrica, 36, pp. 424–438.
Granger, C.W.J. (1981), “Some Properties of Time Series Data and their Use in Econometric Model
Specification”, Journal of Econometrics, 16, pp. 121–130.
Granger, C.W.J. and P. Newbold (1974), “Spurious Regressions in Econometrics”, Journal of
Econometrics, 2, pp. 111–120.
Granger, C.W.J. and T. Teräsvirta (1993), Modelling Nonlinear Economic Relationships, Oxford
University Press.
Greene, W. (2020), Econometric Analysis, 8th edition, Pearson.
Griliches, Z. (1967), “Distributed Lags: A Survey”, Econometrica, 36, pp. 16–49.
Griliches, Z. and M. Intriligator (1983), Handbook of Econometrics, Vol. 1, Elsevier.
Gujarati, D.N., Porter, D.C. and S. Gunasekar (2017), Basic Econometrics, McGraw Hill.
Hamilton, J.D. (1994), Time Series Analysis, Princeton University Press.
Hannan, E.J. and B.G. Quinn (1979), “The Determination of the Order of an Autoregression”,
Journal of the Royal Statistical Society, Series B, 41, pp. 190–195.
Harvey, A.C. (1990), The Econometric Analysis of Time Series, MIT Press.
Harvey, A.C. and G.D.A. Phillips (1973), “A Comparison of the Power of Some Tests for
Heteroscedasticity in the General Linear Model”, Journal of Econometrics, 2, pp. 307–316.
Hausman, J. (1975), “An Instrumental Variable Approach to Full-Information Estimators for Linear
and Certain Nonlinear Models”, Econometrica, 43, pp. 727–738.
Hausman, J. (1978), “Specification Tests in Econometrics”, Econometrica, 46, pp. 1251–1271.
Hausman, J. (1983), “Specification and Estimation of Simultaneous Equation Models”, in
Griliches, Z. and M. Intriligator (eds), Handbook of Econometrics, North-Holland, Amsterdam.
Hendry, D.F. (1995), Dynamic Econometrics, Oxford University Press.
Hendry, D.F. and Morgan, M.S. (eds) (1995), The Foundations of Econometric Analysis, Cam-
bridge University Press.

Hildreth, C. and J. Lu (1960), “Demand Relations with Autocorrelated Disturbances”, Technical


Bulletin no. 276, Michigan State University Agricultural Experiment Station.
Hoel, P.G. (1974), Introduction to Mathematical Statistics, John Wiley & Sons.
Hoerl, A.E. and R.W. Kennard (1970a), “Ridge Regression: Biased Estimation for Non-Orthogonal
Problems”, Technometrics, pp. 55–68.
Hoerl, A.E. and R.W. Kennard (1970b), “Ridge Regression: Applications to Non-Orthogonal
Problems”, Technometrics, pp. 69–82.
Hurlin, C. and V. Mignon (2005), “Une synthèse des tests de racine unitaire sur données de panel”,
Économie et Prévision, no. 169–170–171, pp. 253–294.
Hurlin, C. and V. Mignon (2007), “Une synthèse des tests de cointégration sur données de panel”,
Économie et Prévision, no. 180–181, pp. 241–265.
Hurlin, C. and V. Mignon (2022), Statistique et probabilités en économie-gestion, 2nd edition,
Dunod.
Hurvich, C.M. and C.-L. Tsai (1989), “Regression and time series model selection in small
samples”, Biometrika, 76, pp. 297–307.
Intriligator, M.D. (1978), Econometric Models, Techniques and Applications, Prentice Hall.
Jarque, C.M. and A.K. Bera (1980), “Efficient Tests for Normality, Homoscedasticity and Serial
Independence of Regression Residuals”, Economics Letters, 6, pp. 255–259.
Johansen, S. (1988), “Statistical Analysis of Cointegration Vectors”, Journal of Economic Dynam-
ics and Control, 12, pp. 231–254.
Johansen, S. (1991), “Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian
Vector Autoregressive Models”, Econometrica, 59, pp. 1551–1580.
Johansen, S. (1995), Likelihood-based Inference in Cointegrated Vector Autoregression Models,
Oxford University Press.
Johansen, S. and K. Juselius (1990), “Maximum Likelihood Estimation and Inferences on
Cointegration with Application to the Demand for Money”, Oxford Bulletin of Economics and
Statistics, 52, pp. 169–210.
Johnston, J. and J. Dinardo (1996), Econometric Methods, 4th edition, McGraw Hill.
Jorgenson, D. (1966), “Rational Distributed Lag Functions”, Econometrica, 34, pp. 135–149.
Judge, G.G., Griffiths, W.E., Hill, R.C., Lutkepohl, H. and T.C. Lee (1985), The Theory and
Practice of Econometrics, 2nd edition, John Wiley & Sons.
Judge, G.G., Griffiths, W.E., Hill, R.C., Lutkepohl, H. and T.C. Lee (1988), Introduction to the
Theory and Practice of Econometrics, John Wiley & Sons.
Kaufmann, D., Kraay, A. and M. Mastruzzi (2006), “Governance Matters V: Aggregate and
Individual Governance Indicators for 1996–2005”, http://web.worldbank.org.
Kennedy, P. (2008), A Guide to Econometrics, 6th edition, MIT Press.
Keynes, J.M. (1936), The General Theory of Employment, Interest, and Money, Macmillan.
Klein, L.R. (1950), Economic Fluctuations in the United States, 1921–1941, John Wiley & Sons,
New York.
Klein, L.R. (1962), An Introduction to Econometrics, Prentice-Hall, Englewood Cliffs.
Kmenta, J. (1971), Elements of Econometrics, Macmillan.
Koyck, L.M. (1954), Distributed Lags and Investment Analysis, North-Holland, Amsterdam.
Kullback, S. and A. Leibler (1951), “On information and sufficiency”, Annals of Mathematical
Statistics 22, pp. 79–86.
Lardic, S. and V. Mignon (1999), “La mémoire longue en économie : une revue de la littérature”,
Journal de la Société Française de Statistique, pp. 5–48.
Lardic, S. and V. Mignon (2002), Économétrie des séries temporelles macroéconomiques et
financières, Economica.
Lardic, S., Mignon, V. and F. Murtin (2005), “Estimation des modèles à correction d’erreur
fractionnaires : une note méthodologique”, Journal de la Société Française de Statistique,
pp. 55–68.
Leamer, E.E. (1983), “Model Choice and Specification Analysis”, in Griliches, Z. and M.D.
Intriligator (eds), Handbook of Econometrics, Vol. I, North Holland.
Lehmann, E.L. (1959), Testing Statistical Hypotheses, John Wiley & Sons.

LeSage, J. and R.K. Pace (2008), Introduction to Spatial Econometrics, Chapman & Hall.
Ljung, G.M. and G.E.P. Box (1978), “On a Measure of Lack of Fit in Time Series Models”,
Biometrika, 65, pp. 297–303.
MacKinnon, J.G. (1991), “Critical Values for Cointegration Tests”, in Engle, R.F. and C.W.J.
Granger (eds), Long-Run Economic Relationships, Oxford University Press, pp. 267–276.
Maddala, G.S. and I.-M. Kim (1998), Unit Roots, Cointegration, and Structural Change, Cam-
bridge University Press.
Maddala, G.S. and A.S. Rao (1971), “Maximum Likelihood Estimation of Solow’s and Jorgenson’s
Distributed Lag Models”, The Review of Economics and Statistics, 53(1), pp. 80–89.
Matyas, L. and P. Sevestre (2008), The Econometrics of Panel Data. Fundamentals and Recent
Developments in Theory and Practice, 3rd edition, Springer.
Mills, T.C. (1990), Time Series Techniques for Economists, Cambridge University Press.
Mittelhammer, R.C., Judge, G.G. and D.J. Miller (2000), Econometric Foundations, Cambridge
University Press, New York.
Mood, A.M., Graybill, F.A. and D.C. Boes (1974), Introduction to the Theory of Statistics,
McGraw-Hill.
Morgan, M.S. (1990), The History of Econometric Ideas (Historical Perspectives on Modern
Economics), Cambridge University Press.
Morgenstern, O. (1963), The Accuracy of Economic Observations, Princeton University Press.
Nelson, C.R. and C. Plosser (1982), “Trends and Random Walks in Macroeconomic Time Series:
Some Evidence and Implications”, Journal of Monetary Economics, 10, pp. 139–162.
Nerlove, M. (1958), Distributed Lags and Demand Analysis for Agricultural and Other Commodi-
ties, Agricultural Handbook 141, US Department of Agriculture.
Newbold, P. (1984), Statistics for Business and Economics, Prentice Hall.
Newey, W.K. and K.D. West (1987), “A Simple, Positive Semi-Definite, Heteroskedasticity and
Autocorrelation Consistent Covariance Matrix”, Econometrica, 55, pp. 703–708.
Palm, F.C. (1996), “GARCH Models of Volatility”, in Maddala G.S. and C.R. Rao (eds), Handbook
of Statistics, Vol. 14, pp. 209–240, Elsevier Science.
Phillips, A.W. (1958), “The Relationship between Unemployment and the Rate of Change of
Money Wage Rates in the United Kingdom, 1861–1957”, Economica, 25 (100), pp. 283–299.
Pindyck, R.S. and D.L. Rubinfeld (1991), Econometric Models and Economic Forecasts, McGraw-
Hill.
Pirotte, A. (2004), L’économétrie. Des origines aux développements récents, CNRS Éditions.
Prais, S.J. and C.B. Winsten (1954), “Trend Estimators and Serial Correlation”, Cowles Commis-
sion Discussion Paper, no. 383, Chicago.
Puech, F. (2005), Analyse des déterminants de la criminalité dans les pays en développement,
Thèse pour le doctorat de Sciences Économiques, Université d’Auvergne-Clermont I.
Rao, C.R. (1965), Linear Statistical Inference and Its Applications, John Wiley & Sons.
Sargan, J.D. (1964), “Wages and Prices in the United Kingdom: A Study in Econometric
Methodology”, in Hart, P.E., Mills, G. and J.K. Whitaker (eds), Econometric Analysis for
National Economic Planning, Butterworths, London.
Schmidt, P. (1976), Econometrics, Marcel Dekker, New York.
Schwarz, G. (1978), “Estimating the Dimension of a Model”, The Annals of Statistics, 6, pp. 461–
464.
Sims, C.A. (1980), “Macroeconomics and Reality”, Econometrica, 48, pp. 1–48.
Solow, R.M. (1960), “On a Family of Lag Distributions”, Econometrica, 28, pp. 393–406.
Spanos, A. (1999), Probability Theory and Statistical Inference: Econometric Modeling with
Observational Data, Cambridge University Press.
Swamy, P.A.V.B. (1971), Statistical Inference in Random Coefficient Regression Models, Springer
Verlag.
Teräsvirta, T., Tjøstheim, D. and C.W.J. Granger (2010), Modelling Nonlinear Economic Time
Series, Oxford University Press.
Theil, H. (1953), “Repeated Least Squares Applied to Complete Equation Systems”, Central
Planning Bureau, The Hague, Netherlands.
Theil, H. (1971), Principles of Econometrics, John Wiley & Sons, New York.
Theil, H. (1978), Introduction to Econometrics, Prentice Hall.
Thuilliez, J. (2007), “Malaria and Primary Education: A Cross-Country Analysis on Primary
Repetition and Completion Rates”, Working Paper Centre d’Économie de la Sorbonne, 2007–
13.
Tobin, J. (1950), “A Statistical Demand Function for Food in the USA”, Journal of the Royal
Statistical Society, Series A, pp. 113–141.
Wallis, K.F. (1972), “Testing for Fourth-Order Autocorrelation in Quarterly Regression Equa-
tions”, Econometrica, 40, pp. 617–636.
White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test
for Heteroskedasticity”, Econometrica, 48, pp. 817–838.
Wooldridge, J.M. (2010), Econometric Analysis of Cross Section and Panel Data, 2nd edition, MIT
Press.
Wooldridge, J.M. (2012), Introductory Econometrics: A Modern Approach, 5th edition, South
Western Publishing Co.
Zellner, A. (1962), “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests
for Aggregation Bias”, Journal of the American Statistical Association, 57, pp. 500–509.
Zellner, A. and H. Theil (1962), “Three Stage Least Squares: Simultaneous Estimation of
Simultaneous Equations”, Econometrica, 30, pp. 63–68.

Bringing together theory and practice, this book presents the basics of econometrics in a clear
and pedagogical way. It focuses on the methods and skills that are essential for all students
wishing to succeed in their studies and for all practitioners wishing to apply econometric
techniques. The approach adopted in this textbook is resolutely applied: the author aims to meet
a pedagogical and operational need by quickly putting into practice the various concepts
presented (statistics, tests, methods, etc.). This is why each theoretical presentation is
followed by numerous examples, as well as by empirical applications carried out on the computer
using standard econometric and statistical software.

This textbook is primarily intended for students of Bachelor’s and Master’s Degrees in Economics,
Management, and Mathematics and Computer Sciences, as well as for students of Engineering and
Business Schools. It will also be useful for professionals who will find practical solutions to
the various problems they face.

Valérie MIGNON is Professor of Economics at the University of Paris Nanterre (France), Member of
the EconomiX–CNRS research center, and Scientific Advisor to the leading French center for
research and expertise on the world economy, CEPII (Paris, France). She teaches econometrics at
undergraduate and graduate levels. Her econometric research focuses mainly on macroeconomics,
finance, international macroeconomics and finance, and energy, fields in which she has published
numerous articles and books.
Index

A
Additive decomposition scheme, 250
AIC, see Information criteria, Akaike
Akaike, see Information criteria, Akaike
Almon, S., 271
ANCOVA, see Model, covariance analysis
ANOVA, see Model, variance analysis
AR, see Model, autoregressive
ARCH, see Model, ARCH
ARDL, see Model, autoregressive distributed lag
ARMA, see Model, ARMA
Autocorrelation, vi, 31, 109, 171–173, 176, 187, 194–198, 200, 201, 203–211, 216–219, 276, 283, 296, 297, 305, 317–319, 364
Autocovariance, 196, 289, 290, 299
Autoregressive lag polynomial, 282

B
Baltagi, B.H., vii
Banerjee, A., 349
Bartlett, 317
Basmann, R.L., 363
Bauwens, L., 349
Beach, C.M., 216
Belsley, D.A., 233, 262
Bénassy-Quéré, A., 137
Bera, A.K., see Test, Jarque-Bera
Beran, J., 349
Blanchard, O., 267
Bollerslev, T., 349
Box, G.E.P., 20, 77, 207, 210, 219, 287, 297, 312, 313, 317–321, 324, 349
Break, 251, 254, 256
Breusch, T.S., 182, 192, 193, 207–209, 218, 219, 319
Brockwell, P.J., 349
Brown, R.L., 253, 254

C
Campbell, J.Y., 306
Causality, 331, 332, 334–336
Central limit theorem, 31
Chow, G.C., see Test, Chow
CLS, see Method, constrained least squares
Cochrane, D., see Method, Cochrane-Orcutt
Coefficient, 8, 30
  adjusted determination, 123, 126, 144, 151
  adjustment, 278
  autocorrelation, 196, 198, 201, 206, 210, 317
  correlation, 13, 17, 127, 133, 234, 239
  determination, 64, 66, 68, 69, 71, 123, 125, 126, 128, 144, 151, 228, 232, 233
  expectation, 279
  kurtosis, 98, 99
  multiple correlation, 13, 125
  partial correlation, 127, 133, 239, 292
  partial determination, 128
  partial regression, 105 (see also Coefficient, partial regression)
  skewness, 98
Cofactor, 159, 160
Cointegration, 287, 336, 338–342, 346
Collinearity, see Multicollinearity
Condition
  completeness, 357
  order, 360, 361, 369, 370
  rank, 360, 361
Correlation, 13, 14, 17, 33, 127, 133, 231–235, 239, 241, 330, 364, 373
  nonlinear, 14
Correlogram, 291, 293, 295, 297
Covariance, 11–13, 17, 44, 45, 289
Cox, D.R., 20, 77
Critical value, 57

D
Data
  cross-sectional, 9, 196
  panel, vii, 9
Davidson, R., 82, 153, 221, 285, 365
Davis, R.A., 349
Determinant, 157, 159, 160
Dhrymes, P., 221, 365
Dickey, D.A., see Test, Dickey-Fuller
Diebold, F.X., 262
Distribution
  Chi-squared, 54, 55
  Fisher, 55
  normal, 31, 97
  standard, 54, 98
  student, 55
Disturbance(s), 30, 353, 364
  structural, 355, 358, 359
Dowrick, S., 267
DS process, 297, 300–302
Duesenberry, J., 266
Dufrénot, G., 349
Dummy, see Variable, dummy
Durbin, J., 204, 206–208, 214, 217, 219, 253, 254, 293, 314, 316
Durbin algorithm, 293, 314, 316

E
Elasticity, 76, 77, 247
Elhorst, J.-P., vii
Engle, R.F., vi, 185, 339–342, 345, 346, 349
Equation(s)
  behavioral, 8, 353
  equilibrium, 353, 356
  reduced form, 354
  simultaneous, vi, 327, 351, 355, 360, 362, 363, 365–367
  structural, 353, 357, 358, 364–366
  variance analysis, 65, 66, 70, 124, 125, 129, 130, 167
  Yule-Walker, 293, 313, 314
Error(s), 30, 105
  equilibrium, 338
  identically and independently distributed, 32
  mean absolute, 319, 320
  mean absolute percent, 320
  measurement, 227
  normally and independently distributed, 32
  prediction, 73, 74, 97, 141, 252, 299, 301, 319, 321
Estimator
  BLUE, 47, 49, 113, 174, 363
  consistent, 49, 87–89, 103, 225, 277
  linear, 47–49, 83, 85, 89, 92, 112, 113, 160, 174
  minimum variance, 47, 49, 89, 160
  unbiased, 47–49, 73, 85–90, 95, 103, 112, 114, 141, 161, 164, 173, 236, 237
Evans, J.M., 253, 254
Exogeneity, 327
Explanatory power, 143–145

F
Farebrother, R.W., 206
Farrar, D.E., see Test, Farrar-Glauber
Farvaque, E., 139
Feedback effect, 332, 336
Florens, J.P., vii, 82, 377
Form
  reduced, 354, 357–360, 362–365, 367
  structural, 353, 355, 357–359, 362, 363, 366
Fox, J., 262
Frequency, 9
Friedman, J.P., 266, 279
Frisch, R.A.K., v
Fuller, W.A., see Test, Dickey-Fuller
Function
  autocorrelation, 289–293, 296, 297, 314–316
  autocovariance, 289, 290, 299, 301, 315
  impulse response, 332
  joint probability density, 101
  likelihood, 101
  partial autocorrelation, 289, 292, 308, 316

G
Gallant, A.R., 82, 153
Geary, R.C., see Test, Geary
Giles, D.E.A., 207
Glauber, R.R., see Test, Farrar-Glauber
Glejser, H., see Test, Glejser
GLS, see Method, generalized least squares
Godfrey, L.G., 207–209, 218, 219, 319
Goldfeld, S.M., 82, 262
  See also Test, Goldfeld-Quandt
Gouriéroux, C., vii, 262, 318, 349
Granger, C.W.J., vi, 331, 336, 339, 340, 342, 345, 349
Granger representation theorem, 339
Greene, W., vii, 82, 112, 153, 221, 262, 281, 290, 318, 327, 332, 360, 362, 365, 377
Griliches, Z., 262, 285
Gujarati, D.N., 82, 221, 285, 377

H
Hamilton, J.D., 290, 327, 332, 339, 342, 349
Harvey, A.C., 182, 190, 349
Hausman, J., 226, 351, 365, 367
Heckman, J., vi
Hendry, D.F., vi, 26, 221
Heterogeneity, 176
Heteroskedastic, see Heteroskedasticity
Heteroskedasticity, vi, 31, 171–173, 176–180, 182–189, 194, 195, 201, 211, 216, 325, 364–366, 373
  conditional, 185, 186, 325
Hildreth, C., see Method, Hildreth-Lu
Hoel, P.G., 26
Hoerl, A.E., 237
Homoskedastic, see Homoskedasticity
Homoskedasticity, 30, 31, 109, 171, 181–186, 190, 192–194, 289, 319, 325, 326
Hurlin, C., 82, 349

I
Identification, 317, 318, 351, 357–361, 369
Identification problem, 357
ILS, see Method, indirect least squares
Inertia degree, 10
Information criteria
  Akaike, 145, 146, 152, 306, 320, 330, 334
  Akaike corrected, 145
  Hannan-Quinn, 145, 146, 152, 306, 320, 330, 334
  Schwarz, 145, 146, 152, 306, 320, 330, 334
Innovation, 313
Integration, 300, 308, 310, 338, 345
Interpolation, 197
Interval
  confidence, 57, 58, 60, 63, 64, 95, 118, 204, 295, 296, 324
  prediction, 73, 74, 97, 140–143, 321
Intriligator, M.D., 26, 262

J
Jarque, C.M., see Test, Jarque-Bera
Jean, E., 139
Jenkins, G.M., 287, 312, 313, 317, 318, 320, 321, 349
Johansen, S., 340, 342, 349
Johnston, J., 82, 153, 201, 225, 262, 362, 364, 365
Jorgenson, D., 282
Judge, G.G., 153, 221, 262
Juselius, K., 342

K
Kaufmann, D., 136
Kennard, R.W., 237
Kennedy, P., 262
Kim, I.-M., 349
Klein, L.R., 231, 234, 368, 370, 376
Kmenta, J., 82
Koyck, L.M., 273, 275–280, 282, 283
Kuh, D.A., 233
Kullback, S., 144
Kurtosis, see Coefficient, kurtosis

L
Lag, 265
  mean, 269, 275, 285
  median, 269, 275, 284
Lagrange multiplier statistic, 185, 186
Lardic, S., 197, 287, 289, 302, 327, 332, 339, 342, 349
Leamer, E.E., 262
Lehmann, E.L., 82
Leptokurtic, see Coefficient, kurtosis
LeSage, J., vii
Ljung, G.M., see Test, Ljung-Box
Logarithmic difference, 20
Loglikelihood, 102, 365
Log-reciprocal, see Model, reciprocal
Lu, J., see Method, Hildreth-Lu

M
MA, see Model, moving average
MacKinnon, J.G., 82, 153, 216, 221, 285, 341, 346, 365
Macrobond, 21, 50, 147, 190, 234, 250, 256, 283, 345
Maddala, G.S., 281, 349
Mallows criterion, 146
Marimoutou, J.P., vii
Matrix
  diagonal, 154
  full rank, 108, 111, 120, 158, 159, 242
  idempotent, 157, 163
  identity, 157
  inverse, 157, 158
  non-singular, 158, 159
  scalar, 155
  square, 154, 158
  symmetric, 155, 163
  transpose, 110, 155, 156
Matyas, L., vii
McFadden, D.L., vi
Mean, 11
  weighted arithmetic, 11
Method
  all possible regressions, 239
  backward, 239, 240
  Cochrane-Orcutt, 215, 216
  constrained least squares, 241, 242, 264
  Marquardt generalized inverses, 238
  forward, 239, 241
  full-information estimation method, 362
  full-information maximum likelihood, 365
  generalized least squares, 172–174, 177, 178, 200, 201, 211–215, 365, 366
  generalized moments, 362, 365
  Hildreth-Lu, 215, 216
  indirect least squares, 357, 362, 363, 365, 370
  instrumental variables, 223, 224, 276, 277, 362, 368
  limited-information estimation, 362, 365
  maximum likelihood, 39, 100, 101, 112, 144, 216, 277, 318, 342, 362, 365, 376
  Newey-West, 186–188, 194, 216, 219, 283, 284
  ordinary least squares, 34, 35, 50, 76, 107, 110, 111, 148, 172, 174, 178, 186, 213, 242, 243, 264, 329, 351, 370
  pseudo GLS, 214
  stagewise, 239
  stepwise, 240, 241
  SUR, 366
  three-stage least squares, 365, 366, 373, 374
  two-stage least squares, 355, 363, 370, 372, 373
  weighted least squares, 178, 186
  White, 194
Mignon, V., 82, 197, 287, 289, 302, 327, 332, 339, 342, 349
Mills, T.C., 349
Minor, 159
Mittelhammer, R.C., 207
Model, 7
  adaptative expectations, 278
  Almon lags, 271
  ARCH, 185, 194, 319, 325
  ARMA, 287, 292, 312, 316–324
  autoregressive, 198, 201, 205, 209, 212, 217, 265, 275, 281, 287, 303, 313, 314, 316, 318, 322–327, 330
  autoregressive distributed lags, 265, 281, 282
  constrained, 122, 131, 132, 152, 242, 330, 332, 334, 336
  covariance analysis, 248
  distributed lags, vi, 265, 267–271, 273, 275, 280–284
  double-log (see Model, log-linear)
  error correction, 287, 336, 339, 340, 342, 346
  infinite distributed lags, 271, 273
  Koyck, 273, 275–280, 282, 283
  log-inverse, 80, 81
  log-linear, 76, 77, 189
  log-log (see Model, log-linear)
  moving average, 210, 315–317, 322–327
  partial adjustment, 278
  Pascal, 273, 279, 280
  polynomial distributed lags, 271
  rational lags, 282
  reciprocal, 79, 80
  semi-log, 77, 79
  simultaneous equations, vi, 327, 351, 355, 362, 363, 365–367
  unconstrained, 122, 131, 132, 152, 242, 330–332, 334–336
  variance analysis, 245, 248
Modeling, 7
Monfort, A., 262, 318
Mood, A.M., 26, 82
Morgan, M.S., vi
Morgenstern, O., 26
Multicollinearity, vi, 108, 223, 228–238, 247, 271, 275
  perfect, 228
Multiplier
  cumulative, 269
  long-term, 269, 282
  short-term, 269
Murtin, F., 349

N
Nelson, C.R., 298
Nerlove, M., 278, 285
Newbold, P., 26, 336
Newey, W.K., see Method, Newey-West
Newton, 280
Normalization, 356, 359, 360
O
OLS, see Method, ordinary least squares
OLS line, see Regression line
Operator
  first-difference, 19, 22, 197, 294
  lag, 268, 282, 300
Orcutt, G.H., see Method, Cochrane-Orcutt
Overidentified, 358, 361, 370

P
Pace, R.K., vii
Pagan, A.R., 182, 192, 193, 319
Palm, F.C., 349
Parameter(s), 8, 30
  cointegration, 338
  integration, 300, 308
Pascal, see Model, Pascal
Péguin-Feissolle, A., vii
Perfect multicollinearity, see Multicollinearity, perfect
Perron, P., 306
Persistence, 10
Phillips, A.W., 79, 182, 190
Pierce, D.A., 207, 210, 319
Pindyck, R.S., 82, 367, 368, 377
Pirotte, A., v, vi
Platikurtic, see Coefficient, kurtosis
Plosser, C., 298
Population, 12
Prais, S.J., 216
Predictive power, 143–145
Puech, F., 134

Q
Quandt, R.E., 82, 262
  See also Test, Goldfeld-Quandt
Quasi first difference, 213

R
Random walk, 300, 303
Rank, 158, 159, 242
Rao, C.R., 82, 281
Regression line, 28, 34, 37, 43, 50, 53, 61
Regression significance, see Test, regression significance
Regression(s)
  backward, 251
  forward, 251, 252
  multiple, 27, 105
  rolling, 251, 256
  simple, 27
  spurious, 298, 336, 337, 340, 345
Relation(s)
  accounting, 8
  cointegration, 338–342, 346
  technological, 8
Residual(s), 34
  recursive, 251–254, 257
Ridge regression, 237
Roos, C., v
Root mean squared error, 320
Rubinfeld, D.L., 82, 367, 368, 377
Run, 203, 204

S
Salins, V., 137
Sample, 12
Sargan, see Test, instruments validity
Scalar product, 157
Scatter plot, 34, 50
Schmidt, P., 237
Schwarz, G., see Information criteria, Schwarz
Seasonal adjustment, 249, 250
Semi-elasticity, 77, 247
Series
  integrated, 307, 308
  time, vi, 9, 17, 196, 287, 289, 292, 319
Sevestre, P., vii
Shiller, R., 332
Short memory, 293
SIC, see Information criteria, Schwarz
Significance level, 57, 321
Sims, C.A., 327
Skewness, see Coefficient, skewness
Solow, R.M., 279
Spanos, A., 26
Spectral density, 290
Sphericity of errors, 171
Stability, vi, 237, 241, 251, 253, 254, 256, 257, 260
Standard deviation, 12
  empirical, 12
Stationarity
  in mean, 17, 18, 20, 23, 294, 295, 297–299
  in variance, 17, 20, 299
  second-order, 289
Structural break, 241, 251, 255, 256, 260
Swamy, P.A.V.B., 262
System(s)
  complete, 353
  equations, 351, 362
T
Teräsvirta, T., 349
Test
  Augmented Dickey-Fuller, 305, 306, 341, 346
  Box-Pierce, 210
  Chow, 254–256, 260, 261
  coefficient significance, 59, 119
  CUSUM, 253, 254, 257
  CUSUM of squares, 254, 257
  Dickey-Fuller, 287, 302–308, 333, 334, 340, 341, 346
  Durbin, 204–208, 214, 217
  Durbin-Watson, 204–207, 214, 217, 219, 337
  Farrar-Glauber, 232, 235
  Fisher, 120, 122, 126, 131, 132, 152, 230, 242, 270, 335, 336
  Geary, 201
  Glejser, 182, 186, 188, 192, 194, 319
  Goldfeld-Quandt, 179, 181, 182, 190, 319
  Hausman, 226
  instruments validity, 277
  Jarque-Bera, 99
  Ljung-Box, 207, 210, 219, 297, 319, 324
  portmanteau (see Test, Box-Pierce)
  regression significance, 121, 151
  regression significance (see Test, regression significance)
  Sargan (see Test, instruments validity)
  significance, 59, 60, 69–71, 119–121, 151, 182, 208, 309, 368
  student, 120, 166, 208, 273, 368
  unit root, 293, 297, 302, 306, 310, 333, 345
Test size, 57
Theil, H., 363, 365, 377
Three-stage least squares, see Method, three-stage least squares
Thuilliez, J., 135
Time series econometrics, 287, 319
Tobin, J., 237
Trace, 157, 158, 163, 164
Transformation
  Box-Cox, 77, 79–81
  Koyck, 273, 275
  logarithmic, 20, 28, 29, 189
TS process, 298–300, 303
Two-stage least squares, see Method, two-stage least squares

U
Underidentified, 358, 361, 362, 369
Unit root, 293, 297, 300, 302, 303, 305–308, 310, 333, 334, 340, 345

V
Variable
  binary, 243
  centered, 47, 70, 83, 123, 125, 128, 228
  control, 248
  dependent, 9
  dummy, 243–247, 249–251, 260
  endogenous, 9
  exogenous, 9
  explained, 9, 27
  explanatory, 9, 27
  independent, 9
  indicator, 241, 243, 245, 249
  instrumental, 223–225, 276, 277, 355, 362, 364–368, 370
  lagged endogenous, 10, 207, 208, 275, 276
  predetermined, 355, 363, 364, 369
  qualitative, vii, 246–250
Variance, 11, 45
  empirical, 12
  explained, 65, 66
  residual, 65, 66
Variance inflation factor, 229, 233
Vector
  cointegration, 338
  column, 154, 156
  line, 154, 156
VIF, see Variance inflation factor
Volatility, 185

W
Walker, see Equation(s), Yule-Walker
Wallis, K.F., 207
Watson, G.S., see Test, Durbin-Watson
Weak, see Stationarity, second-order
Welsch, R.E., 233
West, K.D., see Method, Newey-West
White, H., 184–187, 193, 194, 319
White noise, 31, 319
Winsten, C.B., 216
WLS, see Method, weighted least squares
Wooldridge, J.M., vii, 221

Y
Yoo, S., 341, 342, 346
Yule, see Equation(s), Yule-Walker

Z
Zellner, A., 365, 366
Zuindeau, B., 139
