Introductory Econometrics
This book constitutes the first serious attempt to explain the basics of econometrics and its
applications in the clearest and simplest manner possible. Recognising the fact that a good
level of mathematics is no longer a necessary prerequisite for economics/financial economics
undergraduate and postgraduate programmes, it introduces this key subdivision of economics
to an audience who might otherwise have been deterred by its complex nature.
This text treats econometrics as a subdivision of economics, rather than that of mathematical
statistics. It is designed to explain key economic/econometric issues in a non-mathematical
manner and to show applications of econometric methods in practice. It should prove to be
invaluable to students at all levels venturing into the world of econometrics.
Hamid R. Seddighi
First published 2012
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
Simultaneously published in the USA and Canada
by Routledge
711 Third Avenue, New York, NY 10017
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2012 H.R. Seddighi
The right of H.R. Seddighi to be identified as author of this work has
been asserted by him in accordance with the Copyright, Designs
and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced
UNIT 1
Single-equation regression models
4.2 Definition of variables, time series data, and the OLS estimation method
4.3 Criteria for the evaluation of the regression results
4.4 Tests of significance
4.5 Testing for linear restrictions on the parameters of the regression models
4.6 A summary of key issues
Review questions
UNIT 2
Simultaneous equation regression models
UNIT 3
Qualitative variables in econometric models – panel data regression models
UNIT 4
Time series econometrics
UNIT 5
Aspects of financial time series econometrics
Appendix
Index
List of figures & tables
Figures
1.1 A one-to-one deterministic relationship for consumption and income
1.2 Key stages in a traditional econometric investigation based on the SG methodology
2.1 Consumption and income
2.2 Normally distributed levels data for the variables
2.3 A normally distributed variable
2.4 A linear consumption-income model
2.5 A log-linear model of demand
3.1 The sampling distribution of an estimator
3.2a Sampling distribution of an unbiased estimator
3.2b Sampling distribution of a biased estimator
3.3 The consistency property
3.4 Sampling distribution of an OLS estimator
4.1 Population regression line
4.2 Sample regression line
4.3 The sampling distribution of β̂2
4.4 A standard normal distribution
4.5 Hypothesis testing using the standard normal distribution
4.6 Significance testing and the critical region
4.7 Conventional significance testing
4.8 Two-tailed significance testing using the t-distribution
4.9 A chi-squared distribution
4.10 An F-distribution
6.1 The Durbin ‘h’ test
7.1 Second and third order polynomial approximation of distributed lag structures
7.2 Distributed lag estimates for the inflation-money growth equation for Greece
7.3 Geometric distribution: successive values of λi
7.4 Pascal distribution of weights
7.5 Sum of squared residuals with respect to λ
8.1 Market equilibria and underidentification
8.2 Market equilibria and exact identification of the supply function
9.1 Intercept dummies: southern and northern locations
9.2 Critical region for t-distribution
10.1 Non-linear probabilities of health insurance on income
10.2 A normal CDF plotted against a logistical CDF
12.1 Private consumption and personal disposable income for an EU member, 1960–1995
12.2 Correlogram for private consumption of an EU member
12.3 Correlogram for differenced private consumption of an EU member
16.1 Daily log-returns of FTSE 100 from Jan 1, 1990 to Dec 31, 2010
16.2 Squared daily returns for FTSE 100
Each unit can be taught separately depending on the nature of the programme/module being
offered. Units 1, 2 and 3 are suitable for one/two semester undergraduate/postgraduate
programmes in economics, business, and relevant MBA programmes. Units 4 and 5 are
suitable for one/two semester programmes, covering advanced undergraduates in economics,
finance, and relevant MBA programmes.
Acknowledgements
I would like to thank my students past and present for their comments and reactions
to some of the material included in this book. Thanks are due to anonymous referees
who reviewed an earlier edition, providing suggestions for improvements. Some of
these have been incorporated into this edition. Thanks are also due to Dr Dennis Philip
of Durham Business School, University of Durham, for undertaking the work on
Unit 5 of this new edition. I also wish to thank Jeff Evans of Sunderland Business
School, University of Sunderland, for preparing the statistical tables included in the Appendix.
Finally I wish to acknowledge the contributions of K.A. Lawler and A.V. Katos to an
earlier edition of this book. In particular, special thanks are due to A.V. Katos for his contri-
butions to some of the key chapters of the previous edition. Some of the key material of the earlier edition has been revised and retained in Units 1, 2 and 4 of this new edition. This is
greatly appreciated and acknowledged by the author.
Dr Hamid R. Seddighi
Unit 1
Single-equation regression
models
• This unit introduces the reader to the traditional approach to econometric analysis,
focusing on the formulation, estimation and evaluation of single-equation regression
models.
• The unit explains the traditional specific to general (SG) methodology of econometric
analysis and demonstrates this methodology in a step by step fashion via a number of
applied econometric examples.
• The emphasis throughout this unit is on understanding the key concepts and issues
and on applying key ideas to modelling and empirically evaluating economic
models.
• Chapters 1 and 2 explain the process of economic modelling and modification of the
economic models into econometric models for the purpose of empirical analysis.
• Chapter 3 explains the concept of estimators, sampling distributions, properties of
‘good’ estimators, and the OLS estimators and their properties.
• Chapter 4 explains the criteria for evaluation of the regression results, focusing on
hypothesis testing and tests of significance.
• Chapter 5 explains the diagnostic testing procedures used in practice to test and to detect
breakdown of the standard assumptions, including several tests for autocorrelation and
heteroscedasticity. Unlike conventional texts, which typically devote several chapters to
these issues, and, therefore, give the impression that each of the breakdowns occurs
independently of other problems, this chapter follows what typically happens in prac-
tice, explaining all key problems together under one roof to re-emphasise the method-
ological problems of the traditional approach.
• Chapter 6 explains the phenomenon of spurious regression, frequently encountered
within the framework of the traditional (SG) methodology. It discusses how this meth-
odology deals with this problem in practice and explains the modern approach to dealing
with this problem.
• Chapter 7 provides detailed coverage of the traditional approach to converting static
econometric models to dynamic econometric models, explaining the distributed lag and
the autoregressive dynamic models.
• This unit presents key aspects of the traditional methodology for single-equation econo-
metric analysis, including modelling, estimation, evaluation and spurious regression
phenomenon, providing a sound foundation for the modern approach to time series
econometric analysis, to be discussed in Unit 4.
• This unit is suitable for an introductory course/module in econometrics to be delivered
over one semester.
• To better understand key elements of this unit, students are encouraged to model
various economic variables, making use of the review questions at the end of each
chapter, collect relevant data (for example, from various online sources), and use an
appropriate regression package to estimate, evaluate and analyse single-equation regres-
sion models.
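As a concrete illustration of this suggested workflow, the following minimal sketch (in Python, using the numpy and statsmodels packages; the data are simulated here, whereas in practice they would be collected from an online source) estimates and summarises a simple single-equation regression model.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical data: income (X) and consumption (Y) for 40 periods.
income = rng.uniform(100, 500, size=40)
consumption = 50 + 0.8 * income + rng.normal(0, 10, size=40)

# Estimate the single-equation regression model Y = a + bX + u by ordinary least squares.
X = sm.add_constant(income)              # adds the intercept term
results = sm.OLS(consumption, X).fit()

print(results.params)                    # point estimates of a and b
print(results.summary())                 # evaluation output (R-squared, t-ratios, etc.)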
1 Economic theory and economic
modelling in practice
INTRODUCTION
Econometrics is a branch of economics dealing with the empirical evaluation of abstract
economic theories and models. The principal aim of econometrics is to check the validity of
the economic models against data, and to provide empirical content to economic theories
and models. The key component of an econometric analysis is therefore an economic theory
specifically in the form of an economic model suitable for empirical evaluation. This chapter
provides an introductory discussion of the methods of economic modelling and econometric
analysis. To this end, a number of simple economic models are developed step by step to
illustrate the key features of economic modelling and to demonstrate the need for empirical
evaluation of the models via econometric analysis.
Key topics
• Economic modelling in practice
• The econometric approach
• The specific to general methodology of econometric analysis
Economic models may be viewed as providing:
• frameworks within which the relationships among economic variables are expressed in
consistent and logical sequences;
• the frameworks for the model builder to generalise theoretical arguments and ascertain
implications;
• frameworks for the empirical investigation of economic hypotheses.
To illustrate the process of economic modelling, two simple models are developed below: (a) a model of household consumption expenditure, and (b) a model of demand for competitive imports.
According to the absolute income hypothesis (AIH), household consumption expenditure (Y) depends on household income (X):
Y = f(X)
where
ΔY = a change in consumption expenditure
ΔX = a change in income
and the ratio ΔY/ΔX is assumed to be less than one. Or equivalently:
ΔY/ΔX ≤ 1
where the ratio ΔY/ΔX is the marginal propensity to consume (MPC). This shows the change in consumption per unit of a change in income. Moreover, the average propensity to consume (APC) can be written as:
APC = Y/X
where APC > MPC and APC falls as real income rises.
In summary, the AIH may be stated as:
Y = f(X)
0 ≤ MPC = ΔY/ΔX ≤ 1
MPC < APC; APC = Y/X.
In general, economic theorists do not specify the exact functional forms that are likely to
exist among economic variables. The specification of functional forms is left to the model
builder. It is, however, customary to consider an initially linear relationship, because of the
ease of presentation and analysis. Following this tradition, the AIH in a linear form may be
presented as:
Y = a + bX; a ≥ 0; 0 ≤ b ≤ 1.
where a is the intercept term and b is the slope of the linear function. Both a and b are
unknown constants and are called parameters of the model. An important task for economet-
rics is to provide estimates of the unknown parameters of the economic models on the basis
of economic data on economic variables. This model can be presented graphically by
assuming certain hypothetical values for the unknown parameters: a and b. This is done in
Figure 1.1, where the dependent variable is measured along the vertical axis and the inde-
pendent variable along the horizontal axis.
According to Figure 1.1, there exists a one-to-one relationship between Y and X. That is,
given a value for X, such as X1, there is a unique value for Y, which is shown in Figure 1.1 as
Y1. That is, within the framework of economic models, the relationship between economic vari-
ables is represented by deterministic equations. We return to this point in subsequent sections.
Figure 1.1 A one-to-one deterministic relationship for consumption and income: the line Y = a + bX, with intercept a, maps each income level (X1, X2) on the horizontal axis to a unique consumption level (Y1, Y2) on the vertical axis.
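The one-to-one nature of the deterministic model can also be seen in a few lines of code. The following short sketch (in Python; the values assumed for a and b are purely hypothetical) evaluates Y = a + bX at two income levels and returns exactly one consumption level for each.

a, b = 50.0, 0.8               # assumed intercept and MPC, chosen only for illustration

def consumption(x):
    # Deterministic consumption function Y = a + bX.
    return a + b * x

for x in (100.0, 200.0):       # two hypothetical income levels, X1 and X2
    print(x, consumption(x))   # one and only one value of Y for each value of X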
The first factor is level of final expenditure, that is, total expenditure in a given period
in an economy. The composition of total expenditure is also important given the degree
of the import content of the different components of total expenditure (consumption expenditure, investment expenditure and expenditure on exports). In developing a model of
competitive imports, we allow for a general case and distinguish between the three broad
categories of final expenditure, namely, total consumption expenditure by the private and
public sectors, investment expenditure and expenditure on exports. The underlying assump-
tion being that each aggregate component of final expenditure has a different impact on
imports.
The second factor developed by theory is the price of imports relative to the price of
domestic substitutes. Where the price of imports is measured in the units of domestic
currency and expressed in index form, the domestic price level is also expressed in index
form, for example, in the form of wholesale price indices or GDP price deflators. A rise in
the relative prices would normally lead to a fall in demand for competitive imports. In most
cases, to explain how prices are determined, it is usually assumed that the supply elasticities are infinite: whatever the level of domestic demand for imports, these are supplied, and import prices are therefore determined outside the model rather than through the interaction of domestic demand and supply. The domestic price index is also usually taken as given
and is assumed to be flexible, thereby eliminating excess demand at home. Theory also iden-
tifies the capacity of the import substitution sector to produce and supply the goods as an
important factor in determining the demand for imports. However, the capacity variable is
essentially a short-run phenomenon and is relevant only if excess demand at home cannot be
eliminated by a change in domestic prices. To generate an algebraic model we begin by
defining the economic variables considered to be important in determining the demand for
competitive imports. These variables are defined as follows:
M = the volume of demand for competitive imports
CG = total consumption expenditure by the private and public sectors
INVT = investment expenditure
EXP = expenditure on exports
PM = the price of imports, measured in domestic currency and expressed in index form
PD = the domestic price level, expressed in index form
Having defined these variables, we are now in a position to present the general form of the algebraic model as follows:
M = f(CG, INVT, EXP, PM/PD)    (1.1)
        (+)   (+)    (+)     (−)
Hence, according to theoretical arguments, the demand for competitive imports depends
on (‘is a function of’) the main components of aggregate demand: CG, INVT and EXP.
Each of these has a separate impact on import demand. Moreover, one expects the relationship between each one of these variables and imports to be positive. That is, a rise/fall in
any of these variables would result in a rise/fall in the level of demand for competitive
imports. With regard to the relative price term, we expect a rise/fall in this term to lead
to a fall/rise in the level of imports. We indicate this negative relationship by using a negative
sign below the relative price variable in the previous equation. In this particular presentation
of the model, M is the dependent variable, and CG, INVT, EXP, PM/PD are the independent
variables.
To analyse in more detail the nature of the interactions between the dependent variable
and each one of the independent variables for any economic model, we need to assume a
certain functional form for the general economic model presented by Equation (1.1). This
will be done as follows.
M = α1 + α2CG + α3INVT + α4EXP + α5(PM/PD)    (1.2)
where α1, α2, α3, α4, α5 are unknown constants and are called the parameters of the model; α1
is the intercept term of the linear equation and α2, α3, α4, α5 are slope parameters.
In this linear presentation of the model, each slope parameter shows the impact of a
marginal change (a one unit change) in a particular independent variable, while other inde-
pendent variables are constant, on the average value of the dependent variable. For example,
α2 shows the impact of a one unit change in CG, while INVT, EXP and PM/PD are kept
constant, on the average value of import demand. Symbolically:
α2 = ΔM/ΔCG (INVT, EXP and PM/PD held constant)    (1.3)
Similarly, α5 shows the impact of a one unit change in the relative price variable, while all
components of aggregate demand are kept constant, on the average level of demand for
competitive imports. That is:
α5 = ΔM/Δ(PM/PD) (CG, INVT and EXP held constant)    (1.4)
Alternatively, the model may be specified in log-linear form:
logM = β1 + β2 logCG + β3 logINVT + β4 logEXP + β5 log(PM/PD)    (1.5)
where β1 is the intercept term and β2–β5 are the slope parameters.
The slope parameters β2, β3, β4 and β5 are partial elasticities, each showing the percentage
change in the dependent variable with respect to a percentage change in any one of the
independent variables under consideration.
For example:
The parameter β2 shows the percentage change in the dependent variable per unit of a
percentage change in the independent variable CG, ceteris paribus. So: β2 = (ΔM/M)/(ΔCG/CG), with the other regressors held constant.
Note that in a log-linear specification each slope parameter shows the percentage change in
the dependent variable (and not its log) per unit of a percentage change in any one of the
independent variables. In economic analysis such a parameter is called an ‘elasticity’. Here,
β2 is the partial elasticity of import demand with respect to consumption expenditure,
showing the percentage change in the level of imports for every one per cent change in the
level of CG. For example, if β2 is estimated to be 5, it implies that, for every one per cent change in CG (all other independent variables kept constant), demand for imports changes by 5 per cent. This implies that import demand is highly elastic with respect to consumption expenditure. Log-linear
economic models are extremely popular in applied economic/econometric analysis due to
the ease of specification and interpretation of results.
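The elasticity interpretation can be checked numerically. The sketch below (in Python; the value of β2 and the level of CG are assumed purely for illustration) applies a one per cent change to CG in the log-linear model and recovers the implied percentage change in imports.

import numpy as np

beta2 = 5.0                        # assumed partial elasticity of imports with respect to CG
cg0, cg1 = 200.0, 202.0            # a one per cent rise in consumption expenditure

# Implied change in log(M), holding the other regressors fixed:
d_log_m = beta2 * (np.log(cg1) - np.log(cg0))
print(f"approximate % change in imports: {100 * d_log_m:.2f}%")   # close to 5%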
A summary of key steps required for economic modelling
Step 1 Give a clear statement/explanation of the economic theory underlying the economic
phenomena under consideration. Pay particular attention to the assumptions, noting implica-
tions and limitations.
Step 2 Use simple linear relationships in parameters to present the economic model in alge-
braic form. Use linear relationships to model/link economic variables to measure the impact
of marginal changes on the dependent variables.
Step 3 Use log-linear specifications when you want to measure elasticities. Make sure you
understand how the variables are linked through the log-linear specification. Remember that
a slope coefficient in a log-linear specification shows the percentage change in the dependent
variable per unit of a percentage change in an independent variable (to which the slope
coefficient is attached).
Given this definition, the question which now arises is how does one carry out econometrics
work/research in practice? In other words, what is the methodology of econometric/
empirical analysis in economics? There are three major rival methodologies for empirical
work/analysis in economics; these are ‘specific to general’ (SG), ‘general to specific’ (GS),
and co-integration methodologies. In this text we will follow two key methodological
approaches: (a) the well-documented and applied specific to general methodology (SG), and
(b) the recently popularised co-integration methodology. We start with the traditional approach
to econometric modelling, often termed the specific to general (SG) approach to econometric
modelling. In Chapters 12–15 we will discuss in detail the co-integration methodology.
• Since each and every test is conditional on arbitrary assumptions which are tested later, if any of these assumptions is rejected, at any stage in the investigation, all previous inferences are invalidated.
• Given the restricted number of diagnostic tests conducted, it is accordingly not always
known if the ‘best’ model has been achieved by using this iterative methodology.
Figure 1.2 schematically shows key stages in a traditional econometric investigation based
on the SG methodology.
1 Begin with the most general specification which is reasonable to maintain within the
context of theory.
2 Conduct the simplification process, which is undertaken by way of a sequence of tests,
which aim to avoid the charge of measurement without theory or empirical foundation.
Moreover, within the context of the GS approach it should be noted that:
a the significance levels for the sequences of testing are known;
b the sequential testing procedures are used to select a data coherent model of
economic relationships.
Nevertheless, there still remain drawbacks with the GS approach. These can be generally
stated as being that:
1 The chosen general model might only actually comprise a ‘special’ case of the data
generation process, so that diagnostic testing remains vitally important.
2 Data limitations and inconsistencies weaken the approach.
3 There still remains no universally agreed uniquely optimal sequence for simplifying the
‘general’ model in practice.
Figure 1.2 Key stages in a traditional econometric investigation based on the SG methodology: economic theory → algebraic economic model → econometric model (combined with economic data) → estimation/interpretation → confirmation/diagnostic checking.
Despite a number of key shortcomings, as outlined above, the SG methodology has domi-
nated the majority of empirical work in economics since the mid-1930s. It is currently used
in econometric texts to introduce students to econometric analysis. We will follow this tradi-
tion in Chapters 1–8 to introduce basic concepts, models, analysis and limitations. However,
in Chapters 12–16 we will focus our discussion on new methodologies for time series inves-
tigations, particularly on co-integration analysis.
1 Time series data These are observations collected on the values of the variables of the
model over time. Time series data are typically collected from official public/private
secondary data sources. For example, in the UK the Economic Trends and the London
Stock Exchange market databases are well-known sources of reliable time series data for
time series econometrics investigations. The Economic Trends Annual Supplements are
excellent sources of time series data on key UK macroeconomic variables (gross domestic
product, aggregate consumption expenditure by private and public sectors, imports,
exports, etc.), in quarterly and in annual frequencies. The Economic Trends Annual
Supplements are available online. For financial econometrics analysis, The London Stock
Exchange (LSE) market databases and The Financial Times (FT) provide high-frequency
time series financial data on stocks/shares prices, the volume of financial transactions and
the pattern of demand and supply of stocks and shares. Time series data are used for
modelling, model evaluation and forecasting in time series econometrics studies.
2 Cross-section data These are observations collected on variables of the model at a point
in time. Cross-section data may be collected from public/private data collection agen-
cies, which collect cross-section data via surveys on various aspects of individuals,
consumers, and producers. Examples include published surveys and data on households’ monthly expenditures (expenditure over a specific month), companies’ annual investment outlays (over one particular year) and samples of stock returns. In cross-
sectional econometrics studies, typically a sample is chosen by the researcher from a
population of interest and cross-section data is then collected via questionnaire for
detailed econometrics investigation.
3 Panel data sets These provide observations on a cross-section of individuals, firms,
objects or countries over time. Typically, in a panel data set, a long cross-section data
set is mixed with a short time series to generate a combined cross-section/time series
data set. For example, a panel data set on:
• the daily prices of a sample of (say 100) stocks over 3 years;
• the monthly expenditure of a group of households (say 1000) over time (say 3
years);
• the annual R&D expenditure of manufacturing companies located in the NE over a
5-year period;
• a study examining the effectiveness of a negative income tax (thousands of households over 2–3 years);
• earning potential of university graduates: a large data set on graduates’ characteris-
tics, age, type of degree, employment, etc. The graduates are then followed over
time (say 1–4 years) and time series data are collected on earnings and
promotions.
As was pointed out above, in a panel data regression analysis, a large cross-section data set
is normally mixed with a short time series data set, for the purpose of empirical analysis.
• Linear specification is a popular form of economic model. Models of this type depict straight lines and have an intercept term and slope parameters.
• Log-linear specification is another popular form of economic models. In this type of
model, the slope parameters are various elasticities.
• Economic models show causation between variables. The variable to be explained by
the model is called the dependent variable (regressand) and those explaining it are inde-
pendent variables (regressors).
• Economic theory and models lack empirical content. They must be empirically tested to
see whether they are capable of explaining the data and are consistent with facts.
• Econometrics is a branch of economics dealing with empirical evaluation of economic
theories and models, to establish whether a theory is capable of explaining the data for
which it is designed.
• Specific to general (SG) methodology is the traditional approach to conducting empir-
ical analysis in economics. This is a positivist approach to empirical analysis, based
on deductive reasoning and theory, using quantitative data, focusing on ensuring
reliability and precision in analysis.
• There are many good sources of economic data available online, including the Economic
Trends Annual Supplement, the OECD databases, the EU databases, and various central bank publications.
• Economic data are collected in three different forms: time series data, cross-section data
and panel data.
Review questions
1 Explain the meaning of each of the following terms:
a economic theory
b economic model.
c econometric model.
2 Explain what is meant by:
a a deterministic model
b a stochastic model.
3 Which type of data is suitable for use with an economic model? Explain your answer.
4 Distinguish between the specific to general (SG) and the general to specific (GS) meth-
odologies of econometrics. Which one of these two approaches do you find more attrac-
tive? Explain your answer.
5 Explain what you understand by each of the following:
a a linear model
b a log-linear model
c dependent variable
d independent variables.
6 Explain the nature and types of economic data. What are the key sources of economic
data?
7 Compare and contrast time series data with panel data. Give examples of each type of
economic data to illustrate your answer.
2 Formulating single-equation
regression models
INTRODUCTION
Economic models are by their nature deterministic. That is, given a set of values for the
independent variables of the model, only one value can be generated for the dependent vari-
able, from any one of this type of model. This is in contrast to economic data, which are not
deterministic. That is, for any given set of values for the independent variables, there is
likely to exist a range of values for the dependent variable under investigation. Before any
empirical analysis and evaluation, in order to explain the data generation process (DGP), the
deterministic economic models must therefore be modified to reflect the stochastic nature of
the data. This chapter explains in detail how this modification is done using a number of
examples to illustrate key ideas.
Key topics
• The two-variable linear regression model
• The disturbance term, its role and assumptions
• The multiple linear regression model (the classical normal linear regression model:
CNLRM)
For each level of income, it is reasonable to assume that there would be a probability
distribution of associated consumption levels for households. Notice that we talk about
likely probability distributions because we do not know the exact level of monthly consump-
tion expenditure of each household. The proposition here is that households with the same
level of income, say X1, are likely to have different levels of consumption. Consumption
levels are likely to be different because there are many factors other than income (interest
rates, size of family, location, habits, savings behaviour, etc.) which influence monthly
expenditure. It is difficult to measure the net influence of all these economic and behavioural
factors, but they do exist, leading to probability distributions of consumption expenditure at
each level of income. We aim to go deeper, making assumptions concerning how each of
these probability distributions could have been generated. A probability distribution is iden-
tified by three basic characteristics. These are the shape, the central value, and a measure of
dispersion of values around the centre of the distribution.
Figure 2.1 Consumption and income: monthly consumption (Y) plotted against monthly income (X), with distributions of consumption at income levels X1, X2 and X3.
Figure 2.2 Normally distributed levels data for the variables (Y against X at income levels X1 and X2).
(2.1)
Figure 2.3 A normally distributed variable: frequency or probability on the vertical axis, the variable X on the horizontal axis centred on E(X); the area under the curve between a and b shows the probability of X occurring between those two points, and c and d mark a corresponding interval to the left of E(X).
The variance of the distribution has no meaningful unit of measurement, so to generate a
meaningful measure of the dispersion of values around the central value, the square root of
variance is used in practice. This is called the standard deviation of the distribution, denoted
by SD(X), and the unit of measurement is the same as X, the variable under consideration.
The area under the curve between any two points shows the probability of X occurring
between those two points (see Figure 2.3).
Hence, Pr(a ≤ X ≤ b) = area under the normal curve between a and b.
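Such probabilities are easy to compute numerically. The following short sketch (in Python, using scipy; the mean, standard deviation and intervals are assumed values) computes the area under a normal curve between two points, and also illustrates the symmetry property discussed next.

from scipy.stats import norm

mean, sd = 100.0, 15.0                 # hypothetical E(X) and SD(X)
a, b = 110.0, 130.0                    # an interval to the right of the mean
c, d = 70.0, 90.0                      # the mirror-image interval to the left of the mean

# Pr(a <= X <= b) is the area under the normal curve between a and b.
p_right = norm.cdf(b, mean, sd) - norm.cdf(a, mean, sd)
p_left = norm.cdf(d, mean, sd) - norm.cdf(c, mean, sd)
print(p_right, p_left)                 # equal, since the two intervals are symmetric about E(X)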
An important property of the normal distribution is that the probability of X occurring
between a and b would be the same as X occurring between c and d, provided the distance
of these points from the centre of the distribution is the same (measured in units of standard
deviations). That is, the probability of X occurring between c and d to the left of E(X) is the
same as the probability of X occurring between a and b to the right of E(X), as long as (c–d) and
(a–b) are the same distance from the centre of distribution. What are the implications of
these characteristics of the normal distribution for the data generation process concerning
the household consumption/income model? First, it is assumed that most of the consumption
expenditure is concentrated around the average consumption expenditure, or expected
expenditure, at any particular income level. Second, the consumption expenditure of households occurs ‘above average’ or ‘below average’, depending on each household’s circumstances and behaviour. However, the probability of consumption levels occurring
‘above average’ is the same as the probability of consumption levels occurring below the
expected consumption level for each class of income. Third, at each level of income, the
probability that household consumption expenditure lies between a certain range, say above
expected consumption, is the same as the probability of that household consumption expen-
diture being between a certain range below the expected consumption expenditure, provided
that the two ranges of consumption expenditure have the same distance/difference from
expected consumption. These are the main implications of the normality assumption. They
can only be considered valid if the normality assumption is confirmed through diagnostic
testing.
2 The expected value of each distribution is determined by the algebraic economic model
This is an important assumption, and it is being made to bring the economic model to
the forefront of analysis. If the model is correctly specified and is consistent with the
data generation process, then its role is to show the central value/expected value for
each distribution. Given this framework, the algebraic economic model represents the
average value or the expected value for the dependent variable, at any specific value of
the independent variable. Using the consumption/income model, the expected level of
consumption for any given level of income is assumed to be determined by this simple
relationship. Given X1 is a level of income, the expected level of consumption expendi-
ture E(Y/X1) (reads expected consumption given X1 level of income) is determined by
the algebraic model as follows:
E(Y/X1) = a + bX1 [at the centre of the distribution of Y given X1]. Similarly, the expected level of consumption expenditure given the level of income X2 is:
E(Y/X2) = a + bX2    (2.2)
The straight line E(Y/X) = a + b X then goes through the centre of each distribution, connecting
the expected levels of consumption expenditure at various income levels. This is depicted in
Figure 2.4. The line E(Y/X) = a + bX is called the population regression line or function.
This line is unknown and it is to be estimated on the basis of a set of observations/data on Y
and X. Notice that this assumption has to be tested and confirmed later using appropriate
diagnostic tests. All that is being said is that, provided the algebraic model is correct or is
based on a valid economic theory, then it shows the average level for the dependent variable,
at various levels for the independent variable. This assumption also enables us to represent
each observation in terms of the expected value of each distribution and the deviation of
the data points from the expected level of distribution. For example, the consumption
expenditure of the ith household, corresponding to the income class Xi can be represented as
follows:
Yi = E(Yi/Xi) + ui    (2.3)
where ui shows the deviations of the ith household’s consumption expenditure from the
expected/average expenditure of households with the income of Xi. The ui term can be posi-
tive, in which case households consume more than average, or it can be negative, implying
that the ith household consumes less than expected consumption for the income group Xi.
We can substitute for E(Yi /Xi) in terms of the algebraic model to obtain:
Yi = a + bXi + ui    (2.4)
where
a + b Xi = the algebraic economic model showing expected consumption at income level Xi,
and ui = deviation of ith household consumption expenditure from the expected consumption
for the income group of Xi.
In this simple model, ui represents the net influence of all factors/variables other than
income on the ith household consumption expenditures. Some of these factors are economic
factors influencing consumption such as interest rates, wealth, and possibly rates of inflation.
For household consumption expenditure, family size is also an important factor. In a more
sophisticated model, these factors should be included. In such cases, there are a number of
independent variables, (income, interest rates and wealth) each influencing consumption
expenditure independently. Other factors that are captured by ui, however, cannot be
measured quantitatively; these include tastes and the behavioural patterns of households.
These factors essentially influence the consumption expenditure in unsystematic, random
ways. In other words, we cannot precisely measure all these net influences. Given this, the
econometric relationship is stochastic in nature. That is, given a level of income such as Xi
we are likely to get a range of values for consumption expenditure, because the influence of
random factors, as captured by the term ui, are different for different households. The term
ui is called a random disturbance term. Its value occurs randomly and its inclusion into
analysis disturbs an otherwise deterministic algebraic economic model.
Using the idea/concept of the disturbance term, we can now present the consumption
expenditure of each household (the data points) as follows:
Yi = a + bXi + ui    (2.5)
i = 1, 2, 3,...; (n = number of observations)
Hence, each observation obtained from household consumption expenditure population can
be decomposed as a sum of two elements. The expected consumption for the income class
(i.e. E(Yi/Xi)) and the deviation from the expected value, which reflects the special character-
istics/behaviour for that household. This model is called an econometric model. It is a modi-
fied economic model. The economic model is modified through inclusion of the disturbance
term ui, to reflect the stochastic/random nature of economic data. This econometric model is
compatible with the data and although it is based on a deterministic algebraic economic
model, it can be used with stochastic economic data to quantify the underlying economic
relationships.
Note that the data generation assumptions can also be made in terms of the distribution of
the disturbance term. In particular, the probability distribution of each consumption expen-
diture is generated through the probability distribution of the disturbance term. In other
words, if we assume that consumption expenditure is normally distributed, then by implica-
tion, the disturbance terms associated with each value of the independent variable are also
normally distributed. With regard to the second data generation assumption, this implies that
the expected value for each distribution of the disturbance term must be zero. To see this,
consider the simple econometric model again:
Yi = a + bXi + ui    (2.6)
Taking expectations conditional on Xi:
E(Yi/Xi) = a + bXi + E(ui/Xi) = a + bXi    (2.7)
Notice that the average of Yi, given Xi, is E(Yi/Xi) and that the average/expected value of
E(Yi/Xi) is in fact itself, since there is only one expected value! Therefore, by implication,
E(ui/Xi) must be zero, for (2.7) to hold. This is called the zero mean assumption. So the net
influence of all factors other than the independent variable (income), when averaged out, are
zero, provided that the underlying model is correctly specified.
3 The independence assumption How are the data points/observations of the consumption
expenditure generated? The standard assumption here is that the dependent variable
(consumption) and, by implication, the disturbance term, are independently distributed.
In other words, the consumption expenditure of different households is independent of
each other. Therefore, factors captured by the disturbance terms for different house-
holds are independent of each other. For example, consumption expenditure of the ith
household is independent of consumption expenditure of the jth household; symboli-
cally, the assumption of independence is usually written as follows:
cov(ui, uj) = 0 for all i ≠ j    (2.8)
Where cov is short for covariance. Covariance measures the degree of association
between two random variables. This assumption is often termed the non-autocorrelation
assumption, which implies no association between the values of the same variable
(disturbance term).
4 The constant variance assumption Now we need an assumption concerning the disper-
sion of the values around the central value of each distribution. In the absence of
any information concerning the variance of each distribution and the way it might be
determined, the standard assumption usually made is that the variance of the distribution
of the dependent variable/disturbance term is constant and does not change across distri-
butions. In terms of the consumption/income model, this assumption implies that the
spread/variance of distribution of the consumption expenditure does not change across
the different income groups. In other words, the variance of the distribution does not
change with the level of independent variable income. This is a strong assumption, and
in practice it often breaks down, particularly when using cross-section data. This is
because, in this type of data, independent variables tend to change significantly from one
distribution/group to the next. Significant variations in the levels, or scales, of the inde-
pendent variables might well in turn influence the variance/spread of each distribution,
generating changes in the variance across distributions. As with the other assumptions
about the data generation process, we therefore need to check/test the appropriateness of
this assumption when analysing the empirical model.
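A small simulation makes these data generation assumptions concrete. The sketch below (in Python; all parameter values are assumed for illustration) draws normally distributed disturbances with zero mean and a common variance at each of three income levels and confirms that the simulated consumption data are centred on a + bX with a roughly constant spread.

import numpy as np

rng = np.random.default_rng(1)
a, b, sigma = 40.0, 0.75, 8.0                  # assumed parameters of the model

for x in (100.0, 200.0, 300.0):                # income levels X1, X2, X3
    u = rng.normal(0.0, sigma, size=5000)      # disturbances: NID(0, sigma^2)
    y = a + b * x + u                          # consumption of households with income x
    # The sample mean should be close to a + b*x and the sample variance close to sigma^2.
    print(x, round(y.mean(), 2), round(y.var(), 2))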
The data generation assumptions may be summarised compactly as:
Yi ~ NID(E(Yi/Xi), σ2)    (2.9)
where N= normal
I = independently
D= distributed
E(Yi/Xi) = a + bXi, the expected value of each distribution
σ2 = notation for a constant variance
Alternatively, we may state the assumptions in terms of the distribution of the disturbance term ui: ui ~ NID(0, σ2), for all i.
Figure 2.4 A linear consumption-income model: the population regression line E(Y/Xi) = a + bXi, with monthly consumption (Y) on the vertical axis and monthly income (X) on the horizontal axis; the distributions of consumption at income levels X1, X2 and X3 are centred on this line.
Notice that the expected value of the disturbance term associated with each of the indepen-
dent variables is assumed to be zero. This model is depicted in Figure 2.4.
The straight line E(Y/Xi) = a +bXi is called the population regression line or, sometimes,
the population regression function. The aim of the analysis is to estimate this population
regression line, that is, to estimate a and b, on the basis of a set of observations on Y and X.
Before we consider this issue, we first expand the theoretical model to include more indepen-
dent variables. In practice, it is seldom the case that only one independent variable influences
the dependent variable.
Consider, for example, a simple economic model of the demand for a product:
QD = f(P, P*, Y)
where:
QD = the quantity demanded of a product
P = the price of the product
P* = the average price of related products
Y = money income.
Each of the independent variables (P, P* and Y) has a particular influence on quantity
demanded. In particular, for a normal product, a rise in P is expected to lead to a fall in quan-
tity demand (QD), ceteris paribus. Hence, the relationship between P and QD is expected to
be inverse or negative. A rise in the price of other related products (P*) will increase the
demand for the product if these are close substitutes, and reduce demand if these goods are
complementary. Finally, a rise in income is expected to lead to an increase in quantity
demanded (a positive relationship).
We now modify this model to generate a simple econometric model for the purpose of
empirical analysis. The necessary steps in this modification are as follows:
1 Specification of the economic model We specify the functional form of the economic
model. There are many options; however, the two most popular forms are linear and
log-linear models of demand.
QDt = β1 + β2Pt + β3P*t + β4Yt    (2.10)
Notice that the model is linear in terms of its parameters β1, β2, β3 and β4
In a log-linear specification each variable is expressed in a natural logarithmic form, as
follows:
log QDt = β1 + β2 logPt + β3 logP*t + β4 logYt    (2.11)
Notice that this model is also linear in its parameters, since the parameters enter the equation linearly.
The ‘correct’ functional form is not known at this stage and we therefore conduct a number
of diagnostic tests to see whether or not the chosen functional form is consistent with the data.
2 Data generation process (DGP) assumptions The next step in developing an econo-
metric model is to specify the data-generating assumptions. That is, we need to explain
how each observation on quantity demanded might have been generated. The basic idea
is the same as in the case of the two-variable model. We assume that for each set of
values of the regressors (P, P* and Y), we get a probability distribution for quantity
demanded. These probability distributions are generated as a result of influences of
random factors on quantity demanded, such as changes in tastes, fashion and the behaviour of buyers. The net influence of these factors is captured
by the disturbance term u. The data generation assumptions are then concerned with the
shape of each conditional probability distribution, the expected values of each distribu-
tion, and the variance of each distribution.
Following the standard assumptions developed previously, we assume that for each set of values
of the regressors (P, P*, Y), the associated disturbance term will be normally and independently
distributed with a zero mean and a constant variance, say σ2. Given these assumptions, an
econometric model of demand, based on a log-linear specification may be written as follows:
log QDt = β1 + β2 logPt + β3 logP*t + β4 logYt + ut    (2.12)
Thus, each observation on quantity demanded (or its log) can be written as the sum of two
components: a systematic component showing the expected level of demand for the product
at any given level of prices and incomes, so:
E(log QDt) = β1 + β2 logPt + β3 logP*t + β4 logYt    (2.13)
and a stochastic component, ut, showing the net influence of factors other than prices and
incomes on quantity demanded. These random factors, for example, changes in the behav-
iour of buyers, might generate deviations from the expected level of demand, causing the
level of demand to be less/more than expected levels. The disturbance term ut is added to the
expected level of demand to account for these random deviations. We can present the model
graphically, using a two-dimensional diagram, by allowing only one regressor (e.g. Pt) to
change and keeping the other two regressors constant.
Figure 2.5 A log-linear model of demand: the population regression equation E(log QDt) = β1 + β2 logPt + β3 logP*t + β4 logYt, with log QDt on the vertical axis and log Pt on the horizontal axis; at each price level (with log P*t and log Yt held fixed) the distribution of log QDt around the line has the same variance σ2.
To present a general linear econometric equation, we assume that there exists a linear
relationship between the dependent and independent variables (linearity in parameters).
Hence:
E(Yt) = β1X1t + β2X2t + . . . + βkXkt    (2.14)
With this specification, each observation on the dependent variable can be written as the sum of this systematic component and a random disturbance term:
Yt = β1X1t + β2X2t + . . . + βkXkt + ut    (2.15)
where subscript ‘t’ indicates the relevant observation at time ‘t’. To allow for an intercept
term, we allow the value of X1t to be unity, i.e.
Yt = β1 + β2X2t + β3X3t + . . . + βkXkt + ut    (2.16)
Alternatively, we can use the subscript of ‘i’ instead of ‘t’, i.e. (cross-section data)
Yi = β1 + β2X2i + β3X3i + . . . + βkXki + ui    (2.17)
This assumption may break down under any one or more of the following circumstances,
producing specification errors:
a Omission of the relevant regressors or inclusion of irrelevant regressors;
b Incorrect functional form;
c Changing parameters;
d Simultaneous relationships between dependent and independent variables.
2 Zero mean assumption The expected value of each distribution of the disturbance term
is zero, i.e. E(ut | X1t, X2t, . . .) = 0 for all ‘t’, which implies that the expected value of the
dependent variable is determined by the linear economic model, i.e.
E(Yt | X1t, X2t, . . ., Xkt) = β1 + β2X2t + . . . + βkXkt    (2.18)
The breakdown of this assumption occurs with incorrect specification of the economic
model, such as the omission of relevant regressors.
3 The homoscedasticity (constant variance) assumption The variance of each conditional
distribution of the disturbance term (or the dependent variable) is the same and equal to
an unknown constant variance (σ2) i.e.
var(ut | X1t, X2t, . . ., Xkt) = σ2 for all t    (2.19)
var(Yt | X1t, X2t, . . ., Xkt) = σ2 for all t    (2.20)
4 The non-autocorrelation assumption The disturbance terms are not correlated, so there
is no association between any pair of disturbance terms ‘t and s’, hence:
cov(ut, us) = 0 for all t ≠ s    (2.21)
This assumption implies that the values of the dependent variables are independent of each
other. The breakdown of this assumption is known as autocorrelation. It usually occurs in
time series analysis due to prolonged influence of random shocks which are captured by the
disturbance terms. Autocorrelation could also be due to specification errors known as dynamic misspecification. (Simple illustrative checks of this assumption and of assumptions 6 and 7 below are sketched after this list.)
5 Independent variables are non-random and are fixed in repeated sampling. In practice,
this assumption can fail due to:
a One or more of the independent variables being measured with random errors. This
phenomenon frequently occurs in empirical analysis and there is a need to check for
random errors by statistical tests.
b Some or all of the independent variables contained in the model may not be inde-
pendent of the dependent variable. This phenomenon occurs when dependent and
independent variables are jointly determined. In this situation, a single-equation
model is not an adequate presentation of the data generation process and the inde-
pendent variables may not be fixed in repeated sampling.
6 Lack of perfect multicollinearity assumption. Multicollinearity occurs when there exists
a near, or perfect, linear association between any two or more of the explanatory vari-
ables in the model. In this situation, it is not possible to measure the separate impact of
each independent variable on the dependent variable. Multicollinearity essentially
reflects problems with data sets rather than the model. It usually occurs when there is little variation in the values of the regressors over the sample period. To reduce the
extent of the multicollinearity we must ensure that there are sufficient variations in the
values of the regressors.
There is, however, always a certain degree of multicollinearity existing in
econometric models due to the nature of economic data. The stated assumption implies
that it is possible to measure the separate impact of each regressor on the dependent
variable.
7 The normality assumption. For each set of values of independent variables/ regressors,
the dependent variable/the disturbance terms are normally distributed. In certain
situations, especially where the data contains a significant number of outliers, the
assumption of normality fails and the normal procedures of model evaluation are no
longer valid.
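Simple checks of assumptions 4, 6 and 7 can be carried out with standard routines. The following sketch (in Python, with simulated data; the Durbin-Watson statistic, the regressor correlation matrix and the Jarque-Bera test are used here only as illustrative diagnostics, not an exhaustive set) fits a regression and inspects the residuals and regressors.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from scipy.stats import jarque_bera

rng = np.random.default_rng(2)
n = 200

# Simulated regressors; x3 is almost a linear function of x2 (near multicollinearity).
x2 = rng.normal(size=n)
x3 = 2.0 * x2 + rng.normal(scale=0.05, size=n)
x4 = rng.normal(size=n)
u = rng.normal(scale=1.0, size=n)                   # NID(0, sigma^2) disturbances
y = 1.0 + 0.5 * x2 + 0.3 * x4 + u                   # assumed data generation process

X = sm.add_constant(np.column_stack([x2, x3, x4]))
res = sm.OLS(y, X).fit()

# Assumption 4 (non-autocorrelation): a Durbin-Watson statistic near 2 suggests
# no first-order autocorrelation in the residuals.
print(durbin_watson(res.resid))

# Assumption 6 (no perfect multicollinearity): a correlation between two regressors
# close to 1 signals trouble.
print(np.corrcoef(np.column_stack([x2, x3, x4]), rowvar=False).round(3))

# Assumption 7 (normality): the Jarque-Bera test on the residuals; a small p-value
# would cast doubt on the normality assumption.
print(jarque_bera(res.resid))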
Yt = β1 + β2X2t + β3X3t + . . . + βkXkt + ut    (2.22)
ut ~ NID(0, σ2)    (2.23)
This model and its assumptions constitute the general form of a single-equation regression
model. This general specification of the regression model is sometimes referred to in the
literature as the classical normal linear regression model (CNLRM).
Review questions
1 Distinguish between a deterministic and a stochastic relationship. Explain why economic
models must be modified for use with economic data for the purpose of empirical
analysis.
INTRODUCTION
Having developed an econometric model, the next task of the econometrician is to estimate
the unknown parameters of the model. To this end, the econometrician must collect a data
set containing observations on the dependent and independent variables of the model, and
use the data for estimation purposes. Choosing the right method of estimation is critical and merits special attention. This chapter explains how a method of estimation is chosen and used in practice.
Key topics
• Estimators and point estimates
• The sampling distribution of estimators
• The ordinary least squares (OLS) method of estimation
• Monte Carlo studies
• The maximum likelihood (ML) method of estimation
Yt = β1 + β2X2t + β3X3t + . . . + βkXkt + ut    (3.1)
The model implies that each observation on the dependent variable Y, can be written as a
linear sum of ‘k’ independent variables (linear in parameters) and a disturbance term (u).
Moreover, the expected value of Y, for any given values of the independent variables, is
determined by the deterministic part of the model:
E(Yt) = β1 + β2X2t + β3X3t + . . . + βkXkt    (3.2)
Each slope parameter, β2, β3, . . . βk, measures the impact of a marginal change in the corre-
sponding independent variable on the expected value of Y, while the other regressors are
held constant. For example, β2 shows the impact of one unit (a marginal change) in X2 on the
expected value of Y, while X3 . . . Xk are held constant, i.e. β2 = ΔE(Yt)/ΔX2t, or in general, βj = ΔE(Yt)/ΔXjt for j = 2, . . ., k, with the other regressors held constant.
Given this interpretation, the slope parameters are sometimes called the partial regression
coefficients. These parameters, along with the intercept parameter, are unknown. The
first task of the empirical analysis is to quantify, or estimate, each of the unknown
parameters, on the basis of a set of observations on the dependent and independent variables.
For example, the estimated parameters for β1, β2, β3 and βk can be denoted as β̂1, β̂2,
β̂3 and β̂k.
These are point estimates of the unknown parameters and are obtained from a sample
of observations on the dependent variable and the independent variables. The actual
process by which the numerical values of these parameters are obtained is discussed
subsequently. At this stage it is sufficient to say that our aim is to generate estimates
for the unknown parameters from a sample of observations on the dependent and
independent variables. Recall that the expected value of the dependent variable is given by
the expression:
E(Yt) = β1 + β2X2t + β3X3t + . . . + βkXkt    (3.3)
Once the intercept term (β1) and partial regression coefficients (β2 . . . βk) are estimated, we
substitute for these estimates into Equation (3.3) to generate the expected value of the
dependent variable for any given values of the independent variables. The estimated
expected value of the dependent variable is denoted by Ŷt and can be expanded as
follows:
(3.4)
In practice, β̂1, . . ., β̂k are numerical values and to generate the estimated expected value of the
dependent variable we also need to substitute into Equation (3.4) the values of the indepen-
dent variable. For example, let us consider the following hypothetical example. Suppose
there are four parameters, and on the basis of a set of observations on the dependent and
independent variables, we obtain:
(3.5)
To generate a specific value for the Ŷt variable, we substitute a specific set of values for
the independent variables in Equation (3.5). For example, the estimated expected value of
the dependent variable (Ŷt ) when X2 = 100, X3 = 300 and X4 = 400 can be calculated as
follows:
(3.6)
Therefore, when X2 changes by one unit, Ŷt changes by two units in the same direction as
X2. Similarly:
The reliability of the parameter estimates depends crucially upon whether or not the econo-
metric/regression model and its assumptions are consistent with the data. In particular, only
when each of the assumptions is found to be consistent with the observed data, can we
consider the point estimate to be reliable. The second aim of an empirical investigation is
therefore to test to see whether each assumption is consistent with the data. These assump-
tions include those concerning model specification as well as data generation assumptions.
The testing procedures that are used in practice are based on a series of diagnostic tests
designed to confirm the model and its assumptions. Based upon the results of the diagnostic
tests, two situations can arise:
1 We may fail to falsify each one of the assumptions. In this case, the model is consistent
with the data and we proceed to test economic hypotheses implied by the economic
theory/model and use the model for empirical analysis.
2 One or more assumptions will not be consistent with the data. This is pretty much a
normal occurrence in practice. In this situation, on the basis of the empirical results
obtained, our task is to improve the econometric model. There is, therefore, a link
between economic theory and empirical analysis. Econometric analysis not only
provides empirical content to abstract economic models but also helps us to understand
complex economic relationships. This process means that we modify and improve
economic theories to achieve a better understanding of the data generation process and,
hence, economic issues.
3.1.1 Estimation
Given a multiple linear regression model of the form:
Yt = β1 + β2X2t + β3X3t + . . . + βkXkt + ut    (3.7)
The task is to estimate the unknown parameters β1, β2, β3 . . ., βk on the basis of a set of obser-
vations (such as those presented in Table 3.1) on dependent and independent variables (i.e.
Yt and X2t . . . Xkt). A general data set is presented in Table 3.1, as follows:
Table 3.1 A general data set
Y     X2     X3     X4    . . .   Xk
Y1    X21    X31    X41   . . .   Xk1
Y2    X22    X32    X42   . . .   Xk2
. .   . .    . .    . .   . . .   . .
Yn    X2n    X3n    X4n   . . .   Xkn
Each row depicts the observations obtained on the dependent and independent variables
(for example, the first row is the first set of collected observations on the dependent and
independent variables).
Suppose now that the data are collected through a hypothetical sampling procedure in which the values
of the regressors are fixed. If we repeat this procedure a large number of times, we obtain a
large number of data sets, each containing the same values of the regressors but different values for the
dependent variable, as in Table 3.2.
In Table 3.2, the values of the regressors are fixed at certain levels. Each block of columns shows
the observations on the dependent and independent variables obtained in one sample of this hypothetical sampling
procedure. The first row shows the first set of observations, the second row the second set of
observations and the final row the nth set of observations. The interesting point here is that
although the values of the regressors are kept fixed from sample to sample, as it is shown in
Table 3.2, we are likely to obtain different values for the dependent variable from sample to
sample due to the fact that the dependent variable contains the influence of the random
disturbance term. These values of the dependent variable are represented by Yij, where 'i' denotes
the observation number and 'j' denotes the sample number. For example, Y21 is the
second observation on Y in the first sample, and so on.
In this hypothetical experiment, if we substitute for each sample data in an estimator (say
β̂k – the estimator of βk) we get a large number of point estimates for βk. More specifically,
we can write:
β̂*k1 = the point estimate of βk obtained from the first sample (3.8)
β̂*k2 = the point estimate of βk obtained from the second sample (3.9)
. . .
β̂*km = the point estimate of βk obtained from the mth sample (3.10)
Table 3.2 A hypothetical repeated sampling procedure with the values of the regressors held fixed

Sample 1                              Sample 2                              . . .  Sample m
Y     X2     X3     X4   . . .  Xk    Y     X2     X3     X4   . . .  Xk           Y     X2   . . .  Xk
Y11   X*21   X*31   X*41 . . .  X*k1  Y12   X*21   X*31   X*41 . . .  X*k1          Y1m   X*21 . . .  X*k1
Y21   X*22   X*32   X*42 . . .  X*k2  Y22   X*22   X*32   X*42 . . .  X*k2          Y2m   X*22 . . .  X*k2
. . .                                 . . .                                         . . .
Yn1   X*2n   X*3n   X*4n . . .  X*kn  Yn2   X*2n   X*3n   X*4n . . .  X*kn          Ynm   X*2n . . .  X*kn
By arranging the point estimates into class intervals, it is possible to generate the sampling
distribution of the estimator of βk as follows.
By measuring the relative frequency on the vertical axis and the corresponding class interval
on the horizontal axis, we can obtain the histogram corresponding to this hypothetical
sampling procedure, and thus generate the sampling distribution of the estimator. Figure
3.1 represents one such sampling distribution. (Note: the mid-points of each bar of the histo-
gram are connected together to form the sampling distribution.)
An estimator whose sampling distribution has a number of desirable characteristics, given
the data generation assumptions, is considered 'optimal' for the purpose of estimating the
parameters of econometric/regression models. These desirable characteristics, or properties,
of the sampling distribution are discussed below.
[Figure 3.1 The sampling distribution of an estimator: relative frequencies F1, F2, F3, . . . plotted against class intervals of the point estimates β̂*k1, β̂*k2, β̂*k3, . . .]
1 The unbiasedness property An estimator is unbiased if the expected value of its sampling
distribution is equal to the true value of the unknown parameter, that is, E(β̂k) = βk (see
Figure 3.2a). If E(β̂*k) ≠ βk, the estimator is biased (see Figure 3.2b).
2 The minimum variance property Among unbiased estimators, it is desirable to select the
estimator whose sampling distribution has the smallest variance.
Such an estimator is called an efficient estimator. The standard deviation (i.e. the error
associated with the sampling distribution) of the minimum variance estimator is less
than any other unbiased estimator. Hence, most of the points generated by such an esti-
mator would cluster around the expected value of the distribution. Such an estimator
would be more reliable, that is, it is more likely to produce point estimates which are
close to the true values of the unknown parameters compared to other methods of esti-
mation, if one repeats the process of estimation many times. This is how a ‘good’ esti-
mator is normally selected for the estimation of the regression models.
3 The sufficiency property In addition to the efficiency property, it is desirable for an esti-
mator to make use of all sample data to generate point estimates. An estimator which
uses all observations on the dependent and independent variables to generate point esti-
mates is known as a sufficient estimator. All observations, including extreme values, are
used to generate point estimates and no information is discarded.
[Figure 3.2a Sampling distribution of an unbiased estimator, centred on E(β̂) = β]
In addition to these small sample properties, it is desirable to consider the asymptotic (large sample) properties of an
estimator when the sampling distribution of the estimator and other properties cannot be
determined due to the small size of the sample. In particular, the following asymptotic prop-
erties are desirable:
1 The consistency property The estimator is such that its asymptotic distribution becomes
concentrated on the true value of the parameter as sample size becomes extremely large.
So, as the sample size becomes larger and larger, the centre of the sampling distribution
of the estimator shifts towards the true value of the parameter. Moreover, the spread of
the distribution becomes smaller and smaller as sample size increases. In the limit, the
sampling distribution simply becomes a vertical line concentrated on the true value of
the parameter. This is shown in Figure 3.3.
2 Asymptotically efficient estimator An asymptotically efficient estimator is such that its
variance approaches zero faster than any other consistent estimator. In situations where
small sample properties of the sampling distribution of the estimator cannot be deter-
mined, the usual practice is to consider asymptotic properties, and select an estimator
which is asymptotically efficient. In practice, the properties of the sampling distribution
of the estimator and their asymptotic characteristics are determined using Monte Carlo
simulation/studies. We will explain this method later on in this chapter.
The OLS estimators can be obtained without reference to the normality assumption, whilst
ML estimators require the assumption of normality to hold. Both methods generate identical
estimators of parameters.
Yt = β1 + β2X2t + β3X3t + . . . + βkXkt + ut (3.11)
We aim here to generate efficient estimators of the unknown parameters β1 to βk so that each
parameter can be estimated given a set of observations on the dependent and independent
variables. We denote the estimators as follows:
β̂1 → the estimator of β1 → uses a set of observations on Y and X1 . . . Xk → generates β̂1 (a point estimate)
β̂2 → the estimator of β2 → uses a set of observations on Y and X1 . . . Xk → generates β̂2 (a point estimate)
. . .
β̂k → the estimator of βk → uses a set of observations on Y and X1 . . . Xk → generates β̂k (a point estimate)
The conditional expected value of the dependent variable is given by the following expression:
E(Yt | X2t, . . ., Xkt) = β1 + β2X2t + . . . + βkXkt (3.12)
and the estimated expected value is obtained by substitution of the point estimates into
Equation (3.12):
Ŷt = β̂1 + β̂2X2t + . . . + β̂kXkt (3.13)
where Ŷt is the estimated expected value of Y or simply fitted values. In the theoretical model
the difference between each individual value of the dependent variable and the expected
values of the dependent variable is known as the disturbance term, i.e.
ut = Yt − E(Yt) (3.14)
The empirical counterpart of the disturbance term (ut) is known as the residual and is usually
denoted as et. It can be obtained as follows:
et = Yt − Ŷt (3.15)
Each residual shows the difference between an observed value of Y and the estimated
expected value of Y and it is in this sense that a residual may be considered to be the empir-
ical counterpart/estimate of a disturbance term.
The OLS method makes use of the concept of the residuals to obtain parameter estimates.
In particular, under the OLS method, the parameter estimates are obtained such that the sum
of the squared residuals is minimized. Note that a residual can be positive or negative,
depending upon whether the estimated expected value of the dependent variable is
less than or greater than the corresponding observed value of Y. We can square each residual and then sum
up the squared residuals to obtain the residual sum of squares (RSS) as follows:
RSS = Σe²t (3.16)
where Σ is the summation operator. We can substitute for et into the above expression to
arrive at the following equation:
RSS = Σ(Yt − β̂1 − β̂2X2t − . . . − β̂kXkt)² (3.17)
Under the OLS method, β̂1 . . . β̂k are obtained in terms of observations of the dependent
and independent variables Yt, X1t . . . Xkt, such that the residual sum of squares is minimised,
i.e. we find the OLS estimators such that Σe²t = Σ(Yt − β̂1 − β̂2X2t − . . . − β̂kXkt)² is as small as possible. This proce-
dure yields the OLS estimators β̂1ols . . . β̂kols. Each estimator is a particular function of the
dependent and independent variables, yielding point estimates for each set of observations
on Y and X1 . . . Xk., i.e.
β̂1ols = f1(Y, X1, . . ., Xk), β̂2ols = f2(Y, X1, . . ., Xk), . . ., β̂kols = fk(Y, X1, . . ., Xk) (3.18)
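As an illustrative sketch (not the book's own derivation), the minimisation of the residual sum of squares can be carried out numerically; the example below uses NumPy's least squares routine on made-up data, so all variable names and values are hypothetical.

```python
import numpy as np

# Hypothetical data: n observations on Y and two regressors X2, X3 (plus the constant term)
rng = np.random.default_rng(0)
n = 50
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 5, n)
u = rng.normal(0, 1, n)
Y = 1.0 + 2.0 * X2 - 0.5 * X3 + u           # 'true' parameters used only to generate the data

# Design matrix: a column of ones for the intercept plus the regressors
X = np.column_stack([np.ones(n), X2, X3])

# OLS point estimates: the values that minimise the residual sum of squares
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

residuals = Y - X @ beta_hat                # e_t = Y_t - Y_hat_t
rss = np.sum(residuals ** 2)                # residual sum of squares, as in (3.16)
print(beta_hat)                             # point estimates of beta_1, beta_2, beta_3
print(rss)
```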
The exact mathematical formula of each estimator can be obtained by the matrix algebra and
minimisation procedures. In practice, one uses regression packages (e.g. Microfit 4/5 or
EViews) where the OLS method is routinely used to generate point estimates. The main
reason for the popularity of the OLS method is that under repeated sampling procedures, and
given the assumptions of the model, the sampling distribution of each OLS estimator has a
number of desirable/optimal properties. In particular, it can be shown that if all assumptions
of the model are consistent with the data then:
1 The OLS estimators are unbiased, i.e. the expected value of the sampling distribution of
an OLS estimator is equal to the unknown parameter. Symbolically:
E(β̂jols) = βj for j = 1, 2, . . ., k (3.19)
This is a conditional property, dependent upon the validity of the zero mean assumption,
which in turn depends upon the model having been specified correctly.
In particular, there must be no omitted variables in the regression model: omitted
variables give rise to biased and therefore unreliable OLS estimators.
2 The OLS estimators have minimum variance. This sampling property of the OLS esti-
mators is conditional on the validity of the assumption of non-autocorrelation and
homoscedasticity. Symbolically:
Var(β̂jols) ≤ Var(β̂*j) for any other linear unbiased estimator β̂*j of βj (3.20)
If all the data generation assumptions explained above are satisfied, it can be shown that among
all linear unbiased estimators, the OLS estimators have the smallest variance. That is, they
are BLUE. The mathematical proof of this statement is beyond the scope of this introductory
text. The proof of this proposition is provided by the Gauss-Markov theorem, which provides
formal justification for the use of the OLS methods of estimation. Note that the normality
condition is not required for the Gauss-Markov theorem to hold.
Given properties (1) and (2), the OLS estimators are said to be efficient. However,
it should be noted that the efficiency property depends crucially on the validity of the model's
assumptions. If any of the assumptions fail, then the OLS method is no longer efficient and
another method of estimation would need to be found. It is therefore important that we verify
that each assumption is consistent with the data before relying on the OLS results.
[Figure 3.4 Sampling distribution of an OLS estimator: f(β̂k) is centred on E(β̂k) = βk, with variance Var(β̂k) = E[β̂k − E(β̂k)]²]
The properties of the sampling distributions of estimators are typically examined by means of a Monte Carlo
study: a controlled experiment in which the investigator knows exactly how the
data is generated. Within this controlled experimental environment, a practitioner sets the
values of the unknown parameters from the outset to see how well a particular method of
estimation can track values. This is in fact the reverse of the estimation procedure. Within
the framework of a Monte Carlo study, the values of the parameters are known from the start.
However, what is not known is how closely a particular method of estimation can reproduce
these known values.
In what follows, we demonstrate the steps undertaken in a Monte Carlo study with refer-
ence to the OLS estimators.
1 Set specific values for the parameters of the model to begin the process. For example, in
a simple two-variables regression model:
Yt = β1 + β2Xt + ut
Set: β1 = 0.5
β2 = 0.8
2 Fix a set of values for the independent variable Xt; these values are kept the same in each repetition of the experiment.
3 Generate some random numbers for ut from a normal distribution with a mean of zero
and a variance of unity. In practice, random numbers are usually generated by computer
software.
4 Generate the values of the dependent variable by substituting the chosen parameter values, the fixed values of Xt and the generated disturbances into the model, i.e. Yt = 0.5 + 0.8Xt + ut.
5 Choose a particular estimation method, for example, OLS. Use the values of Xt and Yt
(generated in step 4) to generate point estimates for the parameters of the model.
For example, using the OLS method with a regression package, the point estimates
for β1 and β2 are generated via two OLS formulae given below:
β̂2 = Σ(Xt − X̄)(Yt − Ȳ) / Σ(Xt − X̄)²
β̂1 = Ȳ − β̂2X̄
where X̄ is the sample mean of observations on X and Ȳ is the sample mean of generated
data for Y.
Note: the above OLS formulae are derived from the minimisation of the residual sum
of squares. Unlike conventional texts, we have deliberately not used unnecessary maths
and numerical examples in this chapter to derive these estimators. In our view, this serves
no useful purpose. In practice, one almost certainly uses an appropriate regression
package, which produces the point estimates in a matter of seconds.
6 Compare the point estimates with the set parameter values. Are these closely matched?
7 Repeat this controlled experiment from step 3 many times to generate ‘sampling’ distri-
butions for the OLS estimators (i.e. β̂1 and β̂2).
8 Check the following properties of the ‘sampling’ distributions:
a Shape: are the distributions symmetrical around their central values?
b Expected value: are the expected values/central values close to or the same as the set
parameter values? That is, is E(β̂1) close to 0.5 and E(β̂2) close to 0.8?
Using a Monte Carlo study such as this, it can be shown that OLS estimators are: unbiased,
possess minimum variance, and are efficient and consistent. However, these properties depend
crucially on the generation of random data from a normal distribution with a mean of zero and
a constant variance. Through a Monte Carlo simulation experiment, econometricians have
been able to investigate the impact of a change in any one of the data generation
assumptions on the behaviour of the estimators. This type of analysis enables investigators
to select a ‘best’ estimation method to fit the type of data available in different situations.
3.5 Maximum likelihood (ML) estimators
Maximum likelihood (ML) estimators are frequently employed by practitioners for the esti-
mation of single equations and, particularly, systems of equations. The intuitive idea under-
lying the ML methodology is quite appealing: estimators are derived such that the likelihood/
probability of obtaining all data points on the dependent variable is the maximum.
For example:
Yt = β1 + β2Xt + β3Zt + ut
The ML estimators of β1, β2 and β3 are those values which maximise the likelihood of obtaining the observed sample
observations on the dependent variable Yt.
To derive ML estimators we need the normality assumption to hold. Given this assumption,
the ML estimators are derived from the joint probability density function of the dependent
variable in the model. The mathematical derivation of these estimators, in our view, is
beyond the scope of this introductory text. However, in the subsequent chapters we will
revisit the ML estimators and show their applications in practice. The key point is that the
ML estimators are in fact the same as OLS estimators. In this respect, the OLS estimators are
also ML estimators. That is, provided the normality assumption holds, the OLS estimators
generate the highest probability of obtaining any sample observations on the dependent vari-
able. This is yet another theoretical justification for the popularity of the OLS estimators.
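A quick numerical check of this equivalence (not part of the original text) is to maximise the normal log-likelihood directly and compare the result with the OLS estimates; the sketch below does this on simulated data, so all names and numbers are purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(0, 1, n)

# OLS estimates
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Negative of the normal log-likelihood, as a function of (beta, log sigma)
def neg_loglik(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    resid = Y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum(resid**2) / (2 * sigma**2)

start = np.zeros(X.shape[1] + 1)
beta_ml = minimize(neg_loglik, start, method="BFGS").x[:-1]

print(beta_ols)   # OLS point estimates
print(beta_ml)    # ML point estimates -- numerically the same under normality
```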
Review questions
1 Suggest a regression model for the estimation of each one of the variables listed below:
a demand for tourism in a particular country
b aggregate investment expenditure in an economy.
Explain how you would measure each variable included in your model. Explain the role
of any disturbance term included in your model and outline and discuss the standard
assumptions concerning this term. Why are these assumptions necessary?
Collect a sample of observations on the variables included in your model (use an
appropriate secondary source of data), and use a regression package (e.g. Microfit) to
estimate the parameters of your regression model. Explain the meaning of the point
estimates reported by the regression package.
2 Discuss the role of empirical analysis in economics.
3 Explain what you understand by the sampling distribution of an estimator. How is the
sampling distribution used in the search for a ‘good’ estimator?
4 Explain the ordinary least squares (OLS) method. What do you understand by the state-
ment that OLS estimators are ‘BLUE’?
5 Explain what you understand by a Monte Carlo study. Devise your own Monte Carlo
experiment to check the properties of the sampling distributions of the OLS estimators.
6 Explain what you understand by each of the following:
a an unbiased estimator
b minimum variance property
c standard deviation/standard error of an estimator
d efficiency property
e sufficiency property
7 Explain the meaning of each one of the following:
a consistent estimator
INTRODUCTION
In the previous chapters we developed the theoretical arguments underlying linear regres-
sion models. In this chapter and the next we put these ideas into practice by estimating and
empirically evaluating a multiple linear regression model. In practice, the computations are
carried out by the regression packages (Microfit 4/5 and EViews), which are widely avail-
able. These packages generate various diagnostic test results, which are used to evaluate the
regression models in practice. We consider a number of key diagnostic tests, including those
used for the detection of autocorrelation, heteroscedasticity and specification errors. Unlike
conventional texts, which have traditionally dealt with these issues in several separate and
disjointed chapters, and therefore give the impression to the reader that diagnostic testing
and model evaluation in practice is also carried out in a similar disjointed fashion, this text
takes a modern approach to empirical evaluation of the regression models, dealing with
diagnostic testing procedures collectively, but in a step by step fashion. To this end, we have
devoted two key chapters to the explanation of the procedures used for model evaluation in
practice. This chapter deals with modelling, data requirement, estimation, criteria for model
evaluation and some basic statistical procedures using a single-equation model of competi-
tive imports to illustrate the basic idea and procedures. In the next chapter we deal with key
diagnostic testing procedures used collectively in practice, to empirically evaluate single-
equation regression models.
Key topics
• Criteria for empirical evaluation of regression models
• Hypotheses testing procedures and statistical tests of significance
that each macro component of final expenditure may have a different effect on imports
(see Abbott and Seddighi, 1997). In this chapter we follow the simpler model to assist
exposition.
A second variable influencing the demand for imports, ceteris paribus, is the price of close
substitutes. In particular, a rise in relative price levels is expected to lead to a fall in demand
for imports. In most studies of import functions it is assumed that supply elasticities are
infinite and therefore import prices are taken as being determined outside the theoretical
model. Moreover, it is usually assumed that the domestic prices are flexible and change to
eliminate excess demand. Under these conditions, import prices and the domestic price
levels are determined outside the model through the interaction of world demand and
supply. Moreover, the assumption of infinite supply elasticities means that income distribution is
unaffected. The third factor is the capacity of the country to produce and supply the goods
domestically. However, the capacity factor is essentially a short-run phenomenon and is
relevant only if excess demand at home cannot be eliminated by a change in domestic prices
(Thirlwall, 1991).
Given this reasoning, the import demand function may be written symbolically as
follows:
Mt = α1 + α2GDPt + α3(PM/PD)t (4.2)
In this specification, α1 is the intercept term, and α2 and α3 are the respective slope
parameters.
In a log-linear specification the log values (expressed as natural logs) of the variables are connected to each
other in a linear fashion, as follows:
log Mt = β1 + β2 log GDPt + β3 log (PM/PD)t (4.3)
The log-linear specification has a number of attractive properties:
1 In log-linear specifications the slope parameters are the partial elasticities (a short derivation is given after this list). Thus, β2 and
β3 represent the partial elasticities of imports with respect to GDP and the relative price
term, respectively. The income elasticity and the elasticity of imports with respect to rela-
tive prices are therefore estimated directly.
2 It is easy to interpret estimated slope parameters as these show percentage changes.
Moreover, the units by which the variables are measured do not influence the magni-
tudes of estimated coefficients, unlike linear specifications.
3 Log transformations of variables reduce the variability in data. This potentially reduces
the likelihood of heteroscedasticity.
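As a brief supporting derivation (a standard result, stated here in the notation of Equation (4.3)), differentiating the log-linear specification shows why the slope coefficients are partial elasticities:

```latex
% Partial elasticity of imports with respect to GDP in the log-linear model
% \log M_t = \beta_1 + \beta_2 \log GDP_t + \beta_3 \log (PM/PD)_t
\frac{\partial \log M_t}{\partial \log GDP_t} = \beta_2
  = \frac{\partial M_t / M_t}{\partial GDP_t / GDP_t}
  \approx \frac{\% \text{ change in } M}{\% \text{ change in } GDP}.
```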
Given these properties, we also make use of a log-linear model to estimate partial elasticities
of demand for imports with respect to income and relative prices, as follows:
log Mt = β1 + β2 log GDPt + β3 log (PM/PD)t + ut (4.4)
where ut is a random disturbance added to the equation to capture the impact of all other
variables omitted from the regression model. These influences include random variables such
as sudden changes in taste, political upheaval, natural disasters, etc. The equation suggests that
each observation obtained on the log of imports is made up of two components, namely: (i) the
'average' expected value of imports at any specific level of GDP and PM/PD, given by:
E(log Mt) = β1 + β2 log GDPt + β3 log (PM/PD)t (4.5)
and (ii) a term showing the deviations from the expected value given by ut. Moreover, the
expected value of imports is determined by variables suggested by economic theory. The
deviations from the average are essentially due to random factors. We start the process by
making some standard assumptions concerning how data are generated. These assumptions
concern the distribution of the disturbance term or, equivalently, the probability distribution
of the dependent variable, competitive imports. This may be presented symbolically as:
ut ~ NID(0, σ²) (4.6)
Hence, the disturbance term is normally and independently distributed around a mean of
zero with a constant variance. The regression model may therefore be written as follows:
log Mt = β1 + β2 log GDPt + β3 log (PM/PD)t + ut, ut ~ NID(0, σ²) (4.7)
M = volume of imports. This is the total value of imports at constant prices;
imports in a given year are valued at the prices of a particular base year.
PM = a price index of imported goods with prices measured in domestic currency. This
is typically calculated as:
Estimation
The model was estimated by the ordinary least squares method using Microfit 4/5. Under the
OLS method, the parameters are estimated so that the residual sum of squares is minimised.
The reason for using the OLS method is that the OLS estimators are BLUE, provided that all
data generation assumptions are in fact valid. We need to test the validity of these assump-
tions to ensure this is the case.
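The book's estimation is carried out in Microfit 4/5; as a sketch, the same OLS estimation could be reproduced in Python's statsmodels. Since the original UK data set is not reproduced here, the series below are simulated and every number is purely illustrative.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical quarterly data standing in for the UK import demand data set
rng = np.random.default_rng(7)
n = 68                                   # e.g. 1980 Q1 to 1996 Q4
log_gdp = np.linspace(11.0, 11.6, n) + rng.normal(0, 0.02, n)
log_relprice = rng.normal(0.0, 0.05, n)  # log(PM/PD)
u = rng.normal(0, 0.03, n)
log_m = -14.0 + 2.0 * log_gdp - 0.9 * log_relprice + u

# OLS estimation of: log M = b1 + b2*log GDP + b3*log(PM/PD) + u
X = sm.add_constant(np.column_stack([log_gdp, log_relprice]))
results = sm.OLS(log_m, X).fit()
print(results.params)      # point estimates of b1, b2, b3
print(results.rsquared)    # R-squared
print(results.bse)         # estimated standard errors
```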
The Microfit 4/5 regression package produces a set of diagnostic tests designed to test the
adequacy of the model. In what follows, these are discussed in a step by step fashion.
The main summary statistics reported for the estimated equation are:
R2 = 0.98109
DW = 0.55343
SE of regression = 0.035263
This is a popular way of presenting regression results, with estimated parameters being
the coefficients attached to each corresponding variable, estimated standard errors written
below corresponding coefficients, followed by R2 (coefficient of determination), the
F-statistic, the Durbin-Watson (DW) test statistic and, finally, the standard error of regres-
sion. Each of these computer-generated numbers has implications for the adequacy of the
model and is used in the evaluation of the regression results. Before we consider these in
detail, we first discuss the types of criteria used in practice to empirically evaluate regression
results.
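To see how these reported numbers relate to the estimated residuals, the following sketch computes R2, the Durbin-Watson statistic and the standard error of regression from observed and fitted values; it is a generic illustration, not a reproduction of the Microfit output.

```python
import numpy as np

def regression_summary(y, fitted, k):
    """R-squared, Durbin-Watson statistic and standard error of regression.

    y      : observed values of the dependent variable
    fitted : fitted values from the estimated model
    k      : number of estimated parameters (including the intercept)
    """
    e = y - fitted                                   # residuals
    rss = np.sum(e ** 2)                             # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)                # total sum of squares
    r2 = 1.0 - rss / tss
    dw = np.sum(np.diff(e) ** 2) / rss               # Durbin-Watson statistic
    se_regression = np.sqrt(rss / (len(y) - k))      # standard error of regression
    return r2, dw, se_regression
```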
The economic criteria concern the signs and magnitudes of the estimated coefficients, judged against
theoretical arguments. Applying these criteria to the estimated model of imports, we observe
that the partial elasticity of imports with respect to GDP/income is estimated as 2.025. The
sign is positive and consistent with economic theory. The magnitude suggests that for each
1% change in the level of GDP, with relative prices constant, we expect the volume of imports,
on average, to change by about 2%. In other words, import demand is highly elastic with
respect to income. Economic theory does not provide us with guidance concerning the magni-
tude of this parameter. It is advisable to check the magnitude of estimated parameters against
other published work. The price elasticity of demand is estimated at –0.905. The sign is nega-
tive and consistent with the predictions of economic theory. The size of the coefficient
suggests that, for each 1% rise in the prices of imported goods relative to domes-
tically produced goods, imports are expected to fall by only about 0.9%. This value suggests
that the demand for imports could be close to unit elastic. We need to consider this seriously when
conducting further analysis. The economic criteria appear to be satisfied. This is only one
yardstick of the evaluation and we need further analysis. If, on the other hand, the sign and the
magnitude of coefficients were found to be contrary to what is suggested by the economic
theory, we should resolve this problem before going further. This, for example, could be done
by checking the adequacy of the data set and the regression model.
[Figure 4.1 The population regression line for the log-linear import demand model: at each level of log GDP, the values of log M are distributed around the expected value given by the line with constant variance σ², and the line cuts the vertical axis at the intercept term]
Figure 4.1 shows the population regression line corresponding to the log-linear model of
import function. At each level of GDP we expect to find a range of demand levels for imports,
due to the influence of factors other than GDP on demand. Some of these could be random.
The distribution of import demand levels is symmetrical around the expected values given
by the model. Note that each observation has two components: a deviation from the expected
value captured by the disturbance term and the expected value of imports, given by the
model. The aim is to estimate the population regression line. To do this we select a sample
of data values from the population data set and use the OLS method to estimate the line. The
sample data set is shown in Figure 4.2.
Remember that the population regression line and the disturbance terms are non-
observable; all we have is a set of observations on the dependent and independent variables
of the model. These observations are first converted into the OLS estimates of the unknown
parameters, giving rise to a sample regression line, such as the one depicted in Figure 4.2.
The vertical distances between observed values and the estimated line are the OLS residuals.
These may be thought of as being the empirical counterpart of the associated disturbance
terms. The smaller the residuals, the better would be the fit of the estimated line to the data
points and the sample regression line would be a better ‘fit’ to the unknown population
regression line.
To find a measure of the ‘goodness of fit’ based on these observations, we notice that
variations in the value of the dependent variable, from one observation to the next, are due to
changes in either the values of the independent variables/regressors, or the influence of the
disturbance term, or both. If we can separate these two influences, finding the percentage of
the sample variations in the imports (or logs of imports) which can be explained by variation
[Figure 4.2 The sample regression line for the log-linear import demand model: the vertical distances e1, e2, e3, e4, . . . between the observed values of log M and the fitted line are the OLS residuals]
in the regressors, then the higher this percentage is, the better should be the fit of the
sample regression line to the sample data. Therefore, it would be closer to the population
regression line. The total sample variation in the dependent variable is defined as the sum of
the squared deviations of each observation from the sample mean value of the dependent
variable. That is:
total sum of squares (TSS) = the sum of squared deviations of each observation on the dependent variable from its sample mean
or, symbolically:
TSS = Σ(Yt − Ȳ)² (4.9)
It can be shown, given model’s assumptions, that this sum can be divided into the sum of two
squared components, as follows:
Total sum of squares = sample variation due to changes in the regressors
                     + sample variation due to changes in factors other than the regressors
                     = explained sum of squares (ESS) + residual sum of squares (RSS) (4.10)
or, alternatively:
Σ(Yt − Ȳ)² = Σ(Ŷt − Ȳ)² + Σe²t (4.11)
The coefficient of determination, R², is defined as the proportion of the total sum of squares explained by the regressors, that is, R² = ESS/TSS = 1 − RSS/TSS.
Multiplying the R2 value by 100% gives the percentage of sample variations in the depen-
dent variable which can be explained by joint sample variations in the independent variables
of the model. R2 takes only positive values between zero and one. It shows the extent of
linear association between the dependent and all the independent variables of the model.
Note that the coefficient of determination is only a measure of the ‘goodness of fit’ of the
linear relationship and in the case of non-linear relationships, the R2 value could be zero,
even if there is a perfect non-linear relationship between variables. Moreover, the value of
R2 is highly sensitive to the validity of assumptions. In particular, when autocorrelation/
heteroscedasticity are present, the R2 value is highly unreliable. In practice, one has to be
sure that all assumptions are consistent with the data before seriously considering the value
of R2. In the above regression, the value of R2 is calculated as 0.98109. That is, it appears
that over 98% of sample variations in the dependent variable over the period 1980 Q1 to
1996 Q4 can be explained by the regressors, log GDP and log PM/PD. Only about 2% of the total
variation appears to be due to changes in the factors captured by the disturbance term. The
model appears to fit the data very well. We must not put too much emphasis on this value at
this stage since the validity of the assumptions has to be tested. Occasionally R2 is adjusted
for the degrees of freedom of the RSS and the TSS to yield R̄2, the adjusted R2. This is
shown below:
R̄2 = 1 − [RSS/(n − k)] / [TSS/(n − 1)]
or
R̄2 = 1 − (1 − R2)(n − 1)/(n − k)
4.3.4 The adjusted coefficient of determination R̄2
This is used mainly for comparing regression models to which new variables are added. R2
will always increase if a new explanatory variable is added to a regression model, even if the
new variable is useless in explaining the dependent variable. R̄2 will normally fall, however,
if the new variable does not belong to the regression model. In this situation, the adjusted
coefficient of determination penalises the inclusion of unnecessary regressors in the model,
occasionally carried out incorrectly to increase R2!
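A small numerical illustration of the adjustment, using the formula above with hypothetical values rather than the book's data:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: penalises additional regressors (k includes the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Hypothetical comparison: adding a useless regressor nudges R-squared up slightly,
# but the adjusted R-squared falls
print(adjusted_r2(0.981, n=68, k=3))    # about 0.9804
print(adjusted_r2(0.9811, n=68, k=4))   # about 0.9802 -- lower despite the higher R-squared
```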
(4.12)
The dependent variable (log Mt) is assumed, on theoretical grounds, to be dependent on two
independent variables/regressors, GDP and log PM/PD, respectively. The question is: is it
possible to provide some statistical support for the inclusion of each variable in the model?
If the answer to this question is positive, and provided that all assumptions are valid, statis-
tical tests provide underpinning for the regression model.
The first step in performing statistical tests of hypotheses is to state the null and alternative
hypotheses. In the tests of significance, the null hypothesis is the statement that contrary to
the suggestion of economic theory, there is no relationship between the dependent and each
one of the independent variables. In other words, the economic model is false. The model is
treated as being false, unless contradictory evidence is found through statistical procedures.
This is very much like trial by jury. The person is treated as innocent unless evidence is
found to ‘contradict’ the person’s presumed innocence. This type of procedure may be called
the process of falsification. The test is designed to falsify the economic relationship.
We now state the null and alternative hypotheses concerning the economic model in hand.
The null hypothesis is usually denoted as H0 and the alternative H1, and may be stated as:
Notice that the objective of the test is to test the null hypothesis against the alternative H1.
There is an alternative way of presenting the null hypothesis; in particular, it can be presented
in terms of parameters of the model. More specifically, we notice that parameter β2 links the
dependent variable of the model to the independent variable GDP. Moreover, the value of
this parameter is unknown. If the value of this parameter is zero, it is then implied that there
is no relationship between the dependent variable and the GDP.
The null hypothesis may therefore be written as H0: β2 = 0.
The alternative hypothesis can be written more specifically. In particular, in forming the
alternative hypothesis we should make use of information that we have concerning the value
of this parameter. According to theory, the relationship between imports and GDP is positive
in nature. Hence, movements in imports are in the same direction as those of GDP. Making
use of this information, the alternative hypothesis may be written as H1: β2 > 0. We therefore
have:
H0: β2=0
H1: β2>0
or, alternatively:
H0: β2 ≤0
H1: β2>0
Given the assumptions of the model, the sampling distribution of the OLS estimator β̂2, f(β̂2), is a normal distribution centred on E(β̂2) = β2, with variance Var(β̂2), as shown in Figure 4.3.
[Figure 4.3 The sampling distribution of β̂2, with two specific values β5 and β6 marked on the horizontal axis]
This distribution can be used for finding the probability that the estimator lies between any
two specific values. For example, if one wishes to find the probability of the estimator to lie
between β5 and β6 (any two specific values), we need to find the area under the normal curve
between these two values, as shown in Figure 4.3. Mathematically, the area is found by finding
the integral of the normal distribution, between β5 and β6. An important property of the normal
distribution is that the value of the area under the curve between any two points depends only
on the distance of each point from the mean of the distribution where each distance is expressed
per unit of standard error, rather than the mean and the variance of the distribution. In other
words, for all values of mean and variance of a normal distribution, as long as the deviations
of the points from the mean per unit of standard error are the same, the probabilities are also
the same. Making use of this property to calculate the probability that an estimator, or a
random variable, lies between any two points, we use the standard normal distribution. The
standard normal distribution is centred around a mean of zero and has a variance of unity.
Statisticians have calculated these probabilities for various values of random variables and
tabulated these in the table of standard normal distribution (see Appendix). Any normal distri-
bution can be converted to the standard normal distribution, by expressing the values of
the random variable in terms of deviation from the mean of the distribution per unit of the
standard error. This standard normal distribution is commonly known as the Z-distribution.
For example, one can convert the distribution of β̂2 into a standard normal (Z-distribution) as
follows:
Z = (β̂2 − β2) / SE(β̂2) (4.13)
or, diagrammatically:
[Figure 4.4 A standard normal distribution, centred on a mean of zero with a variance of unity]
A specific value of Z, under H0, obtained from (4.13) is called a Z-score. The probability that
β̂2 (OLS estimator) lies between any two specific Z-scores can be calculated by converting
the units of distribution into the units of standard normal distribution and then finding the
area under the standard normal curve, from the tabulated table of the Z-distribution. An interval
constructed in this way forms the basis of a confidence interval for the estimator.
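As an aside (an illustrative calculation, not taken from the text), such probabilities can be obtained from the cumulative distribution function of the standard normal rather than from the printed tables:

```python
from scipy.stats import norm

z_low, z_high = 0.5, 1.96          # two arbitrary Z-scores
prob = norm.cdf(z_high) - norm.cdf(z_low)
print(prob)                        # area under the standard normal curve between them (about 0.28)

# One-tailed probability of the kind used in significance testing: Pr(Z >= 1.645)
print(norm.sf(1.645))              # about 0.05
```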
H0: β2 ≤ 0
H1: β2 > 0
(4.14)
(4.15)
If H0 is false, then β2 takes positive values, so the Z-score calculated from the sample, Z1, tends to be
large; under repeated sampling, Z-scores would then be less than Z1 quite frequently.
Therefore, if H0 is false, the probability that a Z-score would be greater than, or equal to, Z1
under repeated sampling would be small, say less than 5%, as in Figure 4.5.
In Figure 4.5, the shaded area shows Pr (Z ≥ Z1). If H0 is false, this probability would be
small. The convention is to set this probability from the outset at some predetermined value,
usually 1, 5, or 10%. These are called the levels of significance of the test. The practitioner
then calculates Pr (Z ≥ Z1). If this probability is less than 5%, say, one would be inclined
to reject the null hypothesis.
[Figure 4.5 Hypothesis testing using the standard normal distribution: the shaded area in the right tail shows Pr (Z ≥ Z1)]
This approach to hypothesis testing is called the p-value approach/method. The p-values
are usually reported by regression packages. In practice, we do not know the standard error
of the estimator, SE(β̂2). We therefore need to replace SE(β̂2) with an estimate, SÊ(β̂2), in the
Z-score formula. Doing this, however, changes the shape of the normal distribution. The
resulting distribution is called Student’s t-distribution. It is centred on a mean of zero;
however, it is slightly flatter than the standard normal distribution, as the variance of the
distribution is slightly more than unity, for small numbers of observations, e.g. n ≤ 30.
However, as the number of observations increases to more than 30 and beyond, t-distribution
converges to the standard normal/Z-distribution and the two distributions are not distin-
guishable for large samples. The t-distribution is therefore a small sample distribution
closely related to the Z-distribution.
In practice we look at the p-values. If any one of these is less than the 5% level of signifi-
cance, we can reject H0, concluding that the independent variable in question is statisti-
cally significant.
H0: β2≤0
H1: β2 >0
The p-value reported by the regression package is 0.032. In other words, if H0 is true, there
is only a 3.2% chance that a test statistic as large as the one obtained would
occur under repeated sampling. Using the conventional 5% level of significance, since the
p-value is less than the chosen level of significance, we reject H0. This result is, however,
only reliable if all assumptions, including those concerning the disturbance term are consis-
tent with the data. We therefore need to confirm all assumptions before arriving at an appro-
priate conclusion.
4.4.3 An alternative method based on critical values
There is an alternative way of conducting a test of significance using the table of the
t-distribution. Under this method, once the level of significance of the test is set at, say
5%, practitioners then use tables of the t-distribution to find the so called critical value of the
test. This is a particular value of the test statistic for which the probability that the t-ratio is
equal to, or greater than that value, is 5%:
Pr (t≥c.v) = 0.05
[Figure 4.6 Significance testing and the critical region: the shaded right tail of the t-distribution beyond the critical value C.V. = t0.05(n − k) = t0.05(68 − 3 = 65) = 1.67]
The shaded area is called the critical region of the test, because if the t-ratio falls in this
region, we reject H0 at a 5% level of significance. Notice that the method is very similar to
the p-value approach. Under the p-value approach, we use the reported probability for
rejecting/not rejecting H0, whereas under this method, we use the critical value and critical
region of the test to arrive at a decision. Both imply the same conclusions. We now perform
the t-test under the alternative method.
Under H0:
(4.16)
With the critical value of the test being 1.67 (from the table of t-distribution (see Appendix)),
we reject H0, concluding that β2 is significantly different from zero and the log GDP is a
significant regressor.
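The critical value and p-value used here can be checked with any statistical library; the following is an illustrative check using SciPy with n − k = 65 degrees of freedom (the t-ratio of 2.3 is a made-up value for illustration):

```python
from scipy.stats import t

df = 68 - 3                         # n - k = 65 degrees of freedom

# One-tailed 5% critical value of the t-distribution (about 1.67, as used above)
print(t.ppf(0.95, df))

# One-tailed p-value for a hypothetical calculated t-ratio of, say, 2.3
print(t.sf(2.3, df))                # about 0.012, which would lead to rejection of H0
```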
1 The p-value approach The p-value for the test is calculated at 0.0923, which is greater
than 0.05. Therefore, we do not reject H0, concluding that there is no statistical support
for the inclusion of PM/PD as a regressor in the model. This is an unexpected result.
However, as was pointed out, the t-test is only reliable if all assumptions are valid. We
cannot, therefore, express a firm opinion on the results, at this stage.
2 The critical value approach Under H0, the t-value is calculated as:
(4.17)
[Figure 4.7 Conventional significance testing: the critical value −1.67 and the calculated value −0.12213 marked on the horizontal axis of the t-distribution]
Decision rule:
Because the calculated t-value does not fall in the critical region, we cannot reject H0 at a 5% level of significance, concluding that the relative price term is
not statistically significant. Notice that because, under H1, β3 takes only negative values, the appro-
priate critical region of the test is in the left tail of the distribution, the shaded area shown in Figure 4.7.
H0: β1 = 0
H1: β1 ≠ 0
Note that, in the absence of any information concerning the value of the intercept term, the
alternative hypothesis is stated such that β1 can take either positive or negative values.
Under H0, the t-ratio is calculated as:
(4.18)
Given the way H1 is formulated, we need to consider both positive and negative values
of the t-distribution, using a 5% level of significance and dividing it equally between
the two tails of the distribution; the critical values of the test are ±2.0, as shown in Figure 4.8
below.
[Figure 4.8 Two-tailed significance testing using the t-distribution: 2.5% of the area in each tail, with critical values −t0.025(65) = −2.0 and t0.025(65) = 2.0]
A decision rule:
Because the calculated t-ratio falls in the critical region, we reject H0 at the 5% level of significance, concluding that the
intercept term is statistically significant and is different from zero.
H0: income elasticity of demand for imports = price elasticity of demand for imports
H1: H0 is not true
or, in terms of parameters of the model:
H0: β2 = β3
H1: H0 is not true
In this formulation, the null hypothesis imposes a linear restriction on parameters in the
model. This type of test where one, or a number of linear restrictions, are imposed on
parameters, is very popular in practice. How is such a test conducted? We use the example
below to give a basic insight into the procedure. In this example we have one linear restriction
on the parameters of the model. We now impose this restriction. The restricted model can be
written as:
log Mt = β1 + β2 log GDPt + β2 log (PM/PD)t + ut (4.19)
or
log Mt = β1 + β2 [log GDPt + log (PM/PD)t] + ut (4.20)
We estimate the restricted model by OLS to obtain the R2 and the associated residual sum of
the squares (RSS). Now if the restriction is valid, what would be the relation between the R2
or RSS of the restricted and the unrestricted (original model)? We expect, in this situation,
these two sets of values to be very similar, as both models should fit the data equally well.
Any observed differences are due to estimation procedures.
Symbolically:
RR2 ≈ RU2 (4.21)
or that the observed difference between the restricted and unrestricted residual sums of
squares is small; that is, RSSR − RSSU is close to zero.
To be able to conduct a statistical test, however, what we need is a test statistic which utilises
the difference between the two residual sum of squares. Assuming that the assumptions
concerning the data generation process are met, under repeated sampling procedures, the
difference between the two residual sum of squares divided by the number of restrictions on
the parameters follows a chi-squared distribution. The chi-squared distribution starts at
the origin and is defined only for positive values of a random variable. The shape of a
chi-squared distribution changes with sample size. As sample size increases, this distribution
becomes more symmetrical, approaching a normal distribution shape as sample size approaches
infinity.
[Figure 4.9 A chi-squared distribution for (RSSR − RSSU)/d]
(RSSR − RSSU)/d ~ χ2(d) (4.22)
Similarly, if the assumptions are valid, under the repeated sampling exercise, RSSU/(n − k) also follows a
chi-squared distribution with n − k degrees of freedom.
It can be shown that the ratio of two independent chi-squared variables, each divided by its degrees of freedom, follows an
F-distribution. The F-distribution is skewed, starts from the origin, and ranges to infinity.
Since this distribution is related to a pair of chi-squared distributions, there are two associ-
ated degrees of freedom, d and n-k, degrees of freedom in the numerator and denominator,
respectively. Figure 4.10 illustrates an F–distribution. Like the chi-squared distribution from
which it is derived, an F-distribution becomes symmetrical with increases in the sample size.
For large denominator degrees of freedom, d times the F-statistic converges to a chi-squared distribution with d degrees of freedom.
There are tables of the F-distribution produced for different levels of significance, e.g. 1%,
5% and 10% (see Appendix). To find a desired F-value it is necessary to first select the table
with the desired probability in the right tail, and locate the entry that matches the appropriate
degrees of freedom for the numerator and denominator.
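The same critical values can be looked up programmatically instead of from the printed tables; for example (an illustrative lookup using the degrees of freedom of the import demand example):

```python
from scipy.stats import f

d, n_minus_k = 2, 65                 # numerator and denominator degrees of freedom

# 5% critical value of the F-distribution
print(f.ppf(0.95, d, n_minus_k))     # about 3.14, matching the tabulated 3.15 up to rounding
```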
The F-statistic may also be written in terms of the R2 values of the restricted and unre-
stricted models. This can be done only if the dependent variable of the model remains the
same after the imposition of linear restrictions.
[Figure 4.10 An F-distribution]
(4.23)
(4.24)
Assuming that the dependent variable of the model remains the same after imposition of the
linear restriction, so that TSS for both restricted and unrestricted models are the same, we
have:
F = [(RU2 − RR2)/d] / [(1 − RU2)/(n − k)] (4.25)
1 Estimate the unrestricted model by the OLS and record RSSU or RU2 .
2 Impose the linear restrictions and estimate the restricted model by the OLS to obtain
RSSR or RR2 .
3 Calculate the F-statistic and compare the value with the critical value of the test, obtained
from F-distribution tables. Note that an F-distribution is identified by two degrees of
freedom, one corresponding to the chi-squared distribution in the numerator and one
corresponding to the chi-squared distribution in the denominator.
4 If the calculated value of F is greater than the critical value of the test, we reject H0 at a
pre-specified level of significance, concluding that the linear restriction(s) on the parameters
are not valid.
An illustration of the use of the F-test follows, where it is applied to test the overall signifi-
cance of the regression.
For the test of the overall significance of the regression, the null hypothesis is that the slope parameters are jointly equal to zero, that is, H0: β2 = β3 = 0. Notice that our aim here is to test the null hypothesis; the H1 of the test may therefore be
stated in a general form which encompasses all alternatives, namely H1: H0 is not true.
Under H0 we have two linear restrictions on the parameters of the model, therefore d = 2,
which is the same as the number of parameters of the model minus one, i.e. d = k − 1 = 3 − 1 = 2.
Since the dependent variable, log Mt, remains the same after imposing the linear restric-
tions, the R2 version of the F-test statistic may be used:
(4.26)
The coefficient of determination of the restricted model is zero, since under H0, the two
independent variables are deleted. The test statistic, therefore, takes a simpler form, as
follows:
F = (RU2/d) / [(1 − RU2)/(n − k)] (4.27)
(4.28)
Note that in practice, we do not need to calculate the value of F for the test of significance,
as it is normally reported by the regression packages.
The critical value of F at the 5% level of significance is
3.15 (see the table in the Appendix). Decision rule: because F = 1683.3 > F0.05 = 3.15, we reject H0
at the 5% level of significance, concluding that the independent variables, jointly, are highly
significant. The reliability of this test depends crucially on the validity of assumptions, which
are to be tested in the next chapter.
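The reported F-statistic can be reproduced (up to rounding of the reported R2) directly from the R2 version of the test, as the following illustrative check shows:

```python
from scipy.stats import f

r2, n, k = 0.98109, 68, 3
d = k - 1                                        # number of restrictions under H0

F = (r2 / d) / ((1 - r2) / (n - k))
print(F)                                         # roughly 1686, close to the reported 1683.3
print(f.ppf(0.95, d, n - k))                     # 5% critical value, about 3.14
print(F > f.ppf(0.95, d, n - k))                 # True: reject H0
```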
• When the unknown standard error of an estimator is replaced by its estimated standard error, under a repeated
sampling procedure the resulting statistic has a sampling distribution with a mean of zero and a variance slightly
more than unity. This new statistic is called a t-statistic. A t-distribution is a small sample
distribution (number of observations less than 30), and as the number of observations
increases, a t-distribution converges to a Z-distribution.
• Each null hypothesis is formulated to indicate that each slope parameter is not signifi-
cantly different from zero. A t-test is then used to falsify the null. Using either a p-value
approach or a critical value testing procedure, if the null is rejected, it is concluded that
the parameter is not zero and is statistically significant. This in turn is taken to imply that
the corresponding independent variable/regressor is significant. The so-called t-ratios
refer to the values of the t-statistic under null hypotheses; these are routinely reported
by almost all regression packages. Each t-ratio shows the value of t-test statistic under
the null hypothesis. As a rule of thumb, if the value of t-ratio is greater than two, the
parameter is likely to be statistically significant.
• To test for the significance of a linear restriction (or a set of linear restrictions) on the
parameters of a regression model, an F-test is employed. The null hypothesis is that the
linear restriction is valid. If the null is true, then the residual sum of squares under
the restricted and unrestricted versions of the regression model must be similar under a
repeated sampling procedure. Using this idea, the difference between the restricted and
unrestricted residual sums of squares is calculated and then divided by the number of
restrictions. To generate an F-statistic this value is divided by the residual sum
of squares of the unrestricted model, which in turn is divided by its degrees of
freedom. The numerator and the denominator each have an independent chi-squared
distribution, so the ratio follows an F-distribution. Using a 5% level of significance, if the value
of F-test statistic falls into the critical region of the test, we reject the null, concluding
that the linear restriction on the parameters is not true.
• An F-distribution emerges from the ratio of two independent chi-squared distributions.
It is skewed towards the origin and can only take on positive values. The F-distribution
tables are available for calculating the probability of the F-statistic falling in a particular
permissible range.
• To test for the overall significance of regression, an F-test is used. The null hypothesis
is that slope parameters, taken jointly, are not different from zero. This is the same as
the F-test of linear restrictions, but can be carried out using the coefficient of determination
version of the test, as the dependent variable does not change after imposing the restric-
tion. The F-test statistic is routinely reported by the regression packages. If the value of
this test statistic under the null hypothesis is greater than the critical value of the F-test
at, say 5%, reject the null concluding that the regression is significant.
• Note that the F-test is a joint test of overall significance and it is possible that the regres-
sion model is significant according to an F-test, but each individual t-test is not signifi-
cant. That is, each individual parameter is not different from zero, but all, jointly, are
different from zero. This situation might arise due to inadequacy of the data, particularly
when the sample size is small, and is often taken as an indicator of multicollinearity in the data. In
this situation it is advisable to increase the sample size and the sample variation.
Review questions
1 Explain why it is necessary to empirically evaluate econometric models. What are the
main criteria used in practice to evaluate econometric models? How would you use
these criteria?
INTRODUCTION
Econometric models are constructed on the basis of a number of simplifying assumptions
necessary to start the process of empirical analysis. For example, to conduct an empirical
analysis of the import demand function in the previous chapter, we made the following
simplifying assumptions:
1 The relationship between imports and its determinants can be adequately presented by
a log-linear equation (specification assumption).
2 The disturbance term is normally and independently distributed around a mean of zero,
with a constant variance.
Note that there are in fact four assumptions included in this statement. These are:
a the normality assumption;
b the non-autocorrelation assumption;
c the zero mean assumption;
d the homoscedasticity assumption.
3 The regressors are non-stochastic and are measured without errors.
4 There is no exact linear relationship among the regressors/independent variables.
The regression results are then obtained via the OLS method on the basis that each one
of these assumptions is in fact valid. Given this framework, before considering the
regression results as reliable, we need to test the validity of each one of these assumptions.
The reliability of the OLS regression results depends crucially on the validity of these
assumptions. The econometric criteria deal with testing and evaluation of the data genera-
tion assumptions of non-autocorrelation, homoscedasticity, normality and specification
errors. Although we have divided the criteria of the evaluation of the regression results
into economic, statistical, and econometric criteria, in practice, these criteria are strongly
linked and are collectively used to evaluate the results. The regression results can only be
considered reliable when all three criteria simultaneously are satisfied. In this chapter we
focus our attention on the econometric criteria of evaluation. In particular, we explain
the phenomenon of autocorrelation and heteroscedasticity and explain in detail the
detection and testing procedures used in practice for each one of these key problems.
Unlike other introductory econometric texts that devote several separate chapters to
these issues, in applied work, these issues are typically investigated collectively, but in a
step by step fashion. We follow this practice and investigate the validity of the data
generation assumptions, collectively, and in a step by step fashion in this chapter. To
illustrate key issues, we will continue with the regression results of the UK demand for
competitive imports.
Key topics
• Econometric criteria of model evaluation, the specification and misspecification
tests
• Causes and consequences of autocorrelation and detection procedures
• Causes and consequences of heteroscedasticity and detection procedures
• Testing of the normality assumption
(5.1)
Within the framework of the traditional specific to general (SG) approach, there are no
accepted benchmarks regarding the correct order of diagnostic tests to be used; choice of the
tests and the order by which they are performed are left to the practitioner. In what follows,
we utilise a Microfit 4/5 output regarding the types of diagnostic test and the order of testing,
distinguishing between specification and misspecification tests.
The assumption of independence implies that the disturbance term corresponding to one observation is uncorrelated with the disturbance term corre-
sponding to another observation. For example, in a time series study of output, we can use
the disturbance term to take into account the influence of random factors, such as break-
downs of machinery, climatic changes, and labour strikes, on the level of production per
period. In this example the assumption of independence implies that a breakdown of
machinery/strikes/climatic changes, or, for that matter, any random changes only affect the
level of production in one period: the period during which the random event has actually
occurred. In other words, the influence of a random shock does not persist over time; it has
only a temporary influence on the dependent variable. Notice that the only way that a
prolonged influence of random events can occur in econometric models is when the distur-
bance terms are correlated over time; so that when a random shock occurs in a particular
period, it can also influence the dependent variable in other periods, through interdepen-
dency and correlation/connection with the disturbance terms in other times. The assumption
of independence or no autocorrelation/no serial correlation, even when the model is correctly
specified, often breaks down in practice because of the prolonged influence of random
shocks on the economic variables. In time series applications, practitioners should expect the
presence of autocorrelation and the breakdown of the assumption of independence. This is
particularly the case when the econometric model is incorrectly specified to begin with,
which is normally the case within the framework of the traditional methodology. Let us
explore the connection between autocorrelation and incorrect specification of the model a bit
further. Recall how the systematic part of an econometric model is developed on the basis of
economic theory, which is used as a guide in the selection of the independent variables for
inclusion in the model. In applied work there could be other variables, not suggested by the
economic theory under consideration, influencing the dependent variable. For example, in
developing a model of aggregate consumption expenditure using the Keynesian absolute income hypothesis (AIH), we are guided by the theoretical argument that real income is the main determinant of short-run consumption. In a modern economy there are other, equally important, variables, including interest rates and the rate of inflation. Moreover, there are competing theories of aggregate consumption expenditure over time, recommending the inclusion of additional variables. If a practitioner omits one or
more of these variables whose effects are captured by the data, the econometric model would
be inconsistent with the data and so is misspecified. The cause of misspecification, in this
case, is omission of relevant economic variables in the econometric model. A particular
indication of this type of specification error is the breakdown of the independence assump-
tion and the presence of autocorrelation. This is because one or more of the excluded vari-
ables could be autocorrelated variables. For example, if the interest rate is left out of the
model of consumption expenditure, we get autocorrelation, because interest rate time series
are autocorrelated. What happens in this situation is that the disturbance term picks up the
influence of the excluded variable/s which are autocorrelated. We therefore get autocorrela-
tion in the model. A test for autocorrelation is, therefore, essentially a general misspecifica-
tion test, as the practitioner is unable to determine the exact cause of autocorrelation from
test results. Further work would be needed to deal with the problem of autocorrelation in
order to discover the exact cause.
In the light of this discussion, the Durbin-Watson test may be regarded as a general
misspecification test, in the sense that it can detect the autocorrelation, but is incapable of
identifying its cause. The test is easily carried out in practice and is routinely reported by
regression packages. It therefore provides an early warning device concerning the specifica-
tion of the model. Before we consider this test and the way it is used in practice it is useful
to look at the consequences of autocorrelation for the OLS methods of estimation.
ut = ρut−1 + εt     (5.2)
where εt is a white noise process satisfying all standard assumptions. In this specification of
autocorrelation, the successive values of the disturbance term are correlated. The extent of
the linear relationship between these successive values is captured by the term ρ. This term
ρ could take on any value between −1 and +1. When ρ is positive, autocorrelation is said to be positive. This means that successive values of the disturbance term tend to have the same sign: the disturbance terms remain positive, or remain negative, over prolonged periods rather than alternating in sign. This in turn implies that the influence of a random shock on the dependent variable remains in the same direction over successive periods. For example, if the disturbance term is added to a model of production to capture the impact of random events on output, positive autocorrelation implies that the effect of a breakdown persists: it reduces output over a number of successive periods, so the disturbance term is negative and remains so for some time. On
the other hand, if ρ is negative, we say that we have negative autocorrelation. In this case the
disturbance term alternates in sign, so one period it is negative, and the next period it is posi-
tive, or vice versa. In other words, the impact of a random shock on the dependent variable
of the model changes from one period to the next. In one period it has a positive impact on
the dependent variable, and in the next period it has a negative impact. Note that in the case of positive autocorrelation successive disturbance terms tend to have the same sign, so the differences between successive values are small in magnitude; whereas in the case of negative autocorrelation successive disturbance terms alternate in sign, so the differences between successive values are relatively large in magnitude. This simple observation provides the basis for the DW test. Given that the disturbance terms are not observable, any test based on this observation must use the empirical counterpart of a disturbance term; in the case of OLS estimation the empirical counterparts are the OLS residuals. A simple detection method based on these residuals would be
to look at the signs over the sample period. If the residuals do not change sign over a rela-
tively long period, this could indicate positive autocorrelation. Alternatively, if the residuals
tend to change signs from one period to the next, this could be a sign of negative autocorrela-
tion. Durbin and Watson extended this idea a bit further to generate a test statistic. They
reasoned that in the case of positive autocorrelation the difference between successive OLS
residuals tends to be small (because both residuals have the same sign). On the other hand,
in the case of a negative autocorrelation the difference between the successive OLS residuals
tends to be relatively large (because the residuals alternate in sign). Taking this idea one step
further, in order to make all differences positive they square each of the residual differences.
Moreover, to take into account all differences, they add all squared differences. They then
reasoned that in the case of a positive autocorrelation this sum would be relatively small,
whereas in the case of a negative autocorrelation it would be relatively large. Symbolically:

Σ(et − et−1)²,  summed over t = 2, . . ., n

where et is the OLS residual for period t, showing the difference between the observed value and the fitted value of the dependent variable. Note that when calculating the above sum we lose one observation, hence t starts from the second observation.
The pressing question which arises now is what constitutes a relatively large/relatively
small value in this context? Before we answer this question, however, it is important to
recognise that the value of this sum depends on the units by which the variables of the model
are measured. The first task is therefore to eliminate this problem. Durbin and Watson dealt
with this problem by dividing the above sum by the sum of the squares of the OLS residuals,
hence:
d = Σ(et − et−1)² / Σet²,  with the numerator summed over t = 2, . . ., n and the denominator over t = 1, . . ., n     (5.3)
Both the numerator and the denominator have the same unit of measurement and so this ratio
is unit free. We can now get back to the question of the magnitude of this ratio. To deal with
this problem, Durbin and Watson expanded the ratio as follows:

d = Σet²/Σet² + Σet−1²/Σet² − 2Σetet−1/Σet²     (5.4)

They then argued that the value of each of the first two ratios is approximately equal to one, so that the sum of the first two is approximately 2. The third ratio is in fact the OLS
estimator of the correlation coefficient. This coefficient lies between −1 and +1. We there-
fore have:
DW = d ≅ 2(1–ρ̂)
It is now possible to determine the range of the values of the Durbin-Watson (DW) ratio, as
follows:
1 In the case of a perfect positive correlation, the value of ρ̂ would be approximately +1,
and the value of the DW ratio is approximately zero.
2 In the case of a perfect negative correlation, the value of ρ̂ would be approximately −1
and the DW ratio would be approximately 4.
3 Finally, in the case of no autocorrelation, the value of ρ̂ would be approximately zero
and the DW ratio would be approximately 2.
The range of the values for the DW ratio is therefore 0 to 4. As a quick guide, a value near zero could indicate strong positive autocorrelation, a value near 4 strong negative autocorrelation, and a value near 2 a lack of autocorrelation. This is only a quick guide; what is needed is a test statistic which can be used in all situations. For this purpose we need the sampling distribution of the DW ratio and some critical values. Durbin and Watson showed that their test statistic does not have a unique critical value; rather, under repeated sampling procedures it lies between two limiting values. These limiting values are called the upper and the lower bound values and are denoted by du and dL, respectively. Durbin and Watson then calculated these values for different sample sizes and alternative numbers of independent variables. These values are tabulated in the table of the DW test.
1 We start the test by specifying the null and the alternative hypotheses:
H0: ρ = 0 (no autocorrelation);  H1: ρ ≠ 0     (5.5)
2 We calculate the value of the DW test statistic from the OLS residuals (in practice this value is routinely reported by the regression package).
3 We choose a particular level of significance for the test, say 5%, and find the corresponding upper and lower bound values from the table of DW.
1 If the DW test value is less than dL, reject H0 in favour of positive autocorrelation.
2 If the DW test value is between dL and du, the test is inconclusive. In this case use the
du as the critical value of the test; this assumes autocorrelation exists. It is, however,
recommended in this case to use an additional test for autocorrelation to confirm
this result.
3 If DW is between du and 4-du, do not reject H0. In other words, autocorrelation does not
exist.
4 If DW is between 4−du and 4−dL, the test is inconclusive. In this case, use 4−dL as the critical value; in other words, assume negative autocorrelation exists. It is advisable to carry out an additional test to confirm this result.
5 If the DW value is between 4−dL and 4, reject H0: negative autocorrelation is present.
In practice, the value of the DW test statistic is calculated by the regression package and is routinely reported. All we need to do is to look at this value: if it is close to 2, this is a good indication that autocorrelation does not exist, while a value well away from 2 signifies autocorrelation. We then need to confirm these results, using the tables for the DW test statistic.
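For readers who wish to see the mechanics, the DW ratio and the bounds-based decision rule can be computed directly from the OLS residuals. The following is a minimal Python sketch; the residual series and the bound values dL and du used here are hypothetical inputs, which in practice would come from an estimated model and from the DW tables.

```python
import numpy as np

def durbin_watson(residuals):
    """Compute the DW ratio d = sum (e_t - e_{t-1})^2 / sum e_t^2."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def dw_decision(d, d_lower, d_upper):
    """Apply the bounds-based decision rule described in the text."""
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d < d_upper:
        return "inconclusive: treat as positive autocorrelation and confirm with another test"
    if d <= 4 - d_upper:
        return "do not reject H0: no evidence of autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive: treat as negative autocorrelation and confirm with another test"
    return "reject H0: negative autocorrelation"

# Hypothetical illustration: residuals from some fitted model, bounds from the DW table
e = np.array([0.8, 0.6, 0.7, 0.5, -0.2, -0.4, -0.3, 0.1, 0.3, 0.4])
d = durbin_watson(e)
print(round(d, 3), dw_decision(d, d_lower=0.88, d_upper=1.32))  # bound values are illustrative only
```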
There are some shortcomings in this simple misspecification test. In particular, although the
test is a first-order test, a significant value of the DW test statistic is also consistent with higher-
order autocorrelation. In other words, if H0 is rejected, practitioners should not take this to
mean that first-order autocorrelation exists, but rather that the order of the autocorrelation is not
known. Moreover, the test is highly unreliable when a lagged value of the dependent variable
appears as a regressor in the model. In this case the value of the DW test statistic would be
biased towards 2, indicating no autocorrelation, despite the fact that autocorrelation is present.
There are a number of alternative tests available that are appropriate for use in this situation.
Given this process, the H0 and H1 of the test may be written as follows:
H0: ρ =0
H1: ρ ≠ 0
(5.6)
The value of this statistic is reported by the Microfit 4/5 software to be 0.55343.
Using a 5% significance level, the lower and the upper bound values for n = 68, k = 3 and k′ = 3 − 1 = 2 (where k is the number of parameters including the intercept term) are, respectively:
There are other causes of autocorrelation, for example the prolonged influence of random shocks. In these situations we need to test for specification errors first, correct the model using theory, and then deal with any residual autocorrelation left in the model.
Given that the nature and causes of autocorrelation are not clear, in practice we need to
follow a particular strategy to deal with the problem of autocorrelation. There is no specific
strategy recommended and used by all practitioners. The strategy that we recommend
follows the specific to general methodology based on the diagnostic testing as follows:
1 Attempt to find the order of autocorrelation first. For this purpose we use a general diag-
nostic test for autocorrelation, such as the Lagrange multiplier (LM) test.
2 Use a general misspecification test for incorrect functional form, such as Ramsey’s
regression specification error test (RESET).
3 Use another misspecification test, this time for heteroscedasticity, which may be the Koenker-Bassett (KB) test or the White test.
We next discuss these diagnostic tests applied to the model of competitive imports.
ut = ρ1ut−1 + ρ2ut−2 + ρ3ut−3 + ρ4ut−4 + εt     (5.7)
This pattern of autocorrelation could emerge, for example, when one is using quarterly data.
The disturbance term in a particular quarter would depend on the disturbance terms in the
previous four quarters.
We start the test by specifying the null and the alternative hypotheses as:
H0: ρ1 = ρ2 = ρ3 = ρ4 = 0
H1: H0 is not true
If autocorrelation of up to the 4th order exists, the regression model takes the following form:
(5.8)
We may call this equation the unrestricted form of the model, in the sense that there are no restrictions imposed on the parameters of the model.
If the null hypothesis could not be rejected, the model would be the regression equation
we started with, hence:
(5.9)
We call this equation the restricted form of the model, in the sense that there are four param-
eter restrictions on parameters of the model (four zero restrictions), compared to the unre-
stricted form of the model. We are therefore essentially testing for parameter restrictions.
We have already considered the F-test for linear restrictions in the previous chapter. The test
statistic is as follows:
F = [(RSSR − RSSU)/d] / [RSSU/(n − k)]     (5.10)
where RSSR and RSSU are the residual sums of squares of the restricted and unrestricted forms of the model, d is the number of parameter restrictions (here d = 4) and k is the number of parameters of the unrestricted model.
Decision rule: using the 5% level of significance, if F(d, n-k) > F(critical value at 5%), reject
the null hypothesis at the 5% level of significance, concluding that autocorrelation up to the
4th order exists. Notice that we still do not know the exact order of autocorrelation. All
we can say at this stage is that autocorrelation could be up to the 4th order. To find the order
of autocorrelation we need to test down. That is, we start from a sufficiently high order, say
the 5th order, and then test down, using this procedure. If there is no autocorrelation of the
5th order, we test for the 4th order, and so on, until the order of autocorrelation is identified.
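In practice the LM test is usually computed by the regression package. As an illustration of the idea, the sketch below uses the Breusch-Godfrey routine in the Python statsmodels library on simulated placeholder data (not the imports data used in the text); the chi-squared and F versions of the statistic correspond to the test just described.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Simulated placeholder data standing in for the dependent variable and two regressors
rng = np.random.default_rng(0)
n = 68
X = sm.add_constant(rng.normal(size=(n, 2)))      # intercept plus two regressors
u = np.zeros(n)
for t in range(1, n):                             # AR(1) errors so the test has something to detect
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = X @ np.array([1.0, 0.8, -0.3]) + u

ols_res = sm.OLS(y, X).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(ols_res, nlags=4)
print(f"LM chi-sq(4) = {lm_stat:.2f} (p = {lm_pval:.3f}); F = {f_stat:.2f} (p = {f_pval:.3f})")
```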
We are now in the position to apply this test to the model. We do this in a step by step
fashion:
(5.11)
The null and the alternative hypotheses can now be written as:
H0: ρ1 = ρ2 = ρ3 = ρ4 = 0
H1: H0 is not true.
2 The test statistic is
(5.12)
(5.13)
If the model is correctly specified, the expected value of the dependent variable is deter-
mined by the variables suggested by the theory; in this situation the disturbance term is a
random variable capturing only the influence of the random terms on the dependent variable.
In other words, if the model is correctly specified, there should not be a relationship between
the expected value of the dependent variable and the disturbance term. However, if the
model is not correctly specified, either due to omission of relevant regressors or incorrect
functional form, the disturbance term would not only capture the influence of random terms
on the dependent variable, but it also captures the influence of omitted variables. In other
words, there are some variables included in the disturbance term which would have a system-
atic influence on the dependent variable. These unknown omitted variables, captured by the
disturbance term, are the cause of model misspecification.
Following this line of reasoning, in developing a misspecification test, Ramsey recom-
mends adding a number of additional terms to the regression model and then testing the
significance of these additional terms. More specifically, he suggested that one needs to
include in the regression model some functions of the regressors, on the basis that, if the
model is misspecified, the disturbance term would capture these variables, either directly or
indirectly through other variables omitted from the regression. All we need to do is to test for
the significance of these additional terms. If these additional variables are found to be signif-
icant, then the model is misspecified. Notice that the test does not tell us how to correct the
model, it is designed to show that the model is misspecified. To correct the model, we need
to use theory. In time series investigation it is advisable to consider the dynamic structure of
the model and correct accordingly.
The question which now arises is how to perform a RESET in practice. In particular, which additional variables should be included in the model to start the process? The additional variables are some function of the regressors; we therefore need a variable which depends on the regressors used in the model. An obvious candidate is the estimated expected
value of the dependent variable (fitted values) obtained from the OLS estimation of the
regression model under consideration.
With reference to our model of demand for imports, we have:
(5.14)
The estimated expected value of the dependent variable (log Mt), depends on the regressors.
Following Ramsey’s recommendation we need a function of this variable to include as an
additional variable in the model. In practice, we normally include a number of the powers of
this variable as additional variables in the regression model to obtain the so-called expanded
equation; we then test for the significance of these additional variables using the F-test for
linear restrictions. If the additional variables are found to be significant, one concludes that
the regression model is misspecified. Regression packages, and in particular Microfit 4/5,
carry out this test automatically using the second power of the fitted value of the dependent
variable as an additional regressor. In practice, however, it is useful to look at the graph of
the OLS residuals against the fitted values to see whether we can detect a relationship. We
can also count the number of turning points on the graph as a guide to the power of the fitted
values to be included as additional variables. For example, normally, one turning point in the
graph suggests a quadratic relationship, so we include the second power of the fitted value
as an additional variable. Two turning points could be indicative of a third-degree polyno-
mial relationship, so we include both the second and the third powers of the fitted values as
additional variables in the regression, and so on.
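A minimal sketch of the RESET procedure just described, using simulated placeholder data (not the imports model of the text): the restricted equation is the original model, the expanded equation adds the squared fitted values, and the two are compared with the F-test for linear restrictions.

```python
import numpy as np
import statsmodels.api as sm

def reset_test(y, X):
    """RESET using the squared fitted values as the single additional regressor:
    compare the restricted (original) and expanded equations with an F-test."""
    restricted = sm.OLS(y, X).fit()
    fitted_sq = restricted.fittedvalues ** 2
    X_expanded = np.column_stack([X, fitted_sq])
    unrestricted = sm.OLS(y, X_expanded).fit()
    rss_r, rss_u = restricted.ssr, unrestricted.ssr
    d = 1                                  # one added regressor, so one restriction
    n, k = X_expanded.shape                # k = parameters of the expanded equation
    return ((rss_r - rss_u) / d) / (rss_u / (n - k))

# Hypothetical illustration with simulated data whose true relation is quadratic
rng = np.random.default_rng(1)
x = rng.normal(size=80)
y = 1 + 0.5 * x + 0.4 * x**2 + rng.normal(scale=0.5, size=80)
X = sm.add_constant(x)
print(f"RESET F(1, {len(y) - 3}) = {reset_test(y, X):.2f}")   # a large value flags misspecification
```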
(5.15)
3 Estimate the expanded equation by the OLS and record the residual sum of squares
and call this RSSu. Notice that the original regression model is in fact a restricted
form of the expanded equation where the coefficient of the additional variable is set
to zero.
4 Using the expanded equation as the unrestricted form of the model, and the original
regression equation as the restricted form of the model, we test for the significance of
additional variables. With reference to our model, we can specify the null and alterna-
tive hypotheses:
H0: β4 = 0
H1: β4 ≠ 0
F = [(RSSR − RSSU)/d] / [RSSU/(n − k)]     (5.16)
where d = number of parameter restrictions (the difference between the number of the parameters of the unrestricted model and the number of the parameters of the restricted model) and k = number of parameters of the unrestricted (expanded) equation.
Alternatively, in terms of the variance of the distribution of the disturbance term, the assumption of homoscedasticity implies that the variances of the distributions of all the disturbance terms are the same. Symbolically: Var(ut) = σ² for all t = 1, 2, . . ., n.
If this assumption breaks down and heteroscedasticity is present, the consequences for the OLS estimation are as follows:
1 The OLS estimators are unbiased but are no longer efficient, that is, they tend to have
relatively large variances.
2 The OLS estimator of the error variance is biased. The direction of the bias depends on
the relationship between the variance of the disturbance term and the values taken by the
independent variable causing heteroscedasticity.
3 The test statistics which make use of the OLS estimate of the error variance, including t
and F-tests, are unreliable, generating misleading results.
4 The coefficient of determination, R2, will also be unreliable, usually overestimating
the extent of a linear relationship between the dependent and the independent
variables.
Given the serious nature of these consequences, it is clearly necessary to test for the
heteroscedasticity in applied research, particularly when cross-section data is used. We next
consider a number of statistical tests designed to detect heteroscedasticity.
5.3.2 Detection: the Koenker-Bassett (1982) (KB) test
This is a simple diagnostic test done automatically by regression packages, for example,
Microfit 4/5. We demonstrate this test by using the regression results of the model of competitive imports.
Heteroscedasticity implies that the variance of the disturbance term is not constant over
the data range. The Koenker-Bassett (KB) test assumes, in particular, that the variance of the
disturbance term is a function of the regressors. The actual functional form need not be
specified. To get the test operational, we need a proxy for the non-observable variance and a
variable which is influenced only by the regressors. The squares of the OLS residuals are
usually used as a proxy for the variance. As for the other variable, the KB test uses the fitted
values of the dependent variable (in this example logM). The test then assumes a certain
relationship between these two variables. Specifically, it is assumed that the squared residuals (the proxy for the variance) are a linear function of the squared fitted values of the dependent variable, used here to capture the variations in all the regressors.
et² = a0 + a1ŷt² + vt,  where ŷt is the fitted value of log Mt     (5.17)
H0: a1 = 0
H1: a1 ≠ 0
(5.18)
3 Decision rule: given that only a single restriction is being tested, the squared value of the t-statistic is distributed as an F-distribution with one numerator degree of freedom. In practice, one usually uses the F-version of the test. The value of this statistic
for the model of competitive imports is F(1, 66) = 4.9546. The critical value of the test
at the 5% level of significance is approximately 4. We therefore reject H0, concluding
that there is evidence of heteroscedasticity present. Given the nature of the time series
model it is likely that heteroscedasticity is due to the model misspecification, as it is
indicated already by the DW test.
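A minimal sketch of the KB-type procedure described above, using simulated placeholder data: the squared OLS residuals are regressed on the squared fitted values and the slope is tested, the F-version being the squared t-value.

```python
import numpy as np
import statsmodels.api as sm

def koenker_bassett(ols_results):
    """KB-style check: regress the squared OLS residuals on the squared fitted values
    and return the F-version of the slope test (the squared t-statistic)."""
    e_sq = ols_results.resid ** 2
    z = sm.add_constant(ols_results.fittedvalues ** 2)
    aux = sm.OLS(e_sq, z).fit()
    return aux.tvalues[1] ** 2            # compare with an F(1, n - 2) critical value

# Hypothetical illustration with simulated heteroscedastic data
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=100)
y = 2 + 0.5 * x + rng.normal(scale=0.3 * x)      # error variance grows with x
res = sm.OLS(y, sm.add_constant(x)).fit()
print(f"KB F-statistic = {koenker_bassett(res):.2f}")
```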
(5.19)
H0: the disturbance terms are normally distributed (the OLS residuals are normally
distributed).
H1: H0 is not true
2 Calculate the test statistic.
(5.20)
3 Decision rule: the value of the BJ test statistic reported by Microfit 4/5 is 0.81353. Because BJ = 0.81353 < χ²(2) at the 5% level = 5.99, we do not reject H0 at α = 5%, concluding that the normality assumption is maintained. This result implies that the normality assumption about the shape of the data generation process is consistent with the evidence.
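For reference, the BJ statistic can be computed directly from the skewness and kurtosis of the OLS residuals, as in the following sketch; the residual series here is simulated and serves only as an illustration, not as the residuals of the imports model.

```python
import numpy as np
from scipy import stats

def bera_jarque(residuals):
    """Bera-Jarque statistic BJ = n[S^2/6 + (K - 3)^2/24], where S is the skewness and
    K the (raw) kurtosis of the residuals; under normality BJ ~ chi-squared(2)."""
    e = np.asarray(residuals, dtype=float)
    n = e.size
    s = stats.skew(e)
    k = stats.kurtosis(e, fisher=False)   # raw kurtosis, equal to 3 for a normal distribution
    return n * (s**2 / 6 + (k - 3)**2 / 24)

# Hypothetical illustration
rng = np.random.default_rng(3)
e = rng.normal(size=68)
bj = bera_jarque(e)
print(f"BJ = {bj:.3f}; 5% critical value chi-sq(2) = {stats.chi2.ppf(0.95, 2):.2f}")
```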
• The traditional approach to econometric analysis is based on the specific to general (SG)
methodology. Within this framework a simple regression model based on theory
together with a number of assumptions designed to explain the data generation process
are first specified. The regression model is estimated by the OLS, since if all assump-
tions are valid this method generates the best linear unbiased estimators (BLUE).
• In practice, almost all data generation assumptions are likely to break down. In this case
the OLS estimators are likely to lose their desirable properties, generating misleading
results.
• To deal with this problem, the SG approach recommends a battery of diagnostic testing
to be carried out on each one of the data generation assumptions. If problems are
detected then correction should be carried out using theory and logical reasoning to
guide this process. Within this framework a general model/theory is to be derived from
the original specific model.
• This framework seems sound in theory, but is not all that successful in practice.
Misspecification tests detect problems, but are incapable of identifying corrections. All
corrections and modifications are to be carried out within the framework of the theory.
This is a difficult task and in applied work one frequently is not able to arrive at a satis-
factory model by using this iterative methodology.
• Note that within this framework, each and every test is conditional on arbitrary assump-
tions, which are to be tested later, and if any of these are rejected at any given stage in
the investigation, all previous inferences are invalidated.
• Because of these issues the SG methodology has focused mainly on how to formulate
‘good’ estimators using mathematical and statistical methods to come up with alterna-
tive estimators to the OLS to deal with problems. Although this approach has been
fruitful for mathematicians and statisticians, it has not been all that beneficial for econo-
mists and economic analysis.
• The problem is that this iterative methodology has been seldom able to correct the struc-
ture of the economic models or to come up with new innovative theories consistent with
data and evidence.
• Autocorrelation is the breakdown of the independence assumption. In regression analysis it is taken to mean that the values of the dependent variable/disturbance term are not independently distributed. Disturbance terms are correlated and can no longer be considered to be randomly distributed. Although autocorrelation could be due to the prolonged influence of random events, in almost all time series applications it is an indication of misspecification. This could be due to the omission of relevant variables, including incorrect dynamic specification.
• The consequences of autocorrelation for the OLS estimators are serious. These estimators
are no longer efficient and regression results are, by and large, misleading and unreliable.
• The DW test is a simple, but powerful, misspecification test. It is used to detect autocor-
relation, assuming that successive values of the disturbance term are correlated (first-
order autocorrelation). If autocorrelation is detected, however, it could be of a higher
order. In this case, an LM test is used to establish the order of autocorrelation. This
could be followed by a RESET test to provide further evidence of specification error.
These tests are not capable of indicating how to correct the model.
• Heteroscedasticity is the breakdown of the assumption of homoscedasticity or constant
variance. It implies that the variance of the probability distribution of the dependent vari-
able/disturbance term no longer remains the same across all observations. In cross-section
regression, it could be due to a scale effect. That is, it could be caused if values of the inde-
pendent variables change significantly in the data set. Variation in independent variables
generates variability in the variance of the dependent variable/disturbance term. In time
series data it could be another indication of incorrect specification. The KB test or the White
test are routinely used for detection. If heteroscedasticity is detected, the OLS estimators
are no longer efficient and the OLS results are typically misleading and unreliable.
Review questions
1 Explain the difference between the specification tests and misspecification tests. Why
are both types of test necessary when evaluating a regression model?
2 Explain the causes and consequences for the OLS estimators of each of the following:
a autocorrelation
b heteroscedasticity
c multicollinearity.
3 Explain the Durbin-Watson Test for autocorrelation. What are the shortcomings of this
test? How would you react to a significant value of this test in a regression analysis?
4 Explain how you would test for the heteroscedasticity in practice. What would you do
if you find that heteroscedasticity is present in the model?
5 Collect a time series data set consisting of 70 annual observations on each of the
following:
a aggregate personal consumption expenditure C
b national income NI
c annual rate of inflation P*.
6 Use economic theory to specify a suitable regression model for the aggregate personal
consumption expenditure, in terms of national income and the rate of inflation. Define your
variables carefully and use your data set with a regression package to estimate the model.
a test for autocorrelation, including the DW and the LM tests
b test for specification errors using a RESET test
c test for heteroscedasticity, and explain all procedures
d use economic, statistical and econometric criteria to evaluate the regression results
e are the regression results satisfactory?
f explain how you would attempt to improve the regression results.
6 The phenomenon of the spurious regression, data generation process (DGP), and additional diagnostic tests
INTRODUCTION
Regression models derived from economic theory are long-run equilibrium models. Based
on theory, they show functional relationships among a number of variables in the long run.
The time series data that are used in regression models are, however, dynamic, showing the
magnitude of economic variables over specified intervals of time. There is no guarantee that
these observed magnitudes are long-run values for the variables under consideration, or that
interactions among economic variables are completed over one time period. There could be
some lagged responses between the dependent variable and the regressors specified in the
model, due to slow adjustment processes. The data captures the lagged responses, but the
regression model excludes them. There is, therefore, inconsistency between long-run static
econometric models and the data used for estimation. This is detected by the diagnostic tests,
the DW and the RESET, as a specification problem. This chapter continues with the specific
to general (SG) methodology and explains how this methodology attempts to introduce a
dynamic lag structure into the regression model in response to the specification error problem
indicated by the diagnostic tests. Closely linked to this correction procedure is the phenom-
enon of the spurious regression and the way the traditional (SG) approach attempts to deal
with this problem. This chapter also includes additional diagnostic tests used in practice for
the evaluation of regression models.
Key topics
• The phenomenon of the spurious regression
• The partial adjustment and the autoregressive short-run dynamic model
• Testing for the structural break and parameter stability: the Chow test
• Testing for measurement error: the Hausman test
relation is present. It is therefore tempting to consider the model to be adequate, but the
model is in fact totally meaningless and what we are observing is due to non-stationary data
generation processes generating meaningless and conflicting regression results. This is
termed the spurious regression phenomenon, and within the framework of the traditional
approach is frequently encountered in practice. Granger and Newbold (1974) recommend
the following rule of thumb to detect spurious regression: if the coefficient of determination,
R2, is greater than the DW value, then it is likely that the regression is spurious.
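The phenomenon is easy to reproduce by simulation. The sketch below regresses one simulated random walk on another, independent, random walk; the series and sample size are arbitrary placeholders, but the typical outcome is a sizeable R² together with a DW value close to zero, so that R² exceeds DW and the Granger-Newbold rule flags a spurious regression.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Two independent random walks: any apparent relationship between them is spurious
rng = np.random.default_rng(4)
n = 200
y = np.cumsum(rng.normal(size=n))
x = np.cumsum(rng.normal(size=n))

res = sm.OLS(y, sm.add_constant(x)).fit()
r2, dw = res.rsquared, durbin_watson(res.resid)
# DW will be near zero; when R-squared exceeds DW, the Granger-Newbold rule flags spuriousness
print(f"R-squared = {r2:.2f}, DW = {dw:.2f}")
```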
• Assume, without further investigation, that the data generation processes are in fact
stationary and the fault lies with the specification of the regression model.
• The model builder therefore focuses on improving specification of the model, including
the dynamic specification of the regression model, attempting to introduce lags into the
relationship, and hoping things work out, ignoring the data generation processes and
their characteristics.
• The focus is on making the model dynamic by simply including various types of
lags into the regression model (see Chapter 7), without a theoretical and empirical
foundation.
• The immediate focus is on the data generation process of each variable, rather than the
regression model.
• It first investigates whether the variables are stationary. This is done via unit root testing procedures, which are discussed in detail in Chapter 13.
• If any one of the variables in the regression model is not stationary, the regression is
likely to be spurious.
• If the variables can be made stationary, they can be combined together in a regression
model. If that is the case, then it focuses on the estimation and empirical evaluation
of the underlying economic model and its dynamic structure. The short-run dynamic
adjustment process is uniquely determined via an error correction mechanism, if the
long-run economic model is consistent with the data.
We will explain in detail the modern approach to time series econometric analysis in
Chapters 12–15. For now, in what follows, for illustrative purposes only, we continue with
the traditional framework, modifying the model’s specification and its dynamic structure in
search of the data generation process (DGP).
(6.1)
where 0 < α ≤ 1 is the speed-of-adjustment parameter. The closer α is to unity, the quicker the response of economic variables to changes in economic conditions and the faster the adjustment towards equilibrium. Moreover, the closer α is to unity, the more efficient the economy. Market economies with institutional rigidities, labour market rigidities and inefficient market mechanisms tend to be sluggish, reacting slowly to required/desired changes. For these economies the parameter α tends to be close to zero, indicating a
slow and prolonged adjustment process within the above framework. We can now combine
the partial adjustment process with the regression model to generate a short-run model, as
follows:
(6.2)
The long-run regression model showing the desired level of imports is:
(6.2a)
Where all variables are measured in logs (for the ease of presentation we have not included
the log notation).
Substitution for MDt from the regression model into (6.2) results in:
(6.3)
(6.4)
The appearance of the Mt−1, the lagged dependent variable, as a regressor, implies that this is
a short-run dynamic model. This model is called an autoregressive dynamic short-run model,
and we will discuss its properties in detail in the next chapter. Notice that if ut is NID (0, σ2),
αut would also be a white noise process. We can therefore estimate the short-run model by
the OLS, find the estimates of the short-run parameters and then use the values to estimate
the implied long-run parameters. This two-step method is what has been done frequently in
practice, within the SG framework.
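To see how the short-run equation and the relationships reported in Table 6.1 below arise, the substitution can be sketched as follows, assuming (as in earlier chapters) a long-run import demand equation with an intercept, income Yt and relative prices Pt, all in logs:

MDt = β1 + β2Yt + β3Pt + ut  (long-run, desired level)
Mt − Mt−1 = α(MDt − Mt−1)  (partial adjustment)

so that, substituting for MDt,

Mt = αβ1 + αβ2Yt + αβ3Pt + (1 − α)Mt−1 + αut = α1 + α2Yt + α3Pt + α4Mt−1 + vt

which is the autoregressive short-run form whose coefficients map into the long-run parameters as shown in Table 6.1.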
We now illustrate the procedure:
Table 6.1 Relationships between the short-run (SR) and long-run (LR) parameters
α1 = αβ1
α2 = αβ2
α3 = αβ3
α4 = (1 − α)
vt = αut
where SR = short run and LR = long run. Short-run parameters are estimated first; then, using the short-run and long-run relationships identified above, the long-run estimates are generated.
(6.5)
The regression results corresponding to (6.5) from Microfit 4/5 are reported in Table 6.2.
The dependent variable is log Mt and 67 observations were used for estimation from 1980
Q2 to 1996 Q4.
Table 6.2 OLS estimation from Microfit 4/5
where:
A: Lagrange multiplier test for serial correlation
B: Ramsey's RESET test using the squared fitted values
C and D: based on the regression of squared residuals on squared fitted values
Given AR(1):
ut = ρut−1 + εt     (6.6)
where
H0: ρ = 0
H1: ρ ≠ 0
It can be shown that, under H0, the following test statistic, when a large data set is used, has
a standard normal distribution:
(6.7)
[Figure 6.1 The Durbin 'h' test: a standard normal distribution with 2.5% rejection regions in each tail]
(6.8)
Decision rule: because h = 1.0644 falls between −1.96 and 1.96, we do not reject H0. The
null hypothesis is consistent with the data at α = 5% and first-order autocorrelation does not
appear to exist in this modified short-run model.
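A minimal sketch of the h statistic computation; the DW value, sample size and estimated variance of the coefficient of the lagged dependent variable used below are hypothetical stand-ins for the figures a regression package would report.

```python
import numpy as np

def durbin_h(dw, n, var_lagged_coef):
    """Durbin h: h = rho_hat * sqrt(n / (1 - n * V(coef. of lagged dependent variable))),
    with rho_hat approximated by 1 - DW/2; defined only when n * V < 1."""
    rho_hat = 1 - dw / 2
    denom = 1 - n * var_lagged_coef
    if denom <= 0:
        raise ValueError("h is not defined when n * Var(coefficient) >= 1")
    return rho_hat * np.sqrt(n / denom)

# Hypothetical numbers standing in for regression output (not the text's exact figures)
h = durbin_h(dw=1.80, n=67, var_lagged_coef=0.004)
print(round(h, 3))   # compare with +/- 1.96 at the 5% level of significance
```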
6.2.3 The Lagrange multiplier (LM) test on the short-run model
The LM test is frequently used to test for higher order autocorrelation. Again, it is a large
sample test and its use should be restricted to when a large data set is available. Given that
we have used quarterly observations, (Microfit 4/5) automatically tests for up to 4th order
autocorrelation chi-sq (4) or F(4,57). The first number, 4, is the number of linear restrictions
which have been imposed on the parameters of the model. The underlying assumptions and
steps are as follows:
Assume AR (4):
(6.9)
The restricted form of the model is, thus, the original model:
(6.10)
This value is reported as F(4, 59) = 2.2653. Note: we have 67 observations and K = 8
(parameters in the unrestricted model).
Decision rule: because F(4, 59) = 2.2653 is less than the 5% critical value of F(4, 59) (approximately 2.5), we do not reject H0 at α = 5%: H0 is consistent with the data. Therefore, according to this result, autocorrelation of up to 4th order does not exist in this model. [The critical value is the theoretical value of F(4, 59) at 5%, obtained from the F-distribution table.]
H1: α5 ≠ 0
(6.11)
et² = β1 + β2ŷt² + ut,  where ŷt is the fitted value of the dependent variable
H0: β2 = 0
H1: β2 ≠ 0
(6.12)
The value of the above test statistic is computed as F(1, 65) = 3.65. The critical value of
F(1, 65) at α = 5% is approximately 4.
Decision rule: since F(1, 65) = 3.65 < 4 (the approximate critical value at the 5% level), we do not reject H0. H0 is therefore consistent with the data. It appears that there is no evidence of heteroscedasticity/misspecification in the model/data.
(6.13)
The BJ statistic is computed by Microfit 4/5 as 4.95. The chi-squared value with 2 degrees of freedom at α = 5% is 5.99. Decision rule: because BJ = 4.95 < χ²(2) at the 5% level = 5.99, we do not reject H0, concluding that H0 is consistent with the data. The normality assumption is therefore consistent with the data.
Within the framework of the traditional approach, and based on the results of the above
diagnostic tests, the short-run partial adjustment model appears to be ‘well specified’ and
explains the short-run data generation process. The only point of concern is the statistical
insignificance of the relative price term, which needs further investigation.
We can now use the short-run estimated coefficients to derive the long-run parameters, as
follows:
α̂ 1 = −5.5571
α̂ 2 = 0.81046
α̂ 3 = −0.011308
α̂ 4 = 0.61780
(6.14)
The estimated coefficient of adjustment (speed) indicates a fairly sluggish adjustment process. The estimates of the other long-run parameters are as follows:
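For illustration, applying the relationships in Table 6.1 to these short-run estimates gives, approximately:

α̂ = 1 − α̂4 = 1 − 0.61780 ≈ 0.382
β̂1 = α̂1/α̂ = −5.5571/0.382 ≈ −14.5
β̂2 = α̂2/α̂ = 0.81046/0.382 ≈ 2.12
β̂3 = α̂3/α̂ = −0.011308/0.382 ≈ −0.030

so the implied long-run income elasticity is a little over 2, while the implied long-run price elasticity is very small in absolute value, consistent with the earlier comment on the statistical insignificance of the relative price term.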
effective policy option, given the above estimate of the price elasticity of demand for
imports.
1 The Chow ‘type’ test: designed to test for parameter stability when a possible break
point in the data/model can be identified ‘a priori’.
2 Tests based on recursive estimation methods. These tests include the CUSUM and CUSUM Q tests, and are carried out when the break point in the data/model is not known 'a priori'.
1 Estimate the model by OLS, using all n observations, to generate the residual sum of squares RSS.
2 Estimate the model separately over the two sub-samples:
a Estimate the model by OLS, using the n1 observations for the period before the 'break', to generate the residual sum of squares RSS1.
b Estimate the model by OLS, using the n2 observations for the period after the 'break', to generate the residual sum of squares RSS2.
In the absence of a structural break, the residual sum of squares of the entire period, RSS, must
be approximately the sum of the residual sum of squares of the two sub-sample periods, so:
(6.15)
Or, simply, the difference between RSS and (RSS1+ RSS2) must not be statistically signifi-
cant. A test statistic is therefore needed based on RSS−(RSS1+ RSS2). Chow showed that the
following test statistic has an ‘F’ distribution:
F = {[RSS − (RSS1 + RSS2)]/k} / [(RSS1 + RSS2)/(n − 2k)]     (6.15a)
where the numerator has a chi-squared distribution with k degrees of freedom and the denominator has a chi-squared distribution with n − 2k degrees of freedom; k is the number of parameters of the model (including the intercept term) and n is the number of observa-
tions. The ‘F’ test statistic is used to test for the stability of the regression parameters. The
test, however, is only reliable when the variance of the disturbance term has remained
unchanged over time. It is therefore customary to carry out a test to check for changes in the
variance of the disturbance term, and then use the Chow test. This is illustrated below:
(6.16)
with
For illustrative purposes, suppose that a possible break point in the data is at the end of
period n1. The regression model for the period n1+1 ,. . ., n, may be expressed with ‘new’
parameters as follows:
(6.17)
with
2 We first test for H0. If H0 is not rejected, we then test for the parameters’ stability. The
test statistic for this is based on the estimation of the error variance, using n1 and n2
observations in turn. Hence:
F = [RSS2/(n2 − k)] / [RSS1/(n1 − k)]     (6.18)
where:
RSS2 = residual sum of squares obtained from the OLS regression of the model over
the period n1+1,. . ., n (using n2 observations).
RSS1 = residual sum of squares obtained from the OLS regression of the model over
the period t = 1, . . ., n1, (using n1 observations).
If F > Fn2−k,n1−k(α), reject H0 at the α% level of significance.
3 If H0 is not rejected, use the Chow test for the parameter stability, as follows:
(6.19)
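A minimal sketch of the Chow procedure just outlined, using simulated placeholder data with a known break point; the break position, sample size and parameter values are arbitrary illustrations, not figures from the text.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def chow_test(y, X, break_index):
    """Chow test for a structural break at a known point: compare the full-sample RSS
    with the sum of the two sub-sample RSSs."""
    n, k = X.shape
    rss = sm.OLS(y, X).fit().ssr
    rss1 = sm.OLS(y[:break_index], X[:break_index]).fit().ssr
    rss2 = sm.OLS(y[break_index:], X[break_index:]).fit().ssr
    f_stat = ((rss - (rss1 + rss2)) / k) / ((rss1 + rss2) / (n - 2 * k))
    p_value = stats.f.sf(f_stat, k, n - 2 * k)
    return f_stat, p_value

# Hypothetical illustration: an intercept shift halfway through a simulated sample
rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = np.where(np.arange(100) < 50, 1.0, 3.0) + 0.5 * x + rng.normal(scale=0.5, size=100)
f, p = chow_test(y, sm.add_constant(x), break_index=50)
print(f"Chow F = {f:.2f}, p-value = {p:.4f}")
```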
Comments
Note that the Chow test is essentially a joint test requiring a constant variance for the
disturbance term over the sample period. There are occasions when the break point
in the data is such that it does not provide sufficient degrees of freedom for two separate
OLS regressions. This situation occurs when either n1<k or n2<k. Supposing n2<k, the
Chow test procedure may be summarised as follows: (n1<k may be dealt with in
the same way):
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
(6.20)
where Σet² is the sum of squares of the one-step-ahead forecast errors. To obtain the et, we first estimate the regression by OLS over the n1 sub-sample. We then use these parameter estimates, together with the values of the regressors in each period over n1+1, . . ., n (n2 observations), to obtain the one-step-ahead forecast errors.
Decision rule: If HF(n2) > χ²(n2) at the α% level of significance, reject H0.
This test is known as the Hendry forecast (HF) test. If H0 cannot be rejected, go to the
next step, as follows:
2 Second step.
Test statistic:
(6.21)
If F > Fn2−k,n1−k(α), reject H0 at the α% level of significance, concluding that H0 is not consistent with
the data.
(6.22)
for t = k+1,. . ., n.
where: β̂1, β̂2, β̂3 are the OLS estimates obtained from k observations.
It can be shown that a normalised form of the recursive residuals has the following
distribution:
Wt ~ NID(0, σ²)
(6.23)
where S is the full-sample estimate of the standard error of the regression, i.e.
If the residuals are random and small in magnitude we would expect the CUSUM statistic
to remain close to zero; any systematic departure from zero is taken to indicate parameter
instability/misspecification. Microfit 4/5 produces a graph of CUSUM statistic against
time. In practice, a systematic trend in the graph of CUSUM statistic against time is taken to
indicate a break/failure in the regression.
The CUSUM Q statistic is defined as:
(6.24)
RSSn = residual sum of squares of the OLS residuals for the full sample period.
The CUSUM Q statistic lies between 0 and 1. A random dispersion of the CUSUM Q
statistic, close to zero, within the band of zero and one, is indicative of parameter stability.
However, both CUSUM and CUSUM Q statistics are essentially used in practice as diagnos-
tics indicative of parameter stability. Tests based on these statistics have low power (are not
reliable for small samples). It is advisable to complement these parameter stability indicators
with a formal Chow test.
(6.25)
where:
We assume that both qt and pt are measured with random measurement errors; under this
framework, the observed values of the regressors may be expressed as follows:
qt = q*t + u1t     (6.26)
and
pt = p*t + u2t
where u1t and u2t are normally and independently distributed random measurement errors
with the following characteristics:
u1t ~ NID(0, σ²u1) and u2t ~ NID(0, σ²u2); also cov(u1t, u2t) = 0
yt = α + βq*t + γp*t + ut
Substitution for the observed values of the regressors into the above equation yields:
(6.27)
or
(6.28)
(6.29)
Is it possible to estimate this model by the OLS method? The OLS method is appropriate
only if all standard assumptions are valid. If there are indeed random measurement errors in
the regressors, the regressors cannot be assumed to be fixed under repeated sampling
procedures. In fact, they are stochastic/random variables. Moreover, the regressors are
correlated with the composite error term Wt, violating a standard assumption of the
classical linear regression model. In this case, it can be shown that OLS estimators
are biased and inconsistent. Applying the OLS method, therefore, results in unreliable
estimates and faulty inference. In applied work, when dealing with regressors which
are likely to have been measured with errors, it is crucial that we undertake a test for measure-
ment errors. If measurement errors are present, we should use an alternative estimation
method.
The problem with the regression model under consideration is that it includes regressors, qt
and pt, each containing random measurement errors, resulting in inconsistent OLS estima-
tors. The inconsistency of the OLS estimation is due to the existence of correlation between
each regressor and the composite error term, Wt. Within the framework of the IV method we
replace each one of the stochastic regressors with an ‘instrument’. An instrument for a
stochastic/random regressor is a variable which is highly correlated with the regressor, but
not correlated with the disturbance term. With regard to the regression model under
consideration, we need two instruments, one for qt and one for pt, with the following
characteristics:
Instrument for qt:
Finding an instrument for stochastic variables appearing in the model is a difficult task in
practice. However, with respect to the model under consideration, we use the lagged values
of qt and pt as the respective instruments.
and
(6.30)
also
and
(6.31)
(6.32)
(6.33)
The solution to these three simultaneous equations generates the instrumental variable esti-
mators. Most econometric regression packages have a sub-routine for IV estimation. In prac-
tice, once the instruments are identified, the estimation process itself is done by the
appropriate regression packages.
(6.34)
α̂ IV, β̂IV, γ̂ IV
If measurement errors are not present, both methods of estimation would generate consistent
estimators. This observation enables us to formulate the null and alternative hypotheses of
the test as follows:
or:
H0: plim(α̂IV − α̂OLS) = 0;
plim(β̂IV − β̂OLS) = 0;
plim(γ̂IV − γ̂OLS) = 0
The final format of H0 makes the point that, in the absence of measurement errors, the OLS and
the IV estimators are both consistent estimators. In the absence of measurement errors, the two
estimators are asymptotically identical. It can be shown that the difference between the OLS and
IV estimators is zero, if the instruments and the OLS residuals are uncorrelated. To carry out a
diagnostic test for the measurement errors, the instruments are added to the regression model to
generate an expanded equation. The joint significance of the instruments is then tested by the familiar F-test for parameter restrictions. If the instruments are found to be insignificant, this is taken to mean that they are uncorrelated with the OLS residuals. Therefore, the IV and the OLS estimators are essentially the same, implying that there are no measurement errors in the regressors. In this situation both the IV and OLS estimators are consistent; however, the OLS estimators have smaller variances. We now illustrate the Hausman diagnostic test for measurement error.
or
(6.35)
(6.36)
H0: β1 = β2 = 0
No measurement errors [i.e. instrument and the OLS residuals (wt) are not correlated].
H1: H0 is not true
(6.37)
(6.38)
R2 = 0.86
n = 45
5 Test statistic.
(6.39)
(6.40)
Decision rule: if F > Fd,n−k(α), reject H0 at the α level of significance, concluding that H0 is not consistent with the data: measurement errors are present in the regressors. Rejection of the null hypothesis implies that the OLS estimators are biased and inconsistent and the IV method should be employed.
In our example, the critical value is F0.05(2, 40) = 3.23.
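A minimal sketch of this expanded-equation version of the test, using simulated placeholder data: the lagged regressors serve as the instruments, as suggested in the text, and the joint significance of the two added instruments is assessed with the F-test for parameter restrictions (here with 45 usable observations, so the reference distribution is F(2, 40), as in the example above).

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def hausman_type_test(y, q, p):
    """Expanded-equation diagnostic: add the lagged regressors as instruments and
    F-test their joint significance against the original (restricted) equation."""
    y_t, q_t, p_t = y[1:], q[1:], p[1:]            # current values, first observation lost
    q_lag, p_lag = q[:-1], p[:-1]                  # lagged values used as instruments
    X_r = sm.add_constant(np.column_stack([q_t, p_t]))                # restricted equation
    X_u = sm.add_constant(np.column_stack([q_t, p_t, q_lag, p_lag]))  # expanded equation
    rss_r = sm.OLS(y_t, X_r).fit().ssr
    rss_u = sm.OLS(y_t, X_u).fit().ssr
    n, k = X_u.shape
    d = 2                                          # two added instruments = two restrictions
    f_stat = ((rss_r - rss_u) / d) / (rss_u / (n - k))
    return f_stat, stats.f.sf(f_stat, d, n - k)

# Hypothetical illustration: persistent 'true' regressors observed with measurement error
rng = np.random.default_rng(6)
n = 46
q_star = np.cumsum(rng.normal(size=n)) * 0.1 + 5
p_star = np.cumsum(rng.normal(size=n)) * 0.1 + 2
y = 1 + 0.8 * q_star - 0.5 * p_star + rng.normal(scale=0.2, size=n)
q = q_star + rng.normal(scale=0.3, size=n)
p = p_star + rng.normal(scale=0.3, size=n)
f, pval = hausman_type_test(y, q, p)
print(f"F(2, {n - 1 - 5}) = {f:.2f}, p-value = {pval:.4f}")
```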
Review questions
1 Explain what you understand by the term ‘spurious regression’. How does the tradi-
tional approach (SG) deal with this problem in practice?
2 Explain why there is a need to modify the static long-run regression models. How does
this modification usually take place in practice? What are the shortcomings of this
methodology?
3 Explain the partial adjustment mechanism and discuss how this mechanism might be
used to make a regression model dynamic. What are the shortcomings of this adjustment
process.
4 Explain the Durbin h-test. Why is there a need for this test?
5 Explain how you would use each of the following diagnostic tests in practice:
a the Chow test for parameter stability
b the CUSUM and CUSUM Q tests
c the Hausman test for measurement error.
• Specification tests These involve specific alternative forms of the regression model, for example, 't' and 'F' tests of parameter restrictions.
• Misspecification tests These tests do not involve specific alternative forms of the model.
They are used to detect inadequate specification. For example, the Durbin-Watson test,
and the Lagrange multiplier tests for autocorrelation.
• Serial correlation/autocorrelation The values of the same random variable are corre-
lated over time/space.
• The Durbin-Watson test A misspecification test designed to be used with a first-order
autoregressive process to detect autocorrelation.
• The Lagrange multiplier test A misspecification test designed to detect higher order
autocorrelation (large sample test).
• The Ramsey regression specification error test (RESET) A misspecification test designed
to detect inadequate specification, including omitted variables and incorrect functional
forms.
• Heteroscedasticity A breakdown of homoscedasticity assumption. The variance of the
disturbance term/dependent variable changes over cross-sectional units/time.
• The Koenker-Bassett test A test commonly used to check the assumption of homoscedasticity.
• The Chow test for a structural break A test of parameter stability over time.
• CUSUM and CUSUM Q tests Diagnostic tests for parameter stability.
• Measurement errors Measurement errors in regressors, resulting in inconsistent OLS
estimators.
• The Hausman test for measurement errors A general misspecification test for measure-
ment errors/exogeneity.
• Instrumental variable (IV) estimation Estimation method used when the OLS method is
inconsistent, for example, when measurement errors are present.
• The spurious regression phenomenon A regression model with no underlying stationary
data generation process to support the theory. The regression results are spurious and
meaningless.
7 Dynamic econometric modelling
The distributed lag models
INTRODUCTION
In the previous chapter we introduced the partial adjustment model, a popular method used
in practice to approximate the short-run dynamic adjustments and hence the data generation
process. This chapter continues with the SG methodology and explains in detail how this
methodology deals with the short-run dynamic adjustments via the introduction of various
lag structures into the regression models.
Linearity of the model The general form of the linear-in-parameters multiple regression
model is:
Yt = β0 + β1X1t + β2X2t + . . . + βkXkt + εt,  t = 1, 2, . . ., N     (7.1)
where Yt is the dependent variable; Xjt (for j=1, 2,. . ., k) are k independent, or
explanatory, variables; εt is the stochastic disturbance, or error term; βj (for j = 0, 1,. . ., k)
are k+1 unknown parameters to be estimated, the so-called (partial) regression coefficients,
with β0 being the intercept; and t indicates the t-th observation, N being the size of the
sample.
(7.2)
(7.3)
(7.5)
Assumptions involving the explanatory variables
Under these assumptions the OLS estimators β̂j (for j = 0, 1, . . ., k) of the corresponding unknown parameters βj (for j = 0, 1, . . ., k) are best linear unbiased estimators (BLUE).
In the case that t indicates time, or, in other words, the regression model involves time
series variables, model (7.1) assumes that the current value of variable Yt depends on the
current values of all explanatory variables included in the model. However, this is not always
true. In various economic phenomena the current value of a variable may depend on the
current values and/or on past values of some explanatory variables as well. This is because
the adjustment towards an equilibrium state might well be sluggish, requiring a number of
periods to be completed. In this situation, within the framework of the traditional approach,
the regression model is modified to capture the short-run dynamic adjustment process
contained in the time series data. These modifications are carried out in an ad hoc fashion
and then imposed on the regression model, resulting in the so-called distributed lag models.
In this chapter we will examine this type of model, explaining applications in economics,
estimation procedures and diagnostic testing. A number of diagnostic tests will be revisited
to show their applications to the evaluation of the distributed lag models. All procedures will
be demonstrated with examples.
Key topics
• Finite distributed lag (DL) models
• Infinite distributed lag (DL) models
• Partial adjustment and adaptive expectation models
Case 7.1 The consumption function The simplest linear formulation of the consumption
function is:
Ct = α0 + β0Yt + ut     (7.10)
where Ct = private consumption, Yt = personal disposable income, ut = disturbance term,
α0 > 0, and 0 < β0 <1. The ratio Ct/Yt is the ‘average propensity to consume’ and the first
derivative ∂Ct/∂Yt = β0 is the ‘marginal propensity to consume’.
From (7.10) it is seen that the current value of consumption depends on the current value
of income only, and not on the current values of any other variable. However, this may be
untrue, taking into account that current consumption may also depend on the current level of
savings. In such a case, the consumption function could be written as
(7.11)
(7.12)
(7.13)
(7.14)
Case 7.2 The accelerator model of investment The accelerator model of investment, in its
simplest form, asserts that there exists a fixed relationship between net investment and
change in output. This relationship is written as:
(7.15)
(7.16)
Case 7.3 The quantity theory of money The quantity theory of money asserts that the price
level in an economy is proportional to the quantity of money of this economy. This can be
derived from Irving Fisher’s (1867–1947) equation:
MtVt = PtQt     (7.17)
where Mt = nominal money stock, Vt = velocity of money, Pt = overall price level, and Qt =
real output, which can be written in natural logarithms as:
ln Mt + ln Vt = ln Pt + ln Qt     (7.18)
(7.19)
(7.20)
Monetarists argue that money growth is the major factor in determining inflation, treating the other two factors as negligible. Moreover, (7.20) implies that:
1 inflation rate is less than the nominal money changes when the combined effect of
output and velocity changes is positive;
2 inflation rate is equal to nominal money changes when the combined effect of output
and velocity changes is zero;
3 inflation rate is higher than nominal money changes when the combined effect of output
and velocity changes is negative.
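These three statements follow from writing the logarithmic form of the equation of exchange in (approximate) growth rates. Using lower-case letters for proportional rates of change, a sketch of the argument is:

p = m + v − q = m − (q − v)

so that inflation p falls short of money growth m when the combined effect of output and velocity changes, (q − v), is positive; equals m when (q − v) is zero; and exceeds m when (q − v) is negative.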
(7.21)
Although Equation (7.21) assumes that the change of money stock affects inflation instanta-
neously, in reality, the response of inflation to changes in money stock is spread over time,
and thus Equation (7.21) is written as:
(7.22)
Case 7.4 The Phillips curve The original Phillips (1958) curve describes an empirical rela-
tionship between the rate of change in money wages and the rate of unemployment as a
percentage of labour force. The higher the rate of unemployment, the lower the rate of
change in money wages. This relationship is written as:
(7.23)
where pt = (Pt − Pt−1)/Pt−1, with Pt = overall price level, and βj < 0 and γj > 0 are parameters.
In all these cases we saw that the dependent variable of the economic phenomenon (function)
was determined not only by variables dated at the current time point (instantaneous adjustment)
but also by variables dated at past time points (continuous or dynamic adjustment). The
difference between the current time point and a past time point is called the 'time lag', or
simply the 'lag', and the corresponding variable is called the 'lagged variable'.
Distributed lag models In these models the explanatory variables include only current and
lagged values of the independent variables. For example, if we consider one dependent and
one independent variable, the model is of the form:
(7.25)
Equations (7.13), (7.16), (7.22) and (7.24) are examples of distributed lag models.
Autoregressive or dynamic models In these models the explanatory variables include one or
more lagged values of the dependent variable. For example:
(7.26)
Infinite lag models This is the case when k is infinite. The model is written:
(7.27)
Finite lag models This is the case when k is finite. The model is written:
(7.28)
In both cases, infinite or finite lag models, to avoid cases of explosive values of E(Y), we
make the assumption that the sum of the βj coefficients is finite, i.e.
β0 + β1 + β2 + . . . = β, with β finite    (7.29)
Let us explain the regression coefficients of a distributed lag model, say model (7.25). In this
model, under the assumption of ceteris paribus, if the independent variable Xt is increased by
one unit in period t, the impact of this change on E(Yt) will be β0 in time t, β1 in time t+1, β2
in time t+2, and so on. We define this impact as follows:
Partial multipliers of order i It is the marginal effect of Xt−i on Yt, i.e. it is equal
to ∂Yt/∂Xt−i = βi. In other words, these multipliers show the effect on E(Yt) of a unit increase
in X made i periods prior to period t.
Short-run or impact multiplier It is the partial multiplier of order i = 0, i.e. it is equal to β0.
In words, this multiplier shows the effect on E(Yt) of a unit increase in Xt made in the same
period t.
Interim or intermediate multipliers of order i It is the sum of the first i partial multipliers,
i.e. it is equal to β0 + β1+ . . . + βi. In other words, these multipliers show the effect on E(Yt)
of a maintained unit increase in Xt for i periods prior to period t.
Long-run or total or equilibrium multiplier It is the sum of all partial multipliers of the
distributed lag model, i.e. it is equal to β, as defined in (7.29). In words, this multiplier shows
the effect on E(Yt) of a maintained unit increase in Xt for all periods.
Since the partial multipliers are actually the corresponding partial regression coefficients,
these coefficients depend on the units of measurement of the independent variable Xt. A
method for expressing these coefficients in a form free of the units of measurement is to
transform them into the following:
Standardised coefficients or lag weights They are the coefficients that are derived from the
transformation:
wi = βi / β, i = 0, 1, 2, . . .    (7.30)
and they show the proportion of the equilibrium multiplier realised by a specific time period.
Inserting (7.30) into (7.27) or (7.28), these models are written as:
(7.31)
Having defined the lag weights, the following statistics that characterise the nature of the lag
distribution can be also defined:
Mean or average lag It is the weighted average of all lags involved, with the weights being
the standardised coefficients:
(7.32)
and it shows the average speed with which Yt responds to a unit sustained change in Xt,
provided that all regression coefficients are positive.
Median lag It is the statistic that shows the time required for 50 per cent of the total change
in Yt to be realised after a unit sustained change in Xt, and is given by:
(7.33)
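To make these definitions concrete, the following short Python sketch, using a purely hypothetical vector of estimated lag coefficients, computes the impact, interim and equilibrium multipliers, the lag weights, and discrete approximations of the mean and median lag. The coefficient values and variable names are illustrative only and are not taken from the text.

    import numpy as np

    # Hypothetical estimated lag coefficients b0, b1, ..., bk (illustrative values)
    beta = np.array([0.40, 0.25, 0.15, 0.10, 0.05])

    impact = beta[0]                 # short-run (impact) multiplier
    interim = np.cumsum(beta)        # interim multipliers of order 0, 1, ..., k
    total = beta.sum()               # long-run (equilibrium) multiplier

    w = beta / total                 # standardised coefficients (lag weights)
    lags = np.arange(len(beta))
    mean_lag = np.sum(lags * w)      # weighted average of the lags, as in (7.32)
    # discrete approximation of the median lag: first lag at which 50% is realised
    median_lag = lags[np.cumsum(w) >= 0.5][0]

    print(impact, interim, total, mean_lag, median_lag)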
(7.34)
The approaches for estimating this model are usually grouped into the following two
categories:
Unrestricted approaches These approaches refer to the case where the lag length is finite and
no specific restrictions about the nature of the lag pattern are imposed on the β coefficients of
the model. We can distinguish two cases: known lag length, and unknown lag length.
Restricted approaches These approaches refer to the case where specific restrictions about
the nature of the lag pattern are imposed on the β coefficients of the model. We can distin-
guish two cases: finite lag length, and infinite lag length.
Example 7.1 The quantity theory of money for Greece, 1960–1995 (known
lag length)
Suppose that the response of inflation to changes in money stock in Greece is spread
over three years. Therefore, Equation (7.22) of the quantity theory of money is written:
(7.35)
Table 7.1 presents annual data on the implicit price deflator for GNP (P) and
nominal money stock (M1) from 1960 to 1995. By first computing the rate of inflation
Table 7.1 GNP price deflator and nominal money stock for Greece
as pt = (Pt − Pt−1)/Pt−1 and the rate of money change as mt = (M1t − M1t−1)/M1t−1, OLS
produces the following results. (The values in the parentheses below the estimated coef-
ficients are asymptotic t-ratios.)
(7.36)
(3.15) (2.72) (3.92) (2.36) (2.68)
R̄2 = 0.5122  DW = 0.7980  F = 9.1381
where R̄2 = coefficient of determination adjusted for degrees of freedom, DW = Durbin-
Watson d-statistic, and F = F-statistic.
Apart from the low Durbin-Watson statistic, which suggests some autocorrelation,
the results in (7.36) suggest that the response of inflation to changes in money stock in
Greece is significantly spread over three years.
(7.37)
(7.38)
(7.39)
where n = sample size, q = total number of coefficients in the regression model, and SSR =
sum of squared residuals.
Although these criteria reward goodness of fit but impose a penalty for extra coefficients
included in the model, it is quite possible that conclusions based on these (and other) criteria
will differ: a model could be ranked superior under one criterion and inferior under another.
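As an illustration, the Python sketch below computes AIC- and SC-type criteria from the sum of squared residuals. Since the exact expressions in (7.37) to (7.39) are not reproduced here, one common textbook variant (log of SSR/n plus a penalty in the number of coefficients q) is assumed; the function name and the numbers are hypothetical, so this should be read as a sketch rather than as the text's own definitions.

    import numpy as np

    def info_criteria(ssr, n, q):
        # One common variant of the criteria: log(SSR/n) plus a penalty in q.
        # The adjusted R-squared also needs the total sum of squares, so only
        # AIC- and SC-type values are returned here.
        aic = np.log(ssr / n) + 2.0 * q / n
        sc = np.log(ssr / n) + q * np.log(n) / n
        return aic, sc

    # Hypothetical comparison of two lag lengths with n = 35 observations
    print(info_criteria(ssr=0.075, n=35, q=7))   # e.g. a 5-period lag
    print(info_criteria(ssr=0.081, n=35, q=6))   # e.g. a 4-period lag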
Example 7.2 The quantity theory of money for Greece, 1960–1995 (unknown lag
length)
Suppose that the spread of the response of inflation to changes in money stock in
Greece is not known. In this case we will try to estimate the lag length of the distrib-
uted lag model (7.34) by optimising the three criteria (7.37) to (7.39).
Table 7.2 presents the values of the three criteria by applying OLS to Equation
(7.34) for various lag lengths. According to these values, the R̄2 and AIC criteria
favour the 5-period lag length, whilst SC favours the 4-period lag length.
However, the differences between these values are very small; thus, a 5-period
lag length seems acceptable for the distributed lag model. The estimation of this
model is presented below:
(7.40)
Although the various criteria indicate that the ‘best’ model is that with a 5-period distrib-
uted lag pattern, the actual estimation of this model in (7.40) shows that as the lag length
increases, the t-ratios of the estimated coefficients corresponding to the lag variables
decrease, indicating that these coefficients are statistically insignificant. This may
possibly be due to the fact that as the lag length increases, the degrees of freedom
decrease, and the introduction of more lagged variables will possibly introduce some
multicollinearity between the independent variables.
Table 7.2 Estimation results for the inflation-money growth equation for Greece
Lag length    R̄2    AIC    SC
1 Low degrees of freedom The higher the lag length, the lower the degrees of freedom.
Lower degrees of freedom imply lower precision (lower efficiency) of the estimates,
and therefore lower precision of the tests of hypotheses.
2 Multicollinearity The higher the lag length, the higher the chance that successive lagged
variables are correlated. Multicollinearity leads to lower precision of the estimates
(higher standard errors) and therefore lower precision of the tests of hypotheses.
In summary, the introduction of higher lag lengths may erroneously lead to the rejection of
estimated coefficients as being statistically insignificant, due to the lower precision of the
tests of hypotheses. In order to bypass this problem of lower precision of estimates, specific
restrictions about the nature of the lag pattern must be imposed on the β coefficients of the
distributed lag model. These specific restrictions are discussed in the following sections.
Arithmetic lag scheme This is the case (Fisher, 1937) when the weights linearly decrease
according to the scheme:
βi = (k + 1 − i)β, for i = 0, 1, 2, . . ., k    (7.41)
The rationale of this scheme is that more recent values of the explanatory variable have greater
influence on the dependent variable than more remote values. Similarly, an increasing
arithmetic scheme can be constructed. By inserting (7.41) into (7.34) the distributed lag
model is written:
(7.42)
where the substitution for Zt is obvious. By applying OLS in (7.42) the estimate b of β is
obtained and, therefore, the estimates bi = (k+1−i) b, for i = 0, 1, 2,. . ., k, for the parameters
βi can be correspondingly calculated.
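A minimal Python sketch of this procedure is given below: artificial data are generated with decreasing arithmetic weights, the composite variable Zt is constructed, the single coefficient β is estimated by OLS, and the individual lag coefficients are then recovered as bi = (k + 1 − i)b. The data, sample size and numerical values are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 120, 4
    x = rng.normal(size=n + k)

    # Artificial data with true decreasing arithmetic weights beta_i = (k+1-i)*beta
    alpha, beta = 1.0, 0.05
    true_b = np.array([(k + 1 - i) * beta for i in range(k + 1)])
    y = alpha + sum(true_b[i] * x[k - i: k - i + n] for i in range(k + 1)) \
        + rng.normal(scale=0.1, size=n)

    # Z_t = sum_i (k+1-i) * x_{t-i}; regress y on a constant and Z
    Z = sum((k + 1 - i) * x[k - i: k - i + n] for i in range(k + 1))
    X = np.column_stack([np.ones(n), Z])
    a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

    # Recover the individual lag coefficients from the single estimate b_hat
    b_i = np.array([(k + 1 - i) * b_hat for i in range(k + 1)])
    print(b_hat, b_i)   # b_i should be close to true_b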
Inverted V lag scheme According to this case (DeLeeuw, 1962), the weights linearly
increase for most recent lags and then they decrease for more remote lags. The scheme, for
k being even, is the following:
βi = (1 + i)β for i = 0, 1, . . ., k/2, and βi = (k + 1 − i)β for i = k/2 + 1, . . ., k    (7.43)
(7.44)
where the substitution for Zt is obvious. By applying OLS in (7.44) the estimate b of β is
obtained and, therefore, the estimates bi = (1+i) b, for i = 0, 1, 2,. . ., k/2, and bi = (k+1−i) b,
for i = k/2+1,. . ., k, for the parameters βi can be correspondingly calculated.
The approach of arbitrarily assigning weights suffers from the following limitations:
1 The actual shape of the lag structure must be known on a theoretical basis, i.e. whether
it increases or decreases, or follows the inverted V scheme.
2 The mechanism for assigning specific weights to the lags must also be known from prior
information, i.e. whether the weights come from a linear or an exponential mechanism, etc.
Example 7.3 The quantity theory of money for Greece, 1960–1995 (arbitrary
weights)
Using the data of Table 7.1 the estimates for each arbitrary weights distributed lag
scheme, say for a 6-period distributed lag equation, are the following:
(7.45)
(3.07) (5.63)
R̄2 = 0.5226  DW = 1.1313  AIC = −6.1650  SC = −6.0707
Increasing arithmetic lag scheme Applying (7.41) in an increasing order, the Zt
variable is:
(0.32) (2.18)
R̄2 = 0.1177  DW = 0.6380  AIC = −5.5509  SC = −5.4566  (7.46)
(1.69) (3.78)
R̄2 = 0.3215  DW = 0.7663  AIC = −5.8136  SC = −5.7193  (7.47)
Although all three estimates suffer from considerable autocorrelation, the results of
the decreasing arithmetic lag scheme (7.45) are preferable, according to all statistical
criteria. Therefore, by applying the formula (7.41), and considering that:
(7.48)
(7.49)
(7.50)
According to (7.50), a second degree polynomial could approximate the lag structure shown
in Figure 7.1(a), and a third degree polynomial could approximate the lag structure shown in
Figure 7.1(b). Generally, the degree of the polynomial should be one greater than the number of
turning points shown by the lag structure.
Substituting (7.50) into (7.34) we obtain:
Figure 7.1 Second (a) and third (b) order polynomial approximation of distributed lag structures.
or
or finally:
(7.51)
where the substitution of the Z's is obvious. By considering that (7.51) satisfies the classical
assumptions presented in Section 7.1, OLS can be applied to this equation and the best linear
unbiased estimates, a and ai, of the α and αi coefficients, will correspondingly be obtained.
Having estimated the αi coefficients by the ai, the actual estimates, bi, of the βi coefficients
can be calculated by using (7.50), as follows:
(7.52)
(7.53)
The estimates in (7.52) or (7.53) are ‘restricted least squares estimates’, according to the Almon
distributed lag approach, because they are restricted to fall on a polynomial of degree r.
Since the variances, var(ai), and the covariances, cov(aj,ah), of the estimated coefficients,
ai, can be derived by applying the OLS method to Equation (7.51), the variances, var(bi), of
the estimated coefficients, bi, can be derived by the following formula:
(7.54)
These variances are used in the significance tests referring to the βi coefficients. Therefore,
it is quite possible that some of the βi coefficients are significant although some of the αi
coefficients are insignificant.
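The following Python sketch illustrates the Almon procedure on artificial data: the Zj variables are constructed from powers of the lag index, the aj coefficients are estimated by OLS, the restricted estimates bi are recovered from the polynomial, and their variances are formed from the covariance matrix of the aj's in the spirit of (7.54). All data and numerical values are hypothetical, and the code is only a sketch of the approach described above.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k, r = 150, 5, 2
    x = rng.normal(size=n + k)

    # Artificial data whose lag coefficients lie on a quadratic in the lag index i
    i_idx = np.arange(k + 1)
    true_b = 0.42 + 0.06 * i_idx - 0.02 * i_idx ** 2
    y = 0.5 + sum(true_b[i] * x[k - i: k - i + n] for i in range(k + 1)) \
        + rng.normal(scale=0.1, size=n)

    # Almon transformation: Z_{j,t} = sum_i i**j * x_{t-i}, j = 0, ..., r
    Z = np.column_stack([sum((i ** j) * x[k - i: k - i + n] for i in range(k + 1))
                         for j in range(r + 1)])
    X = np.column_stack([np.ones(n), Z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    a = coef[1:]                                  # estimates of a_0, ..., a_r

    # Restricted estimates b_i = a_0 + a_1*i + ... + a_r*i**r and their variances
    P = np.column_stack([i_idx ** j for j in range(r + 1)])   # rows (1, i, i**2, ...)
    b = P @ a
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    V = sigma2 * np.linalg.inv(X.T @ X)[1:, 1:]   # covariance matrix of the a's
    var_b = np.einsum('ij,jk,ik->i', P, V, P)     # var(b_i) = p_i' V p_i
    print(np.round(b, 3), np.round(np.sqrt(var_b), 3))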
Apart from the restriction of the βi coefficients falling on the polynomial, it is common
in some cases to impose extra restrictions on the coefficients. These extra restrictions,
which are called 'endpoint restrictions', force the endpoint βi coefficients to equal zero,
i.e., β−1 = 0 and βk+1 = 0, or using (7.50) the endpoint restrictions are written:
(7.55)
(7.56)
By incorporating these restrictions (both, or either) into Equation (7.51), the OLS method on
the ‘endpoint restricted equation’ can be applied. However, it must be noted here that the
endpoint restrictions must be used with caution because these restrictions will have an
impact not only on the endpoint coefficients but also on all the coefficients of the model.
In the discussion so far about the polynomial, or Almon, distributed lag models we assumed
that we know both the lag length of the model and the degree of the polynomial. However, in
most cases these two parameters are unknown, and, thus, have to be approximated.
Determining the lag length This approach has been presented in Section 7.4.2. In
summary, searching for the lag length is a problem of testing nested hypotheses. We start by
running a regression with a very large value of k. Then, by lowering the value of k by one at
a time, we run the corresponding regressions and seek to optimise a criterion, such as R̄2,
AIC, or SC (see (7.37) to (7.39)).
Determining the degree of the polynomial Similar to determining the lag length, searching
for the degree of the polynomial is a problem of testing nested hypotheses. Given the lag
length k, we start by running a regression with a large value of r. Then, by lowering the value
of r by one at a time, we run the corresponding regressions and seek to optimise a criterion,
such as R̄2, AIC, or SC. In practice, fairly low degrees of the polynomial (r = 2 or r = 3) give
good results.
Misspecification problems In cases when the lag length, or the degree of the polynomial,
or both, have been incorrectly determined, specification problems arise. Although these
problems depend on various specific cases, they may be generally summarised as follows:
If k* is the true lag length of the distributed lag model, then if k > k* (inclusion of irrelevant
variables) the estimates are unbiased and consistent, but inefficient; if k < k* (exclusion of
relevant variables) the estimates are generally biased, inconsistent and inefficient.
If r* is the true degree of the polynomial, then if r < r* (imposing invalid restrictions) the
estimates are generally biased and inefficient; if r > r* (over-imposing restrictions) the esti-
mates are unbiased but inefficient.
Example 7.4 The quantity theory of money for Greece, 1960–1995 (polynomial
distributed lag model)
Using the data of Table 7.1, the steps involved in applying the polynomial distributed
lag model are the following:
1 Determining the lag length The lag length of this model has been determined in
Example 7.2, where a lag length of k = 5 was found to be acceptable.
(7.57)
(7.58)
3 Determining the degree of the polynomial Starting from r = 5 and lowering this
assumed degree of the polynomial by one at a time, we run the corresponding
regression equations, starting from:
(7.59)
Table 7.3 presents the values of the three criteria by applying OLS to Equation (7.59)
for various degrees of the polynomial. All criteria agree that the degree of the polyno-
mial is equal to 2.
4 Estimates of the α’s Having determined that the lag length is k = 5 and the degree
of the polynomial is r = 2, the actual estimates are given by:
(7.60)
R̄2 = 0.5754  DW = 0.9690  F = 14.0993  AIC = −6.1949  SC = −6.0081
b0 = f(0) = a0 = 0.4186
b1 = f(1) = a0 + a1 + a2 = 0.4542
b2 = f(2) = a0 + 2a1 +4a2 = 0.4207
b3 = f(3) = a0 + 3a1 + 9a2 = 0.3181
b4 = f(4) = a0 + 4a1 + 16a2 = 0.1463
b5 = f(5) = a0 + 5a1 + 25a2 = 0.0946
7 The endpoint restrictions For comparison purposes, Table 7.4 presents the results
from the unrestricted distributed lag estimation in (7.40), from the polynomial
distributed lag estimation in (7.61) and for the polynomial distributed lag estima-
tion where both endpoint restrictions (7.55) and (7.56) have been applied.
Obviously, other estimations could also have been presented, such as polynomial distrib-
uted lag estimations with one endpoint restriction only, or with the sum of the
distributed lag weights being equal to one, and so on.
Figure 7.2 shows the three estimates presented in Table 7.4. The distribution of the lag
weights of the unrestricted distributed lag estimation (UNR) indicates one substantial
turning point, and, thus, the polynomial for the polynomial distributed lag estimation
(PDL) has been determined as degree equal to two. Finally, the polynomial distributed
Table 7.4 Various estimated models for the inflation-money growth equation for Greece
Variable/statistic    Unrestricted (UNR)    Polynomial lag (PDL)    Polynomial lag with endpoint restrictions (EPR)
Figure 7.2 Distributed lag estimates for the inflation-money growth equation for Greece.
lag estimation with both endpoint restrictions restricts the distribution of the lag
weights to a smooth, symmetric quadratic curve.
(7.62)
In introducing model (7.62), although there is no longer a problem of determining the lag length,
a new problem arises: that of estimating an infinite number of parameters, the βi, using a finite
number of observations. Therefore, methods should be employed to reduce the number of
estimable parameters from infinite to finite. In this section we will present several such methods.
βi = β0λ^i, i = 0, 1, 2, . . ., with 0 < λ < 1    (7.63)
With the λ parameter, which is known as the ‘rate of change’, being positive and less than one,
it is guaranteed that the values of the β's corresponding to greater lags will be smaller than
those corresponding to smaller lags; as the lag tends to infinity, the lag weight tends to zero. Taking
into account that β0 is common to all coefficients, the declining nature of the geometric scheme
towards zero can be seen numerically and graphically in Table 7.5 and Figure 7.3 respec-
tively. It can also be seen that the greater the value of λ, the slower the decline of the series.
Substituting (7.63) into (7.62) we get:
or
(7.64)
(7.65)
Table 7.5 Successive values of λ^i for the geometric lag scheme
Lag i       0      1      2      3      4      5      6      7      8      9      10
λ = 0.25    1  0.250  0.063  0.016  0.004  0.001  0.000  0.000  0.000  0.000  0.000
λ = 0.50    1  0.500  0.250  0.125  0.063  0.031  0.016  0.008  0.004  0.002  0.001
λ = 0.75    1  0.750  0.563  0.422  0.316  0.237  0.178  0.133  0.100  0.075  0.056
Figure 7.3 Successive values of λ^i against the lag, for λ = 0.25, 0.50 and 0.75.
Multiplying both terms of (7.65) by λ and subtracting the result from (7.64) we obtain:
(7.66)
(7.67)
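A small Python sketch of the geometric scheme and the resulting Koyck-type equation is given below: data are simulated with weights β0λ^i and the transformed equation with Xt and Yt−1 as regressors is then fitted by plain OLS. The numbers are hypothetical, and, as discussed later in the chapter, OLS here is only illustrative, since the lagged dependent variable is correlated with the moving average error of the transformed equation.

    import numpy as np

    rng = np.random.default_rng(2)
    alpha, beta0, lam = 2.0, 0.4, 0.6
    horizon, n_extra = 50, 450
    x = rng.normal(size=horizon + n_extra)
    u = rng.normal(scale=0.1, size=horizon + n_extra)

    # Simulate Y_t = alpha + beta0 * sum_i lam**i * X_{t-i} + u_t (sum truncated)
    weights = beta0 * lam ** np.arange(horizon)
    y = np.array([alpha + np.dot(weights, x[t - np.arange(horizon)]) + u[t]
                  for t in range(horizon, horizon + n_extra)])
    x = x[horizon:]          # align the current X_t with each simulated Y_t

    # Koyck-transformed equation: Y_t = alpha*(1-lam) + beta0*X_t + lam*Y_{t-1} + v_t
    X = np.column_stack([np.ones(len(y) - 1), x[1:], y[:-1]])
    c, b0_hat, lam_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    print(b0_hat, lam_hat, c / (1 - lam_hat))   # compare with beta0, lam, alpha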
Mean lag Using formulas (7.30) and (7.32), the mean lag is equal to:
(7.68)
Median lag Using formula (7.33), and taking into account that:
(7.69)
Equilibrium multiplier By using formula (7.29) the long-run multiplier is equal to:
(7.70)
This is why some authors, instead of the geometric scheme (7.63), use the following scheme:
(7.71)
(7.72)
(7.73)
Substituting (7.72) into (7.62) we get the Pascal distributed lag model:
(7.74)
This model has four unknown parameters: α, β, λ, and r. In the case when r = 1, formula (7.73)
is written wi = (1 – λ)λi and, thus, the Pascal distributed lag model is reduced to the geometric
distributed lag model. This means that by giving various values for the r parameter we can
shape the distribution of the lag weights in a way which seems more suitable for the case.
Figure 7.4 shows the distribution of these weights for λ = 0.4 and r = 1, 3, 5, respectively.
In the case that r = 2, (7.74) is written:
(7.75)
Firstly, by multiplying both terms of the once-lagged Equation (7.75) by −2λ, secondly, by
multiplying both terms of the twice-lagged Equation (7.75) by λ2, thirdly, by adding these
two results to Equation (7.75), and, finally, by rearranging we get:
Figure 7.4 Pascal distributed lag weights against the lag, for λ = 0.4 and r = 1, 3, 5.
(7.76)
Ignoring here all the possible problems that the estimation of (7.76) creates, including the
problem of overidentification of the parameters, we could say that by introducing the Pascal
scheme with r = 2 into model (7.62), this infinite distributed lag model has been transformed
into an autoregressive model, where three parameters only have to be estimated: α, β and λ.
Although not very practical for empirical research, generalising (7.76) to include any
positive integer value of r, we get:
(7.77)
7.6.3 The lag operator and the rational distributed lag models
The manipulations of distributed lag models can be simplified by introducing the lag oper-
ator L. This operator is defined by Lxt = xt−1.
Some useful algebraic operations with the lag operator are the following:
Under the lag operator, the infinite distributed lag model (7.62) can be written:
(7.78)
where
(7.79)
is a polynomial in L.
Jorgenson (1963) approximated the infinite polynomial (7.79) by the ratio of two finite
polynomials in L, as:
(7.80)
(7.81)
where
(7.82)
Equation (7.81) is the generalisation of all the distributed lag models we have seen till now.
If, for example, γ(L) = γ0 and δ(L) = 1−λL, then (7.81) becomes the geometric distributed lag
model, and if γ(L) = γ0(1−λ)^r and δ(L) = (1−λL)^r, then (7.81) becomes the Pascal distributed
lag model.
(7.83)
Although this transformation of the models was a mechanical one, seeking just to reduce the
infinite number of the coefficients to be estimated, in this section we will try to connect the
infinite distributed lag models with specific models of economic theory.
(7.84)
The desired level of inventories of a firm being a function of its sales, or the desired level of
capital stock in an economy being a function of its output, might be examples of this model.
However, because a ‘desired’ level is not an ‘observable’ level, and thus cannot be used
in estimation, Nerlove assumed that, for various reasons, there is a
difference between the actual and the desired levels of the dependent variable. In fact he
assumed that, apart from random disturbances, the actual change in the dependent variable,
Yt − Yt−1, is only a fraction of the desired change, Yt* − Yt−1, in any period t, i.e.
Yt − Yt−1 = γ(Y*t − Yt−1) + εt    (7.85)
Equation (7.85) is known as the partial adjustment equation, and fraction γ is known as the
adjustment coefficient. The greater the value of γ, the greater the part of the adjustment of the
actual to the desired level of the dependent variable that takes place in period t. In the extreme case where
γ = 1 the adjustment is instantaneous, or, in other words, all the adjustment takes place in the
same time period.
Equation (7.85) can be written as:
Yt = γY*t + (1 − γ)Yt−1 + εt    (7.86)
which expresses the actual value at time t of the dependent variable as a weighted average of
its desired value at time t and its actual value at time t – 1, with γ and (1 − γ) being respec-
tively the weights. Substituting (7.84) into (7.86) and rearranging, we get:
(7.87)
Equation (7.87) is similar to Equation (7.83), which corresponds to the geometric distrib-
uted lag model. In fact, by using the lag operator, (7.87) can be written as:
(7.88)
(7.89)
(7.90)
Yt = α + βX*t + εt    (7.91)
The demand for money in an economy being a function of its expected long-run interest rate,
the quantity demanded being a function of the expected price, or the level of consumption
being a function of the expected, or permanent, income (Friedman, 1957), might be exam-
ples of this model.
Similar to the partial adjustment model, since the ‘expected’ level is not an ‘observable’
level, and thus cannot be used in estimation, Cagan assumed that the agents concerned revise
their expectations according to the deviation of the actual level from their earlier expectations. In fact he assumed
that the change in expectations, X*t − X*t−1, is only a fraction δ of the distance between the
actual level of the explanatory variable Xt and its expected level X*t−1, in any period t, i.e.
X*t − X*t−1 = δ(Xt − X*t−1)    (7.92)
Equation (7.92) is known as the adaptive expectations equation or, due to its error-learning
nature, the error-learning equation, and the fraction δ is known as the expectation coefficient.
The greater the value of δ, the greater the realisation of expectations in period t. In the
extreme case where δ = 1 expectations are fully and instantaneously realised, or, in other
words, all expectations are realised in the same time period.
Equation (7.92) can be written as:
X*t = δXt + (1 − δ)X*t−1    (7.93)
which expresses the expected value at time t of the explanatory variable as a weighted
average of its actual value at time t and its expected value at time t – 1, with δ and (1 – δ)
being respectively the weights. Substituting (7.93) into (7.91), we get:
(7.94)
By multiplying the lagged-once Equation (7.91) by (1 – δ) and subtracting the result from
Equation (7.94), and, after rearranging, we get:
(7.95)
Equation (7.95) is similar to Equation (7.83), which corresponds to the geometric distributed
lag model. In fact, by using the lag operator, (7.95) can be written as:
(7.96)
(7.97)
Equation (7.97) is nothing else but an infinite geometric distributed lag model. Therefore, we
saw that the adaptive expectations model, like the partial adjustment model, is a realisation
of the geometric distributed lag model.
(7.98)
(7.99)
(7.100)
where Y*t = desired level of the dependent variable, and X*t = expected level of the explana-
tory variable.
The most representative example of this model is Friedman's (1957) permanent income
hypothesis, according to which 'permanent consumption' depends on 'permanent income'.
Solving the system of the three equations (7.98) to (7.100), using the hints presented in the
two previous sections, we obtain the following equation (Johnston, 1984):
Yt = αγδ + βγδXt + [(1– δ) + (1 – γ)]Yt–1 – (1 – δ)(1 – γ)Yt–2 + [εt – (1– δ) εt–1] (7.101)
Equation (7.101) is not similar to those of the partial adjustment and the adaptive expecta-
tions because it includes Yt-2 among its explanatory variables. This equation, ignoring the
identification problems of the parameters involved, reminds us of Equation (7.76) of the
Pascal distributed lag models.
(7.102)
can be reduced to a dynamic model by imposing restrictions that require its coefficients to fall on
specific schemes. The most popular of such restrictions is that of the geometric lag scheme:
(7.103)
(7.104)
(7.105)
(7.106)
or
(7.107)
where
(7.108)
and
(7.109)
1 The error term εt is non-autocorrelated: that is, it satisfies all the classical assumptions
stated in Section 7.1.
2 The error term εt is autoregressive of the first order: that is, it has the form εt = ρεt−1 + ηt,
or it is AR(1), where ηt satisfies all the classical assumptions and ρ is the correlation
coefficient.
3 The error term εt is a moving average of the first order: that is, it has the form εt = ηt − μηt−1,
or it is MA(1), where ηt satisfies all the classical assumptions and μ is the moving
average coefficient.
This means that if we knew the value of λ we could compute the variables Wt and λ^t and then we
could apply OLS to Equation (7.107). For applied purposes, the values of variable Wt could
be computed recursively by:
(7.111)
However, the value of λ is not known and thus a search procedure similar to that of Hildreth
and Lu (1960) could be used. According to this procedure the sum of squared residuals from
the regression Equation (7.107) is estimated for various values of λ between zero and one.
The estimates of the parameters α, β0, θ0 and λ that correspond to the minimum sum of the
squared residuals of the searching regressions have the maximum likelihood properties of
consistency and asymptotic efficiency, because minimising the sum of the squared residuals
is equivalent to maximising the logarithmic likelihood function of Y1, Y2, . . ., Yn, with respect
to α, β0, θ0 and λ, which is:
(7.112)
(7.113)
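A Python sketch of this search procedure is given below. The variable Wt is built recursively as Wt = Xt + λWt−1 (with the first W set to the first X), a λ^t regressor is included to stand for the truncation-remainder term, and the SSR is computed on a grid of λ values between zero and one; the λ that minimises the SSR is then selected. The data are simulated, and the exact composition of the regressor matrix is an assumption made for illustration rather than the text's Equation (7.107) itself.

    import numpy as np

    def ssr_for_lambda(y, x, lam):
        # OLS sum of squared residuals of y_t = a + b0*W_t + th0*lam**t + e_t,
        # where W_t = x_t + lam*W_{t-1} is computed recursively (W_1 = x_1).
        n = len(y)
        W = np.empty(n)
        W[0] = x[0]
        for t in range(1, n):
            W[t] = x[t] + lam * W[t - 1]
        X = np.column_stack([np.ones(n), W, lam ** np.arange(n)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        return resid @ resid

    # Simulated series (in the text the data are the Greek consumption series)
    rng = np.random.default_rng(3)
    x = rng.normal(size=200).cumsum() * 0.1 + 5.0
    true_lam = 0.65
    W = np.empty_like(x)
    W[0] = x[0]
    for t in range(1, len(x)):
        W[t] = x[t] + true_lam * W[t - 1]
    y = 1.0 + 0.3 * W + rng.normal(scale=0.2, size=len(x))

    grid = np.arange(0.05, 1.0, 0.05)
    ssr = [ssr_for_lambda(y, x, lam) for lam in grid]
    print(round(grid[int(np.argmin(ssr))], 2))   # lambda minimising the SSR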
If we knew the value of the autocorrelation coefficient ρ, then Equation (7.107) could be
written as:
(7.114)
Because the error term in this equation satisfies all the classical assumptions, the OLS
method could be applied if, of course, we knew the value of λ. However, we do not know
either the value of ρ or the value of λ. Therefore, a search method similar to that described in
the previous section could be used, with the difference that now the search
will be two-dimensional. In other words, we will be searching for values of ρ between −1
and 1, and for values of λ between 0 and 1. The estimates of this methodology will have all
the properties of the maximum likelihood estimation because minimising the sum of the
squared residuals is equivalent to maximising the corresponding logarithmic likelihood
function with respect to α, β0, θ0, λ and ρ, which is:
(7.115)
where
(7.116)
(7.117)
(7.118)
and further assuming that the value of η0 is negligible, i.e., η0 = 0, then if we knew the value
of the moving average coefficient μ Equation (7.107) could be written as:
(7.119)
where
(7.120)
Because the error term in Equation (7.119) satisfies all the classical assumptions, the OLS
method could be applied if, of course, we knew the value of λ. However, we do not know
either the value of μ or the value of λ. Therefore, a two-dimensional search method similar to
that described in the previous section could be used. In other words, we will be
searching for values of μ between −1 and 1, and for values of λ between 0 and 1. The esti-
mates of this methodology will have all the properties of the maximum likelihood estimation
because minimising the sum of the squared residuals is equivalent to maximising the corre-
sponding logarithmic likelihood function (under the assumption that η0 = 0) with respect to
α, β0, θ0, λ and μ, which is:
(7.121)
(7.122)
1 The error term ut is non-autocorrelated: that is, it has the form ut = εt, where εt satisfies
all the classical assumptions. This is the case of the partial adjustment model (7.87).
2 The error term ut is autoregressive of the first order: that is, it has the form ut = ρut−1 +
εt, where εt satisfies all the classical assumptions and ρ is the autocorrelation coefficient. This
case is common in most models with time series data.
3 The error term ut is a moving average of the first order: that is, it has the form ut = εt −
λεt−1, where εt satisfies all the classical assumptions. This is the case of the adaptive
expectations model (7.95).
7.8.2.1 Reduced form estimation with non-autocorrelated disturbances
This is the case of the partial adjustment model (7.87):
(7.123)
or
(7.124)
where
(7.125)
In this model, the lagged dependent variable Yt−1 is a stochastic regressor which is uncorre-
lated with εt, but is correlated with past values of εt, i.e.
(7.126)
or, in other words, the stochastic explanatory variable Yt−1 is distributed independently of the
current error term εt. In this case, Equation (7.124) may be estimated by OLS. However, the OLS
estimates of the coefficients and their standard errors will be consistent and asymptotically
efficient, but will be biased in small samples. Thus, these estimates are not BLUE, and there-
fore the tests of hypotheses will be valid for large samples and invalid for small samples.
In summary, we could apply the OLS methodology to Equation (7.124) as long as the
sample size is large enough. In this case, the OLS estimates a0, b0 and c1 of the coefficients
α0, β0 and γ1, respectively, could be used to estimate the initial parameters α, β and γ of the
model, with the corresponding consistent estimates a, b and c, using (7.125), as follows:
(7.127)
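For illustration, the following Python fragment recovers the structural parameters from the reduced-form estimates along the lines of (7.125) and (7.127), using the standard partial adjustment algebra γ = 1 − c1, β = b0/γ and α = a0/γ; the function name and the numerical values are hypothetical.

    def structural_from_reduced(a0, b0, c1):
        # Reduced form: Y_t = a0 + b0*X_t + c1*Y_{t-1} + error, with
        # a0 = gamma*alpha, b0 = gamma*beta, c1 = 1 - gamma (partial adjustment).
        gamma = 1.0 - c1
        alpha = a0 / gamma
        beta = b0 / gamma
        return alpha, beta, gamma

    # Hypothetical reduced-form OLS estimates
    print(structural_from_reduced(a0=12.5, b0=0.26, c1=0.66))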
(7.128)
where
(7.129)
(7.130)
which means that because ut is correlated with ut−1 and the stochastic explanatory variable
Yt−1 is correlated with ut−1, then Yt−1 will be correlated with ut. In other words, the application
of OLS in Equation (7.110) will give biased and inconsistent estimates of the coefficients
and their standard errors. Thus, the corresponding tests of hypotheses will be invalid for
small or even for large samples. Therefore, another alternative for OLS estimation is needed.
In what follows we will present such alternative methods.
Liviatan used Xt−1 as an instrumental variable for the estimation of (7.128). Xt−1 being non-
stochastic was definitely not correlated with ut, and being one of the explanatory variables of
Yt−1, was likely to be correlated with Yt−1. The estimates a0, b0 and c1 of the coefficients α0,
β0 and γ1 of Equation (7.128), respectively, could be obtained by solving the system of the
usual ‘normal equations’ shown below:
(7.131)
Although these estimates will be biased for small samples, as long as the sample increases
the solution of system (7.131) will yield consistent estimates.
As another type of instrumental variable for Yt−1 in Equation (7.128) could be the lagged
dependent predicted variable, Ŷt−1, obtained by the following regression:
(7.132)
i.e. by the regression of the dependent Yt on lagged variables of Xt. The proper lag length in
regression (7.132) could be determined by optimising a criterion, such as those in (7.37) to
(7.39). Having obtained Ŷt−1 from (7.132), the OLS estimation applied to the following
equation:
(7.133)
which has been derived from Equation (7.128) after substituting Yt−1 with Ŷt−1,
will give, as before, consistent estimates. The method just described, i.e. the method
where as a first step we get Ŷt−1 from (7.132) and as a second step we get the final estimates
from (7.133), is called the method of 'two-stage least squares' (2SLS). Of course, Ŷt−1 could
also be derived from the regression of Yt on any other set of legitimate instruments and not
just on the set of lagged values of Xt.
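The Python sketch below illustrates this two-stage idea on simulated data: in the first stage the lagged dependent variable is projected on a constant and lagged values of Xt to obtain Ŷt−1, and in the second stage OLS is applied with Ŷt−1 in place of Yt−1. The data-generating values, the number of lags of Xt used as instruments and the variable names are all hypothetical.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 300
    x = rng.normal(size=n + 3).cumsum() * 0.05 + 10.0
    eps = rng.normal(scale=0.3, size=n + 3)
    y = np.empty(n + 3)
    y[0] = 20.0
    for t in range(1, n + 3):
        # lagged dependent variable with an MA(1)-type disturbance
        y[t] = 5.0 + 0.3 * x[t] + 0.6 * y[t - 1] + eps[t] - 0.6 * eps[t - 1]

    yt, xt = y[3:], x[3:]

    # Stage 1: project Y_{t-1} on a constant and lagged X's to form Yhat_{t-1}
    Z1 = np.column_stack([np.ones(n), x[2:-1], x[1:-2], x[:-3]])
    g, *_ = np.linalg.lstsq(Z1, y[2:-1], rcond=None)
    y_hat_lag = Z1 @ g

    # Stage 2: replace Y_{t-1} by Yhat_{t-1} and apply OLS (two-stage least squares)
    X2 = np.column_stack([np.ones(n), xt, y_hat_lag])
    coef, *_ = np.linalg.lstsq(X2, yt, rcond=None)
    print(coef)   # large-sample consistent estimates of (alpha0, beta0, gamma1)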
However, the instrumental variables methodology deals only with the problem of correlation
between the stochastic regressor Yt−1 and the error term ut, and not with the problem of
autocorrelation of the error term. In fact, this method does not depend on any particular autocorrelation scheme.
Therefore, the application of instrumental variables in Equation (7.128) will give consistent
but asymptotically inefficient estimates. In other words, a method for dealing with the auto-
correlation in the error term must be used. Such a method could be the following Hatanaka’s
(1976) ‘two-step’ method, which has the same asymptotic properties as the maximum likeli-
hood estimator (MLE) of normally distributed and serially correlated disturbances.
Step 1 Having estimated from the IV method consistent estimates a0, b0 and c1 of the param-
eters α0, β0 and γ1, respectively, estimate the residuals ut = Yt − a0 − b0Xt − c1Yt−1, and use
them in order to get the estimate ρ̂, for the autoregressive coefficient ρ, by regressing ut
against its own lagged value ut−1.
Step 2 By regressing Y*t against a constant, X*t, Y*t−1 and ut−1, where Y*t = Yt − ρ̂Yt−1, X*t = Xt − ρ̂Xt−1
and Y*t−1 = Yt−1 − ρ̂Yt−2, we get the consistent and asymptotically efficient estimates a0, b0
and c1, of the parameters α0, β0 and γ1. If ϕ denotes the regression coefficient of ut−1, then the
two-step estimator of ρ is ρ̂ + ϕ.
Generally, the instrumental variable maximum likelihood with AR(1) estimates could be
obtained by maximising the function:
(7.134)
where
(7.135)
with Ŷt−1 having been obtained from (7.132), or any other similar estimation. This method is
based on the philosophy of the Cochrane and Orcutt method that we will see next.
(7.136)
or
(7.137)
which is a transformation of Equation (7.128), under the assumption that ρ is known. The
application of OLS to Equation (7.137) is nothing else but the method of generalised least
squares (GLS) under the assumption that the value of ρ is known. However, in most cases, ρ
is unknown and thus it has to be approximated by an estimate. The iterative steps of the
CORC methodology are, in this case, the following:
Iteration 1
Step 1.1 Arrange for use the following variables: Yt, Xt and Yt−1.
Step 1.2 Apply OLS to Equation (7.128) and get estimates a0, b0 and c1, of the parameters
α0, β0 and γ1, respectively.
Step 1.3 Use the estimates from step 1.2, in order to estimate the corresponding residuals
ut, i.e. ut = Yt – a0 – b0Xt – c1Yt–1
Step 1.4 Using the estimated residuals ut, from step 1.3, regress ut against its own lagged
value ut−1, and get the estimate for the autoregressive coefficient ρ.
Iteration 2
Step 2.1 Compute the following variables: Y*t = Yt − ρ̂Yt−1, X*t = Xt − ρ̂Xt−1 and Y*t−1 = Yt−1 − ρ̂Yt−2.
Step 2.2 By regressing Y*t against a constant, X*t and Y*t−1, get new estimates a0, b0 and c1 of the parameters
α0, β0 and γ1.
Step 2.3 Using the estimates from step 2.2, estimate a new set of residuals, ut, i.e.
ut = Yt − a0 − b0Xt − c1Yt−1.
Step 2.4 Using the estimated residuals ut, from step 2.3, regress ut against its own lagged
value ut−1, and get the new estimate for the autoregressive coefficient ρ.
Iteration x
By following the four steps x.1 to x.4 in each iteration x, continue this iterative procedure
until two successive estimates of the autoregressive coefficient ρ differ by no more
than a predetermined value, say 0.0001. The estimates a0, b0 and c1 of this final iteration will
be consistent although their standard errors will be inconsistent. To correct for this inconsis-
tency, the following two-step method may be followed (Harvey, 1990):
Step 1 Having obtained from the CORC method consistent estimates a0, b0, c1 and ρ̂ of the
parameters α0, β0, γ1 and ρ, respectively, estimate the residuals of the initial model (7.128)
and the residuals of the reduced model (7.137).
Step 2 By regressing et against a constant, and ut−1, get the consistent and asymptotically
efficient estimates a0, b0 and c1 of the parameters α0, β0 and γ1.
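The iterative procedure just described can be sketched in Python as follows; the function name is made up, the convergence tolerance mirrors the 0.0001 mentioned above, the constant of the quasi-differenced regression is rescaled by 1/(1 − ρ), and no correction of the standard errors is attempted, so this is only an outline of the CORC steps rather than a full implementation.

    import numpy as np

    def corc(y, x, tol=1e-4, max_iter=50):
        # Cochrane-Orcutt style iteration for Y_t = a0 + b0*X_t + c1*Y_{t-1} + u_t
        # with AR(1) disturbances u_t = rho*u_{t-1} + e_t.
        yt, yl, xt = y[1:], y[:-1], x[1:]
        rho = 0.0
        for _ in range(max_iter):
            # quasi-difference the data with the current estimate of rho
            ys = yt[1:] - rho * yt[:-1]
            xs = xt[1:] - rho * xt[:-1]
            yls = yl[1:] - rho * yl[:-1]
            X = np.column_stack([np.ones(len(ys)), xs, yls])
            a0, b0, c1 = np.linalg.lstsq(X, ys, rcond=None)[0]
            a0 = a0 / (1.0 - rho)               # undo the scaling of the constant
            u = yt - a0 - b0 * xt - c1 * yl     # residuals of the original equation
            rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
            if abs(rho_new - rho) < tol:
                break
            rho = rho_new
        return a0, b0, c1, rho_new

    # usage with hypothetical series y and x (both one-dimensional numpy arrays):
    # a0, b0, c1, rho = corc(y, x)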
Equation (7.136), or (7.137), could also be used in obtaining consistent and asymptotically
efficient estimates by maximising the corresponding logarithmic likelihood function with
respect to α0, β0, γ1 and ρ, which is:
(7.138)
(7.139)
where
(7.140)
We also met this case in the adaptive expectations model (7.95):
(7.141)
or
(7.142)
where
(7.143)
(7.144)
which means that ut is autocorrelated (for λ ≠ 0), and, taking into account that the stochastic
explanatory variable Yt−1 is correlated with εt−1, this variable is also correlated with ut, because
ut includes εt−1. In other words, the autocorrelation of the error term and the correlation between
an explanatory variable (Yt−1) and the error term mean that applying OLS to Equation
(7.139) gives biased and inconsistent estimates of the coefficients and their standard errors.
Thus, the corresponding tests of hypotheses will be invalid for small or even large samples.
In order to estimate (7.139) various methods could be used. The instrumental variable
method, for example, could be used to obtain consistent estimates of the coefficients because
this method does not require any specific assumptions regarding the error term. However,
knowing that the error term in (7.139) follows a moving average scheme, a method that
incorporates this information into the estimation may improve the asymptotic
efficiency of the estimates. Such methods are the following:
Zellner and Geisel (1970) method Transforming the model appropriately, they obtained:
(7.145)
By repeatedly lagging (7.145) and substituting the results back into the same Equation (7.145)
we get:
(7.146)
If we knew the value of λ we could apply OLS in Equation (7.146) because all the variables
involved are either known or they could be computed, and the error term εt is white noise.
However, the value of λ is not known and thus a search procedure similar to that of Hildreth
and Lu could be used. According to this procedure, the sum of squared residuals from the
regression Equation (7.146) is estimated for various values of λ between zero and one. The
estimates of the parameters α, β0 and λ that correspond to the minimum sum of the squared
residuals of the searching regressions have the maximum likelihood properties of consis-
tency and asymptotic efficiency, because minimising the sum of the squared residuals is
equivalent to maximising the logarithmic likelihood function of Y1, Y2, . . ., Yn, with respect to
α, β0, W0 and λ, which is:
(7.147)
(7.148)
(7.149)
where
(7.150)
with Ŷt−1 having been obtained from (7.132), or any other similar estimation.
The estimates of Equation (7.148) could be obtained by maximising the corresponding
logarithmic function, which is the following:
(7.151)
(7.152)
where ût are the estimated residuals of the disturbances ut, given by ut = ρut−1 + εt, with εt being
well behaved. This statistic takes the value of 2 when there is no autocorrelation in the
disturbances, and the values of 0 and 4 when there exists a perfectly positive or a perfectly
negative autocorrelation, respectively.
The estimate ρ̂ of the autocorrelation coefficient ρ in the disturbances is given by:
(7.153)
It can be shown that the Durbin-Watson d-statistic and ρ̂ are approximately related by:
(7.154)
However, in the presence of lagged dependent variables in the regression model, the Durbin-
Watson d-statistic is biased towards 2, thus suggesting that there is no autocorrelation even
when autocorrelation is present. Therefore, in such cases, another
test for detecting autocorrelation should be employed.
Durbin (1970) developed a ‘large-sample statistic’ for detecting first-order serial correla-
tion in models when lagged dependent variables are present. This statistic is called the
‘h-statistic’ and is given by:
(7.155)
(7.156)
Step 1 Estimate Equation (7.128) and get the residuals and the estimated variance of the
coefficient of the lagged dependent variable Yt−1.
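A minimal Python version of the h-statistic, assuming the usual formula h = ρ̂√(n/(1 − n·var(ĉ1))) with ρ̂ approximated by 1 − d/2, is given below; the numerical inputs are hypothetical.

    import numpy as np

    def durbin_h(d, n, var_c1):
        # Durbin's h-statistic when a lagged dependent variable is a regressor;
        # not computable when n*var(c1) >= 1.
        rho_hat = 1.0 - d / 2.0
        inside = 1.0 - n * var_c1
        if inside <= 0:
            raise ValueError("h is not computable: n*var(c1) >= 1")
        return rho_hat * np.sqrt(n / inside)

    # hypothetical values: DW = 1.60, n = 35, estimated var of the Y_{t-1} coefficient
    print(durbin_h(d=1.60, n=35, var_c1=0.004))   # compare with N(0,1) critical values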
Step 1 Properly define the model. The model could be of the following form:
(7.157)
Step 2 Apply OLS to Equation (7.157) and obtain the corresponding residuals ut.
Step 3 Regress ut against all the regressors of the model, i.e. against the constant, X1t, . . .,
Xpt, Yt−1, . . ., Yt−q, plus all the lagged residuals up to order m, i.e. ut−1, . . ., ut−m, and obtain the
corresponding R2.
(7.158)
(7.159)
This test can be used also in the case when the error term follows a moving average process
of order m, i.e. it is:
(7.160)
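The steps of this test can be sketched in Python as follows, computing the statistic as n·R² of the auxiliary regression (some texts use (n − m)·R² instead, and the exact form of (7.158) is not reproduced here); the function name and the setting of initial lagged residuals to zero are assumptions made for this sketch.

    import numpy as np

    def breusch_godfrey(y, X, m):
        # X must already contain the constant and all regressors, including any
        # lagged dependent variables; m is the autocorrelation order tested.
        n = len(y)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        u = y - X @ coef
        # auxiliary regression: residuals on the regressors and m lagged residuals
        U_lags = np.column_stack([np.r_[np.zeros(j), u[:-j]] for j in range(1, m + 1)])
        A = np.column_stack([X, U_lags])
        g, *_ = np.linalg.lstsq(A, u, rcond=None)
        e = u - A @ g
        r2 = 1.0 - (e @ e) / ((u - u.mean()) @ (u - u.mean()))
        return n * r2   # compare with a chi-squared(m) critical value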
Step 1 Divide the variables included in the structural equation into those which are independent
of the error term, say, X1t, X2t, . . ., Xpt, and those which are not independent of the error
term (q variables, say).
Step 2 Define the set of instruments, say, W1t, W2t, . . ., Wst, where s ≥ q.
Step 3 Apply the IV estimation to the original equation and obtain the corresponding resid-
uals ut.
Step 4 Regress ut against all the variables that are independent of the error term, i.e. the constant,
X1t, . . ., Xpt, plus the instruments W1t, . . ., Wst, and obtain the corresponding R2.
(7.161)
(7.162)
In the case that the alternative hypothesis is accepted, at least one of the instrumental vari-
ables is correlated with the error term, and, therefore, not all instruments are valid. This
means that the IV estimates of the coefficients are also not valid.
Example 7.5 The private consumption function for Greece, 1960–1995 Assume that the
expected, or permanent, private disposable income Y* determines private consumption C in
Greece, according to the linear function:
(7.163)
Considering the adaptive expectations model, the mechanism used to transform the
unobserved permanent level of the private disposable income into an observable
level is:
(7.164)
Following the methodology in 7.7.2, the private consumption function in observable levels
is the following:
(7.165)
or in estimable form:
(7.166)
where
(7.167)
Finally, we saw that Equation (7.165) can be expanded in a geometric (or Koyck) infinite
distributed lag model, as follows:
(7.168)
Equation (7.168) is nothing else but Equation (7.13), which we saw at the beginning of this
chapter.
Table 7.6 presents annual data from 1960 to 1995 for private consumption and
private disposable income of the Greek economy, for the estimation of Equation
(7.166). Furthermore, this table presents data for gross investment, gross national product,
and long-term interest rates, which are necessary for the estimation purposes of the next
example. All nominal data are expressed at constant market prices of the year 1970, and in
millions of drachmas. Private disposable income is deflated using the consumption price
deflator.
Applying the techniques described in Section 7.8 to Equation (7.166), Table 7.7 presents
the obtained estimates. The instruments used for the IV estimation are a constant, Yt, Yt−1
and Yt−2. For the application of the Zellner-Geisel method to the corresponding Equation
(7.146), we computed Zt by the recursion:
(7.169)
Applying OLS to Equation (7.146), having first computed Zt using (7.169) and λ^t, for
different values of λ between zero and one, the sum of the squared residuals (SSR) has been
estimated. Figure 7.5 presents these results. It is seen in this figure that the value of λ, or the
Table 7.6 Data for the Greek economy referring to consumption and investment
value of (1−δ) in the adaptive expectations specification, that minimises the sum of squared
residuals is 0.65. In our experimentation the steps used for the change of λ had the quite large
value of 0.05. However, having found that the value of λ that globally minimises the SSR is
0.65, the precise value of λ could be found by decreasing the steps of change to very small
values and repeating the procedure in the neighbourhood of 0.65. The final results are shown
in Table 7.7, obtained by using the maximum likelihood routine of Microfit, where we can
see that the exact estimate of λ is 0.65879, i.e. not different from the value of 0.65 found
earlier.
Taking into account that autocorrelation in Equation (7.166) is of the moving average
scheme (although not significant in the actual estimates) the ML with MA(1) estimates
Table 7.7 Estimates of the consumption function for Greece, 1960–1995
h 1.4563[0.145]
Breusch-Godfrey LM(1) 1.5630 [0.211] 2.6140 [0.106]
SARGAN 0.0826 [0.774]
MA(1) 0.24113[0.197]
in Table 7.7 have generally the highest asymptotic efficiency compared to the other two
estimates in the same table. The estimated adaptive expectations model is written as:
(7.170)
(7.171)
Figure 7.5 Sum of squared residuals, SSR (× 10^8), against values of λ from 0.05 to 0.95.
(7.172)
Estimates from (7.170) yield a marginal propensity to consume (MPC) equal to 0.26,
implying that a 100 drachmas increase in the current income would increase current
consumption by 26 drachmas. However, if this increase in income is sustained, then from
Equation (7.144) we see that the marginal propensity to consume out of permanent income
will be 0.77, implying that a 100 drachmas increase in permanent income would increase
current consumption by 77 drachmas. By comparing these two marginal propensities to
consume (the short-run MPC = 0.26 and the long-run MPC = 0.77) and because the expectation
coefficient is estimated to be 0.34, this implies that about one-third of the expectations
of the consumers are realised in any given period.
Finally, using the estimates in (7.172) and formulas (7.68) to (7.70) we find, respectively,
the mean lag, the median lag, the impact multiplier, and the equilibrium multiplier as:
We saw above that the short-run MPC is the impact multiplier and the long-run MPC is the
equilibrium multiplier. Furthermore, the mean lag is 1.93, showing that it takes on average
about two years for the effect of changes in income to be transmitted to consumption changes.
Finally, the median lag is 1.66, meaning that 50 per cent of the total change in consumption
is accomplished in about one and a half years. In summary, consumption adjusts to income
within a relatively long time.
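As a check on these figures, the standard geometric-lag formulas (mean lag λ/(1 − λ), median lag log 0.5/log λ, long-run multiplier β0/(1 − λ)) can be evaluated in Python with the estimates quoted above (λ ≈ 0.65879 and a short-run MPC of 0.26); the computation below is only a numerical verification and reproduces, to rounding, the values discussed in the text.

    import numpy as np

    lam, beta0 = 0.65879, 0.26                 # estimates reported above
    mean_lag = lam / (1.0 - lam)               # geometric-lag mean lag
    median_lag = np.log(0.5) / np.log(lam)     # lag at which 50% of the change occurs
    long_run = beta0 / (1.0 - lam)             # equilibrium (long-run) multiplier
    print(round(mean_lag, 2), round(median_lag, 2), round(long_run, 2))
    # roughly 1.93, 1.66 and 0.76, in line with the mean lag, median lag and
    # long-run MPC discussed above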
(7.173)
Considering the partial adjustment model, the mechanism used to transform the unob-
served level of the desired capital stock into an observed one is:
(7.174)
or
(7.175)
meaning that the actual net investment is only a fraction of the investment required to
achieve the desired capital stock. Following the methodology in 7.7.1, the capital stock
function in observable levels is the following:
(7.176)
Although Equation (7.176) is in an estimable form, in fact the partial adjustment form,
this equation produces some estimation problems because the data for the capital stock
variable is not usually very reliable. However, by taking into account that capital stock
Kt at the end of a period is equal to the capital stock Kt−1 at the beginning of the period
plus gross investment It less depreciation Dt, i.e.
(7.177)
and assuming that depreciation in period t is proportional to the existing capital stock
in period t − 1, i.e.
(7.178)
where δ = depreciation rate and ηt = error term, then Equation (7.176) can be expressed
in variables which may be more reliable in terms of data.
Substituting (7.178) into (7.177), and then substituting in the result (7.175), and
then rearranging, we obtain that:
(7.179)
Multiplying both terms of the lagged-once Equation (7.179) by (1 − δ), adding the
result to Equation (7.179), and rearranging, we get:
(7.180)
(7.181)
or
(7.182)
where
(7.183)
The good thing about the specification of Equation (7.182) (for more specifications see
Wallis (1973) and Desai (1976)) is that by estimating it we can estimate all the param-
eters of the partial adjustment model in Equation (7.176) plus the depreciation rate,
without even having data on the capital stock and/or depreciation.
Using the data in Table 7.6, It = gross investment and Qt = gross national product,
Table 7.8 presents the estimates obtained by applying to Equation (7.182) the techniques
described in Section 7.8. The instruments used for the IV estimation are a constant, Qt, Qt−1, rt and
rt−1, where rt = long-term interest rate.
Although all estimates in Table 7.8 are generally acceptable, with the coefficients
having the proper a priori signs that Equation (7.181) predicts, the question is which
of these estimates is the ‘best’. In order to answer this question we should look at the
specification of Equation (7.181). Because the error term in (7.181) is of the first-order
moving average type it seems that the ML with MA(1) estimates in Table 7.8 are more
appropriate. Therefore, using these estimates (a0, b0, b1, c1) of the corresponding
regression coefficients (α0, β0, β1, γ1) the estimated parameters (a, b, c, d) of the
corresponding parameters (α, β, γ, δ) of the accelerator model of investment for Greece
are given by the formulas (7.183), i.e.
(7.184)
These estimated parameters seem plausible, with the implicit desired capital stock/
GNP ratio being 2.52526 (neglecting the constant term), the depreciation rate being
0.05541, and the adjustment coefficient being 0.22092. The adjustment coefficient is
quite small, showing that the adjustment of the current level of capital stock to the
desired level of capital stock is rather slow.
• Since economic theory is mainly developed to explain the state of rest of variables, that is, their equilibrium values,
regression models are also static, lacking dynamic characteristics. Economic time series
data, on the other hand, contain all the information on the dynamic adjustments of the
variables over time towards the equilibrium state. There is, therefore, a big gap between
the static regression models and the time series data to be used for their estimation.
• The static regression models are seldom capable of explaining the data generation
process without modification to take into account the dynamic structure of the data
generation process. Typically, when a static regression model is estimated with time
series data, the results indicate spurious regression and misspecification.
• To explain the data generation process, the static models are modified via inclusion of
lags of variables as additional regressors in the regression models.
• A specific form of lag structure gives rise to a specific regression model: Koyck and
Almon distributed lag models are the most popular forms of lag structures.
• Note that this modification is not based on theory; it is simply assumed that the dynamic
adjustment follows a particular pattern (for example, follows a partial adjustment
process). It is therefore likely that these adjustment mechanisms are not consistent with
the dynamic adjustments of the data. In this case the ‘dynamic’ regression model is
misspecified.
• The estimation and diagnostic testing of the distributed lag models gives rise to addi-
tional problems, typically resolved via modified forms of the OLS methods and new
diagnostic testing procedures. The focus of this approach is on applying correct estima-
tion and testing procedures, rather than on finding the data generation process consistent
with the observed data.
Review questions
1 In a multiple linear regression model of your choice, show that a partial adjustment
model implies a Koyck-type lag structure. What are the implications of this result?
2 The impact of advertising expenditure on sales is assumed to first increase, reach a peak
and then decline. Formulate an appropriate finite lag model of the relationship between
sales and advertising expenditure, assuming that the lag length is k. What are the conse-
quences of an incorrect order of polynomial? How would you decide on the order of
polynomial?
3 Consider the regression model Ct = β1 + β2Yᵉt + ut, where Yᵉt denotes expected income, ut ~ nid(0, σ²), and Yᵉt − Yᵉt−1 = γ(Yt − Yᵉt−1), 0 ≤ γ ≤ 1. Explain how the regression model might be estimated. What are the problems with the OLS estimation?
4 Explain each of the following:
a the Pascal distributed lag model
b the partial adjustment model
c the adaptive expectation model
d the mixed partial and adaptive expectation model.
5 Explain the estimation methods of infinite distributed lag models. Give examples to
illustrate your answer.
Unit 2
Simultaneous equation
regression models
INTRODUCTION
Most books on econometrics start with, and exert considerable efforts investigating, single-
equation models, such as demand functions, consumption functions, wage rate functions,
and so on, similar to those considered in detail in the previous chapters. However, economic phenomena rarely occur in isolation. In most cases the single equations under investigation are either related to other single equations, or they are part of a wider phenomenon which
may be explained by a system of equations. This chapter will introduce you to regression
models with a number of interrelated equations, within which a number of variables are
determined simultaneously. We will introduce the concepts step by step using a number of
examples to demonstrate key issues. We need to use a bit of matrix algebra here to present
some of the results in a compact fashion; however, we will keep the use of matrix algebra to
a minimum and instead rely more on practical examples to illustrate key concepts.
Key topics
• Simultaneous equation bias
• Identification
• Estimation methods (single equation/system of equations)
Case 8.1 The demand-supply model under equilibrium conditions Consider the following market model for a single commodity:
Qdt = α0 + α1Pt + α2Yt + εdt (8.1)
Qst = β0 + β1Pt + εst (8.2)
Qdt = Qst = Qt (8.3)
Where Qdt = quantity demanded, Qst = quantity supplied, Qt = quantity sold, Pt = price of
commodity, Yt = income of consumers, α’s and β’s = parameters, ε’s = random disturbances
also representing other factors not included in each equation, and t = specific time period.
The rationale of the system of Equations (8.1) to (8.3) is that given income, the equilibrium
quantity and the equilibrium price can be jointly and interdependently found by solving
the system of three equations. This means that different pairs of equilibrium prices and
equilibrium quantities correspond to different levels of income. In other words, although
income is an explanatory variable for the demand equation only, changes in income cause effects in both the price and the quantity sold.
Case 8.2 Income determination in a closed economy without government The simple
Keynesian model of income determination in an economy without transactions with the rest
of the world (closed economy) and without any government activity can be written as:
Ct = α0 + α1Yt + εt (8.4)
Yt = Ct + It (8.5)
where Ct = consumption, Yt = income, It = investment, α's = parameters, εt = random disturbance, and t = specific time period.
Case 8.3 The wages-prices link The basic Phillips (1958) curve can be augmented into the
following model:
wt = α0 + α1Ut + α2pt + εwt (8.6)
pt = β0 + β1wt + β2Yt + εpt (8.7)
where wt = (Wt – Wt–1)/Wt–1 with Wt = monetary wage level, Ut = unemployment rate (per
cent), pt = (Pt – Pt–1)/Pt–1, with Pt = price level, Yt = aggregate demand (income), α’s and β’s
= parameters, ε’s = random disturbances also representing other factors not included in each
equation, and t = specific time period. The rationale of the system of Equations (8.6) and
(8.7) is that given the unemployment rate and aggregate demand levels, then wage inflation
and price inflation can be jointly and interdependently found by solution of this system. This
means that different pairs of money wage inflation and price inflation correspond to different
levels of unemployment rate and aggregate demand. In other words, although the unemployment rate enters the wages function only and aggregate demand enters the prices function only, changes in either or both of these two explanatory variables cause effects in both money wage inflation and price inflation.
A common property of the three cases here is that, in each case, some of the variables
included are jointly and interdependently determined by the corresponding system of equa-
tions that represents the economic phenomenon. In the first case, for example, price Pt and
quantity Qt are jointly and interdependently determined, whilst income Yt and the distur-
bances εdt and εst affect Pt and Qt but they are not affected by them. In the second case,
consumption Ct and income Yt are jointly and interdependently determined, whilst invest-
ment It and the disturbances εt affect Ct and Yt but they are not affected by them. In the third
case, money wage inflation wt and price inflation pt are jointly and interdependently deter-
mined, whilst the unemployment rate Ut, aggregate demand Yt and the disturbances εwt and
εpt affect wt and pt but they are not affected by them.
Since some of the variables in these systems of equations are jointly and interdependently
determined, changes in other variables in the same systems of equations that are not deter-
mined by these systems, but are determined outside the systems, are nonetheless transmitted
to the jointly and interdependently determined variables. This transmission takes place as an
instantaneous adjustment, or feedback, and we say that the system is a system of simulta-
neous equations. In the first case, for example, the initial result of an increase in income at
the existing price level will be increases in demand. The result of a higher demand will be an
increase in the price level, which pushes demand to lower levels and supply to higher levels.
This feedback between changes in price levels and changes in quantities continues till the
system is again in equilibrium, i.e. till quantity demanded is equal to the quantity supplied.
Generally, we say that a set of equations is a simultaneous equations system, or model, if
all its equations are needed for determining the level of at least one of its jointly and interde-
pendently determined variables. In what follows we investigate models of simultaneous
linear equations.
The general structural model consists of G equations in G endogenous variables, K predetermined variables and G structural disturbances, or error terms.
(8.8)
where the Y’s are G endogenous variables, the X’s are K predetermined variables (exoge-
nous and lagged dependent variables), the ε’s are the disturbances, the γ’s and β’s are the
structural parameters, and t = 1, 2, . . ., n. This structural model is complete because the number of equations, G, equals the number of endogenous variables. Of course, the equations in this model may be of any type. Furthermore, some of the structural coefficients may equal zero, indicating that not all equations involve exactly the same variables. The inclusion of a constant term is indicated by setting one of the X variables equal to one (the unitary variable). Normally, the γii coefficients (diagonal coefficients) are set equal to one to indicate
the dependent variable of the corresponding equation.
Model (8.8) may be written in matrix form as:
(8.9)
or
(8.10)
where Γ is a G × G matrix of the γ coefficients, B is a G × K matrix of the β coefficients, Yt
is a G × 1 vector of the G endogenous variables for time t, Xt is a K × 1 vector of the K prede-
termined variables for time t, and εt is a G×1 vector of the structural disturbances for time t.
The assumptions underlying structural disturbances are those of the classical normal
linear regression model. These assumptions are written as:
(8.11)
or in matrix form:
(8.12)
where symbol (′) indicates ‘transpose’, and Σ is the so-called variance-covariance matrix of
disturbances.
Since model (8.8) is complete, it may be generally solved for endogenous variables. This
solution is called the reduced form model and it is written as:
(8.13)
where the π’s are the reduced form coefficients, and the u’s are the reduced form distur-
bances. The reduced form coefficients show the effects on the equilibrium values of the
endogenous variables from a change in the corresponding exogenous variables after all feed-
backs have taken place.
Model (8.13) may be written in matrix form as:
(8.14)
or
(8.15)
(8.16)
which is the explicit solution for the endogenous variables it contains, under the assumption
that the inverse matrix Γ−1 exists, or that matrix Γ is nonsingular. In other words, solution
(8.16) is the reduced form model corresponding to the structural form model (8.10).
By comparing (8.16) with (8.15), we obtain:
(8.17)
and
(8.18)
From (8.17) it is seen that the reduced form coefficients are functions of the structural coef-
ficients. Furthermore, from (8.18) it is seen that each reduced form disturbance is a linear
function of all structural disturbances, and, therefore, the stochastic properties of the reduced
form disturbances depend on the stochastic properties of the structural disturbances. These
properties are written as:
(8.19)
(8.20)
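For reference, the relations just described can be written compactly. The following restatement assumes the sign convention ΓYt + BXt = εt for the structural form (8.10), which is one common way of writing it:
\[
Y_t = -\Gamma^{-1}B\,X_t + \Gamma^{-1}\varepsilon_t = \Pi X_t + u_t,
\qquad \Pi = -\Gamma^{-1}B, \qquad u_t = \Gamma^{-1}\varepsilon_t,
\]
\[
E(u_t) = 0, \qquad E(u_t u_t') = \Gamma^{-1}\,\Sigma\,(\Gamma^{-1})'.
\]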
Having established the notation and the properties of the general structural model and the
corresponding reduced form model, we distinguish two important cases of specific models,
according to the values of the γ coefficients in matrix Γ.
1 Seemingly unrelated equations This is the case where matrix Γ is diagonal, i.e. it has
the following form:
(8.21)
In this case each endogenous variable appears in one and only one equation. In fact, we
do not have a system of simultaneous equations, but instead we have a set of seemingly
unrelated equations.
2 Recursive equations model This is the case where matrix Γ is triangular, i.e. it has the
following form:
(8.22)
In this case the first equation contains only one endogenous variable, let us say the first
endogenous variable. The second equation contains the first endogenous variable plus a new
one, let us say the second endogenous variable, and so on. The final equation contains all the
endogenous variables of the system. In other words, the solution for the first endogenous
variable is completely determined by the first equation of the system. The solution for
the second endogenous variable is completely determined by the first and the second equa-
tions of the system, and so on. The solution for the final endogenous variable is completely
determined by all equations of the system.
Consider again the simple Keynesian model of income determination of Case 8.2:
Ct = α0 + α1Yt + εt (8.23)
Yt = Ct + It (8.24)
According to the notation established in section 8.2, and having classified variables Ct and
Yt as being endogenous and variable It as being exogenous, the structural model of
Equations (8.23) and (8.24) is written in standard form as:
(8.25)
(8.26)
where I is the unitary variable (an n×1 vector, all values being one) and the disturbance term
εt follows the assumptions of the classical normal linear model, i.e.
(8.27)
According to (8.10), (8.26) can be written as:
(8.28)
where
(8.29)
Solving the model of Equations (8.23) and (8.24) for endogenous variables we obtain the
explicit solution:
(8.30)
(8.31)
or
(8.32)
where
(8.33)
Solution (8.30), or equivalently (8.32), is the reduced form model of the structural model of
the two simultaneous equations: (8.23) and (8.24). In fact, this solution can be verified by
applying (8.17) and (8.18) respectively as:
(8.34)
and
(8.35)
Of course, for the existence of the inverse of matrix Γ it must be that 1 − α1 ≠ 0. Furthermore, from
(8.19) it can be derived that the properties of the reduced form disturbances υ1t = υ2t = υt are
given by:
(8.36)
We said in section 8.1 that the reduced form coefficients show the effects on the equilibrium
values of the endogenous variables from a change in the corresponding exogenous variables
after all feedbacks have taken place. Coefficients π22=1/(1 − α1) and π12 = α1/(1 − α1), for
example, in (8.33), show the ‘total effects’ on Yt and on Ct respectively, of a change in It. By
analysing, for example, coefficient π22, we obtain:
(8.37)
In other words, (8.37) shows that the ‘total effect’ 1/(1 – α1) on Yt of a change in It, i.e. the
investment multiplier, may be divided between a ‘direct effect’ equal to 1, and an ‘indirect
effect’ equal to α1/(1 – α1). The direct effect, 1, is the direct increase in Yt, shown in the
income accounting identity (8.24), as the coefficient of It, and the indirect effect is the indi-
rect increase in Yt through the increase in consumption Ct, α1/(1 – α1), in Equation (8.30),
transmitted to the income accounting identity.
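To make this decomposition concrete, here is a minimal numerical sketch (an illustration with an assumed value α1 = 0.8, not the book's estimates) that builds Γ and B for the model of Equations (8.23) and (8.24), computes Π = −Γ⁻¹B, and splits the investment multiplier into its direct and indirect parts.

```python
import numpy as np

alpha0, alpha1 = 10.0, 0.8          # assumed structural parameters for illustration

# Structural form Gamma @ Y_t + B @ X_t = eps_t, with Y_t = (C_t, Y_t)' and X_t = (1, I_t)'
Gamma = np.array([[1.0, -alpha1],   # consumption function: C_t - a1*Y_t - a0 = eps_t
                  [-1.0, 1.0]])     # income identity:      -C_t + Y_t - I_t = 0
B = np.array([[-alpha0, 0.0],
              [0.0, -1.0]])

Pi = -np.linalg.inv(Gamma) @ B      # reduced form coefficients
pi12, pi22 = Pi[0, 1], Pi[1, 1]     # effects of I_t on C_t and on Y_t

print(Pi)
print("total effect on Y_t :", pi22)        # 1/(1 - alpha1) = 5
print("direct effect       :", 1.0)
print("indirect effect     :", pi22 - 1.0)  # alpha1/(1 - alpha1) = 4
```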
In the structural model of Equations (8.23) and (8.24) it is assumed that the exogenous
variable It is independent of the error term εt. However, this is not true for the endogenous
variable Yt, although this variable appears as an explanatory variable in Equation (8.23).
This is because this variable is jointly and interdependently determined by variables It and εt,
as shown in the reduced form model in (8.30) or (8.32), and, therefore, it can be proved that:
(8.38)
(8.38) shows that the explanatory variable Yt in Equation (8.23) is not independent of the
error term εt, and, therefore, this equation does not satisfy the assumptions of the classical
regression model. Thus, the application of OLS to this equation would yield biased and
inconsistent estimates.
Generally, because the endogenous variables of a simultaneous equations model are all
correlated with the disturbances, the application of OLS to equations in which endogenous
variables appear as explanatory variables yields biased and inconsistent estimates. This
failure of OLS is called simultaneous equation bias.
This simultaneous equation bias does not disappear by increasing the size of the sample,
because it can be proved that:
(8.39)
where a1 is the OLS estimator of α1. (8.39) shows that the probability limit of a1 is not equal to
the true population parameter α1, and thus the estimator is inconsistent. In fact, (8.39) shows that
the plim(a1) will always be greater than α1 because α1 < 1 and the variances are positive numbers.
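The direction of this bias is easy to see in a small simulation. The sketch below uses assumed parameter values (α0 = 10, α1 = 0.8, not the book's data), repeatedly generates the model of Equations (8.23) and (8.24), and applies OLS to the consumption function, so the average estimate of α1 lies above the true value.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha0, alpha1, n, reps = 10.0, 0.8, 100, 2000   # assumed values for illustration
estimates = []

for _ in range(reps):
    I = rng.normal(50.0, 5.0, n)                 # exogenous investment
    eps = rng.normal(0.0, 2.0, n)                # structural disturbance
    Y = (alpha0 + I + eps) / (1.0 - alpha1)      # income, from the reduced form
    C = alpha0 + alpha1 * Y + eps                # structural consumption function
    X = np.column_stack([np.ones(n), Y])
    a = np.linalg.lstsq(X, C, rcond=None)[0]     # OLS of C on a constant and Y
    estimates.append(a[1])

print("true alpha1 :", alpha1)
print("mean OLS a1 :", np.mean(estimates))       # noticeably above 0.8
```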
We also saw in the previous section that the reduced form model contains only predetermined variables on the right-hand side of its equations, which are not correlated with the reduced form disturbances. Therefore, by assuming that the reduced form disturbances satisfy the classical linear regression model assumptions, the application of the OLS method to the reduced form equations generally gives unbiased and consistent estimates.
The question that now arises is, can we estimate the structural parameters of the model by
using the consistent estimates of the reduced form parameters? The answer to this question
is that there may be a solution to this problem by employing (8.17), which connects the
parameters of the reduced form model with the parameters of the structural model. Let us
consider the following examples.
(8.40)
This means that if we had consistent estimates of the π’s, then we could get consistent
estimates of the α’s by solving the system of equations in (8.40). By denoting with ^
the estimates of the π’s and with ‘a’ the estimates of the α’s, the estimates of the struc-
tural coefficients are:
(8.41)
This technique, by which we first estimate the reduced form parameters and then
obtain through the system (8.40) estimates of the structural parameters, is called
indirect least squares (ILS).
Table 7.6 in chapter 7 contains data referring to the private consumption (Ct) and to
private disposable income (Yt) for an EU member, Greece. Furthermore, regarding private
savings (St = Yt − Ct), it is assumed that they are all invested (It). In other words, we have the
simple structural model of Equations (8.23) and (8.24) and the corresponding reduced
form model (8.30), or (8.33).
Using the data in Table 7.6, the OLS estimates of the reduced form equations of the
model are (s.e. = standard errors and t = t-ratio in absolute values):
(8.42)
and
(8.43)
and, therefore, from (8.41) we obtain the estimates of the structural coefficients, using
the method of indirect least squares, as:
(8.44)
(8.45)
By comparing (8.45) with (8.44) we see that a1,OLS = 0.77958>a1,ILS = 0.76317. This
result verifies, in a sense, the theoretical result stated in (8.39).
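The ILS computation in this example can be reproduced mechanically. The sketch below uses simulated stand-in series for C and I (not the Table 7.6 data), estimates the two reduced form equations by OLS, and then recovers the structural coefficients of the consumption function from the relations implied by (8.33), namely a1 = π̂12/π̂22 and a0 = π̂11/π̂22.

```python
import numpy as np

def ols(y, x):
    """OLS of y on a constant and a single regressor x; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stand-in series; replace with the Table 7.6 data to reproduce the text's numbers.
rng = np.random.default_rng(2)
n = 30
I = rng.normal(50.0, 5.0, n)                 # investment (= private savings)
eps = rng.normal(0.0, 2.0, n)
Y = (10.0 + I + eps) / (1.0 - 0.8)           # income, from the reduced form
C = Y - I                                    # consumption, from the identity (8.24)

# Steps 1-2: OLS on the two reduced form equations (C on I, and Y on I).
p11, p12 = ols(C, I)                         # C_t = p11 + p12 I_t
p21, p22 = ols(Y, I)                         # Y_t = p21 + p22 I_t

# Step 3: solve for the structural coefficients of the consumption function.
a1_ils = p12 / p22                           # (a1/(1-a1)) / (1/(1-a1)) = a1
a0_ils = p11 / p22                           # (a0/(1-a1)) / (1/(1-a1)) = a0
print(a0_ils, a1_ils)
```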
Consider, as a second example, the following demand-supply model for meat:
Qt = α0 + α1Pt + α2Pft + α3Yt + εdt (8.46)
Qt = β0 + β1Pt + β2wt + εst (8.47)
where Qt = the quantity of meat, Pt = the price of meat, Pft = the price of fish as a substitute
for meat, Yt = income, wt = cost of labour as a main factor of production, and εt’s are the
error terms. According to the laws of demand and supply we expect α1 < 0 and β1 > 0,
respectively. We expect α2>0 because fish is a substitute for meat and α3 > 0 because the
higher the income, the greater the meat demanded. Finally, we expect β2 < 0 because the
higher the prices of the factors of production, the lower the production.
By solving the system of two structural equations (8.46) and (8.47) for the two
endogenous variables Pt and Qt, we get the following reduced form model:
(8.48)
(8.49)
where
(8.50)
(8.51)
and
(8.52)
If we make use of the data in Table 8.1, where Qt = meat consumption, Yt = personal disposable income, Pt = the meat consumption price index, Pft = the fish consumption price index, and wt = real unit labour cost, the OLS estimates of the two reduced form
equations (8.48) and (8.49) are:
(8.53)
and
(8.54)
Having obtained the reduced form estimates in (8.53) and (8.54), let us try, following
the methodology of the previous example, to find the indirect least squares estimates
of the structural form model. Let us try, for example, to find an estimate b1 of β1. From
(8.50) and (8.51) we get:
(8.55)
However, from the same system of (8.50) and (8.51) we can also get that:
(8.56)
By comparing the two values of the estimate b1 in (8.55) and (8.56) we see that these
values are very different. This means that by applying, in this example, the same meth-
odology of indirect least squares as we did in example 8.1 we get two consistent but
different estimates of β1. This result is due to the fact that, in estimating Equations
(8.48) and (8.49) we did not take into account the restriction of π23/π13 = π24/π14. In
other words, the use of the indirect least squares method does not always give unique
estimates. This creates a problem which is known as the problem of identification.
8.4 Identification
Let us summarise the simultaneous equations models used till now as follows:
Structural equations model:
(8.56a)
(8.56b)
where
(8.57)
(8.58)
(8.59)
(8.60)
(8.61)
Furthermore, we saw that if we apply an OLS estimator to the structural model, then we have
the problem of simultaneous equation bias, i.e. we will get inconsistent estimates of the
structural coefficients Β and Γ, because the endogenous variables of the model, Yt, are not
independent of the structural error term, εt. We also saw that we can apply the OLS estimator
to the unrestricted reduced form model, in order to obtain consistent estimates of the reduced
form coefficients Π, because the predetermined variables, Xt, are independent of the reduced
form error term, υt. Finally, we saw that using the indirect least squares method, i.e. having
first estimated the reduced form coefficients consistently with OLS and secondly obtaining
consistent estimates of the structural coefficients through the coefficients system, there are
cases where we can get unique estimates. Of course, there are other cases where we cannot
obtain unique estimates of the structural coefficients, or any estimates at all. This brings us to the problem of identification.
Equations in simultaneous equations models may be grouped into the following two
categories:
Identified equations These are the equations for which estimates of the structural coeffi-
cients can be obtained from the estimates of the reduced form coefficients.
Unidentified or underidentified equations These are the equations for which estimates of the
structural coefficients cannot be obtained from the estimates of the reduced form
coefficients.
The identified equations may be further grouped into the following two categories:
Exactly or just or fully identified equations These are the identified equations for which a
unique estimate of the structural coefficients can be obtained.
Overidentified equations These are the identified equations for which more than one
estimate of at least one of their structural coefficients can be obtained.
Generally, the problem of identification arises because the same reduced form model may be
compatible with more than one structural model, or, in other words, with more than one
theory. We say then that we have observationally equivalent relations and we cannot
distinguish them without more information.
Case 8.4 Underidentified equations If Q = quantity and P = price, assume the simple struc-
tural demand-supply model under equilibrium conditions:
where
(8.63)
(8.64)
(8.65)
(8.66)
where
(8.67)
By counting the coefficients of the structural and reduced form models we see that the struc-
tural model has four coefficients and the reduced form model has two coefficients.
Furthermore, each of the two equations in the coefficients system (8.67) contains all four
structural coefficients. Therefore, it is not possible from system (8.67) to find solutions for
the structural coefficients and, thus, both structural equations are underidentified.
The problem of underidentification of both equations in this model is shown in Figure 8.1.
Figure 8.1(a) represents some scatter points of pairs of equilibrium prices and quantities
data. Figure 8.1(b) shows the scatter points plus some different structures of demand (D, d)
and supply (S, s) curves. The scatter points of Figure 8.1(a) may be the result of the
intersection of the demand (D) and supply (S) curves, or of the demand (d) and supply (s)
curves, or of any other demand or supply curves in Figure 8.1(b). In other words, having
only the information included in the scatter points, it is impossible to distinguish which
structure of demand and supply curves corresponds to these scatter points. Thus, more than
one structure (theories) are consistent with the same scatter points (data) and there is no way
of distinguishing them without further information. These structures are underidentified.
Let us see the same problem presented in Figure 8.1 from another point of view. If we
multiply Equation (8.63) by λ, where 1 ≥ λ ≥ 0, and Equation (8.64) by (1 − λ) and add the
two results, we obtain the following combined equation, which is a linear combination of the
two original equations:
(8.68)
Figure 8.1 Market equilibria of the demand-supply model: (a) scatter of equilibrium price-quantity points; (b) the same scatter with alternative demand (D, d) and supply (S, s) curves.
(8.69)
Observing the combined Equation (8.68) we see that it has exactly the same form as the original Equations (8.63) and (8.64). We say that these three equations are observationally equivalent. Therefore, if we just have pairs of data on Qt and Pt, we do not know whether, by regressing Qt on Pt, the underlying relation is the demand Equation (8.63), the supply Equation (8.64), or the combined Equation (8.68). In other words, more than one structure (relation) is consistent with the same data and there is no way of distinguishing them without further information. These structures are underidentified.
Case 8.5 Exactly or just identified equations Assume a structural demand-supply model
under the equilibrium conditions as:
(8.70)
(8.71)
This model differs from that of Case 8.4 because the demand equation now contains more
information. The quantity demanded depends now not only on prices but on income, Yt, as
well. The reduced form model is:
(8.72)
(8.73)
where:
(8.74)
By counting the coefficients of the structural and of the reduced form models we see that the
structural model has five coefficients and the reduced form model has four coefficients.
Therefore, it is not possible from the coefficients system (8.74) to find solutions for all of the
structural coefficients. In fact, we see that from system (8.74) we can find a unique solution
for the coefficients of the supply function as:
(8.75)
Unfortunately, we cannot find, from the same system, solutions for all coefficients of the
demand function. Therefore, the supply function is exactly identified, whilst the demand
function is not identified.
The exact identification of the supply equation and the non-identification of the demand
equation in this model is shown in Figure 8.2. Figure 8.2(a) represents some scatter points of
pairs of equilibrium prices and quantities. Figure 8.2(b) shows the scatter points plus the
information that the demand curve shifts over time to the right, from D1 to D3, because of
increasing income, and the supply curve remains stable, or relatively stable.

Figure 8.2 Market equilibria and exact identification of the supply function.

Therefore, the
introduction of this new information in this model, with respect to the model in Case 8.4,
distinguishes a unique supply curve, but it is impossible to distinguish a unique demand
curve. The observed values of Qt and Pt, i.e. the intersection points of demand and supply
curves, trace (identify) the supply curve. Thus, the supply equation is exactly identified,
whilst the demand equation is not identified.
This can also be seen in the following combined equation, which is a linear combination
of the two original Equations (8.70) and (8.71):
(8.76)
where
(8.77)
Observing the combined Equation (8.76) we see that it has exactly the same form as the demand Equation (8.70) and a different form from the supply Equation (8.71). We say,
therefore, that the combined and the demand equations are observationally equivalent,
meaning that the demand function is not identified, and the combined and the supply equa-
tions are not observationally equivalent, meaning that the supply function is identified.
Case 8.6 An exactly or just identified model Assume a structural demand-supply model
under the equilibrium conditions as:
(8.78)
(8.79)
This model differs from the model in Case 8.5 because the supply equation now contains
more information. The quantity supplied depends now not only on prices, but on wages, Wt,
as well, depicting the negative effect of the cost of a major factor of production on the quantity produced and supplied. The reduced form model is:
(8.80)
(8.81)
where
(8.82)
By counting the coefficients of the structural and the reduced form models we see that the
structural model has six coefficients and the reduced form model also has six coefficients.
Therefore, it is generally possible from the coefficients system (8.82) to find a unique
solution for all structural coefficients, and, thus, both equations are exactly identified. In this
case the model as a whole is exactly identified.
We can derive the same results using the following combined equation, which is a linear
combination of the two original Equations (8.78) and (8.79):
(8.83)
where:
(8.84)
Observing the combined Equation (8.83) we see that it has a different form from the forms
of both the demand Equation (8.78) and the supply Equation (8.79). We say, therefore, that
the combined and the demand and supply equations are not observationally equivalent,
meaning that both the demand and the supply functions are identified.
Case 8.7 Overidentified equations Assume a structural demand-supply model under the
equilibrium conditions as:
(8.85)
(8.86)
This model differs from the model of Case 8.6 because the demand equation now contains
even more information. The quantity demanded depends now not only on prices and income
but on the prices of a substitute good, Pft, as well, thus depicting the positive effect on the
quantity demanded of an increase in the price of the substitute good. We met this model in Equations (8.46) and (8.47). The reduced form model is:
(8.87)
(8.88)
where
(8.89)
By counting the coefficients of the structural and of the reduced form models we see that the
structural model has seven coefficients and the reduced form model has eight coefficients.
Therefore, it is not possible from the coefficients system (8.89) to find a unique solution for
all of the structural coefficients. In fact, we see that from system (8.89) we can obtain:
(8.90)
meaning that from the same data we can get two generally different estimates of β1.
Furthermore, because β1 appears in all the equations of system (8.89), for each estimate of
β1 in (8.90) we get a set of estimates of the other structural coefficients as well. Therefore,
the two equations are overidentified.
We can derive the same results by using the following combined equation, which is a
linear combination of the two original Equations (8.85) and (8.86):
(8.91)
where
(8.92)
Observing the combined Equation (8.91) we see that it has a different form from that of the
demand Equation (8.85) and the supply Equation (8.86). We say, therefore, that the combined
and the demand and supply equations are not observationally equivalent, meaning that both
the demand and supply functions are identified.
Summarising the cases, we can say that if, in a system of two simultaneous equations, an equation excludes a variable that appears in the other equation, then this equation is identified. This ‘condition’ is the
general topic of investigation in the next section.
(8.93)
In general, the structural model has G² more parameters than the reduced form model. Therefore, the indirect least squares method is not generally successful in providing unique estimates of the structural parameters, because the parameters of the structural model exceed those of the reduced form model by G². For obtaining unique estimates, i.e. for identification, we need further prior information in the form of G² a priori restrictions on the parameters.
This prior information, referring to the parameters of the Γ, Β and Σ matrices, comes in several forms, as follows:
Normalisation This is the case in each equation when the coefficient of one of the G endogenous variables is normalised to be equal to one. Usually it is set that γii = 1, for i = 1, 2, . . ., G, which corresponds to the endogenous variable that appears on the left-hand side of the equation. Therefore, with normalisation, the G² a priori restrictions reduce to G(G − 1) restrictions.
Zero restrictions This is the case when some of the endogenous or predetermined variables do not appear in a particular equation. The corresponding coefficients of the excluded variables are set equal to zero.
Identity restrictions This refers to the inclusion of identities in the model. The identities do not
carry any coefficients to be estimated, but they aid in the identification of other equations.
Parameter restrictions within equations This refers to certain relationships among param-
eters within equations. If the Cobb-Douglas production function, for example, exhibits constant returns to scale, the sum of the elasticities of the factors of production must equal one.
Parameter restrictions across equations This refers to certain relationships among param-
eters across equations. If consumer behaviour in the European Union member states, for example, is the same, then the marginal propensity to consume must be set equal across the member states’ consumption functions.
(8.94)
For a single equation of the system (8.55), say equation i, (8.94) becomes:
(8.95)
where
(8.96)
For this new form of the coefficients system, let us introduce the following notation:
Under this notation, and without loss of generality, (8.96) is written as:
(8.97)
and
(8.98)
By substituting (8.97) and (8.98) into (8.95) and by partitioning matrix Π appropriately,
(8.95) is written as follows:
(8.99)
(8.100)
(8.101)
In these two subsystems we observe that (8.101) contains γ coefficients only, which are g in
number and correspond to the coefficients of all the endogenous variables included in equation
i. By normalisation we can put one of these coefficients equal to 1 and, therefore, the number
of the unknown coefficients in (8.101) reduces to g − 1. Subsystem (8.100) contains both the γ
coefficients that are also contained in subsystem (8.101) plus the β coefficients, which are k in
number and correspond to the coefficients of all the predetermined variables included in equa-
tion i. Therefore, if we first estimate the γ coefficients from subsystem (8.101) then we can
estimate the β coefficients by substituting the estimated γ coefficients in subsystem (8.100).
Because subsystem (8.101) now contains g − 1 unknown coefficients, for the existence of a solution this subsystem must have at least g − 1 equations, or it must be that:
k* = K − k ≥ g − 1 (8.102)
In words, (8.102) says that in order for equation i to be identified, the number of predetermined variables excluded from (not appearing in) this equation i, which is k*, must be greater than or equal to the number of endogenous variables included (appearing) in this equation i, less one (g − 1). (8.102) is known as the order condition.
However, it is possible that some of the equations in subsystem (8.101) are not independent with respect to the γ and π coefficients. Therefore, the order condition is only a necessary condition for identification and not a sufficient condition. Thus, a necessary and sufficient condition for identification requires the number of independent equations in subsystem (8.101) to be equal to g − 1. This, of course, will happen if, and only if, the order of the largest square submatrix of matrix Πgk* with a non-zero determinant is equal to g − 1, or, in other words, if, and only if, it is true that:
rank(Πgk*) = g − 1 (8.103)
(8.103) is known as the rank condition, because it can be proved (see Kmenta, 1971) that:
(8.104)
where Δ is the matrix which consists of all the structural coefficients for the variables of
the system excluded from the ith structural equation but included in the other structural
equations. The rank condition (8.103) may also be written as:
rank(Δ) = G − 1 (8.105)
Let us now summarise the order and rank conditions for identification of an equation in a
simultaneous equations model.
Order condition (necessary): K − k ≥ g − 1 (8.106)
Rank condition (necessary and sufficient): rank(Δ) = G − 1 (8.107)
The properties of the conditions for identification The order condition for identification is a
necessary condition, whilst the rank condition for identification is a necessary and sufficient
condition. In other words, for an equation to be identified, the rank condition must be satisfied, but whether the equation is exactly identified or overidentified is determined by the order condition, according to whether K − k = g − 1 or K − k > g − 1, respectively.
Furthermore, let us state the identification possibilities for an equation:
• If K − k < g − 1, the equation is underidentified.
• If K − k = g − 1 and the rank condition is satisfied, the equation is exactly identified.
• If K − k > g − 1 and the rank condition is satisfied, the equation is overidentified.
• If the rank condition is not satisfied, the equation is not identified, whatever the order condition suggests.
In what follows we apply the preceding theory in investigating identification in two simple
models of simultaneous equations.
Consider the demand-supply model of Case 8.5, which under equilibrium conditions
is written in Equations (8.70) and (8.71), or
(8.108)
(8.109)
Investigating the demand function In this function there are k = 1 exogenous variables
(Yt) and g = 2 endogenous variables (Qt, Pt). Therefore, the order condition K − k = 1
− 1 = 0 < g – 1 = 2 − 1 = 1 is ‘satisfied’ for underidentification.
Investigating the supply function In this function there are k = 0 exogenous variables
and g = 2 endogenous variables (Qt, Pt). Therefore, the order condition K − k = 1 − 0
= 1 = g − 1 = 2 − 1 = 1 is satisfied for exact identification.
The rank condition can be investigated by forming matrix Δ from (8.109), which is:
Δ = [− α2]
|Δ1| = | − α2| = − α2 ≠ 0
Therefore, the rank condition is satisfied since we can construct at least one non-zero
determinant of order 1, i.e. rank(Δ) = G − 1 = 1. In summary, by taking into account
both the order and rank conditions, the supply function is exactly identified.
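The counting in such examples can be mechanised. The small helper below is hypothetical (not part of the text) and simply applies the order condition, with K the predetermined variables in the model, k those included in the equation, and g the endogenous variables included in the equation.

```python
def order_condition(K, k, g):
    """Classify equation i by the order condition: K - k excluded predetermined
    variables versus g - 1 included endogenous variables (less one)."""
    if K - k < g - 1:
        return "underidentified"
    if K - k == g - 1:
        return "exactly identified (provided the rank condition holds)"
    return "overidentified (provided the rank condition holds)"

# Demand-supply model of Case 8.5, where K = 1 (the single predetermined variable Y_t):
print("demand:", order_condition(K=1, k=1, g=2))   # underidentified
print("supply:", order_condition(K=1, k=0, g=2))   # exactly identified
```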
(8.110)
where Gt = government spending and the εt’s are disturbances. This model is complete because it has four equations
and four endogenous variables. The endogenous variables are Ct, It, Tt and Yt and the
predetermined variables are Ct−1, Yt−1, It−1 and Gt.
According to (8.55), model (8.110) can be written in the following way:
(8.111)
Therefore, the rank condition is satisfied since we can construct at least one non-zero
determinant of order 3, i.e. rank(Δ) = G − 1 = 3. In summary, by taking into account
both the order and rank conditions, the consumption function is overidentified.
The rank condition can be investigated by forming matrix Δ from (8.111), which is:
Therefore, the rank condition is satisfied since we can construct at least one non-zero
determinant of order 3, i.e., rank(Δ) = G − 1 = 3. In summary, by taking into account
both the order and rank conditions, the investment function is overidentified.
Investigating the taxes function In this function there are k = 0 predetermined vari-
ables and g = 2 endogenous variables (Tt, Yt). Therefore, the order condition K − k =
4 − 0 = 4 > g − 1 = 2 − 1 = 1 is satisfied for overidentification.
The rank condition can be investigated by forming matrix Δ from (8.111), which is:
Therefore, the rank condition is satisfied since we can construct at least one non-zero
determinant of order 3, i.e. rank(Δ) = G − 1 = 3. In summary, by taking into account
both the order and rank conditions, the taxes function is overidentified.
1 Single-equation methods of estimation These are the methods where each equation in
the model is estimated individually, taking into account only the information included
in the specific equation and without considering all the other information included in the
rest of the equations of the system. For this reason, these methods are known as limited
information methods. We present the following such methods:
a Ordinary least squares method (OLS), for fully recursive models.
b Indirect least squares (ILS) method, for exactly identified equations.
c Instrumental variables (IV) method.
d Two-stage least squares (2SLS) method.
e Limited information maximum likelihood (LIML) method.
(8.112)
(8.113)
then the model is called a fully recursive model. In this case we can apply the OLS estimation
in the first equation of the model because the predetermined variables in this equation are
uncorrelated with the error term ε1t. We can also apply the OLS estimation in the second
equation of the model because both the predetermined variables in this equation are uncor-
related with the error term ε2t, and the endogenous variable Y1t is also uncorrelated with ε2t.
This is because the endogenous variable Y1t, which is also an explanatory variable in the
second equation, although it is a function of ε1t, is still not correlated with ε2t due to (8.113),
which assumes that ε1t and ε2t are not correlated. For exactly the same reasons, we can apply
OLS to all equations of the fully recursive model. In other words, the structural parameters of
the fully recursive model can be estimated consistently and asymptotically efficiently with
OLS. Of course, in cases where the structural disturbances are correlated among equations,
then the OLS approach is not applicable and therefore other methods have to be used.
(8.114)
(8.115)
(8.116)
and
(8.117)
Table 8.2 Wages-prices link data of an EU member
8.6.1.2 Indirect least squares method (ILS), for exactly identified equations
Previously, we used this method many times when trying to obtain unique estimates of the
structural coefficients using consistent estimates of the reduced form coefficients. In fact, we
said there that if an equation is exactly identified then the indirect least squares method
yields unique estimates of the structural coefficients of this equation. To summarise, the
steps used for this method are:
Step 1 Solve the structural equations system in order to get the reduced form system, i.e. the
system of (8.57), or:
(8.118)
Step 2 Apply the OLS method to each reduced form equation and get the consistent and
asymptotically efficient estimates of the corresponding reduced form coefficients, i.e. apply
the OLS method to (8.58), or:
(8.119)
Step 3 Substituting the estimates of the reduced form coefficients into the coefficients system,
solve it in order to get consistent and asymptotically efficient estimates of the structural coef-
ficients, i.e. solve the system (8.95), or the two subsystems (8.100) and (8.101), or, finally:
(8.120)
(8.121)
where c’s are the estimates of γ’s, b’s are the estimates of β’s, and P’s are the estimates of Π’s.
(8.122)
(8.124)
According to step 2 and using the data in Table 8.2, the results of the OLS application
to the unrestricted reduced form model:
(8.125)
(8.126)
(8.127)
According to step 3, the estimates of the structural coefficients will be obtained from
the solution of the following corresponding coefficients system:
(8.128)
(8.129)
(8.130)
where
(8.131)
(8.132)
where
(8.133)
If d1 denotes an estimator of δ1, the OLS estimator for Equation (8.132) is:
(8.134)
and
(8.135)
where
(8.136)
We have argued till now that this estimator yields inconsistent estimates because of the
correlation of Z1 with ε1, or because of the correlation of Y1 with ε1.
If W1 is now an n×(g − 1 + k) matrix that satisfies all the requirements to be considered as
an instrumental variables matrix, then the instrumental variables estimator, which gives
consistent estimates, is given by:
(8.137)
and
(8.138)
where
(8.139)
However, in the case of an exactly identified equation, the number of predetermined variables excluded from the equation, K − k, is equal to the number of endogenous variables included on the right-hand side, g − 1. Therefore, these excluded predetermined variables can be used as instruments in place of Y1. In this case it can be proved that the ILS estimator is the same as the IV estimator.
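A minimal numpy sketch of the IV computation just described might look as follows. The array names (y1 for the left-hand-side variable, Z1 = [Y1 X1] for the included regressors, W1 for the instrument matrix, with as many instruments as regressors in the just-identified case) are hypothetical, and the covariance expression used is one common form.

```python
import numpy as np

def iv_estimator(y1, Z1, W1):
    """IV estimator d1 = (W1'Z1)^{-1} W1'y1; the covariance below,
    s^2 (W1'Z1)^{-1} W1'W1 (Z1'W1)^{-1}, is one common form of its estimated variance."""
    WZ = W1.T @ Z1                                  # square if instruments = regressors
    d1 = np.linalg.solve(WZ, W1.T @ y1)
    resid = y1 - Z1 @ d1
    s2 = resid @ resid / (len(y1) - Z1.shape[1])    # residual variance estimate
    cov = s2 * np.linalg.inv(WZ) @ (W1.T @ W1) @ np.linalg.inv(WZ.T)
    return d1, cov
```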
(8.140)
Because both equations are exactly identified, we see that the IV estimates in (8.140)
are exactly the same as estimates found with the ILS method in (8.129). However, the
estimates in (8.140) also include the standard errors estimated according to (8.138).
The corresponding ILS estimates in (8.129) do not include their standard errors because, due to the non-linear relationships between the structural and the reduced form coefficients, the computation of the standard errors of the structural coefficients through those of the reduced form coefficients is not straightforward.
(8.141)
where
with
If d1 denotes an estimator of δ1, the two-stage least squares approach has the following two
steps:
Step 1 Apply the OLS method to the following unrestricted reduced form equations:
(8.142)
to obtain the estimates of the reduced form coefficients pi = (X′X)⁻¹X′Yi, where pi is the estimate of πi, and use these estimates to obtain the predicted sample values of Yi, i.e. Ŷi = Xpi = X(X′X)⁻¹X′Yi.
Step 2 Use the predicted sample values of Yi, to construct matrix Ẑ1 = [Ŷ1 X1] where
Ŷ1 = [Ŷ2 Ŷ3 . . . Ŷg] and apply the OLS method to the equation:
(8.143)
(8.144)
The 2SLS estimator (8.144), in terms of the values of the original variables and for the ith
equation is written as
(8.145)
and
(8.146)
where
(8.147)
For an exactly identified equation, it can be shown that the 2SLS estimator is the same as the ILS estimator, and it can be interpreted as an IV estimator (see Johnston, 1984).
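The two stages can be written out directly. The following is a minimal sketch with hypothetical array names (y1 the left-hand-side variable, Y1 the included right-hand-side endogenous variables, X1 the included predetermined variables, X all predetermined variables of the model), not a reproduction of any particular software routine.

```python
import numpy as np

def two_sls(y1, Y1, X1, X):
    """Two-stage least squares for a single structural equation."""
    # Stage 1: regress the included endogenous regressors Y1 on all predetermined
    # variables X and keep the fitted values Y1_hat = X (X'X)^{-1} X'Y1.
    Y1_hat = X @ np.linalg.solve(X.T @ X, X.T @ Y1)
    # Stage 2: OLS of y1 on [Y1_hat, X1].
    Z_hat = np.column_stack([Y1_hat, X1])
    return np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y1)
```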
One of the basic assumptions in the analysis above is that the structural error terms were
not autoregressive. In the case that the error terms are autoregressive, we use the two-stage
least squares with autoregression (2SLS/AR). We can distinguish two cases:
(8.148)
For the ith equation and the tth observation the usual quasi-differenced equation is the
following:
(8.149)
Step 1 The same as step 1 of the 2SLS method. In other words, regress all the endogenous
variables on all the exogenous variables of the model and keep their predictions, Ŷi, t.
Step 2 The same as step 2 of the 2SLS method. In words, substitute the predictions of the
endogenous variables found in step 1 in place of the corresponding endogenous variables on
the right-hand side of each equation and apply OLS. Having estimated di, compute the 2SLS
residuals ei, and then compute the estimate of ρi, i.e. compute:
(8.150)
Step 3 Take the results of steps 1 and 2 and formulate the following model:
(8.151)
Step 1 Treat all lagged endogenous variables as the current endogenous variables, regress
all the current endogenous variables on all the (strictly) exogenous variables of the model
and keep their predictions, Ŷi, t.
Step 2 The same as step 2 of the 2SLS method. In other words, substitute the predictions of
the current and the lagged endogenous variables found in step 1 in place of the corresponding
endogenous variables on the right-hand side of each equation and apply OLS. Having esti-
mated di, compute the 2SLS residuals ei, and then compute the estimate of ρi, using (8.150).
Step 3 Take the results of steps 1 and 2 and formulate the following model:
(8.152)
(8.153)
(8.154)
(8.155)
Income identity:
(8.156)
The structural model of Equations (8.153)–(8.156) has four endogenous variables (Ct,
It, rt, Yt) and six predetermined variables (Ct−1, It−1, Mt−1, Pt, rt−1, Gt, plus the constant).
According to the order and rank conditions, all structural equations are overidentified.
The estimated equations with the 2SLS method, using EViews, are shown below:
Private consumption function:
(8.157)
where the figures in brackets next to the diagnostic tests show the level of significance.
Private gross investment function:
(8.158)
(8.159)
Although the estimates of the three structural equations are generally acceptable in terms of the a priori restrictions (signs of coefficients) and the statistical and diagnostic tests, the money demand function shows some autocorrelation in the residuals (DW and BG tests) and indicates that the instruments are not all valid (Sargan test). Therefore, we present below the estimates of the money demand function with the method of 2SLS/AR(1).
(8.160)
In (8.160) the estimated correlation coefficient is not significant, as can be seen from
the low t-ratio.
8.6.1.5 Limited information maximum likelihood (LIML) method
This method utilises the likelihood of only those endogenous variables included in the structural equation to be estimated, and not of all the endogenous variables of the model; hence the name ‘limited’. Assume, for example, that we want to estimate the following first
equation of the model:
(8.161)
where
or the equation
(8.162)
where
The unrestricted reduced form equations for the endogenous variables included in the first structural equation are given by:
(8.163)
where
Under the assumption that the reduced form disturbances are normally distributed, the log of
their joint distribution L1 is:
(8.164)
According to Anderson and Rubin (1949), the limited information maximum likelihood approach refers to the maximisation of (8.164) subject to the restriction that rank(Πgk*) = g − 1, where Πgk* is given in (8.103). However, it can be proved (see Johnston (1984), Kmenta (1986)) that the maximisation of (8.164) is equivalent to the minimisation of the least variance ratio, given by:
(8.165)
By noting that c1g is an estimate of γ1g, b1 is an estimate of β1, and λ is the smallest character-
istic root of the determinantal equation:
(8.166)
(8.167)
and
(8.168)
The LIML estimator given in (8.167) and (8.168) has the same asymptotic variance-covariance matrix as the 2SLS estimator. Furthermore, in the case of an exactly identified equation it can be shown that λ = 1, and, therefore, the LIML estimator produces the ILS estimator.
(8.169)
(8.170)
(8.171)
8.6.2 System methods of estimation
The common factor in the single-equation estimation methods proposed till now is that in all
methods we tried to bypass the correlation between the explanatory endogenous variables in
the structural equations under estimation and their error terms. These single-equation estima-
tion techniques did not consider the possible correlation of disturbances among equations.
Thus, although the single-equation estimators were consistent, they were not asymptotically efficient. In this section we will present two methods of estimation of the structural parameters of a model that treat all equations simultaneously, and thus increase the efficiency of the corresponding estimators, because they use all the available information across equations, such as, for example, the correlation among their error terms.
(8.172)
(8.173)
To avoid the problem of correlation between the explanatory endogenous variables Yi and εi,
for i = 1, 2, . . ., G, we saw in the previous sections that we can use the predicted values of
Yi, from the regression of Yi on all the predetermined variables of the model, in the place of
the corresponding explanatory endogenous variables. Then, system (8.172) is written as:
(8.174)
and respectively system (8.174), in which the various possible identities are not included, is
written as:
(8.175)
System (8.175) leads to the consistent 2SLS estimator, which, according to (8.144), is the
following:
(8.176)
(8.177)
However, knowing the 2SLS consistent estimator in (8.176), we use it in order to compute a
consistent estimate W of Ω, as follows:
(8.178)
By substituting (8.178) into (8.177) we obtain the 3SLS estimator, which is:
(8.179)
with
(8.180)
In summary, the three steps in the 3SLS method are the following:
Step 1 The same as step 1 of the 2SLS method. In words, regress all the endogenous vari-
ables on all the exogenous variables of the model and keep their predictions, Ŷi, t.
Step 2 The same as step 2 of the 2SLS method. In other words, substitute the predictions of
the endogenous variables found in step 1 in place of the corresponding endogenous variables
on the right-hand side of each equation and apply OLS. Having estimated di,2SLS, compute
matrix W in (8.178).
Step 3 Using the results of steps 1 and 2, apply the GLS estimator, i.e. compute the 3SLS
estimator according to (8.179) and (8.180).
If in place of the estimates di,2SLS of step 2, the estimates di,3SLS of step 3 are used, then the
whole procedure may be iterated, to produce the iterative three-stage least squares. However,
it has been proved (Madansky, 1964) that this iterative procedure does not improve the
asymptotic efficiency and does not provide the maximum likelihood estimator. Furthermore,
it has been proved (Zellner and Theil, 1962) that the exactly identified equations do not add
relevant information in the estimation of the overidentified equations.
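A rough sketch of these three steps is given below. The inputs are hypothetical: for each equation, its dependent variable, its original right-hand-side variable matrix, and the same matrix with the endogenous regressors replaced by their step-1 fitted values; this is an illustration under those assumptions, not a definitive implementation.

```python
import numpy as np
from scipy.linalg import block_diag

def three_sls(y_list, Z_list, Zhat_list):
    """Rough 3SLS sketch. For each equation i: y_list[i] is the dependent variable,
    Z_list[i] = [Y_i X_i] the original right-hand-side variables, and Zhat_list[i]
    the same matrix with endogenous regressors replaced by step-1 fitted values."""
    n = len(y_list[0])
    # Step 2: 2SLS equation by equation, then residuals with the original regressors.
    d2 = [np.linalg.lstsq(Zh, y, rcond=None)[0] for Zh, y in zip(Zhat_list, y_list)]
    e = np.column_stack([y - Z @ d for y, Z, d in zip(y_list, Z_list, d2)])
    W = e.T @ e / n                      # estimated disturbance covariance matrix
    # Step 3: GLS on the stacked system, giving the 3SLS estimator.
    Zbig = block_diag(*Zhat_list)
    ybig = np.concatenate(y_list)
    Om_inv = np.kron(np.linalg.inv(W), np.eye(n))
    return np.linalg.solve(Zbig.T @ Om_inv @ Zbig, Zbig.T @ Om_inv @ ybig)
```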
(8.181)
(8.182)
(8.183)
(8.184)
and
(8.185)
The full information maximum likelihood estimator is based on the entire system of equations
and can be found by maximising the following log of the likelihood function (Kmenta, 1986):
(8.186)
with respect to Γ, Β and Σ. This estimator is consistent and asymptotically efficient. In the
case that the simultaneous equations model contains identities, the variance-covariance
matrix of the structural disturbances Σ is singular and, therefore, its inverse Σ−1 does not
exist. In this case, and before constructing the likelihood function, the identities of the system
can be eliminated by substituting them into the other equations.
(8.187)
(8.188)
(8.189)
(8.190)
the system of the structural equations is not a system of simultaneous equations but instead it is a set of equations. In this case each equation contains one, and only one, endogenous variable, i.e. its dependent variable.
Generally speaking, this set of equations may be written analytically as:
(8.191)
(8.192)
where for the ith equation, Yi is a n × 1 vector of the values of the dependent variable, Xi is
a n × Ki matrix of the values of the explanatory variables, εi is a n × 1 vector of the values of
the error variable, and βi is a Ki × 1 vector of the corresponding regression coefficients. The
equations in (8.192) may be also written together as:
(8.193)
(8.194)
where the substitution is obvious.
With respect to the error terms, we employ the following assumptions:
1 For each equation i (=1,2,. . .,G), the error terms have zero mean:
(8.195)
2 For each equation i (=1,2,. . .,G), the error terms have constant variance over time:
(8.196)
3 For each equation i (=1,2,. . .,G) and for two different time periods t ≠ s (=1,2,. . .,n), the
error terms are not autocorrelated:
(8.197)
4 For the same time period t (=1,2,. . .,n), the error terms of two different equations i ≠ j
(=1,2,. . .,G) may be correlated (contemporaneous correlation):
(8.198)
5 For two different equations i ≠ j (=1,2,. . .,G) and for two different time periods t ≠ s
(=1,2,. . .,n), the error terms are not correlated:
(8.199)
(8.200)
or
(8.201)
(8.202)
From the discussion it is seen that the only link between the equations of the set of equations
in (8.191), or in (8.192), is the contemporaneous correlation. In other words, the only link is
through the covariance σij of the error terms of the ith and the jth equations. For this reason
Zellner (1962) gave the name seemingly unrelated regression equations (SURE) to these
equations.
The methods used in order to estimate the set of equations in (8.191), or in (8.192), depend
on the assumptions about the error terms. These methods are the following:
8.7.1 Ordinary least squares (OLS)
If we assume that from the five assumptions (8.195) to (8.199), only the assumptions (8.195)
and (8.196) hold, then each equation in the set of equations in (8.191) could be estimated
individually by the classical ordinary least squares method. The least squares estimator in
this case is the best linear unbiased estimator, and is given by:
(8.203)
with
(8.204)
where
(8.205)
Consider the following set of three linear demand equations for specific food products:
(8.206)
The a priori restrictions for the coefficients in the three demand equations are the
following:
meat: β11<0 (normal good), β12>0 (substitute good), β13>0 (superior good)
vegetables: β21<0 (normal good), β22>0 (substitute good), β23>0 (superior good)
oils: β31<0 (normal good), β33>0 (superior good)
Assuming that there is no contemporaneous correlation between the error terms of the
three demand equations, we can apply the OLS method. The corresponding results,
using the data in Table 8.4, are shown below:
Table 8.4 Consumption expenditure for specific food products (at constant 1970 market prices,
million drs), price indices for specific food products (base year 1970), and personal disposable
income (at constant 1970 market prices, million drs) of an EU member.
Year Qm Qv Qo Pm Pf Pv Pd Po Y
1960 6978 11313 4977 0.730 0.631 0.764 0.727 0.635 117179.2
1961 7901 13094 5300 0.687 0.643 0.766 0.740 0.638 127598.9
1962 8660 12369 5430 0.668 0.665 0.779 0.767 0.647 135007.1
1963 9220 13148 5528 0.706 0.686 0.825 0.778 0.729 142128.3
1964 9298 15564 6186 0.826 0.723 0.778 0.829 0.732 159648.7
1965 11718 15132 6457 0.844 0.794 0.908 0.890 0.772 172755.9
1966 12858 16536 6428 0.874 0.854 0.977 0.946 0.786 182365.5
1967 13686 16659 5559 0.887 0.878 0.996 0.955 0.938 195611.0
1968 14468 18357 6582 0.872 0.890 0.952 0.974 0.857 204470.4
1969 15041 18733 7108 0.888 0.913 1.072 0.989 0.922 222637.5
1970 16273 19692 6681 1.000 1.000 1.000 1.000 1.000 246819.0
1971 17538 18865 7043 1.089 1.109 1.129 1.009 1.003 269248.9
1972 18632 19850 7223 1.139 1.181 1.198 1.016 1.038 297266.0
1973 19789 21311 7439 1.421 1.398 1.531 1.186 1.272 335521.7
1974 20333 21695 7311 1.643 1.600 1.918 1.582 1.696 310231.1
1975 21508 22324 7326 1.810 1.993 2.162 1.788 1.931 327521.3
1976 22736 21906 7638 2.100 2.159 2.709 2.008 1.957 350427.4
1977 25379 20312 6979 2.294 2.390 3.191 2.122 2.140 366730.0
1978 26151 22454 6973 2.499 3.169 3.733 2.504 2.444 390188.5
1979 26143 23269 7287 3.243 3.795 4.241 2.985 2.775 406857.2
1980 26324 24164 8103 3.862 6.850 5.294 3.764 3.440 401942.8
1981 24324 26572 7335 5.270 9.000 6.382 4.898 4.072 419669.1
1982 26155 25051 7591 6.709 10.177 7.450 5.945 4.727 421715.6
1983 27098 25344 8492 7.874 11.637 8.625 6.973 5.537 417930.3
1984 27043 25583 8580 8.949 13.951 10.682 8.319 6.774 434695.7
1985 27047 23402 8303 10.546 18.230 11.960 10.134 8.725 456576.2
1986 27136 23666 7287 12.262 23.003 13.128 12.514 10.254 439654.1
1987 28616 24639 7258 13.439 27.152 15.561 14.254 10.200 438453.5
1988 29624 26857 7328 15.092 29.460 16.553 15.776 11.007 476344.7
1989 30020 26431 7230 18.444 33.972 19.307 18.873 13.585 492334.4
1990 29754 25808 7261 21.632 40.391 23.881 22.757 17.181 495939.2
1991 29332 26409 7596 23.916 45.400 29.734 26.049 22.076 513173.0
1992 30665 26358 7638 27.246 51.814 31.481 29.590 20.751 502520.1
1993 31278 28120 7815 30.083 57.155 32.248 35.182 22.355 523066.1
1994 35192 28405 8126 31.827 63.540 36.397 40.331 26.603 520727.5
1995 36505 28388 8195 33.601 72.491 39.122 42.793 31.090 518406.9
(8.207)
(8.208)
(8.209)
The estimates of the three demand equations are acceptable in the light of the signs of
the regression coefficients and the reported statistics.
If assumption (8.198) holds, the OLS estimator is not efficient because it does not use the
information on the contemporaneous correlation. To obtain a more efficient
estimator than the OLS estimator, the set of equations may be written in the form of the
‘stacked’ model of (8.193). In other words, the G different equations may be viewed as the
single equation of (8.193), and the generalised least squares estimator may be used. This
estimator, which is a best linear unbiased estimator, is given by (Aitken, 1935):
(8.210)
with
(8.211)
Step 1 Apply the OLS method to each of the equations individually and obtain the estimates
of the regression coefficients according to (8.203), or:
(8.212)
Step 2 Using these OLS estimates obtain the corresponding residuals according to:
(8.213)
Step 3 Using the residuals estimated in (8.213) obtain consistent estimates sij of the vari-
ances-covariances σij according to:
(8.214)
Step 4 Using the estimates in (8.214) construct the following variance-covariance matrices
S and W, as being estimates of the matrices Σ and Ω, respectively:
(8.215)
Step 5 Substituting the estimates of (8.215) into (8.210) and (8.211) we obtain the SUR
estimator, as:
(8.216)
with
(8.217)
Step 6 Having estimated the regression coefficients in (8.216), the whole process could be
iterated by substituting these estimates into step 2 and then continuing to the next steps. This
process will yield the iterated SUR estimates, which correspond to the maximum likelihood
estimates of a SUR model (Oberhofer and Kmenta, 1974).
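The six-step procedure above can be illustrated with a short numerical sketch. The following Python/numpy fragment is a minimal, non-iterated implementation of steps 1–5 for a small system; the function name is mine, the equations are assumed to have the same number of observations, and the covariance estimator in step 3 simply divides by n.

```python
import numpy as np

def sur_fgls(X_list, y_list):
    """Feasible GLS (SUR) estimator: equation-by-equation OLS, residual
    covariances, then GLS on the stacked model (steps 1-5)."""
    G = len(X_list)                       # number of equations
    n = y_list[0].shape[0]                # observations per equation

    # Step 1: OLS on each equation individually
    b_ols = [np.linalg.solve(X.T @ X, X.T @ y) for X, y in zip(X_list, y_list)]

    # Step 2: OLS residuals, one column per equation
    E = np.column_stack([y - X @ b for X, y, b in zip(X_list, y_list, b_ols)])

    # Step 3: consistent estimates s_ij of the contemporaneous covariances
    S = (E.T @ E) / n

    # Step 4: estimate of the stacked-model covariance matrix (Kronecker form)
    W = np.kron(S, np.eye(n))

    # Step 5: GLS on the stacked system (fine for small systems; large systems
    # would exploit the Kronecker structure instead of inverting W directly)
    X = np.zeros((G * n, sum(Xi.shape[1] for Xi in X_list)))
    col = 0
    for i, Xi in enumerate(X_list):       # block-diagonal stacked regressor matrix
        X[i * n:(i + 1) * n, col:col + Xi.shape[1]] = Xi
        col += Xi.shape[1]
    y = np.concatenate(y_list)
    W_inv = np.linalg.inv(W)
    b_sur = np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv @ y)
    return b_sur, S
```

Repeating steps 2–5 with the SUR estimates in place of the OLS estimates, until the coefficients converge, would give the iterated SUR estimates referred to in step 6.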
Step 1 The OLS estimates of the regression coefficients of the three equations indi-
vidually have been obtained in Example 8.12.
Step 2, 3, 4 Using these estimates, the residuals for the three equations have been
calculated and, using these residuals, the variance-covariance matrix S
has been estimated as follows:
(8.218)
Step 5, 6 Using (8.216) and (8.217) the SUR estimates have been obtained to be the
following (results from EViews):
The demand for meat:
(8.219)
(8.220)
(8.221)
The variance-covariance matrix S used in the final iteration of the iterative SUR esti-
mation (EViews) is the following:
(8.222)
Comparing the results in Example 8.13 (SUR) with those in Example 8.12 (OLS) we
see that the estimates in Example 8.13 are in general more efficient (greater t-ratios)
than the corresponding estimates in Example 8.12. Generally, the greater the contem-
poraneous correlation, the more efficient the SUR estimates are with respect to the
OLS estimates.
(8.223)
To test for contemporaneous correlation, Breusch and Pagan (1980) suggested the Lagrange multiplier
statistic λ, which, under the null hypothesis that all contemporaneous covariances are zero,
is asymptotically distributed as chi-squared with G(G − 1)/2 degrees of freedom and is given by:
(8.224)
where
(8.225)
Example 8.14 Food demand equations for an EU member (testing for contempo-
raneous correlation)
Consider the set (8.206) of the three linear demand equations for specific food prod-
ucts of Example 8.12. According to (8.225), the residual correlation matrix rOLS of the
OLS residuals (Example 8.12) and the residual correlation matrix rSUR of the SUR
residuals (Example 8.13) are the following:
(8.226)
(8.227)
Taking into account that G = 3, the degrees of freedom are G(G − 1)/2 = 3, and for a
significance level of 0.05 we get from the tables of the chi-squared distribution that
χ2(3) = 7.81473. Thus, because the value of the Lagrange multiplier statistic is
greater than the critical value of the χ2(3) distribution, the null hypothesis is rejected in
favour of the alternative hypothesis. In other words, at least one covariance is non-
zero, suggesting that there exists contemporaneous correlation.
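For readers who wish to reproduce this kind of test outside EViews, the following sketch computes a Breusch–Pagan-type statistic from the matrix of equation-by-equation OLS residuals. It assumes the statistic takes the usual form of n times the sum of the squared below-diagonal residual correlations; the exact expression intended here is the one in (8.224).

```python
import numpy as np
from scipy import stats

def breusch_pagan_lm(residuals):
    """LM test for contemporaneous correlation across a set of equations.
    residuals: n x G matrix of equation-by-equation OLS residuals."""
    n, G = residuals.shape
    r = np.corrcoef(residuals, rowvar=False)          # residual correlation matrix
    # assumed form: n times the sum of squared below-diagonal correlations
    lam = n * sum(r[i, j] ** 2 for i in range(1, G) for j in range(i))
    df = G * (G - 1) // 2
    p_value = 1 - stats.chi2.cdf(lam, df)
    return lam, df, p_value
```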
(8.228)
(8.229)
where
(8.230)
or
(8.231)
However, for these tests to be operational, matrix Σ in the formulas above must be replaced
with its estimate S (see Judge et al. 1985).
Example 8.15 Food demand equations for an EU member (testing linear restric-
tions on the coefficients)
Consider the set (8.206) of the three linear demand equations for specific food prod-
ucts of Example 8.12. According to (8.193) this set of equations may be written as:
(8.232)
Assume now that we want to test if the marginal propensity to consume (demand) with
respect to income is the same for all food commodities. In other words, we want to test
if the restrictions β13 = β23 = β33, or, equivalently, the restrictions β13−β33 = 0 and
β23 − β33 = 0, are correct. According to (8.228) these restrictions are written:
(8.233)
Taking into account (8.233) and using the SUR estimation in Example 8.13, statistic
(8.229) is equal to (estimation with EViews) g = 834.2011. Because this value is
greater than the 0.05 significance level critical value of χ2(2) = 5.99146, the null hypothesis
is rejected in favour of the alternative hypothesis. In other words, at least one of
the marginal propensities to consume is not equal to the other propensities to consume.
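A Wald-type version of this restriction test can be sketched as follows. The function is generic and assumes that the stacked SUR coefficient vector and its estimated covariance matrix are available (for example, from the SUR sketch given earlier); the illustrative R matrix assumes a simplified ordering of three coefficients per demand equation, which would need to be adapted to the actual specification of (8.206).

```python
import numpy as np
from scipy import stats

def wald_linear_restrictions(b, V, R, r):
    """Wald statistic for H0: R b = r, where b is the stacked coefficient
    vector and V its estimated covariance matrix; compared with chi-squared
    with as many degrees of freedom as rows of R."""
    diff = R @ b - r
    g = diff @ np.linalg.solve(R @ V @ R.T, diff)
    return g, 1 - stats.chi2.cdf(g, R.shape[0])

# Illustration only: a hypothetical 9-coefficient stacking in which each demand
# equation contributes [constant, own price, income].
R = np.zeros((2, 9))
R[0, 2], R[0, 8] = 1.0, -1.0      # beta13 - beta33 = 0
R[1, 5], R[1, 8] = 1.0, -1.0      # beta23 - beta33 = 0
r = np.zeros(2)
```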
(8.234)
Step 1 Apply the OLS method to each of the equations individually and obtain the estimates
of the regression coefficients.
Step 2 Using these OLS estimates obtain the corresponding residuals ei and estimate the
autocorrelation coefficients ρ̂i for each equation individually, by regressing ei,t on ei,t−1.
Step 3 Using the estimated autocorrelation coefficients transform the data according to the
following transformations:
(8.235)
Step 4 Apply the SUR estimator to the transformed data of step 3, i.e. apply the steps 3, 4, 5
and 6 of the methodology in Section 8.7.3.
Demand for meat; demand for fruits and vegetables; demand for oils and fats
(8.236)
(8.237)
(8.238)
Finally, the residual covariance matrix S and the residual correlation matrix r used in
the estimation are the following:
(8.239)
From (8.224), using the estimated residual correlation matrix r in (8.239), it can be
calculated that λ = 7.1356. This value is less than the 0.05 significance level critical
value of χ2(3) = 7.81473 and, thus, it can be argued that it was not necessary to apply
the SURE methodology to the transformed data; an OLS application to these
transformed data would yield efficient results.
(8.240)
Anderson and Rubin (1950) suggested a likelihood ratio statistic, which is given by:
(8.241)
where λi is the smallest characteristic root of the determinantal equation of the LIML
estimation (see Section 8.6.1.4). If the value of the LR statistic is greater than the
critical value of the χ2 distribution at a given significance level, this means that
exogenous variables have been inappropriately omitted from the equation under
examination.
Taking into account that n = 35, we obtain from (8.241) and from the tables of the
chi-squared distribution, for a 0.05 significance level:
In other words, the LR test suggests that exogenous variables have been inappropriately
omitted from the private consumption function and from the money demand
function. However, Basmann (1960) found that this test rejects the null hypothesis too
often.
b. Lagrange multiplier tests (LM)
Hausman (1983) suggested the following steps in order to test for omitted variables in an
equation (see also Wooldridge, 1990):
Step 1 Estimate the specific equation with one of the single-equation methods of estimation
and save the residuals.
Step 2 Regress the saved residuals in step 1 on all the predetermined variables of the model
plus the constant and note R2.
Step 3 Use the Lagrange multiplier statistic LM, which is given by:
(8.242)
Taking into account that n = 35, we obtain from (8.242) and from the tables of the
chi-squared distribution, for a 0.05 significance level:
In other words, the LM test suggests that exogenous variables have been inappropriately
omitted from the money demand function only.
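A rough sketch of this omitted-variables LM procedure (steps 1–3 above) is given below; it assumes the statistic takes the common LM = nR2 form, where R2 comes from regressing the saved residuals on a constant and all the predetermined variables.

```python
import numpy as np

def lm_omitted_variables(residuals, Z):
    """Steps 1-3 of the omitted-variables LM test: regress the saved
    single-equation residuals on a constant and all predetermined variables,
    then form the statistic (assumed here to be LM = n * R-squared)."""
    n = len(residuals)
    Zc = np.column_stack([np.ones(n), Z])
    b = np.linalg.lstsq(Zc, residuals, rcond=None)[0]
    e = residuals - Zc @ b
    r2 = 1 - (e @ e) / ((residuals - residuals.mean()) @ (residuals - residuals.mean()))
    return n * r2, r2
```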
(8.243)
with matrix Zt = [X1t X2t . . . XKt] being the matrix of all the predetermined variables and
matrix Yt = [Y1t Y2t . . . YGt] being the matrix of all the endogenous variables in the model.
Wooldridge (1991) suggested the following procedure in order to test the serial correlation
of any order in the disturbances of the ith equation of the model:
Step 1 Estimate the ith Equation (8.243) by 2SLS and save the corresponding residuals ei,t.
Step 2 Estimate the reduced form equations of the simultaneous equations model, i.e.
regress each endogenous variable of the model on Zt, and save the fitted values of all the
endogenous variables Ŷt.
Step 3 Regress ei,t against Ŷi,t, Xi,t and ei,t−1, ei,t−2, . . ., ei,t−p and note R2.
Step 4 Use the Lagrange multiplier statistic LM, which is given by:
(8.244)
In other words, the LM test suggests that the disturbances in the money demand func-
tion exhibit some serial correlation of the second order (compare with results in (8.159)).
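The following fragment sketches steps 3 and 4 of Wooldridge's procedure for a chosen lag order p, assuming the 2SLS residuals, the fitted endogenous regressors and the included exogenous regressors have already been obtained as in steps 1 and 2; the (n − p)R2 form of the statistic is an assumption on my part.

```python
import numpy as np

def serial_corr_lm(e, Y_hat, X, p):
    """Steps 3-4 of Wooldridge's procedure: regress the 2SLS residuals e_t on
    the fitted endogenous regressors, the included exogenous regressors and
    p lags of e_t, then form an nR^2-type statistic (assumed form)."""
    n = len(e)
    lags = np.column_stack([e[p - k - 1:n - k - 1] for k in range(p)])  # e_{t-1},...,e_{t-p}
    W = np.column_stack([np.ones(n - p), Y_hat[p:], X[p:], lags])
    y = e[p:]
    b = np.linalg.lstsq(W, y, rcond=None)[0]
    u = y - W @ b
    r2 = 1 - (u @ u) / ((y - y.mean()) @ (y - y.mean()))
    return (n - p) * r2, r2
```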
(8.245)
Considering Equation (8.243), Hausman (1978) suggested the following steps for testing the
hypotheses in (8.245):
Step 1 Estimate the reduced form equations of the simultaneous equations model, i.e. regress
each endogenous variable of the model on Zt, and save the fitted values of all the endogenous
variables Ŷt and also save the corresponding reduced form residuals ut.
Step 2 Because Yt = Ŷt + ut, substitute this expression of the explanatory endogenous vari-
ables into Equation (8.243) and estimate by OLS the following equation:
(8.246)
For efficient estimation, Pindyck and Rubinfeld (1991) suggest the following equation,
instead of Equation (8.246):
(8.247)
Step 3 Use the F-test (or the t-test for one regression coefficient) to test the significance of
the regression coefficients of the ui,t variables. If the test shows significant coefficients,
accept the alternative hypothesis in (8.245), i.e. accept that there is simultaneity. If the test
shows non-significant coefficients, accept the null hypothesis in (8.245), i.e. accept that there
is no simultaneity.
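A compact sketch of this simultaneity test, in the augmented form (8.247) in which the explanatory endogenous variables are kept and their reduced-form residuals are added as extra regressors, might look as follows; all argument names are hypothetical and the endogenous regressors are assumed to be supplied as an n × m matrix.

```python
import numpy as np

def hausman_simultaneity(y, X_exog, Y_endog, Z):
    """Hausman-type simultaneity test: obtain the reduced-form residuals of
    each explanatory endogenous variable, add them to the structural equation
    as extra regressors and inspect their t-ratios (significant coefficients
    point to simultaneity)."""
    n = len(y)
    Zc = np.column_stack([np.ones(n), Z])
    # Step 1: reduced-form residuals of the endogenous regressors (n x m)
    Pi = np.linalg.lstsq(Zc, Y_endog, rcond=None)[0]
    U = Y_endog - Zc @ Pi
    # Step 2: augmented OLS regression keeping Y_endog and adding U
    W = np.column_stack([np.ones(n), X_exog, Y_endog, U])
    b = np.linalg.lstsq(W, y, rcond=None)[0]
    e = y - W @ b
    sigma2 = (e @ e) / (n - W.shape[1])
    V = sigma2 * np.linalg.inv(W.T @ W)
    m = U.shape[1]
    # Step 3: t-ratios on the reduced-form residual columns
    return b[-m:] / np.sqrt(np.diag(V)[-m:])
```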
(8.248)
(8.249)
(8.250)
According to step 3, the significance level for the regression coefficient of uY,t in Equation
(8.248) using the t-test is {0.0000}, the significance level for the regression coefficients
of uY,t and ur,t in Equation (8.249) using the F-test is {0.000}, and the significance level
for the regression coefficient of uY,t in Equation (8.250) using the t-test is {0.4299}. In
other words, the simultaneity problem is present in the private consumption function and
in the private gross investment function, whilst there is no simultaneity problem in the
money demand function.
(8.251)
Step 1 Estimate the reduced form equations of the simultaneous equations model, i.e. regress
each endogenous variable of the model on Zt, and save the fitted values of all the endogenous
variables Ŷt.
(8.252)
Step 3 Use the F-test (or the t-test for one regression coefficient) to test the significance of
the regression coefficients of the Ŷt variables. If the test shows significant coefficients,
accept the alternative hypothesis in (8.251), i.e. accept that the corresponding variables are
endogenous. If the test shows non-significant coefficients, accept the null hypothesis in
(8.251), i.e. accept that the corresponding variables are exogenous.
(8.253)
(8.254)
(8.255)
Consumption (one column per estimation method; t-ratios in brackets)
Constant: 5250.619 [2.8302], 7519.307 [3.554], 8423.51 [3.6551], 7835.32 [3.9651], 8261.57 [1.7689]
Yt: 0.25626 [6.8976], 0.17415 [3.6384], 0.14142 [2.6283], 0.16152 [3.8522], 0.14666 [1.7050]
Ct−1: 0.61213 [11.1495], 0.73293 [10.377], 0.78107 [9.8403], 0.75162 [12.1551], 0.77342 [6.2582]
Investment
Constant: 6014.662, 6232.39, 6254.74, 6442.75, 6365.42
Turning to single-equation methods of estimation, we saw that 2SLS and LIML are
preferable to OLS because they produce consistent estimates. Furthermore, the systems
methods of estimation, 3SLS and FIML, are preferable to 2SLS and LIML, because they
produce more efficient estimates than the single-equation methods of estimation. However, the
preceding discussion refers to the asymptotic properties of the estimates of the various
methods and not to their finite-sample properties, whereas in most applications the sample sizes
are not infinite but finite and generally small.
Although, in theory, the systems methods appear to be asymptotically preferable
to the single-equation methods, in practice we have to take into account the following
disadvantages of systems methods:
1 Even in these days of high-speed computers, the computational burden for a moderate
or large econometric model is quite large.
2 Possible specification errors in one or more equations of the system are transmitted to
the other equations of the system, even if those equations are correctly specified.
Therefore, the systems methods of estimation are very sensitive to specification errors,
whilst the single-equation methods of estimation are not as sensitive. For the latter
methods, any specification error in one equation stays with that equation and does not
affect the estimates of the rest of the equations.
higher t-ratios.
Review questions
1 Explain what you understand by the ‘identification problem’. Why should identification
problems be dealt with prior to estimation?
2 ‘An equation is identified if the “rank” condition is satisfied.’ Discuss.
3 Discuss the advantages of the 2SLS method. Under what circumstances might indirect
least squares be used?
4 Explain the full information maximum likelihood method. Illustrate your answer with
reference to a model of your own choice.
5 Explain what you understand by each of the following:
a simultaneous equation bias
b identification problem
c seemingly unrelated equations.
6 Explain the types of diagnostic tests for the simultaneous equation systems. Explain
each of the following:
Aitken, A. C. (1935). ‘On least squares and linear combinations of observations’, Proceedings of the
Royal Society of Edinburgh, 55, 42–48.
Anderson, T. and Rubin, H. (1949). ‘Estimation of the parameters of a single equation in a complete
system of stochastic equations’, Annals of Mathematical Statistics, 20, 46–63.
Anderson, T. and Rubin, H. (1950). ‘The asymptotic properties of estimators of the parameters of a
single equation in a complete system of stochastic equations’, Annals of Mathematical Statistics, 21,
570–582.
Basmann, R. (1960). ‘On finite sample distributions of generalised classical linear identifiability test
statistics’, Journal of the American Statistical Association, 55, 650–659.
Breusch, T. and Pagan, A. (1980). ‘The Lagrange multiplier test and its applications to model specification in
econometrics’, Review of Economic Studies, 47, 239–254.
Hausman, J. A. (1978). ‘Specification tests in econometrics’, Econometrica, 46, 1251–1271.
Hausman, J. A. (1983). ‘Specification and estimation of simultaneous equations models’, in
Z. Griliches and M. Intriligator (eds) Handbook of Econometrics, Amsterdam: North Holland.
Johnston, J. (1984). Econometric Methods, 3rd edn, New York: McGraw-Hill.
Judge, G. G., Griffiths, W. E., Hill, R. C., Lutkepohl, H. and Lee, T. C. (1985). The Theory and
Practice of Econometrics, 2nd edn, New York: Wiley.
Qualitative variables in
econometric models – panel
data regression models
• This unit covers regression models involving cross-section data and panel data analysis.
The common trait is the availability, in recent years, of these types of data for regres-
sion. This availability of data has made it possible for regression analysis to be carried
out on many interesting applied issues involving large-scale cross-section and panel
data sets.
• Cross-section data sets are typically collected via surveys and questionnaires. They
contain information on many qualitative variables, which are key in explaining the deci-
sion-making process by individual economic units. This unit provides an introduction to
the treatment of qualitative variables in regression models, as either regressors or
regressands.
• The unit contains three chapters. Chapter 9 explains the inclusion of qualitative vari-
ables via dummy variables as regressors in the regression models. Chapter 10 provides
an introductory coverage of qualitative response models. In this type of model the
dependent variable is a categorical qualitative variable representing an individual’s
response to a particular question of interest. The qualitative response models can deal
with many interesting applied issues involving an individual decision-making process.
• Chapter 11 provides introductory coverage of panel data regression analysis. This type
of data is used to eliminate the omitted variable problem inherent in cross-section data
studies, as well as dealing with many interesting empirical issues which can only be
investigated via a panel data set containing data on the same cross-sectional units over
time.
9 Dummy variable regression
models
INTRODUCTION
Econometric models are derived on the basis of deductive reasoning, which identifies inter-
relationships among a number of variables. These variables are measured by conventional
time series or cross-section data. However, qualitative variables, those variables which
cannot be measured by quantitative data, can also have a significant impact on the dependent
variable under consideration. For example, in cross-sectional analysis of household consump-
tion expenditure, qualitative variables, such as location, the level of education of household
members, and the gender mix of the household, can influence the household consumption.
Dummy variables are introduced into regression models to capture the impact of these quali-
tative variables on household consumption expenditure.
Qualitative variables can also appear as dependent variables in regression models. For
example, in a study of incidence of private health insurance, based on cross-section data, the
dependent variable of the model is expressed as either an individual possessing private
insurance, or not, given his/her characteristics, including income and occupation. In this type
of model, the dependent variable essentially represents the response of an individual to a
particular question being asked. The answer to the question is either positive or negative,
representing qualitative variables. In recent years, due to the availability of large-scale
micro-econometric data, these models have become popular, particularly in the area of
market research.
This introductory chapter deals with the qualitative variables as regressors in regression
models, while the next chapter provides an introductory discussion of the qualitative depen-
dent variable models.
Key topics
• Dummy variables
• Seasonal adjustment of data using dummy variables
• Pooling time series and cross-section data
• Testing for the structural break using dummy variables
consumption expenditure
The location of households can have a significant impact on household consumption.
Location is a qualitative variable and to capture its impacts, we need to employ dummy
variables. In the example of family consumption expenditure, to bring the location variable
into the analysis, we distinguish between two locations: south and north. We define the
following location dummy variable:
How does location of the household influence consumption expenditure? We can distinguish
between three types of impact:
Figure 9.1 depicts the population regression of the two different locations under 1, above.
Notice that β2 is the marginal propensity to consume in both locations. The slopes of the
lines are the same, while the level of autonomous consumption expenditure is assumed to be
higher in the south. Thus, households with the same level of income, on average, have higher
autonomous expenditure in the south, because the cost of living in the south might be higher.
The location variable is, therefore, assumed to shift the southern household’s consumption
function upwards, as depicted in Figure 9.1.
[Figure 9.1 Consumption–income regression lines for the two locations: the southern household line, E(Ci/Yi) = α1 + β2Yi, lies above the northern line, and both lines have slope β2; consumption Ci is plotted against income Yi.]
Note that the basic idea is that location of the
household is a relevant regressor in cross-section studies of the type we are studying. This is
a hypothesis and should be tested later. We have two regression models:
(9.1)
(9.2)
A way of proceeding would be to run two regressions, one for the south and one for the
north, then test to see if the difference between α1 and β1 is statistically significant. This
procedure, however, is seldom done in practice, since it involves two regressions. The
‘correct’ way is to use the location dummy variable to combine the two models and then run
one regression to test the significance of location. This can be done in two ways: regression
with or without the intercept term. We consider the regression with the intercept term here
and regression without the intercept term in 9.2.
Combined model
Using the northern group as a ‘base’ group, the location dummy may be used to derive the
combined model, as follows:
(9.3)
Base group (northern household); deviation from the base group
If the household is located in the north, Di = 0, and the regression model will be Ci = β1 +
β2Yi + ui, which is the northern household group regression equation. If the household is
located in the south, then Di = 1 and the regression model will be Ci = α1 + β2Yi + ui. In this
example, the location variable only affects the intercept terms of the model, i.e. the level of
autonomous expenditure. The location variables can, however, also influence the slope
parameter.
above example, the qualitative variable is the location of the household, with two categories:
northern households (those living in the north of the country) and southern households (those
living in the south of the country). Following this rule we need only one location dummy
variable, as defined above. If by mistake we introduce two location dummy variables, one
for the north and one for the south, we would fall into the so-called dummy variable trap. In
this situation, the estimation of each individual parameter would not be possible. This is
because by introducing two location dummy variables instead of one, we have created a
perfect linear relationship between some of the regressors of this model. This type of linear
relationship is called perfect multicollinearity, and if this situation arises it would not be
possible to estimate each individual parameter separately, rendering the OLS regression
meaningless. To illustrate this problem, we introduce below two location dummy variables,
as follows:
and
When the ith household is from the south, Si = 1 and Ni = 0, so the intercept term would
be β1 + β3; neither β1 nor β3 can be separately estimated, as what we are estimating
is their sum. When the household is from the north, Ni = 1 and Si = 0, and in this case the
intercept term of the regression model would be β1 + β4. Again, we are estimating the sum of
two parameters. This is what is meant by the dummy variable trap: too many dummy
variables are introduced into the regression model, making the estimation of each individual
parameter impossible. Note that the value of the regressor of the intercept term is always set
equal to unity. Also, the sum of Si + Ni is equal to unity. There is, therefore, a perfect linear
relationship (perfect multicollinearity) between the intercept term’s regressor and Si and Ni, as
this regressor’s value is now equal to the sum of Si and Ni. Estimation of each individual
parameter is not possible due to perfect multicollinearity inherent in this regression model.
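The rank deficiency behind the dummy variable trap is easy to verify numerically; the tiny example below uses made-up data and simply checks the column rank of the regressor matrix with and without the redundant second location dummy.

```python
import numpy as np

n = 6
south = np.array([1, 0, 1, 1, 0, 0])            # S_i (made-up data)
north = 1 - south                                # N_i
income = np.array([10., 12., 9., 15., 11., 13.])

# Intercept regressor plus BOTH location dummies: S_i + N_i equals the column
# of ones, so the columns are linearly dependent (perfect multicollinearity).
X_trap = np.column_stack([np.ones(n), income, south, north])
print(np.linalg.matrix_rank(X_trap))             # 3, not 4

# Dropping one dummy (using the north as the base group) restores full rank.
X_ok = np.column_stack([np.ones(n), income, south])
print(np.linalg.matrix_rank(X_ok))               # 3 = number of columns
```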
To avoid the dummy variable trap we have two options:
• Allow for the intercept term in the regression model but use one of the alternatives as a
base/reference group. In this case we need to introduce dummy variables for all the
other groups. This approach works best in practice as most regression packages require
the regression equation to include an intercept term.
or
• Do not allow for the intercept term (no base/reference group), and introduce dummy
variables for each group. This option requires regression without an intercept term. We
discuss both of these techniques below.
(9.4)
(9.5)
Combined model
Using the northern group as a ‘base/reference’ group, the location dummy may be used to
derive the combined model, as follows:
(9.6)
Base group
How does this combined model work? We start with the northern household; in this case
Di = 0 and the model is:
(9.7)
(9.8)
(9.9)
The combined model may then be estimated by the OLS method using a cross-section
data set. The model can be used to test the hypothesis that location makes a difference to
household consumption. We can conduct the following test:
9.1.4 Significance tests for location on autonomous expenditure
We set the null hypothesis as:
H0: α1 − β1 = 0 (the level of autonomous expenditure is the same for both northern and
southern households)
H1: α1 − β1 ≠ 0
Let γ = α1 − β1, where γ is the parameter of the dummy variable, Di, in the combined
equation. The test statistic may be written as:
t = γ̂/se(γ̂) (9.10)
To proceed with the test, we estimate the equation by OLS and calculate the value of the ‘t’
statistic. If the calculated value of t falls in the rejection region, we reject H0 at
α = 5%, concluding that the difference in the level of autonomous expenditure in the two
regions is statistically significant. We conduct a similar test of significance on the slope
parameter, (α2 − β2), to see if there are differences between the marginal propensity to
consume in the two regions. Finally, we can conduct a joint test of significance.
Test statistic:
F = [(RSSR − RSSU)/d] / [RSSU/(n − k)] (9.11)
where RSSR = residual sum of squares of the restricted model, RSSU = residual sum of squares
of the unrestricted model, d is the number of restrictions (i.e. d = 2) and n – k is the degrees of
freedom of the unrestricted regression equation. To proceed with the test, we need to estimate
the unrestricted and the restricted equation, calculate the value of the test statistic and compare
the value with the critical value of the test, say at the 5% level of significance; if:
the calculated value exceeds the critical value, we reject the null hypothesis; otherwise we do not
reject it. (For the t-test on a single coefficient, the corresponding non-rejection region is
−t0.025n−k ≤ t ≤ t0.025n−k.)
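As a rough illustration of the intercept-dummy test described in 9.1.4, the sketch below estimates the combined model Ci = β1 + γDi + β2Yi + ui by OLS and returns the t-ratio and two-sided p-value for γ; the function name and argument layout are mine.

```python
import numpy as np
from scipy import stats

def location_dummy_test(C, Y, D):
    """OLS estimation of C_i = b1 + gamma*D_i + b2*Y_i + u_i and a two-tailed
    t-test of H0: gamma = 0 (no difference in autonomous expenditure)."""
    n = len(C)
    X = np.column_stack([np.ones(n), D, Y])
    b = np.linalg.lstsq(X, C, rcond=None)[0]
    e = C - X @ b
    sigma2 = (e @ e) / (n - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t_gamma = b[1] / se[1]
    p_value = 2 * (1 - stats.t.cdf(abs(t_gamma), df=n - X.shape[1]))
    return b, t_gamma, p_value
```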
There is no base group, so we need to define two dummy variables, one for each group:
(9.12)
Let us now see how it works. If a household is located in the north, D1i = 1 and D2i = 0, we get:
(9.13)
If a household is located in the south, D1i = 0 and D2i = 1, and we get the southern household
model. Note that within this framework the regression model is estimated with no intercept
term to avoid the dummy variable trap and the problem of perfect multicollinearity discussed
above. The lack of an intercept term can cause problems for some regression packages. It is
therefore advisable to select a base group and run a model with an intercept term in applied
work.
= 0 otherwise
D5i = 1 if the ith head of household has a degree
= 0 otherwise
To proceed we choose a ‘base/reference group’. We define the ‘base group’ to be the one for
which the values of each dummy variable is zero. Within this framework the base/reference
group household is defined as being located in the north, the head of the household is male,
the age of the head of household is 50 years or over, and the head of household has not got
a university degree. The regression model of the base group may, therefore, be written as
follows:
(9.14)
The above regression model is then used to estimate the difference between autonomous
expenditure for each group of households and that of the base group, as follows:
(9.15)
Difference due to age (less than 25 yrs); difference due to age (25 yrs < age < 50 yrs); difference due to education level
Having estimated the model, we undertake ‘t’ and ‘F’ tests to test the significance of each
qualitative variable (t-test) as well as to test their collective significance (F-test).
Note that, when we choose a base group, the regression model has an intercept
term. Moreover, the number of dummy variables needed for each qualitative variable is one
less than the number of its categories. For example, taking into account the three categories
of age, as defined above, we need to define only two dummy variables, when a base group
is used.
9.4 Seasonal adjustments of data using seasonal dummies
Many economic variables exhibit seasonal variations over time. Therefore, values change
with the seasons. For example, consumer expenditure tends to increase in the last quarter of
the year, compared with others. Dummy variables are frequently used to capture seasonal
components. To illustrate the methodology, we consider the following example.
(9.16)
where
All variables are measured in natural logs. Based on 28 quarterly seasonally unadjusted
observations, we obtain the following OLS results:
(9.17)
(0.310) (0.160) (0.140)
where the figures in parentheses are the estimated standard errors.
Residual sum of squares: RSS = 0.75; number of observations: n = 28.
To see if the demand for textiles has seasonal components, we use the fourth quarter of the
year as the ‘base/reference’ period. We then use seasonal dummy variables to compare other
quarters with the fourth quarter:
(9.18)
Second quarter compared with the base quarter; third quarter compared with the base quarter
Now consider how the regression model works. When the observations relate to the first
quarter of the year, D1t = 1 and D2t = D3t = 0; we get the following model, corresponding to
the first quarter:
(9.19)
1st quarter: D1t = 1, D2t = 0, D3t = 0
The regressions for the second and the third quarter of the year are obtained similarly as:
(9.20)
2nd quarter: D1t = 0, D2t = 1, D3t = 0
and
(9.21)
3rd quarter: D1t = 0, D2t = 0, D3t = 1
The seasonal regression model was estimated by OLS, generating the following results:
(9.22)
RSS = 0.65
n = 28
From these results, autonomous per capita expenditure on textile goods in the fourth
quarter of the year is estimated at 1.20. The coefficient of D1t, 0.28, shows how
much this figure changes in the first quarter of the year, compared to the fourth quarter. It
appears, therefore, that autonomous per capita expenditure in the first quarter, compared to
the fourth quarter, rises by 0.28. Then in the second and third quarters, compared to the
fourth, it falls by 0.38 and 0.45, respectively. We also notice that each seasonal component
is statistically significant. To check if there is a statistically significant seasonal variation in
expenditure on textiles, we perform a joint test of significance:
H0: (α1 − β1) = (α2 − β1) = (α3 − β1) = 0
(9.23)
(9.24)
= 14/39 ≅ 0.3
Since this value is less than the critical value of the F-distribution at the 5% level of significance,
we cannot reject the null hypothesis. Therefore, there are no seasonal effects present in the
data. Note, in this example, seasonal variation is assumed to only impact the intercept term,
leaving slope parameters unchanged over time.
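The joint test of the seasonal dummies can be reproduced with a small helper that evaluates the F statistic of (9.11) from the restricted and unrestricted residual sums of squares; the value of k (the number of regressors in the unrestricted model) is left to the user, since it depends on the exact specification of (9.18).

```python
from scipy import stats

def seasonal_f_test(rss_r, rss_u, d, n, k):
    """F = [(RSSR - RSSU)/d] / [RSSU/(n - k)] for the joint significance of
    the seasonal dummies; d restrictions, k regressors in the unrestricted model."""
    F = ((rss_r - rss_u) / d) / (rss_u / (n - k))
    return F, 1 - stats.f.cdf(F, d, n - k)

# e.g. seasonal_f_test(0.75, 0.65, 3, 28, k) with k set to the number of
# regressors in the unrestricted model (9.18).
```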
(9.25)
zit = 1 for i = 2, . . ., n
= 0 otherwise
and
(9.26)
Cit and Yit are, respectively, consumption and income of the ith household in period t. In
this regression model a randomly selected cross-section data set consisting of n observations
is pooled with the time series data to increase the number of observations via the
dummy variable technique. Note that household one is the base group and the changes in the
intercept term across households and time are compared to this household’s intercept term
β1. The number of observations is now n × 3 = 3n for this pooled regression model.
(9.27)
(9.28)
(9.29)
The above regression model may be used to test for the parameter stability, as follows:
(9.30)
The test statistic is the familiar F-test for testing linear restrictions on the parameters of the
regression model:
(9.31)
Decision rule: if the F-test value is greater than the corresponding critical value at a pre-set level
of significance, reject the null, concluding that the break occurred in 1997. Note that the number
of restrictions is only two, as there are two restrictions on the parameters of the combined
model. This procedure provides a simple alternative to the Chow test, and is used frequently in
practice to test for structural breaks in the time series and to check the parameters’ stability.
Review questions
1 Explain how dummy variables may be employed in regression models to account for the
impact of qualitative variables on the dependent variables. Give examples to illustrate
your answer.
2 Explain what you understand by the dummy variable trap. How can this problem be
avoided in practice?
3 It is thought that the stock prices are relatively low on Mondays, the so-called ‘Monday
effect’. Explain how this effect on stock prices might be captured via the dummy vari-
able technique.
4 Explain how dummy variables might be employed to capture seasonal variations in the
data. Why are there normally only three seasonal dummies used when there are four
seasons?
5 Give an example and demonstrate the use of dummy variables for testing parameter
stability and for structural break in regression models.
10 Qualitative response regression
models
INTRODUCTION
In recent years large-scale cross-section data sets containing several hundred and even
thousands of observations on the characteristics of individuals, firms and even towns/
cities/regions have become available. The availability of this type of data, and software
advances, has made it possible for practitioners to perform empirical investigations in
the fields of social science, economics and marketing. A distinguishing feature of this
type of investigation involving regression models is that the dependent variable is
specified as a qualitative variable, representing the response of an individual, or firm,
to particular questions. For example, in an investigation on the incidence of R&D
activities by firms, a firm is either undertaking R&D activities, or not, given its characteris-
tics (e.g. sales, exports, etc.). The dependent variable, the R&D status of the firm, is
dichotomous, representing the response of the firm to the question: are you undertaking
R&D activities? The response is either positive or negative. The dependent variable is
therefore a categorical variable, represented by a dummy variable taking two values: if
the answer is positive, it takes a value of one, otherwise it takes a value of zero.
Alternatively, in a study of incidence of private health insurance, using a large sample
of cross-section of data on the characteristics of individuals, an individual either has
private health insurance, or not, given his or her characteristics (e.g. income, occupation,
age, etc.). The question being asked of each individual is: do you have private health
insurance? The response is either positive or negative. If it is found to be positive, the
dependent variable, the insurance ownership status of an individual, takes a value of one,
otherwise it takes a value of zero. In these examples, the dependent variable is defined on the
characteristics of individuals/firms. It is a dichotomous qualitative variable eliciting a ‘yes’
or a ‘no’ response. Qualitative response models lead to a number of interesting problems
concerning estimation, interpretation and analysis. In this introductory chapter we deal
with binary qualitative dependent variable regression models, as well as multinomial and
ordered logit regression models.
Key topics
• The linear probability model (LPM)
• The logit and probit models
• The multinomial logit model
• The ordered logit/probit models
10.1 The linear probability model (LPM)
We introduce the LPM by considering the incidence of private health insurance ownership.
Let Y = an individual’s private health insurance status, with Yi = 1 if the ith individual has
private health insurance and Yi = 0 otherwise. Relating this status to income Xi, a simple
deterministic model may be written as:
Yi = β1 + β2Xi
In addition, we introduce a disturbance term ui, to take into account the influence of other
factors, so:
(10.1)
where E(ui) = 0, Var(ui) = δ2, Cov(ui, uj) = 0 for all i ≠ j
A model, where a dichotomous dependent variable is expressed as a linear function of
one, or a number of regressors, is called a linear probability model. This is because the
expected value of Yi is the conditional probability that the individual has private health
insurance (i.e. Yi = 1), given the individual’s income (given Xi). To see this, let Pi = proba-
bility that Yi = 1 (the response is positive). In this case, 1 – Pi = probability that Yi = 0, the
response is negative. The probability distribution of Yi is:
Yi = 1 with probability Pi; Yi = 0 with probability 1 − Pi.
The mathematical expectation of Yi is therefore: E(Yi/Xi) = Pi, hence the conditional expec-
tation of Yi, given Xi is the probability of a positive response, given income. Note also that,
under E(ui) = 0, using LPM, we have:
(10.2)
(10.3)
where 0 ≤ Pi ≤ 1.
This is done by estimating β1 and β2, using data on Xi and Yi. However, there is no mech-
anism in the model to ensure that the estimated probabilities obtained from the estimate of
β1 and β2 are within the permissible range of zero and one. This is an obvious problem in the
use of LPMs. In addition, the assumption of linearity may be considered to be rather unreal-
istic. To see this, notice that β2 can be expressed as:
β2 = ΔPi/ΔXi
That is, β2 measures the change in probability due to a unit change in income. Since the
parameter β2 is a constant, the change in the probability is assumed to be the same at all
levels of income. This feature is rather unrealistic. Up to a certain level of income, it is
reasonable to assume that incremental changes in income have little effect on the probability.
Beyond that level, the probability is likely to rise at an increasing rate with income, and once
a certain high level of income is reached, further changes in income have little additional
effect on the probability.
This type of non-linear relationship between the probability Pi and income may be depicted
as follows:
[Figure: the probability Pi plotted against income X follows an S-shaped curve bounded between 0 and 1.]
For all values of Xi, the logistic function falls between zero and one. Given this property
and its non-linearity, the logistic function is frequently used to estimate conditional probabili-
ties in models involving qualitative dependent variables.
In addition to the above problems, another issue arises with LPMs.
10.1.2 Heteroscedasticity
The LPM is associated with heteroscedasticity in the disturbance term. In other words, under
the LPM specification, the disturbance terms of the model are inevitably
heteroscedastic. The estimation of the model by OLS therefore results in inefficient
estimators. To see why the disturbance term is heteroscedastic, we derive the variance:
The disturbance term takes only two values with the following probabilities:
when Yi = 1
(10.4)
when Yi = 0
(10.5)
(10.6)
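For completeness, the standard derivation behind (10.4)–(10.6) can be sketched as follows, using only the definitions given above.

```latex
u_i = Y_i - \beta_1 - \beta_2 X_i =
\begin{cases}
1 - P_i & \text{with probability } P_i \ (Y_i = 1)\\
-P_i    & \text{with probability } 1 - P_i \ (Y_i = 0)
\end{cases}
\qquad\Longrightarrow\qquad
\operatorname{Var}(u_i) = P_i(1 - P_i)^2 + (1 - P_i)P_i^2 = P_i(1 - P_i)
```

Since Pi = β1 + β2Xi varies across observations, so does Var(ui): the disturbance is heteroscedastic.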
In addition to the presence of heteroscedasticity, the disturbance term is not normally distrib-
uted and, unless a large number of observations is available, statistical tests based on a normal
distribution cannot be used in LPMs. Due to these problems inherent in the LPM, in practical
applications the preferred choice is the logit model described below:
1 Being based on the logistic curve, for all values of the regressors, the value of the
dependent variable (the probability of a positive response) falls between zero and one.
2 The probability function is a non-linear function following a logistic curve. This is a
more realistic pattern of change in the probability when compared to LPMs.
3 The estimation of the logit model is quick and easy and there are many packages
available for the estimation of logit-type models (SHAZAM, EViews, LIMDEP,
Microfit 4/5).
(10.7)
Equation (10.7) gives the probability of a positive response. Under this specification, the
probability of a negative response is:
(10.8)
(10.9)
To estimate the model, we take the natural log of both sides, hence:
(10.10)
or
(10.11)
where Li is the logit (the log of the odds ratio). In practice, the model is usually estimated by maximum
likelihood methods. However, in principle, it may be estimated by the OLS method, provided
that a large data set is available. To illustrate the OLS estimation using group data, we use
the following steps:
Step 1 A large cross-section data set is typically collected via questionnaires. The data is
arranged into different categories of income.
Step 2 For each category, the number of people with a positive response to the binary
question are recorded. To generate estimates/proxies for Pi, the probability of a positive
response, this number is then divided by the total number of respondents in each income
category.
Step 3 These calculated probabilities are then used in the regression model to allow
estimation. This procedure is shown below:
(10.12)
where ui is a disturbance term added to (10.11) to capture the net influence of variables
other than income on the dependent variable. Given a large data set, the distribution of the
disturbance term can be shown to approach a normal distribution, as follows:
(10.13)
Note that the variance of this distribution changes with each observation, and the assumption
of homoscedasticity cannot be maintained. To generate efficient estimates, both sides of the
logit model are usually weighted by √[NiPi(1 − Pi)], with estimated relative frequencies used
in place of the probabilities; this weighting usually succeeds in removing the
heteroscedasticity. The model can then be estimated by
the OLS method.
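When individual (ungrouped) data are available, the maximum likelihood route mentioned above can be sketched directly; the fragment below fits a binary logit by Newton–Raphson with numpy only, and is intended as an illustration rather than a substitute for the packages listed earlier.

```python
import numpy as np

def logit_mle(X, y, iters=25):
    """Binary logit estimated by maximum likelihood using Newton-Raphson
    (equivalent to iteratively reweighted least squares)."""
    X = np.column_stack([np.ones(len(y)), X])     # add a constant term
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))          # fitted probabilities
        W = p * (1.0 - p)                         # logistic weights
        grad = X.T @ (y - p)                      # score vector
        H = X.T @ (X * W[:, None])                # information matrix
        b = b + np.linalg.solve(H, grad)          # Newton step
    se = np.sqrt(np.diag(np.linalg.inv(H)))       # asymptotic standard errors
    return b, se
```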
The estimated slope coefficient shows the impact of a unit change in a regressor (a partial
change if there is more than one regressor) on the log of odds ratio. This concept, however, is
seldom used in applied work as the focus of attention in many studies is on the estimated
conditional probabilities. Computing packages (e.g. Microfit, EViews) routinely report the
fitted conditional probabilities associated with any given set of values of the regressors. The
use of these fitted values will be illustrated in an example of logit model estimation given
below.
(10.14)
[Figure: the probit and logit cumulative distribution functions plotted against X1, both bounded between 0 and 1.]
Given the slight differences between the two CDFs, the choice between probit and logit models is
essentially a matter of convenience.
In order to specify a regression model, we need to use theory to identify the key factors
influencing a firm’s R&D activities. According to theoretical reasoning (see, for example,
Seddighi and Hunly (2007)), some of these factors may be listed as:
• X1 = firm size. We distinguish between small, medium and large firms, so:
large firms are those where turnover/sales exceed $1 million
medium firms are those where turnover/sales are between $100,000 and $1 million
small firms are those where turnover/sales are less than $100,000
• X2 = export intensity. We distinguish the following categories:
high: where 50%, or more, of sales are for exports
medium: where export intensity is between 10% and 50% of sales
low: where export intensity is less than 10% (less than 10% of sales are exported)
• X3 = technological opportunity. Here we consider the following two categories:
high: where a firm operates in a high/modern technological sector (software firms, etc.)
low: where a firm operates in a low technological sector (metal work, etc.)
10.4.1 A logit regression
To illustrate the basic ideas, a logit model can be used:
Pr (Yi = 1/firm size, export intensity, technological opportunity) = Pr (Yi = 1/X1, X2, X3) =
(10.15)
where β0, β1, β2, and β3 are the unknown parameters of the logit model. The model suggests
that the probability of a firm undertaking R&D activities, Pr (Yi = 1), depends on the size of
the firm, export intensity and technological opportunity. The aim of the investigation is to
estimate this conditional probability on the basis of a set of observations on Yi, X1i, X2i and X3i.
Finally:
There are 3 × 3 × 2 = 18 different types of firm. For each type we calculate the conditional
probability of a firm being engaged in R&D activities.
In practice, the estimation is carried out using the likelihood method via a regression
package, such as Microfit 4/5, EViews or LIMDEP. A typical output for our example may
be listed as follows:
We notice that X2 and X3 are statistically significant (relatively high t-ratios), whereas
X1 appears to be statistically insignificant. However, recall that a firm is identified by all
three variables jointly. Therefore, we are interested in the estimates of all slope coefficients
(β1, β2, β3) rather than each individual coefficient. To test for the joint significance of the
regressors, a likelihood ratio (LR) test is usually performed in logit/probit regression. This
test statistic has a chi-squared distribution with degrees of freedom equal to the number of
regressors of the model. An appropriate regression software package (e.g. Microfit 4/5,
LIMDEP, EViews) routinely reports the LR test value. In the above example, LR = 16.32
and Df = 3. The 5% critical value of the χ2(3) distribution is 7.81; we can, therefore, reject the null
hypothesis and conclude that the three regressors are jointly significant. The estimated coef-
ficients are then used in the logit model to generate predicted probabilities. In practical
applications, the change in the probability of a positive response due to a unit change in any
one of the regressors is usually calculated via the difference in the corresponding fitted prob-
abilities reported by the regression packages. Table 10.2 presents the predicted probabilities
of a selection of firms corresponding to the above example.
Firm Probability X1 X2 X3
3 2 1 3 2 1 3 2 1
1 0.80710 * * *
2 0.58085 * * *
3 0.56964 * * *
4 0.51912 * * *
5 0.50765 * * *
6 0.49618 * * *
7 0.31439 * * *
8 0.10588 * * *
9 0.10161 * * *
10 0.09749 * * *
According to these results, firms with the highest predicted probability of undertaking
R&D are those with high turnovers, high export intensities and high technological opportu-
nity (firm 1). Firms with low turnovers, low export intensity and low technological opportu-
nity, have the lowest predicted probability of undertaking R&D (only 9%). Analysing the
top six firms, that is those firms with a predicted probability of approximately 50% and
more, it can be seen that all such firms have high or medium export intensities, while five
have high sales and three operate within a high technological environment. The analysis
clearly identifies exporting firms as being likely to be involved in R&D activities.
With reference to the goodness of fit measure, because the logit/probit models are non-
linear in the parameters, the conventional coefficient of determination is not applicable.
Unfortunately, there is no universally accepted measure of goodness of fit for these types of
models. Most computing packages routinely report a pseudo R-squared value called the
McFadden R-squared, which ranges between 0 and 1, with a value close to unity indicating
a good fit. In the above example, this measure of goodness of fit is reported at 0.5612,
indicating a fairly good fit, according to this measure.
In most applications, the logit and probit models typically generate similar results. The
probit results for this example are essentially the same. If a researcher considers that move-
ment towards a probability of one or zero, after certain values of the regressors have been
reached, occurs quickly, then a probit model provides a better approximation to the data gener-
ation process. Otherwise, the logit model is preferred and more commonly used in practice.
10.5 Multi-response qualitative dependent variable
regression models
In the above examples the dependent variable was specified as a binary variable representing
choice between two alternatives. In many applications there are more than two options
facing economic units, such as individuals, households, firms and governments. Qualitative
dependent variable models representing the choice between more than two options facing an
economic agent are called multi-response/polychotomous dependent variable models. In
applying this type of model, the focus is on estimating the probability of choosing a
particular option given the economic agent’s key characteristics. The estimation is carried
out via generalised logit/probit models; in practice, the logit specification is the preferred
choice, as in the case of binary response models. Depending on the type of qualitative
data collected, these regression models go under the names of multinomial logit and
ordered logit/probit models. To introduce some of the key ideas, we consider the following
examples:
where Pij is the probability that individual i chooses Option j. We also define three binary dummy
variables, Yi1, Yi2 and Yi3, as follows:
The probability that individual i chooses each one of the above options, within the frame-
work of a multinomial logit, can be expressed as follows:
(10.16)
(10.17)
and
(10.18)
Notice that parameters α and β are specific to choosing Option 2 and parameters γ and δ are
specific to choosing Option 3. The parameters for choosing Option 1 are set to zero to ensure
that Pi1 + Pi2 + Pi3 = 1 (the sum of probabilities must add up to unity). The objective is to
estimate the above unknown parameters on the basis of a large sample of data on individuals
and their income: Xi. The estimation is carried out via the maximum likelihood method
(ML) using computer software (e.g. LIMDEP). Once the parameters are estimated, the fitted
probabilities are then generated for comparison and target marketing, as explained above.
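A small sketch of how the fitted probabilities of such a three-option multinomial logit are formed, with Option 1 as the base category whose parameters are normalised to zero, is given below; the parameter values in the usage line are purely hypothetical.

```python
import numpy as np

def mnl_probabilities(x, params):
    """Fitted probabilities of a multinomial logit with Option 1 as the base
    category (its parameters normalised to zero).
    params: (intercept, slope) pairs for Options 2, 3, ..."""
    scores = np.array([0.0] + [a + b * x for a, b in params])
    expo = np.exp(scores)
    return expo / expo.sum()      # probabilities sum to one

# purely hypothetical parameter values for an individual with income x = 30
print(mnl_probabilities(30.0, [(-1.0, 0.02), (-2.5, 0.05)]))
```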
10.6 Ordered logit/probit regression models
Often when collecting survey data via questionnaires a Likert Scale (a scale used to indicate
the extent of preferences) is used to collect information on the opinion of people concerning
a particular issue or option. For example, when collecting data via questionnaires on the
quality of provision of a service, such as banking or hotel services, a customer
is typically asked to select one of the following options about the service quality:
Question/statement: I find the quality of service to be good.
Options:
1 Strongly disagree
2 Disagree
3 Neutral
4 Agree
5 Strongly agree
For the purpose of computer modelling, numerical data/codes are assigned to each option.
These numerical values are ordinal only, reflecting the ranking of each option. This type of
ordinal data, based on a Likert scale, are frequently collected via surveys seeking opinions
on various issues and subjects. They are used in quality improvement surveys, marketing
surveys, by credit rating agencies, by government agencies, and by financial organisations.
(10.19)
where Y* is the unobservable utility index (this type of unobservable variable is called a
latent variable), α is an unknown parameter, and u is a disturbance term. The relationship
suggests that in ranking the quality of service, the individual takes into consideration the
time taken to provide the service, as well as some other less important random factors
captured by the error term, u. Notice that this is not a regression model, as the dependent
variable is not observable. To make this type of conceptual model operational, it is then
assumed that each option is chosen according to some threshold values. Given that in the
above example the individual is faced with five options, four threshold values in minutes,
T1, T2, T3 and T4, may be specified, as follows:
(10.20)
In principle, the probability of each of the above options can be calculated, provided the
distribution of the Y* or the error term is known. In applied work, practitioners either assume
a standard normal distribution or a logistic distribution for the latent variable in this type of
modelling. The ordered probit model is based on the standard normal distribution, while the
ordered logit model uses a logistic distribution as its foundation. Both models tend to produce
very similar results. Estimation is carried out using maximum likelihood estimation via
appropriate software, such as LIMDEP. On the basis of a large data set, this procedure
generates the estimates of the unknown threshold parameters, T1, T2, T3 and T4, as well as
the utility index parameter, α, such that the corresponding likelihood function is maximised.
Once these parameters are estimated, one can determine the thresholds for various available
options and accordingly improve the quality of the provision of a service. For example,
suppose in the above example we estimate that T1 = 10, T2 = 15, T3 = 25 and T4 = 45
minutes. To improve service quality, attention should be on reducing the time taken to
provide the service to below the T2 level of 15 minutes. This type of information enables
service providers to improve the quality of service provision, on a systematic cost-based
framework.
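In sketch form, and assuming an ordered probit specification with latent index Y*i = αXi + ui, where Xi is the time taken to provide the service to customer i and ui is standard normal (the notation here is illustrative rather than the exact expression in (10.20)), the response probabilities implied by the four thresholds are:
\[
P(\text{option }1)=\Phi(T_1-\alpha X_i),\qquad
P(\text{option }j)=\Phi(T_j-\alpha X_i)-\Phi(T_{j-1}-\alpha X_i),\;\; j=2,3,4,
\]
\[
P(\text{option }5)=1-\Phi(T_4-\alpha X_i),
\]
where Φ denotes the standard normal distribution function; the ordered logit model replaces Φ with the logistic distribution function.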
Qualitative dependent variable models, in their different forms, have become increasingly popular in recent years. This is because large-scale survey data on all sorts of issues are
nowadays collected to help decision-making processes. In addition, regression packages,
such as LIMDEP, are now readily available, and are capable of handling this type of data set
and modelling, allowing estimation and inference to be carried out efficiently and with
relative ease.
11 Panel data regression models
INTRODUCTION
Panel data sets provide a rich environment for researchers to investigate issues which
could not be studied in either cross-sectional or time series settings alone. A key attribute
of this type of data is that it provides a methodology to correct for the omitted variable
problem inherent in cross-sectional data analysis. In a typical panel data study there are
a large number of cross-sectional units, for example, a large number of individuals,
firms or even regions/countries, and only a few periods of time. Researchers collect data on
the characteristics of the cross-sectional units over a relatively short period of time, to
investigate changes in behaviour or potential of a typical cross-section unit. A key feature
of the panel data set is that it provides observation over time on the same cross-sectional
units. This is in contrast to a pooled cross-section/time series data set, discussed in
Chapter 9, where, typically, a randomly selected cross-section data set is observed a number
of times over time, in order to increase the number of observations and to improve the
degrees of freedom of the regression. This chapter focuses discussion on panel data of
the type described above, providing an introduction to their applications and estimation in
practice.
Key topics
• The nature of panel data and its applications
• Using panel data to correct for the omitted variable problem
• The fixed-effects regression models
• The random-effects regression models
• Panel data regression in practice
(11.1)
Yit denotes the dependent variable corresponding to the cross-sectional unit i at the time t; β1
is the intercept term; β2 and β3 are the slope parameters fixed over time and across cross-
sectional units; X2 and X3 are the two regressors changing values across cross-sectional units
and over time, hence, subscripts i and t; αi is the ith individual/cross-sectional unit effect;
and, finally, uit is a disturbance term satisfying the standard conditions.
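Putting these definitions together, the specification described for (11.1) corresponds to a model of the following form (a reconstruction consistent with the definitions above, not a reproduction of the original equation):
\[
Y_{it}=\beta_1+\beta_2 X_{2it}+\beta_3 X_{3it}+\alpha_i+u_{it},\qquad i=1,\dots,N,\;\; t=1,\dots,T .
\]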
The focus of attention of the panel data analysis is on the individual effect term αi. This
term is supposed to capture all unmeasured and unobserved variables that influence
each cross-section unit in a different fashion, for example, such variables as the cross-
sectional unit’s ability, common sense and culture. These are likely to be correlated with the
regressors of the model, and if ignored, that is, left in the disturbance term of the model,
would cause the OLS estimators to be biased and inconsistent. Note that as one moves from
one cross-section unit to the other, the intercept term of the regression model would change,
but the slope parameters would remain unchanged. To go a bit further with this discussion,
we consider below an example of panel data regression analysis in practice.
According to the neoclassical convergence hypothesis, countries with lower levels of per capita income tend to grow faster, so that levels of per capita income across countries converge over time.
To study the neoclassical convergence hypothesis, a researcher collects cross-sectional
data on 20 countries from a published online data bank, for example, the OECD publica-
tions, over a 5-year period. Note that the key requirement of the fixed-effects model, namely that the cross-sectional units are selected in a non-random fashion, is satisfied here.
The relevant variables of the regression model are the per capita annual growth rate of GDP, GḊP (the dependent variable), and the level of per capita GDP (the independent variable). The following
simple regression model is specified:
(11.2)
11.3.2 Approach (i): removing the individual fixed-effect term from the
regression model to correct for the omitted variable problem
Step 1 Find the average value of each variable over the period of the data set for each cross-
sectional unit specified in the model. In the above example, this method requires that for each
one of the 20 countries we first find the average value of per capita GDP growth rate, and the
GDP per capita over a 5-year period. Using the regression model, this step implies:
(11.3)
Using the fact that under the fixed-effects framework the parameters do not change over
time, we can write this equation in terms of average/mean value of each variable, as follows:
(11.4)
Step 2 Find the deviation of each individual cross-section observation from its respective
average/mean value calculated in step 1. Because the intercept term for each cross-sectional
unit is assumed to be constant over time, this data transformation eliminates the regression
intercept term. Also, each of the country-specific fixed-effect terms is eliminated. This data
transformation, therefore, removes the omitted variables implicitly captured by these terms.
Using our example, this step implies:
(11.5)
Note that the data transformation is carried out for each country and for each year. For
example, for country 1 (i = 1), each variable is expressed in each year (t = 1,2. . .5) in devia-
tion form from its average value of 5 years. For each one of the countries in the sample, we
have 5 observations, and the total number of observations is N × T = 20 × 5 = 100. The
transformed data set is said to be in ‘deviation from the individual mean form’ and the
regression is known as the within-groups regression model.
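In sketch form, writing the convergence model as Git = β1 + β2Xit + αi + uit, with Git the per capita GDP growth rate and Xit per capita GDP for country i in year t (the symbols are illustrative labels, not the book's own notation), Steps 1 and 2 amount to:
\[
\bar G_i=\beta_1+\beta_2\bar X_i+\alpha_i+\bar u_i,\qquad
G_{it}-\bar G_i=\beta_2\,(X_{it}-\bar X_i)+(u_{it}-\bar u_i),
\]
so that both the common intercept β1 and the country effect αi drop out of the within-groups regression.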
Step 3 Estimate the within-groups regression model above by the OLS using a regression
package designed to carry out fixed-effects estimation (for example, EViews and Stata).
These regression packages carry out the required transformation of the data automatically
and use an appropriate regression through the origin to estimate the model. This data transformation
removes the omitted variable problem, and the OLS estimators of the transformed model are
unbiased and consistent. These estimators are sometimes called the fixed-effects estimators.
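As an illustration of the mechanics only (not the author's own computations), the within-groups transformation and the regression through the origin can be sketched in Python on simulated data; the variable names (country, gdp, growth) are purely illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
N, T = 20, 5                                   # 20 countries, 5 years

# Simulated panel: country effects correlated with the regressor (omitted variable)
alpha = rng.normal(size=N)
gdp = 10 + 2 * alpha[:, None] + rng.normal(size=(N, T))
growth = 0.5 - 0.003 * gdp + alpha[:, None] + 0.1 * rng.normal(size=(N, T))

df = pd.DataFrame({
    "country": np.repeat(np.arange(N), T),
    "gdp": gdp.ravel(),
    "growth": growth.ravel(),
})

# Steps 1-2: deviations from each country's own mean (within-groups transformation)
demeaned = df.groupby("country")[["gdp", "growth"]].transform(lambda s: s - s.mean())

# Step 3: OLS through the origin on the transformed data (fixed-effects estimator)
x, y = demeaned["gdp"].to_numpy(), demeaned["growth"].to_numpy()
beta_fe = (x @ y) / (x @ x)
print("within-groups (fixed-effects) slope estimate:", beta_fe)
```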
Step 4 Analysis: following this procedure, suppose that in the above example, the slope
parameter estimate is β̂ = −0.003. This value implies that for every one unit change in per
capita income, it is estimated that the rate of growth of per capita GDP, on average, will fall
by 0.003 of 1%. This is a rather small change, but the sign of the parameter estimate is
consistent with the prediction of the convergence hypothesis.
Step 5 Having obtained the OLS estimate of the slope parameter we can now recover the
estimate of the intercept term of each country’s regression model (β1 + αi) using the fact that
the OLS regression line passes through the point of the means, as follows:
(11.6)
Note that, if we specify the original model without an intercept term and only specify the fixed
country effect in the regression model, this procedure generates the OLS estimates of the
fixed-effects terms.
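In sketch form, and with Ḡi and X̄i denoting the 5-year averages of the growth rate and per capita GDP for country i (illustrative notation), the recovery of the intercepts described in Step 5 amounts to:
\[
\widehat{\beta_1+\alpha_i}=\bar G_i-\hat\beta\,\bar X_i,\qquad i=1,\dots,20 .
\]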
Comments
The above data transformation significantly reduces the available degrees of freedom of the
regression. In transforming the data for each cross-sectional unit, we lose one degree of
freedom, reducing the available degrees of freedom by N. In general, the degrees of freedom of the within-groups fixed-effects regression model are NT − N − K, where K is the number of parameters to be estimated. Applying this to our example, the degrees of freedom of the regression are
now only 100−20−2 = 78. In addition to a significant loss of degrees of freedom, this trans-
formation also eliminates all time-invariant variables from the regression. For example, vari-
ables such as geographical position of the country, or race and religion, normally taken into
account via the dummy variable technique, will all be eliminated. Moreover, the residual
sum of squares of this type of transformed model is normally more than that of the original
model, giving rise to imprecise parameter estimates. Despite these issues and shortcomings,
the within-groups fixed-effects regression model is a popular method and is often the
preferred choice, in practice, for removing the omitted variable problem.
An alternative way to remove the individual fixed effects is to take first differences of the data. Writing the regression model for the current period:
(11.7)
Previous period:
(11.8)
(11.9)
This type of data transformation removes the fixed-effect terms, αi, but the intercept
term of the regression model is also removed. Moreover, the disturbance term of the
transformed model would now be autocorrelated under the above conditions. This is
because the successive values of the disturbance terms now have a common term. In this
case the OLS estimators would be inefficient and the regression would not be reliable.
Only when the first differencing removes the autocorrelation from the data would
this method be useful. Because of these issues, the within-groups fixed-effects regression
model is preferred to this method for removing the omitted variables from the regression
models.
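In sketch form, applying first differences to the general model introduced in (11.1) gives (this is a reconstruction of the transformation described in (11.7) to (11.9), not the original equations):
\[
Y_{it}-Y_{i,t-1}=\beta_2\,(X_{2it}-X_{2i,t-1})+\beta_3\,(X_{3it}-X_{3i,t-1})+(u_{it}-u_{i,t-1}),
\]
where both β1 and αi cancel, but the new disturbance term is a moving average of successive errors and is, in general, autocorrelated.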
11.3.3 Approach (ii): capturing the individual fixed effects – the least
squares dummy variable (LSDV) regression
When the number of non-randomly selected cross-sectional units is relatively small, one
can use dummy variable methodology to allow for different individual fixed effects. To
avoid the dummy variable trap, the regression model is usually specified without the
intercept term. For each individual unit, a dummy variable is then specified to capture
explicitly the impact of each individual fixed effect on the dependent variable. Note that
this approach does need panel data. With only cross-section data, we need a dummy variable
for each cross-section unit, thereby depleting the degrees of freedom. For instance, in the
above example, if we use only cross-section data on 20 countries, we need 20 individual
country-effect dummy variables, leaving the regression model without any degrees of
freedom.
To demonstrate the dummy variable regression methodology we return to the above
example and include a number of dummy variables in the regression model, as follows:
Di = 1 if the observation belongs to country i, and Di = 0 otherwise (i = 1, 2, . . ., 20)
(11.10)
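In sketch form, an LSDV specification of this type for the convergence example would be (illustrative notation, with Git and Xit as defined earlier):
\[
G_{it}=\alpha_1 D_{1}+\alpha_2 D_{2}+\cdots+\alpha_{20} D_{20}+\beta X_{it}+u_{it},
\]
where each Dj equals one for observations on country j and zero otherwise, each αj is the fixed effect of country j, and there is no common intercept term.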
The coefficients of the dummy variables change across countries but remain constant over
time. The dummy variable regression model is estimated by the OLS using an appropriate
regression package (for example, EViews) to generate estimates of the country effects, αi,
and the slope parameter β. This type of fixed-effects model is referred to as a least squares
dummy variables (LSDV) model. This procedure generates efficient estimators provided that
all standard conditions are satisfied. It can be shown that mathematically this method is
identical to the within-groups regression model; however, both methods suffer from
similar limitations. In particular, as in the case of within-groups regression, one cannot
estimate the coefficients of the variables that are fixed for each cross-section unit.
For example, suppose that, for country one, we want to examine the impact of membership of the WTO via a dummy variable. This would not be possible, owing to perfect multicollinearity: a dummy variable recording country one's WTO membership status is constant over time and can therefore be written as an exact linear combination of the country-specific dummy variables, whose sum is always equal to unity. Thus, there is an exact linear relationship between variables that are fixed for each cross-sectional unit and the fixed-effect dummy variables, and this type of variable cannot be included in the LSDV models.
The LSDV framework also provides a simple test for the presence of significant individual (country) effects, based on the null hypothesis that all the country effects are equal:
(11.11)
The appropriate test statistic is the familiar F-test for parameter restrictions:
(11.12)
where
RSSR = residual sum of squares of the restricted model (no country effect) corresponding
to the regression model below:
(11.13)
This restricted regression model is estimated by the OLS from the pooled cross–section/time
series data set to generate the residual sum of squares, RSSR.
RSSU = residual sum of squares corresponding to the unrestricted LSDV regression
model.
If the value of the F-test statistic is greater than the corresponding theoretical value of F at
say 5% level of significance, we reject H0 and conclude that there are significant country
effects in the data. In this situation the LSDV method is consistent with evidence and it
generates linear unbiased and efficient estimators, provided that the usual assumptions
concerning the disturbance term are satisfied.
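One standard form of this F-test, consistent with the definitions of RSSR and RSSU above (the exact expression in (11.12) is a matter of the restricted and unrestricted residual sums of squares), is:
\[
F=\frac{(RSS_R-RSS_U)/(N-1)}{RSS_U/(NT-N-k)},
\]
where N − 1 = 19 is the number of restrictions (equality of the 20 country effects) and k is the number of slope parameters in the unrestricted LSDV regression.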
In cases when N is large, as it typically is in a panel study of families and individuals, the
LSDV method is not appropriate, because of the degrees of freedom problem. In practice,
when N is large the within-groups fixed-effects regression model is often the preferred
option.
11.3.5 Shortcomings of the fixed-effects methodology
The fixed-effects regression models are characterised by three basic problems: a substantial loss of degrees of freedom, the inability to estimate the coefficients of time-invariant variables, and a loss of precision in the parameter estimates. The random-effects methodology deals with these problems by treating the individual effect αi as a random variable rather than as a fixed parameter to be estimated, so that the regression model is written as:
(11.14)
Within the framework of the random effect, αi is a random variable representing unobserved
individual effects. Let Wit denote the disturbance term of the random-effect regression
model, that is:
(11.15)
and
(11.16)
This innovation subsumes all relevant omitted variables into the composite error term of the regression model, effectively dealing with each one of the problems associated with the fixed-effects methodology identified above. However, the random-effects methodology is also problematic. In particular, if αi, the individual random-effect term, is correlated with any one of the regressors, the OLS estimators would be biased and inconsistent.
Accordingly, for the random-effects methodology to work, two conditions must be met: the cross-sectional units must be randomly selected from the population, and the individual random-effect term, αi, must be distributed independently of the regressors of the model.
If these conditions are met, in practice, the random-effects regression model is typically
preferred to the fixed-effects model. Let us assume for the moment that these two conditions
are satisfied. Can this model be estimated by the OLS? The answer to this question depends
on the properties of the composite disturbance term, Wit. To check these, we assume that
each individual effect is distributed independently around a mean of zero with a constant variance:
(11.17)
(11.18)
(11.19)
Note that αi, the individual effect, is random but is time-invariant. Therefore, as we go
forward in time, this term remains in each of the future error terms of the individual i. In
other words, each individual’s composite error term will have a common term over time and
would therefore be autocorrelated. This is shown below:
(11.20)
This type of autocorrelation is due to the nature of the panel data set, which provides observa-
tions on the same individuals over time. Note that there is no autocorrelation between different
randomly selected individuals, but there is autocorrelation between observations of the same
individual over time. The unmeasured random individual effects captured by the composite
error term are correlated for each individual over time, generating autocorrelation in the distur-
bance term of the random-effects regression model. In this situation OLS estimators are inef-
ficient and the OLS regression is unreliable. In the case of the random-effects model, the
efficient estimator is generalised least squares. For estimation purposes, in practice, we use
appropriate regression software (for example, EViews). A two-step estimation procedure is
used to transform the data to eliminate autocorrelation from the panel data set. The variance components are first estimated using the residuals from an initial OLS regression. Then, the feasible generalised least squares (FGLS) estimates are computed using the estimated variances.
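In sketch form, the error structure discussed in (11.17) to (11.20) can be summarised as follows: writing Wit = αi + uit, with αi having variance σα² and uit having variance σu², and the two components independent of each other,
\[
\operatorname{Var}(W_{it})=\sigma_\alpha^2+\sigma_u^2,\qquad
\operatorname{Cov}(W_{it},W_{is})=\sigma_\alpha^2\;\;(t\neq s),\qquad
\operatorname{Cov}(W_{it},W_{js})=0\;\;(i\neq j),
\]
so observations on the same individual are correlated over time, which is why GLS/FGLS, rather than OLS, is the efficient estimation method.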
Step 1 Make sure the cross-section data is randomly selected from the population.
Step 2 Perform a version of the Hausman test to see whether the unobservable individual
effects are independently distributed of the regressors. This is a critical step for the use of the
random-effects regression model. The appropriate test statistic for this purpose is a version
of the Hausman measurement error test already encountered in the previous chapters. This
version of the test is known as the Durbin-Wu-Hausman (DWH) test and it is routinely
carried out and reported by regression packages (for example, EViews). The null hypothesis
of this test is that the individual random effects are distributed independently of the regressors of the model; that is, the random-effects specification is appropriate.
Step 3 If this null is not rejected then the random-effects regression is appropriate and the
random-effects estimators are unbiased, consistent and efficient. The DWH test can be
carried out in a similar fashion to the Hausman measurement error test demonstrated in
Chapter 6. To do this test, the fixed-effects estimates are obtained (these can be considered
to be instruments for the random-effects regression model), and are then compared to the
random-effects estimates. Under the null hypothesis these two types of estimators should
generate similar results. The DWH test is designed to determine whether the differences
between these two types of parameter estimates, taken jointly, are statistically significant.
The test statistic has an F/chi-squared distribution. In general, as a rule of thumb, a relatively
large value of the DWH test statistic implies rejection of the null hypothesis. The degrees of freedom of this test may be lower than the number of parameter restrictions being tested, and are calculated and reported by the regression software.
Step 4 If the null hypothesis is rejected, the random-effects regression is unreliable and the
parameter estimates are invalid. In this situation, practitioners tend to use the fixed-effects
regression results, despite its shortcomings.
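For illustration only, the comparison of fixed-effects and random-effects estimates underlying the DWH test can be sketched as the usual Hausman quadratic form in Python; the inputs (coefficient vectors and covariance matrices) are assumed to come from whatever regression package has been used, and the numbers in the example call are invented:

```python
import numpy as np
from scipy import stats

def hausman(b_fe, b_re, V_fe, V_re):
    """Hausman-type statistic comparing fixed- and random-effects slope estimates."""
    diff = np.asarray(b_fe) - np.asarray(b_re)
    V_diff = np.asarray(V_fe) - np.asarray(V_re)
    stat = float(diff @ np.linalg.pinv(V_diff) @ diff)   # quadratic form in the differences
    df = int(np.linalg.matrix_rank(V_diff))              # may be below the number of parameters
    p_value = stats.chi2.sf(stat, df)
    return stat, df, p_value

# Purely illustrative inputs (not taken from the text)
stat, df, p = hausman([0.030, 0.010], [0.025, 0.014],
                      np.diag([1e-4, 2e-4]), np.diag([8e-5, 1.5e-4]))
print(f"Hausman statistic = {stat:.2f}, df = {df}, p-value = {p:.3f}")
```

A large statistic (small p-value) points towards the fixed-effects specification, as in Step 4 above.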
To illustrate panel data regression in practice, consider the following regression model:
(11.21)
where Yit is the ith country's real per capita growth rate in period t, X2it is the ith country's level of exports, and X3it is the ith country's level of fixed capital formation. According to
the literature, one expects a positive sign for both of the slope parameters, β2 and β3. The
parameter αi is the individual country effect, capturing the combined influence of such vari-
ables as culture and ability on the dependent variable, and uit is a disturbance term satisfying
all standard assumptions. To estimate this model a researcher obtains data over a 10-year
period on 15 countries ‘randomly’ selected from the OECD (Organisation for Economic Co-operation and Development) year book, available online. There are, altogether, N × T = 15 × 10 = 150 cross-section/time series observations on each variable. The focus of analysis is typically
on the slope parameters β2 and β3. In a panel data study of this kind, there are a number
of questions that need to be answered before the regression analysis is carried out, as
follows:
Step 1 Although the data are said to be randomly collected, given the nature of the economic
data and the fact that a published data bank is used, it is advisable to carry out the DWH test
before estimation.
Step 2 Carry out the DWH test, with the null and alternative hypotheses:
(11.22)
This test determines whether the estimates of the parameters taken jointly are significantly
different in the two regressions.
Step 3 If H0 is not rejected the random-effects specification is the preferred option. This test
has a chi-squared distribution, with the degrees of freedom equal to the number of parameters
being compared, but occasionally a lower number is reported. For this example the
test statistic is computed by the EViews regression package as chi-squared = 18.76, with
2 degrees of freedom. The corresponding critical value at a 5% level of significance is 5.99. The test statistic falls into the critical region of the test; we therefore reject H0 and conclude
that, for this example, the appropriate specification of the country effect is the fixed-effects
model.
Step 4 The fixed-effects model regression using the within-groups regression method
generates the OLS estimates of β2 and β3, respectively, as 0.024 and 0.045. These parameter
estimates are consistent with the theoretical prediction. Moreover, the OLS estimators are
unbiased and consistent under the fixed-effects specification. The fixed-effects regression
appears to be the appropriate panel data analysis in this example.
11.6 A summary of key issues
• Panel data sets typically contain observations on a large number of cross-sectional units
over a relatively short period of time.
• Panel data regression is mainly concerned with removing the ‘omitted variable problem’
from the regression to avoid biased and inconsistent OLS estimators.
• The omitted variable problem is due to individual effects inherent in cross-section data.
If the cross-section data set is collected using non-random sampling procedures, the
regression model is called the fixed-effects model. In this case, the individual fixed
effect changes across individuals but is assumed to be fixed over time.
• A popular method of estimating fixed-effects models is the within-groups fixed-effects
model. Within the framework of this model, data are transformed to eliminate the individual fixed-effect terms, thereby removing the omitted variable problem from the regression.
Review questions
1 Explain what you understand by a panel data set. What are the key features of this type
of data set? What type of studies can be carried out with this type of data? Give exam-
ples to illustrate your answer.
2 Explain what you understand by the ‘omitted variable problem’. How is this problem
resolved via panel data analysis?
3 Explain what you understand by the fixed-effects model. Under what conditions
should this model be used? What are the limitations of fixed-effects panel data
studies?
4 Explain the within-groups fixed-effects regression model. How does this model over-
come the omitted variable problem? Use an example to illustrate your answer.
5 Explain the least squares dummy variable model (LSDV). How does this model deal
with the individual fixed effects? How can this model be used for testing for heteroge-
neity of the cross-sectional units?
6 Explain the random-effects model. Under what conditions might this model be employed
in practice? What are the limitations of this type of panel data analysis?
7 Collect a panel data set consisting of the average annual stock prices of 60 manufac-
turing companies over 5 years and the level of annual investment expenditures by each
company over this 5-year period (you can collect this type of data from online sources, for example, from the London Stock Exchange website). It is thought that the
average annual stock price is positively related to the annual investment expenditure.
Use an appropriate regression package (EViews) to carry out each one of the following
steps in this panel data study.
a Specify a panel data regression equation and explain its underlying assumptions.
b Compare and contrast the fixed-effects model and the random-effects model and
explain which of these two alternatives is appropriate in this regression.
c Use the DWH test to select the appropriate model for the panel data regression.
Explain the DWH test and specify the null and alternative hypotheses of this
test.
d Carry out a panel data regression analysis consistent with the result of the DWH
test. Explain and evaluate the regression results.
Hausman, J. A. and McFadden, D. (1984). ‘Specification tests for the multinomial logit model’,
Econometrica, 52, 1219–40.
Kennedy, P. (2008). A Guide to Econometrics, 6th edn, Blackwell Publishing.
Maddala, G. S. (1983). Limited Dependent and Qualitative Variables in Economics, Cambridge:
Cambridge University Press.
Scott Long, J. (1997). Regression Models for Categorical and Limited Dependent Variables, Sage
Publications.
Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data, Cambridge, MA: MIT Press.
Unit 4
Time series econometrics
• This unit provides an introduction to modern time series regression analysis. It explains
how the phenomenon of the spurious regression led to the development of a new meth-
odology for time series regression analysis, replacing the SG methodology in the early
1980s.
• The unit provides introductory coverage of the concept of stationary time series and
discusses its key role in modern time series econometrics and cointegration analysis.
• This unit has four key chapters: Chapter 12 covers key definitions and concepts. Chapter
13 provides a detailed explanation of the unit root tests and their applications. Chapter
14 introduces the methodology of cointegration and applies this methodology to a
number of bivariate econometric models. Finally, Chapter 15 provides introductory coverage of multivariate cointegration analysis.
• This unit provides a rigorous but non-technical approach to the key topics, using many
applied examples to illustrate the use of unit root tests and cointegration analysis in
practice.
12 Stationary and non-stationary
time series
INTRODUCTION
The specific to general (SG) approach to time series analysis implicitly assumes that the
means and the variances of the economic variables in a regression model remain constant
over time. That is, they are time-invariant, and when these assumptions break down,
alternative methods of estimation to the OLS are used to generate efficient estimators.
Within this framework the underlying reasons for the breakdown of these assumptions are
seldom systematically investigated. The mean and variances of many economic and
financial variables are, however, time-variant and their time series are non-stationary. The
non-stationary time series and their treatment have given rise to some important contribu-
tions in econometrics in recent years, including unit root tests and cointegration analysis. In
the next three chapters we will focus attention on these topics. Each chapter will include an
extended summary of key issues to help with better understanding of techniques and
applications.
This introductory chapter provides a discussion on the basic definitions, concepts and
ideas of time series econometrics. Key procedures are illustrated via examples using actual
time series data.
Key topics
• Stationary and non-stationary time series
• Models with deterministic and stochastic trends
• Integrated time series
• Testing for stationarity: the autocorrelation function
(12.1)
(12.2)
(12.3)
Stationarity In this chapter when we use the term ‘stationary’, we will refer to weak station-
arity. Generally speaking, a stochastic process, and, correspondingly, a time series, is
stationary if the means and variances are constant over time and the (auto) covariances
between two time periods, t and t+k, depend only on the distance (gap or lag) k between
these two time periods and not on the actual time period t at which these covariances are
considered.
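In sketch form, the three conditions referred to in (12.1) to (12.3) amount to requiring that:
\[
E(X_t)=\mu,\qquad \operatorname{Var}(X_t)=\sigma^2,\qquad \operatorname{Cov}(X_t,X_{t+k})=\gamma_k ,
\]
for all t, with μ, σ² and γk constant over time and the autocovariance γk depending only on the lag k.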
Non-stationarity If one or more of the three conditions for stationarity are not fulfilled, the
stochastic process, and, correspondingly, the time series, is called ‘non-stationary’. In fact,
most time series in economics are non-stationary. In Figure 12.1, for example, the time
series of private consumption (CP) and personal disposable income (PDI) for an EU member,
since they are all trending consistently upwards, are almost certain not to satisfy condition
(12.1) for stationarity, and, therefore, they are non-stationary (actual data for these series can
be found in Table 12.1).
Having defined stationarity, in what follows we present some useful time series models:
(12.4)
Figure 12.1 Private consumption (CP) and personal disposable income (PDI) for an EU member, 1960–1995, annual data, millions of 1970 (drs).
(12.5)
(12.6)
(12.7)
A white noise process εt, with zero mean, constant variance σ2 and zero covariances, as in (12.5) to (12.7), is stationary by definition, since its means are zero, its variances are σ2, and its covariances are zero, being therefore constant over time.
Random walk This is a simple stochastic process {Xt} with Xt being determined by:
(12.8)
(12.9)
implying that the mean of Xt is constant over time. In order to find the variance of Xt, we use
(12.8), which after successive substitutions is written as:
(12.10)
where X0 is the initial value of Xt, which is assumed to be a given constant, or could also be taken
to be equal to zero. The variance of (12.10), taking into account (12.7), is given by:
(12.11)
(12.11) shows that the variance of Xt is not constant over time, but instead it increases with
time. Therefore, because condition (12.2) for stationarity is not fulfilled, Xt, or the random
walk time series, is a non-stationary time series. However, if (12.8) is written in first differ-
ences, i.e. is written as:
(12.12)
this first differenced new variable is stationary, because it is equal to εt, which is stationary
by definition.
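In sketch form, the argument in (12.8) to (12.12) runs as follows:
\[
X_t=X_{t-1}+\varepsilon_t=X_0+\sum_{i=1}^{t}\varepsilon_i,\qquad
E(X_t)=X_0,\qquad
\operatorname{Var}(X_t)=t\,\sigma^2,\qquad
\Delta X_t=\varepsilon_t ,
\]
so the level of the random walk has a variance that grows with t (non-stationary), while its first difference is white noise (stationary).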
Random walk with drift This is the case of the stochastic process {Xt} with Xt being deter-
mined by:
(12.13)
where μ ≠ 0 is a constant and εt is white noise. The term ‘drift’ has been given to this process
because if we write (12.13) as the first difference:
(12.14)
this shows that the time series Xt ‘drifts’ upwards or downwards, depending on the sign of μ
being positive or negative. The random walk with drift time series is also a non-stationary
time series. The processes in (12.8) and in (12.13) are no longer ‘random walks’ if the
assumption of white noise is relaxed to allow for autocorrelation in εt. However, even in
cases of autocorrelation in εt, time series Xt will still be non-stationary.
Time trends We call ‘time trend’ the tendency of a non-stationary series to move in one
direction. Let us consider the following model:
(12.15)
Case 1. Stochastic trend This is the case where β = 0 and ϕ = 1. Model (12.15) is written in
this case as:
(12.16)
or
(12.17)
From (12.17) it is seen that Xt trends upwards or downwards according to the sign of α being
positive or negative, respectively. This type of trend is known as ‘stochastic trend’. Model
(12.16) is called a difference-stationary process (DSP) because the non-stationarity in Xt can
be eliminated by taking first differences of the time series (Nelson and Plosser, 1982).
Case 2. Deterministic trend This is the case where β ≠ 0 and ϕ = 0. Model (12.15) is written
in this case as:
(12.18)
From (12.18) it is seen that Xt trends upwards or downwards according to the sign of β being
positive or negative, respectively. This type of trend is known as a ‘deterministic trend’.
Model (12.18) is called a trend-stationary process (TSP) because the non-stationarity in Xt
can be eliminated by subtracting the trend (α + βt) from the time series.
Case 3. Combined stochastic and deterministic trend This is the case where β ≠ 0 and ϕ = 1.
Model (12.15) is written in this case as:
(12.19)
From (12.19) it is seen that Xt trends upwards or downwards according to the combined
effect of the parameters α and β. This type of trend is known as ‘combined stochastic and
deterministic trend’. To test the hypothesis that a time series is of a DSP type against being
of a TSP type, specific tests have to be employed, such as those developed by Dickey and
Fuller (1979, 1981).
Generalisations We saw in (12.8) that the random walk process is the simplest non-
stationary process. However, this process is a special case of:
(12.20)
which is called a first-order autoregressive process (AR1). This process is stationary if the parameter ϕ satisfies −1 < ϕ < 1. If, instead, ϕ ≤ −1 or ϕ ≥ 1, then the process will be non-stationary.
Generalising, equation (12.20) is a special case of:
(12.21)
which is called a qth-order autoregressive process (ARq). It can be proved (Greene, 1999)
that this process is stationary if the roots of the characteristic equation:
(12.22)
where L is the lag operator, are all greater than unity in absolute values. Otherwise, the
process in (12.21) is non-stationary.
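In sketch form, the AR(q) process and its characteristic equation referred to in (12.21) and (12.22) can be written as:
\[
X_t=\phi_1 X_{t-1}+\phi_2 X_{t-2}+\cdots+\phi_q X_{t-q}+\varepsilon_t,\qquad
1-\phi_1 L-\phi_2 L^2-\cdots-\phi_q L^q=0 ,
\]
with stationarity requiring all roots of the characteristic equation to be greater than unity in absolute value (outside the unit circle).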
(12.23)
5 If Xt ~ I(d) and Yt ~ I(d), then Zt = (aXt + bYt) ~ I(d*), where d* in most cases is
equal to d. However, there are special cases, which we will see later in this chapter,
where d* is less than d.
(12.24)
(12.25)
(12.26)
(12.27)
where ρk is the autocorrelation coefficient (AC) between Xt and Xt−k. The autocorrelation
coefficient ρk takes values between −1 and +1, as verifiable from (12.27). The plot of ρk
against k is called the population correlogram. One basic property of the autocorrelation
function is that it is an even function of lag k, i.e. it is ρk = ρ−k. For other properties see
Jenkins and Watts (1968).
For a realisation of a stochastic process, i.e. for an observed time series Xt, the corresponding sample quantities are:
(12.28)
(12.29)
(12.30)
(12.31)
where ρ̂ k is the estimated autocorrelation coefficient between Xt and Xt−k. The estimated
autocorrelation coefficient ρ̂ k takes values between −1 and +1, as verifiable from (12.31).
The plot of ρ̂ k against k is called the sample correlogram. In what follows, when we refer to
autocorrelation functions or correlograms, we mean the sample equivalent.
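In its usual form (a sketch; the exact expression in (12.28) to (12.31) is not reproduced here), the estimated autocorrelation coefficient is the ratio of the estimated autocovariance at lag k to the estimated variance of the series:
\[
\hat\rho_k=\frac{\sum_{t=1}^{\,n-k}(X_t-\bar X)(X_{t+k}-\bar X)}{\sum_{t=1}^{\,n}(X_t-\bar X)^2}.
\]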
As a rule of thumb, the correlogram can be used for detecting non-stationarity in a time
series. As an example, let us consider the time series of private consumption for an EU
member, as presented in Table 12.1. We have seen already in Figure 12.1 that this time
series may be non-stationary. In Table 12.2 the estimated autocorrelation coefficients (AC)
for this time series are reported, and the corresponding correlogram is presented in
Figure 12.2. Furthermore, in Table 12.2 the estimated autocorrelation coefficients for the
same time series, but differenced once, are reported, with the corresponding correlogram in Figure 12.3.

Figure 12.2 Correlogram of the private consumption series: estimated autocorrelation coefficients plotted against the lag k.
By examining the correlogram in Figure 12.2 we see that the autocorrelation coefficients
start from very high values (ρ̂ k = 0.925 at lag k = 1) and their values decrease very slowly
towards zero as k increases, showing, thus, a very slow rate of decay. In contrast, the
correlogram in Figure 12.3 shows that all autocorrelation coefficients are close to zero. The
Table 12.2 Autocorrelation coefficients (AC), Ljung-Box Q-Statistic (LB_Q), and probability level
(Prob) for the private consumption time series for an EU member
correlogram in Figure 12.2 is a typical correlogram for a non-stationary time series, whilst
the correlogram in Figure 12.3 is a typical correlogram for a stationary time series. Generally,
as a rule of thumb:
If, in the correlogram of a time series, the estimated autocorrelation coefficient ρ̂ k does
not fall quickly as the lag k increases, this is an indication that the time series is non-
stationary. By contrast, if in the correlogram of a time series, the estimated autocorrelation coefficient ρ̂k falls quickly towards zero as the lag k increases, this is an indication that the time series is stationary.

Figure 12.3 Correlogram of the once-differenced private consumption series: estimated autocorrelation coefficients plotted against the lag k.

To test the estimated autocorrelation coefficients individually, use can be made of the result that, for a purely random (white noise) series, ρ̂k is approximately distributed according to the normal distribution with zero mean and variance equal to 1/n, where n is the sample size.
Therefore, the hypotheses for testing autocorrelation coefficients individually may be formu-
lated as:
(12.32)
where tα/2 is the critical value of the t-distribution (approximately the standard normal distribution in large samples) for an α level of significance.
In our case of the private consumption time series in Table 12.1, the sample size is n = 36 and therefore 1/√36 = 0.167. If we assume that the level of significance is α = 0.05, then tα/2 = 1.96, and, thus, tα/2 × 1/√n = 1.96 × 0.167 = 0.327. Comparing this value of 0.327 with the
estimated autocorrelation coefficients in Table 12.2 of the original private consumption
series, we see that all the AC up to lag k = 8 are greater than the critical value of
0.327. Therefore, we accept the alternative hypothesis in (12.32), i.e. we accept the
hypothesis that private consumption is a non-stationary time series. Furthermore, comparing
this value of 0.327 with the estimated autocorrelation coefficients in Table 12.2 of the
once differenced private consumption series, we see that the absolute values of all the AC
are less than the critical value of 0.327. Therefore, we accept the null hypothesis in (12.32),
i.e. we accept the hypothesis that the once-differenced private consumption series is a
stationary time series.
The autocorrelation coefficients may also be tested jointly, using the Box-Pierce Q statistic:
(12.33)
where n is the sample size and m is the lag length used. Because this statistic is not valid for
small samples, Ljung and Box (1978) proposed a variation of the statistic in (12.33), the
LB_Q statistic, as follows:
(12.34)
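In sketch form, the two statistics correspond to the familiar Box-Pierce Q and Ljung-Box LB_Q formulas:
\[
Q=n\sum_{k=1}^{m}\hat\rho_k^2,\qquad
LB\_Q=n(n+2)\sum_{k=1}^{m}\frac{\hat\rho_k^2}{\,n-k\,},
\]
both approximately distributed as χ²(m) under the null hypothesis that all m autocorrelation coefficients are jointly zero.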
The statistic in (12.34) is more powerful, both in small and large samples, than the statistic
in (12.33), which may be used in large samples only. The hypotheses for testing autocorrela-
tion coefficients jointly may be formulated as:
(12.35)
where χ2α(m) is the critical value of the χ2 distribution for an α level of significance and m
degrees of freedom.
For the private consumption time series in Table 12.1, the sample size is n = 36 and the estimated autocorrelation coefficients reported in Table 12.2 extend up to lag length k = 16. Furthermore, in the same Table 12.2, the Ljung-Box Q-statistics (LB_Q) are reported for each lag length, together with the corresponding probability levels for significance (Prob). We see in Table 12.2 that all the Probs are 0.000
for the autocorrelation coefficients of the original private consumption series. Therefore, we
accept the alternative hypothesis in (12.35), i.e. we accept the hypothesis that private
consumption is a non-stationary time series. Finally, none of the Probs in Table 12.2 are less
than the 0.05 significance level for the autocorrelation coefficients of the once differenced
private consumption series. Therefore, we accept the null hypothesis in (12.35), i.e. we
accept the hypothesis that the once differenced private consumption is a stationary time
series.
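As a practical illustration (not tied to the data in Table 12.1), the individual and joint checks described above can be carried out in Python with statsmodels; the series x below is simulated, and all names are illustrative:

```python
import numpy as np
from statsmodels.tsa.stattools import acf
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=200))          # a random walk: non-stationary by construction

n, m = len(x), 16
rho_hat = acf(x, nlags=m)[1:]                # estimated autocorrelation coefficients, lags 1..16
cutoff = 1.96 / np.sqrt(n)                   # approximate 5% critical value for individual tests
print("lags with |rho_hat| > 1.96/sqrt(n):", int(np.sum(np.abs(rho_hat) > cutoff)))

print(acorr_ljungbox(x, lags=m, return_df=True))          # LB_Q and p-values, levels
print(acorr_ljungbox(np.diff(x), lags=m, return_df=True)) # same for the first differences
```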
This is the normal procedure in the univariate time series analysis of the Box and
Jenkins type. It is also now the standard procedure in time series econometrics via the
cointegration procedure.
• Let yt, yt+1, . . ., yt+k be a set of realisations from a stochastic process corresponding to the random variables (Yt, Yt+1, . . ., Yt+k) with the joint probability function P(Yt, Yt+1, . . ., Yt+k); a future set of realisations, m periods ahead, conditional on the current set, is generated by P(Yt+m, Yt+1+m, . . ., Yt+k+m | Yt, . . ., Yt+k). We define a stationary process as one whose joint probability distribution and conditional joint probability distribution are both invariant with respect to displacement in time, that is:
Review questions
1 Explain what you understand by each one of the following:
a a stochastic time series process
b a stationary time series process
c a non-stationary time series.
In each case give an example to illustrate your answer.
2 Distinguish between conditions for a weak form and a strong form stationarity. Explain
what you understand by a joint probability function of a time series data generation
process.
3 Collect a time series data set on aggregate imports for the period 1980–2010, relating to
a country of your own choice.
a Plot your data in actual figures and in 1st difference form. What can you infer from
these plots?
b Plot the associated correlogram and explain what it implies.
c Test for the non-stationarity using relevant individual and joint tests of
significance.
4 Explain the specific to general (SG) approach to the analysis of time series regression
models. How does this approach normally deal with the problem of spurious
regression? Give examples to illustrate your answer.
13 Testing for stationarity
The unit root tests
INTRODUCTION
Key topics
• Dickey-Fuller unit root tests
• Problems with the unit root tests
(13.1)
where εt is white noise. This process may also be written in a first-order difference equation
form as:
(13.2)
For (13.2) to be stationary, the root of the characteristic equation 1 − ϕL = 0 must be greater than unity in absolute value. This equation has one root only, which is L = 1/ϕ, and, thus,
stationarity requires –1< ϕ < 1. Therefore, the hypotheses for testing the stationarity of Xt
may be written as:
(13.3)
In the case that ϕ = 1, i.e. if the null hypothesis is true, then (13.2) is the random walk
process, which we saw was non-stationary. This case of ϕ being equal to unity is known as the unit root problem, i.e. the problem of the non-stationarity of the corresponding process. In other words, a unit
root is another way to express non-stationarity.
By subtracting Xt–1 from both sides of (13.1), we get that:
(13.4)
where ∆ is the difference operator and δ = ϕ –1. In other words, (13.4) is another way to write
(13.1). Assuming that ϕ is positive (this is true for most economic time series) the hypoth-
eses in (13.3) may be written equivalently as:
(13.5)
In the case that δ = 0, or, equivalently, ϕ = 1, i.e. if the null hypothesis is true, the corre-
sponding process is non-stationary. In other words, non-stationarity, or the unit root problem,
may be expressed either as ϕ = 1, or as δ = 0. One could then suggest that the problem of
testing for non-stationarity of time series Xt reduces to testing if parameter ϕ = 1 in the
regression of Equation (13.1), or if parameter δ = 0 in the regression of Equation (13.4).
Such testing could be performed by using the two t-tests respectively:
(13.6)
where Sϕ̂ and Sδ̂ are the estimated standard errors of the estimated parameters ϕ̂ and δ̂
respectively. However, the situation is more complex. Under the null hypothesis of
non-stationarity, i.e. under ϕ = 1 or δ = 0, the t-values computed in (13.6) do not follow the
usual Student’s t-distribution, but they follow a non-standard and even asymmetrical distri-
bution. Therefore, other distribution tables should be employed.
Step 1 Estimate, by OLS, the Dickey-Fuller regression equation:
(13.7)
and save the usual tδ-ratio in (13.6).
Step 2 Decide about the existence of a unit root in the process generating the time series Xt,
according to the following hypothesis:
(13.8)
where τ is the critical value from Table 13.1 for a given significance level. In other words,
for a time series to be stationary, the tδ value must be sufficiently negative, i.e. smaller than the relevant τ critical value. Otherwise, the time series is non-stationary.
Dickey and Fuller noticed that the τ critical values depend on the type of the regression
Equation (13.7). Therefore they tabulated τ critical values when the regression equation
contains a constant also, i.e. when Equation (13.7) becomes:
(13.9)
Table 13.1 Critical values for the Dickey-Fuller τ-statistics

                          Probability of a smaller value
Sample size n     0.01     0.025    0.05     0.10     0.90     0.95     0.975    0.99

No constant, no time trend (statistic τ)
25               −2.66    −2.26    −1.95    −1.60     0.92     1.33     1.70     2.16
50               −2.62    −2.25    −1.95    −1.61     0.91     1.31     1.66     2.08
100              −2.60    −2.24    −1.95    −1.61     0.90     1.29     1.64     2.03
250              −2.58    −2.23    −1.95    −1.62     0.89     1.29     1.63     2.01
500              −2.58    −2.23    −1.95    −1.62     0.89     1.28     1.62     2.00
∞                −2.58    −2.23    −1.95    −1.62     0.89     1.28     1.62     2.00

Constant, no time trend (statistic τμ)
25               −3.75    −3.33    −3.00    −2.62    −0.37     0.00     0.34     0.72
50               −3.58    −3.22    −2.93    −2.60    −0.40    −0.03     0.29     0.66
100              −3.51    −3.17    −2.89    −2.58    −0.42    −0.05     0.26     0.63
250              −3.46    −3.14    −2.88    −2.57    −0.42    −0.06     0.24     0.62
500              −3.44    −3.13    −2.87    −2.57    −0.43    −0.07     0.24     0.61
∞                −3.43    −3.12    −2.86    −2.57    −0.44    −0.07     0.23     0.60

Constant and time trend (statistic ττ)
25               −4.38    −3.95    −3.60    −3.24    −1.14    −0.80    −0.50    −0.15
50               −4.15    −3.80    −3.50    −3.18    −1.19    −0.87    −0.58    −0.24
100              −4.04    −3.73    −3.45    −3.15    −1.22    −0.90    −0.62    −0.28
250              −3.99    −3.69    −3.43    −3.13    −1.23    −0.92    −0.64    −0.31
500              −3.98    −3.68    −3.42    −3.13    −1.24    −0.93    −0.65    −0.32
∞                −3.96    −3.66    −3.41    −3.12    −1.25    −0.94    −0.66    −0.33

Source: Fuller, W. (1976) Introduction to Statistical Time Series, New York: John Wiley.
and when the regression equation contains a constant and a linear trend, i.e. when Equation
(13.4) becomes:
(13.10)
For Equation (13.9) the corresponding τ critical values are called τμ, and for Equation (13.10)
the corresponding τ critical values are called ττ. These critical values are also presented in
Table 13.1. However, the test of the stationarity of a time series is always based on the coefficient δ of the regressor Xt−1.
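In sketch form, the three Dickey-Fuller test regressions referred to in (13.7), (13.9) and (13.10) are:
\[
\Delta X_t=\delta X_{t-1}+\varepsilon_t,\qquad
\Delta X_t=\alpha+\delta X_{t-1}+\varepsilon_t,\qquad
\Delta X_t=\alpha+\beta t+\delta X_{t-1}+\varepsilon_t,
\]
with H0: δ = 0 (a unit root, i.e. non-stationarity) tested against H1: δ < 0, using the τ, τμ and ττ critical values respectively.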
Example 13.1 Testing the stationarity of the private consumption time series of
an EU member (DF)
In Table 12.1 the private consumption (Ct) time series for an EU member is presented.
We saw in Chapter 12 that using the autocorrelation function methodology, this time
series is non-stationary. Let us now apply the DF test on the same data. Corresponding
to Equations (13.9) and (13.10), the OLS estimates for Ct are, respectively, the following:
(13.11)
(13.12)
In Table 13.2 the MacKinnon critical values for the rejection of the hypothesis of a
unit root, evaluated from EViews for Equations (13.11) and (13.12), are reported.
Considering the hypotheses testing specified above, we see that for Equation (13.11)
the tδ = −1.339 value is greater than all the τμ critical values in Table 13.2 and, thus, the
null hypothesis is not rejected. Therefore, the private consumption time series exhibits
a unit root, or, in other words, is a non-stationary time series. Similarly, for Equation
(13.12) the tδ = − 0.571 value is greater than all the ττ critical values in Table 13.2 and,
thus, the null hypothesis is again not rejected. Therefore, the private consumption (Ct)
time series exhibits a unit root, or, in other words, is a non-stationary time series.
Having found that the Ct time series is non-stationary, let us see how the first difference of this series (∆Ct) behaves in terms of non-stationarity. By repeating the same
exercise, as we did with Ct, we get for ∆Ct the following results:
(13.13)
(13.14)
where ∆2Xt = ∆Xt−∆Xt−1. In Table 13.3 the MacKinnon critical values for the rejection
of the hypothesis of a unit root, evaluated from EViews for Equations (13.13) and
(13.14), are reported.
Considering the hypotheses testing specified above, we see that for Equation (13.13)
the tδ = −4.862 value is much less than all the τμ critical values in Table 13.3 and, thus,
the alternative hypothesis is accepted. Therefore, the once-differenced private consumption time series does not exhibit a unit root, or, in other words, is a stationary time series. Similarly, for Equation (13.14) the tδ = −5.073 value is much less than all the ττ critical values in Table 13.3 and, thus, the alternative hypothesis is again accepted. Therefore, the once-differenced private consumption time series does not exhibit a unit root, or, in other words, is a stationary time series.
Summarising the results of this example, we can say that since ∆Ct is stationary, i.e.
using the terminology in Chapter 12 it is an I(0) stochastic process, and Ct is non-stationary,
the private consumption time series is an I(1) stochastic process. When the terminology
of integration is used in examining the non-stationarity of a stochastic process, the Dickey-
Fuller tests are also known as tests of integration of a stochastic process.
where δ = (ϕ1 + ϕ2 + ϕ3 + . . . + ϕq) − 1 and the δjs are general functions of the ϕs. The corre-
sponding ADF equations are the following equations, respectively:
(13.16)
(13.17)
(13.18)
Because the original Dickey-Fuller equations have been ‘augmented’ with the lagged differ-
enced terms to produce Equations (13.16), (13.17) and (13.18), respectively, the usual DF test
applied to the latter equations takes the name of the augmented Dickey-Fuller (ADF) test. In
fact, both the critical values for the Dickey-Fuller τ-statistics in Table 13.3 still hold for the
ADF test and the testing of hypotheses is still that in (13.8). In other words, if tδ, which is generated from the OLS regressions of Equations (13.16), (13.17) or (13.18), is negative enough,
then the corresponding times series will be stationary. Otherwise, it will be non-stationary.
Note that the key reason for augmenting the initial Dickey-Fuller equations with extra
lagged differenced terms is to eliminate possible autocorrelation from the disturbances. The
DF test is not valid if the disturbance terms are autocorrelated, and, in practice, typically the
ADF test procedure, designed to eliminate autocorrelation, is the preferred option. In order to
see how many extra terms we have to include in the equations, the usual Akaike information criterion (AIC) and Schwarz criterion (SC) could be employed. Furthermore, in order to test whether the disturbances are autocorrelated, the usual Breusch-Godfrey, or Lagrange multiplier (LM), test could be used. We demonstrate these key issues via an example below.
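In practice, the ADF regressions, the choice of the number of lagged differenced terms by an information criterion, and the comparison with MacKinnon critical values are automated by most packages. A minimal Python sketch with statsmodels (the series x is simulated; all names are illustrative) is:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=150))          # a simulated I(1) series

for regression in ("c", "ct"):               # 'c': constant; 'ct': constant and linear trend
    stat, pvalue, usedlag, nobs, crit, _ = adfuller(x, regression=regression, autolag="AIC")
    print(f"ADF ({regression}): t_delta = {stat:.3f}, p-value = {pvalue:.3f}, "
          f"lags used = {usedlag}, 5% critical value = {crit['5%']:.2f}")

# The first differences of an I(1) series should reject the unit root null
print("ADF on first differences, p-value:",
      round(adfuller(np.diff(x), regression="c", autolag="AIC")[1], 3))
```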
Example 13.2 Testing the stationarity of the consumer price index time series of
an EU member (ADF) test
To illustrate the ADF procedures we use the data set on consumer price index first
encountered in Chapter 12, which is presented in Table 12.1. Table 13.4 presents the
AIC, SC, and LM statistics for the consumer price index, Pt, using the augmented
Dickey-Fuller Equations (13.17) and (13.18).
From the results in Table 13.4 it is seen that the estimated equations with no extra lagged differenced terms of Pt (q = 0) show that autocorrelation exists in the residuals. Therefore, this equation is discarded. There is no autocorrelation in the residuals
when the equations are extended with one (q = 1) or two (q = 2) differenced terms of
Pt, given low values of the LM test statistics. Furthermore, comparing the AIC and SC
statistics between the various estimated versions of the equations, it is seen that for
Equation (13.17) the minimum statistics are for q = 1, and for Equation (13.18) the minimum statistics are also for q = 1 (these are the largest negative values and, therefore, the smallest AIC and SC statistics). These ‘best’ estimated equations are the following:
Table 13.4 AIC, SC and LM statistics for the Pt Equations (13.17) and (13.18)

                   q = 0                          q = 1                          q = 2
                   AIC      SC      LM (Prob)     AIC      SC      LM (Prob)     AIC      SC      LM (Prob)
Eq. (13.17)       −1.766   −1.677   21.329 (0.000)   −2.647   −2.513   0.527 (0.468)   −2.571   −2.389   0.133 (0.715)
Eq. (13.18)       −2.101   −1.967   16.607 (0.000)   −2.717   −2.537   1.185 (0.276)   −2.670   −2.444   0.217 (0.641)
(13.19)
(13.20)
Using these ‘best’ estimated equations, in Table 13.5 the MacKinnon critical values
for the rejection of the hypothesis of a unit root, evaluated from EViews for Equations
(13.19) and (13.20), are reported. These critical values are the same critical values as
those in Table 13.3, showing, thus, that the MacKinnon critical values are not affected
by the number of the extra differenced terms in the equations.
Considering the same hypotheses as those for the DF test in (13.8), we see that for
Equation (13.19) the tδ = − 0.809 value is greater than all the τμ critical values in Table 13.5
and, thus, the null hypothesis is not rejected. Therefore, the consumer price index time
series exhibits a unit root, or, in other words, is a non-stationary time series. Similarly, for
Equation (13.20) the tδ = − 0.582 value is greater than all the ττ critical values in Table 13.5
and, thus, the null hypothesis is again not rejected. Therefore, the consumer price index (Pt)
time series exhibits a unit root, or, in other words, is a non-stationary time series.
By repeating the same exercise as we did with Pt, we observe that the differenced
once time series, i.e. ∆Pt, is also a non-stationary time series. If we do the same for the
differenced twice time series, i.e. ∆2Pt, we find that it is a stationary time series. In
other words, the consumer price index (Pt) is an I(2) stochastic process.
13.4 Testing joint hypotheses with the Dickey-Fuller tests
In the DF or ADF tests we have seen until now, the null hypothesis was with respect to the parameter δ alone. Nothing was said about the other two deterministic parameters of the Dickey-Fuller regression equations, i.e. the parameter α, referring to the constant, or drift, and the parameter β, referring to the linear deterministic trend, or time trend. Dickey and Fuller (1981) provided tests for testing jointly the parameters α, β and δ. The F-test, given by:
(13.21)
where the restricted and unrestricted residual sums of squares enter in the usual way, could be used in order to test joint hypotheses, following the usual Wald methodology for testing restrictions. However, because the F-distribution is not standard, for these tests
Dickey and Fuller provided three additional F-statistics, called Φ1, Φ2 and Φ3, according to
the joint hypotheses to be tested. The critical values of these three statistics are reported in Table 13.6.
The joint hypotheses are the following:
(13.22)
(13.23)
(13.24)
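Judging from the worked example that follows, the joint hypotheses in (13.22) to (13.24) and the F statistic in (13.21) correspond to the standard Dickey-Fuller (1981) set-up, which can be sketched as:
\[
F=\frac{(RSS_R-RSS_U)/r}{RSS_U/(n-k)},
\]
where RSSR and RSSU are the restricted and unrestricted residual sums of squares, r the number of restrictions, n the number of usable observations and k the number of estimated coefficients in the unrestricted equation. The three joint null hypotheses are: α = β = δ = 0 in the regression with constant and trend (compared with Φ2); β = δ = 0 in the regression with constant and trend (compared with Φ3); and α = δ = 0 in the regression with constant only (compared with Φ1).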
Example 13.3 Testing joint hypotheses for the non-stationarity of the consumer
price index time series of an EU member (ADF)
We found in the above Example (13.2) that the consumer price index (Pt) is non-
stationary. In this example we will illustrate the Dickey-Fuller joint hypotheses tests using this time series.
Using the estimated Equation (13.20) we can test the joint hypotheses in (13.22) and
in (13.23). In other words, for (13.22) we test that:
(13.25)
(13.26)
From the estimated Equation (13.20), using the usual Wald statistic for redundant
variables, for n = 34 usable observations, k = 4 estimated coefficients and r = 3 restric-
tions, we get F2(α = 0, β = 0, δ = 0) = 2.277. This value is less than the critical values
Φ2(n = 25) = 5.68 and Φ2(n = 50) = 5.13 at the 0.05 significance level, reported in
Table 13.6. Therefore, we accept the null hypothesis, i.e. we accept that Pt is a random
walk, since α = 0 means the absence of a drift, β = 0 means the absence of a deterministic
trend, and δ = 0 means non-stationarity.
From the estimated Equation (13.20), using the usual Wald statistic for redundant
variables, for n = 34 usable observations, k = 4 estimated coefficients and r = 2 restric-
tions, we get F3(β = 0, δ = 0) = 2.406. This value is less than the critical values
Φ3(n = 25) = 7.24 and Φ3(n = 50) = 6.73 at the 0.05 significance level, reported in
Table 13.6. Therefore, we accept the null hypothesis, i.e. we accept that Pt is subject to
a stochastic trend only, since β = 0 means the absence of a deterministic trend and
δ = 0 means non-stationarity.
Table 13.6 Critical values for the Dickey-Fuller Φ1, Φ2 and Φ3 statistics
(the column headings give the probability of a smaller value; for a test at the 0.05 significance level use the 0.95 column)

Statistic Φ1
n       0.01   0.025  0.05   0.10   0.90   0.95   0.975  0.99
25      0.29   0.38   0.49   0.65   4.12   5.18   6.30   7.88
50      0.29   0.39   0.50   0.66   3.94   4.86   5.80   7.06
100     0.29   0.39   0.50   0.67   3.86   4.71   5.57   6.70
250     0.30   0.39   0.51   0.67   3.81   4.63   5.45   6.52
500     0.30   0.39   0.51   0.67   3.79   4.61   5.41   6.47
∞       0.30   0.40   0.51   0.67   3.78   4.59   5.38   6.43

Statistic Φ2
n       0.01   0.025  0.05   0.10   0.90   0.95   0.975  0.99
25      0.61   0.75   0.89   1.10   4.67   5.68   6.75   8.21
50      0.62   0.77   0.91   1.12   4.31   5.13   5.94   7.02
100     0.63   0.77   0.92   1.12   4.16   4.88   5.59   6.50
250     0.63   0.77   0.92   1.13   4.07   4.75   5.40   6.22
500     0.63   0.77   0.92   1.13   4.05   4.71   5.35   6.15
∞       0.63   0.77   0.92   1.13   4.03   4.68   5.31   6.09

Statistic Φ3
n       0.01   0.025  0.05   0.10   0.90   0.95   0.975  0.99
25      0.74   0.90   1.08   1.33   5.91   7.24   8.65   10.61
50      0.76   0.93   1.11   1.37   5.61   6.73   7.81   9.31
100     0.76   0.94   1.12   1.38   5.47   6.49   7.44   8.73
250     0.76   0.94   1.13   1.39   5.39   6.34   7.25   8.43
500     0.76   0.94   1.13   1.39   5.36   6.30   7.20   8.34
∞       0.77   0.94   1.13   1.39   5.34   6.25   7.16   8.27

Source: Dickey, D.A. and W.A. Fuller (1981) ‘Likelihood ratio statistics for autoregressive time
series with a unit root’, Econometrica, 49, 4, 1057–1072.
Using the estimated Equation (13.19) we can test the joint hypothesis in (13.24). In
other words, for (13.24) we test that:
(13.27)
From the estimated Equation (13.19), using the usual Wald statistic for redundant vari-
ables, for n = 34 usable observations, k = 3 estimated coefficients and r = 2 restrictions, we
get F1(α = 0, δ = 0) = 1.246. This value is less than the critical values Φ1(n = 25) = 5.18
and Φ1(n = 50) = 4.86 at the 0.05 significance level, reported in Table 13.6. Therefore, we
accept the null hypothesis, i.e. we accept that Pt is a random walk, since α = 0 means the
absence of a drift and δ = 0 means non-stationarity.
Table 13.7 Critical values for the Dickey-Fuller τij-statistics (symmetric distributions)
(the column headings give the significance level of the two-tailed test)

Statistic ταμ
n       0.10   0.05   0.025  0.01
25      2.20   2.61   2.97   3.41
50      2.18   2.56   2.89   3.28
100     2.17   2.54   2.86   3.22
250     2.16   2.53   2.84   3.19
500     2.16   2.52   2.83   3.18
∞       2.16   2.52   2.83   3.18

Statistic τατ
n       0.10   0.05   0.025  0.01
25      2.77   3.20   3.59   4.05
50      2.75   3.14   3.47   3.87
100     2.73   3.11   3.42   3.78
250     2.73   3.09   3.39   3.74
500     2.72   3.08   3.38   3.72
∞       2.72   3.08   3.38   3.71

Statistic τβτ
n       0.10   0.05   0.025  0.01
25      2.39   2.85   3.25   3.74
50      2.38   2.81   3.18   3.60
100     2.38   2.79   3.14   3.53
250     2.38   2.79   3.12   3.49
500     2.38   2.78   3.11   3.48
∞       2.38   2.78   3.11   3.46

Source: Dickey, D.A. and W.A. Fuller (1981) ‘Likelihood ratio statistics for autoregressive time series with a unit
root’, Econometrica, 49, 4, 1057–1072.
The conditional hypotheses are the following:
(13.28)
(13.29)
(13.30)
1 The ‘top to bottom’ philosophy, meaning that the top (starting) point of the procedure
should be the most general case, and that we then move, step by step, towards the lowest
(finishing) point, which should be the most specific case.
2 If it is known that the time series under investigation contains a drift or trend, then the
null hypothesis of a unit root can be tested using the standard normal distribution.
3 Because the unit root tests have low power in rejecting the null hypothesis of a unit
root, if, at any step of the sequential procedure of testing, the null hypothesis is rejected,
then the whole procedure ends, concluding that the time series under investigation is
stationary.
(13.31)
• Use statistics AIC and SC to find the proper number of differenced terms to be included
in the equation.
• Use statistic LM to test for autocorrelation in the residuals.
• If the null hypothesis is rejected then time series Xt does not contain a unit root. You
may stop the whole process, concluding that the time series is stationary.
• If the null hypothesis is not rejected you must continue, in order to test the drift and
trend terms.
Step 3 Use statistic τβτ to test the conditional null hypothesis β = 0, given δ = 0, i.e. to test the
significance of the trend term given that the time series contains a unit root. You may verify
this test by using statistic Φ3 to test the joint null hypothesis β = δ = 0.
• If the null hypothesis is not rejected, i.e. if β is not significant, you may continue.
• If the null hypothesis is rejected, i.e. if β is significant, you have to perform the following
test:
• Use the standard normal distribution to retest the null hypothesis δ = 0.
• If the null hypothesis is rejected, i.e. if time series Xt does not contain a unit root, you
may stop the process, concluding that the time series is stationary.
• If the null hypothesis is not rejected, i.e. if time series Xt contains a unit root, you
conclude that β ≠ 0 and δ = 0.
Step 4 Use statistic τατ to test the conditional null hypothesis α = 0, given δ = 0, i.e. to test the
significance of the drift term given that the time series contains a unit root. You may verify
this test by using statistic Φ2 to test the joint null hypothesis α = β = δ = 0.
• If the null hypothesis is not rejected, i.e. if α is not significant, you may continue.
• If the null hypothesis is rejected, i.e. if α is significant, you have to perform the following
test:
• Use the standard normal distribution to retest the null hypothesis δ = 0.
• If the null hypothesis is rejected, i.e. if time series Xt does not contain a unit root, you
may stop the process, concluding that the time series is stationary.
• If the null hypothesis is not rejected, i.e. if time series Xt contains a unit root, you
conclude that α ≠ 0 and δ = 0.
(13.32)
• If the null hypothesis is rejected, then time series Xt does not contain a unit root. You
may stop the whole process, concluding that the time series is stationary.
• If the null hypothesis is not rejected, you must continue, in order to test the drift term.
Step 7 Use statistic ταμ to test the conditional null hypothesis α = 0, given δ = 0, i.e. to test
the significance of the drift term, given that the time series contains a unit root. You may
verify this test by using statistic Φ1 to test the joint null hypothesis α = δ = 0.
• If the null hypothesis is not rejected, i.e. if α is not significant, you may continue.
• If the null hypothesis is rejected, i.e. if α is significant, you have to perform the following
test:
• Use the standard normal distribution to retest the null hypothesis δ = 0.
• If the null hypothesis is rejected, i.e. if time series Xt does not contain a unit root, you
may stop the process, concluding that the time series is stationary.
• If the null hypothesis is not rejected, i.e. if time series Xt contains a unit root, you
conclude that α ≠ 0 and δ = 0.
(13.33)
• If the null hypothesis is rejected, you conclude that time series Xt does not contain a unit
root, i.e. it is stationary.
• If the null hypothesis is not rejected, you conclude that time series Xt contains a unit
root, i.e. it is non-stationary.
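The ‘top to bottom’ logic of this procedure can be sketched in Python (statsmodels) as follows. This is a deliberately simplified illustration: it runs the ADF regression with trend and drift, then with drift only, then with neither, stopping as soon as the unit root null is rejected, and it omits the conditional tests on α and β (Steps 3, 4 and 7).

```python
from statsmodels.tsa.stattools import adfuller

def sequential_unit_root_test(x, alpha=0.05):
    """Simplified 'general to specific' unit root check: trend and drift, drift only, neither."""
    specs = [("constant and trend", "ct"),
             ("constant only", "c"),
             ("no constant, no trend", "n")]   # 'n' is called 'nc' in older statsmodels versions
    for label, reg in specs:
        stat, pvalue, *_ = adfuller(x, regression=reg, autolag='AIC')
        print(f"ADF ({label}): statistic = {stat:.3f}, p-value = {pvalue:.3f}")
        if pvalue < alpha:
            return "stationary"            # stop as soon as the unit root null is rejected
    return "non-stationary (unit root)"    # the null was never rejected in any specification
```

Calling `sequential_unit_root_test` on a series would then print each stage and return a verdict, mirroring the stop-as-soon-as-rejected principle stated above.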
Example 13.5 Using a sequential procedure for testing the non-stationarity of the
consumer price index time series of an EU member (ADF)
This example is in fact a summary of the tests performed in Examples 13.2, 13.3 and
13.4. Table 13.8 summarises relevant results from these examples.
Table 13.8 Results of a sequential procedure for testing unit roots of the consumer price index
time series (P) of an EU member
Step 1 The results for Equation (13.31) are shown. There is no autocorrelation in the
residuals.
Step 2 Because tδ = −0.582 > ττ = −3.5468 the time series contains a unit root.
Step 3 Because |tβ| = 2.023 < |τβτ| = 2.81 it may be that β = 0. This result is also verified by
F3 = 2.406 < Φ3 = 6.73.
Step 4 Because |tα| = 1.134 < |τατ| = 3.14 it may be that α = 0. This result is also verified by
F2 = 2.277 < Φ2 = 5.13.
Step 5 The results for Equation (13.32) are shown. There is no autocorrelation in the
residuals.
Step 6 Because tδ = −0.809 > τμ = −2.9499 the time series contains a unit root.
Step 7 Because |tα| = 1.398 < |ταμ| = 2.56 it may be that α = 0. This result is also verified by
F1 = 1.246 < Φ1 = 4.86.
Step 8 The results for Equation (13.33) are shown. There is no autocorrelation in the
residuals.
Step 9 Because tδ = −0.723 > τ = −1.95 the time series contains a unit root.
Final conclusion: the consumer price index (Pt) time series for an EU member contains
a unit root without drift and without trend.
13.6 The multiple unit roots, the seasonal unit roots and the
panel data unit root tests
Since the original Dickey-Fuller and augmented Dickey-Fuller tests were developed,
various modifications and/or extensions of these tests have been proposed. We have already
seen some of them in the previous sections. However, we will now refer briefly to some
of these and other tests.
If a time series possibly has one unit root, a simple version of the estimated equation we saw
earlier is the following:
(13.34)
If the time series has possibly two roots, the estimated equation is the following:
(13.35)
If the time series has possibly three roots, the estimated equation is the following:
(13.36)
and so on. For each equation the usual DF or ADF procedure should be applied (see Examples
13.1 and 13.2).
(13.37)
(13.38)
with the λ̂ js being the estimates of the λjs obtained from the following regression:
(13.39)
The test on unit roots could be based on the Student’s t-statistic of the δ coefficient in
the regression (13.37). Osborn et al. (1988), instead of using ∆sZt as the dependent variable
in the regression Equation (13.37), proposed variable ∆sXt. Furthermore, Hylleberg et al.
(1990) proposed a more general test to deal with cyclical movements at different frequen-
cies, and, therefore, to test the corresponding unit roots (for more see Charemza and
Deadman, 1997).
In the simple case where the seasonal pattern of a time series Xt measured s times per time
period is purely deterministic, the following regression equation could be used:
(13.40)
where η̂t are the estimated residuals of ηt derived from the following regression equation:
(13.41)
where Djt are the s − 1 dummy variables. In other words, η̂t could be considered as a deseason-
alised time series in the place of Xt. For the testing of unit roots, the usual DF or ADF proce-
dure could be applied to the coefficient δ of the regression Equation (13.40) (Dickey et al.,
1986; Enders, 2010).
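A minimal Python sketch of this two-stage idea, under the illustrative assumptions of quarterly data (s = 4) and purely deterministic seasonality, is given below; the series x is simulated and all names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

# x: hypothetical quarterly series; simulated here purely for illustration
x = pd.Series(np.random.normal(size=120))

quarter = pd.Series(np.arange(len(x)) % 4, name="quarter")
D = pd.get_dummies(quarter, prefix="D", drop_first=True).astype(float)  # s - 1 = 3 dummies

# First stage (as in Equation (13.41)): regress X on a constant and the seasonal dummies
first_stage = sm.OLS(x, sm.add_constant(D)).fit()
eta_hat = first_stage.resid          # deseasonalised series, used in place of Xt

# Second stage (as in Equation (13.40)): apply the usual DF/ADF procedure to the residuals
stat, pvalue, *_ = adfuller(eta_hat, regression='c', autolag='AIC')
print(f"ADF on deseasonalised series: statistic = {stat:.3f}, p-value = {pvalue:.3f}")
```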
(13.42)
The basic idea is to remove the deterministic trend by estimating the trend coefficient β.
Using this estimate the series is then detrended.
Step 2 Use the detrended series, det(Yt−1), in the following DF/ADF type equation:
(13.43)
Step 3 The null of the unit root (H0: γ = 0) can be rejected if the ‘test statistic’ is larger in
absolute value than the critical value. The CV of this test for n = 200 at 5% is −3.04 (S–P test
critical values). It can be shown that this test has more power than the DF/ADF tests and can
better distinguish between a TS and a DS time series.
e. The IPS panel data unit root test
In recent years, because of the popularity of panel data regression modelling, it has
become common practice to test for unit roots when working with panel data/pooled data
sets. A popular unit root test for panel data/pooled data is recommended by Im, Pesaran and
Shin (2003), and is known as the IPS test. Suppose we have n series, each having T obser-
vations. The basic idea of the test is to perform an ADF test on each of the n series, pool the
estimates together, and then carry out a unit root test on the pooled estimated test value. Key
steps may be summarised as follows:
Step 1 For each of the n time series run an ADF test (use the same model in each case; for
example, run a random walk with a drift parameter model in ADF form).
Step 2 Let tADFi denote the unit root test statistic value for each of the n series.
Step 3 Obtain the average value of the test statistic in step 2 as follows:
(13.44)
(13.45)
Im, Pesaran and Shin (2003) showed that ZIPS has a standard normal distribution when
sample size is large. Moreover, they calculated the critical values of this test for various
levels of n and T. For example, for n = 5 and T = 50, at a 5% level of significance, the critical
value is calculated at −2.18.
Step 5 If the absolute value of test statistic ZIPS is greater than the critical value, reject the null
hypothesis that each of the n series has a unit root, or that all n series are I(1) series. Note that
this is a large sample test and the lowest permissible value for n is calculated at 5.
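The pooling idea behind the IPS test can be sketched in Python as follows; the snippet only computes the average (t-bar) of the individual ADF t-statistics from Steps 1–3, since the standardisation into Z_IPS requires the moment values tabulated by Im, Pesaran and Shin (2003), which are not reproduced here.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def ips_t_bar(panel):
    """panel: list of 1-D arrays, one per cross-section unit, each of length T."""
    t_stats = []
    for series in panel:
        # Step 1: ADF with a drift term for each unit; the first returned value is the t-statistic
        stat, *_ = adfuller(series, regression='c', autolag='AIC')
        t_stats.append(stat)
    return np.mean(t_stats)   # Step 3: average of the individual ADF t-statistics

# Illustration with simulated data: n = 5 units, T = 50 observations each
panel = [np.cumsum(np.random.normal(size=50)) for _ in range(5)]
print("t-bar =", round(ips_t_bar(panel), 3))
```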
(13.46)
where X̄ is the arithmetic mean of Xt. Because of the formula in (13.46) this statistic is called
the integration Durbin-Watson (IDW) statistic. Note that if the time series Xt is regressed on a
constant only, the estimate of the constant is the arithmetic mean of Xt, and the Durbin-Watson
statistic of this regression is exactly the IDW statistic computed in (13.46). In this
regression, if the time series Xt is non-stationary, so will be the corresponding residuals, because:
(13.47)
where a = estimated intercept and et = residuals. If the value of IDW is low, say lower than
0.5, time series Xt is suspected to be non-stationary. If the value of IDW is close to 2, time
series Xt is stationary.
As an example, the value of IDW for the private consumption (Ct) time series of an EU
member is equal to 0.010, indicating that this time series is non-stationary, as we found in
Example 7.1. Furthermore, for the same time series differenced once (∆Ct) it is IDW =
1.692, indicating that time series ∆Ct is stationary, as we also found in Example 7.1.
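The IDW statistic is straightforward to compute directly. The Python sketch below assumes, consistently with the discussion above, that (13.46) is simply the Durbin-Watson formula applied to the deviations of Xt from its arithmetic mean.

```python
import numpy as np

def idw(x):
    """Integration Durbin-Watson statistic: DW of the regression of x on a constant only."""
    x = np.asarray(x, dtype=float)
    e = x - x.mean()                       # residuals from regressing x on a constant
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

x = np.cumsum(np.random.normal(size=100))   # a simulated random walk (non-stationary)
print(round(idw(x), 3))                      # values well below 0.5 suggest non-stationarity
print(round(idw(np.diff(x)), 3))             # close to 2 for the (stationary) first difference
```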
The power of a test is measured by the probability of rejecting the null hypothesis when it is
false. It has been shown, using Monte Carlo simulations, that the power of the unit root tests
is very low. In other words, although a time series may be stationary, the unit root tests may
fail to detect this and suggest instead that the time series is non-stationary.
Because most macroeconomic time series Xt are trended upwards, the unit root tests often
indicate that these series are non-stationary. A usual procedure to transform an upwards-
trending time series into one that remains roughly constant over time is to consider its
percentage growth, i.e. the time series (Xt – Xt−1)/Xt−1. When the values of a time series are
positive, we can take the natural logarithms of the series, i.e. xt = ln(Xt), and use the ‘lower
case’ series xt in the unit root tests. In trying to detect whether the time series xt has a unit root
we use the difference ∆xt = xt – xt−1. However, because it is approximately true that:
(13.48)
in the unit root tests the time series are often taken in logarithms instead of natural levels. To
see this, consider, for example, the upwards-trending time series of the consumer price
index (Pt), which is given by:
(13.49)
where g = constant rate of growth and ηt = error term. If we take the logarithms of both sides
of (13.49) we obtain that:
(13.50)
or
(13.51)
where pt = lnPt and εt = ηt – ηt−1. In the case that εt is white noise, Equation (13.51) is a
random walk process with a drift parameter.
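The approximation in (13.48), namely that the change in the natural logarithm of a series is close to its percentage growth rate, is easy to verify numerically, for example:

```python
import numpy as np

X_prev, X_curr = 100.0, 103.0
print(np.log(X_curr) - np.log(X_prev))   # 0.0296: change in logs
print((X_curr - X_prev) / X_prev)        # 0.0300: percentage growth rate
```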
13.8 A summary of key issues
• The starting point in modern time series econometrics is the phenomenon of the spurious
regression: a regression with a high level of autocorrelation, significant t and F-ratios and
a high R2, all at the same time, signifies serious specification problems. The traditional
approach to econometric analysis fails to provide an adequate answer/solution to this
problem.
• The problem is due to the fact that economic/financial variables are seldom stationary
over time. That is, the joint probability functions generating the realised values of each
variable (data set) change over time; therefore, when such variables are combined in a
regression model, the result is a meaningless spurious regression.
• The modern approach to the problem of the spurious regression is as follows:
Step 1 Check the characteristics of the data generation processes (DGP) of each variable
under consideration. The potential DGP for most macroeconomic and also financial time
series could be any one of the following non-stationary stochastic processes (Nelson and
Plosser, 1982):
These DGPs have stochastic trends and are non-stationary; in particular, the variance of each
series increases rapidly over time. Alternatively, the DGP could be trend stationary (TS), i.e.
a deterministic trend plus a stationary error, in which case the series needs detrending.
Step 2 Having gone through the first step for each variable (say Y and X, for example) plot
the data for each variable – check the pattern of change for each variable over time.
Step 3 Test for the unit root to examine the order of the integration of each of the variables
under consideration, as follows:
Dickey and Fuller (1979) consider three different regression models that can be used to test
for the presence of a unit root:

Model 1 A pure random walk (no constant, no time trend):
Yt = αYt−1 + ut, or equivalently ∆Yt = γYt−1 + ut, where γ = α − 1
H0: α = 1 (i.e. γ = 0); Yt is I(1) – DS (difference stationary)
H1: α < 1 (strictly less than unity, i.e. γ < 0); Yt is I(0) – stationary

Model 2 A random walk model with a drift:
∆Yt = a + γYt−1 + ut
H0: γ = 0; Yt is I(1) – DS
H1: γ < 0; Yt is I(0) – a stationary process

Model 3 A random walk model with a drift and a deterministic time trend:
∆Yt = a + γYt−1 + βt + ut
H0: γ = 0; Yt is I(1) – DS
H1: γ < 0; Yt is I(0) – a stationary process
• The DF test involves estimating one (or more) of the DF equations (models 1, 2 or 3),
using OLS in order to obtain the estimated value of γ and the associated standard error.
Compare the resulting ‘t-ratio’ with the appropriate value reported in the DF tables to
determine whether to reject or not reject the null H0.
• Note that the above methodology is the same regardless of which of the three forms/
models is estimated. However, you should take into account the following key points:
• The DF distribution is not exactly the same as a t-distribution and the critical values for
H0: γ = 0 depend on the form of the model and on the sample size.
• The DF statistics are called τ (tau). For no constant or time trend – Model 1: the CV for
n = 500 at 5% is −1.96. For a constant but no time trend – Model 2: the CV for n = 500 at 5%
is −2.87. For a constant and time trend – Model 3: the CV for n = 500 at 5% is −3.42.
• The augmented Dickey-Fuller (ADF) tests are used if autocorrelation is detected. In this
case, add lags of the dependent variable to the RHS of each model to remove the autocor-
relation, as follows:
Model 1
Model 2
Model 3
The critical values of the ADF tests are the same as those for the DF tests.
• Note that the correct lag length is important in an ADF test, and an incorrect lag length
could lead to misleading results. The lag length can be selected using individual
significance tests (t-tests on the coefficients of the lagged difference terms), autocorrelation
tests, and model selection criteria (e.g. AIC and BIC) produced by regression software.
• In practice, to remove autocorrelation, start with a relatively large lag (two/three lags),
then test down using a t-test of significance. In many applications one or two lags will
normally remove the autocorrelation. Be aware that an increased number of parameters
will reduce the power of an ADF test.
• Dickey and Fuller (1981) provided three additional F-statistics (called φ1, φ2 and φ3) to
test joint hypotheses on the parameters (this procedure helps in finding the DGP). The
tests are as follows:
H0: a = γ = 0 (Model 2) – use φ1.
H0: a = β = γ = 0 (Model 3) – use φ2.
H0: β = γ = 0 (Model 3) – use φ3.
• The test statistics are called φ1 (test one), φ2 (test two) and φ3 (test three). These are calcu-
lated in exactly the same way as ordinary F-tests:

φi = [(RSS(restricted) − RSS(unrestricted))/r] / [RSS(unrestricted)/(T − K)],  i = 1, 2, 3

where RSS = residual sum of squares and r = the number of parameter restrictions (for the
first test r = 2, for the second test r = 3, and for the third test r = 2). The 5% critical values
for n = 100 (from Table 13.6) are: for φ1, CV = 4.71; for φ2, CV = 4.88; for φ3, CV = 6.49.
In each case, if the value of φi is less than the CV, do not reject H0 at a 5% level of significance:
Yt is I(1) and the restrictions are not rejected.
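The calculation of a φ statistic can be sketched in Python (statsmodels) as follows; the example computes φ3, i.e. the joint restriction β = γ = 0 in Model 3, on a simulated series, so all variable names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def phi_statistic(y, X_unrestricted, X_restricted, r):
    """phi_i = [(RSS_restricted - RSS_unrestricted)/r] / [RSS_unrestricted/(T - K)]"""
    fit_u = sm.OLS(y, X_unrestricted).fit()
    fit_r = sm.OLS(y, X_restricted).fit()
    T, K = X_unrestricted.shape
    return ((fit_r.ssr - fit_u.ssr) / r) / (fit_u.ssr / (T - K))

# Illustration with a simulated random walk Y
np.random.seed(10)
Y = pd.Series(np.cumsum(np.random.normal(size=100)), name="Y")
dY = Y.diff().dropna()
lagY = Y.shift(1).loc[dY.index]
trend = pd.Series(np.arange(len(dY), dtype=float), index=dY.index, name="t")

X_u = sm.add_constant(pd.concat([lagY, trend], axis=1))               # Model 3: constant, Y(t-1), trend
X_r = pd.DataFrame({"const": np.ones(len(dY))}, index=dY.index)       # under beta = gamma = 0
print(round(phi_statistic(dY, X_u, X_r, r=2), 3))                     # compare with the phi3 CV
```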
Review questions
1 Explain what you understand by a unit root test. Why are unit root tests necessary when
dealing with time series regressions?
2 Explain the Dickey-Fuller (DF) and the augmented Dickey-Fuller (ADF) unit root tests.
Why is the ADF test more popular in applied work?
3 Explain how you would determine the correct lag length when using an ADF test. What
are the consequences of incorrect lag length for the ADF test?
4 What are the problems and limitations of the unit root tests? How would you overcome
these problems?
5 Collect UK annual time series data for the period 1970–2008 from the Economic Trends
Annual Supplement (available online) on each of the following:
INTRODUCTION
Cointegration analysis is perhaps the most significant development in econometrics since the
mid 1980s. In simple words, cointegration analysis refers to groups of variables that drift
together, although each is individually non-stationary in the sense that they tend upwards or
downwards over time. This common drifting of variables makes linear relationships between
these variables exist over long periods of time, thereby giving us insight into equilibrium
relationships of economic variables. Cointegration analysis is a technique used in the estima-
tion of the long-run or equilibrium parameters in a relationship with non-stationary variables.
It is a new method popularised in response to the problems inherent in the specific to general
approach to time series analysis. It is used for specifying, estimating and testing dynamic
models, and it can be used for testing the validity of underlying economic theories. Furthermore,
the usefulness of cointegration analysis is also seen in the estimation of the short-run or
disequilibrium parameters in a relationship, because the latter estimation can utilise the esti-
mated long-run parameters through cointegration methods. In this chapter we provide an
introduction to the methodology of cointegration, focusing on two-variable regression models.
Key topics
• Spurious regression and modern time series econometrics
• The concept of cointegration
• The Engle-Granger (EG) methodology
• The estimation of the error correction short-run models
(14.1)
where 0 < β1 ≤ 1 (β1 = proportionality parameter). In estimating this function with OLS, using
the data for an EU member presented in Table 12.1, and under the assumption
that permanent personal disposable income is equal to personal disposable income, we
will get:
(14.2)
The estimates in (14.2), apart from the low Durbin-Watson statistic, are very good. The
t-statistic is very high, showing that the regression coefficient is significant, and the R2 is also
very high, indicating a very good fit. However, these estimates may be misleading because
the two time series involved in the equation are trended or non-stationary random processes,
as can be seen in Figure 12.1. As a consequence, the OLS estimator is not consistent and the
corresponding usual inference procedures are not valid. As we explained in Chapter 6, these
regressions where the results look very good in terms of R2 and t-statistics, but the variables
involved are trended time series, have been called spurious regressions by Granger and
Newbold (1974). In fact, Granger and Newbold suggested that if in a regression with trended
time series variables, the DW statistic is low and the coefficient of determination R2 is high,
we should suspect that the estimated equation possibly suffers from spurious regression. As
a rule of thumb, if R2 > DW, spurious regression should be suspected, as in Equation (14.2),
for example, where R2 = 0.9924 > DW = 0.8667.
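The spurious regression phenomenon is easy to reproduce by simulation. The Python sketch below regresses one independent random walk on another; it will typically produce a ‘significant’ t-ratio, a sizeable R2 and a very low Durbin-Watson statistic, i.e. exactly the R2 > DW pattern described above.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

np.random.seed(1)
n = 200
y = np.cumsum(np.random.normal(size=n))   # one random walk
x = np.cumsum(np.random.normal(size=n))   # another, completely unrelated random walk

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(f"t-ratio on x : {fit.tvalues[1]:.2f}")
print(f"R-squared    : {fit.rsquared:.3f}")
print(f"DW statistic : {durbin_watson(fit.resid):.3f}")   # expect R2 > DW, i.e. a spurious regression
```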
Taking into account that most time series in economics are non-stationary (Nelson
and Plosser, 1982), spurious regression results are normally to be expected in regression
analysis. The SG methodology's response to this problem has been to develop and use
alternative estimation methods to OLS to deal with the changing means and variances,
ignoring the underlying cause of the problem. An alternative practical methodology for
avoiding the problem of non-stationarity of the time series has been to run regressions in
the first differences of the time series. However, using relationships where the variables are
expressed in differences amounts to describing the short-run or disequilibrium state of the
phenomenon under investigation and not its long-run or equilibrium state, in which the
variables are expressed in their original levels, as most economic theories suggest.
There is, however, a rather special occurrence, not falling into the normal case outlined
above, as follows:
In certain cases, due to economic/financial market characteristics, when the two or more
non-stationary stochastic processes are combined they result in a stationary stochastic
process, that is, the two/more joint probability distributions, when mixed together,
generate a linear combination which has a stationary probability distribution with white
noise characteristics. In this special case, the variables are said to be cointegrated and
there exists a long-run equilibrium solution. In addition, the adjustment towards this
long-run state takes place via an error correction model (ECM).
• Within the framework of cointegration methodology, key problems and issues arising
from SG methodology may therefore be resolved by a careful step by step analysis of
the time series data. This is how the modern approach to time series econometrics iden-
tifies and deals with the real cause of spurious regressions, which are so frequently
encountered in time series regression analysis. In what follows we discuss in detail each
key step in cointegration analysis.
Cointegration of two variables Two time series, Yt and Xt, are said to be cointegrated of
order (d,b), where d ≥ b ≥ 0, if both time series are integrated of order d, and there exists
a linear combination of these two time series, say a1Yt + a2Xt, which is integrated of order
(d – b). In mathematical terms, this definition is written as:
(14.3)
where CI is the symbol of cointegration. The vector of the coefficients that constitutes
the linear combination of the two series, i.e. [a1, a2] in (14.3), is called the cointegrating
vector.
We can distinguish the following two special cases, which we will investigate in this
chapter:
1 The case where d = b, resulting in a1Yt + a2Xt ~ I(0), which means that the linear combi-
nation of the two time series is stationary, and, therefore, Yt,Xt ~ CI(d,d).
2 The case where d = b = 1, resulting in a1Yt + a2Xt ~ I(0), which means that the linear
combination of the two time series is stationary, and, therefore, Yt,Xt ~ CI(1,1).
(14.4)
(14.5)
The deviation from the long-run equilibrium, called the equilibrium error, εt, is given by:
(14.6)
For the long-run equilibrium to have meaning, i.e. to exist, the equilibrium error in (14.6)
should fluctuate around the equilibrating zero value, as shown in (14.5). In other words, the
equilibrium error εt should be a stationary time series, i.e. it should be εt ~ I(0) with E(εt)
= 0. According to the definition in (14.3), because Yt ~ I(1) and Xt ~ I(1), and the linear
combination εt = Yt − β0 − β1Xt ~ I(0), we can say that Yt and Xt are cointegrated of order
(1,1), i.e. it is Yt,Xt ~ CI(1,1). The cointegrating vector is [1, –β0, −β1]. It can be proved that
in the two-variable case, and under the assumption that the coefficient of one of the variables
is normalised to equal unity, the cointegrating vector, i.e. the linear combination of the two
time series, is unique.
Combining the results above we could say that the cointegration between two time
series is another way to express the existence of a long-run equilibrium relationship between
these two time series. Therefore, by considering that Yt and Xt are cointegrated and that the
equilibrium error εt is stationary with zero mean, we can write that:
(14.7)
and be sure that Equation (14.7) will not produce spurious results. Stock (1987) proved that
for large samples the OLS estimator for Equation (14.7) is super-consistent, i.e. it is consis-
tent and very efficient, because it converges faster to the true values of the regression coef-
ficients than the OLS estimator involving stationary variables. However, Banerjee et al.
(1986) showed that for small samples the OLS estimator is biased and the level of bias
depends on the value of R2; the higher the R2 the lower the level of bias. Finally, according
to Granger (1986), if we want to avoid spurious regression situations we should test before
any regression if the variables involved are cointegrated, something that we present in the
next section.
14.3 Testing for cointegration
In this section we will present two simple methods for testing for cointegration between two
variables.
Step 1 Find the order of integration of both variables using the unit root methodology presented
in Chapter 13. There are three cases: (1) If the order of integration of the two variables is the
same, something that the concept of cointegration requires, continue to the next step. (2) If the
order of integration of the two variables is different, you may conclude that the two variables
are not cointegrated. (3) If the two variables are stationary the whole testing process stops
because you can use the standard regression techniques for stationary variables.
Step 2 If the two variables are integrated of the same order, say I(1), estimate the long-run
equilibrium equation with OLS:
(14.8)
which in this case is called the potential cointegrating regression, and save the
residuals, et, as an estimate of the equilibrium error, εt. Although the estimated cointegrating
vector [1, −b0, −b1] is a consistent estimate of the true cointegrating vector [1, −β0, −β1],
this is not true for the estimated standard errors of these coefficients. For this reason the
estimated standard errors are often not quoted with the cointegrating regression.
Step 3 For the two variables to be cointegrated the equilibrium errors must be stationary. To
test this stationarity apply the unit root methodology presented in Chapter 13 to the esti-
mated equilibrium errors saved in the previous step. You could use, for example, the Dickey-
Fuller test, or the augmented Dickey-Fuller test, to time series et, which involves the
estimation of a version of the following equation with OLS:
(14.9)
Two things we should take into account in applying the DF or ADF tests:
1 Equation (14.9) does not include a constant term, because, by construction, the OLS
residuals et are centred around zero.
2 Because the estimate of δ in (14.9) is downward biased, due to the fact that, by construc-
tion, the OLS methodology seeks to produce stationary residuals et, the usual Dickey-
Fuller τ statistics are not appropriate for this test. Engle and Granger (1987), Engle and
Yoo (1987), MacKinnon (1991), and Davidson and MacKinnon (1993) presented crit-
ical values for this test, which are even more negative than the usual Dickey-Fuller τ
statistics. In Table 14.1 critical values for this cointegration test are presented.
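Steps 2 and 3 can be sketched in Python (statsmodels) as follows; the consumption and income series are simulated stand-ins, and note that statsmodels also offers a coint function that carries out the Engle-Granger test with its own critical values.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, coint

# c, y: hypothetical consumption and income series (both assumed I(1)); simulated here
np.random.seed(0)
y = np.cumsum(np.random.normal(size=150))
c = 10 + 0.8 * y + np.random.normal(size=150)          # constructed to be cointegrated with y

# Step 2: potential cointegrating regression; save the residuals e_t
step2 = sm.OLS(c, sm.add_constant(y)).fit()
e = step2.resid

# Step 3: ADF-type test on the residuals, no constant term (as in Equation (14.9));
# 'n' means no constant ('nc' in older statsmodels versions)
stat, pvalue, *_ = adfuller(e, regression='n', autolag='AIC')
print(f"ADF on residuals: stat = {stat:.3f} (compare with EG/AEG critical values, not the DF ones)")

# Alternatively, the built-in Engle-Granger cointegration test:
eg_stat, eg_pvalue, eg_crit = coint(c, y)
print(f"Engle-Granger: stat = {eg_stat:.3f}, p-value = {eg_pvalue:.3f}")
```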
Table 14.1 Critical values for the EG or AEG cointegration tests
(significance levels 0.01, 0.05 and 0.10; one block of columns for each number of variables m in the cointegrating regression)

Sample size     0.01   0.05   0.10      0.01   0.05   0.10      0.01   0.05   0.10
Step 4 Arrive at conclusions about the cointegration of the two variables according to the
following hypotheses:
(14.10)
(14.11)
and
(14.12)
(14.13)
and
(14.14)
Table 14.2 presents the MacKinnon critical values for the rejection of the hypothesis
of a unit root, evaluated from EViews for equations (14.11) to (14.14).
From the figures in Table 14.2 it is seen that both Ct and Yt are non-stationary and
that both ΔCt and ΔYt are stationary. In other words, both Ct and Yt are integrated of
order one, i.e. it is Ct~I(1) and Yt~I(1). Therefore, we can proceed to step 2.
Table 14.2 MacKinnon critical values for Equations (14.11) to (14.14) and the DF tδ ratios

Significance level     Critical values for Equations (14.11) and (14.13)     Critical values for Equations (14.12) and (14.14)

DF tδ ratios: Eq. (14.11) tδ = −1.339; Eq. (14.12) tδ = −4.862; Eq. (14.13) tδ = −1.387; Eq. (14.14) tδ = −6.272
(14.15)
Using the estimated cointegrating vector [1, −11907.23, −0.779585] we have esti-
mated and saved the estimated equilibrium errors et.
Step 3 Testing the stationarity of et. Taking into account the criteria AIC, SC and
LM(1), we obtained:
(14.16)
Step 4 Comparing the tδ = −3.150 value from Equation (14.16) with the critical values
in Table 14.1 for m = 2, we see that this value is more or less equal to the critical values
for a 0.10 level of significance. In other words, if we assume a significance level equal
to 0.10, or 0.11, then we ‘accept’ the alternative hypothesis that et is stationary,
meaning, thus, that variables Ct and Yt are cointegrated, or that there exists a long-run
equilibrium relationship between these two variables. However, if we work with a
significance level of less than 0.10, then these two variables are not cointegrated and
we cannot say that there exists a long-run equilibrium relationship between private
consumption and personal disposable income for an EU member.
b. The Durbin-Watson approach
This approach is very simple and is based on the following two steps:
Step 1 Estimate the cointegrating regression (14.8), save the residuals et, and compute the
Durbin-Watson statistic, which now is called the cointegrating regression Durbin-Watson
(CRDW) statistic, as:
(14.17)
Step 2 Arrive at a decision about the cointegration of the two variables according to the
following hypotheses:
(14.18)
The critical d values, with the null hypothesis being d = 0, have been computed by Sargan
and Bhargava (1983) and by Engle and Granger (1987). These critical values are 0.511,
0.386 and 0.322 for significance levels of 0.01, 0.05 and 0.10, respectively.
Step 1 From the estimated cointegrating regression we see that CRDW = 1.021.
Step 2 Because the CRDW = 1.021 and is greater than the critical values noted above,
the alternative hypothesis of stationarity is accepted, and, thus, we can conclude that
private consumption and personal disposable income are cointegrated.
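A short Python sketch of the CRDW approach, using simulated (hypothetical) series, is the following:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# c, y: hypothetical I(1) series, simulated here for illustration
np.random.seed(2)
y = np.cumsum(np.random.normal(size=150))
c = 5 + 0.7 * y + np.random.normal(size=150)

# Step 1: cointegrating regression; the DW statistic of this regression is the CRDW statistic
resid = sm.OLS(c, sm.add_constant(y)).fit().resid
crdw = durbin_watson(resid)
print(round(crdw, 3))   # Step 2: compare with 0.511 / 0.386 / 0.322 at the 0.01 / 0.05 / 0.10 levels
```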
(14.19)
In the steady-state long-run equilibrium, the variables take the same values for all periods,
i.e. it is Yt = Yt−1 = Yt−2 = . . . = Y* and Xt = Xt−1 = Xt−2 = . . . = X*, and, therefore, the steady-
state long-run equilibrium relationship becomes:
(14.20)
where the cointegrating vector is [1, −α*, −β*]. Using this cointegrating vector the equilib-
rium error is ε* = Υ* − α* − β*X*. If [1, −a*, −b*] is the estimated cointegrating vector,
obtained by estimating firstly (14.19) with OLS and secondly by substituting the estimates
into (14.20), the estimated equilibrium error is given by et = Yt − a* − b*Xt. This estimated
equilibrium error could be used in testing for cointegration between the two variables, Yt and
Xt, following the steps of the Engle-Granger approach presented above.
Step 1 It has been shown in the above examples that both variables, private consump-
tion and personal disposable income, are integrated of order one.
(14.21)
Substituting the estimates from (14.21) into the corresponding parameters of (14.20)
we get that:
and therefore the estimated cointegrating vector is [1, −46147.25, −0.76714]. This
cointegrating vector is used to construct the estimated equilibrium error et = Yt −
46147.25 − 0.76714Xt.
Step 3 Testing the stationarity of et. Taking into account the criteria AIC, SC and
LM(1), we obtained:
(14.22)
Step 4 Comparing the tδ = − 0.759 value from Equation (14.22) with the critical values
in Table 14.1 for m = 2, we see that this value is greater than all the critical values
noted in the table. This means that the equilibrium error is non-stationary, indicating,
thus, that private consumption and personal disposable income are not cointegrated.
Summarising the last three examples, we could say that in Example 14.1 we found
that variables Ct and Yt were on the borderline of being cointegrated, in Example 14.2
we found that these variables are cointegrated, and, finally, in Example 14.3 we found
that the same two variables are not cointegrated. The results of these examples show
that cointegration tests may lack power and may fail to detect cointegration between two
variables, even when these variables are cointegrated. Therefore, we should use cointegra-
tion tests with great caution.
(14.23)
where Yt ~ I(1), Xt ~ I(1), Yt,Xt ~ CI(1,1), εt = Yt − β0 − β1Xt ~ I(0), ut = white noise distur-
bance term and λ = short-run adjustment coefficient.
In (14.23) all variables are stationary because, Yt and Xt being integrated of order one, their
differences ΔYt and ΔXt are integrated of order zero. Furthermore, the equilibrium error εt is
integrated of order zero because variables Yt and Xt are cointegrated. In other words, one
could say that Equation (14.23) could be estimated by OLS. However, this is not immediately
possible, because the equilibrium error εt is not an observable variable. Therefore, before any
estimation of Equation (14.23), values of this error should be obtained.
Engle and Granger (1987) proposed the following two-step methodology in estimating
Equation (14.23):
Step 1 Estimate the potential cointegrating regression (14.8), then get the consistent
estimated cointegrating vector [1, −b0, −b1] and use it in order to obtain the estimated
equilibrium error et = Yt − b0 − b1Xt.
(14.24)
1 Use appropriate statistics, such as AIC, SC and LM, for example, in order to decide
about the proper number of lags for the differenced variables to be used.
2 Use, if appropriate, the non-lagged differenced variable Xt.
3 Include in the equation other differenced ‘exogenous’ variables, as long as they are
integrated of order one, in order to improve fit.
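The two-step estimation of the error correction model can be sketched in Python as below; the lag structure is kept deliberately minimal (only the contemporaneous ΔXt and the lagged equilibrium error), so it is an illustration of the mechanics rather than of the specification search described in points 1–3 above, and the series are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Y, X: hypothetical cointegrated I(1) series, simulated here
np.random.seed(3)
X = pd.Series(np.cumsum(np.random.normal(size=200)), name="X")
Y = pd.Series(20 + 0.5 * X + np.random.normal(size=200), name="Y")

# Step 1: cointegrating regression and estimated equilibrium error e_t
step1 = sm.OLS(Y, sm.add_constant(X)).fit()
e = step1.resid

# Step 2: error correction model: dY_t on dX_t and e_{t-1} (plus a constant)
df = pd.DataFrame({"dY": Y.diff(), "dX": X.diff(), "e_lag": e.shift(1)}).dropna()
ecm = sm.OLS(df["dY"], sm.add_constant(df[["dX", "e_lag"]])).fit()
print(ecm.params)   # the coefficient on e_lag is the short-run adjustment (speed) coefficient
```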
(14.25)
Step 2 According to (14.24), using the criteria AIC, SC and t-ratios, a version of an
estimated error correction model is the following:
(14.26)
The results in (14.26) show that short-run changes in personal disposable income Yt
positively affect private consumption Ct. Furthermore, because the short-run adjust-
ment coefficient is significant, it shows that 0.19996 of the deviation of the actual
private consumption from its long-run equilibrium level is corrected each year. The
above cointegration analysis has identified that there is a long-run relationship between
C and Y; in addition it has shown that the adjustment towards this equilibrium is
uniquely determined via an ECM regression model (14.26). Note that this procedure is
a significant improvement over the SG methodology, with the latter seldom being able
to correctly identify the short-run dynamic adjustment process.
Yt = a + bXt + ut.
Note that ut = Yt − (a + bXt). If Y and X are cointegrated then ut, a linear combination
of them, must be I(0), that is, a stationary process.
• Use the OLS residuals, et, from the potential cointegrating regression as a proxy
for ut, and test whether the et series is I(0).
• Step 3 Perform a DF/ADF test on the OLS residuals from step 2, as follows:
There is no need to include an intercept term in the above ADF equation, since the time
series is a residual sequence.
Again use AIC, t and F-tests to determine the appropriate lag length for the ADF. The
parameter of interest is γ and the null and alternative hypotheses are as follows:
H0: γ = 0
H1: γ < 0
If you can reject H0, the et series is I(0), i.e. stationary. Hence, conclude that Y and X are
cointegrated and there exists a long-run equilibrium relationship between them, as the
theory suggests.
• Since the OLS method used to estimate the potential cointegration equation minimises
the RSS and the residual variance is made as small as possible, the testing procedure
based on the DF critical values is inclined towards rejecting H0. To overcome this
problem, Engle and Granger developed their own CV values for the above tests. The
CVs depend on the sample size and number of variables used in the analysis. For
example, to test for cointegration between Y and X, using 100 observations, at a 5%
level of significance, the CV is −3.398. If the absolute value of t is greater than 3.398,
reject H0 – conclude cointegration. If the two variables are cointegrated, then the adjust-
ment towards equilibrium takes place via an ECM process.
• The Granger representation theorem states that if two variables, Y and X, are cointe-
grated then the relationship between them can be expressed as ECM, which in turn
shows how adjustment towards a long-run equilibrium takes place. Moreover, for any
set of I(1) variables, error correction and cointegration are equivalent. That is, an ECM
representation implies cointegration.
• Step 4 From the cointegrating regression the ‘equilibrium error’ is ut = Yt − (a + bXt); a
proxy for this is the OLS residual, et. The ECM implies that in each period a fraction of
the previous period's error is corrected for. This process will move the model towards
equilibrium, given sufficient time. The estimation of the ECM representation is step 4 of
the EG methodology, as follows:
This is the ECM representation. It shows the adjustment towards equilibrium via the
ECM term, βet−1. β is termed the speed of adjustment coefficient. It shows how ΔYt
reacts to the previous time period ‘equilibrium error’.
• Step 5 Assess the adequacy of the ECM regression. Carry out diagnostic checks to
ensure the disturbance term is white noise. Again ensure that the lag length is chosen
correctly. The equilibrium solution is given by the ECM term, when all changes in the
variables are set to zero.
• Note: the EG methodology cannot deal with the general case of more than two
variables.
• The normalisation procedure is not clear. That is, there might be an ECM between
Y and X as well as X and Y (X as the dependent variable in this case).
• In practice, it is possible to find that one regression indicates that the variables are
cointegrated, whereas reversing the order indicates no cointegration.
• Use economic/financial theory/knowledge to specify the ECM regression.
• For the general case of more than two variables, use the Johansen (1988) procedure.
Review questions
1 Explain how the modern approach to time series econometrics deals with the problem
of spurious regression.
2 Explain what you understand by the concept of cointegration. What are the key steps in
the Engle-Granger approach to cointegration analysis?
3 Explain what you understand by [a] an I(0), [b] an I(1) and [c] an I(2) time series. What are
the consequences of mixing an I(0) time series with an I(1) time series?
4 ‘Cointegration methodology is basically an “inductive” method lacking a theoretical
foundation.’ Discuss.
5 Collect a time series data for levels of aggregate imports and gross domestic product
(GDP) covering at least 50 annual/quarterly periods, for a country of your own choice.
Carry out each one of the following steps using an appropriate regression package (e.g.
Microfit 5, EViews).
a Plot each of the variables in level and first difference form and explain your plots.
Are these time series integrated of order one?
b Use an ADF test on each variable to establish the order of integration. Explain how
you would determine the order of lag length in each case. On the basis of your results
explain whether it would be correct to include both variables in a regression model.
c Run a potential cointegrating regression, using economic theory to specify the
equation.
d Obtain the residuals from the above regression and plot these residuals in both
the level and 1st difference form. Explain the implications of your plots.
e Use an ADF test on the residuals of the potential cointegrating equation, determine
the lag length, and explain the implications of your results. Are the two time series
cointegrated?
f Assume that the two variables under consideration are cointegrated, specify a
short-run ECM model, and determine the correct dynamic specification to ensure a
white noise error term. Explain the implications of your results, paying particular
attention to the speed of adjustment.
6 Explain the Engle-Granger (EG) cointegration methodology. What are the shortcom-
ings of this method? Can the Granger causality test overcome these
shortcomings?
7 Collect annual time series data on UK real disposable income and real aggregate
personal consumption expenditure for the period 1974–2006, from the Economic
Trends Annual Supplement (available online). Use a regression package (e.g. Microfit 5
or EViews) to carry out each task stated below.
a Plot the data on each variable in level and in 1st difference form. Explain your
plots.
b Carry out ADF tests on each variable. In each case pay particular attention to the
determination of the lag length.
c Use the Engle-Granger methodology to investigate cointegration between these
two variables. Explain each step carefully.
d Assume that the two variables are cointegrated, and specify an ECM short-run
model. Estimate the ECM model ensuring that the error term is white noise.
e Explain the implications of the ECM regression results.
15 Cointegration analysis
The multivariate case
INTRODUCTION
This chapter provides an introductory coverage of cointegration analysis when more than
two variables are investigated. The multivariate case is a bit more complex and it does
require the use of matrix algebra to derive results. However, the methodology is now
routinely carried out via time series econometric packages (e.g. Microfit 5 and EViews) and
is straightforward to use in practice. In what follows we have tried to keep the level of math-
ematics to a minimum and have used a number of examples to illustrate key ideas and
procedures.
Key topics
• The Engle-Granger (EG) methodology
• Vector autoregression and cointegration
• The Johansen approach
• Granger causality test
Cointegration of more than two variables k time series, X1t, X2t,. . ., Xkt, are said to be
cointegrated of order (d,b), where d ≥ b ≥ 0, if all time series are integrated of order d, and
there exists a linear combination of these k time series, say a1X1t + a2X2t + . . . + akXkt, which
is integrated of order (d−b). In mathematical terms, this definition is written:
(15.1)
The vector of the coefficients that constitute the linear combination of the k time series, i.e.
[a1, a2,. . ., ak] in (15.1), is the cointegrating vector.
We can distinguish the following two special cases which we will investigate in this
chapter:
1 The case where d = b, resulting in a1X1t + a2X2t + . . . + akXkt ∼ I(0), which means that the
linear combination of the k time series is stationary, and therefore, X1t, X2t,. . ., Xkt ∼
CI(d,d).
2 The case where d = b = 1, resulting in a1X1t + a2X2t + . . . + akXkt ∼ I(0), which means that
the linear combination of the k time series is stationary, and therefore, X1t, X2t,. . ., Xkt ∼
CI(1,1).
To demonstrate basic ideas let us consider the following three-variable relationship where
Yt ∼ I(1), Xt ∼ I(1) and Zt ∼ I(1):
(15.2)
(15.3)
The deviation from the long-run equilibrium, i.e. the equilibrium error, εt, is given by:
(15.4)
As in the two-variable case, for the long-run equilibrium to have meaning, i.e. to exist, the
equilibrium error in (15.4) should fluctuate around the equilibrating zero value, as shown in
(15.3). In other words, the equilibrium error εt should be a stationary time series, i.e. it
should be εt ∼ I(0) with E(εt) = 0. According to the definition in (15.1), because Yt ∼ I(1), Xt
∼ I(1) and Zt∼ I(1), and the linear combination εt = Yt − β0 − β1Xt − β2 Zt ∼ I(0), we can say
that Yt, Xt and Zt are cointegrated of order (1,1), i.e. it is Yt,Xt Zt ∼ CI(1,1). The cointegrating
vector in this case is [1, −β0, −β1, −β2].
In the two-variable case, and under the assumption that the coefficient of one of the vari-
ables is normalised to equal unity, we said in Chapter 14 that the cointegrating vector is
unique. However, in the multivariate case this is not true. It has been shown that if a long-
run equilibrium relationship exists between k variables, then these variables are cointegrated,
whilst if k variables are cointegrated, then there exists at least one long-run equilibrium
relationship between these variables. In other words, in the multivariate case the cointe-
grating vector is not unique.
It can be proved (Greene, 1999) that in the case of k variables, there can be at most k−1
linearly independent cointegrating vectors. The number of these linearly independent coin-
tegrating vectors is called the cointegrating rank. Therefore, the cointegrating rank in the
case of k variables may range from 1 to k − 1. As a consequence, in the case where more than
one cointegrating vector exists, it may be impossible, without out-of-sample information,
to identify the long-run equilibrium relationship (Enders, 2010). This is the case because
cointegration is a purely statistical concept and is in fact ‘a-theoretical’, in the sense that
cointegrated relationships need not have any economic meaning (Maddala, 1992).
Step 1 Find the order of integration of all variables using the unit root methodology presented
in Chapter 13. If the order of integration of all variables is the same, something that the
concept of cointegration requires, continue to the next step. However, it is possible to have
a mixture of different-order variables where subsets of the higher-order variables are cointe-
grated to the order of the lower-order variables (Cuthbertson et al., 1992). We will not
consider these possibilities in this book.
Step 2 If all the variables are integrated of the same order, say I(1), estimate the long-run
equilibrium equation with OLS:
(15.5)
which in this case is the cointegrating regression and save the residuals, et, as an estimate of
the equilibrium error, εt. Although the estimated cointegrating vector [1, −b0, −b1, −b2, . . .,
−bk] is a consistent estimate of the true cointegrating vector [1, −β0, −β1, −β2, . . ., −βk], this
is not true for the estimated standard errors of these coefficients.
Step 3 For the variables to be cointegrated the equilibrium errors must be stationary. To test
this stationarity, apply the unit root methodology presented in Chapter 13 to the estimated
equilibrium errors saved in the previous step. You could use, for example, the Dickey-Fuller
test, or the augmented Dickey-Fuller test, to time series et, which involves the estimation of
a version of the following equation with OLS:
(15.6)
Two things we should take into account in applying the DF, or ADF, tests:
1 Equation (15.6) does not include a constant term, because, by construction, the OLS
residuals et are centred around zero.
2 Because the estimate of δ in (15.6) is downward biased, due to the fact that, by construc-
tion, the OLS methodology seeks to produce stationary residuals et, the usual Dickey-
Fuller τ statistics are not appropriate for this test. Critical values, such as those presented
in Table 14.1 should be used. All these critical values depend on the number of
variables included in the cointegrating regression.
Step 4 Arrive at a decision about the cointegration of the variables according to the following
hypotheses:
(15.7)
Step 1 Finding the order of integration of the three variables: taking into account
criteria AIC, SC and LM(1), we obtained:
Variables Ct and Yt are integrated of order one, i.e. it is Ct ∼ I(1) and Yt ∼ I(1)
For the inflation rate variable:
(15.8)
and
(15.9)
Table 15.1 presents the MacKinnon critical values for the rejection of the hypothesis
of a unit root, evaluated from EViews for Equations (15.8) to (15.9).
Table 15.1 MacKinnon critical values for Equations (15.8) to (15.9) and the DF tδ ratios
From the figures in Table 15.1 it is seen that Zt is non-stationary and that ΔZt is
stationary, i.e. it is Zt ∼ I(1). In summary, all three variables are integrated of order
one and we can therefore proceed to step 2.
(15.10)
Step 3 Testing the stationarity of et. Taking into account the criteria AIC, SC and
LM(1) we obtained:
(15.11)
Step 4 Comparing the tδ = −4.455 value from Equation (15.11) with the critical values
in Table 14.1 for m = 3, we see that this value is less than the critical values for a 0.05
level of significance. In other words, we accept the alternative hypothesis in (15.7) i.e.
we accept that et is stationary, meaning, thus, that variables Ct, Yt and Zt are
cointegrated.
If we go back to the similar example in Chapter 14, we saw there that by using the
Engle-Granger cointegration test we were not sure whether variables Ct and Yt were cointe-
grated. In this example, using exactly the same methodology, we found that variables
Ct, Yt and Zt are cointegrated. This should not be surprising if we consider that in the
two-variable example of Chapter 14 we possibly made a specification error in assuming
that, in the long run, private consumption depends on personal disposable
income only. In this example, having possibly corrected this specification error by
also including the inflation rate of the economy, we were able to reach more
definite conclusions.
(15.12)
where Yt ∼ I(1), X1t ∼ I(1), . . ., Xkt ∼ I(1), Yt, X1t, . . ., Xkt ∼ CI(1,1), εt = Yt − β0 − β1X1t − . . .
− βkXkt ∼ I(0), ʋt = white noise disturbance term and λ = short-run adjustment coefficient.
Under the assumption that there is only one cointegrating vector connecting the
cointegrated variables, OLS estimation of the cointegrating equation will give
consistent estimates. In the case where there is more than one cointegrating vector, the
Engle-Granger methodology is no longer valid because it does not produce consistent esti-
mates. In this case we have to use the methods presented in the next section.
The Engle and Granger (1987) two-step methodology in estimating Equation (15.12),
under the assumption of the existence of only one cointegrating vector, is the following:
Step 1 Estimate the cointegrating regression (15.5), get the consistent estimated cointe-
grating vector [1, − b0, − b1, . . ., − bk] and use it in order to obtain the estimated equilibrium
error et = Yt − b0 − b1X1t − . . . − bkXkt.
Step 2 Estimate the following equation by OLS:
(15.13)
1 Use appropriate statistics, such as AIC, SC and LM, for example, in order to decide
about the proper number of lags for the differenced variables to be used.
2 Use, if appropriate, the non-lagged differenced variables Xt.
3 Include in the equation other differenced ‘exogenous’ variables, as long as they are
integrated of order one, in order to improve fit.
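A minimal sketch of this two-step procedure in Python, using statsmodels OLS; the function name, the data-frame columns and the use of a single lag of the equilibrium error are illustrative assumptions, not the book's own code.

```python
import pandas as pd
import statsmodels.api as sm

def engle_granger_ecm(y, X):
    """Two-step Engle-Granger estimation assuming a single cointegrating vector.
    y: I(1) dependent series (pandas Series); X: DataFrame of I(1) regressors."""
    # Step 1: static cointegrating regression; residuals are the equilibrium errors e_t
    step1 = sm.OLS(y, sm.add_constant(X)).fit()
    e = step1.resid
    # Step 2: error correction model in first differences, with e_{t-1} as regressor
    rhs = pd.concat([X.diff(), e.shift(1).rename("ecm")], axis=1).dropna()
    step2 = sm.OLS(y.diff().loc[rhs.index], sm.add_constant(rhs)).fit()
    return step1, step2

# Illustrative use with hypothetical consumption, income and inflation series:
# step1, step2 = engle_granger_ecm(df["lnC"], df[["lnY", "Z"]])
# step2.params["ecm"] is then the short-run adjustment coefficient (lambda).
```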
(15.14)
Step 2 According to (15.13), using the criteria AIC, SC and t-ratios, a version of an
estimated error correction model is the following:
(15.15)
The results in (15.15) show that short-run changes in personal disposable income, Yt, and in the inflation rate affect private consumption, Ct, positively and negatively, respectively. Furthermore, because the short-run adjustment coefficient is significant, it shows that 0.37675 (about 38 per cent) of the deviation of actual private consumption from its long-run equilibrium level is corrected each year.
(15.16)
where C = consumption and Y = income. The rationale of the first equation of this model
could be that current consumption depends on current income and on lagged consumption
due to habit persistence. The rationale of the second equation may be that current income
depends on lagged income and on lagged consumption, because higher consumption indi-
cates higher demand, which produces higher economic growth and, therefore, higher income.
(15.17)
A distinctive property of the reduced-form model (15.17) is that all its endogenous variables are expressed in terms of its lagged endogenous variables only. In this model there are no other 'exogenous' variables. This model constitutes a vector autoregressive model of order 1, because the highest lag length of its variables is one, and it is denoted VAR(1).
Generally, a system of m variables of the form:
(15.18)
(15.19)
where
(15.20)
In cases where the lag lengths in system (15.18) are not the same in all the equations, the model is called a near-vector autoregressive, or near-VAR, model.
The assumptions that usually accompany a VAR model are those of a reduced-form simultaneous equation model, i.e.:
(15.21)
or in matrix form
(15.22)
(15.23)
Furthermore, a VAR(k) process is stationary if its means and covariance matrices are bounded and the polynomial defined by the determinant
(15.24)
has all its roots outside the complex unit circle (Judge et al. 1988).
Under the assumptions written above, the parameters of a VAR(k) model can be consis-
tently estimated with OLS. Therefore, for the ith equation, the OLS estimator is given by:
(15.25)
where
(15.26)
Consistent estimates wij of the parameters ωij in (15.22) are given by:
(15.27)
We have to note here that the generalised least squares (GLS) estimator, if applied to (15.18), will give exactly the same results as the OLS estimator, because the matrix X is the same for all equations. The seemingly unrelated regressions (SUR) estimator could be applied to the near-VAR model to improve efficiency.
The preceding estimation method assumes that the lag length, i.e. the order of the VAR,
is known. In cases where the VAR order is large we have a major problem in VAR analysis:
the problem of over-parameterisation. However, in most cases, the VAR order is not known
and therefore it has to be selected. Common tests for selecting the VAR order are the
following. In all these tests it is assumed that the number of observations is n, and thus k
pre-sample values for all the variables must be considered:
a. The likelihood ratio (LR) test
This test depends on the usual likelihood ratio statistic given by:
(15.28)
where
ln Cu = log-likelihood of the complete (unrestricted) equation
ln Cr = log-likelihood of the smaller (restricted) equation
v = number of restrictions imposed
Assuming that the coefficients of a VAR(k) model corresponding to the lagged variables are given by the matrix A = [A1 A2 . . . Ak], the test works by testing the following hypotheses in sequence:
(15.29)
The test stops when a null hypothesis is rejected using the LR statistic and the VAR order q,
for k ≥ q ≥ 1, is selected accordingly. However, since estimation methods require white noise
errors, a higher value of q might finally be used in the estimations (Holden and Perman, 1994).
(15.30)
and
(15.31)
where m = number of equations, n = common sample size, q = lag length, and W is the esti-
mated residual covariance matrix Ω evaluated for VAR(q). The VAR order q is selected for
the corresponding minimum value of the criterion.
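As a sketch, the lag-order selection described above can be reproduced with statsmodels; the file name, column names and maximum lag below are assumptions for illustration.

```python
import pandas as pd
from statsmodels.tsa.api import VAR

# Hypothetical data set with the two series of interest
df = pd.read_csv("consumption_income.csv", index_col=0)[["C", "Y"]]

model = VAR(df)
selection = model.select_order(maxlags=3)   # reports AIC, BIC (SC), HQIC and FPE per lag
print(selection.summary())

results = model.fit(1)                      # estimate the chosen VAR(1) by OLS, equation by equation
print(results.summary())
```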
Example 15.3 Estimating a VAR model for the private consumption and personal
disposable income for an EU member (VAR)
Suppose that we want to estimate a VAR model for the variables of private consump-
tion (Ct) and personal disposable income (Yt) of an EU member, as discussed in the
previous examples.
The first thing that we have to do is to specify the VAR order. Because the data are annual, it seems unlikely that the lag length will be more than k = 3. Therefore, keeping the values of the variables for the first three years as pre-sample values, Table 15.2 presents the statistics from the estimation of the model for lag lengths ranging from k = 1 to k = 3.
Table 15.2 Statistics for the consumption-income VAR model for an EU member
For the LR test, v = 4, because in going down from one lag length to the immediately lower lag length we exclude one lag on each of the two variables in each of the two equations. For a 0.05 significance level, the critical value of the χ2(4) distribution is 9.4877. Because, going down from q = 3 to q = 1, none of the LR values in Table 15.2 is greater than χ2(4) = 9.4877, none of the null hypotheses in (15.29) is rejected. Therefore, this test indicates that the proper VAR order of this model is q = 1. The same VAR order, q = 1, is indicated by the other two statistics, AIC and SC, because these statistics take their minimum value for q = 1.
Estimates of the VAR(1) model are given below:
(15.32)
and
(15.33)
(15.34)
where, for simplicity, the intercept has been excluded. Assume also that all its m variables
are either simultaneously integrated of order one, or of order zero.
Model (15.34) can be rewritten as
(15.35)
where
(15.36)
and
(15.37)
Model (15.35) looks like an error correction model, and if all its m variables are integrated
of order one, then variables ΔYt−j are stationary. This model can consistently be estimated
under the assumption that its variables are cointegrated, so BYt−1 is also stationary.
It can be proved that (see Engle and Granger (1987) and Johansen (1988) for the original works, or Enders (2010) and Charemza and Deadman (1997) for good presentations):
1 If the rank of matrix B is zero, then all the elements in this matrix are zero. Therefore,
in (15.35) the error correction mechanism BYt−1 does not exist, meaning that there is no
long-run equilibrium relationship between the variables of the model. Thus, these vari-
ables are not cointegrated. The VAR model could be formulated in terms of the first
differences of the variables.
2 If the rank of matrix B is equal to m, i.e. its rows are linearly independent, the vector
process {Yt} is stationary, meaning that all variables are integrated of order zero, and,
therefore, the question of cointegration does not arise. The VAR model could be formu-
lated in terms of the levels of all variables.
3 If the rank of matrix B is r, where r < m, i.e. its rows are not linearly independent, it can
be shown that this matrix can be written as:
(15.38)
where D and C are matrices of m×r dimensions. Matrix C is called the cointegrating matrix,
and matrix D is called the adjustment matrix. In the case where Yt∼I(1) then C’Yt∼I(0), i.e.
the variables in Yt are cointegrated. The cointegrating vectors are the corresponding columns
in C, say c1, c2,. . ., cr. In other words, the rank r of matrix B defines the number of cointe-
grating vectors, i.e. the cointegrating rank. The VAR model could be formulated in terms of
a vector error correction (VEC) model.
The three findings above constitute the generalisation of the Granger representation
theorem. The work of Johansen (1988), and, similarly, of Stock and Watson (1988), was to
identify the cointegrating rank r and to provide estimates of the cointegrating and adjustment
matrices, using the maximum likelihood method. The steps of the Johansen approach may
be formulated as follows (Dickey et al. (1994), Charemza and Deadman (1997)):
Step 1 Using unit roots tests, say ADF, find the order of integration of the variables involved,
say m in number.
Step 2 Using the variables in level terms formulate a VAR model and select the VAR order,
say k, by using LR, AIC, SC, or other tests.
Step 3 Regress ΔYt on ΔYt−1, ΔYt−2,. . ., ΔYt−k+1 and save the residuals. From these residuals,
construct the m×1 vector R0t taking the tth element from the saved residuals from each one
of the assumed regressions of the m variables.
Step 4 Regress Yt−k on ΔYt−1, ΔYt−2,. . ., ΔYt−k+1 and save the residuals. From these residuals,
construct the m×1 vector Rkt, taking the tth element from the saved residuals from each one
of the assumed regressions of the m variables.
Step 5 If n is the sample size, compute the residual product moment matrices using the following formula:
(15.39)
Step 6 Find the squared canonical correlations which correspond to the ordered character-
istic roots of the matrix:
(15.40)
(15.41)
Step 7 Recall that, if rank(B) = 0, the variables are not cointegrated, if rank(B) = m, the vari-
ables are stationary, and if rank(B) = r, where 0 < r < m, the variables are cointegrated.
Furthermore, it is known that the rank of matrix B is equal to the number of the characteristic
roots that are significantly different from zero. Therefore, the exercise of finding the rank of matrix B reduces to testing the significance of the estimated characteristic roots μ̂1 > μ̂2 > μ̂3 > . . . > μ̂m, or of the insignificance of the deviation of 1 − μ̂j (for j = 1, 2, . . ., m) from unity. The test is based
on the following two likelihood ratio (LR) statistics:
(15.42)
and/or
(15.43)
For the statistic (15.42) the hypotheses to be tested are in the following sequence:
(15.44)
For the statistic (15.43) the hypotheses to be tested are in the following sequence:
(15.45)
Critical values for these statistics can be found in Johansen (1988), Johansen and Juselius
(1990), Osterwald-Lenum (1992) and in Enders (2010). Table 15.3 presents critical values
Table 15.3 Critical values of the λmax and λtrace statistics
m-r λmax
1 1.699 2.816 3.962 5.332 6.936
2 10.125 12.099 14.036 15.810 17.936
3 16.324 18.697 20.778 23.002 25.521
4 22.113 24.712 27.169 29.335 31.943
5 27.889 30.774 33.178 35.546 38.341
λtrace
1 1.699 2.816 3.962 5.332 6.936
m-r λmax
1 4.905 6.691 8.083 9.658 11.576
2 10.666 12.783 14.595 16.403 18.782
3 16.521 18.959 21.279 23.362 26.154
4 22.341 24.917 27.341 29.599 32.616
5 27.953 30.818 33.262 35.700 38.858
λtrace
1 4.905 6.691 8.083 9.658 11.576
2 13.038 15.583 17.844 19.611 21.962
3 25.445 28.436 31.256 34.062 37.291
4 41.623 45.248 48.419 51.801 55.551
5 61.566 65.956 69.977 73.031 77.911
m-r λmax
1 5.877 7.563 9.094 10.709 12.740
2 11.628 13.781 15.752 17.622 19.834
3 17.474 19.796 21.894 23.836 26.409
4 22.938 25.611 28.167 30.262 33.121
5 28.643 31.592 34.397 36.625 39.672
λtrace
1 5.877 7.563 9.094 10.709 12.741
2 15.359 17.957 20.168 22.202 24.988
3 28.768 32.093 35.068 37.603 40.198
4 45.635 49.925 53.347 56.449 60.054
5 66.624 71.472 75.328 78.857 82.969
reproduced from Enders (2010) for various specifications of the VAR model and of the cointegrating vector. However, in both the trace and max tests, (15.44) and (15.45), the testing of the hypotheses stops when, going from top to bottom, we encounter the first non-significant result. In this case, the rank r of matrix B is that indicated by the corresponding null hypothesis.
Step 8 To each of the characteristic roots there corresponds an eigenvector, say v1, v2,. . ., vm,
which can constitute the eigenmatrix V = [v1 v2 . . . vm]. These eigenvectors can be normalised
by using V′SkkV = I. If in step 7 it has been found that r is the rank of matrix B, then the first r
eigenvectors in V are the r cointegrating vectors that constitute the cointegrating matrix C=[v1 v2
. . . vr]. The adjustment matrix is found by D = S0kC. These are the ML estimators of C and D.
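In practice the whole sequence of steps is automated; a sketch using statsmodels' coint_johansen is given below. The array name, the deterministic-term code and the number of lagged differences are assumptions, not the book's settings.

```python
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# levels is assumed to be a T x m array (or DataFrame) of the level series, e.g. c_t, y_t, z_t
# det_order: -1 none, 0 constant, 1 linear trend; k_ar_diff = VAR order minus one
res = coint_johansen(levels, det_order=0, k_ar_diff=1)

print("lambda_trace:", res.lr1)        # trace statistics for H0: r = 0, r <= 1, ...
print("critical values:", res.cvt)     # 90%, 95%, 99% columns
print("lambda_max:", res.lr2)          # maximum-eigenvalue statistics
print("critical values:", res.cvm)
print("eigenvectors:", res.evec)       # columns contain the (unnormalised) cointegrating vectors
```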
Example 15.4 Testing cointegration between the log of private consumption, the
log of personal disposable income and inflation rate for an EU member (The
Johansen approach)
Suppose that we want to test cointegration between the variables of the log of private
consumption (ct = logCt), the log of personal disposable income (yt = logYt) and infla-
tion (zt = logPt − logPt−1) of an EU member, for which the original values are presented
in Chapter 12, Table 12.1.
Following the steps presented above, and using for comparisons the common
sample 1964–1995, treating, thus, the initial values of the variables as pre-sample
values, we get:
Step 1 Using the ADF test we found that all m = 3 variables are integrated of order one, i.e. ct ∼ I(1), yt ∼ I(1) and zt ∼ I(1).
Step 2 Using the variables in level terms, i.e. as ct, yt and zt, we have formulated a VAR
model and selected the VAR order, say k, by using the LR, AIC, and SC statistics,
presented in Table 15.4.
For the LR test, the degrees of freedom are v = 9, because in going down from one lag length to the immediately lower lag length we exclude one lag on each of the three variables in each of the three equations. For a 0.05 significance level, the critical value of the χ2(9) distribution is 16.919. Because, going down from q = 3 to q = 1, none of the LR values in Table 15.4 is greater than χ2(9) = 16.919, none of the null hypotheses in (15.29) is rejected. Therefore, this test indicates that the proper VAR order of this model is q = 1. However, the other two statistics, AIC and SC, indicate an order of q = 2, because these statistics take their minimum value for q = 2.
Steps 3–8 There is no need to go through, one by one, all the other steps presented
above. Econometric packages like Microfit and EViews include the Johansen test as a
standard procedure. In what follows we made use of EViews.
Table 15.4 Statistics for the consumption-income VAR model for an EU member
Assuming that the correct VAR order is k = 2, according to AIC and SC, and assuming further that the time series have means and linear trends, but the cointegrating equations have only intercepts, Table 15.5 presents statistics of the Johansen tests based on the λtrace LR statistic (15.42).
On the basis of the statistics in Table 15.5, or taking into account the critical values reported in Table 15.3, and at the 5% significance level, we see that there is at most one hypothesised cointegrating equation: according to the hypothesis testing in (15.44), the LR statistic cannot reject H0: r ≤ 1, because this statistic is less than the corresponding critical value.
The single cointegrating vector, in normalised form, has been estimated with EViews to be [1.000000 −0.802239 0.786507 −2.404530] and the corresponding long-run relationship is given by:
(15.46)
Let us assume now that the correct VAR order is k = 1, according to the LR test, and let us also consider, as before, that the time series have means and linear trends, but the cointegrating equations have only intercepts. Table 15.6 presents statistics of the Johansen tests based on the λtrace LR statistic (15.42).
On the basis of the statistics in Table 15.6, and for the 5% significance level, we see that there are at most two hypothesised cointegrating equations: according to the hypothesis testing in (15.44), the LR statistic cannot reject H0: r ≤ 2, because this statistic is less than the corresponding critical value.
The two cointegrating vectors have been estimated with EViews in normalised form; the first is [1.000000 0.00000 1.869601 −12.74157], with the corresponding long-run relationship:
(15.47)
Table 15.5 Test statistics for cointegration of ct, yt, and zt. VAR(k = 2) and n = 32 (linear deterministic trend in the data)
Table 15.6 Test statistics for cointegration of ct, yt, and zt. VAR(k = 1) and n = 32 (linear deterministic trend in the data)
(15.48)
Comparing the two results above for VAR(2) and VAR(1), we see that our conclusions changed completely with respect to the number of long-run relationships that exist between these three variables. Furthermore, if we change the assumptions referring to the data generating process (DGP), i.e. assuming linear trends, or not, in the data and/or intercepts or trends in the cointegrating equations, we may obtain altogether
different results. Table 15.7 presents the conclusions reached with the Johansen
approach assuming different initial conditions for testing cointegration between these
three variables.
The contradictory results presented in Table 15.7 about the cointegration between the three variables, ct, yt and zt, show that, although cointegration tests are very valuable in distinguishing between spurious and meaningful regressions, we should not rely on cointegration methods alone. We should use economic theory, and all the a priori knowledge associated with this theory, in order to decide the number and the form of the cointegrating regressions. From the three long-run equilibrium relationships estimated in (15.46), (15.47) and (15.48), it seems that the cointegrating equation (15.46) is very reasonable, taking into account that the long-run income elasticity of consumption is 0.80, a highly acceptable value, and that the coefficient of the inflation variable is negative, showing the negative effect of rising inflation on consumption.
Finally, for the case of VAR(2) and one cointegrating regression, the estimated first
equation of the vector error correction model is shown below (the other two equations
or the other estimated terms are not shown here):
(15.49)
Table 15.7 Number of cointegrating regressions by VAR order and DGP assumptions
VAR(1) 2 2 2 2
VAR(2) 2 1 1 1
Granger causality testing provides a statistical method for determining the direction of the causation between variables, and it may therefore be used in cointegration analysis when there is a lack of a clear theoretical framework concerning the variables under investigation. When in a regression equation we say that the 'explanatory' variable Xt affects the 'dependent' variable Yt, we indirectly accept that variable Xt causes variable Yt, in the sense that changes in variable Xt induce changes in variable Yt. This is, in simple terms, the concept of causality. With respect to the direction of causality, we can distinguish the following cases:
1 Unidirectional causality This is the case when Xt causes Yt, but Yt does not cause Xt.
2 Bilateral or feedback causality This is the case when variables Xt and Yt are jointly
determined.
Because, in most cases, in the absence of theoretical models, the direction of causality is not
known, various tests have been suggested to identify this direction. The most well-known
test is the one proposed by Granger (1969). This test, being based on the premise that 'the future cannot cause the present or the past', utilises the concept of VAR models. Let us therefore consider the two-variable, Xt and Yt, VAR(k) model:
(15.50)
(15.51)
1 If {α11, α12, . . ., α1k} ≠ 0 and {β21, β22, . . ., β2k} = 0, there exists a unidirectional
causality from Xt to Yt, denoted as X → Y.
2 If {α11, α12, . . ., α1k} = 0 and {β21,β22, . . ., β2k} ≠ 0, there exists a unidirectional causality
from Yt to Xt, denoted as Y→X.
3 If {α11, α12, . . ., α1k} ≠ 0, and {β21, β22, . . ., β2k} ≠ 0, there exists a bilateral causality
between Yt and Xt, denoted as X⇔Y.
In order to test the hypotheses referring to the significance or not of the sets of the coeffi-
cients of the VAR model of Equations (15.50) and (15.51), the usual Wald F-statistic could
be utilised, which is the following:
(15.52)
where: SSRu = sum of squared residuals from the complete equation (unrestricted)
SSRr = sum of squared residuals from the equation under the assumption that a set of
variables is redundant (restricted)
The hypotheses in this test may be formed as follows:
(15.53)
and
(15.54)
It has to be noted here that hypotheses (15.53) and (15.54) do not test whether 'X causes Y', but rather whether 'X causes Y in the Granger sense'. This is because the Granger test is just a statistical test, based not on a specific theory of causation but on the ability of the equation to better predict the dependent variable. Furthermore, the validity of the test depends on the order of the VAR model and on whether or not the variables are stationary. The validity of the test is reduced if the variables involved are non-stationary (Geweke, 1984). Finally, other tests have been proposed, such as those of Sims (1972) and Geweke et al. (1983), for example, and, moreover, Granger (1988) extended his test to also consider the concept of cointegration.
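A sketch of the Granger test using statsmodels; the column names and lag order are assumptions. Note that grangercausalitytests asks whether the second column Granger-causes the first.

```python
from statsmodels.tsa.stattools import grangercausalitytests

# df is assumed to hold the two series as columns "y" and "x"
grangercausalitytests(df[["y", "x"]].dropna(), maxlag=2)   # does x Granger-cause y?
grangercausalitytests(df[["x", "y"]].dropna(), maxlag=2)   # does y Granger-cause x?
# Each call reports, for every lag, the Wald F-test of (15.52) together with chi-square variants.
```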
Example 15.6 Testing Granger causality between the log of private consumption
and inflation rate for an EU member (the Granger test)
Consider the following two assumptions: (1) demand (consumption) depends on the level of prices (inflation rate), and (2) the level of prices (inflation rate) depends on demand (consumption), because it is formed by a 'pull-type' (demand-pull) inflation mechanism. Either one, or both, of these assumptions may be correct. We will use the Granger causality test between the variables of the log of private consumption (ct = logCt) and the inflation rate (zt = logPt − logPt−1) of an EU member, for which the original values are presented in Table 12.1. Let us consider the case of a VAR(2) model, because our data are annual.
In Table 15.8 we present the calculated Fc statistics according to formula (15.52) for
all the possible cases of redundant variables in the two equations of the VAR(2) model.
Dependent variable ct: {α11, α12} = 0 vs. {α11, α12} ≠ 0: Fc = 3.9359 (0.0316); {β11, β12} = 0 vs. {β11, β12} ≠ 0: Fc = 2767.940 (0.0000)
Dependent variable zt: {α21, α22} = 0 vs. {α21, α22} ≠ 0: Fc = 12.5418 (0.0001); {β21, β22} = 0 vs. {β21, β22} ≠ 0: Fc = 5.6961 (0.0086)
From the results in Table 15.8 we are interested in those referring to the testing of the sets of coefficients {α11, α12} and {β21, β22}. We see that in both cases the F-statistics are significant (significance levels less than 0.05), meaning that {α11, α12} ≠ 0 and {β21, β22} ≠ 0, i.e. that z Granger-causes c and c Granger-causes z, respectively. In other words, the Granger causation between consumption and inflation in this EU member is of the bilateral type, i.e. c ⇔ z, something which was possibly expected. Note that in a cointegration analysis this type of information is useful, particularly when there is no theoretical framework to guide the specification of potential cointegrating regressions and subsequent normalisation procedures.
• Analysis: in contrast to the trace test, the max test has a specific form and is easier to carry out. Using this test, we test r = 0 against r = 1 (r is the number of cointegrating relationships). The calculated value of λmax(0,1) is 30.09, and the critical value of the test at 5% is reported to be 28.14; hence it is possible to reject r = 0 in favour of r = 1: it seems that there is at least one cointegrating relationship. However, it is advisable to test r = 1 against r = 2, using λmax(1,2), before concluding cointegration. The calculated value of λmax(1,2) is 10.36 and the critical value at 5%, for (n − r = 5 − 2 = 3), is 22.0; therefore we cannot reject H0. We conclude that there is potentially one cointegrating relationship between the five variables. The trace test is usually consistent with the max test, but the max test is more specific and easier to apply.
• Once a unique cointegrating vector/relationship is found, the ECM regression can be carried out to estimate the short-run adjustment via an ECM process.
Review questions
1 Explain the shortcomings of the Engle-Granger methodology when dealing with cointe-
gration analysis of a multivariate case.
2 Explain the Granger causality test and discuss why this test might be useful in cointegra-
tion analysis.
3 Explain the type of cointegration analysis to be employed when dealing with several
variables. How would you use knowledge of theory to help with the specification and
analysis?
4 Collect a time series data set (quarterly) for the period 1985–2010, for a country of your
own choice, consisting of the following variables:
• Gross domestic product (GDP)
• sum of public, G, and private consumption expenditure, C: CG
• investment expenditure (gross capital formation): INV
• exports of goods and services: E
• aggregate imports: M
• relative price of imports to the price of domestically produced goods: Pm/Pd.
Use a relevant economic theory to explain how M, imports, might be determined by the
above variables (for example, see Abbott and Seddighi, 1996). Based on theory, explain
how you would define your variables for the purpose of a regression analysis. Consider
the theoretical model:
Abbott, A. and Seddighi, H.R. (1996). ‘Aggregate imports and expenditure components in the U.K.’,
Applied Economics, September 1996, 28, 98–110.
Bartlett, M. S. (1946). ‘On the theoretical specification of sampling properties of autocorrelated time
series’, Journal of the Royal Statistical Society, Series B, 27, 27–41.
Box, G. E. P. and Pierce, D. A. (1970). 'Distribution of residual autocorrelations in autoregressive
integrated moving average time series models’, Journal of the American Statistical Association, 65,
1509–1525.
Campbell, J. Y. and Perron, P. (1991). 'Pitfalls and opportunities: What macroeconomists should know about unit roots', Technical Working Paper 100, NBER Working Papers Series.
Charemza, W. W. and Deadman, D. F. (1997). New Directions in Econometric Practice: General to Specific Modelling, Cointegration and Vector Autoregression, 2nd edn, Cheltenham: Edward Elgar.
Cuthbertson, K., Hall, S. G., and Taylor, M. P. (1992). Applied Econometric Techniques, New York:
Philip Allan.
Davidson, R. and MacKinnon, J. G. (1993). Estimation and Inference in Econometrics, New York:
Oxford University Press.
Dickey, D. A. and Fuller, W. A. (1979). ‘Distributions of the estimators for autoregressive time series
with a unit root’, Journal of the American Statistical Association, 74, 427–431.
Dickey, D. A. and Fuller, W. A. (1981). ‘Likelihood ratio statistics for autoregressive time series with
a unit root’, Econometrica, 49, 1057–1072.
Dickey, D. A. and Pantula, S. (1987). ‘Determining the order of differencing in autoregressive
processes’, Journal of Business and Economic Statistics, 15, 455–461.
Dickey, D. A., Bell, W., and Miller, R. (1986). ‘Unit roots in time series models: Tests and implica-
tions’, American Statistician, 40, 12–26.
Dickey, D. A., Hasza, D. P., and Fuller, W. A. (1984). ‘Testing for unit roots in seasonal time series’,
Journal of the American Statistical Association, 79, 355–367.
Dickey, D. A., Jansen, D. W., and Thornton, D. L. (1994). ‘A primer on cointegration with an applica-
tion to money and income’, in B. B. Rao, (ed), Cointegration for the Applied Economist, New York:
St. Martin’s Press.
Doldado, J., Jenkinson, T., and Sosvilla-Rivero, S. (1990). ‘Cointegration and unit roots’, Journal of
Econometric Surveys, 4: 249–273.
Enders, W. (2010). Applied Econometric Time Series, 3rd edn, New York: John Wiley.
Engle, R. F. and Granger, C. W. J. (1987). ‘Cointegration and error correction: Representation, estima-
tion and testing’, Econometrica, 55, 251–276.
Engle, R. F. and Yoo, B. S. (1987). ‘Forecasting and testing cointegrated systems’, Journal of
Econometrics, 35, 145–159.
Engle, R. F., Granger, C. W. J., and Hallman, J. J. (1989). ‘Merging short- and long-run forecasts: an
application of seasonal cointegration to monthly electricity sales forecasting’, Journal of
Econometrics, 40, 45–62.
Fuller, W. (1976). Introduction to Statistical Time Series, New York: John Wiley.
Geweke, J. (1984). ‘Inference and causality in economic time series models’, in Z. Griliches and M. D.
Intriligator (eds), Handbook of Econometrics, vol. 2, Amsterdam: North Holland.
Geweke, J., Meese, R. and Dent, W. (1983). ‘Comparing alternative tests of causality in temporal
systems’, Journal of Econometrics, 77, 161–194.
Granger, C. W. J. (1969). 'Investigating causal relations by econometric models and cross-spectral methods', Econometrica, 37, 424–438.
Granger, C. W. J. (1986). ‘Developments in the study of cointegrated economic variables’, Oxford
Bulletin of Economics and Statistics, 48, 213–228.
Granger, C. W. J. (1988). ‘Some recent developments in a concept of causality’, Journal of
Econometrics, 39, 199–221.
Granger, C. W. J. and Newbold, P. (1974). 'Spurious regressions in econometrics', Journal of Econometrics, 2, 111–120.
Greene, W. H. (1999). Econometric Analysis, 4th edn, New Jersey: Prentice-Hall.
Gujarati, D. N. (1995). Basic Econometrics, 3rd edn, New York: McGraw-Hill.
Holden, D. and Perman, R. (1994). ‘Unit roots and cointegration for the economist’ in B. B. Rao (ed),
Cointegration for the Applied Economist, New York: St. Martin’s Press.
Hylleberg, S., Engle, R. F., Granger, C. W. J., and Yoo, B. S. (1990). ‘Seasonal integration and coin-
tegration’, Journal of Econometrics, 44, 215–238.
Im, K., Pesaran, M. H., and Shin, Y. (2003). ‘Testing for unit roots in heterogeneous panels’, Journal
of Econometrics, 115, 29–52.
Jenkins, G. M. and Watts, D. G. (1968). Spectral Analysis and Its Applications, San Francisco:
Holden-Day.
Johansen, S. (1988). ‘Statistical analysis of cointegration vectors’, Journal of Economic Dynamics and
Control, 12, 213–254.
Johansen, S. and Juselius, K. (1990). 'Maximum likelihood estimation and inference on cointegration – with applications to the demand for money', Oxford Bulletin of Economics and Statistics, 52, 169–210.
Judge, G. G., Hill, R. C., Griffiths, W. E., Lutkepohl, H. and Lee, T. C. (1988). Introduction to Theory
and Practice of Econometrics, 2nd edn, New York: John Wiley.
Ljung, G. M. and Box, G. E. P. (1978). 'On a measure of lack of fit in time series models', Biometrika, 65, 297–303.
MacKinnon, J. G. (1991). ‘Critical values of cointegration tests’, in R. F. Engle and C. W. J. Granger
(eds), Long-Run Econometric Relationships: Readings in Cointegration, New York: Oxford
University Press.
Maddala, G. S. (1992). Introduction to Econometrics, 2nd edn, New Jersey: Prentice-Hall.
Nelson, C. R. and Plosser, C. I. (1982). ‘Trends and random walks in macroeconomic time series’,
Journal of Monetary Economics, 10, 139–162.
Osborn, D. R., Chui, A. P. L., Smith, J. P. and Birchenhall, C. R. (1988). ‘Seasonality and the order of
integration in consumption’, Oxford Bulletin of Economics and Statistics, 50, 361–377.
Osterwald-Lenum, M. (1992). 'A note with quantiles of the asymptotic distribution of the maximum likelihood cointegration rank test statistics', Oxford Bulletin of Economics and Statistics, 54, 461–472.
Phillips, P. C. B. and Loretan, M. (1991). 'Estimating long-run economic equilibria', Review of Economic Studies, 58, 407–436.
Phillips, P. C. B. and Perron, P. (1988). 'Testing for a unit root in time series regression', Biometrika, 75, 335–346.
Sargan, J. D. (1964). ‘Wages and prices in the United Kingdom: A study of econometric method-
ology’, in P. E. Hart, G. Mills and J. K. Whitaker (eds), Econometric Analysis for National Economic
Planning, London: Butterworths.
Sargan, J. D. and Bhargava, A. (1983). ‘Testing residuals from least squares regression for being
generated by the Gaussian random walk’, Econometrica, 51, 153–174.
Schmidt, P. and Phillips, P. (1992). ‘LM tests for a unit root in the presence of deterministic trends’,
Oxford Bulletin of Economics and Statistics, 54, 257–87.
Seddighi, H. R. and Shearing, D. (1998). ‘The demand for tourism in the North East of England with
special reference to Northumbria: an empirical study’, Tourism Management, 18(8), December
1997, 499–511.
Sims, C. (1972). ‘Money, income, and causality’, American Economic Review, 62, 540–552.
Stock, J. H. (1987). ‘Asymptotic properties of least squares estimators of cointegrating vectors’,
Econometrica, 55, 1035–1056.
Stock, J. H. and Watson, M. (1988). ‘Testing for common trends’, Journal of the American Statistical
Association, 83, 1097–1107.
Thomas, R. L. (1997). Modern Econometrics: an introduction, Harlow: Addison-Wesley.
Unit 5
Aspects of financial time
series econometrics
• Time series data on financial variables, such as stock/share prices, asset prices and
exchange rates, are known as financial time series. Data sets on financial time series are
collected by financial institutions, for example, the London Stock Exchange and
Financial Times, and are normally available online.
• Two key characteristics of financial time series are: (a) high frequency of the data and
(b) volatility of the data. High frequency of data reflects the way financial markets
operate in practice. In these markets prices change rapidly due to continuous changes in
supply and demand for financial assets, in response to news and changes in market
conditions. Data sets are available on an hourly and daily basis, in contrast to low-
frequency economic data.
• Volatility of financial data reflects uncertainty about the market valuation of financial
assets and corresponding continuous adjustments in financial prices.
• Volatility in the time series gives rise to time-varying variances and covariances of the time series. These key characteristics are inherently unstable over time, giving rise to non-stationary financial time series. They are not a sign of misspecification; rather, they are inherent characteristics of financial time series.
This unit focuses on modelling volatility and correlation of financial time series. This is an
important aspect of econometric analysis of financial data and it needs to be tackled first as
part of the modelling exercise, before the start of estimation and regression analysis. The
unit will also explain in detail the relevant estimation and regression procedures of financial
time series.
16 Modelling volatility and
correlations in financial
time series
Dennis Philip
Durham Business School
INTRODUCTION
Econometric modelling of time series, discussed in the previous chapters, was mainly centred on modelling the conditional first moment (or conditional mean). Temporal dependencies in the variances and covariances were considered to be model misspecifications. To this end, researchers developed ways of correcting such misspecifications, and time series were studied in the context of homoscedasticity. However, developments in financial econometrics noted that most financial time series show time-varying variances and covariances, which capture the risk or uncertainty element in financial assets. This meant that heteroscedasticity was not to be considered as a model misspecification but as an important feature of financial time series that should be modelled rather than corrected.
In this chapter we will discuss the important methodologies commonly employed to
explain the dynamics in the variances and covariances of financial assets. In this, we will
review the popular frameworks introduced for modelling the volatility and correlations in
financial time series. The interest in modelling these higher moments stems from the fact that
such estimates have been used as inputs to several financial applications in asset pricing and
for risk management purposes.
This chapter begins by defining volatility and its features. We then introduce the
parametric estimators of volatility that have been developed in the literature, such as
exponentially weighted moving average (EWMA), and autoregressive conditional heterosce-
dastic (ARCH) type models. Then we introduce non-parametric estimators of volatility, such
as range estimators and realised volatility that have been widely used in recent years due to
the availability of high-frequency financial data. Finally, we outline the multivariate specifi-
cations developed for modelling conditional volatility and conditional correlations.
Key topics
• Volatility features in financial time series
• (G)ARCH models
• Asymmetric volatility models
• Non-parametric estimators of volatility
• Multivariate volatility and correlation models
16.1 Defining volatility and its features
Volatility can be defined as the variation in returns observed over a unit of time. Volatility, if assumed to be constant, can be measured by the sample standard deviation when the returns are normally distributed. Suppose returns rt ~ N(μ, σ2). The sample standard deviation is given by:
(16.1)
where μ is the average return over the T-day period. If we consider T = 252 trading days per year, σ̂ measures volatility per year. And, if we assume that the returns are uncorrelated over time:
(16.2)
This shows that if returns are uncorrelated, we can approximate uncertainty for several hori-
zons as volatility is a function of time. For example, if volatility is 25% per year, then weekly
volatility is approximately 3.47%.
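A quick arithmetic check of the square-root-of-time scaling used in this example (no assumptions beyond numpy):

```python
import numpy as np

annual_vol = 0.25                          # 25% per year
weekly_vol = annual_vol / np.sqrt(52)      # approx. 0.0347, i.e. about 3.47% per week
daily_vol = annual_vol / np.sqrt(252)      # assuming 252 trading days per year
print(round(weekly_vol, 4), round(daily_vol, 4))
```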
In the above case, since the second moment of a normally distributed random variable
fully summarises the dispersion characteristics of the variable, standard deviation is a good
measure of volatility. However, since returns distributions of financial time series commonly
show fat tails and are non-normal, standard deviations are no longer a good measure of vola-
tility. For skewed financial data, a better measure of volatility could be the interquartile range.
However, it is typically the case that the volatility we observe in returns is not constant
but time-varying. Figure 16.1 plots daily log-returns of the FTSE 100 from Jan 1, 1990 to
Dec 31, 2010.
Figure 16.1 Daily log-returns of FTSE 100 (DL_FTSE100) from Jan 1, 1990 to Dec 31, 2010.
Figure 16.2 Squared daily log-returns of FTSE 100 (DL_FTSE100_SQUARED) from Jan 1, 1990 to Dec 31, 2010.
We can see that volatility is certainly not constant. There are periods of high volatility, such as during the 2007–08 financial crisis, and periods of low volatility, such as the mid-1990s. We also see that volatility tends to cluster: large (small) movements in one period tend to be followed by large (small) movements. This can be seen graphically in Figure 16.2, where we plot the squared daily returns for the FTSE 100. Volatility tends to persist. The effect of shocks on stock returns tends to extend over several subsequent periods, and such a feature of volatility has been referred to as long memory. Studying the autocorrelation functions of squared returns would show significant correlations over extended lag lengths.
A further feature of volatility documented in the literature is known as leverage effects.
This feature refers to the asymmetric relationship between volatility and news. Bad news in
one period tends to increase subsequent volatility to a much greater extent than good news
of the same magnitude. Here, news is often proxied by unexpected returns and volatility by
daily squared returns (see Black (1976), Engle and Ng (1993), among others).
(16.3)
(16.4)
where α0 captures the long-run average variance, and αi ≥ 0 for i = 0,1,. . ., q in order for the
variance equation to be positive. In this specification, the shocks up to q periods ago affect
the current volatility of the process.
A simple case of this model is the ARCH(1) model:
(16.5)
where we assume that last period's shock affects the variance of this period. The size of the impact of the shock depends on the magnitude of α1. The estimated coefficients have to be positive, as σ2t cannot be negative.
(16.6)
We square the estimated residuals ε̂t from the above regression and run an auxiliary
regression:
(16.7)
(16.8)
against the alternative hypothesis that at least one of the coefficients is non-zero. The q lags in the auxiliary regression test for the presence of an ARCH(q) variance process. The test statistic is based on the R2 (the coefficient of determination) from the auxiliary regression, and is chi-squared distributed:
(16.9)
where T is the number of observations. If the test statistic exceeds the critical value from the chi-squared distribution, we reject the null and conclude that ARCH effects are present in the returns. The test can also be written in terms of an F-statistic, which is usually reported alongside the LM statistic in most econometric software packages. Engle's ARCH LM test validates the use of ARCH-type models for modelling the variance process of returns.
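A sketch of Engle's ARCH LM test using statsmodels; the return series name and the choice of q = 5 lags are assumptions.

```python
from statsmodels.stats.diagnostic import het_arch

# returns is assumed to be a 1-d array or Series of returns; using deviations from the mean
resid = returns - returns.mean()
lm_stat, lm_pval, f_stat, f_pval = het_arch(resid, 5)   # auxiliary regression with 5 lags
print(f"LM = {lm_stat:.3f} (p = {lm_pval:.4f}); F = {f_stat:.3f} (p = {f_pval:.4f})")
```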
(16.10)
where the current period volatility is explained by the long-run average variance, the past
values of the shocks, and the past history of volatility. A simple case of this model is the
GARCH (1,1) specification:
(16.11)
which has been widely implemented in practice. In this model, the volatility is influenced by the last-period estimate of volatility and the shock observed in the last period. The GARCH process assigns geometrically decaying weights to past shocks. In order to illustrate this, consider the GARCH(1,1) process in Equation 16.11.
If we substitute for the conditional variance equation at t − 1 (σ2t−1) in Equation 16.11:
(16.12)
(16.13)
From the above we can see that the weight applied to ε2t−i is α1·β1^(i−1), so the weights decline at rate β1. If we further consider an infinite number of successive substitutions for the conditional variance equations (as above), we can see that a GARCH(1,1) specification encapsulates an infinite-order ARCH specification with coefficient weights declining geometrically.
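A small numerical sketch of this geometric decay; the parameter values are illustrative assumptions, not estimates from the chapter's data.

```python
import numpy as np

alpha0, alpha1, beta1 = 1e-6, 0.09, 0.90    # illustrative GARCH(1,1) parameters

# Weight on the squared shock i periods ago implied by repeated substitution: alpha1 * beta1**(i-1)
weights = alpha1 * beta1 ** np.arange(0, 10)
print(np.round(weights, 4))                 # declines geometrically at rate beta1

print(alpha0 / (1 - alpha1 - beta1))        # implied long-run (unconditional) variance
```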
(16.14)
(16.15)
(16.16)
RiskMetrics uses EWMA model estimates for volatility with λ = 0.94. For λ = 0.94, we see:
(16.17)
(16.18)
where, for a large T, the second term λ^T·σ2t−T → 0 and we can see that the weight on ε2t−i (for i = 1, . . ., T) declines at rate λ.
GARCH models are similar to the EWMA model in assigning exponentially declining weights to past observations. However, unlike EWMA, GARCH models also assign some weight to the long-run average variance rate. In Equation 16.11, when the intercept parameter α0 = 0 and α1 + β1 = 1, the GARCH(1,1) model reduces to an EWMA model.
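A minimal sketch of the RiskMetrics-style EWMA recursion with λ = 0.94; initialising with the sample variance is an assumption made here for illustration.

```python
import numpy as np

def ewma_variance(returns, lam=0.94):
    """sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}**2 (RiskMetrics-style recursion)."""
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()                     # initialisation (an assumption)
    for t in range(1, len(r)):
        sigma2[t] = lam * sigma2[t - 1] + (1 - lam) * r[t - 1] ** 2
    return sigma2
```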
The GJR model captures asymmetries in the conditional variances by distinguishing the sign of the shock. We separate the positive and negative shocks in the data and allow for different coefficients in a GARCH framework. A GJR(p,q) model is given by:
(16.19)
where
(16.20)
The coefficients δj (for j = 1, . . ., q) capture the additional volatility components of negative shocks. If δj = 0 for all j = 1, . . ., q, then this reduces to a GARCH(p,q) model.
Consider the simplest case of a GJR(1,1) model:
(16.21)
where α1+ measures the impact of positive shocks and α1+ + δT (= α1−) measures the impact of negative shocks. To test for symmetry, we test H0: δT = 0. Significant leverage effects are observed when δT > 0. For positivity of the conditional variances, we require α0 > 0, α1+ ≥ 0, and α1+ + δT ≥ 0. Hence, δT is allowed to be negative, provided α1+ > |δT|.
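A hedged sketch of estimating a GJR(1,1) with the Python arch package (the o=1 term supplies the asymmetric component); the constant-mean specification and the rescaling of returns are simplifying assumptions relative to the chapter's ARMA(1,1) mean equation.

```python
from arch import arch_model

# r is assumed to be a pandas Series of returns; multiplying by 100 helps the optimiser
am = arch_model(100 * r, mean="Constant", vol="GARCH", p=1, o=1, q=1, dist="normal")
res = am.fit(disp="off")
print(res.summary())     # the gamma[1] coefficient is the extra weight attached to negative shocks
```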
(16.22)
where μ is a constant determined by the assumed error distribution; if the error terms εt/σt ~ N(0,1), it takes a known value. In the case of non-normal errors, we can allow for other fat-tailed distributions, such as Student's t or GED; in such cases, μ will take other forms. The standardised shock term captures the relative size of the shocks
(16.23)
(16.24)
where α̂0 = 0, α̂1 = −0.3, α̂*1 = 0.6, and μ = 0.85. Let us consider the following cases:
Case 1: impact of a positive scaled shock of +1.0 at t − 1
(16.25)
The coefficient ψ1 captures the effect of the sign of the shocks; ψ2 and ψ3 indicate whether the magnitudes of the negative and positive shocks differ, respectively.
GARCH-in-mean or the GARCH-M model introduces conditional variance into the condi-
tional mean equation in order to model the risk-return relationships fundamental to financial
applications. Engle et al. (1987), who initially introduced the ARCH-M framework, modelled
the time-varying risk premia for investors holding long-term bonds. They consider the model:
(16.26)
where εt ~ N(0,σ 2t ), yt is the excess returns on the long-term bond relative to a one-period
treasury bill and μt is the risk premia attached to holding the long-term bond. The risk premia
are modelled as:
(16.27)
(16.28)
β1 captures the compensation (time-varying risk premia) that the investors holding long-term
bonds receive for bearing the risk of interest rate changes.
The rationale behind such models is that investors are risk-averse and therefore expect higher returns for undertaking greater risk from the investment. Therefore, expected returns are a function of risk. In the above model, risk is captured by the conditional standard deviation. Alternatively, other functional forms, such as ln(σ2t) or σ2t, can be used in the specification of the mean equation. Some empirical
studies use lagged conditional volatility variables in order to capture the volatility spillovers
and contagion observed between financial markets. The above model can also be extended
by incorporating other forms of ARCH processes to produce GARCH-M, EGARCH-M,
GJR-M and several other ARCH extensions.
Table 16.2 ARMA(1,1)-GARCH(1,1) model results for FTSE 100 returns: rt = ϕ0 + ϕ1rt−1 + θ1εt−1 + εt, εt ~ iid N(0, σ2t), σ2t = α0 + α1ε2t−1 + β1σ2t−1

Coefficient  Estimate    Std. error  z-statistic  Prob.
Mean equation:
ϕ0   0.000376   6.86E-05   5.4763     0.0000
ϕ1   0.961621   0.00324    297.061    0.0000
θ1   −0.97657   6.34E-05   −15393.3   0.0000
Variance equation:
α0   1.21E-06   1.98E-07   6.0899     0.0000
α1   0.08543    0.00571    14.9719    0.0000
β1   0.905286   0.0062     146.075    0.0000
Information criteria:
Akaike −6.4997
Schwarz −6.49246
Hannan-Quinn −6.49717
ARCH LM test:
ARCH(1) test: F-statistic(1,5475) = 1.487840 [0.2226]
ARCH(5) test: F-statistic(5,5467) = 1.035302 [0.3948]
The AR and MA parameters in the mean equation show strong significance at a 1% level.
While testing for the presence of ARCH effects in residuals, we find that the residuals are
not homoscedastic and indeed show ARCH effects. Hence, we model the variances of the
residuals using (G)ARCH models. We consider estimating the conditional variance using
the popular GARCH(1,1) model. Table 16.2 records the results from an ARMA(1,1)-
GARCH(1,1) model for the FTSE 100 returns.
The results show that the ARMA(1,1)-GARCH(1,1) model fits the data well and the esti-
mated parameters are all significant. The ARCH LM test indicates no ARCH effects in the
residuals. This means that the GARCH(1,1) model has explained all the ARCH effects that
were present in the data. The information criteria suggest that the heteroscedastic ARMA(1,1)-
GARCH(1,1) model is much preferred over the homoscedastic ARMA(1,1) model.
In order to test for any neglected asymmetries in the volatility model, we conduct the three
sign bias tests proposed by Engle and Ng (1993). These tests investigate possible misspecifi-
cations of the conditional variance equation. The volatility model under the null hypothesis
is the GARCH(1,1) specification. Table 16.3 records the asymmetric test results.
Table 16.3 Engle and Ng (1993) asymmetric test results for
FTSE 100 returns
t-test Prob.
The results show an insignificant coefficient from the sign bias t-test. This means that positive and negative shocks (in terms of sign) have a similar impact on next-period volatility and that there is no 'sign bias' in the above GARCH(1,1) specification. To test whether the magnitude (or size) of negative and positive shocks impacts volatility differently, we conduct the negative size bias test and the positive size bias test, respectively. We find that the coefficient associated with the magnitude of last-period negative shocks is insignificant. This means that there is no neglected 'size bias' associated with the negative shocks.
However, in the case of positive shocks, we see a significant asymmetric effect uncaptured
by the GARCH(1,1) specification. When testing for the overall asymmetric effect, using the
joint test for the three effects, we find that the underlying conditional volatility process is
asymmetric. Hence, we model the conditional volatility using a GJR(1,1) model. Table 16.4
records the results from the ARMA(1,1)-GJR(1,1) model for the FTSE 100 returns. We also
conduct the asymmetric sign bias tests on the residuals from this model and find that the
model has captured all the asymmetries in the data.
Table 16.4 ARMA(1,1)-GJR(1,1) model results for FTSE 100 returns: rt = ϕ0 + ϕ1rt−1 + θ1εt−1 + εt

Coefficient  Estimate    Std. error  z-statistic  Prob.
Mean equation:
ϕ0   0.000196   1.12E-04   1.754    0.0794
ϕ1   0.686473   0.13876    4.947    0.0000
θ1   −0.698797  0.14743    −4.74    0.0000
Variance equation:
α0   1.27E-06   2.75E-07   4.64     0.0000
α1   0.017011   0.006591   2.581    0.0099
β1   0.922546   0.009416   97.98    0.0000
δT   0.095475   0.014165   6.74     0.0000
Information criteria:
Akaike −6.516966
Schwarz −6.508523
Hannan-Quinn −6.51402
ARCH LM test:
ARCH(1) test: F-statistic(1,5475) = 2.461762 [0.1167]
ARCH(5) test: F-statistic(5,5467) = 1.143841 [0.3347]
Engle and Ng (1993) tests:
                          t-test    Prob.
Sign bias test            0.38617   0.69937
Negative size bias test   0.21395   0.83059
Positive size bias test   2.00322   0.04515
Joint test                4.60971   0.20271
Table 16.5 ARMA(1,1)-GJR(1,1)-in-mean model results for FTSE 100 returns: rt = ϕ0 + ϕ1rt−1 + θ1εt−1

Coefficient  Estimate    Std. error  z-statistic  Prob.
Mean equation:
ϕ0   −0.000024  3.47E-04   −0.06932  0.9447
ϕ1   0.664809   0.16297    4.079     0.0000
θ1   −0.675739  0.17315    −3.903    0.0001
λ    0.027782   0.041109   0.6758    0.4992
Variance equation:
α0   1.33E-02   3.07E-03   4.349     0.0000
α1   0.017156   0.006661   2.575     0.0100
(16.29)
where Wt is a standard Brownian motion and σt is the spot volatility. The conditional vari-
ance for one-period returns, rt+1 ≡ pt+1 − pt is:
(16.30)
which is called the integrated volatility (or the true volatility) over the period t to t + 1. Since
we do not observe this integral, we estimate it using the theory of quadratic variation. Let m
be the sampling frequency, such that there are m continuously compounded returns in one
unit of time (say, one day). The jth return is given by:
(16.31)
The realised volatility (in one unit of time) can be defined as:
(16.32)
Then, from the theory of quadratic variation, if sample returns are uncorrelated, it follows that:
(16.33)
Hence, as we increase the sampling frequency of intraday returns, we find that realised vola-
tility converges to the integrated volatility. However, increasing the sampling frequency
induces market microstructure effects, such as nonsynchronous trading, discrete price obser-
vations, intraday periodic volatility patterns and bid-ask bounce (see Campbell et al. (1997) for more discussion). Therefore, in actual applications, realised volatility is constructed using high-frequency returns sampled at intervals of around 5 minutes or more. Andersen et al. (1999) proposed volatility signature plots as simple graphical tools to determine the optimal sampling frequency. In these, we plot the average realised volatility against the sampling frequency and pick the highest frequency at which the average realised volatility appears to stabilise.
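A sketch of constructing daily realised variance from intraday prices with pandas; the price series name, the timestamp index and the 5-minute grid are assumptions.

```python
import numpy as np

def realised_variance(intraday_prices, freq="5min"):
    """Daily realised variance from intraday prices (pandas Series indexed by timestamp)."""
    p = intraday_prices.resample(freq).last().dropna()   # prices on the chosen sampling grid
    r = np.log(p).diff().dropna()                        # continuously compounded intraday returns
    return (r ** 2).sum()                                # sum of squared returns

# A volatility signature plot can be sketched by averaging this quantity across days
# for several frequencies (e.g. "1min", "5min", "15min", "30min") and plotting the averages.
```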
Andersen, Bollerslev, Diebold and Labys (2001) and Andersen, Bollerslev, Diebold and
Ebens (2001) documented the distributional features of realised volatility. They found that
the distributions of the non-normal asset returns scaled by the realised standard deviations
are Gaussian. Further, they found that while the unconditional distributions of the variances
and covariances for asset returns were leptokurtic and highly skewed to the right, the loga-
rithmic realised standard deviations and correlations all appear approximately Gaussian.
They confirm volatility clustering effects in daily returns and find high volatility persistence
even in the monthly returns. The logarithm of realised volatility showed long memory prop-
erties and appeared fractionally integrated with very slow mean-reversions. Therefore,
several dynamic models, such as the autoregressive fractionally integrated moving average
(ARFIMA) model and the heterogeneous autoregressive (HAR) model have been effectively
applied to model realised volatility (see Giot and Laurent (2004), Koopman et al. (2005),
Corsi et al. (2008), Corsi (2009), among others).
16.3.2 Range-based volatility
In estimating volatility using intraday price variations, an alternative way would be to use
the price range information. Suppose log prices of assets follow a zero-drift Brownian
motion. A price range can be defined as the difference between the highest and lowest log
asset prices over a fixed sampling period.
To fix notation: if Ct is the closing price on date t, Ot is the opening price on date t, Ht the
high price on date t, Lt the low price on date t and σ is the volatility to be estimated, we define
the following (notation from Yang and Zhang (2000)):
Various range-based volatility estimators have been proposed in the literature. Parkinson (1980)
was the first to introduce a range estimator of daily volatility based on the highest and lowest
prices on a particular day. He used the range of log prices to define:
(16.34)
He showed that a range-based estimator can be around 5.2 times more efficient than using
daily closing prices. Garman and Klass (1980) extended Parkinson’s estimator by incorpo-
rating information about closing prices, as below:
(16.35)
Garman and Klass’s estimator showed an efficiency of around 7.4, in comparison with using
the standard close-to-close prices. Parkinson (1980) and Garman and Klass (1980) assume
that the log-price follows a Brownian motion with no drift term. This means that the average
return is assumed to be equal to zero. Rogers and Satchell (1991) relaxed this assumption
and used daily highest, lowest, opening and closing prices in estimating volatility. The
Rogers and Satchell (1991) estimator is given by:
(16.36)
This estimator performs better than the estimators proposed by Parkinson (1980) and Garman
and Klass (1980). Yang and Zhang (2000) proposed a refinement to the Rogers and Satchell
(1991) estimator for the presence of opening price jumps. Due to overnight volatility, the
opening price and the previous day’s closing price are mostly not the same. Estimators that
do not incorporate opening price jumps underestimate volatility. The Yang and Zhang
(2000) estimator is simply the sum of the estimated overnight volatility, the opening market
volatility and the Rogers and Satchell (1991) estimator. Other range-based estimators
proposed in the literature include those of Beckers (1983), Kunitomo (1992), Alizadeh et al.
(2002), Brunetti and Lildolt (2002), among others.
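A sketch of the Parkinson (1980) and Garman-Klass (1980) estimators; the formulas below are the standard textbook forms, stated here because the chapter's Equations (16.34)-(16.35) are not reproduced, and the input arrays are assumptions.

```python
import numpy as np

def parkinson_variance(high, low):
    """Parkinson (1980): daily variance from the high-low range of log prices."""
    hl = np.log(np.asarray(high) / np.asarray(low))
    return (hl ** 2).mean() / (4.0 * np.log(2.0))

def garman_klass_variance(open_, high, low, close):
    """Garman and Klass (1980): range estimator augmented with open-to-close information."""
    hl = np.log(np.asarray(high) / np.asarray(low))
    co = np.log(np.asarray(close) / np.asarray(open_))
    return (0.5 * hl ** 2 - (2.0 * np.log(2.0) - 1.0) * co ** 2).mean()
```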
Despite these theoretical advances in range-based volatility estimators, empirical applica-
tions of such models have been few. Chou (2005) claims that the poor empirical perfor-
mance of such models is due to their failure to capture the dynamic evolution of volatilities.
He therefore proposes a conditional autoregressive range (CARR) model for the price range.
A CARR(p,q) model for the daily price range Rt is given by:
Rt = λt εt   (16.37)
λt = α0 + Σ(i=1 to p) αi Rt−i + Σ(j=1 to q) βj λt−j   (16.38)
where λt is the conditional mean of the range based on all information up to time t. The coef-
ficients (α0, αi, βj) must all be positive to ensure positivity of λt. The dynamic structure for λt
captures the persistence of the shocks to the price range. The normalised range εt = Rt/λt is
assumed to have a density function f(·) with unit mean. This model belongs to the family of multiplicative error models of Engle (2002a), and a unit-mean exponential density function
for εt can be used to estimate the parameters in the CARR model (see Engle and Russell
(1998)).
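The following sketch (an illustration, not the text's own estimation routine) shows how the CARR(1,1) parameters might be estimated by quasi-maximum likelihood with a unit-mean exponential density for the normalised range; it assumes SciPy is available, and the starting values and the initialisation of λ at the sample mean range are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize

def carr_neg_loglik(params, r):
    """Negative exponential quasi-log-likelihood of a CARR(1,1) model for the range r."""
    a0, a1, b1 = params
    r = np.asarray(r, dtype=float)
    lam = np.empty(len(r))
    lam[0] = r.mean()                          # initialise the conditional range
    for t in range(1, len(r)):
        lam[t] = a0 + a1 * r[t - 1] + b1 * lam[t - 1]
    # Exponential density with mean lam: f(r) = exp(-r/lam)/lam
    return np.sum(np.log(lam) + r / lam)

def fit_carr(r):
    """Estimate (alpha0, alpha1, beta1) subject to positivity constraints."""
    r = np.asarray(r, dtype=float)
    res = minimize(carr_neg_loglik, x0=np.array([0.1 * r.mean(), 0.2, 0.6]), args=(r,),
                   bounds=[(1e-8, None), (0.0, 1.0), (0.0, 1.0)], method="L-BFGS-B")
    return res.x
```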
Chou (2005) showed that the CARR model provides a better in-sample and out-of-
sample performance compared with a standard GARCH model. Brandt and Jones (2006)
proposed a range-based EGARCH model, which is essentially Nelson’s (1991) EGARCH
model, but replaces the ‘standardised deviation of the absolute return from its expected
value’ with the ‘standardised deviation of the log range from its expected value’. They
showed that range-based estimators provide significant forecastability of volatility as far as one year ahead, in contrast to return-based volatility forecasts, which are informative only over short horizons.
16.4 Multivariate GARCH models
Modelling the volatilities and correlations of several asset returns jointly requires a multivariate framework. A general model for an N-dimensional vector of returns can be written as:
yt = μt + εt   (16.39)
εt = Ht^1/2 zt, where zt is an i.i.d. vector with zero mean and identity covariance matrix   (16.40)
where yt is an N-dimensional vector of asset returns and μt is the conditional mean vector. The matrix Ht is the N × N positive definite conditional variance matrix of yt. In this section we consider the important question: how do we parameterise Ht?
16.4.1 Vech model
In modelling the covariance matrix Ht, Bollerslev et al. (1988) were the first to propose a
natural multivariate extension of the univariate GARCH(p,q) models known as the vech
model. The vech(p,q) model is given by:
vech(Ht) = C + Σ(i=1 to q) A*i vech(εt−i ε′t−i) + Σ(j=1 to p) B*j vech(Ht−j)   (16.41)
where vech is the vector-half operator, which stacks the lower triangular elements of an N × N matrix into an [N(N + 1)/2] × 1 vector. The vech model is covariance stationary if all the eigenvalues of ΣA*i + ΣB*j are less than one in modulus.
16.4.2 BEKK model
Engle and Kroner (1995) proposed the BEKK model, which guarantees that Ht is positive definite by construction. The BEKK(p,q) model is given by:
Ht = CC′ + Σ(i=1 to q) A′i εt−i ε′t−i Ai + Σ(j=1 to p) B′j Ht−j Bj   (16.42)
so that, for example, the BEKK(1,1) model is:
Ht = CC′ + A′ εt−1 ε′t−1 A + B′ Ht−1 B   (16.43)
where C, the Ai and the Bj are N × N matrices. C is a lower triangular matrix and therefore CC′ is positive definite. Also, by estimating the Ai and Bj directly rather than A* and B* (as in the vech model), we can easily ensure positive definiteness of Ht. For example, in the case of two assets (N = 2), we have:
Ht = CC′ + A′ εt−1 ε′t−1 A + B′ Ht−1 B, with C = [c11 0; c21 c22], A = [a11 a12; a21 a22], B = [b11 b12; b21 b22]   (16.44)
A BEKK(p,q) model requires estimation of N(N + 1)/2 parameters from the matrix C and N²(p + q) parameters from the matrices Ai and Bj. For example, if we have 3 assets (N = 3), the number of parameters to be estimated in a BEKK(1,1) framework would be 24. Hence, to reduce the number of parameters to be estimated, we can impose a Diagonal BEKK representation in which the Ai and Bj are diagonal. Alternatively, we can restrict the Ai and Bj to be scalar multiples of the identity matrix, in which case we have a Scalar BEKK model. The Diagonal BEKK and Scalar BEKK models are covariance stationary if aii² + bii² < 1 for every i, and a² + b² < 1, respectively.
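A small sketch may help fix ideas: it performs one step of a BEKK(1,1) recursion and reproduces the parameter count just discussed. The helper names are hypothetical, and the diagonal or scalar restrictions would simply be imposed on the matrices A and B passed in.

```python
import numpy as np

def bekk_update(H_prev, eps_prev, C, A, B):
    """One step of a BEKK(1,1) recursion: H_t = CC' + A' e e' A + B' H_{t-1} B,
    with C lower triangular and A, B unrestricted (or diagonal/scalar)."""
    ee = np.outer(eps_prev, eps_prev)
    return C @ C.T + A.T @ ee @ A + B.T @ H_prev @ B

def bekk_param_count(N, p=1, q=1):
    """Full BEKK(p,q) parameter count: N(N+1)/2 from C plus N^2 (p+q) from the A's and B's."""
    return N * (N + 1) // 2 + N * N * (p + q)

print(bekk_param_count(3))   # 24, as in the three-asset example above
```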
16.4.3 Constant conditional correlation (CCC) model
A popular simplification is the constant conditional correlation (CCC) model, in which the conditional covariance matrix is decomposed as:
Ht = Dt R Dt, where Dt = diag(h11t^1/2, . . . , hNNt^1/2)   (16.45)
and R is the constant N × N matrix of conditional correlations.
Each conditional standard deviation can in turn be defined as any univariate GARCH model,
such as the GARCH(1,1) specification:
hiit = ωi + αi ε²i,t−1 + βi hii,t−1   (16.46)
Ht is positive definite if all N conditional variances are positive and R is positive definite.
The number of parameters to be estimated in a CCC-GARCH(1,1) specification is N(N +
5)/2. For the case of 3 assets (N = 3), the number of parameters to be estimated in a
CCC-GARCH(1,1) framework would be 12.
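As a brief illustration (the function names are hypothetical), the CCC covariance matrix is assembled directly from the univariate conditional variances and the fixed correlation matrix:

```python
import numpy as np

def ccc_covariance(h_t, R):
    """CCC model: H_t = D_t R D_t, with h_t the vector of N conditional variances
    at time t and R the constant correlation matrix."""
    D_t = np.diag(np.sqrt(np.asarray(h_t, dtype=float)))
    return D_t @ R @ D_t

def ccc_param_count(N):
    """3 parameters per univariate GARCH(1,1) plus N(N-1)/2 correlations = N(N+5)/2."""
    return 3 * N + N * (N - 1) // 2

print(ccc_param_count(3))    # 12, as in the three-asset example above
```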
In most empirical applications, the conditional correlations have been found to be far from
constant (see Longin and Solnik (1995), Bera and Kim (2002), among others). Formal tests
for constant correlations have been proposed in the literature. Tse (2000) proposes testing
the null:
hijt = ρij (hiit hjjt)^1/2   (16.47)
against the alternative of time-varying correlations:
hijt = ρijt (hiit hjjt)^1/2, with ρijt = ρij + δij εi,t−1 εj,t−1   (16.48)
where the conditional variances, hiit and hjjt are GARCH-type models. The test statistic is a
Lagrange multiplier test, which is asymptotically χ2(N(N − 1)/2). Engle and Sheppard (2001)
proposed another test for constant correlations with the null hypothesis:
H0 : Rt = R̄ for all t   (16.49)
against the alternative H1 : vech(Rt) = vech(R̄) + β*1 vech(Rt−1) + . . . + β*p vech(Rt−p). This test statistic is again chi-squared distributed.
16.4.4 Dynamic conditional correlation (DCC) model
The dynamic conditional correlation (DCC) models of Engle (2002b) and Tse and Tsui (2002) allow the conditional correlation matrix itself to vary over time. The conditional covariance matrix is decomposed as:
Ht = Dt Rt Dt   (16.50)
where Dt is the diagonal matrix of conditional standard deviations derived from some
univariate GARCH model (as defined in the case of CCC) and Rt is the conditional correla-
tion matrix. We then standardise the residuals by their dynamic standard deviations. Let:
ut = Dt^−1 εt   (16.51)
be the vector of standardised residuals of N GARCH models. These variables now have
standard deviations of one. We now model the conditional correlations of residuals (εt) by
modelling conditional covariances of standardised residuals (ut). We define Rt as:
Rt = diag(Qt)^−1/2 Qt diag(Qt)^−1/2   (16.52)
or, element by element,
ρijt = qijt / (qiit qjjt)^1/2   (16.53)
where the elements of the symmetric positive definite matrix Qt = [qijt] evolve according to:
Qt = (1 − a − b)Q̄ + a ut−1 u′t−1 + b Qt−1   (16.54)
with Q̄ the unconditional covariance matrix of the standardised residuals ut = Dt^−1 εt, and a and b non-negative scalars satisfying a + b < 1. Under conditional normality, the log-likelihood of the model can be written as:
l(ϕ, ψ) = −(1/2) Σt [N log(2π) + 2 log|Dt| + log|Rt| + u′t Rt^−1 ut]   (16.55)
Adding and subtracting u′t ut, the log-likelihood splits into a volatility part and a correlation part:
l(ϕ, ψ) = −(1/2) Σt [N log(2π) + 2 log|Dt| + u′t ut] − (1/2) Σt [log|Rt| + u′t Rt^−1 ut − u′t ut]   (16.56)
= lv(ϕ) + lc(ψ | ϕ)   (16.57)
where
lv(ϕ) = −(1/2) Σt [N log(2π) + 2 log|Dt| + ε′t Dt^−2 εt]   (16.58)
lc(ψ | ϕ) = −(1/2) Σt [log|Rt| + u′t Rt^−1 ut − u′t ut]   (16.59)
The likelihood of the volatility term lv(ϕ) is simply the sum of the individual GARCH likelihoods, and it is maximised by separately maximising each term. This gives us the volatility parameters ϕ. Then the likelihood of the correlation term lc(ψ | ϕ) is maximised, conditional on the volatility parameters estimated in the first step. In the likelihood function for the correlation term, Rt takes the DCC form diag(Qt)^−1/2 Qt diag(Qt)^−1/2.
Therefore, a two-step estimation procedure can be employed to estimate the DCC model.
First, we estimate the conditional variances (volatility terms ϕ) using MLE. That is, we
maximise the likelihood to find ϕ̂:
ϕ̂ = arg maxϕ lv(ϕ)   (16.60)
Second, we compute the standardised residuals ut = Dt^−1 εt and estimate the correlations among the assets. Here we maximise the likelihood of the correlation term to obtain ψ̂:
ψ̂ = arg maxψ lc(ψ | ϕ̂)   (16.61)
The above DCC estimation methodology relies on the assumption of normality in conditional returns, which generally does not hold for financial assets. Alternative fat-tailed distributions, such as the Student-t, Laplace or logistic distributions, can be employed in estimating the DCC model. Alternative model specifications have also been suggested in the literature. An excellent review of such generalisations and alternative approaches to modelling correlations is given in Engle (2009).
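To illustrate the DCC recursion in (16.52)–(16.54) and the second estimation step, the sketch below computes the sequence of conditional correlation matrices Rt from GARCH-standardised residuals. The parameter names a and b and the use of the sample covariance of the standardised residuals for Q̄ (correlation targeting) are assumptions of this illustration, not prescriptions from the text.

```python
import numpy as np

def dcc_correlations(u, a, b):
    """Conditional correlation matrices R_t implied by the DCC recursion.

    u : T x N array of standardised residuals from the first-step GARCH models.
    a, b : non-negative scalars with a + b < 1."""
    T, N = u.shape
    Q_bar = (u.T @ u) / T                        # unconditional covariance of u (correlation targeting)
    Q = Q_bar.copy()
    R = np.empty((T, N, N))
    for t in range(T):
        if t > 0:
            Q = (1 - a - b) * Q_bar + a * np.outer(u[t - 1], u[t - 1]) + b * Q
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)                # R_t = diag(Q_t)^-1/2 Q_t diag(Q_t)^-1/2
    return R
```

The correlation likelihood lc(ψ | ϕ) could then be evaluated at candidate values of (a, b) using these Rt and maximised numerically.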
Review questions
1 The above quote is from Engle (2002a, pp. 425–426). Why are researchers in financial econometrics interested in developing several conditional models for the second-order moments?
2 Prove that an infinite-order ARCH process can be approximated by a GARCH(1,1)
model.
3 Why has the model-free ‘realised volatility’ measure become a simple yet efficient way
to aggregate risk?
4 Discuss the salient distributional features of realised asset volatility.
5 Outline the dynamic conditional correlation framework used for the modelling of time-
varying variances and correlations.
References
Corsi, F., Kretschmer, U., Mittnik, S. and Pigorsch, C. (2008). ‘The volatility of realized volatility’, Econometric Reviews, 27, 46–78.
Engle, R. F. (1982). ‘Autoregressive conditional heteroskedasticity with estimates of the variance of
UK inflation’, Econometrica, 50, 987–1008.
Engle, R. F. (2002a). ‘New frontiers for ARCH models’, Journal of Applied Econometrics, 17,
425–446.
Engle, R. F. (2002b). ‘Dynamic conditional correlation: a simple class of multivariate generalized
autoregressive conditional heteroskedasticity models’, Journal of Business and Economic Statistics,
20, 339–350.
Engle, R. F. (2009). Anticipating Correlations: A New Paradigm for Risk Management, Princeton
University Press.
Engle, R. F. and Kroner, K. F. (1995). ‘Multivariate simultaneous generalized ARCH’, Econometric
Theory, 11, 122–150.
Engle, R. F., Lilien, D. M. and Robins, R. P. (1987). ‘Estimating time-varying risk premia in the term
structure: the ARCH-M model’, Econometrica, 55, 391–408.
Engle, R. F. and Ng, V. K. (1993). ‘Measuring and testing the impact of news on volatility’, Journal
of Finance, 48, 1749–1778.
Engle, R. F. and Russell, J. R. (1998). ‘Autoregressive conditional duration: a new model for irregularly spaced transaction data’, Econometrica, 66, 1127–1162.
Engle, R. F. and Sheppard, K. (2001). ‘Theoretical and empirical properties of dynamic conditional
correlation multivariate GARCH’, NBER Working Paper 8554.
Garman, M. B. and Klass, M. J. (1980). ‘On the estimation of price volatility from historical data’,
Journal of Business, 53, 67–78.
Giot, P. and Laurent, S. (2004). ‘Modelling daily value-at-risk using realized volatility and ARCH type
models’, Journal of Empirical Finance, 11, 379–398.
Glosten, L. R., Jagannathan, R. and Runkle, D. (1993). ‘On the relation between the expected value
and the volatility of the nominal excess return on stocks’, Journal of Finance, 48, 1779–1801.
Koopman, S., Jungbacker, B. and Hol, E. (2005). ‘Forecasting daily variability of the S&P 100 stock
index using historical, realised and implied volatility measurements’, Journal of Empirical Finance,
12, 445–475.
Kunitomo, N. (1992). ‘Improving the Parkinson method of estimating security price volatilities’,
Journal of Business, 65, 295–302.
Longin, F. and Solnik, B. (1995). ‘Is the correlation in international equity returns constant: 1960–1990?’,
Journal of International Money and Finance, 14, 3–26.
Nelson, D. (1991). ‘Conditional heteroskedasticity in asset returns: a new approach’, Econometrica,
59, 347–370.
Parkinson, M. (1980). ‘The extreme value method for estimating the variance of the rate of return’,
Journal of Business, 53, 61–65.
Rogers, L. C. G. and Satchell, S. E. (1991). ‘Estimating variance from high, low and closing prices’,
Annals of Applied Probability, 1, 504–512.
Silvennoinen, A. and Teräsvirta, T. (2008). ‘Multivariate GARCH models’, In T. G. Andersen, R. A.
Davis, J. P. Kreiss and T. Mikosch, (eds), Handbook of Financial Time Series, New York: Springer.
Taylor, S. (1986). Modelling Financial Time Series, Chichester: Wiley.
Tse, Y. K. (2000). ‘A test for constant correlations in a multivariate GARCH model’, Journal of
Econometrics, 98, 107–127.
Tse, Y. K. and Tsui, A. K. C. (2002). ‘A multivariate GARCH model with time-varying correlations’,
Journal of Business and Economic Statistics, 20, 351–362.
Yang, D. and Zhang, Q. (2000). ‘Drift independent volatility estimation based on high, low, open and
close prices’, Journal of Business, 73, 477–491.
Appendix
Probabilities (areas) under the standard normal distribution: each entry gives the area between 0 and Z.
Example: if Z = +1.96, then P(0 to Z) = 0.475.
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0 0.004 0.008 0.012 0.016 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.091 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.148 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.17 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.195 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.219 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.258 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.291 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.334 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.377 0.379 0.381 0.383
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.398 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.437 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.475 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.483 0.4834 0.4838 0.4842 0.4846 0.485 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.489
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.492 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.494 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.496 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.497 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.498 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.499 0.499
The above (standardised) normal distribution values were generated with MS Excel.
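The tabulated areas can also be checked programmatically; assuming SciPy is available, the area between 0 and Z = 1.96, for example, is:

```python
from scipy.stats import norm

z = 1.96
print(round(norm.cdf(z) - 0.5, 4))   # 0.475, matching the entry in row 1.9, column 0.06
```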
Probabilities (areas) under the Student’s t distribution:
One-tailed test: alpha 0.4 0.25 0.1 0.05 0.025 0.01 0.005 0.0025 0.001 0.0005
Two-tailed test: 2 alpha 0.8 0.5 0.2 0.1 0.05 0.02 0.01 0.005 0.002 0.001
v (degrees of freedom)
1 0.325 1 3.078 6.314 12.706 31.821 63.656 127.321 318.289 636.578
2 0.289 0.816 1.886 2.92 4.303 6.965 9.925 14.089 22.328 31.6
3 0.277 0.765 1.638 2.353 3.182 4.541 5.841 7.453 10.214 12.924
4 0.271 0.741 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.61
5 0.267 0.727 1.476 2.015 2.571 3.365 4.032 4.773 5.894 6.869
6 0.265 0.718 1.44 1.943 2.447 3.143 3.707 4.317 5.208 5.959
7 0.263 0.711 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.408
8 0.262 0.706 1.397 1.86 2.306 2.896 3.355 3.833 4.501 5.041
9 0.261 0.703 1.383 1.833 2.262 2.821 3.25 3.69 4.297 4.781
10 0.26 0.7 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587
11 0.26 0.697 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437
12 0.259 0.695 1.356 1.782 2.179 2.681 3.055 3.428 3.93 4.318
13 0.259 0.694 1.35 1.771 2.16 2.65 3.012 3.372 3.852 4.221
14 0.258 0.692 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.14
15 0.258 0.691 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.073
16 0.258 0.69 1.337 1.746 2.12 2.583 2.921 3.252 3.686 4.015
17 0.257 0.689 1.333 1.74 2.11 2.567 2.898 3.222 3.646 3.965
18 0.257 0.688 1.33 1.734 2.101 2.552 2.878 3.197 3.61 3.922
19 0.257 0.688 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.883
20 0.257 0.687 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.85
21 0.257 0.686 1.323 1.721 2.08 2.518 2.831 3.135 3.527 3.819
22 0.256 0.686 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.792
23 0.256 0.685 1.319 1.714 2.069 2.5 2.807 3.104 3.485 3.768
24 0.256 0.685 1.318 1.711 2.064 2.492 2.797 3.091 3.467 3.745
25 0.256 0.684 1.316 1.708 2.06 2.485 2.787 3.078 3.45 3.725
26 0.256 0.684 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.707
27 0.256 0.684 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.689
28 0.256 0.683 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.674
29 0.256 0.683 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.66
30 0.256 0.683 1.31 1.697 2.042 2.457 2.75 3.03 3.385 3.646
40 0.255 0.681 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3.551
60 0.254 0.679 1.296 1.671 2 2.39 2.66 2.915 3.232 3.46
120 0.254 0.677 1.289 1.658 1.98 2.358 2.617 2.86 3.16 3.373
INF 0.253 0.674 1.282 1.645 1.96 2.326 2.576 2.807 3.09 3.291
The above Student’s t distribution values were generated with MS Excel.
Probabilities (areas) under the F distribution at α = 0.01 (i.e. the upper 1% points, Fv1,v2(0.01)); columns give v1, rows give v2:
v2 \ v1 1 2 3 4 5 6 7 8 9 10 15 20 40 60 120 ∞
1 4052 4999 5404 5624 5764 5859 5928 5981 6022 6056 6157 6209 6286 6313 6340 6366
2 98.5 99 99.16 99.25 99.3 99.33 99.36 99.38 99.39 99.4 99.43 99.45 99.48 99.48 99.49 99.5
3 34.12 30.8 29.46 28.71 28.24 27.91 27.67 27.49 27.34 27.23 26.87 26.69 26.41 26.32 26.22 26.13
4 21.2 18 16.69 15.98 15.52 15.21 14.98 14.8 14.66 14.55 14.2 14.02 13.75 13.65 13.56 13.46
5 16.26 13.3 12.06 11.39 10.97 10.67 10.46 10.29 10.16 10.05 9.72 9.55 9.29 9.2 9.11 9.02
6 13.75 10.9 9.78 9.15 8.75 8.47 8.26 8.1 7.98 7.87 7.56 7.4 7.14 7.06 6.97 6.88
7 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62 6.31 6.16 5.91 5.82 5.74 5.65
8 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.52 5.36 5.12 5.03 4.95 4.86
9 10.56 8.02 6.99 6.42 6.06 5.8 5.61 5.47 5.35 5.26 4.96 4.81 4.57 4.48 4.4 4.31
10 10.04 7.56 6.55 5.99 5.64 5.39 5.2 5.06 4.94 4.85 4.56 4.41 4.17 4.08 4 3.91
11 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54 4.25 4.1 3.86 3.78 3.69 3.6
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.5 4.39 4.3 4.01 3.86 3.62 3.54 3.45 3.36
13 9.07 6.7 5.74 5.21 4.86 4.62 4.44 4.3 4.19 4.1 3.82 3.66 3.43 3.34 3.25 3.17
14 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94 3.66 3.51 3.27 3.18 3.09 3
15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4 3.89 3.8 3.52 3.37 3.13 3.05 2.96 2.87
16 8.53 6.23 5.29 4.77 4.44 4.2 4.03 3.89 3.78 3.69 3.41 3.26 3.02 2.93 2.84 2.75
17 8.4 6.11 5.19 4.67 4.34 4.1 3.93 3.79 3.68 3.59 3.31 3.16 2.92 2.83 2.75 2.65
18 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.6 3.51 3.23 3.08 2.84 2.75 2.66 2.57
19 8.18 5.93 5.01 4.5 4.17 3.94 3.77 3.63 3.52 3.43 3.15 3 2.76 2.67 2.58 2.49
20 8.1 5.85 4.94 4.43 4.1 3.87 3.7 3.56 3.46 3.37 3.09 2.94 2.69 2.61 2.52 2.42
21 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.4 3.31 3.03 2.88 2.64 2.55 2.46 2.36
22 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 2.98 2.83 2.58 2.5 2.4 2.31
23 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.3 3.21 2.93 2.78 2.54 2.45 2.35 2.26
24 7.82 5.61 4.72 4.22 3.9 3.67 3.5 3.36 3.26 3.17 2.89 2.74 2.49 2.4 2.31 2.21
25 7.77 5.57 4.68 4.18 3.85 3.63 3.46 3.32 3.22 3.13 2.85 2.7 2.45 2.36 2.27 2.17
26 7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.18 3.09 2.81 2.66 2.42 2.33 2.23 2.13
27 7.68 5.49 4.6 4.11 3.78 3.56 3.39 3.26 3.15 3.06 2.78 2.63 2.38 2.29 2.2 2.1
28 7.64 5.45 4.57 4.07 3.75 3.53 3.36 3.23 3.12 3.03 2.75 2.6 2.35 2.26 2.17 2.06
29 7.6 5.42 4.54 4.04 3.73 3.5 3.33 3.2 3.09 3 2.73 2.57 2.33 2.23 2.14 2.03
30 7.56 5.39 4.51 4.02 3.7 3.47 3.3 3.17 3.07 2.98 2.7 2.55 2.3 2.21 2.11 2.01
40 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.8 2.52 2.37 2.11 2.02 1.92 1.8
60 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63 2.35 2.2 1.94 1.84 1.73 1.6
120 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47 2.19 2.03 1.76 1.66 1.53 1.38
∞ 6.63 4.61 3.78 3.32 3.02 2.8 2.64 2.51 2.41 2.32 2.04 1.88 1.59 1.47 1.32 1
The above ‘F’ distribution values (at α = 0.01) were generated with MS Excel.
Probabilities (areas) under the F distribution at α = 0.05 (i.e. the upper 5% points, Fv1,v2(0.05)); columns give v1, rows give v2:
v2 \ v1 1 2 3 4 5 6 7 8 9 10 15 20 40 60 120 ∞
1 161.5 200 215.7 224.6 230.2 234 236.8 238.9 240.5 241.9 246 248 251.1 252.2 253.3 254.3
2 18.51 19 19.16 19.25 19.3 19.33 19.35 19.37 19.38 19.4 19.43 19.45 19.47 19.48 19.49 19.5
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.7 8.66 8.59 8.57 8.55 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6 5.96 5.86 5.8 5.72 5.69 5.66 5.63
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.62 4.56 4.46 4.43 4.4 4.37
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.1 4.06 3.94 3.87 3.77 3.74 3.7 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.51 3.44 3.34 3.3 3.27 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.5 3.44 3.39 3.35 3.22 3.15 3.04 3.01 2.97 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.01 2.94 2.83 2.79 2.75 2.71
10 4.96 4.1 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.85 2.77 2.66 2.62 2.58 2.54
11 4.84 3.98 3.59 3.36 3.2 3.09 3.01 2.95 2.9 2.85 2.72 2.65 2.53 2.49 2.45 2.4
12 4.75 3.89 3.49 3.26 3.11 3 2.91 2.85 2.8 2.75 2.62 2.54 2.43 2.38 2.34 2.3
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.53 2.46 2.34 2.3 2.25 2.21
14 4.6 3.74 3.34 3.11 2.96 2.85 2.76 2.7 2.65 2.6 2.46 2.39 2.27 2.22 2.18 2.13
15 4.54 3.68 3.29 3.06 2.9 2.79 2.71 2.64 2.59 2.54 2.4 2.33 2.2 2.16 2.11 2.07
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.35 2.28 2.15 2.11 2.06 2.01
17 4.45 3.59 3.2 2.96 2.81 2.7 2.61 2.55 2.49 2.45 2.31 2.23 2.1 2.06 2.01 1.96
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.27 2.19 2.06 2.02 1.97 1.92
19 4.38 3.52 3.13 2.9 2.74 2.63 2.54 2.48 2.42 2.38 2.23 2.16 2.03 1.98 1.93 1.88
20 4.35 3.49 3.1 2.87 2.71 2.6 2.51 2.45 2.39 2.35 2.2 2.12 1.99 1.95 1.9 1.84
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.18 2.1 1.96 1.92 1.87 1.81
22 4.3 3.44 3.05 2.82 2.66 2.55 2.46 2.4 2.34 2.3 2.15 2.07 1.94 1.89 1.84 1.78
23 4.28 3.42 3.03 2.8 2.64 2.53 2.44 2.37 2.32 2.27 2.13 2.05 1.91 1.86 1.81 1.76
24 4.26 3.4 3.01 2.78 2.62 2.51 2.42 2.36 2.3 2.25 2.11 2.03 1.89 1.84 1.79 1.73
25 4.24 3.39 2.99 2.76 2.6 2.49 2.4 2.34 2.28 2.24 2.09 2.01 1.87 1.82 1.77 1.71
26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.07 1.99 1.85 1.8 1.75 1.69
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.2 2.06 1.97 1.84 1.79 1.73 1.67
28 4.2 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19 2.04 1.96 1.82 1.77 1.71 1.65
29 4.18 3.33 2.93 2.7 2.55 2.43 2.35 2.28 2.22 2.18 2.03 1.94 1.81 1.75 1.7 1.64
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.01 1.93 1.79 1.74 1.68 1.62
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 1.92 1.84 1.69 1.64 1.58 1.51
60 4 3.15 2.76 2.53 2.37 2.25 2.17 2.1 2.04 1.99 1.84 1.75 1.59 1.53 1.47 1.39
120 3.92 3.07 2.68 2.45 2.29 2.18 2.09 2.02 1.96 1.91 1.75 1.66 1.5 1.43 1.35 1.25
∞ 3.84 3 2.6 2.37 2.21 2.1 2.01 1.94 1.88 1.83 1.67 1.57 1.39 1.32 1.22 1
The above ‘F’ distribution values (at α = 0.05) were generated with MS Excel.
Probabilities (areas) under the F distribution at α = 0.1 (i.e. the upper 10% points, Fv1,v2(0.1)); columns give v1, rows give v2:
v2 \ v1 1 2 3 4 5 6 7 8 9 10 15 20 40 60 120 ∞
1 39.86 49.5 53.59 55.83 57.24 58.2 58.91 59.44 59.86 60.19 61.22 61.74 62.53 62.79 63.06 63.33
2 8.53 9 9.16 9.24 9.29 9.33 9.35 9.37 9.38 9.39 9.42 9.44 9.47 9.47 9.48 9.49
3 5.54 5.46 5.39 5.34 5.31 5.28 5.27 5.25 5.24 5.23 5.2 5.18 5.16 5.15 5.14 5.13
4 4.54 4.32 4.19 4.11 4.05 4.01 3.98 3.95 3.94 3.92 3.87 3.84 3.8 3.79 3.78 3.76
5 4.06 3.78 3.62 3.52 3.45 3.4 3.37 3.34 3.32 3.3 3.24 3.21 3.16 3.14 3.12 3.11
6 3.78 3.46 3.29 3.18 3.11 3.05 3.01 2.98 2.96 2.94 2.87 2.84 2.78 2.76 2.74 2.72
7 3.59 3.26 3.07 2.96 2.88 2.83 2.78 2.75 2.72 2.7 2.63 2.59 2.54 2.51 2.49 2.47
8 3.46 3.11 2.92 2.81 2.73 2.67 2.62 2.59 2.56 2.54 2.46 2.42 2.36 2.34 2.32 2.29
9 3.36 3.01 2.81 2.69 2.61 2.55 2.51 2.47 2.44 2.42 2.34 2.3 2.23 2.21 2.18 2.16
10 3.29 2.92 2.73 2.61 2.52 2.46 2.41 2.38 2.35 2.32 2.24 2.2 2.13 2.11 2.08 2.06
11 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.3 2.27 2.25 2.17 2.12 2.05 2.03 2 1.97
12 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21 2.19 2.1 2.06 1.99 1.96 1.93 1.9
13 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.2 2.16 2.14 2.05 2.01 1.93 1.9 1.88 1.85
14 3.1 2.73 2.52 2.39 2.31 2.24 2.19 2.15 2.12 2.1 2.01 1.96 1.89 1.86 1.83 1.8
15 3.07 2.7 2.49 2.36 2.27 2.21 2.16 2.12 2.09 2.06 1.97 1.92 1.85 1.82 1.79 1.76
16 3.05 2.67 2.46 2.33 2.24 2.18 2.13 2.09 2.06 2.03 1.94 1.89 1.81 1.78 1.75 1.72
17 3.03 2.64 2.44 2.31 2.22 2.15 2.1 2.06 2.03 2 1.91 1.86 1.78 1.75 1.72 1.69
18 3.01 2.62 2.42 2.29 2.2 2.13 2.08 2.04 2 1.98 1.89 1.84 1.75 1.72 1.69 1.66
19 2.99 2.61 2.4 2.27 2.18 2.11 2.06 2.02 1.98 1.96 1.86 1.81 1.73 1.7 1.67 1.63
20 2.97 2.59 2.38 2.25 2.16 2.09 2.04 2 1.96 1.94 1.84 1.79 1.71 1.68 1.64 1.61
21 2.96 2.57 2.36 2.23 2.14 2.08 2.02 1.98 1.95 1.92 1.83 1.78 1.69 1.66 1.62 1.59
22 2.95 2.56 2.35 2.22 2.13 2.06 2.01 1.97 1.93 1.9 1.81 1.76 1.67 1.64 1.6 1.57
23 2.94 2.55 2.34 2.21 2.11 2.05 1.99 1.95 1.92 1.89 1.8 1.74 1.66 1.62 1.59 1.55
24 2.93 2.54 2.33 2.19 2.1 2.04 1.98 1.94 1.91 1.88 1.78 1.73 1.64 1.61 1.57 1.53
25 2.92 2.53 2.32 2.18 2.09 2.02 1.97 1.93 1.89 1.87 1.77 1.72 1.63 1.59 1.56 1.52
26 2.91 2.52 2.31 2.17 2.08 2.01 1.96 1.92 1.88 1.86 1.76 1.71 1.61 1.58 1.54 1.5
27 2.9 2.51 2.3 2.17 2.07 2 1.95 1.91 1.87 1.85 1.75 1.7 1.6 1.57 1.53 1.49
28 2.89 2.5 2.29 2.16 2.06 2 1.94 1.9 1.87 1.84 1.74 1.69 1.59 1.56 1.52 1.48
29 2.89 2.5 2.28 2.15 2.06 1.99 1.93 1.89 1.86 1.83 1.73 1.68 1.58 1.55 1.51 1.47
30 2.88 2.49 2.28 2.14 2.05 1.98 1.93 1.88 1.85 1.82 1.72 1.67 1.57 1.54 1.5 1.46
40 2.84 2.44 2.23 2.09 2 1.93 1.87 1.83 1.79 1.76 1.66 1.61 1.51 1.47 1.42 1.38
60 2.79 2.39 2.18 2.04 1.95 1.87 1.82 1.77 1.74 1.71 1.6 1.54 1.44 1.4 1.35 1.29
120 2.75 2.35 2.13 1.99 1.9 1.82 1.77 1.72 1.68 1.65 1.55 1.48 1.37 1.32 1.26 1.19
∞ 2.71 2.3 2.08 1.94 1.85 1.77 1.72 1.67 1.63 1.6 1.49 1.42 1.3 1.24 1.17 1
The above ‘F’ distribution values (at α = 0.1) were generated with MS Excel.
Index