Econometrics II Distance Module
UNITY UNIVERSITY
DEPARTMENT OF ECONOMICS
A. Course Overview
Course Title: Econometrics II
Course No: Econ-2062
Credit Hours: 3
Pre-requisite: Econ-2061
B. Course Description
This course is a continuation of Econometrics I. It aims at introducing the theory (and
practice) of regression on qualitative information, time series and panel data
econometrics, as well as simultaneous equation modeling. It first introduces the basic
concepts in modeling qualitative information, such as dummy variable regression and
binary choice models (LPM, logit and probit). Elementary time series models, estimation
and testing for both stationary and non-stationary data will then be discussed. It also
covers an introduction to simultaneous equation modeling, including identification and
the associated estimation methods.
D. Assessment/Evaluation
Attendance 5%
Quiz 5%
Test-I 15%
Test-II 15%
Assignment 10%
Final Exam 50%
Total Mark 100%
E. Required textbook:
Gujarati, D. N. (2009). Basic Econometrics, 5th edition, McGraw-Hill.
F. Additional readings:
Chapter One
This chapter introduces models with binary (dummy) explanatory variables as well as
the specification and estimation of models with qualitative (dummy) dependent variables.
Topic Outline
1. Introduction
2. Describing Qualitative Information
3. Dummy as Independent Variables
4. Dummy as Dependent Variable
4.1. The Linear Probability Model (LPM)
4.2. The Logit and Probit Models
4.3. Interpreting the Probit and Logit Model Estimates
5. Review questions
1.1 Introduction
As mentioned in the previous section, this chapter deals with the role of qualitative
explanatory variables in regression analysis and the interpretation of the
coefficients (parameter estimates) of such models. It will be shown that the introduction
of qualitative variables, often called dummy variables, makes the linear regression
model an extremely flexible tool that is capable of handling many interesting problems
encountered in empirical studies. After a brief introduction to such binary variables,
several models (such as the LPM, logit, probit and tobit models) in which the dependent
variable itself is qualitative in nature will be discussed.
Example: Yi = α + βDi + ui ....................................(1.1)
where Yi = annual salary of a college professor and Di = 1 if male, 0 otherwise.
Note that (1.1) is like the two-variable regression model encountered in Econometrics I
except that instead of a quantitative X variable we have a dummy variable D (hereafter,
we shall designate all dummy variables by the letter D).
Model (1.1) may enable us to find out whether sex makes any difference in a college
professor's salary, assuming, of course, that all other variables such as age, degree
attained, and years of experience are held constant. Assuming that the disturbances
satisfy the usual assumptions of the classical linear regression model, we obtain from
(1.1):
E(Yi | Di = 0) = α
E(Yi | Di = 1) = α + β
That is, the intercept term α gives the mean salary of female college professors and
the slope coefficient β tells by how much the mean salary of a male college professor
differs from the mean salary of his female counterpart, with α + β reflecting the mean
salary of the male college professor. A test of the null hypothesis that there is no sex
discrimination (H0: β = 0) can be easily made by running regression (1.1) in the usual
manner and finding out whether, on the basis of the t-test, the estimated β is
statistically significant.
Now consider the following model:
Yi = α1 + α2Di + βXi + ui ....................................(1.3)
where Yi = annual salary, Xi = years of teaching experience, and
Di = 1 if male
   = 0 otherwise
Model (1.3) contains one quantitative variable (years of teaching experience) and one
qualitative variable (sex) that has two classes (or levels, classifications, or categories),
namely, male and female. What is the meaning of this equation? Assuming, as usual,
that E(ui) = 0, we see that
E(Yi | Xi, Di = 0) = α1 + βXi ....................................(1.4)
E(Yi | Xi, Di = 1) = (α1 + α2) + βXi ....................................(1.5)
Geometrically, we have the situation shown in Fig. 1.1 (for illustration, it is assumed that
α1 > 0). In words, model (1.3) postulates that the male and female college professors'
salary functions in relation to the years of teaching experience have the same slope (β)
but different intercepts. In other words, it is assumed that the level of the male
professor's mean salary is different from that of the female professor's mean salary (by
α2), but the rate of change in the mean annual salary by years of experience is the same for both.
If the assumption of common slopes is valid, a test of the hypothesis that the two
regressions (1.4) and (1.5) have the same intercept (i.e., there is no sex discrimination)
can be made easily by running the regression (1.3) and noting the statistical
significance of the estimated α2 on the basis of the traditional t-test. If the t-test
shows that α̂2 is statistically significant, we reject the null hypothesis that the male and
female college professors' levels of mean annual salary are the same.
Before proceeding further, note the following features of the dummy variable regression
model considered previously.
1. To distinguish the two categories, male and female, we have introduced only one
dummy variable Di. If Di = 1 indicates a male, then when Di = 0 we know that it is
a female, since there are only two possible outcomes. Hence, one dummy variable
suffices to distinguish two categories. The general rule is this: If a qualitative
variable has 'm' categories, introduce only 'm − 1' dummy variables.
we introduced only a single dummy variable. If this rule is not followed, we shall
fall into what might be called the dummy variable trap, that is, the situation of
perfect multicollinearity.
2. The assignment of 1 and 0 values to two categories, such as male and female, is
arbitrary in the sense that in our example we could have assigned D=1 for
female and D=0 for male.
Suppose that, on the basis of the cross-sectional data, we want to regress the annual
expenditure on health care by an individual on the income and education of the
individual. Since the variable education is qualitative in nature, suppose we consider
three mutually exclusive levels of education: less than high school, high school, and
college. Now, unlike the previous case, we have more than two categories of the
qualitative variable education. Therefore, following the rule that the number of
dummies be one less than the number of categories of the variable, we should
introduce two dummies to take care of the three levels of education. Assuming that the
three educational groups have a common slope but different intercepts in the
regression of annual expenditure on health care on annual income, we can use the
following model:
Yi = α1 + α2D2i + α3D3i + βXi + ui
where Yi = annual expenditure on health care, Xi = annual income,
D2i = 1 if high school education
    = 0 otherwise
D3i = 1 if college education
    = 0 otherwise
Note that in the preceding assignment of the dummy variables we are arbitrarily
treating the “less than high school education” category as the base category.
Therefore, the intercept α1 will reflect the intercept for this category. The differential
intercepts α2 and α3 tell by how much the intercepts of the other two categories differ
from the intercept of the base category, which can be readily checked as follows (assuming E(ui) = 0):
E(Yi | D2 = 0, D3 = 0, Xi) = α1 + βXi
E(Yi | D2 = 1, D3 = 0, Xi) = (α1 + α2) + βXi
E(Yi | D2 = 0, D3 = 1, Xi) = (α1 + α3) + βXi
which are, respectively, the mean health care expenditure functions for the three levels
of education, namely, less than high school, high school, and college. Geometrically, the
picture is one of three parallel regression lines, one for each education category,
differing only in their intercepts.
The technique of dummy variable can be easily extended to handle more than one
qualitative variable. Let us revert to the college professors’ salary regression (1.3), but
now assume that in addition to years of teaching experience and sex, the skin color of
the teacher is also an important determinant of salary. For simplicity, assume that color
has two categories: black and white. We can now write (1.3) as:
Yi = α1 + α2D2i + α3D3i + βXi + ui
where Yi = annual salary, Xi = years of teaching experience,
D2i = 1 if male
    = 0 otherwise
D3i = 1 if white
    = 0 otherwise
Notice that each of the two qualitative variables, sex and color, has two categories and
hence needs one dummy variable. Note also that the omitted, or base,
category now is "black female professor."
Once again, it is assumed that the preceding regressions differ only in the intercepts
but not in the slope coefficient β. If the estimated α3 is
statistically significant, it will mean that color does affect a professor's salary. Similarly,
if α2 is statistically significant, it will mean that sex also affects a professor's salary. If
both these differential intercepts are statistically significant, it would mean sex as well
as color is an important determinant of professors’ salaries. From the preceding
discussion it follows that we can extend our model to include more than one
quantitative variable and more than two qualitative variables. The only precaution to be
taken is that the number of dummies for each qualitative variable should be one less
than the number of categories of that variable.
1.4. Dummy as Dependent Variable
1.4.1. Introduction
In all the regression models that we have considered so far, we have implicitly assumed
that the dependent variable Y is quantitative, whereas the explanatory variables are
either quantitative, qualitative (or dummy), or a mixture thereof. In fact, in the previous
sections, on dummy variables, we saw how the dummy explanatory variables are
introduced in a regression model and what role they play in specific situations. In this
section we consider several models in which the dependent variable itself is qualitative
in nature. Although increasingly used in various areas of social sciences and medical
research, qualitative response regression models pose interesting estimation and
interpretation challenges.
Example: consider a variable Y describing whether a family owns a house.
If it owns a house, Y takes a value of 1, and 0 if it does not. There are several such
examples where the dependent variable is dichotomous. A unique feature of all the
examples is that the dependent variable is of the type that elicits a yes or no response;
that is, it is dichotomous in nature. Now before we discuss the concept of qualitative
response models (QRM) involving dichotomous response variables, it is important to
note a fundamental difference between a regression model where the dependent
variable Y is quantitative and a model where it is qualitative. In a model where Y is
quantitative, our objective is to estimate its expected, or mean, value given the values
of the explanatory variables. In terms of Econometrics I, what we want is E(Yi | X1i, X2i,
. . . , Xki), where the X's are explanatory variables, both quantitative and qualitative. In
models where Y is qualitative, our objective is to find the probability of something
happening, such as owning a house or participating in the labour force. Hence, qualitative
response regression models are often known as probability models.
Now we start our discussion of qualitative response models (QRM). These are models
in which the dependent variable is a discrete outcome. These models are analyzed in a
general framework of probability models. There are two broad categories of QRM:
1. Binomial (binary) models, in which the dependent variable Y takes on only two values
(i.e., 0 and 1). Conventional regression methods cannot be used to analyze such a
qualitative dependent variable model.
2. Multinomial models, in which the dependent variable Y takes on more than two
alternatives; for instance, a choice among three alternatives may be coded Y = 1, Y = 2,
and Y = 3 otherwise.
Since the discussion of multinomial QRMs is out of the scope of this course, we start our
study of qualitative response models by considering the binary response regression
model. There are four approaches to estimating a probability model for a binary
response variable: the linear probability model (LPM), the logit model, the probit model,
and the tobit model.
The linear probability model is the regression model applied to a binary dependent
variable. To fix ideas, consider the following simple model:
Yi = β0 + β1Xi + Ui ..............................................(1)
where Xi is family income and Yi = 1 if the family owns a house, 0 otherwise. Taking
expectations and using E(Ui) = 0,
E(Yi | Xi) = β0 + β1Xi ..............................................(2)
Now, letting Pi = probability that Yi = 1 (that is, that the event occurs) and 1 − Pi =
probability that Yi = 0 (that is, that the event does not occur), the variable Yi has the
following distribution:
Yi        Probability
0         1 − Pi
1         Pi
Total     1
Therefore, by the definition of mathematical expectation, E(Yi) = 0(1 − Pi) + 1(Pi) = Pi;
i.e., the conditional expectation of model (1) can, in fact, be interpreted as the
conditional probability of Yi. Since the probability Pi must lie between 0 and 1, we have
the restriction 0 ≤ E(Yi | Xi) ≤ 1, i.e., the conditional expectation, or conditional
probability, must lie between 0 and 1.
Example: The LPM estimated by OLS (on home ownership) is given as follows:
Ŷi = −0.9457 + 0.1021Xi
se = (0.1228)  (0.0082)
The intercept of −0.9457 gives the "probability" that a family with zero income
will own a house. Since this value is negative, and since a probability cannot be
negative, we treat this value as zero. The slope value of 0.1021 means that for a
unit change in income, on average the probability of owning a house
increases by 0.1021, or about 10 percent. This is so regardless of the level of
income at which the change occurs, which seems patently unrealistic. In reality
one would expect that Pi is non-linearly related to Xi (see the next section).
From the preceding discussion it would seem that OLS can be easily extended to binary
dependent variable regression models. So, perhaps there is nothing new here.
Unfortunately, this is not the case, for the LPM poses several problems. That is, while
the interpretation of the parameters is unaffected by having a binary outcome, several
assumptions of the LPM are necessarily violated.
1. Heteroscedasticity
The variance of the disturbance term depends on the X's and is thus not constant. To
see this, note that Ui has the following probability distribution:
Ui                              Probability
−β0 − β1Xi    (when Yi = 0)     1 − Pi
1 − β0 − β1Xi (when Yi = 1)     Pi
Now, by definition, Var(Ui) = E(Ui²) − [E(Ui)]² = E(Ui²), since E(Ui) = 0 for all i and
there is no serial correlation by assumption. Hence
Var(Ui) = (−β0 − β1Xi)²(1 − Pi) + (1 − β0 − β1Xi)²(Pi)
        = (−β0 − β1Xi)²(1 − β0 − β1Xi) + (1 − β0 − β1Xi)²(β0 + β1Xi)
        = (β0 + β1Xi)(1 − β0 − β1Xi)
        = Pi(1 − Pi)
which obviously varies with Xi. Consequently, the OLS estimator of β is inefficient and
the standard errors are biased, resulting in incorrect tests.
2. Non-normality of Ui
Although OLS does not require the disturbances (U's) to be normally distributed, we
assumed them to be so distributed for the purpose of statistical inference, that is,
hypothesis testing, etc. But the assumption of normality for Ui is no longer tenable for
the LPM because, like Yi, Ui takes on only two values:
Ui = Yi − β0 − β1Xi
Now when Yi = 1, Ui = 1 − β0 − β1Xi, and when Yi = 0, Ui = −β0 − β1Xi.
3. Non-fulfillment of 0 ≤ E(Yi | Xi) ≤ 1
The LPM can produce predicted values outside the admissible range of probabilities [0, 1]:
it may predict values of Y that are negative or greater than 1. This is the real problem with
the OLS estimation of the LPM.
4. Functional Form:
Since the model is linear, a unit increase in X results in a constant change of β in the
probability of an event, holding all other variables constant. The increase is the same
regardless of the current value of X. In many applications, this is unrealistic. When the
outcome is a probability, it is often substantively reasonable that the effects of
independent variables will have diminishing returns as the predicted probability
approaches 0 or 1.
Remark: Because of the above-mentioned problems, the LPM is not
recommended for empirical work.
Therefore, what we need is a (probability) model that has these two features: (1) As Xi
increases, Pi = E(Y = 1 | X) increases but never steps outside the 0–1 interval, and (2)
the relationship between Pi and Xi is nonlinear, that is, “one which approaches zero at
slower and slower rates as Xi gets small and approaches one at slower and slower rates
as Xi gets very large.’’ Geometrically, the model we want would look something like
Figure 1.1.
[Figure: S-shaped curve of a cumulative distribution function (CDF), approaching 0 as X → −∞ and 1 as X → +∞.]
The above S-shaped curve is very similar to the cumulative distribution function
(CDF) of a random variable. (Note that the CDF of a random variable X is simply the
probability that it takes a value less than or equal to X0, where X0 is some specified
numerical value of X. In short, F(X), the CDF of X, is F(X = X0) = P(X ≤ X0).) Therefore,
one can easily use a CDF to model regressions where the response variable is
dichotomous, taking 0-1 values.
The CDFs commonly chosen to represent the 0-1 response models are:
1) the logistic – which gives rise to the logit model, and
2) the normal – which gives rise to the probit (or normit) model.
Now let us see how one can estimate and interpret the logit model. Recall that the LPM
for home ownership was
Pi = E(Y = 1 | Xi) = β0 + β1Xi
where X is income and Y = 1 means the family owns a house. Now consider the
following representation of home ownership:
Pi = E(Y = 1 | Xi) = 1/(1 + e^(−Zi)) = e^(Zi)/(1 + e^(Zi)),  where Zi = β0 + β1Xi
This is known as the (cumulative) logistic distribution function. As Zi ranges from −∞ to
+∞, Pi ranges between 0 and 1, and Pi is non-linearly related to Zi (i.e., Xi), thus
satisfying the two requirements considered earlier. However, the above equation is
non-linear in both the X's and the β's, which means we cannot use the familiar OLS
procedure to estimate the parameters. It can, however, be linearized as follows. If Pi is
the probability of owning a house, then (1 − Pi), the probability of not owning a house, is
1 − Pi = 1/(1 + e^(Zi))
so that
Pi/(1 − Pi) = (1 + e^(Zi))/(1 + e^(−Zi)) = e^(Zi)
Now Pi/(1 − Pi) is simply the odds ratio in favor of owning a house: the ratio of the
probability that a family will own a house to the probability that it will not own a house.
Taking the natural log of the odds ratio we obtain
Li = ln[Pi/(1 − Pi)] = Zi = β0 + β1Xi
L (the log of the odds ratio) is linear in X as well as in the parameters (the β's). L is called
the logit, and hence the name logit model. For estimation we write the model with a
disturbance term as
Li = ln[Pi/(1 − Pi)] = β0 + β1Xi + Ui
To estimate the above model we need values of Xi and Li. However, for individual
observations standard OLS cannot be applied, since the values of Li are then meaningless:
recall that Pi = 1 if a family owns a house and Pi = 0 if it does not own a house, and if we
put these values directly into the logit Li we obtain ln(1/0) and ln(0/1), both of which are
undefined. For individual-level data, therefore, the logit model is estimated by the method
of maximum likelihood (ML).
β0, the intercept, gives the value of the log-odds in favor of owning a house if
income is zero. Like most interpretations of intercepts, this interpretation may not have
any physical meaning.
Example: Logit estimates. Assume that the logit is linearly related to the variables
X1, ..., X5 as follows:
Li = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + Ui
The variables X1, X2 and X3 are statistically significant at the 99% confidence level, and
the variable X4 is significant at the 90% confidence level. The estimated result shows
that the variables X1, X2 and X3 have a negative effect on the probability of the event
occurring (i.e., Y = 1), while the variable X5 has a positive effect on the probability of
the event occurring.
Note: the parameters of the model are not the same as the marginal effects we are used to
when analyzing OLS results.
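For readers who want to see the estimation in practice, the following is a minimal sketch of logit estimation by maximum likelihood in Python with statsmodels; the data and variable names are hypothetical, and the reported coefficients are on the log-odds scale discussed above.

import numpy as np
import statsmodels.api as sm

# Hypothetical home-ownership data: Y = 1 if the family owns a house
rng = np.random.default_rng(1)
income = rng.uniform(5, 30, size=500)
p = 1.0 / (1.0 + np.exp(-(-4.0 + 0.25 * income)))   # assumed true logistic probabilities
owns_house = rng.binomial(1, p)

X = sm.add_constant(income)
logit_res = sm.Logit(owns_house, X).fit(disp=0)     # maximum likelihood estimation
print(logit_res.params)                # beta_0 (log-odds at zero income), beta_1 (change in log-odds per unit of income)
print(np.exp(logit_res.params[1]))     # odds ratio for a one-unit increase in income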
The estimating model that emerges from the normal CDF is popularly known as the
probit model. Here the observed dependent variable Y takes on one of the values 0
and 1 according to the following criterion. Define a latent variable Yi* such that
Yi* = Xi'β + εi
Yi = 1 if Yi* > 0
Yi = 0 if Yi* ≤ 0
The latent variable Y* is continuous (−∞ < Y* < ∞). It generates the observed binary
variable Y. The latent variable is assumed to be a linear function of the observed X's
through the structural model. Note that Y* is called a latent variable because it is unobserved.
To motivate the probit model, assume that in our home ownership example the decision
of the family to own a house or not depends on an unobservable utility index (a latent
variable) that is determined by income:
Yi* = β1 + β2Xi + εi
Given the assumption of normality of εi, the probability that Yi* is positive (so that
Yi = 1) can be computed from the standardized normal CDF as:
P(Yi = 1 | Xi) = P(Yi* > 0) = P(εi > −(β1 + β2Xi)) = Φ(β1 + β2Xi)
where Φ(·) denotes the standard normal CDF (using the symmetry of the normal distribution).
Since Y* is continuous the model avoids the problems inherent in the LPM model (i.e.,
the problem of non-normality of the error term and heteroscedasticity). However, since
the latent dependent variable is unobserved the model cannot be estimated using OLS.
Maximum likelihood can be used instead.
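A minimal sketch of probit estimation, again with statsmodels and hypothetical data; it mimics the latent-variable formulation above, in which only the 0-1 outcome generated from Y* is observed.

import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(2)
income = rng.uniform(5, 30, size=500)
y_star = -2.5 + 0.15 * income + rng.normal(size=500)   # latent variable Y* (assumed data-generating process)
y = (y_star > 0).astype(int)                           # observed binary outcome

X = sm.add_constant(income)
probit_res = sm.Probit(y, X).fit(disp=0)               # maximum likelihood estimation
print(probit_res.params)
# Estimated P(Y = 1 | income) for a hypothetical income of 20:
print(norm.cdf(probit_res.params[0] + probit_res.params[1] * 20))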
Most often, the choice is between normal errors and logistic errors, resulting in the
probit (normit) and logit models, respectively. The coefficients derived from the
maximum likelihood (ML) function will be the coefficients of the probit model if we
assume a normal distribution; if we assume that the appropriate distribution of the
error term is a logistic distribution, the coefficients that we get from the ML function will
be the coefficients of the logit model. In both cases, as with the LPM, it is assumed that
E[εi | Xi] = 0. In the probit model it is assumed that εi | Xi ~ N(0, 1), while in the logit
model εi is assumed to follow the (standard) logistic distribution, which has variance π²/3.
Hence the estimates of the parameters (β's) from the two models are not directly
comparable. But, as Amemiya suggests, a logit estimate of a parameter multiplied by
0.625 gives a fairly good approximation of the probit estimate of the same parameter.
Similarly, the coefficients of the LPM and logit models are related as follows:
βLPM ≈ 0.25 βLogit, except for the intercept, where βLPM ≈ 0.25 βLogit + 0.5.
In the linear regression model, as you have discussed in Econometrics I, the slope
coefficient measures the change in the average value of the regressand for a unit
change in the value of a regressor, with all other variables held constant.
In the LPM, the slope coefficient measures directly the change in the probability of an
event occurring as the result of a unit change in the value of a regressor, with the
effect of all other variables held constant. In the case of the linear model the marginal
effect is thus simply the coefficient itself: ∂E(Yi | Xi)/∂Xk = βk.
However, the coefficients in logit and probit models are not easy to interpret directly.
Hence, we have to derive marginal effects to compare results from different models.
That is, we need the derivative of the probability that Yi equals 1 with respect to the
kth explanatory variable. Algebraically,
for the logit model: ∂Pi/∂Xk = Λ(Xi'β)[1 − Λ(Xi'β)] βk = Pi(1 − Pi) βk
for the probit model: ∂Pi/∂Xk = φ(Xi'β) βk
where Λ(·) and φ(·) denote the standard logistic CDF and the standard normal density
function, respectively. The marginal effects depend upon the values of the X's at which
they are evaluated. The sign of a marginal effect corresponds to the sign of its
coefficient and gives the direction of the effect of the explanatory variable X on the
dependent variable Y.
Therefore, in the logit model the slope coefficient of a variable gives the change in the
log of the odds associated with a unit change in that variable, again holding all other
variables constant. As noted previously, for the logit model the rate of change in the
probability of an event happening is given by βk Pi(1 − Pi), where βk is the (partial
regression) coefficient of the kth regressor; but in evaluating Pi, all the variables
included in the analysis are involved.
In the probit model, as we saw earlier, the rate of change in the probability is given
by βk φ(Xi'β), where φ is the density function of the standard normal variable. Thus, in both the
logit and probit models all the regressors are involved in computing the changes in
probability, whereas in the LPM only the coefficient of the kth regressor is involved.
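The marginal effects described above can be computed directly from the fitted models. A minimal sketch with statsmodels and hypothetical data; get_margeff(at="overall") returns average marginal effects, i.e., the sample average of Pi(1 − Pi)βk for the logit and of φ(Xi'β)βk for the probit.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
income = rng.uniform(5, 30, size=500)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-4.0 + 0.25 * income))))
X = sm.add_constant(income)

logit_res = sm.Logit(y, X).fit(disp=0)
probit_res = sm.Probit(y, X).fit(disp=0)

# Raw slope coefficients are on different scales and not directly comparable
print(logit_res.params[1], probit_res.params[1])

# Average marginal effects are comparable across the two models
print(logit_res.get_margeff(at="overall").summary())
print(probit_res.get_margeff(at="overall").summary())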
Review Questions
2. What are the common and major features of the dummy variable regression
model? State them.
5. Explain or outline the similarities and differences between the probit and logit
models.
7. Can we use the standard OLS method to estimate the probit and logit models?
Why?
C. What is the effect of a unit increase in the reading quotient on the odds of
having a higher murder rate?
Chapter Two
Topic Outline
1. Introduction
7. Review questions
2.1 Introduction
Time series data is one of the three important types of data used in empirical
analysis, the others being cross-sectional and pooled/panel data. Time-series data
have become so frequently and intensively used in empirical research that
econometricians have begun to pay very careful attention to such data. In this
chapter, we first give an overview of the nature of time-series data and then
discuss the concept of a stochastic process (both stationary and non-stationary),
integrated stochastic processes, unit root tests and the transformation of
non-stationary time series.
Recall that one of the important types of data used in empirical analysis is time series
data.
A time series data set consists of observations on a variable or several variables over
time. These are data collected over time, for instance weekly, monthly,
quarterly, semiannually, annually, etc. Examples of time series data include stock
prices, money supply, consumer price index, gross domestic product, and automobile
sales figures. Because past events can influence future events and lags in behavior are
prevalent in the social sciences, time is an important dimension in time series data set.
Unlike the arrangement of cross-sectional data, the chronological ordering of
observations in a time series conveys potentially important information. In this and the
following sections we take a closer look at such data not only because of the frequency
with which they are used in practice but also because they pose several challenges to
econometricians and practitioners.
From a theoretical point of view, a time series is a collection of random variables (X t).
Such a collection of random variables ordered in time is called a stochastic process.
Loosely speaking, a random or stochastic process is a collection of random variables
ordered in time. The word stochastic has a Greek origin and means "pertaining to
chance." If we let Y denote a random variable, and if it is continuous, we denote it as
Y(t), but if it is discrete, we denote it as Yt. We distinguish two types of stochastic
process: stationary and non-stationary stochastic processes.
A type of stochastic process that has received a great deal of attention from time series
analysts is the so-called stationary stochastic process. A stochastic process is said to be
stationary if its mean and variance are constant over time and the value of the
covariance between two time periods depends only on the distance or gap or lag
between the two time periods and not on the actual time at which the covariance is
computed. In the time series literature, such a stochastic process is known as a weakly
stationary, covariance stationary, second-order stationary, or wide-sense stationary
stochastic process. To explain stationarity, let Yt be a stochastic time series with these
properties:
Mean: E(Yt) = μ (2.1)
Variance: Var(Yt) = E(Yt − μ)² = σ² (2.2)
Covariance: γk = E[(Yt − μ)(Yt+k − μ)] (2.3)
where γk, the covariance (or autocovariance) at lag k, is the covariance between the
values of Yt and Yt+k, that is, between two Y values k periods apart. If k = 0, we obtain γ0,
which is simply the variance of Y (= σ²); if k = 1, γ1 is the covariance between two adjacent
values of Y (recall the first-order autoregressive scheme).
Suppose we shift the origin of Y from Yt to Yt+m (for instance, from the first quarter of 1970 to
the first quarter of 1975, say, for GDP data). Now if Yt is to be stationary, the mean,
variance, and autocovariances of Yt+m must be the same as those of Yt. In short, if a time
series is stationary, its mean, variance, and autocovariance (at various lags) remain the
same no matter at what point we measure them; that is, they are time invariant. Such
a time series will tend to return to its mean (called mean reversion) and fluctuations
around this mean (measured by its variance) will remain constant.
Why are stationary time series so important? Because if a time series is non-stationary,
we can study its behavior only for the time period under consideration. Each set of time
series data will therefore be for a particular time period. As a consequence, it is not
possible to generalize it to other time periods. Therefore, for the purpose of forecasting,
such (non-stationary) time series may be of little practical value. We call a stochastic
process purely random (a white noise process) if it has zero mean, constant variance σ²,
and is serially uncorrelated. Loosely speaking, the error term ut is assumed to be a white
noise process if ut ~ IIDN(0, σ²); that is, ut is independently and identically distributed as a
normal distribution with zero mean and constant variance.
Although our interest is in stationary time series, one often encounters non-stationary
time series, the classic example being the random walk model (RWM). That is, a
non-stationary time series will have a time varying mean or a time-varying variance or
both. It is often said that asset prices, such as stock prices or exchange rates, follow a
random walk; that is, they are non-stationary. We distinguish two types of random
walks: (1) random walk without drift (i.e., no constant or intercept term) and (2)
random walk with drift (i.e., a constant term is present).
A) Random Walk without Drift. Suppose ut is a white noise error term with mean 0
and variance σ². Then the series Yt is said to be a random walk if
Yt = Yt−1 + ut (2.4)
In the random walk model, as (2.4) shows, the value of Y at time t is equal to its value
at time (t − 1) plus a random shock; thus it is an AR(1) model. We can think of (2.4) as
a regression of Y at time t on its value lagged one period. Believers in the efficient
capital market hypothesis argue that stock prices are essentially random and
therefore there is no scope for profitable speculation in the stock market: If one could
predict tomorrow’s price on the basis of today’s price, we would all be millionaires.
Now write
Y1 = Y0 + u1
Y2 = Y1 + u2 = Y0 + u1 + u2
Y3 = Y2 + u3 = Y0 + u1 + u2 + u3
In general, if the process started at some time 0 with a value of Y0, we have
Yt = Y0 + Σut (2.5)
Therefore,
E(Yt) = Y0 (2.6)
Var(Yt) = tσ² (2.7)
As the preceding expressions show, the mean of Y is equal to its initial, or starting,
value, which is constant, but as t increases, its variance increases indefinitely, thus
violating a condition of stationarity. In short, the RWM without drift is a non-stationary
stochastic process. In practice Y0 is often set at zero, in which case E(Yt) = 0.
An interesting feature of the RWM is that
Yt − Yt−1 = ΔYt = ut (2.8)
where Δ is the first difference operator. It is easy to show that, while Yt is non-stationary,
its first difference is stationary. In other words, the first differences of a random walk
time series are stationary.
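The behaviour of a random walk and of its first difference is easy to see in a short simulation. A minimal sketch in Python with numpy; the series length, seed and variance are arbitrary choices.

import numpy as np

rng = np.random.default_rng(42)
T = 1000
u = rng.normal(0, 1, size=T)     # white noise shocks
Y = np.cumsum(u)                 # random walk without drift: Y_t = Y_{t-1} + u_t, with Y_0 = 0
dY = np.diff(Y)                  # first difference: delta Y_t = u_t

# The level series wanders and its sample variance grows roughly like t * sigma^2
print(Y[:200].var(), Y.var())
# The first difference behaves like white noise: mean near 0, variance near 1
print(dY.mean(), dY.var())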
B) Random Walk with Drift. The RWM with drift is written as
Yt = δ + Yt−1 + ut (2.9)
where δ is known as the drift parameter. The name drift comes from the fact that if
we write the preceding equation as
Yt − Yt−1 = ΔYt = δ + ut (2.10)
it shows that Yt drifts upward or downward, depending on δ being positive or negative.
Following the same procedure as before, it can be shown that
E(Yt) = Y0 + t·δ (2.11)
Var(Yt) = tσ² (2.12)
Hence, for RWM with drift the mean as well as the variance increases over time, again
violating the conditions of stationarity. In short, RWM, with or without drift, is a non-
stationary stochastic process.
Now consider the following model:
Yt = ρYt−1 + ut,  −1 ≤ ρ ≤ 1 (2.13)
This model resembles the Markov first-order autoregressive model that we discussed in
the case of autocorrelation. If ρ = 1, (2.13) becomes a RWM (without drift). If ρ is in
fact 1, we face what is known as the unit root problem, that is, a situation of non-
stationarity; we already know that in this case the variance of Yt is not stationary. The
name unit root is due to the fact that ρ = 1 (see Note 2 below). Thus the terms
non-stationarity, random walk, and unit root can be treated as synonymous.
If, however, |ρ| < 1, that is, if the absolute value of ρ is less than one, then it can be
shown that the time series Yt is stationary in the sense we have defined it (see Note 3
below). In practice, then, it is important to find out if a time series possesses a unit root.
In the next section we will discuss tests of unit root, that is, tests of stationarity.
Note 2 (a technical point): If ρ = 1, we can write (2.13) as Yt − Yt−1 = ut. Now, using the
lag operator L, so that LYt = Yt−1, L²Yt = Yt−2, and so on, we can write (2.13) as
(1 − L)Yt = ut. The term unit root refers to the root of the polynomial in the lag operator:
if we set (1 − L) = 0, we obtain L = 1, hence the name unit root.
Note 3: If in (2.13) it is assumed that the initial value of Y (= Y0) is zero, that |ρ| < 1,
and that ut is white noise distributed normally with zero mean and unit variance, then it
follows that E(Yt) = 0 and Var(Yt) = 1/(1 − ρ²). Since both of these are constants, by the
definition of stationarity Yt is stationary. On the other hand, as we saw before, if ρ = 1,
Yt is a random walk, i.e., non-stationary.
The distinction between stationary and non-stationary stochastic processes (or time
series) has a crucial bearing on whether the trend in a time series is deterministic or
stochastic. Broadly speaking, if the trend in a time series is completely predictable and
not variable, we call it a deterministic trend, whereas if it is not predictable, we call it a
stochastic trend. To make the definition more formal, consider the following model of
the time series Yt:
Yt = β1 + β2t + β3Yt−1 + ut (2.14)
where ut is a white noise error term and t is time measured chronologically. Now
we have the following possibilities:
1. Pure random walk: If β1 = 0, β2 = 0 and β3 = 1, (2.14) becomes
Yt = Yt−1 + ut (2.15)
which is nothing but a RWM without drift and is therefore non-stationary. But note that,
if we write (2.15) as
ΔYt = Yt − Yt−1 = ut (2.8)
it becomes stationary, as noted before. Hence, a RWM without drift is a difference
stationary process (DSP).
2. Random walk with drift: If β1 ≠ 0, β2 = 0 and β3 = 1, (2.14) becomes
Yt = β1 + Yt−1 + ut (2.16a)
which is a random walk with drift and is therefore non-stationary. If we write it as
Yt − Yt−1 = ΔYt = β1 + ut (2.16b)
this means Yt will exhibit a positive (β1 > 0) or negative (β1 < 0) trend. Such a trend is
called a stochastic trend. Equation (2.16b) is a DSP process because the non-stationarity
in Yt can be eliminated by taking first differences of the time series.
3. Deterministic trend: If β1 ≠ 0, β2 ≠ 0 and β3 = 0, (2.14) becomes
Yt = β1 + β2t + ut (2.17)
which is called a trend stationary process (TSP). Although the mean of Yt is β1 + β2t,
which is not constant, its variance (= σ²) is. Once the values of β1 and β2 are known, the
mean can be forecast perfectly. Therefore, if we subtract the mean of Yt from Yt, the
resulting series will be stationary, hence the name trend stationary. This procedure of
removing the (deterministic) trend is called detrending.
4. Random walk with drift and deterministic trend: If β1 ≠ 0, β2 ≠ 0 and β3 = 1, (2.14)
becomes
Yt = β1 + β2t + Yt−1 + ut (2.18)
that is, we have a random walk with drift and a deterministic trend, which can be seen
if we write this equation as
ΔYt = β1 + β2t + ut (2.19)
which means that Yt is non-stationary.
The random walk model is a specific case of a more general class of stochastic
processes known as integrated processes. Recall that the RWM without drift is non-
stationary, but its first difference, as shown in (2.8), is stationary. Therefore, we call the
RWM without drift integrated of order 1, denoted as Yt ~ I(1). Similarly, if a time series
has to be differenced twice (i.e., take the first difference of the first differences) to make
it stationary, we call such a time series integrated of order 2. In general, if a
(non-stationary) time series has to be differenced d times to make it stationary, that
time series is said to be integrated of order d, denoted as Yt ~ I(d). If a time series is
stationary to begin with (i.e., it does not require any differencing), it is said to be
integrated of order zero, denoted by Yt ~ I(0). Thus, we will use
the terms "stationary time series" and "time series integrated of order zero" to mean
the same thing. Most economic time series are generally I(1); that is, they generally
become stationary only after taking their first differences.
The following properties of integrated time series may be noted. Let Xt, Yt and Zt be
three time series.
1. If Xt ~ I(0) and Yt ~ I(1), then Zt = (Xt + Yt) ~ I(1); that is, a linear combination or sum
of stationary and non-stationary time series is non-stationary.
2. If Xt ~ I(d), then Zt = (a + bXt) ~ I(d), where a and b are constants. That is, a linear
combination of an I(d) series is also I(d). Thus, if Xt ~ I(0), then Zt = (a + bXt) ~ I(0).
3. If Xt ~ I(d1) and Yt ~ I(d2), then Zt = (aXt + bYt) ~ I(d2), where d1 < d2.
4. If Xt ~ I(d) and Yt ~ I(d), then Zt = (aXt + bYt) ~ I(d*); d* is generally equal to d, but in
some cases d* < d.
As you can see from the preceding statements, one has to pay careful attention when
combining two or more time series that are integrated of different orders.
To see why stationary time series are so important, consider the following two random
walk models:
Yt = Yt−1 + ut (2.20)
Xt = Xt−1 + vt (2.21)
where we generated 500 observations of ut from ut ~ N(0, 1) and 500 observations of vt
from vt ~ N(0, 1) and assumed that the initial values of both Y and X were zero. We also
assumed that ut and vt are serially uncorrelated as well as mutually uncorrelated. As you
know by now, both these time series are non-stationary; that is, they are I(1) or exhibit
stochastic trends.
Suppose we regress Yt on Xt. Since Yt and Xt are uncorrelated I(1) processes, the R² from
the regression of Y on X should tend to zero; that is, there should not be any relationship
between the two variables. Consider the regression results:
Variable        Coefficient     Std. error      t statistic
C               −13.2556         0.6203         −21.36856
X                 0.3376         0.0443           7.61223
R² = 0.1044     d = 0.0121
As you can see, the coefficient of X is highly statistically significant, and, although the
R2 value is low, it is statistically significantly different from zero. From these results, you
may be tempted to conclude that there is a significant statistical relationship between Y
and X, whereas a priori there should be none. This is in a nutshell the phenomenon of
spurious or nonsense regression, first discovered by Yule. Yule showed that
(spurious) correlation could persist in non-stationary time series even if the sample is
very large.
If, instead, we regress the first difference of Yt on the first difference of Xt, remember
that although Yt and Xt are individually non-stationary, their first differences are stationary.
In such a regression you will find that R² is practically zero, as it should be, and the
Durbin–Watson d is about 2.
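The spurious regression phenomenon is easy to reproduce. A minimal sketch, assuming two independently generated random walks as in (2.20) and (2.21):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
T = 500
Y = np.cumsum(rng.normal(size=T))    # random walk, I(1)
X = np.cumsum(rng.normal(size=T))    # independent random walk, I(1)

# Regression in levels: typically a "significant" slope, some R-squared, and a very low Durbin-Watson d
levels = sm.OLS(Y, sm.add_constant(X)).fit()
print(levels.tvalues[1], levels.rsquared, durbin_watson(levels.resid))

# Regression in first differences: R-squared near zero and Durbin-Watson near 2, as it should be
diffs = sm.OLS(np.diff(Y), sm.add_constant(np.diff(X))).fit()
print(diffs.tvalues[1], diffs.rsquared, durbin_watson(diffs.resid))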
By now you have a good idea about the nature of stationary stochastic processes and
their importance. In practice we face two important questions: (1) how do we find out
if a given time series is stationary? (2) If we find that a given time series is not
stationary, is there a way that it can be made stationary?
A test of stationarity (or non-stationarity) that has become widely popular over the past
several years is the unit root test. We will first explain it, and then illustrate it. To
start with, we consider equation (2.13).
Yt = ρYt−1 + ut,  −1 ≤ ρ ≤ 1 (2.13)
We know that if ρ = 1, that is, in the case of a unit root, (2.13) becomes a random
walk model without drift, which we know is a non-stationary stochastic process.
Therefore, to test whether the series Yt is stationary or not, we can regress Yt on its
(one-period) lagged value Yt−1 and find out if the estimated ρ is statistically equal to 1.
If it is, then Yt is non-stationary. This is the general idea behind the unit root test of
stationarity. For theoretical reasons, we manipulate (2.13) as follows: subtract Yt−1
from both sides of (2.13) to obtain
Yt − Yt−1 = ρYt−1 − Yt−1 + ut = (ρ − 1)Yt−1 + ut (2.22)
which can be written as
ΔYt = δYt−1 + ut (2.23)
where δ = (ρ − 1) and Δ is the first-difference operator.
In practice, therefore, instead of estimating (2.13), we estimate (2.23) and test the
(null) hypothesis that δ = 0. If δ = 0, then ρ = 1, that is, we have a unit root, meaning
the time series under consideration is non-stationary. Before we proceed to estimate
(2.23), it may be noted that if δ = 0, (2.23) will become
ΔYt = (Yt − Yt−1) = ut (2.24)
Since ut is a white noise error term, it is stationary, which means that the first differences
of a random walk time series are stationary, a point we have already made before.
Now let us turn to the estimation of (2.23). We take the first differences of Yt and regress
them on Yt−1 and see if the estimated slope coefficient in this regression (= δ̂) is zero or not.
If it is zero, we conclude that Yt is non-stationary. But if it is negative, we conclude that Yt is
stationary. The only question is which test we use to find out if the estimated
coefficient of Yt−1 in (2.23) is zero or not. You might be tempted to say, why not use the
usual t test? Unfortunately, under the null hypothesis that δ = 0 (i.e., ρ = 1), the t
value of the estimated coefficient of Yt−1 does not follow the t distribution even in large
samples; that is, it does not have an asymptotic normal distribution. The t-test is therefore
not applicable; instead we use the widely used Dickey–Fuller test.
Dickey and Fuller have shown that under the null hypothesis that δ = 0, the estimated t
value of the coefficient of Yt−1 in (2.23) follows the τ (tau) statistic. A sample of these
critical values is given in the Dickey–Fuller (DF) table. In the literature the tau statistic or
test is known as the Dickey–Fuller (DF) test, in honor of its discoverers.
For theoretical and practical reasons, the DF test is applied to regressions run in the
following three forms:
ΔYt = δYt−1 + ut (2.23)
ΔYt = β1 + δYt−1 + ut (2.25)
ΔYt = β1 + β2t + δYt−1 + ut (2.26)
where t is the time or trend variable. In each case, the null hypothesis is that δ = 0;
that is, there is a unit root and the time series is non-stationary. The alternative hypothesis
is that δ is less than zero; that is, the time series is stationary. If the null hypothesis is
rejected, it means that Yt is a stationary time series with zero mean in the case of (2.23),
that Yt is stationary with a non-zero mean [= β1/(1 − ρ)] in the case of (2.25), and that Yt is
stationary around a deterministic trend in (2.26).
It is extremely important to note that the critical values of the tau test for the
hypothesis that δ = 0 are different for each of the preceding three specifications of the
DF test, as can be seen from the tau table. Moreover, if, say, specification (2.25) is correct,
but we estimate (2.23), we will be committing a specification error, whose
consequences we already know from Chapter 4 of Econometrics I.
Example: applying the DF test to a GDP time series, our primary interest is in the t (= τ)
value of the coefficient of Yt−1. The critical 1, 5, and 10 percent τ values are −3.5064,
−2.8947, and −2.5842. The estimated δ coefficient is negative, implying that the
estimated ρ is less than 1. The estimated τ value is −0.2191, which in absolute value is
below even the 10 percent critical value of −2.5842. Since, in absolute terms, the former
is smaller than the latter, our conclusion is that the GDP time series is not stationary.
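In practice the (augmented) Dickey–Fuller test is available in standard software. A minimal sketch using the adfuller function in statsmodels, applied to a simulated I(1) series that stands in for the GDP data discussed above; the numbers are hypothetical.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
gdp = 100 + np.cumsum(rng.normal(1.0, 2.0, size=200))   # simulated random walk with drift

# DF/ADF test on the level of the series (regression with a constant, as in (2.25))
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(gdp, regression="c")
print(stat, pvalue, crit)            # tau statistic, p-value and the DF critical values

# The same test on the first difference: the unit root should now be rejected
stat_d, pvalue_d, *_ = adfuller(np.diff(gdp), regression="c")
print(stat_d, pvalue_d)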
Now that we know the problems associated with non-stationary time series, the
practical question is what to do. To avoid the spurious regression problem that may
arise from regressing a non-stationary time series on one or more non-stationary time
series, we have to transform non-stationary time series to make them stationary. The
transformation method depends on whether the time series are difference stationary
(DSP) or trend stationary (TSP). We consider each of these methods in turn.
Difference-Stationary Processes
If a time series has a unit root, the first differences of such a time series are stationary.
Therefore, the solution here is to take the first differences of the time series. Returning
to our U.S. GDP time series, we have already seen that it has a unit root. Let us now
see what happens if we take the first differences of the GDP series and apply the DF
test to the differenced series. The estimated regression is
Δ(ΔGDP)t = 16.0049 − 0.06827 ΔGDPt−1
and the estimated δ is now negative and, on the basis of the τ test, statistically
significant, so the first-differenced GDP series is stationary.
Trend-Stationary Process
The simplest way to make a trend-stationary time series stationary is to regress
it on time; the residuals from this regression will then be stationary. In other words,
run the following regression:
Yt = β1 + β2t + ut (2.29)
where Yt is the time series under study and t is the trend variable measured
chronologically. Now
ût = (Yt − β̂1 − β̂2t) (2.30)
is the (linearly) detrended series, which will be stationary.
It should be pointed out that if a time series is DSP but we treat it as TSP, this is called
underdifferencing. On the other hand, if a time series is TSP but we treat it as DSP,
this is called overdifferencing. The consequences of these types of specification
errors can be serious, depending on how one handles the serial correlation properties of
the resulting error terms.
In passing it may be noted that most macroeconomic time series are DSP rather than
TSP.
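A minimal sketch of detrending a trend-stationary series as in (2.29) and (2.30): regress the series on a constant and a time trend, and keep the residuals. The simulated series and its parameters are hypothetical.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 200
t = np.arange(T)
y = 10 + 0.5 * t + rng.normal(0, 3, size=T)    # deterministic trend plus stationary noise (TSP)

trend_reg = sm.OLS(y, sm.add_constant(t)).fit()
detrended = trend_reg.resid                    # u_hat_t = y_t - b1_hat - b2_hat * t, as in (2.30)

print(trend_reg.params)                        # estimated intercept and trend slope
print(detrended.mean(), detrended.var())       # detrended series: mean near 0, stable variance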
Review Questions
4. Prove mathematically that a random walk model without drift is non-stationary
while its first difference is stationary.
Chapter Three
The purpose of this chapter is to introduce the student, very briefly, to the
concept of simultaneous dependence among economic variables.
Topic Outline
1. Introduction
3. Simultaneity bias
6. Review questions
3.1. Introduction
In all the previous chapters, we have focused exclusively on the problems and
estimation of single-equation regression models. In such models, a
dependent variable is expressed as a linear function of one or more explanatory
variables. The cause-and-effect relationship in such models between the dependent
and independent variables is unidirectional: the explanatory variables are the
cause and the dependent variable is the effect. But there are situations where such
one-way or unidirectional causation is not meaningful. This occurs if, for
instance, Y (the dependent variable) is not only a function of the X's (explanatory
variables) but all or some of the X's are, in turn, determined by Y. There is, therefore,
a two-way flow of influence between Y and (some of) the X's, which makes the
distinction between dependent and independent variables somewhat dubious. Under such
circumstances, we need to consider more than one regression equation, one for each
interdependent variable, to capture the joint flow of influence among the variables.
This is precisely what is done in simultaneous equation models and what we are going
to see in this chapter.
3.2. Simultaneity Bias
The bias arising from the application of a procedure that estimates each
equation of a simultaneous equations model as though it were a single-equation model
is known as simultaneity bias or simultaneous equation bias.
Suppose that the following two-equation structural model holds:
Y = α0 + α1X + U −−−−−−−−−− (3.1)
X = β0 + β1Y + β2Z + V
where Y and X are endogenous, Z is exogenous, and the disturbances U and V satisfy the
usual assumptions of the classical regression model (zero means, constant variances σu²
and σv², no serial correlation, and Cov(U, V) = 0).
The reduced form of X is obtained by substituting the equation for Y into the equation
for X:
X = β0 + β1(α0 + α1X + U) + β2Z + V
Solving for X gives
X = (β0 + α0β1)/(1 − α1β1) + [β2/(1 − α1β1)] Z + (β1U + V)/(1 − α1β1) −−−−−−−−−− (3.2)
Applying OLS to the first equation of the above structural model will result in a biased
estimator because Cov(Xi, Ui) = E(XiUi) ≠ 0. Now, let us prove whether this is the case or
not.
Cov(X, U) = E{[X − E(X)][U − E(U)]}
          = E{[X − E(X)]U}   (since E(U) = 0) −−−−−−−−−− (3.3)
From the reduced form (3.2), and noting that E(U) = E(V) = 0 and that Z is exogenous,
X − E(X) = (β1U + V)/(1 − α1β1)
so that
Cov(X, U) = E[ U(β1U + V)/(1 − α1β1) ]
          = [1/(1 − α1β1)] E(β1U² + UV)
          = [β1/(1 − α1β1)] E(U²)   (since E(UV) = 0)
          = β1σu²/(1 − α1β1) ≠ 0, provided β1 ≠ 0.
That is, the covariance between X and U is not zero. As a consequence, if OLS is applied
to each equation of the model separately, the coefficients will turn out to be biased. Now,
let us examine how the non-zero covariance of the error term and the explanatory
variable leads to bias in the OLS estimates of the parameters. If we apply OLS to the
first equation, Y = α0 + α1X + U, the estimator of α1 is
α̂1 = Σxy/Σx² = Σx(Y − Ȳ)/Σx² = ΣxY/Σx²
(since Σx = 0, where lower-case letters denote deviations from sample means).
Substituting Y = α0 + α1X + U gives
α̂1 = Σx(α0 + α1X + U)/Σx² = α0Σx/Σx² + α1ΣxX/Σx² + ΣxU/Σx²
But we know that Σx = 0 and ΣxX/Σx² = 1, hence
α̂1 = α1 + ΣxU/Σx² −−−−−−−−−− (3.4)
Taking expectations,
E(α̂1) = α1 + E(ΣxU/Σx²)
Since we have already proved that E(XU) ≠ 0, the second term on the right-hand side
does not vanish, so E(α̂1) ≠ α1; that is, α̂1 is a biased estimator of α1.
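The bias can also be verified numerically. The sketch below simulates the two-equation model above many times with assumed parameter values and shows that the OLS estimate of α1 does not center on its true value; all numbers are arbitrary.

import numpy as np

rng = np.random.default_rng(10)
alpha0, alpha1 = 2.0, 0.5            # assumed true structural parameters
beta0, beta1, beta2 = 1.0, 0.8, 1.5
n, reps = 500, 2000
estimates = []

for _ in range(reps):
    Z = rng.normal(size=n)           # exogenous variable
    U = rng.normal(size=n)
    V = rng.normal(size=n)
    # Reduced form of X (equation 3.2), then Y from the structural equation
    X = (beta0 + alpha0 * beta1 + beta2 * Z + beta1 * U + V) / (1 - alpha1 * beta1)
    Y = alpha0 + alpha1 * X + U
    x, y = X - X.mean(), Y - Y.mean()
    estimates.append((x @ y) / (x @ x))   # OLS slope of Y on X

print(np.mean(estimates))            # noticeably above the true alpha1 = 0.5, since Cov(X, U) > 0 here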
In what follows, we use the convention that X's symbolize the exogenous (predetermined)
variables and Y's symbolize the endogenous variables. As an example, consider the following
demand-and-supply model:
Qd = β0 + β1P + β2Y + U1 −−−−−−−−−− (3.5)
Qs = α0 + α1P + α2R + U2 −−−−−−−−−− (3.6)
Here P and Q are endogenous variables and Y and R are exogenous variables.
A structural model describes the complete structure of the relationships among the
economic variables. Structural equations of the model may be expressed in terms of
endogenous variables, exogenous variables and disturbances (random variables). The
parameters of structural model express the direct effect of each explanatory variable on
the dependent variable. Variables not appearing in any function explicitly may have an
indirect effect, which is taken into account by the simultaneous solution of the system. For
instance, a change in consumption affects the investment indirectly and is not
considered in the consumption function. The effect of consumption on investment
cannot be measured directly by any structural parameter, but is measured indirectly by
considering the system as a whole.
For example, consider the simple Keynesian model of income determination:
C = α + βY + U ---------------------------------------------------- (3.7)
Y = C + Z ---------------------------------------------------- (3.8)
where
C = consumption expenditure
Z = non-consumption (autonomous) expenditure
Y = national income
III) Reduced form of the model:
The reduced form of a structural model is the model in which the endogenous variables
are expressed as a function of the predetermined variables and the error term only.
Since C and Y are endogenous variables and only Z is exogenous, we
have to express C and Y in terms of Z. To do this, substitute Y = C + Z into equation (3.7):
C = α + β(C + Z) + U
C = α + βC + βZ + U
C − βC = α + βZ + U
C(1 − β) = α + βZ + U
C = α/(1 − β) + [β/(1 − β)] Z + U/(1 − β) ---------------------------------- (3.9)
Substituting this expression for C into Y = C + Z, we obtain
Y = α/(1 − β) + [1/(1 − β)] Z + U/(1 − β) -------------------------------- (3.10)
Equations (3.9) and (3.10) are called the reduced form of the above structural model.
We can write them more compactly as
C = π10 + π11 Z + w
Y = π20 + π21 Z + w
where π10 = π20 = α/(1 − β), π11 = β/(1 − β), π21 = 1/(1 − β) and w = U/(1 − β).
Parameters of the reduced form measure the total effect (direct and indirect) of a
change in the exogenous variable on the endogenous variable. For instance,
π21 = 1/(1 − β) measures the total effect of a unit change in Z on Y: the direct effect of 1
(through the identity Y = C + Z) plus the indirect effect β/(1 − β) that works through
consumption, since 1/(1 − β) = 1 + β/(1 − β).
Note that since the reduced-form coefficients can be estimated by the OLS method, and
since these coefficients are combinations of the structural coefficients, the possibility exists
that the structural coefficients can be "retrieved" from the reduced-form coefficients;
and it is in the estimation of the structural parameters that we may be ultimately
interested. Unfortunately, retrieving the structural coefficients from the reduced-form
coefficients is not always possible; this problem is one way of viewing the identification
problem.
An identified equation may be either exactly (or fully or just) identified or over-identified.
It is said to be exactly identified if unique numerical values of the structural parameters
can be obtained from the reduced-form coefficients. It is said to be over-identified if more
than one numerical value can be obtained for some of the parameters of the structural
equations. The circumstances under which each of these cases occurs will be shown in the
following discussion.
a) Under Identification
Consider the simple demand-and-supply model in which quantity depends on price alone:
Demand: Qt = α0 + α1Pt + u1t −−−−−−−−−− (3.13)
Supply: Qt = β0 + β1Pt + u2t −−−−−−−−−− (3.14)
with the equilibrium condition that demand equals supply.
Solving (3.13) and (3.14) using the substitution technique, we obtain the equilibrium price
Pt = π0 + vt −−−−−−−−−− (3.15)
where π0 = (β0 − α0)/(α1 − β1) and vt = (u2t − u1t)/(α1 − β1),
and the equilibrium quantity
Qt = π1 + wt −−−−−−−−−− (3.16)
where π1 = (α1β0 − α0β1)/(α1 − β1) and wt = (α1u2t − β1u1t)/(α1 − β1).
Note that π0 and π1 (the reduced-form coefficients) contain all four structural parameters:
α0, α1, β0 and β1. But there is no way in which the four structural unknowns can be
estimated from only two reduced-form coefficients. Recall from Algebra for Economists
that to estimate four unknowns we must have four (independent) equations, and in
general, to estimate k unknowns we must have k (independent) equations. What all
this means is that, given time series data on P (price) and Q (quantity) and no other
information, there is no way the researcher can be sure whether he/she is estimating the
demand function or the supply function. That is, a given Pt and Qt represent simply the
point of intersection of the appropriate demand and supply curves, because of the
equilibrium condition that demand is equal to supply.
b) Just (or Exact) Identification
The reason we could not identify the preceding demand function or the supply function
was that the same variables P and Q are present in both functions and there is no
additional information. But suppose we consider the following demand-and-supply model:
Demand: Qt = α0 + α1Pt + α2It + u1t −−−−−−−−−− (3.17)
Supply: Qt = β0 + β1Pt + β2Pt−1 + u2t −−−−−−−−−− (3.18)
where It = income of the consumer (exogenous) and Pt−1 = price lagged one period
(predetermined).
Equating demand to supply and solving, we obtain the equilibrium price
Pt = π0 + π1It + π2Pt−1 + vt −−−−−−−−−− (3.20)
where π0 = (β0 − α0)/(α1 − β1), π1 = −α2/(α1 − β1), π2 = β2/(α1 − β1) and
vt = (u2t − u1t)/(α1 − β1).
Substituting the equilibrium price (3.20) into the demand or supply equation of (3.17)
or (3.18), we obtain the corresponding equilibrium quantity:
Qt = π3 + π4It + π5Pt−1 + wt −−−−−−−−−− (3.21)
where π3 = (α1β0 − α0β1)/(α1 − β1), π4 = −α2β1/(α1 − β1), π5 = α1β2/(α1 − β1) and
wt = (α1u2t − β1u1t)/(α1 − β1).
The demand-and-supply model given in equations (3.17) and (3.18) contains six
structural coefficients (α0, α1, α2, β0, β1 and β2) and there are six reduced-form
coefficients (π0, π1, π2, π3, π4 and π5) to estimate them. Thus, we have six equations in
six unknowns, and normally we should be able to obtain unique estimates. Therefore,
the parameters of both the demand and supply equations can be identified and the
system as a whole is identified.
c) Over identification
Note that for certain goods and services, wealth of the consumer is another important
determinant of demand. Therefore, the demand function (3.17) can be modified as
follows, keeping the supply function (3.18) as before:
Demand: Qt = α0 + α1Pt + α2It + α3Rt + u1t −−−−−−−−−− (3.22)
Supply: Qt = β0 + β1Pt + β2Pt−1 + u2t −−−−−−−−−− (3.23)
where Rt = wealth of the consumer.
Equating demand to supply, we obtain the following equilibrium price and quantity:
Pt = π0 + π1It + π2Rt + π3Pt−1 + vt
Qt = π4 + π5It + π6Rt + π7Pt−1 + wt
where π0 = (β0 − α0)/(α1 − β1), π1 = −α2/(α1 − β1), π2 = −α3/(α1 − β1), π3 = β2/(α1 − β1),
π4 = (α1β0 − α0β1)/(α1 − β1), π5 = −α2β1/(α1 − β1), π6 = −α3β1/(α1 − β1) and
π7 = α1β2/(α1 − β1).
The demand-and-supply model in (3.22) and (3.23) contains seven structural
coefficients, but there are eight equations to estimate them: the eight reduced-form
coefficients given above (i.e., π0 … π7). Notice that the number of equations is greater
than the number of unknowns. As a result, unique estimation of all the parameters of
our model is not possible. For example, one can solve for β1 in the following two ways:
β1 = π5/π1 or β1 = π6/π2
That is, there are two estimates of the price coefficient in the supply function, and there
is no guarantee that these two values or solutions will be identical. Moreover, since β1
appears in the expressions for several of the other coefficients, the ambiguity in estimating
it will be transmitted to other estimates. Note that the supply function was identified in the
system (3.17) and (3.18) but is not exactly identified in the system (3.22) and (3.23),
although in both cases the supply function remains the same. This is because we have
"too much", or an over-sufficiency of, information to identify the supply curve. The
over-sufficiency of information results from the fact that in the model (3.17) and (3.18)
the exclusion of the income variable from the supply function was enough to identify it,
whereas in the model (3.22) and (3.23) the supply function excludes not only the income
variable but also the wealth variable. However, this situation does not imply that
over-identification is necessarily bad, since the problem of too much information can be
handled.
Notice that the situation is the opposite of the case of under identification where there
is too little information. The only way in which the structural parameters of unidentified
(or under identified) equations can be identified (and thus be capable of being
estimated) is through imposition of further restrictions, or use of more extraneous
information. Such restrictions, of course, must be imposed only if their validity can be
defended.
In a simple example such as the foregoing, it is easy to check for identification; in more
complicated systems, however, it is not so easy. This time-consuming procedure can be
avoided by resorting to either the order condition or the rank condition of
identification. Although the order condition is easy to apply, it provides only a necessary
condition for identification. On the other hand, the rank condition is both a necessary
and sufficient condition for identification. Hence, in the next section we discuss the order
condition and the rank condition of identification.
There are two conditions which must be fulfilled for an equation to be identified.
The order condition for identification. This condition is based on a counting rule of the
variables included in and excluded from the particular equation. It is a necessary but not a
sufficient condition for the identification of an equation. The order condition may be
stated as follows: in a model of G simultaneous equations, for an equation to be
identified it must exclude at least G − 1 of the variables (endogenous as well as
predetermined) appearing in the model. The order condition is sometimes stated in the
following equivalent form: for an equation to be identified, the total number of variables
excluded from it but included in other equations must be at least as great as the number
of equations of the system less one.
Let G = the total number of equations (equal to the total number of endogenous
variables), K = the total number of variables in the model (endogenous and
predetermined), and M = the number of variables, endogenous and exogenous, included
in a particular equation. Then the order condition for identification may be symbolically
expressed as
(K − M) ≥ (G − 1)
that is,
(number of variables excluded from the equation) ≥ (number of equations less one).
For example, if a system contains 10 equations with 15 variables, ten endogenous and
five exogenous, an equation containing 11 variables is not identified, while another
containing 5 variables is identified.
Order condition for the equation containing 11 variables:
(K − M) ≥ (G − 1)?
(15 − 11) = 4 < (10 − 1) = 9; that is, the order condition is not satisfied.
Order condition for the equation containing 5 variables:
(K − M) ≥ (G − 1)?
(15 − 5) = 10 > (10 − 1) = 9; that is, the order condition is satisfied.
The order condition for identification is necessary for a relation to be identified, but it is
not sufficient, that is, it may be fulfilled in any particular equation and yet the relation
may not be identified.
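The counting rule can be wrapped in a small helper function. A minimal sketch; the numbers reproduce the 10-equation, 15-variable example above, and the verdicts are only "possible" because the order condition is necessary but not sufficient.

def order_condition(K, M, G):
    # K: total variables in the model, M: variables included in the equation, G: number of equations
    excluded = K - M
    if excluded < G - 1:
        return "not identified (order condition fails)"
    if excluded == G - 1:
        return "possibly exactly identified"
    return "possibly over-identified"

print(order_condition(K=15, M=11, G=10))   # 4 < 9  -> not identified
print(order_condition(K=15, M=5, G=10))    # 10 > 9 -> possibly over-identified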
The rank condition states that: in a system of G equations any particular equation is
identified if and only if it is possible to construct at least one non-zero determinant of
order (G-1) from the coefficients of the variables excluded from that particular equation
but contained in the other equations of the model. The practical steps for tracing the
identifiability of an equation of a structural model may be outlined as follows.
Firstly. Write the parameters of all the equations of the model in a separate table,
noting that the parameter of a variable excluded from an equation is equal to zero.
y1 = 3y2 − 2x1 + x2 + u1
y2 = y3 + x3 + u2
y3 = y1 − y2 − 2x3 + u3
where the y's are the endogenous variables and the x's are the predetermined
variables. This model may be rewritten in the form
−y1 + 3y2 + 0y3 − 2x1 + x2 + 0x3 + u1 = 0
0y1 − y2 + y3 + 0x1 + 0x2 + x3 + u2 = 0
y1 − y2 − y3 + 0x1 + 0x2 − 2x3 + u3 = 0
Ignoring the random disturbance the table of the parameters of the model is as follows:
Variables
Equations
Y1 Y2 Y3 X1 X2 X3
1st equation -1 3 0 -2 1 0
2nd equation 0 -1 1 0 0 1
3rd equation 1 -1 -1 0 0 -2
Secondly. Strike out the row of coefficients of the equation which is being examined for
identification. For example, if we want to examine the identifiability of the second
equation of the model we strike out the second row of the table of coefficients.
Thirdly. Strike out the columns in which a non-zero coefficient of the equation being
examined appears. By deleting the relevant row and columns we are left with the
coefficients of variables not included in the particular equation, but contained in the
other equations of the model. For example, if we are examining for identification the
second equation of the system, we will strike out the second, third and the sixth
columns of the above table, thus obtaining the following tables.
Remaining coefficients (variables excluded from the second equation but contained in the other equations):
                 Y1     X1     X2
1st equation     −1     −2      1
3rd equation      1      0      0
Fourthly. Form the determinant(s) of order (G-1) and examine their value. If at least
one of these determinants is non-zero, the equation is identified. If all the determinants
of order (G-1) are zero, the equation is underidentified. In the above example of
exploration of the identifiability of the second structural equation, we have three
determinants of order (G-1)=3-1=2. They are:
Δ1 = | −1  −2 | = 2        Δ2 = | −1   1 | = −1        Δ3 = | −2   1 | = 0
     |  1   0 |                 |  1   0 |                  |  0   0 |
(the symbol Δ stands for 'determinant'). We see that we can form two non-zero
determinants of order G − 1 = 3 − 1 = 2; hence the second equation of our system is
identified.
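The same check can be carried out numerically: delete the row and columns, then ask whether the remaining coefficient matrix has rank G − 1. A minimal sketch with numpy for the three-equation example above; the numerical values simply reproduce the coefficient table.

import numpy as np

# Coefficient table: rows = equations, columns = (y1, y2, y3, x1, x2, x3)
A = np.array([
    [-1.0,  3.0,  0.0, -2.0,  1.0,  0.0],   # 1st equation
    [ 0.0, -1.0,  1.0,  0.0,  0.0,  1.0],   # 2nd equation
    [ 1.0, -1.0, -1.0,  0.0,  0.0, -2.0],   # 3rd equation
])

G = A.shape[0]
eq = 1                                       # examine the 2nd equation (index 1)
excluded_cols = np.where(A[eq] == 0)[0]      # variables excluded from that equation
sub = np.delete(A, eq, axis=0)[:, excluded_cols]

# Rank condition: the submatrix must contain a non-zero determinant of order G - 1
print(np.linalg.matrix_rank(sub) == G - 1)   # True -> the 2nd equation is identified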
Fifthly. To see whether the equation is exactly identified or overidentified we use the
order condition (K − M) ≥ (G − 1). With this criterion, if the equality sign is satisfied,
that is if (K − M) = (G − 1), the equation is exactly identified. If the inequality sign
holds, that is if (K − M) > (G − 1), the equation is overidentified. In our example, for the
second equation K = 6, M = 3 and G = 3, so that (6 − 3) > (3 − 1); hence the second
equation is overidentified.
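To see how these steps can be mechanized, the following Python sketch (not part of the original text; the coefficient array simply reproduces the table above, and the 1e-10 tolerance is an arbitrary choice) checks the order condition and searches for a non-zero determinant of order (G − 1) for each equation of the three-equation example.

import numpy as np
from itertools import combinations

# Structural coefficients; rows = equations, columns = y1, y2, y3, x1, x2, x3
A = np.array([
    [-1,  3,  0, -2,  1,  0],   # 1st equation
    [ 0, -1,  1,  0,  0,  1],   # 2nd equation
    [ 1, -1, -1,  0,  0, -2],   # 3rd equation
], dtype=float)

G, K = A.shape                  # G equations, K variables in the whole model

for eq in range(G):
    included = A[eq] != 0                       # variables appearing in equation eq
    M = int(included.sum())                     # number of included variables
    order_ok = (K - M) >= (G - 1)               # order (necessary) condition

    # Rank condition: among the coefficients of the excluded variables in the
    # other equations, at least one (G-1)x(G-1) determinant must be non-zero.
    others = np.delete(A, eq, axis=0)[:, ~included]
    rank_ok = any(
        abs(np.linalg.det(others[:, list(cols)])) > 1e-10
        for cols in combinations(range(others.shape[1]), G - 1)
    )
    print(f"equation {eq + 1}: order condition {'met' if order_ok else 'not met'}, "
          f"rank condition {'met' if rank_ok else 'not met'}")

For the second equation the sketch should report that both conditions are met, in line with the manual calculation above.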
Example. Consider the following demand and supply model:

D = a0 + a1P1 + a2P2 + a3Y + a4t + u1      (demand)
S = b0 + b1P1 + b2P2 + b3t + b4C + u2      (supply)
D = S

where
D = quantity demanded
S = quantity supplied
P1 = price of the commodity
P2 = prices of other commodities
Y = income
C = costs
t = time trend. In the demand function it stands for ‘tastes’; in the supply
function it stands for ‘technology’.

The above model is mathematically complete in the sense that it contains three
equations in three endogenous variables, D, S and P1. The remaining variables, Y, P2, C,
t are exogenous. Suppose we want to identify the supply function. We apply the two
criteria for identification:
1. Order condition
The model contains K = 7 variables in all, the supply equation contains M = 5 of them
(S, P1, P2, t and C), and there are G = 3 equations, so that
(K − M) = (7 − 5) = 2 ≥ (G − 1) = (3 − 1) = 2.
Consequently the second equation satisfies the first (order) condition for identification.
2. Rank condition
The table of the structural parameters of the model is:

                  D     P1    P2    Y     t     S     C
1st equation     -1     a1    a2    a3    a4    0     0
2nd equation      0     b1    b2    0     b3   -1     b4
3rd equation      1     0     0     0     0    -1     0
Following the procedure explained earlier we strike out the second row and the second,
third, fifth, sixth and seventh columns. Thus we are left with the table of the
coefficients of the excluded variables:

                  D     Y
1st equation     -1     a3
3rd equation      1     0
From this table we can form only one determinant of order
(G − 1) = (3 − 1) = 2, and it is non-zero:

Δ = | −1   a3 | = (−1)(0) − (1)(a3) = −a3 ≠ 0   (provided a3 ≠ 0)
    |  1    0 |
We see that both the order and rank conditions are satisfied. Hence the second
equation of the model is identified. Furthermore, we see that in the order condition the
equality holds: (7-5) = (3-1) = 2. Consequently the second structural equation is
exactly identified.
As we have discussed in section 3.2, the bias arising from applying OLS estimation to
each equation of a simultaneous equations model as though it were a single-equation
model is known as simultaneity bias or simultaneous equation bias. To avoid this
bias we use other methods of estimation, such as Indirect Least Squares (ILS), Two-
Stage Least Squares (2SLS), Three-Stage Least Squares (3SLS), Maximum Likelihood
methods and the Method of Instrumental Variables (IV). However, in view of the
introductory nature of this course we shall consider very briefly the following
techniques.
For a just or exactly identified structural equation, the method of obtaining the estimates
of the structural coefficients from the OLS estimators of the reduced-form coefficients is
known as the method of indirect least squares (ILS). ILS involves the following three
steps:
Step I: - Obtain the reduced-form equations, in which each endogenous variable is
expressed solely in terms of the predetermined variables and the disturbances.
Step II: - Apply OLS to each reduced-form equation individually to obtain estimates of
the reduced-form coefficients.
Step III: - Obtain estimates of the original structural coefficients from the estimated
reduced form coefficients obtained in step II.
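As a simple illustration of the three steps, the following Python sketch (illustrative only; the model, the parameter values and the simulated data are made up and are not taken from the module) recovers the coefficients of a just-identified supply equation from the OLS estimates of the reduced-form equations.

import numpy as np

rng = np.random.default_rng(0)
n = 500
Y = rng.normal(100, 10, n)                      # exogenous income
u1, u2 = rng.normal(0, 1, n), rng.normal(0, 1, n)

a0, a1, a2 = 50.0, -2.0, 0.5                    # "true" demand parameters (made up)
b0, b1 = 10.0, 1.5                              # "true" supply parameters (made up)
# demand: Q = a0 + a1*P + a2*Y + u1,  supply: Q = b0 + b1*P + u2

# Equilibrium values of P and Q obtained by solving the two structural equations
P = (a0 - b0 + a2 * Y + u1 - u2) / (b1 - a1)
Q = b0 + b1 * P + u2

# Step I: the reduced-form equations express P and Q in terms of Y alone.
# Step II: estimate each reduced-form equation by OLS.
X = np.column_stack([np.ones(n), Y])
pi_P = np.linalg.lstsq(X, P, rcond=None)[0]     # (pi0, pi1)
pi_Q = np.linalg.lstsq(X, Q, rcond=None)[0]     # (pi2, pi3)

# Step III: recover the structural supply coefficients from the reduced-form
# estimates:  b1 = pi3/pi1  and  b0 = pi2 - b1*pi0.
b1_ils = pi_Q[1] / pi_P[1]
b0_ils = pi_Q[0] - b1_ils * pi_P[0]
print("ILS estimates of the supply equation:", b0_ils, b1_ils)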
The method of two-stage least squares (2SLS) is applied in estimating an overidentified
equation. Theoretically, 2SLS may be considered as an extension of the ILS method. The
2SLS method boils down to the application of ordinary least squares in two stages. In
the first stage, we apply least squares to the reduced-form equations in order to split
each endogenous variable appearing on the right-hand side of the structural equation
into an exact (systematic) component and a random component. In the second stage, we
replace those endogenous variables by their estimated (fitted) values and then apply OLS
to the transformed original equation to obtain estimates of the structural parameters.
Note, however, that since 2SLS is equivalent to ILS in the just-identified case, it is
usually applied uniformly to all identified equations in the system.
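The two stages can be illustrated with the following Python sketch (again with made-up, simulated data): the demand equation now contains two exogenous shifters, so the supply equation is overidentified; the first stage replaces the endogenous price by its fitted value from the reduced form, and the second stage applies OLS to the transformed supply equation. A naive OLS regression on the original endogenous price is included for comparison, to show the simultaneity bias.

import numpy as np

rng = np.random.default_rng(1)
n = 500
Y = rng.normal(100, 10, n)                       # exogenous income
W = rng.normal(50, 5, n)                         # a second exogenous demand shifter
u1, u2 = rng.normal(0, 1, n), rng.normal(0, 1, n)

a0, a1, a2, a3 = 60.0, -2.0, 0.5, 0.3            # demand: Q = a0 + a1*P + a2*Y + a3*W + u1
b0, b1 = 10.0, 1.5                               # supply: Q = b0 + b1*P + u2

P = (a0 - b0 + a2 * Y + a3 * W + u1 - u2) / (b1 - a1)   # equilibrium price
Q = b0 + b1 * P + u2                                     # equilibrium quantity

# Stage 1: regress the endogenous regressor P on all predetermined variables
# (the reduced form) and keep only its fitted, non-stochastic part.
Z = np.column_stack([np.ones(n), Y, W])
P_hat = Z @ np.linalg.lstsq(Z, P, rcond=None)[0]

# Stage 2: replace P by P_hat in the supply equation and apply OLS.
X2 = np.column_stack([np.ones(n), P_hat])
b0_2sls, b1_2sls = np.linalg.lstsq(X2, Q, rcond=None)[0]
print("2SLS estimates of the supply equation:", b0_2sls, b1_2sls)

# For comparison, naive OLS on the original (endogenous) P is biased:
X_ols = np.column_stack([np.ones(n), P])
print("OLS estimates (biased):", np.linalg.lstsq(X_ols, Q, rcond=None)[0])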
Review Questions
Yt = Qt + It
Yt = Ct + It + Gt + u3
Wt = a0 + a1(UN)t + a2Pt + ε1t,
Pt = b0 + b1Mt + b2(UN)t + b3Wt + ε2t,
where W = the money wage rate, (UN) = the unemployment rate, P = the price level and
M = the money supply.
Assume that
ε1t and ε2t have zero means, constant variances, and are not autocorrelated.
rit = rt + b3Iit + εit,
where
Iit = investment expenditures of the ith firm at time t.
We assume that these N firms are large so that the level of their investment
expenditure affects the interest rate they face. Assume the standard conditions.
Discuss whether or not the equations are identified. (Hint: use the rank condition for
identification.)
Yt = It + Ct,      (2)
income, and the interest rate. Assume that ε1 and ε2 are not autocorrelated and
are independent of rt.
a. List the endogenous variables and the predetermined variables in the model.
Chapter four
The aim of this chapter is to introduce students to the basic concepts of panel data
and panel data regression models.
Topic outline
1. Introduction
2. The fixed effects (LSDV) approach
3. The random effects (error components) approach
4. Review exercise
In Econometrics I we discussed briefly the types of data that are generally available for
empirical analysis, namely, time series, cross section, and panel. In time series
data we observe the values of one or more variables over a period of time (e.g., GDP
for several quarters or years). In cross-section data, values of one or more variables are
collected for several sample units, or entities, at the same point in time (e.g., crime
rates for 50 states in the United States for a given year). In panel data the same cross-
sectional unit (say a family or a firm or a state) is surveyed over time. In short, panel
data have space as well as time dimensions.
There are other names for panel data, such as pooled data (pooling of time series and
cross-sectional observations), combination of time series and cross-section data,
micropanel data, longitudinal data (a study over time of a variable or group of
subjects), event history analysis (e.g., studying the movement over time of subjects
through successive states or conditions), cohort analysis (e.g., following the career
path of 1965 graduates of a business school). Although there are subtle variations, all
these names essentially connote movement over time of cross-sectional units. We will
therefore use the term panel data
in a generic sense to include one or more of these
terms. And we will call regression models based on such data panel data regression
models.
What are the advantages of panel data over cross-section or time series data? Baltagi
lists the following advantages of panel data:
1. Since panel data relate to individuals, firms, states, countries, etc.,over time, there is
bound to be heterogeneity in these units. The techniques of panel data estimation can
take such heterogeneity explicitly into account by allowing for individual-specific
variables, as we shall show shortly. We use the term individual in a generic sense to
include microunits such as individuals, firms, states, and countries.
2. By combining time series and cross-section observations, panel data give more
informative data, more variability, less collinearity among the variables, more degrees of
freedom and more efficiency.
3. By studying the repeated cross section of observations, panel data are better suited
to study the dynamics of change. Spells of unemployment, job turnover, and labor
mobility are better studied with panel data.
4. Panel data can better detect and measure effects that simply cannot be observed in
pure cross-section or pure time series data. For example, the effects of minimum wage
laws on employment and earnings can be better studied if we include successive waves
of minimum wage increases in the federal and/or state minimum wages.
5. Panel data enables us to study more complicated behavioral models. For example,
phenomena such as economies of scale and technological change can be better handled
by panel data than by pure cross-section or pure time series data.
6. By making data available for several thousand units, panel data can minimize the
bias that might result if we aggregate individuals or firms into broad aggregates.
In short, panel data can enrich empirical analysis in ways that may not be possible if we
use only cross-section or time series data. This is not to suggest that there are no
problems with panel data modeling. We will discuss them after we cover some theory
and discuss an example.
To set the stage, let us consider a concrete example. Consider the famous study of
investment theory proposed by Y. Grunfeld. Grunfeld was interested in finding out how
real gross investment (Y) depends on the real value of the firm (X2) and real capital stock
(X3). Although the original study covered several companies, for illustrative purposes we
have obtained data on four companies, General Electric (GE), General Motors (GM), U.S.
Steel (US), and Westinghouse. Data for each company on the preceding three variables
are available for the period 1935–1954. Thus, there are four cross-sectional units and
20 time periods. In all, therefore, we have 80 observations. A priori, Y is expected to be
positively related to X2 and X3.
In principle, we could run four time series regressions, one for each company or we
could run 20 cross-sectional regressions, one for each year, although in the latter case
we will have to worry about the degrees of freedom. Pooling, or combining, all the 80
observations, we can write the Grunfeld investment function as:
Yit = β1 + β2X2it + β3X3it + uit ……………………………………(4.1)
i = 1, 2, 3, 4 and t = 1, 2, . . . , 20
where i stands for the ith cross-sectional unit and t for the tth time period.
As a matter of convention, we will let i denote the cross-section identifier and t the time
identifier. It is assumed that there are a maximum of N cross sectional units or
observations and a maximum of T time periods. If each cross-sectional unit has the
same number of time series observations, then such a panel (data) is called a
balanced panel. In the present example we have a balanced panel, as each company
in the sample has 20 observations.
If the number of observations differs among panel members, we call such a panel an
unbalanced panel. In this chapter we will largely be concerned with a balanced panel.
Initially, we assume that the X’s are non-stochastic and that the error term follows the
classical assumptions, namely, uit ∼ N(0, σ2). Notice carefully the double and triple
subscripted notation, which should be self-explanatory. How do we estimate (4.1)? The
answer follows.
Estimation of (4.1) depends on the assumptions we make about the intercept, the slope
coefficients, and the error term, . There are several possibilities:
1. Assume that the intercept and slope coefficients are constant across time and space
and the error term captures differences over time and individuals.
2. The slope coefficients are constant but the intercept varies over individuals.
3. The slope coefficients are constant but the intercept varies over individuals and time.
4. All coefficients (the intercept as well as slope coefficients) vary over individuals.
5. The intercept as well as slope coefficients vary over individuals and time.
As you can see, each of these cases introduces increasing complexity (and perhaps
more reality) in estimating panel data regression models, such as (4.1). Of course, the
complexity will increase if we add more regressors to the model because of the
possibility of collinearity among the regressors. To cover each of the preceding
categories in depth will require a separate book, and there are already several ones on
the market. In what follows, we will cover some of the main features of the various
possibilities, especially the first four. Our discussion is nontechnical.
The simplest, and possibly naive, approach is to disregard the space and time
dimensions of the pooled data and just estimate the usual OLS regression. That is,
stack the 20 observations for each company one on top of the other, thus giving in all
80 observations for each of the variables in the model. When this pooled regression is
estimated by OLS, the Durbin–Watson d statistic turns out to be quite low, suggesting
that perhaps there is autocorrelation in the data. Of course, as
we know, a low Durbin–Watson value could be due to specification errors also. For
instance, the estimated model assumes that the intercept value of GE, GM, US, and
Westinghouse are the same. It also assumes that the slope coefficients of the two X
variables are all identical for all the four firms. Obviously, these are highly restricted
assumptions. Therefore, despite its simplicity, the pooled regression (4.1) may distort
the true picture of the relationship between Y and the X’s across the four companies.
What we need to do is find some way to take into account the specific nature of the
four companies. How this can be done is explained next.
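Before doing so, here is a minimal sketch of the pooled OLS regression just described. It assumes, purely for illustration, that the 80 stacked firm-year observations sit in a file named grunfeld_four_firms.csv with columns invest (Y), value (X2), capital (X3), firm and year; both the file name and the column names are hypothetical.

import pandas as pd
import statsmodels.api as sm

# A long ("stacked") panel: one row per firm-year observation.
df = pd.read_csv("grunfeld_four_firms.csv")      # hypothetical file name

X = sm.add_constant(df[["value", "capital"]])    # X2, X3 plus a common intercept
pooled = sm.OLS(df["invest"], X).fit()
print(pooled.summary())                          # check R2 and the Durbin-Watson d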
One way to take into account the “individuality” of each company or each cross-
sectional unit is to let the intercept vary for each company but still assume that the
slope coefficients are constant across firms. To see this, we write model (4.1) as:
Yit = β1i + β2X2it + β3X3it + uit …………………………………… (4.3)
Notice that we have put the subscript i on the intercept term to suggest that the
intercepts of the four firms may be different; the differences may be due to special
features of each company, such as managerial style or managerial philosophy.
In the literature, model (4.3) is known as the fixed effects (regression) model (FEM).
The term “fixed effects” is due to the fact that, although the intercept may differ across
individuals (here the four companies), each individual’s intercept does not vary over
time; that is, it is time invariant. Notice that if we were to write the intercept as β1it , it
will suggest that the intercept of each company or individual is time variant. It may be
noted that the FEM given in (4.3) assumes that the (slope) coefficients of the
regressors do not vary across individuals or over time.
How do we actually allow for the (fixed effect) intercept to vary between companies?
We can easily do that by the dummy variable technique, particularly, the differential
intercept dummies. Therefore, we write (4.3) as:
Yit = α1 + α2D2i + α3D3i + α4D4i + β2X2it + β3X3it + uit …………………………………..(4.4)
where D2i = 1 if the observation belongs to GM and 0 otherwise, D3i = 1 if the observation
belongs to US and 0 otherwise, and D4i = 1 if the observation belongs to WEST and 0
otherwise; since we use only three dummies for the four companies, GE serves as the
comparison (base) company, whose intercept is α1.
Of course, you are free to choose any company as the comparison company.
Incidentally, if you want explicit intercept values for each company, you can introduce
four dummy variables provided you run your regression through the origin, that is, drop
the common intercept in (4.4); if you do not do this, you will fall into the dummy
variable trap. Since we are using dummies to estimate the fixed effects, in the literature
the model (4.4) is also known as the LSDV model. So, the terms fixed effects and
LSDV can be used interchangeably. In passing, note that the LSDV model (4.4) is also
known as the covariance model and X2 and X3 are known as covariates.
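A minimal sketch of the LSDV estimation of (4.4), using the same hypothetical data file, is given below; in the statsmodels formula interface the term C(firm) generates the differential intercept dummies and automatically drops one firm as the comparison category, which avoids the dummy variable trap.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("grunfeld_four_firms.csv")      # hypothetical file name

# Differential intercepts across firms, common slopes for value (X2) and capital (X3)
lsdv = smf.ols("invest ~ C(firm) + value + capital", data=df).fit()
print(lsdv.summary())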
Compare this regression with (4.1). In (4.5) all the estimated coefficients are
individually highly significant, as the p values of the estimated t coefficients are
extremely small. The intercept values of the four companies are statistically different;
being −245.7924 for GE, −84.220 (= −245.7924 + 161.5722) for GM, 93.8404
(= −245.7924 + 339.6328) for US, and −59.2258 (= −245.7924 + 186.5666) for WEST.
These differences in the intercepts may be due to unique features of each company,
such as differences in management style or managerial talent.
Which model is better, (4.1) or (4.5)? The answer should be obvious: judged by the
statistical significance of the estimated coefficients, the substantially increased R2 value,
and the much higher Durbin–Watson d value, model (4.1) appears to have been
mis-specified. The increased R2 value, however, should not be surprising, as we have
more variables in model (4.5).
We can also provide a formal test of the two models. In relation to (4.5), model (4.1) is
a restricted model in that it imposes a common intercept on all the companies.
Therefore, we can use the restricted F test discussed in Econometrics I. Using the
previous formula, the reader can easily check that in the present instance the F value is:
F = [(R2ur − R2r)/m] / [(1 − R2ur)/(n − k)] = 66.9980
where the restricted R2 value is from (4.1) and the unrestricted R2 is from (4.5) and
where the number of restrictions is 3, since model (4.1) assumes that the intercepts of
the GE, GM, US, and WEST are the same. Clearly, the F value of 66.9980 (for 3
numerator df and 74 denominator df) is highly significant and, therefore, the restricted
regression (4.1) seems to be invalid.
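The restricted F test itself is a one-line calculation, as the following sketch shows; the R2 values passed in the example call are placeholders rather than the actual figures from the regressions above.

def restricted_f(r2_restricted, r2_unrestricted, m, df_denom):
    """F = [(R2_UR - R2_R)/m] / [(1 - R2_UR)/(n - k)]."""
    return ((r2_unrestricted - r2_restricted) / m) / \
           ((1 - r2_unrestricted) / df_denom)

# m = 3 restrictions (equal intercepts), n - k = 74 denominator degrees of freedom;
# the two R2 values below are placeholders for the restricted and unrestricted fits.
print(restricted_f(r2_restricted=0.76, r2_unrestricted=0.94, m=3, df_denom=74))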
The Time Effect. Just as we used the dummy variables to account for individual
(company) effect, we can allow for time effect in the sense that the Grunfeld
investment function shifts over time because of factors such as technological changes,
changes in government regulatory and/or tax policies, and external effects such as wars
or other conflicts. Such time effects can be easily accounted for if we introduce time
dummies, one for each year. Since we have data for 20 years, from 1935 to 1954, we
can introduce 19 time dummies (why?), and write the model (4.4) as:
Yit = λ0 + λ1Dum35 + λ2Dum36 + · · · + λ19Dum53 + β2X2it + β3X3it + uit ………………(4.7)
where Dum35 takes a value of 1 for observation in year 1935 and 0 otherwise, etc. We
are treating the year 1954 as the base year, whose intercept value is given by λ0
(why?)
We are not presenting the regression results based on (4.7), for none of the individual
time dummies were individually statistically significant. The R2 value of (4.7) was
0.7697, whereas that of (4.1) was 0.7565, an increment of only 0.0132. It is left as an
exercise for the reader to show that, on the basis of the restricted F test, this increment
is not significant, which probably suggests that the year or time effect is not significant.
This might suggest that perhaps the investment function has not changed much over
time.
We have already seen that the individual company effects were statistically significant,
but the individual year effects were not. Could it be that our model is mis-specified in
that we have not taken into account both individual and time effects together? Let us
consider this possibility.
Combining the company dummies of (4.4) and the time dummies of (4.7), we can write:
Yit = α1 + α2D2i + α3D3i + α4D4i + λ1Dum35 + · · · + λ19Dum53 + β2X2it + β3X3it + uit …………..(4.8)
When we run this regression, we find the company dummies as well as the coefficients
of the X are individually statistically significant, but none of the time dummies are.
Essentially, we are back to (4.5). The overall conclusion that emerges is that perhaps
there is pronounced individual company effect but no time effect. In other words, the
investment functions for the four companies are the same except for their intercepts. In
all the cases we have considered, the X variables had a strong impact on Y.
Here we assume that the intercepts and the slope coefficients are different for all
individual, or cross-section, units. This is to say that the investment functions of GE,
GM, US, and WEST are all different. We can easily extend our LSDV model to take care
of this situation. Recall from chapter one that we introduced the individual dummies in
an additive manner; there we also showed how interactive, or differential, slope dummies
can account for differences in slope coefficients. To do this
in the context of the Grunfeld investment function, what we have to do is multiply each
of the company dummies by each of the X variables [this will add six more variables].
That is, we estimate the following model:
Yit = α1 + α2D2i + α3D3i + α4D4i + β2X2it + β3X3it + γ1(D2i X2it) + γ2(D2i X3it)
+γ3(D3i X2it) + γ4(D3i X3it) + γ5(D4i X2it) + γ6(D4i X3it) + uit …………..4.9
You will notice that the γ ’s are the differential slope coefficients, just as α2, α3, and α4
are the differential intercepts. If one or more of the γ coefficients are statistically
significant, it will tell us that one or more slope coefficients are different from the base
group. For example, say β2 and γ1 are statistically significant. In this case ( β2 + γ1)
will give the value of the slope
coefficient of
X2 for General Motors, suggesting that the GM slope coefficient of X2 is different from
that of General Electric, which is our comparison company. If all the differential
intercept and all the differential slope coefficients are statistically significant, we can
conclude that the investment functions of General Motors, United States Steel, and
Westinghouse are different from that of General Electric. If this is in fact the case, there
may be little point in estimating the pooled regression.
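A sketch of how a model such as (4.9) could be estimated with the same hypothetical data file is given below; multiplying the firm dummies by the X variables (here written with the * operator of the statsmodels formula interface) adds the differential slope terms to the differential intercepts.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("grunfeld_four_firms.csv")      # hypothetical file name

# C(firm) * (value + capital) expands to the firm dummies, the two X variables,
# and all dummy-by-X interaction (differential slope) terms, as in (4.9).
interact = smf.ols("invest ~ C(firm) * (value + capital)", data=df).fit()
print(interact.summary())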
Let us examine the regression results based on (4.9). As these results
reveal, Y is significantly related to X2 and X3. However, several differential slope
coefficients are statistically significant. For instance, the slope coefficient of X2 is 0.0902
for GE, but 0.1828 (0.0902 + 0.092) for GM. Interestingly, none of the differential
intercepts are statistically significant.
R2 = 0.9511 d = 1.0896
All in all, it seems that the investment functions of the four companies are different.
This might suggest that the data of the four companies are not “poolable,” in which
case one can estimate the investment functions for each company separately. This is a
reminder that panel data regression models may not be appropriate in each situation,
despite the availability of both time series and cross-sectional data.
A Caution on the Use of the Fixed Effects, or LSDV, Model. Although easy to use,
the LSDV model has some problems that need to be borne in mind.
First, if you introduce too many dummy variables, as in the case of model (4.8), you will
run up against the degrees of freedom problem: with 80 observations we are left with
only 55 degrees of freedom, because we lose 3 df for the three company dummies, 19 df
for the 19 year dummies, 2 for the two slope coefficients, and 1 for the common
intercept. Second, with so many variables in the
model, there is always the possibility of multicollinearity, which might make precise
estimation of one or more parameters difficult.
Third, suppose in the FEM, we also include variables such as sex, color, and ethnicity,
which are time invariant because an individual's sex, color, or ethnicity does not
change over time. Hence, the LSDV approach may not be able to identify the impact of
such time-invariant variables.
Fourth, we have to think carefully about the error term uit. All the results we have
presented so far are based on the assumption that the error term follows the classical
assumptions, namely,
uit ∼ N(0, σ2). Since the i index refers to cross-sectional observations and t to time series
observations, the classical assumption for uit may have to be modified. There are
several possibilities.
1. We can assume that the error variance is the same for all cross section units or we
can assume that the error variance is heteroscedastic.
2. For each individual we can assume that there is no autocorrelation over time. Thus,
for example, we can assume that the error term of the investment function for General
Motors is nonautocorrelated. Or we could assume that it is autocorrelated, say, of the
AR(1) type.
3. For a given time, it is possible that the error term for General Motors is correlated
with the error term for, say, U.S. Steel or both U.S. Steel and Westinghouse. Or, we
could assume that there is no such correlation.
4. We can think of other permutations and combinations of the error term. As you will
quickly realize, allowing for one or more of these possibilities will make the analysis that
much more complicated. Space and mathematical demands preclude us from
considering all the possibilities.
However, some of the problems may be alleviated if we resort to the so-called random
effects model, which we discuss next.
The Random Effects (REM) Approach
An obvious question in connection with the covariance [i.e., LSDV] model is whether
the inclusion of the dummy variables—and the consequent loss of the number of
degrees of freedom—is really necessary. The reasoning underlying the covariance
model is that in specifying the regression model we have failed to include relevant
explanatory variables that do not change over time (and possibly others that do change
over time but have the same value for all cross-sectional units), and that the inclusion
of dummy variables is a cover up of our ignorance. If the dummy variables do in fact
represent a lack of knowledge about the (true) model, why not express this ignorance
through the disturbance term uit? This is precisely the approach suggested by the
proponents of the so called error components model (ECM) or random effects
model (REM).
Instead of treating β1i as fixed, we assume that it is a random variable with a mean
value of β1 (no subscript i here). And the intercept value for an individual company can
be expressed as
β1i = β1 + εi i = 1, 2, . . . , N ……………………………………………………….(4.11)
where εi is a random error term with a mean value of zero and a variance of σ2ε.
What we are essentially saying is that the four companies included in our sample are a
drawing from a much larger universe of such companies and that they have a common
mean value for the intercept (= β1), and the individual differences in the intercept
values of each company are reflected in the error term εi.
Substituting (4.11) into (4.3), we obtain
Yit = β1 + β2X2it + β3X3it + εi + uit
    = β1 + β2X2it + β3X3it + wit ……………………………………(4.12)
where
wit = εi + uit ……………………………………(4.13)
The composite error term wit consists of two components: εi, which is the cross-
section, or individual-specific, error component, and uit, which is the combined time
series and cross-section error component. The term error components model derives its
name because the composite error term wit consists of two (or more) error
components.
The usual assumptions made by the error components model are that
εi ∼ N(0, σ2ε),   uit ∼ N(0, σ2u),
E(εi uit) = 0,   E(εi εj) = 0   (i ≠ j);
that is, the individual error components are not correlated with each other and are not
autocorrelated across both cross-section and time series units. Notice carefully the
difference between FEM and ECM. In FEM each cross-sectional unit has its own (fixed)
intercept value, in all N such values for N cross-sectional units. In ECM, on the other
hand, the intercept β1 represents the mean value of all the (cross-sectional) intercepts
and the error component εi represents the (random) deviation of individual intercept
from this mean value. However, keep in mind that εi is not directly observable; it is
what is known as an unobservable, or latent, variable.
E(wit) = 0 ……………………………………(4.15)
var(wit) = σ2ε + σ2u ……………………………………(4.16)
Now if σ2ε= 0, there is no difference between models (4.1) and (4.12), in which case
we can simply pool all the (cross-sectional and time series) observations and just run
the pooled regression, as we did in (4.2).
As (4.16) shows, the error term wit is homoscedastic. However, it can be shown that
wit and wis (t ≠ s) are correlated; that is, the error terms of a given cross-sectional
unit at two different points in time are correlated. The correlation coefficient, corr(wit,
wis), is as follows:
corr(wit, wis) = σ2ε / (σ2ε + σ2u),   t ≠ s ……………………(4.17)
Notice two special features of the preceding correlation coefficient. First, for any given
cross-sectional unit, the value of the correlation between error terms at two different
times remains the same no matter how far apart the two time periods are. This is in
strong contrast to the first-order [AR(1)] scheme, where we found that the correlation
between time periods declines over time. Second, the correlation structure given in
(4.17) remains the same for all cross sectional units; that is, it is identical for all
individuals.
If we do not take this correlation structure into account, and estimate (4.12) by OLS,
the resulting estimators will be inefficient. The most appropriate method here is the
method of generalized least squares (GLS).
We will not discuss the mathematics of GLS in the present context because of its
complexity. Since most modern statistical software packages now have routines to
estimate ECM (as well as FEM), we will only present the results for our investment
example. But before we do that, it may be noted that we can easily extend (4.13) to
allow for a random error component to take into account variation over time.
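For completeness, a minimal sketch of the GLS (random effects) estimation is given below. It assumes the linearmodels package is available and re-uses the hypothetical data file from the earlier sketches; the firm-year pair is set as the panel index before estimation.

import pandas as pd
from linearmodels.panel import RandomEffects

df = pd.read_csv("grunfeld_four_firms.csv")      # hypothetical file name
df = df.set_index(["firm", "year"])              # entity-time panel index

exog = df[["value", "capital"]].assign(const=1.0)   # regressors plus a constant
re_res = RandomEffects(df["invest"], exog).fit()
print(re_res)                                        # GLS estimates and variance components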
The results of ECM estimation of the Grunfeld investment function are presented in the
following Table. Several aspects of this regression should be noted. First, if you sum the
random effect values given for the four companies, it will be zero, as it should (why?).
Second, the mean value of the random error component, εi, is the common intercept
value of −73.0353. The random effect value of GE of −169.9282 tells us by how much
the random error component of GE differs from the common intercept value. Similar
interpretation applies to the other three values of the random effects. Third, the R2
value is obtained from the transformed GLS regression.
If you compare the results of the ECM model given in the following table with those
obtained from FEM, you will see that generally the coefficient values of the two X
variables do not seem to differ much, except for those given in the previous table,
where we allowed the slope coefficients of the two variables to differ across cross-
sectional units.
Random effect values:
GE              −169.9282
GM                −9.5078
USS              165.5613
Westinghouse      13.87475
R2 = 0.9323 (GLS)
The challenge facing a researcher is: Which model is better, FEM or ECM? The answer
to this question hinges around the assumption one makes about the likely correlation
between the individual, or cross-section specific, error component εi and the X
regressors.
If it is assumed that εi and the X’s are uncorrelated, ECM may be appropriate, whereas
if εi and the X’s are correlated, FEM may be appropriate. Why would one expect
correlation between the individual error component εi and one or more regressors?
Consider an example. Suppose
we have a random sample of a large number of
individuals and we want to model their wage, or earnings, function. Suppose earnings
are a function of education, work experience, etc. Now if we let εi stand for innate
ability, family background, etc., then when we model the earnings function including εi
it is very likely to be correlated with education, for innate ability and family background
are often crucial determinants of education. As Wooldridge contends, “In many
applications, the whole reason for using panel data is to allow the unobserved effect
[i.e., εi] to be correlated with the explanatory variables.” The assumption underlying
ECM is that the εi are a random drawing from a much larger population. But sometimes
this may not be so. For example, suppose we want to study the crime rate across the
50 states in the United States. Obviously, in this case, the assumption that the 50 states
are a random sample is not tenable.
Keeping this fundamental difference in the two approaches in mind, what more can we
say about the choice between FEM and ECM? Here the observations made by Judge et
al. may be helpful:
1. If T (the number of time series data) is large and N (the number of cross-sectional
units) is small, there is likely to be little difference in the values of the parameters
estimated by FEM and ECM. Hence the choice here is based on computational
convenience. On this score, FEM may be preferable.
2. When N is large and T is small, the estimates obtained by the two methods can differ
significantly. Recall that in ECM β1i = β1 + εi , where εi is the cross-sectional random
component, whereas in FEM we treat β1i as fixed and not random. In the latter case,
statistical inference is conditional on the observed cross-sectional units in the sample.
This is appropriate if we strongly believe that the individual, or cross-sectional, units in
our sample are not random drawings from a larger sample. In that case, FEM is
appropriate. However, if the cross-sectional units in the sample are regarded as random
drawings, then ECM is appropriate, for in that case statistical inference is unconditional.
3. If the individual error component εi and one or more regressors are correlated, then
the ECM estimators are biased, whereas those obtained from FEM are unbiased.
4. If N is large and T is small, and if the assumptions underlying ECM hold, ECM
estimators are more efficient than FEM estimators.
Is there a formal test that will help us to choose between FEM and ECM? Yes, a test
was developed by Hausman in 1978. We will not discuss the details of this test, for they
are beyond the scope of this course. The null hypothesis underlying the Hausman test is
that the FEM and ECM estimators do not differ substantially. The test statistic developed
by Hausman has an asymptotic χ2 distribution. If the null hypothesis is rejected, the
conclusion is that ECM is not appropriate and that we may be better off using FEM, in
which case statistical inferences will be conditional on the εi in the sample.
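A Hausman-type statistic can also be computed by hand from the fixed effects and random effects estimates, as in the following sketch (again assuming the linearmodels package and the hypothetical data file used earlier); it compares the slope coefficients of the two estimators and refers the quadratic form to the chi-square distribution.

import numpy as np
import pandas as pd
from scipy import stats
from linearmodels.panel import PanelOLS, RandomEffects

df = pd.read_csv("grunfeld_four_firms.csv").set_index(["firm", "year"])
y = df["invest"]

# Fixed effects (within) estimates and random effects (GLS) estimates
fe = PanelOLS(y, df[["value", "capital"]], entity_effects=True).fit()
re = RandomEffects(y, df[["value", "capital"]].assign(const=1.0)).fit()

# Compare the slope coefficients that appear in both models.
common = ["value", "capital"]
d = (fe.params[common] - re.params[common]).values
V = (fe.cov.loc[common, common] - re.cov.loc[common, common]).values

H = float(d @ np.linalg.inv(V) @ d)      # asymptotically chi-square with k df
p_value = stats.chi2.sf(H, df=len(common))
print("Hausman statistic:", H, "p-value:", p_value)

A small p-value would lead us to reject the null hypothesis that the two sets of estimates do not differ substantially, in which case FEM would be preferred.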
As noted at the outset, the topic of panel data modeling is vast and complex. We have
barely scratched the surface. Among the topics that we have not discussed, the
following may be mentioned.
4. Dynamic panel data models in which the lagged value(s) of the regressand ( Yit)
appears as an explanatory variable.
One or more of these topics can be found in the references cited in this chapter, and
the reader is urged to consult them to learn more about this topic. These references
also cite several empirical studies in various areas of business and economics that have
used panel data regression models. The beginner is well advised to read some of these
applications to get a feel about how researchers have actually implemented such
models.
1. Panel regression models are based on panel data. Panel data consist of observations
on the same cross-sectional, or individual, units over several time periods.
2. There are several advantages to using panel data. First, they increase the sample
size considerably. Second, by studying repeated cross-section observations, panel data
are better suited to study the dynamics of change. Third, panel data enable us to study
more complicated behavioral models.
3. Despite their substantial advantages, panel data pose several estimation and
inference problems. Since such data involve both cross-section and time dimensions,
problems that plague cross-sectional data (e.g., heteroscedasticity) and time series data
(e.g., autocorrelation) need to be addressed. There are some additional problems, such as
cross-correlation in individual units at the same point in time.
4. There are several estimation techniques to address one or more of these problems.
The two most prominent are (1) the FEM and (2) the REM or error components model
(ECM).
5. In FEM the intercept in the regression model is allowed to differ among individuals in
recognition of the fact that each individual,
or cross-sectional, unit may have some special
characteristics of its own. To take into account the differing intercepts, one can use
dummy variables. The FEM using dummy variables is known as the least-squares
dummy variable (LSDV) model. FEM is appropriate in situations where the individual
specific intercept may be correlated with one or more regressors. A disadvantage of
LSDV is that it consumes a lot of degrees of freedom when the number of cross-
sectional units, N, is very large, in which case we will have to introduce N dummies (but
suppress the common intercept term).
7. The Hausman test can be used to decide between FEM and ECM.
Review question
1. Explain the differences among cross-section, time series and panel (pooled) data.