Chapter 5
Panel Data Analysis
The Nature of Panel Data
Panel data, also known as longitudinal data, have both time
series and cross-sectional dimensions.
They arise when we measure the same collection of people
or objects over a period of time.
Econometrically, the setup is
$$y_{it} = \alpha + \beta' x_{it} + u_{it}$$
where $y_{it}$ is the dependent variable, $\alpha$ is the intercept term,
$\beta$ is a $k \times 1$ vector of parameters to be estimated on the
explanatory variables, $x_{it}$; $t = 1, \dots, T$;
$i = 1, \dots, N$.
The Advantages of using Panel Data
There are a number of advantages from using a full panel
technique when a panel of data is available.
We can address a broader range of issues and tackle more
complex problems with panel data than would be
possible with pure time series or pure cross-sectional data
alone.
It is often of interest to examine how variables, or the
relationships between them, change dynamically (over
time).
By structuring the model in an appropriate way, we can
remove the impact of certain forms of omitted variables
bias in regression results.
Fixed Effects Models
The fixed effects model for some variable $y_{it}$ may be written
$$y_{it} = \alpha + \beta' x_{it} + \mu_i + v_{it}$$
We can think of $\mu_i$ as encapsulating all of the variables that
affect $y_{it}$ cross-sectionally but do not vary over time – for
example, the sector that a firm operates in, a person's gender,
or the country where a bank has its headquarters, etc.
Thus we would capture the heterogeneity that is encapsulated
in $\mu_i$ by a method that allows for different intercepts for each
cross sectional unit.
This model could be estimated using dummy variables, which
would be termed the least squares dummy variable approach.
Fixed Effects Models (Cont’d)
The LSDV model may be written
$$y_{it} = \beta' x_{it} + \mu_1 D1_i + \mu_2 D2_i + \mu_3 D3_i + \cdots + \mu_N DN_i + v_{it}$$
where $D1_i$ is a dummy variable that takes the value 1 for all
observations on the first entity (e.g., the first firm) in the
sample and zero otherwise, $D2_i$ is a dummy variable that
takes the value 1 for all observations on the second entity
(e.g., the second firm) and zero otherwise, and so on.
The LSDV can be seen as just a standard regression model
and therefore it can be estimated using OLS.
Now the model given by the equation above has N+k
parameters to estimate.
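One common way to avoid estimating all N dummy coefficients is the within transformation, which later slides refer to. The algebra below is a sketch; the bar notation for time averages is an assumption and does not appear in the slides.

```latex
% Time average of each variable for entity i (notation assumed):
\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad
\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}, \qquad
\bar{v}_i = \frac{1}{T}\sum_{t=1}^{T} v_{it}

% Averaging the fixed effects model over time for entity i:
\bar{y}_i = \alpha + \beta' \bar{x}_i + \mu_i + \bar{v}_i

% Subtracting the averaged equation from the original removes
% both \alpha and \mu_i, because neither varies over time:
y_{it} - \bar{y}_i = \beta' (x_{it} - \bar{x}_i) + (v_{it} - \bar{v}_i)
```

OLS on the demeaned variables yields the same slope estimates as the dummy variable regression, with only k slope parameters to compute directly.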
Time Fixed Effects Models
It is also possible to have a time-fixed effects model rather
than an entity-fixed effects model.
We would use such a model where we think that the
average value of yit changes over time but not cross-
sectionally.
Hence with time-fixed effects, the intercepts would be
allowed to vary over time but would be assumed to be the
same across entities at each given point in time.
We could write a time-fixed effects model as
$$y_{it} = \alpha + \beta' x_{it} + \lambda_t + v_{it}$$
where $\lambda_t$ is a time-varying intercept that captures all of the
variables that affect $y$ and that vary over time but are constant
cross-sectionally.
An example would be where the regulatory environment or
tax rate changes part-way through a sample period.
In such circumstances, this change of environment may
well influence y, but in the same way for all firms.
Time Fixed Effects Models (Cont’d)
Time-variation in the intercept terms can be allowed
for in exactly the same way as with entity fixed effects.
That is, a least squares dummy variable model could be
estimated
$$y_{it} = \beta' x_{it} + \lambda_1 D1_t + \lambda_2 D2_t + \cdots + \lambda_T DT_t + v_{it}$$
where $D1_t$, for example, denotes a dummy variable that
takes the value 1 for the first time period and zero
elsewhere, and so on.
Time Fixed Effects Models (Cont’d)
The only difference is that now, the dummy variables
capture time variation rather than cross-sectional variation.
Similarly, to avoid estimating a model containing all T
dummies, a within transformation can be conducted to
subtract away the cross-sectional averages from each
observation.
Finally, it is possible to allow for both entity
fixed effects and time fixed effects within the same model.
Such a model would be termed a two-way error component
model.
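The within transformation for time fixed effects can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the panel is invented, noise-free, and built so that y = 2x plus a common time shock. The names `time_demean`, `slope_no_intercept`, and `lam` are hypothetical.

```python
from collections import defaultdict

def time_demean(rows, key):
    """Subtract the cross-sectional average at each time period from `key`."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        totals[r["t"]] += r[key]
        counts[r["t"]] += 1
    return [r[key] - totals[r["t"]] / counts[r["t"]] for r in rows]

def slope_no_intercept(x, y):
    """OLS slope through the origin; the demeaned data already have mean zero."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical panel: 3 entities, 4 periods, y = 2*x + lambda_t, no noise.
lam = {1: 0.0, 2: 5.0, 3: -3.0, 4: 10.0}  # assumed time effects
rows = []
for i in (1, 2, 3):
    for t in (1, 2, 3, 4):
        x = float(i * i + i * t)
        rows.append({"i": i, "t": t, "x": x, "y": 2.0 * x + lam[t]})

# Demeaning within each period wipes out lambda_t, so no time dummies are needed.
x_dm = time_demean(rows, "x")
y_dm = time_demean(rows, "y")
beta_hat = slope_no_intercept(x_dm, y_dm)
print(round(beta_hat, 6))
```

Because the time effect is the same for every entity in a given period, it cancels when the period average is subtracted, and the slope on the transformed data recovers the coefficient used to build the example.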
The Random Effects Model
An alternative to the fixed effects model described above is
the random effects model, which is sometimes also known
as the error components model.
As with fixed effects, the random effects approach
proposes different intercept terms for each entity and again
these intercepts are constant over time, with the
relationships between the explanatory and explained
variables assumed to be the same both cross-sectionally
and temporally.
However, the difference is that under the random
effects model, the intercepts for each cross-sectional
unit are assumed to arise from a common intercept $\alpha$
(which is the same for all cross-sectional units and
over time), plus a random variable $\epsilon_i$ that varies cross-
sectionally but is constant over time.
$$y_{it} = \alpha + \beta' x_{it} + \omega_{it}, \quad \omega_{it} = \epsilon_i + v_{it}$$
$\epsilon_i$ measures the random deviation of each entity’s
intercept term from the common intercept $\alpha$.
How the Random Effects Model Works
Unlike the fixed effects model, there are no dummy
variables to capture the heterogeneity (variation) in the
cross-sectional dimension.
Instead, this occurs via the $\epsilon_i$ terms.
Note that this framework requires the assumptions that the
new cross-sectional error term, $\epsilon_i$, has zero mean, is
independent of the individual observation error term $v_{it}$, has
constant variance, and is independent of the explanatory
variables.
The parameters ($\alpha$ and the $\beta$ vector) are estimated consistently but
inefficiently by OLS, and the conventional formulae would have to
be modified as a result of the cross-correlations between error
terms for a given cross-sectional unit at different points in time.
Instead, a generalised least squares (GLS) procedure is usually
used. The transformation involved in this GLS procedure is to
subtract a weighted mean of the $y_{it}$ over time (i.e. part of the mean
rather than the whole mean, as was the case for fixed effects
estimation).
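The weighted-mean transformation is usually written as a "quasi-demeaning". The symbols $\theta$, $\sigma_v^2$ and $\sigma_\epsilon^2$ below are standard textbook notation assumed here; they do not appear in the slides.

```latex
% Quasi-demeaned variables: only a fraction \theta of the entity mean is removed
y_{it}^{*} = y_{it} - \theta \bar{y}_i, \qquad
x_{it}^{*} = x_{it} - \theta \bar{x}_i

% Weight, where \sigma_v^2 is the variance of v_{it},
% \sigma_\epsilon^2 is the variance of \epsilon_i, and T is the number of periods
\theta = 1 - \frac{\sigma_v}{\sqrt{T \sigma_\epsilon^2 + \sigma_v^2}}

% Limiting cases:
%   \sigma_\epsilon^2 = 0        =>  \theta = 0  (pooled OLS, nothing subtracted)
%   T \sigma_\epsilon^2 \to \infty  =>  \theta \to 1  (the full within transformation)
```

In practice the two variances are unknown and are replaced by estimates, which is why the procedure is a feasible GLS.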
Fixed or Random Effects?
It is often said that the random effects model is more
appropriate when the entities in the sample can be thought
of as having been randomly selected from the population,
but a fixed effect model is more plausible when the entities
in the sample effectively constitute the entire population.
More technically, the transformation involved in the GLS
procedure under the random effects approach will not
remove the explanatory variables that do not vary over
time, and hence their impact can be enumerated.
Also, since there are fewer parameters to be estimated with
the random effects model (no dummy variables or within
transform to perform), and therefore degrees of freedom are
saved, the random effects model should produce more
efficient estimation than the fixed effects approach.
However, the random effects approach has a major drawback
which arises from the fact that it is valid only when the
composite error term $\omega_{it}$ is uncorrelated with all of the
explanatory variables.
Fixed or Random Effects? (Cont’d)
This assumption is more stringent than the corresponding one in
the fixed effects case, because with random effects we
require both $\epsilon_i$ and $v_{it}$ to be independent of all of the $x_{it}$.
This can also be viewed as a consideration of whether any
unobserved omitted variables (that were allowed for by having
different intercepts for each entity) are uncorrelated with the
included explanatory variables. If they are uncorrelated, a
random effects approach can be used; otherwise the fixed
effects model is preferable.
Fixed or Random Effects? (Cont’d)
A test for whether this assumption is valid for the random effects
estimator is based on a slightly more complex version of the
Hausman test.
If the assumption does not hold, the parameter estimates will be
biased and inconsistent.
To see how this arises, suppose that we have only one explanatory
variable, $x_{2it}$, that varies positively with $y_{it}$, and also with the error
term, $\omega_{it}$. The estimator will ascribe all of any increase in $y$ to $x$
when in reality some of it arises from the error term, resulting in
biased coefficients.
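The bias described above can be illustrated with a small constructed example. This is a sketch under stated assumptions: the data are invented and noise-free, and pooled OLS stands in for any estimator that leaves the entity effect in the error term. It is not the random effects GLS estimator itself. The function and variable names are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a bivariate OLS regression with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

def entity_demean(values, ids):
    """Within transformation: subtract each entity's time mean."""
    sums, counts = {}, {}
    for v, i in zip(values, ids):
        sums[i] = sums.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - sums[i] / counts[i] for v, i in zip(values, ids)]

# Hypothetical data: the true slope is 2, and the entity effect (10 * i)
# is larger for entities whose x values are larger.
ids, x, y = [], [], []
for i in (1, 2, 3):
    for t in (1, 2, 3, 4):
        xi = float(3 * i + t)
        ids.append(i)
        x.append(xi)
        y.append(2.0 * xi + 10.0 * i)

pooled = ols_slope(x, y)  # entity effect left in the error term
within = ols_slope(entity_demean(x, ids), entity_demean(y, ids))
print(round(pooled, 3), round(within, 3))
```

The pooled slope is well above 2 because it credits x with variation in y that actually comes from the entity effect, while the within estimate recovers the slope used to build the data.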
Fixed or Random Effects? (Cont’d)
If the regressors are correlated with the individual effects $u_i$, the FE
estimator is consistent but the RE estimator is not
consistent.
If the regressors are uncorrelated with the individual effects $u_i$, the FE
estimator is still consistent, albeit inefficient, whereas
the RE estimator is consistent and efficient.
Fixed or Random Effects? (Cont’d)
Step 1: run a fixed effect model
xtreg lnfdi lngdphome lngdphost, fe
estimate store fe
Step 2 : run a random effect model
xtreg lnfdi lngdphome lngdphost, re
estimate store ran
Step 3: conduct Hausman’s test
hausman fe ran
Step 4 : make a decision as to which specification you should use
If the corresponding probability is < 0.05, the Hausman test’s
null hypothesis that the RE estimator is consistent is rejected:
the individual effects do appear to be correlated with the regressors.
Fixed or Random Effects? (Cont’d)
Step 1: run a fixed effect model
xtreg trade_openess lnarea landlocked lnpop lngdp_pc lntot, fe
estimate store fix
Step 2 : run a random effect model
xtreg trade_openess lnarea landlocked lnpop lngdp_pc lntot, re
estimate store ran
Step 3: conduct Hausman’s test
hausman fix ran
Step 4 : make a decision as to which specification you use. If the
corresponding probability is < 0.05, the Hausman test’s null hypothesis
that the RE estimator is consistent is rejected, i.e., the individual
effects are correlated with the regressors.
Fixed or Random Effects? (Cont’d)
Using the macro data, run the following Hausman’s test
rename exporter country
sort country year
sort id year
tsset id year
xtreg trade_openess lnarea landlocked lnpop lngdp_pc lntot, fe
estimate store fix
xtreg trade_openess lnarea landlocked lnpop lngdp_pc lntot, re
estimate store ran
hausman fix ran
Fixed or Random Effects? (Cont’d)
                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |      fix          ran         Difference          S.E.
-------------+----------------------------------------------------------------
      lnarea |   -.1430533    -.1846615        .0416082        .8741601
       lnpop |    .6770179     .0990702        .5779477        .0751476
    lngdp_pc |    .2283795     .2091023        .0192772         .037618
       lntot |   -.0824428     .0548649       -.1373078         .014989
------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
        = 75.51
Prob>chi2 = 0.0000
(V_b-V_B is not positive definite)
Conclusion: the Hausman test’s null hypothesis that the RE
estimator is consistent is rejected, i.e., country fixed effects
do appear to be correlated with the regressors. We shall
apply the fixed effects model.
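To make the structure of the test statistic concrete, the sketch below computes a Hausman statistic for a single coefficient, using the lnpop row of the output above. This is an illustration only. The joint chi2(4) statistic of 75.51 uses the full covariance matrix of all four coefficients, so the one-coefficient value is not expected to match it. The function name `hausman_single` is hypothetical.

```python
import math

def hausman_single(b_fe, b_re, se_diff):
    """Hausman statistic for ONE coefficient: ((b_fe - b_re) / se_diff) ** 2.

    se_diff is sqrt(Var(b_fe) - Var(b_re)), the S.E. column of the output.
    Under the null the statistic is chi-squared with 1 degree of freedom,
    whose upper-tail probability equals erfc(sqrt(h / 2)).
    """
    h = ((b_fe - b_re) / se_diff) ** 2
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value

# lnpop row: fixed effects estimate, random effects estimate, S.E. of the difference
h, p = hausman_single(0.6770179, 0.0990702, 0.0751476)
print(round(h, 2), p < 0.05)
```

A large gap between the two estimates relative to its standard error gives a large statistic and a small p-value, which is the pattern that leads to rejecting the random effects specification.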
Dynamic Models
All of the models we have considered so far have
been static, e.g.
$$y_t = \beta_1 + \beta_2 x_{2t} + \cdots + \beta_k x_{kt} + u_t$$
But we can easily extend this analysis to the case
where the current value of $y_t$ depends on previous
values of $y$ or one of the $x$’s, e.g.
$$y_t = \beta_1 + \beta_2 x_{2t} + \cdots + \beta_k x_{kt} + \gamma_1 y_{t-1} + \gamma_2 x_{2t-1} + \cdots + \gamma_k x_{kt-1} + u_t$$
We could extend the model even further by adding
extra lags, e.g. $x_{2t-2}$, $y_{t-3}$.
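Estimating a dynamic model first requires lining up each observation with its own lagged values. A minimal sketch of that step, assuming one dependent variable and one regressor; the series values and the name `add_lags` are made up for illustration.

```python
def add_lags(y, x):
    """Return rows (y_t, x_t, y_{t-1}, x_{t-1}).

    The first observation is dropped because it has no lagged value.
    """
    return [(y[t], x[t], y[t - 1], x[t - 1]) for t in range(1, len(y))]

# Hypothetical series of length 5
y = [1.0, 1.5, 1.2, 2.0, 2.4]
x = [0.3, 0.4, 0.1, 0.9, 1.1]

rows = add_lags(y, x)
for row in rows:
    print(row)
```

Each extra lag costs one more observation at the start of the sample, which is worth bearing in mind when T is small.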
Why Might we Want/Need To Include Lags in a
Regression?
Inertia of the dependent variable
Over-reactions
However, other problems with the regression could cause
the null hypothesis of no autocorrelation to be rejected:
Omission of relevant variables, which are themselves
autocorrelated.
A “misspecification” error committed by
using an inappropriate functional form.
Autocorrelation resulting from unparameterised
seasonality.
Models in First Difference Form
Another way to sometimes deal with the problem of
autocorrelation is to switch to a model in first
differences.
Denote the first difference of $y_t$, i.e. $y_t - y_{t-1}$, as $\Delta y_t$;
similarly for the x-variables, $\Delta x_{2t} = x_{2t} - x_{2t-1}$ etc.
The model would now be
$$\Delta y_t = \beta_1 + \beta_2 \Delta x_{2t} + \cdots + \beta_k \Delta x_{kt} + u_t$$
Sometimes the change in y is purported to depend
on previous values of $y$ or $x_t$ as well as changes in
x:
$$\Delta y_t = \beta_1 + \beta_2 \Delta x_{2t} + \beta_3 x_{2t-1} + \beta_4 y_{t-1} + u_t$$
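The first-difference model can be sketched as follows. The example is a constructed, noise-free series in which y has a linear trend of 0.5 per period and a slope of 3 on x, so the differenced regression should return an intercept of 0.5 and a slope of 3. All names and numbers are assumptions made for illustration.

```python
def diff(series):
    """First difference: element t minus element t-1 (one observation is lost)."""
    return [b - a for a, b in zip(series, series[1:])]

def ols(x, y):
    """Bivariate OLS with an intercept; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope

# Hypothetical series: y_t = 1 + 3 * x_t + 0.5 * t
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [1.0 + 3.0 * v + 0.5 * t for t, v in enumerate(x)]

dx, dy = diff(x), diff(y)
b1, b2 = ols(dx, dy)  # b1 estimates the intercept, b2 the slope on the change in x
print(round(b1, 6), round(b2, 6))
```

Differencing removes the constant in levels and turns the linear trend into the intercept of the differenced equation, which is one reason a model in first differences can still include an intercept term.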