
Panel time-series modeling:

New tools for analyzing xt data

Markus Eberhardt

University of Nottingham

email: [email protected]
web: http://sites.google.com/site/medevecon (code, data),
http://twitter.com/sjoh2052 (data updates)

2011 UK Stata Users Group meeting


Cass Business School, London
16th September 2011

Markus Eberhardt (Nottingham) Panel Time Series in Stata 2011 1 / 42


Acknowledgements
In this presentation I touch on a number of novel and existing Stata
routines. Due to lack of space I do not acknowledge the authors
when I discuss the routines, but their contribution is hereby
gratefully acknowledged:
Kit Baum and Fabian Bornhorst (levinlin, ipshin)
Piotr Lewandowski (pescadf)
Scott Merryman (xtfisher)
Rafael E. De Hoyos and Vasilis Sarafidis (xtcsd)
Damiaan Persyn and Joakim Westerlund (xtwest)
Edward F. Blackburne III and Mark W. Frank (xtpmg)
My own contributions
multipurt
xtcd
xtmg
can be found at SSC, including help files and empirical examples.
1 Overview
Relevance, Approach and References

2 Introduction
The two worlds of panel data econometrics
Why not just one world?
The domain of panel time series

3 Stationarity testing in panels

4 Cross-section dependence testing

5 Cointegration testing

6 Estimation in Heterogeneous Parameter Models

7 Outlook


1 Overview


A new field of panel econometrics
- ‘Panel time-series’ (PTS) or ‘nonstationary panel econometrics’ is of great relevance for development economists: PWT, UNIDO INDStat and other macro panel datasets all display the data properties discussed here. Further academic fields faced with macro panel data: regional science, climate research; the same data properties are likely in a host of other fields, too. Close link to Gordon Hughes’ talk yesterday!
- Relatively new field of study: theory starts in the early 1990s, with most activity over the past 5-10 years; relatively few researchers are using the methods; not much accessible literature yet.
- Many of the theoretical concepts are rather intuitive and some of the methods are relatively easy to implement.
- No textbook, but some introductory readings: Baltagi (2008, Econometric Analysis of Panel Data, Chapter 12), Coakley, Fuertes, and Smith (2006, Comp Stats & Data Analysis) and Eberhardt and Teal (2011, J Econ Surveys).


2 Introduction


Micro (‘short T’) and Macro (‘long T’) panels

                         | Macro panels | Micro panels
aka                      | time-series panels | longitudinal panels
N (‘groups’)             | moderate, typically < 100; countries, regions | substantial, at times thousands; individuals, firms, households
T                        | substantial, typically > 20; years; macro-finance: quarters, months | short, < 10, most commonly T < 5; typically years
asymptotics              | N, T → ∞, sequentially or jointly (restrictions!) | N → ∞
parameters               | study heterogeneity across groups, not just time-invariant FE | homogeneity assumed, FE assumed to pick up all heterogeneity
dynamics                 | often non-trivial, idiosyncratic | limited to lagged dep. variable, homogeneous
variable properties      | unit roots, other nonstationarities (e.g. structural breaks, trends) | stationary data
endogeneity              | pervasive, lack of valid instruments | instruments available (incl. own lags)
cross-section dependence | tested and accommodated, but structure not analysed | independence assumed (except: spatial econometrics)
estimators               | MG, CCEMG, AMG, PMG, GM-FMOLS | FE, DiffGMM, SysGMM, Olley & Pakes, Levinsohn & Petrin (for productivity analysis)
Stata commands           | xtmg | xtreg, xtabond2, opreg, levpet
diagnostic tests         | residual properties mean something | ‘standard’ output (i.e. not much)

Adapted from Pedroni (2008)

Note: The vast majority of empirical research using ‘macro panels’ implements ‘micro panel’ methods!


Issue #1 Parameter Heterogeneity
If T is large enough we can estimate each time series separately and test for heterogeneity. This raises the question of what the parameters of interest are: the coefficients of the individual units, say β_i for i = 1, ..., N, or the expected values (‘means’) and the variances of the coefficients over the groups, E[β_i] and Var[β_i]?

Consider the following example data generating process (read: the true process driving the data), taken from Smith and Fuertes (2007): let

\[ y_{it} = \mu_i + \varepsilon_{it}, \qquad E[\varepsilon_{it}] = 0, \quad \mathrm{Var}[\varepsilon_{it}] = E[\varepsilon_{it}^2] = \sigma_\varepsilon^2 \]

For each group i there is zero-mean variation in y around a constant group-specific mean µ_i. Furthermore, these group-specific means also vary across groups:

\[ \mu_i = \mu + \eta_i, \qquad E[\eta_i] = 0, \quad \mathrm{Var}[\eta_i] = E[\eta_i^2] = \sigma_\eta^2 \]


Issue #1 Parameter Heterogeneity (cont’d)
We can now consider the different means (ȳ) we can estimate for this very simple example:

\[ \bar{y}^{\heartsuit} = (NT)^{-1} \sum_i \sum_t y_{it}, \qquad E[\bar{y}^{\heartsuit}] = \mu, \qquad \mathrm{Var}[\bar{y}^{\heartsuit}] = N^{-1}\sigma_\eta^2 + (NT)^{-1}\sigma_\varepsilon^2 \]

\[ \bar{y}_i^{\clubsuit} = T^{-1} \sum_t y_{it}, \qquad E[\bar{y}_i^{\clubsuit}] = \mu_i, \qquad \mathrm{Var}[\bar{y}_i^{\clubsuit}] = T^{-1}\sigma_\varepsilon^2 \]

\[ \bar{y}_t^{\spadesuit} = N^{-1} \sum_i y_{it}, \qquad E[\bar{y}_t^{\spadesuit}] = \mu_t = \mu, \qquad \mathrm{Var}[\bar{y}_t^{\spadesuit}] = N^{-1}(\sigma_\varepsilon^2 + \sigma_\eta^2) \]

Even in such a simple model setup, we thus obtain very different results: two of the averages are estimates of the population average (µ), whereas the third is an estimate of the group-specific average (µ_i, with moments stated conditional on µ_i), and the variances of the estimators differ across all three. All of them are unbiased and consistent estimators of something (as N → ∞, T → ∞ and N → ∞, respectively); the question is just whether this something is interesting at all... Lesson: what is the statistic of interest? What is the research question?
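The three averages can be checked by simulation. The following is a minimal sketch, not part of the original talk: the panel dimensions, parameter values and function names are illustrative assumptions, and normal draws are used only for convenience. It assumes η_i and ε_it are independent, in which case the pooled mean has variance σ²_η/N + σ²_ε/(NT).

```python
import random
import statistics


def one_draw(n, t, mu, sd_eta, sd_eps, rng):
    """Simulate one N x T panel from y_it = mu + eta_i + eps_it."""
    group_means = [mu + rng.gauss(0.0, sd_eta) for _ in range(n)]
    panel = [[m + rng.gauss(0.0, sd_eps) for _ in range(t)] for m in group_means]
    pooled = sum(map(sum, panel)) / (n * t)          # average over i and t
    group_dev = sum(panel[0]) / t - group_means[0]   # group-0 average minus mu_0
    period = sum(row[0] for row in panel) / n        # period-0 average over i
    return pooled, group_dev, period


def monte_carlo(reps=2000, n=10, t=5, mu=2.0, sd_eta=1.0, sd_eps=1.0, seed=1):
    """Monte Carlo moments of the three averages (illustrative settings)."""
    rng = random.Random(seed)
    draws = [one_draw(n, t, mu, sd_eta, sd_eps, rng) for _ in range(reps)]
    pooled, group_dev, period = zip(*draws)
    return {
        "mean_pooled": statistics.fmean(pooled),
        "var_pooled": statistics.pvariance(pooled),    # theory: 1/10 + 1/50 = 0.12
        "var_group": statistics.pvariance(group_dev),  # theory: 1/5 = 0.20
        "var_period": statistics.pvariance(period),    # theory: (1 + 1)/10 = 0.20
    }


if __name__ == "__main__":
    for name, value in monte_carlo().items():
        print(f"{name}: {value:.3f}")
```

The group average is measured as a deviation from its own µ_i, so its simulated variance corresponds to the conditional variance on the slide.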


Issue #2 Variable non-stationarity
Example: cumulative rainfall data for Fortaleza, Northern Brazil, and the evolution of UK per capita GDP. Consider the OLS regression

\[ \log(Y/L)_t = 2.1963\;[t = 19.47] + 0.2898\;[t = 32.53]\;\mathrm{rainfall}_t \]
R² = .874, F(1, 152) = 1058.23 (p = .000), T = 154

We get a positive, statistically significant relationship between the two variables, which seems to suggest that rainfall is a very good predictor of UK per capita GDP (causal relation!?). However, the relationship becomes insignificant if we run the model with the variables in first differences (allowing for a drift term):

\[ \Delta\log(Y/L)_t = 0.0156\;[0.57] + 0.1863\;[1.02]\;\Delta\mathrm{rainfall}_t \]
R² = .007, F(1, 151) = 1.67 (p = .310), T = 153

⇒ Spurious regression: a result of apparent significance in a regression model of two (or more) nonstationary variables. Luckily, we can test for cointegration to see whether the relationship is spurious or not.
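The spurious regression problem is easy to reproduce with artificial data. The sketch below is illustrative only: it uses two driftless random walks that are independent by construction (not the rainfall and GDP series from the slide), and all function names and settings are assumptions. It counts how often a conventional t-test rejects a zero slope in levels and in first differences.

```python
import math
import random


def ols_t_ratio(x, y):
    """t-ratio of the slope in y = a + b*x + e (OLS with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b / math.sqrt(rss / (n - 2) / sxx)


def random_walk(t, rng):
    """Driftless random walk with standard normal innovations."""
    level, path = 0.0, []
    for _ in range(t):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def rejection_rates(reps=300, t=100, seed=42):
    """Share of replications with |t| > 1.96, in levels and in differences."""
    rng = random.Random(seed)
    hits_levels = hits_diffs = 0
    for _ in range(reps):
        x, y = random_walk(t, rng), random_walk(t, rng)  # unrelated series
        dx = [b - a for a, b in zip(x, x[1:])]
        dy = [b - a for a, b in zip(y, y[1:])]
        hits_levels += abs(ols_t_ratio(x, y)) > 1.96
        hits_diffs += abs(ols_t_ratio(dx, dy)) > 1.96
    return hits_levels / reps, hits_diffs / reps


if __name__ == "__main__":
    levels, diffs = rejection_rates()
    print(f"rejection rate in levels:      {levels:.2f}")
    print(f"rejection rate in differences: {diffs:.2f}")
```

In levels the nominal 5% test rejects in the majority of replications, while in first differences the rejection rate stays close to the nominal size.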
Issue #2 Variable non-stationarity (cont’d)

[Figure: Rainfall in Fortaleza/Brazil and UK GDP pc. Left panel: time-series plot by year (1850-2000), with GDP pc (in logs) in grey (left axis) and cumulative rainfall (in m) in black (right axis). Right panel: scatter plot with GDP pc (in logs) on the y-axis and cumulative rainfall for Fortaleza (in m) on the x-axis.]

Note: This example is inspired by James Reade’s lectures.


Issue #3 Cross-section correlation
- Variable and/or residual correlation across panel members: due to common shocks (e.g. recession) or spillover effects.
- Standard panel estimators assume cross-section independence.
- If neglected, cross-section dependence (CSD) can lead to imprecise estimates and at worst to a serious identification problem.
- Example (next slide): agro-climatic ‘distance’, i.e. how similar or different is the climatic environment in agriculture?
- Spatial econometrics: the econometrician ‘knows’ how panel members are associated/correlated (e.g. neighbourhood) and models this association explicitly by employing a weight matrix (‘spatially lagged dependent variable’). [Gordon’s talk!]
- Common factor models: model dependence with unobserved common factors f_t with heterogeneous impact γ_i. The trick is to estimate the common factors or to blend out their impact on estimation.
Issue #3 Cross-section correlation (cont’d)
Agro-climatic ‘distance’ — the view from Kenya. Kenya’s cultivated land: 40% is located in zone Aw (Equatorial savannah,
dry winters), 19% in zone BS (steppe), 17% in zone BW (desert) and 25% in zone H (highland climate). Source: Matthews
(1983), in Gallup, Mellinger, and Sachs (1999).

Taken from: Eberhardt & Teal (2011) ‘No mangos in the tundra: spatial
heterogeneity in agricultural productivity analysis’, Working paper.



Existing methods won’t do...
- Apply DiffGMM, SysGMM: macro panels are now arguably the main playground for this type of estimator, even though they were developed for large N, short T! Problems: they require stationary variables or at least stationarity in the initial condition (t = 0, for SysGMM); overfitting problem with long T panels; assume parameter homogeneity for instrumentation (see Pesaran & Smith, 1995); assume cross-section independence.
- Treat as a large system of equations (VAR, VECM): for Z_it' = (y_it, X_it'), with Z*_t stacking the Z_it of all N groups, a panel can be thought of as

\[ \Delta Z_t^* = c + \Pi Z_{t-1}^* + \sum_{m=1}^{M} \Phi_m \Delta Z_{t-m}^* + \varepsilon_t^* \]

  Problem: a general VECM quickly becomes infeasible as N rises, because the number of parameters to be estimated grows with the square of N.
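To see how quickly the system approach becomes infeasible, one can simply count coefficients. This is a back-of-the-envelope sketch under the assumption of a fully unrestricted VECM; the function name and the example dimensions are hypothetical, and the error covariance matrix is not counted.

```python
def vecm_parameter_count(n_groups, n_vars, n_lags):
    """Mean-equation coefficients in an unrestricted VECM.

    n_vars is the number of series per group (1 + k in the slide's notation),
    so the stacked system has d = n_groups * n_vars equations. The constant c
    has d entries; Pi and each of the n_lags matrices Phi_m are d x d.
    """
    d = n_groups * n_vars
    return d + d * d * (1 + n_lags)


if __name__ == "__main__":
    for n in (5, 20, 50):
        # Two series per group (y and one x), one lagged difference.
        print(n, vecm_parameter_count(n, n_vars=2, n_lags=1))
```

With two series per group and one lag, 5 groups already imply 210 coefficients and 20 groups imply 3,240, far more than a typical macro panel with T in the range of 20 to 50 observations per series can support.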


So when are Panel Time Series methods most appropriate?
1 time-dimension T
   - too short for reliable inference for any single group alone...
   - ...but long enough to deal with dynamics flexibly.
2 cross-section dimension N
   - too large to be treated as a system (as in the VECM)...
   - ...but not so large as to ‘overwhelm’ the T dimension (many tests require N/T → 0 for asymptotics).
3 data properties
   - some processes are nonstationary, such that cointegration is a possibility for some groups in the panel.
   - potential for heterogeneity in the relationship across groups, dynamics non-trivial.
   - some commonality exists across groups: if there are no commonalities, then nothing is gained by combining the information in a panel compared to a single time series.
   - cross-section dependence may be an issue (a variable in country i may be non-spuriously correlated with a variable in country j; unobserved factors common to all countries).


3 Stationarity testing in panels


Examples of stochastic processes

Notes: These graphs were produced in OxMetrics 5 using PcNaive.



Stationarity testing in time-series land

\[ y_t = a + \rho y_{t-1} + \varepsilon_t \;\Leftrightarrow\; \Delta y_t = a + (\rho - 1)\, y_{t-1} + \varepsilon_t = a + b\, y_{t-1} + \varepsilon_t, \qquad b \equiv \rho - 1 \tag{1} \]

If ρ = 1 ⇔ b = 0 this collapses to ∆y_t = a + ε_t.

Dickey-Fuller (DF) test: one way of testing the unit root hypothesis H0: ρ = 1 is to compute the t-ratio for y_{t−1} in equation (1). Since this involves a definitely I(0) variable on the LHS and a potentially I(1) variable on the RHS, the t-ratio does not have a standard t-distribution but follows the ‘Dickey-Fuller distribution’.

The Augmented Dickey-Fuller (ADF) test takes potential serial correlation in the error term into account: this is achieved by introducing lagged terms of the dependent variable. Alternative testing procedures use other parametric or nonparametric techniques to wash out serial correlation.
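The DF t-ratio in equation (1) is just the slope t-ratio from regressing ∆y_t on a constant and y_{t−1}. The sketch below computes it for two simulated AR(1) series; the function names, seed and sample length are illustrative assumptions, and no lag augmentation is included (so this is the plain DF regression, not the ADF).

```python
import math
import random


def df_t_ratio(y):
    """t-ratio on y_{t-1} in the regression dy_t = a + b*y_{t-1} + e_t."""
    lagged = y[:-1]
    dy = [cur - prev for prev, cur in zip(y, y[1:])]
    n = len(dy)
    mx, my = sum(lagged) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in lagged)
    b = sum((v - mx) * (d - my) for v, d in zip(lagged, dy)) / sxx
    a = my - b * mx
    rss = sum((d - a - b * v) ** 2 for v, d in zip(lagged, dy))
    return b / math.sqrt(rss / (n - 2) / sxx)


def ar1(rho, t, rng):
    """AR(1) series y_t = rho*y_{t-1} + e_t starting at zero."""
    y = [0.0]
    for _ in range(t - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y


if __name__ == "__main__":
    rng = random.Random(7)
    print("stationary (rho = 0.5):", round(df_t_ratio(ar1(0.5, 500, rng)), 2))
    print("unit root  (rho = 1.0):", round(df_t_ratio(ar1(1.0, 500, rng)), 2))
```

The statistic has to be compared with Dickey-Fuller critical values (roughly −2.86 at the 5% level for the constant-only case in large samples) rather than with the usual −1.96.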


The trouble with time-series unit root tests
A single time-series test might not reject H0, but we’d still doubt that the data is I(1):
- (*) low power of the tests in the near-unit-root case
- inference is sensitive to the treatment of serially correlated errors and the treatment of means and trends
- sensitivity to structural breaks
- power depends on the time span: short (decades) time series look I(1), long ones (a century) I(0)
- non-linearities

(*) By far the most serious shortcoming. Consider the power statistics for time-series unit root tests:

AR coefficient | DF   | ADF  | PP
0.25           | 0.90 | 0.93 | 0.95
0.65           | 0.25 | 0.78 | 0.89
0.95           | 0.09 | 0.15 | 0.20

Recall: the ‘power’ of a test is its ability to reject the null when it is false; ‘size’ is the probability of rejecting the null when it is actually true.


PURT (panel unit root test) implementations in Stata
First generation PURTs
- Levin and Lin (1992) pooled ADF test (levinlin)
- Im, Pesaran, and Shin (1997) averaged unit root test for heterogeneous panels (IPS) (ipshin)
- Maddala and Wu (1999) Fisher combination test (MW) (xtfisher)
- Breitung (2000), Hadri (2000), Harris & Tzavalis (1999) (xtunitroot with options breitung, hadri, ht, respectively, in addition to the above tests)
Second generation PURTs
- Pesaran (2007) panel unit root test (pescadf)
- Pesaran, Smith, and Yamagata (2009) panel unit root test (xtcipsm, under construction)
Convenient tool
- multipurt combines xtfisher and pescadf but allows multiple variables and ranges of lag augmentations.
Alternative approaches currently unavailable
- Bai and Ng (2004) PANIC attack
Practical example

Maddala and Wu (1999) Fisher Test, constant (p-values in parentheses)
lags | ln Y_it      | ln L_it      | ln K_it      | ln R_it
0    | 377.10 (.00) | 195.89 (.98) | 475.55 (.00) | 821.56 (.00)
1    | 387.37 (.00) | 318.94 (.00) | 353.65 (.00) | 376.22 (.00)
2    | 329.96 (.00) | 184.69 (.99) | 277.02 (.04) | 373.42 (.00)
3    | 292.94 (.01) | 211.53 (.89) | 329.64 (.00) | 361.32 (.00)

Pesaran (2007) CIPS Test, constant (p-values in parentheses)
lags | ln Y_it     | ln L_it     | ln K_it     | ln R_it
0    | 2.33 (.99)  | 3.46 (.99)  | 8.01 (.99)  | 9.45 (.99)
1    | 2.50 (.99)  | -0.24 (.41) | 8.43 (.99)  | 7.13 (.99)
2    | 10.36 (.99) | 8.39 (.99)  | 10.27 (.99) | 14.58 (.99)
3    | 15.22 (.99) | 12.55 (.99) | 11.63 (.99) | 16.51 (.99)

Code:
xtfisher lny, lags(1)
pescadf lny, lags(1)
for each variable/lag-length, or
multipurt lny lnl lnk lnrd, lags(3)

Table from: Eberhardt, Helmers & Strauss (forthcoming) ‘Do spillovers matter when estimating private returns to R&D?’, The Review of Economics and Statistics
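The Maddala and Wu (1999) statistic combines the p-values of N group-specific unit root tests as P = −2 Σ ln p_i, which is χ²(2N) under the null when the groups are independent. The sketch below illustrates only this combination step; it is not the xtfisher code, the function names are hypothetical, and the input p-values would have to come from separate ADF (or similar) tests.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-squared variate with even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def fisher_combination(p_values):
    """Return (P, p-value) with P = -2 * sum(ln p_i) ~ chi2(2N) under H0."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even(stat, 2 * len(p_values))


if __name__ == "__main__":
    # Hypothetical p-values from four group-specific unit root tests.
    stat, p = fisher_combination([0.04, 0.20, 0.01, 0.35])
    print(f"P = {stat:.2f}, combined p-value = {p:.4f}")
```

Because the degrees of freedom are always 2N, the closed-form tail probability for even degrees of freedom is sufficient and no statistical library is required.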
4 Cross-section dependence testing


1 Investigate the share of variation explained by the first two principal components.
2 Mean (absolute) correlation coefficients (ρ̂_ij, |ρ̂_ij|) of variables or residuals.
3 Pesaran (2004) CD test
   - \[ CD = \sqrt{\frac{2}{N(N-1)}} \left( \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \sqrt{T_{ij}}\; \hat{\rho}_{ij} \right), \qquad CD \sim N(0, 1) \text{ under the null of cross-section independence} \]
   - The xtcsd program only works after xtreg, which is quite limiting.
   - Can apply xtcd to test ‘raw’ variables, residuals of AR(2) regressions [pooled, heterog] and residuals from any other models.
4 Moscone and Tosetti (2009): a number of alternative tests (but none performs better than CD).
5 Jensen & Schmitt (2011, Spatial Econ Analysis): Schott test of interest when N is small.
6 Variety of spatial econometric tests (cross-section) available if structure (W-matrix) is imposed.
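For a balanced panel, where T_ij = T for every pair, the CD statistic can be computed directly from the pairwise correlations. The following is a minimal sketch of that formula, not the xtcd or xtcsd implementation; the function names and the toy data are assumptions.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pesaran_cd(series):
    """CD statistic for a balanced panel: a list of N series of length T."""
    n, t = len(series), len(series[0])
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += math.sqrt(t) * correlation(series[i], series[j])
    return math.sqrt(2.0 / (n * (n - 1))) * total


if __name__ == "__main__":
    # Toy example: two perfectly correlated series of length 25.
    trend = [float(v) for v in range(25)]
    print(pesaran_cd([trend, trend]))
```

With two identical series of length 25 the pairwise correlation is one, so the statistic equals √25 = 5; in applied work the inputs would be variables or regression residuals, and values far from zero reject cross-section independence.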
Practical example

Panel A: Levels
        | ln Y_it | ln L_it | ln K_it | ln R_it
avg ρ   | 0.29    | 0.30    | 0.55    | 0.40
avg |ρ| | 0.59    | 0.57    | 0.77    | 0.78
CD      | 110.44  | 105.45  | 199.00  | 149.64
p-value | 0.00    | 0.00    | 0.00    | 0.00

Panel B: First differences
        | ∆ln Y_it | ∆ln L_it | ∆ln K_it | ∆ln R_it
avg ρ   | 0.17     | 0.17     | 0.20     | 0.03
avg |ρ| | 0.26     | 0.28     | 0.34     | 0.34
CD      | 58.78    | 59.08    | 68.53    | 12.50
p-value | 0.00     | 0.00     | 0.00     | 0.00

Panel C: Pooled AR(2)
        | ln Y_it | ln L_it | ln K_it | ln R_it
avg ρ   | 0.00    | 0.00    | 0.00    | 0.02
avg |ρ| | 0.23    | 0.26    | 0.24    | 0.25
CD      | -0.55   | -1.42   | -1.03   | 7.05
p-value | 0.58    | 0.16    | 0.30    | 0.00

Panel D: Country-industry AR(2)
        | ln Y_it | ln L_it | ln K_it | ln R_it
avg ρ   | 0.13    | 0.12    | 0.09    | 0.02
avg |ρ| | 0.25    | 0.25    | 0.25    | 0.23
CD      | 45.46   | 42.20   | 33.78   | 8.44
p-value | 0.00    | 0.00    | 0.00    | 0.00

Code:
xtcd lny lnl lnk lnrd (Panel A)

Table from: Eberhardt, Helmers & Strauss (forthcoming) ‘Do spillovers matter when estimating private returns to R&D?’, The Review of Economics and Statistics


5 Cointegration testing


Some issues of testing for cointegration
Conceptual concerns
In time-series
1 What’s the null: cointegration or noncointegration?
2 Do we use a parametric (lags) or nonparametric (kernels)
method to adjust for serial correlation in the residuals?
In panels we additionally need to worry about
1 How much heterogeneity do we allow across groups/countries?
2 How do we combine the statistics we arrive at if we opted for
heterogeneous tests?

Major approaches
Run some regression, collect residuals and test for stationarity
(‘residual-based tests’)
Construct an error correction model and investigate whether
the EC term is significant (‘error correction tests’)



Cases to consider (leaving factors aside)

\[ y_{it} = \alpha_i + \beta_i x_{it} + u_{it} \tag{2} \]
\[ x_{it} = \mu + x_{i,t-1} + v_{it}, \qquad u_{it} = \rho_i u_{i,t-1} + \varepsilon_{it} \]

1 ρ_i = 1 ∀ i: errors I(1), no cointegration between x and y.
2 ρ_i < 1 ∀ i: errors stationary, cointegration.
3 ρ_i < 1 ∀ i and β_i = β: homogeneous cointegration; otherwise heterogeneous cointegration.
4 If there is heterogeneous cointegration but we impose homogeneity β_i = β, then what is in effect estimated is
\[ y_{it} = a_i + b\, x_{it} + \{(\beta_i - b)\, x_{it} + e_{it}\} \]
where the composite error term in {} will generally not be stationary even though every group individually cointegrates.
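Case 4 can be spelled out in one step. The lines below are a sketch that takes a_i = α_i and e_it = u_it for simplicity (an assumption made here for exposition) and simply adds and subtracts b x_it in equation (2).

```latex
\begin{align*}
y_{it} &= \alpha_i + \beta_i x_{it} + u_{it} \\
       &= \alpha_i + b\, x_{it} + \underbrace{(\beta_i - b)\, x_{it}}_{I(1)\ \text{if}\ \beta_i \neq b} + \underbrace{u_{it}}_{I(0)\ \text{if}\ \rho_i < 1}
\end{align*}
```

Because x_it is a random walk with drift, (β_i − b) x_it is nonstationary for every group whose slope differs from the imposed common b, and the sum of an I(1) and an I(0) term is I(1). The pooled residuals can therefore look nonstationary even when each group cointegrates.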


1st generation tests
- Kao (1999): run a static fixed effects model of the variables assumed to be cointegrated, get the residuals and apply a pooled ADF regression (analogous to the Engle-Granger procedure in time series). We get a Dickey-Fuller test of cointegration (if ê_it ∼ I(1) we cannot reject H0 of no cointegration; if ê_it ∼ I(0) we reject H0). Kao suggests a total of 5 tests, all rather restrictive on the cointegrating vector (common) and dynamics (common).
- Pedroni (1999) and Pedroni (2004): introduce flexibility/heterogeneity in terms of the cointegrating vector and dynamics. Still residual tests in the Engle-Granger tradition. Two groups of statistics: ‘group-mean’ (heterog), ‘panel’ (pooled). Separate tests for parametric/nonparametric versions. Adjustment terms to make all nine tests N(0,1) under the null of no cointegration.
- McCoskey and Kao (1998): LM test for H0 of cointegration, a reverse-null test (like KPSS in stationarity testing). Distribution is non-standard: bootstrap.
- How does rejection of H0 (no cointegration) come about in heterog tests? What’s the intuition? Baltagi (2005, p.255): “enough of the individual cross-sections have statistics ‘far away’ from the means predicted by theory were they to be generated under the null.”
- None of these is coded in Stata, but Kao (1999) could be easily implemented.
2nd generation: Westerlund (2007)
ECM approach: check whether an ECM does/does not have error correction (individual group or full panel).

\[ \Delta y_{it} = c_i + a_{0i}\,(y_{i,t-1} - b_i x_{i,t-1}) + \sum_{j=1}^{K_{1i}} a_{1ij}\, \Delta y_{i,t-j} + \sum_{j=-K_{2i}}^{K_{3i}} a_{2ij}\, \Delta x_{i,t-j} + u_{it} \tag{3} \]

where a_0i is the error correction/speed of adjustment term. Note that the penultimate term includes lags and leads of ∆x; otherwise we need to assume exogeneity of x. Estimate separately ∀ i (with appropriate K_i). If a_0i = 0 → no error correction → y, x not cointegrated. If a_0i < 0 → error correction → cointegration. In total 4 tests, based on the ‘group mean’ and ‘pooled panel’ ideas. Large negative values reject H0 of no cointegration. If CSD is suspected: use the bootstrap to obtain robust critical values.

In practice: long T is important, and there is a strong assumption about the direction of causation from x to y (weak exogeneity of x).

Coded in Stata as xtwest (needs matvsort); often quite stark results (homog/heterog) unless T is large.


2nd generation: Gengenbach, Urbain, and Westerlund (2009)
Again: ECM approach. This time we assume a common factor structure for the CSD and account for the factors in the test regressions

\begin{align}
\Delta Y_{it} &= \alpha_i Y_{i,t-1} + \gamma_{1i} X_{i,t-1} + \gamma_{2i} F_{i,t-1} \nonumber \\
 &\quad + \sum_{s=1}^{p_i} \pi_{1is} \Delta Y_{i,t-s} + \sum_{s=0}^{p_i} \pi_{2is} \Delta X_{i,t-s} + \sum_{s=0}^{p_i} \pi_{3is} \Delta F_{i,t-s} + \varepsilon_{it} \tag{4} \\
 &= \alpha_i Y_{i,t-1} + \gamma_{1i} X_{i,t-1} + \psi_{1i} \bar{Y}_{t-1} + \psi_{2i} \bar{X}_{t-1} \nonumber \\
 &\quad + \sum_{s=1}^{p_i} \pi_{1is} \Delta Y_{i,t-s} + \sum_{s=0}^{p_i} \pi_{2is} \Delta X_{i,t-s} + \sum_{s=1}^{p_i} \phi_{1is} \Delta \bar{Y}_{t-s} + \sum_{s=0}^{p_i} \phi_{2is} \Delta \bar{X}_{t-s} + \varepsilon_{it} \tag{5}
\end{align}

The equation is estimated ∀ i individually with the ‘ideal’ lag-length (AIC, BIC selection criterion); the results are then averaged, using either the t-ratios of α̂_i or a Wald test of α̂_i and γ̂_1i. The test distributions are nonstandard, so we have to go by critical values created from simulations. In practice, apply a truncation rule to wipe out the influence of outliers.

With a little effort this can be coded in Stata: all you’re doing is running country regressions. I am currently working on this command: xtectest.


6 Estimation in Heterogeneous Parameter Models


Empirical setup: common factor model
For i = 1, ..., N, t = 1, ..., T, let

\[ y_{it} = \beta_i' x_{it} + u_{it}, \qquad u_{it} = \alpha_i + \gamma_i' f_t + \varepsilon_{it} \]
\[ x_{mit} = \pi_{mi} + \delta_{mi}' g_{mt} + \rho_{1mi} f_{1mt} + \ldots + \rho_{nmi} f_{nmt} + v_{mit} \]
where m = 1, ..., k and f_{·mt} ⊂ f_t

- Cobb-Douglas (CD) production: observed output y_it, k observed factor inputs x_it (in logs).
- Unobserved common factors f_t (account for TFP) and g_t.
- y_it, x_it as well as f_t, g_mt are potentially nonstationary.
- Country-specific factor parameters β_i.
- Country-specific factor loadings γ_i, δ_i, ρ_i.
- Country-specific fixed effects α_i, π_mi.
- i.i.d. errors ε_it, v_it.
- Correlation between u and x: endogeneity.


Empirical implementation

                                     | Factor loadings: homogeneous | Factor loadings: heterogeneous
Technology parameters: homogeneous   | A                            | B
Technology parameters: heterogeneous | C                            | D

A: POLS, FE, FD-OLS (all with time dummies)
B: CCEP
C: MG, RCM (all with country trends)
D: CMG, AMG, ARCM

AMG is the ‘Augmented Mean Group’ estimator (Eberhardt & Teal, 2010), a two-step procedure conceptually similar to the Pesaran (2006) CCE estimator in the Mean Group version.

\[ \text{(i)} \quad \Delta y_{it} = b' \Delta x_{it} + \sum_{t=2}^{T} c_t\, \Delta D_t + e_{it} \quad\Rightarrow\quad \hat{c}_t \equiv \hat{\mu}_t^{\bullet} \]
\[ \text{(ii)} \quad y_{it} = a_i + b_i' x_{it} + c_i t + d_i \hat{\mu}_t^{\bullet} + e_{it} \quad\Rightarrow\quad \hat{b}_{AMG} = N^{-1} \sum_i \hat{b}_i \]


Standard Mean Group (MG) estimator
- N time-series regressions: \( y_{it} = a_i + b_i' x_{it} + c_i t + e_{it} \)
- Averaging: \( \hat{b}_{MG} = N^{-1} \sum_i \hat{b}_i \)

Common Correlated Effects Mean Group (CCEMG or CMG)
- Major insight: \( f_t = \bar{\gamma}^{-1} (\bar{y}_t - \bar{\alpha} - \bar{\beta}' \bar{x}_t) \) for N → ∞ since \( \bar{\varepsilon}_t = 0 \) (iff \( \bar{\gamma} \neq 0 \))
- Augmentation: \( y_{it} = a_i + b_i' x_{it} + d_{1i} \bar{y}_t + d_{2i}' \bar{x}_t + e_{it} \)
  ⇒ \( \hat{b}_{CMG} = N^{-1} \sum_i \hat{b}_i \) [can apply weights]
  where the cross-section means ȳ_t (T × 1) and x̄_t (T × k) proxy for f_t.
- Incredibly simple setup, very powerful in ‘soaking up’ heterogeneities (observed, unobserved).

Common Correlated Effects Pooled (CCEP)
- Augmentation: \( y_{it} = a_i + b' x_{it} + d_{1i} \bar{y}_t + d_{2i}' \bar{x}_t + e_{it} \)
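The difference between MG and CMG can be illustrated on simulated data. The sketch below is not the xtmg implementation: the data generating process (one regressor, one common factor following a random walk with drift, slopes with mean 1), the helper names and all parameter values are assumptions chosen for illustration, and the MG regressions here contain a constant only (no country trend).

```python
import random


def ols(columns, y):
    """OLS coefficients via the normal equations and Gauss-Jordan elimination."""
    k, n = len(columns), len(y)
    a = [[sum(columns[r][s] * columns[c][s] for s in range(n)) for c in range(k)]
         + [sum(columns[r][s] * y[s] for s in range(n))] for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[r][k] / a[r][r] for r in range(k)]


def simulate_panel(n, t, rng):
    """y_it = alpha_i + beta_i*x_it + gamma_i*f_t + eps_it, x_it = delta_i*f_t + v_it."""
    f, level = [], 0.0
    for _ in range(t):
        level += 0.3 + rng.gauss(0.0, 1.0)   # nonstationary common factor
        f.append(level)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        beta = 1.0 + rng.uniform(-0.2, 0.2)  # heterogeneous slopes, mean 1
        gamma = rng.uniform(0.5, 1.5)        # loading of y on the factor
        delta = rng.uniform(0.5, 1.5)        # loading of x on the factor
        x = [delta * ft + rng.gauss(0.0, 1.0) for ft in f]
        y = [alpha + beta * xt + gamma * ft + rng.gauss(0.0, 1.0)
             for xt, ft in zip(x, f)]
        panel.append((x, y))
    return panel


def mean_group(panel, cce):
    """Average of group-specific slopes; cce=True adds cross-section means."""
    n, t = len(panel), len(panel[0][0])
    x_bar = [sum(x[s] for x, _ in panel) / n for s in range(t)]
    y_bar = [sum(y[s] for _, y in panel) / n for s in range(t)]
    slopes = []
    for x, y in panel:
        columns = [[1.0] * t, x]
        if cce:
            columns += [y_bar, x_bar]        # proxies for the unobserved factor
        slopes.append(ols(columns, y)[1])
    return sum(slopes) / n


if __name__ == "__main__":
    panel = simulate_panel(30, 50, random.Random(3))
    print("MG :", round(mean_group(panel, cce=False), 3))
    print("CMG:", round(mean_group(panel, cce=True), 3))
```

Because the omitted factor drives both x and y in this setup, the plain MG average is pushed well above the true mean slope of 1, while the regressions augmented with ȳ_t and x̄_t recover a value close to it.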


Practical example: how to lose friends and alienate people...
...by rejecting not one but two empirical literatures/traditions:

- The returns to R&D investment. Why interesting? R&D is one of the few variables public policy can affect. Modern growth models: development through innovation. Empirical model:
\[ y_{it} = \alpha l_{it} + \beta k_{it} + \gamma r_{it} + \lambda_t + \psi_i + e_{it} \tag{6} \]
- R&D spillovers. Trying to show if and how much ‘knowledge spillovers’ get created by R&D investment:
\[ \mathit{tfp}_{it} = \psi_i + \gamma r_{it} + \chi \sum_{k=1}^{N} \omega_k r_{kt} + \varepsilon_{it} \tag{7} \]
\[ y_{it} = \alpha l_{it} + \beta k_{it} + \gamma r_{it} + \chi \sum_{k=1}^{N} \omega_k r_{kt} + \lambda_t + \psi_i + e_{it} \tag{8} \]

Taken from: Eberhardt, Helmers & Strauss (forthcoming) ‘Do spillovers matter when estimating private returns to R&D?’, The Review of Economics and Statistics


Practical example

Estimator            | POLS [1]        | 2FE [2]         | MG [3]         | CDMG [4]       | CMG [5]        | CMG [6]
ln L_it              | 0.464 [40.72]** | 0.608 [18.41]** | 0.568 [6.57]** | 0.557 [7.63]** | 0.599 [9.00]** | 0.698 [8.24]**
ln K_it              | 0.465 [37.59]** | 0.487 [10.60]** | 0.117 [0.96]   | 0.445 [5.01]** | 0.244 [1.70]   | 0.149 [1.00]
ln R_it              | 0.096 [22.80]** | 0.063 [4.42]**  | -0.058 [0.73]  | 0.089 [2.12]*  | 0.035 [0.44]   | -0.050 [0.60]
CRS                  | 0.00            | 0.34            | 0.00           | 0.09           | 0.47           | 0.28
Order of integration | I(1)            | I(1)            | I(1)/I(0)      | I(1)/I(0)      | I(0)           | I(1)/I(0)
CD Test              | 0.12            | 0.14            | 0.00           | 0.05           | 0.51           | 0.35
RMSE                 | 0.278           | 0.163           | 0.051          | 0.068          | 0.037          | 0.035
dummies/trends: included, implicit, included, included
2,637 observations, 119 country-sectors.

Code:
xtmg lny lnl lnk lnrd, trend res(r_mg)        ([3])
xtmg lny lnl lnk lnrd, cce res(r_cmg)         ([5])
xtmg lny lnl lnk lnrd, cce trend res(r_cmg)   ([6])


Practicalities

The CCEP estimator is easy to implement with the existing xtreg command (single covariate example):
1 Create cross-section averages of both variables: sort year, then
   by year: egen lyT=mean(ly)
   by year: egen lxT=mean(lx)
2 xi: xtreg ly lx i.id|lyT i.id|lxT, fe
   where id is the id variable for the cross-section (N) dimension.
3 Standard errors are wrong, so we would need to apply the bootstrap or write a routine to correct them.


Pesaran, Shin, and Smith (1999) Pooled Mean Group (PMG)
Sometimes assuming a common long-run equilibrium relationship makes a lot of sense (e.g. in OECD countries):

\[ \Delta y_{it} = \alpha_i + \beta_i \Delta x_{it} + \lambda_i (\theta x_{i,t-1} - y_{i,t-1}) + u_{it}, \qquad u_{it} \sim iid\, N(0, \sigma_i^2) \tag{9} \]

The β_i are short-run parameters, which like σ²_i differ across countries. The error-correction term λ_i also differs across i; the long-run parameter θ, however, is constant across the groups. This estimator is quite appealing when studying small sets of arguably ‘similar’ countries rather than large diverse macro panels.

In I(1) panels this estimator allows for a mix of cointegration (λ_i > 0) and noncointegration (λ_i = 0).

Code: xtpmg d.lny d.lnx, lr(l.lny l.lnx) ec(ec) replace

The short-run equations can also be manually augmented with cross-section averages to yield the Binder and Offermanns (2007) C-PMG.
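The role of θ as the common long-run parameter follows from equation (9) in one step. The derivation below is a sketch that switches off the shock and sets all changes to zero, which is the usual steady-state argument rather than anything specific to the talk.

```latex
\begin{align*}
0 &= \alpha_i + \lambda_i (\theta x_i - y_i)
  && \text{(steady state: } \Delta y_{it} = \Delta x_{it} = 0,\ u_{it} = 0\text{)} \\
y_i &= \theta x_i + \frac{\alpha_i}{\lambda_i}
  && \text{(requires } \lambda_i \neq 0\text{)}
\end{align*}
```

The long-run slope θ is thus shared by all groups while the long-run intercept α_i/λ_i remains group-specific; for a group with λ_i = 0 no such equilibrium relationship is defined, which is the noncointegration case.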


7 Outlook


Work in Progress and Plans
- Pesaran et al. (2009) CIPSM PURT (xtcipsm)
- Gengenbach, Urbain, and Westerlund (2009) EC test of cointegration (xtectest)

When people find my website through Google searches, the most popular panel time series-related keywords relate to
- PANIC: Bai and Ng (2002, 2004) methods related to estimating the unobserved common factors.
- GM-FMOLS: Pedroni (2000) averaged FMOLS estimator.
- Pedroni’s 7 panel cointegration tests (assuming cross-section independence).
- CUP-FM: Bai and Kao (2006) estimator.


Thank you.
Markus Eberhardt
University of Nottingham and Centre for the Study of African Economies, University of Oxford

http://sites.google.com/site/medevecon (code, data),
http://twitter.com/sjoh2052 (data updates)


Bai, J., & Kao, C. (2006). On the estimation and inference of a panel cointegration model with cross-sectional dependence.
In B. H. Baltagi (Ed.), Panel data econometrics: Theoretical contributions and empirical applications. Amsterdam:
Elsevier Science.
Bai, J., & Ng, S. (2002). Determining the Number of Factors in Approximate Factor Models. Econometrica, 70(1), 191-221.
Bai, J., & Ng, S. (2004). A PANIC attack on unit roots and cointegration. Econometrica, 72(4), 1127-1177.
Baltagi, B. H. (2005). Econometric Analysis of Panel Data. New York: John Wiley and Sons.
Baltagi, B. H. (2008). Econometric Analysis of Panel Data (4th ed.). New York: John Wiley and Sons.
Binder, M., & Offermanns, C. J. (2007). International investment positions and exchange rate dynamics: a dynamic panel
analysis (Discussion Paper Series 1: Economic Studies Nos. 2007,23). Deutsche Bundesbank.
Coakley, J., Fuertes, A.-M., & Smith, R. P. (2006). Unobserved heterogeneity in panel time series models. Computational
Statistics & Data Analysis, 50(9), 2361-2380.
Eberhardt, M., & Teal, F. (2010). Productivity Analysis in Global Manufacturing Production. (Oxford University, Department
of Economics Discussion Paper Series #515)
Eberhardt, M., & Teal, F. (2011). Econometrics for Grumblers: A New Look at the Literature on Cross-Country Growth
Empirics. Journal of Economic Surveys, 25(1), 109-155.
Gallup, J. L., Mellinger, A. D., & Sachs, J. D. (1999). Geography Datasets: Agricultural Measures. Available online at CID,
Harvard University.
Gengenbach, C., Palm, F. C., & Urbain, J.-P. (2010). Panel Unit Root Tests in the Presence of Cross-Sectional Dependencies:
Comparison and Implications for Modelling. Econometric Reviews, 29(2), 111-145.
Gengenbach, C., Urbain, J.-P., & Westerlund, J. (2009). Error Correction Testing in Panels with Global Stochastic Trends
(December 2009). Maastricht University: METEOR.
Hadri, K. (2000). Testing for stationarity in heterogeneous panel data. The Econometrics Journal, 3, 148-161.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for unit roots in heterogeneous panels. (Discussion Paper, University of
Cambridge)
Kao, C. (1999). Spurious regression and residual-based tests for cointegration in panel data. Journal of Econometrics, 90(1),
1-44.
Levin, A., & Lin, C. (1992). Unit root tests in panel data: Asymptotic and finite-sample properties. (UCSD Discussion paper
92-23)
Maddala, G. S., & Wu, S. (1999). A comparative study of unit root tests with panel data and a new simple test. Oxford
Bulletin of Economics and Statistics, 61(Special Issue), 631-652.
Matthews, E. (1983). Global Vegetation and Land Use: New High Resolution Databases for Climate Studies. Journal of
Climate and Applied Meteorology, 22, 474-487.
McCoskey, S., & Kao, C. (1998). A residual based test of the null of cointegration in panel data. Econometric Reviews, 17,
57-84.
Moscone, F., & Tosetti, E. (2009). A Review And Comparison Of Tests Of Cross-Section Independence In Panels. Journal of
Economic Surveys, 23(3), 528-561.
Pedroni, P. (1999). Critical values for cointegration tests in heterogeneous panels with multiple regressors. Oxford Bulletin
of Economics and Statistics, 61(Special Issue), 653-670.
Pedroni, P. (2000). Fully modified OLS for heterogeneous cointegrated panels. In B. H. Baltagi (Ed.), Nonstationary panels,
cointegration in panels and dynamic panels. Amsterdam: Elsevier.
Pedroni, P. (2004). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests with an
Application to the PPP Hypothesis. Econometric Theory, 20(3), 597-625.
Pedroni, P. (2008). Nonstationary panel data. Notes for IMF Course. (Not publicly available.)
Pesaran, M. H. (2004). General diagnostic tests for cross section dependence in panels. (IZA Discussion Paper No. 1240)
Pesaran, M. H. (2006). Estimation and inference in large heterogeneous panels with a multifactor error structure.
Econometrica, 74(4), 967-1012.
Pesaran, M. H. (2007). A simple panel unit root test in the presence of cross-section dependence. Journal of Applied
Econometrics, 22(2), 265-312.
Pesaran, M. H., Shin, Y., & Smith, R. (1999). Pooled mean group estimation of dynamic heterogeneous panels. Journal of the
American Statistical Association, 94, 621-634.
Pesaran, M. H., & Smith, R. P. (1995). Estimating long-run relationships from dynamic heterogeneous panels. Journal of
Econometrics, 68(1), 79-113.
Pesaran, M. H., Smith, V., & Yamagata, T. (2009). Panel unit root tests in the presence of a multifactor error structure.
(Cambridge University, unpublished working paper, September)
Smith, R. P., & Fuertes, A.-M. (2007). Panel Time Series. (Centre for Microdata Methods and Practice (cemmap) mimeo, April
2007.)
Westerlund, J. (2007). Testing for Error Correction in Panel Data. Oxford Bulletin of Economics and Statistics, 69(6), 709-748.
