Advanced Panel Data Methods
Chapter 14
Wooldridge: Introductory Econometrics:
A Modern Approach, 5e
© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
What is panel data?
Panel data, sometimes referred to as longitudinal
data, contains repeated observations on the same
cross-sectional units over time. Examples of units
that may make up a panel include countries, firms,
individuals, or demographic groups.
Like time series data, panel data contains
observations collected at a regular frequency,
chronologically. Like cross-sectional data, panel
data contains observations across a collection of
individuals.
What is panel data?
Example: In a balanced panel, the outcome $Y_{it}$ is observed for every
individual i = 1, …, N in every time period t = 1, …, T.
Panel data
Panel data can model both the common and
individual behaviors of groups.
Panel data contains more information and more
variability than pure time series data or
cross-sectional data, which allows more efficient
estimation.
Panel data can detect and measure statistical
effects that pure time series or cross-sectional data
can't.
Panel data can minimize estimation biases that
may arise from aggregating groups into a single
time series.
Panel data
$y_{it} = \beta_1 + \beta_2 x_{it} + \varepsilon_{it}$
where $\varepsilon_{it} = u_i + v_t + e_{it}$
Unobserved factors that are constant across entities but vary over
time ($v_t$):
Policy Changes: Changes in tax policy, trade agreements, or environmental
regulations can affect businesses, industries, or regions differently over time.
Macroeconomic Shocks: Financial crises, natural disasters, or geopolitical
tensions can generate common shocks that influence all entities within a given
context during specific time periods.
Unobserved factors that are specific to each entity ($u_i$):
characteristics or influences that are unique to individual entities
(e.g., individuals, firms, countries) and remain relatively constant
over time.
Location-specific Characteristics: Geographic location can introduce
unobserved factors that are specific to each entity: local market conditions,
infrastructure quality.
Advantages of panel data
Panel data allow us to control for unobserved factors
that are constant across entities but vary over time,
and for factors that are specific to each entity.
Panel data analysis can help reduce omitted
variable bias compared to pooled cross-sectional
data.
By including entity-specific fixed effects (in Fixed
Effects Models - FEM) or random effects (in
Random Effects Models - REM), panel data
analysis can address unobserved heterogeneity
more effectively than pooled cross-sectional
analysis.
Regression model
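$y_{it} = \beta_1 + \beta_2 x_{it} + \varepsilon_{it}$, $\varepsilon_{it} = u_i + v_t + e_{it}$
Setting $v_t = 0$ gives $\varepsilon_{it} = u_i + e_{it}$. Below, $Y$, $X$, $\alpha$, $\beta$ stand for $y$, $x$, $\beta_1$, $\beta_2$.
Pooled-OLS: $u_i = 0 \; \forall i$, so all entities share one intercept:
$Y_{it} = \alpha + \beta X_{it} + e_{it}$
Fixed effect model: $u_i$ is a fixed parameter for each entity, absorbed into the entity-specific intercept $\alpha_i = \alpha + u_i$:
$Y_{it} = \alpha_i + \beta X_{it} + e_{it}$
Random effect model: $u_i \sim N(0, \sigma_u^2)$ is a random draw, uncorrelated with the regressors, that stays in the error term ($Z_i$ collects observed time-invariant regressors):
$Y_{it} = \alpha + \beta X_{it} + \gamma Z_i + u_i + e_{it}$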
Fixed Effects Models - FEM
The entity fixed effect regression model is:
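$Y_{it} = \alpha_i + \beta X_{it} + e_{it}$ ($i = 1, \dots, n$; $t = 1, \dots, T$)
where the entity-specific intercept $\alpha_i$ absorbs the entity effect $u_i$.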
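A short derivation may help show why fixed effects estimation removes the entity effect. The sketch below collects every time-constant term on the right-hand side into a single entity-specific term, written here as $c_i$ (a label introduced only for this sketch), and uses bars for time averages over $t = 1, \dots, T$. Subtracting the entity mean from each observation (the "within" transformation) eliminates $c_i$, so $\beta$ can be estimated by OLS on the demeaned data.

```latex
\begin{align*}
Y_{it} &= c_i + \beta X_{it} + e_{it} \\
\bar{Y}_i &= c_i + \beta \bar{X}_i + \bar{e}_i,
  \qquad \bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it} \\
Y_{it} - \bar{Y}_i &= \beta \,(X_{it} - \bar{X}_i) + (e_{it} - \bar{e}_i)
\end{align*}
```

Any regressor that does not change over time is demeaned to zero as well, which is why its coefficient cannot be estimated by fixed effects.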
Random Effects Models - REM
The entity random effect regression model is:
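$Y_{it} = \alpha + \beta X_{it} + \gamma Z_i + u_i + e_{it}$
where $u_i$ is a random entity effect, uncorrelated with $X_{it}$ and $Z_i$, and $Z_i$ collects observed time-invariant regressors.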
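Random effects estimation can be viewed as OLS on quasi-demeaned data. The sketch below assumes a balanced panel, a common intercept $\alpha$, and a composite error $u_i + e_{it}$ with $\sigma_u^2 = \mathrm{Var}(u_i)$ and $\sigma_e^2 = \mathrm{Var}(e_{it})$. The symbol $\theta$ is introduced here for illustration.

```latex
\begin{align*}
\theta &= 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T\,\sigma_u^2}} \\
Y_{it} - \theta \bar{Y}_i &= \alpha(1-\theta) + \beta\,(X_{it} - \theta \bar{X}_i)
  + \gamma(1-\theta) Z_i + \bigl[(1-\theta)u_i + e_{it} - \theta \bar{e}_i\bigr]
\end{align*}
```

When $\sigma_u^2 = 0$, $\theta = 0$ and the estimator reduces to pooled OLS. As $T\sigma_u^2$ grows relative to $\sigma_e^2$, $\theta$ approaches 1 and the estimator approaches fixed effects.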
POLS vs FEM vs REM
POLS vs FEM
• F test (Wald test that all entity effects are zero)
POLS vs REM
• LM Test
REM vs FEM
• Hausman Test
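The three comparisons above can be sketched as one Stata workflow. The variable names y, x1, x2, id, and year are hypothetical placeholders, and the stored-estimate names fem and rem are arbitrary labels.

```stata
* Hypothetical names: y (outcome), x1 x2 (regressors), id (entity), year (time)
xtset id year

* POLS vs FEM: with default standard errors, the footer of the xtreg, fe
* output reports "F test that all u_i=0"
xtreg y x1 x2, fe
estimates store fem

* POLS vs REM: Breusch-Pagan LM test, run after the random-effects fit
xtreg y x1 x2, re
estimates store rem
xttest0

* REM vs FEM: Hausman test (always-consistent estimator listed first)
hausman fem rem
```

The models are fitted with default (non-robust) standard errors here because the hausman command does not accept estimates fitted with cluster-robust standard errors.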
REM vs Pooled OLS
REM: Allows an entity-specific random effect $u_i$ with variance $\sigma_u^2 > 0$, so
the composite errors $u_i + e_{it}$ are correlated within each entity.
POLS: Assumes there is no entity-specific effect ($\sigma_u^2 = 0$).
The Breusch-Pagan Lagrange multiplier test
H0: $\sigma_u^2 = 0$ (no random effects; POLS is appropriate).
H1: $\sigma_u^2 > 0$ (random effects are present).
Test Statistic:
The test statistic is based on the residuals from the pooled OLS regression. It
assesses whether the residuals of the same entity are correlated across time
periods, as they would be if a common $u_i$ were present.
Interpretation:
p-value < significance level (e.g., 0.05): Reject the null hypothesis. The variance
of the entity effects is not zero, so REM is more appropriate than POLS.
p-value > significance level: Fail to reject the null hypothesis. There is no
evidence of entity-specific random effects, and POLS is adequate.
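For reference, the Breusch-Pagan LM statistic for random effects has a closed form in a balanced panel. In the sketch below, $\hat{e}_{it}$ denotes the pooled OLS residuals, $N$ the number of entities, and $T$ the number of time periods. The null hypothesis is $\sigma_u^2 = 0$.

```latex
\[
LM = \frac{NT}{2(T-1)}
\left[
  \frac{\sum_{i=1}^{N}\bigl(\sum_{t=1}^{T}\hat{e}_{it}\bigr)^{2}}
       {\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{e}_{it}^{2}} - 1
\right]^{2}
\;\sim\; \chi^{2}_{1} \quad \text{under } H_0
\]
```

If the residuals within an entity tend to share the same sign, the numerator sum exceeds the denominator, the bracket moves away from zero, and the statistic becomes large.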
REM vs FEM
REM: Assumes that the unobserved heterogeneity (individual-specific
effects) is uncorrelated with the independent variables.
FEM: Allows the unobserved heterogeneity to be correlated with the
independent variables.
The Hausman test
H0: The individual effects are uncorrelated with the regressors. Both REM and
FEM are consistent, and REM is efficient.
H1: The individual effects are correlated with the regressors. FEM is consistent,
while REM is inconsistent.
Test Statistic:
The test statistic is based on the difference between the coefficient estimates
from the REM and FEM.
Interpretation:
p-value < significance level (e.g., 0.05): Reject the null hypothesis. FEM is preferred.
p-value > significance level: Fail to reject the null hypothesis. REM is consistent and
efficient.
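The Hausman statistic can be written out as follows. In this sketch, $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ are the estimated coefficients on the time-varying regressors, $\hat{V}_{FE}$ and $\hat{V}_{RE}$ are their estimated covariance matrices, and $k$ is the number of coefficients compared.

```latex
\[
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \bigl[\hat{V}_{FE} - \hat{V}_{RE}\bigr]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
\;\sim\; \chi^{2}_{k} \quad \text{under } H_0
\]
```

A large gap between the two sets of estimates, relative to their sampling variability, is evidence that the individual effects are correlated with the regressors.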
Computer exercise 1
Use the data in SCHOOL93_98 to answer the following questions. Use
the command xtset schid year to set the cross section and time
dimensions.
(i) How many schools are there? Does each school have a record for
each of the six years? Verify that lavgrexpp is missing for all
schools in 1993.
(ii) Create a selection indicator, s, that is equal to one if and only if you
have nonmissing data on math4, lavgrexpp, lunch, and lenrol. Next,
define a variable tobs to be the number of complete time periods
per school. How many schools have all given years of available
data (noting that 1993 is not available for any school when we use
lavgrexpp)? Drop all schools with tobs = 0.
(iii) Use random effects to estimate a model relating math4 to
lavgrexpp, lunch, and lenrol. Be sure to include a full set of year
dummies. What is the estimated effect of school spending on
math4? What is its cluster-robust t statistic?
(iv) Now estimate the model from part (iii) by fixed effects. What is the
estimated spending effect and its robust confidence interval? How
does it compare to the RE estimate from part (iii)?
(v) Create the time averages of all of the explanatory variables in the
RE/FE estimation, including the time dummies. You need to use the
selection indicator constructed in part (ii). Verify that when you add
these and estimate the equation by RE you obtain the FE estimates
on the time-varying explanatory variables. What happens if you
drop the time averages for y95, y96, y97, and y98?
(vi) Is the random effects estimator rejected in favor of fixed effects?
Explain.
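One possible way to set up the exercise in Stata is sketched below. It is a starting point, not a worked solution. It assumes that SCHOOL93_98.dta is in the current working directory and that y95, y96, y97, and y98 are year dummies already in the data set, as the wording of part (v) suggests.

```stata
* Assumes SCHOOL93_98.dta is in the current working directory
use SCHOOL93_98, clear
xtset schid year

* (i) participation pattern, and the years in which lavgrexpp is missing
xtdescribe
tabulate year if missing(lavgrexpp)

* (ii) s = 1 only when all four variables are nonmissing;
*      tobs = number of complete time periods per school
generate byte s = !missing(math4, lavgrexpp, lunch, lenrol)
bysort schid: egen tobs = total(s)
tabulate tobs if year == year[1]   // rough check only; counts rows, not schools, unless the panel is balanced
drop if tobs == 0

* (iii) random effects with year dummies and cluster-robust standard errors
*       (y95-y98 assumed to be dummies in the data; 1994 is then the base year)
xtreg math4 lavgrexpp lunch lenrol y95 y96 y97 y98, re vce(cluster schid)

* (iv) fixed effects on the same specification
xtreg math4 lavgrexpp lunch lenrol y95 y96 y97 y98, fe vce(cluster schid)
```

For part (v), the time averages can be built with the same bysort/egen pattern, using mean() restricted to observations with s equal to one.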
Stata code
Code: encode
    Meaning/Using: convert string variables into numeric variables.
    Command:
        encode varname, generate(newvarname)
Code: xtset
    Meaning/Using: declare the panel data.
    Command:
        xtset id year
Code: xtreg (fixed effects)
    Meaning/Using: estimate fixed-effects panel data models.
    Command:
        xtreg dependent_var independent_vars, fe
Code: xtreg (random effects)
    Meaning/Using: estimate random-effects panel data models.
    Command:
        xtreg dependent_var independent_vars, re
Code: xttest0
    Meaning/Using: perform a Breusch-Pagan LM (Lagrange multiplier) test to
    choose between the POLS model and the REM model.
    Command:
        xtreg dependent_var independent_vars, re
        xttest0
Code: hausman
    Meaning/Using: perform the Hausman test, which helps determine whether the
    fixed effects model (FEM) or the random effects model (REM) is more
    appropriate for your panel data analysis.
    Command:
        xtreg dependent_var independent_vars, fe
        estimates store fem
        xtreg dependent_var independent_vars, re
        estimates store rem
        hausman fem rem
Stata code
Code: asdoc
    Meaning/Using: send Stata output to Word / RTF format.
    Command:
        ssc install asdoc
        asdoc sum dependent_var independent_vars, detail replace dec(3)
        asdoc pwcorr dependent_var independent_vars, star(all) replace dec(3)
Code: outreg2
    Meaning/Using: export regression results into tables.
    Command:
        ssc install outreg2
        reg dependent_var independent_vars    (or xtreg dependent_var independent_vars, fe)
        outreg2 using myreg.doc, replace ctitle(Fixed Effects) keep(x1 x2 x3) addtext(Country FE, YES, Year FE, YES) dec(3)
    Other outreg2 options: keep(independent_vars) stats(coef se tstat pval)