
slides-6-iu

The document discusses panel data models, including pooled OLS, fixed effects, and random effects models, along with their advantages and disadvantages. It highlights the importance of the Hausman test to determine the consistency of estimates between fixed and random effects models. Additionally, it provides examples of panel data using provincial data from Vietnam and outlines model specifications and estimation methods.

Uploaded by

Ngô Trâm
Copyright
© All Rights Reserved

PANEL DATA MODELS

Trương Đăng Thụy


COVERED IN THIS LECTURE
1 - Panel data
2 - Pooled OLS estimator
3 - Fixed effects model
4 - Random effects model
5 - FE vs RE: Hausman test
6 - Between group estimator


PANEL DATA
▪ cross-sectional: MANY units and ONE time period
𝑦𝑖 with 𝑖 = 1, …, 𝑁
▪ time-series: ONE unit and SEVERAL time periods
𝑦𝑡 with 𝑡 = 1, …, 𝑇
▪ panel data: data on MANY units and SEVERAL time periods
𝑦𝑖𝑡 with 𝑖 = 1, …, 𝑁 and 𝑡 = 1, …, 𝑇
EXAMPLE DATA
Viet Nam provincial data on
▪ rgdp: provincial GDP (mil. VND)
▪ labfo: provincial labor force (thousand persons)
▪ rinvest: gross investment of provinces (mil. VND)
▪ pci: 100-point scaled composite index measuring and ranking Vietnam’s provinces
based on their overall economic governance quality
▪ data for 58 provinces, 5 years (2007-2011)
EXAMPLE PANEL DATA
province  provcode  year  rgdp      labfo   rinvest  pci
An Giang  1         2007  22000000  1221.3  5600000  66.4688
          1         2008  25000000  1244.9  4600000  61.1247
          1         2009  25000000  1227.3  4800000  58.177
          1         2010  27000000  1255    4500000  61.9379
          1         2011  29000000  1300.4  3900000  62.22
Bac Can   2         2007  1500000   177.2   592714   46.4687
          2         2008  2000000   179.8   1100000  39.7762
          2         2009  2400000   189.8   1100000  75.9563
          2         2010  3400000   194     2600000  51.4864
          2         2011  4200000   199.6   2900000  52.71
...       ...       ...   ...       ...     ...      ...
ADVANTAGES AND DISADVANTAGES OF PANEL DATA
▪ Advantages
▪ More observations
▪ More variability
▪ Less collinearity between regressors
▪ Control of individual heterogeneity
▪ Reduced bias

▪ Disadvantages
▪ Requires more effort to collect the data
▪ Selectivity bias
PANEL DATA MODEL REQUIRES WITHIN GROUP VARIATION
▪ The fixed effects (FE) panel data model requires variation within each group over time
▪ An example where the FE model does not work
𝑦𝑖𝑡 = 𝛼 + 𝛽𝑥𝑖𝑡 + 𝑢𝑖𝑡
▪ 𝑦𝑖𝑡 is export volume from VN to country 𝑖 in year 𝑡
▪ 𝑥𝑖𝑡 is the distance from VN to country 𝑖 in year 𝑡
▪ As the distance from VN to country 𝑖 does not change from year to year, it is
perfectly collinear with the country-specific intercept and can't be included in the
fixed effects model.
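The distance example can be checked numerically. The following is a minimal sketch in plain Python (the lecture's own examples are in R); the country labels and distances are hypothetical values chosen for illustration. It subtracts each group's mean from its observations, which is what the FE estimator does, and shows that a time-invariant regressor is reduced to zeros.

```python
from collections import defaultdict

def within_transform(groups, values):
    """Subtract each group's own mean from its observations."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]

# Hypothetical data: three partner countries observed for three years each.
# The distance (km) is the same in every year for a given country.
country = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
distance = [1000.0, 1000.0, 1000.0,
            2500.0, 2500.0, 2500.0,
            9000.0, 9000.0, 9000.0]

demeaned = within_transform(country, distance)
print(demeaned)  # all zeros: no within-group variation is left to identify beta
```

Because the transformed regressor has no variation, its coefficient cannot be estimated by the within group approach.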
THE DATA
▪ Provincial data 2007 – 2011
▪ rgdp: regional GDP (mil. VND)
▪ rinvest: investment (mil. VND)
▪ labfo: labor force (thousand workers)
▪ pci: Provincial Competitiveness Index (range 0-100)

▪ Source: https://kinhteluong.online/esdata/iu/panel.csv
SUMMARY STATISTICS
MODEL SPECIFICATION
▪ In this lecture we will consider the specification

𝑦𝑖𝑡 = 𝛼 + 𝛽𝑋𝑖𝑡 + 𝑢𝑖𝑡
▪ 𝑦𝑖𝑡 is the logarithm of real GDP of province 𝑖 in year 𝑡
▪ 𝑋𝑖𝑡 includes
▪ Logarithm of the labor force
▪ Logarithm of real investment
▪ Provincial competitiveness index (PCI)
POOLED OLS ESTIMATOR
▪ Data of all groups are pooled together
▪ No difference between groups

𝑦𝑖𝑡 = 𝛼 + 𝛽𝑋𝑖𝑡 + 𝑢𝑖𝑡


▪ Coefficients are identical for all groups.
▪ Some assumptions:
▪ The error term is homoskedastic and not autocorrelated
▪ 𝑋 is nonstochastic and not correlated with 𝑢 (𝑋 is strictly exogenous)
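As an illustration of pooling, here is a minimal sketch in plain Python (the lecture itself uses R). It regresses log GDP on log labor force only, a simplification of the lecture's specification made for brevity, and uses just the ten rows shown in the example table. The function name `pooled_ols` is a hypothetical helper, not part of any package.

```python
import math

# (province, year, rgdp, labfo) taken from the ten rows of the example table
rows = [
    ("An Giang", 2007, 22000000, 1221.3), ("An Giang", 2008, 25000000, 1244.9),
    ("An Giang", 2009, 25000000, 1227.3), ("An Giang", 2010, 27000000, 1255.0),
    ("An Giang", 2011, 29000000, 1300.4), ("Bac Can", 2007, 1500000, 177.2),
    ("Bac Can", 2008, 2000000, 179.8), ("Bac Can", 2009, 2400000, 189.8),
    ("Bac Can", 2010, 3400000, 194.0), ("Bac Can", 2011, 4200000, 199.6),
]

def pooled_ols(x, y):
    """OLS of y on a constant and one regressor; group labels are ignored."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

log_labfo = [math.log(r[3]) for r in rows]
log_rgdp = [math.log(r[2]) for r in rows]

alpha, beta = pooled_ols(log_labfo, log_rgdp)
residuals = [y - alpha - beta * x for x, y in zip(log_labfo, log_rgdp)]
print("intercept:", alpha, "slope:", beta)
```

All ten observations share one intercept and one slope, so any systematic difference between An Giang and Bac Can ends up in the residuals.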
THE POOLED OLS IN R
POOLED OLS WITH ROBUST STANDARD ERRORS
CLUSTERED STANDARD ERRORS
▪ The Pooled OLS estimator (and other panel data models) assumes no correlation
between the errors of the same group
▪ If we relax the assumption, then

cov(𝑢𝑖𝑡, 𝑢𝑖𝑠) ≠ 0 for 𝑡 ≠ 𝑠

▪ The errors are then autocorrelated within groups, and possibly heteroskedastic
▪ If this happens, the Pooled OLS estimator is still consistent, but the conventional
standard errors are incorrect.
▪ In this case we may use cluster-robust standard errors, clustered by group.
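The idea behind clustering can be sketched in plain Python for the simplest possible case: one regressor and no intercept. This is an illustrative assumption, not the lecture's R code; the data are made up, and no small-sample correction is applied (statistical packages usually apply one). The cluster-robust variance sums the products 𝑥𝑖𝑡𝑢𝑖𝑡 within each group before squaring, so within-group correlation is not ignored.

```python
from collections import defaultdict

def ols_no_intercept(x, y):
    """Slope of y on x without a constant, plus residuals and sum of x^2."""
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - beta * xi for xi, yi in zip(x, y)]
    return beta, resid, sxx

def cluster_variance(x, resid, groups, sxx):
    """Sandwich variance: sum over groups of (sum_t x_it * u_it)^2 / (sum x^2)^2."""
    score = defaultdict(float)
    for xi, ui, g in zip(x, resid, groups):
        score[g] += xi * ui
    return sum(s * s for s in score.values()) / sxx ** 2

# Hypothetical toy panel: three groups, three periods each
groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [-1.0, 0.5, 1.0, -2.0, 0.5, 1.5, -0.5, 1.0, 2.0]
y = [-0.8, 0.9, 1.4, -2.6, 0.1, 1.2, -0.2, 1.5, 2.6]

beta, resid, sxx = ols_no_intercept(x, y)
v_cluster = cluster_variance(x, resid, groups, sxx)
# Treating every observation as its own cluster gives the
# heteroskedasticity-robust (White, HC0) variance instead.
v_white = cluster_variance(x, resid, list(range(len(x))), sxx)
print("clustered s.e.:", v_cluster ** 0.5, "robust s.e.:", v_white ** 0.5)
```

The point estimate is identical in both cases; only the estimated variance changes.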
POOLED OLS WITH CLUSTERED STANDARD ERRORS
POOLED OLS USING PACKAGE PLM
FIXED EFFECTS MODEL

Within group estimator


THE FIXED EFFECTS MODEL
▪ The model

𝑦𝑖𝑡 = 𝛼𝑖 + 𝛽𝑋𝑖𝑡 + 𝑢𝑖𝑡


▪ The slopes are still identical for all groups.
▪ But each group has a different intercept.
▪ These intercepts are called fixed effects, which capture individual heterogeneity.
▪ Two estimators:
▪ Fixed effects estimator (within group)
▪ Least squares dummy variable (LSDV) estimator

▪ Note: these are two ways of estimating the FE model, not two different models.
WITHIN GROUP FIXED EFFECTS ESTIMATOR
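▪ The model
$y_{it} = \alpha_i + \beta X_{it} + u_{it}$ (1)
▪ We need to allow the intercept to vary across groups.
▪ Now average each variable over time within group $i$; note that the parameters are
time-invariant
$\bar{y}_i = \alpha_i + \beta \bar{X}_i + \bar{u}_i$ (2)
▪ where $\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$ and $\bar{X}_i = \frac{1}{T} \sum_{t=1}^{T} X_{it}$
▪ Then subtract (2) from (1)
$y_{it} - \bar{y}_i = (\alpha_i - \alpha_i) + \beta (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)$
▪ Which results in
$\tilde{y}_{it} = \beta \tilde{X}_{it} + \tilde{u}_{it}$
▪ This way we can estimate $\beta$, but the fixed effects $\alpha_i$ drop out of the transformed equation.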
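The derivation above can be traced step by step in code. This is a minimal sketch in plain Python with a single regressor and hypothetical, noise-free data (intercepts 1, 5 and 10, common slope 2), so the within group estimator should recover the slope. The helper names are illustrative.

```python
from collections import defaultdict

def demean_by_group(groups, values):
    """Step (1) minus step (2): subtract the group's time average."""
    total = defaultdict(float)
    count = defaultdict(int)
    for g, v in zip(groups, values):
        total[g] += v
        count[g] += 1
    return [v - total[g] / count[g] for g, v in zip(groups, values)]

def within_estimator(groups, x, y):
    """OLS slope of demeaned y on demeaned x (no intercept is needed)."""
    x_t = demean_by_group(groups, x)
    y_t = demean_by_group(groups, y)
    return sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)

# Hypothetical panel: each group has its own intercept, all share slope 2
groups = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 0.0, 1.0, 5.0]
alpha = {"a": 1.0, "b": 5.0, "c": 10.0}
y = [alpha[g] + 2.0 * xi for g, xi in zip(groups, x)]

beta_hat = within_estimator(groups, x, y)
print(beta_hat)  # close to the true slope of 2
```

The group intercepts never appear in the estimation, which is why this route does not report them.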
WITHIN GROUP FIXED EFFECTS ESTIMATOR
WITHIN GROUP FIXED EFFECTS ESTIMATOR (robust standard errors)
WITHIN GROUP FIXED EFFECTS ESTIMATOR (clustered standard errors)
LEAST SQUARES DUMMY VARIABLE ESTIMATOR
▪ For the model
$y_{it} = \alpha_i + \beta X_{it} + u_{it}$
▪ we can estimate the fixed effects and $\beta$ by introducing the dummy variables
$D_{ji} = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}$
▪ We can then estimate the following model using OLS
$y_{it} = \sum_{j=1}^{N} \alpha_j D_{ji} + \beta X_{it} + u_{it}$
▪ This is the least squares dummy variable (LSDV) estimator.
▪ The LSDV estimates of $\beta$ are identical to the within group FE estimates.
▪ However, LSDV also estimates the fixed effects.
▪ On the other hand, LSDV becomes impractical when $N$ is large, because it adds $N$ dummy variables.
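The equivalence between LSDV and the within group estimator can be verified on a small example. The sketch below is plain Python with hypothetical data and a hand-written linear solver (the helpers `solve` and `ols` are illustrative, not from a library). It builds one dummy per unit, runs OLS, and compares the slope with the one from demeaned data.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical panel: three units, three periods each
units = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
x = [1.0, 2.0, 4.0, 2.0, 3.0, 7.0, 0.0, 1.0, 5.0]
y = [3.1, 4.9, 9.2, 9.0, 10.8, 19.1, 10.2, 11.9, 20.1]
labels = sorted(set(units))

# LSDV: one dummy per unit (no common constant), followed by x
X = [[1.0 if u == j else 0.0 for j in labels] + [xi] for u, xi in zip(units, x)]
coef = ols(X, y)
alphas, beta_lsdv = coef[:-1], coef[-1]

# Within group estimator on the same data
mean_x = {j: sum(v for u, v in zip(units, x) if u == j) / units.count(j) for j in labels}
mean_y = {j: sum(v for u, v in zip(units, y) if u == j) / units.count(j) for j in labels}
x_t = [xi - mean_x[u] for u, xi in zip(units, x)]
y_t = [yi - mean_y[u] for u, yi in zip(units, y)]
beta_within = sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)

print(beta_lsdv, beta_within)  # the two slopes agree up to rounding error
print(dict(zip(labels, alphas)))  # LSDV also reports the fixed effects
```

With 3 units the design matrix has 4 columns; with thousands of units it would have thousands, which is the feasibility concern noted above.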
LSDV FIXED EFFECTS ESTIMATOR (some factors omitted)
LSDV FIXED EFFECTS ESTIMATOR (robust standard errors)
LSDV FIXED EFFECTS ESTIMATOR (clustered standard errors)
LSDV TWO-WAY FIXED EFFECTS MODEL
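▪ The model now includes time fixed effects

$y_{it} = \sum_{j=1}^{N} \alpha_j D_{ji} + \sum_{g=1}^{T} \gamma_g D_{gt} + \beta X_{it} + u_{it}$

where
$D_{gt} = \begin{cases} 1 & \text{if } g = t \\ 0 & \text{otherwise} \end{cases}$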
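For a balanced panel, the slope of the two-way model can also be obtained without dummies, by subtracting the unit mean and the period mean and adding back the grand mean. Note that this is a swapped-in technique (double demeaning), not the LSDV route described on the slide; for a balanced panel the two give the same slope. The sketch is plain Python with hypothetical, noise-free data.

```python
def two_way_demean(v, n_units, n_periods):
    """v[i][t] for a balanced panel -> v_it - unit mean - period mean + grand mean."""
    unit_mean = [sum(row) / n_periods for row in v]
    time_mean = [sum(v[i][t] for i in range(n_units)) / n_units
                 for t in range(n_periods)]
    grand = sum(unit_mean) / n_units
    return [[v[i][t] - unit_mean[i] - time_mean[t] + grand
             for t in range(n_periods)] for i in range(n_units)]

# Hypothetical balanced panel: 3 units, 4 periods, true slope 1.5
alpha = [1.0, 4.0, -2.0]          # unit fixed effects
gamma = [0.0, 0.5, 1.5, 3.0]      # time fixed effects
x = [[1.0, 2.0, 0.0, 3.0],
     [2.0, 2.0, 5.0, 1.0],
     [0.0, 4.0, 1.0, 1.0]]
beta_true = 1.5
y = [[alpha[i] + gamma[t] + beta_true * x[i][t] for t in range(4)]
     for i in range(3)]

x_dd = two_way_demean(x, 3, 4)
y_dd = two_way_demean(y, 3, 4)
num = sum(x_dd[i][t] * y_dd[i][t] for i in range(3) for t in range(4))
den = sum(x_dd[i][t] ** 2 for i in range(3) for t in range(4))
beta_hat = num / den
print(beta_hat)  # close to 1.5: both sets of effects have been removed
```

When the dummy version is estimated with a full set of unit dummies, one time dummy has to be left out, since the unit dummies and the time dummies each sum to one.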
LSDV TWO-WAY FIXED EFFECTS MODEL (some factors omitted)
RANDOM EFFECTS MODEL
▪ The random effects model is given by
$y_{it} = \alpha + \beta X_{it} + u_{it}$
▪ The error term now has two components
$u_{it} = \mu_i + \epsilon_{it}$
▪ $\mu_i \sim N(0, \sigma_\mu^2)$: the individual-specific random component
▪ $\epsilon_{it} \sim N(0, \sigma_\epsilon^2)$: the idiosyncratic disturbance
▪ In the random effects model, regressors can be time-invariant
▪ Estimation method: generalized least squares (GLS)
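In textbook treatments, GLS for this error structure is carried out as a "quasi-demeaning": a fraction θ of the group mean is subtracted from each observation and OLS is run on the transformed data. The sketch below is plain Python and assumes a balanced panel with known variance components; in practice they are estimated first. The function names are hypothetical.

```python
import math

def re_theta(sigma2_mu, sigma2_eps, n_periods):
    """theta = 1 - sqrt(sigma_eps^2 / (sigma_eps^2 + T * sigma_mu^2))."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + n_periods * sigma2_mu))

def quasi_demean(values, theta):
    """One group's observations over time -> v_it - theta * group mean."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]

# Assumed variance components (illustrative numbers only), T = 5 periods
theta = re_theta(sigma2_mu=0.8, sigma2_eps=0.2, n_periods=5)
one_group = [10.0, 11.0, 9.5, 12.0, 10.5]
print(theta, quasi_demean(one_group, theta))

# Limiting cases:
#   sigma_mu^2 = 0     -> theta = 0: no transformation, i.e. pooled OLS
#   T * sigma_mu^2 big -> theta near 1: full demeaning, i.e. the within estimator
print(re_theta(0.0, 0.2, 5), re_theta(1e9, 0.2, 5))
```

Because θ is below 1, a time-invariant regressor is not wiped out by the transformation, which is why the RE model can include it.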
RANDOM EFFECTS MODEL
RANDOM EFFECTS MODEL (clustered standard errors)
RANDOM VS. FIXED EFFECTS
▪ The main difference is that the individual effects are assumed:
▪ fixed in FE
▪ random in RE.

▪ The random effects model is preferred because:
▪ The fixed effects vary over time.
▪ It is more efficient (more degrees of freedom)
▪ It allows time-invariant regressors

▪ RE estimates, however, are inconsistent if the assumption that the individual effects
are not correlated with the regressors is violated
▪ Which one should we use? Hausman test!
HAUSMAN TEST
▪ Null hypothesis:
▪ Estimates of RE and FE are not systematically different, or
▪ both RE and FE estimates are consistent

▪ Alternative hypothesis: RE estimates are inconsistent


▪ Test statistic
$H = (\beta_{FE} - \beta_{RE})' \left[ V(\beta_{FE}) - V(\beta_{RE}) \right]^{-1} (\beta_{FE} - \beta_{RE})$
▪ which, under H0, follows a $\chi^2$ distribution with df = number of regressors. Note that
$V(\beta)$ is the variance-covariance matrix of the estimates.
▪ We reject H0 if the p-value is small.
▪ If we reject H0: the RE and FE estimates differ systematically, so the RE estimates are inconsistent.
▪ If we do not reject H0: the RE and FE estimates do not differ systematically, so both are
consistent. But remember that the RE estimates are more efficient.
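With a single regressor the matrix formula collapses to a ratio of scalars, which makes the test easy to trace by hand. The sketch below is plain Python; the coefficient and variance values are hypothetical, and the p-value uses the fact that a χ² variable with 1 degree of freedom is the square of a standard normal. With more regressors a matrix inverse and a general χ² distribution function would be needed.

```python
import math

def hausman_scalar(b_fe, b_re, v_fe, v_re):
    """Hausman statistic and p-value for one coefficient (chi-square, 1 df)."""
    diff_var = v_fe - v_re
    if diff_var <= 0:
        # Can happen in finite samples; the statistic is then not defined.
        raise ValueError("V(FE) - V(RE) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    # P(chi2_1 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value

# Hypothetical estimates: FE is less precise than RE, as expected
h, p_value = hausman_scalar(b_fe=0.52, b_re=0.40, v_fe=0.0040, v_re=0.0025)
print("H =", h, "p-value =", p_value)
if p_value < 0.05:
    print("Reject H0: prefer FE")
else:
    print("Do not reject H0: RE is consistent and more efficient")
```

The statistic is large when the two estimates are far apart relative to the extra sampling variance of the FE estimate.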
HAUSMAN TEST IN R
NOTES ON HAUSMAN TEST
▪ The test can only compare models with the same set of regressors.
▪ If we include a time-invariant regressor in the RE model (which is not possible in the FE
model), the two models no longer share the same regressors and the Hausman test fails.
▪ The Hausman test checks whether the two sets of estimates are equal.
▪ If we reject the null hypothesis, the FE estimates are consistent and the RE estimates are
not.
▪ Important: if any regressor is correlated with the idiosyncratic error term, both estimates are
biased.
