Lecture Series 1
Linear Random and Fixed Effect Models and
Their (Less) Recent Extensions
Stefanie Schurer
[email protected]RMIT University
School of Economics, Finance, and Marketing
January 21, 2014
1 / 62
Overview
1. Recap: linear model set-up, random effects estimation and fixed effects estimation;
2. Relationship between random and fixed (and between) effects estimators;
3. Is fixed effects estimation always preferable to random effects estimation?
4. Hausman-Taylor (1981) approach to estimating coefficients on both time-varying and time-invariant variables;
5. Correlated random effects (CRE): a flexible extension to random effects models that relaxes the orthogonality condition;
6. Plümper and Troeger's fixed effects vector decomposition (FEVD) approach and rule of thumb;
7. Application: estimating the effects of health on wages.
2 / 62
References for Lecture 1
1. Greene, W.H. (2011). Econometric Analysis. Pearson Education Limited. 399-438.
2. Wooldridge, J. (2009). Econometric Analysis of Cross Section and Panel Data. The MIT Press. 285-382, 345-361.
3. Hsiao, C. (2003). Analysis of Panel Data. Econometric Society Monographs. CUP: New York. 27-44.
4. Mundlak, Y. (1978). On the Pooling of Time Series and Cross-Section Data. Econometrica 46: 69-85.
5. Hausman, J., Taylor, W.E. (1981). Panel Data and Unobservable Individual Effects. Econometrica 49: 1377-1398.
6. Plümper, T., Troeger, V. (2007). Efficient Estimation of Time-Invariant and Rarely Changing Variables in Finite Sample Panel Analyses with Unit Fixed Effects. Political Analysis 15: 124-139.
7. Contoyannis, P., Rice, N. (2001). The Impact of Health on Wages: Evidence from the BHPS. Empirical Economics 26: 599-622.
3 / 62
1. Recap: Linear model set-up
4 / 62
Heterogeneous intercept models
Consider the following linear regression model, which allows for individual-specific heterogeneity $\alpha_i$:
$$Y_{it} = X_{it}\beta + \varepsilon_{it}, \quad (1)$$
for all $i = 1, \dots, N$ and $t = 1, \dots, T$, with
$$\varepsilon_{it} = \alpha_i + u_{it}. \quad (2)$$
$Y_{it}$ is some outcome of interest;
$X_{it}$ is a vector of covariates $(X_{it1}, \dots, X_{itK})$ and generally includes a constant term, i.e. $X_{it1} = 1$ for all $i$ and $t$. These may also include time-invariant variables such as $X_i$.
The unobserved errors consist of two components: $\alpha_i$ is constant across time, and $u_{it}$ is an idiosyncratic error term that varies across individuals and time, with $u_{it} \sim iid(0, \sigma_u^2)$ and $E(u_{it}\,|\,\alpha_i, X_{i1}, \dots, X_{iT}) = 0$.
5 / 62
The model in matrix notation
The NT observations are ordered first by i units, and then by t
observations, such that:
$$Y = X\beta + \varepsilon. \quad (3)$$
The dimensions are:
$Y$: $NT \times 1$ vector of the $Y_{it}$'s;
$X$: $NT \times K$ matrix with typical element $X_{itk}$;
$\varepsilon$: $NT \times 1$ vector of the $\varepsilon_{it}$'s.
6 / 62
OLS estimator in matrix form
The OLS estimator for $\beta$ is:
$$\hat\beta_{OLS} = (X'X)^{-1}(X'Y). \quad (4)$$
Our focus here is how to estimate this model under different assumptions about the individual-specific heterogeneity $\alpha_i$. Early discussions in the literature were concerned with whether $\alpha_i$ should be treated as a random variable (which would add an error term) or as a fixed parameter to be estimated for each cross-sectional group.
More modern approaches to panel data econometrics are more concerned with the question of whether $\alpha_i$ is correlated with the explanatory variables of interest (e.g. Wooldridge, 2009, p. 285-286).
7 / 62
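As a quick numerical illustration of the matrix formula above, here is a minimal Python/numpy sketch (not from the lecture): it simulates a small panel with an individual effect that is uncorrelated with the regressors, so pooled OLS recovers the true coefficients. All dimensions and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 100, 5, 2                        # hypothetical panel dimensions

alpha = rng.normal(size=N)                 # individual effects alpha_i
X = rng.normal(size=(N * T, K))
X = np.column_stack([np.ones(N * T), X])   # include a constant term
beta = np.array([1.0, 0.5, -0.3])
eps = np.repeat(alpha, T) + rng.normal(size=N * T)   # eps_it = alpha_i + u_it
Y = X @ beta + eps

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)         # (X'X)^{-1} X'Y
print(beta_ols)   # close to beta because alpha_i is uncorrelated with X here
```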
Random versus fixed effect models
We will examine the implications for OLS estimation under the
alternative assumptions that:
1. $\alpha_i$ is uncorrelated with $X_{it}$ for all $t = 1, \dots, T$ (referred to as the random effects model);
2. $\alpha_i$ is allowed to be arbitrarily correlated with $X_{it}$ for all $t = 1, \dots, T$ (referred to as the fixed effects model);
3. $\alpha_i$ is assumed to depend linearly on $X_{it}$ (referred to as the correlated random effects model).
We will consider the suitable estimators in each case.
8 / 62
Random effect models
In the random effects model we assume $Cov(X_{it}, \alpha_i) = 0$ for all $t = 1, \dots, T$, or the stronger assumption of zero conditional expectation, i.e. $E(\alpha_i\,|\,X_{i1}, \dots, X_{iT}) = 0$. In this scenario, using OLS will yield unbiased parameter estimates, but wrong standard errors and thus unreliable statistical inference. Let's take a look at why.
Consider the properties of the OLS estimator:
$$E(\hat\beta_{OLS}\,|\,X) = E\{(X'X)^{-1}X'Y\,|\,X\} \quad (5)$$
$$= E\{(X'X)^{-1}X'(X\beta + \varepsilon)\,|\,X\} \quad (6)$$
$$= \beta + (X'X)^{-1}X'E\{\varepsilon\,|\,X\} \quad (7)$$
$$= \beta. \quad (8)$$
9 / 62
Random effect models
Now think of the sampling properties of the OLS estimator:
$$Var(\hat\beta_{OLS}\,|\,X) = Var\{(X'X)^{-1}X'Y\,|\,X\} \quad (9)$$
$$= Var\{(X'X)^{-1}X'(X\beta + \varepsilon)\,|\,X\} \quad (10)$$
$$= Var\{\beta + (X'X)^{-1}X'\varepsilon\,|\,X\} \quad (11)$$
$$= (X'X)^{-1}X'Var\{\varepsilon\,|\,X\}X(X'X)^{-1}. \quad (12)$$
Recall, the OLS assumption about $\varepsilon$ is that $\varepsilon_{it} \sim iid(0, \sigma^2)$, and so:
$$Var(\hat\beta_{OLS}\,|\,X) = \sigma^2(X'X)^{-1}, \quad (13)$$
replacing $\sigma^2$ by an estimate, typically the sample variance of the regression errors:
$$s^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} e_{it}^2. \quad (14)$$
10 / 62
Random effect models
But what's wrong with the variance when we allow for unobserved heterogeneity? Due to $\varepsilon_{it} = \alpha_i + u_{it}$, the assumption of independent errors across observations fails. In particular, if $\alpha_i \sim N(0, \sigma_\alpha^2)$ and $u_{it} \sim iid(0, \sigma_u^2)$, where $\sigma^2 = \sigma_u^2 + \sigma_\alpha^2$, then the variance-covariance matrix of $\varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2}, \dots, \varepsilon_{iT})'$ is:
$$Var(\varepsilon_i\,|\,X_i) = \begin{pmatrix} \sigma_u^2 + \sigma_\alpha^2 & \sigma_\alpha^2 & \dots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_u^2 + \sigma_\alpha^2 & \dots & \sigma_\alpha^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \dots & \sigma_u^2 + \sigma_\alpha^2 \end{pmatrix} = \sigma_u^2 I_{T\times T} + \sigma_\alpha^2\, i_{T\times 1} i'_{1\times T} \equiv \Sigma$$
($i$ is a vector of ones).
11 / 62
Random effect models
Since the observations $i$ and $j$ are independent, the disturbance covariance matrix for the full $NT$ observations is:
$$\Omega = \begin{pmatrix} \Sigma & 0 & \dots & 0 \\ 0 & \Sigma & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \Sigma \end{pmatrix}_{NT\times NT} = I_{N\times N} \otimes \Sigma_{T\times T}.$$
12 / 62
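To make the block-diagonal structure concrete, the following minimal sketch builds $\Sigma$ and $\Omega$ with a Kronecker product for illustrative (assumed) variance components $\sigma_u^2 = 1$ and $\sigma_\alpha^2 = 0.5$.

```python
import numpy as np

T, N = 4, 3
sigma_u2, sigma_a2 = 1.0, 0.5              # illustrative values, not estimates

i_T = np.ones((T, 1))
Sigma = sigma_u2 * np.eye(T) + sigma_a2 * (i_T @ i_T.T)   # Var(eps_i | X_i)
Omega = np.kron(np.eye(N), Sigma)                         # block-diagonal, NT x NT
print(Sigma)
print(Omega.shape)   # (N*T, N*T)
```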
Random effect models
There are two solutions to fix the wrong standard errors implied by cross-sectional unobserved heterogeneity when using OLS:
1. Correcting the OLS standard errors: robust covariance matrix estimation; estimate the model with OLS, then adjust the standard errors ex post.
2. Random effects estimation: obtain a more efficient estimator of $\beta$ using generalised least squares; transform the data first, then use OLS on the transformed data. This approach is similar to (feasible) GLS when controlling for, e.g., heteroskedasticity.
13 / 62
1. Correcting OLS standard errors ex post
Note that $Var(\hat\beta_{OLS}\,|\,X) = (X'X)^{-1}X'Var\{\varepsilon\,|\,X\}X(X'X)^{-1}$ implies that $Var(\varepsilon\,|\,X)$ is an $NT \times NT$ matrix with a block-diagonal structure;
For each of the $N$ cross-sectional groups there is a $T \times T$ diagonal block corresponding to $Var(\varepsilon_i\,|\,X)$;
Off these diagonal blocks the matrix has zeros, due to the assumed independence of the cross-sectional sample;
Thus we can correct the OLS standard errors by replacing $Var(\varepsilon\,|\,X)$ with a suitable estimate from the sample data.
14 / 62
1. Correcting OLS standard errors ex post
Suitable estimators are:
Estimate $\sigma_\alpha^2$ by:
$$s_\alpha^2 = \left[\frac{NT(T-1)}{2}\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T-1}\sum_{s=t+1}^{T} e_{it}e_{is}. \quad (15)$$
Estimate $\sigma^2$ by:
$$s^2 = (NT)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} e_{it}^2. \quad (16)$$
This approach is nothing else than robust covariance matrix estimation (see p. 390 in Greene, 2012).
15 / 62
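A minimal sketch of the ex-post correction, assuming simulated data and using the two residual-based estimates above to rebuild the block of $Var(\varepsilon_i\,|\,X_i)$ and form the sandwich covariance. All numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 200, 4, 2
alpha = rng.normal(scale=0.7, size=N)
X = np.column_stack([np.ones(N * T), rng.normal(size=(N * T, K))])
beta = np.array([1.0, 0.5, -0.3])
Y = X @ beta + np.repeat(alpha, T) + rng.normal(size=N * T)

XtX_inv = np.linalg.inv(X.T @ X)
e = Y - X @ (XtX_inv @ (X.T @ Y))            # pooled OLS residuals
E = e.reshape(N, T)                          # one row of residuals per group

s2 = np.mean(e ** 2)                         # estimate of sigma^2 = sigma_u^2 + sigma_alpha^2
pair_sum = sum(E[:, t] @ E[:, s] for t in range(T - 1) for s in range(t + 1, T))
s_a2 = pair_sum / (N * T * (T - 1) / 2)      # estimate of sigma_alpha^2 from cross-period products

Sigma = (s2 - s_a2) * np.eye(T) + s_a2 * np.ones((T, T))     # estimated Var(eps_i | X_i)
meat = sum(X[i*T:(i+1)*T].T @ Sigma @ X[i*T:(i+1)*T] for i in range(N))
V_corrected = XtX_inv @ meat @ XtX_inv       # sandwich covariance for beta_OLS
print(np.sqrt(np.diag(V_corrected)))         # corrected standard errors
```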
Random effects (or GLS) estimation
We want to transform the data in a way that the variance of the transformed errors is equal to the identity matrix, i.e.
$$Var(\varepsilon^*) = Var(\Omega^{-1/2}\varepsilon) = I_{NT}. \quad (17)$$
A good candidate for transforming the data for each individual is $\Sigma^{-1/2}$; hence, if we find this term, we can pre-multiply $Y_i$, $X_i$ and $\varepsilon_i$ by $\Sigma^{-1/2}$ (or, in terms of full matrix notation, find $\Omega^{-1/2}$ and pre-multiply $Y$, $X$ and $\varepsilon$).
See the derivations of $\Omega^{-1/2}$ on the blackboard.
The final result is:
$$\Sigma^{-1/2} = \frac{1}{\sigma_u}\left[I - \frac{\theta}{T}\, i_{T\times 1} i'_{1\times T}\right], \quad (18)$$
where
$$\theta = 1 - \frac{\sigma_u}{\sqrt{\sigma_u^2 + T\sigma_\alpha^2}}. \quad (19)$$
16 / 62
Random effects estimation
Consider the following transformation of our benchmark linear regression model:
$$\Omega^{-1/2}Y = \Omega^{-1/2}X\beta + \Omega^{-1/2}\varepsilon, \quad (20)$$
or
$$Y^* = X^*\beta + \varepsilon^*, \quad (21)$$
where, for instance:
$$\Sigma^{-1/2}Y_i = \frac{1}{\sigma_u}\begin{pmatrix} Y_{i1} - \theta\bar Y_i \\ Y_{i2} - \theta\bar Y_i \\ \vdots \\ Y_{iT} - \theta\bar Y_i \end{pmatrix}, \qquad \Sigma^{-1/2}X_i = \frac{1}{\sigma_u}\begin{pmatrix} X_{i1} - \theta\bar X_i \\ X_{i2} - \theta\bar X_i \\ \vdots \\ X_{iT} - \theta\bar X_i \end{pmatrix}.$$
17 / 62
Random effects estimation
We can show that the transformed errors have the property that $Var(\varepsilon^*) = Var(\Omega^{-1/2}\varepsilon) = I_{NT}$ (try to do this at home). Thus, feasible GLS regression based on this transformation satisfies the necessary assumptions for efficient estimation of $\beta$, and is referred to as the random effects estimator:
$$\hat\beta_{RE} = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^* = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y. \quad (22)$$
The variance of this estimator is (homework: check whether you can derive all steps by yourself; we will talk about it in class next week):
$$Var(\hat\beta_{RE}\,|\,X) = Var\{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y\,|\,X\} \quad (23)$$
$$= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Var(\varepsilon\,|\,X)\,\Omega^{-1}X(X'\Omega^{-1}X)^{-1} \quad (24)$$
$$= (X'\Omega^{-1}X)^{-1}. \quad (25)$$
18 / 62
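A minimal sketch of random effects estimation by quasi-demeaning, assuming the true variance components are known (in practice they are estimated as above); data and parameter values are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 300, 5
sigma_u, sigma_a = 1.0, 0.8

x = rng.normal(size=(N, T))
alpha = rng.normal(scale=sigma_a, size=N)
y = 1.0 + 0.5 * x + alpha[:, None] + rng.normal(scale=sigma_u, size=(N, T))

theta = 1.0 - sigma_u / np.sqrt(sigma_u ** 2 + T * sigma_a ** 2)
y_star = (y - theta * y.mean(axis=1, keepdims=True)).ravel()    # Y_it - theta*Ybar_i
x_star = (x - theta * x.mean(axis=1, keepdims=True)).ravel()    # X_it - theta*Xbar_i
X_star = np.column_stack([np.full(N * T, 1 - theta), x_star])   # the constant is transformed too

beta_re = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
print(beta_re)   # approximately [1.0, 0.5]
```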
Fixed effect models
In the fixed effects model we allow for the possibility that $Cov(X_{it}, \alpha_i) \neq 0$. In this case, the OLS estimator $\hat\beta_{OLS}$ will be biased and inconsistent. This is so because:
$$E(\hat\beta_{OLS}\,|\,X) = E\{(X'X)^{-1}X'Y\,|\,X\} \quad (26)$$
$$= E\{(X'X)^{-1}X'(X\beta + \varepsilon)\,|\,X\} \quad (27)$$
$$= \beta + (X'X)^{-1}X'E\{\varepsilon\,|\,X\} \quad (28)$$
$$\neq \beta, \quad (29)$$
where the last inequality stems from the fact that $E(\alpha_i\,|\,X_{it}) \neq 0$.
19 / 62
Fixed effect models
There are two solutions to the problem. Reconsider the original model:
$$Y_{it} = X_{it}\beta + \varepsilon_{it}, \quad (30)$$
for all $i = 1, \dots, N$ and $t = 1, \dots, T$, with
$$\varepsilon_{it} = \alpha_i + u_{it}. \quad (31)$$
Within-group fixed effects: subtract the within-group means from the original regression equation that combines Eqs. 30 and 31.
First differences between two adjacent time periods.
20 / 62
Within-group fixed effects
Construct the within-group average of the benchmark linear regression model:
$$\bar Y_i = \bar X_i\beta + \alpha_i + \bar u_i, \quad (32)$$
where $\bar Y_i = T^{-1}\sum_{t=1}^{T} Y_{it}$, $\bar X_i = T^{-1}\sum_{t=1}^{T} X_{it}$, and $\bar u_i = T^{-1}\sum_{t=1}^{T} u_{it}$. Then, subtract Eq. 32 from the combined Eqs. 30 and 31:
$$Y_{it} - \bar Y_i = (X_{it} - \bar X_i)\beta + (u_{it} - \bar u_i) + (\alpha_i - \alpha_i). \quad (33)$$
And so the within-group fixed effects estimator is:
$$\hat\beta_{FE} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar X_i)'(X_{it} - \bar X_i)\right]^{-1}\left[\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar X_i)'(Y_{it} - \bar Y_i)\right]. \quad (34)$$
21 / 62
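A minimal sketch of the within-group estimator on simulated data where $\alpha_i$ is correlated with the regressor, so pooled OLS would be biased but demeaning recovers the true coefficient (single-regressor version of Eq. 34; all values hypothetical).

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 200, 6
alpha = rng.normal(size=N)
x = alpha[:, None] + rng.normal(size=(N, T))        # x correlated with alpha_i
y = 0.5 * x + alpha[:, None] + rng.normal(size=(N, T))

x_dm = x - x.mean(axis=1, keepdims=True)            # X_it - Xbar_i
y_dm = y - y.mean(axis=1, keepdims=True)            # Y_it - Ybar_i
beta_fe = (x_dm * y_dm).sum() / (x_dm ** 2).sum()   # scalar-regressor version of Eq. 34
print(beta_fe)                                      # close to 0.5; pooled OLS would be biased upward
```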
Within-group fixed effects
In contrast to the random effects or GLS procedure, which uses both within-group (across time) and between-group (across cross-sectional units) variation to estimate $\beta$, the within-group fixed effects approach uses only the within-group variation.
Any time-invariant observable characteristics will also difference out, so that their coefficients cannot be identified (unless they are interacted with time-varying variables).
$N$ degrees of freedom are lost, since this approach estimates the group sample means (one for each group).
Even though the transformed errors in Eq. 33, $(u_{it} - \bar u_i)$, are non-classical (which means what?), the OLS standard errors from the fixed effects regression are correct.
22 / 62
First differences approach
Alternatively, one can eliminate $\alpha_i$ from the equations by taking first differences of the regression model combining Eqs. 30 and 31:
$$Y_{it} - Y_{i,t-1} = (X_{it} - X_{i,t-1})\beta + (u_{it} - u_{i,t-1}). \quad (35)$$
Only coefficients on time-varying regressors are estimable.
First differencing is equivalent to within-group fixed effects estimation for $T = 2$.
23 / 62
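For comparison with the within-group sketch above, a minimal first-difference version on the same kind of simulated panel (hypothetical values again).

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 200, 6
alpha = rng.normal(size=N)
x = alpha[:, None] + rng.normal(size=(N, T))
y = 0.5 * x + alpha[:, None] + rng.normal(size=(N, T))

dx = np.diff(x, axis=1).ravel()                     # X_it - X_i,t-1
dy = np.diff(y, axis=1).ravel()                     # Y_it - Y_i,t-1
beta_fd = (dx * dy).sum() / (dx ** 2).sum()
print(beta_fd)                                      # also close to 0.5
```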
Pros and cons
The first differences approach is easy to implement manually
and keeping track of the correct number of degrees of
freedom is more straightforward.
If the model is correctly specified and if there is no serial
correlation, then within-group fixed effect estimation is more
efficient than first differences.
The relative efficiency of the two estimators depends on the degree of serial correlation in the idiosyncratic errors ($Cov(u_{it}, u_{is})$, for $t \neq s$). (Why?)
24 / 62
2. Relationship between random and fixed
(and between) effect estimators
25 / 62
Some transformations
Consider the following two transformations:
Group-means transformation: $P = I_{N\times N} \otimes T^{-1} i_{T\times 1} i'_{1\times T}$, where $I_{N\times N}$ is the identity matrix of dimension $N \times N$, $\otimes$ is the Kronecker product, and $i_{T\times 1}$ is a $T \times 1$ vector of ones.
Deviations from group means: $Q = I_{NT\times NT} - P$.
26 / 62
Some transformations, cont.
$P$ and $Q$ have the effect of transforming the data to group means, and deviations from group means, respectively:
$$PY = \begin{pmatrix} \bar Y_1 \\ \vdots \\ \bar Y_1 \\ \bar Y_2 \\ \vdots \\ \bar Y_2 \\ \vdots \\ \bar Y_N \\ \vdots \\ \bar Y_N \end{pmatrix}, \qquad QY = \begin{pmatrix} Y_{11} - \bar Y_1 \\ \vdots \\ Y_{1T} - \bar Y_1 \\ Y_{21} - \bar Y_2 \\ \vdots \\ Y_{2T} - \bar Y_2 \\ \vdots \\ Y_{N1} - \bar Y_N \\ \vdots \\ Y_{NT} - \bar Y_N \end{pmatrix}, \text{ and so on.}$$
Note that $P$ and $Q$ are idempotent ($P^2 = P$, $Q^2 = Q$) and orthogonal ($PQ = 0$).
27 / 62
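A minimal sketch that builds $P$ and $Q$ with Kronecker products for a small hypothetical panel and checks the idempotency and orthogonality properties stated above.

```python
import numpy as np

N, T = 3, 4
i_T = np.ones((T, 1))

P = np.kron(np.eye(N), (i_T @ i_T.T) / T)   # replaces each observation by its group mean
Q = np.eye(N * T) - P                       # deviations from group means

print(np.allclose(P @ P, P), np.allclose(Q @ Q, Q))   # idempotent
print(np.allclose(P @ Q, np.zeros((N * T, N * T))))   # orthogonal

Y = np.arange(N * T, dtype=float)
print(P @ Y)   # each entry is the corresponding group mean
print(Q @ Y)   # deviations from that mean
```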
Fixed and random effects similarity
Hence, the fixed effects estimator of $\beta$ can be expressed in more compact notation:
$$\hat\beta_{FE} = (X'QX)^{-1}X'QY. \quad (36)$$
The random effects transformation described above is a partial deviation from group means:
$$Y^*_{it} = Y_{it} - \theta\bar Y_i, \quad (37)$$
$$X^*_{it} = X_{it} - \theta\bar X_i, \quad (38)$$
where $\theta = 1 - \left(\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}\right)^{1/2}$.
28 / 62
Fixed and random effects similarity
The partial-deviations framework provides an optimal use of the within-group and the between-group variation. Note that the larger is the between-group fraction of total variation (i.e. $\sigma_\alpha^2$ relative to $\sigma_u^2$) and/or the larger is $T$, the greater will be $\theta$ (closer to 1), and the more weight is given to within-group, compared to between-group, variation.
Suppose $T = 3$ and $\sigma_\alpha^2 = 0$; then $\theta = 0$ and the full variation in the data is used, compared to $\theta = 0.5$ if $\sigma_\alpha^2 = \sigma_u^2$.
Alternatively, suppose $\sigma_\alpha^2 = \sigma_u^2$; then if $T = 3$, $\theta = 0.5$, compared to $\theta = 0.75$ if $T = 15$.
29 / 62
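A short sketch that simply evaluates $\theta$ for the numerical examples on this slide (the variance values are the assumed ones from the examples).

```python
import numpy as np

def theta(sigma_u2, sigma_a2, T):
    # theta = 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_alpha^2))
    return 1 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))

print(theta(1.0, 0.0, T=3))    # 0.0  : no unobserved heterogeneity, RE collapses to pooled OLS
print(theta(1.0, 1.0, T=3))    # 0.5
print(theta(1.0, 1.0, T=15))   # 0.75
```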
Random effects as weighted average
The random effects estimator can be thought of as a weighted average of the within-group estimator $\hat\beta_{FE}$ and the between-group estimator $\hat\beta_{BE}$ based on the group-means data:
$$\hat\beta_{RE} = \Delta_{k\times k}\,\hat\beta_{FE} + (I_{k\times k} - \Delta_{k\times k})\,\hat\beta_{BE}, \quad (39)$$
$$\hat\beta_{BE} = \left[\sum_{i=1}^{N} T(\bar X_i - \bar X)'(\bar X_i - \bar X)\right]^{-1}\sum_{i=1}^{N} T(\bar X_i - \bar X)'(\bar Y_i - \bar Y) = (X'PX)^{-1}X'PY.$$
30 / 62
Random effects as weighted average
The weight matrix is
$$\Delta_{k\times k} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar X_i)'(X_{it} - \bar X_i) + \lambda\sum_{i=1}^{N} T(\bar X_i - \bar X)'(\bar X_i - \bar X)\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(X_{it} - \bar X_i)'(X_{it} - \bar X_i),$$
where
$$\lambda = (1 - \theta)^2 = \frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}. \quad (40)$$
If $\lambda = 0$, then $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are equivalent (all the weight is given to within-group variation).
If $\lambda = 1$, a lot of weight is given to between-group variation.
However, $0 < \lambda < 1$ is the more likely case.
31 / 62
Summary
If $E(\alpha_i\,|\,X_{it}) = 0$:
Both $\hat\beta_{RE}$ and $\hat\beta_{FE}$ are consistent for $\beta$ (and so would be OLS).
$\hat\beta_{RE}$ is efficient; OLS has biased standard errors.
If $E(\alpha_i\,|\,X_{it}) \neq 0$:
$\hat\beta_{RE}$ is inconsistent for $\beta$.
$\hat\beta_{FE}$ is consistent for $\beta$.
32 / 62
Testing
The efficiency/consistency trade-off between RE and FE suggests a method to test the random effects restriction. One of these tests is the Hausman test. Under the null hypothesis $H_0: E(\alpha_i\,|\,X_{it}) = 0$, $\hat\beta_{RE}$ is efficient, but it is inconsistent under the alternative hypothesis ($H_a: E(\alpha_i\,|\,X_{it}) \neq 0$). In contrast, $\hat\beta_{FE}$ is consistent under both $H_0$ and $H_a$.
The Hausman test statistic for this test is:
$$H = (\hat\beta_{FE} - \hat\beta_{RE})'\{Var(\hat\beta_{FE} - \hat\beta_{RE})\}^{-1}(\hat\beta_{FE} - \hat\beta_{RE}), \quad (41)$$
where $Var(\hat\beta_{FE} - \hat\beta_{RE}) = Var(\hat\beta_{FE}) - Var(\hat\beta_{RE})$ is the variance-covariance matrix of the difference between the fixed effects and random effects estimators.
33 / 62
Testing
Under the null hypothesis, the Hausman test statistic has a $\chi^2$ distribution with degrees of freedom equal to the dimension of $\beta$, i.e.:
$$H \sim \chi^2_k. \quad (42)$$
Note: Since the fixed effects estimation method can only identify coefficients on time-varying variables, the relevant dimension of $\beta$ is the number of time-varying variable coefficients.
34 / 62
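A minimal sketch of the test statistic in Eq. 41, assuming the FE and RE estimates and their covariance matrices are already available; the numbers plugged in below are purely hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def hausman(b_fe, b_re, V_fe, V_re):
    d = b_fe - b_re
    V_d = V_fe - V_re                       # Var(b_FE) - Var(b_RE)
    H = float(d @ np.linalg.inv(V_d) @ d)
    df = len(d)                             # number of time-varying coefficients compared
    return H, chi2.sf(H, df)

# hypothetical numbers for illustration only
b_fe = np.array([0.52, -0.31])
b_re = np.array([0.48, -0.28])
V_fe = np.array([[0.004, 0.001], [0.001, 0.003]])
V_re = np.array([[0.003, 0.001], [0.001, 0.002]])
print(hausman(b_fe, b_re, V_fe, V_re))      # (statistic, p-value)
```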
3. Is fixed effects estimation always preferable
to random effects estimation?
35 / 62
Is FE always better than RE?
Recall: The fixed effects estimator uses only the within-group (=
difference from group mean) variation and ignores the
between-group variation. This method is used because of a
concern that this between-group variation is contaminated with
unobserved heterogeneity.
In some cases, the cross-sectional variation may be more reliable
than the within-group time-variation, in which case fixed effects
estimation may be worse than the OLS or RE alternatives.
36 / 62
Is FE always better than RE?
Examples are:
Measurement error in $X_{it}$: if $X_{it}$ is measured with classical, i.e. purely random, error, then taking either deviations from means or first differences will exacerbate the noise-to-signal ratio in the resulting data, leading to serious attenuation bias in FE.
Endogenous changes in $X_{it}$: if $X$ is endogenous, i.e. changes in $X_{it}$ over time are not exogenous to changes in $Y_{it}$, then fixed effects estimation may be worse than random effects or OLS. In this case, $(X_{it} - \bar X_i)$ may be strongly correlated with $(\varepsilon_{it} - \bar\varepsilon_i)$.
There may not be enough variation in the $X$ variables, although FE can estimate the coefficient even if $X$ rarely changes (Plümper and Troeger, 2007).
37 / 62
4. Hausman-Taylor (1981) approach to
estimating coefficients on both time-varying
and time-invariant variables
38 / 62
Hausman-Taylor (1981) approach
If we have a situation in which we have both time-varying and time-invariant variables of interest, Hausman and Taylor show that consistent estimation of the coefficients of interest is possible if not all of the time-varying variables are correlated with the unobserved heterogeneity.
The basic idea is to use the group means of the time-varying variables that are uncorrelated with the unobserved heterogeneity as instruments for the time-invariant variables, to obtain consistent estimates of their coefficients, while consistent estimates of the time-varying variable coefficients can be obtained using standard fixed effects estimation.
This requires that there are at least as many uncorrelated time-varying variables as correlated time-invariant variables, and also that there is suitable correlation between these.
39 / 62
Hausman-Taylor (1981) approach
Consider the linear regression of $Y_{it}$ on $k$ time-varying covariates ($X_{it}$) and $g$ time-invariant covariates ($Z_i$):
$$Y_{it} = X_{it}\beta + Z_i\gamma + \varepsilon_{it}, \quad (43)$$
where $i = 1, \dots, N$, $t = 1, \dots, T$, and
$$\varepsilon_{it} = \alpha_i + u_{it}. \quad (44)$$
40 / 62
Hausman-Taylor (1981) approach
Sub-divide each of $X_{it} = (X_{1it}\ X_{2it})$ and $Z_i = (Z_{1i}\ Z_{2i})$:
$X_{1it}$ and $X_{2it}$ consist of $k_1$ and $k_2$ variables, respectively ($k_1 + k_2 = k$);
$Z_{1i}$ and $Z_{2i}$ consist of $g_1$ and $g_2$ variables, respectively ($g_1 + g_2 = g$);
$E(\alpha_i\,|\,X_{1it}) = 0$ and $E(\alpha_i\,|\,Z_{1i}) = 0$; and
$E(\alpha_i\,|\,X_{2it}) \neq 0$ and $E(\alpha_i\,|\,Z_{2i}) \neq 0$.
41 / 62
Hausman-Taylor (1981) approach
The intuition for the Hausman-Taylor approach is as follows:
STEP 1: Fixed effects provides consistent estimation of the coefficients on the time-varying variables:
$$\hat\beta_{FE} = (X'QX)^{-1}X'QY. \quad (45)$$
Remember that $Q = I - P$, where $P = I \otimes T^{-1}ii'$. The residual variance obtained in this step is a consistent estimator of $\sigma_u^2$.
STEP 2: Use $\hat\beta_{FE}$ to construct the group means of the within-group residuals:
$$d_i = \bar Y_i - \bar X_i\hat\beta_{FE} = Z_i\gamma + \alpha_i + \bar u_i, \quad (46)$$
where $\bar u_i$ is the group mean of the residuals ($u_{it}$).
If (46) were estimated by OLS or GLS, then $\hat\gamma$ would likely be biased, due to the correlation of $Z_{2i}$ with $\alpha_i$.
42 / 62
Hausman-Taylor (1981) approach
Where does the expression for $d$ in (46) come from? The group means of the within-group residuals are derived as follows:
$$d = P(Y - X\hat\beta_{FE}) = P\{I - X(X'QX)^{-1}X'Q\}Y$$
$$= P\{I - X(X'QX)^{-1}X'Q\}(X\beta + Z\gamma + \alpha + u)$$
$$= P(X\beta + Z\gamma + \alpha + u - X\beta)$$
$$= P(Z\gamma + \alpha + u)$$
$$= Z\gamma + \alpha + Pu$$
(the remaining term involving $Qu$ is asymptotically negligible, and $PZ = Z$, $P\alpha = \alpha$ because these are constant within groups). This is a regression of the group-mean residuals from the fixed effects regression on the $Z_i$'s, with $\alpha_i + \bar u_i$ as the error term.
43 / 62
Hausman-Taylor (1981) approach
STEP 3: Use $\bar X_{1i}$ as instruments for $Z_{2i}$. This will provide consistent estimation of $\gamma$ if there are sufficiently many $X_1$'s (i.e. the order condition $k_1 \geq g_2$), and the $\bar X_{1i}$'s are correlated with the $Z_{2i}$'s (rank condition).
Then estimate (46) with a 2SLS approach, where:
$$\hat\gamma = (Z_i'P_A Z_i)^{-1}Z_i'P_A d_i, \quad (47)$$
where $A = [\bar X_{1i}\ Z_{1i}]$, and $P_A$ is the projection matrix:
$$P_A = A(A'A)^{-1}A', \quad (48)$$
and
$$\hat Z_2 = A(A'A)^{-1}A'Z_2. \quad (49)$$
44 / 62
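A minimal sketch of steps 1-3 on simulated data (an illustration of the logic, not the full Hausman-Taylor procedure and not the authors' code): a single exogenous time-varying variable $x_1$, a single endogenous time-invariant variable $z_2$ correlated with $\alpha_i$, with the group mean of $x_1$ as its instrument; a constant is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 1000, 5
c = rng.normal(size=N)
alpha = rng.normal(size=N)
z2 = c + alpha                                      # endogenous time-invariant variable
x1 = c[:, None] + rng.normal(size=(N, T))           # exogenous time-varying variable
y = 0.5 * x1 + 1.0 * z2[:, None] + alpha[:, None] + rng.normal(size=(N, T))

# Step 1: within-group FE gives a consistent beta (z2 drops out of the demeaned equation)
x_dm = x1 - x1.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
beta_fe = (x_dm * y_dm).sum() / (x_dm ** 2).sum()

# Step 2: group means of the within residuals
d = y.mean(axis=1) - beta_fe * x1.mean(axis=1)      # d_i = Ybar_i - Xbar_i * beta_FE

# Step 3: instrument z2 with the group mean of x1 (simple just-identified IV)
x1_bar = x1.mean(axis=1)
gamma_iv = (x1_bar @ d) / (x1_bar @ z2)
print(beta_fe, gamma_iv)                            # close to 0.5 and 1.0

# For comparison: OLS of d on z2 is biased because z2 is correlated with alpha_i
print((z2 @ d) / (z2 @ z2))
```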
Hausman-Taylor (1981) approach
NOTE: Both $\hat\beta_{FE}$ and $\hat\gamma_{2SLS}$ are consistent. However, since $\hat\beta_{FE}$ is likely to be inefficient, the $\hat\gamma_{2SLS}$, which stem from the FE approach, are likely to be inefficient too. Therefore, Hausman and Taylor suggest an extension to estimate $\beta$ and $\gamma$ in a more efficient way.
STEP 4: The residual variance in the step above is a consistent estimator of $\sigma_*^2 = \sigma_u^2/T + \sigma_\alpha^2$. Using the consistent estimator of $\sigma_u^2$ from the first step, we deduce an estimator for $\sigma_\alpha^2 = \sigma_*^2 - \sigma_u^2/T$. The weight for feasible GLS is:
$$\theta = 1 - \frac{\sigma_u}{\sqrt{\sigma_u^2 + T\sigma_\alpha^2}}. \quad (50)$$
45 / 62
Hausman-Taylor (1981) approach
STEP 5: Construct a weighted instrumental variable estimator. The full set of variables is:
$$w_{it} = (X_{1it}\ X_{2it}\ Z_{1i}\ Z_{2i}) = W_{NT\times(k_1+k_2+g_1+g_2)}, \quad (51)$$
so the transformed variables for GLS are:
$$w^*_{it} = w_{it} - \theta\bar w_i, \qquad Y^*_{it} = Y_{it} - \theta\bar Y_i.$$
The instruments used are:
$$v_{it} = [(X_{1it} - \bar X_{1i})\ \ (X_{2it} - \bar X_{2i})\ \ Z_{1i}\ \ \bar X_{1i}]. \quad (52)$$
46 / 62
Hausman-Taylor (1981) approach
Instrumental variable estimator using the weighted (transformed) variables (efficient):
$$(\hat\beta\ \hat\gamma)_{IV} = [(W^{*\prime}V)(V'V)^{-1}(V'W^*)]^{-1}[(W^{*\prime}V)(V'V)^{-1}(V'Y^*)] \quad (53)$$
Instrumental variable estimator using unweighted variables (inefficient):
$$(\hat\beta\ \hat\gamma)_{IV} = [(W'V)(V'V)^{-1}(V'W)]^{-1}[(W'V)(V'V)^{-1}(V'Y)] \quad (54)$$
Feasible GLS estimator:
$$(\hat\beta\ \hat\gamma)_{GLS} = [W^{*\prime}W^*]^{-1}[W^{*\prime}Y^*] \quad (55)$$
47 / 62
5. Correlated random effects (CRE): a flexible
extension to random effect models
48 / 62
Intuition of CRE
Recall that the random effects estimator is biased if $\alpha_i$ is correlated with $X_{it}$. Chamberlain (1984) and Mundlak (1978) observed that if $\alpha_i$ is correlated with $X_{it}$ in period $t$, then it will also be correlated with $X_{it}$ in period $s$, where $t \neq s$. One interpretation of this observation is that $X_{it}$ should be included in the period-$s$ regression. More generally, all the realisations of the $X$'s should be included in each period's regression.
That is, if $\alpha_i$ is correlated with $X_{it}$ in the structural form, then all leads and lags of $X_{it}$ should be included in the regression.
49 / 62
Formalisation of CRE
Specify the linear projection of $\alpha_i$ on the set of $X_{it}$'s:
$$\alpha_i = X_{i1}\lambda_1 + X_{i2}\lambda_2 + \dots + X_{iT}\lambda_T + \eta_i. \quad (56)$$
Eq. 56 provides a way to decompose $\alpha_i$ into two components:
1. A component ($X_{i1}\lambda_1 + X_{i2}\lambda_2 + \dots + X_{iT}\lambda_T$) that is correlated with the observable covariates; and
2. A component ($\eta_i$) that is uncorrelated with the covariates.
The $\lambda$'s are the projection coefficients that reflect the extent of the correlation between $\alpha_i$ and $X_{it}$, and $\eta_i$ is, by construction, a true random effect, i.e. uncorrelated with $X_{it}$ for all $t$.
50 / 62
Formalisation of CRE
Note:
$E(\alpha_i\,|\,X_{it})$ does not have to be linear in the $X_{it}$'s. It is only the linear correlation that causes bias/inconsistency in the OLS and (random effects/GLS) estimators. Hence, only the linear projection is required for CRE to be unbiased/consistent.
Mundlak (1978) adopted the more restricted specification that $\lambda_1 = \lambda_2 = \dots = \lambda_T = \lambda$. This restriction implies that Eq. 56 reduces to:
$$\alpha_i = (T\bar X_i)\lambda + \eta_i. \quad (57)$$
51 / 62
Mundlaks assumption and consequences
The assumption that the individual-specific effect is equally correlated with all time periods' $X_{it}$'s implies a very easy implementation of the correction. All you need to do is to replace $\alpha_i$ in Eq. 58:
$$Y_{it} = X_{it}\beta + \alpha_i + u_{it}, \quad (58)$$
with (ignoring the scaling factor of $T$):
$$\alpha_i = (T\bar X_i)\lambda + \eta_i, \quad (59)$$
to get:
$$Y_{it} = X_{it}\beta + \bar X_i\lambda + \eta_i + u_{it}, \quad (60)$$
where $\eta_i$ is a true random effect.
52 / 62
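A minimal sketch of the Mundlak correction: add the group mean $\bar X_i$ as an extra regressor so that the remaining individual effect is uncorrelated with $X$. Data are simulated; pooled OLS is used for brevity, although in practice Eq. 60 is usually estimated by RE/GLS.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 400, 5
alpha = rng.normal(size=N)
x = alpha[:, None] + rng.normal(size=(N, T))         # x correlated with alpha_i
y = 0.5 * x + alpha[:, None] + rng.normal(size=(N, T))

x_bar = np.repeat(x.mean(axis=1), T)                 # Mundlak term Xbar_i
X = np.column_stack([np.ones(N * T), x.ravel(), x_bar])
coef = np.linalg.solve(X.T @ X, X.T @ y.ravel())
print(coef)   # [intercept, beta ~ 0.5, lambda picking up the correlation with alpha_i]
```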
Chamberlains approach
If you do not want to make the strong assumption made by Mundlak, then implementation of this correction is slightly more difficult. Using Eq. 56 to substitute for $\alpha_i$ in the combined Eqs. 30 and 31, we get:
$$Y_{it} = X_{it}\beta + X_{i1}\lambda_1 + X_{i2}\lambda_2 + \dots + X_{iT}\lambda_T + \eta_i + u_{it} \quad (61)$$
$$= X_{it}(\beta + \lambda_t) + \sum_{s\neq t} X_{is}\lambda_s + \eta_i + u_{it}, \quad (62)$$
or, in more compact form:
$$Y_{it} = X_{i1}\pi_{t1} + X_{i2}\pi_{t2} + \dots + X_{iT}\pi_{tT} + \eta_i + u_{it}, \quad (63)$$
where
$$\pi_{ts} = \begin{cases} \lambda_s & s \neq t \\ \beta + \lambda_t & s = t. \end{cases}$$
53 / 62
Some explanations
Eq. 62 is the reduced-form equation for the model. The errors ($\eta_i + u_{it}$) are uncorrelated with the regressors. This expression shows that one way to view the problem of ignoring the correlation between the covariates and the unobserved heterogeneity is as an omitted variables problem, which can be solved by including all the out-of-period realisations of $X_{is}$ in the period-$t$ equation.
In Eq. 63, the coefficient on $X_{it}$, i.e. $\pi_{tt}$, consists of two components:
1. The structural effect of interest, $\beta$;
2. The component $\lambda_t$, which reflects the correlation of $X_{it}$ with the unobserved heterogeneity.
54 / 62
Estimation of CRE
The parameters of interest ($\beta$ and the $\lambda_t$'s) can be estimated by the minimum distance approach, which requires two steps:
1. Estimate the unrestricted reduced-form equations as outlined in Eq. 63 by OLS. Include all the leads and lags of the $X_{it}$'s in the period-$t$ regression, and estimate this regression separately for each time period.
2. Estimate the parameters of interest by imposing the implied restrictions (see below) on the first-stage reduced-form coefficients using a minimum distance estimation method. This means using a quadratic-form criterion as the basis for estimating the parameters of interest in the second stage.
The implied cross-equation restrictions are:
1. $\pi_{ts} = \lambda_s$ for $t \neq s$;
2. $\pi_{tt} - \pi_{st} = \beta$ for $t \neq s$.
The details of minimum distance estimation are explained on the white-board.
55 / 62
Evaluation of CRE
This approach is called "random effects" because it parameterises the distribution of $\alpha_i$ (i.e. by projecting $\alpha_i$ onto the set of sample realisations of $X_{it}$);
It requires estimating $1 + TK + K$ parameters (risk of a proliferation of parameters);
It relies on the measured $X_{it}$'s being time-varying. Time-invariant variables will be absorbed into $\alpha_i$ in this specification;
A test of the (zero) correlation between the covariates and the unobserved heterogeneity is given by testing $H_0: \lambda_1 = \lambda_2 = \dots = \lambda_T = 0$ vs $H_a$: not all $\lambda_t$ are zero;
An important caveat to the CRE discussion is that $X_{is}$ enters the period-$t$ equation only via its correlation with $\alpha_i$. In some situations, out-of-period regressors may have independent, structural reasons for being included (this approach may then fail).
56 / 62
6. Plümper and Troeger (2007) approach to modelling (nearly) time-invariant variables
57 / 62
Three-stage procedure
Run a fixed effects model - predict the individual fixed effect;
Decompose the individual fixed effects into the part explained by time-invariant and/or rarely changing variables and an error term ($h_i$);
Re-estimate the first stage by pooled OLS, including the time-invariant variables plus the error term from stage 2.
58 / 62
Three-stage procedure
Stage 1 (within-group fixed effects):
$$y_{it} - \bar y_i = \sum_{k=1}^{K}\beta_k(x_{kit} - \bar x_{ki}) + \sum_{m=1}^{M}\gamma_m(z_{mi} - \bar z_{mi}) + (e_{it} - \bar e_i) + (u_i - \bar u_i), \quad (64)$$
where the time-invariant terms drop out of the demeaned equation. Let:
$$\hat u_i = \bar y_i - \sum_{k=1}^{K}\hat\beta_k\bar x_{ki} - \bar e_i. \quad (65)$$
Stage 2 (decompose the estimated unit effect):
$$\hat u_i = \sum_{m=1}^{M}\gamma_m z_{mi} + h_i, \quad (66)$$
and
$$\hat h_i = \hat u_i - \sum_{m=1}^{M}\hat\gamma_m z_{mi}. \quad (67)$$
Stage 3 (pooled OLS including the unexplained part $\hat h_i$):
$$y_{it} = \alpha + \sum_{k=1}^{K}\beta_k x_{kit} + \sum_{m=1}^{M}\gamma_m z_{mi} + \hat h_i + \varepsilon_{it}. \quad (68)$$
59 / 62
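A minimal sketch of the mechanics of the three FEVD stages on simulated data (an illustration of the procedure described above, not Plümper and Troeger's own implementation); one time-varying and one time-invariant regressor, all values hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 400, 5
z = rng.normal(size=N)                                  # time-invariant variable
alpha = rng.normal(size=N)                              # unit effect
x = alpha[:, None] + rng.normal(size=(N, T))            # time-varying, correlated with alpha
y = 0.5 * x + 1.0 * z[:, None] + alpha[:, None] + rng.normal(size=(N, T))

# Stage 1: within-group FE, then recover the estimated unit effect u_i_hat
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
beta_fe = (x_dm * y_dm).sum() / (x_dm ** 2).sum()
u_hat = y.mean(axis=1) - beta_fe * x.mean(axis=1)       # contains gamma*z_i + alpha_i

# Stage 2: regress u_i_hat on the time-invariant variable, keep the residual h_i
Z = np.column_stack([np.ones(N), z])
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ u_hat)
h = u_hat - Z @ gamma_hat

# Stage 3: pooled OLS of y on x, z and h_i
W = np.column_stack([np.ones(N * T), x.ravel(), np.repeat(z, T), np.repeat(h, T)])
coef = np.linalg.solve(W.T @ W, W.T @ y.ravel())
print(beta_fe, coef)    # beta close to 0.5, coefficient on z close to 1.0
```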
Monte Carlo Simulations
1. Compare the finite-sample properties of the FEVD estimator against those of the pooled OLS, RE, and Hausman-Taylor IV estimators (using RMSE as the criterion);
2. If both time-invariant and time-varying variables correlate strongly with the individual FE, then FEVD outperforms all estimators;
3. When considering the estimates of coefficients on rarely changing variables, FEVD outperforms FE if:
the ratio of between to within variation is high (the threshold is 1.7), and
the overall $R^2$ is low, and
the correlation between the rarely changing variables and the individual FE is low.
60 / 62
7. Application: Effect of Health on Hourly
Wages (Contoyannis and Rice, 2001) using six
waves of BHPS
61 / 62
Assumptions
1. Remember: use the mean values of the exogenous time-varying variables to instrument the time-invariant endogenous variables;
2. Time-invariant endogenous variable: higher degree;
3. Time-varying endogenous variables: health (psychological and physiological), workforce sector, occupation;
4. Test for the validity of the instruments in the Hausman and Taylor approach using a Hausman test (comparing the estimated coefficients with those of an FE model): they should be sufficiently close;
5. The approach is valid only if health is correlated with the individual, time-invariant effect on wages, but not with the period-specific effects on wages.
62 / 62