ECONOMETRICS Summary 21:22
LINEAR MODEL (SINGLE-EQUATION LINEAR MODEL WITH CROSS-SECTIONAL DATA: OLS)
OLS (the ordinary least squares method) is a method for estimating the regression function ŷ: it estimates the relationship between x and y, that is, how a change in x is associated with a change in y. R2 is the goodness-of-fit measure: it tells us how much of the variation in y the model has explained. A higher R2 (up to R2 = 1) means a better fit, but correlation does not correspond to causation, and correlation alone cannot inform policy.
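As a minimal illustration (simulated data; the variable names and the data-generating process below are illustrative assumptions, not part of the notes), OLS and R2 can be computed as follows:

# Fit OLS on simulated data and read off the coefficient estimates and R^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # true model: y = b0 + b1*x + u

X = sm.add_constant(x)                   # add the intercept column
res = sm.OLS(y, X).fit()
print(res.params)                        # estimates of (b0, b1)
print(res.rsquared)                      # share of the variation in y explained by the model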
Problems:
- The most frequent problem with OLS is omitted variable bias.
- Controlling for variables that are themselves caused by the variable of interest will also lead to biased coefficients.
- The same unit cannot be treated and not treated at the same time.
y = β0 + β1x1 + β2x2 + … + βKxK + u
where y, x1, x2, x3, …, xK are observable random scalars (that is, we can observe them in a random sample of the population), u is the unobservable random disturbance or error, and β0, β1, β2, …, βK are the parameters (constants) we would like to estimate. In vector form, with x = (x1, …, xK), the model is y = β0 + xβ + u.
The error term u can consist of a variety of things, including omitted variables and measurement
error. The parameters βj hopefully correspond to the parameters of interest.
With the intercept absorbed into x (i.e., x includes a constant), the model can be written y = xβ + u.
Assumption OLS.1 (Zero Correlation/orthogonality conditions): The error has a zero mean and
is uncorrelated with each explanatory variable:
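In the notation above, a standard way to write this condition is:
E(u) = 0 and Cov(xj, u) = 0 for j = 1, 2, …, K
(equivalently, E(xj u) = 0 for every regressor once the intercept is included).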
This assumption fails, for example, because of (i) omitted variables (we would like to control for one or more additional variables but, usually because of data unavailability, we cannot include them in the regression model).
Assumption OLS.2 (No Perfect Collinearity): In the population, there are no exact linear
relationships among the covariates. It fails if and only if at least one of the regressors can be
written as a linear function of the other regressors (in the population).
3
High correlation among regressors often cannot be avoided, but it is not a violation of the assumptions. Sometimes high correlation among regressors (multicollinearity) is the researcher's fault because the parameterization has not been carefully chosen.
We can write β as a function of population moments in observable variables:
β = [E(x′x)]⁻¹ E(x′y)
(with x including the constant). Replacing the population moments with sample moments gives the OLS estimator, so OLS on a random sample consistently estimates β.
Assumption OLS.3 (Homoskedasticity): Var(u | x) = σ2: the error u has the same variance for all values of the regressors. Homoskedasticity has nothing to do with the consistency of β̂; it matters for efficiency and for the validity of the usual standard errors.
TYPES OF BIAS:
● selection bias: cov(Ɛ, X) ≠ 0. Some individual characteristics may affect not only the outcome but also the probability of receiving the treatment: the treatment is not random but depends on individual characteristics, so those who get (choose) the treatment are very different from those who do not. There are several ways to eliminate selection bias: correct randomization, including fixed effects, or using control variables (that should be uncorrelated with D_i but correlated with Y_i).
● omitted variable bias: a variable that should be controlled for cannot be included, for example because it cannot be measured (a simulated illustration follows this list).
● bad controls: a version of selection bias. Including omitted variables is important as long as they are relevant in economic terms, but introducing endogenous variables or variables with measurement error does not help: more controls do not mean better estimates. Bad controls are variables that could themselves be outcomes of the treatment; their inclusion does not respond to any economic intuition.
● measurement error: we observe yi and xi, which are the true variables plus some noise.
● confusing correlation with causation: omitted factors; spurious correlation, which refers to a connection between two variables that appears causal but is not (spurious relationships often have the appearance of one variable affecting another, and the spurious correlation is often caused by a third factor, not apparent at the time of examination, sometimes called a confounding factor); and reverse causation, which occurs when you believe that X causes Y but in reality Y causes X.
● heteroskedasticity or autocorrelation in the estimation of the standard errors: the most important assumptions in OLS are about the error term (an example of autocorrelation is a grouped error structure).
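Sketch of omitted variable bias on simulated data (illustrative names and data-generating process, not from the notes): the omitted variable w is correlated with x, so the short regression of y on x alone gives a biased coefficient.

# True model: y = 1 + 2*x + 3*w + u, with w correlated with x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
w = rng.normal(size=n)
x = 0.8 * w + rng.normal(size=n)                                  # x and the omitted w are correlated
y = 1 + 2 * x + 3 * w + rng.normal(size=n)

short = sm.OLS(y, sm.add_constant(x)).fit()                       # omits w
long = sm.OLS(y, sm.add_constant(np.column_stack([x, w]))).fit()  # controls for w
print(short.params[1])   # biased: roughly 2 + 3*Cov(x, w)/Var(x)
print(long.params[1])    # close to the true value 2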
INSTRUMENTAL VARIABLES (IV)
Exogeneity of an instrument cannot be tested; relevance can always be tested: given a sample of data, test the null that z and x are uncorrelated (and hope to reject the null).
Replacing the population covariances with the sample covariances gives us the so-called
instrumental variables estimator for the simple regression model:
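For the simple regression model, the standard formula (stated here for reference) is
β̂_IV = Σi (zi − z̄)(yi − ȳ) / Σi (zi − z̄)(xi − x̄),
i.e. the sample covariance of z and y divided by the sample covariance of z and x; it requires Cov(z, x) ≠ 0.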
If the instrument is only weakly correlated with x, even a small correlation between z and u can produce a larger asymptotic bias than OLS.
OLS and IV are different estimation methods that can be applied to the same model. They are
consistent under different assumptions.
Reduced form: the linear projection of the endogenous regressor on all the exogenous variables (instruments included), x = π0 + π1z + v.
The rank (relevance) condition has two parts: part (a) rules out perfect collinearity among the exogenous variables; part (b), which requires the instruments to be partially correlated with the endogenous regressor, is the practically important restriction.
Deriving 2SLS:
Two-step estimation: The first-stage regression is xi on zi to get the fitted values, x̂i. The second-
stage regression is yi on x̂i.
It is best to use a software package with a 2SLS command rather than explicitly carry out the two-step procedure. Carrying out the two-step procedure explicitly makes one susceptible to harmful mistakes (for example, the standard errors from the second-stage regression are incorrect).
Two-stage least squares (2SLS) regression analysis is an extension of the OLS method. It is used when the error term of the dependent variable is correlated with the explanatory variables. An instrumental variable is used to create a new variable that replaces the problematic one. In the ordinary least squares method, a basic assumption is that the error term is independent of the predictor variables; when this assumption is broken, 2SLS helps solve the problem. The analysis assumes that there is a secondary predictor that is correlated with the problematic predictor but not with the error term.
Statistical significance is relevant, but economic significance is extremely important in order to understand whether the variables are strongly related (z has to explain the variation of x well). Good instruments are usually generated by real or natural experiments.
Given the existence of the instrumental variable, the following two stages are used:
- In the first stage, a new variable is created using the instrumental variable (the relationship between the instrument and the instrumented variable).
- In the second stage, the model-estimated values from stage one are used in place of the actual values of the problematic predictor to compute an OLS model for the response of interest (the first stage is plugged into the causal relationship of interest).
This is called 2SLS because it can be done in two steps:
1) Obtain the first stage fitted values
2) Plug the first stage fitted values into the "second-stage equation".
With this method we can compare the OLS coefficient (from the causal relationship of interest) with the coefficient obtained using the instrumental variable (the second stage).
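A sketch of the two stages carried out by hand on simulated data (illustrative names, not from the notes). As stressed above, the standard errors from the explicit second-stage regression are not valid; this is only to show the mechanics and to compare the OLS and IV coefficients.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)                      # instrument
e = rng.normal(size=n)                      # unobserved confounder
x = 1.0 * z + e + rng.normal(size=n)        # endogenous regressor, correlated with e
y = 2.0 * x + 2.0 * e + rng.normal(size=n)  # true effect of x on y is 2

ols = sm.OLS(y, sm.add_constant(x)).fit()            # biased by the confounder
first = sm.OLS(x, sm.add_constant(z)).fit()          # first stage: x on z
x_hat = first.fittedvalues
second = sm.OLS(y, sm.add_constant(x_hat)).fit()     # second stage: y on fitted x
print(ols.params[1], second.params[1])               # biased coefficient vs. roughly 2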
The reduced-form equation can be derived by substituting the first-stage equation into the causal relation of interest. The reduced form is the regression of the dependent variable on any covariates in the model and on the instruments (it directly estimates the relation between the instrument and the outcome variable).
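Worked out for the simple one-instrument case (for reference): substituting the first stage x = π0 + π1z + v into y = β0 + β1x + u gives the reduced form y = α0 + α1z + error with α1 = β1π1, so the IV/2SLS estimate can be read as the reduced-form effect of z on y divided by the first-stage effect of z on x.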
Problems:
- The instrument is not truly exogenous.
- Even instruments that are randomly assigned can be invalid (if they affect the outcome directly, violating the exclusion restriction).
- Weak instruments (they increase the bias).
Consider the structural equation
y1 = z1δ1 + α1y2 + u1 (1)
Let y1 be the response variable, y2 the single endogenous explanatory variable (EEV), and z the 1 × L vector of exogenous variables, where z1 is a 1 × L1 strict subvector of z.
The control function approach uses extra regressors to break the correlation between the endogenous explanatory variable and the unobservables affecting the response; the method still relies on the availability of exogenous variables that do not appear in the structural equation.
Exogeneity assumption:
v2 is an explanatory variable in the equation. The new error, e1, is uncorrelated with y2 as well as
with v2 and z.
Two-step procedure:
(i) Regress yi2 on zi and obtain the reduced-form residuals, v̂i2;
(ii) Regress yi1 on zi1, yi2, and v̂i2 by OLS.
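A sketch of the control function two-step on simulated data (illustrative names, not from the notes): the first-stage residuals are added as an extra regressor, and their t statistic doubles as a test of endogeneity of y2.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=n)                     # exogenous instrument
v2 = rng.normal(size=n)
y2 = 1.0 * z + v2                          # endogenous explanatory variable
u1 = 0.5 * v2 + rng.normal(size=n)         # structural error, correlated with y2 through v2
y1 = 2.0 * y2 + u1                         # true coefficient on y2 is 2

first = sm.OLS(y2, sm.add_constant(z)).fit()
v2_hat = first.resid                                   # the control function
X = sm.add_constant(np.column_stack([y2, v2_hat]))
cf = sm.OLS(y1, X).fit()
print(cf.params[1])    # roughly 2 once v2_hat controls for the endogeneity
print(cf.tvalues[2])   # a significant coefficient on v2_hat signals that y2 is endogenous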
We really need to impose much more on the reduced form; it is no longer just defined as a linear
projection:
CF accounts for the endogeneity of y2 and y2^2 using a single control function, v̂2. CF is likely more efficient but definitely less robust.
Correlated random coefficient (CRC) model
Consistent estimation of APEs is more difficult if one or more explanatory variables are
endogenous.
The potential problem with applying instrumental variables is that the error term e1 = v1y2 + u1 is
not necessarily uncorrelated with the instruments z, even with our maintained assumptions.
The original intercept, η1, cannot be estimated. Is this bias really something to pay attention to? (endogeneity test)
Bias: an empirical fact (it prevents us from using OLS).
Endogeneity: something that can explain this bias.
If all elements of x are exogenous then 2SLS and OLS should differ only due to sampling error.
It makes no sense to make inference on β using, say, OLS robust to general heteroskedasticity and
then assume homoskedasticity when obtaining a Hausman test. The traditional Hausman test that
compares 2SLS and OLS does not have a limiting chi-square distribution when heteroskedasticity is
present. Yet it has no systematic power for detecting heteroskedasticity.
Compare the 2SLS estimator using all instruments to 2SLS using a subset that just identifies the equation. If all instruments are valid, the estimates should differ only as a result of sampling error.
A failure to reject should not make us too confident. A rejection indicates that one or both IVs fail
the exogeneity requirement; we do not know which one or whether it is both.
The usefulness of the Sargan-Hausman test is that, if we reject the null hypothesis, then our logic
for choosing the IVs must be re-examined. Unfortunately, the test does not tell us which IVs fail
the exogeneity requirement; it could be one of them or all of them.
If we fail to reject the null hypothesis, then we can have some confidence in the set of instruments
used up to a point. Even if we do not reject the null hypothesis, it is possible that more than one
instrument is endogenous, and that the 2SLS estimators using a full and reduced set of
instruments are asymptotically biased in similar ways.
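One standard way to compute an overidentification test of this kind (stated for reference; not necessarily the exact version used in the notes): estimate the equation by 2SLS with all the instruments, regress the 2SLS residuals on the full set of exogenous variables and instruments, and compare N·R2 from this auxiliary regression with a chi-square distribution whose degrees of freedom equal the number of overidentifying restrictions (number of instruments minus number of endogenous regressors).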
weaker condition:
A stronger assumption:
PANEL DATA
Panel data are a sample of individuals observed in multiple time periods.
Advantages of using panel data: more control over omitted variables, more observations, and many research questions involve time.
Fixed effects, d-in-d and panel data are strategies that use data with a time or cohort dimension to
control for unobserved but fixed omitted variables.
(Cross-sectional data: a sample of individuals observed in one time period.)
For each unit in the population, we have a (usually short) time series.
There is a single response variable, yt, that we observe in several time periods.
Having the same β for all t is not restrictive because xt can be very general.
It can (and usually should) include time-period dummies to allow a different intercept in each
period. It can include interactions between time-period dummies and other variables to allow
partial effects to change over time.
We can write the model in a way similar to a SUR system:
This suggests that, even though the applications are very different, a common statistical framework can be used for SUR and panel data.
In panel data, xt has the same dimension for all t; in SUR, the dimension of the covariates generally changes across equations.
The general model allows variables that change only across time (such as year dummies), only across units (such as gender), and across both units and time (typically, the most interesting variables). Some variables, such as gender, do not change over time for any unit in the population.
Contemporaneous exogeneity: E(ut | xt) = 0.
This rules out omitted variables, measurement error, and so on, but it does not restrict the correlation between xs and ut for s ≠ t.
Sequential exogeneity: E(ut | xt, xt−1, …, x1) = 0.
It can be applied to distributed lag models and models with lagged dependent variables.
Strict exogeneity: E(ut | x1, …, xT) = 0. Given whatever is in xt, the covariates from other time periods do not help explain yt (ut and xt+1 cannot be correlated).
Contrast with the SUR case, where the dimension of the covariates can be different across equations. Remember, though, that xit can be chosen quite flexibly.
Contemporaneous exogeneity is the weakest possible assumption without moving into instrumental variables territory.
The strongest assumption we could make:
is equivalent to:
SOLS.1 does not restrict the relationship between uig and the covariates in other equations.
In the panel data case:
which simply says that the single-equation OLS rank condition (OLS.2) holds for each equation.
Panel Data:
The SOLS estimator looks just like the single-equation OLS estimator, but Xi is a matrix and yi is a vector:
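For reference, in the standard notation the estimator is β̂ = (Σi Xi′Xi)⁻¹ (Σi Xi′yi), i.e. the same formula as single-equation OLS with the matrices Xi and vectors yi stacked over i.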
Testing
Obtain asymptotic standard errors to construct large-sample t statistics and confidence intervals.
EXTENSIONS
We can combine SUR and panel data models:
(G = T)
Assume, for now, that we know Ω. Transform the equation to remove correlations in errors and
make variances constant.
every element of Xi is uncorrelated with every element of ui, so any linear combination of Xi is
uncorrelated with ui.
A variety of GLS estimators, even with a misspecified variance matrix, will be consistent.
Assumption SGLS.2 (Rank Condition):
Ω is nonsingular and
is nonsingular.
We need the covariates in each equation to be uncorrelated with the errors in each equation.
The GLS estimator in some cases is OLS equation-by-equation.
FEASIBLE GLS
If the covariance matrix of the errors Ω is unknown, one can get a consistent estimate of Ω, say Ω̂, using an implementable version of GLS known as the feasible generalized least squares (FGLS) estimator.
In FGLS, modelling proceeds in two stages: (1) the model is estimated by OLS or another consistent (but inefficient) estimator, and the residuals are used to build a consistent estimator of the error covariance matrix (to do so, one often needs to impose additional structure; for example, if the errors follow a time-series process, some theoretical assumptions on this process are generally needed to ensure that a consistent estimator is available); and (2) using the consistent estimator of the covariance matrix of the errors, GLS is implemented.
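A sketch of the two FGLS stages in the simplest single-equation case with heteroskedastic errors (the exponential variance specification and all names are illustrative assumptions, not from the notes):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(1, 3, size=n)
u = rng.normal(scale=np.exp(0.5 * x))      # heteroskedastic errors
y = 1 + 2 * x + u
X = sm.add_constant(x)

# Stage 1: consistent but inefficient OLS; model the residual variance.
ols = sm.OLS(y, X).fit()
aux = sm.OLS(np.log(ols.resid ** 2), X).fit()
h = np.exp(aux.fittedvalues)               # estimated variance function

# Stage 2: GLS/WLS using the estimated weights.
fgls = sm.WLS(y, X, weights=1.0 / h).fit()
print(ols.params, fgls.params)             # both consistent; FGLS is more efficient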
The Estimator and Asymptotic Properties
Estimator of Ω.
This only makes sense when Ω has fixed dimension. (In the panel data case, T is fixed).
In SUR analysis, we almost always use:
β̂ and β∗ are “asymptotically equivalent”; more precisely, they are “√N-equivalent,” which is much stronger than saying that they are both consistent.
If N is “small,” the statistical properties of β̂ and β∗ could be very different.
This estimator is robust to “system heteroskedasticity.”
Assumption SGLS.3 (System Homoskedasticity):
Given the zero conditional mean assumption, the key conditions are:
This estimator can be needed under system homoskedasticity if incorrect restrictions are imposed
on the unconditional variance-covariance matrix.
SUR REVISITED
SUR system, written for a random draw i as:
We have considered two estimators of the βg: OLS equation-by-equation and GLS using Ω̂ as the estimated G × G variance matrix.
OLS VERSUS SUR FOR SYSTEMS
If the same regressors appear in each equation the OLS equation-by-equation is numerically the
same as FGLS for any structure of Ω.
If Ω̂ is diagonal, FGLS = OLS equation-by-equation for any choice of explanatory variables.
If Ω is diagonal, then FGLS and OLS EBE are asymptotically equivalent.
FGLS is (asymptotically) more efficient than OLS EBE only when at least some exclusion restrictions have been imposed and there is some correlation in the errors across equations. Therefore, there is a tradeoff between efficiency and robustness.
Stata has a feature that allows one to specify linear constraints on the parameters.
Cross-equation restrictions arise naturally in demand systems, cost share equations, and so on.
In share equations, where the dependent variable is a fraction, one can question whether linearity is reasonable. Almost certainly the system homoskedasticity assumption fails.
SYSTEMS WITH SINGULAR VARIANCE-COVARIANCE MATRICES
In expenditure and cost share systems, the G responses, if the categories are exhaustive and
mutually exclusive, sum to unity.
Restriction on the sum:
We can drop any one of the equations; make it the last one, and impose the restrictions on the parameters.
This two-equation system has a cross-equation restriction, too. But the singularity in the variance matrix is gone, so we can apply FGLS with:
Can add firm characteristics to the share equations without essential change.
xit can contain all kinds of explanatory variables, including time period dummies and variables that
do not change over time.
ASSUMPTIONS FOR POOLED OLS (POLS)
Pooled OLS is employed when you select a different sample for each year/month/period of the panel data. Fixed effects or random effects are employed when you observe the same sample of individuals/countries/states/cities/etc. over time.
What are fixed effects and random effects?
A fixed effects model is a statistical model in which the model parameters are fixed or non-random
quantities. This is in contrast to random effects models and mixed models in which all or some of
the model parameters are random variables.
Randomized evaluations
We would like to compare identical individuals; we cannot do that, but the aim is still the same. The main issue for a good estimate is self-selection, and randomization breaks this problem.
Randomization is not just the random assignment of the treatment to individuals so that the treatment does not depend on individual characteristics; the aim of randomization is also to allow us to compare groups that are identical along many other characteristics. With randomization we obtain two groups, treated and control, that apart from the treatment are identical, and in this way we avoid the self-selection problem.
In a randomized experiment, a sample of N individuals is selected from the population. This sample is then divided randomly into two groups: the Treatment group (Nt individuals) and the Control group (Nc individuals). Obviously Nt + Nc = N.
The Treatment group is then treated by policy X while the Control group is not. The outcome Y is then observed and compared for both the Treatment and Control groups. The effect of policy X is generally measured by the difference in the empirical means of Y between Treatments and Controls.
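In symbols (standard notation, added for clarity): the estimated effect is the difference in sample means, β̂ = ȲT − ȲC, which is also the OLS coefficient on a treatment dummy in a regression of Yi on a constant and the dummy.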
Manipulation is the key to resolve the problem of selection bias.
(No causation without manipulation: causes are only those things that could be, in principle,
treatments in experiments).
Problems:
- It is not possible to run all the experiments we would like to, because they might substantially affect the economic or social outcomes of the Treated.
Simple difference
As random experiments are very rare, economists have to rely on actual policy changes to identify
the effects of policies on outcomes. These are called “natural experiments" because we take
advantage of changes that were not made explicitly to measure the effects of policies.
The key issue when analyzing a natural experiment is to divide the data into a control and a treatment group. The most obvious way to do that is to use a simple difference method with data from before (t = 0) and after (t = 1) the change. The OLS estimate of β is the difference in means Y1 − Y0 before and after the change.
Difference-in-difference
A way to improve on the simple difference method is to compare outcomes before and after a
policy change for a group affected by the change (Treatment Group) to a group not affected by the
change (Control Group). Alternatively: instead of comparing before and after, it is possible to
compare a region where a policy is implemented to a region with no such policy.
The idea is to correct the simple difference before and after for the treatment group by
subtracting the simple difference for the control group.
The DD-estimate is an unbiased estimate of the effect of the policy change if, absent the policy
change, the average change in Y1-Y0 would have been the same for treatment and controls.
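In symbols (standard notation): with groups T (treated) and C (control) and periods 0 (before) and 1 (after),
DD = (ȲT,1 − ȲT,0) − (ȲC,1 − ȲC,0),
which equals the OLS coefficient δ on the interaction term in Yit = α + β·Treati + γ·Postt + δ·(Treati × Postt) + εit.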
Problems:
- A pre-condition of the validity of the DD assumption is that the program is not
implemented based on the pre-existing differences in outcomes.
- When average levels of the outcome Y are very different for controls and treatments
before the policy change, the magnitude or even sign of the DD effect is very sensitive to the
functional form posited.
Fixed effect
Fixed effects can be seen as a generalization of DD in the case of more than two periods (say S
periods) and more than 2 groups (say J groups). Controlling for variables that are constant across
entities but vary over time can be done by including time fixed effects.
Suppose that group j in year t experiences a given policy T (for example an income tax rate) of
intensity Tjt. We want to know the effect of T on an outcome Y.
OLS Regression: Yjt = α + βTjt + Ɛjt
Put time dummies and group dummies in the regression:
Yjt = α +γt +δj +βTjt + Ɛjt
Direct extension of DD where there are 2 groups that experience different changes in policy over 2
periods. The fixed effects strategy requires panel data.
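A sketch of this regression with group and time dummies on simulated data (column names and the data-generating process are illustrative assumptions, not from the notes); standard errors are clustered by group:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
groups, years = range(20), range(10)
df = pd.DataFrame([(j, t) for j in groups for t in years], columns=["group", "year"])
df["Tjt"] = rng.uniform(size=len(df)) + 0.1 * df["group"]        # policy intensity
df["Y"] = (0.5 * df["Tjt"] + 0.3 * df["group"] + 0.2 * df["year"]
           + rng.normal(size=len(df)))                           # true beta = 0.5

fe = smf.ols("Y ~ Tjt + C(group) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["group"]}         # cluster by group
)
print(fe.params["Tjt"])   # estimate of beta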
Assumption POLS.1 (Contemporaneous Exogeneity):
POLS.1 allows for lagged dependent variables as well as other non-strictly exogenous regressors.
Assumption POLS.2 (Rank Condition):
The asymptotic variance of β̂POLS is:
This expression simplifies if we appropriately restrict the conditional variances and covariances.
Assumption POLS.3 (Homoskedasticity and No Serial Correlation):
implies that the “usual” asymptotic variance matrix estimator of β̂POLS is valid.
Can use the usual t and F statistics as approximately valid for large N.
Without POLS.3, generally need fully robust variance matrix. That is, robust to arbitrary
heteroskedasticity and serial correlation.
This estimator in Stata is computed using a “cluster” option, where each unit i is a cluster of T time
series observations.
If we maintain the no serial correlation part of POLS.3, then a heteroskedasticity-robust form is
valid. In Stata, this estimator is obtained with a “robust” option, but its robustness is limited to
heteroskedasticity, not serial correlation.
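For reference, a sketch of the analogous choices in Python's statsmodels on simulated panel data (illustrative names, not from the notes): clustering by unit is the fully robust option, while "HC" covariances are robust to heteroskedasticity only.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
ids = np.repeat(np.arange(200), 5)                               # 200 units, T = 5
x = rng.normal(size=ids.size)
u = rng.normal(size=200)[ids] + rng.normal(size=ids.size)        # serially correlated errors
df = pd.DataFrame({"id": ids, "x": x, "y": 1 + 2 * x + u})

pols = smf.ols("y ~ x", data=df)
fully_robust = pols.fit(cov_type="cluster", cov_kwds={"groups": df["id"]})  # het. + serial corr.
het_robust = pols.fit(cov_type="HC1")                                       # heteroskedasticity only
print(fully_robust.bse["x"], het_robust.bse["x"])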
DYNAMIC COMPLETENESS AND TIME SERIES PERSISTENCE
One way to interpret the presence of serial correlation in the errors of panel data models is that
the model has misspecified dynamics. However, we may not want a model to satisfy the dynamic completeness (DC) assumption.
The presence of serial correlation is entirely different from strict exogeneity. Strict exogeneity always fails
in models with a lagged dependent variable.
With a large cross section and small T, the statistical properties of the estimators are invariant to the time
series properties of the series.
We can ignore estimation of β – in this case, estimation by POLS – in testing assumptions about the
unconditional variance-covariance matrix. Therefore, testing for serial correlation, or for constant variances
across time, is straightforward. Testing for AR(1) serial correlation:
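A simple implementable version (standard procedure, stated for reference): estimate the model by POLS, obtain the residuals ûit, and run the pooled regression ûit = ρ ûi,t−1 + error for t = 2, …, T and i = 1, …, N; then test H0: ρ = 0 using a robust standard error. A significantly nonzero ρ̂ indicates AR(1) serial correlation in the errors.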
Even with serial correlation, with large N we can do valid inference with POLS (using robust standard errors); so this is an efficiency issue.
If we use FGLS to account for serial correlation, strict exogeneity is key. If we make adjustments just for heteroskedasticity, contemporaneous exogeneity suffices, provided our estimated variance functions depend only on elements of xit.
while Prais-Winsten (and other methods that exploit serial correlation in estimation) effectively
requires:
When “robust” is used as an option, Stata labels the standard errors “semi-robust.” For linear models, there is no distinction between fully robust and semi-robust. But for certain kinds of nonlinear models, one distinguishes between standard errors that allow misspecification of the conditional mean (fully robust standard errors) and those that only allow misspecification of the conditional variance (dubbed “semi-robust”).
Consider a labor supply function and a wage offer (inverse labor demand) function:
The first shows how much each unit in the population would work at any given wage.
We assume that we observe equilibrium hours and wages for each individual:
In many cases, elements of zit might be correlated with the errors in other time periods.
The assumption is weaker, often in important ways, than the assumption that all elements of Zi
are uncorrelated with all elements in ui:
Looks like the key rank condition for 2SLS except that Zi and Xi are matrices.
This is the same as saying the rank condition holds for each equation g.
In the panel data case: if
Then:
If:
Then:
Estimation proceeds from the method of moments. The moment condition in Assumption SIV.1
can be written as:
or
If L = K:
Replace population averages with sample averages to get the system instrumental variables (SIV)
estimator:
The SIV estimator for the panel data system with IVs stacked with L = K is a pooled IV estimator:
When L > K, the system of sample moment conditions generally has no solution (L equations in K unknowns). Instead, β̂ is chosen to solve:
Positive definiteness is stronger than needed. Usually the law of large numbers, combined with consistency of a first-stage estimator, is used to establish SIV.3.
Where:
System 2SLS
The System 2SLS estimator uses weight matrix
The optimal weighting matrix is the inverse of the variance matrix of:
To obtain an actual GMM estimator using an efficient weighting matrix, use a two-step procedure:
1) Obtain an initial consistent estimator of β (for example, system 2SLS) and compute its residuals;
2) Use these residuals to estimate the optimal weighting matrix, and re-estimate β by GMM using this weighting matrix.
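In the usual notation (added for reference), the resulting estimator minimizes the quadratic form
[N⁻¹ Σi Zi′(yi − Xiβ)]′ Ŵ [N⁻¹ Σi Zi′(yi − Xiβ)],
with Ŵ = (N⁻¹ Σi Zi′ûiûi′Zi)⁻¹ computed from the first-step residuals ûi.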
We call such an estimator an optimal GMM estimator. It is sometimes called a minimum chi-
square estimator.
The optimal weighting matrix provides an asymptotically efficient estimator in the class of
estimators based on:
the estimator is asymptotically efficient for the given set of moment conditions (instruments).
It is possible we can find additional moment conditions that can enhance efficiency.
When L = K the weighting matrix is irrelevant: there is only one consistent estimator, the system IV estimator.
Then:
means that all the squares and cross products of the elements of ui are uncorrelated with the squares and cross products of the elements of Zi.
In the SUR case, can show SIV.5 is the same as:
Instruments:
Implies:
(b)
GIV.1 can impose unintended restrictions on the relationships between instruments and errors
across equations or time or both.
Assumption GIV.3 (System Homoskedasticity):
Like the GIV estimator, the consistency of the traditional 3SLS estimator does not follow from:
The equivalence of all estimates holds if we just impose the common instrument assumption in
the general system:
The GMM 3SLS estimator and the GIV estimator (using the same Ω̂) are identical.
In many modern applications of system IV methods, to both simultaneous equations and panel
data, instruments that are exogenous in one equation are not exogenous in all other equations. In
such cases it is important to use the GMM 3SLS estimator once Zi has been properly chosen.
The GIV estimator and the traditional 3SLS estimator generally induce correlation between the
transformed instruments and the structural errors.
Most applications of method of moments tend to focus on GMM methods based on the original
orthogonality conditions. Nevertheless, for unobserved effects models we will see that the GIV
approach can provide insights into the workings of certain panel data estimators.
TESTING
MORE ON EFFICIENCY
Adding Instruments to Enhance Efficiency
Adding more instruments – that are exogenous, of course – can never hurt asymptotic efficiency
provided an optimal weighting matrix is used. We can never do worse – using first-order
asymptotics – by adding more IVs and using optimal GMM.
Generally, the optimal GMM estimator is more efficient than 2SLS.
If heteroskedasticity is present, we can keep adding instruments that would otherwise be
redundant in order to improve efficiency.
Under homoskedasticity, we cannot improve efficiency over OLS by adding nonlinear functions of x to the instrument list. If heteroskedasticity is present, however, the GMM estimator that uses the extended instrument list and a heteroskedasticity-robust weighting matrix is generally more efficient, asymptotically, than OLS!
Finding the Optimal Instruments
The optimal instruments are:
GMM estimation based on this set of moment conditions will be more robust than estimators
based on a transformed set of moment conditions, such as GIV. If we decide to use GMM, we can
use the unrestricted weighting matrix.
Under Assumption SIV.5, which is a system homoskedasticity assumption, the GMM 3SLS
estimator is an asymptotically efficient GMM estimator.
When the same instruments can be used in every equation, GMM 3SLS, GIV, and traditional 3SLS
are identical.
When GMM and GIV are both consistent but are not asymptotically equivalent, they cannot
generally be ranked in terms of asymptotic efficiency.
One can never do worse by adding instruments and using the efficient weighting matrix in GMM.
This has implications for panel data applications. For example, if one has the option of choosing
the instruments as a block diagonal matrix or as a stacked matrix, it is better in large samples to
use the block diagonal form.
Under system homoskedasticity, 3SLS is generally more efficient (than system 2SLS), but there are situations where they coincide.
If Ω is diagonal, 2SLS and 3SLS are asymptotically equivalent.
There is a trade-off between robustness and efficiency.
SIMULTANEOUS EQUATION MODEL
Consider a system of two regressions:
This is a simultaneous equation model (SEM) since y1 and y2 are determined simultaneously. Both variables are determined within the model, so they are endogenous, and are denoted by the letter y.
In the balanced panel case we assume random sampling across i (the cross section dimension),
with fixed time periods T. The unbalanced case is trickier because we must know why we are
missing some time periods for some units. We consider this much later under missing data/sample
selection issues.
For a random draw i from the population, the basic model is
yit = xitβ + ci + uit, t = 1, …, T,
where ci is the unobserved (individual) effect and the uit are the idiosyncratic errors. The composite error at time t is vit = ci + uit.
Because of ci, the sequence {vit} is almost certainly serially correlated, and definitely is if uit is serially uncorrelated.
Useful to write a population version of the model in conditional expectation form:
The rank condition is violated if xt has elements that do not change over time. Assume each
element of xt has some time variation (that is, for at least some members in the population).
The orthogonality condition is:
Now the rank condition also excludes variables that change by the same amount for each unit.
Random effect models assist in controlling for unobserved heterogeneity when the heterogeneity
is constant over time and not correlated with independent variables. This constant can be
removed from longitudinal data through differencing, since taking a first difference will remove
any time invariant components of the model.
Two common assumptions can be made about the individual specific effect: the random effects
assumption and the fixed effects assumption. The random effects assumption is that the individual
unobserved heterogeneity is uncorrelated with the independent variables. The fixed effect
assumption is that the individual specific effect is correlated with the independent variables.
If the random effects assumption holds, the random effects estimator is more efficient than the
fixed effects model.
ASSUMPTIONS
We assume a balanced panel.
The basic unobserved effects model is:
β does not depend on time, but xit can include time-period dummies and interactions of variables with time-period dummies, so the model is quite flexible.
A general specification is:
gt is a vector of aggregate time effects (often time dummies), zi is a set of time-constant observed variables, and wit changes across i and t (for at least some units i and time periods t). wit can include interactions among time-constant and time-varying variables.
Assumptions about the Unobserved Effect
“Random effect” essentially means that ci is uncorrelated with the observed explanatory variables: Cov(xit, ci) = 0, t = 1, …, T.
The term “fixed effect” means that no restrictions are placed on the relationship between ci and the xit.
“Correlated random effects” is used to denote situations where we model the relationship between ci and the xit.
Exogeneity Assumptions on the Explanatory Variables
ASSUMPTION RE.2:
Under Assumptions RE.1 and RE.3, Ω has the RE structure and system homoskedasticity holds.
(a) is homoskedasticity and serial uncorrelatedness of uit conditional on (xi, ci)
(b) is homoskedasticity of ci.
Under RE.1, RE.2, and RE.3:
is a valid estimator.
Fixed Effects Estimation
In panel data where longitudinal observations exist for the same subject, fixed effects represent
the subject-specific means. In panel data analysis the term fixed effects estimator (also known as
the within estimator) is used to refer to an estimator for the coefficients in the regression model
including those fixed effects (one time-invariant intercept for each subject).
Unlike POLS and RE, fixed effects estimation removes ci to form an estimating equation.
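A sketch of the within (fixed effects) transformation on simulated data (illustrative names, not from the notes): demean y and x within each unit, then run pooled OLS on the demeaned data.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
N, T = 300, 6
ids = np.repeat(np.arange(N), T)
c = rng.normal(size=N)[ids]                       # unobserved effect c_i
x = 0.7 * c + rng.normal(size=ids.size)           # x correlated with c_i
y = 2.0 * x + c + rng.normal(size=ids.size)       # true beta = 2
df = pd.DataFrame({"id": ids, "x": x, "y": y})

within = df.groupby("id")[["y", "x"]].transform(lambda s: s - s.mean())
fe = sm.OLS(within["y"], within["x"]).fit()       # no constant: it is wiped out by demeaning
print(fe.params["x"])                             # roughly 2 (pooled OLS in levels would be biased)
# Note: standard errors from this shortcut need a degrees-of-freedom correction.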
ASSUMPTION FE.1:
ASSUMPTION FE.2:
The estimator is obtained by running a pooled ordinary least squares (OLS) regression of ∆yit on ∆xit.
The FD estimator is pooled OLS on the first differences. In practice, one might not difference the period dummies, unless interested in the year intercepts in the original levels.
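A sketch of the FD estimator on the same kind of simulated data (illustrative names, not from the notes): first-difference y and x within each unit, then run pooled OLS on the differences.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)
N, T = 300, 6
ids = np.repeat(np.arange(N), T)
c = rng.normal(size=N)[ids]                       # unobserved effect c_i
x = 0.7 * c + rng.normal(size=ids.size)
y = 2.0 * x + c + rng.normal(size=ids.size)       # true beta = 2
df = pd.DataFrame({"id": ids, "x": x, "y": y})

d = df.groupby("id")[["y", "x"]].diff().dropna()  # Delta y_it, Delta x_it (c_i drops out)
fd = sm.OLS(d["y"], d["x"]).fit()
print(fd.params["x"])                             # roughly 2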
FD also requires a kind of strict exogeneity. The weakest assumption is:
ASSUMPTION FD.1:
ASSUMPTION FD.2:
ASSUMPTION FD.3:
RE, FE, and FD are still the most popular approaches to estimating β with strictly exogenous explanatory variables. Alternatively, one can use GLS versions of FE and FD.
CHAMBERLAIN’S APPROACH TO UE MODELS
In linear panel analysis, it can be desirable to estimate the magnitude of the fixed effects, as they
provide measures of the unobserved components. Chamberlain's approach to unobserved effects
models is a way of estimating the linear unobserved effects, under Fixed Effect (rather than
random effects) assumptions.
In the standard model:
Chamberlain simply writes down a linear projection relating ci to the entire history of the xit.
Assume no aggregate time effects for notational simplicity (and no time-constant variables).
but now we think some elements of xit are correlated with uit.
Pooled 2SLS will be consistent if:
In principle, this can be applied to models with lagged dependent variables, although in a model
with only a lagged dependent variable, it would be hard to find a convincing instrument.
Generally, assuming the instruments are uncorrelated with ci is a strong assumption. If we are
willing to make it, we probably are willing to assume strict exogeneity conditional on ci. So, we can
use an RE approach. Assumptions parallel those for exogenous xit.
ASSUMPTION REIV.1:
For simplicity, assume that xit contains an overall intercept (and probably a separate intercept in each time period), so we can take
As usual, we could relax the assumptions to zero correlation without changing consistency.
This is just the usual rank condition for GIV estimation. The REIV estimator is just the GIV estimator
where Ω is assumed to have the RE form. Without further assumptions, fully robust inference is
warranted, as usual.
ASSUMPTION REIV.3:
We can test the null that a set of explanatory variables is exogenous. Write the model as:
With REIV, can have time-constant explanatory variables and time-constant instruments. With lots
of good controls, or an exogenous intervention in an initial time period, the analysis can be
convincing. But time-constant IVs in panel data are often unconvincing. A more robust analysis
uses fixed effects and instrumental variables (FEIV). This requires time-varying instruments.
ASSUMPTION FEIV.1:
the IVs can be arbitrarily correlated with ci as long as there is exogenous time variation in the
instruments.
ASSUMPTION FEIV.2:
ASSUMPTION FEIV.3:
for overidentification:
For endogeneity: