
Econometric Analysis of Cross Section and Panel Data

LINEAR MODEL (SINGLE-EQUATION LINEAR MODEL WITH CROSS-SECTIONAL DATA: OLS)
OLS (ordinary least squares) is a method for estimating the regression function ŷ: it estimates the relationship between x and y and how a change in x affects y. The goodness-of-fit measure R2 reports how much of the variation in y the model explains. A higher R2 (up to R2 = 1) means a better fit, but correlation does not correspond to causation, and correlation alone cannot inform policy.
Problems:
- The most frequent problem with OLS is omitted variable bias.
- Controlling for variables that are themselves caused by the variable of interest will also lead to biased coefficients.
- The same unit cannot be treated and not treated at the same time.

y = β0 + β1x1 + β2x2 + ... + βKxK + u

where y, x1, x2, x3, ..., xK are observable random scalars (that is, we can observe them in a random sample of the population), u is the unobservable random disturbance or error, and β0, β1, β2, ..., βK are the parameters (constants) we would like to estimate.

The error term u can consist of a variety of things, including omitted variables and measurement error. The parameters βj hopefully correspond to the parameters of interest.

(The intercept can be absorbed into x by setting x1 ≡ 1, so the model can be written compactly as y = xβ + u.)

Assumption OLS.1 (Zero Correlation / Orthogonality Conditions): The error has zero mean and is uncorrelated with each explanatory variable:

E(u) = 0,  Cov(xj, u) = 0,  j = 1, ..., K.

An explanatory variable xj is said to be endogenous in the equation if it is correlated with u.


Often violated by:

(i) omitted variables (we would like to control for one or more additional variables but,
usually because of data unavailability, we cannot include them in a regression model)

(ii) measurement error (of xk)


(iii) simultaneity (Simultaneity arises when at least one of the explanatory variables is
determined simultaneously along with y)

Zero conditional mean assumption:

(1) Zero correlation: E(u) = 0 and Cov(xj, u) = 0 for each j (Assumption OLS.1).
(2) Zero conditional mean: E(u | x1, ..., xK) = 0.

If (2) holds, then (1) holds, but not the converse.

The difference between (1) and (2) is substantive:
under (1), discussions of “functional form misspecification” are meaningless, while (2) means that all functions of the covariates affecting the population regression E(y|x) have been accounted for in our choices of x2, ..., xK.
In most cases we hope to have (2) when the explanatory variables are “exogenous” – typically, if a nonlinear function of a regressor is statistically and practically significant, we leave it in the model – but in reality, we should probably settle for (1).
The parameters in the linear projection provide the best (population) mean square error
approximation to the true regression function.

For a continuous variable xj, its partial effect is ∂E(y|x)/∂xj, which is in general a function of x.

The average partial effect (APE), averaging across the distribution of x, is

APEj = E_x[∂E(y|x)/∂xj].

It is a constant parameter. If x has a multivariate normal distribution, then the linear projection coefficient βj equals APEj (for all j).

Multivariate normality is very strong and usually unrealistic, but it suggests that linear regression more generally approximates quantities of interest: APEs.

Assumption OLS.2 (No Perfect Collinearity): In the population, there are no exact linear
relationships among the covariates. It fails if and only if at least one of the regressors can be
written as a linear function of the other regressors (in the population).

High correlation among regressors often cannot be avoided, but it is not a violation of the assumptions.
Sometimes high correlation among regressors (multicollinearity) is the researcher’s fault because the parameterization has not been carefully chosen.
We can write β as a function of population moments in observable variables:

β = [E(x′x)]⁻¹ E(x′y)

where E(x′x) is a K × K matrix of second moments (variances and covariances) in the population, and E(x′y) is essentially a K × 1 vector of population covariances.

Apply the logic of the method of moments given the random sample: replace population means with sample means:

β̂ = (N⁻¹ Σᵢ xᵢ′xᵢ)⁻¹ (N⁻¹ Σᵢ xᵢ′yᵢ)

OLS on a random sample is consistent for β: OLS using a random sample consistently estimates β.
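A minimal NumPy sketch of this method-of-moments view of OLS (the simulated data-generating process and the variable names are illustrative assumptions, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a random sample from y = b0 + b1*x1 + b2*x2 + u
N = 1_000
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
u = rng.normal(size=N)
y = 1.0 + 0.5 * x1 - 2.0 * x2 + u

# Stack regressors with the intercept absorbed as a column of ones
X = np.column_stack([np.ones(N), x1, x2])

# Replace E(x'x) and E(x'y) with their sample analogues
Sxx = X.T @ X / N          # K x K sample second-moment matrix
Sxy = X.T @ y / N          # K x 1 vector of sample cross moments

beta_hat = np.linalg.solve(Sxx, Sxy)
print(beta_hat)            # approximately [1.0, 0.5, -2.0]
```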

Assumption OLS.3 (Homoskedasticity): The error has constant variance, E(u² | x) = σ²; equivalently, u² is uncorrelated with each xj (and with their squares and cross products). Homoskedasticity has nothing to do with the consistency of β̂; it only matters for efficiency and for the validity of the usual variance formulas.

Homoskedasticity is often violated.



R-squared is perfectly valid as a goodness-of-fit measure under heteroskedasticity.

Under heteroskedasticity one can either keep OLS and use heteroskedasticity-robust standard errors, or use weighted least squares (WLS), which leads to a different estimator of β.
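A short illustration of both options using statsmodels (the simulated heteroskedastic design and the variable names are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
N = 1_000
x = rng.uniform(1, 5, size=N)
u = rng.normal(scale=x, size=N)       # error variance grows with x: heteroskedasticity
y = 2.0 + 1.0 * x + u
X = sm.add_constant(x)

# Option 1: OLS with heteroskedasticity-robust (HC1) standard errors
ols_robust = sm.OLS(y, X).fit(cov_type="HC1")

# Option 2: WLS, weighting each observation by the inverse of its error variance
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()

print(ols_robust.bse)   # robust standard errors for OLS
print(wls.params)       # a different (more efficient, if the weights are right) estimator
```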

PRACTICAL REGRESSION HINTS:


- Do not always attempt to maximize R-squared, adjusted R-squared, or some other goodness-of-fit measure: doing so might lead to including in x factors that should not be held fixed. It is possible to obtain a convincing estimate of a causal effect with a low R-squared. For example, under random assignment, a simple regression estimate consistently estimates the causal effect, even though the “treatment” may not explain much of the variation in y.
- Include covariates that help predict the outcome if they are uncorrelated (in the population) with the covariate(s) of interest. Adding such a covariate z will not cause collinearity, but it will generally reduce the error variance.
- Be careful in using models that are nonlinear in the explanatory variables, especially with interactions: coefficients on the level terms may become essentially meaningless.
- Centering can make coefficients more interpretable. Without centering, some variables (for example, a level and its interaction) can be highly collinear.

TYPE OF BIAS:
● Selection bias: cov(Ɛ, X) ≠ 0. Some individual characteristics may affect not only the outcome but also the probability of receiving the treatment; the treatment is not random but depends on individual characteristics. Those who get (choose) the treatment are very different from those who do not choose the treatment. There are several ways to eliminate selection bias: correct randomization, including fixed effects, or using control variables (that should be uncorrelated with D_i but correlated with Y_i).
● Omitted variable bias: I should control for a variable but I am not able to, for example because I cannot measure it.
● Bad controls: a version of selection bias. Including omitted variables is important as soon as they are relevant in economic terms, but if we introduce an endogenous variable, or one with a measurement-error problem, more controls do not mean better estimates. Bad controls are variables that could themselves be outcomes. They do not respond to any economic intuition.
● Measurement error: we observe yi and xi as the true variables plus some noise.
● Confusing correlation with causation (omitted factors; spurious correlation: a connection between two variables that appears causal but is not. Spurious relationships often have the appearance of one variable affecting another, and are often caused by a third factor that is not apparent at the time of examination, sometimes called a confounding factor; reverse causation: occurs when you believe that X causes Y, but in reality Y causes X).
● Heteroskedasticity or autocorrelation in the estimation of standard errors; the most important assumption in OLS is about the error term (an example of autocorrelation is a grouped error structure).

THE INSTRUMENTAL VARIABLES ESTIMATOR IN THE SIMPLE MODEL (CROSS-SECTIONAL DATA)
OLS regression Y = Xβ +Ɛ is biased when Ɛ is correlated with X. A way to get around this issue is to
use an instrument Z for X.
Two important conditions for a valid IV:
1. Cov(X, Z) ≠ 0 (relevance: Z is correlated with X, so a first stage exists).
2. Cov(Z, Ɛ) = 0 (exclusion restriction: Z is uncorrelated with any other determinants of the dependent variable, i.e., the instrument is exogenous; the fact that Z is uncorrelated with Ɛ cannot be tested).

where u is thought to be correlated with x


If x is “endogenous”, then ordinary least squares (OLS) will be inconsistent for β1.

An instrumental variable, z, for x has two properties: it is uncorrelated with u (exogeneity) and it is partially correlated with x (relevance).

Exogeneity cannot be tested; instead we can always test relevance: the null hypothesis that z and x are uncorrelated, given a sample of data (and we hope to reject the null).

Replacing the population covariances with the sample covariances gives us the so-called instrumental variables estimator for the simple regression model:

β̂1,IV = Ĉov(z, y) / Ĉov(z, x)

If Cov(z, x) is small, z is a “weak” instrument, and then even a small correlation between z and u can produce a larger asymptotic bias than OLS.
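A minimal NumPy sketch of this covariance-ratio form of the IV estimator (the simulated data-generating process is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000

z = rng.normal(size=N)                     # instrument: exogenous
v = rng.normal(size=N)
x = 0.8 * z + v                            # first stage: Cov(x, z) != 0
u = 0.6 * v + rng.normal(size=N)           # error correlated with x (endogeneity)
y = 1.0 + 2.0 * x + u

beta_ols = np.cov(x, y)[0, 1] / np.cov(x, y)[0, 0]   # biased upward here
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]    # consistent for 2.0

print(beta_ols, beta_iv)
```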

OLS and IV are different estimation methods that can be applied to the same model. They are
consistent under different assumptions.
Reduced form:

z1 helps to predict xK controlling for the other exogenous variables.


If more than one IV estimator can be formed (there are more instruments than endogenous explanatory variables), we say the model is potentially overidentified.
Two-stage least squares (2SLS) estimator is the most efficient IV estimator:

Part (a) rules out perfect collinearity among the exogenous variables. Part (b) is the practically
important restriction.
Deriving 2SLS:

Two-step estimation: The first-stage regression is xi on zi to get the fitted values, x̂i. The second-
stage regression is yi on x̂i.
It is best to use a software package with a 2SLS command rather than explicitly carry out the two-step procedure. Carrying out the two-step procedure explicitly makes one susceptible to harmful mistakes (the standard errors from the second-stage regression are incorrect…).
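A sketch of the two-step logic in NumPy, for intuition only; in practice one would use a packaged 2SLS routine (for example, Stata's ivregress 2sls or IV2SLS in the Python package linearmodels) so that standard errors are computed correctly. The extra instrument and the simulated design are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10_000

z1, z2 = rng.normal(size=(2, N))           # two instruments
v = rng.normal(size=N)
x = 0.5 * z1 + 0.5 * z2 + v                # endogenous regressor
y = 1.0 + 2.0 * x + 0.6 * v + rng.normal(size=N)

X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z1, z2])

# Step 1: first-stage fitted values (projection of X onto the column space of Z)
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# Step 2: regress y on the fitted values; the coefficients are the 2SLS estimates,
# but the second-stage OLS standard errors are NOT the correct 2SLS standard errors
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]
print(beta_2sls)                            # approximately [1.0, 2.0]
```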

Potential pitfalls with 2SLS:


(1) A “little” endogeneity of one or more instruments can lead to large inconsistency if the
instruments are weak, that is, only slightly partially correlated with the endogenous explanatory
variables (EEVs).
(2) The standard errors of 2SLS can be large. Suppose xK is the only EEV.

Two-stage least squares (2SLS) regression analysis is an extension of the OLS method. It is used when the error term of the dependent variable is correlated with the independent variables. An instrumental variable is used to create a new variable that replaces the problematic one. In the ordinary least squares method there is a basic assumption that the error term is independent of the predictor variables; when this assumption is broken, 2SLS helps to solve the problem. The analysis assumes that there is a secondary predictor that is correlated with the problematic predictor but not with the error term.
Statistical significance is relevant, but economic significance is extremely important for understanding whether the variables are strongly related (Z has to explain the variation in X well). Good instruments are usually generated by real or natural experiments.
Given the existence of the instrument variable, the following two methods are used:
- In the first stage, a new variable is created using the instrument variable (relationship
between instrument and variable instrumented)
- In the second stage, the model-estimated values from stage one are then used in place of
the actual values of the problematic predictors to compute an OLS model for the response of
interest (put the first stage in the causal relationship of interest).
This is called 2SLS because it can be done in two steps:
1) Obtain the first stage fitted values
2) Plug the first stage fitted values into the "second-stage equation".
With this method we compare the coefficient of OLS (causal relationship of interest) and the
coefficient using an instrumental variable (second stage).
The reduced form equation can be derived by substituting the first-stage equation into the causal relation of interest. The reduced form is the regression of the dependent variable on any covariates in the model and the instruments (it can be used to directly estimate the relation between the instrument and the outcome variable).
Problems:
- The instrument is not truly exogenous.
- Even instruments that are randomly assigned can be invalid (if they affect the outcome directly, the exclusion restriction fails).
- Weak instruments (they increase the bias).

CROSS-SECTIONAL DATA: CONTROL FUNCTIONS AND SPECIFICATION TESTING


In least squares estimation problems, sometimes one or more regressors specified in the model
are not observable. One way to circumvent this issue is to estimate or generate regressors from
observable data. This generated regressor method is also applicable to unobserved instrumental
variables. Under some regularity conditions, consistency and asymptotic normality of least squares
estimator is preserved, but asymptotic variance has a different form in general.
Control functions (also known as two-stage residual inclusion) are statistical methods to correct
for endogeneity problems by modelling the endogeneity in the error term. A particular reason why
they are popular is because they work for non-invertible models (such as discrete choice models)
and allow for heterogeneous effects, where effects at the individual level can differ from effects at
the aggregate.
Most models that are linear in parameters are estimated using two-stage least squares (2SLS): one
or more of the regressors have been estimated from a first-stage procedure.
An alternative, the control function (CF) approach, relies on the same kinds of identification
conditions.

(1)
Let y1 be the response variable, y2 the single endogenous explanatory variable (EEV), and z the 1 × L vector of exogenous variables, where z1 is a 1 × L1 strict subvector of z.
The control function approach uses extra regressors to break the correlation between the endogenous explanatory variables and the unobservables affecting the response; the method still relies on the availability of exogenous variables that do not appear in the structural equation.
Exogeneity assumption:

(not ensured in the sample)


Reduced form for y2: (linear projection)

(we incorporate a bias in our regression: ρ)

is the population regression coefficient.

v2 is an explanatory variable in the equation. The new error, e1, is uncorrelated with y2 as well as
with v2 and z.
Two-step procedure:
(i) Regress yi2 on zi and obtain the reduced-form residuals, v̂i2;
(ii) Regress yi1 on zi1, yi2, and v̂i2 (this is equation (2)); inference must take the sampling error in v̂i2 into account.

v̂i2 is called a generated regressor. The OLS estimates from (2) are the control function estimates.


The OLS estimates of δ1 and α1 from (2) can be shown to be identical to the 2SLS estimates
starting from (1).
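A NumPy sketch of the two-step control function estimator under these assumptions (the simulated design is an illustrative assumption; with a single EEV in this linear setup, the point estimates coincide with 2SLS):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000

z1, z2 = rng.normal(size=(2, N))            # exogenous variables; z2 is excluded from (1)
v2 = rng.normal(size=N)
y2 = 0.3 * z1 + 0.7 * z2 + v2               # reduced form for the EEV
u1 = 0.5 * v2 + rng.normal(size=N)          # endogeneity: u1 correlated with v2
y1 = 1.0 + 0.4 * z1 + 2.0 * y2 + u1         # structural equation (1)

# Step (i): reduced-form residuals
Z = np.column_stack([np.ones(N), z1, z2])
v2_hat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]

# Step (ii): include the generated regressor v2_hat as a control function
X = np.column_stack([np.ones(N), z1, y2, v2_hat])
coefs = np.linalg.lstsq(X, y1, rcond=None)[0]
print(coefs)      # coefficient on y2 approximately 2.0; on v2_hat approximately 0.5
```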
Extend the model so that the EEV is in quadratic form (include nonlinear IV functions in the
model):

We really need to impose much more on the reduced form; it is no longer just defined as a linear
projection:

We require independence of (u1, v2) and z, and a linearity restriction.



A CF approach is immediate: OLS of:

CF accounts for endogeneity of y2 and y22 using a single control function, v̂2. CF is likely more
efficient but definitely less robust.

CORRELATED RANDOM COEFFICIENT MODELS


Extrapolate individual characteristics

(CRC)

Still a linear model, but also a form of non-linearity:

a fixed term (like an intercept) and a random component (like an error).

is the object of interest: the average partial effect (APE).

Consistent estimation of APEs is more difficult if one or more explanatory variables are
endogenous.

(Imposition: add a bias to our model)

The potential problem with applying instrumental variables is that the error term e1 = v1y2 + u1 is
not necessarily uncorrelated with the instruments z, even with our maintained assumptions.

is sufficient for 2SLS to consistently estimate


the equation we estimate by usual 2SLS can be written as:

The original intercept, η1, cannot be estimated. Is this bias really something to pay attention to? (endogeneity test)
Bias: an empirical fact (it impedes us from using OLS).
Endogeneity: something that can explain this bias.

TESTING FOR ENDOGENEITY


If the null hypothesis is that all explanatory variables are exogenous, and we allow one or more to
be endogenous under the alternative, then we can base a test on the difference between the 2SLS
and OLS estimators, provided we have sufficient exogenous instruments to identify the
parameters by 2SLS.
In the general equation y=xβ + u with instruments z the Durbin-Wu-Hausman (DWH) test is based
on the difference

If all elements of x are exogenous then 2SLS and OLS should differ only due to sampling error.
It makes no sense to make inference on β using, say, OLS robust to general heteroskedasticity and
then assume homoskedasticity when obtaining a Hausman test. The traditional Hausman test that
compares 2SLS and OLS does not have a limiting chi-square distribution when heteroskedasticity is
present. Yet it has no systematic power for detecting heteroskedasticity.

TESTING OVERIDENTIFYING RESTRICTIONS


When we have more instruments than we need to identify an equation, we can test whether the
additional instruments are valid in the sense that they are uncorrelated with u.
If we have more instruments than we need we can, in a (weak) sense, test whether some of them
are exogenous. The test will have weak power if the two IV estimators are biased in a similar way.

Compare the 2SLS estimator using all instruments to 2SLS using a subset that just identifies
equation. If all instruments are valid, the estimates should differ only as a result of sampling error.

A failure to reject should not make us too confident. A rejection indicates that one or both IVs fail
the exogeneity requirement; we do not know which one or whether it is both.
The usefulness of the Sargan-Hausman test is that, if we reject the null hypothesis, then our logic
for choosing the IVs must be re-examined. Unfortunately, the test does not tell us which IVs fail
the exogeneity requirement; it could be one of them or all of them.
If we fail to reject the null hypothesis, then we can have some confidence in the set of instruments
used up to a point. Even if we do not reject the null hypothesis, it is possible that more than one
instrument is endogenous, and that the 2SLS estimators using a full and reduced set of
instruments are asymptotically biased in similar ways.
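A sketch of a regression-based overidentification statistic under homoskedasticity (the N·R² form of the Sargan test), reusing the two-instrument 2SLS example from above; this is an illustrative implementation and not the only valid form of the test:

```python
import numpy as np
from scipy import stats

def sargan_test(y, X, Z):
    """N * R^2 from regressing the 2SLS residuals on all instruments.
    Under the null that all instruments are valid, approximately chi-square
    with (number of instruments - number of regressors) degrees of freedom."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    u_hat = y - X @ beta                       # residuals use the ORIGINAL X
    fitted = Z @ np.linalg.lstsq(Z, u_hat, rcond=None)[0]
    r2 = 1.0 - np.sum((u_hat - fitted) ** 2) / np.sum((u_hat - u_hat.mean()) ** 2)
    n, df = len(y), Z.shape[1] - X.shape[1]
    stat = n * r2
    return stat, 1.0 - stats.chi2.cdf(stat, df)

# Example (with y, X, Z built as in the 2SLS sketch): print(sargan_test(y, X, Z))
```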

SYSTEMS OF EQUATIONS: SYSTEM OLS


We assume random sampling of units from a well-defined population. We may sample several
different response variables along with explanatory variables (SUR: Seemingly Unrelated
Regressions), or sample different time periods on the same response and explanatory variables
(panel data).
In the panel data case, we assume a small number of time periods; to apply standard limit
theorems (law of large numbers, central limit theorem), we can use results for independent,
identically distributed observations.

SUR (SEEMINGLY UNRELATED REGRESSIONS):

The explanatory variables, xg, can be different across equations.


Exogeneity of the explanatory variables:

weaker condition:

A stronger assumption:

If G=3 then there are 3 response variables y.


In some applications, especially to consumer and firm theory, the coefficients are restricted across
equations.

PANEL DATA
Panel data is a sample of individuals observed in multiple time periods.
Advantages of using panel data: more control over omitted variables, more observations, and many research questions involve time.
Fixed effects, difference-in-differences, and panel data are strategies that use data with a time or cohort dimension to control for unobserved but fixed omitted variables.
(Cross-sectional data: a sample of individuals observed in one time period.)
For each unit in the population, we have a (usually short) time series.

A linear panel data model is:

there is a single response variable, yt, that we observe in several time periods.
Having the same β for all t is not restrictive because xt can be very general.

It can (and usually should) include time-period dummies to allow a different intercept in each
period. It can include interactions between time-period dummies and other variables to allow
partial effects to change over time.
We can write in a way similar to a SUR system:

Suggests that, even though the applications are very different, a common statistical framework
can be used for SUR and panel data.
Xt has the same dimension for all t. In SUR, the dimension of the covariates generally changes across equations.

The general model allows variables that change only across time (such as year dummies), only
across unit (such as gender), and across unit and time (typically, the most interesting variables).
Some variables, such as gender, do not change over time for any units in the population.
Contemporaneous exogeneity:

Rules out omitted variables, measurement error, and so on, but it does not restrict correlation
between xs and ut for s ≠ t.
Sequential exogeneity:

can be applied to distributed lag models and models with lagged dependent variables.

A stronger assumption, both technically and practically, is strict exogeneity:

Implies that xs and ut are uncorrelated for all s and t.

Given whatever is in xt, the covariates from other time periods do not help explain yt.
(ut and xt+1 cannot be correlated)

SYSTEM OLS ESTIMATION


Consistency

First estimation method uses OLS on the system of equations.

We want to estimate the population parameter vector β.


In the SUR case:

In the panel data case, G = T (number of time periods):

Contrast the SUR case, where the dimension of the covariates can be different across equations.
Remember, though, that xit can be chosen quite flexibly.

Assumptions for System OLS (SOLS):


Assumption SOLS 1:

This is the weakest possible assumption without moving into instrumental variables territory.
The strongest assumption we could make:

In the SUR case:

is equivalent to:

SOLS.1 does not restrict relationship between uig and covariates in other equations.
In the panel data case:

SOLS.1 for panel data is contemporaneous exogeneity.


Assumption SOLS 2:

In the SUR case:

SOLS 2 holds if and only if:

which simply says that the single-equation OLS rank condition (OLS.2) holds for each equation.
Panel Data:

Holds if and only if:



The SOLS estimator looks just like the single-equation OLS estimator, but Xi is a matrix and yi is a vector:

SOLS estimator in the SUR case:

system OLS is ordinary least squares equation-by-equation.


For panel data:

which we call the pooled OLS estimator.


System OLS is not unbiased under SOLS.1 and SOLS.2. It is unbiased if we use the stronger assumption E(ui | Xi) = 0, but this is very strong.


Asymptotic Normality and Inference
A fully robust estimator of the asymptotic variance of β̂ – that is, an estimator valid under SOLS.1 and SOLS.2, without any second moment assumptions on ui – is:

Testing
Obtain asymptotic standard errors to construct large-sample t statistics and confidence intervals.

EXTENSIONS
We can combine SUR and panel data models:

GENERALIZED LEAST SQUARES


Generalized least squares (GLS) is a technique for estimating the unknown parameters in a linear
regression model when there is a certain degree of correlation between the residuals in a
regression model. In these cases, ordinary least squares and weighted least squares can be
statistically inefficient, or even give misleading inferences.

ASYMPTOTIC PROPERTIES OF GLS


By “generalized least squares,” we mean exploiting different unconditional variances across equations (across time, in the panel data case) and nonzero unconditional covariances across equations.
We do not exploit situations where the variance-covariance matrix is a function of Xi.
The equation in system form:

(G = T)

The G x G unconditional variance-covariance matrix plays a key role:

Assume, for now, that we know Ω. Transform the equation to remove correlations in errors and
make variances constant.

Apply System OLS to:

The GLS estimator is

β* = (Σᵢ Xi′Ω⁻¹Xi)⁻¹ (Σᵢ Xi′Ω⁻¹yi)

Consistency of β* holds if E(Xi′Ω⁻¹ui) = 0.

GLS transforms the orthogonality conditions; it may not be consistent when SOLS (which only requires E(Xi′ui) = 0) is.
Assumption SGLS.1 (Exogeneity):

every element of Xi is uncorrelated with every element of ui, so any linear combination of Xi is
uncorrelated with ui.
A variety of GLS estimators, even with a misspecified variance matrix, will be consistent.
Assumption SGLS.2 (Rank Condition):
Ω is nonsingular and

is nonsingular.

Under SGLS.1 and SGLS.2, β∗ is consistent for β as N → ∞.

If only the weaker SOLS condition E(Xi′ui) = 0 holds, GLS is generally inconsistent.

We need the covariates in each equation to be uncorrelated with the errors in each equation.
The GLS estimator in some cases is OLS equation-by-equation.

FEASIBLE GLS
If the covariance matrix of the errors Ω is unknown, one can get a consistent estimate of Ω, say Ω̂, using an implementable version of GLS known as the feasible generalized least squares (FGLS) estimator.
In FGLS, modelling proceeds in two stages: (1) the model is estimated by OLS or another consistent
(but inefficient) estimator, and the residuals are used to build a consistent estimator of the errors
covariance matrix (to do so, one often needs to examine the model adding additional constraints,
for example if the errors follow a time series process, a statistician generally needs some
theoretical assumptions on this process to ensure that a consistent estimator is available); and (2)
using the consistent estimator of the covariance matrix of the errors, one can implement GLS
ideas.
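A compact NumPy sketch of this two-stage FGLS logic for a system of T equations per unit (a SUR-style or panel layout with fixed T; the data shapes and the simulated covariance structure are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, K = 500, 3, 2
beta = np.array([1.0, -0.5])

X = rng.normal(size=(N, T, K))
Omega_true = np.array([[1.0, 0.5, 0.3],
                       [0.5, 1.0, 0.5],
                       [0.3, 0.5, 1.0]])
U = rng.multivariate_normal(np.zeros(T), Omega_true, size=N)
Y = np.einsum("ntk,k->nt", X, beta) + U

# Stage 1: pooled OLS, then estimate Omega from the T x 1 residual vectors
Xs, ys = X.reshape(N * T, K), Y.reshape(N * T)
b_pols = np.linalg.lstsq(Xs, ys, rcond=None)[0]
R = Y - np.einsum("ntk,k->nt", X, b_pols)          # N x T residuals
Omega_hat = R.T @ R / N

# Stage 2: FGLS using the estimated Omega
Oinv = np.linalg.inv(Omega_hat)
A = np.einsum("ntk,ts,nsl->kl", X, Oinv, X)        # sum_i Xi' Omega^-1 Xi
c = np.einsum("ntk,ts,ns->k", X, Oinv, Y)          # sum_i Xi' Omega^-1 yi
b_fgls = np.linalg.solve(A, c)
print(b_pols, b_fgls)
```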
The Estimator and Asymptotic Properties

Estimator of Ω.
This only makes sense when Ω has fixed dimension. (In the panel data case, T is fixed).
In SUR analysis, we almost always use:

The same Ω̂ can be used for panel data.


The FGLS estimator is:

β̂ and β∗ are “asymptotically equivalent”; they are “√N-equivalent,” which is much stronger than saying that they are both consistent.
If N is “small,” the statistical properties of β̂ and β∗ could be very different.
This estimator is robust to “system heteroskedasticity.”
Assumption SGLS.3 (System Homoskedasticity):

Sufficient for SGLS.3 is:

Given the zero conditional mean assumption, the key conditions are:

By the random sampling assumption, the unconditional variance-covariance matrices


must be identical across i, and equal to Ω. The question is whether conditional variances and
covariances conditional on Xi are constant. Particularly in panel data applications without strict
exogeneity it will not make sense to condition on all of Xi.

FGLS WITH INCORRECT RESTRICTIONS ON THE VARIANCE MATRIX


Rather than estimate Ω in an unrestricted fashion (so that plim Ω̂ = Ω), we can impose restrictions on the estimated matrix. This is very common for panel data.
Let Λ̂ denote an estimator that may be inconsistent for Ω. Nevertheless, Λ̂ usually has a well-defined, nonsingular probability limit.
The variance matrix estimator is consistent if:

This estimator can be needed under system homoskedasticity if incorrect restrictions are imposed
on the unconditional variance-covariance matrix.

TESTING USING FGLS

A generally available statistic is the Wald statistic:

SUR REVISITED
SUR system, written for a random draw i as:

We have considered two estimators of the βg: OLS equation-by-equation and GLS using Ω̂ as the estimated G × G variance matrix.
OLS VERSUS SUR FOR SYSTEMS
If the same regressors appear in each equation, OLS equation-by-equation is numerically the same as FGLS for any structure of Ω̂.
If Ω̂ is diagonal, FGLS = OLS equation-by-equation for any choice of explanatory variables.
If Ω is diagonal, then FGLS and OLS equation-by-equation (EBE) are asymptotically equivalent.

FGLS is (asymptotically) more efficient than OLS EBE only when at least some exclusion restrictions have been made and there is some correlation in the errors across equations. Therefore, there is a tradeoff between efficiency and robustness.

is sufficient for OLS on that equation to be consistent.

SUR generally requires for all g and h.


FGLS gains efficiency over OLS (under system homoskedasticity) only when it is valid to use the full set of orthogonality conditions and some variables omitted from an equation, say g, are assumed to be uncorrelated with at least one explanatory variable omitted from that equation.
IMPOSING CROSS-EQUATION RESTRICTIONS
Two-equation system in the population is:

Let the vector of all parameters be the 8 × 1 vector:

Then we can define the matrix of regressors as:



Stata has a feature that allows one to specify linear constraints on the parameters.
Cross-equation restrictions arise naturally in demand systems, cost share equations, and so on.

In share equations, where the dependent variable is a fraction, one can question whether linearity seems reasonable. Almost certainly the system homoskedasticity assumption fails.
SYSTEMS WITH SINGULAR VARIANCE-COVARIANCE MATRICES
In expenditure and cost share systems, the G responses, if the categories are exhaustive and
mutually exclusive, sum to unity.
Restriction on the sum:

Can drop any of the equations. Make it the last one, and impose the restrictions on the
parameters.

This two-equation system has a cross equation restriction, too. But the singularity in the variance
matrix is gone, so can apply FGLS with:

Can add firm characteristics to the share equations without essential change.

PANEL DATA REVISITED


Write for a random draw i as:

xit can contain all kinds of explanatory variables, including time period dummies and variables that
do not change over time.
ASSUMPTIONS FOR POOLED OLS (POLS)
Pooled OLS is employed when you select a different sample for each year/month/period of the panel data. Fixed effects or random effects are employed when you observe the same sample of individuals/countries/states/cities/etc. over time.
What are fixed effects and random effects?
A fixed effects model is a statistical model in which the model parameters are fixed or non-random
quantities. This is in contrast to random effects models and mixed models in which all or some of
the model parameters are random variables.

Randomized evaluations
We would like to compare identical individuals; we cannot do that, but the aim is still the same.
The issue for a good estimate is self-selection, and randomization breaks this problem.
Randomization is not just the random assignment of the treatment to individuals, so that the treatment does not depend on individual characteristics; the aim of randomization is also to allow us to compare groups that are identical along many other characteristics. With randomization we obtain two groups, treated and control, that are identical apart from the treatment; in this way we avoid the self-selection problem.
In a randomized experiment, a sample of N individuals is selected from the population. This sample is then divided randomly into two groups: the Treatment group (Nt individuals) and the Control group (Nc individuals). Obviously Nt + Nc = N.
The Treatment group is then treated by policy X while the control group is not. Then the outcome Y is observed and compared for both Treatment and Control groups. The effect of policy X is generally measured by the difference in empirical means of Y between Treatments and Controls.
Manipulation is the key to resolve the problem of selection bias.
(No causation without manipulation: causes are only those things that could be, in principle,
treatments in experiments).
Problems:
- It is not possible to run all the experiments we would like to because they might affect
substantially the economic or social outcomes of the Treated.
Simple difference
As random experiments are very rare, economists have to rely on actual policy changes to identify
the effects of policies on outcomes. These are called “natural experiments" because we take
advantage of changes that were not made explicitly to measure the effects of policies.

The key issue when analyzing a natural experiment is to divide the data into a control and
treatment group. The most obvious way to do that is to do a simple difference method using data
before (t = 0) and after the change (t = 1). The OLS estimate of β is the difference in means Y1-Y0
before and after the change.
Difference-in-difference
A way to improve on the simple difference method is to compare outcomes before and after a
policy change for a group affected by the change (Treatment Group) to a group not affected by the
change (Control Group). Alternatively: instead of comparing before and after, it is possible to
compare a region where a policy is implemented to a region with no such policy.
The idea is to correct the simple difference before and after for the treatment group by
subtracting the simple difference for the control group.
The DD-estimate is an unbiased estimate of the effect of the policy change if, absent the policy
change, the average change in Y1-Y0 would have been the same for treatment and controls.
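In regression form, the DD estimate is the coefficient on the interaction between the treatment-group dummy and the post-period dummy. A small pandas/statsmodels sketch (the data frame, column names, and the simulated effect of 1.5 are illustrative assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 2_000
treated = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
# group effect, period effect, and a true policy effect of 1.5
y = 0.5 * treated + 1.0 * post + 1.5 * treated * post + rng.normal(size=n)
df = pd.DataFrame({"y": y, "treated": treated, "post": post})

# DD as the coefficient on the interaction term
dd = smf.ols("y ~ treated + post + treated:post", data=df).fit()
print(dd.params["treated:post"])

# Equivalent "difference of differences in means"
m = df.groupby(["treated", "post"])["y"].mean()
print((m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0]))
```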

Problems:
- A pre-condition of the validity of the DD assumption is that the program is not
implemented based on the pre-existing differences in outcomes.
- When average levels of the outcome Y are very different for controls and treatments
before the policy change, the magnitude or even sign of the DD effect is very sensitive to the
functional form posited.
Fixed effect
Fixed effects can be seen as a generalization of DD in the case of more than two periods (say S
periods) and more than 2 groups (say J groups). Controlling for variables that are constant across
entities but vary over time can be done by including time fixed effects.
Suppose that group j in year t experiences a given policy T (for example an income tax rate) of
intensity Tjt. We want to know the effect of T on an outcome Y.
OLS Regression: Yjt = α + βTjt + Ɛjt
Put time dummies and group dummies in the regression:
Yjt = α +γt +δj +βTjt + Ɛjt
Direct extension of DD where there are 2 groups that experience different changes in policy over 2
periods. The fixed effects strategy requires panel data.
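A sketch of this fixed-effects regression with group and time dummies using the statsmodels formula interface (the simulated data frame and its column names are illustrative assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
J, S = 20, 5                                   # groups and periods
g, t = np.meshgrid(np.arange(J), np.arange(S), indexing="ij")
g, t = g.ravel(), t.ravel()
group_fx, time_fx = rng.normal(size=J), rng.normal(size=S)
T = rng.normal(size=g.size) + 0.5 * group_fx[g]   # policy intensity, correlated with group effects
y = 2.0 * T + group_fx[g] + time_fx[t] + rng.normal(size=g.size)
df = pd.DataFrame({"y": y, "T": T, "group": g, "year": t})

# Y_jt = alpha + gamma_t + delta_j + beta * T_jt + eps_jt
fe = smf.ols("y ~ T + C(year) + C(group)", data=df).fit()
print(fe.params["T"])                          # approximately 2.0
```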
Assumption POLS.1 (Contemporaneous Exogeneity):

POLS.1 allows for lagged dependent variables as well as other non-strictly exogenous regressors.
Assumption POLS.2 (Rank Condition):

Under POLS.1 and POLS.2 the asymptotic variance of:

is:

This expression simplifies if we appropriately restrict the conditional variances and covariances.
Assumption POLS.3 (Homoskedasticity and No Serial Correlation):

implies that the “usual” asymptotic variance matrix estimator of β̂_POLS is valid.
Can use the usual t and F statistics as approximately valid for large N.
Without POLS.3, generally need fully robust variance matrix. That is, robust to arbitrary
heteroskedasticity and serial correlation.
This estimator in Stata is computed using a “cluster” option, where each unit i is a cluster of T time
series observations.
If we maintain the no serial correlation part of POLS.3, then a heteroskedasticity-robust form is
valid. In Stata, this estimator is obtained with a “robust” option, but its robustness is limited to
heteroskedasticity, not serial correlation.
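The same fully robust (clustered) inference is available in statsmodels; a short sketch reusing the simulated panel df and column names from the fixed-effects example above (those names are assumptions of that example):

```python
import statsmodels.formula.api as smf

# Pooled OLS with standard errors clustered by cross-sectional unit:
# robust to arbitrary heteroskedasticity and within-unit serial correlation.
pols = smf.ols("y ~ T + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["group"]}
)
print(pols.bse)

# Heteroskedasticity-robust only (no protection against serial correlation):
pols_hc = smf.ols("y ~ T + C(year)", data=df).fit(cov_type="HC1")
```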
DYNAMIC COMPLETENESS AND TIME SERIES PERSISTENCE

When it holds, we say the model is dynamically complete.


A weaker sufficient condition for POLS.3 is:

is not enough with random regressors.

Dynamic completeness (DC) is a very strong assumption in static models.

One way to interpret the presence of serial correlation in the errors of panel data models is that
the model has misspecified dynamics. However, we may not want a model to satisfy the DC
assumption.
The presence of serial correlation is entirely different from strict exogeneity. Strict exogeneity always fails
in models with a lagged dependent variable.

With a large cross section and small T, the statistical properties of the estimators are invariant to the time
series properties of the series.

TESTING FOR SERIAL CORRELATION AND HETEROSKEDASTICITY

We can ignore estimation of β – in this case, estimation by POLS – in testing assumptions about the
unconditional variance-covariance matrix. Therefore, testing for serial correlation, or for constant variances
across time, is straightforward. Testing for AR(1) serial correlation:

FGLS WITH STRICTLY EXOGENOUS REGRESSORS


If we detect serial correlation or heteroskedasticity, it is tempting to use a FGLS method to try to
improve over POLS in the model.

with large N we can do valid inference with POLS. So, this is an efficiency issue.
If we use FGLS to account for serial correlation, strict exogeneity is key. If we make adjustments just for heteroskedasticity, contemporaneous exogeneity suffices, provided our estimated variance functions depend only on elements of xit.

Under strict exogeneity, we might use a simple AR(1) correction.


The FGLS estimator might be more efficient than POLS even if the AR(1) model is not quite right. Maybe it accounts for “enough” of the serial correlation. But we should make our inference robust.
Generalized Estimating Equations (GEE)
Generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear
model with a possible unknown correlation between outcomes.
GEE is essentially FGLS (and certainly asymptotically equivalent to it) recognizing that our chosen
variance matrix – such as the homoskedasticity AR(1) – might be incorrect. It also allows for
unrestricted system heteroskedasticity in conducting inference.
Tradeoff between efficiency and consistency: The POLS estimator only requires:

while Prais-Winsten (and other methods that exploit serial correlation in estimation) effectively
requires:

When “robust” is used as an option, Stata labels the standard errors “semi-robust.” For linear
models, there is no distinction between fully robust and semi-robust. But for certain kinds of
nonlinear models, one distinguishes between standard errors that allow misspecification of the
conditional mean – given fully robust standard errors – and those that only allow misspecification
of the conditional variance – which are dubbed “semi-robust.”

SYSTEMS OF EQUATIONS: INSTRUMENTAL VARIABLES


A system that looks like a SUR setup, but where some explanatory variables are endogenous in
their own equation (at least), and the panel data case, where some explanatory variables are
contemporaneously endogenous. We assume random sampling of units from a well-defined
population. The analysis is like estimation by SOLS and GLS, but we must distinguish explanatory
variables from instruments.

THE SUR CASE WITH ENDOGENOUS EXPLANATORY VARIABLE

The explanatory variables can be different across equations.



Now we want to allow some explanatory variables to be endogenous in at least some equations g.

Consider a labor supply function and a wage offer (inverse labor demand) function:

The first shows how much each unit in the population would work at any given wage.

We assume that we observe equilibrium hours and wages for each individual:

we assume that the elements of z1 and z2 are exogenous to both equations.


The labor supply-wage offer example is a simultaneous equations model (SEM).

we now need IVs for one or both equations.


In many cases, the same instruments can be used for each equation, but in other cases the
instruments will differ across equations.

PANEL DATA MODELS WITH ENDOGENOUS EXPLANATORY VARIABLES

In many cases, elements of zit might be correlated with the errors in other time periods.

for each equation g, the moment conditions are:

The G × L matrix of instruments is:



THE SYSTEM IV ESTIMATOR


The system in the OLS/GLS case:

Assumption SIV.1 (Moment Conditions): For a G × L matrix Zi:

The assumption is weaker, often in important ways, than the assumption that all elements of Zi
are uncorrelated with all elements in ui:

Assumption SIV.2 (Rank Condition):

Looks like the key rank condition for 2SLS except that Zi and Xi are matrices.

A necessary condition for SIV.2 to hold is the order condition, L ≥ K.

In the SUR case:

SIV.2 holds if and only if:

This is the same as saying the rank condition holds for each equation g.
In the panel data case: if

Then:

If:

Then:

Estimation proceeds from the method of moments. The moment condition in Assumption SIV.1
can be written as:

or

If the rank condition is violated β is not identified.

If L = K:

Replace population averages with sample averages to get the system instrumental variables (SIV)
estimator:

Consistency is immediate by the usual WLLN argument.


Alternative:

(This is more convenient for studying the asymptotic distribution).


The SIV estimator for the SUR system where we have Lg = Kg for all g is IV equation-by-equation.

The SIV estimator for the panel data system with IVs stacked with L = K is a pooled IV estimator:

GENERALIZED METHOD OF MOMENTS ESTIMATION


The generalized method of moments (GMM) is a generic method for estimating parameters in
statistical models. Usually it is applied in the context of semiparametric models, where the
parameter of interest is finite-dimensional, whereas the full shape of the data's distribution
function may not be known, and therefore maximum likelihood estimation is not applicable.
Generalized method of moments (GMM) refers to a class of estimators constructed from the
sample moment counterparts of population moment conditions (sometimes known as
orthogonality conditions) of the data generating model. GMM estimators have become widely
used, for the following reasons:
1. GMM estimators have large sample properties that are easy to characterize. A family of such estimators can be studied simultaneously in ways that make asymptotic efficiency comparisons easy. The method also provides a natural way to construct tests which take account of both sampling and estimation error.
2. In practice, researchers find it useful that GMM estimators may be constructed without specifying the full data generating process (which would be required to write down the maximum likelihood estimator). This characteristic has been exploited in analysing partially specified economic models, studying potentially misspecified dynamic models designed to match target moments, and constructing stochastic discount factor models that link asset pricing to sources of macroeconomic risk.
General treatment with (potential) overidentification, that is, L > K.
A General Weighting Matrix
Though we assume the population moment conditions uniquely determine β, the sample analog generally has no solution when L > K (L equations in K unknowns).
Instead, β̂ solves a minimization problem: it makes the (weighted) Euclidean length of the vector of sample moments as small as possible.


This estimator is consistent and is sometimes used as an initial estimator, but it is essentially never
efficient.
Ŵ is an L × L symmetric, positive semi-definite matrix, which can be random.

The solution to this is called a generalized method of moments (GMM) estimator.
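A NumPy sketch of this quadratic-form view of GMM for the linear IV model with L > K, using the one-step weighting matrix W = (Z′Z/N)⁻¹ (the system 2SLS choice discussed below); the simulated design is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(8)
N = 10_000

Z = np.column_stack([np.ones(N)] + [rng.normal(size=N) for _ in range(3)])  # L = 4 instruments
v = rng.normal(size=N)
x = Z[:, 1:] @ np.array([0.5, 0.4, 0.3]) + v
y = 1.0 + 2.0 * x + 0.6 * v + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])                                        # K = 2 regressors

def gmm_linear_iv(y, X, Z, W):
    """Minimize gbar(b)' W gbar(b) with gbar(b) = Z'(y - X b)/N.
    For linear moment conditions the minimizer has the closed form below."""
    A = X.T @ Z @ W @ Z.T @ X
    c = X.T @ Z @ W @ Z.T @ y
    return np.linalg.solve(A, c)

W = np.linalg.inv(Z.T @ Z / N)          # 2SLS weighting matrix
beta_hat = gmm_linear_iv(y, X, Z, W)
print(beta_hat)                          # approximately [1.0, 2.0]
```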


Assumption SIV.3 (Positive Definite Limit):

Positive definiteness is stronger than needed. Usually the law of large numbers, combined with consistency of a first-stage estimator, is used to establish SIV.3.

Theorem (Consistency): Under SIV.1 to SIV.3

Theorem (Asymptotic Normality): Under SIV.1, SIV.2, and SIV.3,


is asymptotically normal with mean zero and variance matrix

Where:
System 2SLS
The System 2SLS estimator uses weight matrix

and the estimator can be written as

Assumption SIV.3 is equivalent to:

Inference with S2SLS is possible without further assumptions.


In the SUR case, the S2SLS estimator is 2SLS equation-by-equation.
In the panel data case S2SLS is the pooled 2SLS estimator:

Optimal Weighting Matrix (how to estimate the optimal W)


β is defined by:

The optimal weighting matrix is the inverse of the variance matrix of:

Assumption SIV.4 (Optimal Weighting Matrix):

To obtain an actual GMM estimator using an efficient weighting matrix, use a two-step procedure:
1) Obtain an initial consistent estimator (for example, system 2SLS) and compute its residuals.
2) Use those residuals to estimate the optimal weighting matrix, and re-solve the GMM minimization with it.

We call such an estimator an optimal GMM estimator. It is sometimes called a minimum chi-
square estimator.
The optimal weighting matrix provides an asymptotically efficient estimator in the class of
estimators based on:

the estimator is asymptotically efficient for the given set of moment conditions (instruments).
It is possible we can find additional moment conditions that can enhance efficiency.
When L = K the weighting matrix is irrelevant. There is only one consistent estimator, and that is the system IV estimator.

is now an optimal GMM estimator.

are the optimal GMM residuals.


There is usually more than one way to estimate Λ (the optimal weighting matrix), and so the optimal GMM estimator is not unique. But all versions are √N-asymptotically equivalent.
It is the use of a first-stage estimate of Λ that causes finite-sample problems for two-step optimal GMM. System 2SLS does not have the same problems, but it is asymptotically inefficient.
Empirical likelihood has been proposed as an alternative to GMM.
The GMM Three Stage Least Squares Estimator
We can consider a restricted version of the optimal weighting matrix.
Assumption SIV.5 (System Homoskedasticity):
Let:

Then:

means that all the squares and cross products are uncorrelated with the squares and

cross products in .
In the SUR case, can show SIV.5 is the same as:

By the usual iterated expectations argument, a sufficient condition is:

and sufficient are:

Under SIV.5, we can estimate differently

Under SIV.1 to SIV.5, an optimal GMM estimator can be written:



We call this the GMM three-stage least squares (3SLS) estimator.


For first-order asymptotics, there is no gain in using SIV.5.

THE GENERALIZED IV ESTIMATOR


Derivation of the GIV Estimator and Its Asymptotic Properties
Rather than estimating β using the moment conditions:

an alternative is to transform the moment conditions in a way analogous to generalized least


squares.

Instruments:

Assumption GIV.1 (Exogeneity):

Implies:

Assumption GIV.1 is identical to the consistency condition for GLS when


Assumption GIV.2 (Rank Condition):
(a)

(b)

When G = 1, Assumption GIV.2 reduces to Assumption 2SLS.2


When Ω is replaced with a consistent estimator, Ω̂, we obtain the generalized instrumental variables (GIV) estimator.

GIV.1 can impose unintended restrictions on the relationships between instruments and errors
across equations or time or both.
Assumption GIV.3 (System Homoskedasticity):

A sufficient condition for GIV.3 is

Comparison of GMM, GIV, and the Traditional 3SLS Estimator


Traditional 3SLS estimator:

The orthogonality condition needed for consistency is:

Like the GIV estimator, the consistency of the traditional 3SLS estimator does not follow from:

We have three different estimators of systems of equations based on first estimating

The equivalence of all estimates holds if we just impose the common instrument assumption in
the general system:

The GMM 3SLS estimator and the GIV estimator (using the same Ω̂) are identical.
In many modern applications of system IV methods, to both simultaneous equations and panel
data, instruments that are exogenous in one equation are not exogenous in all other equations. In
such cases it is important to use the GMM 3SLS estimator once Zi has been properly chosen.
The GIV estimator and the traditional 3SLS estimator generally induce correlation between the
transformed instruments and the structural errors.

Most applications of method of moments tend to focus on GMM methods based on the original
orthogonality conditions. Nevertheless, for unobserved effects models we will see that the GIV
approach can provide insights into the workings of certain panel data estimators.
TESTING

We can always use the Wald statistic

MORE ON EFFICIENCY
Adding Instruments to Enhance Efficiency
Adding more instruments – that are exogenous, of course – can never hurt asymptotic efficiency
provided an optimal weighting matrix is used. We can never do worse – using first-order
asymptotics – by adding more IVs and using optimal GMM.
Generally, the optimal GMM estimator is more efficient than 2SLS.
If heteroskedasticity is present, we can keep adding instruments that would otherwise be
redundant in order to improve efficiency.
Under homoskedasticity, we cannot improve efficiency over OLS by adding nonlinear functions of x to the instrument list.
If heteroskedasticity is present, the GMM estimator that uses the extended instrument list and a
heteroskedasticity-robust weighting matrix is generally more efficient, asymptotically, than OLS!
Finding the Optimal Instruments
The optimal instruments are:

which is a function of wi only.


If Zi∗ were available, we would have no reason to try other functions of wi as instruments: once
we have Zi∗, all other functions of wi are redundant.
The optimal instruments depend on a matrix of conditional means and a conditional variance-
covariance matrix and is generally unknown.
SUMMARY

Generally, if we start with moment conditions of the form



GMM estimation based on this set of moment conditions will be more robust than estimators
based on a transformed set of moment conditions, such as GIV. If we decide to use GMM, we can
use the unrestricted weighting matrix.
Under Assumption SIV.5, which is a system homoskedasticity assumption, the GMM 3SLS
estimator is an asymptotically efficient GMM estimator.
When the same instruments can be used in every equation, GMM 3SLS, GIV, and traditional 3SLS
are identical.
When GMM and GIV are both consistent but are not asymptotically equivalent, they cannot
generally be ranked in terms of asymptotic efficiency.
One can never do worse by adding instruments and using the efficient weighting matrix in GMM.
This has implications for panel data applications. For example, if one has the option of choosing
the instruments as a block diagonal matrix or as a stacked matrix, it is better in large samples to
use the block diagonal form.
Under system homoskedasticity, 3SLS is generally more efficient than system 2SLS, but there are situations where they coincide.
If Ω is diagonal, 2SLS and 3SLS are asymptotically equivalent.
There is a trade-off between robustness and efficiency.
SIMULTANEOUS EQUATION MODEL
Consider a system of two regressions:

This is a simultaneous equation model (SEM) since y1 and y2 are determined simultaneously. Both
variables are determined within the model, so are endogenous, and denoted by letter y.

UNOBSERVED EFFECTS LINEAR PANEL DATA MODELS


Omitted-variable bias (OVB) occurs when a statistical model leaves out one or more relevant
variables. The bias results in the model attributing the effect of the missing variables to those that
were included.
More specifically, OVB is the bias that appears in the estimates of parameters in a regression
analysis, when the assumed specification is incorrect in that it omits an independent variable that
is a determinant of the dependent variable and correlated with one or more of the included
independent variables.
We explicitly add a time constant, unobserved effect to the model. Often called unobserved
heterogeneity.

In the balanced panel case we assume random sampling across i (the cross section dimension),
with fixed time periods T. The unbalanced case is trickier because we must know why we are
missing some time periods for some units. We consider this much later under missing data/sample
selection issues.
For a random draw i from the population, the basic model is

where the uit are the idiosyncratic errors. The composite error at time t is vit = ci + uit.
Because of ci, the sequence {vit : t = 1, ..., T} is almost certainly serially correlated, and it definitely is if uit is serially uncorrelated.
Useful to write a population version of the model in conditional expectation form:

βj is the partial effect of xtj on E(yt | xt, c), so that we are “holding c fixed.”


The hope is that we can allow c to be correlated with xt.
With a single cross section, there is nothing we can do unless we can find good observable proxies
for c or IVs for the endogenous elements of xt. But with two or more periods we have more
options.
We can write the population model as:

Suppose we have T = 2 time periods:



Differencing gives a cross section in the changes or differences.


Sufficient for OLS on a random sample to consistently estimate β:

The rank condition is violated if xt has elements that do not change over time. Assume each
element of xt has some time variation (that is, for at least some members in the population).
The orthogonality condition is:

under the conditional mean specification.

OLS on the differences will only be consistent if we add:

strict exogeneity assumption.


But in reality, we don’t omit an intercept from the differenced equation.
If we start with a model with different intercepts:

is the change in the aggregate time effects (intercepts).

Now the rank condition also excludes variables that change by the same amount for each unit.

Random effect models assist in controlling for unobserved heterogeneity when the heterogeneity
is constant over time and not correlated with independent variables. This constant can be
removed from longitudinal data through differencing, since taking a first difference will remove
any time invariant components of the model.
Two common assumptions can be made about the individual specific effect: the random effects
assumption and the fixed effects assumption. The random effects assumption is that the individual
unobserved heterogeneity is uncorrelated with the independent variables. The fixed effect
assumption is that the individual specific effect is correlated with the independent variables.
If the random effects assumption holds, the random effects estimator is more efficient than the
fixed effects model.
ASSUMPTIONS
We assume a balanced panel.
The basic unobserved effects model is:

In addition to “unobserved effect” and “unobserved heterogeneity,” ci is sometimes called a latent effect.
An extension of the basic model is:

are unknown parameters.

β does not depend on time. But xit can include time-period dummies and interactions of variables with time-period dummies, so the model is quite flexible.
A general specification is:

gt is a vector of aggregate time effects (often time dummies), zi is a set of time-constant observed variables, and wit changes across i and t (for at least some units i and time periods t). wit can include interactions among time-constant and time-varying variables.
Assumptions about the Unobserved Effect
“random effect” essentially means:

The term “fixed effect” means that no restrictions are placed on the relationship between ci and xi ≡ (xi1, ..., xiT).
“Correlated random effects” is used to denote situations where we model the relationship between ci and xi.
Exogeneity Assumptions on the Explanatory Variables

Contemporaneous Exogeneity Conditional on the Unobserved Effect:

Strict Exogeneity Conditional on the Unobserved Effect:

Assuming the condition holds conditional on ci:

strict exogeneity is:

sequential exogeneity conditional on the unobserved effect:

ESTIMATION AND TESTING


There are four common methods: pooled OLS, random effects, fixed effects, and first differencing.
Pooled OLS
Pooled OLS is employed when you select a different sample for each year/month/period of the
panel data. Fixed effects or random effects are employed when you are going to observe the same
sample of individuals/countries/states/cities/etc.

Consistency of the POLS estimator is ensured by:

Contemporaneous exogeneity is weaker than strict exogeneity.


Random Effects Estimation
Random effects model, also called a variance components model, is a statistical model where the
model parameters are random variables. It is a kind of hierarchical linear model, which assumes
that the data being analysed are drawn from a hierarchy of different populations whose
differences relate to that hierarchy. In econometrics, random effects models are used in panel
analysis of hierarchical or panel data when one assumes no fixed effects (it allows for individual
effects). A random effects model is a special case of a mixed model.
ASSUMPTION RE.1:

is without loss of generality.

A GLS approach also leaves ci in the error term:

ASSUMPTION RE.2:

Ω is nonsingular and rank E(Xi′Ω⁻¹Xi) = K.


ASSUMPTION RE.3:

Under Assumptions RE.1 and RE.3, Ω has the RE structure and system homoskedasticity holds.
(a) is homoskedasticity and serial uncorrelatedness of uit conditional on (xi, ci)
(b) is homoskedasticity of ci.
Under RE.1, RE.2, and RE.3:

is a valid estimator.
Fixed Effects Estimation
Fixed effects model is a statistical model in which the model parameters are fixed or non-random
quantities. This is in contrast to random effects models and mixed models in which all or some of
the model parameters are random variables.
In panel data where longitudinal observations exist for the same subject, fixed effects represent
the subject-specific means. In panel data analysis the term fixed effects estimator (also known as
the within estimator) is used to refer to an estimator for the coefficients in the regression model
including those fixed effects (one time-invariant intercept for each subject).
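A NumPy sketch of the within (time-demeaning) transformation behind the fixed effects estimator (balanced panel; the simulated design, with ci correlated with the regressor, is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(9)
N, T = 500, 4
c = rng.normal(size=N)                           # unobserved effect
x = 0.8 * c[:, None] + rng.normal(size=(N, T))   # regressor correlated with c
y = 1.5 * x + c[:, None] + rng.normal(size=(N, T))

# Pooled OLS ignores c and is biased here
# (no intercept needed: the simulated data are mean zero)
b_pols = np.sum(x * y) / np.sum(x * x)

# Within transformation: subtract unit-specific time averages, which removes c
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
b_fe = np.sum(x_dm * y_dm) / np.sum(x_dm * x_dm)

print(b_pols, b_fe)                              # b_fe is approximately 1.5
```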
Unlike POLS and RE, fixed effects estimation removes ci to form an estimating equation.

overbar indicates time averages


The time-averaged equation is often called the between equation because it relies on variation in the data between cross-section observations.
The between estimator is inconsistent unless:

ASSUMPTION FE.1:

ASSUMPTION FE.2:

Under FE.1 and FE.2:

A nonrobust form requires an extra assumption:


ASSUMPTION FE.3:

Fixed Effects GLS:

When we eliminate ci by demeaning we get:

Applying FGLS to the time-demeaned equation is tricky because the variance matrix of the demeaned errors is singular; the standard solution is to drop one of the time periods.


First-Differencing Estimation
The first-difference (FD) estimator is an estimator used to address the problem of omitted
variables with panel data. It is consistent under the assumptions of the fixed effects model. In
certain situations it can be more efficient than the standard fixed effects (or "within") estimator.
FD removes ci. It does it by differencing adjacent observations.

The estimator is obtained by running a pooled ordinary least squares (OLS) estimation for a
regression of ∆yit on ∆xit.

We explicitly lose the first time period:

The FD estimator is pooled OLS on the first differences. In practice, one might not difference the period dummies, unless interested in the year intercepts in the original levels equation.
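A NumPy sketch of the FD estimator under the same kind of simulated design as the within-transformation example above (first-differencing also removes ci):

```python
import numpy as np

rng = np.random.default_rng(10)
N, T = 500, 4
c = rng.normal(size=N)
x = 0.8 * c[:, None] + rng.normal(size=(N, T))
y = 1.5 * x + c[:, None] + rng.normal(size=(N, T))

# First differences across adjacent periods; the first time period is lost
dx = np.diff(x, axis=1).ravel()
dy = np.diff(y, axis=1).ravel()

# FD estimator = pooled OLS of dy on dx (no intercept needed: E(dx) = 0 here)
b_fd = np.sum(dx * dy) / np.sum(dx * dx)
print(b_fd)                                      # approximately 1.5
```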
FD also requires a kind of strict exogeneity. The weakest assumption is:

ASSUMPTION FD.1:

ASSUMPTION FD.2:

ASSUMPTION FD.3:

For a given i, the time series “model” would be:

where ci is the intercept for unit i.


Standard UE model:

which we write for all T time periods as:



RE, FE, and FD are still the most popular approaches to estimating β with strictly exogenous explanatory variables. Alternatively, one can use GLS versions of FE and FD.
CHAMBERLAIN’S APPROACH TO UE MODELS
In linear panel analysis, it can be desirable to estimate the magnitude of the fixed effects, as they
provide measures of the unobserved components. Chamberlain's approach to unobserved effects
models is a way of estimating the linear unobserved effects, under Fixed Effect (rather than
random effects) assumptions.
In the standard model:

Chamberlain simply writes down a linear projection relating ci to the entire history of the xit.
Assume no aggregate time effects for notational simplicity (and no time-constant variables).

Assuming finite second moments.


Write for all time periods as:

We can apply system OLS, FGLS, or method of moments procedures.


If we apply RE:

more restrictive Mundlak version:

In the Chamberlain equation, to account for system heteroskedasticity or a non-RE unconditional variance matrix, we can use a GMM approach with IV matrix:

RE AND FE INSTRUMENTAL VARIABLES METHODS


We start with the usual unobserved effects model:

but now we think some elements of xit are correlated with uit.
Pooled 2SLS will be consistent if:

In principle, this can be applied to models with lagged dependent variables, although in a model
with only a lagged dependent variable, it would be hard to find a convincing instrument.
Generally, assuming the instruments are uncorrelated with ci is a strong assumption. If we are
willing to make it, we probably are willing to assume strict exogeneity conditional on ci. So, we can
use an RE approach. Assumptions parallel those for exogenous xit.
ASSUMPTION REIV.1:

For simplicity, assume that xit contains an overall intercept (and probably a separate intercept in each time period), so we can take

As usual, we could relax the assumptions to zero correlation without changing consistency.

ASSUMPTION REIV.2: Ω is nonsingular, and



This is just the usual rank condition for GIV estimation. The REIV estimator is just the GIV estimator
where Ω is assumed to have the RE form. Without further assumptions, fully robust inference is
warranted, as usual.
ASSUMPTION REIV.3:

We can test the null that a set of variables is endogenous. Write the model as:

We maintain strict exogeneity of the instruments zit:

Reduced form for yit3:

To test overidentifying restrictions:

With REIV, can have time-constant explanatory variables and time-constant instruments. With lots
of good controls, or an exogenous intervention in an initial time period, the analysis can be
convincing. But time-constant IVs in panel data are often unconvincing. A more robust analysis
uses fixed effects and instrumental variables (FEIV). This requires time-varying instruments.
ASSUMPTION FEIV.1:

Apply pooled 2SLS to the time-demeaned equation:

the IVs can be arbitrarily correlated with ci as long as there is exogenous time variation in the
instruments.
ASSUMPTION FEIV.2:

ASSUMPTION FEIV.3:

for overidentification:

For endogeneity:
