GENERAL MODELING FRAMEWORK FOR ANALYZING PANEL DATA
The Model:
The fundamental advantage of a panel data set over a cross section is that it will allow the researcher great
flexibility in modeling differences in behavior across individuals.
The basic framework for this discussion is a regression model of the form:
yit = x’it β + z’i α + εit
= x’it β + ci + εit.
yit: The dependent variable for individual i at time t.
x'it: A vector of K independent variables for individual i at time t. (There are K regressors in xit, not including a constant term.)
β: A vector of coefficients to be estimated, representing the effect of the independent variables on the dependent
variable.
zi: A vector of individual-specific variables (some observed, some unobserved). Crucially, it includes a constant
term.
α: A vector of coefficients for the individual-specific variables.
ci: The unobserved individual effect (or heterogeneity). This represents characteristics of individual i that are constant over time but affect yit and are not captured by the observed variables x'it. This is often written as ci = z'iα.
εit: The error term, representing unobserved factors that vary over both individuals and time.
Individual-specific variables are characteristics that vary across individuals (or entities) but remain constant over time for each individual. These variables can be observed (directly measured) or unobserved (not directly measurable).
The Challenge of Unobserved Effects:
If zi (and therefore ci) were fully observed, the model could be estimated using ordinary least squares (OLS).
However, in most real-world scenarios, ci contains unobserved components. This is the key challenge. These
unobserved individual effects can be correlated with the other explanatory variables, leading to biased and
inconsistent OLS estimates.
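The bias can be illustrated with a small simulation. The sketch below is only an illustration under assumed values (one regressor, β = 2, standard normal c_i and ε_it, and x_it = ρ c_i + noise); the function names `simulate_panel` and `pooled_ols_slope` are hypothetical and do not come from the text.

```python
import numpy as np

def simulate_panel(n_individuals=500, n_periods=5, beta=2.0, rho=0.8, seed=0):
    """Simulate y_it = beta * x_it + c_i + eps_it (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n_individuals)  # unobserved individual effect c_i
    # x_it = rho * c_i + noise, so x is correlated with c whenever rho != 0
    x = rho * c[:, None] + rng.normal(size=(n_individuals, n_periods))
    eps = rng.normal(size=(n_individuals, n_periods))
    y = beta * x + c[:, None] + eps
    return x, y

def pooled_ols_slope(x, y):
    """OLS slope from regressing y on a constant and x, ignoring c_i."""
    X = np.column_stack([np.ones(x.size), x.ravel()])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[1]

x, y = simulate_panel()
b_pooled = pooled_ols_slope(x, y)
print(f"true beta = 2.0, pooled OLS slope = {b_pooled:.3f}")
```

With ρ = 0.8 the pooled slope should land noticeably above the true value, because the omitted c_i is absorbed by x_it. Setting `rho=0.0` removes the correlation, and the slope should then be close to 2.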
The objective:
The primary goal is to efficiently and consistently estimate the partial effects of the independent variables:
β = ∂E[y_it | x_it] / ∂x_it
This represents the change in the expected value of yit for a one-unit change in xit, holding the unobserved
effect ci constant (conceptually).
Strict Exogeneity:
Whether this goal is attainable depends on the assumptions made about the unobserved effects. We begin with a strict exogeneity assumption for the independent variables:
𝐸 [ε𝑖𝑡| 𝑥𝑖1 , 𝑥𝑖2 , ... , 𝑐𝑖 ] = 𝐸 [ε𝑖𝑡| 𝑥𝑖𝑡 , 𝑐𝑖 ] = 0
This means that the error term in any time period t is uncorrelated with the values of the independent variables
in all time periods (past, present, and future), given the unobserved effect ci.
Strict exogeneity simplifies the analysis by allowing us to focus on the current values of x when predicting y. It
implies that the independent variables only affect the expected value of the dependent variable in the current
period.
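One way to read this assumption is through the conditional mean it implies. The short derivation below is a sketch that combines the regression model with the strict exogeneity condition stated above; it uses only symbols already defined in this section.

```latex
\begin{align*}
E[y_{it} \mid x_{i1}, x_{i2}, \ldots, c_i]
  &= x_{it}'\beta + c_i + E[\varepsilon_{it} \mid x_{i1}, x_{i2}, \ldots, c_i] \\
  &= x_{it}'\beta + c_i .
\end{align*}
```

Only the current-period regressors x_it appear on the right-hand side, and differentiating with respect to x_it gives β.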
Dynamic Models and Strict Exogeneity:
Strict exogeneity rules out dynamic models of the form:
y_it = w'_it β + γ y_i,t-1 + c_i + ε_it
In this dynamic model, the lagged dependent variable yi,t-1 is included as a regressor. Because yi,t-1 is itself a function of ci and of the previous period's error, the regressors in period t cannot be uncorrelated with the errors of all periods, which is what strict exogeneity requires. In addition, ci affects y in every period, so whenever a lagged y is used as a predictor it is correlated with ci.
As long as γ is nonzero, covariation between 𝜖𝑖𝑡 and 𝑥𝑖𝑡 = (𝑤𝑖𝑡, 𝑦𝑖,𝑡−1) is transmitted through 𝑐𝑖 in 𝑦𝑖,𝑡−1.
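The correlations described above can be checked in a small simulation. This is a sketch under assumed values (γ = 0.5, standard normal c_i and ε_it, no w_it regressors, and an arbitrary initial condition), none of which come from the text. It reports the correlation of the lagged dependent variable with c_i and with the disturbances of the previous and current periods.

```python
import numpy as np

def simulate_dynamic_panel(n_individuals=2000, n_periods=6, gamma=0.5, seed=1):
    """Simulate y_it = gamma * y_i,t-1 + c_i + eps_it (values are assumptions)."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n_individuals)
    eps = rng.normal(size=(n_individuals, n_periods))
    y = np.zeros((n_individuals, n_periods))
    y[:, 0] = c + eps[:, 0]  # arbitrary initial condition (assumption)
    for t in range(1, n_periods):
        y[:, t] = gamma * y[:, t - 1] + c + eps[:, t]
    return y, c, eps

y, c, eps = simulate_dynamic_panel()
t = 3  # any period after the first would do
lagged_y = y[:, t - 1]  # the regressor y_i,t-1 used in period t

corr_with_c = np.corrcoef(lagged_y, c)[0, 1]
corr_with_past_eps = np.corrcoef(lagged_y, eps[:, t - 1])[0, 1]
corr_with_current_eps = np.corrcoef(lagged_y, eps[:, t])[0, 1]

print(f"corr(y_t-1, c)       = {corr_with_c:.3f}")
print(f"corr(y_t-1, eps_t-1) = {corr_with_past_eps:.3f}")
print(f"corr(y_t-1, eps_t)   = {corr_with_current_eps:.3f}")
```

In this setup the lagged regressor should be clearly correlated with c_i and with the previous period's disturbance, while its correlation with the current disturbance should be near zero. The regressor is therefore predetermined but not strictly exogenous.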
Model structures:
The basic panel data regression framework has the following form:
y_it = x'_it β + z'_i α + ε_it = x'_it β + c_i + ε_it
for i = 1, ..., N and t = 1, ..., T.
There are K regressors in x_it, not including a constant term.
The heterogeneity, or individual effect, is z'_i α, where z_i contains a constant term and a set of individual or group-specific variables.
The group-specific variables may be observed (race, sex, location, ...) or unobserved (family-specific characteristics, individual heterogeneity in skill or preferences). ε_it is a random error term.
We will examine a variety of different models for panel data. Broadly, they can be arranged as follows:
1. Pooled Regression:
y_it = x'_it β + α + ε_it
The pooled model applies ordinary least squares (OLS) to the stacked panel data. It assumes that there are no unobserved entity-specific effects, meaning that all entities in the data set are considered to have the same underlying characteristics. In terms of the general model, z_i contains only a constant term, so the individual effect is the same constant α for every individual and there is no dependence within individual groups.
2. Fixed Effects:
If z_i is unobserved but correlated with x_it, then the least squares estimator of β is biased and inconsistent as a consequence of an omitted variable.
However, in this instance, the model
y_it = x'_it β + α_i + ε_it,
where α_i = z'_i α, embodies all the observable effects and specifies an estimable conditional mean. This fixed effects approach takes α_i to be a group-specific constant term in the regression model. The term "fixed" signifies that Corr(c_i, x_it) ≠ 0, not that c_i is nonstochastic.
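Treating α_i as a group-specific constant term can be sketched as a regression of y on x and one dummy variable per individual. The example below is an illustration on simulated data, not a definitive implementation; the data-generating values (β = 2, x_it correlated with c_i through a factor of 0.8) and the function name `lsdv_slope` are assumptions made for the example.

```python
import numpy as np

def lsdv_slope(x, y):
    """Slope on x from regressing y on x and one dummy per individual."""
    n_individuals, n_periods = x.shape
    # Rows are stacked individual by individual, matching x.ravel() below.
    dummies = np.repeat(np.eye(n_individuals), n_periods, axis=0)
    X = np.column_stack([x.ravel(), dummies])  # no separate constant needed
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[0]

# Simulated panel in which c_i is correlated with x_it (assumed values).
rng = np.random.default_rng(42)
n_individuals, n_periods, beta = 300, 5, 2.0
c = rng.normal(size=n_individuals)
x = 0.8 * c[:, None] + rng.normal(size=(n_individuals, n_periods))
y = beta * x + c[:, None] + rng.normal(size=(n_individuals, n_periods))

b_fe = lsdv_slope(x, y)
print(f"true beta = {beta}, dummy-variable estimate = {b_fe:.3f}")
```

Because each individual gets its own intercept, the correlation between c_i and x_it no longer contaminates the slope, and the estimate should be close to the true β.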
3. Random Effects:
If the unobserved individual heterogeneity, however formulated, is uncorrelated with x_it, then the model may be formulated as:
y_it = x'_it β + α_i + ε_it = x'_it β + α + u_i + ε_it,
where α_i = α + u_i.
This random effects approach specifies that u_i is a group-specific random element, similar to ε_it except that it is drawn once per group and stays constant over time.
The crucial distinction between fixed and random effects is whether the unobserved individual effect embodies
elements that are correlated with the regressors in the model, not whether these effects are stochastic or not.
4. Random Parameters:
The random effects model can be viewed as a regression model with a random constant term. The extension of the model might appear as:
y_it = x'_it (β + h_i) + (α + u_i) + ε_it,
where h_i is a random vector that induces the variation of the slope parameters across individuals and u_i is the random element of the constant term, as in the random effects model.
Probability Limit (plim): Review
Definition: Convergence in probability
Let θ be a constant, ε > 0, and n be the index of the sequence of random variables x_n.
If lim Prob[|x_n - θ| > ε] = 0 as n→∞ for any ε > 0, we say that x_n converges in probability to θ.
That is, the probability that the difference between x_n and θ is larger than any ε > 0 goes to zero as n becomes larger.
Notation: plim x_n = θ.
If x_n is an estimator and plim x_n = θ, we say that x_n is a consistent estimator of θ.
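The definition can be illustrated numerically with the sample mean as the estimator. The sketch below assumes normally distributed draws with mean θ = 1 and unit variance, a tolerance of 0.1, and 1000 replications; these values and the function name `exceedance_probability` are chosen only for illustration.

```python
import numpy as np

def exceedance_probability(n, theta=1.0, tol=0.1, n_replications=1000, seed=0):
    """Estimate Prob[|x_n - theta| > tol] when x_n is a sample mean of n draws."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(loc=theta, scale=1.0, size=(n_replications, n))
    x_n = draws.mean(axis=1)  # one sample mean per replication
    return float(np.mean(np.abs(x_n - theta) > tol))

probs = {n: exceedance_probability(n) for n in (10, 100, 1000, 4000)}
for n, p in probs.items():
    print(f"n = {n:5d}: estimated Prob[|x_n - theta| > 0.1] = {p:.3f}")
```

The estimated probability should shrink toward zero as n grows, which is what plim x_n = θ requires for this particular tolerance.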
Convergence to a Random Variable: Review
Definition: Limiting Distribution
Let x_n be a random sequence with cdf F_n(x). Let x be a random variable with cdf F(x).
When F_n(x) converges to F(x) as n→∞ at all points x at which F(x) is continuous, we say that x_n converges in distribution to x. The distribution of that random variable is the limiting distribution of x_n.
Notation: x_n →d x.
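A familiar instance of a limiting distribution is the standardized sample mean, whose cdf approaches the standard normal cdf. The sketch below is an illustration under assumed choices (exponential draws with mean 1 and variance 1, 5000 replications, a grid of nine evaluation points); the function names are hypothetical.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_cdf_gap(n, n_replications=5000, seed=0):
    """Largest gap on a grid between the empirical cdf of z_n and the normal cdf."""
    rng = np.random.default_rng(seed)
    draws = rng.exponential(scale=1.0, size=(n_replications, n))  # mean 1, variance 1
    z_n = math.sqrt(n) * (draws.mean(axis=1) - 1.0)  # standardized sample mean
    grid = np.linspace(-2.0, 2.0, 9)
    gaps = [abs(float(np.mean(z_n <= g)) - normal_cdf(g)) for g in grid]
    return max(gaps)

gap_small_n = max_cdf_gap(2)
gap_large_n = max_cdf_gap(200)
print(f"max cdf gap at n = 2:   {gap_small_n:.3f}")
print(f"max cdf gap at n = 200: {gap_large_n:.3f}")
```

The gap at n = 200 should be much smaller than at n = 2, consistent with F_n(x) converging to F(x) at each point of the grid.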
WELL-BEHAVED PANEL DATA
Assumptions reminder: classical regression model
1. Linearity:
y_i = x_i1 β_1 + x_i2 β_2 + ... + x_iK β_K + ε_i
2. Full Rank:
The n × K sample data matrix X has full column rank for every n > K.
3. Strict exogeneity of the independent variables:
E[ε_i | x_j1, x_j2, ..., x_jK] = 0 for i, j = 1, ..., n
4. Homoscedasticity and nonautocorrelation:
E[ε_i ε_j | X] = σ² if i = j
= 0 otherwise
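The following are the crucial results needed. For consistency of b, we need
plim (1/n) X'X = Q, a positive definite matrix, and plim (1/n) X'ε = 0.
For consistency of the estimator of the disturbance variance σ², we added a fairly weak assumption about the moments of the disturbances. To establish asymptotic normality, we required consistency and
(1/√n) X'ε →d N[0, σ²Q].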
Exceptions to the assumptions are likely to arise in a panel data set.