
Econometrics Cheat Sheet
By Marcelo Moreno - Universidad Rey Juan Carlos
The Econometrics Cheat Sheet Project
Basic concepts

Definitions
Econometrics - a social science discipline with the objective of quantifying the relationships between economic agents, testing economic theories, and evaluating and implementing government and business policies.
Econometric model - a simplified representation of reality used to explain economic phenomena.
Ceteris paribus - all other relevant factors remain constant.

Data types
Cross section - data taken at a given moment in time, a static photo. Order doesn't matter.
Time series - observations of variables across time. Order does matter.
Panel data - a time series for each observation of a cross section.
Pooled cross sections - combines cross sections from different time periods.

Phases of an econometric model
1. Specification.  2. Estimation.  3. Validation.  4. Utilization.

Regression analysis
Studies and predicts the mean value of a variable (the dependent variable, y) given fixed values of other variables (the independent variables, x's). In econometrics it is common to use Ordinary Least Squares (OLS) for regression analysis.

Correlation analysis
Correlation analysis doesn't distinguish between dependent and independent variables.
• Simple correlation measures the degree of linear association between two variables:
  r = Cov(x, y) / (σx · σy) = Σ(xi − x̄)(yi − ȳ) / sqrt(Σ(xi − x̄)² · Σ(yi − ȳ)²)
• Partial correlation measures the degree of linear association between two variables while controlling for a third (see the sketch below).
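A minimal Python sketch of both measures (numpy only; the data and variable names are illustrative, and the partial correlation is computed in the standard way, by correlating the residuals from regressions on the control variable z):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=200)
x = z + rng.normal(size=200)          # x and y both depend on z
y = 2 * z + rng.normal(size=200)

# Simple correlation: r = Cov(x, y) / (sigma_x * sigma_y)
r = np.corrcoef(x, y)[0, 1]

# Partial correlation of x and y controlling for z:
# correlate the residuals of x ~ z with the residuals of y ~ z
Z = np.column_stack([np.ones_like(z), z])
res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
r_partial = np.corrcoef(res_x, res_y)[0, 1]

print(f"simple r = {r:.3f}, partial r (given z) = {r_partial:.3f}")
```

Here the simple correlation is inflated by the common dependence on z, while the partial correlation is close to zero.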
Assumptions and properties

Econometric model assumptions
Under these assumptions, the OLS estimator presents good properties. Gauss-Markov assumptions:
1. Parameters linearity (and weak dependence in time series). y must be a linear function of the β's.
2. Random sampling. The sample from the population has been randomly taken. (Only when cross section)
3. No perfect collinearity.
   • There are no independent variables that are constant: Var(xj) ≠ 0, ∀j = 1, . . . , k
   • There isn't an exact linear relation between independent variables.
4. Conditional mean zero and correlation zero.
   a. There are no systematic errors: E(u | x1, . . . , xk) = E(u) = 0 → strong exogeneity (a implies b).
   b. There are no relevant variables left out of the model: Cov(xj, u) = 0, ∀j = 1, . . . , k → weak exogeneity.
5. Homoscedasticity. The variability of the residuals is the same for all levels of x: Var(u | x1, . . . , xk) = σu²
6. No autocorrelation. Residuals don't contain information about any other residuals: Corr(ut, us | x1, . . . , xk) = 0, ∀t ≠ s
7. Normality. Residuals are independent and identically distributed: u ∼ N(0, σu²)
8. Data size. The number of observations available must be greater than the (k + 1) parameters to estimate. (Already satisfied under asymptotic situations)

Asymptotic properties of OLS
Under the econometric model assumptions and the Central Limit Theorem (CLT):
• Hold 1 to 4a: OLS is unbiased. E(β̂j) = βj
• Hold 1 to 4: OLS is consistent. plim(β̂j) = βj (holding 1 to 4b but leaving out 4a, i.e., only weak exogeneity, OLS is biased but consistent)
• Hold 1 to 5: asymptotic normality of OLS (then, 7 is necessarily satisfied): u ∼ N(0, σu²) asymptotically
• Hold 1 to 6: unbiased estimate of σu². E(σ̂u²) = σu²
• Hold 1 to 6: OLS is BLUE (Best Linear Unbiased Estimator) or efficient.
• Hold 1 to 7: hypothesis testing and confidence intervals can be done reliably.
Ordinary Least Squares

Objective - minimize the Sum of Squared Residuals (SSR): min Σ ûi², where ûi = yi − ŷi

Simple regression model
Equation: yi = β0 + β1 xi + ui
Estimation: ŷi = β̂0 + β̂1 xi
where:
  β̂0 = ȳ − β̂1 x̄
  β̂1 = Cov(y, x) / Var(x)

Multiple regression model
Equation: yi = β0 + β1 x1i + · · · + βk xki + ui
Estimation: ŷi = β̂0 + β̂1 x1i + · · · + β̂k xki
where:
  β̂0 = ȳ − β̂1 x̄1 − · · · − β̂k x̄k
  β̂j = Cov(y, resid xj) / Var(resid xj), with resid xj the residuals from regressing xj on all the other x's
Matrix: β̂ = (XᵀX)⁻¹(Xᵀy)
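A minimal numpy sketch of the matrix formula β̂ = (XᵀX)⁻¹(Xᵀy) on simulated data (the variable names and true β's are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=(2, n))
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + u    # true betas: 1, 2, -0.5

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# OLS estimator: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat                 # fitted values
u_hat = y - y_hat                    # residuals

print(beta_hat)                      # should be close to [1, 2, -0.5]
```

np.linalg.solve is used instead of explicitly inverting XᵀX; it evaluates the same formula in a numerically safer way.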
Interpretation of coefficients

Model       | Dependent | Independent | β1 interpretation
Level-level | y         | x           | Δy = β1 Δx
Level-log   | y         | log(x)      | Δy ≈ (β1/100)(%Δx)
Log-level   | log(y)    | x           | %Δy ≈ (100 β1)Δx
Log-log     | log(y)    | log(x)      | %Δy ≈ β1 (%Δx)
Quadratic   | y         | x + x²      | Δy = (β1 + 2 β2 x)Δx

For example, in a log-log model, β̂1 = 0.8 means that a 1% increase in x is associated with an increase of about 0.8% in y.
Error measurements

Sum of Squared Residuals: SSR = Σ ûi² = Σ (yi − ŷi)²
Explained Sum of Squares: SSE = Σ (ŷi − ȳ)²
Total Sum of Squares: SST = SSE + SSR = Σ (yi − ȳ)²
Standard Error of the Regression: σ̂u = sqrt(SSR / (n − k − 1))
Standard Error of the β̂'s: se(β̂) = sqrt(σ̂u² · (XᵀX)⁻¹)  (square roots of the diagonal elements)
Root Mean Squared Error: RMSE = sqrt(Σ (yi − ŷi)² / n)
Absolute Mean Error: AME = Σ |yi − ŷi| / n
Mean Percentage Error: MPE = (Σ |ûi / yi| / n) · 100
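Continuing the numpy sketch above, the same measures could be computed as follows (a sketch; the names match the formulas above):

```python
# Error measurements for the fit above (k = number of slope parameters)
k = X.shape[1] - 1
SSR = np.sum(u_hat**2)                        # sum of squared residuals
SSE = np.sum((y_hat - y.mean())**2)           # explained sum of squares
SST = np.sum((y - y.mean())**2)               # total sum of squares (= SSE + SSR)

sigma_u = np.sqrt(SSR / (n - k - 1))          # standard error of the regression
se_beta = sigma_u * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))  # se of each beta_hat

RMSE = np.sqrt(SSR / n)                       # root mean squared error
AME = np.mean(np.abs(u_hat))                  # absolute mean error
MPE = np.mean(np.abs(u_hat / y)) * 100        # mean percentage error
```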


R-squared

A measure of the goodness of fit, i.e., how well the regression fits the data:
  R² = SSE/SST = 1 − SSR/SST
• Measures the percentage of variation of y that is linearly explained by the variations of the x's.
• Takes values between 0 (no linear explanation) and 1 (total explanation).
When the number of regressors increases, the value of the R-squared also increases, whether or not the new variables are relevant. To solve this problem, there is an adjusted R-squared, corrected by degrees of freedom:
  R̄² = 1 − [(n − 1)/(n − k − 1)] · SSR/SST = 1 − [(n − 1)/(n − k − 1)] · (1 − R²)
For big sample sizes: R̄² ≈ R²
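As a sanity check, both formulas can be reproduced against statsmodels (a sketch on simulated data; variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, k = 200, 3
X = sm.add_constant(rng.normal(size=(n, k)))       # intercept + k regressors
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(size=n)

res = sm.OLS(y, X).fit()
r2 = 1 - res.ssr / res.centered_tss                # R^2 = 1 - SSR/SST
r2_adj = 1 - (n - 1) / (n - k - 1) * (1 - r2)      # adjusted R^2

print(np.isclose(r2, res.rsquared), np.isclose(r2_adj, res.rsquared_adj))
```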
Hypothesis testing

Definitions
A hypothesis test is a rule designed to decide, from a sample, whether or not there is evidence to reject a hypothesis made about one or more population parameters.
Elements of a hypothesis test:
• Null hypothesis (H0) - the hypothesis to be tested.
• Alternative hypothesis (H1) - the hypothesis that cannot be rejected when H0 is rejected.
• Test statistic - a random variable whose probability distribution is known under H0.
• Critical value (C) - the value against which the test statistic is compared to determine whether H0 is rejected. It sets the frontier between the acceptance and rejection regions of H0.
• Significance level (α) - the probability of rejecting the null hypothesis when it is true (Type I error). It is chosen by whoever conducts the test; commonly 10%, 5% or 1%.
• p-value - the highest significance level at which H0 cannot be rejected.
[Figure: distribution of the test statistic under H0 - two-tailed test: acceptance region between −C and C with probability 1 − α, α/2 in each tail; one-tailed test: acceptance region up to C, α in one tail.]
The rule is: if p-value < α, there is evidence to reject H0, and thus evidence to accept H1.
Individual tests

Test whether a parameter is significantly different from a given value, ϑ.
• H0: βj = ϑ
• H1: βj ≠ ϑ
Under H0: t = (β̂j − ϑ) / se(β̂j) ∼ t(n−k−1)
If |t| > |t(n−k−1, α/2)|, there is evidence to reject H0.

Individual significance test - tests whether a parameter is significantly different from zero.
• H0: βj = 0
• H1: βj ≠ 0
Under H0: t = β̂j / se(β̂j) ∼ t(n−k−1)
If |t| > |t(n−k−1, α/2)|, there is evidence to reject H0.
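A sketch of the individual significance test by hand, checked against statsmodels, together with the matching confidence interval (see Confidence intervals below); the data are simulated and α = 0.05:

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=(n, 2))
y = 0.5 + 1.5 * x[:, 0] + 0.0 * x[:, 1] + rng.normal(size=n)

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
k = X.shape[1] - 1

j = 2                                    # test H0: beta_2 = 0
t = res.params[j] / res.bse[j]           # t = beta_hat_j / se(beta_hat_j)
t_crit = st.t.ppf(1 - 0.05 / 2, n - k - 1)
print(abs(t) > t_crit, np.isclose(t, res.tvalues[j]))

# 95% confidence interval: beta_hat_j -+ t_crit * se(beta_hat_j)
ci = (res.params[j] - t_crit * res.bse[j], res.params[j] + t_crit * res.bse[j])
print(ci, res.conf_int(alpha=0.05)[j])
```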
The F test

Simultaneously tests multiple (linear) hypotheses about the parameters. It makes use of a non-restricted model and a restricted model:
• Non-restricted model - the model on which we want to test the hypotheses.
• Restricted model - the model on which the hypotheses we want to test have been imposed.
Then, looking at the errors, there are:
• SSR_UR - the SSR of the non-restricted model.
• SSR_R - the SSR of the restricted model.
Under H0: F = [(SSR_R − SSR_UR) / SSR_UR] · [(n − k − 1) / q] ∼ F(q, n−k−1)
where k is the number of parameters of the non-restricted model and q is the number of linear hypotheses tested.
If F > F(q, n−k−1), there is evidence to reject H0.

Global significance test - tests whether all the parameters associated with the x's are simultaneously equal to zero.
• H0: β1 = β2 = · · · = βk = 0
• H1: β1 ≠ 0 and/or β2 ≠ 0 . . . and/or βk ≠ 0
In this case, we can simplify the formula for the F statistic:
Under H0: F = [R² / (1 − R²)] · [(n − k − 1) / k] ∼ F(k, n−k−1)
If F > F(k, n−k−1), there is evidence to reject H0.
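A sketch of the restricted-vs-non-restricted comparison (testing H0: β2 = β3 = 0, so q = 2) using statsmodels' compare_f_test; the data are simulated:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 150
x = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)    # x2 and x3 truly irrelevant

unrestricted = sm.OLS(y, sm.add_constant(x)).fit()
restricted = sm.OLS(y, sm.add_constant(x[:, :1])).fit()  # H0 imposed: beta_2 = beta_3 = 0

# F = [(SSR_R - SSR_UR)/SSR_UR] * [(n-k-1)/q]
q, k = 2, 3
F = (restricted.ssr - unrestricted.ssr) / unrestricted.ssr * (n - k - 1) / q
F_sm, p, _ = unrestricted.compare_f_test(restricted)
print(np.isclose(F, F_sm), p)    # same statistic; large p => don't reject H0
```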
Confidence intervals

The confidence interval at the (1 − α) confidence level can be calculated as:
  β̂j ∓ t(n−k−1, α/2) · se(β̂j)

Dummy variables

Dummy (or binary) variables are used for qualitative information like sex, civil status, country, etc.
• They take the value 1 in a given category and 0 in the rest.
• They are used to analyze and model structural changes in the model parameters.
If a qualitative variable has m categories, we only have to include (m − 1) dummy variables.

Structural change

Structural change refers to changes in the values of the parameters of the econometric model produced by the effect of different sub-populations. Structural change can be included in the model through dummy variables.
The location of the dummy variable (D) matters:
• On the intercept (additive effect) - represents the mean difference between the values produced by the structural change:
  y = β0 + δ1 D + β1 x1 + u
• On the slope (multiplicative effect) - represents the effect (slope) difference between the values produced by the structural change:
  y = β0 + β1 x1 + δ1 D · x1 + u
Chow's structural test - analyzes the existence of structural changes in all the model parameters; it's a particular expression of the F test, where H0: no structural change (all δ = 0). A sketch is shown below.
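A sketch of Chow's test as an F test: the non-restricted model interacts a group dummy with both the intercept and the slope, while the restricted model imposes H0 (two-group data simulated with a real structural change):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
D = (np.arange(n) >= n // 2).astype(float)       # 1 for the second sub-population
y = 1.0 + 0.5 * x + 1.0 * D + 0.8 * D * x + rng.normal(size=n)

# Non-restricted: intercept and slope may differ across groups
X_ur = sm.add_constant(np.column_stack([x, D, D * x]))
# Restricted: H0 imposed, no structural change (all deltas = 0)
X_r = sm.add_constant(x)

unrestricted = sm.OLS(y, X_ur).fit()
restricted = sm.OLS(y, X_r).fit()
F, p, q = unrestricted.compare_f_test(restricted)
print(F, p)    # small p => reject H0: there is structural change
```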
Changes of scale

Changes in the measurement units of the variables:
• In the endogenous variable, y* = y · λ - affects all model parameters: βj* = βj · λ, ∀j = 0, 1, . . . , k
• In an exogenous variable, xj* = xj · λ - only affects the parameter linked to that exogenous variable: βj* = βj / λ
• The same scale change on the endogenous and an exogenous variable - only affects the intercept: β0* = β0 · λ

Changes of origin

Changes in the measurement origin of the variables (endogenous or exogenous) only affect the model's intercept: for y* = y + λ, β0* = β0 + λ (and for xj* = xj + λ, β0* = β0 − βj · λ). A numerical check of the scaling rules is sketched below.
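A quick numerical check of the first two scaling rules (numpy; λ = 100 and all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, lam = 100, 100.0
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

b = np.linalg.lstsq(X, y, rcond=None)[0]
b_y_scaled = np.linalg.lstsq(X, y * lam, rcond=None)[0]    # y* = y * lam
X_scaled = np.column_stack([np.ones(n), x * lam])          # x* = x * lam
b_x_scaled = np.linalg.lstsq(X_scaled, y, rcond=None)[0]

print(np.allclose(b_y_scaled, b * lam))                # all betas scaled by lam
print(np.allclose(b_x_scaled, [b[0], b[1] / lam]))     # only the slope divided by lam
```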


Multicollinearity

• Perfect multicollinearity - there are independent variables that are constant and/or there is an exact linear relation between independent variables. This is the breaking of the third (3) econometric model assumption.
• Approximate multicollinearity - there are independent variables that are approximately constant and/or there is an approximately linear relation between independent variables. It does not break any econometric model assumption, but it has an effect on OLS.

Consequences
• Perfect multicollinearity - the OLS equation system cannot be solved due to infinite solutions.
• Approximate multicollinearity:
  – Small sample variations can induce big variations in the OLS estimates.
  – The variance of the OLS estimators of the collinear x's increases, so the inference on the parameters is affected: the estimates are very imprecise (big confidence intervals).

Detection
• Correlation analysis - look for high correlations between independent variables, |r| > 0.7.
• Variance Inflation Factor (VIF) - indicates the increase of Var(β̂j) because of the multicollinearity (see the sketch after this section):
  VIF(β̂j) = 1 / (1 − Rj²)
  where Rj² denotes the R-squared from a regression of xj on all the other x's.
  – Values from 4 to 10 - there might be multicollinearity problems.
  – Values > 10 - there are multicollinearity problems.
One typical characteristic of multicollinearity is that the regression coefficients of the model aren't individually different from zero (due to high variances), but jointly they are different from zero.

Correction
• Delete one of the collinear variables.
• Perform factorial analysis (or any other dimension reduction technique) on the collinear variables.
• Interpret coefficients with multicollinearity jointly.
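A sketch of VIF detection using statsmodels' variance_inflation_factor (near-collinear data simulated on purpose):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)    # x2 almost collinear with x1
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
# VIF(beta_j) = 1 / (1 - R_j^2); column 0 is the intercept
for j in range(1, X.shape[1]):
    print(f"VIF(x{j}) = {variance_inflation_factor(X, j):.1f}")
# x1 and x2 should show VIF >> 10; x3 should be close to 1
```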
Heteroscedasticity

The residuals ui of the population regression function do not have the same variance σu²:
  Var(u | x1, . . . , xk) = Var(u) ≠ σu²
This is the breaking of the fifth (5) econometric model assumption.

Consequences
• OLS estimators are still unbiased.
• OLS estimators are still consistent.
• OLS is no longer efficient, but it is still a LUE (Linear Unbiased Estimator).
• Variance estimates of the estimators are biased: the construction of confidence intervals and hypothesis testing is not reliable.

Detection
• Graphs - look for scatter patterns on x vs. û or x vs. y plots. [Figure: example û vs. x and y vs. x scatter plots.]
• Formal tests - White, Bartlett, Breusch-Pagan, etc. Commonly, H0: no heteroscedasticity. (A Breusch-Pagan sketch follows this section.)

Correction
• Use OLS with a variance-covariance matrix estimator robust to heteroscedasticity (HC), for example, the one proposed by White.
• If the variance structure is known, make use of Weighted Least Squares (WLS) or Generalized Least Squares (GLS):
  – Supposing that Var(u) = σu² · xi, divide the model variables by the square root of xi and apply OLS.
  – Supposing that Var(u) = σu² · xi², divide the model variables by xi (the square root of xi²) and apply OLS.
• If the variance structure is not known, make use of Feasible Weighted Least Squares (FWLS), which estimates a possible variance, divides the model variables by it and then applies OLS.
• Make a new model specification, for example, a logarithmic transformation (lower variance).
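A sketch of detection with the Breusch-Pagan test and correction with White-type robust (HC) standard errors in statsmodels (heteroscedastic data simulated on purpose):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(8)
n = 300
x = rng.uniform(1, 5, size=n)
u = rng.normal(size=n) * x             # Var(u) grows with x: heteroscedasticity
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Breusch-Pagan: H0 = no heteroscedasticity
lm_stat, lm_pvalue, _, _ = het_breuschpagan(ols.resid, X)
print(f"BP p-value = {lm_pvalue:.4f}")  # small p => reject H0

# Same point estimates, heteroscedasticity-robust (HC) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")
print(ols.bse, robust.bse)
```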
Autocorrelation

The residual of any observation, ut, is correlated with the residual of any other observation. The observations are not independent:
  Corr(ut, us | x1, . . . , xk) = Corr(ut, us) ≠ 0, ∀t ≠ s
The "natural" context of this phenomenon is time series. This is the breaking of the sixth (6) econometric model assumption.

Consequences
• OLS estimators are still unbiased.
• OLS estimators are still consistent.
• OLS is no longer efficient, but it is still a LUE (Linear Unbiased Estimator).
• Variance estimates of the estimators are biased: the construction of confidence intervals and hypothesis testing is not reliable.

Detection
• Graphs - look for scatter patterns on ut−1 vs. ut plots, or make use of a correlogram. [Figure: ut vs. ut−1 scatter plots for no autocorrelation, positive autocorrelation (Ac. +) and negative autocorrelation (Ac. −).]
• Formal tests - Durbin-Watson, Breusch-Godfrey, etc. Commonly, H0: no autocorrelation. (A sketch follows this section.)

Correction
• Use OLS with a variance-covariance matrix estimator robust to heteroscedasticity and autocorrelation (HAC), for example, the one proposed by Newey-West.
• Use Generalized Least Squares. Supposing yt = β0 + β1 xt + ut, with ut = ρ ut−1 + εt, where |ρ| < 1 and εt is white noise:
  – If ρ is known, create a quasi-differenced model where ut is white noise and estimate it by OLS.
  – If ρ is not known, estimate it by (for example) the Cochrane-Orcutt method, create a quasi-differenced model where ut is white noise, and estimate it by OLS.
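A sketch of detection (Durbin-Watson, Breusch-Godfrey) and of the HAC (Newey-West) correction in statsmodels (AR(1) errors simulated on purpose; the lag choice is illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(9)
n, rho = 300, 0.8
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                    # AR(1) errors: u_t = rho*u_{t-1} + eps_t
    u[t] = rho * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

print(durbin_watson(ols.resid))          # ~2 if no autocorrelation, < 2 here
lm, lm_p, _, _ = acorr_breusch_godfrey(ols, nlags=1)
print(f"BG p-value = {lm_p:.4f}")        # small p => reject H0: no autocorrelation

# Same point estimates, HAC (Newey-West) standard errors
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(ols.bse, hac.bse)
```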

CS-25.01-EN - github.com/marcelomijas/econometrics-cheatsheet - CC-BY-4.0 license
