
Econometrics

Inference

Dr. Toni Mora


Regression models (RM)

y = f(x_1, x_2, \dots, x_k) + \underbrace{g(x_{k+1}, \dots, x_n)}_{\varepsilon}

We should distinguish the systematic and the random parts:

y_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_k x_{i,k} + \varepsilon_i


RM specification

• y_i is the endogenous (dependent) variable, the variable to be explained
• x_j are the k exogenous variables (explanatory, independent, controls or regressors)
• β_j are the parameters of the model to be estimated (slopes)
• ε_i is the disturbance term
• i indexes individuals and j indexes regressors
Parameters’ interpretation

• Marginal effects of the regressors. The higher the coefficient, the greater the
  influence on the endogenous variable, provided the measurement units do not
  change and the model is expressed in levels (no transformation)
• For instance, β_1 represents the change in the endogenous variable when the
  first regressor rises by one unit, holding the rest of the regressors constant,
  that is, Δx_2 = 0, …

\frac{\partial y}{\partial x_j} = \beta_j
How to interpret

\Delta\hat{y} = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \cdots + \hat\beta_k \Delta x_k

Holding x_2, \dots, x_k fixed:

\Delta\hat{y} = \hat\beta_1 \Delta x_1

β_j should be interpreted ceteris paribus


Beta coefficients

• When the variables are not measured in the same units, or
• It is not possible to transform the variables properly,

to measure which variable shows a greater influence we should calculate beta
coefficients
How to compute beta coefficients

The greater the beta coefficient, the higher the influence. Two alternative
methods (standardized coefficients):

\hat\beta_j^* = \hat\beta_j \frac{S_{x_j}}{S_y}

y_i = \beta_1^* \frac{x_{i,1}}{S_{x_1}} + \beta_2^* \frac{x_{i,2}}{S_{x_2}} + \cdots + \beta_k^* \frac{x_{i,k}}{S_{x_k}} + \varepsilon_i'
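A minimal sketch of the first method, assuming simulated data; the variable names, sample sizes and true coefficients below are illustrative assumptions, not part of the slides:

```python
import numpy as np

# Sketch: standardized (beta) coefficients from an OLS fit on simulated data.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(scale=5, size=n)])
y = 1 + 0.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimates
s_y = y.std(ddof=1)
s_x = X[:, 1:].std(axis=0, ddof=1)             # std. dev. of each regressor (skip intercept)

beta_star = beta_hat[1:] * s_x / s_y           # standardized coefficients
print(beta_star)   # comparable magnitudes: the larger, the stronger the influence
```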
General assumptions in RM

• Model functional form
• Parameters' identification
• Expected value of the disturbance term
• Disturbance variances & covariances
• Random sampling & independent variables
• Error term distribution
Linearity assumption

β_1 is associated with a column of ones in X (the intercept term)

y_i = \beta_1 + \beta_2 x_{i,2} + \cdots + \beta_k x_{i,k} + \varepsilon_i

Y = X\beta + \varepsilon

Linearity refers to the form in which the parameters and the disturbance enter
the equation; it does not necessarily refer to the relationship between the
variables (the x_j may be raised to powers, for instance)
Examples of linearity in parameters

y = \beta_0 + \beta_1 x + \varepsilon

y = \beta_0 + \beta_1 (1/x) + \varepsilon    (reciprocal)

y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon

Non-linearity in parameters:  y = \beta_0 e^{\beta_1 x} + \varepsilon


Logarithmic model

y = e^{\beta_1} x_2^{\beta_2} x_3^{\beta_3} \cdots x_k^{\beta_k} e^{\varepsilon} = e^{\beta_1} \prod_{k=2}^{K} x_k^{\beta_k} e^{\varepsilon}

\ln y = \beta_1 + \beta_2 \ln x_2 + \beta_3 \ln x_3 + \cdots + \beta_k \ln x_k + \varepsilon

\frac{\partial y / y}{\partial x_j / x_j} = \frac{\partial \ln y}{\partial \ln x_j} = \beta_j    Constant elasticity: it does not change when x varies

\frac{\partial y / y}{\partial x_j / x_j} = \frac{x_j \beta_j}{x'\beta + \varepsilon} \neq \beta_j    This is not the case in the linear model, since we might expect increasing returns
Logistic model

Example: diffusion of a new technology or entrance into a market. At first the
growth rate is slow, then it expands quickly, and finally it stagnates at the
end of the period.

y_t = \frac{1}{1 + e^{-(x_t'\beta + \delta t + \varepsilon_t)}}

Applying the logit transformation:

\mathrm{logit}(y_t) = \ln \frac{y_t}{1 - y_t} = x_t'\beta + \delta t + \varepsilon_t
Functional forms & parameters’ interpretation

Model        Dep. var.   Regressor   Interpretation
level-level  y           x           Δy = β_1 Δx
level-log    y           log(x)      Δy = (β_1/100) %Δx
log-level    log(y)      x           %Δy = (100 β_1) Δx
log-log      log(y)      log(x)      %Δy = β_1 %Δx


Elasticity

For instance, a 1% increase in x increases the endogenous variable by 0.25%:

%Δy = β_1 %Δx

log(y) = 256 + 0.25 log(x)
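A quick numeric check of the elasticity reading, assuming the fitted log-log equation above and an arbitrary starting value x = 100 (an assumption for illustration):

```python
import numpy as np

# Numeric check: in the log-log model, a 1% increase in x raises y by about b1 percent.
b0, b1 = 256.0, 0.25        # intercept and slope taken from the slide example
x = 100.0                   # illustrative starting value (assumption)
y      = np.exp(b0 + b1 * np.log(x))          # fitted y at x
y_plus = np.exp(b0 + b1 * np.log(1.01 * x))   # fitted y after a 1% increase in x

print(100 * (y_plus / y - 1))   # approximately 0.249%, i.e. roughly b1 percent
```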
How to choose functional form

• Theoretical model
• It is relevant to know both the slope & the elasticity
• Coefficients should satisfy expectations
• R² comparison, but only with the same exogenous variables (R² is often overrated)
Disturbance term...

\varepsilon_i = y - E(y \mid X = x_i)

• Picks up omitted variables (parsimony, inappropriate proxies, lack of
  information, gaps in the theory)
• Functional form misspecification
• Measurement errors in the exogenous variables
• Picks up random or unpredictable behaviour of economic agents
Matrix algebraic form

Y = X\beta + \varepsilon

X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{k1} \\ \vdots & & & \vdots \\ 1 & x_{1n} & \cdots & x_{kn} \end{pmatrix}

Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix} \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
Basic hypotheses in RM

General hypotheses:

1. The RM is stochastic and the relationship between the endogenous and
   exogenous variables is linear
2. Assume enough information: the number of individuals should exceed the
   number of parameters to be estimated

N ≥ k    (enough degrees of freedom)
Disturbance term hypotheses (I)

1. Zero conditional mean assumption (strict exogeneity): the regressors are
   contemporaneously uncorrelated with the disturbance, and the unobservables are
   expected to be zero on average for any given value of x. The expected value of
   the "unobservables" (such as ability) is the same for different values of x
   (such as education).

E(\varepsilon_i \mid X) = 0

2. Constant variance: the homoskedasticity assumption

V(\varepsilon_i \mid X) = \sigma_\varepsilon^2
(II)

3. No autocorrelation (serial correlation)

\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid X) = E[(\varepsilon_i - E(\varepsilon_i))(\varepsilon_j - E(\varepsilon_j))]

When E(\varepsilon_i \mid X) = 0:

\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid X) = E(\varepsilon_i \varepsilon_j) = 0 \quad (i \neq j)
Both assumptions

E(\varepsilon\varepsilon' \mid X) = \begin{pmatrix} E(\varepsilon_1^2) & E(\varepsilon_1\varepsilon_2) & \cdots & E(\varepsilon_1\varepsilon_n) \\ E(\varepsilon_2\varepsilon_1) & E(\varepsilon_2^2) & \cdots & E(\varepsilon_2\varepsilon_n) \\ \vdots & & \ddots & \vdots \\ E(\varepsilon_n\varepsilon_1) & E(\varepsilon_n\varepsilon_2) & \cdots & E(\varepsilon_n^2) \end{pmatrix} = \begin{pmatrix} \sigma_\varepsilon^2 & 0 & \cdots & 0 \\ 0 & \sigma_\varepsilon^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_\varepsilon^2 \end{pmatrix} = \sigma_\varepsilon^2 I_n
(III)

4. Disturbances are normally distributed with zero mean and constant variance

\varepsilon \sim N(0, \sigma_\varepsilon^2 I_n)
Hypotheses regarding the exogenous variables

1. Fixed and determined independently of all other factors. The only stochastic
   term is the disturbance term
2. No measurement errors
3. Contemporaneously uncorrelated with the disturbance term (exogeneity): E(X_{ki}\varepsilon_i) = 0
4. Full rank: no exact linear combinations among the regressors, which ensures
   identification
Hypothesis on parameters

The only assumption is no structural change: the parameters (β_k) are constant
across the sample
RM estimation

• Ordinary Least Squares (OLS)
• Maximum likelihood (ML)
• Non-linear inference (NL)


OLS

[Scatter plot: OLS fitted values and 95% CI for despt against despm]
Simple linear RM

e_i = y_i - \hat{y}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i

[Diagram: observed value y_i, fitted value \hat{y}_i and the residual at x_i on the regression line]
OLS methodology

We select the regression line that minimizes the errors, that is, the
differences between the fitted values and the actual values of the endogenous
variable.

Since positive and negative errors cancel each other out and the absolute value
of the errors is not differentiable, the way forward is to minimize the sum of
squared errors (SSE).

Be aware that Wooldridge uses SSE for the explained sum of squares and SSR for the sum of squared residuals.
Goal
\min \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2

Simple RM - OLS

\frac{\partial \sum e_i^2}{\partial \hat\beta_0} = -2 \sum ( y_i - \hat\beta_0 - \hat\beta_1 x_i ) = 0 \;\Rightarrow\; \sum y_i = n\hat\beta_0 + \hat\beta_1 \sum x_i

\frac{\partial \sum e_i^2}{\partial \hat\beta_1} = -2 \sum x_i ( y_i - \hat\beta_0 - \hat\beta_1 x_i ) = 0 \;\Rightarrow\; \sum x_i y_i = \hat\beta_0 \sum x_i + \hat\beta_1 \sum x_i^2

Dividing the first normal equation by n:

\frac{\sum y_i}{n} = \hat\beta_0 + \hat\beta_1 \frac{\sum x_i}{n} \;\Rightarrow\; \bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}

\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}
Point estimate of the parameter

\hat\beta_1 = \frac{\begin{vmatrix} n & \sum y_i \\ \sum x_i & \sum x_i y_i \end{vmatrix}}{\begin{vmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{vmatrix}} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}

\hat\beta_1 = \frac{\left(\sum x_i y_i / n\right) - \bar{x}\bar{y}}{\left(\sum x_i^2 / n\right) - \bar{x}^2} = \frac{S_{xy}}{S_x^2} = r_{xy}\frac{S_y}{S_x}
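A minimal sketch of these simple-regression formulas on simulated data; the true intercept and slope used to generate the data are assumptions for illustration:

```python
import numpy as np

# Sketch: closed-form OLS slope and intercept for the simple regression model.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(size=100)

b1 = ((x * y).mean() - x.mean() * y.mean()) / ((x**2).mean() - x.mean()**2)  # S_xy / S_x^2
b0 = y.mean() - b1 * x.mean()
print(b0, b1)   # close to the values (2.0, 1.5) used to simulate the data
```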
Matrix notation

SSE(\hat\beta) = (y - X\hat\beta)'(y - X\hat\beta) = y'y - y'X\hat\beta - \hat\beta'X'y + \hat\beta'X'X\hat\beta = y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta

Necessary condition for a minimum:

\frac{\partial SSE(\hat\beta)}{\partial \hat\beta} = -2X'y + 2X'X\hat\beta = 0
OLS - Algebra

X'X\hat\beta = X'y \;\Rightarrow\; \hat\beta = (X'X)^{-1}X'y

The second derivative (Hessian) should be positive definite:

\frac{\partial^2 SSE(\hat\beta)}{\partial \hat\beta \, \partial \hat\beta'} = 2X'X
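A hedged sketch of the matrix OLS formula on simulated data; the dimensions and true coefficients are assumptions, and the normal equations are solved directly rather than forming the inverse explicitly:

```python
import numpy as np

# Sketch: beta_hat solves the normal equations X'X beta = X'y.
rng = np.random.default_rng(2)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # equivalent to (X'X)^{-1} X'y
print(beta_hat)                                # close to beta_true
```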
OLS estimators properties

• Linearity
• Unbiased
• Efficiency
• Consistent
(P1) Linearity

\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon = \beta + (X'X)^{-1}X'\varepsilon

The estimated parameter vector is a linear combination of the disturbance term,
the exogenous variables and the population parameters
Randomness in OLS estimators

Given that the estimator is a linear combination of the disturbance term, which
introduces randomness into the model, the estimators are random variables
following a normal distribution.

\hat\beta \sim N(E[\hat\beta], V[\hat\beta])

Individually:

\hat\beta_j \sim N(E[\hat\beta_j], V[\hat\beta_j])

For finite samples this is critical.
(P2) Unbiased

\mathrm{Bias}(\hat\beta) = E(\hat\beta) - \beta

E[\hat\beta] = E[\beta + (X'X)^{-1}X'\varepsilon] = E[\beta] + E[(X'X)^{-1}X'\varepsilon] = \beta + (X'X)^{-1}X'E[\varepsilon] = \beta
(P3) Efficiency

OLS estimators are efficient (BLUE): efficiency refers to the variance of the
estimators.

V(\hat\beta) = E\{(\hat\beta - \beta)(\hat\beta - \beta)'\} = E\{[(X'X)^{-1}X'\varepsilon][(X'X)^{-1}X'\varepsilon]'\} = E\{(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\}
= (X'X)^{-1}X'E[\varepsilon\varepsilon']X(X'X)^{-1} = (X'X)^{-1}X'\sigma_\varepsilon^2 I_n X(X'X)^{-1} = \sigma_\varepsilon^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma_\varepsilon^2 (X'X)^{-1}
(P4) Consistent

MSE = mean squared error ("EQM")

\mathrm{MSE}(\hat\beta) = V(\hat\beta) + [\mathrm{bias}(\hat\beta)]^2

Since the estimator is unbiased:

\mathrm{MSE}(\hat\beta) = V(\hat\beta) = \sigma_\varepsilon^2 (X'X)^{-1}

Multiplying and dividing by n, and assuming X'X/n exists and is finite as n \to \infty:

\lim_{n\to\infty} \mathrm{MSE}(\hat\beta) = \lim_{n\to\infty} V(\hat\beta) = \lim_{n\to\infty} \sigma_\varepsilon^2 (X'X)^{-1} = \lim_{n\to\infty} \left[ \frac{\sigma_\varepsilon^2}{n} \left( \frac{X'X}{n} \right)^{-1} \right] = 0
OLS estimators

\hat\beta \sim N[\beta, \sigma_\varepsilon^2 (X'X)^{-1}]

Individually (a_{jj} is the j-th diagonal element of (X'X)^{-1}):

\hat\beta_j \sim N(\beta_j, \sigma_\varepsilon^2 a_{jj}) \quad \forall j = 1, \dots, k

\hat\beta_j \sim N(\beta_j, \sigma_{\hat\beta_j}^2) \quad \forall j = 1, \dots, k \qquad z = \frac{\hat\beta_j - \beta_j}{\sigma_{\hat\beta_j}} \sim N(0,1)
Disturbance variance or error variance (σ²_ε)

We need to estimate the variance of the disturbance term in order to know the
variance of the parameter estimators and to test null hypotheses.

Since e = M\varepsilon, where M = I_n - X(X'X)^{-1}X' is the residual-maker matrix:

E(SSE) = E(e'e) = E[(M\varepsilon)'(M\varepsilon)] = E[\varepsilon'M'M\varepsilon] = E(\varepsilon'M\varepsilon)

Since \varepsilon'M\varepsilon is a scalar:

E(\varepsilon'M\varepsilon) = \mathrm{tr}\{E[\varepsilon'M\varepsilon]\} = \mathrm{tr}\{M E[\varepsilon\varepsilon']\} = \mathrm{tr}\{M\sigma_\varepsilon^2 I_n\} = \sigma_\varepsilon^2 \mathrm{tr}\{M I_n\} = \sigma_\varepsilon^2 \mathrm{tr}\{M\} = \sigma_\varepsilon^2 (n - k)

\hat\sigma_\varepsilon^2: how to estimate it

\hat\sigma_\varepsilon^2 = \frac{e'e}{n-k} = \frac{SSE}{n-k}, \qquad \frac{SSE}{\sigma_\varepsilon^2} \sim \chi^2_{n-k}

Now the variances & covariances can be estimated:

V(\hat\beta) = \sigma_\varepsilon^2 (X'X)^{-1} \qquad \hat{V}(\hat\beta) = \hat\sigma_\varepsilon^2 (X'X)^{-1}

SSE = y'y - \hat\beta'X'y \quad \text{or} \quad SSE = y'y - \hat{y}'\hat{y}
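A hedged sketch of these estimates on simulated data; the data-generating values and dimensions are assumptions for illustration:

```python
import numpy as np

# Sketch: disturbance-variance estimate and covariance matrix of the OLS estimator.
rng = np.random.default_rng(3)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                          # residuals
sigma2_hat = (e @ e) / (n - k)                # sigma_hat^2 = SSE / (n - k)
V_hat = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated covariance matrix of beta_hat
print(sigma2_hat, np.sqrt(np.diag(V_hat)))    # error variance and standard errors
```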


Statistic for the β_j parameters

\hat\beta_j \sim N(\beta_j, \sigma_\varepsilon^2 a_{jj}) \qquad \frac{\hat\beta_j - \beta_j}{\sigma_{\hat\beta_j}} \sim N(0,1) \qquad \frac{\hat\beta_j - \beta_j}{\hat\sigma_{\hat\beta_j}} \sim \;?

\hat\sigma_{\hat\beta}^2 = \hat\sigma_\varepsilon^2 (X'X)^{-1}

Dividing both numerator and denominator by \sigma:

\frac{(\hat\beta_j - \beta_j)/\sigma_{\hat\beta_j}}{\sqrt{\dfrac{e'e/\sigma_\varepsilon^2}{n-k}}} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-k}/(n-k)}} = t_{n-k}
Confidence intervals for β_j

\frac{\hat\beta_j - \beta_j}{\hat\sigma_{\hat\beta_j}} \sim t_{n-k} \qquad P\left[-t_{\alpha/2} \le t \le t_{\alpha/2}\right] = 1 - \alpha

P\left[-t_{\alpha/2} \le \frac{\hat\beta_j - \beta_j}{S_{\hat\beta_j}} \le t_{\alpha/2}\right] = 1 - \alpha

\left[\hat\beta_j \pm t_{\alpha/2} \cdot S_{\hat\beta_j}\right]
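A minimal sketch of the t-based intervals on simulated data; the sample, the true coefficients and the choice of a 95% level are assumptions for illustration:

```python
import numpy as np
from scipy import stats

# Sketch: 95% confidence intervals [beta_hat_j +/- t_{alpha/2, n-k} * se_j].
rng = np.random.default_rng(4)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.8]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
sigma2_hat = (e @ e) / (n - k)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

t_crit = stats.t.ppf(0.975, df=n - k)           # t_{alpha/2} with n-k degrees of freedom
ci = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])
print(ci)                                        # one 95% interval per coefficient
```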
Confidence interval for σ²

P\left[\chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2}\right] = 1 - \alpha

\left[\frac{(n-k)\hat\sigma^2}{\chi^2_{\alpha/2,\,n-k}} < \sigma^2 < \frac{(n-k)\hat\sigma^2}{\chi^2_{1-\alpha/2,\,n-k}}\right]
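A short sketch of the chi-square interval; the values of n, k, α and σ̂² below are placeholders, not taken from the slides:

```python
import numpy as np
from scipy import stats

# Sketch: chi-square confidence interval for the disturbance variance sigma^2.
n, k, alpha = 200, 2, 0.05   # placeholder sample size, parameters and level
sigma2_hat = 1.1             # placeholder estimate of the error variance

lower = (n - k) * sigma2_hat / stats.chi2.ppf(1 - alpha / 2, df=n - k)
upper = (n - k) * sigma2_hat / stats.chi2.ppf(alpha / 2, df=n - k)
print(lower, upper)          # 95% confidence interval for sigma^2
```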
Maximum likelihood estimation (ML)

ML maximizes the probability of obtaining the sample observations.

f(\varepsilon_i) = \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}} \exp\left\{-\frac{1}{2\sigma_\varepsilon^2}\varepsilon_i^2\right\}

Joint density function:

f(\varepsilon) = \prod_{i=1}^{n} f(\varepsilon_i) = (2\pi\sigma_\varepsilon^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma_\varepsilon^2}\sum_{i=1}^{n}\varepsilon_i^2\right\}
ML estimator

Likelihood function:

L(y; \beta, \sigma_\varepsilon^2) = (2\pi)^{-n/2}(\sigma_\varepsilon^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma_\varepsilon^2}(y - X\beta)'(y - X\beta)\right\}

Taking the partial derivatives and setting them equal to zero (usually working
with the log-likelihood):

\hat\beta_{ML} = (X'X)^{-1}X'y = \hat\beta_{OLS} \quad \text{if } \varepsilon_i \sim N
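As a sketch (numerical maximization rather than the analytic derivation above), the code below checks that with normal errors the ML estimates of β coincide with OLS; the simulated data and parameterization are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: maximize the Gaussian log-likelihood numerically and compare with OLS.
rng = np.random.default_rng(5)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def neg_loglik(theta):
    beta, log_sigma2 = theta[:-1], theta[-1]
    sigma2 = np.exp(log_sigma2)            # reparameterize to keep the variance positive
    e = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + (e @ e) / (2 * sigma2)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(res.x[:2], beta_ols)                 # ML and OLS estimates of beta agree
```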


Goodness of fit

SSE = y'y - \hat\beta'X'y = y'y - \hat{y}'\hat{y}

y'y = \hat{y}'\hat{y} + SSE \qquad \sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat{y}_i^2 + \sum_{i=1}^{n} e_i^2

y'y - n\bar{y}^2 = \hat{y}'\hat{y} - n\bar{y}^2 + e'e

Decomposing the total sum of squares (SST):

SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 \qquad SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n} \hat{y}_i^2 - n\bar{y}^2

We decompose SST (total sample variation) into SSR (the sum of squares explained
by the regression) & SSE (the sum of squared errors or residuals: the
unexplained part).

y'y - n\bar{y}^2 = \hat{y}'\hat{y} - n\bar{y}^2 + e'e \qquad \text{(the RM should contain an intercept)}

SST = SSR + SSE \qquad \sum_{i=1}^{n} e_i = 0
How to obtain SSR

SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n} \hat{y}_i^2 - n\bar{y}^2

Two equivalent ways to obtain SSR:

SSR = \hat\beta'X'X\hat\beta - n\bar{y}^2 = \hat\beta'X'y - n\bar{y}^2


Coefficient of determination (R²)

R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{e'e}{\sum (y_i - \bar{y})^2}

When the RM contains an intercept (otherwise SST ≥ SSE cannot be guaranteed):

R^2 = \frac{SSR}{SST}

It shows the proportion of the sample variation in the endogenous variable that
is explained by the RM.
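A hedged sketch of the sum-of-squares decomposition and R² on simulated data; variable names and true coefficients are assumptions for illustration:

```python
import numpy as np

# Sketch: SST = SSR + SSE and the two equivalent expressions for R^2 (model with intercept).
rng = np.random.default_rng(6)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
e = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = e @ e
print(np.isclose(SST, SSR + SSE))   # True when the model includes an intercept
print(1 - SSE / SST, SSR / SST)     # both expressions give the same R^2
```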
Models without intercept

In a RM without an intercept, 0 ≤ R² ≤ 1 is not guaranteed.

It is always better to include an intercept, given that:

• If it is not statistically significant (null hypothesis of being equal to
  zero), the regression line goes through the origin anyway
• If there should theoretically be an intercept but we regress through the
  origin (omitting the intercept), the model will be misspecified
Interpreting R²

0 ≤ R² ≤ 1

• R² close to 1: the model fits reasonably well
• R² close to 0: the model does not fit properly

With cross-section data, R² = 0.5 is quite good. For time series the value is
usually higher. Using microdata, 0.2 is fair enough.
To compare models we need the same sample size.

Increasing the number of regressors also increases R², because SSE decreases.
For comparison purposes, we need to adjust the coefficient of determination.
Adjusted coefficient of determination

It takes into account the degrees of freedom, that is, the number of
explanatory variables. It is called the adjusted R².

\bar{R}^2 = 1 - \frac{n-1}{n-k}(1 - R^2)

For nested models we should always use this coefficient. Two models are nested
when one model includes the other one.
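A one-line sketch of the adjustment; the values of R², n and k below are placeholders, not taken from the slides:

```python
# Sketch of the adjusted R^2 formula with placeholder values.
n, k = 150, 3          # assumed sample size and number of parameters
R2 = 0.65              # assumed unadjusted coefficient of determination
R2_adj = 1 - (n - 1) / (n - k) * (1 - R2)
print(R2_adj)          # slightly below R2, penalizing extra regressors
```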
Proxying R²

It is the squared correlation between the actual and the fitted values:

R^2 = \frac{\left(\sum (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})\right)^2}{\left(\sum (y_i - \bar{y})^2\right)\left(\sum (\hat{y}_i - \bar{\hat{y}})^2\right)} = r^2_{y\hat{y}}

Useful, but it is not a proportion of the variance explained by the model.
Normality test

• Histogram of the residuals (residuals on the x-axis, with a normal density
  overlaid)
• Scatter plot (residuals on the x-axis & on the y-axis the expected values if
  the residuals follow a normal distribution). The result must be a straight line
• Jarque-Bera test, based on skewness & kurtosis (asymptotically valid)
• Alternatives: other skewness/kurtosis tests & the Shapiro-Wilk test
Jarque-Bera

JB = n\left[\frac{s^2}{6} + \frac{(k-3)^2}{24}\right] \sim \chi^2_2

where s is the skewness and k the kurtosis of the residuals.

If ε ~ N, then s = 0 & k = 3, and JB should be 0.
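A hedged sketch of the statistic computed from residuals; here the "residuals" are simulated normal draws, and the use of scipy's skew/kurtosis helpers is an implementation choice, not prescribed by the slides:

```python
import numpy as np
from scipy import stats

# Sketch: Jarque-Bera statistic and p-value from a vector of residuals.
rng = np.random.default_rng(7)
e = rng.normal(size=500)                 # stand-in for OLS residuals

s = stats.skew(e)                        # sample skewness
kurt = stats.kurtosis(e, fisher=False)   # raw kurtosis (normal => 3)
n = e.size
JB = n * (s**2 / 6 + (kurt - 3) ** 2 / 24)
p_value = 1 - stats.chi2.cdf(JB, df=2)   # chi-square with 2 d.f. under normality
print(JB, p_value)
```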
Partial regressions

When estimating by OLS with two sets of variables (X_1, X_2), if both sets are
orthogonal (uncorrelated), the coefficients can be obtained separately by
regressing y on X_1 and on X_2.

y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon
Simple RM versus RM

Simple RM:  \tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1

RM:  \hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2

In general \tilde\beta_1 \neq \hat\beta_1, except when:
• \hat\beta_2 = 0 (no partial effect)
• x_1 & x_2 are uncorrelated
Partial regression coefficients

\hat\beta_1 = \frac{\sum \hat{r}_{i1} y_i}{\sum \hat{r}_{i1}^2}

where \hat{r}_{i1} are the residuals in: \hat{x}_1 = \hat\gamma_0 + \hat\gamma_2 x_2

We estimate using the part of x_1 that is uncorrelated with x_2 (the residuals
r_{i1}). We regress y_i on r_{i1}, that is, we obtain the effect of x_1 on y_i
net of x_2.
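A hedged sketch of this partialling-out (Frisch-Waugh-Lovell) result on simulated data; the correlation between x1 and x2 and the true coefficients are assumptions for illustration:

```python
import numpy as np

# Sketch: regressing y on the residuals of x1 given x2 reproduces the
# multiple-regression coefficient on x1.
rng = np.random.default_rng(8)
n = 400
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)             # x1 deliberately correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)  # multiple regression

Z = np.column_stack([np.ones(n), x2])
gamma = np.linalg.solve(Z.T @ Z, Z.T @ x1)
r1 = x1 - Z @ gamma                            # part of x1 uncorrelated with x2
beta1_partial = (r1 @ y) / (r1 @ r1)
print(beta_full[1], beta1_partial)             # the two estimates coincide
```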
Partial correlation coefficients

r_{ij.v} = partial correlation coefficient between i and j, holding v constant

r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad r_{13.2} = \frac{r_{13} - r_{12}r_{32}}{\sqrt{(1 - r_{12}^2)(1 - r_{32}^2)}}

r_{23.1} = \frac{r_{23} - r_{21}r_{31}}{\sqrt{(1 - r_{21}^2)(1 - r_{31}^2)}}
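A short sketch of the first-order partial correlation computed from the three pairwise correlations; the simulated common factor x3 is an assumption used only to generate correlated data:

```python
import numpy as np

# Sketch: r_{12.3} from the pairwise correlations r12, r13, r23.
rng = np.random.default_rng(9)
n = 300
x3 = rng.normal(size=n)
x1 = x3 + rng.normal(size=n)     # x1 and x2 are related only through x3
x2 = -x3 + rng.normal(size=n)

r12 = np.corrcoef(x1, x2)[0, 1]
r13 = np.corrcoef(x1, x3)[0, 1]
r23 = np.corrcoef(x2, x3)[0, 1]
r12_3 = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))
print(r12, r12_3)   # partialling out x3 removes most of the association
```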
