LLICO2b_ECO1_English

Matrices for Econometrics

Uploaded by

Amer Ibrahim
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views15 pages

LLICO2b_ECO1_English

Matrices for Econometrics

Uploaded by

Amer Ibrahim
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

2.2. ORDINARY LEAST SQUARES ESTIMATION (OLS). STATISTICAL PROPERTIES

Once the model has been specified, the next step is to estimate it, i.e., to obtain numeric values for the parameters of the MLRM given the available data:

$\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$

We analyze two methods of estimation:

a) Ordinary Least Squares estimation (OLS)

b) Maximum Likelihood estimation (ML)

We begin with the first method, OLS estimation, and apply it to a simple case, the simple linear regression model (SLRM):

$y_i = \beta_1 + \beta_2 x_{2i} + u_i$
[Figure: scatter plot of the point cloud, i.e., the set of observations $(x_{2i}, y_i)$ used to estimate the model, with $x_2$ on the horizontal axis and $y$ on the vertical axis.]
To estimate the SLRM, we look for values of the parameters $\beta_1$ and $\beta_2$ such that the resulting line fits the point cloud as well as possible.

[Figure: the same point cloud with the fitted straight line $\hat{y} = \hat{\beta}_1 + \hat{\beta}_2 x_2$ drawn through it.]

The idea is to find the straight line that best fits the point cloud, i.e., the line characterized by the least distance between each point and its representation on the line.

Therefore, we define the residual of the estimation as:

$e_i = \hat{y}_i - y_i$ ... more precisely, $e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_{2i})$
Or, in matrix form:

$e = Y - \hat{Y} = Y - X\hat{\beta}$

Hence, we look for the values of $\hat{\beta}_1$ and $\hat{\beta}_2$ that make the residuals as small as possible.

In order to minimize the residuals we can follow different criteria:

• Minimize the simple sum of residuals. This is not useful, since positive and negative residuals can compensate each other in the sum.

• Minimize the sum of residuals in absolute value. This is also not a useful choice, since the absolute value operator is not differentiable.

• Minimize the sum of squared residuals. This is the common choice.

Hence, we will obtain the values of $\hat{\beta}_1$ and $\hat{\beta}_2$ that minimize the sum of squared residuals (SSR):

$\min \; SSR$

where

$SSR = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = e_1^2 + e_2^2 + \ldots + e_N^2$

Or, in matrix form:
$SSR = \sum_{i=1}^{N} e_i^2 = e'e = (e_1 \; e_2 \; \ldots \; e_N)\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix} = (Y - \hat{Y})'(Y - \hat{Y}) =$

$= (Y - X\hat{\beta})'(Y - X\hat{\beta}) = (Y' - \hat{\beta}'X')(Y - X\hat{\beta}) =$

$= Y'Y - Y'X\hat{\beta} - \hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} =$

$= Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta}$

Hence:

$\min_{\hat{\beta}} \; SSR = \min_{\hat{\beta}} \left( Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} \right)$

$\frac{\partial SSR}{\partial \hat{\beta}} = \frac{\partial e'e}{\partial \hat{\beta}} = \frac{\partial \left( Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} \right)}{\partial \hat{\beta}} = -2X'Y + 2X'X\hat{\beta}$

Setting the first derivative equal to zero:

$\frac{\partial SSR}{\partial \hat{\beta}} = 0 \;\Rightarrow\; -2X'Y + 2X'X\hat{\beta} = 0$
$\hat{\beta}_{OLS} = (X'X)^{-1}(X'Y)$
with:

$X'X = \begin{pmatrix} 1 & 1 & \ldots & 1 \\ x_{21} & x_{22} & \ldots & x_{2N} \\ \vdots & \vdots & & \vdots \\ x_{k1} & x_{k2} & \ldots & x_{kN} \end{pmatrix} \begin{pmatrix} 1 & x_{21} & \ldots & x_{k1} \\ 1 & x_{22} & \ldots & x_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{2N} & \ldots & x_{kN} \end{pmatrix} = \begin{pmatrix} N & \sum_{i=1}^{N} x_{2i} & \ldots & \sum_{i=1}^{N} x_{ki} \\ \sum_{i=1}^{N} x_{2i} & \sum_{i=1}^{N} x_{2i}^2 & \ldots & \sum_{i=1}^{N} x_{2i}x_{ki} \\ \vdots & \vdots & & \vdots \\ \sum_{i=1}^{N} x_{ki} & \sum_{i=1}^{N} x_{ki}x_{2i} & \ldots & \sum_{i=1}^{N} x_{ki}^2 \end{pmatrix}$

$X'Y = \begin{pmatrix} 1 & 1 & \ldots & 1 \\ x_{21} & x_{22} & \ldots & x_{2N} \\ \vdots & \vdots & & \vdots \\ x_{k1} & x_{k2} & \ldots & x_{kN} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{N} y_i \\ \sum_{i=1}^{N} x_{2i}y_i \\ \vdots \\ \sum_{i=1}^{N} x_{ki}y_i \end{pmatrix}$

Therefore, $\hat{\beta}_{OLS}$ has dimension $k \times 1$.
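As an illustration of the formula above, the following minimal numpy sketch computes $\hat{\beta}_{OLS} = (X'X)^{-1}(X'Y)$ on simulated data; the sample size, the true parameter values, and the variable names are assumptions made only for this example, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for y_i = beta_1 + beta_2 * x_2i + u_i (assumed values)
N = 100
beta_true = np.array([2.0, 0.5])            # beta_1 (constant) and beta_2
x2 = rng.uniform(0, 10, size=N)
X = np.column_stack([np.ones(N), x2])       # N x k regressor matrix (k = 2)
u = rng.normal(0, 1.0, size=N)              # disturbances ~ N(0, sigma_u^2)
Y = X @ beta_true + u

# OLS estimator: beta_hat = (X'X)^{-1} (X'Y), a k x 1 vector
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)                             # should be close to beta_true
```

Using np.linalg.solve instead of forming the explicit inverse is only a numerical convenience; algebraically it yields the same $(X'X)^{-1}(X'Y)$.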

To prove that we have reached a minimum, we can compute the second derivative and verify that it is positive definite:

$\frac{\partial^2 SSR}{\partial \hat{\beta}^2} = \frac{\partial^2 e'e}{\partial \hat{\beta}^2} = \frac{\partial^2 \left( Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} \right)}{\partial \hat{\beta}^2} = \frac{\partial \left( -2X'Y + 2X'X\hat{\beta} \right)}{\partial \hat{\beta}} = 2X'X$

This expression is positive definite, since $X'X$ is a positive definite matrix. In fact, the elements of its principal diagonal are the sums of squares of the observations of each explanatory variable.

The OLS estimator has 4 properties:

1ª) Linearity
2ª) Unbiasedness
3ª) Efficiency
4ª) Consistency

1ª) Linearity

The estimators are a linear combination of the population parameters $\beta$, the explanatory variables, and the error term. Therefore,

$\hat{\beta} = (X'X)^{-1}(X'Y) = (X'X)^{-1}\left(X'(X\beta + U)\right) =$

$= (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'U =$

$= \beta + (X'X)^{-1}X'U$

Since the error term follows a Normal distribution, the estimators will also follow a Normal distribution.

2ª) Unbiasedness

An estimator is unbiased when its expected value is equal to the population parameter that it estimates:

$E(\hat{\beta}) = E\left[(X'X)^{-1}X'Y\right] = E\left[\beta + (X'X)^{-1}X'U\right] = \beta + (X'X)^{-1}X'E(U) = \beta$

$Bias(\hat{\beta}) = E(\hat{\beta}) - \beta = 0$
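A small Monte Carlo sketch (not part of the original notes) can be used to check unbiasedness numerically: keeping X fixed and redrawing U many times, the average of $\hat{\beta}$ across replications should be close to $\beta$. The number of replications and all parameter values below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design, repeated draws of U only: the mean of beta_hat should equal beta
N, R = 50, 5000
beta_true = np.array([2.0, 0.5])
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=N)])  # kept fixed
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)                     # (X'X)^{-1} X'

estimates = np.empty((R, 2))
for r in range(R):
    u = rng.normal(0, 1.0, size=N)
    Y = X @ beta_true + u
    estimates[r] = XtX_inv_Xt @ Y

print(estimates.mean(axis=0))   # close to beta_true on average
```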

3ª) Efficiency

Among all linear unbiased estimators, the OLS estimator is the one with the minimum variance (Gauss-Markov theorem).

This property is important because the fact that the estimator is unbiased does not guarantee that its numeric value is close to the true value, but only that on average it will coincide with this value.

$VAR(\hat{\beta}_{OLS}) = E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right] = E\left[(X'X)^{-1}X'U\left((X'X)^{-1}X'U\right)'\right] =$

$= (X'X)^{-1}X'E(UU')X(X'X)^{-1} =$

$= \sigma_u^2 (X'X)^{-1}X'I_N X(X'X)^{-1} =$

$= \sigma_u^2 (X'X)^{-1}$

$VAR(\hat{\beta}_{OLS}) = \sigma_u^2 (X'X)^{-1}$

$VAR(\hat{\beta}_j) = \sigma_u^2 a_{jj} \quad \forall j = 1, 2, \ldots, k$

where $a_{jj}$ is the j-th element of the principal diagonal of $(X'X)^{-1}$.

$VAR(\hat{\beta}) = \begin{pmatrix} var(\hat{\beta}_1) & cov(\hat{\beta}_1, \hat{\beta}_2) & \ldots & cov(\hat{\beta}_1, \hat{\beta}_k) \\ cov(\hat{\beta}_2, \hat{\beta}_1) & var(\hat{\beta}_2) & \ldots & cov(\hat{\beta}_2, \hat{\beta}_k) \\ \vdots & \vdots & & \vdots \\ cov(\hat{\beta}_k, \hat{\beta}_1) & cov(\hat{\beta}_k, \hat{\beta}_2) & \ldots & var(\hat{\beta}_k) \end{pmatrix}$
An estimator that satisfies the properties of linearity,
unbiasedness and efficiency is also called BLUE (best
linear unbiased estimator).

4ª) Consistency

An estimator is consistent if, as the number of observations tends to infinity, its value approaches the population value:

$\hat{\beta} \xrightarrow[N \to \infty]{} \beta$

We can say that an estimator is consistent when the limit of its Mean Squared Error (MSE), as N tends to infinity, is equal to zero:

$\lim_{N \to \infty} MSE(\hat{\beta}) = 0$

since:

$MSE(\hat{\beta}) = VAR(\hat{\beta}) + \left(bias(\hat{\beta})\right)^2$

$\lim_{N \to \infty} MSE(\hat{\beta}) = \lim_{N \to \infty} VAR(\hat{\beta}) = \lim_{N \to \infty} \sigma_u^2 (X'X)^{-1} = \lim_{N \to \infty} \frac{\sigma_u^2}{N} \left( \frac{X'X}{N} \right)^{-1} = 0$

provided that $\lim_{N \to \infty} \frac{X'X}{N}$ exists and is a fixed (finite) matrix.
Therefore,

$\hat{\beta}_{OLS} \sim N\left(\beta, \sigma_u^2 (X'X)^{-1}\right)$

$\hat{\beta}_{j,OLS} \sim N\left(\beta_j, \sigma_u^2 a_{jj}\right) \quad \forall j = 1, 2, \ldots, k$
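The following sketch (an added illustration with arbitrary simulated regressors, not part of the original notes) shows how the diagonal of $VAR(\hat{\beta}) = \sigma_u^2 (X'X)^{-1}$ shrinks as N grows, which is the mechanism behind the consistency result above.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_u2 = 1.0   # assumed known disturbance variance for the illustration

# VAR(beta_hat) = sigma_u^2 (X'X)^{-1} shrinks roughly like 1/N
for N in (50, 500, 5000):
    X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=N)])
    var_beta = sigma_u2 * np.linalg.inv(X.T @ X)
    print(N, np.diag(var_beta))   # diagonal terms go to zero as N grows
```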

2.3. RESIDUAL ANALYSIS

We use the residuals of the estimation as estimators of the disturbances:

$e_{OLS} = Y - X\hat{\beta}_{OLS}$

Properties of the OLS residuals:

1ª) The vector of residuals is a linear combination of the endogenous variable:

$e = Y - \hat{Y} = Y - X\hat{\beta} = Y - X(X'X)^{-1}X'Y = \left[I - X(X'X)^{-1}X'\right]Y = MY$

The matrix $M = I - X(X'X)^{-1}X'$ is (as verified numerically in the sketch below):

• a square matrix with dimension $N \times N$

• symmetric

• idempotent ($MM = M$); since it is also symmetric, $MM' = M'M = M$

• a singular matrix: $|M| = 0$

• orthogonal with respect to $X$ ($MX = 0$)

• with trace $tr(M) = N - k$
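The sketch below (an added numerical check, with an arbitrary simulated X) verifies these properties of M; it relies on the floating-point tolerances of np.allclose/np.isclose.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 30, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])

# Residual-maker matrix M = I - X (X'X)^{-1} X'
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(M, M.T))                 # symmetric
print(np.allclose(M @ M, M))               # idempotent
print(np.isclose(np.linalg.det(M), 0.0))   # singular, |M| = 0
print(np.allclose(M @ X, 0.0))             # orthogonal to X, MX = 0
print(np.isclose(np.trace(M), N - k))      # trace(M) = N - k
```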

2ª) e is a linear combination of the disturbances U:

$e = MY = M(X\beta + U) = MX\beta + MU = MU$

3ª) e is orthogonal to the matrix X:

$e'X = (MY)'X = (MU)'X = U'M'X = 0$

$X'e = X'(MU) = 0$

4ª) The sample mean of the residuals is 0 when the model has a constant term.

Given that:

$X'e = \begin{pmatrix} 1 & 1 & \ldots & 1 \\ x_{21} & x_{22} & \ldots & x_{2N} \\ \vdots & \vdots & & \vdots \\ x_{k1} & x_{k2} & \ldots & x_{kN} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$

the first row implies:

$\sum_{i=1}^{N} e_i = 0 \quad \text{and} \quad \bar{e} = \frac{1}{N} \sum_{i=1}^{N} e_i = 0$
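A quick numerical check of this property (an added sketch with arbitrary simulated data): when the first column of X is a constant, the OLS residuals sum to zero up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200
X = np.column_stack([np.ones(N), rng.uniform(size=N)])   # model with constant
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat

print(np.isclose(e.sum(), 0.0), np.isclose(e.mean(), 0.0))   # both True
```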

5ª) e follows a Normal distribution

Since $e = MU$ and $U \sim N(0, \sigma_u^2 I_N)$, it follows that:

$e \sim N(0, \sigma_u^2 M)$

$e_i \sim N(0, \sigma_u^2 m_{ii}) \quad i = 1, 2, \ldots, N$

$E(e) = E(MU) = M E(U) = 0$

$VAR(e) = \begin{pmatrix} var(e_1) & cov(e_1, e_2) & \ldots & cov(e_1, e_N) \\ cov(e_2, e_1) & var(e_2) & \ldots & cov(e_2, e_N) \\ \vdots & \vdots & & \vdots \\ cov(e_N, e_1) & cov(e_N, e_2) & \ldots & var(e_N) \end{pmatrix} =$

$= E[ee'] = E[MUU'M'] = M E[UU'] M' = \sigma_u^2 MM' = \sigma_u^2 M$

Therefore, the residuals are not spherical: their variance-covariance matrix $\sigma_u^2 M$ is not a scalar multiple of the identity matrix.

2.4. VARIANCE ESTIMATION OF THE DISTURBANCE TERM

Since U is not observable, its variance $\sigma_u^2 I_N$ is also not observable.

This is important because we need to know this variance in order to perform tests on the estimated OLS coefficients:

$\hat{\beta}_{OLS} \sim N\left(\beta, \sigma_u^2 (X'X)^{-1}\right)$

Therefore, we estimate $\sigma_u^2$ taking into account the fact that the vector e of OLS residuals is an estimate of the vector U. Hence:

$E[SSR] = E[e'e] = E[(MU)'(MU)] = E[U'M'MU] = E[U'MU] =$

$= E[tr(U'MU)] = E[tr(MUU')] = tr(E[MUU']) = tr(M E[UU']) =$

$= tr(M \sigma_u^2 I_N) = \sigma_u^2 tr(M I_N) = \sigma_u^2 tr(M) = \sigma_u^2 (N - k)$

In this way we obtain the unbiased estimator of the variance of the disturbance term:

$\hat{\sigma}_{u,OLS}^2 = \frac{e'e}{N - k}$

Hence, we will say that:

$VAR(\hat{\beta}) = \sigma_u^2 (X'X)^{-1} \quad\Rightarrow\quad \widehat{VAR}(\hat{\beta}) = \hat{\sigma}_u^2 (X'X)^{-1}$

$VAR(\hat{\beta}_j) = \sigma_u^2 a_{jj} \quad\Rightarrow\quad \widehat{VAR}(\hat{\beta}_j) = \hat{\sigma}_u^2 a_{jj}$
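Putting the pieces together, the following sketch (an added illustration on simulated data, with all numeric values chosen arbitrarily for the example) computes $\hat{\sigma}_u^2 = e'e/(N-k)$, the estimated variance-covariance matrix $\hat{\sigma}_u^2 (X'X)^{-1}$, and the standard errors $\sqrt{\hat{\sigma}_u^2 a_{jj}}$.

```python
import numpy as np

rng = np.random.default_rng(5)
N, k = 100, 2
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=N)])
Y = X @ np.array([2.0, 0.5]) + rng.normal(0, 1.5, size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat

sigma2_hat = (e @ e) / (N - k)                       # e'e / (N - k)
var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated VAR(beta_hat)
std_errors = np.sqrt(np.diag(var_beta_hat))          # sqrt(sigma2_hat * a_jj)
print(sigma2_hat, std_errors)
```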

In order to compute e'e (the SSR) we can use three equivalent expressions:

• $e'e = \sum_{i=1}^{N} e_i^2$

• $e'e = Y'Y - \hat{\beta}'X'Y$, since:

$e'e = (Y - \hat{Y})'(Y - \hat{Y}) = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = (Y' - \hat{\beta}'X')(Y - X\hat{\beta}) =$

$= Y'Y - Y'X\hat{\beta} - \hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} = Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} =$

$= Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X(X'X)^{-1}(X'Y) = Y'Y - \hat{\beta}'X'Y$

• $e'e = Y'Y - \hat{Y}'\hat{Y}$, since:

$e'e = Y'Y - \hat{\beta}'X'Y = Y'Y - \hat{\beta}'X'(X\hat{\beta} + e) = Y'Y - \hat{\beta}'X'X\hat{\beta} - \hat{\beta}'X'e =$

$= Y'Y - \hat{\beta}'X'X\hat{\beta} = Y'Y - \hat{Y}'\hat{Y}$
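As a final numerical cross-check (an added sketch, not part of the original notes, reusing the same kind of simulated data as in the earlier examples), the three expressions for e'e agree up to rounding error:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=N)])
Y = X @ np.array([2.0, 0.5]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
e = Y - Y_hat

ssr1 = e @ e                            # sum of squared residuals
ssr2 = Y @ Y - beta_hat @ (X.T @ Y)     # Y'Y - beta_hat' X'Y
ssr3 = Y @ Y - Y_hat @ Y_hat            # Y'Y - Y_hat' Y_hat
print(np.allclose([ssr1, ssr2], ssr3))  # True
```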
