Non-Linear Methods
4.1. Asymptotic Analysis
4.1.1. Introduction
4.1.2. Stochastic Regressors
$y = E[y \mid X] + \varepsilon$,  $E[\varepsilon \mid X] = 0$
$b \mid X \sim N\!\left(\beta,\ \sigma^2 (X'X)^{-1}\right)$
Assumptions for the asymptotic analysis:
$\text{plim}\ \tfrac{1}{n} X'\varepsilon = 0$
$\text{plim}\ \tfrac{1}{n} X'X = Q$ (finite and nonsingular)
4.1.3. Consistency
The probability limit of $b_n$:
$\text{plim}_n\, b_n = \beta$
$\iff \lim_n P\!\left[\,|b_n - \beta| \le \delta\,\right] = 1$
$\iff \lim_n P\!\left[\,|b_n - \beta| > \delta\,\right] = 0$
Let $P_n = P\!\left[\,|b_n - \beta| > \delta\,\right]$; for any $\varepsilon, \delta > 0$ there exists $N$ such that $P_n < \varepsilon$ for all $n > N$.
Slutsky's Theorem
$\text{plim}_n\, b_n = c$ and $g(\cdot)$ continuous $\Rightarrow\ \text{plim}_n\, g(b_n) = g(c)$
Theorem
$E[\hat{\theta}] \to \theta$ and $V[\hat{\theta}] \to 0\ \Rightarrow\ \text{plim}\, \hat{\theta} = \theta$
Weak Law of Large Numbers
$\frac{1}{n} \sum_{i=1}^{n} z_i \to \mu$
$\frac{1}{n} \sum_{i=1}^{n} x_i \varepsilon_i = \frac{1}{n} X'\varepsilon \to 0$
Applied to OLS:
$b_n = (X'X)^{-1} X'y$,  $y = X\beta + \varepsilon\ \Rightarrow\ b_n = \beta + (X'X)^{-1} X'\varepsilon$
$\text{plim}_n\, b_n = \beta + \text{plim}\, (X'X)^{-1} X'\varepsilon = \beta + \left( \text{plim}\, \tfrac{X'X}{n} \right)^{-1} \text{plim}\, \tfrac{1}{n} X'\varepsilon = \beta + Q^{-1} \cdot 0 = \beta$
$\text{plim}\, s_n^2 = \text{plim}\, \tfrac{1}{n-k}\, e'e = \text{plim}\, \tfrac{1}{n-k}\, \varepsilon' M \varepsilon$
$= \text{plim}\, \tfrac{n}{n-k} \left[ \tfrac{\varepsilon'\varepsilon}{n} - \tfrac{\varepsilon' X}{n} \left( \tfrac{X'X}{n} \right)^{-1} \tfrac{X'\varepsilon}{n} \right]$
$= \sigma^2 - 0'\, Q^{-1}\, 0 = \sigma^2$  (if $E[\varepsilon_i^2] = \sigma^2$)
4.1.4. Asymptotic Normality
Central Limit Theorem
$\sqrt{n}\,(\bar{z} - \mu) \xrightarrow{d} N\!\left(0, \sigma^2\right)$
$\tfrac{1}{\sqrt{n}}\, X'\varepsilon \xrightarrow{d} N\!\left(0,\ \sigma^2 Q\right)$
Cramér's Theorem
$z_n \xrightarrow{d} N(\mu, \Sigma)$ and $\text{plim}_n\, A_n = A$
$\Rightarrow\ A_n z_n \xrightarrow{d} N\!\left(A\mu,\ A \Sigma A'\right)$
Asymptotic Distribution of OLS Estimator
$b_n = \beta + (X'X)^{-1} X'\varepsilon$
$\sqrt{n}\,(b_n - \beta) = \left( \tfrac{X'X}{n} \right)^{-1} \cdot \tfrac{1}{\sqrt{n}}\, X'\varepsilon$, with $\left( \tfrac{X'X}{n} \right)^{-1} \to Q^{-1}$ and $\tfrac{1}{\sqrt{n}}\, X'\varepsilon \xrightarrow{d} N\!\left(0, \sigma^2 Q\right)$
$\Rightarrow\ \sqrt{n}\,(b_n - \beta) \xrightarrow{d} N\!\left(0,\ \sigma^2 Q^{-1}\right)$
4.1.5. Simulation examples
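The sketch below (Python; a simple one-regressor DGP of my own choosing, not one of the book's examples) illustrates both results: $b_n$ collapses on $\beta$ as $n$ grows, while $\sqrt{n}\,(b_n - \beta)$ keeps a stable, approximately normal spread.

```python
# Sketch: consistency and asymptotic normality of OLS with stochastic regressors.
# Assumed DGP: y_i = 1 + 0.5 x_i + eps_i, with x_i ~ N(0,1) and eps_i ~ N(0,1) independent.
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([1.0, 0.5])

def ols(n):
    x = rng.normal(size=n)
    y = beta[0] + beta[1] * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    return np.linalg.solve(X.T @ X, X.T @ y)

for n in (25, 100, 10_000):
    b = np.array([ols(n) for _ in range(2000)])
    # mean(b_n) stays near beta, std(b_n) shrinks, sqrt(n)*std(b_n) stabilises
    print(n, b.mean(axis=0).round(3), (np.sqrt(n) * b.std(axis=0)).round(3))
```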
4.2. Non-linear Regression
4.2.1. Motivation
Functional Form and Interpretation of Coefficients
Linear:
$y = \dots + x_k \beta_k + \dots + \varepsilon$
$\beta_k = \frac{\partial E[y \mid x]}{\partial x_k}$
Log-linear
Cobb-Douglas Production Function
$y = \beta_1 L^{\beta_2} K^{\beta_3} e^{\varepsilon}$
$\beta_k = \frac{\partial \log E[y \mid x]}{\partial \log x_k} = \frac{\partial y / y}{\partial x_k / x_k} = \text{elasticity}$
$\beta_2$ = labour elasticity of output:
a 1% increase in Labour leads to a $\beta_2$% increase in output.
NB Constant Returns to Scale: $L$ and $K$ increased by a factor $c$:
$\beta_1 (cL)^{\beta_2} (cK)^{\beta_3} e^{\varepsilon} = c^{\beta_2 + \beta_3}\, y$
CRS: $H_0: \beta_2 + \beta_3 = 1$
Lin-log
$y = \dots + \log(x_k)\, \beta_k + \dots + \varepsilon$
$\beta_k = \frac{\partial y}{\partial x_k / x_k}$
Elasticity not constant
Log-lin
$y = \beta_1 e^{x \beta_2}$
$\log(y) = \dots + x_k \beta_k + \dots + \varepsilon$
$\beta_2$: proportional change in $y$ for a unit change in $x$
$100 \cdot \beta_2$ $\approx$ % change in $y$ for a unit change in $x$ (but only for small $\beta_2$ & no estimation uncertainty)
4.2.2. Non-linear Least Squares
$y$ and Box-Cox transformed $x$:
$y = \beta_1 + \beta_2\, \frac{x^{\beta_3} - 1}{\beta_3}$
$y = \beta_1 + \beta_2\, \frac{x^{\beta_3} - 1}{\beta_3} + \varepsilon$
Least Squares
$S(\beta) = \sum_{i=1}^{n} \left( y_i - f(x_i, \beta) \right)^2 = (y - F)'(y - F) = e'e$
First Order Conditions
$\frac{\partial S(\beta)}{\partial \beta} = 2\, \frac{\partial e'}{\partial \beta}\, e = -2 \sum_{i=1}^{n} \left( y_i - f(x_i, \beta) \right) \frac{\partial f(x_i, \beta)}{\partial \beta} = 0$
Note the difference between the FOC of NLS and of LS after a log transform.
Model
$y = e^{x\beta} + \varepsilon$
Model in logs
$\log(y) = x\beta + \varepsilon$
NLS in the original model
$\frac{\partial S(\beta)}{\partial \beta} = -2 \sum_{i=1}^{n} \left( y_i - e^{x_i \beta} \right) e^{x_i \beta}\, x_i = 0$
$\Rightarrow$ solve numerically
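A small numerical sketch of the contrast (Python; the exponential model, noise level and starting value are assumptions for illustration only): NLS minimises $S(\beta)$ in the original model, while OLS is applied to the log-transformed model, and the two generally differ because they solve different first order conditions.

```python
# Sketch: NLS in y = exp(x*beta) + eps versus OLS in log(y) = x*beta + eps.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
n, beta_true = 200, 0.8
x = rng.uniform(0.0, 2.0, size=n)
y = np.exp(beta_true * x) + 0.1 * rng.normal(size=n)       # additive error

# NLS: minimise the sum of squared residuals numerically.
b_nls = least_squares(lambda b: y - np.exp(x * b[0]), x0=[0.1]).x[0]

# OLS in the transformed model (no intercept): a different statistical model.
b_log = np.sum(x * np.log(y)) / np.sum(x * x)

print("NLS:", round(b_nls, 4), " OLS on logs:", round(b_log, 4))
```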
OLS in transformed model
$\frac{\partial S_{\log}(\beta)}{\partial \beta} = -2 \sum_{i=1}^{n} \left( \log(y_i) - x_i \beta \right) x_i = 0$
$\Rightarrow\ X' \log y - X'X\beta = 0$
$b = (X'X)^{-1} X' \log y$
Identified Parameters
$\theta \neq \theta^{*}\ \Rightarrow\ pdf(y \mid \theta) \neq pdf(y \mid \theta^{*})$
$\theta \neq \theta^{*}\ \Rightarrow\ S(\theta) \neq S(\theta^{*})$
Example of non-identification:
$y = \beta_1 e^{\beta_2 + x\beta_3} = \beta_1^{*} e^{x\beta_3}$  with $\beta_1^{*} = \beta_1 e^{\beta_2}$
$= e^{\beta_2^{*} + x\beta_3}$  with $\beta_2^{*} = \log(\beta_1) + \beta_2$
There are an infinite number of $\beta_1$ and $\beta_2$ that lead to the same model (DGP).
Statistical Properties of NLS
$Var(b_{NLS}) \approx s^2\, (\tilde{X}'\tilde{X})^{-1}$
$s^2 = \frac{e'e}{n-k} = \frac{1}{n-k} \sum_{i=1}^{n} \left( y_i - f(x_i, b) \right)^2$
$\tilde{X}_{n \times k} = \frac{\partial F}{\partial \beta'} = \begin{pmatrix} \partial f(x_1, \beta)/\partial \beta' \\ \vdots \\ \partial f(x_n, \beta)/\partial \beta' \end{pmatrix}$
If $f = X\beta$ then $\tilde{X} = X$.
Conditions for Asymptotic Properties
Unique minimum
$f$ not strangely behaved
Let $\beta_0$ correspond to the DGP
$\frac{1}{n} S(\beta) = \frac{1}{n} \sum_{i=1}^{n} \Big( \underbrace{y_i - f(x_i, \beta_0)}_{\varepsilon_i} + f(x_i, \beta_0) - f(x_i, \beta) \Big)^2$
$= \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i^2 + \frac{2}{n} \sum_{i=1}^{n} \varepsilon_i \left( f(x_i, \beta_0) - f(x_i, \beta) \right) + \frac{1}{n} \sum_{i=1}^{n} \left( f(x_i, \beta_0) - f(x_i, \beta) \right)^2$
Three terms:
$\frac{1}{n} \sum_{i=1}^{n} \varepsilon_i^2 \to \sigma^2$
$\frac{2}{n} \sum_{i=1}^{n} \varepsilon_i \left( f(x_i, \beta_0) - f(x_i, \beta) \right) \approx \frac{2}{n} \sum_{i=1}^{n} \varepsilon_i\, \tilde{x}_i' (\beta_0 - \beta) = \frac{2}{n} (\beta_0 - \beta)' \sum_{i=1}^{n} \varepsilon_i\, \tilde{x}_i \to 0$
$\text{plim}\ \frac{1}{n} \sum_{i=1}^{n} \left( f(x_i, \beta_0) - f(x_i, \beta) \right)^2 = 0 \iff \beta = \beta_0$
$\sqrt{n}\left( b_{NLS} - \beta_0 \right) \xrightarrow{d} N\!\left(0,\ \sigma^2 Q^{-1}\right)$,  $Q = \text{plim}\ \frac{\tilde{X}'\tilde{X}}{n}$
This asymptotic result gives an APPROXIMATION
$b_{NLS} \overset{approx}{\sim} N\!\left(\beta_0,\ s^2 (\tilde{X}'\tilde{X})^{-1}\right)$
$t = \frac{b_{NLS,i} - \beta_{0,i}}{\sqrt{s^2 \left[(\tilde{X}'\tilde{X})^{-1}\right]_{ii}}} \overset{approx}{\sim} t_{n-k}$
$F = \frac{(e_R' e_R - e'e)/g}{e'e/(n-k)} \overset{approx}{\sim} F(g,\ n-k)$
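A sketch of how these approximate variances and t-ratios can be computed once $b_{NLS}$ is available, using the Jacobian $\tilde{X}$ evaluated at the estimate (Python; the exponential model is reused purely as an assumed example):

```python
# Sketch: Var(b_NLS) ~ s^2 (Xtilde'Xtilde)^(-1), Xtilde = df/dbeta evaluated at b_NLS.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0.0, 2.0, size=n)
y = np.exp(0.8 * x) + 0.1 * rng.normal(size=n)

f = lambda b: np.exp(x * b[0])
b = least_squares(lambda b: y - f(b), x0=[0.1]).x

e = y - f(b)
Xt = (x * np.exp(x * b[0])).reshape(-1, 1)    # Xtilde: derivative of f w.r.t. beta
s2 = (e @ e) / (n - 1)                        # k = 1 parameter here
se = np.sqrt(s2 * np.linalg.inv(Xt.T @ Xt)[0, 0])
print("b:", round(float(b[0]), 4), "se:", round(float(se), 4), "t (H0: beta=0):", round(float(b[0] / se), 2))
```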
Last Week
Outline
Asymptotics: LLN, Mann-Wald, Asymptotic Normality
Nonlinear Least Squares
Identification: $\theta \neq \theta^{*}\ \Rightarrow$ models different. Cf. $X$ multicollinear.
This week:
Finish non-linear optimization
Maximum Likelihood
4.2.3. Non-linear Optimization
Maximize/Minimize $F(\theta)$
$F(\theta)$: criterion / objective function
$G(\theta) = \frac{\partial F(\theta)}{\partial \theta}$,  $H(\theta) = \frac{\partial^2 F(\theta)}{\partial \theta\, \partial \theta'}$
F.O.C.
$G(\theta) = 0$
Start at $\theta_1$ and find successive $\theta_k$.
Newton-Raphson
$G(\theta) \approx G\!\left(\hat{\theta}_k\right) + H\!\left(\hat{\theta}_k\right) \left( \theta - \hat{\theta}_k \right)$
$G(\theta) = 0\ \Rightarrow\ G\!\left(\hat{\theta}_k\right) + H\!\left(\hat{\theta}_k\right) \left( \hat{\theta}_{k+1} - \hat{\theta}_k \right) = 0$
$\hat{\theta}_{k+1} = \hat{\theta}_k - H\!\left(\hat{\theta}_k\right)^{-1} G\!\left(\hat{\theta}_k\right)$
Requires the Hessian.
Gauss-Newton: requires only first derivatives.
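A minimal Newton-Raphson sketch in Python (the criterion below is an arbitrary assumed example, not one from the course): each step solves $H(\hat{\theta}_k)\,\text{step} = G(\hat{\theta}_k)$ and subtracts the step.

```python
# Sketch: Newton-Raphson, theta_{k+1} = theta_k - H(theta_k)^(-1) G(theta_k).
import numpy as np

def newton_raphson(grad, hess, theta0, tol=1e-10, max_iter=50):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(theta), grad(theta))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Assumed example: minimise F(t) = exp(t0) + t0^2 + (t1 - 1)^2.
grad = lambda t: np.array([np.exp(t[0]) + 2.0 * t[0], 2.0 * (t[1] - 1.0)])
hess = lambda t: np.array([[np.exp(t[0]) + 2.0, 0.0], [0.0, 2.0]])
print(newton_raphson(grad, hess, [0.0, 0.0]))
```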
Gauss-Newton
Linearize $f$, so that the objective function becomes quadratic:
$F = F_k + G_k' \left( \theta - \hat{\theta}_k \right) + R$
$y = f(x, \beta) + \varepsilon$
$S(\beta) = \sum_{i=1}^{n} \left( y_i - f(x_i, \beta) \right)^2$
$f(x, \beta) \approx f_k(x) + g_k(x)' \left( \beta - \hat{b}_k \right) = f_k(x) + g_k(x)'\beta - g_k(x)'\hat{b}_k$
with $f_k(x) = f\!\left(x, \hat{b}_k\right)$ and $g_k(x) = \partial f(x, \beta)/\partial \beta$ evaluated at $\hat{b}_k$.
Replacing gives a new criterion function
$S_k(\beta) = \sum_{i=1}^{n} \left( y_i - f_k(x_i) - g_k(x_i)' \left( \beta - \hat{b}_k \right) \right)^2$
$= \sum_{i=1}^{n} \Big( \underbrace{y_i - f_k(x_i) + g_k(x_i)'\hat{b}_k}_{z_{ki}} - g_k(x_i)'\beta \Big)^2$
$= \sum_{i=1}^{n} \left( z_{ki} - \tilde{x}_{ki}'\beta \right)^2 = \left( z_k - \tilde{X}_k \beta \right)'\left( z_k - \tilde{X}_k \beta \right) = S(\beta)$ of OLS!
$\Rightarrow\ \hat{\beta} = \left( \tilde{X}_k' \tilde{X}_k \right)^{-1} \tilde{X}_k' z_k$
$\tilde{X}_{k,\, n \times k} = g_k(x) = \frac{\partial F}{\partial \beta'}$ evaluated at $\hat{b}_k$
$z_{ki} = y_i - f_k(x_i) + g_k(x_i)'\hat{b}_k$
$e_{ki} = y_i - f\!\left(x_i, \hat{b}_k\right)$,  $e_k = y - F_k$
$z_k = \tilde{X}_k \hat{b}_k + e_k$
$\hat{b}_{k+1} = \left( \tilde{X}_k' \tilde{X}_k \right)^{-1} \tilde{X}_k' z_k = \left( \tilde{X}_k' \tilde{X}_k \right)^{-1} \tilde{X}_k' \left( \tilde{X}_k \hat{b}_k + e_k \right) = \hat{b}_k + \left( \tilde{X}_k' \tilde{X}_k \right)^{-1} \tilde{X}_k' e_k$
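A sketch of this Gauss-Newton update for the exponential regression used earlier (Python; the model and starting value are assumptions for illustration): only first derivatives of $f$ are needed.

```python
# Sketch: Gauss-Newton, b_{k+1} = b_k + (Xtilde_k'Xtilde_k)^(-1) Xtilde_k' e_k.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 2.0, size=200)
y = np.exp(0.8 * x) + 0.1 * rng.normal(size=200)

b = 0.1                                    # starting value b_1
for k in range(25):
    f = np.exp(x * b)                      # f(x_i, b_k)
    e = y - f                              # residuals e_k
    Xt = (x * f).reshape(-1, 1)            # Xtilde_k = df/dbeta at b_k
    step = np.linalg.solve(Xt.T @ Xt, Xt.T @ e)[0]
    b = b + step
    if abs(step) < 1e-12:
        break
print("b_GN:", round(float(b), 4), "iterations:", k + 1)
```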
4.2.4. Lagrange Multiplier Test
$y = X_1 \beta_1 + X_2 \beta_2 + \varepsilon$
$H_0: \beta_2 = 0_{g \times 1}$
Lagrange Function
$\Lambda(\beta_1, \beta_2, \lambda) = S(\beta_1, \beta_2) + 2\lambda' (\beta_2 - 0)$
$S = e'e = (y - X_1\beta_1 - X_2\beta_2)'(y - X_1\beta_1 - X_2\beta_2)$
F.O.C.
$\frac{\partial \Lambda}{\partial \beta_1} = -2 X_1' (y - X_1 b_1 - X_2 b_2) = 0$
$\frac{\partial \Lambda}{\partial \beta_2} = -2 X_2' (y - X_1 b_1 - X_2 b_2) + 2\hat{\lambda} = 0$
$\frac{\partial \Lambda}{\partial \lambda} = b_2 = 0$
$\Rightarrow\ 0 = -2 X_1' (y - X_1 b_1)$
$b_{1,R} = (X_1' X_1)^{-1} X_1' y$
$\hat{\lambda} = X_2' e_R$
Note (2): exhibit 4.8.
$\hat{\lambda}$ is the shadow price of the restriction:
$\hat{\lambda} = -\frac{1}{2}\, \frac{\partial S(b_{1,R}, 0)}{\partial \beta_2}$
Derivation of the LM-statistic
Under $H_0: \beta_2 = 0$:
$e_R = y - X_1 b_{1,R} = M_1 y = M_1 (X_1\beta_1 + X_2 \cdot 0 + \varepsilon) = M_1 \varepsilon$  (with $M_1 = I - X_1(X_1'X_1)^{-1}X_1'$)
$\hat{\lambda} = X_2' e_R = X_2' M_1 \varepsilon \sim N\!\left(0,\ \sigma^2 X_2' M_1 X_2\right)$
$\hat{\lambda}' \left( X_2' M_1 X_2 \right)^{-1} \hat{\lambda} / \sigma^2 \sim \chi^2_g$
$\hat{\lambda}' \left( X_2' M_1 X_2 \right)^{-1} \hat{\lambda} / \hat{\sigma}^2 \approx \chi^2_g$
Steps:
1. Estimate the restricted model $y = X_1 \beta_1 + \varepsilon$.
2. Auxiliary regression of $e_R$ from step 1:
$e_R = X_1 \gamma_1 + X_2 \gamma_2 + \text{residuals} = (X_1 : X_2) \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix} + \text{residuals}$
3. $LM = n R^2 \approx \chi^2_g$
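A sketch of the three steps in the linear case (Python; the DGP and the choice $g = 2$ are assumptions, and $H_0$ holds by construction, so LM should look like a $\chi^2_2$ draw):

```python
# Sketch: LM test of H0: beta2 = 0 via the auxiliary regression, LM = n * R^2.
import numpy as np

rng = np.random.default_rng(4)
n = 500
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])    # restricted regressors
X2 = rng.normal(size=(n, 2))                              # g = 2 tested regressors
y = X1 @ np.array([1.0, 0.5]) + rng.normal(size=n)        # H0 true in this DGP

# Step 1: restricted model.
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
e_R = y - X1 @ b1

# Step 2: auxiliary regression of e_R on (X1 : X2).
Z = np.column_stack([X1, X2])
resid = e_R - Z @ np.linalg.solve(Z.T @ Z, Z.T @ e_R)
R2 = 1.0 - (resid @ resid) / (e_R @ e_R)   # e_R has mean zero (intercept in X1)

# Step 3: LM = n * R^2, compare with chi-squared(g = 2).
print("LM =", round(n * R2, 3))
```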
Proof Step 3
$LM = \frac{\hat{\lambda}' \left( X_2' M_1 X_2 \right)^{-1} \hat{\lambda}}{\hat{\sigma}^2} = \frac{e_R' X_2 \left( X_2' M_1 X_2 \right)^{-1} X_2' e_R}{e_R' e_R / n}$
Explained sum of squares in the auxiliary regression:
$SSE = e_R' X (X'X)^{-1} X' e_R$;  but $X_1' e_R = 0$ (normal equations of the restricted model)
$= \left( 0 : e_R' X_2 \right) (X'X)^{-1} \begin{pmatrix} 0 \\ X_2' e_R \end{pmatrix} = e_R' X_2 \left( X_2' M_1 X_2 \right)^{-1} X_2' e_R$
$e_R' e_R$ is the SST in the auxiliary model, and $e_R' e_R / n = \hat{\sigma}^2$, so
$LM = \frac{SSE}{SST / n} = n R^2$
Intuition
If $X_2$ are irrelevant, then they should not be able to explain anything in the residuals.
Nonlinear Regression
Analogue for the non-linear case. Steps:
1. Estimate the restricted model by NLS.
2. Auxiliary regression of $e_R$ from step 1 on all explanatory variables $g$ (the derivatives of $f$).
3. $LM = n R^2 \approx \chi^2_g$
4.2.5. Illustration : Coffee Sales
4.3. Maximum Likelihood
4.3.1. Motivation
4.3.2. Maximum Likelihood Estimation
Density of $y_i$ given $x_i$ and parameters $\theta$: $p(y_i \mid x_i; \theta)$
Joint density of $n$ independent observations is the product:
$p(y \mid \theta, X) = \prod_{i=1}^{n} p(y_i \mid x_i; \theta)$
Log-likelihood function of the parameters $\theta$:
$\log L = \log \prod_{i=1}^{n} p(y_i \mid x_i; \theta) = \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)$
$\ell(\theta) = \log L(\theta) = \log p(y \mid X; \theta)$
Maximize the likelihood as a function of $\theta$
First and second order derivatives:
$G(\theta)_{k \times 1} = \frac{\partial \ell(\theta; y)}{\partial \theta}$: score
$H_{k \times k} = \frac{\partial^2 \ell(\theta; y)}{\partial \theta\, \partial \theta'}$: Hessian
Independence:
$G(\theta)_{k \times 1} = \sum_{i=1}^{n} \frac{\partial \ell_i}{\partial \theta}$,  $H_{k \times k} = \sum_{i=1}^{n} \frac{\partial^2 \ell_i}{\partial \theta\, \partial \theta'}$
First Order Conditions
Non-linear $\Rightarrow$ numerical solution:
$G(\theta)_{k \times 1} = \frac{\partial \ell(\theta; y)}{\partial \theta} = 0\ \Rightarrow\ \hat{\theta}_{ML}$
Second Order Conditions: Maximum & Information Matrix
$H\!\left(\hat{\theta}\right) = \frac{\partial^2 \ln L}{\partial \theta\, \partial \theta'}$: negative definite
Log-likelihood function concave
4.3.3. Asymptotic Properties
$J_n(\theta_0) = E\!\left[ \frac{\partial \ell(\theta; y)}{\partial \theta}\, \frac{\partial \ell(\theta; y)}{\partial \theta'} \right] = -E[H]$
iff the information matrix equality holds
$J_0 = \lim\ \frac{1}{n}\, J_n(\theta_0)$
$\sqrt{n}\left( \hat{\theta}_{ML} - \theta_0 \right) \xrightarrow{d} N\!\left(0,\ J_0^{-1}\right)$
Recall:
Log-likelihood: $\ell(\theta) = \log L(\theta) = \log p(y; \theta) = \log p(y \mid X; \theta)$
Score: $G(\theta)_{k \times 1} = \frac{\partial \ell(\theta; y)}{\partial \theta}$
Estimator: first order conditions $G(\theta) = 0\ \Rightarrow\ \hat{\theta}_{ML}$
Further details hidden in book
Fix $\theta$: $G(\theta, y)$ is a random variable with expectation 0.
$1 = \int p(y; \theta)\, dy = \int e^{\ell(\theta; y)}\, dy$
$\frac{\partial}{\partial \theta} \left[\ \cdot\ \right]\ \Rightarrow\ E[G(\theta)] = 0$: the score has expectation 0.
Details:
Score has Expectation 0
$0 = \frac{\partial}{\partial \theta}\, 1 = \frac{\partial}{\partial \theta} \int e^{\ell(\theta; y)}\, dy = \int \frac{\partial}{\partial \theta}\, e^{\ell(\theta; y)}\, dy = \int \frac{\partial \ell(\theta; y)}{\partial \theta}\, e^{\ell(\theta; y)}\, dy$
$= \int G(\theta)\, e^{\ell(\theta; y)}\, dy = \int G(\theta)\, pdf(y)\, dy$
$\Rightarrow\ E[G(\theta)] = 0$
Information Matrix Equality
Hessian: $H_{k \times k} = \frac{\partial^2 \ell(\theta; y)}{\partial \theta\, \partial \theta'}$
Same trick as for the score function:
$0 = \frac{\partial}{\partial \theta} \int \frac{\partial \ell(\theta; y)}{\partial \theta'}\, e^{\ell(\theta; y)}\, dy = \int \frac{\partial}{\partial \theta} \left[ \frac{\partial \ell(\theta; y)}{\partial \theta'}\, e^{\ell(\theta; y)} \right] dy$
$= \int \left[ \frac{\partial^2 \ell(\theta; y)}{\partial \theta\, \partial \theta'} + \frac{\partial \ell(\theta; y)}{\partial \theta}\, \frac{\partial \ell(\theta; y)}{\partial \theta'} \right] e^{\ell(\theta; y)}\, dy$
Rearranging:
$\int \frac{\partial \ell(\theta; y)}{\partial \theta}\, \frac{\partial \ell(\theta; y)}{\partial \theta'}\, e^{\ell(\theta; y)}\, dy = -\int \frac{\partial^2 \ell(\theta; y)}{\partial \theta\, \partial \theta'}\, e^{\ell(\theta; y)}\, dy$
$\int G(\theta)\, G(\theta)'\, pdf(y)\, dy = -\int H(\theta)\, pdf(y)\, dy$
$Var(G) = -E[H] = J_n$
This is called the Information Matrix Equality.
Distribution of the Score
$G(\theta) \overset{approx}{\sim} N\!\left(0,\ J_n(\theta)\right)$
Mean value expansion:
$G(\theta) = G\!\left(\hat{\theta}\right) + \left. \frac{\partial G(\theta)}{\partial \theta'} \right|_{\bar{\theta} \in (\hat{\theta}, \theta)} \left( \theta - \hat{\theta} \right)$
$\hat{\theta} - \theta \approx -H^{-1} G(\theta) \overset{approx}{\sim} N\!\left(0,\ H^{-1} J_n H^{-1}\right) \overset{approx}{\sim} N\!\left(0,\ -H^{-1}\right)$
or $\sqrt{n}\left( \hat{\theta} - \theta \right) \xrightarrow{d} N\!\left(0,\ J_0^{-1}\right)$
ML in linear model
$y = X\beta + \varepsilon$,  $\varepsilon \sim N\!\left(0,\ \sigma^2 I_n\right)$
$y \sim N\!\left(X\beta,\ \sigma^2 I_n\right)$
$pdf\!\left(y;\ \beta, \sigma^2, X\right) = \frac{1}{\left(2\pi\sigma^2\right)^{n/2}} \exp\!\left( -\frac{1}{2\sigma^2}\, (y - X\beta)'(y - X\beta) \right)$
$\ell\!\left(\beta, \sigma^2\right) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log\!\left(\sigma^2\right) - \frac{1}{2\sigma^2}\, (y - X\beta)'(y - X\beta)$
First Order Conditions
$0 = \frac{\partial \ell}{\partial \beta} = \frac{1}{\sigma^2}\, X'(y - X\beta) = \frac{1}{\sigma^2}\, X'e$
$0 = \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\left(\sigma^2\right)^2}\, (y - X\beta)'(y - X\beta) = -\frac{n}{2\sigma^2} + \frac{1}{2\left(\sigma^2\right)^2}\, e'e$
$\hat{\sigma}^2 = e'e / n$,  $\hat{\beta} = (X'X)^{-1} X'y$
Note:
Trick: put the solution $\hat{\sigma}^2 = e'e/n$ back into the likelihood function to obtain a function that only depends on $\beta$:
$\ell(\beta) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log\!\left( \frac{e'e}{n} \right) - \frac{1}{2\, e'e/n}\, e'e\ \propto\ -\frac{n}{2} \log\!\left(e'e\right)$
The $e$'s depend on $\beta$, and maximizing $\ell(\beta)$ is minimizing $e'e$ = OLS.
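A sketch confirming this numerically (Python; the design and parameter values are assumed for illustration): maximising the Gaussian log-likelihood reproduces the OLS coefficients, and the ML variance estimate is $e'e/n$.

```python
# Sketch: ML in the linear model equals OLS; sigma2_ML = e'e/n.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.7]) + 0.5 * rng.normal(size=n)

def negloglik(theta):
    beta, s2 = theta[:2], np.exp(theta[2])     # sigma^2 = exp(.) keeps it positive
    e = y - X @ beta
    return 0.5 * n * np.log(2.0 * np.pi * s2) + 0.5 * (e @ e) / s2

res = minimize(negloglik, x0=np.zeros(3), method="BFGS")
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
e_ols = y - X @ b_ols
print("ML beta:", res.x[:2].round(4), " OLS beta:", b_ols.round(4))
print("ML sigma2:", round(float(np.exp(res.x[2])), 4), " e'e/n:", round(float(e_ols @ e_ols / n), 4))
```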
4.3.4. Likelihood Ratio Test
Restrictions: $g$ non-linear restrictions
$H_0: r(\theta) = 0_{g \times 1}$
$LR = 2\left[ \log L\!\left(\hat{\theta}_1\right) - \log L\!\left(\tilde{\theta}_0\right) \right] \xrightarrow{d} \chi^2(g)$
Exhibit 4.13
In the linear model: concentrate the likelihood, $\hat{\sigma}^2 = \hat{\sigma}^2(\beta) = e'e/n$:
$\ell(\beta) = \text{constant} - \frac{n}{2} \log\!\left(e'e\right)$
restricted ($r(\beta) = 0_{g \times 1}$): $\ell(b_R) = \text{constant} - \frac{n}{2} \log\!\left(e_R' e_R\right)$
$LR = 2\left[ -\frac{n}{2} \log\!\left(e'e\right) + \frac{n}{2} \log\!\left(e_R' e_R\right) \right] = n \log\!\left( \frac{e_R' e_R}{e'e} \right)$
$LR = n \log\!\left( 1 - \frac{e'e}{e'e} + \frac{e_R' e_R}{e'e} \right) = n \log\!\left( 1 + \frac{g\, F}{n-k} \right)$
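A sketch of the statistic in the linear model (Python; the DGP is assumed and $H_0$ holds by construction, so LR should be roughly a $\chi^2(1)$ draw):

```python
# Sketch: LR = n * log(e_R'e_R / e'e) for H0: coefficient on x2 equals 0.
import numpy as np

rng = np.random.default_rng(6)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)            # x2 truly irrelevant

def ssr(X):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

X_u = np.column_stack([np.ones(n), x1, x2])         # unrestricted
X_r = np.column_stack([np.ones(n), x1])             # restricted
print("LR =", round(float(n * np.log(ssr(X_r) / ssr(X_u))), 3))   # ~ chi-squared(1)
```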
4.3.5. Wald Test
$H_0: r(\theta) = 0$
$r\!\left(\hat{\theta}_1\right) = 0 + R_0 \left( \hat{\theta}_1 - \theta_0 \right) + \text{remainder}$, with $R_0 = \partial r / \partial \theta'$
$\sqrt{n}\, r\!\left(\hat{\theta}_1\right) \xrightarrow{d} N\!\left(0,\ R_0\, J_0^{-1}\, R_0'\right)$
$W = n\, r\!\left(\hat{\theta}_1\right)' \left( R_0\, J_0^{-1}\, R_0' \right)^{-1} r\!\left(\hat{\theta}_1\right) \approx \chi^2(g)$
Exhibit 4.14
4.3.6. Lagrange Multiplier Test
$\max_\theta\ \ell(\theta) = \log L(\theta)$  s.t.  $r(\theta) = 0$
$\Lambda = \ell(\theta) - \lambda' r(\theta)$
$LM = \hat{\lambda}'\, \left[ Var\!\left(\hat{\lambda}\right) \right]^{-1} \hat{\lambda}$
$LM = \frac{\partial \ell}{\partial \theta}' \left[ -\frac{\partial^2 \ell}{\partial \theta\, \partial \theta'} \right]^{-1} \frac{\partial \ell}{\partial \theta}$  evaluated at $\theta = \hat{\theta}_0$
Exhibit 4.15
4.3.7. LM-test in the Linear Model
4.3.8. Remarks on Tests
4.3.9. Two Examples
Last Week ML
Probability model: $p(y \mid \theta) = F(y, \theta)$
Likelihood: $L(\theta \mid y) = F(y, \theta)$
Log-likelihood: $\ell(\theta) = \ln L(\theta \mid y)$
Score function: $G = \frac{\partial \ell}{\partial \theta}$
$G(\theta \mid y) = 0$: MLE
$G(y \mid \theta)$: statistic
$G(y \mid \theta_0) \overset{approx}{\sim} N(0, \mathcal{I})$
MLE
$\hat{\theta} - \theta \approx -H^{-1} G \overset{approx}{\sim} N\!\left(0,\ H^{-1} \mathcal{I}\, H^{-1}\right)$
LR
$\ell\!\left(\hat{\theta}\right) = \ell\!\left(\tilde{\theta}\right) + G\!\left(\tilde{\theta}\right)' \left(\hat{\theta} - \tilde{\theta}\right) + \frac{1}{2} \left(\hat{\theta} - \tilde{\theta}\right)' H\!\left(\tilde{\theta}\right) \left(\hat{\theta} - \tilde{\theta}\right)$
$\Rightarrow\ 2\left[ \ell\!\left(\hat{\theta}\right) - \ell\!\left(\tilde{\theta}\right) \right] \approx -\left(\hat{\theta} - \tilde{\theta}\right)' H\!\left(\tilde{\theta}\right) \left(\hat{\theta} - \tilde{\theta}\right) \overset{approx}{\sim} \chi^2(g)$
LM
Wald
4.4. GMM Generalized Method of Moments
4.4.1. Motivation
Method of Moments
OLS
ML
IV
4.4.2. GMM Estimation
Moment conditions $E[g(\theta)] = 0$ with sample analogue $\frac{1}{n} \sum_{i=1}^{n} g_i(\theta) = 0$
Method of Moments, i.i.d. case: $E[y_i] = \mu$
$E[y_i - \mu] = 0$
$\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{\mu} \right) = 0 \iff \iota'(y - \hat{\mu}\, \iota) = 0$
$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} y_i$
Least Squares
$y_i = x_i'\beta + \varepsilon_i$
$E[\varepsilon_i \mid x_i] = 0\ \Rightarrow\ cov(x_i, \varepsilon_i) = 0$
Moment conditions: $E[x_i\, \varepsilon_i] = 0$
$\frac{1}{n} \sum_{i=1}^{n} x_i\, e_i = \frac{1}{n}\, X'e = 0$
$\frac{1}{n} \sum_{i=1}^{n} x_i \left( y_i - x_i'\hat{\beta} \right) = \frac{1}{n} \left( X'y - X'X\hat{\beta} \right) = 0$
Maximum Likelihood
$\ell(\theta; y)$: log-likelihood
Moment conditions: $E\!\left[ \frac{\partial \ell(\theta; y)}{\partial \theta} \right] = 0$, with sample analogue $\frac{1}{n} \sum_{i=1}^{n} \frac{\partial \ell_i}{\partial \theta} = 0$
The Generalized Method of Moments
$E[g(\theta)] = 0$,  $g(\theta): \mathbb{R}^{p} \to \mathbb{R}^{m}$
$p$: # parameters,  $m$: # moment restrictions,  $n$: # observations
$p \le m \le n$
$p = m$: exact identification
$p < m$: over-identification
$p = m = n$: theoretically possible & can solve, but generally nonsensical results
If the observations are i.i.d.:
$G_n(\theta) = \sum_{i=1}^{n} g_i(\theta) = 0$
$m > p$: cannot solve, there is no $\theta$ such that all moment conditions are satisfied
$\Rightarrow$ choose the best one using an appropriate metric - distance - weights:
$\min_\theta\ \frac{1}{n}\, G_n'\, W\, G_n$   (4.63)
4.4.3. GMM Standard Errors
Central limit theorem
$\frac{1}{\sqrt{n}}\, G_n(\theta_0) \xrightarrow{d} N(0, J_0)$
$J_0 = E\!\left[ g_i(\theta_0)\, g_i'(\theta_0) \right]$
OLS:
$G_n(\beta) = \sum_{i=1}^{n} x_i \left( y_i - x_i'\hat{\beta} \right) = X'e$
$J_0 = E\!\left[ x_i\, \varepsilon_i^2\, x_i' \right] = \sigma^2\, E\!\left[ x_i x_i' \right] = \sigma^2\, \text{plim}\ \frac{1}{n} \sum_{i=1}^{n} x_i x_i' = \sigma^2\, \text{plim}\ \frac{1}{n}\, X'X = \sigma^2 Q$
$\frac{1}{\sqrt{n}}\, X'\varepsilon \xrightarrow{d} N\!\left(0,\ \sigma^2 Q\right)$
ML
$G_n(\theta_0) = \sum_{i=1}^{n} \frac{\partial \ell_i}{\partial \theta} = \left. \frac{\partial \ell}{\partial \theta} \right|_{\theta = \theta_0}$
$\frac{1}{\sqrt{n}} \left. \frac{\partial \ell}{\partial \theta} \right|_{\theta = \theta_0} \xrightarrow{d} N(0, J_0)$
$J_0 = E\!\left[ \frac{\partial \ell_i}{\partial \theta}\, \frac{\partial \ell_i}{\partial \theta'} \right] = -\lim\ \frac{1}{n} \sum_{i=1}^{n} E\!\left[ \frac{\partial^2 \ell_i}{\partial \theta\, \partial \theta'} \right]$: the information matrix
Asymptotic Distribution of GMM Estimator
First order approximation
$G_n(\theta) \approx G_n(\theta_0) + H_n(\theta_0)\, (\theta - \theta_0)$
$H = \frac{\partial G}{\partial \theta'}$: $m \times p$
First order condition of the minimum of (4.63), $\frac{1}{n}\, G_n' W G_n$:
$H' W \left( G_n(\theta_0) + H \left( \hat{\theta} - \theta_0 \right) \right) = 0$
$\hat{\theta} = \theta_0 - \left( H' W H \right)^{-1} H' W\, G_n(\theta_0)$
$\sqrt{n}\left( \hat{\theta} - \theta_0 \right) \xrightarrow{d} N(0, V)$
$V = \left( H' W H \right)^{-1} H' W\, J_0\, W' H \left( H' W H \right)^{-1}$
Choice of Weight Matrix W
$V = \left( H' W H \right)^{-1} H' W\, J_0\, W' H \left( H' W H \right)^{-1}$ depends on the choice of $W$.
$W$ arbitrary positive definite; $W = I$:
$V_{I} = \left( H' H \right)^{-1} H'\, J_0\, H \left( H' H \right)^{-1}$
Information in moments that are not very accurate is less valuable $\Rightarrow$ less weight.
Weights inversely proportional to the variance matrix: $W = J_0^{-1}$:
$V_{eff} = \left( H' J_0^{-1} H \right)^{-1} H' J_0^{-1}\, J_0\, J_0^{-1} H \left( H' J_0^{-1} H \right)^{-1} = \left( H' J_0^{-1} H \right)^{-1}$
$V_{I} - V_{eff}$ is positive semi-definite.
$V_{eff}\!\left(\hat{\theta}_{GMM}\right) = \left( H' J_0^{-1} H \right)^{-1}$: small if $J_0$ is small and $H = \partial G / \partial \theta'$ is large.
OLS:
$V_{OLS} = \left( H' J_0^{-1} H \right)^{-1} = \left( Q \left(\sigma^2 Q\right)^{-1} Q \right)^{-1} = \sigma^2 Q^{-1} \approx \sigma^2 \left( \frac{1}{n}\, X'X \right)^{-1}$
ML:
$V_{ML} = \left( H' J_0^{-1} H \right)^{-1} = \left( J_0\, J_0^{-1}\, J_0 \right)^{-1} = J_0^{-1}$
Iterative Choice of Weights
$J_0$ unknown, so estimate it iteratively:
$J_n = \sum_{i=1}^{n} g_i\!\left(\hat{\theta}\right) g_i'\!\left(\hat{\theta}\right)$,  $H_n = \sum_{i=1}^{n} \frac{\partial g_i\!\left(\hat{\theta}\right)}{\partial \theta'}$
$Var\!\left(\hat{\theta}\right) \approx \frac{1}{n}\, \hat{V} = \left( H_n'\, J_n^{-1}\, H_n \right)^{-1}$
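A sketch of the two-step recipe for a simple over-identified instrumental-variables problem (Python; the DGP, the instruments and the moment function $g_i(\beta) = z_i (y_i - x_i \beta)$ are assumptions chosen only to show the mechanics of $W = J_n^{-1}$):

```python
# Sketch: two-step GMM, moments g_i(beta) = z_i*(y_i - x_i*beta), m = 2, p = 1.
import numpy as np

rng = np.random.default_rng(7)
n = 1000
z = rng.normal(size=(n, 2))                                    # instruments
u = rng.normal(size=n)
x = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)    # endogenous regressor
y = 0.7 * x + u                                                # true beta = 0.7

def gmm_step(W):
    # Linear moments: G_n(beta) = Z'y - (Z'x)*beta and H = -Z'x, so the
    # minimiser of G_n' W G_n has a closed form.
    Zy, Zx = z.T @ y, z.T @ x
    return (Zx @ W @ Zy) / (Zx @ W @ Zx)

beta1 = gmm_step(np.eye(2))                      # step 1: W = I
g_i = z * (y - x * beta1)[:, None]               # g_i at the first-step estimate
J_n = g_i.T @ g_i                                # sum of g_i g_i'
beta2 = gmm_step(np.linalg.inv(J_n))             # step 2: W = J_n^(-1)

H_n = -(z * x[:, None]).sum(axis=0).reshape(2, 1)            # sum of dg_i/dbeta
se = np.sqrt(np.linalg.inv(H_n.T @ np.linalg.inv(J_n) @ H_n)[0, 0])
print("step 1:", round(float(beta1), 4), " step 2:", round(float(beta2), 4), " se:", round(float(se), 4))
```

With $m > p$, the value of $G_n' J_n^{-1} G_n$ at the second-step estimate can then be compared with a $\chi^2_{m-p}$ distribution, which is the over-identification test on the next slide.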
Test of Moment Conditions
Test for overidentifying moment restrictions
$G_n'\, J_n^{-1}\, G_n \overset{approx}{\sim} \chi^2_{m-p}$
NB in the exactly identified case $m = p$: $G'WG = 0$.
4.4.4. Quasi-ML
Moment conditions
$E\!\left[ \frac{\partial \ell(\theta; y)}{\partial \theta} \right] = 0$
Likelihood not correct (misspecified): the Information Matrix Equality does not hold,
$J_0 \neq -H_0$
Can still use
$Var\!\left(\hat{\theta}\right) \approx \left( H_n'\, J_n^{-1}\, H_n \right)^{-1}$
$J_n = \sum_{i=1}^{n} g_i\!\left(\hat{\theta}\right) g_i'\!\left(\hat{\theta}\right)$,  $H_n = \sum_{i=1}^{n} \frac{\partial g_i\!\left(\hat{\theta}\right)}{\partial \theta'}$
4.4.5. GMM in Simple Regression
See book for details
4.4.6. Illustration : Stock Market Returns
Consumption Based Asset Pricing Model
Consider a representative agent making decisions about
Consumption
Investment
Maximize discounted utility
$E\!\left[ \sum_{i=0}^{\infty} \delta^{i}\, U(c_{t+i}) \,\Big|\, I_t \right]$
$I_t$: information set available at time $t$
Budget constraint:
$c_t + \sum_{j=1}^{N} p_{j,t}\, q_{j,t} = w_t + \sum_{j=1}^{N} p_{j,t}\, q_{j,t-s_j}$
$c$: consumption,  $p$: price,  $q_{j,t}$: quantity of asset $j$ bought in $t$
The optimal path of consumption and investment satisfies
$p_{j,t}\, U'(c_t) = \delta^{s_j}\, E\!\left[ r_{j,t+s_j}\, U'\!\left(c_{t+s_j}\right) \,\big|\, I_t \right]$
$U'(c)$: marginal utility of consumption
Utility lost by foregoing consumption in $t$ to buy 1 unit of asset $j$: $p_{j,t}\, U'(c_t)$, which equals
the value in period $t$ of the expected utility gained from consuming the return on the investment in period $t + s_j$:
$\delta^{s_j}\, E\!\left[ r_{j,t+s_j}\, U'\!\left(c_{t+s_j}\right) \,\big|\, I_t \right]$
$\Rightarrow\ E\!\left[ \delta^{s_j}\, \frac{r_{j,t+s_j}}{p_{j,t}}\, \frac{U'\!\left(c_{t+s_j}\right)}{U'(c_t)} - 1 \,\Big|\, I_t \right] = 0$
Need an explicit choice of utility function.
Hansen and Singleton (1982)
Constant Relative Risk Aversion (CRRA)
$U(c_t) = \frac{c_t^{1-\gamma}}{1-\gamma}$
$E\!\left[ \delta^{s_j}\, \frac{r_{j,t+s_j}}{p_{j,t}} \left( \frac{c_{t+s_j}}{c_t} \right)^{-\gamma} - 1 \,\Big|\, I_t \right] = 0$
$u_{j,t} = \delta^{s_j}\, \frac{r_{j,t+s_j}}{p_{j,t}} \left( \frac{c_{t+s_j}}{c_t} \right)^{-\gamma} - 1$
$E\!\left[ u_{j,t}\, z_t \right] = 0$ for instruments $z_t \in I_t$
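A sketch of how these Euler-equation errors become GMM moment conditions, $g_t(\theta) = u_t(\delta, \gamma)\, z_t$ with $\theta = (\delta, \gamma)$ and instruments $z_t \in I_t$ (Python; the simulated consumption growth and returns, generated to be roughly consistent with $\delta = 0.97$ and $\gamma = 2$, and the instrument choice are my own assumptions, not the empirical application in the book):

```python
# Sketch: GMM moments for the CRRA consumption-based asset pricing model,
# u_t = delta * R_{t+1} * (c_{t+1}/c_t)^(-gamma) - 1, g_t(theta) = u_t * z_t.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
T = 500
growth = np.exp(0.02 + 0.02 * rng.normal(size=T + 1))             # assumed c_{t+1}/c_t
R = growth ** 2.0 / 0.97 * np.exp(0.03 * rng.normal(size=T + 1))  # assumed gross returns

Z = np.column_stack([np.ones(T), R[:-1]])   # instruments known at t: constant, lagged return
g_growth, g_R = growth[1:], R[1:]

def moments(theta):
    delta, gamma = theta
    u = delta * g_R * g_growth ** (-gamma) - 1.0        # Euler equation error u_t
    return (Z * u[:, None]).mean(axis=0)                # sample moments, m = 2

res = minimize(lambda th: moments(th) @ moments(th), x0=np.array([0.9, 1.0]),
               method="Nelder-Mead")
print("delta, gamma:", res.x.round(3), " moments:", moments(res.x).round(5))
```

Here $m = p = 2$, so the moments can be driven to (approximately) zero and the weight matrix does not matter; adding more instruments makes the system over-identified, and the weighting and testing machinery of the previous slides applies.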