
Exercises Chapter 1

Estimation Theory
Data science and advanced programming

Christophe Hurlin

HEC Lausanne

September 2024

Exercise 1

Rayleigh distribution
Problem

We consider two continuous independent random variables $X$ and $Y$, normally distributed as $N(0, \sigma^2)$. The transformed variable $R$ defined by:

$$R = \sqrt{X^2 + Y^2}$$

has a Rayleigh distribution with parameter $\sigma^2$:

$$R \sim \mathrm{Rayleigh}(\sigma^2)$$

with a pdf $f_R(r; \sigma^2)$ defined by:

$$f_R(r; \sigma^2) = \frac{r}{\sigma^2} \exp\left(-\frac{r^2}{2\sigma^2}\right) \quad \forall r \in [0, +\infty)$$

$$E(R) = \sigma \sqrt{\frac{\pi}{2}} \qquad V(R) = \frac{4-\pi}{2}\,\sigma^2$$
Problem (cont'd)

We consider an i.i.d. sample $\{R_1, R_2, \ldots, R_N\}$ and an estimator (MLE) of $\sigma^2$ defined by

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$

Question 1: Show that $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.
Solution:

We have:

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$

We want to calculate $E(\hat{\sigma}^2)$.

Since the sample $\{R_1, R_2, \ldots, R_N\}$ is i.i.d., we have:

$$E(\hat{\sigma}^2) = E\left(\frac{1}{2N} \sum_{i=1}^{N} R_i^2\right) = \frac{1}{2N} \sum_{i=1}^{N} E(R_i^2)$$
Solution (cont'd):

$$E(\hat{\sigma}^2) = E\left(\frac{1}{2N} \sum_{i=1}^{N} R_i^2\right) = \frac{1}{2N} \sum_{i=1}^{N} E(R_i^2)$$

We know that:

$$E(R_i) = \sigma \sqrt{\frac{\pi}{2}} \qquad V(R_i) = \frac{4-\pi}{2}\,\sigma^2$$

So, we have:

$$E(R_i^2) = V(R_i) + E(R_i)^2 = \frac{4-\pi}{2}\,\sigma^2 + \frac{\pi}{2}\,\sigma^2 = 2\sigma^2$$
Solution (cont'd):

$$E(\hat{\sigma}^2) = \frac{1}{2N} \sum_{i=1}^{N} E(R_i^2) = \frac{1}{2N} \sum_{i=1}^{N} 2\sigma^2 = \frac{2N\sigma^2}{2N}$$

So, we have:

$$E(\hat{\sigma}^2) = \sigma^2$$

The estimator $\hat{\sigma}^2$ is unbiased.
Remark: Sometimes, the Rayleigh distribution is parametrized by $\sigma$. But, it is easier to show that $\hat{\sigma}^2$ is unbiased than to show that $\hat{\sigma}$ is unbiased...

$$\hat{\sigma} = \sqrt{\frac{1}{2N} \sum_{i=1}^{N} R_i^2}$$

Then to study the bias, we have to calculate:

$$E(\hat{\sigma}) = E\left(\sqrt{\frac{1}{2N} \sum_{i=1}^{N} R_i^2}\right) = \; ???$$

since for a nonlinear function $g(.)$, $E(g(x)) \neq g(E(x))$... The only solution is to compute the integral

$$E(\hat{\sigma}) = \int_0^{\infty} x \, f_{\hat{\sigma}}(x; \sigma^2) \, dx$$
Problem (cont'd)

Question 2: Show that $\hat{\sigma}^2$ is a (weakly) consistent estimator of $\sigma^2$. We admit that the raw moments of $R$ are defined, for any $k \in \mathbb{N}$, by:

$$E(R_i^k) = \sigma^k \, 2^{k/2} \, \Gamma\left(1 + \frac{k}{2}\right)$$

where $\Gamma(.)$ denotes the gamma function with:

$$\Gamma(x) = \int_0^{\infty} t^{x-1} \exp(-t) \, dt$$

and

$$\Gamma(x) = (x-1)! \quad \text{for } x \in \mathbb{N}$$
Solution:

First, calculate $V(\hat{\sigma}^2)$. Since the sample $\{R_1, R_2, \ldots, R_N\}$ is i.i.d., we have:

$$V(\hat{\sigma}^2) = V\left(\frac{1}{2N} \sum_{i=1}^{N} R_i^2\right) = \frac{1}{4N^2} \sum_{i=1}^{N} V(R_i^2)$$

What is the value of $V(R_i^2)$?

$$V(R_i^2) = E(R_i^4) - E(R_i^2)^2$$

We showed that

$$E(R_i^2) = 2\sigma^2$$
Solution (cont'd):

$$V(R_i^2) = E(R_i^4) - E(R_i^2)^2 = E(R_i^4) - 4\sigma^4$$

What is the value of $E(R_i^4)$? For any $k \in \mathbb{N}$:

$$E(R_i^k) = \sigma^k \, 2^{k/2} \, \Gamma\left(1 + \frac{k}{2}\right)$$

For $k = 4$, we have:

$$E(R_i^4) = \sigma^4 \, 2^2 \, \Gamma(3)$$

with

$$\Gamma(3) = (3-1)! = 2! = 2$$

So, we have:

$$E(R_i^4) = \sigma^4 \, 2^3 = 8\sigma^4$$
Solution (cont'd):

The variance of $R_i^2$ is equal to:

$$V(R_i^2) = E(R_i^4) - 4\sigma^4 = 8\sigma^4 - 4\sigma^4 = 4\sigma^4$$

As a consequence

$$V(\hat{\sigma}^2) = \frac{1}{4N^2} \sum_{i=1}^{N} V(R_i^2) = \frac{1}{4N^2} \sum_{i=1}^{N} 4\sigma^4 = \frac{4N\sigma^4}{4N^2} = \frac{\sigma^4}{N}$$
Solution (cont'd):

To sum up:

$$E(\hat{\sigma}^2) = \sigma^2 \quad \text{(unbiased estimator)}$$

$$\lim_{N \to \infty} V(\hat{\sigma}^2) = \lim_{N \to \infty} \frac{\sigma^4}{N} = 0$$

So, the estimator $\hat{\sigma}^2$ is a (weakly) consistent estimator of $\sigma^2$:

$$\hat{\sigma}^2 \xrightarrow{p} \sigma^2$$
Problem (cont'd)

Question 3: Sometimes, the Rayleigh distribution is parametrized by $\sigma$ (and not by $\sigma^2$) with

$$R \sim \mathrm{Rayleigh}(\sigma)$$

Propose a (weakly) consistent estimator for $\sigma$.
Solution:

Since $\sigma > 0$, a natural estimator for $\sigma$ is defined by:

$$\hat{\sigma} = \sqrt{\frac{1}{2N} \sum_{i=1}^{N} R_i^2} = \sqrt{\hat{\sigma}^2}$$

We showed that the estimator $\hat{\sigma}^2$ is a (weakly) consistent estimator of $\sigma^2$:

$$\hat{\sigma}^2 \xrightarrow{p} \sigma^2$$

By applying the Continuous Mapping Theorem (CMT) for the continuous function $g(x) = \sqrt{x}$, we get immediately:

$$g(\hat{\sigma}^2) \xrightarrow{p} g(\sigma^2)$$

or

$$\hat{\sigma} \xrightarrow{p} \sigma$$
Problem (cont'd)

Question 4: For any value of $\sigma^2$, what is the finite sample (or exact sampling) distribution of the estimator $\hat{\sigma}^2$ defined by

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$
Solution:

The estimator is defined by

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$

We know that $R_1, R_2, \ldots, R_N$ are i.i.d. random variables with a Rayleigh distribution:

$$R_i \sim \mathrm{Rayleigh}(\sigma^2)$$

Reminder: if $X$ and $Y$ are independent and normally distributed $N(0, \sigma^2)$ random variables, the transformed variable $R = \sqrt{X^2 + Y^2}$ has a Rayleigh distribution.

The exact distribution of $R^2 = X^2 + Y^2$ is unknown (it is not a $\chi^2$...), and as a consequence the finite sample distribution of $\hat{\sigma}^2$ is unknown.
Problem (cont'd)

Question 5: Write a Matlab code in order to approximate the true (unknown) finite sample distribution of $\hat{\sigma}^2$ for a sample size $N = 10$, a true value of $\sigma^2 = 16$ and by using $S = 1{,}000$ simulations.

(1) Plot a histogram of the 1,000 realisations of the estimator $\hat{\sigma}^2$.

(2) Plot the kernel estimator of the density $f_{\hat{\sigma}^2}(x)$ by using the Matlab built-in function ksdensity.
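A minimal Matlab sketch of one possible implementation (the variable names are illustrative, not the official course solution):

    % Approximate the finite sample distribution of sigma2_hat by simulation
    S = 1000;                                 % number of simulations
    N = 10;                                   % sample size
    sigma2 = 16;                              % true value of sigma^2
    sigma2_hat = zeros(S,1);
    for s = 1:S
        X = sqrt(sigma2)*randn(N,1);          % X ~ N(0, sigma^2)
        Y = sqrt(sigma2)*randn(N,1);          % Y ~ N(0, sigma^2)
        R = sqrt(X.^2 + Y.^2);                % R ~ Rayleigh(sigma^2)
        sigma2_hat(s) = sum(R.^2)/(2*N);      % realisation of the estimator
    end
    figure; histogram(sigma2_hat)             % (1) histogram of the S realisations
    [f,x] = ksdensity(sigma2_hat);            % (2) kernel estimator of the density
    figure; plot(x,f)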
Definition (Kernel density estimator)

Let us consider a sample $X_1, \ldots, X_N$, where $X$ has a distribution characterized by the pdf $f_X(x)$, for $x \in \mathbb{R}$. A consistent (kernel) estimator of $f_X(x)$ for any $x \in \mathbb{R}$ is given by:

$$\hat{f}_X(x) = \frac{1}{\lambda N} \sum_{i=1}^{N} K\left(\frac{x - x_i}{\lambda}\right)$$

where $K(.)$ denotes a kernel function and $\lambda$ is a bandwidth parameter.

$$\hat{f}_X(x) \xrightarrow{p} f_X(x) \quad \forall x \in \mathbb{R}$$

For more details (and a discussion on the optimal choice of $\lambda$), see: Lecture notes "Économétrie Non Paramétrique", Hurlin (2008), Master Économétrie et Statistique Appliquée, Université d'Orléans.
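To make the definition concrete, here is a minimal hand-coded sketch of this estimator with a normal kernel and an arbitrary (non-optimal) bandwidth $\lambda$; ksdensity implements the same idea with an automatic bandwidth choice:

    % Hand-coded kernel density estimator with a normal kernel
    x_data = randn(500,1);                         % any sample X_1,...,X_N
    lambda = 0.4;                                  % bandwidth (arbitrary here)
    K = @(u) exp(-u.^2/2)/sqrt(2*pi);              % normal kernel
    grid = linspace(min(x_data), max(x_data), 200);
    f_hat = zeros(size(grid));
    for j = 1:numel(grid)    % f_hat(x) = (1/(lambda*N)) * sum K((x - x_i)/lambda)
        f_hat(j) = mean(K((grid(j) - x_data)/lambda))/lambda;
    end
    plot(grid, f_hat)                              % compare with ksdensity(x_data)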
Definition (Kernel function)

A kernel function $K(u)$ satisfies the following properties:

(i) $K(u) \geq 0$

(ii) $\int K(u) \, du = 1$

(iii) $K(u)$ reaches its maximum for $u = 0$ and decreases with $|u|$.

(iv) $K(u)$ is symmetric, i.e. $K(u) = K(-u)$.
Some examples of kernel functions

Normal: $K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$, $u \in \mathbb{R}$

Triangular: $K(u) = 1 - |u|$, $u \in [-1, 1]$

Quartic or Biweight: $K(u) = \frac{15}{16}\left(1 - u^2\right)^2$, $u \in [-1, 1]$

Epanechnikov: $K(u) = \frac{3}{4}\left(1 - u^2\right)$, $u \in [-1, 1]$

Triweight: $K(u) = \frac{35}{32}\left(1 - u^2\right)^3$, $u \in [-1, 1]$
[Figure: histogram of the 1,000 simulated realisations of $\hat{\sigma}^2$ ($N = 10$, $\sigma^2 = 16$)]
[Figure: kernel density estimate of $f_{\hat{\sigma}^2}(x)$ based on the 1,000 simulated realisations]
Problem (cont'd)

Question 6: For the special case where $\sigma^2 = 1$, what is the finite sample (or exact sampling) distribution of the estimator $\hat{\sigma}^2$ defined by

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$
Solution:

The estimator is defined by

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$

We know that $R_1, R_2, \ldots, R_N$ are i.i.d. random variables with $R_i \sim \mathrm{Rayleigh}(1)$.

Reminder 1: if $R_i \sim \mathrm{Rayleigh}(1)$, then $R = \sqrt{X^2 + Y^2}$ where $X$ and $Y$ are independent and standard normally distributed $N(0, 1)$ random variables.

Reminder 2: if $X \sim N(0, 1)$, then $X^2 \sim \chi^2(1)$.

Reminder 3: if $X \sim \chi^2(v_1)$ and $Y \sim \chi^2(v_2)$, and $X$ and $Y$ are independent, then $X + Y \sim \chi^2(v_1 + v_2)$.
Solution (cont'd):

So, if $X$ and $Y$ are independent and standard normally distributed $N(0, 1)$ random variables:

$$R_i = \sqrt{X^2 + Y^2} \sim \mathrm{Rayleigh}(1) \qquad R_i^2 = X^2 + Y^2 \sim \chi^2(2)$$

The sum of independent chi-squared distributed random variables has a chi-squared distribution:

$$\sum_{i=1}^{N} R_i^2 \sim \chi^2(2N)$$
Solution (cont'd):

$$2N\hat{\sigma}^2 = \sum_{i=1}^{N} R_i^2 \sim \chi^2(2N)$$

In the special case where $\sigma^2 = 1$, the transformed variable $2N\hat{\sigma}^2$ has an exact sampling (finite sample) distribution that corresponds to a chi-squared distribution with $2N$ degrees of freedom.
Problem (cont'd)

Question 7: Write a Matlab code in order to approximate the true (unknown) finite sample distribution of the transformed variable $2N\hat{\sigma}^2$ for a sample size $N = 10$ in the special case where $\sigma^2 = 1$ by using $S = 10{,}000$ simulations.

(1) Plot the kernel estimator of the density $f_{2N\hat{\sigma}^2}(x)$ by using the Matlab built-in function ksdensity.

(2) Compare this estimated density function to the pdf of a chi-squared distribution with $2N$ degrees of freedom.
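A minimal Matlab sketch of this comparison (illustrative, not the official solution):

    % Finite sample distribution of 2N*sigma2_hat when sigma^2 = 1
    S = 10000; N = 10;
    stat = zeros(S,1);
    for s = 1:S
        R2 = randn(N,1).^2 + randn(N,1).^2;   % R_i^2 = X^2 + Y^2 with sigma^2 = 1
        stat(s) = sum(R2);                    % 2N*sigma2_hat
    end
    [f,x] = ksdensity(stat);                  % (1) kernel density estimate
    plot(x, f, x, chi2pdf(x, 2*N))            % (2) compare with the chi2(2N) pdf
    legend('Estimated finite sample pdf','Theoretical pdf of a chi-squared')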
[Figure: estimated finite sample pdf of $2N\hat{\sigma}^2$ vs. the theoretical pdf of a $\chi^2(2N)$ distribution]
Problem (cont'd)

Question 8: What is the asymptotic distribution of the estimator $\hat{\sigma}^2$?

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$
Solution:

We know that $R_1^2, R_2^2, \ldots, R_N^2$ are i.i.d. variables with

$$E(R_i^2) = 2\sigma^2 \qquad V(R_i^2) = 4\sigma^4$$

Step 1: By applying the Lindeberg–Lévy univariate Central Limit Theorem (CLT), we get:

$$\sqrt{N}\left(\frac{1}{N} \sum_{i=1}^{N} R_i^2 - 2\sigma^2\right) \xrightarrow{d} N\left(0, 4\sigma^4\right)$$
Solution (cont'd):

Step 2: By definition, we have

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2 = g\left(\frac{1}{N} \sum_{i=1}^{N} R_i^2\right)$$

$$\sqrt{N}\left(\frac{1}{N} \sum_{i=1}^{N} R_i^2 - 2\sigma^2\right) \xrightarrow{d} N\left(0, 4\sigma^4\right)$$

with $g(x) = x/2$. So, $g(.)$ is a continuous and continuously differentiable function with $g(2\sigma^2) \neq 0$ and not involving $N$; then the delta method implies

$$\sqrt{N}\left(g\left(\frac{1}{N} \sum_{i=1}^{N} R_i^2\right) - g(2\sigma^2)\right) \xrightarrow{d} N\left(0, 4\sigma^4 \left(\left.\frac{\partial g(x)}{\partial x}\right|_{2\sigma^2}\right)^2\right)$$

$$g(2\sigma^2) = \frac{2\sigma^2}{2} = \sigma^2 \qquad \left.\frac{\partial g(x)}{\partial x}\right|_{2\sigma^2} = \frac{\partial (x/2)}{\partial x} = \frac{1}{2}$$
Solution (cont'd):

$$\sqrt{N}\left(g\left(\frac{1}{N} \sum_{i=1}^{N} R_i^2\right) - g(2\sigma^2)\right) \xrightarrow{d} N\left(0, 4\sigma^4 \left(\frac{1}{2}\right)^2\right)$$

The estimator $\hat{\sigma}^2$ is asymptotically normally distributed:

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^4\right)$$
Problem (cont'd)

Question 9: What is the asymptotic variance of the estimator $\hat{\sigma}^2$?

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$
Solution:

We know that:

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^4\right)$$

or equivalently

$$\hat{\sigma}^2 \overset{asy}{\sim} N\left(\sigma^2, \frac{\sigma^4}{N}\right)$$

The asymptotic variance of $\hat{\sigma}^2$ is equal to:

$$V_{asy}(\hat{\sigma}^2) = \frac{\sigma^4}{N}$$
Problem (cont'd)

Question 10: Write a Matlab code in order to approximate the asymptotic distribution of the transformed variable

$$Z = \frac{\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right)}{\sigma^2}$$

for a sample size $N = 10{,}000$ and a true value of $\sigma^2 = 16$ by using $S = 10{,}000$ simulations.

(1) Plot the kernel estimator of the density $f_Z(x)$ by using the Matlab built-in function ksdensity.

(2) Compare this estimated density function to the pdf of a standard normal distribution.
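A minimal Matlab sketch (illustrative, not the official solution):

    % Asymptotic distribution of Z = sqrt(N)*(sigma2_hat - sigma2)/sigma2
    S = 10000; N = 10000; sigma2 = 16;
    Z = zeros(S,1);
    for s = 1:S
        R2 = sigma2*(randn(N,1).^2 + randn(N,1).^2);  % R_i^2 for sigma^2 = 16
        Z(s) = sqrt(N)*(sum(R2)/(2*N) - sigma2)/sigma2;
    end
    [f,x] = ksdensity(Z);
    plot(x, f, x, normpdf(x))                 % compare with the N(0,1) pdf
    legend('Estimated finite sample pdf','Theoretical pdf of a standard normal')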
[Figure: estimated finite sample pdf of $Z$ vs. the theoretical pdf of a standard normal distribution]
Problem (cont'd)

Question 11: What is the asymptotic distribution of the estimator $\hat{\sigma}$ of the parameter $\sigma$ defined by:

$$\hat{\sigma} = \sqrt{\frac{1}{2N} \sum_{i=1}^{N} R_i^2}$$
Solution:

Step 1: We know that:

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^4\right)$$

Since $\hat{\sigma} > 0$, we have:

$$\hat{\sigma} = \sqrt{\frac{1}{2N} \sum_{i=1}^{N} R_i^2} = \sqrt{\hat{\sigma}^2} = g(\hat{\sigma}^2)$$

where $g(x) = \sqrt{x}$ is a continuous and continuously differentiable function with $g(\sigma^2) \neq 0$ and that does not depend on $N$.
Solution (cont'd):

Step 2: We have

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^4\right)$$

The delta method for $g(x) = \sqrt{x}$ implies

$$\sqrt{N}\left(g(\hat{\sigma}^2) - g(\sigma^2)\right) \xrightarrow{d} N\left(0, \sigma^4 \left(\left.\frac{\partial g(x)}{\partial x}\right|_{\sigma^2}\right)^2\right)$$

with

$$g(\sigma^2) = \sigma \qquad \left.\frac{\partial g(x)}{\partial x}\right|_{\sigma^2} = \left.\frac{\partial \sqrt{x}}{\partial x}\right|_{\sigma^2} = \frac{1}{2\sqrt{\sigma^2}} = \frac{1}{2\sigma}$$
Solution (cont'd):

Step 2 (cont'd):

$$\sqrt{N}\left(g(\hat{\sigma}^2) - g(\sigma^2)\right) \xrightarrow{d} N\left(0, \sigma^4 \left(\frac{1}{2\sigma}\right)^2\right)$$

So, we have:

$$\sqrt{N}\left(\hat{\sigma} - \sigma\right) \xrightarrow{d} N\left(0, \frac{\sigma^2}{4}\right)$$
Exercise 2

CAPM
Problem (CAPM)

The empirical analogue of the CAPM is given by:

$$\underbrace{r_{it} - r_{ft}}_{\text{excess return of security } i \text{ for time } t} = \alpha_i + \beta_i \underbrace{(r_{mt} - r_{ft})}_{\text{market excess return for time } t} + \varepsilon_t$$

where $\varepsilon_t$ is an i.i.d. error term. We assume that

$$\tilde{r}_{it} = r_{it} - r_{ft} \qquad \tilde{r}_{mt} = r_{mt} - r_{ft}$$

$$E(\varepsilon_t) = 0 \qquad V(\varepsilon_t) = \sigma^2 \qquad E(\varepsilon_t \mid \tilde{r}_{mt}) = 0$$
Problem (CAPM, cont'd)

Consider the model

$$\tilde{r}_{it} = \alpha_i + \beta_i \tilde{r}_{mt} + \varepsilon_t$$

Data: Microsoft, SP500 and Tbill (closing prices) from 11/1/1993 to 04/03/2003.

[Figure: scatter plot of RMSFT against RSP500, and time series plot of the RSP500 and RMSFT return series]
Problem (CAPM, cont'd)

We consider the CAPM model rewritten as follows

$$\tilde{r}_{it} = x_t^\top \beta + \varepsilon_t \qquad t = 1, \ldots, T$$

where $x_t = (1 \;\; \tilde{r}_{mt})^\top$ is a $2 \times 1$ vector of random variables, $\beta = (\alpha_i \;\; \beta_i)^\top$ is a $2 \times 1$ vector of parameters, and where the error term $\varepsilon_t$ satisfies $E(\varepsilon_t) = 0$, $V(\varepsilon_t) = \sigma^2$ and $E(\varepsilon_t \mid \tilde{r}_{mt}) = 0$.
Problem (CAPM, cont'd)

Question 1: Show that the OLS estimator

$$\hat{\beta} = \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\sum_{t=1}^{T} x_t \tilde{r}_{it}\right)$$

satisfies

$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} N\left(0, \sigma^2 E^{-1}\left(x_t x_t^\top\right)\right)$$
Solution:

1. Let us rewrite the OLS estimator as:

$$\hat{\beta} = \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\sum_{t=1}^{T} x_t \tilde{r}_{it}\right) = \beta_0 + \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\sum_{t=1}^{T} x_t \varepsilon_t\right)$$

2. Normalize the vector $\hat{\beta} - \beta_0$:

$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) = \left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} x_t \varepsilon_t\right)$$
Solution (cont'd):

3. Using the WLLN and the CMT:

$$\left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \xrightarrow{p} E^{-1}\left(x_t x_t^\top\right)$$

4. Using the CLT:

$$\sqrt{T}\left(\frac{1}{T} \sum_{t=1}^{T} x_t \varepsilon_t - E(x_t \varepsilon_t)\right) \xrightarrow{d} N\left(0, V(x_t \varepsilon_t)\right)$$

with $E(\varepsilon_t \mid \tilde{r}_{mt}) = 0$ and $E(\varepsilon_t \mid 1) = E(\varepsilon_t) = 0 \implies E(x_t \varepsilon_t) = 0$ and

$$V(x_t \varepsilon_t) = E\left(x_t \varepsilon_t \varepsilon_t x_t^\top\right) = E\left(E\left(x_t \varepsilon_t \varepsilon_t x_t^\top \mid x_t\right)\right) = E\left(x_t V(\varepsilon_t \mid x_t) x_t^\top\right) = \sigma^2 E\left(x_t x_t^\top\right)$$
Solution (cont'd):

So, we have

$$\left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \xrightarrow{p} E^{-1}\left(x_t x_t^\top\right)$$

$$\sqrt{T}\left(\frac{1}{T} \sum_{t=1}^{T} x_t \varepsilon_t\right) \xrightarrow{d} N\left(0, \sigma^2 E\left(x_t x_t^\top\right)\right)$$
Solution (cont'd):

By using Slutsky's theorem (for a convergence in distribution), we have:

$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) = \left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} x_t \varepsilon_t\right) \xrightarrow{d} N(\Pi, \Omega)$$

with

$$\Pi = E^{-1}\left(x_t x_t^\top\right) \cdot 0 = 0$$

$$\Omega = E^{-1}\left(x_t x_t^\top\right) \sigma^2 E\left(x_t x_t^\top\right) E^{-1}\left(x_t x_t^\top\right) = \sigma^2 E^{-1}\left(x_t x_t^\top\right)$$

Finally, we have:

$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} N\left(0, \sigma^2 E^{-1}\left(x_t x_t^\top\right)\right)$$
Problem (CAPM, cont'd)

Question 2: What is the asymptotic variance-covariance matrix of the OLS estimator $\hat{\beta}$?

$$\hat{\beta} = \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\sum_{t=1}^{T} x_t \tilde{r}_{it}\right)$$
Solution:

We showed that

$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} N\left(0, \sigma^2 E^{-1}\left(x_t x_t^\top\right)\right)$$

or equivalently

$$\hat{\beta} \overset{asy}{\sim} N\left(\beta_0, \frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right)\right)$$

The asymptotic variance-covariance matrix of $\hat{\beta}$ is equal to:

$$V_{asy}(\hat{\beta}) = \underbrace{\frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right)}_{2 \times 2}$$
Remarks:

1. The asymptotic variance covariance matrix is a $2 \times 2$ symmetric matrix:

$$V_{asy}(\hat{\beta}) = \frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right) = \begin{pmatrix} V_{asy}(\hat{\alpha}) & \mathrm{cov}(\hat{\alpha}, \hat{\beta}) \\ \mathrm{cov}(\hat{\beta}, \hat{\alpha}) & V_{asy}(\hat{\beta}) \end{pmatrix}$$

2. Since $x_t = (1 \;\; \tilde{r}_{mt})^\top$, we have:

$$E\left(x_t x_t^\top\right) = E\begin{pmatrix} 1 & \tilde{r}_{mt} \\ \tilde{r}_{mt} & \tilde{r}_{mt}^2 \end{pmatrix} = \begin{pmatrix} 1 & E(\tilde{r}_{mt}) \\ E(\tilde{r}_{mt}) & E(\tilde{r}_{mt}^2) \end{pmatrix}$$
Problem (CAPM, cont'd)

Question 3: Let us consider a consistent estimator of $\sigma^2$ defined by:

$$\hat{\sigma}^2 = \frac{1}{T-2} \sum_{t=1}^{T} \hat{\varepsilon}_t^2 = \frac{1}{T-2} \sum_{t=1}^{T} \left(\tilde{r}_{it} - x_t^\top \hat{\beta}\right)^2$$

Propose a consistent estimator of the asymptotic variance $V_{asy}(\hat{\beta})$.
Solution:

We know that

$$V_{asy}(\hat{\beta}) = \frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right)$$

Using the LLN, if $x_t$ is i.i.d., we get:

$$\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top \xrightarrow{p} E\left(x_t x_t^\top\right)$$

Using the CMT:

$$\left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \xrightarrow{p} E^{-1}\left(x_t x_t^\top\right)$$
Solution (cont'd):

So, we have

$$V_{asy}(\hat{\beta}) = \frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right)$$

and

$$\left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \xrightarrow{p} E^{-1}\left(x_t x_t^\top\right) \qquad \hat{\sigma}^2 \xrightarrow{p} \sigma^2$$

By using Slutsky's theorem:

$$\frac{\hat{\sigma}^2}{T} \left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \xrightarrow{p} \frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right) = V_{asy}(\hat{\beta})$$
Solution (cont'd):

A consistent estimator of the asymptotic variance $V_{asy}(\hat{\beta})$ is defined by

$$\hat{V}_{asy}(\hat{\beta}) = \frac{\hat{\sigma}^2}{T} \left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1}$$

Or equivalently by

$$\hat{V}_{asy}(\hat{\beta}) = \hat{\sigma}^2 \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1}$$

with

$$\hat{\sigma}^2 = \frac{1}{T-2} \sum_{t=1}^{T} \hat{\varepsilon}_t^2 = \frac{1}{T-2} \sum_{t=1}^{T} \left(\tilde{r}_{it} - x_t^\top \hat{\beta}\right)^2$$
Remark:

$$\hat{V}_{asy}(\hat{\beta}) = \hat{\sigma}^2 \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1}$$

Since $x_t = (1 \;\; \tilde{r}_{mt})^\top$:

$$\sum_{t=1}^{T} x_t x_t^\top = \begin{pmatrix} T & \sum_{t=1}^{T} \tilde{r}_{mt} \\ \sum_{t=1}^{T} \tilde{r}_{mt} & \sum_{t=1}^{T} \tilde{r}_{mt}^2 \end{pmatrix}$$
Problem (CAPM, cont'd)

Question 4: Using the excel file [Link], write a Matlab code to estimate the beta and the alpha for MSFT. Compare your results with the following table of estimation results (Eviews).
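Since the data file is only linked above, here is a hedged sketch that assumes the excess return vectors r_msft and r_sp500 have already been loaded into memory (e.g. with readtable or xlsread):

    % OLS estimation of the CAPM alpha and beta for MSFT
    T = length(r_msft);              % r_msft, r_sp500: excess returns (assumed loaded)
    X = [ones(T,1) r_sp500];         % x_t = (1, market excess return)'
    beta_hat = (X'*X)\(X'*r_msft);   % beta_hat = (sum x_t x_t')^{-1} (sum x_t r_it)
    alpha_hat = beta_hat(1)
    b_hat     = beta_hat(2)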
Perfect....

Problem (CAPM, cont'd)

Question 5: Using the excel file [Link], write a Matlab code

(1) to estimate the variance of the error term $\varepsilon_t$

(2) to estimate the asymptotic standard errors of the estimators $\hat{\beta}$

(3) Compare your results with the table of estimation results (Eviews).
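Continuing the sketch above (same assumed variables):

    % Estimated error variance and asymptotic standard errors
    res        = r_msft - X*beta_hat;        % residuals epsilon_hat_t
    sigma2_hat = (res'*res)/(T - 2);         % (1/(T-2)) * sum of squared residuals
    V_hat      = sigma2_hat*inv(X'*X);       % estimated asymptotic var-cov matrix
    std_err    = sqrt(diag(V_hat))           % std. errors of alpha_hat and b_hat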
Perfect too....

End of Exercises - Chapter 1

Christophe Hurlin
Exercises Chapter 2
Maximum Likelihood Estimation
Data science and advanced programming

Christophe Hurlin

HEC Lausanne

September 2024

Exercise 1

MLE and Geometric Distribution
Problem (MLE and geometric distribution)

We consider a sample $X_1, X_2, \ldots, X_N$ of i.i.d. discrete random variables, where $X_i$ has a geometric distribution with a pmf given by:

$$f_X(x, \theta) = \Pr(X = x) = \theta (1-\theta)^{x-1} \quad \forall x \in \{1, 2, 3, \ldots\}$$

where the success probability $\theta$ satisfies $0 < \theta < 1$ and is unknown. We assume that:

$$E(X) = \frac{1}{\theta} \qquad V(X) = \frac{1-\theta}{\theta^2}$$

Question 1: Write the log-likelihood function of the sample $\{x_1, x_2, \ldots, x_N\}$.
Solution

$$f_X(x, \theta) = \Pr(X = x) = \theta (1-\theta)^{x-1} \quad \forall x \in \{1, 2, 3, \ldots\}$$

Since the $X_1, X_2, \ldots, X_N$ are i.i.d., then

$$L_N(\theta; x_1, \ldots, x_N) = \prod_{i=1}^{N} f_X(x_i; \theta) = \theta^N (1-\theta)^{\sum_{i=1}^{N} (x_i - 1)}$$

$$\ell_N(\theta; x_1, \ldots, x_N) = \sum_{i=1}^{N} \ln f_X(x_i; \theta) = N \ln(\theta) + \ln(1-\theta) \sum_{i=1}^{N} (x_i - 1)$$
Problem (MLE and geometric distribution)

Question 2: Determine the maximum likelihood estimator of the success probability $\theta$.
Solution

The maximum likelihood estimate of the success probability $\theta$ is defined by:

$$\hat{\theta} = \underset{0 < \theta < 1}{\arg\max} \; \ell_N(\theta; x) = \underset{0 < \theta < 1}{\arg\max} \; N \ln(\theta) + \ln(1-\theta) \sum_{i=1}^{N} (x_i - 1)$$

The gradient and the hessian (deterministic) are defined by:

$$\frac{\partial \ell_N(\theta; x)}{\partial \theta} = \frac{N}{\theta} - \frac{1}{1-\theta} \sum_{i=1}^{N} (x_i - 1)$$

$$\frac{\partial^2 \ell_N(\theta; x)}{\partial \theta^2} = -\frac{N}{\theta^2} - \left(\frac{1}{1-\theta}\right)^2 \sum_{i=1}^{N} (x_i - 1)$$
Solution (cont'd)

So, the FOC (likelihood equation) is:

$$\left.\frac{\partial \ell_N(\theta; x)}{\partial \theta}\right|_{\hat{\theta}} = \frac{N}{\hat{\theta}} - \frac{1}{1-\hat{\theta}} \sum_{i=1}^{N} (x_i - 1) = 0$$

$$\iff \frac{1-\hat{\theta}}{\hat{\theta}} = \frac{1}{N} \sum_{i=1}^{N} x_i - 1 \iff \frac{1}{\hat{\theta}} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

So we have:

$$\hat{\theta} = \frac{1}{\bar{x}_N}$$

where $\bar{x}_N$ denotes the realisation of the sample mean $\bar{X}_N = N^{-1} \sum_{i=1}^{N} X_i$.
Solution (cont'd)

The SOC is:

$$\left.\frac{\partial^2 \ell_N(\theta; x)}{\partial \theta^2}\right|_{\hat{\theta}} = -\frac{N}{\hat{\theta}^2} - \left(\frac{1}{1-\hat{\theta}}\right)^2 \sum_{i=1}^{N} (x_i - 1)$$

Since $\hat{\theta} = 1/\bar{x}_N$, we have:

$$\sum_{i=1}^{N} (x_i - 1) = \sum_{i=1}^{N} x_i - N = N\bar{x}_N - N = \frac{N}{\hat{\theta}} - N = N \frac{1-\hat{\theta}}{\hat{\theta}}$$

So, we have:

$$\left.\frac{\partial^2 \ell_N(\theta; x)}{\partial \theta^2}\right|_{\hat{\theta}} = -\frac{N}{\hat{\theta}^2} - \left(\frac{1}{1-\hat{\theta}}\right)^2 N \frac{1-\hat{\theta}}{\hat{\theta}} = -N\left(\frac{1}{\hat{\theta}^2} + \frac{1}{\hat{\theta}(1-\hat{\theta})}\right)$$
Solution (cont'd)

$$\left.\frac{\partial^2 \ell_N(\theta; x)}{\partial \theta^2}\right|_{\hat{\theta}} = -N\left(\frac{\hat{\theta}(1-\hat{\theta}) + \hat{\theta}^2}{\hat{\theta}^3 (1-\hat{\theta})}\right) = -\frac{N}{\hat{\theta}^2 (1-\hat{\theta})} < 0$$

We have a maximum since $0 < \hat{\theta} < 1$.

Conclusion: the ML estimator of $\theta$ is equal to the inverse of the sample mean:

$$\hat{\theta} = \frac{1}{\bar{X}_N}$$
Problem (MLE and geometric distribution)

Question 3: Show that the maximum likelihood estimator of the success probability $\theta$ is weakly consistent.
Solution

In two lines...

1. Since the $X_1, X_2, \ldots, X_N$ are i.i.d., then according to Khinchine's theorem (WLLN), we have:

$$\bar{X}_N \xrightarrow{p} E(X_i) = \frac{1}{\theta}$$

2. Given that $\hat{\theta} = 1/\bar{X}_N$, by using the continuous mapping theorem (CMT) for the function $g(x) = 1/x$, we get:

$$\hat{\theta} = g(\bar{X}_N) \xrightarrow{p} g\left(\frac{1}{\theta}\right)$$

or equivalently

$$\hat{\theta} \xrightarrow{p} \theta$$

The estimator $\hat{\theta}$ is (weakly) consistent.
Problem (MLE and geometric distribution)

Question 4: By using the asymptotic properties of the MLE, derive the asymptotic distribution of the ML estimator $\hat{\theta} = 1/\bar{X}_N$.
Solution

1. The log-likelihood function $\ln f_X(\theta; x_i)$ satisfies the regularity conditions.

2. So, the ML estimator is asymptotically normally distributed with

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, I^{-1}(\theta_0)\right)$$

where $\theta_0$ denotes the true value of the parameter and $I(\theta_0)$ the (average) Fisher information number for one observation.
Solution (cont'd)

3. Compute the Fisher information number for one observation. Since we consider a marginal log-likelihood, the Fisher information number associated to $X_i$ is the same for all observations $i$. We have three definitions for $I(\theta)$:

$$I(\theta) = V_\theta\left(\frac{\partial \ell_i(\theta; X_i)}{\partial \theta}\right) = E_\theta\left(\frac{\partial \ell_i(\theta; X_i)}{\partial \theta} \frac{\partial \ell_i^\top(\theta; X_i)}{\partial \theta}\right) = -E_\theta\left(\frac{\partial^2 \ell_i(\theta; X_i)}{\partial \theta^2}\right)$$
Solution (cont'd)

Let us consider the third one:

$$I(\theta) = -E_\theta\left(\frac{\partial^2 \ell_i(\theta; X_i)}{\partial \theta^2}\right) = E_\theta\left(\frac{1}{\theta^2} + \left(\frac{1}{1-\theta}\right)^2 (X_i - 1)\right)$$

$$= \frac{1}{\theta^2} + \left(\frac{1}{1-\theta}\right)^2 \left(E_\theta(X_i) - 1\right) = \frac{1}{\theta^2} + \left(\frac{1}{1-\theta}\right)^2 \left(\frac{1}{\theta} - 1\right) = \frac{1}{\theta^2 (1-\theta)}$$
Solution (cont'd)

The asymptotic distribution of the ML estimator is:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow[N \to \infty]{d} N\left(0, \theta_0^2 (1-\theta_0)\right)$$

where $\theta_0$ denotes the true value of the parameter. Or equivalently:

$$\hat{\theta} \overset{asy}{\sim} N\left(\theta_0, \frac{\theta_0^2 (1-\theta_0)}{N}\right)$$
Problem (MLE and geometric distribution)

Question 5: By using the central limit theorem and the delta method, find the asymptotic distribution of the ML estimator $\hat{\theta} = 1/\bar{X}_N$.
Solution

1. Since the $X_1, X_2, \ldots, X_N$ are i.i.d. with $E(X) = 1/\theta_0$ and $V(X) = (1-\theta_0)/\theta_0^2$, according to the Lindeberg–Lévy CLT we get immediately

$$\sqrt{N}\left(\bar{X}_N - \frac{1}{\theta_0}\right) \xrightarrow{d} N\left(0, \frac{1-\theta_0}{\theta_0^2}\right)$$

2. Our ML estimator is defined by $\hat{\theta} = 1/\bar{X}_N$. Let us consider the function $g(z) = 1/z$. So, $g(.)$ is a continuous and continuously differentiable function with $g(1/\theta) = \theta \neq 0$ and not involving $N$; then the delta method implies

$$\sqrt{N}\left(g(\bar{X}_N) - g\left(\frac{1}{\theta_0}\right)\right) \xrightarrow{d} N\left(0, \left(\left.\frac{\partial g(z)}{\partial z}\right|_{1/\theta_0}\right)^2 \frac{1-\theta_0}{\theta_0^2}\right)$$
Solution (cont'd)

$$\sqrt{N}\left(g(\bar{X}_N) - g\left(\frac{1}{\theta_0}\right)\right) \xrightarrow{d} N\left(0, \left(\left.\frac{\partial g(z)}{\partial z}\right|_{1/\theta_0}\right)^2 \frac{1-\theta_0}{\theta_0^2}\right)$$

We know that $g(z) = 1/z$ and $\partial g(z)/\partial z = -1/z^2$, so we have

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, \theta_0^4 \, \frac{1-\theta_0}{\theta_0^2}\right)$$

Finally, we get the same result as in the previous question:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, \theta_0^2 (1-\theta_0)\right)$$
Problem (MLE and geometric distribution)

Question 6: Determine the FDCR or Cramer-Rao bound. Is the ML estimator $\hat{\theta}$ efficient and/or asymptotically efficient?
Solution

The FDCR or Cramer-Rao bound is defined by:

$$FDCR = I_N^{-1}(\theta_0)$$

where $I_N(\theta_0)$ denotes the Fisher information number for the sample evaluated at the true value $\theta_0$. There are three alternative definitions for $I_N(\theta_0)$:

$$I_N(\theta_0) = V_\theta\left(\left.\frac{\partial \ell_N(\theta; X)}{\partial \theta}\right|_{\theta_0}\right)$$

$$I_N(\theta_0) = E_\theta\left(\left.\frac{\partial \ell_N(\theta; X)}{\partial \theta}\right|_{\theta_0} \left.\frac{\partial \ell_N(\theta; X)^\top}{\partial \theta}\right|_{\theta_0}\right)$$

$$I_N(\theta_0) = -E_\theta\left(\left.\frac{\partial^2 \ell_N(\theta; X)}{\partial \theta \, \partial \theta^\top}\right|_{\theta_0}\right)$$
Solution (cont'd)

Let us consider the third one:

$$I_N(\theta_0) = -E_\theta\left(\left.\frac{\partial^2 \ell_N(\theta; X)}{\partial \theta \, \partial \theta^\top}\right|_{\theta_0}\right) = E_\theta\left(\frac{N}{\theta_0^2} + \left(\frac{1}{1-\theta_0}\right)^2 \sum_{i=1}^{N} (X_i - 1)\right)$$

$$= \frac{N}{\theta_0^2} + \left(\frac{1}{1-\theta_0}\right)^2 \sum_{i=1}^{N} \left(E_\theta(X_i) - 1\right) = \frac{N}{\theta_0^2} + \left(\frac{1}{1-\theta_0}\right)^2 N \left(\frac{1}{\theta_0} - 1\right) = \frac{N}{\theta_0^2 (1-\theta_0)}$$
Solution (cont'd)

So, the FDCR or Cramer-Rao bound is defined by:

$$FDCR = I_N^{-1}(\theta_0) = \frac{\theta_0^2 (1-\theta_0)}{N}$$

1. We don't know if $\hat{\theta}$ is efficient... For that we need to compute the variance $V(\hat{\theta}) = V(1/\bar{X}_N)$.

2. Since the log-likelihood function $\ln f_X(\theta; x_i)$ satisfies the regularity conditions, the MLE is asymptotically efficient.

Remark: we showed that for $N$ large:

$$V_{asy}(\hat{\theta}) = I_N^{-1}(\theta_0) = \frac{\theta_0^2 (1-\theta_0)}{N}$$
Remark

How to get the Fisher information number for the sample (and as a consequence the FDCR or Cramer-Rao bound) in one line from question 4? Since the sample is i.i.d., we have:

$$I_N(\theta_0) = N \, I(\theta_0) = \frac{N}{\theta_0^2 (1-\theta_0)}$$
Problem (MLE and geometric distribution)

Question 7: Propose a consistent estimator for the asymptotic variance of the ML estimator $\hat{\theta}$.
Solution

We have:

$$V_{asy}(\hat{\theta}) = \frac{\theta_0^2 (1-\theta_0)}{N}$$

and we know that the ML estimator $\hat{\theta}$ is a (weakly) consistent estimator of $\theta_0$:

$$\hat{\theta} \xrightarrow{p} \theta_0$$

A natural estimator for the asymptotic variance is given by:

$$\hat{V}_{asy}(\hat{\theta}) = \frac{\hat{\theta}^2 (1-\hat{\theta})}{N}$$

Given the CMT and Slutsky's theorem, it is easy to show that:

$$\hat{V}_{asy}(\hat{\theta}) \xrightarrow{p} V_{asy}(\hat{\theta})$$
Problem (MLE and geometric distribution)

Question 8: Write a Matlab code in order to

(1) Generate a sample of size $N = 1{,}000$ of i.i.d. random variables distributed according to a geometric distribution with a success probability $\theta = 0.3$ by using the function geornd.

(2) Estimate by MLE the parameter $\theta$. Compare your estimate with the sample mean.

Remark: There are two definitions of the geometric distribution:

$$\Pr(X = x) = \theta (1-\theta)^{x-1} \quad \forall x \in \{1, 2, \ldots\} \quad \text{used in this exercise}$$

$$\Pr(X = x) = \theta (1-\theta)^{x} \quad \forall x \in \{0, 1, 2, \ldots\} \quad \text{used by Matlab for geornd}$$
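A minimal sketch along these lines (illustrative):

    % MLE of the success probability of a geometric distribution
    N = 1000; theta = 0.3;
    x = geornd(theta, N, 1) + 1;     % geornd uses the support {0,1,2,...},
                                     % so add 1 to match the support {1,2,...}
    theta_hat = 1/mean(x)            % ML estimate = inverse of the sample mean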
Exercise 2

MLE and AR(p) processes
Definition (AR(1) process)

A stationary Gaussian AR(1) process takes the form

$$Y_t = c + \rho Y_{t-1} + \varepsilon_t$$

with $\varepsilon_t$ i.i.d. $N(0, \sigma^2)$, $|\rho| < 1$ and:

$$E(Y_t) = \frac{c}{1-\rho} \qquad V(Y_t) = \frac{\sigma^2}{1-\rho^2}$$
Problem (MLE and AR processes)

Question 1: Denote $\theta = (c; \rho; \sigma^2)^\top$ the $3 \times 1$ vector of parameters and write the likelihood and the log-likelihood of the first observation $y_1$.
Solution

Since the variable $Y_1$ is Gaussian with

$$E(Y_t) = \frac{c}{1-\rho} \qquad V(Y_t) = \frac{\sigma^2}{1-\rho^2}$$

the (unconditional) likelihood of $y_1$ is equal to:

$$L_1(\theta; y_1) = \frac{1}{\sqrt{2\pi} \sqrt{\sigma^2/(1-\rho^2)}} \exp\left(-\frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)}\right)$$

The (unconditional) log-likelihood of $y_1$ is equal to:

$$\ell_1(\theta; y_1) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln\left(\frac{\sigma^2}{1-\rho^2}\right) - \frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)}$$
Problem (MLE and AR processes)

Question 2: What is the conditional distribution of $Y_2$ given $Y_1 = y_1$? Write the (conditional) likelihood and the (conditional) log-likelihood of the second observation $y_2$.
Solution

For $t = 2$, we have:

$$Y_2 = c + \rho Y_1 + \varepsilon_2$$

where $\varepsilon_2 \sim N(0, \sigma^2)$. As a consequence, the conditional distribution of $Y_2$ given $Y_1 = y_1$ is also normal:

$$Y_2 \mid Y_1 = y_1 \sim N\left(c + \rho y_1, \sigma^2\right)$$
Solution (cont'd)

Given

$$Y_2 \mid Y_1 = y_1 \sim N\left(c + \rho y_1, \sigma^2\right)$$

the conditional likelihood of $y_2$ is equal to:

$$L_2(\theta; y_2 \mid y_1) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2} \frac{(y_2 - c - \rho y_1)^2}{\sigma^2}\right)$$

The conditional log-likelihood of $y_2$ is equal to:

$$\ell_2(\theta; y_2 \mid y_1) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln(\sigma^2) - \frac{1}{2} \frac{(y_2 - c - \rho y_1)^2}{\sigma^2}$$
Problem (MLE and AR processes)

Question 3: Consider a sample $\{y_1, y_2\}$ of size $T = 2$. Write the exact likelihood (or full likelihood) and the exact log-likelihood of the AR(1) model for the sample $\{y_1, y_2\}$. Note that for two continuous random variables $X$ and $Y$, the pdf of the joint distribution $(X, Y)$ can be written as:

$$f_{X,Y}(x, y) = f_{X|Y=y}(x \mid y) \, f_Y(y)$$
Solution

The exact (or full) likelihood of the sample $\{y_1, y_2\}$ corresponds to the pdf of the joint distribution of $(Y_1, Y_2)$:

$$L_T(\theta; y_1, y_2) = f_{Y_1, Y_2}(y_1, y_2)$$

This joint density can be rewritten as the product of the marginal density of $Y_1$ by the conditional density of $Y_2$ given $Y_1 = y_1$:

$$L_T(\theta; y_1, y_2) = f_{Y_2|Y_1=y_1}(y_2 \mid y_1; \theta) \, f_{Y_1}(y_1; \theta)$$

or equivalently:

$$L_T(\theta; y_1, y_2) = L_2(\theta; y_2 \mid y_1) \, L_1(\theta; y_1)$$
Solution (cont'd)

The exact (or full) likelihood of the sample $\{y_1, y_2\}$ is equal to:

$$L_T(\theta; y_1, y_2) = \frac{1}{\sqrt{2\pi} \sqrt{\sigma^2/(1-\rho^2)}} \exp\left(-\frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)}\right) \times \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2} \frac{(y_2 - c - \rho y_1)^2}{\sigma^2}\right)$$
Solution (cont'd)

Similarly the exact (or full) log-likelihood of the sample $\{y_1, y_2\}$ is equal to:

$$\ell_T(\theta; y_1, y_2) = \ell_2(\theta; y_2 \mid y_1) + \ell_1(\theta; y_1)$$

Then, we get:

$$\ell_T(\theta; y_1, y_2) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln\left(\frac{\sigma^2}{1-\rho^2}\right) - \frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)} - \frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln(\sigma^2) - \frac{1}{2} \frac{(y_2 - c - \rho y_1)^2}{\sigma^2}$$
Problem (MLE and AR processes)

Question 4: Write the exact likelihood (or full likelihood) and the exact log-likelihood of the AR(1) model for a sample $\{y_1, y_2, \ldots, y_T\}$ of size $T$.
Solution

More generally, we have:

$$L_T(\theta; y_1, \ldots, y_T) = L_1(\theta; y_1) \prod_{t=2}^{T} L_t(\theta; y_t \mid y_{t-1})$$

$$\ell_T(\theta; y_1, \ldots, y_T) = \ell_1(\theta; y_1) + \sum_{t=2}^{T} \ell_t(\theta; y_t \mid y_{t-1})$$
Solution (cont'd)

$$L_T(\theta; y) = \frac{1}{\sqrt{2\pi} \sqrt{\sigma^2/(1-\rho^2)}} \exp\left(-\frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)}\right) \prod_{t=2}^{T} \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2} \frac{(y_t - c - \rho y_{t-1})^2}{\sigma^2}\right)$$

$$\ell_T(\theta; y) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln\left(\frac{\sigma^2}{1-\rho^2}\right) - \frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)} + \sum_{t=2}^{T} \left(-\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln(\sigma^2) - \frac{1}{2} \frac{(y_t - c - \rho y_{t-1})^2}{\sigma^2}\right)$$
Problem (MLE and AR processes)

Question 5: The exact log-likelihood function is a non-linear function of the parameters $\theta$, and so there is no closed form solution for the exact MLE. The exact MLE $\hat{\theta} = (\hat{c}; \hat{\rho}; \hat{\sigma}^2)^\top$ must be determined by numerically maximizing the exact log-likelihood function. Write a Matlab code

(1) to generate a sample of size $T = 1{,}000$ from an AR(1) process with $c = 1$, $\rho = 0.5$ and $\sigma^2 = 1$. Remark: for the initial condition, generate a normal random variable.

(2) to compute the exact MLE.
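A hedged sketch of one way to do this with fminsearch (an unconstrained search; in practice one may reparametrize to enforce $\sigma^2 > 0$ and $|\rho| < 1$):

    % (1) Simulate the AR(1) process
    T = 1000; c = 1; rho = 0.5; sigma2 = 1;
    y = zeros(T,1);
    y(1) = c/(1-rho) + sqrt(sigma2/(1-rho^2))*randn;   % y_1 drawn from the
                                                       % stationary distribution
    for t = 2:T
        y(t) = c + rho*y(t-1) + sqrt(sigma2)*randn;
    end
    % (2) Exact MLE: minimize the negative exact log-likelihood, p = (c, rho, sigma2)
    negloglik = @(p) 0.5*log(2*pi) + 0.5*log(p(3)/(1-p(2)^2)) ...
        + 0.5*(y(1)-p(1)/(1-p(2)))^2/(p(3)/(1-p(2)^2)) ...
        + sum(0.5*log(2*pi) + 0.5*log(p(3)) ...
        + 0.5*(y(2:T)-p(1)-p(2)*y(1:T-1)).^2/p(3));
    theta_hat = fminsearch(negloglik, [0; 0; 1])       % exact MLE of (c, rho, sigma2)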
Problem (MLE and AR processes)

Question 6: Now we consider the first observation $y_1$ as given (deterministic). Then, we have $f_{Y_1}(y_1; \theta) = 1$. Write the conditional log-likelihood of the AR(1) model for a sample $\{y_1, y_2, \ldots, y_T\}$ of size $T$.
Solution

The conditional likelihood is defined by:

$$L_T(\theta; y_2, \ldots, y_T \mid y_1) = \prod_{t=2}^{T} f_{Y_t | Y_{t-1}, Y_1 = y_1}(y_t \mid y_{t-1}, y_1; \theta) \, f_{Y_1}(y_1; \theta) = \prod_{t=2}^{T} f_{Y_t | Y_{t-1}}(y_t \mid y_{t-1}; \theta)$$

The conditional log-likelihood is defined by:

$$\ell_T(\theta; y_1, \ldots, y_T \mid y_1) = \ell_1(\theta; y_1) + \sum_{t=2}^{T} \ell_t(\theta; y_t \mid y_{t-1}, y_1) = \sum_{t=2}^{T} \ell_t(\theta; y_t \mid y_{t-1})$$

where $\ell_t(\theta; y_t \mid y_{t-1}) = \ln f_{Y_t | Y_{t-1}}(y_t \mid y_{t-1}; \theta)$.
Solution (cont'd)

The conditional log-likelihood is then equal to:

$$\ell_T(\theta; y) = \sum_{t=2}^{T} \left(-\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln(\sigma^2) - \frac{1}{2} \frac{(y_t - c - \rho y_{t-1})^2}{\sigma^2}\right)$$

or equivalently

$$\ell_T(\theta; y) = -\frac{T-1}{2} \ln(2\pi) - \frac{T-1}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=2}^{T} (y_t - c - \rho y_{t-1})^2$$
Problem (MLE and AR processes)

Question 7: Write the likelihood equations associated to the conditional log-likelihood.
Solution

The ML estimator $\hat{\theta} = (\hat{c}; \hat{\rho}; \hat{\sigma}^2)^\top$ of $\theta$ is defined by:

$$\hat{\theta} = \underset{\theta \in \Theta}{\arg\max} \; \ell_T(\theta; y_1, \ldots, y_T)$$

The log-likelihood equations are:

$$\left.\frac{\partial \ell_T(\theta; y)}{\partial \theta}\right|_{\hat{\theta}} = \begin{pmatrix} \left.\partial \ell_T(\theta; y)/\partial c\right|_{\hat{\theta}} \\ \left.\partial \ell_T(\theta; y)/\partial \rho\right|_{\hat{\theta}} \\ \left.\partial \ell_T(\theta; y)/\partial \sigma^2\right|_{\hat{\theta}} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
Solution (cont'd)

$$\ell_T(\theta; y) = -\frac{T-1}{2} \ln(2\pi) - \frac{T-1}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=2}^{T} (y_t - c - \rho y_{t-1})^2$$

$$\left.\frac{\partial \ell_T(\theta; y)}{\partial c}\right|_{\hat{\theta}} = \frac{1}{\hat{\sigma}^2} \sum_{t=2}^{T} (y_t - \hat{c} - \hat{\rho} y_{t-1}) = 0$$

$$\left.\frac{\partial \ell_T(\theta; y)}{\partial \rho}\right|_{\hat{\theta}} = \frac{1}{\hat{\sigma}^2} \sum_{t=2}^{T} (y_t - \hat{c} - \hat{\rho} y_{t-1}) \, y_{t-1} = 0$$

$$\left.\frac{\partial \ell_T(\theta; y)}{\partial \sigma^2}\right|_{\hat{\theta}} = -\frac{T-1}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{t=2}^{T} (y_t - \hat{c} - \hat{\rho} y_{t-1})^2 = 0$$
Problem (MLE and AR processes)

Question 8: Show that the conditional ML estimators $\hat{c}$ and $\hat{\rho}$ correspond to the OLS estimators. Give the estimator of $\sigma^2$. Remark: do not verify the SOC at this step.
Solution

The maximisation of $\ell_T(\theta; y)$ with respect to $c$ and $\rho$

$$\ell_T(\theta; y) = -\frac{T-1}{2} \ln(2\pi) - \frac{T-1}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=2}^{T} (y_t - c - \rho y_{t-1})^2$$

is equivalent to the minimisation of

$$\sum_{t=2}^{T} (y_t - c - \rho y_{t-1})^2 = (y - X\beta)^\top (y - X\beta)$$

with $y = (y_2; \ldots; y_T)^\top$, $\beta = (c; \rho)^\top$ and $X = (1 : y_{-1})$ with $y_{-1} = (y_1; \ldots; y_{T-1})^\top$.
Solution

The conditional ML estimators of $c$ and $\rho$ are equivalent to the ordinary least square (OLS) estimators obtained in the regression of $y_t$ on a constant and its own lagged value:

$$y_t = c + \rho y_{t-1} + \varepsilon_t$$

$$\begin{pmatrix} \hat{c} \\ \hat{\rho} \end{pmatrix} = \begin{pmatrix} T-1 & \sum_{t=2}^{T} y_{t-1} \\ \sum_{t=2}^{T} y_{t-1} & \sum_{t=2}^{T} y_{t-1}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{t=2}^{T} y_t \\ \sum_{t=2}^{T} y_{t-1} y_t \end{pmatrix}$$
Solution

The ML estimator $\hat{\sigma}^2$ is defined by:

$$\left.\frac{\partial \ell_T(\theta; y)}{\partial \sigma^2}\right|_{\hat{\theta}} = -\frac{T-1}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{t=2}^{T} (y_t - \hat{c} - \hat{\rho} y_{t-1})^2 = 0$$

Then, we get:

$$\hat{\sigma}^2 = \frac{1}{T-1} \sum_{t=2}^{T} (y_t - \hat{c} - \hat{\rho} y_{t-1})^2 = \frac{1}{T-1} \sum_{t=2}^{T} \hat{\varepsilon}_t^2$$
Problem (MLE and AR processes)

Question 9: Write a Matlab code to compute the conditional maximum likelihood estimator $\hat{\theta} = (\hat{c}; \hat{\rho}; \hat{\sigma}^2)^\top$.

(1) Generate a sample of size $T = 1{,}000$ from an AR(1) process with $c = 1$, $\rho = 0.5$ and $\sigma^2 = 1$. Remark: for the initial condition, generate a normal random variable.

(2) Compute the conditional MLE.

(3) Compare the ML estimators $\hat{c}$ and $\hat{\rho}$ to the OLS ones.
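A sketch of the closed-form conditional MLE (reusing the simulated series y and T from the previous sketch); the $(\hat{c}, \hat{\rho})$ estimates are exactly the OLS ones:

    % Conditional MLE of (c, rho, sigma2): closed form, identical to OLS
    X   = [ones(T-1,1) y(1:T-1)];        % regressors (1, y_{t-1}), t = 2,...,T
    b   = (X'*X)\(X'*y(2:T));            % conditional MLE of (c, rho) = OLS
    res = y(2:T) - X*b;
    sigma2_hat = (res'*res)/(T-1)        % conditional MLE of sigma^2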
Problem (MLE and AR processes)

Question 10: Write the average Fisher information matrix associated to the conditional likelihood.
Solution

In general for a conditional model, in order to compute the average information matrix $I(\theta)$ for one observation:

Step 1: Compute the Hessian matrix or the score vector for one observation

$$H_i(\theta; Y_i \mid x_i) = \frac{\partial^2 \ell_i(\theta; Y_i \mid x_i)}{\partial \theta \, \partial \theta^\top} \qquad s_i(\theta; Y_i \mid x_i) = \frac{\partial \ell_i(\theta; Y_i \mid x_i)}{\partial \theta}$$

Step 2: Take the expectation (or the variance) with respect to the conditional distribution $Y_i \mid X_i = x_i$

$$\mathcal{I}_i(\theta) = V_\theta\left(s_i(\theta; Y_i \mid x_i)\right) = -E_\theta\left(H_i(\theta; Y_i \mid x_i)\right)$$

Step 3: Then take the expectation with respect to the conditioning variable $X$

$$I(\theta) = E_X\left(\mathcal{I}_i(\theta)\right)$$
Solution (cont'd)

Step 1:

$$\frac{\partial \ell_t(\theta; y_t)}{\partial c} = \frac{1}{\sigma^2} (y_t - c - \rho y_{t-1})$$

$$\frac{\partial \ell_t(\theta; y_t)}{\partial \rho} = \frac{1}{\sigma^2} (y_t - c - \rho y_{t-1}) \, y_{t-1}$$

$$\frac{\partial \ell_t(\theta; y_t)}{\partial \sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4} (y_t - c - \rho y_{t-1})^2$$

The Hessian matrix for one observation is defined by:

$$H_t(\theta; Y_t \mid y_{t-1}) = \begin{pmatrix} -1/\sigma^2 & -y_{t-1}/\sigma^2 & -\varepsilon_t/\sigma^4 \\ -y_{t-1}/\sigma^2 & -y_{t-1}^2/\sigma^2 & -\varepsilon_t y_{t-1}/\sigma^4 \\ -\varepsilon_t/\sigma^4 & -\varepsilon_t y_{t-1}/\sigma^4 & 1/2\sigma^4 - \varepsilon_t^2/\sigma^6 \end{pmatrix}$$
Solution (cont'd)

$$H_t(\theta; Y_t \mid y_{t-1}) = \begin{pmatrix} -1/\sigma^2 & -y_{t-1}/\sigma^2 & -\varepsilon_t/\sigma^4 \\ -y_{t-1}/\sigma^2 & -y_{t-1}^2/\sigma^2 & -\varepsilon_t y_{t-1}/\sigma^4 \\ -\varepsilon_t/\sigma^4 & -\varepsilon_t y_{t-1}/\sigma^4 & 1/2\sigma^4 - \varepsilon_t^2/\sigma^6 \end{pmatrix}$$

Step 2: Take the expectation (or the variance) with respect to the conditional distribution $Y_t \mid Y_{t-1} = y_{t-1}$:

$$\mathcal{I}_t(\theta) = -E_\theta\left(H_t(\theta; Y_t \mid y_{t-1})\right) = \begin{pmatrix} 1/\sigma^2 & y_{t-1}/\sigma^2 & 0 \\ y_{t-1}/\sigma^2 & y_{t-1}^2/\sigma^2 & 0 \\ 0 & 0 & 1/2\sigma^4 \end{pmatrix}$$

since $E_\theta(\varepsilon_t) = 0$, $E_\theta(\varepsilon_t y_{t-1}) = y_{t-1} E_\theta(\varepsilon_t) = 0$ and $E_\theta(\varepsilon_t^2) = \sigma^2$.
Solution (cont'd)

$$\mathcal{I}_t(\theta) = \begin{pmatrix} 1/\sigma^2 & y_{t-1}/\sigma^2 & 0 \\ y_{t-1}/\sigma^2 & y_{t-1}^2/\sigma^2 & 0 \\ 0 & 0 & 1/2\sigma^4 \end{pmatrix}$$

Step 3: Then take the expectation with respect to the conditioning variable $x_t = (1 : y_{t-1})$:

$$I(\theta) = E_X\left(\mathcal{I}_t(\theta)\right) = \begin{pmatrix} 1/\sigma^2 & E_X(y_{t-1})/\sigma^2 & 0 \\ E_X(y_{t-1})/\sigma^2 & E_X(y_{t-1}^2)/\sigma^2 & 0 \\ 0 & 0 & 1/2\sigma^4 \end{pmatrix}$$
Problem (MLE and AR processes)

Question 11: What is the asymptotic distribution of the conditional MLE? Propose an estimator for the asymptotic variance covariance matrix of $\hat{\theta} = (\hat{c}; \hat{\rho}; \hat{\sigma}^2)^\top$.
Solution

Since the log-likelihood is regular, we have:

$$\sqrt{T-1}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, I^{-1}(\theta_0)\right)$$

or equivalently

$$\hat{\theta} \overset{asy}{\sim} N\left(\theta_0, \frac{1}{T-1} I^{-1}(\theta_0)\right)$$

with

$$I(\theta) = \begin{pmatrix} 1/\sigma^2 & E_X(y_{t-1})/\sigma^2 & 0 \\ E_X(y_{t-1})/\sigma^2 & E_X(y_{t-1}^2)/\sigma^2 & 0 \\ 0 & 0 & 1/2\sigma^4 \end{pmatrix}$$
Solution (cont'd)

An estimator of the asymptotic variance covariance matrix can be derived from:

$$\hat{I}(\theta) = \begin{pmatrix} 1/\hat{\sigma}^2 & (T-1)^{-1} \sum_{t=2}^{T} y_{t-1}/\hat{\sigma}^2 & 0 \\ (T-1)^{-1} \sum_{t=2}^{T} y_{t-1}/\hat{\sigma}^2 & (T-1)^{-1} \sum_{t=2}^{T} y_{t-1}^2/\hat{\sigma}^2 & 0 \\ 0 & 0 & 1/2\hat{\sigma}^4 \end{pmatrix}$$

where $\hat{\sigma}^2$ is the ML estimator of $\sigma^2$.

$$\hat{V}_{asy}(\hat{\theta}) = \frac{1}{T-1} \hat{I}^{-1}(\theta)$$
Solution (cont'd)

If we denote by $X = (1 : y_{-1})$, then we have:

$$\hat{I}(\theta) = \begin{pmatrix} (T-1)^{-1} X^\top X/\hat{\sigma}^2 & 0_{2 \times 1} \\ 0_{1 \times 2} & 1/2\hat{\sigma}^4 \end{pmatrix}$$

since

$$X^\top X = \begin{pmatrix} T-1 & \sum_{t=2}^{T} y_{t-1} \\ \sum_{t=2}^{T} y_{t-1} & \sum_{t=2}^{T} y_{t-1}^2 \end{pmatrix}$$
Problem (MLE and AR processes)

Question 12: Write a Matlab code to compute the asymptotic variance covariance matrix associated to the conditional maximum likelihood estimator $\hat{\theta} = (\hat{c}; \hat{\rho}; \hat{\sigma}^2)^\top$.

(1) Import the data from the excel file Chapter2_Exercice2.xls

(2) Compute the asymptotic variance covariance matrix of the conditional MLE.

(3) Compare your results with the results reported in Eviews.
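A hedged sketch of step (2); the column layout assumed for Chapter2_Exercice2.xls is an assumption here, and X, T, sigma2_hat are taken from the previous sketch:

    % Estimated asymptotic var-cov matrix of the conditional MLE
    % data = xlsread('Chapter2_Exercice2.xls'); y = data(:,1);  % assumed layout
    I_hat = zeros(3,3);                   % estimated average information matrix
    I_hat(1:2,1:2) = (X'*X)/((T-1)*sigma2_hat);
    I_hat(3,3)     = 1/(2*sigma2_hat^2);
    V_hat   = inv(I_hat)/(T-1);           % V_hat_asy(theta_hat)
    std_err = sqrt(diag(V_hat))           % asymptotic standard errors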
Perfect....

End of Exercises - Chapter 2

Christophe Hurlin
Exercises Chapter 4
Statistical Hypothesis Testing
Data science and advanced programming

Christophe Hurlin

HEC Lausanne

September 2024

Exercise 1

Parametric tests and the Neyman Pearson lemma
Problem

We consider two continuous independent random variables $U$ and $W$, normally distributed as $N(0, \sigma^2)$. The transformed variable $X$ defined by:

$$X = \sqrt{U^2 + W^2}$$

has a Rayleigh distribution with a parameter $\sigma^2$:

$$X \sim \mathrm{Rayleigh}(\sigma^2)$$

with a pdf $f_X(x; \sigma^2)$ defined by:

$$f_X(x; \sigma^2) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right) \quad \forall x \in [0, +\infty)$$
Problem (cont'd)

Question 1: We consider an i.i.d. sample $\{X_1, X_2, \ldots, X_N\}$. Derive the ML estimator of $\sigma^2$.
Solution

$$f_X(x; \sigma^2) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right) \quad \forall x \in [0, +\infty)$$

The log-likelihood of the i.i.d. sample $\{x_1, x_2, \ldots, x_N\}$ is

$$\ell_N(\sigma^2; x) = \sum_{i=1}^{N} \ln f_X(x_i; \sigma^2) = \sum_{i=1}^{N} \ln(x_i) - N \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} x_i^2$$

The ML estimator $\hat{\sigma}^2$ is defined as to be:

$$\hat{\sigma}^2 = \underset{\sigma^2 > 0}{\arg\max} \; \ell_N(\sigma^2; x)$$
Solution (cont'd)

$$\hat{\sigma}^2 = \underset{\sigma^2 > 0}{\arg\max} \; \sum_{i=1}^{N} \ln(x_i) - N \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} x_i^2$$

FOC (log-likelihood equations):

$$\left.\frac{\partial \ell_N(\sigma^2; x)}{\partial \sigma^2}\right|_{\hat{\sigma}^2} = -\frac{N}{\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{i=1}^{N} x_i^2 = 0$$

So, the ML estimator of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} X_i^2$$
Solution (cont'd)

$$\frac{\partial \ell_N(\sigma^2; x)}{\partial \sigma^2} = -\frac{N}{\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{N} x_i^2$$

SOC:

$$\left.\frac{\partial^2 \ell_N(\sigma^2; x)}{\partial (\sigma^2)^2}\right|_{\hat{\sigma}^2} = \frac{N}{\hat{\sigma}^4} - \frac{1}{\hat{\sigma}^6} \sum_{i=1}^{N} x_i^2 = \frac{N}{\hat{\sigma}^4} - \frac{2N\hat{\sigma}^2}{\hat{\sigma}^6} = -\frac{N}{\hat{\sigma}^4} < 0$$

since $\sum_{i=1}^{N} x_i^2 = 2N\hat{\sigma}^2$. So, we have a maximum.
Problem (cont'd)

Question 2: What is the asymptotic distribution of the ML estimator $\hat{\sigma}^2$?
Solution

The Fisher information associated to the sample is:

$$I_N(\sigma^2) = -E_{\sigma^2}\left(\frac{\partial^2 \ell_N(\sigma^2; X)}{\partial (\sigma^2)^2}\right) = -E_{\sigma^2}\left(\frac{N}{\sigma^4} - \frac{1}{\sigma^6} \sum_{i=1}^{N} X_i^2\right) = -\frac{N}{\sigma^4} + \frac{1}{\sigma^4} \sum_{i=1}^{N} E_{\sigma^2}\left(\frac{X_i^2}{\sigma^2}\right)$$

Since $(X/\sigma)^2 = (U/\sigma)^2 + (W/\sigma)^2$ where $U/\sigma$ and $W/\sigma$ are two independent standard normal variables, then $X^2/\sigma^2 \sim \chi^2(2)$ with

$$E_{\sigma^2}\left(\frac{X_i^2}{\sigma^2}\right) = 2$$
Solution (cont'd)

So, we have

$$I_N(\sigma^2) = -\frac{N}{\sigma^4} + \frac{1}{\sigma^4} \sum_{i=1}^{N} E_{\sigma^2}\left(\frac{X_i^2}{\sigma^2}\right) = -\frac{N}{\sigma^4} + \frac{2N}{\sigma^4} = \frac{N}{\sigma^4}$$

Since the sample is i.i.d., the average Fisher information matrix is:

$$I(\sigma^2) = \frac{1}{N} I_N(\sigma^2) = \frac{1}{\sigma^4}$$
Solution (cont'd)

The regularity conditions hold, and we have:

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, I^{-1}(\sigma^2)\right)$$

Here

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^4\right)$$

where $\sigma^2$ denotes the true value of the parameter. Or equivalently:

$$\hat{\sigma}^2 \overset{asy}{\sim} N\left(\sigma^2, \frac{\sigma^4}{N}\right)$$
Problem (cont'd)

Question 3: Consider the test

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 = \sigma_1^2$$

with $\sigma_1^2 > \sigma_0^2$. Determine the critical region of the UMP test of size $\alpha$.
Solution

Given the Neyman Pearson lemma, the rejection region is given by:

$$W = \left\{x_1, \ldots, x_N : \frac{L_N(\sigma_0^2; x_1, \ldots, x_N)}{L_N(\sigma_1^2; x_1, \ldots, x_N)} < K\right\}$$

where $K$ is a constant determined by the level of the test $\alpha$. So, we have

$$\ell_N(\sigma_0^2; x) - \ell_N(\sigma_1^2; x) < \ln(K)$$

$$\iff \sum_{i=1}^{N} \ln(x_i) - N \ln(\sigma_0^2) - \frac{1}{2\sigma_0^2} \sum_{i=1}^{N} x_i^2 - \sum_{i=1}^{N} \ln(x_i) + N \ln(\sigma_1^2) + \frac{1}{2\sigma_1^2} \sum_{i=1}^{N} x_i^2 < \ln(K)$$
Solution (cont'd)

$$N\left(\ln(\sigma_1^2) - \ln(\sigma_0^2)\right) + \frac{1}{2}\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right) \sum_{i=1}^{N} x_i^2 < \ln(K)$$

$$\iff \frac{1}{2}\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right) \sum_{i=1}^{N} x_i^2 < K_1$$

with $K_1 = \ln(K) - N\left(\ln(\sigma_1^2) - \ln(\sigma_0^2)\right)$, or equivalently:

$$\frac{\sigma_0^2 - \sigma_1^2}{\sigma_0^2 \sigma_1^2} \, \frac{1}{2} \sum_{i=1}^{N} x_i^2 < K_1$$
Solution (cont'd)

$$\frac{\sigma_0^2 - \sigma_1^2}{\sigma_0^2 \sigma_1^2} \, \frac{1}{2} \sum_{i=1}^{N} x_i^2 < K_1$$

Since $\sigma_1^2 > \sigma_0^2$, we have:

$$\frac{1}{2N} \sum_{i=1}^{N} x_i^2 > A$$

where $A = K_1 \sigma_0^2 \sigma_1^2 / \left(\left(\sigma_0^2 - \sigma_1^2\right) N\right)$ is a constant determined by $\alpha$.
Solution (cont'd)

The rejection region of the UMP test of size $\alpha$

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 = \sigma_1^2$$

with $\sigma_1^2 > \sigma_0^2$ is:

$$W = \left\{x : \hat{\sigma}^2(x) > A\right\}$$

where the critical value $A$ is a constant determined by the size $\alpha$ and $\hat{\sigma}^2(x)$ is the realisation of the ML estimator $\hat{\sigma}^2$ (the test statistic):

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} X_i^2$$
Solution (cont'd)

Given the definition of the size:

$$\alpha = \Pr(W \mid H_0) = \Pr\left(\hat{\sigma}^2 > A \mid H_0\right)$$

Under the null, for $N$ large, we have:

$$\hat{\sigma}^2 \underset{H_0}{\overset{asy}{\sim}} N\left(\sigma_0^2, \frac{\sigma_0^4}{N}\right)$$

Then

$$1 - \alpha = \Pr\left(\frac{\hat{\sigma}^2 - \sigma_0^2}{\sigma_0^2/\sqrt{N}} < \frac{A - \sigma_0^2}{\sigma_0^2/\sqrt{N}} \,\Big|\, H_0\right)$$
Solution (cont'd)

$$1 - \alpha = \Pr\left(\frac{\hat{\sigma}^2 - \sigma_0^2}{\sigma_0^2/\sqrt{N}} < \frac{A - \sigma_0^2}{\sigma_0^2/\sqrt{N}} \,\Big|\, H_0\right)$$

Denote by $\Phi(.)$ the cdf of the standard normal distribution:

$$A = \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}(1-\alpha)$$

The rejection region of the UMP test of size $\alpha$

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 = \sigma_1^2$$

with $\sigma_1^2 > \sigma_0^2$ is:

$$W = \left\{x : \hat{\sigma}^2(x) > \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}(1-\alpha)\right\}$$
Problem (cont'd)

Question 4: Consider the test

$$H_0: \sigma^2 = 2 \qquad H_1: \sigma^2 > 2$$

For a sample of size $N = 100$, we have

$$\sum_{i=1}^{N} x_i^2 = 470$$

What is the conclusion of the test for a size of 10%?
Solution

Consider the test

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 = \sigma_1^2$$

with $\sigma_1^2 > \sigma_0^2$. The rejection region of the UMP test of size $\alpha$ is given by:

$$W = \left\{x : \hat{\sigma}^2(x) > \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}(1-\alpha)\right\}$$

This region does not depend on the value of $\sigma_1^2$. So, it corresponds to the rejection region of the one-sided UMP test of size $\alpha$:

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 > \sigma_0^2$$
Solution (cont'd)

$$H_0: \sigma^2 = 2 \qquad H_1: \sigma^2 > 2$$

$$W = \left\{x : \hat{\sigma}^2(x) > \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}(1-\alpha)\right\}$$

NA: $N = 100$, $\alpha = 10\%$:

$$W = \left\{x : \hat{\sigma}^2(x) > 2 + \frac{2}{10} \Phi^{-1}(0.9)\right\} = \left\{x : \hat{\sigma}^2(x) > 2.2563\right\}$$
Solution (cont'd)

$$W = \left\{x : \hat{\sigma}^2(x) > 2.2563\right\}$$

For this sample ($N = 100$) we have $\sum_{i=1}^{N} x_i^2 = 470$, and as a consequence

$$\hat{\sigma}^2(x) = \frac{1}{2N} \sum_{i=1}^{N} x_i^2 = \frac{470}{200} = 2.35$$

For a significance level of 10%, we reject the null $H_0: \sigma^2 = 2$.
Problem (cont'd)

Question 5: Determine the power of the one-sided UMP test of size $\alpha$ for:

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 > \sigma_0^2$$

Numerical application: $N = 100$, $\sigma_0^2 = 2$ and $\alpha = 10\%$.
Solution

The rejection region of the UMP test of size $\alpha$ is:

$$W = \left\{x : \hat{\sigma}^2(x) > A\right\}$$

with $A = \sigma_0^2 + \Phi^{-1}(1-\alpha) \, \sigma_0^2/\sqrt{N}$. By definition of the power, we have:

$$\text{power} = \Pr(W \mid H_1) = \Pr\left(\hat{\sigma}^2 > \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}(1-\alpha) \,\Big|\, H_1\right)$$

Under the alternative hypothesis, for $N$ large, we have:

$$\hat{\sigma}^2 \underset{H_1}{\overset{asy}{\sim}} N\left(\sigma^2, \frac{\sigma^4}{N}\right) \qquad \sigma^2 > \sigma_0^2$$
Solution (cont'd)

Then, the power is equal to:

$$\text{power} = 1 - \Pr\left(\frac{\hat{\sigma}^2 - \sigma^2}{\sigma^2/\sqrt{N}} < \frac{A - \sigma^2}{\sigma^2/\sqrt{N}} \,\Big|\, H_1\right) = 1 - \Phi\left(\frac{A - \sigma^2}{\sigma^2/\sqrt{N}}\right)$$

Given the definition of the critical value $A = \sigma_0^2 + \Phi^{-1}(1-\alpha) \, \sigma_0^2/\sqrt{N}$, we have:

$$\text{power} = 1 - \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2} \Phi^{-1}(1-\alpha)\right) \quad \forall \sigma^2 > \sigma_0^2$$

NA: $\sigma_0^2 = 2$, $N = 100$ and $\alpha = 10\%$:

$$\text{power} = 1 - \Phi\left(\frac{2 - \sigma^2}{\sigma^2/10} + \frac{2}{\sigma^2} \Phi^{-1}(0.9)\right) \quad \forall \sigma^2 > \sigma_0^2$$
Solution (cont'd)

[Figure: power of the one-sided UMP test as a function of $\sigma^2$, for $N = 100$, $\sigma_0^2 = 2$ and $\alpha = 10\%$]
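A minimal Matlab sketch that reproduces this power curve:

    % Power of the one-sided UMP test as a function of sigma^2
    N = 100; alpha = 0.10; s20 = 2;                      % sigma_0^2 = 2
    s2 = linspace(s20, 3, 200);                          % values under H1
    power = 1 - normcdf((s20 - s2)./(s2/sqrt(N)) ...
                        + (s20./s2)*norminv(1 - alpha));
    plot(s2, power); xlabel('\sigma^2'); ylabel('power')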
Problem (cont'd)

Question 6: Consider the two-sided test

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 \neq \sigma_0^2$$

What is the critical region of the test of size $\alpha$?
Solution

Consider the one-sided tests:

Test A: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 < \sigma_0^2$

Test B: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 > \sigma_0^2$

The non-rejection regions of the UMP one-sided tests of size $\alpha/2$ are:

$$\overline{W}_A = \left\{x : \hat{\sigma}^2(x) > \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(\frac{\alpha}{2}\right)\right\}$$

$$\overline{W}_B = \left\{x : \hat{\sigma}^2(x) < \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right\}$$

The non-rejection region of the two-sided test corresponds to the intersection of these two regions:

$$\overline{W} = \overline{W}_A \cap \overline{W}_B$$
Solution (cont'd)

So, the non-rejection region of the two-sided test of size $\alpha$ is:

$$\overline{W} = \left\{x : \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(\frac{\alpha}{2}\right) < \hat{\sigma}^2(x) < \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right\}$$

Since $\Phi^{-1}(\alpha/2) = -\Phi^{-1}(1 - \alpha/2)$, this region can be rewritten as:

$$\overline{W} = \left\{x : \left|\hat{\sigma}^2(x) - \sigma_0^2\right| < \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right\}$$

The rejection region of the two-sided test of size $\alpha$ is:

$$W = \left\{x : \left|\hat{\sigma}^2(x) - \sigma_0^2\right| > \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right\}$$
Problem (cont'd)

Question 7: Determine the power of the two-sided test of size $\alpha$ for:

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 \neq \sigma_0^2$$

Numerical application: $N = 100$, $\sigma_0^2 = 2$ and $\alpha = 10\%$.
Solution

The non-rejection region of the two-sided test of size $\alpha$ is:

$$\overline{W} = \left\{x : A < \hat{\sigma}^2(x) < B\right\}$$

$$A = \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(\frac{\alpha}{2}\right) \qquad B = \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)$$

By definition of the power:

$$\text{power} = \Pr(W \mid H_1) = 1 - \Pr(\overline{W} \mid H_1)$$

So, we have:

$$\text{power} = 1 - \Pr\left(\hat{\sigma}^2 < B\right) + \Pr\left(\hat{\sigma}^2 < A\right)$$
Solution (cont'd)

$$\text{power} = 1 - \Pr\left(\hat{\sigma}^2 < B\right) + \Pr\left(\hat{\sigma}^2 < A\right)$$

Under the alternative

$$\hat{\sigma}^2 \underset{H_1}{\overset{asy}{\sim}} N\left(\sigma^2, \frac{\sigma^4}{N}\right) \qquad \sigma^2 \neq \sigma_0^2$$

So, we have

$$\text{power} = 1 - \Phi\left(\frac{B - \sigma^2}{\sigma^2/\sqrt{N}}\right) + \Phi\left(\frac{A - \sigma^2}{\sigma^2/\sqrt{N}}\right)$$
Solution (cont'd)

We have

$$\text{power} = 1 - \Phi\left(\frac{B - \sigma^2}{\sigma^2/\sqrt{N}}\right) + \Phi\left(\frac{A - \sigma^2}{\sigma^2/\sqrt{N}}\right)$$

$$A = \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(\frac{\alpha}{2}\right) \qquad B = \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)$$

So $\forall \sigma^2 \neq \sigma_0^2$, the power function of the two-sided test is defined by:

$$\text{power} = 1 - \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right) + \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2} \Phi^{-1}\left(\frac{\alpha}{2}\right)\right)$$
Solution (cont'd)

NA: $N = 100$, $\alpha = 10\%$ and $\sigma_0^2 = 2$. $\forall \sigma^2 \neq 2$:

$$\text{power} = 1 - \Phi\left(\frac{2 - \sigma^2}{\sigma^2/10} + \frac{2}{\sigma^2} \Phi^{-1}(0.95)\right) + \Phi\left(\frac{2 - \sigma^2}{\sigma^2/10} + \frac{2}{\sigma^2} \Phi^{-1}(0.05)\right)$$
Solution (cont’d)
[Figure: power function of the two-sided test plotted against σ² over [1, 3]; the power reaches its minimum α = 0.10 at σ² = 2 and rises towards 1 on both sides.]
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 36 / 88
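The two-sided power curve above can also be evaluated numerically. This Matlab sketch (same Statistics Toolbox assumption as before) reproduces the figure and checks that the minimum of the curve is α at σ² = σ0²:

% Two-sided test: power for sigma2_0 = 2, N = 100, alpha = 10%
sigma2_0 = 2; N = 100; alpha = 0.10;
s2 = linspace(1, 3, 401);
z  = (sigma2_0 - s2)./(s2/sqrt(N));
power = 1 - normcdf(z + (sigma2_0./s2).*norminv(1 - alpha/2)) ...
          + normcdf(z + (sigma2_0./s2).*norminv(alpha/2));
plot(s2, power), xlabel('\sigma^2'), ylabel('power')
[pmin, idx] = min(power);   % pmin is close to 0.10 and s2(idx) close to 2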
Problem (cont’d)
Question 8: show that the two-sided test is unbiased and consistent.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 38 / 88
Solution
The power function is defined by:

$$P\left(\sigma^2\right) = 1 - \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right) + \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2}\Phi^{-1}\left(\frac{\alpha}{2}\right)\right)$$

If $\sigma^2 < \sigma_0^2$, then:

$$\lim_{N \to \infty} P\left(\sigma^2\right) = 1 - \Phi(+\infty) + \Phi(+\infty) = 1 - 1 + 1 = 1$$

If $\sigma^2 > \sigma_0^2$, then:

$$\lim_{N \to \infty} P\left(\sigma^2\right) = 1 - \Phi(-\infty) + \Phi(-\infty) = 1 - 0 + 0 = 1$$

The test is consistent.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 39 / 88
Solution (cont’d)
The power function is defined by:

$$P\left(\sigma^2\right) = 1 - \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right) + \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2}\Phi^{-1}\left(\frac{\alpha}{2}\right)\right)$$

This function reaches a minimum when $\sigma^2$ tends to $\sigma_0^2$:

$$\lim_{\sigma^2 \to \sigma_0^2} P\left(\sigma^2\right) = 1 - \Phi\left(\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right) + \Phi\left(\Phi^{-1}\left(\frac{\alpha}{2}\right)\right) = 1 - \left(1 - \frac{\alpha}{2}\right) + \frac{\alpha}{2} = \alpha$$

The test is unbiased.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 40 / 88
Subsection 4.2

The trilogy: LRT, Wald, and LM tests

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 41 / 88
Problem (Greene, 2007, page 531)
We consider two random variables Y and X such that the pdf of the conditional distribution Y | X = x is given by:

$$f_{Y|X}\left(\left. y \right| x; \beta\right) = \frac{1}{\beta + x}\exp\left(-\frac{y}{\beta + x}\right)$$

For convenience, let

$$\beta_i = \frac{1}{\beta + x_i}$$

This exponential density is a restricted form of a more general gamma distribution,

$$f_{Y|X}\left(\left. y_i \right| x_i; \beta, \rho\right) = \frac{\beta_i^{\rho}}{\Gamma(\rho)}\, y_i^{\rho - 1}\exp\left(-y_i \beta_i\right)$$

The restriction is $\rho = 1$. We want to test the hypothesis

$$H_0: \rho = 1 \quad \text{versus} \quad H_1: \rho \neq 1$$

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 42 / 88
Reminder: the gamma function
The gamma function Γ(p) is defined by:

$$\Gamma(p) = \int_0^{\infty} t^{p-1}\exp(-t)\, dt \qquad \forall\, p > 0$$

The gamma function obeys the recursion:

$$\Gamma(p) = (p - 1)\,\Gamma(p - 1) \qquad \Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$$

So for integer values of p, we have:

$$\Gamma(p) = (p - 1)!$$

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 43 / 88
Reminder: the gamma function (cont’d)
The derivatives of the gamma function are:

$$\frac{\partial^k \Gamma(p)}{\partial p^k} = \int_0^{\infty} \left(\ln(t)\right)^k t^{p-1}\exp(-t)\, dt$$

The first two derivatives of ln(Γ(p)) are denoted:

$$\frac{\partial \ln(\Gamma(p))}{\partial p} = \frac{\Gamma'}{\Gamma} = \Psi(p) \qquad \frac{\partial^2 \ln(\Gamma(p))}{\partial p^2} = \frac{\Gamma\,\Gamma'' - \Gamma'^2}{\Gamma^2} = \Psi'(p)$$

where Ψ(p) and Ψ'(p) are the digamma and trigamma functions (see the polygamma function, implemented as psi in Matlab).

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 44 / 88
Problem (cont’d)
Question 1: consider an i.i.d. sample $\{X_i, Y_i\}_{i=1}^N$ and write its log-likelihood under $H_1$ (unconstrained model) and under $H_0$ (constrained model).

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 45 / 88
Solution
Under $H_1$, with $\theta = (\beta : \rho)^{\top}$, we have:

$$f_{Y_i|X_i}\left(\left. y_i \right| x_i; \theta\right) = \frac{\beta_i^{\rho}}{\Gamma(\rho)}\, y_i^{\rho - 1}\exp\left(-y_i \beta_i\right) \quad \text{with } \beta_i = \frac{1}{\beta + x_i}$$

$$\ell_N\left(\left. y \right| x; \theta\right) = \sum_{i=1}^N \ln f_{Y_i|X_i}\left(\left. y_i \right| x_i; \theta\right)$$

The log-likelihood under $H_1$ (unconstrained model) is:

$$\ell_N\left(\left. y \right| x; \theta\right) = \rho \sum_{i=1}^N \ln(\beta_i) - N \ln(\Gamma(\rho)) + (\rho - 1)\sum_{i=1}^N \ln(y_i) - \sum_{i=1}^N y_i \beta_i$$

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 46 / 88
Solution (cont’d)
Under $H_0: \rho = 1$, we have:

$$f_{Y_i|X_i}\left(\left. y_i \right| x_i; \beta\right) = \beta_i \exp\left(-y_i \beta_i\right) \quad \text{with } \beta_i = \frac{1}{\beta + x_i}$$

$$\ell_N\left(\left. y \right| x; \beta\right) = \sum_{i=1}^N \ln f_{Y_i|X_i}\left(\left. y_i \right| x_i; \beta\right)$$

The log-likelihood under $H_0$ (constrained model) is:

$$\ell_N\left(\left. y \right| x; \beta\right) = \sum_{i=1}^N \ln(\beta_i) - \sum_{i=1}^N y_i \beta_i$$

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 47 / 88
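Both log-likelihoods are straightforward to code. The following Matlab sketch is one possible implementation (the handles and variable names are ours, not Greene's; theta = [beta; rho], and x, y are N x 1 data vectors):

% Unconstrained (H1) and constrained (H0) log-likelihoods as anonymous functions
loglik_H1 = @(theta, y, x) sum( theta(2)*log(1./(theta(1) + x)) ...
             - gammaln(theta(2)) + (theta(2) - 1)*log(y) - y./(theta(1) + x) );
loglik_H0 = @(beta, y, x)  sum( log(1./(beta + x)) - y./(beta + x) );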
Problem (cont’d)
Question 2: write the gradient vectors and the Hessian matrices associated to the
unconstrained log-likelihood (under H1 ) and to the constrained log-likelihood (under H0 ).

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 48 / 88
Solution
Under $H_1$:

$$\ell_N\left(\left. y \right| x; \theta\right) = \rho \sum_{i=1}^N \ln(\beta_i) - N \ln(\Gamma(\rho)) + (\rho - 1)\sum_{i=1}^N \ln(y_i) - \sum_{i=1}^N y_i \beta_i$$

Remarks:

$$\frac{\partial \beta_i}{\partial \beta} = \frac{\partial\left(1/(\beta + x_i)\right)}{\partial \beta} = -\frac{1}{(\beta + x_i)^2} = -\beta_i^2$$

$$\frac{\partial \ln(\beta_i)}{\partial \beta} = \frac{\partial\left(-\ln(\beta + x_i)\right)}{\partial \beta} = -\frac{1}{\beta + x_i} = -\beta_i$$

$$\frac{\partial \ln(\Gamma(\rho))}{\partial \rho} = \Psi(\rho)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 49 / 88
Solution (cont’d)
$$\ell_N\left(\left. y \right| x; \theta\right) = \rho \sum_{i=1}^N \ln(\beta_i) - N \ln(\Gamma(\rho)) + (\rho - 1)\sum_{i=1}^N \ln(y_i) - \sum_{i=1}^N y_i \beta_i$$

The gradient vector under $H_1$ is:

$$g_N\left(\left. y \right| x; \theta\right) = \frac{\partial \ell_N\left(\left. y \right| x; \theta\right)}{\partial \theta} = \begin{pmatrix} \partial \ell_N\left(\left. y \right| x; \theta\right)/\partial \beta \\ \partial \ell_N\left(\left. y \right| x; \theta\right)/\partial \rho \end{pmatrix}$$

with

$$\frac{\partial \ell_N\left(\left. y \right| x; \theta\right)}{\partial \beta} = -\rho \sum_{i=1}^N \beta_i + \sum_{i=1}^N y_i \beta_i^2$$

$$\frac{\partial \ell_N\left(\left. y \right| x; \theta\right)}{\partial \rho} = \sum_{i=1}^N \ln(\beta_i) - N\,\Psi(\rho) + \sum_{i=1}^N \ln(y_i)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 50 / 88
Solution (cont’d)
$$\frac{\partial \ell_N\left(\left. y \right| x; \theta\right)}{\partial \beta} = -\rho \sum_{i=1}^N \beta_i + \sum_{i=1}^N y_i \beta_i^2$$

So, we have:

$$\frac{\partial^2 \ell_N\left(\left. y \right| x; \theta\right)}{\partial \beta^2} = \rho \sum_{i=1}^N \beta_i^2 - 2 \sum_{i=1}^N y_i \beta_i^3$$

$$\frac{\partial^2 \ell_N\left(\left. y \right| x; \theta\right)}{\partial \beta \partial \rho} = -\sum_{i=1}^N \beta_i$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 51 / 88
Solution (cont’d)
$$\frac{\partial \ell_N\left(\left. y \right| x; \theta\right)}{\partial \rho} = \sum_{i=1}^N \ln(\beta_i) - N\,\Psi(\rho) + \sum_{i=1}^N \ln(y_i)$$

So, we have:

$$\frac{\partial^2 \ell_N\left(\left. y \right| x; \theta\right)}{\partial \rho^2} = -N\,\Psi'(\rho)$$

$$\frac{\partial^2 \ell_N\left(\left. y \right| x; \theta\right)}{\partial \rho \partial \beta} = -\sum_{i=1}^N \beta_i$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 52 / 88
Solution (cont’d)
The Hessian matrix associated to the log-likelihood under $H_1$ is:

$$H_N\left(\left. y \right| x; \theta\right) = \frac{\partial^2 \ell_N\left(\left. y \right| x; \theta\right)}{\partial \theta \partial \theta^{\top}} = \begin{pmatrix} \partial^2 \ell_N/\partial \beta^2 & \partial^2 \ell_N/\partial \beta \partial \rho \\ \partial^2 \ell_N/\partial \rho \partial \beta & \partial^2 \ell_N/\partial \rho^2 \end{pmatrix}$$

with

$$H_N\left(\left. y \right| x; \theta\right) = \begin{pmatrix} \rho \sum_{i=1}^N \beta_i^2 - 2\sum_{i=1}^N y_i \beta_i^3 & -\sum_{i=1}^N \beta_i \\ -\sum_{i=1}^N \beta_i & -N\,\Psi'(\rho) \end{pmatrix}$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 53 / 88
Solution (cont’d)
Under $H_0: \rho = 1$, the gradient (scalar) is:

$$g_N\left(\left. y \right| x; \beta\right) = \frac{\partial \ell_N\left(\left. y \right| x; \beta\right)}{\partial \beta} = -\sum_{i=1}^N \beta_i + \sum_{i=1}^N y_i \beta_i^2$$

The Hessian (scalar) is:

$$H_N\left(\left. y \right| x; \beta\right) = \frac{\partial^2 \ell_N\left(\left. y \right| x; \beta\right)}{\partial \beta^2} = \sum_{i=1}^N \beta_i^2 - 2\sum_{i=1}^N y_i \beta_i^3$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 54 / 88
Problem (cont’d)
Question 3: write the average Fisher information matrices under H1 and under H0 .

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 55 / 88
Solution
Under $H_1$ (unconstrained model), the Hessian (stochastic) is:

$$H_i\left(\left. Y_i \right| x_i; \theta\right) = \begin{pmatrix} \rho \beta_i^2 - 2 Y_i \beta_i^3 & -\beta_i \\ -\beta_i & -\Psi'(\rho) \end{pmatrix}$$

The average Fisher information matrix can be defined (one of the three definitions) as:

$$\mathcal{I}(\theta) = \mathbb{E}_X\left[\mathbb{E}_{\theta}\left(-H_i\left(\left. Y_i \right| x_i; \theta\right)\right)\right]$$

$$\mathcal{I}(\theta) = \mathbb{E}_X \begin{pmatrix} -\rho \beta_i^2 + 2\,\mathbb{E}_{\theta}(Y_i)\,\beta_i^3 & \beta_i \\ \beta_i & \Psi'(\rho) \end{pmatrix}$$

since $\beta_i = 1/(\beta + X_i)$ depends on the random variable $X_i$.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 56 / 88
Solution
$$\mathcal{I}(\theta) = \mathbb{E}_X \begin{pmatrix} -\rho \beta_i^2 + 2\,\mathbb{E}_{\theta}(Y_i)\,\beta_i^3 & \beta_i \\ \beta_i & \Psi'(\rho) \end{pmatrix}$$

Consider the score of the unit $i$. By definition, we have:

$$\mathbb{E}_{\theta}\left(s_i\left(\left. Y_i \right| x_i; \theta\right)\right) = \begin{pmatrix} -\rho \beta_i + \mathbb{E}_{\theta}(Y_i)\,\beta_i^2 \\ \ln(\beta_i) - \Psi(\rho) + \mathbb{E}_{\theta}\left(\ln(Y_i)\right) \end{pmatrix} = 0_{2 \times 1}$$

So, we have:

$$\mathbb{E}_{\theta}(Y_i) = \frac{\rho}{\beta_i}$$

where $\mathbb{E}_{\theta}$ denotes the expectation with respect to the conditional distribution of $Y$ given $X = x$.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 57 / 88
Solution (cont’d)
$$\mathcal{I}(\theta) = \mathbb{E}_X \begin{pmatrix} -\rho \beta_i^2 + 2\,\mathbb{E}_{\theta}(Y_i)\,\beta_i^3 & \beta_i \\ \beta_i & \Psi'(\rho) \end{pmatrix} \qquad \mathbb{E}_{\theta}(Y_i) = \frac{\rho}{\beta_i}$$

Under $H_1$ (unconstrained model), the average Fisher information matrix is therefore:

$$\mathcal{I}(\theta) = \mathbb{E}_X \begin{pmatrix} \rho \beta_i^2 & \beta_i \\ \beta_i & \Psi'(\rho) \end{pmatrix}$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 58 / 88
Solution (cont’d)
Under $H_0$ (constrained model), we have:

$$H_i\left(\left. Y_i \right| x_i; \beta\right) = \beta_i^2 - 2 Y_i \beta_i^3 \qquad \mathbb{E}_{\beta}\left(s_i\left(\left. Y_i \right| x_i; \beta\right)\right) = \mathbb{E}_{\beta}\left(-\beta_i + Y_i \beta_i^2\right) = 0$$

The average Fisher information number is defined by:

$$\mathcal{I}(\beta) = \mathbb{E}_X\left[\mathbb{E}_{\beta}\left(-H_i\left(\left. Y_i \right| x_i; \beta\right)\right)\right] = \mathbb{E}_X\left(-\beta_i^2 + 2\,\mathbb{E}_{\beta}(Y_i)\,\beta_i^3\right) = \mathbb{E}_X\left(\beta_i^2\right)$$

The average Fisher information number is equal to:

$$\mathcal{I}(\beta) = \mathbb{E}_X\left(\beta_i^2\right)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 59 / 88
Problem (cont’d)
Question 4: denote $\hat{\theta}_{H_1}$ the ML estimator of $\theta = (\beta : \rho)^{\top}$ obtained under $H_1$ and $\hat{\theta}_{H_0} = \hat{\beta}_{H_0}$ the ML estimator of $\beta$ obtained under $H_0: \rho = 1$. Determine the asymptotic distribution and the asymptotic variance covariance matrix of $\hat{\theta}_{H_1}$ and $\hat{\theta}_{H_0}$.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 60 / 88
Solution
The regularity conditions hold. Under $H_1$ (unconstrained model) we have:

$$\sqrt{N}\left(\hat{\theta}_{H_1} - \theta_1\right) \overset{d}{\to} \mathcal{N}\left(0, \mathcal{I}^{-1}(\theta_1)\right)$$

where $\theta_1$ denotes the true value of the parameters (under $H_1$), or equivalently:

$$\hat{\theta}_{H_1} \underset{H_1}{\overset{asy}{\sim}} \mathcal{N}\left(\theta_1, \frac{1}{N}\mathcal{I}^{-1}(\theta_1)\right) \quad \text{with } \mathcal{I}(\theta_1) = \mathbb{E}_X \begin{pmatrix} \rho \beta_i^2 & \beta_i \\ \beta_i & \Psi'(\rho) \end{pmatrix}$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 61 / 88
Solution (cont’d)
The regularity conditions hold. Under $H_0$ (constrained model) we have:

$$\sqrt{N}\left(\hat{\theta}_{H_0} - \theta_0\right) \overset{d}{\to} \mathcal{N}\left(0, \mathcal{I}^{-1}(\theta_0)\right)$$

where $\theta_0 = \beta_0$ denotes the true value of the parameter (under $H_0$), or equivalently:

$$\hat{\theta}_{H_0} \underset{H_0}{\overset{asy}{\sim}} \mathcal{N}\left(\theta_0, \frac{1}{N}\mathcal{I}^{-1}(\theta_0)\right) \quad \text{with } \mathcal{I}(\theta_0) = \mathbb{E}_X\left(\beta_i^2\right)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 62 / 88
Problem (cont’d)
Question 5: Propose three alternative estimators of the average Fisher information
matrices under H1 and under H0 .

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 63 / 88
Solution
Three alternative estimators of the average Fisher information matrix $\mathcal{I}(\theta)$ can be used:

$$\hat{\mathcal{I}}_A\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \hat{\mathcal{I}}_i\left(\hat{\theta}\right)$$

$$\hat{\mathcal{I}}_B\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \left(\left.\frac{\partial \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta}\right|_{\hat{\theta}}\right)\left(\left.\frac{\partial \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta}\right|_{\hat{\theta}}\right)^{\top}$$

$$\hat{\mathcal{I}}_C\left(\hat{\theta}\right) = -\frac{1}{N}\sum_{i=1}^N \left.\frac{\partial^2 \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta \partial \theta^{\top}}\right|_{\hat{\theta}}$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 64 / 88
Solution (cont’d)
First estimator: actual Fisher information matrix

$$\hat{\mathcal{I}}_A\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \hat{\mathcal{I}}_i\left(\hat{\theta}\right)$$

Under $H_1$:

$$\hat{\mathcal{I}}_A\left(\hat{\theta}\right) = \frac{1}{N}\begin{pmatrix} \hat{\rho}\sum_{i=1}^N \hat{\beta}_i^2 & \sum_{i=1}^N \hat{\beta}_i \\ \sum_{i=1}^N \hat{\beta}_i & N\,\Psi'(\hat{\rho}) \end{pmatrix}$$

Under $H_0$:

$$\hat{\mathcal{I}}_A\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \hat{\mathcal{I}}_i\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \hat{\beta}_i^2$$

where $\hat{\beta}_i = 1/\left(\hat{\beta} + x_i\right)$ and where the estimators $\hat{\beta}$ and $\hat{\rho}$ are obtained under $H_1$ (unconstrained model) or $H_0$ (constrained model), given the case.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 65 / 88
Solution (cont’d)
Second estimator: BHHH estimator

$$\hat{\mathcal{I}}_B\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \left(\left.\frac{\partial \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta}\right|_{\hat{\theta}}\right)\left(\left.\frac{\partial \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta}\right|_{\hat{\theta}}\right)^{\top}$$

Under $H_1$:

$$\hat{\mathcal{I}}_B\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \begin{pmatrix} -\hat{\rho}\hat{\beta}_i + y_i \hat{\beta}_i^2 \\ \ln\left(\hat{\beta}_i\right) - \Psi(\hat{\rho}) + \ln(y_i) \end{pmatrix}\begin{pmatrix} -\hat{\rho}\hat{\beta}_i + y_i \hat{\beta}_i^2 \\ \ln\left(\hat{\beta}_i\right) - \Psi(\hat{\rho}) + \ln(y_i) \end{pmatrix}^{\top}$$

Under $H_0$:

$$\hat{\mathcal{I}}_B\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \left(-\hat{\beta}_i + y_i \hat{\beta}_i^2\right)^2$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 66 / 88
Solution (cont’d)
Third estimator: Hessian

$$\hat{\mathcal{I}}_C\left(\hat{\theta}\right) = -\frac{1}{N}\sum_{i=1}^N \left.\frac{\partial^2 \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta \partial \theta^{\top}}\right|_{\hat{\theta}}$$

Under $H_1$:

$$\hat{\mathcal{I}}_C\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \begin{pmatrix} -\hat{\rho}\hat{\beta}_i^2 + 2 y_i \hat{\beta}_i^3 & \hat{\beta}_i \\ \hat{\beta}_i & \Psi'(\hat{\rho}) \end{pmatrix}$$

Under $H_0$:

$$\hat{\mathcal{I}}_C\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \left(-\hat{\beta}_i^2 + 2 y_i \hat{\beta}_i^3\right)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 67 / 88
Problem (cont’d)
Question 6: Consider the dataset provided by Greene (2007) in the file Chapter4_Exercise2.xls. Write a Matlab code (1) to estimate the parameters of the model under H1 (unconstrained model) by MLE, and (2) to compute three alternative estimates of the asymptotic variance covariance matrix.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 68 / 88
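The Matlab code and output shown on the original slides are not reproduced here. A minimal sketch of what such a code could look like is given below; it reuses the log-likelihood handles defined earlier, uses fminsearch (base Matlab), and assumes the data have been loaded into vectors y and x (e.g. from Chapter4_Exercise2.xls):

% --- Unconstrained MLE (H1): maximize the log-likelihood over theta = [beta; rho]
negll     = @(t) -loglik_H1(t, y, x);
theta_hat = fminsearch(negll, [1; 1]);      % starting values are arbitrary
beta_hat  = theta_hat(1);  rho_hat = theta_hat(2);
b = 1./(beta_hat + x);                      % beta_i evaluated at the MLE
N = numel(y);

% Estimator A: actual Fisher information matrix
IA = [rho_hat*sum(b.^2), sum(b); sum(b), N*psi(1, rho_hat)]/N;
% Estimator B: BHHH (outer product of the individual scores)
s  = [-rho_hat*b + y.*b.^2, log(b) - psi(0, rho_hat) + log(y)];
IB = (s'*s)/N;
% Estimator C: minus the (average) Hessian
IC = [2*sum(y.*b.^3) - rho_hat*sum(b.^2), sum(b); sum(b), N*psi(1, rho_hat)]/N;

VA = inv(N*IA);  VB = inv(N*IB);  VC = inv(N*IC);   % three estimates of Vasy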
Remarks

1 Asymptotically, the three estimators of the asymptotic variance covariance matrix are equivalent.

2 But this exercise confirms that these estimators can give very different results for small samples.

3 The striking difference of the BHHH estimator is typical of its erratic performance in small samples.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 72 / 88
Problem (cont’d)
Question 7: Write a Matlab code (1) to estimate the parameters of the model under H0 (constrained model) by MLE, and (2) to compute three alternative estimates of the asymptotic variance covariance matrix.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 73 / 88
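Again, the original slides show the Matlab code and output, which are not reproduced here. A minimal sketch for the constrained model, under the same assumptions as the unconstrained sketch above:

% --- Constrained MLE (H0: rho = 1): scalar optimization over beta
negll0  = @(bet) -loglik_H0(bet, y, x);
beta_H0 = fminsearch(negll0, 1);
b0 = 1./(beta_H0 + x);  N = numel(y);

IA0 = sum(b0.^2)/N;                          % actual Fisher information
IB0 = sum((-b0 + y.*b0.^2).^2)/N;            % BHHH
IC0 = (2*sum(y.*b0.^3) - sum(b0.^2))/N;      % minus the Hessian
V0  = 1./(N*[IA0, IB0, IC0]);                % three asymptotic variance estimates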
Problem (cont’d)
Question 8: test the hypothesis

$$H_0: \rho = 1 \quad \text{versus} \quad H_1: \rho \neq 1$$

with a likelihood ratio (LR) test for a significance level of 5%.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 77 / 88
Solution
The likelihood ratio (LR) test-statistic is defined by:

$$LR = -2\left(\ell_N\left(\hat{\theta}_{H_0}; \left. y \right| x\right) - \ell_N\left(\hat{\theta}_{H_1}; \left. y \right| x\right)\right)$$

In this sample, we have a realisation equal to:

$$LR(y) = -2\left(-88.4363 + 82.9160\right) = 11.0406$$

The critical region is:

$$W = \left\{ y : LR(y) > \chi^2_{0.95}(1) = 3.8415 \right\}$$

where $\chi^2_{0.95}(1)$ is the critical value of the chi-squared distribution with $p = 1$ degree of freedom. Conclusion: for a significance level of 5%, we reject the null $H_0: \rho = 1$.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 78 / 88
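In Matlab, the LR statistic and its critical value follow directly; a sketch reusing the objects defined above (chi2inv requires the Statistics Toolbox):

LR     = -2*( loglik_H0(beta_H0, y, x) - loglik_H1(theta_hat, y, x) );
crit   = chi2inv(0.95, 1);      % 3.8415
reject = LR > crit;             % LR = 11.0406 here, so H0 is rejected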
Problem (cont’d)
Question 9: test the hypothesis

$$H_0: \rho = 1 \quad \text{versus} \quad H_1: \rho \neq 1$$

with a Wald test for a significance level of 5%.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 79 / 88
Solution
The null hypothesis $H_0: \rho = 1$ can be expressed as:

$$H_0: c(\theta) = 0$$

with $c(\theta) = \rho - 1$. The Wald test-statistic is defined by:

$$Wald = c\left(\hat{\theta}_{H_1}\right)^{\top}\left(\frac{\partial c}{\partial \theta^{\top}}\left(\hat{\theta}_{H_1}\right)\,\hat{V}_{asy}\left(\hat{\theta}_{H_1}\right)\,\frac{\partial c}{\partial \theta^{\top}}\left(\hat{\theta}_{H_1}\right)^{\top}\right)^{-1} c\left(\hat{\theta}_{H_1}\right)$$

Here, we have:

$$\frac{\partial c}{\partial \theta^{\top}}\left(\hat{\theta}_{H_1}\right) = \begin{pmatrix} 0 & 1 \end{pmatrix}$$

Then, we get:

$$Wald(y) = \left(\hat{\rho}_{H_1} - 1\right)^2 \hat{V}_{asy}^{-1}\left(\hat{\rho}_{H_1}\right)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 80 / 88
Solution (cont’d)
$$Wald(y) = \left(\hat{\rho}_{H_1} - 1\right)^2 \hat{V}_{asy}^{-1}\left(\hat{\rho}_{H_1}\right)$$

Given the estimator chosen for the asymptotic variance, we get:

$$Wald_A(y) = \frac{(3.1509 - 1)^2}{0.5768} = 8.0214 \qquad Wald_B(y) = \frac{(3.1509 - 1)^2}{1.5372} = 3.0096 \qquad Wald_C(y) = \frac{(3.1509 - 1)^2}{0.6309} = 7.3335$$

The critical region is:

$$W = \left\{ y : Wald(y) > \chi^2_{0.95}(1) = 3.8415 \right\}$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 81 / 88
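The three realisations above are simple to verify; a sketch using the reported point estimate and variances:

rho_hat = 3.1509;
V_rho   = [0.5768, 1.5372, 0.6309];      % variance of rho_hat: estimators A, B, C
Wald    = (rho_hat - 1)^2 ./ V_rho       % 8.0214, 3.0096, 7.3335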
Solution (cont’d)
Conclusion:

1 For a significance level of 5%, the Wald test-statistics based on estimators A and C (actual Fisher matrix and Hessian) of the asymptotic variance covariance matrix lead to rejecting the null H0: ρ = 1.

2 For a significance level of 5%, the Wald test-statistic based on estimator B (BHHH estimator) of the asymptotic variance covariance matrix fails to reject the null H0: ρ = 1.

3 In most software, the Hessian (estimator C) is preferred and the Wald test-statistics are computed with this estimator.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 82 / 88
Problem (cont’d)
Question 10: test the hypothesis

$$H_0: \rho = 1 \quad \text{versus} \quad H_1: \rho \neq 1$$

with a Lagrange Multiplier test for a significance level of 5%. Write a Matlab code to compute the three possible values of the LM test-statistic.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 83 / 88
Solution
The Lagrange multiplier test is based on the restricted estimators. The LM test-statistic is defined by:

$$LM = s_N\left(\hat{\theta}_{H_0}; \left. y_i \right| x_i\right)^{\top}\, \hat{\mathcal{I}}_N^{-1}\left(\hat{\theta}_{H_0}\right)\, s_N\left(\hat{\theta}_{H_0}; \left. y_i \right| x_i\right)$$

or equivalently:

$$LM = s_N\left(\hat{\theta}_{H_0}; \left. y_i \right| x_i\right)^{\top}\left(N\,\hat{\mathcal{I}}\left(\hat{\theta}_{H_0}\right)\right)^{-1} s_N\left(\hat{\theta}_{H_0}; \left. y_i \right| x_i\right)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 84 / 88
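The Matlab code shown on the original slides is not reproduced here. A minimal sketch of the LM computation could look as follows; it evaluates the unconstrained score and one of the three information estimators (here the actual-Fisher version) at the constrained MLE, and the variable names are ours:

b0 = 1./(beta_H0 + x);  N = numel(y);    % beta_i at the constrained MLE, rho = 1
sN = [ sum(-b0 + y.*b0.^2) ;  sum(log(b0)) - N*psi(0, 1) + sum(log(y)) ];
I0 = [ sum(b0.^2), sum(b0) ; sum(b0), N*psi(1, 1) ]/N;   % estimator A at (beta_H0, 1)
LM = sN' * inv(N*I0) * sN;   % plugging in estimators B or C instead yields LM_B, LM_C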
Solution (cont’d)
So, given the estimator chosen for $\hat{\mathcal{I}}\left(\hat{\theta}_{H_0}\right)$, we have:

$$LM_A(y) = 4.7825 \qquad LM_B(y) = 15.6868 \qquad LM_C(y) = 5.1162$$

The critical region is:

$$W = \left\{ y : LM(y) > \chi^2_{0.95}(1) = 3.8415 \right\}$$

Conclusion: for a significance level of 5%, we reject the null $H_0: \rho = 1$, whatever the choice of the estimator.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 87 / 88
End of Exercises - Chapter 4

Christophe Hurlin

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 88 / 88
Data science and advanced programming
Correction - Series 1
Christophe HURLIN

October 27, 2024

Exercise: Maximum likelihood and LR-LM-Wald tests. Grading scale: 30 points

Part I: Maximum likelihood with θ1 known (12 points)

Question 1 (2 points) We know that the variables (X1, ..., Xn) are i.i.d. with the same distribution as X (0.5 point). Therefore, we have:

$$\ell_n(\theta_2; x) = \sum_{i=1}^n \ln f_X(x_i; \theta_2) \tag{1}$$

$$= \sum_{i=1}^n \left[ \ln(1 + \theta_1) + \ln \theta_2 + \theta_1 \ln(x_i) - \theta_2\, x_i^{1+\theta_1} \right] \tag{2}$$

The log-likelihood associated with the n-sample (x1, ..., xn) is therefore defined by:

$$\ell_n(\theta_2; x) = n \ln(1 + \theta_1) + n \ln(\theta_2) + \theta_1 \sum_{i=1}^n \ln(x_i) - \theta_2 \sum_{i=1}^n x_i^{1+\theta_1} \quad \text{(1.5 points)} \tag{3}$$
Question 2 (2 points) Let gn(θ2; x) be the gradient associated with the sample (x1, ..., xn):

$$g_n(\theta_2; x) = \frac{\partial \ell_n(\theta_2; x)}{\partial \theta_2} = \frac{n}{\theta_2} - \sum_{i=1}^n x_i^{1+\theta_1} \quad \text{(1 point)} \tag{4}$$

Let Hn(θ2; x) be the Hessian associated with the sample (x1, ..., xn):

$$H_n(\theta_2; x) = \frac{\partial^2 \ell_n(\theta_2; x)}{\partial \theta_2^2} = -\frac{n}{\theta_2^2} \quad \text{(1 point)} \tag{5}$$
Question 3 (2 points) Let θ̂2 be the maximum likelihood estimator of the parameter θ2. It satisfies:

$$\hat{\theta}_2 = \arg\max_{\theta_2 \in \mathbb{R}^+} \ell_n(\theta_2; x) \quad \text{(0.5 point)} \tag{6}$$

The necessary condition (likelihood equation) of the log-likelihood optimization program is then:

$$g_n\left(\hat{\theta}_2; x\right) = \left.\frac{\partial \ell_n(\theta_2; x)}{\partial \theta_2}\right|_{\hat{\theta}_2} = \frac{n}{\hat{\theta}_2} - \sum_{i=1}^n x_i^{1+\theta_1} = 0 \quad \text{(0.5 point)} \tag{7}$$

From which we obtain:

$$\hat{\theta}_2 = \left(\frac{1}{n}\sum_{i=1}^n x_i^{1+\theta_1}\right)^{-1} \tag{8}$$

The sufficient condition of the log-likelihood optimization program is:

$$H_n\left(\hat{\theta}_2; x\right) = \left.\frac{\partial^2 \ell_n(\theta_2; x)}{\partial \theta_2^2}\right|_{\hat{\theta}_2} = -\frac{n}{\hat{\theta}_2^2} < 0 \quad \text{(0.5 point)} \tag{9}$$

We indeed have a maximum: the maximum likelihood estimator is therefore defined by:

$$\hat{\theta}_2 = \left(\frac{1}{n}\sum_{i=1}^n X_i^{1+\theta_1}\right)^{-1} \quad \text{(0.5 point)} \tag{10}$$
Question 4 (2 points) Since the variables $X_i^{1+\theta_1}$ are i.i.d. with the same distribution as $X^{1+\theta_1}$, with $\mathbb{E}\left(X^{1+\theta_1}\right) = 1/\theta_2$, the weak law of large numbers (Khintchine's theorem) implies that:

$$\frac{1}{n}\sum_{i=1}^n X_i^{1+\theta_1} \overset{p}{\to} \mathbb{E}\left(X^{1+\theta_1}\right) = \frac{1}{\theta_2} \quad \text{(1 point)} \tag{11}$$

Let $g(z) = z^{-1}$ be a continuous function such that:

$$\hat{\theta}_2 = g\left(\frac{1}{n}\sum_{i=1}^n X_i^{1+\theta_1}\right) = \left(\frac{1}{n}\sum_{i=1}^n X_i^{1+\theta_1}\right)^{-1} \tag{12}$$

By application of the continuous mapping theorem (CMT), it follows that:

$$\hat{\theta}_2 = g\left(\frac{1}{n}\sum_{i=1}^n X_i^{1+\theta_1}\right) \overset{p}{\to} g\left(\frac{1}{\theta_2}\right) = \theta_2 \quad \text{(0.5 point)} \tag{13}$$

We deduce that $\hat{\theta}_2$ is a (weakly) consistent estimator of the parameter $\theta_2$:

$$\hat{\theta}_2 \overset{p}{\to} \theta_2 \quad \text{(0.5 point)} \tag{14}$$
Question 5 (2 points) The n-sample $\left(X_1^{1+\theta_1}, \ldots, X_n^{1+\theta_1}\right)$ is i.i.d. with the same distribution as $X^{1+\theta_1}$, with $\mathbb{E}\left(X^{1+\theta_1}\right) = 1/\theta_2$ and $\mathbb{V}\left(X^{1+\theta_1}\right) = 1/\theta_2^2$. By the Lindeberg-Levy central limit theorem, the sample mean $n^{-1}\sum_{i=1}^n X_i^{1+\theta_1}$ satisfies:

$$\sqrt{n}\left(n^{-1}\sum_{i=1}^n X_i^{1+\theta_1} - \mathbb{E}\left(X^{1+\theta_1}\right)\right) \overset{d}{\to} \mathcal{N}\left(0, \mathbb{V}\left(X^{1+\theta_1}\right)\right) \tag{15}$$

or equivalently:

$$\sqrt{n}\left(n^{-1}\sum_{i=1}^n X_i^{1+\theta_1} - \frac{1}{\theta_2}\right) \overset{d}{\to} \mathcal{N}\left(0, \frac{1}{\theta_2^2}\right) \quad \text{(1 point)} \tag{16}$$

Let $g(z) = z^{-1}$ be a continuous function such that $\hat{\theta}_2 = g\left(n^{-1}\sum_{i=1}^n X_i^{1+\theta_1}\right) = \left(n^{-1}\sum_{i=1}^n X_i^{1+\theta_1}\right)^{-1}$ and $\partial g(z)/\partial z = -1/z^2$. By application of the delta method, it follows that:

$$\sqrt{n}\left(g\left(n^{-1}\sum_{i=1}^n X_i^{1+\theta_1}\right) - g\left(\frac{1}{\theta_2}\right)\right) \overset{d}{\to} \mathcal{N}\left(0, \frac{1}{\theta_2^2}\times\left(\left.\frac{\partial g(z)}{\partial z}\right|_{1/\theta_2}\right)^2\right) \quad \text{(0.5 point)} \tag{17}$$

or again:

$$\sqrt{n}\left(\hat{\theta}_2 - \theta_2\right) \overset{d}{\to} \mathcal{N}\left(0, \frac{\theta_2^4}{\theta_2^2}\right) \tag{18}$$

We finally obtain:

$$\sqrt{n}\left(\hat{\theta}_2 - \theta_2\right) \overset{d}{\to} \mathcal{N}\left(0, \theta_2^2\right) \quad \text{(0.5 point)} \tag{19}$$
Question 6 (2 points) For θ1 = 1, the realisation of the estimator θ̂2 (the estimate) is equal to:

$$\hat{\theta}_2 = \left(\frac{1}{n}\sum_{i=1}^n x_i^2\right)^{-1} = \left(\frac{100}{200}\right)^{-1} = \frac{200}{100} = 2 \quad \text{(1 point)} \tag{20}$$

From the result of Question 5, we have:

$$\sqrt{n}\left(\hat{\theta}_2 - \theta_2\right) \overset{d}{\to} \mathcal{N}\left(0, \theta_2^2\right) \tag{21}$$

For n large but finite, we can use the following approximation:

$$\hat{\theta}_2 \overset{asy}{\approx} \mathcal{N}\left(\theta_2, \frac{\theta_2^2}{n}\right) \tag{22}$$

Hence, for a risk level α, it follows that:

$$\Pr\left(\Phi^{-1}\left(\frac{\alpha}{2}\right) < \frac{\hat{\theta}_2 - \theta_2}{\theta_2/\sqrt{n}} < \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right) = 1 - \alpha \tag{23}$$

where Φ(.) denotes the cdf of the standard normal distribution. Since $\Phi^{-1}(\alpha/2) = -\Phi^{-1}(1 - \alpha/2)$, we deduce a confidence interval for the value of the parameter θ2:

$$IC_{1-\alpha} = \left[\hat{\theta}_2 - \frac{\theta_2}{\sqrt{n}}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\; ;\; \hat{\theta}_2 + \frac{\theta_2}{\sqrt{n}}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right] \tag{24}$$

Since the parameter θ2 is unknown, we replace it by its estimator:

$$IC_{1-\alpha} = \left[\hat{\theta}_2\left(1 - \frac{1}{\sqrt{n}}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right)\; ;\; \hat{\theta}_2\left(1 + \frac{1}{\sqrt{n}}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right)\right] \tag{25}$$

Numerical application: α = 5%, n = 200.

$$IC_{95\%} = \left[2\times\left(1 - \frac{1.96}{\sqrt{200}}\right)\; ;\; 2\times\left(1 + \frac{1.96}{\sqrt{200}}\right)\right] = [1.7228\; ;\; 2.2772] \quad \text{(1 point)}$$
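A minimal Matlab sketch of this numerical application (assuming the Statistics Toolbox for norminv):

theta2_hat = 2;  n = 200;  alpha = 0.05;
z  = norminv(1 - alpha/2);                          % 1.96
CI = theta2_hat * [1 - z/sqrt(n), 1 + z/sqrt(n)]    % [1.7228, 2.2772]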
Part II: Maximum likelihood (8 points)

It is given that the gradient gn(θ; x) and the Hessian matrix Hn(θ; x) associated with the sample (x1, ..., xn) can be written respectively as:

$$g_n(\theta; x) = \begin{pmatrix} \frac{n}{1+\theta_1} + \sum_{i=1}^n \ln(x_i) - \theta_2 \sum_{i=1}^n \ln(x_i)\, x_i^{1+\theta_1} \\ \frac{n}{\theta_2} - \sum_{i=1}^n x_i^{1+\theta_1} \end{pmatrix} \tag{26}$$

$$H_n(\theta; x) = \begin{pmatrix} -\frac{n}{(1+\theta_1)^2} - \theta_2 \sum_{i=1}^n \ln(x_i)^2\, x_i^{1+\theta_1} & -\sum_{i=1}^n \ln(x_i)\, x_i^{1+\theta_1} \\ -\sum_{i=1}^n \ln(x_i)\, x_i^{1+\theta_1} & -\frac{n}{\theta_2^2} \end{pmatrix} \tag{27}$$
Question 7 (2 points) For θ1 = 1 and θ2 = 2, we obtain:

$$g_n(\theta; x) = \begin{pmatrix} \frac{n}{2} + \sum_{i=1}^n \ln(x_i) - 2\sum_{i=1}^n \ln(x_i)\, x_i^2 \\ \frac{n}{2} - \sum_{i=1}^n x_i^2 \end{pmatrix} \tag{28}$$

$$= \begin{pmatrix} \frac{200}{2} - 130 - 2\times(-15) \\ \frac{200}{2} - 100 \end{pmatrix} \tag{29}$$

$$= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \text{(1 point)} \tag{30}$$

In the same way, we obtain:

$$H_n(\theta; x) = \begin{pmatrix} -\frac{n}{4} - 2\sum_{i=1}^n \ln(x_i)^2\, x_i^2 & -\sum_{i=1}^n \ln(x_i)\, x_i^2 \\ -\sum_{i=1}^n \ln(x_i)\, x_i^2 & -\frac{n}{4} \end{pmatrix} \tag{31}$$

$$= \begin{pmatrix} -\frac{200}{4} - 2\times 19 & -(-15) \\ -(-15) & -\frac{200}{4} \end{pmatrix} \tag{32}$$

$$= \begin{pmatrix} -88 & 15 \\ 15 & -50 \end{pmatrix} \quad \text{(1 point)} \tag{33}$$
Question 8 (2 points) Let θ̃ = (1, 2)′ denote the initial condition. Let θ̂ be the new candidate point determined by the Gauss-Newton method, defined by:

$$\hat{\theta} = \tilde{\theta} - H_n^{-1}(\tilde{\theta}; x)\, g_n(\tilde{\theta}; x) \quad \text{(0.5 point)} \tag{34}$$

or again:

$$\hat{\theta} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} - \begin{pmatrix} -88 & 15 \\ 15 & -50 \end{pmatrix}^{-1}\times\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{(1 point)} \tag{35}$$

The vector (1, 2)′ is optimal and corresponds to the final estimate, since the gradient of the log-likelihood is zero at this vector.
Question 9 (2 points) By definition, we have:

$$\mathcal{I}_n(\theta) = \mathbb{E}_{\theta_0}\left(-H_n\left(\theta; X\right)\right) \quad \text{(0.5 point)} \tag{36}$$

It follows that:

$$\mathcal{I}_n(\theta) = \begin{pmatrix} \frac{n}{(1+\theta_1)^2} + \theta_2 \sum_{i=1}^n \mathbb{E}_{\theta_0}\left[\ln(X_i)^2 X_i^{1+\theta_1}\right] & \sum_{i=1}^n \mathbb{E}_{\theta_0}\left[\ln(X_i)\, X_i^{1+\theta_1}\right] \\ \sum_{i=1}^n \mathbb{E}_{\theta_0}\left[\ln(X_i)\, X_i^{1+\theta_1}\right] & \frac{n}{\theta_2^2} \end{pmatrix} \tag{37}$$

For θ = (1, 2)′ and knowing that $\mathbb{E}_{\theta_0}\left[\ln(X)^2 X^2\right] = 0.0950$, $\mathbb{E}_{\theta_0}\left[\ln(X)\, X^2\right] = -0.0750$ and n = 200, we obtain:

$$\mathcal{I}_n(\theta) = \begin{pmatrix} \frac{n}{4} + 2\times n\times\mathbb{E}_{\theta_0}\left[\ln(X)^2 X^2\right] & n\times\mathbb{E}_{\theta_0}\left[\ln(X)\, X^2\right] \\ n\times\mathbb{E}_{\theta_0}\left[\ln(X)\, X^2\right] & \frac{n}{4} \end{pmatrix} \tag{38}$$

$$= \begin{pmatrix} \frac{200}{4} + 2\times 200\times 0.0950 & -200\times 0.0750 \\ -200\times 0.0750 & \frac{200}{4} \end{pmatrix} \tag{39}$$

$$= \begin{pmatrix} 88 & -15 \\ -15 & 50 \end{pmatrix} \quad \text{(1.5 points)} \tag{40}$$
Question 10 (2 points) Under the regularity conditions, the ML estimator satisfies:

$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \overset{d}{\to} \mathcal{N}\left(0, \mathcal{I}^{-1}(\theta_0)\right) \tag{41}$$

where $\mathcal{I}(\theta_0)$ denotes the average Fisher information matrix. The asymptotic variance covariance matrix is equal to:

$$\hat{V}_{asy}\left(\hat{\theta}\right) = \mathcal{I}_n^{-1}\left(\hat{\theta}\right) \quad \text{(0.5 point)} \tag{42}$$

We therefore obtain (0.5 point):

$$\hat{V}_{asy}\left(\hat{\theta}\right) = \begin{pmatrix} 88 & -15 \\ -15 & 50 \end{pmatrix}^{-1} = \begin{pmatrix} \frac{50}{4175} & \frac{15}{4175} \\ \frac{15}{4175} & \frac{88}{4175} \end{pmatrix} = \begin{pmatrix} 0.0120 & 0.0036 \\ 0.0036 & 0.0211 \end{pmatrix} \tag{43}$$

The asymptotic standard errors are equal to:

$$std\left(\hat{\theta}_1\right) = \hat{V}_{asy}^{1/2}\left(\hat{\theta}_1\right) = \sqrt{0.0120} = 0.1094 \quad \text{(0.5 point)} \tag{44}$$

$$std\left(\hat{\theta}_2\right) = \hat{V}_{asy}^{1/2}\left(\hat{\theta}_2\right) = \sqrt{0.0211} = 0.1452 \quad \text{(0.5 point)} \tag{45}$$
Part III: Inference (10 points)

Question 11 (3 points) The LR test statistic is defined by:

$$LR = -2\left(\ell_n(\theta_{H_0}; x) - \ell_n\left(\hat{\theta}_{H_1}; x\right)\right) \tag{46}$$

where $\theta_{H_0} = (\theta_{1,H_0}, \theta_{2,H_0})' = (0, 2)'$ denotes the parameter vector under H0 and $\hat{\theta}_{H_1} = (\hat{\theta}_{1,H_1}; \hat{\theta}_{2,H_1})' = (1, 2)'$ the estimate of the parameter vector θ obtained under the alternative hypothesis by maximum likelihood. The log-likelihood associated with the n-sample (x1, ..., xn) under the null hypothesis is equal to:

$$\ell_n(\theta_{H_0}; x) = n \ln(1 + \theta_{1,H_0}) + n \ln(\theta_{2,H_0}) + \theta_{1,H_0}\sum_{i=1}^n \ln(x_i) - \theta_{2,H_0}\sum_{i=1}^n x_i^{1+\theta_{1,H_0}} \tag{47}$$

$$= 200\times\ln(1) + 200\times\ln(2) + 0\times(-130) - 2\times 121 = -103.3706 \quad \text{(0.5 point)} \tag{48}$$

The log-likelihood associated with the n-sample (x1, ..., xn) under the alternative hypothesis is:

$$\ell_n\left(\hat{\theta}_{H_1}; x\right) = n \ln\left(1 + \hat{\theta}_{1,H_1}\right) + n \ln\left(\hat{\theta}_{2,H_1}\right) + \hat{\theta}_{1,H_1}\sum_{i=1}^n \ln(x_i) - \hat{\theta}_{2,H_1}\sum_{i=1}^n x_i^{1+\hat{\theta}_{1,H_1}} \tag{50}$$

$$= 200\times\ln(2) + 200\times\ln(2) - 130 - 2\times 100 = -52.7411 \quad \text{(0.5 point)} \tag{51}$$

The realisation of the LR statistic is therefore equal to:

$$LR(x) = -2\left(\ell_n(\theta_{H_0}; x) - \ell_n\left(\hat{\theta}_{H_1}; x\right)\right) = -2\left(-103.3706 + 52.7411\right) = 101.2590 \quad \text{(1 point)} \tag{54}$$

Under the regularity conditions and under the null hypothesis, we have:

$$LR \overset{d}{\to} \chi^2(2) \tag{57}$$

The critical region for a 5% risk level is therefore equal to:

$$W = \left\{ x : LR(x) > \chi^2_{0.95}(2) = 5.9915 \right\} \quad \text{(0.5 point)} \tag{58}$$

For a 5% risk level, we reject the null hypothesis H0: θ1 = 0 and θ2 = 2 (0.5 point).
Question 12 (3 points) The test can be rewritten in the form:

$$H_0: R\theta = q \tag{59}$$

with:

$$R = I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad q = \begin{pmatrix} 0 \\ 2 \end{pmatrix} \quad \text{(0.5 point)} \tag{60}$$

The Wald statistic is defined by:

$$Wald = \left(R\hat{\theta}_{H_1} - q\right)^{\top}\left(R\,\hat{V}_{asy}\left(\hat{\theta}_{H_1}\right)R^{\top}\right)^{-1}\left(R\hat{\theta}_{H_1} - q\right)$$

where $\hat{\theta}_{H_1}$ denotes the maximum likelihood estimator obtained under the alternative hypothesis. It follows that:

$$R\hat{\theta}_{H_1} - q = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} - \begin{pmatrix} 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{(0.5 point)} \tag{61}$$

$$\hat{V}_{asy}\left(\hat{\theta}_{H_1}\right) = \begin{pmatrix} 88 & -15 \\ -15 & 50 \end{pmatrix}^{-1} = \begin{pmatrix} 0.0120 & 0.0036 \\ 0.0036 & 0.0211 \end{pmatrix} \tag{62}$$

$$R\,\hat{V}_{asy}\left(\hat{\theta}_{H_1}\right)R^{\top} = \hat{V}_{asy}\left(\hat{\theta}_{H_1}\right) = \begin{pmatrix} 0.0120 & 0.0036 \\ 0.0036 & 0.0211 \end{pmatrix} \quad \text{(0.5 point)} \tag{63}$$

We therefore obtain:

$$Wald = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 0.0120 & 0.0036 \\ 0.0036 & 0.0211 \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 88 & -15 \\ -15 & 50 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} \tag{64}$$

We immediately deduce the realisation of the Wald statistic:

$$Wald(x) = 88 \quad \text{(1 point)} \tag{66}$$

Under the regularity conditions and under the null hypothesis, we have:

$$Wald \overset{d}{\to} \chi^2(2) \tag{67}$$

The critical region for a 5% risk level is therefore equal to:

$$W = \left\{ x : Wald(x) > \chi^2_{0.95}(2) = 5.9915 \right\} \quad \text{(0.5 point)} \tag{68}$$

For a 5% risk level, we reject the null hypothesis H0: θ1 = 0 and θ2 = 2 (0.5 point).
Question 13 (4 points) The score statistic, or LM (Lagrange Multiplier) statistic, is defined by:

$$LM = s_n(\theta_{H_0}; X)^{\top}\, \hat{V}_{asy}(\theta_{H_0})\, s_n(\theta_{H_0}; X) \quad \text{(0.5 point)} \tag{69}$$

where $\theta_{H_0}$ denotes the parameter vector under H0, $s_n(\theta; X)$ denotes the score associated with the log-likelihood of the sample (X1, ..., Xn) under the alternative hypothesis H1, and $\hat{V}_{asy}(\theta)$ denotes a consistent estimator of the asymptotic variance covariance matrix of the maximum likelihood estimator $\hat{\theta}$. In our case:

$$s_n(\theta_{H_0}; X) = \left.\frac{\partial \ell_n(\theta; X)}{\partial \theta}\right|_{\theta_{H_0}} = \begin{pmatrix} \frac{n}{1+\theta_{1,H_0}} + \sum_{i=1}^n \ln(X_i) - \theta_{2,H_0}\sum_{i=1}^n \ln(X_i)\, X_i^{1+\theta_{1,H_0}} \\ \frac{n}{\theta_{2,H_0}} - \sum_{i=1}^n X_i^{1+\theta_{1,H_0}} \end{pmatrix} \quad \text{(0.5 point)} \tag{70}$$

The realisation of the score (gradient) is equal to:

$$s_n(\theta_{H_0}; x) = \begin{pmatrix} \frac{n}{1} + \sum_{i=1}^n \ln(x_i) - 2\sum_{i=1}^n \ln(x_i)\, x_i \\ \frac{n}{2} - \sum_{i=1}^n x_i \end{pmatrix} = \begin{pmatrix} 200 - 130 - 2\times(-44) \\ \frac{200}{2} - 121 \end{pmatrix} = \begin{pmatrix} 158 \\ -21 \end{pmatrix} \quad \text{(1 point)} \tag{72}$$

The estimate of the asymptotic variance covariance matrix of $\hat{\theta}$ under H0 is equal to:

$$\hat{V}_{asy}(\theta_{H_0}) = \begin{pmatrix} 0.0043 & 0.0013 \\ 0.0013 & 0.0204 \end{pmatrix} \tag{75}$$

We deduce the value of the realisation of the LM statistic:

$$LM(x) = \begin{pmatrix} 158 & -21 \end{pmatrix}\times\begin{pmatrix} 0.0043 & 0.0013 \\ 0.0013 & 0.0204 \end{pmatrix}\times\begin{pmatrix} 158 \\ -21 \end{pmatrix} \tag{76}$$

or again:

$$LM(x) = 107.3763 \quad \text{(1 point)} \tag{77}$$

Under the regularity conditions and under the null hypothesis, we have:

$$LM \overset{d}{\to} \chi^2(2) \tag{78}$$

The critical region for a 5% risk level is therefore equal to:

$$W = \left\{ x : LM(x) > \chi^2_{0.95}(2) = 5.9915 \right\} \quad \text{(0.5 point)} \tag{79}$$

For a 5% risk level, we reject the null hypothesis H0: θ1 = 0 and θ2 = 2 (0.5 point).
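The numerical results of Parts II and III can be verified with a few lines of Matlab; this is a sketch using the sufficient statistics reported above (small discrepancies may appear because the variance matrices are rounded):

g  = [0; 0];                          % gradient at theta = (1, 2)'
H  = [-88, 15; 15, -50];              % Hessian at theta = (1, 2)'
theta_new = [1; 2] - H\g;             % Newton step: stays at (1, 2)'
In = [88, -15; -15, 50];              % Fisher information matrix
V  = inv(In);                         % [0.0120 0.0036; 0.0036 0.0211]
LR = -2*(-103.3706 - (-52.7411));     % 101.2590
Wald = [1, 0] * In * [1; 0];          % 88
s  = [158; -21];                      % score under H0
V0 = [0.0043, 0.0013; 0.0013, 0.0204];
LM = s' * V0 * s;                     % close to 107.3763 (rounding in V0)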