
Exercises Chapter 1

Estimation Theory
Data science and advanced programming

Christophe Hurlin

HEC Lausanne

September 2024

Exercise 1

Rayleigh distribution
Problem

We consider two continuous independent random variables $X$ and $Y$, normally distributed as $N(0, \sigma^2)$. The transformed variable $R$ defined by:

$$R = \sqrt{X^2 + Y^2}$$

has a Rayleigh distribution with parameter $\sigma^2$:

$$R \sim \mathrm{Rayleigh}(\sigma^2)$$

with a pdf $f_R(r; \sigma^2)$ defined by:

$$f_R(r; \sigma^2) = \frac{r}{\sigma^2} \exp\left(-\frac{r^2}{2\sigma^2}\right) \quad \forall r \in [0, +\infty)$$

$$E(R) = \sigma \sqrt{\frac{\pi}{2}} \qquad V(R) = \frac{4-\pi}{2}\,\sigma^2$$
Problem (cont'd)

We consider an i.i.d. sample $\{R_1, R_2, \ldots, R_N\}$ and an estimator (MLE) of $\sigma^2$ defined by

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$

Question 1: Show that $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.
Solution:

We have:

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$

We want to calculate $E(\hat{\sigma}^2)$.

Since the sample $\{R_1, R_2, \ldots, R_N\}$ is i.i.d., we have:

$$E(\hat{\sigma}^2) = E\left(\frac{1}{2N} \sum_{i=1}^{N} R_i^2\right) = \frac{1}{2N} \sum_{i=1}^{N} E(R_i^2)$$
Solution (cont'd):

$$E(\hat{\sigma}^2) = E\left(\frac{1}{2N} \sum_{i=1}^{N} R_i^2\right) = \frac{1}{2N} \sum_{i=1}^{N} E(R_i^2)$$

We know that:

$$E(R_i) = \sigma \sqrt{\frac{\pi}{2}} \qquad V(R_i) = \frac{4-\pi}{2}\,\sigma^2$$

So, we have:

$$E(R_i^2) = V(R_i) + E(R_i)^2 = \frac{4-\pi}{2}\,\sigma^2 + \frac{\pi}{2}\,\sigma^2 = 2\sigma^2$$
Solution (cont'd):

$$E(\hat{\sigma}^2) = \frac{1}{2N} \sum_{i=1}^{N} E(R_i^2) = \frac{1}{2N} \sum_{i=1}^{N} 2\sigma^2 = \frac{2N\sigma^2}{2N}$$

So, we have:

$$E(\hat{\sigma}^2) = \sigma^2$$

The estimator $\hat{\sigma}^2$ is unbiased.
Remark: Sometimes, the Rayleigh distribution is parametrized by $\sigma$. But, it is easier to show that $\hat{\sigma}^2$ is unbiased than to show that $\hat{\sigma}$ is unbiased...

$$\hat{\sigma} = \sqrt{\frac{1}{2N} \sum_{i=1}^{N} R_i^2}$$

Then to study the bias, we have to calculate:

$$E(\hat{\sigma}) = E\left(\sqrt{\frac{1}{2N} \sum_{i=1}^{N} R_i^2}\right) = \; ???$$

since for a nonlinear function $g(.)$, $E(g(x)) \neq g(E(x))$... The only solution is to compute the integral

$$E(\hat{\sigma}) = \int_0^{\infty} x \, f_{\hat{\sigma}}(x; \sigma^2) \, dx$$
Problem (cont'd)

Question 2: Show that $\hat{\sigma}^2$ is a (weakly) consistent estimator of $\sigma^2$. We admit that the raw moments of $R$ are defined, for any $k \in \mathbb{N}$, by:

$$E(R_i^k) = \sigma^k \, 2^{k/2} \, \Gamma\left(1 + \frac{k}{2}\right)$$

where $\Gamma(.)$ denotes the gamma function with:

$$\Gamma(x) = \int_0^{\infty} t^{x-1} \exp(-t) \, dt$$

and

$$\Gamma(x) = (x-1)! \quad \text{for } x \in \mathbb{N}$$
Solution:

First, calculate $V(\hat{\sigma}^2)$. Since the sample $\{R_1, R_2, \ldots, R_N\}$ is i.i.d., we have:

$$V(\hat{\sigma}^2) = V\left(\frac{1}{2N} \sum_{i=1}^{N} R_i^2\right) = \frac{1}{4N^2} \sum_{i=1}^{N} V(R_i^2)$$

What is the value of $V(R_i^2)$?

$$V(R_i^2) = E(R_i^4) - E(R_i^2)^2$$

We showed that

$$E(R_i^2) = 2\sigma^2$$
Solution (cont'd):

$$V(R_i^2) = E(R_i^4) - E(R_i^2)^2 = E(R_i^4) - 4\sigma^4$$

What is the value of $E(R_i^4)$? For any $k \in \mathbb{N}$:

$$E(R_i^k) = \sigma^k \, 2^{k/2} \, \Gamma\left(1 + \frac{k}{2}\right)$$

For $k = 4$, we have:

$$E(R_i^4) = \sigma^4 \, 2^2 \, \Gamma(3)$$

with

$$\Gamma(3) = (3-1)! = 2! = 2$$

So, we have:

$$E(R_i^4) = \sigma^4 \, 2^3 = 8\sigma^4$$
Solution (cont'd):

The variance of $R_i^2$ is equal to:

$$V(R_i^2) = E(R_i^4) - 4\sigma^4 = 8\sigma^4 - 4\sigma^4 = 4\sigma^4$$

As a consequence

$$V(\hat{\sigma}^2) = \frac{1}{4N^2} \sum_{i=1}^{N} V(R_i^2) = \frac{1}{4N^2} \sum_{i=1}^{N} 4\sigma^4 = \frac{4N\sigma^4}{4N^2} = \frac{\sigma^4}{N}$$
Solution (cont'd):

To sum up:

$$E(\hat{\sigma}^2) = \sigma^2 \quad \text{(unbiased estimator)}$$

$$\lim_{N \to \infty} V(\hat{\sigma}^2) = \lim_{N \to \infty} \frac{\sigma^4}{N} = 0$$

So, the estimator $\hat{\sigma}^2$ is a (weakly) consistent estimator of $\sigma^2$:

$$\hat{\sigma}^2 \xrightarrow{p} \sigma^2$$
Problem (cont'd)

Question 3: Sometimes, the Rayleigh distribution is parametrized by $\sigma$ (and not by $\sigma^2$) with

$$R \sim \mathrm{Rayleigh}(\sigma)$$

Propose a (weakly) consistent estimator for $\sigma$.
Solution:

Since $\sigma > 0$, a natural estimator for $\sigma$ is defined by:

$$\hat{\sigma} = \sqrt{\frac{1}{2N} \sum_{i=1}^{N} R_i^2} = \sqrt{\hat{\sigma}^2}$$

We showed that the estimator $\hat{\sigma}^2$ is a (weakly) consistent estimator of $\sigma^2$:

$$\hat{\sigma}^2 \xrightarrow{p} \sigma^2$$

By applying the Continuous Mapping Theorem (CMT) for the continuous function $g(x) = \sqrt{x}$, we get immediately:

$$g(\hat{\sigma}^2) \xrightarrow{p} g(\sigma^2)$$

or

$$\hat{\sigma} \xrightarrow{p} \sigma$$
Problem (cont'd)

Question 4: For any value of $\sigma^2$, what is the finite sample (or exact sampling) distribution of the estimator $\hat{\sigma}^2$ defined by

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$
Solution:

The estimator is defined by

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$

We know that $R_1, R_2, \ldots, R_N$ are i.i.d. random variables with a Rayleigh distribution:

$$R_i \sim \mathrm{Rayleigh}(\sigma^2)$$

Reminder: if $X$ and $Y$ are independent and normally distributed $N(0, \sigma^2)$ random variables, the transformed variable $R = \sqrt{X^2 + Y^2}$ has a Rayleigh distribution.

The exact distribution of $R^2 = X^2 + Y^2$ is unknown (it is not a $\chi^2$...), and as a consequence the finite sample distribution of $\hat{\sigma}^2$ is unknown.
Problem (cont'd)

Question 5: Write a Matlab code in order to approximate the true (unknown) finite sample distribution of $\hat{\sigma}^2$ for a sample size $N = 10$, a true value of $\sigma^2 = 16$ and by using $S = 1{,}000$ simulations.

(1) Plot a histogram of the 1,000 realisations of the estimator $\hat{\sigma}^2$.

(2) Plot the kernel estimator of the density $f_{\hat{\sigma}^2}(x)$ by using the Matlab built-in function ksdensity.
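A minimal Matlab sketch of one possible implementation (the variable names are illustrative, not the official course solution):

    % Approximate the finite sample distribution of sigma2_hat by simulation
    S = 1000;                                 % number of simulations
    N = 10;                                   % sample size
    sigma2 = 16;                              % true value of sigma^2
    sigma2_hat = zeros(S,1);
    for s = 1:S
        X = sqrt(sigma2)*randn(N,1);          % X ~ N(0, sigma^2)
        Y = sqrt(sigma2)*randn(N,1);          % Y ~ N(0, sigma^2)
        R = sqrt(X.^2 + Y.^2);                % R ~ Rayleigh(sigma^2)
        sigma2_hat(s) = sum(R.^2)/(2*N);      % realisation of the estimator
    end
    figure; histogram(sigma2_hat)             % (1) histogram of the S realisations
    [f,x] = ksdensity(sigma2_hat);            % (2) kernel estimator of the density
    figure; plot(x,f)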
Definition (Kernel density estimator)

Let us consider a sample $X_1, \ldots, X_N$, where $X$ has a distribution characterized by the pdf $f_X(x)$, for $x \in \mathbb{R}$. A consistent (kernel) estimator of $f_X(x)$ for any $x \in \mathbb{R}$ is given by:

$$\hat{f}_X(x) = \frac{1}{\lambda N} \sum_{i=1}^{N} K\left(\frac{x - x_i}{\lambda}\right)$$

where $K(.)$ denotes a kernel function and $\lambda$ is a bandwidth parameter.

$$\hat{f}_X(x) \xrightarrow{p} f_X(x) \quad \forall x \in \mathbb{R}$$

For more details (and a discussion on the optimal choice of $\lambda$), see: Lecture notes "Économétrie Non Paramétrique", Hurlin (2008), Master Économétrie et Statistique Appliquée, Université d'Orléans.
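To make the definition concrete, here is a minimal hand-coded sketch of this estimator with a normal kernel and an arbitrary (non-optimal) bandwidth $\lambda$; ksdensity implements the same idea with an automatic bandwidth choice:

    % Hand-coded kernel density estimator with a normal kernel
    x_data = randn(500,1);                         % any sample X_1,...,X_N
    lambda = 0.4;                                  % bandwidth (arbitrary here)
    K = @(u) exp(-u.^2/2)/sqrt(2*pi);              % normal kernel
    grid = linspace(min(x_data), max(x_data), 200);
    f_hat = zeros(size(grid));
    for j = 1:numel(grid)    % f_hat(x) = (1/(lambda*N)) * sum K((x - x_i)/lambda)
        f_hat(j) = mean(K((grid(j) - x_data)/lambda))/lambda;
    end
    plot(grid, f_hat)                              % compare with ksdensity(x_data)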
Definition (Kernel function)

A kernel function $K(u)$ satisfies the following properties:

(i) $K(u) \geq 0$

(ii) $\int K(u) \, du = 1$

(iii) $K(u)$ reaches its maximum for $u = 0$ and decreases with $|u|$.

(iv) $K(u)$ is symmetric, i.e. $K(u) = K(-u)$.
Some examples of kernel functions

Normal: $K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$, $u \in \mathbb{R}$

Triangular: $K(u) = 1 - |u|$, $u \in [-1, 1]$

Quartic or Biweight: $K(u) = \frac{15}{16}\left(1 - u^2\right)^2$, $u \in [-1, 1]$

Epanechnikov: $K(u) = \frac{3}{4}\left(1 - u^2\right)$, $u \in [-1, 1]$

Triweight: $K(u) = \frac{35}{32}\left(1 - u^2\right)^3$, $u \in [-1, 1]$
[Figure: histogram of the 1,000 simulated realisations of $\hat{\sigma}^2$ ($N = 10$, $\sigma^2 = 16$)]
[Figure: kernel density estimate of $f_{\hat{\sigma}^2}(x)$ based on the 1,000 simulated realisations]
Problem (cont'd)

Question 6: For the special case where $\sigma^2 = 1$, what is the finite sample (or exact sampling) distribution of the estimator $\hat{\sigma}^2$ defined by

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$
Solution:

The estimator is defined by

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$

We know that $R_1, R_2, \ldots, R_N$ are i.i.d. random variables with $R_i \sim \mathrm{Rayleigh}(1)$.

Reminder 1: if $R_i \sim \mathrm{Rayleigh}(1)$, then $R = \sqrt{X^2 + Y^2}$ where $X$ and $Y$ are independent and standard normally distributed $N(0, 1)$ random variables.

Reminder 2: if $X \sim N(0, 1)$, then $X^2 \sim \chi^2(1)$.

Reminder 3: if $X \sim \chi^2(v_1)$ and $Y \sim \chi^2(v_2)$, and $X$ and $Y$ are independent, then $X + Y \sim \chi^2(v_1 + v_2)$.
Solution (cont'd):

So, if $X$ and $Y$ are independent and standard normally distributed $N(0, 1)$ random variables:

$$R_i = \sqrt{X^2 + Y^2} \sim \mathrm{Rayleigh}(1) \qquad R_i^2 = X^2 + Y^2 \sim \chi^2(2)$$

The sum of independent chi-squared distributed random variables has a chi-squared distribution:

$$\sum_{i=1}^{N} R_i^2 \sim \chi^2(2N)$$
Solution (cont'd):

$$2N\hat{\sigma}^2 = \sum_{i=1}^{N} R_i^2 \sim \chi^2(2N)$$

In the special case where $\sigma^2 = 1$, the transformed variable $2N\hat{\sigma}^2$ has an exact sampling (finite sample) distribution that corresponds to a chi-squared distribution with $2N$ degrees of freedom.
Problem (cont'd)

Question 7: Write a Matlab code in order to approximate the true (unknown) finite sample distribution of the transformed variable $2N\hat{\sigma}^2$ for a sample size $N = 10$ in the special case where $\sigma^2 = 1$ by using $S = 10{,}000$ simulations.

(1) Plot the kernel estimator of the density $f_{2N\hat{\sigma}^2}(x)$ by using the Matlab built-in function ksdensity.

(2) Compare this estimated density function to the pdf of a chi-squared distribution with $2N$ degrees of freedom.
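A minimal Matlab sketch of this comparison (illustrative, not the official solution):

    % Finite sample distribution of 2N*sigma2_hat when sigma^2 = 1
    S = 10000; N = 10;
    stat = zeros(S,1);
    for s = 1:S
        R2 = randn(N,1).^2 + randn(N,1).^2;   % R_i^2 = X^2 + Y^2 with sigma^2 = 1
        stat(s) = sum(R2);                    % 2N*sigma2_hat
    end
    [f,x] = ksdensity(stat);                  % (1) kernel density estimate
    plot(x, f, x, chi2pdf(x, 2*N))            % (2) compare with the chi2(2N) pdf
    legend('Estimated finite sample pdf','Theoretical pdf of a chi-squared')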
[Figure: estimated finite sample pdf of $2N\hat{\sigma}^2$ vs. the theoretical pdf of a $\chi^2(2N)$ distribution]
Problem (cont'd)

Question 8: What is the asymptotic distribution of the estimator $\hat{\sigma}^2$?

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$
Solution:

We know that $R_1^2, R_2^2, \ldots, R_N^2$ are i.i.d. variables with

$$E(R_i^2) = 2\sigma^2 \qquad V(R_i^2) = 4\sigma^4$$

Step 1: By applying the Lindeberg–Lévy univariate Central Limit Theorem (CLT), we get:

$$\sqrt{N}\left(\frac{1}{N} \sum_{i=1}^{N} R_i^2 - 2\sigma^2\right) \xrightarrow{d} N\left(0, 4\sigma^4\right)$$
Solution (cont'd):

Step 2: By definition, we have

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2 = g\left(\frac{1}{N} \sum_{i=1}^{N} R_i^2\right)$$

$$\sqrt{N}\left(\frac{1}{N} \sum_{i=1}^{N} R_i^2 - 2\sigma^2\right) \xrightarrow{d} N\left(0, 4\sigma^4\right)$$

with $g(x) = x/2$. So, $g(.)$ is a continuous and continuously differentiable function with $g(2\sigma^2) \neq 0$ and not involving $N$; then the delta method implies

$$\sqrt{N}\left(g\left(\frac{1}{N} \sum_{i=1}^{N} R_i^2\right) - g(2\sigma^2)\right) \xrightarrow{d} N\left(0, 4\sigma^4 \left(\left.\frac{\partial g(x)}{\partial x}\right|_{2\sigma^2}\right)^2\right)$$

$$g(2\sigma^2) = \frac{2\sigma^2}{2} = \sigma^2 \qquad \left.\frac{\partial g(x)}{\partial x}\right|_{2\sigma^2} = \frac{\partial (x/2)}{\partial x} = \frac{1}{2}$$
Solution (cont'd):

$$\sqrt{N}\left(g\left(\frac{1}{N} \sum_{i=1}^{N} R_i^2\right) - g(2\sigma^2)\right) \xrightarrow{d} N\left(0, 4\sigma^4 \left(\frac{1}{2}\right)^2\right)$$

The estimator $\hat{\sigma}^2$ is asymptotically normally distributed:

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^4\right)$$
Problem (cont'd)

Question 9: What is the asymptotic variance of the estimator $\hat{\sigma}^2$?

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} R_i^2$$
Solution:

We know that:

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^4\right)$$

or equivalently

$$\hat{\sigma}^2 \overset{asy}{\sim} N\left(\sigma^2, \frac{\sigma^4}{N}\right)$$

The asymptotic variance of $\hat{\sigma}^2$ is equal to:

$$V_{asy}(\hat{\sigma}^2) = \frac{\sigma^4}{N}$$
Problem (cont'd)

Question 10: Write a Matlab code in order to approximate the asymptotic distribution of the transformed variable

$$Z = \frac{\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right)}{\sigma^2}$$

for a sample size $N = 10{,}000$ and a true value of $\sigma^2 = 16$ by using $S = 10{,}000$ simulations.

(1) Plot the kernel estimator of the density $f_Z(x)$ by using the Matlab built-in function ksdensity.

(2) Compare this estimated density function to the pdf of a standard normal distribution.
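A minimal Matlab sketch (illustrative, not the official solution):

    % Asymptotic distribution of Z = sqrt(N)*(sigma2_hat - sigma2)/sigma2
    S = 10000; N = 10000; sigma2 = 16;
    Z = zeros(S,1);
    for s = 1:S
        R2 = sigma2*(randn(N,1).^2 + randn(N,1).^2);  % R_i^2 for sigma^2 = 16
        Z(s) = sqrt(N)*(sum(R2)/(2*N) - sigma2)/sigma2;
    end
    [f,x] = ksdensity(Z);
    plot(x, f, x, normpdf(x))                 % compare with the N(0,1) pdf
    legend('Estimated finite sample pdf','Theoretical pdf of a standard normal')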
[Figure: estimated finite sample pdf of $Z$ vs. the theoretical pdf of a standard normal distribution]
Problem (cont'd)

Question 11: What is the asymptotic distribution of the estimator $\hat{\sigma}$ of the parameter $\sigma$ defined by:

$$\hat{\sigma} = \sqrt{\frac{1}{2N} \sum_{i=1}^{N} R_i^2}$$
Solution:

Step 1: We know that:

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^4\right)$$

Since $\hat{\sigma} > 0$, we have:

$$\hat{\sigma} = \sqrt{\frac{1}{2N} \sum_{i=1}^{N} R_i^2} = \sqrt{\hat{\sigma}^2} = g(\hat{\sigma}^2)$$

where $g(x) = \sqrt{x}$ is a continuous and continuously differentiable function with $g(\sigma^2) \neq 0$ and that does not depend on $N$.
Solution (cont'd):

Step 2: We have

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^4\right)$$

The delta method for $g(x) = \sqrt{x}$ implies

$$\sqrt{N}\left(g(\hat{\sigma}^2) - g(\sigma^2)\right) \xrightarrow{d} N\left(0, \sigma^4 \left(\left.\frac{\partial g(x)}{\partial x}\right|_{\sigma^2}\right)^2\right)$$

with

$$g(\sigma^2) = \sigma \qquad \left.\frac{\partial g(x)}{\partial x}\right|_{\sigma^2} = \left.\frac{\partial \sqrt{x}}{\partial x}\right|_{\sigma^2} = \frac{1}{2\sqrt{\sigma^2}} = \frac{1}{2\sigma}$$
Solution (cont'd):

Step 2 (cont'd):

$$\sqrt{N}\left(g(\hat{\sigma}^2) - g(\sigma^2)\right) \xrightarrow{d} N\left(0, \sigma^4 \left(\frac{1}{2\sigma}\right)^2\right)$$

So, we have:

$$\sqrt{N}\left(\hat{\sigma} - \sigma\right) \xrightarrow{d} N\left(0, \frac{\sigma^2}{4}\right)$$
Exercise 2

CAPM
Problem (CAPM)

The empirical analogue of the CAPM is given by:

$$\underbrace{r_{it} - r_{ft}}_{\text{excess return of security } i \text{ for time } t} = \alpha_i + \beta_i \underbrace{(r_{mt} - r_{ft})}_{\text{market excess return for time } t} + \varepsilon_t$$

where $\varepsilon_t$ is an i.i.d. error term. We assume that

$$\tilde{r}_{it} = r_{it} - r_{ft} \qquad \tilde{r}_{mt} = r_{mt} - r_{ft}$$

$$E(\varepsilon_t) = 0 \qquad V(\varepsilon_t) = \sigma^2 \qquad E(\varepsilon_t \mid \tilde{r}_{mt}) = 0$$
Problem (CAPM, cont'd)

Consider the model

$$\tilde{r}_{it} = \alpha_i + \beta_i \tilde{r}_{mt} + \varepsilon_t$$

Data: Microsoft, SP500 and Tbill (closing prices) from 11/1/1993 to 04/03/2003.

[Figure: scatter plot of RMSFT against RSP500, and time series plot of the RSP500 and RMSFT return series]
Problem (CAPM, cont'd)

We consider the CAPM model rewritten as follows

$$\tilde{r}_{it} = x_t^\top \beta + \varepsilon_t \qquad t = 1, \ldots, T$$

where $x_t = (1 \;\; \tilde{r}_{mt})^\top$ is a $2 \times 1$ vector of random variables, $\beta = (\alpha_i \;\; \beta_i)^\top$ is a $2 \times 1$ vector of parameters, and where the error term $\varepsilon_t$ satisfies $E(\varepsilon_t) = 0$, $V(\varepsilon_t) = \sigma^2$ and $E(\varepsilon_t \mid \tilde{r}_{mt}) = 0$.
Problem (CAPM, cont'd)

Question 1: Show that the OLS estimator

$$\hat{\beta} = \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\sum_{t=1}^{T} x_t \tilde{r}_{it}\right)$$

satisfies

$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} N\left(0, \sigma^2 E^{-1}\left(x_t x_t^\top\right)\right)$$
Solution:

1. Let us rewrite the OLS estimator as:

$$\hat{\beta} = \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\sum_{t=1}^{T} x_t \tilde{r}_{it}\right) = \beta_0 + \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\sum_{t=1}^{T} x_t \varepsilon_t\right)$$

2. Normalize the vector $\hat{\beta} - \beta_0$:

$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) = \left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} x_t \varepsilon_t\right)$$
Solution (cont'd):

3. Using the WLLN and the CMT:

$$\left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \xrightarrow{p} E^{-1}\left(x_t x_t^\top\right)$$

4. Using the CLT:

$$\sqrt{T}\left(\frac{1}{T} \sum_{t=1}^{T} x_t \varepsilon_t - E(x_t \varepsilon_t)\right) \xrightarrow{d} N\left(0, V(x_t \varepsilon_t)\right)$$

with $E(\varepsilon_t \mid \tilde{r}_{mt}) = 0$ and $E(\varepsilon_t \mid 1) = E(\varepsilon_t) = 0 \implies E(x_t \varepsilon_t) = 0$ and

$$V(x_t \varepsilon_t) = E\left(x_t \varepsilon_t \varepsilon_t x_t^\top\right) = E\left(E\left(x_t \varepsilon_t \varepsilon_t x_t^\top \mid x_t\right)\right) = E\left(x_t V(\varepsilon_t \mid x_t) x_t^\top\right) = \sigma^2 E\left(x_t x_t^\top\right)$$
Solution (cont'd):

So, we have

$$\left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \xrightarrow{p} E^{-1}\left(x_t x_t^\top\right)$$

$$\sqrt{T}\left(\frac{1}{T} \sum_{t=1}^{T} x_t \varepsilon_t\right) \xrightarrow{d} N\left(0, \sigma^2 E\left(x_t x_t^\top\right)\right)$$
Solution (cont'd):

By using Slutsky's theorem (for a convergence in distribution), we have:

$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) = \left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} x_t \varepsilon_t\right) \xrightarrow{d} N(\Pi, \Omega)$$

with

$$\Pi = E^{-1}\left(x_t x_t^\top\right) \cdot 0 = 0$$

$$\Omega = E^{-1}\left(x_t x_t^\top\right) \sigma^2 E\left(x_t x_t^\top\right) E^{-1}\left(x_t x_t^\top\right) = \sigma^2 E^{-1}\left(x_t x_t^\top\right)$$

Finally, we have:

$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} N\left(0, \sigma^2 E^{-1}\left(x_t x_t^\top\right)\right)$$
Problem (CAPM, cont'd)

Question 2: What is the asymptotic variance-covariance matrix of the OLS estimator $\hat{\beta}$?

$$\hat{\beta} = \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \left(\sum_{t=1}^{T} x_t \tilde{r}_{it}\right)$$
Solution:

We showed that

$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} N\left(0, \sigma^2 E^{-1}\left(x_t x_t^\top\right)\right)$$

or equivalently

$$\hat{\beta} \overset{asy}{\sim} N\left(\beta_0, \frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right)\right)$$

The asymptotic variance-covariance matrix of $\hat{\beta}$ is equal to:

$$V_{asy}(\hat{\beta}) = \underbrace{\frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right)}_{2 \times 2}$$
Remarks:

1. The asymptotic variance covariance matrix is a $2 \times 2$ symmetric matrix:

$$V_{asy}(\hat{\beta}) = \frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right) = \begin{pmatrix} V_{asy}(\hat{\alpha}) & \mathrm{cov}(\hat{\alpha}, \hat{\beta}) \\ \mathrm{cov}(\hat{\beta}, \hat{\alpha}) & V_{asy}(\hat{\beta}) \end{pmatrix}$$

2. Since $x_t = (1 \;\; \tilde{r}_{mt})^\top$, we have:

$$E\left(x_t x_t^\top\right) = E\begin{pmatrix} 1 & \tilde{r}_{mt} \\ \tilde{r}_{mt} & \tilde{r}_{mt}^2 \end{pmatrix} = \begin{pmatrix} 1 & E(\tilde{r}_{mt}) \\ E(\tilde{r}_{mt}) & E(\tilde{r}_{mt}^2) \end{pmatrix}$$
Problem (CAPM, cont'd)

Question 3: Let us consider a consistent estimator of $\sigma^2$ defined by:

$$\hat{\sigma}^2 = \frac{1}{T-2} \sum_{t=1}^{T} \hat{\varepsilon}_t^2 = \frac{1}{T-2} \sum_{t=1}^{T} \left(\tilde{r}_{it} - x_t^\top \hat{\beta}\right)^2$$

Propose a consistent estimator of the asymptotic variance $V_{asy}(\hat{\beta})$.
Solution:

We know that

$$V_{asy}(\hat{\beta}) = \frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right)$$

Using the LLN, if $x_t$ is i.i.d., we get:

$$\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top \xrightarrow{p} E\left(x_t x_t^\top\right)$$

Using the CMT:

$$\left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \xrightarrow{p} E^{-1}\left(x_t x_t^\top\right)$$
Solution (cont'd):

So, we have

$$V_{asy}(\hat{\beta}) = \frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right)$$

and

$$\left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \xrightarrow{p} E^{-1}\left(x_t x_t^\top\right) \qquad \hat{\sigma}^2 \xrightarrow{p} \sigma^2$$

By using Slutsky's theorem:

$$\frac{\hat{\sigma}^2}{T} \left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} \xrightarrow{p} \frac{\sigma^2}{T} E^{-1}\left(x_t x_t^\top\right) = V_{asy}(\hat{\beta})$$
Solution (cont'd):

A consistent estimator of the asymptotic variance $V_{asy}(\hat{\beta})$ is defined by

$$\hat{V}_{asy}(\hat{\beta}) = \frac{\hat{\sigma}^2}{T} \left(\frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top\right)^{-1}$$

Or equivalently by

$$\hat{V}_{asy}(\hat{\beta}) = \hat{\sigma}^2 \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1}$$

with

$$\hat{\sigma}^2 = \frac{1}{T-2} \sum_{t=1}^{T} \hat{\varepsilon}_t^2 = \frac{1}{T-2} \sum_{t=1}^{T} \left(\tilde{r}_{it} - x_t^\top \hat{\beta}\right)^2$$
Remark:

$$\hat{V}_{asy}(\hat{\beta}) = \hat{\sigma}^2 \left(\sum_{t=1}^{T} x_t x_t^\top\right)^{-1}$$

Since $x_t = (1 \;\; \tilde{r}_{mt})^\top$:

$$\sum_{t=1}^{T} x_t x_t^\top = \begin{pmatrix} T & \sum_{t=1}^{T} \tilde{r}_{mt} \\ \sum_{t=1}^{T} \tilde{r}_{mt} & \sum_{t=1}^{T} \tilde{r}_{mt}^2 \end{pmatrix}$$
Problem (CAPM, cont'd)

Question 4: Using the excel file [Link], write a Matlab code to estimate the beta and the alpha for MSFT. Compare your results with the following table of estimation results (Eviews).
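Since the data file is only linked above, here is a hedged sketch that assumes the excess return vectors r_msft and r_sp500 have already been loaded into memory (e.g. with readtable or xlsread):

    % OLS estimation of the CAPM alpha and beta for MSFT
    T = length(r_msft);              % r_msft, r_sp500: excess returns (assumed loaded)
    X = [ones(T,1) r_sp500];         % x_t = (1, market excess return)'
    beta_hat = (X'*X)\(X'*r_msft);   % beta_hat = (sum x_t x_t')^{-1} (sum x_t r_it)
    alpha_hat = beta_hat(1)
    b_hat     = beta_hat(2)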
Perfect....

Problem (CAPM, cont'd)

Question 5: Using the excel file [Link], write a Matlab code

(1) to estimate the variance of the error term $\varepsilon_t$

(2) to estimate the asymptotic standard errors of the estimators $\hat{\beta}$

(3) Compare your results with the table of estimation results (Eviews).
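Continuing the sketch above (same assumed variables):

    % Estimated error variance and asymptotic standard errors
    res        = r_msft - X*beta_hat;        % residuals epsilon_hat_t
    sigma2_hat = (res'*res)/(T - 2);         % (1/(T-2)) * sum of squared residuals
    V_hat      = sigma2_hat*inv(X'*X);       % estimated asymptotic var-cov matrix
    std_err    = sqrt(diag(V_hat))           % std. errors of alpha_hat and b_hat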
Perfect too....

End of Exercises - Chapter 1

Christophe Hurlin
Exercises Chapter 2
Maximum Likelihood Estimation
Data science and advanced programming

Christophe Hurlin

HEC Lausanne

September 2024

Exercise 1

MLE and Geometric Distribution
Problem (MLE and geometric distribution)

We consider a sample $X_1, X_2, \ldots, X_N$ of i.i.d. discrete random variables, where $X_i$ has a geometric distribution with a pmf given by:

$$f_X(x, \theta) = \Pr(X = x) = \theta (1-\theta)^{x-1} \quad \forall x \in \{1, 2, 3, \ldots\}$$

where the success probability $\theta$ satisfies $0 < \theta < 1$ and is unknown. We assume that:

$$E(X) = \frac{1}{\theta} \qquad V(X) = \frac{1-\theta}{\theta^2}$$

Question 1: Write the log-likelihood function of the sample $\{x_1, x_2, \ldots, x_N\}$.
Solution

$$f_X(x, \theta) = \Pr(X = x) = \theta (1-\theta)^{x-1} \quad \forall x \in \{1, 2, 3, \ldots\}$$

Since the $X_1, X_2, \ldots, X_N$ are i.i.d., then

$$L_N(\theta; x_1, \ldots, x_N) = \prod_{i=1}^{N} f_X(x_i; \theta) = \theta^N (1-\theta)^{\sum_{i=1}^{N} (x_i - 1)}$$

$$\ell_N(\theta; x_1, \ldots, x_N) = \sum_{i=1}^{N} \ln f_X(x_i; \theta) = N \ln(\theta) + \ln(1-\theta) \sum_{i=1}^{N} (x_i - 1)$$
Problem (MLE and geometric distribution)

Question 2: Determine the maximum likelihood estimator of the success probability $\theta$.
Solution

The maximum likelihood estimate of the success probability $\theta$ is defined by:

$$\hat{\theta} = \underset{0 < \theta < 1}{\arg\max} \; \ell_N(\theta; x) = \underset{0 < \theta < 1}{\arg\max} \; N \ln(\theta) + \ln(1-\theta) \sum_{i=1}^{N} (x_i - 1)$$

The gradient and the hessian (deterministic) are defined by:

$$\frac{\partial \ell_N(\theta; x)}{\partial \theta} = \frac{N}{\theta} - \frac{1}{1-\theta} \sum_{i=1}^{N} (x_i - 1)$$

$$\frac{\partial^2 \ell_N(\theta; x)}{\partial \theta^2} = -\frac{N}{\theta^2} - \left(\frac{1}{1-\theta}\right)^2 \sum_{i=1}^{N} (x_i - 1)$$
Solution (cont'd)

So, the FOC (likelihood equation) is:

$$\left.\frac{\partial \ell_N(\theta; x)}{\partial \theta}\right|_{\hat{\theta}} = \frac{N}{\hat{\theta}} - \frac{1}{1-\hat{\theta}} \sum_{i=1}^{N} (x_i - 1) = 0$$

$$\iff \frac{1-\hat{\theta}}{\hat{\theta}} = \frac{1}{N} \sum_{i=1}^{N} x_i - 1 \iff \frac{1}{\hat{\theta}} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

So we have:

$$\hat{\theta} = \frac{1}{\bar{x}_N}$$

where $\bar{x}_N$ denotes the realisation of the sample mean $\bar{X}_N = N^{-1} \sum_{i=1}^{N} X_i$.
Solution (cont'd)

The SOC is:

$$\left.\frac{\partial^2 \ell_N(\theta; x)}{\partial \theta^2}\right|_{\hat{\theta}} = -\frac{N}{\hat{\theta}^2} - \left(\frac{1}{1-\hat{\theta}}\right)^2 \sum_{i=1}^{N} (x_i - 1)$$

Since $\hat{\theta} = 1/\bar{x}_N$, we have:

$$\sum_{i=1}^{N} (x_i - 1) = \sum_{i=1}^{N} x_i - N = N\bar{x}_N - N = \frac{N}{\hat{\theta}} - N = N \frac{1-\hat{\theta}}{\hat{\theta}}$$

So, we have:

$$\left.\frac{\partial^2 \ell_N(\theta; x)}{\partial \theta^2}\right|_{\hat{\theta}} = -\frac{N}{\hat{\theta}^2} - \left(\frac{1}{1-\hat{\theta}}\right)^2 N \frac{1-\hat{\theta}}{\hat{\theta}} = -N\left(\frac{1}{\hat{\theta}^2} + \frac{1}{\hat{\theta}(1-\hat{\theta})}\right)$$
Solution (cont'd)

$$\left.\frac{\partial^2 \ell_N(\theta; x)}{\partial \theta^2}\right|_{\hat{\theta}} = -N\left(\frac{\hat{\theta}(1-\hat{\theta}) + \hat{\theta}^2}{\hat{\theta}^3 (1-\hat{\theta})}\right) = -\frac{N}{\hat{\theta}^2 (1-\hat{\theta})} < 0$$

We have a maximum since $0 < \hat{\theta} < 1$.

Conclusion: the ML estimator of $\theta$ is equal to the inverse of the sample mean:

$$\hat{\theta} = \frac{1}{\bar{X}_N}$$
Problem (MLE and geometric distribution)

Question 3: Show that the maximum likelihood estimator of the success probability $\theta$ is weakly consistent.
Solution

In two lines...

1. Since the $X_1, X_2, \ldots, X_N$ are i.i.d., then according to Khinchine's theorem (WLLN), we have:

$$\bar{X}_N \xrightarrow{p} E(X_i) = \frac{1}{\theta}$$

2. Given that $\hat{\theta} = 1/\bar{X}_N$, by using the continuous mapping theorem (CMT) for the function $g(x) = 1/x$, we get:

$$\hat{\theta} = g(\bar{X}_N) \xrightarrow{p} g\left(\frac{1}{\theta}\right)$$

or equivalently

$$\hat{\theta} \xrightarrow{p} \theta$$

The estimator $\hat{\theta}$ is (weakly) consistent.
Problem (MLE and geometric distribution)

Question 4: By using the asymptotic properties of the MLE, derive the asymptotic distribution of the ML estimator $\hat{\theta} = 1/\bar{X}_N$.
Solution

1. The log-likelihood function $\ln f_X(\theta; x_i)$ satisfies the regularity conditions.

2. So, the ML estimator is asymptotically normally distributed with

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, I^{-1}(\theta_0)\right)$$

where $\theta_0$ denotes the true value of the parameter and $I(\theta_0)$ the (average) Fisher information number for one observation.
Solution (cont'd)

3. Compute the Fisher information number for one observation. Since we consider a marginal log-likelihood, the Fisher information number associated to $X_i$ is the same for all observations $i$. We have three definitions for $I(\theta)$:

$$I(\theta) = V_\theta\left(\frac{\partial \ell_i(\theta; X_i)}{\partial \theta}\right) = E_\theta\left(\frac{\partial \ell_i(\theta; X_i)}{\partial \theta} \frac{\partial \ell_i^\top(\theta; X_i)}{\partial \theta}\right) = -E_\theta\left(\frac{\partial^2 \ell_i(\theta; X_i)}{\partial \theta^2}\right)$$
Solution (cont'd)

Let us consider the third one:

$$I(\theta) = -E_\theta\left(\frac{\partial^2 \ell_i(\theta; X_i)}{\partial \theta^2}\right) = E_\theta\left(\frac{1}{\theta^2} + \left(\frac{1}{1-\theta}\right)^2 (X_i - 1)\right)$$

$$= \frac{1}{\theta^2} + \left(\frac{1}{1-\theta}\right)^2 \left(E_\theta(X_i) - 1\right) = \frac{1}{\theta^2} + \left(\frac{1}{1-\theta}\right)^2 \left(\frac{1}{\theta} - 1\right) = \frac{1}{\theta^2 (1-\theta)}$$
Solution (cont'd)

The asymptotic distribution of the ML estimator is:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow[N \to \infty]{d} N\left(0, \theta_0^2 (1-\theta_0)\right)$$

where $\theta_0$ denotes the true value of the parameter. Or equivalently:

$$\hat{\theta} \overset{asy}{\sim} N\left(\theta_0, \frac{\theta_0^2 (1-\theta_0)}{N}\right)$$
Problem (MLE and geometric distribution)

Question 5: By using the central limit theorem and the delta method, find the asymptotic distribution of the ML estimator $\hat{\theta} = 1/\bar{X}_N$.
Solution

1. Since the $X_1, X_2, \ldots, X_N$ are i.i.d. with $E(X) = 1/\theta_0$ and $V(X) = (1-\theta_0)/\theta_0^2$, according to the Lindeberg–Lévy CLT we get immediately

$$\sqrt{N}\left(\bar{X}_N - \frac{1}{\theta_0}\right) \xrightarrow{d} N\left(0, \frac{1-\theta_0}{\theta_0^2}\right)$$

2. Our ML estimator is defined by $\hat{\theta} = 1/\bar{X}_N$. Let us consider the function $g(z) = 1/z$. So, $g(.)$ is a continuous and continuously differentiable function with $g(1/\theta) = \theta \neq 0$ and not involving $N$; then the delta method implies

$$\sqrt{N}\left(g(\bar{X}_N) - g\left(\frac{1}{\theta_0}\right)\right) \xrightarrow{d} N\left(0, \left(\left.\frac{\partial g(z)}{\partial z}\right|_{1/\theta_0}\right)^2 \frac{1-\theta_0}{\theta_0^2}\right)$$
Solution (cont'd)

$$\sqrt{N}\left(g(\bar{X}_N) - g\left(\frac{1}{\theta_0}\right)\right) \xrightarrow{d} N\left(0, \left(\left.\frac{\partial g(z)}{\partial z}\right|_{1/\theta_0}\right)^2 \frac{1-\theta_0}{\theta_0^2}\right)$$

We know that $g(z) = 1/z$ and $\partial g(z)/\partial z = -1/z^2$, so we have

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, \theta_0^4 \, \frac{1-\theta_0}{\theta_0^2}\right)$$

Finally, we get the same result as in the previous question:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, \theta_0^2 (1-\theta_0)\right)$$
Problem (MLE and geometric distribution)

Question 6: Determine the FDCR or Cramer-Rao bound. Is the ML estimator $\hat{\theta}$ efficient and/or asymptotically efficient?
Solution

The FDCR or Cramer-Rao bound is defined by:

$$FDCR = I_N^{-1}(\theta_0)$$

where $I_N(\theta_0)$ denotes the Fisher information number for the sample evaluated at the true value $\theta_0$. There are three alternative definitions for $I_N(\theta_0)$:

$$I_N(\theta_0) = V_\theta\left(\left.\frac{\partial \ell_N(\theta; X)}{\partial \theta}\right|_{\theta_0}\right)$$

$$I_N(\theta_0) = E_\theta\left(\left.\frac{\partial \ell_N(\theta; X)}{\partial \theta}\right|_{\theta_0} \left.\frac{\partial \ell_N(\theta; X)^\top}{\partial \theta}\right|_{\theta_0}\right)$$

$$I_N(\theta_0) = -E_\theta\left(\left.\frac{\partial^2 \ell_N(\theta; X)}{\partial \theta \, \partial \theta^\top}\right|_{\theta_0}\right)$$
Solution (cont'd)

Let us consider the third one:

$$I_N(\theta_0) = -E_\theta\left(\left.\frac{\partial^2 \ell_N(\theta; X)}{\partial \theta \, \partial \theta^\top}\right|_{\theta_0}\right) = E_\theta\left(\frac{N}{\theta_0^2} + \left(\frac{1}{1-\theta_0}\right)^2 \sum_{i=1}^{N} (X_i - 1)\right)$$

$$= \frac{N}{\theta_0^2} + \left(\frac{1}{1-\theta_0}\right)^2 \sum_{i=1}^{N} \left(E_\theta(X_i) - 1\right) = \frac{N}{\theta_0^2} + \left(\frac{1}{1-\theta_0}\right)^2 N \left(\frac{1}{\theta_0} - 1\right) = \frac{N}{\theta_0^2 (1-\theta_0)}$$
Solution (cont'd)

So, the FDCR or Cramer-Rao bound is defined by:

$$FDCR = I_N^{-1}(\theta_0) = \frac{\theta_0^2 (1-\theta_0)}{N}$$

1. We don't know if $\hat{\theta}$ is efficient... For that we need to compute the variance $V(\hat{\theta}) = V(1/\bar{X}_N)$.

2. Since the log-likelihood function $\ln f_X(\theta; x_i)$ satisfies the regularity conditions, the MLE is asymptotically efficient.

Remark: we showed that for $N$ large:

$$V_{asy}(\hat{\theta}) = I_N^{-1}(\theta_0) = \frac{\theta_0^2 (1-\theta_0)}{N}$$
Remark

How to get the Fisher information number for the sample (and as a consequence the FDCR or Cramer-Rao bound) in one line from question 4? Since the sample is i.i.d., we have:

$$I_N(\theta_0) = N \, I(\theta_0) = \frac{N}{\theta_0^2 (1-\theta_0)}$$
Problem (MLE and geometric distribution)

Question 7: Propose a consistent estimator for the asymptotic variance of the ML estimator $\hat{\theta}$.
Solution

We have:

$$V_{asy}(\hat{\theta}) = \frac{\theta_0^2 (1-\theta_0)}{N}$$

and we know that the ML estimator $\hat{\theta}$ is a (weakly) consistent estimator of $\theta_0$:

$$\hat{\theta} \xrightarrow{p} \theta_0$$

A natural estimator for the asymptotic variance is given by:

$$\hat{V}_{asy}(\hat{\theta}) = \frac{\hat{\theta}^2 (1-\hat{\theta})}{N}$$

Given the CMT and Slutsky's theorem, it is easy to show that:

$$\hat{V}_{asy}(\hat{\theta}) \xrightarrow{p} V_{asy}(\hat{\theta})$$
Problem (MLE and geometric distribution)

Question 8: Write a Matlab code in order to

(1) Generate a sample of size $N = 1{,}000$ of i.i.d. random variables distributed according to a geometric distribution with a success probability $\theta = 0.3$ by using the function geornd.

(2) Estimate by MLE the parameter $\theta$. Compare your estimate with the sample mean.

Remark: There are two definitions of the geometric distribution:

$$\Pr(X = x) = \theta (1-\theta)^{x-1} \quad \forall x \in \{1, 2, \ldots\} \quad \text{used in this exercise}$$

$$\Pr(X = x) = \theta (1-\theta)^{x} \quad \forall x \in \{0, 1, 2, \ldots\} \quad \text{used by Matlab for geornd}$$
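A minimal sketch along these lines (illustrative):

    % MLE of the success probability of a geometric distribution
    N = 1000; theta = 0.3;
    x = geornd(theta, N, 1) + 1;     % geornd uses the support {0,1,2,...},
                                     % so add 1 to match the support {1,2,...}
    theta_hat = 1/mean(x)            % ML estimate = inverse of the sample mean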
Exercise 2

MLE and AR(p) processes
Definition (AR(1) process)

A stationary Gaussian AR(1) process takes the form

$$Y_t = c + \rho Y_{t-1} + \varepsilon_t$$

with $\varepsilon_t$ i.i.d. $N(0, \sigma^2)$, $|\rho| < 1$ and:

$$E(Y_t) = \frac{c}{1-\rho} \qquad V(Y_t) = \frac{\sigma^2}{1-\rho^2}$$
Problem (MLE and AR processes)

Question 1: Denote $\theta = (c; \rho; \sigma^2)^\top$ the $3 \times 1$ vector of parameters and write the likelihood and the log-likelihood of the first observation $y_1$.
Solution

Since the variable $Y_1$ is Gaussian with

$$E(Y_t) = \frac{c}{1-\rho} \qquad V(Y_t) = \frac{\sigma^2}{1-\rho^2}$$

the (unconditional) likelihood of $y_1$ is equal to:

$$L_1(\theta; y_1) = \frac{1}{\sqrt{2\pi} \sqrt{\sigma^2/(1-\rho^2)}} \exp\left(-\frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)}\right)$$

The (unconditional) log-likelihood of $y_1$ is equal to:

$$\ell_1(\theta; y_1) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln\left(\frac{\sigma^2}{1-\rho^2}\right) - \frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)}$$
Problem (MLE and AR processes)

Question 2: What is the conditional distribution of $Y_2$ given $Y_1 = y_1$? Write the (conditional) likelihood and the (conditional) log-likelihood of the second observation $y_2$.
Solution

For $t = 2$, we have:

$$Y_2 = c + \rho Y_1 + \varepsilon_2$$

where $\varepsilon_2 \sim N(0, \sigma^2)$. As a consequence, the conditional distribution of $Y_2$ given $Y_1 = y_1$ is also normal:

$$Y_2 \mid Y_1 = y_1 \sim N\left(c + \rho y_1, \sigma^2\right)$$
Solution (cont'd)

Given

$$Y_2 \mid Y_1 = y_1 \sim N\left(c + \rho y_1, \sigma^2\right)$$

the conditional likelihood of $y_2$ is equal to:

$$L_2(\theta; y_2 \mid y_1) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2} \frac{(y_2 - c - \rho y_1)^2}{\sigma^2}\right)$$

The conditional log-likelihood of $y_2$ is equal to:

$$\ell_2(\theta; y_2 \mid y_1) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln(\sigma^2) - \frac{1}{2} \frac{(y_2 - c - \rho y_1)^2}{\sigma^2}$$
Problem (MLE and AR processes)

Question 3: Consider a sample $\{y_1, y_2\}$ of size $T = 2$. Write the exact likelihood (or full likelihood) and the exact log-likelihood of the AR(1) model for the sample $\{y_1, y_2\}$. Note that for two continuous random variables $X$ and $Y$, the pdf of the joint distribution $(X, Y)$ can be written as:

$$f_{X,Y}(x, y) = f_{X|Y=y}(x \mid y) \, f_Y(y)$$
Solution

The exact (or full) likelihood of the sample $\{y_1, y_2\}$ corresponds to the pdf of the joint distribution of $(Y_1, Y_2)$:

$$L_T(\theta; y_1, y_2) = f_{Y_1, Y_2}(y_1, y_2)$$

This joint density can be rewritten as the product of the marginal density of $Y_1$ by the conditional density of $Y_2$ given $Y_1 = y_1$:

$$L_T(\theta; y_1, y_2) = f_{Y_2|Y_1=y_1}(y_2 \mid y_1; \theta) \, f_{Y_1}(y_1; \theta)$$

or equivalently:

$$L_T(\theta; y_1, y_2) = L_2(\theta; y_2 \mid y_1) \, L_1(\theta; y_1)$$
Solution (cont'd)

The exact (or full) likelihood of the sample $\{y_1, y_2\}$ is equal to:

$$L_T(\theta; y_1, y_2) = \frac{1}{\sqrt{2\pi} \sqrt{\sigma^2/(1-\rho^2)}} \exp\left(-\frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)}\right) \times \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2} \frac{(y_2 - c - \rho y_1)^2}{\sigma^2}\right)$$
Solution (cont'd)

Similarly the exact (or full) log-likelihood of the sample $\{y_1, y_2\}$ is equal to:

$$\ell_T(\theta; y_1, y_2) = \ell_2(\theta; y_2 \mid y_1) + \ell_1(\theta; y_1)$$

Then, we get:

$$\ell_T(\theta; y_1, y_2) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln\left(\frac{\sigma^2}{1-\rho^2}\right) - \frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)} - \frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln(\sigma^2) - \frac{1}{2} \frac{(y_2 - c - \rho y_1)^2}{\sigma^2}$$
Problem (MLE and AR processes)

Question 4: Write the exact likelihood (or full likelihood) and the exact log-likelihood of the AR(1) model for a sample $\{y_1, y_2, \ldots, y_T\}$ of size $T$.
Solution

More generally, we have:

$$L_T(\theta; y_1, \ldots, y_T) = L_1(\theta; y_1) \prod_{t=2}^{T} L_t(\theta; y_t \mid y_{t-1})$$

$$\ell_T(\theta; y_1, \ldots, y_T) = \ell_1(\theta; y_1) + \sum_{t=2}^{T} \ell_t(\theta; y_t \mid y_{t-1})$$
Solution (cont'd)

$$L_T(\theta; y) = \frac{1}{\sqrt{2\pi} \sqrt{\sigma^2/(1-\rho^2)}} \exp\left(-\frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)}\right) \prod_{t=2}^{T} \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2} \frac{(y_t - c - \rho y_{t-1})^2}{\sigma^2}\right)$$

$$\ell_T(\theta; y) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln\left(\frac{\sigma^2}{1-\rho^2}\right) - \frac{1}{2} \frac{(y_1 - c/(1-\rho))^2}{\sigma^2/(1-\rho^2)} + \sum_{t=2}^{T} \left(-\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln(\sigma^2) - \frac{1}{2} \frac{(y_t - c - \rho y_{t-1})^2}{\sigma^2}\right)$$
Problem (MLE and AR processes)

Question 5: The exact log-likelihood function is a non-linear function of the parameters $\theta$, and so there is no closed form solution for the exact MLE. The exact MLE $\hat{\theta} = (\hat{c}; \hat{\rho}; \hat{\sigma}^2)^\top$ must be determined by numerically maximizing the exact log-likelihood function. Write a Matlab code

(1) to generate a sample of size $T = 1{,}000$ from an AR(1) process with $c = 1$, $\rho = 0.5$ and $\sigma^2 = 1$. Remark: for the initial condition, generate a normal random variable.

(2) to compute the exact MLE.
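A hedged sketch of one way to do this with fminsearch (an unconstrained search; in practice one may reparametrize to enforce $\sigma^2 > 0$ and $|\rho| < 1$):

    % (1) Simulate the AR(1) process
    T = 1000; c = 1; rho = 0.5; sigma2 = 1;
    y = zeros(T,1);
    y(1) = c/(1-rho) + sqrt(sigma2/(1-rho^2))*randn;   % y_1 drawn from the
                                                       % stationary distribution
    for t = 2:T
        y(t) = c + rho*y(t-1) + sqrt(sigma2)*randn;
    end
    % (2) Exact MLE: minimize the negative exact log-likelihood, p = (c, rho, sigma2)
    negloglik = @(p) 0.5*log(2*pi) + 0.5*log(p(3)/(1-p(2)^2)) ...
        + 0.5*(y(1)-p(1)/(1-p(2)))^2/(p(3)/(1-p(2)^2)) ...
        + sum(0.5*log(2*pi) + 0.5*log(p(3)) ...
        + 0.5*(y(2:T)-p(1)-p(2)*y(1:T-1)).^2/p(3));
    theta_hat = fminsearch(negloglik, [0; 0; 1])       % exact MLE of (c, rho, sigma2)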
Problem (MLE and AR processes)

Question 6: Now we consider the first observation $y_1$ as given (deterministic). Then, we have $f_{Y_1}(y_1; \theta) = 1$. Write the conditional log-likelihood of the AR(1) model for a sample $\{y_1, y_2, \ldots, y_T\}$ of size $T$.
Solution

The conditional likelihood is defined by:

$$L_T(\theta; y_2, \ldots, y_T \mid y_1) = \prod_{t=2}^{T} f_{Y_t | Y_{t-1}, Y_1 = y_1}(y_t \mid y_{t-1}, y_1; \theta) \, f_{Y_1}(y_1; \theta) = \prod_{t=2}^{T} f_{Y_t | Y_{t-1}}(y_t \mid y_{t-1}; \theta)$$

The conditional log-likelihood is defined by:

$$\ell_T(\theta; y_1, \ldots, y_T \mid y_1) = \ell_1(\theta; y_1) + \sum_{t=2}^{T} \ell_t(\theta; y_t \mid y_{t-1}, y_1) = \sum_{t=2}^{T} \ell_t(\theta; y_t \mid y_{t-1})$$

where $\ell_t(\theta; y_t \mid y_{t-1}) = \ln f_{Y_t | Y_{t-1}}(y_t \mid y_{t-1}; \theta)$.
Solution (cont'd)

The conditional log-likelihood is then equal to:

$$\ell_T(\theta; y) = \sum_{t=2}^{T} \left(-\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln(\sigma^2) - \frac{1}{2} \frac{(y_t - c - \rho y_{t-1})^2}{\sigma^2}\right)$$

or equivalently

$$\ell_T(\theta; y) = -\frac{T-1}{2} \ln(2\pi) - \frac{T-1}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=2}^{T} (y_t - c - \rho y_{t-1})^2$$
Problem (MLE and AR processes)

Question 7: Write the likelihood equations associated to the conditional log-likelihood.
Solution

The ML estimator $\hat{\theta} = (\hat{c}; \hat{\rho}; \hat{\sigma}^2)^\top$ of $\theta$ is defined by:

$$\hat{\theta} = \underset{\theta \in \Theta}{\arg\max} \; \ell_T(\theta; y_1, \ldots, y_T)$$

The log-likelihood equations are:

$$\left.\frac{\partial \ell_T(\theta; y)}{\partial \theta}\right|_{\hat{\theta}} = \begin{pmatrix} \left.\partial \ell_T(\theta; y)/\partial c\right|_{\hat{\theta}} \\ \left.\partial \ell_T(\theta; y)/\partial \rho\right|_{\hat{\theta}} \\ \left.\partial \ell_T(\theta; y)/\partial \sigma^2\right|_{\hat{\theta}} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
Solution (cont'd)

$$\ell_T(\theta; y) = -\frac{T-1}{2} \ln(2\pi) - \frac{T-1}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=2}^{T} (y_t - c - \rho y_{t-1})^2$$

$$\left.\frac{\partial \ell_T(\theta; y)}{\partial c}\right|_{\hat{\theta}} = \frac{1}{\hat{\sigma}^2} \sum_{t=2}^{T} (y_t - \hat{c} - \hat{\rho} y_{t-1}) = 0$$

$$\left.\frac{\partial \ell_T(\theta; y)}{\partial \rho}\right|_{\hat{\theta}} = \frac{1}{\hat{\sigma}^2} \sum_{t=2}^{T} (y_t - \hat{c} - \hat{\rho} y_{t-1}) \, y_{t-1} = 0$$

$$\left.\frac{\partial \ell_T(\theta; y)}{\partial \sigma^2}\right|_{\hat{\theta}} = -\frac{T-1}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{t=2}^{T} (y_t - \hat{c} - \hat{\rho} y_{t-1})^2 = 0$$
Problem (MLE and AR processes)

Question 8: Show that the conditional ML estimators $\hat{c}$ and $\hat{\rho}$ correspond to the OLS estimators. Give the estimator of $\sigma^2$. Remark: do not verify the SOC at this step.
Solution

The maximisation of $\ell_T(\theta; y)$ with respect to $c$ and $\rho$

$$\ell_T(\theta; y) = -\frac{T-1}{2} \ln(2\pi) - \frac{T-1}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=2}^{T} (y_t - c - \rho y_{t-1})^2$$

is equivalent to the minimisation of

$$\sum_{t=2}^{T} (y_t - c - \rho y_{t-1})^2 = (y - X\beta)^\top (y - X\beta)$$

with $y = (y_2; \ldots; y_T)^\top$, $\beta = (c; \rho)^\top$ and $X = (1 : y_{-1})$ with $y_{-1} = (y_1; \ldots; y_{T-1})^\top$.
Solution

The conditional ML estimators of $c$ and $\rho$ are equivalent to the ordinary least square (OLS) estimators obtained in the regression of $y_t$ on a constant and its own lagged value:

$$y_t = c + \rho y_{t-1} + \varepsilon_t$$

$$\begin{pmatrix} \hat{c} \\ \hat{\rho} \end{pmatrix} = \begin{pmatrix} T-1 & \sum_{t=2}^{T} y_{t-1} \\ \sum_{t=2}^{T} y_{t-1} & \sum_{t=2}^{T} y_{t-1}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{t=2}^{T} y_t \\ \sum_{t=2}^{T} y_{t-1} y_t \end{pmatrix}$$
Solution

The ML estimator $\hat{\sigma}^2$ is defined by:

$$\left.\frac{\partial \ell_T(\theta; y)}{\partial \sigma^2}\right|_{\hat{\theta}} = -\frac{T-1}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{t=2}^{T} (y_t - \hat{c} - \hat{\rho} y_{t-1})^2 = 0$$

Then, we get:

$$\hat{\sigma}^2 = \frac{1}{T-1} \sum_{t=2}^{T} (y_t - \hat{c} - \hat{\rho} y_{t-1})^2 = \frac{1}{T-1} \sum_{t=2}^{T} \hat{\varepsilon}_t^2$$
Problem (MLE and AR processes)

Question 9: Write a Matlab code to compute the conditional maximum likelihood estimator $\hat{\theta} = (\hat{c}; \hat{\rho}; \hat{\sigma}^2)^\top$.

(1) Generate a sample of size $T = 1{,}000$ from an AR(1) process with $c = 1$, $\rho = 0.5$ and $\sigma^2 = 1$. Remark: for the initial condition, generate a normal random variable.

(2) Compute the conditional MLE.

(3) Compare the ML estimators $\hat{c}$ and $\hat{\rho}$ to the OLS ones.
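A sketch of the closed-form conditional MLE (reusing the simulated series y and T from the previous sketch); the $(\hat{c}, \hat{\rho})$ estimates are exactly the OLS ones:

    % Conditional MLE of (c, rho, sigma2): closed form, identical to OLS
    X   = [ones(T-1,1) y(1:T-1)];        % regressors (1, y_{t-1}), t = 2,...,T
    b   = (X'*X)\(X'*y(2:T));            % conditional MLE of (c, rho) = OLS
    res = y(2:T) - X*b;
    sigma2_hat = (res'*res)/(T-1)        % conditional MLE of sigma^2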
Problem (MLE and AR processes)

Question 10: Write the average Fisher information matrix associated to the conditional likelihood.
Solution

In general for a conditional model, in order to compute the average information matrix $I(\theta)$ for one observation:

Step 1: Compute the Hessian matrix or the score vector for one observation

$$H_i(\theta; Y_i \mid x_i) = \frac{\partial^2 \ell_i(\theta; Y_i \mid x_i)}{\partial \theta \, \partial \theta^\top} \qquad s_i(\theta; Y_i \mid x_i) = \frac{\partial \ell_i(\theta; Y_i \mid x_i)}{\partial \theta}$$

Step 2: Take the expectation (or the variance) with respect to the conditional distribution $Y_i \mid X_i = x_i$

$$\mathcal{I}_i(\theta) = V_\theta\left(s_i(\theta; Y_i \mid x_i)\right) = -E_\theta\left(H_i(\theta; Y_i \mid x_i)\right)$$

Step 3: Then take the expectation with respect to the conditioning variable $X$

$$I(\theta) = E_X\left(\mathcal{I}_i(\theta)\right)$$
Solution (cont'd)

Step 1:

$$\frac{\partial \ell_t(\theta; y_t)}{\partial c} = \frac{1}{\sigma^2} (y_t - c - \rho y_{t-1})$$

$$\frac{\partial \ell_t(\theta; y_t)}{\partial \rho} = \frac{1}{\sigma^2} (y_t - c - \rho y_{t-1}) \, y_{t-1}$$

$$\frac{\partial \ell_t(\theta; y_t)}{\partial \sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4} (y_t - c - \rho y_{t-1})^2$$

The Hessian matrix for one observation is defined by:

$$H_t(\theta; Y_t \mid y_{t-1}) = \begin{pmatrix} -1/\sigma^2 & -y_{t-1}/\sigma^2 & -\varepsilon_t/\sigma^4 \\ -y_{t-1}/\sigma^2 & -y_{t-1}^2/\sigma^2 & -\varepsilon_t y_{t-1}/\sigma^4 \\ -\varepsilon_t/\sigma^4 & -\varepsilon_t y_{t-1}/\sigma^4 & 1/2\sigma^4 - \varepsilon_t^2/\sigma^6 \end{pmatrix}$$
Solution (cont'd)

$$H_t(\theta; Y_t \mid y_{t-1}) = \begin{pmatrix} -1/\sigma^2 & -y_{t-1}/\sigma^2 & -\varepsilon_t/\sigma^4 \\ -y_{t-1}/\sigma^2 & -y_{t-1}^2/\sigma^2 & -\varepsilon_t y_{t-1}/\sigma^4 \\ -\varepsilon_t/\sigma^4 & -\varepsilon_t y_{t-1}/\sigma^4 & 1/2\sigma^4 - \varepsilon_t^2/\sigma^6 \end{pmatrix}$$

Step 2: Take the expectation (or the variance) with respect to the conditional distribution $Y_t \mid Y_{t-1} = y_{t-1}$:

$$\mathcal{I}_t(\theta) = -E_\theta\left(H_t(\theta; Y_t \mid y_{t-1})\right) = \begin{pmatrix} 1/\sigma^2 & y_{t-1}/\sigma^2 & 0 \\ y_{t-1}/\sigma^2 & y_{t-1}^2/\sigma^2 & 0 \\ 0 & 0 & 1/2\sigma^4 \end{pmatrix}$$

since $E_\theta(\varepsilon_t) = 0$, $E_\theta(\varepsilon_t y_{t-1}) = y_{t-1} E_\theta(\varepsilon_t) = 0$ and $E_\theta(\varepsilon_t^2) = \sigma^2$.
Solution (cont'd)

$$\mathcal{I}_t(\theta) = \begin{pmatrix} 1/\sigma^2 & y_{t-1}/\sigma^2 & 0 \\ y_{t-1}/\sigma^2 & y_{t-1}^2/\sigma^2 & 0 \\ 0 & 0 & 1/2\sigma^4 \end{pmatrix}$$

Step 3: Then take the expectation with respect to the conditioning variable $x_t = (1 : y_{t-1})$:

$$I(\theta) = E_X\left(\mathcal{I}_t(\theta)\right) = \begin{pmatrix} 1/\sigma^2 & E_X(y_{t-1})/\sigma^2 & 0 \\ E_X(y_{t-1})/\sigma^2 & E_X(y_{t-1}^2)/\sigma^2 & 0 \\ 0 & 0 & 1/2\sigma^4 \end{pmatrix}$$
Problem (MLE and AR processes)

Question 11: What is the asymptotic distribution of the conditional MLE? Propose an estimator for the asymptotic variance covariance matrix of $\hat{\theta} = (\hat{c}; \hat{\rho}; \hat{\sigma}^2)^\top$.
Solution

Since the log-likelihood is regular, we have:

$$\sqrt{T-1}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, I^{-1}(\theta_0)\right)$$

or equivalently

$$\hat{\theta} \overset{asy}{\sim} N\left(\theta_0, \frac{1}{T-1} I^{-1}(\theta_0)\right)$$

with

$$I(\theta) = \begin{pmatrix} 1/\sigma^2 & E_X(y_{t-1})/\sigma^2 & 0 \\ E_X(y_{t-1})/\sigma^2 & E_X(y_{t-1}^2)/\sigma^2 & 0 \\ 0 & 0 & 1/2\sigma^4 \end{pmatrix}$$
Solution (cont'd)

An estimator of the asymptotic variance covariance matrix can be derived from:

$$\hat{I}(\theta) = \begin{pmatrix} 1/\hat{\sigma}^2 & (T-1)^{-1} \sum_{t=2}^{T} y_{t-1}/\hat{\sigma}^2 & 0 \\ (T-1)^{-1} \sum_{t=2}^{T} y_{t-1}/\hat{\sigma}^2 & (T-1)^{-1} \sum_{t=2}^{T} y_{t-1}^2/\hat{\sigma}^2 & 0 \\ 0 & 0 & 1/2\hat{\sigma}^4 \end{pmatrix}$$

where $\hat{\sigma}^2$ is the ML estimator of $\sigma^2$.

$$\hat{V}_{asy}(\hat{\theta}) = \frac{1}{T-1} \hat{I}^{-1}(\theta)$$
Solution (cont'd)

If we denote by $X = (1 : y_{-1})$, then we have:

$$\hat{I}(\theta) = \begin{pmatrix} (T-1)^{-1} X^\top X/\hat{\sigma}^2 & 0_{2 \times 1} \\ 0_{1 \times 2} & 1/2\hat{\sigma}^4 \end{pmatrix}$$

since

$$X^\top X = \begin{pmatrix} T-1 & \sum_{t=2}^{T} y_{t-1} \\ \sum_{t=2}^{T} y_{t-1} & \sum_{t=2}^{T} y_{t-1}^2 \end{pmatrix}$$
Problem (MLE and AR processes)

Question 12: Write a Matlab code to compute the asymptotic variance covariance matrix associated to the conditional maximum likelihood estimator $\hat{\theta} = (\hat{c}; \hat{\rho}; \hat{\sigma}^2)^\top$.

(1) Import the data from the excel file Chapter2_Exercice2.xls

(2) Compute the asymptotic variance covariance matrix of the conditional MLE.

(3) Compare your results with the results reported in Eviews.
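A hedged sketch of step (2); the column layout assumed for Chapter2_Exercice2.xls is an assumption here, and X, T, sigma2_hat are taken from the previous sketch:

    % Estimated asymptotic var-cov matrix of the conditional MLE
    % data = xlsread('Chapter2_Exercice2.xls'); y = data(:,1);  % assumed layout
    I_hat = zeros(3,3);                   % estimated average information matrix
    I_hat(1:2,1:2) = (X'*X)/((T-1)*sigma2_hat);
    I_hat(3,3)     = 1/(2*sigma2_hat^2);
    V_hat   = inv(I_hat)/(T-1);           % V_hat_asy(theta_hat)
    std_err = sqrt(diag(V_hat))           % asymptotic standard errors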
Perfect....

End of Exercises - Chapter 2

Christophe Hurlin
Exercises Chapter 4
Statistical Hypothesis Testing
Data science and advanced programming

Christophe Hurlin

HEC Lausanne

September 2024

Exercise 1

Parametric tests and the Neyman Pearson lemma
Problem

We consider two continuous independent random variables $U$ and $W$, normally distributed as $N(0, \sigma^2)$. The transformed variable $X$ defined by:

$$X = \sqrt{U^2 + W^2}$$

has a Rayleigh distribution with a parameter $\sigma^2$:

$$X \sim \mathrm{Rayleigh}(\sigma^2)$$

with a pdf $f_X(x; \sigma^2)$ defined by:

$$f_X(x; \sigma^2) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right) \quad \forall x \in [0, +\infty)$$
Problem (cont'd)

Question 1: We consider an i.i.d. sample $\{X_1, X_2, \ldots, X_N\}$. Derive the ML estimator of $\sigma^2$.
Solution

$$f_X(x; \sigma^2) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right) \quad \forall x \in [0, +\infty)$$

The log-likelihood of the i.i.d. sample $\{x_1, x_2, \ldots, x_N\}$ is

$$\ell_N(\sigma^2; x) = \sum_{i=1}^{N} \ln f_X(x_i; \sigma^2) = \sum_{i=1}^{N} \ln(x_i) - N \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} x_i^2$$

The ML estimator $\hat{\sigma}^2$ is defined as to be:

$$\hat{\sigma}^2 = \underset{\sigma^2 > 0}{\arg\max} \; \ell_N(\sigma^2; x)$$
Solution (cont'd)

$$\hat{\sigma}^2 = \underset{\sigma^2 > 0}{\arg\max} \; \sum_{i=1}^{N} \ln(x_i) - N \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} x_i^2$$

FOC (log-likelihood equations):

$$\left.\frac{\partial \ell_N(\sigma^2; x)}{\partial \sigma^2}\right|_{\hat{\sigma}^2} = -\frac{N}{\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{i=1}^{N} x_i^2 = 0$$

So, the ML estimator of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} X_i^2$$
Solution (cont'd)

$$\frac{\partial \ell_N(\sigma^2; x)}{\partial \sigma^2} = -\frac{N}{\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{N} x_i^2$$

SOC:

$$\left.\frac{\partial^2 \ell_N(\sigma^2; x)}{\partial (\sigma^2)^2}\right|_{\hat{\sigma}^2} = \frac{N}{\hat{\sigma}^4} - \frac{1}{\hat{\sigma}^6} \sum_{i=1}^{N} x_i^2 = \frac{N}{\hat{\sigma}^4} - \frac{2N\hat{\sigma}^2}{\hat{\sigma}^6} = -\frac{N}{\hat{\sigma}^4} < 0$$

since $\sum_{i=1}^{N} x_i^2 = 2N\hat{\sigma}^2$. So, we have a maximum.
Problem (cont'd)

Question 2: What is the asymptotic distribution of the ML estimator $\hat{\sigma}^2$?
Solution

The Fisher information associated to the sample is:

$$I_N(\sigma^2) = -E_{\sigma^2}\left(\frac{\partial^2 \ell_N(\sigma^2; X)}{\partial (\sigma^2)^2}\right) = -E_{\sigma^2}\left(\frac{N}{\sigma^4} - \frac{1}{\sigma^6} \sum_{i=1}^{N} X_i^2\right) = -\frac{N}{\sigma^4} + \frac{1}{\sigma^4} \sum_{i=1}^{N} E_{\sigma^2}\left(\frac{X_i^2}{\sigma^2}\right)$$

Since $(X/\sigma)^2 = (U/\sigma)^2 + (W/\sigma)^2$ where $U/\sigma$ and $W/\sigma$ are two independent standard normal variables, then $X^2/\sigma^2 \sim \chi^2(2)$ with

$$E_{\sigma^2}\left(\frac{X_i^2}{\sigma^2}\right) = 2$$
Solution (cont'd)

So, we have

$$I_N(\sigma^2) = -\frac{N}{\sigma^4} + \frac{1}{\sigma^4} \sum_{i=1}^{N} E_{\sigma^2}\left(\frac{X_i^2}{\sigma^2}\right) = -\frac{N}{\sigma^4} + \frac{2N}{\sigma^4} = \frac{N}{\sigma^4}$$

Since the sample is i.i.d., the average Fisher information matrix is:

$$I(\sigma^2) = \frac{1}{N} I_N(\sigma^2) = \frac{1}{\sigma^4}$$
Solution (cont'd)

The regularity conditions hold, and we have:

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, I^{-1}(\sigma^2)\right)$$

Here

$$\sqrt{N}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, \sigma^4\right)$$

where $\sigma^2$ denotes the true value of the parameter. Or equivalently:

$$\hat{\sigma}^2 \overset{asy}{\sim} N\left(\sigma^2, \frac{\sigma^4}{N}\right)$$
Problem (cont'd)

Question 3: Consider the test

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 = \sigma_1^2$$

with $\sigma_1^2 > \sigma_0^2$. Determine the critical region of the UMP test of size $\alpha$.
Solution

Given the Neyman Pearson lemma, the rejection region is given by:

$$W = \left\{x_1, \ldots, x_N : \frac{L_N(\sigma_0^2; x_1, \ldots, x_N)}{L_N(\sigma_1^2; x_1, \ldots, x_N)} < K\right\}$$

where $K$ is a constant determined by the level of the test $\alpha$. So, we have

$$\ell_N(\sigma_0^2; x) - \ell_N(\sigma_1^2; x) < \ln(K)$$

$$\iff \sum_{i=1}^{N} \ln(x_i) - N \ln(\sigma_0^2) - \frac{1}{2\sigma_0^2} \sum_{i=1}^{N} x_i^2 - \sum_{i=1}^{N} \ln(x_i) + N \ln(\sigma_1^2) + \frac{1}{2\sigma_1^2} \sum_{i=1}^{N} x_i^2 < \ln(K)$$
Solution (cont'd)

$$N\left(\ln(\sigma_1^2) - \ln(\sigma_0^2)\right) + \frac{1}{2}\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right) \sum_{i=1}^{N} x_i^2 < \ln(K)$$

$$\iff \frac{1}{2}\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right) \sum_{i=1}^{N} x_i^2 < K_1$$

with $K_1 = \ln(K) - N\left(\ln(\sigma_1^2) - \ln(\sigma_0^2)\right)$, or equivalently:

$$\frac{\sigma_0^2 - \sigma_1^2}{\sigma_0^2 \sigma_1^2} \, \frac{1}{2} \sum_{i=1}^{N} x_i^2 < K_1$$
Solution (cont'd)

$$\frac{\sigma_0^2 - \sigma_1^2}{\sigma_0^2 \sigma_1^2} \, \frac{1}{2} \sum_{i=1}^{N} x_i^2 < K_1$$

Since $\sigma_1^2 > \sigma_0^2$, we have:

$$\frac{1}{2N} \sum_{i=1}^{N} x_i^2 > A$$

where $A = K_1 \sigma_0^2 \sigma_1^2 / \left(\left(\sigma_0^2 - \sigma_1^2\right) N\right)$ is a constant determined by $\alpha$.
Solution (cont'd)

The rejection region of the UMP test of size $\alpha$

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 = \sigma_1^2$$

with $\sigma_1^2 > \sigma_0^2$ is:

$$W = \left\{x : \hat{\sigma}^2(x) > A\right\}$$

where the critical value $A$ is a constant determined by the size $\alpha$ and $\hat{\sigma}^2(x)$ is the realisation of the ML estimator $\hat{\sigma}^2$ (the test statistic):

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} X_i^2$$
Solution (cont'd)

Given the definition of the size:

$$\alpha = \Pr(W \mid H_0) = \Pr\left(\hat{\sigma}^2 > A \mid H_0\right)$$

Under the null, for $N$ large, we have:

$$\hat{\sigma}^2 \underset{H_0}{\overset{asy}{\sim}} N\left(\sigma_0^2, \frac{\sigma_0^4}{N}\right)$$

Then

$$1 - \alpha = \Pr\left(\frac{\hat{\sigma}^2 - \sigma_0^2}{\sigma_0^2/\sqrt{N}} < \frac{A - \sigma_0^2}{\sigma_0^2/\sqrt{N}} \,\Big|\, H_0\right)$$
Solution (cont'd)

$$1 - \alpha = \Pr\left(\frac{\hat{\sigma}^2 - \sigma_0^2}{\sigma_0^2/\sqrt{N}} < \frac{A - \sigma_0^2}{\sigma_0^2/\sqrt{N}} \,\Big|\, H_0\right)$$

Denote by $\Phi(.)$ the cdf of the standard normal distribution:

$$A = \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}(1-\alpha)$$

The rejection region of the UMP test of size $\alpha$

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 = \sigma_1^2$$

with $\sigma_1^2 > \sigma_0^2$ is:

$$W = \left\{x : \hat{\sigma}^2(x) > \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}(1-\alpha)\right\}$$
Problem (cont'd)

Question 4: Consider the test

$$H_0: \sigma^2 = 2 \qquad H_1: \sigma^2 > 2$$

For a sample of size $N = 100$, we have

$$\sum_{i=1}^{N} x_i^2 = 470$$

What is the conclusion of the test for a size of 10%?
Solution

Consider the test

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 = \sigma_1^2$$

with $\sigma_1^2 > \sigma_0^2$. The rejection region of the UMP test of size $\alpha$ is given by:

$$W = \left\{x : \hat{\sigma}^2(x) > \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}(1-\alpha)\right\}$$

This region does not depend on the value of $\sigma_1^2$. So, it corresponds to the rejection region of the one-sided UMP test of size $\alpha$:

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 > \sigma_0^2$$
Solution (cont'd)

$$H_0: \sigma^2 = 2 \qquad H_1: \sigma^2 > 2$$

$$W = \left\{x : \hat{\sigma}^2(x) > \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}(1-\alpha)\right\}$$

NA: $N = 100$, $\alpha = 10\%$:

$$W = \left\{x : \hat{\sigma}^2(x) > 2 + \frac{2}{10} \Phi^{-1}(0.9)\right\} = \left\{x : \hat{\sigma}^2(x) > 2.2563\right\}$$
Solution (cont'd)

$$W = \left\{x : \hat{\sigma}^2(x) > 2.2563\right\}$$

For this sample ($N = 100$) we have $\sum_{i=1}^{N} x_i^2 = 470$, and as a consequence

$$\hat{\sigma}^2(x) = \frac{1}{2N} \sum_{i=1}^{N} x_i^2 = \frac{470}{200} = 2.35$$

For a significance level of 10%, we reject the null $H_0: \sigma^2 = 2$.
Problem (cont'd)

Question 5: Determine the power of the one-sided UMP test of size $\alpha$ for:

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 > \sigma_0^2$$

Numerical application: $N = 100$, $\sigma_0^2 = 2$ and $\alpha = 10\%$.
Solution

The rejection region of the UMP test of size $\alpha$ is:

$$W = \left\{x : \hat{\sigma}^2(x) > A\right\}$$

with $A = \sigma_0^2 + \Phi^{-1}(1-\alpha) \, \sigma_0^2/\sqrt{N}$. By definition of the power, we have:

$$\text{power} = \Pr(W \mid H_1) = \Pr\left(\hat{\sigma}^2 > \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}(1-\alpha) \,\Big|\, H_1\right)$$

Under the alternative hypothesis, for $N$ large, we have:

$$\hat{\sigma}^2 \underset{H_1}{\overset{asy}{\sim}} N\left(\sigma^2, \frac{\sigma^4}{N}\right) \qquad \sigma^2 > \sigma_0^2$$
Solution (cont'd)

Then, the power is equal to:

$$\text{power} = 1 - \Pr\left(\frac{\hat{\sigma}^2 - \sigma^2}{\sigma^2/\sqrt{N}} < \frac{A - \sigma^2}{\sigma^2/\sqrt{N}} \,\Big|\, H_1\right) = 1 - \Phi\left(\frac{A - \sigma^2}{\sigma^2/\sqrt{N}}\right)$$

Given the definition of the critical value $A = \sigma_0^2 + \Phi^{-1}(1-\alpha) \, \sigma_0^2/\sqrt{N}$, we have:

$$\text{power} = 1 - \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2} \Phi^{-1}(1-\alpha)\right) \quad \forall \sigma^2 > \sigma_0^2$$

NA: $\sigma_0^2 = 2$, $N = 100$ and $\alpha = 10\%$:

$$\text{power} = 1 - \Phi\left(\frac{2 - \sigma^2}{\sigma^2/10} + \frac{2}{\sigma^2} \Phi^{-1}(0.9)\right) \quad \forall \sigma^2 > \sigma_0^2$$
Solution (cont'd)

[Figure: power of the one-sided UMP test as a function of $\sigma^2$, for $N = 100$, $\sigma_0^2 = 2$ and $\alpha = 10\%$]
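A minimal Matlab sketch that reproduces this power curve:

    % Power of the one-sided UMP test as a function of sigma^2
    N = 100; alpha = 0.10; s20 = 2;                      % sigma_0^2 = 2
    s2 = linspace(s20, 3, 200);                          % values under H1
    power = 1 - normcdf((s20 - s2)./(s2/sqrt(N)) ...
                        + (s20./s2)*norminv(1 - alpha));
    plot(s2, power); xlabel('\sigma^2'); ylabel('power')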
Problem (cont'd)

Question 6: Consider the two-sided test

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 \neq \sigma_0^2$$

What is the critical region of the test of size $\alpha$?
Solution

Consider the one-sided tests:

Test A: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 < \sigma_0^2$

Test B: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 > \sigma_0^2$

The non-rejection regions of the UMP one-sided tests of size $\alpha/2$ are:

$$\overline{W}_A = \left\{x : \hat{\sigma}^2(x) > \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(\frac{\alpha}{2}\right)\right\}$$

$$\overline{W}_B = \left\{x : \hat{\sigma}^2(x) < \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right\}$$

The non-rejection region of the two-sided test corresponds to the intersection of these two regions:

$$\overline{W} = \overline{W}_A \cap \overline{W}_B$$
Solution (cont'd)

So, the non-rejection region of the two-sided test of size $\alpha$ is:

$$\overline{W} = \left\{x : \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(\frac{\alpha}{2}\right) < \hat{\sigma}^2(x) < \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right\}$$

Since $\Phi^{-1}(\alpha/2) = -\Phi^{-1}(1 - \alpha/2)$, this region can be rewritten as:

$$\overline{W} = \left\{x : \left|\hat{\sigma}^2(x) - \sigma_0^2\right| < \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right\}$$

The rejection region of the two-sided test of size $\alpha$ is:

$$W = \left\{x : \left|\hat{\sigma}^2(x) - \sigma_0^2\right| > \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right\}$$
Problem (cont'd)

Question 7: Determine the power of the two-sided test of size $\alpha$ for:

$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 \neq \sigma_0^2$$

Numerical application: $N = 100$, $\sigma_0^2 = 2$ and $\alpha = 10\%$.
Solution

The non-rejection region of the two-sided test of size $\alpha$ is:

$$\overline{W} = \left\{x : A < \hat{\sigma}^2(x) < B\right\}$$

$$A = \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(\frac{\alpha}{2}\right) \qquad B = \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)$$

By definition of the power:

$$\text{power} = \Pr(W \mid H_1) = 1 - \Pr(\overline{W} \mid H_1)$$

So, we have:

$$\text{power} = 1 - \Pr\left(\hat{\sigma}^2 < B\right) + \Pr\left(\hat{\sigma}^2 < A\right)$$
Solution (cont'd)

$$\text{power} = 1 - \Pr\left(\hat{\sigma}^2 < B\right) + \Pr\left(\hat{\sigma}^2 < A\right)$$

Under the alternative

$$\hat{\sigma}^2 \underset{H_1}{\overset{asy}{\sim}} N\left(\sigma^2, \frac{\sigma^4}{N}\right) \qquad \sigma^2 \neq \sigma_0^2$$

So, we have

$$\text{power} = 1 - \Phi\left(\frac{B - \sigma^2}{\sigma^2/\sqrt{N}}\right) + \Phi\left(\frac{A - \sigma^2}{\sigma^2/\sqrt{N}}\right)$$
Solution (cont'd)

We have

$$\text{power} = 1 - \Phi\left(\frac{B - \sigma^2}{\sigma^2/\sqrt{N}}\right) + \Phi\left(\frac{A - \sigma^2}{\sigma^2/\sqrt{N}}\right)$$

$$A = \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(\frac{\alpha}{2}\right) \qquad B = \sigma_0^2 + \frac{\sigma_0^2}{\sqrt{N}} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)$$

So $\forall \sigma^2 \neq \sigma_0^2$, the power function of the two-sided test is defined by:

$$\text{power} = 1 - \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2} \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right) + \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2} \Phi^{-1}\left(\frac{\alpha}{2}\right)\right)$$
Solution (cont'd)

NA: $N = 100$, $\alpha = 10\%$ and $\sigma_0^2 = 2$. $\forall \sigma^2 \neq 2$:

$$\text{power} = 1 - \Phi\left(\frac{2 - \sigma^2}{\sigma^2/10} + \frac{2}{\sigma^2} \Phi^{-1}(0.95)\right) + \Phi\left(\frac{2 - \sigma^2}{\sigma^2/10} + \frac{2}{\sigma^2} \Phi^{-1}(0.05)\right)$$
Solution (cont’d)
[Figure: power function of the two-sided test plotted against σ² over [1, 3]; the power reaches its minimum α = 0.10 at σ² = 2 and rises towards 1 on both sides.]
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 36 / 88
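The two-sided power curve above can also be evaluated numerically. This Matlab sketch (same Statistics Toolbox assumption as before) reproduces the figure and checks that the minimum of the curve is α at σ² = σ0²:

% Two-sided test: power for sigma2_0 = 2, N = 100, alpha = 10%
sigma2_0 = 2; N = 100; alpha = 0.10;
s2 = linspace(1, 3, 401);
z  = (sigma2_0 - s2)./(s2/sqrt(N));
power = 1 - normcdf(z + (sigma2_0./s2).*norminv(1 - alpha/2)) ...
          + normcdf(z + (sigma2_0./s2).*norminv(alpha/2));
plot(s2, power), xlabel('\sigma^2'), ylabel('power')
[pmin, idx] = min(power);   % pmin is close to 0.10 and s2(idx) close to 2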
Problem (cont’d)
Question 8: show that the two-sided test is unbiased and consistent.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 38 / 88
Solution
The power function is defined by:

$$P\left(\sigma^2\right) = 1 - \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right) + \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2}\Phi^{-1}\left(\frac{\alpha}{2}\right)\right)$$

If $\sigma^2 < \sigma_0^2$, then:

$$\lim_{N \to \infty} P\left(\sigma^2\right) = 1 - \Phi(+\infty) + \Phi(+\infty) = 1 - 1 + 1 = 1$$

If $\sigma^2 > \sigma_0^2$, then:

$$\lim_{N \to \infty} P\left(\sigma^2\right) = 1 - \Phi(-\infty) + \Phi(-\infty) = 1 - 0 + 0 = 1$$

The test is consistent.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 39 / 88
Solution (cont’d)
The power function is defined by:

$$P\left(\sigma^2\right) = 1 - \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right) + \Phi\left(\frac{\sigma_0^2 - \sigma^2}{\sigma^2/\sqrt{N}} + \frac{\sigma_0^2}{\sigma^2}\Phi^{-1}\left(\frac{\alpha}{2}\right)\right)$$

This function reaches a minimum when $\sigma^2$ tends to $\sigma_0^2$:

$$\lim_{\sigma^2 \to \sigma_0^2} P\left(\sigma^2\right) = 1 - \Phi\left(\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right) + \Phi\left(\Phi^{-1}\left(\frac{\alpha}{2}\right)\right) = 1 - \left(1 - \frac{\alpha}{2}\right) + \frac{\alpha}{2} = \alpha$$

The test is unbiased.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 40 / 88
Subsection 4.2

The trilogy: LRT, Wald, and LM tests

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 41 / 88
Problem (Greene, 2007, page 531)
We consider two random variables Y and X such that the pdf of the conditional distribution Y | X = x is given by:

$$f_{Y|X}\left(\left. y \right| x; \beta\right) = \frac{1}{\beta + x}\exp\left(-\frac{y}{\beta + x}\right)$$

For convenience, let

$$\beta_i = \frac{1}{\beta + x_i}$$

This exponential density is a restricted form of a more general gamma distribution,

$$f_{Y|X}\left(\left. y_i \right| x_i; \beta, \rho\right) = \frac{\beta_i^{\rho}}{\Gamma(\rho)}\, y_i^{\rho - 1}\exp\left(-y_i \beta_i\right)$$

The restriction is $\rho = 1$. We want to test the hypothesis

$$H_0: \rho = 1 \quad \text{versus} \quad H_1: \rho \neq 1$$

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 42 / 88
Reminder: the gamma function
The gamma function Γ(p) is defined by:

$$\Gamma(p) = \int_0^{\infty} t^{p-1}\exp(-t)\, dt \qquad \forall\, p > 0$$

The gamma function obeys the recursion:

$$\Gamma(p) = (p - 1)\,\Gamma(p - 1) \qquad \Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$$

So for integer values of p, we have:

$$\Gamma(p) = (p - 1)!$$

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 43 / 88
Reminder: the gamma function (cont’d)
The derivatives of the gamma function are:

$$\frac{\partial^k \Gamma(p)}{\partial p^k} = \int_0^{\infty} \left(\ln(t)\right)^k t^{p-1}\exp(-t)\, dt$$

The first two derivatives of ln(Γ(p)) are denoted:

$$\frac{\partial \ln(\Gamma(p))}{\partial p} = \frac{\Gamma'}{\Gamma} = \Psi(p) \qquad \frac{\partial^2 \ln(\Gamma(p))}{\partial p^2} = \frac{\Gamma\,\Gamma'' - \Gamma'^2}{\Gamma^2} = \Psi'(p)$$

where Ψ(p) and Ψ'(p) are the digamma and trigamma functions (see the polygamma function, implemented as psi in Matlab).

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 44 / 88
Problem (cont’d)
Question 1: consider an i.i.d. sample $\{X_i, Y_i\}_{i=1}^N$ and write its log-likelihood under $H_1$ (unconstrained model) and under $H_0$ (constrained model).

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 45 / 88
Solution
Under $H_1$, with $\theta = (\beta : \rho)^{\top}$, we have:

$$f_{Y_i|X_i}\left(\left. y_i \right| x_i; \theta\right) = \frac{\beta_i^{\rho}}{\Gamma(\rho)}\, y_i^{\rho - 1}\exp\left(-y_i \beta_i\right) \quad \text{with } \beta_i = \frac{1}{\beta + x_i}$$

$$\ell_N\left(\left. y \right| x; \theta\right) = \sum_{i=1}^N \ln f_{Y_i|X_i}\left(\left. y_i \right| x_i; \theta\right)$$

The log-likelihood under $H_1$ (unconstrained model) is:

$$\ell_N\left(\left. y \right| x; \theta\right) = \rho \sum_{i=1}^N \ln(\beta_i) - N \ln(\Gamma(\rho)) + (\rho - 1)\sum_{i=1}^N \ln(y_i) - \sum_{i=1}^N y_i \beta_i$$

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 46 / 88
Solution (cont’d)
Under $H_0: \rho = 1$, we have:

$$f_{Y_i|X_i}\left(\left. y_i \right| x_i; \beta\right) = \beta_i \exp\left(-y_i \beta_i\right) \quad \text{with } \beta_i = \frac{1}{\beta + x_i}$$

$$\ell_N\left(\left. y \right| x; \beta\right) = \sum_{i=1}^N \ln f_{Y_i|X_i}\left(\left. y_i \right| x_i; \beta\right)$$

The log-likelihood under $H_0$ (constrained model) is:

$$\ell_N\left(\left. y \right| x; \beta\right) = \sum_{i=1}^N \ln(\beta_i) - \sum_{i=1}^N y_i \beta_i$$

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 47 / 88
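Both log-likelihoods are straightforward to code. The following Matlab sketch is one possible implementation (the handles and variable names are ours, not Greene's; theta = [beta; rho], and x, y are N x 1 data vectors):

% Unconstrained (H1) and constrained (H0) log-likelihoods as anonymous functions
loglik_H1 = @(theta, y, x) sum( theta(2)*log(1./(theta(1) + x)) ...
             - gammaln(theta(2)) + (theta(2) - 1)*log(y) - y./(theta(1) + x) );
loglik_H0 = @(beta, y, x)  sum( log(1./(beta + x)) - y./(beta + x) );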
Problem (cont’d)
Question 2: write the gradient vectors and the Hessian matrices associated to the
unconstrained log-likelihood (under H1 ) and to the constrained log-likelihood (under H0 ).

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 48 / 88
Solution
Under $H_1$:

$$\ell_N\left(\left. y \right| x; \theta\right) = \rho \sum_{i=1}^N \ln(\beta_i) - N \ln(\Gamma(\rho)) + (\rho - 1)\sum_{i=1}^N \ln(y_i) - \sum_{i=1}^N y_i \beta_i$$

Remarks:

$$\frac{\partial \beta_i}{\partial \beta} = \frac{\partial\left(1/(\beta + x_i)\right)}{\partial \beta} = -\frac{1}{(\beta + x_i)^2} = -\beta_i^2$$

$$\frac{\partial \ln(\beta_i)}{\partial \beta} = \frac{\partial\left(-\ln(\beta + x_i)\right)}{\partial \beta} = -\frac{1}{\beta + x_i} = -\beta_i$$

$$\frac{\partial \ln(\Gamma(\rho))}{\partial \rho} = \Psi(\rho)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 49 / 88
Solution (cont’d)
$$\ell_N\left(\left. y \right| x; \theta\right) = \rho \sum_{i=1}^N \ln(\beta_i) - N \ln(\Gamma(\rho)) + (\rho - 1)\sum_{i=1}^N \ln(y_i) - \sum_{i=1}^N y_i \beta_i$$

The gradient vector under $H_1$ is:

$$g_N\left(\left. y \right| x; \theta\right) = \frac{\partial \ell_N\left(\left. y \right| x; \theta\right)}{\partial \theta} = \begin{pmatrix} \partial \ell_N\left(\left. y \right| x; \theta\right)/\partial \beta \\ \partial \ell_N\left(\left. y \right| x; \theta\right)/\partial \rho \end{pmatrix}$$

with

$$\frac{\partial \ell_N\left(\left. y \right| x; \theta\right)}{\partial \beta} = -\rho \sum_{i=1}^N \beta_i + \sum_{i=1}^N y_i \beta_i^2$$

$$\frac{\partial \ell_N\left(\left. y \right| x; \theta\right)}{\partial \rho} = \sum_{i=1}^N \ln(\beta_i) - N\,\Psi(\rho) + \sum_{i=1}^N \ln(y_i)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 50 / 88
Solution (cont’d)
$$\frac{\partial \ell_N\left(\left. y \right| x; \theta\right)}{\partial \beta} = -\rho \sum_{i=1}^N \beta_i + \sum_{i=1}^N y_i \beta_i^2$$

So, we have:

$$\frac{\partial^2 \ell_N\left(\left. y \right| x; \theta\right)}{\partial \beta^2} = \rho \sum_{i=1}^N \beta_i^2 - 2 \sum_{i=1}^N y_i \beta_i^3$$

$$\frac{\partial^2 \ell_N\left(\left. y \right| x; \theta\right)}{\partial \beta \partial \rho} = -\sum_{i=1}^N \beta_i$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 51 / 88
Solution (cont’d)
$$\frac{\partial \ell_N\left(\left. y \right| x; \theta\right)}{\partial \rho} = \sum_{i=1}^N \ln(\beta_i) - N\,\Psi(\rho) + \sum_{i=1}^N \ln(y_i)$$

So, we have:

$$\frac{\partial^2 \ell_N\left(\left. y \right| x; \theta\right)}{\partial \rho^2} = -N\,\Psi'(\rho)$$

$$\frac{\partial^2 \ell_N\left(\left. y \right| x; \theta\right)}{\partial \rho \partial \beta} = -\sum_{i=1}^N \beta_i$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 52 / 88
Solution (cont’d)
The Hessian matrix associated to the log-likelihood under $H_1$ is:

$$H_N\left(\left. y \right| x; \theta\right) = \frac{\partial^2 \ell_N\left(\left. y \right| x; \theta\right)}{\partial \theta \partial \theta^{\top}} = \begin{pmatrix} \partial^2 \ell_N/\partial \beta^2 & \partial^2 \ell_N/\partial \beta \partial \rho \\ \partial^2 \ell_N/\partial \rho \partial \beta & \partial^2 \ell_N/\partial \rho^2 \end{pmatrix}$$

with

$$H_N\left(\left. y \right| x; \theta\right) = \begin{pmatrix} \rho \sum_{i=1}^N \beta_i^2 - 2\sum_{i=1}^N y_i \beta_i^3 & -\sum_{i=1}^N \beta_i \\ -\sum_{i=1}^N \beta_i & -N\,\Psi'(\rho) \end{pmatrix}$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 53 / 88
Solution (cont’d)
Under $H_0: \rho = 1$, the gradient (scalar) is:

$$g_N\left(\left. y \right| x; \beta\right) = \frac{\partial \ell_N\left(\left. y \right| x; \beta\right)}{\partial \beta} = -\sum_{i=1}^N \beta_i + \sum_{i=1}^N y_i \beta_i^2$$

The Hessian (scalar) is:

$$H_N\left(\left. y \right| x; \beta\right) = \frac{\partial^2 \ell_N\left(\left. y \right| x; \beta\right)}{\partial \beta^2} = \sum_{i=1}^N \beta_i^2 - 2\sum_{i=1}^N y_i \beta_i^3$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 54 / 88
Problem (cont’d)
Question 3: write the average Fisher information matrices under H1 and under H0 .

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 55 / 88
Solution
Under $H_1$ (unconstrained model), the Hessian (stochastic) is:

$$H_i\left(\left. Y_i \right| x_i; \theta\right) = \begin{pmatrix} \rho \beta_i^2 - 2 Y_i \beta_i^3 & -\beta_i \\ -\beta_i & -\Psi'(\rho) \end{pmatrix}$$

The average Fisher information matrix can be defined (one of the three definitions) as:

$$\mathcal{I}(\theta) = \mathbb{E}_X\left[\mathbb{E}_{\theta}\left(-H_i\left(\left. Y_i \right| x_i; \theta\right)\right)\right]$$

$$\mathcal{I}(\theta) = \mathbb{E}_X \begin{pmatrix} -\rho \beta_i^2 + 2\,\mathbb{E}_{\theta}(Y_i)\,\beta_i^3 & \beta_i \\ \beta_i & \Psi'(\rho) \end{pmatrix}$$

since $\beta_i = 1/(\beta + X_i)$ depends on the random variable $X_i$.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 56 / 88
Solution
$$\mathcal{I}(\theta) = \mathbb{E}_X \begin{pmatrix} -\rho \beta_i^2 + 2\,\mathbb{E}_{\theta}(Y_i)\,\beta_i^3 & \beta_i \\ \beta_i & \Psi'(\rho) \end{pmatrix}$$

Consider the score of the unit $i$. By definition, we have:

$$\mathbb{E}_{\theta}\left(s_i\left(\left. Y_i \right| x_i; \theta\right)\right) = \begin{pmatrix} -\rho \beta_i + \mathbb{E}_{\theta}(Y_i)\,\beta_i^2 \\ \ln(\beta_i) - \Psi(\rho) + \mathbb{E}_{\theta}\left(\ln(Y_i)\right) \end{pmatrix} = 0_{2 \times 1}$$

So, we have:

$$\mathbb{E}_{\theta}(Y_i) = \frac{\rho}{\beta_i}$$

where $\mathbb{E}_{\theta}$ denotes the expectation with respect to the conditional distribution of $Y$ given $X = x$.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 57 / 88
Solution (cont’d)
$$\mathcal{I}(\theta) = \mathbb{E}_X \begin{pmatrix} -\rho \beta_i^2 + 2\,\mathbb{E}_{\theta}(Y_i)\,\beta_i^3 & \beta_i \\ \beta_i & \Psi'(\rho) \end{pmatrix} \qquad \mathbb{E}_{\theta}(Y_i) = \frac{\rho}{\beta_i}$$

Under $H_1$ (unconstrained model), the average Fisher information matrix is therefore:

$$\mathcal{I}(\theta) = \mathbb{E}_X \begin{pmatrix} \rho \beta_i^2 & \beta_i \\ \beta_i & \Psi'(\rho) \end{pmatrix}$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 58 / 88
Solution (cont’d)
Under $H_0$ (constrained model), we have:

$$H_i\left(\left. Y_i \right| x_i; \beta\right) = \beta_i^2 - 2 Y_i \beta_i^3 \qquad \mathbb{E}_{\beta}\left(s_i\left(\left. Y_i \right| x_i; \beta\right)\right) = \mathbb{E}_{\beta}\left(-\beta_i + Y_i \beta_i^2\right) = 0$$

The average Fisher information number is defined by:

$$\mathcal{I}(\beta) = \mathbb{E}_X\left[\mathbb{E}_{\beta}\left(-H_i\left(\left. Y_i \right| x_i; \beta\right)\right)\right] = \mathbb{E}_X\left(-\beta_i^2 + 2\,\mathbb{E}_{\beta}(Y_i)\,\beta_i^3\right) = \mathbb{E}_X\left(\beta_i^2\right)$$

The average Fisher information number is equal to:

$$\mathcal{I}(\beta) = \mathbb{E}_X\left(\beta_i^2\right)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 59 / 88
Problem (cont’d)
Question 4: denote $\hat{\theta}_{H_1}$ the ML estimator of $\theta = (\beta : \rho)^{\top}$ obtained under $H_1$ and $\hat{\theta}_{H_0} = \hat{\beta}_{H_0}$ the ML estimator of $\beta$ obtained under $H_0: \rho = 1$. Determine the asymptotic distribution and the asymptotic variance covariance matrix of $\hat{\theta}_{H_1}$ and $\hat{\theta}_{H_0}$.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 60 / 88
Solution
The regularity conditions hold. Under $H_1$ (unconstrained model) we have:

$$\sqrt{N}\left(\hat{\theta}_{H_1} - \theta_1\right) \overset{d}{\to} \mathcal{N}\left(0, \mathcal{I}^{-1}(\theta_1)\right)$$

where $\theta_1$ denotes the true value of the parameters (under $H_1$), or equivalently:

$$\hat{\theta}_{H_1} \underset{H_1}{\overset{asy}{\sim}} \mathcal{N}\left(\theta_1, \frac{1}{N}\mathcal{I}^{-1}(\theta_1)\right) \quad \text{with } \mathcal{I}(\theta_1) = \mathbb{E}_X \begin{pmatrix} \rho \beta_i^2 & \beta_i \\ \beta_i & \Psi'(\rho) \end{pmatrix}$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 61 / 88
Solution (cont’d)
The regularity conditions hold. Under $H_0$ (constrained model) we have:

$$\sqrt{N}\left(\hat{\theta}_{H_0} - \theta_0\right) \overset{d}{\to} \mathcal{N}\left(0, \mathcal{I}^{-1}(\theta_0)\right)$$

where $\theta_0 = \beta_0$ denotes the true value of the parameter (under $H_0$), or equivalently:

$$\hat{\theta}_{H_0} \underset{H_0}{\overset{asy}{\sim}} \mathcal{N}\left(\theta_0, \frac{1}{N}\mathcal{I}^{-1}(\theta_0)\right) \quad \text{with } \mathcal{I}(\theta_0) = \mathbb{E}_X\left(\beta_i^2\right)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 62 / 88
Problem (cont’d)
Question 5: Propose three alternative estimators of the average Fisher information
matrices under H1 and under H0 .

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 63 / 88
Solution
Three alternative estimators of the average Fisher information matrix $\mathcal{I}(\theta)$ can be used:

$$\hat{\mathcal{I}}_A\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \hat{\mathcal{I}}_i\left(\hat{\theta}\right)$$

$$\hat{\mathcal{I}}_B\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \left(\left.\frac{\partial \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta}\right|_{\hat{\theta}}\right)\left(\left.\frac{\partial \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta}\right|_{\hat{\theta}}\right)^{\top}$$

$$\hat{\mathcal{I}}_C\left(\hat{\theta}\right) = -\frac{1}{N}\sum_{i=1}^N \left.\frac{\partial^2 \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta \partial \theta^{\top}}\right|_{\hat{\theta}}$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 64 / 88
Solution (cont’d)
First estimator: actual Fisher information matrix

$$\hat{\mathcal{I}}_A\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \hat{\mathcal{I}}_i\left(\hat{\theta}\right)$$

Under $H_1$:

$$\hat{\mathcal{I}}_A\left(\hat{\theta}\right) = \frac{1}{N}\begin{pmatrix} \hat{\rho}\sum_{i=1}^N \hat{\beta}_i^2 & \sum_{i=1}^N \hat{\beta}_i \\ \sum_{i=1}^N \hat{\beta}_i & N\,\Psi'(\hat{\rho}) \end{pmatrix}$$

Under $H_0$:

$$\hat{\mathcal{I}}_A\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \hat{\mathcal{I}}_i\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \hat{\beta}_i^2$$

where $\hat{\beta}_i = 1/\left(\hat{\beta} + x_i\right)$ and where the estimators $\hat{\beta}$ and $\hat{\rho}$ are obtained under $H_1$ (unconstrained model) or $H_0$ (constrained model), given the case.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 65 / 88
Solution (cont’d)
Second estimator: BHHH estimator

$$\hat{\mathcal{I}}_B\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \left(\left.\frac{\partial \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta}\right|_{\hat{\theta}}\right)\left(\left.\frac{\partial \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta}\right|_{\hat{\theta}}\right)^{\top}$$

Under $H_1$:

$$\hat{\mathcal{I}}_B\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \begin{pmatrix} -\hat{\rho}\hat{\beta}_i + y_i \hat{\beta}_i^2 \\ \ln\left(\hat{\beta}_i\right) - \Psi(\hat{\rho}) + \ln(y_i) \end{pmatrix}\begin{pmatrix} -\hat{\rho}\hat{\beta}_i + y_i \hat{\beta}_i^2 \\ \ln\left(\hat{\beta}_i\right) - \Psi(\hat{\rho}) + \ln(y_i) \end{pmatrix}^{\top}$$

Under $H_0$:

$$\hat{\mathcal{I}}_B\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \left(-\hat{\beta}_i + y_i \hat{\beta}_i^2\right)^2$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 66 / 88
Solution (cont’d)
Third estimator: Hessian

$$\hat{\mathcal{I}}_C\left(\hat{\theta}\right) = -\frac{1}{N}\sum_{i=1}^N \left.\frac{\partial^2 \ell_i\left(\theta; \left. y_i \right| x_i\right)}{\partial \theta \partial \theta^{\top}}\right|_{\hat{\theta}}$$

Under $H_1$:

$$\hat{\mathcal{I}}_C\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \begin{pmatrix} -\hat{\rho}\hat{\beta}_i^2 + 2 y_i \hat{\beta}_i^3 & \hat{\beta}_i \\ \hat{\beta}_i & \Psi'(\hat{\rho}) \end{pmatrix}$$

Under $H_0$:

$$\hat{\mathcal{I}}_C\left(\hat{\theta}\right) = \frac{1}{N}\sum_{i=1}^N \left(-\hat{\beta}_i^2 + 2 y_i \hat{\beta}_i^3\right)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 67 / 88
Problem (cont’d)
Question 6: Consider the dataset provided by Greene (2007) in the file Chapter4_Exercise2.xls. Write a Matlab code (1) to estimate the parameters of the model under H1 (unconstrained model) by MLE, and (2) to compute three alternative estimates of the asymptotic variance covariance matrix.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 68 / 88
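The Matlab code and output shown on the original slides are not reproduced here. A minimal sketch of what such a code could look like is given below; it reuses the log-likelihood handles defined earlier, uses fminsearch (base Matlab), and assumes the data have been loaded into vectors y and x (e.g. from Chapter4_Exercise2.xls):

% --- Unconstrained MLE (H1): maximize the log-likelihood over theta = [beta; rho]
negll     = @(t) -loglik_H1(t, y, x);
theta_hat = fminsearch(negll, [1; 1]);      % starting values are arbitrary
beta_hat  = theta_hat(1);  rho_hat = theta_hat(2);
b = 1./(beta_hat + x);                      % beta_i evaluated at the MLE
N = numel(y);

% Estimator A: actual Fisher information matrix
IA = [rho_hat*sum(b.^2), sum(b); sum(b), N*psi(1, rho_hat)]/N;
% Estimator B: BHHH (outer product of the individual scores)
s  = [-rho_hat*b + y.*b.^2, log(b) - psi(0, rho_hat) + log(y)];
IB = (s'*s)/N;
% Estimator C: minus the (average) Hessian
IC = [2*sum(y.*b.^3) - rho_hat*sum(b.^2), sum(b); sum(b), N*psi(1, rho_hat)]/N;

VA = inv(N*IA);  VB = inv(N*IB);  VC = inv(N*IC);   % three estimates of Vasy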
Remarks

1 Asymptotically, the three estimators of the asymptotic variance covariance matrix are equivalent.

2 But this exercise confirms that these estimators can give very different results for small samples.

3 The striking difference of the BHHH estimator is typical of its erratic performance in small samples.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 72 / 88
Problem (cont’d)
Question 7: Write a Matlab code (1) to estimate the parameters of the model under H0 (constrained model) by MLE, and (2) to compute three alternative estimates of the asymptotic variance covariance matrix.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 73 / 88
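Again, the original slides show the Matlab code and output, which are not reproduced here. A minimal sketch for the constrained model, under the same assumptions as the unconstrained sketch above:

% --- Constrained MLE (H0: rho = 1): scalar optimization over beta
negll0  = @(bet) -loglik_H0(bet, y, x);
beta_H0 = fminsearch(negll0, 1);
b0 = 1./(beta_H0 + x);  N = numel(y);

IA0 = sum(b0.^2)/N;                          % actual Fisher information
IB0 = sum((-b0 + y.*b0.^2).^2)/N;            % BHHH
IC0 = (2*sum(y.*b0.^3) - sum(b0.^2))/N;      % minus the Hessian
V0  = 1./(N*[IA0, IB0, IC0]);                % three asymptotic variance estimates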
Problem (cont’d)
Question 8: test the hypothesis

$$H_0: \rho = 1 \quad \text{versus} \quad H_1: \rho \neq 1$$

with a likelihood ratio (LR) test for a significance level of 5%.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 77 / 88
Solution
The likelihood ratio (LR) test-statistic is defined by:

$$LR = -2\left(\ell_N\left(\hat{\theta}_{H_0}; \left. y \right| x\right) - \ell_N\left(\hat{\theta}_{H_1}; \left. y \right| x\right)\right)$$

In this sample, we have a realisation equal to:

$$LR(y) = -2\left(-88.4363 + 82.9160\right) = 11.0406$$

The critical region is:

$$W = \left\{ y : LR(y) > \chi^2_{0.95}(1) = 3.8415 \right\}$$

where $\chi^2_{0.95}(1)$ is the critical value of the chi-squared distribution with $p = 1$ degree of freedom. Conclusion: for a significance level of 5%, we reject the null $H_0: \rho = 1$.

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 78 / 88
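In Matlab, the LR statistic and its critical value follow directly; a sketch reusing the objects defined above (chi2inv requires the Statistics Toolbox):

LR     = -2*( loglik_H0(beta_H0, y, x) - loglik_H1(theta_hat, y, x) );
crit   = chi2inv(0.95, 1);      % 3.8415
reject = LR > crit;             % LR = 11.0406 here, so H0 is rejected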
Problem (cont’d)
Question 9: test the hypothesis

$$H_0: \rho = 1 \quad \text{versus} \quad H_1: \rho \neq 1$$

with a Wald test for a significance level of 5%.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 79 / 88
Solution
The null hypothesis $H_0: \rho = 1$ can be expressed as:

$$H_0: c(\theta) = 0$$

with $c(\theta) = \rho - 1$. The Wald test-statistic is defined by:

$$Wald = c\left(\hat{\theta}_{H_1}\right)^{\top}\left(\frac{\partial c}{\partial \theta^{\top}}\left(\hat{\theta}_{H_1}\right)\,\hat{V}_{asy}\left(\hat{\theta}_{H_1}\right)\,\frac{\partial c}{\partial \theta^{\top}}\left(\hat{\theta}_{H_1}\right)^{\top}\right)^{-1} c\left(\hat{\theta}_{H_1}\right)$$

Here, we have:

$$\frac{\partial c}{\partial \theta^{\top}}\left(\hat{\theta}_{H_1}\right) = \begin{pmatrix} 0 & 1 \end{pmatrix}$$

Then, we get:

$$Wald(y) = \left(\hat{\rho}_{H_1} - 1\right)^2 \hat{V}_{asy}^{-1}\left(\hat{\rho}_{H_1}\right)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 80 / 88
Solution (cont’d)
$$Wald(y) = \left(\hat{\rho}_{H_1} - 1\right)^2 \hat{V}_{asy}^{-1}\left(\hat{\rho}_{H_1}\right)$$

Given the estimator chosen for the asymptotic variance, we get:

$$Wald_A(y) = \frac{(3.1509 - 1)^2}{0.5768} = 8.0214 \qquad Wald_B(y) = \frac{(3.1509 - 1)^2}{1.5372} = 3.0096 \qquad Wald_C(y) = \frac{(3.1509 - 1)^2}{0.6309} = 7.3335$$

The critical region is:

$$W = \left\{ y : Wald(y) > \chi^2_{0.95}(1) = 3.8415 \right\}$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 81 / 88
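The three realisations above are simple to verify; a sketch using the reported point estimate and variances:

rho_hat = 3.1509;
V_rho   = [0.5768, 1.5372, 0.6309];      % variance of rho_hat: estimators A, B, C
Wald    = (rho_hat - 1)^2 ./ V_rho       % 8.0214, 3.0096, 7.3335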
Solution (cont’d)
Conclusion:

1 For a significance level of 5%, the Wald test-statistics based on estimators A and C (actual Fisher matrix and Hessian) of the asymptotic variance covariance matrix lead to rejecting the null H0: ρ = 1.

2 For a significance level of 5%, the Wald test-statistic based on estimator B (BHHH estimator) of the asymptotic variance covariance matrix fails to reject the null H0: ρ = 1.

3 In most software, the Hessian (estimator C) is preferred and the Wald test-statistics are computed with this estimator.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 82 / 88
Problem (cont’d)
Question 10: test the hypothesis

$$H_0: \rho = 1 \quad \text{versus} \quad H_1: \rho \neq 1$$

with a Lagrange Multiplier test for a significance level of 5%. Write a Matlab code to compute the three possible values of the LM test-statistic.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 83 / 88
Solution
The Lagrange multiplier test is based on the restricted estimators. The LM test-statistic is defined by:

$$LM = s_N\left(\hat{\theta}_{H_0}; \left. y_i \right| x_i\right)^{\top}\, \hat{\mathcal{I}}_N^{-1}\left(\hat{\theta}_{H_0}\right)\, s_N\left(\hat{\theta}_{H_0}; \left. y_i \right| x_i\right)$$

or equivalently:

$$LM = s_N\left(\hat{\theta}_{H_0}; \left. y_i \right| x_i\right)^{\top}\left(N\,\hat{\mathcal{I}}\left(\hat{\theta}_{H_0}\right)\right)^{-1} s_N\left(\hat{\theta}_{H_0}; \left. y_i \right| x_i\right)$$
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 84 / 88
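The Matlab code shown on the original slides is not reproduced here. A minimal sketch of the LM computation could look as follows; it evaluates the unconstrained score and one of the three information estimators (here the actual-Fisher version) at the constrained MLE, and the variable names are ours:

b0 = 1./(beta_H0 + x);  N = numel(y);    % beta_i at the constrained MLE, rho = 1
sN = [ sum(-b0 + y.*b0.^2) ;  sum(log(b0)) - N*psi(0, 1) + sum(log(y)) ];
I0 = [ sum(b0.^2), sum(b0) ; sum(b0), N*psi(1, 1) ]/N;   % estimator A at (beta_H0, 1)
LM = sN' * inv(N*I0) * sN;   % plugging in estimators B or C instead yields LM_B, LM_C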
Solution (cont’d)
So, given the estimator chosen for $\hat{\mathcal{I}}\left(\hat{\theta}_{H_0}\right)$, we have:

$$LM_A(y) = 4.7825 \qquad LM_B(y) = 15.6868 \qquad LM_C(y) = 5.1162$$

The critical region is:

$$W = \left\{ y : LM(y) > \chi^2_{0.95}(1) = 3.8415 \right\}$$

Conclusion: for a significance level of 5%, we reject the null $H_0: \rho = 1$, whatever the choice of the estimator.
Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 87 / 88
End of Exercises - Chapter 4

Christophe Hurlin

Christophe Hurlin (HEC Lausanne) Data science and advanced programming September 2024 88 / 88
Data science and advanced programming
Correction - Series 1
Christophe HURLIN

October 27, 2024

Exercise: Maximum likelihood and LR-LM-Wald tests. Grading scale: 30 points

Part I: Maximum likelihood with θ1 known (12 points)

Question 1 (2 points) We know that the variables (X1, ..., Xn) are i.i.d. with the same distribution as X (0.5 point). Therefore, we have:

$$\ell_n(\theta_2; x) = \sum_{i=1}^n \ln f_X(x_i; \theta_2) \tag{1}$$

$$= \sum_{i=1}^n \left[ \ln(1 + \theta_1) + \ln \theta_2 + \theta_1 \ln(x_i) - \theta_2\, x_i^{1+\theta_1} \right] \tag{2}$$

The log-likelihood associated with the n-sample (x1, ..., xn) is therefore defined by:

$$\ell_n(\theta_2; x) = n \ln(1 + \theta_1) + n \ln(\theta_2) + \theta_1 \sum_{i=1}^n \ln(x_i) - \theta_2 \sum_{i=1}^n x_i^{1+\theta_1} \quad \text{(1.5 points)} \tag{3}$$
Question 2 (2 points) Let gn(θ2; x) be the gradient associated with the sample (x1, ..., xn):

$$g_n(\theta_2; x) = \frac{\partial \ell_n(\theta_2; x)}{\partial \theta_2} = \frac{n}{\theta_2} - \sum_{i=1}^n x_i^{1+\theta_1} \quad \text{(1 point)} \tag{4}$$

Let Hn(θ2; x) be the Hessian associated with the sample (x1, ..., xn):

$$H_n(\theta_2; x) = \frac{\partial^2 \ell_n(\theta_2; x)}{\partial \theta_2^2} = -\frac{n}{\theta_2^2} \quad \text{(1 point)} \tag{5}$$
Question 3 (2 points) Let θ̂2 be the maximum likelihood estimator of the parameter θ2. It satisfies:

$$\hat{\theta}_2 = \arg\max_{\theta_2 \in \mathbb{R}^+} \ell_n(\theta_2; x) \quad \text{(0.5 point)} \tag{6}$$

The necessary condition (likelihood equation) of the log-likelihood optimization program is then:

$$g_n\left(\hat{\theta}_2; x\right) = \left.\frac{\partial \ell_n(\theta_2; x)}{\partial \theta_2}\right|_{\hat{\theta}_2} = \frac{n}{\hat{\theta}_2} - \sum_{i=1}^n x_i^{1+\theta_1} = 0 \quad \text{(0.5 point)} \tag{7}$$

From which we obtain:

$$\hat{\theta}_2 = \left(\frac{1}{n}\sum_{i=1}^n x_i^{1+\theta_1}\right)^{-1} \tag{8}$$

The sufficient condition of the log-likelihood optimization program is:

$$H_n\left(\hat{\theta}_2; x\right) = \left.\frac{\partial^2 \ell_n(\theta_2; x)}{\partial \theta_2^2}\right|_{\hat{\theta}_2} = -\frac{n}{\hat{\theta}_2^2} < 0 \quad \text{(0.5 point)} \tag{9}$$

We indeed have a maximum: the maximum likelihood estimator is therefore defined by:

$$\hat{\theta}_2 = \left(\frac{1}{n}\sum_{i=1}^n X_i^{1+\theta_1}\right)^{-1} \quad \text{(0.5 point)} \tag{10}$$
Question 4 (2 points) Since the variables $X_i^{1+\theta_1}$ are i.i.d. with the same distribution as $X^{1+\theta_1}$, with $\mathbb{E}\left(X^{1+\theta_1}\right) = 1/\theta_2$, the weak law of large numbers (Khintchine's theorem) implies that:

$$\frac{1}{n}\sum_{i=1}^n X_i^{1+\theta_1} \overset{p}{\to} \mathbb{E}\left(X^{1+\theta_1}\right) = \frac{1}{\theta_2} \quad \text{(1 point)} \tag{11}$$

Let $g(z) = z^{-1}$ be a continuous function such that:

$$\hat{\theta}_2 = g\left(\frac{1}{n}\sum_{i=1}^n X_i^{1+\theta_1}\right) = \left(\frac{1}{n}\sum_{i=1}^n X_i^{1+\theta_1}\right)^{-1} \tag{12}$$

By application of the continuous mapping theorem (CMT), it follows that:

$$\hat{\theta}_2 = g\left(\frac{1}{n}\sum_{i=1}^n X_i^{1+\theta_1}\right) \overset{p}{\to} g\left(\frac{1}{\theta_2}\right) = \theta_2 \quad \text{(0.5 point)} \tag{13}$$

We deduce that $\hat{\theta}_2$ is a (weakly) consistent estimator of the parameter $\theta_2$:

$$\hat{\theta}_2 \overset{p}{\to} \theta_2 \quad \text{(0.5 point)} \tag{14}$$
Question 5 (2 points) The n-sample $\left(X_1^{1+\theta_1}, \ldots, X_n^{1+\theta_1}\right)$ is i.i.d. with the same distribution as $X^{1+\theta_1}$, with $\mathbb{E}\left(X^{1+\theta_1}\right) = 1/\theta_2$ and $\mathbb{V}\left(X^{1+\theta_1}\right) = 1/\theta_2^2$. By the Lindeberg-Levy central limit theorem, the sample mean $n^{-1}\sum_{i=1}^n X_i^{1+\theta_1}$ satisfies:

$$\sqrt{n}\left(n^{-1}\sum_{i=1}^n X_i^{1+\theta_1} - \mathbb{E}\left(X^{1+\theta_1}\right)\right) \overset{d}{\to} \mathcal{N}\left(0, \mathbb{V}\left(X^{1+\theta_1}\right)\right) \tag{15}$$

or equivalently:

$$\sqrt{n}\left(n^{-1}\sum_{i=1}^n X_i^{1+\theta_1} - \frac{1}{\theta_2}\right) \overset{d}{\to} \mathcal{N}\left(0, \frac{1}{\theta_2^2}\right) \quad \text{(1 point)} \tag{16}$$

Let $g(z) = z^{-1}$ be a continuous function such that $\hat{\theta}_2 = g\left(n^{-1}\sum_{i=1}^n X_i^{1+\theta_1}\right) = \left(n^{-1}\sum_{i=1}^n X_i^{1+\theta_1}\right)^{-1}$ and $\partial g(z)/\partial z = -1/z^2$. By application of the delta method, it follows that:

$$\sqrt{n}\left(g\left(n^{-1}\sum_{i=1}^n X_i^{1+\theta_1}\right) - g\left(\frac{1}{\theta_2}\right)\right) \overset{d}{\to} \mathcal{N}\left(0, \frac{1}{\theta_2^2}\times\left(\left.\frac{\partial g(z)}{\partial z}\right|_{1/\theta_2}\right)^2\right) \quad \text{(0.5 point)} \tag{17}$$

or again:

$$\sqrt{n}\left(\hat{\theta}_2 - \theta_2\right) \overset{d}{\to} \mathcal{N}\left(0, \frac{\theta_2^4}{\theta_2^2}\right) \tag{18}$$

We finally obtain:

$$\sqrt{n}\left(\hat{\theta}_2 - \theta_2\right) \overset{d}{\to} \mathcal{N}\left(0, \theta_2^2\right) \quad \text{(0.5 point)} \tag{19}$$
Question 6 (2 points) For θ1 = 1, the realisation of the estimator θ̂2 (the estimate) is equal to:

$$\hat{\theta}_2 = \left(\frac{1}{n}\sum_{i=1}^n x_i^2\right)^{-1} = \left(\frac{100}{200}\right)^{-1} = \frac{200}{100} = 2 \quad \text{(1 point)} \tag{20}$$

From the result of Question 5, we have:

$$\sqrt{n}\left(\hat{\theta}_2 - \theta_2\right) \overset{d}{\to} \mathcal{N}\left(0, \theta_2^2\right) \tag{21}$$

For n large but finite, we can use the following approximation:

$$\hat{\theta}_2 \overset{asy}{\approx} \mathcal{N}\left(\theta_2, \frac{\theta_2^2}{n}\right) \tag{22}$$

Hence, for a risk level α, it follows that:

$$\Pr\left(\Phi^{-1}\left(\frac{\alpha}{2}\right) < \frac{\hat{\theta}_2 - \theta_2}{\theta_2/\sqrt{n}} < \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right) = 1 - \alpha \tag{23}$$

where Φ(.) denotes the cdf of the standard normal distribution. Since $\Phi^{-1}(\alpha/2) = -\Phi^{-1}(1 - \alpha/2)$, we deduce a confidence interval for the value of the parameter θ2:

$$IC_{1-\alpha} = \left[\hat{\theta}_2 - \frac{\theta_2}{\sqrt{n}}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\; ;\; \hat{\theta}_2 + \frac{\theta_2}{\sqrt{n}}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right] \tag{24}$$

Since the parameter θ2 is unknown, we replace it by its estimator:

$$IC_{1-\alpha} = \left[\hat{\theta}_2\left(1 - \frac{1}{\sqrt{n}}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right)\; ;\; \hat{\theta}_2\left(1 + \frac{1}{\sqrt{n}}\Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\right)\right] \tag{25}$$

Numerical application: α = 5%, n = 200.

$$IC_{95\%} = \left[2\times\left(1 - \frac{1.96}{\sqrt{200}}\right)\; ;\; 2\times\left(1 + \frac{1.96}{\sqrt{200}}\right)\right] = [1.7228\; ;\; 2.2772] \quad \text{(1 point)}$$
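A minimal Matlab sketch of this numerical application (assuming the Statistics Toolbox for norminv):

theta2_hat = 2;  n = 200;  alpha = 0.05;
z  = norminv(1 - alpha/2);                          % 1.96
CI = theta2_hat * [1 - z/sqrt(n), 1 + z/sqrt(n)]    % [1.7228, 2.2772]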
Part II: Maximum likelihood (8 points)

It is given that the gradient gn(θ; x) and the Hessian matrix Hn(θ; x) associated with the sample (x1, ..., xn) can be written respectively as:

$$g_n(\theta; x) = \begin{pmatrix} \frac{n}{1+\theta_1} + \sum_{i=1}^n \ln(x_i) - \theta_2 \sum_{i=1}^n \ln(x_i)\, x_i^{1+\theta_1} \\ \frac{n}{\theta_2} - \sum_{i=1}^n x_i^{1+\theta_1} \end{pmatrix} \tag{26}$$

$$H_n(\theta; x) = \begin{pmatrix} -\frac{n}{(1+\theta_1)^2} - \theta_2 \sum_{i=1}^n \ln(x_i)^2\, x_i^{1+\theta_1} & -\sum_{i=1}^n \ln(x_i)\, x_i^{1+\theta_1} \\ -\sum_{i=1}^n \ln(x_i)\, x_i^{1+\theta_1} & -\frac{n}{\theta_2^2} \end{pmatrix} \tag{27}$$
Question 7 (2 points) For θ1 = 1 and θ2 = 2, we obtain:

$$g_n(\theta; x) = \begin{pmatrix} \frac{n}{2} + \sum_{i=1}^n \ln(x_i) - 2\sum_{i=1}^n \ln(x_i)\, x_i^2 \\ \frac{n}{2} - \sum_{i=1}^n x_i^2 \end{pmatrix} \tag{28}$$

$$= \begin{pmatrix} \frac{200}{2} - 130 - 2\times(-15) \\ \frac{200}{2} - 100 \end{pmatrix} \tag{29}$$

$$= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \text{(1 point)} \tag{30}$$

In the same way, we obtain:

$$H_n(\theta; x) = \begin{pmatrix} -\frac{n}{4} - 2\sum_{i=1}^n \ln(x_i)^2\, x_i^2 & -\sum_{i=1}^n \ln(x_i)\, x_i^2 \\ -\sum_{i=1}^n \ln(x_i)\, x_i^2 & -\frac{n}{4} \end{pmatrix} \tag{31}$$

$$= \begin{pmatrix} -\frac{200}{4} - 2\times 19 & -(-15) \\ -(-15) & -\frac{200}{4} \end{pmatrix} \tag{32}$$

$$= \begin{pmatrix} -88 & 15 \\ 15 & -50 \end{pmatrix} \quad \text{(1 point)} \tag{33}$$
Question 8 (2 points) Let θ̃ = (1, 2)′ denote the initial condition. Let θ̂ be the new candidate point determined by the Gauss-Newton method, defined by:

$$\hat{\theta} = \tilde{\theta} - H_n^{-1}(\tilde{\theta}; x)\, g_n(\tilde{\theta}; x) \quad \text{(0.5 point)} \tag{34}$$

or again:

$$\hat{\theta} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} - \begin{pmatrix} -88 & 15 \\ 15 & -50 \end{pmatrix}^{-1}\times\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{(1 point)} \tag{35}$$

The vector (1, 2)′ is optimal and corresponds to the final estimate, since the gradient of the log-likelihood is zero at this vector.
Question 9 (2 points) By definition, we have:

$$\mathcal{I}_n(\theta) = \mathbb{E}_{\theta_0}\left(-H_n\left(\theta; X\right)\right) \quad \text{(0.5 point)} \tag{36}$$

It follows that:

$$\mathcal{I}_n(\theta) = \begin{pmatrix} \frac{n}{(1+\theta_1)^2} + \theta_2 \sum_{i=1}^n \mathbb{E}_{\theta_0}\left[\ln(X_i)^2 X_i^{1+\theta_1}\right] & \sum_{i=1}^n \mathbb{E}_{\theta_0}\left[\ln(X_i)\, X_i^{1+\theta_1}\right] \\ \sum_{i=1}^n \mathbb{E}_{\theta_0}\left[\ln(X_i)\, X_i^{1+\theta_1}\right] & \frac{n}{\theta_2^2} \end{pmatrix} \tag{37}$$

For θ = (1, 2)′ and knowing that $\mathbb{E}_{\theta_0}\left[\ln(X)^2 X^2\right] = 0.0950$, $\mathbb{E}_{\theta_0}\left[\ln(X)\, X^2\right] = -0.0750$ and n = 200, we obtain:

$$\mathcal{I}_n(\theta) = \begin{pmatrix} \frac{n}{4} + 2\times n\times\mathbb{E}_{\theta_0}\left[\ln(X)^2 X^2\right] & n\times\mathbb{E}_{\theta_0}\left[\ln(X)\, X^2\right] \\ n\times\mathbb{E}_{\theta_0}\left[\ln(X)\, X^2\right] & \frac{n}{4} \end{pmatrix} \tag{38}$$

$$= \begin{pmatrix} \frac{200}{4} + 2\times 200\times 0.0950 & -200\times 0.0750 \\ -200\times 0.0750 & \frac{200}{4} \end{pmatrix} \tag{39}$$

$$= \begin{pmatrix} 88 & -15 \\ -15 & 50 \end{pmatrix} \quad \text{(1.5 points)} \tag{40}$$
Question 10 (2 points) Under the regularity conditions, the ML estimator satisfies:

$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \overset{d}{\to} \mathcal{N}\left(0, \mathcal{I}^{-1}(\theta_0)\right) \tag{41}$$

where $\mathcal{I}(\theta_0)$ denotes the average Fisher information matrix. The asymptotic variance covariance matrix is equal to:

$$\hat{V}_{asy}\left(\hat{\theta}\right) = \mathcal{I}_n^{-1}\left(\hat{\theta}\right) \quad \text{(0.5 point)} \tag{42}$$

We therefore obtain (0.5 point):

$$\hat{V}_{asy}\left(\hat{\theta}\right) = \begin{pmatrix} 88 & -15 \\ -15 & 50 \end{pmatrix}^{-1} = \begin{pmatrix} \frac{50}{4175} & \frac{15}{4175} \\ \frac{15}{4175} & \frac{88}{4175} \end{pmatrix} = \begin{pmatrix} 0.0120 & 0.0036 \\ 0.0036 & 0.0211 \end{pmatrix} \tag{43}$$

The asymptotic standard errors are equal to:

$$std\left(\hat{\theta}_1\right) = \hat{V}_{asy}^{1/2}\left(\hat{\theta}_1\right) = \sqrt{0.0120} = 0.1094 \quad \text{(0.5 point)} \tag{44}$$

$$std\left(\hat{\theta}_2\right) = \hat{V}_{asy}^{1/2}\left(\hat{\theta}_2\right) = \sqrt{0.0211} = 0.1452 \quad \text{(0.5 point)} \tag{45}$$
Part III: Inference (10 points)

Question 11 (3 points) The LR test statistic is defined by:

$$LR = -2\left(\ell_n(\theta_{H_0}; x) - \ell_n\left(\hat{\theta}_{H_1}; x\right)\right) \tag{46}$$

where $\theta_{H_0} = (\theta_{1,H_0}, \theta_{2,H_0})' = (0, 2)'$ denotes the parameter vector under H0 and $\hat{\theta}_{H_1} = (\hat{\theta}_{1,H_1}; \hat{\theta}_{2,H_1})' = (1, 2)'$ the estimate of the parameter vector θ obtained under the alternative hypothesis by maximum likelihood. The log-likelihood associated with the n-sample (x1, ..., xn) under the null hypothesis is equal to:

$$\ell_n(\theta_{H_0}; x) = n \ln(1 + \theta_{1,H_0}) + n \ln(\theta_{2,H_0}) + \theta_{1,H_0}\sum_{i=1}^n \ln(x_i) - \theta_{2,H_0}\sum_{i=1}^n x_i^{1+\theta_{1,H_0}} \tag{47}$$

$$= 200\times\ln(1) + 200\times\ln(2) + 0\times(-130) - 2\times 121 = -103.3706 \quad \text{(0.5 point)} \tag{48}$$

The log-likelihood associated with the n-sample (x1, ..., xn) under the alternative hypothesis is:

$$\ell_n\left(\hat{\theta}_{H_1}; x\right) = n \ln\left(1 + \hat{\theta}_{1,H_1}\right) + n \ln\left(\hat{\theta}_{2,H_1}\right) + \hat{\theta}_{1,H_1}\sum_{i=1}^n \ln(x_i) - \hat{\theta}_{2,H_1}\sum_{i=1}^n x_i^{1+\hat{\theta}_{1,H_1}} \tag{50}$$

$$= 200\times\ln(2) + 200\times\ln(2) - 130 - 2\times 100 = -52.7411 \quad \text{(0.5 point)} \tag{51}$$

The realisation of the LR statistic is therefore equal to:

$$LR(x) = -2\left(\ell_n(\theta_{H_0}; x) - \ell_n\left(\hat{\theta}_{H_1}; x\right)\right) = -2\left(-103.3706 + 52.7411\right) = 101.2590 \quad \text{(1 point)} \tag{54}$$

Under the regularity conditions and under the null hypothesis, we have:

$$LR \overset{d}{\to} \chi^2(2) \tag{57}$$

The critical region for a 5% risk level is therefore equal to:

$$W = \left\{ x : LR(x) > \chi^2_{0.95}(2) = 5.9915 \right\} \quad \text{(0.5 point)} \tag{58}$$

For a 5% risk level, we reject the null hypothesis H0: θ1 = 0 and θ2 = 2 (0.5 point).
Question 12 (3 points) The test can be rewritten in the form:

$$H_0: R\theta = q \tag{59}$$

with:

$$R = I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad q = \begin{pmatrix} 0 \\ 2 \end{pmatrix} \quad \text{(0.5 point)} \tag{60}$$

The Wald statistic is defined by:

$$Wald = \left(R\hat{\theta}_{H_1} - q\right)^{\top}\left(R\,\hat{V}_{asy}\left(\hat{\theta}_{H_1}\right)R^{\top}\right)^{-1}\left(R\hat{\theta}_{H_1} - q\right)$$

where $\hat{\theta}_{H_1}$ denotes the maximum likelihood estimator obtained under the alternative hypothesis. It follows that:

$$R\hat{\theta}_{H_1} - q = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} - \begin{pmatrix} 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{(0.5 point)} \tag{61}$$

$$\hat{V}_{asy}\left(\hat{\theta}_{H_1}\right) = \begin{pmatrix} 88 & -15 \\ -15 & 50 \end{pmatrix}^{-1} = \begin{pmatrix} 0.0120 & 0.0036 \\ 0.0036 & 0.0211 \end{pmatrix} \tag{62}$$

$$R\,\hat{V}_{asy}\left(\hat{\theta}_{H_1}\right)R^{\top} = \hat{V}_{asy}\left(\hat{\theta}_{H_1}\right) = \begin{pmatrix} 0.0120 & 0.0036 \\ 0.0036 & 0.0211 \end{pmatrix} \quad \text{(0.5 point)} \tag{63}$$

We therefore obtain:

$$Wald = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 0.0120 & 0.0036 \\ 0.0036 & 0.0211 \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 88 & -15 \\ -15 & 50 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} \tag{64}$$

We immediately deduce the realisation of the Wald statistic:

$$Wald(x) = 88 \quad \text{(1 point)} \tag{66}$$

Under the regularity conditions and under the null hypothesis, we have:

$$Wald \overset{d}{\to} \chi^2(2) \tag{67}$$

The critical region for a 5% risk level is therefore equal to:

$$W = \left\{ x : Wald(x) > \chi^2_{0.95}(2) = 5.9915 \right\} \quad \text{(0.5 point)} \tag{68}$$

For a 5% risk level, we reject the null hypothesis H0: θ1 = 0 and θ2 = 2 (0.5 point).
Question 13 (4 points) The score statistic, or LM (Lagrange Multiplier) statistic, is defined by:

$$LM = s_n(\theta_{H_0}; X)^{\top}\, \hat{V}_{asy}(\theta_{H_0})\, s_n(\theta_{H_0}; X) \quad \text{(0.5 point)} \tag{69}$$

where $\theta_{H_0}$ denotes the parameter vector under H0, $s_n(\theta; X)$ denotes the score associated with the log-likelihood of the sample (X1, ..., Xn) under the alternative hypothesis H1, and $\hat{V}_{asy}(\theta)$ denotes a consistent estimator of the asymptotic variance covariance matrix of the maximum likelihood estimator $\hat{\theta}$. In our case:

$$s_n(\theta_{H_0}; X) = \left.\frac{\partial \ell_n(\theta; X)}{\partial \theta}\right|_{\theta_{H_0}} = \begin{pmatrix} \frac{n}{1+\theta_{1,H_0}} + \sum_{i=1}^n \ln(X_i) - \theta_{2,H_0}\sum_{i=1}^n \ln(X_i)\, X_i^{1+\theta_{1,H_0}} \\ \frac{n}{\theta_{2,H_0}} - \sum_{i=1}^n X_i^{1+\theta_{1,H_0}} \end{pmatrix} \quad \text{(0.5 point)} \tag{70}$$

The realisation of the score (gradient) is equal to:

$$s_n(\theta_{H_0}; x) = \begin{pmatrix} \frac{n}{1} + \sum_{i=1}^n \ln(x_i) - 2\sum_{i=1}^n \ln(x_i)\, x_i \\ \frac{n}{2} - \sum_{i=1}^n x_i \end{pmatrix} = \begin{pmatrix} 200 - 130 - 2\times(-44) \\ \frac{200}{2} - 121 \end{pmatrix} = \begin{pmatrix} 158 \\ -21 \end{pmatrix} \quad \text{(1 point)} \tag{72}$$

The estimate of the asymptotic variance covariance matrix of $\hat{\theta}$ under H0 is equal to:

$$\hat{V}_{asy}(\theta_{H_0}) = \begin{pmatrix} 0.0043 & 0.0013 \\ 0.0013 & 0.0204 \end{pmatrix} \tag{75}$$

We deduce the value of the realisation of the LM statistic:

$$LM(x) = \begin{pmatrix} 158 & -21 \end{pmatrix}\times\begin{pmatrix} 0.0043 & 0.0013 \\ 0.0013 & 0.0204 \end{pmatrix}\times\begin{pmatrix} 158 \\ -21 \end{pmatrix} \tag{76}$$

or again:

$$LM(x) = 107.3763 \quad \text{(1 point)} \tag{77}$$

Under the regularity conditions and under the null hypothesis, we have:

$$LM \overset{d}{\to} \chi^2(2) \tag{78}$$

The critical region for a 5% risk level is therefore equal to:

$$W = \left\{ x : LM(x) > \chi^2_{0.95}(2) = 5.9915 \right\} \quad \text{(0.5 point)} \tag{79}$$

For a 5% risk level, we reject the null hypothesis H0: θ1 = 0 and θ2 = 2 (0.5 point).
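The numerical results of Parts II and III can be verified with a few lines of Matlab; this is a sketch using the sufficient statistics reported above (small discrepancies may appear because the variance matrices are rounded):

g  = [0; 0];                          % gradient at theta = (1, 2)'
H  = [-88, 15; 15, -50];              % Hessian at theta = (1, 2)'
theta_new = [1; 2] - H\g;             % Newton step: stays at (1, 2)'
In = [88, -15; -15, 50];              % Fisher information matrix
V  = inv(In);                         % [0.0120 0.0036; 0.0036 0.0211]
LR = -2*(-103.3706 - (-52.7411));     % 101.2590
Wald = [1, 0] * In * [1; 0];          % 88
s  = [158; -21];                      % score under H0
V0 = [0.0043, 0.0013; 0.0013, 0.0204];
LM = s' * V0 * s;                     % close to 107.3763 (rounding in V0)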