
Econometrics - Exercise set 2

An asterisk (*) marks exercises to be solved in class. Try to solve the remaining ones at home.

Exercise 1
Recall that in exercise 5 of exercise set 1 we had Model 1,

𝑤𝑎𝑔𝑒 = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝛽2 𝑒𝑥𝑝𝑒𝑟 + 𝛽3 𝑡𝑒𝑛𝑢𝑟𝑒 + 𝜀𝑖

We now propose a Model 2,

𝑙𝑜𝑔(𝑤𝑎𝑔𝑒) = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝛽2 𝑒𝑥𝑝𝑒𝑟 + 𝛽3 𝑡𝑒𝑛𝑢𝑟𝑒 + 𝜀𝑖

Using Model 2 and the dataset wage1.RData, answer the following questions:

1.1
What is the difference between Model 1 and Model 2?

Models 1 and 2 differ in the specification of the outcome variable: Model 1 uses wage in levels, while Model 2 uses the natural logarithm of wage.

1.2
Explain why log(wage) should give a better-fitting model

Income distributions are notoriously right-skewed. Taking the natural logarithm of wage brings the distribution closer to a normal distribution and also dampens the influence of outliers.

One should note that when log-transforming the outcome variable, the interpretation of the coefficients changes. In a log-level regression the interpretation of a coefficient βi is: "a one-unit change in xi leads to approximately a βi × 100% change in y".
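
As a quick illustration, the log-level model can be estimated as below (a minimal sketch; it assumes wage1.RData loads a data frame named wage1 containing the variables used above):

    # Estimate Model 2 (log-level specification) on the wage data
    load("wage1.RData")  # assumed to provide a data frame called `wage1`
    model2 <- lm(log(wage) ~ educ + exper + tenure, data = wage1)
    summary(model2)
    # E.g., a coefficient of 0.09 on educ would mean roughly 9% higher
    # wages per additional year of education.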

1.3
Normally, we assume that the explanatory variables are exogenous.

What conditions would be necessary if the model were to be endogenous?

For the explanatory variables to be endogenous, the assumption of strict exogeneity would have to be violated. Hence,

E(ε|X) ≠ 0

A violation of the strict exogeneity assumption could occur if we have omitted an otherwise relevant variable from our model. In that case, the error term picks up the effect of the omitted variable, and a non-zero correlation between the error term and the explanatory variables can arise. In the wage example, unobserved ability is a classic candidate: it likely affects wage and is correlated with educ, which would make educ endogenous.

Page 2 of 12
Exercise 2
This exercise is an example of a Monte Carlo simulation.

2.1*
Only exercise 2.1 is meant for class. Try to do 2.2 at home.

Using 100 iterations, simulate the following data-generating process (DGP) and estimate the equations below. Estimate eq. (1) and (2) using the lm() function, and try estimating eq. (2) and (3) using the nls() function.

(Hint: create a new DGP for every iteration, i.e., include the DGP inside the for-loop.) Set the seed to 123 for comparability in class.

𝑥𝑖 ∼ 𝑈[0; 20]

𝜀𝑖 ∼ 𝑁(0; 0.01)

𝑦𝑖 = 2 + √𝑥𝑖 + 𝜀𝑖

𝑛 = 100

U[a; b]: Uniform distribution, with n observations, from minimum level a to maximum level b.

N(μ; σ²): Normal distribution, with n observations, mean μ and variance σ². Note that R's rnorm() takes the standard deviation (not the variance) as input.

𝑦 = 𝛽0 + 𝛽1 𝑥 (1)

𝑦 = 𝛽0 + 𝛽1 √𝑥 (2)

y = β0 + β1 · x^β2 (3)
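
A minimal sketch of the simulation is given below (the start values passed to nls() are a reasonable choice of ours; the exercise does not prescribe them):

    set.seed(123)                                  # for comparability in class
    n <- 100; iters <- 100
    b1_lm1 <- b1_lm2 <- b1_nls2 <- b2_nls3 <- numeric(iters)

    for (i in 1:iters) {
      # New DGP in every iteration
      x <- runif(n, min = 0, max = 20)
      e <- rnorm(n, mean = 0, sd = sqrt(0.01))     # rnorm() takes the std. dev.
      y <- 2 + sqrt(x) + e

      fit1  <- lm(y ~ x)                                                      # eq. (1)
      fit2  <- lm(y ~ sqrt(x))                                                # eq. (2)
      fit2n <- nls(y ~ b0 + b1 * sqrt(x), start = list(b0 = 1, b1 = 1))       # eq. (2)
      fit3n <- nls(y ~ b0 + b1 * x^b2, start = list(b0 = 1, b1 = 1, b2 = 1))  # eq. (3)

      b1_lm1[i]  <- coef(fit1)[2]
      b1_lm2[i]  <- coef(fit2)[2]
      b1_nls2[i] <- coef(fit2n)["b1"]
      b2_nls3[i] <- coef(fit3n)["b2"]
    }

    # Average estimates across iterations
    colMeans(cbind(b1_lm1, b1_lm2, b1_nls2, b2_nls3))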

2.1.1

Do your estimates yield unbiased estimates of 𝜷𝟏 and 𝜷𝟐 ? Are there any surprising results in this part?

Model 1 does not yield an unbiased estimate of β1, as it converges to b1 = 0.179. Model 2 (OLS) and models 2 and 3 (NLS) all provide unbiased estimates of β1. These results are not surprising, as Model 1 is misspecified. Models 2 and 3 give identical results, although model 3 could prove the most robust route to consistent estimates, as the exponent is estimated rather than imposed.

Page 3 of 12
2.1.2
Determine the 3 × 1 vector of gradients, g = ∂f/∂β, of equation 3

Taking the partial derivative of equation 3 w.r.t. 𝛽0 , 𝛽1 and 𝛽2 respectively yields:

g = ∂y/∂β = ( 1, x^β2, β1 · x^β2 · ln(x) )′
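
This can be verified in R with the symbolic differentiation function deriv() (a quick sketch):

    # Symbolic gradient of eq. (3) w.r.t. (b0, b1, b2)
    deriv(~ b0 + b1 * x^b2, c("b0", "b1", "b2"))
    # The printed gradient contains 1, x^b2 and b1 * x^b2 * log(x),
    # matching the derivation above.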

2.2

2.2.1

Generate a single dataset, and estimate eq. (3) using the following start values, 𝜷 = [𝟎, 𝟏, 𝟎]′. The
algorithm breaks down - why?

Inserting β = [0, 1, 0]′ into our vector of gradients, g:

g = ∂y/∂β = ( 1, x^0 = 1, 1 · x^0 · ln(x) = ln(x) )′

The first two columns of the gradient (regressor) matrix are then both columns of ones. According to our seventh assumption, this matrix must have full rank, i.e., no column may be a linear combination of the others, so the algorithm breaks down at these start values.
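
To see the breakdown in R (a sketch generating the single dataset; at these start values nls() typically stops with a "singular gradient" error):

    set.seed(123)
    x <- runif(100, min = 0, max = 20)
    y <- 2 + sqrt(x) + rnorm(100, sd = 0.1)

    # The gradient columns for b0 and b1 are both columns of ones here,
    # so the Jacobian is singular and the algorithm cannot take a step.
    nls(y ~ b0 + b1 * x^b2, start = list(b0 = 0, b1 = 1, b2 = 0))
    # Error: singular gradient matrix at initial parameter estimates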

2.2.2

Using the same data set, estimate eq. (3) using β = [0, 1, 1]′ as start values and perform an F-test and an LM-test of the hypothesis that β2 = 1/2. What do we conclude? And is this result surprising?

F-test

Running the F-test via the linearHypothesis() function in R (from the car package), we find a p-value of 0.90. Hence, we fail to reject H0: β2 = 1/2.

LM-test

Computation of the LM-test as described in Heij et al. (2004):

Step 1: Estimate the restricted model. That is, equation (3) with 𝛽2 = 0.5 imposed.

Step 2: Auxiliary regression. Regress the residuals from step 1 on the n × k matrix of first-order derivatives. The first-order derivatives are:

g = ∂y/∂β = ( 1, x^0.5, x^0.5 · ln(x) )′

Step 3: LM = nR² of the regression in step 2. Asymptotically, the LM-statistic follows the χ²(g) distribution, where g is the number of restrictions under the null hypothesis (in our case g = 1).

Running our LM-test reveals a p-value of 0.90, and we again fail to reject H0: β2 = 1/2.

The conclusions are not surprising, since the data were in fact generated with β2 = 0.5.
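
A sketch of both tests in R (it reuses x and y from the single dataset above; linearHypothesis() comes from the car package):

    library(car)

    # Unrestricted fit of eq. (3) with start values beta = (0, 1, 1)'
    fit <- nls(y ~ b0 + b1 * x^b2, start = list(b0 = 0, b1 = 1, b2 = 1))

    # F-test of H0: b2 = 0.5
    linearHypothesis(fit, "b2 = 0.5", test = "F")

    # LM-test following the three steps above
    fit_r <- lm(y ~ sqrt(x))                                       # step 1: restricted model
    aux   <- lm(residuals(fit_r) ~ sqrt(x) + I(sqrt(x) * log(x)))  # step 2: auxiliary regression
    LM    <- length(x) * summary(aux)$r.squared                    # step 3: LM = n * R^2
    pchisq(LM, df = 1, lower.tail = FALSE)                         # asymptotic p-value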

Exercise 3
Answer the following questions about Maximum Likelihood.

3.1
State the likelihood function and give a brief interpretation

The likelihood function is defined in equation (4.25) in Heij et al. (2004).

𝐿(𝜃) = 𝑝(𝑦, 𝑋, 𝜃) (4.25)

The likelihood function measures the probability of observing the data (y, X) for different values of θ. p(y, X, θ) is a probability density for (y, X), and for independent observations the joint density is the product of the marginal densities of each yi:

L(θ) = ∏_{i=1}^n p(y_i, x_i, θ)

Page 5 of 12
Naturally, we want to choose the parameter values such that the probability of observing the data (y, X) is large.

The maximum likelihood estimator is the value of θ that maximizes the function L(θ).

3.2
State the log-likelihood function and give a brief interpretation
l(θ) = log(L(θ)) = Σ_{i=1}^n log(p(y_i, x_i, θ)) = Σ_{i=1}^n l_i(θ)

The log-likelihood function is usually preferred over the likelihood function as it is easier to optimize. The maximum of the likelihood and of the log-likelihood is attained at the same value of θ (because the logarithm is a monotonically increasing transformation).

3.3
Give a brief explanation of the paragraph ML in the linear model on page 227 in Heij (try to do the
calculations). For simplicity use the univariate normal distribution (1.20) on p. 29

For a model following the Gauss-Markov assumptions:

y = Xβ + ε,  ε ∼ N(0, σ²I) (4.29)

As the error term is normally distributed, we have y ∼ N(Xβ, σ²I). Here E(y) = Xβ, since E(y) = Xβ + E(ε) = Xβ. Maximizing the log-likelihood gives the following solutions:

b_ML = (X′X)⁻¹ X′y = b (4.31)

s²_ML = (1/n) · (y − Xb)′(y − Xb) = ((n − k)/n) · s² (4.34)

where s² is the unbiased least squares estimator of σ².

Specify the likelihood distribution

We know from equation (1.20) that the density function of the normal distribution is:

f(v) = (1 / (σ√(2π))) · e^(−(1/(2σ²)) · (v − μ)²)

In our case:

• 𝑣 = 𝑦 - the outcome
• 𝜇 = 𝑋𝛽 - mean
• (𝑦 − 𝑋𝛽)2 = (𝑦 − 𝑋𝛽)′ (𝑦 − 𝑋𝛽) = 𝑒 ′ 𝑒

Thus:

f(v) = (1 / (σ√(2π))) · e^(−(1/(2σ²)) · (y − Xβ)′(y − Xβ))

f(v) = (1 / √(2πσ²)) · e^(−(1/(2σ²)) · (y − Xβ)′(y − Xβ))

We apply the likelihood function:

L(θ) = ∏_{i=1}^n p(y_i, x_i, θ)

The n identical factors of (2πσ²)^(−1/2) multiply to ∏_{i=1}^n (2πσ²)^(−1/2) = (2πσ²)^(−n/2), while the exponents add up to the full quadratic form. Hence:

L(θ) = (1 / (2πσ²)^(n/2)) · e^(−(1/(2σ²)) · (y − Xβ)′(y − Xβ))

Take the logarithm:

log(L(θ)) = l(θ) = log( (1 / (2πσ²)^(n/2)) · e^(−(1/(2σ²)) · (y − Xβ)′(y − Xβ)) )

Use the log-product rule: log(AB) = log(A) + log(B)

l(θ) = log( 1 / (2πσ²)^(n/2) ) + log( e^(−(1/(2σ²)) · (y − Xβ)′(y − Xβ)) )

Let us first handle the expression log( 1 / (2πσ²)^(n/2) ). Using the quotient rule log(A/B) = log(A) − log(B), together with log(1) = 0 and log(A^x) = x · log(A):

log( 1 / (2πσ²)^(n/2) ) = log(1) − log( (2πσ²)^(n/2) ) = −(n/2) · log(2πσ²)

Again, by the log-product rule:

−(n/2) · log(2πσ²) = −(n/2) · log(2π) − (n/2) · log(σ²)

Let us now handle the second expression. Remembering that log(e^A) = A, we have:

log( e^(−(1/(2σ²)) · (y − Xβ)′(y − Xβ)) ) = −(1/(2σ²)) · (y − Xβ)′(y − Xβ)

Our final expression for the log-likelihood function:

l(θ) = −(n/2) · log(2π) − (n/2) · log(σ²) − (1/(2σ²)) · (y − Xβ)′(y − Xβ)

Maximum likelihood estimation

The maximum likelihood estimators of β and σ² are found from the first-order conditions (FOCs) with respect to these parameters. Let us start with the FOC w.r.t. β.

∂l/∂β = ∂/∂β ( −(1/(2σ²)) · (y − Xβ)′(y − Xβ) ) = 0

∂l/∂β = −(1/(2σ²)) · 2 · (−X′)(y − Xβ) = 0

      = (1/(2σ²)) · 2X′(y − Xβ) = 0

      = (1/σ²) · X′(y − Xβ) = 0

Multiply by σ²:

X′y − X′Xβ = 0  ⇔  X′y = X′Xβ

Multiply by (X′X)⁻¹:

β = (X′X)⁻¹ · X′y

Finally, we want to find an expression for σ²:

∂l/∂σ² = ∂/∂σ² ( −(n/2) · log(2π) − (n/2) · log(σ²) − (1/(2σ²)) · (y − Xβ)′(y − Xβ) )

       = −n/(2σ²) − (1/2) · (y − Xβ)′(y − Xβ) · (−1/σ⁴) = 0

       = −n/(2σ²) + (1/(2σ⁴)) · (y − Xβ)′(y − Xβ) = 0

Simplify by adding n/(2σ²) on both sides and multiplying by 2σ⁴:

(y − Xβ)′(y − Xβ) = (n/(2σ²)) · 2σ⁴ = nσ²

Divide by n:

σ²_ML = (1/n) · (y − Xβ)′(y − Xβ) = e′e / n

From p. 128 in Heij et al. (2004) we know that the unbiased least squares estimator of σ² is:

s² = e′e / (n − k)

Thus, relating s²_ML to s²:

s²_ML = e′e / n = (e′e / (n − k)) · ((n − k)/n) = ((n − k)/n) · s²

When n → ∞, (n − k)/n → 1 and s²_ML → s², i.e. the ML estimator of σ² is biased in finite samples but asymptotically unbiased.
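
A quick numerical check of these closed-form relations in R (a minimal sketch on simulated data):

    set.seed(123)
    n <- 200; k <- 2
    X <- cbind(1, rnorm(n))                 # intercept plus one regressor
    y <- X %*% c(1, 2) + rnorm(n)

    b <- solve(t(X) %*% X, t(X) %*% y)      # b_ML = OLS estimator
    e <- y - X %*% b
    s2    <- sum(e^2) / (n - k)             # unbiased estimator s^2
    s2_ml <- sum(e^2) / n                   # ML estimator
    all.equal(s2_ml, (n - k) / n * s2)      # TRUE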

Exercise 4
Use the dataset set24.RData on stock market returns for the sector of non-cyclical consumer goods.

Consider the following models,

𝑟𝑒𝑛𝑑𝑛𝑐𝑐𝑜 = 𝛽0 + 𝛽1 𝑟𝑒𝑛𝑑𝑚𝑎𝑟𝑘 + 𝜀, 𝜀 ∼ 𝑁(0, 𝜎 2 )

4.1
Estimate the model by Maximum Likelihood. Compare the results to normal OLS estimates

The point estimates are identical. This is not surprising, as under normally distributed errors the ML estimator of β coincides with OLS. Moreover, the standard errors, t/z-values, and p-values are almost identical.

(R regression output for the OLS and ML estimates omitted.)
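
A sketch of both estimations (it assumes set24.RData provides the variables rendncco and rendmark; the ML fit maximizes the normal log-likelihood from Exercise 3 numerically with optim()):

    load("set24.RData")

    # OLS
    ols <- lm(rendncco ~ rendmark)
    summary(ols)

    # ML: minimize the negative normal log-likelihood
    negll <- function(par) {
      e  <- rendncco - par[1] - par[2] * rendmark
      s2 <- exp(par[3])                      # log-variance keeps sigma^2 > 0
      0.5 * length(e) * log(2 * pi * s2) + sum(e^2) / (2 * s2)
    }
    mle <- optim(c(0, 1, 0), negll, hessian = TRUE)
    mle$par[1:2]                             # beta estimates, identical to coef(ols)
    sqrt(diag(solve(mle$hessian)))[1:2]      # asymptotic standard errors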

4.2
Create a density plot of the residuals from each model
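
A sketch for the plots (reusing the ols and mle objects above; the resulting figures are not reproduced here):

    res_ols <- residuals(ols)
    res_mle <- rendncco - mle$par[1] - mle$par[2] * rendmark

    # Overlay the two residual densities
    plot(density(res_ols), main = "Density of residuals", xlab = "Residual")
    lines(density(res_mle), lty = 2)
    legend("topright", legend = c("OLS", "ML"), lty = c(1, 2))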

