Econometrics - Exercise set 2 (solution)
Exercise 1
Recall Model 1 from exercise 5 of exercise set 1.
Using Model 2 and the dataset wage1.RData, answer the following questions:
1.1
What is the difference between Model 1 and Model 2?
Models 1 and 2 differ in the specification of the outcome variable: Model 1 uses wage in levels, while Model 2 uses the natural logarithm of wage.
1.2
Explain why log(wage) should give a better-fitting result
Income distributions are notorious for being right-skewed. Taking the natural logarithm of wage brings the distribution closer to normal and also reduces the influence of outliers.
One should note that when log-transforming the outcome variable the interpretation of the coefficients changes. Namely, in a log-level regression the interpretation of a coefficient βᵢ is: "a one-unit change in xᵢ leads to approximately a βᵢ × 100% change in y" (the approximation is accurate for small βᵢ).
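A minimal sketch of estimating Model 2 in R. The exact regressors of Model 2 are not shown here, so educ, exper, and tenure are assumed for illustration, as is the name of the data frame produced by loading the file:

# Load the data; the data frame is assumed to be called wage1.
load("wage1.RData")

# Model 2: log-level wage regression (assumed specification).
model2 <- lm(log(wage) ~ educ + exper + tenure, data = wage1)
summary(model2)

For example, a coefficient of 0.09 on educ would mean that one additional year of education is associated with approximately a 9% higher wage.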
1.3
Normally, we assume that the explanatory variables are exogenous.
For our model to be endogenous, the assumption of strict exogeneity would have to be violated, i.e.,
E(ε|X) ≠ 0
A violation of the strict exogeneity assumption could occur if we have omitted an otherwise relevant variable from our model. In that case the error term picks up the effect of the omitted variable, and a non-zero correlation between the error term and the explanatory variables becomes possible. A classic example in a wage regression is unobserved ability, which affects wages directly and is correlated with education.
Exercise 2
This exercise is an example of a Monte Carlo simulation.
2.1*
Only exercise 2.1 is meant for class. Try to do 2.2 at home.
Using 100 iterations, simulate the following data generating process and estimate the equations. Estimate eq. (1) and (2) using the lm() function, and try estimating eq. (2) and (3) using the nls() function.
(Hint: create a new DGP for every iteration, i.e., include the DGP in the for-loop.) Set the seed to 123 for comparability in class.
𝑥𝑖 ∼ 𝑈[0; 20]
𝜀𝑖 ∼ 𝑁(0; 0.01)
𝑦𝑖 = 2 + √𝑥𝑖 + 𝜀𝑖
𝑛 = 100
U[a; b]: uniform distribution, with n observations, from minimum level a to maximum level b.
N(μ; σ²): normal distribution with n observations, mean μ and variance σ². Note that R's rnorm() takes the standard deviation, not the variance, as input.
𝑦 = 𝛽0 + 𝛽1 𝑥 (1)
𝑦 = 𝛽0 + 𝛽1 √𝑥 (2)
𝑦 = 𝛽0 + 𝛽1 𝑥 𝛽2 (3)
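A minimal sketch of the simulation in R; the nls() start values are an assumption, since they are not specified for 2.1:

# Monte Carlo simulation: redraw the DGP in every iteration and
# estimate eq. (1) and (2) by lm() and eq. (3) by nls().
set.seed(123)
R <- 100                                       # iterations
n <- 100                                       # observations per draw
b1_eq1 <- b1_eq2 <- b1_eq3 <- b2_eq3 <- numeric(R)

for (r in 1:R) {
  x   <- runif(n, min = 0, max = 20)
  eps <- rnorm(n, mean = 0, sd = sqrt(0.01))   # rnorm() takes the sd
  y   <- 2 + sqrt(x) + eps

  b1_eq1[r] <- coef(lm(y ~ x))[2]              # eq. (1)
  b1_eq2[r] <- coef(lm(y ~ sqrt(x)))[2]        # eq. (2)
  fit3 <- nls(y ~ b0 + b1 * x^b2,              # eq. (3)
              start = list(b0 = 1, b1 = 1, b2 = 1))
  b1_eq3[r] <- coef(fit3)["b1"]
  b2_eq3[r] <- coef(fit3)["b2"]
}

colMeans(cbind(b1_eq1, b1_eq2, b1_eq3, b2_eq3))

Estimating eq. (2) with nls() is analogous, e.g. nls(y ~ b0 + b1 * sqrt(x), start = list(b0 = 1, b1 = 1)).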
2.1.1
Do your estimates yield unbiased estimates of 𝜷𝟏 and 𝜷𝟐 ? Are there any surprising results in this part?
Model 1 does not yield unbiased estimates of β₁, as its slope estimate converges to b₁ = 0.179. Model 2 (OLS) and Models 2 and 3 (NLS) all provide unbiased estimates of β₁. These results are not surprising, as Model 1 is not specified accurately. Models 2 and 3 are identical here, although Model 3 could prove the more robust route to consistent estimates, since it does not impose a functional form.
2.1.2
Determine the 3 × 1 vector of gradients, g = ∂f/∂β, of equation (3)
g = ∂y/∂β = (1, x^β₂, β₁ · x^β₂ · ln(x))′
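The gradient can be checked symbolically with base R's deriv(); a quick sketch, with an arbitrary evaluation point:

# Symbolic gradient of eq. (3) with respect to (b0, b1, b2).
g <- deriv(~ b0 + b1 * x^b2, c("b0", "b1", "b2"),
           function.arg = c("x", "b0", "b1", "b2"))
attr(g(x = 4, b0 = 2, b1 = 1, b2 = 0.5), "gradient")
# The columns correspond to 1, x^b2 and b1 * x^b2 * log(x).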
2.2
2.2.1
Generate a single dataset, and estimate eq. (3) using the following start values, 𝜷 = [𝟎, 𝟏, 𝟎]′. The
algorithm breaks down - why?
g = ∂y/∂β = (1, x⁰ = 1, 1 · x⁰ · ln(x) = ln(x))′
With β₂ = 0, the gradient columns for the intercept and for x^β₂ are both identically equal to 1. According to our seventh assumption the regressor (gradient) matrix must have full rank; with two identical columns it does not, and the algorithm breaks down.
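A sketch reproducing the breakdown on a single simulated dataset:

# With start values (0, 1, 0)', the gradient columns for the intercept
# and for x^b2 are both identically 1, so nls() stops at the first step.
set.seed(123)
x <- runif(100, min = 0, max = 20)
y <- 2 + sqrt(x) + rnorm(100, sd = 0.1)

try(nls(y ~ b0 + b1 * x^b2, start = list(b0 = 0, b1 = 1, b2 = 0)))
# Expected: an error along the lines of "singular gradient matrix at
# initial parameter estimates".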
2.2.2
Using the same dataset, estimate eq. (3) using β = [0, 1, 1]′ as start values and perform an F- and LM-test of the hypothesis that β₂ = 1/2. What do we conclude? And is this result surprising?
F-test
Running our F-test via the linearHypothesis function in R, we find a p-value of 0.90. Hence, we fail to reject H₀: β₂ = 1/2.
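A sketch of the test, assuming x and y from the single dataset generated in 2.2.1; note that for a nonlinear model linearHypothesis may report a Wald chi-square statistic rather than an exact F statistic:

library(car)

fit3 <- nls(y ~ b0 + b1 * x^b2, start = list(b0 = 0, b1 = 1, b2 = 1))
linearHypothesis(fit3, "b2 = 0.5")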
LM-test
Step 1: Estimate the restricted model. That is, equation (3) with 𝛽2 = 0.5 imposed.
Step 2: Auxiliary regression. Regress the residuals from step 1 onto the 𝑛 × 𝑘 matrix of first order derivatives.
The first order derivatives are:
g = ∂y/∂β = (1, x^0.5, x^0.5 · ln(x))′
(The third element omits the constant factor β₁ from the general gradient; rescaling a regressor column by a constant does not change the R² of the auxiliary regression.)
Step 3: LM = nR² of the regression in step 2. Asymptotically, the LM-statistic follows the χ²(g) distribution, where g is the number of restrictions under the null hypothesis (in our case g = 1).
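A minimal sketch of these three steps in R, again assuming x and y from the dataset generated in 2.2.1:

n <- length(y)

# Step 1: restricted model with b2 = 0.5 imposed (linear in b0 and b1).
restr <- lm(y ~ I(x^0.5))
e_r   <- residuals(restr)

# Step 2: auxiliary regression of the restricted residuals on the
# gradient columns (the intercept supplies the constant column).
aux <- lm(e_r ~ I(x^0.5) + I(x^0.5 * log(x)))

# Step 3: LM = n * R^2, asymptotically chi-square with g = 1 d.f.
LM <- n * summary(aux)$r.squared
pchisq(LM, df = 1, lower.tail = FALSE)   # p-value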
Running our LM-test reveals a p-value of 0.90, and we fail to reject H₀: β₂ = 1/2.
The conclusions are not surprising, since we did in fact specify our model to have 𝛽2 = 0.5.
Exercise 3
Answer the following questions about Maximum Likelihood.
3.1
State the likelihood function and give a brief interpretation
The likelihood function measures the probability of observing the data (𝑦, 𝑋) for different values of 𝜃.
p(y, X, θ) is a probability density for (y, X) and, assuming independent observations, the joint density is the product of the densities of each yᵢ:
L(θ) = ∏ⁿᵢ₌₁ p(yᵢ, xᵢ, θ)
Naturally, we want to choose the parameter values such that the probability of observing the data (𝑦, 𝑋) is
large.
The maximum likelihood estimator is obtained at the parameter values θ that maximize the function L(θ).
3.2
State the log-likelihood function and give a brief interpretation
log L(θ) = ℓ(θ) = Σⁿᵢ₌₁ log p(yᵢ, xᵢ, θ)
The log-likelihood function is usually preferred over the likelihood function as it is easier to optimize. The
maximum of the likelihood and log-likelihood functions are obtained for the same values of 𝜃 (because the
logarithm is a monotonically increasing transformation).
3.3
Give a brief explanation of the paragraph ML in the linear model on page 227 in Heij (try to do the
calculations). For simplicity use the univariate normal distribution (1.20) on p. 29
y = Xβ + ε,  ε ∼ N(0, σ²I)  (4.29)
As the error term is normally distributed, we have y ∼ N(Xβ, σ²I); the mean is Xβ since E(y) = E(Xβ) + E(ε) = Xβ. Maximizing the log-likelihood gives the following solutions:
b_ML = (X′X)⁻¹X′y = b  (4.31)
s²_ML = (1/n)(y − Xb)′(y − Xb) = ((n − k)/n) · s²  (4.34)
We know from equation (1.20) that the density function of the normal distribution is:
f(v) = (1/(σ√(2π))) · exp(−(v − μ)²/(2σ²))
In our case:
• v = y, the outcome
• μ = Xβ, the mean
• (v − μ)² becomes (y − Xβ)′(y − Xβ) in vector form (equal to e′e when evaluated at the estimates)
Thus:
f(v) = (1/(σ√(2π))) · exp(−(y − Xβ)′(y − Xβ)/(2σ²))
f(v) = (1/√(2πσ²)) · exp(−(y − Xβ)′(y − Xβ)/(2σ²))
L(θ) = ∏ⁿᵢ₌₁ p(yᵢ, xᵢ, θ) = (1/∏ⁿᵢ₌₁ √(2πσ²)) · exp(−(y − Xβ)′(y − Xβ)/(2σ²))
L(θ) = (1/(2πσ²)^(n/2)) · exp(−(y − Xβ)′(y − Xβ)/(2σ²))
log(L(θ)) = ℓ(θ) = log((1/(2πσ²)^(n/2)) · exp(−(y − Xβ)′(y − Xβ)/(2σ²)))
ℓ(θ) = log(1/(2πσ²)^(n/2)) + log(exp(−(y − Xβ)′(y − Xβ)/(2σ²)))
Let us first handle the expression log(1/(2πσ²)^(n/2)). Using the quotient rule log(A/B) = log(A) − log(B):
log(1/(2πσ²)^(n/2)) = log(1) − log((2πσ²)^(n/2)) = 0 − (n/2) · log(2πσ²) = −(n/2) · log(2π) − (n/2) · log(σ²)
The second expression simplifies via log(exp(z)) = z, giving −(y − Xβ)′(y − Xβ)/(2σ²). Combining the two terms:
ℓ(θ) = −(n/2) · log(2π) − (n/2) · log(σ²) − (1/(2σ²)) · (y − Xβ)′(y − Xβ)
The maximum likelihood estimators of β and σ² are found from the first-order conditions (FOC) w.r.t. these parameters. Let us start with the FOC w.r.t. β.
∂ℓ/∂β = ∂/∂β [−(1/(2σ²)) · (y − Xβ)′(y − Xβ)] = 0
∂ℓ/∂β = −(1/(2σ²)) · 2 · (−X′)(y − Xβ) = (1/σ²) · X′(y − Xβ) = 0
Multiply by σ²:
X′y − X′Xβ = 0 ⇔ X′y = X′Xβ
Pre-multiply by (X′X)⁻¹:
β_ML = (X′X)⁻¹X′y
The FOC w.r.t. σ²:
∂ℓ/∂σ² = ∂/∂σ² [−(n/2) · log(2π) − (n/2) · log(σ²) − (1/(2σ²)) · (y − Xβ)′(y − Xβ)]
∂ℓ/∂σ² = −n/(2σ²) − (1/2) · (y − Xβ)′(y − Xβ) · (−1/σ⁴) = 0
−n/(2σ²) + (1/(2σ⁴)) · (y − Xβ)′(y − Xβ) = 0
Simplify by adding n/(2σ²) to both sides and multiplying by 2σ⁴:
(y − Xβ)′(y − Xβ) = (n/(2σ²)) · 2σ⁴ = nσ²
Divide by n:
σ²_ML = (1/n) · (y − Xβ)′(y − Xβ) = e′e/n
From p. 128 in Heij et al (2004) we know that the unbiased least squares estimator of 𝜎 2 is:
s² = e′e/(n − k)
Thus s²_ML relates to s² as:
s² · ((n − k)/n) = (e′e/(n − k)) · ((n − k)/n) = e′e/n = s²_ML
And:
s²_ML = ((n − k)/n) · s²
As n → ∞, (n − k)/n → 1, so s²_ML → s²; the ML estimator of σ² is biased in finite samples but asymptotically unbiased.
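A numeric sketch of these results on simulated data, maximizing the log-likelihood with optim() and comparing with lm():

set.seed(123)
n <- 200; k <- 2
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
X <- cbind(1, x)

# Negative log-likelihood; the variance is parameterized on the log
# scale to keep sigma^2 positive during optimization.
negll <- function(par) {
  beta   <- par[1:2]
  sigma2 <- exp(par[3])
  0.5 * n * log(2 * pi * sigma2) + sum((y - X %*% beta)^2) / (2 * sigma2)
}

ml  <- optim(c(0, 0, 0), negll, method = "BFGS")
ols <- lm(y ~ x)

coef(ols); ml$par[1:2]                               # b_ML equals b_OLS
c(s2_ML     = exp(ml$par[3]),
  s2_scaled = (n - k) / n * summary(ols)$sigma^2)    # s2_ML = ((n-k)/n) s^2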
Exercise 4
Use the dataset set24.RData on stock market returns for the sector of non-cyclical consumer goods.
4.1
Estimate the model by Maximum Likelihood. Compare the results to normal OLS estimates
The coefficient estimates are identical. This is not surprising: as derived in Exercise 3, the ML estimator of β coincides with the OLS estimator when the errors are normally distributed. Moreover, the standard errors, t/z-values, and p-values are almost identical.
[Regression output tables omitted: OLS and MLE]
4.2
Create a density plot of the residuals from each model
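A sketch of the plots; the variable names in set24.RData are not shown here, so a generic regression of returns on a market factor with hypothetical names is assumed:

# Hypothetical data frame and variable names for illustration.
load("set24.RData")
ols <- lm(returns ~ market, data = set24)

res_ols <- residuals(ols)
res_mle <- res_ols   # ML and OLS coefficient estimates coincide here,
                     # so the two residual series are identical

plot(density(res_ols), main = "Density of residuals", lwd = 2)
lines(density(res_mle), col = "red", lty = 2)
legend("topright", legend = c("OLS", "MLE"),
       col = c("black", "red"), lty = c(1, 2))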