Tutorial-5
2024-12-01
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
library(fixest)
library(ivreg)
library(gmm)
library(flextable)
library(readxl)
library(tidyverse)
library(stargazer)
library(car)
library(lmtest)
library(modelsummary)
A. Summary statistics
summary(mroz$wage)
summary(mroz$educ)
B. OLS estimates of a model for log(wages), controlling for education, experience and experience squared
# ols <- lm(log(wage) ~ educ + exper + I(exper^2), data=mroz)
# Problem: wages are zero for some individuals in the sample, and therefore log(wage) is -infinity
# you need to construct a new dataset selecting only individuals with positive wages
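A minimal sketch of the commands implied by the comments above and consistent with the output that follows (it assumes the mroz data frame loaded earlier; the names mroz2, ols and strong1 are taken from later references, and other details of the original chunk may differ):

mroz2 <- subset(mroz, wage > 0)  # keep only individuals with positive wages
ols <- lm(log(wage) ~ educ + exper + I(exper^2), data = mroz2)
summary(ols)

C. Test for weak instruments

The first-stage regression for the single instrument mothereduc, which produces the strong1 object summarised below:

strong1 <- lm(educ ~ exper + I(exper^2) + mothereduc, data = mroz2)
summary(strong1)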
##
## Call:
## lm(formula = educ ~ exper + I(exper^2) + mothereduc, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4423 -1.2963 -0.0837 1.1761 5.9870
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.775103 0.423889 23.061 <2e-16 ***
## exper 0.048862 0.041669 1.173 0.242
## I(exper^2) -0.001281 0.001245 -1.029 0.304
## mothereduc 0.267691 0.031130 8.599 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.111 on 424 degrees of freedom
## Multiple R-squared: 0.1527, Adjusted R-squared: 0.1467
## F-statistic: 25.47 on 3 and 424 DF, p-value: 3.617e-15
anova(strong1)
The first step in testing for weak instruments is to estimate the first-stage regression, where the endogenous variable is regressed on the instrument(s) and any exogenous control variables. Use summary() to examine the coefficients of the instrument(s): significant coefficients (small p-values) suggest that the instrument is relevant. The key test for weak instruments is the F-statistic for the excluded instrument(s) in the first-stage regression: as a rule of thumb, if F < 10 the instrument is considered weak; if F ≥ 10 it is considered strong enough.
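With a single instrument, the excluded-instrument F-statistic can be obtained directly from the first-stage fit; a sketch using linearHypothesis() from the car package (loaded above) applied to the strong1 object:

# F-test that the excluded instrument has a zero coefficient in the first stage
linearHypothesis(strong1, "mothereduc = 0")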
D. 2SLS (two-stage least squares) estimator
Obtain the 2SLS (two-stage least squares) estimator by performing all the steps manually:
#2SLS estimator
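A sketch of the two stages consistent with the output below (the names step1, educhat and step2 follow the fitted-value variable in the output and the comparison table in section G):

# Stage 1: regress the endogenous variable on the instrument and the exogenous regressors
step1 <- lm(educ ~ exper + I(exper^2) + mothereduc, data = mroz2)
summary(step1)
# Stage 2: replace educ by its first-stage fitted values
mroz2$educhat <- fitted(step1)
step2 <- lm(log(wage) ~ exper + I(exper^2) + educhat, data = mroz2)
summary(step2)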
##
## Call:
## lm(formula = educ ~ exper + I(exper^2) + mothereduc, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4423 -1.2963 -0.0837 1.1761 5.9870
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.775103 0.423889 23.061 <2e-16 ***
## exper 0.048862 0.041669 1.173 0.242
## I(exper^2) -0.001281 0.001245 -1.029 0.304
## mothereduc 0.267691 0.031130 8.599 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.111 on 424 degrees of freedom
## Multiple R-squared: 0.1527, Adjusted R-squared: 0.1467
## F-statistic: 25.47 on 3 and 424 DF, p-value: 3.617e-15
##
## Call:
## lm(formula = log(wage) ~ exper + I(exper^2) + educhat, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.15299 -0.34773 0.02906 0.39023 2.35624
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1981861 0.4933427 0.402 0.68809
## exper 0.0448558 0.0141644 3.167 0.00165 **
## I(exper^2) -0.0009221 0.0004240 -2.175 0.03019 *
## educhat 0.0492630 0.0390562 1.261 0.20788
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.709 on 424 degrees of freedom
## Multiple R-squared: 0.04559, Adjusted R-squared: 0.03884
## F-statistic: 6.751 on 3 and 424 DF, p-value: 0.0001861
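The same 2SLS estimates can be obtained in one step with ivreg(); a sketch consistent with the output below (the name iv2 follows the comparison table in section G):

iv2 <- ivreg(log(wage) ~ educ + exper + I(exper^2) | mothereduc + exper + I(exper^2),
             data = mroz2)
summary(iv2)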
##
## Call:
## ivreg(formula = log(wage) ~ educ + exper + I(exper^2) | mothereduc +
## exper + I(exper^2), data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.10804 -0.32633 0.06024 0.36772 2.34351
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1981861 0.4728772 0.419 0.67535
## educ 0.0492630 0.0374360 1.316 0.18891
## exper 0.0448558 0.0135768 3.304 0.00103 **
## I(exper^2) -0.0009221 0.0004064 -2.269 0.02377 *
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 424 73.946 <2e-16 ***
## Wu-Hausman 1 423 2.968 0.0856 .
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6796 on 424 degrees of freedom
## Multiple R-Squared: 0.1231, Adjusted R-squared: 0.1169
## Wald test: 7.348 on 3 and 424 DF, p-value: 8.228e-05
(Note: x1 is the endogenous variable, z1 is the instrument, and x2 and x3 are exogenous variables.) The general syntax is
iv <- ivreg(y ~ x1 + x2 + x3 | z1 + x2 + x3, data = dataset)
Note the weak-instruments test reported as part of the summary output.
G. Compare the OLS and IV estimates
m_list <- list(OLS = ols, TWOSLS = iv2, TWOSLSmanually=step2)
msummary(m_list, stars=TRUE)
Note that the coefficients from 2SLS and the manual two-step procedure are the same, but the standard errors differ: the ivreg command computes the correct second-stage standard errors, so always use this command. Comparing the OLS and IV estimates: the estimated return to education is 4.93% with 2SLS, which is lower than the OLS estimate of 10.75%; this is consistent with a positive omitted-variable bias due to positive correlation between educ and the omitted factor. Note also that the (correct) 2SLS standard error on the instrumented regressor is quite high, and educ is no longer significant. So there is an efficiency loss with the IV regression, and at this point we would like to make sure that we really need IV estimation (i.e. that educ is endogenous).
H. GIVE estimates using gmm
mroz2$exper2 <- mroz2$exper^2
# instrument matrix: the excluded instrument mothereduc plus the exogenous regressors
mm <- cbind(mroz2$mothereduc, mroz2$exper, mroz2$exper2)
gmm <- gmm(log(wage) ~ educ + exper + I(exper^2),
x = mm, data = mroz2)
summary(gmm)
##
## Call:
## gmm(g = log(wage) ~ educ + exper + I(exper^2), x = mm, data = mroz2)
##
##
## Method: twoStep
##
## Kernel: Quadratic Spectral
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.19818608 0.56526349 0.35060831 0.72588222
## educ 0.04926295 0.04450138 1.10699836 0.26829464
## exper 0.04485585 0.01453878 3.08525506 0.00203378
## I(exper^2) -0.00092208 0.00040154 -2.29632397 0.02165736
##
## J-Test: degrees of freedom is 0
## J-test P-value
## Test E(g)=0: 1.20773227475269e-24 *******
With a single instrument the model is exactly identified, so the J-test has zero degrees of freedom and is not informative here.
I. Wu-Hausman test
# First-stage regression
step1 <- lm(educ ~ mothereduc + exper + I(exper^2), data = mroz2)
# here, the endogenous variable (educ) is regressed on the instrumental variable (mothereduc) and the other exogenous variables
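Second step of the Wu-Hausman (control function) test, consistent with the output below: the first-stage residuals, called v here as in the output, are added to the wage equation and their significance is tested.

# add the first-stage residuals to the structural equation
mroz2$v <- resid(step1)
summary(lm(log(wage) ~ educ + exper + I(exper^2) + v, data = mroz2))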
##
## Call:
## lm(formula = log(wage) ~ educ + exper + I(exper^2) + v, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.04565 -0.30313 0.04232 0.39785 2.34435
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1981861 0.4626315 0.428 0.668586
## educ 0.0492630 0.0366249 1.345 0.179324
## exper 0.0448558 0.0132827 3.377 0.000801 ***
## I(exper^2) -0.0009221 0.0003976 -2.319 0.020857 *
## v 0.0683815 0.0396903 1.723 0.085642 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6649 on 423 degrees of freedom
## Multiple R-squared: 0.1627, Adjusted R-squared: 0.1548
## F-statistic: 20.55 on 4 and 423 DF, p-value: 1.733e-15
The coefficient on v (the first-stage residuals) has a p-value of 0.086. This is below 0.10 but above the usual 0.05 threshold: we reject the null hypothesis of no endogeneity at the 10% level, but fail to reject it at the 5% level. So the result is at most weak evidence that educ is endogenous.
J. Two instruments: test for weak instruments
strong2 <- lm(educ ~ exper + exper2 + mothereduc + fathereduc, data=mroz2)
summary(strong2)
##
## Call:
## lm(formula = educ ~ exper + exper2 + mothereduc + fathereduc,
## data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.8057 -1.0520 -0.0371 1.0258 6.3787
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.102640 0.426561 21.340 < 2e-16 ***
## exper 0.045225 0.040251 1.124 0.262
## exper2 -0.001009 0.001203 -0.839 0.402
## mothereduc 0.157597 0.035894 4.391 1.43e-05 ***
## fathereduc 0.189548 0.033756 5.615 3.56e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.039 on 423 degrees of freedom
## Multiple R-squared: 0.2115, Adjusted R-squared: 0.204
## F-statistic: 28.36 on 4 and 423 DF, p-value: < 2.2e-16
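The joint relevance of the two instruments is tested with linearHypothesis() from the car package, which produces the output below:

linearHypothesis(strong2, c("mothereduc = 0", "fathereduc = 0"))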
##
## Linear hypothesis test:
## mothereduc = 0
## fathereduc = 0
##
## Model 1: restricted model
## Model 2: educ ~ exper + exper2 + mothereduc + fathereduc
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 425 2219.2
## 2 423 1758.6 2 460.64 55.4 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Both mothereduc and fathereduc are individually significant predictors of educ (based on their p-values), and the joint hypothesis test confirms that they are jointly significant: the F-statistic is 55.4 with a p-value < 2.2e-16, so we reject the null hypothesis that the coefficients on mothereduc and fathereduc are both zero. They are therefore relevant instruments for the instrumental variables (IV) regression.
K. Obtain the 2SLS estimates.
iv3 <- ivreg(log(wage) ~ educ + exper + exper2 | exper + exper2 +
mothereduc + fathereduc , data = mroz2)
summary(iv3)
##
## Call:
## ivreg(formula = log(wage) ~ educ + exper + exper2 | exper + exper2 +
## mothereduc + fathereduc, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0986 -0.3196 0.0551 0.3689 2.3493
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0481003 0.4003281 0.120 0.90442
## educ 0.0613966 0.0314367 1.953 0.05147 .
## exper 0.0441704 0.0134325 3.288 0.00109 **
## exper2 -0.0008990 0.0004017 -2.238 0.02574 *
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 423 55.400 <2e-16 ***
## Wu-Hausman 1 423 2.793 0.0954 .
## Sargan 1 NA 0.378 0.5386
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6747 on 424 degrees of freedom
## Multiple R-Squared: 0.1357, Adjusted R-squared: 0.1296
## Wald test: 8.141 on 3 and 424 DF, p-value: 2.787e-05
L. Wu-Hausman test
# First-stage regression
step5 <- lm(educ ~ mothereduc + fathereduc + exper + I(exper^2), data = mroz2)
# here, the endogenous variable (educ) is regressed on the instrumental variables (mothereduc, fathereduc) and the other exogenous variables
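Second step, consistent with the output below: the first-stage residuals, called u here as in the output, are added to the wage equation.

# add the first-stage residuals to the structural equation
mroz2$u <- resid(step5)
summary(lm(log(wage) ~ educ + exper + I(exper^2) + u, data = mroz2))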
##
## Call:
## lm(formula = log(wage) ~ educ + exper + I(exper^2) + u, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.03743 -0.30775 0.04191 0.40361 2.33303
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0481003 0.3945753 0.122 0.903033
## educ 0.0613966 0.0309849 1.981 0.048182 *
## exper 0.0441704 0.0132394 3.336 0.000924 ***
## I(exper^2) -0.0008990 0.0003959 -2.271 0.023672 *
## u 0.0581666 0.0348073 1.671 0.095441 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.665 on 423 degrees of freedom
## Multiple R-squared: 0.1624, Adjusted R-squared: 0.1544
## F-statistic: 20.5 on 4 and 423 DF, p-value: 1.888e-15
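The output that follows is the auxiliary regression for the Sargan test of the overidentifying restriction: the 2SLS residuals are regressed on all exogenous variables and instruments. A sketch consistent with the output (res_iv3 is the name used in the output; sargan_aux is an arbitrary name):

mroz2$res_iv3 <- resid(iv3)
sargan_aux <- lm(res_iv3 ~ exper + exper2 + mothereduc + fathereduc, data = mroz2)
summary(sargan_aux)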
##
## Call:
## lm(formula = res_iv3 ~ exper + exper2 + mothereduc + fathereduc,
## data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1012 -0.3124 0.0478 0.3602 2.3441
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.096e-02 1.413e-01 0.078 0.938
## exper -1.833e-05 1.333e-02 -0.001 0.999
## exper2 7.341e-07 3.985e-04 0.002 0.999
## mothereduc -6.607e-03 1.189e-02 -0.556 0.579
## fathereduc 5.782e-03 1.118e-02 0.517 0.605
##
## Residual standard error: 0.6752 on 423 degrees of freedom
## Multiple R-squared: 0.0008833, Adjusted R-squared: -0.008565
## F-statistic: 0.0935 on 4 and 423 DF, p-value: 0.9845
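The Sargan statistic is n times the R-squared of this auxiliary regression, compared with a chi-squared distribution whose degrees of freedom equal the number of overidentifying restrictions (one here). A sketch, assuming the sargan_aux object defined above:

n <- nobs(sargan_aux)
sargan_stat <- n * summary(sargan_aux)$r.squared  # 428 * 0.00088, about 0.38
pchisq(sargan_stat, df = 1, lower.tail = FALSE)   # about 0.54, matching the Sargan p-value reported by ivreg

We do not reject the null hypothesis that the instruments are uncorrelated with the structural error, so the overidentifying restriction is not rejected.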