Tutorial 5

Sahrish Aisha Khan

2024-12-01
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

library(fixest)
library(ivreg)
library(gmm)
library(flextable)
library(readxl)
library(tidyverse)
library(stargazer)
library(car)
library(lmtest)
library(modelsummary)

mroz <- read_excel("mroz.xlsx")


mroz <- data.frame(mroz)

A. Summary statistics
summary(mroz$wage)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 0.000 0.000 1.625 2.375 3.788 25.000

summary(mroz$educ)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 5.00 12.00 12.00 12.29 13.00 17.00
summary(mroz$exper)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 0.00 4.00 9.00 10.63 15.00 45.00

B. OLS estimates of a model for log(wages) controlling for education, experience and
experience squared.
# ols <- lm(log(wage) ~ educ + exper + I(exper^2), data=mroz)
# Problem: wages are zero for some individuals in the sample, and therefore
# log(wage) is -infinity.
# You need to construct a new dataset selecting only individuals with positive
# wages.

mroz2 <-subset(mroz, wage>0)


ols <- lm(log(wage) ~ educ + exper + I(exper^2), data=mroz2)
summary(ols)
##
## Call:
## lm(formula = log(wage) ~ educ + exper + I(exper^2), data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.08404 -0.30627 0.04952 0.37498 2.37115
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5220406 0.1986321 -2.628 0.00890 **
## educ 0.1074896 0.0141465 7.598 1.94e-13 ***
## exper 0.0415665 0.0131752 3.155 0.00172 **
## I(exper^2) -0.0008112 0.0003932 -2.063 0.03974 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6664 on 424 degrees of freedom
## Multiple R-squared: 0.1568, Adjusted R-squared: 0.1509
## F-statistic: 26.29 on 3 and 424 DF, p-value: 1.302e-15

C. A test for weak instruments


strong1 <- lm(educ ~ exper + I(exper^2) + mothereduc, data=mroz2)
summary(strong1)

##
## Call:
## lm(formula = educ ~ exper + I(exper^2) + mothereduc, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4423 -1.2963 -0.0837 1.1761 5.9870
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.775103 0.423889 23.061 <2e-16 ***
## exper 0.048862 0.041669 1.173 0.242
## I(exper^2) -0.001281 0.001245 -1.029 0.304
## mothereduc 0.267691 0.031130 8.599 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.111 on 424 degrees of freedom
## Multiple R-squared: 0.1527, Adjusted R-squared: 0.1467
## F-statistic: 25.47 on 3 and 424 DF, p-value: 3.617e-15

anova(strong1)

## Analysis of Variance Table


##
## Response: educ
## Df Sum Sq Mean Sq F value Pr(>F)
## exper 1 0.52 0.52 0.1157 0.7339
## I(exper^2) 1 10.46 10.46 2.3479 0.1262
## mothereduc 1 329.56 329.56 73.9459 <2e-16 ***
## Residuals 424 1889.66 4.46
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# The F statistic for the test that the coefficient on mothereduc is zero
# is 73.9, well above 10, so the instrument is strong.

The first step in testing for weak instruments is to estimate the first-stage regression,
where the endogenous variable is regressed on the instrument(s) and any exogenous
control variables. Use summary() to examine the coefficients on the instrument(s):
significant coefficients (small p-values) suggest that the instrument is relevant. The key
statistic is the F-statistic for the joint significance of the excluded instrument(s) in the
first stage, not the overall F-statistic of the regression (25.47 here, which also reflects
exper and its square). As a rule of thumb, if F<10 the instrument is considered weak; if
F≥10 it is considered strong enough. Here the F-statistic on mothereduc is 73.9.
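With a single instrument, the F-statistic for the excluded instrument equals the square of
its first-stage t-statistic (in the output above, 8.599^2 is roughly 73.9). The sketch below
illustrates this on simulated data; the variable names (z, x1, endog) and the
data-generating process are assumptions made for illustration and are not part of the
mroz dataset.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
n     <- 500
z     <- rnorm(n)                          # hypothetical instrument
x1    <- rnorm(n)                          # hypothetical exogenous control
endog <- 1 + 0.5 * z + 0.3 * x1 + rnorm(n) # hypothetical endogenous regressor

# First stage with and without the instrument
unrestricted <- lm(endog ~ x1 + z)
restricted   <- lm(endog ~ x1)

# F-statistic for the excluded instrument (row 2 of the model comparison)
f_excluded <- anova(restricted, unrestricted)$F[2]

# t-statistic on the instrument in the first stage
t_instrument <- summary(unrestricted)$coefficients["z", "t value"]

print(c(F = f_excluded, t_squared = t_instrument^2))
```

With more than one instrument the t-statistics are no longer sufficient, and the joint
F-test on all excluded instruments is the relevant statistic.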
D. 2SLS (two-stage least squares) estimator
Obtain the 2SLS (two-stage least squares) estimator by performing all the steps:

- stage 1: regress the endogenous variable on the exogenous variables and the instruments
- obtain the predicted values
- stage 2: regress the dependent variable on the exogenous regressors and the predicted
values from stage 1

#2SLS estimator

step1 <- lm(educ ~ exper + I(exper^2) + mothereduc, data=mroz2)


summary(step1)

##
## Call:
## lm(formula = educ ~ exper + I(exper^2) + mothereduc, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4423 -1.2963 -0.0837 1.1761 5.9870
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.775103 0.423889 23.061 <2e-16 ***
## exper 0.048862 0.041669 1.173 0.242
## I(exper^2) -0.001281 0.001245 -1.029 0.304
## mothereduc 0.267691 0.031130 8.599 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.111 on 424 degrees of freedom
## Multiple R-squared: 0.1527, Adjusted R-squared: 0.1467
## F-statistic: 25.47 on 3 and 424 DF, p-value: 3.617e-15

educhat <- fitted(step1)


step2 <- lm(log(wage) ~ exper + I(exper^2) + educhat, data = mroz2)
summary(step2)

##
## Call:
## lm(formula = log(wage) ~ exper + I(exper^2) + educhat, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.15299 -0.34773 0.02906 0.39023 2.35624
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1981861 0.4933427 0.402 0.68809
## exper 0.0448558 0.0141644 3.167 0.00165 **
## I(exper^2) -0.0009221 0.0004240 -2.175 0.03019 *
## educhat 0.0492630 0.0390562 1.261 0.20788
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.709 on 424 degrees of freedom
## Multiple R-squared: 0.04559, Adjusted R-squared: 0.03884
## F-statistic: 6.751 on 3 and 424 DF, p-value: 0.0001861

E. fixest package
Let’s use now the fixest package; use the following command where x1 and x2 are the
exogenous regressors, x3 is the endogenous regressor and z is the instrument(s):

feols(y ~ x1 + x2 | x3 ~ z, data = dataset)

iv <- feols(log(wage) ~ exper + I(exper^2) | educ ~ mothereduc, mroz2)


summary(iv)

## TSLS estimation - Dep. Var.: log(wage)


## Endo. : educ
## Instr. : mothereduc
## Second stage: Dep. Var.: log(wage)
## Observations: 428
## Standard-errors: IID
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.198186 0.472877 0.419107 0.6753503
## fit_educ 0.049263 0.037436 1.315924 0.1889107
## exper 0.044856 0.013577 3.303856 0.0010346 **
## I(exper^2) -0.000922 0.000406 -2.268993 0.0237705 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.67642 Adj. R2: 0.116926
## F-test (1st stage), educ: stat = 73.9 , p < 2.2e-16 , on 1 and 424 DoF.
## Wu-Hausman: stat = 2.9683, p = 0.085642, on 1 and 423 DoF.

Here the code performs instrumental variable (IV) regression to estimate the relationship
between log(wage) (the dependent variable) and education (educ), while accounting for
potential endogeneity in educ by using mothereduc as an instrument. The point estimates
match the manual two-step results above, and the output also reports the first-stage
F-statistic (73.9) and the Wu-Hausman test.
F. ivreg to obtain 2SLS estimates
Let’s use instead the command ivreg to obtain 2SLS estimates:
(note x1 is the endogenous variable and z1 is the instrument, x2 and x3 are
exogenous variables)

iv <- ivreg(y ~ x1 + x2 + x3 | z1 + x2 + x3, data=dataset)

iv2 <- ivreg(log(wage) ~ educ + exper + I(exper^2) | mothereduc + exper +
I(exper^2), data=mroz2)
summary(iv2)

##
## Call:
## ivreg(formula = log(wage) ~ educ + exper + I(exper^2) | mothereduc +
## exper + I(exper^2), data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.10804 -0.32633 0.06024 0.36772 2.34351
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1981861 0.4728772 0.419 0.67535
## educ 0.0492630 0.0374360 1.316 0.18891
## exper 0.0448558 0.0135768 3.304 0.00103 **
## I(exper^2) -0.0009221 0.0004064 -2.269 0.02377 *
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 424 73.946 <2e-16 ***
## Wu-Hausman 1 423 2.968 0.0856 .
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6796 on 424 degrees of freedom
## Multiple R-Squared: 0.1231, Adjusted R-squared: 0.1169
## Wald test: 7.348 on 3 and 424 DF, p-value: 8.228e-05

Note that the weak instruments test is reported as part of the ivreg output, in the
"Diagnostic tests" block.
G. Compare the OLS and IV estimates
m_list <- list(OLS = ols, TWOSLS = iv2, TWOSLSmanually=step2)
msummary(m_list, stars=TRUE)

                 OLS         TWOSLS      TWOSLSmanually
(Intercept)      -0.522**    0.198       0.198
                 (0.199)     (0.473)     (0.493)
educ             0.107***    0.049
                 (0.014)     (0.037)
exper            0.042**     0.045**     0.045**
                 (0.013)     (0.014)     (0.014)
I(exper^2)       -0.001*     -0.001*     -0.001*
                 (0.000)     (0.000)     (0.000)
educhat                                  0.049
                                         (0.039)
Num.Obs.         428         428         428
R2               0.157       0.123       0.046
R2 Adj.          0.151       0.117       0.039
AIC              1892.0      1908.8      1945.0
BIC              1912.3      1929.1      1965.3
Log.Lik.         -431.599                -458.117
F                26.286                  6.751
RMSE             0.66        0.68        0.71
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Note that the coefficients of 2SLS and 2SLS manual are the same but the standard errors are
different: the manual second stage computes its residuals from educhat rather than educ,
so its standard errors are incorrect, whereas ivreg computes the correct standard errors.
Always use this command! Compare the OLS and IV estimates: the estimated return to
education is 4.93% with 2SLS, which is lower than the OLS estimate of 10.75%; this is
consistent with a positive omitted variable bias due to positive correlation between
educ and the omitted factor. Note also that the (correct) 2SLS standard error of the
instrumented regressor is quite high, and educ is no longer significant. So there is an
efficiency loss with IV regression, and at this point we would like to make sure that we
really need IV estimation (i.e. that educ is endogenous).
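The difference in standard errors comes only from the residual variance: the correct 2SLS
residuals use the actual endogenous regressor, not its fitted values. In the output above,
rescaling the manual standard error by the ratio of residual standard errors gives
0.0391 x 0.6796 / 0.709, which is about 0.0374, the ivreg value. The sketch below shows
the correction on simulated data; all variable names and the data-generating process are
assumptions for illustration only.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(42)
n      <- 1000
z      <- rnorm(n)                   # hypothetical instrument
x1     <- rnorm(n)                   # hypothetical exogenous control
common <- rnorm(n)                   # unobserved factor that makes endog endogenous
endog  <- 0.8 * z + 0.5 * x1 + common + rnorm(n)
y      <- 1 + 0.5 * endog + 0.3 * x1 + common + rnorm(n)

# Manual two-step estimation
stage1    <- lm(endog ~ x1 + z)
endog_hat <- fitted(stage1)
stage2    <- lm(y ~ x1 + endog_hat)
b         <- coef(stage2)            # 2SLS point estimates

# Naive residual variance (uses endog_hat) versus correct one (uses endog)
sigma2_naive   <- summary(stage2)$sigma^2
X              <- cbind(1, x1, endog)      # same column order as coef(stage2)
res_2sls       <- y - drop(X %*% b)
sigma2_correct <- sum(res_2sls^2) / (n - length(b))

# Rescale the naive standard errors by the ratio of the two variances
se_naive   <- sqrt(diag(vcov(stage2)))
se_correct <- se_naive * sqrt(sigma2_correct / sigma2_naive)

print(rbind(naive = se_naive, corrected = se_correct))
```

In practice a dedicated routine such as ivreg or feols does this automatically, which is
why the manual approach is useful mainly for understanding the mechanics.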
H. GIVE estimates using gmm
mroz2$exper2 <- (mroz2$exper^2)
mm <- cbind(mroz2$mothereduc, mroz2$exper, mroz2$exper2)
gmm <- gmm(log(wage) ~ educ + exper + I(exper^2),
x = mm, data = mroz2)
summary(gmm)

##
## Call:
## gmm(g = log(wage) ~ educ + exper + I(exper^2), x = mm, data = mroz2)
##
##
## Method: twoStep
##
## Kernel: Quadratic Spectral
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.19818608 0.56526349 0.35060831 0.72588222
## educ 0.04926295 0.04450138 1.10699836 0.26829464
## exper 0.04485585 0.01453878 3.08525506 0.00203378
## I(exper^2) -0.00092208 0.00040154 -2.29632397 0.02165736
##
## J-Test: degrees of freedom is 0
## J-test P-value
## Test E(g)=0: 1.20773227475269e-24 *******

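• create the square of the experience variable
• define the moment conditions you are going to use, i.e. the matrix of exogenous
variables plus instruments (mm)
• estimate the model using those moment conditions: gmm <- gmm(y ~ x1 + x2 + x3, x
= mm, data = mroz2)

The model is exactly identified (one instrument for one endogenous regressor), so the
J-test has zero degrees of freedom and the GMM point estimates coincide with 2SLS.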
m_list2 <- list(OLS = ols, TWOSLS = iv2, GMM = gmm)
msummary(m_list2, stars=TRUE)
                 OLS         TWOSLS      GMM
(Intercept)      -0.522**    0.198       0.198
                 (0.199)     (0.473)     (0.565)
educ             0.107***    0.049       0.049
                 (0.014)     (0.037)     (0.045)
exper            0.042**     0.045**     0.045**
                 (0.013)     (0.014)     (0.015)
I(exper^2)       -0.001*     -0.001*     -0.001*
                 (0.000)     (0.000)     (0.000)
Num.Obs.         428         428         428
R2               0.157       0.123
R2 Adj.          0.151       0.117
AIC              1892.0      1908.8
BIC              1912.3      1929.1
Log.Lik.         -431.599
F                26.286
RMSE             0.66        0.68
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

I. Wu-Hausman test
# First-stage regression
step1 <- lm(educ ~ mothereduc + exper + I(exper^2), data = mroz2)
# Here, the endogenous variable (educ) is regressed on the instrumental
# variable (mothereduc) and the other exogenous variables

# Extract residuals from the first stage


v <- step1$residuals
# Augmented regression
wu_test <- lm(log(wage) ~ educ + exper + I(exper^2) + v, data = mroz2)

# Summary of the model


summary(wu_test)

##
## Call:
## lm(formula = log(wage) ~ educ + exper + I(exper^2) + v, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.04565 -0.30313 0.04232 0.39785 2.34435
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1981861 0.4626315 0.428 0.668586
## educ 0.0492630 0.0366249 1.345 0.179324
## exper 0.0448558 0.0132827 3.377 0.000801 ***
## I(exper^2) -0.0009221 0.0003976 -2.319 0.020857 *
## v 0.0683815 0.0396903 1.723 0.085642 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6649 on 423 degrees of freedom
## Multiple R-squared: 0.1627, Adjusted R-squared: 0.1548
## F-statistic: 20.55 on 4 and 423 DF, p-value: 1.733e-15

The coefficient on the first-stage residual v has a p-value of 0.0856. We reject the null
hypothesis of no endogeneity at the 10% level but fail to reject it at the 5% level, so
there is only weak evidence that educ is endogenous.
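The regression-based test can be written compactly as follows. The symbols are notation
introduced here for illustration: \hat v is the first-stage residual and \delta its
coefficient in the augmented regression. With one endogenous regressor, the square of the
t-statistic on \hat v reproduces the Wu-Hausman F-statistic reported by ivreg and feols.

```latex
\[
\log(wage_i) = \beta_0 + \beta_1\, educ_i + \beta_2\, exper_i + \beta_3\, exper_i^2
             + \delta\, \hat v_i + e_i,
\qquad H_0 : \delta = 0 \ \ \text{(educ is exogenous)}
\]
\[
t_{\hat\delta}^{\,2} = 1.723^2 \approx 2.97 \;\approx\; F_{\text{Wu-Hausman}} = 2.968
\]
```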
J. Two instruments: test for weak instruments
strong2 <- lm(educ ~ exper + exper2 + mothereduc + fathereduc, data=mroz2)
summary(strong2)

##
## Call:
## lm(formula = educ ~ exper + exper2 + mothereduc + fathereduc,
## data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.8057 -1.0520 -0.0371 1.0258 6.3787
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.102640 0.426561 21.340 < 2e-16 ***
## exper 0.045225 0.040251 1.124 0.262
## exper2 -0.001009 0.001203 -0.839 0.402
## mothereduc 0.157597 0.035894 4.391 1.43e-05 ***
## fathereduc 0.189548 0.033756 5.615 3.56e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.039 on 423 degrees of freedom
## Multiple R-squared: 0.2115, Adjusted R-squared: 0.204
## F-statistic: 28.36 on 4 and 423 DF, p-value: < 2.2e-16

library(car) # Testing the joint significance (relevance) of the instruments


linearHypothesis(strong2, c("mothereduc=0", "fathereduc=0"))

##
## Linear hypothesis test:
## mothereduc = 0
## fathereduc = 0
##
## Model 1: restricted model
## Model 2: educ ~ exper + exper2 + mothereduc + fathereduc
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 425 2219.2
## 2 423 1758.6 2 460.64 55.4 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both mothereduc and fathereduc are individually significant predictors of educ (based on
their p-values). The joint hypothesis test gives an F-statistic of 55.4 with a p-value
< 2.2e-16, so we reject the null hypothesis that the coefficients on mothereduc and
fathereduc are jointly equal to zero. Since 55.4 is well above 10, the two variables are
relevant (not weak) instruments for use in an instrumental variables (IV) regression.
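The joint F-statistic can be reproduced by hand from the two residual sums of squares in
the linearHypothesis() table. The sketch below only uses numbers copied from that output;
the variable names are chosen here for illustration.

```r
# Values copied from the linearHypothesis() output above
rss_restricted   <- 2219.2  # model without mothereduc and fathereduc
rss_unrestricted <- 1758.6  # model with both instruments
q                <- 2       # number of restrictions (excluded instruments)
df_resid         <- 423     # residual degrees of freedom, unrestricted model

# F = [(RSS_r - RSS_u) / q] / [RSS_u / df_resid]
f_stat  <- ((rss_restricted - rss_unrestricted) / q) /
           (rss_unrestricted / df_resid)
p_value <- pf(f_stat, q, df_resid, lower.tail = FALSE)

print(round(f_stat, 1))  # 55.4, matching the table
```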
K. Obtain the 2SLS estimates.
iv3 <- ivreg(log(wage) ~ educ + exper + exper2 | exper + exper2 +
mothereduc + fathereduc , data = mroz2)
summary(iv3)

##
## Call:
## ivreg(formula = log(wage) ~ educ + exper + exper2 | exper + exper2 +
## mothereduc + fathereduc, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0986 -0.3196 0.0551 0.3689 2.3493
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0481003 0.4003281 0.120 0.90442
## educ 0.0613966 0.0314367 1.953 0.05147 .
## exper 0.0441704 0.0134325 3.288 0.00109 **
## exper2 -0.0008990 0.0004017 -2.238 0.02574 *
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 423 55.400 <2e-16 ***
## Wu-Hausman 1 423 2.793 0.0954 .
## Sargan 1 NA 0.378 0.5386
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6747 on 424 degrees of freedom
## Multiple R-Squared: 0.1357, Adjusted R-squared: 0.1296
## Wald test: 8.141 on 3 and 424 DF, p-value: 2.787e-05

L. Wu-Hausman test
# First-stage regression
step5 <- lm(educ ~ mothereduc + fathereduc + exper + I(exper^2), data =
mroz2)
# Here, the endogenous variable (educ) is regressed on the instrumental
# variables and the other exogenous variables

# Extract residuals from the first stage


u <- step5$residuals
# Augmented regression
wu_test1 <- lm(log(wage) ~ educ + exper + I(exper^2) + u, data = mroz2)

# Summary of the model


summary(wu_test1)

##
## Call:
## lm(formula = log(wage) ~ educ + exper + I(exper^2) + u, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.03743 -0.30775 0.04191 0.40361 2.33303
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0481003 0.3945753 0.122 0.903033
## educ 0.0613966 0.0309849 1.981 0.048182 *
## exper 0.0441704 0.0132394 3.336 0.000924 ***
## I(exper^2) -0.0008990 0.0003959 -2.271 0.023672 *
## u 0.0581666 0.0348073 1.671 0.095441 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.665 on 423 degrees of freedom
## Multiple R-squared: 0.1624, Adjusted R-squared: 0.1544
## F-statistic: 20.5 on 4 and 423 DF, p-value: 1.888e-15

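Alternatively, simply read the Wu-Hausman line in the ivreg output (statistic 2.793,
p = 0.0954): again we reject the null of no endogeneity at the 10% level, but not at 5%.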


M. The Sargan test
res_iv3 <- iv3$residuals
sargan <- lm(res_iv3 ~ exper + exper2 + mothereduc + fathereduc, data=mroz2)
summary(sargan)

##
## Call:
## lm(formula = res_iv3 ~ exper + exper2 + mothereduc + fathereduc,
## data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1012 -0.3124 0.0478 0.3602 2.3441
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.096e-02 1.413e-01 0.078 0.938
## exper -1.833e-05 1.333e-02 -0.001 0.999
## exper2 7.341e-07 3.985e-04 0.002 0.999
## mothereduc -6.607e-03 1.189e-02 -0.556 0.579
## fathereduc 5.782e-03 1.118e-02 0.517 0.605
##
## Residual standard error: 0.6752 on 423 degrees of freedom
## Multiple R-squared: 0.0008833, Adjusted R-squared: -0.008565
## F-statistic: 0.0935 on 4 and 423 DF, p-value: 0.9845

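# Compute the test statistic = N*R^2 = 428*0.0008833 = 0.378. Compared with a
# chi-squared distribution with 1 degree of freedom, you cannot reject the
# null that the instruments are valid.
# Again, you can check the Sargan line in the 2SLS output (0.378, p = 0.5386).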
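The Sargan statistic and its p-value can be computed directly from the auxiliary
regression. The sketch below uses the R-squared and sample size reported above; the degrees
of freedom equal the number of overidentifying restrictions (two instruments minus one
endogenous regressor). Variable names are chosen here for illustration.

```r
n  <- 428         # observations in mroz2
r2 <- 0.0008833   # R-squared of the regression of 2SLS residuals on all exogenous variables
df <- 1           # overidentifying restrictions: 2 instruments - 1 endogenous regressor

sargan_stat <- n * r2
sargan_p    <- pchisq(sargan_stat, df = df, lower.tail = FALSE)

print(round(c(statistic = sargan_stat, p_value = sargan_p), 3))
```

The large p-value means the overidentifying restriction is not rejected, in line with the
Sargan line of the ivreg diagnostics.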
