MID-TERM REVIEWS
PART A: INSTRUMENTAL VARIABLES
1. What is the endogeneity problem? What causes endogeneity?
2. What solutions are suggested when a model suffers from endogeneity?
3. What conditions must a variable Z satisfy to be a valid instrumental variable for an endogenous
variable?
4. When is the instrumental variable Z called a weak instrumental variable? Should weak
instrumental variables be used for estimation?
5. Consider a simple model to estimate the effect of personal computer (𝑃𝐶) ownership on college
grade point average (GPA) for graduating seniors at a large public university:
𝐺𝑃𝐴 = 𝛽0 + 𝛽1 𝑃𝐶 + 𝑢,
where 𝑃𝐶 is a binary variable indicating PC ownership.
a. Why might 𝑃𝐶 ownership be correlated with u?
b. Explain why PC is likely to be related to parents’ annual income. Does this mean parental
income is a good IV for PC? Why or why not?
c. Suppose that, four years ago, the university gave grants to buy computers to roughly one-half
of the incoming students, and the students who received grants were randomly chosen.
Carefully explain how you would use this information to construct an instrumental variable
for PC.
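The logic behind a randomly assigned instrument can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of real data: the data-generating process, the names `ability` and `grant`, and every numeric value below are hypothetical. It compares the OLS slope with the simple IV slope Cov(Z, GPA)/Cov(Z, PC).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (all values are assumptions).
ability = rng.normal(size=n)                         # unobserved, ends up in u
grant = rng.integers(0, 2, size=n).astype(float)     # randomly assigned
# PC ownership depends on the grant (relevance) and on ability (endogeneity).
pc = (0.8 * grant + 0.5 * ability + rng.normal(size=n) > 0.5).astype(float)
true_beta1 = 0.10
gpa = 2.8 + true_beta1 * pc + 0.4 * ability + rng.normal(scale=0.3, size=n)

def slope_ols(y, x):
    """Simple-regression OLS slope: Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def slope_iv(y, x, z):
    """Simple IV slope with one instrument: Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

beta_ols = slope_ols(gpa, pc)
beta_iv = slope_iv(gpa, pc, grant)
print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}  true: {true_beta1}")
```

In this simulated setting the OLS slope picks up the effect of `ability`, while the IV slope stays close to the assumed true value because `grant` is independent of the error by construction.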
6. In a recent article, Evans and Schwab (1995) studied the effects of attending a Catholic high
school on the probability of attending college. For concreteness, let college be a binary variable
equal to unity if a student attends college, and zero otherwise. Let CathHS be a binary variable
equal to one if the student attends a Catholic high school. A linear probability model is
𝑐𝑜𝑙𝑙𝑒𝑔𝑒 = 𝛽0 + 𝛽1 𝐶𝑎𝑡ℎ𝐻𝑆 + 𝛽2 𝑜𝑡ℎ𝑒𝑟𝑓𝑎𝑐𝑡𝑜𝑟𝑠 + 𝑢,
where the other factors include gender, race, family income, and parental education.
a) Why might 𝐶𝑎𝑡ℎ𝐻𝑆 be correlated with u?
b) Evans and Schwab have data on a standardized test score taken when each student was a
sophomore. What can be done with this variable to improve the ceteris paribus estimate of
attending a Catholic high school?
c) Let 𝐶𝑎𝑡ℎ𝑅𝑒𝑙 be a binary variable equal to one if the student is Catholic. Discuss the two
requirements needed for this to be a valid IV for 𝐶𝑎𝑡ℎ𝐻𝑆 in the preceding equation. Which
of these can be tested?
d) Not surprisingly, being Catholic has a significant positive effect on attending a Catholic
high school. Do you think 𝐶𝑎𝑡ℎ𝑅𝑒𝑙 is a convincing instrument for 𝐶𝑎𝑡ℎ𝐻𝑆?
7. The data in [Link] include, for women in Botswana during 1988, information on number
of children, years of education, age, and religious and economic status variables.
a) Estimate the model
$\mathit{children} = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{age} + \beta_3 \mathit{age}^2 + u$
by OLS and interpret the estimates. In particular, holding 𝑎𝑔𝑒 fixed, what is the estimated
effect of another year of education on fertility? If 100 women receive another year of
education, how many fewer children are they expected to have?
b) The variable 𝑓𝑟𝑠𝑡ℎ𝑎𝑙𝑓 is a dummy variable equal to one if the woman was born during the
first six months of the year. Assuming that 𝑓𝑟𝑠𝑡ℎ𝑎𝑙𝑓 is uncorrelated with the error term
from part (a), show that 𝑓𝑟𝑠𝑡ℎ𝑎𝑙𝑓 is a reasonable IV candidate for 𝑒𝑑𝑢𝑐. (Hint: You need
to do a regression.)
c) Estimate the model from part (a) by using 𝑓𝑟𝑠𝑡ℎ𝑎𝑙𝑓 as an IV for 𝑒𝑑𝑢𝑐. Compare the
estimated effect of education with the OLS estimate from part (a).
d) Add the binary variables 𝑒𝑙𝑒𝑐𝑡𝑟𝑖𝑐, 𝑡𝑣, and 𝑏𝑖𝑐𝑦𝑐𝑙𝑒 to the model and assume these are
exogenous. Estimate the equation by OLS and 2SLS and compare the estimated
coefficients on 𝑒𝑑𝑢𝑐. Interpret the coefficient on 𝑡𝑣 and explain why television ownership
has a negative effect on fertility.
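The mechanics of 2SLS with exogenous controls can be sketched on simulated data. Everything here is an assumption made for illustration: the variable names only mirror those in the exercise, the data are generated artificially, and the helper functions `ols` and `two_sls` are hypothetical, not part of any package. The point is the sequence of steps: check the first stage, then compare OLS and 2SLS.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_sls(y, X_exog, x_endog, z):
    """2SLS with one endogenous regressor and one excluded instrument.

    X_exog must already contain the constant column. The coefficient on
    the endogenous regressor is the last element of the result.
    """
    Z = np.column_stack([X_exog, z])           # full instrument set
    x_hat = Z @ ols(x_endog, Z)                # first-stage fitted values
    return ols(y, np.column_stack([X_exog, x_hat]))

# Simulated data (all values are assumptions, not estimates from real data).
rng = np.random.default_rng(1)
n = 100_000
age = rng.uniform(15, 49, size=n)
frsthalf = rng.integers(0, 2, size=n).astype(float)
v = rng.normal(size=n)                         # unobserved factor in both equations
educ = 6 - 1.0 * frsthalf - 0.05 * (age - 30) + 2 * v + 2 * rng.normal(size=n)
true_beta1 = -0.15
children = (-4 + true_beta1 * educ + 0.3 * age - 0.003 * age**2
            + 0.5 * v + rng.normal(size=n))

X_exog = np.column_stack([np.ones(n), age, age**2])

# Relevance check: coefficient on the instrument in the first stage.
pi_hat = ols(educ, np.column_stack([X_exog, frsthalf]))[-1]
b_ols = ols(children, np.column_stack([X_exog, educ]))[-1]
b_iv = two_sls(children, X_exog, educ, frsthalf)[-1]
print(f"first stage: {pi_hat:.3f}  OLS: {b_ols:.3f}  2SLS: {b_iv:.3f}")
```

This sketch returns point estimates only. Standard errors taken from a manual second-stage regression are not valid, so in practice a dedicated IV routine should be used for inference.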
PART B: PANEL MODELS
1. What are the advantages of panel data models over cross-sectional data models?
2. Compare the fixed effects (FE) and random effects (RE) models. What is the basis for choosing
between FE and RE when estimating a model with panel data? Which test is used for the selection?
3. What is a balanced panel? What causes a panel to be unbalanced?
4. The diagnostic test results for the regression model are reported as follows:
Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model
H0: sigma(i)^2 = sigma^2 for all i
chi2 (2094) = 2.1e+05
Prob>chi2 = 0.0000
What problem does the model have? How can it be fixed?
5. Use the data in JTRAIN to determine the effect of the job training grant on hours of job training
per employee. The basic model for the three years is
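$\mathit{hrsemp}_{it} = \beta_0 + \delta_1 \mathit{d88}_t + \delta_2 \mathit{d89}_t + \beta_1 \mathit{grant}_{it} + \beta_2 \mathit{grant}_{i,t-1} + \beta_3 \log(\mathit{employ}_{it}) + c_i + u_{it}$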
a. Estimate the equation using fixed effects. How many firms are used in the FE estimation?
How many total observations would be used if each firm had data on all variables (in
particular, hrsemp) for all three years?
b. Interpret the coefficient on grant and comment on its significance.
c. Do larger firms provide their employees with more or less training, on average? How big
are the differences?
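The fixed effects (within) estimator used in this exercise can be sketched as follows. This is an illustration on simulated data, not the JTRAIN data: the numbers, the simplified model (no lagged grant or employment term), and the helper names `group_demean` and `within_estimator` are all assumptions. The sketch demeans each variable by firm and then runs OLS on the demeaned data.

```python
import numpy as np

def group_demean(a, idx, n_groups):
    """Subtract each unit's time average from its own observations."""
    a = np.asarray(a, dtype=float)
    a2 = a.reshape(len(a), -1)
    sums = np.zeros((n_groups, a2.shape[1]))
    np.add.at(sums, idx, a2)                         # per-unit column sums
    counts = np.bincount(idx, minlength=n_groups)[:, None]
    return (a2 - (sums / counts)[idx]).reshape(a.shape)

def within_estimator(y, X, groups):
    """Fixed-effects (within) estimator: OLS on unit-demeaned data."""
    _, idx = np.unique(groups, return_inverse=True)
    n_groups = idx.max() + 1
    y_dm = group_demean(y, idx, n_groups)
    X_dm = group_demean(X, idx, n_groups)
    return np.linalg.lstsq(X_dm, y_dm, rcond=None)[0]

# Simulated three-year panel (all values are assumptions).
rng = np.random.default_rng(2)
n_firms, T = 2000, 3
firm = np.repeat(np.arange(n_firms), T)
year = np.tile(np.arange(T), n_firms)
c = rng.normal(size=n_firms)[firm]                   # unobserved firm effect c_i
d88 = (year == 1).astype(float)
d89 = (year == 2).astype(float)
# Grant receipt is correlated with c_i, so pooled OLS is biased.
grant = (c + rng.normal(size=n_firms * T) > 1.0).astype(float)
hrsemp = 10 + 2 * d88 + 4 * d89 + 5 * grant + 3 * c + rng.normal(size=n_firms * T)

X = np.column_stack([d88, d89, grant])
beta_fe = within_estimator(hrsemp, X, firm)
beta_pooled = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]),
                              hrsemp, rcond=None)[0]
print(f"FE grant: {beta_fe[2]:.2f}  pooled OLS grant: {beta_pooled[3]:.2f}")
```

Because demeaning removes anything constant within a firm, the constant and `c` drop out, and the FE coefficient on `grant` is identified only from within-firm changes over time.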
6. Use the state-level data on murder rates and executions in MURDER for the following
exercise.
a. Consider the unobserved effects model
$\mathit{mrdrte}_{it} = \mu_t + \beta_1 \mathit{exec}_{it} + \beta_2 \mathit{unem}_{it} + c_i + u_{it}$
where $\mu_t$ simply denotes different year intercepts and $c_i$ is the unobserved state effect. If
past executions of convicted murderers have a deterrent effect, what should be the sign of
𝛽1? What sign do you think 𝛽2 should have? Explain.
b. Using just the years 1990 and 1993, estimate the equation from part (a) by pooled OLS.
Ignore the serial correlation problem in the composite errors. Do you find any evidence for
a deterrent effect?
7. To evaluate the impact of FDI on business performance, we consider the following model
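$\ln \mathit{VA}_{it} = \beta_0 + \beta_1 \ln K_{it} + \beta_2 \ln L_{it} + \beta_3 \mathit{FDI}_{it} + \beta_4 \mathit{edu}_{it} + \beta_5 \mathit{RD}_{it} + \beta_6 \mathit{size}_{it} + c_i + u_{it}$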
where ln 𝑉𝐴 is the natural logarithm of total value added; 𝐹𝐷𝐼 is a dummy variable equal to
1 if the enterprise has foreign direct investment, and zero otherwise; ln 𝐾 is the natural
logarithm of total capital; ln 𝐿 is the natural logarithm of total labor; 𝑒𝑑𝑢 is the ratio of labor
training cost to total labor; 𝑅𝐷 is the ratio of total research and development cost to total
investment; and 𝑠𝑖𝑧𝑒 is the size of the business, a categorical variable with four categories
(1 = super small; 2 = small; 3 = medium; 4 = large) entered as dummy variables, with
𝑠𝑖𝑧𝑒_1 as the base category.
The estimated results of the panel model are reported below.
Fixed-effects (within) regression               Number of obs     =    736,694
Group variable: ma_thue                         Number of groups  =    105,242

R-sq:                                           Obs per group:
     within  = 0.3026                                         min =          7
     between = 0.8641                                         avg =        7.0
     overall = 0.7677                                         max =          7

                                                F(8,105241)       =   15766.30
corr(u_i, Xb)  = 0.4678                         Prob > F          =     0.0000

                          (Std. Err. adjusted for 105,242 clusters in ma_thue)
------------------------------------------------------------------------------
             |               Robust
      ln_VA_ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ln_K_ |    .152219   .0014121   107.80   0.000     .1494513    .1549867
       ln_L_ |   .7200339   .0031237   230.51   0.000     .7139116    .7261563
        fdi_ |   .2223424   .0356943     6.23   0.000     .1523822    .2923027
        edu_ |   .0001049   .0000345     3.04   0.002     .0000373    .0001725
        R&D_ |  -.0000101   .0000192    -0.53   0.599    -.0000477    .0000275
             |
       size_ |
          2  |    .033728   .0042856     7.87   0.000     .0253283    .0421277
          3  |  -.0277995   .0080698    -3.44   0.001    -.0436162   -.0119827
          4  |  -.0874293   .0115103    -7.60   0.000    -.1099893   -.0648694
             |
       _cons |   3.944356   .0112145   351.72   0.000     3.922376    3.966336
-------------+----------------------------------------------------------------
     sigma_u |  .67424384
     sigma_e |  .63714114
         rho |  .52827013   (fraction of variance due to u_i)
------------------------------------------------------------------------------
a. How many observations are included in the data? Is the data balanced?
b. Is the above result estimated from the fixed effects model or the random effects model?
c. Interpret the estimated coefficient on 𝐹𝐷𝐼.
d. Based on the estimated coefficient on 𝑒𝑑𝑢, what do you conclude about the impact
of spending on labor training on the performance of the enterprise?
e. Based on the estimated coefficient on 𝑅𝐷, what do you conclude about the impact
of spending on R&D on the performance of the enterprise?
f. Can it be concluded that enterprise size has a positive effect on firm performance? Why?
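
Two pieces of arithmetic that are useful when reading the reported output can be sketched as follows. The input numbers are copied from the output above; the variable names are hypothetical. The sketch checks whether the observation counts are consistent with a balanced panel, and it converts the coefficient on a dummy variable in a log-linear model into an exact percentage effect, 100 * (exp(b) - 1), alongside the usual 100 * b approximation.

```python
import math

n_obs = 736_694            # "Number of obs" in the reported output
n_groups = 105_242         # "Number of groups"
obs_min, obs_max = 7, 7    # "Obs per group" min and max

# A panel is balanced when every unit is observed in the same number of periods.
is_balanced = obs_min == obs_max and n_obs == n_groups * obs_max

# Percentage effect of a dummy in a log-linear model.
b_fdi = 0.2223424                          # reported coefficient on fdi_
approx_pct = 100 * b_fdi                   # log-point approximation
exact_pct = 100 * (math.exp(b_fdi) - 1)    # exact percentage change
print(is_balanced, round(approx_pct, 1), round(exact_pct, 1))
```

The gap between the two percentages grows with the size of the coefficient, which is why the exact formula is often preferred for dummy variables with coefficients this large.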