Chapter8 - Solution Manual
SOLUTIONS TO PROBLEMS
8.1 Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing
that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on
the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a
violation of the Gauss-Markov assumptions, OLS is no longer BLUE.
8.3 False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as
we know from Chapter 4, this assumption is often violated when an important variable is
omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific
information on how the omitted variable is correlated with the included explanatory variables, it
is not possible to determine which estimator has the smaller bias. It is possible that WLS would
have more bias than OLS or less bias. Because we cannot know, we should not claim to use
WLS in order to solve “biases” associated with OLS.
8.5 (i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones
are practically very similar.
(ii) The effect is \(-.029(4) = -.116\), so the probability of smoking falls by about .116.
(iii) As usual, we compute the turning point in the quadratic: \(.020/[2(.00026)] \approx 38.46\), so
about 38 and one-half years.
(iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking
restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more
years of education.
(v) We just plug the values of the independent variables into the OLS regression line:
\[\widehat{smokes} = .656 - .069\log(67.44) + .012\log(6{,}500) - .029(16) + .020(77) - .00026(77^2) \approx .0052.\]
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is
not a smoker, so the equation predicts well for this particular observation.)
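The arithmetic in parts (ii), (iii), and (v) can be checked with a short script. This is only a sketch: the variable names are made up for illustration, and the signs on the coefficients are assumptions inferred from the text (education lowers the smoking probability, and age enters as a quadratic with a turning point).

```python
import math

# Coefficients from the solution; signs are assumptions inferred from the text.
b0 = 0.656
b_lcigpric = -0.069
b_lincome = 0.012
b_educ = -0.029
b_age = 0.020
b_agesq = -0.00026

# Part (ii): effect of four more years of education.
educ_effect = b_educ * 4

# Part (iii): turning point of the quadratic in age.
turning_point = b_age / (2 * abs(b_agesq))

# Part (v): fitted probability using only the terms shown in the solution
# (cigpric = 67.44, income = 6,500, educ = 16, age = 77).
smokes_hat = (b0 + b_lcigpric * math.log(67.44) + b_lincome * math.log(6500)
              + b_educ * 16 + b_age * 77 + b_agesq * 77 ** 2)

print(round(educ_effect, 3), round(turning_point, 2), round(smokes_hat, 4))
```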
8.7 (i) This follows from the simple fact that, for uncorrelated random variables, the variance of
the sum is the sum of the variances: \(\mathrm{Var}(f_i + v_{i,e}) = \mathrm{Var}(f_i) + \mathrm{Var}(v_{i,e}) = \sigma_f^2 + \sigma_v^2\).
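(ii) We compute the covariance between any two of the composite errors as
\[\mathrm{Cov}(u_{i,e}, u_{i,g}) = \mathrm{Cov}(f_i + v_{i,e}, f_i + v_{i,g}) = \mathrm{Cov}(f_i, f_i) + \mathrm{Cov}(f_i, v_{i,g}) + \mathrm{Cov}(v_{i,e}, f_i) + \mathrm{Cov}(v_{i,e}, v_{i,g}) = \mathrm{Var}(f_i) + 0 + 0 + 0 = \sigma_f^2,\]
where we use the fact that the covariance of a random variable with itself is its variance and the
assumptions that \(f_i\), \(v_{i,e}\), and \(v_{i,g}\) are pairwise uncorrelated.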
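(iii) Write the average of the composite errors as
\[m_i^{-1}\sum_{e=1}^{m_i} u_{i,e} = m_i^{-1}\sum_{e=1}^{m_i} (f_i + v_{i,e}) = f_i + m_i^{-1}\sum_{e=1}^{m_i} v_{i,e}.\]
Now, by assumption, \(f_i\) is uncorrelated with each term in the last sum; therefore, \(f_i\) is uncorrelated
with \(m_i^{-1}\sum_{e=1}^{m_i} v_{i,e}\). It follows that
\[\mathrm{Var}\Big(f_i + m_i^{-1}\sum_{e=1}^{m_i} v_{i,e}\Big) = \mathrm{Var}(f_i) + \mathrm{Var}\Big(m_i^{-1}\sum_{e=1}^{m_i} v_{i,e}\Big) = \sigma_f^2 + \sigma_v^2/m_i,\]
where we use the fact that the variance of an average of \(m_i\) uncorrelated random variables with
common variance (\(\sigma_v^2\) in this case) is simply the common variance divided by \(m_i\), the usual
formula for a sample average from a random sample.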
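The result that the firm-average error has variance \(\sigma_f^2 + \sigma_v^2/m_i\) can be checked by simulation. The sketch below is only illustrative: the standard deviations, the firm size, and the use of normal draws are assumptions chosen for the example, not values from the problem.

```python
import random
import statistics

random.seed(0)

sigma_f, sigma_v = 1.0, 2.0   # hypothetical standard deviations
m = 5                         # hypothetical number of employees per firm
n_firms = 20000               # number of simulated firms

def firm_average_error():
    """Average of the composite errors f + v_e over the m employees of one firm."""
    f = random.gauss(0.0, sigma_f)  # firm effect, common to all employees
    v_bar = sum(random.gauss(0.0, sigma_v) for _ in range(m)) / m
    return f + v_bar

draws = [firm_average_error() for _ in range(n_firms)]
simulated = statistics.variance(draws)
theoretical = sigma_f ** 2 + sigma_v ** 2 / m

print(simulated, theoretical)
```

The simulated variance should be close to the theoretical value, with the gap shrinking as the number of simulated firms grows.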
(iv) The standard weighting ignores the variance of the firm effect, \(\sigma_f^2\). Thus, the
(incorrect) weight function used is \(1/h_i = m_i\). A valid weighting function is obtained by writing
the variance from (iii) as \(\mathrm{Var}(\bar{u}_i) = \sigma_f^2[1 + (\sigma_v^2/\sigma_f^2)/m_i] = \sigma_f^2 h_i\). But obtaining the proper
weights requires us to know (or be able to estimate) the ratio \(\sigma_v^2/\sigma_f^2\). Estimation is possible, but
we do not discuss that here. In any event, the usual weight is incorrect. When the \(m_i\) are large or
the ratio \(\sigma_v^2/\sigma_f^2\) is small, so that the firm effect is more important than the individual-specific
effect, the correct weights are close to being constant. Thus, attaching large weights to large
firms may be quite inappropriate.
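To see how flat the correct weights can be, the following sketch compares \(1/h_i\) with the usual weight \(m_i\) across firms of different sizes. The firm sizes and the variance ratio are hypothetical values chosen only for illustration.

```python
def h(m, ratio):
    """Variance factor h_i = 1 + (sigma_v^2 / sigma_f^2) / m_i."""
    return 1.0 + ratio / m

firm_sizes = [10, 100, 1000]   # hypothetical m_i values
ratio = 0.5                    # hypothetical sigma_v^2 / sigma_f^2

correct_weights = [1.0 / h(m, ratio) for m in firm_sizes]
usual_weights = [float(m) for m in firm_sizes]  # weights that ignore the firm effect

# Ratio of the largest to the smallest weight under each scheme.
correct_spread = max(correct_weights) / min(correct_weights)
usual_spread = max(usual_weights) / min(usual_weights)

print(correct_weights, correct_spread, usual_spread)
```

With these assumed values the correct weights differ by less than 5% between the smallest and largest firm, while the usual weights differ by a factor of 100.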
the assumption that the variance of u given all explanatory variables depends only on gender is
\(\mathrm{Var}(u \mid male) = \delta_0 + \delta_1 male\).
Then the variance for women is simply \(\delta_0\) and that for men is \(\delta_0 + \delta_1\); the difference in
variances is \(\delta_1\).
(ii) After estimating the above equation by OLS, we regress \(\hat{u}_i^2\) on \(male_i\), \(i = 1, 2, \ldots, 706\)
(including, of course, an intercept). We can write the results as
Because the coefficient on male is negative, the estimated variance is higher for women.
(iii) No. The t statistic on male is only about –1.06, which is not significant at even the 20%
level against a two-sided alternative.
C8.3 After estimating equation (8.18), we obtain the squared OLS residuals \(\hat{u}^2\). The full-blown
White test is based on the R-squared from the auxiliary regression (with an intercept),
where “l” in front of lotsize and sqrft denotes the natural log. [See equation (8.19).] With 88
observations the n-R-squared version of the White statistic is \(88(.109) \approx 9.59\), and this is the
outcome of an (approximately) \(\chi^2_9\) random variable. The p-value is about .385, which provides
little evidence against the homoskedasticity assumption.
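The statistic and its p-value can be reproduced without a statistics package. The sketch below uses the standard series expansion of the regularized lower incomplete gamma function to get the chi-square tail probability; the helper name `chi2_sf` is made up for this example.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees
    of freedom (x > 0), via the series for the regularized lower incomplete gamma."""
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

n_obs, r_squared, df = 88, 0.109, 9
lm = n_obs * r_squared          # n-R-squared form of the White statistic
p_value = chi2_sf(lm, df)

print(round(lm, 2), round(p_value, 3))
```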
C8.5 (i) By regressing sprdcvr on an intercept only we obtain \(\hat{\mu} \approx .515\) (se \(\approx .021\)). The
asymptotic t statistic for H0: \(\mu = .5\) is \((.515 - .5)/.021 \approx .71\), which is not significant at the 10%
level, or even the 20% level.
\[\widehat{sprdcvr} = \underset{(.045)}{.490} + \underset{(.050)}{.035}\,favhome + \underset{(.095)}{.118}\,neutral - \underset{(.050)}{.023}\,fav25 + \underset{(.092)}{.018}\,und25\]
\(n = 553\), \(R^2 = .0034\).
The variable neutral has by far the largest effect – if the game is played on a neutral court, the
probability that the spread is covered is estimated to be about .12 higher – and, except for the
intercept, its t statistic is the only t statistic greater than one in absolute value (about 1.24).
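The claim about the t statistics is easy to verify from the reported coefficients and standard errors. In this sketch the negative sign on fav25 is an assumption about the reported estimate; it does not affect which t statistics exceed one in absolute value.

```python
# Reported estimates and standard errors; the sign on fav25 is an assumption.
coefs = {"intercept": 0.490, "favhome": 0.035, "neutral": 0.118,
         "fav25": -0.023, "und25": 0.018}
ses = {"intercept": 0.045, "favhome": 0.050, "neutral": 0.095,
       "fav25": 0.050, "und25": 0.092}

# t statistic for H0: coefficient = 0 is the estimate divided by its standard error.
t_stats = {name: coefs[name] / ses[name] for name in coefs}
above_one = [name for name, t in t_stats.items() if abs(t) > 1]

for name, t in t_stats.items():
    print(f"{name:10s} t = {t:6.2f}")
print(above_one)
```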
(iv) Under H0: \(\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0\), the response probability does not depend on any
explanatory variables, which means neither the mean nor the variance depends on the
explanatory variables. [See equation (8.38).]
(v) The F statistic for joint significance, with 4 and 548 df, is about .47 with p-value .76.
There is essentially no evidence against H0.
(vi) Based on these variables, it is not possible to predict whether the spread will be covered.
The explanatory power is very low, and the explanatory variables are jointly very insignificant.
The coefficient on neutral may indicate something is going on with games played on a neutral
court, but we would not want to bet money on it unless it could be confirmed with a separate,
larger sample.
C8.7 (i) The heteroskedasticity-robust standard error for \(\hat{\beta}_{white} \approx .129\) is about .026, which is
notably higher than the nonrobust standard error (about .020). The heteroskedasticity-robust
95% confidence interval is about .078 to .179, while the nonrobust CI is, of course, narrower,
about .090 to .168. The robust CI still excludes the value zero by some margin.
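Both intervals follow from the usual estimate ± 1.96·se formula. The sketch below recomputes them from the rounded values quoted above, so the endpoints can differ from the reported ones in the third decimal place.

```python
def conf_int(estimate, se, crit=1.96):
    """Approximate 95% confidence interval: estimate +/- crit * se."""
    return estimate - crit * se, estimate + crit * se

beta_white = 0.129
robust_ci = conf_int(beta_white, 0.026)      # heteroskedasticity-robust se
nonrobust_ci = conf_int(beta_white, 0.020)   # usual OLS se

print(robust_ci, nonrobust_ci)
```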
(ii) There are no fitted values less than zero, but there are 231 greater than one. Unless we
do something to those fitted values, we cannot directly apply WLS, as \(\hat{h}_i\) will be negative in 231
cases.
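To see why fitted values above one are a problem, assume the usual variance function for the linear probability model, \(\hat{h}_i = \hat{y}_i(1 - \hat{y}_i)\). The fitted values below are hypothetical and serve only to illustrate the sign of \(\hat{h}_i\).

```python
fitted = [0.15, 0.80, 1.05, 1.20]   # hypothetical LPM fitted values

# Estimated conditional variance under the LPM: h_hat = y_hat * (1 - y_hat).
h_hat = [y * (1 - y) for y in fitted]

# Fitted values whose implied variance is negative, so 1/h_hat is not a usable weight.
negative = [y for y, h in zip(fitted, h_hat) if h < 0]

print(h_hat, negative)
```

Any fitted value outside the unit interval produces a negative \(\hat{h}_i\), which cannot serve as a variance estimate in WLS.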
C8.9 (i) I now get \(R^2 = .0527\), but the other estimates seem okay.
(ii) One way to ensure that the unweighted residuals are being provided is to compare them
with the OLS residuals. They will not be the same, of course, but they should not be wildly
different.
(iii) The R-squared from the regression of \(\hat{u}_i^2\) on \(\hat{y}_i\), \(\hat{y}_i^2\), \(i = 1, \ldots, 807\), is about .027. We use this
as \(R^2_{\hat{u}^2}\) in equation (8.15) but with k = 2. This gives F = 11.15, and so the p-value is essentially
zero.
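The F statistic follows from the R-squared form \(F = (R^2/k)/[(1 - R^2)/(n - k - 1)]\). The sketch below recomputes it and, because the numerator has exactly 2 degrees of freedom, uses the closed-form tail probability \((1 + 2F/d)^{-d/2}\) for an \(F_{2,d}\) variable. The function names are made up for this example.

```python
def r2_form_f(r_squared, k, n):
    """F statistic for joint significance based on the regression R-squared."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

def f_sf_two_numerator_df(f, denom_df):
    """Exact P(F > f) when the numerator degrees of freedom equal 2."""
    return (1 + 2 * f / denom_df) ** (-denom_df / 2)

n, k, r2 = 807, 2, 0.027
f_stat = r2_form_f(r2, k, n)
p_value = f_sf_two_numerator_df(f_stat, n - k - 1)

print(round(f_stat, 2), p_value)
```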
(iv) The substantial heteroskedasticity found in part (iii) shows that the feasible GLS
procedure described on page 279 does not, in fact, eliminate the heteroskedasticity. Therefore,
the usual standard errors, t statistics, and F statistics reported with weighted least squares are not
valid, even asymptotically.
(v) Weighted least squares estimation with robust standard errors gives
\[\widehat{cigs} = \underset{(37.31)}{5.64} + \underset{(.54)}{1.30}\,\log(income) - \underset{(8.97)}{2.94}\,\log(cigpric) - \underset{(.149)}{.463}\,educ\]
\(n = 807\), \(R^2 = .1134\)
The substantial differences in standard errors compared with equation (8.36) further indicate that
our proposed correction for heteroskedasticity did not fully solve the heteroskedasticity problem.
With the exception of restaurn, all standard errors got notably bigger; for example, the standard
error for log(cigpric) doubled. All variables that were statistically significant with the nonrobust
standard errors remain significant, but the confidence intervals are much wider in several cases.
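C8.11 (i) The usual OLS standard errors are in (), the heteroskedasticity-robust standard errors
are in []:
\[\widehat{nettfa} = \underset{\substack{(2.82)\\ [3.23]}}{-17.20} + \underset{\substack{(.080)\\ [.098]}}{.628}\,inc + \underset{\substack{(.0026)\\ [.0044]}}{.0251}\,(age - 25)^2 + \underset{\substack{(2.04)\\ [2.06]}}{2.54}\,male\]
\(n = 2{,}017\), \(R^2 = .131\)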
Although the usual OLS t statistic on the interaction term is about 2.8, the heteroskedasticity-
robust t statistic is just under 1.6. Therefore, using OLS, we must conclude the interaction term is
only marginally significant. But the coefficient is nontrivial: it implies a much more sensitive
relationship between financial wealth and income for those eligible for a 401(k) plan.
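(ii) The WLS estimates, with usual WLS standard errors in () and the robust ones in [], are
\[\widehat{nettfa} = \underset{\substack{(2.27)\\ [2.53]}}{-14.09} + \underset{\substack{(.084)\\ [.091]}}{.619}\,inc + \underset{\substack{(.0019)\\ [.0026]}}{.0175}\,(age - 25)^2 + \underset{\substack{(1.56)\\ [1.31]}}{1.78}\,male\]
\(n = 2{,}017\), \(R^2 = .114\)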
The robust t statistic is about 1.84, and so the interaction term is marginally significant (two-
sided p-value is about .066).
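The quoted p-value can be checked with the standard normal approximation, which is reasonable here because the sample has more than 2,000 observations. This is a sketch using only the standard library; the function name is made up for the example.

```python
import math

def two_sided_p_normal(t):
    """Two-sided p-value 2 * (1 - Phi(|t|)) under the standard normal distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

p = two_sided_p_normal(1.84)
print(round(p, 3))
```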
(iii) The coefficient on e401k literally gives the estimated difference in financial wealth at inc
= 0, which obviously is not interesting. It is not surprising that it is not statistically different from
zero; we obviously cannot hope to estimate the difference at inc = 0, nor do we care to.
(iv) When we replace \(e401k \cdot inc\) with \(e401k \cdot (inc - 30)\), the coefficient on e401k becomes
6.68 (robust t = 3.20). Now, this coefficient is the estimated difference in nettfa between those
with and without 401(k) eligibility at roughly the average income, $30,000. Naturally, we can
estimate this much more precisely, and its magnitude ($6,680) makes sense.
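The change in interpretation follows from a one-line reparameterization. Writing \(\beta_4\) and \(\beta_5\) for the coefficients on e401k and the interaction (these labels are assumptions made here for illustration, not taken from the text), the two terms can be rearranged as:

```latex
\beta_4\, e401k + \beta_5\, e401k \cdot inc
  = (\beta_4 + 30\beta_5)\, e401k + \beta_5\, e401k \cdot (inc - 30)
```

So after centering, the coefficient on e401k is \(\beta_4 + 30\beta_5\), the eligibility effect evaluated at inc = 30, while the interaction coefficient itself is unchanged.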
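\[\widehat{children} = \underset{\substack{(0.240)\\ [0.244]}}{-4.223} + \underset{\substack{(.017)\\ [.019]}}{.341}\,age - \underset{\substack{(.0003)\\ [.0004]}}{.0027}\,age^2 - \underset{\substack{(.006)\\ [.006]}}{.075}\,educ - \underset{\substack{(.069)\\ [.064]}}{.310}\,electric - \underset{\substack{(.047)\\ [.045]}}{.200}\,urban\]
\(n = 4{,}358\), \(R^2 = .573\)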
The robust standard errors for electric and urban are actually smaller than the nonrobust ones.
(ii) I used the test command in Stata to obtain both tests. The p-value for the usual
(nonrobust) F test is .0864, and so we fail to reject that each of the dummies has a zero
coefficient at the 5% level. The p-value of the robust test – where Stata uses a statistic that can be
treated as having an approximate F distribution – is .0911. This is pretty close to the nonrobust
p-value.
(iii) The test for heteroskedasticity yields an F statistic of 726.11, which is a very large
value in an F distribution with 2 and 4,355 df. The p-value is virtually zero; there is strong
evidence of heteroskedasticity.
(iv) Even though we find conclusive evidence of heteroskedasticity, it has only a minor
effect when we compute heteroskedasticity-robust standard errors. Consequently, confidence
intervals and tests of individual coefficients are largely unaffected. In part (ii) we saw that a joint
test was barely affected when we made it robust to heteroskedasticity. So the heteroskedasticity
seems to have a minor effect on inference. Note that we have a large sample size here, so a
statistical finding of heteroskedasticity does not mean its presence need have important practical
effects.