Wooldridge 6e Ch05 IM
CHAPTER 5
Multiple Regression Analysis: OLS Asymptotics
Table of Contents
Teaching Notes
Solutions to Problems
Solutions to Computer Exercises
TEACHING NOTES
Chapter 5 is short, but it is conceptually more difficult than the earlier chapters, primarily
because it requires some knowledge of the asymptotic properties of estimators. In class, I give a
brief, heuristic description of consistency and asymptotic normality before stating the
consistency and asymptotic normality of OLS. (Conveniently, the same assumptions that work
for finite sample analysis work for asymptotic analysis.) More advanced students can follow the
proof of consistency of the slope coefficient in the bivariate regression case. Section E.4 contains
a full matrix treatment of asymptotic analysis appropriate for a master’s level course.
An explicit illustration of what happens to standard errors as the sample size grows emphasizes
the importance of having a larger sample. I do not usually cover the LM statistic in a first-
semester course, and I only briefly mention the asymptotic efficiency result. Without full use of
matrix algebra combined with limit theorems for vectors and matrices, it is difficult to prove
asymptotic efficiency of OLS.
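For instructors who want a concrete in-class illustration of how standard errors shrink with the sample size, the following is a minimal simulation sketch (not part of the text); the data-generating process, coefficient values, and sample sizes are invented purely for illustration.

```python
# Minimal sketch: how the OLS standard error shrinks as n grows.
# All numbers below are invented for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
for n in (100, 400, 1600, 6400):
    x = rng.normal(size=n)
    u = rng.standard_t(df=5, size=n)       # non-normal errors are fine asymptotically
    y = 1.0 + 0.5 * x + u                  # true slope is 0.5
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    print(f"n = {n:5d}   se(slope) = {fit.bse[1]:.4f}")
# Quadrupling n roughly halves se(slope), the 1/sqrt(n) rate discussed above.
```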
I think the conclusions of this chapter are important for students to know, even though they may
not fully grasp the details. On exams, I usually include true-false type questions, with
explanations, to test the students’ understanding of asymptotics. [For example: “In large
samples we do not have to worry about omitted variable bias.” (False). Or “Even if the error
term is not normally distributed, in large samples we can still compute approximately valid
confidence intervals under the Gauss-Markov assumptions.” (True).]
SOLUTIONS TO PROBLEMS
5.1 Write y = β₀ + β₁x₁ + u, and take the expected value: E(y) = β₀ + β₁E(x₁) + E(u), or µ_y = β₀ + β₁µ_x since E(u) = 0, where µ_y = E(y) and µ_x = E(x₁). We can rewrite this as β₀ = µ_y − β₁µ_x. Now, β̂₀ = ȳ − β̂₁x̄₁. Taking the plim of this we have plim(β̂₀) = plim(ȳ − β̂₁x̄₁) = plim(ȳ) − plim(β̂₁)⋅plim(x̄₁) = µ_y − β₁µ_x = β₀, where we use the facts that plim(ȳ) = µ_y and plim(x̄₁) = µ_x by the law of large numbers, and that plim(β̂₁) = β₁. We have also used the parts of Property PLIM.2 from Appendix C.
5.2 A higher tolerance of risk means more willingness to invest in the stock market, so β₂ > 0. By assumption, funds and risktol are positively correlated. Now we use equation (5.5), where δ₁ > 0: plim(β̂₁) = β₁ + β₂δ₁ > β₁, so β̂₁ has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.
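A small simulation can make the direction of the inconsistency concrete. This is a hedged sketch, not part of the solution; the coefficient values and the strength of the funds–risktol correlation are invented.

```python
# Sketch of the asymptotic bias in equation (5.5): omit risktol, which is
# positively correlated with funds and has beta_2 > 0. All numbers invented.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000                                   # "large sample"
risktol = rng.normal(size=n)
funds = 0.8 * risktol + rng.normal(size=n)    # positive correlation => delta_1 > 0
y = 2.0 + 1.0 * funds + 0.5 * risktol + rng.normal(size=n)   # beta_1 = 1, beta_2 = 0.5

# Simple (short) regression slope of y on funds only:
b1_short = np.cov(funds, y)[0, 1] / np.var(funds, ddof=1)
print(b1_short)   # settles around beta_1 + beta_2*delta_1 > 1: a positive inconsistency
```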
5.3 The variable cigs has nothing close to a normal distribution in the population. Most people
do not smoke, so cigs = 0 for over half of the population. A normally distributed random
variable takes on no particular value with positive probability. Further, the distribution of cigs is
skewed, whereas a normal random variable must be symmetric about its mean.
5.4 Write y = β₀ + β₁x + u, and take the expected value: E(y) = β₀ + β₁E(x) + E(u), or µ_y = β₀ + β₁µ_x, since E(u) = 0, where µ_y = E(y) and µ_x = E(x). We can rewrite this as β₀ = µ_y − β₁µ_x. Now, β̂₀ = ȳ − β̂₁x̄. Taking the plim of this we have plim(β̂₀) = plim(ȳ − β̂₁x̄) = plim(ȳ) − plim(β̂₁)⋅plim(x̄) = µ_y − β₁µ_x = β₀, where we use the facts that plim(ȳ) = µ_y and plim(x̄) = µ_x by the law of large numbers, and that plim(β̂₁) = β₁. We have also used the parts of Property PLIM.2 from Appendix C.
5.5 (ii) By observing the histogram, we can see that only a very small proportion of the students scored less than 60. No, the normal distribution does not fit well in the left tail.
SOLUTIONS TO COMPUTER EXERCISES

C5.1 (i) Below is a histogram of the 526 residuals, ûᵢ, i = 1, 2, …, 526. The histogram uses 27 bins, as suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.
[Figure: histogram of the wage-equation residuals (horizontal axis: uhat, roughly −8 to 15; vertical axis: fraction), with the best-fitting normal density overlaid.]
(ii) The estimated equation with log(wage) as the dependent variable is

log(wage) = .284 + .092 educ + .0041 exper + .022 tenure
                   (.104)  (.007)         (.0017)          (.003)

n = 526, R² = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution
overlaid, is given below:
[Figure: histogram of the log(wage)-equation residuals (horizontal axis: uhat, roughly −2 to 1.5; vertical axis: fraction), with the best-fitting normal density overlaid.]
(iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than the one in part (i), and the histogram for the wage residuals is notably skewed to the right. In the wage regression, there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.
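For readers who want to reproduce the two histograms, here is a rough sketch. It assumes WAGE1 can be loaded through the third-party `wooldridge` package (under the name 'wage1'); substitute your own data-loading step if needed. The choice of 27 bins simply mirrors the choice reported above.

```python
# Sketch: compare residual histograms from the wage and log(wage) equations,
# each with its best-fitting normal density. Data loading is an assumption.
import numpy as np
import wooldridge                      # third-party package; pip install wooldridge
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
from scipy import stats

df = wooldridge.data('wage1')

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
models = [('wage ~ educ + exper + tenure', 'wage residuals'),
          ('np.log(wage) ~ educ + exper + tenure', 'log(wage) residuals')]
for ax, (formula, title) in zip(axes, models):
    uhat = smf.ols(formula, data=df).fit().resid
    ax.hist(uhat, bins=27, density=True)
    grid = np.linspace(uhat.min(), uhat.max(), 200)
    ax.plot(grid, stats.norm.pdf(grid, loc=uhat.mean(), scale=uhat.std()))
    ax.set_title(title)
plt.show()
```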
C5.2 (iii) The ratio of the standard error using 2,070 observations to that using 4,137 observations is about 1.04. From (5.10), we compute √(4,137/2,070) ≈ 1.41, which is somewhat above the ratio of the actual standard errors but reasonably close.
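The predicted ratio from (5.10) is a one-line calculation; a quick check of the arithmetic used above:

```python
# Standard errors shrink like 1/sqrt(n), so halving the sample inflates them
# by roughly sqrt(2); with these two sample sizes:
import math
print(math.sqrt(4137 / 2070))   # about 1.41, as used above
```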
C5.3 We first run the regression of bwght on cigs, parity, and faminc using only the 1,191 observations with nonmissing values for motheduc and fatheduc. The residuals from this regression, ũᵢ, are then regressed on cigsᵢ, parityᵢ, famincᵢ, motheducᵢ, and fatheducᵢ, where, of course, we can only use the same 1,191 observations. The R-squared from this regression, R²_ũ, is about .0024. With 1,191 observations, the chi-square (LM) statistic is (1,191)(.0024) ≈ 2.86. The p-value from the χ²₂ distribution is about .239, which is very close to .242, the p-value for the comparable F test.
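A sketch of the LM (n·R-squared) procedure just described is below. It assumes BWGHT is available through the third-party `wooldridge` package as 'bwght'; the variable names follow the text, but treat the data-loading step as an assumption.

```python
# Sketch of the LM test for excluding motheduc and fatheduc from the bwght equation.
import wooldridge                      # assumed data source
import statsmodels.formula.api as smf
from scipy import stats

df = wooldridge.data('bwght').dropna(subset=['motheduc', 'fatheduc'])   # 1,191 usable rows

restricted = smf.ols('bwght ~ cigs + parity + faminc', data=df).fit()
df['utilde'] = restricted.resid        # residuals from the restricted model

# Regress the restricted residuals on ALL the explanatory variables.
aux = smf.ols('utilde ~ cigs + parity + faminc + motheduc + fatheduc', data=df).fit()

LM = aux.nobs * aux.rsquared           # n * R-squared
pval = stats.chi2.sf(LM, df=2)         # two exclusion restrictions
print(LM, pval)                        # roughly 2.86 and .24, as reported above
```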
C5.4 (i) The measure of skewness for inc is about 1.86. When we use log(inc), the skewness measure is about .360. Therefore, there is much less skewness in the log of income, which means log(inc) comes much closer to being normally distributed than inc. (In fact, the right skewness of income distributions is a well-documented fact across many countries and time periods.)
(ii) The skewness for bwght is about −.60. When we use log(bwght), the skewness measure
is about −2.95. In this case, there is much more skewness after taking the natural log.
(iii) The example in part (ii) clearly shows that this statement cannot hold generally. It is
possible to introduce skewness by taking the natural log. As an empirical matter, for many
economic variables, particularly dollar values, taking the log often does help to reduce or
eliminate skewness. But it does not have to.
(iv) For the purposes of regression analysis, we should be studying the conditional distributions: that is, the distributions of y and log(y) conditional on the explanatory variables x₁, ..., x_k. If we think the conditional mean is linear, as in Assumptions MLR.1 and MLR.4, then this is equivalent to studying the distribution of the population error, u. In fact, the skewness measure studied in this question is often applied to the residuals from an OLS regression.
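The skewness measure referred to throughout this exercise is the standardized third moment. Below is a minimal sketch of computing it; the simulated "income" series is invented purely to show the pattern, and the same function can be applied to inc, log(inc), bwght, log(bwght), or to OLS residuals as suggested in part (iv).

```python
# Sample skewness as the standardized third moment, E[((x - mean)/sd)^3].
import numpy as np

def skewness(x):
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(0)
inc = np.exp(rng.normal(size=5000))            # invented lognormal "income": right-skewed
print(skewness(inc), skewness(np.log(inc)))    # large positive vs. approximately zero
```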
C5.5 (i) The variable educ takes on all integer values from 6 to 20, inclusive. So it takes on 15
distinct values. It is not a continuous random variable, nor does it make sense to think of it as
approximately continuous. (Contrast a variable such as hourly wage, which is rounded to two
decimal places but takes on so many different values it makes sense to think of it as continuous.)
(ii) With a discrete variable, usually, a histogram has bars centered at each outcome, with
the height being the fraction of observations taking on the value. Such a histogram, with a
normal distribution overlay, is given below.
[Figure: histogram of educ, "highest grade completed by 1991" (values 6 to 20; vertical axis: density), with the best-fitting normal density overlaid.]
Even discounting the discreteness, the best fitting normal distribution (matching the sample
mean and variance) fits poorly. The focal point at educ = 12 clearly violates the notion of a
smooth bell-shaped density.
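Here is a sketch of how a histogram like this can be built for a discrete variable, with one bar per outcome (height equal to the sample fraction) and the matching normal density overlaid. It assumes the HTV data are available through the `wooldridge` package as 'htv'; treat the loading step as an assumption.

```python
# Discrete-outcome histogram of educ with the best-fitting normal overlaid.
# Because educ takes unit-spaced integer values, fractions are comparable to a density.
import numpy as np
import wooldridge                      # assumed data source
import matplotlib.pyplot as plt
from scipy import stats

educ = wooldridge.data('htv')['educ']

values, counts = np.unique(educ, return_counts=True)
plt.bar(values, counts / counts.sum(), width=0.9)    # fraction at each integer value

grid = np.linspace(values.min() - 1, values.max() + 1, 200)
plt.plot(grid, stats.norm.pdf(grid, loc=educ.mean(), scale=educ.std()))
plt.xlabel('highest grade completed')
plt.show()
```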
(iii) Given the findings in part (ii), the error term in the equation relating educ to motheduc, fatheduc, and abil cannot have a normal distribution independent of the explanatory variables. Thus, MLR.6 is violated. In fact, the inequality educ ≥ 0 means that u is not even free to vary over all values given motheduc, fatheduc, and abil. (It is likely that the homoskedasticity assumption fails, too, but this is less clear and does not follow from the nature of educ.)
The violation of MLR.6 means that we cannot perform exact statistical inference; we must
rely on asymptotic analysis. This in itself does not change how we perform statistical inference:
without normality, we use exactly the same methods, but we must be aware that our inference
holds only approximately.
C5.6 (i) Logically, the smallest and the largest values of score would be 0 and 100, respectively.
In the sample, the smallest and the largest values of score are 19.53 and 98.44, respectively.
(ii) The distribution of score is skewed to the left, and this violates the normality assumption even conditional on the explanatory variables. Therefore, Assumption MLR.6 does not hold for the error term u, which means that the t statistics will not have exact t distributions and the F statistics will not have exact F distributions. This is a potentially serious problem because our inference hinges on being able to obtain critical values or p-values from the t or F distributions.
(iii) The estimated equation is

score = 27.43 + 13.80 colgpa + 0.54 actmth − 0.26 acteng.

The t statistic for acteng is −2.48, and the corresponding p-value is 0.013. Because the sample size is large, the distribution of the t statistic is well approximated by a t distribution even though the error term is not normally distributed, so this p-value can be treated as approximately valid.
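A sketch of the part (iii) regression is below, assuming ECONMATH is available through the `wooldridge` package as 'econmath'; treat the data-loading step as an assumption.

```python
# Estimate score on colgpa, actmth, and acteng; with n this large, the usual
# t statistics are justified asymptotically despite the non-normal error term.
import wooldridge                      # assumed data source
import statsmodels.formula.api as smf

df = wooldridge.data('econmath')
res = smf.ols('score ~ colgpa + actmth + acteng', data=df).fit()
print(res.summary())
```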