Chap8-9_Fall20_1129

The document discusses heteroskedasticity, defining it as a condition where the variance of the error term is not constant across observations, and outlines its implications for Ordinary Least Squares (OLS) regression, including the invalidation of standard variance formulas and tests. It highlights the importance of using heteroskedasticity-robust standard errors to ensure valid statistical inference, especially when heteroskedasticity is present. Additionally, it covers methods for testing heteroskedasticity, such as the Breusch-Pagan and White tests, and emphasizes the necessity of adjusting standard errors accordingly.

Uploaded by

jl6207

CH8 HETEROSKEDASTICITY

(8.1 & 8.2)
Overview

1. What is it?
2. Consequences of heteroskedasticity (8.1)
3. Implication for computing standard errors (8.2)
4. Testing for heteroskedasticity (8.3)

What …?
If var(u|X) is constant, that is, if the variance of the conditional
distribution of u given X does not depend on X, then u is said to
be homoskedastic. Otherwise, u is heteroskedastic.
Heteroskedasticity

● Consequences of heteroskedasticity for OLS

• OLS is still unbiased and consistent under heteroskedasticity!
• Also, the interpretation of R-squared is not changed: the unconditional error variance is unaffected by heteroskedasticity (which refers to the conditional error variance)

● Heteroskedasticity invalidates the usual variance formulas for OLS estimators

• The usual F tests and t tests are not valid under heteroskedasticity
• Under heteroskedasticity, OLS is no longer the best linear unbiased estimator (BLUE); there may be more efficient linear estimators
Heteroskedasticity-robust Inference

● Luckily, we can adjust standard errors and t and F statistics so that they are valid in the presence of heteroskedasticity (the heteroskedasticity-robust procedure).

● Formulas for OLS standard errors and related statistics have been developed that are robust to heteroskedasticity of unknown form:
• heteroskedasticity-robust standard errors;
• heteroskedasticity-robust t statistics;
• heteroskedasticity-robust F statistics.

All of these formulas are valid only in large samples.


Heteroskedasticity-robust Inference

We now have two formulas for standard errors:

● Homoskedasticity-only standard errors: these are valid only if the errors are homoskedastic.
● Heteroskedasticity-robust standard errors (robust standard errors): these are valid whether or not the errors are heteroskedastic.

● The main advantage of the homoskedasticity-only standard errors is that the formula is simpler. The disadvantage is that the formula is only correct if the errors are homoskedastic.
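To make the two formulas concrete, here is a minimal sketch in plain Python for the simple regression y = b0 + b1*x + u. The function name `slope_standard_errors` and the artificial data are assumptions made for illustration; they are not from the slides. The robust formula shown is the basic one without a degrees-of-freedom correction, so software that applies such a correction (Stata's robust option does) would report slightly different numbers.

```python
import math

def slope_standard_errors(x, y):
    """Return (b1, se_homo, se_robust) for the regression y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dev = [xi - x_bar for xi in x]            # deviations of x from its mean
    sst_x = sum(d * d for d in dev)           # total variation in x
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Homoskedasticity-only: sigma^2_hat / SST_x
    sigma2_hat = sum(u * u for u in resid) / (n - 2)
    se_homo = math.sqrt(sigma2_hat / sst_x)

    # Heteroskedasticity-robust: sum((x_i - x_bar)^2 * u_i^2) / SST_x^2
    se_robust = math.sqrt(sum(d * d * u * u for d, u in zip(dev, resid))) / sst_x
    return b1, se_homo, se_robust

# Artificial data (an assumption): the error size grows with x, so the
# conditional error variance is not constant.
x = [float(i) for i in range(1, 41)]
u = [0.5 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

b1, se_homo, se_robust = slope_standard_errors(x, y)
print(f"slope={b1:.4f}  homoskedasticity-only SE={se_homo:.4f}  robust SE={se_robust:.4f}")
```

The two standard errors share the same denominator; they differ only in whether each squared residual is weighted by its own squared deviation of x.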
Heteroskedasticity-robust Inference

● Example: Hourly wage equation

Heteroskedasticity-robust standard errors may be larger or smaller than their nonrobust counterparts. The differences are often small in practice.

F statistics are also often not too different.

If there is strong heteroskedasticity, differences may be larger. To be on the safe side, it is advisable to always compute robust standard errors.
Heteroskedasticity-robust Inference

In practice:
● The homoskedasticity-only formula for the standard error and the "heteroskedasticity-robust" formula differ, so in general you get different standard errors using the different formulas.

● Homoskedasticity-only standard errors are the default setting in regression software, and sometimes the only setting (e.g. Excel). To get the general "heteroskedasticity-robust" standard errors you must override the default.

● If you don't override the default and there is in fact heteroskedasticity, your standard errors (and t-statistics and confidence intervals) will be wrong.
Heteroskedasticity-robust Standard Errors in STATA

reg testscr str pctel, robust;

Regression with robust standard errors                 Number of obs =     420
                                                       F(  2,   417) =  223.82
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4264
                                                       Root MSE      =  14.464

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.101296   .4328472    -2.54   0.011     -1.95213   -.2504616
       pctel |  -.6497768   .0310318   -20.94   0.000     -.710775   -.5887786
       _cons |   686.0322   8.728224    78.60   0.000     668.8754     703.189
------------------------------------------------------------------------------

● If you use the ", robust" option, STATA computes heteroskedasticity-robust standard errors
● Otherwise, STATA computes homoskedasticity-only standard errors
Heteroskedasticity-robust Inference

The final guidelines:

● If the errors are either homoskedastic or heteroskedastic and you use heteroskedasticity-robust standard errors, you are OK.
● If the errors are heteroskedastic and you use the homoskedasticity-only formula for standard errors, your standard errors will be wrong.
● So, you should always use heteroskedasticity-robust standard errors.
8.3 TESTING FOR HETEROSKEDASTICITY
Testing for Heteroskedasticity

● Testing for heteroskedasticity
• It may still be interesting to know whether there is heteroskedasticity, because then OLS may no longer be the most efficient linear estimator
● Breusch-Pagan test for heteroskedasticity
• Under MLR.4, the null hypothesis of homoskedasticity means that the mean of u^2 must not vary with x1, x2, …, xk
• If H0 is false, the expected value of u^2, given the independent variables, can be virtually any function of the xj
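Written out, the null hypothesis described above takes the following form. This is a reconstruction in standard textbook notation (with σ² denoting the constant error variance), offered because the slide's own equations did not survive extraction; the equivalence uses the zero conditional mean assumption MLR.4.

```latex
H_0:\ \operatorname{Var}(u \mid x_1, \dots, x_k) = \sigma^2
\qquad\Longleftrightarrow\qquad
H_0:\ \operatorname{E}(u^2 \mid x_1, \dots, x_k) = \operatorname{E}(u^2) = \sigma^2
```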
Testing for Heteroskedasticity

● Breusch-Pagan test for heteroskedasticity (cont.)

Auxiliary regression (8.14): regress the squared residuals on all explanatory variables and test whether this regression has explanatory power.

F statistic: a large test statistic (= a high R-squared) is evidence against the null hypothesis.

Alternative test statistic (= Lagrange multiplier statistic, LM): again, high values of the test statistic (= high R-squared) lead to rejection of the null hypothesis that the expected value of u^2 is unrelated to the explanatory variables.
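The procedure can be sketched in a few lines of plain Python for the one-regressor case: fit the original model, square the residuals, regress them on x, and form LM = n × R-squared from that auxiliary regression. The helper names (`ols_fit`, `breusch_pagan_lm`) and the artificial data are assumptions for illustration only. With one regressor, LM is compared with a chi-square distribution with 1 degree of freedom, whose 5% critical value is about 3.841.

```python
def ols_fit(x, y):
    """OLS of y on a constant and x; returns (residuals, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    ssr = sum(e * e for e in resid)
    sst_y = sum((yi - y_bar) ** 2 for yi in y)
    return resid, 1.0 - ssr / sst_y

def breusch_pagan_lm(x, y):
    """Return (LM, auxiliary R-squared) for the Breusch-Pagan test."""
    resid, _ = ols_fit(x, y)            # step 1: original regression
    u2 = [e * e for e in resid]         # step 2: squared residuals
    _, r2_aux = ols_fit(x, u2)          # step 3: auxiliary regression of u^2 on x
    return len(x) * r2_aux, r2_aux      # step 4: LM = n * R^2

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value, chi-square with 1 df

# Artificial data (an assumption): the error size grows with x.
x = [float(i) for i in range(1, 41)]
u = [0.5 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

lm, r2_aux = breusch_pagan_lm(x, y)
print(f"auxiliary R-squared={r2_aux:.3f}  LM={lm:.2f}  reject H0: {lm > CHI2_1_CRIT_5PCT}")
```

With k explanatory variables, the auxiliary regression would include all of them and the reference distribution would be chi-square with k degrees of freedom.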
Review from 4.5

(2) Test of overall significance of a regression

The null hypothesis states that the explanatory variables are not useful at all in explaining the dependent variable. The restricted model is a regression on a constant only.

General F statistic:

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)} \sim F_{q,\,n-k-1}$$

The R-squared form can be derived from the SSR form using SSR = SST(1 − R^2). For the test of overall significance there are k restrictions, so q = k.

● The test of overall significance is reported in most regression packages; the null hypothesis is usually overwhelmingly rejected
Review from LM (Chap 5)

● Suppose we have a standard model, y = β0 + β1x1 + β2x2 + ... + βkxk + u, and our null hypothesis restricts the last q of these variables:
● H0: β_{k−q+1} = 0, ..., βk = 0
● First, we just run the restricted model

$$y = \tilde\beta_0 + \tilde\beta_1 x_1 + \dots + \tilde\beta_{k-q} x_{k-q} + \tilde u$$

● Now take the residuals, ũ, and run the auxiliary regression of ũ on x1, x2, ..., xk (i.e. all the variables)
● LM = n R^2_ũ, where R^2_ũ is the R-squared from this auxiliary regression

Under H0, this R-squared should be close to zero because ũ will be approximately uncorrelated with all the independent variables.
Lagrange Multiplier Statistic

$$LM \overset{a}{\sim} \chi^2_q$$

so we can choose a critical value, c, from a χ²_q distribution, or just calculate a p-value for χ²_q.

● With a large sample, the result from an F test and from an LM test should be similar.

● An advantage of LM over the exact F test is that LM does not need the normality assumption; it relies on large-sample theory.
Testing for Heteroskedasticity

● Breusch-Pagan test for heteroskedasticity (cont.)

(Equations 8.10 and 8.14)

● If the BP test rejects homoskedasticity, some corrective measure should be taken, for example, using heteroskedasticity-robust standard errors. (More in 8.4, WLS.)
Testing for Heteroskedasticity

● Example 8.5: Heteroskedasticity in housing price equations

A strong relationship between the squared residuals and the x variables means the x's cannot be ignored, which is evidence of heteroskedasticity.

In the logarithmic specification, homoskedasticity cannot be rejected.


Testing for Heteroskedasticity

● The White test for heteroskedasticity

Weaker assumption: the squared error, u^2, is uncorrelated with all the independent variables (xj), the squares of the independent variables (xj^2), and all the cross products (xj xh). This is weaker than the homoskedasticity assumption (p. 274).

Regress the squared residuals on all explanatory variables, their squares, and interactions (here: example for k=3).


Testing for Heteroskedasticity

● The White test for heteroskedasticity

Auxiliary regression: regress the squared residuals on all explanatory variables, their squares, and interactions (here: example for k=3).

The White test detects more general deviations from homoskedasticity than the Breusch-Pagan test.

● Disadvantage of this form of the White test
• Including all squares and interactions leads to a large number of estimated parameters (e.g. k=6 leads to 27 (=6+6+15) parameters to be estimated)
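The parameter count grows quickly with k: the auxiliary regression has k levels, k squares, and one cross product for every pair of regressors. A small sketch of that arithmetic follows; the function name `white_test_regressors` is hypothetical, and the count excludes the intercept, as in the slide's 6+6+15 example.

```python
from math import comb

def white_test_regressors(k):
    """Number of slope terms in the full White auxiliary regression."""
    levels = k               # x1, ..., xk
    squares = k              # x1^2, ..., xk^2
    cross = comb(k, 2)       # xj * xh for every pair j < h
    return levels + squares + cross

for k in (3, 6):
    print(k, white_test_regressors(k))
```

For k=3 this gives 9 terms and for k=6 it gives 27, which matches the slide.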
Testing for Heteroskedasticity

● Alternative form of the White test

(Equation 8.20)

This regression indirectly tests the dependence of the squared residuals on the explanatory variables, their squares, and interactions, because the predicted value of y and its square implicitly contain all of these terms.
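For reference, the auxiliary regression of this alternative form can be written as follows. This is a reconstruction in standard textbook notation, since the slide's equation did not survive extraction; ŷ denotes the OLS fitted values and û the OLS residuals.

```latex
\hat u^2 = \delta_0 + \delta_1 \hat y + \delta_2 \hat y^2 + \text{error},
\qquad H_0:\ \delta_1 = \delta_2 = 0,
\qquad LM = n \cdot R^2_{\hat u^2} \overset{a}{\sim} \chi^2_2
```

Only two restrictions are tested regardless of k, which avoids the large parameter count of the full White regression.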
Testing for Heteroskedasticity

● White test for heteroskedasticity

(Equations 8.10 and 8.20)

● If the White test rejects homoskedasticity, some corrective measure should be taken, for example, using heteroskedasticity-robust standard errors. (More in 8.4, WLS.)
HW

HW: 1, 4
Chapter 9: More on Specification and Data Issues

© 2016 Cengage Learning ® . May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a
certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use. © kentoh/Shutterstock.
More on Specification and Data
Issues
● Tests for functional form misspecification
• One can always test whether explanatory variables should appear as squares or higher order terms by testing whether such terms can be excluded
• Otherwise, one can use general specification tests such as RESET

● Regression specification error test (RESET)
• The idea of RESET is to include squares and possibly higher order powers of the fitted values in the regression (similarly to the reduced White test); these powers are nonlinear functions of the xj
• Test for the exclusion of these terms. If they cannot be excluded, this is evidence for omitted higher order terms and interactions, i.e. for misspecification of functional form.
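In the usual textbook form, with the squared and cubed fitted values added, the RESET regression and test can be written as below. This is a reconstruction in standard notation, since the slide's own equation is not in the extracted text; ŷ denotes the fitted values from the original regression.

```latex
y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \delta_1 \hat y^2 + \delta_2 \hat y^3 + \text{error},
\qquad H_0:\ \delta_1 = \delta_2 = 0,
\qquad F \sim F_{2,\,n-k-3}
```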

More on Specification and Data
Issues
● Example: Housing price equation

First specification: evidence for misspecification.

Second specification: less evidence for misspecification, so it is preferred.

● Discussion
• One may also include higher order terms, which implies complicated interactions and higher order terms of all explanatory variables
• RESET provides little guidance as to where misspecification comes from (one doesn't know whether it is from interactions or higher order terms)
More on Specification and Data
Issues
● Testing against nonnested alternatives

Which specification is more appropriate?

Model 1:

Model 2:

Define a general model that contains both models as subcases and test:

● Discussion (the F test applies because each model is nested in the general model)
• Can always be done; however, a clear winner need not emerge
• Cannot be used if the models differ in their definition of the dependent variable
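The slide's two models did not survive extraction. As a purely illustrative assumption (not necessarily the pair shown in class), take a levels specification and a logarithmic specification in the same two regressors; the general model then nests both, and each model is tested by an ordinary F test on the terms belonging to the other.

```latex
\text{Model 1:}\quad y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \\
\text{Model 2:}\quad y = \beta_0 + \beta_1 \log(x_1) + \beta_2 \log(x_2) + u \\
\text{General:}\quad y = \gamma_0 + \gamma_1 x_1 + \gamma_2 x_2 + \gamma_3 \log(x_1) + \gamma_4 \log(x_2) + u \\
H_0 \text{ for Model 1:}\ \gamma_3 = \gamma_4 = 0,
\qquad H_0 \text{ for Model 2:}\ \gamma_1 = \gamma_2 = 0
```

Both nulls may be rejected, or neither, which is why a clear winner need not emerge.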

More on Specification and Data
Issues
● Using proxy variables for unobserved explanatory variables

● Example: Omitted ability in a wage equation (replace ability by a proxy)

In general, the estimates for the returns to education and experience will be biased because
one has omitted the unobservable ability variable. Idea: find a proxy variable for ability which is
able to control for ability differences between individuals so that the coefficients of the other
variables will not be biased. A possible proxy for ability is the IQ score or similar test scores.

● General approach to use proxy variables
• Omitted variable, e.g. ability
• Regression of the omitted variable on its proxy (e.g. abil on IQ score)
More on Specification and Data
Issues
● Assumptions necessary for the proxy variable method to work
• The proxy is "just a proxy" for the omitted variable: it does not belong in the population regression, i.e. it is uncorrelated with its error. If the error and the proxy were correlated, the proxy would actually have to be included in the population regression function.

• The proxy variable is a "good" proxy for the omitted variable, i.e. using other variables (x1, x2) in addition will not help to predict the omitted variable (p. 310). Otherwise x1 and x2 would have to be included in the regression for the omitted variable.

More on Specification and Data
Issues
● Under these assumptions, the proxy variable method works:

Plug (2) into (1).

In this regression model, the error term is uncorrelated with all the explanatory variables. As
a consequence, all coefficients will be correctly estimated using OLS. The coefficients for the
explanatory variables x1 and x2 will be correctly identified. The coefficient for the proxy variable
may also be of interest (it is a multiple, δ3, of the coefficient of the omitted variable).
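The substitution can be written out as follows. This is a reconstruction in standard textbook notation, because the slide's equations (1) and (2) are not in the extracted text: x3* denotes the omitted variable, x3 its proxy, and v3 the error in the proxy equation.

```latex
(1)\quad y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3^{*} + u \\
(2)\quad x_3^{*} = \delta_0 + \delta_3 x_3 + v_3 \\
\Rightarrow\quad y = (\beta_0 + \beta_3 \delta_0) + \beta_1 x_1 + \beta_2 x_2 + \beta_3 \delta_3\, x_3 + (u + \beta_3 v_3)
```

The coefficients on x1 and x2 are unchanged, the coefficient on the proxy is β3·δ3, and the composite error u + β3·v3 is uncorrelated with x1, x2, and x3 under the two assumptions.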

● Discussion of the proxy assumptions in the wage example
• Assumption 1: Should be fulfilled, as IQ score is not a direct wage determinant; what matters is how able the person proves to be at work
• Assumption 2: Most of the variation in ability should be explainable by variation in IQ score, leaving only a small remainder to educ and exper
More on Specification and Data
Issues

As expected, the measured return to education decreases if IQ is included as a proxy for unobserved ability.

The coefficient for the proxy suggests that ability differences between individuals are important (e.g. +15 points of IQ score are associated with a wage increase of about 5.4 percent).

Even if IQ score imperfectly soaks up the variation caused by ability, including it will at least reduce the bias in the measured return to education.

No significant interaction effect between ability and education.

A 10-point increase in IQ is predicted to raise monthly earnings by 3.6%, so 15 points correspond to 3.6 × 1.5 = 5.4%.


More on Specification and Data
Issues
● Using lagged dependent variables as proxy variables
• In many cases, omitted unobserved factors may be proxied by the value of the dependent variable from an earlier time period

● Example: City crime rates
• The lagged crime rate is expected to have a positive ("+") coefficient because crime has inertia
• Cities with high historical crime rates may spend more on crime prevention
• Including the past crime rate will at least partly control for the many omitted factors that also determine the crime rate in a given year
• Another way to interpret this equation is that one compares cities which had the same crime rate last year; this avoids comparing cities that differ very much in unobserved crime factors
• Further example: some universities are traditionally better in academics than other universities. Inertial effects are also captured by putting in lags of y.
Example: City crime rates

In column (1), the signs on the unemployment rate and on law enforcement expenditures are counterintuitive.
With the lagged dependent variable included, the signs of unem and log(lawexpc) are corrected, and R-squared
increases a lot. The positive sign on crmrte82 shows that it is highly correlated with the current crime rate.

HW

HW: 1, 3
