Hsts423 Unit 4
Heteroscedasticity
Objectives
It was indicated in Unit 2 that in practice the assumptions of the General Linear
Model (GLM) can be violated for a number of reasons. In this Unit we describe a
statistical phenomenon called heteroscedasticity, its common causes, its effects on
estimation and inference, and its treatment.
Unit 1
Heteroscedasticity
1.1 What is Heteroscedasticity?
Recall that the main goal of econometric modelling is to obtain accurate and efficient
estimates of relationships among the variables of an economic system, on the basis of which
the main aims of Econometrics, namely prediction, planning and control, can be achieved.
As indicated earlier, the degree of success in achieving these goals depends, to a large
extent, on the success achieved at the specification, estimation and diagnostic stages of
the modelling process. At the estimation and inferential stages, the efficiency of the
parameter estimates, and the validity of the inference based on them, depend largely on
whether or not the fundamental assumptions of the model are satisfied. In
the econometric regression model
$$Y_t = X_t\beta + u_t$$
one of the basic assumptions is that the disturbance or error terms $\{u_t\}$ are homoscedastic,
i.e. $\sigma^2(t) = \mathrm{var}(u_t) = \sigma^2$ is a constant for all $t$, or equivalently
$$\Omega = E(uu') = \sigma^2 I,$$
where $u = (u_1, u_2, \ldots, u_n)'$. Among other things, this assumption ensures that the
least squares estimator
$$\hat\beta = (X'X)^{-1}X'Y$$
is an efficient estimator of $\beta$ and that the conventional t-test and F-test used to make
inference about the model are valid statistical procedures. If the error terms in the
regression model have unequal variances, we say that there is heteroscedasticity.
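As a minimal sketch, assume a model with an intercept and a single regressor, so that $X$ has columns $(1, X_t)$. In that case the estimator $\hat\beta = (X'X)^{-1}X'Y$ can be computed by writing out the $2 \times 2$ matrix $X'X$ and its inverse explicitly. The function name `ols_simple` is hypothetical. Plain Python is used only to keep the algebra visible.

```python
def ols_simple(x, y):
    """Return (b0, b1) = (X'X)^{-1} X'Y for the design matrix X = [1, x]."""
    n = len(x)
    # Entries of X'X for X with a column of ones and a column x
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    # Entries of X'Y
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # determinant of X'X; zero if all x are equal
    if det == 0:
        raise ValueError("X'X is singular: x must take at least two distinct values")
    # (X'X)^{-1} = (1/det) * [[sxx, -sx], [-sx, n]], multiplied by X'Y
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x
    print(ols_simple(x, y))   # prints (1.0, 2.0)
```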
Definition 1.1 Let $\{u_t\}$ be a time series. Then the series is said to be heteroscedastic if
its variances $\sigma^2(t) = \mathrm{var}(u_t)$ are not all equal.
Heteroscedasticity thus refers to a situation where the variances of the error terms $\{u_t\}$
are unequal. Since this is an undesirable phenomenon, the GLM assumption ruling it out is
called the assumption of homogeneous variances, i.e. equal variances.
Whenever there is heteroscedasticity in the error terms, all inference, namely estimation,
hypothesis testing and forecasting, must take the effects of heteroscedasticity into account
for the conclusions to be valid.
We examine below some common causes of heteroscedasticity and how it can be avoided
or taken into account when making statistical inference.
From the foregoing it is clear that it is not always safe to assume that the error variances
are homogeneous over all the economic units being observed.
1.3 Impact of Heteroscedasticity on Estimation and Inference
which implies biased variance estimation and hence a general lack of accuracy in all
subsequent inference. In particular, heteroscedasticity implies that
$$P = \begin{pmatrix} 1/\sigma_1 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n \end{pmatrix}$$
It follows from the Gauss-Markov theorem that the generalised least squares (GLS) estimator
$$\hat\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$$
is the best linear unbiased estimator (BLUE) of $\beta$.
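The following is a minimal sketch that assumes uncorrelated errors with known standard deviations $\sigma_t$, so that $\Omega = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, and a model with an intercept and one regressor. It computes the GLS estimator from the formula above. For a diagonal $\Omega$ this reduces to weighted least squares with weights $1/\sigma_t^2$. The sketch then checks that the result agrees with ordinary least squares applied to the transformed data $PY$ and $PX$, with $P = \mathrm{diag}(1/\sigma_t)$. The function names and the data in the check are illustrative only.

```python
def ls2(c1, c2, y):
    """Least squares for y ~ a*c1 + b*c2 (two regressors, no extra intercept)."""
    s11 = sum(u * u for u in c1)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s22 = sum(v * v for v in c2)
    t1 = sum(u * w for u, w in zip(c1, y))
    t2 = sum(v * w for v, w in zip(c2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("normal equations are singular")
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


def gls_diagonal(x, y, sigma):
    """GLS estimate (X' Omega^{-1} X)^{-1} X' Omega^{-1} Y with Omega = diag(sigma_t^2)."""
    w = [1.0 / (s * s) for s in sigma]  # diagonal of Omega^{-1}
    # Weighted normal equations for the columns (1, x_t)
    s11 = sum(w)
    s12 = sum(wi * xi for wi, xi in zip(w, x))
    s22 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t1 = sum(wi * yi for wi, yi in zip(w, y))
    t2 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


def ols_on_transformed(x, y, sigma):
    """OLS on PY = PX beta + Pu with P = diag(1/sigma_t)."""
    c1 = [1.0 / s for s in sigma]             # P times the column of ones
    c2 = [xi / s for xi, s in zip(x, sigma)]  # P times the x column
    ys = [yi / s for yi, s in zip(y, sigma)]  # P times Y
    return ls2(c1, c2, ys)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.3]     # illustrative values only
    sigma = [0.5, 1.0, 1.5, 2.0, 2.5]  # assumed known error standard deviations
    print(gls_diagonal(x, y, sigma))
    print(ols_on_transformed(x, y, sigma))
```

Both calls solve the same weighted normal equations, because the transformed design matrix $PX$ satisfies $(PX)'(PX) = X'\Omega^{-1}X$.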
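If the error variance is proportional to the square of the explanatory variable,
$$\sigma^2(t) = \sigma^2 X_t^2,$$
then dividing the model $Y_t = \beta_0 + \beta_1 X_t + u_t$ through by $X_t$ gives the transformed model
$$Y_t^* = \beta_0^* + \beta_1^* X_t^* + u_t^*,$$
where $Y_t^* = Y_t/X_t$, $\beta_0^* = \beta_1$, $\beta_1^* = \beta_0$, $X_t^* = 1/X_t$ and $u_t^* = u_t/X_t$.
It is then easy to see that
$$\mathrm{var}(u_t^*) = \mathrm{var}(u_t/X_t) = \frac{1}{X_t^2}\,\sigma^2 X_t^2 = \sigma^2 = \text{constant}.$$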
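Here is a minimal sketch of this transformation, assuming $\sigma^2(t) = \sigma^2 X_t^2$ with every $X_t \neq 0$. It divides each observation by $X_t$ and then regresses $Y_t^*$ on $X_t^* = 1/X_t$ by ordinary least squares. The intercept of the transformed regression estimates $\beta_0^* = \beta_1$ and its slope estimates $\beta_1^* = \beta_0$, so the two coefficients swap roles. The function name and the noise-free check data are illustrative only.

```python
def fit_transformed(x, y):
    """Fit Y*_t = b0* + b1* X*_t + u*_t with Y* = Y/X and X* = 1/X.

    Returns (beta0_hat, beta1_hat) for the ORIGINAL model Y = beta0 + beta1 X + u,
    using beta0* = beta1 and beta1* = beta0.
    """
    if any(xi == 0 for xi in x):
        raise ValueError("the transformation requires X_t != 0")
    xs = [1.0 / xi for xi in x]             # X*_t
    ys = [yi / xi for xi, yi in zip(x, y)]  # Y*_t
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    b1_star = sxy / sxx          # slope of transformed model = estimate of beta0
    b0_star = my - b1_star * mx  # intercept of transformed model = estimate of beta1
    return b1_star, b0_star


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.0 + 3.0 * xi for xi in x]  # noise-free data with beta0 = 2, beta1 = 3
    print(fit_transformed(x, y))      # approximately (2.0, 3.0)
```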
The tests for heteroscedasticity discussed here assume that the disturbance terms $\{u_t\}$
are
(i) uncorrelated
$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2 \quad \text{versus} \quad H_1: \sigma_1^2 \le \sigma_2^2 \le \cdots \le \sigma_m^2$$
In the case of two groups, the Goldfeld-Quandt testing procedure consists of performing a
separate regression on each of the two sets of observations. Let $SSE_1$ and $SSE_2$ denote
the residual sums of squares from the first and second regressions respectively, and let
$SSE_{\max} = \max(SSE_1, SSE_2)$ and $SSE_{\min} = \min(SSE_1, SSE_2)$. Then the test statistic
$$F = \frac{SSE_{\max}/(n_1 - p)}{SSE_{\min}/(n_2 - p)},$$
where $n_1$ and $n_2$ are the numbers of observations in the groups giving $SSE_{\max}$ and
$SSE_{\min}$ respectively, has, under the assumption of equal variances, an F-distribution with
$(n_1 - p, n_2 - p)$ degrees of freedom, where $p$ is the number of model parameters.
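The two-group statistic can be sketched as follows, assuming the SSEs from the two separate regressions are already available. As in the text, the group with the larger SSE goes in the numerator. Each SSE is divided by its own residual degrees of freedom, so the degrees of freedom travel with the corresponding SSE. The function name is hypothetical. The critical value $F_{df_1, df_2, 0.05}$ still has to be read from F tables.

```python
def goldfeld_quandt_two_groups(sse1, n1, sse2, n2, p):
    """Two-group Goldfeld-Quandt statistic.

    sse1, sse2: residual sums of squares from the separate regressions
    n1, n2:     numbers of observations in the two groups
    p:          number of model parameters (e.g. 2 for Y = b0 + b1 X + u)
    Returns (F, df_numerator, df_denominator).
    """
    # Put the group with the larger SSE in the numerator, as in the text
    if sse1 >= sse2:
        (s_max, n_max), (s_min, n_min) = (sse1, n1), (sse2, n2)
    else:
        (s_max, n_max), (s_min, n_min) = (sse2, n2), (sse1, n1)
    df_num, df_den = n_max - p, n_min - p
    if df_num <= 0 or df_den <= 0:
        raise ValueError("each group needs more than p observations")
    f_stat = (s_max / df_num) / (s_min / df_den)
    return f_stat, df_num, df_den


if __name__ == "__main__":
    # Illustrative numbers only: 12 observations per group, p = 2
    print(goldfeld_quandt_two_groups(2.0, 12, 8.0, 12, 2))
```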
In the case where the variance is a function of an exogenous (i.e. explanatory) variable,
the procedure is to order the data by the values of $X_t$ and omit, say, $c$ central (middle)
observations. The value of $c$ is usually chosen so that about 10%, or at most 25%, of the
observations are omitted. Perform the two regressions as in the case of two groups, one on the
$(n-c)/2$ observations with the smallest values of $X_t$ and one on the $(n-c)/2$ observations
with the largest. Let $F$ be the ratio of the mean error sums of squares from the two
regressions. Under the assumption of equal variances, the statistic $F$ has an F-distribution
with $((n-c-2p)/2, (n-c-2p)/2)$ degrees of freedom.
Example 1.1 The following data show national consumption and income in millions
of dollars.
consumption Y income X
1.1 1.0
2.0 2.0
2.5 3.0
2.6 4.0
2.9 5.0
3.0 6.0
3.2 7.0
Use the Goldfeld-Quandt test to test for heteroscedasticity in the linear regression model
$Y = \beta_0 + \beta_1 X + u$ for consumption as a function of income. Use a 5% significance
level.
Solution 1.1 The sample size is n = 7. Omitting about 10% of the observations gives
0.1 × 7 = 0.7, so c = 1 central observation (X = 4) is excluded and each regression uses
3 observations. The error sum of squares from the first 3 observations is $SSE_1 \approx 0.0267$,
and the error sum of squares from the last 3 observations is $SSE_2 \approx 0.0017$. Thus, with
n = 7, c = 1 and p = 2, the test statistic is
$$F = \frac{SSE_1/[(n-c-2p)/2]}{SSE_2/[(n-c-2p)/2]} = \frac{0.0267/1}{0.0017/1} \approx 16.0$$
(computed from the unrounded sums of squares), and with $[(n-c-2p)/2, (n-c-2p)/2] = [1, 1]$
degrees of freedom the critical region is $[F_{1,1,0.05}, \infty) = [161.4, \infty)$. Since
16.0 < 161.4, H0 is not rejected and we conclude that the error terms have (approximately)
equal variances.
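The calculation in Solution 1.1 can be checked with a short script. This is a minimal sketch. It orders the data by income X (they already are ordered), omits the c central observations, fits $Y = \beta_0 + \beta_1 X + u$ separately to each half by ordinary least squares, and forms the ratio of the mean error sums of squares. The helper names `sse_simple` and `goldfeld_quandt` are hypothetical.

```python
def sse_simple(x, y):
    """Residual sum of squares from the OLS fit y = b0 + b1 x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, c, p=2):
    """Order by x, drop c central observations, compare the two halves.

    Returns (F, df, sse_low, sse_high), where df = (n - c - 2p)/2 is the
    common numerator and denominator degrees of freedom.
    """
    pairs = sorted(zip(x, y))  # order the data by the values of X
    n = len(pairs)
    if (n - c) % 2:
        raise ValueError("n - c must be even so the two groups have equal size")
    m = (n - c) // 2           # observations in each group
    low, high = pairs[:m], pairs[n - m:]
    sse_low = sse_simple([a for a, _ in low], [b for _, b in low])
    sse_high = sse_simple([a for a, _ in high], [b for _, b in high])
    df = m - p                 # equals (n - c - 2p)/2
    # Both groups have the same df, so the ratio of mean squares is the ratio of SSEs
    f_stat = max(sse_low, sse_high) / min(sse_low, sse_high)
    return f_stat, df, sse_low, sse_high


income = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
consumption = [1.1, 2.0, 2.5, 2.6, 2.9, 3.0, 3.2]
f_stat, df, sse_low, sse_high = goldfeld_quandt(income, consumption, c=1)
print(round(sse_low, 4), round(sse_high, 4), round(f_stat, 1), df)
# prints: 0.0267 0.0017 16.0 1
```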
Another test for grouped heteroscedasticity, less frequently used in Econometrics, is
Bartlett's test. The testing procedure is as follows:
time t = n + 1. Then it can be shown that the most efficient prediction of Yn+1 is given
by
$$\hat Y(X_{n+1}) = X_{n+1}\beta^\star.$$
Activity 1.1
$$\sigma_t^2 = \sigma^2 X_t$$
$$Y^* = \beta_0 + \beta_1 X^* + u^*$$
(i) Plot the Phillips curve using unemployment, not its reciprocal.
(ii) Fit the regression model
$$\text{wage} = \beta_0 + \beta_1\,\text{unemployment} + u$$
4. Show how the degrees of freedom for the G-Q test are derived.
1 22 29 1
2 22 20 1
3 20 14 1
4 24 21 1
5 12 6 1
6 30 15 2
7 32 9 2
8 26 1 2
9 26 6 2
10 37 19 2
11 12 16 3
12 8 31 3
13 13 26 3
14 25 35 3
15 7 12 3
16 23 5 4
17 25 25 4
18 28 16 4
19 26 10 4
20 23 24 4
(a) Use the Goldfeld-Quandt test to test for heteroscedasticity in the linear regression
model $Y = \beta_0 + \beta_1 X + u$ for consumption as a function of income in each of the
following cases.
(i) using data for income groups 1 and 2 only and c = 0.
(ii) using data for income groups 1 and 2 only and c = 2.
(iii) using all the data.
In each case use a 5% significance level.