Hsts423 Unit 4

This document discusses the concept of heteroscedasticity in econometrics. It defines heteroscedasticity as unequal error variances in a regression model. Some common causes of heteroscedasticity mentioned include model mis-specification, data stratification, data treatment methods, and data collection procedures. The impacts of heteroscedasticity include inaccurate model coefficients, underestimated error variance, and reduced predictive power of models. Methods for estimating models and testing for heteroscedasticity include assuming grouped heteroscedasticity and using the Goldfeld-Quandt test.
UNIT 4

Heteroscedasticity

Objectives

At the end of this unit students are expected to be able to

1. define and explain heteroscedasticity as used in Econometrics,

2. describe and explain causes of heteroscedasticity,

3. describe common or standard types of heteroscedasticity, in particular grouped heteroscedasticity,

4. describe consequences of heteroscedasticity,

5. describe the aims and procedures of the Goldfeld-Quandt test,

6. estimate model parameters in the presence of heteroscedasticity, i.e. conduct appropriate estimation procedures for the cases where the assumption of homoscedasticity is violated; in particular, students must be able to perform weighted regression,

7. generate accurate forecasts with a heteroscedastic model, i.e. make forecasts that take into account the effects of heteroscedasticity.

It was indicated in Unit 2 that in practice the assumptions of the General Linear
Model (GLM) can be, or are sometimes, violated for a number of reasons. In this
Unit we describe a statistical phenomenon called heteroscedasticity, its common causes,
treatment etc.

1.1 What is heteroscedasticity?


As indicated above, in this section we examine a statistical phenomenon called heteroscedasticity.

Recall that the main goal of econometric modelling is to obtain accurate and efficient estimates of relationships among variables of an economic system, on the basis of which the main aims of Econometrics, namely prediction, planning and control, can be effected. As indicated earlier, the degree of success or failure in achieving these goals depends, to a large extent, on the degree of success achieved at the specification, estimation and diagnostic stages of the modelling process. At the estimation and inferential stages, the efficiency of parameter estimates and the validity of the resulting inference depend largely on whether or not the fundamental assumptions of the model are satisfied. In the econometric regression model

$$Y_t = X_t \beta + u_t$$

one of the basic assumptions is that the disturbance or error terms $\{u_t\}$ are homoscedastic, i.e. $\sigma^2(t) = \mathrm{var}(u_t) = \sigma^2$ is a constant for all $t$ or, when the errors are also uncorrelated, equivalently

$$\Omega = E(uu') = \sigma^2 I$$

where $u = (u_1, u_2, \ldots, u_n)'$. This assumption ensures, among other things, that the least squares estimator

$$\hat\beta = (X'X)^{-1} X'Y$$

is an efficient estimator of $\beta$ and that the conventional t-test and F-test used to make inference about the model are valid statistical procedures. If the error terms in the regression model have unequal variances we say that there is heteroscedasticity.

Definition 1.1 Let $\{u_t\}$ be a time series. Then the series is said to be heteroscedastic if

$$\sigma^2(t) \neq \sigma^2(s) \quad \text{for some } t \neq s.$$

Heteroscedasticity thus refers to a situation where the variances of the error terms $\{u_t\}$ are unequal. Since this is an undesirable phenomenon, the GLM assumption that rules it out is called the assumption of homogeneous variances, i.e. equal variances.

Whenever there is heteroscedasticity in the error terms all inference namely estimation,
hypothesis testing and forecasting must take into account the effects of heteroscedasticity
for the conclusions to be valid.

We examine below some common causes of heteroscedasticity and how they can be avoided or taken into account when making statistical inference.

1.2 Causes or sources of heteroscedasticity


The assumption of spherical disturbances, as indicated earlier, involves the double assumption that the error terms have equal variances and are uncorrelated. It is possible, of course, for the error series to be uncorrelated but to have unequal variances. In this case the covariance matrix of the error terms is diagonal with unequal diagonal entries. Heteroscedasticity can be caused by a number of factors. These include:

1. MIS-SPECIFICATION: Some economic variables such as the Consumer Price Index (CPI) or GDP tend to increase linearly or exponentially. If such variables are omitted from the regression they will be absorbed in the error term $u_t$, which will then exhibit changing variance. For example, suppose that a model of the form $Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + u_t$ is wrongly specified as $Y_t = \beta_0 + \beta_1 X_{1t} + v_t$. Then $v_t = \beta_2 X_{2t} + u_t$, so if $\{X_{2t}\}$ is increasing with time then (for $\beta_2 > 0$) so will $\{v_t\}$.

Heteroscedasticity due to mis-specification, whether by exclusion of important explanatory variables or by assuming a linear relation when in fact a non-linear relationship exists, is quite common. The solution to the problem, if detected, is simply to correct the specification. Other treatments of the problem are also possible, as we will see later.

2. STRATIFICATION: Different economic units or populations are hardly homogeneous. Data for two different groups or populations can exhibit unequal variances for many reasons. For example, income figures for low and high income groups in general show different variability or spread of values. Similarly, data for small firms will not show as much variability as data for large firms, whose economic activities are on a larger scale. Smaller firms are unlikely to engage in extensive and/or competitive research and development since they may not have the leverage, i.e. assets, liquidity, economies of scale etc. As there are also greater risks involved in these activities, we would expect variability to be more pronounced for larger firms.

3. DATA TREATMENT: Data manipulation such as data aggregation and grouping techniques tend to produce marked heterogeneity. Use of indices and choice or change of base year can cause heteroscedasticity.

4. DATA COLLECTION PROCEDURES: Sampling procedures such as cluster sampling can easily generate unequal variances.

5. ADMINISTRATIVE INTERFERENCE: Sometimes, for socio-political reasons, statistical data are interfered with so that some (types or groups of) figures are changed so as to make them appear larger or smaller than they really are. In addition, statistical acts and their enforcement can result in marked differences in data, especially for data collected during different periods.

From the foregoing it is clear that it is not always safe to assume that error terms are
homogeneous over all economic units being observed.

As in the previous Unit, before we examine the treatment of heteroscedasticity, we first examine the impact, i.e. consequences, of heteroscedasticity on inference.

1.3 Impact of heteroscedasticity on estimation and inference

We have already seen the main effects of non-spherical disturbances. In particular we have seen that if $\hat\beta$ is the OLS estimate of $\beta$ in the GLM $Y = X\beta + u$, then

$$\mathrm{cov}(\hat\beta) = (X'X)^{-1}(X'\Omega X)(X'X)^{-1}$$

which differs from the conventional formula $\sigma^2 (X'X)^{-1}$ and so implies biased variance estimation and hence a general lack of accuracy in all subsequent inference. In particular, heteroscedasticity implies that

1. Model coefficients estimated by OLS, although still unbiased, are no longer efficient and so are less accurate.

2. The error variance $\sigma^2$, and hence the standard errors of the coefficients, are wrongly estimated (often underestimated) by OLS.

3. The estimated model has low predictive power.

Thus, as indicated earlier, there should be routine checks for heteroscedasticity.
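
The covariance formula above can be recovered in a few lines. The following is a sketch under the usual assumptions that $X$ is non-stochastic and $E(u) = 0$; it uses only the symbols already defined in this unit.

```latex
\begin{align*}
\hat\beta &= (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'u, \\
E(\hat\beta) &= \beta + (X'X)^{-1}X'E(u) = \beta, \\
\mathrm{cov}(\hat\beta) &= E\big[(\hat\beta - \beta)(\hat\beta - \beta)'\big]
  = (X'X)^{-1}X'\,E(uu')\,X(X'X)^{-1} \\
  &= (X'X)^{-1}(X'\Omega X)(X'X)^{-1}.
\end{align*}
```

Only when $\Omega = \sigma^2 I$ does the last line collapse to the conventional $\sigma^2 (X'X)^{-1}$, which is why the usual standard errors cannot be trusted under heteroscedasticity.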


Further, as for autocorrelation, heteroscedasticity can take on various forms. Typical or standard heteroscedasticity takes the form

$$\Omega = \mathrm{cov}(u, u) = E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$

It is, however, not enough to just assume that $\Omega = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$.

We do not have enough degrees of freedom, as there are $(p + n)$ parameters

$$\beta_0, \ldots, \beta_{p-1} \quad \text{and} \quad \sigma_1^2, \ldots, \sigma_n^2$$

to be estimated from only $n$ observations. So to make the study or analysis feasible we must impose further restrictions on the structure of $\Omega$. This is discussed below under Estimation and Testing for heteroscedasticity.

1.4 Estimation and Testing for heteroscedasticity


As indicated above, in order to estimate the parameters of a model accurately in the presence of heteroscedasticity, it is necessary to make some simplifying but attainable assumptions.

1.4.1 Grouped heteroscedasticity: Goldfeld-Quandt test

Suppose that it is possible to identify groups $G_1, G_2, \ldots, G_m$ such that error variances are homogeneous, i.e. equal, within a group but possibly differ from group to group. Let $n_g$, $g = 1, 2, \ldots, m$, be the number of observations in the $g$th group.

Estimation in the presence of grouped heteroscedasticity

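The essence of the GLS procedure in producing efficient parameter estimates is to weight each observation, or group of observations, in inverse proportion to the variability at the corresponding level. In this case it is easy to verify that an appropriate transformation matrix $P = \Omega^{-1/2}$ is given by

$$P = \begin{pmatrix} 1/\sigma_1 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n \end{pmatrix}$$

It follows from the Gauss-Markov theorem, applied to the transformed model $PY = PX\beta + Pu$ whose disturbances have covariance matrix $P\Omega P' = I$, that the generalised least squares estimator $\hat\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$ is the best linear unbiased estimator (BLUE) of $\beta$.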

Solutions to heteroscedastic disturbances. If heteroscedasticity is established or detected by any test, the solution is to make an appropriate transformation of the data or model in such a way as to obtain a form in which the disturbance terms have a constant variance. This in general depends on the form, or particular type, of heteroscedasticity. For example, if the heteroscedasticity is of the form

$$\sigma^2(t) = \sigma^2 X_t^2$$

so that the standard deviation of $u_t$ increases proportionally with $X_t$, then an appropriate transformation of the model $Y_t = \beta_0 + \beta_1 X_t + u_t$ is

$$\frac{Y_t}{X_t} = \frac{\beta_0}{X_t} + \beta_1 + \frac{u_t}{X_t}$$

i.e.

$$Y_t^* = \beta_0^* + \beta_1^* X_t^* + u_t^*$$

where $Y_t^* = Y_t/X_t$, $\beta_0^* = \beta_1$, $\beta_1^* = \beta_0$, $X_t^* = 1/X_t$ and $u_t^* = u_t/X_t$.

It is then easy to see that $\mathrm{var}(u_t^*) = \mathrm{var}(u_t/X_t) = \frac{1}{X_t^2}\,\sigma^2 X_t^2 = \sigma^2$, a constant.
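
The transformation can be tried out numerically. The sketch below is illustrative only: the function names and the small data set are hypothetical (they do not come from this unit), and it assumes every $X_t$ is non-zero. It regresses $Y_t/X_t$ on $1/X_t$ by ordinary least squares and then swaps the two coefficients back, as in the starred model.

```python
def ols_simple(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def fit_with_sd_proportional_to_x(x, y):
    """Estimate (beta0, beta1) in Y = beta0 + beta1*X + u when var(u) = sigma^2 * X^2.

    Divides the model through by X, runs OLS on the transformed data,
    then maps the starred coefficients back to the original ones.
    """
    y_star = [yi / xi for xi, yi in zip(x, y)]  # Y* = Y / X
    x_star = [1.0 / xi for xi in x]             # X* = 1 / X
    intercept_star, slope_star = ols_simple(x_star, y_star)
    beta0 = slope_star      # coefficient on 1/X is the original intercept
    beta1 = intercept_star  # intercept of the starred model is the original slope
    return beta0, beta1


# Hypothetical data lying exactly on the line Y = 2 + 3X (no noise),
# chosen so that the recovered coefficients are easy to check by eye.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [2.0 + 3.0 * xi for xi in x]
beta0_hat, beta1_hat = fit_with_sd_proportional_to_x(x, y)
print(beta0_hat, beta1_hat)
```

With noisy data the same two steps give the weighted least squares estimates with weights $1/X_t^2$.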

In practice $\Omega$ is unknown. Let $\hat b_g = (X_g'X_g)^{-1}X_g'Y_g$ be the estimator of $\beta$ based on the $g$th sub-group and let $SSE_g$ be the corresponding error sum of squares. Then $\hat b_g$ is a BLUE estimator for $\beta$ within the sub-group, since within sub-group variability is assumed to be homogeneous. It follows that $s_g^2 = SSE_g/(n_g - p)$ is an unbiased and consistent estimator of $\sigma_g^2$. A feasible, i.e. estimated, GLS estimator of $\beta$ which is asymptotically efficient can thus be obtained by substituting

$$\hat\Omega^{-1/2} = \hat P = \mathrm{diag}\big(\underbrace{1/s_1, \ldots, 1/s_1}_{n_1 \text{ times}}, \; \ldots, \; \underbrace{1/s_m, \ldots, 1/s_m}_{n_m \text{ times}}\big)$$

for the unknown $\Omega^{-1/2}$.
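
The two-step procedure just described can be sketched for the simple regression $Y = \beta_0 + \beta_1 X + u$ (so $p = 2$). The function names and the data are hypothetical, chosen only for illustration. Step one fits a separate regression in each group to obtain $s_g^2$; step two weights every observation by $1/s_g^2$, which is equivalent to running OLS on the data pre-multiplied by $\hat P$.

```python
def wls_simple(x, y, w=None):
    """Weighted least squares for y = a + b*x (w=None gives OLS).

    Returns (a, b, sse), where sse is the unweighted residual sum of squares.
    """
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, sse


def fgls_grouped(x, y, group, p=2):
    """Two-step feasible GLS under grouped heteroscedasticity."""
    s2 = {}
    for g in sorted(set(group)):
        xg = [xi for xi, gi in zip(x, group) if gi == g]
        yg = [yi for yi, gi in zip(y, group) if gi == g]
        _, _, sse_g = wls_simple(xg, yg)   # step 1: OLS within group g
        s2[g] = sse_g / (len(xg) - p)      # s_g^2 = SSE_g / (n_g - p)
    weights = [1.0 / s2[gi] for gi in group]  # squared diagonal entries of P-hat
    a, b, _ = wls_simple(x, y, weights)       # step 2: weighted regression on all data
    return a, b, s2


# Hypothetical data: two groups of four observations each
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 6.0, 4.5, 8.5, 7.0]
group = [1, 1, 1, 1, 2, 2, 2, 2]
a_hat, b_hat, s2 = fgls_grouped(x, y, group)
print(a_hat, b_hat, s2)
```

Each group needs more than $p$ observations and a non-zero residual sum of squares, otherwise $s_g^2$ is undefined or zero and the weights cannot be formed.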

Testing for grouped and increasing i.e. ordered heteroscedasticity

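The tests for heteroscedasticity discussed here assume that the disturbance terms $\{u_t\}$ are

(i) uncorrelated

(ii) normally distributed

The hypothesis to be tested is

$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2 \quad \text{versus} \quad H_1: \sigma_1^2 \le \sigma_2^2 \le \cdots \le \sigma_m^2$$

with at least one strict inequality under $H_1$.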

In the case of two groups the Goldfeld-Quandt testing procedure consists of performing a separate regression on each of the two sets of observations. Let $SSE_1$ and $SSE_2$ denote the residual sums of squares from the first and second regressions respectively, and let $s_g^2 = SSE_g/(n_g - p)$, $g = 1, 2$, be the corresponding mean squares, where $p$ is the number of model parameters. Let $s_{\max}^2 = \max(s_1^2, s_2^2)$ and $s_{\min}^2 = \min(s_1^2, s_2^2)$. Then the test statistic, which is given by

$$F = \frac{s_{\max}^2}{s_{\min}^2}$$

has, under the assumption of equal variances, an F-distribution with $(n_{\max} - p, \; n_{\min} - p)$ degrees of freedom, where $n_{\max}$ and $n_{\min}$ are the numbers of observations in the groups with the larger and the smaller mean square respectively.
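
As a minimal sketch of the two-group calculation, the helper below takes the two residual sums of squares and group sizes and returns the F ratio with the larger mean square in the numerator, together with the matching degrees of freedom. The function name and the figures passed to it are hypothetical.

```python
def goldfeld_quandt_two_groups(sse1, n1, sse2, n2, p):
    """Return (F, df_numerator, df_denominator) for the two-group test.

    Each SSE is divided by its own degrees of freedom (n_g - p) before the
    ratio is formed, and the larger mean square goes in the numerator.
    """
    ms1 = sse1 / (n1 - p)
    ms2 = sse2 / (n2 - p)
    if ms1 >= ms2:
        return ms1 / ms2, n1 - p, n2 - p
    return ms2 / ms1, n2 - p, n1 - p


# Hypothetical figures: SSE = 12 from 10 observations, SSE = 40 from 12, p = 2
f_stat, df_num, df_den = goldfeld_quandt_two_groups(12.0, 10, 40.0, 12, p=2)
print(f_stat, df_num, df_den)
```

The returned statistic is then compared with the tabulated critical value of the F-distribution on (df_num, df_den) degrees of freedom at the chosen significance level.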

In the case where the variance is a function of an exogenous, i.e. explanatory, variable, the procedure is to order the data according to the values of $X_t$ and to omit, say, $c$ central or middle values. The value of $c$ is usually chosen so that about 10%, or at most 25%, of the observations are omitted. Perform the two regressions as in the case of two groups, one on the first $(n - c)/2$ and one on the last $(n - c)/2$ observations. Let $F$ be the ratio of the mean error sums of squares from the two regressions. Under the assumption of equal variances, the statistic

$$F = \frac{SSE_{\max}/[(n - c - 2p)/2]}{SSE_{\min}/[(n - c - 2p)/2]}$$

has an F-distribution with $[(n - c - 2p)/2, \; (n - c - 2p)/2]$ degrees of freedom, where $p$ is the number of parameters.

Example 1.1 The following data show national consumption and income in millions
of dollars.

consumption Y income X

1.1 1.0
2.0 2.0
2.5 3.0
2.6 4.0
2.9 5.0
3.0 6.0
3.2 7.0

Use the Goldfeld-Quandt test to test for heteroscedasticity in the linear regression model $Y = \beta_0 + \beta_1 X + u$ for consumption as a function of income. Use a 5% significance level.

Solution 1.1 The sample size is $n = 7$. Thus the number of central observations to be excluded from the two regressions is $c = 10\%(7) \approx 1$. The error sum of squares from the regression on the first 3 observations is $SSE_1 \approx 0.02667$. Similarly the error sum of squares from the regression on the last 3 observations is $SSE_2 \approx 0.00167$. Thus with $n = 7$, $c = 1$ and $p = 2$, each regression has $(n - c - 2p)/2 = 1$ degree of freedom and the test statistic is

$$F = \frac{\max(SSE_1, SSE_2)/1}{\min(SSE_1, SSE_2)/1} = \frac{0.02667}{0.00167} \approx 16$$

Using $[(n - c - 2p)/2, \; (n - c - 2p)/2] = [1, 1]$ degrees of freedom, the critical region is $[F_{1,1,0.05}, \infty) = [161.4, \infty)$. Thus $H_0$ is not rejected and we conclude that the error terms have (approximately) equal variances.
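
The arithmetic of the example can be checked by recomputing the two sub-sample regressions directly from the data. The sketch below is one way to do this; the function names are hypothetical, it handles the simple regression case $p = 2$ only, and it leaves the comparison with the tabulated critical value to the reader.

```python
def sse_simple(x, y):
    """Residual sum of squares from the OLS fit of y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt_ordered(x, y, c, p=2):
    """Order by x, drop c central observations, regress on each half.

    Returns (F, df, sse_low, sse_high); both halves have the same
    degrees of freedom df, so the ratio of SSEs equals the ratio of
    mean squares.
    """
    pairs = sorted(zip(x, y))      # order the data by the values of X
    n = len(pairs)
    m = (n - c) // 2               # observations kept in each half
    low, high = pairs[:m], pairs[n - m:]
    sse_low = sse_simple([u for u, _ in low], [v for _, v in low])
    sse_high = sse_simple([u for u, _ in high], [v for _, v in high])
    f = max(sse_low, sse_high) / min(sse_low, sse_high)
    return f, m - p, sse_low, sse_high


# Data of Example 1.1
income = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
consumption = [1.1, 2.0, 2.5, 2.6, 2.9, 3.0, 3.2]
f_stat, df, sse_low, sse_high = goldfeld_quandt_ordered(income, consumption, c=1)
print(sse_low, sse_high, f_stat, df)
```

With only three observations per half the test has a single degree of freedom in the numerator and in the denominator, so it has very little power on a data set this small.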

Another test for grouped heteroscedasticity, but one less frequently used in Econometrics, is Bartlett's test. The testing procedure is as follows:

The test statistic is

$$Q = n^* \ln s^2 - \sum_{g=1}^{m} (n_g - 1) \ln s_g^2$$

where $n^* = \sum_{g=1}^{m} n_g - m$,

$$s^2 = \frac{\sum_{g=1}^{m} (n_g - 1)\, s_g^2}{\sum_{g=1}^{m} n_g - m}$$

and $s_g^2$ is the usual unbiased estimate of the variance for the $g$th sub-group. Under $H_0$, $Q$ follows an approximate $\chi^2$-distribution with $m - 1$ degrees of freedom. Thus $H_0$ is rejected at the $\alpha$ significance level if $Q > \chi^2_{m-1,\alpha}$.
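
A minimal sketch of the statistic $Q$ follows, assuming the group variances $s_g^2$ and group sizes $n_g$ have already been computed. The function name and the numbers are hypothetical. The code implements $Q$ as written in this unit, without the small-sample correction factor by which some textbooks divide the statistic.

```python
import math


def bartlett_q(variances, sizes):
    """Return (Q, degrees_of_freedom) for the uncorrected Bartlett statistic.

    variances: unbiased variance estimates s_g^2, one per group
    sizes:     number of observations n_g in each group
    """
    m = len(variances)
    n_star = sum(sizes) - m  # pooled degrees of freedom
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / n_star
    q = n_star * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(sizes, variances)
    )
    return q, m - 1


# Hypothetical input: two groups of five observations with variances 1 and 4
q_stat, q_df = bartlett_q([1.0, 4.0], [5, 5])
print(q_stat, q_df)
```

When all group variances are equal the pooled variance coincides with each of them and $Q$ is zero; the further apart the variances, the larger $Q$ becomes.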

1.5 Forecasting in the presence of heteroscedasticity


Suppose that we have detected heteroscedasticity and that we have transformed the data and obtained estimates of the parameters of the model using GLS. Let $Y^*, X_1^*, X_2^*, \ldots, X_k^*$ be the transformed variables or data, let $\hat\beta^*$ be the resulting GLS estimate of $\beta$, and let $X_{n+1} = (X_{1(n+1)}, X_{2(n+1)}, \ldots, X_{k(n+1)})'$ be the values of the explanatory variables at time $t = n + 1$. Then it can be shown that the most efficient prediction of $Y_{n+1}$ is given by

$$\hat Y(X_{n+1}) = X_{n+1}' \hat\beta^*$$
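
For reference, the point forecast and its error variance can be written out as below. This is a sketch under two assumptions that the unit does not state explicitly: that the future disturbance $u_{n+1}$ is uncorrelated with the sample disturbances, and that its variance $\sigma_{n+1}^2$ is known or can be obtained from the assumed form of heteroscedasticity. Here $\hat\beta^*$ denotes the GLS estimate.

```latex
\begin{align*}
\hat Y_{n+1} &= X_{n+1}'\hat\beta^*, \qquad
\hat\beta^* = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y, \\
\mathrm{var}\big(Y_{n+1} - \hat Y_{n+1}\big)
  &= \sigma_{n+1}^2 + X_{n+1}'\,(X'\Omega^{-1}X)^{-1}\,X_{n+1}.
\end{align*}
```

The first term is the variance of the new disturbance and the second reflects the sampling error of $\hat\beta^*$, so a forecast interval under heteroscedasticity is wider where $\sigma_{n+1}^2$ is larger.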

1.6 Summary of the Unit


In this Unit we have learnt that in practice there are several causes or sources of heteroscedasticity. These include mis-specification, i.e. omission of relevant explanatory variables, stratification, administrative interference, data treatment etc. The undesirable consequences of heteroscedasticity include biased variance estimation and hence inefficient parameter estimation, low forecasting power of the resulting model etc. Heteroscedasticity can be checked by plotting residuals against each predictor, or by plotting residuals against index or time. Specific heteroscedasticity can be tested for more formally by conducting, for example, the Goldfeld-Quandt test. As for auto-correlation, compensation for the effects of heteroscedasticity can be achieved by Generalized Least Squares (GLS), noting as before that if heteroscedasticity is generated by mis-specification then there can be no effective remedy other than appropriate re-specification. Further, heteroscedasticity and other violations of the GLM assumptions can occur simultaneously. Inference, in particular forecasting, in the presence of heteroscedasticity can be made in essentially the same way as for auto-correlation.

Activity 1.1

1. Give an example of an econometric model in which the error series $\{u_t\}$ is

(a) heteroscedastic but uncorrelated

(b) homoscedastic but correlated

2. Suppose that heteroscedasticity is of the form

$$\sigma_t^2 = \sigma^2 X_t$$

Determine an appropriate transformation of $Y$, $X$ and $u$ so that the model

$$Y^* = \beta_0 + \beta_1 X^* + u^*$$

is homoscedastic. Justify your answer.

3. Prove that for grouped heteroscedasticity an appropriate transformation (in what sense?) is given by

$$P = \begin{pmatrix} 1/\sigma_1 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n \end{pmatrix}$$

4. The Phillips curve is a plot of percentage change in wages as a function of unemployment. Consider the wage-unemployment data given in Unit 2.

(i) Plot the Phillips curve using unemployment, not its reciprocal.
(ii) Fit the regression model

wage = β0 + β1 unemployment + u

and test for heteroscedasticity.

(iii) If heteroscedasticity is established, transform the data or model and test for heteroscedasticity again in the transformed model.

5. Show how the degrees of freedom for the G-Q test are derived.

6. The following table shows household expenditure data.

household consumption income income group

1 22 29 1
2 22 20 1
3 20 14 1
4 24 21 1
5 12 6 1
6 30 15 2
7 32 9 2
8 26 1 2
9 26 6 2
10 37 19 2
11 12 16 3
12 8 31 3
13 13 26 3
14 25 35 3
15 7 12 3
16 23 5 4
17 25 25 4
18 28 16 4
19 26 10 4
20 23 24 4

(a) Use the Goldfeld-Quandt test to test for heteroscedasticity in the linear regression model $Y = \beta_0 + \beta_1 X + u$ for consumption as a function of income in each of the following cases.
(i) using data for income groups 1 and 2 only and c = 0.
(ii) using data for income groups 1 and 2 only and c = 2.
(iii) using all the data.
In each case use a 5% significance level.

(b) Repeat part (a)(iii) using the Bartlett test.

(c) Explain why Bartlett's test is rarely used to test for heteroscedasticity and state, with reasons, a situation where you would justifiably use it.
