Robust Variance Equality Tests
To cite this article: Morton B. Brown & Alan B. Forsythe (1974) Robust Tests for the Equality of Variances, Journal of the American
Statistical Association, 69:346, 364-367
Robust Tests for the Equality of Variances
MORTON B. BROWN and ALAN B. FORSYTHE*
Alternative formulations of Levene’s test statistic for equality of variances are found to be robust under nonnormality. These statistics use more robust estimators of central location in place of the mean. They are compared with the unmodified Levene’s statistic, a jackknife procedure, and a $\chi^2$ test suggested by Layard which are all found to be less robust under nonnormality.

* Morton B. Brown is senior lecturer, Tel-Aviv University, Tel-Aviv, Israel, formerly visiting research statistician, Department of Biomathematics, University of California, Los Angeles (1972-73). Alan B. Forsythe is supervising statistician, Department of Biomathematics, Health Sciences Computing Facility-AV 111, Center for Health Sciences, University of California, Los Angeles, Calif. 90024. This research was supported by NIH Special Research Resources Grant RR-3. Both authors contributed equally to the research in this article and are listed alphabetically.

© Journal of the American Statistical Association, June 1974, Volume 69, Number 346, Applications Section

1. INTRODUCTION

In many statistical applications a test of the equality of variances is of interest. Examples are found in the lot-to-lot reproducibility of a manufacturing process and in the importance of the homogeneity of variances assumption in many statistical procedures. Unfortunately, the common F-ratio and Bartlett’s test are very sensitive to the assumption that the underlying populations are from a Gaussian distribution [3, 12]. When the underlying distributions are nonnormal, these tests can have an actual size several times larger than their nominal level of significance.

Two recent papers [7, 9] have compared several alternative statistics for this problem. The statistics least sensitive to nonnormality were of the same form: each group is randomly divided into subgroups; the standard deviation of each subgroup is calculated; and a one-way ANOVA is calculated between groups using the logarithms of the subgroup variances as data points [2, 3]. Different results may be produced by alternate subdivisions of the data points into subgroups. These procedures were not considered because of the nonuniqueness of the results and the loss of power caused by subdividing the samples [12].

Levene [10] proposed a statistic for equal sample sizes which was subsequently generalized to unequal sample sizes [6]. The statistic is obtained from a one-way ANOVA between groups, where each observation has been replaced by its absolute deviation from its group mean. Miller [11] has pointed out that for very small samples the high correlations between deviations in the same group destroys the validity of the test. However, empirical sampling with ten or more observations per group does not indicate that this is the problem. In our sampling this statistic was not robust when the underlying populations were skew. This led us to consider the median and ten percent trimmed mean, more robust estimates of central locations [1, 8, 13], as alternates to the mean in the calculation of absolute deviations.

The ten percent trimmed mean is the mean of the observations after deleting the ten percent largest and ten percent smallest values in that group. The median can be considered a 50 percent trimmed mean and the mean is a zero percent trimmed mean. The choice of ten percent trimming is arbitrary.

This investigation of the robustness of these statistics is limited to two groups since their robustness or lack of it is well illustrated in this simple case. When the underlying distribution is Gaussian, the usual F-test is a check on the sampling experiment. For purposes of comparison, the jackknife procedure of Miller [12] and a $\chi^2$ test presented by Layard [9] are included.

2. DEFINITION OF THE STATISTICS

Let $x_{ij} = \mu_i + e_{ij}$ be the $j$th ($j = 1, \ldots, n_i$) observation in the $i$th group ($i = 1, \ldots, g$), where the means $\mu_i$ are neither known nor assumed equal. The $e_{ij}$ are independent and similarly distributed with zero mean and possibly unequal variances. For each group the sample mean ($\bar{x}_i$) and sample variance ($s_i^2$) are estimated in the usual manner. The statistics compared are:

The usual F-test: $F = s_1^2 / s_2^2$. The critical values of $F$ are taken from the Snedecor F-table with $n_1 - 1$ and $n_2 - 1$ degrees of freedom (df) for the $\alpha/2$ and $1 - \alpha/2$ percentiles. Bartlett’s test statistic [9] yielded essentially the same results for the two-sample case and therefore is not reported.

Levene [6, 10] suggested the following statistic: Let $z_{ij} = |x_{ij} - \bar{x}_i|$. Then form the one-way ANOVA statistic

$$W_0 = \frac{\sum_i n_i (\bar{z}_{i\cdot} - \bar{z}_{\cdot\cdot})^2 / (g - 1)}{\sum_i \sum_j (z_{ij} - \bar{z}_{i\cdot})^2 / \sum_i (n_i - 1)},$$

where

$$\bar{z}_{i\cdot} = \sum_j z_{ij} / n_i \quad \text{and} \quad \bar{z}_{\cdot\cdot} = \sum_i \sum_j z_{ij} / \sum_i n_i.$$

The critical values of $W_0$ are obtained from the Snedecor F-table with $g - 1$ and $\sum_i (n_i - 1)$ df. Alternate formulations considered by Levene were to replace $z_{ij}$ by $\sqrt{z_{ij}}$ or by $\log(z_{ij})$. Since, in our empirical sampling, both are less powerful than using $z_{ij}$ (although $\sqrt{z_{ij}}$ is quite similar), only $W_0$’s results are reported.

Replacing the mean $\bar{x}_i$ by the median in $z_{ij}$
defines $W_{50}$. By replacing the mean $\bar{x}_i$ by $\tilde{x}_i$, where $\tilde{x}_i$ is the ten percent trimmed mean of the $i$th group, $W_{10}$ is defined.

Layard [9] suggested a $\chi^2$ test statistic which is a function of the kurtosis. The kurtosis is estimated by pooling the numerators and denominators of the individual estimates of kurtosis for each group. Using his notation let [...] is the estimate of the kurtosis. Then $S'$ is asymptotically distributed as a $\chi^2$ variable with $(g - 1)$ df.

Miller’s [12] jackknife procedure was generalized by Layard [9]. Let

$$s_{i(j)}^2 = \frac{1}{n_i - 2} \sum_{k \neq j} (x_{ik} - \bar{x}_{i(j)})^2, \qquad \bar{x}_{i(j)} = \frac{1}{n_i - 1} \sum_{k \neq j} x_{ik} \quad (i = 1, \ldots, g),$$

and let

$$U_{ij} = n_i \log s_i^2 - (n_i - 1) \log s_{i(j)}^2.$$

Compute an F-statistic from a one-way analysis of variance on the $U_{ij}$, namely,

$$J = \frac{\sum_i n_i (\bar{U}_{i\cdot} - \bar{U}_{\cdot\cdot})^2 / (g - 1)}{\sum_i \sum_j (U_{ij} - \bar{U}_{i\cdot})^2 / \sum_i (n_i - 1)},$$

and test $H_0$ by approximating the null distribution of $J$ by the $F_{g-1,\, \sum_i (n_i - 1)}$ distribution.

3. DESIGN OF THE SAMPLING EXPERIMENT

Pseudorandom numbers were generated from four underlying distributions: the Gaussian by the Box-Muller method [4]; the chi-square distribution with four df as proportional to the logarithm of the product of two uniform random numbers; the Student-t with four df as proportional to the ratio of a Gaussian to the square root of a chi square with four df; and the Cauchy as the ratio of two Gaussians. The chi-square distribution is skewed to the right while the last two distributions are symmetric but long-tailed. The long-tailedness of the Cauchy is more pronounced than that of the $t_4$. For each distribution a different series of random numbers was used.

A set of 80 random numbers were generated at one time. Pairs of groups of sample sizes (40, 40), (20, 40), (10, 10), and (10, 20) were selected from the set of 80 observations and used in the calculation of the test statistics. The observations in the second sample were rescaled to reflect the ratios of population variances of the two groups; i.e., 1:1, 1:2, 1:4, 2:1, and 4:1. All the test statistics were computed on the same pairs of groups. These were repeated 1,000 times for each distribution in ten blocks of 100 trials. The numbers of rejections at both the nominal five-percent and one-percent levels of significance were recorded.

4. RESULTS AND CONCLUSIONS

The results of the empirical sampling from the Gaussian, Student’s-t with four df and the chi-square with [...] the reader, the power results are not shown for tests whose empirical sizes are greater than eight percent and are parenthesized for tests whose sizes are between seven and eight percent.

n1, n2    σ1²:σ2²    F       Jackknife   Layard χ²   Levene (W0)   W10     W50

A. Gaussian distribution
40, 40    1:1        6.3     5.8         6.5         6.4           6.1     5.1
          2:1        57.5    54.2        56.3        51.1          50.8    48.4
          4:1        98.2    98.1        98.2        97.1          96.9    96.7
10, 10    1:1        5.4     4.6         6.3         5.5           4.9     2.9
          2:1        16.7    13.9        19.5        15.8          14.7    9.8
          4:1        51.3    42.1        51.1        44.3          41.4    31.7
20, 40    1:1        5.8     5.7         6.2         5.8           5.2     4.5
          2:1        43.4    41.3        37.8        38.8          37.5    32.9
          4:1        92.0    89.4        88.4        88.5          88.0    85.7
          1:2        36.6    38.0        42.9        33.2          32.9    31.1
          1:4        92.0    90.5        93.3        86.3          85.3    83.9
10, 20    1:1        4.8     5.2         6.6         5.7           5.2     4.0
          2:1        24.3    19.7        17.7        21.5          20.4    15.7

When there are equal variances in the two groups, the results for the Gaussian distribution (table, part A) indicate that $W_{50}$ is conservative for small sample sizes. The results for the other test statistics are not inconsistent with the sampling error. The powers of these statistics do not differ greatly when the differences in their empirical size are taken into account.

When the distribution is long-tailed (table, part B), the F-test rejects far too often. For unequal groups, the size of the jackknife statistic is larger than it should be. This may be due to the lack of robustness of the ANOVA when the within group variances are unequal [5]. The Layard $\chi^2$ deviates from its level of significance for the smaller sample sizes. Since an estimate of kurtosis is used in the calculation of the statistic, this result is not surprising. Of the three Levene-type procedures, $W_{10}$ appears the most robust. It varies in size less than $W_0$ and is nearer to 5 percent than $W_{50}$.

Sampling from the Cauchy distribution emphasized the preceding departures from the nominal size. For the four sets of sample sizes considered here, the rejection rates were excessive for all but $W_{10}$ and $W_{50}$. The rates for a nominal five-percent level were 6&80 percent rejection for the F-test, approximately 20 percent for the jackknife, 28-36 percent for Layard’s $\chi^2$, and 15-21 percent for Levene’s $W_0$. For $W_{10}$ the observed rates were 2.8-5.9 percent and for $W_{50}$ 1.8-3.5 percent. Again $W_{10}$ was closer to five percent than $W_{50}$.

The results of sampling from the chi square with four df are reported in the table, part C. Only $W_{50}$ maintains its size near the five-percent level of significance. The others depart from their nominal sizes by rejecting far too often.

The preceding results indicate that the equality of variances in long-tailed distributions can best be tested by a statistic of the form of $W_{10}$ and asymmetric distributions by a statistic similar to $W_{50}$. Therefore, when
departures from normality are anticipated, the estimate of the mean for each group in the Levene statistic should be replaced by a more robust estimate of central location. The loss in power that occurs when $W_{10}$ is used in place of $W_0$ is small relative to the increased probability of a false rejection of the null hypothesis caused by nonnormality.

These results also indicate that future research into robust tests of homogeneity of variances must take into consideration robust estimates of location.

[Received July 1973. Revised January 1974.]

REFERENCES

[1] Andrews, D.F., et al., Robust Estimates of Location: Survey and Advances, Princeton, N.J.: Princeton University Press, 1972.
[2] Bartlett, M.S. and Kendall, D.G., “The Statistical Analysis of Variance-Heterogeneity and the Logarithmic Transformation,” Supplement to the Journal of the Royal Statistical Society, Ser. B, No. 1 (1946), 128-38.
[3] Box, G.E.P., “Non-Normality and Tests on Variances,” Biometrika, 40, No. 1 and 2 (1953), 318-35.
[4] Box, G.E.P. and Muller, M.E., “A Note on the Generation of Normal Deviates,” Annals of Mathematical Statistics, 29, No. 2 (1958), 610-11.
[5] Brown, M.B. and Forsythe, A.B., “The Small Sample Behavior of Some Statistics which Test the Equality of Several Means,” Technometrics, 16, No. 1 (1974), 129-32.
[6] Draper, N.R. and Hunter, W.G., “Transformations: Some Examples Revisited,” Technometrics, 11, No. 1 (1969), 23-40.
[7] Gartside, P.S., “A Study of Methods for Comparing Several Variances,” Journal of the American Statistical Association, 67, No. 338 (June 1972), 342-6.
[8] Huber, P.J., “Robust Statistics: A Review,” Annals of Mathematical Statistics, 43, No. 4 (1972), 1041-67.
[9] Layard, M.W.J., “Robust Large-Sample Tests for Homogeneity of Variances,” Journal of the American Statistical Association, 68, No. 341 (March 1973), 195-8.
[10] Levene, H., “Robust Tests for Equality of Variances,” in I. Olkin, ed., Contributions to Probability and Statistics, Palo Alto, Calif.: Stanford University Press, 1960, 278-92.
[11] Miller, R.G., Jr., Appeared in “Letters to the Editor,” Technometrics, 14, No. 2 (1972), 507.
[12] Miller, R.G., Jr., “Jackknifing Variances,” Annals of Mathematical Statistics, 39, No. 2 (1968), 567-82.
[13] Tukey, J.W. and McLaughlin, D.H., “Less Vulnerable Confidence and Significance Procedures for Location Based on a Single Sample: Trimming/Winsorization I,” Sankhyā, Ser. A, 25, No. 3 (1963), 331-52.
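
The three Levene-type statistics of Section 2 differ only in the estimate of central location used to form the absolute deviations. The following is a minimal Python sketch of that idea, assuming NumPy is available; it is an illustration and not the authors' program. The function names `trimmed_mean` and `levene_type_statistic`, the `center` keyword, and the rule of dropping floor(0.10 n) observations from each tail are assumptions of this sketch and are not taken from the text.

```python
import numpy as np


def trimmed_mean(x, proportion=0.10):
    """Mean after deleting the `proportion` largest and smallest values.

    Assumption of this sketch: floor(proportion * n) values are
    dropped from each tail.
    """
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.floor(proportion * x.size))
    return x[k:x.size - k].mean()


def levene_type_statistic(groups, center="mean"):
    """One-way ANOVA ratio on absolute deviations from a group center.

    center="mean" corresponds to W0, "trimmed" to W10 and
    "median" to W50.
    """
    centers = {"mean": np.mean, "trimmed": trimmed_mean, "median": np.median}
    locate = centers[center]
    # z_ij = |x_ij - center of group i|
    z = [np.abs(np.asarray(x, dtype=float) - locate(x)) for x in groups]
    n = np.array([zi.size for zi in z])
    g = len(z)
    z_group = np.array([zi.mean() for zi in z])   # group means of z
    z_grand = np.concatenate(z).mean()            # grand mean of z
    between = np.sum(n * (z_group - z_grand) ** 2) / (g - 1)
    within = sum(np.sum((zi - m) ** 2) for zi, m in zip(z, z_group))
    within = within / np.sum(n - 1)
    return between / within


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0]
w0 = levene_type_statistic([a, b], center="mean")
w50 = levene_type_statistic([a, b], center="median")
```

The returned value would be referred to an F critical value with $g - 1$ and $\sum_i (n_i - 1)$ df, as described in Section 2. Passing the location estimate as a parameter keeps the ANOVA step identical across the three statistics, which mirrors how they are defined in the text.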
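
The generators described in Section 3 can be sketched in a few lines. This is a hedged Python illustration, assuming NumPy's `Generator` as the source of uniform random numbers; it is not a reproduction of the original sampling program. The function names, the use of $-2 \log(u_1 u_2)$ as the constant of proportionality for the chi-square, and the convention that `variance_ratio` means the second group's variance divided by the first's are all assumptions of this sketch.

```python
import numpy as np


def box_muller(rng, size):
    """Standard Gaussian deviates from pairs of uniforms (Box-Muller)."""
    u1 = 1.0 - rng.random(size)  # shift to (0, 1] so the log is finite
    u2 = rng.random(size)
    return np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)


def chi_square_4(rng, size):
    """Chi-square with 4 df as -2 * log of a product of two uniforms."""
    u1 = 1.0 - rng.random(size)
    u2 = 1.0 - rng.random(size)
    return -2.0 * np.log(u1 * u2)


def student_t4(rng, size):
    """Student-t with 4 df: Gaussian over sqrt(chi-square_4 / 4)."""
    return box_muller(rng, size) / np.sqrt(chi_square_4(rng, size) / 4.0)


def cauchy(rng, size):
    """Cauchy deviates as the ratio of two independent Gaussians."""
    return box_muller(rng, size) / box_muller(rng, size)


def draw_pair(sampler, rng, n1, n2, variance_ratio=1.0):
    """Draw two groups and rescale the second one.

    Assumption: variance_ratio = (variance of group 2) / (variance of
    group 1), so the second sample is multiplied by its square root.
    """
    x1 = sampler(rng, n1)
    x2 = np.sqrt(variance_ratio) * sampler(rng, n2)
    return x1, x2


rng = np.random.default_rng(0)
x1, x2 = draw_pair(student_t4, rng, 10, 20, variance_ratio=2.0)
```

Each call to `draw_pair` yields one pair of groups; repeating it and counting how often a given test statistic exceeds its critical value would give an empirical size (ratio 1) or power (other ratios) in the spirit of the experiment described above.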