
Institute of Mathematical Statistics

This document reviews various criteria that have been proposed for identifying outliers in a sample from a normal population that may be contaminated. The criteria can be grouped into those that assume knowledge of the population variance versus those that do not. Criteria in the first group include $\chi^2$ tests and measures of extreme deviation. Criteria in the second group that rely only on sample information include modified F tests and measures based on the sample range. The document aims to evaluate the performance of these criteria at discovering different types of contamination and potential biases they may introduce.

Analysis of Extreme Values

Author(s): W. J. Dixon
Reviewed work(s):
Source: The Annals of Mathematical Statistics, Vol. 21, No. 4 (Dec., 1950), pp. 488-506
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2236602 .
Accessed: 24/12/2012 01:12

ANALYSIS OF EXTREME VALUES
BY W. J. DIXON¹
University of Oregon
1. Introduction. It is well recognized by those who collect or analyze data
that values occur in a sample of n observations which are so far removed from
the remaining values that the analyst is not willing to believe that these values
have come from the same population. Many times values occur which are "dubious"
in the eyes of the analyst and he feels that he should make a decision as
to whether to accept or reject these values as part of his sample. On the other
hand he may not be looking for an error, but may wish to recognize a situation
when an occasional observation occurs which is from a different population.
He may wish to discover whether a significant analysis of variance indicates an
extreme value significantly different from the remainder. Also, of course, the
extreme value may differ significantly without causing a significant analysis
of variance and he may wish to discover this. It is reasonable to suppose that a
criterion for rejecting observations would be useful here also. The choice of a
suitable criterion for rejecting observations introduces a number of questions.
1. Should any observations be removed if we wish a representative sample in-
cluding whatever contamination arises naturally? In other words, it may be
desirable to describe the population including all observations, for only in that
way do we describe what is actually happening.
2. If the analyst wishes to sample the population unaffected by contamination
he must either remove the contaminating items or employ statistical procedures
which reduce to a minimum the effect of the contamination on the estimates of
the population. That is, he may wish to describe only 95% of his population
if the description is altered radically by the remaining 5% of the observations.
He may have external reasons which are good and sufficient for wishing to de-
scribe only 95% of his observations. Suppose he wishes to use the sample for a
statistical inference; the inclusion of all the data may sufficiently violate the
assumptions underlying the inference to exclude the possibility of making a valid
inference.
This paper will concern itself only with those problems which arise from Question 2.
If we wish to follow some procedure which attempts to remove contamination
we must consider the performance of any proposed criterion with respect to the
proportion of contamination the criterion will discover and, of course, the proportion
of the "good" observations which are removed by the use of the criterion.
But, perhaps more important, we must consider what sort of bias will result
when the standard statistical procedures are applied to samples of observations
which have been processed in this manner.

¹ This paper was prepared under a contract with the Office of Naval Research.

If we wish to follow a procedure which will not search for particular values to
be excluded but will minimize their effect if present, we must investigate the
sampling distributions of these modified statistics and estimate the loss in in-
formation resulting from their use when all observations are "good." We must
also investigate the expected bias which will result when "bad" items are present
even though essentially excluded. Perhaps most disturbing about the avoidance
of "bad" items is the fact that a decision must still be made as to whether a
"bad" item was present or not in order to know in which way our estimates may
be biased. For example, a sample mean computed by avoiding the two end ob-
servations will not be a biased estimate of the mean of a symmetric population
if both end items should actually be included or if both end items should not be
included. However, if only one of the two should not be included this estimate of
the mean will be biased.

2. Models of contamination. The performance of the various criteria for discovery
of one or more contaminators will be measured with reference to contaminations
of the following two types entering into samples of observations
from a normal population with mean $\mu$ and variance $\sigma^2$, $N(\mu, \sigma^2)$:

A. One or more observations from $N(\mu + \lambda\sigma, \sigma^2)$.

B. One or more observations from $N(\mu, \lambda^2\sigma^2)$.


A represents the occurrence of an "error" in mean value such as will occur in
dial readings when errors are made in reading incorrectly digits other than the
last one or two digits. Errors of this sort may result from momentary shifts in
line voltage or from the inclusion among a group of objects of one or two items
of completely different origin. This type of contamination will be referred to as
"location error." B represents the occurrence of an "error" from a population
with the same mean but with a greater variance than the remainder of the sample.
This type of error will be referred to as a "scalar error." It is likely that many
errors could be better described as a combination of A and B, but a study of these
two errors separately should throw considerable light on the question of "gross
errors" or "blunders."
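The two contamination models can be made concrete with a short simulation. The following is a minimal sketch, not part of the original study: the function name `contaminated_sample`, its parameters, and the default values $\mu = 0$, $\sigma = 1$ are assumptions chosen for illustration.

```python
import random


def contaminated_sample(n, k, lam, kind, mu=0.0, sigma=1.0, rng=random):
    """Draw n - k "good" values from N(mu, sigma^2) and k contaminators.

    kind="location": contaminators from N(mu + lam*sigma, sigma^2)  (model A)
    kind="scalar":   contaminators from N(mu, lam^2 * sigma^2)      (model B)
    Returns the two groups separately so a caller can tell them apart.
    """
    if kind not in ("location", "scalar"):
        raise ValueError("kind must be 'location' or 'scalar'")
    good = [rng.gauss(mu, sigma) for _ in range(n - k)]
    if kind == "location":
        bad = [rng.gauss(mu + lam * sigma, sigma) for _ in range(k)]
    else:
        # gauss() takes a standard deviation, so lam^2*sigma^2 becomes lam*sigma.
        bad = [rng.gauss(mu, lam * sigma) for _ in range(k)]
    return good, bad


# Illustrative call: one location error, 5 standard deviations out, n = 5.
rng = random.Random(0)
good, bad = contaminated_sample(n=5, k=1, lam=5.0, kind="location", rng=rng)
print(sorted(good + bad))
```

Keeping the contaminators in a separate list is a convenience for later bookkeeping, since the paper counts a discovery as a success only when the extreme value really is a contaminator.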
Many authors have written on the subject of the rejection of outlying observa-
tions. Apparently none have been successful in obtaining a general solution to
the problem. Nor has there been success in the development of a criterion for
discovery of outliers by means of a general statistical theory; e.g., maximum
likelihood. A large number of criteria have been advanced on more or less intui-
tive grounds as appropriate criteria for this purpose. In no case was investigation
made of the performance of these criteria except for a few illustrative examples.
References for the criteria discussed in the next section are given at the end
of this paper. Indications are given as to the significance values available in
those papers.


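3. Criteria to be considered. The performance of two types of criteria has
been investigated for samples contaminated with location or scalar errors:
a) $\sigma$ known or estimated independently,
b) $\sigma$ unknown.
The n observations are ordered $x_1 < x_2 < \cdots < x_n$. The criteria involving
external knowledge of $\sigma$ are:

A. $\chi^2$ test,

$$\chi^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2}.$$

B. Extreme deviation,

$$B_1 = \frac{x_n - \bar{x}}{\sigma} \quad \left(\text{or } \frac{\bar{x} - x_1}{\sigma}\right),$$

$$B_2 = \frac{x_n - x_{n-1}}{\sigma} \quad \left(\text{or } \frac{x_2 - x_1}{\sigma}\right).$$

C. Range,

$$C_1 = \frac{w}{\sigma} = \frac{x_n - x_1}{\sigma},$$

$$C_2 = \frac{w}{s} = \frac{x_n - x_1}{s} \quad (s \text{ independently estimated}).$$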
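As an illustration of criteria A to C, the sketch below evaluates them for the upper end of a small sample. The function name `known_sigma_criteria`, the sample values, and the values of `sigma` and `s_indep` are hypothetical; critical values must still be taken from the tables cited in the references.

```python
def known_sigma_criteria(xs, sigma, s_indep=None):
    """Criteria A, B1, B2, C1 (and C2 if an independent s is supplied),
    all computed for a suspected outlier at the upper end of the sample."""
    x = sorted(xs)
    n = len(x)
    mean = sum(x) / n
    out = {
        "chi2": sum((v - mean) ** 2 for v in x) / sigma ** 2,  # A
        "B1": (x[-1] - mean) / sigma,    # extreme deviation from the mean
        "B2": (x[-1] - x[-2]) / sigma,   # gap between the two largest values
        "C1": (x[-1] - x[0]) / sigma,    # range in units of sigma
    }
    if s_indep is not None:
        out["C2"] = (x[-1] - x[0]) / s_indep  # range over an independent s
    return out


# Hypothetical data: four readings near 10 and one suspiciously large value.
sample = [9.8, 10.1, 10.0, 9.9, 12.2]
for name, value in known_sigma_criteria(sample, sigma=0.5, s_indep=0.6).items():
    print(name, round(value, 3))
```

Note that B2 and C1 need only the extreme order statistics, which is the computational convenience the paper mentions when comparing them with B1.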

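The criteria involving only the information of a single sample of n observations
are:

D. Modified F test.

1. For single outlier $x_1$,

$$D_1 = \frac{S_1^2}{S^2}, \quad \text{where } S_1^2 = \sum_{i=2}^{n} (x_i - \bar{x}_1)^2, \quad \bar{x}_1 = \sum_{i=2}^{n} x_i / (n-1),$$

$$S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad \bar{x} = \sum_{i=1}^{n} x_i / n$$

(or for $x_n$, $D_1 = S_n^2 / S^2$).

2. For double outliers $x_1, x_2$,

$$D_2 = \frac{S_{1,2}^2}{S^2}, \quad \text{where } S_{1,2}^2 = \sum_{i=3}^{n} (x_i - \bar{x}_{1,2})^2, \quad \bar{x}_{1,2} = \sum_{i=3}^{n} x_i / (n-2)$$

(or for $x_n, x_{n-1}$, $D_2 = S_{n,n-1}^2 / S^2$).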

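E. Ratios of ranges and subranges.

1. For single outlier $x_1$,

$$r_{10} = \frac{x_2 - x_1}{x_n - x_1} \quad \left(\text{or for } x_n,\ r_{10} = \frac{x_n - x_{n-1}}{x_n - x_1}\right).$$

2. For single outlier $x_1$ avoiding $x_n$,

$$r_{11} = \frac{x_2 - x_1}{x_{n-1} - x_1} \quad \left(\text{or for } x_n \text{ avoiding } x_1,\ r_{11} = \frac{x_n - x_{n-1}}{x_n - x_2}\right).$$

3. For single outlier $x_1$ avoiding $x_n, x_{n-1}$,

$$r_{12} = \frac{x_2 - x_1}{x_{n-2} - x_1} \quad \left(\text{or for } x_n \text{ avoiding } x_1, x_2,\ r_{12} = \frac{x_n - x_{n-1}}{x_n - x_3}\right).$$

4. For outlier $x_1$ avoiding $x_2$,

$$r_{20} = \frac{x_3 - x_1}{x_n - x_1} \quad \left(\text{or for } x_n \text{ avoiding } x_{n-1},\ r_{20} = \frac{x_n - x_{n-2}}{x_n - x_1}\right).$$

5. For outlier $x_1$ avoiding $x_2$ and $x_n$,

$$r_{21} = \frac{x_3 - x_1}{x_{n-1} - x_1} \quad \left(\text{or for } x_n \text{ avoiding } x_{n-1}, x_1,\ r_{21} = \frac{x_n - x_{n-2}}{x_n - x_2}\right).$$

6. For outlier $x_1$ avoiding $x_2$ and $x_n, x_{n-1}$,

$$r_{22} = \frac{x_3 - x_1}{x_{n-2} - x_1} \quad \left(\text{or for } x_n \text{ avoiding } x_{n-1}, x_1, x_2,\ r_{22} = \frac{x_n - x_{n-2}}{x_n - x_3}\right).$$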

F. Extreme deviation and standard deviation.

For single outlier $x_n$,

$$F = \frac{x_n - \bar{x}}{s} \quad \left(\text{or for } x_1,\ F = \frac{\bar{x} - x_1}{s}\right).$$

The performance of the large number of criteria listed here will be assessed
with respect to discovery of contamination of the type given in Section 2.
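The single-sample criteria D and E lend themselves to a compact computational sketch. The helper names `sum_sq`, `modified_f` and `dixon_ratio`, the indexing convention `(j, k)` for $r_{jk}$, and the sample values are assumptions made here for illustration; they are not taken from the paper.

```python
def sum_sq(values):
    """Sum of squared deviations about the mean of `values`."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def modified_f(xs, upper=True):
    """Return (D1, D2) for the suspected upper (or lower) end of the sample."""
    if len(xs) < 4:
        raise ValueError("need at least 4 observations")
    x = sorted(xs, reverse=not upper)   # suspected values end up last
    total = sum_sq(x)                   # S^2 over all n observations
    return sum_sq(x[:-1]) / total, sum_sq(x[:-2]) / total


def dixon_ratio(xs, j, k, upper=True):
    """r_jk: gap from the suspect to its j-th neighbour, divided by a range
    that skips the k values at the opposite end (j in 1..2, k in 0..2)."""
    x = sorted(xs)
    n = len(x)
    if j not in (1, 2) or k not in (0, 1, 2):
        raise ValueError("j must be 1 or 2 and k must be 0, 1 or 2")
    if k >= n - 1 - j:
        # Numerator and denominator would coincide (e.g. r22 with n = 5).
        raise ValueError("sample too small for this ratio")
    if not upper:
        # Mirror the sample so the suspected low value becomes the largest.
        x = [-v for v in reversed(x)]
    return (x[-1] - x[-1 - j]) / (x[-1] - x[k])


# Hypothetical data: four readings near 10 and one suspiciously large value.
sample = [9.8, 10.1, 10.0, 9.9, 12.2]
print("D1, D2:", modified_f(sample))
print("r10:", dixon_ratio(sample, 1, 0), "r21:", dixon_ratio(sample, 2, 1))
```

A small D1 or D2 means that the suspected values account for most of the total sum of squares, whereas a large r ratio means that the suspect stands well apart from its neighbours, so the two families reject in opposite tails.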


4. Performance of criteria (estimate of $\sigma$ available). The $\chi^2$ test will of course
give an indication of a large dispersion and since the extreme values are chief
contributors to the sum of squares, it is possible to use this test as a criterion for
rejecting a value or values which are at the greatest distance from the mean.
It might be supposed that B1 and B2 would give better results since particular
attention is paid to the end item. The same argument would influence one in
favor of C1 or C2. The performance of C2 can, of course, be expected to vary with
the degrees of freedom in the independent estimate of $\sigma$. For this study the
degrees of freedom for this estimate were held to the single value 9 d.f.
$\chi^2$ may be used since if the value of $\chi^2$ is too large (greater than some upper
percentage point for $\chi^2$) we might reject the value most distant from the mean.
$\chi^2$ tables may be used for percentage points. Percentage points for the other
statistics considered here are given in the references at the end of this paper.
The criteria A, B1, B2, C1, C2 were investigated for $\alpha$ = 1%, 5% and 10%
for $\lambda$ = 2, 3, 5, 7, where one or more items are selected from a population
$N(\mu + \lambda\sigma, \sigma^2)$ and the remainder from $N(\mu, \sigma^2)$. Investigations were also made for one
item from $N(\mu, \lambda^2\sigma^2)$ for $\lambda$ = 2, 4, 8, 12. The investigation was carried out by
sampling methods. The performances of different criteria were assessed for the
same group of samples in order to obtain more precision in the comparison of the
different tests. All of the points appearing on the graphs in the subsequent sections
of this paper were based on from 66 to 200 determinations.
The performance of the above criteria is measured by computing the propor-
tion of the time the contaminating distribution provides an extreme value and
the test discovers this value. Of course, performance could be measured by the
proportion of the time the test gives a significant value when a member of the
contaminating population is present in the sample, even though not at an extreme.
However, since it is assumed that discovery of an outlier will frequently
be followed by the rejection of an extreme, we shall consider discovery a success
only when the extreme value is from the contaminating distribution.
The performance was judged by applying the criteria to each sample, always
suspecting an outlier in the direction of the shifted mean for location error.
Since the location errors were inserted by adding a fixed value to one or more
of the observations, the largest value was tested as an outlier. The measure of
performance was the percentage of location errors identified. When the location
error was not an outlier, no test was performed and a failure for the test recorded.
In the case of the model of contamination involving the scalar error, the value
was suspected which was farthest from the mean. This, of course, alters somewhat
the level of significance, but this procedure was followed alike for all criteria
investigated. The performance was measured in the same fashion as for location
errors.
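The measure of performance described above can be imitated with a small Monte Carlo experiment. The sketch below is only an illustration for criterion B1 with one location error: it standardizes to $\mu = 0$, $\sigma = 1$, estimates the critical value by simulation instead of using the published tables, and uses trial counts far larger than the 66 to 200 determinations of the original study. The function names and the seed are assumptions.

```python
import random


def b1_upper(sample, sigma=1.0):
    """B1 = (x_n - mean) / sigma for the largest observation."""
    return (max(sample) - sum(sample) / len(sample)) / sigma


def simulated_critical_value(n, alpha, trials, rng):
    """Upper alpha point of B1 for clean N(0, 1) samples, by simulation."""
    stats = sorted(
        b1_upper([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)
    )
    return stats[min(int((1 - alpha) * trials), trials - 1)]


def performance(n, lam, crit, trials, rng):
    """Proportion of samples in which the single location error is both the
    largest value and declared significant by B1."""
    hits = 0
    for _ in range(trials):
        good = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
        bad = rng.gauss(lam, 1.0)  # contaminator from N(mu + lam*sigma, sigma^2)
        # No credit when a "good" value is the extreme, as in the paper.
        if bad > max(good) and b1_upper(good + [bad]) > crit:
            hits += 1
    return hits / trials


rng = random.Random(1950)
crit = simulated_critical_value(n=5, alpha=0.05, trials=4000, rng=rng)
for lam in (2, 3, 5, 7):
    print(lam, performance(n=5, lam=lam, crit=crit, trials=2000, rng=rng))
```

Reusing one stream of random samples for several criteria, as the paper did, would sharpen comparisons between criteria; the sketch evaluates a single criterion and so does not need that refinement.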
Considering first location errors, a study of the performance curves showing
the per cent discovery of contaminators plotted against $\lambda$ (the number of standard
deviation units the population of contaminators is removed from the remainder)
shows that the level of performance for $\sigma$ known is considerably above the level
of performance when $\sigma$ is not known. The difference is greater for n = 5 than
for n = 15 and, of course, the difference will diminish as the sample size increases.
Figure 1 shows the performance curves for $\alpha$ = 5% (5% significance level for
the test for an outlier) of $B_1 = (x_n - \bar{x})/\sigma$ for n = 5 and n = 15 and of
$r_{10} = (x_n - x_{n-1})/(x_n - x_1)$ for n = 5 and n = 15.
The graphs for $\alpha$ = 1% and 10% would be similar in appearance. Figure 2
indicates the change in performance for $\alpha$ = 1%, 5%, and 10%. The curves
plotted are for $B_1 = (x_n - \bar{x})/\sigma$. The curves for A, B2, C1, C2 show very similar
results.
The curve for test B1 was used in Figures 1 and 2 since it gives the best performance
of all criteria which are considered here if a single location error is
present. The curves showing the comparative performance of these criteria as
well as one to be considered later (r10) are given in Figure 3 for $\alpha$ = 5% and for
n = 5 and n = 15.

FIG. 1. Improvement in performance obtained with knowledge of $\sigma$, $\alpha$ = 5%, n = 5, 15.

FIG. 2. The effect of the level of significance on the performance of B1; $\alpha$ = 1%, 5%, 10%; n = 5, 15.

The following statements can be made from inspection of Figure 3:
a) The differences among A, B1, B2, and C1 are not great.
b) The knowledge of $\sigma$ is less important in larger samples.
c) The curve for C2 lies above that of r10 for n = 5 and below that of r10 for
n = 15. This is consistent with the use of 9 d.f. in the independent estimate
of $\sigma$.
If the question of ease in computation or application is important, it may be
desirable to use B2 or C1 in place of B1 for they are slightly easier to compute
and it is not necessary to measure all observations to obtain the value of these
statistics. From Figure 3 it will be noted that the performances of these criteria
are nearly as good as for B1. If two outliers may be expected in a single sample,
the performance of B2 will be lowered and the performance of B1 and C1 will be
improved. No difference between the performance of B1 and the performance
of C1 when two outliers are present was discernible for n = 5 or 15. Figure 4
illustrates the improvement in performance for B1 for $\alpha$ = 5% and n = 15.

FIG. 3. Comparison of the performance of criteria using $\sigma$ known (or using external
estimates of $\sigma$) and r10 for samples of size 5 and 15, $\alpha$ = 5%.

The performance curves of these criteria if a scalar error is present are very
similar to those above except that:
1. A high level of performance is approached very slowly. For example, see
Figure 5 showing the performance of B1 and r10 for n = 5 and n = 15 and $\alpha$ = 5%.
2. There is a smaller difference in the performance between the criteria with
$\sigma$ known and $\sigma$ unknown (see Figure 5).
The performances of B1 and C1 are noticeably increased by the introduction
of more contaminators while that of B2 decreases. No difference in the performance
of B1 and C1 was noted for either n = 5 or n = 15. Figure 6 shows the increase
in performance with two contaminators for B1 for n = 15, $\alpha$ = 5%.

FIG. 4. Comparison of the performance of B1 for one and two location errors in samples
of size 15, $\alpha$ = 5%.

The general recommendations for possibilities of either type of contamination,
location or scalar errors, would lead one to the use of B1 or C1 if $\sigma$ is known.
Criterion C1 is recommended since:
1. Its performance is almost as good as the performance of B1 for a single
outlier. Their performances are about equal for two outliers and C1 affords pro-
tection for outliers either above or below the mean.
2. It is simple to compute.
If ease of computation is not essential and maximum performance is desired,
the criterion B1 should be used. The performance of C2 will approach that of
B1 as the number of degrees of freedom in the denominator increases.

FIG. 5. Comparison of the performance of B1 and r10 for one scalar error for samples of
size 5 and 15, $\alpha$ = 5%.

FIG. 6. Comparison of the performance of B1 for one and two scalar errors in samples of
size 15, $\alpha$ = 5%.

5. Performance of criteria (no external estimate of $\sigma$). Criteria D1 and D2
have strong intuitive reasons for their use since the dispersion is estimated by
$s^2$. The r ratios are attractive because of their simplicity and their preoccupation
with the extreme values. Test F is the "studentized" ratio corresponding to B1,
and is equivalent to D1 since $D_1 = 1 - F^2/(n-1)$. There is no apparent difference
in the performance of D1 and r10 when one outlier is present and no
apparent difference in D2 and r20 when two outliers are present. This is true for
both models of contamination and for the three levels of significance investigated.
However the comparison of D2 and r20 was made only for n = 5 since critical
values are not available² for D2 for n = 15. (Critical values are available for
n < 12.)
The performance of D1 and r10 under the two models of contamination can
be obtained by reference to the curve for r10 in Figure 1 and Figure 5. The curve
for D1 is practically identical with the curve for r10.
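The stated equivalence of F and D1 can be checked numerically. The sketch below assumes that $s^2$ is computed with divisor n, which is the convention under which $D_1 = 1 - F^2/(n-1)$ holds exactly; with divisor n - 1 the same algebra gives $D_1 = 1 - nF^2/(n-1)^2$. The function name and the data are hypothetical.

```python
import math


def d1_and_f(xs):
    """Return (D1, F) for the largest observation of the sample."""
    x = sorted(xs)
    n = len(x)
    mean = sum(x) / n
    total = sum((v - mean) ** 2 for v in x)             # S^2
    rest = x[:-1]                                       # sample without x_n
    rest_mean = sum(rest) / (n - 1)
    reduced = sum((v - rest_mean) ** 2 for v in rest)   # S_n^2
    s = math.sqrt(total / n)   # ASSUMPTION: s uses divisor n, not n - 1
    return reduced / total, (x[-1] - mean) / s


sample = [9.8, 10.1, 10.0, 9.9, 12.2]   # hypothetical readings
n = len(sample)
d1, f = d1_and_f(sample)
print("D1 directly:      ", d1)
print("1 - F^2 / (n - 1):", 1 - f ** 2 / (n - 1))
```

Because D1 is a decreasing function of F, a test that rejects for large F is the same test as one that rejects for small D1, which is why the two need not be studied separately.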

² After this paper was submitted, the critical values of D2 have been extended to n < 20
(see references).


There is no question that r10 is simpler to use, so that if this condition of
contamination (scalar errors) exists, r10 would probably be chosen. However, as
before, we should investigate what happens when more than one error is present.
D2 is designed for this case as is r20. Since the performance of these two criteria
is approximately the same, r20 would probably be chosen because of its simplicity.
Critical values for this statistic are available for n < 30.
r11, r12, r20, r21, r22 were designed for use in situations where additional outliers
may occur and we wish to minimize the effect of these outliers on the
investigation of the particular value being tested.
It has been suggested that D1 could be used repeatedly to remove more than
one outlier from a sample. This procedure cannot be recommended since the
presence of additional outliers handicaps the performance of both D1 and r10
for small sample sizes and therefore the process of rejection might never get
started. For larger sample sizes the performance of D1 is affected much less by
the presence of two errors than is the performance of r10. The repetitive use of
D1 is not recommended in this case either since r20 performs in a superior manner
to D1 in such situations. This difference in performance of D1 and r10 depends
markedly on the level of significance used as well as the sample size.
For small samples there is little difference in performance for any of the levels
of significance one might use. For the larger sample sizes there is no appreciable
difference for very high levels of significance. The difference is however very
great for lower levels of significance. In fact as $\lambda$ increases for two errors of the
location type, the level of significance which divides the region of approach to
zero performance from the region of approach to perfect performance of D1 is
given by the level of significance corresponding to a significance value of
$\frac{1}{2}\left(\frac{n}{n-1}\right)$ for D1. Thus, for example, in samples of size 15,
$\frac{1}{2}\left(\frac{15}{14}\right) = .536$.
This value lies between the values for the 2.5% and 5% level of significance.
These values are .503 and .556 respectively. Therefore the use of the 1% or
2.5% levels will give poorer and poorer performance as $\lambda$ increases, and the
use of the 5% or 10% levels will give better and better performance as $\lambda$ increases
when two errors are present. The dividing point is such that for samples of
size 11 or less the use of any of the given levels of significance will cause the
performance to decrease as $\lambda$ increases. For samples of size n < 14 the 1%,
2.5% and 5% levels have the same effect, and for samples of size n < 16 the 1%
and 2.5%, for samples of size n < 19 just the 1% level. For three such errors
the limit approached by D1 as $\lambda$ increases is $\frac{2}{3}\left(\frac{n}{n-1}\right)$. Therefore, the performance
of D1 will approach zero for all levels of significance and for all sample
sizes for which critical values are known except the 10% level of significance
for sample sizes larger than 21. An indication of these limiting values
$\frac{k-1}{k}\left(\frac{n}{n-1}\right)$
for k contaminations present can be obtained by considering these k values to
be at a distance $\lambda$ from the population mean, computing D1 and allowing $\lambda$ to
increase indefinitely.

FIG. 7. Comparison of the performance of the r criteria for one location error in
samples of size 5, $\alpha$ = 5%.

FIG. 8. Comparison of the performance of the r criteria for one scalar error in samples
of size 5, $\alpha$ = 5%.

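The limiting value of D1 for k location errors can be checked by following the recipe in the text: place k values at a large distance $\lambda$ and compute D1. In the sketch below the "good" observations are spread evenly over [-1, 1], which is an arbitrary assumption that becomes irrelevant as $\lambda$ grows; the function names are hypothetical.

```python
def d1_upper(xs):
    """D1 = S_n^2 / S^2 for the largest observation."""
    x = sorted(xs)

    def sum_sq(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    return sum_sq(x[:-1]) / sum_sq(x)


def limiting_d1(n, k):
    """Limit of D1 as lambda grows, with k equal contaminators among n values."""
    return (k - 1) / k * n / (n - 1)


def d1_with_k_errors(n, k, lam):
    """D1 for n - k evenly spread 'good' values plus k contaminators at lam."""
    m = n - k
    good = [-1 + 2 * i / (m - 1) for i in range(m)]  # arbitrary fixed spread
    return d1_upper(good + [lam] * k)


for k in (1, 2, 3):
    print(k, d1_with_k_errors(15, k, lam=1e6), limiting_d1(15, k))
```

For n = 15 and k = 2 the formula gives 15/28, which rounds to the .536 quoted in the text. With a single contaminator the limit is zero, so D1 eventually falls below any critical value; with two or more it settles at a positive limit, which explains why the lower significance levels can fail to start the rejection process.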
The comparative performance of the r criteria, $\alpha$ = 5%, in samples of size 5
for the two models of contamination (one contaminator present) is given in
Figures 7 and 8. For samples of size 15 the curves are given in Figures 9 and 10.
A single curve suffices here since there is no discernible difference in the curves
for the different r criteria. There is considerable difference in the performance
curves if more than one outlier is present. However, the performances of r10,
r11, r12 are essentially the same when two location outliers are present as are
the performances of r20, r21, r22. Figures 11 and 12 show the comparative performance
of r10, r11, r12 for one and two contaminators for $\alpha$ = 5% and n = 5.
Figures 13 and 14 are for n = 15. Figures 15 and 16 show the comparative performance
for r20, r21 (r22 is not a test for n = 5) for one and two contaminators
for $\alpha$ = 5% and n = 5. Figures 17 and 18 are for r20, r21, r22 for n = 15. The
six curves represented by the single curve of Figure 17 lie within 5% of the
curve shown. The same is true of the three curves represented by each of the
two curves of Figure 18.

FIG. 9. Performance of the r criteria for one location error in samples of size 15, $\alpha$ = 5%.

FIG. 10. Performance of the r criteria for one scalar error in samples of size 15, $\alpha$ = 5%.

FIG. 11. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two location
errors in samples of size 5, $\alpha$ = 5%.

FIG. 12. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two scalar
errors in samples of size 5, $\alpha$ = 5%.

Since no loss in performance results for larger samples from the use of r20,
r21, r22 in place of r10, r11, r12, and further, these criteria are not appreciably
affected by the presence of another outlier, it would seem unwise to recommend
the use of r10, r11, r12. However, note that for small samples (see Figures 11 and
12) the performances of r10, r11 and r12 are considerably better when a single
outlier is present. Therefore in larger (n > 10) samples r20 or r21 would appear
to be the best criteria. In samples of size 10 or less, r10 or r20 should be used;
r21 if the extreme value at the opposite end should be avoided.

FIG. 13. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two location
errors in samples of size 15, $\alpha$ = 5%.

FIG. 14. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two scalar
errors in samples of size 15, $\alpha$ = 5%.

FIG. 15. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two location
errors in samples of size 5, $\alpha$ = 5%.

FIG. 16. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two scalar
errors in samples of size 5, $\alpha$ = 5%.

It should be noted in the comparisons that no model of contamination was
investigated which would cause one or more errors at both extremes in the
sample. It is obvious that the performance of D1 and D2 would be considerably
decreased while the performance of r11, r12, and r21, r22 would not be materially
affected since these criteria avoid values at the opposite extreme. Their repeated
use might discover most of such outliers, while D1 or D2 might fail on the first
trial.

FIG. 17. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two location
errors in samples of size 15, $\alpha$ = 5%.

FIG. 18. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two scalar
errors in samples of size 15, $\alpha$ = 5%.

FIG. 19. Performance of B1 for various levels of significance when the population is 10%
contaminated with location errors (panels: n = 5, n = 15).

6. Sampling from a contaminated population. In the previous sections the
performance of the various criteria was assessed for samples where a certain
number of contaminators were present. One might well ask why a test is needed
if it is known that contaminators are present. It would seem more realistic to
state that a certain per cent of contamination will occur in the long run and
that one will not know in any particular case whether 0, 1, 2, . . . contaminators
will be present. One would then wish a criterion to indicate the presence of
contamination in a particular sample.
The performances of these criteria will be investigated for the same two
models of contamination and their performances will be reported as per cent of
total contamination discovered. The tests will be applied only once to each
sample. Repeated use of the criterion would in many cases increase the per cent
of total contamination discovered. It is not known what effect such a procedure
would have on the level of significance.

FIG. 20. Performance of B1 for various levels of significance when the population is 10%
contaminated with scalar errors (panels: n = 5, n = 15).

FIG. 21. Performance of B1 for various levels of contamination for location errors and
using the 5% level of significance (panels: n = 5, n = 15).

Investigation has been made for 5, 10, and 20% contamination. For example,
in samples of size 5 which have 10% contamination, on the average, 59.0% of
the samples will contain no "errors", 32.8% will contain one, 7.3% two, 0.8%
three, 0.1% four, and 0.0% five. Thus in 100 samples of 5 which are 10% contaminated
with location errors having mean $\mu + 5\sigma$, about 59 contain no errors.
If the r10 criterion is used with a 5% level of significance one value will be
"discovered" in 3.0 of the samples containing no errors. Of the 33 samples containing
one "error" the "error" would be discovered in 18 of these samples. This criterion
would discover none of the "errors" in samples containing more than one "error".
We would have obtained 18 of the 50 contaminating values and 3 which
were members of the original population.

FIG. 22. Performance of B1 for various levels of contamination for scalar errors and
using the 5% level of significance (panels: n = 5, n = 15).

FIG. 23. Performance of r10, D1, r20, D2 in samples of size 5 using the 5% level of significance
and sampling from a population which is 10% contaminated (panels: Location, Scalar).
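The percentages in the 10% contamination example follow from a binomial distribution, if one assumes that each observation is contaminated independently with probability 0.10. The sketch below reproduces that arithmetic; the function name is hypothetical, and the discovery count of 18 is taken from the text and not computed.

```python
from math import comb


def contaminator_count_distribution(n, p):
    """P(exactly j contaminators in a sample of n), for j = 0..n, assuming each
    observation is independently contaminated with probability p."""
    return [comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]


probs = contaminator_count_distribution(n=5, p=0.10)
for j, pr in enumerate(probs):
    print(f"{j} contaminators: {100 * pr:.3f}% of samples")

samples = 100
clean = samples * probs[0]          # about 59 samples with no errors
false_alarms = 0.05 * clean         # 5% level applied to the clean samples
total_bad = samples * 5 * 0.10      # expected number of contaminating values
print(round(clean, 1), round(false_alarms, 1), round(total_bad))
```

Under this assumption the exact figure for four contaminators is about 0.05%, so the 0.1% quoted in the text appears to include some rounding; the other percentages agree with the binomial values to one decimal place.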
When $\sigma$ is known the performance will increase when more contaminators
are present. Performance however has been measured in terms of finding a
single contaminator; i.e., the test has been used only once. Therefore, even though
the test responds more readily when more contaminators are present, the level of
performance (per cent of total contamination discovered) will decrease with
increasing contamination. Repeated use of the test criteria has not been investigated.

FIG. 24. Performance of r10 (D1) and r22 (D1, r20, r21) for various levels of significance
when the population is 10% contaminated with location errors (panels: n = 5, n = 15).

FIG. 25. Performance of r10 (D1) and r22 (D1, r20, r21) for various levels of significance
when the population is 10% contaminated with scalar errors (panels: n = 5, n = 15).

Criterion B1 gives the best performance for both location and scalar errors for
the levels of contamination and levels of significance considered. A and C1 are
only slightly inferior. B2 is handicapped when more than one error is present;
thus its performance is poorer for heavier contamination. Figure 19 shows the
performance of B1 for the different levels of significance, 10% contamination,
and the two sample sizes 5 and 15 for location errors. Figure 20 shows the results
for scalar errors. Figures 21 and 22 show the performance of B1 for the 5%
level of significance for the different levels of contamination.
When σ is not known the performance of various criteria will eventually
decrease as more and more contaminators are present in the sample even though
several of the criteria show improvement in discovering a single error if two
are present. The performance of these criteria is greatly affected by the size
of the sample. For samples of size 5, r10 and D1 perform alike, r10 being superior
to the other r's (r20 second best) for the levels of contamination considered,
and D2 is inferior to r20. Figure 23 compares the performance of r10, D1, r20
and D2 for the 5% level of significance and 10% contamination. The results
for other levels of significance and contamination are comparable.

FIG. 26. Performance of r10 (D1) and r22 (D1, r20, r21) for various levels of contamination
for location errors and using the 5% level of significance. (Panels: r10 (D1),
n = 5; r22 (D1, r20, r21), n = 15.)

FIG. 27. Performance of r10 (D1) and r22 (D1, r20, r21) for various levels of contamination
for scalar errors and the 5% level of significance, α = 5%. (Panels: r10 (D1),
n = 5; r22 (D1, r20, r21), n = 15.)

For samples of size 15, r20, r21 and r22 perform alike as do r10, r11 and r12. D1
and r20, r21, r22 perform approximately the same and are superior to r10, r11,
and r12. Critical values are not available for D2 for n > 12. The performances
of D1, r20, r21 and r22 are indicated by a single line in Figures 24, 25, 26, and 27
which show the effect of level of significance and level of contamination on the
performance of D1, r20, r21 and r22 for samples of size 15 and for r10 (D1) for
samples of size 5.

FIG. 28. A comparison of the performance of r22 and D1 for two scalar contaminators
when tests are made at one extreme only, α = 5%, n = 15.

7. Remarks and conclusions. Throughout the investigation of performance,
location errors were placed only at one extreme and scalar errors at either extreme.
The test for an error was made using as a suspected value the extreme
value in the direction of the location error or, in the case of the scalar error, the
value most distant from the mean. It can be expected then that if performance
were assessed when location errors could occur in either direction, different
results would be obtained. Also in the case of scalar errors, if errors were always
sought at one particular extreme or at both extremes, different results would be
obtained. If these changes were made in the models of contamination then, for σ
unknown, those criteria designed to avoid errors at the other extreme would have
an advantage over those which were not so designed. If σ is known the criteria
which do not avoid the other extreme would have an advantage over those
which do avoid the other extreme. The points just mentioned will be used to
discriminate between those criteria which were judged to be equal in performance
under the models used in the sampling study. For example, Figure 28
compares the performance of r22 and D1 for two scalar contaminators when
tests are made only at one extreme, α = 5%, n = 15.
1. For σ known:
B1 or C1 should be used, or in small samples A, B1 or C1 should be used.
2. For σ unknown:
r10 should be used for very small samples. r22 should be used for sample sizes
over 15. Probably r21 would be best for sample sizes from about 8 to 13. If simplicity
in computation is not important and "errors" are not expected at both
extremes D1 would do equally well. When critical values are available for larger
n, D2 should prove useful in the larger sample sizes.
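As an illustration of the recommendation for σ unknown, the following sketch computes an r ratio for the largest observation. It assumes the usual definition from "Ratios involving extreme values," r_ij = (x_n − x_(n−i)) / (x_n − x_(1+j)) for the ordered sample x_1 ≤ ... ≤ x_n, which is not restated in this section. The function names are hypothetical, and the boundaries used in suggested_ratio for "very small" samples and for n = 14 and 15 are assumptions, since the text gives only approximate ranges. No critical values are included; the computed ratio must still be referred to published tables.

```python
def dixon_ratio(sample, i, j):
    """Return r_ij with the largest observation as the suspected value.

    r_ij = (x_n - x_(n-i)) / (x_n - x_(1+j)), x_1 <= ... <= x_n
    (assumed definition; i in {1, 2}, j in {0, 1, 2}).
    """
    if i not in (1, 2) or j not in (0, 1, 2):
        raise ValueError("i must be 1 or 2 and j must be 0, 1 or 2")
    x = sorted(sample)
    n = len(x)
    if n < i + j + 2:
        raise ValueError("sample too small for this ratio")
    numerator = x[-1] - x[-1 - i]   # gap between the suspect and x_(n-i)
    denominator = x[-1] - x[j]      # range with the j smallest values omitted
    if denominator == 0:
        raise ValueError("ratio undefined: zero denominator")
    return numerator / denominator


def suggested_ratio(n):
    """Map sample size to (i, j) following the recommendation above.

    Assumptions: "very small" is taken as n <= 7, and n = 14, 15
    are assigned to r22; the text leaves these sizes unspecified.
    """
    if n <= 7:
        return (1, 0)   # r10
    if n <= 13:
        return (2, 1)   # r21
    return (2, 2)       # r22


sample = [1.0, 2.0, 3.0, 4.0, 10.0]
i, j = suggested_ratio(len(sample))
r_upper = dixon_ratio(sample, i, j)                 # suspect is the largest value
r_lower = dixon_ratio([-v for v in sample], i, j)   # suspect is the smallest value
print(f"r{i}{j} (upper) = {r_upper:.4f}, r{i}{j} (lower) = {r_lower:.4f}")
```

Negating the sample reuses the same function when the suspected value is the smallest observation.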
LITERATURE REFERRING TO CRITERIA LISTED IN SECTION 3
(B1) A. T. MCKAY, "The distribution of the difference between the extreme observation and
the sample mean in samples of n from a normal universe," Biometrika, Vol. 27
(1935), pp. 466-471. Procedures for obtaining percentage values given.
(B2) J. O. IRWIN, "On a criterion for the rejection of outlying observations," Biometrika,
Vol. 17 (1925), pp. 238-250. Pr(B2 > λ), λ = .1(.1)5.0; n = 2, 3, 10(10)100(100)1,000.
Tables concerning the second and third ordered observations are also given.
(C1) E. S. PEARSON AND H. O. HARTLEY, "The probability integral of the range in samples
of n observations from the normal population," Biometrika, Vol. 32 (1942), pp.
301-310. 0.1%, 0.5%, 1.0%, 2.5%, 5%, 10%, n = 2(1)12, values to 20 available by
interpolation.
(C2) D. NEWMAN, "The distribution of ranges in samples from a normal population, expressed
in terms of an independent estimate of the standard deviation," Biometrika,
Vol. 31 (1940), pp. 20-30. 1% and 5% points for C2; for w, n = 2(1)12, 20; s, d.f. =
5(1)20, 24, 30, 40, 60, ∞.
(C2) E. S. PEARSON AND H. O. HARTLEY, "Tables of the probability integral of the studentized
range," Biometrika, Vol. 33 (1942), pp. 89-99. Upper and lower 5% and 1%
points for C2; for w, n = 2(1)20; for s, d.f. = 10(1)20, 24, 30, 40, 60, 120, ∞.
(C2, B1) K. R. NAIR, "The distribution of the extreme deviate from the sample mean and
its studentized forms," Biometrika, Vol. 35 (1948), pp. 118-144. B1 upper and lower
.1%, .5%, 1%, 2.5%, 5%, 10% points for n = 3(1)9.
(D1, D2, F, B1) F. E. GRUBBS, "Sample criterion for testing outlying observations,"
Annals of Math. Stat., Vol. 21 (1950), pp. 27-58. F, D1: 1%, 2.5%, 5%, 10%, n < 25;
D2: 1%, 2.5%, 5%, 10%, n < 20; B1: 1%, 2.5%, 5%, 10%, n < 25.
(F) W. R. THOMPSON, "On a criterion for the rejection of observations and the distribution
of the ratio of deviation to sample standard deviation," Annals of Math. Stat.,
Vol. 6 (1935), pp. 214-219. 20%, 10%, 5%, n = 3(1)22(10)42, 102, 202, 502, 1002.
(F) E. S. PEARSON AND CHANDRA SEKAR give a further discussion of F in "The efficiency of
statistical tools and a criterion for the rejection of outlying observations," Biometrika,
Vol. 28 (1936), pp. 308-320. 10%, 5%, 2.5%, 1%, n = 3(1)19.
(r's) W. J. DIXON, "Ratios involving extreme values," Annals of Math. Stat., to be published.
r10, r11, r12, r20, r21, r22; .5%, 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%,
60%, 70%, 80%, 90%, 95%, n < 30.
