Institute of Mathematical Statistics
Author(s): W. J. Dixon
Source: The Annals of Mathematical Statistics, Vol. 21, No. 4 (Dec., 1950), pp. 488-506
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2236602 .
Accessed: 24/12/2012 01:12
1 This paper was prepared under a contract with the Office of Naval Research.
If we wish to follow a procedure which will not search for particular values to
be excluded but will minimize their effect if present, we must investigate the
sampling distributions of these modified statistics and estimate the loss in in-
formation resulting from their use when all observations are "good." We must
also investigate the expected bias which will result when "bad" items are present
even though essentially excluded. Perhaps most disturbing about the avoidance
of "bad" items is the fact that a decision must still be made as to whether a
"bad" item was present or not in order to know in which way our estimates may
be biased. For example, a sample mean computed by avoiding the two end ob-
servations will not be a biased estimate of the mean of a symmetric population
if both end items should actually be included or if both end items should not be
included. However, if only one of the two should not be included this estimate of
the mean will be biased.
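The bias argument above can be checked with a small simulation. The sketch below is illustrative only: the sample size of 10, the standard normal population, and the shift of 8 standard deviations for a "bad" item are assumptions chosen for the example, not values taken from the paper.

```python
import random
import statistics


def trimmed_mean(sample):
    """Mean of the sample after dropping the smallest and largest values."""
    ordered = sorted(sample)
    return statistics.fmean(ordered[1:-1])


def average_trimmed_mean(n, shifts, trials=10000, seed=0):
    """Average trimmed mean over many samples of size n from N(0, 1).

    `shifts` lists location shifts added to the first len(shifts)
    observations; these shifted observations play the role of "bad" items.
    All numeric settings here are assumptions for illustration.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for i, shift in enumerate(shifts):
            sample[i] += shift
        total += trimmed_mean(sample)
    return total / trials


# The true mean of the "good" population is 0 in every scenario.
clean = average_trimmed_mean(10, [])               # both end items are good
both_ends = average_trimmed_mean(10, [8.0, -8.0])  # both end items are bad
one_end = average_trimmed_mean(10, [8.0])          # only the upper end item is bad

print(f"no bad items:       {clean:+.3f}")
print(f"bad items both ends: {both_ends:+.3f}")
print(f"bad item at one end: {one_end:+.3f}")
```

In the first two scenarios the average stays near zero, while in the third the discarded low observation is a good one, so the estimate is pulled upward.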
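B. Extreme deviation,
$B_1 = \dfrac{x_n - \bar{x}}{\sigma}$ (or $\dfrac{\bar{x} - x_1}{\sigma}$),
$B_2 = \dfrac{x_n - x_{n-1}}{\sigma}$ (or $\dfrac{x_2 - x_1}{\sigma}$).
C. Range,
$C_1 = \dfrac{w}{\sigma} = \dfrac{x_n - x_1}{\sigma}$.
(or for $x_n$, $D_1 = S_n^2 / S^2$)
$r_{10} = \dfrac{x_2 - x_1}{x_n - x_1}$
$r_{11} = \dfrac{x_2 - x_1}{x_{n-1} - x_1}$
(or for $x_n$ avoiding $x_{n-1}$, $r_{20} = \dfrac{x_n - x_{n-2}}{x_n - x_1}$)
$r_{21} = \dfrac{x_3 - x_1}{x_{n-1} - x_1}$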
The performance of the large number of criteria listed here will be assessed
with respect to discovery of contamination of the type given in Section 2.
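As a concrete illustration, the sketch below computes several of these criteria for a suspected high outlier. The function name and the returned dictionary are hypothetical. The formulas are the upper-tail forms (testing the largest observation $x_n$), taken here as mirror images of the lower-tail forms; that reading, and the use of a known $\sigma$ for B1, B2 and C1, are assumptions of the example.

```python
def extreme_value_criteria(sample, sigma):
    """Upper-tail extreme-value criteria for the largest observation.

    `sigma` is the known population standard deviation used by B1, B2, C1.
    The r ratios use only the ordered sample. Hypothetical helper, written
    for illustration; it assumes the sample is not constant.
    """
    x = sorted(sample)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations for these ratios")
    mean = sum(x) / n
    return {
        "B1": (x[-1] - mean) / sigma,             # extreme deviation from the mean
        "B2": (x[-1] - x[-2]) / sigma,            # gap to the nearest neighbour
        "C1": (x[-1] - x[0]) / sigma,             # range
        "r10": (x[-1] - x[-2]) / (x[-1] - x[0]),  # gap over range
        "r11": (x[-1] - x[-2]) / (x[-1] - x[1]),  # avoids the smallest value
        "r20": (x[-1] - x[-3]) / (x[-1] - x[0]),  # avoids the second largest
        "r21": (x[-1] - x[-3]) / (x[-1] - x[1]),  # avoids both
    }


criteria = extreme_value_criteria([1, 2, 3, 4, 10], sigma=1.0)
for name, value in criteria.items():
    print(f"{name}: {value:.3f}")
```

Each value would then be compared with the tabulated critical value for the chosen sample size and level of significance; no critical values are included in this sketch.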
FIG. 1. Improvement in performance obtained with knowledge of $\sigma$, $\alpha$ = 5%, n = 5, 15.
FIG. 2. The effect of the level of significance on the performance of B1; $\alpha$ = %, 5%, 10%; n = 5, 15.
well as one to be considered later (r10) are given in Figure 3 for $\alpha$ = 5% and for
n = 5 and n = 15.
The following statements can be made from inspection of Figure 3:
a) The differences among A, B1, B2, and C1 are not great.
b) The knowledge of $\sigma$ is less important in larger samples.
c) The curve for C2 lies above that of r10 for n = 5 and below that of r10 for
n = 15. This is consistent with the use of 9 d.f. in the independent estimate
of $\sigma$.
If the question of ease in computation or application is important, it may be
desirable to use B2 or C1 in place of B1 for they are slightly easier to compute
and it is not necessary to measure all observations to obtain the value of these
statistics. From Figure 3 it will be noted that the performances of these criteria
are nearly as good as for B1 . If two outliers may be expected in a single sample,
FIG. 3. Comparison of the performance of criteria using known $\sigma$ (or using external
estimates of $\sigma$) and r10 for samples of size 5 and 15, $\alpha$ = 5%.
FIG. 4. Comparison of the performance of B1 for one and two location errors in samples
of size 15, $\alpha$ = 5%.
ance of B1 and C1 were noted for either n = 5 or n = 15. Figure 6 shows the
increase in performance of two contaminators for B1 for n = 15, $\alpha$ = 5%.
The general recommendations for possibilities of either type of contamination,
location or scalar errors, would lead one to the use of B1 or C1 if $\sigma$ is known.
Criterion C1 is recommended since:
1. Its performance is almost as good as the performance of B1 for a single
outlier. Their performances are about equal for two outliers and C1 affords pro-
tection for outliers either above or below the mean.
2. It is simple to compute.
If ease of computation is not essential and maximum performance is desired,
the criterion B1 should be used. The performance of C2 will approach that of
B1 as the number of degrees of freedom in the denominator increases.
2 After this paper was submitted, the critical values of D2 have been extended to n < 20
(see references).
$\dfrac{n}{2(n-1)}$ for D1. Thus, for example, in samples of size 15, $\dfrac{15}{2(14)} = .536$.
This value lies between the values for the 2.5% and 5% level of significance.
These values are .503 and .556 respectively. Therefore the use of the 1% or
2.5% levels will give poorer and poorer performance as $\lambda$ increases, and the
use of the 5% or 10% levels will give better and better performance as $\lambda$ increases
when two errors are present. The dividing point is such that for samples of
size 11 or less the use of any of the given levels of significance will cause the
performance to decrease as $\lambda$ increases. For samples of size n < 14 the 1%,
2.5% and 5% levels have the same effect, and for samples of size n < 16 the 1%
and 2.5%, for samples of size n < 19 just the 1% level. For three such errors
the limit approached by D1 as $\lambda$ increases is $\dfrac{2}{3} \cdot \dfrac{n}{n-1}$. Therefore, the
performance of D1 will approach zero for all levels of significance and for all sample
sizes for which critical values are known except the 10% level of significance
for sample sizes larger than 21. An indication of these limiting values $\dfrac{k-1}{k} \cdot \dfrac{n}{n-1}$
for k contaminations present can be obtained by considering these k values to
FIG. 9. Performance of the r criteria for one location error in samples of size 15, $\alpha$ = 5%.
FIG. 10. Performance of the r criteria for one scalar error in samples of size 15, $\alpha$ = 5%.
FIG. 11. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two location
errors in samples of size 5, $\alpha$ = 5%.
FIG. 12. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two scalar
errors in samples of size 5, $\alpha$ = 5%.
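The limiting value of .536 quoted for D1 in samples of size 15 can be reproduced numerically. The sketch below assumes that D1 for the largest observation is the ratio $S_n^2 / S^2$, the sum of squared deviations with the largest value omitted divided by the sum for the full sample; this form is an assumption of the example, chosen because it reproduces the quoted figure. The function names are hypothetical.

```python
def sum_of_squares(values):
    """Sum of squared deviations about the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def d1_upper(sample):
    """Assumed form of D1 for the largest value: S_n^2 / S^2."""
    x = sorted(sample)
    return sum_of_squares(x[:-1]) / sum_of_squares(x)


def d1_limit(n, k):
    """Limit of D1 when k of the n values move off to infinity together."""
    return (k - 1) / k * n / (n - 1)


# Two contaminators placed very far above 13 "good" values (n = 15).
good = [0.1 * i for i in range(13)]
sample = good + [1e6, 1e6]

print(round(d1_upper(sample), 3))   # numerical value for a very large shift
print(round(d1_limit(15, 2), 3))    # closed-form limit for k = 2
print(round(d1_limit(15, 3), 3))    # closed-form limit for k = 3
```

For k = 2 and n = 15 the limit falls between the critical values .503 and .556 quoted in the text. If small values of D1 are the significant ones, as assumed here, this is why the 5% and 10% levels gain performance as the shift grows while the 1% and 2.5% levels lose it.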
formance for r20, r21 (r22 is not a test for n = 5) for one and two contaminators
for $\alpha$ = 5% and n = 5. Figures 17 and 18 are for r20, r21, r22 for n = 15. The
six curves represented by the single curve of Figure 17 lie within 5% of the
curve shown. The same is true of the three curves represented by each of the
two curves of Figure 18.
Since no loss in performance results for larger samples from the use of r20,
r21, r22 in place of r10, r11, r12, and further, these criteria are not appreciably
affected by the presence of another outlier, it would seem unwise to recommend
the use of r10, r11, r12. However, note that for small samples (see Figures 11 and
12) the performances of r10, r11, and r12 are considerably better when a single
FIG. 13. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two location
errors in samples of size 15, $\alpha$ = 5%.
FIG. 14. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two scalar
errors in samples of size 15, $\alpha$ = 5%.
FIG. 15. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two location
errors in samples of size 5, $\alpha$ = 5%.
FIG. 16. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two scalar
errors in samples of size 5, $\alpha$ = 5%.
outlier is present. Therefore in larger (n > 10) samples r20 or r21 would appear
to be the best criteria. In samples of size 10 or less, r10 or r20 should be used;
r21 if the extreme value at the opposite end should be avoided.
It should be noted in the comparisons that no model of contamination was
investigated which would cause one or more errors at both extremes in the
sample. It is obvious that the performance of D1 and D2 would be considerably
decreased while the performance of r11, r12, and r21, r22 would not be materially
affected since these criteria avoid values at the opposite extreme. Their repeated
use might discover most of such outliers, while D1 or D2 might fail on the first
trial.
FIG. 17. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two location
errors in samples of size 15, $\alpha$ = 5%.
FIG. 18. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two scalar
errors in samples of size 15, $\alpha$ = 5%.
FIG. 19. Performance of B1 for various levels of significance when the population is 10%
contaminated with location errors. Panels: n = 5 and n = 15.
FIG. 20. Performance of B1 for various levels of significance when the population is 10%
contaminated with scalar errors. Panels: n = 5 and n = 15.
FIG. 21. Performance of B1 for various levels of contamination for location errors and
using the 5% level of significance. Panels: n = 5 and n = 15.
total contamination discovered. The tests will be applied only once to each
sample. Repeated use of the criterion would in many cases increase the per cent
of total contamination discovered. It is not known what effect such a procedure
would have on the level of significance.
Investigation has been made for 5, 10, and 20% contamination. For example,
in samples of size 5 which have 10% contamination, on the average, 59.0% of
the samples will contain no "errors", 32.8% will contain one, 7.3% two, 0.8%
three, 0.1% four, and 0.0% five. Thus in 100 samples of 5 which are 10%
contaminated with location errors having mean $\mu + 5\sigma$, about 59 contain no errors.
If the r10 criterion is used with a 5% level of significance one value will be "dis-
FIG. 22. Performance of B1 for various levels of contamination for scalar errors and
using the 5% level of significance. Panels: n = 5 and n = 15.
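The percentages of samples of size 5 containing zero, one, two, or more errors under 10% contamination follow from the binomial distribution, on the assumption that each observation is independently an "error" with probability 0.10. A minimal sketch (the function name is hypothetical):

```python
from math import comb


def contamination_distribution(n, p):
    """Probability that a sample of size n contains k errors, k = 0..n,
    when each observation is independently an error with probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


probs = contamination_distribution(5, 0.10)
percents = [round(100 * q, 1) for q in probs]

for k, pct in enumerate(percents):
    print(f"{k} errors: {pct}% of samples")
```

The first four values agree with the figures quoted in the text (59.0%, 32.8%, 7.3%, 0.8%). The exact binomial probability of four errors is 0.045%, slightly below the 0.1% quoted.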
FIG. 24. Performance of r10 (D1) and r22 (D1, r20, r21) for various levels of significance
when the population is 10% contaminated with location errors. Panels: r10 (D1), n = 5;
r22 (D1, r20, r21), n = 15.
Criterion B1 gives the best performance for both location and scalar errors for
the levels of contamination and levels of significance considered. A and C1 are
only slightly inferior. B2 is handicapped when more than one error is present;
thus its performance is poorer for heavier contamination. Figure 19 shows the
performance of B1 for the different levels of significance, 10% contamination,
and the two sample sizes 5 and 15 for location errors. Figure 20 shows the results
for scalar errors. Figures 21 and 22 show the performance of B1 for the 5%
level of significance for the different levels of contamination.
When $\sigma$ is not known the performance of various criteria will eventually
decrease as more and more contaminators are present in the sample even though
FIG. 28. A comparison of the performance of r22 and D1 for two scalar contaminators
when tests are made at one extreme only, $\alpha$ = 5%, n = 15.
and r12. Critical values are not available for D2 for n > 12. The performances
of D1, r20, r21 and r22 are indicated by a single line in Figures 24, 25, 26, and 27,
which show the effect of level of significance and level of contamination on the
performance of D1, r20, r21 and r22 for samples of size 15 and for r10 (D1) for
samples of size 5.