The Robustness of Parametric Statistical Methods
175-208
1. Abstract
1 Dieter Rasch, BioMath – Institute of Applied Mathematical Statistics in Biology and Medicine Ltd., Rostock; E-mail: [email protected]
2 Volker Guiard, Research Institute for the Biology of Farm Animals, Dummerstorf, Research Unit Genetics & Biometry
In Herrendörfer, G. (ed.) (1980) and Bock (1982) several robustness concepts like those of Box and Tiao (1964), Huber (1964) and Zielinski (1977) have been discussed. The concept of ε-robustness was introduced there and is used in the robustness research presented in this paper. Here this concept is introduced in a simplified way for the construction of confidence intervals.
Definition 1
Definition 2
A distribution belongs to the Fleishman system (Fleishman, 1978) if its first four mo-
ments exist and if it is the distribution of the transform
y = a + bx + cx² + dx³    (1)
where x is a standard normal random variable (with mean 0 and variance 1).
By a proper choice of the coefficients a, b, c and d the random variable y can be given any quadruple (µ, σ², γ1, γ2) of first four moments. By γ1 and γ2 we denote the skewness (standardized third moment) and the kurtosis (standardized fourth moment) of a distribution, respectively. For instance any normal distribution (i.e. any element of G) with mean µ and variance σ² can be represented as a member of the Fleishman system by choosing a = µ, b = σ and c = d = 0.

3 Random variables are underlined.
Experimenters know that they nearly never really meet observations from normal distributions. Sometimes they know that the underlying distribution is skew, truncated, or differs in some other way from a normal one. In Herrendörfer, G. (ed.) (1980), 144 characters from experiments in medicine, plant and animal breeding have been investigated. For each character, the empirical skewness g1 and kurtosis g2 have been calculated from a sample of size r. The values of r were 64 for 2 characters, 128 for 11 characters, between 192 and 438 for 38 characters, and between 656 and 5466 for all the other characters. This means that in most cases the point estimates of the skewness and the kurtosis are not far from the unknown parameters γ1 and γ2, respectively.
It is known (see Herrendörfer, G. (ed.) (1980), page 26) that all probability distributions
(with existing fourth order moment) fulfil the inequality
γ2 ≥ γ1² − 2    (2)
Figure 1:
Values of empirical skewness g1 and kurtosis g2 of 144 characters in the (γ1, γ2)-plane; ♦ marks the parameters (γ1, γ2) of 6 distributions of the Fleishman system
For the empirical values the analogous inequality
g2 ≥ g1² − 2    (3)
holds.
Table 1:
(γ1, γ2)-values and the corresponding coefficients in (1)
No of distribution   γ1   γ2   c = −a   b   d
1 0 0 0 1 0
2 0 1.5 0 0.866993269415 0.042522484238
3 0 3.75 0 0.748020807992 0.077872716101
4 0 7 0 0.630446727840 0.110696742040
5 1 1.5 0.163194276264 0.953076897706 0.006597369744
6 1.5 3.75 0.221027621012 0.865882603523 0.027220699158
7 2 7 0.260022598940 0.761585274860 0.053072273491
The values of b, c and d in Table 1 are taken from Table A3 in Nürnberg (1982); some of them can already be found in Fleishman (1978).
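The correspondence between the coefficients of Table 1 and the tabulated (γ1, γ2) pairs can be checked via the standard closed-form moment expressions of Fleishman's power method. The following sketch is ours and not part of the original simulation programs:

```python
def fleishman_moments(b, c, d):
    """First four moments of y = -c + b*x + c*x**2 + d*x**3 with
    x ~ N(0, 1): returns (variance, skewness g1, excess kurtosis g2),
    using the standard closed-form power-method expressions."""
    var = b**2 + 6*b*d + 15*d**2 + 2*c**2
    g1 = 2*c*(b**2 + 24*b*d + 105*d**2 + 2)
    g2 = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
             + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2))
    return var, g1, g2

# rows of Table 1: (gamma1, gamma2, c, b, d), with a = -c
table1 = [
    (0.0, 0.0,  0.0,            1.0,            0.0),
    (0.0, 1.5,  0.0,            0.866993269415, 0.042522484238),
    (0.0, 3.75, 0.0,            0.748020807992, 0.077872716101),
    (0.0, 7.0,  0.0,            0.630446727840, 0.110696742040),
    (1.0, 1.5,  0.163194276264, 0.953076897706, 0.006597369744),
    (1.5, 3.75, 0.221027621012, 0.865882603523, 0.027220699158),
    (2.0, 7.0,  0.260022598940, 0.761585274860, 0.053072273491),
]
```

Since every row yields variance 1 and mean 0 (because a = −c), a variate with arbitrary µ and σ² is obtained afterwards by the linear transformation µ + σy.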
In the simulation experiments (as also in the statistical analysis of experimental results) we mainly use linear statistical models with an error term. We investigate the behaviour of statistical procedures like confidence estimation, hypothesis testing and selection procedures which have been derived (and thus are exact) under the assumption of normal error terms. The question is: „What happens to the properties of these procedures if we replace the normal error term by one following a Fleishman (or a truncated normal) distribution with non-zero γ1 and γ2?“
Only very few analytical results exist to answer this question, and only for sample sizes 2 and 3, which are useless for practical purposes. For very extreme (two- and three-point) distributions exact answers have been given by Herrendörfer (1980) and Herrendörfer and Rasch (1981). The results in our paper are based on simulation experiments.
Example 1
A sample (that is, a vector of identically and independently distributed (abbr. i.i.d.) random variables) of size n is drawn from a N(µ, σ²)-distribution. If σ is, as usually, unknown, a (realized) confidence interval is calculated from the realization of the random sample (the observations) by:
[ y − t(n − 1; 1 − α/2)·s/√n ; y + t(n − 1; 1 − α/2)·s/√n ]    (4)

where the sample mean y is the point estimate of µ, the sample standard deviation s is the point estimate of σ, and t(n − 1; 1 − α/2) is the (1 − α/2)-quantile of the central t-distribution with n − 1 d.f. The corresponding random interval (with random bounds as functions of the sample) covers the unknown µ with probability (1 − α) as long as the sample really stems from a normal distribution. Amongst all intervals with this property the interval (4) is optimal, which means it has the smallest expected width. Non-parametric intervals have a larger expected width if the normal assumption is true. In this case the linear model for the n elements y_i of the random sample has the simple form

y_i = µ + e_i ,  i = 1, ..., n    (5)
The experimental design V_n in (5) is simply given by the sample size n (there is no structure in the experiment). The robustness of this confidence estimation (and of the corresponding test) against truncation of normal distributions (truncation occurs in any selection process, as in schools or in agricultural artificial selection) is investigated in simulation experiment 1 below.
Besides the t-quantiles we use other abbreviations (for quantiles) as given in the following table.
Table 2:
Abbreviations used in this paper
Term Abbreviation
Degrees of freedom d.f.
Identically and independently distributed i.i.d.
P-quantile (percentile) of the N(0;1)distribution u(P)
P-quantile (percentile) of the t-distribution with f d.f. t(f; P)
P-quantile (percentile) of the F-distribution with f1 and f 2 d.f. respectively F ( f1, f 2 ; P)
P-quantile (percentile) of the χ 2 -distribution with f d.f. CQ(f; P)
Within the initialisation phase the values x1, ..., x128 will be generated. In order to calculate the random values Z_i, i = 1, ..., we calculate the index I (1 ≤ I ≤ 128) as the integer part of 1 + y_i/2^21. Then we select x_I from the x-values above and calculate Z_i = x_I/1049339. Within the vector (x1, ..., x128) the value x_I is then substituted by the next of the x_k-values, i.e. x_{128+i}.
The Z-values generated from (6) are PRN uniformly distributed in the interval (0,1).
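The table-shuffling scheme just described can be sketched as follows. This is a MacLaren-Marsaglia/Bays-Durham-type shuffle in which Python's random module stands in for the original congruential generators, whose exact constants are not fully recoverable from the text:

```python
import random

def shuffled_uniforms(n, table_size=128, seed=1):
    """Generate n uniforms in (0, 1) by routing a base generator
    through a table of 128 stored values: an index is derived from
    the previous draw, the selected table entry is emitted, and its
    slot is refilled with the next base value."""
    base = random.Random(seed)
    table = [base.random() for _ in range(table_size)]
    out = []
    selector = base.random()
    for _ in range(n):
        i = int(selector * table_size)   # plays the role of the index I
        selector = base.random()         # next index source
        z = table[i]                     # emit the stored value
        table[i] = base.random()         # refill the slot, like x_(128+i)
        out.append(z)
    return out
```

The shuffle breaks up the serial correlation of the base generator, which is the purpose of the table in the original scheme as well.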
They have to be transformed into PRN with a standard normal distribution and then by (1) in
PRN of the Fleishman system. FEIGE et al. (1985) tested 5 different transformations of the
Z-values into standard normal PRN u and proposed the following for further use. At first we define

for Z < 0.5:  z = √(−2 ln Z) and c = −1,
for Z ≥ 0.5:  z = √(−2 ln(1 − Z)) and c = 1.

Then, with the coefficients p0, ..., p4 and q0, ..., q4 given by Odeh and Evans (1974) (see also Feige et al., 1985, page 31), calculate

u = c·[ z + ((((p4·z + p3)·z + p2)·z + p1)·z + p0) / ((((q4·z + q3)·z + q2)·z + q1)·z + q0) ].
These N(0;1)-PRN have been used directly (after truncation) in simulation experiment 1
and have been transformed by (1) into Fleishman variates for the other experiments.
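The uniform-to-normal step can be reproduced without the tabulated coefficients by inverting Φ numerically. The sketch below is our substitute for the Odeh-Evans rational approximation and uses bisection on math.erf:

```python
import math

def norm_quantile(p):
    """Standard normal quantile by bisection on Phi(x) = p, with
    Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is far slower than the rational approximation but needs no stored coefficients, which makes it convenient for checking.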
3.2 Planning the Size and the Scope of the Simulation Experiments
Whether a statistical procedure keeps the nominal significance level can itself be formulated as a testing problem:
H0 : α = 0.05
HA : α > 0.05
(if α is smaller than 0.05 we do not exceed the nominal significance level).
To determine the number of simulated samples we proceed like an experimenter in life
science planning a real life experiment. We follow the principle explained in paragraph 4.4
in Rasch (2003) and fix the significance level of this test at α*=0.01 and the value of the
power function at α + d = 0.05 + 0.006 = 0.056 at 0.85. To determine the number of simu-
lated samples we enter the constants above into the CADEMO module MEANS, branch
„Comparing a probability with a constant” and receive the result 9894. Therefore 10 000
runs were used in each of the simulation experiments below.
We report simulation experiments for the investigation of the robustness of tests, confi-
dence estimation and selection procedures. Each of the statistical inference methods was
performed 10 000 times for each case included in the research program. Then the number R of negative results has been observed and the observed error rate α̂ = R/10 000 has been used as an estimate of the actual risk α. The distributions to be simulated are special cases of the
Fleishman system given in Table 1. In paragraph 4 we report only the results for the nominal
risk α = 0.05 but in Herrendörfer, G. (ed.) (1980), Guiard, V. (ed.) (1981), Rasch, D. and G.
Herrendörfer (ed.) (1982), Rasch, D. (ed.) (1985), Rudolph, P.E. (ed.) (1985 a), and Guiard, V.
and Rasch, D. (ed.) (1987) results for α = 0.01 and α = 0.1 are also given. The normal distribution was included on the one hand to check the correctness of the programs; on the other hand, as in the investigation of asymptotic tests, the results for the normal case have been of interest too.
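The α̂ = R/10 000 scheme can be illustrated for the one-sample t-test under normal errors. In this sketch (ours), the tabulated quantile t(9; 0.975) ≈ 2.262157 is hard-coded:

```python
import math
import random
import statistics

def estimated_risk(n=10, runs=10_000, t_crit=2.262157, seed=42):
    """Relative frequency of false rejections of H0: mu = 0 by the
    two-sided one-sample t-test (nominal alpha = 0.05) in `runs`
    simulated N(0, 1) samples of size n."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t = abs(statistics.fmean(y)) * math.sqrt(n) / statistics.stdev(y)
        if t > t_crit:
            rejections += 1
    return rejections / runs
```

Replacing rng.gauss by a Fleishman transform of a standard normal draw reproduces the non-normal cases of the research program.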
We demonstrate this in the following example which is also used to show how more
complicated cases than that in example 1 can be handled.
Example 2
Let us consider the non-linear regression model with unknown parameters α, β and γ.
y_i = α + β·e^(γ·x_i) + e_i ,  (i = 1, ..., n)    (7)
with i.i.d. random errors ei . Here Vn is given not only by the sample size n but by the alloca-
tion of the n x-values. We define a design with k different x-values x1 , x 2 ,..., x k which
occur with the frequencies n1 , n 2 ,..., n k , respectively, by
V_n = ( x1  x2  ...  xk ; n1  n2  ...  nk ) ,  k ≥ 3 ,  Σ_{j=1}^k n_j = n    (8)
The distribution of the least squares estimators of the three parameters is unknown for fi-
nite samples even if the error terms are normally distributed. What is known is the asymp-
totic distribution on which statistical tests can be based (see RASCH, 1995).
Let us now describe the simulation experiments whose results will be reported in para-
graph 4. The first 6 experiments deal with tests for means and variances. But the results are
also useful for the construction of confidence intervals. Between a confidence interval with
coefficient 1 − α and a test with significance level α a one-to-one correspondence exists: as long as the hypothesised value µ0 lies inside the interval, H0 is accepted, and it is rejected otherwise.
On the other hand, all accepted µ0 – values are inside the interval and all rejected outside. We
report the tests here because they give additional information about the power (or the risk of
the second kind).
The null hypothesis H0 : µ = µ0 is tested against one of the alternatives
a) H_A+ : µ > µ0 ,
b) H_A− : µ < µ0 and
c) H_A : µ ≠ µ0 .

Test 1
Calculate u = z = √n·(y − µ0)/σ and reject H0:
a) if u > u(1 − α),
b) if u < −u(1 − α),
c) if |u| > u(1 − α/2).
Test 2
Calculate t = √n·(y − µ0)/s and reject H0 in the three cases a) to c) as in test 1, with the quantiles t(n − 1; 1 − α) and t(n − 1; 1 − α/2) of the t-distribution in place of u(1 − α) and u(1 − α/2); this is the one-sample t-test.

Test 3
Calculate

t_J = (√n/s)·[ y − µ0 + g1·s/(6n) + g1·(y − µ0)²/(3s) ]

and reject H0:
a) if t_J > t(n − 1; 1 − α),
b) if t_J < −t(n − 1; 1 − α),
c) if |t_J| > t(n − 1; 1 − α/2).
This test corrects the test statistic by the empirical skewness g1. There is no correction for
kurtosis.
In this experiment the class H in definition 1 was not the Fleishman system but the sys-
tem of truncated normal distributions. If a standard normal distribution is truncated at v
(lower truncation point) and w (upper truncation point) then the truncated distribution is
skew if w ≠ −v and it has a non-zero kurtosis for most (v, w).
Teuscher used the distributions 1-7 of Table 3 and sample sizes of n = 5, 10, 30 and 50.
Table 3:
Values of v and w and of the skewness and kurtosis of the simulated distributions
No of distribution v w γ1 γ2
1 -6 6 0 -0.06
2 -2.2551 2.2551 0 -0.5
3 -1.183 1.183 0 -1.0013
4 -1 6 0.5918 0.0014
5 -0.29 2.743 0.75 0
6 -0.7041 2.2778 0.5 -0.5
7 1 1.52513 1.3159 1.998
8 0.5 3 1.0057 0.5915
9 -1.5 3 0.348 -0.3481
10 2.85 4.71 1.505 3.75
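The (γ1, γ2)-entries of Table 3 can be reproduced from the raw-moment recursion of the truncated standard normal. The following sketch (ours) recovers, for instance, rows 2 and 4:

```python
import math

def truncated_normal_g1_g2(v, w):
    """Skewness and excess kurtosis of a standard normal truncated to
    [v, w], via the raw-moment recursion
    m_k = (k-1)*m_{k-2} + (v^(k-1)*phi(v) - w^(k-1)*phi(w)) / Z."""
    phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    Z = Phi(w) - Phi(v)                       # truncated mass
    m = [1.0, (phi(v) - phi(w)) / Z]          # m_0, m_1
    for k in (2, 3, 4):
        m.append((k - 1) * m[k - 2]
                 + (v ** (k - 1) * phi(v) - w ** (k - 1) * phi(w)) / Z)
    mu, var = m[1], m[2] - m[1] ** 2
    mu3 = m[3] - 3 * mu * m[2] + 2 * mu ** 3
    mu4 = m[4] - 4 * mu * m[3] + 6 * mu ** 2 * m[2] - 3 * mu ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2 - 3.0
```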
RASCH (1985a, b) investigated the behaviour of three sequential one-sample tests for
non-normal situations. These tests are sequential analogues of the tests of experiment 1.
The null hypothesis:
H 0 : µ = µ0
was tested against:
H_A : (µ − µ0)² = σ²·d²
The test statistics t1, t2 and t3 of the three tests can be found in the reference; test 1 was based on the original WALD test.
For each of the tests with test statistic t (t1, t2, and t3) and fixed risk α of the first and β of
the second kind, respectively, the decision rule was as follows:
Accept H0 if
t1 , t2 ≤ b = ln B or if t3 ≤ B
Reject H0 if
t1 , t2 ≥ a = ln A or if t3 ≥ A
A = (1 − β)/α ,  B = β/(1 − α)    (9)
with (α, β) = (0.05, 0.1); (0.05, 0.2); (0.1, 0.2) and (0.1, 0.5); d = 0.6, 1 and 1.6.
In the simulation a normal distribution (no 1 in Table 4), the distributions 3-6 of Table 1
(number 5-8 in Table 4) and 8-10 of Table 3 (2-4 in Table 4) have been used. Further the
average sample size of the 10 000 replications has been calculated for each situation.
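The continuation bounds of (9) are easily computed; a small sketch:

```python
import math

def sprt_log_bounds(alpha, beta):
    """Wald bounds of (9): accept H0 below ln B, reject above ln A,
    continue sampling in between (for the log-scale statistics t1, t2)."""
    b = math.log(beta / (1.0 - alpha))   # ln B, acceptance bound
    a = math.log((1.0 - beta) / alpha)   # ln A, rejection bound
    return b, a
```

For (α, β) = (0.05, 0.1) this gives ln B ≈ −2.2513 and ln A ≈ 2.8904.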
Table 4:
Simulated distributions and their skewness and kurtosis
Simulation Experiment 3 and Posten’s experiment – Tests for comparing two means
Comparing the means of two distributions may be one of the most frequent applications
of statistics in many research fields. We discuss here the independent samples case; this means the samples of sizes n1 and n2 have been drawn independently from two populations with parameters µ1, σ1², γ11, γ12 and µ2, σ2², γ21, γ22, respectively. The dependent samples case, as it occurs if the same persons are investigated before and after a treatment, can be reduced to the one-sample case discussed in experiment 1.
Our purpose therefore is to take two independent random samples (y11, ..., y1n1) and (y21, ..., y2n2) of sizes n1 and n2 from the two populations in order to test the null hypothesis
H0 : µ1 = µ2
against one of the alternatives
a) HA : µ1 > µ2
b) HA : µ1 < µ2
c) HA : µ1 ≠ µ2
Calculate the pooled variance

s_p² = [ Σ_{i=1}^{n1} (y1i − y1.)² + Σ_{i=1}^{n2} (y2i − y2.)² ] / (n1 + n2 − 2)    (10)

and the test statistic

t = (y1. − y2.)/s_p · √( n1·n2/(n1 + n2) )

and reject H0 as in test 1, with the quantiles t(n1 + n2 − 2; 1 − α) and t(n1 + n2 − 2; 1 − α/2) in place of u(1 − α) and u(1 − α/2), accepting it otherwise.

For the Welch test calculate

t_W = (y1. − y2.) / √( s1²/n1 + s2²/n2 )    (11)

and the approximate degrees of freedom

f* = ( s1²/n1 + s2²/n2 )² / [ s1⁴/((n1 − 1)·n1²) + s2⁴/((n2 − 1)·n2²) ]    (12)

reject H0 with the quantiles t(f*; 1 − α) and t(f*; 1 − α/2) and accept it otherwise (the three cases correspond to those in test 1).
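Statistics (10) to (12) read in code form as follows (our sketch):

```python
import math

def two_sample_statistics(y1, y2):
    """Pooled-variance t of (10), Welch statistic t_W of (11) and
    Satterthwaite-type degrees of freedom f* of (12)."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    v1 = sum((y - m1) ** 2 for y in y1) / (n1 - 1)   # s1^2
    v2 = sum((y - m2) ** 2 for y in y2) / (n2 - 1)   # s2^2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2) * math.sqrt(n1 * n2 / (n1 + n2))
    tw = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    fstar = (v1 / n1 + v2 / n2) ** 2 / (
        v1 ** 2 / ((n1 - 1) * n1 ** 2) + v2 ** 2 / ((n2 - 1) * n2 ** 2))
    return t, tw, fstar
```

For equal sample sizes t and t_W coincide, but f* drops below n1 + n2 − 2 when the sample variances differ.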
WILCOXON (1945) and later MANN and WHITNEY (1947) proposed a two-sample test based on the ranks of the observations. This test does not rest on the normal assumption; in its exact form only two continuous distributions with all moments existing are assumed. We call it the Wilcoxon test. As can be seen from the title of the second paper, this test examines whether one of the underlying random variables is stochastically larger than the other. It can be used for testing the equality of means under additional assumptions: the null hypothesis tested by the Wilcoxon test corresponds with our null hypothesis H0 : µ1 = µ2 if and only if all higher moments of the two populations are equal. Otherwise a rejection of the Wilcoxon hypothesis says little about the rejection of H0 : µ1 = µ2. The test runs as follows:
Calculate d_ij = 1 if y1i > y2j and d_ij = 0 otherwise, and then
W = W12 = n1(n1 + 1)/2 + Σ_{i=1}^{n1} Σ_{j=1}^{n2} d_ij    (14)
Reject H0 by comparing W with the corresponding quantiles W(n1, n2; α) and accept it otherwise (the three cases correspond to those in test 1). Extensive tables of the quantiles W(n1, n2; α) are given by VERDOOREN (1963).
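Statistic (14) reads in code as follows; the pair indicator d_ij = 1 iff y1i > y2j is our assumption (its definition is lost above), chosen because it makes W the rank sum of the first sample:

```python
def wilcoxon_W(y1, y2):
    """Rank-sum statistic (14): n1(n1+1)/2 plus the number of pairs
    with y1i > y2j. For tie-free data this equals the sum of the
    ranks of sample 1 in the combined ranking."""
    n1 = len(y1)
    u = sum(1 for a in y1 for b in y2 if a > b)
    return n1 * (n1 + 1) // 2 + u
```

Ties are ignored in this sketch; exact treatments use mid-ranks.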
For the Lord test calculate

T_Lord = (y1 − y2)/(w1 + w2)

with the ranges w1 and w2 of the two samples, and reject H0 by comparing T_Lord with the quantiles τ(n1, n2; P), accepting it otherwise (the three cases correspond to those in test 1). Tables of the quantiles τ(n1, n2; P) are given in Lord's paper.
The following tests are due to TIKU (1980).
FRICK (1985) investigated the robustness of two tests against non-normality. Besides
the normal distribution the Fleishman distributions in Table 4 and two truncated distribu-
tions have been used. In some of the simulations different distributions have been generated
in the two samples. The null hypothesis
H0 : µ1 = µ2
was tested against
HA : (µ1 − µ2)²/σ² = d² .
Calculate

Q = exp( −d²/(2K²) ) · H( (f + 1)/2 , 1/2 ; d²t²/( 2K²·(f + t²) ) ) ,

H denoting the confluent hypergeometric function, with

K² = 1/n1 + 1/n2 ,  f = n1 + n2 − 2 ,  t = (y1 − y2)/(s_P·K) ,
where sampling from the two populations occurs with probabilities π1, π2 (0 < πi < 1, i = 1, 2; π1 + π2 = 1).
For the Welch modification calculate

Q_W = exp( −d_W²/(2K²) ) · H( (f_W + 1)/2 , 1/2 ; d_W²·t_W²/( 2K²·(f_W + t_W²) ) )

with f_W = f* in (12), t_W from (11), d_W² = 2(µ1 − µ2)²/(π1σ1² + π2σ2²) and the other symbols as in test 1.
The decision rules are as for test 1 with QW in place of Q.
We note here that Häusler (2003) investigated the robustness of the triangular sequential test described by Schneider (1992) for some Fleishman distributions.
y_i = µ_i + e_i ,  i = 1, ..., k ;  E(e_i) = 0 ;  var(e_i) = σ² for all i.
Several hypotheses can be tested. If both risks of a pair-wise comparison of all the µi are defined pair-wise, the pair-wise t-test is the appropriate procedure. This means that for each comparison a two-sample t-test with a pooled variance estimate is applied. The tests of STEEL (1959) and NEMENYI (1963) mentioned below can be applied in this case, too.
Their properties are investigated for comparing means with a control.
To test
H 0 : µ1 = µ2 = ... = µk
against the alternative that at least two of the means are different, usually the F-test (of a one-way ANOVA) is used.
If the risk of the first or the second kind must be understood for one comparison only,
this risk is called comparison-wise. If the risk must be understood for any comparison in the
experiment, it is called experiment-wise.
There are many multiple comparison procedures, parametric and non-parametric ones, for several situations and for comparison-wise or experiment-wise significance levels (see the post hoc tests in the SPSS ANOVA branch).
This is also true for comparing k − 1 means with a standard or control, where comparing the k − 1 non-standard procedures with each other is not the aim of the experiment. To test (if the standard mean is µk) the null hypotheses
H0ik : µi = µk (i = 1, ..., k − 1)
we use for a comparison-wise significance level the t-test for each comparison, with a pooled variance estimate (equal variances assumed):
s² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} ( y_ij − y_i. )² / ( Σ_{i=1}^{k} n_i − k )    (15)
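The pooled estimate (15) in code (our sketch; unequal group sizes are allowed):

```python
def pooled_variance(samples):
    """Pooled variance estimate (15) over k samples with sizes n_i:
    total within-group sum of squares divided by (sum of n_i) - k."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    ss = 0.0
    for s in samples:
        m = sum(s) / len(s)
        ss += sum((y - m) ** 2 for y in s)
    return ss / (n_total - k)
```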
A Wilcoxon type rank test of Steel (1959) rejects (based on samples of size n) for the
significance level α
if
T_S = max( W_ik ; W*_ik ) > r(k − 1, n, 1 − α) ,  i = 1, ..., k − 1 .
For the bounds r(k − 1, n, 1 − α), i = 1, ..., k − 1, see Steel (1959). The W_ik are defined analogously to (14) and the W*_ik are the corresponding values for the inverse rank order.
The non-parametric Kruskal-Wallis test in the version of NEMENYI (1963) is based on

T_KW = | (1/n)·Σ_{j=1}^n R_kj − (1/n)·Σ_{j=1}^n R_ij | ,  (i = 1, ..., k − 1)

with the ranks R_lj of the j-th element of the l-th sample in the overall ranking of all observations. H0ik : µi = µk (i = 1, ..., k − 1) is rejected if

T_KW > √( 2·kn(kn + 1)/(12n) ) · d(k − 1; ∞; 1 − α)

with the d(k − 1; ∞; 1 − α) of Dunnett's test below.
For an experiment-wise significance level we use the Dunnett test (see RASCH et al., 1999, page 133): reject H0ik : µi = µk (i = 1, ..., k − 1) with experiment-wise significance level αe whenever

| y_i. − y_k. | > √( s²·(1/n_i + 1/n_k) ) · d*

holds. For α = 0.05, n1 = n2 = ... = n_{k−1} = n and n_k = n·√(k − 1), the value of d* is the corresponding quantile of Dunnett's distribution (tabulated in Rasch et al., 1999).
RUDOLPH (1985 a, b) compared in a simulation experiment the properties of the Dunnett test with those of the tests of Steel and of Kruskal and Wallis for k = 3, n = 6 and n = 21; α = 0.01 and α = 0.05; and the following variance structures:
Table 5:
Variance structures of the simulation experiment
Case   σ1²   σ2²   σ3²
1 0.5 1 1
2 0.5 0.5 1
3 0.5 2 1
4 2 2 1
5 1 2 1
6 1 1 1
The power was estimated as the relative frequency P_i of rejecting H0ik : µi = µ3 (i = 1, 2) if

| µi − µ3 |·√n / √( σ1² + σ2² ) = Δi    (16)
For comparing two variances we assume samples (y11, ..., y1n1) and (y21, ..., y2n2) of sizes n1 and n2 drawn independently from two populations with parameters µ1, σ1², γ11, γ12 and µ2, σ2², γ21, γ22, respectively.
Our purpose is to test the null hypothesis
H0 : σ1² = σ2²
against
HA : σ1² ≠ σ2²
The usual test for comparing two variances is the F-test based on the ratio F = s1²/s2² of the two sample variances. Nürnberg (1985) compared its behaviour with that of 10 further tests (Bartlett, modified Bartlett, modified χ², Cochran, Range, Box-Scheffé, Box-Andersen, Jackknife, Levene-z, Levene-s). We report here only the most important members of this set.
Levene test: For this test we use the observations y1i and y2i to calculate deviations from the respective sample means (absolute deviations for one version, squared deviations for the other). Then we carry out an independent samples t-test with these derived values, giving the z- and the s-version of the test, respectively.
For the Box-Scheffé test (also known as Box test or Box-Kendall test) (Box, 1953; Scheffé, 1963) the samples are randomly divided into c groups, c given. These groups contain m_ij, i = 1, 2; j = 1, ..., c, observations. Let s_ij² denote an estimator of σ_i² computed from group j, and define from these the statistic F+. The null hypothesis will be rejected if F+ > F(1, 2(c − 1); 1 − α). In his simulation study, Nürnberg (1985) considered the cases c = 2 and c = 3.
Tests and confidence estimation in linear and quasi-linear (for instance polynomial) regression can be expected to behave like the corresponding inferences for means. Because intrinsically non-linear regression, like exponential regression, plays no important role in psychological research, we summarize the results only briefly. The different non-linear regression functions investigated in the robustness research are described in Rasch (1995, chapter 16).
In intrinsically non-linear regression, problems arise which are not known for the procedures discussed so far. Parameters can be estimated only iteratively, and even for normal distributions the distribution of the estimators (including their expectation and variance) is known only asymptotically (i.e. for a sample size tending to infinity). Therefore we investigated, together with the robustness, also the behaviour of the estimators and of the tests for the regression parameters in the normal case for small samples.
As already mentioned in Rasch et al. (1999) and Rasch (2003), selection procedures should be preferred to multiple comparisons because most practical problems are selection problems.
Let us take independent samples from normal distributions. If the model of simulation experiment 5 is applied, then we are interested in finding the greatest of the expectations µi (i = 1, ..., a). Without loss of generality we assume that
µ1 ≤ µ2 ≤ ... ≤ µa .
But this order is not known to the user. There are two formulations of this problem: the indifference-zone formulation and the subset selection formulation.

The selection rule based on the sample means uses the estimators

µ̂i = y_i. = (1/n)·Σ_{j=1}^n y_ij .
For normal distributions Rasch (1995) presents tables for planning the sample size n.
RTr0.1 and RTr0.2: The trimmed means

y_iα = [ 1/(n(1 − 2α)) ] · Σ_{j=nα+1}^{n(1−α)} y_i(j)

are less sensitive to outliers. Here y_i(j) denotes the order statistics y_i(1) ≤ y_i(2) ≤ ... ≤ y_i(n) of the variables y_ij, j = 1, ..., n. If n·α is not an integer, y_i([nα]+1) and y_i([n(1−α)]+1) have to be counted with the weight 1 − (nα − [nα]), where [x] denotes the integer part of x. We consider the estimators µ̂i = y_i0.1 for rule RTr0.1 and µ̂i = y_i0.2 for rule RTr0.2, respectively.
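The trimming rule, including the fractional weights for the boundary order statistics when n·α is not an integer, can be sketched as follows (our illustration):

```python
def trimmed_mean(y, alpha):
    """alpha-trimmed mean as in rules RTr0.1 / RTr0.2: drop the
    extreme [n*alpha] order statistics on each side; if n*alpha is
    not an integer, the two boundary order statistics are counted
    with weight 1 - (n*alpha - [n*alpha])."""
    ys = sorted(y)
    n = len(ys)
    na = n * alpha
    g = int(na)                  # [n*alpha]
    w = 1.0 - (na - g)           # boundary weight (1.0 when na is an integer)
    total = w * (ys[g] + ys[n - g - 1]) + sum(ys[g + 1:n - g - 1])
    return total / (n * (1.0 - 2.0 * alpha))
```

An outlier such as 100 in an otherwise small sample is thereby down-weighted to zero, which is exactly the robustness property motivating these rules.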
µ̂i = y_Tir' = [ 1/(n − 2r + 2rβ') ] · [ Σ_{j=r+1}^{n−r} y_i(j) + rβ'·( y_i(r+1) + y_i(n−r) ) ] ,

where r = [0.5 + r'·n] and β' = −φ(t)·(t − φ(t)·n/r)·n/r (0 < β' < 1) with t from 1 − Φ(t) = r/n. Following Tiku (1981) we use r' = 0.1 for RTr0.1 and r' = 0.3 for RTr0.3, respectively.
RA: Randles, Ramberg and Hogg (1973) described an adaptive selection rule. This rule works in two steps. In the first step, by means of a particular estimator of the kurtosis, the distribution of the data is classified into one of the three groups „light-tailed“, „medium-tailed“ and „heavy-tailed“. In the second step a selection rule is applied depending on the group found in the first step.
RRS: The selection rule RRS uses the rank sums R_i = Σ_{j=1}^n r_ij instead of µ̂i, where r_ij is the rank of y_ij in the overall ranking of all observations.
The probability of a correct selection was estimated for the least favourable configuration
µ1 = µ2 = ... = µa−1 = µa − d
with different values for d. For all other configurations the P(dCS) would be greater. The distributions used in the simulation study of Domröse and Rasch (1987) are shown in Table 6.
Table 6:
(γ1, γ2)-values used in the simulation experiment

type of distribution   N    U    F    F    F    F
γ1                     0    0    0    1    2    −2
γ2                     0   −1.2  6   1.5   6     6

N: normal, U: uniform, F: Fleishman
n d /σ
11 1; 0.75
21 1; 0.75
47 0.75; 0.5
The goal of a subset selection procedure consists in selecting a subset, s , of the a distri-
butions such, that the probability P (CS ) , that the „best“ distribution belongs to this subset,
is not smaller than a prespecified value P * . The first of the following selection rules was
developed by Gupta for the case of normal distributions N ( µi , σ 2 ) :
RG: (Gupta, 1956, 1965) Calculate the sample means y_i., i = 1, ..., a, and the pooled variance estimate (15). Select the distribution i if

y_i. ≥ max_{1≤l≤a} y_l. − t_{a−1}(d.f., P*)·√(2s²/n) .⁴
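Rule RG reads in code as follows (our sketch; the quantile t_{a−1}(d.f., P*) must be supplied from tables):

```python
import math

def gupta_subset(means, s2, n, t_quantile):
    """Gupta's rule R_G: select population i iff
    ybar_i >= max_l ybar_l - t * sqrt(2 * s^2 / n)."""
    bound = max(means) - t_quantile * math.sqrt(2.0 * s2 / n)
    return [i for i, m in enumerate(means) if m >= bound]
```

For example, sample means (1, 2, 3) with s² = 1, n = 5 and a (hypothetical) quantile of 2 select the last two populations.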
A related rule replaces the mean differences by the median of the n² pairwise differences D^(ij) between the observations of samples i and j:

D^(ij)_[med] = D^(ij)_[k+1] ,  if n² = 2k + 1 ,
D^(ij)_[med] = ( D^(ij)_[k] + D^(ij)_[k+1] )/2 ,  if n² = 2k .
Table 7:
Simulated distributions
4 Listing and Rasch (1996) erroneously wrote √(s²/2) instead of √(2s²/n).
In order to estimate P(CS), Listing and Rasch (1996) simulated the least favourable configuration
µ1 = µ2 = ... = µa .
For estimating the expected number f(d) of selected false populations, the configurations
µ1 = µ2 = ... = µa−1 = µa − d·σ
were used.
Table 8:
Sample sizes which give 20%-robustness
It is not surprising that test 3 is quite good if the kurtosis is small; then it behaves better than the t-test (test 2). But for sample sizes larger than or equal to 50 all the tests are 20%-robust. The average sample sizes over the 10 000 replications did not differ much between the three tests if d = 1; otherwise they were larger for test 3 than for the other tests. It is known that sequential tests guarantee the risks only approximately even in the normal case. This can be seen from Table 9, but the test is conservative.
Table 9:
Percentage of false rejections of H0 for 8 distributions from Table 4 (columns 3-10, with [γ1; γ2]) for α = 0.05, β = 0.2 and d = 0.6
d Test ti 1 2 3 4 5 6 7 8
[0; 0] [1.006;0.6] [0.35; -0.35] [1.5; 3.75] [0; 3.75] [0; 7] [1; 1.5] [1.5; 3.75]
0.6 1 4.34 7.44 4.93 10.41 3.23 2.95 6.17 7.57
0.6 2 4.64 1.69 3.59 11.11 3.33 2.63 1.78 1.36
0.6 3 4.9 9.18 5.41 12.54 3.33 3.53 7.15 7.02
Table 10:
Percentage of false acceptances of H0 for 8 distributions from Table 4 (columns 3-10, with [γ1; γ2]) for α = 0.05, β = 0.2 and d = 0.6
d Test ti 1 2 3 4 5 6 7 8
[0; 0] [1.006;0.6] [0.35; -0.35] [1.5; 3.75] [0; 3.75] [0; 7] [1; 1.5] [1.5; 3.75]
0.6 1 15.54 8.95 13.78 5.05 14.64 14.76 10.46 7.24
0.6 2 15.46 27.8 19.45 31.3 13.15 11.53 24.01 27.01
0.6 3 18.9 29.87 21.68 38.84 17.74 16.8 27.21 22.75
The Wald test (test 1) shows for the normal distribution (distribution 1) the best approximation of both risks and is always robust (more than that, conservative) with respect to the risk of the second kind. With the exception of the extremely non-normal distributions 2, 4, 7 and 8 it is also robust with respect to the significance level. Of the 3 tests examined, test 2 must be preferred, even though its power is lower than that of test 3 (a consequence of the higher risk of the first kind of test 3, whose power function dominates that of tests 2 and 1 for all distributions except distribution 8). For practical distributions, in which either the skewness or the kurtosis is low, the test is robust and the average sample size moderate.
The results of the experiments described above led to a huge data set, and it therefore seems better to repeat here a summary, published in Guiard, V. and Rasch, D. (ed.) (1987), pages 37-73, unifying the results of Posten (1978) and of Tuchscherer and Pierer (1985). This summary is given in figure 2.
Figure 2:
Proposals for the comparison of two means
The Wilcoxon test is also quite good in the case of equal variances and equal sample sizes, but there is no real need to use it, even with small samples. The tests proposed are in general more robust in cases where no prior knowledge is available, and this is the case in most practical situations. Before an experiment is performed it has to be planned; amongst other things, its size has to be determined (see RASCH, 2003). We recommend designing the experiment with equal sizes in both samples, not only because the total size N = n1 + n2 is minimal for a given precision if the two sample sizes are equal, but also because the robustness is larger in this case and the t-test can always be applied.
For pairwise sampling at each step, and if the non-normality of the two populations is of the same type, both tests are conservative for both risks and 20%-robust. The skewness then has nearly no influence on the risks. Pairwise sampling is recommended.
If the non-normality in the two populations is different, the skewness influences the risks and the tests are not always robust. The kurtosis then has no big influence. We think that in practical situations differences in the higher order moments of the two populations cannot be expected.
For pair-wise sampling in both populations with different probabilities π1 = 0.75, π2 = 0.25 the results are given in Table 11.
Table 11:
Estimated actual risks α̂ and β̂ for Hajnal’s (suffix H) test and the Welch modification
(suffix W) for pairwise sampling and sampling with probabilities π1 = 0.75;π 2 = 0.25 ;
d = dW =1.5 are given for α = 0.05 and β = 0.1
Summarizing the results of Table 11 and of further simulation results by Frick (1985) for unequal variances, we propose pair-wise sampling and the Welch modification (test 2), which is also robust in the case of equal non-normality but unequal variances in the two populations. Recently it was shown by Häusler (2003) that the triangular sequential test (Schneider, 1992) behaves much better than the tests above. More information will be found in Rasch, Kubinger, Schmidtke and Häusler (2004).
For the multiple t-test the robustness is at least as good as for the two-sample t-test, because robustness increases with the d.f. of the t-test. If the size of each sample is n, the two-sample t-test has 2(n − 1) d.f., but in the k-sample case we have k(n − 1) d.f. Therefore no new simulation is needed. This holds analogously in the case of comparing with a standard.
That the F-test is very robust against non-normality (but be careful: the F-test for comparing variances is extremely non-robust, see result 6) was already shown by ITO (1964), and no further simulation was needed. For the many existing multiple comparison procedures, as for instance those listed under „post hoc tests” in the SPSS ANOVA branch, no general recommendation can be given, but we think that the Tukey and the Student-Newman-Keuls tests (based on the range) are as robust as the t-test, due to the two-sample results for the t- and the Lord (range) test. This is supported by the poor power results of the rank tests in Rudolph's experiment, where these tests have been compared with the Dunnett test. The result of Rudolph (1985b) shows that the Dunnett test must be preferred to the rank tests for n < 15; for larger n it is still good, but the rank tests behave better than for small samples. For non-normal distributions and variance heterogeneity no test is really robust. Here a multiple Welch test may be helpful, as in the two-sample problem.
Table 12:
Estimated actual risks 100α̂ for α = 0.05, k = 3, n = 6 and n = 21 and variance homogeneity (structure 6), and 100P1 and 100P2 for Δ1 = Δ2 = 3.46 and n = 6 and n = 21
γ1 γ2 P1 + P2 P1 + P2
Test n=6 n = 21 ,n = 6 , n = 21
2 2
0 0 Dunnett 4.9 4.8 83.48 63.46
Steel 4.85 4.82 75.93 55.58
Kruskal - Wallis 4.72 4.64 71.48 56.52
0 1.5 Dunnett 4.16 5.13 83.34 63.67
Steel 4.27 5.34 76.33 62.42
Kruskal - Wallis 4.31 5.32 73.05 63.29
0 3.75 Dunnett 4.26 4.74 83.87 64.66
Steel 4.89 4.82 77.29 70.50
Kruskal - Wallis 4.61 4.64 74.98 71.11
0 7 Dunnett 3.98 4.82 84.05 66.22
Steel 4.74 5.34 77.06 77.33
Kruskal - Wallis 4.55 5.32 76.16 78.94
1 1.5 Dunnett 4.75 4.90 83.53 64.04
Steel 5.01 4.95 76.19 64.28
Kruskal - Wallis 4.75 4.72 72.63 64.49
1.5 3.75 Dunnett 4.23 4.97 83.51 64.59
Steel 4.71 5.09 76.01 72.98
Kruskal - Wallis 4.3 4.98 73.26 72.59
2 7 Dunnett 4.17 4.3 83.62 65.49
Steel 5.08 5.1 76.79 80.81
Kruskal - Wallis 4.73 5.01 76.06 80.19
We now report some of Rudolph's results in Table 12. Because P1 and P2 differed by no
more than 0.4, we report their arithmetic mean.
The results must not be interpreted as meaning that the power does not increase with increasing n.
From (16) we see that the same ∆ for a larger n corresponds to a smaller difference between the means. The
following conclusion can be drawn. For small samples the Dunnett test is uniformly more
powerful than its non-parametric competitors. If the sample size becomes larger, the non-
parametric tests are slightly better than the Dunnett test for a kurtosis of 3.75 and larger. In
Figure 1 such values are the exception rather than the rule, and because sample
size determination is easier for the Dunnett test, we see no need to replace this test by a
non-parametric one. Furthermore, we recommend the selection rules in place of multiple
comparisons, not only because they need fewer observations but also because they are more
robust, as shown in Result 7.
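The small-sample advantage of parametric tests over their rank competitors can be illustrated with a short simulation of our own (the one-way F-test and the Kruskal-Wallis test stand in here for the many-one tests of Table 12; k = 3 and n = 6 follow the table, the shift of 1.5σ is an assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Power of the parametric F-test versus the Kruskal-Wallis rank test
# for normal data with k = 3 small samples (n = 6); one group is
# shifted by 1.5 standard deviations.
k, n, delta, n_runs = 3, 6, 1.5, 2000
f_pow = kw_pow = 0
for _ in range(n_runs):
    groups = [rng.normal(mu, 1.0, size=n) for mu in (0.0, 0.0, delta)]
    f_pow += int(stats.f_oneway(*groups).pvalue < 0.05)
    kw_pow += int(stats.kruskal(*groups).pvalue < 0.05)

print(f"F-test power:         {f_pow / n_runs:.3f}")
print(f"Kruskal-Wallis power: {kw_pow / n_runs:.3f}")
```

For samples this small the parametric test typically shows the higher power, in line with Rudolph's findings.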
Indifference-zone formulation
Some of the estimates of P(dCS), calculated from 6000 simulation runs, are
shown in Table 13.
From these results we derive the following conclusions and recommendations:
1. The easily handled Bechhofer selection rule RB, with extensive tables for planning the
sample size n in Rasch et al. (1996), can be recommended.
2. If the underlying distribution is symmetric with unknown kurtosis, we recommend
the adaptive selection rule RA.
3. If the underlying distribution possibly has a negative kurtosis, or is skew with low
kurtosis and must be trimmed on account of outliers, α should be kept as small as
possible and both order statistics y(nα+1) and y(n(1−α)) should be given larger weight, say by the
use of RTr0.1.
4. If the underlying distribution is distinctly skew, we recommend the application of
the rank sum selection rule RRS.
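Bechhofer's rule RB from recommendation 1 is simple enough to sketch: select the population with the largest sample mean. The following Monte Carlo check (ours, not the authors') estimates P(CS) in the least favourable configuration, where the best mean exceeds the others by exactly d; the row n = 21, d/σ = 0.75 of Table 13 is reproduced under the assumption k = 10 (the value used in Tables 14 and 15).

```python
import numpy as np

rng = np.random.default_rng(42)

# Bechhofer's indifference-zone rule R_B: draw n observations from each
# of k normal populations and select the population with the largest
# sample mean.  Least favourable configuration: the best mean exceeds
# all others by exactly d (here d/sigma = 0.75, n = 21, assumed k = 10).
k, n, d_over_sigma = 10, 21, 0.75
means = np.array([0.0] * (k - 1) + [d_over_sigma])  # best population last
n_runs = 5000
correct = 0
for _ in range(n_runs):
    sample_means = rng.normal(means, 1.0, size=(n, k)).mean(axis=0)
    correct += int(np.argmax(sample_means) == k - 1)

print(f"estimated P(CS) for R_B: {correct / n_runs:.4f}")
```

Under these assumptions the estimate lands near 0.95, matching the corresponding RB entries of Table 13.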
Table 13:
Estimates P̂(dCS) for different selection rules, given 1 − β = 0.95

Selection   n    d/σ    γ1 = 0      γ1 = 0    γ1 = 0    γ1 = 1     γ1 = 2    γ1 = −2
rule                    γ2 = −1.2   γ2 = 0    γ2 = 6    γ2 = 1.5   γ2 = 6    γ2 = 6
RB          21   0.75   0.9553      0.9496    0.9515    0.9454     0.9320    0.9627
            47   0.5    0.9533      0.9495    0.9512    0.9392     0.9483    0.9627
RTr0.1      21   0.75   0.8970      0.9425    0.9847    0.9449     0.9605    0.9787
            47   0.5    0.8940      0.9405    0.9878    0.9456     0.9635    0.9763
RTr0.2      21   0.75   0.8367      0.9313    0.9903    0.9359     0.9585    0.9810
            47   0.5    0.8393      0.9256    0.9907    0.9346     0.9650    0.9790
RTi0.1      21   0.75   0.9280      0.9423    0.9760    0.9427     0.9485    0.9742
            47   0.5    0.9270      0.9442    0.9783    0.9483     0.9557    0.9730
RTi0.2      21   0.75   0.8150      0.9253    0.9893    0.9290     0.9523      –
            47   0.5    0.8168      0.9156    0.9888    0.9244     0.9557      –
RA          21   0.75   0.9862      0.9483    0.9897    0.9434     0.9335    0.9677
            47   0.5    0.9995      0.9495    0.9913    0.9485     0.9412    0.9637
RRS         21   0.75   0.9244      0.9434    0.9867    0.9755     0.9982    0.9844
            47   0.5    0.9369      0.9403    0.9884    0.9746     1.0000    0.9920
The estimated values of P(CS) for P* = 0.95 are shown in Table 14.
Entries with P̂(CS) < P* − ε are set in boldface. As robustness condition we
use ε = β/5 = 0.01, where β = 1 − P* denotes the error probability. The estimates of the
expected probabilities f(d) of selecting non-best populations are shown in Table 15. In
the cases where a subset selection procedure fails to be robust, we have marked the
corresponding entry with a "*" sign.
For the case of equal variances, Listing and Rasch (1996) recommend using the Gupta
rule RG if it is known that the distributions are approximately normal. If the distributions
are completely unknown, the adaptive rule RA should be used.
For the case of unequal variances, no clear recommendation could be given.
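For the equal-variance case, Gupta's subset rule RG can also be sketched by simulation (our illustration, not the original study): with known common σ, population i is retained in the subset iff its sample mean lies within d·σ/√n of the largest sample mean. The constants below are assumptions: k = 10 and n = 35 as in Tables 14 and 15, and d ≈ 3.43, found numerically so that the P* = 0.95 requirement holds for k = 10.

```python
import numpy as np

rng = np.random.default_rng(3)

# Gupta's subset selection rule R_G (known common sigma): keep
# population i in the subset iff mean_i >= max_j mean_j - d*sigma/sqrt(n).
# Assumed constants for illustration: k = 10, n = 35, sigma = 1,
# d ~ 3.43 (solves the P* = 0.95 requirement for k = 10).
k, n, sigma, d = 10, 35, 1.0, 3.43
n_runs = 4000
correct = 0
subset_sizes = 0
for _ in range(n_runs):
    # least favourable configuration for R_G: all means equal
    xbar = rng.normal(0.0, sigma, size=(n, k)).mean(axis=0)
    subset = xbar >= xbar.max() - d * sigma / np.sqrt(n)
    correct += int(subset[0])        # population 0 plays the role of "best"
    subset_sizes += subset.sum()

print(f"P(CS) estimate:   {correct / n_runs:.3f}")
print(f"mean subset size: {subset_sizes / n_runs:.2f}")
```

The price for the P(CS) guarantee is visible in the mean subset size: under equal means the rule retains almost all k populations.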
Table 14:
P̂(CS) for P* = 0.95, n = 35, and k = 10
Table 15:
f̂(d) for d = 0.5, P* = 0.95, n = 35, and k = 10
5. Final Conclusions
In general we can state that non-parametric tests are not really needed for the inferences
discussed in this article. Only the Wilcoxon test may be applied in some cases, but only
in a few of them (see Result 4) is it really better than the t-test. Considering the problems
that arise in determining optimal sample sizes for non-parametric procedures, as discussed
in Rasch (2003), we recommend the use of parametric inference in general. In special cases,
in Rasch et al. (1996, 1998) as well as in the design software CADEMO, a warning is
given if the sample size is too small to apply parametric inference. In designing an experiment,
the sample size must then be increased to reach robustness, and in the analysis of
undesigned experiments (we hope that such an approach will disappear from psychological
research in the near future) the parametric procedure should be supported by the corresponding
non-parametric one.
Selection procedures, the classical and the sequential (test 2) t-test (as well as the F-test
for the effects in ANOVA models) for means and for regression coefficients in linear or
quasi-linear regression models are robust. For the t-test in the two- or k-sample problem this
is true if the variances are equal in all populations. For unequal variances in the two-sample
problem the Welch test is recommended.
For comparing variances the F-test is extremely non-robust, or sensitive; it should never
be used (do not forget that the F-test for comparing means is fairly robust). To compare
variances, the Box-Scheffé test (c = 2 or 3) is 20%-robust for n ≥ 6 and therefore recommended.
For n ≥ 18 Levene's test can also be applied.
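The contrast between the two variance tests can be demonstrated with a short simulation of our own (the t-distribution with 7 d.f., kurtosis 2, is an assumed example of a non-normal parent; scipy's `levene` uses the median-centred Brown-Forsythe variant by default):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Actual level of the variance-ratio F-test versus Levene's test when
# both samples come from a t-distribution with 7 d.f. (kurtosis 2);
# the variances are equal, so H0 is true.
n, n_runs = 20, 2000
f_rej = lev_rej = 0
for _ in range(n_runs):
    x = rng.standard_t(7, size=n)
    y = rng.standard_t(7, size=n)
    # two-sided F-test on the ratio of sample variances
    F = np.var(x, ddof=1) / np.var(y, ddof=1)
    p_f = 2 * min(stats.f.cdf(F, n - 1, n - 1), stats.f.sf(F, n - 1, n - 1))
    f_rej += int(p_f < 0.05)
    lev_rej += int(stats.levene(x, y).pvalue < 0.05)

print(f"F-test actual level:        {f_rej / n_runs:.3f}")
print(f"Levene's test actual level: {lev_rej / n_runs:.3f}")
```

Under this moderate kurtosis the F-test already rejects far too often, while Levene's test stays near the nominal 5%.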
6. References
1. Bechhofer, R.E. (1954): A single-sample multiple decision procedure for ranking means of
normal populations with known variances. Ann. Math. Statist. 25, 16-39.
2. Bock, J. (1982): Definition der Robustheit. In: Guiard, V. (ed.) 1981, 1-9.
3. Box, G.E.P. & Tiao, G.C. (1964): A note on criterion robustness and inference robustness.
Biometrika 51, 1-34.
4. Domröse, H., Rasch, D. (1987): Robustness of Selection Procedures. Biom. J. 29, 5, 541-553.
5. Feige, K.-D., Guiard, V., Herrendörfer, G., Hoffmann, J., Neumann, P., Peters, H., Rasch, D.
& Vettermann, Th. (1985): Results of Comparisons Between Different Random Number
Generators. In: Rasch, D. & Tiku, M.L. (editors): (1985), 30-34.
6. Fleishman, A. J. (1978): A method for simulating non-normal distributions. Psychometrika 43,
521-532.
7. Frick, D. (1985): Robustness of the two-sample sequential t-test. In: Rasch, D. & Tiku, M.L.
(editors): (1985), 35-36
8. Guiard, V. (ed.) (1981): Robustheit II – Arbeitsmaterial zum Forschungsthema Robustheit.
Probleme der angewandten Statistik, Heft 5, Dummerstorf-Rostock.
9. Guiard, V. & Rasch, D. (ed.) (1987): Robustheit Statistischer Verfahren. Probleme der
angewandten Statistik, Heft 20, Dummerstorf-Rostock.
10. Guiard, V. (1996): Different definitions of ∆-correct selection for the indifference zone
formulation. J. of Statistical Planning and Inference 54, 175-199.
11. Gupta, S.S. (1956): On a decision rule for a problem in ranking means. Mimeogr. Ser. No. 150,
Univ. of North Carolina, Chapel Hill.
12. Gupta, S.S. (1965): On some multiple decision (selection and ranking rules). Technometrics 7,
225-245.
13. Gupta, S.S. & Hsu, J.C. (1980): Subset selection procedures with application to motor-vehicle
fatality data in a two-way layout. Technometrics 22, 543-546.
14. Häusler (2003): Personal communication.
15. Hajnal, J. (1961): A two-sample sequential t-test. Biometrika 48, 65-75.
16. Herrendörfer, G. (ed.) (1980): Robustheit I - Arbeitsmaterial zum Forschungsthema Robustheit.
Probleme der angewandten Statistik, Heft 4, Dummerstorf-Rostock.
17. Herrendörfer, G. & Rasch, D. (1981): Definition of Robustness and First Results of an Exact
Method. Biometrics 37, 605.
18. Herrendörfer, G., Rasch, D. & Feige, K.-D. (1983): Robustness of Statistical Methods. II
Methods for the one-sample problem. Biom. Jour. 25, 327-343.
19. Hogg, R.V. (1974): Adaptive robust procedures: a partial review and some suggestions for
future applications and theory. JASA 69, 909-927.
20. Hsu, J.C. (1980): Robust and nonparametric subset selection procedures. Comm. Statist. Theory
Methods A 9, 1439-1459.
21. Huber, P.J. (1964): Robust estimation of a location parameter. Ann. Math. Statist. 35, 73-101.
22. Huber, P.J. (1972): Robust Statistics: A review. Ann. Math. Statist. 43, 1041-1067.
23. Ito, K. (1969): On the effect of heteroscedasticity and non-normality upon some multivariate test
procedures. Proc. Internat. Symp. Multiv. Analysis vol. 2, New York Academic Press, 87-120.
24. Johnson, N.J. (1978): Modified t-tests and confidence intervals for asymmetric populations.
JASA 73, 536-544.
25. Listing, J., Rasch, D. (1996): Robustness of subset selection procedures. J. Statist. Planning and
Inference 54, 291-305.
26. Nemenyi, P. (1963): Distribution-free multiple comparisons. PhD thesis Princeton Univ.,
Princeton N.J.
27. Nürnberg, G. (1982): Beiträge zur Versuchsplanung für die Schätzung von Varianzkom-
ponenten und Robustheitsuntersuchungen zum Vergleich zweier Varianzen. Probleme der
angewandten Statistik, Heft 6, Dummerstorf-Rostock.
28. Nürnberg, G. (1985): Robustness of two-sample tests for variances. In:. Rasch, D. & Tiku, M.L.
(editors): (1985), 75-82.
29. Nürnberg, G. & Rasch, D. (1985): The Influence of Different Shapes of Distributions with the
same first four Moments on Robustness. In: Rasch, D. & Tiku, M.L. (editors): (1985), 83-84.
30. MacLaren, M.D. & Marsaglia, G. (1965): Uniform Random Number Generators. J. Assoc.
Comput. Mach. 12, 83-89.
31. Mann, H.B. & Whitney, D.R. (1947): On a Test whether One of Two Random Variables is
Stochastically Larger than the Other. Ann. Math. Statist. 18, 50-60.
32. Odeh, R.E. & Evans, J.O. (1974): Algorithm AS70: The percentage points of the normal
distribution. Appl. Statist. 23, 96-97.
33. Posten, H.O (1985): Robustness of the Two-Sample T-test. In: Rasch, D. & Tiku, M.L.
(editors): (1985), 92-99.
34. Posten, H.O (1978): The Robustness of the Two-Sample T-test over the Pearson System. J. of
Statist. Comput. and Simulation, 6, 295-311.
35. Posten, H.O. (1982): Two-Sample Wilcoxon Power over the Pearson System. J. of Statist.
Comput. and Simulation, 16, 1-18.
36. Randles, R.H., Ramberg, J.S. & Hogg, R.V. (1973): An adaptive procedure for selecting the
population with largest location parameters. Technometrics 15, 769-778.
37. Rasch, D. & Herrendörfer G. (ed.) (1982): Robustheit III - Arbeitsmaterial zum Forschungs-
thema Robustheit. Probleme der angewandten Statistik, Heft 7, Dummerstorf-Rostock
38. Rasch, D. (1983): First results on robustness of the one-sample sequential t-test. Transactions of
the 9th Prague Conf. Inf. Theory, Statist Dec. Funct., Rand. Processes. Academia Publ. House
CAS, 133-140.
39. Rasch, D. (1984): Robust Confidence Estimation and Tests for Parameters of Growth Functions.
In: Gyori, I (Ed.): Szamitastechnikai es Kibernetikai Modserek. Alkalmazasa az
orvostudomangban es a Biologiaban, Szeged, 306-331.
40. Rasch, D. & Schimke, E. (1985): Die Robustheit von Konfidenzschätzungen und Tests für
Parameter der exponentiellen Regression. In: Rasch, D. (ed.) (1985), 40-92
41. Rasch, D. & Tiku, M.L. (editors) (1985): Robustness of Statistical Methods and Nonparametric
Statistics. Proceedings of the Conference on Robustness of Statistical Methods and
Nonparametric Statistics, held at Schwerin, May 29-June 2, 1983. Reidel Publ. Co. Dordrecht,
Boston, Lancaster, Tokyo.
42. Rasch, D. (1985 a): Robustness of Three Sequential One-Sample Tests Against Non-Normality.
In: Rasch, D. & Tiku, M.L. (editors): (1985), 100-103.
43. Rasch, D. (1985 b): Robustness of Sequential Tests for Means. Biom. Jour. 27, 139-148.
44. Rasch, D. (ed.) (1985c): Robustheit IV - Arbeitsmaterial zum Forschungsthema Robustheit.
Probleme der angewandten Statistik, Heft 13, Dummerstorf-Rostock.
45. Rasch, D. (1995): Mathematische Statistik. Joh. Ambrosius Barth, Berlin, Heidelberg (851 S.).
46. Rasch, D. (2003): Determining the Optimal Size of Experiments and Surveys in Empirical
Research. Psychology Science, vol. 45 ; suppl. IV ; 3-47.
47. Rasch, D. & Herrendörfer, G. (1981 a): Review of Robustness and Planning of Simulation
Experiments for its Investigation. Biometrics 37, 607.
48. Rasch, D. & Herrendörfer, G. (1981 b): Robustheit statistischer Methoden. Rostocker Math.
Kolloq. 17, 87-104.
49. Rasch, D., Herrendörfer, G., Bock, J., Victor, N. & Guiard, V. (1998): Verfahrensbibliothek
Versuchsplanung und -auswertung. R. Oldenbourg Verlag München Wien, Band I 1996; Band II
1998.
50. Rasch, D., Kubinger, K., Schmidtke, J. & Häusler (2004): Use and misuse of Hypothesis
Testing. In preparation.
51. Rasch, D., Verdooren, L.R., Gowers, J.I. (1999): Fundamentals in the Design and Analysis of
Experiments and Surveys – Grundlagen der Planung und Auswertung von Versuchen und
Erhebungen. R. Oldenbourg Verlag München Wien.
52. Reed A.H. & Frantz, M.E. (1979): A Sequential Two-Sample t-Test using Welch Type
Modification for unequal variances. Comm. Statist. A –Theory and Methods, 14, 1459-1471.
53. Rudolph, P.E. (ed.) (1985 a): Robustheit V - Arbeitsmaterial zum Forschungsthema Robustheit.
Probleme der angewandten Statistik, Heft 15, Dummerstorf-Rostock.
54. Rudolph, P.E. (1985 b): Robustness of many-one statistics. Rasch, D. & Tiku, M.L. (editors):
(1985), 128-133.
55. Schneider, B. (1992): An Interactive Computer Program for Design and Monitoring of
Sequential Clinical Trials. International Biometric Conference 1992, Hamilton, New Zealand,
Invited Paper Volume.
56. Steel, R.G.D. (1959): A multiple comparison rank sum test. Biometrics, 15, 560-572.
57. Teuscher, F. (1979): Ein hierarchischer Pseudo-Zufallszahlengenerator. Unpublished paper at
the research centre Dummerstorf-Rostock.
58. Teuscher, F. (1985): Simulation Studies on Robustness of the t- and u- Test against Truncation
of the Normal distribution. In: Rasch, D. & Tiku, M.L. (editors): (1985), 145-151.
59. Tiku, M.L. (1980): Robustness of MML estimators based on censored samples and robust test
statistics. J. Statist. Planning and Inference, 4, 123-143.
60. Tiku, M.L. (1981): Testing linear contrasts of means in experimental design without assuming
normality and homogeneity of variances. Invited paper presented at the March 22 to 26, 1981
Biometric Colloquium of the GDR-Region of the Biometric Society.
61. Tiku, M.L. (1982): Robust statistics for testing equality of means or variances. Comm. Statist
Theory Methods A 11, 2543-2558.
62. Tuchscherer, A. (1985): The robustness of some procedures for the two-sample location
problem – a simulation study (concept). In: Rasch, D. & Tiku, M.L. (editors): (1985), 159-164.
63. Tuchscherer, A. & Pierer, H. (1985): Simulationsuntersuchungen zur Robustheit verschiedener
Verfahren zum Mittelwertvergleich im Zweistichprobenproblem (Simulationsergebnisse). In:
Rudolph, P.E. (ed.) (1985 a), 1-42.
64. Verdooren, L.R. (1963): Extended tables of critical values for Wilcoxon's test. Biometrika, 50,
177-185.
65. Wald, A. (1947): Sequential Analysis. John Wiley, New York.
66. Welch, B.L. (1947): The Generalization of Student’s Problem when Several Different
Population Variances are Involved. Biometrika, 34, 28-35.
67. Wilcoxon, F. (1945): Individual Comparisons by Ranking Methods. Biometrics Bulletin, 1, 80-83.
68. Zielinsky, R. (1977): Robustness: a quantitative approach. Bull. Acad. Polonaise des Sciences,
Ser. Math., Astr. et Phys., Vol. XXV, 12, 1281-1286.