Math204 NonParOneTwo
Non-parametric tests are normally based on ranks of the data samples, and test hypotheses relating to
quantiles of the probability distribution representing the population from which the data are drawn.
Specifically, tests concern the population median, η, where
\[ \Pr[\,\text{Observation} \le \eta\,] = \frac{1}{2} \]
The sample median, xMED , is the mid-point of the sorted sample; if the data x1 , . . . , xn are sorted into
ascending order, then
\[
x_{\mathrm{MED}} =
\begin{cases}
x_{m+1} & n \text{ odd},\ n = 2m + 1 \\[4pt]
\dfrac{x_m + x_{m+1}}{2} & n \text{ even},\ n = 2m
\end{cases}
\]
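The definition of the sample median can be sketched in Python as follows. This is a minimal illustration only; the function name `sample_median` is an assumption made for this example and does not come from the course notes.

```python
def sample_median(data):
    """Return the mid-point of the sorted sample."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    m = n // 2
    if n % 2 == 1:
        # n odd: the single middle value of the sorted sample (0-based index m)
        return xs[m]
    # n even: average of the two middle values of the sorted sample
    return (xs[m - 1] + xs[m]) / 2
```

For instance, `sample_median([3, 1, 2])` picks the middle value of the sorted list, while a sample of even size returns the average of its two central values.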
(a) For a one-sided test, to test
H0 : η = η0
Ha : η > η0
we define S by
S = Number of observations greater than η0
whereas to test
H0 : η = η0
Ha : η < η0
we define S by
S = Number of observations less than η0
If H0 is true, it follows that
\[ S \sim \mathrm{Binomial}\left(n, \tfrac{1}{2}\right) \]
The p-value is defined by
p = Pr[X ≥ S]
where X ∼ Binomial(n, 1/2). The rejection region for significance level α is defined implicitly by
the rule
Reject H0 if α ≥ p.
The Binomial distribution is tabulated on pp 885-888 of McClave and Sincich.
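In place of the tables, the exact p-value can be computed directly. The sketch below assumes that for Ha : η > η0 the statistic S counts the observations greater than η0, and for Ha : η < η0 it counts those less than η0. The function names and the `alternative` argument are illustrative choices for this example, not part of the notes.

```python
from math import comb


def binomial_upper_tail(s, n):
    """Pr[X >= s] for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n


def sign_test_one_sided(data, eta0, alternative="greater"):
    """Exact one-sided sign test of H0: eta = eta0.

    alternative="greater" tests Ha: eta > eta0 (S counts values above eta0);
    alternative="less" tests Ha: eta < eta0 (S counts values below eta0).
    Observations equal to eta0 are discarded before counting.
    Returns (S, n, p-value), where n is the number of observations kept.
    """
    kept = [x for x in data if x != eta0]
    n = len(kept)
    if alternative == "greater":
        s = sum(1 for x in kept if x > eta0)
    elif alternative == "less":
        s = sum(1 for x in kept if x < eta0)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return s, n, binomial_upper_tail(s, n)
```

H0 is then rejected at significance level α when the returned p-value is at most α.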
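(b) For a two-sided test,
H0 : η = η0
Ha : η ≠ η0
we take S to be the larger of the two counts (the number of observations less than η0 and the
number greater than η0), and the p-value is
p = 2 Pr[X ≥ S]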
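As a worked illustration with hypothetical numbers (not taken from any course data set), suppose n = 10 observations remain after discarding values equal to η0, of which 8 lie above η0 and 2 lie below. Taking S to be the larger of the two counts, S = 8, and

\[
\Pr[X \ge 8] = \frac{\binom{10}{8} + \binom{10}{9} + \binom{10}{10}}{2^{10}}
= \frac{45 + 10 + 1}{1024} = \frac{56}{1024} \approx 0.0547,
\qquad
p = 2 \times \frac{56}{1024} \approx 0.109 .
\]

Since p > 0.05, H0 would not be rejected at the α = 0.05 level in this illustration.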
Notes :
1. The only assumption behind the test is that the data are drawn independently from a continuous
distribution.
2. If any data are equal to η0 , we discard them before carrying out the test.
3. Large sample approximation. If n is large (say n ≥ 30), and X ∼ Binomial(n, 1/2), then it can be
shown that
\[ X \mathrel{\dot{\sim}} \mathrm{Normal}(np,\ np(1 - p)) \]
Thus for the sign test, where p = 1/2, we can use the test statistic
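\[
Z = \frac{S - \dfrac{n}{2}}{\sqrt{n \times \dfrac{1}{2} \times \dfrac{1}{2}}}
  = \frac{S - \dfrac{n}{2}}{\sqrt{n} \times \dfrac{1}{2}}
\]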
4. For the large sample approximation, it is common to make a continuity correction, where we
replace S by S − 1/2 in the definition of Z
\[
Z = \frac{\left(S - \dfrac{1}{2}\right) - \dfrac{n}{2}}{\sqrt{n} \times \dfrac{1}{2}}
\]
Tables of the standard Normal distribution are given on p 894 of McClave and Sincich.
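The large sample calculation can also be carried out without tables. The following is a minimal sketch, assuming an upper-tail p-value Pr[Z ≥ z] is wanted. The function names and the `continuity` flag are illustrative, and the Normal distribution function is obtained from the standard library error function.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Distribution function of the standard Normal, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def sign_test_large_sample(s, n, continuity=True):
    """Normal approximation to Pr[X >= s] for X ~ Binomial(n, 1/2).

    Returns (Z, approximate upper-tail p-value). With continuity=True,
    S is replaced by S - 1/2 in the definition of Z.
    """
    shift = 0.5 if continuity else 0.0
    z = (s - shift - n / 2) / (sqrt(n) * 0.5)
    return z, 1.0 - standard_normal_cdf(z)
```

For a two-sided test the returned p-value would be doubled, mirroring the exact test.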
2. TWO SAMPLE TESTS FOR INDEPENDENT SAMPLES:
THE MANN-WHITNEY-WILCOXON TEST
For two independent samples of sizes n1 and n2 , to test the hypothesis of equal population medians
η1 = η2
we use the Wilcoxon Rank Sum Test, or an equivalent test, the Mann-Whitney U Test; we refer to this
as the
Mann-Whitney-Wilcoxon (MWW) Test
By convention it is usual to formulate the test statistic in terms of the smaller sample size. Without
loss of generality, we label the samples such that
n1 > n2 .
The test is based on the sum of the ranks for the data from sample 2.
EXAMPLE : n1 = 4, n2 = 3 yields the following ranked data
SAMPLE        2     2     1     1     1     2     1
DATA VALUE  0.16  0.20  0.31  0.48  1.02  1.97  3.11
RANK          1     2     3     4     5     6     7
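The rank sums for this example can be checked with a short Python sketch. The function name `rank_sums` is illustrative, and the code assumes there are no tied values in the pooled data.

```python
def rank_sums(sample1, sample2):
    """Return (R1, R2), the rank sums of the two samples in the pooled data.

    Assumes no tied values.
    """
    # Tag each value with its sample label, then sort the pooled data
    pooled = [(x, 1) for x in sample1] + [(x, 2) for x in sample2]
    pooled.sort(key=lambda pair: pair[0])
    r1 = r2 = 0
    for rank, (_, label) in enumerate(pooled, start=1):
        if label == 1:
            r1 += rank
        else:
            r2 += rank
    return r1, r2


sample1 = [0.31, 0.48, 1.02, 3.11]
sample2 = [0.16, 0.20, 1.97]
print(rank_sums(sample1, sample2))  # (19, 9)
```

Here the sum of the ranks for sample 2 is 1 + 2 + 6 = 9, and for sample 1 it is 3 + 4 + 5 + 7 = 19.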
The alternative hypothesis is one of
Ha : η1 < η2
Ha : η1 > η2
Ha : η1 ≠ η2
Large Sample Test
If n2 ≥ 10, a large sample test based on the Z statistic
\[
Z = \frac{U - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}
\]
can be used. Under the hypothesis H0 : η1 = η2 ,
\[ Z \mathrel{\dot{\sim}} \mathrm{Normal}(0, 1) \]
• One-sided tests:
• Two-sided test:
Ha : η1 ≠ η2 Rejection Region is R2 ≤ TL or R2 ≥ TU
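The large sample Z statistic can be sketched as below. The text here does not define U, so the code assumes the common convention U = R2 − n2(n2 + 1)/2, where R2 is the sum of the ranks for sample 2. Under the complementary convention, n1 n2 minus this value, Z changes sign but keeps the same magnitude. The function name `mww_z` is illustrative.

```python
from math import sqrt


def mww_z(r2, n1, n2):
    """Large sample Z statistic for the Mann-Whitney-Wilcoxon test.

    ASSUMPTION: U = R2 - n2 (n2 + 1) / 2, with R2 the rank sum of sample 2.
    This definition of U is one common convention, not taken from the notes.
    Returns (U, Z).
    """
    u = r2 - n2 * (n2 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean_u) / sd_u
```

Z is then compared with standard Normal critical values in the usual way.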
Notes :
1. The only assumption needed for the test to be valid is that the samples are independently
drawn from two continuous distributions.
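2. If R1 and R2 denote the sums of the ranks for samples 1 and 2 respectively, then
\[ R_1 + R_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} \]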
3. If there are ties (equal values) in the data, then the rank values are replaced by average rank
values.
DATA VALUE    0.16  0.20  0.31  0.31  0.48  1.97  3.11
ACTUAL RANK     1     2     3     3     5     6     7
AVERAGE RANK    1     2    3.5   3.5    5     6     7
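The average rank calculation for tied values can be sketched in Python as follows. The function name `average_ranks` is illustrative. The sketch also checks that the ranks still sum to N(N + 1)/2, where N is the total number of observations.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the ranks start + 1, ..., end + 1 occupied by the tied block
        avg = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


data = [0.16, 0.20, 0.31, 0.31, 0.48, 1.97, 3.11]
ranks = average_ranks(data)
print(ranks)       # [1.0, 2.0, 3.5, 3.5, 5.0, 6.0, 7.0]
print(sum(ranks))  # 28.0, equal to 7 * 8 / 2
```

The two tied values 0.31 would occupy ranks 3 and 4, so each receives the average rank 3.5.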