Non-Parametric Tests For Two Samples

The Wilcoxon rank-sum test can be used to compare the locations of two continuous distributions that have the same shape and spread but may have different means. The test involves ranking all observations from both samples together and summing the ranks for each sample. If either rank sum is less than (or equal to) the critical value from tables, the null hypothesis of equal means is rejected. An example applies the test to the mean axial twisting resistance of two metal alloys: the observations are ranked, the rank sums are calculated, and the results are compared with critical values to decide whether the null hypothesis can be rejected.


45.2 Non-parametric Tests for Two Samples

Introduction
In Section 45.1 we looked at the sign test and the Wilcoxon signed-rank test. Each of these is a
one-sample test which is used for hypotheses about the location (or average of some sort) of a
single distribution. When we looked at t-tests in Workbook 41 we saw how hypotheses concerning the
mean of a single normal distribution could be tested using a one-sample t-test, and how the means of two
normal populations could be compared using a two-sample t-test. In the same way we can use a
two-sample nonparametric test to compare the locations of two distributions when we are unwilling
to assume that the distributions are normal or belong to some other particular type. In this Section
we will look at one such test, the Wilcoxon rank-sum test.

Prerequisites
Before starting this Section you should . . .
• be familiar with the general ideas and terms of significance tests
• be familiar with the ideas of a nonparametric test and rank-based tests as explained in Section 45.1
• be familiar with t-tests
• be familiar with the general ideas of continuous distributions

Learning Outcomes
On completion you should be able to . . .
• decide when a Wilcoxon rank-sum test may be used
• use and interpret the results of a Wilcoxon rank-sum test

HELM (2005):
Workbook 45: Non-parametric Statistics

1. The Wilcoxon rank-sum test


Sometimes called the Mann-Whitney test, the Wilcoxon rank-sum test may be applied to continuous
distributions which have the same shape and spread but may have different means. If we take the
distributions as $X_1$ with mean $\mu_1$ and $X_2$ with mean $\mu_2$ then the Wilcoxon rank-sum test may be
used to test the null hypothesis
$$H_0 : \mu_1 = \mu_2$$
against the alternatives
$$H_1 : \mu_1 \neq \mu_2$$
$$H_1 : \mu_1 > \mu_2$$
$$H_1 : \mu_1 < \mu_2$$
Now assume that a random sample of size $n_1$ is taken from population $X_1$ and a random sample of
size $n_2$ is taken from population $X_2$. As with the Wilcoxon signed-rank test, the theory is demanding
but the application is straightforward. The test procedure is as follows:
1. Arrange all of the $n_1 + n_2$ sample members in ascending order and assign ranks to them. Tied
observations are dealt with in the usual way, each receiving the average of the ranks they share.
2. Find the sum of the ranks assigned to members of the smaller of the two samples and call this
$S_1$.
3. Find the sum of the ranks assigned to members of the larger of the two samples and call this
$S_2$. Normally, this is not done directly. Since the ranks $1, 2, \ldots, n_1 + n_2$ sum to
$(n_1 + n_2)(n_1 + n_2 + 1)/2$, it may be shown that
$$S_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} - S_1$$
and it is usual to use this relationship to find $S_2$ rather than do the direct calculation, to save
both time and effort.
4. When testing $H_0 : \mu_1 = \mu_2$ against $H_1 : \mu_1 \neq \mu_2$, Tables 2 and 3 given at the end of this
Workbook may be used directly to test at both the 5% and 1% levels of significance. Rejection
of the null hypothesis occurs when either rank sum is less than (or equal to) the tabulated critical value.
5. In the case of one-tailed tests the same tables may be used, but the levels
of significance are then restricted to 2.5% (from the 5% table) and 0.5% (from the 1% table).
Examples given here will normally use a two-tailed test and the 5% level of significance.
6. The tables give critical values for sample sizes $n \le 25$. For $n > 25$ we use a normal distribution
as an approximation to the distribution of the rank sum.
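
The ranking and summing steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the Workbook: the function names `average_ranks`, `rank_sums` and `reject_two_tailed` are hypothetical, the small data set is invented for the demonstration, and the critical value still has to be looked up by hand in Table 2 or Table 3. The "less than or equal to" rejection rule follows the wording used in the worked examples.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sums(sample_a, sample_b):
    """Return (S1, S2): rank sums of the smaller and of the larger sample."""
    if len(sample_a) <= len(sample_b):
        small, large = list(sample_a), list(sample_b)
    else:
        small, large = list(sample_b), list(sample_a)
    n1, n2 = len(small), len(large)
    ranks = average_ranks(small + large)
    s1 = sum(ranks[:n1])                           # step 2: direct sum
    s2 = (n1 + n2) * (n1 + n2 + 1) / 2 - s1        # step 3: by subtraction
    return s1, s2


def reject_two_tailed(s1, s2, critical_value):
    """Step 4: reject H0 when either rank sum is at or below the tabulated value."""
    return min(s1, s2) <= critical_value


# Hypothetical data, chosen only to show a three-way tie at 2.3.
s1, s2 = rank_sums([1.1, 2.3, 2.3], [0.9, 2.3, 3.0, 4.2])
print(s1, s2)  # 10.0 18.0
```

Computing $S_2$ by subtraction mirrors step 3; summing the remaining ranks directly would give the same value.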

HELM (2005):
Section 45.2: Non-parametric Tests for Two Samples


Example 6
The properties of a new alloy for potential use in aircraft wing construction are
being investigated. If the new alloy is to replace the one in current use, it must
be established that the mean axial twisting resistance of the two alloys does not
differ significantly. Ten samples of each alloy are tested and the mean axial twisting
resistance is measured. The results are given in the table below.

Mean Axial Twisting Resistance
Current Alloy     New Alloy
2224  2306        2247  2387
2340  2356        2302  2407
2410  2367        2405  2409
2389  2380        2399  2388
2402  2401        2378  2397

Use the Wilcoxon rank-sum test to decide, at the 5% level of significance, whether
there is evidence of a significant difference in the mean axial twisting resistance of
the two alloys.

Solution
Denoting the mean axial twisting resistance of the current alloy by $\mu_1$ and the mean axial twisting
resistance of the new alloy by $\mu_2$, we will test the hypothesis
$$H_0 : \mu_1 = \mu_2$$
against the alternative
$$H_1 : \mu_1 \neq \mu_2.$$
Note that in the following table the use of -c and -n to denote current and new alloys is simply a
device to enable us to distinguish between the two samples for the purposes of calculation.
Data      Sorted    Ranked
2224-c    2224-c     1
2340-c    2247-n     2
2410-c    2302-n     3
2389-c    2306-c     4
2402-c    2340-c     5
2306-c    2356-c     6
2356-c    2367-c     7
2367-c    2378-n     8
2380-c    2380-c     9
2401-c    2387-n    10
2247-n    2388-n    11
2302-n    2389-c    12
2405-n    2397-n    13
2399-n    2399-n    14
2378-n    2401-c    15
2387-n    2402-c    16
2407-n    2405-n    17
2409-n    2407-n    18
2388-n    2409-n    19
2397-n    2410-c    20

Note that a spreadsheet such as Excel will sort quickly and accurately when this notation is used.


Solution (contd.)
We now calculate the sum of the ranks assigned to the current (-c) alloy. Note that in this case the
choice of which sum to calculate is arbitrary since both samples are the same size. We have
$$S_C = 1 + 4 + 5 + 6 + 7 + 9 + 12 + 15 + 16 + 20 = 95$$
The sum $S_N$ of the ranks assigned to the new alloy is calculated as follows:
$$S_N = \frac{(10 + 10)(10 + 10 + 1)}{2} - S_C = \frac{20 \times 21}{2} - 95 = 115$$
From Table 2, the critical value at the 5% level of significance corresponding to two samples each
of size 10 is 78. As neither rank sum is less than (or equal to) this value we conclude that, on the
basis of the available evidence, we cannot reject the null hypothesis at the 5% level of significance.
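
As a cross-check on the arithmetic in this example, the tagging device can be imitated with (value, label) pairs. The following is a minimal sketch rather than the Workbook's own method; the variable names are illustrative, and it relies on the fact that this data set contains no tied values, so each rank is simply the position in the sorted list.

```python
current = [2224, 2340, 2410, 2389, 2402, 2306, 2356, 2367, 2380, 2401]
new = [2247, 2302, 2405, 2399, 2378, 2387, 2407, 2409, 2388, 2397]

# Tag each observation with its sample, as the -c / -n suffixes do in the table.
tagged = [(x, "c") for x in current] + [(x, "n") for x in new]
tagged.sort()

# No ties in this data set, so rank = 1-based position in the sorted list.
s_c = sum(rank for rank, (_, tag) in enumerate(tagged, start=1) if tag == "c")
n_total = len(tagged)
s_n = n_total * (n_total + 1) // 2 - s_c

critical_value = 78  # read by hand from Table 2 for n1 = n2 = 10
reject = min(s_c, s_n) <= critical_value
print(s_c, s_n, reject)  # 95 115 False
```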

Now do this Task.


Task

A motorcycle engineer is investigating the resistance to stretching of two alloy
steels for potential use in chains. The engineer wishes to establish in the first
instance whether there is any difference in the mean resistance to stretch of the
two alloys. 10 samples of one alloy and 12 samples of the second alloy are tested
under the same conditions and the actual stretch is measured. All samples are the
same length. The results are given in the table below.

Actual Stretch Found (mm)
Steel-Alloy 1     Steel-Alloy 2
2.22  2.30        2.24  2.38
2.34  2.35        2.31  2.43
2.41  2.36        2.42  2.25
2.38  2.39        2.45  2.43
2.40  2.41        2.37  2.29
                  2.28  2.46

Use the Wilcoxon rank-sum test to decide, at the 5% level of significance, whether
there is evidence of a significant difference in the mean resistance to stretching of
the two alloys.

Your solution
Work the problem on a separate piece of paper. Record the important stages of the work here
together with your conclusions.


Answer
Denoting the mean resistance to stretching of alloy 1 by $\mu_1$ and the mean resistance to stretching
of alloy 2 by $\mu_2$, we will test the hypothesis
$$H_0 : \mu_1 = \mu_2$$
against the alternative
$$H_1 : \mu_1 \neq \mu_2.$$
Note that the use of -1 and -2 to denote the two alloys is simply a device to enable us to distinguish
between the two samples for the purposes of calculation.
Data      Sorted    Ranked
2.22-1    2.22-1     1
2.34-1    2.24-2     2
2.41-1    2.25-2     3
2.38-1    2.28-2     4
2.40-1    2.29-2     5
2.30-1    2.30-1     6
2.35-1    2.31-2     7
2.36-1    2.34-1     8
2.39-1    2.35-1     9
2.41-1    2.36-1    10
2.24-2    2.37-2    11
2.31-2    2.38-1    12.5
2.42-2    2.38-2    12.5
2.45-2    2.39-1    14
2.37-2    2.40-1    15
2.28-2    2.41-1    16.5
2.38-2    2.41-1    16.5
2.43-2    2.42-2    18
2.25-2    2.43-2    19.5
2.43-2    2.43-2    19.5
2.29-2    2.45-2    21
2.46-2    2.46-2    22

We now calculate the sum $S_1$ of the ranks assigned to alloy 1 since this is the smaller sample. We
have:
$$S_1 = 1 + 6 + 8 + 9 + 10 + 12.5 + 14 + 15 + 16.5 + 16.5 = 108.5$$
The sum $S_2$ of the ranks assigned to the second alloy is calculated as follows:
$$S_2 = \frac{(10 + 12)(10 + 12 + 1)}{2} - S_1 = \frac{22 \times 23}{2} - 108.5 = 144.5$$
From Table 2, the critical value at the 5% level of significance corresponding to samples of sizes 10
and 12 is 85. As neither rank sum is less than (or equal to) this value we conclude that, on the basis
of the available evidence, we cannot reject the null hypothesis at the 5% level of significance.
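
This Task contains tied observations (2.38, 2.41 and 2.43 each occur twice), so each tied value takes the average of the ranks it occupies. The sketch below, with illustrative names that are not from the Workbook, shows one way to compute those averaged ranks and confirm the rank sums.

```python
from collections import defaultdict

alloy_1 = [2.22, 2.34, 2.41, 2.38, 2.40, 2.30, 2.35, 2.36, 2.39, 2.41]
alloy_2 = [2.24, 2.31, 2.42, 2.45, 2.37, 2.28, 2.38, 2.43, 2.25, 2.43, 2.29, 2.46]

combined = sorted(alloy_1 + alloy_2)

# Collect the 1-based sorted positions occupied by each distinct value.
positions = defaultdict(list)
for position, value in enumerate(combined, start=1):
    positions[value].append(position)

# A tied value gets the mean of the positions it occupies.
mean_rank = {value: sum(p) / len(p) for value, p in positions.items()}

s1 = sum(mean_rank[x] for x in alloy_1)  # alloy 1 is the smaller sample
n = len(combined)
s2 = n * (n + 1) / 2 - s1

print(mean_rank[2.38], mean_rank[2.41], mean_rank[2.43])  # 12.5 16.5 19.5
print(s1, s2)  # 108.5 144.5
```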


General comments about the Wilcoxon rank-sum test

1. It can be shown that in cases where the underlying distribution is normal, the t-test is preferable
to the Wilcoxon rank-sum test.

2. In cases where the underlying distribution is non-normal and the conditions for the t-test cannot
reasonably be met, it may well be preferable to use the Wilcoxon rank-sum test.

3. In cases where the underlying distribution is symmetric but non-normal and exhibits substantially
heavier tails than the normal distribution, it is often preferable to use the Wilcoxon rank-sum
test since the mean of such distributions is often unstable.

Example 7
A civil engineer is investigating the compressive strength of a new type of insulating
block for potential use in the building of new houses.
The engineer wishes to establish whether there is any difference in the mean compressive
strengths of the blocks in current usage and the proposed new block.
Ten samples of the current block and 14 samples of the new block are tested under
the same conditions and their compressive strength in pounds per square inch (psi)
is measured. All samples are of the standard size used in the building industry.
The results are given in the table below.

Compressive Strength (psi)
Current Block     New Block
2228  2301        2243  2389
2342  2354        2311  2436
2413  2366        2425  2258
2387  2398        2456  2437
2408  2417        2371  2293
                  2284  2467
                  2313  2324

Use the Wilcoxon rank-sum test to decide, at the 5% level of significance, whether
there is evidence of a significant difference in the mean compressive strengths of
the two types of insulating blocks.


Solution
Denoting the mean compressive strength of the current blocks by $\mu_1$ and the mean compressive
strength of the new blocks by $\mu_2$, we will test the hypothesis
$$H_0 : \mu_1 = \mu_2$$
against the alternative
$$H_1 : \mu_1 \neq \mu_2.$$
Note that the use of -c and -n to denote the current and new blocks is simply a device to enable
us to distinguish between the two samples for the purposes of calculation.
Data      Sorted    Ranked
2228-c    2228-c     1
2342-c    2243-n     2
2413-c    2258-n     3
2387-c    2284-n     4
2408-c    2293-n     5
2301-c    2301-c     6
2354-c    2311-n     7
2366-c    2313-n     8
2398-c    2324-n     9
2417-c    2342-c    10
2243-n    2354-c    11
2311-n    2366-c    12
2425-n    2371-n    13
2456-n    2387-c    14
2371-n    2389-n    15
2284-n    2398-c    16
2313-n    2408-c    17
2389-n    2413-c    18
2436-n    2417-c    19
2258-n    2425-n    20
2437-n    2436-n    21
2293-n    2437-n    22
2467-n    2456-n    23
2324-n    2467-n    24

We now calculate the sum $S_C$ of the ranks assigned to the blocks in current usage since this is the
smaller sample. We have:
$$S_C = 1 + 6 + 10 + 11 + 12 + 14 + 16 + 17 + 18 + 19 = 124$$
The sum $S_N$ of the ranks assigned to the new type of block is calculated as follows:
$$S_N = \frac{(10 + 14)(10 + 14 + 1)}{2} - S_C = \frac{24 \times 25}{2} - 124 = 176$$
From Table 2, the critical value at the 5% level of significance corresponding to samples of sizes 10
and 14 is 91. As neither rank sum is less than (or equal to) this value we conclude that, on the basis
of the available evidence, we cannot reject the null hypothesis at the 5% level of significance.


Task

The breaking strengths of cables made with two different compounds are compared.
Standard lengths of ten samples using compound A and twelve using compound
B are tested. The breaking strengths in newtons are as follows.

Compound A                      Compound B
10854  11627  10000  11632      11000  10856  10245   9157
 9106  10051  13720  11222      11072   9540  11000  10959
10325  10001                     8851  11513  10030  11197

Use a Wilcoxon rank-sum test to test the null hypothesis that the mean breaking
strengths for the two compounds are the same against the two-sided alternative.
Use the 5% level of significance.

Your solution


Answer
The data and their ranks are as follows.

Data                   Sorted
Strength  Compound     Strength  Compound     Rank
10854     A             8851     B             1
11627     A             9106     A             2
10000     A             9157     B             3
11632     A             9540     B             4
 9106     A            10000     A             5
10051     A            10001     A             6
13720     A            10030     B             7
11222     A            10051     A             8
10325     A            10245     B             9
10001     A            10325     A            10
11000     B            10854     A            11
10856     B            10856     B            12
10245     B            10959     B            13
 9157     B            11000     B            14.5
11072     B            11000     B            14.5
 9540     B            11072     B            16
11000     B            11197     B            17
10959     B            11222     A            18
 8851     B            11513     B            19
11513     B            11627     A            20
10030     B            11632     A            21
11197     B            13720     A            22

The sum of the ranks for Compound A is 123. The sum of the ranks for Compound B is
$$\frac{22 \times 23}{2} - 123 = 130.$$
From Table 2 we see that the critical value at the 5% level for a two-tailed test is 85. Neither rank
sum is less than this so we do not reject the null hypothesis. There is no significant evidence of a
difference in mean breaking strength between cables made with the two compounds.


Exercises
1. The lifetimes of plastic clips with two different designs are compared by subjecting clips to
continuous flexing until they break. Twelve of each design are tested. The lifetimes in hours
are as follows.

Design A                      Design B
36.1  16.6  24.6  38.5        62.5   28.2   19.9  33.9
15.6  28.3  16.0  44.7        13.3   39.4   19.3  23.7
14.3  10.8   0.7   6.5        12.7  122.0  168.0  55.0

Use a Wilcoxon rank-sum test to test the null hypothesis that the mean lifetimes are equal
for the two designs against the alternative that they are not. Use the 5% level of significance.
Comment on any assumptions which are necessary.
2. An experiment is conducted to test whether the installation of cavity-wall insulation reduces
the amount of energy consumed in houses. Out of twenty otherwise similar houses on a housing
estate, ten are selected at random for insulation. The total energy consumption over a winter
is measured for each house. The data, in MWh, are as follows.

Without insulation                With insulation
12.6  11.8  12.8  11.4  14.4      10.8   9.9   9.5  10.0  10.4
12.3  11.5  13.2  11.0  11.8      10.7  11.8   7.5  11.8  10.1

Use a Wilcoxon rank-sum test to test the null hypothesis that the insulation has no effect
against the alternative that it reduces energy consumption. Use the 1% level of significance.


Answers
1. The data, sorted into ascending order within each design, and their ranks are as follows.

      Design A                    Design B
Obs.   0.7   6.5  10.8  14.3      12.7  13.3   19.3   19.9
Rank     1     2     3     6         4     5     10     11
Obs.  15.6  16.0  16.6  24.6      23.7  28.2   33.9   39.4
Rank     7     8     9    13        12    14     16     19
Obs.  28.3  36.1  38.5  44.7      55.0  62.5  122.0  168.0
Rank    15    17    18    20        21    22     23     24

The rank sum for design A is 119 and the rank sum for design B is
$$\frac{24 \times 25}{2} - 119 = 181.$$
Table 2 shows that the critical value for a two-sided test at the 5% level of significance is
115. Neither rank sum is less than 115 so we do not reject the null hypothesis. There is no
significant evidence of a difference in the mean lifetimes between the designs.
Comment: We assume that the two distributions have the same shape and spread. It may
be that the spread in this case would increase with the mean but this could be corrected by
application of a transformation such as taking logs and this would not affect the ranks and
so would have no effect on the test outcome. In fact it is sufficient to assume that the two
distributions would be the same under the null hypothesis and this seems reasonable in this
case.
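
The remark that taking logs does not affect the ranks can be checked numerically. The sketch below is an illustration only: the helper name `rank_sum_of_first` is hypothetical, and it assumes, as is true for this data set, that there are no tied observations.

```python
import math

design_a = [36.1, 16.6, 24.6, 38.5, 15.6, 28.3, 16.0, 44.7, 14.3, 10.8, 0.7, 6.5]
design_b = [62.5, 28.2, 19.9, 33.9, 13.3, 39.4, 19.3, 23.7, 12.7, 122.0, 168.0, 55.0]


def rank_sum_of_first(first, second):
    """Rank sum of `first` within the combined sample (assumes no ties)."""
    combined = sorted(first + second)
    return sum(combined.index(x) + 1 for x in first)


raw = rank_sum_of_first(design_a, design_b)
logged = rank_sum_of_first(
    [math.log(x) for x in design_a],
    [math.log(x) for x in design_b],
)
print(raw, logged)  # 119 119
```

Because the logarithm is strictly increasing, it preserves the ordering of the observations, so the rank sums and the test outcome are unchanged.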
2. The data, sorted into ascending order within each group, and their ranks are as follows.

      Without insulation                With insulation
Obs.  11.0  11.4  11.5  11.8  11.8       7.5   9.5   9.9  10.0  10.1
Rank   9.0  10.0  11.0  13.5  13.5       1.0   2.0   3.0   4.0   5.0
Obs.  12.3  12.6  12.8  13.2  14.4      10.4  10.7  10.8  11.8  11.8
Rank  16.0  17.0  18.0  19.0  20.0       6.0   7.0   8.0  13.5  13.5

The rank sum for houses without insulation is 147. The rank sum for houses with insulation
is
$$\frac{20 \times 21}{2} - 147 = 63.$$
From Table 3 we see that the critical value for a two-sided test at the 1% level is 71. The
rank sum for the houses with insulation is 63. This is less than 71 so our result is significant
at the 1% level in a two-tailed test and therefore significant at the 0.5% level in a one-tailed
test. The table given does not give one-sided 1% critical values but, because the result is
significant at the 0.5% level, we can deduce that it is significant at the 1% level. Therefore we
reject the null hypothesis and conclude that the insulation does reduce energy consumption.
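
For a one-tailed test the direction matters: the alternative here is that insulation reduces consumption, so it is the rank sum for the insulated houses that should be small. The following minimal sketch, with illustrative names that are not from the Workbook, recomputes the rank sums, including the four-way tie at 11.8, and compares the insulated sum with the tabulated value.

```python
without = [12.6, 11.8, 12.8, 11.4, 14.4, 12.3, 11.5, 13.2, 11.0, 11.8]
with_ins = [10.8, 9.9, 9.5, 10.0, 10.4, 10.7, 11.8, 7.5, 11.8, 10.1]

combined = sorted(without + with_ins)


def mid_rank(x):
    """Average of the ranks occupied by all observations equal to x."""
    first = combined.index(x) + 1   # lowest rank held by this value
    count = combined.count(x)       # number of tied observations
    return first + (count - 1) / 2


s_with = sum(mid_rank(x) for x in with_ins)
s_without = sum(mid_rank(x) for x in without)

critical = 71  # Table 3, n1 = n2 = 10: 1% two-tailed, i.e. 0.5% one-tailed
reject = s_with <= critical  # only the insulated sum is checked (one-tailed)
print(s_with, s_without, reject)  # 63.0 147.0 True
```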


Table 1

Critical values for the Wilcoxon signed-rank test

Two-tailed tests    0.10    0.05    0.02    0.01
One-tailed tests    0.05    0.025   0.01    0.005
 n
 4                     -       -       -       -
 5                     0       -       -       -
 6                     2       0       -       -
 7                     3       2       0       -
 8                     5       3       1       0
 9                     8       5       3       1
10                    10       8       5       3
11                    13      10       7       5
12                    17      13       9       7
13                    21      17      12       9
14                    25      21      15      12
15                    30      25      19      15
16                    35      29      23      19
17                    41      34      27      23
18                    47      40      32      27
19                    53      46      37      32
20                    60      52      43      37
21                    67      58      49      42
22                    75      65      55      48
23                    83      73      62      54
24                    91      81      69      61
25                   100      89      76      68

For $n > 25$ the rank sum has an approximately normal distribution with mean $M = \frac{1}{4} n(n + 1)$ and
standard deviation $s = \sqrt{n(n + 1)(2n + 1)/24}$.
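
The large-sample note for the signed-rank test can be turned into a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, the formulae are the mean and standard deviation quoted in the note above, and comparing the resulting standardised value with normal critical values is the usual large-sample practice rather than something set out in this Section.

```python
import math


def signed_rank_normal_approx(n):
    """Mean and standard deviation of the signed-rank sum for sample size n."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return mean, sd


def z_score(rank_sum, n):
    """Standardised value of an observed signed-rank sum."""
    mean, sd = signed_rank_normal_approx(n)
    return (rank_sum - mean) / sd


mean, sd = signed_rank_normal_approx(30)
print(mean, round(sd, 2))  # 232.5 48.62
```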


Table 2

Critical Values for the Wilcoxon Rank-Sum Test (5% Two-tail Values)

n2\n1    4    5    6    7    8    9   10   11   12   13   14   15
    4   10    -    -    -    -    -    -    -    -    -    -    -
    5   11   17    -    -    -    -    -    -    -    -    -    -
    6   12   18   26    -    -    -    -    -    -    -    -    -
    7   13   20   27   36    -    -    -    -    -    -    -    -
    8   14   21   29   38   49    -    -    -    -    -    -    -
    9   15   22   31   40   51   63    -    -    -    -    -    -
   10   15   23   32   42   53   65   78    -    -    -    -    -
   11   16   24   34   44   55   68   81   96    -    -    -    -
   12   17   26   35   46   58   71   85   99  115    -    -    -
   13   18   27   37   48   60   73   88  103  119  137    -    -
   14   19   28   38   50   63   76   91  106  123  141  160    -
   15   20   29   40   52   65   79   94  110  127  145  164  185
   16   21   31   42   54   67   82   97  114  131  150  169    -
   17   21   32   43   56   70   84  100  117  135  154    -    -
   18   22   33   45   58   72   87  103  121  139    -    -    -
   19   23   34   46   60   74   90  107  124    -    -    -    -
   20   24   35   48   62   77   93  110    -    -    -    -    -
   21   25   37   50   64   79   95    -    -    -    -    -    -
   22   26   38   51   66   82    -    -    -    -    -    -    -
   23   27   39   53   68    -    -    -    -    -    -    -    -
   24   28   40   55    -    -    -    -    -    -    -    -    -
   25   28   42    -    -    -    -    -    -    -    -    -    -
   26   29    -    -    -    -    -    -    -    -    -    -    -
   27    -    -    -    -    -    -    -    -    -    -    -    -
   28    -    -    -    -    -    -    -    -    -    -    -    -

Table 3

Critical Values for the Wilcoxon Rank-Sum Test (1% Two-tail Values)

n2\n1    4    5    6    7    8    9   10   11   12   13   14   15
    5    -   15    -    -    -    -    -    -    -    -    -    -
    6   10   16   23    -    -    -    -    -    -    -    -    -
    7   10   17   24   32    -    -    -    -    -    -    -    -
    8   11   17   25   34   43    -    -    -    -    -    -    -
    9   11   18   26   35   45   56    -    -    -    -    -    -
   10   12   19   27   37   47   58   71    -    -    -    -    -
   11   12   20   28   38   49   61   74   87    -    -    -    -
   12   13   21   30   40   51   63   76   90  106    -    -    -
   13   14   22   31   41   53   65   79   93  109  125    -    -
   14   14   22   32   43   54   67   81   96  112  129  147    -
   15   15   23   33   44   56   70   84   99  115  133  151  171
   16   15   24   34   46   58   72   86  102  119  137  155    -
   17   16   25   36   47   60   74   89  105  122  140    -    -
   18   16   26   37   49   62   76   92  108  125    -    -    -
   19   17   27   38   50   64   78   94  111    -    -    -    -
   20   18   28   39   52   66   81   97    -    -    -    -    -
   21   18   29   40   53   68   83    -    -    -    -    -    -
   22   19   29   42   55   70    -    -    -    -    -    -    -
   23   19   30   43   57    -    -    -    -    -    -    -    -
   24   20   31   44    -    -    -    -    -    -    -    -    -
   25   20   32    -    -    -    -    -    -    -    -    -    -
   26   21    -    -    -    -    -    -    -    -    -    -    -
   27    -    -    -    -    -    -    -    -    -    -    -    -
   28    -    -    -    -    -    -    -    -    -    -    -    -
