What Is The Sign Test?: Step 1
The sign test compares the sizes of two groups. It is a non-parametric or “distribution free” test, which means the
test doesn’t assume the data comes from a particular distribution, like the normal distribution. The sign test is an
alternative to a one sample t test or a paired t test. It can also be used for ordered (ranked) categorical data.
The null hypothesis for the sign test is that the difference between medians is zero.
For a one sample sign test, where the median for a single sample is analyzed, see: One Sample Median Tests.
Step 2: Add a fourth column indicating the sign of the number in column 3.
Step 3: Count the number of positives and negatives.
• 4 positives.
• 12 negatives.
12 negatives seems like a lot, but we can’t say for sure that it’s significant (i.e. that it didn’t happen by chance) until
we run the sign test.
Step 4: Add up the number of items in your sample and subtract any that had a difference of zero (in column 3).
The sample size in this question was 17, with one zero, so n = 16.
Step 5: Find the p-value using a binomial distribution table or a binomial calculator. I used the calculator,
putting in:
• .5 for the probability. The null hypothesis is that there are an equal number of signs (i.e. 50/50).
Therefore, the test is a simple binomial experiment with a .5 chance of the sign being negative and .5 of
it being positive (assuming the null hypothesis is true).
• 16 for the number of trials.
• 4 for the number of successes. “Successes” here is the smaller of the positive and negative
sign counts from Step 3.
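The calculator returns a cumulative probability P(X ≤ 4) of 0.038, which is the one-tailed p-value and is smaller than
the alpha level of 0.05. We can reject the null hypothesis and say there is a significant difference. Note that a
two-tailed test doubles this value to about 0.077, which is not below 0.05.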
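The binomial calculation in the last step can be checked with a short Python sketch. The function name `sign_test_p_value` and the placeholder differences are assumptions made for illustration; the original data table is not shown in the text, so only the sign counts (4 positives, 12 negatives, 1 zero) are reproduced.

```python
from math import comb

def sign_test_p_value(differences):
    """Return (n, successes, one-tailed p, two-tailed p) for a sign test."""
    positives = sum(1 for d in differences if d > 0)
    negatives = sum(1 for d in differences if d < 0)
    n = positives + negatives              # zero differences are dropped
    successes = min(positives, negatives)  # the smaller sign count
    # P(X <= successes) for X ~ Binomial(n, 0.5)
    one_tailed = sum(comb(n, i) for i in range(successes + 1)) / 2 ** n
    two_tailed = min(1.0, 2 * one_tailed)
    return n, successes, one_tailed, two_tailed

# Placeholder values (hypothetical): only the signs match the example.
differences = [1] * 4 + [-1] * 12 + [0]
n, successes, p_one, p_two = sign_test_p_value(differences)
print(n, successes, round(p_one, 3), round(p_two, 3))  # 16 4 0.038 0.077
```

The one-tailed value matches the 0.038 quoted above; whether the one-tailed or two-tailed value is appropriate depends on how the research hypothesis is stated.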
Wilcoxon Signed Rank Test
Another popular nonparametric test for matched or paired data is called the Wilcoxon Signed Rank Test. Like the
Sign Test, it is based on difference scores, but in addition to analyzing the signs of the differences, it also takes into
account the magnitude of the observed differences.
Let's use the Wilcoxon Signed Rank Test to re-analyze the data in Example 4 on page 5 of this module. Recall that
this study assessed the effectiveness of a new drug designed to reduce repetitive behaviors in children affected with
autism. A total of 8 children with autism enroll in the study and the amount of time that each child is engaged in
repetitive behavior during three-hour observation periods is measured both before treatment and then again after
taking the new medication for a period of 1 week. The data are shown below.
Child   Before Treatment   After 1 Week of Treatment
1       85                 75
2       70                 50
3       40                 50
4       65                 40
5       80                 20
6       75                 65
7       55                 40
8       20                 25
Child   Before Treatment   After 1 Week of Treatment   Difference (Before-After)
1       85                 75                           10
2       70                 50                           20
3       40                 50                          -10
4       65                 40                           25
5       80                 20                           60
6       75                 65                           10
7       55                 40                           15
8       20                 25                           -5
The next step is to rank the difference scores. We first order the absolute values of the difference scores and assign
rank from 1 through n to the smallest through largest absolute values of the difference scores, and assign the mean
rank when there are ties in the absolute values of the difference scores.
Observed Differences   Ordered Absolute Values of Difference Scores   Ranks
 10                     -5                                            1
 20                     10                                            3
-10                    -10                                            3
 25                     10                                            3
 60                     15                                            5
 10                     20                                            6
 15                     25                                            7
 -5                     60                                            8
The final step is to attach the signs ("+" or "-") of the observed differences to each rank as shown below.
Observed Differences   Ordered Absolute Values of Difference Scores   Ranks   Signed Ranks
 10                     -5                                            1       -1
 20                     10                                            3        3
-10                    -10                                            3       -3
 25                     10                                            3        3
 60                     15                                            5        5
 10                     20                                            6        6
 15                     25                                            7        7
 -5                     60                                            8        8
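The ranking and sign-attachment steps can be sketched in Python as follows. The helper name `mean_ranks` is hypothetical, and the sketch assumes there are no zero differences (none occur in this example). Its output is listed per child, in the original data order, rather than in the sorted order used in the table.

```python
def mean_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # mean of the 1-based positions in the run
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks

before = [85, 70, 40, 65, 80, 75, 55, 20]
after = [75, 50, 50, 40, 20, 65, 40, 25]

differences = [b - a for b, a in zip(before, after)]
ranks = mean_ranks([abs(d) for d in differences])
# Assumes no zero differences; a zero would need to be dropped before ranking.
signed_ranks = [r if d > 0 else -r for d, r in zip(differences, ranks)]

print(differences)   # [10, 20, -10, 25, 60, 10, 15, -5]
print(signed_ranks)  # [3.0, 6.0, -3.0, 7.0, 8.0, 3.0, 5.0, -1.0]
```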
Similar to the Sign Test, hypotheses for the Wilcoxon Signed Rank Test concern the population median of the
difference scores. The research hypothesis can be one- or two-sided. Here we consider a one-sided test.
The test statistic for the Wilcoxon Signed Rank Test is W, defined as the smaller of W+ (sum of the positive ranks)
and W- (sum of the negative ranks). If the null hypothesis is true, we expect to see similar numbers of lower and
higher ranks that are both positive and negative (i.e., W+ and W- would be similar). If the research hypothesis is true
we expect to see more higher and positive ranks (in this example, more children with substantial improvement in
repetitive behavior after treatment as compared to before, i.e., W+ much larger than W-).
In this example, W+ = 32 and W- = 4. Recall that the sum of the ranks (ignoring the signs) will always equal n(n+1)/2.
As a check on our assignment of ranks, we have n(n+1)/2 = 8(9)/2 = 36 which is equal to 32+4. The test statistic is
W = 4.
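As a quick illustration of this arithmetic, the sketch below sums the signed ranks from the example and applies the n(n+1)/2 check. The variable names are chosen here for illustration only.

```python
# Signed ranks from the autism example.
signed_ranks = [-1, 3, -3, 3, 5, 6, 7, 8]
n = len(signed_ranks)

w_plus = sum(r for r in signed_ranks if r > 0)    # sum of the positive ranks
w_minus = sum(-r for r in signed_ranks if r < 0)  # sum of the negative ranks, as a positive number

# The ranks 1..n always sum to n(n+1)/2, whatever their signs.
assert w_plus + w_minus == n * (n + 1) // 2

w = min(w_plus, w_minus)  # test statistic W
print(w_plus, w_minus, w)  # 32 4 4
```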
Next we must determine whether the observed test statistic W supports the null or research hypothesis. This is done
following the same approach used in parametric testing. Specifically, we determine a critical value of W such that if
the observed value of W is less than or equal to the critical value, we reject H0 in favor of H1, and if the observed value
of W exceeds the critical value, we do not reject H0.
Note that when we analyzed the data previously using the Sign Test, we failed to find statistical significance.
However, when we use the Wilcoxon Signed Rank Test, we conclude that the treatment results in a statistically
significant improvement at α=0.05. The discrepant results are due to the fact that the Sign Test uses very little
information in the data and is a less powerful test.
Example:
A study is run to evaluate the effectiveness of an exercise program in reducing systolic blood pressure in patients
with pre-hypertension (defined as a systolic blood pressure between 120-139 mmHg or a diastolic blood pressure
between 80-89 mmHg). A total of 15 patients with pre-hypertension enroll in the study, and their systolic blood
pressures are measured. Each patient then participates in an exercise training program where they learn proper
techniques and execution of a series of exercises. Patients are instructed to do the exercise program 3 times per week
for 6 weeks. After 6 weeks, systolic blood pressures are again measured. The data are shown below.
Patient   Before Exercise Program   After Exercise Program
1         125                       118
2         132                       134
3         138                       130
4         120                       124
5         125                       105
6         127                       130
7         136                       130
8         139                       132
9         131                       123
10        132                       128
11        135                       126
12        136                       140
13        128                       135
14        127                       126
15        130                       132
Is there a difference in systolic blood pressures after participating in the exercise program as compared to before?
• Step 1. Set up hypotheses and determine level of significance.
H0: The median difference is zero versus
H1: The median difference is not zero
α=0.05
Patient   Before Exercise Program   After Exercise Program   Difference (Before-After)
1         125                       118                        7
2         132                       134                       -2
3         138                       130                        8
4         120                       124                       -4
5         125                       105                       20
6         127                       130                       -3
7         136                       130                        6
8         139                       132                        7
9         131                       123                        8
10        132                       128                        4
11        135                       126                        9
12        136                       140                       -4
13        128                       135                       -7
14        127                       126                        1
15        130                       132                       -2
The next step is to rank the ordered absolute values of the difference scores using the approach outlined in Section
10.1. Specifically, we assign ranks from 1 through n to the smallest through largest absolute values of the difference
scores, respectively, and assign the mean rank when there are ties in the absolute values of the difference scores.
Observed Differences   Ordered Absolute Values of Differences   Ranks
  7                      1                                        1
 -2                     -2                                        2.5
  8                     -2                                        2.5
 -4                     -3                                        4
 20                     -4                                        6
 -3                     -4                                        6
  6                      4                                        6
  7                      6                                        8
  8                     -7                                       10
  4                      7                                       10
  9                      7                                       10
 -4                      8                                       12.5
 -7                      8                                       12.5
  1                      9                                       14
 -2                     20                                       15
The final step is to attach the signs ("+" or "-") of the observed differences to each rank as shown below.
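Observed Differences   Ordered Absolute Values of Differences   Ranks   Signed Ranks
  7                      1                                        1       1
 -2                     -2                                        2.5    -2.5
  8                     -2                                        2.5    -2.5
 -4                     -3                                        4      -4
 20                     -4                                        6      -6
 -3                     -4                                        6      -6
  6                      4                                        6       6
  7                      6                                        8       8
  8                     -7                                       10     -10
  4                      7                                       10      10
  9                      7                                       10      10
 -4                      8                                       12.5    12.5
 -7                      8                                       12.5    12.5
  1                      9                                       14      14
 -2                     20                                       15      15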
In this example, W+ = 89 and W- = 31. Recall that the sum of the ranks (ignoring the signs) will always equal
n(n+1)/2. As a check on our assignment of ranks, we have n(n+1)/2 = 15(16)/2 = 120 which is equal to 89 + 31. The
test statistic is W = 31.
• Step 5. Conclusion.
We do not reject H0 because W = 31 exceeds the critical value of 25. Therefore, we do not have statistically
significant evidence at α=0.05 to show that the median difference in systolic blood pressures is not zero (i.e., that
there is a significant difference in systolic blood pressures after the exercise program as compared to before).
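The whole calculation for this example can be reproduced with the Python sketch below. The function name `wilcoxon_w` is hypothetical, dropping zero differences is an assumption (none occur in these data), and the critical value of 25 is taken from the conclusion above rather than computed.

```python
from collections import defaultdict

def wilcoxon_w(before, after):
    """Return (W+, W-) for paired data, using mean ranks for tied absolute differences."""
    # Assumption: pairs with a zero difference are dropped before ranking.
    diffs = [b - a for b, a in zip(before, after) if b != a]
    ordered = sorted(abs(d) for d in diffs)
    positions = defaultdict(list)
    for rank, value in enumerate(ordered, start=1):
        positions[value].append(rank)
    # Each absolute difference gets the mean of the rank positions it occupies.
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    w_plus = sum(mean_rank[abs(d)] for d in diffs if d > 0)
    w_minus = sum(mean_rank[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus

before = [125, 132, 138, 120, 125, 127, 136, 139, 131, 132, 135, 136, 128, 127, 130]
after = [118, 134, 130, 124, 105, 130, 130, 132, 123, 128, 126, 140, 135, 126, 132]

w_plus, w_minus = wilcoxon_w(before, after)
w = min(w_plus, w_minus)
critical_value = 25           # value used in the text's conclusion
reject_h0 = w <= critical_value

print(w_plus, w_minus, w, reject_h0)  # 89.0 31.0 31.0 False
```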
What is Friedman’s Test?
Friedman’s test is a non-parametric test for finding differences in treatments across multiple attempts.
Nonparametric means the test doesn’t assume your data comes from a particular distribution (like the normal
distribution). Basically, it’s used in place of the ANOVA test when you don’t know the distribution of your data.
Friedman’s test is an extension of the sign test, used when there are multiple treatments. In fact, if there are only two
treatments the two tests are identical.
Step 2: Rank the scores within each row (patient) separately. The smallest score should get a rank of 1. I am ranking
across rows here, so each patient's treatments are ranked 1, 2, or 3.
Step 3: Sum the ranks (find a total for each column).
Step 5: Find the FM critical value from the table of critical values for Friedman (see table below).
Use the k=3 table (as that is how many treatments we have) and an alpha level of 5%. You could choose a higher or
lower alpha level, but 5% is fairly common, so use the 5% table if you don’t know your alpha level.
Looking up n = 12 in that table, we find a FM critical value of 6.17.
Step 6: Compare the calculated FM test statistic (Step 4) to the FM critical value (Step 5). Reject the null hypothesis
if the calculated FM value is larger than the FM critical value:
• Calculated FM Test Statistic = 15.526.
• FM Critical value from table = 6.17.
The calculated FM statistic is larger, so you would reject the null hypothesis.
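The ranking, rank-sum, and test-statistic steps can be sketched in Python as follows. The text does not show the formula, so this sketch assumes the standard Friedman statistic, FM = 12/(nk(k+1)) × (sum of squared column rank totals) − 3n(k+1), where n is the number of patients and k the number of treatments. The function names and the small data set are hypothetical and are not the data behind the 15.526 figure above.

```python
def row_ranks(row):
    """Rank the scores in one row; tied scores share the mean of their positions."""
    ordered = sorted(row)
    return [
        sum(i + 1 for i, v in enumerate(ordered) if v == x) / ordered.count(x)
        for x in row
    ]

def friedman_statistic(scores):
    """Return (column rank sums, FM) for a list of rows, one row per patient."""
    n = len(scores)     # number of patients (rows)
    k = len(scores[0])  # number of treatments (columns)
    ranked = [row_ranks(row) for row in scores]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    fm = 12.0 / (n * k * (k + 1)) * sum(s ** 2 for s in rank_sums) - 3 * n * (k + 1)
    return rank_sums, fm

# Hypothetical scores: 4 patients (rows) by 3 treatments (columns).
scores = [
    [1.0, 2.5, 3.1],
    [2.0, 2.2, 4.0],
    [1.5, 3.5, 3.0],
    [0.9, 1.8, 2.7],
]
rank_sums, fm = friedman_statistic(scores)
print(rank_sums, fm)  # [4.0, 9.0, 11.0] 6.5
```

The resulting FM value would then be compared with the tabulated critical value for the matching k and n, as in Step 6.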