Module-6-A-Inferential-Statistics-Non-parametric
DATA ANALYSIS
MODULE 4
INFERENTIAL STATISTICS - NON-PARAMETRIC TESTS
In this module, you will learn about inferential statistics under the non-parametric tests. A non-parametric test is usually used to test the relationship between or among variables, or whether there is a significant difference between or among variables, when the distribution of the gathered data is not normal. In other words, nonparametric tests are tests that do not require a normal distribution.
The difference between a parametric and a non-parametric test is that many parametric hypothesis tests rely on the assumption that the population follows a normal distribution with parameters μ and σ, whereas a nonparametric test makes no such assumption. Nonparametric tests are therefore useful when the data are strongly non-normal, say skewed to the right or skewed to the left, and resistant to transformation, as stated above.
Every kind of statistical test, however, has its own limitations and conditions of use. A nonparametric test is used when the distribution is not normal, that is, when the skewness is either positive or negative and the kurtosis is greater than or less than 0.265. It is also used when the level of measurement is expressed in ordinal or nominal data.
Some of the inferential statistical tests under the non-parametric tests that will be discussed in this module are as follows:
a. Chi-square test
b. Mann-Whitney U Test or Wilcoxon Rank Sum Test
c. Kruskal-Wallis Test
d. Median Test Two-Sample Case or Sign Test for Two Independent Groups
A. Chi-Square Test
One of the most commonly used nonparametric tests is the "Chi-Square Test", denoted by the symbol χ². It is used to test significance for data presented in frequencies or in nominal form. It is a test of the difference between the observed and expected frequencies. The chi-square test serves two functions: as a test of goodness-of-fit and as a test of independence.
χ² = Σ (O − E)² / E
Where:
χ² = the chi-square statistic
O = the observed frequencies
E = the expected frequencies = (total number of observed frequencies) / (total number of categories)
The Manager of a certain bookstore decided to find out whether the computer books sell equally in a week. See the table below. Determine if there is a significant difference in the number of computer books sold in a week. Use the 0.05 level of significance.
Solution:
1. Hypotheses:
Ho: There is no significant difference in the number of Computer books sold in a week.
Ha: There is a significant difference in the number of Computer books sold in a week.
2. Level of Significance:
α = 0.05
df = n − 1 = 5 − 1 = 4
χ² at 0.05 = 9.49 (tabular value)
3. Statistics:
E = 55 / 5 = 11
χ² = Σ (O − E)² / E = 7.092
4. Decision Rule: If the computed chi-square is greater than the tabular value of 9.49, reject the null hypothesis.
5. Conclusion: Since the computed chi-square of 7.092 is less than the tabular value of 9.49, we do not reject the null hypothesis. Therefore, there is no significant difference in the number of Computer books sold in a week.
Now, let's use SPSS to run the chi-square test for goodness of fit.
Step 1. On the menu bar, go to "Analyze", proceed to "Nonparametric Tests", then to "Legacy Dialogs", and click the first item, "Chi-square".
Step 2. Transfer the variable (in this example, the "Title of Computer Books") into the "Test Variable List" by clicking the arrow to the right. By default, SPSS will assume that all categories are expected to be equal in number.
Step 3. Click "OK" and the output window will appear.
In the output window, you can see the summary statistics. The values under "Test Statistics" are the important numbers for analyzing the data: you can report that the computed chi-square is 7.091 with 4 degrees of freedom, and that the p-value (Asymp. Sig.) of 0.131 is greater than the significance level of 0.05; thus, there is no significant difference in the number of Computer books sold in a week.
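The goodness-of-fit computation above can be sketched in a few lines of Python (the module itself uses manual computation and SPSS; Python is shown only for illustration). Because the example's data table is not reproduced in this module, the observed counts below are hypothetical placeholders; only their total of 55 matches the example.

```python
# Chi-square goodness-of-fit computed by hand, as in the module.
# NOTE: these observed counts are hypothetical; only their total (55)
# matches the module's bookstore example.
observed = [14, 6, 11, 15, 9]

expected = sum(observed) / len(observed)   # E = 55 / 5 = 11
chi_sq = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1                     # df = n - 1 = 4

print(f"chi-square = {chi_sq:.3f} with df = {df}")
```

The computed statistic is then compared with the tabular χ² value (9.49 at α = 0.05 with df = 4), exactly as in the manual solution.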
The computation for the chi-square test for independence is similar to the test of goodness-of-fit. The following steps will help us determine whether two variables have a significant relationship.
3. Set the level of significance and determine the degrees of freedom; ∝ can be 0.01 or 0.05, and df = (c − 1)(r − 1), where c is the number of columns and r is the number of rows.
E = (R × C) / N
Where:
R is the row total
C is the column total
N is the grand total
χ² = Σ (O − E)² / E
Where:
χ² = chi-square test
O = observed frequency
E = expected frequency
6. Interpret the result, that is, whether you are going to accept or reject the null hypothesis, using the chi-square table.
Decision Rule: If the computed chi-square is greater than the tabular value, reject the null hypothesis and accept the alternative hypothesis. With this, we can draw a conclusion about the variables under investigation.
Example:
The Samahan ng Edukador sa Estatistika – SEE surveyed 1,500 selected
registered voters nationwide and asked for their voting preference. The respondents were
classified by their sex (Male or Female) and their preference (PDP, UNO, LP). Results are
shown below.
Voting Preference
Sex       PDP    UNO    LP     Total
Male      325    265    60     650
Female    375    395    80     850
Total     700    660    140    1,500
Test if there is a significant relationship between the respondents’ sex and their
voting preference. Use ∝ = 0.05 as level of significance.
Is there a significant relationship between the respondents’ sex and their voting
preference?
Ho: There is no significant relationship between the respondents’ sex and their
voting preference.
Ha: There is a significant relationship between the respondents’ sex and their
voting preference.
Voting Preference
Sex       PDP (O, E)           UNO (O, E)         LP (O, E)           Total
Male      325, E11 = 303.33    265, E12 = 286     60, E13 = 60.67     650
Female    375, E21 = 396.67    395, E22 = 374     80, E23 = 79.33     850
Total     700                  660                140                 1,500
To solve the expected frequencies or values, take the product of the column total and the
row total and divide the product by the grand total.
E11 = (650)(700) / 1,500 = 303.33
E21 = (850)(700) / 1,500 = 396.67
E12 = (650)(660) / 1,500 = 286
E22 = (850)(660) / 1,500 = 374
E13 = (650)(140) / 1,500 = 60.67
E23 = (850)(140) / 1,500 = 79.33
χ² = (325 − 303.33)²/303.33 + (375 − 396.67)²/396.67 + (265 − 286)²/286 + (395 − 374)²/374 + (60 − 60.67)²/60.67 + (80 − 79.33)²/79.33
χ² = 5.466
Decision Rule: If the computed chi-square is greater than the tabular value, reject the null hypothesis and accept the alternative hypothesis.
Conclusion: Since the computed chi-square of 5.466 is less than the tabular value of 5.99 at the 0.05 level of significance with df of 2, we do not reject the null hypothesis; thus, there is no significant relationship between the respondents' sex and their voting preference. This implies that the respondents' sex has no relationship to their voting preference.
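The same test of independence can be sketched in Python, computing each expected frequency as E = R × C / N from the voting-preference table (the module's own tool is SPSS; this sketch is only illustrative).

```python
# Chi-square test of independence for the module's voting-preference
# table; expected frequencies are E = R*C/N per cell.
table = {
    "Male":   [325, 265, 60],    # PDP, UNO, LP
    "Female": [375, 395, 80],
}

row_totals = {sex: sum(counts) for sex, counts in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
grand = sum(row_totals.values())                 # N = 1,500

chi_sq = 0.0
for sex, counts in table.items():
    for observed, col_total in zip(counts, col_totals):
        expected = row_totals[sex] * col_total / grand
        chi_sq += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(col_totals) - 1)    # (r-1)(c-1) = 2
print(f"chi-square = {chi_sq:.3f}, df = {df}")
```

This reproduces the manual result of χ² ≈ 5.465 with df = 2, below the tabular value of 5.99.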
Example 2.
A researcher surveyed 420 respondents on whether or not they are in favor of divorce in the Philippines. He wants to determine if the responses of the male respondents have a significant relationship to the responses of the female respondents. The data are given below.
Test if there is a significant relationship between the respondents' sex and their response. Use ∝ = 0.05 as the level of significance.
Ho: There is no significant relationship between the respondents' sex and their favorable response to divorce.
Ha: There is a significant relationship between the respondents' sex and their favorable response to divorce.
∝ = 0.05
df = (c − 1)(r − 1) = (2 − 1)(2 − 1) = 1
χ² at 0.05 = 3.84 (tabular value)
χ² = Σ (O − E)² / E
χ² = (45 − 56.55)²/56.55 + (50 − 38.45)²/38.45 + (205 − 193.45)²/193.45 + (120 − 131.55)²/131.55
χ2 = 2. 36 + 3. 47 + 0. 69 + 1. 01
χ2 = 7. 53
Decision Rule: If the computed chi-square is greater than the tabular value, reject the null hypothesis and accept the alternative hypothesis.
Conclusion: Since the computed chi-square of 7.53 is greater than the tabular value of 3.84 at the 0.05 level of significance with df of 1, reject the null hypothesis and accept the alternative hypothesis. This implies that male and female respondents differ significantly in whether or not they are in favor of divorce.
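For a 2 × 2 table, the chi-square statistic can also be computed with the shortcut formula χ² = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)], which this module later uses for the median test. A Python sketch with the cell counts from the computation above (assigning the first row to males and the second to females is an assumption here):

```python
# 2x2 chi-square via the shortcut formula (no continuity correction).
# Cell counts taken from the module's computation; which row is male
# and which is female is an assumption.
a, b = 45, 50      # first row:  in favor, not in favor
c, d = 205, 120    # second row

n = a + b + c + d                      # grand total = 420
chi_sq = n * (a * d - b * c) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)
print(f"chi-square = {chi_sq:.2f}")
```

This agrees with the cell-by-cell computation above, χ² ≈ 7.53.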
Doing the chi-square test of independence in SPSS is quite different from the chi-square test of goodness-of-fit. Assume that we have already encoded the needed data. Note that our data view is not complete due to the large number of observations.
Step 1. On the menu bar of SPSS, click "Analyze", then "Descriptive Statistics", and proceed to "Crosstabs".
Step 3. Other information that we need is the test statistic, so click the "Statistics" button and another dialog box will appear. Here, check "Chi-square" and "Phi and Cramer's V" in order to determine the strength of the relationship. Then click "Continue" to go back to the original dialog box (Crosstabs).
Step 5. Once you click "Continue", you will return to the original dialog box. Click "OK" and the output window will appear.
In writing your report, you can use the numerical information under "Chi-Square Tests", specifically the Pearson chi-square results: the chi-square value, the degrees of freedom, and the p-value. In this example, the computed chi-square is 5.465 with 2 degrees of freedom, and the p-value of 0.065 is greater than the significance level of 0.05. With this information, we do not reject the null hypothesis; thus, there is no significant relationship between the respondents' sex and their voting preference. This implies that the respondents' sex has no relationship to their voting preference.
Section:________________________ Score:__________
Practice Exercises
Chi-Square Goodness-of-Fit and Test of Independence
1. The Local Government Unit of a particular place in the Philippines wants to find
out whether the Covid vaccine has an equal rolling out in a week from Monday to
Friday of one of the Barangays in their place. See the table below. Determine if
there is a significant difference in the number of Covid vaccines rolled out in a
week. Use the 0.05 level of significance.
Day                  Monday   Tuesday   Wednesday   Thursday   Friday
Number of Vaccines   45       38        53          59         62
2. Use the chi-square test at the 0.05 level of significance to test the null hypothesis that there is no significant relationship between the opinions of male and female students on the implementation of the haircut policy in the University.
B. Mann-Whitney U Test or Wilcoxon Rank Sum Test
Another equally important test statistic for analyzing data is the Mann-Whitney U test. This test is also non-parametric: it is used when the distribution of the data is not normal, or when we are unable to assume normality in both groups of variables. It is also known as the Wilcoxon Rank Sum test, and it can be used in place of an unpaired t-test.
We can also use this test when we have a continuous variable (scale, in SPSS) measured for all observations in two groups and we want to test whether the distribution of this variable differs between the two groups. Say group A and group B are two independent groups and we need to test whether these two groups are statistically different from each other. We can formulate the null hypothesis as "There is no significant difference between the two groups".
For example, the research problem could be: "Is there a difference in math anxiety scores between males and females?" From this problem, we formulate the null hypothesis, known as Ho: "There is no significant difference in the math anxiety scores between males and females". It is obvious that we have two independent groups: one is the male group and the other is the female group.
In order to use this test, we need to know the formula for this statistical test. The formula
is illustrated below:
U_stat = Rank Sum − n(n + 1)/2
> U is the Mann-Whitney U (computed separately for groups 1 and 2)
> n is the number of observations in each group (taken separately, say n1 and n2)
> Rank Sum is the total of the ranks in each group (determined separately, say RS1 and RS2)
So the formula can be written for the two groups as:
U_stat1 = RS1 − n1(n1 + 1)/2  and  U_stat2 = RS2 − n2(n2 + 1)/2
Example: A professor gave a final examination to his 20 students, 10 are male and the remaining
students are female. He wants to test if there is a difference in the performance of these two
groups of students. The level of significance is 0.05, two-tailed test. The scores are given below.
M1 90 F1 88
M2 89 F2 78
M3 86 F3 80
M4 88 F4 81
M5 78 F5 92
M6 86 F6 86
M7 80 F7 79
M8 82 F8 85
M9 73 F9 82
M10 78 F10 80
Research Problem: Is there any significant difference in the performance between male and
female?
Ho: There is no significant difference in the performance between male and female.
Ha: There is a significant difference in the performance between male and female.
If the computed U is less than the U critical value, reject the null hypothesis.
Note: The decision rule for this test is different from the previous ones, where we reject Ho if the computed value is greater than the critical value. In this test, we reject the null hypothesis if the computed value is less than the U critical value.
a) Rank the scores of the students, regardless of sex, from lowest to highest.
b) Add a column to the table for the rank of each male and female score.
c) Add the ranks for the males and also add the ranks for the females.
d) Compute U for each group using the formula.
Note: In ranking, you may rank from lowest to highest or from highest to lowest.
M1    90    19       F1    88    16.5
M2    89    18       F2    78    3
M3    86    14       F3    80    7
M4    88    16.5     F4    81    9
M5    78    3        F5    92    20
M6    86    14       F6    86    14
M7    80    7        F7    79    5
M8    82    10.5     F8    85    12
M9    73    1        F9    82    10.5
M10   78    3        F10   80    7
Computation:
U_stat(Male) = 106 − (10)(11)/2 = 106 − 55 = 51
U_stat(Female) = 104 − (10)(11)/2 = 104 − 55 = 49
Here, we need to choose a computed U that gives the smaller value. So, our U is 49.
Since the computed Ustat of 49 is greater than the U-critical value of 23 at 0.05 level of
significance, we do not reject the null hypothesis. Hence, there is no significant difference in the
performance between male and female.
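The ranking and the two U statistics can be verified with a short Python sketch (illustrative only; the module's own tool is SPSS). Tied scores receive the average of the rank positions they occupy, as in the manual table.

```python
# Mann-Whitney U from ranks, as in the module's worked example.
male   = [90, 89, 86, 88, 78, 86, 80, 82, 73, 78]
female = [88, 78, 80, 81, 92, 86, 79, 85, 82, 80]

combined = sorted(male + female)

def mid_rank(x):
    # Tied values share the average of the rank positions they occupy.
    first = combined.index(x) + 1            # 1-based rank of first tie
    return first + (combined.count(x) - 1) / 2

rs_male   = sum(mid_rank(x) for x in male)     # 106.0
rs_female = sum(mid_rank(x) for x in female)   # 104.0

n1, n2 = len(male), len(female)
u_male   = rs_male - n1 * (n1 + 1) / 2         # 106 - 55 = 51.0
u_female = rs_female - n2 * (n2 + 1) / 2       # 104 - 55 = 49.0
u = min(u_male, u_female)                      # take the smaller U
print(f"U = {u}")
```

The smaller U of 49 is then compared with the U critical value of 23, as in the manual solution.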
How can the Mann-Whitney U test be used in SPSS? Just like with the previous statistical tests, if you are well versed in SPSS, analyzing the data set with this test will be easy for you. For first-time SPSS users, the following steps will be a great help.
Of course, we need an encoded data set in the SPSS data view. For discussion purposes, the previous example will be used; assume the data are already encoded in SPSS. There are two ways to apply this test. First, use the raw data with no need to rank, since SPSS will do the ranking and the computation for you; second, transform the data into ranks, and you will still get the same output.
Step 1. On the menu bar, look for “Analyze” then “Nonparametric Tests” proceed to “Legacy
Dialogs” and then click the “2 Independent Samples”.
You will notice that the computed Mann-Whitney U of 49.000 is the same as in our manual computation, as are the rank sums of the female group and the male group. At this point, we can see that the mean ranks of males and females are quite similar. To confirm that there is no difference between the mean ranks of males and females, use the Asymp. Sig. (2-tailed) value, which is 0.939. Since this p-value of 0.939 is greater than the significance level of 0.05, we do not reject the null hypothesis: "There is no significant difference in the performance between male and female".
As stated before, we can also use this test by first transforming the data into ranks.
Step 2. Click "Rank Cases" and another dialog box will appear.
Step 4. Once you click the "OK" button, an output view will appear and you will notice another column automatically added in the data view.
Step 5. Now, if you want to test the data using the Mann-Whitney U, do the same as in our previous illustration, but instead of using the "score", use the "rank score".
Section:________________________ Score:__________
Practice Exercises
Mann-Whitney U Test
1. A pharmaceutical company wants to determine whether two brands of COVID-19 vaccine have an equal shelf life in terms of years and months. Ten samples of each brand were examined. The data are as follows:
Brand A 3.2 1.5 1.9 2.4 3.0 2.6 3.2 1.5 3.2 1.8
Brand B 4.5 5.2 3.9 4.3 2.9 5.5 4.0 5.3 5.2 4.8
Based on the problem, state your research problem and formulate the null and alternative hypotheses. Test at the 0.05 level of significance the hypothesis that the two brands of vaccine have an equal shelf life in terms of years and months. Do both the manual computation and the SPSS procedure. Note: Use the space below for your computation and analysis.
C. Kruskal-Wallis Test
1. As stated previously, we only use this test if the distribution of the data to be treated is not normal; otherwise it is better to use the F-test (ANOVA).
3. The variances are not equal, and the level of measurement of the data is the ordinal scale (rank data).
H = [12 / (N(N + 1))] Σ (Ri² / Ni) − 3(N + 1)
or, written out for each group:
H = [12 / (N(N + 1))] (R1²/N1 + R2²/N2 + R3²/N3 + ... + Rn²/Nn) − 3(N + 1)
Where:
N is the total number of observations combined over all groups
Ri is the total rank of each group
Ni is the number of observations in each group
Step 1: Rank the data. You can rank from lowest to highest or from highest to lowest, and all of the data must be ranked together, regardless of which group each value belongs to.
Step 2: On the table for your groups of data (three groups means three columns, four groups means four columns, and so on), add a column to the right of each group column to hold the assigned rank of each value in that group.
Step 3: Take the sum of the ranks in each group's rank column; these sums will serve as ΣR1, ΣR2, ΣR3, ..., ΣRn.
Virtual Online   Face to Face   Modular
86               88             78
80               92             83
80               87             75
89               92             80
82               95             77
85               89             80
Position   Score   Rank
1          75      1
2          77      2
3          78      3
4          80      5.5
5          80      5.5
6          80      5.5
7          80      5.5
8          82      8
9          83      9
10         85      10
11         86      11
12         87      12
13         88      13
14         89      14.5
15         89      14.5
16         92      16.5
17         92      16.5
18         95      18
Virtual Online    Face to Face     Modular
Score   Rank      Score   Rank     Score   Rank
86      11        88      13       78      3
80      5.5       92      16.5     83      9
80      5.5       87      12       75      1
89      14.5      92      16.5     80      5.5
82      8         95      18       77      2
85      10        89      14.5     80      5.5
Research Problem: Is there any significant difference in the mean grade of the students using the
three different methods of teaching?
Ho: There is no significant difference in the mean grade of the students using the three
different methods of teaching.
Ha: There is a significant difference in the mean grade of the students using the three
different methods of teaching.
Step 2: Criterion: the critical value is the tabular chi-square of 5.991 with df = k − 1 = 2 at the 0.05 level of significance.
Step 3: Decision: If the computed Kruskal-Wallis test statistic H is greater than the critical value, reject the null hypothesis.
Here R1 = 54.5, R2 = 90.5, and R3 = 26.0 are the rank sums from the table above, with N1 = N2 = N3 = 6 and N = 18.
H = [12 / (N(N + 1))] (R1²/N1 + R2²/N2 + R3²/N3) − 3(N + 1)
H = [12 / (18(18 + 1))] ((54.5)²/6 + (90.5)²/6 + (26.0)²/6) − 3(18 + 1)
H = [12 / ((18)(19))] (2,970.25/6 + 8,190.25/6 + 676/6) − 3(19)
H = (12 / 342)(11,836.5 / 6) − 57
H = (142,038 / 2,052) − 57 = 69.22 − 57 = 12.22
Since the computed H-value of 12.22 is greater than the critical value (the tabular chi-square) of 5.991 with 2 degrees of freedom at the 0.05 level of significance, we reject the null hypothesis in favor of the alternative hypothesis. Therefore, there is a significant difference in the mean grade of the students across the three different methods of teaching.
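The H statistic can be recomputed from the raw grades with a short Python sketch (illustrative; the module uses SPSS). The group labels follow the module's SPSS analysis. Note that the rank sums come out as 54.5, 90.5, and 26.0; dividing each by 6 gives exactly the mean ranks 9.08, 15.08, and 4.33 reported by SPSS.

```python
# Kruskal-Wallis H recomputed from the raw final grades, using
# mid-ranks for tied scores.
groups = {
    "Virtual Online": [86, 80, 80, 89, 82, 85],
    "Face to Face":   [88, 92, 87, 92, 95, 89],
    "Modular":        [78, 83, 75, 80, 77, 80],
}

combined = sorted(v for vals in groups.values() for v in vals)

def mid_rank(x):
    # Tied values share the average of the rank positions they occupy.
    first = combined.index(x) + 1
    return first + (combined.count(x) - 1) / 2

n = len(combined)                                   # N = 18
rank_sums = {g: sum(mid_rank(v) for v in vals) for g, vals in groups.items()}

h = (12 / (n * (n + 1))) * sum(
    rs ** 2 / len(groups[g]) for g, rs in rank_sums.items()
) - 3 * (n + 1)
print(rank_sums)
print(f"H = {h:.3f}")
```

With these rank sums, H ≈ 12.22, which is consistent with the SPSS p-value of 0.002 at df = 2.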
Just as with our previous test statistics, before using SPSS we must make sure that the data to be treated are not normally distributed. To show how to use SPSS for this kind of test, here are the steps, assuming the data are already encoded in SPSS.
Step 1. On the main menu, click "Analyze", drop down to "Nonparametric Tests", proceed to "Legacy Dialogs", then click "K Independent Samples".
Step 4. Place the "Final Grade of Student" in the test variable list and the "Methods of Teaching" in the grouping variable by highlighting each and using the arrow. You may add descriptive information by clicking the "Options" button; in the dialog box that appears, check "Descriptive" and click "Continue".
After clicking "Continue", you will see the dialog box below. Click "OK" to proceed to the output view.
Analysis:
The mean ranks of the students' final grades for the three methods of teaching (virtual online, face to face, and modular) are 9.08, 15.08, and 4.33 respectively, with 6 observations per method. Since the p-value of 0.002 is less than the significance level of 0.05 with 2 degrees of freedom, we reject the null hypothesis in favor of the alternative hypothesis.
Conclusion:
Therefore, there is a significant difference in the mean grade of the students using the
three different methods of teaching.
Section:________________________ Score:__________
Practice Exercises
Kruskal-Wallis H Test
Initially, a farmer planted an equal height of fruit bearing baby tree in a common
quality of soil. He grouped the plants into four groups where each group was treated by a
fertilizer of different brands. Group A, B, C, D plants are treated with brand A, B, C and D
fertilizer respectively. After a month, the farmer measures the height of the plants individually
and the data are recorded below. All measurements are in centimeters.
Brand of Fertilizers
243.67 439.12
154.65
Test the hypothesis that there is a significant difference in the height of the plants under the different brands of fertilizer. Use the 0.05 level of significance. Do this both with manual computation and with the use of SPSS.
D. Median Test Two-Sample Case or Sign Test for Two Independent Groups
This statistical test is another tool under the non-parametric tests. The sign test for two independent samples is also known as the median test for two sample cases. Whereas the t-test compares the means of two groups, this test basically compares the medians of two independent samples or groups, with membership defined by a categorical grouping variable. So the counterpart of this test under the parametric tests is the Student t-test, or simply the t-test.
If we want to compare the medians of two independent groups and the distributions of the data in these groups are not normal, this is the appropriate statistical test to use. The data should be numerical (interval or ratio scale) or categorical (nominal or ordinal scale), as in a Likert scale, for example.
If you wish to test if there is a significant difference between the two independent groups
or if you want to know whether the median score of the first group, say group A is significantly
different or not with the second group, say group B, this test statistic is the appropriate test to
use. The formula for this test is given below.
X² = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]
Where:
X² = chi-square test
a = the number of positive observations in the first row
b = the number of negative observations in the first row
c = the number of positive observations in the second row
d = the number of negative observations in the second row
a + b = the row total of the first row
c + d = the row total of the second row
a + c = the column total of the first column
b + d = the column total of the second column
n = the grand total
Example: A professor wants to know if there is a significant difference in the quiz performance
of the students between male and female in Data Analysis. The score of male and female
students are as follows:
Score
Male   Female
6      10
4      8
3      10
6      5
5      7
3      9
5      5
8      6
4      8
3      2
5      8
       7
       7
Score           Sign
Male   Female   Male   Female
6      10       -      +
4      8        -      +
3      10       -      +
6      5        -      -
5      7        -      +
3      9        -      +
5      5        -      -
8      6        +      -
4      8        -      +
3      2        -      -
5      8        -      +
       7               +
       7               +
Let us now do the steps in hypothesis testing. The research problem would be, “Is there
any significant difference in the performance of male and female students in Data Analysis?”
Ho: There is no significant difference in the quiz performance between male and female.
Ha: There is a significant difference in the quiz performance between male and female.
Step 2. Criterion: the critical chi-square value at the 0.05 level of significance with 1 degree of freedom is 3.841.
Step 4. Statistics:
X² = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]
X² = 24(4 − 90)² / [(11)(13)(10)(14)]
X² = (24)(7,396) / 20,020 = 177,504 / 20,020 = 8.866
Since the computed chi-square of 8.866 is greater than the critical chi-square value of 3.841 at the 0.05 level of significance with 1 degree of freedom, we reject the null hypothesis. Therefore, we can say that there is a significant difference in the quiz performance of male and female students in Data Analysis.
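The whole median-test procedure (find the common median, build the 2 × 2 sign table, apply the shortcut chi-square formula) can be sketched in Python (illustrative only; the module's tool is SPSS):

```python
# Median test (sign test for two independent groups) on the quiz scores.
male   = [6, 4, 3, 6, 5, 3, 5, 8, 4, 3, 5]
female = [10, 8, 10, 5, 7, 9, 5, 6, 8, 2, 8, 7, 7]

scores = sorted(male + female)
mid = len(scores) // 2
median = (scores[mid - 1] + scores[mid]) / 2   # 24 values -> middle two averaged

# "+" means above the common median, "-" means at or below it.
a = sum(s > median for s in male)
b = sum(s <= median for s in male)
c = sum(s > median for s in female)
d = sum(s <= median for s in female)

n = a + b + c + d
chi_sq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(f"a={a}, b={b}, c={c}, d={d}, chi-square = {chi_sq:.3f}")
```

This reproduces a = 1, b = 10, c = 9, d = 4 and χ² ≈ 8.866 from the manual solution.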
At this point, we will be using SPSS to test the hypothesis if the median of two
independent variables or groups is statistically significant or not. First, we need to encode all the
data that are necessary for the test. For presentation purposes, we will be using our previous
hypothetical data for this statistical test.
Now, let us use SPSS in order to test the hypothesis. For this test, we will be having two ways to
determine if there is a significant difference between the two groups (median). Let us use the
first one.
Step 1. In the menu bar, place the cursor on “Analyze” and drop down to “Nonparametric Tests”,
then the “Legacy Dialogs” and proceed and click the “2 Independent Samples”.
Step 3. When you see the dialog box for the range of the grouping variable, enter the codes that you used for your groups. In this example, the code is 1 for male and 2 for female; then click "Continue".
For this illustration, we can conclude that since the p-value of 0.005 is less than the level
of significance of 0.05, we will reject the null hypothesis in favor of the alternative hypothesis
that there is a significant difference between the score of males and females.
Alternative:
Step 1. In the menu bar, place the cursor on “Analyze” and drop down to “Nonparametric Tests”,
and click the “Independent Samples”.
Step 3. Highlight your data and transfer it to the appropriate box with the use of an arrow. In this
presentation, transfer the “sex” variable into groups while the “score in quiz” into the test fields
and then click the run button.
Step 4. Once you click the run button, an output view will appear.
Section:________________________ Score:__________
Practice Exercises
Median Test Two-Sample Case or Sign Test for Two Independent Samples
Directions: Use the sign test for two independent samples in this set of data both manual
computation and with the use of SPSS. Analyze and make a conclusion about it.
Determine whether there is a significant difference or not between the male and female
based on the given data below in the number of hours on the usage of social media in a day.
The data are as follows:
Number of Hours in a Day
Male Female
6.5 10.0
4.5 8.5
3.0 10.0
6.5 5.0
5.5 7.0
3.0 9.9
5.5 5.5
8.5 6.5
4.5 8.0
3.0 2.5
5.5 8.0
7.0 7.7
8.0 10.5
E. Fisher Sign Test or Sign Test for Correlated Samples
What is a Fisher Sign test, also known as the Sign test for correlated samples?
The Fisher Sign test is another test statistic under the non-parametric tests. Also known as the Sign Test for correlated samples, it is the counterpart of the paired t-test (the t-test for correlated samples) under the parametric tests, and it is one of the easiest to perform. In the Fisher Sign test, two measurements are considered and tested in one sample group; that is, it compares two correlated samples, with the data in n paired observations.
We use this test when we need to test whether there is a difference between each (dependent) pair of observations. According to Broto, the test is based on the idea that half of the differences between the paired observations will be positive and the other half negative. The level of data treated with this test is ordered categorical data, where a numerical scale is inappropriate but it is possible to rank the observations; the test can also be used if the data are continuous numeric.
The formula for the Fisher Sign test or Sign Test for Correlated Samples is given below.
z = (|D| − 1) / √n
Where:
D = the difference between the number of positive signs and the number of negative signs
n = the number of paired observations (ties excluded)
What are the steps in performing a Sign Test for Correlated Samples?
Listed below are the steps on how you will perform or use this test and as we all
know, we need to have a research problem before we do the hypothesis testing.
1. Make a table for the gathered data. The first column is for the individual observations, and the second and third columns are for the paired data to be tested.
2. The fourth column of the table is for the sign of each pair of observations. Note: in counting the number of positive and negative signs, zeros (ties) are not included in the count.
3. Then do the hypothesis testing and apply the formula. Compare the computed value to the critical value to know whether you will accept or reject the null hypothesis.
Example.
During the first day of class, a professor gave a 50-item pre-test to his fifteen (15) students in Statistics and Probability before the formal lessons started. After a semester, he gave a post-test to the same fifteen students using the same set of examinations given in the pre-test. He wants to determine if there is a significant difference between the pre-test and the post-test. The following is the result of the experiment, analyzed with the sign test for correlated samples at the α = 0.05 level of significance.
Student 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Pre-test 15 12 20 10 8 27 29 13 19 22 25 14 28 18 16
Post-test 20 18 25 25 20 35 43 28 29 37 46 27 33 37 28
Research Problem: Is there a significant difference between the Pretest and Posttest of the fifteen
students in Statistics and Probability on the use of the teaching method by the Professor?
Student   Pre-test   Post-test   Sign
1         15         20          -
2         12         18          -
3         25         20          +
4         10         25          -
5         8          20          -
6         37         25          +
7         43         43          0
8         13         28          -
9         29         19          +
10        22         37          -
11        25         46          -
12        27         20          +
13        28         33          -
14        18         37          -
15        16         28          -
Step 2. Criterion: the tabular value of z at the 0.05 level of significance (two-tailed) is 1.96.
z = (|D| − 1) / √n = (|10 − 4| − 1) / √14 ≈ 1.33
Since the computed value of z = 1.33 is less than the tabular value of 1.96, we do not reject the null hypothesis. Therefore, there is no significant difference in the performance of the fifteen students between their pre-test and post-test in Statistics and Probability.
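The z computation is small enough to sketch in Python. The sign counts (10 positive, 4 negative, 1 tie) are those reported in the SPSS output later in this section.

```python
import math

# Fisher sign test, z = (|D| - 1) / sqrt(n), using the sign counts the
# module reports in its SPSS output: 10 positive, 4 negative, 1 tie.
positives, negatives = 10, 4       # ties are excluded from the count

n = positives + negatives          # 14 non-tied pairs
d = abs(positives - negatives)     # |D| = 6
z = (d - 1) / math.sqrt(n)
print(f"z = {z:.2f}")
```

This gives z ≈ 1.34; the module's 1.33 differs only in rounding, and either way z is less than 1.96, so the null hypothesis is not rejected.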
To do this, let us use the previous data given on this kind of test statistics. Assuming that
there is already data which are being encoded.
Step 1. On the main menu bar, click "Analyze", drop the cursor down to "Nonparametric Tests", then move to "Legacy Dialogs" and click "2 Related Samples".
Step 2. Once you click it, you will see another dialog box.
Step 3. Transfer your independent variable (Pre-test) into "Test Pairs" as Variable 1 and your dependent variable (Post-test) as Variable 2 using the arrow. Wilcoxon is the default test statistic; check "Sign" to run the sign test.
Step 4. Click the “continue” button and you will be taken to the previous dialog box and then
click the “ok” button. Once you click it, the output view will appear.
So, at this point, you can now do your analysis of your data.
Based on your output view, there are 15 students who took the pre-test and post-test.
There are 4 negative differences, 10 positive differences, and 1 tie. The p-value of 0.180 is
greater than the 0.05 level of significance, hence we fail to reject the null hypothesis. Thus,
there is no significant difference between the pre-test and the post-test.
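The same sign test can be sketched in Python as a cross-check (this is not part of the module's SPSS workflow); the sign counts are taken from the SPSS output described above.

```python
# Sign test for correlated samples, cross-checking the worked example.
# The sign counts (10 negative, 4 positive, 1 tie) are taken from the text.
from math import sqrt
from scipy.stats import binomtest

n_minus, n_plus = 10, 4          # ties are dropped from the test
n = n_minus + n_plus             # 14 usable pairs

# Exact two-sided p-value, as SPSS reports for small samples:
p_exact = binomtest(min(n_plus, n_minus), n, 0.5).pvalue
print(f"{p_exact:.3f}")          # 0.180

# Normal approximation used in the manual computation:
D = abs(n_plus - n_minus)        # |10 - 4| = 6
z = (D - 1) / sqrt(n)
print(round(z, 3))               # 1.336, reported as 1.33 in the text
```

The exact binomial p-value (0.180) matches the SPSS output, and the normal approximation reproduces the manual z of about 1.33.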
F. Median Test for Multiple Sample or Sign Test for K-Independent Groups/Samples
In order to apply this test, the formula for the chi-square test will be used but the
contingency table must be in a 2 x k table, where 2 represents the above and below
median and k is the number of independent groups.
χ² = Σ (O − E)² / E

Where:
χ² = chi-square test statistic
O = observed frequency
E = expected frequency
Just like in our previous lesson on the chi-square test for independence, to use this
test statistic (sign test for k independent samples), simply follow the same procedure as in the
chi-square test.
    Online Selling  Mall Selling  Public Market Selling
1   10              12            18
2   12              18            23
3   9               11            18
4   20              24            25
5   18              12            24
6   11              23            23
7   10              20            29
Research Problem:
Is there a significant difference in the number of sales in a week of a particular
brand of cloth in three different ways of selling?
Step 1. Hypotheses
Ho: There is no significant difference in the number of sales in a week of a particular brand of cloth in three different ways of selling.
Ha: There is a significant difference in the number of sales in a week of a particular brand of cloth in three different ways of selling.
Step 2. Criterion
If the computed chi-square is greater than the critical value, reject the null hypothesis.
Step 4. Computation
First, determine the median of all data regardless of the place of selling. Here, the
median is 18.
Then make a table. Data that fall at or below the median are marked negative (−), and data
above the median are marked positive (+).
Online Selling   Mall Selling   Public Market Selling
10  −            12  −          18  −
12  −            18  −          23  +
9   −            11  −          18  −
20  +            24  +          25  +
18  −            12  −          24  +
11  −            23  +          23  +
10  −            20  +          29  +
                       Above Median (+)    Below/At Median (−)
                       O      E            O      E            Total
Online Selling         1      3            6      4            7
Mall Selling           3      3            4      4            7
Public Market Selling  5      3            2      4            7
Total                  9                   12                  21
The expected frequency for each cell is (row total)(column total)/(grand total). For the
cells above the median, E = (7)(9)/21 = 3 for each group; for the cells below or at the median,
E = (7)(12)/21 = 4 for each group.

χ² = (1 − 3)²/3 + (3 − 3)²/3 + (5 − 3)²/3 + (6 − 4)²/4 + (4 − 4)²/4 + (2 − 4)²/4
   = 1.3333 + 0 + 1.3333 + 1 + 0 + 1
   = 4.6667 ≈ 4.67
Since the computed chi-square value (sign test for k independent samples) of 4.67 is
less than the critical or tabular chi-square value of 5.991 at the 0.05 level of significance, we
fail to reject the null hypothesis. Therefore, there is no significant difference in the number of
sales in a week of a particular brand of cloth in the three different ways or places of selling.
Doing this test statistic manually is tedious and time-consuming for the researcher.
SPSS makes the computation much easier, as long as you know how to use the software. Let us
run this test statistic in SPSS using the previous data.
Step 1. Based on the above data set, go to "Analyze", move the cursor down to "Nonparametric
Tests", proceed to "Legacy Dialogs", then move the cursor to "K Independent Samples" and
click it.
Step 2. Once you click "K Independent Samples", another dialog box will appear.
Highlight and transfer the "Number of Sales" into the "Test Variable List" and the "Way of
Selling" into the "Grouping Variable" using the right arrow.
Step 3. Click "Define Range". Here, you will type 1 as the minimum and 3 as the maximum,
since there are three different ways of selling in our grouping variable, then click continue.
Step 4. Once you click the continue button, it will go back to the previous dialog box. You will
notice that the default statistical test is the “Kruskal-Wallis H”. What you are going to do is to
uncheck it and check the square on the “Median” and then click the ok button.
From here you can now start doing your analysis, interpretation and conclusion based on
the treated data.
Since the p-value (Asymp. Sig.) of 0.097 is greater than the 0.05 level of
significance (with a median of 18, a chi-square value of 4.667, and 2 degrees of freedom), do
not reject the null hypothesis. Therefore, there is no significant difference in the number of sales
in a week of a particular brand of cloth in the three different ways or places of selling.
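For readers without SPSS, the same 2 × k median test can be reproduced by hand in Python. This is a sketch of the computation, not the module's workflow; values at or below the grand median are counted "below", matching the worked example.

```python
# Median test for k independent samples, computed from the contingency table.
import numpy as np
from scipy.stats import chi2

online = [10, 12, 9, 20, 18, 11, 10]
mall   = [12, 18, 11, 24, 12, 23, 20]
public = [18, 23, 18, 25, 24, 23, 29]
groups = [online, mall, public]

grand_median = np.median(np.concatenate(groups))               # 18
above = np.array([sum(x > grand_median for x in g) for g in groups])
below = np.array([len(g) - a for g, a in zip(groups, above)])

observed = np.column_stack([above, below])                     # 3 x 2 table
row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
expected = row_tot * col_tot / observed.sum()                  # (row)(col)/grand

chi_sq = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)        # 2
p_value = chi2.sf(chi_sq, dof)

print(round(chi_sq, 3), round(p_value, 3))                     # 4.667 0.097
```

The chi-square statistic (4.667) and p-value (0.097) match both the manual computation and the SPSS median-test output.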
G. Spearman Rank-Order Correlation
In order to understand this test statistic, we first need to know what a monotonic
relationship is. A monotonic function is one that either never decreases or never increases as its
independent variable increases. There are three possibilities: monotonically increasing (as x
increases, y also increases), monotonically decreasing (as x increases, y decreases), and not
monotonic (as x increases, y sometimes increases and sometimes decreases). In other words, we
use this test to measure the strength and direction of the monotonic relationship between two
continuous or ordinal variables, such as Likert-scale data.
What is the Spearman Rank Correlation coefficient and how do we interpret it?
Similar to the Pearson product-moment correlation, the Spearman rank correlation has
what we call a coefficient of correlation, denoted by rs. This coefficient tells the
strength of the relationship between the two variables. Unlike the Pearson correlation, there is no
need to test for normality since it belongs to the non-parametric tests.
rs = ρ = 1 − (6 Σ di²) / (n(n² − 1))

Where:
rs = Spearman rank correlation coefficient
di = difference between the ranks of the two variables for the i-th observation
n = number of paired observations
How do we use or what are the steps to compute for the Spearman Rank-Order
Correlation?
Step 1. Find the rank of the scores/data for each variable from highest to lowest.
Step 2. For each observation, take the difference between the rank of the first variable
and the rank of the second variable.
Step 3. Square each of these differences.
Step 4. Take the summation of the squared differences.
Note: In doing this test statistic manually, it is important to create a table for the
computation.
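The four steps can be sketched in Python; the small x and y data sets here are illustrative only (they are not from the module).

```python
# Steps 1-4 of the Spearman computation on illustrative (assumed) data.
from scipy.stats import rankdata

x = [10, 20, 30, 40, 50]
y = [12, 25, 22, 41, 55]

rank_x = rankdata([-v for v in x])            # Step 1: rank 1 = highest value
rank_y = rankdata([-v for v in y])
d = [a - b for a, b in zip(rank_x, rank_y)]   # Step 2: rank differences
d_sq = [v ** 2 for v in d]                    # Step 3: square each difference
sum_d_sq = sum(d_sq)                          # Step 4: sum the squares

n = len(x)
rs = 1 - (6 * sum_d_sq) / (n * (n ** 2 - 1))  # the rs formula above
print(rs)                                     # 0.9 for this illustrative data
```

Negating the values before calling `rankdata` assigns rank 1 to the highest score, as Step 1 requires.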
Illustration: A health professor wants to determine if the height and weight of his fifteen (15)
students are correlated to each other. He calculated the correlation coefficient using Spearman
rank order correlation at 0.05 level of significance. The data are as follows:
Student  Weight (kg)  Height (cm)
1        72           172
2        62           169
3        58           167
4        75           195
5        43           155
6        70           182
7        56           151
8        59           167
9        41           157
10       49           163
11       77           188
12       60           179
13       50           172
14       45           160
15       63           189
Research Problem: Is there any significant relationship between the students’ height (in cm.) and
weight (in kg.)?
Step 1. Hypotheses
Ho: There is no significant relationship between the students’ height and weight.
Ha: There is a significant relationship between the students’ height and weight.
Step 2. Criterion
If the computed value of rs is greater than the critical value rs at 0.05 level of significance,
reject the null hypothesis.
Step 4: Computation. Rank the data from highest to lowest, where rank 1 is assigned to the
highest value; tied values receive the average of the ranks they would occupy.
Weight (kg)  Height (cm)  Rank (Weight)  Rank (Height)  di     di²
72           172          3              6.5            −3.5   12.25
62           169          6              8              −2     4
58           167          9              9.5            −0.5   0.25
75           195          2              1              1      1
43           155          14             14             0      0
70           182          4              4              0      0
56           151          10             15             −5     25
59           167          8              9.5            −1.5   2.25
41           157          15             13             2      4
49           163          12             11             1      1
77           188          1              3              −2     4
60           179          7              5              2      4
50           172          11             6.5            4.5    20.25
45           160          13             12             1      1
63           189          5              2              3      9
                                                        Σ di² = 88
rs = 1 − (6 Σ di²) / (n(n² − 1))
   = 1 − (6)(88) / ((15)(15² − 1))
   = 1 − 528/3360
   = 1 − 0.157
   = 0.843
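As a cross-check on the manual result, SciPy's `spearmanr` (which, like the manual table, assigns tied values the average of their ranks) gives the same coefficient. This is a sketch alongside the module's SPSS workflow, not a replacement for it.

```python
# Spearman rank-order correlation on the worked example's data.
from scipy.stats import spearmanr

weight = [72, 62, 58, 75, 43, 70, 56, 59, 41, 49, 77, 60, 50, 45, 63]
height = [172, 169, 167, 195, 155, 182, 151, 167,
          157, 163, 188, 179, 172, 160, 189]

rho, p_value = spearmanr(weight, height)
print(round(rho, 3))       # 0.843, matching the manual computation
print(p_value < 0.05)      # True -> reject the null hypothesis
```

The p-value is well below 0.05 (and 0.01), matching the SPSS interpretation that follows.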
For sure, you will be saying that it is more convenient to use SPSS for this test
statistic since SPSS does the computation. But it is still in your hands to choose the right test
statistic: even if SPSS does the computation, the output will only be as good as the input you
provide.
Again, if you are going to use the software, make sure that you are already familiar with
its interface and how to explore it. You have to encode the proper data set in the data
editor before treating and analyzing the data. So, let us apply SPSS by following the steps for
the Spearman Rank-Order Correlation.
For presentation purposes, we will be using the previous data under this test statistics.
Step 1. In the menu bar, look for “Analyze” and drop down the cursor to “Correlate” then click
the “Bivariate”.
Step 2. Once you click "Bivariate", a dialog box will appear. The default statistic you will
see here is "Pearson".
Step 3. Place the two groups of data into the variable window by highlighting each one and
clicking the arrow. Also, uncheck "Pearson" and check the small box beside "Spearman", then
click the "ok" button.
Step 4. Once the "ok" button is clicked, the output view will be displayed.
At this point, we could now do the analysis and interpretation of data. In interpreting and
analyzing the data, it is just the same as how you interpret the data using manual computation for
this test statistics.
As you can see, the correlation coefficient of 0.843 tells us that the two variables have a
strong positive association or correlation, and the p-value, Sig. (2-tailed), is less than the levels
of significance of 0.05 and 0.01. This means we need to reject the null hypothesis in favor of the
alternative hypothesis. Hence, there is a significant relationship between the students' height and
weight. It implies that as a student's height increases, the weight also tends to increase.
For sure you will be asking, "If this is the test that we are going to use, where is the
ranking?" We can also run this test in SPSS on ranked data. In order to do that, we first need to
rank the data in each group.
Step 1. In the menu bar, look for "Transform" and move the cursor down to "Rank Cases".
Step 2. Click "Rank Cases" and the dialog box will appear.
Step 3. On the rank cases dialog box, move the two data sets on the window variable by
highlighting it and click the arrow for this.
Before clicking on the “ok” button, make sure that you choose the assigned rank whether
you are going to rank the data from smallest to largest or vice versa. Here in our example, the
ranking from largest to smallest was chosen, then click ok.
Step 4. Once you have done this, you will see in the data view that the ranks are included.
Where is the rank located? An added column containing the rank for each group of data will appear.
Step 5. In analyzing the data, we will be adopting the steps in our previous presentation but
instead of using the data set, we will be using the data set that is being ranked and then click
“ok”.
Step 6. After clicking the ok button, an output window will appear. You will notice that the
result is the same.
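The rank-then-correlate route can be imitated with SciPy's `rankdata` (a sketch of the idea, not the SPSS internals): computing Pearson on the ranks reproduces the Spearman coefficient, which is exactly why the SPSS result is the same.

```python
# Ranking first, then correlating: Spearman is Pearson on the ranks.
from scipy.stats import pearsonr, rankdata, spearmanr

weight = [72, 62, 58, 75, 43, 70, 56, 59, 41, 49, 77, 60, 50, 45, 63]
height = [172, 169, 167, 195, 155, 182, 151, 167,
          157, 163, 188, 179, 172, 160, 189]

# Rank from largest to smallest (rank 1 = largest), as chosen in the text;
# rankdata ranks smallest-first, so negate the values.
r_weight = rankdata([-w for w in weight])
r_height = rankdata([-h for h in height])

r_on_ranks, _ = pearsonr(r_weight, r_height)   # Pearson on the ranks
rho, _ = spearmanr(weight, height)             # Spearman on the raw data
print(round(r_on_ranks, 3) == round(rho, 3))   # True
```

Because correlation is unchanged by flipping both rankings, ranking largest-first or smallest-first gives the same coefficient.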
H. Friedman Test
What is the Friedman Test and how do we deal with it?
The Friedman test is a non-parametric test for differences among k related treatments
measured on the same b blocks, for example, repeated measurements on the same subjects. For
sure you will ask what your k is as treatment and your b is as block. For example, there are 8
individual observations who are subjected to 4 different methods of teaching and learning. For
this example, k is the number of different methods of teaching and learning and b is the number
of observations, which serves as the number of blocks.
Step 1. Label the treatments as k (If there are 3 treatments, k = 3. If there are 4 different
treatments, k = 4, and so on) and the b as the number of blocks is the number of samples subject
for testing.
Step 2. Do the ranking within each block. Meaning, the data for the different treatments in
one particular observation will be ranked (in the worked example below, from lowest to
highest).
Step 3. After determining the ranks for each block across the different treatments, take the
sum of the ranks for each treatment separately and label these T1, T2, ..., Tk, where T1 is the
rank sum for the first treatment, T2 for the second treatment, and so on up to Tk for the k-th
treatment.
Do this step in the prepared table. After this, we could proceed now to the computation
of the Friedman Fr test.
Fr = [12 / (b · k · (k + 1))] (T1² + T2² + ... + Tk²) − 3b(k + 1)

Where:
Fr = Friedman test statistic
b = number of blocks
k = number of treatments
Ti = sum of the ranks for the i-th treatment
Example. Five students were tested to determine whether there are significant differences in
their level of acceptance of four different computer designs, from the 1970s, 1980s, 1990s and
the 2000s. The level of acceptance is on a 5-point Likert scale: 1 is unacceptable, 2 is slightly
unacceptable, 3 is undecided, 4 is slightly acceptable and 5 is acceptable. The responses of the
five students are given below. Use 0.05 as the level of significance to test whether there is a
significant difference in the five students' level of acceptance of the computer designs for the
said four decades.
Student  1970s  1980s  1990s  2000s
A        2      1      3      5
B        1      5      2      4
C        3      1      5      4
D        4      1      2      5
E        1      3      2      5
Research Problem: Is there any significant difference in the level of acceptance of five (5)
students of computer designs from four (4) different decades?
Step 1. Hypotheses
Ho: There is no significant difference in the level of acceptance of the five (5) students of
computer designs from the four (4) different decades.
Ha: There is a significant difference in the level of acceptance of the five (5) students of
computer designs from the four (4) different decades.
Step 2. Criterion
If the computed value is greater than the critical value, reject the null hypothesis.
Step 4. Computation: Note that ranking within each block is from the lowest rating to the
highest (rank 1 = lowest).
Student  1970s (Rank)  1980s (Rank)  1990s (Rank)  2000s (Rank)
A        2 (2)         1 (1)         3 (3)         5 (4)
B        1 (1)         5 (4)         2 (2)         4 (3)
C        3 (2)         1 (1)         5 (4)         4 (3)
D        4 (3)         1 (1)         2 (2)         5 (4)
E        1 (1)         3 (3)         2 (2)         5 (4)
Rank sums: T1 = 9      T2 = 10       T3 = 13       T4 = 18
Fr = [12 / ((5)(4)(4 + 1))] (9² + 10² + 13² + 18²) − (3)(5)(4 + 1)
   = (12/100)(81 + 100 + 169 + 324) − 75
   = (0.12)(674) − 75
   = 80.88 − 75
   = 5.88
Since the computed value of Fr = 5.88 is less than the critical chi-square value of 7.815
at the 0.05 level of significance with 3 degrees of freedom, do not reject the null hypothesis.
Hence, there is no significant difference in the level of acceptance of the five (5) students of
computer designs from the four (4) different decades.
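A quick cross-check with SciPy (a sketch, not the module's SPSS route): `friedmanchisquare` takes one array per treatment, with each array holding the ratings of the same five students, and returns the same Fr statistic.

```python
# Friedman test on the worked example: 4 treatments (decades), 5 blocks
# (students A-E). Each list is one treatment across the same students.
from scipy.stats import friedmanchisquare

design_1970s = [2, 1, 3, 4, 1]
design_1980s = [1, 5, 1, 1, 3]
design_1990s = [3, 2, 5, 2, 2]
design_2000s = [5, 4, 4, 5, 5]

stat, p_value = friedmanchisquare(design_1970s, design_1980s,
                                  design_1990s, design_2000s)
print(round(stat, 2))     # 5.88, matching the manual Fr
print(p_value > 0.05)     # True -> do not reject Ho
```

The p-value exceeds 0.05, agreeing with the decision reached against the critical value of 7.815.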
Using SPSS for this test statistic is not as difficult as you may think. If you have a
strong grasp of how to explore SPSS, just like what you did for the other test statistics, you can
surely do this in a very simple manner.
Let us use the previous data for this test statistic with SPSS. Make sure that you have
already encoded the necessary data in the variable and data views of SPSS.
Step 1. On the menu bar, go to "Analyze", move the cursor to "Nonparametric Tests", proceed
to "Legacy Dialogs" and click "K Related Samples".
Step 2. Once you click it, another dialog box will appear. Transfer the four variables (1970s,
1980s, 1990s and 2000s) into the "Test Variables" window using the arrow. "Friedman" is the
default test statistic, so leave it checked.
Step 3. Click the "ok" button. Once you click it, the output view will appear, and you can do
your analysis, interpretation and conclusion based on the treated data.
The McNemar Test for two correlated proportions is another test statistic under the
non-parametric tests; it is used with paired nominal (dichotomous) data.
References:
https://www.statstutor.ac.uk/resources/uploaded/mannwhitney.pdf
https://psych.unl.edu/psycrs/handcomp/hcmann.PDF
https://www.bristol.ac.uk/cmm/media/research/ba-teaching-ebooks/pdf/Mann%20Whitney%20-%20Practical.pdf
https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/mann-whitney-u-test/
https://www.real-statistics.com/statistics-tables/mann-whitney-table/
https://davidmlane.com/hyperstat/viswanathan/Median_Test.html
https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470776124.app1
https://www.stat.purdue.edu/~lfindsen/stat503/Table%207%20-%20b.pdf
https://psych.unl.edu/psycrs/handcomp/hcmedian.PDF
https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/spearman-rank-correlation-definition-calculate/
https://www.google.com/search?q=spearman+rank+order+correlation+table
https://www.statstutor.ac.uk/resources/uploaded/spearmans.pdf
https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide-2.php
https://www.sciencedirect.com/topics/medicine-and-dentistry/friedman-test