As Las 4
Suppose we are interested in the effect of three teaching approaches (conventional, modular, and blended
learning) on the performance of students in Mathematics. If there is no difference among the approaches,
we would expect the mean performances under all three to be approximately equal; otherwise, we would
expect them to differ. Here, we introduce a test used to analyze data from more than two populations.
Such tests deal with treatment effects (e.g., teaching approaches), and include tests that take into account
other factors that may affect the response (e.g., Mathematics performance). The hypothesis that the
population means are equal is considered equivalent to the hypothesis that there is no difference in
treatment effects. The analytical method used in such problems is called the analysis of variance (ANOVA).
The initial development of this method is credited to Sir Ronald A. Fisher, who introduced the technique
for the analysis of agricultural field experiments; at present it is also useful in social science and
educational research.
AS_4a.
Discuss briefly the history and works of Sir Ronald A. Fisher.
One-way ANOVA
We are going to study the hypothesis testing problem of comparing population means for more than
two independent populations, where the data are about several independent groups (different
treatments being applied, or different populations being sampled).
In one-way ANOVA, it is assumed that the 𝑘 populations are independent and normally distributed with
means 𝜇1, …, 𝜇𝑘 and with unknown but equal variance 𝜎². The question is whether the means of these
groups are different or equal. That is, we test the hypothesis that:
Ho: 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘 versus
Ha: At least two of the means are different.
The idea is to consider the overall variability in the data and partition it into two parts: (1) between-groups
variability and (2) within-groups variability. If the between-groups variability is much larger than the
within-groups variability, this indicates that the differences between the groups are real. Let independent
samples of sizes 𝑛1, 𝑛2, …, 𝑛𝑘 be drawn randomly, and let 𝑛𝑡 = 𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘.
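The partition described above can be checked numerically. The following is a minimal Python sketch, assuming the raw observations of each group are available as lists. The function name `partition_variability` and the three small groups are hypothetical illustrations and are not data from this module.

```python
def partition_variability(groups):
    """Split the total variability into between- and within-groups parts."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-groups: squared distance of each group mean from the grand mean,
    # weighted by the group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-groups: squared distance of each observation from its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    # Total: squared distance of each observation from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    return ss_between, ss_within, ss_total


# Hypothetical toy data (NOT from the module), chosen so the arithmetic is easy to follow.
toy_groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
ss_b, ss_w, ss_t = partition_variability(toy_groups)
print(ss_b, ss_w, ss_t)
```

For these toy groups the between-groups part dominates the within-groups part, and the two parts add up to the total, which is the identity the ANOVA table relies on.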
Numerous Statistics books discuss the mathematics behind this statistical tool, but this module skips the
derivations and goes straight to how to calculate the F-statistic manually (the F is attributed to Fisher;
the statistic is also called the F-value). First, consider the following ANOVA table.
Notice that the table must be filled out from left to right, following the flow of the formulae. The
calculations may seem overwhelming, so to make the process easier to follow, we illustrate it with an
example.
Example 2.2.1. The samples presented in the table below (blue-colored part) represent test scores
from three classes of Statistics taught using three different approaches (conventional,
modular and blended learning) and are independently obtained. Assume that the three
different populations are normal with equal variances. At α = 0.05 level of significance, test
for equality of population means.
𝑛            5        3           3           𝑛𝑡 = ∑𝑛 = 11
𝑥̅            76       66.33       85.67       (∑∑𝑥)²/𝑛𝑡 = 836²/11 = 63536
(∑𝑥)²/𝑛      28880    13200.33    22016.33    ∑[(∑𝑥)²/𝑛] = 64096.67
Notice that the red-colored cells contain the values needed in calculating the “Sum of Squares” in
our ANOVA table. Therefore, the ANOVA table will be,
Source of    Sum of Squares                          Degrees of    Mean Squares            F-statistic
Variation                                            Freedom
Between      𝑆𝑆𝑏 = ∑[(∑𝑥)²/𝑛] − (∑∑𝑥)²/𝑛𝑡            𝑘 − 1         𝑀𝑆𝑏 = 𝑆𝑆𝑏/(𝑘 − 1)       𝐹 = 𝑀𝑆𝑏/𝑀𝑆𝑤
                 = 64096.67 − 63536                  = 3 − 1           = 560.67/2            = 280.335/57.916
                 = 560.67                            = 2               = 280.335             = 4.840
Within       𝑆𝑆𝑤 = ∑∑(𝑥²) − ∑[(∑𝑥)²/𝑛]               𝑛𝑡 − 𝑘        𝑀𝑆𝑤 = 𝑆𝑆𝑤/(𝑛𝑡 − 𝑘)
                 = 64560 − 64096.67                  = 11 − 3          = 463.33/8
                 = 463.33                            = 8               = 57.916
Total        𝑆𝑆𝑡 = 𝑆𝑆𝑏 + 𝑆𝑆𝑤                         𝑛𝑡 − 1
                 = 560.67 + 463.33                   = 11 − 1
                 = 1024                              = 10
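The arithmetic in the ANOVA table can be reproduced with a short Python sketch. It uses only the summary values of the example; the per-class sums 380, 199, and 257 are an assumption recovered from 𝑛 times the rounded class means (they also reproduce the (∑𝑥)²/𝑛 values shown earlier). The variable names are illustrative.

```python
# Summary values taken from the worked example.
n = [5, 3, 3]                 # sample sizes of the three classes
group_sums = [380, 199, 257]  # sum of scores per class (assumption: n * rounded mean)
sum_of_squares = 64560        # sum of all squared scores, as used in the Within row

k = len(n)      # number of groups
n_t = sum(n)    # total sample size

correction = sum(group_sums) ** 2 / n_t                         # (sum of all x)^2 / n_t
between_term = sum(s ** 2 / m for s, m in zip(group_sums, n))   # sum of (group sum)^2 / n

ss_b = between_term - correction        # between-groups sum of squares
ss_w = sum_of_squares - between_term    # within-groups sum of squares
ss_t = ss_b + ss_w                      # total sum of squares

df_b, df_w = k - 1, n_t - k             # degrees of freedom
ms_b = ss_b / df_b                      # mean square between
ms_w = ss_w / df_w                      # mean square within
f_stat = ms_b / ms_w                    # F-statistic

print(round(ss_b, 2), round(ss_w, 2), round(ss_t, 2), round(f_stat, 3))
```

Because the sketch carries unrounded intermediate values, the mean squares may differ from the table in the last decimal place, but the F-statistic still rounds to 4.840.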
Advanced Statistics /LAS 4/Page 2 of 6
Next, we decide whether or not to reject the null hypothesis. To do so, we complete our table with the
F-critical value and the P-value:
Source of    Sum of     Degrees of    Mean       F-statistic    F-critical    P-value
Variation    Squares    Freedom       Squares
Between      560.67     2             280.335    4.840          4.459         0.042
Within       463.33     8             57.916
Total        1024       10
The F-critical value can be obtained from a table of critical values of the F-distribution. Alternatively,
Microsoft Excel returns it when you enter the formula:
=F.INV.RT(α, dfb, dfw) = F.INV.RT(0.05, 2, 8)
It will return the F-critical value, 4.459.
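The same critical value can be cross-checked without Excel. The following Python sketch is an assumption-laden shortcut, not a general method: it relies on the closed form of the right-tail probability of the F-distribution that holds only when the numerator degrees of freedom equal 2, as in this example. The function name is hypothetical. For other degrees of freedom, a statistical table or a library routine such as `scipy.stats.f.isf` would be needed.

```python
def f_critical_two_numerator_df(alpha, df_w):
    """Right-tail critical value of the F-distribution with (2, df_w) degrees of freedom.

    Based on P(F > f) = (1 + 2*f/df_w) ** (-df_w / 2), which is valid ONLY
    when the numerator (between-groups) degrees of freedom equal 2.
    Solving that expression for f at probability alpha gives the line below.
    """
    return (df_w / 2) * (alpha ** (-2 / df_w) - 1)


# Values from the example: alpha = 0.05, dfb = 2, dfw = 8.
f_crit = f_critical_two_numerator_df(0.05, 8)
print(round(f_crit, 3))
```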
Note that when using this method to decide whether or not to reject the null hypothesis:
• If F-statistic < F-critical, do not reject Ho.
• If F-statistic > F-critical, reject Ho and conclude Ha.
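The decision rule above can be sketched in Python as follows. The function names and the returned strings are hypothetical. The P-value helper is an assumption that applies only when the numerator degrees of freedom equal 2, as in this example; it is included so the P-value reported in the completed table can be cross-checked against the critical-value rule.

```python
def anova_decision(f_stat, f_crit):
    """Apply the critical-value rule: reject Ho only when F-statistic > F-critical."""
    # The text does not cover the tie case; this sketch treats it as "do not reject".
    return "reject Ho" if f_stat > f_crit else "do not reject Ho"


def p_value_two_numerator_df(f_stat, df_w):
    """Right-tail probability of the F-distribution with (2, df_w) degrees of freedom.

    Valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_stat / df_w) ** (-df_w / 2)


# Values from the completed ANOVA table of the example.
decision = anova_decision(4.840, 4.459)
p_value = p_value_two_numerator_df(4.840, 8)
print(decision, round(p_value, 3))
```

Comparing the P-value with α = 0.05 leads to the same decision as comparing the F-statistic with the F-critical value.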
Since 4.840 > 4.459, we reject the null hypothesis and conclude that at least two of the classes in
Statistics differ in mean test scores. Similarly, since the P-value of 0.042 is less than α = 0.05, we reject
the null hypothesis.
Either way, we arrive at the same conclusion as long as the interpretation is correct. Note also that
before using ANOVA as a statistical tool, the following assumptions must be satisfied; otherwise, a
non-parametric test must be used, which will be discussed in the succeeding units of this subject.
1. Normality – that each sample is taken from a normally distributed population
2. Sample Independence – that each sample has been drawn independently of the other samples
3. Variance Equality – that the variance of data in the different groups should be the same
4. Continuous Dependent Variable – that the dependent variable is measured on a scale which can be
subdivided into increments/intervals
AS_4b.
In an experiment to investigate the long-distance marathon performance of four (4) colleges, 5 students
from each college were taken, and the combined distances in kilometers run by each group in a given
time were recorded. A partially completed ANOVA table is given. Fill in the missing entries (blue cells) and
test the relevant hypothesis at the 0.05 level of significance.
Total 310500.76
Between
Within 235419.04
Total 310500.76
Reason: Reason:
6. Draw a conclusion.
Conclusion: