One-Way ANOVA Test
One-Way ANOVA
The one-way analysis of variance (ANOVA) is used to test the
claim that three or more population means are equal.
It is an extension of the two independent samples
t-test.
One-Way ANOVA
The response variable is the variable you’re
comparing
The factor variable is the categorical variable being
used to define the groups
We will assume k samples (groups)
It is called "one-way" because each value is classified in
exactly one way, by a single factor
Examples include comparisons by gender, race,
political party, color, etc.
One-Way ANOVA
Conditions or Assumptions
The data are randomly sampled
The population variances of the groups are assumed equal
The residuals are normally distributed
One-Way ANOVA
The null hypothesis is that the means are all
equal
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
The alternative hypothesis is that at least one
of the means is different
The ANOVA doesn’t test that one mean is
less than another, only whether they’re all
equal or at least one is different.
Example
A random sample of the students in each row of a class was
taken. The score of each sampled student on the second
exam was recorded
Front: 82, 83, 97, 93, 55, 67, 53
Middle: 83, 78, 68, 61, 77, 54, 69, 51, 63
Back: 38, 59, 55, 66, 45, 52, 52, 61
One-Way ANOVA
The summary statistics for the grades of each row
are shown in the table below
Row           Front    Middle   Back
Sample size   7        9        8
Mean          75.71    67.11    53.50
St. Dev       17.63    10.95    8.96
Variance      310.90   119.86   80.29
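The summary statistics can be reproduced from the raw scores. The following is a minimal sketch using only Python's standard `statistics` module; the names `scores`, `summarize`, and `summary` are illustrative and not part of the original slides.

```python
from statistics import mean, stdev, variance

# Exam scores by row, copied from the example.
scores = {
    "Front": [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back": [38, 59, 55, 66, 45, 52, 52, 61],
}


def summarize(values):
    """Return sample size, mean, sample st. dev, and sample variance."""
    return {
        "n": len(values),
        "mean": mean(values),
        "st_dev": stdev(values),       # divides by n - 1
        "variance": variance(values),  # divides by n - 1
    }


summary = {row: summarize(values) for row, values in scores.items()}

for row, stats in summary.items():
    print(f"{row:<7} n={stats['n']}  mean={stats['mean']:.2f}  "
          f"sd={stats['st_dev']:.2f}  var={stats['variance']:.2f}")
```

Both `stdev` and `variance` use the sample (n - 1) form, which is the form the table values correspond to.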
One-Way ANOVA
Variation
Variation is the sum of the squared deviations
between each value and the mean of the values
Sum of Squares is abbreviated by SS and often
followed by a variable in parentheses such as SS(B) or
SS(W) so we know which sum of squares we’re talking
about
One-Way ANOVA
1- Are all of the values identical?
No, so there is some variation in the data
This is called the total variation
Denoted SS(Total) for the total Sum of Squares
(variation)
Sum of Squares is another name for variation
One-Way ANOVA
2- Are all of the sample means identical?
No, so there is some variation between the groups
This is called the between group variation
Sometimes called the variation due to the factor
Denoted SS(B) for Sum of Squares (variation)
Between the groups
One-Way ANOVA
3- Are all of the values within each group
identical?
No, there is some variation within the groups
This is called the within group variation
Sometimes called the error variation
Denoted SS(W) for Sum of Squares (variation) Within
the groups
One-Way ANOVA
There are two sources of variation
the variation between the groups, SS(B), or the
variation due to the factor
the variation within the groups, SS(W), or the variation
that can’t be explained by the factor so it’s called the
error variation
One-Way ANOVA
Here is the basic one-way ANOVA table (Important)
Source    SS     df   MS      F cal   F tab
Between
Within
Total
One-Way ANOVA
Grand Mean
The grand mean is the average of all the values when the factor is
ignored
It is a weighted average of the individual sample means
$\bar{\bar{x}} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}$
Grand Mean for our example is 65.08
Between Group Variation, SS(B)
The between group variation is the variation between each sample mean
and the grand mean
Each individual variation is weighted by the sample size
The Between Group Variation for our example
is SS(B)=1902
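The grand mean and SS(B) can be checked numerically. This is a sketch under the definitions above: SS(B) is taken as the sum over groups of the sample size times the squared distance from the sample mean to the grand mean. The variable names are illustrative.

```python
from statistics import mean

# Exam scores by row, copied from the example.
scores = {
    "Front": [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back": [38, 59, 55, 66, 45, 52, 52, 61],
}

sizes = {row: len(values) for row, values in scores.items()}
means = {row: mean(values) for row, values in scores.items()}

# Grand mean: the sample means weighted by their sample sizes.
n_total = sum(sizes.values())
grand_mean = sum(sizes[row] * means[row] for row in scores) / n_total

# It matches the plain average of all 24 values with the grouping ignored.
all_values = [x for values in scores.values() for x in values]
assert abs(grand_mean - mean(all_values)) < 1e-9

# SS(B): squared distance from each sample mean to the grand mean,
# weighted by that group's sample size.
ss_between = sum(sizes[row] * (means[row] - grand_mean) ** 2 for row in scores)

print(f"grand mean = {grand_mean:.2f}")
print(f"SS(B) = {ss_between:.0f}")
```

Run on the example data, this prints a grand mean of 65.08 and SS(B) of 1902, in agreement with the values quoted above.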
One-Way ANOVA
Within Group Variation, SS(W)
The Within Group Variation is the weighted total of
the individual sample variances
The weighting is done with the degrees of freedom
$SS(W) = \sum_{i=1}^{k} (n_i - 1) s_i^2$, where $s_i^2$ is the variance of sample $i$
The df for each sample is one less than the sample
size for that sample.
The within group variation for our
example is 3386
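A short sketch of the within group calculation, assuming the sample (n - 1) variance for each group. It also computes the total variation about the grand mean, which is the SS(Total) value that appears in the ANOVA table. Variable names are illustrative.

```python
from statistics import mean, variance

# Exam scores by row, copied from the example.
scores = {
    "Front": [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back": [38, 59, 55, 66, 45, 52, 52, 61],
}

# Weighted total of the sample variances; each weight is the group's df.
ss_within = sum((len(v) - 1) * variance(v) for v in scores.values())

# Equivalent direct form: squared deviation of each value from its group mean.
ss_within_direct = sum((x - mean(v)) ** 2 for v in scores.values() for x in v)

# Total variation: squared deviation of every value from the grand mean.
all_values = [x for v in scores.values() for x in v]
grand_mean = mean(all_values)
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

print(f"SS(W) = {ss_within:.0f}, SS(Total) = {ss_total:.0f}")
```

The two forms of SS(W) agree, and SS(Total) minus SS(W) leaves the between group variation.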
One-Way ANOVA
After filling in the sum of squares, we have …
Source    SS     df   MS      F cal   F tab
Between   1902
Within    3386
Total     5288
One-Way ANOVA
Degrees of Freedom, df
A degree of freedom occurs for each value that can
vary before the rest of the values are predetermined
For example, if you had six numbers that had an
average of 40, you would know that the total had to be
240. Five of the six numbers could be anything, but
once the first five are known, the last one is fixed so
the sum is 240. The df would be 6-1=5
The df is often one less than the number of values
One-Way ANOVA
The between group df is one less than the
number of groups
We have three groups, so df(B) = 2
The within group df is the sum of the individual
df’s of each group
The sample sizes are 7, 9, and 8
df(W) = 6 + 8 + 7 = 21
The total df is one less than the total sample size
df(Total) = 24 – 1 = 23
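The same degrees of freedom bookkeeping, written as a small sketch. The variable names are illustrative; the sample sizes are the ones given above.

```python
sample_sizes = [7, 9, 8]  # Front, Middle, Back

k = len(sample_sizes)          # number of groups
n_total = sum(sample_sizes)    # total number of values

df_between = k - 1
df_within = sum(n - 1 for n in sample_sizes)
df_total = n_total - 1

# The between and within df add up to the total df.
assert df_between + df_within == df_total

print(df_between, df_within, df_total)  # prints: 2 21 23
```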
One-Way ANOVA
Filling in the degrees of freedom gives this …
Source    SS     df   MS      F cal   F tab
Between   1902   2
Within    3386   21
Total     5288   23
One-Way ANOVA
Variances
The variances are also called the Mean Squares
and abbreviated MS, often with an accompanying variable,
MS(B) or MS(W)
They are an average squared deviation from the mean
and are found by dividing the variation by the degrees
of freedom
MS = SS / df
Variance = Variation / df
One-Way ANOVA
MS(B) = 1902 / 2 = 951.0
MS(W) = 3386 / 21 = 161.2
MS(T) = 5288 / 23 = 229.9
One-Way ANOVA
Completing the MS gives …
Source    SS     df   MS      F cal   F tab
Between   1902   2    951.0
Within    3386   21   161.2
Total     5288   23   229.9
One-Way ANOVA
F test statistic
An F test statistic is the ratio of two sample variances
The MS(B) and MS(W) are two sample variances and
that’s what we divide to find F.
F = MS(B) / MS(W)
For our data, F = 951.0 / 161.2 = 5.9
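The arithmetic for the mean squares and F can be sketched as follows, starting from the rounded SS and df values already in the table. Variable names are illustrative.

```python
# Sums of squares and degrees of freedom as given in the ANOVA table.
ss_between, df_between = 1902, 2
ss_within, df_within = 3386, 21

# Mean square = variation / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of the between group variance to the within group variance.
f_cal = ms_between / ms_within

print(f"MS(B) = {ms_between:.1f}  MS(W) = {ms_within:.1f}  F = {f_cal:.1f}")
# prints: MS(B) = 951.0  MS(W) = 161.2  F = 5.9
```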
One-Way ANOVA
Adding F to the table …
Source    SS     df   MS      F cal   F tab
Between   1902   2    951.0   5.9
Within    3386   21   161.2
Total     5288   23   229.9
One-Way ANOVA
The F test is a right tail test
The F test statistic has an F distribution with df(B)
numerator df and df(W) denominator df
Here the numerator df = 2
and the denominator df = 21
At the 0.05 significance level, the critical value
from the F table is F(0.05, 2, 21) = 3.4668
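The tabulated value can be cross-checked without a table in this particular case. The sketch below relies on a special property that holds only when the numerator df is 2: the right tail probability of the F distribution is then P(F > x) = (1 + 2x/d)^(-d/2), where d is the denominator df, and this can be solved for x directly. For other numerator df an F table or a statistics library would be needed. The function name is hypothetical.

```python
def f_critical_numerator_df_2(alpha, df_denominator):
    """Right tail critical value of an F distribution with 2 numerator df.

    Valid only for numerator df = 2, where
    P(F > x) = (1 + 2 * x / d) ** (-d / 2) with d = denominator df.
    """
    d = df_denominator
    return (d / 2) * (alpha ** (-2 / d) - 1)


f_tab = f_critical_numerator_df_2(alpha=0.05, df_denominator=21)

# Sanity check: the right tail area beyond f_tab is the significance level.
tail_area = (1 + 2 * f_tab / 21) ** (-21 / 2)

print(f"F tab = {f_tab:.4f}")  # prints: F tab = 3.4668
```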
One-Way ANOVA
Completing the table
Source    SS     df   MS      F cal   F tab
Between   1902   2    951.0   5.9     3.4668
Within    3386   21   161.2
Total     5288   23   229.9
One-Way ANOVA
Since the calculated F (5.9) > tabulated F (3.4668),
we reject the null hypothesis.
The null hypothesis is that the means of the three
rows in class were the same, but we reject that, so at
least one row has a different mean.
There is enough evidence to support the claim that there
is a difference in the mean scores of the front,
middle, and back rows in class.
The ANOVA doesn't tell which row is different; you
would need to look at confidence intervals or run
post hoc tests to determine that.
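The whole procedure can be collected into one function. This is a sketch rather than a reference implementation: the function name `one_way_anova` and the dictionary keys are hypothetical, and the tabulated F value is entered by hand from the text instead of being computed.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Between: sample means around the grand mean, weighted by sample size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: each value around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "SS(B)": ss_between,
        "SS(W)": ss_within,
        "SS(Total)": ss_between + ss_within,
        "df(B)": df_between,
        "df(W)": df_within,
        "df(Total)": df_between + df_within,
        "MS(B)": ms_between,
        "MS(W)": ms_within,
        "F": ms_between / ms_within,
    }


front = [82, 83, 97, 93, 55, 67, 53]
middle = [83, 78, 68, 61, 77, 54, 69, 51, 63]
back = [38, 59, 55, 66, 45, 52, 52, 61]

table = one_way_anova([front, middle, back])

f_tab = 3.4668  # tabulated F(0.05, 2, 21), taken from the text
reject_null = table["F"] > f_tab  # right tail test

print(f"F cal = {table['F']:.1f}, F tab = {f_tab}, reject H0: {reject_null}")
```

Because the function works from unrounded sums of squares, its intermediate values can differ slightly in the last digit from the hand calculation, but the decision is the same.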