
ARBA MINCH UNIVERSITY
COLLEGE OF MEDICINE AND HEALTH SCIENCES
SCHOOL OF PUBLIC HEALTH

Comparison of more than two groups

Mesfin Kote (Assist. Prof., PhD candidate)

March 22, 2025


So far
1. Simple Random Sample from a population with
known σ – continuous response.
- One sample z-test.
2. Simple Random Sample from a population with
unknown σ - continuous response.
- One sample t-test.

2
So far…
3. Simple Random Samples from two populations
with known σ.
- Two sample z-test.
4. Simple Random Samples from
two populations with unknown σ.
- Two sample t-test.

3
Sampling Study with k>2
Populations
• In the previous lecture, we compared means from
two independent groups.
• In this lecture, we extend the procedure to compare
means from k independent groups, where k is greater
than two.
• One sample is drawn independently and randomly
from each of k > 2 populations.

4
Test of Significance for Three or More
Populations

1. Population mean (Analysis of Variance)

a. Completely randomized design (One-way ANOVA)

b. Factorial experiment (Two-way ANOVA)

c. Repeated Measures ANOVA

5
Test of Significance for Three or More Populations

2. Population proportion
• χ² test

6
Analysis of Variance
(ANOVA)

7
Comparison of several means -
Analysis of variance
• A t-distribution can be used for testing hypotheses about
differences of means for independent samples if both
populations are normal and have the same variances.

• However, the usual two-sample t-test cannot be applied when more
complex sets of data comprising more than two groups are considered.
In this regard, one-way analysis of variance (ANOVA) is used to
compare the means of several groups.

• It is used when there is a single way of classifying individuals.

• That is, when the subgroups to be compared are defined by just one
factor, for example in the comparison of means between different
socioeconomic classes.
8
How does ANOVA work?

• Instead of dealing with means as data points, we deal with variation.
• There is variation (variance) within groups (data).
• There is variation between group means.
• If the groups are equivalent, the variance between groups and the
variance within groups will be about equal.
• Expected variation is used to assess statistical significance, just as
expected differences in means are used in t-tests.

9
Hypothesis in ANOVA

• The hypothesis to be tested for k means (k ≥ 2) is

H0 : μ1 = μ2 = μ3 = . . . = μk
H1 : At least one population mean does not equal another
population mean
(We cannot specify a one-tailed alternative hypothesis such as
H1 : μ1 < μ2 < μ3 < … < μk.)
 If we reject the null hypothesis, we can conclude only that at least
one population mean differs from another, not which ones differ.
 Other methods are needed to determine which
population means are different.
10
• One-way analysis of variance is based on assessing how
much of the overall variation in the data is attributable to
differences between the group means, and comparing this
with the amount attributable to differences between
individuals in the same group.

• The calculations for one-way ANOVA are expressed in relation to the
sums of the observations in each sample.
• Suppose we have k samples, with ni observations in the i-th sample;
we then calculate:
• Mi = mean of the observations in the i-th group
• T = sum of all observations = Σxi
• S = sum of squares of all observations = Σxi²
• N = total number of observations = Σni
11
• One way ANOVA partitions the total sum of squares (SST) into
two distinct components.
• i) The sum of squares due to differences between the group
means (SSB).
• ii) The sum of squares due to differences between the
observations within each group (SSW). This is also called the
residual sum of squares.
• SST = SSB + SSW

• SST = Total sum of squared deviations of each observation


about grand mean
• SSB = Total sum of squared deviations of group means about
grand mean
• SSW = Total sum of squared deviations of each observation
about group mean
12
The sums of squares for one-way ANOVA are given as follows:

Source of variation      Sum of squares

Between groups           SSB = Σ ni Mi² − T²/N        (sum over i = 1, …, k)

Within groups            SSW = S − Σ ni Mi²

Total                    SST = S − T²/N               (= SSB + SSW)

13
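The sums of squares above map directly onto code. The following Python sketch (a minimal illustration using only numpy, with made-up group data) computes SSB, SSW, and SST from the quantities Mi, T, S, and N defined on the preceding slides.

import numpy as np

# Hypothetical observations for k = 3 groups of unequal size (illustration only)
groups = [np.array([4.0, 6.0, 5.0, 7.0]),
          np.array([8.0, 9.0, 7.0]),
          np.array([3.0, 2.0, 4.0, 3.0, 5.0])]

N = sum(len(g) for g in groups)             # total number of observations
T = sum(g.sum() for g in groups)            # sum of all observations
S = sum((g ** 2).sum() for g in groups)     # sum of squares of all observations

sum_niMi2 = sum(len(g) * g.mean() ** 2 for g in groups)

SSB = sum_niMi2 - T ** 2 / N                # between-groups sum of squares
SSW = S - sum_niMi2                         # within-groups (residual) sum of squares
SST = S - T ** 2 / N                        # total sum of squares (= SSB + SSW)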
The significance test for differences between the groups is based on a
comparison of the between groups and within groups mean squares.

♣ If the observed differences between the means of the groups are simply
due to chance variation, the variation between these group means will be
about the same as the variation within individuals of the same type.

♣ If there are real differences, the between-groups variation will be larger.
The mean squares are compared using the F-test, sometimes known as the
variance-ratio test.

F = (between-groups mean square) / (within-groups mean square),

d.f. = (d.f. between groups, d.f. within groups) = (k − 1, N − k)

where N is the total number of observations and k is the number of groups.
14
The one-way ANOVA table looks like the following:

Source of variation   DF      SS    Mean square     F                           P
Between groups        k − 1   SSB   SSB / (k − 1)   [SSB/(k−1)] / [SSW/(N−k)]
Within groups         N − k   SSW   SSW / (N − k)
Total                 N − 1   SST
15
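Continuing the sketch above, the mean squares, the F ratio, and its P value can be obtained with scipy.stats; the variable names (groups, SSB, SSW, N) carry over from the previous sketch and are illustrative only.

from scipy import stats

k = len(groups)
MSB = SSB / (k - 1)                     # between-groups mean square
MSW = SSW / (N - k)                     # within-groups (error) mean square
F = MSB / MSW                           # variance-ratio (F) statistic
p_value = stats.f.sf(F, k - 1, N - k)   # upper-tail area of the F(k-1, N-k) distribution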
• The within-groups mean square is also called the error mean
square (MSE); its degrees of freedom are the error degrees of
freedom, and its sum of squares is the error sum of squares (SSE).
– Between Groups Variance – measures how groups
vary about the Grand mean.
– Within Groups Variance – measures how scores in
each group vary about the group mean.

16
• If calculated F > tabulated F, reject H0.
– The tabulated F value depends on α, the degrees of freedom for the
numerator, and the degrees of freedom for the denominator.
• Follow-up Procedures
– “significant” F only tells us there are differences, not
where specific differences lie.

17
Critical values for the F-statistic

[Table of critical values of the F-distribution shown as a figure]
18
Assumptions

 The data are normally distributed or the samples have come from
Normally distributed populations.
 The population value for the standard deviation between individuals is
the same for each group (equal variance).
 Moderate departures from normality may be safely ignored, but the
effect of unequal standard deviations may be serious. In the latter
case, transforming the data may be useful.

19
Assumptions in ANOVA…
• distributional assumptions:
- Independence
- Normality
- Equal variance (homoscedasticity)

20
Assumptions in ANOVA…
• We are familiar with the first two distributional
assumptions from our study of the independent t
test.
• The independence assumption supposes we have
k simple random samples, one from each of the k
populations.

21
Assumptions in ANOVA…
• The Normality assumption supposes that each
population has a Normal distribution or the sample
is large enough to impose Normal sampling
distributions of means through the Central Limit
Theorem.
• Check with:
- Residual plots
- The Shapiro–Wilk test

22
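A minimal sketch of the Normality check, assuming the per-group observations are stored in the list groups used in the earlier sketches; scipy's Shapiro–Wilk test returns a W statistic and a P value for each group.

from scipy import stats

for i, g in enumerate(groups, start=1):
    w, p = stats.shapiro(g)   # H0: the sample comes from a Normally distributed population
    print(f"Group {i}: W = {w:.3f}, P = {p:.3f}")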
Tests for
Heteroscedasticity
• It is prudent to assess the equal variance
assumption before conducting an ANOVA.
• These include
- Bartlett’s test, and
- Levene’s test.

23
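Both tests are available in scipy.stats; a brief sketch, again assuming the grouped observations are held in the list groups from the earlier sketches:

from scipy import stats

b_stat, b_p = stats.bartlett(*groups)   # sensitive to departures from Normality
l_stat, l_p = stats.levene(*groups)     # more robust to non-Normality
# Small P values suggest the equal-variance assumption is questionable.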
Pair-wise comparisons of group means

 One-way ANOVA is an extension of the two-sample t test. When there
are only two groups, the F value is the square of the corresponding
t value with (1, N−2) degrees of freedom. Remember that the degrees
of freedom for the two-sample t test are N−2.

 With two groups the interpretation of a significant difference is
reasonably straightforward, but how do we interpret significant
variation among the means of three or more groups?

 Further analysis is required to find out how the means differ, for
example, whether one group differs from all the others.

 It should be noted that pair-wise comparisons are carried out only
when the overall comparison of groups in the analysis of variance is
significant.

 With k groups, there are ½k(k−1) possible pair-wise comparisons of
group means.

 As the number of tests (comparisons) increases, the number of false
positives (falsely significant results) will increase accordingly.

24
Example

Twenty-two patients undergoing cardiac bypass surgery were randomized to
one of three ventilation groups:

Group I    Patients received a 50% nitrous oxide and 50% oxygen mixture
           continuously for 24 hours;

Group II   Patients received a 50% nitrous oxide and 50% oxygen mixture
           only during the operation;

Group III  Patients received no nitrous oxide but received 35–50% oxygen
           for 24 hours.

♣ The table below shows red cell folate levels for the three groups after 24
hours' ventilation. We wish to compare the three groups, and test the null
hypothesis that the three groups have the same red cell folate levels.

♣ Examination of the data does not reveal any obvious outliers and the
data in each group look plausible samples from a Normal distribution. The
standard deviation in group I is rather higher than those in the other groups,
but moderate variability is not a problem. Bartlett's test is useful for
assessing the null hypothesis that more than two samples come from
populations with the same variance. Some computer programs incorporate
this test.

25
Red cell folate levels (μg/l) in three groups of cardiac
bypass patients given different levels of nitrous oxide
ventilation (Amess et al., 1978)

Group I (n=8)   Group II (n=9)   Group III (n=5)
243             206              241
251             210              258
275             226              270
291             249              293
347             255              328
354             273
380             285
392             295
                309

Mean   316.6    256.4            278.0
SD      58.7     37.1             33.8

26
Calculations

N = Σ ni = 8 + 9 + 5 = 22

T = Σ xi = 243 + 251 + 275 + …. + 293 + 328 = 6231

S = Σ xi² = 243² + 251² + 275² + …. + 293² + 328² = 1820021

SST = Σ(xi − x̄)² = Σ xi² − (Σ xi)²/N = 1820021 − 1764789 = 55232

SSB = Σ ni (x̄i − x̄)², or more easily calculated as
      Σ ni x̄i² − (Σ xi)²/N = 1780305 − 1764789 = 15516

SSW = Σ xi² − Σ ni x̄i² = 1820021 − 1780305 = 39716

Hypotheses

H0 : μ1 = μ2 = μ3

HA : Differences exist between at least some of the means
27
ANOVA table

Source of variation   d.f.   SS      Mean square   F      P
Between groups        2      15516   7758          3.71   0.044
Within groups         19     39716   2090
Total                 21     55232

Since the P value is less than 0.05, the null hypothesis is rejected.

28
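The table can be reproduced with scipy.stats.f_oneway applied to the red cell folate data; the sketch below should return approximately F = 3.71 and P = 0.044, with any small discrepancy due only to rounding in the hand calculation.

from scipy import stats

group_I   = [243, 251, 275, 291, 347, 354, 380, 392]
group_II  = [206, 210, 226, 249, 255, 273, 285, 295, 309]
group_III = [241, 258, 270, 293, 328]

F, p = stats.f_oneway(group_I, group_II, group_III)
print(f"F = {F:.2f}, P = {p:.3f}")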
Multiple Comparison Tests
1. Tukey's HSD Procedure (Honestly Significant Difference)
2. Scheffé's Procedure – most versatile
3. Newman–Keuls Procedure
4. Dunnett's Procedure
5. Dunn's multiple-comparison procedure
6. Least Significant Difference (LSD)

29
Choice of Multiple Comparisons of Means
• The choice of a multiple comparisons method in
ANOVA will depend on the type of experimental
design used and the comparisons of interest to the
analyst.
For example,
• Tukey (1949) developed his procedure specifically for
pairwise comparisons when the sample sizes of the
treatments are equal.

30
Choice of Multiple Comparisons of
Means...
• The Bonferroni method, like the Tukey procedure, can
be applied when pairwise comparisons are of interest;
however,
• Bonferroni's method does not require equal sample
sizes.
• Scheffé (1953) developed a more general procedure for
comparing all possible linear combinations of treatment
means (called contrasts).

31
Choice of Multiple Comparisons of
Means...
• Scheffé test – Suitable for comparisons among all possible
contrasts of group means, not simply pairwise comparisons
– Corrects for the increased risk of a Type I error
(most conservative post-hoc test)
• Dunnett test – Useful for "planned comparisons",
e.g. comparing two different treatment groups against a
single control group

32
The Bonferroni t procedure
• (also called Dunn's multiple-comparison procedure)
• Let m = the total number of possible comparisons.
• All pairwise differences (x̄i − x̄j) are compared with the statistic

   t(α/2m, N−k) × √( MSW/ni + MSW/nj )
33
The Bonferroni t
procedure…
• If a difference exceeds the above value, it is
significantly different from zero.

34
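As a sketch of this rule, the Bonferroni critical value can be computed with scipy.stats.t.ppf; the numbers below (α, m, group sizes, and MSW) are taken from the red cell folate example later in the lecture and are purely illustrative.

import numpy as np
from scipy import stats

alpha, m = 0.05, 3      # significance level and number of comparisons
N, k = 22, 3            # total observations and number of groups
MSW = 2090.0            # within-groups (error) mean square
ni, nj = 8, 9           # sizes of the two groups being compared (groups I and II)

t_crit = stats.t.ppf(1 - alpha / (2 * m), N - k)
threshold = t_crit * np.sqrt(MSW / ni + MSW / nj)
# A pair of means differs significantly if |mean_i - mean_j| exceeds this threshold.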
Tukey's HSD Test
• Frequently used for testing the null hypothesis that
all possible pairs of treatment means are equal when
the samples are all of the same size.
• Makes use of a single value against which all mean
differences are compared.
• This value, called the HSD, is given by

   HSD = q(α, k, N−k) × √( (MSW/ni + MSW/nj) / 2 ) = q(α, k, N−k) × √( MSE/n )

   (the second form applying when ni = nj = n)
35
Tukey's HSD Test…
α  the chosen level of significance
k  the number of means in the experiment
N  the total number of observations in the experiment
q  obtained from the studentized range distribution with α, k, and N − k
If a difference exceeds the HSD value, it is significantly
different from zero.

36
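A sketch of the HSD calculation using SciPy's studentized range distribution (scipy.stats.studentized_range, available in recent SciPy releases); the values of α, k, N, n, and MSE below are placeholders for an equal-sample-size design.

import numpy as np
from scipy import stats

alpha, k, N = 0.05, 4, 20   # placeholders: 4 groups with 5 observations each
n = 5                       # common sample size per group
MSE = 10.0                  # within-groups (error) mean square, placeholder value

q = stats.studentized_range.ppf(1 - alpha, k, N - k)
HSD = q * np.sqrt(MSE / n)
# Any |mean_i - mean_j| exceeding HSD is declared significant.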
Pair-wise comparisons of group means

 With 3 groups, there are ½ × 3 × (3 − 1) = 3 possible comparisons.

 Bonferroni method

The modified t tests to compare the pairs of groups of interest, using the
Bonferroni method to adjust the P values, are carried out as follows:

i) Find tcalc for each pair of groups of interest.

ii) The modified t test is based on the pooled estimate of variance from
all the groups (which is the residual variance in the ANOVA table),
not just the pair being considered.

iii) If we perform m pairwise comparisons, then we should multiply the P
value obtained from each test by m; that is, we calculate P' = mP with
the restriction that P' cannot exceed 1.

 Returning to the red cell folate data given above, the residual
standard deviation is √2090 = 45.72.

 Modified t tests to compare groups I and II, I and III, and II and III
are performed by calculating:

37
i) Groups I and II

t = (316.6 − 256.4) / (45.72 × √(1/8 + 1/9)) = 2.71 on 19 degrees of
freedom. The corresponding P value = 0.014 and the corrected P value is

P' = 0.014 × 3 = 0.042

ii) Groups I and III

t = (316.6 − 278.0) / (45.72 × √(1/8 + 1/5)) = 38.6/26.06 = 1.48 on 19
degrees of freedom. The corresponding P value = 0.1625 and the corrected
P value is

P' = 0.1625 × 3 = 0.4875

iii) Groups II and III

t = (278.0 − 256.4) / (45.72 × √(1/5 + 1/9)) = 21.6/25.5 = 0.85 on 19
degrees of freedom. The corresponding P value = 0.425 and the corrected
P value is P' = 1.00

The main explanation for the difference between the groups identified in
the analysis of variance is therefore the difference between groups I and II.

38
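These three comparisons can be reproduced with a short Python sketch. It uses the group means, group sizes, residual mean square, and residual degrees of freedom from the ANOVA above; the corrected P values it prints should be close to the hand-calculated 0.042, 0.49, and 1.00, with small differences due to rounding in the hand calculation.

from itertools import combinations
import numpy as np
from scipy import stats

means = {"I": 316.6, "II": 256.4, "III": 278.0}
sizes = {"I": 8, "II": 9, "III": 5}
MSW, df = 2090.0, 19    # residual (within-groups) mean square and its degrees of freedom
m = 3                   # number of pairwise comparisons

for a, b in combinations(means, 2):
    se = np.sqrt(MSW * (1 / sizes[a] + 1 / sizes[b]))
    t = (means[a] - means[b]) / se
    p = 2 * stats.t.sf(abs(t), df)   # two-sided P value
    p_corrected = min(m * p, 1.0)    # Bonferroni-corrected P'
    print(f"Groups {a} and {b}: t = {t:.2f}, P' = {p_corrected:.3f}")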
Exercise
A botanist plants random samples of each of four different strains of wheat on five plots
of land. The plots are of the same size and fertility and the same fertilizer is used on
each. The yields per plot are as follows:

Strain 1   Strain 2   Strain 3   Strain 4
   4          7          10         16
   3          8          14         14
   6          9          12         10
   2          8           9          7
   2          5           5          3
a) At a .05 level of significance, do the different
strains produce the same yield?

b) Perform modified t tests to compare the pairs of
groups of interest, using the Bonferroni method to
adjust the P values.

c) Perform part 'b' using Scheffé's method.

39
