Course Name
Medical Statistics
(Medicine, Prevention, Nursing)
Lecture-4
2025/03/19
Contents
Some basics
Chapter-6: Analysis of Variance (ANOVA)
Analysis of Variance (ANOVA)
We know that the t-test for the difference between two
population means depends on whether the population variances are
equal. To determine whether the population variances are equal, we
can perform a two-sample F-test.
Analysis of variance (ANOVA) is a method for testing the hypothesis
that three or more population means are equal.
For example:
H0: µ1 = µ2 = µ3 = ... = µk
H1: At least one mean is different from the others
F-distribution
• The F-distribution was developed by Fisher to study the
behavior of two variances from random samples taken from two
independent normal populations.
• In applied problems, we may be interested in knowing whether
the population variances are equal or not, based on the
response of the random samples.
1. The F-distribution is not symmetric; it is skewed to the
right.
2. The values of F can be 0 or positive; they cannot be
negative.
3. There is a different F-distribution for each pair of
degrees of freedom for the numerator and denominator.
F-distribution
Formula for F-distribution
Let s1² and s2² represent the sample variances of two different
populations. If both populations are normal and the population
variances σ1² and σ2² are equal, then the sampling distribution
of F = s1² / s2² is called an F-distribution.
Several Properties of F-distribution:
• The F-distribution is a family of curves each of which is determined
by two types of degrees of freedom, the degrees of freedom
corresponding to the variance in the numerator, denoted by df1, and
the degrees of freedom corresponding to the variance in the
denominator, denoted by df2.
• F-distribution is positively skewed.
• The total area under each curve of an F-distribution is equal to 1.
• F-values are always greater than or equal to zero.
• For all F-distributions, the mean value of F is approximately equal to
1 (exactly df2 / (df2 − 2) when df2 > 2).
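The properties listed above can be checked informally by simulation. The sketch below is only an illustration, not part of the lecture material: it draws repeated pairs of samples from two normal populations with equal variance and forms the ratio of their sample variances. The function names, the degrees of freedom (10 and 20), the number of trials, and the seed are all assumptions chosen for the example.

```python
import random

def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def simulate_f_ratios(df1, df2, trials=5000, seed=1):
    """Simulate F = s1^2 / s2^2 for two normal populations with equal variance."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        # A sample of size df + 1 gives a sample variance with df degrees of freedom.
        sample1 = [rng.gauss(0.0, 1.0) for _ in range(df1 + 1)]
        sample2 = [rng.gauss(0.0, 1.0) for _ in range(df2 + 1)]
        ratios.append(sample_variance(sample1) / sample_variance(sample2))
    return ratios

ratios = simulate_f_ratios(df1=10, df2=20)
mean_f = sum(ratios) / len(ratios)
median_f = sorted(ratios)[len(ratios) // 2]

print("all F-values non-negative:", min(ratios) >= 0)
print("simulated mean of F:", round(mean_f, 2))   # theory: df2 / (df2 - 2)
print("mean exceeds median (right skew):", median_f < mean_f)
```

With df2 = 20 the theoretical mean is 20/18, or about 1.11, so the simulated mean should land close to 1, and the mean exceeding the median is consistent with positive skew.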
Requirements for two-samples F-test
• The samples must be randomly selected.
• The samples must be independent.
• Each population must have a normal distribution.
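• If these requirements are met, then we can use the F-test to compare
two population variances σ1² and σ2². The test statistic is
F = s1² / s2², where s1² and s2² are the sample variances and the
larger one is placed in the numerator (s1² ≥ s2²).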
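As a rough sketch of how the two-sample F statistic could be computed, assuming the common convention that the larger sample variance goes in the numerator, consider the following. The function name `two_sample_f` and both data sets are hypothetical and were invented for illustration; they do not come from the lecture.

```python
from statistics import variance

def two_sample_f(sample_a, sample_b):
    """Return (F, numerator df, denominator df), larger sample variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical measurements, invented for illustration only.
group_a = [12.1, 14.3, 11.8, 15.2, 13.0, 12.7]
group_b = [13.1, 13.4, 12.9, 13.6, 13.2]

f_stat, df_num, df_den = two_sample_f(group_a, group_b)
print("F =", round(f_stat, 2), "with df =", (df_num, df_den))
```

The resulting F would then be compared with a critical value from an F table (or converted to a P-value with statistical software) using the reported numerator and denominator degrees of freedom.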
Steps of a two-sample F-test
Example
Analysis of Variance with One Way
(One-Way ANOVA)
(Completely Randomized Design)
Key Concept
This section introduces the method of one-way analysis of
variance, which is used for tests of hypotheses that three or
more population means are all equal.
An Approach to Understanding ANOVA
• Understand that a small P-value (such as 0.05 or less) leads to
the rejection of the null hypothesis of equal means.
• With a large P-value (such as greater than 0.05), fail to
reject the null hypothesis of equal means.
• Develop an understanding of the underlying rationale by
studying the examples in this section.
Basics of One Way Analysis of Variance
One-way analysis of variance (ANOVA) is a method of testing the
equality of three or more population means by analyzing sample
variances. One-way analysis of variance is used with data
categorized with one treatment (or factor), which is a
characteristic that allows us to distinguish the different
populations from one another.
Requirements of One Way ANOVA
• The populations have approximately normal distributions.
• The populations have the same variance σ² (or standard deviation
σ).
• The samples are simple random samples.
• The samples are independent of each other.
• The different samples are from populations that are categorized
in only one way.
Procedure for testing
H0: µ1 = µ2 = µ3 = . . .
• Use STATDISK, Minitab, Excel, or a TI-83/84 Calculator to obtain
results.
• Identify the P-value from the display.
• Form a conclusion based on these criteria:
1. If the P-value ≤ α, reject the null hypothesis of equal means
and conclude that at least one of the population means is
different from the others.
2. If the P-value > α, fail to reject the null hypothesis of
equal means.
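The decision criteria above amount to a single comparison, which can be sketched as follows. The function name `anova_decision` and the wording of the returned messages are assumptions made for this illustration; 0.028 is the P-value reported in the chest deceleration example of this lecture, while 0.40 is a made-up value.

```python
def anova_decision(p_value, alpha=0.05):
    """Apply the P-value decision rule to the null hypothesis of equal means."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject H0: at least one population mean is different"
    return "fail to reject H0: insufficient evidence that the means differ"

print(anova_decision(0.028))  # P-value from the chest deceleration example
print(anova_decision(0.40))   # hypothetical large P-value
```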
Example
Use the chest deceleration measurements listed in Table 12-1 and a
significance level of α = 0.05 to test the claim that the three
samples come from populations with means that are all equal.
Example
Requirements are satisfied: the distributions are approximately normal
(normal quantile plots); the population variances appear to be about the
same; the samples are simple random samples; the samples are independent,
not matched; and the data are categorized according to a single factor of
size.
H0: µ1 = µ2 = µ3
H1: At least one of the means is different from the others
The significance level is α = 0.05.
Example
Step 1: Use technology to obtain ANOVA results
Step 2: The displays all show a P-value of 0.028 when rounded.
Step 3: Because the P-value of 0.028 is less than the
significance level of α = 0.05, we reject the null hypothesis
of equal means.
There is sufficient evidence to warrant the rejection of the claim
that the three samples come from populations with means that are
all equal.
Example
• Based on the samples of measurements listed in Table 12-1, we
conclude that those values come from populations having means
that are not all the same.
• On the basis of this ANOVA test, we cannot conclude that any
particular mean is different from the others, but we can
informally note that the sample mean is the smallest for the
large cars.
• Because small measurements correspond to less trauma
experienced by the crash test dummies, it appears that the
large cars are safest, but this conclusion is not formally
justified by this ANOVA test.
P-Value and Test Statistic
• Larger values of the test statistic result in smaller P-values,
so the ANOVA test is right-tailed.
• Figure 12-2 shows the relationship between the F-test statistic
and the P-value. Assuming that the populations have the same
variance σ² (as required for the test), the F-test statistic is
the ratio of these two estimates of σ²:
(1) Variation between samples (based on variation among sample
means); and
(2) Variation within samples (based on the sample variances).
\[ F = \frac{\text{variance between samples}}{\text{variance within samples}} \]
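The right-tailed relationship between the F statistic and the P-value can be illustrated numerically. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2 (that is, when three samples are compared): P(F > x) = (1 + 2x/df2)^(−df2/2). For other numerator degrees of freedom, an F table or statistical software is needed. The function name and the denominator df of 9 are assumptions made for the illustration.

```python
def f_right_tail_df1_2(f_stat, df_den):
    """Right-tail area P(F > f_stat) when the numerator df is exactly 2.

    This closed form is valid ONLY for numerator df = 2 (three samples).
    """
    if f_stat < 0:
        raise ValueError("an F statistic cannot be negative")
    return (1.0 + 2.0 * f_stat / df_den) ** (-df_den / 2.0)

# Larger F statistics give smaller right-tail areas (P-values).
f_values = (0.5, 1.0, 2.0, 4.0, 8.0)
p_values = [f_right_tail_df1_2(f, df_den=9) for f in f_values]
for f, p in zip(f_values, p_values):
    print(f"F = {f:>4}:  P-value = {p:.4f}")
```

Each step up in F shrinks the area to its right, which is why an excessively large F is the evidence against equal means.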
Relationship Between F-test Statistic / P-value
Calculations and Identifying the Means that are Different
ANOVA
Fundamental Concepts
Estimate the common value of σ²
• The variance between samples (also called variation due to
treatment) is an estimate of the common population variance
σ² that is based on the variability among the sample means.
• The variance within samples (also called variation due to
error) is an estimate of the common population variance σ²
based on the sample variances.
ANOVA
Fundamental Concepts
Test Statistic for One-Way ANOVA
\[ F = \frac{\text{variance between samples}}{\text{variance within samples}} \]
An excessively large F-test statistic is evidence against equal
population means.
Calculations with Equal Sample Sizes
• Variance between samples = $n s_{\bar{x}}^2$
where $s_{\bar{x}}^2$ = variance of the sample means and n = number of values in each sample
• Variance within samples = $s_p^2$
where $s_p^2$ = pooled variance (or the mean of the sample variances)
Example
Sample Calculations
Use Table 12-2 to calculate the variance between samples, variance
within samples, and the F-test statistic.
• Find the variance between samples = $n s_{\bar{x}}^2$.
For the means 5.5, 6.0 & 6.0, the sample variance is
$s_{\bar{x}}^2 = 0.0833$
$n s_{\bar{x}}^2 = 4 \times 0.0833 = 0.3332$
• Estimate the variance within samples by calculating the
mean of the sample variances.
\[ s_p^2 = \frac{3.0 + 2.0 + 2.0}{3} = 2.3333 \]
• Evaluate the F-test statistic
\[ F = \frac{\text{variance between samples}}{\text{variance within samples}} = \frac{0.3332}{2.3333} = 0.1428 \]
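The sample calculation above can be reproduced in a few lines. This is a minimal sketch that uses only the summary values quoted in the example (sample means 5.5, 6.0, 6.0; sample variances 3.0, 2.0, 2.0; n = 4 values per sample); the variable names are assumptions.

```python
from statistics import mean, variance

n = 4                               # number of values in each sample
sample_means = [5.5, 6.0, 6.0]      # sample means quoted in the example
sample_variances = [3.0, 2.0, 2.0]  # sample variances quoted in the example

var_between = n * variance(sample_means)  # n times the variance of the sample means
var_within = mean(sample_variances)       # pooled variance (mean of sample variances)
f_stat = var_between / var_within

print(round(var_between, 4), round(var_within, 4), round(f_stat, 4))
```

Without intermediate rounding the results are 0.3333, 2.3333, and 0.1429; the slide values 0.3332 and 0.1428 differ in the last digit only because the variance of the sample means was rounded to 0.0833 before multiplying.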
Critical Value of F
Right-tailed test
• Degrees of freedom with k samples of the same size n
numerator df = k – 1
denominator df = k(n – 1)
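The degrees of freedom above, together with a right-tail critical value, can be sketched as follows for the three-sample case. The critical value function inverts the closed form P(F > x) = (1 + 2x/df2)^(−df2/2), which holds only when the numerator df is 2; for other cases an F table or software is needed. The function names are hypothetical, and k = 3, n = 4 are the sizes used in the sample calculation of this lecture.

```python
def anova_df_equal_sizes(k, n):
    """Numerator and denominator df for k samples that each contain n values."""
    return k - 1, k * (n - 1)

def f_critical_df1_2(alpha, df_den):
    """Right-tail critical value of F when the numerator df is exactly 2.

    Derived by solving (1 + 2x/df_den) ** (-df_den / 2) = alpha for x;
    valid ONLY for numerator df = 2 (k = 3 samples).
    """
    return (df_den / 2.0) * (alpha ** (-2.0 / df_den) - 1.0)

df_num, df_den = anova_df_equal_sizes(k=3, n=4)
critical = f_critical_df1_2(alpha=0.05, df_den=df_den)
print("df =", (df_num, df_den), " critical F ≈", round(critical, 2))
```

With df = (2, 9) and α = 0.05 the critical value is about 4.26, so a test statistic such as F = 0.1428 falls far short of the rejection region.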
Calculations with Unequal Sample Sizes
\[ F = \frac{\text{variance between samples}}{\text{variance within samples}} = \frac{\left[ \sum n_i (\bar{x}_i - \bar{x})^2 \right] / (k - 1)}{\left[ \sum (n_i - 1) s_i^2 \right] / \left[ \sum (n_i - 1) \right]} \]
where $\bar{x}$ = mean of all sample scores combined
k = number of population means being compared
$n_i$ = number of values in the ith sample
$\bar{x}_i$ = mean of values in the ith sample
$s_i^2$ = variance of values in the ith sample
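A minimal sketch of the unequal-sample-size formula follows, with the between-samples variance in the numerator and the within-samples variance in the denominator. The function name `one_way_f` and the three small data sets are hypothetical, invented only to exercise the formula.

```python
from statistics import mean, variance

def one_way_f(groups):
    """Return (F, numerator df, denominator df) for samples of any sizes."""
    k = len(groups)
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Numerator: sum of n_i * (xbar_i - xbar)^2, divided by k - 1.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    # Denominator: sum of (n_i - 1) * s_i^2, divided by the sum of (n_i - 1).
    df_within = sum(len(g) - 1 for g in groups)
    within = sum((len(g) - 1) * variance(g) for g in groups) / df_within
    return between / within, k - 1, df_within

# Hypothetical data with unequal sample sizes, invented for illustration only.
groups = [[4, 5, 6, 7], [6, 7, 8], [9, 10, 11, 12, 13]]
f_stat, df_num, df_den = one_way_f(groups)
print("F =", round(f_stat, 3), "with df =", (df_num, df_den))
```

Note that the denominator df, the sum of (n_i − 1), equals N − k, where N is the total number of values.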
Key Components of the ANOVA Method
SS(total), or total sum of squares, is a measure of the total
variation (around $\bar{x}$) in all the sample data combined.
\[ \text{SS(total)} = \sum (x - \bar{x})^2 \]
Key Components of the ANOVA Method
SS(treatment), also referred to as SS(factor) or SS(between
groups) or SS(between samples), is a measure of the variation
between the sample means.
\[ \text{SS(treatment)} = n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2 + \cdots + n_k (\bar{x}_k - \bar{x})^2 = \sum n_i (\bar{x}_i - \bar{x})^2 \]
Key Components of the ANOVA Method
SS(error), also referred to as SS(within groups) or SS(within
samples), is a sum of squares representing the variability that is
assumed to be common to all the populations being considered.
\[ \text{SS(error)} = (n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 + \cdots + (n_k - 1) s_k^2 = \sum (n_i - 1) s_i^2 \]
Key Components of the ANOVA Method
Given the previous expressions for SS(total), SS(treatment), and
SS(error), the following relationship will always hold.
SS(total) = SS(treatment) + SS(error)
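The identity above can be verified numerically on any data set. The sketch below computes the three sums of squares directly from their definitions and compares them; the function name `sums_of_squares` and the data are hypothetical and used only for illustration.

```python
from statistics import mean, variance

def sums_of_squares(groups):
    """Return (SS(total), SS(treatment), SS(error)) for a list of samples."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((len(g) - 1) * variance(g) for g in groups)
    return ss_total, ss_treatment, ss_error

# Hypothetical data with unequal sample sizes, invented for illustration only.
groups = [[4, 5, 6, 7], [6, 7, 8], [9, 10, 11, 12, 13]]
ss_total, ss_treatment, ss_error = sums_of_squares(groups)

print("SS(total)     =", round(ss_total, 4))
print("SS(treatment) =", round(ss_treatment, 4))
print("SS(error)     =", round(ss_error, 4))
print("identity holds:", abs(ss_total - (ss_treatment + ss_error)) < 1e-9)
```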
Mean Squares (MS)
MS(treatment) is a mean square for treatment, obtained as
follows:
\[ \text{MS(treatment)} = \frac{\text{SS(treatment)}}{k - 1} \]
MS(error) is a mean square for error, obtained as follows:
\[ \text{MS(error)} = \frac{\text{SS(error)}}{N - k} \]
N = total number of values in all samples combined
Mean Squares (MS)
MS(total) is a mean square for the total variation, obtained
as follows:
\[ \text{MS(total)} = \frac{\text{SS(total)}}{N - 1} \]
N = total number of values in all samples combined
Test Statistic for ANOVA with Unequal Sample Sizes
\[ F = \frac{\text{MS(treatment)}}{\text{MS(error)}} \]
• Numerator df = k – 1
• Denominator df = N – k
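The mean squares and the test statistic are often laid out as an ANOVA summary table. The following sketch builds such a table for hypothetical data (invented for illustration) and also shows that MS(total) is simply the sample variance of all values combined. The layout and variable names are assumptions, not the output format of any particular software package.

```python
from statistics import mean, variance

# Hypothetical data with unequal sample sizes, invented for illustration only.
groups = [[4, 5, 6, 7], [6, 7, 8], [9, 10, 11, 12, 13]]
all_values = [value for group in groups for value in group]
k, N = len(groups), len(all_values)
grand_mean = mean(all_values)

ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_error = sum((len(g) - 1) * variance(g) for g in groups)
ss_total = ss_treatment + ss_error

ms_treatment = ss_treatment / (k - 1)   # numerator df = k - 1
ms_error = ss_error / (N - k)           # denominator df = N - k
ms_total = ss_total / (N - 1)
f_stat = ms_treatment / ms_error

print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>9}")
print(f"{'Treatment':<10}{ss_treatment:>10.4f}{k - 1:>5}{ms_treatment:>10.4f}{f_stat:>9.4f}")
print(f"{'Error':<10}{ss_error:>10.4f}{N - k:>5}{ms_error:>10.4f}")
print(f"{'Total':<10}{ss_total:>10.4f}{N - 1:>5}{ms_total:>10.4f}")
```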
Example
Advantages and Disadvantages
Completely Randomized Design
Advantages:
There are a number of special characteristics possessed by
completely randomized designs to which it is worth drawing
attention. These are summarized below:
1) The designs are very flexible and can be used for any
number of treatments, and may have any numbers (not
necessarily all the same) of observations in each treatment
group.
2) The statistical analysis is comparatively easy and
straightforward. It is, moreover, unaffected if some or all
of the observations, for any treatment, are lost or missing
for some purely random accidental reason, i.e. if the
accident is not more likely to happen to one treatment
rather than another. We merely carry out the standard
analysis on the observations that are available.
Disadvantages:
However, in certain circumstances, the design suffers from
the disadvantage of being inherently less informative than
other more sophisticated layouts.
1) If there are large differences between blocks, for example due
to fluctuations in fertility, the whole of this variation is
included in the residual variance, making the usual
significance tests less sensitive.
2) In that situation it is better to use the randomized block
design. With entirely homogeneous material, on the other hand,
the completely randomized layout is the most accurate.
Analysis of Variance with Two-Way
(Two-Way ANOVA)
(Randomized Block Designs)
• The statistical term "block" is conceptually an extension of
the term "pair" introduced in Chapter 6. In that chapter, it
was shown how members of two groups might be paired and that
the resultant paired-sample testing could be more powerful
than the nonpaired two-sample test.
• We shall now see that the randomized block ANOVA is an
extension of the paired t-test, just as the one-way ANOVA is
an extension of the two-sample t-test, and also that the data
from such an experimental design can be analyzed by Model III
ANOVA considerations.
• A common example of paired data is the observation of the
same subjects at two different times, e.g., before and after
some treatment. Similarly, blocks may be subjects from which
data are collected three or more times.
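The statement that the randomized block ANOVA extends the paired t-test can be illustrated numerically: with exactly two treatments, the treatment F statistic of the block analysis equals the square of the paired t statistic. The sketch below checks this on hypothetical before/after measurements for five subjects. The data and function names are invented for illustration, and the sums of squares follow the standard randomized block decomposition SS(total) = SS(treatment) + SS(block) + SS(error), which is assumed here rather than taken from the slides.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-sample t statistic computed from the within-subject differences."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def randomized_block_f(table):
    """Treatment F statistic; each row of `table` is one block (subject)."""
    b, a = len(table), len(table[0])      # number of blocks, number of treatments
    grand = mean([x for row in table for x in row])
    columns = list(zip(*table))           # one column per treatment
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_treatment = b * sum((mean(col) - grand) ** 2 for col in columns)
    ss_block = a * sum((mean(row) - grand) ** 2 for row in table)
    ss_error = ss_total - ss_treatment - ss_block
    return (ss_treatment / (a - 1)) / (ss_error / ((a - 1) * (b - 1)))

# Hypothetical before/after measurements on five subjects (illustration only).
before = [142.0, 150.0, 138.0, 160.0, 147.0]
after = [135.0, 146.0, 136.0, 149.0, 143.0]

t_stat = paired_t(before, after)
f_stat = randomized_block_f(list(zip(before, after)))
print("t squared =", round(t_stat ** 2, 4), "  block F =", round(f_stat, 4))
```

With three or more measurements per subject there is no single paired t statistic, and the block F statistic takes its place.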
Sources of Variation
Basic steps of Analysis