L4 Anova - 082842

Analysis of Variance (ANOVA) is a statistical method used to determine if there are significant differences between the means of three or more groups. It includes various types such as one-way and two-way ANOVA, with or without repeated measurements, and requires specific assumptions to be met. The results indicate whether at least one group differs from others, but post-hoc tests are needed to identify which groups are different.

Uploaded by Ahmed Khaled

Analysis of Variance (ANOVA)

An analysis of variance (ANOVA) tests whether statistically significant differences exist between more than two samples.
For this purpose, the means and variances of the respective groups are compared with each other. In contrast to the t-test, which tests whether there is a difference between two samples, the ANOVA tests whether there is a difference between more than two groups.
There are different types of analysis of variance, the most common being the one-way and the two-way ANOVA, each of which can be calculated either with or without repeated measures. In this tutorial you will learn the basics of ANOVA; for each of the four types of analysis of variance you will find a separate detailed tutorial:
• One-factor (or one-way) ANOVA
• Two-factor (or two-way) ANOVA
• One-factor ANOVA with repeated measures
• Two-factor ANOVA with repeated measures

# Why not calculate multiple t-tests?


ANOVA is used when there are more than two groups. Of course, it would also be possible to calculate a t-test for each combination of the groups. The problem, however, is that every hypothesis test has some probability of error. This probability of error is usually set at 5%, so that, from a purely statistical point of view, every 20th test gives a wrong result. If, for example, 20 groups are compared in which there is actually no difference, one of the tests will show a significant difference purely due to sampling.
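This inflation of the error rate can be illustrated with a short calculation. Assuming each test uses α = 0.05 and the tests are independent (a simplifying assumption), the probability of at least one false positive across m tests is 1 − (1 − α)^m:

```python
# Family-wise error rate: probability of at least one false positive
# across m independent hypothesis tests at significance level alpha.
def familywise_error(alpha: float, m: int) -> float:
    return 1 - (1 - alpha) ** m

# With alpha = 0.05, the error rate grows quickly with the number of tests.
for m in (1, 3, 10, 20):
    print(m, round(familywise_error(0.05, m), 3))
```

With 20 independent comparisons, the chance of at least one spurious significant result is already about 64%, which is exactly why ANOVA is preferred over many pairwise t-tests.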
# Difference between one-way and two-way ANOVA
The one-way analysis of variance only checks whether an independent variable has an influence on a metric dependent
variable. This is the case, for example, if it is to be examined whether the place of residence (independent
variable) has an influence on the salary (dependent variable). However, if two factors, i.e. two independent variables, are
considered, a two-way analysis of variance must be used.
# Analysis of variance with and without repeated measures
Depending on whether the sample is independent or dependent, either analysis of variance with or without repeated
measures is used. If the same person was interviewed at several points in time, the sample is a dependent
sample and analysis of variance with repeated measurements is used.

# One-factor ANOVA
The one-way analysis of variance is an extension of the t-test for independent groups. With the t-test, a maximum of two groups can be compared; this is now extended to more than two groups. For two groups (k = 2), the analysis of variance is therefore equivalent to the t-test. The independent variable is accordingly a nominally scaled variable with at least two levels.
The dependent variable is on a metric scale. In the case of the analysis of variance, the independent variable is referred
to as the factor.The following question can be answered: Is there a difference in the population between the different
groups of the independent variable with respect to the dependent variable?
The aim of ANOVA is to explain as much variance as possible in the dependent variable by dividing it into the groups. Let
us consider the following example.

# One-factor ANOVA example


With the help of the independent variable, e.g. "highest educational qualification" with the three characteristics group 1, group 2 and group 3, as much variance as possible of the dependent variable "salary" should be explained. In the graphic below, under A) a lot of variance can be explained with the three groups, and under B) only very little.
Accordingly, in case A) the groups have a very strong influence on the salary and in case B) they do not.
In case A), the values in the respective groups deviate only slightly from the group mean; the variance within the groups is therefore very small. In case B), however, the variance within the groups is large. The variance between the groups behaves the other way round: it is large in case A) and small in case B). In case B) the group means are close together; in case A) they are not.

# Analysis of variance hypotheses


As with the statistical tests already discussed, when performing analyses of variance, it is necessary to formulate
hypotheses in advance that are to be tested. The null hypothesis and the alternative hypothesis arise in a single
factor analysis of variance as follows:
• Null hypothesis H0: The means of all groups are equal.
• Alternative hypothesis H1: There are differences between the means of the groups.
The results of the ANOVA can only make a statement about whether there are differences between at least two groups. It cannot be determined which groups exactly differ. A post-hoc test is needed to determine which groups differ. There are various methods to choose from, with Duncan, Dunnett's C and Scheffé being among the most common.

# Assumptions for one-way analysis of variance


Before you start performing an analysis of variance, it is important to check the following assumptions so that you know
whether your data are suitable for this test. These assumptions are:
1) Scale level: The scale level of the dependent variable must be metric, whereas the independent variable must be
nominally scaled.
2) Homogeneity: The variances in each group should be roughly the same. This can be checked with the Levene test.
3) Normal distribution: The data within the groups should be normally distributed. This means that the majority of the values are in the average range, while very few values are significantly below or significantly above it. If this condition is not met, the Kruskal-Wallis test can be used.
If there are no independent samples but dependent ones, then a one-factor analysis of variance with repeated measures
is used.
# Solved example of one-way ANOVA

Variation
Variation is the sum of the squared deviations of each value from the mean.
Sum of Squares is abbreviated SS and is often followed by a variable in parentheses, such as SS(B) or SS(W), so we know
which sum of squares we're talking about.
1- Are all of the values identical?
No, so there is some variation in the data
This is called the total variation
Denoted SS(Total) for the total Sum of Squares (variation)
Sum of Squares is another name for variation
2- Are all of the sample means identical?
• No, so there is some variation between the groups
• This is called the between group variation
• Sometimes called the variation due to the factor
• Denoted SS(B) for Sum of Squares (variation) Between the groups
3- Are the values within each group identical?
• No, there is some variation within the groups
• This is called the within group variation
• Sometimes called the error variation
• Denoted SS(W) for Sum of Squares (variation) Within the groups
• There are two sources of variation
• the variation between the groups, SS(B), or the variation due to the factor
• the variation within the groups, SS(W), or the variation that can’t be explained by the factor so it’s called
the error variation
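This decomposition can be verified numerically. The sketch below uses three small made-up groups (the data are assumptions, not the example's actual scores) and checks that SS(Total) = SS(B) + SS(W):

```python
# Partition of the total variation into between-group and within-group
# sums of squares. The three groups below are illustrative data only.
groups = [[5, 7, 9], [10, 12, 14], [6, 6, 9]]

values = [x for g in groups for x in g]
grand_mean = sum(values) / len(values)

# SS(Total): squared deviations of every value from the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in values)

# SS(B): squared deviations of each group mean from the grand mean,
# weighted by the group size
ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups)

# SS(W): squared deviations of each value from its own group mean
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

print(ss_total, ss_between, ss_within)
assert abs(ss_total - (ss_between + ss_within)) < 1e-9
```

For these particular numbers, SS(Total) = 72, SS(B) = 50 and SS(W) = 22, so the two sources of variation add up to the total, as they always must.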
Degrees of Freedom, df
A degree of freedom occurs for each value that can vary before the rest of the values are predetermined.
For example, if you had six numbers that had an average of 40, you would know that the total had to be 240. Five of the
six numbers could be anything, but once the first five are known, the last one is fixed so the sum is 240.
The df would be 6-1=5
The df is often one less than the number of values
• The between group df is one less than the number of groups
• We have three groups, so df(B) = 2
• The within group df is the sum of the individual df’s of each group
• The sample sizes are 7, 9, and 8
• df(W) = 6 + 8 + 7 = 21
• The total df is one less than the sample size
• df(Total) = 24 – 1 = 23
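These degrees-of-freedom rules are easy to encode. The sketch below applies them to the example's group sizes (7, 9 and 8):

```python
# Degrees of freedom for a one-way ANOVA, given the group sizes.
def anova_df(group_sizes):
    k = len(group_sizes)          # number of groups
    n = sum(group_sizes)          # total sample size
    df_between = k - 1            # one less than the number of groups
    df_within = n - k             # sum of (n_i - 1) over the groups
    df_total = n - 1              # one less than the total sample size
    return df_between, df_within, df_total

print(anova_df([7, 9, 8]))  # (2, 21, 23)
```

Note that df(B) + df(W) = df(Total), mirroring the partition of the sums of squares.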
Variances
The variances are also called the Mean Squares and abbreviated MS, often with an accompanying variable such as MS(B) or MS(W).
They are an average squared deviation from the mean and are found by dividing the variation by the degrees of freedom:
MS = SS / df
• MS(B) = 1902 / 2 = 951.0
• MS(W) = 3386 / 21 = 161.2
• MS(T) = 5288 / 23 = 229.9
F test statistic
An F test statistic is the ratio of two sample variances
The MS(B) and MS(W) are two sample variances and that’s what we divide to find F.
F = MS(B) / MS(W)
For our data, F = 951.0 / 161.2 = 5.9
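Using the sums of squares and degrees of freedom from the worked example, the mean squares and the F statistic can be reproduced as follows:

```python
# Mean squares and F statistic from the worked example's sums of squares.
ss_between, df_between = 1902, 2
ss_within, df_within = 3386, 21

ms_between = ss_between / df_between   # 951.0
ms_within = ss_within / df_within      # ~161.2

f_statistic = ms_between / ms_within
print(round(f_statistic, 1))  # 5.9
```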
The F test is a right tail test
The F test statistic has an F distribution with df(B) numerator df and df(W) denominator df
Here df of numerator =2
df of denominator =21
Now, at the 0.05 significance level, the table gives F(0.05; 2, 21) = 3.4668.
• Since the calculated F (5.9) > tabulated F (3.4668), we reject the null hypothesis.
• The null hypothesis is that the means of the three rows were the same, but we reject that, so at least one row has a different mean.
• There is enough evidence to support the claim that there is a difference in the mean scores of the front, middle, and back rows.
• The ANOVA doesn't tell which row is different; you would need to look at confidence intervals or run post-hoc tests to determine that.

# Welch's ANOVA
If the condition of variance homogeneity is not fulfilled, Welch's ANOVA can be calculated instead of the "normal" ANOVA.
If the Levene test results in a significant deviation of the variances in the groups, DATAtab automatically calculates the
Welch's ANOVA in addition.

# Two factor analysis of variance


As the name suggests, two-way analysis of variance examines the influence of two factors on a dependent variable. This
extends the one-way analysis of variance by a further factor, i.e. by a further nominally scaled independent variable. The
question is again whether the mean of the groups differs significantly.
Example:
In a screw factory, a screw is produced by three different production systems (factor 1) in two shifts (factor 2). You now want to find out whether the production facilities or the shifts have an influence on the weight of the screws.
To do this, you take 50 screws from each production line and each shift and measure the weight. Now you use two-factor ANOVA to determine whether the average weight of the screws from the three production lines and the two shifts is significantly different.

# Repeated Measures ANOVA


Repeated measures ANOVA tests whether there are statistically significant differences in three or more dependent
samples. In a dependent sample, the same participants are measured multiple times under different conditions or
at different time points.
The one-way analysis of variance with repeated measures is the extension of the t-test for dependent samples to more than two groups.

Figure 85: Measurement repetitions

# What are dependent samples?
In a dependent sample, the measured values are connected. For example, if a sample is drawn of people who have knee surgery and these people are interviewed before the surgery and one week and two weeks after the surgery, it is a dependent sample. This is the case because the same person was interviewed at several points in time.
Repeated measures: Measurements are repeated when a person is questioned at different times. This is the case, for example, when a person is asked about the intensity of pain 3, 6 and 9 months after a surgery.
Now, of course, it doesn't have to be about people or points in time, in a generalized way, we can say: In a dependent
sample, the same test units are measured several times under different conditions. The test units can be people, animals
or cells, for example, and the conditions can be time points or treatments, for example.

# Difference of analysis of variance with and without repeated measurements


If 3 or more independent samples are available, ANOVA without repeated measures is used. But be careful, of course the
assumptions have to be checked.

# Example of repeated measures ANOVA


You might be interested to know whether therapy after a slipped disc has an influence on the patient's perception of pain.
For this purpose, you measure the pain perception before the therapy, in the middle of the therapy and at
the end of the therapy. Now you want to know if there is a difference between the different times.
So, your independent variable is time, or therapy progressing over time. Your dependent variable is the pain perception.
You now have a history of the pain perception of each person over time and want to know whether the therapy
has an influence on the pain perception.
To put it simply, in the left case the therapy has an influence and in the right case the therapy has no influence on the pain
sensation. In the course of time, the pain sensation does not change on the right hand case, but it does on the
left hand one.
# Research question and hypotheses
What is the research question in a repeated measures ANOVA? The research question is: Is there a significant difference
between the dependent groups in terms of the mean?
The null and alternative hypotheses are:
• Null hypothesis: There are no significant differences between the dependent groups.
• Alternative hypothesis: There is a significant difference between the dependent groups.
# Assumptions ANOVA with repeated measures

Now we come to the assumptions of ANOVA with repeated measures and finally I will show you how you can easily
calculate it online. So what are the assumptions?
• Dependent samples: The samples must be dependent samples.
• Normality: The data should be approximately normally distributed and have metric scale level. This assumption is
especially important when the sample size is small. When the sample size is large, ANOVA is
somewhat robust to violations of normality.
• Sphericity: The variances of the differences between all combinations of factor levels (e.g. time points) should be equal. This assumption can be tested using Mauchly's test of sphericity.
• Homogeneity of variances: The variance in each group should be equal. Levene's test can be used to check this assumption.
• No significant Outliers: Outliers can have a disproportionate effect on ANOVA, potentially leading to misleading results.
Whether the data are normally distributed can be tested using a QQ plot or the Kolmogorov-Smirnov test.
Whether the assumption of sphericity is violated can be tested using Mauchly's test of sphericity. If the resulting p-value is greater than 0.05, it can be assumed that the variances of the differences are equal and the condition is not violated.
If this assumption is violated, corrections such as Greenhouse-Geisser or Huynh-Feldt can be applied.

# Results of the one-factor analysis of variance with repeated measures

The analysis of variance with repeated measurement gives you a p-value for your data. With the help of this p-value you
can read whether there is a significant difference between the repeated measurements.
If the calculated p-value is smaller than the predefined significance level, which is usually 0.05, the null hypothesis is
rejected.
In this example, the p-value is 0.01, which is less than 0.05. Therefore, the null hypothesis is rejected and it can be assumed
that there is a difference between the different time points.
# The post-hoc test
As soon as there is a significant difference between the different time points, it is of course also of interest to identify
between which exact time points that difference exists. This can be found out with the help of the Bonferroni post
hoc test.
In the post-hoc test of a repeated measures ANOVA, multiple t-tests for dependent samples are calculated. The problem with multiple testing, however, is that the so-called alpha error (the false rejection of the null hypothesis) increases with the number of tests. To counteract this, the Bonferroni post-hoc test multiplies the obtained p-values by the number of tests.
In the present case, 3 tests were performed, so for the calculation of the post-hoc test, the p-value obtained from each t-test was multiplied by 3 in the background. If one or more corrected p-values are less than 0.05, a significant difference between the two groups concerned is assumed. In this case, we therefore have a significant difference between Before and End and between Middle and End.
# Calculate a repeated measures ANOVA
How do you calculate an analysis of variance with repeated measures by
hand? Here you can find the formulas to calculate an ANOVA.
Let's say this is our data. We have 8 people, each of whom we measured at
three different points in time (start, middle and end).

First, we can calculate the necessary mean values. With the mean values we can calculate the Sum of squares and the
Mean Square. Now we can calculate the F value, which is calculated by dividing the mean square of the treatment by the
mean square of the residual or error.
Finally we can calculate the p value using the F value and the degrees of freedom from the treatment and error. To
calculate the p-Value we use the F distribution.
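As a sketch of this computation, the code below runs a one-way repeated measures ANOVA by hand on made-up scores for 8 people at three time points (the data are assumptions, not the tutorial's actual table):

```python
# One-way repeated measures ANOVA by hand.
# Rows = subjects, columns = time points (start, middle, end).
# The scores below are illustrative.
data = [
    [8, 6, 4], [7, 6, 5], [9, 7, 6], [6, 5, 3],
    [8, 7, 5], [7, 5, 4], [9, 8, 6], [6, 4, 3],
]
n = len(data)          # number of subjects
k = len(data[0])       # number of time points

grand_mean = sum(sum(row) for row in data) / (n * k)
subject_means = [sum(row) / k for row in data]
time_means = [sum(row[j] for row in data) / n for j in range(k)]

# Partition of the total sum of squares
ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
ss_treatment = n * sum((m - grand_mean) ** 2 for m in time_means)
ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
ss_error = ss_total - ss_treatment - ss_subjects

df_treatment = k - 1            # 2
df_error = (k - 1) * (n - 1)    # 14

f_value = (ss_treatment / df_treatment) / (ss_error / df_error)
print(round(f_value, 2))  # 126.0
```

With df(treatment) = 2 and df(error) = 14, this F value would then be compared against the F distribution to obtain the p-value.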

# Two-way ANOVA (without repeated measures)


As the name suggests, the two-way or two-factor analysis of variance examines the influence of two factors on a
dependent variable. Here, the single-factor analysis of variance is extended by a further factor, i.e. by a further nominally
scaled independent variable. The question here is again whether the mean values of the groups differ significantly.
# What is a factor?
A factor is, for example, the gender of a person with the characteristics male and female, the form of therapy used for a
disease with therapy A, B and C or the field of study with, for example, medicine, business administration,
psychology and math.
In the case of variance analysis, a factor is a categorical variable. You use an analysis of variance whenever you want to
test whether these categories have an influence on the so-called dependent variable.
For example, you could test whether gender has an influence on salary, whether therapy has an influence on blood
pressure, or whether the field of study has an influence on the duration of studies. Salary, blood pressure or
study duration are then the dependent variables. In all these cases you now check whether the factor has an influence on
the dependent variable.
Since you only have one factor in these cases, you would use a single factor analysis of variance in these cases (except of
course for the gender, there we have a variable with only two expressions, there we would use the t-test for
independent samples).
# Two factors
Now you may have another categorical variable that you want to include as well. You might be interested in whether:
• in addition to gender, the highest level of education also has an influence on salary.
• besides therapy, gender also has an influence on blood pressure.
• in addition to the field of study, the university attended also has an influence on the duration of studies

Now in all three cases you would not have one factor, but two factors each.And since you now have two factors, you use
the two-way analysis of variance.
Using the two-way analysis of variance, you can now answer three things:
• Does factor 1 have an effect on the dependent variable?
• Does factor 2 have an effect on the dependent variable?
• Is there an interaction between factor 1 and factor 2?
Therefore, in the case of one-way analysis of variance, we have one factor from which we create the groups. In the case
of two-way analysis of variance, the groups result from the combination of the expressions of the two factors.

Example Two-Way ANOVA


Here's an example dataset for a two-way ANOVA in medicine. Let's say we are interested in studying the effect of two
factors, "Treatment" and "Gender," on the response variable "Blood Pressure."
In this example, we have two levels of the "Treatment" factor (A and B) and two levels of the "Gender" factor (Male and
Female). The "Blood Pressure" measurements are recorded for each participant based on their treatment
and gender.
To perform a two-way ANOVA on this dataset, we would test the null hypothesis that there is no interaction between the
"Treatment" and "Gender" factors and no main effects of each factor on the "Blood Pressure" response variable.
# Hypotheses
Three statements can be tested with the two-way ANOVA, so there are 3 null hypotheses and therefore 3 alternative hypotheses.
# Assumptions
For a two-way analysis of variance without repeated measures to be calculated, the following assumptions must be met:
• The scale level of the dependent variable should be metric, that of the independent variables (factors) nominal.
• Independence: The measurements should be independent, i.e. the measured value of one group should not be
influenced by the measured value of another group. If this were the case, we would need an analysis
of variance with repeated measures.
• Homogeneity: The variances in each group should be approximately equal. This can be checked with Levene's test.
• Normal distribution: The data within the groups should be normally distributed.
So the dependent variable could be, for example, salary, blood pressure, or study duration. These are all metric variables. The independent variables should be nominally or ordinally scaled, for example gender, highest level of education, or a type of therapy. Note, however, that the rank order of ordinal variables is not used, so this information is lost.
# Calculation of a two-way ANOVA
To calculate a two-way ANOVA, the following formulas are needed. Let's
look at this with an example.
Solved Example
Let's say you work in the marketing department of a bank and you want to find out if gender and the fact that a person
has studied or not have an influence on their attitude towards retirement planning.
In this example, your two independent variables (factors) are gender (male or female) and studied (yes or no). Your dependent variable is attitude toward retirement planning, where 1 means "not important" and 10 means "very important."

After all, is attitude toward retirement planning really a metric variable? Let's just assume that attitude toward retirement
planning was measured using a Likert scale and thus we consider the resulting variable to be metric.
Mean values
In the first step we calculate the means of the individual groups: male and not studied, which is 5.8, then male and studied, which is 5.4; we now do the same for female.
Then we calculate the mean of all males and all females, and of not studied and studied respectively. Finally, we calculate the overall mean, which is 5.4.
Sums of squares
With this, we can now calculate the required sums of squares. SStot is the sum of squares of each individual value minus
the overall mean.
SSbtw results from the sum of squares of the group means minus the overall mean multiplied by the number of values in
the groups.
The sums of squares of the factors SSA and SSB result from the sum of squares of the means of the factor levels minus the
total mean.
Now we can calculate the sum of squares for the interaction. These are obtained by calculating SSbtw minus SSA minus
SSB.
Finally, we calculate the sum of squares for the error. This is calculated similarly to the total sum of squares, so again we use each individual value. Only in this case, instead of subtracting the overall mean from each value, we subtract the respective group mean from each value.
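The steps above can be sketched numerically. The cell data below are made up for illustration (two levels per factor, five observations per cell, loosely mirroring the gender × studied example):

```python
# Two-way ANOVA sums of squares for a balanced design.
# cells[(a, b)] holds the observations for level a of factor A (gender)
# and level b of factor B (studied). The ratings are illustrative.
cells = {
    ("male", "no"):    [5, 6, 6, 5, 7],
    ("male", "yes"):   [6, 5, 6, 5, 5],
    ("female", "no"):  [6, 7, 6, 7, 6],
    ("female", "yes"): [8, 7, 8, 7, 8],
}

def mean(xs):
    return sum(xs) / len(xs)

all_values = [x for obs in cells.values() for x in obs]
grand = mean(all_values)

def level_values(factor_index, level):
    """All observations at one level of the given factor (0 = A, 1 = B)."""
    return [x for key, obs in cells.items() if key[factor_index] == level for x in obs]

a_levels = {key[0] for key in cells}
b_levels = {key[1] for key in cells}

# Main-effect sums of squares from the marginal means
ss_a = sum(len(level_values(0, a)) * (mean(level_values(0, a)) - grand) ** 2
           for a in a_levels)
ss_b = sum(len(level_values(1, b)) * (mean(level_values(1, b)) - grand) ** 2
           for b in b_levels)

# Between-cells sum of squares; the interaction is what remains of it
ss_between = sum(len(obs) * (mean(obs) - grand) ** 2 for obs in cells.values())
ss_ab = ss_between - ss_a - ss_b

# Error: deviations of each value from its own cell mean
ss_error = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)
ss_total = sum((x - grand) ** 2 for x in all_values)

# The partition must hold: SStot = SSA + SSB + SSAB + SSerror
assert abs(ss_total - (ss_a + ss_b + ss_ab + ss_error)) < 1e-9
print(round(ss_a, 2), round(ss_b, 2), round(ss_ab, 2), round(ss_error, 2))
```

Dividing each sum of squares by its degrees of freedom then gives the mean squares from which the F values are formed.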
Degrees of freedom
The required degrees of freedom are as follows: df(A) = a − 1, df(B) = b − 1, df(A×B) = (a − 1)(b − 1), df(error) = N − a·b and df(total) = N − 1, where a and b are the numbers of levels of the two factors and N is the total number of observations.

Mean squares or variance


Together with the sums of squares and the degrees of freedom, the variance
can now be calculated:
F value
And now we can calculate the F values. These are obtained by dividing the
variance of factor A, factor B or the interaction AB by the error variance.

p-value
This gives us a p-value of 0.323 for factor A, a p-value of 0.686 for factor B, and a p-value of 0.55 for the interaction. None of these p-values is less than 0.05, and thus we retain the respective null hypotheses.

# Interaction effect
But what exactly does interaction mean? Let us first have a look at this diagram.
The dependent variable is plotted on the y axis, in our example the attitude towards retirement provision. On the x axis,
one of the two factors is plotted, let's just take gender. The other factor is represented by lines with different
colors. Green is studied and red is not studied.
The endpoints of the lines are the mean values of the groups, e.g. male and not studied. In this diagram, one can see that
both gender and the variable of having studied or not have an influence on attitudes toward retirement planning. Females
have a higher value than males and studied have a higher value than not studied. But now finally to the interaction effects,
for that we compare these two graphs.

In the first case, we said there is no interaction effect. If a person has studied, he has a value that is, say, 1.5 higher than
a person who has not studied. This increase of 1.5 is independent of whether the person is male or female.
It is different in this case, here studied persons also have a higher value, but how much higher the value is depends on
whether one is male or female. If I am male, there is a difference of, let's say for example 0.5 and if I am female,
there is a difference of 3.5.
So in this case we clearly have an interaction between gender and study because the two variables affect each other. It
makes a difference how strong the influence from studying is depending on whether I am male or female.
In this case, we do have an interaction effect, but the direction still remains the same. So females have higher scores than
males and studied have higher scores than non-studied.
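One way to quantify the interaction just described is the "difference of differences" between the group means. The means below are assumed baselines chosen to reproduce the text's illustrative differences of 0.5 for males and 3.5 for females:

```python
# Interaction as a difference of differences between group means.
# The baseline means are assumptions; only the differences (0.5 and 3.5)
# come from the text's illustrative example.
male_not_studied, male_studied = 5.0, 5.5      # studying adds 0.5 for males
female_not_studied, female_studied = 5.5, 9.0  # studying adds 3.5 for females

effect_for_males = male_studied - male_not_studied
effect_for_females = female_studied - female_not_studied

# Zero would mean no interaction: studying shifts both genders equally.
interaction = effect_for_females - effect_for_males
print(interaction)  # 3.0
```

A nonzero difference of differences is exactly what the interaction term in the two-way ANOVA tests for.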

# Two-way ANOVA with measurement repetition


If we look at the most common types of the analysis of variance, we distinguish once between the one-way and the two-
way analysis of variance and on the other side the analysis of variance without measurement repetition and with
measurement repetition. Now we will look at the two-way analysis of variance with measurement repetition.
Two-way analysis of variance with measurement repetition tests whether there is a difference between more than two
samples divided between two variables or factors. In contrast to the two-factorial analysis of variance without
measurement repetitions, one of the factors is thereby created by measurement repetitions.
In other words, one factor is a dependent sample.

# Sample with measurement repetition


What is the difference to the "normal" one-factor analysis of variance with repeated measures? Or what is the difference
between one-factorial and two factorial?
Single factorial ANOVA with repeated measures tests whether there are statistically significant differences between three
or more dependent samples.
In a dependent sample, the measured values are linked. Thus, one and the same person is measured at several time points.

# Example two-way ANOVA with repeated measures


For example, if you take a sample of people with high blood pressure and measure their blood pressure before, during
and after treatment, this is a dependent sample. This is because the same person is interviewed at different
times.

You may want to know if the treatment for high blood pressure has an effect on the blood pressure. So you want to know
if blood pressure changes over time.
But what if you have different therapies and you want to see if there is a difference between them? You now have two
factors, one for the therapy and one for the repeated measurements. Since you now have two factors and one
of the factors is a dependent sample, you use a two-way repeated measures
analysis of variance.

Using two-way analysis of variance with repeated measures, you can now answer three things:
• Does the first factor with measurement repetition have an effect on the dependent variable?
• Does the second factor have an effect on the dependent variable?
• Is there an interaction between factor 1 and factor 2?
Hypotheses
As already indicated, you can test three statements with the two-factor analysis of variance, so there are also 3 null hypotheses and therefore also 3 alternative hypotheses.
Null hypotheses:
• The mean values of the different measurement times do not differ (There are no significant differences between the
"groups" of the first factor).
• The mean values of the different groups of the second factor do not differ.
• One factor has no influence on the effect of the other factor

# Assumptions of the two-way analysis of variance with repeated measures


In order for a two-way analysis of variance with measurement repetition to be calculated, the following prerequisites must
be met:
• The scale level of the dependent variable should be metric. For example, salary or blood pressure.
• The scale level of the factors should be categorical.
• The measurements of one factor should be dependent, e.g. the
measurements should have arisen from repeated measurements of the same person.
• The measurements from the other factor should be independent, i.e. the measurement from one group should not be
influenced by the measurement from another group.
• The variances in each group should be approximately equal. This can be checked with the Levene test
• The data within the groups should be normally distributed.
• The variances of the differences between all combinations of the
different groups should be equal (Sphericity). This assumption can be tested using Mauchly's test of sphericity.

# Interpret two-way analysis of variance with repeated measures


The most important parts of this table are the three highlighted rows; with them, you can test whether the three null hypotheses formulated above are retained or rejected. The first row tests the null hypothesis of whether blood pressure changes over time. The second row tests whether there is a difference between the respective therapies with respect to blood pressure. And the last row checks whether there is an interaction between the two factors. You can read the p-value at the end of each row. Let's say we set the significance level at 5%. If the calculated p-value is less than 0.05, the respective null hypothesis is rejected; if it is greater than 0.05, the null hypothesis is not rejected. Thus, we see that the p-value for before, middle and end is less than 0.05, so the before, middle and end times differ significantly in terms of blood pressure. The p-value in the second row is greater than 0.05, so there is no significant difference between the therapies averaged over time.
It is important to note that the mean value over the three time points is considered here. It could also be that in one
therapy the blood pressure increases and in the other therapy the blood pressure decreases, but on
average over the time points the blood pressure is the same, then we would not get a significant difference here.
If that were the case, however, we would have an interaction between the therapies and time. We test this with the last
hypothesis.
In this case, there is no significant interaction between therapy and time.
