TUTORIAL 6
Download the t6e1, t6e2 and t6e3 Excel data files from the subject website and save them
to your computer or USB flash drive. Read this handout and complete the tutorial
exercises before your tutorial class, so that you can ask your tutor for help during the
Zoom session if necessary.
After the tutorial class attempt the “Exercises for assessment”. For each assessment
exercise type your answer in the corresponding box available in the Quiz. If the exercise
requires you to use R, insert the relevant R/RStudio script and printout in the same Quiz box
below your answer. To get the tutorial mark for week 7, you must submit your answers to
these exercises in the Tutorial 6 Canvas Quiz by 10am Wednesday in week 7 and attend
Tutorial 7.
There are many different types of ANOVA, the two simplest ones being (i) one-way ANOVA
based on the independent measures design and (ii) one-way ANOVA based on randomised
blocks. The first is the generalisation of the two-independent-sample Z / t test for the
difference between two population means (parametric) and of the Wilcoxon rank-sum test
for the difference between two population medians (nonparametric), while the second is the
extension of the matched pair Z / t test for the difference between two population means
(parametric) and of the matched pair Wilcoxon signed ranks test for the difference between
two population medians (nonparametric).
L. Kónya ECON20003 - Tutorial 6
The calculations are based on the decomposition of the total sum of squares (SS) in the
pooled sample into two components: the sum of squares for treatments (SST), which is
related to the variations between the samples, and the sum of squares for error (SSE), which
is related to the variations within the samples.
In symbols,

SS = SST + SSE

where

SS = Σ_{j=1..k} Σ_{i=1..nj} (x_ij − x̄)²
SST = Σ_{j=1..k} n_j (x̄_j − x̄)²
SSE = Σ_{j=1..k} Σ_{i=1..nj} (x_ij − x̄_j)²
k is the number of (sub-) populations and also the number of independent samples, n_j is the
number of observations in sample j, x̄_j is the mean of sample j, and x̄ is the grand
mean, i.e., the mean of the pooled sample (all available observations). The corresponding
degrees of freedom are n − 1 for SS, k − 1 for SST and n − k for SSE, where n is the total
number of observations (n = n1 + n2 + … + nk).
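The decomposition above can be illustrated numerically. The following sketch (not part of the tutorial exercises) verifies SS = SST + SSE in R on a small made-up data set with k = 2 samples:

```r
x1 <- c(1, 2, 3)          # sample 1, n1 = 3
x2 <- c(4, 5, 6, 7)       # sample 2, n2 = 4
x  <- c(x1, x2)           # pooled sample, n = 7

SS  <- sum((x - mean(x))^2)                             # total sum of squares
SST <- length(x1) * (mean(x1) - mean(x))^2 +
       length(x2) * (mean(x2) - mean(x))^2              # between-sample variation
SSE <- sum((x1 - mean(x1))^2) + sum((x2 - mean(x2))^2)  # within-sample variation

all.equal(SS, SST + SSE)  # TRUE: 28 = 21 + 7
```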
Under the required conditions, the common population variance can be estimated with the
sample variance of the pooled sample,
s_p² = [ Σ_{j=1..k} Σ_{i=1..nj} (x_ij − x̄_j)² ] / (n − k) = SSE / (n − k) = MSE
If, in addition, the composite null hypothesis, H0: μ1 = μ2 = … = μk, is correct, the common
population variance can also be estimated using the sample variance of the sample means.
This second estimator is

s_0² = [ Σ_{j=1..k} n_j (x̄_j − x̄)² ] / (k − 1) = SST / (k − 1) = MST

The ANOVA F test statistic is the ratio of these two variance estimators,

F = s_0² / s_p² = MST / MSE
and under H0 it has an F distribution with df1 = k – 1 numerator degrees of freedom and df2
= n – k denominator degrees of freedom.
This test is always a right-tail test in terms of the decision rule, meaning that H0 is rejected
at the α×100% significance level if the observed test statistic value exceeds the critical
value, i.e. if Fobs > F_{α,k−1,n−k}.
Exercise 1

The friendly folks at the Taxpayers Association are always looking for ways to improve the
wording and format of their tax return forms. Three new forms have been developed
recently. To determine which, if any, are superior to the current form, 120 individuals were
asked to participate in an experiment. Each of the three new forms and the currently used
form were filled out by 30 different people. The amount of time (in minutes) taken by each
person to complete the task was recorded and stored in the t6e1 Excel file.
(a) What conclusions can be drawn from these data? (Use α = 0.05.)
The variable of interest is Time (in minutes) it takes to fill out a form. It is a quantitative
variable measured on a ratio scale.
The question is whether there is any difference between the four forms in terms of Time,
so the hypotheses are
H0: μ1 = μ2 = μ3 = μ4 and HA: not all four population means are the same.1
Granted that the required conditions are satisfied, we can apply an ANOVA F-test. Let
us do so first manually and then with R.
The 5% critical value is F_{α,k−1,n−k}, the (1 − α)×100th percentile of the F-distribution with
df1 = k − 1 numerator degrees of freedom and df2 = n − k denominator degrees of
freedom. In this case, k = 4 and n = 4 × 30 = 120, and from the F table this critical value
is F_{0.05,3,116} ≈ F_{0.05,3,120} = 2.68. Therefore, H0 is to be rejected if Fobs > 2.68.
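The tabulated critical value can be double-checked with R's qf function, which returns quantiles of the F distribution; a quick sketch:

```r
# (1 - alpha) quantile of the F distribution = upper-tail critical value
qf(0.95, df1 = 3, df2 = 120)   # 2.68, the tabulated value
qf(0.95, df1 = 3, df2 = 116)   # the exact critical value, slightly above 2.68
```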
To simplify the manual calculations, launch RStudio, create a new project and script,
and name them t6e1. Import the data saved in the t6e1 Excel data file to RStudio in the
usual way, just make sure that in the Import Options section of the Import Excel Data
dialogue window (see next page) you name the data frame t6e1_wide.2
1 Note that this alternative hypothesis is not equivalent to μ1 ≠ μ2 ≠ μ3 ≠ μ4. This latter statement is stronger
than HA because, for example, it excludes the possibility that μ1 and μ2 are equal, while under HA they can be
equal.
2 So far, for the sake of simplicity, we have always used the name of the Excel data file for the R data frame.
You will soon understand why we use a different name for the R data frame this time.
Generate the basic descriptive statistics for the four samples by executing the following
commands:
library(pastecs)
round(stat.desc(t6e1_wide, basic = FALSE, desc = TRUE, norm = TRUE, p = 0.95), 3)
Since the sample sizes are equal, the grand mean is the average of the four sample
means:
x̄ = (Σ_{j=1..k} x̄_j) / k = (90.167 + 95.767 + 106.833 + 111.167) / 4 = 100.984
The sum of squares for treatments can be calculated from the sample means and the
grand mean:
SST = Σ_{j=1..k} n_j (x̄_j − x̄)²
    = 30 × [(90.167 − 100.984)² + (95.767 − 100.984)² + (106.833 − 100.984)² + (111.167 − 100.984)²]
    = 8463.866
The sum of squares for errors can be obtained from the sample variances:
SSE = Σ_{j=1..k} Σ_{i=1..nj} (x_ij − x̄_j)² = Σ_{j=1..k} (n_j − 1) s_j² = 111479.857

The mean squares and the observed test statistic are

MST = SST / (k − 1) = 8463.866 / 3 = 2821.289

MSE = SSE / (n − k) = 111479.857 / 116 = 961.033

Fobs = MST / MSE = 2821.289 / 961.033 = 2.936
Fobs > 2.68, so we reject H0 and conclude at the 5% significance level that it does not
take the same time on average to fill out the four different forms.
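As an aside, instead of comparing Fobs with a tabulated critical value, the exact p-value can be computed with R's pf function (the F cumulative distribution function); a quick sketch:

```r
# p-value of the observed ANOVA F statistic: P(F(3, 116) > Fobs)
Fobs <- 2821.289 / 961.033                        # about 2.936
p <- pf(Fobs, df1 = 3, df2 = 116, lower.tail = FALSE)
p                                                 # below 0.05, so H0 is rejected
```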
Before we reproduce these results in R, open the t6e1 Excel file. It has two sheets, Wide
and Long. They contain the same data but in different formats. On the Wide sheet, there
are four columns, one for each treatment (Form1, Form2, Form3, Form4), to record the
four samples of the time (Time) required to fill in the forms. On the Long sheet, there are
only two columns. The first column is for the type of the Form and the second for Time.
Form is the treatment variable. It has four possible values, and if you scroll down you
can see that each value is repeated 30 times. The second column contains the
variable of interest, Time; it holds the four columns of times from the Wide sheet (from
row 2 to row 31) stacked on top of each other.
When you imported the data from the t6e1 Excel file, by default, RStudio opened the
first sheet in the file, i.e., the Wide sheet. If you check your Environment tab, you can
see that the data frame contains 4 numeric vectors.
However, to perform ANOVA in R, the data set needs to be arranged in long format, like
on the Long sheet of t6e1.
In general, a set of data arranged in a table can be in wide (unstacked) format or in long
(stacked) format. In the wide format, every data point is recorded in a single row and the
columns hold the values of various attributes, while in the long format each data point is
represented by as many rows as the number of different sets of attributes and each row
contains the values of one set of attributes for the given data point. When the data set has
2 or 3 variables, the wide format is more compact and hence it is preferred for display
purposes. However, when there are more variables, the long format is more convenient.
Import the data on that sheet to RStudio and name the new data frame t6e1_long.
In R, one-way ANOVA can be performed with the

aov(formula)

function,
where the formula argument specifies the statistical model that we intend to analyse. In
this example, we want to compare the times required to fill in the forms, so Time is the
‘dependent’ variable, Form is the independent variable, and the appropriate formula is
Time ~ Form. The output of aov is quite succinct. To obtain a more meaningful printout,
it is better to combine it with the summary function that we already used in Tutorial 3.
Execute
summary(aov(Time ~ Form))
What if we did not have the data set in long format? One option is to reshape the data
set from wide to long in Excel before importing it to RStudio, just as I did myself.
However, this can be time consuming when the data set is large. Alternatively, we can
import the data in wide format and convert it to long format in R before using the aov()
function.
To illustrate this option, stack the four vectors of time into one vector, called minutes, by
executing
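The stacking can be done with the c() function; a sketch, assuming the wide data frame t6e1_wide with columns Form1, …, Form4 (a small synthetic stand-in is used here so the snippet runs on its own; with the real data frame, only the c(...) line is needed):

```r
# Synthetic stand-in for the imported wide data frame (the real one has 30 rows)
t6e1_wide <- data.frame(Form1 = c(23, 35), Form2 = c(31, 26),
                        Form3 = c(40, 28), Form4 = c(32, 37))

# Stack the four columns into a single vector, called minutes
minutes <- c(t6e1_wide$Form1, t6e1_wide$Form2,
             t6e1_wide$Form3, t6e1_wide$Form4)

length(minutes)   # 4 times the number of rows; 120 with the real data
```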
The new vector, minutes, has 120 elements and is identical to Time in the t6e1_long
data frame. On its own, however, it is insufficient, because by stacking the four original
vectors into one we lost a crucial piece of information, namely which tax return form
each observation belongs to. For this reason, just like in the t6e1_long data frame, minutes
must be complemented with a second variable that categorises the observations and stores
the group memberships as levels. Such a qualitative or categorical variable is called a factor in R.
Such a factor can be generated with the

gl(n, k, length, labels)

function, where n is the number of factor levels, k is the number of replications in a row,
length is the required length of the resulting factor, and labels (optional) contains the
labels of the factor levels.
In this case, n = 4, k = 30, length = 120, labels = c("Form1", "Form2", "Form3", "Form4"),
so execute
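With the values above, the command would look like the following sketch:

```r
# Four levels, each repeated 30 times in a row, 120 elements in total
forms <- gl(4, 30, length = 120,
            labels = c("Form1", "Form2", "Form3", "Form4"))

table(forms)   # each level occurs 30 times
```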
Type forms in the Console and press Enter to verify that we managed to replicate Form
in the t6e1_long data frame.
Now, execute
summary(aov(minutes ~ forms))
to obtain
Apart from the different variable names (Form versus forms), this ANOVA table is the
same as the one on the previous page.
(b) What are the required conditions for the test conducted in part (a)?
As mentioned on page 1, the ANOVA F-test is based on five assumptions: (i) the data
set constitutes k independent random samples of independent observations drawn from
k (sub-) populations; (ii) the variable of interest is quantitative and continuous; (iii) the
measurement scale is interval or ratio; (iv) each (sub-) population is normally distributed
and (v) has the same variance (i.e., they are homoskedastic).
(c) Does it appear that the required conditions of the tests in part (a) are satisfied?
Independence of the samples is not a testable requirement; we just take it for granted. The
amount of time (in minutes) taken by each person to complete the task is a quantitative
variable measured on a ratio scale. The descriptive statistics and the Shapiro-Wilk test
results obtained earlier do not challenge normality.3
As regards the last requirement, homoskedasticity, you learnt in the week 4 lectures and
in the previous tutorial how to test whether two population variances are equal with an
F-test. A generalisation of this test to the equality of two or more population variances
is Levene's test. The hypotheses are

H0: σ1² = σ2² = … = σk² and HA: not all population variances are the same.

In R, Levene's test can be performed with the

leveneTest(formula)

function,
where the formula argument is like in aov.4 This function is part of the car package, so
install this package if you do not have it yet, load it
install.packages("car")
library(car)
and then execute
leveneTest(minutes ~ forms)
It returns
3 We do not discuss this issue here in detail because by now you should be able to check normality on your
own. Remember though that in the assignments and in the final exam you cannot simply assume normality;
you need to verify it using the checks you learnt about.
4 We do not discuss the details of Levene's test, and you are not expected to be able to perform it manually.
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   3  0.0833  0.969
      116
As you can see, the numbers of degrees of freedom are the same as on the previous
ANOVA printout. The test statistic value is 0.0833 and the p-value (Pr(>F)) is 0.969, so
there is no reason to question the homoskedasticity assumption. This implies that the
ANOVA F-test is appropriate this time.
But, what if Levene’s test rejected homoskedasticity? In that case, we should not rely
on the ANOVA F-test, but should use Welch’s F-test, which is a generalization of
Welch’s t-test for two or more independent samples. This test requires independence
and normality, just like the ANOVA F-test, but it allows unequal variances (called
heteroskedasticity). In R, it can be done with the
oneway.test(formula, var.equal = )
command, where formula is like before and var.equal is a logical variable indicating
whether to treat the variances in the samples as equal (it is FALSE by default).
Hence, executing

oneway.test(minutes ~ forms, var.equal = TRUE)

reproduces the ANOVA F-test: apart from the different numbers of decimals, the F test
statistic and the p-value on this printout are the same as on the aov printout. To run the
Welch test, we need to drop the var.equal argument.5
oneway.test(minutes ~ forms)
returns
Compared to the ANOVA F-test, the Welch F-test statistic is a bit smaller (2.8057) and
the corresponding p-value is a bit larger (0.04661), but H0 is still rejected at the 5%
5 We could use var.equal = FALSE, but it is unnecessary as this is the default option.
significance level. Therefore, it does not matter this time whether the variances are equal
or not.
What if we reduced the significance level to 4%? At this level the ANOVA F-test rejects
the null hypothesis, but the Welch F-test does not. Because of the Levene's test result,
we had better rely on the former test, so at the 4% significance level we still conclude that it
does not take the same time on average to fill out the four different forms.
When the mean does not exist or the sampled populations are clearly not normally
distributed, we should use neither the ANOVA F-test nor the Welch F-test, but some
nonparametric test instead. The nonparametric counterpart of these tests is the Kruskal-
Wallis test, a generalization of the Wilcoxon rank-sum test to two or more (sub-) populations.
The Kruskal-Wallis test is a one-way ANOVA test for the equality of k ≥ 2 (sub-) population
medians based on the ranks of the observations in the pooled set of k independent samples,
one from each (sub-) population. It is based on the following assumptions: (i) the data set
constitutes k independent random samples of independent observations; (ii) the underlying
variable of interest is continuous; (iii) the measurement scale is at least ordinal; and (iv) the
(sub-) population distributions have similar shapes.
To perform this test, we need to rank all available observations from the smallest (1) to the
largest (n = n1 + n2 +…+ nk), averaging the ranks of tied observations, and calculate the sum
of the ranks assigned to the observations in each sample. The test statistic is
H = [12 / (n(n + 1))] × Σ_{j=1..k} (T_j² / n_j) − 3(n + 1)
where Tj is the sum of ranks assigned to the observations in the jth sample.
The sampling distribution of this test statistic is non-standard. The small-sample critical
values for k = 3, nj = 1,…,6 and k = 4, nj = 1,…,5 are provided in the Kruskal-Wallis table that
you can download from the subject website, while for larger samples (each nj at least 5) the
null distribution of H can be approximated with a chi-square distribution with k − 1 degrees
of freedom, χ²_{k−1}.
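Although with realistic sample sizes we would never do this test by hand, the formula above can be illustrated in R on a tiny made-up data set and checked against the built-in kruskal.test function; a sketch:

```r
x <- c(1.2, 3.4, 2.5, 5.1, 4.8, 6.0, 2.2, 3.9, 7.3)  # pooled observations
g <- gl(3, 3)                 # k = 3 samples of 3 observations each
r <- rank(x)                  # ranks 1, ..., n (no ties here)
n <- length(x)

Tj <- tapply(r, g, sum)       # rank sums per sample
nj <- tabulate(g)             # sample sizes

H <- 12 / (n * (n + 1)) * sum(Tj^2 / nj) - 3 * (n + 1)
H                                   # manual test statistic
kruskal.test(x, g)$statistic        # same value from the built-in function
```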
Exercise 2 (Selvanathan et al., p. 933, ex. 20.56)
This exercise is similar to the previous exercise, with one important difference. Namely, this
time the observed variable, response, is a qualitative variable measured on an ordinal scale.
For this reason, we cannot use a parametric test for the population means. Instead, we must
use a nonparametric procedure for the population medians. The null hypothesis is that there
is no difference among the four ads in terms of median believability, i.e., η1 = η2 = η3 = η4,
while the alternative hypothesis is that not all four population medians are the same.
The appropriate procedure is the Kruskal-Wallis test, granted that its requirements are met.
The data set consists of four independent samples of 100 independent observations each,
the underlying variable of interest is belief that can be thought of as a continuous variable,
and the measurement scale of the observed variable is ordinal. Hence, requirements (i), (ii)
and (iii) are satisfied. As for (iv), it can be checked by illustrating the data with four
histograms.
Launch RStudio, create a new project and script, name them t6e2, import the data saved in
the t6e2 Excel file6 and load it into your project. As you can see on the Environment tab,
there are two series in the data set: Response, which is the variable of interest, and Ad,
which is used to classify the observations according to the four advertisements.
Check whether the psych library is available on your computer. If it is not, install it.7
You can develop a histogram of Response for each of the four ads by executing the following
commands:
library(psych)
par(mfrow = c(2, 2))
for (i in c(1,2,3,4)) {hist(subset(Response, Ad == i))}
The first command, library(psych), loads the psych package in the active project. The
second command, par(mfrow), sets up a 2 x 2 plotting space to be able to view the four
histograms together in a single plot. The third command, for (i in c(1,2,3,4)), creates a for
loop that iterates 4 times. In each iteration i takes on the value of the corresponding element
of vector (1, 2, 3, 4) and a histogram is generated by the hist function for a subset of
Response defined by the Ad equals i restriction.
6 Before importing the data, check in Excel that it is in long format.
7 Click Tools / Install Packages … and write psych in the second box of the Install Packages dialogue window.
You should get the plot8 shown below. The four histograms look similar, so the fourth
requirement of the Kruskal-Wallis test is also satisfied.
Because of the large sample sizes, we do not do the Kruskal-Wallis test manually. The
relevant R function is
kruskal.test(x, g)

where x contains the observations on the variable of interest and g is the grouping variable.
In this case,
kruskal.test(Response, Ad)
returns
8 You might get an error message if you use a laptop with its native screen, as this combined plot might not fit
in the bottom-right Plots panel of RStudio. If that is the case, try to resize this panel to make it wider.
Kruskal-Wallis rank sum test
data: Response and Ad
Kruskal-Wallis chi-squared = 4.7766, df = 3, p-value = 0.1889
Under the null hypothesis the Kruskal-Wallis test statistic is distributed as a chi-square
random variable with k – 1 = 3 degrees of freedom. The p-value is 0.1889 > 0.05. Hence, at
the 5% significance level we maintain the null hypothesis and conclude that there are no
significant differences among the four ads in terms of believability.
Exercise 3
A farmer wants to know if the weight of parsley plants is influenced by using a fertilizer. He
selects 90 plants and randomly divides them into three groups of 30 plants each. He applies
a biological fertilizer to the first group, a chemical fertilizer to the second group and no
fertilizer at all to the third group. After a month he weighs all plants and saves the
measurements in the t6e3 Excel file.
Can we conclude from these data at the 5% significance level that fertilizer affects the weight
of parsley plants?
a) Obtain the basic descriptive statistics with R and then perform the ANOVA F-test
manually.

b) Perform the ANOVA F-test with R.

c) What are the required conditions for the tests in parts (a) and (b)? Do they seem to be
satisfied?

d) Perform the Welch F-test in R. Does it lead to the same conclusion as the ANOVA F-
test?

e) Perform the Kruskal-Wallis test in R (use α = 0.05). Does it lead to a different conclusion
than the parametric tests in parts (b) and (d)?