TUTORIAL 6
Download the t6e1, t6e2 and t6e3 Excel data files from the subject website and save them
to your computer or USB flash drive. Read this handout and complete the tutorial
exercises before your tutorial class, so that you can ask your tutor for help during the
Zoom session if necessary.
After the tutorial class attempt the “Exercises for assessment”. For each assessment
exercise type your answer in the corresponding box available in the Quiz. If the exercise
requires you to use R, insert the relevant R/RStudio script and printout in the same Quiz box
below your answer. To get the tutorial mark for week 7, you must submit your answers to
these exercises in the Tutorial 6 Canvas Quiz by 10am Wednesday in week 7 and attend
Tutorial 7.
There are many different types of ANOVA, the two simplest ones being (i) one-way ANOVA
based on the independent measures design and (ii) one-way ANOVA based on randomised
blocks. The first is the generalisation of the two-independent-sample Z / t test for the
difference between two population means (parametric) and of the Wilcoxon rank-sum test
for the difference between two population medians (nonparametric), while the second is the
extension of the matched pair Z / t test for the difference between two population means
(parametric) and of the matched pair Wilcoxon signed ranks test for the difference between
two population medians (nonparametric).
L. Kónya ECON20003 - Tutorial 6
The calculations are based on the decomposition of the total sum of squares (SS) in the
pooled sample into two components: the sum of squares for treatments (SST), which is
related to the variations between the samples, and the sum of squares for error (SSE), which
is related to the variations within the samples.
In symbols,

SS = SST + SSE

where

SS = Σ_{j=1..k} Σ_{i=1..nj} (x_ij − x̄)²
SST = Σ_{j=1..k} n_j (x̄_j − x̄)²
SSE = Σ_{j=1..k} Σ_{i=1..nj} (x_ij − x̄_j)²
k is the number of (sub-) populations and also the number of independent samples, n_j is the
number of observations in sample j, x̄_j is the mean of sample j, and x̄ is the grand
mean, i.e., the mean of the pooled sample (all available observations). The corresponding
degrees of freedom are n − 1 for SS, k − 1 for SST and n − k for SSE, where n is the total
number of observations (n = n1 + n2 + … + nk).
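The decomposition above can be illustrated numerically. The following sketch (not part of the tutorial exercises) verifies SS = SST + SSE in R on a small made-up data set with k = 2 samples:

```r
x1 <- c(1, 2, 3)          # sample 1, n1 = 3
x2 <- c(4, 5, 6, 7)       # sample 2, n2 = 4
x  <- c(x1, x2)           # pooled sample, n = 7

SS  <- sum((x - mean(x))^2)                             # total sum of squares
SST <- length(x1) * (mean(x1) - mean(x))^2 +
       length(x2) * (mean(x2) - mean(x))^2              # between-sample variation
SSE <- sum((x1 - mean(x1))^2) + sum((x2 - mean(x2))^2)  # within-sample variation

all.equal(SS, SST + SSE)  # TRUE: 28 = 21 + 7
```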
Under the required conditions, the common population variance can be estimated with the
sample variance of the pooled sample,
s_p² = [ Σ_{j=1..k} Σ_{i=1..nj} (x_ij − x̄_j)² ] / (n − k) = SSE / (n − k) = MSE
If, in addition, the composite null hypothesis, H0: μ1 = μ2 = … = μk, is correct, the common
population variance can also be estimated using the sample variance of the sample means.
This second estimator is

s_0² = [ Σ_{j=1..k} n_j (x̄_j − x̄)² ] / (k − 1) = SST / (k − 1) = MST

The ANOVA F test statistic is the ratio of these two variance estimators,

F = s_0² / s_p² = MST / MSE
and under H0 it has an F distribution with df1 = k – 1 numerator degrees of freedom and df2
= n – k denominator degrees of freedom.
This test is always a right-tail test in terms of the decision rule, meaning that H0 is rejected
at the α×100% significance level if the observed test statistic value exceeds the critical
value, i.e. if Fobs > F_{α,k−1,n−k}.
Exercise 1

The friendly folks at the Taxpayers Association are always looking for ways to improve the
wording and format of their tax return forms. Three new forms have been developed
recently. To determine which, if any, are superior to the current form, 120 individuals were
asked to participate in an experiment. Each of the three new forms and the currently used
form were filled out by 30 different people. The amount of time (in minutes) taken by each
person to complete the task was recorded and stored in the t6e1 Excel file.
(a) What conclusions can be drawn from these data? (Use α = 0.05.)
The variable of interest is Time (in minutes) it takes to fill out a form. It is a quantitative
variable measured on a ratio scale.
The question is whether there is any difference between the four forms in terms of Time,
so the hypotheses are
H0: μ1 = μ2 = μ3 = μ4 and HA: not all four population means are the same.1
Granted that the required conditions are satisfied, we can apply an ANOVA F-test. Let
us do so first manually and then with R.
The 5% critical value is F_{α,k−1,n−k}, the (1 − α)×100th percentile of the F-distribution with
df1 = k − 1 numerator degrees of freedom and df2 = n − k denominator degrees of
freedom. In this case, k = 4 and n = 4 × 30 = 120, and from the F table this critical value
is F_{0.05,3,116} ≈ F_{0.05,3,120} = 2.68. Therefore, H0 is to be rejected if Fobs > 2.68.
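The tabulated critical value can be double-checked with R's qf function, which returns quantiles of the F distribution; a quick sketch:

```r
# (1 - alpha) quantile of the F distribution = upper-tail critical value
qf(0.95, df1 = 3, df2 = 120)   # 2.68, the tabulated value
qf(0.95, df1 = 3, df2 = 116)   # the exact critical value, slightly above 2.68
```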
To simplify the manual calculations, launch RStudio, create a new project and script,
and name them t6e1. Import the data saved in the t6e1 Excel data file to RStudio in the
usual way, just make sure that in the Import Options section of the Import Excel Data
dialogue window (see next page) you name the data frame t6e1_wide.2
1 Note that this alternative hypothesis is not equivalent to μ1 ≠ μ2 ≠ μ3 ≠ μ4. This latter statement is stronger
than HA because, for example, it excludes the possibility that μ1 and μ2 are equal, while under HA they can be
equal.
2 So far, for the sake of simplicity, we have always used the name of the Excel data file for the R data frame.
You will soon understand why we use a different name for the R data frame this time.
Generate the basic descriptive statistics for the four samples by executing the following
commands:
library(pastecs)
round(stat.desc(t6e1_wide, basic = FALSE, desc = TRUE, norm = TRUE, p = 0.95), 3)
Since the sample sizes are equal, the grand mean is the average of the four sample
means:
x̄ = (Σ_{j=1..k} x̄_j) / k = (90.167 + 95.767 + 106.833 + 111.167) / 4 = 100.984
The sum of squares for treatments can be calculated from the sample means and the
grand mean:
SST = Σ_{j=1..k} n_j (x̄_j − x̄)²
    = 30 × [(90.167 − 100.984)² + (95.767 − 100.984)² + (106.833 − 100.984)² + (111.167 − 100.984)²]
    = 8463.866
The sum of squares for errors can be obtained from the sample variances:
SSE = Σ_{j=1..k} Σ_{i=1..nj} (x_ij − x̄_j)² = Σ_{j=1..k} (n_j − 1) s_j² = 111479.857

The mean squares and the observed test statistic are

MST = SST / (k − 1) = 8463.866 / 3 = 2821.289

MSE = SSE / (n − k) = 111479.857 / 116 = 961.033

Fobs = MST / MSE = 2821.289 / 961.033 = 2.936
Fobs > 2.68, so we reject H0 and conclude at the 5% significance level that it does not
take the same time on average to fill out the four different forms.
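As an aside, instead of comparing Fobs with a tabulated critical value, the exact p-value can be computed with R's pf function (the F cumulative distribution function); a quick sketch:

```r
# p-value of the observed ANOVA F statistic: P(F(3, 116) > Fobs)
Fobs <- 2821.289 / 961.033                        # about 2.936
p <- pf(Fobs, df1 = 3, df2 = 116, lower.tail = FALSE)
p                                                 # below 0.05, so H0 is rejected
```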
Before we reproduce these results in R, open the t6e1 Excel file. It has two sheets, Wide
and Long. They contain the same data but in different formats. On the Wide sheet, there
are four columns, one for each treatment (Form1, Form2, Form3, Form4), to record the
four samples of the time (Time) required to fill in the forms. On the Long sheet, there are
only two columns. The first column is for the type of the Form and the second for Time.
Form is the treatment variable. It has four possible values, and if you scroll down you
can see that each value is repeated 30 times. The second column contains the
variable of interest, Time; it holds the four columns of times from the Wide sheet (from
row 2 to row 31) stacked on top of each other.
When you imported the data from the t6e1 Excel file, by default, RStudio opened the
first sheet in the file, i.e., the Wide sheet. If you check your Environment tab, you can
see that the data frame contains 4 numeric vectors.
However, to perform ANOVA in R, the data set needs to be arranged in long format, like
on the Long sheet of t6e1.
In general, a set of data arranged in a table can be in wide (unstacked) format or in long
(stacked) format. In the wide format, every data point is recorded in a single row and the
columns hold the values of various attributes, while in the long format each data point is
represented by as many rows as the number of different sets of attributes and each row
contains the values of one set of attributes for the given data point. When the data set has
2 or 3 variables, the wide format is more compact and hence it is preferred for display
purposes. However, when there are more variables, the long format is more convenient.
Import the data on that sheet to RStudio and name the new data frame t6e1_long.
In R, one-way ANOVA can be performed with the

aov(formula)

function,
where the formula argument specifies the statistical model that we intend to analyse. In
this example, we want to compare the times required to fill in the forms, so Time is the
‘dependent’ variable, Form is the independent variable, and the appropriate formula is
Time ~ Form. The output of aov is quite succinct. To obtain a more meaningful printout,
it is better to combine it with the summary function that we already used in Tutorial 3.
Execute
summary(aov(Time ~ Form))
What if we did not have the data set in long format? One option is to reshape the data
set from wide to long in Excel before importing it to RStudio, just as I did myself.
However, this can be time consuming when the data set is large. Alternatively, we can
import the data in wide format and convert it to long format in R before using the aov()
function.
To illustrate this option, stack the four vectors of time into one vector, called minutes, by
executing
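The stacking can be done with the c() function; a sketch, assuming the wide data frame t6e1_wide with columns Form1, …, Form4 (a small synthetic stand-in is used here so the snippet runs on its own; with the real data frame, only the c(...) line is needed):

```r
# Synthetic stand-in for the imported wide data frame (the real one has 30 rows)
t6e1_wide <- data.frame(Form1 = c(23, 35), Form2 = c(31, 26),
                        Form3 = c(40, 28), Form4 = c(32, 37))

# Stack the four columns into a single vector, called minutes
minutes <- c(t6e1_wide$Form1, t6e1_wide$Form2,
             t6e1_wide$Form3, t6e1_wide$Form4)

length(minutes)   # 4 times the number of rows; 120 with the real data
```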
The new vector, minutes, has 120 elements and is identical to Time in the t6e1_long
data frame. On its own, however, it is insufficient, because by stacking the four original
vectors into one we lost a crucial piece of information, namely which tax return form
each observation belongs to. For this reason, just like in the t6e1_long data frame, minutes
must be complemented with a second variable that categorises the observations and stores
the group memberships as levels. Such a qualitative or categorical variable is called a factor in R.
Such a factor can be generated with the

gl(n, k, length, labels)

function, where n is the number of factor levels, k is the number of replications in a row,
length is the required length of the resulting factor, and labels (optional) contains the
labels of the factor levels.
In this case, n = 4, k = 30, length = 120, labels = c("Form1", "Form2", "Form3", "Form4"),
so execute
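With the values above, the command would look like the following sketch:

```r
# Four levels, each repeated 30 times in a row, 120 elements in total
forms <- gl(4, 30, length = 120,
            labels = c("Form1", "Form2", "Form3", "Form4"))

table(forms)   # each level occurs 30 times
```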
Type forms in the Console and press Enter to verify that we managed to replicate Form
in the t6e1_long data frame.
Now, execute
summary(aov(minutes ~ forms))
to obtain
Apart from the different variable names (Form versus forms), this ANOVA table is the
same as the one on the previous page.
(b) What are the required conditions for the test conducted in part (a)?
As mentioned on page 1, the ANOVA F-test is based on five assumptions: (i) the data
set constitutes k independent random samples of independent observations drawn from
k (sub-) populations; (ii) the variable of interest is quantitative and continuous; (iii) the
measurement scale is interval or ratio; (iv) each (sub-) population is normally distributed
and (v) has the same variance (i.e., they are homoskedastic).
(c) Does it appear that the required conditions of the tests in part (a) are satisfied?
Independence of the samples is not a testable requirement; we just take it for granted. The
amount of time (in minutes) taken by each person to complete the task is a quantitative
variable measured on a ratio scale. The descriptive statistics and the Shapiro-Wilk test
results obtained earlier do not challenge normality.3
As regards the last requirement, homoskedasticity, you learnt in the week 4 lectures and
in the previous tutorial how to test whether two population variances are equal with an
F-test. A generalisation of this test to the equality of two or more population variances
is Levene's test. The hypotheses are

H0: σ1² = σ2² = … = σk² and HA: not all population variances are the same.

In R, Levene's test can be performed with the

leveneTest(formula)

function,
where the formula argument is like in aov.4 This function is part of the car package, so
install this package if you do not have it yet, load it
install.packages("car")
library(car)
and then execute
leveneTest(minutes ~ forms)
It returns
3 We do not discuss this issue here in detail because by now you should be able to check normality on your
own. Remember though that in the assignments and in the final exam you cannot simply assume normality;
you need to verify it using the checks you learnt about.
4 We do not discuss the details of Levene's test, and you are not expected to be able to perform it manually.
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   3  0.0833  0.969
      116
As you can see, the numbers of degrees of freedom are the same as on the previous
ANOVA printout. The test statistic value is 0.0833 and the p-value (Pr(>F)) is 0.969, so
there is no reason to question the homoskedasticity assumption. This implies that the
ANOVA F-test is appropriate this time.
But, what if Levene’s test rejected homoskedasticity? In that case, we should not rely
on the ANOVA F-test, but should use Welch’s F-test, which is a generalization of
Welch’s t-test for two or more independent samples. This test requires independence
and normality, just like the ANOVA F-test, but it allows unequal variances (called
heteroskedasticity). In R, it can be done with the
oneway.test(formula, var.equal = )
command, where formula is like before and var.equal is a logical variable indicating
whether to treat the variances in the samples as equal (it is FALSE by default).
Hence, executing

oneway.test(minutes ~ forms, var.equal = TRUE)

reproduces the ANOVA F-test: apart from the different numbers of decimals, the F test
statistic and the p-value on this printout are the same as on the aov printout. To run the
Welch test, we need to drop the var.equal argument.5
oneway.test(minutes ~ forms)
returns
Compared to the ANOVA F-test, the Welch F-test statistic is a bit smaller (2.8057) and
the corresponding p-value is a bit larger (0.04661), but H0 is still rejected at the 5%
5 We could use var.equal = FALSE, but it is unnecessary as this is the default option.
significance level. Therefore, it does not matter this time whether the variances are equal
or not.
What if we reduced the significance level to 4%? At this level the ANOVA F-test rejects
the null hypothesis, but the Welch F-test does not. Because of the Levene's test result,
we had better rely on the former test, so at the 4% significance level we still conclude that it
does not take the same time on average to fill out the four different forms.
When the mean does not exist or the sampled populations are clearly not normally
distributed, we should use neither the ANOVA F-test nor the Welch F-test, but some
nonparametric test instead. The nonparametric counterpart of these tests is the Kruskal-
Wallis test, a generalization of the Wilcoxon rank-sum test to two or more (sub-) populations.
The Kruskal-Wallis test is a one-way ANOVA test for the equality of k ≥ 2 (sub-) population
medians based on the ranks of the observations in the pooled set of k independent samples,
one from each (sub-) population. It is based on the following assumptions: (i) the data set
constitutes k independent random samples of independent observations; (ii) the underlying
variable of interest is continuous; (iii) the measurement scale is at least ordinal; and (iv) the
(sub-) population distributions have similar shapes.
To perform this test, we need to rank all available observations from the smallest (1) to the
largest (n = n1 + n2 +…+ nk), averaging the ranks of tied observations, and calculate the sum
of the ranks assigned to the observations in each sample. The test statistic is
H = [12 / (n(n + 1))] × Σ_{j=1..k} (T_j² / n_j) − 3(n + 1)
where Tj is the sum of ranks assigned to the observations in the jth sample.
The sampling distribution of this test statistic is non-standard. The small-sample critical
values for k = 3, nj = 1,…,6 and k = 4, nj = 1,…,5 are provided in the Kruskal-Wallis table that
you can download from the subject website, while for larger samples (each nj at least 5) the
null distribution of H can be approximated with a chi-square distribution with k − 1 degrees
of freedom, χ²_{k−1}.
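Although with realistic sample sizes we would never do this test by hand, the formula above can be illustrated in R on a tiny made-up data set and checked against the built-in kruskal.test function; a sketch:

```r
x <- c(1.2, 3.4, 2.5, 5.1, 4.8, 6.0, 2.2, 3.9, 7.3)  # pooled observations
g <- gl(3, 3)                 # k = 3 samples of 3 observations each
r <- rank(x)                  # ranks 1, ..., n (no ties here)
n <- length(x)

Tj <- tapply(r, g, sum)       # rank sums per sample
nj <- tabulate(g)             # sample sizes

H <- 12 / (n * (n + 1)) * sum(Tj^2 / nj) - 3 * (n + 1)
H                                   # manual test statistic
kruskal.test(x, g)$statistic        # same value from the built-in function
```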
Exercise 2 (Selvanathan et al., p. 933, ex. 20.56)
This exercise is similar to the previous exercise, with one important difference. Namely, this
time the observed variable, response, is a qualitative variable measured on an ordinal scale.
For this reason, we cannot use a parametric test for the population means. Instead, we must
use a nonparametric procedure for the population medians. The null hypothesis is that there
is no difference among the four ads in terms of median believability, i.e., η1 = η2 = η3 = η4,
while the alternative hypothesis is that not all four population medians are the same.
The appropriate procedure is the Kruskal-Wallis test, granted that its requirements are met.
The data set consists of four independent samples of 100 independent observations each,
the underlying variable of interest is belief that can be thought of as a continuous variable,
and the measurement scale of the observed variable is ordinal. Hence, requirements (i), (ii)
and (iii) are satisfied. As for (iv), it can be checked by illustrating the data with four
histograms.
Launch RStudio, create a new project and script, name them t6e2, import the data saved in
the t6e2 Excel file6 and load it into your project. As you can see on the Environment tab,
there are two series in the data set: Response, which is the variable of interest, and Ad,
which is used to classify the observations according to the four advertisements.
Check whether the psych library is available on your computer. If it is not, install it.7
You can develop a histogram of Response for each of the four ads by executing the following
commands:
library(psych)
par(mfrow = c(2, 2))
for (i in c(1,2,3,4)) {hist(subset(Response, Ad == i))}
The first command, library(psych), loads the psych package in the active project. The
second command, par(mfrow), sets up a 2 x 2 plotting space to be able to view the four
histograms together in a single plot. The third command, for (i in c(1,2,3,4)), creates a for
loop that iterates 4 times. In each iteration i takes on the value of the corresponding element
of vector (1, 2, 3, 4) and a histogram is generated by the hist function for a subset of
Response defined by the Ad equals i restriction.
6 Before importing the data, check in Excel that it is in long format.
7 Click Tools / Install Packages … and write psych in the second box of the Install Packages dialogue window.
You should get the plot8 shown below. The four histograms look similar, so the fourth
requirement of the Kruskal-Wallis test is also satisfied.
Because of the large sample sizes, we do not do the Kruskal-Wallis test manually. The
relevant R function is
kruskal.test(x, g)

where x contains the observations on the variable of interest and g is the grouping variable.
In this case,
kruskal.test(Response, Ad)
returns
8 You might get an error message if you use a laptop with its native screen, as this combined plot might not fit
in the bottom-right Plots panel of RStudio. If that is the case, try to resize this panel to make it wider.
Kruskal-Wallis rank sum test
data: Response and Ad
Kruskal-Wallis chi-squared = 4.7766, df = 3, p-value = 0.1889
Under the null hypothesis the Kruskal-Wallis test statistic is distributed as a chi-square
random variable with k – 1 = 3 degrees of freedom. The p-value is 0.1889 > 0.05. Hence, at
the 5% significance level we maintain the null hypothesis and conclude that there are no
significant differences among the four ads in terms of believability.
Exercise 3
A farmer wants to know if the weight of parsley plants is influenced by using a fertilizer. He
selects 90 plants and randomly divides them into three groups of 30 plants each. He applies
a biological fertilizer to the first group, a chemical fertilizer to the second group and no
fertilizer at all to the third group. After a month he weighs all plants and saves the
measurements in the t6e3 Excel file.
Can we conclude from these data at the 5% significance level that fertilizer affects the weight
of parsley plants?
a) Obtain the basic descriptive statistics with R and then perform the ANOVA F-test
manually.

b) Perform the ANOVA F-test with R.

c) What are the required conditions for the tests in parts (a) and (b)? Do they seem to be
satisfied?

d) Perform the Welch F-test in R. Does it lead to the same conclusion as the ANOVA F-
test?

e) Perform the Kruskal-Wallis test in R (use α = 0.05). Does it lead to a different conclusion
than the parametric tests in parts (b) and (d)?