Analysis of Variance Sp 24-25

Chapter 12 discusses one-way ANOVA, a statistical method used to test the equality of means across three or more populations using the F-distribution as a test statistic. It outlines the assumptions required for ANOVA, the reasons for preferring it over multiple t-tests, and the steps involved in conducting an ANOVA test, including hypothesis formulation and interpretation of results. A mini case study illustrates the application of ANOVA in assessing passenger satisfaction across four airlines, demonstrating the calculation of the ANOVA table and the decision-making process based on the F-statistic.

Uploaded by

warriorsoul1029
Chapter 12: Analysis of Variance (ANOVA)-one way

ANOVA
1. ANOVA is used to test the equality of three or more
population means simultaneously.
2. The F-distribution is used as the test statistic.
3. Note: The F-distribution is continuous, cannot be negative and is
positively skewed. Further, like the normal distribution, there is a
family of F-distributions.

Assumptions:
1. The populations follow the normal distribution.
2. The populations have equal standard deviations (SD).
3. The populations are independent.

Why ANOVA? Why not use t-tests to compare more than 2 population means?
1. If we use the t-distribution to compare more than two
population means, the overall probability of a Type I error
becomes larger than the chosen significance level.
2. Consider 4 populations A, B, C & D with corresponding
means µ1, µ2, µ3 and µ4.
3. Using the t-distribution to compare the four population
means, we would have to conduct six different t-tests:
(i) µ1 vs µ2 (ii) µ1 vs µ3 (iii) µ1 vs µ4 (iv) µ2 vs µ3
(v) µ2 vs µ4 and (vi) µ3 vs µ4
4. For each t-test, suppose we choose α = 0.05,
i.e. Type I error: P(Rejecting H0 | H0 is true) = 0.05.
By the complement rule of probability,
P(Not rejecting H0 | H0 is true) = 1 - 0.05 = 0.95.
5. If the six tests are treated as independent, the
probability that all six tests result in correct decisions is:
P(All correct) = (0.95)(0.95)(0.95)(0.95)(0.95)(0.95) = 0.735
6. Thus P(at least one incorrect decision due to sampling) =
1 - 0.735 = 0.265
7. To summarize, if we conduct six independent tests using the
t-distribution, the likelihood of rejecting a true null
hypothesis because of sampling error is an unsatisfactory
0.265.
8. The ANOVA technique allows us to compare population
means simultaneously at a selected significance level. It
avoids the buildup of Type I error associated with testing
many hypotheses.
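
The arithmetic in points 3 to 6 can be checked with a short Python sketch. The function name familywise_error is only an illustrative choice, and the calculation assumes, as the notes do, that the six tests are independent.

```python
from math import comb


def familywise_error(num_tests: int, alpha: float) -> float:
    """Probability of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** num_tests


k = 4                     # populations A, B, C and D
alpha = 0.05              # significance level chosen for each t-test
num_tests = comb(k, 2)    # number of distinct pairs of means

p_all_correct = (1 - alpha) ** num_tests
p_any_error = familywise_error(num_tests, alpha)

print(num_tests)                  # 6
print(round(p_all_correct, 3))    # 0.735
print(round(p_any_error, 3))      # 0.265
```

With more populations the number of pairs grows quickly (10 pairs for k = 5), so the buildup of Type I error gets worse, which is the motivation for a single ANOVA test.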
ANOVA Steps
1. Formulation of H0 & H1:
H0: All population means are equal,
i.e. µ1 = µ2 = µ3 = … = µk (k is the number of treatments)
H1: Not all population means are equal (at least two of them
differ)
2. Select the level of significance, α = 0.05, 0.01, 0.10 etc.
3. Test statistic: F = MST/MSE, which follows the F-distribution
with (k-1) df in the numerator and (n-k) df in the denominator.
The critical value is F(α; k-1, n-k).
4. If F(cal) > F(tab), H0 is rejected; otherwise H0 is not rejected.
5. ANOVA table (we need to develop an ANOVA table to find the
F statistic)
ANOVA Table

Source of variation   df    SS (sum of squares)   MS (mean sum of squares)   F (cal)
Treatment (factor)    k-1   SST                   SST/(k-1) = MST            F = MST/MSE
Error                 n-k   SSE                   SSE/(n-k) = MSE
Total                 n-1   TSS                   TSS/(n-1) = TMS
Mini Case (Source-Text):

Recently airlines cut services, such as meals and snacks during flights, and started charging for checked
luggage. A group of four carriers hired Brunner Marketing Research Inc. to survey passengers regarding
their level of satisfaction with a recent flight. The survey included questions on ticketing, boarding, in-flight
service, baggage handling, pilot communication, and so forth. Twenty-five questions offered a range of
possible answers: excellent, good, fair, or poor. A response of excellent was given a score of 4, good a 3,
fair a 2, and poor a 1. These responses were then totaled, so the total score was an indication of the
satisfaction with the flight. The greater the score, the higher the level of satisfaction with the service. The
highest possible score was 100. Brunner randomly selected and surveyed passengers from the four
airlines. Following is the sample information. Is there a difference in the mean satisfaction level
among the four airlines? Use the .01 significance level. Use the software output to answer the question.

One-way ANOVA: Northern, WTA, Pocono, Branson

Source   DF   SS       MS      F      P
Factor    3    890.7   296.9   8.99   0.001
Error    18    594.4    33.0
Total    21   1485.1
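
As a quick sanity check, the MS and F columns of this output can be recomputed from the SS and DF columns. The sketch below uses only the numbers printed above; the helper name mean_squares_and_f is hypothetical and is not part of any statistical package.

```python
def mean_squares_and_f(ss_treat, ss_error, df_treat, df_error):
    """Return (MST, MSE, F) from the sums of squares and their df."""
    mst = ss_treat / df_treat
    mse = ss_error / df_error
    return mst, mse, mst / mse


# Values read from the software output (Factor and Error rows).
mst, mse, f_stat = mean_squares_and_f(890.7, 594.4, 3, 18)

print(round(mst, 1))      # 296.9
print(round(mse, 1))      # 33.0
print(round(f_stat, 2))   # 8.99

# The Total row should equal the sum of the Factor and Error rows.
assert abs((890.7 + 594.4) - 1485.1) < 1e-9
assert 3 + 18 == 21
```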

Solution (using statistical software output):

We will use the six-step hypothesis-testing procedure.

Step 1:

H0: μN = μW =μP = μB

H1: The mean scores are not all equal/ At least two mean scores are
not equal.

Step 2: Select the level of significance. Given, α= 0.01.

Step 3: Determine the test statistic, F.


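F = MST/MSE with (k-1), (n-k) df

The F-statistic can be read from the software output given above: F = 8.99.

The critical value for the F-statistic is found from the F-distribution table. With α = 0.01, 3 df in the numerator and 18 df in the denominator, the critical value (CV) for F is 5.09.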

Note: We will discuss in the next class how to find CV from table

Decision:

F(cal) = 8.99, F(tab) = 5.09. As F(cal) > F(tab), H0 is rejected.

Alternative decision rule (p-value):

As p = 0.001 < α = 0.01, the null hypothesis is rejected.

Interpretation: As the null hypothesis is rejected, we conclude that the mean scores for the four airlines are not all equal. That means there is a difference in at least one pair of mean scores. However, at this point we do not know which pair or how many pairs differ.

Developing the ANOVA table by hand (manually)

To complete the ANOVA table, we need to calculate the following elements:

TSS (Total sum of squares) = The sum of the squared differences between each observation and the overall mean:

$$TSS = \sum (x - \bar{x}_G)^2$$

Here, $x$ is an individual observation and $\bar{x}_G$ is the grand (overall) mean, where

$$\bar{x}_G = \frac{\sum x}{n}$$

SST (Sum of squares due to treatment) = The sum of the squared differences between each treatment mean and the grand mean, weighted by the number of observations in the treatment:

$$SST = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x}_G)^2$$

Here, $n_i$ is the number of observations for treatment $i$, $\bar{x}_i$ is the mean of treatment $i$ and $\bar{x}_G$ is the grand mean.

SSE (Error sum of squares/random) = The sum of the squared differences between each observation and its treatment mean:

$$SSE = \sum_{j=1}^{n_1} (x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2 + \dots + \sum_{j=1}^{n_k} (x_{kj} - \bar{x}_k)^2$$

Here, $x_{1j}, x_{2j}, \dots, x_{kj}$ denote the individual observations of treatment 1, treatment 2, ..., treatment k respectively, and $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$ denote the corresponding treatment means.

Note: TSS = SST + SSE
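
The three definitions, and the identity TSS = SST + SSE, can be sketched in Python as follows. This is a minimal illustration: the function name sums_of_squares and the toy scores are made up for this example and are not the airline data.

```python
from statistics import fmean


def sums_of_squares(groups):
    """Return (TSS, SST, SSE) computed directly from the definitions."""
    all_obs = [x for g in groups for x in g]
    grand_mean = fmean(all_obs)
    # TSS: every observation against the grand mean.
    tss = sum((x - grand_mean) ** 2 for x in all_obs)
    # SST: each treatment mean against the grand mean, weighted by n_i.
    sst = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # SSE: every observation against its own treatment mean.
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return tss, sst, sse


# Hypothetical toy data: three treatments with 3, 2 and 4 observations.
toy = [[1, 2, 3], [4, 6], [8, 9, 10, 11]]
tss, sst, sse = sums_of_squares(toy)

print(tss, sst, sse)   # 108.0 99.0 9.0
assert abs(tss - (sst + sse)) < 1e-9   # TSS = SST + SSE
```

Because of this identity, only two of the three sums need to be computed by hand; the third follows by subtraction.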

Calculations:

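$$\bar{x}_G = \frac{\sum x}{n} = \frac{94+90+85+80+\dots+65}{22} = \frac{1664}{22} = 75.64$$

$$\bar{x}_1 = \frac{\sum x_{1j}}{n_1} = \frac{94+90+85+80}{4} = \frac{349}{4} = 87.25$$

$$\bar{x}_2 = \frac{\sum x_{2j}}{n_2} = \frac{75+68+77+83+88}{5} = \frac{391}{5} = 78.2$$

$$\bar{x}_3 = \frac{\sum x_{3j}}{n_3} = \frac{70+73+76+78+80+68+65}{7} = \frac{510}{7} = 72.86$$

$$\bar{x}_4 = \frac{\sum x_{4j}}{n_4} = \frac{68+70+72+65+74+65}{6} = \frac{414}{6} = 69$$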

$$\begin{aligned}
TSS = \sum (x - \bar{x}_G)^2 &= (94-75.64)^2 + (90-75.64)^2 + (85-75.64)^2 + (80-75.64)^2 \\
&\quad + (75-75.64)^2 + (68-75.64)^2 + (77-75.64)^2 + (83-75.64)^2 + (88-75.64)^2 \\
&\quad + (70-75.64)^2 + (73-75.64)^2 + (76-75.64)^2 + (78-75.64)^2 + (80-75.64)^2 + (68-75.64)^2 + (65-75.64)^2 \\
&\quad + (68-75.64)^2 + (70-75.64)^2 + (72-75.64)^2 + (65-75.64)^2 + (74-75.64)^2 + (65-75.64)^2 \\
&= 1485.10
\end{aligned}$$
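The error sum of squares is

$$\begin{aligned}
SSE &= \sum_{j=1}^{n_1} (x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2 + \dots + \sum_{j=1}^{n_k} (x_{kj} - \bar{x}_k)^2 \\
&= (94-87.25)^2 + (90-87.25)^2 + (85-87.25)^2 + (80-87.25)^2 \\
&\quad + (75-78.2)^2 + (68-78.2)^2 + (77-78.2)^2 + (83-78.2)^2 + (88-78.2)^2 \\
&\quad + (70-72.86)^2 + (73-72.86)^2 + (76-72.86)^2 + (78-72.86)^2 + (80-72.86)^2 + (68-72.86)^2 + (65-72.86)^2 \\
&\quad + (68-69)^2 + (70-69)^2 + (72-69)^2 + (65-69)^2 + (74-69)^2 + (65-69)^2 \\
&= 594.41
\end{aligned}$$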
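The treatment sum of squares is

$$\begin{aligned}
SST &= \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x}_G)^2 \\
&= 4(87.25-75.64)^2 + 5(78.2-75.64)^2 + 7(72.86-75.64)^2 + 6(69-75.64)^2 \\
&= 539.1684 + 32.768 + 54.0988 + 264.5376
\end{aligned}$$

With the rounded means shown, these four terms add up to 890.57. Carrying the means at full precision gives 890.68, which agrees with the software output (890.7) and with TSS - SSE = 1485.10 - 594.41 = 890.69. The value SST = 890.69 is used in the ANOVA table.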

Source of variation   df                SS (sum of squares)   MS (mean sum of squares)              F (cal)
Treatment (factor)    k-1 = 4-1 = 3     SST = 890.69          MST = SST/(k-1) = 890.69/3 = 296.90   F = MST/MSE = 296.90/33.02 = 8.99
Error                 n-k = 22-4 = 18   SSE = 594.41          MSE = SSE/(n-k) = 594.41/18 = 33.02
Total                 n-1 = 22-1 = 21   TSS = 1485.10
Decision: F(cal) = 8.99, F(0.01; 3, 18) = 5.09. Since F(cal) > F(tab), H0 is rejected.
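
The whole hand calculation can be reproduced with a short Python sketch. Two points are assumptions rather than part of the notes: the scores are grouped as treatments 1 to 4 exactly as they appear in the mean calculations, and the sums of squares use the algebraically equivalent shortcut forms (sums of squared scores and squared treatment totals) instead of the deviation formulas. The critical value 5.09 is taken from the text, not computed.

```python
# Satisfaction scores, grouped as in the treatment-mean calculations.
scores = {
    "treatment 1": [94, 90, 85, 80],
    "treatment 2": [75, 68, 77, 83, 88],
    "treatment 3": [70, 73, 76, 78, 80, 68, 65],
    "treatment 4": [68, 70, 72, 65, 74, 65],
}
F_CRITICAL = 5.09   # F(0.01; 3, 18), quoted from the text

groups = list(scores.values())
k = len(groups)                          # number of treatments
n = sum(len(g) for g in groups)          # total number of observations
grand_total = sum(sum(g) for g in groups)
correction = grand_total ** 2 / n        # (sum of all x)^2 / n

sum_sq = sum(x * x for g in groups for x in g)
between = sum(sum(g) ** 2 / len(g) for g in groups)

tss = sum_sq - correction    # total sum of squares
sst = between - correction   # treatment sum of squares
sse = sum_sq - between       # error sum of squares

mst = sst / (k - 1)
mse = sse / (n - k)
f_stat = mst / mse
reject_h0 = f_stat > F_CRITICAL

print(f"Treatment  df={k - 1:<3} SS={sst:8.2f}  MS={mst:7.2f}  F={f_stat:.2f}")
print(f"Error      df={n - k:<3} SS={sse:8.2f}  MS={mse:7.2f}")
print(f"Total      df={n - 1:<3} SS={tss:8.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")   # Reject H0
```

Values computed this way can differ from the hand-calculated table in the second decimal place, because the hand calculation rounds the means before squaring; the F statistic still rounds to 8.99.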

Note: TSS=Total Sum of Square, SSE= Error Sum of Square, SST=Treatment Sum of
Square, MST=Mean Sum of Square due to Treatment, MSE=Mean Sum of Square due
to Error
Special Note: Since H0 is rejected, not all population means are equal. At this stage
we can use the t-distribution to identify which pairs are not equal. The following formula is
used to do so.
