Analysis of Variance Sp 24-25
ANOVA
1. ANOVA is used to test the equality of three or more population
means simultaneously.
2. The test statistic follows the F-distribution.
3. Note: The F-distribution is continuous, cannot be negative, and is
positively skewed. Further, like the normal distribution, there is a
family of F-distributions, one for each pair of degrees of freedom.
Assumptions:
1. The populations follow the normal distribution.
2. The populations have equal standard deviations (SD).
3. The populations are independent.
Why ANOVA? Why not use t-tests to compare more than two population
means?
1. If we use the t-distribution to compare more than two
population means pairwise, the overall probability of a Type I error
becomes larger than the chosen significance level.
2. Let us consider four populations A, B, C and D with
corresponding means µ1, µ2, µ3 and µ4.
3. Using the t-distribution to compare the four population
means, we would have to conduct six different t-tests:
(i) µ1 vs µ2 (ii) µ1 vs µ3 (iii) µ1 vs µ4 (iv) µ2 vs µ3 (v) µ2 vs µ4
(vi) µ3 vs µ4
4. For each t-test, suppose we choose α = .05,
i.e. Type I error: P(Rejecting H0 | H0 is true) = 0.05.
By the complement rule of probability,
P(Not rejecting H0 | H0 is true) = 1 − 0.05 = 0.95.
5. Because we conduct six separate tests (treated here as independent), the
probability that all six tests result in correct decisions is:
P(All correct) = (.95)(.95)(.95)(.95)(.95)(.95) = 0.735
6. Thus P(at least one incorrect decision due to sampling) = 1 − 0.735 = 0.265
7. To summarize, if we conduct six independent tests using the
t-distribution, the likelihood of rejecting at least one true null
hypothesis because of sampling error is an unsatisfactory
0.265.
8. The ANOVA technique allows us to compare population
means simultaneously at a selected significance level. It
avoids the buildup of Type I error associated with testing
many hypotheses.
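The arithmetic in points 3 to 6 can be reproduced with a short sketch. It keeps the notes' simplifying assumption that the pairwise tests are independent; the function name `familywise_error` is illustrative only.

```python
from math import comb

def familywise_error(k: int, alpha: float) -> tuple[int, float]:
    """Return (number of pairwise tests, P(at least one Type I error)),
    assuming the pairwise tests are independent (the notes' simplification)."""
    m = comb(k, 2)                    # number of distinct pairs among k means
    p_all_correct = (1 - alpha) ** m  # every test avoids a Type I error
    return m, 1 - p_all_correct

m, fwe = familywise_error(k=4, alpha=0.05)
print(m, round(fwe, 3))  # prints: 6 0.265
```

With more populations the number of pairs, and with it the error rate, grows quickly, which is the buildup that a single ANOVA F-test avoids.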
ANOVA Steps
1. Formulation of H0 & H1:
H0: All population means are equal,
i.e. µ1 = µ2 = µ3 = … = µk (k is the number of
treatments)
H1: Not all population means are equal (at least two of them
differ)
2. Select the level of significance, e.g. α = 0.05, 0.01 or 0.10.
3. Test statistic: $F = \frac{MST}{MSE}$, which follows the F-distribution with
k−1 degrees of freedom (df) in the numerator and n−k df in the denominator,
where n is the total number of observations. The critical value is $F_{\alpha,\,k-1,\,n-k}$.
4. If F(cal) > F(tab), H0 is rejected; otherwise H0 is not rejected.
5. ANOVA table (we need to develop an ANOVA table to find the
F statistic)
ANOVA Table
Recently airlines cut services, such as meals and snacks during flights, and started charging for checked
luggage. A group of four carriers hired Brunner Marketing Research Inc. to survey passengers regarding
their level of satisfaction with a recent flight. The survey included questions on ticketing, boarding, in-flight
service, baggage handling, pilot communication, and so forth. Twenty-five questions offered a range of
possible answers: excellent, good, fair, or poor. A response of excellent was given a score of 4, good a 3,
fair a 2, and poor a 1. These responses were then totaled, so the total score was an indication of the
satisfaction with the flight. The greater the score, the higher the level of satisfaction with the service. The
highest possible score was 100. Brunner randomly selected and surveyed passengers from the four
airlines. Following is the sample information. Is there a difference in the mean satisfaction level
among the four airlines? Use the .01 significance level. Use the software output to answer the question.
Source    DF      SS       MS      F       P
Factor     3    890.7    296.9    8.99    0.001
Error     18    594.4     33.0
Total     21   1485.1
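As a check on the software output, the MS and F columns follow from the SS and DF columns: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A minimal sketch using the table's values (the variable names are illustrative):

```python
# Values copied from the ANOVA table above
ss_factor, df_factor = 890.7, 3    # treatment (between-airline) variation
ss_error, df_error = 594.4, 18     # error (within-airline) variation

mst = ss_factor / df_factor        # mean square for treatments
mse = ss_error / df_error          # mean square for error
f_stat = mst / mse                 # test statistic F = MST / MSE

print(round(mst, 1), round(mse, 1), round(f_stat, 2))  # prints: 296.9 33.0 8.99
```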
Step 1:
H0: μN = μW = μP = μB
H1: The mean scores are not all equal (at least two mean scores are
not equal).
The critical value (CV) for the F-statistic is found from the F-distribution table
with 3 and 18 degrees of freedom at the .01 significance level. The CV
for F is 5.09.
Note: We will discuss in the next class how to find CV from table
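Decision: Since F(cal) = 8.99 > F(tab) = 5.09, H0 is rejected at the .01
significance level; the mean satisfaction scores of the four airlines are not all equal.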
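The comparison of the computed F with the critical value can also be checked numerically. The sketch below is not the table lookup used in class: it integrates the F density with Simpson's rule using only the standard library, and the helper names `f_pdf` and `f_cdf` are hypothetical. It assumes a numerator df greater than 2, as in this example.

```python
import math

def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F-distribution with (d1, d2) degrees of freedom.
    Returns 0 at x <= 0, which is correct at x = 0 only when d1 > 2 (as here)."""
    if x <= 0:
        return 0.0
    a, b = d1 / 2, d2 / 2
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_pdf = (a * math.log(d1 / d2) + (a - 1) * math.log(x)
               - (a + b) * math.log1p(d1 * x / d2) - log_beta)
    return math.exp(log_pdf)

def f_cdf(x: float, d1: int, d2: int, steps: int = 20000) -> float:
    """P(F <= x) by composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

p_value = 1 - f_cdf(8.99, 3, 18)     # area to the right of the computed F
area_below_cv = f_cdf(5.09, 3, 18)   # should be close to 1 - 0.01 = 0.99
reject_h0 = 8.99 > 5.09              # decision rule: F(cal) > F(tab)
print(p_value < 0.01, round(area_below_cv, 3), reject_h0)
```

Both routes lead to the same decision: the computed F lies beyond the critical value, which is equivalent to the p-value being smaller than α = .01.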
Here, x is an individual observation and $\bar{x}_G$ is the grand (overall) mean, where
$$\bar{x}_G = \frac{\sum x}{n}$$
SST (sum of squares due to treatment) = the sum of the squared
differences between each treatment mean and the grand mean, weighted by the
treatment sample size:
$$SST = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x}_G)^2$$
Here, $x_{1j}$, $x_{2j}$ and $x_{3j}$ denote the individual observations of treatment 1,
treatment 2 and treatment 3 respectively, and $\bar{x}_1$, $\bar{x}_2$ and $\bar{x}_3$ denote the means of
treatment 1, treatment 2 and treatment 3 respectively.
Note: TSS = SST + SSE
Calculations:
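$$\bar{x}_G = \frac{\sum x}{n} = \frac{94+90+85+80+\dots+65}{22} = \frac{1664}{22} = 75.64$$
$$\bar{x}_1 = \frac{\sum_j x_{1j}}{n_1} = \frac{94+90+85+80}{4} = \frac{349}{4} = 87.25$$
$$\bar{x}_2 = \frac{\sum_j x_{2j}}{n_2} = \frac{75+68+77+83+88}{5} = \frac{391}{5} = 78.2$$
$$\bar{x}_3 = \frac{\sum_j x_{3j}}{n_3} = \frac{70+73+76+78+80+68+65}{7} = \frac{510}{7} = 72.86$$
$$\bar{x}_4 = \frac{\sum_j x_{4j}}{n_4} = \frac{68+70+72+65+74+65}{6} = \frac{414}{6} = 69$$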
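The means above can be reproduced with a few lines of code. This is a minimal sketch: the scores are copied from the calculations, and the groups are keyed by treatment number (1 to 4) rather than by airline name.

```python
# Satisfaction scores as they appear in the calculations above,
# keyed by treatment number (1-4).
scores = {
    1: [94, 90, 85, 80],
    2: [75, 68, 77, 83, 88],
    3: [70, 73, 76, 78, 80, 68, 65],
    4: [68, 70, 72, 65, 74, 65],
}

all_obs = [x for group in scores.values() for x in group]
grand_mean = sum(all_obs) / len(all_obs)                       # sum(x) / n
group_means = {i: sum(g) / len(g) for i, g in scores.items()}  # one mean per treatment

print(len(all_obs), sum(all_obs), round(grand_mean, 2))  # prints: 22 1664 75.64
for i, mean in group_means.items():
    print(i, len(scores[i]), round(mean, 2))
```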
$$
\begin{aligned}
TSS = \sum (x-\bar{x}_G)^2 ={}& (94-75.64)^2+(90-75.64)^2+(85-75.64)^2+(80-75.64)^2 \\
&+(75-75.64)^2+(68-75.64)^2+(77-75.64)^2+(83-75.64)^2+(88-75.64)^2 \\
&+(70-75.64)^2+(73-75.64)^2+(76-75.64)^2+(78-75.64)^2+(80-75.64)^2+(68-75.64)^2+(65-75.64)^2 \\
&+(68-75.64)^2+(70-75.64)^2+(72-75.64)^2+(65-75.64)^2+(74-75.64)^2+(65-75.64)^2 \\
={}& 1485.10
\end{aligned}
$$
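The Error SS is
$$
\begin{aligned}
SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2 ={}& (94-87.25)^2+(90-87.25)^2+(85-87.25)^2+(80-87.25)^2 \\
&+(75-78.2)^2+(68-78.2)^2+(77-78.2)^2+(83-78.2)^2+(88-78.2)^2 \\
&+(70-72.86)^2+(73-72.86)^2+(76-72.86)^2+(78-72.86)^2+(80-72.86)^2+(68-72.86)^2+(65-72.86)^2 \\
&+(68-69)^2+(70-69)^2+(72-69)^2+(65-69)^2+(74-69)^2+(65-69)^2 \\
={}& 594.41
\end{aligned}
$$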
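The Treatment SS is
$$
\begin{aligned}
SST = \sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x}_G)^2 ={}& 4(87.25-75.64)^2+5(78.2-75.64)^2+7(72.86-75.64)^2+6(69-75.64)^2 \\
={}& 539.1684+32.768+54.0988+264.5376 \\
={}& 890.57
\end{aligned}
$$
The rounded means give 890.57. With unrounded means SST = 890.68, which agrees up to
rounding with TSS − SSE = 1485.10 − 594.41 = 890.69 and with the 890.7 in the software output.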
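The three sums of squares and the resulting F statistic can be computed in one pass. The sketch below is a plain-Python illustration, not the software that produced the ANOVA table; the function name `one_way_anova` is hypothetical. It works with unrounded means, so the last digits can differ slightly from the hand calculation.

```python
scores = [
    [94, 90, 85, 80],              # treatment 1
    [75, 68, 77, 83, 88],          # treatment 2
    [70, 73, 76, 78, 80, 68, 65],  # treatment 3
    [68, 70, 72, 65, 74, 65],      # treatment 4
]

def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of samples."""
    n = sum(len(g) for g in groups)        # total number of observations
    k = len(groups)                        # number of treatments
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    tss = sum((x - grand) ** 2 for g in groups for x in g)
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    mst = sst / (k - 1)                    # numerator df = k - 1
    mse = sse / (n - k)                    # denominator df = n - k
    return {"TSS": tss, "SSE": sse, "SST": sst,
            "MST": mst, "MSE": mse, "F": mst / mse}

result = one_way_anova(scores)
for name, value in result.items():
    print(name, round(value, 2))
```

The output satisfies TSS = SST + SSE up to floating-point error, and the F value rounds to the 8.99 reported in the ANOVA table.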
Note: TSS = Total Sum of Squares, SSE = Error Sum of Squares, SST = Treatment Sum of
Squares, MST = Mean Square due to Treatment, MSE = Mean Square due
to Error
Special Note: Since H0 is rejected, not all population means are equal. At this stage
we can use the t-distribution to identify which pairs of means differ. The following formula is
used to do so.