
Chi-Square Tests and the

F-Distribution
Multinomial Experiments
A multinomial experiment is a probability experiment consisting of
a fixed number of trials in which there are more than two possible
outcomes for each independent trial. (Unlike the binomial
experiment in which there were only two possible outcomes.)

Example:
A researcher claims that the distribution of favorite pizza toppings
among teenagers is as shown below.
Topping     Frequency, f
Cheese      41%
Pepperoni   25%
Sausage     15%
Mushrooms   10%
Onions      9%

Each outcome is classified into categories, and the probability for each
possible outcome is fixed.
Larson & Farber, Elementary Statistics: Picturing the World, 3e 2
Chi-Square Goodness-of-Fit Test
A Chi-Square Goodness-of-Fit Test is used to test whether a frequency
distribution fits an expected distribution.
To calculate the test statistic for the chi-square goodness-of-fit test, the
observed frequencies and the expected frequencies are used.
The observed frequency O of a category is the frequency for the category observed in the sample data.
The expected frequency E of a category is the calculated frequency for the category. Expected frequencies are obtained assuming the specified (or hypothesized) distribution. The expected frequency for
the ith category is
Ei = npi
where n is the number of trials (the sample size) and pi is the assumed probability of the ith category.



Observed and Expected Frequencies
Example:
200 teenagers are randomly selected and asked what their favorite
pizza topping is. The results are shown below.
Find the observed frequencies and the expected frequencies.

Topping     Results (n = 200)   % of teenagers   Observed Frequency   Expected Frequency
Cheese      78                  41%              78                   200(0.41) = 82
Pepperoni   52                  25%              52                   200(0.25) = 50
Sausage     30                  15%              30                   200(0.15) = 30
Mushrooms   25                  10%              25                   200(0.10) = 20
Onions      15                  9%               15                   200(0.09) = 18
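The expected-frequency rule Ei = n pi used in the table above can be checked with a few lines of Python (a minimal sketch using only the sample size and claimed percentages from the example, no libraries):

```python
# Expected frequencies under the claimed distribution: E_i = n * p_i.
n = 200  # sample size from the example
claimed = {"Cheese": 0.41, "Pepperoni": 0.25, "Sausage": 0.15,
           "Mushrooms": 0.10, "Onions": 0.09}

expected = {topping: n * p for topping, p in claimed.items()}
print(expected)
# {'Cheese': 82.0, 'Pepperoni': 50.0, 'Sausage': 30.0, 'Mushrooms': 20.0, 'Onions': 18.0}
```

Note that the expected frequencies sum to n, since the claimed probabilities sum to 1.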



Chi-Square Goodness-of-Fit Test
For the chi-square goodness-of-fit test to be used, the following must be true.

1. The observed frequencies must be obtained by using a random sample.


2. Each expected frequency must be greater than or equal to 5.

The Chi-Square Goodness-of-Fit Test


If the conditions listed above are satisfied, then the sampling distribution for the
goodness-of-fit test is approximated by a chi-square distribution with k – 1
degrees of freedom, where k is the number of categories. The test statistic for the
chi-square goodness-of-fit test is

    χ² = Σ (O − E)² / E

where O represents the observed frequency of each category and E represents the
expected frequency of each category. The test is always a right-tailed test.



Chi-Square Goodness-of-Fit Test
Performing a Chi-Square Goodness-of-Fit Test
In Words In Symbols
1. Identify the claim. State the null and State H0 and Ha.
alternative hypotheses.

2. Specify the level of significance. Identify α.

3. Identify the degrees of freedom. d.f. = k – 1

4. Determine the critical value. Use Table 6 in


Appendix B.
5. Determine the rejection region.

Chi-Square Goodness-of-Fit Test
Performing a Chi-Square Goodness-of-Fit Test
In Words In Symbols
6. Calculate the test statistic.    χ² = Σ (O − E)² / E

7. Make a decision to reject or fail to If χ2 is in the rejection


reject the null hypothesis. region, reject H0.
Otherwise, fail to reject
H0.
8. Interpret the decision in the context
of the original claim.



Chi-Square Goodness-of-Fit Test
Example:
A researcher claims that the distribution of favorite pizza toppings
among teenagers is as shown below. 200 randomly selected
teenagers are surveyed.
Topping     Frequency, f
Cheese      41%
Pepperoni   25%
Sausage     15%
Mushrooms   10%
Onions      9%

Using α = 0.01, and the observed and expected values previously
calculated, test the surveyor's claim using a chi-square
goodness-of-fit test.
Chi-Square Goodness-of-Fit Test
Example continued:
H0: The distribution of pizza toppings is 41% cheese, 25%
    pepperoni, 15% sausage, 10% mushrooms, and 9%
    onions. (Claim)
Ha: The distribution of pizza toppings differs from the claimed
    or expected distribution.

Because there are 5 categories, the chi-square distribution has
k – 1 = 5 – 1 = 4 degrees of freedom.

With d.f. = 4 and α = 0.01, the critical value is χ²₀ = 13.277.

Chi-Square Goodness-of-Fit Test
Example continued:
Topping     Observed Frequency   Expected Frequency
Cheese      78                   82
Pepperoni   52                   50
Sausage     30                   30
Mushrooms   25                   20
Onions      15                   18

[Figure: chi-square curve with the rejection region (α = 0.01) to the right of χ²₀ = 13.277]

χ² = Σ (O − E)²/E
   = (78 − 82)²/82 + (52 − 50)²/50 + (30 − 30)²/30 + (25 − 20)²/20 + (15 − 18)²/18
   ≈ 2.025
Fail to reject H0.
There is not enough evidence at the 1% level to reject the
surveyor’s claim.
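The χ² computation in this example can be reproduced with a short pure-Python sketch (the critical value 13.277 is taken from the slide's Table 6 lookup):

```python
# Chi-square goodness-of-fit statistic: chi2 = sum of (O - E)^2 / E.
observed = [78, 52, 30, 25, 15]   # cheese, pepperoni, sausage, mushrooms, onions
expected = [82, 50, 30, 20, 18]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 3))             # 2.025

critical = 13.277                 # chi-square critical value, d.f. = 4, alpha = 0.01
print(chi2 < critical)            # True -> fail to reject H0
```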
Independence
Contingency Tables
An r × c contingency table shows the observed frequencies for
two variables. The observed frequencies are arranged in r rows and
c columns. The intersection of a row and a column is called a cell.

The following contingency table shows a random sample of 321


fatally injured passenger vehicle drivers by age and gender.
(Adapted from Insurance Institute for Highway Safety)

Age
Gender 16 – 20 21 – 30 31 – 40 41 – 50 51 – 60 61 and older
Male 32 51 52 43 28 10
Female 13 22 33 21 10 6



Expected Frequency
Assuming the two variables are independent, you can use the
contingency table to find the expected frequency for each cell.

Finding the Expected Frequency for Contingency Table Cells


The expected frequency for a cell Er,c in a contingency table is

    Expected frequency Er,c = (Sum of row r) × (Sum of column c) / Sample size.



Expected Frequency
Example:
Find the expected frequency for each “Male” cell in the contingency table
for the sample of 321 fatally injured drivers. Assume that the variables,
age and gender, are independent.
Age
Gender   16 – 20   21 – 30   31 – 40   41 – 50   51 – 60   61 and older   Total
Male     32        51        52        43        28        10             216
Female   13        22        33        21        10        6              105
Total    45        73        85        64        38        16             321

Expected Frequency
Example continued:
Age
Gender   16 – 20   21 – 30   31 – 40   41 – 50   51 – 60   61 and older   Total
Male     32        51        52        43        28        10             216
Female   13        22        33        21        10        6              105
Total    45        73        85        64        38        16             321

Expected frequency Er,c = (Sum of row r) × (Sum of column c) / Sample size

E1,1 = (216 × 45)/321 ≈ 30.28    E1,2 = (216 × 73)/321 ≈ 49.12    E1,3 = (216 × 85)/321 ≈ 57.20
E1,4 = (216 × 64)/321 ≈ 43.07    E1,5 = (216 × 38)/321 ≈ 25.57    E1,6 = (216 × 16)/321 ≈ 10.77
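The six "Male" expected frequencies can be reproduced from the marginal totals with a minimal Python sketch (the counts are taken directly from the contingency table):

```python
# Expected cell frequency under independence: E = (row total) * (column total) / n.
male   = [32, 51, 52, 43, 28, 10]   # age groups 16-20 ... 61 and older
female = [13, 22, 33, 21, 10, 6]

col_totals = [m + f for m, f in zip(male, female)]
n = sum(col_totals)                                  # 321
row_male = sum(male)                                 # 216

expected_male = [round(row_male * c / n, 2) for c in col_totals]
print(expected_male)   # [30.28, 49.12, 57.2, 43.07, 25.57, 10.77]
```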



Chi-Square Independence Test
A chi-square independence test is used to test the independence of
two variables. Using a chi-square test, you can determine whether
the occurrence of one variable affects the probability of the
occurrence of the other variable.

For the chi-square independence test to be used, the following must


be true.
1. The observed frequencies must be obtained by using a random sample.
2. Each expected frequency must be greater than or equal to 5.



Chi-Square Independence Test
The Chi-Square Independence Test
If the conditions listed are satisfied, then the sampling distribution
for the chi-square independence test is approximated by a chi-
square distribution with
(r – 1)(c – 1)
degrees of freedom, where r and c are the number of rows and
columns, respectively, of a contingency table. The test statistic for
the chi-square independence test is
    χ² = Σ (O − E)² / E

where O represents the observed frequencies and E represents the
expected frequencies. The test is always a right-tailed test.
Chi-Square Independence Test
Performing a Chi-Square Independence Test
In Words In Symbols
1. Identify the claim. State the null and State H0 and Ha.
alternative hypotheses.

2. Specify the level of significance. Identify α.

3. Identify the degrees of freedom. d.f. = (r – 1)(c – 1)

4. Determine the critical value. Use Table 6 in


Appendix B.
5. Determine the rejection region.

Chi-Square Independence Test
Performing a Chi-Square Independence Test
In Words In Symbols
6. Calculate the test statistic.    χ² = Σ (O − E)² / E

7. Make a decision to reject or fail to If χ2 is in the rejection


reject the null hypothesis. region, reject H0.
Otherwise, fail to reject
H0.
8. Interpret the decision in the context
of the original claim.



Chi-Square Independence Test
Example:
The following contingency table shows a random sample of 321
fatally injured passenger vehicle drivers by age and gender. The
expected frequencies are displayed in parentheses. At α = 0.05,
can you conclude that the drivers’ ages are related to gender in
such accidents?
Age
Gender   16 – 20   21 – 30   31 – 40   41 – 50   51 – 60   61 and older   Total
Male     32        51        52        43        28        10             216
         (30.28)   (49.12)   (57.20)   (43.07)   (25.57)   (10.77)
Female   13        22        33        21        10        6              105
         (14.72)   (23.88)   (27.80)   (20.93)   (12.43)   (5.23)
Total    45        73        85        64        38        16             321
Chi-Square Independence Test
Example continued:
Because each expected frequency is at least 5 and the drivers were
randomly selected, the chi-square independence test can be used to
test whether the variables are independent.

H0: The drivers’ ages are independent of gender.


Ha: The drivers’ ages are dependent on gender. (Claim)

d.f. = (r – 1)(c – 1) = (2 – 1)(6 – 1) = (1)(5) = 5

With d.f. = 5 and α = 0.05, the critical value is χ²₀ = 11.071.

Chi-Square Independence Test
Example continued:

 O     E       O – E    (O – E)²   (O – E)²/E
 32    30.28    1.72    2.9584     0.0977
 51    49.12    1.88    3.5344     0.0720
 52    57.20   –5.20    27.0400    0.4727
 43    43.07   –0.07    0.0049     0.0001
 28    25.57    2.43    5.9049     0.2309
 10    10.77   –0.77    0.5929     0.0551
 13    14.72   –1.72    2.9584     0.2010
 22    23.88   –1.88    3.5344     0.1480
 33    27.80    5.20    27.0400    0.9727
 21    20.93    0.07    0.0049     0.0002
 10    12.43   –2.43    5.9049     0.4751
 6     5.23     0.77    0.5929     0.1134

[Figure: chi-square curve with the rejection region (α = 0.05) to the right of χ²₀ = 11.071]

χ² = Σ (O − E)²/E ≈ 2.84

Fail to reject H0.

There is not enough evidence at the 5% level to conclude that age


is dependent on gender in such accidents.
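The whole independence test can be verified with a short pure-Python sketch. The expected cells are computed from the margins; note that the slide's χ² ≈ 2.84 comes from summing already-rounded table entries, so the unrounded sum lands at 2.83:

```python
# Chi-square independence test on the 2 x 6 contingency table.
rows = [[32, 51, 52, 43, 28, 10],    # male
        [13, 22, 33, 21, 10, 6]]     # female

col_totals = [sum(col) for col in zip(*rows)]
n = sum(col_totals)                  # 321

chi2 = 0.0
for row in rows:
    row_total = sum(row)
    for o, col_total in zip(row, col_totals):
        e = row_total * col_total / n      # expected frequency for the cell
        chi2 += (o - e) ** 2 / e

print(round(chi2, 2))    # 2.83 (the slide's 2.84 sums the rounded entries)
print(chi2 < 11.071)     # True -> fail to reject H0 (d.f. = 5, alpha = 0.05)
```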
Comparing Two
Variances
F-Distribution
Let s₁² and s₂² represent the sample variances of two different
populations. If both populations are normal and the population
variances σ₁² and σ₂² are equal, then the sampling distribution of

    F = s₁² / s₂²

is called an F-distribution.
There are several properties of this distribution.

1. The F-distribution is a family of curves each of which is determined


by two types of degrees of freedom: the degrees of freedom
corresponding to the variance in the numerator, denoted d.f.N, and
the degrees of freedom corresponding to the variance in the
denominator, denoted d.f.D. Continued.
F-Distribution
Properties of the F-distribution continued:
2. F-distributions are positively skewed.
3. The total area under each curve of an F-distribution is equal to 1.
4. F-values are always greater than or equal to 0.
5. For all F-distributions, the mean value of F is approximately equal to 1.



Critical Values for the F-Distribution
Finding Critical Values for the F-Distribution
1. Specify the level of significance α.
2. Determine the degrees of freedom for the numerator, d.f.N.
3. Determine the degrees of freedom for the denominator, d.f.D.
4. Use Table 7 in Appendix B to find the critical value. If the hypothesis
test is
a. one-tailed, use the α F-table.
b. two-tailed, use the ½α F-table.



Critical Values for the F-Distribution
Example:
Find the critical F-value for a right-tailed test when α = 0.05,
d.f.N = 5 and d.f.D = 28.
Appendix B: Table 7: F-Distribution
α = 0.05
d.f.N: Degrees of freedom, numerator (columns)
d.f.D: Degrees of freedom, denominator (rows)

1 2 3 4 5 6
1 161.4 199.5 215.7 224.6 230.2 234.0
2 18.51 19.00 19.16 19.25 19.30 19.33
27 4.21 3.35 2.96 2.73 2.57 2.46
28 4.20 3.34 2.95 2.71 2.56 2.45
29 4.18 3.33 2.93 2.70 2.55 2.43

The critical value is F0 = 2.56.
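For readers who prefer software to Table 7, the same critical value can be obtained from the F-distribution's inverse CDF. This sketch assumes SciPy is available; `f.ppf` returns the quantile, so a right-tail area of α = 0.05 corresponds to the 0.95 quantile:

```python
from scipy.stats import f

# Right-tailed critical value: alpha = 0.05 in the right tail,
# i.e. the 0.95 quantile with d.f.N = 5 and d.f.D = 28.
critical = f.ppf(1 - 0.05, dfn=5, dfd=28)
print(round(critical, 2))   # 2.56, matching Table 7
```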



Critical Values for the F-Distribution
Example:
Find the critical F-value for a two-tailed test 1 1
 = (0.10) = 0.05
when  = 0.10, d.f.N = 4 and d.f.D = 6. 2 2

Appendix B: Table 7: F-Distribution


α = 0.05
d.f.N: Degrees of freedom, numerator (columns)
d.f.D: Degrees of freedom, denominator (rows)

1 2 3 4 5 6
1 161.4 199.5 215.7 224.6 230.2 234.0
2 18.51 19.00 19.16 19.25 19.30 19.33
3 10.13 9.55 9.28 9.12 9.01 8.94
4 7.71 6.94 6.59 6.39 6.26 6.16
5 6.61 5.79 5.41 5.19 5.05 4.95
6 5.99 5.14 4.76 4.53 4.39 4.28
7        5.59    4.74    4.35    4.12    3.97    3.87

The critical value is F0 = 4.53.
Two-Sample F-Test for Variances
Two-Sample F-Test for Variances
A two-sample F-test is used to compare two population variances σ₁² and σ₂²
when a sample is randomly selected from each population.
The populations must be independent and normally distributed.
The test statistic is

    F = s₁² / s₂²

where s₁² and s₂² represent the sample variances with s₁² ≥ s₂².
The degrees of freedom for the numerator is d.f.N = n1 – 1 and
the degrees of freedom for the denominator is d.f.D = n2 – 1, where
n1 is the size of the sample having variance s₁² and n2 is the size of
the sample having variance s₂².
Two-Sample F-Test for Variances
Using a Two-Sample F-Test to Compare σ₁² and σ₂²
In Words In Symbols
1. Identify the claim. State the null and State H0 and Ha.
alternative hypotheses.

2. Specify the level of significance. Identify α.

3. Identify the degrees of freedom. d.f.N = n1 – 1


d.f.D = n2 – 1

4. Determine the critical value. Use Table 7 in


Appendix B.

Two-Sample F-Test for Variances
Using a Two-Sample F-Test to Compare σ₁² and σ₂²
In Words In Symbols
5. Determine the rejection region.
6. Calculate the test statistic.    F = s₁² / s₂²

7. Make a decision to reject or fail to If F is in the rejection


reject the null hypothesis. region, reject H0.
Otherwise, fail to reject
H0.
8. Interpret the decision in the context
of the original claim.



Two-Sample F-Test
Example:
A travel agency’s marketing brochure indicates that the standard
deviations of hotel room rates for two cities are the same. A
random sample of 13 hotel room rates in one city has a standard
deviation of $27.50 and a random sample of 16 hotel room rates in
the other city has a standard deviation of $29.75. Can you reject
the agency's claim at α = 0.01?

Because 29.75 > 27.50, s₁² = 885.06 and s₂² = 756.25.

H0: σ₁² = σ₂² (Claim)
Ha: σ₁² ≠ σ₂²
Two-Sample F-Test
Example continued:
This is a two-tailed test with ½α = ½(0.01) = 0.005, d.f.N = 15 and
d.f.D = 12.

The critical value is F0 = 4.72.

[Figure: F-distribution curve with the rejection region (area ½α = 0.005) to the right of F0 = 4.72]

The test statistic is

    F = s₁²/s₂² = 885.06/756.25 ≈ 1.17.

Fail to reject H0.


There is not enough evidence at the 1% level to reject the claim
that the standard deviations of the hotel room rates for the two
cities are the same.
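The F statistic here is just the ratio of the two sample variances, with the larger variance in the numerator. A minimal Python check of the example's arithmetic:

```python
# Two-sample F-test for the hotel room rate standard deviations.
s1, s2 = 29.75, 27.50      # larger sample SD goes in the numerator
F = s1 ** 2 / s2 ** 2      # 885.0625 / 756.25
print(round(F, 2))         # 1.17

critical = 4.72            # F critical value, d.f.N = 15, d.f.D = 12, half-alpha = 0.005
print(F < critical)        # True -> fail to reject H0
```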
Analysis of Variance
One-Way ANOVA
One-way analysis of variance is a hypothesis-testing technique
that is used to compare means from three or more populations.
Analysis of variance is usually abbreviated ANOVA.

In a one-way ANOVA test, the following must be true.


1. Each sample must be randomly selected from a normal, or approximately normal, population.
2. The samples must be independent of each other.
3. Each population must have the same variance.



One-Way ANOVA
    Test statistic = Variance between samples / Variance within samples

1. The variance between samples MSB measures the differences related to the treatment given to each
sample and is sometimes called the mean square between.
2. The variance within samples MSW measures the differences related to entries within the same sample.
This variance, sometimes called the mean square within, is usually due to sampling error.



One-Way ANOVA
One-Way Analysis of Variance Test
If the conditions listed are satisfied, then the sampling
distribution for the test is approximated by the F-
distribution. The test statistic is
    F = MSB / MSW.
The degrees of freedom for the F-test are
d.f.N = k – 1
and
d.f.D = N – k
where k is the number of samples and N is the sum of the sample
sizes.
Test Statistic for a One-Way ANOVA
Finding the Test Statistic for a One-Way ANOVA Test
In Words In Symbols
1. Find the mean and variance of each      x̄ = Σx/n,   s² = Σ(x − x̄)²/(n − 1)
   sample.

2. Find the mean of all entries in all     x̿ = Σx/N
   samples (the grand mean).

3. Find the sum of squares between the     SSB = Σ nᵢ(x̄ᵢ − x̿)²
   samples.

4. Find the sum of squares within the      SSW = Σ (nᵢ − 1)sᵢ²
   samples.
Test Statistic for a One-Way ANOVA
Finding the Test Statistic for a One-Way ANOVA Test
In Words In Symbols
5. Find the variance between the           MSB = SSB/(k − 1) = SSB/d.f.N
   samples.

6. Find the variance within the            MSW = SSW/(N − k) = SSW/d.f.D
   samples.

7. Find the test statistic.                F = MSB/MSW



Performing a One-Way ANOVA Test

Performing a One-Way Analysis of Variance Test


In Words In Symbols
1. Identify the claim. State the null and State H0 and Ha.
alternative hypotheses.

2. Specify the level of significance. Identify α.

3. Identify the degrees of freedom.        d.f.N = k – 1
                                           d.f.D = N – k

4. Determine the critical value. Use Table 7 in


Appendix B.
Performing a One-Way ANOVA Test
Performing a One-Way Analysis of Variance Test
In Words In Symbols
5. Determine the rejection region.
6. Calculate the test statistic.    F = MSB / MSW

7. Make a decision to reject or fail to If F is in the rejection


reject the null hypothesis. region, reject H0.
Otherwise, fail to reject
H0.
8. Interpret the decision in the context
of the original claim.



ANOVA Summary Table
A table is a convenient way to summarize the results in a one-way
ANOVA test.

Variation   Sum of squares   Degrees of freedom   Mean squares       F
Between     SSB              d.f.N                MSB = SSB/d.f.N    MSB/MSW
Within      SSW              d.f.D                MSW = SSW/d.f.D



Performing a One-Way ANOVA Test
Example:
The following table shows the salaries of randomly selected
individuals from four large metropolitan areas. At α = 0.05, can
you conclude that the mean salary is different in at least one of the
areas? (Adapted from US Bureau of Economic Analysis)

Pittsburgh Dallas Chicago Minneapolis


27,800 30,000 32,000 30,000
28,000 33,900 35,800 40,000
25,500 29,750 28,000 35,000
29,150 25,000 38,900 33,000
30,295 34,055 27,245 29,805

Performing a One-Way ANOVA Test
Example continued:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least one mean is different from the others. (Claim)

Because there are k = 4 samples, d.f.N = k – 1 = 4 – 1 = 3.

The sum of the sample sizes is


N = n1 + n2 + n3 + n4 = 5 + 5 + 5 + 5 = 20.

d.f.D = N – k = 20 – 4 = 16

Using α = 0.05, d.f.N = 3, and d.f.D = 16,


the critical value is F0 = 3.24.
Performing a One-Way ANOVA Test
Example continued:
To find the test statistic, the following must be calculated.

x̿ = Σx/N = (140745 + 152705 + 161945 + 167805)/20 = 31160

MSB = SSB/d.f.N = Σ nᵢ(x̄ᵢ − x̿)²/(k − 1)
    = [5(28149 − 31160)² + 5(30541 − 31160)²
       + 5(32389 − 31160)² + 5(33561 − 31160)²] / (4 − 1)
    ≈ 27874206.67
Performing a One-Way ANOVA Test
Example continued:
S SW (n i  1)s i2
MSW  
d.f.D N k
(5  1)(3192128.94)  (5  1)(13813030.08)
 
20  4
(5  1)(24975855.83)  (5  1)(17658605.02)
20  4
 14909904.97 Test statistic Critical
value
MS B 27874206.67  1.870
F   1.870 < 3.24.
M SW 14909904.34
Fail to reject H0.
There is not enough evidence at the 5% level to conclude that the mean
salary is different in at least one of the areas.
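All of the ANOVA bookkeeping above (grand mean, SSB, SSW, MSB, MSW, F) fits in a short pure-Python sketch using the salary data from the example:

```python
# One-way ANOVA by hand for the four-city salary example.
groups = [
    [27800, 28000, 25500, 29150, 30295],   # Pittsburgh
    [30000, 33900, 29750, 25000, 34055],   # Dallas
    [32000, 35800, 28000, 38900, 27245],   # Chicago
    [30000, 40000, 35000, 33000, 29805],   # Minneapolis
]

k = len(groups)                                      # number of samples
N = sum(len(g) for g in groups)                      # total number of entries
grand_mean = sum(x for g in groups for x in g) / N   # 31160.0

means = [sum(g) / len(g) for g in groups]            # 28149, 30541, 32389, 33561
ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))

msb = ssb / (k - 1)    # d.f.N = 3
msw = ssw / (N - k)    # d.f.D = 16
F = msb / msw
print(round(F, 3))     # about 1.87, below the critical value 3.24
```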
