Chapter 6
Chi Square Tests of Significance
There are three types of Chi Square tests:
1. Goodness of Fit
2. Independence
3. Test for Homogeneity of Proportions
Goodness of Fit
A goodness-of-fit test is an inferential procedure used to determine whether a
frequency distribution follows a claimed distribution. It is a test of the agreement or
conformity between the observed frequencies ($O_i$) and the expected frequencies ($E_i$) for
several classes or categories.
When you are testing whether a frequency distribution fits a specific
pattern, you can use the chi-square goodness-of-fit test.
The goodness-of-fit chi-square is used when you have one nominal variable and you want to know whether the
frequency of occurrence (e.g., the number of people in each group) differs from what we would expect by
chance alone.
The goodness-of-fit test evaluates only one variable at a time; it does not tell us anything about the relationship
between two variables.
The goodness-of-fit test often assumes that the expected values for each group are equal. For example, with two
groups we may expect 50% of our sample to be in each group.
Example: An emergency service may want to see whether it receives more calls at certain times of the day than at
others. A traffic engineer may wish to see whether accidents occur more often on some days than on others.
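The formula for the chi-square test statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency of category $i$, $E_i$ is its expected frequency, and $k$ is the number of categories.
The frequencies obtained from sample data (the actual frequencies) are called the observed frequencies. The
frequencies obtained by calculation under the null hypothesis are called the expected frequencies.
In the goodness-of-fit test, the degrees of freedom are equal to the number of categories minus 1, that is, $k - 1$.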
Example:
A grocery store manager wishes to determine whether a certain product will sell equally well in any of five locations in
the store. Five displays are set up, one in each location, and the resulting numbers of the product sold are noted.

Location          1    2    3    4    5
Observed number   45   30   53   34   48

Is there significant evidence to conclude that the location makes a difference in the number sold? Test at both the 5% and
10% levels of significance.
Step 1. State the hypotheses and identify the claim.
H0: There is no difference among locations in the number sold
H1: There is a difference among locations in the number sold
Step 2: Find the critical value.
The degrees of freedom are 5 - 1 = 4. At α = 0.05 the critical value is 9.488; at α = 0.10 it is 7.779.
Step 3: Compute the test value by subtracting the expected value from the
corresponding observed value, squaring the result, dividing by the expected value,
and summing over all categories. Under H0 the expected value for each category is 210/5 = 42.

$$\chi^2 = \frac{(45-42)^2}{42} + \frac{(30-42)^2}{42} + \frac{(53-42)^2}{42} + \frac{(34-42)^2}{42} + \frac{(48-42)^2}{42}$$

$$= 0.214 + 3.429 + 2.881 + 1.524 + 0.857 = 8.905$$

Step 4: Make a decision. At α = 0.05, do not reject H0, since 8.905 < 9.488. At α = 0.10, reject H0, since 8.905 > 7.779.
Step 5: Conclusion. At the 5% level there is not enough evidence to conclude that location makes a
difference in the number sold; at the 10% level there is evidence of a location difference.
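The arithmetic of this example can be checked with a few lines of Python. This is only a minimal sketch: it uses the observed counts from the table, assumes equal expected counts under H0, and takes the two critical values for 4 degrees of freedom from a chi-square table rather than computing them.

```python
# Goodness-of-fit check for the five store locations (observed counts from the example).
observed = [45, 30, 53, 34, 48]

# Under H0 every location sells equally well, so each expected count is total / k.
total = sum(observed)
k = len(observed)
expected = [total / k] * k

# Test statistic: sum of (O - E)^2 / E over all categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

# Critical values copied from a chi-square table for df = 4 (not computed here).
critical = {0.05: 9.488, 0.10: 7.779}

print(f"expected per location = {expected[0]:.0f}, chi-square = {chi_sq:.3f}, df = {df}")
for alpha, crit in critical.items():
    decision = "reject H0" if chi_sq > crit else "do not reject H0"
    print(f"alpha = {alpha}: critical value {crit} -> {decision}")
```

Because the statistic falls between the two critical values, the decision depends on the chosen significance level.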
Example: In a recent year, at the 6 pm time slot, TV channels 2, 3, 4, and 5 captured the entire audience with 30%,
25%, 20%, and 25%, respectively. The TV executives think these numbers are no longer
correct. During the first week of the next season they decide to poll 500 viewers.
Test whether there is significant evidence to reject the null hypothesis that the television audience is
distributed over channels 2, 3, 4, and 5 with percentages 30%, 25%, 20%, and 25%, respectively.
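For this example the expected counts follow directly from the poll size and the claimed shares. The sketch below assumes the shares 30%, 25%, 20%, and 25% stated at the start of the example. The poll results are not given in the text, so the helper `chi_square_gof` (a hypothetical name) is defined but not applied to any data.

```python
# Claimed audience shares under H0, in percent, keyed by channel number.
claimed_percent = {2: 30, 3: 25, 4: 20, 5: 25}
n_viewers = 500  # size of the planned poll

# Expected count for each channel is n * p_i.
expected = {channel: n_viewers * pct / 100 for channel, pct in claimed_percent.items()}


def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E over channels; both arguments are dicts keyed by channel."""
    return sum((observed[c] - expected[c]) ** 2 / expected[c] for c in expected)


df = len(claimed_percent) - 1  # number of categories minus 1

print(expected)
print("degrees of freedom:", df)
```

Once the 500 responses are tallied by channel, passing them to `chi_square_gof` together with `expected` would give the test value to compare with the table value for 3 degrees of freedom.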
Chi-square test of Association (Independence)
The chi-square independence test is used to find out whether there is an association
between a row variable and column variable in a contingency table constructed from
sample data. The null hypothesis is that the variables are not associated: in other
words, they are independent. The alternative hypothesis is that the variables are
associated, or dependent.
The chi-square test is used to test the hypothesis of independence of two attributes. If
the data are classified according to several attributes, but the probability distribution
of these classifications is not given, we may be interested in whether there is any
dependency relationship between two of the attributes. If the chi-square test indicates
independence, it means that the observed data are consistent with the hypothesis that
the two attributes are independent. A test of independence requires comparing the
observed frequencies in a contingency table to the expected frequencies. The chi-square
test for independence, also known as the chi-square test for association, is used to investigate whether there is an
association between two categorical variables.
Examples in which we can use chi-square tests of independence:
Whether the presence or absence of hypertension is independent of smoking habit.
Is level of education related to level of income?
Whether the size of the family is independent of the level of education attained by the mothers.
Is the level of price related to the level of quality in production?
Whether there is an association between crime and place of residence.
Whether there is an association between stability of marriage and period of acquaintanceship prior to marriage.
Suppose we have a population having two attributes. The chi-square test procedure is
used to test whether there is association between attributes A and B. Assume A has r
mutually exclusive and exhaustive classes and B has c mutually exclusive and
exhaustive classes. This can be displayed in table form as follows.
                         B
           B1     B2     ...    Bc     Total
     A1    O11    O12    ...    O1c    R1
A    A2    O21    O22    ...    O2c    R2
     ...   ...    ...    ...    ...    ...
     Ar    Or1    Or2    ...    Orc    Rr
Total      C1     C2     ...    Cc     n
The test statistic is

$$\chi^2_{cal} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - e_{ij})^2}{e_{ij}}$$

where
$O_{ij}$: the observed frequency in category $i$ of A and category $j$ of B
$e_{ij}$: the expected frequency in category $i$ of A and category $j$ of B

The $e_{ij}$'s are obtained by

$$e_{ij} = \frac{R_i C_j}{n}$$

$R_i$: the $i$th row total
$C_j$: the $j$th column total
$n$: the sample size

Decision rule: reject $H_0$ if $\chi^2_{cal} > \chi^2_{\alpha,\,(r-1)(c-1)}$.
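The procedure can be sketched in Python for any r x c table of counts. The function name `chi_square_independence` and the small 2 x 2 table are hypothetical, chosen only to exercise the formulas; the critical value still has to be looked up in a chi-square table.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected table) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]          # R_i
    col_totals = [sum(col) for col in zip(*table)]    # C_j
    n = sum(row_totals)

    # e_ij = R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # chi-square = sum over all cells of (O_ij - e_ij)^2 / e_ij
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Hypothetical 2 x 2 counts, used only to exercise the function.
demo = [[10, 20], [30, 40]]
stat, df, expected = chi_square_independence(demo)
print(expected)            # expected frequencies e_ij
print(round(stat, 4), df)  # test value and degrees of freedom
```

Returning the expected table alongside the statistic makes it easy to check each $e_{ij}$ by hand.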
Summarize the results. There is not enough evidence to support the claim that a person’s place of
residence is dependent on the number of years of college completed.
1. In studying the relationship between smoking and lung cancer the following data were obtained.
At α = 0.05, test whether there is an association between smoking and lung cancer.
2. Violence and lack of discipline have become major problems in schools in Gondar town. A random
sample of 300 adults was selected, and they were asked if they favor giving more freedom to school
teachers to punish students for violence and lack of discipline. The two-way classification of the
responses of these adults is presented in the following table.
3. Based on interviews with couples seeking divorce, a social worker compiles the following data on
the period of acquaintanceship before marriage and the duration of marriage.

Acquaintanceship          Duration of marriage
period             < 3 years   3-5 years   > 5 years   Total
< 1 year               22          30          48        100
1-2 years              10          12          38         60
> 2 years              18           8          14         40
Total                  50          50         100        200
Test whether stability of marriage depends on period of acquaintance prior to marriage at α=0.05.
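1)
Step 1: State the null and alternative hypotheses.
$H_0$: There is no association between smoking and lung cancer
$H_1$: Not $H_0$
Step 2: $\alpha = 0.05$
Step 3: Reject $H_0$ if $\chi^2_{cal} > \chi^2_{0.05,\,1} = 3.84$
Step 4: Calculate the expected frequencies $e_{ij} = \frac{R_i C_j}{n}$

$$e_{11} = \frac{R_1 C_1}{n} = \frac{26 \times 488}{1143} = 11.1 \qquad e_{12} = \frac{R_1 C_2}{n} = \frac{26 \times 655}{1143} = 14.9$$

$$e_{21} = \frac{R_2 C_1}{n} = \frac{1117 \times 488}{1143} = 476.9 \qquad e_{22} = \frac{R_2 C_2}{n} = \frac{1117 \times 655}{1143} = 640.1$$

Step 5: Calculate the test statistic

$$\chi^2_{cal} = \sum\sum \frac{(O_{ij} - e_{ij})^2}{e_{ij}} = \frac{(23-11.1)^2}{11.1} + \frac{(3-14.9)^2}{14.9} + \frac{(465-476.9)^2}{476.9} + \frac{(652-640.1)^2}{640.1} = 22.78$$

(With Yates' continuity correction for a 2 × 2 table, the value is 20.9.)
Step 6: Since $\chi^2_{cal} = 22.78 > \chi^2_{0.05,\,1} = 3.84$, reject $H_0$.
Step 7: Conclusion: There is an association between smoking and lung cancer.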
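Solution 1 can be re-checked numerically. The sketch below assumes the observed counts 23, 3, 465, and 652, chosen so that they reproduce the row totals (26, 1117) and column totals (488, 655) used in Step 4. It also evaluates the same sum with Yates' continuity correction, an optional adjustment for 2 x 2 tables that is not part of the formula given earlier.

```python
# Observed 2 x 2 counts in the order O11, O12 / O21, O22 (assumed to match the totals in Step 4).
observed = [[23, 3], [465, 652]]

row_totals = [sum(row) for row in observed]          # R1, R2
col_totals = [sum(col) for col in zip(*observed)]    # C1, C2
n = sum(row_totals)

# e_ij = R_i * C_j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Flatten to (observed, expected) pairs, one per cell.
cells = [(o, e) for o_row, e_row in zip(observed, expected) for o, e in zip(o_row, e_row)]

# Pearson statistic: sum of (O - e)^2 / e.
chi_sq = sum((o - e) ** 2 / e for o, e in cells)

# Same sum with Yates' continuity correction: |O - e| is reduced by 0.5 before squaring.
chi_sq_yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)

critical = 3.84  # table value for alpha = 0.05, df = 1
print(row_totals, col_totals, n)
print(round(chi_sq, 2), round(chi_sq_yates, 2), chi_sq > critical)
```

Both versions of the statistic are far above 3.84, so the decision to reject H0 does not depend on whether the correction is applied.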
2)
Step 1: State the null and alternative hypotheses.
$H_0$: Gender and opinion of adults are independent
$H_1$: Not $H_0$
Step 2: $\alpha = 0.01$
Step 3: Reject $H_0$ if $\chi^2_{cal} > \chi^2_{0.01,\,2} = 9.21$
Step 4: Calculate the expected frequencies $e_{ij} = \frac{R_i C_j}{n}$

$$e_{11} = \frac{R_1 C_1}{n} = \frac{175 \times 180}{300} = 105 \qquad e_{12} = \frac{R_1 C_2}{n} = \frac{175 \times 102}{300} = 59.5 \qquad e_{13} = \frac{R_1 C_3}{n} = \frac{175 \times 18}{300} = 10.5$$

$$e_{21} = \frac{R_2 C_1}{n} = \frac{125 \times 180}{300} = 75 \qquad e_{22} = \frac{R_2 C_2}{n} = \frac{125 \times 102}{300} = 42.5 \qquad e_{23} = \frac{R_2 C_3}{n} = \frac{125 \times 18}{300} = 7.5$$

Step 5: Calculate the test statistic

$$\chi^2_{cal} = \sum\sum \frac{(O_{ij} - e_{ij})^2}{e_{ij}} = \frac{(93-105)^2}{105} + \cdots + \frac{(6-7.5)^2}{7.5} = 8.253$$

Step 6: Since $\chi^2_{cal} = 8.253 < \chi^2_{0.01,\,2} = 9.21$, do not reject $H_0$.
Step 7: Conclusion: Gender and opinion of adults are independent.
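The observed table for this problem can be recovered from the numbers in the solution. With the row totals, the column totals, and the two cells shown in Step 5 (93 and 6), the remaining cells of a 2 x 3 table are determined. The sketch below does that bookkeeping and then recomputes the statistic; the variable names are illustrative only.

```python
# Totals and the two observed cells that appear in the solution.
row_totals = [175, 125]
col_totals = [180, 102, 18]
n = 300
o11, o23 = 93, 6

# The remaining cells follow from the row and column totals.
o13 = col_totals[2] - o23
o12 = row_totals[0] - o11 - o13
o21 = col_totals[0] - o11
o22 = col_totals[1] - o12
observed = [[o11, o12, o13], [o21, o22, o23]]

# e_ij = R_i * C_j / n, then sum (O - e)^2 / e over all six cells.
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi_sq = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(observed)
print(round(chi_sq, 3), df)
```

The recovered second row sums to 125, which agrees with the stated row total and serves as a consistency check on the reconstruction.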
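3)
Step 1: State the null and alternative hypotheses.
$H_0$: Stability of marriage and period of acquaintance prior to marriage are independent
$H_1$: Not $H_0$
Step 2: $\alpha = 0.05$
Step 3: Reject $H_0$ if $\chi^2_{cal} > \chi^2_{0.05,\,4} = 9.488$
Step 4: Calculate the expected frequencies $e_{ij} = \frac{R_i C_j}{n}$

$$e_{11} = \frac{R_1 C_1}{n} = \frac{100 \times 50}{200} = 25 \qquad e_{12} = \frac{R_1 C_2}{n} = \frac{100 \times 50}{200} = 25 \qquad e_{13} = \frac{R_1 C_3}{n} = \frac{100 \times 100}{200} = 50$$

$$e_{21} = \frac{R_2 C_1}{n} = \frac{60 \times 50}{200} = 15 \qquad e_{22} = \frac{R_2 C_2}{n} = \frac{60 \times 50}{200} = 15 \qquad e_{23} = \frac{R_2 C_3}{n} = \frac{60 \times 100}{200} = 30$$

$$e_{31} = \frac{R_3 C_1}{n} = \frac{40 \times 50}{200} = 10 \qquad e_{32} = \frac{R_3 C_2}{n} = \frac{40 \times 50}{200} = 10 \qquad e_{33} = \frac{R_3 C_3}{n} = \frac{40 \times 100}{200} = 20$$

Step 5: Calculate the test statistic

$$\chi^2_{cal} = \sum\sum \frac{(O_{ij} - e_{ij})^2}{e_{ij}} = \frac{(22-25)^2}{25} + \cdots + \frac{(14-20)^2}{20} = 14.44$$

Step 6: Since $\chi^2_{cal} = 14.44 > \chi^2_{0.05,\,4} = 9.488$, reject $H_0$.
Step 7: Conclusion: Stability of marriage depends on the period of acquaintance prior to marriage.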
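The nine terms hidden behind the dots in Step 5 can be listed explicitly. This is a minimal sketch using the observed counts from the problem table; the critical value for 4 degrees of freedom at α = 0.05 is taken from a chi-square table, not computed.

```python
# Observed counts: rows are acquaintanceship periods, columns are durations of marriage.
observed = [
    [22, 30, 48],
    [10, 12, 38],
    [18, 8, 14],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Contribution of each cell: (O_ij - e_ij)^2 / e_ij with e_ij = R_i * C_j / n.
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]
chi_sq = sum(sum(row) for row in contributions)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

critical = 9.488  # table value for alpha = 0.05, df = 4
for row in contributions:
    print([round(x, 3) for x in row])
print(round(chi_sq, 2), df, chi_sq > critical)
```

Listing the contributions cell by cell shows which combinations of acquaintanceship period and marriage duration depart most from what independence would predict.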
Exercises:
1. A study was conducted to determine whether there is an association between sex and color preference. The
following data were obtained.
2. A police department collected data for three regions of the city on the occurrence of various crimes. The
department wanted to determine whether the type of crime was dependent on the city region. The following
data were obtained.

                 Type of Crime
Region   Homicide   Assault   Grand Larceny   Total
1           12         10           8           30
2           28         25          18           71
3           19         23          22           64
Total       59         58          48          165

At α = 0.05, test whether there is an association between crime and city region.
3. An analysis of car accident data was made to determine whether the distribution of fatal accidents is dependent on
the size of the car involved. The following data were collected.

                    Size of Car
            Small   Medium   Large   Total
Fatal         67      26       16     109
Non-fatal    128      63       46     237
Total        195      89       62     346

At α = 0.05, test whether fatality is dependent on the size of the car.