
Chapter 6

Chapter Six discusses the chi-square tests, which are statistical methods used to determine relationships between categorical variables. It covers the chi-square distribution, its properties, and three types of chi-square tests: goodness of fit, independence, and homogeneity of proportions. The chapter includes examples and step-by-step procedures for conducting these tests and interpreting the results.

Uploaded by

Tigist G

Chapter Six

The chi-square Tests


Let X be a random variable from the normal distribution with mean μ and variance σ², that is, $X \sim N(\mu, \sigma^2)$.
• The chi-square distribution with one degree of freedom is defined as the square of a standard normal distributed variable, i.e.

$$Z^2 = \left(\frac{X-\mu}{\sigma}\right)^2 \sim \chi^2(1)$$

• The chi-square distribution with n degrees of freedom is defined as the sum of n squared independent standard normal distributed variables, i.e.

$$\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left(\frac{X_i-\mu}{\sigma}\right)^2 \sim \chi^2(n)$$
Properties of the χ² distribution
• The chi-square distribution is non-negative (chi-square values can be zero or positive but never negative).
• The chi-square distribution is non-symmetric (it is skewed to the right).
• As the number of degrees of freedom increases, the chi-square distribution becomes more symmetric and approaches a normal distribution (see Fig. 1).
• The sum of two independent χ² distributed variables with degrees of freedom n1 and n2 respectively is also chi-square distributed, with n1 + n2 degrees of freedom. The difference of two independent χ² variables is not chi-square distributed in general, since it can take negative values.
• $E(\chi^2) = n$ and $V(\chi^2) = 2n$, where n is the number of degrees of freedom.
• The total area under the curve is equal to 1.0.
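The definition and the moment properties above can be checked informally by simulation. The sketch below is only an illustration: the function names, the choice n = 5, the number of draws, and the seed are all assumptions made for the example, not part of the chapter.

```python
import random

def simulate_chi_square(n, draws=20000, seed=0):
    """Draw values of a sum of n squared independent standard normal variables."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(draws)]

def mean_and_variance(values):
    """Return the sample mean and the (unbiased) sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v

n = 5  # degrees of freedom chosen for the illustration
samples = simulate_chi_square(n)
mean, var = mean_and_variance(samples)

# The sample mean should be close to n and the sample variance close to 2n.
print(f"mean = {mean:.2f} (theory {n}), variance = {var:.2f} (theory {2 * n})")
# Every simulated value is a sum of squares, so none can be negative.
print("smallest value:", min(samples))
```

Repeating the run with a larger n should give a histogram that looks more symmetric, in line with the third property.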

Finding Critical Values of the Chi-Square Distribution:


Critical values of the chi-square distribution are found in Table IV in Appendix A.
Example: Find the critical value of chi-square for a one-tail (right-tail) test with α = 0.05 and df = 15. Reading the row for df = 15 and the column for α = 0.05 gives χ² = 24.996.
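When a table is not at hand, a right-tail critical value can be approximated numerically. The following is a minimal sketch using only the Python standard library; the helper names `chi2_cdf` and `chi2_critical` are made up for this example, the CDF uses the standard series for the regularized incomplete gamma function, and the inversion is a plain bisection. In practice a statistics library would normally be used instead.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, t).
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= t / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(t) - t - math.lgamma(a))

def chi2_critical(alpha, df):
    """Value c such that the area to the right of c equals alpha."""
    target = 1.0 - alpha
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < target:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):              # bisection on the CDF
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Right-tail test with alpha = 0.05 and df = 15 (the example above).
print(round(chi2_critical(0.05, 15), 3))
```

The printed value should agree with the table entry for df = 15 and α = 0.05 to about three decimal places.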

Chi Square Tests of Significance
There are three types of Chi Square tests:
1. Goodness of Fit
2. Independence
3. Test for Homogeneity of Proportions
Goodness of Fit
A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a claimed distribution. It is a test of the agreement or conformity between the observed frequencies (Oi) and the expected frequencies (Ei) for several classes or categories.
• When you are testing to see whether a frequency distribution fits a specific pattern, you can use the chi-square goodness-of-fit test.
• The goodness-of-fit chi-square is used when you have one nominal variable and you want to know whether the frequency of occurrence (e.g., the number of people in each group) differs from what we would expect by chance alone.
• The goodness-of-fit test evaluates only one variable at a time; it does not tell us anything about the relationship between two variables.
• The goodness-of-fit test often assumes that the expected values for each group are equal. For example, with two groups we may expect 50% of our sample to be in each group.
Example: An emergency service may want to see whether it receives more calls at certain times of the day than at others. A traffic engineer may wish to see whether accidents occur more often on some days than on others.
The formula for the chi-square test statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where k is the number of categories. The frequencies obtained from the sample data are called the observed frequencies (Oi). The frequencies obtained by calculation are called the expected frequencies (Ei).
• In the goodness-of-fit test, the degrees of freedom are equal to the number of categories minus 1 (df = k − 1).
Example:
A grocery store manager wishes to determine whether a certain product will sell equally well in any of five locations in the store. Five displays are set up, one in each location, and the resulting numbers of the product sold are noted.

Location          1    2    3    4    5
Observed number   45   30   53   34   48

Is there significant evidence to conclude that the location makes a difference in the product sold? Test at both the 5% and 10% levels of significance.
Step 1: State the hypotheses and identify the claim.
H0: There is no location difference
H1: There is a location difference
Step 2: Find the critical value.
With df = 5 − 1 = 4, the critical value is 9.488 at α = 0.05 and 7.779 at α = 0.10.
Step 3: Compute the test value by subtracting the expected value from the corresponding observed value, squaring the result, dividing by the expected value, and finding the sum. Under H0 the expected value for each category is 210/5 = 42.

$$\chi^2 = \frac{(45-42)^2}{42} + \frac{(30-42)^2}{42} + \frac{(53-42)^2}{42} + \frac{(34-42)^2}{42} + \frac{(48-42)^2}{42}$$
$$= 0.214 + 3.429 + 2.881 + 1.524 + 0.857 = 8.905$$

Step 4: Make a decision. At α = 0.05 the decision is not to reject H0, since 8.905 < 9.488. At α = 0.10 the decision is to reject H0, since 8.905 > 7.779.
Step 5: Conclusion. At the 5% level there is not enough evidence to conclude that location makes a difference in the product sold. At the 10% level there is evidence of a location difference.
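The arithmetic in the grocery store example can be reproduced with a few lines of code. This is a sketch only: the function name `chi_square_gof` is invented for the illustration, equal expected frequencies are assumed when none are supplied, and the critical values are the standard table entries for df = 4 (9.488 at α = 0.05 and 7.779 at α = 0.10).

```python
def chi_square_gof(observed, expected=None):
    """Return the goodness-of-fit statistic and its degrees of freedom."""
    k = len(observed)
    if expected is None:
        # Assumption: under H0 every category is equally likely.
        expected = [sum(observed) / k] * k
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, k - 1

observed = [45, 30, 53, 34, 48]  # units sold at locations 1 to 5
statistic, df = chi_square_gof(observed)

# Table critical values for df = 4.
critical_values = {0.05: 9.488, 0.10: 7.779}
print(f"chi-square = {statistic:.3f}, df = {df}")
for alpha, critical in critical_values.items():
    decision = "reject H0" if statistic > critical else "do not reject H0"
    print(f"alpha = {alpha}: critical value {critical}, {decision}")
```

Passing an explicit `expected` list lets the same function handle claimed distributions that are not uniform.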
Example: In a recent year, at the 6 pm time slot, TV channels 2, 3, 4 and 5 captured the entire audience with 30%, 25%, 20%, and 25% respectively. The TV executives think these numbers are no longer correct. During the first week of the next season they decide to poll 500 viewers.
Test whether there is significant evidence to reject the null hypothesis that the television audience is distributed over channels 2, 3, 4, and 5 with percentages 30%, 25%, 20%, and 25% respectively.
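The poll results are not listed here, so only the expected frequencies under the null hypothesis can be worked out. A small sketch, assuming the claimed shares are 30%, 25%, 20%, and 25% (shares that sum to 100%) and a sample of 500 viewers; the variable names are illustrative.

```python
# Claimed audience share (in percent) for each channel under H0.
claimed_percent = {2: 30, 3: 25, 4: 20, 5: 25}
sample_size = 500

# Expected frequency for each channel: E_i = n * p_i.
expected = {channel: sample_size * pct / 100 for channel, pct in claimed_percent.items()}

for channel, count in expected.items():
    print(f"channel {channel}: expected {count:.0f} viewers")

# With four categories the test has 4 - 1 = 3 degrees of freedom.
df = len(claimed_percent) - 1
print("degrees of freedom:", df)
```

Once the observed counts from the poll are available, each one is compared with its expected count using the goodness-of-fit formula.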
Chi-square test of Association (Independence)
The chi-square independence test, also known as the chi-square test for association, is used to find out whether there is an association between a row variable and a column variable in a contingency table constructed from sample data. The null hypothesis is that the variables are not associated; in other words, they are independent. The alternative hypothesis is that the variables are associated, or dependent.
If the data are classified according to several attributes, but the probability distribution of these classifications is not given, we may be interested in whether there is any dependency relationship between two of the attributes. If the chi-square test indicates independence, it means that the observed data are consistent with the hypothesis that the two attributes are independent. A test of independence requires comparing the observed frequencies in a contingency table to the expected frequencies.
Examples in which we can use chi-square tests of independence
• Whether the presence or absence of hypertension is independent of smoking habit.
• Whether level of education is related to level of income.
• Whether the size of the family is independent of the level of education attained by the mothers.
• Whether the level of price is related to the level of quality in production.
• Whether there is an association between crime and place of residence.
• Whether there is an association between stability of marriage and period of acquaintanceship prior to marriage.
Suppose we have a population having two attributes, A and B. The chi-square test procedure is used to test whether there is an association between attributes A and B. Assume A has r mutually exclusive and exhaustive classes and B has c mutually exclusive and exhaustive classes. This can be displayed in table form as follows.

                      B
             1     2     ...   c     Total
        1    O11   O12   ...   O1c   R1
   A    2    O21   O22   ...   O2c   R2
        ...  ...   ...   ...   ...   ...
        r    Or1   Or2   ...   Orc   Rr
    Total    C1    C2    ...   Cc    n

Where
Oij is the observed frequency in category i of A and category j of B
Ri is the i-th row total
Cj is the j-th column total
The null and alternative hypotheses are stated as:
H0: There is no association between A and B (A and B are independent)
H1: Not H0
Computing the Test Statistic
• Conceptually, the chi-square test of independence statistic is computed by summing, over all cells in the table, the squared difference between the observed and expected frequencies divided by the expected frequency for the cell.
The chi-square test statistic is given by:

$$\chi^2_{cal} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - e_{ij})^2}{e_{ij}}$$

Where
$O_{ij}$: the observed frequency in category i of A and category j of B
$e_{ij}$: the expected frequency in category i of A and category j of B
The $e_{ij}$'s are obtained by:

$$e_{ij} = \frac{R_i C_j}{n}$$

$R_i$: i-th row total
$C_j$: j-th column total
n: sample size

Decision rule: Reject H0 if

$$\chi^2_{cal} > \chi^2_{\alpha,\,(r-1)(c-1)}$$
Decision and Interpretation

• If the p-value of the test statistic is less than or equal to the significance level α, we reject the null hypothesis and conclude that the data support the research hypothesis, that is, that there is a relationship between the variables.
• If the p-value of the test statistic is greater than the significance level α, we fail to reject the null hypothesis. The data are then consistent with there being no relationship between the variables, i.e. with independence.
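The expected-frequency formula and the test statistic described above can be sketched in code as follows. The function name `chi_square_independence` is an invention for this illustration. The counts are the gender-by-opinion table from exercise 2 of this chapter, and the critical value 9.21 is the table entry for α = 0.01 and df = 2 quoted in its solution.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected frequencies) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # e_ij = R_i * C_j / n for every cell.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Rows: men, women. Columns: in favor, against, no opinion.
observed = [[93, 70, 12],
            [87, 32, 6]]
statistic, df, expected = chi_square_independence(observed)

critical_value = 9.21  # chi-square table value for alpha = 0.01, df = 2
print("expected frequencies:", expected)
print(f"chi-square = {statistic:.3f}, df = {df}")
print("reject H0" if statistic > critical_value else "do not reject H0")
```

The same function works for any number of rows and columns, since the degrees of freedom are computed as (r − 1)(c − 1).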
Examples: A sociologist wishes to see whether the number of years of college a person has completed is
related to her or his place of residence. A sample of 88 people is selected and classified as shown

Summarize the results. There is not enough evidence to support the claim that a person’s place of
residence is dependent on the number of years of college completed.

1. In studying the relationship between smoking and lung cancer, the following data were obtained.

            Smokers   Non-smokers   Total
Cancer      23        3             26
No cancer   465       652           1117
Total       488       655           1143

At α = 0.05, test whether there is an association between smoking and lung cancer.

2. Violence and lack of discipline have become major problems in schools in Gondar town. A random
sample of 300 adults was selected, and they were asked if they favor giving more freedom to school
teachers to punish students for violence and lack of discipline. The two-way classification of the
responses of these adults is presented in the following table.

In Favor Against No opinion Total


Men 93 70 12 175
Women 87 32 6 125
Total 180 102 18 300
Does the sample provide sufficient information to conclude that the two attributes, gender and
opinions of adults, are dependent? Use a 1% significance level.

3. Based on interviews of couples seeking divorce, a social worker compiles the following data related to the period of acquaintanceship before marriage and the duration of marriage.

Acquaintanceship           Duration of marriage
period             < 3 years   3−5 years   > 5 years   Total
< 1 year           22          30          48          100
1−2 years          10          12          38          60
> 2 years          18          8           14          40
Total              50          50          100         200

Test whether stability of marriage depends on period of acquaintance prior to marriage at α = 0.05.
1)
Step 1: State the null and alternative hypotheses
H0: There is no association between smoking and lung cancer
H1: Not H0
Step 2: α = 0.05
Step 3: Reject H0 if $\chi^2_{cal} > \chi^2_{0.05,1} = 3.84$
Step 4: Calculate the expected frequencies $e_{ij} = \frac{R_i C_j}{n}$

$$e_{11} = \frac{R_1 C_1}{n} = \frac{26 \times 488}{1143} = 11.1 \qquad e_{12} = \frac{R_1 C_2}{n} = \frac{26 \times 655}{1143} = 14.9$$
$$e_{21} = \frac{R_2 C_1}{n} = \frac{1117 \times 488}{1143} = 476.9 \qquad e_{22} = \frac{R_2 C_2}{n} = \frac{1117 \times 655}{1143} = 640.1$$

Step 5: Calculate the test statistic

$$\chi^2_{cal} = \sum\sum \frac{(O_{ij} - e_{ij})^2}{e_{ij}} = \frac{(23-11.1)^2}{11.1} + \frac{(3-14.9)^2}{14.9} + \frac{(465-476.9)^2}{476.9} + \frac{(652-640.1)^2}{640.1} = 22.78$$

Step 6: Since $\chi^2_{cal} = 22.78 > \chi^2_{0.05,1} = 3.84$, reject H0.
Step 7: Conclusion: There is an association between smoking and lung cancer.
2)
Step 1: State the null and alternative hypotheses
H0: Gender and opinion of adults are independent
H1: Not H0
Step 2: α = 0.01
Step 3: Reject H0 if $\chi^2_{cal} > \chi^2_{0.01,2} = 9.21$
Step 4: Calculate the expected frequencies $e_{ij} = \frac{R_i C_j}{n}$

$$e_{11} = \frac{175 \times 180}{300} = 105 \qquad e_{12} = \frac{175 \times 102}{300} = 59.5 \qquad e_{13} = \frac{175 \times 18}{300} = 10.5$$
$$e_{21} = \frac{125 \times 180}{300} = 75 \qquad e_{22} = \frac{125 \times 102}{300} = 42.5 \qquad e_{23} = \frac{125 \times 18}{300} = 7.5$$

Step 5: Calculate the test statistic

$$\chi^2_{cal} = \sum\sum \frac{(O_{ij} - e_{ij})^2}{e_{ij}} = \frac{(93-105)^2}{105} + \cdots + \frac{(6-7.5)^2}{7.5} = 8.253$$

Step 6: Since $\chi^2_{cal} = 8.253 < \chi^2_{0.01,2} = 9.21$, do not reject H0.
Step 7: Conclusion: There is not enough evidence to conclude that gender and opinion of adults are dependent.

3)
Step 1: State the null and alternative hypotheses
H0: Stability of marriage and period of acquaintance prior to marriage are independent
H1: Not H0
Step 2: α = 0.05
Step 3: Reject H0 if $\chi^2_{cal} > \chi^2_{0.05,4} = 9.488$
Step 4: Calculate the expected frequencies $e_{ij} = \frac{R_i C_j}{n}$

$$e_{11} = \frac{100 \times 50}{200} = 25 \qquad e_{12} = \frac{100 \times 50}{200} = 25 \qquad e_{13} = \frac{100 \times 100}{200} = 50$$
$$e_{21} = \frac{60 \times 50}{200} = 15 \qquad e_{22} = \frac{60 \times 50}{200} = 15 \qquad e_{23} = \frac{60 \times 100}{200} = 30$$
$$e_{31} = \frac{40 \times 50}{200} = 10 \qquad e_{32} = \frac{40 \times 50}{200} = 10 \qquad e_{33} = \frac{40 \times 100}{200} = 20$$

Step 5: Calculate the test statistic

$$\chi^2_{cal} = \sum\sum \frac{(O_{ij} - e_{ij})^2}{e_{ij}} = \frac{(22-25)^2}{25} + \cdots + \frac{(14-20)^2}{20} = 14.44$$

Step 6: Since $\chi^2_{cal} = 14.44 > \chi^2_{0.05,4} = 9.488$, reject H0.
Step 7: Conclusion: Stability of marriage depends on the period of acquaintance prior to marriage.
Exercises:
1. A study was conducted to determine whether there is association between sex and preference of color. The
following data were obtained

Color Male Female Total


Red 85 59 144
Blue 65 91 156
Total 150 150 300
At α=0.05 test whether there is association between sex and preference of color

2. A police department collected data on the occurrence of various crimes in three regions of the city. The department wanted to determine whether the type of crime was dependent on the city region. The following data were obtained.

Type of Crime
Region Homicide Assault Grand Larceny Total
1 12 10 8 30
2 28 25 18 71
3 19 23 22 64
Total 59 58 48 165
At α=0.05 test whether there is association between crime and city region

3. An analysis of car accident data was made to determine if the distribution of fatal accidents is dependent on
the size of car involved. The following data were collected

Size of Car
Small Medium Large Total
Fatal 67 26 16 109
Non-fatal 128 63 46 237
Total 195 89 62 346
At α=0.05 test whether fatality is dependent on the size of the car.

Chi-Square Test for Homogeneity of Proportions


In a chi-square test for homogeneity of proportions, we test the claim that different populations have the same proportion of individuals with some characteristic. The appropriate null hypothesis is:
H0: p1 = p2 = ... = pk
vs.
H1: At least one of the population proportions is different from the others
