Application of Coefficient of Contingency Among Classification
SEMINAR PRESENTED
ON
PRESENTED BY
FPA/SA/15/1-0034
SUBMITTED TO
JANUARY, 2018.
1.0 INTRODUCTION
In statistics, a contingency table (also known as a cross tabulation or crosstab) is a type of table
in a matrix format that displays the (multivariate) frequency distribution of the variables. Contingency tables are
heavily used in survey research, business intelligence, engineering and scientific research. They
provide a basic picture of the interrelation between two variables. The term contingency table was first
used by Karl Pearson in ‘On the theory of contingency and its relation to association and normal
correlation’, part of the Drapers’ Company Research Memoirs Biometric Series I, published in 1904.
independences are revealed, then even the storage of the data can be done in a smarter way (see
Lauritzen 2002). To do this, one can use information-theoretic concepts, which draw their
information only from the probability distribution, which can be expressed easily from the
The aim of this work is to test for the coefficient of contingency between two variables, sex (male or
I. To compare the proportion of males who are right-handed with the proportion of
females who are right-handed, to see whether they are the same.
1.2 SOURCE OF STATISTICAL DATA
Statistical data are available in several forms. Some are obtained through experimentation, in which the
observations recorded by the person collecting the data serve as the data. The type of problem
at hand dictates where an investigator can obtain statistical data. Statistical data can therefore be
obtained from public and commercial sources and from laboratory and field experiments. Government ministries
also provide statistics for research use. Some data are obtained through censuses and sample surveys while
This write-up is based on the application of the coefficient of contingency, and the data used are
hypothetical.
to tables larger than 2 by 2. The contingency coefficient is computed as the square root of
coefficient will always be less than 1 and will approach 1.0 only for large tables or
the minimum of -1; the highest it can reach in a 2 by 2 table is 0.707 and the maximum it
can reach in a 4 by 4 table is about 0.866. The larger the contingency coefficient, the stronger
the association. Some researchers recommend it only for 5 by 5 tables or larger. For
smaller tables it will underestimate the level of association. Moreover, it does not apply to
asymmetric tables (those where the numbers of rows and columns are not equal).
$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$$
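The formula for C and the upper limits quoted for square tables can be checked with a short Python sketch. The function names are illustrative and do not come from any statistical package. The bound used for a k by k table is the standard expression, the square root of (k - 1)/k, which reproduces the 0.707 limit quoted for a 2 by 2 table.

```python
import math


def contingency_coefficient(chi2: float, n: int) -> float:
    """Pearson's contingency coefficient C = sqrt(chi2 / (n + chi2))."""
    return math.sqrt(chi2 / (n + chi2))


def max_contingency(k: int) -> float:
    """Largest value C can take in a k-by-k table: sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)


# Upper limits for 2 by 2, 4 by 4 and 5 by 5 tables.
for k in (2, 4, 5):
    print(k, round(max_contingency(k), 3))
```

Because the limit depends on the table size, a value of C should be judged against the limit for its own table rather than against 1.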
Contingency coefficient by $\sqrt{\frac{k-1}{k}}$ (recall that the contingency coefficient only applies to tables in which the
number of rows is equal to the number of columns, and therefore equal to $k$). Where $\chi^2$ is
Pearson’s chi-square test is used to assess three types of comparison: goodness of fit,
A test of homogeneity compares the distribution of counts for two or more groups
sorted by graduation year, to see if the number of graduates choosing a given activity has
from people of different nationalities to see if one’s nationality is related to the
response).
For all three tests, the computational procedure includes the following steps:
1. Calculate the chi-squared test statistic, $\chi^2$, which resembles a normalized sum of squared
2. Determine the degrees of freedom (df) of the statistic. For a test of goodness of fit, this
is essentially the number of categories reduced by the number of parameters of the fitted
distribution. For a test of homogeneity, df = (rows - 1) × (columns - 1), where rows
corresponds to the number of categories (i.e. rows in the associated contingency table),
and columns corresponds to the number of independent groups (i.e. columns in the
where in this case, rows corresponds to the number of categories in one variable, and
3. Select a desired level of confidence (significance level, p-value or alpha value) for the
4. Compare $\chi^2$ to the critical value from the chi-squared distribution with df degrees of
freedom and the selected confidence level (one-sided, since the test is only in one direction,
i.e. is the test value greater than the critical value?), which in many cases gives a good
5. Reject or fail to reject the null hypothesis that the observed frequency distribution is the same
as the theoretical distribution, based on whether the test statistic exceeds the critical value
of $\chi^2$. If the test statistic exceeds the critical value of $\chi^2$, the null hypothesis (H0: there is no
difference between the distributions) can be rejected, and the alternative hypothesis (H1:
there is a difference between the distributions) can be accepted, both with the selected
level of confidence.
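Steps 2, 4 and 5 above can be sketched in Python as follows. This is a minimal illustration, not a full test procedure. The function names are hypothetical, the critical value 3.841 is the standard chi-squared table value for 1 degree of freedom at the 0.05 level, and the statistic 1.78 is used purely as a sample input.

```python
# Critical value of the chi-squared distribution for df = 1 at alpha = 0.05,
# taken from standard chi-squared tables (an assumption of this sketch).
CRITICAL_DF1_ALPHA_005 = 3.841


def degrees_of_freedom(rows: int, columns: int) -> int:
    """Degrees of freedom for an r-by-c contingency table."""
    return (rows - 1) * (columns - 1)


def decide(chi2: float, critical: float) -> str:
    """One-sided decision: reject H0 only if the statistic exceeds the critical value."""
    return "reject H0" if chi2 > critical else "fail to reject H0"


df = degrees_of_freedom(2, 2)
print(df)
print(decide(1.78, CRITICAL_DF1_ALPHA_005))
```

For tables with other dimensions or significance levels, the critical value would have to be looked up for the corresponding degrees of freedom.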
GENDER        HANDEDNESS
              RIGHT HANDED   LEFT HANDED   TOTAL
MALE          43             9             52
FEMALE        44             4             48
TOTAL         87             13            100
HYPOTHESIS STATEMENT
TEST STATISTICS
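EXPECTED FREQUENCY
$$E_{ij} = \frac{T_i \times T_j}{N}$$
where $T_i$ is the total of row $i$, $T_j$ is the total of column $j$ and $N$ is the grand total.
$$E_{11} = \frac{87 \times 52}{100} = 45.24$$
$$E_{12} = \frac{13 \times 52}{100} = 6.76$$
$$E_{21} = \frac{87 \times 48}{100} = 41.76$$
$$E_{22} = \frac{48 \times 13}{100} = 6.24$$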
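The expected frequencies above can be reproduced with a short Python sketch that multiplies each row total by each column total and divides by the grand total. The function name and the nested-list layout of the table are choices made for this illustration.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: male, female. Columns: right handed, left handed.
observed = [[43, 9], [44, 4]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

The same function works for any rectangular table of counts, since it relies only on the marginal totals.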
GENDER        HANDEDNESS
              RIGHT HANDED   LEFT HANDED   TOTAL
MALE          45.24          6.76          52
FEMALE        41.76          6.24          48
TOTAL         87             13            100
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed value and $E_i$ is the expected value at the $i$th position in the contingency table.
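$$\chi^2 = 100 \times \frac{(43 \times 4 - 9 \times 44)^2}{52 \times 48 \times 87 \times 13} = 100 \times \frac{(172 - 396)^2}{2822976}$$
$$= 100 \times \frac{(-224)^2}{2822976} = 100 \times 0.0178 = 1.78$$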
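The calculation above uses the shortcut form of the chi-squared statistic for a 2 by 2 table, in which the grand total is multiplied by the squared difference of the cross-products and divided by the product of the four marginal totals. A minimal Python sketch follows. The function name and the cell labels a, b, c, d (read row by row) are illustrative.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-squared statistic for the 2-by-2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Male: 43 right handed, 9 left handed. Female: 44 right handed, 4 left handed.
print(round(chi_square_2x2(43, 9, 44, 4), 2))  # 1.78
```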
IJ      O     E       O - E    (O - E)²/E
11      43    45.24   -2.24    0.111
12      9     6.76    2.24     0.742
21      44    41.76   2.24     0.120
22      4     6.24    -2.24    0.804
TOTAL                          1.78
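The cell-by-cell computation in the table can be sketched in Python as follows, using the observed counts and the expected frequencies calculated earlier. The variable names are chosen for this illustration only.

```python
# Cells in the order 11, 12, 21, 22.
labels = ("11", "12", "21", "22")
observed = [43, 9, 44, 4]
expected = [45.24, 6.76, 41.76, 6.24]

# Contribution of each cell to the statistic: (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

for label, o, e, c in zip(labels, observed, expected, contributions):
    print(label, o, e, round(o - e, 2), round(c, 3))
print("TOTAL", round(chi2, 2))
```

The total agrees with the value obtained from the shortcut formula for a 2 by 2 table, which is a useful check on the arithmetic.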
Decision rule
Decision
CONCLUSION
Based on the decision taken above, we conclude that the handedness of the respondents is
independent of their gender. That is, a person may be left-handed or right-handed irrespective of gender.
$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$$
$$C = \sqrt{\frac{1.78}{100 + 1.78}} = \sqrt{\frac{1.78}{101.78}}$$
$$C = \sqrt{0.0175} = 0.1323$$
COMMENT
The value of the contingency coefficient is 0.1323. The contingency coefficient has the
disadvantage that it does not reach a maximum of 1 or a minimum of -1; the
highest value it can reach in a 2 by 2 table is 0.707. Even against that limit, the association between the two variables
is very weak.
HYPOTHESIS STATEMENT
Cases
                  HANDEDNESS
                  RIGHT HANDED   LEFT HANDED   Total
FEMALE   Count    44             4             48
Chi-Square Tests
Symmetric Measures
Value   Approximate Significance
DECISION RULE
Decision
Since the p-value (0.182) is greater than the α value (0.05), we have no reason to reject H0.
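The reported p-value can be checked by hand for this 2 by 2 table. For 1 degree of freedom, the upper-tail probability of the chi-squared distribution equals erfc(sqrt(χ²/2)), so only the Python standard library is needed. The sketch below is an independent check and not the routine SPSS uses. The function name is hypothetical, and the unrounded statistic 1.7774 is assumed as input.

```python
import math


def chi_square_p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


alpha = 0.05
p_value = chi_square_p_value_df1(1.7774)
print(round(p_value, 2))
print("fail to reject H0" if p_value > alpha else "reject H0")
```

This closed form holds only for 1 degree of freedom. Larger tables need the general chi-squared distribution function from a statistical library.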
CONCLUSION
Based on the decision taken above, we conclude that the handedness of the respondents is
independent of their gender. That is, a person may be left-handed or right-handed irrespective of gender.
4.0 SUMMARY AND CONCLUSION
It is concluded that the handedness of the respondents is independent of their gender. That
Furthermore, comparing the result obtained by the manual method with that of the
electronic method, i.e. the Statistical Package for the Social Sciences (SPSS), shows
that the results and the conclusions are the same. The results for the contingency
coefficient are also equal: the coefficient of contingency for this association is 0.132, i.e.
REFERENCES
Pearson, Karl (1904). Mathematical Contributions to the Theory of Evolution.
Ferguson, G. A. (1966). Statistical Analysis in Psychology and Education. New York: McGraw-Hill.
Andersen, Erling B. (1980). Discrete Statistical Models with Social Science Applications. North
Holland.
Christensen, Ronald (1997). Log-Linear Models and Logistic Regression. Springer Texts in
Statistics (Second ed.). New York: Springer-Verlag. pp. xvi + 483. ISBN 0-387-98247-7. MR
1633357.
Lauritzen, Steffen L. (2002). Lectures on Contingency Tables (updated electronic version of the
3rd (1989) ed., University of Aalborg).
Gokhale, D. V.; Kullback, Solomon (1978). The Information in Contingency Tables. Marcel