
Application of Coefficient of Contingency Among Classification

This document summarizes a seminar presentation on applying the coefficient of contingency to measure the association between two classification variables. It includes: 1) an introduction to contingency tables and the coefficient of contingency; 2) a description of hypothetical data on sex (male/female) and handedness (right/left); 3) calculations testing the independence of sex and handedness using Pearson's chi-square test. The chi-square statistic is 1.78, below the critical value of 3.841, so we fail to reject independence.

A

SEMINAR PRESENTED

ON

APPLICATION OF COEFFICIENT OF CONTINGENCY


AMONG CLASSIFICATION

PRESENTED BY

JIMOH OLAMIDAYO MICHEAL

FPA/SA/15/1-0034

SUBMITTED TO

DEPARTMENT OF MATHEMATICS AND STATISTICS

THE FEDERAL POLYTECHNIC, ADO EKITI, EKITI STATE.

JANUARY, 2018.

1.0 INTRODUCTION

In statistics, a contingency table (also known as a cross tabulation or crosstab) is a type of table in a matrix format that displays the (multivariate) frequency distribution of the variables. Contingency tables are heavily used in survey research, business intelligence, engineering and scientific research. They provide a basic picture of the interrelation between two variables. The term contingency table was first used by Karl Pearson in ‘On the Theory of Contingency and Its Relation to Association and Normal Correlation’, part of the Drapers’ Company Research Memoirs, Biometric Series I, published in 1904.

A crucial problem of multivariate statistics is finding the (direct) dependence structure underlying the variables contained in high-dimensional contingency tables. If some of the conditional independences are revealed, then even the storage of the data can be done in a smarter way (see Lauritzen 2002). To do this, one can use information-theory concepts, which draw their information only from the probability distribution; this distribution can easily be estimated from the contingency table by the relative frequencies.
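As a minimal sketch of how a contingency table and its relative frequencies can be obtained from raw records, the Python below cross-tabulates a short list of (sex, handedness) pairs with the standard `collections.Counter`. The records and the function names `crosstab` and `relative_frequencies` are hypothetical, chosen only for illustration; they are not the seminar's data.

```python
from collections import Counter

def crosstab(records):
    """Count each (row, column) pair; return the table with its row and column labels."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    # Counter returns 0 for pairs that never occur, so empty cells are filled in.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return table, rows, cols

def relative_frequencies(table):
    """Divide every cell by the grand total to estimate the joint probabilities."""
    n = sum(sum(row) for row in table)
    return [[cell / n for cell in row] for row in table]

# Hypothetical raw observations (sex, handedness); not the seminar's data.
records = [("male", "right"), ("female", "right"), ("male", "left"),
           ("female", "right"), ("male", "right")]
table, rows, cols = crosstab(records)
print(rows, cols)                    # ['female', 'male'] ['left', 'right']
print(table)                         # [[0, 2], [1, 2]]
print(relative_frequencies(table))   # [[0.0, 0.4], [0.2, 0.4]]
```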

1.1 AIM AND OBJECTIVES OF THIS TOOL

The aim of this work is to test the association between two variables, sex (male or female) and handedness (right- or left-handed), in a hypothetical data set, using the coefficient of contingency.

The objectives of the tool are:

I. To compare the proportion of males who are right-handed with the proportion of females who are right-handed, to see whether they are the same.

II. To determine the significance of the difference between the two proportions.
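The two objectives can be made concrete with the counts from the data table in section 3.1 (43 of 52 males and 44 of 48 females are right-handed). The sketch below is an illustration rather than part of the original analysis: it computes both proportions and the pooled two-proportion z statistic, a standard test that this seminar does not itself use. For a 2 × 2 table, z² equals Pearson's chi-square without continuity correction, so it leads to the same conclusion. The function name `compare_proportions` is hypothetical.

```python
import math

def compare_proportions(x1, n1, x2, n2):
    """Return both sample proportions and the pooled two-proportion z statistic."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # common proportion under H0: p1 == p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return p1, p2, (p1 - p2) / se

# Right-handed counts from the seminar's table: 43 of 52 males, 44 of 48 females.
p_male, p_female, z = compare_proportions(43, 52, 44, 48)
print(round(p_male, 3), round(p_female, 3))   # 0.827 0.917
print(round(z, 3), round(z * z, 3))           # -1.333 1.777
```

Since |z| ≈ 1.333 is below 1.96 (and 1.96² ≈ 3.841), the difference between the two proportions is not significant at the 0.05 level.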

1.2 SOURCE OF STATISTICAL DATA

Statistical data are available in several forms. Some are obtained through experimentation, in which the observations recorded by the person collecting the data serve as the data. The type of problem at hand dictates where an investigator can obtain statistical data. Statistical data can therefore be obtained from public and commercial sources and from laboratory and field experiments. Government ministries also provide statistics for research use. Some data come from censuses and sample surveys, while others are collected through questionnaires and personal interviews.

The data used for this seminar are hypothetical.

1.3 SCOPE OF THE STUDY

This write-up is based on the application of the coefficient of contingency to hypothetical data.

2.0 RESEARCH METHODOLOGY

2.1 CONTINGENCY COEFFICIENT

Contingency coefficients can be used to estimate the extent of the relationship between variables, that is, to show the strength of a relationship.

The contingency coefficient is an adjustment to the phi coefficient, intended to adapt it to tables larger than 2 by 2. It is computed as the square root of chi-square (χ²) divided by chi-square plus N, the sample size. The contingency coefficient is always less than 1 and approaches 1.0 only for large tables; it cannot reach a maximum of 1 or a minimum of -1 (it is never negative). The highest value it can reach in a 2 by 2 table is 0.707, and the maximum it can reach in a 4 by 4 table is 0.866. The larger the contingency coefficient, the stronger the association. Some researchers recommend it only for 5 by 5 tables or larger, since for smaller tables it underestimates the level of association. Moreover, it does not apply to asymmetric tables (those where the numbers of rows and columns are not equal).

The formula for the contingency coefficient is

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$$

where χ² is computed as in Pearson’s chi-square test. So that complete association gives a value of 1 in a table of any number of rows and columns, divide the contingency coefficient by its maximum attainable value, √((k − 1)/k):

$$C_{\text{adj}} = \frac{C}{\sqrt{(k-1)/k}}$$

(Recall that the contingency coefficient only applies to tables in which the number of rows equals the number of columns, both equal to k.)
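A minimal Python sketch of these formulas follows. The function names `contingency_coefficient`, `c_max` and `adjusted_c` are hypothetical, and the code assumes a square k × k table, as the text requires. Printing `c_max(k)` for small k reproduces the 2 by 2 ceiling of 0.707 and shows that √(3/4) ≈ 0.866 for a 4 by 4 table.

```python
import math

def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient C = sqrt(chi2 / (n + chi2))."""
    return math.sqrt(chi2 / (n + chi2))

def c_max(k):
    """Largest value C can take in a k-by-k table: sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)

def adjusted_c(chi2, n, k):
    """C divided by c_max(k), so that complete association gives 1."""
    return contingency_coefficient(chi2, n) / c_max(k)

for k in (2, 3, 4, 5):
    print(k, round(c_max(k), 3))
# 2 0.707
# 3 0.816
# 4 0.866
# 5 0.894
```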

2.2 PEARSON’S CHI-SQUARE TEST

Pearson’s chi-square test is used to assess three types of comparison: goodness of fit, homogeneity, and independence.

• A test of goodness of fit establishes whether an observed frequency distribution differs from a theoretical distribution.

• A test of homogeneity compares the distribution of counts for two or more groups using the same categorical variable (e.g. the choice of activity (college, military, employment, travel) of graduates of a high school, reported a year after graduation and sorted by graduation year, to see whether the number of graduates choosing a given activity has changed from class to class or from decade to decade).

• A test of independence assesses whether unpaired observations on two variables, expressed in a contingency table, are independent of each other (e.g. polling responses from people of different nationalities, to see whether one’s nationality is related to the response).

For all three tests, the computational procedure includes the following steps:

1. Calculate the chi-squared test statistic, χ², which resembles a normalized sum of squared deviations between observed and theoretical frequencies.

2. Determine the degrees of freedom (df) of the statistic. For a test of goodness of fit, this is essentially the number of categories reduced by the number of parameters of the fitted distribution. For a test of homogeneity, df = (rows − 1) × (columns − 1), where rows corresponds to the number of categories (i.e. rows in the associated contingency table) and columns corresponds to the number of independent groups (i.e. columns in the associated contingency table). For a test of independence, df = (rows − 1) × (columns − 1), where rows corresponds to the number of categories in one variable and columns corresponds to the number of categories in the second variable.

3. Select a desired level of confidence (significance level, or alpha value) for the result of the test.

4. Compare χ² to the critical value from the chi-squared distribution with df degrees of freedom at the selected confidence level. The test is one-sided, since it asks in only one direction whether the test value is greater than the critical value. In many cases the chi-squared distribution gives a good approximation of the distribution of χ².

5. Reject or fail to reject the null hypothesis that the observed frequency distribution is the same as the theoretical distribution, based on whether the test statistic exceeds the critical value of χ². If the test statistic exceeds the critical value, the null hypothesis (H0: there is no difference between the distributions) can be rejected and the alternative hypothesis (H1: there is a difference between the distributions) can be accepted, both at the selected level of confidence.
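The five steps can be sketched for the test of independence as follows. This is an illustrative Python sketch, not the seminar's method of computation: the function name `independence_test` is hypothetical, the significance level is fixed at 0.05, and the critical values are the standard tabulated 5% points for df = 1 to 5 only, so larger tables would need a fuller table or a statistics library.

```python
# Standard tabulated 5% critical values of chi-square for df = 1..5.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def independence_test(observed):
    """Chi-square test of independence for an r-by-c table at alpha = 0.05."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Step 1: expected counts under independence, then the chi-square statistic.
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    # Step 2: degrees of freedom = (rows - 1) * (columns - 1).
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Steps 3-4: alpha is fixed at 0.05; look up the one-sided critical value.
    critical = CRITICAL_05[df]   # KeyError if df > 5 in this sketch
    # Step 5: reject H0 when the statistic exceeds the critical value.
    return chi2, df, critical, chi2 > critical

chi2, df, critical, reject = independence_test([[43, 9], [44, 4]])
print(round(chi2, 3), df, critical, reject)   # 1.777 1 3.841 False
```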

3.0 ANALYSIS OF THE SEMINAR

3.1 DESCRIPTION OF DATA

                  HANDEDNESS
GENDER    RIGHT HANDED    LEFT HANDED    TOTAL
MALE           43              9           52
FEMALE         44              4           48
TOTAL          87             13          100

HYPOTHESIS STATEMENT

H0: The handedness of the respondents is independent of their gender.

H1: The handedness of the respondents depends on their gender.

TEST STATISTICS

EXPECTED FREQUENCY
$$E_{ij} = \frac{T_i \times T_j}{N}$$

where

• E_ij = expected frequency for the ith row and jth column,

• T_i = total in the ith row,

• T_j = total in the jth column,

• N = table grand total.

$$E_{11} = \frac{52 \times 87}{100} = 45.24$$

$$E_{12} = \frac{52 \times 13}{100} = 6.76$$

$$E_{21} = \frac{48 \times 87}{100} = 41.76$$

$$E_{22} = \frac{48 \times 13}{100} = 6.24$$

EXPECTED FREQUENCY TABLE

                  HANDEDNESS
GENDER    RIGHT HANDED    LEFT HANDED    TOTAL
MALE          45.24           6.76          52
FEMALE        41.76           6.24          48
TOTAL         87             13            100
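As a check on the hand calculation, the hypothetical helper `expected_frequencies` below applies E_ij = T_i × T_j / N to the observed table (rows male and female, columns right- and left-handed). It is a plain-Python sketch with no external libraries.

```python
def expected_frequencies(observed):
    """E_ij = (row i total) * (column j total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[ti * tj / n for tj in col_totals] for ti in row_totals]

# Observed counts from the seminar: rows = male, female; columns = right, left.
observed = [[43, 9], [44, 4]]
expected = expected_frequencies(observed)
print([[round(e, 2) for e in row] for row in expected])
# [[45.24, 6.76], [41.76, 6.24]]
```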

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed value and E_i the expected value in the ith cell of the contingency table, and k is the number of cells.

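For a 2 × 2 table with cells a, b in the first row and c, d in the second, this sum reduces to a shortcut formula. With a = 43, b = 9, c = 44, d = 4 and N = 100:

$$\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} = \frac{100\,(43 \times 4 - 9 \times 44)^2}{52 \times 48 \times 87 \times 13} = \frac{100\,(172 - 396)^2}{2822976} = \frac{100 \times (-224)^2}{2822976} = 100 \times 0.0178 = 1.78$$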
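The shortcut can be checked in a few lines of Python. The name `chi_square_2x2` is hypothetical and, like the formula above, the sketch applies no continuity correction.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

chi2 = chi_square_2x2(43, 9, 44, 4)
print(round(chi2, 4))   # 1.7774
```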

Or, using the tabular method:

CELL (ij)    O       E      O − E    (O − E)²/E
11          43     45.24    −2.24      0.111
12           9      6.76     2.24      0.742
21          44     41.76     2.24      0.120
22           4      6.24    −2.24      0.804
TOTAL                                  1.78

Hence χ² = 1.78.
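The per-cell terms of the tabular method can be reproduced with the short Python sketch below, which takes the observed counts and the expected values from the expected frequency table. Carried to four decimals, the terms for cells 12 and 22 are 0.7422 and 0.8041.

```python
observed = [43, 9, 44, 4]               # cells 11, 12, 21, 22
expected = [45.24, 6.76, 41.76, 6.24]   # from the expected frequency table

# Each cell contributes (O - E)^2 / E; their sum is the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
for cell, term in zip(["11", "12", "21", "22"], contributions):
    print(cell, f"{term:.4f}")
print("total", f"{sum(contributions):.2f}")
# 11 0.1109
# 12 0.7422
# 21 0.1202
# 22 0.8041
# total 1.78
```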

The degrees of freedom for a 2 × 2 table are (2 − 1)(2 − 1) = 1 × 1 = 1.

Level of significance: α = 0.05 (95% confidence level).

The table (critical) value at the 0.05 level of significance with df = 1 is 3.841.

Decision rule

Reject H0 if χ²_cal ≥ χ²_table; otherwise there is no reason (evidence) to reject H0.

Decision

Since χ²_cal (1.78) is less than χ²_table (3.841), we have no reason to reject H0.
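The same decision can be reached through a p-value. For one degree of freedom, the upper-tail probability of the chi-square distribution has the closed form P(χ² > x) = erfc(√(x/2)), because a chi-square variable with 1 df is the square of a standard normal variable. The sketch below uses Python's standard `math.erfc`; the helper name `chi2_sf_df1` is hypothetical, and the formula is valid for df = 1 only. It gives a p-value of about 0.182, matching the SPSS output, and confirms that 3.841 is the 5% critical value.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability P(chi-square with 1 df > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

chi2_cal = 100 * (43 * 4 - 9 * 44) ** 2 / (52 * 48 * 87 * 13)
p_value = chi2_sf_df1(chi2_cal)
print(round(chi2_cal, 3), round(p_value, 3))   # 1.777 0.182
print(round(chi2_sf_df1(3.841), 3))            # 0.05
```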

CONCLUSION

Based on the decision taken above, we conclude that the handedness of the respondents is independent of their gender; that is, whether a respondent is left- or right-handed does not depend on gender.

3.2 CONTINGENCY COEFFICIENT

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$$

where χ² is the calculated chi-square, 1.78, and N is the total number of respondents, 100.

$$C = \sqrt{\frac{1.78}{100 + 1.78}} = \sqrt{\frac{1.78}{101.78}} = \sqrt{0.0175} = 0.1323$$

COMMENT

We observe that the value of the contingency coefficient is 0.1323. The contingency coefficient cannot reach a maximum of 1 or a minimum of -1, and the highest value it can reach in a 2 by 2 table is 0.7071; measured against that ceiling, 0.1323 shows that the association or relationship between the two variables is very weak.
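The comparison with the 2 by 2 ceiling can be expressed as the ratio C/C_max, the rescaling given in section 2.1. The Python lines below are an illustrative sketch; the ratio (about 0.187) was not reported in the seminar and is shown only to make the comparison explicit. Computed without intermediate rounding, C is 0.132, consistent with the SPSS value of .132.

```python
import math

chi2 = 1.78   # calculated chi-square from the worked example
n = 100       # number of respondents
k = 2         # the table is 2 x 2

c = math.sqrt(chi2 / (n + chi2))
c_max = math.sqrt((k - 1) / k)   # ceiling of C for a k-by-k table
print(round(c, 3), round(c_max, 4), round(c / c_max, 3))   # 0.132 0.7071 0.187
```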

USING SPSS FOR ANALYSIS

HYPOTHESIS STATEMENT

H0: The handedness of the respondents is independent of their gender.

H1: The handedness of the respondents depends on their gender.

Case Processing Summary

                                    Cases
                       Valid            Missing          Total
                       N     Percent    N    Percent     N     Percent
GENDER * HANDEDNESS    100   100.0%     0    0.0%        100   100.0%

GENDER * HANDEDNESS Crosstabulation

                                       HANDEDNESS
                                       RIGHT HANDED   LEFT HANDED   Total
GENDER   MALE     Count                    43              9           52
                  Expected Count           45.2            6.8         52.0
         FEMALE   Count                    44              4           48
                  Expected Count           41.8            6.2         48.0
Total             Count                    87             13          100
                  Expected Count           87.0           13.0        100.0

Chi-Square Tests

                       Value    df    P-value (2-sided)
Pearson Chi-Square     1.777    1     .182

Symmetric Measures

                                                  Value   Approximate Significance
Nominal by Nominal   Contingency Coefficient      .132    .182
N of Valid Cases                                  100

DECISION RULE

Reject H0 if P-value ≤ α; otherwise there is no reason to reject H0.

Decision

Since the P-value (0.182) is greater than α (0.05), we have no reason to reject H0.

CONCLUSION

Based on the decision taken above, we conclude that the handedness of the respondents is independent of their gender; that is, whether a respondent is left- or right-handed does not depend on gender.

4.0 SUMMARY AND CONCLUSION

It is concluded that the handedness of the respondents is independent of their gender; that is, whether a respondent is left- or right-handed does not depend on gender.

Furthermore, comparing the results obtained by the manual method with those of the electronic method, i.e. the Statistical Package for the Social Sciences (SPSS), shows that the results and the conclusion are the same. The contingency coefficient also agrees: the coefficient of contingency for this association is 0.132, i.e. the relationship or association is weak.

REFERENCES

Pearson, K. (1904). Mathematical Contributions to the Theory of Evolution. Dulau and Co.

Ferguson, G. A. (1966). Statistical Analysis in Psychology and Education. New York: McGraw-Hill.

Andersen, E. B. (1980). Discrete Statistical Models with Social Science Applications. North-Holland.

Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press. ISBN 978-0-262-02113-5. MR 381130.

Christensen, R. (1997). Log-Linear Models and Logistic Regression (2nd ed.). Springer Texts in Statistics. New York: Springer-Verlag. pp. xvi + 483. ISBN 0-387-98247-7. MR 1633357.

Lauritzen, S. L. (2002). Lectures on Contingency Tables (updated electronic version of the University of Aalborg 3rd (1989) ed.; earlier editions 1979, 1982, 1989).

Gokhale, D. V., & Kullback, S. (1978). The Information in Contingency Tables. Marcel Dekker. ISBN 0-824-76698-9.
