
Chi Square Test

Uploaded by vidya gadkari
Copyright © All Rights Reserved

Chi-squared Test

A chi-square (χ²) statistic measures how well a model's expected frequencies compare with actual observed data. The data used in calculating a chi-square statistic must be
- random,
- raw (counts, not percentages or proportions),
- mutually exclusive,
- drawn from independent observations,
- drawn from a large enough sample.
For these tests, the degrees of freedom, which depend on the number of categories (for a contingency table, its numbers of rows and columns), determine the chi-square distribution against which the statistic is judged when deciding whether a null hypothesis can be rejected. As with any statistic, the larger the sample size, the more reliable the result.
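For reference, the statistic these conditions apply to is Pearson's sum over all cells (or categories). The symbols O_i, E_i and k are introduced here for illustration; the clinical example later in this document computes exactly this sum as Σ (O-E)²/E.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Here O_i is the observed frequency in cell i, E_i is the frequency expected under the null hypothesis, and k is the number of cells. The larger the value, the bigger the discrepancy between the data and the model.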
Historical aspects: -
In the 19th century, statistical analytical methods were mainly applied in biological data
analysis and it was customary for researchers such as Sir George Airy and Professor
Merriman to assume that observations followed a normal distribution. Later on, Karl
Pearson criticized the work of those researchers in his 1900 paper.
At the end of the 19th century, Pearson noticed significant skewness within some biological observations.
To model the observations regardless of whether they were normal or skewed, Pearson, in a series of articles published from 1893 to 1916, devised the Pearson distribution, a family of continuous probability distributions that includes the normal distribution and many skewed distributions.
He also proposed a method of statistical analysis that uses the Pearson distribution to model the observations.
In 1900, Pearson published a paper on the χ² test, which is considered one of the foundations of modern statistics. In this paper, Pearson investigated a test of goodness of fit to determine how well a model actually fits the observations.

Purpose:-
The chi-square test is intended to test how likely it is that an observed distribution is due to chance. It is also called a "goodness of fit" statistic, because it measures how well the observed distribution of data fits the distribution expected under the null hypothesis, for example the distribution expected if the variables are independent.
Application area:-
The chi-square test is most commonly used to compare the incidence (or proportion) of a characteristic in one group with the incidence (or proportion) of that characteristic in other group(s).
1. Test for independence of attributes:-
With the help of the χ² test we can find out whether two or more attributes are associated or not.
2. χ² test as goodness of fit:-
The χ² test for goodness of fit enables us to determine the extent to which a theoretical probability distribution agrees with an empirical sample distribution.
3. Yates' correction for continuity:-
The χ² distribution is continuous, but the data under test are categorical counts, which are discrete.
Using a continuous distribution for discrete data causes some error; for a 2×2 contingency table we can apply Yates' correction for continuity, which subtracts 0.5 from each |O - E| before squaring.
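4. Test for population variance:-
- This is considered a parametric test.
- The assumption underlying this χ² test is that the population from which the sample is drawn is normally distributed.
- The test statistic is χ² = (n - 1)s²/σ², where n is the sample size, s² the sample variance and σ² the hypothesized population variance, with n - 1 degrees of freedom.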

5. Test for Homogeneity:-
- This is useful when we want to verify whether several populations are homogeneous with respect to some characteristic of interest.
- For example, whether the milk supplied by various suppliers is alike with respect to a particular ingredient (e.g., lactose).
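A goodness-of-fit test (application 2) compares observed category counts with the counts implied by a hypothesized distribution. The sketch below assumes a hypothetical example, 120 rolls of a die tested for fairness, and hard-codes the standard table critical value 11.070 for 5 degrees of freedom at alpha = 0.05; the helper name `goodness_of_fit` is also hypothetical.

```python
def goodness_of_fit(observed, expected_probs):
    """Chi-square goodness-of-fit statistic for observed counts vs. hypothesized probabilities."""
    if len(observed) != len(expected_probs):
        raise ValueError("observed and expected_probs must have the same length")
    if abs(sum(expected_probs) - 1.0) > 1e-9:
        raise ValueError("expected probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in expected_probs]  # expected count per category
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # no parameters estimated from the data
    return stat, df

# Hypothetical data: 120 rolls of a die we suspect may be unfair.
rolls = [18, 22, 16, 25, 19, 20]
stat, df = goodness_of_fit(rolls, [1 / 6] * 6)
CRITICAL_0_05_DF5 = 11.070  # from a standard chi-square table
print(f"chi2 = {stat:.2f}, df = {df}, reject H0: {stat > CRITICAL_0_05_DF5}")
```

Here df = k - 1 because no parameters are estimated from the sample; each parameter estimated from the data removes one more degree of freedom.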
Steps to perform the Chi-Square Test:

1. Define Hypothesis.
2. Build a Contingency table.
3. Find the expected values.
4. Calculate the Chi-Square statistic.
5. Accept or Reject the Null Hypothesis.
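The five steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library API: the function name `chi_square_independence` and the small table of critical values at alpha = 0.05 (df 1 to 5, standard table values) are assumptions of this sketch. Step 1 (stating H0) and step 2 (building the table) happen before the call.

```python
# Critical values of chi-square at alpha = 0.05, taken from a standard table.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chi_square_independence(observed):
    """Run steps 3-5 on an r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Step 3: expected counts under independence, E = RT * CT / N.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # Step 4: chi-square statistic, the sum of (O - E)^2 / E over every cell.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Step 5: compare with the critical value for df = (r - 1)(c - 1).
    df = (len(observed) - 1) * (len(col_totals) - 1)
    critical = CRITICAL_0_05[df]  # raises KeyError if df > 5: extend the table
    return stat, df, critical, stat > critical

# Step 1: H0 says treatment and fever are independent.
# Step 2: the quinine counts from the clinical example in this document.
quinine = [[20, 480], [100, 1400]]
stat, df, critical, reject = chi_square_independence(quinine)
print(f"chi2 = {stat:.2f}, df = {df}, critical = {critical}, reject H0: {reject}")
```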

Steps for Chi-Square Test with an example:

Consider a data-set used to determine why customers are leaving a bank. Let's perform a chi-square test on two variables: Gender, with values Male/Female, as the predictor, and Exited, which describes whether a customer is leaving the bank, with values Yes/No, as the response. In this test we will check whether there is any relationship between Gender and Exited.
1. Define Hypothesis
Null hypothesis: Assumes that there is no association between the two variables.

Alternative hypothesis: Assumes that there is an association between the two variables.

Hypothesis testing: Hypothesis testing for the chi-square test of independence works as it does for other tests such as ANOVA: a test statistic is computed and compared with a critical value. The critical value for the chi-square statistic is determined by the

1. level of significance (typically 0.05),

2. degrees of freedom.

If the observed chi-square test statistic is greater than the critical value, the null hypothesis can be rejected.

If the observed chi-square test statistic is less than the critical value, we fail to reject (i.e., we accept) the null hypothesis.
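An equivalent form of this rule computes the p-value and rejects H0 when p < alpha. For 1 degree of freedom (the 2×2 case used in both examples here) the p-value has a closed form through the complementary error function, so the sketch below needs only Python's standard library. The helper name is hypothetical; for other degrees of freedom a routine such as `scipy.stats.chi2.sf` would be the usual choice.

```python
import math

def chi2_p_value_df1(stat):
    """Right-tail probability P(X > stat) for a chi-square variable with 1 df.

    With 1 degree of freedom X = Z**2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))

alpha = 0.05
# Chi-square values from the bank example and the quinine example.
for value in (2.22, 4.72):
    p = chi2_p_value_df1(value)
    print(f"chi2 = {value}: p = {p:.3f}, reject H0: {p < alpha}")
```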

For the given example:

Null Hypothesis (H0): The two variables are independent,
i.e., there is no relationship between a customer's gender and whether they exited.

Alternative Hypothesis (H1): The two variables are not independent,
i.e., there is a relationship between a customer's gender and whether they exited.

2. Contingency table

A contingency table shows the distribution of one variable in rows and another in columns. It is used to study the relationship between two variables.
Contingency table for observed values

The table above lists all the observed values. Our next steps are to find the expected values, compute the chi-square value, and check for a relationship.
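Building the table means counting how many raw records fall into each combination of categories. One way to do that is sketched below with Python's `collections.Counter`; `contingency_table` is a hypothetical helper, and the six records are made-up illustrations, not the bank data-set, whose observed counts are not reproduced here.

```python
from collections import Counter

def contingency_table(records, row_levels, col_levels):
    """Cross-tabulate (row_value, col_value) pairs into a list of count rows."""
    counts = Counter(records)  # missing combinations count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

# Hypothetical customer records: (Gender, Exited).
customers = [
    ("Male", "Yes"), ("Male", "No"), ("Female", "No"),
    ("Female", "Yes"), ("Male", "No"), ("Female", "No"),
]
table = contingency_table(customers, ["Male", "Female"], ["Yes", "No"])
for gender, row in zip(["Male", "Female"], table):
    print(gender, row, "total:", sum(row))
```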

3. Find the Expected Value

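Expected values are based on the null hypothesis that the two variables are independent. If A and B are independent events, P(A and B) = P(A) * P(B), so the expected count in a cell is N * (RT/N) * (CT/N), which simplifies to the formula below, where RT is the row total, CT is the column total and N is the grand total.

Formula:

E = RT*CT/N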

Let's calculate the expected value for the first cell, i.e., males who exited the bank:

E11 = 216*82/400 = 44.28 ≈ 44

Males who did not exit the bank:

E12 = 216*318/400 = 171.72 ≈ 172

Females who exited the bank:

E21 = 184*82/400 = 37.72 ≈ 38

Females who did not exit the bank:

E22 = 184*318/400 = 146.28 ≈ 146
We get the following results.

Expected values
Gender Exited Not exited TOTAL
Male 44 172 216
Female 38 146 184
Total 82 318 400
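All four expected-value calculations follow one pattern, so they can be generated from the marginal totals alone. A minimal sketch: `expected_counts` is a hypothetical helper, and the totals (216 and 184 by gender, 82 and 318 by exit status, N = 400) are the ones used in the calculations above. Printing to two decimals shows the unrounded values behind the whole numbers in the table.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: E = RT * CT / N."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must share the same grand total N")
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Rows: Male, Female. Columns: Exited = Yes, Exited = No.
expected = expected_counts([216, 184], [82, 318])
for label, row in zip(["Male", "Female"], expected):
    print(label, [round(e, 2) for e in row])
```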

4. Calculate Chi-Square value

Summarize the observed values and the calculated expected values in a table and compute the chi-square value with the chi-square statistic formula, χ² = Σ (O - E)²/E, summed over all cells.

We can see that the chi-square value is calculated as 2.22.

5. Accept or Reject the Null Hypothesis

The degrees of freedom for a contingency table are given by

df = (r-1) * (c-1)

where r and c are the numbers of rows and columns.

Here df = (2-1) * (2-1) = 1.

At the 95% confidence level, i.e., alpha = 0.05, we check whether the calculated chi-square value falls in the acceptance or the rejection region.
Critical chi-square values can be read from a chi-square table.

With 1 degree of freedom (from the contingency table) and alpha = 0.05, the critical chi-square value is 3.84.
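Instead of reading a printed chi-square table, the critical value can be computed numerically. For 1 degree of freedom the right-tail area beyond x is erfc(sqrt(x/2)), so a simple bisection recovers 3.84 using only the standard library. This is a sketch for df = 1 only; the function names are hypothetical, and for general df a routine such as `scipy.stats.chi2.ppf(1 - alpha, df)` would normally be used.

```python
import math

def chi2_sf_df1(x):
    """Right-tail area of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2))

def critical_value_df1(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Find x with chi2_sf_df1(x) == alpha by bisection (the tail area falls as x grows)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_df1(mid) > alpha:
            lo = mid  # tail area still larger than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(f"critical value (df = 1, alpha = 0.05): {critical_value_df1(0.05):.2f}")
```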
In the above figure, chi-square values range from 0 to infinity, while alpha (the area to the right of a given chi-square value) ranges from 1 down to 0 in the opposite direction. We reject the null hypothesis if the chi-square value falls in the rejection region

(alpha from 0 to 0.05).

So here we fail to reject (i.e., we accept) the null hypothesis, since the calculated chi-square value is less than the critical chi-square value:

2.22 < 3.84

That is, the data show no significant relationship between a customer's gender and whether they exited.

Limitations

The chi-square test is sensitive to small frequencies in the cells of a table. As a rule of thumb, when the expected value in a cell is less than 5, the chi-square approximation can be poor and lead to wrong conclusions.
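This rule of thumb can be checked before running the test. The hypothetical helper below computes every expected count with E = RT*CT/N and reports the cells below 5; the small 2×2 table is made-up data for illustration.

```python
def small_expected_cells(observed, threshold=5):
    """Return (row, col, expected) for every cell whose expected count is below threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    flagged = []
    for i, rt in enumerate(row_totals):
        for j, ct in enumerate(col_totals):
            e = rt * ct / n  # expected count under independence
            if e < threshold:
                flagged.append((i, j, e))
    return flagged

# Hypothetical small 2 x 2 table with 20 subjects in total.
small = [[2, 8], [3, 7]]
print(small_expected_cells(small))
# The quinine table from the clinical example below passes the check.
print(small_expected_cells([[20, 480], [100, 1400]]))
```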
EXAMPLES - CLINICALLY ORIENTED:-
EXAMPLE NO. 1
In an antimalarial campaign in India, quinine was
administered to 500 persons out of a total population of
2000. The number of fever cases is shown below:-
Treatment Fever No fever TOTAL
Quinine 20 480 500
No Quinine 100 1400 1500
Total 120 1880 2000
Discuss the usefulness of Quinine in checking malaria.
Solution:-
H0: Quinine is not effective in checking malaria (fever is independent of treatment).
Ha: Quinine is effective in checking malaria.
Table of Given values
Treatment Fever No fever TOTAL
Quinine 20 480 500
No Quinine 100 1400 1500
Total 120 1880 2000
Table of Expected values:-
Treatment Fever No fever
Quinine 30 470
No Quinine 90 1410

E11 = RT*CT/N = 500*120/2000 = 30
Calculation of χ²
O E (O-E) (O-E)² (O-E)²/E
20 30 -10 100 3.33
100 90 +10 100 1.11
480 470 +10 100 0.21
1400 1410 -10 100 0.07
χ² = Σ (O-E)²/E = 4.72 (4.73 without rounding the individual terms)
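d.f. = (C-1)(R-1) = (2-1)(2-1) = 1
Critical value χ²(0.05) at 1 d.f. = 3.84
Since 4.72 > 3.84, H0 is rejected.
The fever rate is 4% with quinine (20/500) versus about 6.7% without it (100/1500), so the association runs in quinine's favour.
Therefore, the conclusion is that quinine is useful in checking malaria.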
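Because the quinine data form a 2×2 table, application 3 (Yates' correction for continuity) also applies. The sketch below computes the corrected statistic Σ (|O - E| - 0.5)²/E; the helper name is hypothetical, and clamping |O - E| - 0.5 at zero is a design choice of this sketch so the correction never overshoots. With these counts the corrected value (about 4.27) is smaller than 4.72 but still above 3.84, so the conclusion does not change.

```python
def yates_chi_square(observed):
    """Yates-corrected chi-square for a 2 x 2 table: sum of (|O - E| - 0.5)^2 / E."""
    if len(observed) != 2 or any(len(row) != 2 for row in observed):
        raise ValueError("Yates' correction applies to 2 x 2 tables")
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            # Shrink each deviation by 0.5, clamped at 0 (sketch's own choice).
            diff = max(abs(observed[i][j] - e) - 0.5, 0.0)
            stat += diff ** 2 / e
    return stat

quinine = [[20, 480], [100, 1400]]
corrected = yates_chi_square(quinine)
print(f"Yates-corrected chi2 = {corrected:.2f}, reject H0 at 0.05: {corrected > 3.84}")
```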
