Chi-Square Distribution

The chi-square distribution is used to test whether an observed distribution fits an expected distribution. It is applied to categorical data by comparing observed and expected frequencies in a contingency table. The chi-square test statistic is calculated as the sum of the squared differences between observed and expected values divided by the expected value for each category. The test statistic is then compared to a critical value from the chi-square distribution table based on the degrees of freedom to determine if the null hypothesis that the distributions are the same can be rejected.

Uploaded by

Abhorn
Chi-square distribution

• The chi-square distribution is a theoretical (mathematical)
distribution which has wide applicability in statistical work.
• It is used to test how likely it is that an observed distribution is
due to chance. It also measures how well the observed distribution
of data fits the distribution that is expected if the variables are
independent.
Conti…..

• A chi-square test is designed to analyze categorical data. That means
that the data has been counted and divided into categories. It will not
work with parametric or continuous data (such as height in inches).
• The chi-square distribution with k degrees of freedom is the distribution
of the sum of the squares of k independent standard normal random variables.
• Chi square goodness of fit test depends only on the set of observed
and expected frequencies and degrees of freedom. This test does not
need any assumption regarding distribution of the parent population
from which the samples are taken.
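
As a rough illustration of the chi-square distribution as a sum of squared random values, the sketch below assumes the values are independent standard normal draws. The helper name `chi_square_draw`, the seed, the sample size, and the choice of 3 degrees of freedom are illustrative assumptions, not part of the slides.

```python
import random


def chi_square_draw(k, rng):
    """Return one draw: the sum of k squared standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)  # fixed seed so the run is repeatable (assumption)
k = 3                   # degrees of freedom, chosen only for illustration
draws = [chi_square_draw(k, rng) for _ in range(20000)]

# The mean of a chi-square distribution equals its degrees of freedom,
# so the sample mean should land close to k.
sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 2))
```

Every draw is non-negative because it is a sum of squares, which is why the chi-square distribution has no negative values.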
Conti…….
• To calculate the test statistic for the chi-square goodness-of-fit test,
the observed frequencies and the expected frequencies are used.
• The observed frequency O of a category is the frequency for the
category observed in the sample data.
• The expected frequency E of a category is the calculated frequency
for the category. Expected frequencies are obtained assuming the
specified (or hypothesized) distribution.
• A contingency table is used to investigate whether two traits or
characteristics are related.
Conti…..

• An r × c contingency table shows the observed frequencies for two
variables. The observed frequencies are arranged in r rows and c
columns. The intersection of a row and a column is called a cell.
• The purpose of a chi-square goodness-of-fit test is to compare an
observed distribution to an expected distribution.
Conti……

• The chi-square test is generally performed on discrete data.

Steps to calculate the chi-square statistic:


1. Make a hypothesis based on your basic question

2. Determine the expected frequencies


3. Create a table with observed frequencies, expected frequencies, and
chi-square values using the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Conti……

4. Find the degrees of freedom: (c-1)(r-1)


5. Find the critical value in the Chi-Square Distribution table
6. If the critical value from the table > the calculated chi-square value, you do not reject
your null hypothesis; if the calculated value exceeds the critical value, you reject it.
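
The calculation and decision steps can be sketched for a goodness-of-fit case as follows. The die-roll counts are hypothetical data invented for illustration, and the function name `chi_square_statistic` is an assumption. For a single row of k categories the degrees of freedom are k - 1 (the (c-1)(r-1) rule applies to contingency tables); the critical value 11.070 is the df = 5, 0.050 entry from this document's table.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 60 rolls of a die (illustrative data only).
observed = [8, 12, 9, 11, 10, 10]

# Null hypothesis: the die is fair, so every face is expected equally often.
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1          # k - 1 for a goodness-of-fit test
critical_value = 11.070         # table value for df = 5, alpha = 0.05

# Reject the null hypothesis only when the statistic exceeds the critical value.
reject_null = statistic > critical_value
print(statistic, df, reject_null)
```

With these hypothetical counts the statistic is far below the critical value, so the null hypothesis of a fair die would not be rejected.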
Steps To Follow to Perform the Chi-Squared Test with examples:

1. State the null hypothesis. This states that the variables in the
contingency table are independent (or the classification of one
variable does not affect the classification of the other).

e.g. H0: Divorce is not related to being a smoker.

State the alternative hypothesis. This states that the variables are
dependent, or that the classification of one affects the classification of the
other.

e.g. HA: Divorce is related to being a smoker.


Conti……

2. Calculate the expected values.

• Record the experimental results (observed values, O) in a contingency table of rows and
columns.
• Calculate the "expected" (E) value for each cell:

$$E = \frac{R \times C}{N}$$

Where:

E = expected number for the cell in the ith row and jth column
R = row total (for row i)
C = column total (for column j)
N = grand total
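
As a minimal sketch of the formula E = (R)(C)/N, the snippet below computes the expected value for every cell of a table from its row and column totals. It uses the smoker/divorce counts from this document's worked example; the variable names are illustrative assumptions.

```python
# Observed counts: rows are smoker / non-smoker,
# columns are divorced / non-divorced.
observed = [[73, 12],
            [43, 39]]

row_totals = [sum(row) for row in observed]         # R for each row
col_totals = [sum(col) for col in zip(*observed)]   # C for each column
grand_total = sum(row_totals)                       # N

# E = R * C / N for the cell in row i, column j.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

Rounded to whole numbers, these expected values are 59, 26, 57 and 25.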
Conti…..

Record these values in the contingency table.

              Divorced     Non-divorced
              (observed)   (observed)     Total
Smoker            73           12           85
Non-smoker        43           39           82
Total            116           51          167
Cont……
              Divorced                        Non-divorced
              Observed   Expected             Observed   Expected            Total
Smoker           73      85×116/167 ≈ 59         12      85×51/167 ≈ 26        85
Non-smoker       43      82×116/167 ≈ 57         39      82×51/167 ≈ 25        82
Total           116                              51                           167

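3. Calculate the chi-square test value:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(73-59)^2}{59} + \frac{(12-26)^2}{26} + \frac{(43-57)^2}{57} + \frac{(39-25)^2}{25} = 22.14$$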
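
The value 22.14 can be checked with a short calculation. This is a sketch with assumed variable names; it also shows that 22.14 follows from the expected values rounded to whole numbers, while unrounded expected values give a slightly smaller statistic of about 22.01.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cells in the order: smoker/divorced, smoker/non-divorced,
# non-smoker/divorced, non-smoker/non-divorced.
observed = [73, 12, 43, 39]

# Expected values rounded to whole numbers, as in the worked example.
expected_rounded = [59, 26, 57, 25]

# Unrounded expected values, E = R * C / N.
expected_exact = [85 * 116 / 167, 85 * 51 / 167,
                  82 * 116 / 167, 82 * 51 / 167]

chi2_rounded = chi_square_statistic(observed, expected_rounded)
chi2_exact = chi_square_statistic(observed, expected_exact)

print(round(chi2_rounded, 2), round(chi2_exact, 2))
```

Both values are far above the 5% critical value for 1 degree of freedom, so the rounding does not change the conclusion here.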
Conti….
4. Find the critical value from chi-squared tables.

To use these tables, two pieces of information are required:

• The level of significance: here the 5% level (written as α = 0.05).
• The degrees of freedom: df = (no. of rows − 1) × (no. of columns − 1) = (r − 1)(c − 1).

For the 2 × 2 table in this example, df = (2 − 1)(2 − 1) = 1.
With 1 degree of freedom and α = 0.05, the tables give χ² = 3.841.
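
The table lookup for this example can be reproduced with the Python standard library. This sketch relies on a shortcut that holds only for 1 degree of freedom: a chi-square variable with df = 1 is the square of a standard normal variable, so the critical value is the square of the upper α/2 normal quantile. For other degrees of freedom a table or a statistics library would be needed.

```python
from statistics import NormalDist

rows, cols = 2, 2                 # smoker/non-smoker by divorced/non-divorced
df = (rows - 1) * (cols - 1)      # (r - 1)(c - 1)
alpha = 0.05                      # 5% level of significance

# Valid for df = 1 only: P(Z^2 > x) = alpha when x is the square of
# the standard normal quantile at 1 - alpha / 2.
z = NormalDist().inv_cdf(1 - alpha / 2)
critical_value = z ** 2

print(df, round(critical_value, 3))
```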
Conti…….
5. Compare the test statistic with table statistic.

If the calculated test statistic (χ²) > the critical value from the χ²
table, then the null hypothesis is rejected and the alternative
hypothesis is favoured.
In our example, the test value of χ² = 22.14 is greater than the table
value of 3.841. Hence we reject H0 in favour of HA; that is, we conclude that
divorce and smoking are related.
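
The comparison can also be written as a short check. The sketch below uses the two values from the example and, as an addition not in the slides, computes a p-value with a closed form that is valid only for 1 degree of freedom.

```python
import math

chi2_value = 22.14       # calculated test statistic from the example
critical_value = 3.841   # table value for df = 1, alpha = 0.05
alpha = 0.05

# Decision rule: reject H0 when the statistic exceeds the critical value.
reject_null = chi2_value > critical_value

# For df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_value / 2))

print(reject_null, p_value < alpha)
```

The two criteria agree: the statistic exceeds the critical value exactly when the p-value falls below α.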
Cont…..
df                          Level of Significance
      0.20     0.100    0.075    0.050    0.025    0.010    0.005    0.001
1     1.642    2.706    3.170    3.841    5.024    6.635    7.879   10.828
2     3.219    4.605    5.181    5.991    7.378    9.210   10.597   13.816
3     4.642    6.251    6.905    7.815    9.348   11.345   12.838   16.266
4     5.989    7.779    8.496    9.488   11.143   13.277   14.860   18.467
5     7.289    9.236   10.008   11.070   12.833   15.086   16.750   20.516
6     8.558   10.645   11.466   12.592   14.449   16.812   18.548   22.458
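
One row of the table can be recomputed by hand as a sanity check. This sketch uses a closed form that holds only for 2 degrees of freedom, where the upper-tail probability is exp(-x/2), so the critical value for a given α is -2 ln(α). The variable names are illustrative.

```python
import math

# Significance levels in the same order as the table columns.
alphas = [0.20, 0.100, 0.075, 0.050, 0.025, 0.010, 0.005, 0.001]

# For df = 2 only: P(chi-square > x) = exp(-x / 2),
# so the critical value is x = -2 * ln(alpha).
row_df2 = [round(-2 * math.log(a), 3) for a in alphas]

print(row_df2)
```

The computed entry for the 0.001 level is 13.816 to three decimal places.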


Thank you !!
