Chi square
HYPOTHESIS TESTING
• The chi-square test is a statistical procedure for assessing the difference between observed and expected data. It can
also be used to determine whether the categorical variables in our data are associated. It helps to find out whether a
difference between two categorical variables is due to chance or to a relationship between them.
• For example, a meal delivery firm in India wants to investigate the link between gender, geography, and people's
food preferences.
• It is used to decide whether an observed difference between two categorical variables arises:
• As a result of chance, or
• Because of a relationship between them
CHI-SQUARE FORMULA
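In chi-square, df = n - 1, where n is the number of categories (for a test of independence on a table with r rows and c columns, df = (r - 1)(c - 1)).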
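As a minimal sketch, assuming the formula meant on this slide is the standard Pearson statistic (the sum over all categories of (O - E)^2 / E, where O is an observed count and E the matching expected count), the calculation and the df = n - 1 rule can be written as follows. The function names and the counts are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_categories):
    """df = n - 1 for a single categorical variable with n categories."""
    return n_categories - 1


# Hypothetical counts for illustration only (not taken from the slides).
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]

stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed))
print(f"chi-square = {stat:.3f}, df = {df}")
```

A larger statistic means the observed counts sit further from the expected ones. Whether it is large enough to matter is judged against the chi-square distribution with the given df.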
CHI-SQUARE STATISTIC
• A chi-square statistic is one way to show a relationship between two categorical variables.
• In statistics, there are two types of variables: numerical (countable) variables and non-numerical (categorical) variables.
Chi-square test
• Used to compare expected and observed data.
• A statistical method assessing the goodness of fit or independence between a set of observed values and those expected theoretically.
1. Independence
• The Chi-Square Test of Independence is an inferential statistical test which examines whether two sets of variables are likely to be related to each other or not. This test is used when we have counts of values for two nominal or categorical variables, and it is considered a non-parametric test. A relatively large sample size and independence of observations are the required criteria for conducting this test.
• For example: In a movie theatre, suppose we made a list of movie genres. Let us consider this as the first variable. The second variable is whether or not the people who came to watch those genres of movies bought snacks at the theatre. Here the null hypothesis is that the genre of the film and whether people bought snacks are unrelated. If this is true, movie genre does not affect snack sales.
2. Goodness-Of-Fit
• In statistical hypothesis testing, the Chi-Square Goodness-of-Fit test determines whether a variable is likely to come from a given distribution or not. We must have a set of data values and an idea of the distribution of this data. We can use this test when we have value counts for categorical variables. This test provides a way of deciding whether the data values have a "good enough" fit to our idea, or whether the data are a representative sample of the entire population.
• For example: Suppose we have bags of balls with five different colours in each bag. The given condition is that each bag should contain an equal number of balls of each colour. The idea we would like to test here is that the proportions of the five colours of balls in each bag are equal.
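The two tests can be sketched in plain Python as below. This is an illustrative outline rather than a definitive implementation: all counts are hypothetical (the slides give none), the function names are made up for this example, and the decision step compares the statistic with the standard 5% critical values of the chi-square distribution (5.991 for df = 2, 9.488 for df = 4) instead of computing a p-value.

```python
# 5% critical values of the chi-square distribution for the df used below.
CRITICAL_05 = {2: 5.991, 4: 9.488}


def goodness_of_fit_uniform(counts):
    """Compare counts with equal expected counts; return (statistic, df)."""
    expected = sum(counts) / len(counts)
    stat = sum((o - expected) ** 2 / expected for o in counts)
    return stat, len(counts) - 1


def independence(table):
    """Chi-square test of independence on a table of counts (list of rows).

    Expected count for a cell = row total * column total / grand total.
    Returns (statistic, df) with df = (rows - 1) * (columns - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_totals) - 1) * (len(col_totals) - 1)


# Hypothetical bag: counts of the five ball colours (100 balls in total).
bag = [22, 18, 20, 25, 15]
gof_stat, gof_df = goodness_of_fit_uniform(bag)
gof_reject = gof_stat > CRITICAL_05[gof_df]

# Hypothetical theatre data: rows are genres, columns are
# (bought snacks, did not buy snacks).
snacks_by_genre = [
    [50, 50],  # action
    [40, 60],  # comedy
    [30, 70],  # drama
]
ind_stat, ind_df = independence(snacks_by_genre)
ind_reject = ind_stat > CRITICAL_05[ind_df]

print(f"goodness of fit: chi-square = {gof_stat:.3f}, df = {gof_df}, reject H0: {gof_reject}")
print(f"independence:    chi-square = {ind_stat:.3f}, df = {ind_df}, reject H0: {ind_reject}")
```

With these made-up numbers, the bag is consistent with equal colour proportions, while the genre table would lead to rejecting independence at the 5% level. Different counts could easily give the opposite outcome.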
WHO USES CHI-SQUARE ANALYSIS?
• Chi-square is most commonly used by
researchers who are studying survey response
data because it applies to categorical variables.
Demography, consumer and marketing research,
political science, and economics are all examples
of this type of research.
• Example
• Let's say you want to know if gender has
anything to do with political party preference.
You poll 440 voters in a simple random sample
to find out which political party they prefer. The
results of the survey are shown in the table
below: