W8 Hypothesis Testing

Uploaded by

Thu Phương

BIG DATA

Week 8 – Hypothesis testing


July 2024
Inferential statistics: Hypothesis testing
TODAY’S OBJECTIVES

We will be dealing with:


1. Null/Alternative hypotheses
2. Test statistics
3. P-values
4. Chi-squared test

HYPOTHESIS TESTING

Our objective is to choose between two opposing statements about the population.
These statements are known as hypotheses and, by convention, are denoted by
𝐻0 and 𝐻1.

Examples of a null hypothesis, 𝐻0, paired with its alternative hypothesis, 𝐻1:

𝐻0: A modified process does not produce a higher yield than the standard process.
𝐻1: A modified process does produce a higher yield than the standard process.

𝐻0: There is no association between smoking and lung cancer incidence.
𝐻1: There is an association between smoking and lung cancer incidence.

𝐻0: A person is not gifted with Extra Sensory Perception.
𝐻1: A person is gifted with Extra Sensory Perception.

𝐻0: The average level of lead in the blood of people in a particular environment is as high as 3.0.
𝐻1: The average level of lead in the blood of people in a particular environment is lower than 3.0.

𝐻0: There is no difference in the mean exam score between Class A and Class B.
𝐻1: There is a difference in the mean exam score between Class A and Class B.
HYPOTHESIS TESTING : EXAMPLE

Example: A coin is tossed 100 times in order to decide whether or not it is a fair coin.
𝐻0: P(head) = P(tail) = ½.
𝐻1: P(head) ≠ P(tail).

We observe 98 heads and 2 tails. Common sense tells us that this is (very!) strong
evidence that the coin is biased toward heads.

More formally, we reject the null hypothesis that the coin is fair because the chance of obtaining 98
heads with a fair coin is extremely low. The observation of 98 heads is inconsistent with the coin
being fair: we would have expected approximately 50 heads, and the observation departs
significantly from this expectation.

Choosing between competing hypotheses requires us to conduct a statistical test, which uses sample data.
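The statement that 98 heads is extremely unlikely with a fair coin can be checked directly. The following is a minimal sketch, assuming the number of heads follows a Binomial(100, ½) distribution under 𝐻0; the names `prob_heads` and `p_at_least_98` are illustrative and not part of the original slides.

```python
from math import comb

n = 100        # number of tosses (from the example)
observed = 98  # observed number of heads (from the example)


def prob_heads(k: int) -> float:
    """P(X = k) when X ~ Binomial(n, 1/2), i.e. when the coin is fair."""
    return comb(n, k) / 2**n


# Probability of a result at least as lopsided toward heads as the one observed.
p_at_least_98 = sum(prob_heads(k) for k in range(observed, n + 1))
print(f"P(X >= {observed} | fair coin) = {p_at_least_98:.3e}")
```

The printed probability is many orders of magnitude below any commonly used significance level, which is what "inconsistent with the coin being fair" means here.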

TEST STATISTIC

In ‘classical testing’, we perform the test conditional on 𝐻0 being true. That is, 𝐻0 is our
‘working hypothesis’, which we hold to be true until we obtain significant evidence against it.

A test statistic is the formal mechanism used to evaluate the support given to 𝐻0 by sample data.

Note:
• Different test statistics are used for testing different forms of hypotheses.
• The observed (or measured) test statistic is a value calculated from the sample data.

SIGNIFICANCE LEVELS

Definition: The significance level 𝛼 is the probability of rejecting 𝐻0 when it is true (i.e. a false
positive). Rejecting a true 𝐻0 is called a Type I error, so 𝛼 is the probability of a Type I error.

We control the probability of committing a Type I error by setting the value of 𝛼.

Interpretation: If we perform a test at the 5% significance level, say, then we are basing our decision on a
procedure which gives us a 5% probability of making a Type I error when 𝐻0 is true.

It is common to use significance levels of 1%, 5% and 10%.
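The link between 𝛼 and the Type I error can be illustrated with the coin example. The following is a rough sketch, assuming 100 tosses, a 5% level and a two-sided test that rejects 𝐻0 whenever the p-value is at most 𝛼; the helper names `pmf` and `p_value` are hypothetical. It lists the head counts that lead to rejection and adds up their probabilities under a fair coin.

```python
from math import comb

n, alpha = 100, 0.05  # illustrative choices: 100 tosses, 5% significance level


def pmf(k: int) -> float:
    """P(X = k) for X ~ Binomial(n, 1/2), i.e. under H0 (fair coin)."""
    return comb(n, k) / 2**n


def p_value(k: int) -> float:
    """Two-sided p-value: probability of a head count at least as far from n/2 as k."""
    distance = abs(k - n / 2)
    return sum(pmf(j) for j in range(n + 1) if abs(j - n / 2) >= distance)


# Head counts for which the test rejects H0 at level alpha.
rejection_region = [k for k in range(n + 1) if p_value(k) <= alpha]

# Probability of landing in the rejection region when H0 is in fact true.
type_i_error = sum(pmf(k) for k in rejection_region)
print(f"Type I error probability: {type_i_error:.4f} (alpha = {alpha})")
```

Because the number of heads is discrete, the actual Type I error probability of this test can be somewhat below 𝛼 rather than exactly equal to it; 𝛼 acts as an upper bound.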

If a test is performed at the 10% significance level, what does it imply?
A. There is a 10% probability of rejecting a true null hypothesis
B. There is a 10% probability of accepting a false null hypothesis

What does the significance level measure?

A. Type II error
B. Beta level
C. Type I error
D. Alpha level
P-VALUES

One way to make a decision in hypothesis testing is to use the p-value.

Definition: The p-value is the probability of obtaining a test result at least as extreme as the observed
test statistic, under the assumption that the null hypothesis is true. We reject 𝐻0 when the p-value is
smaller than the significance level 𝛼.

PROCEDURE OF HYPOTHESIS TESTING (P-VALUE APPROACH)

1. Define the hypotheses 𝐻0 vs 𝐻1.
2. Set the significance level.
3. Calculate the observed test statistic and p-value.
4. Decide whether or not to reject 𝐻0.
5. Draw conclusions.
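The procedure can be walked through on the earlier coin example (98 heads in 100 tosses). This is only a sketch: the 5% significance level is an assumption, since the coin example does not state one, and the two-sided p-value is computed exactly from the Binomial(100, ½) distribution rather than with any particular software.

```python
from math import comb

# Step 1: hypotheses. H0: P(head) = 1/2; H1: P(head) != 1/2.
n = 100

# Step 2: significance level (assumed here; not given in the coin example).
alpha = 0.05

# Step 3: observed test statistic (number of heads) and its two-sided p-value,
# i.e. the probability under H0 of a head count at least as far from n/2.
observed_heads = 98
distance = abs(observed_heads - n / 2)
p_value = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance) / 2**n

# Step 4: decision.
reject_h0 = p_value < alpha

# Step 5: conclusion.
if reject_h0:
    print(f"p-value = {p_value:.3e} < {alpha}: reject H0, the coin appears biased.")
else:
    print(f"p-value = {p_value:.3e} >= {alpha}: do not reject H0.")
```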

CHI-SQUARED 𝜒² TEST

A chi-squared (𝜒²) test is a statistical hypothesis test used in the analysis of contingency tables when the
sample sizes are large.

It is commonly used for categorical data.

This type of test tests the null hypothesis that two factors (or attributes) are not associated, against the
alternative hypothesis that they are associated. That is,

• 𝐻0 : There is no association between Factor 1 and Factor 2. (or Factor 1 and Factor 2 are independent)
• 𝐻1 : There is an association between Factor 1 and Factor 2. (or Factor 1 and Factor 2 are dependent)
CHI-SQUARED 𝜒² TEST: EXAMPLE
Example: The following cross-tabulation shows data on the 3,593 people who applied for graduate study at
University X. Dung classified the applicants according to their sex and whether or not they were
admitted to the university.
                Admitted
Sex          No      Yes    Total
Male      1,180      686    1,866
Female    1,259      468    1,727
Total     2,439    1,154    3,593

Perform a hypothesis test using the 𝜒² test at the 5% level of significance.

• 𝐻0: There is no association between the admission decision and gender.
• 𝐻1: There is an association between the admission decision and gender.
CHI-SQUARED 𝜒² TEST: JASP EXAMPLE

[Screenshot: JASP input data]

[Screenshot: JASP output showing the observed test statistic and p-value]

p-value < 5%

Decision: Reject 𝐻0. There is sufficient evidence to suggest that there is an association between the admission decision and gender.
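The same test can be reproduced by hand as a cross-check. The sketch below is not the JASP output: it assumes the Pearson 𝜒² statistic without a continuity correction, where each expected count is (row total × column total) / n, and it uses the fact that a 2×2 table has 1 degree of freedom, for which the upper tail probability equals erfc(√(𝜒²/2)). Variable names are illustrative.

```python
from math import erfc, sqrt

# Observed counts from the cross-tabulation:
# rows = sex, columns = (not admitted, admitted).
observed = {"Male": (1180, 686), "Female": (1259, 468)}

row_totals = {sex: sum(counts) for sex, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
n = sum(row_totals.values())

# Pearson statistic: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / n if the factors are independent.
chi2 = 0.0
for sex, counts in observed.items():
    for j, obs in enumerate(counts):
        expected = row_totals[sex] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# With 1 degree of freedom, P(chi-squared > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))

print(f"chi-squared = {chi2:.2f}, p-value = {p_value:.2e}")
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```

The statistic is far above what would be expected under independence, so the p-value falls well below 5%, in line with the decision above. Software that applies a continuity correction may report a slightly different statistic.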
CHI-SQUARED 𝜒² TEST: EXAMPLE
Example: Linh, however, decides to take another look at the statistics. She adds one more piece of data, the
department to which each person applied, and creates cross-tabulations separately for each department
(which are labelled A, B and C).
                          Admitted
Department  Sex        No      Yes    Total
A           Male      207      353      560
            Female      8       17       25
B           Male      484      258      742
            Female    635      346      981
C           Male      489       75      564
            Female    616      105      721
Total               2,439    1,154    3,593

Perform a hypothesis test using the 𝜒² test at the 5% level of significance.

• 𝐻0: There is no association between the admission decision and gender in department A/B/C.
• 𝐻1: There is an association between the admission decision and gender in department A/B/C.
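CHI-SQUARED 𝜒² TEST: JASP EXAMPLE

[Screenshot: JASP output showing the observed test statistic for each department]

Department A: p-value > 5%
Department B: p-value > 5%
Department C: p-value > 5%

Decision: ???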
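The per-department tests can be cross-checked in the same way. The following sketch assumes the Pearson 𝜒² statistic without a continuity correction and uses only the No/Yes cell counts from the table, so it does not depend on the printed row totals; `chi2_2x2` is a hypothetical helper, not a JASP function.

```python
from math import erfc, sqrt


def chi2_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    return stat, erfc(sqrt(stat / 2))


# Cell counts from the table: rows = (Male, Female), columns = (No, Yes).
departments = {
    "A": [[207, 353], [8, 17]],
    "B": [[484, 258], [635, 346]],
    "C": [[489, 75], [616, 105]],
}

results = {dept: chi2_2x2(table) for dept, table in departments.items()}
for dept, (stat, p) in results.items():
    print(f"Department {dept}: chi-squared = {stat:.3f}, p-value = {p:.3f}")
```

Department A has only 25 female applicants, so the large-sample approximation behind the 𝜒² test is roughest there.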
