Understanding Hypothesis Testing Basics
Testing of Hypotheses
Hypothesis
• A hypothesis is an assumption, an idea that is proposed for
the sake of argument so that it can be tested to see if it might
be true.
• A hypothesis is an educated guess or proposed explanation
for something that can be tested. It’s a starting point for
further investigation or testing.
• A hypothesis is a prediction about the relationship between
two variables. It must be a testable statement, something that
you can support or falsify with observable evidence.
Key Concept
This section presents individual components of a hypothesis test, and the
following sections use those components in comprehensive procedures.
The role of the following should be understood:
null hypothesis
alternative hypothesis
test statistic
critical region
significance level
critical value
P-value
Type I and Type II errors
Types of Hypothesis
Null Hypothesis (H0):
• A statement of no effect or no difference. It's the
default position that researchers aim to disprove.
Alternative Hypothesis (Ha or H1):
• A statement that contradicts the null hypothesis. It
suggests there is a significant effect or difference.
Hypothesis Testing
• Hypothesis testing is a structured method used to determine
if the findings of a study provide evidence to support a
specific theory relevant to a larger population.
• Hypothesis testing is a type of statistical analysis in which you
put your assumptions about a population parameter to the
test. It is used to assess the relationship between two
statistical variables. When performing hypothesis testing, you
must understand the different data types, such as nominal,
ordinal, interval, and ratio data.
Level of Significance
• The level of significance, often denoted by the Greek letter alpha (α), is a
crucial concept in statistical hypothesis testing. It represents the
probability of incorrectly rejecting a true null hypothesis, also known as a
Type I error.
• Rejecting a true null hypothesis is a risk the researcher must control, and
the level of significance quantifies that risk. It is expressed as a percentage
(commonly 5% or 1%). A 5% significance level means the researcher is
willing to take as much as a 5% risk of rejecting (on sampling evidence) the
null hypothesis (H0) when it happens to be true. Thus the significance level
is the maximum probability of rejecting H0 when it is true and is usually
determined in advance, before testing the hypothesis.
p-value
• A p-value, or probability value, is a number describing the likelihood of
obtaining the observed data under the null hypothesis of a statistical test.
• The p-value serves as an alternative to rejection points to provide the
smallest level of significance at which the null hypothesis would be
rejected. A smaller p-value means stronger evidence in favor of the
alternative hypothesis.
• A p-value is a statistical measurement used to validate a hypothesis
against observed data.
• A p-value measures the probability of obtaining the observed results,
assuming that the null hypothesis is true.
• The lower the p-value, the greater the statistical significance of the
observed difference.
• A p-value of 0.05 or lower is generally considered statistically significant.
Decision Criterion
P-value method:
Decision Rules:
Let's say a researcher sets the significance level (α) at 0.05. If
the calculated p-value is 0.02, the decision rule would be:
Since 0.02 ≤ 0.05, reject the null hypothesis.
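The decision rule above can be sketched as a small Python helper (the function name is ours, for illustration):

```python
# Minimal sketch of the p-value decision rule.
def decide(p_value, alpha=0.05):
    """Reject H0 when the p-value does not exceed the significance level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.02, 0.05))  # the case above: 0.02 <= 0.05 -> "reject H0"
```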
Test Statistic
• A test statistic is a single number calculated from sample data
during a hypothesis test. It summarizes the data and helps
determine whether to reject the null hypothesis. The test
statistic's value is compared to a critical value or p-value to
make this decision.
• They are derived from sample data using a specific formula
relevant to the chosen statistical test (e.g., t-test, z-test, chi-
square test).
• If the test statistic falls within the critical region (or the p-
value is below the significance level), the null hypothesis is
rejected.
Critical Region
• In statistics, a critical region, also known as the rejection region, is a set of
values for a test statistic that leads to the rejection of the null
hypothesis. It's a portion of the probability distribution that defines the
area where observed data is considered statistically significant enough to
cast doubt on the null hypothesis.
Two-Tailed Test
• A two-tailed test rejects the null hypothesis if, say, the sample mean is
significantly higher or lower than the hypothesized value of the mean of
the population. Such a test is appropriate when the null hypothesis is
some specified value and the alternative hypothesis is a value not equal to
the specified value of the null hypothesis.
• If the sample being tested falls into either of the critical areas, the
alternative hypothesis is accepted instead of the null hypothesis. The two-
tailed test gets its name from testing the area under both tails of a normal
distribution.
• Example: Suppose a researcher wants to test if the average height of
students in a particular school is different from the national average
height. They would use a two-tailed test because they are interested in
detecting any difference (either taller or shorter) from the national
average.
Two-tailed Test
H0: μ = μ0
H1: μ ≠ μ0
The significance level α is divided equally between
the two tails of the critical region.
z = (x̄ - μ) / (σ / √n)
Where:
z: is the z-test statistic.
x̄: is the sample mean.
μ: is the population mean.
σ: is the population standard deviation.
n: is the sample size.
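As a sketch using only Python's standard library, the z statistic above and a two-tailed p-value (the standard-normal CDF built from math.erf) can be computed as follows; the helper names are ours:

```python
import math

def z_statistic(xbar, mu, sigma, n):
    """z = (x̄ - μ) / (σ / √n)."""
    return (xbar - mu) / (sigma / math.sqrt(n))

def two_tailed_p(z):
    """P(|Z| >= |z|) under the standard normal, via the error function."""
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # CDF at |z|
    return 2 * (1 - phi)

print(round(two_tailed_p(1.96), 3))  # ~0.05, matching the usual 5% cutoff
```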
Type I Error
• Occurs if the null hypothesis is rejected when it is in fact true.
• The probability of type I error ( α ) is also called the level of
significance.
Type II Error
• Occurs if the null hypothesis is not rejected when it is in fact false.
• The probability of type II error is denoted by β .
• Unlike α, which is specified by the researcher, the magnitude of β
depends on the actual value of the population parameter
(proportion).
Measures of central tendency: Median, Mode
Measures of dispersion: Range, Standard deviation
Diagrams: Bar charts, Pie charts
Bivariate Analysis
Bivariate analysis is concerned with the analysis of two
variables at a time in order to uncover whether the two
variables are related
Main types:
1. Simple Correlation
2. Simple Regression
3. Two-Way ANOVA
Multi-Variate Analysis
1. Multiple Correlation
2. Multiple Regression
3. Multivariate ANOVA (MANOVA)
Causal Analysis
The manner in which the total number of observations is distributed over different
classes is called a frequency distribution; it is a systematic arrangement of numeric
values. Common graphical representations:
1) Histogram
2) Bar Graph
3) Circle Graph /pie diagram
4) Frequency polygon
5) Cumulative frequency curve / ogive curve
Histogram
• It is a two-dimensional frequency density diagram.
Circle Graph / Pie Diagram
• Shows percentages
• Shows how a total is divided into parts
Disadvantages
• Not best for showing trends
FREQUENCY POLYGON
Statistical Measures
A. Measures of central tendency – Mean,
Median, Mode
1. Mean
• The sum of the values of the individuals in the data
divided by the number of individuals in the data.
Formula: Mean = ∑X / N
Here,
∑ represents the summation
X represents scores
N represents the number of scores
• 2. Median: It is a positional average. It is the value of the
middle item in a series when the items are arranged in ascending
or descending order. It is not useful when we have to determine
relative importance.
• If ‘n’ is odd: Median = value of the ((n + 1)/2)th term
• If ‘n’ is even: Median = average of the (n/2)th and (n/2 + 1)th terms
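The two rules can be checked with Python's built-in statistics module:

```python
import statistics

odd = [3, 8, 9, 12, 15]        # n = 5 (odd): the 3rd term is the median
even = [3, 8, 9, 12, 15, 19]   # n = 6 (even): average of 3rd and 4th terms

print(statistics.median(odd))   # 9
print(statistics.median(even))  # 10.5
```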
Mode
• 3. Mode: This is also a positional average. The mode in a
distribution is that item around which there is maximum
concentration.
• The mode is the value that occurs most frequently in the
dataset. Unlike the mean and median, the mode can be
applied to both numerical and categorical data. It's useful for
identifying the most common value in a dataset.
3,12,15,3,15,8,20,19,3,15,12,19,9
Mode = 3 and 15 (as they both occur maximum times)
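The example can be verified with statistics.multimode, which returns every value tied for the highest frequency:

```python
import statistics

data = [3, 12, 15, 3, 15, 8, 20, 19, 3, 15, 12, 19, 9]
print(statistics.multimode(data))  # [3, 15] - both occur three times
```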
C. Measures of Dispersion
They indicate the extent of scattering or
variability of items about a central value.
RANGE
MEAN DEVIATION
STANDARD DEVIATION
QUARTILE DEVIATION
VARIANCE
COEFFICIENT OF VARIATION
COEFFICIENT OF STANDARD DEVIATION
1. Range
• In a given data set the difference between the largest value and the
smallest value of the data set is called the range of data set. For example,
if the height(in cm) of 10 students in a class are given in ascending order,
160, 161, 167, 169, 170, 172, 174, 175, 177, and 181, respectively. Then,
the range of the data set is (181 - 160) = 21 cm.
2. Mean Deviation: The average of the absolute differences between each value and the mean.
Formula: MD = ∑|x − μ| / N
Where,
μ is the mean
x is each value
N is the number of values
• (Eg: the class average mark is 60; I got 72, so my deviation from the average is
12. Suppose someone got 48; then the deviation is −12.)
Modulus (Absolute Value):
• In the context of mean deviation, "modulus" means taking the absolute
value of each difference. For example, if a data point is 2 units away from
the mean, and another is -2 units away from the mean, both deviations
are treated as +2 when calculating the MD.
3. Standard deviation is a statistical measure that quantifies the
amount of variation or dispersion of a set of values from its mean. A
low standard deviation indicates that data points are clustered closely
around the mean, while a high standard deviation suggests the data is
more spread out.
In simpler terms:
• Low standard deviation: Values in the dataset are very similar to
each other and close to the average.
• High standard deviation: Values in the dataset vary significantly
from the average.
σ = √[∑(xi - μ)² / N],
where σ is the population standard deviation, xi is each individual data
point, μ is the population mean, and N is the total number of data
points.
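The formula can be checked against statistics.pstdev with a small made-up dataset:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up values for illustration
mu = sum(data) / len(data)                 # population mean = 5.0
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
print(sigma)                               # 2.0
print(statistics.pstdev(data))             # 2.0 (library agrees)
```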
4. Quartile Deviation
• It is based on the lower quartile Q1 and the upper quartile Q3: QD = (Q3 − Q1) / 2.
• Q1 = value of the ((n + 1)/4)th item; Q3 = value of the (3(n + 1)/4)th item.
5. Variance is a statistical measure that quantifies the amount of variation or
dispersion within a set of data points. It essentially indicates how far each
data point deviates from the mean (average) of the dataset. A higher variance
means the data points are more spread out, while a lower variance indicates
they are clustered closer to the mean.
• Population Variance (σ²):
• σ² = Σ (xi - μ)² / N, where:
• σ² is the population variance
• xi is each individual data point
• μ is the population mean
• N is the number of data points in the population
6. The coefficient of variation (CV) is a statistical measure that expresses
the extent of variability in relation to the mean of a dataset. It is calculated as
the ratio of the standard deviation to the mean, often expressed as a
percentage. A higher CV indicates greater dispersion or variability relative to
the mean, while a lower CV suggests less variability.
• Formula:
• CV = (Standard Deviation / Mean) * 100%
Example:
Let's say two datasets have the following statistics:
Dataset A: Mean = 50, Standard Deviation = 10
Dataset B: Mean = 100, Standard Deviation = 10
Calculating the CV for each:
Dataset A: (10 / 50) * 100% = 20%
Dataset B: (10 / 100) * 100% = 10%
Even though both datasets have the same standard deviation, Dataset A has a
higher CV (20%) than Dataset B (10%). This indicates that the data in Dataset
A is more spread out relative to its mean compared to Dataset B.
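The two calculations can be reproduced with a one-line helper (the name is ours):

```python
def cv(std_dev, mean):
    """Coefficient of variation as a percentage of the mean."""
    return std_dev / mean * 100

print(f"Dataset A: {cv(10, 50):.0f}%")   # Dataset A: 20%
print(f"Dataset B: {cv(10, 100):.0f}%")  # Dataset B: 10%
```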
7. Coefficient of standard deviation is a statistical measure that
indicates the extent of dispersion or variability in a dataset relative to
its mean. It is calculated by dividing the standard deviation by the
mean of the data: Coefficient of SD = σ / x̄. Multiplying it by 100
gives the coefficient of variation: CV = (Standard Deviation / Mean) * 100.
Example:
• If a dataset has a standard deviation of 10 and a mean of 50,
the coefficient of variation would be:
• Coefficient of Variation = (10 / 50) * 100 = 20%
Parametric Test
These tests depend upon assumptions, typically
that the population(s) from which data are
randomly sampled have a normal distribution.
Types of parametric tests are:
1. t-test
2. z-test
3. χ²-test
1. Z-test
• In a z-test, the sample is assumed to be normally distributed. A z-
score is calculated with population parameters such as “population
mean” and “population standard deviation” and is used to validate
a hypothesis that the sample drawn belongs to the same
population.
• Null hypothesis: the sample mean is the same as the population mean (the z value is zero).
• Z = z-test statistic
• x̄ = sample mean
• μ = population mean
• σ = population standard deviation
If the absolute value of the statistic is lower than the critical value, the null
hypothesis is accepted; otherwise it is rejected.
Problems on z-test
Problem 1 : A new company XYZ claims that their fees are lower than that of your current
service provider, ABC. Data available from an independent research firm indicates that the
mean and standard deviation of all ABC clients are $18 and $6, respectively. 100 clients of
ABC are chosen, and the fees calculated by the rates provided by XYZ. The mean of this
sample is $18.75. Can the hypothesis be accepted at the 5% level of significance?
• H0: mean = 18
• H1: mean ≠ 18
s = σ/√n (standard error)
• x̄ = sample mean
• μ = population mean
• σ = population standard deviation
z = (18.75 − 18)/(6/√100) = 0.75/0.6 = 1.25
Since |1.25| < 1.96 (the two-tailed critical value at 5%), H0 is accepted.
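The arithmetic of Problem 1 can be checked in a couple of lines:

```python
import math

xbar, mu, sigma, n = 18.75, 18, 6, 100
z = (xbar - mu) / (sigma / math.sqrt(n))
print(round(z, 2))  # 1.25; |z| < 1.96, so H0 is not rejected at 5%
```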
• Problem 2 : A school claims that the students who study there are more
intelligent than the average. On calculating the IQ scores of 50 students, the
average turns out to be 110. The mean of the population IQ is 100 and the
standard deviation is 15. State whether the claim of the school is right or not at
a 5% significance level.
• H0: mean = 100
• H1: mean > 100
s = σ/√n
• x̄ = sample mean
• μ = population mean
• σ = population standard deviation
z = (110 − 100)/(15/√50) = 10/2.121 ≈ 4.71
Since 4.71 > 1.645 (the one-tailed critical value at 5%), H0 is rejected.
• Problem 3: A company claims that the average battery life of their new
smartphone is 12 hours. A consumer group tests 100 phones and finds the
average battery life to be 11.8 hours with a population standard deviation
of 0.5 hours. At a 5% significance level, is there evidence to refute the
company's claim?
• H₀: μ = 12
• H₁: μ ≠ 12
Z = (x̄ - μ) / (σ / √n)
= (11.8 - 12) / (0.5 / √100)
= -0.2 / 0.05
= -4
• Since |−4| = 4 > 1.96, we reject the null hypothesis.
• Problem 4: A gym trainer claimed that all the new boys in the gym are
above average weight.
• A random sample of thirty boys’ weights has a mean of 112.5 kg; the
population mean weight is 100 kg and the standard deviation is 15.
• Is there sufficient evidence to support the claim of the gym trainer?
• H0: μ = 100
• Ha: μ > 100
• z = (112.5 − 100)/(15/√30) = 12.5/2.74 ≈ 4.56
Since 4.56 > 1.645, H0 is rejected.
2. T-test
• Like a z-test, a t-test also assumes a normal distribution of the sample. A t-test is
used when the population parameters (mean and standard deviation) are not
known. It is also used to test whether there is a significant difference between the
means of two groups using t tables. T scores are given for a certain degree of
freedom. The degree of freedom (df) represents the number of independent
values that can vary in a sample.
Types of t-test
• T-test for one sample: Used when we want to compare the
mean of a sample with a known reference mean. Eg: A manufacturer of
chocolate bars claims that its chocolate bars weigh 50 grams on average.
To verify this, a sample of 30 bars is taken and weighed. The mean value of
this sample is 48 grams. We can now perform a one sample t-test to see if
the mean of 48 grams is significantly different from the claimed 50 grams.
• Degree of freedom = n-1.
t = (sample mean - population mean) / (sample standard deviation / square
root of sample size)
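As an illustration of the formula, here is the chocolate-bar example above; the sample standard deviation is not stated in the text, so s = 5 grams is an assumed value:

```python
import math

xbar, mu0, s, n = 48, 50, 5, 30   # s = 5 is made up for illustration
t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 2))  # -2.19 with these assumed numbers
```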
• T-test for independent samples: Used to compare the means
of two independent groups or samples. We want to know if
there is a significant difference between these means. Eg: we
would like to compare the effectiveness of two painkillers,
drug A and drug B. Degree of freedom = n1 + n2 − 2.
• The formula is: t = (mean of group 1 - mean of group 2) /
standard error. The standard error is calculated using the
variances and sample sizes of both groups.
• Paired sample t-test: The t-test for dependent samples is used to
compare the means of two dependent groups. We want to know how
effective a diet is. To do this, we weigh 30 people before the diet and
exactly the same people after the diet. With a paired samples t-test, we
only need to calculate the difference of the paired values and calculate the
mean from this. The standard error is then the same as in the t-test for
one sample. Degree of freedom = n-1.
• t = (mean of differences) / (standard deviation of differences / √n)
• When you know the standard deviation of the population, use the
z-test; when you do not know the population standard deviation,
use the t-test.
• z-test - Reject H₀ if the absolute value of the calculated z-
statistic is greater than the critical z-value (e.g., if alpha = 0.05,
the critical value is 1.96 for a two-tailed test and 1.645 for a
one-tailed test) or the P value is less than the alpha value.
• t-test - If the absolute value of the calculated t-statistic is
greater than the critical t-value, or if the p-value is less than
alpha, then the null hypothesis is rejected.
Examples for one sample t-test
• Example 1: Consider this problem. The average weight of men in a region is
191 pounds. The mean weight of 100 men chosen was 197.1 pounds with
a standard deviation of 25.6. Can we accept the hypothesis?
• H0: μ = 191
• Ha: μ ≠ 191
• Since the population standard deviation is unknown and we only have the
sample standard deviation, we use the t-test.
t = (197.1 − 191)/(25.6/√100) = 6.1/2.56 ≈ 2.38
• (If the t-score is greater than the critical value, or the p-value is less than the
significance level, reject H0.)
• The critical t-value for 99 degrees of freedom at the 5% level (two-tailed) is
about 1.98, so H0 is rejected: the mean weight differs from 191 pounds.
Hypothesis Testing of Proportions
z = (p̂ − p) / √(pq/n)
Where:
p̂ = sample proportion
p = population proportion
n = sample size
q = 1 − p (failure rate)
Examples
• Example 1: the hypothesis is that 20 percent of the passengers go
in first class, but management recognizes the possibility that this
percentage could be more or less. A random sample of 400
passengers includes 70 passengers holding first class tickets. Can
the null hypothesis be rejected at 10 percent level of significance?
• H0: p = 0.2 Ha: p ≠ 0.2
• p̂ = 70/400 = 0.175. From the table, the critical value at 10%
significance (two-tailed) is 1.65.
z = (0.175 − 0.2)/√((0.2 × 0.8)/400)
= −1.25
Since |−1.25| = 1.25 < 1.65, H0 is accepted.
• Example 2: A sample survey indicates that out of 3232
births, 1705 were boys and the rest were girls. Do
these figures confirm the hypothesis that the sex ratio
is 50:50? Test at 5% level of significance.
• H0: p = 0.5; Ha: p ≠ 0.5; p̂ = 1705/3232 = 0.5275; it is a
two-sided test.
Z = (0.5275 − 0.5000)/0.0088 = 3.125
(the denominator 0.0088 = √((0.5 × 0.5)/3232) is the standard error)
Since 3.125 > 1.96, H0 is rejected: the sex ratio is not 50:50.
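The arithmetic of Example 2 can be checked in Python:

```python
import math

p_hat, p0, n = 1705 / 3232, 0.5, 3232
se = math.sqrt(p0 * (1 - p0) / n)          # ~0.0088
z = (p_hat - p0) / se
print(round(z, 2))  # 3.13; |z| > 1.96, so H0 is rejected
```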
• Problem 3: A researcher wants to compare the effectiveness of two different
medications for reducing blood pressure. Medication A is tested on 50
patients, resulting in a mean reduction of 15 mmHg with a standard deviation
of 3 mmHg. Medication B is tested on 60 patients, resulting in a mean
reduction of 13 mmHg with a standard deviation of 4 mmHg. At a 1%
significance level, is there a significant difference between the two
medications? Critical value at significance level 1% is 2.576.
• H₀: μ₁ − μ₂ = 0
H₁: μ₁ − μ₂ ≠ 0
Z = (15 − 13)/√(3²/50 + 4²/60)
= 2/√(0.18 + 0.267) = 2/0.668 ≈ 2.99
It is a two-sided test, and the critical value at the 1% level is 2.576. Since
2.99 > 2.576, we reject H0, which means the two samples have not been taken
from the same population. The difference between the two medications is
significant and not due to sampling fluctuations.
• Example 2: A simple random sampling survey in respect of
monthly earnings of semi-skilled workers in two cities gives
the following statistical information.
City    Mean monthly earnings (Rs)    Standard deviation of monthly earnings (Rs)    Size of sample
A       695                           40                                             200
B       710                           60                                             175
• At the 5% significance level, are the earnings in the two cities
the same?
• H0: earnings are the same
• Ha: earnings are not the same
Z = (695 − 710)/√(40²/200 + 60²/175)
= −15/√(8 + 20.57) ≈ −2.81
Since |−2.81| > 1.96, H0 is rejected. It means that the earnings in the
two cities are different.
Chi-Square (χ²) Test
• n − 1 is the degrees of freedom, n being the number of items in the sample.
• Then, by comparing the calculated value of χ² with its table value for (n − 1)
degrees of freedom at a given level of significance, we may either accept
H0 or reject it. If the calculated value of χ² is equal to or less than the
table value, the null hypothesis is accepted; otherwise the null hypothesis
is rejected.
• If the calculated chi-square statistic is greater than the critical value, you
reject the null hypothesis. This suggests the observed data is significantly
different from what was expected under the null hypothesis.
• If the calculated chi-square statistic is less than or equal to the critical
value, you fail to reject the null hypothesis. This suggests the observed
data is not significantly different from what was expected under the null
hypothesis.
Test of Independence
• The Chi-Square Test of Independence is an inferential statistical test which
examines whether two sets of variables are likely to be related to each other or
not. This test is used when we have counts of values for two nominal or
categorical variables and is considered a non-parametric test. A relatively large
sample size and independence of observations are required.
• Example: In a movie theatre, suppose we made a list of movie genres (horror,
comedy, thriller, etc.); let us consider this as the first variable. The second variable
is whether or not the people who came to watch those genres of movies
bought snacks at the theatre. Here the null hypothesis is that the genre of the film
and whether or not snacks were bought are unrelated. If this is true, the movie
genres don’t impact snack sales.
• A researcher wants to determine if there is an association between gender
(male/female) and preference for a new product (like/dislike). The test can assess
whether preferences are independent of gender.
Goodness-of-fit test
• A goodness-of-fit test is a statistical test that assesses how well a
statistical model fits a set of observations. It essentially measures the
discrepancy between observed values and the values expected under a
specific model. These tests are used to determine if a sample data set is
representative of a larger population or if a chosen probability distribution
accurately describes the data.
• Example: A sample of 10 observations gives a sample variance s² = 50/9.
Test, at the 5% level, whether the population variance is 5.
χ² = (n − 1)s²/σ0² = (10 − 1) × (50/9)/5 = 10
Degrees of freedom = 10 − 1 = 9
The table value of χ² at the 5 per cent level for 9 d.f. is 16.92. The calculated value of
χ² is less than this table value, so we accept the null hypothesis and conclude that the
variance of the population is 5, as given in the question.
• Problem 2: Genetic theory states that children having one parent of blood type A
and the other of blood type B will always be one of three types, A, AB, B, and
that the proportions of the three types will on average be 1 : 2 : 1
(pA = 1/4, pAB = 1/2, pB = 1/4). A report states that out of 300 children having
one A parent and one B parent, 30 per cent were found to be type A, 45 per cent
type AB and the remainder type B. Test the hypothesis by a χ² test.
• Null hypothesis H0: the population proportions are pA = 1/4, pAB = 1/2, pB = 1/4.
• Alternative hypothesis Ha: the population proportions do not follow 1 : 2 : 1;
i.e., at least one of pA, pAB, pB differs.
• The observed frequencies of types A, AB and B given in the question are 90, 135
and 75 respectively.
• The expected frequencies of types A, AB and B (as per the genetic theory) should
have been 75, 150 and 75 respectively.
χ² = (90 − 75)²/75 + (135 − 150)²/150 + (75 − 75)²/75 = 3 + 1.5 + 0 = 4.5
d.f. = n − 1 = 3 − 1 = 2
Table value of χ² for 2 d.f. at 5 per cent level of significance is 5.991.
The calculated value of χ² is 4.5, which is less than the table value, and hence the
discrepancy can be ascribed to chance. This supports the theoretical hypothesis of the
genetic theory that on average types A, AB and B stand in the proportion 1 : 2 : 1.
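The χ² arithmetic in Problem 2 can be verified in Python:

```python
observed = [90, 135, 75]
expected = [75, 150, 75]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 4.5; less than 5.991 (2 d.f., 5% level), so H0 is not rejected
```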