Module_3_Class
Dr. P. Rajendra
CMRIT, Bengalore.
Dr. P. Rajendra (Professor, Dept. of Maths) Module III: Statistical Inference 1 CMRIT, Bengalore. 1 / 14
Outline
1 Sampling in Statistics
2 Statistical Inference
3 Sampling Distribution
4 Standard Error
5 Hypothesis Testing
7 Confidence Interval
Sampling in Statistics
* Sampling is a statistical method of obtaining representative data
(observations) from a group. We often use sampling concepts in
everyday life without realizing it. For example, checking the quality
of rice by examining a handful is random sampling from a
large population.
* Population (Universe): The group of objects (or individuals) under
study is called the population or universe. A population can be either
finite or infinite.
* Sample: A part of the population that contains selected objects (or
individuals) is called a sample. The number of individuals in a sample
is called the sample size.
* Random sampling is the selection of objects (individuals) from the
population in such a way that each object has the same chance of
being selected. The lottery system is a common example of random
sampling.
* Simple sampling is a special case of random sampling in which each
trial has the same probability of success, independently of the other trials.
Statistical Inference
Population:
(a) Entire group of interest
(b) Often too large to study entirely
(c) Described by parameters (e.g., µ, σ)
Sample:
(a) Subset of the population
(b) Used to make inferences
(c) Described by statistics (e.g., x̄, s)
Example (AI/ML): We often use statistical inference to understand and
make predictions about large datasets, for instance predicting house prices
from features such as size and location.
. Population: All houses in a city
. Sample: Dataset of houses with known prices and features
. Parameter: True relationship between features and price
. Statistic: Estimated relationship from our model (e.g., coefficients in
linear regression)
. Inference: Using the model to predict prices of new houses
Sampling Distribution
A sampling distribution is the distribution of a statistic over many
samples. It describes the variability of the statistic. Most commonly used:
sampling distribution of the sample mean (X̄ ).
Figure: Sampling Distribution of the Mean (normal distribution curve; x-axis: Sample Mean, y-axis: Frequency)
* Properties:
. Center: Expected value of the statistic
. Spread: Variability of the statistic
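The center and spread of a sampling distribution can be checked by simulation. The following is a minimal sketch, not part of the original slides: the population (normal with mean 100 and standard deviation 10), the sample size 25, and the helper name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples samples of size n from N(mu, sigma) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Assumed population: mean 100, standard deviation 10; samples of size 25.
means = simulate_sample_means(mu=100, sigma=10, n=25, num_samples=2000)
center = statistics.fmean(means)  # expected to be close to mu = 100
spread = statistics.stdev(means)  # expected to be close to sigma / sqrt(n) = 2
print(f"center = {center:.2f}, spread = {spread:.2f}")
```

The spread of the simulated sample means should come out near σ/√n, which is the quantity the next section calls the standard error.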
Standard Error
i. The Standard Error (SE) is the standard deviation of a sampling
distribution. The most commonly used one is Standard Error of the
Mean.
ii. Formula: $SE = \sigma/\sqrt{n}$, where σ is the population standard deviation
and n is the sample size. If σ is unknown, we estimate it with the
sample standard deviation s. The reciprocal of standard error is called
the precision.
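As a small illustration of the formula, the sketch below tabulates the standard error and the precision for a few sample sizes. The value σ = 10 and the function name `standard_error` are assumptions made for this example only.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


sigma = 10.0  # assumed population standard deviation
for n in (20, 40, 60, 80, 100):
    se = standard_error(sigma, n)
    precision = 1 / se  # reciprocal of the standard error
    print(f"n = {n:3d}  SE = {se:.3f}  precision = {precision:.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.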
Figure: Effect of Sample Size on Standard Error, plotting $SE = \sigma/\sqrt{n}$ (y-axis: Standard Error) against Sample Size (n) (x-axis)
Hypothesis Testing
* Hypothesis testing is a statistical method used to make inferences
about a population based on sample data. It involves making an
assumption (hypothesis) about a population parameter and then
testing this assumption using sample data.
* Key components:
. Null hypothesis (H0 ): The assumption we start with
. Alternative hypothesis (H1 or Ha ): The competing claim
. Test statistic: A value calculated from the sample data
. Decision rule: Criteria for rejecting or failing to reject H0
Steps in Hypothesis Testing:
1 State the null and alternative hypotheses
Type I, Type II Errors and Levels of Significance
* Type I Error (False Positive): Rejecting the null hypothesis when it is
actually true. Probability = α (significance level)
* Type II Error (False Negative): Failing to reject the null hypothesis
when it’s actually false. Probability = β
* Power of a test = 1 - β (probability of correctly rejecting a false null
hypothesis)
* The level of significance (α) is the probability of rejecting the null
hypothesis when it is actually true. It represents the risk of making a
Type I error.
* Common levels: 0.05 (5%), 0.01 (1%), 0.10 (10%)
* Smaller α means Lower risk of Type I error and Higher risk of Type II
error (failing to reject a false null hypothesis)
* The choice of significance level depends on the nature of the
problem and the consequences of errors: (i) critical applications such as
healthcare diagnosis call for a lower α (0.01 or 0.001); (ii) exploratory data
analysis can tolerate a higher α (0.05 or 0.10).
* Decreasing α typically increases β
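To connect the significance level to a decision rule, the sketch below computes two-tailed critical values of the standard normal distribution for the common levels listed above. It assumes a z-test setting; the function names `two_tailed_critical_value` and `decide` are illustrative and not from the slides.

```python
from statistics import NormalDist


def two_tailed_critical_value(alpha):
    """Return z such that P(|Z| > z) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def decide(z, alpha):
    """Two-tailed decision rule: reject H0 when |z| exceeds the critical value."""
    if abs(z) > two_tailed_critical_value(alpha):
        return "reject H0"
    return "fail to reject H0"


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  critical z = {two_tailed_critical_value(alpha):.3f}")
```

A smaller α pushes the critical value outward, so a false null hypothesis is harder to reject; this is one way to see why decreasing α typically increases β.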
Figure: distribution of the test statistic (x-axis: Test Statistic, y-axis: Probability) with the critical value marked
Confidence Interval
Question 1: Explain the following terms:
(i). Standard Error
(ii). Statistical Hypothesis
(iii). Critical Region of a Statistical Test
(iv). Test of Significance
Answer:
i. Standard Error: The standard deviation of the sampling distribution of
a statistic, usually the mean. It measures the accuracy with which a
sample represents the population.
ii. Statistical Hypothesis: A statement about a population parameter
that can be tested using statistical methods. Common hypotheses include
null (H0 ) and alternate (H1 ).
iii. Critical Region of a Statistical Test: The range of values for which
the null hypothesis is rejected. If the test statistic falls in this region, it
indicates that the result is statistically significant.
iv. Test of Significance: A method to determine if the observed data
provide enough evidence to reject a null hypothesis. Common tests include
the Z-test, t-test, and chi-square test.
Question 2: Define the following terms:
(i). Alternate Hypothesis
(ii). A Statistic
(iii). Level of Significance
(iv). Two-Tailed Test
Answer:
i. Alternate Hypothesis (H1 ): A hypothesis that proposes a change or
difference from the null hypothesis. It represents the conclusion that is
accepted if the null hypothesis is rejected.
ii. A Statistic: A quantity calculated from sample data, used to estimate
a population parameter. Examples: sample mean, sample variance.
iii. Level of Significance (α): The probability of rejecting the null
hypothesis when it is actually true (Type I error). Common values are 0.05
(5%) and 0.01 (1%).
iv. Two-Tailed Test: A test of significance where the critical region is in
both tails of the probability distribution. It checks for deviation in either
direction from the hypothesized value.
Hypothesis Testing problems
(Problems based on Binomial distributions and Proportions)
Hypothesis Testing problems based on Binomial
distribution
The binomial distribution can be approximated by the normal distribution
when the sample size is large. As a rule of thumb, the normal approximation
is valid if np ≥ 5 and nq ≥ 5, where p is the probability of success and q = 1 − p.
Steps of Hypothesis Testing:
1 State the Hypotheses:
H0 : p = p0 (Null Hypothesis)
H1 : p ≠ p0 (Alternative Hypothesis) for a two-tailed test.
2 Choose the Significance Level:
Reject or fail to reject H0 based on the comparison.
Problem 1: A coin was tossed 400 times, and the head turned up 216
times. Test the hypothesis that the coin is unbiased at a 5% level of
significance.
Solution:
Step 1:
Null hypothesis (H0 ): The coin is unbiased (p = 0.5).
Alternative hypothesis (H1 ): The coin is biased (p ≠ 0.5).
Step 2: Expected number of heads
$$E(\text{heads}) = np = \tfrac{1}{2} \times 400 = 200$$
Observed number of heads = 216.
Step 3: Standard Deviation
$$\text{S.D.} = \sqrt{npq} = \sqrt{400 \times \tfrac{1}{2} \times \tfrac{1}{2}} = \sqrt{100} = 10$$
Step 4:
The z-test statistic formula is:
$$z = \frac{x - np}{\sqrt{npq}}$$
Substituting the values:
$$z = \frac{216 - 200}{10} = 1.6$$
The critical values at a 5% significance level are z = −1.96 and z = 1.96.
Our calculated z-score (1.6) lies in the acceptance region.
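The calculation in Problem 1 can be reproduced with a few lines of code. This is a minimal sketch of the normal approximation to the binomial; the function name `binomial_z` is an assumption made for this example.

```python
import math


def binomial_z(x, n, p0):
    """z-statistic for x observed successes in n trials under H0: p = p0."""
    q0 = 1 - p0
    expected = n * p0                 # np
    std_dev = math.sqrt(n * p0 * q0)  # sqrt(npq)
    return (x - expected) / std_dev


# Problem 1: 216 heads in 400 tosses, H0: p = 0.5
z = binomial_z(216, 400, 0.5)
print(f"z = {z:.2f}")
print("reject H0" if abs(z) > 1.96 else "fail to reject H0")  # 5% level, two-tailed
```

The same function applies to the die problems by changing `p0`, for example `binomial_z(3240, 9000, 1/3)` for a throw of 3 or 4.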
Problem 2: A coin was tossed 1600 times, and the tail turned up 864
times. Test the hypothesis that the coin is unbiased at a 1% level of
significance.
Solution:
Step 1:
Null hypothesis (H0 ): The coin is unbiased (p = 0.5).
Alternative hypothesis (H1 ): The coin is biased (p ≠ 0.5).
Step 2: Expected number of tails
$$E(\text{tails}) = np = \tfrac{1}{2} \times 1600 = 800$$
Observed number of tails = 864.
Step 3:
$$\text{S.D.} = \sqrt{npq} = \sqrt{1600 \times \tfrac{1}{2} \times \tfrac{1}{2}} = \sqrt{400} = 20$$
Step 4:
The z-test statistic formula is:
$$z = \frac{x - np}{\sqrt{npq}}$$
Substituting the values:
$$z = \frac{864 - 800}{20} = 3.2$$
The critical values at a 1% significance level are z = −2.576 and
z = 2.576. Our calculated z-score (3.2) lies in the rejection region.
Problem 3: In 324 throws of a six-faced die, an odd number turned up
181 times. Test the hypothesis that the die is unbiased at the 1% level of
significance.
Solution:
Step 1:
Null hypothesis (H0 ): The die is unbiased (p = 0.5), i.e., the
probability of getting an odd number (1, 3, 5) is the same as the
probability of getting an even number (2, 4, 6).
Alternative hypothesis (H1 ): The die is biased (p ≠ 0.5).
Step 2: Expected number of odd numbers
The probability of getting an odd number (1, 3, or 5) on a fair die is:
$$P(\text{odd number}) = \frac{3}{6} = 0.5$$
Hence, the expected number of odd numbers in 324 throws is:
$$np = 324 \times 0.5 = 162$$
Step 3:
$$\therefore \text{S.D.} = \sqrt{npq} = \sqrt{324 \times 0.5 \times 0.5} = \sqrt{81} = 9$$
Step 4:
The z-test statistic formula is:
$$z = \frac{x - np}{\sqrt{npq}}$$
Substituting the values:
$$z = \frac{181 - 162}{9} = \frac{19}{9} \approx 2.11$$
Step 5: At the 1% level of significance for a two-tailed test, the critical
value is z = 2.576. Since z = 2.11 is less than 2.576, we fail to reject the
null hypothesis.
The critical values at a 1% significance level are z = −2.576 and
z = 2.576. Our calculated z-score (2.11) lies within the acceptance region.
Problem 4: A die is thrown 9000 times, and a throw of 3 or 4 was
observed 3240 times. Test whether the die can be regarded as unbiased.
Solution:
Step 1:
Null hypothesis (H0 ): The die is unbiased ($p = \tfrac{2}{6} = \tfrac{1}{3}$), i.e., the
probability of getting a 3 or 4 is 1/3.
Alternative hypothesis (H1 ): The die is biased ($p \neq \tfrac{1}{3}$).
Step 2: Expected Number of 3’s or 4’s
The probability of getting a 3 or 4 on a fair die is:
$$P(3 \text{ or } 4) = \frac{2}{6} = \frac{1}{3}$$
Hence, the expected number of 3’s or 4’s in 9000 throws is:
$$E(3 \text{ or } 4) = \frac{1}{3} \times 9000 = 3000$$
Observed number of 3’s or 4’s = 3240.
Step 3:
The standard deviation is calculated using the formula for binomial
distribution:
$$\text{S.D.} = \sqrt{npq}$$
$$\therefore \text{S.D.} = \sqrt{9000 \times \tfrac{1}{3} \times \tfrac{2}{3}} = \sqrt{2000} \approx 44.72$$
Step 4:
The z-test statistic formula is:
$$z = \frac{x - np}{\sqrt{npq}}$$
Substituting the values:
$$z = \frac{3240 - 3000}{44.72} = \frac{240}{44.72} \approx 5.37$$
Step 5: At the 5% level of significance for a two-tailed test, the critical
value is z = 1.96. Since z = 5.37 is much greater than 1.96, we reject the
null hypothesis.
Conclusion: The die is significantly biased.
Step 4:
For a two-tailed test at the 5% significance level, the critical values are
z = ±1.96 and the calculated z-value is 1.6, which is within the
acceptance region (−1.96, 1.96).
Conclusion: Since the calculated z-value does not exceed the critical
value, we fail to reject the null hypothesis. Therefore, there is no
evidence to suggest the coin is biased.
The shaded areas represent the rejection regions for a two-tailed test with
α = 0.05. The calculated z = 1.6 falls within the acceptance region.
Problem 6: A coin is tossed 1600 times, and tails turn up 864 times. Test
the hypothesis that the coin is unbiased at a 1% level of significance.
Step 1: Null Hypothesis (H0 ):
The coin is unbiased, meaning the proportion of tails is 0.5.
H0 : p = 0.5
Alternative Hypothesis (H1 ):
The coin is biased, meaning the proportion of tails is not equal to 0.5.
H1 : p ≠ 0.5
This is a two-tailed test.
Step 2: Significance level (α) = 0.01.
Step 3: Observed Proportion of Tails:
$$\hat{p} = \frac{864}{1600} = 0.54$$
Expected Proportion under H0 :
$$p_0 = 0.5$$
Step 4: Test Statistic is
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$$
Substituting the values:
$$z = \frac{0.54 - 0.5}{\sqrt{\frac{0.5 \times 0.5}{1600}}} = \frac{0.04}{0.0125} = 3.2$$
The shaded areas represent the rejection regions for a two-tailed test with
α = 0.01. The calculated z = 3.2 falls within the rejection region.
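The proportion form of the test statistic used in Problem 6 can be sketched in code as follows. The function name `proportion_z` is an assumption for illustration; the formula is the one stated in Step 4.

```python
import math


def proportion_z(x, n, p0):
    """One-sample z-statistic for a proportion under H0: p = p0."""
    p_hat = x / n                          # observed proportion
    se = math.sqrt(p0 * (1 - p0) / n)      # standard error under H0
    return (p_hat - p0) / se


# Problem 6: 864 tails in 1600 tosses, H0: p = 0.5, tested at the 1% level
z = proportion_z(864, 1600, 0.5)
print(f"z = {z:.2f}")
print("reject H0" if abs(z) > 2.576 else "fail to reject H0")
```

Dividing the numerator and denominator of (x − np)/√(npq) by n gives exactly this expression, so the count form and the proportion form always produce the same z.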
Problem 7: In 324 throws of a six-faced die, an odd number turned up
181 times. Test the hypothesis that the die is unbiased at a 1% level of
significance.
Solution:
Step 1:
Null Hypothesis (H0 ): The die is unbiased, meaning the proportion of odd
numbers is 0.5.
H0 : p = 0.5
Alternative Hypothesis (H1 ): The die is biased, meaning the proportion of
odd numbers is not equal to 0.5.
H1 : p ≠ 0.5
Step 2: Significance level (α) = 0.01.
Step 3: Observed Proportion of Odd Numbers:
$$\hat{p} = \frac{181}{324} = 0.5586$$
Expected Proportion under H0 :
$$p_0 = 0.5$$
Step 4: The Test Statistic is
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$$
Substituting the values:
$$z = \frac{0.5586 - 0.5}{\sqrt{\frac{0.5 \times 0.5}{324}}} \approx \frac{0.0586}{0.0278} \approx 2.11$$
The shaded areas represent the rejection regions for a two-tailed test with
α = 0.01. The calculated z = 2.11 falls within the acceptance region.
Problem 8: A die is thrown 9000 times, and a throw of 3 or 4 was
observed 3240 times. Test whether the die can be regarded as unbiased
using a hypothesis test for proportions.
Solution:
Step 1:
Null Hypothesis (H0 ): The die is unbiased, meaning the proportion of
throws resulting in 3 or 4 is $\tfrac{2}{6} = \tfrac{1}{3}$.
$$H_0 : p = \frac{1}{3}$$
Alternative Hypothesis (H1 ): The die is biased, meaning the proportion of
throws resulting in 3 or 4 is not equal to $\tfrac{1}{3}$.
$$H_1 : p \neq \frac{1}{3}$$
Step 2:
We will conduct the test at the 5% significance level (α = 0.05). This
means there is a 5% risk of rejecting the null hypothesis when it is true.
Step 3: Observed Proportion of Throws with 3 or 4: $\hat{p} = \frac{3240}{9000} = 0.36$
Expected Proportion under H0 : $p_0 = \frac{1}{3} \approx 0.3333$
Step 4: The Test Statistic is
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$$
Assignment Problems
1. In a city, a sample of 500 people is taken, out of which 280 are tea
drinkers, and the rest are coffee drinkers. Can we assume that both coffee
and tea are equally popular in this city at a 5% level of significance?
Sampling and Significance Tests
(Simple sampling of attributes. Test of significance for large samples,
comparison of large samples)
Simple Sampling of Attributes
Test of Significance for Large Samples
where, x̄1 , x̄2 are the sample means, σ1 , σ2 are the population
standard deviations, n1 , n2 are the sample sizes.
Scenario: We have two datasets with different feature values, and we
want to test whether their mean feature values differ significantly.
Test of Significance of Difference between Two Sample
Proportions
Used to test whether two population proportions are significantly
different.
Null Hypothesis (H0 ): P1 = P2
Alternate Hypothesis (H1 ): P1 ≠ P2
Test statistic:
$$Z = \frac{p_1 - p_2}{\sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \qquad (3)$$
Solution:
Step 1: Null hypothesis (H0 ): µ1 = µ2
Alternative hypothesis (HA ): µ1 ≠ µ2 (two-tailed test)
Step 2: x̄1 = 75, x̄2 = 73, σ1 = 8, σ2 = 10, n1 = 60, n2 = 100
Step 3: Z-test formula for difference of means:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{75 - 73}{\sqrt{\frac{8^2}{60} + \frac{10^2}{100}}} \approx 1.39$$
Since the calculated Z-value (1.39) is less than 1.96, we fail to reject the
null hypothesis at the 5% level. Hence, there is no significant difference
between the means of boys and girls.
Problem 4: A group of researchers tested two machine learning
algorithms (Algorithm A and Algorithm B) on a benchmark dataset. The
results of their accuracy scores are as follows:
Algorithm      Mean (x̄)   Standard Deviation (σ)   Sample Size (n)
Algorithm A    85         5                        50
Algorithm B    82         6                        80
Using a significance level of 5%, determine if there is a significant
difference in the mean accuracy scores of the two algorithms.
Solution:
Step 1: Null hypothesis (H0 ): µA = µB
Alternative hypothesis (HA ): µA ≠ µB (two-tailed test)
Step 2: x̄A = 85, x̄B = 82, σA = 5, σB = 6, nA = 50, nB = 80
Step 3: Z-test formula for difference of means:
$$Z = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}} = \frac{85 - 82}{\sqrt{\frac{5^2}{50} + \frac{6^2}{80}}} \approx 3.08$$
Since the calculated Z-value (3.08) is greater than 1.96, we reject the
null hypothesis at the 5% level. Hence, there is a significant difference
between the mean accuracy scores of Algorithm A and Algorithm B.
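The large-sample test for a difference of means can be sketched as a short function. The name `two_sample_z` is an assumption for illustration; the inputs are the summary statistics given in Problem 4.

```python
import math


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z-statistic for the difference of two means from large independent samples."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # SE of the difference
    return (mean1 - mean2) / se


# Problem 4: Algorithm A vs Algorithm B accuracy scores
z = two_sample_z(85, 82, 5, 6, 50, 80)
print(f"z = {z:.2f}")
print("significant at 5%" if abs(z) > 1.96 else "not significant at 5%")
```

Calling it with the boys and girls data (means 75 and 73, standard deviations 8 and 10, sample sizes 60 and 100) gives a z-value below 1.96, in line with the earlier conclusion.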
Hypothesis Testing problems - II
Difference of Two-Proportion
Problem 1: A sample of 300 units of a manufactured product contains 65
defective units. In another sample of 200 units, 35 units were found
defective. At the 5% level of significance, we want to test if there is a
significant difference in the proportion of defectives between the two
samples.
Solution:
Step 1:
Null hypothesis (H0 ): p1 = p2
Alternative hypothesis (H1 ): p1 ≠ p2
Step 2: The sample proportions are calculated as follows:
$$\hat{p}_1 = \frac{65}{300} = 0.2167$$
$$\hat{p}_2 = \frac{35}{200} = 0.175$$
Step 3: The pooled proportion (P) is the combined proportion of
defectives from both samples:
$$P = \frac{x_1 + x_2}{n_1 + n_2} = \frac{65 + 35}{300 + 200} = \frac{100}{500} = 0.2$$
Step 4: The Z-test statistic is calculated using the formula:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
$$z = \frac{0.2167 - 0.175}{\sqrt{0.2(1 - 0.2)\left(\frac{1}{300} + \frac{1}{200}\right)}} \approx 1.14$$
Problem 2: In a large city A, 20% of a random sample of 900 school boys
had a slight physical defect. In another large city B, 18.5% of a random
sample of 1600 school boys had the same defect. At the 5% level of
significance, we want to test if the difference between the proportions is
significant.
Solution:
Step 1:
Null hypothesis (H0 ): p1 = p2
Alternative hypothesis (H1 ): p1 ≠ p2
Step 2: The sample proportions are:
$$\hat{p}_1 = 0.20, \qquad \hat{p}_2 = 0.185$$
Step 4: The Z-test statistic is calculated using the formula:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
$$z = \frac{0.20 - 0.185}{\sqrt{0.19(1 - 0.19)\left(\frac{1}{900} + \frac{1}{1600}\right)}} \approx 0.92$$
Problem 3: Before an increase in excise duty on tea, 800 out of 1000
people were tea drinkers. After the increase, 800 people were tea drinkers
out of 1200 people sampled. At the 5% level of significance, we want to
test if the difference in tea consumption before and after the excise duty increase is significant.
Solution:
Step 1:
Null hypothesis (H0 ): p1 = p2
Alternative hypothesis (H1 ): p1 ≠ p2
Step 2: The sample proportions are calculated as follows:
$$\hat{p}_1 = \frac{800}{1000} = 0.80 \quad \text{(Before excise duty increase)}$$
$$\hat{p}_2 = \frac{800}{1200} = 0.6667 \quad \text{(After excise duty increase)}$$
Step 3: The pooled proportion (P) is the combined proportion of tea
drinkers from both samples:
$$P = \frac{x_1 + x_2}{n_1 + n_2} = \frac{800 + 800}{1000 + 1200} = 0.7273$$
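Step 4: The Z-test statistic is calculated using the formula:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
$$z = \frac{0.80 - 0.6667}{\sqrt{0.7273(1 - 0.7273)\left(\frac{1}{1000} + \frac{1}{1200}\right)}} \approx 6.99$$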
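The pooled two-proportion test used in these problems can be sketched as one function working directly from the counts. The name `two_proportion_z` is an assumption made for this example.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-statistic for H0: p1 = p2, given success counts and sample sizes."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion P
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Problem 3: tea drinkers before (800 of 1000) and after (800 of 1200) the duty increase
z = two_proportion_z(800, 1000, 800, 1200)
print(f"z = {z:.2f}")
print("significant at 5%" if abs(z) > 1.96 else "not significant at 5%")
```

Working from counts rather than rounded proportions avoids small rounding differences in the last decimal place of z.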
Assignment Questions
(1) In a sample of 600 men from a certain city, 450 are found smokers. In
another sample of 900 men from another city, 450 are smokers. Do the
data indicate that the cities are significantly different with respect to the
habit of smoking among men? Test at 5% significance level.
z = 6.38
(2) A sample of 100 tyres is taken from a lot. The mean life of a tyre is
found to be 39350 kms with a SD of 3260. Can it be considered as the
true random sample from a population with a mean life of 40000 kms?
(Use 5% significance level).
(3) In two large populations there are 30% and 25% respectively of fair
haired people. Is this difference likely to be hidden in samples of 1200 and
900 respectively from the two populations?
(4) A stenographer claims that she can type at the rate of 120 words per
minute. Can we reject her claim on the basis of 100 trials in which she
demonstrates a mean of 116 words with a standard deviation of 15 words?
Use 5% level of significance.
Confidence Intervals for Means and Proportions
(i) Confidence Interval for Mean:
If the population standard deviation σ is known, the confidence interval for
the population mean µ is given by:
$$\bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}}$$
Where x̄ is the sample mean, Z is the critical value for the desired confidence level, σ is the
population standard deviation, and n is the sample size.
(iii) Confidence Interval for Difference of Two Means:
For two independent samples, the confidence interval for the difference in
means µ1 − µ2 is given by:
$$(\bar{x}_1 - \bar{x}_2) \pm Z \times \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Where x̄1 and x̄2 are the sample means, σ1 and σ2 are the population
standard deviations for the two groups, n1 and n2 are the sample sizes.
$$SE = \frac{\sigma}{\sqrt{n}} = \frac{2}{15} = 0.1333$$
Thus, we are 95% confident that the true mean weight lies between 66.74
pounds and 67.26 pounds.
Problem 2: The heights of a random sample of 50 college students
showed a mean of 174.5 centimeters and a standard deviation of 6.9
centimeters. Construct a 99% confidence interval for the mean height of
all college students.
Solution: Given:
Sample size n = 50, Sample mean x̄ = 174.5 cm
Sample standard deviation s = 6.9 cm
Confidence level = 99%
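Standard error (SE) is calculated as:
$$SE = \frac{s}{\sqrt{n}} = \frac{6.9}{\sqrt{50}} \approx 0.9758 \text{ cm}$$
The 99% confidence interval (Z = 2.576) is:
$$\bar{x} \pm Z \times SE = 174.5 \pm 2.576 \times 0.9758 \approx 174.5 \pm 2.51$$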
Thus, we are 99% confident that the true mean height of college students
lies between 171.99 cm and 177.01 cm.
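A large-sample confidence interval for a mean can be computed as in the sketch below. The function name `mean_confidence_interval` is an assumption for illustration, and the Z-value is taken from the standard normal distribution rather than a table.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, confidence):
    """Return (lower, upper) limits for the population mean (large-sample z interval)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 2.576 for 99%
    margin = z * sd / math.sqrt(n)                  # Z times the standard error
    return mean - margin, mean + margin


# Problem 2: n = 50, sample mean 174.5 cm, sample standard deviation 6.9 cm
lower, upper = mean_confidence_interval(174.5, 6.9, 50, 0.99)
print(f"99% CI: ({lower:.2f}, {upper:.2f}) cm")
```

Raising the confidence level widens the interval, because the Z-value grows while the standard error stays the same.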
Figure: Normal Distribution Curve showing the 99% Confidence Interval
Problem 3: A random sample of 500 apples was taken from a large
consignment, and 65 were found to be bad. Estimate the proportion of
bad apples in the consignment as well as the standard error of the
estimate. Also, find the percentage of bad apples in the consignment.
Solution: Given:
Sample size n = 500, Number of bad apples in the sample x = 65
The estimated proportion p̂ is calculated as:
$$\hat{p} = \frac{x}{n} = \frac{65}{500} = 0.13$$
The standard error (SE) of the proportion is given by:
$$SE = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{0.13 \times 0.87}{500}} \approx 0.0150$$
The percentage of bad apples is:
$$\hat{p} \times 100 = 0.13 \times 100 = 13\%$$
Thus, the estimated proportion of bad apples in the consignment is 0.13,
with a standard error of about 0.0150, and 13% of the apples are bad.
Problem 4: In a locality of 18,000 families, a sample of 840 families was
selected at random. Of these 840 families, 206 families were found to have
a monthly income of Rs. 2500 or less. Estimate how many of the 18,000
families have a monthly income of Rs. 2500 or less. Also, find the limits
within which this estimate would lie.
Solution: Given:
Total number of families N = 18,000
Sample size n = 840
Number of families with income Rs. 2500 or less x = 206
The sample proportion p̂ is calculated as:
$$\hat{p} = \frac{206}{840} = 0.2452$$
Estimation of Families with income Rs. 2500 or less in the locality:
$$\hat{p} \times N = 0.2452 \times 18{,}000 \approx 4{,}414$$
Standard Error (SE) is given by:
$$SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.2452 \times (1 - 0.2452)}{840}} \approx 0.015$$
Confidence Interval for 95% confidence level (Z = 1.96):
CI = (0.216, 0.274)
Multiplying by the total population N:
Limits = (0.216 × 18,000, 0.274 × 18,000) = (3,888, 4,932)
Thus, the estimate is that between 3,888 and 4,932 families in the locality
have a monthly income of Rs. 2500 or less.
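The interval for a proportion, and its scaling to a population count, can be sketched as follows. The function name `proportion_confidence_interval` is an assumption made for this example.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence=0.95):
    """Return (lower, upper) limits for a population proportion (large-sample z interval)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return p_hat - z * se, p_hat + z * se


# Problem 4: 206 of 840 sampled families, locality of 18,000 families
lower, upper = proportion_confidence_interval(206, 840)
total_families = 18_000
print(f"proportion limits: ({lower:.3f}, {upper:.3f})")
print(f"family count limits: ({lower * total_families:.0f}, {upper * total_families:.0f})")
```

The code scales the unrounded proportions, so the family counts it prints can differ by a few units from a hand calculation that first rounds the limits to three decimals.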
Problem 5: The mean and standard deviation of the maximum loads
supported by 60 cables are 11.09 tonnes and 0.73 tonnes, respectively.
Find: (i) 95% confidence limits for the mean of the maximum loads of all
cables produced by the company. (ii) 99% confidence limits for the mean
of the maximum loads of all cables produced by the company.
Solution: Given:
Sample size n = 60
Sample mean x̄ = 11.09 tonnes
Standard deviation s = 0.73 tonnes
Standard Error (SE):
$$SE = \frac{s}{\sqrt{n}} = \frac{0.73}{\sqrt{60}} \approx 0.0943 \text{ tonnes}$$
(i) 95% Confidence Limits: For 95% confidence level, the Z-value is
1.96. The confidence limits are:
$$\bar{x} \pm Z \times SE = 11.09 \pm 1.96 \times 0.0943 = 11.09 \pm 0.1848$$
= (10.9052 tonnes, 11.2748 tonnes)
Thus, the 95% confidence limits for the mean are (10.91, 11.27) tonnes.
(ii) 99% Confidence Limits: For 99% confidence level, the Z-value is
2.576. The confidence limits are:
$$\bar{x} \pm Z \times SE = 11.09 \pm 2.576 \times 0.0943 = 11.09 \pm 0.2429$$
= (10.8471 tonnes, 11.3329 tonnes)
Thus, the 99% confidence limits for the mean are (10.85, 11.33) tonnes.
Assignment Questions