Sampling and Estimation A Level Notes (Precision)
Types of population
Populations are commonly classified into types, including:
a) Finite population
It is a population in which the objects in the dataset can be counted, e.g. the books in a library.
b) Infinite population
It is a population in which it is impossible to count all the members under study, e.g. the insects in a field.
B. SAMPLE
Refers to a subset of the population.
It carries the characteristics of the population from which it is drawn.
Samples are used in statistical testing when the population is too large for the test to include every possible member or observation.
A sample is described in terms of statistics (e.g. the sample mean and the sample variance).
Statistic
A statistic is a value that describes a characteristic of a sample, e.g. the sample mean ($\bar{X}$) or the sample variance.
OR
It is a quantity calculated from a sample.
Data collection
a) CENSUS
Refers to a study of every unit in the population.
b) Sample survey
Refers to a survey which is carried out using a sampling method.
It is a methodology that focuses on selecting a set of individuals capable of representing the population.
Sampling
Refers to the process of selecting samples from the population.
Merits of sampling: it is quicker and cheaper than a census.
Demerits of sampling: it might not be representative of the population.
A sampling frame is a list of all members of the population from which a sample is drawn, i.e. a list of all the items in the population under study. For example: a list of employees' names within a company.
The sampling frame contains information about the size of the population and it helps
the researcher to choose the appropriate sampling method.
The sampling methods are divided into two:
A. Probability sampling,
B. Nonprobability sampling.
Sample Statistics
In drawing conclusions about the population random samples are taken and values obtained
from them are considered.
Therefore it is essential to know the sampling distribution of these random samples.
Sampling distribution
c) T-Distribution
The t-distribution is used when the population variance is not known and the sample size is small (with a normally distributed population).
As the sample size increases, the t-distribution becomes very close to the normal distribution.
When the parent distribution is normally distributed, its sampling distributions will also
be normal (symmetrical) and have specific properties for the central tendency (mean) and
variability (variance).
$E(\bar{X}) = \mu, \qquad \mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$
The central limit theorem (CLT) states that the distribution of sample means approaches a normal distribution as the sample size (n) gets larger ($n \geq 30$ is the usual rule of thumb), regardless of the shape and type of the original population distribution.
OR
The CLT states that, regardless of the variable's distribution in the population (binomial, Poisson, exponential, geometric, uniform, chi-square etc.), the sampling distribution of the sample mean $\bar{X}$ tends to approximate the normal distribution (i.e. a "bell curve") as the sample size (n) becomes larger ($n \geq 30$).
Then $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ approximately.
1. The diameters, 𝑥 , of 110 steel rods were measured in centimetres and the results were
summarised as follows:
$\sum x = 36.5, \qquad \sum x^2 = 12.49$
Assuming these measurements are a sample from a normal distribution with this mean and
this variance, find the probability that the mean diameter of a sample of size 110 is greater
than 0.345 cm. (O &C)
2. Two red balls and 2 white balls are placed in a bag. Balls are drawn one by one, at
random and without replacement. The random variable X is the number of white balls
drawn before the first red ball is drawn.
a) Show that $P(X = 1) = \frac{1}{3}$, and find the rest of the probability distribution of X.
b) Find $E(X)$ and show that $\mathrm{Var}(X) = \frac{5}{9}$.
c) The sample mean for 80 independent observations of $X$ is denoted by $\bar{X}$. Using a suitable approximation, find $P(\bar{X} > 0.75)$. (C)
3. The variable X is such that 𝑋~𝑁(𝜇; 4). A random sample of size 𝑛 is taken from the
population. Find the least 𝑛 such that 𝑃 (|𝑋̅ − 𝜇| < 0.5) > 0.95
Solutions
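1. $\text{mean} = \mu = \dfrac{36.5}{110} = 0.331818$
$\sigma^2 = \dfrac{12.49}{110} - 0.331818^2 = 0.003442$, so $\sigma = 0.05867$
$P(\bar{X} > 0.345) = P\left(Z > \dfrac{0.345 - 0.331818}{\sqrt{0.05867^2 / 110}}\right)$
$= P(Z > 2.356)$
$= 1 - \Phi(2.356)$
$= 1 - 0.9908$
$= 0.009$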
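The arithmetic in solution 1 can be checked numerically. The following is a minimal sketch, assuming (as the solution does) that the variance is computed with divisor n from the summary totals; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n = 110
sum_x, sum_x2 = 36.5, 12.49

mu = sum_x / n                  # mean of the 110 measurements
var = sum_x2 / n - mu ** 2      # variance with divisor n, as in the solution
se = sqrt(var / n)              # standard deviation of the sample mean

z = (0.345 - mu) / se           # standardised value
p = 1 - NormalDist().cdf(z)     # upper-tail probability P(Z > z)
print(round(z, 3), round(p, 4))
```

The standard normal `cdf` plays the role of the $\Phi$ table used in the written solution.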
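2. NB: X is the number of white balls drawn before the first red ball is drawn. There are 2 red balls and 2 white balls.
If a red ball is picked first, no white ball is drawn before the first red ball, so $X = 0$. This happens for the orders RRWW, RWRW or RWWR:
$P(X = 0) = \frac{2}{4} \times \frac{1}{3} \times 1 \times 1 + \frac{2}{4} \times \frac{2}{3} \times \frac{1}{2} \times 1 + \frac{2}{4} \times \frac{2}{3} \times \frac{1}{2} \times 1$
$= \frac{2}{12} + \frac{2}{12} + \frac{2}{12}$
$= \frac{6}{12}$
$= \frac{1}{2}$
a) Exactly one white ball before the first red ball happens for the orders WRRW or WRWR:
$P(X = 1) = \frac{2}{4} \times \frac{2}{3} \times \frac{1}{2} \times 1 + \frac{2}{4} \times \frac{2}{3} \times \frac{1}{2} \times 1$
$= \frac{4}{12}$
$= \frac{1}{3}$ (shown)
$P(X = 2) = 1 - \frac{1}{2} - \frac{1}{3} = \frac{1}{6}$
b) $E(X) = 0\left(\frac{1}{2}\right) + 1\left(\frac{1}{3}\right) + 2\left(\frac{1}{6}\right) = \frac{2}{3}$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
$= 0^2\left(\frac{1}{2}\right) + 1^2\left(\frac{1}{3}\right) + 2^2\left(\frac{1}{6}\right) - \left(\frac{2}{3}\right)^2$
$= 1 - \frac{4}{9}$
$= \frac{5}{9}$ (shown)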
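c) By the central limit theorem
$\bar{X} \sim N\left(\frac{2}{3}, \dfrac{5/9}{80}\right)$
Therefore
$\bar{X} \sim N\left(\frac{2}{3}, \frac{1}{144}\right)$
$P(\bar{X} > 0.75) = P\left(Z > \dfrac{0.75 - \frac{2}{3}}{\sqrt{1/144}}\right)$
$= P(Z > 1)$
$= 1 - \Phi(1)$
$= 1 - 0.841$
$= 0.159$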
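The distribution of X in question 2 can also be obtained by listing every equally likely order of the four balls. This is a sketch under that assumption; the ball labels and the helper name `whites_before_first_red` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import sqrt
from statistics import NormalDist

balls = ["R1", "R2", "W1", "W2"]      # labelled so that all 24 orders are equally likely
orders = list(permutations(balls))

def whites_before_first_red(order):
    """Count the white balls drawn before the first red ball appears."""
    count = 0
    for ball in order:
        if ball.startswith("R"):
            break
        count += 1
    return count

# Build the exact probability distribution of X
dist = {}
for order in orders:
    x = whites_before_first_red(order)
    dist[x] = dist.get(x, Fraction(0)) + Fraction(1, len(orders))

mean = sum(x * p for x, p in dist.items())
var = sum(x * x * p for x, p in dist.items()) - mean ** 2

# CLT approximation for the mean of 80 independent observations
se = sqrt(float(var) / 80)
p_tail = 1 - NormalDist(float(mean), se).cdf(0.75)
print(dist, mean, var, round(p_tail, 3))
```

Exact fractions are used so that the values of $E(X)$ and $\mathrm{Var}(X)$ can be compared directly with the "show that" results.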
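3. $X \sim N(\mu, 4)$
$\bar{X} \sim N\left(\mu, \dfrac{4}{n}\right)$
We require $P(|\bar{X} - \mu| < 0.5) > 0.95$.
$P(|\bar{X} - \mu| < 0.5) = P\left(\dfrac{|\bar{X} - \mu|}{\sqrt{4/n}} < \dfrac{0.5}{\sqrt{4/n}}\right)$
$= P\left(|Z| < \dfrac{0.5}{\sqrt{4/n}}\right)$
Therefore
$P\left(|Z| < \dfrac{0.5}{\sqrt{4/n}}\right) > 0.95$
$\dfrac{0.5}{\sqrt{4/n}} > 1.960$
$\left(\dfrac{0.5}{1.960}\right)^2 > \dfrac{4}{n}$
$0.0651 > \dfrac{4}{n}$
$n > \dfrac{4}{0.0651} = 61.47$
$n = 62$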
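The least n in question 3 can be confirmed by a direct search. This is only a sketch: the function name `coverage` is hypothetical, and it evaluates $P(|\bar{X} - \mu| < 0.5)$ for each n until the probability first exceeds 0.95.

```python
from math import sqrt
from statistics import NormalDist

def coverage(n, sigma2=4.0, half_width=0.5):
    """P(|Xbar - mu| < half_width) when Xbar ~ N(mu, sigma2 / n)."""
    z = half_width / sqrt(sigma2 / n)
    return 2 * NormalDist().cdf(z) - 1

n = 1
while coverage(n) <= 0.95:   # keep increasing n until the condition is met
    n += 1
print(n, round(coverage(n), 4))
```

Searching upwards avoids any rounding question that can arise when the inequality is rearranged by hand.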
Worked example
The weight of an empty jar is Normally distributed with mean 250 grams and standard
deviation 2 grams. The weight of jam delivered in a jar is Normally distributed with mean
200 grams and standard deviation 5 grams.
a) Find the probability that the mean weight of 4 jam filled jars is greater than
454 grams.
b) Determine the least number of jam filled jars that have to be sampled so that there is at
most a 0.5% chance that their mean is greater than 454 grams.
Solution
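(a) Let E be the weight of an empty jar and J be the weight of jam.
$E \sim N(250, 2^2)$, $J \sim N(200, 5^2)$
Let the weight of a filled jar be Y ($Y = E + J$), with E and J independent.
$E(Y) = E(E + J)$
$= 250 + 200$
$= 450$
$\mathrm{Var}(Y) = \mathrm{Var}(E + J)$
$= 4 + 25$
$= 29$
$Y \sim N(450, 29)$
The question asks for a probability about a sample mean, so
$\bar{Y} \sim N\left(450, \dfrac{29}{4}\right)$
$P(\bar{Y} > 454) = P\left(Z > \dfrac{454 - 450}{\sqrt{29/4}}\right) = P(Z > 1.486) = 1 - 0.9313 = 0.0687$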
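(b) $\bar{Y} \sim N\left(450, \dfrac{29}{n}\right)$
We require $P(\bar{Y} > 454) \leq 0.005$. At the boundary:
$P\left(Z > \dfrac{454 - 450}{\sqrt{29/n}}\right) = 0.005$
$1 - \Phi\left(\dfrac{454 - 450}{\sqrt{29/n}}\right) = 0.005$
$\Phi\left(\dfrac{454 - 450}{\sqrt{29/n}}\right) = 1 - 0.005 = 0.995$
$\dfrac{454 - 450}{\sqrt{29/n}} = \Phi^{-1}(0.995) = 2.576$
$4 = 2.576\sqrt{\dfrac{29}{n}}$
$\dfrac{4}{2.576} = \sqrt{\dfrac{29}{n}}$
$2.41117240847 = \dfrac{29}{n}$
$n = \dfrac{29}{2.41117240847}$
$n \approx 12.0273$
Rounding up, $n = 13$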
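Both parts of the jam jar example can be checked with a short sketch. It assumes the jar and jam weights are independent, so that their variances add; the helper name `tail_prob` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu_y = 250 + 200            # mean weight of a filled jar
var_y = 2 ** 2 + 5 ** 2     # variances add, assuming jar and jam weights are independent

def tail_prob(n, limit=454):
    """P(mean weight of n filled jars > limit)."""
    return 1 - NormalDist(mu_y, sqrt(var_y / n)).cdf(limit)

p_a = tail_prob(4)          # part (a)

n = 1                       # part (b): smallest n with tail probability at most 0.5%
while tail_prob(n) > 0.005:
    n += 1
print(round(p_a, 4), n)
```

The loop in part (b) tests each n directly against the 0.5% requirement instead of rearranging the inequality.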
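Suppose all samples of size n are taken from a population in which the proportion of successes is p, and let $p_s = \dfrac{x}{n}$ be the sample proportion, where x is the number of successes in the sample. Then, with $q = 1 - p$:
$E(p_s) = p$
$\mathrm{Var}(p_s) = \dfrac{pq}{n}$
For large n, $p_s \sim N\left(p, \dfrac{pq}{n}\right)$ approximately.
Probabilities for the distribution of the sample proportion can be found by using a continuity correction.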
Worked Example
A recent study asked working adults if they worked most of their time remotely. The study
found that 30% of employees spend the majority of their time working remotely. Suppose a
sample of 150 working adults is taken.
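Solution:
Checking $np$ and $nq$:
$np = 150 \times 0.3 = 45 \geq 5$ and $nq = 150 \times 0.7 = 105 \geq 5$
$\sigma_{p_s} = \sqrt{\dfrac{0.7(0.3)}{150}}$
$= 0.037416$
$P(p_s < 0.27333) = P\left(Z < \dfrac{0.27333 - 0.3}{0.0374}\right) = P(Z < -0.713)$
$= 1 - \Phi(0.713)$
$= 0.2379$
Or
Normal approximation to the number X of remote workers in the sample.
Let $X \sim N(45, 31.5)$, since $np = 45$ and $npq = 31.5$.
$P(X < 41) = P\left(Z < \dfrac{41 - 45}{\sqrt{31.5}}\right) = P(Z < -0.713) = 0.2379$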
Worked example.
70% of the tomato plants of a particular variety produce more than 10 tomatoes per plant. Find the probability that a random sample of 50 plants of this variety contains more than 37 plants which produce more than ten tomatoes per plant.
Solution
$p_s \sim N\left(0.7, \dfrac{0.7(0.3)}{50}\right)$ becomes $p_s \sim N(0.7, 0.0042)$
$P\left(p_s > \frac{37}{50}\right)$ becomes, by continuity correction, $P\left(p_s > \frac{37}{50} + \frac{1}{2(50)}\right) = P(p_s > 0.75)$
$P(p_s > 0.75) = P\left(Z > \dfrac{0.75 - 0.7}{\sqrt{0.0042}}\right)$
$= P(Z > 0.772)$
$= 1 - \Phi(0.772)$
$= 0.22$
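As a rough check on the continuity-corrected normal approximation in the tomato example, the exact binomial probability can be computed alongside it. This is a sketch, assuming the number of qualifying plants follows $B(50, 0.7)$; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.7

# Exact binomial probability of more than 37 plants, i.e. 38 to 50
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(38, n + 1))

# Normal approximation for the sample proportion with continuity correction
sd = sqrt(p * (1 - p) / n)
threshold = 37 / n + 1 / (2 * n)
approx = 1 - NormalDist(p, sd).cdf(threshold)

print(round(exact, 4), round(approx, 4))
```

The two values should be close, which is what justifies using the approximation when $np$ and $nq$ are both large enough.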
Point estimation
Point estimation involves the use of sample data (statistic) to estimate a single value which
is to serve as a "best guess" or "best estimate" of an unknown population parameter e.g.
Population mean.
NB: this process is done when the population parameters are not known.
A good point estimator has three properties:
1. Unbiasedness
2. Consistency
3. Efficiency
1. Unbiasedness
A point estimator is called unbiased if its expected value is equal to the population parameter, e.g. since $E(\bar{X}) = \mu$, the sample mean $\bar{x}$ is an unbiased estimate of $\mu$.
o An estimate from an unbiased estimator is called an unbiased estimate.
o This means that the mean of the unbiased estimates will get closer to the population parameter as more samples are taken.
2. Consistency
A point estimator should be consistent: the larger the sample size, the more accurate the estimate is.
3. Efficiency
Among unbiased estimators of the same parameter, the most efficient is the one with the smallest variance.
Remember that
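a. The unbiased estimate of $p$, the population proportion of successes, is $\hat{p}$ (read "p hat"), the sample proportion.
b. The unbiased estimate of the population mean is
$\hat{\mu} = \bar{x} = \dfrac{\sum x}{n}$
c. The unbiased estimate of the population variance is
$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$
or
$\hat{\sigma}^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$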
Worked example
The times, T minutes, spent on daily revision by a random sample of 50 A Level students from Wedza are summarised as follows.
$n = 50, \qquad \sum t = 6174, \qquad \sum t^2 = 831\,581$
Calculate unbiased estimates of the population mean and variance of the times spent on daily revision by A Level students in Wedza.
$\hat{\mu} = \bar{t} = \dfrac{\sum t}{n} = \dfrac{6174}{50}$
$\hat{\mu} = 123.48$ minutes
$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum t^2 - \dfrac{(\sum t)^2}{n}\right)$
$\hat{\sigma}^2 = \dfrac{1}{49}\left(831\,581 - \dfrac{6174^2}{50}\right)$
$\hat{\sigma}^2 = 1412.56$ minutes$^2$
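The summary-total formulas for the unbiased estimates translate directly into code. This is a minimal sketch; the function name `unbiased_estimates` is hypothetical, and the totals are the ones used in the calculation above.

```python
def unbiased_estimates(n, sum_x, sum_x2):
    """Unbiased estimates of the population mean and variance from summary totals."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # divisor n - 1, not n
    return mean, variance

mean, variance = unbiased_estimates(50, 6174, 831581)
print(round(mean, 2), round(variance, 2))
```

The same function can be reused for any question that supplies $n$, $\sum x$ and $\sum x^2$.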
Worked example
Potato pockets are filled by a machine. A random sample of 10 pockets from the production
line had the following quantities in kg.
20.12 20.50 20.91 20.23 20.46
20.64 21.01 20.19 20.37 20.73
Calculate the unbiased estimates of the
(a) Mean [2]
(b) Variance Zimsec [2]
Solution
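a) Let X be the quantity, in kg, of potatoes in a pocket filled by the machine.
$\sum x = 205.16, \qquad \sum x^2 = 4209.8926, \qquad n = 10$
i. Unbiased estimate of the mean $= \bar{x} = \hat{\mu}$
$= \dfrac{\sum x}{n}$
$= \dfrac{205.16}{10}$
$= 20.516$
ii. Unbiased estimate of the variance $= \hat{\sigma}^2$
$= \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$
$= \dfrac{1}{9}\left(4209.8926 - \dfrac{205.16^2}{10}\right)$
$= 0.092226666$
$= 0.092$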
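When the raw data are available, as in the potato pocket example, the estimates can be taken straight from the observations. The sketch below relies on Python's `statistics.variance`, which uses the divisor $n - 1$ and therefore matches the unbiased estimate; the variable names are illustrative.

```python
from statistics import mean, variance

masses = [20.12, 20.50, 20.91, 20.23, 20.46,
          20.64, 21.01, 20.19, 20.37, 20.73]   # quantities in kg from the question

mu_hat = mean(masses)          # unbiased estimate of the population mean
var_hat = variance(masses)     # sample variance with divisor n - 1
print(round(mu_hat, 3), round(var_hat, 3))
```

Note that `statistics.pvariance` uses the divisor n instead and would give the biased value.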
Confidence Interval
A confidence interval is
an estimate of an interval that may contain a population parameter;
a range of values within which the statistician is fairly sure the true value of the population parameter lies;
a range of values $[a ; b]$, bounded below by $a$ and above by $b$ around the sample statistic, that is likely to contain an unknown population parameter.
[Figure: normal curve with the interval from $a$ to $b$ centred on $\mu$, corresponding to $-z$, $0$ and $z$ on the standard normal scale.]
To obtain the critical z values for a 95% confidence interval, consider the area from $-\infty$ to $z$:
$P(Z < z) = 0.975$
$\Phi(z) = 0.975$
$z = \Phi^{-1}(0.975)$
$z = 1.96$
Therefore the critical z values are $-1.96$ and $1.96$, since the two points are symmetrical about 0:
$z = \pm\Phi^{-1}(0.975)$
$= \pm 1.960$
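The same rule, $z = \Phi^{-1}\left[\frac{1 + c}{2}\right]$ for confidence level $c$, gives the critical values for other confidence levels. This is a small sketch using the standard library's inverse normal CDF; the function name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z value for the given confidence level (e.g. 0.95)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(level, round(z_critical(level), 3))
```

These are the values read from the normal tables in the worked examples that follow.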
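$P(-1.96 < Z < 1.96) = P\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right)$
$= P\left(-1.96\dfrac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\dfrac{\sigma}{\sqrt{n}}\right)$ (multiply throughout by $\dfrac{\sigma}{\sqrt{n}}$)
$= P\left(1.96\dfrac{\sigma}{\sqrt{n}} > -\bar{X} + \mu > -1.96\dfrac{\sigma}{\sqrt{n}}\right)$ (multiply throughout by $-1$, which reverses the inequalities)
$= P\left(\bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}} > \mu > \bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}}\right)$ (add $\bar{X}$ throughout)
$\bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}}$ is the Z distribution formula at 95%.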
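i. If population is normal with known variance
a. $C.I = \bar{X} - Z\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z\dfrac{\sigma}{\sqrt{n}}$ or $\left(\bar{X} - Z\dfrac{\sigma}{\sqrt{n}} ; \bar{X} + Z\dfrac{\sigma}{\sqrt{n}}\right)$
ii. If population is non-normal with known variance and large sample size ($n \geq 30$)
a. $C.I = \bar{X} - Z\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z\dfrac{\sigma}{\sqrt{n}}$ or $\left(\bar{X} - Z\dfrac{\sigma}{\sqrt{n}} ; \bar{X} + Z\dfrac{\sigma}{\sqrt{n}}\right)$
iii. If population is normal or non-normal with unknown variance and large sample size (where $n \geq 30$)
a. $C.I = \bar{X} - Z\dfrac{\hat{\sigma}}{\sqrt{n}} < \mu < \bar{X} + Z\dfrac{\hat{\sigma}}{\sqrt{n}}$ or $\left(\bar{X} - Z\dfrac{\hat{\sigma}}{\sqrt{n}} ; \bar{X} + Z\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$
where $\hat{\sigma}^2$ is the unbiased estimate of population variance
iv. For a population proportion $p$ estimated from a large sample
$C.I = p_s - Z\sqrt{\dfrac{p_s q_s}{n}} < p < p_s + Z\sqrt{\dfrac{p_s q_s}{n}}$ or $\left(p_s - Z\sqrt{\dfrac{p_s q_s}{n}} ; p_s + Z\sqrt{\dfrac{p_s q_s}{n}}\right)$
Margin of error $= z\dfrac{\sigma}{\sqrt{n}}$ or $z\dfrac{\hat{\sigma}}{\sqrt{n}}$ or $z\sqrt{\dfrac{pq}{n}}$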
Worked example
The masses, m grams, of a random sample of 80 strawberries of a certain type were measured
and summarised as follows.
$n = 80, \qquad \sum m = 4200, \qquad \sum m^2 = 229\,000$
(i) Find unbiased estimates of the population mean and variance. [3]
(ii) Calculate a 98% confidence interval for the population mean. [3]
50 random samples of size 80 were taken and a 98% confidence interval for the population mean, $\mu$, was found from each sample.
(iii) Find the number of these 50 confidence intervals that would be expected to include the true value of $\mu$. Camb [1]
Solution
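(i) $\hat{\mu} = \bar{m}$
$= \dfrac{\sum m}{n}$
$= \dfrac{4200}{80}$
$= 52.5$
$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum m^2 - \dfrac{(\sum m)^2}{n}\right)$
$= \dfrac{1}{79}\left(229\,000 - \dfrac{4200^2}{80}\right)$
$= 107.5949367$
$= 108$ (3 s.f.)
(ii) The C.I uses the Z distribution: the population variance is unknown and the sample size is large.
C.I is $\left(\bar{X} - Z\dfrac{\hat{\sigma}}{\sqrt{n}} ; \bar{X} + Z\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$
z critical values of 98% C.I $= \pm\Phi^{-1}\left[\dfrac{1 + 0.98}{2}\right]$
$= \pm\Phi^{-1}(0.99)$
$z = \pm 2.326$
$C.I = 52.5 \pm 2.326 \times \sqrt{\dfrac{107.5949367}{80}}$
$= (49.8 ; 55.2)$
(iii) Expected number of intervals that include $\mu$ $= 0.98 \times 50 = 49$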
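The strawberry example can be carried through numerically from the summary totals. The following is a sketch, assuming the large-sample z interval with the unbiased variance estimate; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sum_m, sum_m2 = 80, 4200, 229000

mean = sum_m / n
var_hat = (sum_m2 - sum_m ** 2 / n) / (n - 1)     # unbiased variance estimate

z = NormalDist().inv_cdf((1 + 0.98) / 2)          # 98% two-sided critical value
half_width = z * sqrt(var_hat / n)
interval = (mean - half_width, mean + half_width)

expected_hits = 0.98 * 50                         # intervals expected to contain mu
print(round(interval[0], 1), round(interval[1], 1), expected_hits)
```

Part (iii) uses the meaning of the confidence level: in the long run 98% of intervals built this way contain $\mu$.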
Worked example
The masses of heavy weight boxers have mean 𝜇 and standard deviation 𝜎. A random sample
of 49 heavy weight boxers is taken and a 95% confidence interval is constructed for 𝜇.
Given that 95% confidence interval is [94.5 ; 105.3]; find
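(i) The sample mean $\bar{x}$ and the standard deviation $\sigma$
(ii) A 99% confidence interval for $\mu$ Zimsec N2020 P2 [9]
Solution
The distribution is z since the sample size is large.
(i) $C.I = \left[\bar{x} - Z \times \dfrac{\sigma}{\sqrt{n}} ; \bar{x} + Z \times \dfrac{\sigma}{\sqrt{n}}\right] = [94.5 ; 105.3]$
critical values: $z = \pm\Phi^{-1}\left[\dfrac{1 + 0.95}{2}\right]$
$= \pm\Phi^{-1}(0.975)$
$= \pm 1.960$
$\left[\bar{x} - 1.960 \times \dfrac{\sigma}{\sqrt{49}} ; \bar{x} + 1.960 \times \dfrac{\sigma}{\sqrt{49}}\right] = [94.5 ; 105.3]$
$\bar{x} - 1.960 \times \dfrac{\sigma}{7} = 94.5$ (equation 1)
$\bar{x} + 1.960 \times \dfrac{\sigma}{7} = 105.3$ (equation 2)
$7\bar{x} - 1.960\sigma = 661.5$ (equation 1)
$7\bar{x} + 1.960\sigma = 737.1$ (equation 2)
equation 1 $-$ equation 2:
$-3.92\sigma = -75.6$
$\sigma = 19.28571428$
equation 1 $+$ equation 2:
$14\bar{x} = 1398.6$
$\bar{x} = 99.9$
(ii) z critical values for 99% $= \pm\Phi^{-1}(0.995) = \pm 2.576$
$C.I = 99.9 \pm 2.576 \times \dfrac{19.28571428}{\sqrt{49}}$
$= (92.8 ; 107)$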
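Working backwards from a given interval, as in the boxer example, amounts to taking the midpoint and the half-width. The sketch below uses the table values 1.960 and 2.576 as in the text; the variable names are illustrative.

```python
from math import sqrt

lower, upper, n = 94.5, 105.3, 49
z95, z99 = 1.960, 2.576      # table values, as used in the worked solution

centre = (lower + upper) / 2                     # sample mean is the midpoint
sigma = (upper - lower) / 2 * sqrt(n) / z95      # half-width = z * sigma / sqrt(n)

half99 = z99 * sigma / sqrt(n)                   # rescale the half-width to 99%
ci99 = (centre - half99, centre + half99)
print(round(centre, 1), round(sigma, 4), round(ci99[0], 1), round(ci99[1], 1))
```

This is equivalent to solving the two simultaneous equations: adding them gives the mean, subtracting them gives $\sigma$.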
Worked example
The results of a survey showed that 360 out of 1000 families view a certain television show.
Calculate the 95% confidence interval for the proportion of families viewing the show.
Zimsec N2021 P1 [5]
Solution
Confidence interval $= p_s \pm Z\sqrt{\dfrac{p_s q_s}{n}}$
Z critical values at the 95% confidence level $= \pm\Phi^{-1}\left[\dfrac{1 + 0.95}{2}\right]$
$= \pm\Phi^{-1}(0.975)$
$= \pm 1.960$
$p_s = 0.36, \qquad q_s = 0.64$
$C.I = 0.36 \pm 1.96\sqrt{\dfrac{0.36 \times 0.64}{1000}}$
$= (0.3302492 ; 0.3897507)$
$= (0.33 ; 0.39)$ to 2 s.f.
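The proportion interval above follows one fixed recipe, sketched here as a small function. The name `proportion_ci` is hypothetical, and the default z of 1.960 is the 95% table value used in the text.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.960):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = proportion_ci(360, 1000)
print(round(low, 4), round(high, 4))
```

Passing a different `z` (for example 2.576) gives the interval at another confidence level.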
Worked example
Pencils produced on a certain machine have lengths, in millimetres, which are normally distributed with mean $\mu$ and standard deviation 3. A random sample of 16 pencils was taken and the length, $x$ millimetres, measured for each pencil, giving
$\sum x = 2848$
a) State why $\bar{X}$, the mean length, in millimetres, of a random sample of 16 pencils produced on the machine, is normally distributed. [1]
b) Construct a 99% confidence interval for $\mu$. [5]
Solution
a) The lengths are normally distributed, so the mean of a random sample of them is also normally distributed.
b) $\bar{x} = \dfrac{2848}{16} = 178$
$C.I = 178 \pm 2.576 \times \dfrac{3}{\sqrt{16}}$
$= (176.068 ; 179.932)$
Worked example
A machine produces balls which are normally distributed with a mean of 𝜇 𝑐𝑚 and a standard
deviation of 0.24 𝑐𝑚.
The diameter, d cm, of each ball in a random sample of 144 balls was measured. This gave:
∑ 𝑑 = 3585.6
(i) Calculate the unbiased estimate of 𝜇 [1]
(ii) Calculate the standard error of your unbiased estimate. [2]
(iii) Construct a 95% confidence interval of 𝜇. [4]
(iv) Hence state, with a reason, whether you agree with the claim that 𝜇 = 25 [2]
Solution
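(i) $\hat{\mu} = \bar{d}$
$= \dfrac{3585.6}{144}$
$= 24.9$
(ii) standard error of $\hat{\mu}$ $= \dfrac{\sigma}{\sqrt{n}}$
$= \dfrac{0.24}{\sqrt{144}}$
$= 0.02$
(iii) Z critical values at the 95% confidence level $= \pm\Phi^{-1}\left[\dfrac{1 + 0.95}{2}\right]$
$= \pm\Phi^{-1}(0.975)$
$= \pm 1.960$
$C.I = 24.9 \pm 1.960 \times 0.02$
$= (24.8608 ; 24.9392)$
(iv) The value 25 lies outside the 95% confidence interval, so the claim that $\mu = 25$ is not supported.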
Worked example
The contents of each of a random sample of 100 cans of soft drink are measured. The results
have a mean of 331.28 𝑚𝑙 and standard deviation of 2.97 𝑚𝑙.
a) $\hat{\sigma}^2 = \dfrac{100}{100 - 1} \times 2.97^2$
$= 8.91$
b) $C.I = \bar{x} \pm z\dfrac{\hat{\sigma}}{\sqrt{n}}$
Worked example
The lifetimes of light bulbs of a certain type have standard deviation 25.3 hours. Each bulb in a randomly chosen box of 12 was tested to failure and the mean lifetime was found to be 1785.7 hours.
a) State two assumptions which are required so that a symmetric 90% confidence interval for the population mean lifetime of the bulbs can be calculated.
b) Calculate a symmetric 90% confidence interval, given the validity of the assumptions. The values of the end-points should be given to the nearest integer. Camb
Solution
a) The lifetimes are normally distributed; the bulbs in the box form a random sample of all such bulbs.
b) z critical values $= \pm\Phi^{-1}\left[\dfrac{1 + 0.9}{2}\right]$
$= \pm\Phi^{-1}(0.95)$
$= \pm 1.645$
$C.I = 1785.7 \pm 1.645 \times \dfrac{25.3}{\sqrt{12}}$
$= (1774 ; 1798)$ to the nearest integer
Worked example
A group of 65 students is asked to guess the length of a particular object and their answers are
recorded as x cm, with the following results.
∑ x = 6019.0 ∑ x 2 = 557 733.8
a) Show that the estimated standard error of the sample mean is 0.3cm
b) Determine an approximate symmetric 95% confidence interval for the mean of the
population of all such guesses, giving your limits correct to 1 decimal place.
c) State one assumption which you have made in your calculations. NEAB
Solution
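a) $\hat{\sigma}^2 = \dfrac{1}{64}\left(557\,733.8 - \dfrac{6019.0^2}{65}\right)$
$= \dfrac{117}{20}$
$= 5.85$
$S.E = \dfrac{\hat{\sigma}}{\sqrt{n}} = \sqrt{\dfrac{\hat{\sigma}^2}{n}}$
$= \sqrt{\dfrac{5.85}{65}}$
$= \sqrt{\dfrac{9}{100}}$
$= \dfrac{3}{10}$
$= 0.3$ (shown)
(b) $\bar{x} = \dfrac{6019}{65} = 92.6$
Z critical values at the 95% confidence level $= \pm\Phi^{-1}\left[\dfrac{1 + 0.95}{2}\right]$
$= \pm\Phi^{-1}(0.975)$
$= \pm 1.960$
$C.I = 92.6 \pm 1.960(0.3)$
$= (92.012 ; 93.188)$
$= (92.0 ; 93.2)$ to 1 decimal place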
Worked example
Shoe shop staff routinely measure the length of their customers' feet. Measurements of the
length of one foot (without shoes) from each of 180 adult male customers yielded a mean
length of 29.2 cm and a standard deviation of 1.47 cm.
(a) Calculate a 95% confidence interval for the mean length of male feet.
(b) Why was it not necessary to assume that the lengths of feet are normally distributed in
order to calculate the confidence interval in part (a)?
(c) What assumption was it necessary to make in order to calculate the confidence interval in
part (a)?
(d) Given that the lengths of male feet may be modelled by a normal distribution, and
making any other necessary assumptions, calculate an interval within which 90% of the
lengths of male feet will lie.
(e) In the light of your calculations in parts (a) discuss, briefly, the question 'is a foot long?"
(One foot is 30.5 cm.) [AEB]
Solution
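(a) The population variance is unknown and the sample size is large.
$\bar{X} = 29.2$
$\hat{\sigma}^2 = \dfrac{n}{n-1}s^2$
$= \dfrac{180}{179} \times 1.47^2$
$= 2.172972067$
Z critical values at the 95% confidence level $= \pm\Phi^{-1}\left[\dfrac{1 + 0.95}{2}\right]$
$= \pm\Phi^{-1}(0.975)$
$= \pm 1.960$
$C.I = 29.2 \pm 1.960 \times \sqrt{\dfrac{2.172972067}{180}}$
$= (28.98\text{ cm} ; 29.42\text{ cm})$
(e) No. The value 30.5 cm lies outside the 95% confidence interval for the mean, so the mean length of an adult male foot is less than one foot.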
Worked example
The probability of success in each of a long series of 𝑛 independent trials is constant and
equal to 𝑝. Explain how an approximate 95% confidence interval for p may be obtained. In
an opinion poll carried out before a local election, 501 people out of a random sample of 925
voters declare that they will vote for a particular one of the two candidates contesting the
election. Find approximate 95% confidence limits for the proportion of all voters in favour of
this candidate. (AEB)
Solution
Population proportion from a large sample
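Z critical values at the 95% confidence level $= \pm\Phi^{-1}\left[\dfrac{1 + 0.95}{2}\right]$
$= \pm\Phi^{-1}(0.975)$
$= \pm 1.960$
$C.I = \dfrac{501}{925} \pm 1.960\sqrt{\dfrac{\left(\frac{501}{925}\right)\left(\frac{424}{925}\right)}{925}}$
$= (0.510 ; 0.574)$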
8. The results of a survey showed that 3600 out of 10 000 families regularly purchased a
specific weekly magazine.
(a) Find approximate 95% confidence limits for the proportion of families buying the
magazine.
(b) Estimate the additional number of families to be contacted if the probability that the
estimated proportion is in error by more than 0.01 is to be at most 1%. (AEB)
Solution
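(a) Z critical values at the 95% confidence level $= \pm\Phi^{-1}\left[\dfrac{1 + 0.95}{2}\right]$
$= \pm\Phi^{-1}(0.975)$
$= \pm 1.960$
$C.I = \dfrac{3600}{10\,000} \pm 1.960\sqrt{\dfrac{\left(\frac{3600}{10\,000}\right)\left(\frac{6400}{10\,000}\right)}{10\,000}}$
$= (0.3506 ; 0.3694)$
(b) The margin of error is
$M.E = z\sqrt{\dfrac{pq}{n}}$
so
$n = pq\left(\dfrac{z}{M.E}\right)^2$
After calculating the value of n from the formula, round the value of n up to the next integer.
The probability of an error of more than 0.01 is to be at most 1%, so $z = \Phi^{-1}(0.995) = 2.576$.
$n = \dfrac{2.576^2 \times \left(\frac{3600}{10\,000}\right)\left(\frac{6400}{10\,000}\right)}{0.01^2}$
$n = 15\,288.83$
$= 15\,289$
$15\,289 - 10\,000 = 5289$ additional families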
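The sample-size formula $n = pq\left(\dfrac{z}{M.E}\right)^2$ can be sketched as a function. The name `required_sample_size` is hypothetical, and z is passed in explicitly because it depends on the confidence level chosen (1.960 for 95%, 2.576 for 99%).

```python
from math import ceil

def required_sample_size(p, margin, z):
    """Smallest n for which z * sqrt(p * (1 - p) / n) does not exceed the margin."""
    return ceil(p * (1 - p) * (z / margin) ** 2)

# Estimated proportion 0.36 and margin of error 0.01 at two confidence levels
print(required_sample_size(0.36, 0.01, 1.960))
print(required_sample_size(0.36, 0.01, 2.576))
```

The result is the total sample size; the number of additional families is found by subtracting the number already surveyed.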
The t-distribution, like the normal distribution, is bell shaped and symmetric about its mean.
The mean of both the standard normal distribution and the t-distribution is zero.
The t-distribution has a larger variance than the standard normal distribution.
Confidence intervals based on the t-distribution are wider than those based on the standard normal distribution at the same confidence level.
The t-distribution has only one parameter, called the degrees of freedom and denoted $\nu$, where $\nu = n - 1$ and n is the sample size.
As the sample size $n$ and the degrees of freedom increase, the t-distribution becomes more similar to a normal distribution.
The width of a confidence interval $(a ; b)$ is $b - a$, or $2 \times t\dfrac{\hat{\sigma}}{\sqrt{n}}$.
Margin of error
Margin of error $= t\dfrac{\hat{\sigma}}{\sqrt{n}}$
Standard error of the mean in an interval
Standard error $= \dfrac{\hat{\sigma}}{\sqrt{n}}$
Worked example
The mass of a certain brand of chocolate bar has a normal distribution with mean 𝜇 grams
and standard deviation 0.85 grams. The masses, in grams, of 5 randomly chosen bars are
Calculate a symmetric 90% confidence interval for 𝜇, giving the end-points correct to two
decimal places.
Forty random samples of 5 bars are taken, and a 90% confidence interval for 𝜇 is calculated
for each sample. Find the expected number of intervals that do not contain 𝜇. [Camb]
Solution
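The formula is $\left(\bar{X} - t\dfrac{\hat{\sigma}}{\sqrt{n}} ; \bar{X} + t\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$
$\sum x = 624.85$
$\bar{x} = \dfrac{624.85}{5} = 124.97$
$\sum x^2 = 78\,089.3343$
$\hat{\sigma}^2 = \dfrac{1}{5-1}\left(78\,089.3343 - \dfrac{624.85^2}{5}\right)$
$= 0.45745$
$\hat{\sigma} = 0.6763505$
t critical values for a 90% interval with $\nu = 4$: the table entry for cumulative probability $\dfrac{1 + 0.9}{2} = 0.95$ is $t = \pm 2.132$
$CI = \left(124.97 - 2.132 \times \dfrac{0.6763505}{\sqrt{5}} ; 124.97 + 2.132 \times \dfrac{0.6763505}{\sqrt{5}}\right)$
$= (124.33 ; 125.61)$
Expected number of intervals that do not contain $\mu$ $= 40 \times 0.1 = 4$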
Worked example
The external diameters (measured in units of 0.01mm above a nominal value) of a sample of
piston rings produced on the same machine were:
11, 9, 32, 18, 29, 1, 21, 19, 6.
Assuming a normal distribution calculate a 95% confidence interval for the population mean.
[AEB]
Solution
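X is the external diameter of a piston ring (in units of 0.01 mm above the nominal value).
$\sum x = 146, \qquad \sum x^2 = 3230, \qquad n = 9$
$\bar{x} = \dfrac{146}{9} = 16.222$
$\hat{\sigma}^2 = \dfrac{1}{8}\left(3230 - \dfrac{146^2}{9}\right) = 107.694444$
t critical values for a 95% interval with $\nu = 8$ (cumulative probability $\dfrac{1 + 0.95}{2} = 0.975$) $= \pm 2.306$
$C.I = 16.222 \pm 2.306 \times \sqrt{\dfrac{107.694444}{9}}$
$= (8.25 ; 24.2)$
$= (0.0825\text{ mm} ; 0.242\text{ mm})$ above the nominal value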
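The piston ring interval can be reproduced from the raw data. This sketch hard-codes the t critical value 2.306 (8 degrees of freedom, 95% two-sided) as read from tables in the text, since the Python standard library has no t-distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

data = [11, 9, 32, 18, 29, 1, 21, 19, 6]   # units of 0.01 mm above nominal
t_crit = 2.306                             # table value for 8 df, taken from the text

x_bar = mean(data)
var_hat = variance(data)                   # divisor n - 1 (unbiased estimate)
half_width = t_crit * sqrt(var_hat / len(data))
ci = (x_bar - half_width, x_bar + half_width)
print(round(ci[0], 2), round(ci[1], 2))
```

For other sample sizes or confidence levels only `t_crit` needs to change, using $\nu = n - 1$ in the tables.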
Worked example
In Tesbury’s supermarket, economy packs of butter are marked 250g. An inspector takes a
random sample of 12 packs and weighs them. Correct to the nearest 0.1g; the weights, in
grams, were
(a) Making any necessary assumptions, which should be stated, calculate a 99%
confidence interval for the mean weight of the packs of butter.
(b) Calculate the width of the 99% confidence interval.
(c) How is the width affected when calculating a 90% confidence interval
Solutions
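(a) Assume the weights are normally distributed. The variance is unknown and $n = 12$, so the t-distribution with $\nu = 11$ is used.
$\bar{x} = \dfrac{2966.8}{12} = 247.23333$
t critical values for a 99% interval with $\nu = 11$ (cumulative probability $\dfrac{1 + 0.99}{2} = 0.995$) $= \pm 3.106$
$C.I = 247.23333 \pm 3.106 \times \sqrt{\dfrac{11.207879}{12}}$, where $\hat{\sigma}^2 = 11.207879$
$= (244.231597\text{ g} ; 250.2350697\text{ g})$
(b) Width $= 250.2350697 - 244.231597 = 6.0$ g
(c) A 90% confidence interval uses a smaller critical value of t, so the interval is narrower.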
Worked example
An experimental physicist needs to estimate the true viscosity, 𝜇 Pascal seconds (Pa s), of a
light machine oil. Using the same apparatus he takes 12 independent measurements, 𝑥 Pa s,
of the viscosity of the oil, obtaining the values below:
25.8 25.2 24.7 25.5 25.3 25.4
25.2 25.3 25.8 25.9 25.2 24.9
(∑ 𝑥 = 304.2 ∑ 𝑥 2 = 7712.9)
When using this apparatus, measurements of the oil's viscosity are distributed with mean 𝜇
and variance 𝜎 2 . Obtain unbiased estimates of 𝜇 and 𝜎 2 . Hence obtain a symmetric 95%
confidence interval for 𝜇.
State any distributional assumptions you have made in obtaining your confidence interval.
The physicist explained the meaning of his confidence interval by saying there was a
probability of 0.95 that 𝜇 lay between the limits of the interval. Explain why this
interpretation is wrong and provide a correct explanation of 95 % confidence as used in this
context.
The manufacturer of the oil quotes a viscosity of 25.5 Pa s for the oil. With reference to your
confidence interval, state any conclusion you can come to regarding the validity of this
figure. (NEAB)
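Solution
$\hat{\mu} = \bar{x} = \dfrac{304.2}{12} = 25.35$
$\hat{\sigma}^2 = \dfrac{1}{11}\left(7712.9 - \dfrac{304.2^2}{12}\right)$
$= 0.13$
t critical values for a 95% interval with $\nu = 11$ (cumulative probability $\dfrac{1 + 0.95}{2} = 0.975$) $= \pm 2.201$
$C.I = 25.35 \pm 2.201 \times \sqrt{\dfrac{0.13}{12}}$
$= (25.1 ; 25.6)$
The distributional assumption is that the measurements are normally distributed; the t-distribution is used because the sample is small and the population variance is unknown.
The 95% confidence interval (25.1 ; 25.6) either contains the fixed value $\mu$ or it does not, so no probability can be attached to this particular interval. The 95% relates to the reliability of the estimation procedure: about 95% of intervals constructed in this way from repeated samples would contain $\mu$.
The quoted viscosity of 25.5 Pa s lies inside the interval, so there is no evidence to doubt the manufacturer's figure.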
In an investigation to assess the difference in use between a credit card and a store card a
random sample of 20 people, each using both cards, was selected. They supplied information
from which, in 1994, the difference between each person's mean monthly spending on the
credit and store cards, £d, was calculated. The following summary data were then calculated.
∑ 𝑑 = 1664 and ∑ 𝑑 2 = 426 445.
Stating all necessary distributional assumptions, calculate a symmetric 90% confidence
interval for the mean difference between the mean monthly spending for all users of the two
cards. (NEAB)
Solution
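Assume the differences d are normally distributed. The variance is unknown and the sample is small, so the t-distribution with $\nu = 19$ is used.
$\bar{d} = \dfrac{1664}{20} = 83.2$
$\hat{\sigma}^2 = \dfrac{1}{19}\left(426\,445 - \dfrac{1664^2}{20}\right)$
$= 15\,157.90526$
t critical values for a 90% interval with $\nu = 19$ (cumulative probability $\dfrac{1 + 0.90}{2} = 0.95$) $= \pm 1.729$
$C.I = 83.2 \pm 1.729 \times \sqrt{\dfrac{15\,157.9052}{20}}$
$= (£35.60080710 ; £130.7991929)$
$= (£35.60 ; £130.80)$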
Worked example
Five independent measurements of the diameter of a ball bearing were made using a certain
instrument. The results obtained in millimetres were
8.1; 9.1; 8.9; 8.9; 9.1
Given that true diameter of the ball bearing is 9.0 mm,
(a) calculate the unbiased estimates of the mean and variance of the measurement error of the
instrument, [5]
(b) Assuming that the measurement errors are independent and normally distributed, find the
90% confidence interval of the mean measurement error. Zimsec N2021 p1 [3]
Solution
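(a) The errors are $-0.9$; $0.1$; $-0.1$; $-0.1$; $0.1$
$\hat{\mu} = \dfrac{-0.9 + 0.1 - 0.1 - 0.1 + 0.1}{5}$
$= -0.18$
$\hat{\sigma}^2 = \dfrac{1}{4}\left(0.85 - \dfrac{(-0.9)^2}{5}\right)$
$= 0.172$
(b) Since the variance is unknown and the sample size is small, the distribution is t.
t critical values (with 4 df, cumulative probability $\dfrac{1 + 0.9}{2} = 0.95$) $= \pm 2.132$
$C.I = -0.18 \pm 2.132 \times \sqrt{\dfrac{0.172}{5}}$
$= (-0.575 ; 0.215)$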
A confidence interval (C.I.) for a difference between means is a range of values that is likely
to contain the true difference between two population means with a certain level of confidence.
To determine whether the difference between two means is statistically significant, analysts
often compare the confidence intervals for those groups.
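i. Confidence Interval for the Difference between Means of populations X and Y normal with known variances
$C.I = (\bar{X} - \bar{Y}) \pm Z\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}$
or
$C.I = (\bar{Y} - \bar{X}) \pm Z\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}$
ii. Confidence Interval for the Difference between Means of populations X and Y non-normal with known variances and large sample sizes ($n \geq 30$)
$C.I = (\bar{X} - \bar{Y}) \pm Z\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}$
or
$C.I = (\bar{Y} - \bar{X}) \pm Z\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}$
iii. Confidence Interval for the Difference between Means of populations X and Y normal or non-normal with unknown variances and large sample sizes (where $n \geq 30$)
$C.I = (\bar{X} - \bar{Y}) \pm Z\sqrt{\dfrac{\hat{\sigma}_x^2}{n_x} + \dfrac{\hat{\sigma}_y^2}{n_y}}$
or
$C.I = (\bar{Y} - \bar{X}) \pm Z\sqrt{\dfrac{\hat{\sigma}_x^2}{n_x} + \dfrac{\hat{\sigma}_y^2}{n_y}}$
iv. Confidence Interval for the Difference between Means of populations X and Y normal with unknown variances and small sample sizes
$C.I = (\bar{X} - \bar{Y}) \pm t\sqrt{\dfrac{\hat{\sigma}_x^2}{n_x} + \dfrac{\hat{\sigma}_y^2}{n_y}}$
or
$C.I = (\bar{Y} - \bar{X}) \pm t\sqrt{\dfrac{\hat{\sigma}_x^2}{n_x} + \dfrac{\hat{\sigma}_y^2}{n_y}}$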
Worked example
Kayla is investigating the lengths of the leaves of a certain type of tree found in two forests X
and Y. She chooses a random sample of 40 leaves of this type from forest X and records their
lengths, x cm. She also records the lengths, y cm, for a random sample of 60 leaves of this
type from forest Y. Her results are summarised as follows.
$\sum x = 242.0, \qquad \sum x^2 = 1587.0, \qquad \sum y = 373.2, \qquad \sum y^2 = 2532.6$
Find a 90% confidence interval for the difference between the population mean lengths of
leaves in forests X and Y. [7]
Solution
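In both cases the sample sizes are large and the variances are unknown.
$\bar{X} = \dfrac{242.0}{40} = 6.05$
$\bar{Y} = \dfrac{373.2}{60} = 6.22$
$\hat{\sigma}_X^2 = \dfrac{1}{39}\left(1587.0 - \dfrac{242.0^2}{40}\right) = 3.151282051$
$\hat{\sigma}_Y^2 = \dfrac{1}{59}\left(2532.6 - \dfrac{373.2^2}{60}\right) = 3.581288136$
z critical values $= \pm\Phi^{-1}\left[\dfrac{1 + 0.9}{2}\right]$
$= \pm\Phi^{-1}(0.95)$
$= \pm 1.645$
$C.I = (6.05 - 6.22) \pm 1.645\sqrt{\dfrac{3.15128205}{40} + \dfrac{3.581288136}{60}}$
$= (-0.782 ; 0.442)$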
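The two-sample calculation can be sketched in a few lines, assuming independent large samples so that the z interval with unbiased variance estimates applies. The helper name `summary` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def summary(n, total, total_sq):
    """Sample mean and unbiased variance estimate from summary totals."""
    mean = total / n
    var = (total_sq - total ** 2 / n) / (n - 1)
    return mean, var

x_bar, var_x = summary(40, 242.0, 1587.0)    # forest X
y_bar, var_y = summary(60, 373.2, 2532.6)    # forest Y

z = NormalDist().inv_cdf((1 + 0.90) / 2)     # 90% two-sided critical value
se = sqrt(var_x / 40 + var_y / 60)           # standard error of the difference
diff = x_bar - y_bar
ci = (diff - z * se, diff + z * se)
print(round(ci[0], 3), round(ci[1], 3))
```

Because the interval contains zero, the data do not show a difference between the two population means at this confidence level.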