Sampling and Estimation A Level Notes (Precision)

Uploaded by

bongani mungadze

SAMPLING AND ESTIMATION

After studying this topic students will be able to:

• distinguish between sample and population
• distinguish between probability sampling techniques and non-probability sampling techniques
• apply the sampling methods to identify representative samples
• calculate sample mean, variance and standard deviation
• find the unbiased estimates of population parameters
• solve problems involving sampling and estimation
• state the Central Limit Theorem
• recognize that the sample mean can be regarded as a random variable
• use the Central Limit Theorem to find probabilities
• identify the implications of small and large samples for the Central Limit Theorem
• determine a confidence interval for a population mean in cases where the population is normally distributed with known variance or where a large sample with unknown variance is used
• determine a confidence interval for a population mean in cases where the population is normally distributed with unknown variance and a small sample is used
• determine from a large sample an approximate confidence interval for a population proportion

1 | PRECISION +263775973880 COMPILED BY MWEDZI S


POPULATION AND SAMPLE
A. POPULATION
• A population consists of all the objects or events under study.
• A population is described in terms of parameters (e.g. the population mean and the population standard deviation).
Parameter
• A parameter is a value calculated from a population.
OR
• It is a value that describes a characteristic of an entire population, such as the population mean (μ) or the population standard deviation (σ).

Types of population
There are 4 types of population:
a) Finite population
• A population whose members can be counted, e.g. books in a library.
b) Infinite population
• A population in which it is impossible to count all the members under study, e.g. the insects in a field.
c) Existent population
• A population of concrete objects, i.e. objects that exist in physical form, e.g. books.
d) Hypothetical population
• A population that exists only theoretically, e.g. the heads or tails obtained by tossing a coin an infinite number of times.

B. SAMPLE
• A sample is a subset of the population.
• A good sample reflects the characteristics of the population from which it is drawn.
• Samples are used in statistical testing when the population is too large for the test to include every member or observation.
• A sample is described in terms of statistics (e.g. the sample mean and the sample variance).

Statistic
A statistic is a value that describes a characteristic of a sample, e.g. the sample mean ($\bar{X}$) or the sample variance.
OR
It is a value calculated from a sample.

• Sample statistics are used in inferential statistics to estimate population parameters in order to draw conclusions about the entire population.
• To draw valid conclusions, particular sampling techniques must be used to obtain a suitable sample.
• These sampling techniques help ensure that samples produce unbiased estimates.

Data collection

A survey is an investigation of one or more characteristics of a population or sample.
Data is collected by means of a survey, and there are 2 types of survey:

a) Census
b) Sample survey

a) CENSUS
• A census is a study of every unit in the population.

Merits of census
• Gives the researcher detailed information.
• Reliable data is collected, since the investigator observes every member personally.
• Ensures accuracy, since every aspect of the entire population is collected.

Demerits of census
• Very costly, since it requires a lot of manpower to collect data.
• Time consuming, since every member of the entire population is observed.
• Many possibilities of error due to non-response, personal bias of the investigator, skipping of members, etc.

b) Sample survey
• A sample survey is a survey carried out using a sampling method.
• It is a methodology that focuses on selecting a set of individuals capable of representing the population.

Sampling
Sampling is the process of selecting samples from the population.

Merits of sampling
• It is quicker and cheaper than a census.
• It leads to less data needing to be analysed.

Demerits of sampling
• It might not be representative of the population.
• It could introduce bias.



Sampling frame

• A sampling frame is a list of all the members of the population from which a sample is drawn, for example a list of employees' names within a company.
• The sampling frame contains information about the size of the population, and it helps the researcher to choose the appropriate sampling method.

The sampling methods are divided into two:
A. Probability sampling
B. Nonprobability sampling

A. Probability sampling methods

• In probability sampling, every unit in the population has a known, non-zero chance of being selected.



Method: Simple random sampling
Description: A sampling method in which every member of the population has an equal probability of being selected, e.g. drawing names out of a hat, or using a random number table or random number generator. Simple random sampling is most appropriate when all the population members are similar to each other.
Advantages:
• Ensures a high degree of representativeness.
• Reduces the risk of favouritism and biased opinions.
• Minimizes the amount of sampling bias.
Disadvantages:
• Not possible without a complete list of population members.
• Potentially uneconomical to achieve.
• Can be disruptive to isolate members from a group.
• Time consuming and tedious.

Method: Systematic sampling
Description: A systematic sample is a sample in which each member of the population is assigned a number. The members of the population are ordered in some way, a starting number is randomly selected, and then sample members are selected at regular intervals from the starting number (for instance, every 3rd, 5th or 100th member is selected).
Advantages:
• Simple and convenient to use.
• Creates an even distribution of members to form samples.
• Suitable for collecting data from geographically dispersed areas.
Disadvantages:
• Only possible when the complete list of the population is available.
• It is less random than simple random sampling.
• Low degree of representativeness.
• Has a high risk of data manipulation, as the researcher might set up the system to increase the chance of certain targets being selected.
• If there are periodic patterns within the dataset, the sample will be biased.

Method: Stratified random sampling
Description: A sampling method in which members of the population are divided into two or more subsets, called strata, which share a similar characteristic such as age, gender, ethnicity etc. A sample is then randomly selected from each of the strata. Stratified random sampling is most appropriate when the population is heterogeneous.
Advantages:
• Can ensure that specific groups are represented, even proportionally, in the sample(s) (e.g. by gender), by selecting individuals from the strata lists.
Disadvantages:
• More complex and requires greater effort than simple random sampling; strata must be carefully defined.

Method: Cluster sampling
Description: A sampling method in which the researcher splits the entire population into naturally occurring subgroups with similar characteristics, called clusters, and then randomly selects a cluster in which all members are surveyed. Cluster sampling is most appropriate when the population consists of units rather than individuals.
Advantages:
• Requires fewer resources, so it is relatively cheaper (travelling and administrative costs are minimized).
• More feasible, as it deals with smaller groups of similar traits.
Disadvantages:
• Prone to bias: the clusters can lead to biased results if formed under biased opinion.
• Samples drawn under the cluster method are prone to high sampling error.
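The first three probability sampling methods described above can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 100 numbered members, the two strata "A" and "B" and the function names are all made up for the example, and the stratified version assumes the same sampling fraction is wanted in every stratum.

```python
import random

def simple_random_sample(frame, k, rng):
    """Every member of the frame is equally likely to be chosen."""
    return rng.sample(frame, k)

def systematic_sample(frame, step, rng):
    """Random start among the first `step` members, then every step-th member."""
    start = rng.randrange(step)
    return frame[start::step]

def stratified_sample(strata, fraction, rng):
    """Simple random sample of the same fraction from every stratum."""
    chosen = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return chosen

rng = random.Random(1)                        # fixed seed so the run is repeatable
frame = list(range(1, 101))                   # hypothetical frame of 100 members
strata = {"A": frame[:60], "B": frame[60:]}   # hypothetical strata of sizes 60 and 40

print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))
print(stratified_sample(strata, 0.1, rng))
```

With a 10% fraction the stratified sample always contains 6 members of stratum A and 4 of stratum B, which is the proportional representation mentioned in the advantages.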



B. Nonprobability sampling methods

Method: Purposive
Description: Involves hand-picking subjects on the basis of specific characteristics.
Advantages: Ensures balance of group sizes when multiple groups are to be selected.
Disadvantages: Samples are not easily defensible as being representative of populations, due to the potential subjectivity of the researcher.

Method: Quota
Description: Involves selecting individuals as they come, to fill a quota by characteristics proportional to the population.
Advantages: Ensures selection of adequate numbers of subjects with appropriate characteristics.
Disadvantages: Not possible to prove that the sample is representative of the designated population.

Method: Snowball
Description: Subjects with desired traits or characteristics give names of further appropriate subjects.
Advantages: Possible to include members of groups where no lists or identifiable clusters even exist (e.g. drug abusers, criminals).
Disadvantages: No way of knowing whether the sample is representative of the population.

Method: Volunteer, accidental, convenience
Description: Involves either asking for volunteers, or the consequence of not all those selected finally participating, or a set of subjects who just happen to be available.
Advantages: Inexpensive way of ensuring sufficient numbers for a study.
Disadvantages: Can be highly unrepresentative.



POPULATION VS SAMPLE

Basis for comparison: Meaning
• Population: the collection of all elements possessing common characteristics that make up the universe.
• Sample: a subgroup of the members of the population chosen for participation in the study.

Basis for comparison: Includes
• Population: each and every unit of the group.
• Sample: only a handful of units of the population.

Basis for comparison: Characteristic
• Population: parameter.
• Sample: statistic.

Basis for comparison: Data collection
• Population: complete enumeration or census.
• Sample: sample survey or sampling.

Basis for comparison: Focus on
• Population: identifying the characteristics.
• Sample: making inferences about the population.

Sample Statistics

In drawing conclusions about a population, random samples are taken and the values obtained from them are considered. It is therefore essential to know the sampling distribution of the statistics calculated from these random samples.

Sampling distribution

• A sampling distribution is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population.
• Its primary purpose is to let results from small samples represent a comparatively larger population.
• Since the population is too large to analyse, a smaller group is selected.
• The gathered data, or statistic, is used to calculate the likely occurrence, or probability, of an event.

Types of Sampling distribution

a) Sampling distribution of the mean
• It is the probabilistic spread of all the means of samples of a fixed size selected randomly from a particular population.
• When the distribution is plotted on a graph, it is normal if the population is normal, and approximately normal for large samples otherwise.
• The centre of the graph is the mean of the sampling distribution, which is also the mean of the population.
• This is a commonly used sampling distribution.

b) Sampling distribution of the proportion
• This type of sampling distribution is used to estimate a population proportion.
• Samples are selected and the sample proportion is calculated for each.
• The mean of the sample proportions gathered from all the samples equals the proportion of the population as a whole.

c) t-distribution
• The t-distribution is used when the population variance is not known and the sample size is small.
• As the sample size increases, the t-distribution becomes very close to the normal distribution.

Sampling distribution of the mean

When the parent population is normally distributed, the sampling distribution of the mean is also normal (symmetrical) and has specific properties for its central tendency (mean) and variability (variance):

$E(\bar{X}) = \mu, \qquad \operatorname{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$

Central limit theorem

The central limit theorem helps in constructing a sampling distribution.

The theorem states that the distribution of sample means approaches a normal distribution as the sample size n gets larger ($n \ge 30$ is the usual rule of thumb), regardless of the shape and type of the original population distribution.

OR

The central limit theorem (CLT) states that the distribution of the sample mean approximates a normal distribution (i.e. a "bell curve") as the sample size n becomes larger ($n \ge 30$).

The CLT holds regardless of the variable's distribution in the population (binomial, Poisson, exponential, geometric, uniform, chi-square etc.), provided the population has a finite mean $\mu$ and variance $\sigma^2$. For large n, approximately,

$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$
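A small simulation can make the theorem concrete. The sketch below is an illustration only: the exponential population with mean 1 and variance 1, the sample size of 40, the 5000 repetitions and the seed are all choices made for this example. It draws repeated samples from a clearly non-normal population and compares the mean and variance of the sample means with $\mu$ and $\sigma^2/n$.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Return `trials` sample means, each from n exponential(1) observations."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

rng = random.Random(0)    # fixed seed so the run is repeatable
n = 40                    # sample size (n >= 30)
means = sample_means(n, 5000, rng)

# Exponential(1) has mu = 1 and sigma^2 = 1, so the theorem predicts
# E(X-bar) close to 1 and Var(X-bar) close to 1/n = 0.025.
print(round(statistics.fmean(means), 3))
print(round(statistics.variance(means), 4))
```

A histogram of `means` would look roughly bell shaped even though the exponential population itself is strongly skewed.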



Worked Examples

1. The diameters, x, of 110 steel rods were measured in centimetres and the results were summarised as follows:

$\sum x = 36.5, \qquad \sum x^2 = 12.49$

Find the mean and standard deviation of these measurements.

Assuming these measurements are a sample from a normal distribution with this mean and this variance, find the probability that the mean diameter of a sample of size 110 is greater than 0.345 cm. (O & C)

2. Two red balls and 2 white balls are placed in a bag. Balls are drawn one by one, at random and without replacement. The random variable X is the number of white balls drawn before the first red ball is drawn.

a) Show that $P(X = 1) = \frac{1}{3}$, and find the rest of the probability distribution of X.
b) Find $E(X)$ and show that $\operatorname{Var}(X) = \frac{5}{9}$.
c) The sample mean for 80 independent observations of X is denoted by $\bar{X}$. Using a suitable approximation, find $P(\bar{X} > 0.75)$. (C)

3. The variable X is such that $X \sim N(\mu; 4)$. A random sample of size n is taken from the population. Find the least n such that $P(|\bar{X} - \mu| < 0.5) > 0.95$.

Solutions

1. $\text{mean} = \mu = \dfrac{36.5}{110} = 0.331818$

$\text{standard deviation} = \sigma = \sqrt{\dfrac{12.49}{110} - 0.331818^2} = 0.05867$

By the central limit theorem,

$\bar{X} \sim N\left(0.331818; \dfrac{0.05867^2}{110}\right)$

$P(\bar{X} > 0.345) = P\left(Z > \dfrac{0.345 - 0.331818}{\sqrt{0.05867^2/110}}\right)$
$= P(Z > 2.356)$
$= 1 - \Phi(2.356)$
$= 1 - 0.9908$
$= 0.0092$

2. NB: X is the number of white balls drawn before the first red ball is drawn. There are 2 red balls and 2 white balls.

If a red ball is picked first, no white ball is drawn before the first red ball, so X = 0. This happens for the orders RRWW, RWRW or RWWR:

$P(X = 0) = \frac{2}{4}\times\frac{1}{3}\times 1\times 1 + \frac{2}{4}\times\frac{2}{3}\times\frac{1}{2}\times 1 + \frac{2}{4}\times\frac{2}{3}\times\frac{1}{2}\times 1$
$= \frac{2}{12} + \frac{2}{12} + \frac{2}{12} = \frac{6}{12} = \frac{1}{2}$

a) One white ball before the first red ball results from the orders WRRW or WRWR:

$P(X = 1) = \frac{2}{4}\times\frac{2}{3}\times\frac{1}{2}\times 1 + \frac{2}{4}\times\frac{2}{3}\times\frac{1}{2}\times 1 = \frac{4}{12} = \frac{1}{3}$ (shown)

Two white balls before the first red ball results from the order WWRR:

$P(X = 2) = \frac{2}{4}\times\frac{1}{3}\times 1\times 1 = \frac{1}{6}$

x           0     1     2
P(X = x)    1/2   1/3   1/6

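b) $E(X) = 0\left(\frac{1}{2}\right) + 1\left(\frac{1}{3}\right) + 2\left(\frac{1}{6}\right) = \frac{2}{3}$

$\operatorname{Var}(X) = E(X^2) - [E(X)]^2$
$= 0^2\left(\frac{1}{2}\right) + 1^2\left(\frac{1}{3}\right) + 2^2\left(\frac{1}{6}\right) - \left(\frac{2}{3}\right)^2$
$= 1 - \frac{4}{9}$
$= \frac{5}{9}$

c) By the central limit theorem,

$\bar{X} \sim N\left(\frac{2}{3}; \dfrac{5/9}{80}\right)$

Therefore

$\bar{X} \sim N\left(\frac{2}{3}; \frac{1}{144}\right)$

$P(\bar{X} > 0.75) = P\left(Z > \dfrac{0.75 - \frac{2}{3}}{\sqrt{1/144}}\right)$
$= P(Z > 1)$
$= 1 - \Phi(1)$
$= 1 - 0.841$
$= 0.159$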
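The probability distribution in question 2 can be checked by brute force. The sketch below is an illustration, not part of the original solution: it labels the four balls (the labels R1, R2, W1, W2 are made up for the example), lists all 24 equally likely drawing orders and uses exact fractions.

```python
from fractions import Fraction
from itertools import permutations

# Label the balls so that all 4! drawing orders are equally likely.
balls = ["R1", "R2", "W1", "W2"]
orders = list(permutations(balls))

def whites_before_first_red(order):
    """Count the white balls drawn before the first red ball appears."""
    count = 0
    for ball in order:
        if ball.startswith("R"):
            break
        count += 1
    return count

dist = {x: Fraction(0) for x in range(3)}
for order in orders:
    dist[whites_before_first_red(order)] += Fraction(1, len(orders))

mean = sum(x * p for x, p in dist.items())
var = sum(x * x * p for x, p in dist.items()) - mean ** 2
print(dist, mean, var)
```

The enumeration gives probabilities 1/2, 1/3 and 1/6 for x = 0, 1, 2, with mean 2/3 and variance 5/9.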
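3. $X \sim N(\mu; 4)$, so

$\bar{X} \sim N\left(\mu; \dfrac{4}{n}\right)$

We require $P(|\bar{X} - \mu| < 0.5) > 0.95$.

$P(|\bar{X} - \mu| < 0.5) = P\left(\dfrac{|\bar{X} - \mu|}{\sqrt{4/n}} < \dfrac{0.5}{\sqrt{4/n}}\right) = P\left(|Z| < \dfrac{0.5}{\sqrt{4/n}}\right)$

Therefore

$P\left(|Z| < \dfrac{0.5}{\sqrt{4/n}}\right) > 0.95$

$\dfrac{0.5}{\sqrt{4/n}} > \Phi^{-1}(0.975) = 1.960$

$\left(\dfrac{0.5}{1.960}\right)^2 > \dfrac{4}{n}$

$0.06508 > \dfrac{4}{n}$

$n > \dfrac{4}{0.06508} = 61.47$

The least value is $n = 62$.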
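The least sample size in question 3 can also be found by direct search. The sketch below is a check under the same model, $X \sim N(\mu; 4)$; the helper name `coverage` is made up for the example, and the search simply increases n until the probability first exceeds 0.95.

```python
from statistics import NormalDist

def coverage(n, sigma=2.0, half_width=0.5):
    """P(|X-bar - mu| < half_width) when X ~ N(mu, sigma^2)."""
    z = half_width / (sigma / n ** 0.5)
    return 2 * NormalDist().cdf(z) - 1

n = 1
while coverage(n) <= 0.95:
    n += 1
print(n)
```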
Worked example
The weight of an empty jar is Normally distributed with mean 250 grams and standard deviation 2 grams. The weight of jam delivered in a jar is Normally distributed with mean 200 grams and standard deviation 5 grams.
a) Find the probability that the mean weight of 4 jam-filled jars is greater than 454 grams.
b) Determine the least number of jam-filled jars that have to be sampled so that there is at most a 0.5% chance that their mean weight is greater than 454 grams.

Solution
(a) Let E be the weight of an empty jar and J be the weight of jam, taken to be independent.
$E \sim N(250; 2^2), \qquad J \sim N(200; 5^2)$
Let Y be the weight of a filled jar, $Y = E + J$.
$E(Y) = E(E + J) = 250 + 200 = 450$
$\operatorname{Var}(Y) = \operatorname{Var}(E + J) = 4 + 25 = 29$
$Y \sim N(450; 29)$
The question asks for a probability about a sample mean:
$\bar{Y} \sim N\left(450; \dfrac{29}{4}\right)$

$P(\bar{Y} > 454) = P\left(Z > \dfrac{454 - 450}{\sqrt{29/4}}\right)$
$= P(Z > 1.486)$
$= 1 - \Phi(1.486)$
$= 1 - 0.9314$
$= 0.0686$

(b) $\bar{Y} \sim N\left(450; \dfrac{29}{n}\right)$

$P(\bar{Y} > 454) \le 0.005$

$1 - \Phi\left(\dfrac{454 - 450}{\sqrt{29/n}}\right) \le 0.005$

$\Phi\left(\dfrac{454 - 450}{\sqrt{29/n}}\right) \ge 0.995$

$\dfrac{454 - 450}{\sqrt{29/n}} \ge \Phi^{-1}(0.995) = 2.576$

$\dfrac{4}{2.576} \ge \sqrt{\dfrac{29}{n}}$

$2.4112 \ge \dfrac{29}{n}$

$n \ge \dfrac{29}{2.4112} = 12.03$

The least number of jars is $n = 13$.
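Both parts of the jar example can be checked numerically. The sketch below assumes, as the solution does when it adds the variances, that the jar weight and the jam weight are independent; the variable names are made up for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

var_filled = 2 ** 2 + 5 ** 2    # Var(E + J), assuming independence
mean_filled = 250 + 200         # E(E + J)

# (a) P(Y-bar > 454) for the mean of 4 filled jars
p_a = 1 - NormalDist(mean_filled, sqrt(var_filled / 4)).cdf(454)

# (b) smallest n with P(Y-bar > 454) <= 0.005,
#     i.e. 4 / sqrt(29 / n) >= z, so n >= 29 * (z / 4)^2
z = NormalDist().inv_cdf(0.995)
n_min = ceil(var_filled * (z / (454 - mean_filled)) ** 2)

print(round(p_a, 4), n_min)
```

The probability agrees with the table value 0.0686 to about three decimal places; the small difference comes from rounding z to 1.486 in the hand calculation.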

Distribution of sample proportion

A proportion is the percentage, fraction or ratio of a sample or population that has a given characteristic.

The population proportion is denoted by p and the sample proportion is denoted by $p_s$ or $\hat{p}$ (p is the probability of success and $q = 1 - p$ is the probability of failure).

THE CENTRAL LIMIT THEOREM FOR SAMPLE PROPORTIONS

Suppose all samples of size n are taken from a population with proportion p. If X is the number of successes in a sample, then $X \sim B(n; p)$ and $p_s = \frac{X}{n}$, so

$E(p_s) = \dfrac{E(X)}{n} = \dfrac{np}{n} = p$

$\operatorname{Var}(p_s) = \dfrac{\operatorname{Var}(X)}{n^2} = \dfrac{npq}{n^2} = \dfrac{pq}{n}$

By the central limit theorem, when n is large ($np \ge 5$ and $nq \ge 5$) the distribution of $p_s$ is approximately normal:

$p_s \sim N\left(p; \dfrac{pq}{n}\right)$

Probabilities for the distribution of the sample proportion can be found by using either

• the normal approximation to the binomial distribution of the number of successes, or
• the normal distribution of the sample proportion given by the central limit theorem.

Continuity correction

METHOD                                           CONTINUITY CORRECTION
Normal approximation to discrete distribution    $\pm 0.5$
Normal approximation to sample proportion        $\pm \dfrac{1}{2n}$

Worked Example

A recent study asked working adults if they worked most of their time remotely. The study found that 30% of employees spend the majority of their time working remotely. Suppose a sample of 150 working adults is taken.

a. What is the distribution of the sample proportion? Explain.
b. What are the mean and standard deviation of the sample proportion?
c. What is the probability that at most 27% of the workers in the sample work remotely most of the time?

Solution:

a. $n = 150$; $p = 0.3$ and $q = 0.7$

Checking np and nq:

$np = 150 \times 0.3 = 45 \ge 5$

$nq = 150 \times (1 - 0.3) = 105 \ge 5$

The distribution of the sample proportion is therefore approximately normal.

b. The mean of the distribution of the sample proportion is $E(p_s) = 0.3$.

The standard deviation of the sample proportion is

$\sigma_{p_s} = \sqrt{\dfrac{0.7(0.3)}{150}} = 0.037416$

c. Central limit method

$p_s \sim N\left(0.3; \dfrac{0.7(0.3)}{150}\right)$ becomes $p_s \sim N(0.3; 0.0374^2)$

$P(p_s \le 0.27)$ by continuity correction becomes

$P\left(p_s < 0.27 + \dfrac{1}{2(150)}\right) = P(p_s < 0.27333)$

$P(p_s < 0.27333) = P\left(Z < \dfrac{0.27333 - 0.3}{0.0374}\right) = P(Z < -0.713)$
$= 1 - \Phi(0.713)$
$= 0.2379$

Or

Normal approximation

This method applies since $np = 150 \times 0.3 = 45 \ge 5$ and $nq = 150 \times 0.7 = 105 \ge 5$.

Let X be the number of workers in the sample who work remotely, so that $X \sim B(150; 0.3)$ with $np = 45$ and $npq = 31.5$. Approximately,

$X \sim N(45; 31.5)$

27% of 150 = 40.5

$P(X \le 40.5)$ by continuity correction becomes $P(X < 41)$

$P(X < 41) = P\left(Z < \dfrac{41 - 45}{\sqrt{31.5}}\right) = P(Z < -0.713) = 0.2379$

Worked example
70% of the tomato plants of a particular variety produce more than 10 tomatoes per plant. Find the probability that a random sample of 50 plants of this variety contains more than 37 plants which produce more than ten tomatoes per plant.

Solution

Let X be the number of plants in the sample that produce more than 10 tomatoes.

$X \sim B(50; 0.7)$
$50(0.7) > 5$ and $50(0.3) > 5$, so the normal approximation applies.
$X \sim N(35; 10.5)$
$P(X > 37)$ by continuity correction becomes $P(X > 37.5)$
$P(X > 37.5) = P\left(Z > \dfrac{37.5 - 35}{\sqrt{10.5}}\right)$
$= P(Z > 0.772)$
$= 1 - \Phi(0.772)$
$= 0.22$

OR

$p_s \sim N\left(0.7; \dfrac{0.7(0.3)}{50}\right)$ becomes $p_s \sim N(0.7; 0.0042)$

$P\left(p_s > \frac{37}{50}\right)$ by continuity correction becomes $P\left(p_s > \frac{37}{50} + \frac{1}{2(50)}\right) = P(p_s > 0.75)$
$P(p_s > 0.75) = P\left(Z > \dfrac{0.75 - 0.7}{\sqrt{0.0042}}\right)$
$= P(Z > 0.772)$
$= 1 - \Phi(0.772)$
$= 0.22$
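The two routes used for the tomato example (counts with a correction of 0.5, proportions with a correction of 1/(2n)) are the same calculation on different scales. The sketch below is an illustration with made-up variable names; it evaluates both and shows that they agree.

```python
from math import sqrt
from statistics import NormalDist

n, p = 50, 0.7
q = 1 - p

# Count scale: X ~ N(np, npq), and P(X > 37) becomes P(X > 37.5)
p_count = 1 - NormalDist(n * p, sqrt(n * p * q)).cdf(37.5)

# Proportion scale: p_s ~ N(p, pq/n), and P(p_s > 37/50) becomes
# P(p_s > 37/50 + 1/(2n))
p_prop = 1 - NormalDist(p, sqrt(p * q / n)).cdf(37 / n + 1 / (2 * n))

print(round(p_count, 3), round(p_prop, 3))
```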



ESTIMATION
Estimation is the process of making inferences about a population based on information obtained from a sample.
Types of Estimation
There are two types of estimation:
• Point estimation
• Interval estimation

Point estimation
Point estimation involves the use of sample data (a statistic) to calculate a single value which serves as a "best guess" or "best estimate" of an unknown population parameter, e.g. the population mean.

OR

A point estimator is a statistic used to estimate the value of an unknown parameter of a population. It uses sample data to calculate a single value that is the best estimate of the unknown population parameter.

NB: this process is used when the population parameters are not known.

Properties of a point estimator

There are three desirable properties every good estimator should possess. These are:

1. Unbiasedness
2. Consistency
3. Efficiency

1. Unbiased estimates of population parameters

• A point estimator is called unbiased if its expected value is equal to the population parameter, e.g. since $E(\bar{X}) = \mu$, the sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$.
o An estimate from an unbiased estimator is called an unbiased estimate.
o This means that the mean of the unbiased estimates gets closer to the population parameter as more samples are taken.

2. Consistency
A point estimator should be consistent: the larger the sample size, the closer the estimate tends to be to the population parameter.

3. Efficiency
The most efficient point estimator is the one with the smallest variance of all the unbiased and consistent estimators. If two competing estimators are both unbiased and consistent, the one with the smaller variance (for a given sample size) is said to be relatively more efficient.

Calculation of unbiased estimates of the population mean and variance

Remember that

• The sample mean is denoted $\bar{x}$ whilst the population mean is denoted $\mu$.
• The sample variance is denoted $s^2$ whilst the population variance is denoted $\sigma^2$.

a. The unbiased estimate of p, the population proportion of successes, is $\hat{p}$, which reads "p hat":

$\hat{p} = p_s$, where $p_s$ is the proportion of successes in the sample.

b. The unbiased estimate of the population mean ($\mu$) is $\hat{\mu}$, which reads "mu hat":

$\hat{\mu} = \bar{x} = \dfrac{\sum x}{n}$

c. The unbiased estimate of the population variance ($\sigma^2$) is $\hat{\sigma}^2$, which reads "sigma hat squared":

$\hat{\sigma}^2 = \dfrac{n}{n-1}\, s^2$, where $s^2 = \dfrac{\sum x^2}{n} - \bar{x}^2$ is the sample variance

OR

$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$

OR

$\hat{\sigma}^2 = \dfrac{\sum (x - \bar{x})^2}{n-1}$
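The summary-total form of these formulae translates directly into code. The function name and the four data values below are made up for illustration; the function simply applies $\hat{\mu} = \sum x / n$ and $\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - (\sum x)^2/n\right)$.

```python
def unbiased_estimates(n, sum_x, sum_x2):
    """Unbiased estimates of the population mean and variance
    from the summary totals n, sum(x) and sum(x^2)."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance

# Hypothetical data x = 2, 4, 4, 6: sum(x) = 16 and sum(x^2) = 72
mean, variance = unbiased_estimates(4, 16, 72)
print(mean, variance)
```

For these four values the squared deviations from the mean add up to 8, so dividing by n - 1 = 3 gives the same variance as the deviation form of the formula.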

Worked example
The times, T minutes, spent on daily revision by a random sample of 50 A Level students from Wedza are summarised as follows.

$\sum t = 6174; \qquad \sum t^2 = 831\,581$

Calculate unbiased estimates of the population mean and variance of the times spent on daily revision by A Level students in Wedza.

Solution

$\hat{\mu} = \bar{t} = \dfrac{\sum t}{n} = \dfrac{6174}{50}$

$\hat{\mu} = 123.48$ minutes

$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum t^2 - \dfrac{(\sum t)^2}{n}\right)$
$\hat{\sigma}^2 = \dfrac{1}{49}\left(831\,581 - \dfrac{6174^2}{50}\right)$
$\hat{\sigma}^2 = 1412.56$ minutes²
Worked example
Potato pockets are filled by a machine. A random sample of 10 pockets from the production line had the following masses in kg.
20.12 20.50 20.91 20.23 20.46
20.64 21.01 20.19 20.37 20.73
Calculate the unbiased estimates of the
(a) Mean [2]
(b) Variance [2]
(Zimsec)
Solution
Let X be the mass, in kg, of a pocket filled by the machine.

$\sum x = 205.16, \qquad \sum x^2 = 4209.8926, \qquad n = 10$
(a) Unbiased estimate of the mean:
$\hat{\mu} = \bar{x} = \dfrac{\sum x}{n} = \dfrac{205.16}{10} = 20.516$
(b) Unbiased estimate of the variance:
$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$
$= \dfrac{1}{9}\left(4209.8926 - \dfrac{205.16^2}{10}\right)$
$= 0.092226666$
$= 0.092$



Interval estimation
Interval estimation is the use of sample data (a statistic) to calculate an interval of possible (or probable) values of an unknown population parameter, in contrast to point estimation, which gives a single number.
Types of interval estimation
There are several types of interval estimation, but the confidence interval is the most common type and is the one discussed in this course.

Confidence Interval
A confidence interval is
• an interval estimate that may contain a population parameter
• a range of values in which the statistician is fairly sure the true value of the population parameter lies
• a range of values [a ; b], bounded below by a and above by b and centred on the sample statistic, that is likely to contain an unknown population parameter.

Statisticians use confidence intervals to measure the uncertainty in a sample estimate.

E.g. a researcher may select different samples randomly from the same population and compute a confidence interval from each sample to see how well it represents the true value of the population parameter. The intervals will differ from sample to sample: some will include the true population parameter and others will not.
Every confidence interval has a confidence level.
The confidence level of a confidence interval is the proportion of such intervals, over repeated sampling, that contain the population parameter.
Confidence levels of 90%, 95% and 99% are commonly used.

Types of confidence intervals

There are two types of confidence intervals that can be used:
a. z-intervals (confidence intervals based on the Z distribution, i.e. the standard normal distribution)
b. t-intervals



a. z-intervals or confidence intervals of the Z distribution
Suppose the researcher does not know the mean $\mu$ of a particular population and is 95% confident that it lies in the interval [a ; b]. Then
$P(a < \mu < b) = 95\%$
OR
$P(a < \mu < b) = 0.95$
where $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.

[Sketch: normal curve of $\bar{X}$ centred at $\mu$, with the central 95% lying between a and b]

When standardized, the distribution becomes $Z \sim N(0; 1)$.

[Sketch: standard normal curve centred at 0, with the central 95% lying between −z and z]

To obtain the critical z values, consider the area from $-\infty$ to z. The central area is 0.95, which leaves 0.025 in each tail, so
$P(Z < z) = 0.975$
$\Phi(z) = 0.975$
$z = \Phi^{-1}(0.975)$
$z = 1.96$

Therefore the critical z values are −1.96 and 1.96, since the two points are symmetrical about 0.

∴ At the 95% confidence level, $P(-1.96 < Z < 1.96) = 0.95$, where $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.

In short, the critical values z for a confidence level $\alpha$ (written as a decimal) are
$z = \pm\Phi^{-1}\left(\dfrac{1 + \alpha}{2}\right)$

For a 95% confidence level,
$z = \pm\Phi^{-1}\left(\dfrac{1 + 0.95}{2}\right) = \pm\Phi^{-1}(0.975) = \pm 1.960$

$P(-1.96 < Z < 1.96) = P\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right)$

$= P\left(-1.96\dfrac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\dfrac{\sigma}{\sqrt{n}}\right)$   (multiply throughout by $\sigma/\sqrt{n}$)

$= P\left(1.96\dfrac{\sigma}{\sqrt{n}} > -\bar{X} + \mu > -1.96\dfrac{\sigma}{\sqrt{n}}\right)$   (multiply throughout by −1, which reverses the inequalities)

$= P\left(\bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}} > \mu > \bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}}\right)$   (add $\bar{X}$ throughout)

$= P\left(\bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$

$\bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}}$ is the z-interval formula at 95%.

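The general formulae for the confidence interval (C.I.) of a Z distribution

i. If the population is normal with known variance and the sample is of any size:

$\bar{X} - z\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z\dfrac{\sigma}{\sqrt{n}}$ or $\left(\bar{X} - z\dfrac{\sigma}{\sqrt{n}} ; \bar{X} + z\dfrac{\sigma}{\sqrt{n}}\right)$

ii. If the population is non-normal with known variance and the sample size is large ($n \ge 30$):

$\bar{X} - z\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z\dfrac{\sigma}{\sqrt{n}}$ or $\left(\bar{X} - z\dfrac{\sigma}{\sqrt{n}} ; \bar{X} + z\dfrac{\sigma}{\sqrt{n}}\right)$

iii. If the population is normal or non-normal with unknown variance and the sample size is large ($n \ge 30$):

$\bar{X} - z\dfrac{\hat{\sigma}}{\sqrt{n}} < \mu < \bar{X} + z\dfrac{\hat{\sigma}}{\sqrt{n}}$ or $\left(\bar{X} - z\dfrac{\hat{\sigma}}{\sqrt{n}} ; \bar{X} + z\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$

where $\hat{\sigma}^2$ is the unbiased estimate of the population variance.

iv. For a population proportion p estimated from a large sample:

$p_s - z\sqrt{\dfrac{p_s q_s}{n}} < p < p_s + z\sqrt{\dfrac{p_s q_s}{n}}$ or $\left(p_s - z\sqrt{\dfrac{p_s q_s}{n}} ; p_s + z\sqrt{\dfrac{p_s q_s}{n}}\right)$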
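All four cases have the same shape, "centre ± z × standard error", so one small helper covers them. The sketch below is an illustration only: the function name `z_interval` is made up, and the two sets of numbers are taken from worked examples elsewhere in these notes (pencils with $\bar{x} = 178$, $\sigma = 3$, $n = 16$, and the television survey with $p_s = 0.36$, $n = 1000$).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(centre, std_error, level):
    """Two-sided z-interval: centre -/+ z * std_error,
    where z = inverse-Phi((1 + level) / 2)."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return centre - z * std_error, centre + z * std_error

# Mean with known sigma: x-bar = 178, sigma = 3, n = 16, 99% level
low, high = z_interval(178, 3 / sqrt(16), 0.99)

# Proportion from a large sample: p_s = 0.36, n = 1000, 95% level
p_low, p_high = z_interval(0.36, sqrt(0.36 * 0.64 / 1000), 0.95)

print(round(low, 3), round(high, 3), round(p_low, 4), round(p_high, 4))
```

Only the standard error changes between cases: $\sigma/\sqrt{n}$, $\hat{\sigma}/\sqrt{n}$ or $\sqrt{p_s q_s / n}$.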

The width of a z-interval

The width of a confidence interval (a ; b) is $b - a = 2 \times z\dfrac{\sigma}{\sqrt{n}}$.

The width of a confidence interval increases as the confidence level increases.

Margin of error
The margin of error is the largest likely difference between the sample estimate and the population parameter at the stated confidence level.
The margin of error is equal to half the width of the entire confidence interval.

$\text{Margin of error} = z\dfrac{\sigma}{\sqrt{n}}$ or $z\dfrac{\hat{\sigma}}{\sqrt{n}}$ or $z\sqrt{\dfrac{pq}{n}}$

Standard error of the mean

• The standard error of the mean, or simply the standard error, indicates how different the population mean is likely to be from a sample mean.
• At the 95% confidence level, 95% of all sample means are expected to lie within ±1.960 standard errors of the population mean.
• The bigger the sample size, the smaller the standard error of the mean.

$\text{Standard error} = \dfrac{\sigma}{\sqrt{n}}$ or $\dfrac{\hat{\sigma}}{\sqrt{n}}$

Worked example
The masses, m grams, of a random sample of 80 strawberries of a certain type were measured and summarised as follows.
$n = 80, \qquad \sum m = 4200, \qquad \sum m^2 = 229\,000$
(i) Find unbiased estimates of the population mean and variance. [3]
(ii) Calculate a 98% confidence interval for the population mean. [3]
50 random samples of size 80 were taken and a 98% confidence interval for the population mean, $\mu$, was found from each sample.
(iii) Find the number of these 50 confidence intervals that would be expected to include the true value of $\mu$. [1]
(Camb)
Solution
(i) $\hat{\mu} = \bar{m} = \dfrac{\sum m}{n} = \dfrac{4200}{80} = 52.5$

$\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum m^2 - \dfrac{(\sum m)^2}{n}\right)$
$= \dfrac{1}{79}\left(229\,000 - \dfrac{4200^2}{80}\right)$
$= 107.5949367$
$= 108$ (3 s.f.)

(ii) The C.I. is a z-interval: the population distribution is not known to be normal, the variance is unknown and the sample size is large.

The C.I. is $\left(\bar{X} - z\dfrac{\hat{\sigma}}{\sqrt{n}} ; \bar{X} + z\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$.

Critical values for a 98% C.I.:
$z = \pm\Phi^{-1}\left(\dfrac{1 + 0.98}{2}\right) = \pm\Phi^{-1}(0.99) = \pm 2.326$

$\text{C.I.} = \left(52.5 - 2.326\sqrt{\dfrac{107.5949367}{80}} ; 52.5 + 2.326\sqrt{\dfrac{107.5949367}{80}}\right)$
$= (49.8 ; 55.2)$

(iii) $98\% \times 50 = 49$ intervals

Worked example
The masses of heavyweight boxers have mean $\mu$ and standard deviation $\sigma$. A random sample of 49 heavyweight boxers is taken and a 95% confidence interval is constructed for $\mu$.
Given that the 95% confidence interval is [94.5 ; 105.3], find
(i) the sample mean $\bar{x}$ and the standard deviation $\sigma$,
(ii) a 99% confidence interval for $\mu$. [9]
(Zimsec N2020 P2)

Solution
A z-interval is used since the variance is taken as known and the sample size is large.
(i) $\text{C.I.} = \left[\bar{x} - z\dfrac{\sigma}{\sqrt{n}} ; \bar{x} + z\dfrac{\sigma}{\sqrt{n}}\right] = [94.5 ; 105.3]$

Critical values: $z = \pm\Phi^{-1}\left(\dfrac{1 + 0.95}{2}\right) = \pm\Phi^{-1}(0.975) = \pm 1.960$

$\left[\bar{x} - 1.960\dfrac{\sigma}{\sqrt{49}} ; \bar{x} + 1.960\dfrac{\sigma}{\sqrt{49}}\right] = [94.5 ; 105.3]$

$\bar{x} - 1.960\dfrac{\sigma}{7} = 94.5$   (equation 1)
$\bar{x} + 1.960\dfrac{\sigma}{7} = 105.3$   (equation 2)

Multiplying both equations by 7:
$7\bar{x} - 1.960\sigma = 661.5$   (equation 1)
$7\bar{x} + 1.960\sigma = 737.1$   (equation 2)

Equation 1 − equation 2:
$-3.92\sigma = -75.6$
$\sigma = 19.28571428$

Substituting into equation 1:
$7\bar{x} - 1.960(19.28571428) = 661.5$
$\bar{x} = 99.9$

(ii) $\text{C.I.} = 99.9 \pm z \times \dfrac{19.28571428}{\sqrt{49}}$

Critical values for a 99% C.I.:
$z = \pm\Phi^{-1}\left(\dfrac{1 + 0.99}{2}\right) = \pm\Phi^{-1}(0.995) = \pm 2.576$

$\text{C.I.} = 99.9 \pm 2.576 \times \dfrac{19.28571428}{\sqrt{49}}$
$= (92.8 ; 107.0)$
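Working backwards from an interval, as in the boxers example, uses two facts: the sample mean is the midpoint of the interval, and half the width equals $z\sigma/\sqrt{n}$. The sketch below is an illustrative check with made-up variable names; it uses unrounded critical values, so the last digits can differ slightly from the table-based working.

```python
from math import sqrt
from statistics import NormalDist

low, high, n = 94.5, 105.3, 49

z95 = NormalDist().inv_cdf(0.975)
x_bar = (low + high) / 2                     # midpoint of the interval
sigma = (high - low) / 2 * sqrt(n) / z95     # half-width = z * sigma / sqrt(n)

# Rebuild the interval at the 99% level from the recovered values
z99 = NormalDist().inv_cdf(0.995)
margin99 = z99 * sigma / sqrt(n)

print(round(x_bar, 1), round(sigma, 2),
      round(x_bar - margin99, 1), round(x_bar + margin99, 1))
```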
Worked example
The results of a survey showed that 360 out of 1000 families view a certain television show.
Calculate the 95% confidence interval for the population proportion of families viewing the show. [5]
(Zimsec N2021 P1)
Solution
$\text{Confidence interval} = p_s \pm z\sqrt{\dfrac{p_s q_s}{n}}$

Critical values for a 95% confidence level:
$z = \pm\Phi^{-1}\left(\dfrac{1 + 0.95}{2}\right) = \pm\Phi^{-1}(0.975) = \pm 1.96$

$p_s = \dfrac{360}{1000} = 0.36, \qquad q_s = 0.64$

$\text{C.I.} = 0.36 \pm 1.96\sqrt{\dfrac{0.36 \times 0.64}{1000}}$
$= (0.3302492 ; 0.3897507)$
$= (0.33 ; 0.39)$ to 2 s.f.
Worked example
Pencils produced on a certain machine have lengths, in millimetres, which are normally distributed with mean $\mu$ and standard deviation 3. A random sample of 16 pencils was taken and the length, x millimetres, was measured for each pencil, giving
$\sum x = 2848$

a) State why $\bar{X}$, the mean length, in millimetres, of a random sample of 16 pencils produced on the machine, is normally distributed. [1]
b) Construct a 99% confidence interval for $\mu$. [5]

Solution
a) The length X follows a normal distribution, so the sample mean $\bar{X}$ is also normally distributed, whatever the sample size.
b) $\bar{X} = \dfrac{\sum x}{n} = \dfrac{2848}{16} = 178$

Critical values for a 99% C.I.:
$z = \pm\Phi^{-1}\left(\dfrac{1 + 0.99}{2}\right) = \pm\Phi^{-1}(0.995) = \pm 2.576$

$\text{C.I.} = 178 \pm 2.576 \times \dfrac{3}{\sqrt{16}}$
$= (176.068 ; 179.932)$
Worked example
A machine produces balls which are normally distributed with a mean of 𝜇 𝑐𝑚 and a standard
deviation of 0.24 𝑐𝑚.
The diameter, d cm, of each ball in a random sample of 144 balls was measured. This gave:
∑ 𝑑 = 3585.6
(i) Calculate the unbiased estimate of 𝜇 [1]
(ii) Calculate the standard error of your unbiased estimate. [2]
(iii) Construct a 95% confidence interval of 𝜇. [4]
(iv) Hence state, with a reason, whether you agree with the claim that 𝜇 = 25 [2]
Solution

(i) $\hat{\mu} = \bar{d} = \frac{3585.6}{144} = 24.9$
(ii) Standard error of $\hat{\mu} = \frac{\sigma}{\sqrt{n}} = \frac{0.24}{\sqrt{144}} = 0.02$
(iii) Critical values at the 95% confidence level: $Z = \pm\Phi^{-1}\left[\frac{1 + 0.95}{2}\right] = \pm\Phi^{-1}(0.975) = \pm 1.960$
$C.I = 24.9 \pm 1.960(0.02) = (24.8608 \; ; \; 24.9392)$
(iv) I disagree with the claim, since the 95% confidence interval excludes 25.

Worked example

The contents of each of a random sample of 100 cans of soft drink are measured. The results
have a mean of 331.28 𝑚𝑙 and standard deviation of 2.97 𝑚𝑙.

a) Calculate the unbiased estimate of the population variance. [2]


b) Construct a 99% confidence interval for the population mean [4]
c) Explain why, in answering part (b), an assumption regarding the distribution of the
contents of cans was not necessary. [2]
Solution
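Let X be the contents, in ml, of a can of soft drink.
a) $\hat{\sigma}^2 = \frac{n}{n-1}s^2 = \frac{100}{100 - 1} \times 2.97^2 = 8.91$
b) $C.I = \bar{x} \pm z\frac{\hat{\sigma}}{\sqrt{n}}$
Critical values for a 99% C.I: $z = \pm\Phi^{-1}\left[\frac{1 + 0.99}{2}\right] = \pm\Phi^{-1}(0.995) = \pm 2.576$
$C.I = 331.28 \pm 2.576\sqrt{\frac{8.91}{100}} = (330.51 \; ; \; 332.05)$
c) The sample size is large ($n = 100$). By the Central Limit Theorem the sample mean is
approximately normally distributed, whatever the distribution of the contents.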
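The conversion from a sample standard deviation to the unbiased variance estimate, followed by the large-sample interval, can be sketched as below. The function names are illustrative assumptions, not notation from the notes.

```python
from math import sqrt
from statistics import NormalDist


def unbiased_variance_from_sd(s, n):
    """Unbiased estimate n / (n - 1) * s^2 from the sample sd s (divisor n)."""
    return n / (n - 1) * s ** 2


def large_sample_interval(mean, var_hat, n, level):
    """z-interval for the mean using the estimated variance (large n)."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    margin = z * sqrt(var_hat / n)
    return mean - margin, mean + margin


var_hat = unbiased_variance_from_sd(2.97, 100)
low, high = large_sample_interval(331.28, var_hat, 100, level=0.99)
print(round(var_hat, 2), round(low, 2), round(high, 2))
```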

Worked example
The lifetimes of light bulbs of a certain type have standard deviation 25.3 hours. Each bulb in
a randomly chosen box of 12 was tested to failure and the mean lifetime was found to be 1785.7
hours.
a) State two assumptions which are required so that a symmetric 90% confidence interval
for population mean lifetime of the bulbs can be calculated.
b) Calculate a symmetric 90% confidence interval, given the validity of the assumptions.
The values of the end-points should be given to nearest integer. Camb

Solution
a) The lifetimes are normally distributed; the bulbs in the box form a random sample of all bulbs of this type.
b) Critical values: $z = \pm\Phi^{-1}\left[\frac{1 + 0.9}{2}\right] = \pm\Phi^{-1}(0.95) = \pm 1.645$
$C.I = 1785.7 \pm 1.645 \times \frac{25.3}{\sqrt{12}} = (1774 \text{ hours} \; ; \; 1798 \text{ hours})$

Worked example
A group of 65 students is asked to guess the length of a particular object and their answers are
recorded as x cm, with the following results.
∑ x = 6019.0 ∑ x 2 = 557 733.8
a) Show that the estimated standard error of the sample mean is 0.3cm
b) Determine an approximate symmetric 95% confidence interval for the mean of the
population of all such guesses, giving your limits correct to two 1 decimal places.
c) State one assumption which you have made in your calculations. NEAB

Solution
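a) $\hat{\sigma}^2 = \frac{1}{64}\left(557\,733.8 - \frac{6019.0^2}{65}\right) = \frac{374.4}{64} = \frac{117}{20} = 5.85$
$S.E = \frac{\hat{\sigma}}{\sqrt{n}} = \sqrt{\frac{\hat{\sigma}^2}{n}} = \sqrt{\frac{5.85}{65}} = \sqrt{\frac{9}{100}} = \frac{3}{10} = 0.3$ (shown)
(b) $\bar{x} = \frac{6019}{65} = 92.6$
Critical values at the 95% confidence level: $Z = \pm\Phi^{-1}\left[\frac{1 + 0.95}{2}\right] = \pm\Phi^{-1}(0.975) = \pm 1.960$
$C.I = 92.6 \pm 1.960(0.3)$
$= (92.012 \; ; \; 93.188)$
$= (92.01 \; ; \; 93.19)$
(c) The guesses are assumed to form a random sample of independent observations. Since the sample is large, by the Central Limit Theorem the sample mean is approximately
normally distributed.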
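When only the totals ∑x and ∑x² are given, the unbiased variance estimate and the standard error follow directly from them. A minimal Python sketch (the helper name is an illustrative assumption):

```python
from math import sqrt
from statistics import NormalDist


def unbiased_variance_from_sums(sum_x, sum_x2, n):
    """Unbiased estimate (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


n = 65
var_hat = unbiased_variance_from_sums(6019.0, 557733.8, n)
std_error = sqrt(var_hat / n)          # estimated standard error of the mean
x_bar = 6019.0 / n
z = NormalDist().inv_cdf(0.975)        # 95% two-sided critical value
low, high = x_bar - z * std_error, x_bar + z * std_error
print(round(var_hat, 2), round(std_error, 2))
print(round(low, 2), round(high, 2))
```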

Worked example
Shoe shop staff routinely measure the length of their customers' feet. Measurements of the
length of one foot (without shoes) from each of 180 adult male customers yielded a mean
length of 29.2 cm and a standard deviation of 1.47 cm.
(a) Calculate a 95% confidence interval for the mean length of male feet.
(b) Why was it not necessary to assume that the lengths of feet are normally distributed in
order to calculate the confidence interval in part (a)?
(c) What assumption was it necessary to make in order to calculate the confidence interval in
part (a)?
(d) Given that the lengths of male feet may be modelled by a normal distribution, and
making any other necessary assumptions, calculate an interval within which 90% of the
lengths of male feet will lie.
(e) In the light of your calculations in parts (a) discuss, briefly, the question 'is a foot long?"
(One foot is 30.5 cm.) [AEB]

Solution
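(a) The population variance is unknown and the sample size is large.
$\bar{X} = 29.2$
$\hat{\sigma}^2 = \frac{n}{n-1}s^2 = \frac{180}{179} \times 1.47^2 = 2.172972067$
Critical values at the 95% confidence level: $Z = \pm\Phi^{-1}\left[\frac{1 + 0.95}{2}\right] = \pm\Phi^{-1}(0.975) = \pm 1.960$
$C.I = 29.2 \pm 1.960 \times \sqrt{\frac{2.172972067}{180}}$
$= (28.98 \text{ cm} \; ; \; 29.42 \text{ cm})$

(b) The sample is large ($n = 180$), so by the Central Limit Theorem the sample mean is approximately normally distributed whatever the distribution of foot lengths.

(c) The 180 customers measured are assumed to form a random sample of all adult males.
(d) If $X \sim N(29.2 \; ; \; 1.47^2)$:
Critical values: $z = \pm\Phi^{-1}\left[\frac{1 + 0.9}{2}\right] = \pm\Phi^{-1}(0.95) = \pm 1.645$
$\text{interval} = 29.2 \pm 1.645 \times 1.47$
$= (26.78 \text{ cm} \; ; \; 31.62 \text{ cm})$

(e) 30.5 cm lies outside the 95% confidence interval for the mean, so the mean length of an adult male foot is less than one foot. Individual feet can still be a foot long, since 30.5 cm lies inside the interval found in (d).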

Worked example
The probability of success in each of a long series of 𝑛 independent trials is constant and
equal to 𝑝. Explain how an approximate 95% confidence interval for p may be obtained. In
an opinion poll carried out before a local election, 501 people out of a random sample of 925
voters declare that they will vote for a particular one of the two candidates contesting the
election. Find approximate 95% confidence limits for the proportion of all voters in favour of
this candidate. (AEB)
Solution
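For a large number of trials $n$, the sample proportion $p_s$ is approximately normally distributed with mean $p$ and variance $\frac{pq}{n}$, so an approximate 95% confidence interval for $p$ is $p_s \pm 1.960\sqrt{\frac{p_s q_s}{n}}$.
Critical values at the 95% confidence level: $Z = \pm\Phi^{-1}\left[\frac{1 + 0.95}{2}\right] = \pm\Phi^{-1}(0.975) = \pm 1.960$
$C.I = \frac{501}{925} \pm 1.960\sqrt{\frac{\left(\frac{501}{925}\right)\left(\frac{424}{925}\right)}{925}}$
$= (0.510 \; ; \; 0.574)$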

Worked example
The results of a survey showed that 3600 out of 10 000 families regularly purchased a
specific weekly magazine.
(a) Find approximate 95% confidence limits for the proportion of families buying the
magazine.
(b) Estimate the additional number of families to be contacted if the probability that the
estimated proportion is in error by more than 0.01 is to be at most 1%. (AEB)
Solution
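(a) Critical values at the 95% confidence level: $Z = \pm\Phi^{-1}\left[\frac{1 + 0.95}{2}\right] = \pm\Phi^{-1}(0.975) = \pm 1.960$
$C.I = \frac{3600}{10\,000} \pm 1.960\sqrt{\frac{\left(\frac{3600}{10\,000}\right)\left(\frac{6400}{10\,000}\right)}{10\,000}}$
$= (0.350592 \; ; \; 0.369408)$
$= (0.351 \; ; \; 0.369)$
(b) The margin of error (M.E) is given as 0.01, and the probability of exceeding it is to be at most 1%, so the critical value is that of a 99% confidence level:
$z = \Phi^{-1}\left[\frac{1 + 0.99}{2}\right] = \Phi^{-1}(0.995) = 2.576$
$M.E = z\sqrt{\frac{pq}{n}}$

Make n the subject to get the required sample size:

$n = pq\left(\frac{z}{M.E}\right)^2$

After calculating the value of n from the formula, round the value of n up to the next integer.

$n = \frac{2.576^2 \times \left(\frac{3600}{10\,000}\right)\left(\frac{6400}{10\,000}\right)}{0.01^2}$
$n = 15\,288.83$
$= 15\,289$
$15\,289 - 10\,000 = 5289$ additional families, since 10 000 families have already been contacted.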
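The formula n = pq(z / M.E)² is easy to evaluate for different critical values. The sketch below is illustrative only (the function name is an assumption); it takes the table value of z as an argument and shows the result for z = 1.960 (95% level) and z = 2.576 (99% level), rounding up in both cases.

```python
from math import ceil


def required_sample_size(p, margin, z):
    """Smallest n with z * sqrt(p * q / n) <= margin, rounded up."""
    q = 1 - p
    return ceil(p * q * (z / margin) ** 2)


n_95 = required_sample_size(0.36, 0.01, z=1.960)
n_99 = required_sample_size(0.36, 0.01, z=2.576)
print(n_95, n_99)
```

A stricter bound on the probability of error means a larger z and therefore a larger sample.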

b. t-intervals (confidence intervals based on the t-distribution)
The t-distribution, also known as Student's t-distribution, is a probability distribution
used in statistics to estimate population parameters when the sample size is small
(𝑛 < 30) and the population variance is unknown.

 Like the normal distribution, the t-distribution is bell-shaped and symmetric about
its mean.
 The standard normal distribution and the t-distribution both have mean zero.
 The t-distribution has a larger variance than the standard normal distribution.
 For the same confidence level, a t-interval is therefore wider than the corresponding
z-interval.

The t distribution has only one parameter called the degrees of freedom which is denoted (𝑣),
𝑣 = 𝑛 – 1 where n is the sample size.

As sample size 𝑛 and degrees of freedom increase the t-distribution becomes more similar
to a normal distribution.

If variable T follows a t-distribution then 𝑇~𝑡(𝑛 − 1)


Critical T values
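The general formula for the confidence interval (C.I) based on the t-distribution.
If the population is normal with unknown variance and the sample size is small (where
𝑛 < 30):
$C.I = \bar{X} - t\frac{\hat{\sigma}}{\sqrt{n}} < \mu < \bar{X} + t\frac{\hat{\sigma}}{\sqrt{n}}$ or $\left(\bar{X} - t\frac{\hat{\sigma}}{\sqrt{n}} \; ; \; \bar{X} + t\frac{\hat{\sigma}}{\sqrt{n}}\right)$
where $\hat{\sigma}^2$ is the unbiased estimate of the population variance.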

The width of a t-interval

The width of a confidence interval $(a \; ; \; b)$ is $b - a$, or $2 \times t\frac{\hat{\sigma}}{\sqrt{n}}$.

Margin of error
$\text{Margin of error} = t\frac{\hat{\sigma}}{\sqrt{n}}$
Standard error of the mean
$\text{Standard error} = \frac{\hat{\sigma}}{\sqrt{n}}$

Worked example

The mass of a certain brand of chocolate bar has a normal distribution with mean 𝜇 grams
and standard deviation 0.85 grams. The masses, in grams, of 5 randomly chosen bars are

124.31, 125.14, 124.23, 125.41, 125.76

Calculate a symmetric 90% confidence interval for 𝜇, giving the end-points correct to two
decimal places.

Forty random samples of 5 bars are taken, and a 90% confidence interval for 𝜇 is calculated
for each sample. Find the expected number of intervals that do not contain 𝜇. [Camb]

Solution

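The sample is small ($n = 5$). The interval below is built from the unbiased estimate $\hat{\sigma}$ calculated from the sample, so the t-distribution with $5 - 1 = 4$ degrees of freedom is used: $T \sim t(4)$. (If the stated population standard deviation 0.85 is used instead, the interval is the z-interval $124.97 \pm 1.645 \times \frac{0.85}{\sqrt{5}} = (124.34 \; ; \; 125.60)$.)

The formula is $\left(\bar{X} - t\frac{\hat{\sigma}}{\sqrt{n}} \; ; \; \bar{X} + t\frac{\hat{\sigma}}{\sqrt{n}}\right)$

$\sum x = 624.85$

$\bar{x} = \frac{624.85}{5} = 124.97$

$\sum x^2 = 78\,089.3343$

$\hat{\sigma}^2 = \frac{1}{5 - 1}\left(78\,089.3343 - \frac{624.85^2}{5}\right) = 0.45745$
$\hat{\sigma} = 0.6763505$

For a 90% interval, read the critical value from the 95% column of the 𝑡-distribution table, since $\frac{1 + 0.9}{2} = 0.95$, using 4 degrees of freedom.

The critical values of t with $d.f = 4$ are $\pm 2.132$

$C.I = \left(124.97 - 2.132 \times \frac{0.6763505}{\sqrt{5}} \; ; \; 124.97 + 2.132 \times \frac{0.6763505}{\sqrt{5}}\right)$
$= (124.33 \; ; \; 125.61)$

Each 90% confidence interval fails to contain $\mu$ with probability 0.10, so the expected number of the forty intervals that do not contain $\mu$ is $40 \times 0.10 = 4$.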
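The t-interval from raw data can be sketched in Python as follows. This is an illustrative helper, not part of the notes: the standard library has no t-quantile function, so the critical value 2.132 (4 degrees of freedom, 95% column) is passed in from the table as an assumption of the sketch. `statistics.variance` uses the divisor n − 1, which is the unbiased estimate.

```python
from math import sqrt
from statistics import mean, variance


def t_interval(data, t_crit):
    """t-interval for the mean; t_crit is read from a t-table."""
    n = len(data)
    centre = mean(data)
    std_error = sqrt(variance(data) / n)   # variance() divides by n - 1
    margin = t_crit * std_error
    return centre - margin, centre + margin


masses = [124.31, 125.14, 124.23, 125.41, 125.76]
low, high = t_interval(masses, t_crit=2.132)
print(round(low, 2), round(high, 2))
```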

Worked example
The external diameters (measured in units of 0.01mm above a nominal value) of a sample of
piston rings produced on the same machine were:
11, 9, 32, 18, 29, 1, 21, 19, 6.
Assuming a normal distribution calculate a 95% confidence interval for the population mean.
[AEB]

Solution
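X is the external diameter of a piston ring, in units of 0.01 mm above the nominal value.
$\sum x = 146, \quad \sum x^2 = 3230$
$\bar{x} = \frac{146}{9} = 16.222$
$\hat{\sigma}^2 = \frac{1}{8}\left(3230 - \frac{146^2}{9}\right) = 107.694444$

$T \sim t(8)$, since the sample size is small and the variance is unknown.

For a 95% interval the critical values are read from the 97.5% column of the t-table, since $\frac{1 + 0.95}{2} = 0.975$.

The critical values of t with $d.f = 8$ are $\pm 2.306$

$C.I = 16.222 \pm 2.306 \times \sqrt{\frac{107.694444}{9}}$

$= (8.25 \; ; \; 24.2)$

$= (8.25 \times 0.01 \text{ mm} \; ; \; 24.2 \times 0.01 \text{ mm})$

$= (0.0825 \text{ mm} \; ; \; 0.242 \text{ mm})$ above the nominal value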

Worked example

In Tesbury’s supermarket, economy packs of butter are marked 250g. An inspector takes a
random sample of 12 packs and weighs them. Correct to the nearest 0.1g; the weights, in
grams, were

246.5 240.9 245.3 250.5 248.7 249.1

251.0 249.8 249.8 247.6 246.2 241.4

(a) Making any necessary assumptions, which should be stated, calculate a 99%
confidence interval for the mean weight of the packs of butter.
(b) Calculate the width of the 99% confidence interval.
(c) How is the width affected when calculating a 90% confidence interval

Solutions

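(a) Assumption: the weights of the packs are normally distributed. The sample is small and the population variance is unknown, so the t-distribution is used.

Let X be the weight, in grams, of an economy pack of butter marked 250g.

$T \sim t(11)$

$\bar{x} = \frac{2966.8}{12} = 247.23333$

$\hat{\sigma}^2 = \frac{1}{11}\left(733\,615.14 - \frac{2966.8^2}{12}\right) = 11.207879$

For a 99% interval the critical values are read from the 99.5% column of the t-table, since $\frac{1 + 0.99}{2} = 0.995$.

The critical values of t with $d.f = 11$ are $\pm 3.106$

$C.I = 247.23333 \pm 3.106 \times \sqrt{\frac{11.207879}{12}}$

$= (244.231597 \text{ g} \; ; \; 250.2350697 \text{ g})$

$= (244.2 \text{ g} \; ; \; 250.2 \text{ g})$ to the nearest 0.1g

(b) $250.2350697 \text{ g} - 244.231597 \text{ g} = 6.0034 \text{ g}$

$= 6.0 \text{ g}$

(c) The width becomes smaller, because the critical value falls from 3.106 to 1.796 (the 95% column with 11 degrees of freedom).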

Worked example
An experimental physicist needs to estimate the true viscosity, 𝜇 Pascal seconds (Pa s), of a
light machine oil. Using the same apparatus he takes 12 independent measurements, 𝑥 Pa s,
of the viscosity of the oil, obtaining the values below:
25.8 25.2 24.7 25.5 25.3 25.4
25.2 25.3 25.8 25.9 25.2 24.9
(∑ 𝑥 = 304.2 ∑ 𝑥 2 = 7712.9)
When using this apparatus, measurements of the oil's viscosity are distributed with mean 𝜇
and variance 𝜎 2 . Obtain unbiased estimates of 𝜇 and 𝜎 2 . Hence obtain a symmetric 95%
confidence interval for 𝜇.
State any distributional assumptions you have made in obtaining your confidence interval.
The physicist explained the meaning of his confidence interval by saying there was a
probability of 0.95 that 𝜇 lay between the limits of the interval. Explain why this
interpretation is wrong and provide a correct explanation of 95 % confidence as used in this
context.
The manufacturer of the oil quotes a viscosity of 25.5 Pa s for the oil. With reference to your
confidence interval, state any conclusion you can come to regarding the validity of this
figure. (NEAB)

Solution
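$\hat{\mu} = \bar{x} = \frac{304.2}{12} = 25.35$

$\hat{\sigma}^2 = \frac{1}{11}\left(7712.9 - \frac{304.2^2}{12}\right) = 0.13$

$T \sim t(11)$

For a 95% interval the critical values are read from the 97.5% column of the t-table, since $\frac{1 + 0.95}{2} = 0.975$.

The critical values of t with $d.f = 11$ are $\pm 2.201$

$C.I = 25.35 \pm 2.201 \times \sqrt{\frac{0.13}{12}}$

$= (25.1 \; ; \; 25.6)$
The distributional assumption is that the measurements are normally distributed; the t-distribution is used because the sample is small and the population variance is unknown.
The physicist's interpretation is wrong because $\mu$ is a fixed (unknown) constant, not a random variable, so the calculated interval (25.1 ; 25.6) either contains $\mu$ or it does not. The 95% relates to the reliability of the estimation procedure, not to a
specific calculated interval: in the long run, 95% of intervals constructed in this way from repeated samples would contain $\mu$.

The manufacturer's figure (25.5) lies inside the interval (25.1 ; 25.6), so there is no evidence to doubt it.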
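The statement that the 95% refers to the procedure, not to one particular interval, can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than part of the notes: it fixes a hypothetical true mean and a known standard deviation, uses the simpler z-interval instead of the t-interval, and counts how often intervals from repeated samples contain the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu, sigma, n, level, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + level) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / trials


# mu = 25.35 and sigma = 0.36 are hypothetical values chosen for illustration.
rate = coverage(mu=25.35, sigma=0.36, n=12, level=0.95, trials=2000)
print(rate)
```

Each simulated interval either contains the true mean or misses it; only the long-run proportion of hits is close to 0.95.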


Worked example.

In an investigation to assess the difference in use between a credit card and a store card a
random sample of 20 people, each using both cards, was selected. They supplied information
from which, in 1994, the difference between each person's mean monthly spending on the
credit and store cards, £d, was calculated. The following summary data were then calculated.
∑ 𝑑 = 1664 and ∑ 𝑑 2 = 426 445.
Stating all necessary distributional assumptions, calculate a symmetric 90% confidence
interval for the mean difference between the mean monthly spending for all users of the two
cards. (NEAB)
Solution

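Assumption: the differences 𝑑 are normally distributed. The sample is small and the population variance is unknown, so the t-distribution with 19 degrees of freedom is used.
$\bar{d} = \frac{1664}{20} = 83.2$
$\hat{\sigma}^2 = \frac{1}{19}\left(426\,445 - \frac{1664^2}{20}\right) = 15\,157.90526$

For a 90% interval the critical values are read from the 95% column of the t-table, since $\frac{1 + 0.90}{2} = 0.95$.

The critical values of t with $d.f = 19$ are $\pm 1.729$

$C.I = 83.2 \pm 1.729 \times \sqrt{\frac{15\,157.90526}{20}}$

$= (£35.60080710 \; ; \; £130.7991929)$
$= (£35.60 \; ; \; £130.80)$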

Worked example
Five independent measurements of the diameter of a ball bearing were made using a certain
instrument. The results obtained in millimetres were
8.1; 9.1; 8.9; 8.9; 9.1
Given that true diameter of the ball bearing is 9.0 mm,
(a) calculate the unbiased estimates of the mean and variance of the measurement error of the
instrument, [5]
(b) Assuming that the measurement errors are independent and normally distributed, find the
90% confidence interval of the mean measurement error. Zimsec N2021 p1 [3]
Solution
(a) The errors are $-0.9; \; 0.1; \; -0.1; \; -0.1; \; 0.1$
$\hat{\mu} = \frac{-0.9 + 0.1 - 0.1 - 0.1 + 0.1}{5} = -0.18$

$\sum x = -0.9$ and $\sum x^2 = 0.85$

$\hat{\sigma}^2 = \frac{1}{4}\left(0.85 - \frac{(-0.9)^2}{5}\right) = 0.172$

(b) Since the variance is unknown and the sample size is small, the distribution is t.
For a 90% interval the critical values (with 4 d.f) are read from the 95% column of the t-table, since $\frac{1 + 0.9}{2} = 0.95$: $t = \pm 2.132$
$C.I = -0.18 \pm 2.132 \times \sqrt{\frac{0.172}{5}}$
$= (-0.575 \; ; \; 0.215)$

Confidence Interval for the Difference between Means OR Confidence Interval for Two
Independent Samples.

A confidence interval (C.I.) for a difference between means is a range of values that is likely
to contain the true difference between two population means with a certain level of confidence.

To determine whether the difference between two means is statistically significant, analysts
often compare the confidence intervals for those groups.

i. Confidence Interval for the Difference between Means of populations X and Y

 Normal with known variance and samples of any size.

$C.I = (\bar{X} - \bar{Y}) \pm Z\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$
or
$C.I = (\bar{Y} - \bar{X}) \pm Z\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$

ii. Confidence Interval for the Difference between Means of populations X and Y
 Non-normal with known variance and large sample sizes (𝑛 ≥ 30).

$C.I = (\bar{X} - \bar{Y}) \pm Z\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$
or
$C.I = (\bar{Y} - \bar{X}) \pm Z\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$

iii. Confidence Interval for the Difference between Means of populations X and Y
 Normal or non-normal with unknown variance and large sample sizes (where
𝑛 ≥ 30).
$C.I = (\bar{X} - \bar{Y}) \pm Z\sqrt{\frac{\hat{\sigma}_x^2}{n_x} + \frac{\hat{\sigma}_y^2}{n_y}}$
or
$C.I = (\bar{Y} - \bar{X}) \pm Z\sqrt{\frac{\hat{\sigma}_x^2}{n_x} + \frac{\hat{\sigma}_y^2}{n_y}}$

Where $\hat{\sigma}^2$ is the unbiased estimate of population variance.

iv. Confidence Interval for the Difference between Means of populations X and Y
 Normal with unknown variance and small sample sizes (where
𝑛 < 30).

$C.I = (\bar{X} - \bar{Y}) \pm t\sqrt{\frac{\hat{\sigma}_x^2}{n_x} + \frac{\hat{\sigma}_y^2}{n_y}}$
or
$C.I = (\bar{Y} - \bar{X}) \pm t\sqrt{\frac{\hat{\sigma}_x^2}{n_x} + \frac{\hat{\sigma}_y^2}{n_y}}$

Where $\hat{\sigma}^2$ is the unbiased estimate of population variance.

Worked example
Kayla is investigating the lengths of the leaves of a certain type of tree found in two forests X
and Y. She chooses a random sample of 40 leaves of this type from forest X and records their
lengths, x cm. She also records the lengths, y cm, for a random sample of 60 leaves of this
type from forest Y. Her results are summarised as follows.
∑ 𝑥 = 242.0 ∑ 𝑥 2 = 1587.0 ∑ 𝑦 = 373.2 ∑ 𝑦 2 = 2532.6
Find a 90% confidence interval for the difference between the population mean lengths of
leaves in forests X and Y. [7]

Solution
In both cases the sample sizes are large and the variances are unknown.
$\bar{X} = \frac{242.0}{40} = 6.05$
$\bar{Y} = \frac{373.2}{60} = 6.22$
$\hat{\sigma}_X^2 = \frac{1}{39}\left(1587.0 - \frac{242.0^2}{40}\right) = 3.151282051$
$\hat{\sigma}_Y^2 = \frac{1}{59}\left(2532.6 - \frac{373.2^2}{60}\right) = 3.581288136$

Critical values: $z = \pm\Phi^{-1}\left[\frac{1 + 0.9}{2}\right] = \pm\Phi^{-1}(0.95) = \pm 1.645$

$C.I = (6.05 - 6.22) \pm 1.645\sqrt{\frac{3.151282051}{40} + \frac{3.581288136}{60}}$
$= (-0.782 \; ; \; 0.442)$
OR
$C.I = (6.22 - 6.05) \pm 1.645\sqrt{\frac{3.151282051}{40} + \frac{3.581288136}{60}}$
$= (-0.442 \; ; \; 0.782)$
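The large-sample interval for a difference between two means can be sketched from the summary totals alone. The helper names below are illustrative assumptions, and the critical value is taken from `NormalDist().inv_cdf` rather than from tables.

```python
from math import sqrt
from statistics import NormalDist


def summary(sum_v, sum_v2, n):
    """Sample mean and unbiased variance estimate from totals."""
    mean = sum_v / n
    var_hat = (sum_v2 - sum_v ** 2 / n) / (n - 1)
    return mean, var_hat


def difference_interval(mean_x, var_x, n_x, mean_y, var_y, n_y, level):
    """Large-sample z-interval for the difference mean_x - mean_y."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    margin = z * sqrt(var_x / n_x + var_y / n_y)
    diff = mean_x - mean_y
    return diff - margin, diff + margin


mean_x, var_x = summary(242.0, 1587.0, 40)
mean_y, var_y = summary(373.2, 2532.6, 60)
low, high = difference_interval(mean_x, var_x, 40, mean_y, var_y, 60, 0.90)
print(round(low, 3), round(high, 3))
```

Swapping the roles of X and Y reverses the signs of the end-points, which gives the second form of the interval. Zero lies inside the interval either way.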
