Normal Prob - Sampling Distr and Estimation-2022
About 200 years ago, scientists discovered that large numbers of measurements often exhibit a
bell-shaped frequency distribution pattern. The bell-shaped pattern was encountered in many
situations and hence came to be called the normal pattern. The normal distribution and the normal curve
are used extensively in solving probability problems, in describing frequency distributions and in
statistical inference, specifically estimation and hypothesis testing.
The sum of the lengths of the vertical line segments of the distribution is 1 ⇒ it is a probability
distribution.
For a continuous probability distribution, probability is given by the area above the specified
interval.
Properties of Normal Probability Distribution
1. It is a continuous probability distribution ⇒ it deals with probabilities of specified
intervals.
Note: a continuous random variable is one that can take any value within a specified
range. The probability of an interval is represented by the area above the interval.
If there is no area, probability = 0.
2. It is symmetrical about its mean ($\mu$): 50% of the area is to the left and 50% to the right of $\mu$.
3. The total area under the curve above the x-axis = 1 ⇒ probability distribution.
4. The normal probability distribution is described by two parameters, mean and standard
deviation ($\mu$ and $\sigma$):
$X \sim N(\mu, \sigma)$
For every pair of mean and standard deviation we have a different normal distribution
The normal distribution is really a family of distributions in which one member is
distinguished from others on the basis of the values of mean and standard deviation.
The most important member of this family is the standard normal distribution which has
mean of 0 and standard deviation of 1.
5. The mean, median and mode are equal.
6. In a normal pattern, about 68% of all observations lie within one standard deviation from
the mean; 95% lie within two standard deviations from the mean;
99.7% lie within three standard deviations from the mean.
Though no set of measurements may exactly fit the normal distribution, normal distribution is
used widely in statistical inference where the random variable is approximately normally
distributed. In these applications it’s customary to transform the given random variable to the
standard random variate Z.
The standard normal distribution is the normal distribution with a mean of 0 and standard
deviation of 1. Its random variable is Z. Areas above intervals have been worked out in relation
to the standard normal distribution. To make use of the standard normal table, a given normal
distribution must be transformed to a Z i.e. must be standardized. Suppose X is the random
variable in a problem, then this X must be transformed to a Z.
Z = (value of the random variable − mean of the random variable) / (standard deviation of the random variable)
$Z = \frac{X - \mu}{\sigma}$
Questions
1. Marks scored by students are normally distributed with mean of 70 and std. deviation of
10. Use the standard normal table to calculate the following;
a) The proportion of scores between 70 and 90 marks
b) The proportion of students scoring above 90 marks.
c) The proportion of students failing, given that all those getting less than 52 marks fail.
d) The number of students scoring grade A, given that all those scoring above 91 qualify for
A and that 160 students sat the exams.
2. Weekly demand for eggs stocked by Waumini grocers is normally distributed. The mean
is 500 trays and the standard deviation is 100 trays. How many trays should be available for a
week if Waumini wants to ensure that the probability of running out of stock does not
exceed 4%?
3. Assume that the diameters of shafts made by an automatic machine are normally
distributed with a mean of 25mm and standard deviation of 0.5mm
a. What is the probability a shaft will have diameter between 25.2 and 25.9mm?
b. What proportion of the shafts have diameters of 25mm or less?
c. If 1,000 shafts are made how many will be expected to have diameters of 24 mm or
less?
d. What percent of the shafts made will have diameter of 24.56 mm or more?
4. Assume scores of an aptitude test are normally distributed; if mean score is 140 points
and standard deviation is 25 points, find the cut off passing grade or score such that 67%
of those taking the test will pass.
5. Assume the aptitude test scores are normally distributed; mean is 140 points and standard
deviation is 25 points. Within what interval centered at the mean will 95% of the scores
lie?
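As a cross-check on table lookups, Question 1 can be sketched in Python. This is a minimal sketch, assuming the standard library's `statistics.NormalDist` is acceptable in place of the printed standard normal table; the helper name `z_score` and the variable names are illustrative, not part of the notes.

```python
from statistics import NormalDist

# Question 1: marks ~ N(mean 70, standard deviation 10)
marks = NormalDist(mu=70, sigma=10)

def z_score(x, dist):
    """Standardize x: Z = (x - mean) / standard deviation."""
    return (x - dist.mean) / dist.stdev

# a) proportion between 70 and 90 marks (area from Z = 0 to Z = 2)
p_70_to_90 = marks.cdf(90) - marks.cdf(70)
# b) proportion above 90 marks (right tail beyond Z = 2)
p_above_90 = 1 - marks.cdf(90)
# c) proportion failing, i.e. below 52 marks (left tail beyond Z = -1.8)
p_fail = marks.cdf(52)
# d) expected number of grade A students out of 160 (above 91, Z = 2.1)
expected_grade_a = 160 * (1 - marks.cdf(91))

print("Z for 90 marks:", z_score(90, marks))
print("a)", round(p_70_to_90, 4))
print("b)", round(p_above_90, 4))
print("c)", round(p_fail, 4))
print("d)", round(expected_grade_a, 2))
```

The computed areas may differ from table answers in the fourth decimal place because the table rounds Z to two decimals.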
The inverse use of the standard normal table
It means to find the value of Z in the margin which corresponds to a given probability in the
body of the table.
(Z/P=k) ⇒the value of Z in the margins given that probability 0 to Z is k
e.g.
(Z/P= 0.2054) = 0.54
(Z/P = 0.4746) = 1.95
A given probability P may not appear in the standard normal table. In that case consider the
probability closest to the one given and write its z value e.g.
(Z/P = 0.4780) = 2.0
(Z/P= 0.3912) = 1.23
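The inverse lookup can also be sketched in code. This is a minimal sketch, assuming the table gives the area from 0 to Z (so the cumulative area is 0.5 plus the tabulated value); the function name `z_for_area_from_zero` is hypothetical.

```python
from statistics import NormalDist

def z_for_area_from_zero(p):
    """Return the Z value whose area between 0 and Z equals p (0 <= p < 0.5)."""
    # The table area from 0 to Z plus the left half (0.5) is the cumulative area.
    return NormalDist().inv_cdf(0.5 + p)

for p in (0.2054, 0.4746, 0.3912):
    # Rounded to two decimals, as in the margins of the table
    print(p, round(z_for_area_from_zero(p), 2))
```

Because the function solves for Z directly, it does not need the "closest probability" step that the printed table requires.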
Sampling Distribution
Statistical inference is the procedure in which we reach a conclusion about the population on the
basis of the information contained in a sample drawn from that population.
Sampled vs target population
Sampled population- Population from which we actually draw the sample
Target population- Population about which we want information
Parameters vs statistics
A parameter is a characteristic of a population e.g. population mean
A statistic is a characteristic of a sample and serves as an estimator of unknown population
parameter.
A sampling distribution is a probability distribution of a sample statistic. For a start we look at:
1) Sampling distribution of the mean
2) Sampling distribution of the proportion
Probability distribution of $\bar{x}$
The number of samples that can be formed from a population depends on whether we sample
with replacement or without replacement.
a) Sampling with replacement: a selected item is returned to the population before
another selection is made.
b) Sampling without replacement: a selected item is not returned to the population. Unless
indicated otherwise, the practice is to sample without replacement.
Given a population of size N, the number of samples of size n that can be drawn from the
population without replacement is given by ${}^{N}C_{n} = \binom{N}{n} = \frac{N!}{n!\,(N-n)!}$
Solution
a. Number of samples = ${}^{N}C_{n} = {}^{5}C_{3} = 10$ samples
b. Computation and tabulation of the sampling distribution of the mean

Sample   Values        Sample mean ($\bar{x}$)
ABC      9, 0, 27      12
ABD      9, 0, 15      8
ABE      9, 0, 12      7
ACD      9, 27, 15     17
ACE      9, 27, 12     16
ADE      9, 15, 12     12
BCD      0, 27, 15     14
BCE      0, 27, 12     13
BDE      0, 15, 12     9
CDE      27, 15, 12    18

$\bar{x}$    f     P($\bar{x}$)
7      1     0.1
8      1     0.1
9      1     0.1
12     2     0.2
13     1     0.1
14     1     0.1
16     1     0.1
17     1     0.1
18     1     0.1
∑f = 10, ∑P($\bar{x}$) = 1.0
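The tabulation above can be reproduced by enumerating every sample. This is a minimal sketch, assuming the population is A = 9, B = 0, C = 27, D = 15, E = 12 (as read off the table) and sampling without replacement; the function name `sampling_distribution` is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population values read off the table above (A, B, C, D, E)
population = {"A": 9, "B": 0, "C": 27, "D": 15, "E": 12}

def sampling_distribution(values, n):
    """Map each possible sample mean to its probability, over all
    samples of size n drawn without replacement."""
    means = [Fraction(sum(sample), n) for sample in combinations(values, n)]
    counts = Counter(means)
    return {m: Fraction(f, len(means)) for m, f in sorted(counts.items())}

dist = sampling_distribution(list(population.values()), 3)
mean_of_means = sum(m * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), float(p))
print("mean of sample means:", float(mean_of_means))
```

Exact fractions are used so that the probabilities sum to exactly 1 and the mean of the sample means can be compared with the population mean without rounding error.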
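Important Relationships
Rel.1: Relationship between the population mean and the mean of the sampling distribution (mean of sample means)
Population {9, 0, 27, 15, 12}: mean of the population $\mu_x = \frac{\sum x}{N} = \frac{63}{5} = 12.6$
Mean of sample means $\mu_{\bar{x}} = \frac{\sum \bar{x}}{{}^{N}C_{n}} = \frac{126}{10} = 12.6$
$\mu_{\bar{x}} = \mu_x$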
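Rel.2: Relationship between the population standard deviation ($\sigma_x$) and the standard deviation of
sample means ($\sigma_{\bar{x}}$)
Standard deviation of sample means = standard error of the mean

a) Population: 9, 0, 27, 15, 12
x      f     x²
9      1     81
0      1     0
27     1     729
15     1     225
12     1     144
∑x² = 1179
$\sigma_x = \sqrt{\frac{\sum x^2}{N} - \mu_x^2} = \sqrt{\frac{1179}{5} - (12.6)^2} = \sqrt{77.04} = 8.777$

b) Sample means
$\bar{x}$    f     $\bar{x}^2$
7      1     49
8      1     64
9      1     81
12     2     144
13     1     169
14     1     196
16     1     256
17     1     289
18     1     324
∑f$\bar{x}^2$ = 1716
$\sigma_{\bar{x}} = \sqrt{\frac{\sum f\bar{x}^2}{\sum f} - \mu_{\bar{x}}^2} = \sqrt{\frac{1716}{10} - (12.6)^2} = \sqrt{12.84} \approx 3.583$

$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}} = \frac{8.777}{\sqrt{3}} \times \sqrt{\frac{5-3}{5-1}} \approx 3.583$
The value $\sqrt{\frac{N-n}{N-1}}$ is called the finite population multiplier (FPM).
If N is large relative to n, the FPM can be omitted in calculating the standard error of the mean since
it tends to 1. As a rule the FPM is omitted if n < 0.05N (the sample is less than 5% of the population), such that
$\sigma_{\bar{x}} = \sigma_x/\sqrt{n}$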
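The relationship can be checked numerically by comparing the standard deviation of all sample means with the value given by the FPM formula. This is a minimal sketch, assuming the five-value population {9, 0, 27, 15, 12} and samples of size 3 from the earlier example; the variable names are illustrative.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

population = [9, 0, 27, 15, 12]   # population used in the example above
n = 3                             # sample size
N = len(population)

# Direct route: standard deviation of the means of all possible samples
sample_means = [mean(sample) for sample in combinations(population, n)]
direct = pstdev(sample_means)

# Formula route: sigma_x / sqrt(n), corrected by the finite population multiplier
fpm = math.sqrt((N - n) / (N - 1))
via_formula = pstdev(population) / math.sqrt(n) * fpm

print(round(direct, 4), round(via_formula, 4), round(fpm, 4))
```

Here n is 60% of N, far above the 5% rule, so the FPM (about 0.71) cannot be omitted.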
Conclusion: $\mu_{\bar{p}} = P$ (the mean of the sample proportions equals the population proportion)
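Relationship between standard deviation of sample proportion and population proportion
The standard error of the proportion: $\sigma_{\bar{p}} = \sqrt{\frac{PQ}{n}} \times \sqrt{\frac{N-n}{N-1}}$, where Q = 1 − P. If n < 0.05N, then the finite population
multiplier is omitted: $\sigma_{\bar{p}} = \sqrt{\frac{PQ}{n}}$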
Normal distribution of $\bar{x}$
Here the term distribution means the probability distribution of a random variable.
i) For a normal population, $\bar{x}$ is normally distributed irrespective of
the sample size.
ii) Even where the population is not normal, the distribution of $\bar{x}$ is practically normal if
the sample size is greater than 30. This is true for all populations usually encountered
in business problems. If n > 30, $\bar{x} \sim N(\mu_x, \sigma_x/\sqrt{n})$
$Z = \frac{\bar{x} - \mu_x}{\sigma_{\bar{x}}}$
Assignment (3 pages A4, typed): Central limit theorem and its significance.
Statistical Estimation
The process of inferring the values of unknown population parameters from known sample
statistics. The type of sample statistic used to make inferences about a given type of
population parameter is called the estimator of that parameter.
An estimator is usually expressed as a rule or mathematical formula that tells us how an
estimate is calculated; each value it generates is called an estimate, a specific numeric
value. There are two types of estimates:
a. Point estimates
b. Interval estimates
Statisticians seek to use estimators that are likely to take on numeric values close to
the parameters of interest. Good estimators possess certain properties, viz.:
1. Unbiased
2. Consistent read more
3. Sufficient
4. Efficient
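Point Estimation
Characteristic of interest | Parameter | Estimator
1) Mean | $\mu_x = \frac{\sum x}{N}$ | $\bar{x} = \frac{\sum x}{n}$
2) Variance | $\sigma_x^2 = \frac{\sum (x - \mu_x)^2}{N}$ | $s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
3) Standard Deviation | $\sigma_x = \sqrt{\frac{\sum (x - \mu_x)^2}{N}}$ | $s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
5) Proportion | P = successes divided by N | $\bar{p}$ = successes divided by n (calculated in the same way)
6) Standard error of the proportion | $\sigma_{\bar{p}} = \sqrt{\frac{PQ}{n}}$ | $S_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$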
Question 1
Answers
(a) 3.633
(b) 1.483
Question 2
Let an even number be a success; suppose a random sample of 200 numbers selected
randomly from a population contains 120 even numbers. Write the symbol for and
compute the value of point estimate of the standard error of the proportion.
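n = 200
$\bar{p} = \frac{120}{200} = 0.6$, $\bar{q} = 1 - \bar{p} = 0.4$
$S_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{0.6 \times 0.4}{200}} = 0.0346$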
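The same point estimate can be sketched in Python. This is a minimal sketch, assuming the finite population multiplier is omitted; the function name `proportion_standard_error` is hypothetical.

```python
import math

def proportion_standard_error(successes, n):
    """Point estimate of the standard error of the proportion, sqrt(p*q/n),
    with the finite population multiplier omitted."""
    p = successes / n      # sample proportion of successes
    q = 1 - p              # sample proportion of failures
    return math.sqrt(p * q / n)

# Question 2: 120 even numbers in a random sample of 200
se = proportion_standard_error(120, 200)
print(round(se, 4))
```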
Unbiased estimator
An estimator is unbiased if its expected value equals the parameter value being estimated.
The statistic has a smaller standard error than any other unbiased statistic for estimating
the parameter of interest. (Try getting meaning of the other properties).
Efficiency: The property of an estimator whose variance is smaller than that of any other
estimator using the same sample size.
Consistency: the property of an estimator by which the probability that its value is near
the parameter value approaches unity as the sample size increases.
Sufficiency: this relates to the use of information in the computation of the estimate. The
more of the sample information an estimator uses, the more sufficient it is said to be.
Question 3
The standard error of the mean was earlier defined as the standard deviation of the means of all
random samples of size n that can be selected from a population; usually the number of
samples is astronomically large. Can the standard error of the mean be estimated from the
elements in one of these samples? If so, how is the estimate computed?
Number of samples = ${}^{N}C_{n}$
Question 4
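In a random sample of size n = 50, estimate the standard error of the mean given that
$\sum (x - \bar{x})^2 = 449$
$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{449}{49}} = \sqrt{9.163} = 3.027$
$s_{\bar{x}} = \frac{s_x}{\sqrt{n}} = \frac{3.027}{\sqrt{50}} = 0.428$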
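The two steps of Question 4 (sample standard deviation, then standard error) can be sketched as follows. This is a minimal sketch, assuming the finite population multiplier is omitted; the function name `estimate_standard_error` is illustrative.

```python
import math

def estimate_standard_error(sum_sq_dev, n):
    """Estimate the standard error of the mean from a single sample.

    sum_sq_dev is the sum of squared deviations from the sample mean.
    Returns (sample standard deviation, estimated standard error)."""
    s_x = math.sqrt(sum_sq_dev / (n - 1))   # divide by n - 1, not n
    s_xbar = s_x / math.sqrt(n)
    return s_x, s_xbar

s_x, s_xbar = estimate_standard_error(449, 50)
print(round(s_x, 3), round(s_xbar, 3))
```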
Question 5
In a random sample of 1,600 voters, 480 were Uhuru supporters. Estimate the standard
error of the proportion of Uhuru supporters.
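n = 1600
$\bar{p} = \frac{480}{1600} = 0.3$
$\bar{q} = \frac{1120}{1600} = 0.7$
$S_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{0.3 \times 0.7}{1600}} = 0.01146$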
Interval estimate
It describes a range of values within which a parameter might lie.
Interval estimate of a population mean
An interval estimate of $\mu_x$ is an interval of values A to B within which an unknown
population mean is expected to lie. The interval is an inference based upon:
i. The value of the mean $\bar{x}$ of a simple random sample
ii. Known facts about the sampling distribution of the mean
[Figure: normal curve showing areas 0.475, 0.475, 0.45 and 0.025]
$\bar{x} - Z\frac{\sigma_x}{\sqrt{n}} \le \mu_x \le \bar{x} + Z\frac{\sigma_x}{\sqrt{n}}$
The proportion of correct statements, e.g. 0.95, in the interval specification is called the confidence
coefficient (c). The percentage value, i.e. 95%, is the confidence level. The proportion of
incorrect statements is symbolized by α; the sum of the proportions of correct and incorrect
statements is 1. Thus
$c + \alpha = 1$
$\alpha = 1 - c$
The procedure for obtaining the value of z corresponding to a stated confidence level can
be simplified by using a table that gives z values when tail areas- particularly right tail
areas are known.
In interval estimation, we use the notation Zα/2 ⇒ the value of z corresponding to a tail area
of size α/2.
Question
A normal population has a standard deviation of 10. A random sample of size 25 has a mean
of 50. Construct a 95% confidence interval estimate of the population mean.
Answer
Concluding remarks
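$z = \frac{\bar{x} - \mu_x}{\sigma_{\bar{x}}}$
$\bar{x} - z\,\sigma_{\bar{x}} \le \mu_x \le \bar{x} + z\,\sigma_{\bar{x}}$
$c = 1 - \alpha$
$\bar{x} - z_{\alpha/2}\frac{\sigma_x}{\sqrt{n}} \le \mu_x \le \bar{x} + z_{\alpha/2}\frac{\sigma_x}{\sqrt{n}}$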
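The interval formula can be applied to the question above (standard deviation 10, n = 25, sample mean 50, 95% confidence). This is a minimal sketch, assuming the population standard deviation is known and that `statistics.NormalDist` stands in for the z table; the function name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for the population mean when sigma is known."""
    alpha = 1 - confidence
    # z value that leaves a right tail area of alpha/2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

low, high = z_interval(50, 10, 25, 0.95)
print(round(low, 2), round(high, 2))
```

With the table value z = 1.96 the half-width is 1.96 × 10/√25 = 3.92, so the interval runs from 46.08 to 53.92.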
Process
Question
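A normal population has a standard deviation of 10. A random sample of size 15 selected from the
population has a mean of 52.5. Construct the following confidence intervals:
a) 80%
b) 90%
c) 97.5%
d) 98%
e) 99%
$\bar{x} - z_{\alpha/2}\frac{\sigma_x}{\sqrt{n}} \le \mu_x \le \bar{x} + z_{\alpha/2}\frac{\sigma_x}{\sqrt{n}}$
Answer (a)
$49.195 \le \mu_x \le 55.805$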
$\frac{\bar{x} - \mu_x}{s_{\bar{x}}}$ when the standard deviation is not known ………… (ii)
Equation (i), $Z = \frac{\bar{x} - \mu_x}{\sigma_{\bar{x}}}$, has only one source of variation: the value of Z varies from one sample to
the next because each sample has a different mean ($\bar{x}$).
Equation (ii) has two sources of variation, the sample mean ($\bar{x}$) and the sample standard
deviation ($s_x$), which change from sample to sample.
The term $\frac{\bar{x} - \mu_x}{s_{\bar{x}}}$ follows a sampling distribution different from the normal distribution:
$\frac{\bar{x} - \mu_x}{\sigma_{\bar{x}}}$ follows the normal distribution
$t = \frac{\bar{x} - \mu_x}{s_{\bar{x}}}$ (not normal) follows Student's t-distribution
Student’s t-Distribution
In the early 1900s W.S. Gosset, an employee of the Guinness Brewery in Dublin, Ireland, studied
the distribution of t. He published his results under the pen name (pseudonym) "Student".
He used this pen name because his employer forbade publication of
scientific research of this nature by employees under their own names.
Properties of t-distribution
iii) In general it has a standard deviation > 1 but this standard deviation approaches
1 as sample size increases
iv) The t – distribution is really a family of distributions since there is a different
distribution for each degree of freedom value.
$V = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$ in the two-sample case
To use the t-table one needs to know one tail area and V degrees of freedom. $t_{\alpha/2,\,v}$
⇒ the t value corresponding to a right tail area of size $\alpha/2$ where the degrees of freedom
equal v
e.g.
$t_{0.01,4} = 3.75$
$t_{0.025,19} = 2.09$
Observations
1. t- Values for each tail area decrease as V Increases (sample size increases).
2. t- Value approaches Z value as n approaches infinity.
3. t- values are available for V=1 to V=30
For more than 30 degrees of freedom we use standard normal distribution
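$\bar{x} - t_{\alpha/2,V}\frac{s_x}{\sqrt{n}} \le \mu_x \le \bar{x} + t_{\alpha/2,V}\frac{s_x}{\sqrt{n}}$
Where:
$\bar{x}$ is the sample mean
V is the degrees of freedom (n − 1)
n is the sample size
$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ is the sample standard deviation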
Question
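$\bar{x} - t_{\alpha/2,V}\frac{s_x}{\sqrt{n}} \le \mu_x \le \bar{x} + t_{\alpha/2,V}\frac{s_x}{\sqrt{n}}$
$\frac{\alpha}{2} = \frac{1 - c}{2} = \frac{1 - 0.95}{2} = 0.025$
$s_{\bar{x}} = \frac{s_x}{\sqrt{n}} = \frac{34.6}{\sqrt{20}} = 7.737$
$v = n - 1 = 20 - 1 = 19$
$t_{0.025,19} = 2.093$
$925 - 2.093(7.737) \le \mu_x \le 925 + 2.093(7.737)$
$908.81 \le \mu_x \le 941.19$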
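The small-sample interval can be sketched in Python. This is a minimal sketch, assuming the critical value is read from a t-table and passed in by hand (the Python standard library has no t-distribution); the function name `t_interval` is hypothetical, and the inputs are the figures of this question (sample mean 925, sample standard deviation 34.6, n = 20) with the standard table value $t_{0.025,19} = 2.093$.

```python
import math

def t_interval(sample_mean, s_x, n, t_crit):
    """Confidence interval for the mean when sigma is unknown and n is small.

    t_crit is the table value t(alpha/2, n - 1), supplied by the caller."""
    s_xbar = s_x / math.sqrt(n)        # estimated standard error of the mean
    half_width = t_crit * s_xbar
    return sample_mean - half_width, sample_mean + half_width

low, high = t_interval(925, 34.6, 20, 2.093)   # 19 degrees of freedom, 95%
print(round(low, 2), round(high, 2))
```

The interval comes out at roughly 908.8 to 941.2.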
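Case 3: $\sigma_x$ unknown, n large (n > 30) (in application this is perhaps the most common)
$\bar{x} - z_{\alpha/2}\frac{s_x}{\sqrt{n}} \le \mu_x \le \bar{x} + z_{\alpha/2}\frac{s_x}{\sqrt{n}}$
Question
n = 35
mean $\bar{x}$ = 20
$s_x$ = 20
$\frac{\alpha}{2} = \frac{1 - 0.95}{2} = 0.025$
$t_{0.025,34} = 2.04$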
The narrower a confidence interval, the more precise it is said to be; the wider, the less
precise. The precision of an interval depends on three variables:
a. Level of confidence: the lower the confidence level, the more precise the interval
estimate
b. Variability of the population: the more variable the population, the less precise
c. Sample size: the larger the sample size, the more precise
$\bar{x} - Z_{\alpha/2}\frac{s_x}{\sqrt{n}} \le \mu_x \le \bar{x} + Z_{\alpha/2}\frac{s_x}{\sqrt{n}}$
$\mu_x = \bar{x} \pm Z_{\alpha/2}\frac{s_x}{\sqrt{n}} = \bar{x} \pm e$
$e = Z_{\alpha/2}\frac{\sigma_x}{\sqrt{n}}$
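Sample proportion of failure → $\bar{q}$
$\bar{p} + \bar{q} = 1$
$\bar{q} = 1 - \bar{p}$
$S_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$
$\bar{p} - z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}} \le P \le \bar{p} + z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$
$n\bar{p},\ n\bar{q} \ge 15$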
Question
Last November, a random sample of 400 members of the labour force in Nyanza province
showed that 32 were unemployed. Construct the 95% confidence interval for the proportion
employed in the region.
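$\bar{p} - Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}} \le P \le \bar{p} + Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$
$\bar{p} = 0.92$
$\bar{q} = 0.08$
$Z_{0.025} = 1.96$
$S_{\bar{p}} = \sqrt{\frac{0.92 \times 0.08}{400}} = 0.01356$
$0.92 \pm 1.96(0.01356)$
$0.893 \le P \le 0.947$
$89.3\% \le P \le 94.7\%$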
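The working above can be reproduced in a few lines. This is a minimal sketch, assuming the large-sample normal approximation and that `statistics.NormalDist` replaces the z table; the function name `proportion_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n
    q = 1 - p
    # z value that leaves a right tail area of alpha/2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p * q / n)
    return p - half_width, p + half_width

# 400 sampled, 32 unemployed, so 368 employed
low, high = proportion_interval(400 - 32, 400, 0.95)
print(round(low, 3), round(high, 3))
```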
1. Cost cutting
2. Time saving
3. Samples are justified where the exact size of the population is unknown
4. If it is impractical to carry out a census
5. If the elements of the study are damaged
6. If in-depth analysis is needed.
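Sample size for constructing a C.I. estimate of $\mu_x$: $\mu_x = \bar{x} \pm z_{\alpha/2}\frac{\sigma_x}{\sqrt{n}}$
Degree of precision: $e = Z_{\alpha/2}\frac{\sigma_x}{\sqrt{n}}$
$n = \left(\frac{Z_{\alpha/2}\,\sigma_x}{e}\right)^2$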
Question
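If mean = 60 and standard deviation = 10, find the proportion lying within two standard
deviations of the mean:
$z = \frac{x - \mu_x}{\sigma_x} = \frac{40 - 60}{10} = -2$
Area between z = 0 and z = 2 is 0.4772, so the proportion is $0.4772 \times 2 = 0.9544 = 95.44\%$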
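1. $n = \left(\frac{z_{\alpha/2}\,\sigma_x}{e}\right)^2$
$\bar{p} - z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}} \le P \le \bar{p} + z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$
$P = \bar{p} \pm Z_{\alpha/2}\sqrt{\frac{\bar{p}\bar{q}}{n}}$
$P = \bar{p} \pm e$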
Question
A sample is to be taken to estimate the mean salary of plumbers to within ±$500 with a
confidence of 0.99. A plumbers’ union official states that $40,000 and $26,000 would be
unusually large and small salaries for plumbers respectively. What should the sample size
be?
Answer: 327
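One way to arrive at this answer is sketched below. It rests on two assumptions that the notes do not state: the population standard deviation is approximated as one quarter of the range between the unusually large and unusually small salaries, and the table value z = 2.58 is used for 99% confidence. The function name `sample_size_for_mean` is hypothetical.

```python
import math

def sample_size_for_mean(sigma, e, z):
    """Smallest whole n satisfying n >= (z * sigma / e) ** 2."""
    return math.ceil((z * sigma / e) ** 2)

# Assumption: sigma is roughly range / 4 (about 95% of values within 2 sigma of the mean)
sigma_guess = (40_000 - 26_000) / 4     # 3500
# Assumption: z = 2.58 is the table value for a 0.005 right tail
n_required = sample_size_for_mean(sigma_guess, e=500, z=2.58)
print(n_required)
```

The result is always rounded up, since rounding down would give a margin of error slightly wider than the one requested.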
Question
An advertising executive thinks that the proportion of consumers who have seen his
company’s advertising in newspapers is between 0.65 and 0.85. The executive wants to
estimate the consumer population proportion to within ±0.05 and have 98 percent
confidence in the estimate. How large a sample should be taken? State the safer sample
size.
Answer
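$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$
where:
$\sigma_{\bar{x}_1} = \frac{\sigma_{x_1}}{\sqrt{n_1}}$
$\sigma_{\bar{x}_2} = \frac{\sigma_{x_2}}{\sqrt{n_2}}$
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$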
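Case 2: population distribution normal or not, standard deviations not known, $n_1$, $n_2$
large
$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\,s_{\bar{x}_1 - \bar{x}_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\,s_{\bar{x}_1 - \bar{x}_2}$
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
where:
$s_{x_2}^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$
$V = n_1 + n_2 - 2$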
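$(\bar{p}_1 - \bar{p}_2) - z_{\alpha/2}\,S_{\bar{p}_1 - \bar{p}_2} \le P_1 - P_2 \le (\bar{p}_1 - \bar{p}_2) + z_{\alpha/2}\,S_{\bar{p}_1 - \bar{p}_2}$
where
$S_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}}$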
Question
The variance in both factories is known to be 10 lbs². The populations are normally
distributed. Construct a 95% confidence interval estimate for the difference between the two
means.
Answer
$0 \le \mu_1 - \mu_2 \le 4$
Question
There are 2 fertilizers; one of the two is applied in each field. At harvest time, a random
sample of 25 cabbages from the crop grown with fertilizer 1 is selected; 12 cabbages grown
with fertilizer 2 are also selected. The sample mean and variance of the weights of the cabbages
grown with fertilizer 1 are 44.1 oz and 36 oz².
The experiment assumes the two populations of weights are normally distributed and that the two
population variances are equal. Compute a 95% confidence interval estimate for µ1 − µ2.