
Normal Prob - Sampling Distr and Estimation-2022

1. Normal distribution is a continuous probability distribution that produces bell-shaped curves. It is defined by two parameters: mean and standard deviation. 2. The sampling distribution of the mean refers to the probability distribution of sample means that would be obtained by randomly sampling from a population. As sample size increases, the sampling distribution of the mean approximates the normal distribution. 3. There are relationships between the population mean and the mean of the sampling distribution of the mean, as well as between the population standard deviation and the standard deviation of the sampling distribution of the mean. The mean of sampling distributions equals the population mean, and its standard deviation decreases with increasing sample size.

Uploaded by

Donnovan

Normal Probability distribution

About 200 years ago, scientists discovered that large sets of measurements often exhibit a
bell-shaped frequency distribution. The bell-shaped pattern was encountered in so many
situations that it came to be called the normal pattern. The normal distribution and the normal curve
are used extensively in solving probability problems, in describing frequency distributions and in
statistical inference, specifically estimation and hypothesis testing.

Continuous probability distribution


The values of a discrete random variable, e.g. 1, 2, 3, 4, 5, are graphed as separate points on the x-axis, and
their probabilities p(x) as the lengths of vertical line segments drawn above them.
[Figure: line graph of p(x) against x for a discrete random variable]
The lengths of the vertical line segments sum to 1, which is what makes it a probability
distribution.
For a continuous probability distribution, probability is given by the area under the curve above the specified
interval.
Properties of Normal Probability Distribution
1. It is a continuous probability distribution ⇒It deals with probabilities of specified
intervals.
Note: a continuous random variable is one that can take any value within a specified
range. The probability of an interval is represented by the area above the interval
If there is no area, probability = 0

2. It is symmetrical about its mean ($\mu$): 50% of the area is to the left of $\mu$ and 50% to the right.
3. The total area under the curve above the x-axis = 1 ⇒ probability distribution.
4. The normal probability distribution is described by two parameters, the mean and the standard
deviation ($\mu$ and $\sigma$):
$X \sim N(\mu, \sigma)$
For every pair of mean and standard deviation we have a different normal distribution
The normal distribution is really a family of distributions in which one member is
distinguished from others on the basis of the values of mean and standard deviation.
The most important member of this family is the standard normal distribution which has
mean of 0 and standard deviation of 1.
5. The mean, median and mode are equal.
6. In a normal pattern, about 68% of all observations lie within one standard deviation from
the mean; 95% lie within two standard deviations from the mean;
99.7% lie within three standard deviations from the mean.
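
The percentages in property 6 can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `area_within` is an illustrative choice, not part of the notes.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The printed areas round to roughly 68%, 95% and 99.7%, matching the rule of thumb.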

Though no set of measurements may exactly fit the normal distribution, normal distribution is
used widely in statistical inference where the random variable is approximately normally
distributed. In these applications it’s customary to transform the given random variable to the
standard random variate Z.
The standard normal distribution is the normal distribution with a mean of 0 and standard
deviation of 1. Its random variable is Z. Areas above intervals have been worked out in relation
to the standard normal distribution. To make use of the standard normal table, a given normal
distribution must be transformed to a Z i.e. must be standardized. Suppose X is the random
variable in a problem, then this X must be transformed to a Z.
$Z = \dfrac{X - \mu}{\sigma}$, i.e. (value of the random variable − mean of the random variable) ÷ standard deviation of the random variable.
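
As a rough illustration of standardizing, the sketch below converts interval endpoints to Z values and reads the area from the standard normal distribution instead of a printed table. The figures (mean 50, standard deviation 8, interval 50 to 62) are hypothetical values chosen for the example, and the function names are assumptions.

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """Standardize x: Z = (x - mean) / sd."""
    return (x - mean) / sd

def prob_between(a: float, b: float, mean: float, sd: float) -> float:
    """P(a <= X <= b) for X ~ N(mean, sd), computed through Z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z_score(b, mean, sd)) - std_normal.cdf(z_score(a, mean, sd))

# Hypothetical example: X ~ N(50, 8), area between 50 and 62.
z = z_score(62, 50, 8)            # 1.5
p = prob_between(50, 62, 50, 8)   # area from 0 to Z = 1.5
print(z, round(p, 4))
```

The area agrees with the standard normal table entry for Z = 1.50 (0.4332).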

Questions
1. Marks scored by students are normally distributed with mean of 70 and std. deviation of
10. Use the standard normal table to calculate the following;
a) The proportion of scores between 70 and 90 marks
b) The proportion of students scoring above 90 marks.
c) The proportion of students failing, given that all those getting less than 52 marks fail.
d) The number of students scoring grade A given that all those scoring above 91 qualify for
A and that 160 students sat the exams.

2. Weekly demand for eggs stocked by Waumini grocers is normally distributed. The mean
is 500 trays and the standard deviation is 100 trays. How many trays should be available for a
week if Waumini wants to ensure that the probability of running out of stock does not
exceed 4%?
3. Assume that the diameters of shafts made by an automatic machine are normally
distributed with a mean of 25mm and standard deviation of 0.5mm
a. What is the probability a shaft will have diameter between 25.2 and 25.9mm?
b. What proportions of the shafts have diameters of 25mm or less?
c. If 1,000 shafts are made how many will be expected to have diameters of 24 mm or
less?
d. What percent of the shafts made will have diameter of 24.56 mm or more?
4. Assume scores of an aptitude test are normally distributed; if mean score is 140 points
and standard deviation is 25 points, find the cut off passing grade or score such that 67%
of those taking the test will pass.
5. Assume the aptitude test scores are normally distributed; the mean is 140 points and the standard
deviation is 25 points. Within what interval centered at the mean will 95% of the scores
lie?

The inverse use of the standard normal table
It means to find the value of Z in the margin which corresponds to a given probability in the
body of the table.
(Z/P=k) ⇒the value of Z in the margins given that probability 0 to Z is k
e.g.
(Z/P= 0.2054) = 0.54
(Z/P = 0.4746) = 1.95

A given probability P may not appear in the standard normal table. In that case consider the
probability closest to the one given and write its z value e.g.
(Z/P = 0.4780) = 2.01
(Z/P= 0.3912) = 1.23
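
The inverse lookup can also be done numerically. This is a sketch only: it assumes, as in the table described above, that P is the area from 0 to Z, so the cumulative area is 0.5 + P. The function name `z_for_central_area` is hypothetical.

```python
from statistics import NormalDist

def z_for_central_area(p: float) -> float:
    """Return Z such that the area from 0 to Z under the standard normal curve is p."""
    if not 0.0 <= p < 0.5:
        raise ValueError("the area from 0 to Z must lie in [0, 0.5)")
    # The table gives the area from 0 to Z; the cumulative area adds the left half (0.5).
    return NormalDist().inv_cdf(0.5 + p)

print(round(z_for_central_area(0.2054), 2))
print(round(z_for_central_area(0.3912), 2))
```

Unlike the table, this returns an interpolated Z, so no "closest entry" rule is needed.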

Sampling Distribution
Statistical inference is the procedure in which we reach a conclusion about the population on the
basis of the information contained in a sample drawn from that population.
Sampled vs target population
Sampled population- Population from which we actually draw the sample
Target population- Population about which we want information

Parameters vs statistics
A parameter is a characteristic of a population e.g. population mean
A statistic is a characteristic of a sample and serves as an estimator of unknown population
parameter.
A sampling distribution is a probability distribution of a sample statistic. For a start we look at:
1) Sampling distribution of the mean
2) Sampling distribution of the proportion

Sampling Distribution of the mean


It is the probability distribution of the means $\bar{x}$ of all simple random samples of a given size that
can be drawn from a population.

Probability distribution of $\bar{x}$
The number of samples that can be formed from a population depends on whether we sample
with replacement or without replacement.
a) Sampling with replacement ⇒ a selected item is returned to the population before
another selection is made.
b) Sampling without replacement ⇒ a selected item is not returned to the population. Unless
otherwise indicated, the practice is to sample without replacement.
Given a population of size N, the number of samples of size n that can be drawn from the
population without replacement is given by
$^{N}C_{n} = \dfrac{N!}{n!\,(N-n)!}$
e.g. for N = 5 and n = 3, $^{5}C_{3} = 10$.
Given the population A, B, C, D, E
9, 0, 27, 15, 12
a. How many samples of size n = 3 can be drawn from the population?
b. Compute and tabulate the sampling distribution of the mean for samples of size n =3.

Solution
a. No of samples NCn = 5C3 = 10 samples
b. Computation and tabulation of sampling distribution of the mean
Samples   Values       Sample mean ($\bar{x}$)
ABC       9, 0, 27     12
ABD       9, 0, 15     8
ABE       9, 0, 12     7
ACD       9, 27, 15    17
ACE       9, 27, 12    16
ADE       9, 15, 12    12
BCD       0, 27, 15    14
BCE       0, 27, 12    13
BDE       0, 15, 12    9
CDE       27, 15, 12   18

Sampling distribution of the mean:
$\bar{x}$    f    P($\bar{x}$)
7     1    0.1
8     1    0.1
9     1    0.1
12    2    0.2
13    1    0.1
14    1    0.1
16    1    0.1
17    1    0.1
18    1    0.1
∑f = 10    ∑P = 1.0
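
The tabulation above can be reproduced by enumerating every sample. The following sketch uses the population values from the example and Python's `itertools.combinations` for sampling without replacement; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population from the example above.
population = {"A": 9, "B": 0, "C": 27, "D": 15, "E": 12}
n = 3

# All samples of size n drawn without replacement (order ignored).
samples = list(combinations(population, n))
sample_means = [Fraction(sum(population[k] for k in s), n) for s in samples]

# Frequency and probability of each distinct sample mean.
freq = Counter(sample_means)
distribution = {m: Fraction(f, len(samples)) for m, f in sorted(freq.items())}

for m, prob in distribution.items():
    print(f"mean={float(m):5.1f}  f={freq[m]}  P={float(prob):.1f}")
```

Exact fractions are used so that equal sample means are grouped reliably rather than being split by floating-point rounding.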

Important Relationships
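Rel. 1: Relationship between the population mean and the mean of the sampling distribution (mean of sample means)
Population {9, 0, 27, 15, 12}:
$\mu_x = \dfrac{\sum x}{N} = \dfrac{63}{5} = 12.6$
Mean of sample means:
$\mu_{\bar{x}} = \dfrac{\sum \bar{x}}{^{N}C_{n}} = \dfrac{126}{10} = 12.6$

Mean of all sample means = population mean:

$\mu_{\bar{x}} = \mu_x$

Rel. 2: Relationship between the population standard deviation ($\sigma_x$) and the standard deviation of
sample means ($\sigma_{\bar{x}}$)
The standard deviation of sample means is called the standard error of the mean.

a) Population standard deviation

x     f    x²
9     1    81
0     1    0
27    1    729
15    1    225
12    1    144
∑x² = 1179

$\sigma_x = \sqrt{\dfrac{\sum x^2}{N} - \mu_x^2} = \sqrt{\dfrac{1179}{5} - (12.6)^2} = \sqrt{77.04} = 8.777$

b) Standard deviation of the sample means

$\bar{x}$    f    $\bar{x}^2$
7     1    49
8     1    64
9     1    81
12    2    144
13    1    169
14    1    196
16    1    256
17    1    289
18    1    324
∑f$\bar{x}^2$ = 1716

$\sigma_{\bar{x}} = \sqrt{\dfrac{1716}{10} - (12.6)^2} = \sqrt{12.84} \approx 3.583$

The same value is obtained from the population standard deviation:

$\sigma_{\bar{x}} = \dfrac{\sigma_x}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}} = \dfrac{8.777}{\sqrt{3}} \times \sqrt{\dfrac{5-3}{5-1}} = 5.0674 \times \sqrt{0.5} = 3.5832$

The factor $\sqrt{(N-n)/(N-1)}$ is called the finite population multiplier (FPM).

If N is large relative to n, the FPM can be omitted in calculating the standard error of the mean since
it tends to 1. As a rule the FPM is omitted if n < 0.05N (the sample is less than 5% of the population), such that

$\sigma_{\bar{x}} = \sigma_x/\sqrt{n}$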
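
Both relationships can be verified by brute force on the five-value population. This is a sketch under the same setup as the worked example (N = 5, n = 3, sampling without replacement); `pstdev` is the population standard deviation from Python's `statistics` module.

```python
from itertools import combinations
from math import isclose, sqrt
from statistics import mean, pstdev

population = [9, 0, 27, 15, 12]
N, n = len(population), 3

# Means of all 10 samples of size 3 drawn without replacement.
sample_means = [mean(s) for s in combinations(population, n)]

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation (divides by N)

# Rel. 1: the mean of the sample means equals the population mean.
mean_of_means = mean(sample_means)

# Rel. 2: standard error computed directly and from sigma with the FPM.
se_direct = pstdev(sample_means)
fpm = sqrt((N - n) / (N - 1))
se_formula = sigma / sqrt(n) * fpm

print(mean_of_means, mu)
print(round(se_direct, 4), round(se_formula, 4))
```

Dropping `fpm` from the last formula would overstate the standard error here, because the sample is 60% of the population, far above the 5% rule of thumb.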

Sampling distribution of the proportion


It is the probability distribution of sample proportions of all samples of a given size that can be
drawn from the population
Sample proportion ($\bar{p}$) = number of successes in the sample ÷ sample size
For the data below, compute and tabulate the sampling distribution of the proportion for samples of
size 3 if success is defined as getting an odd number.
A B C D E
9, 0, 27, 15, 12
Solution: the odd values are 9, 27 and 15, so each sample's proportion of success is the number of odd values it contains divided by 3.
Sample proportions of success
$\bar{p}$    f    P($\bar{p}$)
⅓     3    0.3
⅔     6    0.6
1     1    0.1
∑f = 10    ∑P = 1.0
Population proportion P = 3/5 = 0.6
Mean of sample proportions: $\mu_{\bar{p}} = \dfrac{\sum \bar{p}}{^{N}C_{n}} = \dfrac{6}{10} = 0.6$

Conclusion: $\mu_{\bar{p}} = P$

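Relationship between the standard deviation of sample proportions and the population proportion

The standard error of the proportion is $\sigma_{\bar{p}} = \sqrt{\dfrac{PQ}{n}}\sqrt{\dfrac{N-n}{N-1}}$, where Q = 1 − P. If n < 0.05N, then the finite population
multiplier can be omitted such that

$\sigma_{\bar{p}} = \sqrt{\dfrac{PQ}{n}}$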
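
As a check on the standard error of the proportion, the sketch below enumerates the sample proportions for the population 9, 0, 27, 15, 12 (success = odd number, samples of size 3) and compares their standard deviation with the formula including the finite population multiplier. The helper `is_success` is an illustrative name.

```python
from itertools import combinations
from math import isclose, sqrt
from statistics import mean, pstdev

population = [9, 0, 27, 15, 12]
N, n = len(population), 3

def is_success(x: int) -> bool:
    """Success is defined as getting an odd number."""
    return x % 2 == 1

# Population proportion of success.
P = sum(is_success(x) for x in population) / N

# Proportion of success in each of the 10 possible samples.
sample_props = [sum(is_success(x) for x in s) / n for s in combinations(population, n)]

se_direct = pstdev(sample_props)
se_formula = sqrt(P * (1 - P) / n) * sqrt((N - n) / (N - 1))

print(P, mean(sample_props))
print(round(se_direct, 4), round(se_formula, 4))
```

Both routes give a standard error of 0.2 for this small population.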

Normal distribution of $\bar{x}$
The term distribution of $\bar{x}$ means the probability distribution of the sample mean treated as a random variable.
i) For a normal population, $\bar{x}$ is normally distributed irrespective of
the sample size.
ii) Even where the population is not normal, the distribution of $\bar{x}$ is practically normal if
the sample size is greater than 30. This is true for all populations usually encountered
in business problems. If n > 30, $\bar{x} \sim N(\mu_{\bar{x}}, \sigma_{\bar{x}})$
Assignment (3 pages A4, typed): Central limit theorem and its significance.

Using the normal distribution of $\bar{x}$

The distribution has mean $\mu_{\bar{x}} = \mu_x$ and standard deviation $\sigma_{\bar{x}}$.

To make use of the normal table we calculate z scores (we standardize):

$z = \dfrac{\bar{x} - \mu_x}{\sigma_{\bar{x}}}$, where $\sigma_{\bar{x}} = \sigma_x/\sqrt{n}$
Values of z calculated using this formula are used to enter the normal tables in the usual
manner.
Normal distribution of $\bar{p}$
Sample proportions $\bar{p}$ are approximately normally distributed when both np and nq > 5:

$Z = \dfrac{\bar{p} - P}{\sigma_{\bar{p}}}$

Statistical Estimation
Statistical estimation is the process of inferring the values of unknown population parameters from known sample
statistics. The type of sample statistic that is used to make inferences about a given type of
population parameter is called the estimator of that parameter.
An estimator is usually expressed as a rule or mathematical formula that tells us how an
estimate is calculated; the values generated are called estimates, each being a specific numeric
value. There are two types of estimates:
a. Point estimates
b. Interval estimates

Statisticians seek to use estimators that are likely to take on numeric values close to the
parameters of interest. Good estimators possess certain properties, viz.:

1. Unbiased
2. Consistent
3. Sufficient
4. Efficient

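Point Estimation
Characteristic of interest, with its population parameter and its estimator:

1) Mean
   Parameter: $\mu_x = \dfrac{\sum x}{N}$
   Estimator: $\bar{x} = \dfrac{\sum x}{n}$

2) Variance
   Parameter: $\sigma_x^2 = \dfrac{\sum (x - \mu_x)^2}{N}$
   Estimator: $s_x^2 = \dfrac{\sum (x - \bar{x})^2}{n-1}$

3) Standard deviation
   Parameter: $\sigma_x = \sqrt{\dfrac{\sum (x - \mu_x)^2}{N}}$
   Estimator: $s_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}}$

4) Standard error of the mean
   Parameter: $\sigma_{\bar{x}} = \dfrac{\sigma_x}{\sqrt{n}}$
   Estimator: $s_{\bar{x}} = \dfrac{s_x}{\sqrt{n}}$

5) Proportion of success (the proportion of failure is calculated in the same way)
   Parameter: P = number of successes in the population divided by N
   Estimator: $\bar{p}$ = sample proportion of success, given by the number of successes in the sample divided by n

6) Standard error of the proportion
   Parameter: $\sigma_{\bar{p}} = \sqrt{\dfrac{PQ}{n}}$
   Estimator: $s_{\bar{p}} = \sqrt{\dfrac{\bar{p}\bar{q}}{n}}$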

Question 1

For the random sample 1,2,4,5,7,11 compute the following:

(a) Sample standard deviation


(b) The estimate of the standard error of the mean.

Answers

(a) 3.633
(b) 1.483
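
The two answers can be reproduced with the standard library. This is a minimal sketch; `statistics.stdev` divides by n − 1, which matches the estimator $s_x$ used in these notes.

```python
from math import sqrt
from statistics import stdev

sample = [1, 2, 4, 5, 7, 11]

# (a) Sample standard deviation (divides by n - 1).
s_x = stdev(sample)

# (b) Estimated standard error of the mean: s_x / sqrt(n).
se_mean = s_x / sqrt(len(sample))

print(round(s_x, 3), round(se_mean, 3))
```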

Question 2

Let an even number be a success; suppose a random sample of 200 numbers selected
randomly from a population contains 120 even numbers. Write the symbol for and
compute the value of point estimate of the standard error of the proportion.

n = 200

$\bar{p} = 120/200 = 0.6$

$\bar{q} = 80/200 = 0.4$

Symbol: $s_{\bar{p}} = \sqrt{\dfrac{\bar{p}\bar{q}}{n}}$

$s_{\bar{p}} = \sqrt{\dfrac{(0.6)(0.4)}{200}} = 0.0346$

Unbiased estimator
An estimator is unbiased if its expected value equals the parameter value being estimated.

Efficiency: the property of an estimator whose variance is smaller than that of any other
estimator using the same sample size; that is, the statistic has a smaller standard error than
any other unbiased statistic for estimating the parameter of interest.
Consistency: the property of an estimator by which the probability that its value is near
the parameter value approaches unity as the sample size increases.
Sufficiency: this relates to the use of the information in the sample when computing the estimate. The
more of that information an estimator uses, the more sufficient it is said to be.

Question 3
The standard error of the mean was earlier defined as the standard deviation of the means of all
random samples of size n that can be selected from a population. Usually the number of
samples (NCn) is astronomically large. Can the standard error of the mean be estimated from the
elements in just one of these samples? If so, how is the estimate computed?
Question 4
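In a random sample of size n = 50, estimate the standard error of the mean given that
$\sum (x - \bar{x})^2 = 449$.

$s_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}} = \sqrt{\dfrac{449}{49}} = \sqrt{9.163} = 3.027$

$s_{\bar{x}} = \dfrac{s_x}{\sqrt{n}} = \dfrac{3.027}{\sqrt{50}} = 0.428$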
Question 5
In a random sample of 1,600 voters, 480 were Uhuru supporters. Estimate the standard
error of the proportion of Uhuru supporters.

n = 1600
$\bar{p} = 480/1600 = 0.3$
$\bar{q} = 1120/1600 = 0.7$
$s_{\bar{p}} = \sqrt{\dfrac{\bar{p}\bar{q}}{n}} = \sqrt{\dfrac{(0.3)(0.7)}{1600}} = 0.01146$
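
The point estimate of the standard error of the proportion used in Questions 2 and 5 can be wrapped in a small helper. This is a sketch; the function name `se_proportion` is an assumption made for illustration.

```python
from math import sqrt

def se_proportion(successes: int, n: int) -> float:
    """Estimated standard error of a sample proportion: sqrt(p * q / n)."""
    p = successes / n
    q = 1 - p
    return sqrt(p * q / n)

print(round(se_proportion(120, 200), 4))    # Question 2: 120 even numbers in 200
print(round(se_proportion(480, 1600), 5))   # Question 5: 480 supporters in 1,600
```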

Interval estimate
It describes a range of values within which a parameter might lie.
Interval estimate of a population mean
An interval estimate of μ is an interval of values A to B within which an unknown
population mean is expected to lie. The interval is an inference based upon:
i. The value of the mean $\bar{x}$ of a simple random sample
ii. Known facts about the sampling distribution of the mean

Confidence Interval Estimate of a Population Mean

A confidence interval estimate of a population mean μ is an interval estimate together with a
statement of how confident we are that the interval is correct. Several methods exist for
constructing a confidence interval estimate of the population mean. The choice of method
depends on:

a) The probability distribution of the population: normal or not?
b) Whether $\sigma_x$ is known or unknown
c) The sample size: large or small?
n < 30: small
n ≥ 30: large

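Case 1: Confidence interval estimate of μ, normal population, $\sigma_x$ known

If the population is normal and $\sigma_x$ is known, then

$\bar{x} \sim N(\mu_x, \sigma_{\bar{x}})$

$z = \dfrac{\bar{x} - \mu_x}{\sigma_{\bar{x}}}$

[Figure: normal curve showing central and tail areas; labels 0.475 on each side of the mean, 0.45 and 0.025]

Construct an interval around $\bar{x}$:

$\bar{x} - Z\dfrac{\sigma_x}{\sqrt{n}} \le \mu_x \le \bar{x} + Z\dfrac{\sigma_x}{\sqrt{n}}$

The proportion of correct statements, e.g. 0.95, in an interval specification is called the confidence
coefficient (c). The percentage value, i.e. 95%, is the confidence level. The proportion of
incorrect statements is symbolized by α; the sum of the proportions of correct and incorrect
statements is 1. Thus

$c + \alpha = 1$
$\alpha = 1 - c$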

C- refers to the chance that the confidence interval is correct

α- refers to the chance that the confidence interval is incorrect.

The procedure for obtaining the value of z corresponding to a stated confidence level can
be simplified by using a table that gives z values when tail areas- particularly right tail
areas are known.

In interval estimation, we use the notation Zα/2 ⇒ the value of z corresponding to a tail area
of size α/2.

Question

Normal population has a standard deviation of 10. A random sample of size 25 has a mean
of 50. Construct a 95% confidence interval estimate of the population mean.

Answer

46.08 ≤ µ ≤ 53.92

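Concluding remarks

$z = \dfrac{\bar{x} - \mu_x}{\sigma_{\bar{x}}}$

$\bar{x} - z\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z\,\sigma_{\bar{x}}$
$c + \alpha = 1$

The confidence interval estimate for a normal population with $\sigma_x$ known is given by the formula

$\bar{x} - z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}}$

Confidence interval estimate of µ, normal population, $\sigma_x$ known

The problem is to estimate μ using a confidence interval with confidence coefficient c.

Process

1. Collect a simple random sample of size n and compute $\bar{x}$.
2. Compute $\dfrac{\alpha}{2} = \dfrac{1-c}{2}$ and look up $Z_{\alpha/2}$.
3. Compute $Z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}}$.
4. Inference: $\bar{x} - Z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}}$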
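
The four-step process can be sketched in code. The function below is an illustration under the stated assumptions (normal population, known standard deviation); the name `mean_ci_known_sigma` is hypothetical, and the critical value comes from `NormalDist.inv_cdf` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(xbar: float, sigma: float, n: int, confidence: float):
    """Confidence interval for the mean of a normal population with known sigma."""
    alpha = 1 - confidence                      # step 2: alpha = 1 - c
    z = NormalDist().inv_cdf(1 - alpha / 2)     # step 2: Z for a right-tail area of alpha/2
    e = z * sigma / sqrt(n)                     # step 3: Z * sigma / sqrt(n)
    return xbar - e, xbar + e                   # step 4: the interval

# Earlier question: sigma = 10, n = 25, sample mean = 50, 95% confidence.
low, high = mean_ci_known_sigma(50, 10, 25, 0.95)
print(round(low, 2), round(high, 2))
```

With these inputs the interval rounds to 46.08 and 53.92, the answer quoted for that question.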

Question

A normal population has a standard deviation of 10. A random sample of size 15 selected from the
population has a mean of 52.5. Construct the following confidence intervals:

a) 80%
b) 90%
c) 97.5%
d) 98%
e) 99%

$\bar{x} - z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}}$

Answer (a)

$49.195 \le \mu \le 55.805$

Case 2: Normal population, $\sigma_x$ unknown, n small

When the standard deviation of the population is known: $z = \dfrac{\bar{x} - \mu_x}{\sigma_{\bar{x}}}$ ..... (i)

When the standard deviation is not known: $\dfrac{\bar{x} - \mu_x}{s_{\bar{x}}}$ ..... (ii)

Equation (i) has only one source of variation: the value of z varies from one sample to
the next because each sample has a different mean ($\bar{x}$).

Equation (ii) has two sources of variation, the sample mean ($\bar{x}$) and the sample standard
deviation ($s_x$), both of which change from sample to sample.

The term $\dfrac{\bar{x} - \mu_x}{s_{\bar{x}}}$ therefore follows a sampling distribution different from the normal distribution
followed by $\dfrac{\bar{x} - \mu_x}{\sigma_{\bar{x}}}$: it is not normal but follows Student's t-distribution.

Student’s t-Distribution

In the early 1900s W. S. Gosset, an employee of the Guinness Brewery in Dublin, Ireland, studied
the distribution of t. He published his results under the pen name (pseudonym) "Student".
He used this pen name because his employer forbade publication of
scientific research of this nature by employees under their own names.

Consequently the t-distribution is often referred to as Student's t-distribution.

Properties of t-distribution

i) The t-distribution is symmetrical with a mean of 0, just like the standard normal
distribution. Its graph is similar to that of the standard normal distribution.
ii) Though symmetrical, t-distributions have higher tails than the standard normal
distribution. Consequently, the t value for a given right-tail area is greater than
the Z value for the same area. In general the t-distribution is less peaked
at the centre and higher in the tails than the standard normal distribution.
iii) In general it has a standard deviation > 1, but this standard deviation approaches
1 as the sample size increases.
iv) The t-distribution is really a family of distributions, since there is a different
distribution for each value of the degrees of freedom (V).
In the one-sample case V = n − 1, while in the two-sample case
V = (n1 − 1) + (n2 − 1) = n1 + n2 − 2.
v) The t-distribution approaches the normal distribution as n increases;
if n ≥ 30, use normal (z) instead of t.

Using the t tables

To use the t-table one needs to know the one-tail area (α/2) and the degrees of freedom V. The notation $t_{\alpha/2,\,V}$
⇒ the t value corresponding to a right-tail area of size α/2 where the degrees of freedom
equal V

e.g.

t 0.01,4= 3.75

t0.025,19=2.09

Observations

1. t- Values for each tail area decrease as V Increases (sample size increases).
2. t- Value approaches Z value as n approaches infinity.
3. t- values are available for V=1 to V=30
For more than 30 degrees of freedom we use standard normal distribution

In conclusion, for a normal population with $\sigma_x$ unknown and n small, the
confidence interval estimate of µ is given by the formula:

$\bar{x} - t_{\alpha/2,\,V}\dfrac{s_x}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,V}\dfrac{s_x}{\sqrt{n}}$

Where:

$\bar{x}$ is the sample mean

V is the degrees of freedom (n − 1)

n is the sample size

$s_x$ is the sample standard deviation: $s_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}}$

Question

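n = 20, $\bar{x}$ = 925, $s_x$ = 34.6

What is the 95% confidence interval?

$\bar{x} - t_{\alpha/2,\,V}\dfrac{s_x}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,V}\dfrac{s_x}{\sqrt{n}}$

$\dfrac{\alpha}{2} = \dfrac{1-c}{2} = \dfrac{1-0.95}{2} = 0.025$

$\dfrac{s_x}{\sqrt{n}} = \dfrac{34.6}{\sqrt{20}} = 7.737$

$V = n - 1 = 20 - 1 = 19$

$t_{0.025,\,19} = 2.09$
$925 - 2.09(7.737) \le \mu \le 925 + 2.09(7.737)$
$908.83 \le \mu \le 941.17$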
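
The small-sample interval can be sketched as follows. Python's standard library has no t-distribution quantile function, so the critical value is passed in by hand; here it is the table value $t_{0.025,\,19} = 2.09$ listed under "Using the t tables". The function name `mean_ci_t` is an illustrative assumption.

```python
from math import sqrt

def mean_ci_t(xbar: float, s: float, n: int, t_crit: float):
    """Confidence interval for the mean using a t critical value looked up from a table."""
    e = t_crit * s / sqrt(n)   # t * s_x / sqrt(n)
    return xbar - e, xbar + e

# n = 20, sample mean = 925, s_x = 34.6, V = 19, table value t(0.025, 19) = 2.09.
low, high = mean_ci_t(925, 34.6, 20, t_crit=2.09)
print(round(low, 2), round(high, 2))
```

With the table value 2.09 the interval is roughly 908.8 to 941.2; a different critical value changes the width in direct proportion.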

Case 3: $\sigma_x$ unknown, n large (n > 30) (in applications this is perhaps the most common case)

$\bar{x} - z_{\alpha/2}\dfrac{s_x}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{s_x}{\sqrt{n}}$

Question

n = 35
mean ($\bar{x}$) = 20
$s_x$ = 20
$\dfrac{\alpha}{2} = \dfrac{1 - 0.95}{2} = 0.025$
$t_{0.025,\,34} \approx 2.04$

Precision, confidence and sample size

The narrower a confidence interval, the more precise it is said to be; the wider, the less
precise. Precision of an interval depends on 3 variables:

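a. Level of confidence: the lower the confidence level, the more precise the interval
estimate.
b. Variability of the population: the more variable the population, the less precise.
c. Sample size: the larger the sample size, the more precise.

$\bar{x} - Z_{\alpha/2}\dfrac{s_x}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\dfrac{s_x}{\sqrt{n}}$

$\mu = \bar{x} \pm Z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}} = \bar{x} \pm e$

where e, the degree of precision, is

$e = Z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}}$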

Confidence Interval Estimate of a Population Proportion

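Population proportion of success → P, with P + Q = 1

Population proportion of failure → Q, with Q = 1 − P

$\bar{p}$ = sample proportion of success → serves as an estimator of the population proportion of success:

$\bar{p}$ = number of successes in the sample ÷ n

Sample proportion of failure → $\bar{q}$

$\bar{p} + \bar{q} = 1$

$\bar{q} = 1 - \bar{p}$

$s_{\bar{p}} = \sqrt{\dfrac{\bar{p}\bar{q}}{n}}$

$\bar{p} \sim N(\mu_{\bar{p}} = P,\ \sigma_{\bar{p}})$, if both np and nq exceed 5.

The approximate confidence interval for a population proportion is given by the formula

$\bar{p} - z_{\alpha/2}\sqrt{\dfrac{\bar{p}\bar{q}}{n}} \le P \le \bar{p} + z_{\alpha/2}\sqrt{\dfrac{\bar{p}\bar{q}}{n}}$
with $n\bar{p},\ n\bar{q} \ge 15$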

Question

Last November, a random sample of 400 members of the labour force in Nyanza Province
showed that 32 were unemployed. Construct the 95% confidence interval for the proportion
employed in the region.

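$\bar{p} - Z_{\alpha/2}\sqrt{\dfrac{\bar{p}\bar{q}}{n}} \le P \le \bar{p} + Z_{\alpha/2}\sqrt{\dfrac{\bar{p}\bar{q}}{n}}$

$\bar{p} = 368/400 = 0.92$
$\bar{q} = 32/400 = 0.08$
$Z_{0.025} = 1.96$

$s_{\bar{p}} = \sqrt{\dfrac{(0.92)(0.08)}{400}} = 0.01356$

$0.92 \pm 1.96(0.01356)$
$0.893 \le P \le 0.947$
$89.3\% \le P \le 94.7\%$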
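
The same calculation can be sketched in code. The function name `proportion_ci` is an assumption, and the critical value is taken from `NormalDist.inv_cdf` instead of the printed table, so the last decimal places may differ slightly from a hand calculation with 1.96.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float):
    """Approximate (large-sample) confidence interval for a population proportion."""
    p = successes / n
    q = 1 - p
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z for a right-tail area of alpha/2
    e = z * sqrt(p * q / n)
    return p - e, p + e

# 400 sampled, 32 unemployed, so 368 employed ("success" = employed).
low, high = proportion_ci(400 - 32, 400, 0.95)
print(round(low, 3), round(high, 3))
```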

Sample size estimation in interval construction

Reasons for using samples:

1. Cost cutting
2. Time saving
3. Sampling is justified where the exact size of the population is unknown
4. If it is impractical to carry out a census
5. If the elements studied are damaged or destroyed by the study
6. If in-depth analysis is needed

The larger the sample, the more reliable the result.

How large should a sample be?

Sample size for constructing a C.I. estimate of μ: $\mu = \bar{x} \pm z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}}$

Degree of precision: $e = Z_{\alpha/2}\dfrac{\sigma_x}{\sqrt{n}}$

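Solving for n:

$n = \left(\dfrac{Z_{\alpha/2}\,\sigma_x}{e}\right)^2$

- If given a range of $\sigma_x$ values, use the larger value.
- If an estimate $s_x$ is available from earlier surveys, use that.
- If given a range of x values (high and low values), treat the range as about four standard deviations:
$4\sigma = H_o - L_o$, so $\sigma \approx \dfrac{H_o - L_o}{4}$
where $H_o$ is the official high value and $L_o$ the official low value.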

Question

If mean = 60 and standard deviation = 10, find the proportion lying within two standard
deviations of the mean:

$z = \dfrac{x - \mu_x}{\sigma_x} = \dfrac{40 - 60}{10} = -2$
The area from 0 to 2 is 0.4772, so the proportion = 0.4772 × 2 = 0.9544 = 95.44%

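Sample size selection

1. Sample size for estimating a population mean:

$n = \left(\dfrac{z_{\alpha/2}\,\sigma_x}{e}\right)^2$

2. Sample size for the purpose of estimating a population proportion confidence interval:

$\bar{p} - z_{\alpha/2}\sqrt{\dfrac{PQ}{n}} \le P \le \bar{p} + z_{\alpha/2}\sqrt{\dfrac{PQ}{n}}$

$P = \bar{p} \pm Z_{\alpha/2}\sqrt{\dfrac{PQ}{n}}$

$P = \bar{p} \pm e$

Since P has not been estimated as yet, guess values for the parameters P and Q:

$e = z_{\alpha/2}\sqrt{\dfrac{PQ}{n}}$
$e^2 = \dfrac{Z_{\alpha/2}^2\,PQ}{n}$
$n = \dfrac{Z_{\alpha/2}^2\,PQ}{e^2}$

- Choosing P: if earlier surveys have an estimate for P, use it.
- If given a range for P, use the value closest to 0.5, e.g. given 0.6 and 0.9, use 0.6.
- The largest sample size is obtained when the guessed P is exactly 0.5.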

Question

A sample is to be taken to estimate the mean salary of plumbers to within ±$500 with a
confidence of 0.99. A plumbers’ union official states that $40,000 and $26,000 would be
unusually large and small salaries for plumbers respectively. What should the sample size
be?

Answer: 327
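
The answer can be traced with a short sketch. It assumes the range rule from these notes (σ ≈ (high − low)/4) and the rounded table value z = 2.58 for 99% confidence; a more exact critical value (about 2.576) would give a slightly smaller n. The function name `sample_size_for_mean` is hypothetical.

```python
from math import ceil

def sample_size_for_mean(z: float, sigma: float, e: float) -> int:
    """n = (z * sigma / e)^2, rounded up to the next whole observation."""
    return ceil((z * sigma / e) ** 2)

# Range rule: sigma is about (unusually high - unusually low) / 4.
sigma_guess = (40_000 - 26_000) / 4   # 3500

# z = 2.58 is the rounded table value assumed here for 99% confidence.
n = sample_size_for_mean(z=2.58, sigma=sigma_guess, e=500)
print(n)
```

Rounding up rather than to the nearest integer keeps the precision at least as tight as requested.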

Question

An advertising executive thinks that the proportion of consumers who have seen his
company’s advertising in newspapers is between 0.65 and 0.85. The executive wants to
estimate the consumer population proportion to within ±0.05 and have 98 percent
confidence in the estimate. How large a sample should be taken? State the safer sample
size.

Answer

Safer Sample = 495
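
A sketch of the calculation, assuming the rounded table value z = 2.33 for 98% confidence and the rule of using the guessed P closest to 0.5 (here 0.65). The function name `sample_size_for_proportion` is an illustrative choice.

```python
from math import ceil

def sample_size_for_proportion(z: float, p: float, e: float) -> int:
    """n = z^2 * P * Q / e^2, rounded up to the next whole observation."""
    return ceil(z ** 2 * p * (1 - p) / e ** 2)

# z = 2.33 is the rounded table value assumed here for 98% confidence.
n_safer = sample_size_for_proportion(z=2.33, p=0.65, e=0.05)  # P closest to 0.5
n_other = sample_size_for_proportion(z=2.33, p=0.85, e=0.05)  # other end of the range
print(n_safer, n_other)
```

The end of the range nearer 0.5 gives the larger, safer sample size because PQ is largest there.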

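Interval Estimation for the difference between two population means ($\mu_1 - \mu_2$)

Case 1: Normal populations, σ1 and σ2 known

$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$

where:

$\bar{x}_1 = \dfrac{\sum x_1}{n_1}$

$\bar{x}_2 = \dfrac{\sum x_2}{n_2}$

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$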

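Case 2: population distributions normal or not, standard deviations not known, n1, n2
large

$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\,s_{\bar{x}_1 - \bar{x}_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\,s_{\bar{x}_1 - \bar{x}_2}$

where:

$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

Case 3: populations normal, n1, n2 small, but with the assumption made that σ1 = σ2

$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,V}\,s_{\bar{x}_1 - \bar{x}_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,V}\,s_{\bar{x}_1 - \bar{x}_2}$

where:

$s_1^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}$

$s_2^2 = \dfrac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$

$V = n_1 + n_2 - 2$

$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$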

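Interval Estimation for the Difference between Two Population Proportions

$(\bar{p}_1 - \bar{p}_2) - z_{\alpha/2}\,s_{\bar{p}_1 - \bar{p}_2} \le P_1 - P_2 \le (\bar{p}_1 - \bar{p}_2) + z_{\alpha/2}\,s_{\bar{p}_1 - \bar{p}_2}$

where

$\bar{p}_1 = \dfrac{\text{number of successes in sample 1}}{n_1}$

$\bar{p}_2 = \dfrac{\text{number of successes in sample 2}}{n_2}$

$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\dfrac{\bar{p}_1\bar{q}_1}{n_1} + \dfrac{\bar{p}_2\bar{q}_2}{n_2}}$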

Question

Maendeleo manufacturing company produces a synthetic fibre at two factories located at


different parts of the country. Every effort is made to maintain uniformity of production
between the two factories with respect to the mean breaking strength of the fibre. To
determine if the two factories are maintaining uniformity of production, the manager
selected a sample of 25 specimens from factory 1 and, 16 from 2. The objective is to
construct a confidence interval for the difference between the two means. The mean
breaking strength of the sample from factory 1 is 22 lbs. and from 2 is 20 lbs.

The variance in both factories is known to be 10 lbs². The populations are normally
distributed. Construct a 95% confidence interval estimate for the difference between the two
means.

Answer

$0.02 \le \mu_1 - \mu_2 \le 3.98$, i.e. approximately $0 \le \mu_1 - \mu_2 \le 4$
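
The interval for the two factories can be reproduced with a short sketch of Case 1 (normal populations, known variances). The function name `diff_means_ci` is an assumption, and the critical value is computed with `NormalDist.inv_cdf` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(x1, x2, var1, var2, n1, n2, confidence):
    """CI for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(var1 / n1 + var2 / n2)   # standard error of the difference in means
    d = x1 - x2
    return d - z * se, d + z * se

# Factory 1: n = 25, mean 22 lbs; factory 2: n = 16, mean 20 lbs; variance 10 in both.
low, high = diff_means_ci(22, 20, 10, 10, 25, 16, 0.95)
print(round(low, 3), round(high, 3))
```

The unrounded limits are about 0.016 and 3.984, which is where the rounded answer of roughly 0 to 4 comes from.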

Question

There are two fertilizers; one of the two is applied in each field. At harvest time, a random
sample of 25 cabbages is selected from the crop grown with fertilizer 1, and 12 cabbages grown
with fertilizer 2 are also selected. The sample mean and variance of the weights of the cabbages
grown with fertilizer 1 are 44.1 oz and 36 oz². For fertilizer 2:

$\bar{x}_2 = 31.7$ oz, $s_2^2 = 44$ oz²

The experiment assumes that the two populations of weights are normally distributed and that the two
population variances are equal. Compute the 95% confidence interval estimate for µ1 − µ2.
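
As a starting point for this question, the sketch below computes only the pooled standard error and the degrees of freedom from the Case 3 formulas. It treats 36 and 44 as the two sample variances, which is an assumption about the garbled notation; the critical value still has to be looked up and is deliberately left out. The function name `pooled_se` is hypothetical.

```python
from math import sqrt

def pooled_se(var1: float, n1: int, var2: float, n2: int):
    """Pooled standard error of x1bar - x2bar and its degrees of freedom (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    return sqrt(pooled_var * (1 / n1 + 1 / n2)), df

# Fertilizer 1: n = 25, variance 36; fertilizer 2: n = 12, variance 44.
se, df = pooled_se(36, 25, 44, 12)
print(round(se, 4), df)
```

The interval is then (44.1 − 31.7) ± (critical value) × `se`, with the critical value chosen for `df` degrees of freedom.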
