is within 10% of the true average retail price in the city.
An SRS will be taken from an available list
of all outlets. Another survey from the same population showed an average price of $7.00 for 20
items, with a standard deviation of $1.40. Assuming a 99.7% confidence level, determine the sample
size.
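Solution: $N = 2500$, $s = 1.4$, $s^2 = (1.4)^2$, and $Z = 3$ for 99.7% confidence.
\[ CV^2(y) = \frac{s^2}{\bar{y}^2} = \frac{(1.4)^2}{7^2} = 0.04 \]
\[ n_0 = \frac{Z^2\, CV^2(y)}{\varepsilon^2} = \frac{(3)^2 (0.04)}{(0.1)^2} = 36, \qquad \frac{n_0}{N} = \frac{36}{2500} = 0.0144 < 5\% \]
Therefore $n = n_0 = 36$ is a good approximation for the sample size. If you calculate $n$ exactly, you will get
\[ n = \frac{n_0}{1 + \frac{n_0}{N}} = \frac{36}{1 + \frac{36}{2500}} = \frac{36}{1.0144} = 35.5 \approx 36. \]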
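The arithmetic in this solution can be checked with a short script. This is a minimal sketch, not part of the original notes: the function name `sample_size_relative` and its parameter names are illustrative assumptions, and the relative error is taken to be 0.10 as stated in the problem.

```python
import math

def sample_size_relative(z, cv, rel_error, N):
    """Sample size for estimating a mean under SRS without replacement.

    z         : normal deviate for the chosen confidence level
    cv        : element coefficient of variation, s / ybar
    rel_error : tolerated relative error of the estimate
    N         : population size
    Returns (n0, n): the first approximation and the fpc-adjusted size.
    """
    n0 = (z * cv / rel_error) ** 2
    n = n0 / (1 + n0 / N)
    return n0, n

# Values from the worked example: s = 1.4, ybar = 7, Z = 3, N = 2500.
n0, n = sample_size_relative(z=3, cv=1.4 / 7.0, rel_error=0.10, N=2500)
print(round(n0, 2), round(n, 2), math.ceil(n))
```

Rounding the adjusted value up to the next whole unit reproduces the sample size of 36 obtained above.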
Chapter 3 Sampling Proportion (for categorical data)
Definitions: In some cases the nature of the survey may require recording attributes, which are
expressed qualitatively. Qualitative information can be quantified by counting the units that possess the
attribute. Such attributes take various forms, such as living in an urban or rural area, being male
or female, married or unmarried, literate or illiterate, an adult between 18 and 45 years or an adult over 45 years,
etc.
The main interest for such attributes is therefore to estimate the total number of units and the
proportion of units in the population possessing some characteristic. Attributes can be changed into
quantifiable information by allocating the score "1" or "0", while measurable variables can also be changed
into attributes by categorizing the population into different groups. It is worth presenting the special simple
form that the variance of a proportion takes when the design is simple random sampling.
Notation
Consider a population in which each member is classified as either having or not having a specified attribute,
i.e., a two-category population. Let C denote the class of units that have the attribute.
Population
N = total number of units in the population
A = total number of units in the specified category (in C)
P = A/N is the population proportion, i.e., the proportion (percentage) of the entire population that has the
specified attribute.
Q = 1 - P is the proportion of units not in C
Sample:
n= total number of members in the sample
a= the total number of members sampled that have the specified attribute,
p= a/n sample proportion, i.e. the proportion (percentage) of a sample from the population that has the
specified attribute.
q= 1-p proportion of sample members not in C.
Variance and Standard Errors of the Estimates
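For any unit in the population or in the sample, we define an observation (variable) $y_i$ as follows to facilitate
counting:
\[ y_i = \begin{cases} 1, & \text{if the unit is in C} \\ 0, & \text{if the unit is not in C} \end{cases} \]
For the population,
\[ Y = \sum_{i=1}^{N} Y_i = A, \qquad \bar{Y} = \frac{\sum_{i=1}^{N} Y_i}{N} = \frac{A}{N} = P, \qquad S^2 = \frac{NPQ}{N-1}, \]
and for the sample,
\[ y = \sum_{i=1}^{n} y_i = a, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{a}{n} = p, \qquad s^2 = \frac{npq}{n-1} \quad \text{(verify).} \]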
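The "verify" step for the sample variance can be checked numerically. The sketch below uses a small made-up 0/1 sample (the data are hypothetical, not from the notes) and compares the usual sample variance of the indicator values with the closed form npq/(n-1).

```python
import math
import statistics

# Hypothetical indicator data: 1 = unit is in class C, 0 = unit is not in C.
sample = [1, 1, 0, 1, 0]

n = len(sample)
a = sum(sample)          # number of sampled units in C
p = a / n                # sample proportion
q = 1 - p

s2_direct = statistics.variance(sample)   # sum((y_i - ybar)^2) / (n - 1)
s2_formula = n * p * q / (n - 1)          # closed form for 0/1 data

print(a, p, s2_direct, s2_formula)
```

The two variances agree because, for 0/1 data, the sum of squares equals the sum itself, so the sum of squared deviations reduces to npq.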
Similar to a continuous case, a sample proportion, p, is also a random variable that depends on what
members of the population are included in that sample.
Theorem 5: The sample proportion, p = a/n, is an unbiased estimate of the population proportion P = A/N, i.e., $E(p) = P$.
Prove this theorem.
Theorem 6: The variance of the sample proportion or percentage (p) is given by
\[ \mathrm{Var}(p) = E(p - P)^2 = \frac{PQ}{n} \left( \frac{N-n}{N-1} \right). \]
Prove this theorem.
Corollary: i) The estimated total number of units in class C, $\hat{A} = Np$, is an unbiased estimate of A.
ii) The variance of the estimated total number of units in class C is
\[ \mathrm{Var}(\hat{A}) = \frac{N^2 PQ}{n} \left( \frac{N-n}{N-1} \right). \]
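Theorems 5 and 6 can be illustrated, though not proved, by listing every possible simple random sample from a tiny population. The sketch below assumes a hypothetical population of N = 6 units with A = 3 in class C and samples of size n = 3; exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical two-category population: 1 = in class C, 0 = not in C.
population = [1, 1, 0, 0, 0, 1]
N, n = len(population), 3

P = Fraction(sum(population), N)
Q = 1 - P

# Sample proportion p for every one of the C(N, n) equally likely samples.
props = [Fraction(sum(s), n) for s in combinations(population, n)]

mean_p = sum(props) / len(props)                       # E(p)
var_p = sum((p - P) ** 2 for p in props) / len(props)  # E(p - P)^2
theory = P * Q / n * Fraction(N - n, N - 1)            # Theorem 6

print(mean_p == P, var_p == theory, var_p)
```

Multiplying each p by N gives the corresponding check for the estimated total, whose variance is N squared times Var(p).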
Estimation of the Standard error from the sample
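Theorem 7: An unbiased estimate of the variance of p, computed from the sample, is
\[ \widehat{\mathrm{Var}}(p) = \frac{pq}{n-1} \left( \frac{N-n}{N} \right) = \frac{pq}{n-1}(1-f), \qquad f = \frac{n}{N}. \]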
If N is large relative to n, the finite population correction (1-f) is negligible and the estimated variance of p is
\[ \widehat{\mathrm{Var}}(p) \approx \frac{pq}{n-1} \quad \text{(verify).} \]
Corollary: The estimated variance of the estimated total number of members in the specified category is given by
\[ \widehat{\mathrm{Var}}(\hat{A}) = \frac{N(N-n)}{n-1}\, pq. \]
In each case we can get the standard error by taking the square root of the variance.
Example: See Cochran 3rd edition page 52
For the proportion estimate, the confidence limits can be obtained from $p \pm Z\,\mathrm{S.E}(p)$ for large sample sizes; substitute
S.E(p) by s.e(p) to get the confidence interval $p \pm Z\,\mathrm{s.e}(p)$. A slight improvement can be achieved by applying the
continuity correction for the normal approximation to the binomial, i.e., $p \pm \left[ Z\,\mathrm{s.e}(p) + \frac{1}{2n} \right]$.
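As a sketch of how these limits might be computed, the helper below estimates the standard error of p with the finite population correction and returns normal-approximation limits, optionally widened by the continuity correction 1/(2n). The function name `proportion_ci` and the input values are hypothetical illustrations, not part of the notes.

```python
import math

def proportion_ci(a, n, N, z=1.96, continuity=False):
    """Normal-approximation confidence limits for a proportion under SRS.

    a : number of sampled units with the attribute
    n : sample size, N : population size
    z : normal deviate for the chosen confidence level
    """
    p = a / n
    q = 1 - p
    f = n / N                                  # sampling fraction
    se = math.sqrt((1 - f) * p * q / (n - 1))  # estimated s.e.(p)
    half = z * se
    if continuity:
        half += 1 / (2 * n)                    # continuity correction
    return p, se, (p - half, p + half)

# Hypothetical data: 40 of 100 sampled units have the attribute, N = 1000.
p, se, limits = proportion_ci(a=40, n=100, N=1000)
_, _, limits_cc = proportion_ci(a=40, n=100, N=1000, continuity=True)
print(p, round(se, 4), limits, limits_cc)
```

The corrected interval is always slightly wider; the difference matters mainly for small n.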
Relative Error
Statistical measures such as standard deviation and the standard error appear in the units of measurement of
variables. Such measurement units may cause difficulties in making some comparisons. Relative measures,
such as coefficients of variation, can be used to overcome the problems.
The element coefficient of variation can be expressed as $CV(y) = S_y / \bar{Y}$ and estimated by $cv(y) = s_y / \bar{y}$. For
the mean, the coefficient of variation is given by $CV(\bar{y}) = \mathrm{S.E}(\bar{y}) / \bar{Y}$ and estimated by $cv(\bar{y}) = \mathrm{s.e}(\bar{y}) / \bar{y}$. For the
total, the coefficient of variation is given by $CV(\hat{Y}) = \mathrm{S.E}(\hat{Y}) / E(\hat{Y})$ and estimated by
\[ cv(\hat{Y}) = \frac{N\,\mathrm{s.e}(\bar{y})}{N\bar{y}} = \frac{\mathrm{s.e}(\bar{y})}{\bar{y}}, \]
which is the same as the coefficient of variation of the mean.
For a proportion (p), we can write the coefficient of variation as
\[ CV(p) = \frac{\sqrt{\dfrac{PQ(N-n)}{n(N-1)}}}{P} = \sqrt{\frac{Q(N-n)}{nP(N-1)}}, \]
which is approximately equal to $\sqrt{Q/(nP)}$ if the finite population correction (1-f) is ignored. Its estimate is given as:
\[ cv(p) = \frac{\mathrm{s.e}(p)}{p} = \frac{\sqrt{\dfrac{pq(N-n)}{(n-1)N}}}{p} = \sqrt{(1-f)\,\frac{q}{(n-1)p}}. \]
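Generally, the coefficient of variation of an estimator $\hat{\theta}$ is given by $CV(\hat{\theta}) = \mathrm{S.E}(\hat{\theta}) / E(\hat{\theta})$, and its square is known
as the rel-variance, i.e.,
\[ CV^2(\hat{\theta}) = \frac{\mathrm{Var}(\hat{\theta})}{\left[ E(\hat{\theta}) \right]^2}. \]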
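A short sketch can show how the estimated coefficient of variation of a proportion behaves. The function name `cv_proportion` and the values of p, n and N are hypothetical, chosen only to illustrate that, for a fixed sample size, a rarer attribute has a larger relative error.

```python
import math

def cv_proportion(p, n, N):
    """Estimated coefficient of variation of a sample proportion under SRS."""
    q = 1 - p
    f = n / N                                   # sampling fraction
    return math.sqrt((1 - f) * q / ((n - 1) * p))

# Hypothetical comparison at n = 100, N = 10000.
for p in (0.5, 0.2, 0.05):
    print(p, round(cv_proportion(p, n=100, N=10_000), 3))
```

Because cv(p) is unit-free, the values can be compared directly across attributes, which is the point of using a relative measure.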
Sample Size Determination
For Categorical Data
The sample size required for estimating a population proportion (P) can be obtained in a similar way, and has a
similar form, to that shown above for the mean. Assume that the proportion estimate p is normally
distributed, with absolute margin of error $d$ or relative error $\varepsilon$. The sample size n can then be calculated by
\[ n = \frac{Z^2 PQ / d^2}{1 - \dfrac{1}{N} + \dfrac{Z^2 PQ}{N d^2}} \quad \text{(verify this).} \]
If we put $n_0 = Z^2 PQ / d^2$, then we get
\[ n = \frac{n_0}{1 - \dfrac{1}{N} + \dfrac{n_0}{N}}. \]
For a large population size (N) we have the sample size
\[ n \approx \frac{n_0}{1 + \dfrac{n_0}{N}}, \]
and we can approximate n by $n_0$ as we have done for the mean.
Using the relative error ($\varepsilon$) and the relation $d = \varepsilon P$, we set $n_0 = Z^2 Q / (P \varepsilon^2)$.
In practice the population parameters (P, $S^2$, or CV) must be estimated, and the other factors are usually set by the investigator
(researcher). The relation shows the following summary points.
i) The smaller we make the margin of error ($d$ or $\varepsilon$), the greater the sample size n will be.
ii) If the degree of confidence ($1 - \alpha$) increases, the sample size increases.
iii) Since the population parameters are unknown, calculate $n_0$ by using the sample estimates. That is,
\[ n_0 = \frac{Z^2 pq}{d^2}, \qquad n_0 = \frac{Z^2 s_y^2}{d^2}, \qquad \text{or} \qquad n_0 = \frac{Z^2\, cv^2(y)}{\varepsilon^2}. \]
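The sample size relations for a proportion can be sketched in code as follows. The function name `sample_size_proportion`, the planning value p = 0.5, and the population size are hypothetical choices for illustration; the function returns the first approximation n0 together with both finite-population adjustments discussed in this section.

```python
import math

def sample_size_proportion(p, d, N, z=1.96):
    """Sample size for estimating a proportion with absolute margin d.

    Returns (n0, n_exact, n_approx):
      n0       = z^2 p q / d^2
      n_exact  = n0 / (1 - 1/N + n0/N)
      n_approx = n0 / (1 + n0/N)      (large-N simplification)
    """
    q = 1 - p
    n0 = z ** 2 * p * q / d ** 2
    n_exact = n0 / (1 + (n0 - 1) / N)
    n_approx = n0 / (1 + n0 / N)
    return n0, n_exact, n_approx

# A smaller margin of error d requires a larger sample.
for d in (0.10, 0.05, 0.02):
    n0, n_exact, n_approx = sample_size_proportion(p=0.5, d=d, N=10_000)
    print(d, round(n0, 1), math.ceil(n_exact), math.ceil(n_approx))
```

The two adjusted values differ only through the 1/N term, so they agree closely unless N is small.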
How do we get estimates of the population parameters in order to use them in sample size
determination? In actual practice, there are four possible ways of estimating the parameters.
i) By taking a simple random sample of size n1, a small preliminary sample, from which estimates of $S^2$ or P and the required n
will be obtained. This method gives the most reliable estimates, but it slows the completion of the
survey and because of this it is not often used.
ii) By using the results of a pilot survey: to design a large sample efficiently in an unknown field, a pilot
study may be conducted prior to the survey to gain information for designing the survey; such a study also
serves many other purposes.
iii) By using the results of previous surveys: we should search for data from past surveys of similar
variables and make use of them after adjusting for changes over time.
iv) By guesswork about the nature of the population: this requires educated guesses or the services of
experts such as survey statisticians, supported by specialists in the subject matter concerned, who may
construct a model of the population distribution, its shape, and its probable limits, and deduce the required estimates from
it.
Reading Assignment: Read Cochran 3rd ed, chapter 4, section 4.7, page 78-81.
Example.
2. A teacher training institute is interested in estimating the proportion (P) of teachers who consider the two-semester
system to be more suitable than the 3-term system of education. An SRS of n = 120 teachers is
taken, without replacement, from a total of N = 1200 teachers. Some of the teachers are in favor of two
semesters while others are not, and it is found that 72 teachers are in favor of the semester system.
i) Estimate the proportion P along with the standard error of your estimate.
ii) Calculate the 95% confidence interval for P.
iii) Do you think the sample size 120 is sufficient if the tolerable error could be 0.08? If not, how many
more units should be included in the sample?
Solution:
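n = 120, a = 72, N = 1200, f = n/N = 0.1
i) p = a/n = 72/120 = 0.6, with estimated standard error
\[ \mathrm{s.e}(p) = \sqrt{\frac{pq}{n-1}(1-f)} = \sqrt{\frac{(0.6)(0.4)}{119}(0.9)} \approx 0.0426 \]
ii) 95% confidence limits: $p \pm 1.96\,\mathrm{s.e}(p) = 0.6 \pm 1.96(0.0426) = 0.6 \pm 0.0835$, i.e., (0.5165, 0.6835).
Therefore the proportion of teachers in the institute favoring the semester system is likely to be between about
52% and 68%. The estimate of the total number of teachers who are in favor of the two-semester system is
$\hat{A} = Np = 1200(0.6) = 720$.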
iii)
\[ n_0 = \frac{Z^2 pq}{d^2} = \frac{(1.96)^2 (0.6)(0.4)}{(0.08)^2} \approx 144, \qquad \frac{n_0}{N} = \frac{144}{1200} = 0.12 > 5\% \]
Therefore n can be estimated as
\[ n = \frac{n_0}{1 + \dfrac{n_0}{N}} = \frac{144}{1 + \dfrac{144}{1200}} = \frac{144}{1.12} = 128.57 \approx 129 \]
Therefore 120 is not sufficient for achieving the given precision, meaning that 9 more teachers need to be
selected.
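The whole example can be reproduced with a few lines of code. This is a sketch under the assumption that the standard error is estimated with the finite population correction and n - 1 in the denominator, as in Theorem 7; variable names are illustrative. Note that the script carries n0 unrounded (about 144.06), which still leads to the same required sample size.

```python
import math

# Data from the example: SRS of n teachers from N, a in favor.
N, n, a = 1200, 120, 72
z, d = 1.96, 0.08            # 95% normal deviate and tolerable error

p = a / n
q = 1 - p
f = n / N
se = math.sqrt((1 - f) * p * q / (n - 1))   # estimated s.e.(p)
ci = (p - z * se, p + z * se)               # 95% confidence limits
total_hat = N * p                           # estimated total in favor

n0 = z ** 2 * p * q / d ** 2                # first approximation
n_req = math.ceil(n0 / (1 + n0 / N))        # fpc-adjusted, rounded up
extra = max(0, n_req - n)                   # additional units needed

print(f"p = {p:.3f}, s.e.(p) = {se:.4f}")
print(f"95% limits: ({ci[0]:.4f}, {ci[1]:.4f}), estimated total = {total_hat:.0f}")
print(f"n0 = {n0:.2f}, required n = {n_req}, extra units = {extra}")
```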
Chapter 4: Stratified Random Sampling
4.1 Definition:
Stratified sampling is a technique that involves dividing (stratifying) a population by
partitioning the sampling frame into non-overlapping and relatively homogeneous groups called strata. The
selection of samples can be performed independently in each of those strata.
Stratified random sampling is a sampling plan in which a population is divided into L mutually
exclusive and exhaustive strata, and a simple random sample of $n_h$ elements is taken separately and
independently within each stratum. Let $N_1, N_2, \ldots, N_L$ represent the number of sampling units within
each stratum, and $n_1, n_2, \ldots, n_L$ represent the number of randomly selected units within each stratum.
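Then the total number of possible stratified random samples is equal to
\[ \binom{N_1}{n_1} \binom{N_2}{n_2} \cdots \binom{N_L}{n_L} \le \binom{N}{n}, \]
where $N = N_1 + N_2 + \cdots + N_L$ and $n = n_1 + n_2 + \cdots + n_L$.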
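The count of possible stratified samples is a product of binomial coefficients and can be compared with the number of unrestricted simple random samples of the same total size. The stratum sizes below are hypothetical, chosen small enough to check by hand.

```python
import math

# Hypothetical design: L = 2 strata.
N_h = [4, 3]   # stratum sizes N_1, N_2
n_h = [2, 1]   # stratum sample sizes n_1, n_2

# One SRS is drawn independently in each stratum, so the counts multiply.
n_stratified = math.prod(math.comb(N, n) for N, n in zip(N_h, n_h))

# Number of simple random samples of size n from the whole population.
n_unrestricted = math.comb(sum(N_h), sum(n_h))

print(n_stratified, n_unrestricted)   # prints: 18 35
```

Stratification rules out samples that miss a stratum or over-represent it, which is why the stratified count is the smaller of the two.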
Stratified random sampling, in particular, involves dividing the population into strata and then selecting
simple random samples from each of the strata. Stratification variables may be geographic (region, province,
rural/urban, zone) or non-geographic (income, age, sex, number of employees, etc.). It should be kept in mind
that stratification is limited to those items of information that are available on the frame.
4.2 The purpose of stratified Sampling
Stratified sampling is used in certain types of surveys because it combines the conceptual simplicity of
simple random sampling with potentially significant gains in reliability. Basically there are four major
reasons for resorting to stratification: