Sampling Techniques MCQ
Section A
1. A sample consists of
(a) All units of the population
(b) 5% units of the population
(c) 10% units of the population
(d) Any fraction of the population
(c) (N-n)/N
(d) n/N
4. The number of possible samples of size n out of N population size in SRSWR is
equal to
(a) NCn
(b) N^n
(c) (N-n)/N
(d) n/N
5. The number of possible samples of size 2 out of 5 population size in SRSWOR is
equal to
(a) 10
(b) 4
(c) 2
(d) 12
6. The number of possible samples of size 2 out of 5 population size in SRSWR is
equal to
(a) 25
(b) 20
(c) 2
(d) 12
(a) SRSWOR
(b) SRSWR
(c) Both (a) &(b)
(d) None
8. The sampling fraction in usual notation is expressed as
(a) n/N
(b) N/n
(c) 1-n/N
(d) None.
9. The finite population correction in usual notation is expressed as
(a) (N-n)/N
(b) 1-(n/N)
(c) Both(a)&(b)
(d) None
10. A selection procedure of sampling having no involvement of probability is known
as
(a) SRSWOR
(b) Purposive sampling
(c) SRSWR
(d) None
11. For gathering information on rare events, the sampling used is
(a) SRSWOR
(b) Stratified random sampling
(c) Inverse sampling
(d) None
12. If larger units have a higher probability of inclusion in the sample, the
sampling is known as
(a) SRSWOR
(b) PPS sampling
(c) Stratified random sampling
(d) None
13. Simple random samples can be drawn with the help of
(a) Random numbers table
(b) Chit Method
(c) Roulette wheel
(d) All the above
14. A sampling frame is
(a) A list of units of a population
(b) A list of random numbers
(c) A list of natural numbers
(d) None
15. In SRSWR, the same sampling unit may be included in the sample
(a) Only once
(b) Two times
(c) More than once
(d) None
16. The discrepancy between the estimate and the population parameter is known
as
(a) Sampling error
(b) Non-sampling error
(c) Formula error
(d) None
17. The error in a survey other than sampling error is known as
(a) Sampling error
(b) Non-sampling error
(c) Formula error
(d) None
18. A function of sample observations is known as
(a) Statistic
(b) Estimator
(c) Both (a)&(b)
(d) None
19. If the sample size drawn from the population is large, which error will contribute
more to the total error
(a) Sampling error
(b) Non-sampling error
(c) Both(a)&(b)
(d) None
20. If the sample size drawn from the population is large, which error will contribute
less to the total error
(a) Sampling error
(b) Non-sampling error
(c) Both (a)&(b)
(d) None
21. Simple random sample can be drawn with the help of (a) random number
tables (b) chit method (c) roulette wheel (d) all the above
22. If each and every unit of a population has an equal chance of being included in
the sample, it is called(a) restricted sampling (b) unrestricted sampling (c)
purposive sampling (d) subjective sampling
23. If the observations recorded on five sampled items are 3, 4, 5, 6, 7, then the sample
variance is equal to (a) 0 (b) 1 (c) 2 (d) 2.5
24. If all the observations in a set of observations are the same, then variance of set
of values is (a) 0 (b) 2 (c) infinite (d) none
25. If the sample values are 1, 3, 5, 7, 9 then the S.E. of the sample mean is (a) 2 (b) √2 (c) 3
(d) √3
26. As a normal practice, the sampling fraction is considered to be negligible if it is (a)
more than 5% (b) less than or equal to 5% (c) more than 10% (d) none
27. Systematic sampling is used when (a) data are on cards (b) the
items are in a row (c) the items are given in a sequential order (d) all the
above
28. If the population size N is a multiple of the sample size n, i.e. N = nk, then we use (a) linear
systematic sampling (b) circular systematic sampling (c) random systematic
sampling (d) all the above
29. Circular systematic sampling is used when (a) N is a whole number (b) N is not
divisible by n (c) N is a multiple of n (d) All the above.
30. Problem of non-response has (a) no solution (b) can be solved (c) no meaning
(d) none
31. If sample sizes increase, then sampling error will (a) increase (b) decrease (c)
both (a) &(b) (d) none
32. If sample sizes increase, then non-sampling error will (a) increase (b) decrease
(c) both (a) &(b) (d) none
33. A population is divided into clusters and it has been found that all the units within
a cluster are same. In this situation which sampling will be adopted (a)
SRSWOR (b) Stratified random sampling (c) Cluster sampling (d) Systematic
sampling
34. A population of size N is divided into k strata. A sample of size n is to be chosen and Ni
is the size of the ith stratum. Then the stratum sample size ni under proportional allocation is
given by (a) ni=nN (b) ni/Ni=n/N (c) niNi=nN (d) none
35. In case of inverse sampling, the proportion p of m units of interest contained in a
sample of n units is equal to (a) m/n (b) (m-1)/n (c) (m-1)/(n-1) (d) (m-1)/(n+1)
36. If the respondents do not provide the required information to the researcher,
then it is known as (a) non-sampling error(b) the problem of non-response (c)
both (a) &(b) (d) none
37. The errors arising from faulty planning of a survey are called (a) non-sampling
errors (b) non-response errors (c) sampling errors (d) absolute error
38. If there is a certain number of very high values in the sample, it is preferable to
compute (a) Standard error (b) Standard deviation (c) variance (d) all the
above.
39. For estimating the population mean T, let T1 be the sample mean under
SRSWOR and T2 the sample mean under SRSWR. Then which relationship is true
(a) Var(T1) < Var(T2) (b) Var(T1) > Var(T2) (c) Var(T1) ≤ Var(T2) (d) none
40. The magnitude of the standard error of an estimate is an index of its (a) accuracy
(b) precision (c) efficiency (d) none
41. Which of the following statement is true (a) population mean increases with
increase in sample size (b) population mean decreases with increase in sample
size(c) population mean decreases with decrease in sample size(d) population
mean is a constant value.
42. A sample of 25 units from a population with standard deviation 10 results in a
total score of 450. Then the mean of the sampling distribution is equal to (a) 45 (b) 18
(c) 50 (d) none
43. A population is perfectly homogeneous with respect to a characteristic, what size
of sample would you need (a) no sample (b) a large sample (c) a small sample
(d) a single sample.
68. Under SRSWOR, the same item can occur more than once.
69. The sampling procedure in which the population is divided into homogeneous
groups and sample drawn from each group is called stratified random
sampling.
70. Stratified random sampling is useful when population is heterogeneous.
71. Stratification is done with respect to certain characteristics.
72. Deciding the sample size for each stratum is known as allocation problem.
73. If the sample size of each stratum is in proportion to stratum size, it is called
proportional allocation.
74. Stratified random sampling falls under the category of restricted sampling.
75. The more heterogeneous the population, the larger the sample size required.
76. In usual notation (N-n)/N is known as finite population correction.
77. In usual notation n/N is called sampling fraction.
78. For a high precision of estimates, larger samples are required.
79. Estimators and estimates are different.
80. Determination of sample sizes for each stratum subject to a cost constraint is
called optimum allocation.
81. Optimum allocation is also known as Neyman allocation.
82. Double sampling is termed as two phase sampling.
83. Cluster sampling ordinarily leads to the loss of precision.
84. Cluster sampling helps to reduce cost of survey.
85. Larger the cluster size, less efficient it is relative to the element as sampling
unit.
86. Two stage sampling is less efficient than single stage sampling.
87. A sampling procedure in which units are selected with chance of selection in
proportion to some measure of their size is known as PPS sampling.
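We know that

$$V(\bar{y}_n) = \left(\frac{1}{n} - \frac{1}{N}\right) S^2$$

and

$$V(\bar{y}_{st})_p = \left(\frac{1}{n} - \frac{1}{N}\right) \sum_{i=1}^{K} p_i S_i^2 .$$

We also know that

$$S^2 = \frac{1}{N-1} \sum_{i=1}^{K} \sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_N)^2 .$$

After addition and subtraction of the stratum mean $\bar{Y}_{N_i}$,

$$(N-1) S^2 = \sum_{i=1}^{K} \sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_{N_i} + \bar{Y}_{N_i} - \bar{Y}_N)^2 = \sum_{i=1}^{K} \sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_{N_i})^2 + \sum_{i=1}^{K} \sum_{j=1}^{N_i} (\bar{Y}_{N_i} - \bar{Y}_N)^2 + \text{third term}.$$

The third term (the cross-product term) is zero due to the algebraic property of the arithmetic mean: the deviations from a stratum mean sum to zero within each stratum. Hence

$$(N-1) S^2 = \sum_{i=1}^{K} (N_i - 1) S_i^2 + \sum_{i=1}^{K} N_i (\bar{Y}_{N_i} - \bar{Y}_N)^2 \qquad (1)$$

For large $N_i$ and $N$ we take

$$\frac{N_i - 1}{N_i} \approx 1, \qquad \frac{N-1}{N} \approx 1 .$$

Dividing equation (1) by N, we have

$$\left(\frac{N-1}{N}\right) S^2 = \sum_{i=1}^{K} \frac{N_i - 1}{N} S_i^2 + \sum_{i=1}^{K} \frac{N_i}{N} (\bar{Y}_{N_i} - \bar{Y}_N)^2 ,$$

so that, with $p_i = N_i/N$,

$$S^2 \approx \sum_{i=1}^{K} p_i S_i^2 + \sum_{i=1}^{K} p_i (\bar{Y}_{N_i} - \bar{Y}_N)^2 .$$

Multiplying both sides by $\left(\frac{1}{n} - \frac{1}{N}\right)$, we have

$$\left(\frac{1}{n} - \frac{1}{N}\right) S^2 = \left(\frac{1}{n} - \frac{1}{N}\right) \sum_{i=1}^{K} p_i S_i^2 + \left(\frac{1}{n} - \frac{1}{N}\right) \sum_{i=1}^{K} p_i (\bar{Y}_{N_i} - \bar{Y}_N)^2 ,$$

that is, $V(\bar{y}_n) = V(\bar{y}_{st})_p$ + a positive term.
There is therefore a gain in precision due to stratification. The gain can be increased
by making the stratum means differ as much as possible among themselves.
If we combine this result with the fact that the variance under Neyman allocation does not exceed the variance under proportional allocation, then we have V random - V Ney = a positive term.
Hence proved.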
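The decomposition in equation (1) and the resulting gain from stratification can be checked numerically. The sketch below uses a small made-up stratified population (the values are an assumption chosen only for illustration) and hypothetical helper names; it verifies that (N-1)S^2 equals the within-stratum part plus the between-stratum part and compares the two variances.

```python
from statistics import mean

# Hypothetical stratified population (values invented for illustration only).
strata = [[2, 4, 6], [10, 12, 14, 16], [30, 33, 36]]

def mean_square(values):
    """Mean square with divisor (size - 1), i.e. S^2 in the usual notation."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

population = [y for stratum in strata for y in stratum]
N = len(population)
grand_mean = mean(population)

# Left side of equation (1): (N - 1) S^2
total_ss = (N - 1) * mean_square(population)

# Right side of equation (1): within-stratum part + between-stratum part
within_ss = sum((len(s) - 1) * mean_square(s) for s in strata)
between_ss = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in strata)

def variance_srs(n):
    """V(ybar_n) = (1/n - 1/N) S^2 under SRSWOR."""
    return (1 / n - 1 / N) * mean_square(population)

def variance_proportional(n):
    """V(ybar_st)_p = (1/n - 1/N) * sum(p_i S_i^2) under proportional allocation."""
    return (1 / n - 1 / N) * sum(len(s) / N * mean_square(s) for s in strata)

print(total_ss, within_ss + between_ss)
print(variance_srs(5), variance_proportional(5))
```

Because the stratum means in this made-up population differ widely, the stratified variance comes out much smaller than the unstratified one.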
Q2. Define ratio estimator for a population mean of character Y giving its conditions.
Write the expression for its bias and variance. Find the condition under which ratio
estimator is more efficient than usual simple mean estimator.
Ans. So far we have considered only estimates based on simple arithmetic means of the observed
values in the sample. We shall now consider other methods of estimation which make use
of auxiliary (ancillary) information and which, under certain conditions, give more
reliable estimates of the population values than those based on simple averages.
One of these methods is of particular importance. It is called the ratio method of
estimation.
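The ratio estimator is defined as $\bar{y}_R = \hat{R}\,\bar{X}_N$, where $\hat{R} = \bar{y}_n / \bar{x}_n$ is the ratio of the
sample mean of y to the sample mean of x, and $\bar{X}_N$ is the population mean of the auxiliary character
X.
To the first order of approximation, the bias of the ratio estimator is

$$B(\bar{y}_R) = \frac{N-n}{Nn}\,\bar{Y}_N\,[C_X^2 - \rho\, C_X C_Y],$$

where Cx and Cy are the coefficients of variation of the auxiliary
variable and the study variable respectively and ρ is the correlation coefficient between x and y.
To the same order of approximation, the variance is

$$V(\bar{y}_R) = \frac{N-n}{Nn}\,\bar{Y}_N^2\,[C_Y^2 + C_X^2 - 2\rho\, C_X C_Y] = \frac{N-n}{Nn}\,[S_y^2 + R^2 S_x^2 - 2R\rho\, S_x S_y],$$

where $R = \bar{Y}_N / \bar{X}_N$ is the population ratio.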
Here we compare the ratio estimator with the simple mean estimator. For this we require
the variance of the ratio estimator and the variance of the sample mean in SRSWOR. To the
first order of approximation, the variance of the ratio estimator is

$$V(\bar{y}_R) = \frac{N-n}{Nn}\,[S_y^2 + R^2 S_x^2 - 2R\rho\, S_x S_y], \qquad R = \bar{Y}_N / \bar{X}_N,$$

and the variance of the sample mean in the case of SRSWOR is

$$V(\bar{y}_n) = \frac{N-n}{Nn}\, S_y^2 .$$

Their ratio is

$$\frac{V(\bar{y}_R)}{V(\bar{y}_n)} = 1 + R^2 \left(\frac{S_x}{S_y}\right)^2 - 2R\rho\,\frac{S_x}{S_y} .$$

The ratio estimator is more efficient than the sample mean estimator when this quantity is less
than 1, i.e. when $R\,S_x/S_y < 2\rho$ (taking R > 0). Since $R\,S_x/S_y = C_x/C_y$, the condition is ρ > 1/2 [Cx/Cy],
where Cx and Cy are the coefficients of variation of the auxiliary variable and the study variable
respectively.
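The efficiency condition can be illustrated with a short calculation. The sketch below is only an illustration: the function names and the numerical inputs are assumptions, and the formula used is the first-order approximation V(ȳ_R)/V(ȳ_n) = 1 + R²(Sx/Sy)² - 2Rρ(Sx/Sy) with R taken as the ratio of the population means.

```python
def variance_ratio(x_bar, y_bar, s_x, s_y, rho):
    """First-order approximation to V(ybar_R) / V(ybar_n).

    R = y_bar / x_bar is the population ratio, so R * s_x / s_y = Cx / Cy.
    """
    k = (y_bar / x_bar) * s_x / s_y
    return 1 + k ** 2 - 2 * rho * k

def ratio_is_more_efficient(x_bar, y_bar, s_x, s_y, rho):
    """Condition rho > (1/2) * Cx / Cy from the text."""
    c_x = s_x / x_bar
    c_y = s_y / y_bar
    return rho > 0.5 * c_x / c_y

# Hypothetical population values: here Cx = Cy = 0.2, so the threshold is rho > 0.5.
for rho in (0.3, 0.8):
    print(rho,
          variance_ratio(50, 100, 10, 20, rho),
          ratio_is_more_efficient(50, 100, 10, 20, rho))
```

A variance ratio below 1 and a satisfied condition always occur together, since both are restatements of the same inequality.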
Q3. Derive the condition when regression estimator and ratio estimator are equally
efficient.
Ans.
The regression estimator will be at least as efficient as the ratio estimator provided

$$MSE(\bar{y}_{lr}) \le MSE(\bar{y}_R),$$

i.e. $S_y^2 (1 - \rho^2) \le S_y^2 + R^2 S_x^2 - 2R\rho\, S_x S_y$ (the common factor (N-n)/(Nn) cancels),
or $(\rho S_y - R S_x)^2 \ge 0$, which is always true. Equality holds when $(\rho S_y - R S_x)^2 = 0$, i.e. when

$$R = \rho\,\frac{S_y}{S_x},$$

in which case both the
regression and ratio estimators are equally efficient.
Q4. In SRSWOR, the probability of drawing any specified unit at rth draw is equal to the
probability of drawing it at the first draw.
Proof: A specified unit is drawn at the rth draw only if two things happen together:
(i) the unit is not selected at any of the first (r-1) draws, and (ii) it is selected at the rth
draw. By the multiplication rule, the corresponding conditional probabilities are multiplied.
Let us suppose that there are N units in the population. The probability of selecting the
unit at the first draw is 1/N and the probability of not selecting it at this draw is
1-(1/N) = (N-1)/N. Similarly, given that it has not yet been selected, the probability of not
selecting the unit at the second draw is (N-2)/(N-1).
In the same way, the probability of not selecting the unit at the third draw is (N-3)/(N-2).
In general, the probability of not selecting the unit at the
(r-1)th draw is (N-r+1)/(N-r+2).
The second requirement is that the unit should be selected at the rth draw.
Proceeding in the same way, the probability of
selecting the unit at the first draw is 1/N. The probability of selecting it at the
second draw, from the N-1 remaining units, is 1/(N-1). Similarly, the probability of selecting it at the third draw
is 1/(N-2).
Then the probability of selecting the unit at the rth draw, given that it was not selected earlier, is 1/(N-r+1).
Multiplying these probabilities, the required probability is

$$\frac{N-1}{N}\cdot\frac{N-2}{N-1}\cdot\frac{N-3}{N-2}\cdots\frac{N-r+1}{N-r+2}\cdot\frac{1}{N-r+1} = \frac{1}{N} .$$
It shows that in SRSWOR, the probability of drawing any specified unit at rth draw is
equal to the probability of drawing it at the first draw.
Hence proved.
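The telescoping product in the proof can be checked exactly with rational arithmetic. The sketch below is only an illustration (the function name is hypothetical); it multiplies the probabilities of not selecting the unit at draws 1 to r-1 by the probability of selecting it at draw r.

```python
from fractions import Fraction

def prob_selected_at_draw(N, r):
    """Exact probability that a specified unit is drawn at the r-th draw in SRSWOR."""
    prob = Fraction(1)
    # Not selected at draws 1, ..., r-1: (N-1)/N, (N-2)/(N-1), ..., (N-r+1)/(N-r+2)
    for k in range(1, r):
        prob *= Fraction(N - k, N - k + 1)
    # Selected at draw r from the N - r + 1 units still in the population
    return prob * Fraction(1, N - r + 1)

# For a population of 10 units the result is 1/10 at every draw.
for r in range(1, 11):
    print(r, prob_selected_at_draw(10, r))
```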
An estimator is a function of the sample values which is used for estimating a
population parameter. It is also called a statistic. For example, the sample mean
and the sample mean square are computed from the sample values and are
used to estimate the population mean (µ) and the population mean square (S^2).
An estimate is the specific numerical value that the estimator takes for a
particular sample.
(i) Lahiri’s method in PPS sampling:
(a) Let M = max Xi, i.e. the maximum of the sizes of the N units in the population, or some
convenient number greater than this maximum. The steps for selecting the desired
sample are, in a nutshell:
(b) Select a pair of random numbers (i, j) such that 1≤i≤N and 1≤j≤M.
(c) If j ≤ Xi, then the ith unit is selected; otherwise it is rejected and another pair of random
numbers is chosen.
(d) To get a sample of size n, this procedure is repeated till n units are selected.
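The steps of Lahiri's method can be sketched in a few lines. This is a minimal illustration, assuming integer sizes and selection with replacement; the function name and the use of Python's `random` module in place of a random number table are assumptions, and the sizes in the usage lines are example values only.

```python
import random

def lahiri_pps_sample(sizes, n, rng=None):
    """Select n units (1-based serial numbers) with probability proportional to size.

    sizes: positive integer sizes X_1, ..., X_N.
    Units are drawn with replacement, so a serial number may repeat.
    """
    rng = rng or random.Random()
    N = len(sizes)
    M = max(sizes)                      # or any convenient number >= max(sizes)
    selected = []
    while len(selected) < n:
        i = rng.randint(1, N)           # random unit, 1 <= i <= N
        j = rng.randint(1, M)           # random number, 1 <= j <= M
        if j <= sizes[i - 1]:           # accept unit i only if j <= X_i
            selected.append(i)
    return selected

# Example call with illustrative sizes; the result depends on the random draws.
example_sizes = [2, 5, 11, 13, 7, 3, 9, 16, 6, 4]
print(lahiri_pps_sample(example_sizes, 4, random.Random(1)))
```

The method needs no cumulative totals, at the cost of rejecting some pairs when many units are small relative to M.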
(iii) In inverse sampling (sometimes called standard inverse sampling), you continue to
choose items until an event has occurred a specified number of times. It is often
used when you don’t know the exact size of the sample you want to take. For example,
let’s say you were conducting a wildlife management survey and wanted to capture
20 banded birds. Inverse sampling is often performed when a certain characteristic
is rare. For example, it is a good method for detecting differences between two different
treatments for a rare disease.
For example, if you were studying how many books the average city dweller read
in a week, you might stratify the population by college graduates and less educated
people. If an initial quick survey told you that, in 100 residents, 60 were college
graduates and 40 were not, you could decide to sample college graduates and non-
graduates at a 6/4 ratio during phase two of your study.
Q6. Define simple random sampling. Differentiate between SRSWOR and SRSWR.
Ans. Simple random sampling is the simplest of the methods of probability sampling and is usually called the
method of random sampling. In this method, an equal probability of selection is
assigned to each available unit of the population at the first and each subsequent
draw. Thus, if the number of units in the population is N, the probability of selecting
any unit at the first draw is 1/N and at the second draw, from the remaining units, is 1/(N-1), and so on; the
unconditional probability of selecting a specified unit at any given draw is nevertheless 1/N. A sample obtained using the above method is called a "simple random
sample". Since this result is independent of the specified unit, it follows that every one
of the units in the population has the same chance of being included in the sample
under the procedure of simple random sampling.
Differences between SRSWOR and SRSWR:
(i) If the selected units are not replaced in the population before the
next draw, it is called SRSWOR; if the selected units are
replaced in the population before the next draw, it is called SRSWR.
(ii) In SRSWOR, each draw yields information on a new unit,
while in SRSWR the same unit may be drawn again and so the same information may be
repeated.
(iii) If sampling is continued until n = N, SRSWOR covers every unit of the population, while this is not true in
the case of SRSWR.
(iv) The variance of the sample mean in SRSWOR is smaller than
that in SRSWR, so SRSWOR is more efficient.
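The difference between the two schemes can be seen by enumerating every possible sample of a small population. The sketch below uses a made-up population of five values (an assumption for illustration) with n = 2; it counts the possible samples under each scheme and compares the variance of the sample mean.

```python
from itertools import combinations, product
from statistics import mean, pvariance

population = [1, 2, 3, 4, 5]   # hypothetical values, N = 5
n = 2

# SRSWOR: unordered samples without repetition, NCn of them
wor_samples = list(combinations(population, n))
# SRSWR: ordered samples with repetition allowed, N^n of them
wr_samples = list(product(population, repeat=n))

# Every sample within a scheme is equally likely, so the variance of the
# sample mean is the plain variance over the enumerated sample means.
var_wor = pvariance([mean(s) for s in wor_samples])
var_wr = pvariance([mean(s) for s in wr_samples])

print(len(wor_samples), len(wr_samples))   # 10 and 25
print(var_wor, var_wr)                     # 0.75 and 1.0
```

The counts agree with the formulas NCn = 10 and N^n = 25, and the smaller variance under SRSWOR reflects the finite population correction.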
Non-sampling error:
It is a general assumption in the sampling theory that the true value of each
unit in the population can be obtained and tabulated without any errors. In
practice, this assumption may be violated due to several reasons and
practical constraints. This results in errors in the observations as well as in
the tabulation. Such errors which are due to the factors other than sampling
are called non-sampling errors.
Non sampling errors can occur at every stage of planning and execution of
survey or census. It occurs at planning stage, field work stage as well as at
tabulation and computation stage. The main sources of the non-sampling
errors are
(i) lack of proper specification of the domain of study and scope of
investigation,
(ii) incomplete coverage of the population or sample,
(iii) faulty definition,
(iv) defective methods of data collection and
(v) tabulation errors.
More specifically, one or more of the following reasons may give rise to non-sampling
errors or indicate their presence:
(i) The data specification may be inadequate and inconsistent with the objectives
of the survey or census
(ii) Due to imprecise definition of the boundaries of area units, incomplete or
wrong identification of units, faulty methods of enumeration etc, the data may
be duplicated or may be omitted.
(iii) The methods of interview and observation collection may be inaccurate or
inappropriate.
(iv) The questionnaire, definitions and instructions may be ambiguous.
(v) The investigators may be inexperienced or not trained properly.
(vi) The recall errors may pose difficulty in reporting the true data.
(vii) The scrutiny of data is not adequate.
(viii) The coding, tabulation etc. of the data may be erroneous.
(ix) There can be errors in presenting and printing the tabulated results, graphs
etc.
(x) In a sample survey, the non-sampling errors arise due to defective frames
and faulty selection of sampling units.
Q9. Explain the situations when cluster sampling is used.
In simple random sampling without replacement, samples are selected
with equal probability for all the units in the population. If the units vary considerably in
size, SRSWOR may not be appropriate since it does not take into account the possible
importance of the varying sizes of the units in the population. In fact, a larger unit
may contribute more to the population total of the variable Y than a smaller unit. For
example, villages having larger geographical areas are likely to have larger populations
and larger areas under food crops. It is therefore natural to expect that a scheme of
selection which gives larger units a higher probability of inclusion in the sample
than smaller units will provide more efficient estimators than equal probability sampling.
A scheme in which the probability of selection varies from unit to unit according to the size of
the units is called "Probability Proportional to Size" (PPS) sampling.
In the cumulative total method , the following steps are considered:
Q11. Explain the important points for planning and organization of a sample survey.
(i) Objectives
(ii) Data to be gathered
(iii) Population under investigation
(iv) Sampling frame
(v) Methods of collecting data
(vi) Organization and supervision of field work
(vii) Tabulation of data
(viii) Analysis of data
(ix) Precision
(x) Writing reports and conclusions
Q12. Derive a relationship between mean square error, sampling variance and bias.
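Let $\hat{\theta}$ be the estimator of the parameter θ. Then the mean square error is
defined as $MSE = E(\hat{\theta} - \theta)^2$.
Subtracting and adding $E(\hat{\theta})$ on the right hand side, we have

$$MSE = E[\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta]^2 = E[\hat{\theta} - E(\hat{\theta})]^2 + [E(\hat{\theta}) - \theta]^2 + 2\,[E(\hat{\theta}) - \theta]\,E[\hat{\theta} - E(\hat{\theta})] .$$

The last term is zero because $E[\hat{\theta} - E(\hat{\theta})] = 0$. Hence

$$MSE(\hat{\theta}) = V(\hat{\theta}) + [B(\hat{\theta})]^2 ,$$

i.e. mean square error = sampling variance + (bias)^2, where $B(\hat{\theta}) = E(\hat{\theta}) - \theta$.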
Q13. Define stratified random sampling and write the advantages of stratification. What
are the choices of sample sizes under different allocations? Derive the expression for the
variance of the sample mean in proportional and Neyman allocations.
We know that the precision of a sample estimate of the population mean depends
upon two factors: (1) the size of the sample, and (2) the variability or heterogeneity of
the population. Apart from the size of the sample, therefore, the only way of increasing
the precision of an estimate is to devise sampling procedures which will effectively
reduce the heterogeneity. One such procedure is known as the procedure of stratified
random sampling. It consists in dividing the population of N units into K groups or
sub-populations of N1, N2, …, NK units respectively. These sub-populations are non-overlapping
and together they comprise the whole of the population, so that N1+N2+…+NK = N.
These sub-populations are called strata and a single group is called a stratum. The
sample sizes within the strata are denoted by n1, n2, …, nK respectively, such that
n1+n2+…+nK = n. K is called the number of strata or groups.
If a simple random sample is taken in each stratum using SRSWOR, the whole
procedure is described as "Stratified Random Sampling".
Advantages of stratification:
(i) If the admissible error is given, a smaller sample suffices, so that the
expenditure may be reduced.
(ii) There is a reduction of error due to stratification, if the cost of survey is fixed.
(iii) Stratification provides the individual means of stratum and then for the whole
population.
(iv) Stratification may provide the administrative convenience.
(i) Equal allocation: ni = n/k, where n is the total sample size and k is the
number of strata.
(ii) Proportional allocation: ni = n pi, where pi = Ni/N is the stratum weight.
(iii) Optimum allocation: It involves the cost per unit in the stratum.
(iv) Neyman allocation: A particular case of optimum allocation when ci = c, a
constant cost for all the units: $n_i = n\,p_i S_i / \sum_{i=1}^{k} p_i S_i$.
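The equal, proportional and Neyman rules can be compared on a small example. The sketch below is only an illustration: the function names and the stratum sizes and standard deviations are assumptions, and the returned sample sizes are left unrounded (in practice they would be rounded to whole numbers).

```python
def equal_allocation(n, k):
    """n_i = n / k for each of the k strata."""
    return [n / k] * k

def proportional_allocation(n, stratum_sizes):
    """n_i = n * p_i with p_i = N_i / N."""
    N = sum(stratum_sizes)
    return [n * N_i / N for N_i in stratum_sizes]

def neyman_allocation(n, stratum_sizes, stratum_sds):
    """n_i = n * p_i * S_i / sum(p_i * S_i), assuming a constant cost per unit."""
    N = sum(stratum_sizes)
    weights = [N_i / N * S_i for N_i, S_i in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: sizes N_i and standard deviations S_i.
sizes = [500, 300, 200]
sds = [10, 20, 30]
print(equal_allocation(100, 3))
print(proportional_allocation(100, sizes))
print(neyman_allocation(100, sizes, sds))
```

In this example Neyman allocation moves sample away from the large but homogeneous first stratum towards the smaller, more variable strata.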
Derivation of the variance of sample mean in proportional and Neyman allocation:
We know that

$$V(\bar{y}_{st}) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) p_i^2 S_i^2 ,$$

where pi = Ni/N is the stratum weight and Si^2 is the population mean square of the ith stratum.
Substituting ni = n pi (and Ni = N pi) in the above expression, we have the variance of the sample mean in
proportional allocation:

$$V(\bar{y}_{st})_p = \sum_{i=1}^{k} \left(\frac{1}{n p_i} - \frac{1}{N p_i}\right) p_i^2 S_i^2 = \frac{1}{n}\sum_{i=1}^{k} p_i S_i^2 - \frac{1}{N}\sum_{i=1}^{k} p_i S_i^2 ,$$

which is written in simplified form as

$$V(\bar{y}_{st})_p = \left(\frac{1}{n} - \frac{1}{N}\right) \sum_{i=1}^{k} p_i S_i^2 .$$

Similarly, substituting $n_i = n\,p_i S_i / \sum_{i=1}^{k} p_i S_i$ in the above expression, we have the variance
of the sample mean in Neyman allocation:

$$V(\bar{y}_{st})_{Ney} = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) p_i^2 S_i^2 = \frac{1}{n}\left(\sum_{i=1}^{k} p_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} p_i S_i^2 .$$
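The two variance expressions can be evaluated side by side. The sketch below is a hedged illustration with hypothetical function names and made-up stratum sizes and standard deviations; it implements V_p = (1/n - 1/N) Σ p_i S_i² and V_Ney = (Σ p_i S_i)²/n - Σ p_i S_i²/N.

```python
def var_proportional(n, stratum_sizes, stratum_sds):
    """Variance of the stratified mean under proportional allocation."""
    N = sum(stratum_sizes)
    weighted_ms = sum(N_i / N * S_i ** 2
                      for N_i, S_i in zip(stratum_sizes, stratum_sds))
    return (1 / n - 1 / N) * weighted_ms

def var_neyman(n, stratum_sizes, stratum_sds):
    """Variance of the stratified mean under Neyman allocation."""
    N = sum(stratum_sizes)
    p = [N_i / N for N_i in stratum_sizes]
    first = sum(p_i * S_i for p_i, S_i in zip(p, stratum_sds)) ** 2 / n
    second = sum(p_i * S_i ** 2 for p_i, S_i in zip(p, stratum_sds)) / N
    return first - second

# Hypothetical strata: N_i = 500, 300, 200 and S_i = 10, 20, 30, with n = 100.
sizes = [500, 300, 200]
sds = [10, 20, 30]
print(var_proportional(100, sizes, sds))   # about 3.15
print(var_neyman(100, sizes, sds))         # about 2.54
```

Neyman allocation gives the smaller variance here because the stratum standard deviations differ; the two coincide when all S_i are equal.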
Example:
Using the three-figure numbers given in columns 1 to 3, 4 to 6, etc., of the table given
in the Appendix and rejecting numbers greater than 338 (and also the number 000), we
have for the sample:
125, 326, 12, 237, 35, 251, 165, 131, 198, 33, 161, 209, 51, 52, 331, 218, 337, 263,
223, 241, 277, 42, 14, 303, 40, 99, 102, 173, 137, 321, 335, 155, 163, 81.
The procedure involves the rejection of a large number of random numbers, nearly two-
thirds. A device commonly employed to avoid the rejection of such large numbers is to
divide a random number by 338 and take the remainder as equivalent to the
corresponding serial number between 1 to 337, the remainder zero corresponding to
338. It is, however, necessary to reject random numbers 677 to 999 and also 000 in
adopting this procedure as otherwise villages with serial numbers 1 to 323 will get a
larger chance of selection equal to 3/999 while those with serial numbers 324 to 338 will
get a chance equal to 2/999. If we use this procedure and also the same three-figure
random numbers as given in columns 1 to 3, 4 to 6, etc., we will obtain the sample of
villages with serial numbers given below:
125, 206, 326, 193, 12, 237, 35, 251, 325, 338, 114, 231, 78, 112, 126, 330, 312, 165,
131, 198, 33, 161, 209, 51, 52, 331, 218, 337, 238, 323, 263, 90,
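The remainder device described above can be written as a small function. This is a sketch under the stated rules only (three-figure random numbers, a population of 338 villages); the function name is hypothetical, and the input numbers in the usage lines are illustrative rather than taken from the Appendix table.

```python
def serial_from_random_number(x, population_size=338):
    """Map a three-figure random number (0-999) to a village serial number.

    Returns None when the number must be rejected: 000, or any number above
    the largest multiple of population_size not exceeding 999 (676 for 338).
    """
    largest_multiple = (999 // population_size) * population_size
    if x == 0 or x > largest_multiple:
        return None
    remainder = x % population_size
    # A remainder of zero corresponds to the last serial number.
    return population_size if remainder == 0 else remainder

for x in (125, 544, 676, 677, 0):
    print(x, serial_from_random_number(x))
```

Rejecting 677 to 999 keeps exactly two admissible random numbers for each serial number, so every village retains the same chance of selection.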
(i) There is a reduction of cost, either in terms of money or in terms of man hours.
Although the cost per individual may be larger in a sample survey, the total
cost is expected to be smaller.
(ii) A sample survey has greater scope than complete
enumeration, because highly trained field staff can be employed for the collection of data.
(iii) There is a chance of getting better quality data than in complete
enumeration, because in a sample survey information is gathered after probing.
(iv) Accuracy and efficiency are increased along with greater speed and, more
importantly, the uncertainty (i.e. the errors) can be quantified.
We have seen that the precision of a sample estimate of the population mean depends
upon two factors: (1) the size of the sample, and (2) the variability or heterogeneity of
the population. Apart from the size of the sample, therefore, the only way of increasing
the precision of an estimate is to devise sampling procedures which will effectively
reduce the heterogeneity. One such procedure is known as the procedure of stratified
sampling. It consists in dividing the population into k classes and drawing random
samples of known sizes, one each from the different classes. The classes into which the
population is divided are called the strata and the process is termed the procedure of
stratified sampling as distinct from the procedure considered in the previous chapters,
called unrestricted or un-stratified sampling. An example of stratified sampling is
furnished by the survey for estimating the average yield of a crop per acre in which
administrative areas are taken as the strata and random samples of predetermined
numbers of fields are selected from each of the several strata. The geographical
proximity of fields within a stratum makes it more homogeneous than the entire
population and thus helps to increase the precision of the estimate. In this chapter we
shall consider the theory applicable to the procedure of stratified sampling. Stratified
sampling is a common procedure in sample surveys. The procedure ensures any
desired representation in the sample of all the strata in the population. In un-stratified
sampling, on the other hand, adequate representation of all the strata cannot always be
ensured and indeed a sample may be so distributed among the different strata that
certain strata may be over-represented and others under-represented. The procedure of
stratified sampling is thus intended to give a better cross-section of the population than
that of un-stratified sampling. It follows that one would expect the precision of the
estimated character to be higher in stratified than in un-stratified sampling. Stratified
sampling also serves other purposes. The selection of sampling units, the location and
enumeration of the selected units and the distribution and supervision of field work are
all simplified in stratified sampling. Of course, stratified sampling presupposes the
knowledge of the strata sizes, i.e., the total number of sampling units in each stratum
and the availability of the frame for the selection of the sample from each stratum. It is
not necessary that the strata be formed of geographically continuous administrative
areas. Thus, in yield surveys, the fields may be stratified according as they are irrigated
or un-irrigated and separate samples selected from each. In a survey for estimating the
acreage under crops, strata may be formed by classifying the villages according to their
geographical area instead of on the basis of geographical proximity. The principles to be
followed in stratifying a population will become clear in the subsequent sections.
(i) Equal allocation: ni = n/k, where n is the total sample size and k is the number
of strata.
(ii) Proportional allocation: ni = n pi, where pi = Ni/N is the stratum weight.
(iii) Optimum allocation: It involves the cost per unit in the stratum.
(iv) Neyman allocation: A particular case of optimum allocation when ci = c, a
constant cost for all the units: $n_i = n\,p_i S_i / \sum_{i=1}^{k} p_i S_i$.
From the above allocations, it can be concluded that (i) the larger the size of the
stratum, the larger should be the size of the sample to be selected therefrom;
(ii) the larger the variability within a stratum, the larger should be the size of
the sample from that stratum; and
(iii) the cheaper the labour in a stratum, the larger the sample from that
stratum.
In simple random sampling every unit has the same chance of selection. Often, however,
the units vary in size. If simple random sampling is used in such a case, the
desired precision may not be obtained. Under these circumstances, auxiliary
information on the sizes of the units can be utilized in selecting the sample so as to obtain more precise
estimators of the population parameters. The probabilities of selection assigned to
different units then depend on their sizes. When the probability of selection is
proportional to the size of the unit, the scheme is known as probability proportional to size
(PPS) sampling.
Let y be the variable under study. Suppose the units are the shopping malls
in a town and X is an auxiliary variable, the number of workers who work in each
shopping mall. PPS is the most commonly used varying probability scheme: the shopping malls
are selected with probability proportional to their number of workers.
PPS Sampling Procedure with Replacement
Two methods of drawing a PPS sample with replacement are discussed here: the cumulative total method and Lahiri's method.
The cumulative total method is described with the help of an example. Consider the
district of Kushtia, which contains 10 shopping malls employing
2, 5, 11, 13, 7, 3, 9, 16, 6 and 4 workers respectively. We select a sample of four shopping
malls with replacement in order to study the life pattern of the workers.
The first step is to form the cumulative totals of the sizes, against which
the selected random numbers are compared.
Lahiri’s Method
Let M = max Xi, i.e. the maximum of the sizes of the N units in the population, or some
convenient number greater than this maximum. The steps for selecting the desired
sample are, in a nutshell:
Select a pair of random numbers (i, j) such that 1≤i≤N and 1≤j≤M.
If j ≤ Xi, then the ith unit is selected; otherwise it is rejected and another pair of random
numbers is chosen.
To get a sample of size n, this procedure is repeated till n units are selected.
The number of shopping malls is N=10. First we select a random number
between 1 and N, i.e. a random number not exceeding
10. Suppose 3 is selected. The unit with the corresponding serial
number is provisionally selected. We then select another random number between 1 and M,
where M = max Xi = 16. Suppose the second random number is 7. If the
second random number does not exceed the size of the unit
provisionally selected, the unit is selected into the sample. Here X3 = 11 and 7 ≤ 11, so unit 3 is selected. If not, the entire
procedure is repeated until a unit is finally selected.
Suppose the pairs (3,7), (4,5), (2,3), (7,8) are selected in this way. Each pair satisfies j ≤ Xi, hence the sample will consist of the
shopping malls with serial numbers 3, 4, 2 and 7.
In short, the procedure is repeated until the
desired number of units has been selected.
The basic difference between simple random sampling and a varying probability
scheme:
In simple random sampling the probability of selecting a unit at any draw is the same for all units.
In a varying probability scheme the probability of selection differs from unit
to unit. At first sight such a procedure appears to give biased estimators,
because the larger units are over-represented and the smaller units are under-represented
in the sample. This is indeed the case if the unweighted sample mean is used as an estimator of the population
mean.
For selecting a sample of size n without replacement, the first unit is selected by the
above cumulative total method and is then deleted from the population. For the
remaining population new cumulative totals are calculated and the same
procedure is used again to select a second unit. The procedure is continued until a sample
of n units is obtained.
The procedure is illustrated in the table below, using the same example to explain the
cumulative total method without replacement. Consider the district of Kushtia, which contains 10
shopping malls employing 2, 5, 11, 13, 7, 3, 9, 16, 6
and 4 workers respectively. We select a sample of 2 shopping malls without replacement in order to study
the life pattern of the workers.
The first step is to form the cumulative totals of the sizes, against which
the selected random numbers are compared.
Data on the number of workers and the output of the 10 shopping malls:
Shopping mall no.   Number of workers (size)   Commodities sold   Cumulative total
1                   2                          30                 C1 = 2
2                   5                          60                 C2 = 2+5 = 7
3                   11                         12                 C3 = 7+11 = 18
4                   13                         6                  C4 = 18+13 = 31
5                   7                          8                  C5 = 31+7 = 38
6                   3                          13                 C6 = 38+3 = 41
7                   9                          4                  C7 = 41+9 = 50
8                   16                         17                 C8 = 50+16 = 66
9                   6                          13                 C9 = 66+6 = 72
10                  4                          8                  C10 = 72+4 = 76
Suppose we wish to draw a PPS sample without replacement. For the selection of the
first unit we choose a random number that does not exceed 76. Suppose the random
number is k=37. From the table, C4 = 31 < 37 <= C5 = 38, so k falls in the 5th unit
and that unit is selected. We then remove this unit, rearrange the remaining shopping
malls and recalculate the cumulative totals.
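The cumulative total method without replacement can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names and the `draw` callback (which should return a random integer between 1 and the current total) are hypothetical.

```python
from itertools import accumulate

def unit_for(sizes, k):
    """Return the 1-based position i with C_(i-1) < k <= C_i."""
    for i, c in enumerate(accumulate(sizes), start=1):
        if k <= c:
            return i
    raise ValueError("k exceeds the total size")

def pps_without_replacement(sizes, n, draw):
    """Select n serial numbers; draw(total) must return an integer in 1..total."""
    remaining = list(enumerate(sizes, start=1))   # (serial number, size)
    sample = []
    for _ in range(n):
        current = [size for _, size in remaining]
        k = draw(sum(current))                    # random number not exceeding the total
        pos = unit_for(current, k)
        sample.append(remaining.pop(pos - 1)[0])  # delete the unit, totals are rebuilt
    return sample

workers = [2, 5, 11, 13, 7, 3, 9, 16, 6, 4]
print(list(accumulate(workers)))   # cumulative totals C1..C10
print(unit_for(workers, 37))       # unit whose cumulative range contains 37
```

With the worker counts of the example, k = 37 satisfies 31 < 37 <= 38 and therefore points to the 5th shopping mall.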
Conclusion
Hansen and Hurwitz first introduced probability proportional to size (PPS) sampling,
and statisticians have since developed a number of procedures for selecting PPS
samples without replacement. Survey statisticians have found the PPS scheme useful
both for selecting units from the population and for estimating parameters of
interest, especially when the survey is large and involves multiple characteristics.
In short, the units are selected on the basis of their sizes.
Q19. Objective: Showing the unbiased estimator for population mean and biased
estimator for population mean square in SRSWR with the help of an example
Kinds of data: Consider a finite population of size N=4 including the values of sampling
units as ( 1,2,3,4). Enumerate all possible samples of size n=2 using SRSWR.
(i) Show that sample mean provides an unbiased estimator of population mean
(ii) Show that sample mean square does not provide an unbiased estimator of
population mean square.
(iii) Show that $V(\bar{y}_n) = \frac{N-1}{Nn} S^2$ is correct.
(iv) Compute sampling errors and show that their sum is equal to zero.
E[$\bar{y}_n$] = 40.0/16 = 2.5
E[$s^2$] = 20/16 = 1.25
and the population mean square is $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y}_N)^2 = 5/3 = 1.67$
(ii) It shows that in SRSWR, sample mean square is not an unbiased estimator of
population mean square, because 1.25 is not equal to 1.67.
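(iii) To check $V(\bar{y}_n) = \frac{N-1}{Nn} S^2$, we compute the L.H.S. and the R.H.S.
L.H.S. $= V(\bar{y}_n) = \frac{1}{16}\sum_{i=1}^{16}(\bar{y}_i - \bar{y})^2$, where $\bar{y}_i$ is the mean of the ith
sample and $\bar{y}$ is the mean of the 16 sample means.
Then, $V(\bar{y}_n) = \frac{10}{16} = 5/8$
R.H.S. $= \frac{4-1}{4 \times 2} \times \frac{5}{3} = \frac{5}{8}$
It shows that $V(\bar{y}_n) = \frac{N-1}{Nn} S^2$ is correct.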
(iv) In the last column of the respective table, sampling errors are computed and
shown their sum is equal to zero.
Q20.
Objective: Showing the unbiased estimator for population mean and population mean
square in SRSWOR with the help of same example given above
Kinds of data: Consider a finite population of size N=4 including the values of sampling
units as ( 1,2,3,4). Enumerate all possible samples of size n=2 using SRSWOR
(i) Show that sample mean provides an unbiased estimator of population mean
(ii) Show that sample mean square provides an unbiased estimator of population
mean square.
(iii) Show that $V(\bar{y}_n) = \frac{N-n}{Nn} S^2$ is correct
(iv) Compute sampling errors and show that their sum is equal to zero.
S.No.   Possible samples   Sample mean (yn)   Sample mean square (s2)   Sampling error
1.      1,2                1.5                0.50                      -1.0
2.      1,3                2.0                2.00                      -0.5
3.      1,4                2.5                4.50                       0.0
4.      2,3                2.5                0.50                       0.0
5.      2,4                3.0                2.00                       0.5
6.      3,4                3.5                0.50                       1.0
Total                      15.0               10.0                       0.0
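E[$\bar{y}_n$] = 15.0/6 = 2.5, which equals the population mean.
E[$s^2$] = 10/6 = 1.67
and the population mean square is $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y}_N)^2 = 5/3 = 1.67$,
so in SRSWOR the sample mean square is an unbiased estimator of the population mean
square.
Then, $V(\bar{y}_n)$ = 2.50/6 = 5/12
R.H.S. $= \frac{4-2}{4 \times 2} \times \frac{5}{3} = \frac{10}{24} = \frac{5}{12}$
It shows that $V(\bar{y}_n) = \frac{N-n}{Nn} S^2$ is correct.
(iv) In the last column of the table, the sampling errors are computed and their sum
is shown to be equal to zero.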
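The enumerations in Q19 and Q20 can be checked with a short script. This is a sketch only; the helper names are hypothetical, and exact fractions are used so that the comparisons with 5/8, 5/12 and 5/3 are not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations, product

def mean(values):
    return Fraction(sum(values), len(values))

def mean_square(values):
    # Mean square with divisor (number of values - 1).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def summarize(samples):
    """Return E[sample mean], E[s^2] and V(sample mean) over all listed samples."""
    means = [mean(s) for s in samples]
    e_mean = mean(means)
    e_s2 = mean([mean_square(s) for s in samples])
    var_mean = mean([(m - e_mean) ** 2 for m in means])
    return e_mean, e_s2, var_mean

population = [1, 2, 3, 4]
N, n = len(population), 2
S2 = mean_square(population)                              # population mean square

srswr = summarize(list(product(population, repeat=n)))    # 16 ordered samples
srswor = summarize(list(combinations(population, n)))     # 6 samples

print("SRSWR :", srswr, "formula:", Fraction(N - 1, N * n) * S2)
print("SRSWOR:", srswor, "formula:", Fraction(N - n, N * n) * S2)
```

In both schemes the expected sample mean is 5/2. The expected sample mean square is 5/4 under SRSWR and 5/3 under SRSWOR, and the enumerated variances match the two formulas.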
Bias of ratio estimator $= E(\bar{y}_R) - \bar{Y}_N = E(\hat{R}\bar{X}_N) - \bar{Y}_N = E(\hat{R})E(\bar{x}_n) - E(\bar{y}_n) = -Cov(\hat{R}, \bar{x}_n)$
Similarly, bias of regression estimator $= E(\bar{y}_l) - \bar{Y}_N = E(\bar{y}_n) + E(\hat{\beta}(\bar{X}_N - \bar{x}_n)) - \bar{Y}_N = E(\hat{\beta})E(\bar{x}_n) - E(\hat{\beta}\bar{x}_n) = -Cov(\hat{\beta}, \bar{x}_n)$
$n = \frac{z^2 p(1-p)}{e^2}$, where z is the value of the Z-score at the 95% or 99% confidence level.
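The sample size formula can be evaluated as in the sketch below. The source does not define p and e, so it is assumed here that p is the anticipated proportion and e the permissible margin of error; the values z = 1.96 (95% confidence), p = 0.5 and e = 0.05 are illustrative assumptions, not figures from the text.

```python
import math

def sample_size(z, p, e):
    """n = z^2 * p * (1 - p) / e^2, rounded up to the next whole unit."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# Illustrative values only (assumptions, not taken from the text).
print(sample_size(z=1.96, p=0.5, e=0.05))
```

Rounding up is a design choice: it keeps the achieved margin of error no larger than the target.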
For example:
Using three-digit numbers taken from either a row or a column, we have the 34
selected samples
125, 326, 12, 237, 35, 251, 165, 131, 198, 33, 161, 209, 51, 52, 331, 218, 337, 283, 223, 241,
277, 42, 14, 303, 40, 99, 102, 173, 137, 321, 335, 155, 163, 81
The procedure involves the rejection of a large number of random numbers, nearly two
thirds. A device commonly used to avoid the rejection of so many numbers (known as
the Remainder Method of Selection) is to divide a random number by 338 and take the
remainder as the corresponding serial number between 1 and 338, the remainder zero
corresponding to 338. It is, however, essential to reject the random numbers 677 to
999 and also 000 in adopting this procedure, as otherwise villages with serial
numbers 1 to 323 will get a larger chance of selection.
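The remainder method can be sketched as below. The function name is hypothetical; the rejection limit 676 is the largest multiple of 338 that fits in three digits, which is why 677 to 999 (and 000) are discarded.

```python
from collections import Counter

def remainder_select(random_number, N=338):
    """Map a three-digit random number to a serial number 1..N, or None if rejected."""
    limit = (999 // N) * N                 # 676 when N = 338
    if random_number == 0 or random_number > limit:
        return None                        # reject 000 and 677..999
    r = random_number % N
    return N if r == 0 else r              # remainder zero corresponds to N

# Every serial number should be reachable from the same count of random numbers.
counts = Counter(remainder_select(k) for k in range(1000))
accepted_counts = {s: c for s, c in counts.items() if s is not None}
print(len(accepted_counts), set(accepted_counts.values()))
```

Each serial number from 1 to 338 is reached by exactly two of the accepted three-digit numbers, so all villages keep an equal chance of selection.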
Using the random number tables, the following samples have been made.
125, 206, 326, 193, 12, 237, 35, 251, 325, 338, 114, 231, 78, 112, 126, 330, 312, 165, 131, 198,
33, 161, 209, 51, 52, 331, 218, 337, 238, 323, 263, 90, 11, 223
i.e. E(pi)= Pi
It has been shown with an example given below:
Then, E(pi) = 4.0/10 = 0.4
If the sampling is done using SRSWR, then E(pi) = 10.0/25 = 0.4 and Pi = 0.4. Here also
the sample proportion provides an unbiased estimator of the population proportion.
Q24. Simple and Stratified Random Sampling (For Practical and Theory)
Simple Random Sampling (SRS): It is the process of selecting a sample from a given population
according to some law of chance in which each unit of the population has an equal and independent
chance of being included in the sample.
SRSWR (With Replacement): A selection process in which the unit selected at any draw is
replaced in the population before the next draw is known as simple random sampling
with replacement. In this case the number of possible samples of size n selected from the
population of size N is $N^n$. The units selected through this method need not be distinct.
SRSWOR (Without Replacement): A selection process in which the unit selected at any draw
is not replaced in the population before the next draw, so that the next unit is
selected from the remaining population, is known as simple random sampling without
replacement. In this case the number of possible samples of size n selected from the population
of size N is $\binom{N}{n}$. The units selected through this method are distinct.
Note: The sample mean is an unbiased estimate of the population mean in SRSWR and SRSWOR,
whereas the sample mean square is an unbiased estimate of the population mean square in case of
SRSWOR only.
SRSWOR is more efficient than SRSWR because $V(\bar{y})_{SRSWOR} < V(\bar{y})_{SRSWR}$.
Stratified Random Sampling: When the population is heterogeneous and we wish every
section of the population to be represented in the sample, we divide the whole population into a
number of strata so that the strata differ from one another while the units
within each stratum are more homogeneous. This technique of selecting a representative sample
of the whole population is known as stratified random sampling.
In stratified random sampling the allocation of the sample size to different strata is based on the
stratum sizes (Ni), the variability within the stratum (Si2) and the cost of surveying per sampling
unit in the stratum.
Methods for allocation of the sample size n to k strata are
Equal Allocation: $n_i = \frac{n}{k}$
Proportional Allocation: $n_i = n\frac{N_i}{N}$
Neyman Allocation: $n_i = n\frac{N_i S_i}{\sum N_i S_i}$
Objective: In simple random sampling, show that the sample mean and sample mean square are
unbiased estimates of the population mean and population mean square with the help of a
hypothetical population in SRSWOR, and determine the variance and S.E.
Kinds of data: The data relate to the hypothetical population whose units are 1, 2, 3, 4 and 5.
Draw a sample of size n=3 using SRSWOR.
Solution: The number of all possible samples of size n=3 under SRSWOR is given by
$\binom{N}{n} = \binom{5}{3} = 10$.
Compute the mean of each sample, $\bar{y} = \frac{\sum y_i}{n}$, and the sample mean square, $s^2 = \frac{1}{n-1}\sum (y_i - \bar{y})^2$.
Similarly the mean of the population is $\bar{Y} = \frac{\sum Y_i}{N} = \frac{15}{5} = 3$ and the population mean square is
$S^2 = \frac{1}{N-1}\sum (Y_i - \bar{Y})^2 = \frac{1}{4}[(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2] = \frac{10}{4} = 2.5$
The 10 possible samples are given below in the table.
S.No.   Possible samples   Sample mean   Sample mean square (s2)   Sampling error ($\bar{y} - \bar{Y}$)
$E(\bar{y}) = \frac{\sum \bar{y}}{10} = \frac{30}{10} = 3 = \bar{Y}$ and $E(s^2) = \frac{\sum s^2}{10} = \frac{25}{10} = 2.5 = S^2$,
then we can say that the sample mean and the sample variance s2 are unbiased estimators of the
population mean and the population variance S2 respectively.
In order to find the variance of the sample mean in SRSWOR, we know that
$V(\bar{y})_{SRSWOR} = \frac{N-n}{Nn} S^2 = \frac{5-3}{5 \times 3} \times 2.5 = 0.33$
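The table of the 10 possible samples can be generated with the sketch below, which prints one row per sample and then the quantities used above. The variable names are illustrative; `statistics.variance` uses the divisor n-1, matching the definition of the mean square.

```python
from itertools import combinations
from statistics import mean, variance

population = [1, 2, 3, 4, 5]
N, n = len(population), 3
pop_mean = mean(population)          # 3
S2 = variance(population)            # population mean square, divisor N - 1

rows = []
for sample in combinations(population, n):
    ybar = mean(sample)
    rows.append((sample, ybar, variance(sample), ybar - pop_mean))

for number, (sample, ybar, s2, err) in enumerate(rows, start=1):
    print(f"{number:>2}  {sample}  {ybar:5.2f}  {s2:5.2f}  {err:6.2f}")

e_mean = sum(r[1] for r in rows) / len(rows)
e_s2 = sum(r[2] for r in rows) / len(rows)
var_mean = sum(r[3] ** 2 for r in rows) / len(rows)
print(e_mean, e_s2, var_mean, (N - n) / (N * n) * S2)
```

The enumerated variance of the sample mean agrees with the formula value (N-n)/(Nn) S2, and the sampling errors in the last column sum to zero.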
S.No.  Possible  Sample  Sample mean  Sampling error    S.No.  Possible  Sample  Sample mean  Sampling error
       samples   mean    square (s2)  ($\bar{y}-\bar{Y}$)                  samples   mean    square (s2)  ($\bar{y}-\bar{Y}$)
1      1,2       1.5     0.50         -1.5              13     4,1       2.5     4.50         -0.5
2      1,3       2.0     2.00         -1.0              14     5,1       3.0     8.00          0.0
3      1,4       2.5     4.50         -0.5              15     3,2       2.5     0.50         -0.5
4      1,5       3.0     8.00          0.0              16     4,2       3.0     2.00          0.0
5      2,3       2.5     0.50         -0.5              17     5,2       3.5     4.50          0.5
6      2,4       3.0     2.00          0.0              18     4,3       3.5     0.50          0.5
7      2,5       3.5     4.50          0.5              19     5,3       4.0     2.00          1.0
8      3,4       3.5     0.50          0.5              20     5,4       4.5     0.50          1.5
9      3,5       4.0     2.00          1.0              21     1,1       1.0     0.00         -2.0
10     4,5       4.5     0.50          1.5              22     2,2       2.0     0.00         -1.0
11     2,1       1.5     0.50         -1.5              23     3,3       3.0     0.00          0.0
12     3,1       2.0     2.00         -1.0              24     4,4       4.0     0.00          1.0
                                                        25     5,5       5.0     0.00          2.0
Total (all 25 samples): sample mean 75.0, sample mean square 50.00
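$E(\bar{y}) = \frac{75}{25} = 3 = \bar{Y}$ and $E(s^2) = \frac{50}{25} = 2.0 \ne S^2 = 2.5$,
then we can say that the sample mean is an unbiased estimate of the population mean, whereas the
sample variance s2 is not an unbiased estimate of the population variance S2 in case of SRSWR.
In order to find the variance of the sample mean in SRSWR, we know that
$V(\bar{y})_{SRSWR} = \frac{\sigma^2}{n} = \frac{N-1}{Nn} S^2 = \frac{5-1}{5 \times 2} \times 2.5 = 1.0$
Standard Error of $\bar{y}$ $= \sqrt{V(\bar{y})} = \sqrt{1} = 1$
In order to find the estimate of $V(\bar{y})$ based on the 9th sample, we have
$\hat{V}(\bar{y}) = \frac{N-1}{Nn} s^2 = \frac{5-1}{5 \times 2} \times 2.0 = 0.8$
Standard Error of $\bar{y}$ $= \sqrt{\hat{V}(\bar{y})} = \sqrt{0.80} = 0.894$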
Objective: Drawing of samples in stratified random sampling under different allocations, along
with determination of their variances and standard errors.
Kinds of data: A hypothetical population of N=3000 is divided into four strata. Their population
sizes and standard deviations are given as follows:
Strata     I     II    III   IV
Size Ni    400   600   900   1100
SD Si      4     6     9     12
A stratified random sample of size 800 is to be selected from the population.
Solution: In case of
(i) Equal allocation, the sample sizes allocated to the different strata are the same. Hence the
sample sizes will be $n_i = \frac{n}{k} = \frac{800}{4} = 200$ samples from each
stratum.
(ii) In case of proportional allocation, ni (i=1,2,3,4) is given by ni = n pi, where pi = Ni/N, i.e.
$n_i = \frac{n N_i}{N}$
Hence n1 = (800 × 400)/3000 = 106.67 ≈ 107 samples from stratum I
n2 = (800 × 600)/3000 = 160 samples from stratum II
n3 = (800 × 900)/3000 = 240 samples from stratum III
n4 = (800 × 1100)/3000 = 293.33 ≈ 293 samples from stratum IV
Thus, n1 + n2 + n3 + n4 = 800 constitute the samples required from all the strata.
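(iii) In case of Neyman allocation, $n_i = n\frac{N_i S_i}{\sum N_i S_i}$, where $\sum N_i S_i = 1600 + 3600 + 8100 + 13200 = 26500$.
Hence, n1 = 800 × (400 × 4)/26500 = 48, n2 = 800 × (600 × 6)/26500 = 109,
n3 = 800 × (900 × 9)/26500 = 245, n4 = 800 × (1100 × 12)/26500 = 398.
In Neyman allocation, the sample sizes from the four strata are 48, 109, 245 and 398, which
together constitute the required sample size of 800.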
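Variance of $\bar{y}_{st}$ in equal allocation: $V(\bar{y}_{st})_{equal} = \frac{k\sum p_i^2 S_i^2}{n} - \frac{\sum p_i S_i^2}{N}$,
from the above data $\sum p_i S_i = 8.83$, $\sum p_i S_i^2 = 86.43$ and $\sum p_i^2 S_i^2 = 28.37$
$V(\bar{y}_{st})_{equal} = \frac{4 \times 28.37}{800} - \frac{86.43}{3000} = 0.1418 - 0.0288 = 0.1130$
Standard Error of $(\bar{y}_{st})_{equal} = \sqrt{0.1130} = 0.336$
Variance of $\bar{y}_{st}$ in proportional allocation: $V(\bar{y}_{st})_{prop} = (\frac{1}{n} - \frac{1}{N})\sum p_i S_i^2 = (\frac{1}{800} - \frac{1}{3000}) \times 86.43 = 0.0792$
Standard Error of $(\bar{y}_{st})_{prop} = \sqrt{0.0792} = 0.2815$
Variance of $\bar{y}_{st}$ in Neyman allocation: $V(\bar{y}_{st})_{ney} = \frac{(\sum p_i S_i)^2}{n} - \frac{\sum p_i S_i^2}{N} = \frac{(8.83)^2}{800} - \frac{86.43}{3000} = 0.0687$
Standard Error of $(\bar{y}_{st})_{ney} = \sqrt{0.0687} = 0.262$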
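The three allocations and their variances can be reproduced with the sketch below. It assumes the general stratified variance $V(\bar{y}_{st}) = \sum p_i^2 S_i^2 (1/n_i - 1/N_i)$, of which the three expressions used above are special cases; the function and variable names are illustrative.

```python
N_i = [400, 600, 900, 1100]     # stratum sizes
S_i = [4, 6, 9, 12]             # stratum standard deviations
n = 800
N, k = sum(N_i), len(N_i)
p = [Ni / N for Ni in N_i]      # stratum weights p_i = N_i / N

equal = [n / k] * k
proportional = [n * pi for pi in p]
total_NS = sum(Ni * Si for Ni, Si in zip(N_i, S_i))
neyman = [n * Ni * Si / total_NS for Ni, Si in zip(N_i, S_i)]

def var_stratified(n_alloc):
    """General form: sum of p_i^2 * S_i^2 * (1/n_i - 1/N_i), using unrounded n_i."""
    return sum(pi ** 2 * Si ** 2 * (1 / ni - 1 / Ni)
               for pi, Si, ni, Ni in zip(p, S_i, n_alloc, N_i))

for name, alloc in [("equal", equal), ("proportional", proportional), ("Neyman", neyman)]:
    v = var_stratified(alloc)
    print(name, [round(a) for a in alloc], round(v, 4), round(v ** 0.5, 4))
```

The variances decrease from equal to proportional to Neyman allocation, in line with the values 0.1130, 0.0792 and about 0.069 obtained by hand.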
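Ratio Estimator: The ratio method of estimation is based on the information available for an
auxiliary variable. When the correlation coefficient between the study variable and the
auxiliary variable is positive and high, the ratio method of estimation can be used to study the
population parameters of the study variable Y.
The ratio estimator is given by $\bar{y}_R = \frac{\bar{y}}{\bar{x}}\bar{X} = \hat{R}\bar{X}$, where $\bar{y}$ and $\bar{x}$ are the sample means of
the study and auxiliary variables, $\bar{X}$ is the population mean of the auxiliary variable and $\hat{R} = \frac{\bar{y}}{\bar{x}}$.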
Regression Estimator: The ratio estimator is used if y and x are linearly related and the line of
regression of y on x passes through the origin, so that the variate y is approximately a
constant multiple of the auxiliary variate x. When the relation is linear but the line does not
pass through the origin, the regression estimator is used.
The regression estimator can be defined as $\bar{y}_l = \bar{y} + b(\bar{X} - \bar{x})$
The regression estimator is also a biased estimate of the population mean.
The variance of the regression estimator is given by $V(\bar{y}_l) = \frac{N-n}{Nn} S_y^2 (1 - r_{xy}^2)$, here $r_{xy} = \frac{s_{xy}}{s_x s_y}$
and $b = \frac{s_{xy}}{s_x^2}$, where $s_x^2 = \frac{1}{n-1}[\sum x^2 - \frac{(\sum x)^2}{n}]$ and $s_{xy} = \frac{1}{n-1}[\sum xy - \frac{\sum x \sum y}{n}]$
The regression estimator is more efficient than the ratio estimator: $V(\bar{y}_l) < V(\bar{y}_R)$
If the correlation coefficient is equal to zero, we should not apply the regression estimator.
Objective: Estimation of the average number of bullocks per acre using the ratio estimator, and
showing that it is a biased estimator of the population mean. Compute the bias and variance along
with the standard error.
Kinds of data: A bivariate population of size N=6 is given below:
No. of bullocks (Y)      3    4    8    9    6    9
Farm size in acres (X)   15   20   40   45   25   42
$\bar{X} = \frac{\sum X_i}{N} = \frac{187}{6} = 31.17$, $\bar{Y} = \frac{\sum Y_i}{N} = \frac{39}{6} = 6.50$
$E(\bar{y}_R) = \frac{\sum \bar{y}_R}{\text{number of possible samples}} = 6.514$,
Since $E(\bar{y}_R) \ne \bar{Y}$, the ratio estimator is not an unbiased estimator of the population mean.
The bias of the ratio estimator to the first order of approximation is given by
$B(\bar{y}_R) = \frac{N-n}{Nn}\bar{Y}(C_x^2 - \rho C_x C_y)$, where $C_x = \frac{S_x}{\bar{X}}$ and $C_y = \frac{S_y}{\bar{Y}}$
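The expectation of the ratio estimator can be checked by enumerating all samples. The sample size is not stated in the text above; the sketch below assumes n = 2 under SRSWOR (15 possible samples), an assumption that reproduces the quoted value 6.514.

```python
from itertools import combinations

Y = [3, 4, 8, 9, 6, 9]          # number of bullocks
X = [15, 20, 40, 45, 25, 42]    # farm size in acres
n = 2                           # assumed sample size (not given in the text)

X_bar = sum(X) / len(X)
Y_bar = sum(Y) / len(Y)

estimates = []
for idx in combinations(range(len(Y)), n):
    y_bar = sum(Y[i] for i in idx) / n
    x_bar = sum(X[i] for i in idx) / n
    estimates.append(y_bar / x_bar * X_bar)    # ratio estimate for this sample

expected = sum(estimates) / len(estimates)
bias = expected - Y_bar
print(round(X_bar, 2), Y_bar, round(expected, 3), round(bias, 3))
```

Under this assumption the exact bias from enumeration is a small positive number, the difference between 6.514 and 6.50.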
Objective: Determination of the regression estimator, comparison with the ratio estimator, and
its sampling variance and standard errors.
Kinds of data: A bi-variate population of size N=85 with population mean = 6.55 and =
8.55, a random sample of size n=10 was drawn using SRSWOR scheme and was recorded as
Y 11 8 7 6 4 5 3 2 9 10
X 10 7 6 5 3 4 2 1 8 9
$V(\bar{y})_{SRSWOR} = \frac{N-n}{Nn} s_y^2 = \frac{85-10}{85 \times 10} \times 9.16 = 0.808$
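The sample quantities for this data set can be computed as in the sketch below. It only covers what the recorded sample determines: the sample mean square of Y, the regression coefficient b and the SRSWOR variance of the sample mean. The variable names are illustrative.

```python
from statistics import mean, variance

y = [11, 8, 7, 6, 4, 5, 3, 2, 9, 10]
x = [10, 7, 6, 5, 3, 4, 2, 1, 8, 9]
N, n = 85, len(y)

s2_y = variance(y)                               # divisor n - 1
s2_x = variance(x)
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
b = s_xy / s2_x                                  # regression coefficient of y on x

v_mean = (N - n) / (N * n) * s2_y                # variance of the sample mean, SRSWOR
print(round(s2_y, 2), round(b, 2), round(v_mean, 3), round(v_mean ** 0.5, 3))
```

The unrounded mean square is 9.1667, so the variance comes out marginally above the 0.808 obtained from the truncated value 9.16. In this sample every y equals x + 1, so b is exactly 1.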
Let's consider a situation where a research team is seeking opinions about religion amongst
various age groups. Instead of collecting feedback from 326,044,985 U.S. citizens, a random
sample of around 10000 can be selected for research. These 10000 citizens can be
divided into strata according to age, i.e., groups of 18-29, 30-39, 40-49, 50-59, and 60 and
above. Each stratum will have distinct members.
Example: A hypothetical population of N=2000 is divided into four strata. Their population
sizes and standard deviations are as under:
Strata               I     II    III   IV
Size (Ni)            300   400   600   700
S.D. (Si)            6     10    12    15
Sample means (yni)   5     10    15    20
A stratified random sample of size 400 is to be chosen from the population using
SRSWOR.
(i) Equal allocation: ni=n/k, which means 400/4=100 samples will be taken from
each stratum.
(ii) Proportional allocation: ni=npi
n1=400x300/2000=60 samples from I stratum
n2=400x400/2000=80 samples from II stratum
n3=400x600/2000=120 samples from III stratum
n4=400x700/2000=140 samples from IV stratum
(iii) Neyman allocation: ni=npisi/Σpisi, where Σpisi = 0.9+2.0+3.6+5.25 = 235/20 = 11.75
n1=400x(300/2000)x6/(235/20)=30.6, i.e. 31 samples from I stratum
n2=400x(400/2000)x10/(235/20)=68.1, i.e. 68 samples from II stratum
n3=400x(600/2000)x12/(235/20)=122.6, i.e. 122 samples from III stratum (rounded down so
that the total remains 400)
n4=400x(700/2000)x15/(235/20)=178.7, i.e. 179 samples from IV stratum
In this way, we have 400 samples from the entire population under each of the three allocations.
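Q26. Show that the sample mean in stratified random sampling provides an unbiased
estimator of population mean.
Since the sample in each stratum is drawn using SRSWOR, the stratum sample mean
$\bar{y}_{n_i}$ is an unbiased estimator of the stratum mean $\bar{Y}_{N_i}$, i.e. $E(\bar{y}_{n_i}) = \bar{Y}_{N_i}$ for the ith
stratum.
Hence, with $\bar{y}_{st} = \sum_{i=1}^{K} p_i \bar{y}_{n_i}$ and $p_i = N_i/N$,
$E(\bar{y}_{st}) = \sum_{i=1}^{K} p_i E(\bar{y}_{n_i}) = \sum_{i=1}^{K} p_i \bar{Y}_{N_i}$, which is equal to $\bar{Y}_N$.
Hence proved.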
Theorem 1: In SRSWOR, the probability of drawing any specified unit at the
rth draw is equal to the probability of drawing it at the first draw.
Proof: The specified unit is drawn at the rth draw only if (a) it is not selected at
any of the first r-1 draws and (b) it is selected at the rth draw. The required
probability is the product of the conditional probabilities of these events.
Let us suppose that there are N units in the population. The probability of selecting the
unit at the first draw is 1/N and the probability of not selecting it at this draw is
1-(1/N) = (N-1)/N. Similarly, given that it has not been selected earlier, the
probability of not selecting the unit at the second draw is (N-2)/(N-1).
In the same way, the probability of not selecting the unit at the third draw is
(N-3)/(N-2).
In general, the probability of not selecting the unit at the (r-1)th draw is
(N-r+1)/(N-r+2).
The second requirement is that the unit is selected at the rth draw. The probability
of selecting a unit at the first draw is 1/N, at the second draw 1/(N-1) and at the
third draw 1/(N-2), each time out of the units still remaining. Proceeding in the same
way, the probability of selecting the unit at the rth draw is 1/(N-r+1).
The required probability is therefore
$\frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdot \frac{N-3}{N-2} \cdots \frac{N-r+1}{N-r+2} \cdot \frac{1}{N-r+1}$, which is equal to 1/N.
It shows that in SRSWOR, the probability of drawing any specified unit at the rth draw is
equal to the probability of drawing it at the first draw.
Theorem 2: In SRSWOR, the probability of including any specified unit in the sample is
equal to n/N.
Proof: The unit is included in the sample if it is selected at the first draw, or at the
second draw, and so on up to the nth draw. These events are mutually exclusive, so their
probabilities are added, and each of them is equal to 1/N.
Thus, $\frac{1}{N} + \frac{1}{N} + \cdots + \frac{1}{N}$ (n terms), which is equal to n/N.
Hence proved.
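Both theorems can be checked numerically with exact fractions. The sketch below multiplies the conditional probabilities draw by draw and then sums over the n draws; the function names and the values N = 10, n = 4 are illustrative.

```python
from fractions import Fraction

def prob_selected_at_draw(N, r):
    """P(a specified unit is drawn at the r-th draw) under SRSWOR."""
    p = Fraction(1)
    for j in range(1, r):                    # not selected at draws 1 .. r-1
        p *= Fraction(N - j, N - j + 1)
    return p * Fraction(1, N - r + 1)        # selected at draw r from the remaining units

def inclusion_probability(N, n):
    """P(a specified unit is in a sample of size n): sum over the n draws."""
    return sum(prob_selected_at_draw(N, r) for r in range(1, n + 1))

N, n = 10, 4
print([prob_selected_at_draw(N, r) for r in range(1, N + 1)])
print(inclusion_probability(N, n))
```

Every draw gives the same probability 1/N, and adding n of them gives n/N.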