Sampling CH-9
9.1 Definition
In most practical applications the sampling units (for example, schools, cities, kebeles,
households or farms) contain different numbers of elements or subunits. The sampling
procedures discussed so far assume that samples are selected by simple random sampling, in
which the selection probabilities are equal for all units in the population. If the units vary
considerably in size, simple random sampling may not be appropriate, since it does not take into
account the possibly greater importance of the larger units in the population. One approach
that allows for differences in the sizes of the units is to assign unequal probabilities of
selection to the units in the population, so as to obtain more reasonable estimates of the
population quantities.
For example, villages with larger geographical areas are likely to have larger populations and
larger areas under food crops. In estimating production or food supply, it may well be desirable
to adopt a selection scheme in which villages are selected with probabilities proportional to
their population or to their geographical area.
A sampling procedure in which the units are selected with probabilities proportional to some
measure of size is known as sampling with probability proportional to size (pps). The units may
be selected with or without replacement. In this chapter we only treat sampling with replacement.
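To draw a sample of size $n$ from a population of size $N$ with probability proportional to size
and with replacement, we proceed as follows. Let $Z_i$ be an ancillary variable giving the size of
the $i$th unit ($i = 1, 2, \ldots, N$), and let $Z = \sum_{i=1}^{N} Z_i$. Form the successive
cumulative totals of the $Z_i$ and assign to the $i$th unit the range of integers from
$Z_1 + \cdots + Z_{i-1} + 1$ to $Z_1 + \cdots + Z_i$. A random number between 1 and $Z$ is chosen,
and the unit whose range contains it is selected. Repeating this $n$ times gives the sample; on
every draw the $i$th unit is selected with probability $p_i = Z_i/Z$.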
Example 1: A village has 10 kebeles containing 150, 50, 80, 100, 200, 160, 40, 220, 60 and 140
households respectively. It is desired to select a sample of 4 kebeles with replacement and with
probability proportional to the number of households in the kebele. The first step in the selection
of kebeles is to form successive cumulative totals and ranges as shown below:
Kebele No.   Size Z_i (households)   Cumulative total   Range
1            150                     150                1-150
2            50                      200                151-200
3            80                      280                201-280
4            100                     380                281-380
5            200                     580                381-580
6            160                     740                581-740
7            40                      780                741-780
8            220                     1000               781-1000
9            60                      1060               1001-1060
10           140                     1200               1061-1200
To select a kebele, we choose a random number between 1 and 1200 from a table of random
numbers. Suppose the chosen random number is 600. This number falls within the range 581-740
associated with the 6th kebele, so the 6th kebele is selected. Three more random numbers are
drawn; suppose they are 650, 850 and 300. The kebeles corresponding to these random numbers are
the 6th, 8th and 4th, respectively. Thus, in this sample of 4 kebeles selected with probability
proportional to size and with replacement, the 6th kebele is selected twice.
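The cumulative total procedure used in Example 1 can be sketched in Python as follows. This is an illustrative sketch, not part of the original notes: the helper names are hypothetical, and Python's `random` module stands in for the printed table of random numbers.

```python
import bisect
import itertools
import random


def cumulative_totals(sizes):
    """Successive cumulative totals Z_1, Z_1 + Z_2, ..., Z of the unit sizes."""
    return list(itertools.accumulate(sizes))


def unit_for_number(cum_totals, r):
    """1-based serial number of the unit whose range contains r (1 <= r <= Z)."""
    # The first cumulative total >= r marks the unit whose range
    # (previous total + 1) .. (own total) contains r.
    return bisect.bisect_left(cum_totals, r) + 1


def pps_with_replacement(sizes, n, rng=random):
    """Hypothetical helper: n independent pps draws, with replacement."""
    cum = cumulative_totals(sizes)
    total = cum[-1]
    # Each draw picks a random integer in 1..Z, so unit i has probability Z_i / Z.
    return [unit_for_number(cum, rng.randint(1, total)) for _ in range(n)]


# Example 1: number of households in each of the 10 kebeles.
households = [150, 50, 80, 100, 200, 160, 40, 220, 60, 140]
cum = cumulative_totals(households)
print(cum[-1])                                                  # 1200
print([unit_for_number(cum, r) for r in (600, 650, 850, 300)])  # [6, 6, 8, 4]
```

Because each draw is independent, the same kebele can appear more than once, exactly as the 6th kebele does here.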
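In pps systematic sampling, the sizes $Z_i$ are cumulated as in Example 1 to obtain the total $Z$.
If $n$ is the sample size, we compute the sampling interval $I$, the integer nearest to $Z/n$;
when $Z/n$ is an integer, $I = Z/n$.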
We choose a random number between 1 and $I$ inclusive from a random number table; let this
number be $j$.
The sample then consists of the $n$ units whose cumulative ranges contain the numbers
$j, j + I, j + 2I, \ldots, j + (n-1)I$.
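A minimal Python sketch of linear pps systematic sampling is shown below. It assumes, as the text does, that $Z/n$ is an integer; the function name `pps_systematic` is hypothetical, and `random` again replaces the random number table.

```python
import bisect
import itertools
import random


def pps_systematic(sizes, n, start=None, rng=random):
    """Linear pps systematic sample (hypothetical helper).

    Assumes Z is an exact multiple of n, so that I = Z / n is an integer.
    Returns the 1-based serial numbers of the selected units.
    """
    cum = list(itertools.accumulate(sizes))
    total = cum[-1]
    if total % n != 0:
        raise ValueError("Z/n is not an integer; use the circular variant")
    interval = total // n
    if start is None:
        start = rng.randint(1, interval)                # random start j in 1..I
    numbers = [start + k * interval for k in range(n)]  # j, j+I, ..., j+(n-1)I
    # Each number selects the unit whose cumulative range contains it.
    return [bisect.bisect_left(cum, r) + 1 for r in numbers]


households = [150, 50, 80, 100, 200, 160, 40, 220, 60, 140]
print(pps_systematic(households, 4, start=291))  # [4, 6, 8, 10]
```

Since $j \le I$, the largest number $j + (n-1)I$ never exceeds $nI = Z$, so every number falls inside some unit's range.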
If $Z/n$ is not an integer, a pps circular systematic sample can be obtained by selecting a random
start $j$ from 1 to $Z$ and then proceeding cyclically, with the integer nearest to $Z/n$ as the
interval: whenever a number exceeds $Z$, $Z$ is subtracted from it, so that counting continues
from the beginning of the list.
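The circular variant can be sketched the same way. This is a hypothetical helper, not the notes' own procedure code: it rounds $Z/n$ to the nearest integer with halves rounded up (a tie-breaking choice the text does not specify) and wraps numbers past $Z$ back to the start. The $n = 7$ run is a made-up illustration on the Example 1 sizes, chosen so that $1200/7$ is not an integer.

```python
import bisect
import itertools
import random


def pps_circular_systematic(sizes, n, start=None, rng=random):
    """Circular pps systematic sample (hypothetical helper).

    Returns (numbers, units): the cumulative-total numbers used and the
    1-based serial numbers of the units whose ranges contain them.
    """
    cum = list(itertools.accumulate(sizes))
    total = cum[-1]
    # Integer nearest to Z/n, rounding halves up, using integer arithmetic.
    interval = (2 * total + n) // (2 * n)
    if start is None:
        start = rng.randint(1, total)  # random start j in 1..Z
    # Step by I; numbers past Z wrap around to 1, 2, ... (counting cyclically).
    numbers = [(start - 1 + k * interval) % total + 1 for k in range(n)]
    units = [bisect.bisect_left(cum, r) + 1 for r in numbers]
    return numbers, units


households = [150, 50, 80, 100, 200, 160, 40, 220, 60, 140]
# Hypothetical illustration: n = 7, so Z/n = 171.43... and I = 171.
numbers, units = pps_circular_systematic(households, 7, start=1100)
print(numbers)  # [1100, 71, 242, 413, 584, 755, 926]
print(units)    # [10, 1, 3, 5, 6, 7, 8]
```

When $Z/n$ is an integer and $j \le I$, no wrapping occurs and the result coincides with the linear method.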
Example 2: Consider Example 1 again, this time using pps systematic sampling. The interval is
$I = Z/n = 1200/4 = 300$. A random number between 1 and 300 is then selected from a random number
table; let this number be $j = 291$. The remaining three numbers are 591, 891 and 1191, so the
selected kebeles are the 4th, 6th, 8th and 10th, respectively.
9.3 Estimation of the Population Total $Y$ and Mean $\bar{Y}$ from Selection with Unequal Probabilities
Suppose that sampling is with replacement, that on each draw the probability of selecting the
$i$th unit of the population is $p_i$, and that the characteristic under study takes the value
$y_i$ on the $i$th unit, for $i = 1, 2, \ldots, N$.
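An unbiased estimator of the population total $Y$ is $\hat{Y}_{pps} = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{p_i}$,
and the corresponding estimator of the population mean, $\bar{y}_{pps} = \hat{Y}_{pps}/N$, has the variance

$$V(\bar{y}_{pps}) = \frac{1}{N^2 n}\sum_{i=1}^{N} p_i\left(\frac{y_i}{p_i} - Y\right)^2 .$$

The sample estimate of this variance is

$$v(\bar{y}_{pps}) = \frac{1}{N^2 n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \hat{Y}_{pps}\right)^2 .$$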
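A minimal Python sketch of this estimator (commonly known as the Hansen-Hurwitz estimator) and its sample variance follows. The helper name `pps_wr_estimates` and the three-draw data set are hypothetical; `y` and `p` hold the values and per-draw selection probabilities of the units drawn, in draw order.

```python
def pps_wr_estimates(y, p, N=None):
    """Estimates from a pps sample drawn with replacement (hypothetical helper).

    y: values y_i observed on the n draws (a unit drawn twice appears twice)
    p: per-draw selection probabilities p_i of the same draws
    N: population size, needed only for the estimates of the mean
    """
    n = len(y)
    if n != len(p) or n < 2:
        raise ValueError("need at least two draws with matching y and p")
    t = [yi / pi for yi, pi in zip(y, p)]  # y_i / p_i for each draw
    Y_hat = sum(t) / n                     # estimate of the total Y
    v_Y_hat = sum((ti - Y_hat) ** 2 for ti in t) / (n * (n - 1))
    out = {"Y_hat": Y_hat, "v_Y_hat": v_Y_hat}
    if N is not None:
        out["ybar_pps"] = Y_hat / N          # estimate of the mean
        out["v_ybar_pps"] = v_Y_hat / N ** 2  # gives the 1/(N^2 n(n-1)) factor
    return out


# Hypothetical data: three draws with made-up y_i and p_i; N = 10 units.
est = pps_wr_estimates([30, 12, 50], [0.2, 0.1, 0.25], N=10)
print(est)
```

Dividing the variance of the total by $N^2$ reproduces the $1/(N^2 n(n-1))$ factor in $v(\bar{y}_{pps})$.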
When selection is strictly proportional to size, that is, $p_i = Z_i/Z$ where $Z = \sum_{i=1}^{N} Z_i$,
these results take the following form.
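Theorem 9.3: If a sample of $n$ units is drawn with replacement and with probabilities
proportional to size, $p_i = Z_i/Z$, the unbiased estimate of $Y$ is given by

$$\hat{Y}_{pps} = \frac{Z}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i} = \frac{Z}{n}\sum_{i=1}^{n}\bar{y}_i = Z\,\bar{y},$$

where $\bar{y}_i = y_i/Z_i$ is the mean of the $i$th unit and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$
is the unweighted mean of the unit means. Its variance is

$$V(\hat{Y}_{pps}) = \frac{Z}{n}\sum_{i=1}^{N} Z_i\left(\bar{y}_i - \bar{Y}\right)^2, \qquad \text{where } \bar{Y} = \frac{Y}{Z}.$$

Similarly, an unbiased sample estimate of $V(\hat{Y}_{pps})$ is

$$v(\hat{Y}_{pps}) = \frac{Z^2}{n(n-1)}\sum_{i=1}^{n}\left(\bar{y}_i - \bar{y}\right)^2 .$$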
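Theorem 9.3 can be sketched in Python as below. The helper name `pps_size_estimates` is hypothetical, and the `y` values are made up for illustration only; the sizes are those of the kebeles drawn in Example 1 (6, 6, 8 and 4). The last lines check that the general estimator with $p_i = Z_i/Z$ gives the same total.

```python
def pps_size_estimates(y, z, Z):
    """Theorem 9.3 estimates when p_i = Z_i / Z (hypothetical helper).

    y: values y_i of the n sampled units; z: their sizes Z_i; Z: total size.
    """
    n = len(y)
    unit_means = [yi / zi for yi, zi in zip(y, z)]  # ybar_i = y_i / Z_i
    ybar = sum(unit_means) / n                      # unweighted mean of unit means
    Y_hat = Z * ybar                                # = (Z/n) * sum(y_i / Z_i)
    v_Y_hat = Z ** 2 * sum((m - ybar) ** 2 for m in unit_means) / (n * (n - 1))
    return Y_hat, v_Y_hat


# Hypothetical illustration: sizes of the Example 1 draws, made-up y_i values.
sizes = [160, 160, 220, 100]
y = [640, 640, 990, 420]
Y_hat, v_Y_hat = pps_size_estimates(y, sizes, 1200)
print(Y_hat, v_Y_hat)

# The general form (1/n) * sum(y_i / p_i) with p_i = Z_i / Z agrees.
p = [zi / 1200 for zi in sizes]
general_Y_hat = sum(yi / pi for yi, pi in zip(y, p)) / len(y)
print(general_Y_hat)
```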
Example 3: A village has 24 households and the size of each household is shown below in the
table.
a) Select 5 households with probability proportional to size, with replacement using simple
random sampling.
Solution: Cumulate the sizes of the households to obtain $Z = \sum_{i=1}^{N} Z_i = 109$. Then choose
5 random numbers between 1 and 109 from a random number table. If these numbers are 28, 36, 69,
80 and 104, they correspond to households 7, 8, 15, 17 and 23, respectively.
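The estimate of the population total $X$ is

$$\hat{X}_{pps} = \frac{Z}{n}\sum_{i=1}^{n}\frac{x_i}{Z_i} = \frac{109\,(10.1667 + 10.3333 + 11.6 + 11.6667 + 11.75)}{5} = \frac{109 \times 55.5167}{5} = 1210.264 .$$

Its estimated variance is

$$v(\hat{X}_{pps}) = \frac{Z^2}{n(n-1)}\sum_{i=1}^{n}\left(\bar{x}_i - \bar{x}\right)^2,$$

where $\bar{x}_i = x_i/Z_i$ and

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n}\bar{x}_i = \frac{10.1667 + 10.3333 + 11.6 + 11.6667 + 11.75}{5} = \frac{55.5167}{5} = 11.10334 .$$

Hence

$$v(\hat{X}_{pps}) = \frac{109^2\left[(10.1667 - 11.10334)^2 + (10.3333 - 11.10334)^2 + (11.6 - 11.10334)^2 + (11.6667 - 11.10334)^2 + (11.75 - 11.10334)^2\right]}{5(5-1)} = \frac{109^2 \times 2.45247}{20} = 1456.8898,$$

and $\text{s.e.}(\hat{X}_{pps}) = \sqrt{1456.8898} = 38.1692$.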
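Similarly, the estimate of the population total $W$ is

$$\hat{W}_{pps} = \frac{Z}{n}\sum_{i=1}^{n}\frac{w_i}{Z_i} = \frac{109\,(4.5 + 5.0 + 5.0 + 7.0 + 5.5)}{5} = 588.6, \qquad \text{where } \bar{w} = \frac{1}{n}\sum_{i=1}^{n}\bar{w}_i = 5.4$$

and $\bar{w}_i = w_i/Z_i$. Its estimated variance is

$$v(\hat{W}_{pps}) = \frac{Z^2}{n(n-1)}\sum_{i=1}^{n}\left(\bar{w}_i - \bar{w}\right)^2 = \frac{109^2\left[(4.5 - 5.4)^2 + (5 - 5.4)^2 + (5 - 5.4)^2 + (5.5 - 5.4)^2 + (7 - 5.4)^2\right]}{5(5-1)} = \frac{109^2 \times 3.7}{20} = 2197.985,$$

and $\text{s.e.}(\hat{W}_{pps}) = 46.88267$.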
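The Example 3 arithmetic can be rechecked with a short Python sketch. It assumes, as the worked solution does, that the listed values are the per-unit ratios $x_i/Z_i$ and $w_i/Z_i$ of the five sampled households; the helper name `pps_ratio_estimates` is hypothetical.

```python
import math


def pps_ratio_estimates(ratios, Z):
    """Theorem 9.3 from per-unit ratios r_i = x_i / Z_i (hypothetical helper).

    Returns (estimate of the total, estimated variance, standard error).
    """
    n = len(ratios)
    mean_ratio = sum(ratios) / n                 # unweighted mean of the ratios
    estimate = Z * mean_ratio                    # = (Z/n) * sum(x_i / Z_i)
    variance = Z ** 2 * sum((r - mean_ratio) ** 2 for r in ratios) / (n * (n - 1))
    return estimate, variance, math.sqrt(variance)


Z = 109
x_ratios = [10.1667, 10.3333, 11.6, 11.6667, 11.75]
w_ratios = [4.5, 5.0, 5.0, 7.0, 5.5]

for name, ratios in (("X", x_ratios), ("W", w_ratios)):
    est, var, se = pps_ratio_estimates(ratios, Z)
    print(f"{name}: estimate = {est:.3f}, v = {var:.4f}, s.e. = {se:.4f}")
```

The results agree with the hand computation up to the rounding of intermediate sums (the hand calculation for $\hat{X}_{pps}$ uses the rounded sum of squares 2.45247).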