LESSON 5 - RESEARCH DESIGN
C. Diagnostic research studies: determine the frequency with which something occurs or its
association with something else, for example studies concerning whether certain
variables are associated.
In both descriptive and diagnostic research studies the researcher clearly defines what she wants
to measure and finds adequate methods of measuring it, along with a clear-cut definition of the
population she wants to study.
The research design must make enough provision for protection against bias and must maximize
reliability with due concern for the economical completion of the study.
Research design by type of study (exploratory or formulative study versus descriptive or
diagnostic study):

Overall design
    Exploratory/formulative study: Flexible design (provides opportunity for considering different
    aspects of the problem)
    Descriptive/diagnostic study: Rigid design (design must make enough provision for protection
    against bias and must maximize reliability)
Sampling design
    Exploratory/formulative study: Non-probability sampling design (purposive or judgment sampling)
    Descriptive/diagnostic study: Probability sampling design (random sampling)
Statistical design
    Exploratory/formulative study: No pre-planned design for analysis
    Descriptive/diagnostic study: Pre-planned design for analysis
Observational design
    Exploratory/formulative study: Unstructured instruments for collection of data
    Descriptive/diagnostic study: Structured or well thought out instruments for collection of data
Operational design
    Exploratory/formulative study: No fixed decisions about the operational procedures
    Descriptive/diagnostic study: Advanced decisions about operational procedures
SAMPLING DESIGN
(i) Census
All items in any field of inquiry constitute a ‘Universe’ or ‘Population’. A complete enumeration
of all items in the ‘population’ is known as a census inquiry. It can be presumed that in such an
inquiry, when all items are covered, no element of chance is left and highest accuracy is obtained.
Demerits of census:
1. There is no way of checking the element of bias or its extent except through a resurvey or
use of sample checks.
2. Involves a great deal of time, money and energy.
3. At times, this method is practically beyond the reach of ordinary researchers.
4. Sometimes it is not possible to examine every item in the population, and sometimes it is
possible to obtain sufficiently accurate results by studying only a part of the total population.
In such cases census surveys have no utility.
5. A census is impracticable where examining an element involves destroying it.
However, it needs to be emphasized that when the universe is a small one, it is no use resorting to
a sample survey.
(ii) Sample survey
When field studies are undertaken in practical life, considerations of time and cost almost
invariably lead to a selection of respondents, i.e. selection of only a few items. The informants
selected should be as representative of the total population as possible in order to produce a
miniature cross-section. The selected respondents constitute what is called a sample and the
selection process is known as a sampling technique. The survey so conducted is known as a
sample survey.
Algebraically, let the population size be N. If a part of size n (where n < N) of this
population is selected according to some rule for studying some characteristic of the population,
then the group consisting of these n units is known as a sample.
1. Systematic bias
This results from errors in the sampling procedures. It cannot be reduced or eliminated by
increasing the sample size. At best the causes responsible for these errors can be detected and
corrected.
A sample of households was taken with the object of making a study of morbidity. It was also
intended to use this sample for the study of birth rates. Before beginning this latter study, which
was subsidiary to the morbidity study, a comparison was made of the sizes of households of the
sample with those of the corresponding census tracts. This comparison is shown in table 5.1
(households of one were not included in the survey).
Table 5.1: Number of households in the original sample and in the census tracts
(rows run from households of two upward; the last row is the total)

          Original sample          Census tracts
          Number    Per cent       Number    Per cent
          254       19.4           1,762     26.8
          338       25.9           1,745     26.5
          307       23.5           1,438     21.9
          201       15.4           853       13.0
          106       8.1            388       5.9
          46        3.5            208       3.2
          25        1.9            96        1.5
          29        2.2            86        1.3
Total     1,306     99.9           6,576     100.1
It is immediately apparent from the table that the sample contains a greater proportion of large
households than exists in the whole population. Households of two are under-represented in
the sample to the extent of 7.4 per cent of all households (19.4 per cent in the sample against
26.8 per cent in the census tracts). This deficiency is attributed to the
failure of enumerators to include missed households, in which childless married women working
away from home are likely to predominate. In order to provide a more satisfactory sample, it
was necessary to make a further survey of those families that were missed altogether at the time of
the morbidity survey.
It is interesting to note that the sample was apparently considered satisfactory for the morbidity
study because the workers had been primarily concerned with securing a sample representative
of the area in regard to prevalence of sickness rather than size of household. Actually, such a
biased sample can scarcely be regarded as satisfactory even for a morbidity study, since sickness
rates are likely to vary with the size and composition of the family (adapted from Ngao and
Kumssa, 2004).
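The comparison made in table 5.1 can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the counts are those read from the table, while the function and variable names are assumptions introduced here. It recomputes the percentage columns and the gap between census and sample for each row.

```python
# Counts read from table 5.1; the first row is households of two.
sample_counts = [254, 338, 307, 201, 106, 46, 25, 29]
census_counts = [1762, 1745, 1438, 853, 388, 208, 96, 86]


def percentages(counts):
    """Share of each row in its column total, rounded to one decimal place."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


sample_pct = percentages(sample_counts)
census_pct = percentages(census_counts)

# Positive gap: row is under-represented in the sample.
# Negative gap: row is over-represented in the sample.
gaps = [round(c - s, 1) for s, c in zip(sample_pct, census_pct)]

for s, c, g in zip(sample_pct, census_pct, gaps):
    print(f"sample {s:5.1f}%   census {c:5.1f}%   gap {g:+5.1f}")
```

The first gap is the 7.4 per cent under-representation of households of two mentioned in the text; the negative gaps in the later rows show the excess of large households in the sample.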
2. Sampling errors
These are the random variations in the sample estimates around the true population parameters.
Since they occur randomly and are equally likely to be in either direction, they are
compensatory in nature and the expected value of such errors is zero.
Sampling error decreases as the size of the sample increases, and it is of a smaller
magnitude in the case of a homogeneous population.
Sampling error can be measured for a given sample design and size. The measurement of sampling
error is usually called the precision of the sampling plan. If the sample size is increased, the
precision is improved.
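The effect of sample size on sampling error can be seen in a small simulation. The sketch below is illustrative only: the population values, the seed and the two sample sizes are hypothetical choices, not figures from this lesson. It draws repeated simple random samples and measures how much the sample mean varies from sample to sample.

```python
import random
import statistics

rng = random.Random(42)  # seed chosen only so the run is repeatable
# Hypothetical population of 10,000 measurements.
population = [rng.gauss(50, 10) for _ in range(10_000)]


def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the sample mean over repeated samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.pstdev(means)


small = spread_of_sample_means(25)
large = spread_of_sample_means(400)
print(f"n = 25:  spread of sample means = {small:.2f}")
print(f"n = 400: spread of sample means = {large:.2f}")
```

The larger samples give means that cluster more tightly around the population mean, which is what improved precision means here.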
But increasing the size of the sample has its own limitations: it increases the cost of collecting
data and enhances the systematic bias.
Thus the effective way to increase precision is usually to select a better sampling design which
has a smaller sampling error for a given sample size at a given cost.
- On the representation basis, the sample may be probability sampling (based on the concept
of random selection) or it may be non-probability sampling (non-random sampling).
- On element selection basis, the sample may be either unrestricted (each sample element is
drawn individually from the population at large) or restricted (all other forms of sampling).
Sampling designs by element selection technique and representation basis:

Unrestricted sampling
    Probability sampling: Simple random sampling
    Non-probability sampling: Haphazard sampling or convenience sampling
Restricted sampling
    Probability sampling: Complex random sampling (such as cluster sampling, systematic sampling,
    stratified sampling, etc.)
    Non-probability sampling: Purposive sampling (such as quota sampling, judgment sampling)
A: Non-probability sampling:
- Refers to the sampling procedure which does not afford any basis for estimating the
probability that each item in the population has of being included in the sample. In such a
design, personal element has a great chance of entering into the selection of the sample.
- The probability of selecting an element into the sample may not be the same for each
element. It is not quite possible to introduce randomization into this type of sampling.
We can therefore define a simple random sample from a finite population as a sample which is
chosen in such a way that each of the NCn possible samples (the number of ways of choosing n
items out of N) has the same probability, 1/NCn, of being selected.
Example
Consider a certain finite population consisting of six elements (a, b, c, d, e, f) i.e. N = 6. Suppose
that you want to take a sample size n = 3 from it. Then there are 6C3 = 20 possible distinct
samples of the required size, and they consist of the elements:
{abc}; {abd}; {abe}; {abf}; {acd}; {ace}; {acf}; {ade}; {adf}; {aef};
{bcd}; {bce}; {bcf}; {bde}; {bdf}; {bef}; {cde}; {cdf}; {cef}; and {def}.
If you choose one of these samples in such a way that each has the probability 1/20 of being
chosen, you will then call this a random sample.
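The list of 20 samples can be checked mechanically. The following minimal sketch uses Python's standard library to enumerate every distinct sample of size 3 from the six elements; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

population = ["a", "b", "c", "d", "e", "f"]  # N = 6
n = 3

# Every distinct sample of size n, ignoring the order of selection.
samples = ["".join(s) for s in combinations(population, n)]

print(len(samples), comb(len(population), n))
print(samples)
```

Choosing one entry of `samples` with equal probability, for instance with `random.choice(samples)`, gives each sample the probability 1/20 required by the definition.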
(ii) You can write the name of each element of a finite population on a slip of paper, put the
slips of paper so prepared into a box or bag and mix them thoroughly and then draw the
required number of slips for the sample one after the other without replacement. In doing
so you must make sure that in successive drawing each of the remaining elements of the
population has the same chance of being selected. This procedure will also result in the
same probability for each possible sample.
In the earlier example, since you have a finite population of 6 elements and you want to
select a sample of size 3, the probability of drawing any one element for your sample in
the first draw is 3/6, the probability of drawing one more element in the second draw is
2/5 (the first element drawn is not replaced), and similarly the probability of drawing one
more element in the third draw is 1/4. Each of these probabilities is conditional on the
earlier draws, so the joint probability of the three elements which constitute your sample
is their product, which works out to 3/6 x 2/5 x 1/4 = 1/20.
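The product of the draw-by-draw probabilities can be verified with exact fractions. This is a small sketch with illustrative names; it multiplies the chance of picking one of the still-needed elements at each draw and compares the result with one over the number of possible samples.

```python
from fractions import Fraction
from math import comb

N, n = 6, 3

joint = Fraction(1)
for draw in range(n):
    # At this draw, (n - draw) wanted elements remain among (N - draw) slips.
    joint *= Fraction(n - draw, N - draw)

print(joint)
print(joint == Fraction(1, comb(N, n)))
```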
(iii) Use random number tables to select a random sample. Tippett gave 10400 four-figure
numbers. He selected 41600 digits from the census reports and combined them into fours
to give his random numbers, which may be used to obtain a random sample.
Suppose you are interested in taking a sample of 10 units from a population of 5000 units,
bearing numbers from 3001 to 8000. You will select 10 such figures from the above random
numbers which are not less than 3001 and not greater than 8000. If you randomly decide to read
the table numbers from left to right, starting from the first row itself, you obtain the following
numbers: 6641, 3992, 7979, 5911, 3170, 5624, 4167, 7203, 5356 and 7483. The units bearing the
above serial numbers would then constitute your required random sample.
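The same reading procedure can be imitated in code. In the sketch below a pseudo-random generator stands in for the printed table, which is an assumption of this example, and the function name and seed are hypothetical. Four-figure numbers are read one after another, and any number outside 3001 to 8000 or already chosen is skipped.

```python
import random


def read_random_numbers(rng, low, high, size):
    """Read four-figure numbers in turn, keeping those that fall in [low, high]."""
    chosen = []
    while len(chosen) < size:
        number = rng.randint(0, 9999)  # stands in for the next table entry
        if low <= number <= high and number not in chosen:
            chosen.append(number)
    return chosen


rng = random.Random(2024)  # hypothetical seed, for repeatability only
serial_numbers = read_random_numbers(rng, 3001, 8000, 10)
print(serial_numbers)
```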
Note that it is easy to draw random samples from finite populations with the aid of random
number tables only when lists are available and items are numbered. But in some situations, it is
often impossible to proceed in this way. For example, if you want to estimate the mean height of
trees in a forest, it would not be possible to number the trees, and choose random numbers to
select a random sample. In such a situation what you should do is to select some trees for the
sample haphazardly without aim or purpose, and should treat the sample as a random sample for
study purposes.
For example, suppose you consider 20 throws of a fair die as a sample from the
hypothetically infinite population which consists of the results of all possible throws of the die.
If the probability of getting a particular number, say 1, is the same for each throw and the 20
throws are all independent, then the sample is random. Also, if you sample with replacement from
a finite population, the sample would be considered a random sample if in each draw all
elements of the population have the same probability of being selected and successive draws
are independent.
Probability sampling methods/sampling designs
2. Systematic Sampling
You begin with a listing of all elements in the designated population. Then determine the desired
sample size and divide it into the population size to give an increment value, labeled k. The
sample selected is composed of every kth element of the sampling frame. The first element is
selected by a random process in order to avoid bias.
For example, if a 4 per cent sample is desired, the first item would be selected randomly from the
first twenty-five and thereafter every 25th item would automatically be included in the sample.
Thus, in systematic sampling only the first unit is selected randomly and the remaining units of
the sample are selected at fixed intervals.
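The 4 per cent example can be sketched as follows. The frame of 1,000 numbered items, the seed and the function name are hypothetical; only the interval of 25 comes from the text.

```python
import random


def systematic_sample(frame, interval, rng):
    """Pick a random start within the first interval, then every interval-th item."""
    start = rng.randrange(interval)  # 0-based position of the first selected item
    return frame[start::interval]


frame = list(range(1, 1001))  # hypothetical list of 1,000 numbered items
sample = systematic_sample(frame, 25, random.Random(7))

print(len(sample))
print(sample[:5])
```

Whatever the random start, a frame of 1,000 items sampled at an interval of 25 yields 40 items, that is, 4 per cent of the list.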
Merits:
(i) It can be taken as an improvement over a simple random sample in as much as the
systematic sample is spread more evenly over the entire population.
(ii) It is an easier and less costly method of sampling and can be conveniently used even
in case of large populations.
Demerits:
(i) If there is a hidden periodicity in the population, systematic sampling will prove to be
an inefficient method of sampling.
For instance, suppose every 25th item produced by a certain production process is defective. If
you were to select a 4% sample of the items of this process in a systematic manner,
you would get either all defective items or all good items in the sample, depending
upon the random starting position.
(ii) If the population list is not in random order, the results of such sampling may, at
times, not be very reliable.
In practice, systematic sampling is used when lists of population are available and they are of
considerable length.
3. Stratified Sampling
The population is divided into layers or strata. Stratification is especially useful when a
population is characterized as heterogeneous but consists of a number of homogeneous sub-
populations or strata. When a population is homogeneous, little or no benefit is obtained from
stratification.
The population is divided into several sub-populations that are individually more homogeneous
than the total population and then you select items from each stratum to constitute a sample.
Since each stratum is more homogeneous than the total population, you are able to get more
precise estimates for each stratum, and by estimating each of the component parts more
accurately you get a better estimate of the whole. Stratification results in more reliable and detailed
information.
(c) How many items should be selected from each stratum, or how should the sample size be
allocated among the strata?
The method of proportional allocation is usually followed, under which the sizes of the samples
from the different strata are kept proportional to the sizes of the strata. That is, if Pi represents
the proportion of the population included in stratum i, and n represents the total sample size, the
number of elements selected from stratum i is n × Pi.
Example
Suppose we want a sample of size n = 30 to be drawn from a population of
size N = 8000 which is divided into three strata of size N1 = 4000, N2 = 2400
and N3 = 1600.
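Adopting proportional allocation, the sample sizes from each stratum are obtained as follows:
For the stratum with N1 = 4000: P1 = 4000/8000, hence n1 = n × P1 = 30 (4000/8000) = 15
For the stratum with N2 = 2400: P2 = 2400/8000, hence n2 = n × P2 = 30 (2400/8000) = 9
For the stratum with N3 = 1600: P3 = 1600/8000, hence n3 = n × P3 = 30 (1600/8000) = 6
Thus, using proportional allocation, the sample sizes for the different strata are 15, 9 and 6
respectively, which is in proportion to the sizes of the strata, viz. 4000 : 2400 : 1600.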
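Proportional allocation is a one-line calculation per stratum. The sketch below applies it to the example above; the function name is illustrative, and the rounding step is an assumption that matters only when the shares do not come out as whole numbers.

```python
def proportional_allocation(n, stratum_sizes):
    """Allocate a total sample of n in proportion to the stratum sizes."""
    total = sum(stratum_sizes)
    # round() is a simplification: in general the rounded shares
    # may not add up exactly to n and need a manual adjustment.
    return [round(n * size / total) for size in stratum_sizes]


allocation = proportional_allocation(30, [4000, 2400, 1600])
print(allocation)
```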
Proportional allocation is considered the most efficient and an optimal design when the cost of
selecting an item is equal for each stratum, there is no difference in within-stratum variances, and
the purpose of sampling happens to be to estimate the population value of some characteristic.
But if the purpose is to compare the differences among the strata, then equal sample
selection from each stratum would be more efficient even if the strata differ in size.
In cases where strata differ not only in size but also in variability, and it is considered reasonable
to take larger samples from the more variable strata and smaller samples from the less variable
strata, a researcher can account for both (differences in stratum size and differences in
stratum variability) by using a disproportionate sampling design that requires:

$$\frac{n_1}{N_1\sigma_1} = \frac{n_2}{N_2\sigma_2} = \dots = \frac{n_k}{N_k\sigma_k}$$

where N_i is the size of stratum i, σ_i its standard deviation and n_i the sample size taken from it.
This is called ‘optimum allocation’ in the context of disproportionate sampling. The allocation in
such a situation results in the following formula for determining the sample sizes for the different
strata:

$$n_i = \frac{n \, N_i\sigma_i}{N_1\sigma_1 + N_2\sigma_2 + \dots + N_k\sigma_k} \quad \text{for } i = 1, 2, \dots, k.$$
Example
A population is divided into three strata so that N1 = 5000, N2 = 2000 and N3 = 3000. Respective
standard deviations are:
σ1 = 15, σ2 = 18 and σ3 = 5.
How should a sample of size n = 84 be allocated to the three strata, if you want optimum
allocation using disproportionate sampling design?
Solution:
Using the disproportionate sampling design for optimum allocation, the sample sizes for different
strata will be determined as under:
$$n_1 = \frac{84(5000)(15)}{(5000)(15) + (2000)(18) + (3000)(5)} = \frac{6300000}{126000} = 50$$

$$n_2 = \frac{84(2000)(18)}{(5000)(15) + (2000)(18) + (3000)(5)} = \frac{3024000}{126000} = 24$$

$$n_3 = \frac{84(3000)(5)}{(5000)(15) + (2000)(18) + (3000)(5)} = \frac{1260000}{126000} = 10$$
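The worked solution can be reproduced in a few lines. This sketch weights each stratum by its size times its standard deviation and shares out n in proportion to those weights; the function name is illustrative.

```python
def optimum_allocation(n, stratum_sizes, std_devs):
    """Share out n in proportion to (stratum size x stratum standard deviation)."""
    weights = [size * sd for size, sd in zip(stratum_sizes, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]


allocation = optimum_allocation(84, [5000, 2000, 3000], [15, 18, 5])
print(allocation)
```

Although the third stratum is larger than the second, it receives the smallest sample because it is the least variable.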
In addition to differences in stratum size and differences in stratum variability, you may have
differences in stratum sampling cost. You can then have a cost-optimal disproportionate
sampling design by requiring

$$\frac{n_1}{N_1\sigma_1/\sqrt{C_1}} = \frac{n_2}{N_2\sigma_2/\sqrt{C_2}} = \dots = \frac{n_k}{N_k\sigma_k/\sqrt{C_k}}$$

where
C_1 = cost of sampling in stratum 1,
C_2 = cost of sampling in stratum 2,
C_k = cost of sampling in stratum k,
and all other terms remain the same as explained earlier. The allocation in such a situation results
in the following formula for determining the sample sizes for the different strata:

$$n_i = \frac{n \, N_i\sigma_i/\sqrt{C_i}}{N_1\sigma_1/\sqrt{C_1} + N_2\sigma_2/\sqrt{C_2} + \dots + N_k\sigma_k/\sqrt{C_k}} \quad \text{for } i = 1, 2, \dots, k.$$
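A cost-adjusted allocation can be sketched in the same way as the optimum allocation. The code below assumes the standard form in which each stratum is weighted by its size times its standard deviation divided by the square root of its unit sampling cost. The stratum sizes and standard deviations are those of the previous example; the unit costs are hypothetical and chosen only for illustration.

```python
from math import sqrt


def cost_optimal_allocation(n, stratum_sizes, std_devs, unit_costs):
    """Share out n in proportion to N_i * sigma_i / sqrt(C_i)."""
    weights = [
        size * sd / sqrt(cost)
        for size, sd, cost in zip(stratum_sizes, std_devs, unit_costs)
    ]
    total = sum(weights)
    return [n * w / total for w in weights]


sizes = [5000, 2000, 3000]
sds = [15, 18, 5]
costs = [4, 9, 1]  # hypothetical cost per sampled unit in each stratum

allocation = cost_optimal_allocation(84, sizes, sds, costs)
equal_cost = cost_optimal_allocation(84, sizes, sds, [1, 1, 1])
print([round(a, 1) for a in allocation])
print(equal_cost)
```

With equal costs the result falls back to the optimum allocation of 50, 24 and 10. With unequal costs the sample shifts away from the strata that are expensive to survey.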
NB:
It is not necessary that stratification be done keeping in view a single characteristic. Populations
are often stratified according to several characteristics. For example, in a system-wide survey
designed to determine the attitude of students toward a new teaching plan, a state college system
with 20 colleges might stratify the students with respect to class, sex and college. Stratification of
this type is known as cross-stratification, and up to a point such stratification increases the
reliability of estimates and is much used in opinion surveys.
4. Cluster sampling
If the total area of interest is big, a convenient way in which a sample can be taken is to divide the
area into a number of smaller non-overlapping areas and then to randomly select a number of
these smaller areas (clusters), with the ultimate sample consisting of all (or samples of) units in
these small areas or clusters. Thus in cluster sampling the total population is divided into a
number of relatively small subdivisions which are themselves clusters of still smaller units and
then some of these clusters are randomly selected for inclusion in the overall sample.
Suppose you want to estimate the proportion of machine-parts in an inventory, which are
defective. Also assume that there are 20000 machine parts in the inventory at a given point of
time, stored in 400 cases of 50 each. Now using a cluster sampling, you would consider the 400
cases as clusters and randomly select ‘n’ cases and examine all the machine parts in each
randomly selected case.
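The inventory example can be sketched as below. Everything numeric except the 400 cases of 50 parts is hypothetical: the simulated defect flags, the seed and the choice of 8 cases stand in for real inspection data and for the unspecified ‘n’.

```python
import random

rng = random.Random(11)  # hypothetical seed, for repeatability only

# Simulated inventory: 400 cases of 50 parts, True marks a defective part.
cases = {
    case_id: [rng.random() < 0.05 for _ in range(50)]
    for case_id in range(400)
}

# Cluster sampling: select whole cases at random, then inspect every part in them.
chosen_cases = rng.sample(range(400), 8)
inspected = [flag for case_id in chosen_cases for flag in cases[case_id]]

estimate = sum(inspected) / len(inspected)
print(f"inspected {len(inspected)} parts, estimated defective share = {estimate:.3f}")
```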
Cluster sampling requires grouping of the population. The units of the population are grouped by
cluster rather than by strata, for example workers in the quality control division. Cluster sampling
is used only because it reduces cost by concentrating surveys in selected clusters. Hence estimates
based on cluster samples are usually more reliable per unit cost.
Demerits:
(i) Cluster sampling can lead to large sampling errors if it is not properly done, hence less
precise than random sampling.
(ii) There is not as much information in ‘n’ observations within a cluster as there happens
to be in ‘n’ randomly drawn observations.
5. Multi-stage Sampling
This is a form of random sampling which takes place in a series of stages. For example:
Stage 1: Random selection of regions
Stage 2: Random selection of neighbourhoods within regions and
Stage 3: Random selection of households within neighbourhoods
Any of the other methods of sampling may be used in each of these stages. If you select
randomly at all stages, you will have what is known as a multi-stage random sampling design. This
method of sampling is applied in big inquiries extending to a considerably large geographical
area, such as the entire country.
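The three stages listed above can be sketched as nested random selections. The frame below (4 regions, 5 neighbourhoods each, 20 households each) and the numbers selected at each stage are hypothetical.

```python
import random


def multistage_sample(frame, n_regions, n_neighbourhoods, n_households, rng):
    """Select regions, then neighbourhoods within them, then households within those."""
    selected = []
    for region in rng.sample(sorted(frame), n_regions):               # stage 1
        neighbourhoods = frame[region]
        for name in rng.sample(sorted(neighbourhoods), n_neighbourhoods):  # stage 2
            selected.extend(rng.sample(neighbourhoods[name], n_households))  # stage 3
    return selected


# Hypothetical frame: region -> neighbourhood -> list of household labels.
frame = {
    f"region{r}": {
        f"r{r}-nb{b}": [f"r{r}-nb{b}-hh{h}" for h in range(20)]
        for b in range(5)
    }
    for r in range(4)
}

households = multistage_sample(frame, 2, 2, 3, random.Random(3))
print(households)
```

Only the selected regions and neighbourhoods need a detailed list of households, which is why the sampling frame can be built up stage by stage.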
Suppose you want to investigate the working efficiency of nationalized banks in Kenya and you
want to take a sample of a few banks for this purpose. The first stage is to select large primary
sampling units such as provinces.
▪ If you then select certain districts and interview all banks in the chosen districts, this would
represent a two-stage sampling design, with the ultimate sampling units being clusters of districts.
▪ If instead of taking a census of all banks within the selected districts, you select certain towns
and interview all banks in the chosen towns, this would represent a three-stage sampling
design.
▪ If instead of taking a census of all banks within the selected towns, you randomly sample
banks from each selected town, then it is a case of using a four-stage sampling plan.
Merits:
(i) It is easier to administer than most single stage designs mainly because of the fact
that sampling frame is developed in partial units.
(ii) A large number of units can be sampled for a given cost because of sequential
clustering, whereas this is not possible in most of the simple designs.
(iii) It is most useful in sampling a large number of units, especially when cost saving is
an important consideration.
Demerits:
Sampling errors are likely to be larger than those of other probability samples.
6. Area sampling
If the clusters happen to be geographic subdivisions, cluster sampling is known as
area sampling. Hence what applies to cluster designs, where the primary sampling unit
represents a cluster of units, is also applicable to area sampling.
Merits:
(i) The results of this type of sampling are equivalent to those of a simple random
sample i.e. not so biased
(ii) The method is less cumbersome
(iii) It is relatively less expensive.
Example
The following are the numbers of departmental stores in 15 towns: 35, 17, 10, 32, 70, 28, 26, 19,
26, 66, 37, 44, 33, 29 and 28. If you want to select a sample of 10 stores, using towns as clusters
and selecting within clusters proportional to size, how many stores from each town should be
chosen?
Town no.   No. of departmental stores   Cumulative total   Sample
1          35                           35                 10
2          17                           52
3          10                           62                 60
4          32                           94
5          70                           164                110, 160
6          28                           192
7          26                           218                210
8          19                           237
9          26                           263                260
10         66                           329                310
11         37                           366                360
12         44                           410                410
13         33                           443
14         29                           472                460
15         28                           500
Since there are 500 departmental stores from which you have to select a sample of 10 stores, the
appropriate sampling interval is 50. The starting point is 10, and you then successively add
increments of 50 until 10 numbers have been selected. The numbers thus obtained are: 10, 60,
110, 160, 210, 260, 310, 360, 410 and 460. From this, two stores should be selected randomly from
town number 5 and one each from town numbers 1, 3, 7, 9, 10, 11, 12 and 14. This sample of
10 stores is the sample with probability proportional to size.
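The selection above can be traced in code. The sketch below builds the cumulative totals, steps through them at an interval of 50 from the starting point 10 used in the example, and finds the town whose cumulative range contains each number. Variable names are illustrative.

```python
from bisect import bisect_left
from collections import Counter
from itertools import accumulate

stores = [35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33, 29, 28]
sample_size = 10

cumulative = list(accumulate(stores))        # 35, 52, 62, ..., 500
interval = cumulative[-1] // sample_size     # 500 // 10 = 50
start = 10                                   # starting point used in the example

points = [start + i * interval for i in range(sample_size)]

# A point belongs to the first town whose cumulative total reaches it.
towns = [bisect_left(cumulative, p) + 1 for p in points]
stores_per_town = Counter(towns)

print(points)
print(dict(stores_per_town))
```

Town 5, with 70 stores, spans more than one sampling interval and is therefore hit twice, which is how larger towns receive a larger share of the sample.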
8. Sequential sampling
The ultimate size of the sample is determined according to mathematical decision rules on the
basis of information yielded as the survey progresses. This is usually adopted in the case of an
acceptance sampling plan in the context of statistical quality control. In sequential sampling, one
can go on taking samples one after another as long as one desires to do so.
When a particular lot is to be accepted or rejected on the basis of single sample, it is known as
single sampling; when the decision is to be taken on the basis of two samples, it is known as
double sampling and in case the decision rests on the basis of more than two samples but the
number of samples is certain and decided in advance, the sampling is known as multiple
sampling. But when the number of samples is more than two but it is neither certain nor decided
in advance, this type of system is often referred to as sequential sampling.
Conclusion
▪ One should resort to simple random sampling because under it bias is generally eliminated
and the sampling error can be estimated.
▪ Purposive sampling is considered more appropriate when the universe happens to be small
and a known characteristic of it is to be studied intensively.
▪ At times, several methods of sampling may well be used in the same study.