Chapter 5
5.1. Introduction
Survey research is one of the most basic methods in economic
research.
It permits rigorous, step-by-step development and testing of
complex propositions using survey data.
Sample surveys are often conducted in order to generalize from a
sample to a population.
There can be many different reasons for conducting surveys.
The three most common purposes of surveys are description,
explanation, and exploration.
Basic Survey Designs
Two basic types of survey designs are:
1. Cross-sectional surveys
Data are collected at one point in time.
This is the less expensive and most common type.
2. Longitudinal surveys
Data are collected at different points in time.
Useful for capturing changes over time.
Survey Sampling
Some studies involve only a small number of people, and thus all of
them can be included.
But when the population is large, it is usually not possible to
undertake a census, or complete enumeration, of all items in the
population.
We then have to draw a SAMPLE from the total population.
Sampling is the process of selecting a number of study units from a
defined study population.
It aims at obtaining consistent and unbiased estimates of
population parameters.
The sampling process requires decisions on:
What is the group of people (STUDY POPULATION) we are interested in,
from which we want to draw a sample?
How many people do we need in our sample?
How will these people be selected?
To do sampling, the study population has to be clearly defined (for
example, according to age, sex, and residence).
Apart from persons, a study population may consist of villages,
institutions, records, etc.
Each study population consists of STUDY UNITS.
The way we define our study population and study unit
depends on the problem we want to investigate and on the objectives of
the study.
There are two principles underlying any sample design: the need to
avoid bias in the selection procedure and the need to gain maximum
precision.
Bias can arise:
1. If the selection of the sample is done by some non-random method, i.e.
selection is consciously or unconsciously influenced by human
choice.
2. If the sampling frame (i.e. list, index, population record) does not
adequately cover the target population.
3. If some sections of the population are impossible to find or refuse
to co-operate.
Representativeness is particularly important if you want to
generalize about the population.
A representative sample has all the important characteristics of the
population from which it is drawn.
For Quantitative Studies:
If researchers want to draw conclusions which are valid for the whole
study population, they should draw the sample in such a way that it is
representative of that population.
For Qualitative Studies: Representativeness of the sample is NOT the
primary concern.
In exploratory studies we select the study units which give the richest
possible information. You go for INFORMATION-RICH cases!
Steps in Sampling Design
The critical steps in sampling are:
a) Identifying the relevant population: when one wants to undertake a
sample survey, the relevant population from which the sample is going to be
drawn needs to be identified.
Example: if the study concerns income, then defining the population
elements as individuals or as households can make a difference.
b) Determining the method of sampling:
Whether a probability or a non-probability sampling procedure
is to be used is also a very important decision.
c) Securing a sampling frame:
A list of the elements from which the sample is actually drawn is
necessary.
d) Identifying the parameters of interest:
What specific population characteristics (variables and
attributes) may be of interest?
e) Determining the sample size
The determination of sample size depends on several factors.
Other things being equal, a larger sample gives more precise estimates.
Still, the appropriate sample size will vary depending on what
we want to do.
From a purely methodological perspective, the decision on sample
size hinges on how large an error one is willing to tolerate in
estimating population parameters or, put differently, what effect
size will be required for a result to be considered significant.
These questions must be addressed prior to the start of data collection.
But in the final decision, statistical precision must be balanced
against time, cost, and other practical considerations.
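The statistical side of this decision can be illustrated with a minimal sketch. It assumes simple random sampling from a large population, a rough prior guess of the population standard deviation, and the textbook formula n = (z * sigma / E)^2 for estimating a mean to within a tolerable error E. The function name and the example numbers are illustrative assumptions, not part of the chapter.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, tolerable_error, confidence=0.95):
    """Smallest n for which the confidence-interval half-width for a mean
    is at most tolerable_error (assumes SRS from a large population)."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / tolerable_error) ** 2)


# Hypothetical inputs: prior standard deviation 10, tolerable error 1.
n_base = sample_size_for_mean(sigma=10, tolerable_error=1)
# Doubling the dispersion roughly quadruples the required sample.
n_more_dispersed = sample_size_for_mean(sigma=20, tolerable_error=1)
print(n_base, n_more_dispersed)  # 385 1537
```

The result is only the statistical starting point; time, cost, and other practical limits then adjust it.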
Research designs with too small an "n" are unethical because they
waste resources, since they can provide only anecdotal
(unreliable) evidence.
If the sample size is too small, the data will be unusable and the experiment
must be repeated with a larger "n" to test the hypothesis with greater
rigor.
Research studies that use too large an "n", that is, larger than needed
to test the hypothesis adequately, are also unethical because, in
addition to the time and expense wasted, human subjects may
needlessly undergo experimental procedures that could be
distressing or painful.
To avoid these errors and ensure the development of an ethical experimental design, a
power analysis or a similar statistical procedure to determine "n" is
essential.
Researchers should consider seeking specialized expertise if they are not sure
they are competent to select data correctly.
It is plainly unethical to start work on a research project, which burdens
research subjects and consumes resources, without sufficient
methodological skills to ensure a useful product.
Thus, consider the following when determining the sample size for a study:
i) Degree of homogeneity: the size of the population variance is the single most important
parameter.
The greater the dispersion in the population, the larger the sample must be to
provide a given estimation precision.
ii) Degree of confidence required: Since a sample can never reflect its population
for certain, the researcher must determine how much precision s/he needs.
Precision is measured in terms of:
a) An interval range in which we would expect to find the
parameter estimate.
b) The degree of confidence we wish to have in the estimate.
iii) Number of subgroups to be studied:
When the researcher is interested in making estimates concerning
various subgroups of the population, the sample must be large
enough for each of these subgroups to meet the desired quality level.
iv) Cost: cost considerations have a major impact on decisions about
the size and type of sample, as well as on data collection methods. All
studies have some budgetary constraint, and hence cost limits the
size of the sample.
v) Other considerations:
Prior information: If our process has been studied before, we can
use that prior information to determine our sample size.
This can be done by using prior mean and variance estimates and
by stratifying the population to reduce variation within groups.
Rule of thumb: this is based on past experience with samples that have
met the requirements of statistical methods. Researchers use it because
they rarely have information on variances or standard errors.
Practicality: Of course, the sample size you select must make sense.
We want to take enough observations to obtain reasonably precise
estimates of the parameters of interest, but we also want to do this
within a practical resource budget.
Therefore, sample size is usually a compromise between what is
DESIRABLE and what is FEASIBLE.
In general, the smaller the population, the bigger the sampling
ratio has to be for a reasonable sample.
Hence, for small populations (under 1,000), a researcher needs a large
sampling ratio (about 30%); a sample size of about 300 is required for a
high degree of accuracy.
For a moderately large population (10,000), a smaller sampling ratio
(about 10%) is needed, giving a sample size of around 1,000.
To sample from a very large population (over 10 million), one can
achieve accuracy using tiny sampling ratios (0.025%), or samples of
about 2,500.
These are approximate sizes, and practical limitations (e.g. cost) also
play a role in a researcher’s decision about sample size.
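The arithmetic behind these rules of thumb can be checked with a short sketch. The population sizes and ratios are the ones quoted above; the function name is an illustrative assumption.

```python
def sample_size_from_ratio(population_size, sampling_ratio):
    """Sample size implied by a population size and a sampling ratio."""
    return round(population_size * sampling_ratio)


# (population size, sampling ratio) pairs from the rules of thumb.
rules_of_thumb = [
    (1_000, 0.30),          # small population, about 30%
    (10_000, 0.10),         # moderately large population, about 10%
    (10_000_000, 0.00025),  # very large population, about 0.025%
]

implied_sizes = [sample_size_from_ratio(N, r) for N, r in rules_of_thumb]
print(implied_sizes)  # [300, 1000, 2500]
```

The sampling ratio falls by a factor of more than a thousand across the three cases, while the sample size itself grows by less than a factor of ten.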
Sample Size in Qualitative Studies
There are no fixed rules for sample size in qualitative research.
The size of the sample depends on WHAT you are trying to find out, and
from which informants or perspectives you are trying to find
it out.
The sample size is therefore estimated as precisely as possible at the
outset, but it is not fixed in advance.
Probability and Non-Probability Sampling Methods
Probability sampling is based on the concept of random selection,
which assures that each population element is given a known
non-zero chance of selection.
It uses a random selection procedure to ensure that each unit of
the sample is chosen on the basis of chance.
Probability sampling requires a sampling frame (a listing of all
study units).
A randomization process is used in order to reduce or eliminate
sampling bias, so that the sample is representative of the population from
which it is drawn.
A sample will be representative of the population from which it is drawn if all
members of the population have an equal chance of being included in the sample.
So, it is ideal to obtain a probability sample, where any member of the
population is equally likely to be observed.
Probability samples, although not perfectly representative, are more
representative than any other type of sample because selection biases are avoided.
Probability sampling has considerable advantages over all other forms of
sampling:
First, sampling errors can be calculated.
Second, probability samples rely on a random process, i.e. the selection process
operates in a truly random manner (no pattern).
Finally, since each element has a known (in the simplest case, equal) chance of
being selected, it is possible to get consistent and unbiased estimates of
population parameters.
Types of Probability Sampling Methods
Generally speaking, we can distinguish between the following
types of sampling designs: simple random sampling,
systematic sampling, stratified sampling,
cluster sampling, and hybrid sampling.
1. Simple Random Sampling (SRS)
It is the simplest and easiest method of probability sampling.
It is a sampling procedure in which each element of the population has an
equal chance of being selected into the sample.
It assumes that an accurate sampling frame exists.
Usually, one of two methods is adopted to pick a sample: the lottery
method or a table of random numbers.
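In software, the lottery method corresponds to drawing without replacement from the sampling frame. The following is a minimal sketch using Python's standard random module; the frame of 1,200 student labels and the seed are hypothetical and only make the example reproducible.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each with equal chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sampling frame: a complete list of 1,200 students.
frame = [f"student_{i}" for i in range(1, 1201)]
srs = simple_random_sample(frame, 100, seed=42)
print(len(srs), len(set(srs)))  # 100 100
```

Note that the sketch needs the complete frame in memory, which mirrors the requirement that an accurate sampling frame exists.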
SRS is often not used because it is not practical.
It requires a listing of the entire population of interest.
This is rarely possible for national surveys.
It is too expensive to interview a national face-to-face sample
based on SRS.
The cost of interviewing randomly selected individuals drawn from
a list of the entire population is extremely high.
SRS is therefore practical mainly in situations where the population size
is small.
2. Systematic Sampling (SS)
In SYSTEMATIC SAMPLING individuals are chosen at regular
intervals (for example, every fifth) from the sampling frame.
Under systematic sampling procedures, instead of using a list of random
numbers, the researcher calculates a sampling interval.
The sampling interval is the standard distance between the elements
selected in the sample.
For example, a systematic sample is to be selected from 1200
students of a school.
The sample size to be selected is 100.
The sampling fraction is: sample size/study population = 100/1200
= 1/12.
The sampling interval is therefore 12.
The number of the first student to be included in the sample is
chosen randomly, for example by blindly picking one out of
twelve pieces of paper, numbered 1 to 12.
If number 6 is picked as a starting number, then every 12th
student will be included in the sample until 100 students are
selected: the numbers selected would be 6, 18, 30, 42, etc.
The major advantages of SS are its simplicity and flexibility.
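The school example can be reproduced with a short sketch. It assumes the population size is an exact multiple of the sample size, as in the example (1200/100 = 12); the function name is an illustrative assumption.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the 1-based positions selected by systematic sampling."""
    interval = population_size // sample_size  # sampling interval
    if start is None:
        # Random start between 1 and the interval, like drawing one of
        # the numbered pieces of paper.
        start = random.Random(seed).randint(1, interval)
    return [start + k * interval for k in range(sample_size)]


# 1,200 students, sample of 100, starting number 6 as in the example.
selected = systematic_sample(1200, 100, start=6)
print(selected[:4])   # [6, 18, 30, 42]
print(selected[-1])   # 1194
```

Only the starting number is random; every later selection follows from the interval.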
3. Stratified Sampling
One disadvantage of simple random sampling is that small
groups in which the researcher is interested may not appear in the sample.
Hence, if it is important that the sample includes representative
study units of small groups with specific characteristics (for
example, residents from urban and rural areas, or different religious
or ethnic groups), then the sampling frame must be divided into
groups, or STRATA, according to these characteristics.
After the population is divided into appropriate strata, a random
sample can be taken from each stratum using either SRS or SS
techniques.
The stratified sampling technique is particularly useful when we
have heterogeneous populations.
The reasons for stratifying are:
a) To increase a sample’s statistical efficiency,
b) To provide adequate data for analyzing the various
subpopulations,
c) To enable different research methods and procedures to be
used in different strata.
In addition, the absence or poor quality of a sampling frame may make it
necessary to first select a sample of geographical units, and then to
construct a sampling frame only within those selected units.
The samples of households can then be selected from those lists.
How to Stratify?
Three major decisions must be made in order to stratify a given
population into mutually exclusive groups.
1) What stratification base to use: stratification should be based on the
principal variable under study, such as income, age, education, sex,
location, religion, etc.
2) How many strata to use: there is no precise answer as to how many
strata to use. The more strata, the closer one comes to
maximizing inter-strata differences and minimizing intra-strata
variation.
3) What strata sample sizes to draw: different approaches can be used.
One can adopt a proportionate sampling procedure: if the number of
units selected from each stratum is proportional to the total number
of units in that stratum, we have proportionate sampling.
Alternatively, one can use disproportionate sampling, which allocates
elements on the basis of some other consideration (for example,
deliberately over-sampling a small stratum).
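Proportionate allocation can be sketched in a few lines. The stratum names and sizes below are hypothetical, and the function name is an illustrative assumption; with other numbers, rounding can make the allocations add up to slightly more or less than the intended total.

```python
def proportionate_allocation(stratum_sizes, total_sample_size):
    """Allocate the sample to strata in proportion to their population sizes."""
    population = sum(stratum_sizes.values())
    return {
        name: round(total_sample_size * size / population)
        for name, size in stratum_sizes.items()
    }


# Hypothetical population of 10,000 households in two strata.
strata = {"urban": 6000, "rural": 4000}
allocation = proportionate_allocation(strata, 500)
print(allocation)  # {'urban': 300, 'rural': 200}
```

Each stratum's sample would then be drawn separately, using SRS or SS within the stratum.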
4. Cluster Sampling
It may be difficult or impossible to take a simple random sample of the
units of the study population, because a complete sampling
frame does not exist.
Logistical difficulties may also discourage random sampling
techniques.
Example: interviewing people who are scattered over a large area
may be too time-consuming.
However, when a list of groupings of study units is available (e.g.,
villages or schools) or can be easily compiled, a number of these
groupings can be randomly selected.
The selection of groups of study units (clusters) instead of the selection
of study units individually is called CLUSTER SAMPLING.
If the total area of interest happens to be a big one and can be divided
into a number of smaller non-overlapping areas (clusters), and if
some of these clusters are selected randomly, we have
cluster sampling.
Clusters are often geographic units (e.g., districts, villages) or
organizational units (e.g., clinics, training groups, etc.).
Cluster sampling addresses two problems:
1) Researchers lack a good sampling frame for a dispersed population.
2) The cost of reaching a sample element is very high, and cluster sampling
reduces cost by concentrating surveys in selected clusters.
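A minimal sketch of one-stage cluster sampling follows. It assumes that only a list of clusters is needed up front and that every unit in a selected cluster is surveyed; the village names, household labels, and seed are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and return all units inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical frame of clusters: four villages with their households.
villages = {
    "village_A": ["A1", "A2", "A3"],
    "village_B": ["B1", "B2"],
    "village_C": ["C1", "C2", "C3", "C4"],
    "village_D": ["D1", "D2"],
}
chosen, units = cluster_sample(villages, n_clusters=2, seed=7)
print(chosen, units)
```

Fieldwork is concentrated in the two selected villages, which is where the cost saving comes from.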
Multistage area sampling (MAS) is cluster sampling with
several stages:
First, take a sample of a set of geographic regions or clusters, i.e.
randomly select X clusters.
Next, a subset of geographic areas is sampled within each of those
regions, and so on.
Finally, a sample of elements is drawn from the smaller areas.
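The stages can be sketched as nested random draws. This is a simplified illustration with three stages and equal numbers drawn at each stage; the regions, areas, household labels, and seed are all hypothetical.

```python
import random


def multistage_sample(regions, n_regions, n_areas, n_elements, seed=None):
    """Three-stage sample: regions, then areas within them, then elements."""
    rng = random.Random(seed)
    sample = []
    for region in rng.sample(sorted(regions), n_regions):         # stage 1
        areas = regions[region]
        for area in rng.sample(sorted(areas), n_areas):           # stage 2
            for element in rng.sample(areas[area], n_elements):   # stage 3
                sample.append((region, area, element))
    return sample


# Hypothetical frame: 4 regions, 5 areas per region, 20 households per area.
regions = {
    f"region_{r}": {
        f"area_{a}": [f"household_{r}_{a}_{h}" for h in range(1, 21)]
        for a in range(1, 6)
    }
    for r in range(1, 5)
}
mas = multistage_sample(regions, n_regions=2, n_areas=2, n_elements=5, seed=3)
print(len(mas))  # 20
```

A household list is needed only for the four areas actually selected, not for the whole population.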
Reasons for using cluster sampling
1. Its economic advantages: the need for more economic
efficiency than can be provided by simple random sampling.
2. The frequent unavailability of a practical sampling frame.
3. When the population is infinite.
4. When the geographical distribution of units is highly scattered.
5. When sampling of individual units is not convenient for
administrative reasons.
While the statistical efficiency of cluster sampling is usually lower
than that of simple random sampling, chiefly because clusters tend to
be homogeneous, its economic efficiency is often great enough to
overcome this weakness.
The criterion is the net relative efficiency resulting from the trade-off
between economic and statistical factors.
5. Hybrid Sampling
Where there is no single way to sample a particular population, some
researchers use a combination of the four methods discussed
above.
Non-Probability Sampling
Non-probability sampling is non-random, i.e., each member does not
have a known non-zero chance of being included.
Sometimes a probability sample is infeasible and we are left with a
convenience sample.
Example: If we want to conduct a lengthy experiment using human
subjects, we use whoever is willing to participate.
If we want to conduct an economic study of drug users, we often
have only limited information (hearsay, criminal records) on who uses
drugs.
Generally, non-probability sampling is used under three conditions:
First, if there is no desire to generalize to a population parameter,
then there is much less concern about whether or not the sample fully
reflects the population, i.e. precise representation is not necessary.
Secondly, it is used because of cost and time requirements.
Probability sampling can be prohibitively expensive, since it
calls for more planning and repeated callbacks to ensure that
each selected sample unit is contacted.
Thirdly, probability sampling may break down in its application.
The total population may not be available for study in certain
cases.
Non-probability sampling methods
(1) Convenience sampling
The method selects anyone who is convenient.
It can produce ineffective, highly unrepresentative samples and
is not recommended.
Such samples are cheap; however, they are biased and full of systematic
errors.
Example: the person-on-the-street interview conducted by television
programs is an example of a convenience sample.
(2) Quota Sampling
A researcher first identifies categories of people (e.g., male,
female), then decides how many to get from each category.
Quotas are assigned to the different strata, and questionnaires
are filled from each stratum until its quota is met.
The major limitation of this method is the absence of an element of
randomization.
Consequently, the extent of sampling error cannot be estimated.
It is used by opinion pollsters, in marketing research,
and in other similar research areas.
(3) Purposive or Judgment sampling
Purposive sampling occurs when one draws a non-probability sample
based on certain criteria.
Focusing on a limited number of informants, whom we
select strategically so that their in-depth information will give
optimal insight into an issue, is known as purposive sampling.
It uses the judgment of an expert in selecting cases.
BUT care should be taken that, for the different categories of
informants, selection rules are developed to prevent the researcher
from sampling according to personal preference.
(4) Snowball (network or chain) sampling
This is a method for identifying and sampling (or selecting) cases
in a network.
Snowball sampling is based on an analogy to a snowball, which begins
small but becomes larger as it is rolled on wet snow and picks up
additional snow.
Snowball sampling begins with one or a few people or cases and
spreads out on the basis of links to the initial cases.
You start with one or two information-rich key informants and
ask them if they know persons who know a lot about your topic
of interest.
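The referral process can be sketched as a walk through a network of contacts. This illustration assumes the referrals are already known as a simple mapping, which in real fieldwork is discovered only as interviews proceed; the informant labels and the function name are hypothetical.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Follow referral links outward from the seed informants."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        # Ask this informant whom else to talk to.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sampled


# Hypothetical referral network starting from key informant "A".
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
snowball = snowball_sample(referrals, seeds=["A"], max_size=5)
print(snowball)  # ['A', 'B', 'C', 'D', 'E']
```

Who ends up in the sample depends entirely on the seeds and their links, which is why no sampling error can be attached to such a sample.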
(5) Typical case sampling
It is sometimes illustrative to describe in depth some cases which
are ‘typical’ of the group one is interested in.
For example, one may describe a ‘typical’ family in a rural village in
country A, or a ‘typical’ health problem of miners or malnourished
children, etc.
Such descriptions are merely illustrative; they cannot be
generalized to the whole group.
Typical examples can be selected either with the co-operation of
key informants who know the study population well, or from a
survey that helps to identify the characteristics we are interested
in.
Problems in Sampling
There are two types of errors: non-sampling errors and sampling errors.
Non-sampling errors are biases or errors due to fieldwork
problems, interviewer-induced bias, clerical problems in
managing data, etc.
These contribute to error in a survey irrespective of whether a
sample is drawn or a census is taken.
On the other hand, error which is attributable to sampling, and
which therefore is not present in information gathered in a census, is
called sampling error.
a) Non-Sampling Error
Non-sampling error refers to: non-coverage error, sampling of the
wrong population, non-response error, instrument
error, and interviewer error.
Non-coverage error: This refers to a sampling frame defect, i.e. the
omission of part of the target population (for instance, soldiers,
students living on campus, people in hospitals, prisoners, households
without a telephone in telephone surveys, etc.).
Non-coverage error also occurs when the lists used for sampling are
incomplete or outdated.
The wrong population is sampled:
Researchers must always be sure that the group being sampled is
drawn from the population they want to generalize about, i.e. the
intended population.
Non-response error:
This error occurs when you are not able to find those whom you
were supposed to study.
Some people refuse to be interviewed because they are ill, are too
busy, or simply do not trust the interviewer.
When one is forced to interview substitutes, an unknown bias is
introduced.
One should try to reduce the incidence of non-response errors.
Non-response error can occur in any interview situation, but it is
mostly encountered in self-administered surveys.
It is important in any study to report the non-response rate and to
honestly discuss whether and how non-response might have
influenced the results.
Instrument error:
The word instrument in survey sampling means the device with which
we collect data, usually a questionnaire.
When a question is badly asked or worded, the resulting error is called
instrument error.
Example: leading questions or carelessly worded questions may
be misinterpreted by some respondents.
Interviewer error:
This might occur when some characteristic of the interviewer
affects the way in which respondents answer questions.
An enumerator can distort the results of a survey through inappropriate
suggestions, word emphasis, tone of voice, and question
rephrasing.
Cheating by enumerators (with only limited training and under
little direct supervision) and perceived social distance between
enumerator and respondent also have a distorting effect.
Example: the gender of the interviewer may influence the respondent.
b) Sampling Errors
Sampling errors are random variations in sample estimates
around the true population parameters.
Error which is attributable to sampling, and which therefore is
not present in census-gathered information, is called sampling
error.
Sampling errors can be calculated only for probability samples.
Increasing the sample size is one of the major ways to reduce the
extent of sampling error.
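The effect of sample size on sampling error can be illustrated with the standard error of a sample mean, sigma / sqrt(n). The sketch assumes simple random sampling and a known population standard deviation; the value sigma = 10 is hypothetical.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean under simple random sampling."""
    return sigma / math.sqrt(n)


# Hypothetical population standard deviation of 10.
errors = {n: standard_error(10, n) for n in (100, 400, 1600)}
print(errors)  # {100: 1.0, 400: 0.5, 1600: 0.25}
```

Quadrupling the sample size only halves the standard error, so extra precision becomes progressively more expensive.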
Sampling error is related to confidence intervals.
A narrower confidence interval means a more precise estimate of the
population parameter for a given level of confidence.
The confidence interval for the true population mean is given by: