Sampling Design

The document discusses sampling design and different types of sampling techniques. It defines sampling as selecting a subset of units from a target population to make inferences about the whole population. There are two main types of sampling - probability sampling and non-probability sampling. Probability sampling uses random selection so that each unit has an equal chance of being selected, while non-probability sampling involves deliberate or judgment-based selection. The key aspects of a good sampling design are that it results in a representative sample and minimizes errors and biases.

Uploaded by

Santhosh Acharya
Copyright
© All Rights Reserved

Sampling design

• Scope and purpose.


• Sampling is a means of selecting a subset of
units from a target population for the purpose
of collecting information.
• This information is used to draw inferences
about the population as a whole.
• The subset of units that are selected is called
a sample.
Sampling design
• The process of obtaining information from a subset (sample) of a larger group (population)
• The results for the sample are then used to make estimates for the larger group
• Faster and cheaper than asking the entire population
• All items in any field of inquiry constitute a “Universe” or “Population”
• A complete enumeration of all items in the population is known as a “census inquiry”
• It can be presumed that in such an inquiry, when all items are covered, no element is left out, so the highest accuracy is achieved
• But in practice this may not be true
• Even a small amount of bias in such an inquiry grows larger and larger as the number of observations increases
• There is no way of checking the element of bias or its extent except through a resurvey or the use of sample checks
• ‘Bias’ in research is anything that produces systematic variation in the research findings
• It is often unexpected (it is out of your control)
• In research, the goal is to understand the “true” relationship between a predictor and an outcome
• Isolating the relationship between these two is difficult
• Bias is defined as any tendency which prevents unprejudiced consideration of a question
• In research, bias occurs when “systematic error is introduced into sampling or testing by selecting or encouraging one outcome or answer over others”
• Besides, a census inquiry involves a great deal of time, money and effort
• Therefore, when the field of inquiry is large, this method becomes difficult to adopt because of the resources involved
• At times, this method is practically beyond the reach of ordinary researchers or groups
• Perhaps the Government is the only institution which can accomplish such a task
• Example: the population census carried out once in a decade
• However, it needs to be emphasized that when the Universe is small, it is no use resorting to a sample survey
• At the same time, when field studies are undertaken in practical life, time, cost and effort have to be taken into account
• These considerations invariably lead to a selection of respondents, i.e., selection of only a few items
• The respondents selected should be as
representative of the total population as possible in
order to produce a miniature cross-section
• The selected respondents constitute what is
technically called a ‘sample’
• The selection process is called ‘sampling technique’
• The survey so conducted is known as ‘sample survey’
• Let the population size be ‘N’
• Suppose a part of size ‘n’ (where n < N) of this population is selected according to some rule for studying some characteristics of the population
• The group consisting of these ‘n’ units is known as a “sample”
• ‘Sample design’: how a sample should be selected and of what size such a sample should be
• Example: to study the feedback on a teacher in a class of 60 students, you select 2 samples with the same number of students: one sample of 10 boys and another sample of 10 girls
Steps in Sample design
1. Type of Universe: Clearly define the set of objects to be studied, technically called the ‘Universe’ (or population)
• The Universe can be finite (population of a city, number of students in a college, etc.) or infinite (number of stars in the sky, number of insects in the world, etc.)
2. Sampling Unit:
• It may be a geographical unit such as a state, district, village, etc.
• It may be a social unit such as a family, school, college, etc.
• It may be a unit of students (those who scored FCD, girls placed, etc.)
3. Source list: This is the ‘sampling frame’ from which the sample is to be drawn, e.g., an engineering college
4. Size of Sample: e.g., 10 boys from a class of 60; here 10 is the sample size and 60 is the population size
• The size of the sample should be neither excessively large nor too small
• It should be optimal
5. Parameters of interest:
• Height, weight, marks scored by students (of the sample), etc.
6. Budgetary constraint:
• Cost considerations, from a practical point of view, have a major impact upon decisions relating not only to the size of the sample but also to the type of sample
7. Sampling procedure:
• The researcher must decide on the technique to be used in selecting the items for the sample
• This technique or procedure stands for the sample design itself
Criteria for selecting a sampling procedure
• 2 costs are involved in a sampling analysis
1. The cost of collecting the data
2. The cost of an incorrect inference resulting from
the data
• There are 2 causes of incorrect inferences namely
systematic bias and sampling error
• A systematic bias results from errors in the
sampling procedure, and it cannot be reduced or
eliminated by increasing the sample size
• Usually a systematic bias is the result of one or
more of the following factors
1. Inappropriate sampling frame:
If the sampling frame is a biased representation of the universe, it will result in a systematic bias
2. Defective measuring device:
If the physical measuring device is defective, there will be systematic bias in the data collected through it
3. Non-respondents:
The individuals selected in the sampling procedure should be as representative of the population as possible; if some of them cannot be sampled or do not respond, a systematic bias may arise
4. Indeterminacy principle:
For example, some individuals act differently when they are under observation
5. Natural bias in the reporting of data:
• People in general understate their incomes if asked about them for tax purposes, but they overstate them if asked for social status
• Generally in psychological surveys, people tend to give what they think is the correct answer rather than revealing their true feelings
• In summary, while selecting a sampling procedure,
researcher must ensure that the procedure causes a
relatively small sampling error and helps to control
the systematic bias in a better way
Characteristics of a good sampling design
a. Sample design must result in a truly representative
sample
b. Sample design must be done in such a way that it results in a small sampling error
c. It should be viable in the context of budget
allocated for the research study
d. It should be able to control the systematic bias in a
better way
e. The results of the sample study can be applied, in
general, for the Universe with a reasonable level of
confidence
Different types of sample designs
• There are different types of sample designs based
on 2 factors namely representation basis and the
element selection technique
• On the representation basis, the sample may be
probability sampling or it may be non-probability
sampling
• Probability sampling is based on the concept of
random selection
• Non-probability sampling is ‘non-random’ selection
• On the element selection basis, the sample may be
either restricted or unrestricted
Non-probability sampling (non-random)
• This is also known by different names such as
deliberate sampling, purposive sampling and
judgment sampling
• The choice of the researcher concerning the items
remains supreme
• In such a design, personal element has a great chance
of entering into the selection of the sample
• The investigator may select a sample which shall yield
results favourable to his point of view
• If this happens, the entire inquiry may get vitiated
(destroyed/spoilt)
• Thus there is always the danger of bias entering into
this type of sampling technique
• ‘Quota sampling’ is also an example of non-probability sampling
• It is a sampling method of gathering representative data from a group
• As opposed to random sampling, quota sampling requires that representative individuals are chosen out of a specific subgroup
• For example, a researcher might ask for a sample of 100 females, or 100 individuals between the ages of 20 and 30
• Under this sampling, the interviewers are simply given quotas to be filled from the different strata
• The actual selection of items for the sample is left to the interviewer’s discretion
• This type of sampling is very convenient and is relatively inexpensive
• But the samples so selected certainly do not possess the characteristics of random samples
• Quota samples are essentially judgment samples
• Inferences drawn on this basis are not amenable (suited) to statistical treatment in a formal way
Probability sampling
• This is also known as ‘random sampling’ or ‘chance sampling’
• Every item of the universe has an equal chance of inclusion in the sample
• It is, so to say, a lottery method
• It is blind chance alone that determines whether one item or the other is selected
• Results obtained can be assured in terms of probability, i.e., the errors of estimation can be measured
• Random sampling ensures the law of ‘Statistical Regularity’
• This law states that if, on average, the sample chosen is a random one, the sample will have the same composition and characteristics as the universe
• The implications of random sampling are:
a) It gives each element in the population an equal probability of getting into the sample
b) All choices are independent of one another
c) It gives each possible sample combination an equal probability of being chosen
• Thus we can define a simple random sample from a finite population as a sample chosen such that each of the ${}^{N}C_{n}$ possible samples has the same probability, $1/{}^{N}C_{n}$, of being selected
• Example:
• Take a finite population consisting of 6 elements (say a, b, c, d, e and f), so N = 6
• Suppose that we want to take a sample of size n = 3 from it
• Then there are ${}^{6}C_{3} = \frac{6!}{3!\,3!} = 20$ possible distinct samples of the required size
• The samples are: abc, abd, abe, abf, …
• If we choose one of these samples at random, the probability of it being selected is 1/20
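The counting in this example can be checked with a short sketch. It assumes nothing beyond the six elements named above; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

population = ["a", "b", "c", "d", "e", "f"]  # N = 6
n = 3                                        # sample size

# Every distinct sample of size n, in lexicographic order
samples = ["".join(s) for s in combinations(population, n)]

# Under simple random sampling each sample is equally likely
prob_each = Fraction(1, comb(len(population), n))

print(len(samples))   # number of possible samples
print(samples[:4])    # first few samples
print(prob_each)      # probability of any one sample
```

Running it lists 20 samples beginning with abc, abd, abe, abf, each with probability 1/20.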
How to select a random sample
• If the population is small, write each of the possible samples on a slip of paper
• Mix the slips thoroughly
• Then draw one slip, as in a lottery, to pick a sample
• This procedure is obviously impractical if the
population is large
• The procedure may be simplified in actual practice
by the use of random number tables
• Various statisticians like Tippett, Yates, Fisher have
prepared tables of random numbers which can be
used for selecting a random sample
Tippett’s random number tables
• Tippett gave 10,400 four-digit numbers
• He selected 41,600 digits from census reports and combined them into fours to give his random numbers, which may be used to obtain a random sample
• The first 30 sets of Tippett’s numbers:

2952 6641 3992 9792 7979 5911
3170 5624 4167 9525 1545 1396
7203 5356 1300 2693 2370 7483
3408 2769 3563 6107 6913 7691
0560 5246 1112 9025 6008 8126
• Suppose we are interested in taking a sample of 10 units from a population of 5000 units, bearing numbers from 3001 to 8000
• We need 10 numbers from the table which are not less than 3001 and not greater than 8000
• Reading the table from left to right, starting from the first row and skipping numbers outside this range, we get the following 10 numbers:
• 6641, 3992, 7979, 5911, 3170, 5624, 4167, 7203, 5356 and 7483
• The units bearing these serial numbers constitute our required random sample
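The table-reading procedure can be sketched in a few lines. This is only an illustration of the steps described above; the function name `pick_from_table` is hypothetical, and skipping repeated numbers is an added assumption (no repeats occur in these 30 entries).

```python
# First 30 sets of Tippett's numbers, row by row, as given above
TIPPETT_FIRST_30 = [
    "2952", "6641", "3992", "9792", "7979", "5911",
    "3170", "5624", "4167", "9525", "1545", "1396",
    "7203", "5356", "1300", "2693", "2370", "7483",
    "3408", "2769", "3563", "6107", "6913", "7691",
    "0560", "5246", "1112", "9025", "6008", "8126",
]

def pick_from_table(table, low, high, k):
    """Read the table left to right and keep the first k numbers in [low, high]."""
    chosen = []
    for entry in table:
        value = int(entry)
        # Skip out-of-range numbers and (assumption) any repeats
        if low <= value <= high and value not in chosen:
            chosen.append(value)
        if len(chosen) == k:
            break
    return chosen

sample = pick_from_table(TIPPETT_FIRST_30, 3001, 8000, 10)
print(sample)
```

The result matches the ten serial numbers listed above.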

Complex Random sampling designs
• Probability sampling under restricted sampling
techniques may result in complex random sampling
designs
• There are 6 popular types of complex random
sampling designs:
1. Systematic sampling
2. Stratified sampling
3. Cluster sampling
4. Area sampling
5. Multi stage sampling
6. Sampling with probability proportional to size
Systematic sampling
• Selecting every i-th item on a list is systematic sampling
• An element of randomness is introduced into this kind of sampling by using random numbers to pick the unit with which to start
• The remaining units of the sample are selected at fixed intervals
• As the interval is fixed, the sample is spread more evenly over the entire population
• It is an easier and less expensive method, and it can be used even with large populations
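A minimal sketch of systematic sampling follows. It assumes the interval is the list length divided by the sample size (rounded down) and that the random start is drawn from the first interval; the function name `systematic_sample` is illustrative, not from the slides.

```python
import random

def systematic_sample(items, n, rng=None):
    """Pick n items at a fixed interval after a random start."""
    rng = rng or random.Random()
    interval = len(items) // n          # assumed sampling interval
    if interval < 1:
        raise ValueError("sample size exceeds population size")
    start = rng.randrange(interval)     # random start within the first interval
    return [items[start + j * interval] for j in range(n)]

# Usage: 10 units from a list of 100 serial numbers
units = list(range(1, 101))
print(systematic_sample(units, 10, random.Random(0)))
```

Only the starting unit is random; every later unit follows from it, which is why the sample spreads evenly over the list.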
Stratified sampling
• If a population from which a sample is to be drawn
does not constitute a homogeneous group, stratified
sampling technique is generally applied in order to
obtain a representative sample
• Here the population is divided into several sub-
populations (strata) and then we select items from
each stratum to constitute a sample
• Since each stratum is more homogeneous than the
total population, precise estimation would be
achieved
• Stratified sampling results in more reliable and
detailed information
• How to form strata?
• Strata should be formed on the basis of common characteristic(s) of the items to be put in each stratum
• Ensure that elements are as homogeneous as possible within each stratum and as heterogeneous as possible between the different strata
• Thus strata are purposely formed and are usually based on past experience and personal judgment of the researcher
• At times, a pilot study may be conducted for determining a more appropriate and efficient stratification plan
• How should items be selected from each stratum?
• By simple random sampling
• Systematic sampling can be used if it is considered more appropriate in certain situations
• How many items should be selected from each stratum, i.e., how should the sample size be allocated to each stratum?
• Usually follow the method of proportional allocation, under which the sizes of the samples from the different strata are kept proportional to the sizes of the strata
• If Pi represents the proportion of the population included in stratum ‘i’, and ‘n’ represents the total sample size, then the number of elements selected from stratum i is n × Pi
• Illustration:
• Let n = 30 and N = 8000, with 3 strata of sizes N1 = 4000, N2 = 2400 and N3 = 1600
Adopting proportional allocation, the sample size for the stratum with N1 = 4000 is:
P1 = 4000/8000 = 0.5
n1 = n × P1 = 30 × 0.5 = 15
Similarly, for the 2nd stratum: n2 = 30 × 2400/8000 = 9
And for the 3rd stratum: n3 = 30 × 1600/8000 = 6
The sample sizes are in the ratio 15:9:6 = 5:3:2
The stratum sizes are in the same ratio: 4000:2400:1600 = 5:3:2
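The proportional allocation above can be reproduced with a short sketch. The function name `proportional_allocation` is illustrative; it returns unrounded values, so rounding to whole units is left to the reader.

```python
def proportional_allocation(n, stratum_sizes):
    """Allocate a total sample of n across strata in proportion to their sizes."""
    total = sum(stratum_sizes)               # N
    return [n * size / total for size in stratum_sizes]

# Illustration: n = 30, strata of 4000, 2400 and 1600 units
print(proportional_allocation(30, [4000, 2400, 1600]))
```

For the illustration this gives 15, 9 and 6, the same 5:3:2 ratio as the strata.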
Disproportionate sampling

• In cases where strata differ not only in size but also in variability, it is considered reasonable to take larger samples from the more variable strata and smaller samples from the less variable strata
• A disproportionate sampling design accounts for both differences in stratum size and differences in stratum variability
Illustration
• A population is divided into 3 strata so that N1 = 5000, N2 = 2000 and N3 = 3000. The respective standard deviations are σ1 = 15, σ2 = 18 and σ3 = 5.
• How should a sample of size n = 84 be allocated to the 3 strata, if we want optimum allocation using a disproportionate sampling design?
• $n_i = \dfrac{n\, N_i \sigma_i}{\sum_i N_i \sigma_i}$
• $\sum_i N_i \sigma_i = N_1\sigma_1 + N_2\sigma_2 + N_3\sigma_3 + \dots + N_k\sigma_k$
The sample size for the stratum with N1 = 5000 is given by:
n1 = 84 × 5000 × 15 / (5000 × 15 + 2000 × 18 + 3000 × 5)
= 6300000/126000 = 50
Along similar lines we get n2 = 24 and n3 = 10
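As a check on the arithmetic, the optimum allocation can be sketched as below. The function name `optimum_allocation` is illustrative; each stratum is weighted by its size times its standard deviation, as in the formula above.

```python
def optimum_allocation(n, sizes, sds):
    """Allocate n across strata in proportion to N_i * sigma_i."""
    weights = [N_i * s_i for N_i, s_i in zip(sizes, sds)]
    total = sum(weights)                      # sum of N_i * sigma_i
    return [n * w / total for w in weights]

# Illustration: n = 84, N = (5000, 2000, 3000), sigma = (15, 18, 5)
print(optimum_allocation(84, [5000, 2000, 3000], [15, 18, 5]))
```

This reproduces n1 = 50, n2 = 24 and n3 = 10.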
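• In addition to differences in stratum size and differences in stratum variability (σ), one may have differences in stratum cost
• Then one can have a cost-optimal disproportionate sampling design:
$n_i = \dfrac{n\, N_i \sigma_i / \sqrt{C_i}}{\sum_i N_i \sigma_i / \sqrt{C_i}}$ for i = 1, 2, …, k
where $C_i$ is the sampling cost per unit in stratum i and, for k = 5 strata,
$\sum_i N_i \sigma_i / \sqrt{C_i} = N_1\sigma_1/\sqrt{C_1} + N_2\sigma_2/\sqrt{C_2} + N_3\sigma_3/\sqrt{C_3} + N_4\sigma_4/\sqrt{C_4} + N_5\sigma_5/\sqrt{C_5}$
Illustration: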
Problem 2
A certain population is divided into 5 strata so that N1 = 200, N2 = 2000, N3 = 1800, N4 = 1700 and N5 = 2500. The respective standard deviations are σ1 = 1.6, σ2 = 2.0, σ3 = 4.4, σ4 = 4.8 and σ5 = 6.0. Further, the expected sampling cost in the first 2 strata is Rs 4.0 per interview and in the remaining 3 strata it is Rs 6.0 per interview. How should a sample of size n = 226 be allocated to the 5 strata if we adopt
a) a proportionate sampling design,
b) a disproportionate sampling design considering
i) only the differences in stratum variability
ii) differences in stratum variability as well as the differences in stratum sampling costs
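The slides do not give a worked solution here, so the following is only a sketch of how the three allocations could be computed. It assumes the standard cost-optimal rule, in which each stratum is weighted by size times standard deviation divided by the square root of its unit cost. The function name `allocate` is hypothetical, and the results are left unrounded because rounding each stratum separately need not add up to exactly 226.

```python
from math import sqrt

def allocate(n, sizes, sds=None, costs=None):
    """Allocate n across strata.

    sizes only          -> proportionate allocation
    sizes + sds         -> weights N_i * sigma_i
    sizes + sds + costs -> weights N_i * sigma_i / sqrt(C_i)
    """
    sds = sds or [1.0] * len(sizes)       # equal variability if not given
    costs = costs or [1.0] * len(sizes)   # equal cost if not given
    weights = [N * s / sqrt(c) for N, s, c in zip(sizes, sds, costs)]
    total = sum(weights)
    return [n * w / total for w in weights]

N = [200, 2000, 1800, 1700, 2500]
sigma = [1.6, 2.0, 4.4, 4.8, 6.0]
cost = [4.0, 4.0, 6.0, 6.0, 6.0]          # Rs per interview

proportionate = allocate(226, N)
variability_only = allocate(226, N, sigma)
variability_and_cost = allocate(226, N, sigma, cost)

for name, result in [("a", proportionate),
                     ("b-i", variability_only),
                     ("b-ii", variability_and_cost)]:
    print(name, [round(x, 2) for x in result])
```

Comparing the three rows shows how the more variable, larger strata gain sample units under (b-i), and how the cheaper strata regain some under (b-ii).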
Cluster sampling
• When the total area of interest happens to be a big one, a convenient way in which a sample can be taken is to divide the area into a number of smaller non-overlapping areas
• A number of these small areas (usually called clusters) are then randomly selected, with the ultimate sample consisting of all units in these small areas or clusters
• Assume that there are 20,000 machine parts in the inventory at a given point of time
• These are stored in 400 cases (boxes), each case holding 50 parts
• Suppose we want to estimate the proportion of machine parts which are defective in this inventory
• We can use cluster sampling, treating each case as a cluster
• Then randomly select ‘n’ cases out of the 400 cases and examine all the machine parts in them
• Cluster sampling, no doubt, reduces cost by concentrating surveys in selected clusters
• But it is less precise than simple random sampling
• There is usually not as much information in ‘n’ observations within a cluster as in ‘n’ randomly drawn observations
• Cluster sampling is used mainly for its economic advantage
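The machine-parts example can be sketched as follows. The inventory here is simulated: the 5% defect rate and the choice of 20 cases are purely hypothetical, and `cluster_sample_proportion` is an illustrative name.

```python
import random

def cluster_sample_proportion(cases, n_clusters, rng):
    """Select whole cases at random and inspect every part in them."""
    chosen = rng.sample(range(len(cases)), n_clusters)
    inspected = [part for idx in chosen for part in cases[idx]]
    # Proportion of defective parts among all inspected parts
    return sum(inspected) / len(inspected), chosen

rng = random.Random(42)
# Hypothetical inventory: 400 cases of 50 parts; True marks a defective part
cases = [[rng.random() < 0.05 for _ in range(50)] for _ in range(400)]

estimate, chosen = cluster_sample_proportion(cases, 20, rng)
print(len(chosen), "cases inspected; estimated defective proportion:", estimate)
```

Only the cases are drawn at random; once a case is chosen, every part in it is examined, which is what keeps the cost low.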
Area sampling
• Area sampling is the same as cluster sampling when geographical subdivisions are treated as clusters
• So the merits and demerits of cluster sampling also apply to area sampling
Multi-stage sampling
• Multi-stage sampling is a further development of the principle of cluster sampling
• Here one has clusters at different levels
• Suppose we want to investigate the working efficiency of nationalized banks in India and we want to take a sample of a few banks for this purpose
• This can be done by sampling in stages: first states, then districts within the selected states, then towns within the selected districts, and so on
• Multi-stage sampling is applied in big inquiries extending over a considerably large geographical area or a big organization (say, a university)
• There are 2 advantages
1. It is easier to administer than most single-stage designs because the sampling frame is developed in partial units
2. A large number of units can be sampled for a given cost under multi-stage sampling because of sequential clustering
Sampling with probability proportional to size
• If the clusters do not have the same (or approximately the same) number of units, plain cluster sampling is not appropriate
• In that case, use a random selection process where the probability of each cluster being included in the sample is proportional to the size of the cluster
• List the number of elements in each cluster
• Then sample systematically the appropriate number of elements from the cumulative total of elements
Illustration
• The following are the numbers of departmental stores in 15 cities:
35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33, 29, 28
• It is required to select a sample of 10 stores, using cities as clusters and selecting within clusters proportional to size
• How many stores from each city should be chosen? Use a starting point of 10, so that the first sample falls in the first city (cluster)
• Sampling of the appropriate number of elements has to be done from the cumulative total of elements
• The cumulative total is 500 departmental stores
• From this we have to select a sample of 10 stores
• Therefore the appropriate sampling interval is 500/10 = 50
• The given starting point is 10 (if it is not given, it can be randomly selected)
• The next sampling point is at 10 + 50 = 60
• The sequence goes on: 60 + 50 = 110, and so on

City No.   No. of deptl. stores   Cumulative total   Sample points (count)
1          35                     35                 10 (1)
2          17                     52                 ---- (0)
3          10                     62                 60 (1)
4          32                     94                 ---- (0)
5          70                     164                110, 160 (2)
6          28                     192                ---- (0)
7          26                     218                210 (1)
8          19                     237                ---- (0)
9          26                     263                260 (1)
10         66                     329                310 (1)
11         37                     366                360 (1)
12         44                     410                410 (1)
13         33                     443                ---- (0)
14         29                     472                460 (1)
15         28                     500                ---- (0)
Total                                                10 samples
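The cumulative-total procedure in this illustration can be sketched as below. The function name `pps_systematic` is illustrative, and taking the interval as the cumulative total divided by the sample size, rounded down, is an assumption that agrees with the interval of 50 used here.

```python
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(cluster_sizes, n, start):
    """Systematic selection over cumulative totals (probability proportional to size)."""
    cumulative = list(accumulate(cluster_sizes))
    interval = cumulative[-1] // n               # assumed: total / n, rounded down
    if not 1 <= start <= interval:
        raise ValueError("start must lie within the first interval")
    points = [start + j * interval for j in range(n)]
    counts = [0] * len(cluster_sizes)
    for p in points:
        # First cluster whose cumulative total reaches the sampling point
        counts[bisect_left(cumulative, p)] += 1
    return points, counts

stores = [35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33, 29, 28]
points, counts = pps_systematic(stores, n=10, start=10)
print(points)
print(counts)   # stores to select from city 1, city 2, ...
```

Larger cities span more of the cumulative range, so they are hit by more sampling points; city 5, with 70 stores, is the only one selected twice.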
• Problem 2
The following are the numbers of departmental stores in 10 cities: 35, 27, 24, 32, 42, 30, 34, 40, 29 and 38. If we want to select a sample of 15 stores using cities as clusters and selecting within clusters proportional to size, how many stores from each city should be chosen?
• Use a starting point of 4
• The cumulative total is 331, so the sampling interval is 331/15 ≈ 22

City No.   No. of deptl. stores   Cumulative total   Sample points   No. of stores to be selected
1          35                     35                 4, 26           2
2          27                     62                 48              1
3          24                     86                 70              1
4          32                     118                92, 114         2
5          42                     160                136, 158        2
6          30                     190                180             1
7          34                     224                202, 224        2
8          40                     264                246             1
9          29                     293                268, 290        2
10         38                     331                312             1
Total                                                                15
Sequential sampling
• When a particular lot has to be accepted or rejected on the basis of a single sample, it is known as single sampling
• Similarly, there are double sampling and multiple sampling
• When the number of samples is more than 2 but is neither certain nor decided in advance, this type of sampling is often referred to as sequential sampling
