Module 3 RM

Uploaded by

Aman Khan
Copyright
© © All Rights Reserved

Module III- Sampling Design

Data Collection in Statistics

In Statistics, the basis of all statistical calculations or interpretation lies in
the collection of data. There are numerous methods of data collection. In this lesson,
we shall focus on two primary methods and understand the difference between them.
Both are suitable in different cases and the knowledge of these methods is important
to understand when to apply which method. These two methods are the Census
method and Sampling method.

Census Method

Census method is the method of statistical enumeration where all members of the
population are studied. A population refers to the set of all observations under
concern. For example, if you want to carry out a survey to find out students’
feedback about the facilities of your school, all the students of your school would
form a part of the ‘population’ for your study.

At a more realistic level, a country wants to maintain information and records about
all households. It can collect this information by surveying all households in the
country using the census method.

In our country, the Government conducts the Census of India every ten years. The
Census collects information from households regarding their incomes, the
earning members, the total number of children, members of the family, etc. This
method must take into account all the units. It cannot leave out anyone in
collecting data. Once collected, the Census of India reveals demographic
information such as birth rates, death rates, total population, population growth rate
of our country, etc. The last census was conducted in the year 2011.

Sampling Method

As we have seen, the population contains units with some similar characteristics
on the basis of which they are grouped together for the study. In the case of
the Census of India, for example, the common characteristic was that all units are
Indian nationals. But it is not always practical to collect information from all the units
of the population.

Studying every unit is time-consuming and costly. Thus, a practical alternative is to
collect information from a representative group drawn from the population and then
make inferences about the whole population from it. This representative group, which
contains some units from the whole population, is called the sample.

How to select a Sample?

The first and most important step in selecting a sample is to determine the population.
Once the population is identified, a sample must be selected. A good sample is one
which:

• Is small in size relative to the population.

• Provides adequate information about the whole population.

• Takes less time to collect and is less costly.

In the case of our previous example, you could choose students from your class to be
the representative sample out of the population (all students in the school). However,
there must be some rationale behind choosing the sample. If you think your class
comprises students who will give unbiased opinions/feedback, or that it contains
students from different backgrounds whose responses would be relevant to your
study, you may choose them as your sample. Otherwise, it is better
to choose another sample which might be more representative.

Again, realistically, the government wants estimates on the average income of the
Indian household. It is difficult and time-consuming to study all households. The
government can simply choose, say, 50 households from each state of the country
and calculate the average of that to arrive at an estimate. This estimate is not
necessarily the actual figure that would be arrived at if all units of the population
underwent study. But, it approximately gives an idea of what the figure might look
like.
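
The idea of estimating an average from a sample instead of a full census can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the income figures are simulated (not real data), the population size of 10,000 and the sample size of 500 are arbitrary, and a flat simple random sample is used rather than the per-state selection described above.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated yearly incomes for 10,000 households.
population = [random.lognormvariate(12.0, 0.6) for _ in range(10_000)]

# Census method: every unit of the population is used.
census_mean = statistics.mean(population)

# Sampling method: only a simple random sample of 500 households is used.
sample = random.sample(population, k=500)
sample_mean = statistics.mean(sample)

# The estimate is close to, but not exactly, the census figure.
relative_gap = abs(sample_mean - census_mean) / census_mean
print(f"census mean:   {census_mean:,.0f}")
print(f"sample mean:   {sample_mean:,.0f}")
print(f"relative gap:  {relative_gap:.2%}")
```

The gap between the two figures is the sampling error for this particular sample; a different sample would give a slightly different estimate.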

Difference between Census and Sample Surveys

| Parameter | Census | Sample Survey |
|---|---|---|
| Definition | A statistical method that studies all the units or members of a population. | A statistical method that studies only a representative group of the population, and not all its members. |
| Calculation | Total/Complete | Partial |
| Time involved | It is a time-consuming process. | It is a quicker process. |
| Cost involved | It is a costly method. | It is a relatively inexpensive method. |
| Accuracy | The results obtained are accurate as each member is surveyed. So, there is a negligible error. | The results are less accurate because items are left out of the sample. The resulting error can be large if the sample is small or poorly chosen. |
| Reliability | Highly reliable | Less reliable; depends on how the sample is drawn |
| Error | No sampling error (non-sampling errors can still occur) | Sampling error is present. The smaller the sample size, the larger the error. |
| Relevance | This method is suited for heterogeneous data. | This method is suited for homogeneous data. |

What are the merits and demerits of the sampling method?

Merits –

• It is an economically viable method as it is less costly, saves time and requires
less manpower to collect data.

• The result of the census method may be checked with the help of the sampling
method.

• In cases where the population size is too large, the sampling method is easier and
more practical.

• We can use it to make estimations about population characteristics without
surveying all units of the population.

Demerits –

• If the sampling is not properly conducted, it might lead to erroneous and
unrepresentative results.

• Sampling normally generates an error because units are left out of the
sample. If a crucial unit is left out of the sample, the resulting error will be
large.

• If skilled personnel are not available to interpret the data, the results drawn will
be unreliable.

Sampling

1. The process of selecting the representative sample units from the population
to study the characteristics of the population is called sampling.
2. In many empirical studies, data are to be collected from a population under
study.
3. A population consists of a number of units, usually very large and sometimes
infinitely large.
4. In many cases, it is not practically possible to include all units of the
population for the investigation.
5. Therefore a few units of the population have to be selected as a
representative of the whole population.
6. So sampling is needed in this situation to draw the representative sample of
the population.

STEPS IN SAMPLING DESIGN


A researcher should take into consideration the following aspects while
developing a sample design:

1 Type of Universe:
The first step involved in developing a sample design is to clearly define the set
of objects to be studied, technically known as the universe. A universe may be finite
or infinite. In a finite universe the number of items is certain, whereas in the case of
an infinite universe the number of items is infinite (i.e., we cannot have any idea
about the total number of items). For example, the population of a city or the number
of workers in a factory is a finite universe, whereas the number of stars in the sky or
the throws of a die represent an infinite universe.

2 Sampling Units:
Prior to selecting a sample, a decision has to be made about the sampling unit. A
sampling unit may be a geographical area like a state, district, village, etc., or a
social unit like a family, religious community, school, etc., or it may also be an
individual. At times, the researcher would have to choose one or more of such
units for his/her study.

3 Source List:

Source list is also known as the ‘sampling frame’, from which the sample is to be
selected. The source list consists of names of all the items of a universe. The
researcher has to prepare a source list when it is not available. The source list must
be reliable, comprehensive, correct, and appropriate. It is important that the source
list should be as representative of the population as possible.

4 Size of Sample:

Size of the sample refers to the number of items to be chosen from the universe to
form a sample. For a researcher, this constitutes a major problem. The size of
sample must be optimum. An optimum sample may be defined as the one that
satisfies the requirements of representativeness, flexibility, efficiency, and
reliability. While deciding the size of sample, a researcher should determine the
desired precision and the acceptable confidence level for the estimate. The size of
the population variance should be considered, because in the case of a larger
variance generally a larger sample is required. The size of the population should be
considered, as it also limits the sample size. The parameters of interest in a
research study should also be considered, while deciding the sample size. Besides,
budgetary constraints also play a crucial role in deciding the sample size.
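
The link between precision, confidence level, variance and population size can be made concrete with a common textbook formula for estimating a mean: n0 = (z * sigma / e)^2, optionally reduced by a finite population correction. The sketch below is one standard way to do the arithmetic, not a method prescribed by this module; the function name `sample_size_for_mean` and the example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95, population_size=None):
    """Sample size needed to estimate a mean to within +/- margin.

    sigma           -- assumed population standard deviation
    margin          -- desired precision (half-width of the confidence interval)
    confidence      -- acceptable confidence level
    population_size -- if given, a finite population correction is applied
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    n0 = (z * sigma / margin) ** 2                  # size for an infinite universe
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)  # finite population correction
    return math.ceil(n0)


# Larger variance -> larger sample; smaller population -> smaller sample.
print(sample_size_for_mean(sigma=10, margin=1))                        # 385
print(sample_size_for_mean(sigma=20, margin=1))
print(sample_size_for_mean(sigma=10, margin=1, population_size=2000))
```

In practice sigma is rarely known in advance and is usually taken from a pilot study or earlier surveys, and the budget may cap the result from above.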

(A) Parameters Of Interest:

The specific population parameters of interest should also be considered while
determining the sample design. For example, the researcher may want to make an
estimate of the proportion of persons with a certain characteristic in the population,
or may be interested in knowing some average regarding the population. The
population may also consist of important sub-groups about whom the researcher
would like to make estimates. All such factors have a strong impact on the sample
design the researcher selects.

(B) Budgetary Constraint:


From the practical point of view, cost considerations exercise a major influence on
decisions related not only to the sample size but also to the type of sample
selected. Thus, budgetary constraints could also lead to the adoption of a
non-probability sample design.

(C) Sampling Procedure:

Finally, the researcher should decide the type of sample or the technique to be
adopted for selecting the items for a sample. This technique or procedure itself
may represent the sample design. There are different sample designs from which a
researcher should select one for his/her study. It is clear that the researcher should
select that design which, for a given sample size and budget constraint, involves a
smaller error.

CHOOSING THE BEST SAMPLING METHOD


By following the steps below we could choose the best sampling method for our
study in an orderly fashion.

1 Research objectives
Firstly, a refined research question and goal would help us define our population of
interest. If our calculated sample size is small then it would be easier to get a
random sample. If, however, the sample size is large, then we should check if our
budget and resources can handle a random sampling method.

2 Sampling frame availability


Secondly, we need to check whether a sampling frame is available, in which case
simple random sampling is possible. If not, we should ask whether we could compile
a list of our own, which would allow, for example, stratified sampling. If neither
option is possible, we could still use other random sampling methods, for instance,
systematic or cluster sampling.

3 Study design
Moreover, we could consider the prevalence of the topic (exposure or outcome) in
the population, and what would be the suitable study design. In addition, we should
check whether our target population varies widely in its baseline characteristics. For
example, a population with large ethnic subgroups could best be studied using a
stratified sampling method.

4 Random sampling
Finally, the best sampling method is always the one that could best answer our
research question while also allowing for others to make use of our results
(generalisability of results). When we cannot afford a random sampling method,
we can always choose from the non-random sampling methods.

CHARACTERISTICS OF A GOOD SAMPLE


(1) Goal-oriented: A sample design should be goal oriented. ...
(2) Accurate representative of the universe: A sample should be an accurate
representative of the universe from which it is taken. ...
(3) Proportional: A sample should be proportional.

DIFFERENT TYPES OF SAMPLE DESIGNS

Types of sampling: sampling methods

Sampling in market research is of two types: probability sampling and
non-probability sampling. Let’s take a closer look at these two methods of sampling.

1. Probability sampling: Probability sampling is a sampling technique where a
researcher sets a few selection criteria and chooses members of a population
randomly. Every member has a known, non-zero chance of being part of the sample,
and in the simplest designs that chance is equal for all members.
2. Non-probability sampling: In non-probability sampling, the researcher chooses
members for research based on judgment or convenience rather than by random
selection. This sampling method is not a fixed or predefined selection process,
which makes it difficult for all elements of a population to have equal
opportunities to be included in a sample.
The following sections discuss the various probability and non-probability sampling
methods that you can implement in any market research study.

Types of probability sampling with examples:

Probability sampling is a sampling technique in which researchers choose samples
from a larger population using a method based on the theory of probability. This
sampling method considers every member of the population and forms samples
based on a fixed process.

For example, in a population of 1000 members, every member will have a 1/1000
chance of being selected on any single random draw. Probability sampling reduces
selection bias and gives all members a fair chance to be included in the
sample.

There are four types of probability sampling techniques:

• Simple random sampling: Simple random sampling is one of the most
straightforward probability sampling techniques and helps in saving time and
resources. It is a reliable method of obtaining information where every member of
the sample is chosen randomly, merely by chance. Each individual in the population
has the same probability of being chosen to be a part of the sample.
For example, in an organization of 500 employees, if the HR team decides on
conducting team building activities, it might pick chits out of a bowl. In this case,
each of the 500 employees has an equal opportunity of being selected.
• Cluster sampling: Cluster sampling is a method where the researchers divide
the entire population into sections or clusters that each represent the population,
and then randomly select whole clusters for the sample. Clusters are identified
based on demographic parameters like age, sex, location, etc. This makes it simpler
for a survey creator to collect feedback and derive inferences from it.
For example, if the United States government wishes to evaluate the number of
immigrants living in the US, they can divide it into clusters based on
states such as California, Texas, Florida, Massachusetts, Colorado, Hawaii, etc.
This way of conducting a survey can be more effective as the results will be
organized by state and provide insightful immigration data.
• Systematic sampling: Researchers use the systematic sampling method to
choose the sample members of a population at regular intervals. It requires the
selection of a starting point and a fixed sampling interval, determined by the
desired sample size, that is repeated through the list. Because the selection
pattern is predefined, this sampling technique is among the least time-consuming.
For example, a researcher intends to collect a systematic sample of 500 people in
a population of 5000. He/she numbers each element of the population from 1 to
5000 and chooses every 10th individual to be a part of the sample (Total
population / Sample size = 5000/500 = 10).
• Stratified random sampling: Stratified random sampling is a method in which
the researcher divides the population into smaller groups that don’t overlap but
together represent the entire population. The researcher then draws a sample from
each group separately.
For example, a researcher looking to analyze the characteristics of people
belonging to different annual income divisions will create strata (groups)
according to the annual family income, e.g., less than $20,000, $20,000 to
$29,999, $30,000 to $39,999, $40,000 to $49,999, etc. By doing this, the
researcher can draw conclusions about the characteristics of people belonging to
different income groups. Marketers can analyze which income groups to target and
which ones to eliminate to create a roadmap that would bear fruitful results.
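
The four probability sampling techniques above can be sketched side by side in Python. This is a minimal illustration, not a survey-grade implementation: the population is just the numbers 1 to 5000 from the systematic sampling example, the ten groups of 500 are a hypothetical grouping invented for the demonstration, and all function names are assumptions.

```python
import random


def simple_random_sample(units, n, rng):
    """Every unit has the same chance of selection; drawn without replacement."""
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    """Pick a random start, then take every k-th unit, where k = N // n."""
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample separately inside every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(0)
people = list(range(1, 5001))  # population numbered 1..5000

srs = simple_random_sample(people, 500, rng)
systematic = systematic_sample(people, 500, rng)  # interval 5000 / 500 = 10

# Hypothetical grouping: ten groups of 500 consecutive numbers each.
groups = {f"group_{g}": people[g * 500:(g + 1) * 500] for g in range(10)}
stratified = stratified_sample(groups, 50, rng)  # 50 units from every group
clustered = cluster_sample(groups, 2, rng)       # all units of 2 whole groups
```

Note the contrast in the last two calls: the stratified sample takes some units from every group, while the cluster sample takes every unit from some groups.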

Types of non-probability sampling with examples


The non-probability method is a sampling method that involves a collection of
feedback based on a researcher or statistician’s sample selection capabilities and
not on a fixed selection process. In most situations, the output of a survey
conducted with a non-probability sample leads to skewed results, which may not
represent the desired target population. But, there are situations such as the
preliminary stages of research or cost constraints for conducting research, where
non-probability sampling will be much more useful than the other type.

Four types of non-probability sampling explain the purpose of this sampling
method in a better manner:

• Convenience sampling: This method is dependent on the ease of access to
subjects such as surveying customers at a mall or passers-by on a busy street. It
is termed convenience sampling because of the researcher’s ease of
carrying it out and getting in touch with the subjects. Researchers have almost no
control over which elements enter the sample; selection is based purely on proximity
and not representativeness. This non-probability sampling method is used when
there are time and cost limitations in collecting feedback, for example in the
initial stages of research.
For example, startups and NGOs usually conduct convenience sampling at a mall
to distribute leaflets of upcoming events or promotion of a cause. They do that
by standing at the mall entrance and giving out pamphlets to whoever passes by.
• Judgmental or purposive sampling: Judgmental or purposive samples are
formed at the discretion of the researcher. Researchers consider only the
purpose of the study, along with their understanding of the target audience. For
instance, suppose researchers want to understand the thought process of people
interested in studying for their master’s degree. The selection criterion will be:
“Are you interested in doing your masters in …?” and those who respond with a
“No” are excluded from the sample.
• Snowball sampling: Snowball sampling is a sampling method that researchers
apply when the subjects are difficult to trace. For example, it will be extremely
challenging to survey shelterless people or illegal immigrants. In such cases,
researchers can start with a few subjects they are able to reach, interview them,
and ask them to refer others, so the sample grows like a rolling snowball.
Researchers also implement this sampling method in situations where the topic is
highly sensitive and not openly discussed, for example, surveys to gather
information about HIV/AIDS. Not many affected people will readily respond to the
questions. Still, researchers can contact people they might know or volunteers
associated with the cause to get in touch with them and collect information.
• Quota sampling: In quota sampling, the selection of members happens based on
a pre-set standard. Because the sample is formed based on specific attributes, the
created sample is intended to have the same proportions of those attributes as the
total population. It is a rapid method of collecting samples.
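
Quota sampling in particular is easy to express as a procedure: respondents are accepted in whatever order they turn up until each pre-set quota is full. The sketch below assumes a hypothetical stream of respondents with a `gender` attribute and made-up quotas; the function name `quota_sample` is also an assumption. Because arrival order, not chance, decides who gets in, this remains a non-probability method.

```python
def quota_sample(respondents, quotas, key):
    """Accept respondents in arrival order until every quota is filled."""
    counts = {group: 0 for group in quotas}
    accepted = []
    for person in respondents:
        group = key(person)
        if group in counts and counts[group] < quotas[group]:
            counts[group] += 1
            accepted.append(person)
        if counts == quotas:  # all quotas filled, stop collecting
            break
    return accepted


# Hypothetical arrivals: every third respondent is "M", the rest are "F".
arrivals = [{"id": i, "gender": "M" if i % 3 == 0 else "F"} for i in range(1, 31)]
quotas = {"M": 3, "F": 5}

chosen = quota_sample(arrivals, quotas, key=lambda p: p["gender"])
print([p["id"] for p in chosen])
```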

HOW TO SELECT A RANDOM SAMPLE.

How do you decide on the type of sampling to use?


For any research, it is essential to choose a sampling method carefully to meet the
goals of your study. The effectiveness of your sampling relies on various factors.
Here are some steps expert researchers follow to decide the best sampling method.

• Jot down the research goals. Generally, they will involve a combination of cost,
precision, or accuracy.
• Identify the effective sampling techniques that might potentially achieve the
research goals.
• Test each of these methods and examine whether they help in achieving your
goal.
• Select the method that works best for the research.

RANDOM SAMPLE FROM AN INFINITE UNIVERSE

So far we have talked about random sampling, keeping in view only the finite
populations. But what about random sampling in context of infinite populations? It
is relatively difficult to explain the concept of random sample from an infinite
population. However, a few examples will show the basic characteristic of such a
sample.

Suppose we consider the 20 throws of a fair die as a sample from the
hypothetically infinite population which consists of the results of all possible
throws of the die. If the probability of getting a particular number, say 1, is the
same for each throw and the 20 throws are all independent, then we say that the
sample is random.

Similarly, we are said to be sampling from an infinite population if we sample
with replacement from a finite population. Our sample is considered a
random sample if in each draw all elements of the population have the same
probability of being selected and successive draws are independent. In
brief, one can say that the selection of each item in a random sample from an
infinite population is controlled by the same probabilities and that successive
selections are independent of one another.
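
Both situations described above, repeated throws of a fair die and draws with replacement from a finite population, can be simulated directly. The sketch assumes Python's standard pseudo-random generator as a stand-in for a fair die, and the five-element population is made up for illustration.

```python
import random
from collections import Counter

rng = random.Random(1)

# 20 independent throws of a fair die: on every throw each face has
# the same probability, regardless of what came up before.
throws = [rng.randint(1, 6) for _ in range(20)]

# Sampling WITH replacement from a finite population: every draw is made
# from the full population, so the same element can appear more than once
# and successive draws are independent.
population = ["A", "B", "C", "D", "E"]
draws = rng.choices(population, k=20)

print(Counter(throws))
print(Counter(draws))
```

Drawing 20 items from a population of only five is possible here precisely because each item is put back before the next draw.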

COMPLEX RANDOM SAMPLING DESIGNS.

What is a Complex Sample?

The most defining feature of a complex sample is that sample members do not
have equal probability of being selected.

That sounds simple enough. But…

There are many ways of meeting this defining feature while also being capable of
representing the population and having advantages for other research goals.

You’ve probably heard of some of the common design features of complex
samples. It’s these features that create the complexity.

If you’ve got one or more of these, you can consider your sample complex:

• Stratification
• Clustering
• Oversampling
• Sampling without replacement
• Finite populations
• Multistage sampling

Each of these terms needs its own definition, but there are two things that are
important to understand first.

First, whenever a sample is complex, statistical estimation and analysis are
more complex.

In order for the sample’s data to accurately estimate the population,
you must account for the design features in the analysis. This means using
statistical software that’s designed for complex samples.

Second, despite this, there are very good reasons to use complex samples.

Three Very Good Reasons to use Complex Samples

1. Complex samples can be incredibly cost effective, while improving the
precision of sample estimates.

In other words, a good complex sampling design can cost much,
much less to administer and, with well-chosen strata, keep standard errors smaller
than they would be in a simple random sample of the same size.

When your research budget is tight (and whose isn’t?), this is hugely important.

2. Complex samples allow access to difficult-to-access sampling frames.

For example, it’s very difficult to sample schoolchildren without first sampling
schools or patients without sampling hospitals. Including these multiple stages of
sampling means not every student or patient has an equal probability of being in
the sample (making the sample complex).

3. Complex samples can ensure sufficient representation of small
sub-population groups in the final sample.

Let’s say you want to study color blind people over age 75 from rural areas. This
group makes up a tiny, tiny proportion of the overall adult population. To have
sufficient numbers in the group to make any statistical comparisons or have
reasonably sized confidence intervals, you’d have to sample millions of people
using simple random sampling. Oversampling this group allows you to have
enough group members to study them, but it makes the overall sample complex.
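
One common way to account for oversampling in the analysis is to weight each sampled unit by the inverse of its selection probability. The sketch below illustrates that general idea only; the population sizes, sampling rates and outcome values are made up, and the helper names are assumptions rather than any particular software's API.

```python
def design_weight(selection_probability):
    """A unit selected with probability p stands in for 1 / p population units."""
    return 1.0 / selection_probability


def weighted_mean(values, weights):
    """Mean of the values with each observation counted according to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical design: a rare group of 1,000 people is oversampled at 10%,
# while the other 99,000 people are sampled at 1%.
# y = 1 marks membership of the rare group, y = 0 everyone else.
sample = (
    [{"group": "rare", "y": 1.0, "p": 0.10}] * 100
    + [{"group": "other", "y": 0.0, "p": 0.01}] * 990
)
values = [row["y"] for row in sample]
weights = [design_weight(row["p"]) for row in sample]

unweighted = sum(values) / len(values)     # about 0.09, overstates the rare group
weighted = weighted_mean(values, weights)  # about 0.01, the true population share
```

Ignoring the weights would suggest the rare group is roughly nine times more common than it really is, which is why the design has to be carried into the analysis.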

What is a complex sample?


Most statistical analyses assume that the data collected are from a simple
random sample (SRS) of the population of interest. So say, for example, that
you were conducting a survey of employees in your workplace (this is the
“population”), a simple random sample would be where each of your colleagues
in the office (or “sampling units”) were equally likely to be sampled. However,
it’s not always possible or practical to take a simple random sample. Simple
random sampling requires access to the whole population of interest (a
“complete sampling frame” listing the sampling units) which may not be
feasible for large populations. If sampling units are widely spread out
geographically, for example, it might also be prohibitively expensive to access
and sample across the whole area. Or, if some members of the population (e.g.,
of a particular demographic background) are relatively low in number, a simple
random sample might not obtain enough (or any) of these individuals to reliably
measure their responses. So, even if a complete sampling frame is available, it
might be much cheaper or more efficient to use a complex sampling
scheme instead of SRS, such as multi-stage sampling, clustering and/or
stratification, for example.

With these approaches, members of the population don’t all have the same
probability of being selected into the sample. Complex samples are most often
used for surveys, especially large national or multinational ones where simple
random sampling is simply not practical. For example, suppose you were
conducting a survey in a conflict-affected country and the target population was
all adults aged over 18, totaling, say, 20 million individuals. You might be
interested in how responses differ by occupation, but some categories (perhaps
self-employed) may only represent a small fraction of the population. You
might therefore consider stratifying your sampling to ensure that sufficient
responses were obtained to make reliable estimates in each occupation group.
The country may also be split geographically into, say, 40 states. To travel to
and interview people in each of these states would likely be unfeasible, so
cluster sampling might be used so that only a subset of the states needed to be
accessed.

Complex samples may also be incorporated into the design of cross-sectional
observational studies or even interventional studies (such as clinical trials). The
key thing to remember is that when analysing data from a survey using complex
sampling, the statistical methods that you use must take the sampling design
into account.

So, what are the most common complex sampling approaches and why and
when are they used? We focus here on cluster sampling and stratified sampling.
We’ll also discuss sampling without replacement which should also be taken
into account when analysing your data.

Clustering
In cluster sampling, the population is split into similar groups of individuals
(“clusters”) and then a sample of these clusters is taken (the clusters are the
sampling units in this case) so that all of the elements in the selected clusters are
included in the sample. Clustering is appropriate when we expect the clusters to be
relatively similar to one another (“homogeneous” across clusters), i.e., each
cluster is representative of the population.

For example, suppose we wanted to gather the opinions of school children in a
particular county in England, say Somerset. It would be difficult and expensive
to interview all school-aged children in Somerset, so we take a sample of those
children instead. However, taking a simple random sample of pupils in
Somerset may mean that we still need to survey pupils in all, or a large
proportion, of the schools in the county. It would be much cheaper to only
survey the pupils in a subset of schools – so, we might cluster pupils according
to their schools and then take a sample of the clusters (surveying all students
within those selected clusters) to obtain a clustered sample of school children in
the county.

This method is most efficient when most of the variation in the population is
within clusters, rather than between them (higher within-cluster correlation
increases the variance compared to SRS). Cluster sampling is generally used to
reduce costs, by reducing the number of clusters that we sample within whilst
maintaining the sample efficiency.
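
The cost of within-cluster correlation is often summarised by the design effect. A widely used approximation for equal-sized clusters is deff = 1 + (m - 1) * rho, where m is the number of units sampled per cluster and rho is the intraclass (within-cluster) correlation. The sketch below applies that approximation; it is not taken from this text, and the example figures (1,000 pupils, 25 per school, rho = 0.05) are assumptions.

```python
def design_effect(cluster_size, icc):
    """Approximate variance inflation relative to SRS: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Size of the simple random sample that would give the same variance."""
    return n / design_effect(cluster_size, icc)


# Hypothetical survey: 1,000 pupils, 25 per selected school.
deff = design_effect(cluster_size=25, icc=0.05)            # about 2.2
n_eff = effective_sample_size(1000, cluster_size=25, icc=0.05)

print(f"design effect:         {deff:.2f}")
print(f"effective sample size: {n_eff:.0f}")
```

When rho is zero the design effect is 1 and clustering costs nothing in precision; as pupils within a school become more alike, the same 1,000 interviews carry less and less independent information.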

Stratification
Stratified sampling
Stratified sampling involves splitting the members of the population into
subgroups (“strata”) before sampling, and then applying sampling (usually SRS)
separately within each and every group (“stratum”). This is in contrast to cluster
sampling where whole clusters are sampled, rather than samples of individuals
being taken within each group (i.e., stratum), as illustrated in Figure 1.
Stratified sampling can help to ensure that the sample collected is representative
of the population, by guaranteeing that sufficient individuals from each sub-
group (e.g., gender, or socioeconomic status) will be sampled. This is
especially important if some strata only represent a small proportion of the
overall population and if the survey responses are expected to differ across the
subgroups. For example, responses to a survey might differ by nationality so if
we were to miss some of the nationalities in our sample, our results might be
biased.

Figure 1: An illustration of clustering (all units within a sample of groups are
taken) versus stratified sampling (a sample of units within all groups is taken).

Stratified sampling is appropriate when elements in different strata are
relatively dissimilar (“heterogeneous”), whereas cluster sampling is most
efficient when the majority of the variation in the population is within clusters.

Returning to the example of surveying school children in Somerset, we might
want to estimate the proportion of pupils with different characteristics stratified
(i.e., estimated separately) by school type (e.g., academy, faith school, voluntary
aided school, etc.). In this case, we could use stratified sampling to ensure that
pupils from different school types are adequately represented in our sample.
Schools would be split into strata (e.g., by school type: academy, faith school,
voluntary aided school, etc.), and then samples of pupils would be taken within
each stratum, to ensure that pupils from each school type were adequately
represented in the sample. Contrast this with cluster sampling where we would
cluster pupils and then take a sample of the clusters (surveying all students
within those selected clusters).

Each stratum can be sampled in proportion to the relative size of that sub-group
in the total population (“proportionate allocation”) to make the overall sample
as representative as possible. Or, larger samples can be obtained in strata with
greater variability to minimise the sampling variance (“optimum allocation”),
improving the efficiency of the sample overall.
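
The two allocation rules can be written out explicitly. Proportionate allocation sets n_h = n * N_h / N, and one standard form of optimum allocation (Neyman allocation, which assumes equal cost per interview in every stratum) sets n_h proportional to N_h * S_h, where S_h is the standard deviation within stratum h. The stratum sizes and standard deviations below are made up for illustration, and the function names are assumptions.

```python
def proportionate_allocation(n, stratum_sizes):
    """n_h = n * N_h / N: sample share equals population share."""
    total = sum(stratum_sizes.values())
    return {h: n * size / total for h, size in stratum_sizes.items()}


def optimum_allocation(n, stratum_sizes, stratum_sds):
    """Neyman allocation: n_h proportional to N_h * S_h."""
    products = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(products.values())
    return {h: n * value / total for h, value in products.items()}


# Hypothetical pupil counts and within-stratum standard deviations.
sizes = {"academy": 6000, "faith": 3000, "voluntary_aided": 1000}
sds = {"academy": 10.0, "faith": 20.0, "voluntary_aided": 30.0}

proportionate = proportionate_allocation(500, sizes)  # 300, 150, 50
optimum = optimum_allocation(500, sizes, sds)         # 200, 200, 100
```

The more variable strata (faith and voluntary aided schools in this made-up example) receive a larger share under optimum allocation than their population share alone would give them. In practice the results are rounded to whole numbers of pupils.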

It is also possible to combine stratified sampling with clustered sampling. For
example, we might stratify schools by type and then take cluster samples of
schools within each stratum. This is an example of a one-stage cluster sampling
scheme but further stages of sampling could be also included. In two-stage
cluster sampling, for example, after taking the sample of clusters, a sample of
elements within each selected cluster is then taken. So, we might only interview
a sample of the pupils in each selected school.

Post-stratification

After completing your survey, you might find that the sample you have taken is
not representative of the population (for example, 40% of the population might
be male, whereas in the sample obtained only 20% might be males and so males
are “under-sampled”). Such differences can be due to non-response or
incomplete coverage, which are an inevitable consequence of the fact that we can
neither sample everyone in the population nor compel those sampled to respond.
If the sample is imbalanced with respect to key factors that are likely to affect
the survey responses, the results can be biased. In this case
post-stratification can be applied: sampling weights are calculated to adjust
the sample data after it has been collected, so that the weighted results are
more representative of the population. For more
information on survey weighting and post-stratification, see our case study on
the work we did recently with Sport Wales for their School Sport Survey.
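
As a rough sketch of how post-stratification weights work, the example below uses the 40% versus 20% figures from the text. The remaining numbers (a 60% female population share, a sample of 1,000, and the response values) are assumptions made purely for illustration. Each group's weight is its population share divided by its sample share.

```python
def post_stratification_weights(population_share, sample_counts):
    """Weight for each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        group: population_share[group] / (count / n)
        for group, count in sample_counts.items()
    }


def weighted_mean(values, groups, weights):
    """Mean of values where each respondent carries their group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight


# Population is 40% male, but only 20% of the sample is male (from the text);
# the female share and the sample size are assumed.
population_share = {"male": 0.40, "female": 0.60}
sample_counts = {"male": 200, "female": 800}

weights = post_stratification_weights(population_share, sample_counts)
print(weights)  # males are weighted up, females are weighted down

# Hypothetical responses: every male answers 10, every female answers 20.
values = [10.0] * 200 + [20.0] * 800
groups = ["male"] * 200 + ["female"] * 800
print(sum(values) / len(values))               # unweighted mean
print(weighted_mean(values, groups, weights))  # post-stratified mean
```

With these made-up responses the unweighted mean is pulled towards the over-represented group, while the weighted mean reflects the assumed population shares.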

Sampling without replacement

Suppose you were taking a sample of animals from the wild, in order to estimate
their average weights, for example. Once one animal had been caught and
measured, it would then be released back into the wild. It’s possible, in this
case, that you might catch and measure the same animal more than once – we
call this “sampling with replacement”. With replacement means that once an
individual is selected to be in the sample, that individual is placed back in the
population to potentially be sampled again. There are two ways to select a
sample from the population – with replacement, as in this example, or without
replacement. Without replacement means that once an individual is sampled,
that individual cannot be sampled again; they are not placed back in the
population. This will often occur when a sample is preselected from a sampling
frame, i.e., a list of all those in the population who can be sampled.
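
The difference between the two selection schemes can be seen in a short sketch using Python's standard `random` module. The sampling frame of 20 numbered individuals is hypothetical.

```python
import random

rng = random.Random(42)

# Hypothetical sampling frame: 20 individuals labelled 1 to 20.
population = list(range(1, 21))

# With replacement: the same individual can be drawn more than once.
with_replacement = rng.choices(population, k=10)

# Without replacement: every sampled individual is distinct.
without_replacement = rng.sample(population, k=10)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
print("distinct individuals (with):   ", len(set(with_replacement)))
print("distinct individuals (without):", len(set(without_replacement)))
```

The sample drawn without replacement always contains 10 distinct individuals, whereas the one drawn with replacement may contain repeats.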

Many standard analysis techniques assume that the sample being analysed was
taken with replacement or from an infinite population (when the population is
infinite, or extremely large, there’s little difference between sampling with
and without replacement). However, in practice, most simple random samples are
taken without replacement from a finite population. In this case, the sampling
variability of our estimates is less than the standard formulas assume, and
therefore we can apply a finite population correction to account for this
greater efficiency in the sampling process. Each sampled individual is
always unique and therefore provides ‘new’ information when sampling without
replacement, whereas it’s possible when sampling with replacement to have
‘repeated’ information. When sampling without replacement from a finite
population, it may be possible to sample all individuals in which case we’ll have
no uncertainty in our estimates. The correction only has a noticeable effect
when the sampling fraction, i.e., the proportion of the population sampled, is
large. A good rule of thumb is to apply a finite population correction when
your sample makes up more than 5% of the population. A finite population correction
factor (FPC) is calculated, which is then multiplied by the standard error of the
estimate. We’ve recently released a series of sample size and confidence
interval calculators, some of which include a finite population correction – for
more details (including the formula for the FPC) see the calculators on the
Resources section of our website.
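
For illustration, the sketch below uses the commonly quoted textbook form of the correction, FPC = sqrt((N - n) / (N - 1)), where N is the population size and n is the sample size. The notation in the calculators mentioned above may differ, and the population sizes and standard error used here are hypothetical.

```python
import math


def finite_population_correction(population_size, sample_size):
    """Textbook FPC: sqrt((N - n) / (N - 1))."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    if population_size == 1:
        return 0.0  # the single unit has been observed, so no uncertainty remains
    return math.sqrt((population_size - sample_size) / (population_size - 1))


def corrected_standard_error(standard_error, population_size, sample_size):
    """Multiply the usual standard error by the FPC."""
    return standard_error * finite_population_correction(population_size, sample_size)


# Hypothetical standard error of 2.0 computed under the usual assumptions.
for N, n in [(10_000, 100), (1_000, 500), (1_000, 1_000)]:
    fpc = finite_population_correction(N, n)
    se = corrected_standard_error(2.0, N, n)
    print(f"N={N}, n={n}, sampling fraction={n / N:.0%}, FPC={fpc:.3f}, SE={se:.3f}")
```

With a 1% sampling fraction the factor is very close to 1 and has almost no effect. At 50% it shrinks the standard error noticeably, and when the whole population is sampled it is exactly zero, matching the point that a complete enumeration leaves no sampling uncertainty.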

How to analyse data from a complex sample

The most important thing to understand about complex sampling is that a more
sophisticated analysis is needed when analysing the data collected – standard
approaches are not necessarily appropriate. We must take account of the sample
design in order for our conclusions to be reliable, whether we are estimating a
characteristic of the population or testing for effects, for example.

The usual standard errors, which assume a simple random sample with
replacement, will be incorrect if a complex sample has been taken. For example,
if a sample collected using cluster sampling is analysed as though it were a
simple random sample, the sampling variance of the estimates is typically
underestimated. When the standard errors are correctly estimated to account for
the cluster design, they are generally larger than those that would have been
obtained assuming a simple random sample of the same size. This is because we
might expect responses within a cluster to be more similar to each other than
those of randomly selected individuals across the population. Without correcting
for these underestimates, we increase the risk of falsely detecting significant
effects when they do not actually exist (“false positives”).
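
One common way to quantify this inflation, not described in the text above, is Kish's approximate design effect for cluster samples, deff = 1 + (m - 1) * rho, where m is the (assumed equal) number of respondents per cluster and rho is the intra-cluster correlation. The sketch below is a rough illustration under those assumptions, with hypothetical values throughout. It is not a substitute for a full design-based analysis.

```python
import math


def design_effect(cluster_size, intra_cluster_correlation):
    """Kish's approximation: deff = 1 + (m - 1) * rho, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * intra_cluster_correlation


def adjusted_standard_error(srs_standard_error, deff):
    """Inflate a simple-random-sample standard error by sqrt(deff)."""
    return srs_standard_error * math.sqrt(deff)


# Hypothetical survey: 50 schools, 20 pupils interviewed in each, rho = 0.05.
n_total = 50 * 20
deff = design_effect(cluster_size=20, intra_cluster_correlation=0.05)
effective_n = n_total / deff

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {effective_n:.0f} of {n_total}")
print(f"SE of 1.0 under SRS becomes {adjusted_standard_error(1.0, deff):.2f}")
```

Even a modest intra-cluster correlation roughly halves the effective sample size in this made-up example, which is why ignoring the clustering tends to overstate statistical significance.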

In the statistical software package SPSS, complex samples analysis plans can be
generated which, when used alongside the corresponding Analyze>Complex
Samples menu, ensure that the sample design is incorporated into the analysis.
In R, the survey package similarly allows you to specify a complex survey
design and carry out appropriate analyses taking the design into account. Other
packages in R, such as the anesrake package, are also useful for implementing
survey weighting, for example.

Complex samples are a useful tool for creating more efficient (e.g., stratified
sampling with optimum allocation) or cheaper (e.g., cluster sampling) sampling
designs. However, it’s crucial when using a complex sample to account for the
sampling design when analysing your data in order to ensure that the results are
accurate and reliable. If you’re conducting a survey using complex sampling
and need help with the survey design or analysis, contact us to find out how we
can help.
