Module 3 RM
Census Method
Census method is the method of statistical enumeration where all members of the
population are studied. A population refers to the set of all observations under
concern. For example, if you want to carry out a survey to find out students’
feedback about the facilities of your school, all the students of your school would
form a part of the ‘population’ for your study.
At a more realistic level, a country wants to maintain information and records about
all households. It can collect this information by surveying all households in the
country using the census method.
In our country, the Government conducts the Census of India every ten years. The
Census collects information from households regarding their incomes, the
earning members, the total number of children, members of the family, etc. This
method must take into account all the units. It cannot leave out anyone in
collecting data. Once collected, the Census of India reveals demographic
information such as birth rates, death rates, total population, population growth rate
of our country, etc. The last census was conducted in the year 2011.
Sampling Method
As we have seen, the population contains units with some similar characteristics
on the basis of which they are grouped together for the study. In the case of
the Census of India, for example, the common characteristic was that all units are
Indian nationals. But it is not always practical to collect information from all the units
of the population.
It is a time-consuming and costly method. Thus, an easy way out would be to collect
information from some representative group from the population and then make
observations accordingly. This representative group which contains some units from
the whole population is called the sample.
The first and most important step in selecting a sample is to determine the population.
Once the population is identified, a sample must be selected. A good sample is one
which is:
Small in size.
Again, realistically, the government wants estimates on the average income of the
Indian household. It is difficult and time-consuming to study all households. The
government can simply choose, say, 50 households from each state of the country
and calculate the average of that to arrive at an estimate. This estimate is not
necessarily the actual figure that would be arrived at if all units of the population
underwent study. But, it approximately gives an idea of what the figure might look
like.
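The idea of estimating an average from a few households per state can be sketched in Python. This is only an illustration: the incomes are synthetic, and the names `population`, `census_mean` and `sample_mean` are hypothetical, not part of any official procedure.

```python
import random
import statistics

# Synthetic, made-up household incomes for 5 hypothetical states.
# These are NOT real census figures; they only illustrate the idea.
rng = random.Random(42)
population = {
    f"state_{s}": [rng.lognormvariate(10, 0.5) for _ in range(2000)]
    for s in range(5)
}

def census_mean(pop):
    """Census method: average over every household in every state."""
    all_incomes = [x for incomes in pop.values() for x in incomes]
    return statistics.fmean(all_incomes)

def sample_mean(pop, per_state, rng):
    """Sampling method: average over `per_state` random households per state."""
    chosen = []
    for incomes in pop.values():
        chosen.extend(rng.sample(incomes, per_state))
    return statistics.fmean(chosen)

true_value = census_mean(population)
estimate = sample_mean(population, per_state=50, rng=rng)
print(f"census mean: {true_value:.0f}, sample estimate: {estimate:.0f}")
```

The estimate will generally differ from the census figure, but it should be close while using only 250 of the 10,000 synthetic households.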
Basis | Census method | Sampling method
Definition | A statistical method that studies all the units or members of a population. | A statistical method that studies only a representative group of the population, and not all its members.
Time involved | It is a time-consuming process. | It is a quicker process.
Cost involved | It is a costly method. | It is a relatively inexpensive method.
Merits –
The result of the census method may be checked with the help of the sampling
method.
In cases where the population size is too large, the sampling method is easy and
more practical.
Demerits –
Sampling normally generates an error because some units of the population are
left out. If a crucial unit is left out of the sample, the resulting error will be
large.
If skilled personnel are not available to interpret the data, the results drawn will
be unreliable.
Sampling
1. The process of selecting the representative sample units from the population
to study the characteristics of the population is called sampling.
2. In many empirical studies, data are to be collected from a population under
study.
3. A population consists of a number of units, usually very large and sometimes
infinitely large.
4. In many cases, it is not practically possible to include all units of the
population for the investigation.
5. Therefore a few units of the population have to be selected as a
representative of the whole population.
6. So sampling is needed in this situation to draw the representative sample of
the population.
1 Type of Universe:
The first step involved in developing a sample design is to clearly define the set
of cases to be studied, technically known as the universe. A universe may be finite
or infinite. In a finite universe the number of items is certain, whereas in an
infinite universe the number of items is infinite (i.e., the total number of items
cannot be known). For example, while the population of a city or the number of
workers in a factory are finite universes, the number of stars in the sky or the
throws of a die represent infinite universes.
2 Sampling Units:
Prior to selecting a sample, a decision has to be made about the sampling unit. A
sampling unit may be a geographical area like a state, district, village, etc., or a
social unit like a family, religious community, school, etc., or it may also be an
individual. At times, the researcher would have to choose one or more of such
units for his/her study.
3 Source List:
The source list, also known as the ‘sampling frame’, is the list from which the
sample is to be selected. It consists of the names of all the items of a universe. The
researcher has to prepare a source list when it is not available. The source list must
be reliable, comprehensive, correct, and appropriate. It is important that the source
list should be as representative of the population as possible.
4 Size of Sample:
Size of the sample refers to the number of items to be chosen from the universe to
form a sample. For a researcher, this constitutes a major problem. The size of
sample must be optimum. An optimum sample may be defined as the one that
satisfies the requirements of representativeness, flexibility, efficiency, and
reliability. While deciding the size of sample, a researcher should determine the
desired precision and the acceptable confidence level for the estimate. The size of
the population variance should be considered, because in the case of a larger
variance generally a larger sample is required. The size of the population should be
considered, as it also limits the sample size. The parameters of interest in a
research study should also be considered, while deciding the sample size. Besides,
cost or budgetary constraints also play a crucial role in deciding the sample size.
The population may also consist of important sub-groups about which the researcher
would like to make estimates. All such factors have a strong impact on the sample
design the researcher selects.
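The way precision, confidence level, variance and population size jointly drive the sample size can be sketched as follows. This assumes the goal is to estimate a mean and uses the standard textbook formula n0 = (z × sigma / margin)², with the usual adjustment for a finite population. The function name and parameters are illustrative, not taken from the text above.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95, population_size=None):
    """Approximate sample size needed to estimate a mean.

    sigma           -- assumed population standard deviation
    margin          -- desired precision (half-width of the confidence interval)
    confidence      -- desired confidence level, e.g. 0.95
    population_size -- if given, apply the finite population adjustment
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = (z * sigma / margin) ** 2
    if population_size is not None:
        # A finite population limits how large the sample needs to be.
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

print(sample_size_for_mean(sigma=10, margin=2))
print(sample_size_for_mean(sigma=20, margin=2))  # larger variance, larger sample
print(sample_size_for_mean(sigma=20, margin=2, population_size=1000))
```

Doubling the standard deviation roughly quadruples the required sample, while a small population reduces it.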
Finally, the researcher should decide the type of sample or the technique to be
adopted for selecting the items for a sample. This technique or procedure itself
may represent the sample design. There are different sample designs from which a
researcher should select one for his/her study. It is clear that the researcher should
select that design which, for a given sample size and budget constraint, involves a
smaller error.
1 Research objectives
Firstly, a refined research question and goal would help us define our population of
interest. If our calculated sample size is small then it would be easier to get a
random sample. If, however, the sample size is large, then we should check if our
budget and resources can handle a random sampling method.
3 Study design
Moreover, we could consider the prevalence of the topic (exposure or outcome) in
the population, and what would be the suitable study design. In addition, we should
check whether our target population varies widely in its baseline characteristics.
For example, a population with large ethnic subgroups could best be studied using
a stratified sampling method.
4 Random sampling
Finally, the best sampling method is always the one that could best answer our
research question while also allowing for others to make use of our results
(generalisability of results). When we cannot afford a random sampling method,
we can always choose from the non-random sampling methods.
For example, in a population of 1000 members, every member will have a 1/1000
chance of being selected on any single draw. Probability sampling reduces
selection bias and gives all members a fair chance to be included in the
sample.
Simple random sampling: Simple random sampling is one of the most basic
probability sampling techniques and helps save time and resources. It is
a reliable method of obtaining information in which every member of the
sample is chosen purely by chance. Each individual has the same
probability of being chosen to be a part of a sample.
For example, in an organization of 500 employees, if the HR team decides on
conducting team building activities, it is highly likely that they would prefer
picking chits out of a bowl. In this case, each of the 500 employees has an equal
opportunity of being selected.
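Picking chits out of a bowl can be mimicked with Python's standard `random.sample`, which draws without replacement and gives every unit the same chance. The employee labels, the sample size of 20 and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units, each with equal probability of selection."""
    rng = random.Random(seed)
    return rng.sample(units, n)

# Hypothetical list of 500 employees.
employees = [f"employee_{i}" for i in range(1, 501)]
team = simple_random_sample(employees, n=20, seed=1)
print(team[:5])
```

Passing a seed only makes the draw reproducible; leaving it out gives a fresh random draw each time.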
Cluster sampling: Cluster sampling is a method where the researchers divide
the entire population into sections or clusters that represent a population,
and then select a random sample of whole clusters to study.
Clusters are identified and included in a sample based on demographic
parameters like age, sex, location, etc. This makes it very simple for a survey
creator to derive effective inference from the feedback.
For example, if the United States government wishes to evaluate the number of
immigrants living in the United States, it can divide the country into clusters based on
states such as California, Texas, Florida, Massachusetts, Colorado, Hawaii, etc.
This way of conducting a survey will be more effective as the results will be
organized into states and provide insightful immigration data.
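A minimal sketch of one-stage cluster sampling follows, assuming the clusters are states and that every unit inside a selected state is surveyed. The household labels and the helper name `cluster_sample` are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample

# Hypothetical households grouped by state (100 per state).
states = ["California", "Texas", "Florida", "Massachusetts", "Colorado", "Hawaii"]
clusters = {
    state: [f"{state}_household_{i}" for i in range(100)] for state in states
}

chosen, sample = cluster_sample(clusters, n_clusters=2, seed=5)
print(chosen, len(sample))
```

Only the selected states need to be visited, which is where the cost saving comes from.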
Systematic sampling: Researchers use the systematic sampling method to
choose the sample members of a population at regular intervals. It requires the
selection of a starting point and a fixed sampling interval, which is determined
by the desired sample size. Because the interval is predefined, this sampling
technique is among the least time-consuming.
For example, a researcher intends to collect a systematic sample of 500 people in
a population of 5000. He/she numbers each element of the population from 1 to
5000 and, starting from a randomly chosen point among the first 10, chooses every
10th individual to be a part of the sample (total population / sample size =
5000/500 = 10).
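The 5000/500 example can be sketched as below. The random starting point within the first interval is a common convention and an assumption here, and `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select every k-th unit after a random start, where k = N // n."""
    if n <= 0 or n > len(units):
        raise ValueError("n must be between 1 and the population size")
    k = len(units) // n                      # sampling interval
    start = random.Random(seed).randrange(k)  # random start in the first interval
    return units[start::k][:n]

population = list(range(1, 5001))  # elements numbered 1 to 5000
sample = systematic_sample(population, n=500, seed=0)
print(sample[:5])
```

With these numbers the interval is 10, so consecutive sampled elements are always 10 apart.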
Stratified random sampling: Stratified random sampling is a method in which
the researcher divides the population into smaller groups that don’t overlap but
together represent the entire population. A sample is then drawn from each
group separately.
For example, a researcher looking to analyze the characteristics of people
belonging to different annual income divisions will create strata (groups)
according to the annual family income, e.g., less than $20,000, $20,000 to
$29,999, $30,000 to $39,999, $40,000 to $49,999, etc. By doing this, the
researcher can draw conclusions about the characteristics of people belonging to
different income groups. Marketers can analyze which income groups to target and
which ones to eliminate to create a roadmap that would bear fruitful results.
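A minimal sketch of stratified random sampling follows, assuming each stratum is sampled in proportion to its size. The strata names, their sizes and the helper `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample within each stratum, sized proportionally.

    Rounding can make the overall total differ slightly from total_n.
    """
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(total_n * len(units) / population_size)
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical households grouped into non-overlapping income strata.
strata = {
    "under_20k": [f"low_{i}" for i in range(500)],
    "20k_to_30k": [f"mid_{i}" for i in range(300)],
    "30k_and_over": [f"high_{i}" for i in range(200)],
}
sample = stratified_sample(strata, total_n=100, seed=3)
print({name: len(units) for name, units in sample.items()})
```

Every stratum is guaranteed to appear in the sample, which simple random sampling cannot promise for small groups.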
So far we have talked about random sampling, keeping in view only the finite
populations. But what about random sampling in the context of infinite populations? It
is relatively difficult to explain the concept of random sample from an infinite
population. However, a few examples will show the basic characteristic of such a
sample.
The most defining feature of a complex sample is that sample members do not
have equal probability of being selected.
There are many ways of meeting this defining feature while also being capable of
representing the population and having advantages for other research goals.
If you’ve got one or more of these, you can consider your sample complex:
Stratification
Clustering
Oversampling
Sampling without replacement
Finite populations
Multistage sampling
Our next article will define each of these terms, but there are two things that are
important to understand first.
Second, despite this, there are very good reasons to use complex samples.
In other words, a good complex sampling design will simultaneously cost much,
much less to administer and keep standard errors smaller than they would be in a
simple random sample.
When your research budget is tight (and whose isn’t?), this is hugely important.
For example, it’s very difficult to sample schoolchildren without first sampling
schools or patients without sampling hospitals. Including these multiple stages of
sampling means not every student or patient has an equal probability of being in
the sample (making the sample complex).
Let’s say you want to study color blind people over age 75 from rural areas. This
group makes up a tiny, tiny proportion of the overall adult population. To have
sufficient numbers in the group to make any statistical comparisons or have
reasonably sized confidence intervals, you’d have to sample millions of people
using simple random sampling. Oversampling this group allows you to have
enough group members to study them, but it makes the overall sample complex.
With these approaches, members of the population don’t all have the same
probability of being selected into the sample. Complex samples are most often
used for surveys, especially large national or multinational ones where simple
random sampling is simply not practical. For example, suppose you were
conducting a survey in a conflict-affected country and the target population was
all adults aged over 18, totaling, say, 20 million individuals. You might be
interested in how responses differ by occupation, but some categories (perhaps
self-employed) may only represent a small fraction of the population. You
might therefore consider stratifying your sampling to ensure that sufficient
responses were obtained to make reliable estimates in each occupation group.
The country may also be split geographically into, say, 40 states. To travel to
and interview people in each of these states would likely be unfeasible, so
cluster sampling might be used so that only a subset of the states needed to be
accessed.
Remember – complex samples require statistical methods that take the sampling
design into account.
So, what are the most common complex sampling approaches and why and
when are they used? We focus here on cluster sampling and stratified sampling.
We’ll also discuss sampling without replacement which should also be taken
into account when analysing your data.
Clustering
In cluster sampling, the population is split into similar groups of individuals
(“clusters”) and then a sample of these clusters is taken (the clusters are the
sampling units in this case), so that all of the elements in the selected clusters
are included in the sample. Clustering is appropriate when we expect the
clusters to be relatively similar to one another (“homogeneous”), i.e., each
cluster is representative of the population.
For example, to sample school children in a county, it may be more practical to
survey the pupils in a subset of schools. We might therefore cluster pupils according
to their schools and then take a sample of the clusters (surveying all students
within those selected clusters) to obtain a clustered sample of school children in
the county.
This method is most efficient when most of the variation in the population is
within clusters, rather than between them (higher within-cluster correlation
increases the variance compared to simple random sampling, SRS). Cluster
sampling is generally used to reduce costs, by reducing the number of clusters
that need to be visited whilst maintaining the sample efficiency.
Stratification
Stratified sampling
Stratified sampling involves splitting the members of the population into
subgroups (“strata”) before sampling, and then applying sampling (usually SRS)
separately within each and every group (“stratum”). This is in contrast to cluster
sampling where whole clusters are sampled, rather than samples of individuals
being taken within each group (i.e., stratum), as illustrated in Figure 1.
Stratified sampling can help to ensure that the sample collected is representative
of the population, by guaranteeing that sufficient individuals from each
subgroup (e.g., gender, or socioeconomic status) will be sampled. This is
especially important if some strata only represent a small proportion of the
overall population and if the survey responses are expected to differ across the
subgroups. For example, responses to a survey might differ by nationality so if
we were to miss some of the nationalities in our sample, our results might be
biased.
In the schools example, pupils could instead be stratified by school type (e.g.,
voluntary aided school, etc.), and then samples of pupils would be taken within
each stratum, to ensure that pupils from each school type were adequately
represented in the sample. Contrast this with cluster sampling where we would
cluster pupils and then take a sample of the clusters (surveying all students
within those selected clusters).
Each stratum can be sampled in proportion to the relative size of that sub-group
in the total population (“proportionate allocation”) to make the overall sample
as representative as possible. Or, larger samples can be obtained in strata with
greater variability to minimise the sampling variance (“optimum allocation”),
improving the efficiency of the sample overall.
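The two allocation rules can be compared with a short sketch. For optimum allocation it uses the common Neyman form, in which a stratum's share is proportional to its size times its standard deviation. That form assumes equal sampling cost in every stratum and is an assumption here, as are the stratum names, sizes and standard deviations.

```python
def proportionate_allocation(sizes, n):
    """Allocate n in proportion to each stratum's population size."""
    total = sum(sizes.values())
    return {h: n * size / total for h, size in sizes.items()}

def neyman_allocation(sizes, std_devs, n):
    """Allocate n in proportion to size times standard deviation per stratum."""
    weights = {h: sizes[h] * std_devs[h] for h in sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

# Hypothetical strata: B is smaller but much more variable than A.
sizes = {"A": 600, "B": 400}
std_devs = {"A": 10.0, "B": 30.0}

print(proportionate_allocation(sizes, n=100))
print(neyman_allocation(sizes, std_devs, n=100))
```

The results are left unrounded; in practice they would be rounded to whole units. Under the Neyman rule the more variable stratum B receives the larger share even though it is the smaller group.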
Post-stratification
After completing your survey, you might find that the sample you have taken is
not representative of the population (for example, 40% of the population might
be male, whereas in the sample obtained only 20% might be males and so males
are “under-sampled”). In this case post-stratification can be applied. Such
differences can be due to non-response or incomplete coverage, which are an
inevitable consequence of the fact that we cannot sample everyone in the
population nor compel them to respond. If the sample is imbalanced with
respect to key factors that are likely to affect the study/survey responses then
they can lead to biases in the results. Sampling weights can be calculated to
post-stratify the sample (to adjust the sample data after it has been collected) to
ensure that the results are representative of the population.
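Using the 40% versus 20% example above, a simple post-stratification weight for each group is its population share divided by its sample share. The sketch below is a minimal illustration; the response values and helper names are made up.

```python
def post_stratification_weights(population_share, sample_share):
    """Weight for each group = population share / sample share."""
    return {g: population_share[g] / sample_share[g] for g in population_share}

def weighted_mean(values, groups, weights):
    """Mean of values where each respondent carries their group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight

weights = post_stratification_weights(
    population_share={"male": 0.40, "female": 0.60},
    sample_share={"male": 0.20, "female": 0.80},
)

# Hypothetical sample of 10 respondents: 2 males and 8 females.
groups = ["male"] * 2 + ["female"] * 8
is_male = [1 if g == "male" else 0 for g in groups]
print(weights)
print(weighted_mean(is_male, groups, weights))
```

Under-sampled males get a weight of about 2 and over-sampled females about 0.75, so the weighted proportion of males returns to the population value of 40%.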
Suppose you were taking a sample of animals from the wild, in order to estimate
their average weights, for example. Once one animal had been caught and
measured, it would then be released back into the wild. It’s possible, in this
case, that you might catch and measure the same animal more than once – we
call this “sampling with replacement”. With replacement means that once an
individual is selected to be in the sample, that individual is placed back in the
population to potentially be sampled again. There are two ways to select a
sample from the population – with replacement, as in this example, or without
replacement. Without replacement means that once an individual is sampled,
that individual cannot be sampled again; they are not placed back in the
population. This will often occur when a sample is preselected from a sampling
frame, i.e., a list of all those in the population who can be sampled.
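The difference between the two approaches can be seen with Python's standard library: `random.choices` draws with replacement and `random.sample` draws without replacement. The animal labels are hypothetical.

```python
import random

rng = random.Random(7)
animals = [f"animal_{i}" for i in range(1, 11)]  # hypothetical population of 10

# With replacement: the same animal can be caught more than once.
with_replacement = rng.choices(animals, k=10)

# Without replacement: every sampled animal is distinct.
without_replacement = rng.sample(animals, k=10)

print(len(set(with_replacement)), "distinct animals with replacement")
print(len(set(without_replacement)), "distinct animals without replacement")
```

Sampling all 10 animals without replacement covers the whole population, whereas 10 draws with replacement will usually repeat some animals and miss others.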
Many standard analysis techniques assume that the sample being analysed was
taken with replacement or from an infinite population
(when the population is infinite, or extremely large, then there’s little difference
between sampling with and without replacement). However, in practice, most
simple random samples are actually taken without replacement from a finite
population. In this case, the variability of our sample is actually less than
expected, and therefore we can apply a finite population correction to account
for this greater efficiency in the sampling process. Each sampled individual is
always unique and therefore provides ‘new’ information when sampling without
replacement, whereas it’s possible when sampling with replacement to have
‘repeated’ information. When sampling without replacement from a finite
population, it may be possible to sample all individuals in which case we’ll have
no uncertainty in our estimates. The correction only has a noticeable effect
when the sampling fraction, i.e., the proportion of the population sampled, is
large. As a rule of thumb, apply a finite population correction when the sample
makes up more than 5% of the population. A finite population correction
factor (FPC) is calculated, which is then multiplied by the standard error of the
estimate.
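The text does not give the formula for the FPC. A commonly used textbook form is FPC = sqrt((N - n) / (N - 1)), where N is the population size and n the sample size. The sketch below assumes that form, and the function names are illustrative.

```python
import math

def finite_population_correction(population_size, sample_size):
    """Textbook FPC: sqrt((N - n) / (N - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

def corrected_standard_error(std_error, population_size, sample_size):
    """Multiply the usual standard error by the FPC."""
    return std_error * finite_population_correction(population_size, sample_size)

for n in (10, 50, 500, 1000):
    print(n, round(finite_population_correction(1000, n), 3))
```

The factor stays close to 1 for small sampling fractions and falls to 0 when the whole population is sampled, matching the point that a complete sample leaves no uncertainty.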
The most important thing to understand about complex sampling is that a more
sophisticated analysis is needed when analysing the data collected – standard
approaches are not necessarily appropriate. We must take account of the sample
design in order for our conclusions to be reliable, whether we are estimating a
characteristic of the population or testing for effects, for example.
The usual standard errors, assuming a simple random sample with replacement,
will be incorrect if a complex sample has been taken. For example, applying the
usual formulas to a sample collected using cluster sampling tends to
underestimate the true sampling variance. When the standard errors are adjusted
to account for the complex sampling plan, they are larger than those that would
have been obtained assuming a simple random sample of the same size. This is
because we might expect responses within a cluster to be more similar to each
other than for randomly selected individuals across the population. Without
correcting for these under-estimates, we increase the risk of falsely determining
significant effects when they do not actually exist (“false positives”).
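One common way to quantify this inflation is the design effect, which for clusters of equal size m and intracluster correlation rho is approximately 1 + (m - 1) × rho. The sketch below assumes that approximation, which is not stated in the text above, and the function names are hypothetical.

```python
import math

def design_effect(cluster_size, icc):
    """Approximate variance inflation for equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc

def cluster_adjusted_se(srs_se, cluster_size, icc):
    """Inflate a simple-random-sample standard error by sqrt(design effect)."""
    return srs_se * math.sqrt(design_effect(cluster_size, icc))

# Hypothetical survey: 20 pupils per school, intracluster correlation 0.05.
print(design_effect(20, 0.05))
print(cluster_adjusted_se(1.0, 20, 0.05))
```

Even a small correlation within clusters roughly doubles the variance here, which is why ignoring the design leads to standard errors that are too small.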
In the statistical software package SPSS, complex samples analysis plans can be
generated which, when used alongside the corresponding Analyze>Complex
Samples menu, ensure that the sample design is incorporated into the analysis.
In R, the survey package similarly allows you to specify a complex survey
design and carry out appropriate analyses taking the design into account. Other
packages in R, such as the anesrake package, are also useful for implementing
survey weighting, for example.
Complex samples are a useful tool for creating more efficient (e.g., stratified
sampling with optimum allocation) or cheaper (e.g., cluster sampling) sampling
designs. However, it’s crucial when using a complex sample to account for the
sampling design when analysing your data in order to ensure that the results are
accurate and reliable.