Which Clusters of Statistical Techniques Should I Use
by Simon Moss
Introduction
This document helps many individuals, from research students to experienced researchers,
decide which cluster of techniques they should apply to analyze their quantitative data. The
instructions are, perhaps, most useful after researchers have developed a preliminary research
question—but have not finalized the research design and methods. Other documents or articles can
then help researchers decide precisely which statistical techniques to apply.
Many studies comprise more than one research question or hypothesis. For example,
consider this abstract. How many research questions did this study test?
Typically, each data point corresponds to one person, organization, nation, plant, field,
animal, specimen, or object. In these examples, the person, organization, nation, plant, field,
animal, specimen, or object is called the unit of analysis.
To illustrate, consider a researcher who wants to assess whether people who eat carrots are
happier than people who do not eat carrots. In this example, the unit of analysis is the individual.
In contrast, consider a researcher who wants to assess whether nations that drive on the left
side are more unequal than nations that drive on the right side. In this instance, the nation is the
unit of analysis. Each data point, such as the side on which people drive and the level of
inequality, corresponds to one nation.
So, before you proceed, in your mind, clarify the unit of analysis for your study. Is the unit of
analysis a person, group, specimen, and so forth?
In most studies, researchers need to randomly select a sample from a broader population.
For example, researchers might need to select a random sample of 100 participants from the
population of adults in Australia. Or the researchers might select a random sample of 50 specimens
from a random sample of animal droppings, and so forth. In this instance, the participants,
specimens, and so forth are referred to as a random variable.
In some instances, the research comprises two or more random variables. For example, a
researcher might choose a random sample of 20 employees from a random sample of 10 companies.
In this instance, both the employees and companies are random variables.
Similarly, a researcher might select a random sample of 10 specimens from a random sample
of 5 fields. In this instance, both the specimens and fields are random variables.
So, before you proceed, in your mind, clarify the number of random variables in your study.
Is the number of random variables one or more than one?
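As a rough illustration of a study with two random variables, the following sketch draws a random sample of 10 companies and then a random sample of 20 employees within each sampled company. The population sizes, the names, and the seed are hypothetical, chosen only so the example runs.

```python
import random

# Hypothetical population: 50 companies, each with 100 employees.
companies = [f"company_{i}" for i in range(1, 51)]
employees = {
    company: [f"{company}_employee_{j}" for j in range(1, 101)]
    for company in companies
}

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Stage 1: companies are randomly selected (first random variable).
sampled_companies = rng.sample(companies, 10)

# Stage 2: employees are randomly selected within each sampled company
# (second random variable).
sampled_employees = {
    company: rng.sample(employees[company], 20)
    for company in sampled_companies
}

total = sum(len(people) for people in sampled_employees.values())
print(f"{len(sampled_companies)} companies, {total} employees in total")
```

If only the employees had been sampled at random, from a fixed list of companies chosen deliberately, the study would include one random variable rather than two.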
Phase 4: Determine which cluster of techniques may be relevant
The next series of questions helps you determine which cluster you should consider. The
first set of questions is designed to clarify whether your data include complications that
can be solved only with a specific cluster of techniques.
Do the data include information about the location of specific objects or people, such as
coordinates on a map or the postcode?
If so, are objects or people that are close in space likely to be more similar in some way than
objects or people that are farther away in space (called spatial autocorrelation)?
If so, is the research in a field in which this spatial autocorrelation tends to be considered
important, such as geography and environmental science?
If the answer to all three questions is yes, consider geostatistics, geographic information
systems and mapping*
Are you collecting data from each unit, such as each person, animal, or field, at more than
one time?
If so, does the number of times exceed about 50 or so?
If the answer to these two questions is yes, consider time series analyses, such as ARIMA or
spectral analyses*
Note that nonparametric statistics are not necessary if you can convert this rank to a more
precise number, such as the time at which they completed the race. You might need to conduct
nonparametric tests for other reasons as well.
Do your data include information about the relationship between various units, such as
people, animals, or objects? For instance, you might have collected data on the degree to
which employees in a company like one another.
If the answer to this question is yes, consider social network analyses.
Do you want to assess whether some measure, instrument, or procedure accurately predicts one
of two possible outcomes, such as whether some tool accurately diagnoses an illness?
If the answer to this question is yes, consider ROC curves.
Would you like the results of your analyses to also integrate your past expectations or
knowledge about the topic? For example, perhaps you want to estimate the percentage of
individuals who read Peppa Pig. However, publishers tell you the percentage is almost
definitely between 20% and 30%, and you want to somehow incorporate this information in
the analyses.
If the answer to this question is yes, consider Bayesian statistics*
* These clusters of analyses can be especially challenging, because the researcher needs to
reach several decisions or because the advice on the topic is not as extensive.
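To make the Bayesian question in the table more concrete, here is a minimal sketch of the Peppa Pig example. It assumes the publishers' belief can be approximated by a Beta(75, 225) prior, which has a mean of 25% and places most of its weight between roughly 20% and 30%. It also assumes a hypothetical survey in which 40 of 100 respondents read Peppa Pig. Neither the prior parameters nor the survey numbers come from this document; they are illustrative choices only.

```python
import math

# Assumed prior: Beta(75, 225), mean 0.25, standard deviation about 0.025.
prior_a, prior_b = 75, 225

# Hypothetical survey data (invented for this sketch).
readers, respondents = 40, 100

# With a Beta prior and binomial data, the posterior is also a Beta
# distribution: add the successes to a and the failures to b.
post_a = prior_a + readers
post_b = prior_b + (respondents - readers)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


sample_proportion = readers / respondents
prior_mean = beta_mean(prior_a, prior_b)
posterior_mean = beta_mean(post_a, post_b)

print(f"Sample proportion: {sample_proportion:.3f}")
print(f"Prior mean:        {prior_mean:.3f} (sd {beta_sd(prior_a, prior_b):.3f})")
print(f"Posterior mean:    {posterior_mean:.3f} (sd {beta_sd(post_a, post_b):.3f})")
```

Under these assumptions, the posterior estimate falls between the prior mean and the sample proportion, and sits closer to the prior because the prior was specified with more certainty than a sample of 100 respondents provides.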
If this table indicates that you should consider a specific cluster of techniques, check whether
this site provides more information about this cluster, or search for the term on Google.
If the table indicates that you should consider more than one specific cluster of techniques,
proceed to the first cluster that was recommended. After reading about this cluster, if you feel
this first cluster is not suitable, consider the second cluster, and so forth.
If the table indicates that you do not need to consider any of these specific clusters of
techniques, proceed to the next set of questions.
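The way this first table is meant to be read can be sketched as a small lookup. The question keys below are hypothetical labels invented for this illustration, and the sketch covers only the clusters whose questions are stated in full above.

```python
# Each specialised cluster is listed in table order, paired with the
# questions that must all be answered "yes". The keys are hypothetical labels.
SPECIALISED_CLUSTERS = [
    ("geostatistics, geographic information systems and mapping",
     ["has_location_data", "spatial_autocorrelation_likely",
      "field_values_spatial_autocorrelation"]),
    ("time series analyses",
     ["measured_more_than_once", "more_than_about_50_times"]),
    ("social network analyses", ["has_relationship_data"]),
    ("ROC curves", ["predicts_one_of_two_outcomes"]),
    ("Bayesian statistics", ["wants_to_use_prior_knowledge"]),
]


def clusters_to_consider(answers):
    """Return the recommended clusters in table order.

    `answers` maps question keys to True or False; missing keys count as "no".
    """
    return [
        cluster
        for cluster, questions in SPECIALISED_CLUSTERS
        if all(answers.get(question, False) for question in questions)
    ]


# Hypothetical study: 60 monthly measurements plus strong prior expectations.
answers = {
    "measured_more_than_once": True,
    "more_than_about_50_times": True,
    "wants_to_use_prior_knowledge": True,
}
recommended = clusters_to_consider(answers)

if recommended:
    print("Start with:", recommended[0])
    print("Then, if unsuitable, consider:", recommended[1:])
else:
    print("Proceed to the second set of questions.")
```

An empty result corresponds to the case in which none of the specific clusters applies, so the researcher moves on to the common techniques.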
This second set of questions differentiates the most common clusters of techniques.
Common techniques
Questions to differentiate techniques, and the relevant cluster of techniques to consider
Do you want to compare distinct groups of units, such as distinct groups of people, animals,
nations, specimens, or drugs?
For example, you might want to compare males and females, blondes and brunettes, or four
possible combinations of genders and hair colours, on IQ. Or you might want to compare mice
injected with vitamin C with mice injected with vitamin D on activity.
If the answer to this question is yes, consider between-subject analyses such as independent
t-tests or between-subject ANOVAs.
Do you want to compare the same units, such as people, animals, nations, specimens, or
plants, over time?
For example, you might want to examine the memory of individuals before and after exercise.
If the answer to this question is yes, consider within-subject analyses, such as dependent
t-tests or repeated-measures ANOVAs.
Do you want to compare distinct groups of units, such as distinct groups of people, animals,
nations, specimens, or drugs, but over time?
For instance, you might want to compare mice injected with vitamin C with mice injected with
vitamin D on three separate occasions. On each occasion, you examine the same mice.
If the answer to this question is yes, consider mixed-model analyses, such as mixed-model
ANOVAs.
Do you want to explore the relationship between numeric variables?
For instance, you might want to examine the association between the self-esteem and income of
people. Or you might want to examine the association between the water consumption and height
of plants. In these examples, a variable is regarded as numerical if each unit is assigned a
number, such as income, rather than a category name.
If the answer to this question is yes, consider correlation and regression analyses.
Do you want to identify clusters of similar units, such as sets of people with similar
personalities or attitudes?
If the answer to this question is yes, consider cluster analyses or Q methodology.
Do you want to explore or test measures or instruments?
For example, you might have asked individuals 20 questions about their personality. You now
want to ascertain whether these 20 questions can be divided into sets of items, each of which
measures one trait.
If the answer to this question is yes, consider analyses of scales, such as Cronbach’s alpha,
exploratory factor analysis, and confirmatory factor analysis.
Do you want to simultaneously test measures and explore the association between these
measures?
If the answer to this question is yes, consider structural equation modelling*
Do you want to examine a sequence of variables?
For example, perhaps you want to assess the sequence in which experiences during childhood
shape personality, and personality shapes income.
If the answer to this question is yes, consider path analyses and structural equation
modelling*
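To illustrate why the distinction between distinct groups and the same units matters, the sketch below computes a t statistic for one set of hypothetical memory scores in two ways: as if the scores came from two distinct groups (Welch's version of the independent t-test) and as if they came from the same individuals before and after exercise (dependent t-test). The scores are invented for illustration, and only the test statistics are computed, not p values.

```python
import math
import statistics

# Hypothetical memory scores (invented for this sketch).
before = [10, 12, 9, 14, 11]
after = [12, 13, 11, 15, 13]


def independent_t(group_a, group_b):
    """Welch's t statistic for two distinct groups of units."""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error


def dependent_t(first, second):
    """Paired t statistic for the same units measured twice."""
    differences = [b - a for a, b in zip(first, second)]
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return statistics.mean(differences) / standard_error


t_between = independent_t(before, after)
t_within = dependent_t(before, after)

print(f"Treated as distinct groups: t = {t_between:.2f}")
print(f"Treated as the same units:  t = {t_within:.2f}")
```

With these numbers, the within-subject statistic is considerably larger, because pairing removes the stable differences between individuals. That advantage appears only when the two sets of scores are positively correlated, so the choice of cluster should follow the design of the study rather than the size of the statistic.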
Other approaches are also available, but more suitable to researchers who have already developed
advanced knowledge about data analyses. These techniques include