E001375 Full
BMJ Medicine: first published as 10.1136/bmjmed-2025-001375 on 27 October 2025. Downloaded from [Link] on 28 October 2025 by guest.
multivariable regression to identify causal risk factors
Dan Lewer,1,2 Thomas Brothers,3 Elizabeth O’Nions,1 John Pickavance1

Lewer and colleagues explain why risk factors for a health outcome should not be studied by putting several candidate independent variables (exposures) into a multivariable regression model and identifying which are statistically significant after mutual adjustment.

For numbered affiliations see end of article.
Correspondence to: Dr Dan Lewer; d.lewer@ucl.ac.uk
Cite this as: BMJMED 2025;4:e001375. doi:10.1136/bmjmed-2025-001375
DL and TB are joint first authors.
EO’N and JP contributed equally.
Received: 5 February 2025
Accepted: 30 September 2025
Protected by copyright, including for uses related to text and data mining, AI training, and similar technologies.

Introduction
Many medical and epidemiological studies use multivariable regression to test whether several independent variables (exposures) are causal determinants of a health outcome. Where mutually adjusted regression coefficients are significant, the exposures are labelled as risk factors for the outcome. We call this study design “factors associated with.” In this article, we argue that this method is flawed due to a lack of reasoning about which variables are treated as confounders, multiple statistical testing, and post hoc interpretation of the results. In some cases, researchers use algorithmic or stepwise approaches to select exposure variables, which further exacerbates these problems. Although the results of factors associated with studies often seem reasonable, these problems mean that the method can also produce implausible results, such as dementia reducing the risk of death in patients admitted to hospital for trauma,1 diabetes reducing the risk of venous thromboembolism in the general population,2 and lack of food reducing the risk of post-traumatic stress disorder in refugees.3 Many of these studies are published every year, including in well respected journals. We argue that these studies are misleading and contribute to research waste, and the “factors associated with” method should be abandoned.

KEY MESSAGES
⇒ A “factors associated with” study design can be defined as an observational research study that does not have a prespecified primary independent variable (exposure) and instead uses multivariable regression to test whether any of several exposures affects a health outcome
⇒ Variables that have a statistically significant association with the outcome after mutual adjustment are identified as factors associated with the outcome
⇒ This study design is common in medical and epidemiological research, and thousands of these studies are published each year
⇒ Although not always labelled as such, these studies aim to identify causal relations but do not have the rigour necessary to make causal claims
⇒ This study design has important methodological flaws, such as lack of rationale for mutual adjustment of the variables, multiple statistical testing, and post hoc interpretation of significant results, which can produce unreliable and sometimes implausible results
⇒ Researchers should not use the “factors associated with” study design and scientific journals should not publish these analyses

The “factors associated with” study design
What factors are associated with death and disease? What are the most important determinants of health? Can we identify risk factors that might be modified? These questions are often asked in observational health research. For example, researchers have investigated risk factors associated with heart attacks,4 strokes,5 serious covid-19,6–9 complications of cystic fibrosis,10 suicide among veterans,11 and pain after breast cancer surgery,12 with the aim of informing preventive interventions or identifying high risk groups. These studies are not testing a theory or hypothesis about the role of a specific risk factor, but instead simultaneously screen multiple potential risk factors for significant associations with the outcome (box 1).

In a typical “factors associated with” study, the researcher uses a regression model where the dependent variable is the outcome of interest and the independent variables are exposures such as personal characteristics, health behaviours, clinical factors, living and working conditions, and other socioeconomic and environmental factors. Where the mutually adjusted regression coefficients are statistically significant, the exposures are identified as factors associated with the outcome.

Although the results often look reasonable, we argue that the findings provide little scientific value and can be misleading. “Factors associated with” studies combine multiple poor research practices, including the table 2 fallacy, P-hacking, fishing expeditions, data dredging, HARKing (hypothesising after the results are known), and the Texas sharpshooter fallacy.13 14

The “factors associated with” design was mainstream by the 1970s15 16 and has become increasingly common. In 2024, more than 4000 articles with the phrase “factors associated with” in the title were added to the PubMed database, a 10-fold
increase from 2004. Reasons for the increased use of this method might include the widespread availability of multivariable regression in desktop statistical software, ease of doing these studies when data are already collected, lack of requirement for theoretical development, and apparent efficiency of studying several risk factors simultaneously. Although many epidemiologists might be aware of the limitations of this study design, increasing numbers of studies continue to be published, including in the highest ranked journals.

BOX 1 | HOW TO IDENTIFY A “FACTORS ASSOCIATED WITH” STUDY: SOME COMMON FEATURES
⇒ Exploratory aims (eg, to explore, identify, understand, or characterise factors associated with an outcome).
⇒ No primary independent variable (exposure) or hypothesis.
⇒ Tables reporting regression results (such as odds ratios) and highlighting statistically significant results.
⇒ Classification of multiple exposures as either associated or not associated with the outcome.

In this article, we do not consider other forms of exploratory causal research, such as genome-wide association studies,17 which typically adjust the association between each candidate genetic variant and the outcome for a limited set of confounding variables, rather than adjusting all of the genetic variables for each other.

Problems with factors associated with studies
Consider these six unexpected findings published in high quality journals, each of which was based on analysis of multiple candidate exposures using a regression model.

1. A study of primary care patients in England found that tobacco smoking reduced the risk of death due to covid-19.7 The authors suggested that inclusion of chronic respiratory disease in the regression model might explain this finding, because smoking causes chronic respiratory disease, which in turn increases vulnerability to covid-19. Therefore, adjusting for chronic respiratory disease could mask the effect of smoking on death due to covid-19. This problem is known as adjusting for a mediator.
2. A study of people testing for covid-19 found that tobacco smoking reduced the risk of a positive test result.18 This finding may be explained by collider bias. Two reasons why people might access a covid-19 test are that they have symptoms of covid-19 or they smoke tobacco and have a cough related to smoking. Those with a smoking related cough are less likely to have a positive test result within the sample of people testing for covid-19. Collider bias can occur when an analysis adjusts, stratifies, or selects participants based on a variable (in this case, covid-19 testing) that is affected by both the exposure (tobacco smoking) and the outcome (covid-19).19 Collider bias can mean that the observed association between the exposure and outcome is very different from the causal effect.
3. A study of refugees from Guatemala found that lacking sufficient food almost eliminated the risk of post-traumatic stress disorder.3 The authors suggested that sharing of food and collective cooking might have created a sense of wellbeing. Again, collider bias might be an alternative explanation. Here, refugee status is the collider, which can arise from famine or food scarcity (the exposure) and war (which is a cause of post-traumatic stress disorder, the outcome). Among refugees, those displaced by famine may be less likely to have experienced war, producing a misleading negative association between food scarcity and post-traumatic stress disorder.
4. A study of patients admitted to hospital due to trauma found that dementia reduced the risk of death.1 This finding might be explained by selection bias, where patients with dementia have a lower threshold for hospital admission or activation of a trauma team response.
5. A study of US enlisted marines found that relationship counselling increased the risk of suicide, whereas post-traumatic stress disorder, self-harm, and adverse childhood experiences did not.20 The authors suggested that relationship problems might be underlying both counselling and suicide, a scenario sometimes known as confounding by a common cause.
6. A study of participants in the UK Biobank found that diabetes reduced the risk of venous thromboembolism.2 The authors could not explain the apparent protective effect of diabetes and acknowledged that further evidence would be needed to support this finding. More plausible findings included increased risk of venous thromboembolism associated with older age, male sex, tobacco smoking, and higher body mass index.

Although we offered some possible explanations for these unexpected findings, we can only speculate, and many unknown mechanisms and unmeasured variables likely contributed to the observed associations (ie, a kind of spaghetti of causation). Apparently plausible results generated by the same method should be treated with the same caution as these strange findings. For example, in the study of venous thromboembolism,2 tobacco smoking was associated with an increased risk, which is likely true, but the effect size may be biased by other elements of the study design.
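The collider bias suggested as an explanation for the covid-19 testing example can be illustrated with a small simulation. The sketch below is a hypothetical illustration only: the prevalence of smoking, the risk of infection, and the probabilities of seeking a test are invented assumptions, not estimates from any of the cited studies, and the function names are ours. Smoking is simulated to have no effect on infection, yet restricting the analysis to people who were tested produces an apparent protective association.

```python
import random


def simulate_testing_sample(n=200_000, seed=1):
    """Simulate a population where smoking does NOT affect infection,
    then keep only the people who seek a test (the collider)."""
    rng = random.Random(seed)
    population, tested = [], []
    for _ in range(n):
        smoker = rng.random() < 0.20    # assumed smoking prevalence
        infected = rng.random() < 0.10  # assumed risk, independent of smoking
        # Assumed testing behaviour: symptoms of infection and a smoking
        # related cough each make a person more likely to seek a test.
        p_test = 0.02 + 0.60 * infected + 0.20 * smoker
        person = (smoker, infected)
        population.append(person)
        if rng.random() < p_test:
            tested.append(person)
    return population, tested


def risk_difference(people):
    """Infection risk in smokers minus infection risk in non-smokers."""
    smokers = [infected for smoker, infected in people if smoker]
    others = [infected for smoker, infected in people if not smoker]
    return sum(smokers) / len(smokers) - sum(others) / len(others)


population, tested = simulate_testing_sample()
rd_population = risk_difference(population)  # close to zero by construction
rd_tested = risk_difference(tested)          # negative: smoking looks protective

print(f"Risk difference, whole population: {rd_population:+.3f}")
print(f"Risk difference, tested people only: {rd_tested:+.3f}")
```

Under these assumptions, the association in the tested sample arises entirely from how people entered the sample, which is why no choice of adjustment variables in a regression model would remove it.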
Key problems of the “factors associated with” studies include: no prespecified primary exposure and no rationale for which variables are used in statistical adjustment; use of multiple statistical tests; and creating post hoc hypotheses.

No strategy for statistical adjustment: the table 2 fallacy
In a focused observational study of cause and effect, one variable is examined as a potential risk factor (exposure) on a health outcome, and the other variables are included in the model to adjust for confounding. In a “factors associated with” study, all variables are treated simultaneously as exposures and confounding variables for each other. The lack of rationale for statistical adjustment means that the quantities estimated by the regression model (eg, odds ratios) do not represent the effect of each variable on the outcome.

Westreich and Greenland called this problem the table 2 fallacy,13 because many studies include a table 1 describing the distribution of exposure variables in a study sample and a table 2 showing multivariable associations between the exposures and an outcome. The table 2 fallacy occurs when all values in table 2 are interpreted as causal effects on the outcome, rather than only the value estimated for the primary exposure.

Imagine having survey data from the general population, including whether participants recently had an injury or fall as a pedestrian (the outcome), their age (an exposure), and whether participants had balance problems (a second exposure). Table 2 might have two rows showing the mutually adjusted results for age and balance problems, based on a multivariable regression model. To estimate the effect of balance problems on pedestrian injuries and falls, an adjustment for age might be made, because age affects both balance and the risk of injuries, and hence is a confounder. Therefore, the value in table 2 might be a useful estimate. To estimate the effect of age on injuries, however, the effect would likely be left unadjusted in relation to balance problems, because balance is probably best understood as a mechanism or mediator for the effect of age on injuries.

Exposures can affect each other in many ways, and some are easier to understand than others. Some common relations between relevant variables and an exposure of interest include: a confounder, which affects the probability of both the primary exposure and the outcome, and may therefore mean that the exposure is correlated with the outcome even if it does not cause the outcome; a mediator, which lies on the causal pathway between the exposure and outcome, and can be considered a mechanism for a causal effect; and a collider, which is an event that is caused by both the exposure and the outcome, and can introduce bias if controlled in the analysis. These relations can be represented diagrammatically with directed acyclic graphs, which can help in the design of statistical adjustment strategies to estimate the causal effect of an exposure.21 When estimating causal effects, in most cases, confounders but not other variables should be controlled. The effect of controlling other variables is unpredictable, but in some cases, controlling for a mediator can create the false impression of no relation between a real exposure and an outcome (a false negative result) whereas controlling for a collider can create the false appearance of a relation between the exposure and outcome (a false positive result). Figure 1 shows an example of a confounder, mediator, and collider in a hypothetical study of risk factors for an injury or fall as a pedestrian.
[Figure 1, panel: Confounding. Sample: general population. Exposure: balance problems. Outcome: pedestrian unintentional injuries and falls. Confounder: age. Age affects both balance and risk of pedestrian unintentional injuries and is therefore a confounder, which should be controlled when estimating effect of balance problems on pedestrian injuries.]
[Figure 1, panel: Mediation. Sample: general population. Exposure: age. Outcome: pedestrian unintentional injuries and falls. Mediator: balance problems. Age affects balance, and balance subsequently affects the risk of pedestrian unintentional injuries. Balance is therefore a mediator or mechanism. When estimating the effect of age on pedestrian injuries, balance should not be controlled because this would lead to overadjustment.]
Figure 1 | Example of a confounder, mediator, and collider in a hypothetical study of risk factors for an injury or fall as a pedestrian, with simplified directed acyclic graphs
in a study that was not designed to quantify these pathways. Using a different set of variables to adjust the effect of each variable is unlikely to resolve these problems, because unmeasured variables and the choice of study sample may also affect observed associations. For example, collider bias can result from the choice of study sample or selection bias during recruitment.22 In the examples above, where studies were conducted in samples of people testing for covid-19 or in Guatemalan refugees, collider bias could result from the fact that people not testing for […] variables could affect each exposure in a “factors associated with” study differently, and may not appear in the list of exposure variables or in the description of the sample frame.

Given these complexities, it is generally not useful to try to unpick the reasons for an apparent multivariable association related to an exposure that is not the primary exposure. Unfortunately, in “factors associated with” studies, plausible findings are often taken to be true, whereas implausible findings are ignored or dismissed, even though these findings were generated with the same method.

Multiple statistical tests: fishing expeditions, data dredging, and P-hacking
By design, “factors associated with” studies involve multiple statistical tests. Each independent variable has its own P value, which is purportedly a probability that the effect size or something greater might be observed if there was actually no real effect. When none of the variables under investigation are true risk factors and all are unrelated to each other and to the outcome, on average one in 20 will yield a P value below 0.05, a false positive finding (type I error). The probability of obtaining a false positive finding increases with the number of statistical tests performed and depends on whether any true associations exist, which, in practice, is never known. For instance, when testing 10 independent candidate risk factors, there is roughly a 40% probability of obtaining at least one false positive finding, irrespective of sample size (figure 2, top). If the study includes one true risk factor and a sample size of 250 participants, the probability that an observed statistically significant association reflects a real effect is only about 50% (figure 2, bottom). Researchers might assume a 95% probability that a significant risk factor is real.

[Figure 2, top panel: “How likely is a false positive result, given no real risk factors?” Y axis: probability of at least one false positive (%). Bottom panel: “How likely is it that an observed risk factor is real, given one real risk factor with effect size 0.1?” Lines by sample size (5000, 500, 200, 50). X axis: No of variables being studied (0 to 20).]
Figure 2 | How multiple testing affects the interpretation of statistical significance. Top panel: calculated based on the formula P=1−(0.95^n), where n is the number of variables studied and P is the probability of at least one false positive result. Bottom panel: power calculations assuming continuous independent and dependent variables and one of the independent variables affecting the dependent variable with a standardised mean difference of 0.1. With a large sample or a large effect size, or both, the positive predictive value is asymptotically 1/(1+(n−1)×0.05), where n is the number of variables. Each variable, other than the true risk factor, has a 0.05 chance of being a false positive result, whereas asymptotically, the true risk factor is always detected

Researchers might include all of their candidate risk factors in one model, or use algorithmic or stepwise methods to reduce the number of variables. Algorithmic or stepwise methods involve iteratively adding and removing independent variables based on the significance of their associations with the outcome, or the impact on model residuals (the model fit).

Stepwise variable selection exacerbates the problems of multiple statistical testing. By comparing combinations of risk factors and selecting those with significant associations, this procedure maximises false positive results,23 24 and can be viewed as a type of automated P-hacking. P-hacking has been defined as “trying out several statistical analyses and/or data eligibility specifications and then selectively reporting those that produce significant results.”25 Researchers might use P-hacking to deliberately maximise their scientific publications at the cost of misleading results. More often, well intentioned researchers are unaware that the stepwise algorithms are a type of P-hacking. These problems are also known as data dredging26 or fishing expeditions because the researcher looks for associations in a dataset rather than testing a hypothesis or theory based on previous knowledge.
Post hoc hypotheses: the Texas sharpshooter fallacy and HARKing
The Texas sharpshooter fallacy is an analogy to describe hypothesising about associations after these associations have been observed. Imagine that someone is doing target practice when no one is watching. They fire a gun at a blank wall and then draw a target around the tightest cluster of bullet holes. They then invite people to observe the accuracy of their aim. This practice is also known as Hypothesising After the Results are Known (HARKing).27

“Factors associated with” studies are a special case of the Texas sharpshooter fallacy and HARKing. Classically, these practices involve pretending that the hypothesis (or target) was already present before the research was started, and therefore have an element of dishonesty. Many “factors associated with” studies do not claim they had prior hypotheses. The conclusions, however, often suggest mechanisms for the observed risk factors as if these proposed mechanisms were being tested. In fact, these mechanisms are being suggested post hoc.

In summary, these three problems mean that the results of “factors associated with” studies do not provide useful insights into causal relations between exposures and outcomes. Methods that estimate causal effects in observational research must be guided by counterfactual theories about what might have happened to people in unobserved exposure statuses. Introductions to these problems and tools such as directed acyclic graphs are available elsewhere,28 29 and recommendations for avoiding some common pitfalls are provided in box 2.

BOX 2 | WHEN STUDYING THE CAUSES OF DISEASES, HOW TO AVOID THE PROBLEMS OF “FACTORS ASSOCIATED WITH” STUDIES
⇒ Use unambiguous causal language in your research question, such as “does reducing salt in the diet prevent strokes?”
⇒ Prespecify an adjustment strategy based on existing evidence, knowledge, and assumptions; directed acyclic graphs can help to communicate your strategy.
⇒ Include all potential confounding variables in the analysis, and do not adjust for mediator or collider variables without good reason.
⇒ Use a preregistered research protocol; if legitimate deviations from the protocol are necessary, the protocol will help to explain the deviations transparently.
⇒ Focus on estimating an effect, including the uncertainty; if P>0.05, the effect size and confidence interval should still be reported (rather than reported as not significant).
⇒ Avoid speculation about results from variables that were included in the analysis to adjust for confounding.

Defences of factors associated with studies
Results are exploratory or hypothesis generating
Researchers might accept that limitations exist in a “factors associated with” analysis, but argue that the results give a useful initial indication of potential risk factors. Researchers might say that the results are hypothesis generating and can inform confirmatory studies with a more focused study design. But how often has an important hypothesis been generated for the first time from a “factors associated with” study, and then confirmed to be true? We have yet to identify an example. If readers can identify examples, which we would like to hear about, we wonder if these justify the waste from the large number of “factors associated with” studies that are done each year.

In some cases, “factors associated with” studies are not only wasteful, but potentially harmful. Two of the examples above7 18 suggested that tobacco smoking might prevent serious covid-19, which could undermine public health efforts to reduce the prevalence of smoking. In another example, researchers examined the risk factors associated with recent high intensity physical activity in patients with hypertrophic cardiomyopathy who died during physical activity of any intensity.30 The results suggested that younger age was associated with “high intensity physical activity related sudden cardiac death”. The researchers concluded that younger patients (but not older patients) should be advised against high intensity physical activity, which could undermine clinical guidance and evidence from randomised trials.31 32 A “factors associated with” study might identify an important risk factor by chance, but at the cost of misleading findings and substantial research resources.

Results show associations rather than causal effects
Some researchers argue that an analysis of observational data is not attempting to quantify the effect of a risk factor, and therefore the rigour required to measure a causal effect is not needed. Instead, researchers might claim that the risk factor is “independently associated” with the outcome, and that estimating the size of an association is different from estimating a causal effect.

We have argued that the findings of a “factors associated with” study can reflect the arbitrary design of a regression model rather than processes of substantive importance in the real world. Moreover, if the findings are interpreted in ways that imply the exposure should be modified or might contribute to the
2
risk of a disease, then the inference is inherently Epidemiology and Public Health, University College London,
BMJ Medicine: first published as 10.1136/bmjmed-2025-001375 on 27 October 2025. Downloaded from [Link] on 28 October 2025 by guest.
London, UK
causal. The term independently associated, if not 3
Department of Medicine, Dalhousie University, Halifax, Nova
implying causation, does not mean anything. This Scotia, Canada
language can obscure methodological challenges
or downplay the need for a clear research question Contributors DL and TB had the idea for the manuscript. All authors
discussed the article contents. DL wrote the first draft. All authors
and careful causal reasoning. Even if the discussion reviewed and revised the manuscript. DL is the guarantor.
section of the article highlights the limitations of
Funding The authors have not declared a specific grant for this
the method, the results may still be highlighted and research from any funding agency in the public, commercial, or not-
promoted by journalists or policy makers. for-profit sectors.
Competing interests We have read and understood the BMJ policy
on declaration of interests and declare the following interests: none.
Results can help prioritise groups for more support
Provenance and peer review Commissioned; externally peer
Based on the results of “factors associated with” reviewed.
studies, researchers might conclude that subgroups
Protected by copyright, including for uses related to text and data mining, AI training, and similar technologies.
Open access This is an open access article distributed in accordance
should be prioritised for effective interventions. In the with the Creative Commons Attribution Non Commercial (CC BY-NC
example of US enlisted marines,20 one might argue 4.0) license, which permits others to distribute, remix, adapt, build
upon this work non-commercially, and license their derivative works
that relationship counselling probably did not cause on different terms, provided the original work is properly cited,
suicide, but nonetheless marines who receive relation- appropriate credit is given, any changes made indicated, and the use
is non-commercial. See: [Link]
ship counselling are at higher risk. Therefore, those
4.0/.
receiving counselling could benefit from interventions
to prevent suicide. This is a question about predic-
ORCID iDs
tion rather than causation. If the question is whether Dan Lewer [Link]
marines should be prioritised for interventions to Thomas Brothers [Link]
Elizabeth O’Nions [Link]
prevent suicide according to whether they received John Pickavance [Link]
relationship counselling, then the relevant value is the
univariable association between relationship coun-
selling and suicide, rather than this association after REFERENCES
1 Yadav K, Lampron J, Nadj R, et al. Predictors of mortality among
adjustment for other variables, such as deployment to older major trauma patients. Can J Emerg Med 2023;25:865–72.
war zones and traumatic brain injury. If the multivari- 10.1007/s43678-023-00597-w
able model is used to prioritise interventions to prevent 2 Gregson J, Kaptoge S, Bolton T, et al. Cardiovascular Risk Factors
Associated With Venous Thromboembolism. JAMA Cardiol
suicide, then the relevant value would be the predicted 2019;4:163–73. 10.1001/jamacardio.2018.4537
risk for each marine based on the full model, rather 3 Sabin M, Lopes Cardozo B, Nackerud L, et al. Factors associated
with poor mental health among Guatemalan refugees living in
than coefficients for individual exposures. Neither Mexico 20 years after civil conflict. JAMA 2003;290:635–42. 10.1001/
approach would involve inferring a causal relation jama.290.5.635
4 Yusuf S, Hawken S, Ounpuu S, et al. Effect of potentially modifiable
based on the significant association between receiving risk factors associated with myocardial infarction in 52 countries (the
counselling and suicide, after adjustment for the other INTERHEART study): case-control study. Lancet 2004;364:937–52.
available variables. 10.1016/S0140-6736(04)17018-9
Multivariable regression models can estimate causal effects (what is the effect of an exposure on a disease?) or predict outcomes (who is most at risk?) but cannot do both. “Factors associated with” studies often confuse these different aims, and use language and methods related to both paradigms in the same research study.

Conclusions
We recommend that researchers should not use the “factors associated with” method and scientific journals should not publish these analyses. Many examples of “factors associated with” studies have results that seem reasonable. We are concerned, however, that these studies add little beyond common sense, and may be misleading and harmful. To our knowledge, no “factors associated with” studies have led to important scientific progress, despite the publication of many studies each year. Hence we believe that the “factors associated with” study design should be abandoned.

AUTHOR AFFILIATIONS
1 Bradford Centre for Health Data Science, Bradford Institute for Health Research, Bradford, UK

5 O’Donnell MJ, Chin SL, Rangarajan S, et al. Global and regional effects of potentially modifiable risk factors associated with acute stroke in 32 countries (INTERSTROKE): a case-control study. Lancet 2016;388:761–75. 10.1016/S0140-6736(16)30506-2
6 Swann OV, Holden KA, Turtle L, et al. Clinical characteristics of children and young people admitted to hospital with covid-19 in United Kingdom: prospective multicentre observational cohort study. BMJ 2020;370:m3249. 10.1136/bmj.m3249
7 Williamson EJ, Walker AJ, Bhaskaran K, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature 2020;584:430–6. 10.1038/s41586-020-2521-4
8 Petrilli CM, Jones SA, Yang J, et al. Factors associated with hospital admission and critical illness among 5279 people with coronavirus disease 2019 in New York City: prospective cohort study. BMJ 2020;369:m1966. 10.1136/bmj.m1966
9 Grasselli G, Greco M, Zanella A, et al. Risk Factors Associated With Mortality Among Patients With COVID-19 in Intensive Care Units in Lombardy, Italy. JAMA Intern Med 2020;180:1345–55. 10.1001/jamainternmed.2020.3539
10 Sly PD, Gangell CL, Chen L, et al. Risk factors for bronchiectasis in children with cystic fibrosis. N Engl J Med 2013;368:1963–70. 10.1056/NEJMoa1301725
11 LeardMann CA, Powell TM, Smith TC, et al. Risk factors associated with suicide in current and former US military personnel. JAMA 2013;310:496–506. 10.1001/jama.2013.65164
12 Gärtner R, Jensen M-B, Nielsen J, et al. Prevalence of and factors associated with persistent pain following breast cancer surgery. JAMA 2009;302:1985–92. 10.1001/jama.2009.1568
13 Westreich D, Greenland S. The table 2 fallacy: presenting and interpreting confounder and modifier coefficients. Am J Epidemiol 2013;177:292–8. 10.1093/aje/kws412
14 Andrade C. HARKing, Cherry-Picking, P-Hacking, Fishing Expeditions, and Data Dredging and Mining as Questionable Research Practices. J Clin Psychiatry 2021;82:20f13804. 10.4088/JCP.20f13804
15 Stamler J, Rhomberg P, Schoenberger JA, et al. Multivariate analysis of the relationship of seven variables to blood pressure: findings of the Chicago Heart Association Detection Project in Industry, 1967-1972. J Chronic Dis 1975;28:527–48. 10.1016/0021-9681(75)90060-0
16 Eisenberg M, Bergner L, Hallstrom A. Paramedic programs and out-of-hospital cardiac arrest: I. Factors associated with successful resuscitation. Am J Public Health 1979;69:30–8. 10.2105/ajph.69.1.30
17 Uffelmann E, Huang QQ, Munung NS, et al. Genome-wide association studies. Nat Rev Methods Primers 2021;1. 10.1038/s43586-021-00056-9
18 de Lusignan S, Dorward J, Correa A, et al. Risk factors for SARS-CoV-2 among patients in the Oxford Royal College of General Practitioners Research and Surveillance Centre primary care network: a cross-sectional study. Lancet Infect Dis 2020;20:1034–42. 10.1016/S1473-3099(20)30371-6
19 Griffith GJ, Morris TT, Tudball MJ, et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nat Commun 2020;11:5749. 10.1038/s41467-020-19478-2
20 Phillips CJ, LeardMann CA, Vyas KJ, et al. Risk Factors Associated With Suicide Completions Among US Enlisted Marines. Am J Epidemiol 2017;186:668–78. 10.1093/aje/kwx117
21 Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37–48.
22 Hernán MA, Monge S. Selection bias due to conditioning on a collider. BMJ 2023;381:1135. 10.1136/bmj.p1135
23 Smith G. Step away from stepwise. J Big Data 2018;5:32. 10.1186/s40537-018-0143-6
24 Heinze G, Dunkler D. Five myths about variable selection. Transpl Int 2017;30:6–10. 10.1111/tri.12895
25 Head ML, Holman L, Lanfear R, et al. The extent and consequences of p-hacking in science. PLoS Biol 2015;13:e1002106. 10.1371/[Link].1002106
26 Davey Smith G. Data dredging, bias, or confounding. BMJ 2002;325:1437–8. 10.1136/bmj.325.7378.1437
27 Kerr NL. HARKing: Hypothesizing After the Results are Known. Pers Soc Psychol Rev 1998;2:196–217. 10.1207/s15327957pspr0203_4
28 Hernán MA, Robins JM. What If: causal inference. Boca Raton: Chapman & Hall/CRC. Available: [Link] whatifbook
29 Tennant PWG, Murray EJ, Arnold KF, et al. Use of directed acyclic graphs (DAGs) to identify confounders in applied health research: review and recommendations. Int J Epidemiol 2021;50:620–32. 10.1093/ije/dyaa213
30 Lee H-J, Gwak S-Y, Kim K, et al. Factors associated with high-intensity physical activity and sudden cardiac death in hypertrophic cardiomyopathy. Heart 2025;111:253–61. 10.1136/heartjnl-2024-324928
31 Basu J, Nikoletou D, Miles C, et al. High intensity exercise programme in patients with hypertrophic cardiomyopathy: a randomized trial. Eur Heart J 2025;46:1803–15. 10.1093/eurheartj/ehae919
32 Ommen SR, Ho CY, Asif IM, et al. 2024 AHA/ACC/AMSSM/HRS/PACES/SCMR Guideline for the Management of Hypertrophic Cardiomyopathy: A Report of the American Heart Association/American College of Cardiology Joint Committee on Clinical Practice Guidelines. Circulation 2024;149. 10.1161/CIR.0000000000001250