With Low Power Comes Low Credibility: Toward a Principled Critique of Results From Underpowered Tests

Lukas L. Lengersdorff and Claus Lamm

Advances in Methods and Practices in Psychological Science (2025), General Article
DOI: 10.1177/25152459241296397
Abstract
Researchers should be motivated to adequately power statistical tests because tests with low power have a low probability
of detecting true effects. However, it is also often claimed that significant results obtained by underpowered tests are
less likely to reflect a true effect. Here, we critically discuss this “low-power/low-credibility” (LPLC) critique from both
frequentist and Bayesian perspectives. Although the LPLC critique is first and foremost a critique of frequentist tests, it
is itself not consistent with frequentist theory. In particular, it demands that researchers have some information on the
probability that a hypothesis is true before they test it. However, such prior probabilities are dismissed as meaningless in
frequentist inference, and we demonstrate that they cannot be meaningfully introduced into frequentist thinking. Even
when adopting a Bayesian perspective, however, significant results from tests with low power can provide a nonnegligible
amount of support for the tested hypothesis. We conclude that even though low power reduces the chances of obtaining significant findings, there is little justification to dismiss already obtained significant findings on the basis of low power alone. However, concerns about low power might often reflect suspicions that findings were produced by questionable research practices. If this is the case, we suggest that communicating these issues transparently, rather than using low power as a proxy concern, will be more appropriate. We close by providing suggestions on how results from tests with low power can be critiqued for the correct reasons and in a constructive manner.
Keywords
inference, questionable research practices
Corresponding Author:
Lukas L. Lengersdorff, Social, Cognitive, and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
Email: [email protected]

Psychological science is in a period of reform. In the wake of the replication crisis, empirical researchers and methodologists have raised awareness about questionable research practices (QRPs), pointing out that many analytical habits decrease the credibility of scientific output (Bakker et al., 2012; Open Science Collaboration, 2015). Most reform efforts address the (mis)use of frequentist inference, especially in the form of significance testing and p values (McShane et al., 2019; Wicherts et al., 2016). One of the biggest concerns raised is that many of the conducted statistical tests suffer from low power (Button et al., 2013; Ioannidis, 2005; Lindsay, 2015). By definition, a statistical test with low power has a low probability of producing a significant result even when the tested hypothesis is true. On these grounds alone, every researcher should be discouraged from conducting studies with sample sizes that are too small to detect plausible effect sizes because they will likely result in a waste of resources and time. However, it now seems a widely held belief that underpowered tests are problematic for another reason: Not only does low power decrease the probability of a significant result when the hypothesis is true, but (so goes the notion) it also makes the significant results that are obtained less credible because they have a lower probability of reflecting a true finding. This idea is prominently voiced in Ioannidis's (2005) influential article as one of the reasons "why most published research findings are false" and has been cited and further expanded in numerous other articles (e.g., Button et al., 2013; Colquhoun, 2014; but for a critical evaluation, see Wagenmakers et al., 2015). From now on, we refer to this as the "low-power/low-credibility" (LPLC) critique.
In this article, we sketch out the development of the LPLC critique and discuss the assumptions one needs to accept to use it from both frequentist and Bayesian points of view. We also debate whether results from tests with low power are particularly susceptible to being invalidated by the use of QRPs and evaluate the LPLC critique in the presence of publication bias. We expect that researchers with a general interest in statistical inference will benefit from reading this article (if only by the opportunity to reflect on why they disagree with our points). We would like to achieve two things: first, to make scientists reflect on, question, and openly communicate their own assumptions when they use the LPLC critique and, second, to empower scientists to demand the same level of reflection and open communication from peers who use the LPLC critique when assessing their work, for example, during peer review. At the end of this article, we provide recommendations on how this reflection and communication could be achieved in practice.

We would like to explicitly emphasize here that only an ill-informed reading of our arguments could be used as an "excuse" for deliberately conducting studies with suboptimal sample sizes. After all, tests that have low power for even the largest plausible effect size are very likely to produce nonsignificant results. We expect that, in the presence of the reflection and open communication we advocate, such studies will be justifiably assessed as irrelevant or inefficient for achieving scientific progress.

What Does "Underpowered" Mean?

The recent debate about the prevalence of "underpowered" tests can create the impression that power is an objective property of a test. Indeed, in our view, calling a test or study "underpowered" is often used as a shorthand to say that the sample size is too small relative to some explicit or implicit standard. However, one test with one given sample size does not have "one" power but, rather, many different levels of power for different possible effect sizes of interest (Morey & Lakens, 2016). What effect sizes are "interesting" or "plausible" must ultimately be decided by scientists and their peers, and it is through this decision that a substantial amount of subjective consideration enters the assessment of a test's power. Thus, a fair and transparent critique must include a statement of the effect sizes for which the test's power was too low. As Morey (2019) put it succinctly, "'This sample size feels small' is not good enough." Whenever we use phrases such as "low power" or "underpowered test" in this article, we take the point of view of scientists who assess the power of a test to be lower than the standard they would find acceptable given the range of effect sizes they would find interesting and plausible in the given situation.

Frequentist and Bayesian Probability

The aim of the present article is to assess the LPLC critique from the perspective of the two dominant schools of statistical inference: frequentist and Bayesian statistics. These two frameworks differ fundamentally in their understanding of probability, which is why we first need to briefly explain these differences. According to the frequentist interpretation, probability is the relative frequency with which repeatable processes produce outcomes (Fisher, 1935; Neyman & Pearson, 1933). This is consistent with many real-life applications of probability, such as games of chance. However, the frequentist interpretation does not allow the assignment of a probability to nonrepeatable, singular events. In particular, from a frequentist perspective, one cannot assign a probability to the truth of a certain hypothesis. In contrast, the Bayesian interpretation treats probability as a way to quantify knowledge (e.g., De Finetti, 1974; Jaynes, 2003). Consequently, Bayesian probability can make statements about singular events and hypotheses.

Consider the following example: A friend tells you that she will toss a fair coin. Before the toss, she asks you about the probability of the coin showing heads. What do you answer? In this situation, the obvious answer of 50% is a valid statement in both the frequentist and Bayesian frameworks. Now, your friend tosses the coin and covers the result with her hand. What is the probability that the coin under her hand is showing heads? As a Bayesian, you still have every reason to say 50%: you know that a fair coin toss results in heads in 50% of cases, and although this particular toss has already happened, you have gained no additional knowledge about its outcome. From the frequentist point of view, however, you find it impossible to give a meaningful answer. You cannot see through your friend's hand, but you are certain that beneath it, the coin shows either heads or tails. Thus, the only "probability" you could assign to this event would be 100% or 0%.

The same difference arises whenever the truth of scientific hypotheses is discussed. In frequentist inference, the probability of some particular hypothesis being true is not meaningful because its truth is seen as a fixed state of the world. In contrast, Bayesian inference allows the assignment of probability to a hypothesis to quantify the existing evidence for its truth.
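To make the frequentist reading concrete, the long-run relative frequency in the coin example can be simulated directly. The following is a minimal Python sketch added for illustration (it is not part of the original article, and the variable names are ours):

    import random

    random.seed(1)

    # Frequentist reading: probability is the long-run relative frequency of heads
    # over many repetitions of the same chance process.
    n_tosses = 100_000
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    print(f"Relative frequency of heads after {n_tosses} tosses: {heads / n_tosses:.3f}")

    # Bayesian reading: for the single toss that has already been made but is still
    # covered, the same number quantifies our state of knowledge about its outcome.
    belief_heads_single_covered_toss = 0.5
    print(f"Degree of belief that the covered coin shows heads: {belief_heads_single_covered_toss}")

The frequentist statement is about the long-run behavior of the repeatable process; the Bayesian statement applies the same number to the single covered toss as a degree of belief.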
The Original Formulation of the LPLC Critique

The LPLC critique originated in an attempt to transfer the logic of diagnostic medical screening to tests of epistemic inference (Colquhoun, 2014; Mayo & Morey, 2017). Its central concept is the "positive predictive value" (PPV), a term frequently used in epidemiology. In epidemiology, the PPV of a test for some disease is the proportion of individuals who really have the disease among those individuals for which the test gave a positive result (Altman & Bland, 1994). The PPV is calculated, using Bayes's theorem, as

PPV = P(D \mid +) = \frac{P(+ \mid D)\, P(D)}{P(+ \mid D)\, P(D) + \bigl(1 - P(- \mid \text{not } D)\bigr)\bigl(1 - P(D)\bigr)},

where the prevalence P(D) is the probability that any randomly selected member of the population of interest has the disease, the sensitivity P(+ | D) is the probability of a positive result given that the patient has the disease, and the specificity P(− | not D) is the probability of a negative result when the patient does not have the disease. Note that here, the use of Bayes's theorem is valid within the frequentist framework because all involved quantities can be interpreted as relative frequencies.

The central step in the formulation of the LPLC critique is to apply the same logic to significance tests. In this framework, hypotheses take the role of "individuals," and the truth of hypotheses takes the role of the "disease." Sensitivity and specificity are identified with the corresponding quantities from frequentist statistics: The power of the test, typically written as 1 − β, is equated with the sensitivity, whereas the Type I error probability α is understood to be the complement of the test's specificity. However, the prevalence has no direct counterpart in frequentist terminology. Therefore, the quantity P(H_true) is introduced, defined as the proportion of true hypotheses among some larger "population" of hypotheses, for example, all hypotheses tested in a certain research field. Most formulations of the LPLC critique represent P(H_true) by the corresponding odds ratio R = P(H_true) / (1 − P(H_true)), defined as the "ratio of the number of 'true relationships' to 'no relationships' among those tested in the field" (Ioannidis, 2005, paragraph 2, "Modeling the Framework for False Positive Findings") or "the odds that a probed effect is indeed non-null among the effects being probed" (Button et al., 2013, p. 366). The PPV can then be calculated as (cf. Ioannidis, 2005)

PPV = \frac{(1 - \beta)\, P(H_\text{true})}{(1 - \beta)\, P(H_\text{true}) + \alpha\, \bigl(1 - P(H_\text{true})\bigr)} = \frac{(1 - \beta)\, R}{(1 - \beta)\, R + \alpha}. \quad (1)

To complete the derivation of the LPLC critique, it is easily shown that for any fixed value of α and P(H_true), a decrease in power 1 − β leads to a decrease in the PPV. In the words of the LPLC critique's proponents, this relation shows that low power comes with a low "post-study probability" that the "claimed research finding is true" (Ioannidis, 2005, paragraph 2, "Modeling the Framework for False Positive Findings") and that "low power . . . reduces the likelihood that a statistically significant result reflects a true effect" (Button et al., 2013, p. 365).

Note that in the publications of Ioannidis (2005) and others (e.g., Button et al., 2013), the focus lies on the effects of low power on the evidence supplied by populations of published studies. However, the authors also invited the application of their critique to individual tests. This is seen in statements such as "the probability that a research finding is indeed true depends on . . . the statistical power of the study" (Ioannidis, 2005, paragraph 2, "Modeling the Framework for False Positive Findings") and "a study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect" (Button et al., 2013, p. 365). Indeed, the LPLC critique is often also used in the assessment of single studies, and it is on this use that we focus here. With small adaptations, however, our arguments can also be extended to the population-level LPLC critique.

The LPLC Critique in Frequentist Inference

The LPLC critique takes a peculiar position within frequentist inference. Concepts such as power and Type I errors are central to its formulation. However, by introducing the prevalence P(H_true), the LPLC critique becomes decidedly "unfrequentist." Frequentist tests are designed to perform inference without any reference to the probability of hypotheses, which is even dismissed as a meaningless concept (e.g., Fisher, 1935; Neyman & Pearson, 1933). In the same vein, many recent publications describe the fallacies that arise whenever researchers erroneously assume that frequentist tests say anything about the probability of a hypothesis being true (Greenland et al., 2016; Morey et al., 2016; Wasserstein & Lazar, 2016).

Why, then, is the LPLC critique widely accepted by a predominantly frequentist field instead of being dismissed as another such fallacy? Its similarity to diagnostic screening is very appealing, but it is easily overlooked that one cannot treat tested hypotheses like screened patients without making very strong assumptions about the nature of the scientific process. For the LPLC critique to be consistent with frequentist theory, one would need to assume that tested hypotheses are randomly sampled from some population of hypotheses, just as screened individuals are sampled from a population of people. This assumption is best illustrated by a variation of the classic urn model, in which the urn is "filled" with true and false hypotheses (Colquhoun, 2014; Mayo & Morey, 2017).
Researchers do not know of any particular hypothesis whether it is true, but they do have knowledge of P(H_true), the percentage of hypotheses in the urn that are true. Researchers should now imagine that when they perform scientific inference, they randomly draw a hypothesis from the urn, perform the significance test with a specified Type I error and power, and accept or reject the hypothesis based on the test's result. Then, the proportion of true hypotheses among all accepted hypotheses will be the PPV. However, this is clearly not a useful model of the scientific process, for a number of reasons.

First, hypotheses are not randomly sampled. Scientists do not randomly draw their hypotheses from some larger set of distinguishable hypotheses, all of which are equally likely to be selected. Rather, new research questions are derived from existing findings, and the results of one study change the likelihood that another study will be started.

Second, it is unclear which population of hypotheses should be considered. The proponents of the LPLC critique propose that P(H_true) should reflect the number of true hypotheses in the research field (Button et al., 2013; Ioannidis, 2005). But how does one define the field? For example, as psychologists, should we consider all possible hypotheses within psychology or only within our discipline or subdiscipline? Do we consider all hypotheses that have ever been tested in this field, or only recent hypotheses, or maybe all hypotheses that could possibly be tested in the future? Do we consider even the most absurd hypotheses, or do we restrict the population to "reasonable" ones? If the latter, how do we define "reasonable"? These questions demonstrate that the LPLC critique suffers heavily from the so-called reference-class problem (Hájek, 2007; Mayo & Morey, 2017). The choice of an appropriate reference class of hypotheses must ultimately be based on the assumptions and goals of the individuals performing or assessing the analysis, introducing an element that is just as subjective as the priors elicited in Bayesian analyses.

Third, P(H_true) cannot be known or estimated. The situation implied by the LPLC critique is rather peculiar: Researchers do not know of any particular hypothesis whether it is true, yet they do know the proportion of true hypotheses in the field. It is not clear how researchers could have such substantial prior knowledge about hypotheses that have not yet been tested without resorting to such Bayesian solutions as expert knowledge or subjective belief.

But even if one solved (or ignored) all of these issues, the PPV would still not give the probability that any particular significant finding is true. Although the PPV has been equated with "the post-study probability that [a research finding] is true" (Ioannidis, 2005, paragraph 2, "Modeling the Framework for False Positive Findings"), the frequentist probability of any particular hypothesis being true is 100% or 0%, independent of the test's result or its power. Rather, the PPV is the probability that a randomly selected hypothesis that produced a significant result is true. Mistaking the latter for the former is similar to the common misunderstanding that a given 95% confidence interval contains the true parameter with a probability of 95% (when it is rather the case that in the long run, 95% of confidence intervals will contain the true parameter; Morey et al., 2016).

Ultimately, these problems arise because frequentist inference pursues a different objective than the one implied by the LPLC critique. Significance tests have been designed with the intention of achieving optimal error control in repeated decision situations without any consideration of prior information. Researchers run into inconsistencies, however, when they demand that significance tests inform them about the probability that claimed findings are true.

The LPLC Critique in Bayesian Inference

Accepting the Bayesian interpretation of probability makes the LPLC critique much simpler: one does not need to identify P(H_true) with the proportion of true hypotheses in some hard-to-define population. Instead, researchers may define it as the "proper" prior probability that the hypothesis in question is true, reflecting all available knowledge. One may then also interpret the PPV as the posterior probability that the hypothesis is true after observing the significant result, PPV = P(H_true | sig.). It is mathematically correct to say that lower power leads to lower values of P(H_true | sig.). However, this relationship should not lead one to dismiss significant findings from underpowered tests entirely. Rather, it prompts one to determine exactly how much (or little) evidence the results can provide (see also Wagenmakers et al., 2015).

Figure 1 illustrates the posterior probability P(H_true | sig.) after a test significant at α = .05 for different prior probabilities P(H_true) and different levels of power. We show that P(H_true | sig.) does decrease with decreasing power. However, whether low power makes the result not credible depends heavily on its prior probability and on what values of P(H_true | sig.) one considers credible or not credible. If, for example, we preassign a prior probability of 50% to the hypothesis, significant results from tests with power levels of .80, .50, and .20 provide posterior probabilities of 94.1%, 90.9%, and 80.0%, respectively. Should the difference between 94.1% and 90.9% prompt us to accept the result of the test with the conventional power of .80 but to dismiss the result of the test with the low power of .50? And can we dismiss the significant result of a test with the very low power of .20, given that it provides the (perhaps surprisingly high) posterior probability of 80%? The answers to these questions depend
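These posterior probabilities can be checked directly from Equation 1, read as a posterior probability. The following minimal Python sketch is an illustration added here rather than part of the original analyses; it assumes α = .05 and the 50% prior used in the example above, and the function name is ours:

    def posterior_given_significance(power, prior, alpha=0.05):
        """P(H_true | sig.) from Equation 1, with the PPV read as a Bayesian posterior."""
        return (power * prior) / (power * prior + alpha * (1.0 - prior))

    # Prior probability of 50%, as in the example above:
    for power in (0.80, 0.50, 0.20):
        print(f"power = {power:.2f}: P(H_true | sig.) = {posterior_given_significance(power, 0.50):.3f}")
    # -> 0.941, 0.909, 0.800 (the 94.1%, 90.9%, and 80.0% quoted above)

The same function, evaluated at priors of 10% and 1%, also reproduces the approximate posterior probabilities (about 30% and 4% for a power of .20) that appear in the reviewer examples later in this article.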
[Figure 1. Posterior probability P(H_true | sig.) after a significant test at α = .05, plotted against the prior probability P(H_true) for tests with power 1, .8, .5, and .2.]

\frac{P(H_\text{true} \mid \text{sig.})}{P(H_\text{false} \mid \text{sig.})} = \underbrace{\frac{P(\text{Data} \mid H_\text{true})}{P(\text{Data} \mid H_\text{false})}}_{BF_\text{sig}} \times \frac{P(H_\text{true})}{P(H_\text{false})},

where BF_sig, the BF obtained through a significant result, is given by

BF_\text{sig} = \frac{P(\text{sig.} \mid H_\text{true})}{P(\text{sig.} \mid H_\text{false})} = \frac{1 - \beta}{\alpha}.
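Because BF_sig depends only on power and α, it is easy to tabulate. A minimal Python sketch, added here for illustration (α = .05 and the power levels used throughout this article; the function name is ours):

    def bf_significant(power, alpha=0.05):
        """Bayes factor carried by a significant result: BF_sig = (1 - beta) / alpha."""
        return power / alpha

    for power in (1.0, 0.80, 0.50, 0.20):
        print(f"power = {power:.2f}: BF_sig = {bf_significant(power):.0f}")
    # -> 20, 16, 10, 4

Even the test with power .20 carries a Bayes factor of 4 in favor of the tested hypothesis, which is why a significant result from a low-powered test can still provide a nonnegligible amount of support.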
The BF is independent of P(H_true) and can therefore be interpreted as an objective measure of how strongly the data support the tested hypothesis. In a fully Bayesian

if not, that this is transparently communicated and that appropriate corrections for multiple comparisons have been performed). But significant results reported in psychological science are often produced by QRPs, such as selective reporting and p-hacking (e.g., John et al., 2012).
Moreover, it has been suggested that the results of studies reporting low-powered tests are more likely to be "tainted" by QRPs than those of studies with higher power (Bakker et al., 2012; Schimmack, 2012). The reasoning behind this is that researchers who do not obtain a significant result will, with a certain probability, use QRPs to nevertheless produce and report "publishable" significant findings. Researchers who conduct underpowered studies are more likely to obtain nonsignificant results and are thus more likely to be tempted to use QRPs than researchers who conduct appropriately powered studies. Crucially, however, this reasoning holds only in cases in which the alternative hypothesis is, indeed, true. When the alternative hypothesis is false, low- and high-powered studies have the same probability (1 − α) of producing nonsignificant results and are at the same risk of being affected by QRPs. Thus, maybe surprisingly, low power increases the likelihood that a significant result was produced by QRPs only in those cases in which it reflects a true finding anyway.

Does this mean that QRPs are no issue after all? To the contrary, QRPs vastly dilute the evidence one can gain from studies of low and excellent power alike. For illustration, consider the following two-step model for the generation of research "findings." First, a researcher conducts the originally planned hypothesis test. If this produces a significant result, it is reported as such. If the test is nonsignificant, the researcher will, with a certain probability Q, use QRPs to produce and report a significant result anyway. Now, when a researcher reports a significant result, it can reflect either a questionable or nonquestionable finding, depending on whether QRPs were used, and it can be either a true positive (TP) or a false positive (FP), depending on whether the tested hypothesis is true or false. The resulting four possible events have the following probabilities:

P(Nonquestionable TP) = (1 − β) × P(H_true)
P(Nonquestionable FP) = α × P(H_false)
P(Questionable TP) = Q × β × P(H_true)
P(Questionable FP) = Q × (1 − α) × P(H_false).

To demonstrate the effects of QRPs on evidence, it is illustrative to consider the BF produced by a reported significant finding,

BF_\text{sig. rep} = \frac{P(\text{Nonquestionable TP}) + P(\text{Questionable TP})}{P(\text{Nonquestionable FP}) + P(\text{Questionable FP})} = \frac{1 - \beta + Q\beta}{\alpha + Q(1 - \alpha)}.

The relationship between Q and BF_sig. rep is depicted in Figure 2. For all levels of power, evidence decreases sharply with increasing Q. If we assume a high probability that nonsignificant findings will be "embellished" with QRPs, the evidence provided by even highly powered tests is negligibly small. If, for example, we assume a Q larger than 0.30, even a test with maximal power cannot produce a BF_sig. rep larger than 3, the conventional threshold for moderate evidence (a brief numerical sketch of this point is given below, after the discussion of publication bias).

To conclude, the use of QRPs can completely nullify the evidence gained from results. Power plays only a side role in this. A study that shows clear signs of p-hacking or selective reporting cannot be saved by a large sample size (for simulation results showing that large sample sizes are not a protective factor against most p-hacking strategies, see also Stefan & Schönbrodt, 2023). At the same time, low power gives little reason to question the results of an otherwise well-designed, transparently reported, and perhaps preregistered study.

The LPLC Critique in the Presence of Publication Bias

Another phenomenon that severely affects the credibility of research output is publication bias (Franco et al., 2014). It is well established that significant findings are far more likely to be published, leading to an overestimation of effects and even the widespread acceptance of false theories because of spurious evidence. This relates to the application of the LPLC critique inasmuch as it might unduly increase scientists' estimates of the prior probability and subsequently also the posterior probability of research hypotheses. These issues highlight that one's belief in a hypothesis should be informed not only by the number of significant findings reported but also by the plausibility of the hypothesis, its fit with well-established theories, and the quality of previous research.

In view of the ubiquity of publication biases, one could argue that decision makers such as editors and reviewers should apply a "preventive" LPLC critique, denying publication to studies whose tests are, by some standard, considered underpowered. The goal of such a policy would be to prevent an "overflow effect," in which false positives from a large number of small-sample studies (which are quicker and cheaper to conduct than large-sample studies) lead to the establishment of spurious new theories. This is an arguable position, but we note two crucial points. First, this would necessitate an elaborate and clearly communicated policy to decide when power will be considered too low: Given the whole range of possible effect sizes of interest, is the power of a study's tests so low that the field would not want studies like this to be published for preventive reasons? Second, this form of the LPLC critique would inevitably prioritize
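As referenced above, the ceiling that QRPs place on the evidence can be checked numerically from the two-step model. A minimal Python sketch, added for illustration (α = .05; the function name is ours):

    def bf_significant_reported(power, q, alpha=0.05):
        """Bayes factor of a reported significant result when nonsignificant outcomes
        are turned significant via QRPs with probability q:
        BF_sig.rep = (1 - beta + q * beta) / (alpha + q * (1 - alpha))."""
        beta = 1.0 - power
        return (1.0 - beta + q * beta) / (alpha + q * (1.0 - alpha))

    # Even a test with maximal power (beta = 0) falls below the moderate-evidence
    # threshold of 3 once q exceeds roughly .30:
    for q in (0.0, 0.10, 0.30, 0.50):
        print(f"q = {q:.2f}: BF_sig.rep = {bf_significant_reported(1.0, q):.2f}")
    # -> 20.00, 6.90, 2.99, 1.90

With Q = .30, even a test with power 1 yields a BF_sig. rep of about 2.99, just below the conventional threshold of 3 for moderate evidence.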
group. This test would have a power of around .20, which I consider too low. In this situation, I would be willing to accept a power of .80 or larger.

3a. If you are concerned that the current result came about by the use of QRPs, communicate that concern

Sometimes, you might suspect that a study's significant finding results from the use of QRPs. Low power might be one of the factors that led you to this suspicion because the analysis had a low prior probability of finding the effect it claims to detect. However, there should also be other causes for this concern, such as signs of selective reporting, abuse of researcher degrees of freedom, lack of (or imprecise) preregistration, and so on. If this is the case, it is important that you clearly explain the reasons for your concerns. You might also name your conditions for accepting the current results.

For example,

I am concerned that the reported result could have been obtained through questionable research practices, especially given that the presented analysis had such a low probability of detecting an effect even if it were there. My concerns are further corroborated by [further evidence for QRPs]. I kindly ask the authors to provide further evidence for their claims. As it is now, a preregistered replication would be necessary to convince me of this result.

3b. If you are not concerned about QRPs, assess the result in view of all available information

In the absence of concerns about QRPs, a significant result from an ostensibly underpowered test should be assessed at face value. At this stage, you should openly discuss whether, in your view, the result provides convincing evidence for the truth of the investigated hypothesis. Here, you could also make use of Equation 1 to assess the posterior probability.

For example,

Given previous results, I would say that this hypothesis had a 50% chance of being true. With a power of about .5, the test was certainly not optimally powered. However, its significant result still gives the hypothesis a posterior probability of about 90%. This is enough evidence to make this result worthwhile for future research.

I would give this research hypothesis only a 10% chance of being true. The significant result from the present test, which I assume to have had a power of .2, raises this probability only to about 30%, which is not enough to convince me. With a power of .80, a significant result would raise the posterior probability to above 60%, which would make me consider it further.

I am highly skeptical of the claims the authors are making, as they are not at all supported by previous research. I would give the authors' research hypothesis a prior probability of being true of at most 1%. The significant result from the present test, which I assume to have had a power of .20, raises this probability only to about 4%. But, to be fair, even the significant result from a perfectly powered test would give this hypothesis a posterior probability of only ≈17%. In my opinion, the authors need to provide much more evidence, such as additional experiments with more stringent tests, to make the claims they are making.

Conclusion

There is no question that the ubiquity of underpowered tests is a pressing problem for psychological science. However, we consider it crucial that low power is criticized for the correct reasons. Whenever a study can be constructively critiqued before its implementation, low power should be pointed out and improvements demanded. However, as we have outlined in this article, it is less straightforward to justify the critique of low power after a positive result has been obtained. Most important, low power should not be used as a proxy concern when there are deeper concerns about the trustworthiness of the reported results and the possible use of QRPs. When suspicions about QRPs arise, they should be communicated directly as such whenever possible. This way, it is ensured that issues of scientific integrity and transparency are not mistaken for issues of statistical power.

Transparency

Action Editor: Yasemin Kisbu-Sakarya
Editor: David A. Sbarra
Author Contributions
Lukas L. Lengersdorff: Conceptualization; Formal analysis; Visualization; Writing – original draft; Writing – review & editing.
Claus Lamm: Conceptualization; Supervision; Writing – review & editing.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Open Practices
This article has received the badge for Open Materials. More information about the Open Practices badges can be found at http://www.psychologicalscience.org/publications/badges