The Theory Crisis in Psychology: How to Move Forward

Article in Perspectives on Psychological Science · January 2021


DOI: 10.1177/1745691620970586



ASSOCIATION FOR PSYCHOLOGICAL SCIENCE

Perspectives on Psychological Science
1–10
© The Author(s) 2021
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/1745691620970586
https://doi.org/10.1177/1745691620970586
www.psychologicalscience.org/PPS

Markus I. Eronen¹ and Laura F. Bringmann²

¹Department of Theoretical Philosophy and ²Department of Psychometrics and Statistics, University of Groningen

Abstract
Meehl argued in 1978 that theories in psychology come and go, with little cumulative progress. We believe that this
assessment still holds, as also evidenced by increasingly common claims that psychology is facing a “theory crisis”
and that psychologists should invest more in theory building. In this article, we argue that the root cause of the theory
crisis is that developing good psychological theories is extremely difficult and that understanding the reasons why it
is so difficult is crucial for moving forward in the theory crisis. We discuss three key reasons based on philosophy of
science for why developing good psychological theories is so hard: the relative lack of robust phenomena that impose
constraints on possible theories, problems of validity of psychological constructs, and obstacles to discovering causal
relationships between psychological variables. We conclude with recommendations on how to move past the theory
crisis.

Keywords
theory, phenomena, robustness, validity, causation

Corresponding Author:
Markus I. Eronen, Department of Theoretical Philosophy, University of Groningen
E-mail: [email protected]

In recent years, more and more authors have called attention to the fact that the theoretical foundations of psychology are shaky (e.g., Fiedler, 2017; Gigerenzer, 2010; Klein, 2014; Muthukrishna & Henrich, 2019; Oberauer & Lewandowsky, 2019; Reber, 2016; Robinaugh et al., 2020; van Rooij, 2019). The claim is that psychological theories are in general of poor quality and that the focus in psychology should shift more toward developing better theories instead of (just) improving statistical techniques and practices and performing more replication studies. In other words, we are facing a “theory crisis” that is more fundamental than the replication crisis that has received far more attention (Muthukrishna & Henrich, 2019; Oberauer & Lewandowsky, 2019; Reber, 2016).

This point is of course not new but notably was also emphasized by Paul Meehl throughout his career (e.g., Meehl, 1967, 1978, 1990). Meehl pointed out that psychological scientists are fond of developing new theories, but instead of resulting in cumulative theoretical progress, these theories tend to just come and go: Theories are neither decisively refuted nor accepted as part of established knowledge; they simply hang around until they are abandoned or forgotten. He mentions as examples theories of “level of aspiration” and “risky shift,” which were received with much enthusiasm in the 1930s and 1960s, respectively, but are now largely forgotten.

In the 40 years that have passed since Meehl’s (1978) classic article, the role of theories in psychology has not changed much. For example, the book ABC of Behavior Change Theories lists 83 theories in the field of behavior change alone, ranging from self-regulation and self-efficacy theories to ecological models (Michie et al., 2014).¹ It is safe to assume that none of these theories is universally accepted or decisively refuted. As a more specific example, consider ego-depletion theory (Baumeister et al., 1998, 2000). After a period of great enthusiasm, this theory has been heavily criticized in recent years, and currently there is no conclusive evidence either for or against it (Friese et al., 2019).

An explanation for the lack of theoretical progress in psychology is that psychological theories tend to be formulated so vaguely or abstractly that it is difficult to falsify or test them (Meehl, 1978, 1990). Moreover, even when a theory is found to be deficient and unable to explain some phenomena, psychological scientists often continue to use it, focusing on its past successes (e.g., the Rescorla-Wagner model of classical conditioning;
Miller et al., 1995). These factors result in a plethora of coexisting and overlapping psychological theories that are known to be deficient but have not been decisively falsified (Meehl, 1990). Therefore, a common theme in the recent literature on the theory crisis is that psychological theories should be improved by making them more formal and precise or by teaching psychologists how to build better theories (e.g., Gigerenzer, 2010; Muthukrishna & Henrich, 2019; Oberauer & Lewandowsky, 2019; van Rooij & Baggio, 2020).

We find these efforts important and laudable. However, in this article, we take a different approach. We argue that the core of the problem is that developing good psychological theories is extremely difficult and that understanding the reasons why it is so difficult is a crucial first step in making progress in the theory crisis. In other words, the problem is not (just) that psychological scientists do not put enough effort into developing theories or do not know how to build theories but that there are great obstacles to building good psychological theories because of the nature of the subject matter. To explain and analyze these obstacles, we draw from recent philosophy of science.

With this approach, we follow in Meehl’s footsteps: In the article that is the focal point of this special issue (Meehl, 1978), he provided a list of difficulties that make human psychology hard to study scientifically. However, Meehl was naturally relying on the philosophy of science of the day, and since then there have been many developments that are highly relevant for the theory crisis, especially in understanding the nature of data, theories, and causality. We draw from these developments in philosophy of science and discuss three key reasons for why developing good psychological theories is so hard: the lack of constraints on theories by robust phenomena, problems of validity of psychological constructs, and obstacles to discovering causal relationships between psychological variables.

Phenomena as Constraints for Theories

In this section, we argue that phenomena constrain theory development in science, but that in psychological science, there is not enough knowledge of robust phenomena to impose sufficient constraints. To start with, in philosophy of science, it is common to distinguish among data, phenomena, and theories (Bogen & Woodward, 1988; Haig, 2013; Woodward, 1989). Data are the raw observations based on experiments or data collection: In the case of psychological science, they can be, for example, responses to questionnaires or observations of behavior. Data serve as evidence for phenomena, which are relatively stable features of the world: For example, the data from different Stroop task experiments provide evidence for the Stroop effect. If we then want to explain the phenomena, we need theories that describe how they come about.²

This framework is well established and has been applied to psychological science (Borsboom et al., 2021; Haig, 2013). However, the relationships between theories and phenomena are usually discussed only as “one-way traffic”: A theory is formulated to explain phenomena, and therefore it should be possible to derive or predict the relevant phenomena from the theory. For example, a central (and, in our view, valid) argument in the theory debate in psychological science is that psychological theories are so vaguely formulated that they do not make precise predictions regarding phenomena (e.g., Oberauer & Lewandowsky, 2019). What has received far less attention in this debate is that this relationship is bidirectional: Phenomena also impose constraints on the possible theories (Bechtel & Richardson, 1993; Craver & Darden, 2013). In other words, a theory has to be consistent with all the relevant phenomena of the field, which narrows the space of possible theories.

Let us illustrate this with an example. Before introducing the theory of evolution, Charles Darwin had gathered an immense amount of descriptive evidence (Browne, 2006; Darwin, 1859; Rozin, 2001). During his famous voyage on the H.M.S. Beagle (lasting nearly 5 years), he made numerous observations and wrote them down in his notebooks, which in the framework described above correspond to data. From these data he derived interesting patterns, such as the distribution of different but very similar bird species on the islands of the Galapagos. Over the years after his return, Darwin intensively studied a broad range of topics, including selective breeding, the fossil record, and the samples he had collected during the voyage. In all of these areas, he found phenomena suggesting that species have common ancestors and are selected by nature in a manner analogous to selective breeding. He wrote the Origin of Species, a large part of which consists of detailed descriptions of the various lines of evidence, on the basis of these findings (Browne, 2006; Darwin, 1859).

Importantly, this evidence was not only diverse but also highly robust: The phenomena were verifiable and detectable in several independent ways and not dependent on a specific theoretical framework or observation method (Eronen, 2015, 2019; Kuorikoski & Marchionni, 2016; Munafò & Smith, 2018; Wimsatt, 2007). For example, the patterns of the evolution of traits could be observed in the selective breeding of pigeons, cattle, and dogs, and any other researcher could in principle confirm these patterns. These phenomena were therefore generally agreed on in the scientific community and
imposed very strong constraints on the space of possible theories. A theory of evolution had to fit with not just one or two of these robust patterns but with all of them.

The history of astronomy provides an even more striking example of the constraints that phenomena impose on theories. In this case, the relevant phenomena were the patterns in the movement of celestial objects (most importantly the moon and the planets). These patterns were based on centuries of observations and highly robust; the problem was coming up with a theory that satisfied the stringent constraint imposed by the phenomena (Hoskin, 1997). Ptolemy’s geocentric model, according to which planets followed complex epicycle-based trajectories, survived for centuries partly because it was extremely difficult to come up with a theory that would have fit the phenomena better or equally well (Hoskin, 1997). Thus, when Copernicus and Galileo developed their heliocentric theories, the space of possible theories was very strongly constrained by the phenomena. The constraints on contemporary theoretical physics are even more extreme: There is a vast body of robust and undisputed patterns ranging from particle physics to astronomy, and any new theory of physics needs to be consistent with all of these patterns.

The situation in psychological science is very different. To see this, let us recall the distinction between data and phenomena. In psychological science, there is an increasing amount of data available from questionnaires, wearable devices, Internet behavior, and so on. However, these data are often of questionable quality (see the next section), and many areas of psychological science still have no large body of robust phenomena comparable to that of biology or physics.

As an example, consider the ego-depletion effect (Baumeister et al., 1998, 2000): the phenomenon that people perform worse on a task requiring self-control (e.g., solving a difficult puzzle) after having previously engaged in a task requiring self-control (e.g., resisting the temptation to eat cookies). The original and highly influential theory explaining this phenomenon is the strength (or muscle, or resource) model of self-control, according to which self-control is a limited and domain-general resource that is used by any tasks that require self-control and can be depleted (Baumeister et al., 1998, 2000).

Hundreds of studies that seem to support this theory have been published (Inzlicht & Friese, 2019). However, in recent years, both the ego-depletion effect itself and the theory behind it have been called into question (Friese et al., 2019). In a multilab preregistered replication study (Hagger et al., 2016), little evidence for ego depletion was found: The overall effect size was small (d = 0.04), and for most of the participating laboratories, the 95% confidence intervals of the effect size included zero. The authors concluded that “if there is any effect, it is close to zero” (p. 558). Moreover, it has been pointed out that even if the effect is real, the available evidence is compatible with other theories in addition to the strength model of self-control (Inzlicht & Friese, 2019). For example, in the process model proposed by Inzlicht and Schmeichel (2012), the ego-depletion effect is explained by reduced motivation and shifts in attention instead of a generic resource that is depleted.

Importantly, this is not an isolated example. The numerous replication failures of findings in psychology, even phenomena that were thought to be well established (e.g., stereotype threat, neonatal imitation, various priming effects; Bird, 2018), suggest that the situation is similar in other areas of psychology (Inzlicht & Friese, 2019). In other words, in many areas of psychology, there is no broad range of robust phenomena that would impose strong constraints on theories. This means that the possible theories are underdetermined by evidence: The available evidence (i.e., the relevant phenomena) is not sufficient to determine which theory we should believe to be true (Stanford, 2017).³ In this light, it is not surprising that little theoretical progress has been made in areas of psychology in which relatively few robust phenomena have been established.

Psychological Constructs and Epistemic Iteration

Another important factor explaining why there are so few good theories in psychology is the lack of attention on improving and validating psychological constructs. In the psychological literature, we find a large and increasing number of psychological constructs. New constructs and corresponding scales are constantly introduced, new terms are invented for what seem to be old constructs, the same term is used for apparently different constructs, and so on (Hagger, 2014). For example, in her review of constructs in the psychological literature on control, Ellen Skinner (1996) found more than 30 constructs related to perceived control alone, and since then many more have been introduced (Hagger, 2014).

In principle, to be acceptable scientific constructs, all of these psychological constructs should have construct validity. The notion of construct validity was introduced by Cronbach and Meehl (1955), and its meaning has greatly evolved and ramified in the decades that followed (Newton & Shaw, 2013). Some of the core ideas are that the construct should be embedded in a theoretical framework (or a “nomological network” as originally phrased by Cronbach & Meehl, 1955) and that measurements of the construct should be valid in the sense that
they measure what they are intended to measure (Borsboom et al., 2004).

The problem is that although it is widely agreed that construct validity is crucially important, in practice psychological scientists give it very little attention compared with measures such as reliability. For example, Flake et al. (2017) reviewed a random sample of articles published in Journal of Personality and Social Psychology and found that most of the articles reviewed reported no validity evidence whatsoever for the constructs used. When evidence was reported, it typically consisted only of a citation to another article. Likewise, the articles collected in Zumbo and Chan (2014) show that psychological scientists tend to report relatively little validity evidence and focus much more on other psychometric properties, most importantly reliability. The simplest explanation for this is that providing reliability evidence is relatively easy, whereas providing validity evidence is very hard. For the former, there are well-established and quantified measures, such as Cronbach’s α. For the latter, there is no simple quantitative measure, and there is not even agreement on what construct validity is or what validity evidence should amount to (Newton & Shaw, 2013). If construct validity is understood in terms of the phrase “the test should measure what it is intended to measure,” which often appears in textbooks and guidelines, then establishing validity requires showing that variation in the attribute of interest is actually causing the variation in the test scores (Bringmann & Eronen, 2016; Borsboom et al., 2004). As construct validation of this kind is hardly ever done, the result is that psychological science is permeated by numerous psychological constructs of unknown validity (Flake et al., 2017; Fried & Flake, 2018).

Ego-depletion research is a prime example of this. As Lurquin and Miyake (2017) point out, the key concept “self-control” has never been clearly defined or operationalized. It is often used very broadly to refer to any kind of (inhibitory) control over thoughts, emotions, or actions without further specifying the nature of this control (Lurquin & Miyake, 2017). Moreover, the setups that are used to measure or manipulate self-control in ego-depletion studies have never been validated (Inzlicht & Friese, 2019). In a recent study, Wimmer et al. (2019) systematically tested one of the most widely used tasks to induce ego depletion, the letter-cancellation task, in which participants have to cross off letters following complex rules. They did not find any evidence that this task would affect self-control or inhibitory control (Wimmer et al., 2019).

As an example from clinical psychology, consider major depressive disorder (MDD). The definition of MDD stems from the 1970s and has not essentially changed since then, although it is increasingly clear that the validity of the construct is problematic (De Jonge et al., 2015; Fried, 2017). For example, because there is great heterogeneity in different cases of MDD (e.g., two individuals can have MDD without sharing a single symptom), it is doubtful that MDD in itself is a well-defined category (Fried, 2017). In addition, the numerous scales that are used to measure MDD often have little content overlap, making it unclear whether they are really measuring one and the same construct (Fried, 2017; Fried & Flake, 2018).

It is illuminating to contrast these examples with the natural sciences. Concepts or classifications in the natural sciences are constantly refined through further experiments and observations and by improving the theoretical framework in which they are embedded. A concept that is initially rough and poorly defined (e.g., the commonsense notion “fish”) is refined and reconceptualized (e.g., into the concept “Pisces” in the traditional Linnaean taxonomy of species, defined roughly as finned animals perpetually living in water), and then the new version is again tested and adjusted on the basis of new theories and evidence (e.g., “Pisces” is no longer considered a scientific category but has been divided into several distinct classes on the basis of evolutionary relationships).

Examples of this are abundant in the sciences: For example, the concept “electron” was introduced to physics in the 1890s, and it initially meant an elementary unit of electric charge, but since then its meaning has evolved through experiments and theoretical advances such as the quantum theory, and now “electron” refers to an elementary particle that is a fermion, has a charge of −1, spin of 1/2, and so on. Chang (2004, 2016) calls this process “epistemic iteration” and characterizes it as “a process in which successive stages of knowledge, each building on the preceding one, are created in order to enhance the achievement of certain epistemic goals” (Chang, 2004, p. 224).

In contrast, in psychology, this kind of iteration is not the norm, although official guidelines emphasize the importance of validation and how it should be seen as an ongoing process (Flake et al., 2017). There are, however, some positive exceptions (see also Kendler, 2012). For example, when Ebbinghaus pioneered the scientific study of memory in the 1880s, he was treating “memory” as a monolithic commonsense notion and did not distinguish between different kinds of memory (Tulving, 2007). In subsequent research, especially starting from the 1950s, many different kinds of memory have been introduced, such as nondeclarative and declarative memory, the latter of which can be further divided into episodic and semantic memory (Michaelian
& Sutton, 2017). The different categories and kinds of memory are not fixed but are still refined and debated in light of new evidence and arguments (Tulving, 2007).

One practical reason why psychological constructs are often so resistant to change is “generative entrenchment,” a concept coined and developed by William Wimsatt (1986, 2007). Once a concept has many other concepts, theories, or practices depending on it, it becomes “entrenched” and will be very difficult to change, even if it is known to be deficient or problematic. This is because changing the concept could collapse the structures depending on it, leading either to a disaster or a revolution (Wimsatt, 2007, p. 140). Psychological constructs (especially in clinical psychology) often become deeply entrenched over time, as they have applications not only in other theories and models but also in society at large. For example, constructs such as MDD play an important role in diagnosing patients or in making decisions about health insurance.

However, epistemic iteration and validation of psychological constructs is crucially important for finding a way out of the theory crisis. As we argued in the previous section, the basis for good theories is robust phenomena. Phenomena, in turn, are inferred from data, and if the data are based on constructs and measurements that for the most part have not been well understood or validated, the phenomena that are inferred are unlikely to be robust. In other words, one source for the lack of robust phenomena in psychology is the lack of emphasis on the process of construct validation.

Psychological Theories and the Problem of Finding Causes

The third reason why there are so few good theories in psychology is that finding psychological causes is extremely challenging. It is widely agreed that a key feature of good theories is that they should, in one way or another, track causal relationships (e.g., Craver, 2007; Pearl, 2000; Woodward, 2003). For example, Darwin’s theory of evolution described the causes of evolution (natural selection), and the DNA theory describes the causal mechanism of inheritance. In this light, it is reasonable to require that psychological theories, insofar as they aim to explain how the mind works, should also reflect the causal mechanisms of the mind (Bechtel, 2008; Thomas & Sharp, 2019). In other words, they should capture causal relationships between psychological variables.

The problem, however, is that discovering causal relationships between psychological variables is often extremely difficult or impossible, as extensively argued in Eronen (2020). To explain why, we rely on the framework of the interventionist theory of causation (Woodward, 2003, 2015; see also Pearl, 2000, 2009), which lays out the conditions for inferring causal relationships in a clear and general way.

The characteristic feature of causal relationships is that (unlike correlations) they are relationships that are exploitable for manipulation and control: Intervening on the cause is a way of bringing about a change in the effect. The interventionist theory takes this as the starting point and defines causation (roughly) as follows: X is a cause of Y if (and only if) it is possible to intervene on X to change Y when other variables are held fixed to their values. The intervention should be an unconfounded manipulation of X with respect to Y: The manipulation of X should not change Y via any other route that does not go through X (for more precise definitions, see Eronen, 2020; Woodward, 2003). It is not always necessary to actually perform an intervention; sometimes it is possible to gain knowledge about the effects of interventions indirectly, for example, on the basis of observational data. The same ideas also appear in different forms in other approaches to causation that are more familiar to psychological scientists, such as Rubin’s causal model (e.g., Rubin, 2005) or Campbell’s causal model (e.g., Shadish et al., 2002).

Randomized controlled trials are usually taken to be the “gold standard” for causal inference and for satisfying the above conditions. For example, in a drug trial, participants are randomly assigned to treatment and control groups, and this randomization generates the effect of “holding fixed” other variables than the cause (the drug) and the effect (recovery). The intervention of administering the drug to participants in the treatment group should be unconfounded: For example, there should not be other ingredients in the pill that would affect recovery through a causal route that goes around the drug itself.

Many psychological experiments involve the manipulation of nonpsychological causes, such as drugs, educational materials, or visual and auditory stimuli (Eronen, 2020). In such cases, performing the right kinds of interventions is in principle not more difficult than in other fields. Therefore, the following arguments do not concern the venerable experimental tradition, going all the way back to Wilhelm Wundt, of manipulating external independent variables and tracking their psychological effects. However, if the aim is to develop substantive psychological theories that describe causal mechanisms of the mind, establishing causal relationships between external independent variables and psychological variables is not enough: We also need to learn causal relationships between psychological variables. And to do this, we need to learn about the effects of interventions on psychological variables.
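
The contrast drawn above between a correlation and a relationship that supports intervention can be sketched with a small simulation. This is a minimal illustration under stated assumptions, not an analysis from the cited literature: the variable names, the linear toy model with a single common cause Z, the coefficients, and the helper functions `pearson` and `xy_correlation` are all hypothetical choices made for this example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def xy_correlation(n, randomize_x, effect_of_x_on_y, seed):
    """Correlation between X and Y in a hypothetical toy model.

    Z is a common cause of X and Y. If randomize_x is True, X is set by
    random assignment (an unconfounded intervention), so X no longer
    depends on Z.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)              # common cause of X and Y
        if randomize_x:
            x = rng.gauss(0, 1)          # X assigned independently of Z
        else:
            x = z + rng.gauss(0, 1)      # X partly determined by Z
        y = effect_of_x_on_y * x + z + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


# Assumed scenario: X has no causal effect on Y at all.
observational_r = xy_correlation(20000, randomize_x=False,
                                 effect_of_x_on_y=0.0, seed=1)
interventional_r = xy_correlation(20000, randomize_x=True,
                                  effect_of_x_on_y=0.0, seed=1)

print(f"observational r  = {observational_r:.2f}")
print(f"interventional r = {interventional_r:.2f}")
```

In this toy model the observational correlation between X and Y is substantial (its population value is 0.5) even though X does not cause Y, whereas under random assignment of X the correlation is close to zero. Randomization plays the role of "holding fixed" the other variable Z, which is the sense in which an unconfounded intervention licenses a causal conclusion.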
The problem with interventions on psychological variables is that they are typically “fat-handed” (Eronen, 2020)⁴: They do not change just the one variable that is targeted but several other variables as well. This is because there is no direct way of manipulating psychological variables such as thoughts or affects (Chiesa, 1992; Hughes et al., 2016). Instead, they have to be manipulated indirectly via verbal instruction or other external stimuli, and such techniques are typically not precise enough to change just one variable. For example, it is (at least currently) impossible to manipulate feelings of loss of control without changing any other psychological states, such as motivation, attention, or feelings of anxiety. Moreover, psychological variables can be measured only indirectly, for example, on the basis of self-reports or behavioral proxies (De Houwer, 2011). This makes it very difficult to verify or check what variables the intervention precisely changed and therefore to what extent it was fat-handed.

This creates a problem for finding psychological causes because when interventions are fat-handed, we cannot assume that they are unconfounded manipulations that license causal inferences. More specifically, we cannot assume that they change putative effect Y only via a route that goes through the putative cause X. To illustrate this, let us again focus on ego-depletion research. In ego-depletion experiments, self-control is manipulated in very diverse ways (e.g., by letting participants engage in a complex or frustrating task or game or by letting them resist the temptation to eat delicious food; Friese et al., 2019). To warrant the conclusion that self-control is the cause of impaired performance in the second task, these interventions should be unconfounded manipulations of self-control with respect to the putative effect (i.e., impaired performance in the second task). In other words, they should change self-control in such a way that other possible causes of the effect are not affected (e.g., motivation, attention, feelings of anger). However, given the rather general nature of the interventions and our lack of knowledge of the causal structure of self-control and related constructs (motivation, attention, etc.), we cannot realistically assume that this is the case (Friese et al., 2019). For example, resisting the temptation to eat cookies might also affect motivation or induce feelings of anger and frustration. This means that ego-depletion experiments do not provide sufficient evidence that a diminished self-control resource is the cause for the impaired performance in the second task, which is indeed in line with the conclusion reached in recent reviews of the state of the research (Friese et al., 2019; Inzlicht & Friese, 2019).

In sum, interventions on psychological variables are likely to be fat-handed, and such interventions do not provide a reliable basis for causal inference. The experimental tradition of manipulating external factors and tracking their psychological effects cannot simply be extended to manipulate psychological variables, as interventions on psychological variables are entirely different in kind and far more difficult than interventions on external variables (see also Chiesa, 1992; De Houwer, 2011). Insofar as psychological theories should track causal relationships, this is an important factor in explaining why there are so few good theories in psychology and why they are so difficult to develop.

Discussion

In this article, we have discussed three fundamental difficulties in developing good psychological theories: the lack of (sufficient) robust phenomena, the lack of validity and epistemic iteration for psychological constructs, and the problem of establishing psychological causes. These issues should be addressed and discussed to make progress in resolving the theory crisis. We now outline several recommendations for psychological research on the basis of these issues.

First, our discussion supports the recent calls for more “phenomena detection” or “phenomenon-driven research” in psychology (Borsboom et al., 2021; De Houwer, 2011; Haig, 2013; see also Trafimow & Earp, 2016). By discovering new phenomena and gathering more robust evidence for those already discovered, the space of possible theories will be constrained.

Another important reason to support phenomenon-driven research is that phenomena can also be extremely important for science and society as such (Eronen, 2020): Consider, for example, the broad range of cognitive biases that psychologists have discovered, such as confirmation bias, most of which are very robust phenomena (Gilovich et al., 2002). Various theories have been proposed to explain these phenomena, such as the attribute-substitution theory, according to which people substitute difficult computations with simple heuristics, or the more general dual-system theory (Kahneman & Frederick, 2002). However, these theories are far more controversial than the phenomena themselves. Moreover, knowing that these phenomena exist is extremely important for science and society, even if we do not know the theory or mechanism behind them. The same holds for a broad range of other robust phenomena discovered in psychology, for example, the phenomenon that people tend to prefer familiar stimuli to unfamiliar ones (i.e., the mere-exposure effect; Bornstein, 1989). Simply knowing that these phenomena exist and describing them is useful, even in the absence of an accepted theory that would explain them.

In addition to being discovered and being described, phenomena can also be further analyzed by looking for shared abstract structures in different phenomena (Hughes et al., 2016). For example, at an abstract level, phenomena as different as constantly checking your phone and rewarding the good behavior of children with candy can both be seen as instances of (positive) reinforcement (Hughes et al., 2016). For all of these reasons, phenomena detection should be seen as an important goal in itself and as a central part of psychological research (see also Fiedler, 2017; Haig, 2013; Rozin, 2001).

However, we by no means intend to suggest that theorizing in psychology is hopeless or a waste of resources or that we should return to a kind of behaviorism in which theories about mental processes are rejected as unscientific. The issues we have raised should not be seen as insurmountable obstacles but rather as challenges that need to be met before good psychological theories can be developed in a given domain.

This brings us to our next point: It is doubtful whether making psychological theories more mathematical or formal, which is a common theme in the recent literature (e.g., Borsboom et al., 2021; Muthukrishna & Henrich, 2019; Oberauer & Lewandowsky, 2019; van Rooij & Baggio, 2020), will lead to significant advances in psychology as a science.⁵ None of the problems we have discussed is solved by formalizing psychological theories: There will still be no large body of robust phenomena to constrain the theories, the constructs used do not become more valid, and a formal treatment alone does not solve the problem of causality and fat-handed interventions. Moreover, many successful and extremely important theories in the life sciences are not formalized or mathematical theories (e.g., the fermentation theory or the theory of synaptic transmission; Bechtel & Richardson, 1993; Machamer, Darden & Craver, 2000). As pointed out by Rozin (2001; see also Morey et al., 2018), using complex statistical and computational models does not make psychology more scientific and can even be counterproductive if the conceptual and empirical basis (e.g., robust phenomena) is not yet solid.

Finally, it is hard to overemphasize the importance of having clearly and transparently defined concepts as the basis for theories. Note that this is not the same as formalization of theories: Concepts can be well defined in qualitatively formulated theories as well (e.g., Darwin’s theory of evolution), and formal theories can have poorly defined concepts as their elements (e.g., models in memetics that have a clear mathematical structure but for which the central concept “meme” is not well defined; Kronfeldner, 2011). Conceptual clarification and construct validation should be seen as important and valuable parts of research, and validation should be taken to be an iterative and ongoing process instead of just a hurdle that needs to be crossed. In our view, strengthening the conceptual basis of psychological theories is at least as important as improving statistical techniques and practices in psychological research.

In the long run, this will also help with the problem of causal inference, as having clearly defined and clearly measurable constructs makes it easier to perform targeted interventions and to track their effects. With sufficiently well-defined constructs and valid measurements, it may also be possible to eventually infer causal relationships from purely observational data (for more, see, e.g., Eronen, 2020; Rohrer, 2018). Another possible reaction to the problem of finding psychological causes is to develop noncausal theories, for example, in the form of abstract functional principles extracted from phenomena (De Houwer, 2011; Hughes et al., 2016), although whether noncausal theories can be truly explanatory is a matter of ongoing debate (see, e.g., Reutlinger & Saatsi, 2018).

Fortunately, there are ongoing research programs in psychology that exemplify the good practices we have described above. For example, after the recent disappointments in ego-depletion research, there are now increasing efforts to better define the key constructs, such as self-control and related concepts, and to validate different ways of measuring them (Friese et al., 2019; Inzlicht & Friese, 2019; Lurquin & Miyake, 2017). A broader example is the functional-cognitive paradigm (De Houwer, 2011; Hughes et al., 2016) that aims at first establishing environment-behavior relations (robust phenomena) and then formulating explanations for them in terms of clearly defined mental constructs that act as mediators. Finally, as a more concrete example, Robinaugh et al. (2020) propose a theory for panic disorder that is tailored to this specific disorder and thereby constrained by phenomena (there is robust evidence for many central phenomena related to panic attacks), and the authors also explicitly focus on defining the key concepts.

To conclude, we believe that the most fundamental factor underlying the theory crisis is that the subject matter itself, psychology, makes it very hard to develop good theories (Meehl, 1978). Drawing on contemporary philosophy of science, we have discussed three central challenges to developing psychological theories: There are often not enough robust phenomena to constrain theories, not enough attention is paid to defining and validating constructs, and establishing psychological causes is very hard. We hope that this article brings more attention to these crucial issues and thereby helps to provide more solid building blocks for the theoretical foundations of psychology.
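The problem of fat-handed interventions discussed earlier can be made concrete with a small simulation. The following Python sketch is purely illustrative and is not part of the analyses in this article: it assumes a toy linear model in which second-task performance depends only on motivation, so that self-control has no causal effect at all. The function names (`simulate_performance`, `estimated_effect`), the effect sizes, and the sample size are all hypothetical choices made for the illustration.

```python
import random
import statistics


def simulate_performance(n, shift_self_control, shift_motivation, seed):
    """Return mean second-task performance for n simulated participants.

    Toy assumption: performance depends on motivation only; the
    coefficient of self-control is set to zero on purpose.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        # Putative cause X (self-control), shifted by the manipulation.
        self_control = rng.gauss(0, 1) + shift_self_control
        # Another possible cause M (motivation), shifted only if the
        # manipulation is fat-handed.
        motivation = rng.gauss(0, 1) + shift_motivation
        # Putative effect Y: no causal contribution from self-control.
        performance = 0.0 * self_control + 1.0 * motivation + rng.gauss(0, 1)
        scores.append(performance)
    return statistics.mean(scores)


def estimated_effect(shift_self_control, shift_motivation, n=20000, seed=1):
    """Difference in mean performance: manipulated group minus control."""
    control = simulate_performance(n, 0.0, 0.0, seed)
    treated = simulate_performance(n, shift_self_control, shift_motivation, seed + 1)
    return treated - control


# Surgical intervention: lowers self-control and nothing else.
surgical = estimated_effect(shift_self_control=-1.0, shift_motivation=0.0)
# Fat-handed intervention: lowers self-control AND motivation.
fat_handed = estimated_effect(shift_self_control=-1.0, shift_motivation=-0.8)

print(f"surgical intervention:   {surgical:+.2f}")
print(f"fat-handed intervention: {fat_handed:+.2f}")
```

Under these assumptions, the surgical manipulation should yield a difference close to zero, whereas the fat-handed manipulation yields a clear drop in performance even though self-control plays no causal role in the model. An experimenter who observes only the second result, and who cannot tell which variables the manipulation changed, could easily misattribute the drop to self-control.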

Transparency
Action Editor: Travis Proulx and Richard Morey
Advisory Editor: Richard Lucas
Editor: Laura A. King

Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.

ORCID iDs
Markus I. Eronen https://orcid.org/0000-0003-2028-3338
Laura F. Bringmann https://orcid.org/0000-0002-8091-9935

Acknowledgments
We thank Freek Oude Maatman and Sven Ulpts for helpful comments on an earlier draft. We are also very grateful to Travis Proulx and the two reviewers for their extensive and constructive feedback.

Notes
1. This example is borrowed from Lakens (2019).
2. Because there is no consensus on the definition of “theory,” we use the term very broadly in this article to include also models, nonquantitative theories, and descriptions of mechanisms.
3. Meehl (1990) made an analogous point regarding the testability of psychological theories:

There exists an implicit misconception, ubiquitous among students and professors studying soft areas . . . This misconception is that, if a theoretical conjecture is “scientifically meaningful” (not theological or metaphysical or so vague as to cover anything), then it must be possible to test it at the present time. Even a slight familiarity with the history of astronomy, physics, chemistry, medicine, and genetics shows that such a metatheoretical notion is plainly false. . . . The most dramatic example from biological science in recent times, and one of the two or three greatest scientific discoveries ever made, is Crick and Watson’s theory of the DNA. No amount of theoretical ingenuity would have enabled them to do this, let alone test it, until chemical methods were sufficiently precise to be able to show that in any organism the adenine and thymine are always precisely equal in the number of molecules present, as are the guanine and cytosine. (p. 239)

4. The notion of fat-handed interventions was introduced to the philosophy of psychology by Baumgartner and Gebharter (2016) and Romero (2015) as an alternative to Craver’s mutual manipulability criterion for constitutive relevance (Craver, 2007). The kind of fat-handedness that we are discussing in this article is independent from the fat-handedness due to constitution discussed by these authors.
5. Of course, if “formal theories” is understood in a very general sense as theories that are clearly and explicitly formulated and not necessarily quantitative or mathematical in structure, we agree that formal theories are preferable to nonformal ones.

References
Baumeister, R. F., Bratslavsky, E., Muraven, M., & Tice, D. M. (1998). Ego depletion: Is the active self a limited resource? Journal of Personality and Social Psychology, 74, 1252–1265.
Baumeister, R. F., Muraven, M., & Tice, D. M. (2000). Ego depletion: A resource model of volition, self-regulation, and controlled processing. Social Cognition, 18(2), 130–150.
Baumgartner, M., & Gebharter, A. (2016). Constitutive relevance, mutual manipulability, and fat-handedness. The British Journal for the Philosophy of Science, 67, 731–756.
Bechtel, W. C. (2008). Mental mechanisms. Routledge.
Bechtel, W. C., & Richardson, R. C. (1993). Discovering complexity: Decomposition and localization as strategies in scientific research. Princeton University Press.
Bird, A. (2018). Understanding the replication crisis as a base rate fallacy. The British Journal for the Philosophy of Science. Advance online publication. https://doi.org/10.1093/bjps/axy051
Bogen, J., & Woodward, J. (1988). Saving the phenomena. The Philosophical Review, 97(3), 303–352.
Bornstein, R. F. (1989). Exposure and affect: Overview and meta-analysis of research, 1968–1987. Psychological Bulletin, 106(2), 265–289.
Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071.
Borsboom, D., van der Maas, H., Dalege, J., Kievit, R., & Haig, B. (2021). Theory construction methodology: A practical framework for theory formation in psychology. Perspectives on Psychological Science. Advance online publication. https://doi.org/10.1177/1745691620969647
Bringmann, L. F., & Eronen, M. I. (2016). Heating up the measurement debate: What psychologists can learn from the history of physics. Theory & Psychology, 26(1), 27–43.
Browne, J. (2006). Darwin’s origin of species: A biography. Allen & Unwin.
Chang, H. (2004). Inventing temperature: Measurement and scientific progress. Oxford University Press.
Chang, H. (2016). The rising of chemical natural kinds through epistemic iteration. In C. Kendig (Ed.), Natural kinds and classification in scientific practice (pp. 53–66). Routledge.
Chiesa, M. (1992). Radical behaviorism and scientific frameworks: From mechanistic to relational accounts. American Psychologist, 47(11), 1287–1299.
Craver, C. F. (2007). Explaining the brain. Oxford University Press.
Craver, C. F., & Darden, L. (2013). In search of mechanisms: Discoveries across the life sciences. University of Chicago Press.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.
Darwin, C. (1859). On the origin of species by means of natural selection. John Murray.
De Houwer, J. (2011). Why the cognitive approach in psychology would profit from a functional approach and vice versa. Perspectives on Psychological Science, 6(2), 202–209. https://doi.org/10.1177/1745691611400238
De Jonge, P., Wardenaar, K. J., & Wichers, M. (2015). What kind of thing is depression? Epidemiology and Psychiatric Sciences, 24(4), 312–314.
Eronen, M. I. (2015). Robustness and reality. Synthese, 192, 3961–3977.

Eronen, M. I. (2019). Robust realism for the life sciences. Synthese, 196, 2341–2354.
Eronen, M. I. (2020). Causal discovery and the problem of psychological interventions. New Ideas in Psychology, 59, Article 100785. https://doi.org/10.1016/j.newideapsych.2020.100785
Fiedler, K. (2017). What constitutes strong psychological science? The (neglected) role of diagnosticity and a priori theorizing. Perspectives on Psychological Science, 12(1), 46–61. https://doi.org/10.1177/1745691616654458
Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8(4), 370–378.
Fried, E. I. (2017). Moving forward: How depression heterogeneity hinders progress in treatment and research. Expert Review of Neurotherapeutics, 17(5), 423–425. https://doi.org/10.1080/14737175.2017.1307737
Fried, E. I., & Flake, J. K. (2018). Measurement matters. APS Observer, 31(3), 29–30. https://www.psychologicalscience.org/observer/measurement-matters
Friese, M., Loschelder, D. D., Gieseler, K., Frankenbach, J., & Inzlicht, M. (2019). Is ego depletion real? An analysis of arguments. Personality and Social Psychology Review, 23(2), 107–131.
Gigerenzer, G. (2010). Personal reflections on theory and psychology. Theory & Psychology, 20(6), 733–743.
Gilovich, T., Griffin, D., & Kahneman, D. (Eds.). (2002). Heuristics and biases: The psychology of intuitive judgment. Cambridge University Press.
Hagger, M. S. (2014). Avoiding the “déjà-variable” phenomenon: Social psychology needs more guides to constructs. Frontiers in Psychology, 5, Article 52. https://doi.org/10.3389/fpsyg.2014.00052
Hagger, M. S., Chatzisarantis, N. L., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., Brand, R., Brandt, M. J., Brewer, G., Bruyneel, S., Calvillo, D. P., Campbell, W. K., Cannon, P. R., Carlucci, M., Carruth, N. P., Cheung, T., Crowell, A., De Ridder, D. T. D., Dewitte, S., . . . Zwienenberg, M. (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11(4), 546–573. https://doi.org/10.1177/1745691616652873
Haig, B. D. (2013). Detecting psychological phenomena: Taking bottom-up research seriously. The American Journal of Psychology, 126(2), 135–153.
Hoskin, M. (1997). Astronomy in antiquity. In M. Hoskin (Ed.), The Cambridge illustrated history of astronomy (pp. 22–47). Cambridge University Press.
Hughes, S., De Houwer, J., & Perugini, M. (2016). The functional-cognitive framework for psychological research: Controversies and resolutions. International Journal of Psychology, 51(1), 4–14.
Inzlicht, M., & Friese, M. (2019). The past, present, and future of ego depletion. Social Psychology, 50(5-6), 370–378. https://doi.org/10.1027/1864-9335/a000398
Inzlicht, M., & Schmeichel, B. J. (2012). What is ego depletion? Toward a mechanistic revision of the resource model of self-control. Perspectives on Psychological Science, 7(5), 450–463.
Kahneman, D., & Frederick, S. (2002). Representativeness revisited: Attribute substitution in intuitive judgment. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases (pp. 49–81). Cambridge University Press.
Kendler, K. S. (2012). Epistemic iteration as a historical model for psychiatric nosology: Promises and limitations. In K. Kendler & J. Parnas (Eds.), Philosophical issues in psychiatry II: Nosology (pp. 305–322). Oxford University Press.
Klein, S. B. (2014). What can recent replication failures tell us about the theoretical commitments of psychology? Theory & Psychology, 24(3), 326–338.
Kronfeldner, M. (2011). Darwinian creativity and memetics. Routledge.
Kuorikoski, J., & Marchionni, C. (2016). Evidential diversity and the triangulation of phenomena. Philosophy of Science, 83, 227–247.
Lakens, D. [@lakens] (2019, September 20). The Scheel Theorem: Things get more personal in psych because people have their own theory. Consequence: Books like the ABC . . . [Tweet]. https://twitter.com/lakens/status/1174963097158578176
Lurquin, J. H., & Miyake, A. (2017). Challenges to ego-depletion research go beyond the replication crisis: A need for tackling the conceptual crisis. Frontiers in Psychology, 8, Article 568. https://doi.org/10.3389/fpsyg.2017.00568
Machamer, P., Darden, L., & Craver, C. F. (2000). Thinking about mechanisms. Philosophy of Science, 67(1), 1–25.
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037/0022-006X.46.4.806
Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244.
Michaelian, K., & Sutton, J. (2017). Memory. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Summer 2017 ed.). https://plato.stanford.edu/archives/sum2017/entries/memory
Michie, S. F., West, R., Campbell, R., Brown, J., & Gainforth, H. (2014). ABC of behaviour change theories. Silverback Publishing.
Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla-Wagner model. Psychological Bulletin, 117(3), 363–386.
Morey, R., Homer, S., & Proulx, T. (2018). Beyond statistics: Accepting the null hypothesis in mature sciences. Advances in Methods and Practices in Psychological Science, 1(2), 245–258.
Munafò, M. R., & Smith, G. D. (2018). Robust research needs many lines of evidence. Nature, 553, 399–401.
Muthukrishna, M., & Henrich, J. (2019). A problem in theory. Nature Human Behaviour, 3, 221–229.
Newton, P. E., & Shaw, S. D. (2013). Standards for talking and thinking about validity. Psychological Methods, 18(3), 301–319.

Oberauer, K., & Lewandowsky, S. (2019). Addressing the theory crisis in psychology. Psychonomic Bulletin & Review, 26(5), 1596–1618.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge University Press.
Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146.
Reber, R. (2016, April 30). The theory crisis in psychology. Psychology Today. https://www.psychologytoday.com/intl/blog/critical-feeling/201604/the-theory-crisis-in-psychology
Reutlinger, A., & Saatsi, J. (Eds.). (2018). Explanation beyond causation. Oxford University Press.
Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27–42. https://doi.org/10.1177/2515245917745629
Robinaugh, D., Haslbeck, J. M. B., Waldorp, L., Kossakowski, J. J., Fried, E. I., Millner, A., McNally, R. J., van Nes, E. H., Scheffer, M., Kendler, K. S., & Borsboom, D. (2020). Advancing the network theory of mental disorders: A computational model of panic disorder. PsyArXiv. https://doi.org/10.31234/osf.io/km37w
Romero, F. (2015). Why there isn’t inter-level causation in mechanisms. Synthese, 192(11), 3731–3755.
Rozin, P. (2001). Social psychology and science: Some lessons from Solomon Asch. Personality and Social Psychology Review, 5(1), 2–14.
Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469), 322–331.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton-Mifflin.
Skinner, E. A. (1996). A guide to constructs of control. Journal of Personality and Social Psychology, 71(3), 549–570.
Stanford, K. (2017). Underdetermination of scientific theory. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Winter 2017 ed.). https://plato.stanford.edu/archives/win2017/entries/scientific-underdetermination
Thomas, J. G., & Sharp, P. B. (2019). Mechanistic science: A new approach to comprehensive psychopathology research that relates psychological and biological phenomena. Clinical Psychological Science, 7(2), 196–215.
Trafimow, D., & Earp, B. D. (2016). Badly specified theories are not responsible for the replication crisis in social psychology: Comment on Klein. Theory & Psychology, 26(4), 540–548.
Tulving, E. (2007). Are there 256 different kinds of memory? In J. S. Nairne (Ed.), The foundations of remembering: Essays in honor of Henry L. Roediger, III (pp. 39–52). Psychology Press.
van Rooij, I. (2019, January 18). Psychological science needs theory development before preregistration. Psychonomic Society. https://featuredcontent.psychonomic.org/psychological-science-needs-theory-development-before-preregistration/
van Rooij, I., & Baggio, G. (2021). Theory before the test: How to build high-verisimilitude explanatory theories in psychological science. Perspectives on Psychological Science. Advance online publication. https://doi.org/10.1177/1745691620970604
Wimmer, M. C., Dome, L., Hancock, P. J., & Wennekers, T. (2019). Is the letter cancellation task a suitable index of ego depletion? Social Psychology, 50(5-6), 345–354.
Wimsatt, W. C. (1986). Developmental constraints, generative entrenchment, and the innate-acquired distinction. In W. Bechtel (Ed.), Integrating scientific disciplines. Science and philosophy (pp. 185–208). Springer.
Wimsatt, W. C. (2007). Re-engineering philosophy for limited beings: Piecewise approximations to reality. Harvard University Press.
Woodward, J. (1989). Data and phenomena. Synthese, 79(3), 393–472.
Woodward, J. (2003). Making things happen. A theory of causal explanation. Oxford University Press.
Woodward, J. (2015). Methodology, ontology, and interventionism. Synthese, 192, 3577–3599.
Zumbo, B. D., & Chan, E. K. (Eds.). (2014). Validity and validation in social, behavioral, and health sciences (Vol. 54). Springer.
