Scientist’s guide to developing explanatory statistical models using Causal Principles
Citation: Grace, J. B., and K. M. Irvine. 2020. Scientist’s guide to developing explanatory statistical models
using causal analysis principles. Ecology 101(4):e02962. 10.1002/ecy.2962
Abstract. Recent discussions of model selection and multimodel inference highlight a general
challenge for researchers: how to convey the explanatory content of a hypothesized model
or set of competing models clearly. The advice from statisticians for scientists employing
multimodel inference is to develop a well-thought-out set of candidate models for comparison,
though precise instructions for how to do that are typically not given. A coherent body of
knowledge, which falls under the general term causal analysis, now exists for examining the
explanatory scientific content of candidate models. Much of the literature on causal analysis
has been recently developed, and we suspect may not be familiar to many ecologists. This body
of knowledge comprises a set of graphical tools and axiomatic principles to support scientists
in their endeavors to create “well-formed hypotheses,” as statisticians are asking them to do.
Causal analysis is complementary to methods such as structural equation modeling, which
provides the means for evaluation of proposed hypotheses against data. In this paper, we
summarize and illustrate a set of principles that can guide scientists in their quest to develop
explanatory hypotheses for evaluation. The principles presented in this paper have the capacity
to close the communication gap between statisticians, who urge scientists to develop
well-thought-out coherent models, and scientists, who would like some practical advice for
exactly how to do that.
Key words: causal analysis; causal diagrams; explanatory models; multimodel averaging; multimodel
comparison; path analysis; regression; science methodology; structural equation modeling.
CONCEPTS & SYNTHESIS
knowledge to permit interpretation lies with the scientist, not the statistician. This raises the
question of where scientists can find a set of general rules for developing well-reasoned
explanatory hypotheses.
We propose that the answer lies in the logical system referred to as causal analysis (Pearl 2000,
2009, Hernán and Robins 2020). Recent advances in the field of causal analysis provide the core
principles for how to evaluate the causal content of competing hypotheses. Some aspects of modern
causal analysis will be familiar to ecologists through the related literature on structural
equation modeling (SEM; Shipley 2000a, 2016, Grace 2006), whereas other elements are relatively
new and ecological examples are scarce. In this paper, we first illustrate the challenges that
accompany attempts at drawing scientific inferences from conventional regression models. We then
describe and illustrate a set of principles to guide scientists when developing explanatory
hypotheses for evaluation.
THE PROBLEM WITH DRAWING SCIENTIFIC INTERPRETATIONS FROM MULTIPLE REGRESSION MODELS
The challenge to drawing causal conclusions by employing conventional regression methods is tied,
in part, to the limited capacity of such models to represent complete hypotheses. This problem can
be made more tangible through graphical representation (Fig. 1). As revealed in Fig. 1A, the
predictors in a multiple regression model can be (and typically are) intercorrelated. However, no
causal explanations for the correlations among predictors are specified as part of the regression
model. Why are X1 and X2 correlated? Is it because of some unmeasured common cause? Perhaps X1 has
a causal influence on X2? Perhaps the other way around? As shown in Fig. 1B, there are many
possible reasons for three predictors to be correlated, each implying a different scientific
explanation for the nature of the relationships between predictors and response. Because
scientific knowledge about possible explanations for the intercorrelations cannot be encoded in
the multiple regression equation because of its simplicity, the hypothesis represented by a
multiple linear regression model (Fig. 1A) is incomplete regarding potential mechanisms underlying
observed correlations, and thereby, interpretationally ambiguous.
Compounding the problem of interpreting regression coefficients is the fact that the correlations
among predictors have profound effects on the estimated regression coefficients. To show how this
is manifested in regression studies, let us turn to a specific example. In the fall of 2003,
Keeley et al. (2005) established 1,000-m² plots in each of 90 sites to investigate the ecological
effects of a series of wildfires that burned through southern California. The investigators
measured a set of variables that might explain variations in postfire vegetation recovery
throughout the region. Measurements included: landscape position variables (distance from the
coast and elevation), prefire stand age (estimated from ring counts of stems), fire severity
(based on skeletal remains of shrubs), and vegetation cover the spring following the fires. The
intercorrelations among variables in the wildfire study (Table 1) were only moderately strong;
yet, intercorrelation still presents a substantial problem for interpreting the contributions of
individual factors, as we will show.
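A small worked calculation may help make the point about intercorrelated predictors concrete. The sketch below takes three bivariate correlations as they are printed in Table 1 (vegcover with firesev, vegcover with age, and firesev with age) and applies the standard two-predictor formula for standardized regression coefficients. The function name is ours, and the choice of these two predictors is only for illustration; it is not one of the fitted models reported in the paper.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for y ~ x1 + x2, computed from bivariate correlations."""
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Correlations as printed in Table 1 (assumed here to be transcribed correctly).
r_cover_firesev = 0.437
r_cover_age = 0.350
r_firesev_age = 0.454

beta_firesev, beta_age = standardized_betas(r_cover_firesev, r_cover_age, r_firesev_age)

print(f"firesev: bivariate r = {r_cover_firesev:.3f}, coefficient with age in the model = {beta_firesev:.3f}")
print(f"age:     bivariate r = {r_cover_age:.3f}, coefficient with firesev in the model = {beta_age:.3f}")
```

With these inputs the coefficient for age falls from its bivariate value of 0.350 to about 0.19 once fire severity is in the model, even though the two predictors are only moderately correlated (0.454).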
FIG. 1. (A) Graphical depiction of a regression model. Observed variables are in boxes and e represents errors of prediction.
Double-headed arrows represent intercorrelations and regression relationships by unidirectional arrows. (B) Six potential causal
hypotheses for the three correlated predictors (X1–X3). U variables represent unmeasured causes.
April 2020 EXPLANATORY MODELING AND CAUSAL ANALYSIS Article e02962; page 3
TABLE 1. Bivariate correlations among variables used in the wildfire example.
            vegcover  firesev  age    elev   coastdist
vegcover    1
firesev     0.437     1
age         0.350     0.454    1
elev        0.218     0.117    0.093  1
coastdist   0.243     0.278    0.278  0.606  1
Using conventional regression, all-subsets model comparison might be employed (Fisher et al. 2018).
In the case of four explanatory variables, there are 16 possible regression models that could be
specified (Table 2). Candidate models were estimated and ranked (Table 3; Appendix S1, Data S1),
and ultimately a single model
TABLE 2. All-subsets regression models for vegetation recovery following wildfire.
           Explanatory variables included in the model
Model no.  Fire severity  Preburn age of stand  Elevation  Distance from coast
1
2          0.084
3                         0.088
4                                               0.027
5                                                          0.875
6          0.067          0.048
7          0.080                                0.021
8          0.077                                           0.475
9                         0.094                 0.031
10                        0.077                            0.568
11                                              0.014      0.634
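The count of 16 candidate models follows from taking every subset of the four predictors, including the empty (intercept-only) subset. The sketch below enumerates them; the variable names are borrowed from the Table 1 headings, and the formula strings are only an illustrative R-style notation, not fitted models.

```python
from itertools import combinations

predictors = ["firesev", "age", "elev", "coastdist"]

# Every subset of the predictors, from the intercept-only model up to the full model.
candidate_models = [
    subset
    for size in range(len(predictors) + 1)
    for subset in combinations(predictors, size)
]

for number, subset in enumerate(candidate_models, start=1):
    formula = "vegcover ~ " + (" + ".join(subset) if subset else "1")
    print(number, formula)
```

Enumerating by subset size gives one intercept-only model, four single-predictor models, six two-predictor models, four three-predictor models, and one full model.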
in this case) as P[Y|do(Tmt = t)]. Notice the ability to manipulate the variable Tmt physically is
not required (Bollen and Pearl 2013), though the ability to imagine the value of Tmt being set to
some value by humans or nature is implied. Note that the common statistical notation “P(Y|X = x)”
means the “probability of observing Y, given that X is observed to equal x,” whereas Pearl’s
notation “P(Y|do(X = x))” means “the probability of observing Y, given that X is forced to equal
x.” In general P(Y|X = x) is not equal to P(Y|do(X = x)). Pearl’s do-calculus tells us when the
two are equal and when they are not equal (Pearl et al. 2016: Chapter 3).
This conceptualization of causal relations reveals a problem, but also a possible solution to that
problem. The problem is that we can only observe one of the potential outcomes for any given
individual frog during an experiment. Therefore, individual-level causal effects are
counterfactual and not directly observable, even in randomized experiments. For this reason, the
focus is typically on the average causal effect, ACE, where
$$\mathrm{ACE} = E[Y_1] - E[Y_0] \tag{1}$$
averaged across all i. When the consistency assumption (Pearl 2009:99) holds,
$$E[Y_1] = E[Y \mid T = 1] \quad \text{and} \quad E[Y_0] = E[Y \mid T = 0] \tag{2}$$
In words, Eq. 2 says that the potential outcomes for the total population, E[Y1] or E[Y0] are
being inferred from the subpopulations assigned to the treatments, E[Y|T = 1] and E[Y|T = 0]. For
observational studies, this assumption permits causal effects to be estimated from data as long as
confounding effects are controlled for (i.e., the treatment subpopulations are comparable). This
is a critical and also powerful assumption that determines our ability to infer causal information
from observational (but also experimental) data. Note that this conceptualization can be extended
to the situation where X is continuous, in which case the causal effect is captured in a parameter
or set of parameters (for complex functions).
knowledge, and the need to accommodate computational relationships in models. Because it depends
so critically on knowledge outside of an individual study, explanatory modeling is best thought of
as a process that builds confidence in our understanding through a series of investigations. In
this paper, our presentation focuses on causal analysis to serve our objective of showing how to
think hard about the scientific content of candidate models. More about the details associated
with explanatory modeling, moving from theory to models to results to interpretations, is outlined
in Grace et al. (2012) and Shipley (2016: Chapter 8). Many of the general points we present are
widely applicable. However, the specific techniques herein will most commonly be useful in
observational studies and field experiments with important covariate effects.
An overview of explanatory modeling is presented in Fig. 2. The first step, assembling background
information about the system under study, is critical to success and represents the mechanistic
foundation required for scientific interpretations. Step 2 is the process of causal analysis,
which describes the requirements for drawing scientific (cause–effect) interpretations, our
primary focus in this paper. Step 3 refers to the subsequent step where models are confronted with
data and their testable implications are evaluated. This step is described by the literature on
structural equation modeling (Appendix S3, Part S1). Step 4 relates to drawing interpretations
with proper respect for assumptions. Finally, confident interpretations ultimately depend on the
consistency of findings and our knowledge of mechanisms, which we refer to as sequential learning
(Step 5). Although this step is widely recognized as important, it is often omitted from formal
discussions of the requirements for drawing causal inferences.
FOUR KEY PRINCIPLES OF CAUSAL ANALYSIS—THE PRACTICE OF “THINKING HARD”
Principle 1: Causal networks provide a powerful and convenient interpretive structure for causal
analysis
The first of the four principles for causal analysis we present in this paper (Table 4) deals
with the merits of representing hypothesized relationships among a set of variables using
probabilistic causal networks. There are two essential components of causal networks. One is the
causal diagram, which allows us to convey the logic of our hypotheses, and the second is an
appropriate equational framework for estimating network relationships.
Understanding the structure of causal diagrams.—A causal diagram is a pictorial graph of
cause–effect relationships that is used to represent scientists’ ideas about how a system works
(Robins 1986, Pearl and Verma 1991, Greenland et al. 1999). The diagram is meant to represent a
translation of the scientist’s ideas and hypotheses
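The earlier distinction between P(Y|X = x) and P(Y|do(X = x)), and the role of Eq. 2, can be illustrated with a small simulation. The sketch below is ours and rests on assumptions that do not come from the paper: a binary common cause U, a binary treatment T, a linear outcome, and arbitrary parameter values. It compares a naive observational contrast, a contrast under randomized assignment (standing in for do(T = t)), and an observational contrast that controls for U by stratification.

```python
import random

random.seed(1)
N = 20000
TRUE_EFFECT = 1.0  # assumed causal effect of T on Y
CONFOUNDING = 2.0  # assumed effect of the common cause U on Y


def outcome(t, u):
    return TRUE_EFFECT * t + CONFOUNDING * u + random.gauss(0.0, 1.0)


def mean(values):
    return sum(values) / len(values)


def difference_in_means(rows):
    """E[Y | T = 1] - E[Y | T = 0] for rows of (t, u, y)."""
    treated = [y for t, u, y in rows if t == 1]
    control = [y for t, u, y in rows if t == 0]
    return mean(treated) - mean(control)


# Observational regime: U influences both treatment uptake and the outcome.
observed = []
for _ in range(N):
    u = 1 if random.random() < 0.5 else 0
    t = 1 if random.random() < (0.8 if u == 1 else 0.2) else 0
    observed.append((t, u, outcome(t, u)))

# Interventional regime, do(T = t): treatment is set by a coin flip that ignores U.
randomized = []
for _ in range(N):
    u = 1 if random.random() < 0.5 else 0
    t = 1 if random.random() < 0.5 else 0
    randomized.append((t, u, outcome(t, u)))

naive_estimate = difference_in_means(observed)
randomized_estimate = difference_in_means(randomized)

# Controlling for the confounder: contrast within strata of U, then average
# the strata (equal weights because P(U = 1) = 0.5 in this sketch).
stratum_effects = [
    difference_in_means([row for row in observed if row[1] == u]) for u in (0, 1)
]
adjusted_estimate = mean(stratum_effects)

print(f"naive observational contrast: {naive_estimate:.2f}")
print(f"randomized contrast:          {randomized_estimate:.2f}")
print(f"stratified on U:              {adjusted_estimate:.2f}")
```

Under these assumed parameters the naive contrast is expected to lie near 2.2, because treated units are more likely to have U = 1, whereas the randomized and stratified contrasts should both lie near the assumed effect of 1.0.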
Article e02962; page 6 JAMES B. GRACE AND KATHRYN M. IRVINE Ecology, Vol. 101, No. 4
TABLE 4. Four key principles for causal analysis.
Principle  Description
1          Causal networks provide a powerful and convenient interpretive structure for causal
           analysis.
2          Many elements of network structure can be tested with appropriate data, such as both
           omitted and included links.
3          Confounding due to model misspecifications can bias parameter estimates. Analysis of
           causal diagrams can provide strategies for addressing this issue.
4          The inclusion of mediators can encode testable hypotheses about mechanisms and
           strengthen inferences.
variables in M except for X and Y and those variables along the directed path (causal chain)
between X and Y being discussed (W in this case). Importantly, variables and links omitted from a
causal diagram also represent a set of explicit assumptions with empirical implications.
There are several definitional features of causal diagrams. First, directed arrows imply
cause–effect relationships. This means that if we physically change the values of the variable at
the base of an arrow (e.g., W in Fig. 3), while holding constant all other predictors, it can
alter the values of the variable at the tip of the arrow (e.g., Y in Fig. 3). There are two
corollaries that go with this definition of arrows. One is that there is understood to be a finite
passage of time between the cause and the response. Thus, the arrows represent movement of
information from some time in the past to present. Another
distributions are implied. As a result, the interpretation of a causal diagram is independent of
statistical details.
Understanding how associations among variables are generated.—To understand the logic of causal
diagrams, we need to understand the various ways that associations between variables can be
created. If the causal diagram in Fig. 3 is true, a fundamental expectation is that all adjacent
variables will show conditional associations. Considering the nonadjacent pairs of variables,
there are two primary ways associations can be created indirectly: (1) through causal chains and
(2) through common causes. (There are actually other ways that associations among variables can
accidentally be created, which we will describe later in the paper.) Causal chains (e.g., X → W →
Y) produce correlations between nonadjacent vari-
One might argue that the most notable separation between statistical modeling and scientific
modeling is reflected in the difference between the equational form for regression vs. causal
networks.
This dichotomy can be traced back to competing world views established by Ronald Fisher and Sewall
Wright in the 1920s. Fisher, whose dominant influence over the development of statistics remains
to this day, was the father of the regression equation (Aldrich 2005). There are many types of
regression models, but they all have the same general weaknesses with regard to explanatory
interpretation. Regression models are based on the fundamental equational form
$$Y = f(X) \tag{3}$$
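The two indirect routes to association named in this section, causal chains and common causes, can be mimicked with a short simulation. The sketch below is a minimal illustration under assumptions of our own (linear effects of 0.8, standard normal noise, and a hypothetical unmeasured variable U for the common-cause case). In neither structure is there an arrow between X and Y, yet the two are correlated in both.

```python
import math
import random

random.seed(2)
N = 5000


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def noise():
    return random.gauss(0.0, 1.0)


# Causal chain X -> W -> Y: X affects Y only through W.
chain_x = [noise() for _ in range(N)]
chain_w = [0.8 * x + noise() for x in chain_x]
chain_y = [0.8 * w + noise() for w in chain_w]

# Common cause X <- U -> Y: neither X nor Y affects the other.
u = [noise() for _ in range(N)]
fork_x = [0.8 * value + noise() for value in u]
fork_y = [0.8 * value + noise() for value in u]

r_chain = pearson(chain_x, chain_y)
r_fork = pearson(fork_x, fork_y)

print(f"chain X -> W -> Y:        r(X, Y) = {r_chain:.2f}")
print(f"common cause X <- U -> Y: r(X, Y) = {r_fork:.2f}")
```

Under these assumed coefficients the expected correlations are roughly 0.45 for the chain and 0.39 for the common cause, so the two structures cannot be told apart from the X–Y correlation alone.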
implications for data. Included links, which only claim nonignorability, represent less strong,
but nonetheless testable, premises. The directionality of arrows (i.e., of causal relationships),
however, is one type of assumption that requires theoretical justification and is not testable
from observational data alone (though can be tested through manipulative experiments).
Principle 3: Confounding because of model misspecifications can bias parameter estimates. Analysis
of causal diagrams can provide strategies for addressing this issue
Much of the methodological work on causal inference has dealt with ways of addressing potential
impacts from model misspecification (models that do not match the
sampling scheme, and (5) through various roles the variable might play in censoring or truncating
the data or the sample. The rules that follow are meant to apply to all these different
situations.
d-separation rule 1.—Given a causal chain, X → W → Y, complete or full mediation means once we
regress Y on W, X provides no capacity to explain additional variation in Y (i.e., the effect of X
on Y is explained through W). Therefore, Y is independent (⊥) of X, given that we have conditioned
Y on W (|W).
$$Y \perp X \mid W \tag{5}$$
In d-separation parlance, X and Y are said to be d-separated when Y is conditioned on W.
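The conditional independence in Eq. 5 can be checked numerically for a simulated chain. The sketch below assumes linear relationships with normally distributed noise, so a first-order partial correlation is used as a stand-in for "regressing Y on W and asking whether X adds anything"; the coefficients and function names are ours.

```python
import math
import random

random.seed(3)
N = 20000


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(r_xy, r_xw, r_yw):
    """Correlation between X and Y after conditioning on W."""
    return (r_xy - r_xw * r_yw) / math.sqrt((1 - r_xw ** 2) * (1 - r_yw ** 2))


# Full mediation: X -> W -> Y, with no direct X -> Y link.
x = [random.gauss(0.0, 1.0) for _ in range(N)]
w = [0.8 * xi + random.gauss(0.0, 1.0) for xi in x]
y = [0.8 * wi + random.gauss(0.0, 1.0) for wi in w]

r_xy = pearson(x, y)
r_xy_given_w = partial_correlation(r_xy, pearson(x, w), pearson(y, w))

print(f"r(X, Y)     = {r_xy:.3f}")
print(f"r(X, Y | W) = {r_xy_given_w:.3f}")
```

The marginal correlation is clearly nonzero, while the partial correlation given W should be close to zero, which is the empirical signature of d-separation in a fully mediated chain. A partial correlation that stays well away from zero would be evidence for a missing link.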
For more complicated diagrams than examined here, the d-separation criterion goes a step further
and describes how to find the minimum sets of conditioning variables to achieve d-separation. The
implementation of d-separation was first described in Shipley (2000b). Several software packages
are now available for working through all the possibilities prior to fitting a model with data
(Textor et al. 2011, Marchetti et al. 2015). Routines also exist in some SEM software (Lefcheck et
al. 2018). Both Shipley (2016) and Kline (2016) provide thorough discussions of conditioning sets
based on d-separation.
Principle 4: The inclusion of mediation relationships, in the form of causal chains, encodes
testable mechanistic explanations
CONFRONTING HYPOTHESES WITH DATA—STRUCTURAL EQUATION MODELING
Our primary objective in this paper is to show scientists how to “think hard” about their
hypotheses to promote explanatory analyses (Fig. 2, Steps 1 and 2). Our four principles of causal
analysis should assist in representing causal hypotheses and establishing their empirical claims.
Now, we need to say something about methods for confronting those hypotheses with data so we can
complete the illustration through the use of examples (Fig. 2, Steps 3 and 4). No longer are we
dealing with diagrams, but instead, fully specified models. We use the general term explanatory
modeling in this paper to describe the intention. For the context in which we are working, the
statistical methodology of structural equation modeling (SEM) provides a well-established
framework for estimation, evaluation, and summarizing findings.
There are many modern treatments of SEM (Kline 2016), some directed at natural scientists (Grace
et al. 2015, Shipley 2016). It is beyond our purpose in this paper to provide an in-depth
introduction. Rather, we offer a high-level view for scientists not already familiar with the
methodology, as well as executable code for the examples in this paper in the Appendices. A survey
of what ecologists find helpful about SEM can be found in Laughlin and Grace (2019).
Perhaps the most important point to make about SEM is that it is a scientific framework for
explanatory modeling rather than a specific statistical technique. There are no statistical
assumptions that are inherent to SEM; those depend on the implementations supported
Example 1: A return to the wildfire recovery example
Our explanatory modeling work-flow process requires explicit consideration of expert knowledge in
order to construct plausible hypotheses to convey in a causal diagram. Following the data
collection and initial examinations of relationships conducted by Keeley et al. (2005), Grace and
Keeley (2006) compiled a carefully considered list of biological assumptions relevant to the
measured variables (Table 5). This table is an example of Step 1 in Fig. 2 and, we feel, a useful
companion to the causal diagram. In this demonstration, we first considered a naïve causal diagram
(Fig. 4A) that considered all possible links (denoted by the corresponding numbers in Table 5). We
used our naïve causal diagram (Fig. 4A) and assembled expert knowledge (Table 5) to arrive at an
informed diagram (Fig. 4B). In this example, we start
TABLE 5. Expert opinion relevant to an initial hypothesis of how vegetation cover following wildfire could be explained by the
measured predictors.
Number Interpretation
1 In southern California, where the wildfires took place, elevation generally increases as one moves eastward from the
coastal lowlands to the interior highlands (see Keeley et al. 2005, Fig. 1). Thus, increasing the distance from the coast
tends to cause an increase in elevation (expected effect: positive).
2 Increasing elevation should produce more mesic conditions, which in turn, would reduce the frequency of wildfires and
thereby increase the average ages of forest stands (because age is determined by time since last stand-replacing fire).
The net relationship expected is thus increased age with increasing elevation (expected effect: positive).
3 As stands age, they accumulate more biomass (alive and dead) that can serve as fuel during a wildfire. Therefore,
increasing stand age should lead to more severe fires when burning finally takes place (expected effect: positive).
4 Independent from other effects, moving away from the coast should result in less wildfire suppression and larger fires
because of reduced urbanization and population densities (expected effect: negative).
5 Fire severity depends not only on fuel, but also humidity. Thus, increasing elevation should contribute to less severe fires,
all other things equal (expected effect: negative).
6 Once the effects of stand age and elevation are taken into account, it is not obvious, based on a priori information, that
additional processes related to distance from the coast should have a major influence on wildfire severity (expected
effect: undetectable).
7 Once effects of elevation, stand age, and fire severity are controlled, it is not obvious, based on a priori information, that
vegetation recovery will be influenced by distance from the coast (expected effect: undetectable)
8 In the study region, a marked, orographic gradient leads to a strong positive effect of elevation on precipitation, which
would lead to faster plant regrowth and a more rapid recovery of the vegetation (expected effect: positive).
9 A negative effect of stand age on vegetation recovery might be observed because of the fact that increasing overstory
dominance in older stands will lead to a reduced understory. If the understory (including the seed bank) is a strong
contributor to vegetation recovery, an independent effect of stand age on vegetation recovery might be expected
(expected effect: negative).
10 There is a very strong a priori expectation that increased fire severity will result in reduced vegetation because of the
damage to the regenerating tissues of plants and possibly effects on soil that reduce water penetration (expected effect:
negative).
with a single hypothesis for consideration, which is derived from our summary of expert knowledge.
We do this knowing that we will have the opportunity to compare the support for this hypothesis
against all the alternative hypotheses based on the same causal ordering of predictors.
The backbone of the causal diagram is a directed chain relating vegetation recovery (V) to fire
severity (F) to stand age (A) to elevation (E) to distance from the coast (C). This sequence of
variables represents an ordering from proximal to distal potential drivers of vegetation response
following fire. In an earlier review of this paper, we were asked to explain how it can be argued
that the various predictors in this example actually constitute causes. The rule related to this
point is very basic and can be called the intervention-response rule: if we can manipulate
something and induce a response in another system property, then the thing manipulated qualifies
as a cause of the thing that responds. Confusion comes about when someone starts wanting to
distinguish what they think of as “true” causes. Let us consider the most distal cause in the
chain, distance from the coast. Intuitively, it might seem that this is a placeholder for some
“actual” cause. A simple statement shows that location qualifies as a causal variable. That
statement is, “If you think location in space is not an actual cause, then stand closer to the
fire and see if your opinion changes.” In reality, there are many potential mediators that make up
a causal chain. We usually work with mediators that are meaningful for our explanations and
thereby avoid the problem of infinite regress in explanation.
The causal diagram in Fig. 4B was translated into a structural equation model using the software
package piecewiseSEM (Lefcheck et al. 2018). As shown in Appendix S3, Data S3, estimation produces
a set of d-separation tests, one for each omitted link in the model. These tests evaluate whether
there is evidence to include a missing link, and thus, whether it should be added to the model.
Global fit of the initial model suggested it was a plausible explanation for the data. Further
tests suggested equivocal support for some of the links, however, so simpler models were also
considered. During the evaluation process, individual d-separation tests suggested the possibility
of local violations of conditional independence. We ultimately considered four alternative models,
omitting and adding various links. The model selected from that process is shown in Fig. 4C. The
scientific conclusion we draw from the results is that vegetation recovery varies widely in the
landscape, primarily as a function of fire severity, which in turn is influenced by greater fuel
accumulation in older stands of woody plants. Distance from the coast (dismissed as unimportant by
all-subsets regression) is shown to influence recovery, but only indirectly through effects on
both stand age and fire severity, ultimately being quite important in the system. The explanation
achieved is based on careful and explicit considerations, something we could not easily achieve
using an all-subsets regression approach.
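To show what "one d-separation test for each omitted link" amounts to, the sketch below lists the independence claims implied by a diagram. It is a simplified illustration under stated assumptions: it uses only the backbone chain described above (C → E → A → F → V), not the full diagram in Fig. 4B, and it conditions each nonadjacent pair on the parents of both variables, which is the usual basis-set construction associated with Shipley's d-separation test. It is not a description of how piecewiseSEM is implemented, and the function name is ours.

```python
from itertools import combinations

# Backbone chain only (an assumption for illustration): C -> E -> A -> F -> V.
causal_order = ["C", "E", "A", "F", "V"]
parents = {"C": [], "E": ["C"], "A": ["E"], "F": ["A"], "V": ["F"]}


def basis_set(order, parents):
    """List the independence claims implied by the links a diagram omits."""
    claims = []
    for earlier, later in combinations(order, 2):
        adjacent = earlier in parents[later] or later in parents[earlier]
        if adjacent:
            continue  # an included link makes no independence claim
        conditioning = sorted(
            (set(parents[earlier]) | set(parents[later])) - {earlier, later}
        )
        claims.append((earlier, later, conditioning))
    return claims


claims = basis_set(causal_order, parents)
for earlier, later, conditioning in claims:
    given = ", ".join(conditioning) if conditioning else "nothing"
    print(f"{later} independent of {earlier} given {given}")
```

For this five-variable chain there are six omitted links and therefore six claims to test, for example that vegetation recovery (V) is independent of distance from the coast (C) once fire severity (F) is conditioned on. Adding links to the diagram, as in Fig. 4B, removes the corresponding claims from the list.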
FIG. 4. (A) Naive causal diagram with all possible links. (B) Hypothesis based on available expert opinion (Table 5). (C) Model
informed by data and d-separation tests (Appendix S3). Dashed lines represent negative effects, solid lines positive effects.
FIG. 5. Causal diagrams representing (A) ANOVA 1, (B) ANOVA 2, (C) MANOVA 1, (D) MANCOVA 1, (E) ANCOVA 1, (F)
ANCOVA 2, (G) ANCOVA 3. Trt = treatment, Epi = epiphyte abundance, Gam = Gammarid abundance, Cap = Caprellid abun-
dance, Mac = macroalgae abundance, and Gra = seagrass density. Results for these models can be found in Appendix S4.
study. Controlling the variable responses of microcrustaceans by including these variables as
covariates might seem to be a useful way to remove their variations from the assessment of
treatment effects on epiphytes. Diagram G represents a commonly employed approach, which is to
control for as many possible influences as possible. This cursory discussion of possibilities is
meant to point out that causal hypotheses will have to be brought to bear in order to decide which
of these models will produce defensible results.
It is instructive to compare the hypotheses that can be examined using classical statistical
models (Fig. 5) to the possibilities that emerge from adopting a causal network perspective. For
this comparison, we refer the reader to a causal diagram representing a dual mediation hypothesis
in Fig. 6A. Here, impacts of treatment on Gammarids and Caprellids, the two dominant groups of
microcrustaceans, are hypothesized to explain the effect of treatment on epiphytes. This model
also includes macroalgae and seagrass density as covariates potentially influencing epiphytes.
This diagram was treated as the initial hypothesis, which was subsequently evaluated using the
available data. The structural equation model presented in Fig. 6B summarizes results for the
final model once d-separation tests were performed and links were added to resolve d-separation
violations (Appendix S4, Data S4).
There are numerous, interesting scientific findings revealed by the SEM (Fig. 6B; see also Whalen
et al. 2013). First, we are able to evaluate the full-mediation hypothesis formally. The two
mediation pathways (those combining links 1 → 3 and 2 → 4) were found to explain the response of
epiphytes to treatment completely. As a result, there is no link directly from treatment to
epiphytes in the final model. Finding support for full mediation is a highly desirable outcome in
an experiment, because we are not scientifically interested in the effects of insecticide and
prefer a model where artificial treatments are conditionally ignorable. Second, empirical
evaluation of the implied conditional independences from the initial hypothesis (Fig. 6A) revealed
three nonindependences that could be resolved by adding links (Fig. 6B, added links shown with
asterisks). This produced evidence to suggest previously unanticipated biological discoveries
related to the system under study.
The positive effects discovered by the analysis (links 7, 8, and 9) can be interpreted as
indications that macroalgae (to a large degree) and eelgrass density (to a lesser degree) provide
protective refuges for microcrustaceans. By promoting microcrustaceans, macroalgae have indirect
negative effects on epiphytes (through the pathways 7 → 3 and 8 → 4). Within the total food web,
this could translate into macroalgae facilitating seagrasses by harboring microcrustaceans, the
grazers of epiphytes (which are the enemies of seagrasses). We believe scientists will find this
system-level set of results substantially more informative than results from traditional ANCOVA.
INTERPRETATIONS AND CONSIDERATIONS
Explanatory modeling requires adequate expert knowledge to defend scientific interpretations.
Appropriate data are also required and frequently limit the
FIG. 6. (A) Causal diagram representing initial hypothesis and (B) final structural model supported by the data (see
Appendix S4). Solid lines in (B) represent effects of positive sign and dashed lines represent effects of negative sign. Linkages in B
not seen in A (7, 8, and 9; indicated with asterisks) represent important effects uncovered because of failed d-separation (conditional
conclusions that can be drawn. There are many ways our models and interpretations can deviate from
the truth, to either major or minor degrees. Here we list three increasingly challenging types of
assumptions upon which interpretations depend. (1) Foundational assumptions—arrows in models
indeed represent directional cause–effect relationships. Generally, this must be supported by a
priori expert knowledge of mechanisms. (2) Parameter estimates approximate true values to the
degree that overall conclusions are not confounded. Again, expert knowledge of the system under
investigation needs to be sufficient to defend general conclusions. Further, it is important that
investigators be aware of the ways that confounding can occur and the need to guard against the
omission of critically important common causes. (3) Inferring that one or more parameter estimates
are unbiased (true) estimates of causal effects constitutes a stricter assumption. Both omitted
confounders and measurement error, along with a host of more technical issues, can bias estimates
to various degrees (though remedies to even these challenges exist; Bollen 2019). For these
reasons, appropriate caution is required when drawing conclusions.
Suitably cautious language for expressing findings will vary depending on the above-mentioned
factors. As one illustration, we present the language Grace and Keeley (2006) used when reporting
their findings from the wildfire recovery study.
    We infer from the SEM results that postfire richness in this system is strongly influenced by
    local conditions and that these conditions are, in turn, predictably related to
    landscape-level conditions. For example, we observed that older stands of shrubs were
    characterized by more severe fires, which were associated with a low recovery of plant cover
    and low richness. These results may have implications for the use of prescribed fire in this
    system if these findings extrapolate to prescribed burns as we would expect.
Subsequent SEM studies (Keeley et al. 2008) have enhanced our confidence in the general inferences
drawn from the original study. That said, we would not claim that all our parameter values are
unbiased causal estimates without further evidence to support such inferences.
CONCLUSIONS AND FUTURE DIRECTIONS
Causal understanding is ultimately about understanding mechanisms. The majority of studies that
scientists conduct are directed to that end. Classical statistical models do not easily
accommodate the explicit incorporation of mechanisms into hypotheses, limiting their capacity for
explanatory application. Adopting a causal network framework and the principles of causal analysis
for hypothesis development and evaluation greatly increases possibilities for the development of
explanatory models and clear expression of their logic. We hope this presentation provides a guide
toward that future by helping scientists see their own responsibilities and opportunities in
quantitative analysis.
ACKNOWLEDGEMENTS
We thank Brian Cade, Katherine Banner, Megan Higgs, Darren Johnson, Lori Randall, Billy Schweiger,
Magdalena Steiner, Lara Volery, and Rachel Korn for reviews of earlier drafts of the manuscript.
Brian Inouye, Bill Shipley, and an anonymous reviewer generously provided very helpful
suggestions. JG was supported by the USGS Land Change Science and Ecosystems Programs. Any use of
trade, firm, or product names is for descriptive purposes only and does not imply endorsement by
the U.S. Government.
LITERATURE CITED
Aldrich, J. 2005. Fisher and regression. Statistical Science 20:401–417.
Banner, K. M., and M. D. Higgs. 2017. Considerations for assessing model averaging of regression
coefficients. Ecological Applications 27:78–93.
Bollen, K. A. 2019. Model implied instrumental variables (MIIVs): An alternative orientation to
structural equation modeling. Multivariate Behavioral Research 54:31–46.
Bollen, K. A., and J. Pearl. 2013. Eight myths about causality and structural equation models. Pages 301–328 in S. L. Morgan, editor. Handbook of causal analysis for social research. Springer, Dordrecht, The Netherlands.
Burnham, K. P., and D. R. Anderson. 2002. Model selection and multimodel inference. Second edition. Springer, New York, New York, USA.
Cade, B. S. 2015. Model averaging and muddled multimodel inferences. Ecology 96:2370–2382.
Cohen, J., P. Cohen, S. G. West, and L. S. Aiken. 2003. Applied multiple regression/correlation analysis for the behavioral sciences. Third edition. Routledge, New York, New York, USA.
Elwert, F., and C. Winship. 2014. Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology 40:31–53.
Fieberg, J., and D. H. Johnson. 2015. MMI: Multimodel inference or models with management implications? Journal of Wildlife Management 79:708–718.
Fisher, R., S. K. Wilson, T. M. Sin, A. C. Lee, and T. J. Langlois. 2018. A simple function for full-subsets multiple regression in ecology with R. Ecology and Evolution 8:6104–6113.
Gough, L., and J. B. Grace. 1999. Effects of environmental change on plant species density: comparing predictions with experiments. Ecology 80:882–890.
Grace, J. B. 2006. Structural equation modeling and natural systems. Cambridge University Press, Cambridge, UK.
Grace, J. B., and J. E. Keeley. 2006. A structural equation model analysis of postfire plant diversity in California shrublands. Ecological Applications 16:503–514.
Grace, J. B., D. R. Schoolmaster, Jr., G. R. Guntenspergen, A. M. Little, B. R. Mitchell, K. M. Miller, and E. W. Schweiger. 2012. Guidelines for a graph-theoretic implementation of structural equation modeling. Ecosphere 3:art73.
Grace, J. B., S. M. Scheiner, and D. R. Schoolmaster, Jr. 2015. Structural equation modeling: building and evaluating causal models. Pages 169–200 in G. A. Fox, S. Negrete-Yankelevich, and V. J. Sosa, editors. Ecological statistics: From principles to applications. Oxford University Press, Oxford, UK.
Greenland, S., J. Pearl, and J. M. Robins. 1999. Causal diagrams for epidemiologic research. Epidemiology 10:37–48.
Hernán, M. A., and J. M. Robins. 2020. Causal inference: What if. Chapman & Hall/CRC, Boca Raton, Florida, USA.
Hitchcock, C. 2019. Causal models. In E. N. Zalta, editor. The Stanford encyclopedia of philosophy (summer 2019 edition). https://plato.stanford.edu/archives/sum2019/entries/causal-models
Keeley, J. E., C. J. Fotheringham, and M. Baer-Keeley. 2005. Factors affecting plant diversity during post-fire recovery and succession of Mediterranean-climate shrublands in California, USA. Diversity and Distributions 11:525–537.
Keeley, J. E., T. Brennan, and A. H. Pfaff. 2008. Fire severity and ecosystem responses following crown fires in California shrublands. Ecological Applications 18:1530–1546.
Kline, R. B. 2016. Principles and practice of structural equation modeling. Fourth edition. Guilford Press, New York, New York, USA.
Laughlin, D. C., and J. B. Grace. 2019. Discoveries and novel insights in ecology using structural equation modeling. Ideas in Ecology and Evolution 12:28–34.
Lefcheck, J., J. Byrnes, and J. B. Grace. 2018. Package ‘piecewiseSEM’. https://cran.microsoft.com/web/packages/piecewiseSEM/piecewiseSEM.pdf
Lindley, D. 2002. Seeing and doing: The concept of causation. International Statistical Review 70:191–214.
Marchetti, G. M., M. Drton, and K. Sadeghi. 2015. ggm: Functions for graphical Markov models. R package version, 2. https://cran.r-project.org/web/packages/ggm/ggm.pdf
Neyman, J. 1923. On the applicability of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science 5:465–480.
Pawlowsky-Glahn, V., and A. Buccianti. 2011. Compositional data analysis. John Wiley & Sons, New York, New York, USA.
Pearl, J. 1988. Probabilistic reasoning in intelligent systems. Morgan Kaufmann, San Francisco, California, USA.
Pearl, J. 2000. Causality. Cambridge University Press, Cambridge, UK.
Pearl, J. 2009. Causality. Second edition. Cambridge University Press, Cambridge, UK.
Pearl, J., and D. MacKenzie. 2018. The book of why: The new science of cause and effect. Basic Books, New York, New York, USA.
Pearl, J., and T. Verma. 1991. A theory of inferred causation. Pages 441–452 in J. F. Allen, R. Fikes, and E. Sandewall, editors. KR ’91: Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann, San Francisco, California, USA.
Pearl, J., M. Glymour, and N. P. Jewell. 2016. Causal inference in statistics: a primer. John Wiley & Sons, West Sussex, UK.
Robins, J. M. 1986. A new approach to causal inference in mortality studies with sustained exposure periods: Application to control of the healthy worker survivor effect. Mathematical Modelling 7:1393–1512.
Rubin, D. B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66:688.
Shipley, B. 2000a. Cause and correlation in biology: A user’s guide to path analysis, structural equations, and causal inference. Cambridge University Press, Cambridge, UK.
Shipley, B. 2000b. A new inferential test for path models based on directed acyclic graphs. Structural Equation Modeling 7:206–218.
Shipley, B. 2016. Cause and correlation in biology: A user’s guide to path analysis, structural equations and causal inference with R. Second edition. Cambridge University Press, Cambridge, UK.
Symonds, M. R., and A. Moussalli. 2011. A brief guide to model selection, multimodel inference and model averaging in behavioural ecology using Akaike’s information criterion. Behavioral Ecology and Sociobiology 65:13–21.
Textor, J., J. Hardt, and S. Knüppel. 2011. DAGitty: a graphical tool for analyzing causal diagrams. Epidemiology 22:745.
Whalen, M. A., J. E. Duffy, and J. B. Grace. 2013. Temporal shifts in top-down versus bottom-up control of epiphytic algae in a seagrass ecosystem. Ecology 94:510–520.
Wright, S. 1921. Correlation and causation. Journal of Agricultural Research 10:557–585.

SUPPORTING INFORMATION
Additional supporting information may be found in the online version of this article at http://onlinelibrary.wiley.com/doi/10.1002/ecy.2962/suppinfo