Testing Recursive Path Models With Correlated Er
Bill Shipley
Département de biologie
Université de Sherbrooke
CANADA
6 June 2002
Running Head: d-sep test for path models with correlated errors
ABSTRACT
This paper shows how to extend the inferential test of Shipley (2000b),
model), to a class of recursive path models that include correlated errors (a semi-
Markov model). The path model is first converted to a partial ancestral graph
(PAG) and then, for PAGs that do not require latent variables, an inducing path
to the original path model. The null probabilities of the k tests of independence
that are implied by this DAG are combined using Fisher’s test statistic C=-
INTRODUCTION
models without correlated errors. This test has a number of advantages over the
matrix. First, the new test is exact rather than asymptotic and therefore permits
tests using small data sets. Second, being based on tests of (conditional)
probability density functions can be accommodated. Third, the model test can be
causal hypothesis; this “local” property allows the researcher to identify those
parts of a poorly-fitting model that are contributing to the lack of fit. Here, I
describe the conditions under which the test can be extended to path models
The test in Shipley (2000b) is based on two notions derived from the
the directed paths between any two variables (vi, vj) in the DAG given a set Q of
other variables. In this paper I will use the notation (vi|Q|vj) to mean “variable vi
is d-separated from variable vj given the set of variables Q” and say that the set
Q “blocks” the directed paths linking vi and vj in the DAG. It can be proven that if
any two variables in a DAG are d-separated then, in any data generated by the
independent upon conditioning on the variables that are blocked (Pearl 1988;
Geiger, Verma and Pearl 1990; Geiger, Paz and Pearl 1991; Geiger and Pearl
1993; Pearl 2000). This does not require assumptions of linearity of the
graph, for instance a path diagram with correlated errors, are d-separated given
a set Q of other variables in the graph if and only if there is no undirected path U
between vi and vj such that (i) every non-colliding variable along U is not a
“collide” along an undirected path U if it has arrowheads pointing into it from both
Shipley (2000a).
implies, using the laws of probability and the axioms of d-separation, all other
Verma and Pearl 1988; Pearl 2000). This means that a test of the
The basis set used in Shipley (2000a,b) is the following (Pi is the set of
causal parents of variable vi in the DAG): BU={vi|Pi∪Pj|vj}. This basis set has the
additional property that its elements imply mutually independent residuals of vi
independence (Shipley 2000b). Other basis sets are also possible (Pearl 2000)
but are not appropriate for the inferential test discussed here. An exact
inferential test for such a DAG (i.e. a path model without correlated errors) is
independence claims in the basis set BU and then calculating the statistic
$C = -2\sum_{j=1}^{k} \ln(p_j)$. This statistic is distributed as a chi-squared variate with 2k
degrees of freedom if all of the k independence claims are true in the statistical
DAG and therefore the test proposed by Shipley (2000b) cannot be used. This is
because the basis sets for DAGs are not, in general, basis sets for path models
with correlated errors. I had suggested in Shipley (2000b) a way of extending the
variable which is the causal parent of the two variables possessing the correlated
errors, constructing the basis set from this augmented DAG, and then testing
only those d-separation claims in the basis set that do not involve latent
variables. Unfortunately this way of testing a path model with correlated errors is
of limited value because there can be d-separation claims involving only
observed variables that are implied by the model but which are neither in BU nor
Unshielded colliders
X and Y, an edge between Y and Z, no edge between X and Z, then call this an
“unshielded pattern”. If, in this unshielded pattern, there are arrowheads pointing
into Y from both X and Z then call Y an “unshielded collider”. This is shown
graphically as X•→Y←•Z where the circle (•) means that there can be an
arrowhead or not in that position. Note that Spirtes et al. (1993; 2000) use an
open circle (o) to represent the same information. If, in this unshielded pattern,
there are not arrowheads pointing into Y from both X and Z then call Y a “definite
An augmented DAG
new latent variable (lij) which is the causal parent of the two variables (vi, vj)
possessing the correlated errors. This augmented DAG will have two types of
variables: observed variables denoted by the set V and latent variables denoted
by the set L. This augmented DAG will have the same d-separation relationships
involving the original (observed) variables as the path model with correlated
Inducing paths
An inducing path between two observed variables (vi, vj) in the augmented
DAG D’ relative to the set of observed variables (V) exists if there is no subset
Q⊆ V\{vi,vj}, including the null subset, such that vi and vj are d-separated given Q.
In a DAG, two variables that are non-adjacent are necessarily d-separated given
some subset of the remaining variables. This is not true in general for semi-
1b are not adjacent. Although they are d-separated given {X3, l24}, they are not
d-separated given any subset of the remaining observed variables {X2, X3}.
Using the null subset there is an open path X1→X2→X3→X4. Using either {X2},
{X3} or {X2, X3} the pair (X1, X4) are not d-separated because there exists the
and/or its descendant X3. There is therefore an inducing path between X1 and X4
in the extended DAG D’ over the observed variables. In general, two non-
adjacent observed variables (vi, vj) will have an inducing path between them
relative to V\{vi, vj} if there exists an undirected path U between them such that all
ancestral graph. Spirtes et al. (1993) previously called the same thing a
“partially oriented inducing path graph” and Desjardins (1999) has called it a
marginal dependency graph. Consider all those DAGs (M) containing the same
set V of observed variables, different sets of latent variables, but that imply the
PAG is a graphical construct involving only V such that (i) two variables (vi, vj)
have an edge between them if every DAG in M has an inducing path between vi
and vj relative to V\{vi, vj} and (ii) each unshielded pattern that collides at Y in
every DAG in M also collides in the PAG and (iii) every unshielded pattern that is
PAG. There are some other orientation rules that can be applied to the PAG but
these aren’t necessary for the purposes of this paper. In other words, every
faithful acyclic model that has the same conditional independence relationships
has the same PAG. The construction of a PAG is given in the Causal Inference
Algorithm.
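The inducing-path definition given earlier can be checked by brute force on a small graph. The following is a minimal Python sketch, not the paper's own procedure. It builds an augmented DAG by adding one latent parent for each pair of correlated errors, tests d-separation with the standard moralised ancestral graph criterion, and then tries every subset of the remaining observed variables. The edge list (X1→X2→X3→X4 with correlated errors between X2 and X4) is an assumption chosen to agree with the description of Figure 1 in the text, and all function names are hypothetical.

```python
from itertools import combinations

def augment(edges, correlated):
    """Return a child -> parents map, adding a latent parent per correlated pair."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    for a, b in correlated:
        latent = "l" + a[1:] + b[1:]  # naming assumes labels such as "X2"
        parents.setdefault(a, set()).add(latent)
        parents.setdefault(b, set()).add(latent)
    return parents

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, x, y, given):
    """True if x and y are d-separated given the set `given` in the DAG."""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    # Moralise the ancestral subgraph: link each node to its parents,
    # link parents that share a child, and drop edge directions.
    adj = {v: set() for v in keep}
    for v in keep:
        pa = list(parents.get(v, ()))
        for p in pa:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(pa, 2):
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-separated iff no path joins them once `given` is removed.
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w == y:
                return False
            if w not in seen and w not in given:
                seen.add(w)
                stack.append(w)
    return True

def has_inducing_path(parents, observed, x, y):
    """True if no subset of the other observed variables d-separates x and y."""
    others = [v for v in observed if v not in (x, y)]
    return not any(
        d_separated(parents, x, y, q)
        for r in range(len(others) + 1)
        for q in combinations(others, r)
    )

# Assumed structure for Figure 1: a chain plus correlated errors on X2 and X4.
parents = augment([("X1", "X2"), ("X2", "X3"), ("X3", "X4")], [("X2", "X4")])
observed = ["X1", "X2", "X3", "X4"]
print(d_separated(parents, "X1", "X4", {"X3", "l24"}))   # True
print(has_inducing_path(parents, observed, "X1", "X4"))  # True
```

Under the assumed edges this reproduces the statement in the text: X1 and X4 are d-separated given {X3, l24} but not given any subset of {X2, X3}. The exhaustive search over subsets is exponential in the number of observed variables, so it is only meant for small illustrative models.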
The causal inference (CI) algorithm is given on page 183 of Spirtes et al.
(1993). Theorem 6.3 of that reference states that if the input to the CI algorithm,
involving the observed variables, is faithful to the generating graph, then the
output of the CI algorithm is a PAG. Since we assume, under the null hypothesis
that the path model with correlated errors is correct, this is always true by
assumption. The following steps will produce a PAG that is sufficient for our
purposes.
1. Given the path model with correlated errors (G), construct the
but retaining each edge and adding circles at the ends of each edge
(vi•—•vj).
relationships in G’.
5. Orient the remaining edges in P in such a way that all definite non-colliders
are formed, and no cycles are formed. Verma and Pearl
theorems 37 and 38 of Meek (1998) prove that they are sound (i.e. any
orientation other than that specified by these rules would lead to either
The result of step 4 is a PAG. Step 5 simply generates one of the
this an “inducing path graph”. Because the PAG, and therefore the
relationships to the original path model with correlated errors (G), every d-
path graph. If P can be oriented in such a way that it is a DAG then one
can test it using the inferential test of Shipley (2000b). I will call inducing
path graphs that are also DAGs “inducing path DAGs”. Figure 2 shows
the steps involved. Note that since the path models in Figure 2a,b are d-
separation equivalent to an inducing path DAG, one can obtain a basis set
equivalent to the path model in Figure 2c. Although one can still obtain d-
therefore of the original path model, these d-separation statements are not
Conclusions
When testing path models without correlated errors (i.e. DAG models), the
here is not always superior to classical SEM when applied to path models
with correlated errors. First, some such path models are not d-separation
equivalent to any inducing path DAG, thus precluding the exact test.
Second, path models with correlated errors can imply constraints on the
estimation. There are conditions in which Shipley’s (2000b) test would still
correlated errors, assuming that an inducing path DAG exists. First, non-
2000a). Second, the test is exact and can therefore be used with small
samples. Third, there are models like the one in Figure 2b that are
unidentified and therefore cannot be tested using classical SEM but that
can still be tested using the method presented here. Finally, if the model
is judged not to fit the data, the local property of the test can allow one to
determine which parts of the model are contributing to lack of fit. This
cannot be done using ML estimation since errors in one part of the model
are propagated throughout the rest of the model due to the global nature
of the method.
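As an illustration of the test summarised in this paper, the sketch below lists the basis set BU = {vi|Pi∪Pj|vj} for a DAG and combines k null probabilities with Fisher's statistic C = -2∑ln(p_j). It is a minimal Python sketch under stated assumptions: the DAG (a simple chain) and the p-values are made up for illustration, the function names are hypothetical, and the tail probability uses the closed form of the chi-squared survival function for even degrees of freedom in place of a statistics library. Obtaining each p_j from data, that is, testing each independence claim, is left out.

```python
import math
from itertools import combinations

def basis_set(parents, variables):
    """List (vi, vj, Pi | Pj) for every non-adjacent pair of variables.

    `parents` maps each variable to the set of its causal parents.
    """
    claims = []
    for vi, vj in combinations(variables, 2):
        pi = set(parents.get(vi, ()))
        pj = set(parents.get(vj, ()))
        if vi in pj or vj in pi:
            continue  # adjacent pairs imply no independence claim
        claims.append((vi, vj, pi | pj))
    return claims

def fisher_c(p_values):
    """Return (C, degrees of freedom, tail probability) for k p-values.

    Each p-value must lie in (0, 1]. With 2k degrees of freedom the
    chi-squared tail is exp(-C/2) * sum_{i<k} (C/2)^i / i!.
    """
    k = len(p_values)
    c = -2.0 * sum(math.log(p) for p in p_values)
    half = c / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return c, 2 * k, math.exp(-half) * total

# Hypothetical DAG without correlated errors: X1 -> X2 -> X3 -> X4.
chain = {"X2": {"X1"}, "X3": {"X2"}, "X4": {"X3"}}
claims = basis_set(chain, ["X1", "X2", "X3", "X4"])
for vi, vj, q in claims:
    print(f"({vi}|{sorted(q)}|{vj})")

# Made-up null probabilities, one per claim in the basis set.
c, df, p = fisher_c([0.40, 0.70, 0.25])
print(f"C = {c:.3f} on {df} degrees of freedom, p = {p:.3f}")
```

A small tail probability would lead to rejecting the causal model as a whole, while inspecting the individual p-values shows which independence claims are responsible, which is the local property discussed above.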
Acknowledgements
This research was financially supported by the Natural Sciences and Engineering
References
Geiger, D., Paz, A. & Pearl, J. (1991). Axioms and algorithms for inferences
128-141.
2021.
Scheines, R., Spirtes, P., Glymour, C. & Richardson, T. (1998). The TETRAD
Press, Oxford.
Shipley, B. (2000b). A new inferential test for path models based on directed
Spirtes, P., Richardson, T., Meek, A., Scheines, R. & Glymour, C. (1995). Using
Elsevier.
Kaufmann.
Figure 1. (a) A path model with correlated errors (i.e. a semi-Markov model). (b)
The augmented DAG for the path model. (c) A partial ancestral graph for
Figure 2. Three different path models with correlated errors (a, b and c) are
shown along with the undirected dependency graph, the partial ancestral
graph (PAG) and the inducing path acyclic graph. The d-separation
relationships shown at the bottom form a basis set for models a and b, but
not for c.
Figure 1. [Graphs not reproduced. Panels (a), (b) and (c) each show the variables X1, X2, X3 and X4; panel (b) also shows the latent variable l24.]
Figure 2. [Graphs not reproduced. Three columns, one per path model (a, b and c), each over the variables X1, X2, X3 and X4. Rows: path model; extended DAG (latent variables labelled l12, l23 and l34 appear in this part of the figure); undirected dependency graph; PAG; IPG.]