Sem - Sample.notes
Objectives
Introduce SEM as a flexible combination of path analysis and CFA
Review recent SEMs in the literature
[Figure: path diagram with x1, y1, y2; factor model with indicators y1–y8]
Here the covariance matrix among the factors is saturated, including non-directional associations among
all latent variables.
[Figure: SEM with indicators y1–y8]
Here a causal structure has been applied to the covariance matrix among the factors.
[Figure: SEM with exogenous predictors x1 and x2 and indicators y1–y4]
As with path analysis / simultaneous equation models, we can include nominal exogenous predictors via
coding variables.
[Figure: SEM with indicators y1–y5 and observed variables y6 and y7 in the structural model]
Example Applications
Hwang & Wood (2009) used a relatively simple SEM to show
that the mental health status of Asian and Latino American
college students is predicted by the acculturation gap with their
parents, as mediated by family conflict.
Hwang, W.-C. & Wood, J.J. (2009). Acculturative family distancing: links with self-reported
symptomatology among Asian Americans and Latinos. Child Psychiatry and Human Development, 40,
123–138.
Abstract from manuscript: Objective: Our knowledge of how acculturative processes affect families
remains quite limited. This article tests whether acculturative family distancing (AFD), a more proximal
and problem-oriented measure of the acculturation gap, influences the mental health status of Asian
American and Latino college students. AFD occurs along two dimensions: communication difficulties and
cultural value incongruence. Methods: Data were collected from 186 Asian American (n = 107) and Latino
(n = 79) undergraduates, who provided self-reports on psychological problems, depressive symptoms,
and family conflict. A new self-report measure of AFD evidencing good psychometric properties was
used to test hypothesized relations among these variables in structural equation models (SEM). Results:
For both Asian American and Latinos, results indicated that higher levels of AFD were associated with
higher psychological distress and greater risk for clinical depression, and that family conflict mediated
this relation. Conclusion: AFD processes were associated with the mental health of students and the
functioning of their families. These findings highlight potential foci to address in prevention and
intervention programs, such as improving communication and teaching families how to negotiate
cultural value differences.
Example Applications
Vieno et al. (2010) evaluated the indirect effect of neighborhood
context on children’s antisocial behavior
Vieno, A., Nation, M., Perkins, D.D., Pastore, M. & Santinello, M. (2010). Social capital, safety
concerns, parenting, and early adolescents’ antisocial behavior. Journal of Community Psychology, 38,
314-328.
Abstract from manuscript: This study explores the relations between neighborhood social capital
(neighbor support and social climate), safety concerns (fear of crime and concern for one’s child),
parenting (solicitation and support), and adolescent antisocial behavior in a sample of 952 parents (742
mothers) and 588 boys and 559 girls from five middle schools (sixth through eighth grades) in a midsize
Italian city. In structural equation models, social capital is strongly and inversely related to safety
concerns and positively related to parental support and solicitation. In turn, safety concerns are also
positively related to parental support and solicitation. Social capital and safety concerns have indirect
effects on children’s antisocial behavior through their effects on parenting. Implications are discussed
for parenting and community-based interventions to prevent or reduce youth antisocial behaviors.
Example Applications
Park, Heppner & Lee (2010) evaluated whether maladaptive coping and self-esteem mediate the relationship between perfectionism and distress
Park, H.-J., Heppner, P.P. & Lee, D.-G. (2010). Maladaptive coping and self-esteem as mediators
between perfectionism and psychological distress. Personality and Individual Differences, 48, 469–474.
Abstract from manuscript: This study with 508 Korean college students examined the mediation effects
of maladaptive coping styles and self-esteem on the links of evaluative concerns perfectionism and
psychological distress. Structural equation modeling analyses supported a full mediation effect of
maladaptive coping between evaluative concerns perfectionism and distress. The final model also
revealed a significant path from evaluative concerns perfectionism through maladaptive coping and self-
esteem to distress. Furthermore, a multi-group analysis found that male college students with
evaluative concerns perfectionism tend to use maladaptive coping strategies more compared to their
female counterparts. The findings provided not only external validity for the full mediation effect of
coping but also evidence of more complex relations among the variables.
Summary
Structural equation models represent the natural combination of
simultaneous equation models with confirmatory factor analysis
The full model is very flexible, permitting complex structural
models involving both observed and latent predictors and
outcomes
The presence of the measurement model permits the estimation
of effects without bias or standard error inflation due to
measurement error
We will now see that fitting SEMs is also a straightforward
extension of procedures we have covered for simultaneous
equation models and CFA
Objectives
Show specification of SEMs in path diagram and matrix form
Describe model identification rules
Describe process of model estimation, evaluation and re-specification
Fitting SEMs
As with simultaneous equation models and CFA, model fitting
involves the following steps
Specification
Identification
Estimation
Evaluation
Potential re-specification
Interpretation
Fortunately, we can draw on the knowledge base we have built
for simultaneous equation models and CFA to develop general
principles for the full structural equation model with latent
variables.
Model Specification
As with prior models, the initial specification of the model should
be heavily guided by theory.
Which variables are predictors, mediators, and outcomes?
Which variables have causal effects, and which are merely associated?
The model can be specified graphically via a path diagram, or in
equations using an expanded set of model matrices.
Our model matrices now combine matrices used previously with
simultaneous equation models and matrices used with CFA.
The full SEM consists of two parts:
The measurement model (like CFA)
The structural model (like path analysis)
Model Specification
The measurement model is used to define the relationships of the
indicator variables to the latent variables:
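In matrix form (the same form as the CFA model), with ν the indicator intercepts, Λ the factor loadings, and Θ the covariance matrix of the indicator residuals:

$$ y_i = \nu + \Lambda \eta_i + \varepsilon_i, \qquad \mathrm{VAR}(\varepsilon_i) = \Theta $$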
Model Specification
The structural model is used to define the relationships among the
latent variables and between the latent variables and any fixed-x
exogenous predictors:
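Written out, the latent variables are regressed on one another through B and on the fixed-x predictors through Γ, with intercepts α and disturbances ζ_i:

$$ \eta_i = \alpha + B \eta_i + \Gamma x_i + \zeta_i, \qquad \mathrm{VAR}(\zeta_i) = \Psi $$

Solving for the latent variables gives $\eta_i = (I - B)^{-1}(\alpha + \Gamma x_i + \zeta_i)$, which is where the $(I - B)^{-1}$ terms in the model-implied means and covariances come from.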
Notice that the structural model is the same as path analysis, except that the latent variables defined by
the measurement model have replaced the observed variables.
[Figure: SEM with three factors and indicators y1–y8]
We’ll talk more about scaling latent variables in SEM shortly. For now we just use scaling items.
where
$$ \mathrm{VAR}(\zeta_i) = \Psi = \begin{pmatrix} \psi_{11} & & \\ 0 & \psi_{22} & \\ 0 & 0 & \psi_{33} \end{pmatrix} $$
[Figure: SEM with exogenous predictors x1 and x2, two factors, and indicators y1–y4]
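$$ \begin{pmatrix} y_{1i} \\ y_{2i} \\ y_{3i} \\ y_{4i} \end{pmatrix} = \begin{pmatrix} 0 \\ \nu_2 \\ 0 \\ \nu_4 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ 0 & 1 \\ 0 & \lambda_{42} \end{pmatrix} \begin{pmatrix} \eta_{1i} \\ \eta_{2i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \\ \varepsilon_{3i} \\ \varepsilon_{4i} \end{pmatrix} $$
where
$$ \mathrm{VAR}(\zeta_i) = \Psi = \begin{pmatrix} \psi_{11} & \\ 0 & \psi_{22} \end{pmatrix} $$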
[Figure: SEM with indicators y1–y5 and observed variables y6 and y7 in the structural model]
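$$ \begin{pmatrix} y_{1i} \\ y_{2i} \\ y_{3i} \\ y_{4i} \\ y_{5i} \end{pmatrix} = \begin{pmatrix} 0 \\ \nu_2 \\ \nu_3 \\ 0 \\ \nu_5 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & 1 \\ 0 & \lambda_{52} \end{pmatrix} \begin{pmatrix} \eta_{1i} \\ \eta_{2i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \\ \varepsilon_{3i} \\ \varepsilon_{4i} \\ \varepsilon_{5i} \end{pmatrix} $$
where
$$ \mathrm{VAR}(\varepsilon_i) = \Theta = \mathrm{DIAG}(\theta_{11}, \theta_{22}, \theta_{33}, \theta_{44}, \theta_{55}) $$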
The variables y6 and y7 do not appear here as they are not indicators of a latent factor. They will appear
in the structural model instead as, essentially, “manifest factors”.
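where
$$ \mathrm{VAR}(\zeta_i) = \Psi = \begin{pmatrix} \psi_{11} & & & \\ \psi_{21} & \psi_{22} & & \\ 0 & 0 & \psi_{33} & \\ 0 & 0 & \psi_{43} & \psi_{44} \end{pmatrix} $$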
Here we see y6 and y7 alongside the latent variables. The subscript numbering on the regression
coefficients and disturbances is a little unfortunate because it doesn’t match the numbering of the y
variables. Such is life.
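$$ y_i = \nu + \Lambda \eta_i + \varepsilon_i, \qquad \mathrm{VAR}(\varepsilon_i) = \Theta $$
$$ \mu_y(\theta) = \nu + \Lambda \mu_\eta = \nu + \Lambda (I - B)^{-1} (\alpha + \Gamma \mu_x) $$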
As in the previous models we have seen, the mean structure will often be saturated, with as many
parameters estimated as there are observed means.
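$$ \Sigma(\theta) = \begin{pmatrix} \Sigma_{yy}(\theta) & \\ \Sigma_{xy}(\theta) & \Sigma_{xx} \end{pmatrix} $$
where
$$ \Sigma_{yy}(\theta) = \Lambda \Sigma_{\eta\eta} \Lambda' + \Theta = \Lambda (I - B)^{-1} (\Gamma \Sigma_{xx} \Gamma' + \Psi) \left[(I - B)^{-1}\right]' \Lambda' + \Theta $$
and
$$ \Sigma_{xy}(\theta) = \Sigma_{xx} \Gamma' \left[(I - B)^{-1}\right]' \Lambda' $$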
Notice that the expression $\Sigma_{yy}(\theta) = \Lambda \Sigma_{\eta\eta} \Lambda' + \Theta$ is of the same form as the CFA model. The difference
in the SEM is that we also apply a path analytic structure to $\Sigma_{\eta\eta}$, the covariance matrix of the factors.
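$$ \mu(\theta) = \begin{pmatrix} \nu + \Lambda (I - B)^{-1} (\alpha + \Gamma \mu_x) \\ \mu_x \end{pmatrix} $$
$$ \Sigma(\theta) = \begin{pmatrix} \Lambda (I - B)^{-1} (\Gamma \Sigma_{xx} \Gamma' + \Psi) \left[(I - B)^{-1}\right]' \Lambda' + \Theta & \\ \Sigma_{xx} \Gamma' \left[(I - B)^{-1}\right]' \Lambda' & \Sigma_{xx} \end{pmatrix} $$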
There is a lot going on here, but the important thing is just to realize that the model implies a specific
structure for the means and covariances of the data, and one of our key objectives is to see how well
this structure matches the observed means and covariances.
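The implied moments can be computed directly from the model matrices. The sketch below is a minimal numpy illustration of the formulas, not the procedure of any particular SEM package; the function name `implied_moments` and every numerical value in the usage example are hypothetical.

```python
import numpy as np


def implied_moments(nu, Lam, alpha, B, Gamma, Psi, Theta, mu_x, Sigma_xx):
    """Return the model-implied mean vector and covariance matrix of (y', x')'."""
    inv = np.linalg.inv(np.eye(B.shape[0]) - B)                    # (I - B)^{-1}
    mu_eta = inv @ (alpha + Gamma @ mu_x)                            # factor means
    Sigma_eta = inv @ (Gamma @ Sigma_xx @ Gamma.T + Psi) @ inv.T     # factor covariances
    mu_y = nu + Lam @ mu_eta
    Sigma_yy = Lam @ Sigma_eta @ Lam.T + Theta                       # same form as CFA
    Sigma_xy = Sigma_xx @ Gamma.T @ inv.T @ Lam.T                    # q x p block
    mu = np.concatenate([mu_y, mu_x])
    Sigma = np.block([[Sigma_yy, Sigma_xy.T],
                      [Sigma_xy, Sigma_xx]])
    return mu, Sigma


# Hypothetical model: x1 and x2 predict eta1; eta1 and x2 predict eta2.
# Indicators y1, y2 load on eta1 and y3, y4 on eta2 (y1, y3 are scaling items).
nu = np.array([0.0, 0.5, 0.0, 0.3])
Lam = np.array([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.9]])
alpha = np.zeros(2)
B = np.array([[0.0, 0.0], [0.5, 0.0]])
Gamma = np.array([[0.4, 0.3], [0.0, 0.2]])
Psi = np.diag([1.0, 0.5])
Theta = np.diag([0.3, 0.4, 0.3, 0.2])
mu_x = np.array([1.0, 0.0])
Sigma_xx = np.array([[1.0, 0.3], [0.3, 1.0]])

mu, Sigma = implied_moments(nu, Lam, alpha, B, Gamma, Psi, Theta, mu_x, Sigma_xx)
print(mu.round(3))
print(Sigma.round(3))
```

Fitting the model then amounts to choosing the free elements of these matrices so that `mu` and `Sigma` come as close as possible to the observed means and covariances.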
Model Identification
Like CFA, one requirement for model identification is that the
scale of the latent variables be defined through specific
restrictions in the model.
As before, one way to set the scale of the latent factors is by
fixing the intercept ν and loading λ of a scaling indicator to zero and one, respectively
This is what we did when specifying the preceding models
Alternatively, we can set the scale of the latent variables by fixing
the factor means/intercepts (α) and the factor (disturbance) variances (diagonal of Ψ) to zero and one, respectively.
Unlike CFA, this is not always equivalent to putting the factor on a
standard normal scale.
If the factor is predicted by other variables in the model, then we are
actually standardizing the factor disturbances, and the total variance of
the factor will exceed one
One can put the factors on a standard normal scale post-estimation by requesting the standardized
solution. We will see this in our example later.
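A short numeric sketch of this point, assuming a hypothetical model in which factor η1 (variance fixed to 1) predicts factor η2 with slope 0.75, and the disturbance variance of η2 is fixed to 1. The helper name `outcome_factor_variance` is illustrative only.

```python
import math


def outcome_factor_variance(beta, var_predictor=1.0, var_disturbance=1.0):
    """Total variance of an outcome factor eta2 = beta * eta1 + zeta2, and the
    standardized slope obtained by rescaling eta2 to unit variance."""
    var_outcome = beta ** 2 * var_predictor + var_disturbance
    std_beta = beta * math.sqrt(var_predictor / var_outcome)
    return var_outcome, std_beta


var_eta2, std_beta = outcome_factor_variance(0.75)
print(var_eta2)   # 1.5625: exceeds 1 even though VAR(zeta2) was fixed to 1
print(std_beta)   # standardized slope, about 0.6
```

The standardized solution simply performs this rescaling for every factor after estimation.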
Model Identification
Aside from assigning a metric to the latent variables, other
restrictions are also necessary to identify the model.
e.g., common to assume simple structure and local independence of
items (but not required)
Model identification can be determined by
Matrix algebra (hard)
t-rule (necessary but not sufficient)
Two-step rule (sufficient but not necessary)
t-Rule
As before, the t-rule simply stipulates that there should be at least as many
observed means and unique (co)variances as estimated
parameters.
More formally, if t is the number of freely estimated parameters in θ,
then a necessary (but not sufficient) condition for identification
is
$$ t \le (p + q) + \frac{(p + q)(p + q + 1)}{2} $$
where p and q are the number of y and x variables, respectively
Because $\mu_x$ and $\Sigma_{xx}$ are always saturated, some software programs (e.g.,
Mplus) do not count the elements in these matrices as parameters to be
estimated
In assessing the t-rule, however, one should include unique elements in
$\mu_x$ and $\Sigma_{xx}$ when counting the number of parameters t.
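The counting in the t-rule is easy to script. This is a minimal sketch; `t_rule` is a hypothetical helper, and t must already include the unique elements of μ_x and Σ_xx as recommended above.

```python
def t_rule(t, p, q):
    """Check the t-rule (necessary, not sufficient) and return (passes, df).

    t: number of freely estimated parameters, counting the unique elements
       of mu_x and Sigma_xx; p, q: numbers of y and x variables.
    """
    k = p + q
    n_moments = k + k * (k + 1) // 2   # observed means + unique (co)variances
    return t <= n_moments, n_moments - t


# The 22-indicator example analyzed later in these notes (t = 73, p = 22, q = 0)
print(t_rule(73, 22, 0))   # (True, 202)
```

A model that fails this check cannot be identified; a model that passes still needs a sufficient rule (or matrix algebra) to confirm identification.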
Two-Step Rule
Step 1: Respecify the model as a confirmatory factor analysis
with all possible associations between factors. Determine the
identification of the measurement model as you would do for a
CFA.
Step 2: Treat all latent variables within the structural model as if
they were observed variables. Determine the identification of
the structural model as you would do for a simultaneous
equation model for observed variables.
Passing the two-step rule is sufficient to ensure model
identification, but not necessary (i.e., some identified models fail
the two-step rule).
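[Figure: two-step rule applied to the three-factor model with indicators y1–y8]
$$ B = \begin{pmatrix} 0 & 0 \\ \beta_{21} & 0 \end{pmatrix}, \qquad \Psi = \begin{pmatrix} \psi_{11} & \\ 0 & \psi_{22} \end{pmatrix} $$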
Note that the first latent variable is now treated as if it were a fixed-x exogenous predictor, so the B and
Ψ matrices are only 2 × 2 for the simultaneous equation model (whereas they are 3 × 3 for the SEM as
actually fit to the data).
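For step 2, one well-known sufficient condition for an observed-variable simultaneous equation model is the recursive rule: no feedback loops among the endogenous variables and uncorrelated disturbances (Ψ diagonal). The hypothetical helper below checks this rule from the pattern of free parameters, with nonzero entries marking free parameters. Nonrecursive models that fail it may still be identified and need other rules.

```python
import numpy as np


def recursive_rule(B, Psi):
    """True if B has no feedback loops and Psi is diagonal (sufficient for identification)."""
    pattern = (np.asarray(B) != 0).astype(int)   # 1 = free path, 0 = fixed at zero
    m = pattern.shape[0]
    # A directed graph on m nodes is acyclic iff its adjacency matrix is nilpotent.
    acyclic = not np.any(np.linalg.matrix_power(pattern, m))
    Psi = np.asarray(Psi, dtype=float)
    uncorrelated = np.allclose(Psi, np.diag(np.diag(Psi)))
    return acyclic and uncorrelated


# Step-2 model from the text: B has a single free path beta_21, Psi is diagonal.
print(recursive_rule([[0, 0], [1, 0]], [[1, 0], [0, 1]]))   # True

# A feedback loop (eta1 <-> eta2) fails the rule.
print(recursive_rule([[0, 1], [1, 0]], [[1, 0], [0, 1]]))   # False
```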
Model Estimation
Given identification of the model, we can now turn to estimation
As with simultaneous equation models and CFA, multiple
estimators exist for SEMs
Maximum likelihood
Ordinary least squares
Generalized least squares
Weighted least squares
As before, we will concentrate on maximum likelihood
The only significant difference from path analysis and CFA is in the
computation of the model-implied means and covariances
Model Estimation
The normal-theory ML fitting function is again
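$$ \ln L(\theta) = \sum_{i=1}^{N} \ln \left[ (2\pi)^{-(p+q)/2} \, |\Sigma(\theta)|^{-1/2} \exp\left\{ -\tfrac{1}{2} \left(z_i - \mu(\theta)\right)' \Sigma(\theta)^{-1} \left(z_i - \mu(\theta)\right) \right\} \right] $$
where $z_i = (y_i', x_i')'$ collects the observed variables for case i.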
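For given model-implied moments μ(θ) and Σ(θ), the log-likelihood and the usual likelihood-ratio χ² against the saturated model can be evaluated as below. This is a minimal numpy sketch assuming complete data with one row per case; the function names are hypothetical, and actual SEM software additionally maximizes over θ.

```python
import numpy as np


def ml_loglik(data, mu, Sigma):
    """Normal-theory log-likelihood, summed over the N rows (cases) of data."""
    data = np.asarray(data, dtype=float)
    N, k = data.shape
    _, logdet = np.linalg.slogdet(Sigma)
    dev = data - mu
    # (z_i - mu)' Sigma^{-1} (z_i - mu) for every case i
    quad = np.einsum("ij,ij->i", dev @ np.linalg.inv(Sigma), dev)
    return -0.5 * (N * k * np.log(2 * np.pi) + N * logdet + quad.sum())


def lr_chi_square(data, mu, Sigma):
    """-2 times the log-likelihood ratio of the model versus the saturated model."""
    data = np.asarray(data, dtype=float)
    mu_sat = data.mean(axis=0)
    Sigma_sat = np.cov(data, rowvar=False, bias=True)   # ML (divide-by-N) estimate
    return -2.0 * (ml_loglik(data, mu, Sigma) - ml_loglik(data, mu_sat, Sigma_sat))


rng = np.random.default_rng(1)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=200)
# Hypothetical model that (wrongly) assumes uncorrelated variables with zero means:
print(lr_chi_square(z, np.zeros(2), np.eye(2)))
```

The saturated model reproduces the sample moments exactly, so its χ² is zero; any restricted model yields a nonnegative value.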
Model Respecification
When the model fails to fit, modification indices may suggest
avenues for improvement.
As always, one must treat these with some skepticism, as use of MIs is
exploratory and model fit can be improved by capitalizing on chance.
MIs indicate expected improvement in fit for freeing a single parameter
at a time, as opposed to larger scale model misspecifications.
Model Respecification
Another strategy for locating model misfit is to isolate problems
to the structural or measurement model.
Useful only if the structural model is not saturated.
Similar in concept to the two-step rule for identification.
Overall model χ² can be decomposed into additive components due to
misfit of the measurement model + misfit of the structural model.
The component for the measurement model can be obtained by fitting a
CFA respecification of the model with a saturated factor covariance matrix. The χ²
obtained for this model is the misfit due to the measurement model.
Poor fit indicates a problem with the measurement model.
The difference in χ² between the CFA and original SEM is the misfit due
to the structural model. Can evaluate by LRT, given these are nested
models (Anderson & Gerbing, 1988).
A significant χ² difference test indicates a problem with the structural model
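The χ² difference test itself is simple once the χ² and df of two nested models are known. The sketch below uses only the Python standard library and computes the chi-square upper-tail probability from its closed form for integer df; the helper names are hypothetical, and in practice one would usually call a statistics library. In the decomposition above, the restricted model is the SEM and the general model is the CFA respecification.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x)
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def chi2_difference_test(chi2_restricted, df_restricted, chi2_general, df_general):
    """Likelihood-ratio test for nested models; the restricted model has more df."""
    d_chi2 = chi2_restricted - chi2_general
    d_df = df_restricted - df_general
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# Values from the worked example later in these notes:
# original model chi2(202) = 350.00 vs. revised model chi2(195) = 270.28
print(chi2_difference_test(350.00, 202, 270.28, 195))
```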
Summary
In sum, the specification, identification, and process of fitting and
interpreting SEMs all follow straightforwardly from procedures
for simultaneous equation models and CFA models.
Objectives
Provide a real-data example of the process of fitting, evaluating,
re-specifying, and interpreting an SEM
It is worth noting that the sample size is a bit low, especially for the complexity of the fitted model, but
this is not uncommon for published applications of SEM.
Senol-Durak, E. & Ayvasik, H.B. (2010). Factors associated with posttraumatic growth among the
spouses of myocardial infarction patients. Journal of Health Psychology, 15, 5–95.
Abstract from manuscript: To clarify the rationale behind Posttraumatic Growth (PTG), a model by
Schaefer and Moos describes the relative contribution of environmental resources, individual resources,
event related factors, cognitive processing and coping (CPC) on PTG. In the present study, this model
was tested with the spouses of myocardial infarction patients with data from various hospitals in Turkey.
A structural equation model revealed that neither individual nor environmental resources had indirect
effects on PTG through the effect of event-related factors and CPC, while they showed direct effects on
PTG. The findings were discussed in the context of the theoretical model.
One could probably make a case that the indicators for Event-Related Factors should be causal
indicators rather than effect indicators. The authors used a reflective measurement model for all
factors, however, and we remain consistent with their analysis.
[Figure: hypothesized model with latent factors Environmental Resources, Individual Resources, Event-Related Factors (ERF), Coping (Cope), and Posttraumatic Growth (PTG; indicators ptgi1–ptgi5)]
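where
$$ \Psi = \begin{pmatrix} 1 & & & & \\ \psi_{21} & 1 & & & \\ 0 & 0 & 1 & & \\ 0 & 0 & 0 & 1 & \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} $$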
Note that the means/intercepts and (residual) variances of the factors have been fixed to 0 and 1,
respectively to scale the latent variables.
Example: Identification
We shall scale the latent factors/residuals by setting their means
and variances to 0 and 1, respectively
Let us now see if the model passes the t-rule.
The number of estimated parameters is 22 factor loadings + 22
intercepts + 22 residual variances + 1 factor correlation + 6
factor regression slopes = 73
The number of observed variables is 22, so there are
$$ \frac{22(22 + 1)}{2} + 22 = 275 $$
unique observed first- and second moments from which to fit the
model
The model thus passes the t-rule and will have 275 – 73 = 202
degrees of freedom.
Example: Identification
The t-rule is necessary but not sufficient.
It quickly shows us that our model may be identified.
Now we can turn to the two-step rule to further evaluate the
identification of the model.
Recall that the first step is to reparameterize the model as a
confirmatory factor model with all possible associations among
the factors to verify identification of the measurement model.
The second step is to examine the structural model as if it were
a structural model for observed variables to verify its
identification.
Example: Identification
[Figure: CFA respecification of the model, with all factors freely correlated]
Here we see that the measurement model is identified by the 3-indicator rule
Example: Identification
[Figure: structural model among Environmental Resources, Individual Resources, Event-Related Factors (ERF), Coping (Cope), and Posttraumatic Growth (PTG)]
Example: Identification
If the latent variables were observed, the simultaneous equation
model would be
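$$ \begin{pmatrix} \mathrm{ERF}_i \\ \mathrm{Coping}_i \\ \mathrm{PTG}_i \end{pmatrix} = \begin{pmatrix} \alpha_{ERF} \\ \alpha_{COPE} \\ \alpha_{PTG} \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ \beta_{21} & 0 & 0 \\ 0 & \beta_{32} & 0 \end{pmatrix} \begin{pmatrix} \mathrm{ERF}_i \\ \mathrm{Coping}_i \\ \mathrm{PTG}_i \end{pmatrix} + \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ 0 & 0 \\ \gamma_{31} & \gamma_{32} \end{pmatrix} \begin{pmatrix} \mathrm{ER}_i \\ \mathrm{IR}_i \end{pmatrix} + \begin{pmatrix} \zeta_{ERF,i} \\ \zeta_{COPE,i} \\ \zeta_{PTG,i} \end{pmatrix} $$
with
$$ \Psi = \begin{pmatrix} \psi_{ERF} & & \\ 0 & \psi_{COPE} & \\ 0 & 0 & \psi_{PTG} \end{pmatrix} $$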
Note that the Environmental and Individual Resources factors are treated as if they were exogenous x
variables in this step.
Our fit measures do not perfectly reproduce the results reported in Senol-Durak & Ayvasik (2010), but
this may be due to rounding error in the correlation matrix.
Revised Model
[Figure: revised (correlated uniqueness) model]
Revised Model
Fit statistics for the correlated uniqueness model:
χ²(195) = 270.28, p = .0003
CFI = .923; TLI = .908
RMSEA = .054; CI90 = (.037, .069)
SRMR = .090
CU model fits significantly better than the original model:
Δχ²(7) = 350.00 – 270.28 = 79.72, p < .0001.
Overall fit is borderline, especially given low power with low N
Some large MIs still, but suggested paths not consistent with theory
Especially at low N, suggested paths may reflect only chance
fluctuations in data that would not replicate in another sample
We will retain the revised, correlated uniqueness model and
proceed to interpret the estimates.
Interpretation
Interpretation of the measurement model is done similarly to
CFA. We will thus consider just the IR factor here as an example
Note that locus of control is a negative indicator of Individual Resources (but it is not clear from the article how this was scored)
[Figure: Individual Resources factor with standardized loadings -.64, .51, .39, .47, .71]
All reported values are standardized estimates. Although we consider only the Individual Resources
factor here, the interpretation of the other factors would proceed similarly.
Interpretation
The structural model is depicted with standardized estimates below:
[Figure: structural part of the revised model with standardized estimates]
Findings include:
Effect Decomposition
Let’s first consider the effect of Event-Related Factors on Post-
Traumatic Growth
The effects of Event-Related Factors on Post-Traumatic Growth:
Total: -.20, p < .05
Direct: 0
Indirect: -.20, p < .05
Interpretation: More positive event-related factors reduce the
need for Coping. Since Coping positively impacts Post-Traumatic
Growth, higher Event-Related Factors lead to less opportunity
for Post-Traumatic Growth
Note that the direct effect of Event-Related Factors on Post-Traumatic
Growth was constrained to be zero, so the total effect equals the
indirect effect in this case.
Effect Decomposition
Let us now consider the effects of our distal predictors,
Environmental Resources and Individual Resources
The effects of Environmental Resources on Post-Traumatic
Growth:
Total: .24, p < .05
Direct: .20, p < .05
Indirect: .04, ns
Environmental Resources do not significantly impact Event-
Related Factors, and this is reflected in the non-significant indirect
effect of Environmental Resources on Post-Traumatic Growth
through Event-Related Factors and Coping.
Environmental Resources do, however, have a significantly positive
direct effect on Post-Traumatic Growth.
Effect Decomposition
Last we can consider the effect decomposition for Individual
Resources
The effects of Individual Resources on Post-Traumatic Growth:
Total: .09, ns
Direct: .22, p < .05
Indirect: -.13, p = .05
There are countervailing indirect and direct effects that result in
a non-significant total effect.
Higher Individual Resources predict more positive Event-Related Factors
(which predict less Coping and, in turn, less opportunity for Post-
Traumatic Growth), leading to the negative indirect effect and null total
effect.
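All three decompositions follow from the matrix formulas used in path analysis: total effects of the exogenous factors are (I − B)⁻¹Γ, total effects among the endogenous factors are (I − B)⁻¹ − I, and indirect effects are total minus direct. In the sketch below, the standardized path values are our reading of the structural figure (ER → ERF = -.18, IR → ERF = .65, ERF → Coping = -.33, Coping → PTG = .60, ER → PTG = .20, IR → PTG = .22) and should be treated as assumptions. Because they are rounded, the results match the reported total and indirect effects only to within about .01; the sketch does not compute standard errors or p-values.

```python
import numpy as np

# Endogenous factors (rows/columns of B): ERF, Coping, PTG.
# Exogenous factors (columns of Gamma): ER (Environmental), IR (Individual Resources).
# Standardized path values read from the structural figure (assumed, rounded).
B = np.array([[0.00, 0.00, 0.00],
              [-0.33, 0.00, 0.00],    # ERF -> Coping
              [0.00, 0.60, 0.00]])    # Coping -> PTG
Gamma = np.array([[-0.18, 0.65],      # ER -> ERF, IR -> ERF
                  [0.00, 0.00],       # no direct paths to Coping
                  [0.20, 0.22]])      # ER -> PTG, IR -> PTG

inv = np.linalg.inv(np.eye(3) - B)    # (I - B)^{-1}

total_x = inv @ Gamma                 # total effects of ER, IR on each factor
indirect_x = total_x - Gamma          # indirect = total - direct
total_eta = inv - np.eye(3)           # total effects among ERF, Coping, PTG
indirect_eta = total_eta - B

print("ER, IR -> PTG  total:   ", total_x[2].round(3))
print("ER, IR -> PTG  indirect:", indirect_x[2].round(3))
print("ERF -> PTG     total:   ", total_eta[2, 0].round(3))
```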
Summary
The procedures we followed in fitting the SEM are very similar to
those used in path analysis and CFA
As in path analysis, our models may include complex causal
chains and mediation effects, but now including latent predictors,
mediators and outcomes.
The computation, testing, and interpretation of total, direct and
indirect effects is similar to path analysis, and can aid in model
interpretation.
Objectives
Revisit the issue of equivalent models in the SEM context
Equivalent Models
When one tests an SEM, generally one considers this a test of the
particular SEM that has been specified.
But in fact it is a test of the class of all equivalent models to the
specified SEM.
Some equivalent models, despite having equal fit, may generate
quite different substantive conclusions, so it is a good idea to
consider what alternative model structures cannot be
differentiated from the hypothesized one.
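The simplest illustration (not taken from the worked example) uses hypothetical standardized values: a chain X → M → Y and its complete reversal Y → M → X imply identical covariance matrices. Each model estimates two paths and three variances from six observed (co)variances, so each has one df, and both impose the same constraint that the X–Y correlation equals the product of the two paths. No fit statistic can distinguish them. The helper `implied_cov` is a hypothetical name.

```python
import numpy as np


def implied_cov(B, Psi):
    """Covariance matrix implied by an observed-variable path model v = B v + zeta."""
    inv = np.linalg.inv(np.eye(B.shape[0]) - B)
    return inv @ Psi @ inv.T


a, b = 0.5, 0.4   # hypothetical standardized path coefficients

# Variable order in both models: X, M, Y.
# Model 1: X -> M -> Y
B1 = np.array([[0.0, 0.0, 0.0],
               [a, 0.0, 0.0],
               [0.0, b, 0.0]])
Psi1 = np.diag([1.0, 1 - a**2, 1 - b**2])

# Model 2: Y -> M -> X (every arrow reversed)
B2 = np.array([[0.0, a, 0.0],
               [0.0, 0.0, b],
               [0.0, 0.0, 0.0]])
Psi2 = np.diag([1 - a**2, 1 - b**2, 1.0])

print(implied_cov(B1, Psi1).round(3))
print(np.allclose(implied_cov(B1, Psi1), implied_cov(B2, Psi2)))   # True
```

The two models tell opposite causal stories, yet the data cannot favor one over the other; only theory or design (e.g., temporal ordering) can.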
[Figure: equivalent alternative models for Coping, Environmental Resources, Individual Resources, and Posttraumatic Growth (PTG), with standardized estimates]
More positive Event-Related Factors lead to less need for coping and in
turn less opportunity for post-traumatic growth
Environmental and Individual Resources promote post-traumatic growth
Summary
Many other equivalent models may exist that provide precisely
the same fit to the data as the one initially motivated by theory
Thus the test of theory may not be as strong as initially supposed
(not just testing original model, but also all equivalent models)
In some cases, some equivalent models may be equally or more
defensible / interpretable as the original model
Chapter Summary
Structural equation models with latent variables blend together
the core features of path analysis and confirmatory factor analysis
SEMs are composed of two parts
Measurement Model (like CFA)
Structural Model (like path analysis)
Fitting and interpreting SEMs involves similar procedures to path
analysis and CFA
Identification, estimation, evaluation, re-specification, interpretation
Interpretation aided by computation of total, direct, and indirect effects
Relative to path analysis, SEM with latent variables provides the
key advantage that the coefficients in the structural model are
unbiased by measurement error