Introduction to Structural
Equation Modeling: Issues
and Practical Considerations
Pui-Wa Lei and Qiong Wu, The Pennsylvania State University
Structural equation modeling (SEM) is a versatile statistical modeling tool. Its estimation
techniques, modeling capacities, and breadth of applications are expanding rapidly. This module
introduces some common terminology. General steps of SEM are discussed along with important
considerations in each step. Simple examples are provided to illustrate some of the ideas for
beginners. In addition, several popular specialized SEM software programs are briefly discussed
with regard to their features and availability. The intent of this module is to focus on foundational
issues to inform readers of the potentials as well as the limitations of SEM. Interested readers are
encouraged to consult additional references for advanced model types and more application
examples.
Educational Measurement: Issues and Practice, Fall 2007
The development of SEM techniques is ongoing. With advances in estimation techniques, basic models, such as measurement models, path models, and their
integration into a general covariance structure SEM analysis framework have been expanded to include, but are by
no means limited to, the modeling of mean structures, interaction or nonlinear relations, and multilevel problems.
The purpose of this module is to introduce the foundations
of SEM modeling with the basic covariance structure models to new SEM researchers. Readers are assumed to have
basic statistical knowledge in multiple regression and analysis of variance (ANOVA). References and other resources
on current developments of more sophisticated models are
provided for interested readers.
What is Structural Equation Modeling?
Structural equation modeling is a general term that has
been used to describe a large number of statistical models
used to evaluate the validity of substantive theories with
empirical data. Statistically, it represents an extension of
general linear modeling (GLM) procedures, such as the
ANOVA and multiple regression analysis. One of the primary advantages of SEM (vs. other applications of GLM)
is that it can be used to study the relationships among latent constructs that are indicated by multiple measures. It is
also applicable to both experimental and non-experimental
data, as well as cross-sectional and longitudinal data. SEM
takes a confirmatory (hypothesis testing) approach to the
multivariate analysis of a structural theory, one that stipulates causal relations among multiple variables. The causal
pattern of intervariable relations within the theory is specified a priori. The goal is to determine whether a hypothesized theoretical model is consistent with the data collected
to reflect this theory. The consistency is evaluated through
model-data fit, which indicates the extent to which the postulated network of relations among variables is plausible.
SEM is a large sample technique (usually N > 200; e.g.,
Kline, 2005, pp. 111, 178) and the sample size required is
somewhat dependent on model complexity, the estimation
method used, and the distributional characteristics of observed variables (Kline, pp. 14–15). SEM has a number of
synonyms and special cases in the literature including path
analysis, causal modeling, and covariance structure analysis.
In simple terms, SEM involves the evaluation of two models:
a measurement model and a path model. They are described
below.
Path Model
Path analysis is an extension of multiple regression in that it
involves various multiple regression models or equations that
are estimated simultaneously. This provides a more effective
and direct way of modeling mediation, indirect effects, and
other complex relationships among variables. Path analysis
can be considered a special case of SEM in which structural
relations among observed (vs. latent) variables are modeled.
Structural relations are hypotheses about directional influences or causal relations of multiple variables (e.g., how
independent variables affect dependent variables). Hence,
path analysis (or the more generalized SEM) is sometimes
referred to as causal modeling. Because analyzing interrelations among variables is a major part of SEM and these interrelations are hypothesized to generate specific observed
covariance (or correlation) patterns among the variables,
SEM is also sometimes called covariance structure analysis.
In SEM, a variable can serve both as a source variable
(called an exogenous variable, which is analogous to an independent variable) and a result variable (called an endogenous variable, which is analogous to a dependent variable)
in a chain of causal hypotheses. This kind of variable is
often called a mediator. As an example, suppose that family environment has a direct impact on learning motivation
which, in turn, is hypothesized to affect achievement. In this
case motivation is a mediator between family environment
and achievement; it is the source variable for achievement
and the result variable for family environment. Furthermore,
feedback loops among variables (e.g., achievement can in
turn affect family environment in the example) are permissible in SEM, as are reciprocal effects (e.g., learning
motivation and achievement affect each other).1
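The mediation chain just described can be sketched numerically. The following is a minimal illustration (not the authors' analysis) using simulated data and equation-by-equation least squares, which coincides with simultaneous estimation for a simple recursive path model with uncorrelated disturbances; all variable names and coefficient values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulated data for the hypothesized chain:
# family environment -> learning motivation -> achievement.
family = rng.normal(size=n)
motivation = 0.5 * family + rng.normal(size=n)       # true path a = 0.5
achievement = 0.6 * motivation + rng.normal(size=n)  # true path b = 0.6

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

a = ols_slope(family, motivation)       # family -> motivation
b = ols_slope(motivation, achievement)  # motivation -> achievement
indirect = a * b                        # indirect effect of family on achievement

print(round(a, 2), round(b, 2), round(indirect, 2))
```

The product a × b is the indirect effect of family environment on achievement through motivation, which is the quantity a path analysis of this model would report.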
In path analyses, observed variables are treated as if they
are measured without error, an assumption that is unlikely to hold in most social and behavioral sciences.
When observed variables contain error, estimates of path coefficients may be biased in unpredictable ways, especially for
complex models (e.g., Bollen, 1989, pp. 151–178). Estimates
of reliability for the measured variables, if available, can be
incorporated into the model to fix their error variances (e.g.,
squared standard error of measurement via classical test
theory). Alternatively, if multiple observed variables that
are supposed to measure the same latent constructs are
available, then a measurement model can be used to separate the common variances of the observed variables from
their error variances thus correcting the coefficients in the
model for unreliability.2
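The reliability-based correction mentioned above can be sketched in a few lines: under classical test theory, the error variance at which an indicator is fixed is the squared standard error of measurement, SD × √(1 − reliability), squared. The SD and reliability values below are hypothetical.

```python
import math

# Hypothetical indicator: SD = 15, reliability = .90 (assumed values).
sd, reliability = 15.0, 0.90

sem_meas = sd * math.sqrt(1 - reliability)  # standard error of measurement
error_variance = sem_meas ** 2              # value at which to fix the error variance

print(round(sem_meas, 2), round(error_variance, 1))  # 4.74 22.5
```

Fixing the indicator's error variance at this value tells the model how much of the observed variance to treat as unreliable rather than estimating it freely.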
Measurement Model
The measurement of latent variables originated from psychometric theories. Unobserved latent variables cannot be
measured directly but are indicated or inferred by responses
to a number of observable variables (indicators). Latent
constructs such as intelligence or reading ability are often
gauged by responses to a battery of items that are designed
to tap those constructs. Responses of a study participant to
those items are supposed to reflect where the participant
stands on the latent variable. Statistical techniques, such
as factor analysis, exploratory or confirmatory, have been
widely used to examine the number of latent constructs underlying the observed responses and to evaluate the adequacy
of individual items or variables as indicators for the latent
constructs they are supposed to measure.
The measurement model in SEM is evaluated through confirmatory factor analysis (CFA). CFA differs from exploratory
factor analysis (EFA) in that factor structures are hypothesized a priori and verified empirically rather than derived
from the data. EFA often allows all indicators to load on all
factors and does not permit correlated residuals. Solutions
for different numbers of factors are often examined in EFA
and the most sensible solution is interpreted. In contrast,
the number of factors in CFA is assumed to be known. In
SEM, these factors correspond to the latent constructs represented in the model. CFA allows an indicator to load on
multiple factors (if it is believed to measure multiple latent
constructs). It also allows residuals or errors to correlate (if
these indicators are believed to have common causes other
than the latent factors included in the model). Once the measurement model has been specified, structural relations of
the latent factors are then modeled essentially the same way
as they are in path models. The combination of CFA models
with structural path models on the latent constructs represents the general SEM framework in analyzing covariance
structures.
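The covariance structure implied by a measurement model can be written as Σ = ΛΦΛ′ + Θ, where Λ holds the loadings, Φ the factor variances and covariances, and Θ the residual variances. A minimal numerical sketch for a one-factor model with three indicators (hypothetical standardized loadings) follows:

```python
import numpy as np

# One-factor CFA with three indicators (hypothetical standardized loadings).
Lambda = np.array([[0.8], [0.7], [0.6]])   # factor loadings
Phi = np.array([[1.0]])                    # factor variance fixed to 1
Theta = np.diag([1 - 0.8**2, 1 - 0.7**2, 1 - 0.6**2])  # residual variances

# Model-implied covariance matrix: Sigma = Lambda Phi Lambda' + Theta
Sigma = Lambda @ Phi @ Lambda.T + Theta
print(np.round(Sigma, 2))
```

Estimation then amounts to choosing the free elements of Λ, Φ, and Θ so that Σ is as close as possible to the sample covariance matrix.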
Other Models
Current developments in SEM include the modeling of mean
structures in addition to covariance structures, the modeling
of changes over time (growth models) and latent classes or
profiles, the modeling of data having nesting structures (e.g.,
students are nested within classes which, in turn, are nested
within schools; multilevel models), as well as the modeling of
nonlinear effects (e.g., interaction). Models can also be different for different groups or populations by analyzing multiple sample-specific models simultaneously (multiple sample
analysis). Moreover, sampling weights can be incorporated
for complex survey sampling designs. See Marcoulides and
Schumacker (2001) and Marcoulides and Moustaki (2002)
for more detailed discussions of the new developments in
SEM.
How Does SEM Work?
In general, every SEM analysis goes through the steps of
model specification, data collection, model estimation, model
evaluation, and (possibly) model modification. Issues pertaining to each of these steps are discussed below.
Model Specification
A sound model is theory based. Theory is based on findings in the literature, knowledge in the field, or one's educated guesses, from which causes and effects among variables
within the theory are specified. Models are often easily conceptualized and communicated in graphical forms. In these
graphical forms, a directional arrow (→) is universally used
to indicate a hypothesized causal direction. The variables
to which arrows are pointing are commonly termed endogenous variables (or dependent variables) and the variables
having no arrows pointing to them are called exogenous variables (or independent variables). Unexplained covariances
among variables are indicated by curved two-headed arrows (↔). Observed variables are commonly enclosed in rectangular boxes
and latent constructs are enclosed in circular or elliptical
shapes.
For example, suppose a group of researchers have developed a new measure to assess mathematics skills of preschool
children and would like to find out (a) whether the skill
scores measure a common construct called math ability and
(b) whether reading readiness (RR) has an influence on
math ability when age (measured in months) differences are
controlled for. The skill scores available are: counting aloud (CA): count aloud as high as possible beginning with the number 1; measurement (M): identify fundamental measurement concepts (e.g., taller, shorter, higher, lower) using basic shapes; counting objects (CO): count sets of objects and correctly identify the total number of objects in the set; number naming (NN): read individual numbers (or shapes) in isolation and rapidly identify the specific number (shape) being viewed; and pattern recognition (PR): identify patterns using short sequences of basic shapes (i.e., circle, square, and triangle). These skill scores (indicators)
are hypothesized to indicate the strength of children's latent math ability, with higher scores signaling stronger math
ability. Figure 1 presents the conceptual model.
The model in Figure 1 suggests that the five skill scores
on the right are supposedly results of latent math ability
(enclosed by an oval) and that the two exogenous observed
variables on the left (RR and age enclosed by rectangles) are
predictors of math ability. The two predictors (connected by ↔) are allowed to be correlated but their relationship is not
explained in the model. The latent math ability variable and
the five observed skill scores (enclosed by rectangles) are
endogenous in this example. The residual of the latent endogenous variable (residuals of structural equations are also
called disturbances) and the residuals (or errors) of the skill
variables are considered exogenous because their variances
[Figure 1. Conceptual model: the exogenous predictors RR and age (structural part) point to the latent MATH factor, which is indicated by the five skill scores CA, M, CO, NN, and PR (measurement part).]
[Figure 2 (residue): a second model involving VC, PO, CL, RE, and the latent MATH factor.]
Incremental fit indices measure the improvement in model fit relative to a baseline model (often one in which all observed variables are uncorrelated). Examples of incremental fit indices
include normed fit index (NFI; Bentler & Bonett, 1980),
Tucker-Lewis index (TLI; Tucker & Lewis, 1973), relative
noncentrality index (RNI; McDonald & Marsh, 1990), and
comparative fit index (CFI; Bentler, 1989, 1990). Higher values of incremental fit indices indicate larger improvement
over the baseline model in fit. Values in the .90s (or more
recently .95) are generally accepted as indications of good
fit.
In contrast, absolute fit indices measure the extent to
which the specified model of interest reproduces the sample
covariance matrix. Examples of absolute fit indices include
Jöreskog and Sörbom's (1986) goodness-of-fit index (GFI)
and adjusted GFI (AGFI), standardized root mean square
residual (SRMR; Bentler, 1995), and the RMSEA (Steiger &
Lind, 1980). Higher values of GFI and AGFI as well as lower
values of SRMR and RMSEA indicate better model-data fit.
SEM software programs routinely report a handful of
goodness-of-fit indices. Some of these indices work better
than others under certain conditions. It is generally recommended that multiple indices be considered simultaneously
when overall model fit is evaluated. For instance, Hu and
Bentler (1999) proposed a 2-index strategy, that is, reporting SRMR along with one of the fit indices (e.g., RNI, CFI, or
RMSEA). The authors also suggested the following criteria
for an indication of good model-data fit using those indices:
RNI (or CFI) ≥ .95, SRMR ≤ .08, and RMSEA ≤ .06. Despite the sample size sensitivity problem with the chi-square
test, reporting the model chi-square value with its degrees of
freedom in addition to the other fit indices is recommended.
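For readers who want to see how two of the recommended indices are computed from the model chi-square, here is a rough sketch. The model chi-square and degrees of freedom are taken from the later illustration in this article; the sample size (n = 200) and the baseline chi-square are hypothetical, and the RMSEA shown uses the common (χ² − df)/(df(N − 1)) form.

```python
import math

def rmsea(chi2, df, n):
    """Steiger-Lind RMSEA using the common (chi2 - df) / (df * (n - 1)) form."""
    return math.sqrt(max(chi2 - df, 0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index of the model relative to a baseline model."""
    d_m = max(chi2_m - df_m, 0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1 - d_m / d_b

# Model chi2(5) = 8.63 as reported later in the text; the sample size
# (n = 200) and baseline chi2(15) = 600 are hypothetical.
print(round(rmsea(8.63, 5, 200), 3))
print(round(cfi(8.63, 5, 600, 15), 3))
```

Both functions use max(χ² − df, 0) so that a model fitting better than its degrees of freedom yields RMSEA = 0 and CFI = 1 rather than an undefined value.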
Because some solutions may be improper, it is prudent
for researchers to examine individual parameter estimates
as well as their estimated standard errors. Unreasonable
magnitude (e.g., correlation > 1) or direction (e.g., negative
variance) of parameter estimates or large standard error
estimates (relative to others that are on the same scale) are
some indications of possible improper solutions.
If a model fits the data well and the estimation solution
is deemed proper, individual parameter estimates can be interpreted and examined for statistical significance (whether
they are significantly different from zero). The test of individual parameter estimates for statistical significance is based
Table 1. Sample Correlation, Mean, and Standard Deviation for the Model of Figure 1

Variables     AGE       RR       CA        M       CO       NN       PR
AGE          1
RR            .357    1
CA            .382     .439    1
M             .510     .405     .588    1
CO            .439     .447     .512     .604    1
NN            .513     .475     .591     .560     .606    1
PR            .372     .328     .564     .531     .443     .561    1
Mean        50.340     .440     .666     .730     .545     .625     .624
SD           6.706    1.023    1.297     .855     .952     .933    1.196
Note: CA = counting aloud; M = measurement; CO = counting objects; NN = number naming; PR = pattern recognition; RR =
reading readiness; age is measured in months.
Table 2. Parameter and Standard Error Estimates for the Model of Figure 1
Model Parameters               Standardized    Unstandardized    Standard
                               Estimate        Estimate          Error
Loadings/effects on MATH
  CA                           .74             1.00a
  M                            .77              .68              .07
  CO                           .74              .73              .08
  NN                           .80              .77              .08
  PR                           .68              .84              .10
  Age                          .46              .07              .01
  RR                           .40              .38              .07
Residual variances
  CA                           .45              .75              .10
  M                            .41              .30              .04
  CO                           .46              .41              .05
  NN                           .36              .32              .05
  PR                           .54              .78              .09
  MATH                         .50              .46              .09
Covariance
  RR and age                   .36             2.45              .56
Note: Table values are maximum likelihood estimates. CA = counting aloud; M = measurement; CO = counting objects; NN =
number naming; PR = pattern recognition; RR = reading readiness; age is measured in months.
a Fixed parameter to set the scale of the latent variable.
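A quick plausibility check on Table 2: for two indicators of the same factor, the model-implied correlation in the standardized solution is approximately the product of their standardized loadings, which can be compared with the sample correlation in Table 1. The sketch below uses the CA and M values.

```python
# Standardized loadings of CA and M on the MATH factor, from Table 2.
loading_ca, loading_m = 0.74, 0.77

# For two indicators of one standardized factor, the model-implied correlation
# is (approximately) the product of their standardized loadings.
implied = loading_ca * loading_m
observed = 0.588  # sample correlation between CA and M from Table 1
residual = observed - implied

print(round(implied, 3), round(residual, 3))  # 0.57 0.018
```

Small residuals of this kind across all indicator pairs are what the overall fit indices summarize; a single large residual can point to a locally misspecified part of the model.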
A large modification index (>3.84) suggests that a large improvement in model fit as measured by chi-square can be
expected if a certain fixed parameter is freed. The decision to free a fixed parameter is less likely to be affected by chance
if it is based on a large modification index as well as a large
expected parameter change value.
As an illustration, Table 3 shows the simple descriptive
statistics of the variables for the model of Figure 2, and
Table 4 provides the parameter estimates (standardized and
unstandardized) and their standard error estimates. Had
one restricted the residuals of the latent READ and MATH
variables to be uncorrelated, the model would not fit the sample data well as suggested by some of the overall model fit
indices: χ²(6) = 45.30, p < .01, RMSEA = .17 (>.10), SRMR =
.078 (acceptable because it is < .08). The solution was also
Table 3. Sample Correlation, Mean, and Standard Deviation for the Model of Figure 2

Variables      VC        PO        BW        RC        CL        RE
VC            1
PO             .704    1
BW             .536     .354    1
RC             .682     .515     .781    1
CL             .669     .585     .560     .657    1
RE             .743     .618     .532     .688     .781    1
Mean         89.810   92.347   83.448   87.955   85.320   91.160
SD           15.234   18.463   14.546   15.726   15.366   15.092
Note: VC = verbal comprehension; PO = perceptual organization; BW = basic word reading; RC = reading comprehension; CL =
calculation; RE = reasoning.
Table 4. Parameter and Standard Error Estimates for the Model of Figure 2
Model Parameters                  Standardized    Unstandardized    Standard
                                  Estimate        Estimate          Error
Loadings/effects on READ
  BW                              .79             1.00a
  RC                              .99             1.35              .10
  VC                              .64              .48              .07
  PO                              .07              .04              .05
Loadings/effects on MATH
  CL                              .85             1.00a
  RE                              .92             1.05              .07
  VC                              .64              .55              .06
  PO                              .23              .16              .05
Residual variances
  BW                              .38            79.55            10.30
  RC                              .02             5.28            11.96
  CL                              .27            64.22             8.98
  RE                              .16            36.77             7.88
  READ                            .52            69.10            10.44
  MATH                            .33            56.98             9.21
Covariances
  VC and PO                       .70           198.05            24.39
  Residuals of READ and MATH      .21            31.05             6.31
Note: Table values are maximum likelihood estimates. VC = verbal comprehension; PO = perceptual organization; BW = basic
word reading; RC = reading comprehension; CL = calculation; RE = reasoning.
a Fixed parameter to set the scale of the latent variable.
improper because there was a negative error variance estimate. The modification index for the covariance between the
residuals of READ and MATH was 33.03 with unstandardized
expected parameter change of 29.44 (standardized expected
change = .20). There were other large modification indices.
However, freeing the residual covariance between READ and
MATH was deemed most justifiable because the relationship
between these two latent variables was not likely fully explained by the two intelligence subtests (VC and PO). The
modified model appeared to fit the data quite well (χ²(5) = 8.63, p = .12, RMSEA = .057, SRMR = .017). The actual chi-square change from 45.30 to 8.63 (i.e., 36.67) was slightly
different from the estimated change (33.03), as was the actual parameter change (31.05 vs. 29.44; standardized value =
.21 vs. .20). The differences between the actual and estimated
changes are slight in this illustration because only one parameter was changed. Because parameter estimates are not
independent of each other, the actual and expected changes
may be very different if multiple parameters are changed
simultaneously, or the order of change may matter if multiple parameters are changed one at a time. In other words,
different final models can potentially result when the same
initial model is modified by different analysts.
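The chi-square difference test behind the modification just described can be verified in a few lines; the chi-square values and degrees of freedom are those reported in the text.

```python
# Values reported in the text for the restricted vs. modified model of Figure 2.
chi2_restricted, df_restricted = 45.30, 6  # residual covariance fixed at zero
chi2_modified, df_modified = 8.63, 5       # residual covariance freed

delta_chi2 = chi2_restricted - chi2_modified  # actual improvement in fit
delta_df = df_restricted - df_modified

# Critical chi-square value for 1 df at alpha = .05 is 3.84.
print(round(delta_chi2, 2), delta_df, delta_chi2 > 3.84)  # 36.67 1 True
```

The actual improvement (36.67) exceeds both the critical value 3.84 and the modification index (33.03), which is only an estimate of this change.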
As a result, researchers are warned against making a large
number of changes and against making changes that are
not supported by strong substantive theories (e.g., Byrne,
1998, p. 126). Changes made based on modification indices
may not lead to the true model in a large variety of realistic situations (MacCallum, 1986; MacCallum, Roznowski, &
Necowitz, 1992). The likelihood of success of post hoc modification depends on several conditions: It is higher if the
initial model is close to the true model, the search continues even when a statistically plausible model is obtained, the
means, or asymptotic covariance matrix of variances and covariances (required for the asymptotically distribution-free estimator or Satorra and Bentler's scaled chi-square and robust
standard errors; see footnote 5). PRELIS can be used as a
stand-alone program or in conjunction with other programs.
Summary statistics or raw data can be read by SIMPLIS or
LISREL for the estimation of SEM models. The LISREL syntax requires the understanding of matrix notation while the
SIMPLIS syntax is equation-based and uses variable names
defined by users. Both LISREL and SIMPLIS syntax can be
built through interactive LISREL by entering information for
the model construction wizards. Alternatively, syntax can be
built by drawing the models on the Path Diagram screen.
LISREL 8.7 allows the analysis of multilevel models for hierarchical data in addition to the core models. A free student
version of the program, which has the same features as the
full version but limits the number of observed variables to
12, is available from the web site of Scientific Software International, Inc. (http://www.ssicentral.com). This web site
also offers a list of illustrative examples of LISREL's basic
and new features.
EQS
Version 6 (Bentler, 2002; Bentler & Wu, 2002) of EQS (Equations) provides many general statistical functions including descriptive statistics, t-test, ANOVA, multiple regression,
nonparametric statistical analysis, and EFA. Various data exploration plots, such as scatter plot, histogram, and matrix
plot are readily available in EQS for users to gain intuitive
insights into modeling problems. Similar to LISREL, EQS
allows different ways of writing syntax for model specification. The program can generate syntax through the available
templates under the Build_EQS menu, which prompts the
user to enter information regarding the model and data for
analysis, or through the Diagrammer, which allows the user
to draw the model. Unlike LISREL, however, data screening
(information about missing pattern and distribution of observed variables) and model estimation are performed in one
run in EQS when raw data are available. Model-based imputation that relies on a predictive distribution of the missing
data is also available in EQS. Moreover, EQS generates a
number of alternative model chi-square statistics for nonnormal or categorical data when raw data are available. The
program can also estimate multilevel models for hierarchical
data. Visit http://www.mvsoft.com for a comprehensive list
of EQS's basic functions and notable features.
Mplus
Version 3 (Muthén & Muthén, 1998–2004) of the Mplus program includes a Base program and three add-on modules.
The Mplus Base program can analyze almost all single-level
models that can be estimated by other SEM programs. Unlike
LISREL or EQS, Mplus version 3 is mostly syntax-driven and
does not produce model diagrams. Users can interact with the
Mplus Base program through a language generator wizard,
which prompts users to enter data information and select the
estimation and output options. Mplus then converts the information into its program-specific syntax. However, users have
to supply the model specification in Mplus language themselves. Mplus Base also offers a robust option for non-normal
data and a special full-information maximum likelihood estimation method for missing data (see footnote 4). With the
Y2 is an exogenous variable.
Variation in Y2 is affected by variation in Y1.
Variation in Y1 is explained within the model.
Y1 and Y2 are correlated.
Exogenous variable
Endogenous variable
Latent variable
Disturbance
7. What are some possible ways of accounting for measurement errors of observed variables in SEM?
8. What does it mean for a model to be identified?
A. The model is theory based.
B. Fit indices for the model are all satisfactory.
C. An SEM program successfully produces a solution
for the model.
D. There exists a unique solution for every free parameter in the model.
9. In order for a model to be identified, we must set the
variances of the latent variables to 1. T/F
Chi-square
RMSEA
SRMR
CFI
Acknowledgment
We thank James DiPerna, Paul Morgan, and Marley Watkins for
sharing the data used in the illustrative examples, as well as
Hoi Suen and the anonymous reviewers for their constructive
comments on a draft of this article.
Notes
1. When a model involves feedback or reciprocal relations or
correlated residuals, it is said to be nonrecursive; otherwise
the model is recursive. The distinction between recursive and
nonrecursive models is important for model identification
and estimation.
2. The term "error variance" is often used interchangeably with "unique variance" (that which is not common variance). In measurement theory, unique variance consists of both true unique variance and measurement error variance, and only measurement error variance is considered the source of unreliability. Because the two components of unique variance are not separately estimated in measurement models, they are simply called error variance.
3. This principle of identification in SEM is also known as the t-rule (Bollen, 1989, pp. 93, 242). Given p observed variables in any covariance structure model,
the number of variances and covariances is p(p + 1)/2. The
parameters to be estimated include factor loadings of measurement models, path coefficients of structural relations,
and variances and covariances of exogenous variables including those of residuals. In the math ability example, the
number of observed variances and covariances is 7(8)/2 =
28 and the number of parameters to be estimated is 15
(5 loadings + 2 path coefficients + 3 variances/covariances among the predictors + 6 residual variances − 1 loading fixed to set the scale of the latent factor). Because 28 is greater than 15, the model
satisfies the t-rule.
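The t-rule count in this footnote can be sketched as:

```python
def t_rule(p, n_free_parameters):
    """Necessary identification condition: the number of free parameters must
    not exceed the number of distinct variances and covariances, p(p + 1) / 2."""
    n_moments = p * (p + 1) // 2
    return n_moments, n_moments >= n_free_parameters

# Math ability example: 7 observed variables and 15 free parameters.
print(t_rule(7, 15))  # (28, True)
```

Note that the t-rule is necessary but not sufficient: a model can satisfy it and still be unidentified for other structural reasons.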
4. It is not uncommon to have missing observations in any
research study. Provided data are missing completely at random, common ways of handling missing data, such as imputation, pairwise deletion, or listwise deletion can be applied.
However, pairwise deletion may create estimation problems
for SEM because a covariance matrix that is computed based
on different numbers of cases may be singular or some estimates may be out-of-bound. Recent versions of some SEM
software programs offer a special maximum likelihood estimation method (referred to as full-information maximum
likelihood), which uses all available data for estimation and
requires no imputation. This option is logically appealing
because there is no need to make additional assumptions for
imputation and there is no loss of observations. It has also
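The pairwise-deletion problem noted in footnote 4 can be made concrete: a correlation matrix assembled from pairwise-complete subsets need not be positive definite, in which case SEM estimation fails. The correlations below are hypothetical and chosen to exhibit the failure.

```python
import numpy as np

# Under pairwise deletion each correlation is computed from a different subset
# of cases, so the assembled matrix need not be internally consistent. These
# hypothetical pairwise correlations could not all arise from one complete
# data set:
R = np.array([[1.0,  0.9,  0.9],
              [0.9,  1.0, -0.9],
              [0.9, -0.9,  1.0]])

eigenvalues = np.linalg.eigvalsh(R)
print(np.round(eigenvalues, 2))     # [-0.8  1.9  1.9]
print(bool(eigenvalues.min() < 0))  # True: matrix is not positive definite
```

A negative eigenvalue means no real data set has this covariance structure, which is why full-information maximum likelihood or imputation is usually preferred over pairwise deletion for SEM.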