
Model Evaluation and Selection

1) The document discusses evaluating a single model and comparing alternative models by examining various measures of model fit. 2) It describes evaluating the chi-square statistic, RMSEA, AIC, CAIC, and BIC values to determine if a model fits the data well. Additional pathways only improve model fit if they reduce chi-square by more than 3.84. 3) Based on the values for this model, the document concludes the model fits reasonably well and omitting the pathway from s_age to tcov is justified, though more data could potentially change this conclusion. Statistical tests should guide and not dictate conclusions.

Uploaded by

Aashish Mehra

Model Evaluation and Selection

1
Example Objective: Demonstrate how
to evaluate a single model and how to
compare alternative models.

2
Evaluating the Sufficiency of a Single Model
(followup to example of Mediation Test)

When this model is run, a variety of
measures of model fit will be generated.
A question of importance is, "Is the fit of
the model sufficiently good to yield
reliable results?"

The alternative model is one in which there is also an arrow from s_age to
tcov. In other words, does fire severity explain the effect of stand age on
cover, or is there another pathway of influence independent of fire severity?

3
Finding Measures of Model Fit in Amos I

The model chi-square is the
most commonly used measure
of absolute model fit.

It is always good to check the section
of the output called “Notes for Model”.
Here we can see that a minimum was
achieved, along with the full p-value for the
chi-square. A p-value greater than 0.05
suggests that we could accept this model
(it indicates no major deviations between
data and model).

4
Further Considerations of Model Chi-square
It is well known that model p-values are not always the
best way to decide if a model is adequate (in an
absolute sense) or the best model (in a relative sense).
This is a complex topic and one that lacks complete
consensus. What is generally agreed upon is:
(1) Chi-squares automatically increase with increasing
sample size and p-values reflect increasing power for
detecting deviations.
(2) P-values for model chi-squares are pretty useful when
sample sizes are less than 200, especially for models
that do not include latent variables possessing multiple
indicators.
(3) It is recommended that folks look at multiple
measures.

5
Further Considerations (cont.)
One useful way to evaluate model adequacy is to see if
the addition of a pathway causes the model chi-square to
drop by more than 3.84 units, the 0.05 critical value for a
chi-square with one degree of freedom. This is the
“single-degree-of-freedom chi-square test”. If adding a path
reduces the chi-square by less than 3.84, it implies that the
added path is not strongly supported by the data.

In the current example, the chi-square is 3.243, which
tells us that adding a path from s_age to tcov could
reduce the model chi-square by at most 3.243. This further
indicates that our model could be considered adequate.
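
The single-degree-of-freedom test above can be sketched in a few lines of Python. This is only an illustration, not part of the Amos output: the function name chi2_sf_1df is made up for this example, and it relies on the fact that a chi-square with one degree of freedom is a squared standard normal, so its upper-tail probability can be written with the complementary error function.

```python
import math

def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    # For 1 df, chi-square is the square of a standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

CRITICAL_05 = 3.84   # threshold quoted in the slides
model_chi2 = 3.243   # model chi-square reported in the slides

p_at_threshold = chi2_sf_1df(CRITICAL_05)  # should be close to 0.05
p_model = chi2_sf_1df(model_chi2)          # p-value for the omitted path
path_supported = model_chi2 > CRITICAL_05  # would adding the path be justified?

print(f"p at 3.84:  {p_at_threshold:.3f}")
print(f"p at 3.243: {p_model:.3f}")
print("added path supported:", path_supported)
```

The p-value at 3.84 comes out at about 0.05, which is where the threshold comes from; the p-value for 3.243 falls above 0.05, in line with the conclusion that the extra path is not strongly supported.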

6
Finding Measures of Model Fit in Amos II

“Cmin” means minimum chi-square.

The Model Fit tab gives us
several measures to
consider.

7
continued

Clicking on the labels gives additional information.

8
continued

RMSEA indicates “close” fit,
and a value of 0
(perfect fit) cannot be ruled
out.

The AIC for our model (the
“default” model) of 13.243
could only be reduced to a
value of 12.000 by
saturating the model. The
difference of 1.243 is less
than the minimum
recommended AIC
difference of 2.0,
suggesting that the two
models are indistinguishable.

BUT, AIC is often not a
reliable measure.

9
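
The AIC comparison can be reproduced by hand. The sketch below assumes the definition AIC = chi-square + 2q, where q is the number of estimated parameters. The parameter counts (5 for the default model, 6 for the saturated model) are not stated in the slides; they are inferred here because they reproduce the reported values of 13.243 and 12.000.

```python
def aic(chi2: float, n_params: int) -> float:
    """AIC under the assumed definition: chi-square plus twice the parameter count."""
    return chi2 + 2 * n_params

# Chi-square values from the slides; parameter counts are inferred assumptions.
default_aic = aic(3.243, 5)    # default model (path s_age -> tcov omitted)
saturated_aic = aic(0.0, 6)    # saturated model fits perfectly, chi-square = 0

aic_diff = default_aic - saturated_aic
indistinguishable = abs(aic_diff) < 2.0  # rule of thumb quoted in the slides

print(f"default AIC:   {default_aic:.3f}")
print(f"saturated AIC: {saturated_aic:.3f}")
print(f"difference:    {aic_diff:.3f}")
print("indistinguishable by the 2.0 rule:", indistinguishable)
```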
continued some more

The CAIC (consistent AIC) is
generally viewed to be a
better measure than AIC.
Here we see that the
default model value is more
than 2.0 units smaller than
that of the saturated model,
supporting the conclusion
that our model is adequate.

10
and still some more

The BIC (Bayesian
Information Criterion) is one
of the more popular
measures at the moment.
In this case, the saturated
model BIC is only 1.257
greater, which is less than
the 2.0 difference
recommended for picking
among models. This index
tells us that while the
evidence is better for the
default model, the
saturated model can’t be
ruled out.

11
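
The BIC and CAIC differences can be checked with a short calculation. This sketch rests on several assumptions that are not in the slides: the definitions BIC = chi-square + q ln(N) and CAIC = chi-square + q (ln(N) + 1), parameter counts of 5 (default) and 6 (saturated), and a sample size that is backed out from the reported BIC difference of 1.257 rather than taken from the data.

```python
import math

# Chi-square values from the slides; parameter counts are inferred assumptions.
CHI2_DEFAULT, Q_DEFAULT = 3.243, 5
CHI2_SATURATED, Q_SATURATED = 0.0, 6

def bic(chi2: float, q: int, n: float) -> float:
    """BIC under the assumed definition chi-square + q * ln(N)."""
    return chi2 + q * math.log(n)

def caic(chi2: float, q: int, n: float) -> float:
    """CAIC under the assumed definition chi-square + q * (ln(N) + 1)."""
    return chi2 + q * (math.log(n) + 1.0)

# With one extra parameter in the saturated model, the BIC difference is
# ln(N) - 3.243. Setting that equal to the reported 1.257 gives ln(N) = 4.5.
n_implied = math.exp(1.257 + CHI2_DEFAULT)

bic_diff = bic(CHI2_SATURATED, Q_SATURATED, n_implied) - bic(CHI2_DEFAULT, Q_DEFAULT, n_implied)
caic_diff = caic(CHI2_SATURATED, Q_SATURATED, n_implied) - caic(CHI2_DEFAULT, Q_DEFAULT, n_implied)

print(f"implied sample size: {n_implied:.1f}")
print(f"BIC difference (saturated - default):  {bic_diff:.3f}")
print(f"CAIC difference (saturated - default): {caic_diff:.3f}")
```

Under these assumptions the CAIC difference exceeds 2.0 while the BIC difference does not, which matches the two conclusions drawn in the slides.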
and even still some more

The Hoelter index relates
back to our model chi-square
and its p-values. It
tells us that at a sample
size of 106, we would have
enough power to detect an
additional path from s_age
to tcov with a p-value less
than 0.05. 183 samples
would be required to obtain
a p-value less than 0.01.
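
One way to see where the Hoelter values come from is to rescale the observed chi-square to other sample sizes. The sketch below is an assumption-laden illustration, not Amos's documented algorithm: it takes the critical N to be the largest sample size at which the chi-square would still fall below the critical value, and it assumes a sample size of 90, which is not stated in the slides.

```python
import math

def hoelter_critical_n(chi2: float, n: int, chi2_critical: float) -> int:
    """Largest sample size at which the model would not be rejected (assumed formula)."""
    f_min = chi2 / (n - 1)                  # discrepancy per unit of (N - 1)
    return math.floor(chi2_critical / f_min + 1)

MODEL_CHI2 = 3.243       # model chi-square from the slides
N_ASSUMED = 90           # hypothetical sample size, not given in the slides
CRIT_05_1DF = 3.841459   # chi-square critical value, 1 df, alpha = 0.05
CRIT_01_1DF = 6.634897   # chi-square critical value, 1 df, alpha = 0.01

hoelter_05 = hoelter_critical_n(MODEL_CHI2, N_ASSUMED, CRIT_05_1DF)
hoelter_01 = hoelter_critical_n(MODEL_CHI2, N_ASSUMED, CRIT_01_1DF)

print("Hoelter .05:", hoelter_05)
print("Hoelter .01:", hoelter_01)
```

With the assumed sample size, this reproduces the 106 and 183 quoted in the slide.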
12
AIC difference criteria
AIC diff    support for equivalency of models
0-2         substantial
4-7         weak
> 10        none

Burnham, K.P. and Anderson, D.R. 2002. Model Selection and
Multimodel Inference. Second edition. Springer Verlag, p. 70.

13
BIC difference criteria
BIC diff    support for difference between models
0-2         weak
2-6         positive
6-10        strong
> 10        very strong

Raftery, A.E. 1995. Bayesian model selection in social research.
Sociological Methodology 25:111-163.
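
The BIC difference criteria can be turned into a small lookup. This is only a sketch: the function name bic_evidence is hypothetical, and because the ranges share their endpoints, assigning a boundary value such as 2 to the weaker category is an assumption made here.

```python
def bic_evidence(bic_diff: float) -> str:
    """Map an absolute BIC difference to the evidence grades in the table."""
    d = abs(bic_diff)
    if d <= 2:
        return "weak"         # 0-2
    if d <= 6:
        return "positive"     # 2-6
    if d <= 10:
        return "strong"       # 6-10
    return "very strong"      # > 10

# The BIC difference reported in the slides for saturated vs. default model.
example_grade = bic_evidence(1.257)
print("evidence for a difference:", example_grade)
```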

14
What do we conclude in this case?
Given the data we have available, we could justify (in my
view) omitting the pathway from s_age to tcov. However,
we must recognize that this is an approximation of the
truth. If we had more samples, would they lead us to
decide that we needed to include a path from s_age to
tcov? Without the additional samples we don’t really
know. Comparing the path coefficients for the two models
would allow us to decide the scientific consequences of
our model choice.

15
What is the SEM perspective on
model selection?
In SEM we use our scientific knowledge to guide our
decisions, and this applies especially to model selection. Do
we believe it serves our scientific purposes to omit the path
from s_age to tcov? We certainly can present the results for
the path in the following fashion if we think it merits
discussion.

[Path diagram: s_age -> fidx (0.45); fidx -> tcov (-0.35);
s_age -> tcov (-0.19, ns); error terms e1 on fidx and e2 on tcov]

16
Final thought

"Statistical tests are aids to (hopefully wise) judgement,
not two-valued logical declarations of truth or falsity".

Abelson, R.P. 1995. Statistics as Principled Argument.
Lawrence Erlbaum Associates, Hillsdale, NJ, USA.

17
