Misunderstanding Analysis of Covariance
Despite numerous technical treatments in many venues, analysis of covariance (ANCOVA) remains a widely misused approach to dealing with substantive group differences on potential covariates, particularly in psychopathology research. Published articles reach unfounded conclusions, and some statistics texts neglect the issue. The problem with ANCOVA in such cases is reviewed. In many cases, there is no means of achieving the superficially appealing goal of "correcting" or "controlling for" real group differences on a potential covariate. In hopes of curtailing misuse of ANCOVA and promoting appropriate use, a nontechnical discussion is provided, emphasizing a substantive confound rarely articulated in textbooks and other general presentations, to complement the mathematical critiques already available. Some alternatives are discussed for contexts in which ANCOVA is inappropriate or questionable.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
This document is copyrighted by the American Psychological Association or one of its allied publishers.
In research comparing groups of participants, classical experimental design (Campbell & Stanley, 1963) relies, whenever possible, on random assignment of participants to groups. Observed differences between such groups, prior to experimental treatments, are due to chance rather than being meaningfully related to the group variable. In contrast, when preexisting groups are studied, observed pretreatment differences may reflect meaningful, substantive differences that are attributable to group membership. It has been noted that "Even in the absence of a true treatment effect, the outcome scores in the treatment groups are likely to differ substantially because of initial selection differences. As a result, selection differences are a threat to validity. . . ." (Reichardt & Bormann, 1994, p. 442).
In this article, we discuss why attempts to control statistically for such differences are, in general, inappropriate. For example, consider a data set consisting of age as a potential covariate, grade in school as the grouping variable, and basketball performance as the dependent variable. An analysis of covariance (ANCOVA) might be run in hopes of asking whether 3rd and 4th graders would differ in performance were they not different in age. This might seem to be a reasonable question, in that one could ask whether some maturational change at that age makes a nonlinear contribution to basketball ability. However, in fact it makes no sense to ask how 3rd graders would do if they were 4th graders. They are, inherently, not 4th graders, and ANCOVA cannot "control for" that fact. Age is so intimately associated with grade in school that removal of variance in basketball ability associated with age would remove considerable (perhaps nearly all) variance in basketball ability associated with grade. The results of the ANCOVA would be meaningless. As a complement to this problem of the covariate removing too much of the independent variable of interest, a problem can arise when preexisting groups differ systematically on more than the covariate. The covariate will leave those differences intact, thus biasing the estimate of the treatment effect (Reichardt & Bormann, 1994), which has been called specification error.
Nonrandom Group Assignment
When group membership is determined nonrandomly, there is typically no thorough basis for determining whether a given pretreatment difference reflects random error or a true group difference. This uncertainty complicates interpretation of apparent treatment effects because it is impossible to distinguish a main effect of treatment from an interaction between the effects of the treatment and the pretreatment difference, and from meaningful overlap (variance shared) between the treatment and the pretreatment characteristic. The confound due to an undetected interaction is widely understood: the pooled regression is inappropriate as a basis for "correcting" for the covariate. However, the other confound, due to undetected or misinterpreted overlap of pretreatment difference and grouping factor, is commonly neglected, although it was noted in the first modern treatment of ANCOVA (Cochran, 1957) and occasionally more recently (e.g., Porter & Raudenbush, 1987).
The problem of preexisting group differences arises very commonly in psychopathology research because random assignment to group (diagnostic category in the typical experimental design in psychopathology but called "treatment" in much of the statistical literature) is routinely infeasible and/or unethical. It is thus extremely tempting for psychopathologists to seek to use analytic methods in an attempt to avoid the interpretative problems that arise when groups differ pretreatment. It is unfortunate that, in the general case, no such analytic method is available, nor can one be
Gregory A. Miller, Departments of Psychology and Psychiatry and Beckman Institute, University of Illinois, Champaign; Jean P. Chapman, Department of Psychology, University of Wisconsin–Madison.
The writing of this article was supported by National Institute of Mental Health Grant MH39628. We thank Loren J. Chapman, Lawrence J. Hubert, Brandy G. Isaacks, Jack B. Nitschke, and E. Keolani Taitano for comments on an earlier draft.
Correspondence concerning this article should be addressed to Gregory A. Miller, Department of Psychology, University of Illinois, 603 East Daniel Street, Champaign, Illinois 61820, or to Jean P. Chapman, Department of Psychology, University of Wisconsin, 1202 West Johnson Street, Madison, Wisconsin 53706. Electronic mail may be sent to gamiller@uiuc.edu or [email protected].
SPECIAL SECTION: ANALYSIS OF COVARIANCE 41
inappropriate uses or interpretations of ANCOVA, the present article offers a relatively nontechnical critique, in hopes of helping to popularize the correct use of ANCOVA and helping researchers to avoid its more common abuses.
Understanding Analysis of Covariance
ANCOVA is part of the ANOVA (analysis of variance) tradition. ANCOVA was developed to improve the power of the test of the independent variable, not to "control" for anything. It is helpful here to place ANOVA and ANCOVA in the more general framework of multiple regression and correlation (MRC), understood within the general linear model. In fact, at least one popular ANOVA package, BMDP2V (Dixon, 1992), actually computes its analyses using multiple regression rather than using the equations typically presented in textbooks on ANOVA. Some sources place ANCOVA in the context of MRC, with the covariate in ANCOVA understood as a regression predictor entered before what are, in ANOVA, the main effects and interactions (e.g., Harris, Bisbee, & Evans, 1971; for extended treatments of this viewpoint, see Cohen & Cohen, 1983, and Judd, McClelland, & Smith, 1996). More commonly, other sources have argued that ANCOVA is not strictly equivalent to regressing the dependent variable on the covariate, then doing an ANOVA on the residuals (e.g., Elashoff, 1969; Maxwell & Delaney, 1990; Maxwell et al., 1985; Porter & Raudenbush, 1987). In this view, ANCOVA provides a test of the main effect of group by comparing the error sums of squares resulting from two models rather than a regression. The two models are:
$$Y_{ij} = \mu + \alpha_j + \beta X_{ij} + \varepsilon_{ij}$$
$$Y_{ij} = \mu + \beta X_{ij} + \varepsilon_{ij}$$
Roughly, $Y_{ij}$ is the dependent variable for the $i$th subject in the $j$th group, $\mu$ is the grand mean, $\alpha_j$ is the treatment effect for the $j$th group, $\beta X_{ij}$ is the product of a population regression coefficient and the score on the covariate for the $i$th subject in the $j$th group, and $\varepsilon_{ij}$ is an error term for the $i$th subject in the $j$th group. The $\alpha_j$ term obviously differentiates the equations, but the two $\beta X_{ij}$ terms would not generally be identical, either. Further discussion of these equations and of the F test of the main effect of group based on them can be found in Maxwell et al. (1985) and elsewhere. The point in this second view is that the actual main-effect test is not simply the direct outcome of a regression, as the MRC approach implies.
However, the distinction between the classical and MRC-based perspectives on ANCOVA is not important for the issues addressed in the present article, and the MRC approach has some expository advantages. For illustrative purposes, we assume a simple design having one covariate, Cov, one grouping variable, Grp, and one dependent variable, DV, discussed in the MRC framework advocated by Cohen and Cohen (1983). Figure 1 shows two of several possible relationships among the three variables, with overlap indicating shared variance and, equivalently, a nonzero correlation. In the left panel, Cov and Grp share no variance, reflecting random assignment to group. On the right, they are correlated.
Following the approach of Cohen and Cohen (1983), Cov is entered first in the regression. This removes variance from Grp that Cov shares with Grp, if any (area 2 + area 5 in Figure 1), leaving a residual portion of Grp with which it is not correlated, Grp_res (area 3 and area 6). Entry of Cov also removes variance it shares with DV from DV (area 4 + area 5), leaving a residual portion of DV with which it is not correlated, DV_res (area 6 + area 7). For either panel in Figure 1, removal of Cov leaves areas 3, 6, and 7. The regression then enters Grp_res into the model and computes its correlation with DV_res. What the investigator may view as the conventional F test of Grp is actually, instead, an evaluation of the variance shared by Grp_res and DV_res: How large area 6 is, compared with the sum of area 6 + area 7.
¹ Gregory Miller was Chair of the NIMH Clinical Psychopathology Review Committee when this article was written. Approximately one third of the grant applications he reviewed proposed a questionable or clearly invalid use of ANCOVA.
Figure 1 (caption, partial): . . . relationship detected between Grp and DV. In the left panel, Grp and Cov share no variance. This is the classic situation for a true experiment, with random assignment to groups. Removing the variance associated with Cov will not alter Grp. Given random assignment, individual characteristics such as height or presence of hallucinations would generally be randomly distributed across the groups, and group means should not differ except by chance. In the right panel, Grp and Cov do share variance. This is often the case when preexisting groups are studied, such as comparisons of two diagnostic groups, a quasi-experiment rather than a true experiment. In such a case, removing the variance associated with Cov will also alter Grp in potentially problematic ways.
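The model comparison described above, and the gain in power that motivated ANCOVA, can be sketched in standard-library Python. This is a minimal sketch under stated assumptions: the data are simulated with random assignment (so Cov has the same distribution in both groups), the effect sizes are hypothetical, and least squares is solved directly from the normal equations rather than with a statistics package. The F statistic for Grp is computed, as in the two-model view, from the error sums of squares of models with and without the group term.

```python
import random


def ols_sse(columns, y):
    """Error sum of squares after least-squares regression of y on an intercept
    plus the given predictor columns (normal equations, Gauss-Jordan elimination)."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [([sum(row[r] * row[c] for row in X) for c in range(k)]
          + [sum(row[r] * yi for row, yi in zip(X, y))]) for r in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))  # partial pivoting
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(k):
            if r != p:
                factor = A[r][p] / A[p][p]
                A[r] = [a - factor * b for a, b in zip(A[r], A[p])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def f_test(sse_restricted, df_restricted, sse_full, df_full):
    """F statistic comparing a restricted model with a full model."""
    numerator = (sse_restricted - sse_full) / (df_restricted - df_full)
    return numerator / (sse_full / df_full)


rng = random.Random(7)
grp, cov, dv = [], [], []
for g in (0, 1):                       # two randomly assigned groups, 40 each
    for _ in range(40):
        c = rng.gauss(50, 10)          # covariate: same distribution in both groups
        grp.append(g)
        cov.append(c)
        dv.append(0.8 * c + 3.0 * g + rng.gauss(0, 5))  # hypothetical group effect of 3

n = len(dv)
# ANOVA: restricted DV ~ 1 versus full DV ~ 1 + Grp.
sse_anova_r, sse_anova_f = ols_sse([], dv), ols_sse([grp], dv)
f_anova = f_test(sse_anova_r, n - 1, sse_anova_f, n - 2)
# ANCOVA: restricted DV ~ 1 + Cov versus full DV ~ 1 + Grp + Cov.
sse_ancova_r, sse_ancova_f = ols_sse([cov], dv), ols_sse([grp, cov], dv)
f_ancova = f_test(sse_ancova_r, n - 2, sse_ancova_f, n - 3)

print(f"error mean square without Cov: {sse_anova_f / (n - 2):.1f}")
print(f"error mean square with Cov:    {sse_ancova_f / (n - 3):.1f}")
print(f"F for Grp: ANOVA {f_anova:.2f}, ANCOVA {f_ancova:.2f}")
```

Because Cov is independent of Grp here, entering it first removes mainly noise from DV (area 4 in Figure 1), so the error mean square shrinks and the F for Grp will typically grow. The same functions run unchanged on data from preexisting groups, but then the F no longer tests the original Grp.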
42 MILLER AND CHAPMAN
ANCOVA When Groups Do Not Differ on the Covariate
In the classical treatment of random assignment to groups, Cov should, in principle, share no variance with Grp, as a direct result of the random assignment (Figure 1, left panel). That is, the expected value of Cov will be the same for every group, and, except for random error, the group means for Cov will be identical. As a consequence, entry into the regression of Cov before Grp will remove no variance from Grp. Thus, Grp = Grp_res. (In practice, some nonzero correlation between Cov and Grp may be observed, but generally the correlation will be negligible. Viewpoints on cases in which it is not negligible are discussed below.) When Grp = Grp_res, the only effect of ANCOVA is to remove variance (area 4) from DV which, from the standpoint of Grp, is simply noise (error). Thus, Grp will correlate more highly with DV_res than with DV, resulting (all things being equal) in a larger effect size and a more powerful significance test. Specifically, in the left panel of Figure 1, Grp overlaps with a higher proportion of (the smaller) DV_res (area 6 + area 7) than of DV (area 4 + area 6 + area 7), so the correlation of Grp and DV_res is higher than that of Grp and DV.
Used this way, ANCOVA serves as a legitimate and appealing noise-reduction technique for evaluating the relationship of Grp and DV. Other than the loss of a degree of freedom associated with inclusion of Cov in the analytic model, ANCOVA would appear to be valuable in improving the power of the test of Grp (see Bock, 1975, and Porter & Raudenbush, 1987, for technical treatments of the gain in efficiency from an appropriate use of ANCOVA). For example, a study comparing simple phobic and social phobic patients' response to phobia-relevant material might use individually developed videos that differ in duration, and duration itself might affect peak anxiety rating. If the groups do not differ on mean video duration, duration might serve as a covariate, reducing variance in anxiety rating that is driven by video duration and unrelated to diagnostic group. In general, this will improve the observed relationship between Grp and DV. Given this benefit, ANCOVA is much underutilized in the psychopathology literature.
It is important to measure Cov as reliably as possible to maximize its ability to capture noise variance in DV and to ensure that the adjusted DV, DV_res, is not contaminated by noise associated with the measurement of Cov. Measurement error could distort the resulting mean effects and significance test because the adjustment is necessarily for observed scores rather than true scores on the covariate (Reichardt, 1979; Elashoff, 1969; Fleiss & Tanur, 1973; Maxwell & Delaney, 1990; Richards, 1980; although see Huitema, 1980; Overall & Woodward, 1977; Porter & Raudenbush, 1987; Reichardt, 1979; and Reichardt & Bormann, 1994, for discussions of how this problem might be dealt with). This issue of measurement error is particularly problematic in studies with a small sample size.
With this MRC perspective on ANCOVA in mind, the assumptions in ANCOVA can be readily summarized (for more extended, technical treatments of the assumptions of ANCOVA, see Elashoff, 1969; Fleiss & Tanur, 1973; Huitema, 1980; Maxwell et al., 1985; Porter & Raudenbush, 1987; Wildt & Ahtola, 1978). Widely noted is that ANCOVA assumes that groups do not differ in the regression of DV on Cov. This is often referred to as the assumption of homogeneity of regression slopes.
Violation of this assumption is not as disabling as nonequivalence of groups on the covariate. When faced with heterogeneity of regression slopes, the investigator is encouraged (e.g., Cohen & Cohen, 1983) simply to frame the analysis as a hierarchical or simultaneous regression and in that context to include an interaction term consisting of the product of Cov and Grp (see Cohen & Cohen, 1983, for methods of representing categorical variables in such an approach). In effect, Cov is no longer viewed as a qualitatively distinct covariate with a purely methodological role in the analysis but as a meaningful, substantive part of the analysis. Such interactions may be theoretically interesting. Given that the investigator's goal is to identify sources of variance in DV, the test of such an interaction may be fruitful. Rogosa (1980) noted that the typical all-or-none framing of the assumption of equality of regression slopes is simplistic and often inappropriate. For example, with very large sample sizes, differences in slope might be "significant" but trivially small; with small sample sizes, functionally important differences in slopes might not be "significant." He discussed other ways to frame the issue and some alternative analytic strategies in the face of nonparallel regression slopes.
Invalid ANCOVA When Groups Differ on the Covariate: Consensus of the Technical Literature
The assumption of homogeneity of regression slopes is fairly well known. In contrast, the importance of groups not differing on Cov is not widely recognized in the psychopathology literature. Mistakenly, investigators frequently turn to ANCOVA in hopes of "controlling for" group differences on the covariate. There is no statistical means of accomplishing this "control" (Chapman & Chapman, 1973; Fleiss & Tanur, 1973; Lord, 1967).
In fact, "control" is altogether the wrong metaphor for understanding what ANCOVA accomplishes. We have found that investigators are frequently surprised when this is pointed out. Some assert that "controlling" or "removing" nontrivial group differences on the covariate is the primary use of ANCOVA. It is important to establish that relevant literature roundly condemns this view, before attempting to provide an accessible explanation for why this view is mistaken and before considering some alternatives. Modern literature on ANCOVA began with Cochran (1957), who stated, "[I]t is important to verify that the treatments have had no effect on" the covariate and "a covariance adjustment . . . may remove most of the real treatment effect" (p. 264). Campbell and Stanley (1963) spoke favorably of ANCOVA in general but cautioned, "The usual statistics [including ANCOVA, cited earlier on the same page] are appropriate only where individual students have been assigned at random to treatments" (p. 23). In contrast, ANCOVA "to compare naturally occurring groups which is contrary to the admonitions of many experts in experimental design, can yield statistically significant results which are entirely spurious" (Evans & Anastasio, 1968, p. 225). Elashoff (1969) explained:
A basic postulate underlying the use of analysis of covariance to adjust treatment means for the effects of the covariate x is that the x variable is statistically independent of the treatment effect. In other words, this means that the distribution of covariate values is not affected by the treatments either through direct causation or through correlation with another affected character (and the x variable does not affect the treatment). . . . Therefore, if . . . treatments are not manipulated as independent variables but are classifications of naturally occurring groups this assumption will not be valid. . . . Analysis of covariance is inappropriate if the covariate is not independent of the treatment. (pp. 388-389)
Chapman and Chapman (1973) stated that there is no statistical method that can address the question of whether two groups that differ on variable A would differ on variable B if they did not differ on variable A and added, "The only legitimate use of analysis of covariance is for reducing variability of scores in groups that vary randomly. Its use is invalid for preexisting disparate groups that differ on the variable to be covaried out" (p. 82). Cohen and Cohen (1975) stated the principle forcefully:
[O]ne does not answer such questions with [ANCOVA]. . . . What are
that if the treatments differentially affect the covariate scores, then an ANCOVA . . . would in fact remove from the treatment sum of squares part of the treatment effect you really want included" (Maxwell & Delaney, 1990, pp. 380, 382-383).
Invalid ANCOVA When Groups Differ on the Covariate: The Substantive Problem
Granting this highly consistent sentiment in the technical literature, what is it that makes ANCOVA unacceptable in the face of group differences on Cov? Technical discussions of this issue, known as Lord's Paradox, have long been available (e.g., Bock, 1975; Fleiss & Tanur, 1973; Holland & Rubin, 1983; Lord, 1967, 1969; Maris, 1998; Maxwell & Delaney, 1990). It may be useful,
cians analyzing a hypothetical data set. The data are for boys and girls at the beginning and end of an academic year. The boys, as a group, start and end the year weighing more than the girls, and neither group's average weight changes over time. An issue is whether diet affected boys and girls differentially during the school year. The statistician who favored an ANCOVA (incorrectly so, in the opinion of Lord and numerous subsequent authors quoted above) used initial weight as a covariate and concluded:
If one selects on the basis of initial weight a subgroup of boys and a subgroup of girls having identical frequency distributions of initial weight, . . . the subgroup of boys is going to gain substantially more during the year than the subgroup of girls. (Lord, 1967, p. 305)
Lord's pro-ANCOVA statistician concluded from this that there is
older and younger children on height, because growth is an inherent (not chance or noise) differentiation of the two groups.
As noted above, it would be entirely reasonable, given such a data set, to explore the relationships among age, height, and weight (Cohen & Cohen, 1983). The point is that statistical control, in the sense of cleanly removing the effect of Cov, is not what one would be able to accomplish with ANCOVA. Cohen and Cohen (1983) provided the following extreme example: "Consider the fact that the difference in mean height between the mountains of the Himalayan and Catskill ranges, adjusting for differences in atmospheric pressure, is zero!" (p. 425), the point being that one has not in any sense "equated" the two mountain ranges by using atmospheric pressure as a covariate.
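Lord's hypothetical weight data can be mimicked with a short simulation showing how two analysts reach opposite conclusions from the same numbers. The group means, standard deviation, sample sizes, and within-group correlation between initial and final weight (rho) are all assumptions chosen for illustration. Each child's final weight regresses partway toward that child's own group mean, so neither group's average weight changes. A gain-score comparison then finds no differential change, while the ANCOVA-adjusted difference (final weight adjusted by the pooled within-group slope on initial weight) credits the boys with a sizable "gain."

```python
import random
import statistics


def simulate_group(mean, n, rng, sd=10.0, rho=0.5):
    """Hypothetical start- and end-of-year weights for one group.

    End weight = group mean + rho * (start weight - group mean) + noise, with the
    noise scaled so the end-of-year SD also equals sd. The group mean is stable.
    """
    noise_sd = sd * (1 - rho ** 2) ** 0.5
    start, end = [], []
    for _ in range(n):
        w1 = rng.gauss(mean, sd)
        start.append(w1)
        end.append(mean + rho * (w1 - mean) + rng.gauss(0, noise_sd))
    return start, end


def pooled_within_slope(groups):
    """Pooled within-group slope of end weight on start weight (the ANCOVA slope)."""
    sxy = sxx = 0.0
    for start, end in groups:
        mx, my = statistics.fmean(start), statistics.fmean(end)
        sxy += sum((x - mx) * (y - my) for x, y in zip(start, end))
        sxx += sum((x - mx) ** 2 for x in start)
    return sxy / sxx


rng = random.Random(1967)
boys_start, boys_end = simulate_group(mean=140.0, n=500, rng=rng)     # assumed means
girls_start, girls_end = simulate_group(mean=120.0, n=500, rng=rng)
avg = statistics.fmean

# Analyst 1: compare mean gains (end minus start) between groups.
gain_diff = (avg(boys_end) - avg(boys_start)) - (avg(girls_end) - avg(girls_start))

# Analyst 2: ANCOVA-adjusted difference in end weight, covarying start weight.
b_within = pooled_within_slope([(boys_start, boys_end), (girls_start, girls_end)])
ancova_diff = (avg(boys_end) - avg(girls_end)) - b_within * (avg(boys_start) - avg(girls_start))

print(f"difference in mean gain (boys - girls): {gain_diff:.2f}")
print(f"pooled within-group slope: {b_within:.2f}")
print(f"ANCOVA-adjusted difference (boys - girls): {ancova_diff:.2f}")
```

In this sketch the adjusted difference comes out near (1 - rho) times the initial group difference even though no one's diet differs: boys and girls matched on initial weight regress toward different group means, and that inherent group difference is exactly what the covariate cannot remove.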
causing offense, because it was published in a respected journal before the problem was recognized (Cochran, 1957) and because it is a deservedly famous article for other reasons, having recently been republished as a landmark (Journal of NIH Research, 1997). Among other important contributions, this paper introduced the still widely used Continuous Performance Test (CPT) to the neuropsychology and psychopathology literatures. The design included several groups of child and adult brain-damaged and control samples, with X and AX denoting two types of trials.
Since there was a significant age difference between the Child subgroups and a significant IQ difference between the Adult subgroups . . . , the differences between the paired subgroups means on X and AX in these two groups were evaluated by means of an analysis
discussed narrow purposes or conditions under which ANCOVA might be interpretable despite nonrandom assignment.
There can thus be a range of views on how to interpret certain cases in which groups differ on Cov. Even in the case of random assignment to groups, nontrivial differences on Cov will occasionally arise by chance. In the right panel of Figure 1, the issue is whether area 2 and area 5 are substantively important parts of Grp. If not, then after their removal Grp_res would still be a valid measure of the Group construct.
How is an investigator to know when a Type I error has occurred in a test of group differences on Cov? A moderate position might be that, if the investigator has good reason to believe that group differences on Cov truly arose by chance, ANCOVA is appropriate (Maxwell & Delaney, 1990). The ratio-
as the covariate, as a means of understanding the patterns of shared variance in a data set. He hastened to add that, when groups differ on the covariate, "it would be reasonable to speculate that the treatment effect on the dependent variable is mediated by the covariate. It couldn't be concluded. . . ." In effect, Huitema was describing the strategy of Cohen and Cohen (1983) in using a variety of hierarchical regression analyses to evaluate the relationships among variables. Procedures such as those discussed by Baron and Kenny (1986) might then be used to confirm mediational relationships.
Beyond Statistical Concerns
We have emphasized that the problem with ANCOVA in the face of group differences on the covariate is as much a substantive and interpretative issue as it is a mathematical issue. It can be noted that the problem of misuse of ANCOVA putatively to "control for" differences on the covariate is not confined to academe. An article in a prominent weekly political magazine stated:
On the average, students in predominantly White districts did much better on reading tests than those in predominantly Black districts. The reason? Not poverty, parent education, or class size. The White districts were, of course, generally better off, the parents of pupils were more highly educated, and so on. But when these differences were held constant statistically, the difference that made a difference was the quality of the teaching staff, as measured by a language skills exam. (Thernstrom, 1991, p. 22; emphasis added)
It is apparent from this example that the public-policy stakes associated with the misuse or misinterpretation of ANCOVA can be quite high. As we have argued on substantive grounds, complementing previous technical discussions, it is simply not possible for such differences to be "held constant statistically" as this article claims was done, unless we were to suppose that teachers were assigned to White and Black school districts randomly or that differences in poverty, parental education, and class size existed only by chance. Surely, teacher recruitment (and subsequent performance) is in part driven by differences in district poverty, parent education, class size, etc. All of these variables are correlated, and they must be understood as inherently confounded if one adopts the superficially appealing but inappropriate goal of "controlling for" some of them. Ultimately, "quality of teaching staff" is not a variable substantively left in the analysis by the time all of those covariates that are surely meaningfully correlated with it have been partialed out. Only a highly residualized variable given the same name is being tested. One could not conclude anything about the original variable, "quality of teaching staff." At best, one could speculate, as Huitema (1980) suggested. Such nuances are often lost in public debate and, in our experience, in the psychopathology literature.
Some Alternatives to ANCOVA
Methods
Once again, analysis of covariance cannot tell us how groups would differ if they did not differ on the covariate. What then should the investigator do instead of analysis of covariance? Maxwell and Delaney (1990) discussed the analysis of gain scores and the blocking of subjects on the covariate, noting limitations in both strategies relative to ANCOVA (see also Reichardt & Bormann, 1994). Cohen and Cohen (1983) and Harris et al. (1971) recommended incorporating the covariate into the analysis, no longer conceived as a covariate but as another substantive variable. Both of these are sound, relatively familiar strategies. Rosenbaum and Rubin (1984; Rosenbaum, 1995; Rubin, in press) discussed the less widely known concept of propensity score, the conditional probability of assignment to a particular group or treatment given a set of observed covariates. They noted that there are circumstances under which propensity score analysis can balance the groups on the covariates. It cannot address unobserved differences in covariates, although there are methods of determining the extent of bias arising from unobserved covariates (Rosenbaum, 1984). Less specific but potentially fruitful is the aggregation (perhaps via formal meta-analysis) of a large number of individually compromised studies, none of which may have random assignment but which collectively present a strong inferential case (Rosenbaum, 1987). For example, no study of lung cancer has undertaken random assignment to chronic smoking and nonsmoking groups, but a strong case for causal factors in the face of potentially confounded covariates (such as hypertension, on which smokers and nonsmokers might vary and on which smoking can have causal effects) has been made on the basis of a variety of different studies. Importantly for psychopathology research, Rosenbaum's suggestion can be generalized beyond traditionally defined control groups. For example, in a study comparing symptoms of psychosis observed in schizophrenia and bipolar patients, there are advantages to recruiting multiple samples, known to be different, within each diagnosis.
Placement of group means on the regression line. Fleiss and Tanur (1973) suggested another alternative to ANCOVA. They reasoned from the principle that a problem in using analysis of covariance for the comparison of intact groups is that the regression line within groups cannot legitimately be used for between-groups comparisons. They inferred from this that the investigator should consider the performance of many differing groups of participants, examining how different groups place on the regression line.
As a simple example to illustrate their method, an investigator might wish to test the hypothesis that schizophrenic patients show a greater concreteness of proverb interpretation than is accounted for by their lower Verbal IQ score on the Wechsler Adult Intelligence Scale (WAIS). (Schizophrenic individuals' thinking has often been characterized as more concrete and less abstract than nonpatients'. Judging that a hammer and a screwdriver are both tools is more abstract than judging that they are both made of metal.) The investigator recognizes that she or he cannot simply match subgroups of schizophrenic and control participants on WAIS IQ and then compare the matched subgroups on concreteness because such a comparison would be contaminated by regression toward the mean (see discussion of Lord's Paradox above). Fleiss and Tanur's (1973, p. 522) solution would be to extend the study of IQ and concreteness to "many different kinds of subjects (normals, neurotics, depressives, etc.), all having in common the fact that they are not schizophrenic." Then, after obtaining mean scores on concreteness and on WAIS Verbal IQ for each group, the investigator finds a transformation of the scores such that a nearly linear relation can be fitted to all pairs of group means. In our example, a straight line would be fitted to the relation of mean group concreteness score to
mean group IQ score. Then the means of the schizophrenic group A third problem is that the answer yielded by the study does not
are examined in relation to the fitted line. If the schizophrenic precisely match the original question. The investigators' hypoth-
patients score more deviantly on concreteness than predicted by esis was probably not concerned with a comparison of the con-
the multigroup regression line, one infers that their concreteness is creteness of schizophrenia with that of borderline retardation. In
unusual. our view, however, this kind of answer is often better than the
This solution is problematic. It is, of course, rather impractical available alternatives.
to test many groups in most such studies. More importantly, the
method may be flawed in some cases. If, as Fleiss and Tanur Substantive Questions
suggested, only transformation data rather than original data fit a
Future work may produce more general or more satisfactory
straight line, the meaning of that linearity is unclear. Some trans-
means to address the question psychopathologists usually attempt
formations might alter the data considerably, and the transforma-
to answer with ANCOVA, a question for which the technical
tion is selected to fit all of the pairs of group means to a straight
literature shows ANCOVA to be inappropriate: How would the
line except those of the schizophrenics. This procedure appears
groups differ on DV if they did not differ on Cov? However, we
vulnerable to random error that could foster the predicted result
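The multigroup comparison discussed above follows Fleiss and Tanur (1973), and it can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not their published algorithm. Only the DV is transformed. The candidate transformations (identity, square root, natural log) are hypothetical choices, and so is the selection rule: pick the largest absolute correlation between group IQ means and transformed DV means, a criterion that stays comparable across transformations. The names `TRANSFORMS` and `multigroup_deviation` are invented, and so are all group means. The target (schizophrenic) group is excluded from both the transformation choice and the line fit, and only then is it compared with the fitted line.

```python
import math

# Hypothetical candidate transformations of the DV; which transformations
# Fleiss and Tanur (1973) would consider is an assumption here.
TRANSFORMS = {
    "identity": lambda v: v,
    "sqrt": math.sqrt,
    "log": math.log,  # assumes all DV means are positive
}


def _moments(xs, ys):
    """Means plus centered sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def multigroup_deviation(comparison, target):
    """Return (chosen transformation, target group's deviation from line).

    comparison: list of (mean IQ, mean DV) pairs, one per comparison group
    target: (mean IQ, mean DV) for the group excluded from the fit
    """
    xs = [iq for iq, _ in comparison]
    best_name, best_r = None, -1.0
    # Step 1: choose the transformation under which the comparison-group
    # means are most nearly linear (largest |r|, which is scale-free).
    for name, f in TRANSFORMS.items():
        ys = [f(dv) for _, dv in comparison]
        _, _, sxx, syy, sxy = _moments(xs, ys)
        r = abs(sxy) / math.sqrt(sxx * syy)
        if r > best_r:
            best_name, best_r = name, r
    # Step 2: least-squares line through the transformed comparison means.
    f = TRANSFORMS[best_name]
    ys = [f(dv) for _, dv in comparison]
    mx, my, sxx, _, sxy = _moments(xs, ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Step 3: how far the target group's transformed DV mean falls from
    # the value the multigroup line predicts at its mean IQ.
    iq, dv = target
    return best_name, f(dv) - (intercept + slope * iq)


if __name__ == "__main__":
    # Invented (IQ, DV) means for four comparison groups.
    groups = [(100, 20), (90, 25), (80, 30), (70, 35)]
    print(multigroup_deviation(groups, (85, 35)))  # ('identity', 7.5)
```

The transformation and the line are both chosen after the comparison-group means are in hand. Chance fluctuations in those means therefore shape the yardstick against which the schizophrenic group is judged, which is the vulnerability to random error noted in the text.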
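The statement that analysis of covariance cannot tell us how groups would differ if they did not differ on the covariate can be made concrete with the standard adjusted-mean formula: adjusted mean of group j = (DV mean of group j) - b_w x (covariate mean of group j - grand covariate mean), where b_w is the pooled within-group slope. The Python sketch below uses invented scores and hypothetical function names (`pooled_within_slope`, `adjusted_means`). It also reports whether the grand covariate mean lies inside each group's observed covariate range, since that is the point at which the groups are compared.

```python
from statistics import mean


def pooled_within_slope(groups):
    """Pooled within-group slope b_w of Y on X.

    groups: dict mapping group name -> list of (x, y) pairs, where x is
    the covariate and y is the dependent variable.
    """
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


def adjusted_means(groups):
    """Covariate-adjusted DV mean for each group, evaluated at the grand
    covariate mean, plus a flag saying whether that grand mean falls
    inside the group's own observed covariate range."""
    b_w = pooled_within_slope(groups)
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    result = {}
    for name, pairs in groups.items():
        xs = [x for x, _ in pairs]
        adjusted = mean(y for _, y in pairs) - b_w * (mean(xs) - grand_x)
        result[name] = (adjusted, min(xs) <= grand_x <= max(xs))
    return result


if __name__ == "__main__":
    # Invented scores: groups A and B do not overlap on the covariate.
    data = {
        "A": [(1, 2), (2, 4), (3, 6)],
        "B": [(7, 10), (8, 12), (9, 14)],
    }
    print(adjusted_means(data))  # {'A': (10.0, False), 'B': (6.0, False)}
```

With these invented data, both groups are evaluated at a covariate value of 5, which no participant in either group has. The adjusted means (10 vs. 6) reverse the ordering of the raw means (4 vs. 12), and that reversal rests entirely on extending each group's regression line into a region with no data.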
References

Bock, D. (1975). Multivariate statistical methods in behavioral research. New York: McGraw-Hill.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
Chapman, L. J., & Chapman, J. P. (1973). Disordered thought in schizophrenia. New York: Appleton-Century-Crofts.
Cochran, W. G. (1957). Analysis of covariance: Its nature and uses. Biometrics, 13, 261-281.
Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Deldin, P. J. (1996). Information processing in major depression: The ERP connection. Unpublished doctoral dissertation, University of Illinois, Urbana-Champaign.
Dixon, W. J. (1992). BMDP statistical software manual. Los Angeles: University of California Press.
Elashoff, J. D. (1969). Analysis of covariance: A delicate instrument. American Educational Research Journal, 6, 383-401.
Evans, S. H., & Anastasio, E. J. (1968). Misuse of analysis of covariance when treatment effect and covariate are confounded. Psychological Bulletin, 69, 225-234.
Fleiss, J. L., & Tanur, J. M. (1973). The analysis of covariance in psychopathology. In M. Hammer, K. Salzinger, & S. Sutton (Eds.), Psychopathology: Contributions from the social, behavioral, and biological sciences (pp. 509-527). New York: Wiley.
Harris, D. R., Bisbee, C. T., & Evans, S. H. (1971). Further comments: Misuse of analysis of covariance. Psychological Bulletin, 75, 220-222.
Heckman, J. J. (1989). Causal inference and nonrandom samples. Journal of Educational Statistics, 14, 159-168.
Heller, W., Etienne, M. A., & Miller, G. A. (1995). Patterns of perceptual asymmetry in depression and anxiety: Implications for neuropsychological models of emotion and psychopathology. Journal of Abnormal Psychology, 104, 327-333.
Holland, P. W., & Rubin, D. B. (1983). On Lord's paradox. In H. Wainer & S. Messick (Eds.), Principles of modern psychological measurement (pp. 3-35). Hillsdale, NJ: Erlbaum.
Huitema, B. (1980). Analysis of covariance and alternatives. New York: Wiley.
Jin, P. (1992). Toward a reconceptualization of the Law of Initial Value. Psychological Bulletin, 111, 176-184.
Judd, C. M., McClelland, G. H., & Smith, E. R. (1996). Testing treatment by covariate interactions when treatment varies within subjects. Psychological Methods, 1, 366-378.
Keller, J., Nitschke, J. B., Bhargava, T., Deldin, P. J., Gergen, J. A., Miller, G. A., & Heller, W. (2000). Neuropsychological differentiation of depression and anxiety. Journal of Abnormal Psychology, 109, 3-10.
Kessler, R. C., McGonagle, K. A., Zhao, S., Nelson, C. B., Hughes, M., Eshleman, S., Wittchen, H., & Kendler, K. S. (1994). Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States. Archives of General Psychiatry, 51, 8-19.
Lord, F. M. (1967). A paradox in the interpretation of group comparisons. Psychological Bulletin, 68, 304-305.
Lord, F. M. (1969). Statistical adjustments when comparing preexisting groups. Psychological Bulletin, 72, 336-337.
Maris, E. (1998). Covariance adjustment versus gain scores—revisited. Psychological Methods, 3, 309-327.
Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data: A model comparison perspective. Belmont, CA: Wadsworth.
Maxwell, S. E., Delaney, H. D., & Manheimer, J. M. (1985). ANOVA of residuals and ANCOVA: Correcting an illusion by using model comparisons and graphs. Journal of Educational Statistics, 10, 197-209.
Miller, M. B., Chapman, L. J., & Chapman, J. P. (1993). Slowness and the preceding preparatory interval effect in schizophrenia. Journal of Abnormal Psychology, 102, 145-151.
Neale, J. M., & Oltmanns, T. F. (1980). Schizophrenia. New York: Wiley.
Overall, J. E., & Woodward, J. A. (1977). Nonrandom assignment and the analysis of covariance. Psychological Bulletin, 84, 588-594.
Porter, A. C., & Raudenbush, S. W. (1987). Analysis of covariance: Its model and use in psychological research. Journal of Counseling Psychology, 34, 383-392.
Reichardt, C. S. (1979). The statistical analysis of data from nonequivalent group designs. In T. D. Cook & D. T. Campbell (Eds.), Quasi-experimentation: Design and analysis issues for field settings (pp. 147-205). Boston: Houghton Mifflin.
Reichardt, C. S., & Bormann, C. A. (1994). Using regression models to estimate program effects. In J. S. Wholey, H. P. Hatry, & K. E. Newcomer (Eds.), Handbook of practical program evaluation (pp. 417-455). San Francisco: Jossey-Bass.
Richards, J. E. (1980). The statistical analysis of heart rate: A review emphasizing infancy data. Psychophysiology, 17, 153-166.
Rogosa, D. (1980). Comparing nonparallel regression lines. Psychological Bulletin, 88, 307-321.
Rosenbaum, P. R. (1984). From association to causation in observational studies: The role of strongly ignorable treatment assignment. Journal of the American Statistical Association, 79, 41-48.
Rosenbaum, P. R. (1987). The role of a second control group in an observational study (with discussion). Statistical Science, 2, 292-316.
Rosenbaum, P. R. (1995). Observational studies. New York: Springer-Verlag.
Rosenbaum, P. R., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79, 516-524.
Rosvold, H. E., Mirsky, A. F., Sarason, I., Bransome, E. D., Jr., & Beck, L. H. (1956). A continuous performance test of brain damage. Journal of Consulting Psychology, 20, 346-350.
Rubin, D. B. (in press). Propensity score methods. In L. Bickman (Ed.), Contributions to research design: Donald Campbell's legacy (Vol. 2). Thousand Oaks, CA: Sage.
Siddle, D. A. T., & Turpin, G. (1980). Measurement, quantification, and analysis of cardiac activity. In I. Martin & P. H. Venables (Eds.), Techniques in psychophysiology (pp. 139-246). London: Wiley.
Thernstrom, A. (1991, December 16). Beyond the pale. New Republic.
Wainer, H. (1991). Adjusting for differential base rates: Lord's Paradox again. Psychological Bulletin, 109, 147-151.
Walker, E., & Lewine, R. (1993). Sampling biases in studies of gender and schizophrenia. Schizophrenia Bulletin, 19, 1-7.
Watson, D., Weber, K., Assenheimer, J. S., Clark, L. A., Strauss, M. E., & McCormick, R. A. (1995). Testing a tripartite model: I. Evaluating the convergent and discriminant validity of anxiety and depression symptom scales. Journal of Abnormal Psychology, 104, 3-14.
Wildt, A. R., & Ahtola, O. T. (1978). Analysis of covariance. Beverly Hills, CA: Sage.

Received November 3, 1998
Revision received September 7, 2000
Accepted September 7, 2000