A Unification of Models For Meta-Analysis of Diagnostic Accuracy Studies
239–251
doi:10.1093/biostatistics/kxl004
Advance Access publication on May 11, 2006
JONATHAN J. DEEKS
Centre for Statistics in Medicine, Oxford, UK
MATTHIAS EGGER
Department of Social and Preventive Medicine, University of Berne, Switzerland
SUMMARY
Studies of diagnostic accuracy require more sophisticated methods for their meta-analysis than studies of
therapeutic interventions. A number of different, and apparently divergent, methods for meta-analysis of
diagnostic studies have been proposed, including two alternative approaches that are statistically rigorous
and allow for between-study variability: the hierarchical summary receiver operating characteristic (ROC)
model (Rutter and Gatsonis, 2001) and bivariate random-effects meta-analysis (van Houwelingen and
others, 1993, 2002; Reitsma and others, 2005). We show that these two models are very closely related,
and define the circumstances in which they are identical. We discuss the different forms of summary model
output suggested by the two approaches, including summary ROC curves, summary points, confidence
regions, and prediction regions.
Keywords: Bivariate normal distribution; Diagnostic tests; Hierarchical models; HSROC model; Meta-analysis; ROC
analysis; Sensitivity and specificity.
1. INTRODUCTION
There is increasing interest in systematic reviews and meta-analyses of data from diagnostic accuracy
studies (Deeks, 2001; Deville and others, 2002; Bossuyt and others, 2003; Khan and others, 2003;
© The Author 2006. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].
240 R. M. HARBORD AND OTHERS
Whiting and others, 2004; Tatsioni and others, 2005; Gluud and Gluud, 2005). Typically, the data from
each primary study are summarized as a 2 × 2 table, based on dichotomized test result against true disease
status, from which familiar measures such as sensitivity and specificity can be derived.
Several statistical methods for meta-analysis of data from diagnostic test accuracy studies have been
proposed (Moses and others, 1993; Rutter and Gatsonis, 2001; Dukic and Gatsonis, 2003; Siadaty and
Shu, 2004; Reitsma and others, 2005). These methods reflect two important characteristics of such data.
First, a negative correlation between sensitivity and specificity is expected because of the trade-off be-
tween these measures as the test threshold varies (Moses and others, 1993; Deeks, 2001). Second, and in
contrast to meta-analysis of data from randomized controlled trials, substantial between-study heterogene-
2. THE BIVARIATE MODEL
The bivariate model is based on an approach to meta-analysis introduced by van Houwelingen and others
(1993) (see also van Houwelingen and others, 2002). It has recently been applied to meta-analysis of
diagnostic accuracy studies by Reitsma and others (2005).
Following Reitsma and others (2005), we define µ Ai as the logit-transformed sensitivity in study i,
and µ Bi as the logit-transformed specificity. We use the letter µ where Reitsma and others (2005) used θ
to avoid a clash of notation with the HSROC model defined in Section 3. The bivariate model is a random-
effects model in which the logit transforms of the true sensitivity and true specificity in each study are
assumed to have a bivariate normal distribution across studies, thereby allowing for the possibility of
correlation between them (Reitsma and others, 2005):
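$$
\begin{pmatrix} \mu_{Ai} \\ \mu_{Bi} \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix}, \Sigma_{AB} \right) \quad \text{with} \quad \Sigma_{AB} = \begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}. \qquad (2.1)
$$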
Covariates that affect either sensitivity or specificity or both can be included in a natural way by replacing
one or both of the means µ A and µ B by linear predictors in the covariates. For example, for a single
covariate Z that may affect both sensitivity and specificity, we could replace µ A by µ A + ν A Z i and µ B
by µ B + ν B Z i .
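To make the structure of (2.1) concrete, the following is a minimal simulation sketch, not the fitting procedure used in the paper. It draws the true logit sensitivity and logit specificity of a study from the bivariate normal distribution and maps them back to probabilities. The function names (`inv_logit`, `draw_study`) and all numerical values are hypothetical and chosen only for illustration.

```python
import math
import random


def inv_logit(x):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def draw_study(mu_a, mu_b, var_a, var_b, cov_ab, rng, z=0.0, nu_a=0.0, nu_b=0.0):
    """Draw the true (sensitivity, specificity) of one study under (2.1).

    A study-level covariate z shifts the means to mu_a + nu_a * z and
    mu_b + nu_b * z, as described in the text.
    """
    # Cholesky factor of the 2x2 covariance matrix, written out by hand.
    l11 = math.sqrt(var_a)
    l21 = cov_ab / l11
    l22 = math.sqrt(var_b - l21 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    logit_sens = mu_a + nu_a * z + l11 * e1
    logit_spec = mu_b + nu_b * z + l21 * e1 + l22 * e2
    return inv_logit(logit_sens), inv_logit(logit_spec)


rng = random.Random(1)
# Hypothetical parameter values, for illustration only.
studies = [draw_study(0.7, 1.6, 0.2, 0.5, -0.1, rng) for _ in range(5)]
```

The sketch covers only the between-study level; a full analysis would also model the binomial sampling variation within each study.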
Unification of models for diagnostic meta-analysis 241
3. THE HSROC MODEL
The HSROC model (Rutter and Gatsonis, 2001) was motivated by a model for ordinal regression
(McCullagh, 1980) that has been used to estimate a receiver operating characteristic (ROC) curve from
a single study with data available for multiple thresholds (Tosteson and Begg, 1988). The model is for-
mulated in terms of the probability πi j that a patient in study i with disease status j has a positive test
result, where j = 0 for a patient without the disease and j = 1 for a patient with the disease. In the usual
terminology of diagnostic accuracy studies, πi1 is the true-positive rate or sensitivity in study i, while πi0
is the false-positive rate, equal to 1 − specificity. The HSROC model is defined by separate equations
where X i j is a dummy variable denoting the true disease status for a patient in study i with disease
status j. Rutter and Gatsonis (2001) chose to code $X_{ij} = -\tfrac{1}{2}$ for those without disease (j = 0) and
$+\tfrac{1}{2}$ for those with disease (j = 1). Both θi and αi are allowed to vary between studies. Rutter and
Gatsonis (2001) refer to the θi as “cutpoint parameters” or “positivity criteria,” as they model the trade-
off between sensitivity and specificity in each study: true-positive rate (sensitivity) and false-positive rate
(1−specificity) both increase with increasing θi . The αi are “accuracy parameters,” as they measure the
difference between true-positive and false-positive fractions in each study. When β = 0, the diagnostic
odds ratio for each study does not depend on the cutpoint parameter θi , and αi is then the log of the
diagnostic odds ratio. β is a “scale parameter” or “shape parameter” which models possible asymmetry
in the ROC curve by allowing true-positive and false-positive fractions to increase at different rates as
θi increases. When β ≠ 0, the diagnostic odds ratio varies with θi even if the accuracy parameter αi is
held fixed. β is assumed to be constant across studies, although this assumption can be relaxed somewhat,
for example to allow a different value of β in each of several groups of studies (Rutter and Gatsonis,
2001).
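The roles of the three parameters can be illustrated numerically. The sketch below assumes the within-study form logit(πij) = (θi + αi Xij) exp(−β Xij) with Xij = ±1/2; this form is an assumption of the example, chosen because it agrees with (4.5) and (4.6) when b = exp(β/2). The function names are hypothetical.

```python
import math


def inv_logit(x):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def hsroc_rates(theta, alpha, beta):
    """Return (sensitivity, false-positive rate) for one study.

    Assumed form: logit(pi_ij) = (theta + alpha * X_ij) * exp(-beta * X_ij),
    with X_ij = +1/2 for diseased and -1/2 for non-diseased patients.
    """
    sens = inv_logit((theta + 0.5 * alpha) * math.exp(-0.5 * beta))
    fpr = inv_logit((theta - 0.5 * alpha) * math.exp(0.5 * beta))
    return sens, fpr


def log_dor(theta, alpha, beta):
    """Log diagnostic odds ratio implied by the cutpoint, accuracy and shape parameters."""
    sens, fpr = hsroc_rates(theta, alpha, beta)
    return math.log(sens / (1.0 - sens)) - math.log(fpr / (1.0 - fpr))
```

Under this assumed form, `log_dor(theta, alpha, 0.0)` returns `alpha` for any `theta`, whereas a nonzero `beta` makes the result depend on `theta`, which is the behaviour described in the paragraph above.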
$$\alpha_i \sim N(\Lambda + \lambda Z_i,\ \sigma_\alpha^2), \qquad (3.3)$$
where the coefficients γ and λ express the effect of the covariate Z on the cutpoint and accuracy pa-
rameters, respectively. This model may be extended to include more than one covariate, or to allow the
covariates that affect the accuracy parameters to differ from those that affect the cutpoint parameters.
µ Ai = logit(πi1 ) (4.3)
We can therefore relate the random variables that form the basis of the two models:
$$\mu_{Ai} = b^{-1}\left(\theta_i + \tfrac{1}{2}\alpha_i\right) \qquad (4.5)$$
$$\mu_{Bi} = -b\left(\theta_i - \tfrac{1}{2}\alpha_i\right). \qquad (4.6)$$
This pair of equations tells us that µ Ai and µ Bi are linear combinations of two random variables, θi
and αi , which the HSROC model assumes to have independent normal distributions (conditional on any
covariates). Any pair of linear combinations of random variables with normal distributions has a bivariate
normal distribution (see, e.g. Dudewicz and Mishra, 1988, p. 242). Therefore, the HSROC model implies
that the joint distribution of µ Ai and µ Bi is bivariate normal. So the HSROC model is precisely equivalent
to the bivariate model. We give explicit expressions for the relationships between their parameters in the
subsections that follow.
We can express the relationship more concisely using matrix notation. We may write (4.5) and (4.6)
in a single matrix equation as
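$$
\begin{pmatrix} \mu_{Ai} \\ \mu_{Bi} \end{pmatrix} = S^{-1} \begin{pmatrix} \theta_i \\ \alpha_i \end{pmatrix}, \quad \text{where} \quad S^{-1} = \begin{pmatrix} b^{-1} & \tfrac{1}{2} b^{-1} \\ -b & \tfrac{1}{2} b \end{pmatrix}. \qquad (4.7)
$$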
Inverting this,
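$$
\begin{pmatrix} \theta_i \\ \alpha_i \end{pmatrix} = S \begin{pmatrix} \mu_{Ai} \\ \mu_{Bi} \end{pmatrix}, \quad \text{where} \quad S = \begin{pmatrix} \tfrac{1}{2} b & -\tfrac{1}{2} b^{-1} \\ b & b^{-1} \end{pmatrix}. \qquad (4.8)
$$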
S is then the transformation matrix associated with the change from the bivariate model coordinates (logit-
transformed sensitivity and specificity) to the HSROC model coordinates (cutpoint and accuracy parameters).
Note that S is not orthogonal ($S^{-1} \neq S^{T}$). As illustrated in Section 6, it follows that when plotted in
bivariate model space (logit-ROC space), the axes corresponding to the coordinates of the HSROC model
are not perpendicular to each other.
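The two matrices in (4.7) and (4.8) can be checked numerically. The sketch below builds both for a given β, taking b = exp(β/2) (the relation implied by b = √(σB/σA) together with (4.11)), and confirms that their product is the identity while S times its transpose is not. The helper names are hypothetical.

```python
import math


def s_matrix(beta):
    """Matrix S of (4.8): maps (mu_Ai, mu_Bi) to (theta_i, alpha_i)."""
    b = math.exp(beta / 2.0)
    return [[0.5 * b, -0.5 / b], [b, 1.0 / b]]


def s_inverse(beta):
    """Matrix S^{-1} of (4.7): maps (theta_i, alpha_i) to (mu_Ai, mu_Bi)."""
    b = math.exp(beta / 2.0)
    return [[1.0 / b, 0.5 / b], [-b, 0.5 * b]]


def matmul(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(p):
    """Transpose of a 2x2 matrix stored as nested lists."""
    return [[p[j][i] for j in range(2)] for i in range(2)]


beta = 0.5  # hypothetical value, for illustration only
identity_check = matmul(s_matrix(beta), s_inverse(beta))
# The lower-right entry of S S^T is b^2 + b^-2, which is at least 2,
# so S cannot be orthogonal for any value of b.
sst = matmul(s_matrix(beta), transpose(s_matrix(beta)))
```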
The assumption of the HSROC model that θi and αi are uncorrelated, i.e. the off-diagonal elements
above are zero, fixes the value of b and hence the transformation matrix S. So S is a non-orthogonal
transformation that diagonalizes the variance–covariance matrix of the bivariate model. On expanding the
right-hand side of (4.10), we find that these off-diagonal elements are zero if and only if $b = \sqrt{\sigma_B/\sigma_A}$ or,
equivalently,
$$\beta = \log(\sigma_B/\sigma_A). \qquad (4.11)$$
Thus, the shape parameter (β) of the HSROC model is determined solely by the ratio of the variances of
logit sensitivity and logit specificity in the bivariate model, and, perhaps surprisingly, is unrelated to their
correlation. Equations (4.9) and (4.10) then allow us to relate the other parameters of the HSROC model
to those of the bivariate model:
$$\Theta = \tfrac{1}{2}\left\{(\sigma_B/\sigma_A)^{1/2}\mu_A - (\sigma_A/\sigma_B)^{1/2}\mu_B\right\} \qquad (4.12)$$
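As a numerical check on this reparameterization, the sketch below converts a set of bivariate parameters to HSROC parameters. The shape parameter and the mean cutpoint follow (4.11) and (4.12); the remaining quantities are obtained here by applying the matrix S of (4.8) to the mean vector and the covariance matrix, so they are derived for this example rather than quoted. The function name, dictionary keys and input values are hypothetical.

```python
import math


def bivariate_to_hsroc(mu_a, mu_b, var_a, var_b, cov_ab):
    """Convert bivariate-model parameters to HSROC-model parameters."""
    sd_a, sd_b = math.sqrt(var_a), math.sqrt(var_b)
    b = math.sqrt(sd_b / sd_a)
    beta = math.log(sd_b / sd_a)                 # (4.11)
    mean_theta = 0.5 * (b * mu_a - mu_b / b)     # (4.12)
    mean_alpha = b * mu_a + mu_b / b             # second row of S applied to the means
    # Elements of S * Sigma_AB * S^T, written out term by term.
    var_theta = 0.25 * (b ** 2 * var_a - 2.0 * cov_ab + var_b / b ** 2)
    var_alpha = b ** 2 * var_a + 2.0 * cov_ab + var_b / b ** 2
    cov_theta_alpha = 0.5 * (b ** 2 * var_a - var_b / b ** 2)
    return {
        "beta": beta,
        "Theta": mean_theta,
        "Lambda": mean_alpha,
        "var_theta": var_theta,
        "var_alpha": var_alpha,
        "cov_theta_alpha": cov_theta_alpha,
    }


# Hypothetical inputs, for illustration only.
params = bivariate_to_hsroc(0.7, 1.6, 0.2, 0.5, 0.05)
```

With this choice of b, the covariance term vanishes up to rounding error whatever the value of the bivariate covariance, which illustrates why the shape parameter does not depend on the correlation.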
The extension to more than one covariate, with each covariate affecting both accuracy and cutpoint
parameters, is straightforward. Therefore, a bivariate model in which one or more covariates affect both
sensitivity and specificity is equivalent to an HSROC model in which the same covariates are allowed to
affect both accuracy and cutpoint parameters.
However, a bivariate model in which different covariates are allowed to affect sensitivity and specificity,
or in which covariates are included for only sensitivity or only specificity, will not be equivalent to an
HSROC model including covariates unless constraints are imposed on the relationship between the coefficients
of the covariates in the HSROC model. The converse is also true.
where s A and s B are the estimated standard errors of µ̂ A and µ̂ B , r is the estimate of their correlation, and
varying t from 0 to 2π generates the boundary of the ellipse. The constant c has been called the boundary
constant of the ellipse (Alexandersson, 2004); asymptotically, to give a 100(1 − α)% confidence region,
$c = \sqrt{\chi^2_{2;\alpha}}$, where $\chi^2_{2;\alpha}$ is the upper 100α% point of the χ² distribution with two degrees of freedom.
When the number of studies is small, it may be preferable to use a more conservative approximate confidence
region given by $c = \sqrt{2 f_{2,n-2;\alpha}}$, where n is the number of studies and $f_{2,n-2;\alpha}$ is the upper 100α%
point of the F distribution with degrees of freedom 2 and n − 2 (Douglas, 1993; Chew, 1966). Such an
ellipse in logit-ROC space can then be back-transformed to conventional ROC space to give a confidence
In practice, both terms must be estimated by fitting the model to the data. The parameters s A , s B , and r in
(5.2) and (5.3) can then be replaced by the corresponding quantities derived from this covariance matrix to
give the prediction ellipse in logit-ROC space. Again, this can be back-transformed to a prediction region
for the true sensitivity and specificity of a future study in conventional ROC space.
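The construction of such a region can be sketched as follows. The example assumes one common parametrization of an ellipse, (µ̂A + sA c cos t, µ̂B + sB c cos(t + arccos r)), and takes c as the square root of the relevant χ² or scaled F percentage point; both choices are assumptions of this sketch. For two numerator degrees of freedom the percentage points have closed forms, so no statistical library is needed. Function names and inputs are hypothetical.

```python
import math


def inv_logit(x):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def boundary_constant(alpha=0.05, n_studies=None):
    """Boundary constant c of the ellipse.

    Without n_studies: square root of the upper-alpha point of chi-square
    with 2 df, which equals -2 * log(alpha).
    With n_studies: square root of 2 * f, where f is the upper-alpha point of
    F(2, m) with m = n_studies - 2, which equals (m / 2) * (alpha ** (-2 / m) - 1).
    """
    if n_studies is None:
        return math.sqrt(-2.0 * math.log(alpha))
    m = n_studies - 2
    f_upper = 0.5 * m * (alpha ** (-2.0 / m) - 1.0)
    return math.sqrt(2.0 * f_upper)


def ellipse_logit(mu_a, mu_b, s_a, s_b, r, c, n_points=200):
    """Boundary points (logit sensitivity, logit specificity) of the ellipse."""
    phi = math.acos(r)
    points = []
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        points.append((mu_a + s_a * c * math.cos(t),
                       mu_b + s_b * c * math.cos(t + phi)))
    return points


def to_roc_space(points):
    """Back-transform to (false-positive rate, sensitivity) pairs."""
    return [(1.0 - inv_logit(b), inv_logit(a)) for a, b in points]
```

Replacing the standard errors and correlation by the corresponding quantities from the covariance matrix for a future study gives a prediction region by the same route.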
As an example, we shall apply both methods to data on 17 studies of lymphangiography for the di-
agnosis of lymph node metastasis in women with cervical cancer, one of three imaging techniques in
the meta-analysis of Scheidler and others (1997) which has been much used as an example data set for
methodological papers on diagnostic meta-analysis (Rutter and Gatsonis, 2001; Macaskill, 2004; Reitsma
and others, 2005). A SROC plot showing the estimates of sensitivity and specificity from the individual
studies is shown in Figure 1.
We fitted both the bivariate and the HSROC models using the NLMIXED procedure in the statistical
software package SAS (SAS Institute Inc., 2003), using code similar to that given by Macaskill (2004)
and available from the authors on request. Note that our results differ slightly from those in Reitsma and
others (2005) because they used empirical logit transforms and their standard errors followed by the MIXED
procedure in SAS, whereas we chose to model the binomial error structure directly using the NLMIXED
procedure.
Table 1 shows the parameter estimates obtained for both models, and the result of applying (4.11)–
(4.20) to transform estimates from the HSROC model to the corresponding parameters of the bivariate
model and vice versa. The standard errors of the transformed estimates were computed by the delta
method using the ESTIMATE statement of the NLMIXED procedure. As can be seen, the results are
virtually identical. (The standard errors are identical in theory due to the close relationship between the
delta method and maximum likelihood; Cox, 1998; Cox and Hinkley, 1974, Exercise 4.15.) By taking the
inverse logit transforms of µ A and µ B , respectively, and assuming their estimates have a normal distribu-
tion, the summary estimate of sensitivity is found to be 0.67 (95% CI, 0.60–0.74) and that of specificity
is 0.84 (95% CI, 0.76–0.89). In this example, σ AB is estimated to be positive, though with large standard
error. This implies a positive correlation between sensitivity and specificity across the studies, not the neg-
ative correlation that would be expected if the between-study heterogeneity was due mainly to variation
in threshold.
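The back-transformation used for the summary estimates can be sketched as below: a Wald interval is formed on the logit scale and its end points are mapped through the inverse logit. The function names are hypothetical, and the inputs in the usage line are illustrative values, not the fitted estimates from Table 1.

```python
import math


def inv_logit(x):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def summary_with_ci(mu_hat, se, z=1.959964):
    """Point estimate and normal-theory interval, back-transformed from the logit scale.

    z defaults to the 97.5% point of the standard normal distribution,
    giving a 95% interval.
    """
    lower = mu_hat - z * se
    upper = mu_hat + z * se
    return inv_logit(mu_hat), inv_logit(lower), inv_logit(upper)


def correlation(cov_ab, var_a, var_b):
    """Between-study correlation implied by the covariance and the two variances."""
    return cov_ab / math.sqrt(var_a * var_b)


# Hypothetical logit-scale estimate and standard error, for illustration only.
estimate, low, high = summary_with_ci(0.7, 0.16)
```

Because the interval is symmetric on the logit scale, it is generally asymmetric around the point estimate on the probability scale.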
Table 1. Results of fitting the bivariate and HSROC models to the lymphangiography data
Figure 2 shows the 95% confidence region for the summary operating point and a 95% prediction
region for the true operating point in a single future study in both logit-transformed ROC space (left
panel) and back-transformed to conventional ROC space (right panel). The prediction region covers a
greater range of specificity than sensitivity, in contrast to the estimates from the separate studies shown
in the SROC plot in Figure 1, which exhibit more variation in estimated sensitivity than specificity. This
is due to the fact that most of the studies had a considerably larger number of patients with negative
results on the reference test than positive results, leading to greater sampling variability in the estimates
of sensitivity than specificity. The prediction region is for the true sensitivity and specificity in a future
study, not the estimated values.
Also shown in Figure 2 is the SROC curve (a straight line in logit-transformed ROC space). Note
that the SROC curve takes a conventional shape despite the positive estimate of the correlation between
sensitivity and specificity. The left-hand panel also shows the HSROC coordinate axes in logit-transformed
ROC space. Note that these axes do not align with the major or minor axes of the ellipse. The θ axis is
parallel to the ROC curve, while its horizontal reflection is parallel to the α axis. The method of Littenberg
and Moses (1993) (using unweighted linear regression) gives a curve similar to, but slightly above, the
HSROC curve, as shown in Macaskill (2004).
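The straight-line form of the SROC curve in logit space can be derived from (4.5) and (4.6): eliminating θi gives b µAi + µBi/b = αi, so holding the accuracy parameter at a fixed value Λ yields logit(sensitivity) = Λ exp(−β/2) + exp(−β) logit(1 − specificity), using b = exp(β/2). The sketch below evaluates this curve; the function names and the use of Λ for the fixed accuracy value are assumptions of the example.

```python
import math


def inv_logit(x):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def sroc_sensitivity(fpr, accuracy, beta):
    """Sensitivity on the SROC curve at a given false-positive rate.

    In logit space the curve is a straight line with slope exp(-beta)
    and intercept accuracy * exp(-beta / 2).
    """
    logit_fpr = math.log(fpr / (1.0 - fpr))
    return inv_logit(accuracy * math.exp(-0.5 * beta)
                     + math.exp(-beta) * logit_fpr)


def sroc_curve(accuracy, beta, n_points=99):
    """(false-positive rate, sensitivity) pairs along the curve."""
    grid = [(k + 1) / (n_points + 1) for k in range(n_points)]
    return [(fpr, sroc_sensitivity(fpr, accuracy, beta)) for fpr in grid]
```

With `beta = 0.0` the line has unit slope in logit space, which corresponds to a symmetric curve with constant diagnostic odds ratio exp(Λ).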
7. DISCUSSION
We have shown that the HSROC model and the bivariate random-effects model for meta-analysis of di-
agnostic accuracy studies are very closely related, and in common situations identical. In the absence of
study-level covariates, they are different parameterizations of the same model. The bivariate model allows
inclusion of covariates that affect sensitivity or specificity or both, while the HSROC model allows co-
variates that affect accuracy or threshold parameters or both. An HSROC model that allows one or more
covariates to affect both accuracy and threshold parameters is equivalent to a bivariate model that allows
the same covariates to affect both sensitivity and specificity. However, the HSROC model can be more
easily extended to include a covariate to affect the degree of asymmetry of the SROC curve.
The models may differ in the options for introducing greater model parsimony by dropping or com-
bining parameters: The HSROC framework allows the analyst to drop the random effect for the accuracy
parameter and assume this is fixed across all studies, and hence that only the threshold parameter varies
between studies. This corresponds to perfect negative correlation between the logit transforms of sensi-
tivity and specificity in the bivariate model (σ AB = −σ A σ B ). The confidence and prediction regions then
collapse to lie along the SROC curve. The HSROC framework also allows the assumption of a symmetric
SROC curve with constant diagnostic odds ratio by setting β = 0, which in the bivariate model corre-
sponds to equal variances of logit sensitivity and logit specificity (σ A2 = σ B2 ). The ability to enforce such
constraints on bivariate model parameters may vary between software packages. By contrast, it does not
appear natural to set any of the parameters of the bivariate model to zero. One practical advantage of the
bivariate model is that it can be fitted in a wider range of software, for example MLwiN, SAS, or the
Stata package “gllamm” (Rabe-Hesketh and others, 2004), whereas the HSROC model is at present only
estimable using WinBUGS or the NLMIXED procedure in SAS.
As we have seen, the different parameterizations of the HSROC and bivariate models arise from dif-
ferent ideas of the most appropriate meta-analytic summaries of the results of diagnostic test accuracy
ACKNOWLEDGMENTS
We wish to acknowledge helpful discussions with Petra Macaskill. This work was supported by the MRC
Health Services Research Collaboration. Jonathan J. Deeks is funded in part by a Senior Scientist in
Evidence Synthesis Award from the UK Department of Health. Conflict of Interest: None declared.
REFERENCES
ALEXANDERSSON, A. (2004). Graphing confidence ellipses: an update of ellip for Stata 8. Stata Journal 4, 242–56.
BOSSUYT, P. M., REITSMA, J. B., BRUNS, D. E., GATSONIS, C. A., GLASZIOU, P. P., IRWIG, L. M., LIJMER,
J. G., MOHER, D., RENNIE, D. AND DE VET, H. C. W. (2003). Towards complete and accurate reporting of
studies of diagnostic accuracy: the STARD initiative. British Medical Journal 326, 41–4.
CHEW, V. (1966). Confidence, prediction, and tolerance regions for the multivariate normal distribution. Journal of
the American Statistical Association 61, 605–17.
COX, C. (1998). Delta method. In: Armitage, P. and Colton, T. (editors), Encyclopedia of Biostatistics, 1st edition.
Chichester, UK: Wiley. pp 1125–1127.
COX, D. R. AND HINKLEY, D. V. (1974). Theoretical Statistics. London: Chapman and Hall.
DEEKS, J. J. (2001). Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening
tests. British Medical Journal 323, 157–62.
SCHEIDLER, J., HRICAK, H., YU, K. K., SUBAK, L. AND SEGAL, M. R. (1997). Radiological evaluation of lymph
node metastases in patients with cervical cancer. A meta-analysis. Journal of the American Medical Association
278, 1096–101.
SIADATY, M. AND SHU, J. (2004). Proportional odds ratio model for comparison of diagnostic tests in meta-analysis.
BMC Medical Research Methodology 4, 27.
TATSIONI, A., ZARIN, D. A., ARONSON, N., SAMSON, D. J., FLAMM, C. R., SCHMID, C. AND LAU, J. (2005).
Challenges in systematic reviews of diagnostic technologies. Annals of Internal Medicine 142, 1048–55.
TOSTESON, A. N. AND BEGG, C. B. (1988). A general regression methodology for ROC curve estimation. Medical
[Received January 31, 2006; first revision March 17, 2006; second revision April 10, 2006;
accepted for publication May 10, 2006 ]