An Overview of Contemporary ROC Methodology in Medical Imaging and Computer-Assist Modalities
ROC
- Receiver Operating Characteristic (historic name from radar studies)
- Relative Operating Characteristic (psychology, psychophysics)
- Operating Characteristic (preferred by some)
OUTLINE:
- Efforts toward consensus development on present issues
- The ROC Paradigm
- The complication of reader variability
- The multiple-reader multiple-case (MRMC) ROC paradigm
- The measurement scales: categories; patient-management/action; probability scale
- Complications from:
  - location uncertainty
  - truth uncertainty
  - effective sample size uncertainty
  - reader vigilance
- Summary
[Figure: distributions of non-diseased cases and diseased cases, separated by a threshold.]
More typically:
[Figures: distributions of non-diseased and diseased cases with the threshold set for a less aggressive, a moderate, and a more aggressive mindset. Each threshold setting gives one pair of TPF (sensitivity) and FPF (1 - specificity).]
[Figure: distributions of non-diseased and diseased cases with a threshold, shown alongside a plot of TPF (sensitivity) versus FPF (1 - specificity).]
[Figure: plot of TPF (sensitivity) versus FPF (1 - specificity), annotated with reader skill and/or level of technology.]
[Figure: ROC plot axes, each running from 0.0 to 1.0. True Positive Fraction versus False Positive Fraction, with the complementary axes False Negative Fraction and True Negative Fraction.]
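The threshold pictures can be made concrete with a small numerical sketch. It is only an illustration: the two normal score distributions, their parameters, and the three threshold values are assumptions chosen for the example and do not come from the slides. It shows how each threshold setting ("mindset") yields one operating point, and how the four fractions on the ROC axes relate to one another.

```python
from statistics import NormalDist

# Hypothetical score distributions (assumed for illustration only):
# non-diseased cases ~ N(0, 1), diseased cases ~ N(1.5, 1).
NON_DISEASED = NormalDist(mu=0.0, sigma=1.0)
DISEASED = NormalDist(mu=1.5, sigma=1.0)


def operating_point(threshold):
    """Return (FPF, TPF) when cases scoring above `threshold` are called positive."""
    fpf = 1.0 - NON_DISEASED.cdf(threshold)  # false positive fraction = 1 - specificity
    tpf = 1.0 - DISEASED.cdf(threshold)      # true positive fraction = sensitivity
    return fpf, tpf


# A lower threshold corresponds to a more aggressive mindset
# (threshold values are hypothetical).
mindsets = {"less aggressive": 2.0, "moderate": 0.75, "more aggressive": -0.5}
points = {name: operating_point(t) for name, t in mindsets.items()}

for name, (fpf, tpf) in points.items():
    # TNF and FNF are the complements of FPF and TPF, respectively.
    print(f"{name:16s} FPF={fpf:.3f} TPF={tpf:.3f} "
          f"TNF={1 - fpf:.3f} FNF={1 - tpf:.3f}")
```

Moving the threshold from the less aggressive to the more aggressive setting raises both TPF and FPF; the set of all such pairs is the ROC curve.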
[Figure: ROC curves for Modality A and Modality B. Horizontal axis: False Positive Fraction = 1.0 - Specificity.]
Conclusion: Modality B is better: higher TPF at same FPF, or lower FPF at same TPF.
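The comparison rule in this conclusion (higher TPF at same FPF, or lower FPF at same TPF) can be checked numerically when two ROC curves do not cross. The sketch below assumes an equal-variance binormal model, in which each curve is described by a single separation parameter; the model, the function names, and the separation values assigned to the two modalities are hypothetical and are not taken from the slides.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sigma 1


def tpf_at_fpf(fpf, separation):
    """TPF at a given FPF on an equal-variance binormal ROC curve (assumed model)."""
    return STD_NORMAL.cdf(separation + STD_NORMAL.inv_cdf(fpf))


def fpf_at_tpf(tpf, separation):
    """FPF at a given TPF on the same assumed curve (inverse of tpf_at_fpf)."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(tpf) - separation)


# Hypothetical separations between diseased and non-diseased score means.
SEPARATION_A = 1.0
SEPARATION_B = 2.0

# Compare sensitivity at the same FPF ...
tpf_a = tpf_at_fpf(0.10, SEPARATION_A)
tpf_b = tpf_at_fpf(0.10, SEPARATION_B)

# ... and false positive fraction at the same TPF.
fpf_a = fpf_at_tpf(0.80, SEPARATION_A)
fpf_b = fpf_at_tpf(0.80, SEPARATION_B)

print(f"At FPF = 0.10: TPF(A) = {tpf_a:.3f}, TPF(B) = {tpf_b:.3f}")
print(f"At TPF = 0.80: FPF(A) = {fpf_a:.3f}, FPF(B) = {fpf_b:.3f}")
```

Under these assumptions the curve with the larger separation lies above the other everywhere, so both forms of the comparison agree. When curves cross, the two comparisons can disagree depending on the operating point chosen.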
[Figure: ROC curves for Modality A and Modality B. Horizontal axis: False Positive Fraction = 1.0 - Specificity.]
Conclusion: Modality A is better: higher TPF at same FPF, or lower FPF at same TPF.
Location scoring:
- The basic ROC paradigm assesses decision making at the level of the patient.
- In complex imaging, assessment of decision making at a finer level is desired, i.e., assessment of localization.
- Localization adds more information and hence more statistical power.
IN SUMMARY
These points reflect the current status of ongoing interactions between and among
- FDA
- Academia
- Industry sponsors
- NCI and the LIDC
on the topic and issues for submissions like the present one.